Acts like IPC: an interoperability cage match

The Habitat IPC configuration topic recently started to veer off into a discussion of actors, IPC approaches, etc. Understandably, @nunoo has requested that this sort of thing be handled elsewhere:

PS: as a more general note going forward, this thread was not intended as an opportunity to debate the actor model in Monty, a topic that can only benefit from its own threads, I think.

So, I’m starting a topic that @AgentRev, @DLed, and anyone else who is interested can use to argue about the plusses and minuses of various approaches. All aboard…

1 Like

Over in Habitat IPC configuration, @DLed said:

Indeed I’d not use thread- or even coroutine-centric abstractions for all 3: concurrency, parallelism, and distribution. The Actor Model encapsulates all of them without extra mental overhead. Vaughn Vernon recently published the Python port of his actor model library: GitHub - VaughnVernon/DomoActors-Py: Actor Model toolkit for Python: Fault-tolerant, message-driven concurrency.

As I had never heard of DomoActors and was concerned about interop, I asked the Goog for a rundown:

Can DomoActors interoperate with Elixir?

Yes, DomoActors (used with the Commanded library in Elixir) readily interoperate with Elixir’s native actor model (BEAM processes) because they are Elixir processes, leveraging GenServer for state management and the BEAM’s message passing for communication, allowing for seamless integration within Elixir applications for Command Query Responsibility Segregation (CQRS) and event sourcing. …

However, this is not a guarantee of drama-free interop, so I asked a follow-up question:

Does Domoactors-py play nicely with the Elixir version?

It is important to distinguish between DomoActors-Py and the Domo Elixir library, as they are separate projects by different authors.

Interaction with Elixir

DomoActors-Py (released in late 2025 by Vaughn Vernon) is a type-safe Actor Model library for Python. It is inspired by the actor model found in languages like Elixir and Erlang but is not a direct “port” meant to communicate natively over the Erlang Distribution Protocol.

  • No Native Binary Protocol: DomoActors-Py uses Python’s asyncio for concurrency. It does not natively “speak” the Erlang distribution protocol used by Elixir’s actors (processes).
  • Interoperability Methods: To use them together, you would typically use standard cross-language communication:
      • JSON over HTTP/gRPC: Treat them as separate microservices.
      • Pythonx: If you need to run Python code directly inside an Elixir node, the Pythonx library allows you to embed a Python interpreter within the Elixir BEAM VM.
      • Port Drivers: Use Erlport to manage Python processes from Elixir.

(ducks…)

You don’t need HTTP. You don’t even need delivery guarantees. If the whole thing is jammed into the same machine you could even use Unix sockets and pass raw data around, no serialization, no nothing.

Unless one sends a well-defined format via a Unix pipe, interop between different runtimes would be dangerous. I wouldn’t worry about performance, because ZeroMQ can send millions of messages per second, whether on localhost or via named pipes, which are shared memory anyway.

Likewise, modern JSON or msgpack serializers can handle millions of messages per second. As we know, the brain likely doesn’t need millions per second, so I’d not worry prematurely.
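For anyone who wants to sanity-check that serialization claim on their own machine, a rough micro-benchmark along these lines will do; the message shape is made up, msgpack is the third-party msgpack package, and the numbers obviously vary by hardware:

```python
# Back-of-the-envelope serialization throughput check; illustrative only.
import json
import time

import msgpack  # pip install msgpack

msg = {"sender": "lm-0", "features": list(range(32)), "confidence": 0.87}
N = 100_000

t0 = time.perf_counter()
for _ in range(N):
    json.loads(json.dumps(msg))
t1 = time.perf_counter()
for _ in range(N):
    msgpack.unpackb(msgpack.packb(msg))
t2 = time.perf_counter()

print(f"json:    {N / (t1 - t0):,.0f} round-trips/s")
print(f"msgpack: {N / (t2 - t1):,.0f} round-trips/s")
```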

PS: if one adds distribution, which is inevitable (external sensors, robots, vehicles, etc.), unifying the communication Ports and using various Adaptors might also be inevitable. Having one approach to serialization, or even one communication approach, is then likely desirable. What’s good about ZeroMQ, for example, is that one can switch from ipc (named pipes) to tcp without any changes to the send/receive API.
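To make that last point concrete, here is a minimal pyzmq sketch (endpoint names and message contents are invented for illustration): the transport lives entirely in the endpoint string, so flipping between ipc:// and tcp:// leaves the send/receive code untouched.

```python
# Minimal illustration: the transport is only part of the endpoint string,
# so switching ipc <-> tcp changes no send/receive code.
import json

import zmq  # pip install pyzmq

ENDPOINT = "ipc:///tmp/monty-modules"  # or "tcp://127.0.0.1:5555" -- same API either way

ctx = zmq.Context()

producer = ctx.socket(zmq.PUSH)
producer.bind(ENDPOINT)

consumer = ctx.socket(zmq.PULL)
consumer.connect(ENDPOINT)

# JSON keeps the wire format language-agnostic; msgpack would be a drop-in swap.
producer.send(json.dumps({"module": "sensor-0", "pose": [0.0, 0.1, 0.2]}).encode())
print(json.loads(consumer.recv().decode()))
```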

Duck! :laughing:
Interop must be defined. An actor is just a unit of computation. It can hold a socket or any other channel and thus communicate easily with other actors via a unified computing abstraction. If you want, I can point you to OSS demos I made connecting various tech.

My take is that the performance of the message transmission infrastructure is not likely to be a gating issue for Monty in the short to medium term. So, for example, I’d be happy with JSON over ZeroMQ or any other language-agnostic, popular, and reasonably performant mashup of technologies.

In the short term, multi-module prototypes (e.g., actor networks) can be built out of OS processes, Python interpreters, etc. It sounds like DomoActors-Py might be a useful tool for this, if it finesses some of Python’s limitations.

Sticking with Python makes a lot of sense in the short term. The most obvious reasons have to do with leveraging the current code base and developer community. However, for the medium term I worry about Python’s lack of support for robust concurrency, lightweight processes, etc. I’m also concerned that Python’s OOP (as opposed to FP) orientation may cause problems down the road, but that discussion should probably be handled in another forum topic…

Running Python under Elixir (e.g., using Pythonx) might be a useful way to add some of Elixir’s robustness to the system, but (AFAIK) it still imposes the enormous memory cost of having a copy of the Python interpreter for each module. In any event, for this combination to be successful, it will need to act as an alternate, entirely optional way of running (large parts of) the Python code base.

My medium to long term preference, as expressed before in this forum, would be to rework (tweak?) Monty’s code base to support language-agnostic message handling. This would let Monty’s key algorithms be migrated into languages (e.g., Elixir, Go) that support robust concurrency, lightweight processes, etc.
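As a purely illustrative sketch of what “language-agnostic message handling” could mean in practice (none of these field names come from Monty), the contract can be as small as a versioned, JSON-serializable envelope that any runtime - Elixir, Go, or Python itself - can produce and parse:

```python
# Hypothetical message envelope -- not Monty's actual format -- sketching a
# language-agnostic contract between modules.
import json
import time
import uuid
from dataclasses import asdict, dataclass, field


@dataclass
class Envelope:
    kind: str          # e.g. "observation", "vote"
    payload: dict      # plain JSON-serializable data only
    sender: str = "lm-0"
    msg_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    sent_at: float = field(default_factory=time.time)
    schema: int = 1    # bump when the payload shape changes

    def to_wire(self) -> bytes:
        return json.dumps(asdict(self)).encode()

    @classmethod
    def from_wire(cls, raw: bytes) -> "Envelope":
        return cls(**json.loads(raw.decode()))


wire = Envelope(kind="vote", payload={"object_id": 42, "confidence": 0.9}).to_wire()
print(Envelope.from_wire(wire))
```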

Does any of this suggest a useful path forward (e.g., using DomoActors-Py and perhaps Pythonx)?

1 Like

I remember the discussion about global asynchronicity in HTM/NuPIC using Erlang; someone even tried to implement it, I think. That was almost six years ago now - there’s probably still a video on Matt’s Twitch channel. Back then, no one could answer the question about the synchronization mechanism: async removes the single synchronization point, and incoming events stop being strictly sequential, which means we have to start timestamping frames, introducing time buckets - all kinds of additional complexity needed to compensate for it. It’s probably simultaneously somewhat different and somewhat similar in Monty.

But the main point is: asynchronicity is IO. Training and inference are computation.
This means that asynchronicity can only ever be useful at the seams, and nowhere else: network, filesystem, IPC. It has its place; it exists to solve a specific issue, and Monty will need to solve that issue if it wants to scale. But using async as a platform is just burning electricity - and it’s also burning human-hours, because asynchronous code introduces its own wealth of cases, which makes the code much more complex, not even mentioning refactoring it in the first place. Instead, async needs to be injected into the code. This is how, for example, Zig aims to solve this problem: all IO handlers are passed around, just like memory allocators. The benefits are obvious: you can introduce your own at any moment, whether you want to stay sync or go async - everything is in your control. This is especially important during prototyping.
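A rough Python sketch of that injection style, with all names invented for illustration (this is not Monty code): the compute stays synchronous and pure, and the only IO goes through a transport object handed in from outside, so sync and async variants can be swapped without touching the algorithm.

```python
# "Inject the IO, keep the compute synchronous" -- illustrative placeholder code.
from typing import Protocol


class Transport(Protocol):
    """The seam: anything that can deliver a message. Could wrap an in-memory
    queue, a ZeroMQ socket, or an asyncio channel -- the compute code doesn't care."""

    def send(self, msg: dict) -> None: ...


def learning_step(observation: dict, out: Transport) -> None:
    """Pure, synchronous computation; the only IO is the injected transport."""
    vote = {"object_id": hash(str(observation)) % 97, "confidence": 0.5}
    out.send(vote)


class InMemoryTransport:
    """Synchronous test double -- often all a prototype needs."""

    def __init__(self) -> None:
        self.outbox: list[dict] = []

    def send(self, msg: dict) -> None:
        self.outbox.append(msg)


t = InMemoryTransport()
learning_step({"feature": [1, 2, 3]}, t)
print(t.outbox)  # one vote dict, produced without any real IO
```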

And this is why the reality tree of the Monty project has so many injections. It actually needs more. I also like the way @tslominski keeps bringing it up in his replies; because many of these questions are, in fact, whiteboard questions. When we’re trying to justify introducing this or that technology, or this or that approach, we need to go back to the whiteboard. What does it tell us? What problem are we trying to solve? What do we gain, and what do we lose?

2 Likes

Per @ash:

You don’t need HTTP. You don’t even need delivery guarantees. If the whole thing is jammed into the same machine you could even use Unix sockets and pass raw data around, no serialization, no nothing.

Indeed. However, computer performance isn’t always the most important metric. In an experimental, exploratory project such as TBP, using convenient (if inefficient) approaches may improve the project velocity and even the results obtained. So, let’s look at some Elixir-flavored possibilities in terms of message handling, process dispatching, etc.

Elixir Basics

Elixir runs under the control of the Erlang virtual machine (aka the BEAM). Developed a few decades ago to support digital telephone switches, the BEAM is now used for a variety of use cases, including Internet of Things (IoT) networks, large-scale web servers, etc.

It supports distributed, fault-tolerant, soft real-time behavior, using lightweight processes, preemptive multi-tasking, and a number of other mechanisms. Although most new and/or current BEAM-based systems are written in Elixir, Erlang is still used for about half (!) of the world’s telephone switch code.

Any BEAM process can send a message to any other, in raw “machine” form (i.e., as a data structure), as long as it has an ID for the target. This works no matter where the target is located; the BEAM takes care of “minor” details like addressing (i.e., which processor and process are involved) and transport.

However, there is no guarantee that messages will arrive, let alone in any particular order. So, it’s a bit like the User Datagram Protocol (UDP), as opposed to the Transmission Control Protocol (TCP) or the Hypertext Transfer Protocol (HTTP, which runs on top of TCP).

Also, the target process tells the BEAM (via a sequence of receive patterns) which message types to handle, and in what order. Worse, unmatched messages can pile up in the process’s mailbox, filling the heap and potentially crashing the entire node (OS process). So, best practice is to end the sequence with a universal (accept everything!) pattern.

Phoenix Speculation, redux

As I’ve mentioned before (to vanishingly little applause :-), I think it would be both possible and worthwhile to build a “lab bench” for Monty, based on Elixir, the Phoenix web framework et al, and a selected set of “web programming” standards.

Although this would obviously add significant overhead, it would allow Monty developers and researchers to use the (very cushy and rapidly expanding) programming environment that Phoenix and its friends provide.

I hope to put together a first cut at such a lab bench at some point, but no promises. However, in any event I want Monty’s design to avoid precluding this sort of thing…

What I do appreciate about this discussion is that it points to a much deeper story: there’s an orchestration layer that we might be overlooking. And Python might be simply not good, or good enough, at orchestration.

Now, there are several possible solutions to this. The industry-standard solution these days is to have everything containerized and put into Kubernetes, possibly with some persistence layer in between (database, message queue, streaming, etc.). As someone who has been using and managing Kubernetes daily for many years, I’m not advocating for it personally here. But I was surprised to find out that the app is not containerized yet and there’s no Docker Compose setup; it could simplify a lot, including overall prototyping and evolution of the platform. It makes it much simpler to try out or add basically anything when the only interface you have is the network, without locking yourself into a specific technology or approach, and it’s safe in the sense that you don’t have to turn the codebase inside out every time something shiny and promising comes along.

On the other hand, outsourcing the management of the outer complexity to containerization means less control overall. But this only matters if we’re willing to deeply understand what we actually need from that orchestration layer in the first place - not just include a bunch of batteries and see which one comes in handy in the future, but actually understand our requirements.

And also, yes, as you’ve mentioned: interoperability between languages is where things get messy. So eventually it may turn out that these tiny Python processes that Elixir, or anything else, is supposed to orchestrate in that scenario will have to expose some network endpoints or send something back over the network - and then why do we need them in that case anyway?

1 Like

If Monty modules can communicate using standard web protocols and readily available technology, as described in my last note, supporting containers would be a drop-in. This could massively reduce the “activation energy” for setting up a Monty instance.

FWIW, there are at least two actor implementations that are based on HTTP et al. The Goog sez:

Actor implementations based on HTTP use the Actor Model’s principles (isolated state, message passing) to build scalable, concurrent web services, with popular examples including Akka HTTP (Scala/Java) and Actix-web (Rust), which handle HTTP requests as messages to underlying actors for efficient state management, avoiding locks and complex threading, perfect for high-throughput systems by mapping REST calls to internal actor operations for business logic, database interaction, and background tasks. …

However (IMNSHO), Rust and Scala are pretty much at the opposite end of the “ease of programming” spectrum from Python. And, while I like Elixir a lot, it isn’t generally promoted as a language for newbies (though it could be…).

So, I suspect that ZeroMQ may be in the sweet spot for data exchange, while the command line, HTTP, et al. can handle user interaction…

Introducing the JVM, versus having to deal with Rust compilation times and Sync | Send everywhere, is a painful choice I’d personally rather not make at all. Rust-based HTTP was one of my latest and most massive sources of severe burnout a couple of years ago; not sure if anything has improved in the ecosystem since.

2 Likes

(A challenger approaches)

I’m gonna side with @ash on this one. I do software engineering for safety-critical networked industrial equipment, we have a lot of synchronous code within the machines themselves for millisecond response times, but the networking is handled asynchronously of course.

In the context of robotics, you want blazing-fast speed for everything related to vision and movement compute, which async code cannot achieve on its own. It wouldn’t make sense to go toward an ecosystem that restricts either paradigm, you want both at your disposal depending on whatever needs arise.

If Monty / TBP grows to the point where it can / wants to compete with SOTA models, it will likely need high data throughput between learning modules for thousands of large SDRs a second. In this scenario, a shared memory approach would likely be the primary communication path. An actor-only model at this scale would essentially be like putting a huge concrete wall with a tiny door in front of a jumbo jet.

Gotta think ahead; it’s a unified brain, not a website.

It’s also important to distinguish between simulator and learning module communication. As I said in the other thread, simulator comms can totally be networked. In the Brighton video, the team noted that it could function the same way as a cloud gaming service.

Inter-module communication is another domain completely. On that topic specifically, I think it might depend a lot on future heterarchy research by the team. I’m doubtful a networked approach would be the right choice.

The researchers’ needs and discoveries are what should drive the implementation. For example, Hawkins wants a hippocampus; that’s gonna require some form of database too, possibly an exotic custom-made one. There are many things still up in the air.

In regard to containerization, the code is currently an early prototype with regular breaking changes, and the user base is small, so it’s probably not worth investing effort specifically into containers right now. All in due time!

2 Likes

I don’t quite understand why all that means one would need to start timestamping frames, etc. Even though messaging within an Erlang cluster is causal (messages between two processes will not be reordered), I think Monty’s architecture doesn’t require synchronization at all. Rather, imposing synchronization might lead us away from the critical physical/biological constraints.

Asynchronicity is not just about I/O. It may be so in non-concurrency-oriented languages, but not in Erlang/Elixir/Pony/etc. It’s about modeling - simplifying the programming model. It doesn’t burn extra electricity when it doesn’t have to.

Despite all of that, the compute/environmental costs of deep learning, LLMs, and blockchains dwarf those of small computing tasks (20 Watts would be great :wink:) by so much that the choice of programming language doesn’t really matter, and one can choose for guarantees.

That’s more or less the point of the actor model. Where you want to, you can be synchronous. Where you need concurrency or isolation of failures, you can have an isolated unit of computation - an actor - which has an identity. Async/await code usually does not have identities for isolatable units of computation.

E.g. in Elixir: if you need speed for stuff like vision, just implement it in whatever you want - C, C++, Zig, Rust :roll_eyes: .

And likewise: the Actor Model doesn’t mean any particular language or run-time. It’s only an abstraction, which people sometimes implement without knowing it - but in a broken way, e.g. by sharing mutable state among actors. That said, shared-(almost)-nothing-(mutable or unsafe) thinking allows for very efficient code while avoiding the burnout of having to keep global mutable state in the engineer’s head.

I don’t see the contradiction. With a ports & adapters approach one can choose the channels wisely at each point of scale. Scale can also be quite different - e.g. many publishers with one subscriber, or few publishers and lots of subscribers. These patterns will have to be implemented - with shmem or without, regardless of the language or the concurrency abstraction. One PITA will be managing lifetime. In the Actor Model, the actor is a perfect holder of a RAII-like resource such as a piece of shmem. With ZeroMQ ipc, gigabytes per second are possible. See e.g. the comparison with shmem from CERN: shm.pdf - interestingly, they show faster speeds on macOS or Ubuntu via tcp :smile:

The brain has slow messaging speeds, and is far from physically having a shared memory. Thus, regardless of the computing abstraction for the brain, messaging-like APIs will be needed. Sensible architectural decisions will help defer and de-risk the detailed implementation decisions.

1 Like

I’m still in the discovery phase with regard to Monty’s architecture, so I’m unable to model the issue around its abstractions, but the gist of it was as follows: when you start parallelizing the load, both for training and inference - especially inference - different computational units might start processing the same load, e.g. sensory input, at different speeds. Imagine running an experiment on your Mac, and suddenly an internal process wakes from slumber and takes up 50% of a single core or more, while all the others operate normally. This means that all output from that core’s workload will be processed and sent with a delay. If the system simply goes on and does not wait - i.e., there is no loop with a single synchronization point - then it starts to receive frames that are obsolete, which leads to distorted results, and there’s no way to correct them. That’s not an issue for learning and recognizing static objects, but for any kind of movement - absolutely.
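Purely to illustrate the kind of compensating machinery this implies (nothing here is Monty code; all names are invented), a consumer ends up carrying sequence numbers or timestamps, a reorder buffer, and a rule for discarding frames that a slow worker delivers too late:

```python
# Illustrative only: the extra bookkeeping that async fan-out forces on a consumer.
# Frames carry sequence numbers; late (obsolete) frames are dropped instead of processed.
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Frame:
    seq: int
    payload: dict = field(compare=False)


class FrameMerger:
    """Reorders frames arriving out of order from parallel workers."""

    def __init__(self) -> None:
        self._latest = -1
        self._pending: list[Frame] = []

    def offer(self, frame: Frame) -> list[Frame]:
        """Return frames now safe to process, silently dropping stale ones."""
        if frame.seq <= self._latest:
            return []  # a slow worker delivered data we have already moved past
        heapq.heappush(self._pending, frame)
        ready = []
        while self._pending and self._pending[0].seq == self._latest + 1:
            ready.append(heapq.heappop(self._pending))
            self._latest += 1
        return ready


merger = FrameMerger()
for f in (Frame(0, {}), Frame(2, {}), Frame(1, {}), Frame(1, {})):
    print(f.seq, "->", [g.seq for g in merger.offer(f)])
# 0 -> [0], 2 -> [], 1 -> [1, 2], 1 -> []  (the second frame 1 arrives too late)
```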

Even more so if we consider scaling the system beyond a single CPU.

Fair points! That’s what good run-times and good design using actors address! Keep synchronous things synchronous (actors process incoming messages synchronously!). Keep asynchronous things asynchronous: message passing is non-blocking for both the sender and the receiver.

Fair scheduling is of utmost importance, indeed! (The Erlang VM goes to great lengths to ensure that! You don’t want a chatty neighbor’s conference call to make your call crappy. If it’s good enough for companies like WhatsApp spanning the world, I’m sure it’s good for us. But again, no need to use it in this case for now.)

Absolutely. That’s what the actor model helps with: unifying concurrency, parallelism and distribution behind the same abstraction without the complex synchronization overhead and the potential to shoot oneself in the foot.

If you need some learnings, learn from the late Pieter Hintjens:

or the late Joe Armstrong:

Perhaps Greg Young can help with the architectural decisions, by optimizing the code for deletion: https://www.youtube.com/watch?v=1FPsJ-if2RU

1 Like

Regardless of their applicability to this or that particular project, these links are definitely worth checking out, thank you so much!

1 Like

My point isn’t really about the actor model itself, but more about the ecosystem. If the primary ecosystem strongly relies on the actor model as the foundation, synced code has to be duct-taped in through a second ecosystem, which introduces avoidable complexity. The primary ecosystem should ideally support shared mem, synced code, async code, and the actor model, all at once.

I wouldn’t promote shared mem as a means of transfer (you don’t need shared mem for that), but as a way to avoid transfer as much as possible. You can have multiple threads and processes that use a single shared mem as a common canvas: e.g., a sensor module writes a large chunk of data, then wakes up an awaiting learning module that crunches the data. That CERN example is nice, but it’s not the same use-case.
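A minimal standard-library sketch of that canvas pattern, with made-up sizes and names (not Monty code): the sensor fills the shared block in place, and only a wake-up signal crosses the process boundary, not the data itself.

```python
# "Shared memory as a common canvas": writer fills the block, then wakes the reader.
from multiprocessing import Event, Process, shared_memory

FRAME_BYTES = 1024 * 1024  # made-up frame size


def sensor(shm_name, ready):
    shm = shared_memory.SharedMemory(name=shm_name)
    shm.buf[:4] = b"\x01\x02\x03\x04"  # pretend this is a freshly written frame
    shm.close()
    ready.set()  # wake the learning module; no payload is copied or sent


def learner(shm_name, ready):
    ready.wait()  # sleep until there is something to crunch
    shm = shared_memory.SharedMemory(name=shm_name)
    print("first bytes of the frame:", bytes(shm.buf[:4]))
    shm.close()


if __name__ == "__main__":
    canvas = shared_memory.SharedMemory(create=True, size=FRAME_BYTES)
    flag = Event()
    procs = [Process(target=sensor, args=(canvas.name, flag)),
             Process(target=learner, args=(canvas.name, flag))]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    canvas.close()
    canvas.unlink()  # the owning process is responsible for the block's lifetime
```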

Well… The brain’s not an actor model either :wink: Continuous signals with massive overlap and coupling between regions, as opposed to discrete messages and isolation with clear boundaries. Gotta leverage the hardware we have, and shared memory is the closest alternative that can efficiently do it. Neurons are units of both compute and memory, but neuromorphic chips sadly aren’t at our doorstep yet…

Any approach has trade-offs. A fine-grained ecosystem comes at the cost of simplicity, but one with lots of developer childproofing comes at the cost of latitude and performance. We shouldn’t ask ourselves “How do I propose my favorite tool as the solution?”;

“You’ve gotta start with the customer experience and work backwards to the technology. You can’t start with the technology and try to figure out where you’re gonna try to sell it.” ― Steve Jobs

People want state-of-the-art, humane AI. This requires pushing both software and hardware to their limits in every aspect. :smiling_face_with_sunglasses:

@Rich_Morin FYI: a long time ago I experimented with rewriting HTM in Pony, a memory-safe compiled language with Actor Model concurrency: GitHub - d-led/htm.pony: an HTM experiment in Pony based on the Go implementation https://github.com/htm-community/htm/

and later an Erlang C node in Pony that could join an Erlang cluster: GitHub - d-led/otp_pony_node: An Erlang C Node for the Pony language via ei_connect

As for your doubts about interfacing: this was quite easy, because when one thinks in actors, one can interconnect just about anything with the same approach - e.g., Python and vlingo (the Java predecessor of DomoActors) via ZeroMQ, or vlingo with Elixir.

2 Likes