Elixir-based subsystem(s) for Monty?

Great questions and ideas surfacing here! Generally, Monty is designed in a way that all learning modules can be updated in parallel. There is no requirement to update them in any particular order (as there would be in feed-forward hierarchical processing). The idea is that if a lower-level LM produces an output, it becomes the input to the higher-level LM at the next step (or, in a more asynchronous system, the next time that higher LM receives/processes input).

We designed it that way since, analogous to the brain, we don’t only have feed-forward processing of input. As the attached diagram shows, there are many types of connections:

  • Classical feed-forward processing
  • Skip connections, where direct sensory input arrives at LMs that are higher in the hierarchy
  • Top-down feedback to bias which object and pose are recognized in the lower LM
  • Top-down goal states that decompose goals into subgoals
  • Lateral voting (which can also happen across LMs at different levels in the hierarchy)
  • Motor output to sub-cortical regions/the motor system from every LM/cortical column

These types of connections are derived from long-range connections found in the neocortex. We are currently writing a paper about this, which I am pretty excited about!

Interesting side point: Given all this connectivity, it is hard to think of the system as hierarchical. An LM may be getting direct sensory input, but also input that has already been processed by other LMs. It is therefore hard to even define which level of the hierarchy an LM is part of. We’ve started to use the term “Heterarchy” instead, since there are clearly hierarchical connections in Monty and in the neocortex, but also many non-hierarchical ones. We have a page in our documentation on this: Connecting LMs into a Heterarchy

To get back to the main question: all LMs could be updated in parallel (currently, we update them using a for loop, but there is no prescribed order in which they need to be updated: tbp.monty/src/tbp/monty/frameworks/models/graph_matching.py at f2e58fdbfd72e6012d7cafe8ea086c9b0c70a5e9 · thousandbrainsproject/tbp.monty · GitHub). In our Monty step method, we currently have a series of things that happen (tbp.monty/src/tbp/monty/frameworks/models/abstract_monty_classes.py at f2e58fdbfd72e6012d7cafe8ea086c9b0c70a5e9 · thousandbrainsproject/tbp.monty · GitHub):

  • Collect sensory inputs (getting outputs from sensor and learning modules from the previous step)
  • Step learning modules (process the input in each learning module & update hypotheses)
  • Vote (each LM can send and receive votes. Like @mthiboust mentioned, LMs don’t request votes, they just process whatever arrives)
  • Pass goal states (passing goal states from higher LMs to lower LMs)
  • Pass info to the motor system (each LM produces a motor output in the form of a goal state, which is sent to the motor system)
  • Check if done (this is more of an experimental detail for when we work with episodes that have a termination condition)

Not every LM needs to receive input at every step (in fact, they often don’t: for instance, if the sensor module didn’t detect a significant change in features, it won’t send a new observation to the LM, and the LM therefore doesn’t need to update).

All of this could be made even more asynchronous (more like we would imagine it happening in the brain), but that would require some more serious refactoring of the code. However, an important point to note is that even if LMs are not updated, they would still need to model passing time somehow (like decaying their evidence for an object even when not receiving any input). Otherwise, they may be “perceiving” different things at different points in time. E.g. if you saccade, some LMs might receive new sensory input about something while they receive votes from a “past” representation in another LM. Even without considering LMs that don’t receive input, if we have asynchronous updates, they should happen on a significantly faster time scale than movements happen in the world (which I think is a fair assumption, i.e. muscular contraction speeds vs. neural conduction speeds).
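To make the evidence-decay point concrete, here is a minimal sketch of what a self-updating LM process could look like (written in Elixir, in the spirit of this thread; the module name, decay constants, and message shapes are all hypothetical, not Monty code):

```elixir
# Hypothetical sketch: an LM process that decays its evidence while idle,
# so "time passes" for it even when no new observations arrive.
defmodule EvidenceLM do
  use GenServer

  @tick_ms 50   # decay tick; assumed much faster than movements in the world
  @decay 0.98   # multiplicative evidence decay per tick (made-up value)

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, :ok, opts)

  @impl true
  def init(:ok) do
    Process.send_after(self(), :decay_tick, @tick_ms)
    {:ok, %{evidence: %{}}}
  end

  # New sensory input bumps the evidence for a hypothesized object.
  @impl true
  def handle_cast({:observation, object_id, delta}, state) do
    evidence = Map.update(state.evidence, object_id, delta, &(&1 + delta))
    {:noreply, %{state | evidence: evidence}}
  end

  # Time passes even without input: decay all hypotheses on every tick.
  @impl true
  def handle_info(:decay_tick, state) do
    evidence = Map.new(state.evidence, fn {obj, ev} -> {obj, ev * @decay} end)
    Process.send_after(self(), :decay_tick, @tick_ms)
    {:noreply, %{state | evidence: evidence}}
  end
end
```

Because the tick runs on a faster time scale than movement, two LMs that receive input at different moments still hold comparably "aged" evidence.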

Hope this helps! Happy to answer more questions :slight_smile:

5 Likes

Superb input, thanks!

To my understanding, neurons, or whatever brain sub-system one looks at, have state and can be seen as independent, receiving inputs in a physically parallel fashion. Thus, I’d avoid designing any biologically unaligned synchronization into an implementation.

If the temporal aspect of receiving votes is taken into account in the sub-system (LM) state, neither ordering nor synchronization would be needed in an actor system: actors process incoming messages sequentially by design, even though the messaging itself is asynchronous (by means of an inbox) and all actors are scheduled independently.

As for the strategy of refactoring to enable asynchronicity: I think that is the type of effort that might not pay off, given the challenges of Python, especially with regard to unifying distribution with parallelism.

However, what might work as another experiment or refactoring spike would be to stay with the current code and only move the computations into one of the Python implementations of the actor model - for the sake of design, not scalability. Once a system is designed around actors, the asynchronous part can be redesigned in another technology much more easily, and the synchronous part becomes a linear-effort rewrite.

Check out the following, which are probably the major implementations:
Actors — Pykka 4.1.1 documentation (lightweight, it seems)
or
Actors — Ray 2.40.0 (a bit more intrusive)

I’d lean toward the code-first, lightweight option. This can be done incrementally: first putting the main method into an actor, then incrementally moving the parts that need to be asynchronous into actors of their own.

I think, once I go through the code in a bit more detail, I’d be able to run such a rewrite spike. At the moment, I’m running into some division-by-zero errors when running one of the demo experiments. I must add, though, that trying to run actors in a non-actor-native language is usually a pain (due to the mismatch between the concurrent nature of actors and the primary language).

2 Likes

Quick clarification (just in case)

When I was talking about requests/responses above, I was using language that is common when working with GenServers in Elixir/Erlang, which model the request-response nature of client-server relationships in telecom (if you’re interested in learning more about that model, see here).

So when I was speaking of how an LM might respond to a voting/bias/goal “request”, it was within the context of the LM being a separate process/server, and brainstorming the expected behavior said process would have upon receiving that message.
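To make that concrete, here is a tiny hypothetical example of the two GenServer interaction styles I mean (all names invented for illustration): a synchronous call models a request/response, while an asynchronous cast models fire-and-forget messages like votes.

```elixir
defmodule LMServer do
  use GenServer

  def start_link(initial \\ %{hypothesis: :unknown}),
    do: GenServer.start_link(__MODULE__, initial)

  @impl true
  def init(state), do: {:ok, state}

  # A synchronous "request": the caller blocks until the LM replies.
  @impl true
  def handle_call(:current_hypothesis, _from, state),
    do: {:reply, state.hypothesis, state}

  # An asynchronous message: a vote arrives, no reply is sent.
  @impl true
  def handle_cast({:vote, hypothesis}, state),
    do: {:noreply, %{state | hypothesis: hypothesis}}
end

# Usage:
#   {:ok, lm} = LMServer.start_link()
#   GenServer.cast(lm, {:vote, :mug})
#   GenServer.call(lm, :current_hypothesis)  #=> :mug
```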

I hope that either helps clear up some confusion surrounding some of what I was saying before or if no such confusion existed, then I guess you can just ignore this :grinning:

2 Likes

First up, I really appreciate the in-depth response!

The point you bring up about modelling time in a more concurrent system is interesting…

The first thing that came to my mind about it was to ask the question: “is time fundamental to the LM?”

Could time be an input from the environment, and thus an input to the sensor module and learning module?

Time as a feature maybe? Or perhaps encoded into the cortical messaging protocol as a 1st-class attribute?
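For illustration only (this is not the actual CMP, just a hypothetical shape), time as a first-class attribute could simply be another field of every message:

```elixir
defmodule CMPMessage do
  # Hypothetical message shape: features and pose stand in for CMP content;
  # sent_at makes time a first-class attribute of the protocol.
  defstruct features: %{}, pose: nil, sent_at: nil
end

msg = %CMPMessage{
  features: %{curvature: 0.3},
  pose: {1.0, 2.0, 0.5},
  sent_at: System.monotonic_time(:millisecond)
}
```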

1 Like

An interesting thing to also consider: messages might be more frequent or less frequent in some columns.

For example, we know that language-processing columns in the brain send updates faster than those in other areas because there is more myelin sheathing around the long-range connections of the neurons that process language. Language Exposure and Brain Myelination in Early Development - PMC

1 Like

Indeed, the discussion makes sense if a sensor or a learning module is represented on the BEAM with a GenServer.

As for the state mentioned above: it could map directly to a GenServer’s state. As this can be of a temporal nature (e.g. “last N votes”), a streaming, batching, or sampling (ticking) implementation is conceivable. This is where we could align with the TBP team on proposals.

This aspect is definitely representable. Messages within an Erlang system can scale from one per year (ad absurdum) to hundreds of thousands per second if necessary (given a good design).

Since actors/processes react to messages and can send messages depending on their state, bursts of activity are representable too, e.g. keeping a buffer of the last seen votes and firing a train of notifications to other columns/modules.

In general, a GenServer (an API-formalized version of a process) does this in a single function, in abstract terms:

{new_state, messages_sent, processes_started} = handle_message(self, old_state, message)

where the state and the message can be any Erlang term, even a reference to a persistent object or another process, or other data composed of these terms.
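Concretely, a GenServer callback has exactly that shape. Here is a hypothetical vote-buffer example (not Monty code; all names made up):

```elixir
defmodule VoteBuffer do
  use GenServer

  @max_votes 10  # keep only the last N votes (assumed constant)

  def start_link(peers), do: GenServer.start_link(__MODULE__, peers)

  @impl true
  def init(peers), do: {:ok, %{votes: [], peers: peers}}

  # This is the abstract handle_message above: it receives the old state and
  # the message, may send messages to other processes, and returns new state.
  @impl true
  def handle_cast({:vote, v}, %{votes: votes, peers: peers} = state) do
    new_votes = Enum.take([v | votes], @max_votes)            # trim to last N
    Enum.each(peers, &GenServer.cast(&1, {:peer_vote, v}))    # messages sent
    {:noreply, %{state | votes: new_votes}}                   # new state
  end
end
```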

Some years ago, while reading the PhD thesis of Joe Armstrong (Erlang’s co-inventor), I had an epiphany that unified programming paradigms for me: this, self, self(), objects & actors, C++, Python and Erlang, functions and methods. The (sometimes important) differences are in the execution semantics; the syntax is rarely an interesting distinction.

@vclay @brainwaves @tslominski if you want a very quick intro to Elixir & to recognize the TBP concepts at the same time, here’s a self-contained, runnable sketch:

see the output at the end of the page on ideone.

This sketch:

  • starts 3 asynchronously running “ticking” sensor modules that send votes to whoever is subscribed
  • starts a prototypical learning module, letting it know the PIDs (process references) of the 3 sensors
  • when the learning module starts, it makes itself known to the sensor modules (i.e. subscribes to their messages)
  • upon a message from any of the sensor modules, the learning module adds the vote to its buffer, trims the buffer to a predefined maximum length, and keeps it in its state
  • the Main module then sleeps a couple of times and queries the asynchronously running learning module, in an RPC/request-reply fashion, for the current hypothesis, printing it to the console

I’ve annotated the lines with some Elixir basics for quick learning, plus some code explanation.
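For anyone who can’t open the link, here is a condensed sketch along the same lines (this is not the linked code verbatim; names and vote values are simplified/made up):

```elixir
defmodule Sensor do
  # A "ticking" sensor module: every interval it sends a vote to subscribers.
  def start(interval_ms), do: spawn(fn -> loop(interval_ms, []) end)

  defp loop(interval_ms, subscribers) do
    receive do
      {:subscribe, pid} -> loop(interval_ms, [pid | subscribers])
    after
      interval_ms ->
        vote = Enum.random([:mug, :bowl, :spoon])
        Enum.each(subscribers, &send(&1, {:vote, vote}))
        loop(interval_ms, subscribers)
    end
  end
end

defmodule LM do
  @max_votes 5

  # The learning module subscribes itself to each sensor on start.
  def start(sensor_pids) do
    spawn(fn ->
      Enum.each(sensor_pids, &send(&1, {:subscribe, self()}))
      loop([])
    end)
  end

  defp loop(votes) do
    receive do
      # Keep only the last N votes in state.
      {:vote, v} ->
        loop(Enum.take([v | votes], @max_votes))

      # RPC/request-reply: answer with the most frequent recent vote.
      {:hypothesis?, from} ->
        send(from, {:hypothesis, mode(votes)})
        loop(votes)
    end
  end

  defp mode([]), do: :unknown
  defp mode(votes),
    do: votes |> Enum.frequencies() |> Enum.max_by(&elem(&1, 1)) |> elem(0)
end

# Main: start 3 ticking sensors and one LM, then query the LM periodically.
sensors = Enum.map([300, 500, 700], &Sensor.start/1)
lm = LM.start(sensors)

for _ <- 1..3 do
  Process.sleep(1_000)
  send(lm, {:hypothesis?, self()})
  receive do
    {:hypothesis, h} -> IO.puts("current hypothesis: #{inspect(h)}")
  end
end
```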

@naramore, on modeling time: I think time is a “fact of life” and could be modeled in such an approach (e.g. via the mere fact of sending and receiving messages, timeouts, or scheduled sending), but not necessarily explicitly. A neuron/column/module doesn’t have to know how much time its peers took to take a measurement or make a computation. In a real-time system, a timestamp might be needed to e.g. discard outdated data past some deadline, but I wouldn’t do that prematurely.

1 Like

I completely agree that avoiding the direct modelling of time would be preferable when talking about a more concurrent, actor-based model of Monty/TBP.

The main reason I was bringing it up in the first place was that, at the time, I didn’t know how necessary (or not) it would be with the current implementations of the Evidence-Based Learning Module.

I’ve had more time to look at the code today, tracing the execution path from run to everywhere it touches, and my preliminary stance is that time is currently implicitly encoded into the execution via epochs, episodes, and, most importantly, steps.

Pseudo-code outline below:

run()
  experiment.train()
    foreach epoch
      foreach episode
        foreach (observation, step)
          MontyBase.step(observation)
            Monty._matching_step(observation)
              prepare_inputs
              handle_matching   // multi-threaded
              generate_next_goal_state
              vote    // multi-threaded
              pass_goal_states
              pass_info_to_motor_systems
              set_step_type_and_check_if_done

Monty (the experiment) is the synchronizer of the current algorithm: it orchestrates matching, voting, and goal-state generation to all happen at the same time for all modules in the experiment.

In order to translate this to a more Erlang-y system, using GenServers for learning modules, sensor modules, motor systems, and the environment, we will need to understand where/if synchronization (between the modules) is required.

For example, a Sensor Module “Server”

  • receives raw sensory data from the environment
    • and sends it along to linked Motor Systems (subcortical connections)
    • and converts the raw sensory data to the Cortical Messaging Protocol and sends it to all the upstream connected LMs

But is the Sensor Module Server periodically requesting sensory data from the environment and processing the response? Is the Environment continually pushing sensory data to all “subscribed” SMs? Is there backpressure from upstream LMs to the SMs (in case the LMs are slower for whatever reason)?

These are just some of the questions that come to mind when considering how a real-time SM would behave. When thinking about real-time LMs, it becomes even more complicated because they have a greater volume and variety of connections (and implied communications).
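As a strawman for those questions, here is a minimal push-based SM sketch (hypothetical names throughout; it assumes the environment pushes raw data, and it sidesteps backpressure entirely; demand-driven libraries like GenStage exist if that becomes a problem):

```elixir
defmodule SensorModule do
  use GenServer

  # lms/motors are lists of PIDs of connected learning modules/motor systems.
  def start_link(lms: lms, motors: motors),
    do: GenServer.start_link(__MODULE__, %{lms: lms, motors: motors})

  @impl true
  def init(state), do: {:ok, state}

  # The environment pushes raw sensory data to the SM...
  @impl true
  def handle_cast({:raw, data}, %{lms: lms, motors: motors} = state) do
    # ...which forwards it to linked motor systems (subcortical connection)
    Enum.each(motors, &GenServer.cast(&1, {:raw, data}))
    # ...and converts it to a CMP message for all upstream connected LMs.
    cmp = to_cmp(data)
    Enum.each(lms, &GenServer.cast(&1, {:cmp, cmp}))
    {:noreply, state}
  end

  # Placeholder for the raw-data -> Cortical Messaging Protocol conversion.
  defp to_cmp(data), do: %{features: data, pose: nil}
end
```

A pull-based variant would instead have the SM tick and GenServer.call the environment for a reading; which of the two fits Monty better is exactly the open question above.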

Anyway, hopefully some of what I’ve outlined here makes sense to you all!

1 Like

Check out the link in my previous message :smiley:. Perhaps you could fork it and demonstrate passing a message down to a motor module at a different rate than the incoming sensor messages, e.g. every 5th incoming vote.

P.S. valuable pseudo-code, btw. Thanks!

1 Like

Thanks for the code!

I played around with it and altered it (quite a bit) to better support multiple LMs.

I made some assumptions here with regard to voting, bias, etc. The core assumption is this: the bias/votes being slightly out of sync/stale is not catastrophic.

A lot more work would need to be done to prove that is true, though (i.e. a less toy-like example, closer to what the Python code is capable of).

Anyway, I’d love to hear all your thoughts / questions on my code!

1 Like

Ideone looks very cool for quick and easy code sharing, but I suspect that it may not be the right platform for more substantial tasks. What about transforming these example(s) into interactive Elixir notebooks, using LiveBook?

This would let folks add Markdown documentation, hide some complexity, etc. Also, if paired with GitHub, it could provide reliable support for distributed development, version control, etc.

-r

2 Likes

@vclay - Could you provide some guidance on how the Python version of Monty would “like” to exchange messages with (say) an Elixir version? Here is a first try at a description; please amend (e.g., correct, expand) as needed…

Various Monty implementations should be able to exchange messages with each other, using:

  • data format: JavaScript Object Notation (JSON)
  • data structure: Cortical Messaging Protocol (CMP)
  • discovery: Domain Name System (DNS) or UDP?
  • protocol: User Datagram Protocol (UDP)
  • request type: Elixir/Erlang “cast”

By way of background, each lightweight BEAM (e.g., Elixir, Erlang) process has a locally unique ID (e.g., #PID<0.123.0>). This can be made globally unique by specifying the “node” (BEAM instance) and processor. That said, it looks like the Peerage module could be used to map BEAM addressing into DNS and/or UDP discovery, making life easier for Python (etc) code.
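Under those choices, the sending side of a “cast” could be very small. A hypothetical sketch (assumes the Jason library for JSON encoding; the message shape is made up):

```elixir
defmodule CMPCast do
  # One-way "cast" of a CMP-style message as JSON over UDP (no reply expected).
  def send_cast(host, port, cmp_message) do
    {:ok, socket} = :gen_udp.open(0, [:binary])  # 0 = ephemeral local port
    payload = Jason.encode!(cmp_message)         # assumes the Jason JSON library
    :ok = :gen_udp.send(socket, host, port, payload)
    :gen_udp.close(socket)
  end
end

# Usage: CMPCast.send_cast(~c"127.0.0.1", 4040, %{type: "vote", object: "mug"})
```

Discovery and request-type semantics (retries, ordering) would still need to be pinned down in the RFC, since UDP provides neither.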

@Rich_Morin surely, demos with Livebook would be a much fancier and perhaps more readable option. Only, I think there are currently no free Livebook hosting options, although a GitHub repo with proper instructions should do. Good idea for further demos. I only used ideone for quick iteration and as a quick intro for anyone looking into Elixir in the TBP context.

I’m a bit torn on the various aspects the three of us (@naramore, @Rich_Morin, and myself) are tackling. All of these are definitely of value for a future implementation in TBP, or alongside it, based on the same principles. There are lots of sub-problems to be solved, and perhaps lots of proofs of concept to be written for each sub-aspect.

Perhaps, we could move towards the RFC idea, with each RFC being a solution to a specific and valid problem within the TBP context.

In the context of TBP, these seem to be, e.g.:

  • In Python, concurrency and parallelism are late add-ons and thus don’t provide the robustness and the horizontal and vertical scalability needed for modules communicating via the Cortical Messaging Protocol → use lightweight BEAM (Elixir/Erlang) processes as the units of concurrency and parallelism
  • In Python, concurrency and distribution are two separate concepts that require different approaches, which adds to implementation complexity → use OTP (Erlang/Elixir) clustering to unify messaging between processes on one machine and in a cluster
  • Too many infrastructural dependencies may be required to run a distributed TBP system with scheduling, persistence and messaging → use OTP (Erlang/Elixir) built-in mechanisms for scheduling, distribution, storage, robustness and messaging instead of re-implementing Erlang with dependencies

Now, here comes the potentially tricky part. The more I read the current implementation, the more I see that, without a dedicated team (or an individual with free time), the switch might not quite fit into the tight TBP roadmap. E.g. recently @brainwaves mentioned that scaling out is not a current priority. Thus, a couple of hybrid options for RFCs:

  • switching to Elixir will sink too much time of the small TBP team into technology, instead of basic research → incrementally introduce isolated concurrent processes as Actors in Monty in Python, creating a blueprint for a future concurrency-native solution, e.g. in Elixir.
  • switching to Elixir to test physical parallelism of e.g. robots or sensors or off-loaded learning modules would require too much rewriting up-front → treat concurrent and distributed parts of Monty as Actors, and let them communicate with each other using asynchronous messages transparently (not hiding the messaging via language-level RPC)
  • some Python modules or dependencies will not be easily portable → leave the implementation option open by using the Actor Model, e.g. connecting via a common protocol (either Erlang Clustering via e.g. pyrlang) or something more neutral, e.g. brokerless zeromq.

All of these can definitely be refined and shortened into impactful RFC titles. I’m sorry, I won’t be able to do much in the coming days. Feel free to take these on.

As for the method, the inspiration is Simplicity-Oriented Design by the late Pieter Hintjens. One of the things I have learned to do in situations like these is to delay concrete decisions “until the last responsible moment”. Priorities might shift, and when they do, our investment of time & effort should not feel like a loss but rather like learnings gained.

1 Like

I’m sorry that I won’t be able to dive too deep into the TBP aspects of your example, at least in the coming days. At a brief look, it seems like you’re trying to model a Monty experiment, but in Elixir. I’d agree with @Rich_Morin that this might need some more structure or explanation (e.g. in comments/Markdown/LiveBook) for easier consumption, perhaps with cross-references to Monty source code.

Perhaps this is a worthy task (I unfortunately won’t have time to try it for now): to recreate a real Monty experiment, with a minimal module configuration, perhaps with pre-trained models or pre-recorded observations? (I forgot what this debugging aid was called in one of the videos.)

By the way, found a good slide by @vclay on the current parallelization, related to the pseudocode you posted earlier: https://youtu.be/yJBhZkkZ-XM?si=ka3pAAtbu4-22Syk&t=2276

2 Likes

Clearly, increasing processing throughput would be a large part of the motivation for creating Elixir-based subsystem(s) for Monty. And, if and when Monty systems go into production use, processing throughput will be a Big Issue for some tasks.

However, it seems reasonable to ask whether current and near-term development efforts are being held up by the slow speed of the pure Python implementation. So, here are some questions:

  • How much time is spent running test suites?
  • Is any desirable testing being left undone?
  • Would 10x faster make a useful difference?
  • What about 100x or 1000x?

-r

Hi @Rich_Morin, those are great questions. Right now, speed is not our main bottleneck. Of course, it is always nice to be able to run experiments faster and more efficiently, but it is not a big enough pain point that we focus on it ourselves at the moment. The very first implementation was much slower, so we did some optimizations (summarized in the video @DLed linked), but now we have the system at a speed that allows us to run all the experiments we need in a reasonable time.
Most of our experiments actually just use one Learning Module, as we are still working on the basic capabilities that each LM needs to have. For specific capabilities like voting, hierarchy, and compositional policies, we will need to run experiments with more LMs, but there, too, we start with minimal setups of 2-5 LMs.
At the moment, I would not say the code is in a state where it can be meaningfully scaled to 100s or 1000s of LMs. Our internal research team is focusing on fundamental capabilities of each LM, such as dealing with multi-object environments, modeling compositional objects, and model-based policies to manipulate the world, which will all be required for applications where you may want more than a few LMs.

That said, I don’t want to discourage anyone from looking into speeding up the code further since, at the very least, it may give us interesting insights about what types of communication and modeling can be implemented efficiently on today’s hardware (vs. in the brain). Also, it would surely be nice to have our test suites run faster (exact current runtimes are in our tables here: Benchmark Experiments), but since most of them use one LM (at most five), the potential gains there are limited. But I also want to be clear that we are currently focused on basic research and quick experimentation, so any changes that are directly introduced to the tbp.monty repository should increase ease of use/customization/experimentation for our researchers rather than prematurely optimize the algorithm as it is being developed.

4 Likes

Thanks for the clarification, @vclay. It seems like the best plan for the Elixir enthusiasts in the crowd is to work on making integration possible, but try to avoid getting in the way of the Python-based basic research.

To that end, I’m making a start on a “CMP Interop” RFC that tries to lay out the standards and tooling that would be needed to support convenient and flexible cross-language interoperability. Some of the open questions are laid out in Elixir-based subsystem(s) for Monty? - #32 by Rich_Morin; if anyone can send me some clues, it would be really helpful (and appreciated).

Meanwhile, I have some related thoughts about the benefits of massive concurrency, etc. First, there’s the notion that “quantity has a quality of its own”, somewhat echoed in The Unreasonable Effectiveness of Data. That is, a massive scale-up could cause differences of kind in the results obtained. So, making that possibility available seems worthwhile.

Also, Elixir’s tooling for monitoring the behavior of large sets of processes could prove very useful in tracking and analyzing the dynamic behavior of large-scale Monty implementations. (It’s a bit like the way that some biology researchers are moving on from the genome to the proteome.)

-r

3 Likes

Its distributability would allow platforms where Monty isn’t happy to run because of Habitat lock-in (such as Windows & Raspberry Pi Raspbian) to make use of their unique sensors & environments. Hopefully, CMP Interop would allow the sharing of Monty learnings.

1 Like

I’m not very familiar with Elixir, but I’m using Akka (in Scala) for the actor model: GitHub - micseydel/tinker-casting

A similar project, Jido, does use Elixir: Jido - A SDK for Building Autonomous Agent Systems - #5 by jswny - Libraries - Elixir Programming Language Forum

1 Like