Software architecture for neural voting

Hi all,

I’ve been observing Numenta since “On Intelligence” years ago, and have only slowly immersed myself in the theory and NuPIC.

One idea that I think could add to the success of modelling a highly distributed and concurrent biological system in software is a software system that itself has these characteristics.
I remember seeing the discussions on how to scale/distribute NuPIC and thinking that this should not be done as an afterthought.

Coincidentally, there’s one run-time (and probably only one) that has been designed to support highly concurrent and distributed software systems: the BEAM (supporting e.g. Erlang and Elixir as languages, with OTP as its base library). Since sending and receiving messages is part of the run-time, and the scheduler does a good job of reasonably fair, near real-time scheduling of the processes (actors), this could be a perfect fit for the Thousand Brains Project. Depending on the hardware, millions of processes running concurrently are conceivable. If native or hardware-accelerated performance is needed, one can implement those parts in native code and keep the rest on the BEAM, which also has a JIT that e.g. improves string handling significantly.

Another aspect beyond distribution and concurrency is the temporal one, e.g. interrupting unnecessary computations. This is safe in languages running on the BEAM thanks to the share-(almost)-nothing architecture. A scenario I have been playing with in my mind: consider some form of (column?) voting in real time, where a deadline or some other signal tells the participants to stop voting and pass on the result. The architecture supports this really well.
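To make the idea concrete, here is a minimal, self-contained sketch (not the implementation from the repo linked below; the module name, the random delays and the trivial averaging “vote” are made up for illustration). It spawns many lightweight voter processes and aggregates whatever has arrived once the deadline fires; late votes are simply ignored:

# minimal sketch: n voters, one aggregator, a hard deadline
defmodule DeadlineVote do
  def run(n, deadline_ms) do
    parent = self()

    # each "column" votes after a random delay, simulating unequal amounts of work
    Enum.each(1..n, fn i ->
      spawn(fn ->
        Process.sleep(:rand.uniform(2 * deadline_ms))
        send(parent, {:vote, i, :rand.uniform(100)})
      end)
    end)

    deadline = System.monotonic_time(:millisecond) + deadline_ms
    collect(deadline, [])
  end

  # gather votes until the absolute deadline; whatever hasn't arrived is ignored
  defp collect(deadline, acc) do
    remaining = max(deadline - System.monotonic_time(:millisecond), 0)

    receive do
      {:vote, _from, value} -> collect(deadline, [value | acc])
    after
      remaining ->
        %{reason: :deadline, inputs_used: length(acc), value: mean(acc)}
    end
  end

  defp mean([]), do: nil
  defp mean(votes), do: Enum.sum(votes) / length(votes)
end

# DeadlineVote.run(1_000, 100) |> IO.inspect()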

I haven’t implemented any of the actual voting or prediction mechanisms but simulated the voting here: GitHub - d-led/elixir_ne: a neural voting experiment

P.S. I have no business affiliation with Erlang; it’s just the other focus of my interests.
P.P.S. the tech is in some way a “secret and effective weapon”: Why WhatsApp Only Needs 50 Engineers for Its 900M Users.

9 Likes

more on the run-time: The Soul of Erlang and Elixir • Sasa Juric • GOTO 2019

Upd1:

this particular passage from the IEEE press release is what this tries to address:

The project aims to mimic this neuroscience structure in AI with many cortical column-like units that can each perform a sensorimotor task, such as operate a robotic finger. These units can then communicate with each other using links that are much like the long-range connections seen in the neocortex. Hawkins believes that this modular structure will make his approach easily scalable.

This is perfectly aligned with the Erlang architecture and run-time

2 Likes

Interesting idea,

If one looks at the history of neural networks and what made them successful, it seems like implementing them on GPUs was a fundamental step [1] [2].

Research groups could train a neural network and experiment with big datasets and big models on their own compute clusters (a single GPU, which later became many GPUs). Most importantly, this did not require Google’s scale of compute [3].

When you say

Depending on the hardware, millions of processes running concurrently are thinkable

This is only possible if you consider hardware at warehouse scale. The big downside here is obviously cost. Not many have access to a warehouse-scale computer, but it may take many research groups to crack the code of the Thousand Brains Project.

Even if we consider a single machine with many cores, the main memory bottleneck will hit you hard, limiting your ability to scale [4]. This is especially true for workloads related to the Thousand Brains Project, which exhibit hard-to-predict memory accesses with large strides due to sparsity, and which also require MIMD-style processing because there are many different modules that do not share weights. Realistically, this limits the number of “actors” to scales that are very far from that of, say, a dog’s neocortex, let alone the human neocortex.

But I agree with you that software is a huge issue, and also very exciting. Have you thought about applying your ideas to Processing-in-Memory systems, which may well be capable of scaling to a large number of columns? [4]

[1]: Raina, Rajat, Anand Madhavan, and Andrew Y. Ng. “Large-scale deep unsupervised learning using graphics processors.” Proceedings of the 26th Annual International Conference on Machine Learning. 2009.
[2]: Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. “ImageNet classification with deep convolutional neural networks.” Advances in Neural Information Processing Systems 25 (2012).
[3]: Dean, Jeffrey, et al. “Large scale distributed deep networks.” Advances in Neural Information Processing Systems 25 (2012).
[4]: Mutlu, Onur, et al. “A modern primer on processing in memory.” Emerging Computing: From Devices to Systems: Looking Beyond Moore and Von Neumann. Singapore: Springer Nature Singapore, 2022. 171-243.

1 Like

thanks for your questions and thoughts! I’ll go through them in detail a bit later.

On scalability: a million processes are still feasible on a laptop within a couple of seconds (inputs_used: 713346 means that ~700k messages were received before the deadline):

# demo.ex: n = 1000_000

> # raise the maximum number of BEAM processes so a million actors fit
> export ELIXIR_ERL_OPTS="+P 5000000"
> time mix run                       
starting
Started top level: #PID<0.120.0>
received: {:prediction, %{delay: 107, input_count: 1000000, prediction: [value: 70.83167408697068, reason: :deadline, inputs_used: 713346]}}
stopping
mix run  6.44s user 5.31s system 276% cpu 4.247 total

I see it as a question of whether one has a simple model that lets the same computation run concurrently on a single machine and distributed across a cluster without changing it. Processes within an Erlang cluster are transparently addressable, and networking errors and process crashes can be monitored across the cluster.
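A small sketch of what that transparency looks like (the node names are hypothetical, matching the distributed demo further down; start two iex sessions first, e.g. iex --name a@127.0.0.1 --cookie demo and iex --name b@127.0.0.1 --cookie demo, then run this on node a):

Node.connect(:"b@127.0.0.1")           # join the cluster from node a
:ok = :net_kernel.monitor_nodes(true)  # this process now receives {:nodeup, n} / {:nodedown, n}

# spawn a process on the other node; the returned pid is addressable like any local one
pid =
  Node.spawn(:"b@127.0.0.1", fn ->
    receive do
      {:ping, from} -> send(from, {:pong, node()})
    end
  end)

send(pid, {:ping, self()})

receive do
  {:pong, remote} -> IO.puts("answered from #{remote}")
  {:nodedown, n} -> IO.puts("lost node #{n}")
after
  1_000 -> IO.puts("no answer before the deadline")
end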

Erlang processes are very lightweight, unlike OS threads. GPUs can, of course, be used as well: AI GPU Clusters, From Your Laptop, With Livebook · The Fly Blog

On sparsity and communication overhead: since neural signalling is slower than communication within, e.g., a co-located cluster of computers, I think the highly distributed architecture could work this way, especially with opportunistic communication and deadlines.

Some form of abstraction (columns + SDRs?) will be necessary to make the computations efficient, of course. So the actor/process does not have to represent a single biological neuron - that is likely impossible even with today’s supercomputers. But perhaps that won’t even be needed.

just pushed a distributed version of the experiment:

2 nodes:

Number of neurons: 1000
Neurons started on nodes: %{"a@127.0.0.1": 538, "b@127.0.0.1": 462}
received: {:prediction, %{
  delay: 123,
  input_count: 1000,
  prediction: [
   value: 66.46301708068091,
   reason: :deadline,
   inputs_used: 614
  ]}}

4 nodes:

Number of neurons: 1000
Neurons started on nodes: %{
  "a@127.0.0.1": 225,
  "b@127.0.0.1": 264,
  "c@127.0.0.1": 246,
  "d@127.0.0.1": 265
}
received: {:prediction, %{
  delay: 121,
  input_count: 1000,
  prediction: [
   value: 61.89064556554701,
   reason: :deadline,
   inputs_used: 600
  ]}}

P.S. the scheduling is only a sketch. It doesn’t have to be as chatty as this in real applications.

Depending on the actual algorithms behind the processes, this is scalable vertically or horizontally, up or down. Utilizing sparsity and sizing the computations just right for the available hardware might get one quite far. Lightweight concurrent processes are possible; what’s behind them is flexible.

another idea: Conway’s Game of Life is a sparse system. Here’s one article demonstrating how modelling each cell as a process works out: Simulating Conway's Game of Life with 100.000 Erlang Processes
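In the same spirit, a minimal sketch (not the article’s code; the message names are made up): one cell per process, where each cell waits for its neighbours’ states for the current tick and reports its next state to a coordinator:

defmodule Cell do
  # start one process per cell; alive? is the initial state
  def start(alive?, coordinator) do
    spawn(fn -> loop(alive?, coordinator) end)
  end

  defp loop(alive?, coordinator) do
    receive do
      {:neighbour_states, states} ->
        next = rule(alive?, Enum.count(states, & &1))
        send(coordinator, {:cell_state, self(), next})
        loop(next, coordinator)
    end
  end

  # standard Conway rules: survive with 2-3 live neighbours, be born with exactly 3
  defp rule(true, n) when n in [2, 3], do: true
  defp rule(false, 3), do: true
  defp rule(_, _), do: false
end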

sorry for the many separate messages. A new user can’t post more than 1 link :smile:

It seems the idea has been in the air for more than 10 years: Handbook of Neuroevolution Through Erlang | SpringerLink (gloss over the weights part)

Because of Erlang’s architecture, it perfectly matches that of evolutionary and neurocomputational systems

Coincidentally, the Actor Model was published in 1973 by C. Hewitt et al. as “A Universal Modular ACTOR Formalism for Artificial Intelligence”.

1 Like

Hi @DLed :wave: … you said the magic words “actor model”, so now I am compelled to comment :slight_smile:.

I also think the actor model is compelling for experimenting with scaling and distributing a Monty system. Since Monty is modular, consisting of Sensor Modules and Learning Modules, there are some apparent mappings to try out.

The best actor runtime will probably depend on its ability to handle the computations that the modules require and on the choice of what maps to an actor and what maps to a message. It may also depend on whether we want to restrict the runtime to be a virtual machine running on an existing OS and hardware, or a custom actor OS on custom hardware. Exciting times ahead.
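One purely illustrative mapping, just to make the “what maps to an actor, what maps to a message” question concrete (none of these module or message names are Monty’s actual API): a Sensor Module actor forwards observations as feature-at-pose messages to a Learning Module actor, which accumulates evidence.

defmodule SensorModuleActor do
  use GenServer

  # lm_pid is the Learning Module actor this sensor feeds
  def start_link(lm_pid), do: GenServer.start_link(__MODULE__, lm_pid)
  def init(lm_pid), do: {:ok, lm_pid}

  # a raw observation arrives (e.g. from a robotic finger) and is forwarded
  def handle_cast({:observation, raw}, lm_pid) do
    GenServer.cast(lm_pid, {:feature_at_pose, %{feature: raw, pose: {0.0, 0.0, 0.0}}})
    {:noreply, lm_pid}
  end
end

defmodule LearningModuleActor do
  use GenServer

  def start_link(_opts \\ []), do: GenServer.start_link(__MODULE__, %{evidence: []})
  def init(state), do: {:ok, state}

  # accumulate evidence; a real module would update its model and vote with its peers
  def handle_cast({:feature_at_pose, obs}, state) do
    {:noreply, %{state | evidence: [obs | state.evidence]}}
  end
end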

6 Likes

Neato 1973 MIT paper. The Alan Kay and Seymour Papert references, SMALL TALK (Smalltalk), and the hints of Agent-Based Models were nice. Some AI research tends to rush right past solid work in the chase of shiny baubles. And the literary snippets were fun.

1 Like

Hi @tslominski , hi @scidata ,

great to hear that you see the potential as well. Indeed, the Actor Model has gained a couple of new flavors and implementations recently. Perhaps the most prominent one is the “virtual actors” idea, where the actors’ lifecycle is not controlled explicitly but rather via configuration. For some well-understood problems, I think, this could be a gain; maybe not, however, in cases where the actual computation implementation isn’t stable yet.

One thing the Actor Model allows is abstracting away the implementation of the actor: the only thing other actors see is a transparent reference to which they can address messages. This allows delaying design choices. In Hewitt’s paper one can see that an actor can be anything, even a whole computer waiting for emails and reacting to them.

Why I think there’s potential in Erlang and its ecosystem is that it’s open: one can always implement a computation natively (via the C FFI, in Rust via rustler, or as a C node, e.g. Python via Pyrlang), and the included batteries can be sufficient for a relatively large cluster. The strengths are in fault handling and fault tolerance, and in pre-emptive scheduling, which adds to robustness through the ability to safely abort computations. As Alan Kay says in some paper: representing concurrency in a language must be easy enough for a child to understand, and it definitely is in Erlang.
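For example, the Elixir side of a rustler-backed computation is just a stub module (the module, app and crate names here are made up; the actual maths would live in the Rust crate):

defmodule Monty.NativeMath do
  # loads the NIF from the hypothetical Rust crate in native/native_math
  use Rustler, otp_app: :monty_ex, crate: "native_math"

  # replaced by the native implementation once the NIF is loaded
  def dot(_a, _b), do: :erlang.nif_error(:nif_not_loaded)
end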

With the Thousand Brains Theory, it seems, concurrency takes a central role, as real neurons are physically parallel. So, in the design choices I’d keep the freedom of being able to design concurrent processes at any scale. E.g. within one run-time - the most likely candidate being the BEAM VM, running millions of processes on one machine if necessary - which also scales to a cluster transparently if needed. A level up, a thread in a native language can also be an actor, e.g. bridging to another actor system via low-latency messaging (e.g. zeromq :smile:). Yet another level up is an OS process doing the same, or a whole machine. It goes to planetary scale if necessary (Question message: …? Answer message: 42). The question is which parts are truly concurrent, and designing them to be that from the beginning. This is what got me to try the trivial “voting” experiment with deadlines.

Anyway, good luck! I’ll be lurking around. The endeavor sounds like fun. If you need an outsider look at design choices, I might reflect upon them in spare time.

P.S. some posts seem to have been flagged. I guess if this discussion doesn’t fit into “General”, would a separate category for side discussions make sense?

4 Likes

@DLed You seem to have hit one of the prebuilt flagging systems built into Discourse, where new users who post lots of links get flagged for review. It seems like a good setting; I’ll increase your trust level so it doesn’t happen to you again.

1 Like

FWIW, there are a couple of Elixir subprojects that could be helpful for Monty:

  • The Nx (Numerical Elixir) library supports math-friendly data structures, code generation for GPUs, etc. (a tiny example follows this list).

  • The Nerves Project, designed to support embedded systems, supports a lot of sensor and effector interfaces that Monty might find useful.
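
A tiny Nx example to show the flavor (assumes the nx package as a dependency; backends such as EXLA can compile the same code for GPUs, while the default pure-Elixir backend runs anywhere):

t = Nx.tensor([[1, 0, 1], [0, 1, 0]])
Nx.sum(t, axes: [1])
# => a tensor with the per-row sums [2, 1]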

-r

1 Like

recently spotted as well: axon/guides/serialization/onnx_to_axon.livemd at main · elixir-nx/axon · GitHub

1 Like

FWIW, before I learned about this thread, I posted “Elixir-based subsystem(s) for Monty?”. It suggests, roughly, that a collection of lightweight Elixir processes could be used to handle compute-intensive tasks for Monty.

-r

1 Like

great to see a convergence of ideas!

I’m trying to start an implementation discussion over in Elixir-based subsystem(s) for Monty: what architecture, data formats, and protocols would make the most sense… Feel free to wander over if this sounds interesting.

-r

2 Likes

update: a sequence diagram of the demo

3 Likes

I dug out the constant which inspired me to add the voting deadline:

We propose that the presently attended location is periodically re-assessed (every ~250 ms) to confirm that it is still the most important location

These periodic disruptions in attention-related sampling may have provided our ancestors with an evolutionary advantage, e.g., allowing them to detect and therefore avoid predators while foraging.

Having concentration troubles myself, I remember hearing about that periodic attention disruption and being excited about it. Perhaps I heard it on the Brain Science Podcast, but I can’t remember the episode.

I’m not very familiar with Elixir, but I’m using Akka (in Scala) for the actor model: GitHub - micseydel/tinker-casting

A similar project, Jido, does use Elixir: Jido - A SDK for Building Autonomous Agent Systems - #5 by jswny - Libraries - Elixir Programming Language Forum