Does Monty implement a biological mechanism for strengthening synaptic connections between neurons?

In the biological brain, the more frequently two neurons interact, the stronger their connection becomes — a process that involves multiple mechanisms such as changes in neurotransmitter release, receptor sensitivity, and synaptic morphology.

Does Monty implement an equivalent mechanism for synaptic strengthening?

  • If so, how is this achieved computationally?
  • If not, what is the reasoning behind omitting it?

:folded_hands: :eyes: :brain:

1 Like

Hey there @srgg6701 ,

It sounds like you’re referring to the idea of Hebbian learning (where neurons that fire together, wire together). To my understanding, yes, TBP can be considered ‘Hebbian-like’. @vclay had actually answered a question very similar to this some weeks back:

4 Likes

@HumbleTraveller, thanks! :folded_hands:
Let me clarify what I meant.
I wasn’t really asking about Hebbian learning, but rather about the biological analogue of plasticity — when the strengthening of connections between neurons involves structural and physiological changes: an increase in receptor density, changes in synaptic morphology, local biochemistry, and so on.

My question is more about whether Monty implements any mechanism for dynamically changing the strength of connections between components (Learning Modules, Sensor Modules, etc.) that could be considered a computational equivalent of biological synaptic plasticity, or whether the system relies only on statistical updates of probabilities and connections, without a separate “plasticity layer”?

1 Like

Hi @srgg6701

We currently don’t have any mechanism to dynamically update the connections between entire components. The connectivity between sensor modules and learning modules is fixed in the sm_to_lm_matrix, lm_to_lm_matrix, and lm_to_lm_vote_matrix. In the future we imagine this connectivity to be learnable and even dynamically changing through attention, but we are currently not working on implementing this as we are still more focused on learning within learning modules.
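
If it helps to make that concrete, here is a toy sketch of the idea (my own illustration, not the exact config API, and the real attribute formats may differ): the wiring is a set of static index lists defined once at setup and never changed at runtime.

```python
# Illustrative sketch only, not the actual Monty config classes. The point is
# that component-to-component wiring is specified up front and stays fixed.
sm_to_lm_matrix = [[0], [1]]       # LM 0 receives input from SM 0, LM 1 from SM 1
lm_to_lm_matrix = [[], [0]]        # LM 1 additionally receives the output of LM 0
lm_to_lm_vote_matrix = [[1], [0]]  # LMs 0 and 1 exchange votes with each other

def inputs_for_lm(lm_id):
    """Which sensor modules feed a given learning module (fixed at setup time)."""
    return sm_to_lm_matrix[lm_id]
```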

I am not 100% sure what you mean with

whether the system relies only on statistical updates of probabilities and connections, without a separate “plasticity layer”

so I am sorry if my answer doesn’t address your question completely. As I mentioned above, there is currently no plasticity in the connections between entire Monty components. However, there is Hebbian-like learning within learning modules to associatively pair features to locations. This associative learning can be very fast but can also be influenced by more global constraints on object models (fully implemented but not used or described in our recent publication), making only the most frequently observed features at locations form permanent connections that are used during inference (see our documentation on object models for more details). We don’t simulate spikes or model molecular-level events like neurotransmitter release but instead focus on higher-level mechanisms and principles.
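
As a purely illustrative sketch of that frequency-gated association principle (a toy example, not code from our actual implementation), the idea is that any feature-at-location pairing is stored quickly, but only pairings seen often enough become permanent and usable during inference:

```python
from collections import defaultdict

class AssociativeLocationMemory:
    """Toy sketch: pair observed features with locations, and only let
    frequently co-observed pairs become permanent."""

    def __init__(self, permanence_threshold=3):
        self.counts = defaultdict(int)  # (location, feature) -> observation count
        self.permanence_threshold = permanence_threshold

    def observe(self, location, feature):
        # Fast, Hebbian-like association: every co-occurrence strengthens the pair.
        self.counts[(location, feature)] += 1

    def permanent_pairs(self):
        # Global constraint: only the most frequently observed pairs are kept
        # as permanent connections for use during inference.
        return {pair for pair, count in self.counts.items()
                if count >= self.permanence_threshold}
```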

I hope this helps!

2 Likes

Dear @vclay,
Thank you for your thoughtful reply — it clarified many things.
Still, I’d be happy to get your thoughts on a more specific question. What I’m really interested in are the comparative principles behind Monty’s approach and the biological brain’s approach, in order to understand how exactly the function of intelligence is realized in the former.

As stated on Numenta’s website, the mission of the project is to reverse-engineer the neocortex.
That implies that:

  1. All key functions of the brain responsible for producing intelligence must be understood and reproduced to the extent that we can claim genuine intelligence has been achieved.

  2. Directly copying the biological implementation makes little sense, since Monty operates on a non-biological substrate. What matters is the principle, not the pattern of the process.

It’s in this sense that I’m asking about neuroplasticity — whether there’s any reason to replicate this mechanism in AI systems at all.

As far as I understand, neuroplasticity in the brain is largely conditioned by its biological nature:
the shorter the signal paths between neurons and the wider the transmission “bus,” the fewer resources are required — which matters a lot for such an energy-hungry organ.
However, for a digital “brain,” this probably doesn’t apply, does it?

So my final questions are these:

  1. Does neuroplasticity serve any purpose other than optimizing resource usage through the organization of neural connections?

  2. If it does, what is that purpose — and how is (or could it be) implemented in Monty?

  3. If it doesn’t, does that mean we can safely ignore this mechanism altogether?

I hope I managed to phrase my question clearly — I did my best! :blush:
In any case, thank you again for your time and for the work you’re doing.

I think he might be referring to some kind of mechanistic layer of control, one which helps modulate learning. Something analogous to neuromodulation, or perhaps the brain’s glial systems?

@srgg6701

Just wanted to chime in on your latest response :slight_smile:

The TBP team’s primary focus has been on the neocortex, and they’re just now starting to expand out into things like the thalamus. I suspect they’ll refrain from recreating large portions of the limbic system though, as many of those systems serve as motivational subsystems. To me, it seems like the team wants to outsource goal generation to external control sources (e.g., biological humans). If you’re interested, here’s a conversation we had a while ago (pertaining to incorporating hippocampal-entorhinal complex principles): Hippocampal - Entorhinal Complex - #3 by nleadholm

In my opinion, one of the things which most sets Monty apart is its energy efficiency (compared to a conventional LLM). I’d argue that this is just as important a topic for digital brains as it is for biological ones, especially when we factor in things like robotics.

Re. Neuroplasticity analog in Monty’s codebase

I’m curious to know @vclay’s thoughts too, but here is how I’ve come to view it: Each learning module within a Monty instance maintains a graph-based memory of sensory features and spatial relationships. In a way, these kind of act like cortical microcircuits.

For instance, Nodes might be considered analogous to ‘neurons’ in that they can represent surface points, curvatures, object parts, et cetera.

The Edges found between those Nodes act kind of like synaptic associations. They represent the spatial displacements and co-activation relations between shared features.

Edge weights are then adjusted when the system updates evidence for or against a given hypothesis. So when new sensory data arrive, these graphs can be expanded, pruned, or strengthened depending on the statistical consistency between those inputs. Not dissimilar from principles of neuroplasticity.
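
To make that analogy a bit more tangible, here is a rough toy sketch of how I picture it (my own illustration, not code from the repository): nodes are observed features, edges carry displacements, and edge weights gain or lose evidence depending on consistency.

```python
import networkx as nx

# My own illustration of the analogy above, not code from the Monty repository.
model = nx.Graph()

def observe_pair(feature_a, feature_b, displacement, consistent=True):
    """Strengthen or weaken the 'synapse-like' edge between two observed features."""
    if not model.has_edge(feature_a, feature_b):
        model.add_edge(feature_a, feature_b, weight=0.0, displacement=displacement)
    model[feature_a][feature_b]["weight"] += 1.0 if consistent else -1.0

def prune(threshold=0.0):
    """Drop edges whose accumulated evidence has fallen below the threshold."""
    weak = [(u, v) for u, v, w in model.edges(data="weight") if w < threshold]
    model.remove_edges_from(weak)
```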

I’m not sure if they plan on incorporating a form of neurogenesis. But I don’t believe they have that implemented yet.

1 Like

As far as I understand, neuroplasticity in the brain is largely conditioned by its biological nature:
the shorter the signal paths between neurons and the wider the transmission “bus,” the fewer resources are required — which matters a lot for such an energy-hungry organ.
However, for a digital “brain,” this probably doesn’t apply, does it?

Leaving biology aside for a moment, I’d like to talk about some issues that a full-scale Monty implementation might face. Since I’m a fan of Elixir and Erlang, I may lean on them a bit, but any Actor-based implementation would have similar issues. That said…

The number of edges (i.e., connections) in a fully connected graph goes up very quickly as the number of nodes increases. Considering only undirected edges, with no self-loops, the formula is “N * (N - 1) / 2”. Here’s a table of sample values:

    Nodes            Edges
    1,000          499,500
   10,000       49,995,000
  100,000    4,999,950,000
1,000,000  499,999,500,000
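
For anyone who wants to check or extend the table, it’s a two-liner:

```python
for n in (1_000, 10_000, 100_000, 1_000_000):
    print(f"{n:>9,} nodes -> {n * (n - 1) // 2:>15,} edges")  # undirected, no self-loops
```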

Each of Monty’s Learning Modules (LMs) mimics a cortical column (CC). So, a Monty instance that emulates 100K CCs might have about five billion edges. If all of the LMs try to broadcast messages to each other at the same time, the number of message deliveries (and resulting process activations) would swamp any Erlang system on a single processing node. Nor does using a distributed system solve the problem; indeed, it would probably make things worse (because inter-node communication is comparatively slow).

However, even this greatly understates the problem. Because each CC has several levels and a huge number of neurons, LMs might need some sort of sub-module addressing (to handle the semantics of sending messages to particular parts of a target CC). In short, the numbers could easily be far worse.

So, Monty will clearly need to minimize unnecessary connectivity, message traffic, etc. In a biological system, a lot of this is handled by proximity: CCs tend to chat mostly with their neighbors. Longer-range connections are then established as needed.

FWIW, I’ve been thinking that Monty could use URI-style addressing as a way to finesse the issues of long-range connections, proximity, sub-LM targets, etc. Obviously, this isn’t closely modeled on biology, but it’s a well understood and reasonably human-friendly approach…
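
Just to make the idea tangible, here’s a completely made-up example of what such an address might look like (the scheme, host, and path segments are all invented):

```python
from urllib.parse import urlparse

# Hypothetical address: layer L4 of learning module 17, hosted on node "vision-0".
addr = urlparse("monty://vision-0/lm/17/L4")
node = addr.netloc                                         # "vision-0" (which processing node)
kind, lm_id, sub_target = addr.path.strip("/").split("/")  # "lm", "17", "L4"
```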

2 Likes

Dear @Rich_Morin,
It seems to me that your message helps us move precisely in the direction I’m interested in.
If the number of edges indeed increases exponentially with the number of nodes (as I understood it), this gives an important insight into the possible role of “neuroplasticity” in Monty.
In this case, that function clearly makes sense and, in essence, serves the same purpose — although the mechanism of its implementation would, of course, be non-biological.

That’s my conclusion. I’m looking forward to seeing this topic develop further.
Once again, thank you for the valuable insight! :pink_heart:

1 Like

I wouldn’t use the term “exponentially”, because N (the number of actively communicating nodes) isn’t in the exponent of the formula. Perhaps “quadratically” is a better (if not perfect) fit.

In any event, we clearly need to keep all of the LMs from trying to communicate with each other. So, here’s some complete Sci-Fi…

Start by using Jeff’s flattened (“dinner napkin”) visualization of the cortex. Separate this into layers (e.g., L1 … L6), then add layers for supporting actors. Borrowing from my “Mermaid Musings” thread, we might get something like this:

MH    Motor Hardware  (e.g., electric motor)
MD    Motor Drivers   (handle output geometry, etc.)
MM    Motor Modules   (map from CMP into geometry)
LM    (in L1 ... L6)  (model known cortical layers)
SM    Sensor Modules  (map from geometry into CMP)
SD    Sensor Drivers  (handle input geometry, etc.)
SH    Sensor Hardware (e.g., RGBD digital camera)

To bootstrap the system, we’d supply the actors with hints about default connectivity and activity, such as:

  • Each SM should converse with the “relevant” SDs, getting data for a small region around a particular part of the digitized image.
  • Each LM in L1 … L6 should request and receive data from the relevant SMs, as well as “neighboring” LMs.

Then, while the system is operating, we’d (somehow) add more hints concerning long-range connections. (ducks)
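
Here’s an entirely speculative sketch of what those default-connectivity hints might look like (the grid layout, radius, and function name are all invented for illustration):

```python
# Entirely speculative: suggest initial LM-to-LM connections based on grid proximity.
def default_lm_neighbors(grid_width, grid_height, radius=1):
    def neighbors(x, y):
        for dx in range(-radius, radius + 1):
            for dy in range(-radius, radius + 1):
                nx, ny = x + dx, y + dy
                if (dx, dy) != (0, 0) and 0 <= nx < grid_width and 0 <= ny < grid_height:
                    yield ny * grid_width + nx
    return {y * grid_width + x: sorted(neighbors(x, y))
            for y in range(grid_height) for x in range(grid_width)}

hints = default_lm_neighbors(4, 4)  # e.g. hints[5] == [0, 1, 2, 4, 6, 8, 9, 10]
```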

2 Likes

Nice, thanks @Rich_Morin and @HumbleTraveller for chiming in here. There is little that I have to add to what you already said.

As @HumbleTraveller pointed out, even though Monty could consume more energy than a brain, we would likely still not want it to if we can avoid it. And as @Rich_Morin pointed out, having hard-coded all-to-all connectivity quickly gets out of hand. Therefore, connectivity between learning modules will likely be quite sparse and ideally learned. It can be initialized in an informed way (e.g. based on neighborhood relationships) but it is likely important to keep it flexible and learnable as well. As we haven’t really scaled Monty to more than 16 LMs yet, I can’t make any claims for certain, but I would actually expect it to be beneficial to move away from the all-to-all, hardcoded connectivity.

Another topic that I think @Rich_Morin might be touching on when talking about “requesting” information is attention. So not only having sparse connectivity between LMs but, in addition, filtering the incoming information further to only attend to a small part of it. While the connectivity would be more like whether a physical connection is present, attention could modulate the signals on those connections. Again, we have not introduced the concept of attention into Monty yet, so this is all still under active discussion in our research team. But even though Monty, as a super-human system, could theoretically pay attention to everything at all times, it seems like this might actually not be beneficial (at least for learning).
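
To illustrate the distinction, here is a small hypothetical sketch (nothing like this exists in Monty today): the connectivity matrix says whether a link is physically present, while attention scales how strongly each existing link is used on a given step.

```python
import numpy as np

# Hypothetical sketch only: connectivity = whether a link exists at all,
# attention = how strongly each existing link is used on this step.
n_lms = 4
connectivity = np.array([[0, 1, 0, 0],
                         [1, 0, 1, 0],
                         [0, 1, 0, 1],
                         [0, 0, 1, 0]], dtype=float)  # sparse, neighbor-only links

attention = np.full((n_lms, n_lms), 0.1)  # weak baseline attention everywhere
attention[2, 1] = 1.0                     # LM 2 currently attends strongly to LM 1

incoming_votes = np.random.rand(n_lms, n_lms)  # vote strength from LM j to LM i
effective_input = incoming_votes * connectivity * attention
```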

2 Likes

I know very little about actor implementations in general, but I have some understanding of BEAM-based systems (e.g., Elixir, Erlang). So, I’ll take a swing at this analogy, in case it helps…

Message Delivery and Dispatching

Any (lightweight) process running on a BEAM instance can send a message to any other process (or indeed, broadcast to a collection of processes). The receiving process may be just about anywhere, as long as its address (i.e., node name, PID) can be resolved and connectivity is available. For example, it could be running in:

  • the same BEAM instance (i.e., Elixir runtime)
  • another BEAM instance on the same processor
  • another BEAM instance on another processor

That said, there are no guarantees about delivery, let alone timing. The system simply makes a “best effort” to deliver the message to the recipient’s incoming “mailbox”.

Complicating matters somewhat, the recipient isn’t required to accept messages in the order they were delivered. Instead, it can set up a series of dispatching patterns which control the order in which incoming messages are dispatched.

Roughly speaking, the BEAM applies the recipient’s patterns in a highly optimized manner. It scans the recipient’s incoming mailbox, using the first dispatching pattern. If that fails, it falls back to the second pattern, etc. If and when a match is found, the designated function is called to handle the message contents.

FWIW, here is a link to a ChatGPT summary.

Caveat: If the number of incoming messages exceeds the recipients’ ability to process them, their mailboxes will grow in size. And, if there is no matching pattern, a message could remain in a mailbox forever. All of this can lead to memory and/or processing issues.

Sub-Module Addressing, Dispatching, Filtering, etc.

Diagrams of cortical columns show several levels and sometimes a lot of internal structure (e.g., many different types of neurons). So, using a single address for (say) a Learning Module could get in the way of efficient dispatching, as well as complicating the LM’s design.

In an Elixir implementation, this might be handled in various ways. Typically, messages would contain one or more symbols (e.g., :L1) which the modules’ dispatching patterns could use for matching.

Alternatively, because BEAM processes are pretty cheap, the LM could be divided into pieces and a “front end” process could dispatch, edit, and/or filter (i.e., modulate?) the incoming messages as needed.

Finally, note that there is nothing keeping a process from sending messages to itself. So, it might handle :L1_foo messages by sending related messages with other symbols (e.g., :L2_bar).

Note: I have no clue about how a Python implementation would handle this sort of thing. Comments and clues welcome…

Connectivity, Attention, etc.

Let’s compare neural and BEAM-based connectivity and attention a bit (corrections welcome!) …

  • Neural connectivity is limited by “wiring” constraints (i.e., axons, dendrites, synapses). In contrast, the BEAM allows any process to send messages to any other process.

  • In neural connectivity, the interpretation of a “message” is defined by the wiring of the sending and/or receiving neuron(s). The BEAM’s message dispatching is based on addresses and internal message cues.

  • Neural message timing is controlled by physical issues; a BEAM-based emulation would have to create its own timing support.

  • Neural attention is controlled by the nature of the message, the state of the receiving neuron(s), the synapses involved, etc. A BEAM-based Monty implementation would have to emulate all of this using addresses, message cues, and the state of the receiving process(es).

Although LMs could be told to “pay attention” to specified modules and/or sensor regions, this could force the BEAM to carry a lot of unwanted messages. A better approach might use something like the publish–subscribe (aka PubSub) pattern.
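
For completeness, here’s a minimal Python analogue of that PubSub idea (just an illustration of the pattern, not a proposal for Monty’s actual messaging layer): each LM subscribes to the topics it cares about, so the broker only carries messages that somebody asked for.

```python
from collections import defaultdict

class Broker:
    """Minimal publish-subscribe sketch: modules subscribe to topics
    instead of receiving every message in the system."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        for callback in self.subscribers[topic]:
            callback(message)

broker = Broker()
broker.subscribe("votes/region_A", lambda msg: print("LM 7 got:", msg))
broker.publish("votes/region_A", {"object": "mug", "evidence": 0.83})
broker.publish("votes/region_B", {"object": "bowl", "evidence": 0.41})  # no subscribers; dropped
```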