To start, this is meant to be a bit tongue-in-cheek. More fun dialog than anything serious. That said, what sets Monty apart from other forms of AI?
I watched the video comparing Monty to the transformer architecture, and while I have opinions on that, it's not the kind of comparison I find particularly interesting. I want to know how Monty compares to other neurologically inspired architectures.
For instance, how is Monty an improvement over something like convolutional nets? Or even better, how does Monty compare to something like Jürgen Schmidhuber's Neural History Compressor?
For those unfamiliar with the architecture, here is a description from Jürgen himself:
What is a Neural History Compressor?
"It uses unsupervised/self-supervised learning and predictive coding in a deep hierarchy of recurrent neural networks (RNNs) to find compact internal representations of long sequences of data, across multiple time scales and levels of abstraction. Each RNN tries to solve the pretext task of predicting its next input, sending only unexpected inputs to the next RNN above. This greatly facilitates downstream supervised deep learning such as sequence classification. By 1993, the approach solved problems of depth 1000 (requiring 1000 subsequent computational stages/layers; the more such stages, the deeper the learning). A variant collapses the hierarchy into a single deep net. It uses a so-called conscious chunker RNN which attends to unexpected events that surprise a lower-level so-called subconscious automatiser RNN. The chunker learns to understand the surprising events by predicting them. The automatiser uses my neural knowledge distillation procedure of 1991 [UN0-UN2] to compress and absorb the formerly conscious insights and behaviours of the chunker, thus making them subconscious. The systems of 1991 allowed for much deeper learning than previous methods."
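To make the control flow of that description concrete, here is a toy sketch of the predict-and-forward-surprises mechanism. The real Neural History Compressor uses RNNs trained by gradient descent across multiple time scales; this stand-in replaces each RNN with a trivial frequency-based next-symbol predictor, so it only illustrates the idea of higher levels receiving only unexpected inputs, not the actual architecture.

```python
# Toy sketch of the Neural History Compressor control flow (NOT the real
# architecture): each level predicts its next input and forwards only
# *unexpected* inputs to the level above.
from collections import defaultdict

class Level:
    """Stand-in for one RNN: a frequency-based next-symbol predictor."""
    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))  # prev -> next -> count
        self.prev = None

    def step(self, x):
        """Observe x; return True if x was surprising (mispredicted)."""
        surprised = True
        if self.prev is not None:
            successors = self.counts[self.prev]
            if successors:
                predicted = max(successors, key=successors.get)
                surprised = predicted != x
            self.counts[self.prev][x] += 1  # learn the observed transition
        self.prev = x
        return surprised

def compress(sequence, depth=3):
    """Feed a sequence through the hierarchy; count arrivals at each level."""
    levels = [Level() for _ in range(depth)]
    arrivals = [0] * depth
    for x in sequence:
        for i, level in enumerate(levels):
            arrivals[i] += 1
            if not level.step(x):
                break  # predicted correctly -> nothing propagates upward
    return arrivals

# On a highly regular sequence, the upper levels see far fewer inputs:
print(compress(list("abcabcabcabcabcabd") * 10))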
Is Monty a natural evolutionary "next step" for this type of approach, or is it something more? If it is a paradigm shift, then how so?
Great question! To our knowledge, there is no approach quite like ours out there, and it is a significant paradigm shift from current AI. Did you have a chance to look at our whitepaper? https://arxiv.org/pdf/2412.18354
I would particularly recommend the sections "2.2 Core Principles" and "2.3 Challenging Preconceptions". Section 2.2 outlines the foundational principles of the Thousand Brains Theory, which our system is based on. There aren't many AI systems out there that adhere to even one of these principles, let alone several. Section 2.3 gets a bit more specific on some assumptions that are often made in today's AI which do not apply to our system.
For a more detailed discussion of how Monty compares to specific other approaches, I would recommend having a look at this section in our technical FAQ: FAQ - Monty
The Neural History Compressor specifically seems quite different from Monty in many respects. Some differences that jump out right away are:
- No use of reference frames that can be used for rapid learning, generalization, and path integration.
- Use of a huge number of hierarchical layers (>1000), whereas the brain is much shallower (see for example How deep is the brain? The shallow brain hypothesis - PubMed ). We've thought a lot about this, and pretty much any problem can be modeled with only two levels of hierarchy and a window of attention shifting around within them. If you think about it, you never perceive more than two levels of parent-child relationships at once, and there is certainly no need for >1000.
- Use of backpropagation (see our FAQ on why we don't use backprop in Monty: FAQ - Monty )
- No learning through sensorimotor interaction. Monty is fundamentally a sensorimotor system. Not only that, every individual column/learning module receives motor input and outputs motor signals, as opposed to the classical view of AI in sensorimotor applications, where many layers of sensory processing are followed by the motor processing and output (a toy sketch of this module-level sensorimotor loop follows below).
There are many other differences, like the lack of powerful sub-processing units (learning modules) that can model complete objects, and the lack of the concept of policies, ... the links above go into more detail on those.
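Here is the toy sketch promised above of the point that every learning module is itself sensorimotor. To be clear, this is not the actual Monty API (see the docs/FAQ linked earlier); the class names and message fields are invented for illustration. The point is structural: each module consumes pose-tagged features and emits motor suggestions, rather than sitting in a purely sensory pipeline with motor output bolted on at the end.

```python
# Illustrative only: NOT the real Monty API. A "learning module" that both
# learns from pose-tagged features and emits a motor suggestion each step.
from dataclasses import dataclass

@dataclass
class FeatureMessage:
    """Stand-in for a CMP-style message: a feature observed at a location."""
    feature: str
    location: tuple  # (x, y, z) in the module's reference frame

@dataclass
class MotorSuggestion:
    """Where this module would like its sensor to move next."""
    target_location: tuple

class ToyLearningModule:
    def __init__(self):
        self.model = {}  # location -> feature: a trivial object model

    def step(self, msg: FeatureMessage) -> MotorSuggestion:
        # Learn: store the observed feature at its location.
        self.model[msg.location] = msg.feature
        # Act: suggest moving to a neighboring, not-yet-observed location
        # (a stand-in for a real exploration/inference policy).
        x, y, z = msg.location
        return MotorSuggestion(target_location=(x + 1.0, y, z))

lm = ToyLearningModule()
suggestion = lm.step(FeatureMessage(feature="edge", location=(0.0, 0.0, 0.0)))
print(suggestion, lm.model)
```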
Awesome response. Yes, I have read the whitepaper you guys released. It was very good.
I haven't read up on the shallow brain hypothesis yet, though. Looks like it'll be a fun read!
One of the other things I find interesting about Monty (that you don't get with too many other architectures) is the modularity of the system, namely through the CMP (Cortical Messaging Protocol). Ultimately, I think this will give Monty a kind of adaptability that we don't see much elsewhere.
I remember someone noting that a four-year-old child can typically recognize a displayed image of a dog and push a button in half a second. So, if a neuron takes about 5 ms to function, there can't be more than 100 neurons in the shortest path between the retina and the muscles moving the finger.
Guessing that half of these are on the "input" path and that half of those are sub-cortical, this gives us an upper bound of 25 levels of cortical neurons for this task. A lot more than two, but also a lot less than 1000 (:-).
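Spelling out that back-of-envelope arithmetic (all three numbers, the 500 ms reaction time, the 5 ms per neuron, and the two halvings, are rough guesses from the anecdote, not measurements):

```python
# Back-of-envelope bound on serially-connected cortical neurons.
reaction_time_ms = 500   # ~half a second to recognize the dog and press the button
neuron_latency_ms = 5    # assumed time for one neuron to "function"

serial_neurons = reaction_time_ms // neuron_latency_ms  # 100 neurons in series
input_path = serial_neurons // 2                         # half on the "input" path: 50
cortical_levels = input_path // 2                        # half of those sub-cortical: 25
print(serial_neurons, input_path, cortical_levels)       # 100 50 25
```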
I wouldn't equate a neuron with a hierarchical level. That is maybe the case for deep neural networks, but when I talk about levels in the hierarchy I am talking about V1, V2, V4, ...
Each cortical column contains several thousand neurons and encodes information through a population code. A cortical column itself is made up of several layers (classically described as six layers, but if you look closely you can divide some of them into more sublayers) with intricate wiring between them.

When I talk about levels in a hierarchy, I am also not referring to the layers within a cortical column, but to the different regions in the brain. A hierarchical relationship in neuroscience is classically described as a lower-level column sending output from its layer 3 to layer 4 of the higher-level column. The higher-level column connects back to the lower-level column (top-down connection) from its layer 6 to the lower column's layers 6 and 1. This kind of connectivity and classical definition of hierarchy in the neocortex is for instance described here: https://academic.oup.com/cercor/article-abstract/1/1/1/408896?redirectedFrom=fulltext&login=false (and in the paper I mentioned in my previous response)
Following this definition, there are certainly more than two hierarchical levels of processing in the primate/human neocortex (probably around 4-5, i.e. V1, V2, V4, the posterior inferior temporal cortex (TEO), and the anterior inferior temporal cortex (TE)), but fewer than 25. When I discuss two levels of hierarchy being used at a given time, I'm referring to attention being brought to bear on any two of these ~4-5 levels of hierarchy.
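As a toy way of writing down that classical definition, here is the feedforward/feedback wiring between those ~4-5 levels (the region ordering and the layer rule are taken from the description above; treat this as an illustration, not an anatomical database):

```python
# Classical cortico-cortical hierarchy rule described above: feedforward goes
# from layer 3 of the lower region to layer 4 of the higher one; feedback
# returns from layer 6 of the higher region to layers 6 and 1 of the lower one.
regions = ["V1", "V2", "V4", "TEO", "TE"]  # ~4-5 ventral-stream levels

connections = []
for lower, higher in zip(regions, regions[1:]):
    connections.append((f"{lower}:L3", f"{higher}:L4", "feedforward"))
    connections.append((f"{higher}:L6", f"{lower}:L6", "feedback"))
    connections.append((f"{higher}:L6", f"{lower}:L1", "feedback"))

for src, dst, kind in connections:
    print(f"{src} -> {dst} ({kind})")
```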
Another important thing to point out again is that information doesn't always have to flow through the entire hierarchy of cortical processing before we can begin to generate an action output. Even V1 has projections to subcortical motor regions and can, for instance, directly control saccades of the eyes.
Nice, thanks for sharing that! I'll add another one here (from the shallow brain hypothesis paper I posted in my first reply) which I think better captures the fact that all levels of the hierarchy project to subcortical structures.
Nice! This is a pretty good diagram (still need to read that paper you originally linked).
Also, in regard to your comment about "not equating a neuron with a hierarchical level," I agree. For a while now I've been thinking of cortical columns themselves as the "units" within an ML model, with those units then grouped into the levels you've described (V1, V2, et cetera). Is this an appropriate way to conceptualize them?