To start, this is meant to be a bit tongue-in-cheek. More fun dialog than anything serious. That said, what sets Monty apart from other forms of AI?
I watched the video comparing Monty to the transformer architecture, and while I have opinions on that, it's not the kind of comparison I find particularly interesting. I want to know how Monty compares to other neurologically inspired architectures.
For instance, how is Monty an improvement over something like convolutional nets? Or even better, how does Monty compare to something like Jürgen Schmidhuber's Neural History Compressor?
For those unfamiliar with the architecture, here is a description from Jürgen himself:
What is a Neural History Compressor?
"It uses unsupervised/self-supervised learning and predictive coding in a deep hierarchy of recurrent neural networks (RNNs) to find compact internal representations of long sequences of data, across multiple time scales and levels of abstraction. Each RNN tries to solve the pretext task of predicting its next input, sending only unexpected inputs to the next RNN above. This greatly facilitates downstream supervised deep learning such as sequence classification. By 1993, the approach solved problems of depth 1000 (requiring 1000 subsequent computational stages/layers; the more such stages, the deeper the learning). A variant collapses the hierarchy into a single deep net. It uses a so-called conscious chunker RNN which attends to unexpected events that surprise a lower-level so-called subconscious automatiser RNN. The chunker learns to understand the surprising events by predicting them. The automatiser uses my neural knowledge distillation procedure of 1991 [UN0-UN2] to compress and absorb the formerly conscious insights and behaviours of the chunker, thus making them subconscious. The systems of 1991 allowed for much deeper learning than previous methods."
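To make the control flow of that description concrete, here is a toy sketch of the predict-and-forward-surprises mechanism. The real Neural History Compressor uses RNNs trained by gradient descent across multiple time scales; this stand-in replaces each RNN with a trivial frequency-based next-symbol predictor, so it only illustrates the idea of higher levels receiving only unexpected inputs, not the actual architecture.

```python
# Toy sketch of the Neural History Compressor control flow (NOT the real
# architecture): each level predicts its next input and forwards only
# *unexpected* inputs to the level above.
from collections import defaultdict

class Level:
    """Stand-in for one RNN: a frequency-based next-symbol predictor."""
    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))  # prev -> next -> count
        self.prev = None

    def step(self, x):
        """Observe x; return True if x was surprising (mispredicted)."""
        surprised = True
        if self.prev is not None:
            successors = self.counts[self.prev]
            if successors:
                predicted = max(successors, key=successors.get)
                surprised = predicted != x
            self.counts[self.prev][x] += 1  # learn the observed transition
        self.prev = x
        return surprised

def compress(sequence, depth=3):
    """Feed a sequence through the hierarchy; count arrivals at each level."""
    levels = [Level() for _ in range(depth)]
    arrivals = [0] * depth
    for x in sequence:
        for i, level in enumerate(levels):
            arrivals[i] += 1
            if not level.step(x):
                break  # predicted correctly -> nothing propagates upward
    return arrivals

# On a highly regular sequence, the upper levels see far fewer inputs:
print(compress(list("abcabcabcabcabcabd") * 10))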
Is Monty a natural evolutionary "next step" for this type of approach, or is it something more? If it is a paradigm shift, then how so?
Great question! To our knowledge, there is no approach quite like ours out there, and it is a significant paradigm shift from current AI. Did you have a chance to look at our whitepaper? https://arxiv.org/pdf/2412.18354
I would particularly recommend the sections "2.2 Core Principles" and "2.3 Challenging Preconceptions". Section 2.2 outlines the foundational principles of the Thousand Brains Theory, which our system is based on. There aren't many AI systems out there that adhere to even one of these principles, let alone several. Section 2.3 gets a bit more specific on some assumptions that are often made in today's AI which do not apply to our system.
For a more detailed discussion of how Monty compares to specific other approaches, I would recommend having a look at this section in our technical FAQ: FAQ - Monty
The Neural History Compressor specifically seems quite different from Monty in many respects. Some differences that jump out right away are:
- No use of reference frames that can be used for rapid learning, generalization, and path integration.
- Use of a huge number of hierarchical layers (>1000), whereas the brain is much shallower (see for example How deep is the brain? The shallow brain hypothesis - PubMed ). We've thought a lot about this, and pretty much any problem can be modeled with only two levels of hierarchy and a window of attention shifting around within them. If you think about it, you never perceive more than two levels of parent-child relationships at once, and there is certainly no need for >1000.
- Use of backpropagation (see our FAQ on why we don't use backprop in Monty: FAQ - Monty )
- No learning through sensorimotor interaction. Monty is fundamentally a sensorimotor system. Not only that, every individual column/learning module receives motor input and outputs motor signals, as opposed to the classical view of AI in sensorimotor applications, where many layers of sensory processing are followed by the motor processing and output (a toy sketch of this module-level sensorimotor loop follows below).
There are many other differences, like the lack of powerful sub-processing units (learning modules) that can model complete objects, and the lack of the concept of policies, ... the links above go into more detail on those.
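Here is the toy sketch promised above of the point that every learning module is itself sensorimotor. To be clear, this is not the actual Monty API (see the docs/FAQ linked earlier); the class names and message fields are invented for illustration. The point is structural: each module consumes pose-tagged features and emits motor suggestions, rather than sitting in a purely sensory pipeline with motor output bolted on at the end.

```python
# Illustrative only: NOT the real Monty API. A "learning module" that both
# learns from pose-tagged features and emits a motor suggestion each step.
from dataclasses import dataclass

@dataclass
class FeatureMessage:
    """Stand-in for a CMP-style message: a feature observed at a location."""
    feature: str
    location: tuple  # (x, y, z) in the module's reference frame

@dataclass
class MotorSuggestion:
    """Where this module would like its sensor to move next."""
    target_location: tuple

class ToyLearningModule:
    def __init__(self):
        self.model = {}  # location -> feature: a trivial object model

    def step(self, msg: FeatureMessage) -> MotorSuggestion:
        # Learn: store the observed feature at its location.
        self.model[msg.location] = msg.feature
        # Act: suggest moving to a neighboring, not-yet-observed location
        # (a stand-in for a real exploration/inference policy).
        x, y, z = msg.location
        return MotorSuggestion(target_location=(x + 1.0, y, z))

lm = ToyLearningModule()
suggestion = lm.step(FeatureMessage(feature="edge", location=(0.0, 0.0, 0.0)))
print(suggestion, lm.model)
```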
Awesome response. Yes, I have read the whitepaper you guys released. It was very good.
I haven't read up on the shallow brain hypothesis yet, though. Looks like it'll be a fun read!
One of the other things I find interesting about Monty (that you don't get with too many other architectures) is the modularity of the system, namely through the CMP (Cortical Messaging Protocol). Ultimately, I think this will give Monty a kind of adaptability that we don't see much elsewhere.
I remember someone noting that a four-year-old child can typically recognize a displayed image of a dog and push a button in half a second. So, if a neuron takes about 5 ms to function, there can't be more than 100 neurons in the shortest path between the retina and the muscles moving the finger.
Guessing that half of these are on the "input" path and that half of those are sub-cortical, this gives us an upper bound of 25 levels of cortical neurons for this task. A lot more than two, but also a lot less than 1000 (:-).
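Spelling out that back-of-envelope arithmetic (all three numbers, the 500 ms reaction time, the 5 ms per neuron, and the two halvings, are rough guesses from the anecdote, not measurements):

```python
# Back-of-envelope bound on serially-connected cortical neurons.
reaction_time_ms = 500   # ~half a second to recognize the dog and press the button
neuron_latency_ms = 5    # assumed time for one neuron to "function"

serial_neurons = reaction_time_ms // neuron_latency_ms  # 100 neurons in series
input_path = serial_neurons // 2                         # half on the "input" path: 50
cortical_levels = input_path // 2                        # half of those sub-cortical: 25
print(serial_neurons, input_path, cortical_levels)       # 100 50 25
```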
I wouldn't equate a neuron with a hierarchical level. That is maybe the case for deep neural networks, but when I talk about levels in the hierarchy I am talking about V1, V2, V4, ...
Each cortical column contains several thousand neurons and encodes information through a population code. A cortical column itself is made up of several layers (classically described as six layers, but if you look closely you can divide some of them into more sublayers) with intricate wiring between them.

When I talk about levels in a hierarchy, I am also not referring to the layers within a cortical column, but to the different regions in the brain. A hierarchical relationship in neuroscience is classically described as a lower-level column sending output from its layer 3 to layer 4 of the higher-level column. The higher-level column connects back to the lower-level column (top-down connection) from its layer 6 to the lower column's layers 6 and 1. This kind of connectivity and classical definition of hierarchy in the neocortex is for instance described here: https://academic.oup.com/cercor/article-abstract/1/1/1/408896?redirectedFrom=fulltext&login=false (and in the paper I mentioned in my previous response)
Following this definition, there are certainly more than two hierarchical levels of processing in the primate/human neocortex (probably around 4-5, i.e. V1, V2, V4, the posterior inferior temporal cortex (TEO), and the anterior inferior temporal cortex (TE)), but fewer than 25. When I discuss two levels of hierarchy being used at a given time, I'm referring to attention being brought to bear on any two of these ~4-5 levels of hierarchy.
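As a toy way of writing down that classical definition, here is the feedforward/feedback wiring between those ~4-5 levels (the region ordering and the layer rule are taken from the description above; treat this as an illustration, not an anatomical database):

```python
# Classical cortico-cortical hierarchy rule described above: feedforward goes
# from layer 3 of the lower region to layer 4 of the higher one; feedback
# returns from layer 6 of the higher region to layers 6 and 1 of the lower one.
regions = ["V1", "V2", "V4", "TEO", "TE"]  # ~4-5 ventral-stream levels

connections = []
for lower, higher in zip(regions, regions[1:]):
    connections.append((f"{lower}:L3", f"{higher}:L4", "feedforward"))
    connections.append((f"{higher}:L6", f"{lower}:L6", "feedback"))
    connections.append((f"{higher}:L6", f"{lower}:L1", "feedback"))

for src, dst, kind in connections:
    print(f"{src} -> {dst} ({kind})")
```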
Another important thing to point out again is that information doesn't always have to flow through the entire hierarchy of cortical processing before we can begin to generate an action output. Even V1 has projections to subcortical motor regions and can, for instance, directly control saccades of the eyes.
Nice, thanks for sharing that! I'll add another one here (from the shallow brain hypothesis paper I posted in my first reply) which I think better captures the fact that all levels of the hierarchy project to subcortical structures.
Nice! This is a pretty good diagram (still need to read that paper you originally linked).
Also, in regard to your comment about "not equating a neuron with a hierarchical level," I agree. For a while now I've been thinking of cortical columns themselves as the "units" within an ML model, with those units then grouped into the levels you've described (V1, V2, et cetera). Is this an appropriate way to conceptualize them?