A great conversation between Jeff Hawkins and David Eagleman on Eagleman's podcast "Inner Cosmos". Check out this fascinating episode to learn more about human intelligence, how it is fundamentally different from current artificial intelligence, and how the Thousand Brains Project is working to create a fundamentally different type of AI.
If you’re curious about how neuroscience can drive the next breakthrough in AI, or just love a good conversation between two brilliant minds, this episode is a must-listen!
There is a good quote in an upcoming video from Jeff that I liked.
One way to think about Monty, and the brain, is that we always think about AI systems or vision systems or whatever it is, processing visual data or processing auditory data or processing tactile data. But I think what's exposed here, and one of the strange lessons of the Thousand Brains Theory, is that the brain is really a processor of space. That's what it is. It processes space, the data type of space and pose and orientation, and the vast majority of what's going on in your brain is processing reference frames and spaces and distances.
It is interesting how he compares the brain with current AI systems at 10:17: the brain is a sensory-motor system that learns through exploration and movement, while current AI models are primarily fed data and learn from it without experiencing the world directly.
"Today's AI is mostly built on deep learning and transformer technologies, which we essentially feed. It doesn't explore. With large language models, we just feed them the language. There's no inherent knowledge about what these words mean, only what these words mean in the context of other words, right?
But you pick up a cat and touch it and feel it, know its warmth, and we understand how its body moves because no one has to tell us that; we just experience it directly.
So there's a huge gap between brains and AI. Pretty much all brains work by sensory-motor learning, and almost all AI doesn't. You can just peel the layers apart and see what the differences are, and it makes a huge difference." - Jeff Hawkins
Most research has fallen in love with (GPU-style) massively parallel computation instead of decentralized 'voting'. (To a hammer, everything looks like a nail.)
And ask yourself which path concatenative, selective, exploratory evolution would have followed.
My take is that the ideas of the Thousand Brains Project require just as much parallelism as NNs implemented on GPUs today; the only difference is that they require a fundamentally different type of parallelism.
GPUs work on high arithmetic intensity: reuse weights with many inputs (batching)
One set of weights, many inputs
Decentralized voting requires many sets of weights, each operating on a single input at a time (see the sketch below).
Many sets of weights, one input
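To make the one-set-of-weights vs. many-sets-of-weights contrast concrete, here is a minimal NumPy sketch. The shapes, voter count, and the averaging step are purely illustrative assumptions, not taken from Monty or from any particular GPU kernel:

```python
import numpy as np

rng = np.random.default_rng(0)

# GPU-friendly pattern: ONE weight matrix reused across a large batch of inputs.
# High arithmetic intensity: the same weights are applied to many inputs at once.
W = rng.standard_normal((256, 128))          # one set of weights
batch = rng.standard_normal((4096, 256))     # many inputs
batched_out = batch @ W                      # a single large matrix multiply

# Voting-friendly pattern: MANY small models, each with its OWN weights,
# each processing the SAME single input, then combining their outputs.
n_voters = 1000
voter_weights = [rng.standard_normal((256, 16)) for _ in range(n_voters)]
x = rng.standard_normal(256)                 # one input

votes = np.stack([x @ W_i for W_i in voter_weights])  # many small, independent multiplies
consensus = votes.mean(axis=0)               # a simple stand-in for a voting/consensus step

print(batched_out.shape)   # (4096, 128): one set of weights, many inputs
print(consensus.shape)     # (16,): many sets of weights, one input
```

The first pattern maps well onto a GPU's high arithmetic intensity; the second is many small, independent computations whose results have to be combined, which is exactly the kind of workload GPUs are not built around.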
I believe the underlying computing architecture severely constrains the learning algorithms that can scale on top of it.
Many ideas about learning algorithms are good, but only so many are scalable using today’s computing architecture.
Many believe the architecture of GPUs was fundamental to the success of Deep Learning methods [1] [2].
I personally believe that Processing-in-Memory [3] [4] systems could well be the architecture that enables the type of learning algorithms that you are talking about: decentralized voting.
Do you know of algorithms other than the Thousand Brains Project's that could fall into that category? Maybe other decentralized voting systems that can be scaled to many voting units, like what the Thousand Brains Project could enable?
Very interestingly put and great references! I agree, GPUs are really ideal for ANNs but not for brain-based architectures where you only need to use a tiny subset of "weights" for each specific input. Currently we are using CPUs and parallelizing across those, which gives much more flexibility and efficiency for an architecture like ours. But we are also looking to collaborate with groups implementing hardware that is specifically optimized for brain-like architectures. For instance, this group at CMU is doing super interesting work: [2405.11844] NeRTCAM: CAM-Based CMOS Implementation of Reference Frames for Neuromorphic Processors
Good paper, especially since they are the first ones (and so far the only ones?) to explore hardware for cortical columns and grid/place cells.
I remember hearing Jeff Hawkins talk about object-vector cells and learning graph representations. Also this work from Marcus Lewis [1].
In your opinion, does this new addition fundamentally change the necessary hardware, or can NeRTCAM components be reused in a system that builds graphs of the world?
Hey Xavier, good question - I think in general it's fair to say that a "final" version of our learning modules is going to look fairly different from how they are at the moment, e.g. as we implement things like modeling object behaviors and time. However, the hope is that we can achieve most of these algorithms with things that map onto sparse, Content-Addressable Memory mechanisms, so while the NeRTCAM might need changes, it probably wouldn't be anything too fundamental. I should clarify that hardware is definitely not my specialty, but hopefully that makes sense?
Hi Xavier, sorry for not replying sooner, the past week has been crazy with the preparations for the open-source release of the project. Just to add to @nleadholm’s response, we’ve also been in contact with the group at CMU that developed NeRTCAM for a while now and want to make their CMOS implementation compatible with our approach.
One more point to make is that we are not limiting ourselves to representing objects as graphs or to using any graph representations inside LMs. This is what we currently do, but if there is a better approach in hardware to represent space with path-integration properties, that is totally fine too, just like the brain doesn't use x,y,z coordinates but grid cells. The idea of Monty is that people can implement different versions of learning modules (in software or hardware), and as long as they adhere to the CMP, they can be used within Monty (although they probably won't work well if they don't have a good way to represent space).
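To illustrate the "represent space however you like, as long as it supports path integration" idea, here is a toy grid-cell-like encoding. The scales, shapes, and function names are invented for illustration; this is not how Monty or NeRTCAM represent space:

```python
import numpy as np

# Toy illustration (not the Monty implementation): a 2D location is encoded as
# phases in several grid-cell-like modules with different scales, instead of x,y.
# Path integration then means shifting every module's phase by the movement
# vector, modulo that module's scale.

scales = np.array([0.3, 0.5, 0.7, 1.1])   # hypothetical module scales

def encode(position, scales):
    # phase of the position within each module: one 2D phase per module
    return np.array([position % s for s in scales])

def path_integrate(phases, displacement, scales):
    # update each module's phase by the same displacement, modulo its scale
    return np.array([(p + displacement) % s for p, s in zip(phases, scales)])

pos = np.array([0.12, 0.40])
phases = encode(pos, scales)
phases = path_integrate(phases, np.array([0.25, -0.10]), scales)
# The phases now encode the new location without ever storing x,y explicitly,
# and they agree with encoding the moved position directly.
```

Any representation with this kind of "move, then still know where you are" property could in principle back a learning module, whether implemented in software or in hardware like NeRTCAM.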
Associative storage, flexible representations, sparse networks, some sort of theory of mind beyond brute force crunching, and thinking outside the von Neumann box. It feels like coming home after spending almost 50 years in the wilderness since my COSMAC days. Next you folks will start talking about Pick, Agents, and the Palm Pilot.
Do you mean different LM implementations could coexist in the same network? This would add the extra difficulty of aligning the object ID SDRs between the different LMs, right?
Yes, that's exactly what I mean. All LM inputs and outputs must adhere to the Cortical Messaging Protocol (CMP). So you could imagine using different types of learning modules in the same system (although I'm not sure when/why this would be necessary), but more importantly, any sensor module from any sensor modality can plug into any learning module. Also, learning modules can be stacked hierarchically in any way. The fact that different LMs may encode the object ID for the same object differently shouldn't matter; they will just learn the association (at least that is the plan, there is still a little bit of that to implement).
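For anyone wondering what "different LM implementations speaking the same protocol" could look like in code, here is a hypothetical sketch. The CmpMessage fields and the class names are invented for illustration and are not the actual Monty CMP or its API:

```python
from dataclasses import dataclass
import numpy as np

# Hypothetical stand-in for a CMP-style message: every module, regardless of
# how it works internally, exchanges features at a pose in a common format.
@dataclass
class CmpMessage:
    location: np.ndarray      # 3D location in a common reference frame
    orientation: np.ndarray   # pose, e.g. a 3x3 rotation matrix
    features: dict            # sensed or inferred features (modality-agnostic)
    confidence: float         # how certain the sender is

class GraphLM:
    """One possible learning-module implementation (graph-based models)."""
    def step(self, msg: CmpMessage) -> CmpMessage:
        # ... update internal graph models, path-integrate, vote ...
        return msg

class SomeOtherLM:
    """A different internal implementation; only the interface is shared."""
    def step(self, msg: CmpMessage) -> CmpMessage:
        return msg

# Because both LMs consume and emit CmpMessage, they can be wired together
# or stacked hierarchically without knowing each other's internals.
pipeline = [GraphLM(), SomeOtherLM()]
msg = CmpMessage(np.zeros(3), np.eye(3), {"curvature": 0.2}, 0.5)
for lm in pipeline:
    msg = lm.step(msg)
```

The point is only that the protocol, not the internal representation, is what makes modules interchangeable and stackable.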
I would be worried about a combinatorial explosion if each LM has to learn the associations of the SDRs of all the other LMs it is connected to. This is heavy work for an LM! If object IDs shared between LMs are represented with SDRs, then I would favor a mechanism where the SDR creation inside each LM is constrained by the inputs from the other LMs so that SDRs are aligned from the beginning (we would need a smart algorithm for this, not an easy task).
Hi @mthiboust ,
I understand your concerns, but I don't think there is a combinatorial explosion. An LM just has to learn 1:1 mappings between input SDRs and what it is usually sensing when it receives those SDRs. The brain has to do the same thing. There is no way the brain could use a globally consistent SDR representation for each object. The neurons just do associative learning. A cortical column has no idea what the incoming spikes mean. It just has to learn to associate the incoming information with its current sensations. I know it may feel unintuitive to think about at first, since in computers we could have all this information available, but I think in the end, solving it the way the brain does will be much more elegant.
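A rough way to see why this stays roughly linear rather than combinatorial (toy code, not the Monty implementation): each LM keeps one small associative table per input connection, mapping incoming SDRs to its own local object IDs, so adding another connected LM adds another table rather than a joint mapping over all LMs. The class and channel names below are made up for illustration:

```python
# Toy associative memory for one LM (illustrative only, not the Monty implementation).
# The LM does not need a globally consistent SDR: it just associates whatever
# SDR arrives on an input channel with whatever it is currently sensing.

class ToyLM:
    def __init__(self):
        # One entry per (input connection, incoming SDR) -> local object ID.
        self.associations = {}

    def observe(self, channel: str, incoming_sdr: frozenset, local_object_id: str):
        # Learn a 1:1 association between the incoming SDR and the current sensation.
        self.associations[(channel, incoming_sdr)] = local_object_id

    def recognize(self, channel: str, incoming_sdr: frozenset):
        # Later, the same incoming SDR retrieves the locally learned object ID.
        return self.associations.get((channel, incoming_sdr))

lm = ToyLM()
neighbor_sdr = frozenset({3, 17, 42, 101})     # whatever a neighboring LM emits for "mug"
lm.observe("lm_above", neighbor_sdr, "my_mug_model")
print(lm.recognize("lm_above", neighbor_sdr))  # -> "my_mug_model"
```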