Still reading through this. I’m just now working my way towards the end and I have some questions.
On agent types:
If vision in humans is being delivered to us by way of photons hitting the retina, wouldn’t vision also be a kind of “surface agent” in a way? Is there an advantage to differentiating between distant and surface agents, as opposed to developing a policy that can respond to both types of action spaces equally well?
On policies:
I really like your approach with policies. The hypothesis-testing policy is particularly interesting to me. It almost strikes me as analogous to something like the Central Executive Network (CEN) in the brain. I wonder if it couldn’t decompose complex task-states, given that those could be viewed almost as a kind of abstract space in and of themselves? (You seem to hint at working towards this behavior a little later in the reading.)
In relation to the CEN, have you guys entertained emulating something like the DMN? I know discussions on the default mode network can be a bit contentious among neuroscientists, but the brain is obviously doing something while at rest. My thought is that the DMN is serving as a kind of resting-state manifold learner, mapping the complex, low-dimensional structure underlying high-dimensional brain activity. Would there be benefit to enabling something like this in the TBP framework?
Lastly, the way you describe using Euclidean distances to plot graph points almost reminds me of EC-hippocampal path encoding. It sounds like you guys are working to constrain much of the brain’s overall functioning to these SM/LM units. Is that the case? Would you essentially be looking to emulate things like the hippocampus and cerebellum over a broader, distributed sensorimotor space? If so, do you not worry about running into issues of space complexity, similar to how transformers run into issues with arbitrarily long input sequence lengths?
It’s definitely true that these exist on a continuum; however, there are some subtle points about each that inform how we approach them. These are discussed in the bullet points on “Distant Agent” and “Surface Agent” on this page - I’d be curious if this answers your question.
Re. policies
That’s great to hear you’re excited about the model-based policies; yes, decomposition is a very important thing indeed! If you haven’t already seen it, you would probably be interested to read the section on Decomposing Goals in the Future Work part of our documentation.
Re. the default mode network
That’s an interesting question, I don’t think it’s something we’ve considered too much. An initial thought is that something like this might emerge naturally in a large-scale Monty system when attending primarily to internally generated information (as opposed to external sensory information). Like you mention, higher cognitive functions require that the brain learn a model of itself, and so attending to internal information would probably be a natural basis for this network.
Re. Euclidean distance and computational complexity
Let me know if the discussion in the other thread has helped with this, but otherwise happy to discuss it more.
Hey @nleadholm, thanks for all these replies. You’ve been busy today!
Re. Euclidean distance and computational complexity
Nah. I think we’ve pretty much covered everything in that other thread. No need to rehash it here.
Re. the default mode network
That’s an interesting thought. I’m going to try and probe some of the other members’ views on this. That said, if it is an emergent function, I’d be curious to know whether it becomes constrained to “task-agnostic” cortical areas only (think frontal lobe, PFC, et cetera), as opposed to cortical areas which receive constant externally derived throughput (e.g. the occipital lobe). Also, given the anti-correlated relationship between the CEN and DMN, I would suspect they share some common principle in their functioning. There’s a niggling part of my brain that keeps wanting to explore concepts of memory-equivalent capacities (similar to what’s explained here: https://www.youtube.com/watch?v=nWZhgWBdgQQ), but IDK yet. I’ll need to do some thinking…
Re. agent types & policies
Yes! I was actually wanting to explore approaches to goal decomposition. This link is great!
In regard to the agent types, I’ll need to give that paper a read. My intuition says the LM should be adaptable enough to adjust to wildly different sensor types, and that ultimately those sensors should be the things responsible for dictating the resulting morphology of the LM’s overall structure (e.g. distant vs. surface agent types), rather than an architectural design decision bespoke to the LM itself. That said, this is 100% a me problem. Let me give that paper a read and get back to you here.
Edit: Please ignore my comment regarding agent types. Apparently I’m a dummy. I must have misconstrued something and mixed up learning modules for sensor modules. Turns out you were talking about actors in reference to the latter, which I agree with. My bad.
Edit 2: I’m not sure how feasible this would be from an engineering standpoint, but how beneficial would it be for you to emulate something like Merkel disk receptors? I would imagine it’d allow for easier perception of the form of grasped objects. Would that make processing easier for the downstream LMs, you think?
Edit 3: Okay, so the way you’re approaching agents in the SM space makes a ton of sense. Essentially, you have retinotopic mappers and non-retinotopic mappers (e.g. tactile mapping via a touch sensor), whereby the former provides allocentric reference frames and the latter egocentric ones. You’re then performing transforms between the two in LM space in order to achieve things like object invariance, and the like. Then you have your motor policies (essentially sensorimotor schemas) downstream of all of this. Is this close to what’s actually going on in the TBP framework?
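To check my own understanding of that transform step, here’s a rough sketch of the kind of egocentric-to-common-frame mapping I’m picturing (a minimal numpy/scipy sketch; the function and variable names are my own illustration, not anything from the actual TBP/Monty code):

```python
import numpy as np
from scipy.spatial.transform import Rotation


def sensor_to_body_frame(point_in_sensor, sensor_position, sensor_rotation):
    """Map a sensed location from the sensor's egocentric frame into a shared
    body-centered frame, which an LM could then relate to learned
    object-centered (allocentric) reference frames."""
    return sensor_rotation.apply(point_in_sensor) + sensor_position


# Example: a fingertip sensor rotated 90 degrees about the z-axis, sitting at
# (0.3, 0.1, 0.0) in the body frame, reports a contact point 1 cm ahead of it.
fingertip_rotation = Rotation.from_euler("z", 90, degrees=True)
fingertip_position = np.array([0.3, 0.1, 0.0])
contact_point = sensor_to_body_frame(
    np.array([0.01, 0.0, 0.0]), fingertip_position, fingertip_rotation
)
print(contact_point)  # same physical point, now expressed in the body frame
```

Is that roughly the flavor of transform the LMs are doing, or am I off base?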
In working my way through the paper (which is FAR more readable than most I’ve encountered!), I’ve noticed a few typos and such. What is the proper way to report these?
One of the capabilities listed in the paper caught my eye:
Recognizing an object and its pose by moving one sensor over the object.
I have no problem with this as a general goal for this stage of the development effort, but in most cases many sensors (and a lot of coordination) will be needed.
Even if we’re just talking about a fingertip moving along the surface of a cup, there are zillions of nerve cells in the skin and the musculature of the arm that need to be involved.
Or, consider some more real-world examples:
Hmmm; that sounds like a siren; where is it coming from?
Ouch; I have a sharp pain in my foot. Oh yeah, a stone got into my shoe.
It will be fascinating to find out how the motors, sensors, planning, and overall coordination need to play out.
@mthiboust yes that’s a good description of how we’re approaching distant vs. surface agents. Re. Merkel disk receptors, yes something like this would definitely be useful. This would then become a modality-specific feature input, similar to color for Sensor Modules that detect light.
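As a rough illustration of what I mean by a modality-specific feature (the dictionary layout and field names below are purely illustrative, not the actual Monty interfaces):

```python
# Hypothetical sketch: every Sensor Module passes the Learning Module a location
# and pose information in a common format, plus whatever modality-specific
# features its sensor can measure. Field names are made up for illustration.
vision_patch_observation = {
    "location": (0.30, 0.10, 0.00),      # where the patch is, in a common reference frame
    "surface_normal": (0.0, 0.0, 1.0),   # pose feature available to any modality
    "features": {"color_rgb": (0.8, 0.2, 0.1)},  # only a light-sensing SM can provide this
}

touch_patch_observation = {
    "location": (0.30, 0.11, 0.00),
    "surface_normal": (0.0, 0.0, 1.0),
    "features": {"indentation_depth_mm": 0.4},   # a Merkel-disk-like pressure feature
}
```

The Learning Module can use the common location and pose information regardless of modality, while features like color or indentation remain modality-specific.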
@Rich_Morin Thank you for the suggested edits. And yes, re. your question: to clarify, a single sensor module is more than a single neuron - it’s more akin to a bundle of many neurons that capture sensory information in a patch. We stress the narrow receptive field to distinguish it from the full, high-resolution image typically fed into deep neural networks.
The other thing to highlight is that perception will be very slow and laborious with only one SM (as recognizing the world through a straw or with a single fingertip is!); however, as you say, this has been our focus as part of our developmental stage - in this case, to ensure we have the functionality of a single Learning Module correctly implemented. Hope that makes sense.
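If it helps to make the narrow-receptive-field point concrete, here is a small, simplified sketch (the array sizes and function name are illustrative only, not the Monty implementation):

```python
import numpy as np


def extract_patch(image, center_row, center_col, size=16):
    """Return a small size x size patch around a fixation point. A single
    sensor module only ever sees a patch like this, never the full image, so
    recognizing an object requires moving the patch and integrating
    observations over time."""
    half = size // 2
    return image[
        center_row - half : center_row + half,
        center_col - half : center_col + half,
    ]


full_view = np.random.rand(256, 256, 3)    # stand-in for a full camera frame
patch = extract_patch(full_view, 120, 80)  # one "glance through the straw"
print(patch.shape)                         # (16, 16, 3)
```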