Hi all,
I used to follow Numenta quite closely about 5 to 6 years ago and was fairly familiar with HTM principles at the time. Since then, I haven’t kept up with developments, but I recently started exploring the Thousand Brains Project.
Maybe a naive question: from what I understand, TBP seems to describe a learning framework, including components like sensor modules, learning modules, voting mechanisms, etc., rather than a specific learning algorithm. Is that correct?
Also, is it reasonable to think of HTM as fitting within a learning module under the broader TBP framework?
Welcome to the community! Yes and yes: TBP is a learning framework based on the columns in the cortex and their connectivity, whereas HTM Sequence Memory is a specific algorithm. It's correct to think that HTM Sequence Memory may well be an algorithm used within a learning module. A bit more on that in our FAQ - FAQ - Thousand Brains Project
It’s a fair question as the learning modules are the part of the system we change the most as we build out implementations to mirror our research progress.
Currently there are the following learning modules:

| List of all learning module classes | Description |
| --- | --- |
| GraphLM | Learning module that contains a graph memory class and a buffer class. It also has properties for logging the target and detected object and pose, and functions for calculating displacements, updating the graph memory, and logging. This class is not used on its own but is the superclass of DisplacementGraphLM, FeatureGraphLM, and EvidenceGraphLM. |
| DisplacementGraphLM | Learning module that uses the displacements stored in graph models to recognize objects. |
| FeatureGraphLM | Learning module that uses the locations stored in graph models to recognize objects. |
| EvidenceGraphLM | Learning module that uses the locations stored in graph models to recognize objects and keeps a continuous evidence count for all its hypotheses. |
They all do different things and implement different abilities. The common thing that allows them all to work together is the Cortical Messaging Protocol (CMP). Cortical Messaging Protocol
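If it helps to have something concrete, here is a very rough, hypothetical sketch of the kind of information a CMP-style message carries (features expressed at a pose, plus a confidence). The class and field names are made up for illustration; they are not the actual Monty classes.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class CMPMessage:
    """Illustrative stand-in for a Cortical Messaging Protocol message:
    features expressed at a pose (location + orientation) with a confidence."""
    location: np.ndarray     # 3D location in the sender's reference frame
    orientation: np.ndarray  # e.g. a 3x3 rotation matrix
    features: dict           # arbitrary feature name -> value pairs
    confidence: float = 1.0  # how certain the sender is about this output

# Any module that emits and consumes messages of this shape can be wired
# to any other, which is what lets the different LMs interoperate.
msg = CMPMessage(
    location=np.zeros(3),
    orientation=np.eye(3),
    features={"hue": 0.6, "curvature": 0.1},
)
```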
Just a few more details to add to @brainwaves' response:
The current LM implementations build on each other (they are all subclasses of GraphLM and were developed one after the other, each improving on the last version; you can find a more detailed comparison of them in this separate document if you are very interested). However, if you just want to get a general idea, I would recommend looking at the EvidenceGraphLM, as this is the most recent one we developed and the one we are currently using for all our experiments. This page in the documentation delves a bit deeper into how this LM works.
At a very high level, the current LMs learn very explicit models by storing points at locations in a Cartesian coordinate frame. Think of a 3D point cloud. This is a bit like fast, local, associative learning (Hebbian learning) in the brain. We don't use deep learning or global update rules. This comes with many advantages and is uniquely suited for learning from an ever-changing stream of sensorimotor inputs.
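To give a rough idea of what that means in practice, here is a toy sketch of storing (location, feature) points and counting how many stored points match a new observation. This is only an illustration of the principle, not the actual GraphLM code.

```python
import numpy as np

class TinyPointCloudModel:
    """Toy object model: each observation is stored as a point, i.e. a
    location in the object's reference frame plus the feature seen there."""

    def __init__(self):
        self.locations = []  # 3D locations
        self.features = []   # feature vector observed at each location

    def learn(self, location, feature):
        # Fast, local, associative storage: just append the observation.
        self.locations.append(np.asarray(location, dtype=float))
        self.features.append(np.asarray(feature, dtype=float))

    def evidence_for(self, location, feature, dist_tol=0.01, feat_tol=0.5):
        # Count stored points that are nearby and carry a similar feature.
        locs = np.array(self.locations)
        feats = np.array(self.features)
        near = np.linalg.norm(locs - np.asarray(location), axis=1) < dist_tol
        similar = np.linalg.norm(feats - np.asarray(feature), axis=1) < feat_tol
        return int(np.sum(near & similar))

model = TinyPointCloudModel()
model.learn([0.0, 0.0, 0.1], [0.6, 0.1])  # e.g. hue and curvature at a point
print(model.evidence_for([0.0, 0.0, 0.1], [0.6, 0.1]))  # -> 1
```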
In contrast to HTM, the current models in our LMs don't have a temporal component. We are actively working on this.
HTM could certainly be used as an algorithm inside an LM. In fact, we already looked into such an LM implementation a couple of years ago at Numenta. You can find the code in our monty_lab repository here (although it is not actively maintained). One important thing to note is that the HTM algorithm needs to be combined with a mechanism for path integration, to keep track of how movement of the sensor takes us through the object's reference frame and to learn structured models. In the implementation I linked, we use a grid-cell-like mechanism for this.
Not sure if those extra details are useful but maybe they give a bit more context and additional links to places where you can dig deeper. Let me know if you have more questions!
Thanks a bunch for the responses @vclay @brainwaves. And thanks for the amazing documentation as well.
I will dig deeper into the docs/code to understand them in detail.
Thank you for the detailed response, as I am also interested in HTM. Could you point me to some resources from TBP (or otherwise) that I could look at if, say, I wanted to work on implementing this myself? I'm sure it's a monumental task; however, I'm still interested.
If you are interested in working with HTM-style algorithms within Monty, I would encourage you to look at the 2019 papers from Hawkins et al. and Lewis et al. In those, the original HTM was extended to model spatial reference frames through the use of grid-cell modules.
Note that in the above work, the grid-cell modules only supported 2D space, and the input features were synthetic (made-up) SDRs. If you want to work with our existing benchmarks with 3D objects and real sensory inputs, then it would be important to think about how you will ensure the grid cells support 3D space, as well as how you will map raw sensory input into SDRs.
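On the second point, one classic starting option is a simple scalar encoder that maps similar values to overlapping sets of active bits. The snippet below is just a toy illustration of that idea (not a specific Monty or HTM API), and the parameters are arbitrary.

```python
import numpy as np

def scalar_to_sdr(value, v_min, v_max, n_bits=256, n_active=16):
    """Toy scalar encoder in the spirit of classic HTM encoders: map a value
    in [v_min, v_max] to a contiguous block of active bits."""
    value = np.clip(value, v_min, v_max)
    start = int((value - v_min) / (v_max - v_min) * (n_bits - n_active))
    sdr = np.zeros(n_bits, dtype=bool)
    sdr[start:start + n_active] = True
    return sdr

# Nearby values share active bits, so similarity is preserved as overlap:
a = scalar_to_sdr(0.50, 0.0, 1.0)
b = scalar_to_sdr(0.52, 0.0, 1.0)
print(np.sum(a & b))  # large overlap for similar inputs
```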
If you are looking for potential collaborators, it could be worth checking out this thread and reaching out to @Spencer .
Would you suggest trying to get the temporal_memory from the monty_lab repository working first? I wonder if it is newer than the code from the 2019 paper, and maybe fewer errors will come up when trying to integrate it into Monty.
Yes, codewise that might be the easiest place to start, as it shows how we once integrated HTM into Monty (although this was over 3 years ago and Monty has changed quite a bit since then). It also deals with encoding 3D space instead of just 2D space. We just uploaded a video of Abhi presenting how he implemented this and the issues he encountered: https://youtu.be/XEPHjbJUpvs This might be a good place to start to get an understanding of what we tried and the problems we encountered.
No worries at all. Just to add to the items Viviane shared, it's worth pointing out that I believe the SDR encoding of 3D location that Abhi used was an imperfect, quicker solution to get things up and running / tested. However, I believe it did not have path integration, nor did it actually make use of 3D grid cells. Rather, it used Numenta's older encoder for geospatial data (https://www.numenta.com/assets/pdf/biological-and-machine-intelligence/BaMI-Encoders.pdf). As such, you will likely need to re-implement location encoding such that it provides:
- An internal reference frame, where the initial location representation (SDR) can be randomly initialized when learning a new object.
- Unique location codes for different objects.
- Path integration through this space.
Using grid-cell modules (like the Lewis et al. work), but tailored for 3D, would probably be your best bet.
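As a very rough illustration of those three requirements, here is a toy 3D grid-cell-like module: the phase is randomly initialized per object, several modules at different scales give a (nearly) unique location code, and movements are path-integrated. This is only a sketch of the idea, not the Lewis et al. implementation or anything in Monty.

```python
import numpy as np

class GridModule3D:
    """Toy 3D grid-cell-like module: keeps a phase in a periodic unit cube
    and path-integrates sensor displacements at a module-specific scale."""

    def __init__(self, scale, cells_per_axis=8, rng=None):
        self.scale = scale
        self.cells_per_axis = cells_per_axis
        rng = rng or np.random.default_rng()
        # Random initial phase acts as a new, object-specific reference frame.
        self.phase = rng.random(3)

    def path_integrate(self, displacement):
        # Move through the periodic space proportionally to the displacement.
        self.phase = (self.phase + np.asarray(displacement) / self.scale) % 1.0

    def active_cell(self):
        # Discretize the phase into one active cell (a tiny "location SDR").
        return tuple((self.phase * self.cells_per_axis).astype(int))

# Several modules with different scales give a (nearly) unique location code.
modules = [GridModule3D(scale=s) for s in (0.05, 0.11, 0.23)]
for m in modules:
    m.path_integrate([0.01, 0.0, -0.02])
location_code = tuple(m.active_cell() for m in modules)
print(location_code)
```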
Yes, I'm currently working on integrating the SDRs from hierarchical temporal memory into Monty. As Neil said, one of the biggest challenges is visualization and inspection.
I’m going to write up a presentation soon of the design so far.
Associative arrays between feature SDRs and phase SDRs can't fully recreate 3D objects; instead they create 2.5D facets, which can then be combined to recreate the 3D object. So the goal at the end of the demo would be to show that a single-column model can distinguish these individual 2.5D facets of the objects. This is because the grid modules are 2D, and 3D coordinates would not translate to unique codes. The sensor motion would look at a static object view, moving the sensor patch around like the straw model, in x, y, which translates directly to rho/phi positions in the grid modules; these are then captured into the phase SDR and associated with feature SDRs. You can't capture 3D information with this, but the 2D facets can be distinguished and reconstructed.
These facets would be recreated using counterfactual evidence: you can pick a location in phase space, pick a feature, and probe the logit space to see if that feature exists in that spot, converging on putting the correct features in the correct spots. The facets and objects can be distinguished using a context SDR that adds to the phase SDR. Later on we can probe the associative matrices A_CP and A_CF to get distinct objects and facets.
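To make that a bit more concrete, here is a toy sketch of the kind of associative array I have in mind: a Hebbian-style co-occurrence matrix between phase bits and feature bits that you can probe for counterfactual evidence. The names are made up for illustration; this is not the final design.

```python
import numpy as np

class SDRAssociator:
    """Toy Hebbian-style association between phase SDRs and feature SDRs.
    A[i, j] counts how often phase bit i co-occurred with feature bit j;
    probing it returns an un-normalized logit for that pairing."""

    def __init__(self, n_phase_bits, n_feature_bits):
        self.A = np.zeros((n_phase_bits, n_feature_bits))

    def associate(self, phase_sdr, feature_sdr):
        # Outer-product (Hebbian-like) update of the co-occurrence counts.
        self.A += np.outer(phase_sdr.astype(float), feature_sdr.astype(float))

    def probe(self, phase_sdr, feature_sdr):
        # Counterfactual query: how much evidence is there that this feature
        # exists at this location in phase space?
        return float(phase_sdr.astype(float) @ self.A @ feature_sdr.astype(float))

assoc = SDRAssociator(n_phase_bits=128, n_feature_bits=256)
phase = np.zeros(128, dtype=bool); phase[[3, 17, 42]] = True
feature = np.zeros(256, dtype=bool); feature[[5, 9, 99]] = True
assoc.associate(phase, feature)
print(assoc.probe(phase, feature))  # high for a stored pairing, ~0 otherwise
```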
Later on, once I prove the single-column demo works, I will integrate multiple columns using product-of-experts logit fusion. Instead of a bunch of Cortical Messaging Protocol consensus voting, you can just multiply all the logit vectors and update each associative array.
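The fusion step itself is simple: multiplying the columns' probabilities is equivalent to summing their logits and re-normalizing. A toy sketch of what I mean (the candidate set and numbers are made up):

```python
import numpy as np

def fuse_logits(logit_vectors):
    """Product-of-experts fusion: multiplying the experts' probabilities
    is the same as summing their logits (up to normalization)."""
    fused = np.sum(np.asarray(logit_vectors), axis=0)
    fused = fused - fused.max()  # subtract max for numerical stability
    probs = np.exp(fused)
    return probs / probs.sum()

# Three "columns", each producing logits over the same candidate objects:
columns = [np.array([2.0, 0.5, -1.0]),
           np.array([1.5, 0.2, -0.5]),
           np.array([1.8, 1.0, -2.0])]
print(fuse_logits(columns))  # candidates all columns agree on dominate
```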
So I have a semblance of a plan for the first experiment. I’m going to write up a presentation soon, but I’m still working through the details and the math. If you’re interested, I’d love to chat!
In case anyone's curious, here's the codebase to follow.
I decided to build the demo outside of Monty because I found myself fighting against the framework in too many places. But eventually I’d like to bring it back in.