Hi, all! Thanks for making this great project open source!
My name is Ely Matos. I’m a researcher at the FrameNet Brasil Lab (Federal University of Juiz de Fora, Brazil), a Computational Linguistics lab working mainly on Computational Cognitive Linguistics. I’m a newbie at TBT, so apologies in advance for silly (or already-answered) questions.
I’m trying to understand whether the idea of “displacement cells” from the earlier paper (“A Framework for Intelligence…”, 2019) was used in the current implementation of Monty, since it seems the idea of “grid cells” was used to represent poses. It looks like “displacement” was implemented computationally rather than directly as a neural mechanism, maybe because that would imply a “neural link” between different columns, I guess.
I’d also like to know whether there is a motivation for implementing separate “learning modules” (one for each column) rather than a single “shared” module for all columns (in a region, for example). Trying to be more biologically plausible, it is hard to imagine how the CCP relates to lateral connections.
Thanks for your attention,
Ely
Hey there Ely, glad to see you here!
I’ll have to read up on that paper you’re referencing before I can try to assist with your displacement cells question. As to your second question, though, to my knowledge the TBP team is seeking to implement a single, general-purpose learning module, avoiding specialization. Also, I believe you are correct in observing that each learning module represents a single cortical column; I believe this is the team’s intent.
A useful way of viewing LM-CCP relations (for me at any rate) would be to equate them to the cortico-thalamic loops found between cortical columns and the thalamus.
Also, for what it’s worth, there have been a few of us who’ve asked the TBP team if they intended to incorporate subcortical structures into their framework (e.g., EC-hippocampal functioning). As it currently stands, no, they do not anticipate doing this, although they also stressed that there’s no reason why another person couldn’t do it; they themselves just have other, more critical areas of focus.
Hopefully this helps answer some of your questions. If there’s anything else you would like to know, please don’t hesitate to ask! If I don’t get back to you tonight, I’m sure someone from the team will respond come morning.
Edit: Got a chance to look at that paper you mentioned (A Framework for Intelligence). To the best of my knowledge, while displacement cells aren’t explicitly incorporated into Monty, you can still find their underlying principles in the framework’s design, specifically within the graph-based representations of object models.
The graphs’ edges can encode displacement information between observed features (graph nodes). There’s also likely something to be said about the relative spatial displacement of features between disparate reference frames, though that may not be as direct a correlate as something like the graph edges.
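To make that concrete, here’s a rough sketch of what I mean by features on nodes and displacements on edges. The names and structure are my own illustration, not taken from the tbp.monty codebase:

```python
# Hypothetical sketch of a graph-based object model; names and structure are
# illustrative, not taken from the tbp.monty codebase.
import numpy as np
import networkx as nx

model = nx.DiGraph()

# Nodes hold observed features at locations in the object's reference frame.
model.add_node(0, location=np.array([0.00, 0.00, 0.00]), curvature=0.1)
model.add_node(1, location=np.array([0.02, 0.00, 0.01]), curvature=0.4)

# Edges hold the displacement between the two observations.
for u, v in [(0, 1)]:
    disp = model.nodes[v]["location"] - model.nodes[u]["location"]
    model.add_edge(u, v, displacement=disp)

print(model.edges[0, 1]["displacement"])  # [0.02 0.   0.01]
```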
I haven’t seen any real characterization of the “graph-based representations of object models”. Could someone point me to relevant resources, preferably at a good level for a n00b?
Hey there,
I’m a little confused by your question (could just be that it’s late and I’m tired). But if you’re wondering about my comparison, I pulled the info out of their recent “new paradigm” paper (section 9.6, I believe): [2412.18354] The Thousand Brains Project: A New Paradigm for Sensorimotor Intelligence
Cool! I’ll peruse that and see what questions it raises for me.
I read sections 9.4-9.8 of the paper and am significantly enlightened. There isn’t any discussion of the exact data structure(s) used for the graph; is this documented? I also wonder about which (and in what manner) LMs should be selected as “representative nodes” for topical regions in the heterarchy. Fun stuff…
FWIW, it occurs to me that it might be interesting and useful (e.g., for post-run analysis) to export the generated graphs to a suitable database. There are a couple of graph databases that I’ve been looking at for other projects: Neo4j and ArangoDB.
Neo4j has been around for decades; it is very performant, robust, and well supported. One of the things that I like about Neo4j is that the graph linkage is all stored in memory, so following an edge simply requires dereferencing a pointer (i.e., FAST). Of course, this does impose some limits on how large a graph can be, but it should be fine for a lot of Monty’s experimental needs.
ArangoDB is actually a multi-model database, so it can act as a document database, graph database, etc. It uses JSON as its storage format, which could really ease interoperability and such. There is tooling to migrate graphs from Neo4j to ArangoDB, so one could conceivably set up both to handle short and long term storage needs.
Both projects are looking into making their databases usable by LLMs. I think this could be very useful in trying to understand Monty’s behavior.
That’s going to be more of a codebase-type question. I can poke around this weekend, maybe, if I have the time, and try to get back to you. But yeah, the general paper won’t have that; it seems more geared toward public consumption.
And that’s an interesting thought on graph databases. Admittedly, I’m a little unfamiliar with both of those. Would they just be used for something like knowledge graph generation? Also, if ArangoDB takes the JSON format, couldn’t you just serialize LM output into it? Could be pretty interesting…
I could see them being used for knowledge graph generation, analysis, etc. For example, in order to capture and export a topically-related set of LMs to another Monty instance, I’d want a way to interrogate the graph about what Monty “thinks” it’s modeling.
And yes, it should be trivial to serialize LM output into either JSON or Neo4j’s Cypher graph query language. FWIW, there are various other possible encoding formats (e.g., Turtle for RDF).
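For example, here’s a minimal sketch of both paths, assuming an object graph like the one sketched earlier in the thread. The field and label names are illustrative, not an established schema:

```python
# Minimal sketch: export the hypothetical object graph above to JSON
# (ArangoDB-friendly) or to Cypher CREATE statements (Neo4j).
# Field and label names are illustrative, not an established schema.
import json

def graph_to_json(model):
    return json.dumps({
        "nodes": [
            {"id": n, "location": d["location"].tolist()}
            for n, d in model.nodes(data=True)
        ],
        "edges": [
            {"from": u, "to": v, "displacement": d["displacement"].tolist()}
            for u, v, d in model.edges(data=True)
        ],
    })

def graph_to_cypher(model):
    stmts = [
        f"CREATE (n{n}:Feature {{location: {d['location'].tolist()}}})"
        for n, d in model.nodes(data=True)
    ]
    stmts += [
        f"CREATE (n{u})-[:DISPLACEMENT {{vec: {d['displacement'].tolist()}}}]->(n{v})"
        for u, v, d in model.edges(data=True)
    ]
    return ";\n".join(stmts)
```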
Sorry for chiming in so late; the last two weeks have been crazy. Thank you @HumbleTraveller for answering questions in such a great way! It’s really nice to see.
Just a few further resources from my side, @Rich_Morin, if you are interested in digging deeper:
- Our documentation on the learning module goes into some detail on the object model representations (although a lot of this is already covered in the paper that @HumbleTraveller pointed out)
- For a super detailed investigation of different LM versions and the way they use graphs to recognize objects we also have a separate writeup here: Overleaf, Online LaTeX Editor
- If you prefer to look at concrete code, I would recommend the object_model.py script as a starting point
Hope this helps!
-Viviane
Thank you for the additional info @vclay, it’s very helpful. Out of curiosity, you guys don’t have Overleafs for other Monty components, do you? (e.g., for the CMP)
No, we have some additional writeups for subprojects (mostly stuff in the monty_lab repo GitHub - thousandbrainsproject/monty_lab: Our day-to-day experiment files and data analysis scripts.), but unless they were discontinued, all the crucial information is in the documentation. Is there some specific information on the CMP that you are looking for?
Is there some specific information on the CMP that you are looking for?
Oh, no. Nothing like that. Was just more curious as to whether there was additional info outside of the documentation, that’s all.
Hi! I’d like to quickly return to the original question, just to confirm whether my impressions are correct: it seems the idea of “grid cells” was used to represent poses, but the idea of “displacement cells” doesn’t have a direct correlate, as displacements are implemented computationally. Does that make sense?
@HumbleTraveller We try to have everything in the project be publicly accessible, so unless the documents were super outdated, we incorporated them in our documentation or as READMEs in monty_lab. Even the Overleaf doc I shared is linked to in the documentation.
@ElyMatos Yes, our first approach (DisplacementGraphLM) was actually trying to use the idea of displacement cells to model and recognize objects, and it had some very nice properties (automatic rotation, translation, and scale invariance without having to explicitly test rotations like we do with the FeatureGraphLM), but ultimately we decided it wouldn’t work, so we abandoned the idea for now and went with representing objects as features at locations.
The main issue with the displacement representation is that you can’t interpolate between displacements the same way as you can with locations. That means, to recognize an object, you have to sample the same displacements that are stored in the model. This is a huge limitation. For a more detailed writeup of this issue and a comparison between the displacement and features@locations approaches, you can see the Overleaf document (particularly the table on page 5 and section “3.3 Problems with Predictions Using Displacements” on page 17).
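As a toy illustration of the interpolation issue (my own example, not code from tbp.monty):

```python
# Toy illustration (not code from tbp.monty) of why displacements are harder
# to match than locations.
import numpy as np

stored_locations = np.array([[0.00, 0.0, 0.0],
                             [0.02, 0.0, 0.0],
                             [0.04, 0.0, 0.0]])
stored_displacements = np.diff(stored_locations, axis=0)  # two steps of 0.02

# Locations: a new observation anywhere on the object can be matched to the
# nearest stored point, even if it falls between the learned samples.
new_location = np.array([0.03, 0.0, 0.0])
nearest_idx = np.argmin(np.linalg.norm(stored_locations - new_location, axis=1))

# Displacements: a movement of 0.03 along the same surface matches neither
# stored displacement, so recognition only works if the sensor reproduces the
# exact displacements that were sampled during learning.
new_displacement = np.array([0.03, 0.0, 0.0])
errors = np.linalg.norm(stored_displacements - new_displacement, axis=1)
print(errors)  # both errors are 0.01, i.e. no exact match
```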
The code for the DisplacementGraphLM is still in the tbp.monty repository and you can still run experiments with it if you are curious. Every once in a while we come back to this idea and think about whether we can do a hybrid approach, since the displacement matching also has some significant advantages.
- Viviane
Thanks @vclay for the very comprehensive explanation!
The motivation for my question is that I am trying to figure out how Monty could be used for language, specifically sentences. Since (at first) a sentence has only one dimension, the relations between words could be handled by the displacement idea (since, again at first, we wouldn’t have 3D operations such as rotation).
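Just to sketch the (very rough, purely hypothetical) idea I have in mind:

```python
# Purely hypothetical sketch: words as features at 1D locations, with simple
# integer displacements between them (no rotations needed in one dimension).
sentence = ["the", "cat", "sat"]
features_at_locations = {position: word for position, word in enumerate(sentence)}
displacements = {(i, i + 1): 1 for i in range(len(sentence) - 1)}
```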
I’ll check the mentioned docs, thanks!
Nice! I am curious to see where you will go with this. Generally, I would say, in humans, language comes last. We first learn physical models of the world by interacting with it and then ground the language in those models. A child can very quickly learn an association between a word and the object it refers to (“fast mapping”). I think grounding language in the physical world instead of simply learning statistical regularities between words is pretty crucial to our robust and quick understanding (no internet scale dataset needed for humans to learn language). How Monty would best model language is an interesting topic to explore!
Just one more thing to highlight (maybe you are already aware of this, but I thought I’d clarify): with the features@locations approach, we also use displacements to recognize the object. Each object is learned in its own reference frame (locations of features on the object relative to each other), so an object can be observed at one location in the world but automatically be recognized at any other location in the world. The incoming displacements of the sensor (relative to the body) are transformed into the reference frame of the hypothesized object and then compared to the information stored in that object’s reference frame. I hope this makes sense?
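A very simplified sketch of that transformation step, assuming a hypothesized rotation and location for the object (my own illustration, not the actual Monty code):

```python
# Simplified illustration (not the actual Monty code) of the transformation
# step: a sensor displacement in the body's reference frame is rotated into
# the hypothesized object's reference frame and used to update the
# hypothesized location on the object. Whether you apply the rotation or its
# inverse depends on the pose convention you choose.
import numpy as np
from scipy.spatial.transform import Rotation as R

hypothesized_rotation = R.from_euler("z", 90, degrees=True)  # object pose hypothesis
hypothesized_location = np.array([0.10, 0.00, 0.00])         # location on the object

sensor_displacement_body = np.array([0.00, 0.02, 0.00])      # movement in body frame

displacement_obj = hypothesized_rotation.inv().apply(sensor_displacement_body)
new_location_obj = hypothesized_location + displacement_obj

# The feature observed after this movement is then compared against features
# stored near new_location_obj in the object's model.
```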
Best wishes,
Viviane