Hello, and thoughts about leveraging LLMs in Monty

Hello ThousandBrains community!

To briefly introduce myself, I have been following the work of Jeff Hawkins and Numenta since I read ‘On Intelligence’ many years ago. I had the honor of Skyping once with Jeff when I was a young dev consultant, advising a project to try Numenta’s HTM (that was before Deep Learning was a thing).

15 years later, I am thinking about topics for a postdoc after my Ph.D. on the participatory design of telerobotic puppets. My experience in machine learning is limited (mostly using models, with some training and fine-tuning), but it seems that there is a dire need for more human-centered and sustainable approaches to creating AI.

In my vision, I imagine a participatory workshop where participants try to explain their deep thoughts about life to a ThousandBrains robot (or puppet), maybe teaching it how to behave in their culture. I realize that this is quite a stretch, even for a long-term project, but maybe the first step could be to try and leverage existing LLMs to fast-track Monty into understanding language. Specifically, I’ve been wondering (and of course discussing with LLMs) whether it’s possible to hook into some middle layer of a pre-trained transformer (maybe Llama?) and convert its activations to CMP, so that Monty understands semantics but learns context and action-predictions on its own. I have seen some discussions of similar ideas in this forum but didn’t find any concrete implementation suggestions.
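To make the "hook into a middle layer" part a bit more concrete, here is the kind of starting point I have in mind - just an untested sketch, where the model name, the layer index, and the mean-pooling over tokens are arbitrary choices of mine, not anything Monty-specific:

```python
# Untested sketch: pull activations from one middle layer of a pre-trained LLM.
# The model name and layer index are placeholders (Llama checkpoints are gated;
# any small causal LM would do for prototyping).
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-3.2-1B"  # placeholder choice
MIDDLE_LAYER = 8                        # assumed "semantic" layer; would need probing

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME, output_hidden_states=True)
model.eval()

def middle_layer_embedding(text: str) -> torch.Tensor:
    """Return a mean-pooled hidden state of one middle layer for `text`."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # hidden_states is a tuple of (num_layers + 1) tensors of shape (1, seq_len, dim)
    layer = outputs.hidden_states[MIDDLE_LAYER]
    return layer.mean(dim=1).squeeze(0)  # crude pooling over tokens

print(middle_layer_embedding("the cup is on the table").shape)
```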

My hope is that this approach could be sustainable, since the heavy pre-training of LLMs is already done and the Monty integration might even run on edge devices. Also, I feel that an interactive learning experience that uses language could showcase the main thing Monty can do that LLMs cannot: actually learning concepts and behavior, rather than just maintaining a stack of text prompts as ‘memory’ (like in the film Memento).

Looking forward to your thoughts!
/Avner


Hi @avner.peled , welcome to the forums!

It sounds like an interesting project. I don’t want to put you off combining Monty with LLMs - there could certainly be scope for some interesting demos there - but here are a few thoughts that might be helpful.

Re. a system learning about someone’s culture through conversation - as you might be aware, deep learning systems (including LLMs) are notoriously bad at continual learning, and also at learning from limited amounts of data. While there are approaches that can be taken with fine-tuning or in-context learning, having a system that learns dynamically and quickly from a conversation with a person is not something that current technology (Monty or deep learning) would support well. However, it is the kind of thing we think Monty would excel at in the long term.

On the note of long-term research and language, @vclay made a really nice post about how we think about language and Monty. As described there, we think it would be a mistake to try to shortcut language understanding with LLMs, which do not have grounded, structured language concepts, but rather have become statistical text-prediction systems. Of course, I don’t want to stop you from exploring whether Monty could be controlled with voice commands using an LLM as an interface - that definitely sounds like it could make for some interesting demos. However, in terms of a long-term solution to the problem you’re describing, I think it will unfortunately have to wait until we have language capabilities in a thousand-brains system like Monty.

If you’d like to contribute to our roadmap (including during your post-doc) to get us there quicker, please do check out our How You Can Contribute Page. We’d love to have you involved.


Thank you, Niels, for the thorough answers and references.
I’d like to refine my question and get your feedback on the specific idea I mentioned: using only a middle layer of an LLM as a gateway for Monty to learn a linguistic model.
For example, here is one paper analyzing the roles of different layers in LLaMA. What I was hoping is to shortcut only the fundamentals of language - basic syntax and the basic meanings of words and sentences (maybe also through audio) - and then let Monty develop real structural concepts and nuances that combine the LLM’s basic processing of speech with Monty’s higher-level spatial reasoning. Does that make sense?
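To show what I mean by "gateway", here is roughly how I imagine wrapping such an embedding into a CMP-shaped message. The field names below are only my guesses from the public docs (features observed at a pose), not the actual tbp.monty API, and using the token position as a fake "location" is exactly the part I am unsure about:

```python
# Rough, untested sketch. Field names are guesses at a CMP-shaped message,
# not the real tbp.monty State API.
import numpy as np

def embedding_to_cmp_message(embedding, token_index: int) -> dict:
    """Treat a token's LLM embedding as a 'feature' observed at a fake 1D 'location'."""
    return {
        "location": np.array([float(token_index), 0.0, 0.0]),  # token position as location
        "morphological_features": {
            "pose_vectors": np.eye(3),       # identity orientation; language has no real pose
            "pose_fully_defined": False,
        },
        "non_morphological_features": {
            "llm_embedding": np.asarray(embedding, dtype=np.float32),
        },
        "confidence": 1.0,
        "sender_id": "llm_sensor_module_0",  # hypothetical sensor-module name
        "sender_type": "SM",
    }
```

A learning module would then receive a sequence of these "observations" as it moves along a sentence, which at least has the right shape for Monty, even if the reference frame itself is the questionable part.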


No worries - yeah, I think taking an approach like this could have a variety of interesting use cases.

To clarify, without grounding from the beginning, the system is unlikely to develop a robust mapping between language and sensorimotor concepts as diverse as “on”, “below”, “within”, “open”, etc. However, I can imagine how it could map between language and things like object labels.

If this enables you to explore an application you are interested in, like issuing a voice command to have a Monty robot retrieve an object that is named, then that is great! (bearing in mind that manipulating the environment is also something that is still in our research pipeline)

Overall I think your project sounds really interesting; I’m just concerned it might be a bit ambitious given the capabilities of both Monty and LLMs today. If you are interested in an interactive voice+Monty demo, I would suggest simplifying at least the first version to something like getting Monty to point to a named object. We are close to supporting multiple objects in Monty (see for example this policy that would need to be implemented first), so I think that would be doable.
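For what it’s worth, a first pass at that simplified demo could be as small as matching the spoken or typed object name against the labels of objects Monty has already learned - something like the sketch below (completely untested; `middle_layer_embedding` is the helper you sketched earlier in the thread, and the object IDs and any downstream "point at object" action are placeholders for interfaces that don’t exist yet):

```python
# Untested sketch: resolve a voice/text command to one of Monty's learned
# object IDs by embedding similarity. Object IDs and the downstream "point"
# action are placeholders - that policy still needs to be implemented.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def resolve_object_id(command: str, known_objects: dict) -> str:
    """known_objects maps a Monty object ID to an embedding of its human-readable name."""
    query = middle_layer_embedding(command).numpy()
    return max(known_objects, key=lambda obj_id: cosine(query, known_objects[obj_id]))

# Hypothetical usage:
# known = {"mug": middle_layer_embedding("mug").numpy(),
#          "spoon": middle_layer_embedding("spoon").numpy()}
# target = resolve_object_id("point to the coffee cup", known)  # -> "mug", hopefully
```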