Abstract Concept in Monty

Hi @vclay @nleadholm @jhawkins
I understand that to model objects in the 3D world, like a coffee mug, Monty uses the pose and features at each location, and through this it learns a representation of the object in the world.

But how does Monty learn abstract concepts like language, politics, maths, etc.? For example, if I want to make Monty learn a paragraph of a book, how can I go about it? What would be the equivalent of pose in this abstract world? Also, how will Monty understand the semantic meaning and relationships between various words?

Thanks,
Avinash

2 Likes

Hey there Avinash,

It's been a little while since you asked this, so I figured I'd step in and take a crack at answering your question.

To start, I think we should ask ourselves: what is a 'pose'? To me, pose is context. Applied spatially, a pose may be viewed as a sensor's position and orientation relative to the environment.

However, applied abstractly, a pose might be a paragraph in a book, a mathematical state within a given problem, or perhaps a set of assumptions within a political model.

We actually had a discussion on this very topic some time back, about how to apply sensorimotor learning to network topology. You can check it out here: Extending and Generalizing TBT Learning Models

To answer the second part of your question (how Monty learns a paragraph of a book), we’d actually want to understand how nodes and edges get applied to abstract representations.

A node might come to represent concepts and symbols (e.g., words/characters, mathematical symbols or logical propositions).

Edges then become the relationships found between those concepts and symbols (syntax rules, cause-effect chains, et cetera).

So now, with all this in mind, how does Monty come to learn a paragraph?

Imagine giving Monty a paragraph. It would:

(1) Break the text into tokens (e.g., words or concepts).
(2) Identify relationships between those tokens (subject-verb-object triples, co-reference, topic flow, etc.).
(3) Form a graph of these concepts — similar to how it models features-at-locations.
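The three steps above can be sketched in toy Python. Everything here (the function names, the single `followed_by` relation, the use of plain dicts as a graph) is invented purely for illustration and is not from the Monty codebase:

```python
from collections import defaultdict

def tokenize(text):
    """Step 1: break the text into lowercase word tokens."""
    return [w.strip(".,!?").lower() for w in text.split()]

def extract_relationships(tokens):
    """Step 2: a toy stand-in for relationship extraction --
    here we simply link each token to its successor."""
    return [(a, "followed_by", b) for a, b in zip(tokens, tokens[1:])]

def build_graph(relationships):
    """Step 3: nodes are concepts, edges are labeled relationships,
    loosely analogous to features-at-locations."""
    graph = defaultdict(list)
    for src, label, dst in relationships:
        graph[src].append((label, dst))
    return dict(graph)

graph = build_graph(extract_relationships(tokenize("The cup is on the table.")))
```

A real system would of course extract far richer relations than word adjacency, but the node/edge structure would be the same.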

Over time, Monty would learn to associate meanings based on recurring patterns of relationships, then use its evidence-based updates to reinforce hypotheses about what concepts (aka poses) mean in relation to their surrounding "environments." It's actually not terribly different from how Transformers work, albeit Monty gets there in a more sensorially grounded way.

I feel like I’m beginning to get into the weeds here a bit. Does all this make sense?

2 Likes

Thanks @HumbleTraveller for such a detailed response. It is very thorough, so it will take some time to wrap my head around everything you said, and then, if you don't mind, I will come up with some follow-up questions.

Thanks again :grinning_face:

1 Like

Thanks for this great response already @HumbleTraveller

Just to add a few more thoughts to this:

[Disclaimer] As we have not implemented or tested any ideas related to language, these are mostly still speculative ideas.

  • Humans don't learn just language: LLMs give us the strange idea that meaningful language understanding can be learned from just reading vast amounts of text. This is not how humans learn language. In the first years of life, most learning is spent on figuring out how to sense and interact with the world. Language, and especially abstract thought, comes as one of the last things we figure out. A child first learns what an object looks like, feels like, and how to interact with it, and can then, very quickly, associate words with it (see the literature on "fast mapping"). The same goes for adults, too. Before Star Wars came out, you had probably never seen a lightsaber or heard that word, but you were able to instantly pick up the concept, recognize it, and name it again at a later time. This is a long-winded way of leading up to my main point:
  • Language in humans is grounded in models learned through sensorimotor interaction: We imagine that Monty will learn associative connections between linguistic models, models of physical objects, and models of non-physical concepts. In the brain, this could be implemented by long-range associative connections, such as those found between neurons in layers 2/3. Think about reading the word "cat". When you read this, you probably also mentally "hear" how the word sounds and invoke a mental "image" of a cat.

Here is an image of a slightly more complex example of reading the sentence "The cup is on the table". Learning modules at the lowest level may be recognizing the individual strokes, which are then composed into letters. The relative arrangement of letters can be recognized as words. The words can then have associative connections to models of objects (e.g., tactile and visual models of a cup and table) or invoke spatial relationships ("on"). Together, those form a mental image in your head. You know exactly what words refer to, and this is not just based on statistical regularities in text that you have previously read.
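As a very rough sketch of such cross-modal associative connections, here is an invented `AssociativeMemory` class; none of this mirrors Monty's actual implementation, it only illustrates the idea that recognizing a word model can co-activate linked sensory models:

```python
class AssociativeMemory:
    """Toy store of long-range associative links between models
    learned in different modalities (word, visual, tactile, ...)."""

    def __init__(self):
        self._links = {}  # model id -> set of associated model ids

    def associate(self, a, b):
        """Create a bidirectional associative connection."""
        self._links.setdefault(a, set()).add(b)
        self._links.setdefault(b, set()).add(a)

    def invoke(self, model_id):
        """Return the models co-activated with `model_id`."""
        return self._links.get(model_id, set())

memory = AssociativeMemory()
memory.associate("word:cup", "visual:cup")
memory.associate("word:cup", "tactile:cup")
memory.associate("word:table", "visual:table")
```

Reading "cup" would then invoke both the visual and tactile cup models, which is the toy analogue of the mental image described above.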

  • Even abstract space may be anchored in physical space: This is the most speculative one that keeps coming up in our research meetings. It is questionable whether abstract concepts are represented in a different type of space or whether they are just more abstract features (object ID outputs from lower-level LMs) anchored in the same types of reference frames as physical objects. A couple of clues for this are that 1) it is very hard for us to think in >3D space (e.g., try understanding quaternions!) and 2) a lot of our language when talking about abstract concepts uses physical expressions (e.g., "these two people are close", "they broke up", "we grew apart", "this idea is far-fetched", "I was way off target", "it's a long shot", …). Often when we try to understand more abstract things, we visualize them in physical space (e.g., putting historical events on a timeline, putting family relationships onto a family tree, drawing a mathematical expression as a function, …). There is even an interesting theory I stumbled upon the other day (by Jerome Bruner) that the best way to teach children is to go from enactive (learning about something through physical interaction), to iconic (learning about it through a visual representation), to abstract (connecting the concept to language and symbols). For example, to teach children about division, first have them divide an actual cake between each other, then have them draw a cake cut into pieces, then introduce mathematical symbols (https://www.researchgate.net/publication/376580822_Bruner's_3_Steps_of_Learning_in_a_Spiral_Curriculum ).

I hope those extra thoughts are useful!

  • Viviane
12 Likes

Hi Viviane,
Thanks a lot for the response. This is super helpful. I have some follow up questions

  1. From my observation and own experience, one thing that is evident to me about learning is that it often happens by questioning. We observe something by interacting with the world, then ask questions about it, and that leads to learning and knowing new things, and the loop continues. How is that questioning part going to be incorporated into Monty? Will it be through the motor policy or some other mechanism?

  2. After you highlighted anchoring learning in physical space, I could think of how I learned about the abstract concept of "democracy". Since childhood, we keep seeing the campaigns and posters during election time, and voting days are special in India, so they grab our attention. I would ask what it all was and was simply told that it was an election, but the concept of democracy was still not explained. Finally, when I read about it in books and was taught it in school, I was able to associate it with all my prior experiences. Now my question is: will Monty also need to go through such a rigorous process, where it is shown many such life experiences before it learns the concept?

  3. In a child's learning, there is always a supervisor, like parents or siblings. Taking your example of understanding "cup on the table": one of the ways a child learns this association is by sometimes being told to put the cup in their hand on the table. For a very young child, this sentence might be repeated multiple times in one instance, augmented with visual actions from the parents, like saying "put that on the table", "on the table", and showing with their hands what "on the table" means. My question here is: will Monty also need a supervisor for such learning, and how do we pass multimodal inputs, both verbal and visual, to Monty?

Thanks,
Avinash

1 Like

In Bret Victor’s tour de force presentation, Media for Thinking the Unthinkable, he talks about different ways of thinking. For example, at about 13:15, he discusses Jerome Bruner’s ideas, bringing up the way that Watson & Crick used mechanical aids to help them visualize the structure of DNA.

His companion essay, An Ill-Advised Personal Note about ā€œMedia for Thinking the Unthinkableā€, is also worth reading…

3 Likes

Hi @ak90
Great follow-up questions!

  1. Just like a human, Monty already actively tests hypotheses to resolve uncertainty about what it is sensing. At any point in time, Monty may have several hypotheses about what it is sensing. This might be ambiguity about which object it is sensing, the pose of that object, or both. The more Monty senses, the more certain it gets about what it is sensing. The hypothesis that has accumulated the most evidence so far is called the most likely hypothesis (mlh in the code).
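A minimal sketch of this evidence-accumulation idea, with all names invented (Monty's actual evidence-based learning module is far more involved):

```python
def update_evidence(evidence, observation_scores):
    """Add each hypothesis's match score for the new observation
    to its accumulated evidence."""
    for hypothesis, score in observation_scores.items():
        evidence[hypothesis] = evidence.get(hypothesis, 0.0) + score
    return evidence

def most_likely_hypothesis(evidence):
    """The mlh is simply the hypothesis with the most evidence so far."""
    return max(evidence, key=evidence.get)

evidence = {}
# First observation (the handle): fork and spoon look similar here.
update_evidence(evidence, {"fork": 0.8, "spoon": 0.7})
# Second observation (the tines): strongly favors the fork.
update_evidence(evidence, {"fork": 0.9, "spoon": 0.2})
mlh = most_likely_hypothesis(evidence)
```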

We currently have one model-based policy implemented in Monty, which we call the hypothesis-testing policy; it aims to move efficiently to resolve what is being sensed. The learning module looks at the two most likely hypotheses and compares the models it has learned of them. So, for instance, the LM might be unsure whether it is sensing a spoon or a fork while it is moving along the handle. It would then compare the models of those two objects (using the pose hypotheses to align them in space) and find the point that would resolve the most ambiguity. In this case, it would suggest that the motor system move to the top of the cutlery, as this is where the two objects differ the most.
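The core of that comparison could be sketched as follows. This is a deliberately simplified toy (one scalar feature per named location, no pose alignment), and every name here is invented rather than taken from Monty:

```python
def most_discriminating_location(model_a, model_b):
    """Both models map location -> feature value; return the shared
    location where the two models' features differ the most."""
    shared = model_a.keys() & model_b.keys()
    return max(shared, key=lambda loc: abs(model_a[loc] - model_b[loc]))

# Toy cutlery models: the feature is the object's width at each location.
spoon = {"handle_mid": 1.0, "neck": 1.2, "top": 3.0}  # wide bowl at the top
fork = {"handle_mid": 1.0, "neck": 1.2, "top": 2.0}   # narrower tines

goal = most_discriminating_location(spoon, fork)
```

Moving the sensor to `goal` would resolve the spoon-vs-fork ambiguity in one step, which is the intuition behind the policy.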

You can read more about this policy here: Policy or watch this video: https://www.youtube.com/watch?v=lBWV3Yw5tCI

As this thread started with a discussion of abstract models, it is important to highlight that this policy and principle don't just apply to testing hypotheses about physical 3D objects. You could compare differences between abstract models and test hypotheses there using the exact same mechanism.

  2. That's an interesting example. It's a bit hard for me to say exactly how Monty will learn about abstract concepts like democracy, as we haven't worked on these ideas beyond the conceptual stage yet. However, I wouldn't say that a lot of past experience with a concept is necessarily required to learn about it. For example, you probably also learned about other political systems in school that are different from the one implemented in your country. You may have a more intuitive understanding of the one you were most exposed to, or of the democratic process after actually going through the process of voting, but you can still understand other systems without experiencing them personally.
  3. Yes, children learn language from their parents and other contacts, and this is like a supervisory signal. However, it is a very sparse signal. There is no one following a child around and naming everything the child sees at every moment. Also, we can generalize language to things we have never heard before. For instance, after learning what a lightsaber is, you instantly also know what "the lightsaber is on the table" would mean.

Similar to humans, Monty might learn about an object in a mostly unsupervised way (like playing with a toy to explore its shape and features). It could then quickly associate a word with that rich and structured model when it is exposed to them in combination.
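That "fast mapping" step might be sketched as follows. The class and its methods are hypothetical, purely to illustrate slow unsupervised model learning followed by one-shot word association:

```python
class ObjectModelStore:
    """Toy store separating object models (learned slowly, without
    supervision) from word labels (attached in one shot)."""

    def __init__(self):
        self.models = {}   # internal model id -> learned object model
        self.labels = {}   # word -> internal model id

    def learn_unsupervised(self, model_id, model):
        """Slow, sensorimotor learning of the object itself."""
        self.models[model_id] = model

    def fast_map(self, word, model_id):
        """One-shot association of a word with an existing model."""
        if model_id in self.models:
            self.labels[word] = model_id

    def lookup(self, word):
        """Hearing a word invokes the associated object model."""
        return self.models.get(self.labels.get(word))

store = ObjectModelStore()
# Rich model built first, e.g. by playing with the object.
store.learn_unsupervised("obj_42", {"shape": "cylinder", "handle": True})
# Then a single labeled exposure binds the word to that model.
store.fast_map("cup", "obj_42")
```

The word itself carries almost no information; the richness lives in the previously learned model that the word now points to.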

I hope this makes sense :slight_smile:

  • Viviane
2 Likes

Thanks a lot @vclay. This is super helpful.

  • So basically, we learn something, and then in our subsequent interactions with the environment we try to apply that learning; either we apply it successfully, or, if we fail, we learn something new. For example, after learning about democracy, if somebody is exposed to a country ruled by a king, they might wonder how the king is elected there, and then they would come to know that it happens by succession; here we learn a new concept called "succession", and the loop continues: exploration -----> learning -----> application in new exploration -----> either old learning consolidates or we learn new things.

  • I have a small question: does thinking also come only as a part of exploration? As far as I can tell, new knowledge is created by virtue of thinking. For example, somebody may have thought, "Can there be a better way of having a person who governs us, other than succession of kings?", and hence the concept of democracy might have arisen. If thinking is separate from exploration of the environment, then how will Monty think and create new knowledge, or will it always be bounded by human knowledge?

Thanks,
Avinash

Interesting question :thinking: As I view it, thinking is also a type of movement, but it happens only in mental space. It could be that the movement command to subcortical regions is suppressed, so that movement is just simulated in mental space (like when you imagine walking into the next room without actually doing it).
If the thinking is abstract enough (like thinking about governmental systems), it is hard to imagine what a corresponding physical movement could be, so potentially, at high levels in the hierarchy, a lot of movement always happens in mental space. But there is still movement. We usually still structure our knowledge in reference frames and have an idea of how two concepts relate to each other and how you can conceptually move between them. How this kind of movement is learned, and how it relates to physical movement, is still something we actively discuss in our research meetings, so this is still speculative.
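If one pictures this kind of mental movement as traversing relational edges in a concept reference frame, a toy sketch might look like the following. All names are invented and this is highly speculative, matching the speculative nature of the idea itself:

```python
def mental_walk(concept_graph, start, steps):
    """Simulate 'movement' by following relational edges from one
    concept to the next, without issuing any physical motor command."""
    path = [start]
    current = start
    for relation in steps:
        current = concept_graph[current][relation]
        path.append(current)
    return path

# A tiny concept graph: nodes are concepts, edges are named relations.
concepts = {
    "monarchy": {"alternative_to": "democracy"},
    "democracy": {"selects_leader_by": "election"},
}
path = mental_walk(concepts, "monarchy", ["alternative_to", "selects_leader_by"])
```

Here "thinking" is just a sequence of moves through the relational structure, analogous to physically moving a sensor over an object.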
-Viviane

2 Likes

3 posts were split to a new topic: Monty and Graphs