Thanks for this great response already @HumbleTraveller
Just to add a few more thoughts to this:
[Disclaimer] Since we have not yet implemented or tested any ideas related to language, what follows is still mostly speculative.
- Humans don’t learn just language: LLMs can give the strange impression that meaningful language understanding can be learned from reading vast amounts of text alone. This is not how humans learn language. In the first years of life, most learning is spent figuring out how to sense and interact with the world. Language, and especially abstract thought, comes as one of the last things we figure out. A child first learns what an object looks like, feels like, and how to interact with it, and can then, very quickly, associate words with it (see the literature on “fast mapping”). The same goes for adults. Before Star Wars came out, you had probably never seen a lightsaber or heard the word, but you were able to instantly pick up the concept, recognize it, and name it again later. This is a long-winded way of leading up to my main point:
- Language in humans is grounded in models learned through sensorimotor interaction: We imagine that Monty will learn associative connections between linguistic models, models of physical objects, and models of non-physical concepts. In the brain, this could be implemented by long-range associative connections, such as those found between neurons in layers 2/3. Think about reading the word “cat”. When you read this, you probably also mentally “hear” how the word sounds and invoke a mental “image” of a cat.
Here is an image of a slightly more complex example: reading the sentence “The cup is on the table”. Learning modules at the lowest level may recognize the individual strokes, which are then composed into letters. The relative arrangement of letters can be recognized as words. The words can then have associative connections to models of objects (e.g., tactile and visual models of a cup and table) or invoke spatial relationships (“on”). Together, these form a mental image in your head. You know exactly what the words refer to, and this is not just based on statistical regularities in text you have previously read.
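To make the idea of associative connections a bit more concrete, here is a minimal toy sketch. Everything in it (the dictionary structure, the function names, the model identifiers) is a hypothetical illustration, not Monty's actual API: recognized letters compose into a word, and the word's linguistic model follows associative links to sensory models of the object.

```python
# Hypothetical associative links from a linguistic model (a word) to
# sensory models of the same object. These names are illustrative
# assumptions, not part of any implemented system.
ASSOCIATIONS = {
    "cup": {"visual": "cup_visual_model", "tactile": "cup_tactile_model"},
    "table": {"visual": "table_visual_model", "tactile": "table_tactile_model"},
}

def letters_to_word(letters):
    """Compose recognized letters into a word (one level of composition)."""
    return "".join(letters).lower()

def invoke_models(word):
    """Follow associative links from a word to the sensory models it evokes."""
    return ASSOCIATIONS.get(word, {})

word = letters_to_word(["C", "U", "P"])
models = invoke_models(word)
# Recognizing the word also activates the linked visual and tactile models,
# analogous to "hearing" a word and picturing the object it names.
```

The point of the sketch is only the structure: recognition at one level (letters to a word) triggers lateral, associative retrieval at another (word to sensory models), rather than the word's meaning living in text statistics alone.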
- Even abstract space may be anchored in physical space: This is the most speculative idea, and it keeps coming up in our research meetings. It is questionable whether abstract concepts are represented in a different type of space, or whether they are just more abstract features (object ID outputs from lower-level LMs) anchored in the same types of reference frames as physical objects. A couple of clues for this: 1) it is very hard for us to think in >3D space (e.g., try understanding quaternions!), and 2) a lot of our language about abstract concepts uses physical expressions (e.g., “these two people are close”, “they broke up”, “we grew apart”, “this idea is far-fetched”, “I was way off target”, “it’s a long shot”, …). Often, when we try to understand more abstract things, we visualize them in physical space (e.g., putting historical events on a timeline, putting family relationships onto a family tree, drawing a mathematical expression as a function, …). There is even an interesting theory I stumbled upon the other day (by Jerome Bruner) that the best way to teach children is to go from enactive (learning about something through physical interaction), to iconic (learning about it through a visual representation), to abstract (connecting the concept to language and symbols). For example, to teach children about division, first have them divide an actual cake between each other, then have them draw a cake cut into pieces, then introduce mathematical symbols (https://www.researchgate.net/publication/376580822_Bruner's_3_Steps_of_Learning_in_a_Spiral_Curriculum).
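One way to picture the “same reference frames” hypothesis is that the machinery for measuring physical distance could apply unchanged to abstract features placed in such a frame. The sketch below is purely illustrative (the coordinates and the idea of a “relationship frame” are my assumptions, nothing here is implemented): social “closeness” falls out of the same distance computation used for locations on an object.

```python
import math

# Hypothetical "relationship frame": abstract features (people) placed at
# locations in an ordinary 3D reference frame, just like points on a cup.
relationship_frame = {
    "alice": (0.0, 0.0, 0.0),
    "bob":   (0.5, 0.2, 0.0),   # "close" to alice
    "carol": (5.0, 4.0, 0.0),   # they "grew apart"
}

def closeness_rank(frame, a, b):
    """Plain Euclidean distance; smaller means 'closer' in both the
    physical and the metaphorical sense."""
    return math.dist(frame[a], frame[b])

# The expression "Alice and Bob are close" is then literally a statement
# about distance in the frame: closeness_rank(..., "alice", "bob") is
# smaller than closeness_rank(..., "alice", "carol").
```

If abstract concepts really are features in ordinary reference frames, this would also explain why the physical-distance metaphors in the list above feel so natural: they would not be metaphors at all, but descriptions of the underlying representation.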
I hope those extra thoughts are useful!
- Viviane
