Hey @xiaowenhao, welcome to the forums! Those are good questions. A lot of it depends on whether we're discussing how Monty currently works vs. our long-term vision.
Re. motivation in Monty at the moment:
- Your general understanding is correct - Monty moves in order to either recognize objects or build models of them. In our current experiments, the episode typically ends once the object is recognized, although this is not always the case. This is largely due to constraints around what we have implemented so far.
- In particular, adding hierarchy, as well as the ability to switch policies, is still a work in progress, so each LM has a pretty simple internal “motivation” driving how it acts in the world. For simplicity, this basically consists of learning about objects and inferring which object is being sensed. More specifically, we have lower-level policies that the LM can engage during learning (like systematically moving over the object, i.e. the scan policy) and during inference (like rapidly exploring the surface of the object); a rough sketch of this is shown below.
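
To make that concrete, here is a minimal, purely illustrative sketch of that two-mode “motivation” (the class, method, and policy names are hypothetical and not Monty's actual API): if no stored model explains the current evidence well enough, the LM falls back to a learning-style policy; otherwise it uses an inference-style policy.

```python
from typing import Literal

class SimpleLM:
    """Toy LM whose only 'motivation' is to recognize or learn about objects."""

    def __init__(self):
        self.mode: Literal["inference", "learning"] = "inference"

    def choose_policy(self, evidence: dict[str, float]) -> str:
        # If no stored model explains the evidence well, switch to learning,
        # i.e. systematically scan the object to build a new model of it.
        if not evidence or max(evidence.values()) < 0.5:
            self.mode = "learning"
            return "scan_policy"           # systematic coverage of the object
        # Otherwise keep trying to recognize the object as quickly as possible.
        self.mode = "inference"
        return "rapid_exploration_policy"  # move to quickly disambiguate

lm = SimpleLM()
print(lm.choose_policy({"mug": 0.2}))  # -> scan_policy (learn this object)
print(lm.choose_policy({"mug": 0.9}))  # -> rapid_exploration_policy (infer)
```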
Re. motivation in future versions:
- One could argue that in the absence of any outside drive, an LM's default motivation is to try to predict what it is seeing (inference), and if it doesn't understand what it is seeing, to learn about it. This is coarsely captured with the baked-in policies we currently have, and something like this will probably continue to be present in our LMs in the future.
- In the long term, however, we imagine that goal-states will generally drive the motivation of an LM, and that these will come from higher-level LMs. For example, an LM that knows about coffee machines might receive the goal-state “brewed coffee is in the machine”. This might have come from an LM that is trying to reduce the mental fatigue of the agent. The coffee-machine LM might then send goal-states to yet other LMs in order to achieve its driving goal, or it might interface directly with the motor system (see the sketch after this list).
- In the brain, the top-level driver of motivation is indeed likely to be “old brain” structures like the hypothalamus and basal ganglia. In Monty, however, the top-level goal-state might be set by an external source, like a human programmer.
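
Purely as an illustration of that cascade (the names and data structure below are mine, not Monty's actual interfaces), a goal-state could be as simple as a target state plus a sender, with a higher-level LM's goal being decomposed into sub-goal-states for lower LMs or the motor system:

```python
from dataclasses import dataclass, field

@dataclass
class GoalState:
    target: str                       # the desired state of the world
    sender: str                       # which LM (or outside source) set it
    sub_goals: list["GoalState"] = field(default_factory=list)

def fatigue_lm() -> GoalState:
    # Top-level driver: in the brain, something like the hypothalamus;
    # in Monty, possibly just a goal set by a human programmer.
    return GoalState(target="brewed coffee is in the machine",
                     sender="fatigue_lm")

def coffee_machine_lm(goal: GoalState) -> GoalState:
    # Decompose the incoming goal-state into goal-states for other LMs
    # or for the motor system.
    goal.sub_goals = [
        GoalState(target="water reservoir is full", sender="coffee_machine_lm"),
        GoalState(target="brew button is pressed", sender="coffee_machine_lm"),
    ]
    return goal

top_goal = coffee_machine_lm(fatigue_lm())
for sub in top_goal.sub_goals:
    print(f"{sub.sender} requests: {sub.target}")
```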
Re. your comments on moving objects:
- I would find it helpful if you were to draw out your specific proposal. However, as you may have seen in some of our recordings, we are thinking a lot about separating out behavior modeling from morphology modeling. I think this aligns well with your general intuition.