New Tutorials on Using Monty in Custom Applications

Hi @srgg6701 thank you for the kind words!

Communication between Monty and the user is an interesting topic. As we are not yet at the stage where Monty models language, I can only offer some general thoughts on the topic:

  • The only outputs of the brain are actions, whether those are commands to move the limbs, commands to move the eyes, or commands to move the muscles of the mouth to produce speech or the hand to write or type. All of them are motor commands.
  • Similarly, the outputs of Monty’s LMs go to the motor system, which then produces actuator-specific motor commands. Since we are not dealing with a biological system, we have a few more options for the kind of motor system we use. You could potentially have a motor system that outputs binary code or tokens. Or it could have pre-learned primitives for how to write letters or sound out words instead of having to learn this from scratch. But this is more about quick, specific solutions, not a requirement.
  • To tell Monty what it should do, we currently imagine using “goal states” (see the sketch after this list). A goal state is a message in the CMP format that specifies the state the world should be in (e.g. “I want the cup at location x” or “I want the cup to be filled with coffee”). The LMs can then break this high-level goal down into subgoals (e.g., “I need the agent to go to the kitchen”, “I need to start the coffee machine”, …) and use their internal, structured models to figure out how to achieve them by outputting actions.
  • To specify the highest-level goal state, you could potentially use an LLM that translates a natural language query into a CMP signal. Eventually, though, Monty should be able to model language itself and turn it into a goal state (but this is far out on our research roadmap).
  • Side note: Currently, Monty’s only goal is to model the world and infer what it is sensing. This comes from its intrinsic drive to learn and resolve uncertainty, not from any specific task provided from outside. For its output, we basically insert an electrode into Monty’s brain and measure the representation that an LM outputs (to the next higher-level LM, not to the motor system), which is its classification of what object and pose it currently senses (in the form of a CMP signal).
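
To make the goal-state idea a bit more concrete, here is a minimal, purely illustrative Python sketch. The `GoalState` class, its fields, and `decompose_goal` are hypothetical names made up for this post; they are not Monty’s actual CMP message format or API.

```python
from dataclasses import dataclass

# Hypothetical sketch of a CMP-like goal-state message. The field names are
# made up for illustration and are NOT Monty's actual message format.
@dataclass
class GoalState:
    object_id: str                     # what the goal is about, e.g. "cup"
    location: tuple                    # desired location in some reference frame
    pose: tuple = (0.0, 0.0, 0.0)      # desired orientation (e.g. Euler angles)
    confidence: float = 1.0            # how strict/certain the goal is


def decompose_goal(goal: GoalState) -> list[GoalState]:
    """Hypothetical decomposition of a high-level goal into subgoals.

    In Monty this would be done by learning modules using their internal,
    structured models; here it is just a hard-coded illustration.
    """
    if goal.object_id == "cup" and goal.location == ("kitchen", "counter"):
        return [
            GoalState(object_id="agent", location=("kitchen",)),          # go to the kitchen
            GoalState(object_id="cup", location=("kitchen", "counter")),  # place the cup there
        ]
    return [goal]  # no known decomposition; pass the goal through unchanged


# "I want the cup at location x" expressed as a goal state.
high_level_goal = GoalState(object_id="cup", location=("kitchen", "counter"))
for subgoal in decompose_goal(high_level_goal):
    print(subgoal)
```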

Sorry if this goes into more detail than you asked for. The short summary is that, in the short term, it might be useful to use LLMs as an interface to translate between human language and Monty’s language (CMP). In the longer term, however, Monty should also be able to model, understand, and output human language itself (using the same principles as our brains do).
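
Purely as an illustration of the “LLM as interface” idea (nothing like this exists in Monty today), here is a small sketch of what that glue could look like. `ask_llm` stands in for whatever LLM client you might use, and the JSON schema is invented for the example.

```python
import json

def ask_llm(prompt: str) -> str:
    # Placeholder: a real implementation would call an LLM API here.
    return '{"object_id": "cup", "location": [0.4, 0.1, 0.9]}'


def natural_language_to_goal_state(request: str) -> dict:
    """Translate a user request into a CMP-like goal-state dict (illustrative only)."""
    prompt = (
        "Convert this request into JSON with keys 'object_id' and 'location': "
        + request
    )
    return json.loads(ask_llm(prompt))


print(natural_language_to_goal_state("Please put the cup on the shelf."))
```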
In case you are interested, I wrote a couple more thoughts on language in Monty here: Abstract Concept in Monty - #4 by vclay. You can also check out a discussion on using LLMs with Monty and the potential challenges here: Hello, and thoughts about leveraging LLMs in Monty - #3 by avner.peled

Best wishes,
Viviane