I am delighted to see that you have open-sourced the Monty project and its accompanying codebase - thank you for making this valuable resource available to the community.
I have watched the videos and reviewed some of the documentation. However, I find myself seeking further clarity regarding the generalizability of the framework. From my understanding, the Monty framework dynamically learns an object’s morphology graph and pose through sensorimotor learning.
For instance, consider the task of learning molecular structures. Given a molecule’s SMILES string (e.g., C([C@@H]1C@HO)O), its secondary structure (which can be represented as a graph), or its tertiary structure (represented as points in 3D coordinates), how would the sensory module in Monty approach learning the discrete graph structure and 3D point configuration? This challenge is particularly intriguing given the potential absence of well-defined normal or curvature vectors to guide pose estimation.
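To make the "no surface normals" concern concrete: one could still derive a pose-like local reference frame at each atom purely from the positions of its bonded neighbors. The sketch below is hypothetical and not Monty's actual API; `local_frame` is an illustrative name, and the Gram-Schmidt construction is just one plausible stand-in for the normal/curvature directions used on solid objects.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def unit(v):
    n = math.sqrt(dot(v, v))
    return tuple(x / n for x in v)

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def local_frame(atom, neighbors):
    """Orthonormal frame (u, v, w) at `atom`, built from its two nearest
    neighbors via Gram-Schmidt. A hypothetical substitute for the surface
    normal and principal-curvature directions that exist on solid objects."""
    a, b = sorted(neighbors, key=lambda p: math.dist(atom, p))[:2]
    u = unit(sub(a, atom))                    # first axis: toward nearest neighbor
    v_raw = sub(b, atom)
    proj = tuple(dot(v_raw, u) * c for c in u)
    v = unit(sub(v_raw, proj))                # second axis: orthogonalized
    w = cross(u, v)                           # third axis completes the frame
    return u, v, w
```

Given three atom positions, this yields a well-defined orientation at each node even though a molecular graph has no surface to take normals from.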
Additionally, would it be possible to input a novel, unseen sequence (e.g., a SMILES string) and have Monty predict its potential secondary or tertiary structure?
I appreciate your time and insights on this matter. Please find attached images of 2D and 3D molecular graphs for reference.
Thanks for your question - it depends a lot on how the data is available. Given a static dataset of SMILES strings mapping onto secondary and tertiary structures, Monty is not going to work well; this kind of mapping is a good example of where a general function-approximation algorithm like deep learning is useful.
Where Monty could eventually shine is when it can control experimental devices that allow it to probe further information about the world - e.g. if Monty had some hypotheses about the structure and wanted to test them through various experiments, probing the space where it is uncertain. We embed representations in 3D coordinates, which can accommodate any graph structure that fits in at most three dimensions (strings, graphs defined by edges, or 3D point clouds), so in theory all of these things can be represented.
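The claim that strings, edge-defined graphs, and point clouds all fit in one 3D-embedded representation can be sketched as follows. This is an illustrative data structure, not Monty's real object-model class; `ObjectModel` and its fields are assumptions for the example.

```python
from dataclasses import dataclass, field

@dataclass
class ObjectModel:
    """Hypothetical sketch: nodes embedded at 3D coordinates, each carrying a
    feature, plus explicit edges. A chain of nodes gives a string, edges give
    a general graph, and nodes alone give a point cloud."""
    points: dict = field(default_factory=dict)    # node id -> (x, y, z)
    features: dict = field(default_factory=dict)  # node id -> e.g. element symbol
    edges: set = field(default_factory=set)       # frozenset({id1, id2}) bonds

    def add_node(self, nid, xyz, feature):
        self.points[nid] = xyz
        self.features[nid] = feature

    def add_edge(self, a, b):
        self.edges.add(frozenset((a, b)))

# A water molecule as a tiny example: one oxygen, two hydrogens, two O-H bonds.
water = ObjectModel()
water.add_node(0, (0.000, 0.000, 0.000), "O")
water.add_node(1, (0.757, 0.586, 0.000), "H")
water.add_node(2, (-0.757, 0.586, 0.000), "H")
water.add_edge(0, 1)
water.add_edge(0, 2)
```

Dropping the `edges` set recovers a plain point cloud; restricting edges to consecutive ids recovers a string, so one container covers all three cases mentioned above.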
However, your question about how Monty would learn to generalize a mapping between these levels of representation is a good one, and relates to how Monty can learn a mapping between different spaces (e.g. meeting a new family and mapping these people onto an abstract family-tree structure). We are still figuring out exactly how this would work in a simpler case like the family tree. In the molecule case the rules are much more complex, so learning this is definitely not something Monty can do now. However, just like a human scientist, the ultimate aim would be for Monty to learn to do this mapping based on a causal understanding of the world, informed by the above-mentioned experiments. This is in contrast to end-to-end black-box function approximation. It may be a while before we reach that point, but it would certainly be a useful tool once developed! Hope that makes sense.
Maybe let’s imagine that the agent can modify the molecule graph (for example, adding or deleting edges) and can visit each node to explore the graph. Would that be helpful?
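The proposal above could be sketched as an agent with two edit actions and a walk action that visits the graph one node at a time, loosely mimicking one-observation-per-step sensing. `GraphAgent` is a hypothetical name for this illustration, not anything in the Monty codebase.

```python
class GraphAgent:
    """Hypothetical agent that can edit a molecule graph (add/delete edges)
    and explore it node by node."""

    def __init__(self, adjacency):
        # adjacency: node -> set of neighboring nodes
        self.adj = {n: set(nbrs) for n, nbrs in adjacency.items()}

    def add_edge(self, a, b):
        self.adj.setdefault(a, set()).add(b)
        self.adj.setdefault(b, set()).add(a)

    def delete_edge(self, a, b):
        self.adj[a].discard(b)
        self.adj[b].discard(a)

    def explore(self, start):
        """Depth-first walk: the sequence of nodes the agent would observe."""
        seen, stack, path = set(), [start], []
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            path.append(node)
            stack.extend(sorted(self.adj.get(node, ()), reverse=True))
        return path

agent = GraphAgent({0: {1}, 1: {0, 2}, 2: {1}})
agent.add_edge(2, 0)        # hypothesize a ring closure
print(agent.explore(0))     # → [0, 1, 2]
```

As the reply below notes, the edits themselves only become meaningful if some environmental signal can score the resulting graphs.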
BTW, I noticed that Geometric Algebra (Clifford Algebra) and differential forms may be relevant to your research. They may simplify operations in geometric spaces.
Here are some gentle introductions to geometric algebra,
and here is a gentle introduction to differential forms.
Good question. I would say that will only be helpful if Monty can somehow get new information from the environment (e.g. via experiments or some other form of “sensory input”) to either validate or refute the changes it is making to the edges. Otherwise it doesn’t have any information to ground and test its hypotheses about the world. Does that make sense?
Thanks for sharing that; it looks interesting and I will check it out.