Hi everyone,
Here is a recently released paper from INSAIT/Sofia University and ETH Zurich about dynamic point clouds as an embodiment-agnostic representation. As the authors put it:
In this work, we approached the challenge of learning a generalist manipulation policy from a mix of labeled and unlabeled video and proposed MotoVLA together with a two-stage training procedure. By establishing dynamic point clouds as an embodiment agnostic representation, our approach successfully transfers knowledge from video to manipulation motion priors. Using simulation and real-world experiments, we demonstrate a consistently improved model performance in in- and out-of-domain settings and showcase the direct transfer from human demonstration to robot actions.
Source: Generalist Robot Manipulation Beyond Action Labeled Data
I’m sharing this because the approach bears a slight resemblance to TBP/Monty: it generalizes points of interest (a cloud) and their movement in a human hand, and transfers that behavior at a high level to a robot hand. To me it is a lighter, simplified version of what Monty is built for and capable of, yet still an interesting approach.
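To make the analogy concrete, here is a minimal Python sketch of the core idea as I understand it: a demonstration is reduced to a set of tracked 3D points whose frame-to-frame displacements form an embodiment-agnostic motion prior, which a robot controller could then follow. All the function names here are hypothetical illustrations, not the paper's actual code, and the averaging step is a crude stand-in for their learned policy.

```python
import numpy as np

def motion_prior(point_tracks: np.ndarray) -> np.ndarray:
    """Turn a dynamic point cloud into per-step displacement vectors.

    point_tracks: (T, N, 3) array - N scene points tracked over T frames,
    e.g. points on a human hand and on the manipulated object.
    Returns: (T-1, N, 3) displacements, a description of how the scene
    moved that is independent of whose hand moved it.
    """
    return np.diff(point_tracks, axis=0)

def retarget_to_effector(displacements: np.ndarray) -> np.ndarray:
    """Collapse per-point motion into a single end-effector trajectory.

    Averaging over points is a placeholder for the learned mapping
    from motion priors to robot actions described in the paper.
    """
    return displacements.mean(axis=1)  # (T-1, 3) per-step effector deltas

# Toy usage: 5 frames of 4 tracked points drifting 1 cm per frame along +x.
tracks = np.cumsum(np.tile([[0.01, 0.0, 0.0]], (5, 4, 1)), axis=0)
deltas = retarget_to_effector(motion_prior(tracks))
print(deltas)  # each step moves the effector ~1 cm along x
```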
Let me know if you see anything else in this paper that might be interesting to pick at.
Thank you,
Alex Kamburov