2025/09 - Brighton Retreat Final Presentations

The team presents the results from the Brighton focus-week retreat.

00:00 Introduction
04:00 Python Upgrade
15:32 Salience Showdown
27:47 Exponential Moving Average Evidence-Bounded Scores
42:06 Compositional Object Test Bed

1 Like

So, are you gonna leave us hanging? Who won!?

My personal pick would be Scott & Hojae hands down, both fruitful and entertaining. Plus, it paves the way for better feature extraction.

Compositional models are coming along nicely, keep up the good work. I wonder how BioSaliency would affect the results :thinking:

Ramy’s interactive visualizations are awesome, they really put things into perspective. I feel like having multiple time constants is the way to go, as Niels suggested. Probably even have some that are dynamic and context-dependent too. The human mind definitely has some level of control over at least one of them.

Regarding the Habitat “wisdom tooth extraction”, well you couldn’t have known about the 10x slowdown without trying! That’s what science is for. Have you guys made more progress since then? I do happen to have experience with interprocess communication, perhaps I could take a crack at it :eyes:

Are these the repos in question?

3 Likes

Thanks for the kind words @AgentRev :slight_smile: I think all the projects were really cool and had impressive results for just a week’s work! In the end, it was a tie between Scott and Hojae’s saliency project and Will & Tristan’s simulator extraction (I had to order some more trophies; the store is now out of SPAM cans :grinning_face_with_smiling_eyes: )

There is a new repo for the interactive visualizations here: GitHub - thousandbrainsproject/tbp.plot: Tooling for plotting tbp.monty visualizations. @rmounir put a lot of work into making them easy to use and to assemble into custom versions (a video of him walking through how to use them is coming out soon).

After the compositional testbed is integrated into Monty (PR here feat: Add compositional experiments & metrics by vkakerbeck · Pull Request #467 · thousandbrainsproject/tbp.monty · GitHub) we plan to test all our other research prototypes in the pipeline on this benchmark, including the bio saliency :slight_smile:

@tslominski is probably the best to answer on possible ways to contribute to the simulator extraction. For now, we are focusing on integrating MuJoCo as a second simulator and we are not currently working on implementing the separate simulator prototype ourselves (not because we won’t ever want to, just because of limited resources).

2 Likes

Hi @AgentRev,

Yes, those are the two repositories that we used for the demo.

As @vclay mentioned, we haven’t updated Monty with anything from the prototypes yet.

My recommendation would be to first integrate the changes that reduce the Simulator protocol to the bare minimum. Those changes are all contained in GitHub - thousandbrainsproject/prototype.simulator: Monty prototype using a simulator via a socket. While the separation made it very clear what the minimum requirements are, reducing the protocol surface is a good thing even prior to/without the separation. There’s less to implement in any simulator, including MuJoCo.

With the minimal protocol, I assume it should be easier to try out different serialization/communication approaches.
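To illustrate the idea, a pared-down protocol might look something like this; this is just a sketch with illustrative method names, not the actual tbp.monty interface:

from typing import Protocol

# Hypothetical sketch of a bare-minimum Simulator protocol; the method names
# are illustrative, not the actual tbp.monty interface. The smaller the
# surface, the less any backend (Habitat, MuJoCo, or a separate process
# behind a socket) has to implement.
class MinimalSimulator(Protocol):
    def reset(self):
        """Reset the scene and return the initial observations/state."""

    def step(self, action):
        """Apply an action and return the resulting observations/state."""

    def add_object(self, name: str, **kwargs):
        """Populate the scene with an object."""

    def close(self) -> None:
        """Release backend resources."""

With a surface that small, trying out a different transport or serialization format only means re-implementing a handful of methods.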

@jshoemaker will likely take a look at reducing the Simulator protocol, but I don’t know when that will be. Both he and I are currently integrating the latest research into Monty: I’m integrating Scott’s work, and Jeremy is integrating the compositional work. So, we aren’t working on the simulator separation (nor the Simulator protocol simplification) aspect of things yet.

1 Like

Sure, I can take a look at that. I already see there are merge conflicts to solve due to the DataLoader refactor.

Or, if you’d instead prefer help with other things right now, just designate a target.

Simplifying the Simulator would be helpful.

I think it is worth noting that prototype.simulator is prototype code, and I recommend using it as a reference for an implementation rather than attempting to merge it as is or with modifications.

Additionally, some of the prototype.simulator commits are handoffs from pair programming sessions and correspond to partial work in progress.

2 Likes

Here is the video of the judges announcing the winners.

2 Likes

@brainwaves Haha! There it is. Everyone rightfully earned their potted meat, indeed. :trophy:

@tslominski Yes I figured as much, that as-is merge attempt was just to inspect the diff. I opened an issue to continue the discussion: Simulator protocol should be reduced to the bare minimum · Issue #506 · thousandbrainsproject/tbp.monty · GitHub

4 Likes

Hello! What is the progress on integrating MuJoCo as an environment?

I see there is already a MuJoCoSimulator available, but it doesn’t implement an actuator as HabitatSim does. Does that mean it is incomplete, or that it follows a different approach?

What are the (missing?) steps to be able to train Monty in a MuJoCo environment?

Thanks!

1 Like

Work on the MuJoCo simulator has been somewhat stalled in the last few months, as other projects have taken priority. @AgentRev has also been doing a lot of work to integrate the changes that simplify the Simulator protocol, and it seemed best to wait for that work to finish before proceeding.

At a high level, I think the work left to be done on the MuJoCoSimulator is to:

  • Implement the methods of Simulator that aren’t currently implemented. This is mainly the step and reset methods, as well as add_object support for custom objects (see next bullet point). A rough stub of what this could look like is sketched after this list.
  • Add a way to load custom objects, e.g. the YCB dataset. I’ve started looking into this but haven’t made a lot of progress on it. The dataset is available in a format that MuJoCo accepts, but there may be some extra work needed to make them usable by MuJoCo.
  • Then we need to make a MuJoCoEnvironment or modify/rename HabitatEnvironment. That might not be a lot of work if it’s only using the methods from Simulator (which it should be).
  • While doing all this, making sure to add unit tests to cover all the new functionality.
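As mentioned in the first bullet, here is a very rough stub of the shape I have in mind; the class layout and helper names are placeholders, not actual tbp.monty code:

import mujoco

# Hypothetical stub following the bullets above; not actual tbp.monty code.
class MuJoCoSimulator:
    def __init__(self, scene_xml: str):
        self._model = mujoco.MjModel.from_xml_string(scene_xml)
        self._data = mujoco.MjData(self._model)

    def reset(self):
        mujoco.mj_resetData(self._model, self._data)
        mujoco.mj_forward(self._model, self._data)
        return self._observations()

    def step(self, action):
        self._apply(action)                      # translate the Monty action
        mujoco.mj_step(self._model, self._data)  # advance the physics
        return self._observations()

    def add_object(self, name, **kwargs):
        # Custom objects (e.g. YCB meshes) would be loaded/attached here.
        raise NotImplementedError

    def _apply(self, action):
        raise NotImplementedError  # hypothetical action-translation hook

    def _observations(self):
        # Render the sensor views (e.g. with mujoco.Renderer) and package
        # them in whatever structure the sensor modules expect.
        raise NotImplementedError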

There are probably some layers higher up that I’m forgetting, but I think if we get the Environment working, then the rest shouldn’t care about the differences. I was planning to focus at that point on getting a single experiment to run that uses objects from the YCB set, and then go from there.

That first bullet point is the most complicated, because we need to make sure that it behaves similarly to what HabitatSim returns, and there are subtle differences between the two libraries (e.g. differences between Habitat’s semantic sensors and MuJoCo’s segmentation rendering). I don’t think we expect to get matching results from the two simulators, since they use completely different rendering pipelines and, consequently, the sensor data we receive from them won’t match perfectly.

As for the actuator that HabitatSim implements, that’s more of an implementation detail, and so we don’t necessarily need to follow that pattern with the MuJoCoSimulator.

I’ve tried to avoid the XML definition language that MuJoCo uses and instead define scenes programmatically (see _add_primitive_object), but that might be the wrong approach. It might turn out to be easier to define the scene for an experiment declaratively in XML and not worry about adding objects in the imperative way we currently do.
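For comparison, the declarative version of a trivial scene would look something like this (the object and camera names are made up):

import mujoco

# Declarative alternative: the whole experiment scene is defined up front in
# MJCF XML instead of adding objects one by one in code.
SCENE_XML = """
<mujoco model="experiment_scene">
  <worldbody>
    <light pos="0 0 3"/>
    <geom name="floor" type="plane" size="1 1 0.1"/>
    <body name="object_0" pos="0 0 0.1">
      <geom name="object_0_geom" type="sphere" size="0.05"/>
    </body>
    <camera name="patch_0" pos="0 -0.5 0.3" xyaxes="1 0 0 0 0.5 0.8"/>
  </worldbody>
</mujoco>
"""

model = mujoco.MjModel.from_xml_string(SCENE_XML)
data = mujoco.MjData(model)
mujoco.mj_forward(model, data)  # populate positions without stepping dynamics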

6 Likes

Thank you very much for the detailed explanation!

The plan makes a lot of sense. I will dig a bit to better understand the Simulator class, what the step method requires, and how to get observations from and perform actions in the environment.

Thanks again!

2 Likes

At our meeting today, we found additional context explaining why I introduced the actuator pattern. The entire context is in RFC 4 Action Object, but the relevant bit is:

In order to support the ability of taking a generic Action and using it with different environments, we can use the Visitor pattern to declare Actions independent of environments. Actuator would declare the visitor interface and any concrete environment would implement a concrete visitor, such as HabitatActuator, that would handle actuating an Action into an environment-specific act.
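Schematically, the pattern looks like the following; the action classes and method names here are illustrative, not the exact tbp.monty signatures:

# Schematic sketch of the Visitor pattern described above; illustrative names.
class Actuator:  # the visitor interface
    def actuate_move_forward(self, action: "MoveForward") -> None: ...
    def actuate_turn_left(self, action: "TurnLeft") -> None: ...

class Action:  # declared independently of any environment
    def act(self, actuator: Actuator) -> None:
        raise NotImplementedError

class MoveForward(Action):
    def __init__(self, distance: float):
        self.distance = distance

    def act(self, actuator: Actuator) -> None:
        actuator.actuate_move_forward(self)  # double dispatch

class TurnLeft(Action):
    def __init__(self, angle: float):
        self.angle = angle

    def act(self, actuator: Actuator) -> None:
        actuator.actuate_turn_left(self)

class HabitatActuator(Actuator):  # concrete visitor for one environment
    def actuate_move_forward(self, action: MoveForward) -> None:
        ...  # translate into a Habitat-specific forward move

    def actuate_turn_left(self, action: TurnLeft) -> None:
        ...  # translate into a Habitat-specific turn

This way, a generic Action never needs to know which environment it will run in; each environment supplies its own concrete Actuator.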

1 Like

Just to understand: when thinking about supporting environments, would it make sense to be compatible with the Gymnasium interface? Gym is the de facto standard for creating environments in imitation learning and reinforcement learning research, and it includes reset and step methods as well as very general state and action spaces.

If I am understanding correctly, the work to support a new environment would be implementing wrappers that connect the observations provided by the environment, and the model’s actions, with the inputs expected by the sensor modules, e.g., transforming an image provided by the environment into a patch.

Does this make sense, or am I missing the issue?

Hi @sergioval,

I think we are seeing the same word used to describe different concepts. “Environment” in Gymnasium is not the same as “environment” in Monty.

For example, the Gymnasium env.step(action):

observation, reward, terminated, truncated, info = env.step(action)

The Gymnasium API assumes that an environment step can return a reward.

In tbp.monty, the environment has no faculty to convey any reward. It only provides observations and proprioceptive state. The reward, if any, would need to be decided in Monty (the intelligence part of the framework). This is an example of a pretty significant mismatch between the APIs, and I’m not sure I know how to reconcile it.

The existing tbp.monty environment API is heavily inspired by reinforcement learning, so we do have similar methods like .step() and .reset(). (The API will be defined in terms of protocols imminently, in the scope of refactor!: split `EmbodiedEnvironment` into protocols by AgentRev · Pull Request #672 · thousandbrainsproject/tbp.monty · GitHub).

I don’t have a clear idea of what in Monty could correspond to Gymnasium’s “environment.”

Regarding the work to support a new environment, we have the Using Monty in a Custom Application tutorial, which describes some of what’s required.

I would phrase the work as connecting observations from the environment to the sensor modules’ inputs, and connecting actions from the motor system to the environment’s inputs. Also, the proprioceptive state output from the environment needs to be connected to the motor system input.

There is a complication: our EnvironmentInterface applies transforms to the environment’s observations before passing them to the sensor modules. We expect to move that functionality into the sensor modules themselves eventually, but for now, it is the responsibility of an EnvironmentInterface. Another complication is that we have a thing called EnvironmentInterface, a thing called a Simulator, and a thing called Environment. There is a historical path for how we got here, and our goal is to eventually remove the EnvironmentInterface entirely once we migrate all its functionality. Still, for now, we have all three concepts, and transforms happen in an EnvironmentInterface.

Lastly, there are experiment-specific steps, such as populating the environment with desired objects in desired configurations, and positioning the Monty system in the environment as needed by the experiment.

3 Likes

Thank you very much for the detailed explanation. I am still quite lost, so I really appreciate it.

I completely agree on the reward bit, as this should be intrinsically motivated within the learning module. One quick and dirty alternative would be to ignore it completely.

I think I am getting it. By “connecting observations from the environment to the sensor modules’ inputs, and connecting actions from the motor system to the environment’s inputs,” you mean that current environments decouple the observation from the agent. For example, in MuJoCo, we can have external cameras that are disconnected from the robot, and this doesn’t make sense from a learning module’s perspective, as the observation is coupled with the action. So, a Monty version of MuJoCo has to make this coupling, right? Similarly, the proprioceptive state has to be part of the returned observation.

I will read about the new API defined in terms of protocols.

Thank you very much again!

1 Like

I feel I should clarify that at first glance it seems to me like you could wrap Gymnasium in a tbp.monty environment, say class GymnasiumEnvironment(Environment, ResettableEnvironment) (after https://github.com/thousandbrainsproject/tbp.monty/pull/672 is merged). In my post, I was mainly saying I don’t have a clear idea of how to make Monty more directly compatible with Gymnasium, like adopting the Gymnasium API for Monty.

As you mentioned, in a GymnasiumEnvironment like this, you could ignore the reward return. You’d also have to deal with truncated and terminated, since in tbp.monty the environment is not allowed to stop the experiment. You’d also have to figure out how to turn Monty actions into actions that Gymnasium understands.
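A very rough sketch of what such a wrapper could look like, assuming the post-#672 shape of the environment protocols (the translation helpers here are hypothetical):

import gymnasium as gym

# Rough sketch only; GymnasiumEnvironment would subclass the Environment and
# ResettableEnvironment protocols from PR #672. Exact signatures may differ.
class GymnasiumEnvironment:
    def __init__(self, env_id: str):
        self._env = gym.make(env_id)

    def step(self, action):
        gym_action = self._to_gym_action(action)  # Monty action -> Gym action
        obs, _reward, terminated, truncated, _info = self._env.step(gym_action)
        # tbp.monty environments convey no reward, so it is dropped. They also
        # can't stop the experiment, so episode ends are absorbed by resetting.
        if terminated or truncated:
            obs, _info = self._env.reset()
        return self._to_monty_output(obs)

    def reset(self):
        obs, _info = self._env.reset()
        return self._to_monty_output(obs)

    def _to_gym_action(self, action):
        raise NotImplementedError  # hypothetical translation hook

    def _to_monty_output(self, obs):
        # Would build the tuple[Observations, ProprioceptiveState] shown below.
        raise NotImplementedError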

I’m sorry, but I don’t understand what you mean by the current environment decoupling observation from the agent. The data structure that leaves the environment looks like a tuple[Observations, ProprioceptiveState] with the shape of, for example:

(
Observations(
  {
    AgentID("agent_id_0"): AgentObservations(
      {
        SensorID("patch_0"): SensorObservations(
          {
            "depth": depth_sensed_by_patch_0,
            "rgba": rgba_sensed_by_patch_0
          }
        ),
        SensorID("patch_1"): SensorObservations(
          {
            "depth": depth_sensed_by_patch_1,
            "semantic": semantic_sensed_by_patch_1
          }
        )
      }
    )
  }
),
ProprioceptiveState(
  {
    AgentID("agent_id_0"): AgentState(
      {
        "position": current_agent_position,
        "rotation": current_agent_rotation,
        "sensors":
        {
          SensorID("patch_0.depth"): SensorState(
            {
              "position": current_patch_0_depth_sensor_position
              "rotation": current_patch_0_depth_sensor_rotation
            }
          ),
          SensorID("patch_0.rgba"): SensorState(
            {
              "position": current_patch_0_rgba_sensor_position
              "rotation": current_patch_0_rgba_sensor_rotation
            }
          ),
          SensorID("patch_1.depth"): SensorState(
            {
              "position": current_patch_1_depth_sensor_position
              "rotation": current_patch_1_epth_sensor_rotation
            }
          ),
          SensorID("patch_1.rgba"): SensorState(
            {
              "position": current_patch_1_rgba_sensor_position
              "rotation": current_patch_1_rgba_sensor_rotation
            }
          ),
        }
      }
    )
  }
) 
)

One way to think about Agents is that they are intended to correspond directly to effectors, the things that Monty can move in the environment. So, a Monty agent can have direct correspondence with a Habitat agent or a MuJoCo camera (if we choose that Monty’s effector is the camera itself).

tbp.monty does not forbid you from having additional cameras in MuJoCo that are not intended to be connected to Monty. The way I would think about those is as experiment artifacts/entities. The experiment would know about them and maybe use them to visualize something for the experimenter, but Monty could be oblivious to their presence. One of our goals is to have a more precise separation between what is a part of Monty and what is a part of the experimental framework that experiments with Monty.

2 Likes

Thank you for this part, it clarified quite a few things for me while I’m (still) reading the code.

1 Like