Hi all, I am having a hard time understanding the basic underlying mechanism by which learning is done in Monty. Below is a description of my current understanding of how sensors work. Please let me know if I am on the right track.
(1) Sensors move across their environment and produce a sequence of observations. Each observation consists of a set of observed features (color, curvature, etc.), the 3D coordinates of those features in the “world” reference frame, and the pose of the sensor relative to the sensor’s reference frame.
(2) Sensors also determine object boundaries. For each observation, a sensor will produce a flag indicating whether the observation was made on an object or not. My understanding is that the sensor uses data from the simulator to determine what is an object and what is background.
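To make sure we are talking about the same thing, here is roughly how I picture a single observation. The field names are just my own sketch, not Monty's actual data structures:

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class PatchObservation:
    """My mental model of one sensor-patch observation (names are hypothetical)."""

    features: dict           # e.g. {"rgba": ..., "curvature": ...}
    location: np.ndarray     # 3D coordinates of the patch center in the "world" reference frame
    sensor_rotation: np.ndarray  # pose of the sensor relative to its own reference frame
    on_object: bool          # flag: was this observation made on an object?
```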
Is this what is happening in the sensors (without getting into what the learning modules are doing)?
Regarding (2) above: I don’t know how it is done, I just don’t think it is the correct way of doing it. It is the learning module’s task to tell whether the current sensor input (considering previous inputs and the corresponding movements) matches some known object or not.
The current default setup is that we use the depth image to estimate which parts of the sensor patch are on the object surface and which are not. The corresponding function is here: tbp.monty/src/tbp/monty/frameworks/environment_utils/transforms.py (at commit 451ba5f576cfdf3eb1b14250d85125429b93d1e4 in thousandbrainsproject/tbp.monty on GitHub).

Other parts of the sensor patch may also be on the object but at a very different depth (imagine looking at the rim of a mug: the front surface of the mug is in the foreground, but you also see part of the inside of the mug). We don’t want to use those pixels/depth values to estimate the principal curvature and point normal for the center of the patch, because using all of them would introduce noisy distortions. We are not using privileged information from the simulation for this; we are just applying some heuristics to the depth map.
Here are some examples of this:
(The viewfinder image is just to give you some context; the LM only sees the patch.) We take the depth values in the patch and look at their distribution. If it is bimodal, we use this to set a cutoff point for which values are on or off the object (green and red, respectively, in the right column of the figure).
There is some additional logic to decide which side of the distribution to use depending on where the center of the patch ends up.
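The real logic lives in the transforms.py function linked above; purely to illustrate the idea, a simplified sketch of such a bimodal cutoff could look like this (the function name and details here are mine, not the actual implementation):

```python
import numpy as np


def estimate_on_object_mask(depth_patch: np.ndarray) -> np.ndarray:
    """Simplified sketch of the heuristic described above (not Monty's actual code).

    If the depth values in the patch form two clusters (near surface vs. background
    or a far surface), place a cutoff between them and keep the cluster that the
    patch center belongs to.
    """
    depths = depth_patch.ravel()
    # Look at the distribution of depth values in the patch.
    hist, bin_edges = np.histogram(depths, bins=20)
    # Use the emptiest interior bin as the cutoff between the two modes.
    # (The real logic is more careful about checking whether the distribution
    # is actually bimodal.)
    interior = hist[1:-1]
    cutoff = bin_edges[1 + np.argmin(interior)]
    # Keep the side of the distribution that the center of the patch falls on.
    center_depth = depth_patch[depth_patch.shape[0] // 2, depth_patch.shape[1] // 2]
    if center_depth <= cutoff:
        return depth_patch <= cutoff  # center is on the near surface
    return depth_patch > cutoff       # center ended up on the far side
```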
We do have access to object labels from the simulator and sometimes extract those (like in multi-object experiments), but they are only used for logging and evaluating performance. The learning module cannot use this information to recognize objects.
Thanks @vclay for the detailed explanation. I have two follow-up questions:
a) Besides being used to distinguish the object from the background, is the depth information also being used to compute the orthonormal vectors?
b) When we say that a sensor module “extracts features”, are we basically talking about this step where we apply the get_point_normal_* methods to the 3D surface shape detected in the patch?
a) Yes, the depth information is combined with the sensor location to determine the location of the sensor patch. It is also used to estimate the orthonormal vectors.
b) Yes, estimating the point normal and principal curvature directions is part of what the sensor module is doing. Those three vectors are used to define the orientation of the sensor patch. Additionally, we can extract other features, such as the color at the center of the patch or the amount of curvature. These are just some basic examples of the features we extract right now. You could implement custom sensor modules that extract other features (like temperature, texture, or simple patterns).
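To make a) and b) a bit more concrete, here is a very rough sketch of the kind of computation a sensor module performs once the on-object pixels of the patch have been converted to 3D points. None of this is Monty's actual code, and the real point normal and curvature estimation is more involved:

```python
import numpy as np


def patch_pose_and_features(points: np.ndarray, rgba_patch: np.ndarray) -> dict:
    """Illustrative sketch only (names and details are mine, not Monty's API).

    `points` is an (H, W, 3) array of 3D locations for the patch pixels, already
    obtained by combining the depth image with the sensor pose.
    """
    h, w, _ = points.shape
    center = points[h // 2, w // 2]

    # Fit a plane to the patch points; the direction of least variance is an
    # estimate of the point normal at the patch center. (Its sign would still
    # need to be oriented toward the sensor.)
    flat = points.reshape(-1, 3) - center
    _, _, vt = np.linalg.svd(flat, full_matrices=False)
    point_normal = vt[-1]

    # The point normal plus the two principal curvature directions (not computed
    # here) are the three vectors that define the orientation of the patch.
    return {
        "location": center,
        "point_normal": point_normal,
        "rgba": rgba_patch[h // 2, w // 2],  # example of a non-morphological feature
    }
```

A custom sensor module could, in the same spirit, add further entries (texture, temperature, simple patterns, etc.) to such a feature dictionary.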