Design of Final-Year Undergraduate Project with the Inclusion of the TBP!

Happy new year to you too Zach!

I can’t say I have any experience with these, but the DFRobot 100x100 sensor looks interesting. As well as the plug-and-play appeal, the screen on the back seems like it might be helpful for debugging and experimenting. As I understand it there isn’t a huge cost difference?

1 Like

Hi Rich,

I did originally consider mounting a camera to the gripper, but have since settled on a static stereo camera, with movement handled as in Monty Meets World. I actually didn’t realise you could buy decent endoscope cameras for that cheap, so that’s something I might consider if project time allows.

I considered using a turntable like in the Lego Robot project, with a motor underneath that Monty controls. It might still be something I use if my sensors (and robot) struggle to model/reach parts of the objects being learnt.

I’ve not heard of Turtles before, so I’ll check it out! Thanks

I’m not sure about maintaining potted plants for this project as I wouldn’t know how to apply Monty to it in the timeframe (and besides, I’m not super skilled at keeping houseplants alive! I once managed to kill a cactus). But still an interesting concept!

3 Likes

Hi Niels,

The costs are roughly similar, once shipping and the additional items for the LightRanger 14 are factored in.

My main worry is the minimum range: the DFRobot can only get within about 15cm of an object before readings stop. This might be OK, but the robot I’m using only has a reach of 44cm, so it might be a little tight.

On the other hand, the LightRanger 14 can get as close as 1cm, but again is a lot more fiddly.

I will give it some thought and pick one over the next day (best to get the purchase order form in sooner to prevent procurement delays).

1 Like

Ah I see, yeah that’s a good point about the minimum range. From a functional point of view, there’s no real reason that the surface agent (or let’s maybe call it “exploring” agent - since it will be physically moving) needs to actually be very close to the object, so long as it is relying on depth sensors rather than actual touch. Assuming your final examiners are not expecting it to move close to the object, then I think the more distant sensor would be fine. It would be neat though if you are able to have something that closely approximates the surface agent in Monty. If your budget allows, maybe you could order both, highlighting your reasoning for having both options available?

2 Likes

I’ve just sent off a purchase request for the DFRobot sensor. Looking at this demo video it seems to work pretty well.

The budget might just about stretch to get me the LightRanger 14 and its additional necessary components. I was considering using any extra budget on a sensor of another modality (we’ll see if time permits), but I could go for the LightRanger instead. For now I’ll stick with the DFRobot sensor.

1 Like

In other news, I believe I’ve made a breakthrough with the Xcode-less Monty Meets World, using the iPad’s back-facing LiDAR sensor instead of the front-facing TrueDepth camera. Will share the method later!

4 Likes

Ok nice, sounds like a good plan. And that’s great to hear! Looking forward to hearing more soon.

1 Like

Sorry, this has been slightly delayed - it’s exam season for me at the moment. Will write it up when I can.

2 Likes

Definitely no rush, and good luck with the exams!

2 Likes

Hello all, sorry for the delay. It’s about time for a project update.

I’ll first share how I ran my own version of Monty Meets World without Xcode, using an iPad’s back-facing LiDAR sensor instead of the front-facing TrueDepth camera.

For starters, to snap the correct depth picture I used this app, which allowed me to take a 32-bit float TIFF image, with each pixel representing the distance (in metres) to a point in the scene. The app should be compatible with Apple devices that have a LiDAR sensor (such as the iPhone Pro and Pro Max from the 12 onwards, or some iPad Pro models). It’s a little clunky, but works for the intended purpose.

Unfortunately, I haven’t built a method to stream the images captured to Monty (as is done in the original Monty Meets World). A manual method is required instead.

  1. Use the Depth Camera RAW app on a LiDAR device to capture an image of one of the Monty Meets World pre-scanned 3D objects (I used the Numenta/TBP mug), ensuring the object of interest is located in the centre of the image.
  2. Locate the TIFF and corresponding JPEG image files on the LiDAR device. I found them within the app files.
  3. Use the attached scripts (“jpg_to_reshaped_png.py” and “tiff_to_http_payload.py”) to convert the TIFF file into the accepted HTTP payload and the JPEG into the reshaped PNG (you will need the PIL and numpy Python packages installed). Example terminal commands are as follows: $ python jpg_to_reshaped_png.py <your_tiff_file_location_here>.tiff <your_jpg_file_location_here>.jpg and $ python tiff_to_http_payload.py <your_tiff_file_location_here>.tiff
  4. From running the example commands you should have two files called depth_x.data and rgb_x.png in the same folder you have saved the scripts in.
  5. Make a folder in the following location in your fork of the Monty repo: ~/tbp/data/worldimages/<your_scenes>/<your_object> (for instance mine was ~/tbp/data/worldimages/zachs_scenes/tbp_mug), and move your depth_x.data and rgb_x.png files there.
  6. Change the “x” in both files to an index (starting from 0). The experiment will run in this order, based on the scenes and versions you have provided for eval_env_interface_args in the experiment YAML (see next step). If you just have the one pair of images, replace “x” with “0”.
  7. Create an experiment YAML in ~/tbp/src/tbp/monty/conf/experiment/<your_experiment_name> (I called mine zachs_monty_meets_world.yaml)
  8. Write the following experiment config in your YAML file:
defaults:
  - /experiment/config/eval@config
  - /experiment/config/logging/parallel_evidence_lm@config.logging
  - /experiment/config/monty/patch_and_view@config.monty_config
  - /experiment/config/monty/learning_modules/clear_learning_module_configs@config.monty_config
  - /experiment/config/monty/learning_modules/default_evidence_1lm@config.monty_config.learning_module_configs
  - /experiment/config/monty/args/clear_monty_args@config.monty_config
  - /experiment/config/monty/args/defaults@config.monty_config.monty_args
  - /experiment/config/monty/motor_system/clear_motor_system_config@config.monty_config
  # the following config moves 20 pixels at a time (this doesn't work very well when reduced to say 8)
  - /experiment/config/monty/motor_system/informed_no_trans_step_s20@config.monty_config.motor_system_config
  - /experiment/config/environment/world_image@config.env_interface_config
  - /experiment/config/environment/init_args/clear_env_init_args@config.env_interface_config
  - /experiment/config/environment/init_args/monty_world_standard_scenes@config.env_interface_config.env_init_args
  - /experiment/config/environment_interface/clear_eval_env_interface_args@config
  - /experiment/config/environment_interface/world_image@config.eval_env_interface_args

_target_: tbp.monty.frameworks.experiments.object_recognition_experiments.MontyObjectRecognitionExperiment
config:
  model_name_or_path: ${path.expanduser:${benchmarks.pretrained_dir}/surf_agent_1lm_numenta_lab_obj/pretrained/}
  n_eval_epochs: 1
  show_sensor_output: true
  env_interface_config:
    env_init_args:
      data_path: ${path.expanduser:${oc.env:MONTY_DATA}/worldimages/<your_scenes_folder_name>/} # add your path to the scene folder here
      patch_size: 30 # change the sensor patch size
  logging:
    run_name: my_monty_meets_world
    wandb_group: my_monty_meets_world
# To see stats and plots in W&B, uncomment the following lines...
    # monty_handlers:
    #   - ${monty.class:tbp.monty.frameworks.loggers.monty_handlers.BasicCSVStatsHandler}
    #   - ${monty.class:tbp.monty.frameworks.loggers.monty_handlers.ReproduceEpisodeHandler}
    # wandb_handlers:
    #   - ${monty.class:tbp.monty.frameworks.loggers.wandb_handlers.BasicWandbTableStatsHandler}
    #   - ${monty.class:tbp.monty.frameworks.loggers.wandb_handlers.BasicWandbChartStatsHandler}
  monty_config:
    monty_args:
      min_eval_steps: ${benchmarks.min_eval_steps}
  eval_env_interface_class: ${monty.class:tbp.monty.frameworks.environments.embodied_data.SaccadeOnImageEnvironmentInterface}
  eval_env_interface_args:
    scenes: [0] # which scene folder to take each version from; if all images are in the same folder, repeat it for multiple images, e.g. [0, 0, 0]
    versions: [0] # which versions within the specified scene folder, e.g. [0, 1, 2]
  9. Run the experiment with $ python run.py experiment=<your_experiment_name>
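As an optional sanity check before running the experiment (this is not part of the original pipeline), you can verify that the raw payload round-trips back to the original depth values. The filenames and depth values below are synthetic stand-ins for a real capture:

```python
import numpy as np

def verify_depth_payload(payload_path, expected_depths):
    """Check a raw .data payload against the original depth array.

    Assumes (per tiff_to_http_payload.py) a flat, headerless array of
    32-bit floats in row-major order, in metres.
    """
    payload = np.fromfile(payload_path, dtype=np.float32)
    expected = np.asarray(expected_depths, dtype=np.float32)
    if payload.size != expected.size:
        return False
    return bool(np.allclose(payload.reshape(expected.shape), expected))

# Synthetic stand-in for a real capture: write the payload the same way
# the conversion script does, then read it back.
depths = np.linspace(0.15, 1.0, 12, dtype=np.float32).reshape(3, 4)
with open("depth_0.data", "wb") as f:
    f.write(depths.flatten().tobytes())
print(verify_depth_payload("depth_0.data", depths))  # True
```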

There are a few modifications to the original Monty Meets World YAML here, one of them being the patch_size. The images from the Depth Camera RAW app are notably smaller than the examples taken for the original Monty Meets World, so the sensor patch size has to be decreased accordingly.

One of the things I didn’t manage to solve was how many pixels the sensor patch moves. Because the Depth Camera RAW images are smaller, the default 20-pixel movement of the sensor patch on the image at each step seemed quite hectic. I created alternative motor system configs that reduced this (to 8 or 12 pixels, for example), but this seemed to greatly decrease correct detection. I’m not yet sure why! For now I’ve kept it as is.
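For intuition on why a fixed 20-pixel step feels hectic on smaller images: the number of distinct patch positions along an image axis shrinks with the image. A quick back-of-envelope helper (the image and patch sizes below are made up for illustration, not the actual Depth Camera RAW resolution):

```python
def patch_positions(image_px, patch_px, step_px):
    """Approximate number of distinct patch positions along one image
    axis, assuming the patch must stay fully inside the image."""
    usable = image_px - patch_px
    if usable < 0:
        return 0  # patch larger than image
    return usable // step_px + 1

# Hypothetical sizes: a larger original image vs a smaller capture
print(patch_positions(640, 64, 20))  # 29 positions
print(patch_positions(256, 30, 20))  # 12 positions
print(patch_positions(256, 30, 8))   # 29 positions
```

With the smaller capture, dropping the step from 20 to 8 pixels roughly restores the position count of the larger image.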

2 Likes

Attached scripts:

jpg_to_reshaped_png.py

from PIL import Image
import argparse
import os

''' Example usage:
For default output PNG path (rgb_x.png):
$ python jpg_to_reshaped_png.py path/to/input.tiff path/to/input.jpg

To specify output PNG path:
$ python jpg_to_reshaped_png.py path/to/input.tiff path/to/input.jpg --output_png_path path/to/output.png
'''
 
def reshape_jpg_to_match_tiff(tiff_path, jpg_path, output_png_path="rgb_x.png"):
    """
    Reshape a JPG file to match the dimensions of a TIFF file,
    then convert and save as PNG.
    
    Args:
        tiff_path: Path to the TIFF file (source for dimensions)
        jpg_path: Path to the JPG file to be reshaped
        output_png_path: Path to save the output PNG file
    """
    try:
        # Validate file extensions
        if not tiff_path.lower().endswith(('.tiff', '.tif')):
            raise ValueError(f"TIFF file must have .tiff or .tif extension, got: {tiff_path}")
        
        if not jpg_path.lower().endswith(('.jpg', '.jpeg')):
            raise ValueError(f"JPG file must have .jpg or .jpeg extension, got: {jpg_path}")
        
        # Validate files exist
        if not os.path.isfile(tiff_path):
            raise FileNotFoundError(f"TIFF file not found: {tiff_path}")
        
        if not os.path.isfile(jpg_path):
            raise FileNotFoundError(f"JPG file not found: {jpg_path}")
        
        print(f"Loading TIFF: {tiff_path}")
        tiff_image = Image.open(tiff_path)
        tiff_width, tiff_height = tiff_image.size
        print(f"  TIFF dimensions: {tiff_width}x{tiff_height}")
        
        print(f"\nLoading JPG: {jpg_path}")
        jpg_image = Image.open(jpg_path)
        jpg_width, jpg_height = jpg_image.size
        print(f"  JPG dimensions: {jpg_width}x{jpg_height}")
        
        # Reshape (resize) JPG to match TIFF dimensions
        print(f"\nResizing JPG to {tiff_width}x{tiff_height}...")
        resized_jpg = jpg_image.resize((tiff_width, tiff_height), Image.Resampling.LANCZOS)
        
        # Add alpha channel (convert to RGBA)
        print("Adding alpha channel...")
        if resized_jpg.mode != 'RGBA':
            resized_jpg = resized_jpg.convert('RGBA')
        
        # Save as PNG
        print(f"Saving as PNG: {output_png_path}")
        resized_jpg.save(output_png_path, format="PNG")
        
        print(f"\nSuccess! PNG saved with dimensions: {tiff_width}x{tiff_height} (RGBA)\nat location: {os.path.abspath(output_png_path)}")
        
    except FileNotFoundError as e:
        print(f"Error: File not found - {e}")
    except Exception as e:
        print(f"Error: {e}")

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Reshape JPG to match TIFF dimensions and save as PNG")
    parser.add_argument("tiff_path", help="Path to the TIFF file (source for dimensions)")
    parser.add_argument("jpg_path", help="Path to the JPG file to be reshaped")
    parser.add_argument("--output_png_path", default="rgb_x.png", help="Path to save the output PNG file (default: rgb_x.png)")
    
    args = parser.parse_args()
    
    reshape_jpg_to_match_tiff(args.tiff_path, args.jpg_path, args.output_png_path)

tiff_to_http_payload.py

import numpy as np
from PIL import Image
import argparse
import os

''' Example usage:
For default output path (depth_x.data):
$ python tiff_to_http_payload.py path/to/input.tiff

To specify output path and read centre depth value:
$ python tiff_to_http_payload.py path/to/input.tiff --output_path path/to/output.data --centre
'''

def convert_tiff_to_http_payload(tiff_file_path, output_filename="depth_x.data"):
    """
    Convert a TIFF depth map to HTTP payload (raw bytes) format.
    
    The output is a binary file containing the depth values as raw bytes, with no header or metadata. The depth values are stored as 32-bit floats.
    - HTTP payload written as raw bytes with no header
    - Format: 32-bit float (converted from Float16 if needed)
    - Layout: flat 1D array of length width * height
    - Units: metres
    """
    try:
        print(f"Processing TIFF: {tiff_file_path}")
        
        # 1. Load the TIFF file
        tiff_image = Image.open(tiff_file_path)
        width, height = tiff_image.size
        
        # 2. Convert to numpy array
        depth_array = np.array(tiff_image)
        
        # 3. Ensure 32-bit float data (the app's TIFF may contain
        # Float16 or integer data; the payload must be Float32)
        if depth_array.dtype != np.float32:
            print(f"Converting {depth_array.dtype} to Float32")
        depth_float32 = depth_array.astype(np.float32)
        
        # 4. Flatten to 1D array (width * height)
        flat_payload = depth_float32.flatten()
        
        # Verify dimensions
        expected_size = width * height
        assert len(flat_payload) == expected_size, \
            f"Size mismatch: expected {expected_size}, got {len(flat_payload)}"
        
        # 5. Save as raw bytes (no header)
        raw_bytes = flat_payload.tobytes()
        
        with open(output_filename, "wb") as f:
            f.write(raw_bytes)
        
        print(f"  Saved {len(raw_bytes)} bytes ({width}x{height}) to '{output_filename}'")
        print(f"  Data range: {flat_payload.min():.6f} to {flat_payload.max():.6f}")
        print(f"  Units: metres (assumed)")
        print(f"  File saved to: {os.path.abspath(output_filename)}")
        
        return raw_bytes, width, height
        
    except Exception as e:
        print(f"Error: {e}")
        import traceback
        traceback.print_exc()
        return None, None, None


def read_centre_depth(tiff_file_path):
    """
    Read the depth value at the centre of the TIFF image.
    Useful for verifying if the depth value is as expected (in metres).
    """
    try:
        print(f"\nReading centre value from: {tiff_file_path}")
        
        # 1. Load the TIFF file
        tiff_image = Image.open(tiff_file_path)
        width, height = tiff_image.size
        
        # 2. Convert to numpy array
        depth_array = np.array(tiff_image)
        
        # 3. Convert to float32 if needed
        depth_float32 = depth_array.astype(np.float32)
        
        # 4. Calculate centre coordinates
        centre_x = width // 2
        centre_y = height // 2
        
        # 5. Read centre pixel value
        centre_value = depth_float32[centre_y, centre_x]
        
        print(f"  Image size: {width}x{height}")
        print(f"  centre position: ({centre_x}, {centre_y})")
        print(f"  centre depth value: {centre_value} metres")
        print(f"  Data range: {depth_float32.min():.6f} to {depth_float32.max():.6f} metres\n")
        
        return centre_value
        
    except Exception as e:
        print(f"Error reading centre value: {e}")

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Convert TIFF depth map to HTTP payload (raw bytes)")
    parser.add_argument("tiff_path", help="Path to the TIFF depth map file")
    parser.add_argument("--output_path", default="depth_x.data", help="Path to save the output binary file (default: depth_x.data)")
    parser.add_argument("--centre", action="store_true", help="Read and display centre depth value")
    
    args = parser.parse_args()
    
    # Read centre value to verify depth if requested
    if args.centre:
        read_centre_depth(args.tiff_path)
    
    # Convert and save
    convert_tiff_to_http_payload(args.tiff_path, args.output_path)
1 Like

See here for the demo in action: https://youtu.be/a9u1Y3Amlxc

2 Likes

Current update:

I have a few documents I would share, such as the project timeline (in the form of a Gantt chart) I’ve been following since late November, and the gateway 2 presentation submission I used as part of an assessed presentation at the start of January. But I don’t think I have the ability to upload Excel or PowerPoint docs to Discourse.

Since the last update I’ve acquired the ToF sensor, and CADed and laser-cut its robot mount.

Testing the ToF sensor:

Robot mount for the sensor:

UFactory Lite 6 robot that the sensor will be attached to:

I have access to mounts in the lab for the ZED 2 stereo camera (the other modality).

The next step is to design sensor modules that convert the ToF sensor output and stereo camera output into CMP format.
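At the heart of that conversion is turning a depth patch into the pose-annotated observation a sensor module emits: a 3D location plus a surface normal. A minimal sketch of the geometry, using pinhole back-projection and a least-squares plane fit; the intrinsics (fx, fy, cx, cy) are placeholder values, not real calibration data for either sensor:

```python
import numpy as np

def depth_patch_to_point_and_normal(depth, fx, fy, cx, cy):
    """Back-project a depth patch (metres) to 3D camera coordinates and
    estimate the surface normal at the patch centre via a plane fit.

    fx, fy, cx, cy are pinhole intrinsics in pixels (placeholders here).
    Returns (centre_location_xyz, unit_normal).
    """
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    centroid = pts.mean(axis=0)
    # The plane normal is the singular vector with the smallest
    # singular value of the centred point cloud
    _, _, vt = np.linalg.svd(pts - centroid)
    normal = vt[-1]
    if normal[2] > 0:  # orient the normal back towards the camera (-z)
        normal = -normal
    centre_pt = pts.reshape(h, w, 3)[h // 2, w // 2]
    return centre_pt, normal

# A flat wall 0.3 m in front of the camera should yield a normal along
# the optical axis:
patch = np.full((8, 8), 0.3)
loc, n = depth_patch_to_point_and_normal(patch, fx=200.0, fy=200.0, cx=4.0, cy=4.0)
print(np.round(n, 3))  # ~[0, 0, -1]
```

A real sensor module would attach this location/normal pair, plus any features (e.g. colour from the RGB stream), to each observation it sends on to the learning modules.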

I’m hoping the risk assessment I wrote will be approved tomorrow (or hopefully this week), at which point I will start work on the motor system that allows Monty to operate the 6DoF robot arm with the mounted ToF sensor.

System Architecture:

The following was shown at the Meet Monty 2026 December Meetup, but I thought it would be good to share here also:

(note that this diagram is slightly simplified and does not include hierarchical learning modules)

4 Likes

This is really exciting to see! Thank you for sharing this detailed update :slight_smile: It’s great to see Monty recognize things in the video.

Strange about the decreased accuracy with the smaller step sizes. Let us know if you’d like help debugging (we might need some more detailed logs to pinpoint the issue). One thing you could try is @rmounir’s ResamplingHypothesisUpdater (the latest feature, integrated here: https://github.com/thousandbrainsproject/tbp.monty/pull/700 ), which might help a bit with the sim-to-real transfer.

I’m looking forward to seeing the ToF sensor mounted on the arm, that looks pretty cool!

Best wishes,

Viviane

2 Likes

Awesome stuff Zach, thanks for sharing!

Agree with Viviane that the step size issue is a bit mysterious. How much of a drop in accuracy did you see? If you’re able to share some videos of the example policy with the different step sizes (and corresponding accuracy), that might also be helpful for debugging.

Thanks also for sharing the scripts you developed. You’re probably already doing this, but just mentioning that if you track your work in a GitHub repository, then at the end of the project, we can link to it on our public showcase page for others who might be interested in building on your work.

And re. the documents you mention, feel free to send those to my TBP email address, which I believe you have.

Looks like you’ve made some really nice progress and yeah looking forward to seeing the ToF in action :smiling_face_with_sunglasses:

2 Likes

Thanks both, I think it may have been an issue on my end, as it now seems to work well with smaller step sizes (except for the timeout/mis-recognition of the final image).

The video with the smaller steps is here if you’re interested.

I’ve also collected the W&B data, but I’m new to this and not sure what the best way of sharing it is?

For the smaller steps I created a Monty motor system config called “informed_no_trans_step_s8.yaml”, which is almost identical to the existing “informed_no_trans_step_s20.yaml” except that the rotation_degrees variable is set to 8.0 instead of 20.0.

I’ll send over the documents to you soon Niels, I’m first trying to work out some issues with scheduling as I’ve been accepted on an educational scholarship trip to Taiwan from the 27th Feb-14th March so my original project timeline has been thrown off.

1 Like

Ok great, glad you were able to resolve it. The policy visualization also looks reasonable.

In WandB it can be helpful to organize the data using WandB Reports. You can then export these as PDF files. Alternatively, if you set the sharing permissions correctly, you should be able to link to runs and reports directly. For example, this is what Ai2 have been doing with their Olmo training runs. Hope that helps.

Sounds exciting about Taiwan, no rush at all from our end.

Ok perfect, I’ll share the data for completeness. Hopefully the following links work:

This is the WandB run with the smaller steps.

This is the WandB run without the smaller steps.

Since my last message I have mounted the ToF sensor on the UFactory Lite 6 robot, and connected and mounted the static ZED 2i stereo camera. Everything is functional.

I’ve tested the UFactory robot with its Python SDK and am able to control its movements and collect its joint data (within ±0.5mm repeatability). The MaixSense A010 ToF sensor doesn’t have a Python API, but its GUI is distributed as a Python package, so I’ve used GenAI to help reverse-engineer that package into a (tested) Python API for extracting the sensor’s data in code; this seems to work as intended.

The ZED 2i comes with a Python API, but I’ve encountered a few slight difficulties I intend to iron out: the first time the script runs, the camera extracts data as expected, but running it again causes the camera and script to freeze. I have a feeling the camera isn’t being properly released (might be something to do with not correctly releasing OpenCV frames), so I’ll look into that. Another consideration with the ZED cam is that it has a lens focal length of 2.1mm, which gives it a minimum depth range of 0.3m, so I’ll have to mount it further from the object than I’d thought. The ZED 2i is a bit overspecced for this application, but should hopefully provide good data.
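On the freeze after the first run: a pattern that often helps with camera SDKs is guaranteeing the release call runs even when frame grabbing raises, e.g. via a small context manager. This is a generic sketch, not ZED-specific code; the open()/close() method names mirror typical camera APIs, and the wrapper itself is hypothetical:

```python
class ManagedCamera:
    """Context manager that guarantees a camera's close() is called,
    even if grabbing frames raises. `camera` is any object exposing
    open() and close() methods (e.g. a stereo camera SDK handle)."""

    def __init__(self, camera):
        self._camera = camera

    def __enter__(self):
        self._camera.open()
        return self._camera

    def __exit__(self, exc_type, exc, tb):
        self._camera.close()  # always release the device
        return False          # do not swallow exceptions

# Demonstration with a stand-in camera that just records calls:
class FakeCamera:
    def __init__(self):
        self.calls = []
    def open(self):
        self.calls.append("open")
    def close(self):
        self.calls.append("close")

cam = FakeCamera()
try:
    with ManagedCamera(cam):
        raise RuntimeError("simulated grab failure")
except RuntimeError:
    pass
print(cam.calls)  # ['open', 'close']
```

Even with the simulated failure, close() still runs, so the device isn't left locked for the next run.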

My next steps are to integrate these components into Monty so that it can control motor movements (robot/camera) and know the location of the sensor relative to its world frame. Any chance I could be pointed to where in the codebase I should write the motor system and adapt/write the sensor modules for this?

I’m assuming Monty doesn’t yet support multiple independent agents? For now I’ll focus on getting the two agents working separately, then look into getting them working together at a later stage.

3 Likes

Really cool, thanks for the update Zach.

Thank you for sharing the WandB reports. I can’t see anything obvious from looking at them; given the small sample size, the failure on the last trial with the smaller step size may just be a fluke.

I think your plan re. getting each agent working independently first is a good one.

Re. the sensor modules and motor systems, @tslominski may have some additional useful advice, although as a starting point I’d be interested in better understanding how you intend to control the arm. There are a few approaches you could take which come to mind:

  1. You could try to use the SurfacePolicy class; in principle it should work (i.e. use detected surface normals to guide movements) even with a very high (30cm+) minimum distance. The key behavior is that it should pivot around visible surface normals such that it translates in the plane tangent to the object’s surface, while continuing to face the surface. However this policy is currently written to control an agent that can move without any constraints in a simulated environment. Working out how to define the relative rotations and translations for your arm might be fairly complex to update, but it has the advantage that you can use the model-free signal of the object’s surface to guide your arm’s movements.
  2. You could adapt the approach taken by JumpToGoalStateMixin and execute_jump_attempt. The code is messy at the moment (in particular the inappropriate embedding of policy/motor actions within the EnvironmentInterface classes), but the basic idea is that a learning module can propose a goal, which is basically, “be sensing a particular location, at a particular viewing vector”. This is passed to the motor system, which then executes on this goal. In Habitat, this is done by teleporting the agent to the necessary location. In your case, you could possibly pass this goal, and the arm’s current state, to an off-the-shelf inverse kinematics system, and have that figure out the series of low-level movements necessary to interpolate between them. This has the advantage that you get “model-based” information driving your system to move. The main caveat to be aware of is that the hypothesis-testing goals generated by the EvidenceGoalStateGenerator are sparse (e.g. every 10-20 steps), and so would only be part of your policy.
  3. A combination of the above: you could use the concept in (2) to also guide the surface policy. In particular, rather than your surface policy outputting a relative rotation or translation that should take place, it could output a final goal in environment coordinates. You could then pass this to your inverse kinematics model to work out the low-level movements that need to be executed. This would probably be a nice approach, as then you only need to implement a single interface between Monty’s motor system and the actuators in your robot, but you will be able to benefit from both the surface policy (for the dense policy, i.e. controlling most steps) and the goal states being generated by the LMs.
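One concrete piece needed for options (2) and (3) either way is expressing a goal given in environment/world coordinates in the robot's base frame before handing it to the IK solver. A sketch with 4x4 homogeneous transforms; the frame layout and example numbers are purely illustrative, not taken from the Lite 6:

```python
import numpy as np

def make_transform(rotation, translation):
    """Build a 4x4 homogeneous transform from a 3x3 rotation matrix
    and a 3-vector translation."""
    T = np.eye(4)
    T[:3, :3] = rotation
    T[:3, 3] = translation
    return T

def world_point_to_base(T_world_from_base, p_world):
    """Express a world-frame goal point in the robot base frame by
    inverting the base->world transform."""
    T_base_from_world = np.linalg.inv(T_world_from_base)
    p = np.append(p_world, 1.0)  # homogeneous coordinates
    return (T_base_from_world @ p)[:3]

# Illustrative setup: base sits 0.5 m along world x, rotated 90 deg
# about the world z axis
c, s = 0.0, 1.0  # cos(90 deg), sin(90 deg)
Rz = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
T = make_transform(Rz, [0.5, 0.0, 0.0])
goal_world = np.array([0.5, 0.2, 0.1])
print(np.round(world_point_to_base(T, goal_world), 6))  # [0.2 0. 0.1]
```

The same machinery covers orientations too (compose the goal's rotation into the 4x4 matrix), which matters if the goal state also specifies a viewing vector.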

A final comment: the motor system is currently being significantly refactored by Tristan and Scott. This should make it easier to use once done, but in the meantime I would recommend pulling from upstream frequently; there is a good chance of daily updates for the next couple of weeks.

Hi Niels,

So the ZED camera has that 30cm+ depth minimum, but the ToF sensor mounted to the robot has a reported minimum of 15cm, though in practice it seems to give accurate readings down to ≈5cm. If I use the SurfacePolicy designed for Habitat, I guess I’ll have to prevent Monty from modelling the work surface and other background items that aren’t present in the simulator, but hopefully I’ll be able to get the sensor reasonably close.

I will try combining the two options as suggested and reach out if I hit any walls. Regarding the motor system refactoring, I’m away from this Friday on my trip, so I won’t have access to the robot anyway. I’ll see how much I can get done remotely whilst keeping my repo up to date, but the motor system might be in a more stable state by the time I return.

For the stereo camera I was considering using SaccadeOnImageEnvironment, unless you can suggest a better alternative method I might try?

1 Like