I’ve been testing several datasets to evaluate Monty’s object recognition performance.
Setting recognition accuracy aside, I noticed that recognition time varied significantly across datasets:
With the Omniglot dataset (using SaccadeOnImageEnvironment), it took about 30–40 seconds per object.
With the MNIST dataset (also using SaccadeOnImageEnvironment), it took around 20 seconds per object.
In contrast, recognition with the YCB dataset only took about 3–5 seconds per object.
This seems counterintuitive, as I would expect 3D object recognition to take longer. However, I suspect there may be issues in how I implemented the 2D recognition tests.
That said, I would still like to achieve at least a 10–20× speedup in recognition time for the YCB dataset.
From my understanding, since the hypotheses are independent of each other, parallel comparison of all hypotheses with the input patch might be possible.
Are there any plans to improve Monty’s recognition speed this summer?
If so, would it be possible to share any ideas or strategies you’re working on? I’d love to apply similar techniques to my own project.
That’s exciting to hear that you are already testing Monty’s object recognition accuracy on other datasets! I am not sure why you are seeing slower runtimes for those. I would have to see your code to pinpoint the exact issue, but some potential causes that come to mind are:
You may be learning denser models of the objects in the Omniglot and MNIST datasets. This could happen because they are at different scales. For example, moving your sensor by a unit of 1 in the YCB dataset in Habitat moves it by a meter, so most movements are at the scale of 0.01 (1 cm) or less. But I think that, at least on Omniglot, a unit of 1 represents moving over 1 pixel, so relatively speaking movements are much larger. This means you might have to adjust some of your learning parameters, like graph_delta_thresholds.
Relatedly, during matching you may need to set different tolerances and max_match_distance parameters to converge faster.
You could look at how many steps each episode takes in the different environments. If the Omniglot and MNIST environments take about the same number of steps to converge as YCB, then each individual step is slower, and the problem is likely that the learned models contain far more points. If they take more steps than the YCB objects, then the issue is more likely the LM parameters that define how quickly the terminal condition is reached and when observations eliminate hypotheses.
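To illustrate the scale point above: with a fixed distance threshold, the same trajectory yields far more model points when its coordinates are expressed in larger units. Here is a minimal sketch of that effect; the greedy subsampling below is a hypothetical stand-in for Monty’s actual graph building, and the scales and threshold are made-up numbers in the spirit of graph_delta_thresholds:

```python
import numpy as np

def subsample_points(points, delta_threshold):
    """Keep a point only if it is farther than delta_threshold
    from every point already kept (greedy sketch, not Monty's code)."""
    kept = [points[0]]
    for p in points[1:]:
        if min(np.linalg.norm(p - k) for k in kept) > delta_threshold:
            kept.append(p)
    return np.array(kept)

rng = np.random.default_rng(0)
traj = rng.random((500, 3))  # one sensing trajectory, unit cube

# Same trajectory, same threshold, two coordinate scales:
meter_scale = subsample_points(traj * 0.05, delta_threshold=0.01)  # cm-scale moves (YCB-like)
pixel_scale = subsample_points(traj * 50.0, delta_threshold=0.01)  # pixel-scale moves

# Far fewer points survive at the small (meter) scale; at pixel
# scale essentially every observation becomes a model point.
print(len(meter_scale), len(pixel_scale))
```

Denser models mean larger KD-trees to search at every matching step, which is one way a 2D dataset can end up slower than YCB.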
We did discuss speedups in the past and made several improvements. For a recording of those discussions, you could watch these two videos:
However, this is a research code base (written in Python), so we have not taken many measures to optimize for speed beyond what was fundamentally necessary for quick experimentation. There are certainly many ways the current implementation could be sped up (whether by parallelizing it more, as you mention, or by using more efficient implementations of certain functions), and we welcome any contributions on this front if you have ideas!
I just watched Speed-up Discussions – Part 1 and, as expected, KD-Tree search dominates Monty’s processing time.
In the video you mentioned a lookup-table approach as a faster alternative to the KD-Tree.
Is TBP actively developing or planning to integrate that option? I realize the LUT would demand far more memory as the number of objects and LMs grows, so I’m curious how feasible a hardware implementation of Monty would be in that case.
For now, parallel KD-Tree queries seem more practical: during the recognition phase the KD-Tree is read-only, so every hypothesis that uses the same model can query it concurrently.
Are there any runtime-optimization tricks already baked into tbp.monty? If so, could you point me to the relevant code or documentation so I can try them out?
Great questions!
We explored alternatives to KD-tree search, including other tree-based search methods and locality-sensitive hashing (LSH). The corresponding code is in this folder in our monty_lab repo. In particular, the KNNSearch.ipynb notebook contains relevant code for LSH at the bottom. However, it didn’t turn out to be more efficient. It seems that for search in 3D space (as opposed to higher-dimensional space), KD-tree search is already quite optimized, and LSH used a lot more memory while not providing much speedup. Later on, when I put together the constrained object models (GridObjectModel in the code), I tried an alternative search by indexing the grid, but again it wasn’t faster. However, I am not an expert in this, so I might have missed something.
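If you want to benchmark alternatives yourself, a useful starting point is a sanity check that any replacement returns the same neighbors as the KD-tree. A minimal sketch (hypothetical sizes, not the monty_lab notebook code) comparing the tree against brute force in 3D:

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
points = rng.random((5_000, 3))   # model points
queries = rng.random((200, 3))    # query locations

# KD-tree: roughly O(log n) per query after an O(n log n) build.
tree = cKDTree(points)
_, tree_idx = tree.query(queries, k=1)

# Brute force: O(n) per query via full squared-distance computation.
diffs = queries[:, None, :] - points[None, :, :]
brute_idx = np.argmin(np.einsum('qpd,qpd->qp', diffs, diffs), axis=1)

print(np.array_equal(tree_idx, brute_idx))  # both find the same nearest neighbors
```

Once a candidate passes this check, you can time both on model sizes representative of your datasets; in low-dimensional spaces like this, the KD-tree is hard to beat, which matches what we found with LSH and grid indexing.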