Learning Categories of Objects

Hello,

I have been looking at Grid Object Models for Unsupervised Learning and how to learn categories of objects. Before jumping into making a dataset and testing, I had some conceptual questions the team might have already thought about.

I’m planning on starting with mugs or something easy like that, but when thinking about the future I inevitably thought about applying this to medical imaging, since that’s my area of work. Given a CT scan, we essentially have a 3D model of part of a human, with features (intensity, curvature) at each pose. The cool thing about it is that we get X-ray vision, so Monty could learn a model of a human body but, within that, know the objects that compose it, such as the heart, liver, etc. This is where generalizing to within-category objects comes in, since organs can have different sizes and structures.

Although mugs can also vary in structure and size, most of them vary only in color patterns. My main question is: do you think it would be better to start with an object that has more variance in its size and structure? If so, do you have any recommendations?

**Edit:** I think this is very interconnected with the Support Scale Invariance issue, but does the team consider them completely separate? One could separate categories with objects of the same structure and size but different color, but I think that’s only a subset of generalizing within categories; to truly do that, you would need scale invariance.


Hi @YiannosD , that’s great to hear you’re interested in this item.

You’re absolutely right that morphological/structural differences would be important for testing this. While mugs do often differ mostly by color, they can also have structural differences, like the (admittedly odd!) examples below show. You would want to reflect this to some degree in any dataset, to see whether canonical representations emerge that are more like the “Platonic” representation of a mug we often think of (a partly hollowed cylinder + a handle).

It certainly isn’t a requirement that you use mugs. One other idea, if you wanted to build a dataset from publicly available/open-source 3D assets, would be a dataset of 3D-modelled trees and boulders. Do you then get a fairly generic model of a tree (brown vertical center, bushy green top) and of a boulder (round, brown-grey object)? If you have different thresholds for detecting an object as unfamiliar, you may get models of different granularity (e.g. one LM learns a model for conifers vs. deciduous trees, while another has an even more fine-grained separation by species).

Medical data would definitely be interesting to explore in the longer term, but I think it would be best to stick to something simpler for now.

Re. scale invariance, I agree this is a related issue, in that categorization can either depend on or be totally invariant to the scale of an object (e.g. a toy car vs. an actual car). It’s probably simplest to avoid this for the moment by limiting scale differences between objects that we expect to share a category to within ~10% of one another. Once we have implemented a solution to scale invariance, we could consider more extreme deviations from this.
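As a rough illustration of that constraint (just a sketch, not anything from the Monty codebase; the function name and the choice of "size" measure are made up for the example), a ~10% scale check between two objects could look like:

```python
def within_scale_tolerance(size_a: float, size_b: float, tolerance: float = 0.10) -> bool:
    """Return True if two object sizes (e.g. bounding-box diagonals, in meters)
    differ by no more than `tolerance` relative to the larger one."""
    larger = max(size_a, size_b)
    return abs(size_a - size_b) / larger <= tolerance

# Two mugs at 0.10 m and 0.105 m: within the ~10% band.
print(within_scale_tolerance(0.10, 0.105))  # True
# A toy car (0.1 m) vs. an actual car (4.0 m): far outside it.
print(within_scale_tolerance(0.1, 4.0))     # False
```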

Hope that makes sense, let me know if I can clarify anything.


That makes sense, thank you! I think trees might be a very good example to start with. I’ll work on this and see where it takes me!


Ok great, sounds good! Let us know if there’s anything we can help with once you get started.


Although you may be able to find datasets of 3D-modeled trees and/or boulders, a problem could arise if you want to bring actual examples into your lab (:-). One advantage of mugs and staplers is that they are compact, lightweight, etc. So you might want to consider working with smaller plants and/or rocks instead.

On a vaguely related note, some years ago I took a blind friend to the San Diego Botanic Garden (Encinitas). We examined and discussed a number of plants, trying to find structural and morphological commonalities that would translate between sight and touch.


Once we need a real tree, it may actually be fun to work outside, haha! But yes, plants might be easier to work with, thanks.

That’s an interesting story, sounds fun!


@nleadholm Hi Niels,

Following up on your tree suggestion. I went ahead and built a prototype for unsupervised category learning using 10 tree species from Objaverse (birch, cedar, cypress, maple, oak, palm, pine, spruce, a generic tree, and willow).

The setup is `MontyObjectRecognitionExperiment` with `do_eval: false`. I used `EvidenceGraphLM` with a `MeshEnvironment` wrapping trimesh.

Some interesting findings along the way:

  • `max_match_distance` is the dominant knob for collapse. The default 0.01 m (1 cm) is way too tight for ~1 m trees: the median nearest-neighbor distance in stored models is about 1.7 cm, so query points almost never land within range and evidence stays negative. Bumping it to 0.03 m was the sweet spot between over-splitting and collapsing everything into one model.

  • `x_percent_threshold` and `object_evidence_threshold` barely matter in comparison. I ran a 12-point grid sweep and `max_match_distance` dominated everything.

  • `InformedPolicy` was a dead end for this task. It only does Look/Turn, so it captures roughly one hemisphere per episode. A tree seen from the front in epoch 0 looks completely different from the back in epoch 1, so cross-epoch recognition failed almost entirely. Switching to `SurfacePolicy` fixed this: cross-epoch recognition went from nearly zero to 7/9 trees being re-recognized.

  • Best result so far: 5 learned models for 9 tree species (willow ran out of steps). The two dominant models each absorbed 4–5 species: one grouped cedar/cypress/spruce/maple/birch, the other grouped palm/pine/tree_generic/maple. Oak/birch/cypress stayed as singletons. So we’re getting genuine category-like collapse, though not yet the conifer vs. deciduous split you hypothesized. This could be a geometry-based collapse (dense bushy canopy vs. tall/sparse trunk).

  • `desired_object_distance` also matters a lot: the default 0.025 m puts the camera basically inside the canopy. 0.3 m works for trees.
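For anyone reproducing the `max_match_distance` observation above: the median nearest-neighbor spacing of a stored model's points gives a natural lower bound on a useful matching radius. A minimal sketch of that check with a scipy KD-tree (this is just the diagnostic, not Monty's actual matching code):

```python
import numpy as np
from scipy.spatial import cKDTree

def median_nn_distance(points: np.ndarray) -> float:
    """Median distance from each stored model point to its nearest neighbor.

    If max_match_distance is much smaller than this value, query points
    will rarely land within range of any stored point, and evidence stays
    negative regardless of the other thresholds.
    """
    tree = cKDTree(points)
    # k=2: the closest hit is the point itself (distance 0), so take the second.
    distances, _ = tree.query(points, k=2)
    return float(np.median(distances[:, 1]))

# Toy example: points on a regular 1.7 cm grid have 1.7 cm nearest-neighbor
# spacing, so a 1 cm matching radius would miss almost every query point.
axis = np.arange(0.0, 0.17, 0.017)
grid = np.stack(np.meshgrid(axis, axis, axis), axis=-1).reshape(-1, 3)
print(median_nn_distance(grid))  # ~0.017
```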

This was all with 2 epochs and 2000 steps as a debug run. I’m planning to do a longer run next (3+ epochs, 5000 steps) and re-run the parameter sweep now that SurfacePolicy is working. Before I design the next round of experiments though, do you have any suggestions for what to try, or specific things you’d want to see tested?


Wow, that’s awesome to hear @YiannosD , thanks for making such great progress.

And really interesting results, some initial questions that come to mind:

  • Are you able to share visualizations of the tree objects you are using?

  • Similarly, can you share a visualization of the two main learned models you describe, plus the Oak/birch/cypress singletons? It would be particularly interesting to see what the grouped models look like.

  • How many model points are you allowing for each object? Maybe if you are able to share a repository with your configs, then I can also check some details like this directly.

  • One thing to keep in mind during your hyperparameter sweep is that, in a full Monty system, we would want multiple LMs, and these could focus on learning models at different levels of detail/abstraction. As such, there won’t necessarily be one perfect configuration; rather, we want e.g. one LM that learns conifer vs. deciduous (although that may not practically happen at the moment, for various reasons), one that learns them all as separate models, etc.

  • One thing you could consider is using something like a simple dendrogram (Dendrogram - Wikipedia), with mini-versions of the models visualized, to see how the breakdown happens as a function of different hyperparameters. You might find the below figure useful for this. In this case there would only be one level, based on which model each object belongs to.
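To make the dendrogram idea concrete, here is a sketch using scipy's hierarchical clustering. The per-species feature vectors below are entirely made up for illustration (stand-ins for whatever summary you extract from the learned models); the point is the bookkeeping: cutting the tree at different heights mirrors using different "unfamiliar object" thresholds, giving coarse or fine groupings.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

# Hypothetical per-species features (canopy density, height/width ratio,
# trunk fraction) -- invented numbers, not measurements from the models.
species = ["cedar", "cypress", "spruce", "pine", "oak", "maple", "birch", "palm"]
features = np.array([
    [0.9, 3.0, 0.2],  # cedar
    [0.9, 3.2, 0.2],  # cypress
    [0.8, 2.8, 0.3],  # spruce
    [0.7, 2.5, 0.4],  # pine
    [0.9, 1.2, 0.3],  # oak
    [0.8, 1.3, 0.3],  # maple
    [0.6, 1.5, 0.4],  # birch
    [0.3, 2.0, 0.7],  # palm
])

# Ward linkage builds the full merge tree over the feature vectors.
Z = linkage(features, method="ward")

# Cutting at different heights gives different granularities: a loose cut
# yields coarse categories, a tight cut splits species apart.
coarse = fcluster(Z, t=2, criterion="maxclust")
fine = fcluster(Z, t=5, criterion="maxclust")
print(dict(zip(species, coarse.tolist())))
print(dict(zip(species, fine.tolist())))

# dendrogram(Z, labels=species) draws the full tree (requires matplotlib),
# and you could place a mini model render at each leaf.
```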