Inferring and learning non-rigid objects

AgentRev · December 2, 2025, 9:12am

When I wrote “dimensionality reduction” above, I was really referring to sparse coding, as described in this paper: Neural correlates of sparse coding and dimensionality reduction

(My wires probably got crossed because of that title!)

So I’m thinking more about raw visual features rather than abstract things like “heaviness/brittleness/sliminess”: the stuff I listed at the beginning of my previous post, like spatial frequency, texture, etc., but learned and inferred through columnar voting rather than DNNs. Think of them as compositional features / categories that help further disambiguate objects or entities. The idea isn’t fully formed in my mind yet as to how this would be computed efficiently, especially in an orientation-agnostic way. However, one thing is certain:

It’s kinda normal that DNNs don’t take “shape” into account, because the datasets are curated to be orientation-agnostic, which dilutes shape representation altogether. And I’m thinking that animals probably deal with shapes at a higher cognitive level, beyond what DNNs can achieve. So if you successfully mix texture recognition with point clouds, you’d theoretically achieve better results than either DNNs or 3D-only Monty. I’m not certain that even this would go far enough to qualify as human-level, but it would be a great stepping stone.

As for affordances, is it realistic to think about them at this stage? To me, that would be more of a high-level reasoning task to come later down the road, since it requires a more advanced understanding of environmental relationships and world physics. You’ve gotta build an associative predictive coding engine for the whole Monty machinery first. Once you get that engine running smoothly, then you can power the affordances drivetrain.

Unless you meant that you intend to start tackling this in the short term, or you’re taking another approach?

@AdamLD I think composite objects are also a higher-level task that Monty might not have the prerequisites for right now. In my opinion, the way we think in terms of “parts” is more of a cultural skill than an evolutionary cortical function. Does it make sense to try “hardcoding” it rather than teaching it?