We humans perceived natural objects long before cups and staplers were invented.
It might seem beneficial to start Monty with manufactured objects, which usually have clean lines and simple behaviors, before moving on to natural objects, which are more complex. But doing it the other way around, i.e. starting with natural objects, offers some potentially useful perspectives:
- Computing point normal vectors isn't efficient for natural surfaces such as leaves, tree bark, or rocks. Therefore, Monty will likely start with an object's rough outline (see the sketch after this list).
- For natural objects, we are more likely to perceive the whole before its components. Prehistoric brains perceived a forest before its trees, and a tree before its individual leaves.
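To make the first point concrete, here is a minimal sketch in plain NumPy (not Monty's actual pipeline; all function names, parameters, and the synthetic data are made up for illustration). It contrasts a per-point PCA normal estimate, which becomes unreliable on a noisy natural surface, with a much cheaper voxel-based rough outline of the whole object.

```python
# Hypothetical sketch: per-point normals vs. a rough whole-object outline.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "natural" surface: a bumpy blob of 5000 points (think foliage or bark).
n = 5000
directions = rng.normal(size=(n, 3))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)
radii = 1.0 + 0.15 * rng.normal(size=n)            # high-frequency surface noise
points = directions * radii[:, None]

def point_normal_pca(points, idx, k=16):
    """Estimate a normal at one point from its k nearest neighbours via PCA.
    On noisy natural surfaces the least-variance direction is poorly determined."""
    d = np.linalg.norm(points - points[idx], axis=1)
    nbrs = points[np.argsort(d)[:k]]
    cov = np.cov((nbrs - nbrs.mean(axis=0)).T)
    eigvals, eigvecs = np.linalg.eigh(cov)
    normal = eigvals[0], eigvecs[:, 0]             # smallest-eigenvalue direction
    flatness = eigvals[0] / eigvals.sum()          # ~0 for a clean planar patch
    return eigvecs[:, 0], flatness

def rough_outline(points, voxel=0.5):
    """Cheap whole-object outline: occupied cells of a coarse voxel grid."""
    cells = np.unique(np.floor(points / voxel).astype(int), axis=0)
    return cells * voxel + voxel / 2               # one coarse point per cell

_, flatness = point_normal_pca(points, idx=0)
outline = rough_outline(points)
print(f"local flatness at point 0: {flatness:.3f} (higher = less reliable normal)")
print(f"rough outline: {len(outline)} coarse points summarize {n} raw points")
```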
Welcome to the forums @Trung_Doan ! I definitely agree! It’s something we try to be mindful of, and it came up recently in a conversation about “logos on cups”, and how frequent 2D reference frames (like a printed logo) are on manufactured objects vs. in the natural world.
When it comes to static objects, I think we’re optimistic that similar principles would apply, and as you say, it would be rough outlines. For example, a rock would not have all of its contours learned (unless it was a very very important rock). Instead, the point-normal type feature extraction would ignore the details of crevices and cracks in the surface. Similarly for a tree, a cluster of leaves could inform a local surface-like feature, rather than each leaf being recognized. Together with sparse models, this would contribute to the more child-like understanding of a tree as a central trunk with a blob of green.
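Here is a minimal sketch of the "cluster of leaves as one surface" idea, assuming a simple PCA-based normal estimate (the neighbourhood radii and the synthetic patch below are purely illustrative, not Monty's actual parameters): widening the neighbourhood used for the normal averages away crevices and individual "leaves", leaving a single surface-like feature.

```python
# Hypothetical sketch: fine vs. coarse neighbourhood scale for a surface normal.
import numpy as np

rng = np.random.default_rng(1)

# A roughly flat patch (z ~ 0) with fine-grained bumps standing in for leaves.
xy = rng.uniform(-1.0, 1.0, size=(4000, 2))
z = 0.05 * np.sin(20 * xy[:, 0]) * np.sin(20 * xy[:, 1]) + 0.02 * rng.normal(size=4000)
patch = np.column_stack([xy, z])

def normal_at(points, center, radius):
    """PCA normal using all points within `radius` of `center`."""
    nbrs = points[np.linalg.norm(points - center, axis=1) < radius]
    cov = np.cov((nbrs - nbrs.mean(axis=0)).T)
    _, eigvecs = np.linalg.eigh(cov)
    n = eigvecs[:, 0]                              # direction of least variance
    return n if n[2] >= 0 else -n                  # orient consistently upward

center = np.array([0.05, 0.08, 0.0])
fine = normal_at(patch, center, radius=0.08)       # follows individual "leaves"
coarse = normal_at(patch, center, radius=0.8)      # treats the cluster as one surface
print("fine-scale normal:  ", np.round(fine, 3))
print("coarse-scale normal:", np.round(coarse, 3)) # close to the average surface
```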
We think the above could work well with hierarchy/heterarchy, in line with what you say about “the whole before its components”. For example, you might have very rough models of coarse shapes like blobs, ovals, etc. These could serve as a Gestalt perception that you see immediately, and which can prime you to see something like a tree. Only with focused attention would you start spotting individual leaves.
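And a toy sketch of the Gestalt-priming idea, again purely illustrative (the "priors", point cloud, and thresholds are invented for this example): summarize the whole point cloud as a coarse ellipsoid and match only that summary against a few hand-written shape profiles, deferring fine-grained feature extraction to a later, attention-driven pass.

```python
# Hypothetical sketch: a coarse "blob" descriptor primes candidate categories.
import numpy as np

rng = np.random.default_rng(2)

# Crude stand-in for a tree: a tall thin "trunk" plus a round "canopy" blob.
trunk = np.column_stack([0.05 * rng.normal(size=(300, 2)), rng.uniform(0, 2, 300)])
canopy = rng.normal(scale=0.6, size=(1200, 3)) + np.array([0, 0, 2.5])
cloud = np.vstack([trunk, canopy])

def blob_descriptor(points):
    """Coarse shape summary: relative ellipsoid axis lengths from whole-cloud PCA."""
    centered = points - points.mean(axis=0)
    eigvals = np.linalg.eigvalsh(np.cov(centered.T))
    axes = np.sqrt(np.maximum(eigvals, 0.0))[::-1]  # longest axis first
    return axes / axes[0]                           # scale-free elongation profile

# Toy "priors": elongation profiles a Gestalt stage might associate with classes.
priors = {
    "ball-like": np.array([1.0, 0.95, 0.9]),
    "tree-like": np.array([1.0, 0.45, 0.4]),
    "stick-like": np.array([1.0, 0.1, 0.1]),
}

desc = blob_descriptor(cloud)
scores = {name: -np.linalg.norm(desc - proto) for name, proto in priors.items()}
best = max(scores, key=scores.get)
print("blob descriptor:", np.round(desc, 2))
print("primed category:", best)   # coarse guess; focused attention refines later
```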