2025/06 - Brainstorming Around Behavior and Deformations

@hlee leads the team in brainstorming around our current thinking on behavior, morphology models, and distortions.

00:00 Introduction and Meeting Agenda
01:50 Recap of Object Behavior Modeling
05:55 Discussion on Object Distortions
13:01 Debate on General Deformation Models
56:41 Understanding Behavioral Models
58:59 Flow Fields and Sequence Predictions
01:05:38 Distortion Maps and Object Reference Frames

1 Like

This is a fascinating discussion. But pleaaaaase, could you make the presentation publicly accessible? It requires a paid account, but even after that it doesn’t provide access to it :angry:

hi @srgg6701 ,

I talked to the team and we’re ok with making these public. Here is an exported image, and the link should now be viewable in read-only mode! :tada:

https://link.excalidraw.com/l/gO8aypxgkZ/6kbbyIp6oU1

4 Likes

Great, I got them. Thank you, Will! :folded_hands:

1 Like

TL;DR: Could columns handle deformations by learning a graph of morphologies for a single object? Or by learning key frames and relying on a higher-level sequence to guide the predictions? While thinking about this, I came up with a small mental-imagery experiment that I share below; I’d love your take.

Thank you for your work and for sharing it with the world! This video in particular was very thought-provoking, so I wanted to share some ideas it sparked. I apologize if my questions have already been covered elsewhere; I simply don’t have enough time to check everything you publish.

Two hypotheses on deformations

From what I learned about HTM before, it remembers sequences of its inputs and outputs a “label” for a given sequence as long as the predictions keep matching. But does it have to be a linear sequence, or could it instead be a more general case of a graph? If so, a column might learn not just one morphology but a family of morphologies under the same object label, with predictions driven by the current node in that graph, while the persistent label would help perceive the object as “the same” throughout the deformation process.
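To make this first hypothesis a bit more concrete, here is a toy Python sketch of what I mean. All the names, nodes, and features are hypothetical; this is not HTM or Monty code, just an illustration of one label owning a graph of morphologies with predictions driven by the current node:

```python
# One object label owns a graph of morphologies; predictions come from the
# current node and its neighbours, while the label never changes.
morphology_graph = {
    "label": "umbrella",
    "nodes": {
        "closed":    ["folded canopy", "straight ribs"],
        "half_open": ["loose canopy", "angled ribs"],
        "open":      ["taut canopy", "spread ribs"],
    },
    # edges constrain which morphologies can follow the current one
    "edges": {
        "closed": ["half_open"],
        "half_open": ["closed", "open"],
        "open": ["half_open"],
    },
}

def predict_features(graph, current_node):
    """Predict features for the current morphology and its reachable neighbours."""
    candidates = [current_node] + graph["edges"][current_node]
    return {node: graph["nodes"][node] for node in candidates}

# The object label stays "umbrella" while predictions follow the graph.
print(morphology_graph["label"], predict_features(morphology_graph, "half_open"))
```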

Another possibility is that deformations could be learned by a hierarchical setup, where the lower-level column learns “key frames” of an object’s morphology as distinct objects, and the higher-level column learns sequences of those key frames. Intermediate frames could then be represented by ambiguity in the lower-level column’s output, supported (maybe even maintained) by feedback from the upper level that knows the sequence. That may also allow for accurate predictions at the lower level, since (as far as I understood from an HTM course) ambiguity in the column output means predicting the next inputs for every candidate model simultaneously.
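And a similarly hypothetical sketch of this second hypothesis, again with made-up names and numbers: the lower level only knows key frames, an intermediate input matches several of them at once (ambiguity as a union of candidates), and the higher level, which knows the sequence, feeds back support for exactly those candidates:

```python
key_frames = {"switch_off": 0.0, "switch_on": 1.0}   # lever angle per key frame
learned_sequence = ["switch_off", "switch_on"]       # higher-level sequence model

def lower_level(lever_angle, tolerance=0.7):
    """Return every key frame compatible with the observation (ambiguity = union)."""
    return {name for name, angle in key_frames.items()
            if abs(angle - lever_angle) <= tolerance}

candidates = lower_level(0.65)                 # switch two-thirds through its stroke
feedback = candidates & set(learned_sequence)  # top-down support keeps both alive
print(candidates, feedback)                    # both key frames stay simultaneously predicted
```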

Mini thought‑experiment

  1. Pick an object that has two common end‑states (e.g., a two‑position toggle switch or an automatic umbrella).
  2. Mentally visualize it in one familiar state for ~5s.
  3. Visualize it in an unusual mid-transition state (e.g., switch two-thirds through its stroke, umbrella two-thirds open) for ~5s.

Shared observations (several participants)

  • For the unusual state, the brain seems to start from a familiar state and mentally “move” the object; some described it as pausing a video, others as manipulating it by hand.
  • Conceptually, the target state feels instantly understood, but rendering it often requires mentally running through a sequence.
  • Holding the image of an unusual state feels noticeably harder than holding the familiar version, and a bit unpleasant. It tends to “snap back” to a familiar state unless we actively keep it frozen. The feeling is similar to hearing an unstable musical note that “wants” to be resolved.
  • After several attempts, it becomes much easier to hold the intermediate state in mind, as if the brain had learned it.

Not sure how well this supports any of the hypotheses above, but it fascinates me that simply replaying mental sequences seems enough to learn new internal objects. It makes me wonder whether the mechanism of learning deformations is also part of how new high-level concepts arise in the brain from well-known ones, even without fresh sensory input.

Thanks again for the inspiration! I’d love to hear your thoughts

2 Likes

Hey @vizvamitra, welcome to the forum! :smiley:

Really cool thought experiment you did, and thanks for sharing what you found here.

From what I learned about HTM before, it remembers sequences of its inputs and outputs a “label” for a given sequence as long as the predictions keep matching. But does it have to be a linear sequence, or could it instead be a more general case of a graph? If so, a column might learn not just one morphology but a family of morphologies under the same object label, with predictions driven by the current node in that graph, while the persistent label would help perceive the object as “the same” throughout the deformation process.

Another possibility is that deformations could be learned by a hierarchical setup, where the lower-level column learns “key frames” of an object’s morphology as distinct objects, and the higher-level column learns sequences of those key frames. Intermediate frames could then be represented by ambiguity in the lower-level column’s output, supported (maybe even maintained) by feedback from the upper level that knows the sequence. That may also allow for accurate predictions at the lower level, since (as far as I understood from an HTM course) ambiguity in the column output means predicting the next inputs for every candidate model simultaneously.

I think both of these hypotheses make sense: generalizing a sequence (a path graph) into a graph, or storing some list of “key frames”. I suppose the edge connections in a graph could tell us how objects can change their morphology, especially if there are multiple ways an object can deform, or if one deformed state cannot jump to another deformed state directly.

A few things I want to mention about “key frames” in particular:

  1. We want to avoid storing all key frames of an object while it is deforming. I think this is actually supported by your thought experiment. :slight_smile: It seems like participants didn’t really “store” the unusual intermediate state at the beginning of the experiment, which is why holding it felt a bit harder.
  2. There are many reasons for the above “constraint” - not just the pure memory requirement, but also that we want to avoid having to re-learn whole objects at every step of the deformation sequence. For example, if you see a stapler open for the first time, you will have to learn how the locations on the stapler change, but you don’t have to relearn everything about the stapler (e.g. you know it will still have the same color and print on it, and the overall shape of the top will be preserved).
  3. That said, there may still be some key frames, especially when objects are stationary after deformation - I could imagine storing a switch model in both the “on” and “off” states, but not necessarily the middle limbo, haha. In some of our past research meetings on modeling object behaviors, we discussed that while the object is moving, the sensed changes would be stored in the behavior model. When the object stops moving, there is no more change detected, so the sensed features get stored in the morphology model again, potentially as a new “key frame”. You might find interesting this example Jeff brought up in a recent research meeting, about how we can only learn what a horse’s legs look like as it is running if we can take a video of it and pause it: https://youtu.be/daOHoMy0Ly4?si=QZTOLdhB4kUUrlF_&t=3969

So modeling object behaviors is still very much an ongoing brainstorming topic (and I hope to hear your thoughts on future videos we post about these!), and we have been discussing things like interpolation as well (to quickly “construct” a middle state on the fly without having to learn and store a key frame).
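To illustrate very roughly what I mean by that routing and interpolation, here is a minimal sketch. The names and structures are made up for this post and are not the actual Monty implementation:

```python
# While change is detected, observations feed a behavior model; once the
# object is stationary, the current features are stored in the morphology
# model as a potential new key frame. Middle states can be interpolated on
# the fly instead of being stored.
behavior_model = []     # sensed changes while the object is deforming
morphology_model = {}   # stationary key frames, e.g. "on" / "off"

def observe(name, features, change_detected):
    if change_detected:
        behavior_model.append(features)       # learn how the object changes
    else:
        morphology_model[name] = features     # store a new key frame

def interpolate(frame_a, frame_b, t):
    """Construct an intermediate state without storing it as a key frame."""
    return {k: (1 - t) * frame_a[k] + t * frame_b[k] for k in frame_a}

observe("off", {"lever_angle": 0.0}, change_detected=False)
observe("on",  {"lever_angle": 1.0}, change_detected=False)
print(interpolate(morphology_model["off"], morphology_model["on"], t=0.66))
```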

It makes me wonder whether the mechanism of learning deformations is also part of how new high-level concepts arise in the brain from well-known ones, even without fresh sensory input.

This is a pretty intriguing idea! By “high-level concepts” are you referring to abstract concepts like “democracy”?

2 Likes

I’ve watched the brainstorming video and would like to add my own ideas to the brainstorming session, hoping to get people to consider other possibilities as well.

From Jeff’s book, there is a quote where he says that a single neuron can’t do much on its own, but when you put many of them together, magic happens. In the same way, I get the sense that we are trying to cram too much into a single Learning Module. I think we can solve the deformations problem more easily if we use higher-level LMs. So instead of one layer of LMs, we would have several, and together they can identify deformations.

So I think a single LM could recognize a full object by itself, but we don’t have to get a single LM to learn deformations by itself. The system would only support deformations when multiple LMs are working together.


Also, I think Jeff was right in the video, we are missing some key concepts.
We are not thinking about the problem correctly. Here are some ideas I think are missing…

What if a cup is not a cup? We are thinking of a cup as a single standalone object.

But what if it’s not?

Imagine we drop the cup and break it, then we glue it back together. So the cup is now made of multiple shards. It’s not a standalone object anymore. It’s a composite object, made of several sub objects - much like a car.

So what if we think of the unbroken cup also as a composite object? To us as humans it feels like a single standalone object, but does our brain really represent the cup as a standalone object? Guessing… probably not.

Let’s consider this idea for a moment. If the first layer of LMs can identify the parts of the object (for a cup, that would be: handle, body, base, rim), and then send that information to higher-level LMs where their relative locations to one another can be processed…

And say the higher level LMs can store a “range” of relative distances between the parts of the object (instead of storing a transformation map)…

For example, the higher LM can store the distance between the cup handle and the cup body as 0-5 centimeters. Then it can store the distance between the round rim and the base as 6-12 centimeters.

What can this do? I think this type of setup could detect most transformations of a cup and still identify it as a cup. It wouldn’t be able to identify a specific version of a cup (a unique cup), but it can detect the category of the object as the “cup category”.

And for unique objects that exist in the real world, it can store the exact distances between the different parts of the cup. That way it could both identify unique objects and also recognize other objects in that category when seeing them for the first time.
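Here is a rough Python sketch of what I’m imagining, with toy numbers chosen purely for illustration (distances in centimeters): a higher-level LM stores allowed distance ranges between parts for the category, and exact distances for a specific instance it has learned.

```python
cup_category = {
    ("handle", "body"): (0.0, 5.0),
    ("rim", "base"):    (6.0, 12.0),
}
my_unique_cup = {
    ("handle", "body"): 1.2,
    ("rim", "base"):    9.5,
}

def matches_category(observed, category):
    """True if every observed part-to-part distance falls inside the stored range."""
    return all(lo <= observed[pair] <= hi for pair, (lo, hi) in category.items())

def matches_instance(observed, instance, tol=0.2):
    """True if every observed distance is close to the exact stored distance."""
    return all(abs(observed[pair] - d) <= tol for pair, d in instance.items())

observed = {("handle", "body"): 1.3, ("rim", "base"): 9.4}
print(matches_category(observed, cup_category))   # True: it's in the "cup category"
print(matches_instance(observed, my_unique_cup))  # True: it's this specific cup
```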


Also, I’d like to challenge the idea that morphology is the only thing that matters. Currently we are thinking about “objects at location”, but a different approach to consider might be “mass at location”. (By “mass” I mean mass as defined by physics, as in “matter”.)

Let’s consider a cloud in the sky… It doesn’t have a fixed shape. It can take any shape. It can even take the shape of a cup if we’re lucky enough haha… So a cloud is just matter, just mass, just water vapor in a higher concentration.

Or we can consider the air itself. It’s invisible, we can’t see air but we feel the wind. If we close our eyes and blow into our palm, we can feel the air pressure on our skin. But we know that the pressure doesn’t mean an object is present there. (Imagine a robot with touch sensors being placed outside in high winds… the robot will think it’s touching objects when in fact it’s only sensing the air pressure on its touch sensors.)

So I’m saying that certain things can be defined by their morphology. But other things cannot.

And I think one of those things is the t-shirt. I believe a t-shirt is not defined only by its shape, but also by its matter type (the material it’s made of - fabric). I could cut a large piece of paper into the shape of a t-shirt, and the AI would think it’s a t-shirt (based on the shape), when in fact it’s not.

Well, my hope is that you will consider these ideas. And if not, I hope that just thinking about these can spark other ideas at least.

1 Like

Hi @AdamLD

Those are some great points and definitely things we have considered or are still considering. I hope you don’t mind if I group them into two broad categories to give a bit more context on how we think of those:

  1. Using hierarchy to deal with deformations

Hierarchy is definitely something we plan to leverage to deal with differently shaped objects that belong to the same broader class of objects (like different types of cups, cars, or airplanes having differing low-level shapes but a consistent relative arrangement of subcomponents). I gave a presentation on this a while back with some visualizations that you might find interesting: https://youtu.be/-qPfBrTVoks?si=mBfh_lLpAvNEpsB_&t=3791

However, I think this will only be part of the solution as it can just be used to recognize different instances of the same type of object, but not to predict how the relative arrangement of features on an object changes as it deforms. Imagine a balloon or a ball being inflated or squished. You can make pretty good predictions of how the print on the balloon will deform as you inflate it, or on the ball as you squish it (even if you have never seen a ball/balloon with this particular print). It might still be solved using hierarchy (@nleadholm made an interesting proposal in the next research meeting, which should be uploaded soon). We are still trying to figure out all the details of it, so if you have more ideas around those particular cases, please feel free to share!

  2. The definition of morphology

I think this might be a bit of a misunderstanding of what we mean when we talk about morphology. It is not meant to be limited to hard physical mass. The models we are building are simply defined as “features at locations”. The features can also be how the water vapor of clouds appears on our retina, and the relative arrangement of those features can change. When we talk about morphology models in distinction from feature models, we isolate orientations from pose-independent features like color, texture, or temperature. So a purely morphological model would just be orientations at locations.
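A minimal sketch of that distinction, with made-up locations and values just to illustrate (not actual Monty data structures): the full model is “features at locations”, the purely morphological model keeps only orientations at those locations, and the pose-independent features form a separate feature model at the same locations.

```python
learned_object = {
    (0.0, 0.0, 0.0): {"orientation": (0.0, 0.0, 1.0), "color": "white", "texture": "smooth"},
    (0.0, 3.0, 0.0): {"orientation": (0.0, 1.0, 0.0), "color": "gray",  "texture": "fluffy"},
}

# purely morphological model: only orientations at locations
morphology_model = {loc: f["orientation"] for loc, f in learned_object.items()}

# feature model: pose-independent features at the same locations
feature_model = {loc: {k: v for k, v in f.items() if k != "orientation"}
                 for loc, f in learned_object.items()}

print(morphology_model)
print(feature_model)
```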

I am not sure how to tie the sensation of wind into this right now. I think this is a question for another day :smiley:

A t-shirt is an example we often use in our brainstorming meetings, but we still don’t fully understand it. It seems to be defined by local relative arrangements of features, but the global shape can vary a lot. It can have so many variations that we can’t possibly learn them all, yet there are many variations that would not make it a t-shirt anymore (like if you cut it up into little pieces), and we are able to tell them apart. The fact that something is made out of fabric isn’t the only thing that tells you it is a t-shirt (also, I would argue that a human would recognize that the paper is in the shape of a t-shirt without trouble, even if it is a different material).

I hope this helps give a bit more context on how we are thinking about these topics. As you can see, we also still have a lot of questions around this ourselves so your thoughts and ideas are definitely appreciated :slight_smile:

2 Likes

Hi @vclay

Thanks for clarifying the morphology miscommunication. :grinning_face:

For the t-shirt, agreed. I think the shape also matters. (Not just the fabric material type.)

I believe to solve the t-shirt problem we need “AND” thinking instead of “OR” thinking. Meaning, I believe it’s not either fabric OR shape. I think it’s fabric AND shape.

Maybe there are some other things as well that we are missing… And maybe they count in different amounts (like a weighted sum).

Overall, I think a t-shirt could be identified by a Perceptron-like weighted sum:

weighted_sum = w1*(shape) + w2*(fabric) + w3*(some other things)

(Where w1, w2, w3 are the weights.) So if the sum is greater than some threshold (the negative of the Perceptron’s bias), then the Perceptron would fire and the t-shirt is confirmed and identified!
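In code, it would look something like this. The weights, bias, and scores here are made up purely for illustration:

```python
def is_tshirt(shape, fabric, other, w1=0.5, w2=0.3, w3=0.2, bias=-0.6):
    # weighted sum of the evidence, as in the formula above
    weighted_sum = w1 * shape + w2 * fabric + w3 * other
    # fire if the sum exceeds the threshold (the negative of the bias)
    return weighted_sum > -bias

print(is_tshirt(shape=1.0, fabric=1.0, other=0.5))  # shape AND fabric -> True
print(is_tshirt(shape=1.0, fabric=0.0, other=0.0))  # paper cutout: shape only -> False
```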

The question is: what are those other hidden things? haha (Maybe it’s something we should not explicitly code in, but let the general cortical algorithm figure out by itself?)

Also thanks for the video link, I’ll definitely check it out! :grin:

1 Like

I agree, it is a mix of features and shapes. One idea we’ve been tossing around for a while now is to have separate feature and morphology models. That way they can be recognized independently and arbitrarily combined. You could recognize the shape of a t-shirt in the paper cutout, but also that it has the features of paper (texture, color, …). Different types of t-shirts can have different feature maps associated with them for different fabrics or prints, without having to relearn the general morphology of a t-shirt.

Since we have many semi-independent modeling units and not just one global classification output, those representations can be active at the same time and provide us with a more nuanced perception of the world and its huge range of possible combinations of models. The same object can be “clothing”, “t-shirt”, “my favorite t-shirt with that fun print”, “cotton”, and “folded” at the same time.
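As a very rough illustration of what I mean (hypothetical structures, not a finalized design): the same learned t-shirt morphology can be paired with different feature maps, and several representations can be active at once.

```python
tshirt_morphology = "t-shirt"                 # stands in for a learned shape model
feature_maps = {
    "cotton with fun print": {"texture": "cotton"},
    "paper cutout":          {"texture": "paper"},
}

def perceive(shape_matches, observed_texture):
    active = set()
    if shape_matches:
        active.add(tshirt_morphology)         # morphology recognized independently
    for name, feats in feature_maps.items():
        if feats["texture"] == observed_texture:
            active.add(name)                  # features recognized independently
    return active

print(perceive(shape_matches=True, observed_texture="paper"))  # {'t-shirt', 'paper cutout'}
```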

I hope that makes sense :slight_smile: Also, just again a disclaimer, those are my current musings on the topic and not finalized answers that we all agree on.

2 Likes