2024/12 - Brainstorming on Compositional Policies - Part 5

Viviane presents a summary of object behaviors as they relate to modelling object state, and presents candidate connections between layers in the neocortex that might support this. Bonus: Jeff presents a working hypothesis about how minicolumns might have evolved from outputs that control muscles.

1 Like

I’m pleased to report that the A/V quality of this session was greatly improved from the previous ones. This made it possible for me to follow the discussion pretty well, even when the topics got a bit chewy, speculative, etc. +1, and thanks for letting me be a fly on the wall!

The discussion of staplers made me think about a recent experience. I had a shipping envelope that needed sealing, so I looked around for relevant objects. I briefly considered using a motorized stapler, but settled on a common desk stapler and some packaging tape. These three objects have very little in common in terms of morphology or even features, but I was easily able to evaluate them, make a decision, and act. I find it fun to consider how different hierarchical levels might have acted (and interacted) to achieve this result.

Incidentally, Don Norman’s book The Design of Everyday Things talks a lot about affordances. I found it interesting, illuminating, and a lot of fun (e.g., his snark: “It probably won a design award.”).

On a different note, I’ve been having a lot of fun watching a pair of kittens explore our house. According to a ChatGPT response, the basic structure of a cat’s cortex is pretty much the same as a human’s. That is, it has the same columns, levels, minicolumns, etc. However, “Their prefrontal cortex is much smaller, meaning less high-level planning and abstraction compared to humans.”

All of this makes me wonder whether it might be useful to consider and discuss behaviors and compositional policies in terms of cats. Clearly, they have goals and actions, employ the same senses, and use sensorimotor behavior to solve physical problems. They also appear to have some ability to abstract over objects (e.g., cabinet doors, small toys). So, I’d love to hear a session that takes on “modeling cats”.

2 Likes

This comment is not about the “biological” part of the discussion, but rather the conceptual one: states, behaviors, and reusing them. In the first brainstorming part there was a suggestion that the grid-cell-like machinery of the upper neocortex layers might encode a state space, which is distinct from physical space but which, like any space, may imply concepts of location, movement, and probably path integration. Terminology is always important, and the discussion showed that the meanings of some terms here are yet to be clearly defined, so I will describe my understanding of the concepts, which may be wrong or differ from the Monty team’s.

I understand “state” as a certain relative spatial arrangement of an object’s features. A non-compositional rigid object (if such objects can exist in principle, since objects can at least be broken into pieces) can have only one state within its own reference frame, so there are no behaviors to model. Behaviors become possible when an object consists of a few parts that can have relative poses, and these poses may change, as the two parts of a stapler can move relative to each other. (Related post: Some thoughts about scale invariance and model composition)

We need behaviors to be standalone entities that can be reused across objects. If we assume that a standalone “state space” can exist, then this space probably has to contain these reusable entities, which have to be somehow distinct from each other. Behaviors happen when something, internal or external to an object, forces it to change its state. The way an object is composed defines exactly how it can change when a force is applied. I would therefore give the following general definition of “behavior”: it is a force directed by constraints. The constraints can be physical borders plus resistance to change within those borders. A non-rotating object in zero-g and vacuum is a corner case with no borders, but it will still have at least inertia. From this it looks to me that the reusable entity, the atom of state space, is the simplest constraint, or a primitive degree of freedom (DOF). Movement in physical space means following certain primitive DOF(s) until reaching states with a different set of available DOFs; this transition from one set of DOFs to another is movement in state/behavioral space. The space can be represented by a graph with primitive and composite DOFs as nodes. It looks like a state machine.

As for “path integration”, in the video it was explained with the example of returning to the starting point directly via a new path. In reality a direct path may not be possible due to unknown obstacles (constraints), so I can imagine path integration as the ability to find a new path of arbitrary complexity to an arbitrary point while avoiding the obstacles present, whether foreseen or discovered during movement. Picture it like this: something can make certain paths unusable, which the traveler may find out by looking or by walking, but knowing the AB and BC directions and lengths, the traveler can calculate the location of A and of a hypothetical point D, and return to the starting point via it (CD -> DA). Depending on the obstacles there may be many points between C and A, and some path segments may even lead away from A before eventually turning back toward it.
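To make that concrete, here is a minimal sketch of the vector arithmetic (entirely my own toy code, not anything from Monty): each travelled segment is a displacement vector, the running sum gives the current position relative to the start, and its negation is the homing vector; an obstacle just splits the homing vector into detour segments.

```python
import numpy as np

# Toy path-integration sketch (my own illustration, not Monty code).
# Each travelled segment is a displacement vector; their running sum
# is our position relative to the start, so its negation points home.

segments = [np.array([3.0, 1.0]),    # A -> B
            np.array([1.0, -2.5])]   # B -> C

position = np.sum(segments, axis=0)  # C relative to A
homing_vector = -position            # the direct C -> A path

# If an obstacle blocks the direct path, detour via a hypothetical D;
# the D -> A leg still follows from the same integrated knowledge.
detour = np.array([-3.0, 0.5])       # C -> D, skirting the obstacle
remaining = homing_vector - detour   # D -> A

print(np.allclose(detour + remaining, homing_vector))  # True: both reach A
```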

This picture is probably about what the HC-EC (hippocampus-entorhinal cortex) system does, while a cortical column should work with objects. To me the following example looks both simple and informative: a bolt latch.

It may have one DOF in the fully closed or open states (roll up/down), and another one in between those states (slide left/right). For the latch, movement in state space means movement between DOFs. Changing position in physical space within any one DOF can be considered a cyclic transition back to the same node of the state space, as when rotating the bolt up from the closed/open state: the same vertical DOF is available after each roll movement. Reaching certain positions (which we need to learn) in certain DOFs can lead to a transition to another node with a different set of DOFs; for the latch it is the 90-degree roll, from where one could either roll it further up/down (rotational DOF) or slide it sideways (translational DOF). Sliding sideways leads to a transition to the translational DOF, which eventually can lead to both DOFs and then again to the rotational one only.
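Written out as a toy state machine (all the names and the three-node simplification are mine, purely illustrative), the latch’s DOF graph could look like this:

```python
# Toy DOF graph for the bolt latch (illustrative only). Nodes are sets
# of available DOFs; reaching a learned position within a DOF triggers
# a transition to a node where a different set of DOFs is available.

DOF_GRAPH = {
    "closed":   {"roll": "mid_roll"},                  # rotational DOF only
    "mid_roll": {"roll": "closed", "slide": "open"},   # at ~90 deg: both DOFs
    "open":     {"slide": "mid_roll"},                 # translational DOF only
}

def step(node, dof):
    """Follow one DOF to its learned limit; unavailable DOFs do nothing."""
    return DOF_GRAPH[node].get(dof, node)

node = "closed"
for action in ["slide", "roll", "slide"]:  # sliding is blocked while closed
    node = step(node, action)
print(node)  # -> "open"
```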

If we had only orthogonal DOFs, there would be at most 6 of them in 3D space, and a fixed number for any other dimensionality. But different parts of an object may constrain each other in various ways, probably making the DOFs non-orthogonal, hence we need to learn the DOFs and their combinations for each object class. We also need to learn the limits within each DOF and the magnitude of resistance to change, to simulate friction, inertia, etc.

Each DOF can be represented by a vector (a rotation or translation axis) and can therefore have a pose in the object’s reference frame: this is where and how the association of objects and behaviors could be done. A model of an object would then contain not only the object’s features-at-poses, but also a DOF graph. Some DOFs could be reused across different objects’ models, i.e. be included in multiple graphs, allowing movement between similar objects in behavioral (state) space.
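A rough sketch of what such a model could hold (again my own invented structure, nothing to do with Monty’s actual data types), with one DOF instance shared between two object models:

```python
from dataclasses import dataclass, field

# Illustrative sketch only: an object model stores features-at-poses
# plus a DOF graph, and DOF instances, the reusable "atoms" of state
# space, can be shared between the graphs of different object models.

@dataclass(frozen=True)
class DOF:
    kind: str                # "rotation" or "translation"
    axis: tuple              # axis vector in the object's reference frame
    limits: tuple            # learned range of motion
    resistance: float = 0.0  # friction/inertia proxy

@dataclass
class ObjectModel:
    features_at_poses: dict                        # feature -> pose
    dof_graph: dict = field(default_factory=dict)  # state node -> DOFs

HINGE = DOF("rotation", (1, 0, 0), (0.0, 1.8), resistance=0.3)

stapler = ObjectModel({"anvil": ..., "magazine": ...}, {"rest": [HINGE]})
laptop = ObjectModel({"keyboard": ..., "screen": ...}, {"closed": [HINGE]})

print(stapler.dof_graph["rest"][0] is laptop.dof_graph["closed"][0])  # True
```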

Having DOFs as vectors in physical space may allow for path integration: finding new paths in physical space is equivalent to exploring new DOFs that would eventually allow returning to the original state in a novel way.

Does this stuff make any sense?

Thanks.

2 Likes

I’m in violent agreement with this, though I have no clue how it can or should be implemented. For example, our easy recognition of @vclay’s running banana shows that the “running” behavior extends to all sorts of objects.

On a related note, I wonder how object behavior relates to active manipulation. We can recognize that objects are spinning, but we can also take actions that cause them to start (or stop) spinning. This might be direct (e.g., spinning a top) or indirect (e.g., switching on a fan). Comments?

2 Likes

Couldn’t we define “behavior” as the temporal pattern of a given model state? If so, then wouldn’t we be able to just group together temporally similar patterns?

I guess “running” could be recognized even without the banana itself, just by arms and legs moving in space, although it might take a few extra moments to understand that these are arms and legs. It’s the way they move back and forth, with higher amplitudes at their distal ends, that defines “running”, and maybe the angle at the knee joints, which helps to distinguish “running” from “walking”, or so it seems to me. It is easy for us because we already are the sophisticated sensorimotor systems (with lots of experience) that Monty strives to become, but the running behavior as such looks quite complex, otherwise it wouldn’t be a hard task in robotics.

A pendulum could be a simpler example to consider. It can be just a rigid bar with a single degree of freedom and a certain acceleration/deceleration profile, and it extends to other things: if something hangs by its upper part and swings (say, me on a pull-up bar), accelerating and decelerating, it can be recognized as a pendulum. If there were the same DOF but no acceleration/deceleration, just abrupt starts/stops in the extreme positions, this would rather suggest a kind of mechanism like a windshield wiper, not a pendulum. Its suspension (depending on the type) can constrain it to only one degree of freedom; acceleration/deceleration is defined by the interplay of tension and gravity, and by how the pendulum resists them due to its mass/inertia. “Resistance” may be a fixed value (friction) or a function of position (to model acceleration); how it actually swings will depend on the forces’ magnitudes (e.g. on the Moon gravity is lower and so the pendulum’s period is longer), but it will still be a pendulum (or me on a pull-up bar), not something else.

That’s why, for me, constraints (or DOFs, as what the constraints allow) and resistance (maybe also the exact configuration of an object, as with the cartoonish banana’s knee angle) are what defines “behavior”. To me this is something that has to be simulated, and I guess a possible way to implement it is mentioned in Implement & Test GNNs to Model Object Behaviors & States.
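To illustrate the pendulum/wiper distinction with toy physics (my own code, nothing authoritative): the same single rotational DOF, but two different resistance profiles, give two recognizably different velocity signatures.

```python
import math

# Toy simulation (my own illustration): one rotational DOF in both
# cases, but a pendulum's velocity profile emerges from gravity vs.
# inertia, while a wiper moves at constant speed and reverses abruptly.

def pendulum(theta=0.6, omega=0.0, g=9.8, length=1.0, dt=0.01, steps=300):
    profile = []
    for _ in range(steps):
        omega += -(g / length) * math.sin(theta) * dt  # gravity-driven
        theta += omega * dt
        profile.append(omega)
    return profile

def wiper(theta=-0.6, speed=1.0, limit=0.6, dt=0.01, steps=300):
    profile = []
    for _ in range(steps):
        theta += speed * dt
        if abs(theta) >= limit:  # abrupt reversal at the extreme positions
            speed = -speed
        profile.append(speed)
    return profile

# The pendulum slows to ~0 at its extremes; the wiper's speed never drops.
print(min(abs(w) for w in pendulum()))  # close to 0
print(min(abs(w) for w in wiper()))     # exactly 1.0
```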

On a related note, I wonder how object behavior relates to active manipulation.

A manipulator, say a finger, is also an object that has its own constraints - the bones’ shapes and the joints - plus muscles which, being constrained by the bones and by each other (antagonists), apply force at a certain position in space (where the target object is) in a certain direction. And it is again something to simulate. Indirection, like switching something on to make it behave on its own (due to internal causes), just adds more things to model and simulate. That’s the way I can think of it now. I hope I’ve got your point right; please correct me if I’m wrong.

How would you define the “temporal pattern” itself? As a kind of melody with a “given model state” as an arbitrary note in it? Take, for example, a gamepad analog stick: almost any next position is possible from almost any given state, depending on our actions, so there is a huge number of possible movement trajectories. What would its temporal pattern look like?

Hmmm. I actually quite like your melody analogy. Personally, I find myself wanting to view a DOF as a sort of predictability horizon (in the chaos theory sense of the word). With a highly constrained DOF (your bolt latch example), you have a much narrower horizon than with something like the analog stick. Given that narrower horizon, the range of potential future states becomes much more predictable. Its “behavior” becomes constrained.

However, not everything has such constrained behavior, nor are most things predictable over long enough time-scales.

I guess, imagine this: a song begins to play on the radio. You recognize the opening tune, even though you can’t quite name the song yet, though you try and guess its name all the same. Your guess turns out to be wrong. Turns out it’s a completely different song than what you predicted, despite the similarity in that opening state. The analog stick is like this.

Sometimes it is impossible to model a thing’s temporal pattern, at least while working within an isolated “model state” of that pattern. Instead, sometimes what you need to do is observe a sequence of these model states, ones in close temporal proximity to each other. It’s as though you had listened to that song on the radio for a full minute now: you’ll have gained a much more likely guess as to its name. As for our analog stick analogy, it’d be like observing “Up, Up, Down, Down, Left, Right, Left…” (you can probably guess the next direction).
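As a toy version of that idea (my own sketch, not anything from Monty or HTM): count how observed contexts of model states continue, and predict the most frequent continuation; a longer context makes the guess far more reliable.

```python
from collections import Counter, defaultdict

# Toy sequence predictor (illustrative only): count the continuations
# of fixed-length contexts of "model states" and predict the most
# common one. A longer context narrows the range of likely futures.

def train(sequence, context_len=2):
    table = defaultdict(Counter)
    for i in range(len(sequence) - context_len):
        context = tuple(sequence[i:i + context_len])
        table[context][sequence[i + context_len]] += 1
    return table

def predict(table, context):
    counts = table.get(tuple(context))
    return counts.most_common(1)[0][0] if counts else None

observed = ["Up", "Up", "Down", "Down", "Left", "Right", "Left", "Right"]
model = train(observed, context_len=2)

print(predict(model, ["Left", "Right"]))  # -> "Left"
```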

Does this make sense?
Not sure if I answered your question, or if how I’m viewing this is even correct, but it’s how I see it in my own mind.

Thank you for the details. The melody analogy is not really mine; I believe it was used by the Monty team as an example of how HTM works, which is about learning and predicting sequences, i.e. modelling one-dimensional data. So it seems to me that you mean HTM. Specific sequences of stick movements can also be learned, if they are repeated enough times and are reasonably stable. But that is model-free, as far as I understand. It was mentioned in the brainstorming series that action sequences are first based on object models, but if repeated frequently they will eventually migrate to HTM-like execution: less flexible, but faster. From my understanding, such a learned sequence may still work even if the constraints somehow change, as long as they do not interfere with the sequence’s movements and still allow reaching the same result.

Given that narrower horizon, the range of potential future states becomes much more predictable.

I think the point of model-based behaviors is to avoid predicting potential futures, and prediction in general. Models are needed to actually run a simulation: to “live” through various possible scenarios and pick the needed one to execute in the real world. What we may know beforehand is the structure of the objects and the constraints (DOFs) they have; the rest we have to discover during simulation.

To speak more generally, we could bring in Stephen Wolfram’s concept of computational irreducibility: in complex situations we have to model a system and let it run according to its rules to see which state it ends up in, as we may not be able to predict it directly. HTM-like activity is useful when we already know some causal relationships and may skip the simulation.
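In code the contrast might look like this (my own toy, reusing the latch DOF graph from my earlier comment): roll candidate action sequences through a forward model and keep one that reaches the goal, rather than predicting the outcome in closed form.

```python
from itertools import product

# Toy "run the simulation to find out" sketch (illustrative only):
# candidate action sequences are played through a forward model (the
# latch DOF graph), and we keep the first one that reaches the goal.

GRAPH = {
    "closed":   {"roll": "mid_roll"},
    "mid_roll": {"roll": "closed", "slide": "open"},
    "open":     {"slide": "mid_roll"},
}

def simulate(start, actions):
    node = start
    for action in actions:
        node = GRAPH[node].get(action, node)  # blocked DOFs do nothing
    return node

def plan(start, goal, max_len=3):
    for n in range(1, max_len + 1):
        for seq in product(["roll", "slide"], repeat=n):
            if simulate(start, seq) == goal:
                return seq
    return None

print(plan("closed", "open"))  # -> ('roll', 'slide')
```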

What I was trying to imagine in my initial comment here is how a behavior could be a reusable part of an object model, to allow for model-based behavior. If I’m not mistaken, that was only one of the points in the brainstorming session, while the sequence learning you mention was also discussed there.

But it seems to me that even in the case of learned sequences, “behavior” can be defined in terms of constraints: your body constrains your movements to reproduce the remembered sequence, which hopefully is “congruent” with the state of the environment, so the environment will not constrain you; but if it does, then you need to go back to model-based exploration to see what has changed.

Ah, perhaps there’s a misunderstanding on my part then.

I’d thought HTM was essentially a more biologically constrained version of Monty. Unless you meant to refer to SDRs? But those haven’t been implemented yet, at least not to my understanding.

As for model-based vs. model-free policies, I didn’t think those were all that related to what we were describing here, honestly. In my mind, model-based policies are emulating whole brain networks (in the context of Monty). For example, the hypothesis-testing policy is (broadly) capturing central executive functioning - at least I’ve interpreted it as such. Though I could be wrong in thinking that.

Model-free policies, then, are capturing more autonomic function, where the input-to-behavioral-output path is algorithmically simple. Though it’s also as you said: policies which start off as model-based may in time become model-free through repeated, consistent execution. Think muscle memory. Basal ganglia/cerebellar functioning helps facilitate this migration in biological nervous systems (amongst other structures), though this also has yet to be implemented in Monty.

Stephen Wolfram’s concept of computational irreducibility

This is pretty much just algorithmic uncertainty. Nothing against Wolfram (I actually quite like him!), but this wasn’t a concept developed entirely by him. For example, it’s been used by the ALife community to understand cellular automata principles pretty much forever, going back to things like Conway’s Game of Life (and likely predating even that by a long way). Pedantic point aside, I agree with your line of thinking here regarding simulating environmental conditions.

What I was trying to imagine in my initial comment here…

In my mind, a model’s “behavior” is the temporal equivalent of something like “feature space,” which we use to spatially describe an object. Both are equally reusable across different object models.

Also! Your last paragraph reminds me of the old Alfred North Whitehead quote: “The whole purpose of thinking is to let the thoughts die instead of us dying.”

Edit: It should be noted that I could be totally wrong in my understanding of these definitions and of the team’s design intent. It might not be a bad idea for somebody on the TBP team to step in and clarify some of these things :slight_smile:

I’d thought HTM was essentially a more biologically constrained version of Monty. Unless you meant to refer to SDRs? But those haven’t been implemented yet, at least not to my understanding.

My understanding is that HTM (as already described in “On Intelligence”) used SDRs internally from the start and was intended for learning sequences, as a simpler task and a first step. It could perhaps be extended for Monty in the future. But of course I might have mixed things up.

In my mind, a model’s “behavior” is the temporal equivalent of something like “feature space,” which we use to spatially describe an object. Both are equally reusable across different object models.

Actually, I think I consider it the same way, and my point was about how it could be reused. For me “temporal” means “progressing in time”, in the sense that the object’s state (the relative spatial arrangement of its features) changes iteratively. That’s another reason I recalled Wolfram: his notion of time as “the inexorable progress of computation” (I don’t claim it’s his invention, I just heard it from him). So the spatial arrangement of features changes iteratively. How? We need to simulate each exact case to know; but at least we may know that the object’s state at moment t imposes certain constraints on its state at t+1. So the object’s “behavior” boils down to something we know (constraints/DOFs) plus a simulation of the forces that influence it. Hence a DOF could be a reusable entity for similar objects that constrain their behaviors in similar ways (stapler, laptop lid, door/window).
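One last toy sketch of that iterative picture (mine alone, with invented numbers): the state at t, plus a DOF’s learned limits and resistance, constrains what the state at t+1 can be, and the same update rule serves any object sharing the DOF.

```python
# Illustrative only: the state at time t, plus a DOF's learned limits
# and resistance, constrains the state at time t+1. The same rule is
# reusable for any object sharing the DOF (stapler, laptop lid, door).

def step(angle, applied_force, limits=(0.0, 1.8), resistance=0.3, dt=0.05):
    velocity = applied_force / (1.0 + resistance)      # resistance damps change
    next_angle = angle + velocity * dt
    return min(max(next_angle, limits[0]), limits[1])  # constraints clip it

angle = 0.0
for t in range(60):  # push the lid open for a while, then stop pushing
    angle = step(angle, applied_force=2.0 if t < 40 else 0.0)
print(round(angle, 2))  # -> 1.8: pinned at the DOF's upper limit
```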

Edit: It should be noted that I could be totally wrong in my understanding of these definitions and of the team’s design intent. It might not be a bad idea for somebody on the TBP team to step in and clarify some of these things

Same here, agree :slight_smile:

That’s a really interesting question. I’m not totally sure. My intuition wants to lean toward using oscillatory wave properties as a timekeeping mechanism, by which various salient value assignments can serve as a kind of metadata, stitching “model instances” together across time.

Something like this:

You can ignore a lot of the verbiage (semantic memory, episodic memory, etc.). This was just a diagram I had put together for a much earlier project, basically me trying to conceptualize the autobiographical playback of memory. Though I do think a lot of the principles here may still be at play. But what are your thoughts? Do you have any suggestions for the question of how?

But what are your thoughts? Do you have any suggestions for the question of how?

Sorry, unfortunately I can’t say anything here, as it touches on neurobiological aspects and I don’t have the background required to think in those terms; I still have lots of material to read to build up even a basic understanding. My considerations above were conceptual rather than specific to biology.

Ah, right. That’s fair enough. I’ll try to revisit this later today and see if I can explain it in a non-biological way, rephrased in terms familiar to you. If I may ask, what’s your background?

I’m just an average software engineer at an average company; there’s no real need to formulate it in some special “familiar way”. It’s just that the neurobiological specifics (in all their vastness and complexity), multiplied by my still-poor knowledge of them, won’t allow for a substantive discussion.

P.S.

I think the point of model-based behaviors is to avoid predicting potential futures, and prediction in general. Models are needed to actually run a simulation …

It was wrong of me to say this, of course, as a simulation can itself be considered a prediction of what will or would happen given certain circumstances and actions.