Hi, all!
Maybe this is a question for the “Monty code” category, but I’m not sure.
In the heterarchy process described in the paper, where the output of one LM is “projected” to the input of another LM (as in the tire-and-car example), what is the “pose” of the “component object” (e.g. the tire)? Since the component is “made of” a number of poses, are all of them projected to the upper LM, or is a new pose defined for the component object?
Thanks,
Ely
Howdy! Inter-LM communication happens via the Cortical Messaging Protocol (CMP), defined in the State class. There, the morphological_features consist of pose_vectors of shape (3, 3) and pose_fully_defined (a bool). A new pose will be defined for the component object.
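For intuition, here is a minimal sketch of what such a pose-bearing message might look like. Only the field names morphological_features, pose_vectors, and pose_fully_defined come from the actual codebase; the class body and the features field below are simplified assumptions, not the real State implementation.

```python
import numpy as np

# Hedged sketch of a CMP-style message; the real State class in the Monty
# codebase has more fields -- only the pose-related ones are mimicked here.
class StateSketch:
    def __init__(self, pose_vectors, pose_fully_defined, features):
        # pose_vectors: a (3, 3) array of three orthonormal direction
        # vectors defining the pose of the object's reference frame.
        self.morphological_features = {
            "pose_vectors": np.asarray(pose_vectors, dtype=float),
            "pose_fully_defined": bool(pose_fully_defined),
        }
        # Non-morphological features, e.g. color or an object ID.
        self.features = features

# A component object such as "tire" is sent upward as ONE such message:
tire_state = StateSketch(np.eye(3), True, {"object_id": "tire"})
```

So the component is conveyed with a single (3, 3) pose, not one pose per node of its graph.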
Just to add onto @tslominski’s excellent response: in reference to composite modeling, I believe the lower-level LMs output aggregated representations of their respective objects; they don’t output every individual pose. I’m sure there’s some sort of threshold that defines whether or not a pose should become part of that unified output, though I don’t yet know how that’s handled in the code.
Thanks @tslominski and @HumbleTraveller for responses.
Just to clarify: suppose I have a component object defined as a cylinder with 100 nodes (pose + features). To communicate this object to another “upper” LM, will the CMP send all 100 nodes? If “a new pose will be defined for the component object,” will it then have 101 poses/nodes, or just one pose in the vector? And if just one pose, how is that pose calculated? As an average of the current poses? Some other function that aggregates them? Thanks!
Hey there,
I noticed tslominski began typing up a response, then stopped. I’m guessing he’ll come back later with some info. In the meantime, I’ll help where I can…
To answer your question broadly, only a single unified pose is typically conveyed via the CMP (at least in this scenario). The “high-level” LM will then treat this single pose as a compact feature within its own internal model.
As for calculating the aggregate, that’s really more @tslominski’s wheelhouse than mine. He’s one of the brains behind the codebase; I’m just a fan of their work. But if you’re curious, I would look up the evidence-matching LM. Specifically, I’d start with the EvidenceGraphLM class.
I can try to get you this information over the weekend. But I’ll be honest: I’d be starting from the same spot as you, and there’s also the chance my read on it will simply be wrong. I’d hate to misinform you, so it may be best to just wait for the team’s response here.
@ElyMatos and @tslominski,
Typing this up on my phone, so hopefully everything is readable enough. @tslominski, I’m including you here so you’ll have the opportunity to point out any inaccuracies I might make.
Alright, here we go…
The way aggregation works is actually pretty interesting. As it turns out, an LM does not “average out” or “aggregate” all of its graph’s nodes, or at least not in the way we were imagining.
What actually happens is that within the graph there is a single node that comes to represent the entire graph. As an analogy, imagine a village of people who assign a delegate to represent them. It’s like that.
Now, we need a way to determine which node becomes this delegate. The process by which we do this is called evidence accumulation.
Each node has attributes (features) like spatial location, curvature, color, or orientation. During evidence updates, these features are compared against observations. You can liken this to making a prediction about the world, then comparing that prediction against reality. The closer your prediction is to the real-world observation, the higher the evidence.
Now, what’s interesting here is how a node’s neighbors can affect its own evidence score. For example, if your neighboring nodes are shown to be accurate, then the evidence score of your own node will increase proportionally.
A good way to view this, in my opinion, is to imagine that you own a home in a neighborhood. Now imagine that one of your neighbors completes a bunch of home-improvement projects, raising the property value of their house. This is obviously good for their own home’s resale value, but it also increases the value of your home, due to your shared proximity. Neighboring nodes work exactly like this. And if it wasn’t obvious, inaccurate neighboring nodes will just as easily decrease your node’s evidence score. It works both ways.
The main factors contributing to a node’s evidence score are the following:
- Feature matching: how well the observed features match the node’s predicted features.
- Displacement matching: how well the node’s pose aligns with the observed displacement or movement.
- Voting inputs: evidence from other LMs. We haven’t talked about this one much, but the evidence of neighboring LMs can affect the global evidence space of your own LM.
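A toy sketch of how those three factors might combine into a per-node evidence update. The weights and the function below are illustrative assumptions on my part, not the actual evidence_based.py implementation:

```python
import numpy as np

def evidence_update(feature_match, displacement_match, vote_input,
                    w_feat=1.0, w_disp=1.0, w_vote=0.5):
    """Toy combination of the three evidence sources for each node.

    Each input is assumed to be a similarity in [0, 1]: higher means the
    observation agrees better with that node's prediction.
    """
    return (w_feat * feature_match
            + w_disp * displacement_match
            + w_vote * vote_input)

# One accumulation step over a small graph of 4 nodes:
evidence = np.zeros(4)
feature_matches = np.array([0.9, 0.2, 0.6, 0.4])
displacement_matches = np.array([0.8, 0.3, 0.7, 0.5])
votes = np.array([0.5, 0.1, 0.2, 0.0])
evidence += evidence_update(feature_matches, displacement_matches, votes)
print(evidence)  # node 0 accumulates the most evidence
```

Across steps, nodes whose predictions keep matching observations pull ahead of the rest.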
So now, once all of these evidence scores are calculated across all nodes of a given graph, we select the node with the highest score, giving it the designation of most_likely_hypothesis (MLH).
Here’s the bit of code which I believe handles this:
mlh_id = np.argmax(self.evidence[graph_id])
mlh = self._get_mlh_dict_from_id(graph_id, mlh_id)
That first line (mlh_id) performs the actual indexing of the node. The second line (mlh) grabs all of that node’s relevant information.
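To see that first line in miniature, here’s a toy version with a hypothetical evidence store. In the real code self.evidence is keyed by graph ID; the dict below just mimics that shape:

```python
import numpy as np

# Hypothetical per-graph evidence scores, mimicking self.evidence[graph_id]
evidence = {"cylinder": np.array([0.2, 1.7, 0.9, 1.1])}

graph_id = "cylinder"
mlh_id = int(np.argmax(evidence[graph_id]))  # index of the winning node
print(mlh_id)  # -> 1: node 1 has the highest accumulated evidence
```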
By doing the above we end up getting a single representative node which serves as a proxy for the entirety of the graph. Later, this representative node gets packaged up by the get_output function:
mlh = self.get_current_mlh()
pose_features = self._object_pose_to_features(mlh["rotation"].inv())
object_id_features = self._object_id_to_features(mlh["graph_id"])
pose_features converts the MLH’s rotation into feature vectors, while object_id_features encodes the object ID into features. In this way, an object ID is itself a feature.
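A hedged sketch of that packaging step. The helper names mirror the ones quoted above, but their bodies here are simplified guesses of mine; the real _object_pose_to_features and _object_id_to_features may work quite differently:

```python
import numpy as np

def object_pose_to_features(rotation_matrix):
    # Guess: the pose "features" are just the rows of the (inverted)
    # rotation matrix -- three orthonormal pose vectors.
    return np.asarray(rotation_matrix, dtype=float)

def object_id_to_features(graph_id):
    # Guess: map the object ID to a small sparse binary vector, standing
    # in for an SDR-like encoding of object identity.
    rng = np.random.default_rng(abs(hash(graph_id)) % (2**32))
    return (rng.random(16) < 0.2).astype(int)

# mlh stands in for the dict returned by get_current_mlh(); a rotation
# matrix is orthogonal, so .T plays the role of .inv() here.
mlh = {"graph_id": "tire", "rotation": np.eye(3)}
pose_features = object_pose_to_features(mlh["rotation"].T)
object_id_features = object_id_to_features(mlh["graph_id"])
```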
Now then, there are a ton of really cool things here we haven’t talked about yet. For instance, how LM voting plays into this (both laterally and hierarchically), or how evidence bounds are established (evidence values are basically constrained between -1 and 2). We also haven’t mentioned how evidence_based.py is essentially scaffolded onto graph_matching.py, and how the latter contributes to all of this.
These other things seem pretty important, but maybe not critical to understanding the gist of the process. That, plus it’s getting pretty late and I’m tired.
But anyways, I hope this helps answer your question, at least a little. Please don’t hesitate to ask if you need any clarification on anything. Until then, have a good weekend!
Thank you @HumbleTraveller very much for the clear and detailed explanation. It is really interesting. In NeuroCognitive Linguistics (from Sydney Lamb), this “representative node” would be called a “coordination node.”
It is interesting because this makes the network “localist” and “distributional” at the same time. Localist because a whole object is represented by just one node (the idea of the “grandmother cell”), but distributional because the information about the object is not in this cell alone; it is distributed across other nodes.
The process of neighbor nodes influencing each other reminds me of the “spreading activation” process used in some semantic networks.
Thanks again for the help!
Ely
@ElyMatos I would be cautious about the detail of your localist analogy. If I recall correctly the link between the code and the neuroscience, object IDs are intended to correspond to SDRs, which themselves correspond to population codes of neural activity. There is no “grandmother cell.”
Yes, @tslominski, there is no grandmother cell, because the “component node” represents “the collection of nodes associated with the object,” not “the object” as in the (classic) interpretation of the grandmother cell.
But it is just a rough comparison, as the Monty system, with all these abstract data structures, seems quite different from a (classic) connectionist system.
Hey there @ElyMatos,
Of course! I’m glad I was able to help
I definitely agree with your localist/distributional statement; it looks like that to me as well. What you said about the “grandmother cell” makes a lot of sense too, even if @tslominski is right about the need for specificity. To me, the architecture almost looks like a kind of small-world network, where the localized graphs are globally connected to one another through these MLH/CMP transmissions (which basically serve as ‘hub nodes’). Not sure if this is a more accurate analogy than your grandmother-cell example, but it’s what comes to mind.
P.S.
I took a quick look at FrameNet Brasil; that’s some pretty cool stuff you all are working on! FrameNet’s main focus is lexicographic, database-type work, right? I think the closest I ever got to studying that would probably be colexification networks. Not quite the same, I know, but still. Both are fascinating.
I wanted to double-check with the team, so getting around to this answer took a while.
The pose of the “lower” LM corresponds to how its XYZ reference frame coordinates are oriented. This pose will be sent to the “upper” LM.
Another way of saying this is that your component object is represented by points in an XYZ reference frame. Those points in the XYZ reference frame describe an object with some orientation. When passing this composite object to the “upper” LM, think about how the XYZ reference frame itself (not the composite object described by points in it) needs to be posed/oriented to fit whatever object the “upper” LM perceives. The pose of the XYZ reference frame itself is what’s passed up.
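One way to picture that: the component’s points stay fixed in their own frame, and what travels upward is the rotation that orients that frame within the parent’s frame. A toy numeric sketch (not Monty code):

```python
import numpy as np

# Points describing a component object (say, a tire) in ITS OWN XYZ frame.
tire_points = np.array([[ 1.0,  0.0, 0.0],
                        [ 0.0,  1.0, 0.0],
                        [-1.0,  0.0, 0.0],
                        [ 0.0, -1.0, 0.0]])

# Rotation orienting the tire's frame within the car's frame:
# here, 90 degrees about the Z axis.
theta = np.pi / 2
frame_pose = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                       [np.sin(theta),  np.cos(theta), 0.0],
                       [0.0,            0.0,           1.0]])

# The upper LM receives frame_pose -- the pose of the frame itself --
# not the rotated point cloud computed below:
tire_in_car_frame = tire_points @ frame_pose.T
```

The 100-node cylinder from the earlier question never gets shipped point by point; only the orientation of its reference frame (plus the object ID and other features) goes up.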