2024/08 - Encoding Object Similarity in SDRs

Ramy gives a presentation on his work during his internship. He presents how he uses the relative evidence scores for objects to create a similarity matrix and learn SDRs that encode these similarities as bit overlaps.

1 Like

The presentation features an algorithm that generates three SDRs from three target overlaps (or similarities) between them.
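For concreteness, here is a toy construction of what such targets mean. This is my own illustration, not the presentation's algorithm (which optimizes toward the targets rather than hand-allocating bits), and all sizes are made up:

```python
import numpy as np

# Toy illustration: build three SDRs with prescribed pairwise overlaps by
# hand-allocating shared and unique bit ranges.
# Targets: overlap(A,B)=10, overlap(B,C)=5, overlap(A,C)=0, with k=20 active bits.
size, k = 256, 20
A, B, C = (np.zeros(size, dtype=bool) for _ in range(3))
A[0:10] = B[0:10] = True    # 10 bits shared by A and B
B[10:15] = C[10:15] = True  # 5 bits shared by B and C
A[20:30] = True             # remaining unique bits of A (total 20)
B[30:35] = True             # remaining unique bits of B (total 20)
C[40:55] = True             # remaining unique bits of C (total 20)
assert (A & B).sum() == 10 and (B & C).sum() == 5 and (A & C).sum() == 0
```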

I see two problems with such an algorithm (any such algorithm, not this one in particular):

  • The first is scaling: how difficulty increases when it is applied to thousands of target “objects” or SDRs. If it scales, e.g., quadratically, then you might have a problem.
  • The second is that objects or representations aren’t available all at once. First the child learns 100 objects, then another 100, etc. Somehow you’d like already-known representations to stay consistent, to have some persistence, so they remain recognizable later without requiring you to rewrite them.
2 Likes

Hi @blimpyway and welcome to TBP discourse :slight_smile:

These are definitely valid points to consider with such an optimization approach.

Scaling issues

We have not observed scaling issues when optimizing SDRs for thousands of synthetic objects. Even though the number of target overlaps scales quadratically with the number of objects, we sample a small number of objects each “minibatch” to calculate a representative gradient and optimize the SDRs, in a manner similar to stochastic gradient descent (not discussed in the presentation).
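To make the minibatch idea concrete, here is a minimal sketch, assuming dense trainable scores binarized by a top-k rule; the sizes, the toy random target matrix, and the update rule are my assumptions, not the actual EvidenceSDRGraphLM implementation:

```python
import numpy as np

# Minimal sketch (assumed, not the TBP implementation): each object keeps a
# dense score vector; its SDR is the top-k indices of that vector. Every step
# samples a small minibatch of objects, SGD-style, and nudges scores so that
# pairwise SDR overlaps move toward the target overlaps.
rng = np.random.default_rng(0)
n_objects, sdr_size, k = 1000, 2048, 41          # hypothetical sizes
scores = rng.normal(size=(n_objects, sdr_size))  # dense, trainable

raw = rng.integers(0, k + 1, size=(n_objects, n_objects))
target = np.triu(raw, 1) + np.triu(raw, 1).T     # toy symmetric target overlaps

def sdr(s):
    """Binarize a score vector by keeping its top-k entries."""
    out = np.zeros_like(s)
    out[np.argsort(s)[-k:]] = 1.0
    return out

def step(batch_size=32, lr=0.1):
    """One minibatch update over a random subset of object pairs."""
    idx = rng.choice(n_objects, size=batch_size, replace=False)
    for i in idx:
        for j in idx:
            if i == j:
                continue
            a, b = sdr(scores[i]), sdr(scores[j])
            err = (a @ b) - target[i, j]  # overlap error for this pair
            # If overlap is too high, push each SDR away from the other's
            # active bits; if too low, pull them together.
            scores[i] -= lr * err * b
            scores[j] -= lr * err * a

for _ in range(100):
    step()
```

The point is that each step touches only batch_size² pairs, so the cost per step does not grow with the full quadratic number of target overlaps.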

The actual calculation of the target overlaps relies on evidence scores that already exist in LM memory. To calculate the overlaps, we use the output of the evidence updates already computed in EvidenceGraphLM; we do not directly compare object graphs to determine similarity. EvidenceGraphLM does the heavy lifting of computing evidence scores across all existing objects by expanding its hypothesis space to account for the added objects.
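As a rough sketch of that mapping (my assumption of its shape, not the exact TBP code), relative evidence scores for the known objects can be normalized and scaled by the number of active bits to produce target overlaps:

```python
import numpy as np

def evidence_to_target_overlaps(evidence, k=41):
    """Map relative evidence scores to target overlap bits (assumed form).

    Evidence is min-max normalized to [0, 1] and scaled by the number of
    active bits k, so the most similar object gets the largest target overlap.
    """
    e = np.asarray(evidence, dtype=float)
    e = (e - e.min()) / (e.max() - e.min() + 1e-12)
    return np.round(e * k).astype(int)

# e.g., evidence for a new object against three known objects
print(evidence_to_target_overlaps([12.0, 3.5, 9.1]))  # -> [41  0 27]
```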

I’ve borrowed some common terminology from machine learning that relates to optimization (e.g., calculating gradients, minibatch, SGD), but there isn’t any kind of (deep) learning here. The optimization directly modifies the SDR representations to match the target overlap without training any hidden parameters.

Streaming Setup

We ran some experiments on optimizing SDRs for synthetic objects in a streaming setup, where we do not assume that all objects are available during optimization. This simulates the scenario you described of progressively increasing the number of objects as the agent explores the environment. Results are here. The overlap matrix grows as more objects are added, which causes a transient increase in overlap error until optimization quickly drives that error back down.

In the Monty-YCB setup, we do not assume that all similarity values exist before optimizing SDRs. After each episode, Monty observes a new object and uses the relative evidence scores to fill in a new row of the target similarity matrix. This can be seen here.
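A minimal sketch of that streaming growth (assumed API, not Monty's actual code): after each episode, a new row and column of target overlaps is appended for the newly observed object, and the SDRs are then re-optimized, which is what drives the transient error back down in the synthetic experiments above:

```python
import numpy as np

class TargetOverlaps:
    """Grows the target similarity matrix one object at a time (sketch)."""

    def __init__(self):
        self.matrix = np.zeros((0, 0), dtype=int)

    def add_object(self, overlaps_with_known):
        """overlaps_with_known: target overlaps vs. the n existing objects."""
        n = self.matrix.shape[0]
        grown = np.zeros((n + 1, n + 1), dtype=int)
        grown[:n, :n] = self.matrix
        grown[n, :n] = overlaps_with_known  # new row ...
        grown[:n, n] = overlaps_with_known  # ... kept symmetric
        self.matrix = grown

targets = TargetOverlaps()
targets.add_object(np.array([], dtype=int))  # first object: no neighbors yet
targets.add_object(np.array([17]))           # second object vs. the first
targets.add_object(np.array([5, 30]))        # third object vs. the first two
print(targets.matrix)
```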

We have yet to test this approach in the more challenging setup of unsupervised learning, where learning and inference occur together in the same episode. In that case, Monty will only add relative similarity scores against existing object graphs, and we cannot assume knowledge of similarity values for objects that do not yet exist in memory. I do not expect this to be much more challenging than the synthetic streaming-object experiments.

Related issues to consider here are representation drift and the stability of SDRs. EvidenceSDRGraphLM has a stability parameter that controls how much learned SDRs are allowed to change as we introduce new objects. This is targeted at solving some of the challenges you described with learning persistent object representations in a streaming manner. We have had some ideas for how Hebbian-style learning rules could enable continual learning in this streaming setting, without revisiting all known evidence scores, but we realistically won’t get to exploring these anytime soon. It does suggest an avenue for how the brain might solve this issue, and something we might revisit.
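As a sketch of what such a stability knob could look like (the form and names here are my assumption; the actual parameter in EvidenceSDRGraphLM may work differently), each object's update can be scaled by how established its SDR already is:

```python
import numpy as np

def stable_update(scores, gradients, ages, stability=0.9, lr=0.1):
    """Scale updates so older SDRs drift less (assumed form, not Monty's).

    scores, gradients: (n_objects, sdr_size); ages: optimization rounds seen.
    """
    plasticity = stability ** ages  # decays toward 0 for long-known objects
    return scores - lr * plasticity[:, None] * gradients

scores = np.random.default_rng(0).normal(size=(3, 2048))
grads = np.ones_like(scores)
ages = np.array([50, 5, 0])  # an old, a recent, and a brand-new object
scores = stable_update(scores, grads, ages)
```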

Note that encoding similarity into SDRs is not currently a TBP priority according to the roadmap. A broader discussion of neural elements in Monty can be found here.

2 Likes

Sounds thoroughly considered.

A question I would ask is: why? I thought that in the real world, resemblance between X and Y is something to be inferred from observing/experiencing both X and Y, not something that is aimed for, and not to any level of “precision” of overlap.

Some side thought also… what if there is no single SDR encoding a cup, ant, bike, etc.? Then “resemblance” and “identity” aren’t as straightforward as the theory assumes?

The similarity between objects is inferred from observing the objects, as you said. This measure of similarity is a byproduct of updating evidence scores as we observe features on an object. It is not something that we aim for. However, the mapping from a similarity score to overlapping bits in SDRs has to be learned, because Monty is not natively implemented with SDRs.

The notion of similarity would be useful for representing compositionality. In Monty, every message between any two learning modules (LMs) is represented as a feature at a location. For a low-level LM observing a cup, the features can be morphological (e.g., point normal and principal curvature directions) and/or non-morphological (e.g., hue), and the location in this case would be a location in the reference frame of the cup. For a higher-level LM that observes a scene representation (e.g., a dinner set), the features would be different objects at different locations in the reference frame of the scene. Any learning module needs to be able to compute the similarity of incoming features to compare the sensed features with the model features. It is easy to compute the similarity of hues or vectors, but a bit more involved for higher-level features (e.g., objects).
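To illustrate the difference (the names and forms here are hypothetical, not Monty's API): low-level features compare with simple arithmetic, while object-level features would compare via learned SDR overlap:

```python
import numpy as np

def hue_similarity(h1, h2):
    """Hues on a circle in [0, 1): similarity from angular distance."""
    d = abs(h1 - h2)
    return 1.0 - 2.0 * min(d, 1.0 - d)

def vector_similarity(v1, v2):
    """Cosine similarity, e.g., for point normals or curvature directions."""
    return float(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))

def object_similarity(sdr1, sdr2, k=41):
    """Object-level features: normalized overlap of the objects' SDRs."""
    return int(np.sum(sdr1 & sdr2)) / k

# Hue and vector similarity are immediate; object similarity only works if
# SDR overlaps were learned to encode similarity in the first place.
```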

I like to think of it as a measure of interchangeability, i.e., can we swap out one object for another in a high-level scene representation and still recognize the same scene? We would need to know how similar the objects are to perform such a computation. The image below is from another presentation during my internship.

Note that while we think similarity is useful for representing compositional objects, we are still not sure that SDRs can meaningfully encode similarities as bit overlaps. We do not want to assign meaning to individual bits of such a sparse representation.

Hope this clarifies things.

1 Like