Possible RFC for PhaseLM?

Hi all,

This is my first time posting, so I don’t want to assume much here because I’m new to TBP and new to computational neuroscience. I’m an embedded systems engineer by trade. But this project excited me and I want to be a part of it in any way I can!

I’ve been diving deep into Monty and the Thousand Brains framework, and I’m exploring an idea I think might be worth developing into a formal RFC – but I’d love to get early feedback first to see if others think it makes sense.

What would it take to implement a phase-based learning module (PhaseLM), and should it be a core architectural direction?

The basic idea: rather than relying on index-based or world-relative coordinate systems to track feature locations, a PhaseLM would represent where a feature is sensed via circular phase spaces – inspired by grid/ring cell behavior. Phase would rotate as a function of movement, and when combined with feature signatures (the “what”), you’d get a natural pairing of [coefficients, phase] that could scale up into compositional object hierarchies.

Phase provides a continuous, low-dimensional way to represent sensorimotor space without requiring full map-like representations, helping to sidestep the curse of dimensionality.
It aligns closely with how biological systems might encode motion and repetition in reference frames. And most importantly, it’s foundational: if this system proves viable, it would affect how reference frames, compositionality, voting, and even memory systems are built on top of it. So it makes sense to explore it early, before higher-level mechanisms are too baked in.

I’m not proposing a full rewrite – just an optional PhaseLM module that could live alongside the existing GraphLM and demonstrate some basic phase tracking and prediction first. If it works well, it might bootstrap more complex capabilities like phase-based graph alignment, feature voting, and reference frame composition.

I also watched a video where Hawkins discussed the benefits of a phase based system, although I can’t find it right now.

Proposed structure for PhaseLM

Phase ring buffers (RingModule): each phase dimension is represented as a circular buffer (heading, position, curvature, etc.)

  • Movement updates rotate phase values using learned transformations from motor input
  • Phase acts as a local, cyclic representation of relative location within a reference frame (a rough sketch follows below)
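To make this concrete, here’s a rough sketch of what a single RingModule might look like, assuming phase is just an angle in [0, 2π) and a learned gain maps displacement to rotation. The class name, the gain, and the scalar-displacement interface are placeholders for illustration, not a proposed implementation:

```python
import numpy as np

class RingModule:
    """Sketch of one circular phase dimension (e.g. heading or position).

    Phase is an angle in [0, 2*pi); movement rotates it by a gain-scaled
    displacement and wraps around, so the representation stays bounded.
    """

    def __init__(self, gain: float = 1.0, phase: float = 0.0):
        self.gain = gain                    # learned scale: displacement -> phase
        self.phase = phase % (2 * np.pi)

    def update(self, displacement: float) -> float:
        """Rotate the phase by a movement delta along this dimension."""
        self.phase = (self.phase + self.gain * displacement) % (2 * np.pi)
        return self.phase

# Example: one full cycle per 0.5 units of movement along this axis.
ring = RingModule(gain=2 * np.pi / 0.5)
for dx in [0.1, 0.1, 0.05]:
    ring.update(dx)
```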

Feature Coefficient Store

  • Stores compressed representations of feature inputs (wavelet or sparse encodings)
  • Each coefficient set is indexed by a phase tuple
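A minimal sketch of how the store might be indexed, assuming each circular phase is quantized into a fixed number of bins; the bin count and the plain-dict backing are just assumptions for illustration:

```python
import numpy as np

class FeatureCoefficientStore:
    """Sketch: compressed feature coefficients indexed by a quantized phase tuple."""

    def __init__(self, n_bins: int = 32):
        self.n_bins = n_bins
        self._store: dict[tuple[int, ...], np.ndarray] = {}

    def _key(self, phases) -> tuple[int, ...]:
        # Quantize each circular phase in [0, 2*pi) into one of n_bins bins.
        return tuple(int((p % (2 * np.pi)) / (2 * np.pi) * self.n_bins) for p in phases)

    def store(self, phases, coefficients: np.ndarray) -> None:
        self._store[self._key(phases)] = coefficients

    def lookup(self, phases):
        """Return the coefficient set stored at this phase location, if any."""
        return self._store.get(self._key(phases))
```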

Phase-aware association graph

  • Graph where edges link phase locations to specific feature coefficients
  • Compositional objects are built by linking phase-anchored sub-features together.
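A toy sketch of the association side, assuming phase locations have already been quantized into tuples; feature ids are just strings here, but in practice they would reference entries in the coefficient store:

```python
class PhaseAssociationGraph:
    """Sketch: edges link quantized phase locations to feature ids.

    A compositional object is then just the set of (phase location -> feature)
    edges hanging off one parent node, so sub-features stay anchored to where
    they were sensed.
    """

    def __init__(self):
        self.edges: dict[tuple[int, ...], set[str]] = {}

    def link(self, phase_key: tuple[int, ...], feature_id: str) -> None:
        self.edges.setdefault(phase_key, set()).add(feature_id)

    def features_at(self, phase_key: tuple[int, ...]) -> set[str]:
        return self.edges.get(phase_key, set())
```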

Phase voting and alignment

  • When encountering a feature, the column attempts to align its current phase against known patterns to predict what feature should be sensed.
  • Other columns can vote based on phase alignment to converge on a shared reference frame.
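One way this could work (purely an assumption on my part): score alignment with the mean cosine of the per-dimension phase difference, which respects wrap-around, and vote for whichever stored pattern (object or reference-frame hypothesis) aligns best:

```python
import numpy as np

def phase_alignment_score(current: np.ndarray, stored: np.ndarray) -> float:
    """Mean cosine of per-dimension phase differences: 1.0 = perfectly aligned."""
    return float(np.mean(np.cos(np.asarray(current) - np.asarray(stored))))

def vote(current_phases: np.ndarray, hypotheses: dict[str, np.ndarray]) -> str:
    """Return the hypothesis whose stored phases best match the current phases."""
    return max(hypotheses, key=lambda h: phase_alignment_score(current_phases, hypotheses[h]))

# Example (made-up hypotheses): the column votes for the better-aligned one ("mug").
hypotheses = {"mug": np.array([0.1, 3.0]), "bowl": np.array([1.5, 0.2])}
best = vote(np.array([0.15, 3.1]), hypotheses)
```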

Motor-phase transformation functions

  • Learned mappings from motor command deltas to phase deltas
  • may be initialized with simple oscillators and refined over time using sensorimotor feedback
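As a sketch, the mapping could start as a simple linear gain matrix (the “simple oscillator” initialization) and be refined online from the error between predicted and observed phase change; the linear form and the learning rate are assumptions, not a commitment:

```python
import numpy as np

class MotorPhaseMap:
    """Sketch: linear map from motor command deltas to phase deltas."""

    def __init__(self, n_phase: int, n_motor: int, init_gain: float = 0.1, lr: float = 0.01):
        self.W = np.full((n_phase, n_motor), init_gain)   # simple initial gains
        self.lr = lr

    def predict(self, motor_delta: np.ndarray) -> np.ndarray:
        return self.W @ motor_delta

    def update(self, motor_delta: np.ndarray, observed_phase_delta: np.ndarray) -> None:
        # Gradient step on the squared error between predicted and observed phase change.
        error = self.predict(motor_delta) - observed_phase_delta
        self.W -= self.lr * np.outer(error, motor_delta)
```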

Reference Frame Anchoring

  • Columns anchor their phase spaces to reference frames derived from stable parent features (e.g. object surfaces or boundaries)
  • Phase-relative locations allow nested, compositional hierarchies across features
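The anchoring itself could be as simple as expressing a child’s phases relative to a parent feature’s anchor phase. A bare-bones sketch, where the wrap-around subtraction is the only real content and the example values are made up:

```python
import numpy as np

def anchor_to_parent(child_phases: np.ndarray, parent_anchor: np.ndarray) -> np.ndarray:
    """Express a child's phase location relative to a parent feature's anchor.

    Subtracting the anchor and wrapping keeps locations phase-relative, so the
    same child model can be reused wherever the parent feature appears.
    """
    return (child_phases - parent_anchor) % (2 * np.pi)

# Example: a logo's location re-expressed relative to the mug surface it sits on.
relative = anchor_to_parent(np.array([0.4, 5.9]), np.array([0.1, 6.0]))
```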

Questions for the community:

  • Has this kind of design been explored internally already?
  • I’d love feedback on whether drafting an RFC for PhaseLM makes sense, assuming the design and an early prototype look promising
  • What design constraints or compatibility requirements should be considered if this were to integrate cleanly into Monty?

Would love to hear thoughts, pushback, or any relevant prior efforts. Thanks for building such a well-thought-out system; it’s early days, but I’m really excited about the direction of this.


This seems very plausible to me; I’ll be interested in reading the researchers’ comments. In particular, I’d like to know whether the current choice of rectangular coordinates is based on known neurological patterns. If not, it would seem like all sorts of coordinate systems could be considered.

Meanwhile, can you clarify what coordinate system(s) this approach would use (e.g., polar, spherical)? ELI5…

Hi @Spencer, thanks for the thoughtful post! The research team will have a look and get back to you.

@Rich_Morin, good question - there is nothing in the neuroscience to suggest the brain uses xyz coordinates and, in fact, any coordinate system could be used. This question also brings up a point that’s worth reiterating about the implementation called Monty: nothing in Monty is required to use neurons, or an approximation of the way neurons work. What Numenta and the TBP have done is understand the neuroscience of the neocortex and, from that understanding, derive a number of principles of intelligence - you can read the list here Thousand Brains Principles | Thousand Brains Project . Monty is an implementation of those principles, not the complex neuroscience behind the principles. The reason we use xyz coordinates in Monty is that they’re easy for us feeble humans to visualize, and this visualizable property is also one of the reasons we’re not using SDRs in Monty yet - they are inscrutable.

Another subtle point about this is that everything we want to build must be possible in a biological cortical column as a test of whether we’re going in the right direction (towards human level intelligence). Though Monty may not be implemented in the way neurons implement intelligence, the implementation must not violate any of the principles behind human intelligence.


@Rich_Morin

To answer your question about which coordinate system:

It’s more than just spatial coordinates. It’s any dimension for any combination of sensors and locations. And it’s more of a discrete version of a smooth manifold.

I never formally studied differential geometry, so correct me if I’m wrong, but a high-dimensional manifold can be low-dimensional locally. If there are 1000 dimensions and only 3 of them change in a local region, the rest are effectively zeroed out and the region can be reduced to a 3-dimensional space.

That’s how this gets around the curse of dimensionality. But instead of a continuous manifold, it’s more like a graph: a series of nodes and edges that follow the rules of manifolds. The continuity of a manifold can be approximated locally to allow predictive behavior. And if only a few phase axes are changing in a measurement, you can quickly jump to the nodes that vary in those dimensions, drastically reducing the candidate feature search (small sketch below).
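A toy example of what I mean by reducing the candidate search; the node names and axis indices are made up:

```python
# Each node records which phase axes it varies along. Given the few axes that
# actually changed in a new measurement, only nodes whose axes cover them
# remain candidates, shrinking the search.
nodes = {
    "edge_A": frozenset({0, 2}),        # varies along axes 0 and 2
    "corner_B": frozenset({0, 1, 2}),
    "surface_C": frozenset({1}),
}

def candidates_for(active_axes: frozenset) -> list[str]:
    return [name for name, axes in nodes.items() if active_axes <= axes]

candidates_for(frozenset({0, 2}))   # -> ["edge_A", "corner_B"]
```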

Hope that helps

Hi @Spencer ,

Firstly, welcome to the community, and thank you for the interesting post! I think these are ideas that are definitely worth discussing. After some back and forth, we can see if it makes sense to transition to an RFC or what would be best.

To answer your question about similar work:

  • If you have not already seen the Numenta paper from Lewis et al. (2019), I would recommend checking it out. They built a system that uses grid cells with phase codes to encode the locations of features in objects. These objects were synthetic “grid-world” type objects, where each feature was a randomly chosen SDR. This system was essentially a more biologically plausible precursor to what is now Monty.

  • We have occasionally discussed other phase-codes in our research meetings, such as 1D phase codes for movements in particular directions. These have some appealing properties, but also issues. For example, keeping track of movement in 3D space (path integration) is challenging unless the phase code itself is three-dimensional.

  • More generally, while grid cells are clearly important for biological systems representing environmental reference frames, we have debated how necessary they are for Monty.

  • Lastly, there was an interesting discussion about half a year ago on coordinate systems here on the forums which you might find interesting.

Overall this sounds really interesting. A few initial questions from reading your proposal:

  • I would find it helpful if you could explore the concrete example of using a phase code for location. You could maybe start with 2D space, but then discuss generalizing it to 3D space. Working through how path integration would work, as well as making some drawings would also be very helpful. For example, representing head direction with phase is very different from representing a unique location in a reference frame, so some diagrams would help make sure we are on the same page for any follow-up discussion.

  • Can you unpack a bit more what you are referring to in terms of avoiding the curse of dimensionality? Are the dimensions you are referring to here those of e.g. physical space (constrained to 3D plus time), an abstract reference frame (e.g. movement directions through a family tree), or feature space (e.g. the different dimensions, i.e. elements, of a 128-dimensional SDR encoding a feature like a color)? The description you gave of a low-dimensional manifold embedded in a high-dimensional neural space sounded to me like you were referring to the latter of these examples. However, dimensionality is generally not a curse in SDR encodings, but rather a useful feature. In particular, matching SDRs is a fast computation, and the high dimensionality can confer robustness to noise. On the other hand, when we path integrate in reference frames, the dimensionality is relatively low (e.g. 2D or 3D), avoiding excessive computation. Hope that makes sense, but happy to clarify.

Looking forward to discussing more, I think after clarifying the above it should be easier to focus on some of the points you raised.


Hi Niels,

Thanks for the warm welcome and thoughtful questions. I’ve spent some time reflecting, and I’d like to clarify both the technical angle and my current thinking about PhaseLM’s direction, which has evolved a bit.

Initially I was looking into using phase/frequency representations as a primary modelling paradigm. However, I now believe the more productive and biologically plausible direction is to position PhaseLM as a layer of texture inference: a mid-level enhancement that augments systems like GraphLM rather than replacing them.

Specifically, I’m now exploring how local phase-coded sparse frequency transforms over short spatial-motor histories can yield texture patch descriptors: essentially, bounded frequency patterns tied to the spatial regions traversed by sensors. These patches could assist with:

  • Improved recognition via repeating texture patterns
  • Compression of redundant sampling over familiar objects
  • Bootstrapping world model detail when sensory coverage is partial
  • Better modelling of dynamic/moving objects, because the texture patterns can sometimes be consistent with motion

Clarifying the curse of dimensionality comment:
You’re right to call this out. My reference to the “curse of dimensionality” was not about SDRs themselves. I was referring to the challenge of combinatorially growing axes of variation in sensorimotor space, such as physical movement (x, y, z), temporal or contextual axes (lighting, deformation), and semantic layers (grasped vs. seen, active vs. passive perception).
The hope with phase encodings was to compactly represent relative displacements in this multi-dimensional space without explicitly enumerating over each axis. But I agree that, when used carefully, high-dimensional sparse representations like SDRs are a strength, not a curse.

How it would work

1. Buffer recent samples: Keep a rolling buffer of the last n sensory samples, each with its relative displacement (motor movement or position in a reference frame).

2. Map to texture space: Use the movement deltas as if they were “time” and apply a sparse N-dimensional Fourier-like transform over these points. Because the samples are irregularly spaced, we can’t use a traditional FFT, which assumes uniformly spaced samples. Instead, we would pick a set of frequencies to evaluate, compute their complex coefficients directly, and put them in a sparse vector, similar in spirit to how a sparse FFT works (a rough sketch of this appears below).

3. Bound the region: Assign this texture a patch covering the area we traversed. This can be a convex polygon, bounding circle, or inferred surface - an approximation of the region where this texture was measured.

4. Attach to graph: Bind the patch to the current node in the spatial graph. If similar texture patches exist nearby, this strengthens that node’s feature identity.

5. Blend overlaps: As movement continues, new patches overlap old ones. Their descriptors can be blended, sharpened, or contrast-enhanced. This layering effect gradually forms richer object-level representations.

Each patch is a tuple, (texture_freq_vector, region_geometry, attachment_node), that can be queried later for recognition, comparison, or prediction. Patches can be helpful for positioning as well.
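Here’s a rough sketch of steps 2–3 and the patch tuple, assuming scalar feature samples, a hand-picked set of probe frequencies, and a bounding circle for the region geometry; all of those are simplifications, and the names are placeholders:

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class TexturePatch:
    texture_freq_vector: np.ndarray   # complex coefficients at the probed frequencies
    region_geometry: tuple            # here: (centroid, radius) bounding circle
    attachment_node: str              # id of the graph node the patch is bound to

def build_texture_patch(positions: np.ndarray, samples: np.ndarray,
                        freqs: np.ndarray, node_id: str) -> TexturePatch:
    """positions: (n, d) displacements of each sample in the reference frame
    samples:   (n,)   scalar feature values sensed at those displacements
    freqs:     (k, d) the small set of spatial frequencies we choose to probe

    Because the samples are irregularly spaced, we evaluate a non-uniform DFT
    only at the chosen frequencies instead of running a full FFT.
    """
    # Coefficient for each probed frequency: mean of x_j * exp(-2*pi*i <f, p_j>)
    coeffs = (np.exp(-2j * np.pi * (freqs @ positions.T)) * samples).mean(axis=1)

    # Approximate the traversed region with a bounding circle.
    centroid = positions.mean(axis=0)
    radius = float(np.linalg.norm(positions - centroid, axis=1).max())

    return TexturePatch(coeffs, (centroid, radius), node_id)
```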

Overall I think this is a good mid-scale representation of objects. Raw point data is too fine-grained. Whole-object graphs are more coarse-grained. Texture patches live in the middle, preserving useful sub-object structure that is otherwise lost. This is especially helpful for objects that are defined more by surface patterns (wood grain, fabric weave, brick wall) than by shape alone.

Biological plausibility
Neocortical and hippocampal circuits combine spatial navigation with oscillatory phase codes. Texture patching parallels this by creating phase-informed descriptors of local environments, echoing grid/place-cell-like integration.

Thanks Spencer. To summarize, the new description you’ve provided would come more under the category of a custom sensor module. Overall we do want to improve the quality of the features we extract (see this related Future Work section: Extract Better Features), and so something like this could probably be helpful.

A couple of further thoughts to highlight: we don’t want sensor modules / feature-extracting pipelines (what would correspond to subcortical structures in the brain) to build “models” of objects, which is why we generally describe them as encoding the feature for a single point, rather than a specific bounded area, even if the information is gathered over a broader area.

On the other hand, we have also discussed Learning Modules that learn models of surfaces / 2D references frames, including things like complex textures (cheetah fur, etc.). That might be another way of approaching this problem. In that case, the LM would store the learned models.

Both of the above would be valid, but they would have important differences in their assumptions. Only Learning Modules can store models (including things like a bound region) for recovery in the future. They also build their models through movement, which sounds like it could fit well with some of the ideas you had about integrating over time.