HTM Sequence Memory and Time and Grid Cells in the Neocortex

@jhawkins gives a recap of the temporal memory algorithm in Layer 4 for sequence learning and contrasts it to object morphology learning which requires association with L6a location representations. The team then discusses different ideas on how behavior could be represented in the cortex, including whether grid cells can represent space and time together (4D).

Short Summary

Main Video

00:00 HTM Sequence Memory and Time and Grid Cells in the Neocortex
00:08 A Review of HTM Sequence Memory
18:11 Model of Sequences
23:37 Model of Objects
01:07:41 A Summary of the Open Question
01:42:56 Spacetime and Grid Cells

2 Likes

It sounds like you may be dancing around the possibility that there are in fact two types of sequences: discrete and continuous. Each has its own unique representation and behavior model, and they are actually quite complementary to one another.

A discrete sequence is used to represent transitions between clearly discernible states (C0 discontinuous). These sequences are stored as discrete transitions between the SDRs for the states. Conceptually, this can be considered a form of graph representation, and is probably the most efficient way to model sequences with a finite number of transitions. This is the classical HTM sequence model, and I believe it is the appropriate model to invoke when considering high-level planning.
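To make the graph view concrete, here is a toy sketch (my own illustrative code, not HTM/Numenta code; the SDRs and class name are made up) of storing discrete transitions between state SDRs:

```python
# Toy illustration of a discrete sequence memory: states are SDRs
# (frozen sets of active bit indices) and learned transitions form
# a directed graph between them.

class DiscreteSequenceMemory:
    def __init__(self):
        self.transitions = {}  # state SDR -> set of successor SDRs

    def learn(self, sdr_from, sdr_to):
        """Store one observed transition between two discrete states."""
        self.transitions.setdefault(sdr_from, set()).add(sdr_to)

    def predict(self, sdr):
        """Return the set of states reachable in one step (the prediction)."""
        return self.transitions.get(sdr, set())

# Made-up SDRs for three stapler states
open_s = frozenset({1, 5, 9})
half   = frozenset({2, 6, 10})
closed = frozenset({3, 7, 11})

m = DiscreteSequenceMemory()
m.learn(open_s, half)
m.learn(half, closed)

assert m.predict(open_s) == {half}
```

The dictionary-of-successors form makes the "graph representation" reading explicit: prediction is just edge lookup.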

A continuous sequence is used to represent any feature or space that could be considered continuous (C0) and/or differentiable (C1) w.r.t. some parameter (spatial, temporal, or conceptual). This type of feature space is probably best represented by coupling grid cell modules to one of the discrete feature SDRs described above. This coupling allows for interpolation in and around a collection of discrete features in an otherwise continuous state space. I can think of multiple ways to implement this in TBT; the most obvious one being a quasi-stable SDR (in L2/3) that represents a canonical feature (L4) at a pose (L6) coupled to a set of grid-cell modules that represents some small offset from this canonical representation. This SDR could then conceivably transition seamlessly to another adjacent SDR and its continuous space (i.e. reanchoring).
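A minimal sketch of this coupling, under my own assumptions (the period value, state names, and reanchoring rule are all illustrative, not from any published model): a pose is a canonical state plus a small continuous offset tracked modulo a grid module's period, and drifting past half a period reanchors to an adjacent canonical state.

```python
# Hypothetical sketch of "canonical SDR + grid-module offset":
# the offset interpolates within a canonical state's neighborhood,
# and crossing half a period reanchors to a neighboring state.

PERIOD = 1.0  # assumed grid-module period, arbitrary units

def move(state, offset, delta, neighbors):
    """Integrate a small movement; reanchor when the offset leaves the
    current state's half-period neighborhood."""
    offset += delta
    if offset > PERIOD / 2 and state in neighbors:
        state, offset = neighbors[state], offset - PERIOD
    return state, offset

# Moving 0.6 units from "open" crosses the half-period and reanchors
state, offset = move("open", 0.0, 0.6, {"open": "half"})
```

The point of the sketch is just the seam: within one canonical state the representation varies continuously, and reanchoring hands off to the adjacent state's continuous space.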

These two representations are complementary to one another. When planning our movements (or thinking), we first operate at a high level in the discrete sense. Here we are attempting to identify the landmark states that will take us from some initial state to some final state. Path planning is literally the process of identifying a set of intermediate states that must be passed through. In other words, we identify which intermediate states are accessible from the initial state and potentially bring us closer to the final state. Because this is a graph model rather than a continuous space model, we are free to have state transitions at multiple levels (ref. Tolman-Eichenbaum Machine).
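The high-level planning step can be sketched as plain graph search over the discrete states. This is my own toy code (the commute graph and names are illustrative), just to pin down the "identify accessible intermediate states" idea:

```python
from collections import deque

# Path planning as breadth-first search over a discrete state graph:
# find the landmark states (way-points) from start to goal.

def plan(graph, start, goal):
    """Return the list of landmark states from start to goal, or None."""
    frontier = deque([start])
    came_from = {start: None}          # also serves as the visited set
    while frontier:
        state = frontier.popleft()
        if state == goal:              # walk back through predecessors
            path = []
            while state is not None:
                path.append(state)
                state = came_from[state]
            return path[::-1]
        for nxt in graph.get(state, []):
            if nxt not in came_from:
                came_from[nxt] = state
                frontier.append(nxt)
    return None

commute = {"work": ["highway", "downtown"],
           "highway": ["home"], "downtown": ["home"]}
assert plan(commute, "work", "home") == ["work", "highway", "home"]
```

Note that the same search trivially yields the alternative route through "downtown" if the "highway" edge is removed, which is the congested-route replanning case described below.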

As part of this process, at any time, we can mentally anchor ourselves to any one of these intermediate states and consider more refined discrete sequence planning (e.g. sub-stepping) or possibly explore the continuous space around that state (e.g. gradient sampling). This allows us to consider additional factors that might impact the broader path. From this we can not only construct a nearly optimal path, but also consider reasonable alternatives. (e.g. What route will I take home from work tonight if my regular route is congested due to the parade downtown?)

Once we have sufficiently considered and settled on a plan (or at least a preliminary plan), the go/no-go decision is reached and we start to navigate through the plan. Starting with the initial state we instantiate the continuous representation around that state and begin to move in the direction of the first way-point (landmark state). Once we get close enough to the next way-point, we reanchor our current state to it and continue to navigate through the continuous space of this new state. If at any point, our perception of the state space is not aligned with our imagined predictions during planning, we can immediately reevaluate the plan considering the newly acquired information.
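The execution loop described here, follow the way-points and replan when perception diverges from prediction, can be sketched in a few lines (again my own illustrative code; the perceive callback stands in for the sensory comparison):

```python
# Toy execution loop for a planned path: step through each way-point
# and, if perception disagrees with the predicted next state, return
# the surprising state so the caller can replan from it.

def execute(path, perceive):
    for predicted in path[1:]:
        observed = perceive(predicted)   # sensed state after the movement
        if observed != predicted:
            return ("replan", observed)  # prediction violated
    return ("done", path[-1])

# When perception matches prediction, the plan completes
assert execute(["open", "half", "closed"], lambda s: s) == ("done", "closed")
```

The key design point is that the plan is held as predictions, so a mismatch is detected the moment it occurs rather than at the end.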

So, for the stapler example. You may have several canonical SDRs to which you can anchor your perception (open, half-opened, closed, compressed). Around each of these, there is a continuous morphological space that you have experienced in the past and for which you have a good perceptual/behavioral model. When planning to close the stapler, you can easily imagine going from the open state to the closed state, but you can also imagine stopping at any intermediate state along the way. However, you are always aware that there is a continuous set of states that must be passed through to get from the open to the closed state. While I can imagine the stapler instantly going from open to closed, I also understand that the probability of that actually happening is vanishingly small. (i.e. I would be very surprised if I observed it behaving that way.)

3 Likes

Yes, we’ve been talking a lot about these ideas at last week’s brainstorming focus week (hence the late reply here). We haven’t really settled on an answer to those questions, unfortunately. The issue with representing a continuous state space is that the actions required to go from one state to another are usually not easily generalizable the way that movements in 3D space are. I need to apply a very different action to go from an open stapler to a closed stapler than to go from closed to open. There are also a lot more constraints on the states that are possible and the transitions between them. Although the stapler moves continuously, it is not moving freely like a sensor in the world. It has to follow a very specific trajectory, and from each state there are only two possible next states it can go into (moving further up or further down). It’s hard to think of grid cells representing this. It seems more like the states of an object form a graph connected by different edges, and each edge has an action associated with it. But the edge in the other direction could have a non-obvious reverse action associated with it, or not exist at all.
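The asymmetric, action-labeled graph view can be pinned down with a tiny sketch (my own illustrative edges and action strings, not a claim about real stapler mechanics):

```python
# States as a graph with action-labeled directed edges: the reverse
# edge may carry a different action, or not exist at all.

edges = {
    ("open", "half"):   "press down lightly",
    ("half", "closed"): "press down firmly",
    ("closed", "half"): "release",  # reverse action differs from forward
    # no ("half", "open") edge here: that reverse transition is absent
}

def action_for(a, b):
    """Action that moves the object from state a to state b, or None."""
    return edges.get((a, b))

assert action_for("open", "half") == "press down lightly"
assert action_for("half", "open") is None  # edge missing in this direction
```

This is exactly why a Euclidean grid-cell space fits poorly: in the graph, distance and reversibility are properties of individual edges, not of the space.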

Jeff proposed an idea of how those state transitions and causalities could be learned in the brain at yesterday’s research meeting. It seems promising, and we will post a link here once it is uploaded on YouTube :slight_smile:

2 Likes

I was thinking about how the speculated unique location (23:38) changes the mechanisms described in the 2019 paper on locations in the neocortex. I need some smart cookies to help me figure it out.

In the paper, two things can drive grid cell activity: (1) movement and (2) feature layer activity.

This new location layer as described in this video would have the following consequences:

  • Movement would drive which column is active. The first question I am asking myself is: would movement cause all cells in the column to be active, or specific unique cells? The latter would mean that movement not only shifts the activity bump (which is now over columns), it also transitions the unique locations in the pre-movement column to the unique locations in the post-movement column, and I am not too sure how these would be learnt.
  • Feature layer activity now does not drive a shift in the activity bump; it only activates cells in the active column that have enough support. Am I wrong about this? If I’m not, I guess my last point about movement seems obvious: a movement would activate a whole column, not specific cells within the column, because the specific cells are determined by feature layer activity. But I am not too sure.

Also, it sounds to me that the purpose of having many grid cell modules was to make the location unique to the object and the location on the object. It seems like this speculated view solves that problem. Does that mean that we may not have a location layer populated with many grid cell modules, and instead one “big” network made of columns and cells per column, or is this completely off beat?

In this new view, do the cells in the feature layer represent the full context to the “unique” grid cells, or do “unique” grid cells also distally connect to “unique” grid cells in the same layer?

Hopefully my post is not too basic for the brilliant people of this forum.

1 Like

These are not basic questions! We have thought about these issues for years and our best guess as to what the mechanisms are has evolved. Here are our current beliefs.

On their own, grid cells are not a unique representation of space, so something else must be happening. We need a means of generating representations of locations that are unique to both the object being modeled and the location on that object. At first we adopted the idea that if you have multiple grid cell modules, where each module is slightly different, then by looking at the active grid cells in all the modules you get a unique representation. We didn’t come up with this idea, but it was elegant and we went with it. But there are multiple problems with this idea; e.g., one problem is there aren’t enough grid cell modules to make the math work. I am not going to describe the other problems, but we had to abandon this idea of multiple grid modules as the basis for unique locations.
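For readers following along, the capacity argument behind the multi-module idea is just a product of per-module states; the figures below are illustrative, not measured biological values:

```python
# Back-of-envelope capacity of the (abandoned) multi-module scheme:
# with k modules of n distinguishable phases each, the number of
# unique joint codes is n**k. Unique-location schemes live or die
# by whether n and k are large enough for the spaces being modeled.

def joint_codes(phases_per_module, num_modules):
    return phases_per_module ** num_modules

assert joint_codes(100, 2) == 10_000      # few modules: limited capacity
assert joint_codes(100, 6) == 10**12      # capacity grows exponentially in k
```

The arithmetic works in principle; the objection in the post is that the actual number of modules available per column is too small to make it work.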

Our current belief is that there is only one grid cell module per cortical column and that the grid cell layer uses the same mini-column mechanism we have proposed for L4. Grid cells are arranged laterally in a cellular layer. https://www.cell.com/cell/fulltext/S0092-8674(18)31167-X Figure 7 of this paper is the best description I know of the details of what grid cell modules look like. The area of a grid cell module is about the area of a cortical column. Imagine there is a mini-column of cells (perhaps 15 cells) associated with each grid cell. By activating one of the cells in the active mini-column, we get the unique representation (SDR) that we desire. Again, this is nearly identical to our proposal for L4 representations of features.
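A toy rendering of that mechanism, under stated assumptions (the 15-cell figure comes from the post; the deterministic cell-selection rule and integer object ids are my own stand-ins for whatever learned process actually picks the cell):

```python
# Sketch of the mini-column idea for the location layer: movement picks
# which mini-columns (grid cells) are active, and object context picks
# ONE of the ~15 cells in each, yielding a location SDR unique to both
# the object and the position on it.

CELLS_PER_MINICOLUMN = 15  # "perhaps 15 cells", per the post

def location_sdr(active_minicolumns, object_id):
    """One cell per active mini-column, chosen by object context
    (a toy deterministic rule; the real selection would be learned)."""
    return {
        (col, (col * 7 + object_id) % CELLS_PER_MINICOLUMN)
        for col in active_minicolumns
    }

stapler = location_sdr({3, 8, 12}, object_id=1)
mug     = location_sdr({3, 8, 12}, object_id=2)
assert stapler != mug       # same mini-columns, object-specific cells
assert len(stapler) == 3    # exactly one cell per active mini-column
```

The same column-level activity thus yields different cell-level SDRs per object, which is the uniqueness property the multi-module scheme was meant to provide.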

There is empirical evidence supporting the mini-column hypothesis. E.g., the current theory of how grid cells are generated requires cells that represent movements in different directions, where the spike frequency represents velocity. In addition, there needs to be a set of cells for each movement vector where the cells differ in the phase of their activation. Empirically, mini-columns in L6 look like this (the phase is a prediction and I don’t think it has been looked for).

As you point out, the multiple grid module hypothesis, the one we abandoned, has the nice property that movement will correctly update the unique representation of location. It remains to be determined how the mini-column method of representing locations can predict the correct unique location (SDR) after movement. I believe it can, and I have ideas how this could work, but this is a complex topic. As you say, a unique SDR in L4 will be associated with a unique SDR in L6, but it can’t move the bump of activity in L6. The mini-column hypothesis of representing location is also elegant. It uses a mechanism that is nearly identical to the one in L4. In fact it made me ask if phase of activation could be playing a role in L4 too. It hasn’t led anywhere yet.

3 Likes

Very cool, having the same mini-column mechanism in L4 and L6 sure seems very elegant to me too.

The empirical evidence you are citing got me pretty confused, so I tried to unconfuse myself by thinking about it and researching, but I guess I am not that smart because I am still pretty confused. If I understand your paragraph correctly, mini-columns in L6 would contain the cells that represent movement vectors at different phases of activation. How does that not clash with the proposal that the cells in each column represent a unique object and a unique location on an object? It sounds like the cells would have two purposes, or two functions? I wonder what I am getting wrong.

Very excited to one day hear about how these l6 mini-columns predict the correct location after movement!

1 Like

It IS confusing. Mini-columns are physical entities. They span all the layers, but the layers still matter. So the cells in L4 in a mini-column can be doing something different than the cells in L6 in the same mini-column.

In L6 in V1 they find cells that respond to directional motion. The cells aligned vertically, presumably in the same mini-column, all respond to motion in the same direction. It is not known why these cells are arranged like this.

Separately, we deduced that there must be grid cell-like mechanisms in the lower layers. People who study grid cells have tried to figure out how grid cells come about. The best hypothesis IMO requires cells that respond to directional movement, multiple cells for each direction. The cells would fire for the same movement but differ by phase of spikes relative to a background frequency. We put two and two together and proposed that the L6 cells in each mini-column would exhibit this phase shift. As far as I know this idea has not been tested.

In summary, in L4, the cells in a mini-column represent the same feature, but in context only one cell is active. In L6, the cells in a mini-column represent movement in the same direction, but only one cell is in phase with background frequency, so only that one is effective. The L4 and L6 mechanisms both rely on the power of forming sparse representations, but how they achieve sparsity is different.
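The contrast in that summary can be sketched side by side. Everything below is illustrative toy code, not a biophysical model; the selection rules are stand-ins for the learned context and the phase-matching described above:

```python
# Toy contrast of the two sparsification mechanisms:
# L4: all cells in a mini-column code the same feature; sequence
#     context selects which one cell is active.
# L6: all cells code movement in the same direction; only the cell
#     whose spike phase matches the background rhythm is effective.

def l4_active_cell(feature_cells, context):
    """Context (e.g. previous state, as an int) picks one cell."""
    return feature_cells[context % len(feature_cells)]

def l6_effective_cells(phase_cells, background_phase, tol=0.1):
    """Only cells spiking in phase with the background frequency count."""
    return [c for c, phase in phase_cells if abs(phase - background_phase) < tol]

assert l4_active_cell(["a", "b", "c"], context=4) == "b"
assert l6_effective_cells([("x", 0.0), ("y", 0.5)], background_phase=0.0) == ["x"]
```

Both routes end in a sparse code over the mini-column's cells; they differ only in what does the selecting, which is the point of the summary.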

5 Likes

I guess I even confused you with my question, because it wasn’t even about how mini-columns in L6 could be doing something different than mini-columns in L4.

I think I finally unconfused myself, but I am not sure.

  • In your proposal, entire mini-columns in L6 activate in the same way that “grid cells” do. All cells in the same mini-column in L6 have the potential to respond to motion in the same direction.
  • In your proposal, specific cells in each active mini-column in L6 activate to represent a location that is unique to the object and the location on the object.

Does that sound right?

Regardless, I look forward to being plenty more confused about all of this.

1 Like