2025/01 - Brainstorming on Compositional Policies - Part 6

Niels presents thoughts on how to represent states and transitions using graphs and/or sequences. What are the approaches that have been tried, and how might HTM Sequence Memory play into this area?

3 Likes

Still going through the vid now, so forgive me if you guys cover this later. But I’m at the point where you’re discussing what a node represents. Wouldn’t it simply be the sparse representation of a much denser array of nodes, grouped together by some shared common fate? (e.g., shared location, orientation, or ID, as Jeff described.)

Edit: Finally finished.

At around the 25 minute mark, I think I’m tracking what Jeff is laying out. It makes a lot of sense that a “feature” can represent a wide range of possible “things.” For instance, in @nleadholm’s example, he mentions how a feature might represent something as simple as a color. Obviously this could be considered a feature. It is a feature. However, as Jeff later states, a feature could potentially be much more contextually rich. A feature is simply whatever passes through L4 of a given column. So it would make sense that as a given information trace steps up the compositional hierarchy, the “feature(s)” therein would grow more complex, likely increasing in their dimensionality.

However, for me, the thing which differentiates a node from a feature is its meta-structural awareness (e.g., an awareness of its time series or neighbor connectivity). A node possesses this awareness, whereas a feature exists independently of it. Is this an appropriate way to view these things?

Lastly, for what it’s worth, I feel like this is an oddly appropriate companion video: https://www.youtube.com/watch?v=Ecqff-9Upjw

2 Likes

Although I found the presenter’s voice intonations in “A Surprising Way Your Brain Is Wired” to be a bit odd, I had no real problem understanding them. More to the point, it was a very clear presentation of Small Graphs and some related concepts, terminology, etc.

While watching the video, I started trying to imagine how small graphs might play out in a highly distributed Actor model based on the BEAM, Elixir, and current realities of computer architecture. Here goes…

For purposes of discussion, let’s consider the RasPi 5, a fairly typical and moderately powerful single-board computer (SBC). Here’s a rundown from Wikipedia:

The Raspberry Pi 5 uses a 64-bit 2.4 GHz quad-core ARM Cortex-A76 processor. The Raspberry Pi 5 uses the Broadcom BCM2712 SoC, which is a chip designed in collaboration with Raspberry Pi. The SoC features a quad-core ARM Cortex-A76 processor clocked at 2.4 GHz, alongside a VideoCore VII GPU clocked at 800 MHz.

One or more instances of the BEAM could be running on the SBC. Each one could utilize all four cores at once, using OS threads to distribute the workload. For simplicity, however, let’s assume that we’re only running one BEAM instance per SBC.

To scale up, we just add more SBCs and some network support. And, to simplify I/O, we designate and wire specific SBCs to handle specific sensors and effectors. Of course, these can also handle higher level tasks (and some of them may not deal with I/O at all).

As a natural consequence of this architecture, Elixir processes which are running on a given SBC would get first crack at any connected sensor’s data and any connected effector’s capabilities. Although a highly related Learning Module could be located on another SBC, this would be less likely. Which is a Good Thing, because messages to other SBCs will be far slower than local ones.

Careful Reader will have realized that we’re getting a form of small graphs as a natural fall-out of the prevailing state of computer and network hardware technology, coupled with the vagaries of the BEAM.

Really Careful Reader may also notice that I’ve avoided using the Elixir meaning of “node”, which refers to a running BEAM instance (typically one that communicates with other instances over a network).

In another thread, I suggested that “perhaps cosine vectors might also get involved”. I’d like to expand on this notion, with the full realization that it seems only vaguely connected to cortical structure and behavior…

Let’s say that we have a module which “thinks” it has detected an object (e.g., a powered burner element on an electric stovetop). It has accumulated various data on this object, encoded as keys and values:

  • brightness: medium
  • color: red
  • diameter: 6"
  • temperature: hot
  • touching: painful
  • …

It takes each datum, turns it into a text string, and then calculates a hash function on the string. Using the calculated value as an index, it sets the corresponding bit in a bit vector. After all the data has been encoded, the module includes the resulting vector in an outgoing CMP structure.
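A minimal sketch of that encoding step in Python (the vector width and function names here are my own assumptions, and the actual CMP packaging isn’t shown):

```python
import hashlib

VECTOR_BITS = 2048  # hypothetical width; wider vectors mean fewer hash collisions

def encode_features(features: dict, n_bits: int = VECTOR_BITS) -> int:
    """Hash each "key: value" datum and set one bit per datum in a bit vector.

    The bit vector is held as a Python int, which supports arbitrary-width
    bitwise operations.
    """
    vector = 0
    for key, value in features.items():
        datum = f"{key}: {value}"
        # Stable hash of the text string, reduced to a bit index.
        digest = hashlib.sha256(datum.encode("utf-8")).digest()
        index = int.from_bytes(digest, "big") % n_bits
        vector |= 1 << index
    return vector

burner = encode_features({
    "brightness": "medium",
    "color": "red",
    "diameter": '6"',
    "temperature": "hot",
    "touching": "painful",
})
```

The result is independent of insertion order, and at most one bit is set per datum (fewer if two hashes happen to collide).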

Upon receiving this vector, another module can quickly (albeit coarsely) assess how related it is to any other vector it already has “in hand”. As Wikipedia says:

In data analysis, cosine similarity is a measure of similarity between two non-zero vectors defined in an inner product space. Cosine similarity is the cosine of the angle between the vectors; that is, it is the dot product of the vectors divided by the product of their lengths.

Dunno how useful any of this is, but I was struck by the way this compresses the transmitted information sets, producing something akin to a sparse representation. (ducks…)
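For bit vectors held as Python ints, the quick relatedness check itself is only a few lines (a sketch; `bit_cosine` is my own name, not anything defined by CMP):

```python
import math

def bit_cosine(a: int, b: int) -> float:
    """Cosine similarity of two bit vectors held as Python ints.

    Dot product = popcount(a AND b); each vector's length = sqrt(popcount).
    """
    if a == 0 or b == 0:
        return 0.0
    shared = bin(a & b).count("1")
    return shared / math.sqrt(bin(a).count("1") * bin(b).count("1"))
```

Identical vectors score 1.0 and fully disjoint vectors score 0.0, so a receiving module gets a cheap, coarse relatedness score without ever decoding the original keys and values.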

Ha! Your “(ducks…)” actually got a small chuckle from me. Give me some time with what you laid out here and I’ll get back to you a little later this morning. (shambles out of bed, looking for coffee…)

1 Like

I just realized that my presentation left out a couple of points, so:

AIUI, the dot product of a couple of numerical vectors A and B is normally calculated by summing A(i) x B(i). However, if we’re talking about bit vectors, this turns into counting the number of “one” (true) bits in AND(A,B). So, it’s a pretty fast operation.

Because of the hashing, some result bits will get set for differing data values, causing false positives. For example, the hashes of “color: red” and “surface: smooth” might be the same. So, the result is mostly a way to make a quick check for similarity.

Instead of coffee, I ended up having to settle for chai (poor me).

I suggested that “perhaps cosine vectors might also get involved”.

I actually think implementing something like that could be interesting. It kind of reminds me of auto-encoding principles, where you take high-dimensional data and compress it through a low-dimensional filter, then expand/interpret it again on the other side; a bowtie architecture. This kind of architecture gets used in the brain all the time, so there’s some biological precedent for it.

One potential limitation I see however would be the normalization of magnitude. I’d need to double-check how CMP packages things like scale or voting confidence, but normalizing those could potentially be problematic. Though you also mentioned hashing the string, so maybe that would prevent this issue altogether? Either way, it’s an interesting thought. If my plate wasn’t already full, I’d be tempted to test it.

Hey @HumbleTraveller re. the definition of a node - yes I think in some instances it can represent a condensed representation of a series of other nodes (i.e. with hierarchy), although if the graph is at the bottom of the hierarchy (direct sensory patch input to each node), then this won’t always be the case.

I think the main thing to capture is that it is a collection of information at a point, which “feature” also captures, although I think the discussion arose just because this latter term is overloaded. But yes, thinking of it as the input to L4 is a useful framing, and specifically for the terminology of “node”, I agree that focusing on the fact that nodes are also defined by neighborhood connectivity is useful.

1 Like