RFC CMP v1 Feedback

The new RFC link worked fine for me (tnx). Here are some random comments which may or may not deserve to be raised as issues…

1 Key Concepts

1.1 Constraints

1.1.1 Message Propagation Delay

The Cortical Message propagation delay between the same sender and the same receiver MUST be constant.

As Joe Armstrong might have put it, that breaks the laws of physics for distributed systems. (Indeed, it is a hard constraint for concurrent systems to meet.) That said, it should be possible to produce a reasonable simulation of this by buffering input messages until a given propagation time is reached.
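To make the buffering idea concrete, here is a minimal Python sketch. The `DelayBuffer` class, its method names, and the integer "tick" time unit are all assumptions for illustration; nothing here comes from the RFC or from Monty. The receiver holds each message until a fixed interval after its send time, so the apparent delay is constant as long as the real transport latency never exceeds that interval.

```python
import heapq
from dataclasses import dataclass, field
from typing import Any


@dataclass(order=True)
class _Pending:
    release_at: int                      # tick at which the message may be delivered
    seq: int                             # tie-breaker: preserves arrival order
    payload: Any = field(compare=False)  # never compared, so any payload type works


class DelayBuffer:
    """Hypothetical receiver-side buffer that hides transport jitter."""

    def __init__(self, delay: int) -> None:
        self.delay = delay  # the constant propagation delay to simulate
        self._heap = []
        self._seq = 0

    def push(self, sent_at: int, payload: Any) -> None:
        """Store a message that was sent at tick `sent_at`."""
        entry = _Pending(sent_at + self.delay, self._seq, payload)
        heapq.heappush(self._heap, entry)
        self._seq += 1

    def pop_due(self, now: int) -> list:
        """Return every message whose release tick has been reached."""
        due = []
        while self._heap and self._heap[0].release_at <= now:
            due.append(heapq.heappop(self._heap).payload)
        return due
```

With `delay=5`, a message sent at tick 0 is released at tick 5 no matter when it actually arrived. A message that arrives later than its release tick cannot be repaired this way, so the chosen delay has to be an upper bound on real latency.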

1.1.2 Module Processing Duration

Each Module MUST take constant time to receive, process, and emit Cortical Messages.

Once again, this is difficult to guarantee, but can be simulated by buffering output messages until a given processing time is reached.

1.1.3 Module Processing Completeness

Each Module MUST process all received Cortical Messages.

This doesn’t sound hard for Elixir, with some provisos. Each Elixir process will typically have a set of input dispatching patterns. These are tried in sequence by the BEAM until a match is found. A trailing fallback pattern can “process” (for some value of process :slight_smile:) anything that comes in.

2 Functional Specification

2.1 Step

Looking at the chart, I am reminded of my notion regarding unique and genealogically-based IDs in each message. To be clear, I don’t expect Monty to pay attention to these, but developers might find them very useful. Roughly:

  • Each Sensor Module is given a universally unique identifier (UUID) at its creation. It uses this ID in all messages it emits.
  • Each resulting message (e.g., from an LM) contains one or more dynamically calculated IDs, based on all of the module’s current input IDs.

The question of how the resulting IDs should be calculated is left as an exercise for the student (:-), but here are a couple of possibilities:

  • a comprehensive tree structure, containing all of the input IDs
  • a cosine vector, characterizing and summarizing the input IDs
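
As a rough illustration of the tree-structured option only, here is a minimal Python sketch. The `Lineage` class and the helper names are assumptions of mine, not part of the RFC or of Monty, and the vector-summary option is not attempted here.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Lineage:
    """Hypothetical genealogical ID: a module ID plus the lineages it consumed."""
    module_id: str
    parents: tuple = ()

    def sensor_ids(self) -> set:
        """Collect the IDs of every Sensor Module at the leaves of the tree."""
        if not self.parents:
            return {self.module_id}
        ids = set()
        for parent in self.parents:
            ids |= parent.sensor_ids()
        return ids


def new_sensor_lineage() -> Lineage:
    """A Sensor Module gets a UUID at creation and has no parents."""
    return Lineage(module_id=str(uuid.uuid4()))


def derive(module_id: str, inputs: list) -> Lineage:
    """A downstream module (e.g., an LM) records all of its current inputs."""
    return Lineage(module_id=module_id, parents=tuple(inputs))
```

A developer could then ask any message which sensors ultimately contributed to it, e.g. `derive("lm-1", [s1, s2]).sensor_ids()`. The obvious cost is that the tree grows with every hop, which is presumably why a compact summary is the other option.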

Please consider signal propagation in an adult neocortex (once myelinated). The timing of signal arrival at a cortical column is what allows that column to determine whether those signals are correlated or not.

Indeed, but here’s a possible way to finesse the issue:

  • Set up a system to distribute a global clock.
  • Store a current time stamp in each message.
  • Buffer input and output messages to reduce jitter.
  • Base correlation on the received time stamps.
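
The last step could be sketched as follows, in Python. The function name, the `(timestamp, payload)` tuple shape, and the fixed-window rule are assumptions for illustration; a real system might well use a different notion of correlation.

```python
def correlated_groups(messages: list, window: float) -> list:
    """Group (timestamp, payload) pairs whose send times fall within `window`
    of the first message in the group, regardless of arrival order."""
    groups = []
    current = []
    group_start = None
    # Sort by the embedded time stamp, not by the order of arrival.
    for stamp, payload in sorted(messages, key=lambda m: m[0]):
        if group_start is None or stamp - group_start > window:
            if current:
                groups.append(current)
            current = [payload]
            group_start = stamp
        else:
            current.append(payload)
    if current:
        groups.append(current)
    return groups
```

Because the grouping depends only on the stamps written by the senders, transport jitter changes when a group can be completed but not which messages end up in it.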

Yes, that’s one way of dealing with the issue. There are also others. For example, Monty currently uses “virtual” time and controls all message delivery by enforcing time steps via the step() methods.

@tslominski - Thanks for breaking out this topic!

While reviewing the RFC, I didn’t see anything on the message format and content. Here are some comments, for discussion…

With few exceptions, positionally defined data structures (e.g., arrays, lists) should be avoided. In brief, they aren’t extensible or self-documenting. Worse, they introduce connascence of position, a form of coupling which forces the receiver to follow the order set by the generator.

I’d like to see the CMP include a substantial amount of metadata, organized in some structured manner. (This needn’t be large or hard to generate, in order to be useful.) For example:

  • Time stamps could be used by the modules themselves, to simulate biological propagation delays and reduce the impact of temporal jitter in the supporting communication framework.

  • Provenance metadata could help researchers to understand what modules were involved in the message’s generation.

  • Textual tags could help observers (e.g., LLMs, researchers) to understand the “meaning” of the message and relate it to external context. For example:

    • What objects and behaviors were used to train this model?
    • What processing did the module perform on the input data?
    • What part(s) of the sensory system contributed to the data?
  • Addressing information could indicate the sending and receiving modules, as well as the module sections (e.g., levels) involved.

  • As detailed in “Possible message types, for consideration and discussion”, a “type” symbol (e.g., :vote) could help in dispatching messages to appropriate parts of the receiving module.

  • Annotations could specify information about the type of the sending neuron (e.g., pyramidal, excitatory, inhibitory), the length and nature of the axon’s path, etc.
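
To show what a named-field (rather than positional) layout might look like, here is a minimal Python sketch. Every field name below is hypothetical; the actual Cortical Message format is not defined in the RFC draft I reviewed, and this is not derived from Monty's State class.

```python
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass(frozen=True)
class Metadata:
    """Hypothetical metadata envelope; all field names are illustrative."""
    sent_at: float                  # time stamp written by the sender
    sender: str                     # addressing: sending module
    receiver: str                   # addressing: receiving module
    msg_type: str = "observation"   # dispatch hint, e.g. "vote"
    provenance: tuple = ()          # IDs of modules that contributed
    tags: dict = field(default_factory=dict)  # free-form textual tags


@dataclass(frozen=True)
class CorticalMessage:
    payload: dict  # named payload fields, so receivers don't depend on order
    meta: Metadata


msg = CorticalMessage(
    payload={"location": [0.0, 1.0, 2.0]},
    meta=Metadata(sent_at=12.5, sender="sm-1", receiver="lm-1",
                  msg_type="vote", provenance=("sm-1",),
                  tags={"modality": "vision"}),
)
record = asdict(msg)  # plain nested dict, ready for serialization
```

New metadata fields can be added with defaults without breaking existing receivers, which is the extensibility that positional structures lack.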

Sorry about the incompleteness. The draft RFC is still missing a bunch of stuff.

We’ll work on formalizing the Cortical Message format from the State class reference implementation.

Interesting re the metadata :thinking:… I’ve been thinking about metadata more in terms of telemetry emitted by the system (traces and similar observability tooling come to mind) than as part of the Cortical Message itself.

I’d be delighted to see Monty incorporate popular (and preferably industry standard) observability and introspection tooling, such as OpenTelemetry for telemetry and GraphQL for querying state. Whether this should be part of the CMP or separate is beyond my pay grade.

To explore the GraphQL notion, consider that each module is an Actor, with a (possibly complex) internal state. It’s unreasonable to expect modules to report (e.g., trace) every state change, but there’s nothing that would prevent another actor from inquiring about specific state data. GraphQL is specifically designed to support this sort of thing…
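
As a plain-Python stand-in for the field-selection idea (this is not GraphQL itself, and it uses no GraphQL library), the sketch below lets a caller name only the parts of a module's state it wants. The `select` function and the example state keys are hypothetical.

```python
from typing import Any


def select(state: dict, query: dict) -> dict:
    """Return only the parts of `state` named in `query`.

    A nested dict in `query` selects sub-fields; any other value
    (e.g. True) selects the whole field. Unknown keys are skipped.
    """
    result = {}
    for key, sub in query.items():
        if key not in state:
            continue
        value = state[key]
        if isinstance(sub, dict) and isinstance(value, dict):
            result[key] = select(value, sub)
        else:
            result[key] = value
    return result


state = {"hypotheses": {"count": 3, "best": "mug"}, "step": 42, "buffer": [1, 2]}
answer = select(state, {"hypotheses": {"best": True}, "step": True})
```

The module pays nothing until someone asks, and the response contains only what was requested, which is the property that makes this style attractive compared with tracing every state change.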