Mermaid musings: simple graphs of actors

Now that we have Mermaid support in the TBP forum (thanks!), I feel compelled to play around with directed graphs of actors (e.g., Monty modules).

A while back, there was a post, “On using Monty for Audio processing”. Let’s follow up (and blue-sky) a bit, with roughly the same general goals.

A Naive Subsystem

Humans have two ears, supporting the input of stereo audio. In order to take advantage of this, we may need to position the ears. A naive audio input subsystem might thus look something like:

graph LR;
  AM_MM["Asst. Monty<br>(LM)"]
  EP_MM["Ear Position<br>(MM)"]
  LE_HW["Left Ear<br>(HW)"];
  LE_SM["Left Ear<br>(SM)"]
  RE_HW["Right Ear<br>(HW)"];
  RE_SM["Right Ear<br>(SM)"]
  SA_LM["Stereo Audio<br>(LM)"]

  LE_HW-->LE_SM;
  RE_HW-->RE_SM;
  LE_SM<-->SA_LM;
  RE_SM<-->SA_LM;
  SA_LM<-->AM_MM;
  SA_LM<-->EP_MM;

  • The Left and Right Ear hardware (microphone, ADC, …) collect sampled and digitized audio information, in the form of amplitude over time.

  • The Left and Right Ear Sensor Modules process this information into congenial formats (e.g., amplitude and timing data, by frequency). So, for example, a sensor module might perform a Fourier transform to simulate the behavior of the cochlea.

  • The Stereo Audio Learning Module then:

    • sends requests to the Ear Position Motor Module
    • shares its findings with assorted Monty Learning Modules
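The Sensor Module’s Fourier step can be sketched as follows. This is a minimal illustration; the function name `to_spectrum` and its parameters are my inventions, not part of Monty’s API:

```python
import numpy as np

def to_spectrum(samples, sample_rate):
    """Convert a window of audio samples into (frequency, amplitude) pairs,
    crudely mimicking the cochlea's frequency decomposition."""
    spectrum = np.fft.rfft(samples)                        # one-sided FFT
    freqs = np.fft.rfftfreq(len(samples), 1.0 / sample_rate)
    amps = np.abs(spectrum) / len(samples)                 # normalized magnitudes
    return freqs, amps

# a pure 440 Hz tone should produce a spectral peak at 440 Hz
rate = 8000
t = np.arange(rate) / rate
freqs, amps = to_spectrum(np.sin(2 * np.pi * 440 * t), rate)
```

In practice, an SM would process overlapping, windowed frames (e.g., with a Hann window) rather than a single one-second buffer.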

Of course, this description glosses over a huge number of details, many of which will need to be solved before even minimal functionality can be achieved. There are far too many interesting and challenging issues to be discussed (let alone resolved) here, but we can explore a couple…

The Promiscuity of Actors

As many tabloids have covered, actors are famously promiscuous.
Monty’s LMs, in particular, are no exception. They are allowed (nay, expected) to send messages (e.g., VOTE) to any other LMs that might be nearby or otherwise involved.

So, for example, the Left Ear LM might tell the Right Ear LM that it heard something. We should also expect conversations with an unknown number of assorted (e.g., nearby) LMs. Making matters worse, I’d expect a fair amount of administrative traffic (e.g., alert, clock, status, timing, tracing).

All of this will make it difficult to draw (let alone understand) a complete diagram for any non-trivial Monty instance. However, we can use relevant subsets, as long as we bear in mind that we’re being deliberately incomplete.

Harnessing Monty

Let’s add some LLM-based harnesses (LHs) to our instance. The idea is that the LHs will “hear” the same things as Monty’s LMs do, then report their analysis of the sounds. This information could be used as a form of supervised learning, to annotate (i.e., “tag”) and/or tune Monty’s models with textual descriptions (e.g., “coin on glass”), directional information, etc.:

graph LR;
  AM_MM["Asst. Monty<br>(LM)"]
  EP_MM["Ear Position<br>(MM)"]
  LE_HW["Left Ear<br>(HW)"];
  LE_LH["Left Ear<br>(LH)"];
  LE_SM["Left Ear<br>(SM)"];
  RE_HW["Right Ear<br>(HW)"];
  RE_LH["Right Ear<br>(LH)"];
  RE_SM["Right Ear<br>(SM)"]
  SA_LM["Stereo Audio<br>(LM)"]

  LE_HW-->LE_SM;
  LE_HW-->LE_LH;
  RE_HW-->RE_SM;
  RE_HW-->RE_LH;
  
  LE_LH<-->SA_LM;
  RE_LH<-->SA_LM;
 
  LE_SM<-->SA_LM;
  RE_SM<-->SA_LM;
  
  SA_LM<-->AM_MM;
  SA_LM-->EP_MM;

A Test Case

Here’s a relatively simple test case to check some basic system functionality. After setting things up:

  • generate a sonic impulse (e.g., a coin tapping on a window) at a known direction and distance from the sensors.
  • have Monty turn the head toward the impulse
  • have Monty report the direction, distance, etc.
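For the direction-reporting step, a standard far-field approximation relates the interaural time difference (ITD) to azimuth: Δt = d·sin(θ)/c. A minimal sketch (the function name and the 0.2 m sensor spacing are my assumptions, not Monty’s):

```python
import math

def azimuth_from_itd(delta_t, ear_spacing=0.2, speed_of_sound=343.0):
    """Estimate the horizontal angle (radians) of a distant sound source
    from the interaural time difference: delta_t = d * sin(theta) / c."""
    x = speed_of_sound * delta_t / ear_spacing
    return math.asin(max(-1.0, min(1.0, x)))  # clamp against noisy inputs

# a source 30 degrees off-axis, with 0.2 m sensor spacing:
itd = 0.2 * math.sin(math.radians(30)) / 343.0
angle_deg = math.degrees(azimuth_from_itd(itd))
```

Distance estimation is harder; it typically needs intensity differences, reverberation cues, and/or head movement, which is exactly where the Motor Module earns its keep.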

The initial posting left out myriad details; let’s explore a few…

Data Representation, Structures, etc.

In an actor-based system such as Monty, all interaction among the processes is based on the exchange of data structures. So, for example, the Learning, Motor, and Sensor Modules all exchange messages based on the Cortical Messaging Protocol (CMP).

Note: ChatGPT contends that the TBP documentation does not discuss “Motor Modules”, per se. However, in an actor-based system, this seems like an appropriate approach (and thus nomenclature).

Cortical Messaging Protocol

AFAIK, the Cortical Messaging Protocol (CMP) is Monty’s only recognized protocol. According to “CMP and the State Class”, CMP-compliant messages should contain:

  • location (relative to the body)
  • morphological features: pose_vectors (3x3 orthonormal), pose_fully_defined (bool), on_object (bool)
  • non-morphological features: color, texture, curvature, … (dict)
  • confidence (in [0, 1])
  • use_state (bool)
  • sender_id (unique string identifying the sender)
  • sender_type (string in ["SM", "LM"])

However, this is pretty abstract, in that it does not precisely define the data structure(s) involved. Hmmmmmm. Another chat produced this:

# illustrative sketch; v1, v2, v3 stand in for orthonormal pose vectors
import numpy as np

state = State(
  location = np.array([...]),
  morphological_features = {
    "pose_vectors": [v1, v2, v3],
    "pose_fully_defined": True,
    "on_object": True
  },
  non_morphological_features = { "color": [...], "texture": [...] },
  confidence = 0.8,
  use_state = True,
  sender_id = "SM_0",
  sender_type = "SM"
)

So, we now have a representative data structure. As long as we’re running Monty in a single Python process, we’re free to ship this data around using native Python data types. However, as soon as we want to use multiple languages, modules, and/or processes, we’ll need to encode the data in some manner, handle addressing, etc. IMHO, the obvious way to do this is to embed the encoded information in a higher-level format.

Data Encoding and Embedding

Although JavaScript Object Notation (JSON) has some annoying limitations (e.g., limited data types, no comments), it is supported by virtually every modern programming language. It is also the base notation for assorted higher-level formats. So, for discussion, let’s assume that we’re using JSON.
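As a concrete sketch of the encoding step (the helper name `encode_cmp` is mine), numpy arrays can be flattened into plain JSON lists:

```python
import json
import numpy as np

def encode_cmp(payload):
    """Encode a CMP-style payload as JSON, converting numpy arrays
    into plain lists (JSON has no richer numeric array type)."""
    def default(obj):
        if isinstance(obj, np.ndarray):
            return obj.tolist()
        raise TypeError(f"cannot encode {type(obj).__name__}")
    return json.dumps(payload, default=default)

message = encode_cmp({
    "location": np.array([0.1, 0.2, 0.3]),
    "confidence": 0.8,
    "sender_id": "SM_0",
    "sender_type": "SM",
})
```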

The particular “higher-level format” I’d like to propose here is the Model Context Protocol (MCP). MCP, a JSON-based format, appears poised to become the lingua franca of the AI world. It can handle (possibly with a bit of help) most of the administrative issues involved in exchanging messages among AI actors. As ChatGPT put it:

CMP vs MCP: Roles

| Protocol | Purpose |
| --- | --- |
| CMP (Cortical Messaging Protocol) | Defines what a message contains: State/GoalState, location, features, confidence, sender metadata. It’s the data format / semantic payload. |
| MCP (Model Context Protocol) | Defines how messages are contextualized and routed between modules. MCP tracks the contextual state of an agent, including sequences, dependencies, and hierarchies among CMP messages. It manages state aggregation, temporal context, and multi-module integration. |
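Since MCP messages ride on JSON-RPC 2.0, embedding a CMP payload might look something like this sketch (the method name "cmp/state" is entirely hypothetical):

```python
import json
from itertools import count

_ids = count(1)

def wrap_in_mcp(cmp_payload, method="cmp/state"):
    """Embed a CMP payload in a JSON-RPC 2.0 request, the wire format
    MCP is built on. The method name "cmp/state" is hypothetical."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),          # JSON-RPC requests carry a unique id
        "method": method,
        "params": cmp_payload,
    })

envelope = json.loads(wrap_in_mcp({"sender_id": "SM_0", "confidence": 0.8}))
```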

Let’s close things out by adding some data type annotations to the links in the previous diagram:

graph LR;
  AM_MM["Asst. Monty<br>(LM)"]
  EP_MM["Ear Position<br>(MM)"]
  LE_HW["Left Ear<br>(HW)"];
  LE_LH["Left Ear<br>(LH)"];
  LE_SM["Left Ear<br>(SM)"];
  RE_HW["Right Ear<br>(HW)"];
  RE_LH["Right Ear<br>(LH)"];
  RE_SM["Right Ear<br>(SM)"]
  SA_LM["Stereo Audio<br>(LM)"]

  LE_HW-- Raw -->LE_SM;
  LE_HW-- Raw -->LE_LH;
  RE_HW-- Raw -->RE_SM;
  RE_HW-- Raw -->RE_LH;
  
  LE_LH<-- MCP -->SA_LM;
  RE_LH<-- MCP -->SA_LM;
 
  LE_SM<-- CMP -->SA_LM;
  RE_SM<-- CMP -->SA_LM;
  
  SA_LM<-- CMP -->AM_MM;
  SA_LM-- CMP -->EP_MM;

Legend

  • Edge Types

    • CMP: CMP data structures, embedded in MCP
    • MCP: MCP data structures, other than CMP
    • Raw: raw analog or digital data
  • Node Types

    • HW: hardware (e.g., microphone, DAC)
    • LH: LLM-based harness
    • LM: Learning Module
    • MM: Motor Module
    • SM: Sensor Module

Support Infrastructure

The LLM-based harness mentioned earlier in this thread can be thought of as support infrastructure for Monty. As discussed, it can provide a form of supervised learning, by telling the LM what it “thinks” the sensors are hearing. Expanding on this idea, let’s consider various support actors.

Control and Observability

Using MCP, an actor could control Monty’s modules, observe their behavior, etc. So, for example, there might be global, regional, and/or module-specific dashboards, allowing a researcher to examine Monty’s activities.

Just as CMP can be used under MCP, other low-level protocols can be added to the mix. For example, GraphQL can be used as a way to:

  • query (request data)
  • mutate (modify data)
  • subscribe (request live updates)
  • test (sanity-check returned data)
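For example, a dashboard query might be built like this; the GraphQL schema here (`module`, `status`, `confidence` fields) is purely illustrative, not part of Monty:

```python
import json

# A hypothetical query against a module dashboard endpoint.
QUERY = """
query ModuleStatus($id: ID!) {
  module(id: $id) { id status confidence }
}
"""

def graphql_request(query, variables):
    """Build the standard GraphQL-over-HTTP POST body."""
    return json.dumps({"query": query, "variables": variables})

body = json.loads(graphql_request(QUERY, {"id": "SA_LM"}))
```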

Processing

A Monty instance might need various forms of processing on its data streams, in order to take best advantage of patterns in the data. For example, an audio stream might benefit from Fourier transforms, logarithmic scaling, statistical inference, etc.

Some forms of data management might also be useful. For example, a circular buffer (aka ring buffer) could be used to provide a running window of current samples for a data stream.
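A minimal ring buffer sketch, using Python’s `collections.deque` with a maximum length:

```python
from collections import deque

class RingBuffer:
    """Fixed-size running window over a data stream (aka circular buffer)."""

    def __init__(self, size):
        self._buf = deque(maxlen=size)

    def push(self, sample):
        self._buf.append(sample)  # the oldest sample drops out when full

    def window(self):
        return list(self._buf)

rb = RingBuffer(3)
for sample in [1, 2, 3, 4, 5]:
    rb.push(sample)
# rb.window() now holds the three most recent samples: [3, 4, 5]
```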

Using MCP, an LM or SM could request a wide range of such processing services. Various collections (i.e., archives, indexes, marketplaces) of MCP servers are already in place.

In general, these collections support discovery by both humans and LLMs. Although Monty might not use this feature directly, a developer and/or LLM-based tool certainly could.

Seeding

By “seeding” Monty’s LMs with data, hints, and tags (e.g., about input streams), an LLM-based harness can influence how Monty’s LMs will behave, evolve, etc., for example by:

  • directing LMs to pay attention to particular inputs
  • providing nomenclature for events and/or objects
  • suggesting relationships between inputs
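A seed might be as simple as a tagged message; every field name in this sketch is hypothetical, not part of CMP or Monty:

```python
def make_seed(target_lm, tag, inputs, weight=1.0):
    """Build a hypothetical "seed" message for an LM."""
    return {
        "type": "seed",
        "target": target_lm,
        "tag": tag,        # nomenclature for an event or object
        "inputs": inputs,  # which input streams the tag applies to
        "weight": weight,  # how strongly the LM should weight the hint
    }

seed = make_seed("SA_LM", "coin on glass", ["LE_SM", "RE_SM"])
```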

Synchronization

It seems very likely that Monty will need to maintain timing information in its data streams. To make best use of this, however, there may need to be a synchronization mechanism.

One way to address this might be to set up something like a Network Time Protocol (NTP) server on each processing node. (NTP is used on the Internet to keep computers’ clocks synchronized, compensating for transmission delays, etc.)
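For reference, NTP’s core offset/delay calculation is simple enough to sketch directly (timestamps: client send t0, server receive t1, server send t2, client receive t3):

```python
def ntp_offset_delay(t0, t1, t2, t3):
    """Standard NTP clock offset and round-trip delay calculation.
    t0/t3 are client send/receive times; t1/t2 are server receive/send times."""
    offset = ((t1 - t0) + (t2 - t3)) / 2.0   # estimated clock difference
    delay = (t3 - t0) - (t2 - t1)            # round-trip network delay
    return offset, delay

# server clock 5 s ahead, 0.1 s transit each way:
offset, delay = ntp_offset_delay(100.0, 105.1, 105.1, 100.2)
```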

graph LR;
  AM_MM["Asst. Monty<br>(LM)"]
  DB_SA["Dashboard<br>(SA)"]
  SC_SA["System Clock<br>(SA)"]
  EP_MM["Ear Position<br>(MM)"]
  LE_HW["Left Ear<br>(HW)"];
  LE_LH["Left Ear<br>(LH)"];
  LE_SM["Left Ear<br>(SM)"];
  RE_HW["Right Ear<br>(HW)"];
  RE_LH["Right Ear<br>(LH)"];
  RE_SM["Right Ear<br>(SM)"]
  SA_LM["Stereo Audio<br>(LM)"]

  DB_SA <-- GQL --> SA_LM;
  SC_SA -- NTP --> SA_LM;

  LE_HW -- Raw --> LE_SM;
  LE_HW -- Raw --> LE_LH;
  RE_HW -- Raw --> RE_SM;
  RE_HW -- Raw --> RE_LH;
  
  LE_LH <-- MCP --> SA_LM;
  RE_LH <-- MCP --> SA_LM;
 
  LE_SM <-- CMP --> SA_LM;
  RE_SM <-- CMP --> SA_LM;
  
  SA_LM <-- CMP --> AM_MM;
  SA_LM -- CMP --> EP_MM;

Legend

  • Edge Types

    • CMP: CMP data structures, embedded in MCP
    • GQL: GraphQL data structures, embedded in MCP
    • MCP: other data structures, embedded in MCP
    • NTP: NTP data structures, embedded in MCP
    • Raw: raw analog or digital data
  • Node Types

    • HW: hardware (e.g., microphone, DAC)
    • LH: LLM-based harness
    • LM: Learning Module
    • MM: Motor Module
    • SA: Support Actor
    • SM: Sensor Module

After watching the first part of Jeff’s video, “How Embodied Movements Might be Learned and Controlled”, I decided it would be fun, interesting, and possibly useful to recast Jeff’s diagrams in the pattern I’ve been experimenting with in this thread.

Recasting Hearing as Vision

Because Jeff’s presentation concerns vision rather than hearing, we need to recast our last diagram a bit, as:

graph LR;
  AM_MM["Asst. Monty<br>(LM)"]
  DB_SA["Dashboard<br>(SA)"]
  SC_SA["System Clock<br>(SA)"]
  EP_MM["Eye Position<br>(MM)"]
  LE_SH["Left Eye<br>Sensors<br>(SH)"];
  LE_LH["Left Eye<br>(LH)"];
  LE_SM["Left Eye<br>(SM)"];
  RE_SH["Right Eye<br>Sensors<br>(SH)"];
  RE_LH["Right Eye<br>(LH)"];
  RE_SM["Right Eye<br>(SM)"]
  BV_LM["Binocular and<br>Stereo Vision<br>(LM)"]

  DB_SA <-- GQL --> BV_LM;
  SC_SA -- NTP --> BV_LM;

  LE_SH -- Raw --> LE_SM;
  LE_SH -- Raw --> LE_LH;
  RE_SH -- Raw --> RE_SM;
  RE_SH -- Raw --> RE_LH;
  
  LE_LH <-- MCP --> BV_LM;
  RE_LH <-- MCP --> BV_LM;
 
  LE_SM <-- CMP --> BV_LM;
  RE_SM <-- CMP --> BV_LM;
  
  BV_LM <-- CMP --> AM_MM;
  BV_LM -- CMP --> EP_MM;

Legend

  • Edge Types

    • CMP: CMP data structures, embedded in MCP
    • GQL: GraphQL data structures, embedded in MCP
    • MCP: other data structures, embedded in MCP
    • NTP: NTP data structures, embedded in MCP
    • Raw: raw analog or digital data
  • Node Types

    • LH: LLM-based harness
    • LM: Learning Module
    • MM: Motor Module
    • SA: Support Actor
    • SH: Sensor Hardware
    • SM: Sensor Module

Digital Cameras vs. Eyes, etc.

Just as microphones aren’t ears, digital cameras aren’t eyes. Even a “simple” RGB CCD array is quite different from a retina. Fold in the things needed to provide an RGBD camera with depth information (e.g., stereo vision, structured light, time of flight sensing) and things quickly get very far from biological practices.

Fortunately, we can skip over these differences, while noting that there are going to be obvious similarities in the results (e.g., color vision, depth perception, non-linear intensity scaling) we’d like Monty to achieve.

We can also leave out the Dashboard, LLM-based harnesses, and System Clock, making room for more details in other aspects of the sketch (e.g., Drivers). But that will have to wait for the next installment…


Trimming Down Some Infrastructure…

As promised, here’s a trimmed-down version of the preceding diagram, omitting several sorts of infrastructure:

  • Assorted Monty Learning Modules
  • Dashboard (Support Actor)
  • Left & Right Eye LLM-based Harnesses
  • System Clock (Support Actor)

graph LR;
  EP_MM["Eye Position<br>Motor Module"];
  LE_SH["Left Eye<br>Sensor Hardware"];
  LE_SM["Left Eye<br>Sensor Module"];
  RE_SH["Right Eye<br>Sensor Hardware"];
  RE_SM["Right Eye<br>Sensor Module"];
  BV_LM["Binocular Vision<br>Learning Modules"]

  LE_SH <-- Raw --> LE_SM;
  RE_SH <-- Raw --> RE_SM;
  
  LE_SM <-- CMP --> BV_LM;
  RE_SM <-- CMP --> BV_LM;
  
  BV_LM <-- CMP --> EP_MM;

Adding Device Drivers

Now, in order to support Jeff’s breakout, let’s add some “device drivers”. These will handle device-specific details and enable CMP-based communication with higher-level actors (e.g., Sensor Modules, Motor Modules).

On the input side, most of the data flows from the Sensor Hardware through the Sensor Drivers and/or Sensor Modules to the Binocular (and Stereo) Vision Learning Module(s):

graph LR;
  LE_SD["Left Eye<br>Sensor Driver"];
  LE_SH["Left Eye<br>Sensor Hardware"];
  LE_SM["Left Eye<br>Sensor Module"];

  RE_SD["Right Eye<br>Sensor Driver"];
  RE_SH["Right Eye<br>Sensor Hardware"];
  RE_SM["Right Eye<br>Sensor Module"];

  BV_LM["Binocular Vision<br>Learning Module(s)"];

  LE_SH <-- Raw --> LE_SD <-- CMP --> LE_SM;
  RE_SH <-- Raw --> RE_SD <-- CMP --> RE_SM;

  LE_SD <-- CMP --> BV_LM;
  RE_SD <-- CMP --> BV_LM;

  LE_SM <-- CMP --> BV_LM;
  RE_SM <-- CMP --> BV_LM;

On the output side, the data flows from the Vision Learning Module through the Motor Modules and Motor Drivers to the Motor Hardware:

graph LR;
  LE_MD["Left Eye<br>Motor Driver"];
  LE_MH["Left Eye<br>Motor Hardware"];
  LE_MM["Left Eye<br>Motor Module"];

  RE_MD["Right Eye<br>Motor Driver"];
  RE_MH["Right Eye<br>Motor Hardware"];
  RE_MM["Right Eye<br>Motor Module"];

  BV_LM["Binocular Vision<br>Learning Module(s)"];

  BV_LM <-- CMP --> LE_MM <-- CMP --> LE_MD <-- Raw --> LE_MH;
  BV_LM <-- CMP --> RE_MM <-- CMP --> RE_MD <-- Raw --> RE_MH;

Discussion

In this architecture, the Sensor and Motor Drivers would handle low-level tasks (e.g., geometric transformations, non-linear scaling). Jeff’s “Sub-cortical Behavior Generation” would be handled by the Motor Modules.
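As a sketch of one such low-level task (the function name, gamma value, and raw range are illustrative, not Monty’s), a Sensor Driver might apply non-linear gamma scaling to raw intensities:

```python
def gamma_scale(raw, gamma=2.2, max_value=255):
    """Non-linear (gamma) scaling of a raw sensor intensity into [0, 1]."""
    return (raw / max_value) ** (1.0 / gamma)

# mid-scale input lands well above 0.5, as expected for gamma > 1
half = gamma_scale(127.5)
```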
