Now that we have Mermaid support in the TBP forum (thanks!), I feel compelled to play around with directed graphs of actors (e.g., Monty modules).
A while back, there was a post, *On using Monty for Audio processing*. Let’s follow up (and blue-sky a bit), with roughly the same general goals.
## A Naive Subsystem
Humans have two ears, supporting the input of stereo audio. In order to take advantage of this, we may need to position the ears. A naive audio input subsystem might thus look something like:
```mermaid
graph LR;
AM_LM["Asst. Monty<br>(LM)"];
EP_MM["Ear Position<br>(MM)"];
LE_HW["Left Ear<br>(HW)"];
LE_SM["Left Ear<br>(SM)"];
RE_HW["Right Ear<br>(HW)"];
RE_SM["Right Ear<br>(SM)"];
SA_LM["Stereo Audio<br>(LM)"];
LE_HW-->LE_SM;
RE_HW-->RE_SM;
LE_SM<-->SA_LM;
RE_SM<-->SA_LM;
SA_LM<-->AM_LM;
SA_LM<-->EP_MM;
```
- The Left and Right Ear hardware (microphone, ADC, …) collects sampled and digitized audio information, in the form of amplitude over time.
- The Left and Right Ear Sensor Modules process this information into congenial formats (e.g., amplitude and timing data, by frequency). So, for example, a sensor module might perform a Fourier transform to simulate the behavior of the cochlea.
- The Stereo Audio Learning Module then:
  - sends requests to the Ear Position Motor Module
  - shares its findings with assorted Monty Learning Modules
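As a concrete sketch of the Sensor Module step above: a windowed FFT turns a buffer of raw samples (amplitude over time) into amplitude-by-frequency, as a crude stand-in for the cochlea’s frequency decomposition. The function name and parameters below are purely illustrative, not part of Monty’s actual API:

```python
# Sketch of a Sensor Module step: convert one window of raw samples
# (amplitude over time) into amplitude-by-frequency, loosely imitating
# the cochlea's frequency decomposition. Names are illustrative only.
import numpy as np

def sm_process(samples: np.ndarray, sample_rate: int):
    """Return (frequencies, amplitudes) for one window of audio."""
    windowed = samples * np.hanning(len(samples))   # reduce spectral leakage
    spectrum = np.fft.rfft(windowed)
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    amps = np.abs(spectrum) / len(samples)
    return freqs, amps

# Example: a 440 Hz tone should peak near 440 Hz.
rate = 16_000
t = np.arange(rate) / rate            # one second of samples
tone = np.sin(2 * np.pi * 440 * t)
freqs, amps = sm_process(tone, rate)
print(freqs[np.argmax(amps)])         # ~440.0
```

A real SM would presumably stream overlapping windows and forward only a compact summary (dominant bands, onsets, timing) rather than the full spectrum.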
Of course, this description glosses over a huge number of details, many of which will need to be solved before even minimal functionality can be achieved. There are far too many interesting and challenging issues to be discussed (let alone resolved) here, but we can explore a couple…
## The Promiscuity of Actors
As many tabloids have covered, actors are famously promiscuous.
Monty’s LMs, in particular, are no exception. They are allowed (nay, expected) to send messages (e.g., VOTE) to any other LMs that might be nearby or otherwise involved.
So, for example, the Left Ear LM might tell the Right Ear LM that it heard something. We should also expect conversations with an unknown number of assorted (e.g., nearby) LMs. Making matters worse, I’d expect a fair amount of administrative traffic (e.g., alert, clock, status, timing, tracing).
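This many-to-many messaging can be sketched as a tiny actor system: each LM registers with a shared bus and may broadcast a message (VOTE, STATUS, …) to every peer. The class and message names here are illustrative, not Monty’s actual protocol:

```python
# Minimal sketch of "promiscuous" LM messaging: any LM may broadcast
# to any registered peer via a shared bus. Names are hypothetical.
class Bus:
    """Delivers each message to every registered LM except its sender."""
    def __init__(self):
        self.members = []

    def register(self, lm):
        self.members.append(lm)

    def deliver(self, sender, kind, payload):
        for lm in self.members:
            if lm.name != sender:
                lm.receive({"from": sender, "kind": kind, "payload": payload})

class LM:
    def __init__(self, name, bus):
        self.name = name
        self.bus = bus
        self.inbox = []
        bus.register(self)

    def broadcast(self, kind, payload):
        self.bus.deliver(sender=self.name, kind=kind, payload=payload)

    def receive(self, msg):
        self.inbox.append(msg)

bus = Bus()
left, right, stereo = LM("LE_LM", bus), LM("RE_LM", bus), LM("SA_LM", bus)
left.broadcast("VOTE", {"heard": "impulse", "confidence": 0.7})
print([m["from"] for m in stereo.inbox])   # ['LE_LM']
```

Even this toy version hints at the scaling problem: every broadcast fans out to every peer, so administrative traffic grows quickly as LMs are added.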
All of this will make it difficult to draw (let alone understand) a complete diagram for any non-trivial Monty instance. However, we can use relevant subsets, as long as we bear in mind that we’re being deliberately incomplete.
## Harnessing Monty
Let’s add some LLM-based harnesses (LHs) to our instance. The idea is that the LHs will “hear” the same things as Monty’s LMs do, then report their analysis of the sounds. This information could be used as a form of supervised learning, to annotate (i.e., “tag”) and/or tune Monty’s models with textual descriptions (e.g., “coin on glass”), directional information, etc.:
```mermaid
graph LR;
AM_LM["Asst. Monty<br>(LM)"];
EP_MM["Ear Position<br>(MM)"];
LE_HW["Left Ear<br>(HW)"];
LE_LH["Left Ear<br>(LH)"];
LE_SM["Left Ear<br>(SM)"];
RE_HW["Right Ear<br>(HW)"];
RE_LH["Right Ear<br>(LH)"];
RE_SM["Right Ear<br>(SM)"];
SA_LM["Stereo Audio<br>(LM)"];
LE_HW-->LE_SM;
LE_HW-->LE_LH;
RE_HW-->RE_SM;
RE_HW-->RE_LH;
LE_LH<-->SA_LM;
RE_LH<-->SA_LM;
LE_SM<-->SA_LM;
RE_SM<-->SA_LM;
SA_LM<-->AM_LM;
SA_LM-->EP_MM;
```
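The tagging flow might look something like the sketch below: an LH receives the same features the SM produced, emits a textual label, and the pair is stored as a supervised training example. Everything here is hypothetical (`describe_sound` merely stands in for an actual LLM call):

```python
# Sketch of the LH tagging flow: the harness "hears" the same input
# as the SM, produces a textual label, and the (features, label) pair
# becomes a supervised training example. All names are hypothetical.
from dataclasses import dataclass

@dataclass
class TaggedObservation:
    features: list          # e.g., amplitude-by-frequency from the SM
    label: str              # textual tag from the LH
    direction_deg: float    # estimated direction, if available

def describe_sound(features) -> str:
    # Placeholder for an LLM-based analysis of the audio features.
    return "coin on glass"

def harness_step(features, direction_deg):
    return TaggedObservation(features, describe_sound(features), direction_deg)

obs = harness_step([0.1, 0.9, 0.2], direction_deg=30.0)
print(obs.label)   # coin on glass
```

The interesting open question is latency: an LLM round trip is far slower than Monty’s internal messaging, so the tags would presumably arrive as delayed annotations rather than real-time inputs.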
## A Test Case
Here’s a relatively simple test case to check some basic system functionality. After setting things up:
- generate a sonic impulse (e.g., a coin tapping on a window) at a known direction and distance from the sensors
- have Monty turn its head toward the impulse
- have Monty report the direction, distance, etc.
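For checking the reported direction, a classic signal-processing baseline is the interaural time difference (ITD): cross-correlate the two ear signals to find the arrival-time lag, then convert that lag to an angle. This is only a reference computation to validate against, not Monty’s own mechanism, and the microphone spacing and sample rate below are assumed values:

```python
# Reference direction estimate for the test case: cross-correlate the
# left/right signals to find the interaural time difference (ITD),
# then convert it to an azimuth. Mic spacing and rate are assumptions.
import numpy as np

SPEED_OF_SOUND = 343.0   # m/s
MIC_SPACING = 0.2        # m between "ears" (assumed)
RATE = 48_000            # samples/s (assumed)

def itd_angle(left: np.ndarray, right: np.ndarray) -> float:
    """Estimate source azimuth in degrees (0 = ahead, negative = left)."""
    corr = np.correlate(left, right, mode="full")
    lag = np.argmax(corr) - (len(right) - 1)   # <0 when left ear hears it first
    delay = lag / RATE                         # seconds
    sin_theta = np.clip(delay * SPEED_OF_SOUND / MIC_SPACING, -1.0, 1.0)
    return float(np.degrees(np.arcsin(sin_theta)))

# Example: an impulse arriving 10 samples earlier at the left ear,
# i.e., a source roughly 21 degrees to the left.
impulse = np.zeros(400)
impulse[200] = 1.0
left = np.roll(impulse, -10)    # left ear hears it first
right = impulse
print(round(itd_angle(left, right)))
```

If Monty’s reported direction roughly matches this estimate (and the head actually turns toward the impulse), the basic pipeline is working.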