Extending and Generalizing TBT Learning Models

I am coming at this problem from a background in signal processing, distributed systems, networking, and security. I am not a neuroscientist by any stretch of the imagination, so I may not fully understand the theory and goals of TBT; apologies in advance.

Looking into TBT, I see its potential to provide real-time unsupervised learning. I see this as an exciting aspect of AI that is somewhat orthogonal to the current LLM/LRM approaches but complementary. However, I am struggling to understand how to generalize the Monty/TBT code for a broader set of use cases.

To provide a concrete example, I would like to build a real-time network security application that listens to network traffic and learns its behavior to determine anomalous events. Some work has already been done in this area trying to insert ML into the Linux kernel; see: Machine learning-powered traffic processing in commodity hardware with eBPF.

Many security applications do use machine learning for this today, but they need training data, and if a new type of threat arises, in many cases the training data needs to be updated and the code redeployed. So solutions combining static rules, heuristics, and machine learning are what is deployed today. I am wondering, however, whether we can do a lot better by using TBT/Monty, which provides unsupervised learning and acts more like a human operator examining the network. The issue is that humans do not scale with the number of nodes, sessions, and packets flowing through a modern network; it is not humanly possible to observe everything and then learn the patterns and make decisions.

The sensor model is fairly easy and can be built using eBPF (for Linux); it will provide a real-time session table that is updated as new packets are received by the system. The same goes for the action model; it is fairly simple: allow, deny, notify, etc. Motion would be prompted by the sensor learning of other nodes and deploying sensors to them to build a graph of the network, i.e. moving to get different views. The challenge I see is how to generalize the learning model. I can write a specific learning model, but this seems to defeat the goal of TBT to create a general model of AI.
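
To make the sensor side concrete, here is a minimal sketch of the kind of session table I have in mind, in plain Python (all the names are my own invention; a real implementation would be fed from eBPF maps rather than called directly):

```python
from dataclasses import dataclass

# Hypothetical 5-tuple session key: (src_ip, dst_ip, src_port, dst_port, proto).
SessionKey = tuple

@dataclass
class SessionStats:
    packets: int = 0
    bytes: int = 0

class SessionTable:
    """Toy stand-in for a real-time session table fed by an eBPF probe."""

    def __init__(self):
        self.sessions: dict[SessionKey, SessionStats] = {}

    def update(self, key: SessionKey, length: int) -> SessionStats:
        # Called once per observed packet; creates the session on first sight.
        stats = self.sessions.setdefault(key, SessionStats())
        stats.packets += 1
        stats.bytes += length
        return stats
```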

I can see a similar problem with the existing vision sensor: if it is extended to a larger part of the electromagnetic spectrum, new fields would have to be added to the sensor and new learning models added to detect different aspects of the electromagnetic spectrum. While it might require different sensors, they are still just detecting spectral content - and while the sensors would go beyond human senses, they may be a natural fit for robotic applications.

So my core question is: what is the proposed method for creating generalized learning models? LLMs are a mechanism for extracting meaning from large volumes of training data, and while there is a lot of work in making them human-consumable, the core code is not rewritten for every use case.

I think my understanding must be off, as the only approach I can see is to create custom code for every sensor, and I am not sure this is scalable. Any help or suggestions would be welcome.


Hey there, let me try to get a full appreciation of what you’re asking here.

You’re hoping to probe the feasibility of using Monty for a network-grade security app, correct? With its Sensor Modules serving almost as endpoint agents?

It’s an interesting thought, 100%. Though I do have some reservations concerning the potential for false positives, given the nature of unsupervised learning. Lord knows we already get enough of that with heuristic AV solutions (I’m looking at you, Carbon Black…).

For what it’s worth, I’ve always thought it would be rad to design a bio-inspired AV solution based on immune-response principles. I’m curious to know what the Monty team will have to say in response to your question here.

Edit: Also, I don’t think you’d need to design any sort of custom LM (sensor modules might be a different story). You’d just need to think through how to map your network domain onto Monty’s reference frame approach. In my mind, there are three principles you’ll want to think through (a rough sketch follows the list):

  1. A consistent way to represent locations/positions within “space” (perhaps this could be represented as nodes on a network’s topology)
  2. “Features” which can be detected at those positions (packet sizes, protocols and ports. That sort of thing)
  3. And a way to “move” through the space (think traffic flow)
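
To make that mapping concrete, here’s a very rough sketch of what a single “observation” might look like. None of these names are actual Monty APIs; they’re just my own illustration:

```python
from dataclasses import dataclass

@dataclass
class NetworkObservation:
    """One hypothetical 'feature at location' reading from the network."""
    location: str   # position in "space": a node in the network topology
    features: dict  # what is sensed there: ports, protocols, sizes, etc.

def reachable_from(topology: dict, node: str) -> list:
    """'Movement' as following traffic flow to neighboring nodes."""
    return topology.get(node, [])

obs = NetworkObservation(
    location="host-10.0.0.5",
    features={"protocol": "tcp", "dst_port": 443, "packet_size": 1500},
)
```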

As for our “network objects” (really just collections of features), we could imagine something like a traffic pattern. Disambiguating ‘normal’ vs ‘anomalous’ patterns here could be a bit of an interesting task. But hey, that’s where all the fun is.

Ultimately, most of the custom work to be done here, in my opinion, would be on the policy-design side of things. Just my two cents.


Hi! This is Hojae, one of the researchers at TBP. :slight_smile:

Thanks for the question @Doonhammer and great response @HumbleTraveller. I’ll just add a few more points that might help clarify how Monty could be applied to network security:

As you alluded to, Monty’s Learning Modules are quite general and modality-agnostic. They work with “features at locations” in a common format defined by the Cortical Messaging Protocol (CMP), regardless of sensor type.

For the network security use case, I think you wouldn’t necessarily need to write a new custom Learning Module, but rather (see the sketch after this list):

  1. Write a custom Sensor Module that transforms network traffic data into CMP format.
  2. Define what “location” means in your context (e.g. maybe temporal or topological position in the network? Apologies as I’m not familiar with network security).
  3. Define what “features” mean in your context (e.g. packets or other features from eBPF).
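
As a loose illustration of step 1, a custom Sensor Module might look something like the sketch below. The class and message layout are simplified stand-ins for illustration, not the exact CMP interfaces:

```python
class NetworkSensorModule:
    """Illustrative sensor module that turns session records into
    CMP-style 'features at location' messages."""

    def observe(self, session: dict) -> dict:
        return {
            # "Location": here, a topological position keyed by endpoints.
            "location": (session["src_ip"], session["dst_ip"]),
            # "Features": whatever the eBPF probe exposes about the session.
            "features": {
                "protocol": session["protocol"],
                "dst_port": session["dst_port"],
                "bytes": session["bytes"],
            },
        }
```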

Then, an existing Learning Module (e.g. EvidenceGraphLM) can work on this abstracted CMP by building models of what “normal” patterns are, detecting anomalies when observations don’t match learned patterns, and updating evidence continuously as new data arrives.
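
In spirit (heavily simplified, and not the actual EvidenceGraphLM code), the update loop looks something like this:

```python
def update_evidence(evidence: dict, observation: dict, models: dict) -> dict:
    """Toy evidence accumulation: each learned 'object' model scores how well
    the new observation matches, and its evidence rises or falls accordingly."""
    for name, matches in models.items():  # models: name -> predicate function
        evidence[name] += 1.0 if matches(observation) else -0.5
    return evidence

# An observation whose features match no learned model ends up with uniformly
# low evidence -- which is the hook for flagging an anomaly.
```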

Regarding your point about the EM spectrum - the Learning Module wouldn’t need modification for different parts of the EM spectrum, as the Sensor Module would handle the different sensory inputs while the LM works on the “feature at location” representation. In other words, Learning Modules are modality-agnostic. Happy to clarify this more if it’s not clear.

That said, I think the biggest challenge in this case would be defining meaningful “locations” and the “displacements” between those locations. As we are in the early stages, creating a specific Learning Module would be immensely helpful, not just for network security but also in informing future generic LMs.

Hope this helps! Cheers. - Hojae


@HumbleTraveller thanks for the reply. I am using the network security app as a thought experiment; I know how to build the sensor, as do others (see Cilium and other efforts). What I am really trying to get to, as you mention, is how to build a solution using true human-like AI. In a Security Operations Center (SOC), humans have the job of sorting through large amounts of information to determine the threats to a network. Currently there are many techniques to sort the data, machine learning being one, and I assume LLM/LRMs are being investigated.

What the sensors are processing is just a set of numbers - different from pixels in the vision use case, but still just numbers. So my core question (I think) is how to generalize learning models such that we do not have to code new learning modules for every different set of data. I think TBT is trying to create a model of the human brain. The human brain is very good at interpreting different sets of numbers - we are kinda slow at processing large sets of numbers, though one could argue we are very fast at vision processing, which is just a bunch of pixels.

So yes, how would a learning model be designed to disambiguate ‘normal’ vs ‘anomalous’ patterns (quoting @HumbleTraveller above) in a generic manner, so we are not creating new “Monty” implementations for every use case?

Ah, so you’re looking more into the agential side of Monty applications, more so than some bespoke app. I understand. With reference to your SOC example, I would ask, “how do humans do this?”

If we can solve that problem then we’ll have what you’re looking for here. This is ultimately what the core Monty team is striving towards, albeit it is still clearly ‘early days.’

As for my own approach to getting there, I have some thoughts, but they’re only just that, opinions. They’re not answers. You can see some of my (very) early brainstorming here: Modelling goal-defined behavior through model-based policy recursion

@hlee thanks for the informative reply. So if I understand it correctly, I can create a new sensor that collects a new set of features, which I pass into build_model. features is a dict, so for every session I would create a dict to which I would add the various network parameters as I build the model from the sensor, e.g. srcIP, dstIP, srcPort, dstPort, protocol, then grow it with message length, layer 7 information, etc. Every packet received or sent would be a step in the model. I would think about motion as: once I know srcIP or dstIP, I would move to a sensor at the other end of the conversation. Does this make sense?
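
In other words, something along these lines (the field names are just my guesses):

```python
def session_features(pkt: dict) -> dict:
    """Per-session feature dict, grown as packets arrive (illustrative only)."""
    return {
        "srcIP": pkt["src_ip"],
        "dstIP": pkt["dst_ip"],
        "srcPort": pkt["src_port"],
        "dstPort": pkt["dst_port"],
        "protocol": pkt["protocol"],
        # Grown later with message lengths, layer-7 metadata, etc.
        "msg_len": pkt.get("length"),
    }
```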

Regards

John

@HumbleTraveller good question about how humans do it. The major issue is the sheer volume of data that is exposed. To solve this, vendors have been building Security Information and Event Management (SIEM) systems, and some are now adding “AI”, which is typically machine learning trained on large amounts of existing data that needs to be continually retrained and have new ML systems added. All this is directed by humans, who build the systems and design the ML to look for anomalies or patterns of attacks. Therefore, as @hlee suggested, if I added a set of network “features”, Monty could learn the behavior; the question then is how to teach Monty to recognize “bad actors” in the system, preferably without resorting to training. Any thoughts would be welcome.

Regards

John

hmmmmm. Give me some time and I’ll brainstorm up an approach or two (I do some work in the network security space). I’ll revisit this post later tonight with a suggestion!

Edit: Alright, I’m back.

To start, we should address the data volume concern you raise. You are correct in bringing it up; however, the flaw here isn’t in the SOC team’s methodologies or strategies, but rather in their processing speed. If you trained a Monty system on your SOC team’s functioning, you could probably capture their behaviors and work strategies while executing those strategies at the speed of silicon. You could even refer to this Monty-assisted approach as an “Augmented Security Operations Center” (A-SOC), or something like that.

We still have to contend with your preference of “without resorting to training”; however, I’m going to suggest you don’t actually want this. Ask yourself, would you ever hire an employee and then opt not to train them? Of course not. A fully agential Monty system should be no different, albeit its training regime will likely be both denser and more expedient.

As for how I might approach it…

I’d honestly take a red-team approach to it. Virtualize the pen-testing procedure: I would create a virtualized representation of your networking environment, then deploy Monty against it in an offensive role, documenting breaches. This will likely get you your low-hanging fruit. I’ve been working on a security-focused guide for a bit; you can find some rudimentary offensive network strategies on pages 101 - 109. (P.S. I was originally going for a Cyberpunk aesthetic when writing the doc, so please try to ignore the nerdy veneer :stuck_out_tongue: )

But anyway, taking a red-team approach will do a few things for you: (1) you’ll identify vulnerabilities, obviously; (2) you can have a second Monty system (a blue-team Monty) monitoring net traffic at the time of the pen test. Once the test is concluded, you can have that blue-team Monty cross-correlate network activity patterns against the test results. This will help it identify the “features” of known bad “behaviors” on the network. You can then carry this knowledge over to the monitoring of your real-world network. If blue-team Monty notices similar behavioral trends on the network, it can notify the rest of your A-SOC team.
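
As a crude sketch of that cross-correlation step (the flow and window structures below are purely hypothetical):

```python
def label_flows(flows: list, pentest_windows: list) -> list:
    """Mark flows that overlap a red-team activity window as 'bad', so the
    blue-team learner can study their features (toy sketch)."""
    for flow in flows:
        in_window = any(
            start <= flow["timestamp"] <= end for start, end in pentest_windows
        )
        flow["label"] = "bad" if in_window else "unknown"
    return flows
```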

@HumbleTraveller thanks for all the work you are putting into this; also, your book is very good! I think what I am struggling with is supervised learning versus unsupervised learning. The red/blue team you suggest will work, and it is very similar to what current SIEM systems do with ML.

I would ask whether we are getting into the debate going on in LLM/LRM systems around reinforcement learning versus reinforcement learning with human feedback. I was hoping that TBT was leading us to a model where unsupervised learning would work - but not being a neuroscientist, I might be a little overly optimistic :slight_smile:

I was thinking that the system would observe the network patterns - for example, srcIP A always talks to dstIP B, C, D over port X, and then it starts to talk to a dstIP over port Y and sends a lot more data than previously. There are a lot of these simple heuristics that are looked for (unfortunately, usually after the attack has succeeded :frowning: ). Fortunately, most sophisticated attacks have a multistep approach, so if the anomalous behavior is detected early enough it can be stopped, i.e. the Cyber Kill Chain.
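
A toy version of that heuristic, just to make the idea concrete (the threshold is arbitrary):

```python
def is_anomalous(baseline: dict, src: str, dst: str, port: int, nbytes: int) -> bool:
    """Flag a flow if src talks to a new dst/port pair, or sends far more
    data than its historical average for that pair (illustrative heuristic)."""
    history = baseline.get(src, {})      # (dst, port) -> average bytes seen
    if (dst, port) not in history:
        return True                      # new peer or new port for this source
    avg_bytes = history[(dst, port)]
    return nbytes > 10 * avg_bytes       # arbitrary volume threshold
```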

Sorry for the ramble; I am kinda thinking out loud… Perhaps the question for @hlee is: if I built a network sensor, could the learning model detect anomalous patterns - kinda like looking at thousands of mugs and finding those that are chipped or cracked?

For some use cases, anything Monty doesn’t recognize could be considered anomalous.

One approach would be to learn only the normal operational traffic (assuming you have traffic without a bad actor present :smiley: ). This way, Monty learns all the “objects” considered “normal operations.” Monty will continuously try to recognize “objects,” so if it comes across something whose features don’t match any known object, that means it never saw it during “normal operations.” Hence, it could be considered anomalous.

The trick will be how you represent the “normal operations” objects, as with this approach you’ll want adversarial activity not to match those “objects.”
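
Schematically (this is not real Monty code, just the bare idea):

```python
def classify(features: set, known_objects: list, threshold: float = 0.8) -> str:
    """If the observed features overlap strongly with any object learned
    during 'normal operations', call it normal; otherwise flag it."""
    for obj in known_objects:            # each obj is a set of features
        overlap = len(features & obj) / max(len(obj), 1)
        if overlap >= threshold:
            return "normal"
    return "anomalous"
```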


@tslominski thanks for the thoughts. Assuming that nothing is 100% accurate - for instance, the case where the first flow captured is a bad actor - the system would only flag the “bad flow” after it had learned what good flows were. I think this is just real life. A human would have the same issue; we need a set of data to work from.

I think the core question would be how we replicate the human understanding/knowledge of normal versus abnormal. Perhaps there is the concept of a goal - find abnormal patterns - would this be part of the learning model? If the current learning model needed to find cracked or chipped mugs, what changes would be needed in Monty?

supervised learning versus un-supervised learning

I feel like what we’re really trying to ask here is “discrete learning” vs “continuous.” I suspect the latter will end up being the correct path forward, and it’s what I believe the TBP team is engineering Monty to become. Also, for what it’s worth, Monty learns unsupervised. It has the ability to learn via supervised training, but that’s just for testing purposes. It’s primarily meant to be run in an unsupervised manner.

looking at thousands of mugs and finding those that are chipped or cracked?

In order to do this the system would first need to know what a “proper” mug looks like.

The way I see it, you really only have two approaches to your network security thought experiment.

Option 1: you can observe known good network states (“proper mugs”). This is @tslominski’s suggestion. It is likely the most straightforward approach and also probably the safest. If we were to view it in cyber security terms, we might liken it to a form of ‘whitelisting.’

Option 2: my option. We damage the mugs ourselves. In this approach we observe known-bad states by intentionally creating those bad states ourselves. Think of this as the ‘blacklisting’ approach. This approach is messy and expensive. However, it’s also more likely to expose unique attack vectors you otherwise may have missed.

Chances are, however, that the optimal solution to our thought experiment lies somewhere between options 1 and 2. It’s kind of like a biological system in this way, where ideal learning is achieved by finding a balance between excitation and inhibition. Does this make sense?
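
In toy form, the blend might look like this (all names and thresholds made up):

```python
def score_flow(features: set, normal_objects: list, bad_signatures: list) -> str:
    """Option 2 (known-bad signatures) checked first, then option 1
    (known-good states); everything else is merely 'suspicious'."""
    if any(sig <= features for sig in bad_signatures):   # subset match
        return "known-bad"
    if any(len(features & obj) / max(len(obj), 1) >= 0.8 for obj in normal_objects):
        return "known-good"
    return "suspicious"
```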

Not sure if my explanation is really helping you get to where you’re wanting to go.

@HumbleTraveller sorry for the silence; I was thinking and trying to wrap my head around more of the documentation and code base. I am thinking that what I should be looking at is the policy module, as this is where (I think) Monty makes decisions on how to do a deeper examination of an object under investigation, and then how it compares with existing objects in the database. The current policy model is very tied to vision, and I would need to write one focused on network data.

The question I would have for @hlee and @tslominski on the TBT team is: at what point am I just writing a completely new application following the TBT pattern/model - and is that what you are thinking about for extensions to Monty?

Hi @Doonhammer,

If you can encode/map your application onto the 3D space that works with the current learning modules, then the existing policies should work by analogy. For example, the existing model-based policy (hypothesis-testing GoalStateGenerator) should work as is - it just compares two similar graphs transformed by their most likely rotations.

Suppose your application only partially encodes/maps onto the 3D space for the current learning modules. In that case, you may need a new policy to compensate for the nuance of the specific encoding/mapping.

Similarly, suppose the encoding/mapping is very different and abstract. In that case, you’ll likely need a new policy, perhaps something more appropriate for network heuristics, e.g., a breadth-first vs. depth-first navigation policy.
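
For example, a breadth-first “movement” over a network graph could look like the generic sketch below (this is not an actual Monty policy class, just the navigation idea):

```python
from collections import deque

def bfs_policy(topology: dict, start: str):
    """Yield nodes breadth-first: survey nearby hosts before drilling deep.
    (Swap the deque for a stack to get the depth-first variant.)"""
    seen, frontier = {start}, deque([start])
    while frontier:
        node = frontier.popleft()
        yield node                       # the next place to point a sensor
        for neighbor in topology.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append(neighbor)
```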

Our eventual goal is for you to be able to bring your own learning modules and policies into the Monty framework.

Hi @tslominski thanks for the reply. So yes, looking at the code, I think I need to create my own policy and sensor modules. From what I can see, the initial implementation of TBT is designed for 3D sensors - vision, touch, sound - so I am probably pushing the envelope somewhat.

I think I need to sit down and write some code to do a rough implementation of what I am thinking to see what is possible.

My core objective is to build a system that can learn in real time, adapt to its environment without large training sets, and have a small footprint. The small footprint and real-time operation are key goals, as machine learning/LLM approaches, while they work, are cumbersome to train and deploy - something more like a human brain seems a better solution :slight_smile:


I don’t appear to have said much about it in this forum, but I’m quite interested in a closely related notion: harvesting and analyzing static and dynamic info from BEAM-based (e.g., Elixir, Erlang) systems. These have many characteristics in common with large-scale computing networks (e.g., code files, communication protocols, data formats, distributed hardware nodes, messages). And, although many of them appear to ignore security issues (possibly to their peril :-/), the scales involved can be similar.

FWIW, my own thinking has centered around using graph databases (e.g., Neo4j, ArangoDB) to store and access the collected information. I’ve also wondered about how well LLMs could ingest and use graph-based data.

Anyway, I’d like to suggest that you consider these options, as well as opening up your charter to include issues other than security. And, of course, try to use open and generic technologies and standards (e.g., from the Cloud Native Computing Foundation) in your work.

Using Monty for some analysis doesn’t seem at all silly to me; indeed, having a distributed version of Monty introspect itself seems like a Really Good Idea (:-)…


I may have an interest in a network analysis use case too. Not sure if we’d be interested in posting something under the monty-code/projects category, but it wouldn’t be such a bad thing for the three of us to have a central place where we could bounce implementation ideas off one another. Just a thought.

Agreed, but this thread seems like a good starting point for the moment. In addition, we can always link to Github resources (e.g., Gists, Pages, Repos, Wikis) and other forums.

Speaking of which, here’s a post I made recently in the Elixir Forum. It provides a bit more context, motivation, and detail than I’ve gone into above:

I’d like to see better tooling for exploring and understanding both the static and dynamic structure of complex Elixir (et al) systems. People talk about how Elixir systems can have millions of processes, but generally this involves a lot of replication. So, in Rich Hickey’s terms (e.g., Simple Made Easy), it doesn’t complect things very much.

The real challenge (IMHO) is making it easier to comprehend the relationships among thousands of data structures, files, functions, process and message types, etc. Observer hints at this when it draws diagrams of supervision trees, but that is only one type of connectivity. Each instance of message transmission or process spawning has the potential to create new relationships among sets of entities (e.g., CPUs, nodes, processes).

I’ve imagined setting up a system that could harvest these relationships from static (e.g., code) files and dynamic (e.g., trace) data, then record them in (say) a graph database such as ArangoDB or Neo4j. This would make them available to generate diagrams, sets of interlinked web pages, etc. There is also the possibility of using an LLM to examine and explicate this information. And a pony…

– What are the remaining gaps in the elixir ecosystem?
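
For illustration, recording one such harvested relationship with the official Neo4j Python driver might look like this (the node labels and properties are just a guessed schema, not anything I’ve built):

```python
from neo4j import GraphDatabase  # official Neo4j Python driver

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def record_message(src_pid: str, dst_pid: str, msg_type: str) -> None:
    """Record one harvested 'process A sent a message to process B' edge."""
    with driver.session() as session:
        session.run(
            "MERGE (a:Process {pid: $src}) "
            "MERGE (b:Process {pid: $dst}) "
            "MERGE (a)-[:SENT {type: $msg_type}]->(b)",
            src=src_pid, dst=dst_pid, msg_type=msg_type,
        )
```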

@HumbleTraveller and @Rich_Morin happy to start something and set up a GitHub repo to get it going. The concern I have is whether this type of learning module is on target for TBT/Monty. This is probably a question for the core TBT team - @tslominski, can you give any guidance? I am not worried about putting the work in; I just want to make sure the goal is somewhat possible :slight_smile:

On a more practical note, my Python is a little rusty, and looking at the Monty code base, I think it might require a lot of surgery to make it extensible for sensor/policy and learning modules - but this could be a good learning experience towards making Monty extensible.

I honestly think the graph matching module will work fine. As for the evidence-based extension, let me brainstorm some ideas. I might be able to whip up a couple of different approaches, then we can touch base and figure out where to go from there. Would that work for you?

@Rich_Morin, let me educate myself on Elixir systems a bit more. I want to take a closer look at your post, but don’t know enough about it to make any sort of meaningful comment yet.