Using Monty to explore graphs?

It strikes me that Monty could be tasked with exploring graphs. For instance, what if it were asked to find patterns in a highly-linked web site such as Wikipedia? What interesting insights and/or issues might it discover?

Getting to a (possibly) more practical application, I wonder whether Monty could be used to model the connectivity and behavior of complex software systems. For example, a web site using the Elixir language and the Phoenix library might well have thousands of functions, modules, and nodes, millions of processes, etc. This sort of complexity is far beyond my capacity to grok, but perhaps Monty could help.

Specifically, I could imagine Monty crawling through the graph of entities and relationships, looking for issues such as messages which are never matched by the recipient processes.

-r

3 Likes

Interesting thought! We haven’t applied Monty to more abstract spaces like this but the general idea is that as long as you can define actions/movement and observations it can move through virtual/conceptual space and learn from it, just like it can move and learn in physical space :slight_smile:
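To sketch what "define actions/movement and observations" could mean for a graph (this is just an illustrative toy, not Monty's actual interface; all class and method names are made up):

```python
# Hypothetical sketch: an abstract "space" only needs to answer two questions --
# what can I do from here (actions), and what do I see here (observation).
class GraphSpace:
    def __init__(self, edges):
        # edges: dict mapping each node to a list of neighboring nodes
        self.edges = edges
        self.current = next(iter(edges))

    def actions(self):
        """The 'movements' available from the current node."""
        return self.edges.get(self.current, [])

    def step(self, action):
        """Move along one edge and return the new observation."""
        assert action in self.actions()
        self.current = action
        return self.observe()

    def observe(self):
        """A toy observation: just the node's label."""
        return {"label": self.current}


space = GraphSpace({"A": ["B"], "B": ["A", "C"], "C": ["A"]})
print(space.step("B"))   # {'label': 'B'}
```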

1 Like

As I understand it, a pose is a triple containing three angles in a Cartesian coordinate system. So, for example, it doesn’t encode distance. I’m not sure how one might use this to encode relationships in a (probably directed) graph. Might you have any ideas about this?

To motivate the discussion, let’s assume that we are exploring entities and relationships in a running Elixir system and have edges such as:

  • which functions are defined in which modules
  • which functions are used in which processes
  • which processes send messages to which processes
  • which modules are defined in which libraries
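To make that concrete, here is a rough Python sketch (the module, function, and PID names are invented) of such a typed edge list, grouped so an explorer could ask "where can I move from here?":

```python
from collections import defaultdict

# Hypothetical edge list for a running Elixir system, expressed as
# (source, relation, target) triples. All entities are made up.
edges = [
    ("MyApp.Accounts", "defines", "MyApp.Accounts.get_user/1"),
    ("MyApp.Accounts.get_user/1", "used_in", "#PID<0.245.0>"),
    ("#PID<0.245.0>", "sends_to", "#PID<0.312.0>"),
    ("phoenix", "defines_module", "Phoenix.Endpoint"),
]

# Group outgoing edges by node so a crawler can enumerate its next moves.
outgoing = defaultdict(list)
for src, rel, dst in edges:
    outgoing[src].append((rel, dst))

print(outgoing["MyApp.Accounts"])
# [('defines', 'MyApp.Accounts.get_user/1')]
```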

-r

Pose, as we use the term, contains location and orientation info (Glossary). If you have two poses (for example, by moving), you can then calculate the displacement between the two (which can tell you a distance). Does that make sense?
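For instance, a minimal numerical sketch (not Monty's internal representation) of getting a displacement and a distance from two poses:

```python
import numpy as np

# Two poses: a 3D location plus an orientation (here, a rotation matrix).
loc_a = np.array([0.0, 0.0, 0.0])
loc_b = np.array([0.03, 0.01, 0.0])   # location after a small movement

# Displacement between the two locations, and its length (a distance).
displacement = loc_b - loc_a
distance = np.linalg.norm(displacement)
print(displacement, distance)         # [0.03 0.01 0.  ] ~0.0316

# Orientation part: the relative rotation between the two poses.
rot_a = np.eye(3)
rot_b = np.array([[0.0, -1.0, 0.0],
                  [1.0,  0.0, 0.0],
                  [0.0,  0.0, 1.0]])  # 90 degrees about z
relative_rotation = rot_b @ rot_a.T
```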

2 Likes

I have also thought about a similar idea: a sensor that can learn to use a shell, which would require it to understand, in a very simple way, how software works.

There would probably be two agents:

  • A Visual Agent that processes a shell screenshot at every step, using, for example, 100 visual sensors.
  • A Touch Agent with one or more sensors that outputs motor commands to a keyboard:
    • Each key on the keyboard has to have a unique curvature and location.
    • When a key is sensed, the corresponding keystroke will appear in the shell program, and the Visual Agent can recognize it.
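A toy sketch of that keyboard idea (all values invented; nothing here is real Monty code) could assign each key a location and a unique curvature-like feature that the touch agent senses:

```python
# Invented toy model: each key gets a 2D location on the keyboard and a
# unique scalar "curvature" feature, so sensing a key identifies it.
keyboard = {
    "q": {"location": (0.0, 0.0), "curvature": 0.10},
    "w": {"location": (1.0, 0.0), "curvature": 0.12},
    "e": {"location": (2.0, 0.0), "curvature": 0.14},
}

def press(key, screen):
    """Pressing a key makes the keystroke appear on the simulated screen,
    where the visual agent could then observe it."""
    screen.append(key)
    return keyboard[key]  # what the touch sensor 'feels'

screen = []
felt = press("w", screen)
print(felt, screen)  # {'location': (1.0, 0.0), 'curvature': 0.12} ['w']
```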

However, I am unsure whether the system is capable of learning the relationships necessary to effectively use or manipulate the world. It would be great to have your thoughts.

Best regards. And @Rich_Morin, the graph-exploration idea is very interesting!

Thanks for the clarification. Unfortunately, it seems like the CMP is pretty much hard-coded to work in 3D space. So, for example, when a finger moves along the surface of a given cup, it will always get consistent feedback about angles, positions, etc. However, I don’t see how (say) a web crawler wandering around Wikipedia could map the URLs it finds into anything useful. Am I missing something?

-r

1 Like

What do you think about this, @Rich_Morin:

To map URLs into a 3D space, you could use a hash function to transform each URL into numeric values. These values can then be converted into 3D coordinates, for example by splitting the hash into three parts corresponding to the x, y, and z axes. This lets you create a spatial representation of the web, where each URL occupies a unique position. Normalizing the coordinates would also ensure a meaningful distribution of points.
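For example, a rough sketch of that hashing idea (just one possible encoding, with an arbitrary normalization):

```python
import hashlib

def url_to_xyz(url, scale=1.0):
    """Hash a URL and split the digest into three normalized coordinates."""
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    coords = []
    for i in range(3):
        # Take three 8-byte chunks and normalize each to [0, 1).
        chunk = digest[i * 8:(i + 1) * 8]
        value = int.from_bytes(chunk, "big") / 2**64
        coords.append(value * scale)
    return tuple(coords)

print(url_to_xyz("https://en.wikipedia.org/wiki/Graph_theory"))
```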

I also thought about some other coding and mapping methods to build sensors working in the digital space, but it’s really bending my head :sweat_smile:

There are many possible ways to map graph nodes and edges (e.g., entities and relationships, web pages and URLs) into CMP. However, my concern is that the result would lack the internal consistency that the physical world provides. That is, CMP can encode location and orientation without problems because these characteristics exist in the external world.

When Monty explores the surface of a cup, it can take advantage of the fact that there are (commonly) connected surfaces nearby. The orientations may be different, but there is still a degree of local consistency. So, the exploration can take lots of tiny steps and get tiny (but useful) results. The code can even employ strategies to predict and follow the surface geometry, etc.

Wikipedia URLs, in contrast, have no inherent location or orientation. All they have is connectivity, a label, and some context about nearby text. Hashing this information into CMP fields wouldn’t provide the same sort of consistency. It seems like this would interfere with Monty’s analysis…

-r

This is a very interesting topic that we haven’t explored much yet. Let me make a distinction between the vision and the current implementation:
The general idea is that the CMP should not be restricted to 3D space but that it just defines features at a pose in a common reference frame. We should be able to take any space that we can move through and get movements and relative poses in that space, whether it is 1D, 2D, 3D or ND.
One strange thing about navigating the internet is that it is unclear how path integration works there. I could imagine just modeling one web page, where we have buttons at relative locations to each other; clicking them brings you to a new location on that page, but you also know how to get back to where you came from. There is probably not much sense in modeling rotation in that case, but that should be fine; it just simplifies the problem. However, when modeling the web, space seems much weirder if we define movement as following a link to another page. We may have circular links, but that doesn’t mean there is a way to cut across the circle to take a shortcut, for instance. I’m not sure what kind of space/reference frame this could be embedded in. Definitely something to think about more.
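One way to see the problem (a toy illustration, nothing that is implemented): if we treat each link click as a "movement" with some assigned displacement, the displacements around a circular chain of links generally don’t sum to zero, so there is no consistent Euclidean frame to path-integrate in:

```python
import numpy as np

# Toy illustration: arbitrary displacement vectors assigned to link clicks.
# Around a circular chain A -> B -> C -> A, nothing forces them to cancel.
link_displacements = {
    ("A", "B"): np.array([1.0, 0.0]),
    ("B", "C"): np.array([0.0, 1.0]),
    ("C", "A"): np.array([2.0, -0.5]),   # arbitrary; the web doesn't care
}

loop = [("A", "B"), ("B", "C"), ("C", "A")]
total = sum(link_displacements[edge] for edge in loop)
print(total)  # [3.  0.5] -- not the origin, so the "space" is inconsistent
```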
-Viviane

3 Likes

Just to add to Viviane’s points, this should be similar to other abstract graphs with discrete transitions, such as family trees. Somehow (and that’s the million-dollar question!), we need to learn that the input change that happens when moving from one web page to the next is a movement/displacement in the graph space. Learning this may be similar to learning how to integrate sensory flow, but here the feature changes are transitions in abstract object space (web page A → web page B, etc.).

In terms of path integration, if you’ve learned the space (i.e. which pages link to others), then you should be able to use a model-based policy to follow a novel path through the graph, even if there aren’t shortcuts. You might also be able to infer certain “shortcut”-type interactions, e.g. most pages on a website still link back to the home page, which always brings you back to a consistent node.
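As a toy sketch of that "model-based" part (not Monty code; the page names are made up), planning a novel path through a learned link graph can be as simple as a search over the model:

```python
from collections import deque

# A learned model of which pages link to which (toy data).
links = {
    "Home": ["About", "Docs"],
    "About": ["Home"],
    "Docs": ["Home", "API"],
    "API": ["Docs", "Home"],   # most pages link back to Home
}

def plan_path(model, start, goal):
    """Breadth-first search over the learned link graph: a simple stand-in
    for using a model-based policy to follow a novel path."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in model.get(path[-1], []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None

print(plan_path(links, "About", "API"))  # ['About', 'Home', 'Docs', 'API']
```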

So it feels like it should still be quite possible to embed in Monty space, but we still need to figure out how to learn the abstract “movements” in an unsupervised way.

A recent paper from Abhi Iyer (a previous Numenta employee, now PhD student at MIT) and Ila Fiete is appearing at this year’s NeurIPS, and might be a useful hint for how to learn such abstract movements.

3 Likes

It seems like a very nice paper! Thanks for the pointer

1 Like

The PDF version of this paper is available here

4 Likes

Clearly, many sighted users base reference frames on the 2D locations of, and/or relationships between, items on display screens, web pages, etc. (Just listen to the screaming when layouts or even styles are changed…) More critically for blind users, a screen reader can only rely on page structure.

Web page designers commonly wrestle with these sorts of issues. In general, page structure is defined by HTML, while layout is defined by CSS (and sometimes JavaScript). Adaptive and responsive design also play a part here. Best Practice, AFAIK, is to use logical page structure and Semantic HTML to address these sorts of issues.

Similarly, web site reorganization can change URLs, causing problems for users and/or code that depend on “deep linking”. For that matter, humans commonly use physical placement to encode semantic relationships (“Those things belong over there…”)

Regardless of the way that humans do things, however, I’d be happier if Monty’s reference frames were based on characteristics that are less likely to change. Any other approach leads to brittle systems…

-r

1 Like