2023/01 - A Comprehensive Overview of Monty and the Evidence-Based Learning Module

mthiboust · November 25, 2024, 10:16pm

vclay:

Good questions! One of the requirements for any sensor that connects to Monty is that it can infer its pose (location & orientation) in space. The pose is extracted with the sensor module and then sent to the learning module using the CMP. For many sensors, like touch, lidar, echolocation, etc., figuring out the location of the sensor patch in space is pretty straightforward. But for vision it is not that easy, and it requires depth perception. We know that the brain uses many different cues (besides stereopsis) to infer depth from the inputs that hit the retina. Where exactly this happens is unclear. One interesting thing about columns in primate V1 is that they have some extra sublayers in L4. One might speculate that these could be used to extract depth information. But this is just speculation.

Thanks for the clarification, I understand your reasoning better now. It seems that motion parallax would be a good candidate to compute the depth information inside a visual LM without relying on the other LMs. I guess this capability will come later with the integration of temporal dynamics in your algorithm, but I understand that you can already use a temporary shortcut by directly feeding the depth information as a first step.
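To make the motion-parallax idea concrete, here is a minimal sketch of how depth could be recovered from the sensor's own movement. It assumes a simple pinhole camera, a static point, and a purely lateral sensor translation whose size is known. The function name and parameters are hypothetical and are not part of Monty.

```python
def depth_from_parallax(focal_length_px, sensor_shift_m, image_shift_px):
    """Estimate the depth (in meters) of a static point from motion parallax.

    Assumptions (illustrative only): pinhole camera, purely lateral sensor
    translation of known size, and a point that does not move in the world.
    Under those assumptions: depth = focal_length * sensor_shift / image_shift.
    """
    if image_shift_px == 0:
        # No apparent motion in the image: the point is effectively at infinity.
        return float("inf")
    return focal_length_px * sensor_shift_m / abs(image_shift_px)


# Nearby points shift more in the image than distant ones for the same movement.
near = depth_from_parallax(focal_length_px=500.0, sensor_shift_m=0.02, image_shift_px=10.0)
far = depth_from_parallax(focal_length_px=500.0, sensor_shift_m=0.02, image_shift_px=2.0)
```

The point of the sketch is that the only extra input needed beyond the image is the sensor's own displacement, which a sensor module is already expected to know.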

As for your speculation about where this computation happens in the brain, I would not target the extra sublayers in V1 L4 if we want to extend this capability to mice, which don’t have this layer subdivision but still have many depth-selective neurons in V1 L2/3 (a recent reference that you probably already know about: A depth map of visual space in the primary visual cortex). Said differently, the depth information could be a product of the canonical LM algorithm itself. That being said, maybe the extra sublayers in L4 can enhance this process, but it would be nicer if they were not strictly required.

vclay:

Yes, we expect any LM to be able to recognize objects invariant of rotation, location, and scale. We definitely don’t want to have to rely on hierarchy for this (although LMs at different levels in the hierarchy may have different spatial resolutions and limits to the size of objects they can model). The current testing for rotation hypotheses could be accomplished by the L6b feedback projection to the thalamus communicating the rotation hypothesis. The thalamus uses this to rotate the incoming sensory information into the LM’s object’s reference frame. The LM can then directly compare whether it is consistent with its model of the object. For scale we think that the brain may be using theta frequencies in the grid cell mechanism (we have a whole meeting recording on this, I’ll see about pulling it forward in the release schedule) but we are not sure yet how to translate this to Monty. If you have any ideas on how to achieve scale invariance in Monty I’d love to talk more about that.

Personally, I feel that recognizing complex objects (like the shape of a coffee cup) in a rotation- & scale-invariant way at the level of a single LM is a strong bet. On my side, I have no clue how a cortical column could achieve this by itself (in fact, this is the main reason why I am so keen on relying on inter-LM interactions with “less capable” LMs to recognize rotation- & scale-invariant cup-like objects). Still, I am curious to see where this path leads.
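For concreteness, the rotation-hypothesis test described in the quote above (rotate the incoming observations into the object's reference frame, then compare them with the stored model) can be sketched as follows. This is a 2D toy simplification with a small discrete set of hypotheses. All names are hypothetical and it is not Monty's actual implementation.

```python
import math


def rotate(point, angle_deg):
    """Rotate a 2D point counter-clockwise around the origin."""
    a = math.radians(angle_deg)
    x, y = point
    return (x * math.cos(a) - y * math.sin(a), x * math.sin(a) + y * math.cos(a))


def hypothesis_error(model_points, observed_points, angle_deg):
    """Undo the hypothesized rotation, then sum nearest-neighbor distances to the model."""
    total = 0.0
    for obs in observed_points:
        in_object_frame = rotate(obs, -angle_deg)
        total += min(math.dist(in_object_frame, m) for m in model_points)
    return total


def best_rotation(model_points, observed_points, hypotheses_deg):
    """Return the rotation hypothesis most consistent with the stored model."""
    return min(
        hypotheses_deg,
        key=lambda h: hypothesis_error(model_points, observed_points, h),
    )


# Toy asymmetric "object" stored in its own reference frame.
model = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (0.0, 1.0)]
# The same object sensed after being rotated by 90 degrees.
observed = [rotate(p, 90) for p in model]
estimate = best_rotation(model, observed, hypotheses_deg=[0, 90, 180, 270])
```

Scale is deliberately left out of this sketch, since the quote itself says it is not yet clear how to handle it in Monty.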

You mention some hypotheses about how it could be implemented in the brain. I currently have different speculations for the L6b modulatory feedbacks and a grid-cell-like phase coding mechanism in the neocortex:

Thalamo-cortical projections to L4 convey information about upcoming motor commands, either explicit (like “contract a given neck muscle”) or implicit, when encoded as desired goals or expected outcomes (like “turn the head 30° left”). The thalamus gets those signals from other cortical areas via their L5 PT projections, from the cerebellum, or from other subcortical motor nuclei like the superior colliculus. For first-order thalamic nuclei, these projections also convey sensory stimuli. The L6 cortico-thalamic feedback of a cortical column dynamically adapts the gain of those motor-command-related thalamo-cortical projections in order to keep its reference frame in sync with the upcoming changes. When the reference frame is in sync with the upcoming changes (the prediction is accurate), the thalamo-cortical activity is gated; if there is a difference, only the delta is transmitted to the cortex.
The grid cell phase precession phenomenon (where grid cells fire at progressively earlier phases of the local theta rhythm as an animal moves through the spatial field of the grid cell) is something that could be at play in neocortical columns as well: imagine if an LM sequentially outputs 4 object IDs in each cycle (4 gamma periods inside an alpha period for the cortex, instead of 6-7 gamma periods inside a theta period for the medial entorhinal cortex, where grid cells are). Those 4 sequential object IDs could represent dynamic trajectories of the past, present and future of the matched object. There is already some evidence of this in the PFC, but I haven’t found any such evidence for other cortical areas yet (wrong speculation, or maybe not fully tested yet by experimentalists?).
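The gating idea in the first speculation above can be written as a tiny sketch. It reads “gated” as “suppressed”, which is my interpretation, and represents the predicted and actual changes as plain tuples of numbers. The function name and the tolerance parameter are hypothetical.

```python
def gate_thalamic_signal(predicted_change, actual_change, tolerance=1e-6):
    """Relay only the mismatch between the predicted and the actual change.

    Returns None when the prediction is accurate (the signal is suppressed),
    otherwise the element-wise delta that would be passed on to the cortex.
    """
    delta = tuple(a - p for a, p in zip(actual_change, predicted_change))
    if all(abs(d) <= tolerance for d in delta):
        return None  # Prediction matches: nothing is relayed.
    return delta


# Predicted "turn the head 30 degrees left", and that is what happened.
matched = gate_thalamic_signal((30.0, 0.0), (30.0, 0.0))
# The head only turned 25 degrees: the 5-degree shortfall is relayed.
mismatch = gate_thalamic_signal((30.0, 0.0), (25.0, 0.0))
```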

I am not sure I understand what you mean by a grid-cell-like mechanism in L6. For me, the analogy between the mEC and the neocortex is as follows: grid cells (primarily located in L2 of mEC) directly represent object IDs, similar to how neurons in L2/3 of an LM represent object IDs. The “object IDs” of mEC represent allocentric locations, whereas the “object IDs” of other cortical areas represent objects. Maybe I should use the term “concept ID” instead of “object ID” to make it clearer. L2/3 concept IDs related to allocentric, egocentric, and arm-centric locations (computed in the temporal and parietal lobes) are then used by other LMs in their deep layers as cues for their reference frame computations. I know that mEC is an evolutionarily ancient cortex that differs from the neocortex, but I hypothesize that the main framework is still at play here (if you don’t agree with this, where would you put the limit in this “cortical continuum” from mesocortex to neocortex?). Also, I am keen on seeing an analogy between the potential “phase pinwheels” of grid cells in mEC and the orientation pinwheels in primate V1, but it is very speculative and I am going off-topic here.

Generally speaking, I am very interested in your understanding and vision of how the biological cortex works at a macro & micro level and the biological evidence that supports it. I know it is a lot to ask (obviously!), but I hope that we can have some insightful discussions about it here. Thanks again for doing this research in the open!
