Modelling goal-defined behavior through model-based policy recursion

Thank you @HumbleTraveller. While I haven’t had a chance to look through your entire proposal, I thought it would be helpful to paraphrase the approach we are currently taking. In particular, we are seeking a universal algorithm that is reused by cortical columns across the brain, from primary sensory cortex all the way up to and including prefrontal cortex. One familiar aspect of this is that we believe cortical columns throughout the brain are each modeling entire “objects”, that is, discrete, structured entities composed of other objects. These could be anything from the physical model of a coffee mug to the conceptual model of how you plan a day.
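
To make the composition idea a little more concrete, here is a minimal sketch in Python (all class and object names are hypothetical illustrations, not the project's actual API) of an object as a discrete, structured entity that can contain other objects, with the same representation reused unchanged from physical things to abstract plans:

```python
from dataclasses import dataclass, field


@dataclass
class ObjectModel:
    """A discrete, structured entity, possibly composed of other objects."""
    name: str
    parts: list["ObjectModel"] = field(default_factory=list)


# A physical object composed of sub-objects...
mug = ObjectModel("coffee_mug", [ObjectModel("handle"), ObjectModel("cylinder")])

# ...and a conceptual object built with exactly the same structure,
# illustrating one representational scheme spanning levels of abstraction.
day_plan = ObjectModel(
    "day_plan", [ObjectModel("make_coffee"), ObjectModel("commute")]
)
```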

As part of this, we believe that every learning module contains model-based policies and is able to generate goal-states based on the objects (i.e. models) it knows about. As such, we don’t think there will be a single part of the brain responsible for model-based policies (like PFC or motor cortex); rather, these will be found throughout the brain. This is why a Goal-State Generator (GSG) exists within each Learning Module. The GSG may map onto layer 5 in a cortical column, although that is speculative. This is also an important aspect of how a complex task like making coffee can be broken down by learning modules that know about different objects (day planning, kitchen layout, coffee machines, power buttons, etc.), with each learning module recruited sequentially when necessary.
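
As a rough sketch of that decomposition (again, the classes and goal names here are hypothetical stand-ins, not the project's actual implementation), each module carries its own goal-state generation, and a high-level goal is expanded recursively by whichever modules model the relevant objects:

```python
class LearningModule:
    """Each module models one object and has its own goal-state generator."""

    def __init__(self, object_name, subgoals=None):
        self.object_name = object_name
        self.subgoals = subgoals or []  # goal-states this module can propose

    def generate_goal_states(self, goal):
        # The GSG proposes goal-states derived from the module's own model;
        # here we simply return stored subgoals when the goal matches.
        return self.subgoals if goal == self.object_name else []


# Modules modeling different objects, from abstract planning to motor detail.
modules = [
    LearningModule("make_coffee", ["reach_kitchen", "operate_machine"]),
    LearningModule("reach_kitchen", ["walk_to_kitchen"]),
    LearningModule("operate_machine", ["press_power_button"]),
]


def decompose(goal):
    """Sequentially recruit whichever module can expand the current goal."""
    for module in modules:
        subgoals = module.generate_goal_states(goal)
        if subgoals:
            return [g for sub in subgoals for g in decompose(sub)]
    return [goal]  # primitive goal-state: no module expands it further


print(decompose("make_coffee"))
# ['walk_to_kitchen', 'press_power_button']
```

The point of the sketch is only that no single module owns the whole policy: each one contributes goal-states from its own model, and the recursion bottoms out in goal-states simple enough to act on directly.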

I recommend checking out our recently posted videos on compositional policies, if you haven’t already seen them:
- Part 1
- Part 2
- Part 3

Hope that makes sense and thanks for your ongoing interest in the Thousand Brains Project!