Proposal: Engage Sutton, LeCun, and Pearl to Accelerate AGI

After watching the capability roadmap of TBP (https://youtu.be/Iap_sq1_BzE), I sincerely suggest TBP initiate focused discussions with three key pioneers: Richard Sutton, Yann LeCun, and Judea Pearl. While they work in different domains, they all share TBP’s core vision: building intelligent agents that learn through interaction with a structured world.

**Why This Matters**

Their expertise directly addresses the key components needed for AGI, and each offers a piece of the puzzle that complements TBP’s approach:

* **Richard Sutton** is the authority on **scalable learning through trial and error** (Reinforcement Learning). His work provides the framework for agents to learn optimal actions, a perfect fit for TBP’s sensorimotor loops.

* **Yann LeCun** is a leader in building **predictive world models** through self-supervised learning. His focus on learning representations from observation aligns directly with TBP’s goal of creating models of objects and their behaviors.

* **Judea Pearl** established the foundations for **causal reasoning**, allowing agents to understand “why” things happen and to reason about interventions and counterfactuals. This is the critical next step beyond correlation, enabling true understanding and robust planning.

**The Opportunity**

By bringing these three perspectives into conversation with TBP’s object-centric, reference-frame architecture, we can create a powerful synthesis. Their combined knowledge offers a clear path toward agents that can learn, predict, and reason—the trifecta for AGI.

Given their senior status and the fast pace of AI, now is the time to build these connections. A simple outreach to share our progress and invite them to a roundtable could spark invaluable collaboration.

note: I know TBP hosted a talk by Richard Sutton a few years ago, but it would be better to engage with them more deeply.

4 Likes

LeCun already has his hands full with V-JEPA over at Meta. He’s still very much into deep learning, and biological plausibility isn’t his focus.

Pearl’s work is more oriented toward statistics and symbolic AI, which is a long way from where Monty stands right now. He seems more into political activism than research these days, not to mention doing it at a venerable 89 years old!

Sutton is plenty busy with reinforcement learning research alongside John Carmack, on top of his academic duties at UAlberta.

Notably, Sutton wrote the famous bitter lesson article, asserting that the most effective AI approaches are those that leverage computation and general-purpose methods, rather than those built around human representations of knowledge. These passages seem of particular relevance to TBP:

In computer vision, there has been a similar pattern. Early methods conceived of vision as searching for edges, or generalized cylinders, or in terms of [scale-invariant feature transform] features. But today all this is discarded. Modern deep-learning neural networks use only the notions of convolution and certain kinds of invariances, and perform much better.

[…] the actual contents of minds are tremendously, irredeemably complex; we should stop trying to find simple ways to think about the contents of minds, such as simple ways to think about space, objects, multiple agents, or symmetries. All these are part of the arbitrary, intrinsically-complex, outside world. They are not what should be built in, as their complexity is endless; instead we should build in only the meta-methods that can find and capture this arbitrary complexity.

In 2019, Hawkins was asked about Sutton’s bitter lesson on the Lex Fridman Podcast #25 (01:12:43). Long story short, Lex asked if scaling would lead to intelligence; Hawkins said no and that there’s an alternate path, then talked about his early vision of TBP. (I think Lex’s question was a bit “watered down”; he kind of shifted sideways to the emergence of intelligence through scaling, rather than the effectiveness of general methods.)

Over 6 years later, with LLMs and everything, the landscape has changed quite a bit! Sutton appeared on the Dwarkesh Podcast a week ago, clarifying his statements about the bitter lesson:

Full transcript, definitely worth a read: Richard Sutton – Father of RL thinks LLMs are a dead end

Here’s the most relevant part:

Patel: “Why do we need a whole new architecture to begin doing experiential, continual learning? Why can’t we start with LLMs to do that?”

Sutton: “In every case of the bitter lesson, you could start with human knowledge and then do the scalable things. That’s always the case. There’s never any reason why that has to be bad. But in fact, and in practice, it has always turned out to be bad. People get locked into the human knowledge approach, and they psychologically… Now I’m speculating why it is, but this is what has always happened. They get their lunch eaten by the methods that are truly scalable.”

Patel: “Give me a sense of what the scalable method is.”

Sutton: “The scalable method is you learn from experience. You try things, you see what works. No one has to tell you. First of all, you have a goal. Without a goal, there’s no sense of right or wrong or better or worse. Large language models are trying to get by without having a goal or a sense of better or worse. That’s just exactly starting in the wrong place.”

I wonder what Sutton would think of TBP. Judging by his podcast comments, he’d probably see potential in it, contrary to what he wrote in 2019.

In fact, I’d love to hear Sutton and Hawkins discuss their views together. I’m sure it would be very fascinating…


Patel also published a follow-up today, based on feedback from viewers: Some thoughts on the Sutton interview - by Dwarkesh Patel

Some tidbits:

What is the bitter lesson about? It is not saying that you just want to throw as much compute away as possible. The bitter lesson says that you want to come up with techniques which most effectively and scalably leverage compute.

LLMs aren’t capable of learning on-the-job, so we’ll need some new architecture to enable continual learning. And once we have it, we won’t need a special training phase — the agent will just learn on-the-fly, like all humans, and indeed, like all animals.

Models of humans can give you a prior which facilitates learning “true” world models.

Being able to continuously learn from the environment in a high throughput way is obviously necessary for true AGI. And it clearly doesn’t exist with LLMs trained on RLVR.

Even if Sutton’s Platonic ideal doesn’t end up being the path to first AGI, he’s identifying genuine basic gaps which we don’t even notice because they are so pervasive in the current paradigm: lack of continual learning, abysmal sample efficiency, dependence on exhaustible human data.

5 Likes

Thank you for the detailed feedback, AgentRev!

### Blueprint, not methods

The proposal is about a shared architectural blueprint, not about lifting any specific method wholesale from LeCun, Sutton, or Pearl. The common skeleton is: build a reasonably correct world model, set explicit goals, and let learning and planning derive effective actions. That skeleton aligns with TBP’s biologically grounded sensorimotor paradigm, even if individual implementation choices differ.

### Not anti‑biological

It seems implied that LeCun’s deep learning path and Pearl’s causal inference are somehow “anti‑biological,” and hence at odds with TBP. The intent here is the opposite. Biology provides the organizing principles (world modeling, goal-directed learning, continual adaptation), but an AI agent need not replicate every biological mechanism at every layer. When a specific subproblem admits an efficient algorithmic solution, it is reasonable to let a learned system recognize that structure and invoke the right tool.

- Deep/shallow learning as intuition: a learned module can provide fast, heuristic “intuition” to detect when a problem is best solved by an explicit algorithm. That intuition layer need not be extremely deep; even relatively shallow learned controllers (in the spirit of [NEAT](https://nn.cs.utexas.edu/downloads/papers/stanley.ec02.pdf)) can be sufficient when the structure is clear. The point is to keep the sensorimotor learning loop central while allowing a learned policy to trigger specialized solvers when appropriate. Such intuition is inevitable: an F1 champion can explain his overall race strategy, but he cannot articulate the conscious reasoning behind each split-second decision, such as precisely when to turn the wheel or apply the brakes. Those responses become reliable through day-after-day training, without any explicit symbolic explanation.

- Causal inference as explanation and control: Pearl’s framework is not “just statistics.” It encodes directionality, interventions, and counterfactuals, which are central to explainability and deliberate reasoning. In a sensorimotor agent, causal structure helps answer “what if I act this way now?” and “why did this plan work?”, without hardcoding brittle, high-level human ontologies.

### Example: trip planning

Consider trip routing with rich map data. A sensible agent pipeline is:

1. perceive and abstract locations and connections;
2. internally form a graph;
3. select a known optimal algorithm (e.g., Dijkstra) to compute the route;
4. explain the choice and the route via causal reasoning about why this plan was chosen, along with practical details like delays, constraints, and trade-offs.

This does not abandon biological fundamentals; it uses biological principles to learn the right abstractions and meta-choices, and then leverages the most efficient solver once the structure is recognized.

- Sensorimotor + learning: build the world model of places, edges, and costs.

- Learned “intuition”: recognize this as a shortest-path problem and pick the solver.

- Causal explanation: clarify why the chosen algorithm was used to plan the route.
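A minimal Python sketch of the middle steps (internally form a graph, then run a known optimal solver such as Dijkstra). The toy map and place names here are hypothetical, just to make the pipeline concrete:

```python
import heapq

def dijkstra(graph, start, goal):
    """Classic Dijkstra shortest path on a weighted graph given as
    {node: [(neighbor, cost), ...]}. Returns (total_cost, path)."""
    queue = [(0, start, [start])]  # (cost so far, current node, path taken)
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, weight in graph.get(node, []):
            if neighbor not in visited:
                heapq.heappush(queue, (cost + weight, neighbor, path + [neighbor]))
    return float("inf"), []  # goal unreachable

# Hypothetical internal graph, abstracted from perception of the map
toy_map = {
    "home":    [("station", 5), ("highway", 2)],
    "station": [("airport", 4)],
    "highway": [("airport", 9), ("station", 1)],
}

cost, route = dijkstra(toy_map, "home", "airport")
print(cost, route)  # → 7 ['home', 'highway', 'station', 'airport']
```

The causal-explanation layer would sit on top of this, recording why each edge was preferred (e.g., the highway detour to the station beats the direct road despite the extra hop).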

### Collaboration, scoped

“Collaboration” here need not mean direct co-authorship with the three pioneers. It can be team-to-team exchanges, leveraging open research artifacts, or focused consultations on subproblems where their groups have strong prior art (e.g., world-model learning heuristics, efficient continual learning, or causal modeling for explainability). The aim is to keep TBP’s biological blueprint intact while being pragmatic about subcomponents where non-biological—but architecturally consistent—techniques accelerate progress.

If this framing resonates, happy to refine concrete sub-areas where cross-pollination could yield the most leverage without diluting TBP’s core principles.

I think I just made things too complex. After all, this discussion is about seeing the implementation path more clearly, given the clear end targets in the official roadmap. But it can be distilled into a simple question: when facing implementation choices, how should TBP balance three key dimensions?

Here’s a framework covering the meaningful scenarios (i.e., which kind of solution to favor when facing a given problem):

Decision Framework:

| Scenario | Efficiency | Explainability | Biological Closeness | Recommendation |
|----------|------------|----------------|----------------------|----------------|
| A | ⬆️ | ⬆️ | ⬇️ | ✅ Adopt (transitional) |
| B | ⬆️ | ⬇️ | ⬇️ | ⚠️ Limited use |
| C | ⬆️ | ⬇️ | ⬆️ | ✅ Strong priority |
| D | ⬇️ | ⬆️ | ⬆️ | ✅ Long-term focus |
| E | ⬇️ | ⬇️ | ⬆️ | ⚠️ Research-only |
| F | ⬇️ | ⬆️ | ⬇️ | ❌ Reject |

Rationale for each scenario:

Scenario A - ✅ Adopt (transitional):
Enables rapid iteration and research transparency; aligns with TBP’s current “simplified implementation” approach. Examples: causal inference tools, explicit graph algorithms.

Scenario B - ⚠️ Limited use:
Only as peripheral components (e.g., visual feature extractors); core learning modules must remain interpretable.

Scenario C - ✅ Strong priority:
Biologically inspired efficiency is TBP’s ultimate goal; multiple modules can compensate for individual opacity. Examples: Hebbian learning, columnar processing.

Scenario D - ✅ Long-term focus:
The research gold standard: slow now, but enables understanding why biological algorithms work and iterative refinement. For example, this approach could illuminate why humans systematically make decision errors in problems like the Monty Hall dilemma, revealing the underlying neural mechanisms that produce cognitive biases rather than just engineering around them.

Scenario E - ⚠️ Research-only:
High biological fidelity brings valuable insights, but the combination of slowness and opacity makes engineering iteration nearly impossible. Reserve for targeted theoretical investigations.

Scenario F - ❌ Reject:
No practical or theoretical value; violates all TBP principles.

Key Principle: TBP should maintain biological inspiration as the architectural skeleton while being pragmatic about implementation details. As the project matures, successful “shortcuts” (Scenario A) should evolve toward biologically-grounded alternatives (Scenarios C/D) as understanding deepens—aligning with Sutton’s insight that truly scalable methods emerge from interaction-based learning, not hand-coded structure.
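For concreteness, the decision framework can be encoded as a small lookup table. This is a hypothetical sketch; the triples and recommendation strings are my own shorthand for the scenarios above:

```python
# Each scenario is a (efficiency, explainability, biological_closeness)
# triple of booleans, mapped to its recommendation from the table.
RECOMMENDATIONS = {
    (True,  True,  False): "Adopt (transitional)",   # Scenario A
    (True,  False, False): "Limited use",            # Scenario B
    (True,  False, True):  "Strong priority",        # Scenario C
    (False, True,  True):  "Long-term focus",        # Scenario D
    (False, False, True):  "Research-only",          # Scenario E
    (False, True,  False): "Reject",                 # Scenario F
}

def recommend(efficient, explainable, biological):
    # The two remaining combinations (all-high, all-low) are not
    # covered by the table, so they fall through to a default.
    return RECOMMENDATIONS.get((efficient, explainable, biological), "Not covered")

print(recommend(True, False, True))  # → Strong priority
```

Encoding it this way also makes the gap visible: the table omits the all-high and all-low combinations, which may be worth spelling out explicitly.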

Thanks for the recommendation @dbsx I have huge respect for all three of those researchers and what they contributed to their fields over the past decades. I loved reading Richard Sutton’s book on reinforcement learning as well as Yann LeCun and Judea Pearl’s papers. I can definitely see how some focused discussions and exchange of ideas with them (and other researchers) would be valuable.

The past year has been crazy with setting up the project and getting it off the ground, but we are definitely interested in such scientific exchanges. If you have any connections to one of those researchers or are organizing a workshop or other event with a fitting scope, feel free to reach out or let them know to reach out :slight_smile: We are also planning on organizing more exchanges with other researchers in the field, similar to how Numenta did, but just haven’t gotten around to that in the past year.

6 Likes

Maybe the Machine Learning Street Talk podcast would be a very nice stage for such great discussions; they know Jeff Hawkins, Rich Sutton, and Yann LeCun. I got to know the Thousand Brains Theory from their podcast, that was a great interview.

2 Likes

I saw that bitter lesson video and it’s like so many with Hawkins where people insist that intelligence is something only humans have. The part of that style of interview that is endlessly annoying to me is that there is a constant referral to other people’s ideas without any real understanding of them. The worst was someone who interviewed Hawkins and paused to talk about how he prepared by asking chatGPT to summarize decades of research, and he believed he understood the topic well enough after that to have an in-depth discussion. Well, maybe the worst was actually the guy who wouldn’t stop asking Jeff about sexbots and whether they should have emotions.

Part of the problem is that people today are very passive. They look for other people to solve the problems we could solve together by looking for solutions instead of a savior. The top 10% of IQ is 800 million human beings. They can’t all be too busy or apathetic to get involved. There’s more than enough “ordinary brilliant people” to do anything humanity wants to do that is possible to do. We don’t need celebrities. Maturity yes, celebrities no. In reality, we don’t need AI. We need to be better educated human beings who can work cooperatively in extremely large groups to solve enormous problems like how to structure a civilization in a way that it doesn’t attract and encourage parasites and predators who burn the planet down. If we can’t do that, this is all just a nerd party at the end of the world.

4 Likes