LeCun already has his hands full with V-JEPA over at Meta. He’s still very much into deep learning, and biological plausibility isn’t his focus.
Pearl’s work is more oriented toward statistics and symbolic AI, which is a long way off from where Monty stands right now. He seems more into political activism than research these days, not to mention he’s a venerable 89 years old!
Sutton is plenty busy with reinforcement learning research alongside John Carmack, on top of his academic duties at UAlberta.
Notably, Sutton wrote the famous bitter lesson article, asserting that the most effective AI approaches are those that leverage computation and general-purpose methods, rather than those built around human representations of knowledge. These passages seem particularly relevant to TBP:
In computer vision, there has been a similar pattern. Early methods conceived of vision as searching for edges, or generalized cylinders, or in terms of [scale-invariant feature transform] features. But today all this is discarded. Modern deep-learning neural networks use only the notions of convolution and certain kinds of invariances, and perform much better.
[…] the actual contents of minds are tremendously, irredeemably complex; we should stop trying to find simple ways to think about the contents of minds, such as simple ways to think about space, objects, multiple agents, or symmetries. All these are part of the arbitrary, intrinsically-complex, outside world. They are not what should be built in, as their complexity is endless; instead we should build in only the meta-methods that can find and capture this arbitrary complexity.
In 2019, Hawkins was asked about Sutton’s bitter lesson on the Lex Fridman Podcast #25 (01:12:43). Long story short, Lex asked whether scaling would lead to intelligence; Hawkins said no, argued there’s an alternative path, and then talked about his early vision of TBP. (I think Lex’s question was a bit “watered down”: he kinda shifted sideways to the emergence of intelligence through scaling, rather than the effectiveness of general methods.)
Over 6 years later, with LLMs and everything, the landscape has changed quite a bit! Sutton appeared on the Dwarkesh Podcast a week ago, clarifying his statements about the bitter lesson:
Full transcript, definitely worth a read: Richard Sutton – Father of RL thinks LLMs are a dead end
Here’s the most relevant part:
Patel: “Why do we need a whole new architecture to begin doing experiential, continual learning? Why can’t we start with LLMs to do that?”
Sutton: “In every case of the bitter lesson, you could start with human knowledge and then do the scalable things. That’s always the case. There’s never any reason why that has to be bad. But in fact, and in practice, it has always turned out to be bad. People get locked into the human knowledge approach, and they psychologically… Now I’m speculating why it is, but this is what has always happened. They get their lunch eaten by the methods that are truly scalable.”
Patel: “Give me a sense of what the scalable method is.”
Sutton: “The scalable method is you learn from experience. You try things, you see what works. No one has to tell you. First of all, you have a goal. Without a goal, there’s no sense of right or wrong or better or worse. Large language models are trying to get by without having a goal or a sense of better or worse. That’s just exactly starting in the wrong place.”
I wonder what Sutton would think of TBP. Judging by his podcast comments, he’d probably see potential in it, contrary to what he wrote in 2019.
In fact, I’d love to hear Sutton and Hawkins discuss their views together. I’m sure it would be fascinating…
Patel also published a follow-up today, based on feedback from viewers: Some thoughts on the Sutton interview - by Dwarkesh Patel
Some tidbits:
What is the bitter lesson about? It is not saying that you just want to throw as much compute as possible at a problem. The bitter lesson says that you want to come up with techniques which most effectively and scalably leverage compute.
LLMs aren’t capable of learning on-the-job, so we’ll need some new architecture to enable continual learning. And once we have it, we won’t need a special training phase — the agent will just learn on-the-fly, like all humans, and indeed, like all animals.
Models of humans can give you a prior which facilitates learning “true” world models.
Being able to continuously learn from the environment in a high throughput way is obviously necessary for true AGI. And it clearly doesn’t exist with LLMs trained on RLVR.
Even if Sutton’s Platonic ideal doesn’t end up being the path to first AGI, he’s identifying genuine basic gaps which we don’t even notice because they are so pervasive in the current paradigm: lack of continual learning, abysmal sample efficiency, dependence on exhaustible human data.