Do today's AI models really remember, think, plan, and reason like the human brain? Some AI labs would have you believe so. But no, says Meta chief AI scientist Yann LeCun, who thinks we could get there in a decade or so by pursuing a new approach he calls a "world model."
Earlier this year, OpenAI introduced a feature it calls "memory," which allows ChatGPT to remember your conversations. With its latest generation of models, called o1, the output occasionally shows the word "thinking," and OpenAI claims the same models are capable of "complex reasoning."
All of that sounds as if we're pretty much at the door to AGI. Speaking recently at the Hudson Forum, LeCun pushed back against AI optimists, including xAI founder Elon Musk and Google DeepMind co-founder Shane Legg, who suggest human-level AI is just around the corner.
"We need machines that understand the world; [machines] that can remember things, that have intuition, have common sense, things that can reason and plan to the same level as humans," said LeCun during the talk. "Despite what you might have heard from some of the most enthusiastic people, current AI systems are not capable of any of this."
LeCun says the best-known members of that class, today's large language models, of which those powering ChatGPT and Meta AI are the most famous examples, are still far from "human-level AI." Humanity is "years to decades" away from that, he said later. (That doesn't stop his boss, Mark Zuckerberg, from asking him when AGI will happen, though.)
The reason is simple: those LLMs work by predicting the next token (usually a few letters or a short word), while today's image and video models predict the next pixel. Language models are one-dimensional predictors, and AI image/video models are two-dimensional predictors. These models have become very good at predicting in their respective dimensions, but they don't really understand the three-dimensional world.
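To make the "one-dimensional predictor" point concrete, here is a toy sketch of the generation loop LeCun is describing. The frequency-based "predictor" is a deliberate stand-in, not how real LLMs work; the point is only that the whole process is "extend a sequence one token at a time."

```python
# Toy illustration: the generation loop of a next-token predictor is
# one-dimensional -- it only ever appends to a sequence, with no model
# of the 3D world behind the words.
def predict_next_token(tokens):
    """Hypothetical stand-in for an LLM: returns the most frequent
    token seen so far as its 'prediction'."""
    return max(set(tokens), key=tokens.count)

def generate(prompt, steps):
    sequence = list(prompt)
    for _ in range(steps):
        # Each step conditions only on the 1D sequence so far.
        sequence.append(predict_next_token(sequence))
    return sequence

print(generate(["the", "cat", "the"], 2))
```

A real LLM swaps the frequency count for a learned neural network, but the outer loop, predict one token, append, repeat, is the same shape.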
That's why, according to LeCun, modern AI systems simply can't do things most humans learn to do easily, like clearing a dinner table by age 10 or driving a car by 17. Humans learn both in hours, while even the world's most advanced AI systems today, built on thousands or millions of hours of data, can't reliably act in the physical world.
To accomplish more complex tasks, LeCun argues, we need to build systems with three-dimensional perception of the world around them, and to focus on a new kind of AI architecture: world models.
"A world model is your mental model of how the world behaves," he explained. "You can imagine a sequence of actions you might take, and your world model will allow you to predict what the effect of the sequence of actions will be on the world."
Consider the "world model" in your own head. For example, imagine looking at a messy bedroom and wanting to clean it. You can picture how picking up all the clothes and putting them away would do the trick. You don't need to try several methods or learn how to clean a room first. Your brain observes the three-dimensional space and creates an action plan to achieve your goal on the first try. That action plan is the secret sauce that AI world models promise.
Part of the benefit is that world models can take in orders of magnitude more data than LLMs. That also makes them computationally intensive, which is why cloud providers are racing to partner with AI companies.
World models are the big idea several AI labs are now chasing, and the term is quickly becoming the next buzzword to attract venture funding. A group of highly respected AI researchers, including Fei-Fei Li and Justin Johnson, just raised $230 million for their startup, World Labs. The "godmother of AI" and her team are also convinced world models will unlock significantly smarter AI systems. OpenAI also describes its unreleased Sora video generator as a world model, but hasn't gotten into specifics.
LeCun outlined an idea for using world models to create human-level AI in a 2022 paper on "objective-driven AI," though he notes the concept is over 60 years old. In short, a base representation of the world (such as video of a dirty room) and memory are fed into a world model. Then the world model predicts what the world will look like based on that information. Then you give the world model objectives, including an altered state of the world you'd like to achieve (such as a clean room), as well as guardrails to ensure the model doesn't harm humans in the process of attaining the objective (don't kill me in the process of cleaning my room, please). The world model then finds an action sequence to achieve these objectives.
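The loop described above can be sketched in a few lines of code. This is a minimal, purely illustrative version under loose assumptions: states are plain strings, the "world model" is a hand-written lookup table of predicted outcomes, and planning is brute-force search. None of these names come from LeCun's paper; a real world model would learn its predictions from data such as video.

```python
# Minimal sketch of the objective-driven loop: predict outcomes with a
# world model, reject plans that violate guardrails, and return the
# first action sequence predicted to reach the goal state.
from itertools import product

# Hypothetical learned dynamics: (state, action) -> predicted next state.
WORLD_MODEL = {
    ("dirty_room", "pick_up_clothes"): "clothes_away",
    ("clothes_away", "make_bed"): "clean_room",
    ("dirty_room", "knock_over_lamp"): "broken_lamp",
}

def violates_guardrails(state):
    # Guardrail: reject any plan that passes through a harmful state.
    return state == "broken_lamp"

def plan(start, goal, actions, max_steps=3):
    """Search for an action sequence whose predicted outcome is the goal."""
    for length in range(1, max_steps + 1):
        for seq in product(actions, repeat=length):
            state, ok = start, True
            for action in seq:
                state = WORLD_MODEL.get((state, action))
                if state is None or violates_guardrails(state):
                    ok = False
                    break
            if ok and state == goal:
                return list(seq)
    return None  # no safe plan found within max_steps

print(plan("dirty_room", "clean_room",
           ["pick_up_clothes", "make_bed", "knock_over_lamp"]))
```

The point of the sketch is the division of labor: the world model only predicts consequences, while the objectives and guardrails decide which predicted futures are acceptable.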
Meta's long-term AI research lab, FAIR (Fundamental AI Research), is working toward building objective-driven AI and world models, LeCun said. FAIR used to work on AI for Meta's upcoming products, but LeCun says the lab has shifted in recent years to focus purely on long-term AI research. FAIR doesn't even use LLMs anymore, he said.
World models are an intriguing idea, but LeCun says we haven't made much progress on bringing these systems to reality. "There's a lot of very hard problems to get from where we are today," he said, adding that "it's certainly more complicated than we think."
"It's going to take years before we can get everything here to work, if not a decade," said LeCun. "Mark Zuckerberg keeps asking me how long it's going to take."