If he's right (that LLMs cannot achieve AGI, but what he's working on can, and does), this would be huge for AI and humanity at large.
Hope it puts to bed the "Europe can't innovate" crowd too.
bluefirebrand•Mar 10, 2026
I'm still just so surprised any time I encounter people who think AI will be overall good for humanity
I pretty strongly think it will only benefit the rich and powerful while further oppressing and devaluing everyone else. I tend to think this is an obvious outcome and it would be obviously very bad (for most of us)
So I wonder if you just think you will be one of the few who benefit at the expense of others, or do you truly believe AI will benefit all of humanity?
sofixa•Mar 10, 2026
> So I wonder if you just think you will be one of the few who benefit at the expense of others
It's not a zero sum game, IMO. It will benefit some, be neutral for others, negative for others.
For instance, improved productivity could be good (and doesn't have to result in layoffs; Jevons paradox will come into play, IMO, with increased demand). Easier/better/faster scientific research could be good too. Not everyone would benefit from those, but not everyone has to for it to be generally good.
Autonomous AI-powered drone swarms could be bad, or could result in a Mutually Assured Destruction stalemate.
AndrewKemendo•Mar 10, 2026
>It's not a zero sum game, IMO. It will benefit some, be neutral for others, negative for others.
This is literally a description of a zero sum game
sofixa•Mar 10, 2026
No, a zero sum game would require for the "winners" to take it from the "losers", with a limited amount to go around. If expanding the pie creates a majority of "winners", with some neutral and some negative, that is not a zero sum game.
AndrewKemendo•Mar 10, 2026
> No, a zero sum game would require for the "winners" to take it from the "losers"
You’re so close to getting it and I’m rooting for you
bluefirebrand•Mar 10, 2026
> improved productivity could be good (and doesn't have to result in layoffs
It already has resulted in layoffs and one of the weakest job markets we've seen in ages
Executives could not have used it as an excuse for layoffs faster; they practically tripped over themselves trying to use it to lay people off
kerlap10•Mar 10, 2026
What use is it to understand the physical world if all investments are misallocated to the virtual world? Perhaps the AI will detect that there is a housing shortage and politicians will finally believe it because AI said so?
Or is it to accelerate Skynet?
pingou•Mar 10, 2026
Yann LeCun has said a number of things that are very dubious, like that autoregressive LLMs are a dead end, that LLMs do not have an internal world model, and, this morning (https://www.youtube.com/watch?v=AFi1TPiB058, in French), that an AI cannot find a strategy to preserve itself against the will of its creator.
As a French person, I wish him good luck anyway; I'm all for exploring different avenues of achieving AGI.
fennecfoxy•Mar 10, 2026
Why world model? To emulate how we became sentient?
A "world" is just senses. In a way the context is one sense. A digital only world is still a world.
I think more success is in a model having high level needs and aspirations that are borne from lower level needs. Model architecture also needs to shift to multiple autonomous systems that interact, in the same ways our brains work - there's a lot under the surface inside our heads, it's not just "us" in there.
We only interact with our environment because of our low level needs, which are primarily: food, water. Secondary: mating. Tertiary: social/tribal credit (which can enable food, water and mating).
omegastick•Mar 10, 2026
Because if you have an explicit world model you can optimize against it.
It sounds like you are imagining tacking a world model onto an LLM. That's one approach but not what LeCun advocates for.
abmmgb•Mar 10, 2026
Not based on true valuation unless h-index has become a valuation metric lol
Academics don't always make great entrepreneurs
A_D_E_P_T•Mar 10, 2026
Justifiable.
There are a lot more degrees of freedom in world models.
LLMs are fundamentally capped because they only learn from static text -- human communications about the world -- rather than from the world itself, which is why they can remix existing ideas but find it all but impossible to produce genuinely novel discoveries or inventions. A well-funded and well-run startup building physical world models (grounded in spatiotemporal understanding, not just language patterns) would be attacking what I see as the actual bottleneck to AGI. Even if they succeed only partially, they may unlock the kind of generalization and creative spark that current LLMs structurally can't reach.
10xDev•Mar 10, 2026
Whether it is text or an image, it is just bits for a computer. A token can represent anything.
A_D_E_P_T•Mar 10, 2026
Sure, but don't conflate the representation format with the structure of what's being represented.
Everything is bits to a computer, but text training data captures the flattened, after-the-fact residue of baseline human thought: Someone's written description of how something works. (At best!)
A world model would need to capture the underlying causal, spatial, and temporal structure of reality itself -- the thing itself, that which generates those descriptions.
You can tokenize an image just as easily as a sentence, sure, but a pile of images and text won't give you a relation between the system and the world. A world model, in theory, can. I mean, we ought to be sufficient proof of this, in a sense...
firecall•Mar 10, 2026
It’s worth noting how our human relationship or understanding of our world model changed as our tools to inspect and describe our world advanced.
So when we think about capturing any underlying structure of reality itself, we are constrained by the tools at hand.
The capability of the tool forms the description which grants the level of understanding.
Bombthecat•Mar 10, 2026
Can a token represent concentration, will?
10xDev•Mar 11, 2026
Those sound more like emergent properties than something you can engineer.
bsenftner•Mar 10, 2026
There will be no "unlocking of AGI" until we develop a new science capable of artificial comprehension. Comprehension is the cornucopia that produces everything we are: given raw stimulus, an entire communicating Universe is generated, with a plethora of highly advanced predator/prey characters in an infinitely complex dynamic, and human science and technology have no idea how to artificially make sense of that in a simultaneous unifying whole. That's comprehension.
chilmers•Mar 10, 2026
Ironically, your comment is practically incomprehensible.
copperx•Mar 10, 2026
These two comments above me capture Slashdot in the early 2000s.
rvz•Mar 10, 2026
A lot more justifiable than, say, Thinking Machines at least. But we will "see".
World models and vision seem like a great use case for robotics, which I can imagine being the main driver of AMI.
andy12_•Mar 10, 2026
I don't understand this view. How I see it, the fundamental bottleneck to AGI is continual learning and backpropagation. Models today are static, and human brains don't learn or adapt themselves with anything close to backpropagation. World models don't solve any of these problems; they are fundamentally the same kind of deep learning architectures we are used to working with. Heck, if you think learning from the world itself is the bottleneck, you can just put a vision-action LLM on a reinforcement learning loop in a robotic/simulated body.
zelphirkalt•Mar 10, 2026
> I don't understand this view. How I see it the fundamental bottleneck to AGI is continual learning and backpropagation. Models today are static, and human brains don't learn or adapt themselves with anything close to backpropagation.
Even with continuous backpropagation and "learning", enriching the training data, so called online-learning, the limitations will not disappear. The LLMs will not be able to conclude things about the world based on fact and deduction. They only consider what is likely from their training data. They will not foresee/anticipate events, that are unlikely or non-existent in their training data, but are bound to happen due to real world circumstances. They are not intelligent in that way.
Whether humans always apply that much effort to conclude these things is another question. The point is that humans fundamentally are capable of doing that, while LLMs are structurally not.
The problems are structural/architectural. I think it will take another 2-3 major leaps in architectures, before these AI models reach human level general intelligence, if they ever reach it. So far they can "merely" often "fake it" when things are statistically common in their training data.
andy12_•Mar 10, 2026
> Even with continuous backpropagation and "learning"
That's what I said. Backpropagation cannot be enough; that's not how neurons work in the slightest. When you put biological neurons in a Pong environment they learn to play not through some kind of loss or reward function; they self-organize to avoid unpredictable stimulation. As far as I know, no architecture learns in such an unsupervised way.
Forgive me for being ignorant - but 'loss' in a supervised-learning ML context encodes how unlikely (high loss) or likely (low loss) the network's prediction of the output was, given the input.
This sounds very similar to me to what neurons do (avoid unpredictable stimulation)
andy12_•Mar 10, 2026
So, I have been thinking about this for a little while. Imagine a model f that takes a world state x and makes a prediction y. At a high level, a traditional supervised model is trained like this
f(x)=y' => loss(y',y) => how good was my prediction? Train f through backprop with that error.
While a model trained with reinforcement learning is more like this, where m(y) is the resulting world state after taking the action y the model predicted.
f(x)=y' => m(y')=z => reward(z) => how good was the state I was in based on my actions? Train f with an algorithm like REINFORCE with the reward, as the world m is a non-differentiable black-box.
While a group of neurons is more like predicting what the resulting world state of taking my action will be, g(x,y), and trying to learn by both tuning g and the action taken f(x).
f(x)=y' => m(y')=z => g(x,y)=z' => loss(z,z') => how predictable was the results of my actions? Train g normally with backprop, and train f with an algorithm like REINFORCE with negative surprise as a reward.
After talking with GPT5.2 for a little while, it seems like Curiosity-driven Exploration by Self-supervised Prediction[1] might be an architecture similar to the one I described for neurons? But with the twist that f is rewarded by making the prediction error bigger (not smaller!) as a proxy of "curiosity".
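A rough numpy sketch of that third loop, just to make it concrete (a toy setup of my own, nothing to do with the paper's actual architecture): g is trained by plain gradient descent on its prediction error, and f gets a REINFORCE-style update with negative surprise as the reward. Flip the sign of the reward and you get the "curiosity" variant instead.

    import numpy as np

    rng = np.random.default_rng(0)
    STATE_DIM, N_ACTIONS = 4, 3

    # The "unknown" environment m: a fixed random transition the agent never sees directly.
    TRUE_A = rng.normal(scale=0.5, size=(STATE_DIM, STATE_DIM))
    TRUE_B = rng.normal(scale=0.5, size=(N_ACTIONS, STATE_DIM))

    def environment(x, a_onehot):
        return np.tanh(x @ TRUE_A + a_onehot @ TRUE_B)

    # f: a softmax policy over actions. g: a linear predictor of the next state.
    policy_w = rng.normal(scale=0.1, size=(STATE_DIM, N_ACTIONS))
    model_w = rng.normal(scale=0.1, size=(STATE_DIM + N_ACTIONS, STATE_DIM))

    lr_f, lr_g = 0.05, 0.1
    x = rng.normal(size=STATE_DIM)

    for step in range(2000):
        # f(x) = y': sample an action from the policy.
        logits = x @ policy_w
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        a = rng.choice(N_ACTIONS, p=probs)
        a_onehot = np.eye(N_ACTIONS)[a]

        # m(y') = z: the world reacts.
        z = environment(x, a_onehot)

        # g(x, y') = z': the internal model predicts the reaction.
        inp = np.concatenate([x, a_onehot])
        z_pred = inp @ model_w
        surprise = float(np.mean((z_pred - z) ** 2))   # loss(z, z')

        # Train g with ordinary gradient descent on the prediction error.
        model_w -= lr_g * 2 * np.outer(inp, z_pred - z) / STATE_DIM

        # Train f REINFORCE-style, with negative surprise as the reward.
        grad_log_pi = a_onehot - probs                 # d log softmax(a) / d logits
        policy_w += lr_f * (-surprise) * np.outer(x, grad_log_pi)

        x = z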
So can't you just use how real neurons learn as training data to learn how to learn the same way?
wiz21c•Mar 10, 2026
I'm sure that if a car appeared from nowhere in the middle of your living room, you would not be prepared at all.
So my question is: when is there enough training data that you can handle 99.99% of the world?
jstummbillig•Mar 10, 2026
> They will not foresee/anticipate events, that are unlikely or non-existent in their training data, but are bound to happen due to real world circumstances. They are not intelligent in that way.
Can you be a bit more specific at all? Maybe via an example?
steego•Mar 10, 2026
I think people MOSTLY foresee and anticipate events in OUR training data, which mostly comprises information collected by our senses.
Our training data is a lot more diverse than an LLM's. We also leverage our senses as a carrier for communicating abstract ideas using audio and visual channels that may or may not be grounded in reality. We have TV shows, video games, programming languages and all sorts of rich and interesting things we can engage with that do not reflect our fundamental reality.
Like LLMs, we can hallucinate while we sleep or we can delude ourselves with untethered ideas, but UNLIKE LLMs, we can steer our own learning corpus. We can train ourselves with our own untethered “hallucinations” or we can render them in art and share them with others so they can include it in their training corpus.
Our hallucinations are often just erroneous models of the world. When we render it into something that has aesthetic appeal, we might call it art.
If the hallucination helps us understand some aspect of something, we call it a conjecture or hypothesis.
We live in a rich world filled with rich training data. We don’t magically anticipate events not in our training data, but we’re also not void of creativity (“hallucinations”) either.
Most of us are stochastic parrots most of the time. We’ve only gotten this far because there are so many of us and we’ve been on this earth for many generations.
Most of us are dazzled and instinctively driven to mimic the ideas that a small minority of people “hallucinate”.
There is no shame in mimicking or being a stochastic parrot. These are critical features that helped our ancestors survive.
robwwilliams•Mar 10, 2026
> We can steer our own learning corpus
This is critical. We have some degree of attentional autonomy. And we have a complex tapestry of algorithms running in thalamocortical circuits that generate “Nows”. Truncation commands produce sequences of acts (token-like products).
perfmode•Mar 10, 2026
Humans are notoriously bad at formal logic. The Wason selection task is the classic example: most people fail a simple conditional reasoning problem unless it’s dressed up in familiar social context, like catching cheaters. That looks a lot more like pattern matching than rule application.
Kahneman’s whole framework points the same direction. Most of what people call “reasoning” is fast, associative, pattern-based. The slow, deliberate, step-by-step stuff is effortful and error-prone, and people avoid it when they can. And even when they do engage it, they’re often confabulating a logical-sounding justification for a conclusion they already reached by other means.
So maybe the honest answer is: the gap between what LLMs do and what most humans do most of the time might be smaller than people assume. The story that humans have access to some pure deductive engine and LLMs are just faking it with statistics might be flattering to humans more than it’s accurate.
Where I’d still flag a possible difference is something like adaptability. A person can learn a totally new formal system and start applying its rules, even if clumsily. Whether LLMs can genuinely do that outside their training distribution or just interpolate convincingly is still an open question. But then again, how often do humans actually reason outside their own “training distribution”? Most human insight happens within well-practiced domains.
lich_king•Mar 10, 2026
> The Wason selection task is the classic example: most people fail a simple conditional reasoning problem unless it’s dressed up in familiar social context, like catching cheaters.
I've never heard about the Wason selection task, looked it up, and could tell the right answer right away. But I can also tell you why: because I have some familiarity with formal logic and can, in your words, pattern-match the gotcha that "if x then y" is distinct from "if not x then not y".
In contrast to you, this doesn't make me believe that people are bad at logic or don't really think. It tells me that people are unfamiliar with "gotcha" formalities introduced by logicians that don't match the everyday use of language. If you added a simple addition to the problem, such as "Note that in this context, 'if' only means that...", most people would almost certainly answer it correctly.
Mind you, I'm not arguing that human thinking is necessarily more profound than what LLMs could ever do. However, judging from the output, LLMs have a tenuous grasp on reality, so I don't think that reductionist arguments along the lines of "humans are just as dumb" are fair. There's a difference that we don't really know how to overcome.
edanm•Mar 10, 2026
Agree with much of your comment.
Though note that, as GP said, on the Wason selection task people famously do much better when it's framed in a social context. That at least partially undermines your theory that it's a lack of familiarity with the terminology of formal logic.
the_mar•Mar 11, 2026
I for the life of me could not solve the <18 example from Wikipedia, but the number/color one is super easy
Ajedi32•Mar 11, 2026
Maybe the social version just creates a context where "if x then y" obviously does not include "if not x then not y". Everyone knows people over the drinking age can drink both alcoholic and non-alcoholic drinks, so you obviously don't have to check the person drinking the soft drink to make sure they aren't an adult.
lugu•Mar 10, 2026
Your response contains a performative contradiction: you are asserting that humans are naturally logical while simultaneously committing several logical errors to defend that claim.
jacquesm•Mar 10, 2026
This comment would be a lot more useful with an enumeration of those logical errors.
lugu•Mar 11, 2026
The commenter's specific claim—that adding a note about the definition of "if" would solve the problem—is moving the goalposts and a tautology. The comment also suffers from hasty generalization (in their experience the test isn't hard) and special pleading (a double standard for LLMs and humans).
lich_king•Mar 10, 2026
When someone tells you "you can have this if you pay me", they don't mean "you can also have it if you don't pay". They are implicitly but clearly indicating you gotta pay.
It's as simple as that. In common use, "if x then y" frequently implies "if not x then not y". Pretending that it's some sort of a cognitive defect to interpret it this way is silly.
retsibsi•Mar 11, 2026
In the original studies, most people made an error that can't be explained by that misunderstanding: they failed to select the card showing 'not y'.
gopher_space•Mar 11, 2026
From my armchair this feels relevant:
> Decoding analyses of neural activity further reveal significant above chance decoding accuracy for negated adjectives within 600 ms from adjective onset, suggesting that negation does not invert the representation of adjectives (i.e., “not bad” represented as “good”)[...]
From: Negation mitigates rather than inverts the neural representations of adjectives
> If you added a simple addition to the problem, such as "Note that in this context, 'if' only means that...", most people would almost certainly answer it correctly.
Agreed. More broadly, classical logic isn't the only logic out there. Many logics will differ on the meaning of the implication "if x then y". There are multiple ways for x to imply y, those additional meanings show up in natural language all the time, and we actually do have logical systems to describe them; they are just lesser known.
Mapping natural language into logic often requires a context that lies outside the words that were written or spoken. We need to represent into formulas what people actually meant, rather than just what they wrote. Indeed the same sentence can be sometimes ambiguous, and a logical formula never is.
As an aside, I wanna say that material implication (that is, the "if x then y" of classical logic) deeply sucks, or rather, that an implication in natural language very rarely maps cleanly onto material implication. Having "if x then y" be vacuously true whenever x is false is the kind of thing associated with people who smirk at clever wordplay, rather than something people actually mean when they say "if x then y".
retsibsi•Mar 11, 2026
Quoting the Wikipedia article's formulation of the task for clarity:
> You are shown a set of four cards placed on a table, each of which has a number on one side and a color on the other. The visible faces of the cards show 3, 8, blue and red. Which card(s) must you turn over in order to test that if a card shows an even number on one face, then its opposite face is blue?
Confusion over the meaning of 'if' can only explain why people select the Blue card; it can't explain why people fail to select the Red card. If 'if' meant 'if and only if', then it would still be necessary to check that the Red card didn't have an even number. But according to Wason[0], "only a minority" of participants select (the study's equivalent of) the Red card.
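A quick brute-force sketch makes the asymmetry concrete (toy code, my own encoding of the puzzle, nothing canonical): a card must be turned over iff some possible hidden face could falsify the rule, and you can check that under both readings of "if".

    cards = {
        "3":    ("number", 3),
        "8":    ("number", 8),
        "blue": ("color", "blue"),
        "red":  ("color", "red"),
    }

    def must_turn(rule):
        # A card must be turned over iff some possible hidden face could falsify the rule.
        flips = []
        for name, (kind, value) in cards.items():
            # The hidden face is a color if the visible face is a number, and vice versa.
            hidden_options = ["blue", "red"] if kind == "number" else [3, 8]
            for hidden in hidden_options:
                number = value if kind == "number" else hidden
                color = hidden if kind == "number" else value
                if not rule(number, color):
                    flips.append(name)
                    break
        return flips

    # Material implication: even -> blue.
    implication = lambda n, c: (n % 2 != 0) or (c == "blue")
    # Biconditional ("if and only if"): even <-> blue.
    biconditional = lambda n, c: (n % 2 == 0) == (c == "blue")

    print(must_turn(implication))    # ['8', 'red']  -> the classic correct answer
    print(must_turn(biconditional))  # ['3', '8', 'blue', 'red'] -> every card needs checking

Either way, the red card has to be turned over, which is exactly the selection most participants miss.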
I confidently picked 8+blue and am now trying to understand why I personally did that. I think that maybe the text of the puzzle is not entirely unambiguous. The question states "which card(s)" and then "a card", so this is what my brain immediately starts to check: every card, one by one. Do I need to test "3"? No, it's not even. Do I need to test "8"? Yes. Do I need to test "blue"? Yes, because I need to test "a card" that fits the criteria. And lastly, the "red" card immediately fails verification as "a card" fitting that criteria.
I think a corrected question should make obvious that we are verifying not "a card" but "a rule" applicable to all cards. So "a" needs to be replaced with "all" or "any", and a mention of a rule or pattern needs to be added.
coldtea•Mar 11, 2026
People in everyday life are not evaluating rules. They evaluate cases, for whether a case fits a rule.
So, when being told:
"Which card(s) must you turn over in order to test that if a card shows an even number on one face, then its opposite face is blue?"
they translate it to:
"Check the cards that show an even number on one face to see whether their opposite face is blue and vice versa"
Based on this, many would naturally pick the blue card (to test the direct case), and the 8 card (to test the "vice versa" case).
They won't check the red to see if there's an even number there that invalidates the formulation as a general rule, because they're not in the mindset of testing a general rule.
Would they do the same if they had more familiarity with rule validation in everyday life, or if they had a more verbose and explicit explanation of the goal?
j4k0o•Mar 11, 2026
Exactly. We invented rule-based machines so that we could have a thing that follows rules, and adheres strictly to them, all day long.
I'm not sure why people keep comparing machine behaviour to humans'. It's like economic models that assume perfect rationality... yeah, that's not reality, mate.
Ajedi32•Mar 11, 2026
Yeah, maybe if you phrased it as "Which card(s) must you turn over in order to ensure that all even-numbered cards are blue?" you'd get a better response?
Ajedi32•Mar 11, 2026
It also doesn't explain why people don't think it necessary to check the 3 to make sure it's not blue (which would be necessary if "if" meant "if and only if").
mirekrusin•Mar 11, 2026
As they say, "think about how smart the average person is, then realize half the population is below that". There are far more haikus than opuses walking this planet.
We keep benchmarking models against the best humans and the best human institutions - then when someone points out that swarms, branching, or scale could close the gap, we dismiss it as "cheating". But that framing smuggles in an assumption that intelligence only counts if it works the way ours does. Nobody calls a calculator a cheat for not understanding multiplication - it just multiplies better than you, and that's what matters.
LLMs are a different shape of intelligence. Superhuman on some axes, subpar on others. The interesting question isn't "can they replicate every aspect of human cognition" - it's whether the axes they're strong on are sufficient to produce better than human outcomes in domains that matter. Calculators settled that question for arithmetic. LLMs are settling it for an increasingly wide range of cognitive work. The fact that neither can flip a burger is irrelevant.
Humans don't have a monopoly on intelligence. We just had a monopoly on generality and that moat is shrinking fast.
ghywertelling•Mar 11, 2026
The "God of the gaps" theory is a theological and philosophical viewpoint where gaps in scientific knowledge are cited as evidence for the existence and direct intervention of a divine creator. It asserts that phenomena currently unexplained by science—such as the origin of life or consciousness—are caused by God.
We are doing an inversion of "God of the gaps" into an "LLM of the gaps", where gaps in LLM capabilities are taken as inherently negative and limiting.
qsera•Mar 11, 2026
It is not actually about the gaps in capability; it arises instead from an understanding of how it works and an honest acknowledgement of how far it could go.
qsera•Mar 11, 2026
The question is not if these things are actually intelligent or not. The question is whether these things will be useful without an endless supply of training data and continuous re-alignment using it.
And the question "are these things really intelligent?" is just a proxy for that.
And we are interested in that question because that is necessary to justify the massive investment these things are getting now. It is quite easy to look at these things and conclude that they will continue to progress without any limit.
But that would be like looking at data compression at the time of its conception and thinking that it is only a matter of time before we can compress 100 GB into 1 KB.
We live in a time of scams that are obvious if you take a second look. With something that requires much deeper scrutiny, it is possible to inflate a far larger bubble.
> and that moat is shrinking fast.
The point is that in reality it is not. It is just appearance. If you consider how these things work, then there is no justification for this conclusion.
I have said this elsewhere, but the problem of hallucination itself, along with the requirement of re-training, is the smoking gun that these things are not intelligent in ways that would justify these massive investments.
perfmode•Mar 11, 2026
I think we're actually closer to agreement than it might seem.
You're right that the Wason task is partly about a mismatch between how "if" works in formal logic and how it works in everyday language. That's a fair point. But I think it actually supports what I'm saying rather than undermining it. If people default to interpreting "if x then y" as "if and only if" based on how language normally works in conversation, that is pattern-matching from familiar context. It's a totally understandable thing to do, and I'm not calling it a cognitive defect. I'm saying it's evidence that our default mode is contextual pattern-matching, not rule application. We agree on the mechanism, we're just drawing different conclusions from it.
Your own experience is interesting too. You got the right answer because you have some background in formal logic. That's exactly what I'd expect. Someone who's practiced in a domain recognizes the pattern quickly. But that's the claim: most reasoning happens within well-practiced domains. Your success on the task doesn't counter the pattern-matching thesis, it's a clean example of it working well.
On the broader point about LLMs having a "tenuous grasp on reality," I hear that, and I don't want to flatten the differences. There probably is something meaningfully different going on with how humans stay grounded. I just think the "humans reason, LLMs pattern-match" framing undersells how much human cognition is also pattern-matching, and that being honest about that is more productive than treating it as a reductionist insult.
jonahx•Mar 11, 2026
> The story that humans have access to some pure deductive engine and LLMs are just faking it with statistics might be flattering to humans more than it’s accurate.
Your point rings true with most human reasoning most of the time. Still, at least some humans do have the capability to run that deductive engine, and it seems to be a key part (though not the only part) of scientific and mathematical reasoning. Even informal experimentation and iteration rest on deductive feedback loops.
Nevermark•Mar 11, 2026
The fact that humans can learn to do X (sometimes well, often badly, and many not at all) strongly supports the conjecture that X is not how they naturally do things.
I can perform symbolic calculations too. But most people have limited versions of this skill, and many people who don’t learn to think symbolically have full lives.
I think it is fair to say humans don’t naturally think in formal or symbolic reasoning terms.
People pattern match.
Another clue is that humans have to practice things and become familiar with them to reason even somewhat reliably about them, even if they have already learned some formal reasoning.
—-
Higher level reasoning is always implemented as specific forms of lower order reasoning.
There is confusion about substrate processing vs. what higher order processes can be created with that substrate.
We can “just” be doing pattern matching from an implementation view, and yet go far “beyond” pattern matching with specific compositions of pattern matching, from a capability view.
How else could neurons think? We are “only” neurons. Yet we far surpass the kinds of capabilities neurons have.
jonahx•Mar 11, 2026
I don't disagree with any of that. My comment was only in relation to the question of human-specific capability that current LLMs may not be able to duplicate. I was not making the value judgments you seem to have read.
mikkupikku•Mar 11, 2026
When people do math or rigorous deductive reasoning, are we sure they aren't just pattern matching with a set of carefully chosen interacting patterns that have been refined by ancient philosophers as being useful patterns that produce consistent results when applied in correctly patterned ways?
jonahx•Mar 11, 2026
I've often wondered this. I suspect not, though I don't know. You're right that the answer matters to understanding LLM limitations relative to humans, though.
nextaccountic•Mar 11, 2026
> Kahneman’s whole framework points the same direction. Most of what people call “reasoning” is fast, associative, pattern-based. The slow, deliberate, step-by-step stuff is effortful and error-prone, and people avoid it when they can. And even when they do engage it, they’re often confabulating a logical-sounding justification for a conclusion they already reached by other means.
System 1 really looks like an LLM (indeed completing a phrase is an example of what it can do, like "you either die a hero, or you live long enough to become the _"). It's largely unconscious and runs all the time, pattern matching on random stuff
System 2 is something else and looks like a supervisor system, a higher level stuff that can be consciously directed through your own will
But the two systems run at the same time and reinforce each other
drdaeman•Mar 11, 2026
In my naive understanding, neither requires any will or consciousness.
S1 is “bare” language production, picking words or concepts to say or think by a fancy pattern prediction. There’s no reasoning at this level, just blabbering. However, language by itself weeds out too obvious nonsense purely statistically (some concepts are rarely in the same room), but we may call that “mindlessly” - that’s why even early LLMs produced semi-meaningful texts.
S2 is a set of patterns inside the language ("logic") that biases S1 to produce reasoning-like phrases. It doesn't require any consciousness or will, just concepts pushing S1 towards a special structure; simply invoking one keeps it "in mind" and throws it into the mix.
I suspect S2 has a spectrum of rigorousness, because one can just throw in some rules (like "if X then Y, not Y therefore not X") or may do fancier stuff (imposing a larger structure on it all, like formulating and testing a null hypothesis). Either way it all falls back onto S1 for the ultimate decision-making, a sense of what sounds right (allowing for our favorite logical flaws); thus the fancier the rules (the patterns of "thought"), the more likely the reasoning will be sound.
S2 doesn't just rely on S1-as-language but is part of it, though, because it's a phenomenon born out of (and inside) language.
Whether it's willfully, "consciously" engaged, or whether it works just because S1 predicts the logical-thinking concept as appropriate for certain lines of thinking and starts involving it, probably doesn't even matter - it mainly depends on whatever definition of "will" we would like to pick (there are many).
LLMs and humans can hypothetically do both just fine, but when it comes to checking, humans currently excel because (I suspect) they have a “wider” language in S1, that doesn’t only include word-concepts but also sensory concepts (like visuospatial thinking). Thus, as I get it, the world models idea.
rhubarbtree•Mar 11, 2026
Brilliant insight. The success of LLM reasoning, ie “telling yourself a story”, has greatly increased my belief that humans are actually much less impressive than they seem. I do think it’s mostly pattern matching and a bunch of interacting streams analogous to LLM tokens. Obviously the implementations are different, because nature has to be robust and learn online, but I do not think we are as different from these machines as most people assume. There’s a reason Hofstadter et al. reacted as they did even to the earlier models.
pixl97•Mar 11, 2026
This is why I also think humans being logical inference machines is mostly not true. We are seemingly capable of it, but there must be some cost that keeps it from being commonly used.
While humans did seemingly evolve socially very fast, given that we seem to have had these tools for a few hundred thousand years, it could have been far faster if there were not some other limitations at play.
ChildOfChaos•Mar 11, 2026
I remember reading about this in a book, 'The Enigma of Reason'. Basically it was saying that reasoning is exactly that: we decide, and then we come up with a reason for what we had decided, and usually not the other way around.
This is because the 'reasoning' part of our brain evolved when we started to communicate with others and needed to explain our behaviour.
Which is fascinating if you think of the implications. For the most part we think we are being logical, but in reality we are pattern matching/impulsive and using our reasoning/logic to come up with excuses for why we have chosen what we had already decided.
It explains a lot about the world and why it's so hard to reason with someone: we assume the decision came from reason in the first place, when, looking at such people's choices, it clearly didn't.
bwfan123•Mar 11, 2026
> But then again, how often do humans actually reason outside their own “training distribution”? Most human insight happens within well-practiced domains.
Humans can produce new concepts and then symbolize them for communication purposes. The meaning of concepts is grounded in operational definitions - in a manner that anyone can understand because they are operational, and can be reproduced in theory by anyone.
For example, Euclid invented the concepts of a point, angle and line to operationally represent geometry in the real world. These concepts were never "there" to begin with. They were created from scratch to "build" a world-model that helps humans navigate the real world.
Euclid went outside his "training distribution" to invent point, angle, and line. Humans have this ability to construct new concepts by interaction with the real world - bringing the "unknown" into the "known" so-to-speak. Animals have this too via evolution, but it is unclear if animals can symbolize their concepts and skills to the extent that humans can.
perfmode•Mar 11, 2026
> Humans can produce new concepts and then symbolize them for communication purposes.
Sure, but the question is how often this actually happens versus how often people are doing something closer to recombination and pattern-matching within familiar territory. The point was about the base rate of genuine novel reasoning in everyday human cognition, and I don't think this addresses that.
> Euclid invented the concepts of a point, angle and line to operationally represent geometry in the real world. These concepts were never "there" to begin with.
This isn't really true though. Egyptian and Babylonian surveyors were working with geometric concepts long before Euclid. What Euclid did was axiomatize and systematize knowledge that was already in wide practical use. That's a real achievement, but it's closer to "sophisticated refinement within a well-practiced domain" than to reasoning from scratch outside a training distribution. If anything the example supports the parent comment.
There's also something off about saying points and lines were "never there." Humans have spatial perception. Geometric intuitions come from embodied experience of edges, boundaries, trajectories. Formalizing those intuitions is real work, but it's not the same as generating something with no prior basis.
The deeper issue is you're pointing to one of the most extraordinary intellectual achievements in human history and treating it as representative of human cognition generally. The whole point, drawing on Kahneman, is that most of what we call reasoning is fast associative pattern-matching, and that the slow deliberate stuff is rarer and more error-prone than people assume. The fact that Euclid existed doesn't tell us much about what the other billions of humans are doing cognitively on a Tuesday afternoon.
bwfan123•Mar 11, 2026
> Formalizing those intuitions is real work, but it's not the same as generating something with no prior basis.
> The fact that Euclid existed doesn't tell us much about what the other billions of humans are doing cognitively on a Tuesday afternoon.
Birds can fly - so there is some flying intelligence built into their DNA. But are they aware of their skill, to be able to create a theory of flight and then use that to build a plane? I am just pointing out that intuitions are not enough - the awareness of the intuitions, in a manner that can symbolize and operationalize them, is important.
> The whole point, drawing on Kahneman, is that most of what we call reasoning is fast associative pattern-matching, and that the slow deliberate stuff is rarer and more error-prone than people assume
David Bessis, in his wonderful book [1], argues that the cognitive actions done by you and me on a Tuesday afternoon are the same ones mathematicians perform - we are just unaware of it. Also, since you brought up Kahneman, Bessis proposes a System 3 wherein inaccurate intuitions are corrected by precise communication.
[1] Mathematica: A Secret World of Intuition and Curiosity
perfmode•Mar 11, 2026
The bird analogy is actually a really good one, but I think it supports a narrower claim than you're making. You're right that the capacity to symbolize and formalize intuitions is a distinct and important thing, separate from just having the intuitions. No argument there. But my point wasn't that symbolization doesn't matter. It was about how often humans actually exercise that capacity in a strong sense versus doing something more like recombination within familiar frameworks. The bird can't theorize flight, agreed. But most humans who can in principle theorize about their intuitions also don't, most of the time. The capacity exists. The base rate of its deployment is the question.
On Bessis, I actually think his argument is more compatible with what I was saying than it might seem. If the cognitive process underlying mathematical reasoning is the same one operating on a Tuesday afternoon, that's an argument against treating Euclid-level formalization as categorically different from everyday cognition. It suggests a continuum rather than a bright line between "pattern matching" and "genuine reasoning." Which is interesting and probably right. But it also means you can't point to Euclid as evidence that humans routinely do something qualitatively beyond what LLMs do. If Bessis is right, then the extraordinary cases and the mundane cases share the same underlying machinery, and the question becomes quantitative (how far along the continuum, how often, under what conditions) rather than categorical.
I'll check out the book though, it sounds like it's making a more careful version of the point than usually gets made in these threads.
conartist6•Mar 11, 2026
Models don't care. They aren't alive. This is the source of the chasm between here and AGI. You have to fear death to reason about the world and how to behave in it.
I guess I just always thought it was obvious that you can't do better than nature. You can do different things, sure, but if a society of unique individuals wasn't the most effective way of making progress, nature itself would not have chosen it.
So in a way I think Yann is smart because he got money, but in a way I think he's a fucking idiot if he can't see just how very, very, very far we are from competing with organic intelligence.
conartist6•Mar 11, 2026
Not only that, but people like this aren't actually interested in understanding the physical world, because we don't understand it yet. If you care about understanding the world, I think you become someone more like Jane Goodall than Yann LeCun.
j4k0o•Mar 11, 2026
"You have to fear death to reason about the world and how to behave in it."
You're onto something there.
If everyone knew they were to die tomorrow, all of a sudden they'd choose to act differently. There is no logical thought process that determines that - it's something else. Something we can't concretely point toward as an object.
energy123•Mar 10, 2026
I don't understand why online learning is that necessary. If you took Einstein at 40 and surgically removed his hippocampus so he can't learn anything he didn't already know (meaning no online learning), that's still a very useful AGI. A hippocampus is a nice upgrade to that, but not super obviously on the critical path.
andy12_•Mar 10, 2026
That's true. Though would that hippocampus-less Einstein be able to keep making novel, complex discoveries from that point forward? It seems difficult. He would rapidly reach the limits of his short-term memory (the same way current models rapidly reach the limits of their context windows).
zelphirkalt•Mar 10, 2026
I guess the sheer amount and variety of information you would need to pre-encode to get an Einstein at 40 is huge: the stream of high-resolution video, actions, consequences, thoughts, and ideas he had for every single moment up to the age of 40. That includes social interactions, like a conversation together with the other person's expressions, what was said, and background knowledge about that person. Even a single conversation is a huge amount of data.
But one might say that the brain is not lossless ... True, good point. But in what way is it lossy? Can that be simulated well enough to learn an Einstein? What gives events significance is very subjective.
andsoitis•Mar 10, 2026
Where does that training data come from?
staticman2•Mar 10, 2026
> If you took Einstein at 40 and surgically removed his hippocampus so he can't learn anything he didn't already know (meaning no online learning), that's still a very useful AGI.
I like how people are accepting this dubious assertion that Einstein would be "useful" if you surgically removed his hippocampus and engaging with this.
It also calls this Einstein an AGI rather than a disabled human???
squeegmeister•Mar 10, 2026
Hypotheticals fear him
jeltz•Mar 10, 2026
It could possibly be useful but I don't see why it would be AGI.
a-french-anon•Mar 10, 2026
Kinda a moot point in my eyes because I very much doubt you can arrive at the same result without the same learning process.
daxfohl•Mar 10, 2026
He basically said that himself:
"Reading, after a certain age, diverts the mind too much from its creative pursuits. Any man who reads too much and uses his own brain too little falls into lazy habits of thinking".
-- Albert Einstein
A_D_E_P_T•Mar 10, 2026
You could have continual learning on text and still be stuck in the same "remixing baseline human communications" trap. It's a nasty one, very hard to avoid, possibly even structurally unavoidable.
As for the "just put a vision LLM in a robot body" suggestion: People are trying this (e.g. Physical Intelligence) and it looks like it's extraordinarily hard! The results so far suggest that bolting perception and embodiment onto a language-model core doesn't produce any kind of causal understanding. The architecture behind the integration of sensory streams, persistent object representations, and modeling time and causality is critically important... and that's where world models come in.
ben_w•Mar 10, 2026
> Models today are static, and human brains don't learn or adapt themselves with anything close to backpropagation.
While I suspect latter is a real problem (because all mammal brains* are much more example-efficient than all ML), the former is more about productisation than a fundamental thing: the models can be continuously updated already, but that makes it hard to deal with regressions. You kinda want an artefact with a version stamp that doesn't change itself before you release the update, especially as this isn't like normal software where specific features can be toggled on or off in isolation of everything else.
* I think. Also, I'm saying "mammal" because of an absence of evidence (to my *totally amateur* skill level) not evidence of absence.
program_whiz•Mar 10, 2026
They can be continuously updated, assuming you re-run representative samples of the training set through them continuously. Unlike a mammal brain, which preserves the function of neurons unless they activate in a situation that causes a training signal, deep nets suffer catastrophic forgetting because updates get scattered everywhere. If you had a model continuously learning about you in your pocket, without tons of cycles spent "remembering" old examples, it would quickly forget what it originally knew. In fact, this is a major stumbling block in standard training: sampling is a huge problem. If you just iterate through the training corpus, you'll have forgotten most of the English stuff by the time you finish with Chinese or Spanish. You have to constantly mix and balance training data due to this limitation.
The fundamental difference is that physical neurons have a discrete on/off activation, while digital "neurons" in a network are merely continuous differentiable operations. They also don't have a notion of "spike timing dependency" to avoid overwriting activations that weren't related to an outcome. There are things like reward decay over time, but this applies to the signal at a very coarse level; updates are still scattered across almost the entire system with every training example.
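If anyone wants to see that forgetting effect directly, here's a toy sketch (a setup I made up, not how any lab actually trains): a tiny two-layer net is trained on task A and then on task B with no rehearsal, versus on an interleaved mix of both; the sequential run typically ends up much worse on task A.

    import numpy as np

    rng = np.random.default_rng(0)
    DIM, HID = 8, 64

    # Two toy regression tasks sharing one network: the input carries a task flag,
    # so a single model has the capacity to solve both at once.
    w_a, w_b = rng.normal(size=DIM), rng.normal(size=DIM)

    def batch(task, n=64):
        x = rng.normal(size=(n, DIM))
        flag = np.tile([1.0, 0.0] if task == "A" else [0.0, 1.0], (n, 1))
        y = x @ (w_a if task == "A" else w_b)
        return np.hstack([x, flag]), y[:, None]

    def init():
        return [rng.normal(scale=0.3, size=(DIM + 2, HID)), np.zeros(HID),
                rng.normal(scale=0.3, size=(HID, 1)), np.zeros(1)]

    def sgd_step(params, x, y, lr=0.02):
        W1, b1, W2, b2 = params
        h = np.tanh(x @ W1 + b1)
        pred = h @ W2 + b2
        d_out = 2 * (pred - y) / len(x)          # d(mean squared error) / d(pred)
        d_h = (d_out @ W2.T) * (1 - h ** 2)      # backprop through tanh
        params[0] -= lr * x.T @ d_h
        params[1] -= lr * d_h.sum(0)
        params[2] -= lr * h.T @ d_out
        params[3] -= lr * d_out.sum(0)

    def loss(params, task):
        x, y = batch(task, 2048)
        W1, b1, W2, b2 = params
        pred = np.tanh(x @ W1 + b1) @ W2 + b2
        return round(float(np.mean((pred - y) ** 2)), 3)

    STEPS = 3000

    seq = init()                                 # sequential: all of A, then all of B
    for _ in range(STEPS): sgd_step(seq, *batch("A"))
    for _ in range(STEPS): sgd_step(seq, *batch("B"))

    mix = init()                                 # interleaved: keep replaying task A
    for s in range(2 * STEPS): sgd_step(mix, *batch("A" if s % 2 else "B"))

    print("sequential :", {"A": loss(seq, "A"), "B": loss(seq, "B")})
    print("interleaved:", {"A": loss(mix, "A"), "B": loss(mix, "B")})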
10xDev•Mar 10, 2026
The fact that models aren't continually updating seems more like a feature. I want to know the model is exactly the same as it was the last time I used it. Any new information it needs can be stored in its context window or stored in a file to read the next time it needs to access it.
kergonath•Mar 10, 2026
> The fact that models aren't continually updating seems more like a feature.
I think this is true to some extent: we like our tools to be predictable. But we’ve already made one jump by going from deterministic programs to stochastic models. I am sure the moment a self-evolutive AI shows up that clears the "useful enough" threshold we’ll make that jump as well.
10xDev•Mar 10, 2026
Stochastic and unpredictability aren't exactly the same. I would claim current LLMs are generally predictable even if it is not as predictable as a deterministic program.
kergonath•Mar 11, 2026
No, but my point is that to some extent we value determinism. By making the jump to stochastic models we already move away from the status quo; further jumps are entirely possible. Depending on use case we can accept more uncertainty if it comes with benefits.
I also don’t think there is a reason to believe that self-learning models must be unpredictable.
jnd-cz•Mar 10, 2026
Unless you use your own local models, you don't even know when OpenAI or Anthropic have tweaked the model, slightly or substantially. One week it's version x, next week it's version y. Just like your operating system is continuously evolving, from smaller patches of specific apps to a whole new kernel version and new OS release.
10xDev•Mar 10, 2026
There is still a huge gap between a model continuously updating itself and weekly patches by a specialist team. The former would make things unpredictable.
lxgr•Mar 10, 2026
Persistent memory through text in the context window is a hack/workaround.
And generally:
> I want to know the model is exactly the same as it was the last time I used it.
What exactly does that gain you, when the overall behavior is still stochastic?
But still, if it's important to you, you can get the same behavior by taking a model snapshot once we crack continuous learning.
edgyquant•Mar 11, 2026
It’s a feature of a good tool, but a sentient intelligence is more than just a tool
charcircuit•Mar 10, 2026
Agents have the ability of continual learning.
andy12_•Mar 10, 2026
Putting stuff you have learned into a markdown file is a very "shallow" version of continual learning. It can remember facts, yes, but I doubt a model can master new out-of-distribution tasks this way. If anything, I think that Google's Titans[1] and Hope[2] architectures are more aligned with true continual learning (without being actual continual learning still, which is why they call it "test-time memorization").
I have had it master tasks by doing this. The first time it tries to solve an issue it may take a long time, but it documents its findings and how it was able to do it and then it applies that knowledge the next time the task comes up.
andy12_•Mar 11, 2026
There are some things that just don't transfer really well without specific training. I tried to create diagrams in Typst with Cetz (a Processing- and Tikz-inspired graphing library), and even with documentation, GPT 5.2-thinking can't really do nice complex diagrams like it can in Tikz. It can do simple things that are similar to the shown examples, but nothing really interesting. Typst and especially Cetz are too new for any current model to really "get", so they can't use them. I need to wait for the next batch of frontier models so that they learn Typst and Cetz examples during pre-training.
patapong•Mar 11, 2026
It really reminds me of the movie Memento - it has to constantly put notes down to remember who it is and what it should do after waking up without memory every n minutes.
nurettin•Mar 10, 2026
Who knows? Perhaps attention really is all you need. Maybe our context window is really large. Or our compression is really effective. Perhaps adding external factors might be able to indirectly teach the models to act more in line with social expectations such as being embarrassed to repeat the same mistake, unlocking the final piece of the puzzle. We are still stumbling in the dark for answers.
jacquesm•Mar 10, 2026
The main difference is humans are learning all the time and models learn batch wise and forget whatever happened in a previous session unless someone makes it part of the training data so there is a massive lag.
Whoever cracks the continuous customized (per user, for instance) learning problem without just extending the context window is going to be making a big splash. And I don't mean cheats and shortcuts, I mean actually tuning the model based on received feedback.
aurareturn•Mar 11, 2026
Why not just provide more compute for, say, a 1-billion-token context for each user to mimic continuous learning, and then retrain the model in the background to include the learnings?
The user wouldn’t know if the continuous learning came from the context or the model retrained. It wouldn’t matter.
Continuous learning seems to be a compute and engineering problem.
jacquesm•Mar 11, 2026
Because that re-training is not strong enough to hold, or so it seems. The same dumb factual errors keep coming up on different generations of the same models. I've yet to see proof that something 'stuck' from model to model. They get better in a general sense but not in the specific sense that what was corrected stays put, not from session to session and not from one generation to the next.
My solution is to have this massive 'boot up' prompt but it becomes extremely tedious to maintain.
eloisant•Mar 11, 2026
They can write to files and then refer to them in the next session.
A bit like the main character played by Guy Pearce in the movie Memento (which doesn't work great for him, to be honest).
edgyquant•Mar 11, 2026
IIRC LeCun talks about a self-organizing hierarchy of real-world objects, and IMO this is exactly how the human brain actually learns
stanfordkid•Mar 11, 2026
It's pretty simple... the word "circle" and what you can correlate with it via English-language description have somewhat less to do with reality than a physical 3D model of a circle and what it would do in an environment. You can't just add more linguistic description via training data to change that. It doesn't really matter that you can keep back propagating, because what you are back propagating over is fundamentally and qualitatively less rich.
mxkopy•Mar 11, 2026
The reason LLMs fail today is because there’s no meaning inherent to the tokens they produce other than the one captured by cooccurrence within text. Efforts like these are necessary because so much of “general intelligence” is convention defined by embodied human experience, for example arrows implying directionality and even directionality itself.
anon7000•Mar 11, 2026
I don’t understand your view. Reality is that we need some way to encode the rules of the world in a more definitive way. If we want models to be able to make assertive claims about important information and be correct, it’s very fair to theorize they might need a more deterministic approach than just training them more. But it’s just a theory that this will actually solve the problem.
Ultimately, we still have a lot to learn and a lot of experiments to do. It’s frankly unscientific to suggest any approaches are off the table, unless the data & research truly proves that. Why shouldn’t we take this awesome LLM technology and bring in more techniques to make it better?
A really, really basic example is chess. Current top AI models still don't know how to play it (https://www.software7.com/blog/ai_chess_vs_1983_atari/). The models are surely trained on source material that includes chess rules, and even high-level chess games. But the models are not learning how to play chess correctly. They don't have a model to understand how chess actually works — they only have a non-deterministic prediction based on what they've seen, even after being trained on more data than any chess novice has ever seen about the topic. And this is probably one of the easiest things for AI to simulate. Very clear/brief rules, small problem space, no hidden information, but it can't handle the massive decision space because its prediction isn't based on the actual rules, but just on "things that look similar".
(And yeah, I’m sure someone could build a specific LLM or agent system that can handle chess, but the point is that the powerful general purpose models can’t do it out of the box after training.)
Maybe more training & self-learning can solve this, but it’s clearly still unsolved. So we should definitely be experimenting with more techniques.
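As a sketch of what such an agent system might look like (my own toy example, not a claim about any existing product; it assumes the python-chess package is installed, and propose_move is a hypothetical stand-in for whatever LLM call you'd actually make): a deterministic rules engine validates every move the model proposes and rejects illegal ones.

    import random
    import chess

    def propose_move(board: chess.Board) -> str:
        # Placeholder for an LLM proposing a move in SAN ("e4", "Nf3", ...).
        # Here it just guesses something plausible-looking, legal or not.
        return random.choice(["e4", "Nf3", "Qh5", "Ke8", "O-O"])

    def play_one_ply(board: chess.Board, max_retries: int = 5) -> bool:
        for _ in range(max_retries):
            san = propose_move(board)
            try:
                move = board.parse_san(san)   # raises ValueError if illegal in this position
            except ValueError:
                continue                      # ask the "model" again
            board.push(move)
            return True
        # Fall back to a random legal move so the game always continues.
        board.push(random.choice(list(board.legal_moves)))
        return False

    board = chess.Board()
    play_one_ply(board)
    print(board.fen())

The rules engine guarantees legality, but it obviously doesn't make the moves any good; that's the gap between knowing the rules and having a model of the game.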
andy12_•Mar 11, 2026
> Reality is that we need some way to encode the rules of the world in a more definitive way
I mean, sure. But do world models, the way LeCun proposes them, solve this? I don't think so. JEPAs are just unsupervised machine learning models at the end of the day; they might end up being better than just autoregressive pretraining on text+images+video, but they are not magic. For example, if you train a JEPA model on data of orbital mechanics, will it learn actually sensible algorithms to predict the planets' motions, or will it just learn a mix of heuristics?
slashdave•Mar 11, 2026
If your model is poor, no amount of learning can fix it. If you don't think your model architecture is limited, you aren't looking hard enough.
the_black_hand•Mar 11, 2026
Yes, those are bottlenecks that world models don't solve. But the promise of world models is that, unlike LLMs, they might be able to learn things about the world that humans haven't written down. For example, we still don't fully know how insects fly. A world model could be trained on thousands of videos of insects and make a novel observation about insect trajectories. The premise is that despite being here for millennia, humans have only observed a tiny fraction of the world.
So I do buy his idea. But I disagree that you need world models to get to human level capabilities. IMO there's no fundamental reason why models can't develop human understanding based on the known human observations.
eloisant•Mar 11, 2026
LeCun is a researcher.
From his point of view, there is not much research left to do on LLMs. Sure, we can still improve them a bit with engineering around them, but he's more interested in basic research.
a1371•Mar 11, 2026
I never understood why we believe humans don't backprop. Isn't it that during the day we fill up our context (short term memory) and sleep is actually where we use that to backprop? Heck, everyone knows what "sleep on it" means.
cedilla•Mar 11, 2026
Brains are not doing linear algebra, and they don't follow a concise algorithm.
What LLMs do is even further from what biological neural nets do, and even there, artificial neurons are inspired by, but do not reimplement, biological neurons.
You can understand human thought in terms of LLMs, but that is just a simile, like understanding physical reality in terms of computers or clockworks.
carlmr•Mar 11, 2026
In particular, they will require even more compute to get anything close to usable output. Human brains are super efficient at learning and producing output. We will need exponentially more compute for real-time learning from video + audio + haptic data.
energy123•Mar 10, 2026
Why can't LLMs (transformers trained on multimodal token sequences, potentially containing spatiotemporal information) be a world model?
> One major critique LeCun raises is that LLMs operate only in the realm of language, which is a simple, discrete space compared to the continuous, complex physical world we live in. LLMs can solve math problems or answer trivia because such tasks reduce to pattern completion on text, but they lack any meaningful grounding in physical reality. LeCun points out a striking paradox: we now have language models that can pass the bar exam, solve equations, and compute integrals, yet “where is our domestic robot? Where is a robot that’s as good as a cat in the physical world?” Even a house cat effortlessly navigates the 3D world and manipulates objects — abilities that current AI notably lacks. As LeCun observes, “We don’t think the tasks that a cat can accomplish are smart, but in fact, they are.”
energy123•Mar 10, 2026
But they don't only operate on language? They operate on token sequences, which can be images, coordinates, time, language, etc.
kergonath•Mar 10, 2026
It’s an interesting observation, but I think you have it backwards. The examples you give are all using discrete symbols to represent something real and communicating this description to other entities. I would argue that all your examples are languages.
samrus•Mar 10, 2026
What does the first L stand for? That's not just vestigial; their model of the world is formed almost exclusively from language, rather than from a range of inputs contributing significantly, as it is for humans.
The biggest thing that's missing is actual feedback on their decisions. They have no "idea" of that, because transformers and embeddings don't model that yet. And language descriptions and image representations of feedback aren't enough. They are too disjointed. It needs more.
mrguyorama•Mar 10, 2026
How is a linear stream of symbols able to capture the relationships of the real world?
It's like the people who are so hyped up about voice-controlled computers. Like, you get that a linear stream of symbols is a huge downgrade in signal, right? I don't want computer interaction to be yet more simplified and worsened.
Compare with domain experts who do real, complicated work with computers, like animators, 3D modelers, CAD, etc. A mouse with six degrees of freedom, and a strong training in hotkeys to command actions and modes, and a good mental model of how everything is working, and these people are dramatically more productive at manipulating data than anyone else.
Imagine trying to talk a computer through nudging a bunch of vertices through 3D space while flexibly managing modes of "drag" on connected vertices. It would be terrible. And no, you would not replace that with a sentence like "Bot, I want you to nudge out the elbow of that model", because that does NOT do the same thing at all. An expert being able to fluidly make their idea reality in real time is not even remotely close to the "project manager / mediocre implementer" relationship you get prompting any sort of generative model. The models aren't even built to contain a specific "style", so they certainly won't be opinionated enough to have artistic vision, a strong understanding of what does and does not work in the right context, or the ability to navigate "my boss wants something stupid that doesn't work, and he's a dumb person, so how do I convince him to stop the dumb idea and make him think that was his idea?"
mrguyorama•Mar 10, 2026
>We don’t think the tasks that a cat can accomplish are smart, but in fact, they are.
All the things we look at as "Smart" seem to be the things we struggle with, not what is objectively difficult, if that can even be defined.
LarsDu88•Mar 10, 2026
I really hate the world-model terminology, but LeCun's actual low-level gripe with autoregressive LLMs as they stand now is that the loss function needs to reconstruct the entirety of the input. Anything less than pixel-perfect reconstruction on images is penalized. Token-by-token reconstruction is also biased towards that same level of granularity.
The density of information in the spatiotemporal world is very very great, and a technique is needed to compress that down effectively. JEPAs are a promising technique towards that direction, but if you're not reconstructing text or images, it's a bit harder for humans to immediately grok whether the model is learning something effectively.
I think that very soon we will see JEPA-based language models, but their key domain may very well be robotics, where machines really need to experience and reason about the physical world differently than a purely text-based one.
energy123•Mar 10, 2026
Isn't the Sora video model a ViT with spatiotemporal inputs (so they've found a way to compress that down), while at the same time LeCun wouldn't consider it a world model?
LarsDu88•Mar 10, 2026
VideoGen models have to have decoder output heads that reproduce pixel-level frames. The loss function involves producing plausible image frames, which requires a lot of detailed reconstruction.
I assume that when you get out of bed in the morning, the first thing you do is not paint 1000 1080p pictures of what your breakfast looks like.
LeCun's models predict purely in representation space and output no pixel-scale detailed frames. Instead you train a model to generate a lower-dimensional representation of the same thing from different views, penalizing the model if the representations differ when looking at the same thing.
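A toy version of that objective, as I read it (a sketch of the general idea in PyTorch, not LeCun's or AMI's actual architecture): encode two views of the same scene, predict one embedding from the other, and penalize distance in representation space instead of reconstructing pixels.

    # Toy JEPA-style objective: predict in representation space, no pixel decoder.
    # Sketch of the general idea only; a real JEPA also needs an EMA target
    # encoder and anti-collapse machinery, which are omitted here.
    import torch
    import torch.nn as nn

    enc = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 256), nn.ReLU(),
                        nn.Linear(256, 64))        # shared encoder
    predictor = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 64))

    view_a = torch.randn(8, 3, 64, 64)   # one view of a scene (e.g. a crop/frame)
    view_b = torch.randn(8, 3, 64, 64)   # another view of the same scene

    z_a = enc(view_a)                    # online embedding
    with torch.no_grad():
        z_b = enc(view_b)                # target embedding (stop-gradient)

    loss = ((predictor(z_a) - z_b) ** 2).mean()   # distance in latent space,
    loss.backward()                               # not pixel reconstruction

Nothing in the loss ever asks for a decodable image, which is exactly the "don't paint your breakfast" point.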
Unearned5161•Mar 10, 2026
I have a pet peeve with the concept of "a genuinely novel discovery or invention", what do you imagine this to be? Can you point me towards a discovery or invention that was "genuinely novel", ever?
I don't think it makes sense conceptually unless you're literally referring to discovering new physical things like elements or something.
Humans are remixers of ideas. That's all we do all the time. Our thoughts and actions are dictated by our environment and memories; everything must necessarily be built up from pre-existing parts.
A_D_E_P_T•Mar 10, 2026
Suno is transformer-based; in a way it's a heavily modified LLM.
You can't get Suno to do anything that's not in its training data. It is physically incapable of inventing a new musical genre. No matter how detailed the instructions you give it, and even if you cheat and provide it with actual MP3 examples of what you want it to create, it is impossible.
The same goes for LLMs and invention generally, which is why they've made no important scientific discoveries.
I don't see how this is an architectural problem though. The problem is that music datasets are highly multimodal, and the training process is relying almost entirely on this dataset instead of incorporating basic musical knowledge to allow it to explore a bit further. That's what happens when computer scientists aim to "upset" a field without consulting with experts in said field.
davidfarrell•Mar 10, 2026
W. Brian Arthur's book "The Nature of Technology" provides a framework for classifying new technology as elemental vs innovative that I find helpful. For example, the Hunt-McIlroy diff operates on the phenomenon that ordered correspondence survives editing. That was an invention (discovery of a natural phenomenon and a means to harness it). Myers diff improves the performance by exploiting the fact that text changes are sparse. That's innovation. A Python app using libdiff, that's engineering.
And then you might say in terms of "descendants": invention > innovation > engineering. But it's just a perspective.
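As an aside, the "ordered correspondence survives editing" phenomenon is easy to see with any diff tool. A quick illustration using Python's stdlib difflib (which uses its own matching algorithm rather than Hunt-McIlroy or Myers, but exploits the same phenomenon):

    # Most lines survive an edit unchanged and in order, so a diff can anchor on them.
    import difflib

    old = ["def add(a, b):", "    return a + b", "", "print(add(1, 2))"]
    new = ["def add(a, b):", "    # now with logging", "    return a + b", "",
           "print(add(2, 3))"]

    for line in difflib.unified_diff(old, new, lineterm=""):
        print(line)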
0x3f•Mar 10, 2026
Novel things can be incremental. I don't think LLMs can do that either, at least I've never seen one do it.
bonesss•Mar 10, 2026
Genuinely novel discovery or invention?
Einstein’s theory of relativity springs to mind, which is deeply counter-intuitive and relies on the interaction of forces unknowable to our basic Newtonian senses.
There’s an argument that it’s all turtles (someone told him about universes, he read about gravity, etc.), but there are novel maths, and novel types of math, that arise around and for such theories, which would indicate an objective, positive expansion of understanding and concept volume.
jungturk•Mar 10, 2026
Nah - Poincare & Lorentz did quite a bit of groundwork on relativity and its implications before Einstein put it all together.
The term LLM is confusing your point because VLMs belong to the same bin according to Yann.
Using the term autoregressive models instead might help.
kadushka•Mar 10, 2026
Diffusion models are not autoregressive but have the same limitations
jnd-cz•Mar 10, 2026
The sum of human knowledge is more than enough to come up with innovative ideas, and not every field works directly with the physical world. Still, I would say there's enough information in written history to create a virtual simulation of a 3D world with all physical laws applying (to a certain degree, because computation is limited).
What current LLMs lack is inner motivation to create something on their own without being prompted. To think in their free time (whatever that means for batch, on demand processing), to reflect and learn, eventually to self modify.
I have a simple brain, limited knowledge, limited attention span, limited context memory. Yet I create stuff based on what I see and read online. Nothing special, sometimes based more on someone else's project, sometimes on my own ideas, which I have no doubt aren't that unique among 8 billion other people. Yet consulting with AI provides me with more ideas applicable to my current vision of what I want to achieve. Sure, it's mostly based on generally known (not always known to me) good practices. But my thoughts work the same way, only more limited by what I have slowly learned so far in my life.
daxfohl•Mar 10, 2026
I guess you need two things to make that happen. First, more specialization among models and an ability to evolve, else you get all instances thinking roughly the same thing, or deer in the headlights where they don't know what of the millions of options they should think about. Second, fewer guardrails; there's only so much you can do by pure thought.
The problem is, idk if we're ready to have millions of distinct, evolving, self-executing models running wild without guardrails. It seems like a contradiction: you can't achieve true cognition from a machine while artificially restricting its boundaries, and you can't lift the boundaries without impacting safety.
jandrewrogers•Mar 10, 2026
> virtual simulation of 3d world
Virtual simulations are not substitutable for the physical world. They are fundamentally different theory problems that have almost no overlap in applicability. You could in principle create a simulation with the same mathematical properties as the physical world but no one has ever done that. I'm not sure if we even know how.
Physical world dynamics are metastable and non-linear at every resolution. The models we do build are created from sparse irregular samples with large error rates; you often have to do complex inference to know if a piece of data even represents something real. All of this largely breaks the assumptions of our tidy sampling theorems in mathematics. The problem of physical world inference has been studied for a couple decades in the defense and mapping industries; we already have a pretty good understanding of why LLM-style AI is uniquely bad at inference in this domain, and it mostly comes down to the architectural inability to represent it.
Grounded estimates of the minimum quantity of training data required to build a reliable model of physical world dynamics, given the above properties, are many exabytes. This data exists, so that is not a problem. The models will be orders of magnitude larger than current LLMs. Even if you solve the computer science and theory problems around representation so that learning and inference are efficient, few people are prepared for the scale of it.
(source: many years doing frontier R&D on these problems)
MITSardine•Mar 10, 2026
> You could in principle create a simulation with the same mathematical properties as the physical world but no one has ever done that. I'm not sure if we even know how.
What do you mean by that? Simulating physics is a rich field, which incidentally was one of the main drivers of parallel/super computing before AI came along.
jandrewrogers•Mar 10, 2026
The mapping of the physical world onto a computer representation introduces idiosyncratic measurement issues for every data point. The idiosyncratic bias, errors, and non-repeatability change dynamically at every point in space and time, so they can be modeled neither globally nor statically. Some idiosyncratic bias exhibits coupling across space and time.
Reconstructing ground truth from these measurements, which is what you really want to train on, is a difficult open inference problem. The idiosyncratic effects induce large changes in the relationships learnable from the data model. Many measurements map to things that aren't real. How badly that non-reality can break your inference is context dependent. Because the samples are sparse and irregular, you have to constantly model the noise floor to make sure there is actually some signal in the synthesized "ground truth".
In simulated physics, there are no idiosyncratic measurement issues. Every data point is deterministic, repeatable, and well-behaved. There is also much less algorithmic information, so learning is simpler. It is a trivial problem by comparison. Using simulations to train physical world models is skipping over all the hard parts.
I've worked in HPC, including physics models. Taking a standard physics simulation and introducing representative idiosyncratic measurement effects seems difficult. I don't think we've ever built a physics simulation with remotely the quantity and complexity of fine structure this would require.
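Very roughly, the gap looks like this: the simulator gives you the first array below, while what you actually get to train on is the last two. Numbers are arbitrary and this is only meant to illustrate the shape of the problem, not any particular sensor.

    # Clean simulated field vs. "measurements" with spatially varying bias,
    # heteroscedastic noise, dropouts and false returns. Purely illustrative.
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(0, 10, size=2000)        # sparse, irregular sample sites
    truth = np.sin(x) + 0.1 * x              # deterministic "physics"

    bias = 0.3 * np.sin(0.7 * x + 2.0)       # slowly varying systematic bias
    sigma = 0.05 + 0.2 * (x / 10)            # error level changes with location
    measured = truth + bias + rng.normal(0, sigma)

    kept = rng.random(x.size) > 0.3          # 30% of samples simply missing
    ghosts = rng.random(x.size) < 0.02       # a few returns that aren't real
    measured[ghosts] = rng.uniform(-2, 2, ghosts.sum())

    x_obs, y_obs = x[kept], measured[kept]   # the training data you actually have

Even this cartoon version makes the bias and noise static functions of position, which is already more generous than the dynamically changing, coupled effects described above.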
infinite8s•Mar 10, 2026
Is this like some scale-independent version of Heisenberg's Uncertainty Principle?
MITSardine•Mar 11, 2026
I'm probably missing most of your point, but wouldn't the fact that we have inverse problems being applied in real-world situations somewhat contradict your qualms? In those cases too, we have to deal with noisy real-world information.
I'll admit I'm not very familiar with that type of work - I'm in the forward solve business - but if assumptions are made on the sensor noise distribution, couldn't those be inferred by more generic models? I realize I'm talking about adding a loop on top of an inverse problem loop, which is two steps away (just stuffing a forward solve in a loop is already not very common due to cost and engineering difficulty).
Or better yet, one could probably "primal-adjoint" this and just solve at once for physical parameters and noise model, too. They're but two differentiable things in the way of a loss function.
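To make the "two differentiable things in the way of a loss function" idea concrete, here is a toy sketch of jointly fitting a physical parameter and a noise scale by maximum likelihood, using autograd on a made-up exponential-decay forward model:

    # Toy joint fit of a physical parameter (k) and a noise model (sigma)
    # by minimizing a Gaussian negative log-likelihood. Purely illustrative.
    import torch

    t = torch.linspace(0, 5, 200)
    true_k, true_sigma = 0.8, 0.15
    y = torch.exp(-true_k * t) + true_sigma * torch.randn_like(t)   # noisy data

    k = torch.tensor(0.3, requires_grad=True)           # physical parameter
    log_sigma = torch.tensor(0.0, requires_grad=True)   # noise model parameter
    opt = torch.optim.Adam([k, log_sigma], lr=0.05)

    for _ in range(500):
        opt.zero_grad()
        pred = torch.exp(-k * t)                        # differentiable forward solve
        nll = 0.5 * (((y - pred) / log_sigma.exp()) ** 2).mean() + log_sigma
        nll.backward()
        opt.step()

    print(k.item(), log_sigma.exp().item())             # should land near 0.8 and 0.15

Of course, a real forward solve is many orders of magnitude more expensive than torch.exp, which is exactly the cost and engineering difficulty mentioned above.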
robrenaud•Mar 10, 2026
Was Alphago's move 37 original?
In the last step of training LLMs, reinforcement learning from verified rewards, LLMs are trained to maximize the probability of solving problems using their own output, depending on a reward signal akin to winning in Go. It's not just imitating human written text.
Fwiw, I agree that world models and some kind of learning from interacting with physical reality, rather than massive amounts of digitized gym environments is likely necessary for a breakthrough for AGI.
ml-anon•Mar 10, 2026
Honestly, how do people who know so little have this much confidence to post here?
> LLMs are fundamentally capped because they only learn from static text -- human communications about the world -- rather than from the world itself, which is why they can remix existing ideas but find it all but impossible to produce genuinely novel discoveries or inventions.
No hate, but this is just your opinion.
The definition of "text" here is extremely broad – an SVG is text, but it's also an image format. It's not inconceivable that an AI model trained on lots of SVG "text" might build internal models to help it "visualise" SVGs, in the same way you might visualise objects in your mind when you read a description of them.
The human brain only has electrical signals for IO, yet we can learn and reason about the world just fine. I don't see why the same wouldn't be possible with textual IO.
daxfohl•Mar 10, 2026
Yeah I don't even think you'd need to train it. You could probably just explain how SVG works (or just tell it to emit coordinates of lines it wants to draw), and tell it to draw a horse, and I have to imagine it would be able to do so, even if it had never been trained on images, svg, or even cartesian coordinates. I think there's enough world model in there that you could simply explain cartesian coordinates in the context, it'd figure out how those map to its understanding of a horse's composition, and output something roughly correct. It'd be an interesting experiment anyway.
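The mechanical half of that experiment is tiny. Assuming the model hands back line segments as (x1, y1, x2, y2) tuples (the segments below are a placeholder, not real model output), turning them into an SVG you can open in a browser is just:

    # Plumbing for the "draw something from coordinates" experiment.
    segments = [(10, 80, 40, 30), (40, 30, 90, 30), (90, 30, 110, 80),
                (40, 30, 45, 10), (45, 10, 55, 12)]   # placeholder, not a horse

    lines = "\n".join(
        f'<line x1="{x1}" y1="{y1}" x2="{x2}" y2="{y2}" stroke="black"/>'
        for x1, y1, x2, y2 in segments
    )
    svg = (f'<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 120 100">'
           f'{lines}</svg>')

    with open("drawing.svg", "w") as f:
        f.write(svg)

Everything interesting in the experiment is in whether the coordinates the model emits actually compose into a recognizable object.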
But yeah, I can't imagine that LLMs don't already have a world model in there. They have to. The internet's corpus of text may not contain enough detail to allow a LLM to differentiate between similar-looking celebrities, but it's plenty of information to allow it to create a world model of how we perceive the world. And it's a vastly more information-dense means of doing so.
masteranza•Mar 10, 2026
A few years ago I came up with this simple thought experiment to convince myself that LLMs won't achieve superhuman level (in the sense of being better than all human experts):
Imagine that we made an LLM out of all dolphin songs ever recorded, would such LLM ever reach human level intelligence? Obviously and intuitively the answer is NO.
Your comment actually extended this observation for me, sparking hope that systems consuming the natural world as input might avoid this trap, but then I realized that tool use & learning may in fact be all that's needed for a singularity, while consuming raw data streams most of the time might actually be counterproductive.
kadushka•Mar 10, 2026
> Imagine that we made an LLM out of all dolphin songs ever recorded, would such LLM ever reach human level intelligence?
It could potentially reach super-dolphin level intelligence
hodgehog11•Mar 10, 2026
I mean no offense here, but I really don't like this attitude of "I thought for a bit and came up with something that debunks all of the experts!". It's the same stuff you see with climate denialism, but it seems to be considered okay when it comes to AI. As if the people who spend all day, every day on this for decades have not thought of it.
Dataset limitations have been well understood since the dawn of statistics-based AI, which is why these models are trained on data and RL tasks that are as wide as possible, and are assessed by generalization performance. Most of the experts in ML, even the mathematically trained ones, within the last few years acknowledge that superintelligence (under a more rigorous definition than the one here) is quite possible, even with only the current architectures. This is true even though no senior researcher in the field really wants superintelligence to be possible, hence the dozens of efforts to disprove its potential existence.
smokel•Mar 11, 2026
> Imagine that we made an LLM out of all dolphin songs ever recorded, would such LLM ever reach human level intelligence? Obviously and intuitively the answer is NO.
Not so fast. People have built pretty amazing thought frameworks out of a few axioms, a few bits, or a few operations in a Turing machine. Dolphin songs are probably more than enough to encode the game of life. It's just how you look at it that makes it intelligence.
ljm•Mar 10, 2026
I'm gonna be a cynic and say this is money following money and Yann LeCun is an excellent salesman.
I 100% guarantee that he will not be holding the bag when this fails. Society will be protecting him.
On that proviso I have zero respect for this guy.
thinkling•Mar 11, 2026
Um, why would anyone be "holding the bag" and who needs protecting by society? He's not taking out a loan, he's getting capital investment in a startup. People are gambling that he will do well and make money for them. If they gamble wrong, that's on them. Society won't be doing anything either way because investors in startups that fail don't get anything.
roromainmain•Mar 10, 2026
Agree. LLMs operate in the domain of language and symbols, but the universe contains much more than that. Humans also learn a great deal from direct phenomenological experience of the world, even without putting those experiences into words.
I remember a talk by Yann LeCun where he pointed out that in just the first couple of years of life, a human baby is exposed to orders of magnitude more sensory data (vision, sound, etc.) than what current LLMs are typically trained on. This seems like a major limitation of purely language-based models.
_s_a_m_•Mar 11, 2026
Really? As if everyone hasn't been telling him this for the last 10 years, especially Gary Marcus, whom he ridiculed on Twitter at every opportunity, and now, silently, like a dog returning home, he switches to Gary's position. As if anyone was waiting for this; even 5 years ago this was old news, and Tenenbaum has been building world models for a long time. People in pop venture-capital culture don't seem to know what is going on in research. Makes them easier to milk.
jimbo808•Mar 11, 2026
You're right that world models are the bottleneck, but people underestimate the staggering complexity gap between modeling the physical world and modeling a one-dimensional stream of text. Not only is the real world high-dimensional, continuous, noisy, and vastly more information dense, it's also not something for which there is an abundance of training data.
mountainriver•Mar 11, 2026
Okay but most modern LLMs are multimodal, and it’s fairly easy to make an LLM multimodal.
Also there is no evidence that novel discoveries are more than remixes. This is heavily debated but from what we’ve seen so far I’m not sure I would bet against remix.
World models are great for specific kinds of RL or MPC. Yann is betting heavily on MPC, I’m not sure I agree with this as it’s currently computationally intractable at scale
uoaei•Mar 11, 2026
> There are a lot more degrees of freedom in world models.
Perhaps for the current implementations this is true. But the reason the current versions keep failing is that world dynamics has multiple orders of magnitude fewer degrees of freedom than the models that are tasked to learn them. We waste so much compute learning to approximate the constraints that are inherent in the world, and LeCun has been pressing the point the past few years that the models he intends to design will obviate the excess degrees of freedom to stabilize training (and constrain inference to physically plausible states).
If my assumption is true then expect Max Tegmark to be intimately involved in this new direction.
8bitsrule•Mar 11, 2026
Gotta say, good luck with that effort. Lenat started Cyc 42 years ago, and after a while it seemed to disappear. 'Understanding' the 'physical world' is something that a few -may- start to approach intuitively after a decade or five of experience (Einstein, Maxwell, et al.). But the idea of feeding a machine facts and equations ... and dependence on human observations ... seems unlikely to lead to 'mastering the physical world'. Let alone for $1 billion.
mirekrusin•Mar 11, 2026
Thank you for not saying "language", but "text".
It's true, but it's also true that text is very expressive.
Programming languages (huge, formalized expressiveness), math and other formal notation, SQL, HTML, SVG, JSON/YAML, CSV, domain specific encoding ie. for DNA/protein sequences, for music, verilog/VHDL for hardware, DOT/Graphviz/Mermaid, OBJ for 3D, Terraform/Nix, Dockerfiles, git diffs/patches, URLs etc etc.
The scope is very wide and covers enough to be called generic especially if you include multi modalities that are already being blended in (images, videos, sound).
I'm cheering for Yann, hope he's right and I really like his approach to openness (hope he'll carry it over to his new company).
At the same time, current architectures do exist now and do work, by far exceeding his or anybody else's expectations, and they continue to do so. It may also be true that they're here to stay for a long time for text and the other supported modalities, since they're cheaper to train.
vidarh•Mar 11, 2026
It's just not true that LLMs are limited to "static text". Data is data. Sensory input is still just data, and multimodal models have been a thing for a while. Ongoing learning and more extensive short-term memory are a challenge, so I am all for research into alternative architectures, but so much of the discourse about the limitations of LLMs acts as if they have limitations they do not have.
slibhb•Mar 11, 2026
> LLMs are fundamentally capped because they only learn from static text -- human communications about the world -- rather than from the world itself, which is why they can remix existing ideas but find it all but impossible to produce genuinely novel discoveries or inventions.
This seems wrong to me on a few levels.
First, there is no way to "experience the world directly," all experience is indirect, and language is a very good way of describing the world. If language was a bad choice or limited in some fundamental way, LLMs wouldn't work as well as they do.
Second, novel ideas are often existing ideas remixed. It's hard/impossible to point to any single idea that sprung from nowhere.
Third, you can provide an LLM with real-world information and suddenly it's "interacting with the world". If I tell an LLM about the US war on Iran, I am in a very real sense plugging it into the real world, something that isn't part of its training data.
Finally, modern LLMs are multi-modal, meaning they have the ability to handle images/video. My understanding is that they use some kind of adapter to turn non-text data into data that the LLM can make sense of.
A_D_E_P_T•Mar 11, 2026
Re 1: You experience the world in real time (or close enough) via your senses, which combine to form a spatiotemporal sense: A sense of being a bounded entity in space and time. The LLM has none of that. They experience the world via stale old text and text derivatives.
Re 2: There's something tremendous in the fact, staring us right in the face, that LLMs are unable to meaningfully contribute to academic/medical research. I'm not saying that they need to perform on the level of a one-in-a-million Maxwell, DaVinci, or whatever. But as Dwarkesh asked one year ago: "What do you make of the fact that these things have basically the entire corpus of human knowledge memorized and they haven't been able to make a single new connection that has led to a discovery?"
Re 3: Sure, you can hold it by the hand and spoonfeed it. You can also create for it a mirror reality which doesn't exist, which is pure fiction. Given how limited these systems are, I don't suppose it makes much of a difference. There's no way for it to tell. The "human in the loop" is its interaction with the world. And a pale, meager interaction it is.
Re 4: Static, old images/video that they were trained on some months ago. That, too, is no way of interacting with the world.
slibhb•Mar 11, 2026
> Re 1: You experience the world in real time (or close enough) via your senses, which combine to form a spatiotemporal sense: A sense of being a bounded entity in space and time. The LLM has none of that. They experience the world via stale old text and text derivatives.
It's not clear to me that this is a fundamental limitation. If you provide LLMs with a news feed, it's closer to real-time. You can incrementally get closer than that in very obvious ways.
> Re 2: There's something tremendous in the fact, staring us right in the face, that LLMs are unable to meaningfully contribute to academic/medical research. I'm not saying that they need to perform on the level of a one-in-a-million Maxwell, DaVinci, or whatever. But as Dwarkesh asked one year ago: "What do you make of the fact that these things have basically the entire corpus of human knowledge memorized and they haven't been able to make a single new connection that has led to a discovery?"
LLMs have been around for a very short time. It wouldn't surprise me if researchers have used them to make discoveries. If they haven't, they will soon. Then there's a question about attribution... if you're a researcher and you use an LLM to discover something, do you give it credit? Or is it just a tool? There's a long, long history of researchers being less than honest about how they made some discovery.
> Re 3: Sure, you can hold it by the hand and spoonfeed it. You can also create for it a mirror reality which doesn't exist, which is pure fiction. Given how limited these systems are, I don't suppose it makes much of a difference. There's no way for it to tell. The "human in the loop" is its interaction with the world. And a pale, meager interaction it is.
Our perception of reality is meager too. You can easily imagine how an LLM could be "plugged in" to reality. Again nothing fundamental here.
> Re 4: Static, old images/video that they were trained on some months ago. That, too, is no way of interacting with the world.
No, you can send an LLM a video/image and it can "understand it". It's not perfect but, like I said, the technology is already here to project video data into something the LLMs can interact with.
crazygringo•Mar 11, 2026
> What do you make of the fact that these things have basically the entire corpus of human knowledge memorized and they haven't been able to make a single new connection that has led to a discovery?
If that's what you're experiencing, then you're not asking them the right questions.
If you're at the edge of your field so you're able to judge whether something is novel or not, and you have a direction you'd like the LLM to explore, just ask it. Prompt it to come up with some ideas of how to solve X, or categorize Y, or analyze Z. Encourage it to take ideas from, or find parallels in, closely related or distantly related fields.
You will probably quickly find yourself with a ton of new ideas, of varying quality, in the same way as if you were brainstorming with a colleague.
But they don't work "solo". They need you to guide the conversation. But when you do, they're chock-full of new ideas and connections and discoveries. But again -- just like with people, the quality varies. If you're looking for a good startup idea, you need to sift through hundreds. Similarly, if you're looking for an idea for a paper you could publish, there are a lot of hypotheses to sift through. And you're supplying your own expert "good taste" to try to determine what's worth pursuing and developing further, etc.
LLMs don't just magically come up with new proven discoveries unprompted. But they turn out to be fantastic research and idea-generation partners. They excel at combining existing related-but-distant facts and models and interpretations in novel ways.
general1465•Mar 10, 2026
Here you can see why it is so hard to compete with US startups as a European startup: abysmal access to money. A $1B investment in Europe is glorified as the largest seed ever, but in the USA it is just another Tuesday.
weego•Mar 10, 2026
A billion seed is not an every day event anywhere.
mattmaroon•Mar 10, 2026
Not at all. A quick google turns up evidence of 4. There may be more but I think probably not many.
s08148692•Mar 10, 2026
For a foundation AI lab with a world famous AI researcher at the helm though, it's not so impressive. Won't even touch the sides of the hardware costs they'd need to be anywhere near competitive
oceansky•Mar 10, 2026
A startup reaching a $1B valuation is so rare that such companies are called unicorns.
As the other commenter pointed out, this is a $1B seed.
ArnoVW•Mar 10, 2026
Actually, they raised $1.03 billion at a $3.5 billion valuation.
compounding_it•Mar 10, 2026
Europeans have free healthcare and retirement. They prefer putting their money toward long-term benefits, not just becoming CEO on Tuesday and declaring bankruptcy on Wednesday.
MrBuddyCasino•Mar 10, 2026
„free“
ExpertAdvisor01•Mar 10, 2026
Free healthcare and retirement?
ExpertAdvisor01•Mar 10, 2026
It is a universal system, but definitely not free.
In Germany you pay on average 17.5% of your salary for health insurance and 18.6% for retirement.
However, contribution caps exist: €70k for healthcare and €100k for retirement.
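Taking those figures at face value, the caps mean the headline percentages stop applying above a certain income: someone on a €120k gross salary would pay roughly 0.175 × €70k ≈ €12,250 toward healthcare and 0.186 × €100k ≈ €18,600 toward retirement per year, so the effective rate falls as income rises past the caps.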
general1465•Mar 10, 2026
It is not free, we just pay taxes.
ExpertAdvisor01•Mar 10, 2026
Retirement is the worst.
You are basically forced to pay into an unsustainable system (at least in Germany).
It already has to be subsidized by taxes.
joe_mamba•Mar 10, 2026
Exactly. State retirement in Europe is neither free nor great. We pay extra in taxes for it, and it's only great for present-day retirees, not for those paying into the system right now who will retire in the future. It's the same as US Social Security; it's not some extra perk that Europeans have over Americans.
Top tier scientists aren't gonna be swayed by European state retirement systems.
dude250711•Mar 10, 2026
Yes, the faster they get used to the thought that losing a billion is not a big deal, the better.
rvz•Mar 10, 2026
Once again, US companies and VCs are in this seed round. Just like Mistral with their seed round.
Europe again missing out, until AMI reaches a much higher valuation with an obvious use case in robotics.
Either AMI reaches a $100B+ valuation (likely), or it becomes a Thinking Machines Lab with investors questioning its valuation (very unlikely, since world models have a use case in vision and robotics).
thibaut_barrere•Mar 10, 2026
It is well enough to attract worthy talents & produce interesting outcomes.
embedding-shape•Mar 10, 2026
> Europe again missing out
I can't read the article, but American investors investing into European companies, isn't US the one missing out here? Or does "Europe" "win" when European investors invest in US companies? How does that work in your head?
joe_mamba•Mar 10, 2026
>isn't US the one missing out here?
Why would the US miss out here? The US invests in something = the US owns part of something.
This isn't a zero sum game.
embedding-shape•Mar 10, 2026
> Why would the US miss out here?
Personally I don't believe anyone is missing out on anything here.
But rvz earlier claimed that Europe is missing out, because US investors are investing in a European company. That's kind of surprising to me, so asking if they also believe that the US is "missing out" whenever European investors invest in US companies, or if that sentiment only goes one way.
insydian•Mar 10, 2026
As someone in the tech Twitter sphere, this is Yann and his ideas performing a suplex on LLM-based companies. It is completely unfathomable: start an AI research company, sell off only 20%, and have $1 billion for screwing around for a few years.
insydian•Mar 10, 2026
I liken this to watching a godzilla esque movie. Just grab some popcorn and enjoy the ride.
Adds up: we are seeing a clear exodus of both capital and talent from the US - with the current US administration's shift toward cronyism - and the EU stands as the most compelling alternative, with a uniform market of 500 million people and the last major federation truly committed to the rule of law.
drstewart•Mar 10, 2026
"Exodus of capital" as if OpenAI didn't just raise 115b
gmerc•Mar 10, 2026
That's a bonfire of capital into a gaping hole in the ground with zero chance outside of "military pork" and "overcharging the taxpayer" to ever make their money back.
The brain capital loss here is what's going to spook investors.
whiplash451•Mar 10, 2026
You lost me at “uniform”…
ZeroCool2u•Mar 10, 2026
Regardless of your opinion of Yann or his views on auto regressive models being "sufficient" for what most would describe as AGI or ASI, this is probably a good thing for Europe. We need more well capitalized labs that aren't US or China centric and while I do like Mistral, they just haven't been keeping up on the frontier of model performance and seem like they've sort of pivoted into being integration specialists and consultants for EU corporations. That's fine and they've got to make money, but fully ceding the research front is not a good way to keep the EU competitive.
Hm, Singapore looks more like "one of their bases"; they will have offices in Paris, Montréal, Singapore and New York (according to both this article and the interview Yann LeCun gave this morning on France Inter, the most listened-to radio station in France).
Of course, each relevant newspaper in those areas highlights that it's coming to their place, but it really seems to be distributed.
rubzah•Mar 10, 2026
All your base are belong to Yann LeCun.
re-thc•Mar 10, 2026
> they are setting up in Singapore as their base
Europe in general has been tightening up their rules / taxes / laws around startups / companies especially tech and remote.
It's been less friendly these days.
Signez•Mar 10, 2026
Yann LeCun literally said this morning on the radio in France that it is headquartered in Paris and will pay taxes in France. Go figure…
kvgr•Mar 10, 2026
There will be no corporate taxes for a long time, so all's good.
mi_lk•Mar 10, 2026
Doesn’t he live in New York himself? Although not sure if that matters depending on his role
roromainmain•Mar 10, 2026
For such companies, France also offers generous R&D tax credits (Crédit Impôt Recherche): companies can recover roughly 30% of eligible R&D expenses incurred in France as a tax credit, which can eventually be refunded (in cash) if the company has no taxable profit.
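A quick worked example, taking the 30% figure at face value: a startup spending €2M a year on eligible R&D (researcher salaries plus qualifying overhead) would generate roughly €600k of CIR, offset against corporate tax, or eventually paid out in cash if there is no taxable profit to absorb it.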
storus•Mar 10, 2026
Is that alongside 100% of R&D expenses amortized in taxes when a company has taxable profit covering them?
roromainmain•Mar 10, 2026
Yes indeed, if the company is profitable.
ttoinou•Mar 10, 2026
No he said something like “well yes, only for the parts of profits made in France”
lotsofpulp•Mar 10, 2026
Why would it be any other way?
ttoinou•Mar 10, 2026
French people have this pipe dream of all other French people paying 75% of what they produce worldwide to fund their retirements, hospitals, useless school system, and all their "comités Théodule".
Imustaskforhelp•Mar 10, 2026
This is a Singaporean news article from a Singaporean company[0] (had to look it up).
As such, they are more likely to talk about Singapore news and exaggerate the claims.
Singapore isn't the key location. From what I am seeing online, France is the major location.
Singapore is just one of the more satellite-like offices. They have many offices around the world, it seems.
> Europe in general has been tightening up their rules / taxes / laws around startups / companies especially tech and remote.
Like? Care to provide any specific examples? "Europe" is a continent composed of various countries, most of which have been doing a lot to make it easier for startups and companies in general.
stingraycharles•Mar 10, 2026
That's a Singaporean newspaper, though; not sure if it's objectively their main base, or just one of them.
fnands•Mar 10, 2026
Probably just a satellite office.
Might be to be close to some of Yann's collaborators like Xavier Bresson at NUS
throwpoaster•Mar 10, 2026
"Show me the incentive and I will show you the outcome."
Almost certainly the IP will be held in Singapore for tax reasons.
RamblingCTO•Mar 10, 2026
Which would be a good idea, as a European. I'd hate to see the investment go to waste on taxes that are spent on stupid shit anyway. Should go into R&D not fighting bureaucracy.
giancarlostoro•Mar 10, 2026
I didn't really know who he was, so I went and found his wikipedia, which is written like either he wrote it himself to stroke his ego, or someone who likes him wrote it to stroke his ego:
> He is the Jacob T. Schwartz Professor of Computer Science at the Courant Institute of Mathematical Sciences at New York University. He served as Chief AI Scientist at Meta Platforms before leaving to work on his own startup company.
That entire sentence before the remark about his service at Meta could have been axed; it's weird to me when people compare themselves to someone else who is well known. It's the most Kanye West thing you can do. Mind you, the more I read about him, the more I discovered he is in fact egotistical. Good luck having a serious engineering team with someone who is egotistical.
timr•Mar 10, 2026
It's not comparing him to anyone. He has an endowed professorship. This is standard in academia, and you give the name because a) it's prestigious for the recipient and b) it strokes the ego of the donor.
You underestimate academia. Any academic who reads these two sentences only focuses on the first one: he has a named chair at Courant. In Germany, being a Prof is added to your ID card/passport and becomes part of your official name, like knighthood in other countries.
dr_hooo•Mar 10, 2026
Not true regarding the IDs; only PhD titles can be added, not job descriptions. Source: academia person in Germany.
DeathArrow•Mar 10, 2026
It seems Germans add their PhD titles even to their nicknames. :)
bobwaycott•Mar 10, 2026
That’s not a comparison to another person. That’s his job title. It is not uncommon for universities to have distinguished chairs within departments named after a notable person—in this case, the founder of NYU’s Department of Computer Science.
This is just the official name of a chair at NYU. I'm not even sure Jacob T. Schwartz is more well known than Yann LeCun.
stephencanon•Mar 10, 2026
Yann is definitely more well-known outside of academia. Inside academia, it's going to depend a lot on your specific background and how old you are.
g947o•Mar 10, 2026
Eh, that paragraph reads perfectly normal to me.
Either you have not read enough Wikipedia pages, or you have too much to complain about. (Or both.)
brandonb•Mar 10, 2026
LeCun's technical approach with AMI will likely be based on JEPA, which is also a very different approach than most US-based or Chinese AI labs are taking.
If you're looking to learn about JEPA, LeCun's vision document "A Path Towards Autonomous Machine Intelligence" is long but sketches out a very comprehensive vision of AI research:
https://openreview.net/pdf?id=BZ5a1r-kVsf
Training JEPA models is within reach, even for startups. For example, we're a 3-person startup and we trained a health time-series JEPA. There are JEPA models for computer vision and (even) for LLMs.
You don't need a $1B seed round to do interesting things here. We need more interesting, orthogonal ideas in AI. So I think it's good we're going to have a heavyweight lab in Europe alongside the US and China.
sanderjd•Mar 10, 2026
Have you published anything about your health time series model? Sounds interesting!
BTW, I went to your website looking for this, but didn't find your blog. I do now see that it's linked in the footer, but I was looking for it in the hamburger menu.
brandonb•Mar 10, 2026
Thanks! We need to re-do the top navigation / hamburger menu -- we've added a bunch of new things in the past few months, and it badly needs to be re-organized.
smugma•Mar 10, 2026
Very interesting. I am keenly interested in this space and coincidentally had my blood drawn this morning.
That said, have you considered that “Measure 100+ biomarkers with a single blood draw” combined with "heart health is a solved problem” reads a lot like Theranos?
brandonb•Mar 10, 2026
FWIW, the single blood draw is 6-8 vials -- so we're not claiming to get 100 biomarkers from a single drop. The point of that is mostly that it just takes one appointment / is convenient.
mkeoliya•Mar 10, 2026
This is very cool work! I have a quick follow-up: in the biomarker prediction task, what horizon (i.e., how far into the future) did you set for the predictions? Prediction is hard beyond an hour, so it'd be impressive if your model handles that.
brandonb•Mar 10, 2026
The prediction task is set up as predicting the next measured biomarkers based on a week of wearable data. So it's not necessarily predicting into the future, but predicting dataset Y given dataset X.
The specific biomarkers being predicted are the ones most relevant to heart health, like cholesterol or HbA1c. These tend to be more stable from hour to hour -- they may vary on a timescale of weeks as you modify your diet or take medications.
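For readers curious what that looks like mechanically, here is a generic sketch of the "week of wearables in, lab panel out" shape of the task; the layers and dimensions are invented for illustration and this is not our actual model (which uses a pretrained JEPA encoder rather than a from-scratch supervised one):

    # Hypothetical sketch of "predict the next lab panel from a week of wearables".
    # Architecture and shapes are invented for illustration only.
    import torch
    import torch.nn as nn

    # one week of minute-level data, e.g. heart rate, steps, sleep stage = 3 channels
    x = torch.randn(32, 3, 7 * 24 * 60)          # (batch, channels, minutes)
    y = torch.randn(32, 5)                       # e.g. LDL, HDL, HbA1c, ...

    encoder = nn.Sequential(
        nn.Conv1d(3, 32, kernel_size=15, stride=4), nn.ReLU(),
        nn.Conv1d(32, 64, kernel_size=15, stride=4), nn.ReLU(),
        nn.AdaptiveAvgPool1d(1), nn.Flatten(),   # -> (batch, 64) summary embedding
    )
    head = nn.Linear(64, 5)                      # regress the biomarker panel

    loss = nn.functional.mse_loss(head(encoder(x)), y)
    loss.backward()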
volkk•Mar 10, 2026
oh nice, i actually used you guys for some labs a few months ago. Glad you're competing with function & superpower
tomrod•Mar 10, 2026
I've been working to understand the potential uses for JEPA. Outside of video, has anyone made a list of any type (geared towards dummies like me)?
mandeepj•Mar 10, 2026
Appreciate your work! Healthcare is a regulated industry. Everything (research, proposals, FDA submissions, compliance docs, accreditation standards, etc.) is documented and follows a process, which means there is a lot of documentation. You can't sneak in anything unverified or unreliable. Why does healthcare need a JEPA/world model?
brandonb•Mar 10, 2026
Regulation is quickly catching up to modern AI techniques; for the most part, the approach is to verify outputs rather than process. For example, Utah's pilot to let AI prescribe medications has doctors check the first N prescriptions of each medication. Medicare is starting to pay for AI-enabled care, but tying payment to whether objective biomarkers like cholesterol or blood pressure actually got better.
jsnell•Mar 10, 2026
I don't think it's "regardless", your opinion on LeCun being right should be highly correlated to your opinion on whether this is good for Europe.
If you think that LLMs are sufficient and RSI is imminent (<1 year), this is horrible for Europe. It is a distracting boondoggle exactly at the wrong time.
andrepd•Mar 10, 2026
It's been 6 months away for 5 years now. In that time we've seen relatively mild incremental changes, not any qualitative ones. It's probably not 6 months away.
basket_horse•Mar 10, 2026
But I swear this time is different! Just give me another 6 months!
andrepd•Mar 10, 2026
And another 6 trillion dollars :^)
AStrangeMorrow•Mar 10, 2026
Yeah. I feel like, as with many projects, the last 20% takes 80% of the time, and IMHO we are not in the last 20%.
Sure, LLMs are getting better and better, and at least for me more and more useful, and more and more correct. Arguably better than humans at many tasks, yet terribly lacking in some others.
Coding-wise, one of the things it does "best", it still has many issues. For me, some of the biggest issues are still lack of initiative and lack of reliable memory. When I use it to write code, the first manifests as it sticking to a suboptimal yet overly complex approach quite often. And the lack of memory shows up in that I have to keep reminding it of edge cases (else it often breaks functionality), or to stop reinventing the wheel instead of using functions/classes already implemented in the project.
All that can be mitigated by careful prompting, but no matter the claim about information recall accuracy I still find that even with that information in the prompt it is quite unreliable.
And more generally the simple fact that when you talk to one the only way to “store” these memories is externally (ie not by updating the weights), is kinda like dealing with someone that can’t retain memories and has to keep writing things down to even get a small chance to cope. I get that updating the weights is possible in theory but just not practical, still.
lordmathis•Mar 10, 2026
It's 6 months away the same way coding is apparently "solved" now.
HarHarVeryFunny•Mar 10, 2026
I think we are - in the last few months - very close to, if not already at, the point where "coding" is solved. That doesn't mean that software design or software engineering is solved, but it does mean that a SOTA model like GPT 5.4 or Opus 4.6 has a good chance of being able to code up a working version of whatever you specify, within reason.
What's still missing is the general reasoning ability to plan what to build or how to attack novel problems - how to assess the consequences of deciding to build something a given way, and I doubt that auto-regressively trained LLMs is the way to get there, but there is a huge swathe of apps that are so boilerplate in nature that this isn't the limitation.
I think that LeCun is on the right track to AGI with JEPA - hardly a unique insight, but significant to now have a well funded lab pursuing this approach. Whether they are successful, or timely, will depend if this startup executes as a blue skies research lab, or in more of an urgent engineering mode. I think at this point most of the things needed for AGI are more engineering challenges rather than what I'd consider as research problems.
lordmathis•Mar 10, 2026
Sure, Claude and other SOTA LLMs do generate about 90% of my code but I feel like we are not closer to solving the last 10% than we were a year ago in the days of Claude 3.7. It can pretty reliably get 90% there and then I can either keep prompting it to get the rest done or just do it manually which is quite often faster.
j4k0o•Mar 11, 2026
It's interesting that people don't seem to think the likely outcome might be... capital and labour. Not capital alone.
You see this in construction - the capital is used for certain things and is operated by labour.
HarHarVeryFunny•Mar 11, 2026
We're certainly in the "capital/robot + labor" phase of AI at the moment, which Dario Amodei is referring to as the "centaur" (half horse, half human) phase, and expects to be very short lived.
Eventually (maybe taking a lot longer than a lot of people expect and/or are hoping for) we'll achieve full human-equivalent AI, at which point you won't NEED a centaur approach - the mechanical horse will be capable of doing ALL non-physical work by itself, but that doesn't mean this is how this will actually play out. If we do end up heading for some dystopian "Soylent Green" type future where most humans are unemployed, surviving poorly on government handouts, then I expect there would eventually be riots and uprising that would push back against it. It also just doesn't work - you can't create profits without customers, and customers need money to buy what you're selling.
Part of why we may (and hopefully will) continue to see humans, from CEO on down, still working when they could be replaced with AI, is that even "AGI", which we've yet to achieve, doesn't mean human-like - it's really just focusing on intelligence. Creating an actual remote-worker replacement requires more than just automating the intelligent decision-making part of a human (the "AGI" part) - it also requires the human/social/emotional part, which will take longer, and there may not even be any desire to push for that. I think people maybe discount how much of being a successful member of a team is based around human soft skills, our ability to understand and interact with each other, not just raw intellectual capacity, and certainly at this point in time corporate success is still very much "who you know, not what you know".
nsjdjdekkddk•Mar 11, 2026
you know Amodei is a salesman, right
mfru•Mar 10, 2026
Reminds me of how cold fusion reactors are only 5 years away for decades now
vidarh•Mar 10, 2026
Cold fusion reactors haven't produced usable intermediate results. LLMs have.
leptons•Mar 10, 2026
LLMs produce slop far too often to say they are in any way better than cold fusion in terms of usable results. "AI" kind of is the cold fusion of tech. We've always been 5 or 10 years away from "AGI" and likely always will be.
vidarh•Mar 10, 2026
That's just nonsense. That they produce slop does not negate that I and many others get plenty of value out of them in their current form, while we get zero value out of fusion so far - cold or otherwise.
next_xibalba•Mar 10, 2026
> RSI
Wait, we have another acronym to track. Is this the same/different than AGI and/or ASI?
mietek•Mar 10, 2026
Some people should definitely be getting Repetitive Strain Injury from all the hyping up of LLMs.
notnullorvoid•Mar 10, 2026
Recursive Self Improvement
robrenaud•Mar 10, 2026
Recursive self improvement. It's when AI speeds up the development of the next AI.
Insanity•Mar 10, 2026
Whenever I see claims about AGI being reachable through large language models, it reminds me of the miasma theory of disease. Many respectable medical professionals were convinced this was true, and they viewed the entire world through this lens. They interpreted data in ways that aligned with a miasmatic view.
Of course now we know this was delusional and it seems almost funny in retrospect. I feel the same way when I hear that 'just scale language models' suddenly created something that's true AGI, indistinguishable from human intelligence.
visarga•Mar 10, 2026
> Whenever I see claims about AGI being reachable through large language models, it reminds me of the miasma theory of disease.
Whenever I see people think the model architecture matters much, I think they have a magical view of AI. Progress comes from high quality data, the models are good as they are now. Of course you can still improve the models, but you get much more upside from data, or even better - from interactive environments. The path to AGI is not based on pure thinking, it's based on scaling interaction.
To stay within the same miasma-theory-of-disease analogy: if you think architecture is the key, then look at how humans dealt with pandemics... The Black Death in the 14th century killed half of Europe, and no one could come up with the germ theory of disease. Think about it - it was as desperate a situation as it gets, and no one had the simple spark to improve hygiene.
The fact is we are also not smart from the brain alone; we are smart from our experience. Interaction and environment are the scaffolds of intelligence, not the model. For example, 1B users do more for an AI company than a better model; they act like human-in-the-loop curators of LLM work.
0x3f•Mar 10, 2026
If model arch doesn't matter much how come transformers changed everything?
visarga•Mar 10, 2026
Luck. RNNs, Mamba, S4, etc. can do it just as well, for a given budget of compute and data. The larger the model, the less the architecture makes a difference. It will learn in any of the 10,000 variations that have been tried, and come within about 10-15% of the best. What you need is a data loop, or a data source of exceptional quality and size; data has more leverage. Architecture gains mostly show up as efficiency: some method can be 10x more efficient than another.
0x3f•Mar 10, 2026
That's not how I read the transformer stuff around the time it was coming out: they had concrete hypotheses that made sense, not just random attempts at striking it lucky. In other words, they called their shots in advance.
I'm not aware that we had notably different data sources before or after transformers, so what confounding event are you suggesting transformers 'lucked' into being contemporaneous with?
Also, why are we seeing diminishing returns if only the data matters? Are we running out of data?
jsnell•Mar 10, 2026
The premise is wrong, we are not seeing diminishing returns. By basically any metric that has a ratio scale, AI progress is accelerating, not slowing down.
0x3f•Mar 10, 2026
For example?
jsnell•Mar 10, 2026
For example:
The METR time-horizon benchmark shows steady exponential growth. Frontier lab revenue has been growing exponentially basically from the moment they had any revenue. (The latter has confounding factors. For example, it doesn't just depend on the quality of the model but on the quality of the apps and products using the model. But model quality is still the main component; the products seem to pop into existence the moment the necessary model capabilities exist.)
0x3f•Mar 11, 2026
Note we're in a sub-thread about whether 'only data matters, not architecture', so I don't disagree that functionality or revenue are growing _in general_, but that's not what we're talking about here.
The point is that core model architectures don't just keep scaling without modification. MoE, inference-time, RAG, etc. are all modifications that aren't 'just use more data to get better results'.
awakeasleep•Mar 10, 2026
If I'm understanding you, it seems like you're struck by hindsight bias. No one knew the miasma theory was wrong... it could have been right! Only with hindsight can we say it was wrong. Seems like we're in the same situation with LLMs and AGI.
0x3f•Mar 10, 2026
> Only with hindsight can we say it was wrong
It really depends what you mean by 'we'. Laymen? Maybe. But people said it was wrong at the time with perfectly good reasoning. It might not have been accessible to the average person, but that's hardly to say that only hindsight could reveal the correct answer.
nradov•Mar 10, 2026
The miasma theory of disease was "not even wrong" in the sense that it was formulated before we even had the modern scientific method to define the criteria for a theory in the first place. And it was sort of accidentally correct in that some non-infectious diseases are caused by airborne toxins.
scarmig•Mar 10, 2026
Plenty of scientific authorities believed in it through the 19th century, and they didn't blindly believe it: it had good arguments for it, and intelligent people weighed the pros and cons of it and often ended up on the side of miasma over contagionism. William Farr was no idiot, and he had sophisticated statistical arguments for it. And, as evidence that it was a scientific theory, it was abandoned by its proponents once contagionism had more evidence on its side.
It's only with hindsight that we think contagionism is obviously correct.
ordu•Mar 10, 2026
> Of course you can still improve the models, but you get much more upside from data, or even better - from interactive environments.
I, on the contrary, believe that the hunt for better data is an attempt to climb the local hill and get stuck there without reaching the global maximum. Interactive environments are good, they can help, but they are just one of the possible ways to learn about causality. Is it the best way? I don't think so; it is the easier way: just throw money at the problem and eventually you'll get something that you'll claim to be the goal you chased all this time. And yes, it will have something in it you will be able to call "causal inference" in your marketing.
But current models are notoriously difficult to teach. They eat an enormous amount of training data; a human needs much less. They eat an enormous amount of energy to train; a human needs much less. It means the very approach is deficient. It should be possible to do the same with a tiny fraction of the data and money.
> The fact is we are also not smart from the brain alone, we are smart from our experience. Interaction and environment are the scaffolds of intelligence, not the model.
Well, I learned English almost all the way to B2 by reading books. I was too lazy to use a dictionary most of the time, so it was not interactive: I didn't even interact with a dictionary, I was just reading books. How many books did I read to get to B2? ~10 or so. Well, I read a lot of English on the Internet too, and watched some movies. But let's multiply those 10 books by 10. Strictly speaking it was not B2: I was almost completely unable to produce English, and my pronunciation was not just bad, it was worse. Even now I sometimes stumble on words I cannot pronounce - I know the word and have mentally constructed a sentence with it, but I cannot say it, because I don't know how. So to pass B2 I spent some time practicing speech, listening, and writing, and learning some stupid topic like "travel" to have the vocabulary to talk about it at length.
How many books does an LLM need to consume to get to B2 in a language unknown to it? How many audio recordings does it need to consume? A lifetime wouldn't be enough for me to read and/or listen to that much.
If there were a human who needed to consume as much information as an LLM to learn, they would be the stupidest person in all the history of humanity.
bethekidyouwant•Mar 10, 2026
Are you asking how many books a large language model would need to read to learn a new language if it was only trained on a different language? Probably just 1 (the dictionary).
suddenlybananas•Mar 11, 2026
Do you know anything about how languages work? A dictionary doesn't have sufficient information to speak a language.
ainch•Mar 10, 2026
It's unintuitive to me that architecture doesn't matter - deep learning models, for all their impressive capabilities, are still deficient compared to human learners as far as generalisation, online learning, representational simplicity and data efficiency are concerned.
Just because RNNs and Transformers both work with enormous datasets doesn't mean that architecture/algorithm is irrelevant, it just suggests that they share underlying primitives. But those primitives may not be the right ones for 'AGI'.
scarmig•Mar 10, 2026
The miasma theory of disease, though wrong, made lots of predictions that proved useful and productive. Swamps smell bad, so drain them; malaria decreases. Excrement in the street smells bad, so build sewage systems; cholera decreases. Florence Nightingale implemented sanitary improvements in hospitals inspired by miasma theory that improved outcomes.
It was empirical and, though ultimately wrong, useful. Apply as you will to theories of learning.
dheera•Mar 10, 2026
Just because you raise 1 billion dollars to do X doesn't mean you can't pivot and do Y if it is in the best interest of your mission.
I won't comment on Yann LeCun or his current technical strategy, but if you can avoid sunk cost fallacy and pivot nimbly I don't think it is bad for Europe at all. It is "1 billion dollars for an AI research lab", not "1 billion dollars to do X".
vidarh•Mar 10, 2026
It's sufficient to think that there is a chance that they will not be, however, for there to be a non-zero value to fund other approaches.
And even if you think the chance is zero, unless you also think there is a zero chance they will be capable of pivoting quickly, it might still be beneficial.
I think his views are largely flawed, but chances are there will still be lots of useful science coming out of it as well. Even if current architectures can achieve AGI, it does not mean there can't also be better, cheaper, more effective ways of doing the same things, and so exploring the space more broadly can still be of significant value.
Tenoke•Mar 10, 2026
I think LeCun has been so consistently wrong and boneheaded for basically all of the AI boom, that this is much, much more likely to be bad than good for Europe. Probably one of the worst people to give that much money to that can even raise it in the field.
gozucito•Mar 10, 2026
Could you please elaborate on what he was wrong about?
conradkay•Mar 10, 2026
He said that LLMs wouldn't have common sense about how the real world physically works, because it's so obvious to humans that we don't bother putting it into text. This seems pretty foolish honestly given the scale of internet data, and even at the time LLMs could handle the example he said they couldn't
I believe he didn't think that reasoning/CoT would work well or scale like it has
ainch•Mar 10, 2026
LeCun was stubbornly 'wrong and boneheaded' in the 80s, but turned out to be right. His contention now is that LLMs don't truly understand the physical world - I don't think we know enough yet to say whether he is wrong.
barrell•Mar 10, 2026
While I’d love there to be a European frontier model, I do very much enjoy mistral. For the price and speed it outperforms any other model for my use cases (language learning related formatting, non-code non-research).
vessenes•Mar 10, 2026
Partner in a fund that wrote a small check into this — I have no private knowledge of the deal - while I agree that one’s opinion on auto regressive models doesn’t matter, I think the fact of whether or not the auto regressive models work matters a lot, and particularly so in LeCun’s case.
What’s different about investing in this than investing in say a young researcher’s startup, or Ilya’s superintelligence? In both those cases, if a model architecture isn’t working out, I believe they will pivot. In YL’s case, I’m not sure that is true.
In that light, this bet is a bet on YL’s current view of the world. If his view is accurate, this is very good for Europe. If inaccurate, then this is sort of a nothing-burger; company will likely exit for roughly the investment amount - that money would not have gone to smaller European startups anyway - it’s a wash.
FWIW, I don’t think the original complaint about auto-regression “errors exist, errors always multiply under sequential token choice, ergo errors are endemic and this architecture sucks” is intellectually that compelling. Here: “world model errors exist, world model errors will always multiply under sequential token choice, ergo world model errors are endemic and this architecture sucks.” See what I did there?
On the other hand, we have a lot of unused training tokens in videos, I’d like very much to talk to a model with excellent ‘world’ knowledge and frontier textual capabilities, and I hope this goes well. Either way, as you say, Europe needs a frontier model company and this could be it.
neversupervised•Mar 10, 2026
Is it good? This will almost certainly fail. Not because of Yann or Europe, but because these sorts of hyper-hyped projects fail. SSI and Thinking Machines haven't lived up to the hype.
ma2rten•Mar 10, 2026
Erm... OpenAI was hyped when it started, and it took 6 years to take off. It's way too early to declare that SSI and Thinking Machines have failed.
koakuma-chan•Mar 10, 2026
They took money and haven't released anything. How are they doing?
levocardia•Mar 10, 2026
To be fair to SSI, they were very explicit about their plan: "we are going to take money and not release anything until we one-shot superintelligence."
If you invested in that you knew what you were getting yourself into!
crystal_revenge•Mar 10, 2026
> fully ceding the research front is not a good way to keep the EU competitive
Tech is ultimately a red herring as far as what's needed to keep the EU competitive. The EU has a trillion-dollar hole[0] to fill if it wants to replace the US military presence, and it currently net-imports over 50% of its energy. Unfortunately, the current situation in Iran is not helping either of these, as it constrains energy supplies further and risks requiring military intervention.
Right, they really need a military industrial complex to be "competitive" :eyeroll. Are you suggesting regressing to the stone age?
crystal_revenge•Mar 10, 2026
Europe doesn't want to be reliant (understandably) on the US military for defense, because if they are, as Trump has demonstrated, they will be pressured to make concessions not in their interests.
The need for a military is tightly coupled with the EU's need for energy. You can see this in the immediate impact that the war in Iran has had on Germany's natural gas prices [0]. Already unable to defend themselves from Russia, EU countries are in a tough spot, since they can't really afford to expend military resources defending their energy needs, and yet they also don't have the energy independence to ignore these military engagements without risk. Meanwhile Russia has spent the last 4 years transitioning to a wartime economy and is getting hungry for expanded resource acquisition.
The world hasn't fundamentally changed since the stone age: humans need resources to survive, and if there aren't enough resources for all those people, then violence will decide who has access to them.
> But already unable to defend itself from Russia, EU countries
I'm sorry, but this is just crazy talk. Russia cannot enforce its will on Ukraine, one of the poorest and most corrupt countries in Europe, with a (at the time of invasion) relatively small and underequipped army. Yes, it has grown through conscription, has been equipped by foreign and domestic supplies, has made some brilliant advances in tech and tactics... but when it was attacked, it was weak. And Russia lost its best troops and equipment failing to defeat that.
Why would anyone think that the Russia that cannot defeat Ukraine would fare better against Poland? Let alone French warning strike nukes, or French, British, German troops and planes and what not.
crystal_revenge•Mar 11, 2026
It’s funny how you basically explain precisely why the war in Ukraine has gone on so long but refuse to recognize it.
As Russia’s economy has continually reshaped over the last 4 years there has been increasingly a domestic demand for war. You point out all the evidence yourself:
> Yes it has grown through conscription, has been equipped by foreign and domestic supplies, has made some brilliant advances in tech and tactics...
Russia (well, its oligarchs and rulers) has increasingly benefited from perpetual war. Yes, soon it will need to switch to expansion to maintain its economy, but the situation in Iran presents a perfect opportunity if things play into Russia's interests.
You will also find that, if you paid any attention to European politics over the years, this is a serious topic for all leaders there.
But I don't mind if you're not convinced; 4 years ago I had similar people on Hacker News unconvinced that Russia could sustain operations in Ukraine longer than a few months because they were doing so poorly.
sofixa•Mar 11, 2026
> Russia (well its oligarchs and rulers) has increasingly benefited from perpetual war
No it has not. It has a ballooning debt crisis (at different levels - regions, military contractors, banks) which will pop at some point; the budget is so unbalanced they're projecting to reduce military spending (unlikely), increase taxes, and still have a pretty heavy deficit. They've been given the gift of the Strait of Hormuz being closed, so oil and gas revenues will grow, which will definitely buy them more time. But they are running against a clock, and they cannot win in Ukraine.
> You also will find that if you paid any attention to European politics over the years this is a serious topic to all leaders there.
Yes, because Russia only responds to strength, so you need to be strong militarily to be able to dissuade them from attacking you. That doesn't mean that realistically they have a chance of winning any conflict.
suddenlybananas•Mar 11, 2026
France has nukes and is making more. They're fine.
AngryData•Mar 10, 2026
Hard disagree. Military might isn't going to secure anybody into the future; modern society and our economies will only get more vulnerable as time goes on, and large wars or engagements will just push economies closer to collapse. And without a solid modern economy to back up the military, a modern military will fall apart.
chrisgd•Mar 10, 2026
33% of the business in a seed round is nuts
ak_111•Mar 10, 2026
Can you elaborate more? Also, isn't this necessary for a lab that wants to compete with highly funded entities (like OpenAI, Anthropic)?
nailer•Mar 10, 2026
> Regardless of your opinion of Yann or his views on auto regressive models being "sufficient" for what most would describe as AGI or ASI
My main concern with LeCun is the number of times he has told people software is open source when its license directly violates the open source definition.
gigatexal•Mar 10, 2026
As an American here in Berlin, I, too welcome this. I would love for there to be many large well capitalized companies here for me to work at.
Oras•Mar 10, 2026
> But this is not an applied AI company.
There is absolutely no doubt about Yann's impact on AI/ML, but he had access to many more resources at Meta, and we didn't see anything.
It could be a management issue, though, and I sincerely hope we will see more competition, but from what I quoted above, it does not seem like it.
Understanding the world through videos (mentioned in the article) is just what video models have already done, and they are getting pretty good (see Seedance, Kling, Sora, etc.). So I'm not quite sure how what he proposed would work.
the_real_cher•Mar 10, 2026
He was suffocated by the corporate side of Meta, I suspect.
_giorgio_•Mar 10, 2026
I can’t reconcile this dichotomy: most of the landmark deep learning papers were developed with what, by today’s standards, were almost ridiculously small training budgets — from Transformers to dropout, and so on.
So I keep wondering: if his idea is really that good — and I genuinely hope it is — why hasn’t it led to anything truly groundbreaking yet? It can’t just be a matter of needing more data or more researchers. You tell me :-D
samrus•Mar 10, 2026
It's a matter of needing more time, which is a resource even SV VCs are scared to throw around. Look at the timeline of all these advancements and how long they took:
LeCun introduced backprop for deep learning back in 1989
Hinton published contrastive divergence in 2002
Alexnet was 2012
Word2vec was 2013
Seq2seq was 2014
AiAYN was 2017
UnicornAI was 2019
Instructgpt was 2022
This makes a lot of people think that things are just accelerating and they can be along for the ride. But it's the years and years of foundational research that allow this to be done. That toll has to be paid for the successors of LLMs to be able to reason properly and operate in the world the way humans do. That sowing won't happen as fast as the reaping did. LeCun wants to plant those seeds; the others, who only want to eat the fruit, don't get that they have to wait.
_giorgio_•Mar 10, 2026
If his ideas had real substance, we would have seen substantial results by now.
He introduced I-JEPA in 2023, so almost three years ago at this point.
If he still hasn’t produced anything truly meaningful after all these years at Meta, when is that supposed to happen? Yann LeCun has been at Facebook/Meta since December 2013.
Your chronological sequence is interesting, but it refers to a time when the number of researchers and the amount of compute available were a tiny fraction of what they are today.
samrus•Mar 11, 2026
> If his ideas had real substance, we would have seen substantial results by now
This is naive. Like saying if backprop had any real substance, it would have had results within 10 years of its publication in 1989
> Your chronological sequence is interesting, but it refers to a time when the number of researchers and the amount of compute available were a tiny fraction of what they are today.
Again: those resources are important. But one resource being ignored is time. Try baking a turkey at 300 degrees for 4 hours versus at 900 degrees for 1 hour and see how edible each one is.
boccaff•Mar 10, 2026
Llama models pushed the envelope for a while, and having them "open-weight" allowed a lot of tinkering. I would say that most fine-tuned models evolved from work on top of Llama models.
oefrha•Mar 10, 2026
Llama wasn’t Yann LeCun’s work and he was openly critical of LLMs, so it’s not very relevant in this context.
> My only contribution was to push for Llama 2 to be open sourced.
Quite a big contribution in practice.
oefrha•Mar 10, 2026
Sure, but I don't think that's relevant for a startup with $1B of VC money either. Meta can afford to (attempt to) commoditize their complement.
rockinghigh•Mar 11, 2026
He founded FAIR and the team in Paris that ultimately worked on the early Llama versions.
oefrha•Mar 11, 2026
FAIR was founded in 2015 and Llama's first release was in 2023. Musk co-founded OpenAI in 2015 but no reasonable person credits ChatGPT in 2022 to him.
stein1946•Mar 10, 2026
> There is absolutely no doubt about Yann's impact on AI/ML, but he had access to many more resources in Meta, and we didn't see anything.
That's true for 99% of the scientists, but dismissing their opinion based on them not having done world shattering / ground breaking research is probably not the way to go.
> I sincerely wish we will see more competition
I really wish we don't, science isn't markets.
> Understanding world through videos
The word "understanding" is doing a lot of heavy lifting here. I find myself prompting again and again for corrections on an image or a summary and "it" still does not "understand" and keeps doing the same thing over and over again.
GorbachevyChase•Mar 10, 2026
Do not keep bad results in context. You have to purge them to prevent them from affecting the next output. LLMs are deceptively capable, but they don't respond like a person. You can't count on implicit context. You can't count on parts of the implicit context having more weight than others.
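A minimal sketch of that advice, assuming an OpenAI-style chat SDK (the model name and the is_good verifier are placeholders): each retry rebuilds the messages from the clean prompt, so a bad output never lingers in context.

    from openai import OpenAI

    client = OpenAI()

    def ask_with_purge(task_prompt, is_good, max_attempts=3):
        base_messages = [{"role": "user", "content": task_prompt}]
        for attempt in range(max_attempts):
            # Rebuild the context from scratch each time: failed outputs are
            # never appended, so they cannot steer the next generation.
            resp = client.chat.completions.create(
                model="gpt-4o-mini",          # placeholder model name
                messages=list(base_messages),
            )
            answer = resp.choices[0].message.content
            if is_good(answer):
                return answer
        return None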
torginus•Mar 10, 2026
Most folks get paid a lot more in a corporate job than tinkering at home - using the 'follow the money' logic it would make sense they would produce their most inspired works as 9-5 full stack engineers.
But passion and freedom to explore are often more important than resources.
YetAnotherNick•Mar 10, 2026
> we didn't see anything.
Is this a troll? Even if we just ignore Llama, Meta invented and released so much foundational research and open source code. I would say the computer vision field would be years behind if Meta hadn't published core research like DETR or MAE.
famouswaffles•Mar 10, 2026
You should ignore Llama because by his own admission,
>My only contribution was to push for Llama 2 to be open sourced.
rockinghigh•Mar 11, 2026
He founded the team that worked on fasttext, llama and other similarly impactful projects.
koolala•Mar 10, 2026
Did he work on those vision models?
nashadelic•Mar 10, 2026
Your take is brutal but spot on
andreyk•Mar 10, 2026
"and we didn't see anything" is not justified at all.
Meta absolutely has (or at least had) a world-class industry AI lab and has published a ton of great work and open source models (granted, their LLM open source stuff failed to keep up with Chinese models in 2024/2025; their other open source stuff for things like segmentation doesn't get enough credit, though). Yann's main role was Chief AI Scientist, not any sort of product role, and as far as I can tell he did a great job building up and leading a research group within Meta.
He deserves a lot of credit for pushing Meta to be very open about publishing research and open sourcing models trained on large-scale data.
Just as one example, Meta (together with NYU) just published "Beyond Language Modeling: An Exploration of Multimodal Pretraining" (https://arxiv.org/pdf/2603.03276) which has a ton of large-experiment backed insights.
Yann did seem to end up with a bit of an inflated ego, but I still consider him a great research lead. Context: I did a PhD focused on AI, and Meta's group had a similar pedigree as Google AI/Deepmind as far as places to go do an internship or go to after graduation.
Oras•Mar 10, 2026
I wasn't criticising his scientific contribution at all; that's why I started my comment by praising what he did.
Creating a startup has to be about a product. When you raise 1B, investors are expecting returns, not papers.
magicalist•Mar 10, 2026
>> but he had access to many more resources in Meta, and we didn't see anything
> I wasn't criticising his scientific contribution at all, that's why I started my comment by appraising what he did.
You were criticising his output at Facebook, though, but he was in the research group at facebook, not a product group, so it seems like we did actually see lots of things?
JMiao•Mar 10, 2026
They are not expecting returns at $1B+, just for someone to pay more than they paid six months ago.
overfeed•Mar 11, 2026
> Creating a startup has to be about a product. When you raise 1B, investors are expecting returns, not papers.
Speaking of returns - Apple absolutely fucked Meta ads with the privacy controls, which trashed ad performance, revenue and share price. Meta turned things around using AI, with Yann as the lead researcher. Are you willing to give him credit for that? Revenue is now greater than pre-Apple-data-lockdown
yellow_lead•Mar 11, 2026
How much of Meta's increased revenue is attributed to AI? I think Meta "turned things around" by bypassing privacy controls [1].
> I think Meta "turned things around" by bypassing privacy controls
Why would Apple be complicit in this for years?
Razengan•Mar 11, 2026
Apple has allowed Facebook, TikTok etc. to track users across devices AND device resets via the iCloud Keychain API.
When you log into FB on any account on any device, then install FB on a new device, or even after you erase the device, they know it's you even before you log in. Because the info is tied to your Apple iCloud account.
And there's no way for users to see or delete what data other companies have stored and linked to your Apple ID via that API.
It's been like this for at least 5 years and nobody seems to care.
twoWhlsGud•Mar 11, 2026
Is there a write up of this somewhere? Curious to read more...
Razengan•Mar 11, 2026
None that I found. You can test it right now yourself. Install FB, log in, delete FB, reinstall FB. Your previous login info will be there.
That would be fine if users could SEE what has been stored and DELETE it WITHOUT going through the app and trusting it to show you everything honestly.
What's even worse is that it silently persists across DEVICE reinstalls.
Erase and reset your iPhone/iPad. Sign into the same iCloud account. Reinstall FB. Your login info will still be there.
Buy a new iPhone/iPad. Sign into the same iCloud account. Reinstall FB. Your login info will still be there.
And nope, no one seems to care.
nonameiguess•Mar 11, 2026
They're expecting what you promised them when they handed over the money. That is "more money" for most investors but that isn't the sole universal human objective. Money has to serve an instrumental purpose and if one of your purposes is something that can't currently be achieved, simply getting more money won't help. You need to give that money to some venture that might actually be able to achieve it. I have no doubt there are at least a few very rich people out there who just have sci-fi nerd dreams and want to see someone go to Mars, go to Jupiter, discover alien life, rebuild dinosaurs, or create a truly autonomous entirely new form of artificial life just to see if they can. If it makes money, great. If it doesn't, what else was I going to do? Die with $60 billion in the bank instead of $40 billion?
nextos•Mar 11, 2026
For instance, under Yann's direction Meta FAIR produced the ESM protein sequence model, which is less hyped than AlphaFold, but has been incredibly influential. They achieved great performance without using multiple alignments as an input/inductive bias. This is incredibly important for large classes of proteins where multiple alignments are pretty much noise.
LarsDu88•Mar 10, 2026
That's such a terrible take.
For a hot minute Meta had a top-3 LLM and open sourced the whole thing, even with LeCun's reservations about the technology.
At the same time Meta spat out huge breakthroughs in:
- 3d model generation
- Self-supervised label-free training (DINO). Remember Alexandr Wang built a multibillion dollar company just around having people in third world countries label data, so this is a huge breakthrough.
- A whole new class of world modeling techniques (JEPAs)
- SAM (Segment anything)
Oras•Mar 10, 2026
> - Self-supervised label-free training (DINO). Remember Alexandr Wang built a multibillion dollar company just around having people in third world countries label data, so this is a huge breakthrough.
If it was a breakthrough, why did Meta acquire Wang and his company? I'm genuinely curious.
airstrike•Mar 10, 2026
People make stupid acquisitions all of the time.
LarsDu88•Mar 10, 2026
Wang fits the profile of a possible successor CEO for Meta.
Young, hit it big early, got into the AI boom early straight out of college. Obviously not woke (just look at his public statements).
Unfortunately the dude knows very little about AI or ML research. He's just another wealthy grifter.
At this point decision making at Meta is based on Zuckerberg's vibes, and I suspect the emperor has no clothes.
dabeeeenster•Mar 10, 2026
> It could be a management issue, though
Or, maybe it's just hard?
lee•Mar 11, 2026
In an interview, Yann mentioned that one reason he left Meta was that they were very focused on LLMs and he no longer believed LLMs were the path forward to reaching AGI.
htrp•Mar 11, 2026
This is absolutely an applied AI company; the only question is whether the applied AI will be subordinated to the research.
npn•Mar 10, 2026
I wish him luck.
Recently all papers are about LLMs; it brings on fatigue.
As GPT is almost reaching its limits, a new architecture could bring new discoveries.
margorczynski•Mar 10, 2026
He couldn't achieve even parity with LLMs during his days at Meta (most probably with billions in resources at his disposal), but he'll succeed now? What is the pitch?
samrus•Mar 10, 2026
The pitch isn't to try to squeeze money out of a product like Altman does. It's to lay the groundwork for the next evolution in AI. LLMs were built on decades of work and they've hit their limits. We'll need to invest a lot of time building foundations without getting any tangible yield for the next step to work. Get too greedy and you'll be stuck.
That article is from June 2025 so may be out of date, and the definition of "seed round" is a bit fuzzy.
_giorgio_•Mar 10, 2026
Thinking Machines looks half-dead already.
The giant seed round proves investors were willing to fund Mira Murati, not that the company had built anything durable.
Within months, it had already lost cofounder Andrew Tulloch to Meta, then cofounders Barret Zoph and Luke Metz plus researcher Sam Schoenholz to OpenAI; WIRED also reported that at least three other researchers left. At that point, citing it as evidence of real competitive momentum feels weak.
az226•Mar 10, 2026
Was just a grift
hnarayanan•Mar 10, 2026
Shock, gasp.
sylware•Mar 10, 2026
If, for even one second, they get into a position that threatens Big Tech AI (mostly US-based, if not all of it) in any way, they will be raided by international finance to be dismantled and poached hard by some massive US "investment funds" (which look more and more like "weaponized" international finance!!). Only China is largely immune to international finance. Those funds have tens of thousands of billions of dollars; basically, in a world of money, there is near-zero resistance.
ismailmaj•Mar 10, 2026
I don't see a world where they become threatening and the employees don't become rich from investors flooding in.
sylware•Mar 10, 2026
Where have you been in the last 2 decades?
ismailmaj•Mar 10, 2026
Don’t think that’s a fair interpretation of what I said.
Liquid money rich? No.
Can get pulled for big tech packages? Also no, for most of the employees.
AFAIK, big tech didn't aggressively poach OpenAI-like talent; they did spend on $10M+ pay packages, but that was for a select few research scientists. Some folks left and some came, but it mostly boiled down to culture.
sylware•Mar 11, 2026
What???
Microsoft/OpenAI is Big Tech.
Are you ok?
ismailmaj•Mar 11, 2026
Ah yes, OpenAI the puppet of Microsoft that is currently declaring war against GitHub, sounds logical.
itigges22•Mar 10, 2026
I just saw a post from Yann mentioning that AMI Labs is hiring too!
secondary_op•Mar 10, 2026
That being said, Yann LeCun's Twitter reposts are below average IQ.
goldenarm•Mar 10, 2026
Do you have a recent example?
whiplash451•Mar 10, 2026
A fair amount of negative comments here, but Yann might very well be the person who brings the Bell Labs culture back to life. It’s been badly missing, and not just in Europe.
az226•Mar 10, 2026
Yann LeCun seeks $5B+ valuation for world model startup AMI (Amilabs).
He has hired LeBrun to take the helm as CEO.
AMI has also hired LeFunde as CFO and LeTune as head of post-training.
They’re also considering hiring LeMune as Head of Growth and LePrune to lead inference efficiency.
I was thinking the same: are all the people he hires LeSomething, like everyone working at Bolson Construction having -son as a suffix?
dude250711•Mar 10, 2026
First grinding LeetCode, now having to have 'Le' in the name?
I have no chance in AI industry...
O4epegb•Mar 11, 2026
LeBron is missing out on an opportunity to invest.
nsbk•Mar 11, 2026
Or LeX
andrepd•Mar 10, 2026
Bolson-ass hiring policy.
doruk101•Mar 10, 2026
nominative determinists are running the world
vrganj•Mar 10, 2026
The guy overseeing the funds is called LeFunde and the guy doing the fine-tuning LeTune??
sinuhe69•Mar 11, 2026
He just made a joke
baxtr•Mar 11, 2026
It almost sounds as if an LLM thought this up!
har2008preet•Mar 11, 2026
These are all Claude agent names, right?
paxys•Mar 10, 2026
I feel like I'm the only one not getting the world models hype. We've been talking about them for decades now, and all of it is still theoretical. Meanwhile LLMs and text foundation models showed up, proved to be insanely effective, took over the industry, and people are still going "nah LLMs aren't it, world models will be the gold standard, just wait."
pendenthistory•Mar 10, 2026
I bet LLMs and world models will merge. World models essentially try to predict the future, with or without actions taken. LLMs with tokenized image input can also be made to predict the future image tokens. It's a very valuable supervised learning signal aside from pre-training and various forms of RL.
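As a rough sketch of what that shared signal looks like (a toy embedding/head and random tokens standing in for real interleaved text and image tokens - purely illustrative):

    import torch
    import torch.nn.functional as F

    vocab_size, dim = 1000, 64
    embed = torch.nn.Embedding(vocab_size, dim)
    head = torch.nn.Linear(dim, vocab_size)

    # Stand-in for a sequence that interleaves text tokens and discretized image tokens.
    tokens = torch.randint(0, vocab_size, (1, 32))
    hidden = embed(tokens[:, :-1])            # a real model would apply attention here
    logits = head(hidden)
    # The supervision is simply predicting the future tokens, image tokens included.
    loss = F.cross_entropy(logits.reshape(-1, vocab_size), tokens[:, 1:].reshape(-1))
    print(loss.item())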
HarHarVeryFunny•Mar 10, 2026
I think "world models" is the wrong thing to focus on when contrasting the "animal intelligence" approach (which is what LeCun is striving for) with LLMs, especially since "world model" means different things to different people. Some people would call the internal abstractions/representations that an LLM learns during training a "world model" (of sorts).
The fundamental problem with today's LLMs that will prevent them from achieving human level intelligence, and creativity, is that they are trained to predict training set continuations, which creates two very major limitations:
1) They are fundamentally a COPYING technology, not a learning or creative one. Of course, as we can see, copying in this fashion will get you an extremely long way, especially since it's deep patterns (not surface level text) being copied and recombined in novel ways. But, not all the way to AGI.
2) They are not grounded, therefore they are going to hallucinate.
The animal intelligence approach, the path to AGI, is also predictive, but what you predict is the external world, the future, not training set continuations. When your predictions are wrong (per perceptual feedback) you take this as a learning signal to update your predictions to do better next time a similar situation arises. This is fundamentally a LEARNING architecture, not a COPYING one. You are learning about the real world, not auto-regressively copying the actions that someone else took (training set continuations).
Since the animal is also acting in the external world that it is predicting, and learning about, this means that it is learning the external effects of its own actions, i.e. it is learning how to DO things - how to achieve given outcomes. When put together with reasoning/planning, this allows it to plan a sequence of actions that should achieve a given external result ("goal").
Since the animal is predicting the real world, based on perceptual inputs from the real world, this means that its predictions are grounded in reality, which is necessary to prevent hallucinations.
So, to come back to "world models", yes an animal intelligence/AGI built this way will learn a model of how the world works - how it evolves, and how it reacts (how to control it), but this behavioral model has little in common with the internal generative abstractions that an LLM will have learnt, and it is confusing to use the same name "world model" to refer to them both.
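To make that predict/compare/update loop concrete, here is a minimal sketch (the linear dynamics and delta-rule update are illustrative assumptions, not anyone's actual architecture): the agent predicts the next observation, perceives what actually happens, and uses the surprise to improve its model of the world.

    import numpy as np

    rng = np.random.default_rng(0)
    A_true = np.array([[0.9, 0.1], [0.0, 0.95]])     # dynamics the agent does not know

    def step_environment(state):
        return A_true @ state + rng.normal(scale=0.1, size=2)

    A_model = np.zeros((2, 2))      # the agent's learned world model
    state = np.array([1.0, -0.5])
    lr = 0.05

    for t in range(5000):
        prediction = A_model @ state            # predict the next observation
        next_state = step_environment(state)    # perceive what actually happens
        surprise = next_state - prediction      # prediction error is the learning signal
        A_model += lr * np.outer(surprise, state)   # update to be less surprised next time
        state = next_state

    print(np.round(A_model, 2))     # should approach A_true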
sothatsit•Mar 10, 2026
RL on LLMs has changed things. LLMs are not stuck in continuation predicting territory any more.
Models build up this big knowledge base by predicting continuations. But then their RL stage gives rewards for completing problems successfully. This requires learning and generalisation to do well, and indeed RL marked a turning point in LLM performance.
A year after RL was made to work, LLMs can now operate in agent harnesses over 100s of tool calls to complete non-trivial tasks. They can recover from their own mistakes. They can write 1000s of lines of code that works. I think it’s no longer fair to categorise LLMs as just continuation-predictors.
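A toy illustration of that reward-for-correctness idea (not any lab's actual setup): sample an answer from the current policy, check it with a verifier, and reinforce whatever the verifier accepted.

    import random

    question, answer_key = "2 + 3", "5"
    candidates = ["4", "5", "6"]
    probs = {c: 1.0 / len(candidates) for c in candidates}   # stand-in for an LLM policy
    lr = 0.1

    for step in range(200):
        sampled = random.choices(candidates, weights=[probs[c] for c in candidates])[0]
        reward = 1.0 if sampled == answer_key else 0.0        # verifiable reward
        for c in candidates:                                   # REINFORCE-style update
            indicator = 1.0 if c == sampled else 0.0
            probs[c] += lr * reward * (indicator - probs[c])

    print(probs)   # probability mass should concentrate on "5"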
libraryofbabel•Mar 10, 2026
Thanks for saying this. It never ceases to amaze me how many people still talk about LLMs like it’s 2023, completely ignoring the RLVR revolution that gave us models like Opus that can one-shot huge chunks of works-first-time code for novel use cases. Modern LLMs aren’t just trained to guess the next token, they are trained to solve tasks.
HarHarVeryFunny•Mar 10, 2026
Forget 2023 - the advances in coding ability in just the last 2 months are amazing. But they are still not AGI, and it is almost certainly going to take more than just a new training regime such as RL to get there. Demis Hassabis estimates we need another 2-3 "transformer-level" discoveries to get there.
HarHarVeryFunny•Mar 10, 2026
RL adds a lot of capability in the areas where it can be applied, but I don't think it really changes the fundamental nature of LLMs - they are still predicting training set continuations, but now trying to predict/select continuations that amount to reasoning steps steering the output in a direction that had been rewarded during training.
At the end of the day it's still copying, not learning.
RL seems to mostly only generalize in-domain. The RL-trained model may be able to generate a working C compiler, but the "logical reasoning" it had baked into it to achieve this still doesn't stop it from telling you to walk to the car wash, leaving your car at home.
There may still be more surprises coming from LLMs - ways to wring more capability out of them, as RL did, without fundamentally changing the approach, but I think we'll eventually need to adopt the animal intelligence approach of predicting the world rather than predicting training samples to achieve human-like, human-level intelligence (AGI).
sothatsit•Mar 11, 2026
You can’t really say it is just predicting continuations when it is learning to write proofs for Erdos problems, formalise significant math results, or perform automated AI research. Those are far beyond what you get by just being a copying and re-forming machine, a lot of these problems require sophisticated application of logic.
I don’t know if this can reach AGI, or if that term makes any sense to begin with. But to say these models have not learnt from their RL seems a bit ludicrous. What do you think training to predict when to use different continuations is other than learning?
I would say LLM’s failure cases like failing at riddles are more akin to our own optical illusions and blind spots rather than indicative of the nature of LLMs as a whole.
HarHarVeryFunny•Mar 11, 2026
I think you're conflating mechanism with function/capability.
I'm not sure what I wrote that made you conclude that I thought these models are not learning anything from their RL training?! Let me say it again: they are learning to steer towards reasoning steps that during training led to rewards.
The capabilities of LLMs, both with and without RL, are a bit counter-intuitive, and I think that, at least in part, comes down to the massive size of the training sets and the even more massive number of novel combinations of learnt patterns they can therefore potentially generate...
In a way it's surprising how FEW new mathematical results they've been coaxed into generating, given that they've probably encountered a huge portion of mankind's mathematical knowledge, and can potentially recombine all of these pieces in at least somewhat arbitrary ways. You might have thought that there are results A, B and C hiding away in some obscure mathematical papers that no human has previously considered to put together before (just because of the vast number of such potential combinations), that might lead to some interesting result.
If you are unsure yourself about whether LLMs are sufficient to reach AGI (meaning full human-level intelligence), then why not listen to someone like Demis Hassabis, one of the brightest and best placed people in the field to have considered this, who says the answer is "no", and that a number of major new "transformer-level" discoveries/inventions will be needed to get there.
HarHarVeryFunny•Mar 11, 2026
> What do you think training to predict when to use different continuations is other than learning?
Sure, training = learning, but the problem with LLMs is that this is where it stops, other than a limited amount of ephemeral in-context learning/extrapolation.
With an LLM, learning stops post-training when it is "born" and deployed, while with an animal that's when it starts! The intelligence of an animal is a direct result of its lifelong learning, whether that's imitation learning from parents and peers (and subsequent experimentation to refine the observed skill), or the never-ending process of observation/prediction/surprise/exploration/discovery which is what allows humans to be truly creative - not just behaving in ways that are endless mashups of things they have seen and read about other humans doing (cf. training set), but generating truly novel behaviors (such as creating scientific theories) based on their own directed exploration of gaps in mankind's knowledge.
Application of AGI to science and new discovery is a large part of why Hassabis defines AGI as human-equivalent intelligence, and understands what is missing, while others like Sam Altman are content to define AGI as "whatever makes us lots of money".
qsera•Mar 11, 2026
>The fundamental problem with today's LLMs that will prevent them from achieving human level intelligence, and creativity, is that they are trained to predict training set continuations, which creates two very major limitations:
I am of the opinion that imagination and creativity comes from emotion, hence a machine that cannot "feel" will never be truly intelligent.
One can go ahead and ask: but you are just a lump of meat; if you can feel, then a computer of similar structure can too.
If we assume that physical reality is fundamental, then that might make sense. But what if consciousness is fundamental and reality plays on consciousness?
Then randomness, and in turn ideas, come from the attributes of the fundamental reality that we are in.
I'll try to simplify it. Imagine you have an idea that extends your life by a day. Then, out of all the possible worlds, in some worlds you find yourself living into the next day (in others you are dead). But this "idea" you had was just one among an infinite sea of possibilities, and your consciousness inside one such world observes you having that idea and surviving for a day!
If you want to create a machine that can do that, it implies that you should be a consciousness inside a world within it (because the machine cannot pick valid worlds from infinite samples, it just enables consciousness to exist in such suitable worlds). So it cannot be done in our reality!
Mayyyyy be "Quantum Darwinism" is what I am trying to describe here..
HarHarVeryFunny•Mar 11, 2026
> I am of the opinion that imagination and creativity comes from emotion
How do you see emotion as being necessary for creativity?
It sure seems that things like surprise (prediction failure) driven "curiosity" and exploration (I can't predict what will happen if I do X, so let me try) are behind creativity, pushing the boundaries of knowledge and discovering something new.
Perhaps you mean artistic creativity rather than scientific, in which case we're talking about different things, but I'd agree with you since the goal of much art is to elicit an emotional response in those engaging with it.
I don't think there is anything stopping us from implementing emotions, every bit as real as our own, in some form of artificial life if we want to, though. At the end of the day emotion comes down to our primitive brain releasing chemicals like adrenaline, dopamine, etc. as a result of certain stimuli, the functioning of our brain/body being affected by those chemicals, and the feedback loop of us then recognizing how our brain/body is operating differently ("I feel sad/excited/afraid" etc.). It's all very mechanical.
FWIW I think consciousness is also very mechanical, but it seems somewhat irrelevant to the discussion of intelligence/AGI.
myth_drannon•Mar 10, 2026
This could have been 1000 seed rounds. We are creating technological deserts by going all-in on AI and star personalities.
net01•Mar 10, 2026
Because for these investors the opportunity cost of this is higher than other startups.
I agree with you; there should be more diversity in investments in EU startups, but ¯\_(ツ)_/¯ not my money.
dmix•Mar 10, 2026
There seems to be no shortage of capital in the global market.
imjonse•Mar 10, 2026
At least some of that money should definitely go towards improving his powerpoint slides on JEPA related work :)
storus•Mar 10, 2026
Wasn't there some recent argument that world models won't achieve AGI either, due to overlooking the normative framework, failing to learn fundamental symmetries of the world purely from data, and collapsing in multi-step reasoning? JEPA sacrifices fidelity for abstract representation, yet how does that help in the real world, where fidelity is the most important point? It's like relying on differential equations and soon finding out they only cover a minuscule amount of real-world problems, and almost all interesting problems are unsolvable by them.
htrp•Mar 10, 2026
Impressive that the round was 100% oversubscribed, but to be expected when it's the prof who trained a good chunk of the current AI founders.
That seems to be the valuation, not how much they raised afaik.
mmaunder•Mar 10, 2026
That's between 1 and 10 training runs on a large foundational model, depending on pricing discounts and how much they manage to optimize it. I priced this out last night on AWS, which is admittedly expensive, but models have also gotten larger.
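For a rough sense of the arithmetic (every number below is an assumption for illustration - FLOP budget, utilization, and GPU price - not a quote from any provider):

    flops_per_run = 2e25        # rough order of a frontier pretraining run
    gpu_peak_flops = 1e15       # ~1 PFLOP/s BF16 for an H100-class GPU
    utilization = 0.4           # optimistic-but-realistic cluster utilization
    price_per_gpu_hour = 10.0   # roughly on-demand cloud pricing, USD

    gpu_hours = flops_per_run / (gpu_peak_flops * utilization) / 3600
    cost_per_run = gpu_hours * price_per_gpu_hour
    runs_per_billion = 1e9 / cost_per_run
    print(f"~${cost_per_run / 1e6:.0f}M per run, ~{runs_per_billion:.1f} runs per $1B")

That lands in the 1-to-10-runs range; discounts, reserved pricing, and better utilization swing it a lot in either direction.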
mihaitoth•Mar 10, 2026
This couldn't have happened sooner, for 2 reasons.
1) the world has become a bit too focused on LLMs (although I agree that the benefits & new horizons that LLMs bring are real). We need research on other types of models to continue.
2) I almost wrote "Europe needs some aces". Although I'm European, my attitude is not at all one of competition. This is not a card game. What Europe DOES need is an ATTRACTIVE WORKPLACE, so that talent that is useful for AI can also find a place to work here, not only overseas!
FartyMcFarter•Mar 11, 2026
> What Europe DOES need is an ATTRACTIVE WORKPLACE, so that talent that is useful for AI can also find a place to work here, not only overseas!
There is DeepMind, OpenAI and Anthropic in London. Even after Brexit, London is still in Europe.
sbcorvus•Mar 10, 2026
More research on more models = more betta
ardawen•Mar 10, 2026
Does anyone have a sense of how funding like this is typically allocated?
How much tends to go toward compute/training versus researchers, infrastructure, and general operations?
cmrdporcupine•Mar 10, 2026
Looks like they'll be hiring in Montreal in addition to Paris (and NYC and Singapore): https://jobs.ashbyhq.com/ami
I hope they grow that office like crazy. This would be really good for Canada. We have (or have had) the AI talent here (though maybe less so overall in Montreal than in Toronto/Waterloo and Vancouver and Edmonton).
And I hope Carney is promoting the crap out of this and making it worth their while to build that office out.
I don't really do Python or large scale learning etc, so don't see a path for myself to apply there but I hope this sparks some employment growth here in Canada. Smart choice to go with bilingual Montreal.
compounding_it•Mar 11, 2026
Montreal and Paris mean that Europeans and the French can move in and out when it comes to hiring. I really like how the world has taken an interest in the EU, Canada, and Australia now that the West has become unstable for immigration.
LarsDu88•Mar 10, 2026
There have been a few very interesting JEPA publications from LeCun recently, particularly the LeJEPA paper, which claims to simplify a lot of training headaches for that class of models.
JEPAs also strike me as being a bit more akin to human intelligence, where for example, most children are very capable of locomotion and making basic drawings, but unable to make pixel level reconstructions of mental images (!!).
One thing I want to point out is that these very LeCun-style techniques demonstrating label-free training, such as JEAs like DINO and JEPAs, have been converging on the performance of models that require large amounts of labeled data.
Alexandr Wang is a billionaire who made his wealth through a data labeling company and basically kicked LeCun out.
Overall this will be good for AI and good for open source.
AI is developing backwards. The simplest organisms eat and find food. More complex ones can smell and sense tremors. After several steps in evolution comes vision and complex thought.
AIs that can't smell, can't feel hunger, can't desire -- I do not think they can understand the world the way organic life does.
ruler88•Mar 10, 2026
Meta's greatest loss of the decade
yalogin•Mar 10, 2026
This feels like a more justified investment, as it's trying to move the needle. Hope he succeeds.
levodelellis•Mar 10, 2026
I have no faith in anyone doing AI to accomplish anything (especially relative to how much money they spend) except John Carmack. People should be trying to throw money at him
manojbajaj95•Mar 10, 2026
I attended a talk by Yann LeCun, and he always had a strong opinion about auto-regressive models. It's nice to see someone not just chasing hype but doing more research.
I had lunch with Yann last August, about a week after Alex Wang became his "boss." I asked him how he felt about that, and at the time he told me he would give it a month or two and see how it goes, and then figure out if he should stay or find employment elsewhere. I told him he ought to just create his own company if he decides to leave Meta to chase his own dream, rather than work on the dreams of others.
That said, while I 100% agree with him that LLMs won't lead to human-like intelligence (I think AGI is now an overloaded term, but Yann uses it in its original definition), I'm not fully on board with his world model strategy as the path forward.
yalok•Mar 10, 2026
> I'm not fully on board with his world model strategy as the path forward
Can you please elaborate on your strategy as the path forward?
echelon•Mar 11, 2026
You have to understand the strategy of all the other players:
Build attention-grabbing, monetizable models that subsidize (at least in part) the run up to AGI.
Nobody is trying to one-shot AGI. They're grinding and leveling up while (1) developing core competencies around every aspect of the problem domain and (2) winning users.
I don't know if Meta is doing a good job of this, but Google, Anthropic, and OpenAI are.
Trying to go straight for the goal is risky. If the first results aren't economically viable or extremely exciting, the lab risks falling apart.
This is the exact point that Musk was publicly attacking Yann on, and it's likely the same one that Zuck pressed.
YetAnotherNick•Mar 11, 2026
> Trying to go straight for the goal is risky.
That's the point of it. You need to take more risk for different approach. Same as what OpenAI did initially.
SilverBirch•Mar 11, 2026
There are two points here. The first is that a strategy of monetizing models to fund the goal of reaching AGI is indistinguishable from just running a business selling LLM access: you don't actually need to be trying to reach AGI, you can just run an LLM company, and that is probably what these companies are largely doing. The AGI talk is just a recruiting/marketing strategy.
Secondly, it's not clear that the current LLMs are a run up to AGI. That's what LeCun is betting - that the LLM labs are chasing a local maxima.
khafra•Mar 11, 2026
I mean, Sutskever and Carmack are trying to one-shot AGI. We just don't talk about them as much as we do the labs with products, because their labs aren't selling products.
boulos•Mar 11, 2026
On recent podcasts, Ilya says he's no longer assuming they can jump straight there.
teleforce•Mar 10, 2026
It's really inevitable, isn't it: we are going from RAG to PAG, or physical augmented generation.
We already have PINNs, or physics-informed neural networks [1]. Soon we are going to have physical field computing via complex-valued network quantization (CVNN), which has recently been proposed for more efficient physical AI [2].
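For readers unfamiliar with PINNs, here is a minimal generic sketch (a toy ODE, with a network and optimizer chosen just for illustration, not the specific method in [1]): the loss penalizes the residual of the physics equation plus the boundary condition.

    import torch

    torch.manual_seed(0)
    net = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)

    # Fit u(x) to the ODE u'(x) = -u(x) with u(0) = 1 on the interval [0, 2].
    for step in range(3000):
        x = 2.0 * torch.rand(64, 1)
        x.requires_grad_(True)                     # collocation points
        u = net(x)
        du_dx = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
        physics_loss = ((du_dx + u) ** 2).mean()   # residual of u' + u = 0
        boundary_loss = (net(torch.zeros(1, 1)) - 1.0).pow(2).mean()   # u(0) = 1
        loss = physics_loss + boundary_loss
        opt.zero_grad()
        loss.backward()
        opt.step()

    print(net(torch.tensor([[1.0]])).item())   # should be close to exp(-1) ≈ 0.368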
I wonder how Carmack's AGI work is going. He's been quiet for a while.
groundzeros2015•Mar 10, 2026
Well he will need to spend a lot less time on twitter to be successful in a new venture
hinkley•Mar 11, 2026
The better to make paperclips, my dear!
sbinnee•Mar 11, 2026
So it is a startup? In fact, I expected it from his reply to my concern. In my opinion, to explore the unknown, an institute like Mila, led by Yoshua Bengio, would have been more fitting. But Yann LeCun's career and his reply to my rant[1] speak for themselves. I wonder how he is going to make money. Setting aside all my concerns, I wish him the best.
> You're absolutely right. Only large and profitable companies can afford to do actual research. All the historically impactful industry labs (AT&T Bell Labs, IBM Research, Xerox PARC, MSR, etc) were with companies that didn't have to worry about their survival. They stopped funding ambitious research when they started losing their dominant market position.
Refreshing to see some competition to the US AI scene. It's been the same three models trying to one up each other by copying and tweaking rather than pushing true innovation
halayli•Mar 11, 2026
I feel HN comments have been getting hijacked for a long time now by LLM agents. Always so early, very positive, and hard to spot. Some replace em-dashes with --, some replace them with a single dash, some remove them altogether. I wonder how much time it is taking from @dang and the other moderators who help maintain this community.
dang•Mar 11, 2026
Can you mention some specific examples? If you don't want to post them here, emailing hn@ycombinator.com would be good.
We recently promoted the no-generated-comments rule from case law [1] to the site guidelines [2], and we're being pretty active about banning accounts that break it.
It's curious to me why we have no theory of intelligence. By which I mean an actual hard and verified theory, as in physics for gravity, electromagnetism, quantum mechanics.
Intelligence is simply not well-understood at a mathematical level. Like medieval engineers, we rely so heavily on experimentation in AI. We have no idea how far away from the human level we actually are. Or how far above the human level we can get. Or what, if anything, the limits of intelligence are.
jimbokun•Mar 11, 2026
By now you would have to say it’s because “intelligence” is no more well defined than “consciousness” or “the soul”.
A more concrete idea like “learning” has been very strongly defined and quantifiable, which is maybe why progress in a theory of learning is so much more advanced than a theory of “intelligence“.
programjames•Mar 11, 2026
I think this is the equivalent of a non-nuclear physicist asking, "why do we have no theory of nuclear physics?" in the late 1930s. Some people do, they're just not sharing it.
booleandilemma•Mar 11, 2026
Who is more intelligent: a twenty-something influencer making money from her bedroom, or a grad student barely making ends meet?
Who is more intelligent: a politician, or a high school teacher?
What is intelligence, anyway?
Mistletoe•Mar 11, 2026
We have a pretty good answer to your questions, they are called IQ tests. It’s not like measuring intelligence is uncharted territory.
Gemini 3 Pro has an IQ of 130 now but we keep moving the goalposts and being like “not THAT intelligence, we mean this other intelligence”. I suspect, and history shows us this will be the case, that humans will judge AIs as not human and not intelligent and not needing rights way past the point where they should have rights, even when vastly superior to human intelligence.
booleandilemma•Mar 11, 2026
IQ tests are nonsense. The more IQ tests you take the better at them you get. And who is "we", you pretentious dirtbag.
namero999•Mar 11, 2026
IQ tests only measure the ability to pass IQ tests, they say very little about intelligence. MMA fighters might be among the most intelligent people on the planet, playing 4D bullet chess with each part of their body at light speed, while scoring a flat 100 at IQ tests (the average).
tellarin•Mar 11, 2026
Selfless plug here... Some collaborators and I just released a first version of a benchmark that we think highlights a critical gap in recent models' understanding of causality in the real world, beyond a physics focus.
Everyday environments are rich in tangible control interfaces (TCIs), like light switches, appliance panels, and embedded GUIs, that are designed for humans and demand not just commonsense and physics reasoning but also causal prediction and outcome verification across time and space (e.g., delayed heating, remote lights).
Feedback, suggestions, and collaborators are very welcome!
ernsheong•Mar 11, 2026
One wonders why this sort of research isn’t in academia but in startups instead.
chabons•Mar 11, 2026
Where in academia can one get a Billion (with a b) dollars to research something?
blobbers•Mar 11, 2026
Am I going to finally get a robot to fold my clothes?
noiv•Mar 11, 2026
Wouldn't that involve reading and understanding an enormous amount of sensor data?
Toto336699•Mar 11, 2026
Following in the footsteps of Fei-Fei Li's World Labs?
They are currently estimated to be at a 5bn valuation.
_giorgio_•Mar 11, 2026
LeCun has had every advantage imaginable — and the scoreboard remains empty.
He joined Facebook (now Meta) in December 2013. That's over 12 years of access to one of the largest AI labs in the world, near-unlimited compute, and some of the best researchers money can buy.
He introduced I-JEPA in 2023, nearly 3 years ago. It was supposed to represent a fundamental shift in how machines learn — moving beyond generative models toward a deeper, more structured world understanding.
And yet: I-JEPA hasn't decisively beaten existing models on any major benchmark. No Meta product uses JEPA as a core approach. The research community hasn't adopted it — the field keeps pushing on LLMs and diffusion models. There's been no "GPT moment" for JEPA, no single result that made its value obvious to everyone.
So the question becomes simple: how many years, how many resources, and how many failed proof-of-concepts does it take before we're allowed to judge whether an idea actually works?
snek_case•Mar 11, 2026
First, believe it or not, 3 years is not that long. It's also not a given that LeCun was given the resources he needed to work on this tech at Meta. Zuck wanted another llama.
Second, AMI Labs just secured a billion in funding, and while that's a lot of money, it's literally just a fraction of the yearly salary they are paying to Wang. Big tech companies are literally throwing tens of billions to keep doing the same thing, just on a bigger scale. Why not try something else once in a while?
julius_eth_dev•Mar 11, 2026
LeCun has been pushing world models and joint embedding predictive architectures (JEPA) for years now as an alternative to the generative pretraining paradigm. The core bet — that you need learned abstract representations of physical dynamics rather than just next-token prediction — is compelling, but $1B is a lot of capital to validate an architecture that still hasn't demonstrated clear advantages over scaling what already works. The interesting question is whether this funding lets them finally show JEPA-style approaches outperforming autoregressive models on tasks requiring genuine physical reasoning, or if the money just gets absorbed into the same GPU scaling game everyone else is playing.
taytus•Mar 11, 2026
He raised $1B, but couldn't OAI, Google or Anthropic try similar approaches? Lack of funding isn't a problem those companies have. Why wouldn't they also spend $1B, or 5 times that, and outcompete (in theory)?
bluesounddirect•Mar 11, 2026
Yann is going to sell you the opportunity to sell people the opportunity of better AI.
kkwteh•Mar 11, 2026
$1B at a $3.5B valuation. Seems problematic from a cap table perspective.
lazyguythugman•Mar 11, 2026
I've been following him on X for a while. I'm surprised he has time for this because he is always retweeting anti-Trump stuff all day, every day.
voxleone•Mar 11, 2026
I rank with those who think human-like intelligence will require embeddings grounded in multiple physical sensory domains (vision, touch, audio, chemical sensing, etc.) fused into a shared world representation. That seems much closer to how biological intelligence works than text-only models. But if this path succeeds and produces systems with something like genuine understanding or sentience, there’s a deeper question: what is the moral status of such systems? If they have experiences or agency, treating them purely as tools could start to look uncomfortably close to slavery.
djeastm•Mar 11, 2026
It's an interesting question. On one hand, we don't worry about this much with animals, the most advanced of which we know have personalities, moods, etc. (pigs, for instance). They really only seem to lack the language and higher-order reasoning skills. But where's the line?
confidantlake•Mar 11, 2026
And while they don't have language like we do, dogs can understand basic commands and they aren't even the smartest animals.
pegasus•Mar 11, 2026
We do worry much more about animal well-being than we worry about our "lumps of metal" (as a cousin comment fittingly put it). As we should, and generally I think we should worry much more about animal welfare. I find concerns for AI system welfare voiced by people like Thomas Metzinger wildly misguided.
nprateem•Mar 11, 2026
Lol. A lump of metal can't be sentient.
heisig•Mar 11, 2026
Says the bag of lipids and proteins :)
snek_case•Mar 11, 2026
Mostly water, actually.
butlike•Mar 11, 2026
Typical. You know they pump the chickens at the grocery store too.
busyant•Mar 11, 2026
Carbon, Hydrogen, Oxygen, Nitrogen, Phosphorus, Sulfur and a dash of other elements.
$99.85 at Sigma-Aldrich
ToValueFunfetti•Mar 11, 2026
Yeah, call me when Yann incorporates the four humors and the elemental force of fire, from which we draw life. Metal lacks the nature for this purpose.
I think the more likely retort will be that we can't be smart, by the AI's standard.
mc32•Mar 11, 2026
What's the difference between thinking your brain is a slave to your body and vice versa?
We only think slavery is bad because we have a philosophy and language to describe and evaluate the situation. It's unlikely ant colonies understand the concepts of slavery, eunuchs, or feminism. We have the framework to understand these concepts; without it we'd be oblivious to them.
carra•Mar 11, 2026
I don't think they will have sentience or agency unless they are designed to:
1) Keep thinking continuously, as opposed to current AIs that stop functioning between prompts.
2) Have permanent memory of their previous experiences.
3) Be able to alter their own weights based on those experiences (a.k.a. learn).
snek_case•Mar 11, 2026
That's the direction the field is already going with "agents". People want autonomous AI agents that are capable of acting independently and that have more and more capabilities. For example, something like Claude code, but that acts as a sidekick that is constantly running, and able to act without being prompted. That's what people are imagining when they talk about teams of agents. You act as a manager, but your coding agents are off working on various features and only check in periodically.
butlike•Mar 11, 2026
They won't have sentience because it will be antithetical to capitalist business ideology. There's no good business value proposition for having the AI daydream like humans do, or 'sleep' while 'on', or have inspirational thought that might be seen as 'wrong' or useless. If that behavior ever manifests, it will probably be stamped out in a future release.
You can't justify to the board the wasted money to have the android dream.
re5i5tor•Mar 11, 2026
Does anyone else see an echo of Severance (Apple TV series) here?
boringg•Mar 11, 2026
It's interesting that you seem more concerned that we would potentially enslave human-like robots (while arguing for their sentience), when the likelier outcome is that we end up enslaved to/by our own creations.
I'd say, probability-wise, we don't create sentient-like behavior for a long time (low probability); the second circumstance is much more likely.
nashashmi•Mar 11, 2026
Personal Agency is a strong characteristic of a personality. AI would have to acquire a personality first. It could probably do this by copying others statistically. In that case, it is only doing what someone else has done.
There is no such thing as real sentient AI, theoretically. Our current models are only emulations of humans. Maybe in the future someone will figure out a way for computers to learn how to learn. Then maybe someone will codify how computers acquire base methodologies, vs. just implementing any methodology they find in the world.
thih9•Mar 11, 2026
Off topic: in case anyone wants to reject cookies, click the underlined "228" in the popup text:
> We, and our 228 partners use cookies
And then you'll see a "reject all" button. Can't make this up.
zahlman•Mar 11, 2026
I just block "all cross-site cookies" in Firefox settings. This "may cause websites to break" but it hasn't affected anything I care about in years.
JhonOliver•Mar 11, 2026
This is so stupid. AI already understands the physical world. What it can't do is interact with it. There's a hardware bottleneck. It simply isn't responsive.
leventilo•Mar 11, 2026
Interesting that AMI is betting on video-first world models. A 4-year-old learns physics mostly through interaction, pushing, dropping, breaking things, not just watching. Vision helps but the feedback loop from acting in the world seems at least as important. Still, glad someone is putting $1B on a fundamentally different bet than "more text, bigger model."
SilentM68•Mar 11, 2026
Nice, all avenues should be explored. I'm for anything that leads to real solutions, cures, knowledge :)
catigula•Mar 11, 2026
At this point, given that we basically literally have AGI, pursuing other avenues seems like an interesting approach.
JimSanchez•Mar 11, 2026
Interesting perspective from LeCun. The debate between scaling LLMs versus building systems that understand the physical world seems like one of the big open questions in AI right now. It will be fascinating to see whether “world models” end up complementing LLMs or eventually replacing parts of them.
It sounds like you are imagining tacking a world model onto an LLM. That's one approach but not what LeCun advocates for.
Academics don't always make great entrepreneurs
There are a lot more degrees of freedom in world models.
LLMs are fundamentally capped because they only learn from static text -- human communications about the world -- rather than from the world itself, which is why they can remix existing ideas but find it all but impossible to produce genuinely novel discoveries or inventions. A well-funded and well-run startup building physical world models (grounded in spatiotemporal understanding, not just language patterns) would be attacking what I see as the actual bottleneck to AGI. Even if they succeed only partially, they may unlock the kind of generalization and creative spark that current LLMs structurally can't reach.
Everything is bits to a computer, but text training data captures the flattened, after-the-fact residue of baseline human thought: Someone's written description of how something works. (At best!)
A world model would need to capture the underlying causal, spatial, and temporal structure of reality itself -- the thing itself, that which generates those descriptions.
You can tokenize an image just as easily as a sentence, sure, but a pile of images and text won't give you a relation between the system and the world. A world model, in theory, can. I mean, we ought to be sufficient proof of this, in a sense...
So when we think about capturing any underlying structure of reality itself, we are constrained by the tools at hand.
The capability of the tool forms the description which grants the level of understanding.
World models and vision seem like a great use case for robotics, which I can imagine being the main driver for AMI.
Even with continuous backpropagation and "learning" that enriches the training data, so-called online learning, the limitations will not disappear. LLMs will not be able to conclude things about the world based on fact and deduction. They only consider what is likely given their training data. They will not foresee or anticipate events that are unlikely or non-existent in their training data but are bound to happen due to real-world circumstances. They are not intelligent in that way.
Whether humans always apply that much effort to conclude these things is another question. The point is, that humans fundamentally are capable of doing that, while LLMs are structurally not.
The problems are structural/architectural. I think it will take another 2-3 major leaps in architectures, before these AI models reach human level general intelligence, if they ever reach it. So far they can "merely" often "fake it" when things are statistically common in their training data.
That's what I said. Backpropagation cannot be enough; that's not how neurons work in the slightest. When you put biological neurons in a Pong environment they learn to play not through some kind of loss or reward function; they self-organize to avoid unpredictable stimulation. As far as I know, no architecture learns in such an unsupervised way.
https://www.sciencedirect.com/science/article/pii/S089662732...
This sounds very similar to me as to what neurons do (avoid unpredictable stimulation)
f(x)=y' => loss(y',y) => how good was my prediction? Train f through backprop with that error.
While a model trained with reinforcement learning is more similar to this. Where m(y) is the resulting world state of taking an action y the model predicted.
f(x)=y' => m(y')=z => reward(z) => how good was the state I was in based on my actions? Train f with an algorithm like REINFORCE with the reward, as the world m is a non-differentiable black-box.
While a group of neurons is more like predicting what the resulting world state of taking my action will be, g(x,y), and trying to learn by both tuning g and the action taken f(x).
f(x)=y' => m(y')=z => g(x,y)=z' => loss(z,z') => how predictable was the results of my actions? Train g normally with backprop, and train f with an algorithm like REINFORCE with negative surprise as a reward.
After talking with GPT5.2 for a little while, it seems like Curiosity-driven Exploration by Self-supervised Prediction[1] might be an architecture similar to the one I described for neurons? But with the twist that f is rewarded by making the prediction error bigger (not smaller!) as a proxy of "curiosity".
[1] https://arxiv.org/pdf/1705.05363
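If it helps make the distinction concrete, here is a minimal NumPy sketch of those signals in a toy 1-D environment of my own invention (the names f, g and m follow the notation above; this is just an illustration, not the paper's setup):

    import numpy as np

    rng = np.random.default_rng(0)
    sigma = 0.3          # exploration noise of the Gaussian policy f
    lr = 0.05
    w_f, w_g = 0.0, 0.0  # policy f and learned world model g, both tiny linear maps

    def m(action):
        """Black-box world m: nonlinear response plus noise, not differentiable from our side."""
        return np.tanh(2.0 * action) + 0.05 * rng.normal()

    for step in range(500):
        x = rng.normal()                      # observation
        mu = w_f * x                          # policy mean, f(x)
        a = mu + sigma * rng.normal()         # sampled action y'
        z = m(a)                              # real next state m(y')
        z_hat = w_g * a                       # learned prediction of that state
        surprise = (z_hat - z) ** 2           # prediction error of the world model

        # the plain backprop/supervised case: gradient descent on g's prediction error
        w_g -= lr * 2.0 * (z_hat - z) * a

        # the REINFORCE-style case: reward = -surprise makes f seek predictable states;
        # flipping the sign gives the curiosity-driven variant described above
        reward = -surprise
        w_f += lr * reward * ((a - mu) / sigma**2) * x

The supervised/backprop update shows up in how g is trained, the REINFORCE update in how f is trained through the non-differentiable world m, and flipping the reward sign turns "seek predictable states" into the curiosity objective.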
So my question is: when is there enough training data that you can handle 99.99% of the world?
Can you be a bit more specific about those bounds? Maybe via an example?
Our training data is a lot more diverse than an LLMs. We also leverage our senses as a carrier for communicating abstract ideas using audio and visual channels that may or may not be grounded in reality. We have TV shows, video games, programming languages and all sorts of rich and interesting things we can engage with that do not reflect our fundamental reality.
Like LLMs, we can hallucinate while we sleep or we can delude ourselves with untethered ideas, but UNLIKE LLMs, we can steer our own learning corpus. We can train ourselves with our own untethered “hallucinations” or we can render them in art and share them with others so they can include it in their training corpus.
Our hallucinations are often just erroneous models of the world. When we render it into something that has aesthetic appeal, we might call it art.
If the hallucination helps us understand some aspect of something, we call it a conjecture or hypothesis.
We live in a rich world filled with rich training data. We don’t magically anticipate events not in our training data, but we’re also not void of creativity (“hallucinations”) either.
Most of us are stochastic parrots most of the time. We’ve only gotten this far because there are so many of us and we’ve been on this earth for many generations.
Most of us are dazzled and instinctively driven to mimic the ideas that a small minority of people “hallucinate”.
There is no shame in mimicking or being a stochastic parrot. These are critical features that helped our ancestors survive.
This is critical. We have some degree of attentional autonomy. And we have a complex tapestry of algorithms running in thalamocortical circuits that generate “Nows”. Truncation commands produce sequences of acts (token-like products).
Kahneman’s whole framework points the same direction. Most of what people call “reasoning” is fast, associative, pattern-based. The slow, deliberate, step-by-step stuff is effortful and error-prone, and people avoid it when they can. And even when they do engage it, they’re often confabulating a logical-sounding justification for a conclusion they already reached by other means.
So maybe the honest answer is: the gap between what LLMs do and what most humans do most of the time might be smaller than people assume. The story that humans have access to some pure deductive engine and LLMs are just faking it with statistics might be flattering to humans more than it’s accurate.
Where I’d still flag a possible difference is something like adaptability. A person can learn a totally new formal system and start applying its rules, even if clumsily. Whether LLMs can genuinely do that outside their training distribution or just interpolate convincingly is still an open question. But then again, how often do humans actually reason outside their own “training distribution”? Most human insight happens within well-practiced domains.
I've never heard about the Wason selection task, looked it up, and could tell the right answer right away. But I can also tell you why: because I have some familiarity with formal logic and can, in your words, pattern-match the gotcha that "if x then y" is distinct from "if not x then not y".
In contrast to you, this doesn't make me believe that people are bad at logic or don't really think. It tells me that people are unfamiliar with "gotcha" formalities introduced by logicians that don't match the everyday use of language. If you added a simple clarification to the problem, such as "Note that in this context, 'if' only means that...", most people would almost certainly answer it correctly.
Mind you, I'm not arguing that human thinking is necessarily more profound than what LLMs could ever do. However, judging from the output, LLMs have a tenuous grasp on reality, so I don't think that reductionist arguments along the lines of "humans are just as dumb" are fair. There's a difference that we don't really know how to overcome.
Though note that, as GP said, on the Wason selection task people famously do much better when it's framed in a social context. That at least partially undermines your theory that it's a lack of familiarity with the terminology of formal logic.
It's as simple as that. In common use, "if x then y" frequently implies "if not x then not y". Pretending that it's some sort of a cognitive defect to interpret it this way is silly.
> Decoding analyses of neural activity further reveal significant above chance decoding accuracy for negated adjectives within 600 ms from adjective onset, suggesting that negation does not invert the representation of adjectives (i.e., “not bad” represented as “good”)[...]
From: Negation mitigates rather than inverts the neural representations of adjectives
At: https://journals.plos.org/plosbiology/article?id=10.1371/jou...
Agreed. More broadly, classical logic isn't the only logic out there. Many logics will differ on the meaning of implication if x then y. There's multiple ways for x to imply y, and those additional meanings do show up in natural language all the time, and we actually do have logical systems to describe them, they are just lesser known.
Mapping natural language into logic often requires a context that lies outside the words that were written or spoken. We need to represent into formulas what people actually meant, rather than just what they wrote. Indeed the same sentence can be sometimes ambiguous, and a logical formula never is.
As an aside, I wanna say that material implication (that is, the "if x then y" of classical logic) deeply sucks, or rather, an implication in natural language very rarely maps cleanly into material implication. Having an implication if x then y being vacuously true when x is false is something usually associated with people that smirk on clever wordplays, rather than something people actually mean when they say "if x then y"
> You are shown a set of four cards placed on a table, each of which has a number on one side and a color on the other. The visible faces of the cards show 3, 8, blue and red. Which card(s) must you turn over in order to test that if a card shows an even number on one face, then its opposite face is blue?
Confusion over the meaning of 'if' can only explain why people select the Blue card; it can't explain why people fail to select the Red card. If 'if' meant 'if and only if', then it would still be necessary to check that the Red card didn't have an even number. But according to Wason[0], "only a minority" of participants select (the study's equivalent of) the Red card.
[0] https://web.mit.edu/curhan/www/docs/Articles/biases/20_Quart...
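A brute-force enumeration makes the asymmetry easy to see: a card must be turned over iff some possible hidden face could falsify the rule. A toy sketch of the quoted task (assuming single digits 0-9 for the numbers):

    # Rule under test: if a card shows an even number on one face, its opposite face is blue.
    visible = ["3", "8", "blue", "red"]
    numbers = [str(n) for n in range(10)]
    colors = ["blue", "red"]

    def falsifies(number, color):
        return int(number) % 2 == 0 and color != "blue"

    for face in visible:
        # the hidden face is a color if the visible face is a number, and vice versa
        hidden_options = colors if face in numbers else numbers
        must_turn = any(
            falsifies(face, hidden) if face in numbers else falsifies(hidden, face)
            for hidden in hidden_options
        )
        print(f"{face:>4}: {'must turn over' if must_turn else 'can leave'}")

It prints that only the 8 and the red card need turning; the red card is the one most participants miss.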
I think a corrected question should clarify, in an obvious way, that we are verifying not "a card" but "a rule" applicable to all cards. So "a" needs to be replaced with "all" or "any", and a mention of a rule or pattern needs to be added.
So, when being told:
"Which card(s) must you turn over in order to test that if a card shows an even number on one face, then its opposite face is blue?"
they translate it to:
"Check the cards that show an even number on one face to see whether their opposite face is blue and vice versa"
Based on this, many would naturally pick the 8 card (to test the direct case), and the blue card (to test the "vice versa" case).
They won't check the red card to see if there's an even number on the other side that invalidates the formulation as a general rule, because they're not in the mindset of testing a general rule.
Would they do the same if they had more familiarity with rule validation in everyday life, or if they had a more verbose and explicit explanation of the goal?
I'm not sure why people keep comparing machine behaviour to humans'. It's like economic models that assume perfect rationality... yeah, that's not reality, mate.
We keep benchmarking models against the best humans and the best human institutions - then when someone points out that swarms, branching, or scale could close the gap, we dismiss it as "cheating". But that framing smuggles in an assumption that intelligence only counts if it works the way ours does. Nobody calls a calculator a cheat for not understanding multiplication - it just multiplies better than you, and that's what matters.
LLMs are a different shape of intelligence. Superhuman on some axes, subpar on others. The interesting question isn't "can they replicate every aspect of human cognition" - it's whether the axes they're strong on are sufficient to produce better than human outcomes in domains that matter. Calculators settled that question for arithmetic. LLMs are settling it for an increasingly wide range of cognitive work. The fact that neither can flip a burger is irrelevant.
Humans don't have a monopoly on intelligence. We just had a monopoly on generality and that moat is shrinking fast.
We are doing an inversion of the "God of the gaps" into an "LLM of the gaps", where gaps in LLM capabilities are considered inherently negative and limiting.
And the question "Are these things really intelligent?" is just a proxy for that.
And we are interested in that question because it is necessary to justify the massive investment these things are getting now. It is quite easy to look at these things and conclude that they will continue to progress without any limit.
But that would be like looking at data compression at the time of its conception and thinking that it is only a matter of time before we can compress 100 GB into 1 KB.
We live in a time of scams that are obvious if you take a second look. With something that requires much deeper scrutiny, it is possible to generate a much larger bubble.
> and that moat is shrinking fast..
The point is that in reality it is not. It is just an appearance. If you consider how these things work, there is no justification for this conclusion.
I have said this elsewhere, but the problem of hallucination itself, along with the requirement of re-training, is the smoking gun that these things are not intelligent in ways that would justify these massive investments.
You're right that the Wason task is partly about a mismatch between how "if" works in formal logic and how it works in everyday language. That's a fair point. But I think it actually supports what I'm saying rather than undermining it. If people default to interpreting "if x then y" as "if and only if" based on how language normally works in conversation, that is pattern-matching from familiar context. It's a totally understandable thing to do, and I'm not calling it a cognitive defect. I'm saying it's evidence that our default mode is contextual pattern-matching, not rule application. We agree on the mechanism, we're just drawing different conclusions from it.
Your own experience is interesting too. You got the right answer because you have some background in formal logic. That's exactly what I'd expect. Someone who's practiced in a domain recognizes the pattern quickly. But that's the claim: most reasoning happens within well-practiced domains. Your success on the task doesn't counter the pattern-matching thesis, it's a clean example of it working well.
On the broader point about LLMs having a "tenuous grasp on reality," I hear that, and I don't want to flatten the differences. There probably is something meaningfully different going on with how humans stay grounded. I just think the "humans reason, LLMs pattern-match" framing undersells how much human cognition is also pattern-matching, and that being honest about that is more productive than treating it as a reductionist insult.
Your point rings true with most human reasoning most of the time. Still, at least some humans do have the capability to run that deductive engine, and it seems to be a key part (though not the only part) of scientific and mathematical reasoning. Even informal experimentation and iteration rest on deductive feedback loops.
I can perform symbolic calculations too. But most people have limited versions of this skill, and many people who don’t learn to think symbolically have full lives.
I think it is fair to say humans don’t naturally think in formal or symbolic reasoning terms.
People pattern match.
Another clue is that humans have to practice things and become familiar with them to reason even somewhat reliably about them, even if they have already learned some formal reasoning.
—-
Higher level reasoning is always implemented as specific forms of lower order reasoning.
There is confusion about substrate processing vs. what higher order processes can be created with that substrate.
We can “just” be doing pattern matching from an implementation view, and yet go far “beyond” pattern matching with specific compositions of pattern matching, from a capability view.
How else could neurons think? We are “only” neurons. Yet we far surpass the kinds of capabilities neurons have.
Some references on that
https://en.wikipedia.org/wiki/Thinking,_Fast_and_Slow
https://thedecisionlab.com/reference-guide/philosophy/system...
System 1 really looks like an LLM (indeed completing a phrase is an example of what it can do, like "you either die a hero, or you live long enough to become the _"). It's largely unconscious and runs all the time, pattern matching on random stuff
System 2 is something else and looks like a supervisor system, a higher level stuff that can be consciously directed through your own will
But the two systems run at the same time and reinforce each other
S1 is “bare” language production, picking words or concepts to say or think by a fancy pattern prediction. There’s no reasoning at this level, just blabbering. However, language by itself weeds out too obvious nonsense purely statistically (some concepts are rarely in the same room), but we may call that “mindlessly” - that’s why even early LLMs produced semi-meaningful texts.
S2 is a set of patterns inside the language (“logic”), that biases S1 to produce reasoning-like phrases. Doesn’t require any consciousness or will, just concepts pushing S1 towards a special structure, simply backing one keeps them “in mind” and throws in the mix.
I suspect S2 has a spectrum of rigorousness, because one can just throw in some rules (like “if X then Y, not Y therefore not X”) or may do fancier stuff (imposing a larger structure to it all, like formulating and testing a null hypothesis). Either way it all falls down onto S1 for a ultimate decision-making, a sense of what sounds right (allowing us our favorite logical flaws), thus the fancier the rules (patterns of “thought”) the more likely reasoning will be sounder.
S2 doesn’t just rely but is a part of S1-as-language, though, because it’s a phenomena born out (and inside) the language.
Whether it's willfully, "consciously" engaged, or whether it works just because S1 predicts the logical-thinking concept as appropriate for certain lines of thinking and starts to involve it, probably doesn't even matter - it mainly depends on whatever definition of "will" we would like to pick (there are many).
LLMs and humans can hypothetically do both just fine, but when it comes to checking, humans currently excel because (I suspect) they have a “wider” language in S1, that doesn’t only include word-concepts but also sensory concepts (like visuospatial thinking). Thus, as I get it, the world models idea.
While humans did seemingly evolve socially very fast, given the tools we seem to have had for a few hundred thousand years it could have been far faster, were there not some other limitations in play.
This is because the 'reasoning' part of our brain came from evolution: when we started to communicate with others, we needed to explain our behaviour.
Which is fascinating if you think of the implications. For the most part we think we are being logical, but in reality we are pattern matching/impulsive and using our reasoning/logic to come up with excuses for why we have chosen what we had already decided.
It explains a lot about the world and why it's so hard to reason with someone: we are assuming the decision came from reason in the first place, when, if you look at such people's choices, it's clear it didn't.
Humans can produce new concepts and then symbolize them for communication purposes. The meaning of concepts is grounded in operational definitions - in a manner that anyone can understand because they are operational, and can be reproduced in theory by anyone.
For example, Euclid invented the concepts of a point, angle and line to operationally represent geometry in the real world. These concepts were never "there" to begin with. They were created from scratch to "build" a world-model that helps humans navigate the real world.
Euclid went outside his "training distribution" to invent point, angle, and line. Humans have this ability to construct new concepts by interaction with the real world - bringing the "unknown" into the "known" so-to-speak. Animals have this too via evolution, but it is unclear if animals can symbolize their concepts and skills to the extent that humans can.
Sure, but the question is how often this actually happens versus how often people are doing something closer to recombination and pattern-matching within familiar territory. The point was about the base rate of genuine novel reasoning in everyday human cognition, and I don't think this addresses that.
> Euclid invented the concepts of a point, angle and line to operationally represent geometry in the real world. These concepts were never "there" to begin with.
This isn't really true though. Egyptian and Babylonian surveyors were working with geometric concepts long before Euclid. What Euclid did was axiomatize and systematize knowledge that was already in wide practical use. That's a real achievement, but it's closer to "sophisticated refinement within a well-practiced domain" than to reasoning from scratch outside a training distribution. If anything the example supports the parent comment.
There's also something off about saying points and lines were "never there." Humans have spatial perception. Geometric intuitions come from embodied experience of edges, boundaries, trajectories. Formalizing those intuitions is real work, but it's not the same as generating something with no prior basis.
The deeper issue is you're pointing to one of the most extraordinary intellectual achievements in human history and treating it as representative of human cognition generally. The whole point, drawing on Kahneman, is that most of what we call reasoning is fast associative pattern-matching, and that the slow deliberate stuff is rarer and more error-prone than people assume. The fact that Euclid existed doesn't tell us much about what the other billions of humans are doing cognitively on a Tuesday afternoon.
> The fact that Euclid existed doesn't tell us much about what the other billions of humans are doing cognitively on a Tuesday afternoon.
Birds can fly - so, there is some flying intelligence built into their dna. But, are they aware of their skill to be able to create a theory of flight, and then use that to build a plane ? I am just pointing out that intuitions are not enough - the awareness of the intuitions in a manner that can symbolize and operationalize it is important.
> The whole point, drawing on Kahneman, is that most of what we call reasoning is fast associative pattern-matching, and that the slow deliberate stuff is rarer and more error-prone than people assume
David Bessis, in his wonderful book [1] argues that the cognitive actions done by you and I on a tuesday afternoon is the same that mathematicians do - just that we are unaware of it. Also, since you brought up Kahneman, Bessis proposes a System 3 wherein inaccurate intuitions is corrected by precise communication.
[1] Mathematica: A Secret World of Intuition and Curiosity
On Bessis, I actually think his argument is more compatible with what I was saying than it might seem. If the cognitive process underlying mathematical reasoning is the same one operating on a Tuesday afternoon, that's an argument against treating Euclid-level formalization as categorically different from everyday cognition. It suggests a continuum rather than a bright line between "pattern matching" and "genuine reasoning." Which is interesting and probably right. But it also means you can't point to Euclid as evidence that humans routinely do something qualitatively beyond what LLMs do. If Bessis is right, then the extraordinary cases and the mundane cases share the same underlying machinery, and the question becomes quantitative (how far along the continuum, how often, under what conditions) rather than categorical.
I'll check out the book though, it sounds like it's making a more careful version of the point than usually gets made in these threads.
I guess I just always thought it was obvious that you can't do better than nature. You can do different things, sure, but if a society of unique individuals wasn't the most effective way of making progress, nature itself would not have chosen it.
So in a way I think Yann is smart because he got money, but in a way I think he's a fucking idiot if he can't see just how very, very very far we are from competing with organic intelligence.
You're onto something there.
If everyone knew they were to die tomorrow, all of a sudden they'd choose to act differently. There is no logical thought process that determines that - it's something else. Something we can't concretely point toward as an object.
But one might say that the brain is not lossless ... True, good point. But in what way is it lossy? Can that be simulated well enough to learn an Einstein? What gives events significance is very subjective.
I like how people are accepting this dubious assertion that Einstein would be "useful" if you surgically removed his hippocampus and engaging with this.
It also calls this Einstein an AGI rather than a disabled human???
"Reading, after a certain age, diverts the mind too much from its creative pursuits. Any man who reads too much and uses his own brain too little falls into lazy habits of thinking".
-- Albert Einstein
As for the "just put a vision LLM in a robot body" suggestion: People are trying this (e.g. Physical Intelligence) and it looks like it's extraordinarily hard! The results so far suggest that bolting perception and embodiment onto a language-model core doesn't produce any kind of causal understanding. The architecture behind the integration of sensory streams, persistent object representations, and modeling time and causality is critically important... and that's where world models come in.
While I suspect the latter is a real problem (because all mammal brains* are much more example-efficient than all ML), the former is more about productisation than a fundamental thing: the models can be continuously updated already, but that makes it hard to deal with regressions. You kinda want an artefact with a version stamp that doesn't change itself before you release the update, especially as this isn't like normal software where specific features can be toggled on or off in isolation of everything else.
* I think. Also, I'm saying "mammal" because of an absence of evidence (to my *totally amateur* skill level) not evidence of absence.
The fundamental difference is that physical neurons have a discrete on/off activation, while digital "neurons" in a network are merely continuous differentiable operations. They also don't have a notion of "spike timing dependency" to avoid overwriting activations that weren't related to an outcome. There are things like reward decay over time, but this applies to the signal at a very coarse level; updates are still scattered across almost the entire system with every training example.
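For anyone who hasn't seen it spelled out, here's a rough sketch of what "discrete on/off activation" plus spike-timing-dependent plasticity look like for a single leaky integrate-and-fire neuron. The constants are made up for illustration, not fit to any biological data:

    import numpy as np

    rng = np.random.default_rng(1)
    dt, tau_v, v_thresh, v_reset = 1.0, 20.0, 1.0, 0.0    # membrane dynamics
    tau_trace, a_plus, a_minus = 20.0, 0.01, 0.012        # STDP parameters

    w = 0.6                          # single synaptic weight
    v = 0.0                          # membrane potential
    pre_trace = post_trace = 0.0     # exponential traces of recent spikes

    for t in range(200):
        pre_spike = rng.random() < 0.25           # Poisson-ish presynaptic input
        v += dt * (-v / tau_v) + w * pre_spike    # leaky integration of the input
        post_spike = v >= v_thresh                # all-or-nothing output spike
        if post_spike:
            v = v_reset

        pre_trace += -dt * pre_trace / tau_trace + pre_spike
        post_trace += -dt * post_trace / tau_trace + post_spike

        # STDP: pre-before-post strengthens the synapse, post-before-pre weakens it,
        # so only recently co-active pathways are updated -- no global error signal
        if post_spike:
            w += a_plus * pre_trace
        if pre_spike:
            w -= a_minus * post_trace
        w = float(np.clip(w, 0.0, 1.0))

The output is a spike train rather than a differentiable activation, and the weight update only ever looks at local spike timing, which is the contrast being drawn with backprop's global updates.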
I think this is true to some extent: we like our tools to be predictable. But we’ve already made one jump by going from deterministic programs to stochastic models. I am sure the moment a self-evolutive AI shows up that clears the "useful enough" threshold we’ll make that jump as well.
I also don’t think there is a reason to believe that self-learning models must be unpredictable.
And generally:
> I want to know the model is exactly the same as it was the last time I used it.
What exactly does that gain you, when the overall behavior is still stochastic?
But still, if it's important to you, you can get the same behavior by taking a model snapshot once we crack continuous learning.
[1] https://arxiv.org/pdf/2501.00663
[2] https://arxiv.org/pdf/2512.24695
Whoever cracks the continuous customized (per user, for instance) learning problem without just extending the context window is going to be making a big splash. And I don't mean cheats and shortcuts, I mean actually tuning the model based on received feedback.
The user wouldn’t know if the continuous learning came from the context or the model retrained. It wouldn’t matter.
Continuous learning seems to be a compute and engineering problem.
My solution is to have this massive 'boot up' prompt, but it becomes extremely tedious to maintain.
A bit like the main character played by Guy Pearce in the movie Memento (which doesn't work out great for him, to be honest).
Ultimately, we still have a lot to learn and a lot of experiments to do. It’s frankly unscientific to suggest any approaches are off the table, unless the data & research truly proves that. Why shouldn’t we take this awesome LLM technology and bring in more techniques to make it better?
A really, really basic example is chess. Current top AI models still don't know how to play it (https://www.software7.com/blog/ai_chess_vs_1983_atari/). The models are surely trained on source material that includes the rules of chess, and even high-level chess games. But the models are not learning how to play chess correctly. They don't have a model to understand how chess actually works — they only have a non-deterministic prediction based on what they've seen, even after being trained on more data about the topic than any chess novice has ever seen. And this is probably one of the easiest things for AI to simulate. Very clear/brief rules, small problem space, no hidden information, but it can't handle the massive decision space because its prediction isn't based on the actual rules, but just on "things that look similar".
(And yeah, I’m sure someone could build a specific LLM or agent system that can handle chess, but the point is that the powerful general purpose models can’t do it out of the box after training.)
Maybe more training & self-learning can solve this, but it’s clearly still unsolved. So we should definitely be experimenting with more techniques.
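For what it's worth, the usual way people quantify the chess claim is plain legality checking: ask the model for a move and let a rules engine decide whether that move can even be played. A rough sketch, where ask_model is a hypothetical stand-in for whatever chat API you'd call and python-chess supplies the rule-keeping the model itself lacks:

    import chess

    def ask_model(fen: str) -> str:
        """Hypothetical stub: return the model's proposed move in SAN for the given position."""
        raise NotImplementedError

    def count_illegal_suggestions(fens: list[str]) -> int:
        """Count positions where the suggested move isn't legal under the actual rules."""
        illegal = 0
        for fen in fens:
            board = chess.Board(fen)
            try:
                board.push_san(ask_model(fen))   # raises if the move can't be played here
            except ValueError:
                illegal += 1
        return illegal

A model with a real internal model of the game should drive that count to zero; in the write-ups people keep posting, general-purpose chat models still propose illegal moves deep into games, which is the gap being described above.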
I mean, sure. But do world models the way LeCun proposes them solve this? I don't think so. JEPAs are just an unsupervised machine learning model at the end of the day; they might end up being better than just autoregressive pretraining on text+images+video, but they are not magic. For example, if you train a JEPA model on data of orbital mechanics, will it learn actually sensible algorithms to predict the planets' motions, or will it just learn a mix of heuristics?
So I do buy his idea. But I disagree that you need world models to get to human level capabilities. IMO there's no fundamental reason why models can't develop human understanding based on the known human observations.
From his point of view, there is not much research left on LLMs. Sure, we can still improve them a bit with engineering around them, but he's more interested in basic research.
What LLMs do is even farther away from what neural nets do, and even there, artificial neurons are inspired by, but do not reimplement, biological neurons.
You can understand human thought in terms of LLMs, but that is just a simile, like understanding physical reality in terms of computers or clockworks.
> One major critique LeCun raises is that LLMs operate only in the realm of language, which is a simple, discrete space compared to the continuous, complex physical world we live in. LLMs can solve math problems or answer trivia because such tasks reduce to pattern completion on text, but they lack any meaningful grounding in physical reality. LeCun points out a striking paradox: we now have language models that can pass the bar exam, solve equations, and compute integrals, yet “where is our domestic robot? Where is a robot that’s as good as a cat in the physical world?” Even a house cat effortlessly navigates the 3D world and manipulates objects — abilities that current AI notably lacks. As LeCun observes, “We don’t think the tasks that a cat can accomplish are smart, but in fact, they are.”
The biggest thing that's missing is actual feedback on their decisions. They have no idea of that, because transformers and embeddings don't model it yet. And language descriptions and image representations of feedback aren't enough. They are too disjointed. It needs more.
It's like the people who are so hyped up about voice-controlled computers. A linear stream of symbols is a huge downgrade in signal, right? I don't want computer interaction to be yet more simplified and worsened.
Compare with domain experts who do real, complicated work with computers, like animators, 3D modelers, CAD, etc. A mouse with six degrees of freedom, and a strong training in hotkeys to command actions and modes, and a good mental model of how everything is working, and these people are dramatically more productive at manipulating data than anyone else.
Imagine trying to talk a computer through nudging a bunch of vertexes through 3D space while flexibly managing modes of "drag" on connected vertexes. It would be terrible. And no, you would not replace that with a sentence of "Bot, I want you to nudge out the elbow of that model" because that does NOT do the same thing at all. An expert being able to fluidly make their idea reality in real time is just not even remotely close to the instead "Project Manager/mediocre implementer" relationship you get prompting any sort of generative model. The models aren't even built to contain specific "Style", so they certainly won't be opinionated enough to have artistic vision, and a strong understanding of what does and does not work in the right context, or how to navigate "My boss wants something stupid that doesn't work and he's a dumb person so how do I convince him to stop the dumb idea and make him think that was his idea?"
https://en.wikipedia.org/wiki/Moravec%27s_paradox
All the things we look at as "Smart" seem to be the things we struggle with, not what is objectively difficult, if that can even be defined.
The density of information in the spatiotemporal world is very very great, and a technique is needed to compress that down effectively. JEPAs are a promising technique towards that direction, but if you're not reconstructing text or images, it's a bit harder for humans to immediately grok whether the model is learning something effectively.
I think that very soon we will see JEPA-based language models, but their key domain may very well be robotics, where machines really need to experience and reason about the physical world differently than in a purely text-based world.
I assume that when you get out of bed in the morning, the first thing you don't do is paint 1000 1080p pictures of what your breakfast looks like.
LeCun's models predict purely in representation space and output no pixel-scale detailed frames. Instead you train a model to generate a lower-dimensional representation of the same thing from different views, penalizing it if the representations differ when looking at the same thing.
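A toy PyTorch sketch of that idea as I understand it (my own simplification, not AMI's code): two views of the same thing are encoded, the only objective is that a predicted latent matches a target latent, and an EMA target encoder is one common trick to keep the representations from collapsing to a constant:

    import torch
    import torch.nn as nn

    dim, emb = 64, 32
    encoder = nn.Sequential(nn.Linear(dim, 128), nn.ReLU(), nn.Linear(128, emb))
    target_encoder = nn.Sequential(nn.Linear(dim, 128), nn.ReLU(), nn.Linear(128, emb))
    target_encoder.load_state_dict(encoder.state_dict())
    for p in target_encoder.parameters():
        p.requires_grad_(False)
    predictor = nn.Sequential(nn.Linear(emb, 128), nn.ReLU(), nn.Linear(128, emb))

    opt = torch.optim.Adam([*encoder.parameters(), *predictor.parameters()], lr=1e-3)

    def ema_update(tau: float = 0.99) -> None:
        # target encoder trails the online encoder; no gradients ever flow into it
        with torch.no_grad():
            for p, tp in zip(encoder.parameters(), target_encoder.parameters()):
                tp.mul_(tau).add_(p, alpha=1.0 - tau)

    for step in range(100):
        x_now = torch.randn(16, dim)                   # stand-in for the current view/frame
        x_next = x_now + 0.1 * torch.randn(16, dim)    # stand-in for the future/other view

        z_pred = predictor(encoder(x_now))             # predicted *latent* of the future
        with torch.no_grad():
            z_target = target_encoder(x_next)          # target latent, never pixels

        loss = ((z_pred - z_target) ** 2).mean()       # loss lives in representation space
        opt.zero_grad()
        loss.backward()
        opt.step()
        ema_update()

There is no decoder back to pixels anywhere; the only pressure is agreement in representation space, which is the "no pixel-scale frames" point.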
I don't think it makes sense conceptually unless you're literally referring to discovering new physical things like elements or something.
Humans are remixers of ideas. That's all we do all the time. Our thoughts and actions are dictated by our environment and memories; everything must necessarily be built up from pre-existing parts.
You can't get Suno to do anything that's not in its training data. It is physically incapable of inventing a new musical genre. No matter how detailed the instructions you give it, and even if you cheat and provide it with actual MP3 examples of what you want it to create, it is impossible.
The same goes for LLMs and invention generally, which is why they've made no important scientific discoveries.
You can learn a lot by playing with Suno.
Einstein’s theory of relativity springs to mind, which is deeply counter-intuitive and relies on the interaction of forces unknowable to our basic Newtonian senses.
There’s an argument that it’s all turtles (someone told him about universes, he read about gravity, etc), but there are novel maths and novel types of math that arise around and for such theories which would indicate an objective positive expansion of understanding and concept volume.
Using the term autoregressive models instead might help.
What current LLMs lack is an inner motivation to create something on their own without being prompted. To think in their free time (whatever that means for batch, on-demand processing), to reflect and learn, eventually to self-modify.
I have a simple brain, limited knowledge, limited attention span, limited context memory. Yet I create stuff based on what I see and read online. Nothing special: sometimes it's more based on someone else's project, sometimes on my own ideas, which I have no doubt aren't that unique among 8 billion other people. Yet consulting with AI provides me with more ideas applicable to my current vision of what I want to achieve. Sure, it's mostly based on generally known (not always known to me) good practices. But my thoughts work the same way, only more limited by what I have slowly learned so far in my life.
The problem is, idk if we're ready to have millions of distinct, evolving, self-executing models running wild without guardrails. It seems like a contradiction: you can't achieve true cognition from a machine while artificially restricting its boundaries, and you can't lift the boundaries without impacting safety.
Virtual simulations are not substitutable for the physical world. They are fundamentally different theory problems that have almost no overlap in applicability. You could in principle create a simulation with the same mathematical properties as the physical world but no one has ever done that. I'm not sure if we even know how.
Physical world dynamics are metastable and non-linear at every resolution. The models we do build are created from sparse irregular samples with large error rates; you often have to do complex inference to know if a piece of data even represents something real. All of this largely breaks the assumptions of our tidy sampling theorems in mathematics. The problem of physical world inference has been studied for a couple decades in the defense and mapping industries; we already have a pretty good understanding of why LLM-style AI is uniquely bad at inference in this domain, and it mostly comes down to the architectural inability to represent it.
Grounded estimates of the minimum quantity of training data required to build a reliable model of physical world dynamics, given the above properties, is many exabytes. This data exists, so that is not a problem. The models will be orders of magnitude larger than current LLMs. Even if you solve the computer science and theory problems around representation so that learning and inference is efficient, few people are prepared for the scale of it.
(source: many years doing frontier R&D on these problems)
What do you mean by that? Simulating physics is a rich field, which incidentally was one of the main drivers of parallel/super computing before AI came along.
Reconstructing ground truth from these measurements, which is what you really want to train on, is a difficult open inference problem. The idiosyncratic effects induce large changes in the relationships learnable from the data model. Many measurements map to things that aren't real. How badly that non-reality can break your inference is context dependent. Because the samples are sparse and irregular, you have to constantly model the noise floor to make sure there is actually some signal in the synthesized "ground truth".
In simulated physics, there are no idiosyncratic measurement issues. Every data point is deterministic, repeatable, and well-behaved. There is also much less algorithmic information, so learning is simpler. It is a trivial problem by comparison. Using simulations to train physical world models is skipping over all the hard parts.
I've worked in HPC, including physics models. Taking a standard physics simulation and introducing representative idiosyncratic measurement seems difficult. I don't think we've ever built a physics simulation with remotely the quantity and complexity of fine structure this would require.
I'll admit I'm not very familiar with that type of work - I'm in the forward solve business - but if assumptions are made on the sensor noise distribution, couldn't those be inferred by more generic models? I realize I'm talking about adding a loop on top of an inverse problem loop, which is two steps away (just stuffing a forward solve in a loop is already not very common due to cost and engineering difficulty).
Or better yet, one could probably "primal-adjoint" this and just solve at once for physical parameters and noise model, too. They're but two differentiable things in the way of a loss function.
In the last step of training LLMs, reinforcement learning from verified rewards, LLMs are trained to maximize the probability of solving problems using their own output, depending on a reward signal akin to winning in Go. It's not just imitating human written text.
Fwiw, I agree that world models and some kind of learning from interacting with physical reality, rather than massive amounts of digitized gym environments is likely necessary for a breakthrough for AGI.
No hate, but this is just your opinion.
The definition of "text" here is extremely broad – an SVG is text, but it's also an image format. It's not incomprehensible to imagine how an AI model trained on lots of SVG "text" might build internal models to help it "visualise" SVGs in the same way you might visualise objects in your mind when you read a description of them.
The human brain only has electrical signals for IO, yet we can learn and reason about the world just fine. I don't see why the same wouldn't be possible with textual IO.
But yeah, I can't imagine that LLMs don't already have a world model in there. They have to. The internet's corpus of text may not contain enough detail to allow a LLM to differentiate between similar-looking celebrities, but it's plenty of information to allow it to create a world model of how we perceive the world. And it's a vastly more information-dense means of doing so.
Imagine that we made an LLM out of all dolphin songs ever recorded; would such an LLM ever reach human-level intelligence? Obviously and intuitively, the answer is NO.
Your comment actually extended this observation for me, sparking hope that systems consuming the natural world as input might avoid this trap, but then I realized that tool use & learning may in fact be all that's needed for a singularity, while consuming raw data streams most of the time might actually be counterproductive.
It could potentially reach super-dolphin level intelligence
Dataset limitations have been well understood since the dawn of statistics-based AI, which is why these models are trained on data and RL tasks that are as wide as possible, and are assessed by generalization performance. Most of the experts in ML, even the mathematically trained ones, within the last few years acknowledge that superintelligence (under a more rigorous definition than the one here) is quite possible, even with only the current architectures. This is true even though no senior researcher in the field really wants superintelligence to be possible, hence the dozens of efforts to disprove its potential existence.
Not so fast. People have built pretty amazing thought frameworks out of a few axioms, a few bits, or a few operations in a Turing machine. Dolphin songs are probably more than enough to encode the game of life. It's just how you look at it that makes it intelligence.
I 100% guarantee that he will not be holding the bag when this fails. Society will be protecting him.
On that proviso I have zero respect for this guy.
Also there is no evidence that novel discoveries are more than remixes. This is heavily debated but from what we’ve seen so far I’m not sure I would bet against remix.
World models are great for specific kinds of RL or MPC. Yann is betting heavily on MPC; I'm not sure I agree with this, as it's currently computationally intractable at scale.
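For anyone unfamiliar, the MPC-over-a-learned-world-model loop in question looks roughly like this (a toy random-shooting planner; `world_model` and the cost are stand-ins). The nested rollout loop is exactly where the compute blows up:

    import numpy as np

    def cost_fn(state, action):
        # Stand-in cost: distance to origin plus a small control penalty.
        return float(np.sum(state**2) + 0.01 * np.sum(action**2))

    def plan_with_mpc(world_model, state, horizon=10, n_candidates=512, action_dim=2):
        """Toy model-predictive control: sample candidate action sequences, roll each
        through the learned dynamics model, and execute the first action of the cheapest."""
        candidates = np.random.uniform(-1, 1, size=(n_candidates, horizon, action_dim))
        costs = np.zeros(n_candidates)
        for i, actions in enumerate(candidates):
            s = state
            for a in actions:
                s = world_model.predict(s, a)   # one imagined step in the learned model
                costs[i] += cost_fn(s, a)
        return candidates[np.argmin(costs)][0]  # re-plan from scratch at every real step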
Perhaps for the current implementations this is true. But the reason the current versions keep failing is that world dynamics has multiple orders of magnitude fewer degrees of freedom than the models that are tasked to learn them. We waste so much compute learning to approximate the constraints that are inherent in the world, and LeCun has been pressing the point the past few years that the models he intends to design will obviate the excess degrees of freedom to stabilize training (and constrain inference to physically plausible states).
If my assumption is true then expect Max Tegmark to be intimately involved in this new direction.
It's true, but it's also true that text is very expressive.
Programming languages (huge, formalized expressiveness), math and other formal notation, SQL, HTML, SVG, JSON/YAML, CSV, domain specific encoding ie. for DNA/protein sequences, for music, verilog/VHDL for hardware, DOT/Graphviz/Mermaid, OBJ for 3D, Terraform/Nix, Dockerfiles, git diffs/patches, URLs etc etc.
The scope is very wide and covers enough to be called generic especially if you include multi modalities that are already being blended in (images, videos, sound).
I'm cheering for Yann, hope he's right and I really like his approach to openness (hope he'll carry it over to his new company).
At the same time, current architectures do exist now and do work, far exceeding his or anybody else's expectations, and they continue to do so. It may also be true that they're here to stay for a long while on text and the other supported modalities, since they're cheaper to train.
This seems wrong to me on a few levels.
First, there is no way to "experience the world directly," all experience is indirect, and language is a very good way of describing the world. If language was a bad choice or limited in some fundamental way, LLMs wouldn't work as well as they do.
Second, novel ideas are often existing ideas remixed. It's hard/impossible to point to any single idea that sprung from nowhere.
Third, you can provide an LLM with real-world information and suddenly it's "interacting with the world". If I tell an LLM about the US war on Iran, I am in a very real sense plugging it into the real world, something that isn't part of its training data.
Finally, modern LLMs are multi-modal, meaning they have the ability to handle images/video. My understanding is that they use some kind of adapter to turn non-text data into data that the LLM can make sense of.
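Roughly, that adapter is a small projection from the vision encoder's embedding space into the LLM's token-embedding space; a sketch of the idea (dimensions and names are made up):

    import torch.nn as nn

    class VisionToTokenAdapter(nn.Module):
        """Maps patch embeddings from a (frozen) vision encoder into 'soft tokens'
        that the language model attends to alongside ordinary text tokens."""
        def __init__(self, vision_dim=1024, llm_dim=4096):
            super().__init__()
            self.proj = nn.Sequential(
                nn.Linear(vision_dim, llm_dim),
                nn.GELU(),
                nn.Linear(llm_dim, llm_dim),
            )

        def forward(self, patch_embeddings):       # (batch, n_patches, vision_dim)
            return self.proj(patch_embeddings)     # (batch, n_patches, llm_dim)

    # image_tokens = adapter(vision_encoder(image)) then gets concatenated with the
    # text token embeddings before the whole sequence is fed to the LLM.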
Re 2: There's something tremendous in the fact, staring us right in the face, that LLMs are unable to meaningfully contribute to academic/medical research. I'm not saying that they need to perform on the level of a one-in-a-million Maxwell, DaVinci, or whatever. But as Dwarkesh asked one year ago: "What do you make of the fact that these things have basically the entire corpus of human knowledge memorized and they haven't been able to make a single new connection that has led to a discovery?"
Re 3: Sure, you can hold it by the hand and spoonfeed it. You can also create for it a mirror reality which doesn't exist, which is pure fiction. Given how limited these systems are, I don't suppose it makes much of a difference. There's no way for it to tell. The "human in the loop" is its interaction with the world. And a pale, meager interaction it is.
Re 4: Static, old images/video that they were trained on some months ago. That, too, is no way of interacting with the world.
It's not clear to me that this is a fundamental limitation. If you provide LLMs with a news feed, it's closer to real-time. You can incrementally get closer than that in very obvious ways.
> Re 2: There's something tremendous in the fact, staring us right in the face, that LLMs are unable to meaningfully contribute to academic/medical research. I'm not saying that they need to perform on the level of a one-in-a-million Maxwell, DaVinci, or whatever. But as Dwarkesh asked one year ago: "What do you make of the fact that these things have basically the entire corpus of human knowledge memorized and they haven't been able to make a single new connection that has led to a discovery?"
LLMs have been around for a very short time. It wouldn't surprise me if researchers have used them to make discoveries. If they haven't, they will soon. Then there's a question about attribution... if you're a researcher and you use an LLM to discover something, do you give it credit? Or is it just a tool? There's a long, long history of researchers being less than honest about how they made some discovery.
> Re 3: Sure, you can hold it by the hand and spoonfeed it. You can also create for it a mirror reality which doesn't exist, which is pure fiction. Given how limited these systems are, I don't suppose it makes much of a difference. There's no way for it to tell. The "human in the loop" is its interaction with the world. And a pale, meager interaction it is.
Our perception of reality is meager too. You can easily imagine how an LLM could be "plugged in" to reality. Again nothing fundamental here.
> Re 4: Static, old images/video that they were trained on some months ago. That, too, is no way of interacting with the world.
No, you can send an LLM a video/image and it can "understand it". It's not perfect but, like I said, the technology is already here to project video data into something the LLMs can interact with.
If that's what you're experiencing, then you're not asking them the right questions.
If you're at the edge of your field so you're able to judge whether something is novel or not, and you have a direction you'd like the LLM to explore, just ask it. Prompt it to come up with some ideas of how to solve X, or categorize Y, or analyze Z. Encourage it to take ideas from, or find parallels in, closely related or distantly related fields.
You will probably quickly find yourself with a ton of new ideas, of varying quality, in the same way as if you were brainstorming with a colleague.
But they don't work "solo". They need you to guide the conversation. But when you do, they're chock-full of new ideas and connections and discoveries. But again -- just like with people, the quality varies. If you're looking for a good startup idea, you need to sift through hundreds. Similarly, if you're looking for an idea for a paper you could publish, there are a lot of hypotheses to sift through. And you're supplying your own expert "good taste" to try to determine what's worth pursuing and developing further, etc.
LLMs don't just magically come up with new proven discoveries unprompted. But they turn out to be fantastic research and idea-generation partners. They excel at combining existing related-but-distant facts and models and interpretations in novel ways.
As the other commenter pointed out, this is 1B seed.
Top tier scientists aren't gonna be swayed by European state retirement systems.
Europe again missing out, until AMI reaches a much higher valuation with an obvious use case in robotics.
Either AMI reaches a $100B+ valuation (likely) or it becomes a Thinking Machines Lab with investors questioning its valuation (very unlikely, since world models have a use case in vision and robotics).
I can't read the article, but American investors investing into European companies, isn't US the one missing out here? Or does "Europe" "win" when European investors invest in US companies? How does that work in your head?
Why would the US miss out here? The US invests in something = the US owns part of something.
This isn't a zero sum game.
Personally I don't believe anyone is missing out on anything here.
But rvz earlier claimed that Europe is missing out, because US investors are investing in a European company. That's kind of surprising to me, so asking if they also believe that the US is "missing out" whenever European investors invest in US companies, or if that sentiment only goes one way.
Of course, each relevant newspaper in those areas highlights that it's coming to their place, but it really seems to be distributed.
Europe in general has been tightening up its rules / taxes / laws around startups and companies, especially tech and remote.
It's been less friendly these days.
As such, they are more likely to talk about Singapore news and exaggerate the claims.
Singapore isn't the key location. From what I am seeing online, France is the major location.
Singapore is just one of the more satellite-like offices. They have many offices around the world, it seems.
[0]: https://www.sgpbusiness.com/company/Sph-Media-Limited
Like? Care to provide any specific examples? "Europe" is a continent composed of various countries, most of which have been doing a lot to make it easier for startups and companies in general.
Might be to be close to some of Yann's collaborators like Xavier Bresson at NUS
Almost certainly the IP will be held in Singapore for tax reasons.
> He is the Jacob T. Schwartz Professor of Computer Science at the Courant Institute of Mathematical Sciences at New York University. He served as Chief AI Scientist at Meta Platforms before leaving to work on his own startup company.
That entire sentence before the remarks about his service at Meta could have been axed; it's weird to me when people compare themselves to someone else who is well known. It's the most Kanye West thing you can do. Mind you, the more I read about him, the more I discovered he is in fact egotistical. Good luck having a serious engineering team with someone who is egotistical.
This is just the official name of a chair at NYU. I'm not even sure Jacob T. Schwartz is more well known than Yann LeCun
Either you have not read enough Wikipedia pages, or you have too much to complain about. (Or both.)
If you're looking to learn about JEPA, LeCun's vision document "A Path Towards Autonomous Machine Intelligence" is long but sketches out a very comprehensive vision of AI research: https://openreview.net/pdf?id=BZ5a1r-kVsf
Training JEPA models is within reach, even for startups. For example, we're a 3-person startup who trained a health timeseries JEPA. There are JEPA models for computer vision and (even) for LLMs.
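For anyone curious what that involves, the core training step looks roughly like this (a simplified sketch; real recipes add EMA target encoders, masking strategies, and anti-collapse regularizers):

    import torch
    import torch.nn.functional as F

    def jepa_step(context_encoder, target_encoder, predictor, optimizer, x_context, x_target):
        """One JEPA-style update: predict the *embedding* of the target view from the
        context view, rather than reconstructing raw pixels/values."""
        with torch.no_grad():
            target_repr = target_encoder(x_target)       # no gradients into the target branch
        pred_repr = predictor(context_encoder(x_context))
        loss = F.smooth_l1_loss(pred_repr, target_repr)  # loss lives in representation space
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

    # The target encoder is then typically updated as an exponential moving average
    # of the context encoder, which helps keep the embeddings from collapsing.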
You don't need a $1B seed round to do interesting things here. We need more interesting, orthogonal ideas in AI. So I think it's good we're going to have a heavyweight lab in Europe alongside the US and China.
BTW, I went to your website looking for this, but didn't find your blog. I do now see that it's linked in the footer, but I was looking for it in the hamburger menu.
That said, have you considered that “Measure 100+ biomarkers with a single blood draw” combined with "heart health is a solved problem” reads a lot like Theranos?
The specific biomarkers being predicted are the ones most relevant to heart health, like cholesterol or HbA1c. These tend to be more stable from hour to hour -- they may vary on a timescale of weeks as you modify your diet or take medications.
If you think that LLMs are sufficient and RSI is imminent (<1 year), this is horrible for Europe. It is a distracting boondoggle exactly at the wrong time.
Sure, LLMs are getting better and better, and at least for me more and more useful, and more and more correct. Arguably better than humans at many tasks, yet terribly lagging behind in some others.
Coding-wise, one of the things it does "best", it still has many issues. For me some of the biggest are lack of initiative and lack of reliable memory. When I do use it to write code, the first manifests as it sticking to a suboptimal yet overly complex approach quite often. And the lack of memory in that I have to keep reminding it of edge cases (else it often breaks functionality), or to stop reinventing the wheel instead of using functions/classes already implemented in the project.
All that can be mitigated by careful prompting, but no matter the claim about information recall accuracy I still find that even with that information in the prompt it is quite unreliable.
And more generally the simple fact that when you talk to one the only way to “store” these memories is externally (ie not by updating the weights), is kinda like dealing with someone that can’t retain memories and has to keep writing things down to even get a small chance to cope. I get that updating the weights is possible in theory but just not practical, still.
What's still missing is the general reasoning ability to plan what to build or how to attack novel problems - how to assess the consequences of deciding to build something a given way - and I doubt that auto-regressively trained LLMs are the way to get there, but there is a huge swathe of apps that are so boilerplate in nature that this isn't the limitation.
I think that LeCun is on the right track to AGI with JEPA - hardly a unique insight, but significant to now have a well funded lab pursuing this approach. Whether they are successful, or timely, will depend if this startup executes as a blue skies research lab, or in more of an urgent engineering mode. I think at this point most of the things needed for AGI are more engineering challenges rather than what I'd consider as research problems.
You see this in construction - the capital is used for certain things and is operated by labour.
Eventually (maybe taking a lot longer than a lot of people expect and/or are hoping for) we'll achieve full human-equivalent AI, at which point you won't NEED a centaur approach - the mechanical horse will be capable of doing ALL non-physical work by itself, but that doesn't mean this is how this will actually play out. If we do end up heading for some dystopian "Soylent Green" type future where most humans are unemployed, surviving poorly on government handouts, then I expect there would eventually be riots and uprising that would push back against it. It also just doesn't work - you can't create profits without customers, and customers need money to buy what you're selling.
Part of why we may (and hopefully will) continue to see humans, from CEO on down, still working when they could be replaced with AI, is that even "AGI", which we've yet to achieve, doesn't mean human-like - it's really just focusing on intelligence. Creating an actual remote-worker replacement requires more than just automating the intelligent decision-making part of a human (the "AGI" part) - it also requires the human/social/emotional part, which will take longer, and there may not even be any desire to push for that. I think people maybe discount how much of being a successful member of a team is based around human soft skills, our ability to understand and interact with each other, not just raw intellectual capacity, and certainly at this point in time corporate success is still very much "who you know, not what you know".
Wait, we have another acronym to track. Is this the same/different than AGI and/or ASI?
Of course now we know this was delusional and it seems almost funny in retrospect. I feel the same way when I hear that 'just scale language models' suddenly created something that's true AGI, indistinguishable from human intelligence.
Whenever I see people think the model architecture matters much, I think they have a magical view of AI. Progress comes from high quality data, the models are good as they are now. Of course you can still improve the models, but you get much more upside from data, or even better - from interactive environments. The path to AGI is not based on pure thinking, it's based on scaling interaction.
To stay with the miasma-theory-of-disease analogy: if you think architecture is the key, then look at how humans dealt with pandemics... The Black Death in the 14th century killed half of Europe, and no one could come up with the germ theory of disease. Think about it - it was as desperate a situation as it gets, and no one had the simple spark to keep up hygiene.
The fact is we are also not smart from the brain alone; we are smart from our experience. Interaction and environment are the scaffolds of intelligence, not the model. For example, 1B users do more for an AI company than a better model; they act like human-in-the-loop curators of LLM work.
I'm not aware that we have notably different data sources before or after transformers, so what confounding event are you suggesting transformers 'lucked' in to being contemporaneous with?
Also, why are we seeing diminishing returns if only the data matters? Are we running out of data?
The METR time-horizon benchmark shows steady exponential growth. The frontier lab revenue has been growing exponentially from basically the moment they had any revenues. (The latter has confounding factors. For example it doesn't just depend on the quality of the model but on the quality of the apps and products using the model. But the model quality is still the main component, the products seem to pop into existence the moment the necessary model capabilities exist.)
The point is that core model architectures don't just keep scaling without modification. MoE, inference-time scaling, RAG, etc. are all modifications that aren't 'just use more data to get better results'.
It really depends what you mean by 'we'. Laymen? Maybe. But people said it was wrong at the time with perfectly good reasoning. It might not have been accessible to the average person, but that's hardly to say that only hindsight could reveal the correct answer.
It's only with hindsight that we think contagionism is obviously correct.
I, on the contrary, believe that the hunt for better data is an attempt to climb the local hill and get stuck there without reaching the global maximum. Interactive environments are good, they can help, but they are just one of the possible ways to learn about causality. Are they the best way? I don't think so; they are the easier way: just throw money at the problem and eventually you'll get something that you'll claim to be the goal you chased all this time. And yes, it will have something in it you will be able to call "causal inference" in your marketing.
But current models are notoriously difficult to teach. They eat an enormous amount of training data; a human needs much less. They eat an enormous amount of energy to train; a human needs much less. It means that the very approach is deficient. It should be possible to do the same with a tiny fraction of the data and money.
> The fact is we are also not smart from the brain alone, we are smart from our experience. Interaction and environment are the scaffolds of intelligence, not the model.
Well, I learned English almost all the way to B2 by reading books. I was too lazy to use a dictionary most of the time, so it was not interactive: I didn't interact even with a dictionary, I was just reading books. How many books did I read to get to B2? ~10 or so. Well, I read a lot of English on the Internet too, and watched some movies. But let's multiply 10 books by 10. Strictly speaking it was not B2; I was almost completely unable to produce English, and my pronunciation was not just bad, it was worse. Even now I sometimes stumble on words I cannot pronounce. Like, I know the word and I have mentally constructed a sentence with it, but I cannot say it, because I don't know how. So to pass B2 I spent some time practicing speech, listening and writing. And learning some stupid topic like "travel" to have the vocabulary to talk about it at length.
How many books does an LLM need to consume to get to B2 in a language unknown to it? How many audio recordings does it need to consume? A lifetime wouldn't be enough for me to read and/or listen to that much.
If there were a human who needed to consume as much information as an LLM to learn, they would be the stupidest person in all the history of humanity.
Just because RNNs and Transformers both work with enormous datasets doesn't mean that architecture/algorithm is irrelevant, it just suggests that they share underlying primitives. But those primitives may not be the right ones for 'AGI'.
It was empirical and, though ultimately wrong, useful. Apply as you will to theories of learning.
I won't comment on Yann LeCun or his current technical strategy, but if you can avoid sunk cost fallacy and pivot nimbly I don't think it is bad for Europe at all. It is "1 billion dollars for an AI research lab", not "1 billion dollars to do X".
And even if you think the chance is zero, unless you also think there is a zero chance they will be capable of pivoting quickly, it might still be beneficial.
I think his views are largely flawed, but chances are there will still be lots of useful science coming out of it as well. Even if current architectures can achieve AGI, it does not mean there can't also be better, cheaper, more effective ways of doing the same things, and so exploring the space more broadly can still be of significant value.
I believe he didn't think that reasoning/CoT would work well or scale like it has
What’s different about investing in this than investing in say a young researcher’s startup, or Ilya’s superintelligence? In both those cases, if a model architecture isn’t working out, I believe they will pivot. In YL’s case, I’m not sure that is true.
In that light, this bet is a bet on YL's current view of the world. If his view is accurate, this is very good for Europe. If inaccurate, then this is sort of a nothing-burger; the company will likely exit for roughly the investment amount - that money would not have gone to smaller European startups anyway - so it's a wash.
FWIW, I don’t think the original complaint about auto-regression “errors exist, errors always multiply under sequential token choice, ergo errors are endemic and this architecture sucks” is intellectually that compelling. Here: “world model errors exist, world model errors will always multiply under sequential token choice, ergo world model errors are endemic and this architecture sucks.” See what I did there?
On the other hand, we have a lot of unused training tokens in videos, I’d like very much to talk to a model with excellent ‘world’ knowledge and frontier textual capabilities, and I hope this goes well. Either way, as you say, Europe needs a frontier model company and this could be it.
If you invested in that you knew what you were getting yourself into!
Tech is ultimately a red herring as far as what's needed to keep the EU competitive. The EU has a trillion dollar hole[0] to fill if they want to replace US military presence, and currently net-imports over 50% of their energy. Unfortunately the current situation in Iran is not helping either of these, as it constrains energy supplies further and risks requiring military intervention.
0. https://www.wsj.com/world/europe/europes-1-trillion-race-to-...
The need for a military is tightly coupled with the EU's need for energy. You can see this in the immediate impact that the war in Iran has had on Germany's natural gas prices [0]. Already unable to defend themselves from Russia, EU countries are in a tough spot since they can't really afford to expend military resources defending their energy needs, and yet also don't have the energy independence to ignore these military engagements without risk. Meanwhile, Russia has spent the last 4 years transitioning to a wartime economy and is getting hungry for expanded resource acquisition.
The world hasn't fundamentally changed since the stone age: humans need resources to survive, and if there aren't enough resources for those people then violence will decide who has access to them.
0. https://tradingeconomics.com/commodity/germany-natural-gas-t...
I'm sorry, but this is just crazy talk. Russia cannot enforce its will on Ukraine, one of the poorest and most corrupt countries in Europe, with a (at the time of invasion) relatively small and underequipped army. Yes, it has grown through conscription, has been equipped by foreign and domestic supplies, has made some brilliant advances in tech and tactics... but when it was attacked, it was weak. And Russia lost its best troops and equipment failing to defeat that.
Why would anyone think that the Russia that cannot defeat Ukraine would fare better against Poland? Let alone French warning strike nukes, or French, British, German troops and planes and what not.
As Russia’s economy has continually reshaped over the last 4 years there has been increasingly a domestic demand for war. You point out all the evidence yourself:
> Yes it has grown through conscription, has been equipped by foreign and domestic supplies, has made some brilliant advances in tech and tactics...
Russia (well, its oligarchs and rulers) has increasingly benefited from perpetual war. Yes, soon it will need to switch positions to expansion to maintain its economy, but this situation in Iran presents a perfect opportunity if things play into Russia's interests.
You will also find that, if you've paid any attention to European politics over the years, this is a serious topic for all leaders there.
But I don't mind if you're not convinced; I had similar people on Hacker News, 4 years ago, unconvinced that Russia could sustain operations in Ukraine longer than a few months because it was doing so poorly.
No it has not. It has a ballooning debt crisis (at different levels - regions, military contractors, banks) which will pop at some point; the budget is so unbalanced they're projecting to reduce military spending (unlikely), increase taxes, and still have a pretty heavy deficit. They've been given the gift of the Strait of Hormuz being closed, so oil and gas revenues will grow, which will definitely buy them more time. But they are running against a clock, and they cannot win in Ukraine.
> You also will find that if you paid any attention to European politics over the years this is a serious topic to all leaders there.
Yes, because Russia only responds to strength, so you need to be strong militarily to be able to dissuade them from attacking you. That doesn't mean that realistically they have a chance of winning any conflict.
My main concern with LeCun is the number of times he has told people software is open source when its license directly violates the open source definition.
Looks like you appended the original URL to the end
Or you're using Cloudflare DNS.
Have they changed something on their end?
There is absolutely no doubt about Yann's impact on AI/ML, but he had access to many more resources in Meta, and we didn't see anything.
It could be a management issue, though, and I sincerely wish we will see more competition, but from what I quoted above, it does not seem like it.
Understanding the world through videos (mentioned in the article) is just what video models have already done, and they are getting pretty good (see Seedance, Kling, Sora, etc.). So I'm not quite sure how what he proposed would work.
So I keep wondering: if his idea is really that good — and I genuinely hope it is — why hasn’t it led to anything truly groundbreaking yet? It can’t just be a matter of needing more data or more researchers. You tell me :-D
LeCun introduced backprop for deep learning back in 1989.
Hinton published about contrastive divergence in next-token prediction in 2002.
AlexNet was 2012.
Word2vec was 2013.
Seq2seq was 2014.
AIAYN was 2017.
UnicornAI was 2019.
InstructGPT was 2022.
This makes a lot of people think that things are just accelerating and they can be along for the ride. But it's the years and years of foundational research that allow this to be done. That toll has to be paid for the successors of LLMs to be able to reason properly and operate in the world the way humans do. That sowing won't happen as fast as the reaping did. LeCun wants to plant those seeds; the others, who only want to eat the fruit, don't get that they have to wait.
If he still hasn’t produced anything truly meaningful after all these years at Meta, when is that supposed to happen? Yann LeCun has been at Facebook/Meta since December 2013.
Your chronological sequence is interesting, but it refers to a time when the number of researchers and the amount of compute available were a tiny fraction of what they are today.
This is naive. Like saying if backprop had any real substance, it would have had results within 10 years of its publication in 1989
> Your chronological sequence is interesting, but it refers to a time when the number of researchers and the amount of compute available were a tiny fraction of what they are today.
Again: those resources are important. But one resource being ignored is time. Try baking a turkey at 300°F for 4 hours versus at 900°F for 1 hour and see how edible each one is.
Source: himself https://x.com/ylecun/status/1993840625142436160 (“I never worked on any Llama.”) and a million previous reports and tweets from him.
Quite a big contribution in practice.
That's true for 99% of the scientists, but dismissing their opinion based on them not having done world shattering / ground breaking research is probably not the way to go.
> I sincerely wish we will see more competition
I really wish we don't, science isn't markets.
> Understanding world through videos
The word "understanding" is doing a lot of heavy lifting here. I find myself prompting again and again for corrections on an image or a summary and "it" still does not "understand" and keeps doing the same thing over and over again.
But passion and freedom to explore are often more important than resources.
Is this a troll? Even if we just ignore Llama, Meta invented and released so much foundational research and open source code. I would say that the computer vision field would be years behind if Meta hadn't published some core research like DETR or MAE.
>My only contribution was to push for Llama 2 to be open sourced.
Meta absolutely has (or at least had) a world-class industry AI lab and has published a ton of great work and open source models (granted, their LLM open source stuff failed to keep up with Chinese models in 2024/2025; their other open source stuff for things like segmentation doesn't get enough credit, though). Yann's main role was Chief AI Scientist, not any sort of product role, and as far as I can tell he did a great job building up and leading a research group within Meta.
He deserves a lot of credit for pushing Meta to be very open about publishing research and open sourcing models trained on large-scale data.
Just as one example, Meta (together with NYU) just published "Beyond Language Modeling: An Exploration of Multimodal Pretraining" (https://arxiv.org/pdf/2603.03276) which has a ton of large-experiment backed insights.
Yann did seem to end up with a bit of an inflated ego, but I still consider him a great research lead. Context: I did a PhD focused on AI, and Meta's group had a similar pedigree as Google AI/Deepmind as far as places to go do an internship or go to after graduation.
Creating a startup has to be about a product. When you raise 1B, investors are expecting returns, not papers.
> I wasn't criticising his scientific contribution at all, that's why I started my comment by appraising what he did.
You were criticising his output at Facebook, though, but he was in the research group at facebook, not a product group, so it seems like we did actually see lots of things?
Speaking of returns - Apple absolutely fucked Meta ads with the privacy controls, which trashed ad performance, revenue and share price. Meta turned things around using AI, with Yann as the lead researcher. Are you willing to give him credit for that? Revenue is now greater than pre-Apple-data-lockdown
[1] https://9to5mac.com/2025/08/21/meta-allegedly-bypassed-apple...
Why would Apple be complicit on this for years?
When you log into FB on any account on any device, then install FB on a new device, or even after you erase the device, they know it's you even before you log in. Because the info is tied to your Apple iCloud account.
And there's no way for users to see or delete what data other companies have stored and linked to your Apple ID via that API.
It's been like this for at least 5 years and nobody seems to care.
That would be fine if users could SEE what has been stored and DELETE it WITHOUT going through the app and trusting it to show you everything honestly.
What's even worse is that it silently persists across DEVICE reinstalls.
Erase and reset your iPhone/iPad. Sign into the same iCloud account. Reinstall FB. Your login info will still be there.
Buy a new iPhone/iPad. Sign into the same iCloud account. Reinstall FB. Your login info will still be there.
And nope, no one seems to care.
For a hot minute Meta had a top 3 LLM and open sourced the whole thing, even with LeCun's reservations around the technology.
At the same time Meta spat out huge breakthroughs in:
- 3d model generation
- Self-supervised label-free training (DINO). Remember Alexandr Wang built a multibillion dollar company just around having people in third world countries label data, so this is a huge breakthrough.
- A whole new class of world modeling techniques (JEPAs)
- SAM (Segment anything)
If it was a breakthrough, why did Meta acquire Wang and his company? I'm genuinely curious.
Unfortunately the dude knows very little about AI or ML research. He's just another wealthy grifter.
At this point decision making at Meta is based on Zuckerberg's vibes, and I suspect the emperor has no clothes.
Or, maybe it's just hard?
Recently all papers are about LLMs; it brings on fatigue.
As GPT-style models are almost reaching their limits, a new architecture could bring out new discoveries.
That article is from June 2025 so may be out of date, and the definition of "seed round" is a bit fuzzy.
The giant seed round proves investors were willing to fund Mira Murati, not that the company had built anything durable.
Within months, it had already lost cofounder Andrew Tulloch to Meta, then cofounders Barret Zoph and Luke Metz plus researcher Sam Schoenholz to OpenAI; WIRED also reported that at least three other researchers left. At that point, citing it as evidence of real competitive momentum feels weak.
Liquid money rich? No.
Can get pulled for big tech packages? Also no, for most of the employees.
AFAIK, big tech didn't aggressively poach OpenAI-like talent; they did offer $10M+ pay packages, but only for a select few research scientists. Some folks left and some came, but it mostly boiled down to culture.
Microsoft/OpenAI is Big Tech.
Are you ok?
He has hired LeBrun to the helm as CEO.
AMI has also hired LeFunde as CFO and LeTune as head of post-training.
They’re also considering hiring LeMune as Head of Growth and LePrune to lead inference efficiency.
https://techcrunch.com/2025/12/19/yann-lecun-confirms-his-ne...
I have no chance in AI industry...
The fundamental problem with today's LLMs that will prevent them from achieving human level intelligence, and creativity, is that they are trained to predict training set continuations, which creates two very major limitations:
1) They are fundamentally a COPYING technology, not a learning or creative one. Of course, as we can see, copying in this fashion will get you an extremely long way, especially since it's deep patterns (not surface level text) being copied and recombined in novel ways. But, not all the way to AGI.
2) They are not grounded, therefore they are going to hallucinate.
The animal intelligence approach, the path to AGI, is also predictive, but what you predict is the external world, the future, not training set continuations. When your predictions are wrong (per perceptual feedback) you take this as a learning signal to update your predictions to do better next time a similar situation arises. This is fundamentally a LEARNING architecture, not a COPYING one. You are learning about the real world, not auto-regressively copying the actions that someone else took (training set continuations).
Since the animal is also acting in the external world that it is predicting, and learning about, this means that it is learning the external effects of its own actions, i.e. it is learning how to DO things - how to achieve given outcomes. When put together with reasoning/planning, this allows it to plan a sequence of actions that should achieve a given external result ("goal").
Since the animal is predicting the real world, based on perceptual inputs from the real world, this means that its predictions are grounded in reality, which is necessary to prevent hallucinations.
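In rough placeholder Python, the loop I'm describing is something like this (every name here is a stand-in; the point is just the contrast with training-set continuation):

    # Placeholder sketch of the animal-style loop: act, predict, compare against what
    # the world actually did, and learn from the prediction error.
    def live(agent, environment):
        observation = environment.observe()
        while True:
            action = agent.plan(observation)              # pick an action toward some goal
            predicted_next = agent.predict(observation, action)
            environment.step(action)                      # act in the real world
            actual_next = environment.observe()           # ground truth comes from reality
            surprise = agent.error(predicted_next, actual_next)
            agent.update(surprise)                        # learning signal = prediction error
            observation = actual_next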
So, to come back to "world models", yes an animal intelligence/AGI built this way will learn a model of how the world works - how it evolves, and how it reacts (how to control it), but this behavioral model has little in common with the internal generative abstractions that an LLM will have learnt, and it is confusing to use the same name "world model" to refer to them both.
Models build up this big knowledge base by predicting continuations. But then their RL stage gives rewards for completing problems successfully. This requires learning and generalisation to do well, and indeed RL marked a turning point in LLM performance.
A year after RL was made to work, LLMs can now operate in agent harnesses over 100s of tool calls to complete non-trivial tasks. They can recover from their own mistakes. They can write 1000s of lines of code that works. I think it’s no longer fair to categorise LLMs as just continuation-predictors.
At the end of the day it's still copying, not learning.
RL seems to mostly only generalize in-domain. The RL-trained model may be able to generate a working C compiler, but the "logical reasoning" it had baked into it to achieve this still doesn't stop it from telling you to walk to the car wash, leaving your car at home.
There may still be more surprises coming from LLMs - ways to wring more capability out of them, as RL did, without fundamentally changing the approach, but I think we'll eventually need to adopt the animal intelligence approach of predicting the world rather than predicting training samples to achieve human-like, human-level intelligence (AGI).
I don’t know if this can reach AGI, or if that term makes any sense to begin with. But to say these models have not learnt from their RL seems a bit ludicrous. What do you think training to predict when to use different continuations is other than learning?
I would say LLM’s failure cases like failing at riddles are more akin to our own optical illusions and blind spots rather than indicative of the nature of LLMs as a whole.
I'm not sure what I wrote that made you conclude that I thought these models are not learning anything from their RL training?! Let me say it again: they are learning to steer towards reasoning steps that during training led to rewards.
The capabilities of LLMs, both with and without RL, are a bit counter-intuitive, and I think that, at least in part, comes down to the massive size of the training sets and the even more massive number of novel combinations of learnt patterns they can therefore potentially generate...
In a way it's surprising how FEW new mathematical results they've been coaxed into generating, given that they've probably encountered a huge portion of mankind's mathematical knowledge, and can potentially recombine all of these pieces in at least somewhat arbitrary ways. You might have thought that there are results A, B and C hiding away in some obscure mathematical papers that no human has previously considered to put together before (just because of the vast number of such potential combinations), that might lead to some interesting result.
If you are unsure yourself about whether LLMs are sufficient to reach AGI (meaning full human-level intelligence), then why not listen to someone like Demis Hassabis, one of the brightest and best placed people in the field to have considered this, who says the answer is "no", and that a number of major new "transformer-level" discoveries/inventions will be needed to get there.
Sure, training = learning, but the problem with LLMs is that is where it stops, other than a limited amount of ephemeral in-context learning/extrapolation.
With an LLM, learning stops post-training when it is "born" and deployed, while with an animal that's when it starts! The intelligence of an animal is a direct result of its lifelong learning, whether that's imitation learning from parents and peers (and subsequent experimentation to refine the observed skill), or the never ending process of observation/prediction/surprise/exploration/discovery which is what allows humans to be truly creative - not just behaving in ways that are endless mashups of things they have seen and read about other humans doing (cf training set), but generating truly novel behaviors (such as creating scientific theories) based on their own directed exploration of gaps in mankind's knowledge.
Application of AGI to science and new discovery is a large part of why Hassabis defines AGI as human-equivalent intelligence, and understands what is missing, while others like Sam Altman are content to define AGI as "whatever makes us lots of money".
I am of the opinion that imagination and creativity comes from emotion, hence a machine that cannot "feel" will never be truly intelligent.
One can go ahead and ask: but you are just a lump of meat; if you can feel, then a computer of similar structure can too.
If we assume that physical reality is fundamental, then that might make sense. But what if consciousness is fundamental and reality plays on consciousness?
Then randomness, and in-turn ideas come from the attributes of the fundamental reality that we are in.
I'll try to simplify it. Imagine you have an idea that extends your life by a day. Then, of all the possible worlds, in some worlds you find yourself living into the next day (in others you are dead). But this "idea" you had was just one among the infinite sea of possibilities, and your consciousness inside one such world observes you having that idea and surviving for a day!
If you want to create a machine that can do that, it implies that you should be a consciousness inside a world within it (because the machine cannot pick valid worlds from infinite samples, but just enables consciousness to exist in such suitable worlds). So it cannot be done in our reality!
Mayyyyy be "Quantum Darwinism" is what I am trying to describe here..
How do you see emotion as being necessary for creativity?
It sure seems that things like surprise (prediction failure) driven "curiosity" and exploration (I can't predict what will happen if I do X, so let me try) are behind creativity, pushing the boundaries of knowledge and discovering something new.
Perhaps you mean artistic creativity rather than scientific, in which case we're talking about different things, but I'd agree with you since the goal of much art is to elicit an emotional response in those engaging with it.
I don't think there is anything stopping us from implementing emotions, every bit as real as our own, in some form of artificial life if we want to though. At the end of the day emotion comes down to our primitive brain releasing chemicals like adrenaline, dopamine, etc as a result of certain stimuli, the functioning of our brain/body being affected by those chemicals, and the feedback loop of us then recognizing how our brain/body is operating differently ("I feel sad/exited/afraid" etc). It's all very mechanical.
FWIW I think consciousness is also very mechanical, but it seems somewhat irrelevant to the discussion of intelligence/AGI.
I agree with you; there should be more diversity in investments in EU startups, but ¯\_(ツ)_/¯ not my money.
1) the world has become a bit too focused on LLMs (although I agree that the benefits & new horizons that LLMs bring are real). We need research on other types of models to continue.
2) I almost wrote "Europe needs some aces". Although I'm European, my attitude is not at all that one of competition. This is not a card game. What Europe DOES need is an ATTRACTIVE WORKPLACE, so that talent that is useful for AI can also find a place to work here, not only overseas!
There is DeepMind, OpenAI and Anthropic in London. Even after Brexit, London is still in Europe.
I hope they grow that office like crazy. This would be really good for Canada. We have (or have had) the AI talent here (though maybe less so overall in Montreal than in Toronto/Waterloo and Vancouver and Edmonton).
And I hope Carney is promoting the crap out of this and making it worth their while to build that office out.
I don't really do Python or large scale learning etc, so don't see a path for myself to apply there but I hope this sparks some employment growth here in Canada. Smart choice to go with bilingual Montreal.
JEPAs also strike me as being a bit more akin to human intelligence, where for example, most children are very capable of locomotion and making basic drawings, but unable to make pixel level reconstructions of mental images (!!).
One thing I want to point out is that the very LeCun-type techniques demonstrating label-free training, such as JEAs like DINO and JEPAs, have been converging on the performance of models that require large amounts of labeled data.
Alexandr Wang is a billionaire who made his wealth through a data labeling company and basically kicked LeCun out.
Overall this will be good for AI and good for open source.
The startup is Advanced Machine Intelligence Labs: https://amilabs.xyz/
but you don’t even have a product
/cape
AIs that can't smell, can't feel hunger, can't desire - I do not think they can understand the world the way organic life does.
That said, while I 100% agree with him that LLMs won't lead to human-like intelligence (I think AGI is now an overloaded term, but Yann uses it in its original definition), I'm not fully on board with his world model strategy as the path forward.
can you please elaborate on your strategy as the path forward?
Build attention-grabbing, monetizable models that subsidize (at least in part) the run up to AGI.
Nobody is trying to one-shot AGI. They're grinding and leveling up while (1) developing core competencies around every aspect of the problem domain and (2) winning users.
I don't know if Meta is doing a good job of this, but Google, Anthropic, and OpenAI are.
Trying to go straight for the goal is risky. If the first results aren't economically viable or extremely exciting, the lab risks falling apart.
This is the exact point that Musk was publicly attacking Yann on, and it's likely the same one that Zuck pressed.
That's the point of it. You need to take more risk for a different approach. Same as what OpenAI did initially.
Secondly, it's not clear that the current LLMs are a run-up to AGI. That's what LeCun is betting on - that the LLM labs are chasing a local maximum.
We already have PINNs, or physics-informed neural networks [1]. Soon we are going to have physical field computing by complex-valued network quantization, or CVNN, which has recently been proposed for more efficient physical AI [2].
[1] Physics-informed neural networks:
https://en.wikipedia.org/wiki/Physics-informed_neural_networ...
[2] Ultra-efficient physical field computing by complex-valued network quantization:
https://www.nature.com/articles/s41467-026-70319-0
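For reference, the core PINN idea is just to add the differential-equation residual to the training loss; a minimal sketch on a toy ODE (du/dt = -u, chosen only for illustration):

    import torch
    import torch.nn as nn

    # Toy PINN for du/dt = -u with u(0) = 1 on t in [0, 2]; exact solution is exp(-t).
    net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)

    for step in range(5000):
        t = (torch.rand(64, 1) * 2.0).requires_grad_(True)
        u = net(t)
        du_dt = torch.autograd.grad(u.sum(), t, create_graph=True)[0]
        physics_loss = ((du_dt + u) ** 2).mean()                   # residual of du/dt = -u
        ic_loss = (net(torch.zeros(1, 1)) - 1.0).pow(2).mean()     # initial condition u(0) = 1
        loss = physics_loss + ic_loss
        opt.zero_grad()
        loss.backward()
        opt.step()

    print(net(torch.tensor([[1.0]])).item())  # should be close to exp(-1) ≈ 0.37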
> You're absolutely right. Only large and profitable companies can afford to do actual research. All the historically impactful industry labs (AT&T Bell Labs, IBM Research, Xerox PARC, MSR, etc) were with companies that didn't have to worry about their survival. They stopped funding ambitious research when they started losing their dominant market position.
[1] https://x.com/ylecun/status/1951854741534953687
We recently promoted the no-generated-comments rule from case law [1] to the site guidelines [2], and we're being pretty active about banning accounts that break it.
[1] https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...
[2] https://news.ycombinator.com/newsguidelines.html#generated
Intelligence is simply not well-understood at a mathematical level. Like medieval engineers, we rely so heavily on experimentation in AI. We have no idea how far away from the human level we actually are. Or how far above the human level we can get. Or what, if anything, the limits of intelligence are.
A more concrete idea like “learning” has been very strongly defined and quantifiable, which is maybe why progress in a theory of learning is so much more advanced than a theory of “intelligence“.
Who is more intelligent: a politician, or a high school teacher?
What is intelligence, anyway?
https://www.scientificamerican.com/article/i-gave-chatgpt-an...
https://www.reddit.com/r/singularity/comments/1p5f0b1/gemini...
Gemini 3 Pro has an IQ of 130 now but we keep moving the goalposts and being like “not THAT intelligence, we mean this other intelligence”. I suspect, and history shows us this will be the case, that humans will judge AIs as not human and not intelligent and not needing rights way past the point where they should have rights, even when vastly superior to human intelligence.
Everyday environments are rich in tangible control interfaces (TCIs), like light switches, appliance panels, and embedded GUIs, which are designed for humans and demand commonsense and physics reasoning, but also causal prediction and outcome verification in time and space (e.g., delayed heating, remote lights).
SWITCH: Benchmarking Modeling and Handling of Tangible Interfaces in Long-horizon Embodied Scenarios (https://huggingface.co/papers/2511.17649)
Feedback, suggestions, and collaborators are very welcome!
They are currently estimated to be at a 5bn valuation.
He joined Facebook (now Meta) in December 2013. That's over 12 years of access to one of the largest AI labs in the world, near-unlimited compute, and some of the best researchers money can buy.
He introduced I-JEPA in 2023, nearly 3 years ago. It was supposed to represent a fundamental shift in how machines learn — moving beyond generative models toward a deeper, more structured world understanding.
And yet: I-JEPA hasn't decisively beaten existing models on any major benchmark. No Meta product uses JEPA as a core approach. The research community hasn't adopted it — the field keeps pushing on LLMs and diffusion models. There's been no "GPT moment" for JEPA, no single result that made its value obvious to everyone.
So the question becomes simple: how many years, how many resources, and how many failed proof-of-concepts does it take before we're allowed to judge whether an idea actually works?
Second, AMI Labs just secured a billion in funding, and while that's a lot of money, it's literally just a fraction of the yearly salary they are paying to Wang. Big tech companies are literally throwing tens of billions to keep doing the same thing, just on a bigger scale. Why not try something else once in a while?
$99.85 at Sigma-Aldrich
https://www.mit.edu/people/dpolicar/writing/prose/text/think...
We only think slavery is bad because we have a philosophy and language to describe and evaluate the situation. It's unlikely ant colonies understand the concept of slavery, eunuchs, or feminism. We have the framework to understand these concepts; without it we'd be oblivious to them.
1) Keep thinking continuously, as opposed to current AIs that stop functioning between prompts.
2) Have permanent memory of their previous experiences.
3) Be able to alter their own weights based on those experiences (a.k.a. learn).
You can't justify to the board the wasted money to have the android dream.
I'd say, probability-wise, we don't create sentient-like behavior for a long time (low probability); much more likely is the second circumstance.
There is no such thing as real sentient AI, theoretically. Our current models are only emulations of humans. Maybe in the future someone will figure out a way for computers to learn how to learn. Then maybe someone will codify computers to acquire base methodologies vs just implementing any methodology they find in the world.
> We, and our 228 partners use cookies
And then you'll see a "reject all" button. Can't make this up.