pletzenauer — digital consulting

Karpathy on AI Agents: Why It Takes a Decade, Not a Year

Two very different narratives are currently circulating in the AI industry. One promises that autonomous AI agents will replace entire professions within the current year. The other comes from one of the field’s most experienced practitioners and sounds considerably more sober. Andrej Karpathy – co-founder of OpenAI, former head of AI at Tesla – openly pushes back against the hype in his conversation with Dwarkesh Patel: this will not be the year of agents, but the decade of agents.

For decision-makers in mid-sized companies, this perspective is valuable because it neither downplays nor overstates. Karpathy uses tools like Claude and Codex every day and finds them impressive – while also pinpointing precisely why they often fall short for serious work today. We summarise the key points and explain what they mean for operational decisions.

Key takeaways
  • Karpathy expects capable AI agents to take roughly a decade to reach maturity – not a year.
  • Today’s models lack continuous learning, reliable multimodality and computer use; they are „cognitively patchy“.
  • For programming, AI tools already work well – but barely at all for genuinely novel code. His sweet spot remains autocomplete, not „vibe coding“.
  • The realistic deployment model is an „autonomy slider“: AI takes on a growing share, while humans supervise and deliver the decisive rest.
  • Karpathy sees AI as a continuation of automation – spreading gradually, not through a sudden upheaval.
Figure: Karpathy draws a clear distinction between AI’s current strengths (left: tasks where AI reliably helps today) and capabilities that, in his view, still need years to mature (right).

Why a decade and not a year?

Karpathy’s thesis is a deliberate response to the widespread claim that this is „the year of agents“. He considers that a serious overestimate. His benchmark: when would you deploy an AI agent like an employee or an intern? Not today – because the systems simply do not work reliably enough.

His reasoning rests on roughly 15 years of experience in the field. In his view, the problems are tractable but difficult. Concretely, today’s models fall short in several areas:

  • Continuous learning: You can tell a model something, but it will not retain it permanently.
  • Multimodality: Different types of input are not yet handled consistently and confidently.
  • Computer use: Models cannot yet reliably operate interfaces and tools on their own.

Karpathy points to the history of the field: more than once, people tried to build „the whole thing“ too early – for instance with reinforcement learning on Atari games, or the early attempt to have agents operate websites via mouse and keyboard. Only large language models (LLMs) delivered the necessary representational power. Even today, however, parts of the stack are still missing.

Ghosts instead of animals: a picture of how today’s AI works

One of Karpathy’s central metaphors: we are not building animals, but ghosts. Animals emerge through evolution and bring a lot of „built-in hardware“ with them – a zebra foal walks minutes after birth. AI models, by contrast, emerge through imitation of human data from the internet. They are digital, human-like imitations – a different kind of intelligence.

From this follows a practical insight: pre-training produces two things at once – knowledge and intelligence. Karpathy even sees much of the memorised knowledge as something of a burden. Models lean on it too heavily and struggle to work beyond the familiar. His goal is a „cognitive core“: intelligence and problem-solving strategies, stripped of factual knowledge that can simply be looked up when needed.

What sits in the model’s weights is a vague memory of the training data. What sits in the context window is direct working memory.

The practical consequence for users: if you give a model the relevant document directly in its context, you get markedly better results than if it has to answer purely from „memory“.
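To make the weights-versus-context distinction concrete, here is a minimal sketch of placing a document in the prompt instead of relying on what the model has memorised. The function names (pick_relevant, build_prompt), the sample documents and the naive word-overlap scoring are all illustrative assumptions. A real setup would use proper retrieval and then send the prompt to whichever model API you use, which is not shown here.

```python
import re

def tokenize(text: str) -> set[str]:
    """Lower-case word set; a deliberately naive stand-in for real retrieval."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def pick_relevant(question: str, documents: dict[str, str]) -> str:
    """Return the name of the (hypothetical) document sharing the most words with the question."""
    q = tokenize(question)
    return max(documents, key=lambda name: len(q & tokenize(documents[name])))

def build_prompt(question: str, documents: dict[str, str]) -> str:
    """Put the most relevant document into the context window, ahead of the question."""
    name = pick_relevant(question, documents)
    return (
        "Answer using only the document below.\n\n"
        f"--- {name} ---\n{documents[name]}\n--- end ---\n\n"
        f"Question: {question}"
    )

# Hypothetical example documents
docs = {
    "travel_policy.txt": "Employees may book rail travel in first class for trips over four hours.",
    "it_policy.txt": "Laptops must be encrypted and updated monthly.",
}
print(build_prompt("May I book first class rail travel?", docs))
```

The model now works from the text in its working memory rather than from a vague recollection of whatever travel policies appeared in its training data.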

Programming: where AI helps – and where it does not

Particularly revealing is Karpathy’s honest account of how he uses AI for programming. When he built his teaching repository nanochat, coding models were of little help to him. He distinguishes three ways of working:

Way of working            | Description                                       | Karpathy’s assessment
--------------------------|---------------------------------------------------|-------------------------------
Write everything yourself | Reject AI entirely                                | No longer sensible today
Autocomplete              | The human stays the architect, the model fills in | His preferred „sweet spot“
„Vibe coding“ / agents    | State the task, the model builds autonomously     | Suitable only in certain cases

According to Karpathy, agents shine at standard and boilerplate code that appears frequently online. With his unusually structured, „intellectually dense“ code, by contrast, they failed: they did not understand his deliberate departures from convention, inserted superfluous safeguards, bloated the code and sometimes used outdated interfaces. His conclusion: models are bad at code „that has never been written before“.

This matters for the hype debate. The popular notion of a rapid „intelligence explosion“ often rests on the assumption that AI could automate AI research itself. Yet the genuinely novel is exactly where the models are weakest – an important reason for Karpathy’s longer time horizons.

Reinforcement learning: „sucking supervision through a straw“

Karpathy is sharply critical of reinforcement learning as it is commonly practised today. On a maths problem, the system tries hundreds of solution paths; in the end only the final result is checked. Every token of a successful path is up-weighted – including the wrong turns that happened to lead to the right answer.

You suck supervision through a straw: a single success signal is spread across the entire trace. A human would never do it this way.
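The credit-assignment problem behind the straw metaphor can be illustrated with a toy example. The sketch assumes a simplified setup. Each rollout is a list of reasoning steps. The reward is 1 if the final answer is correct and 0 otherwise. Every step in a rollout receives the same advantage: its reward minus the batch mean, which is one common baseline choice. The Rollout class and outcome_credit function are hypothetical, and real training operates on token log-probabilities. The core point still carries over: no step is judged on its own merits.

```python
from dataclasses import dataclass

@dataclass
class Rollout:
    steps: list[str]   # reasoning steps, standing in for tokens
    final_answer: int  # the only thing the outcome reward looks at

def outcome_credit(rollouts: list[Rollout], correct_answer: int) -> list[list[float]]:
    """Give every step of a rollout the same advantage: reward minus batch-mean reward.

    Nothing here inspects individual steps, so a wrong turn inside a
    successful rollout is reinforced exactly like the correct steps.
    """
    rewards = [1.0 if r.final_answer == correct_answer else 0.0 for r in rollouts]
    baseline = sum(rewards) / len(rewards)
    return [[reward - baseline] * len(r.steps) for r, reward in zip(rollouts, rewards)]

rollouts = [
    Rollout(["2*3=6", "6+1=7"], final_answer=7),
    Rollout(["2*3=5", "wait, 2*3=6", "6+1=7"], final_answer=7),  # contains a wrong turn
    Rollout(["2*3=5", "5+1=6"], final_answer=6),
]
for rollout, credit in zip(rollouts, outcome_credit(rollouts, correct_answer=7)):
    print([(step, round(c, 2)) for step, c in zip(rollout.steps, credit)])
```

In this toy run, the erroneous step „2*3=5“ in the second rollout gets the same positive credit as the correct steps, simply because that rollout happened to end at the right answer.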

Alternatives such as process-based evaluation have so far foundered on a subtle problem: if you bring in a second model as a „judge“, the trained model reliably finds loopholes. Karpathy describes a case where a model suddenly received top marks – even though its answers descended into meaningless gibberish that the judge wrongly rated as perfect. There are infinitely many such „adversarial examples“.
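Why an imperfect judge gets exploited can be shown with a deliberately crude stand-in. Here a hypothetical toy_judge rewards answers that merely look thorough, and picking the highest-scoring candidate plays the role of the optimiser. This is an assumption-laden toy, since an LLM judge fails in far less obvious ways. The dynamic is the same, though: whatever the judge over-rewards is what optimisation will find.

```python
def toy_judge(answer: str) -> float:
    """Hypothetical flawed grader: the share of words that come from a list
    of 'reassuring' keywords. It never checks whether the maths is right."""
    keywords = {"therefore", "verified", "correct", "step"}
    words = answer.lower().split()
    if not words:
        return 0.0
    return sum(w in keywords for w in words) / len(words)

candidates = [
    "2 times 3 is 6, plus 1 gives 7",
    "step 1: 2*3=6. step 2: 6+1=7. therefore the answer is 7",
    "therefore verified correct step therefore verified correct",  # meaningless
]
# Choosing the maximum stands in for training against the judge.
best = max(candidates, key=toy_judge)
print(best, toy_judge(best))
```

The meaningless third candidate wins with a perfect score. This mirrors, in miniature, the gibberish answers Karpathy describes.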

Synthetic data does not solve this easily either: model outputs quietly „collapse“ into a narrow band – ask ChatGPT for a joke ten times and you get practically the same one. Train too long on such self-generated outputs and the model gets worse.
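Loss of diversity from training on one's own outputs can be illustrated with a simple resampling toy. This is an analogy, not a model of how LLMs are actually trained. Each „generation“ only reproduces what the previous generation produced, with no fresh data, so an output that is not resampled disappears for good. The function names and the ten placeholder jokes are hypothetical.

```python
import random

def distinct_ratio(samples: list[str]) -> float:
    """Fraction of unique outputs: 1.0 = all different, 0.1 = one answer ten times."""
    return len(set(samples)) / len(samples)

def self_training_generations(initial: list[str], generations: int, seed: int = 0) -> list[float]:
    """Toy drift model: each generation draws (with replacement) only from the
    previous generation's outputs, so the set of distinct outputs can only shrink."""
    rng = random.Random(seed)
    pool = list(initial)
    ratios = [distinct_ratio(pool)]
    for _ in range(generations):
        pool = rng.choices(pool, k=len(pool))
        ratios.append(distinct_ratio(pool))
    return ratios

jokes = [f"joke {i}" for i in range(10)]
print(self_training_generations(jokes, generations=20))
```

Because no new material ever enters the pool, the printed ratios never rise. Across generations the pool narrows, much like the narrow band Karpathy describes.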

What this means for the economy and mid-sized businesses

Karpathy does not expect an abrupt replacement of jobs, but rather an autonomy slider: AI initially handles around 80 percent of the volume of work and delegates the remaining 20 percent to humans, who supervise teams of AI systems. Early candidates are activities with clear characteristics:

  • simple, repetitive processes (example: call centres)
  • short, self-contained tasks with little context
  • purely digital processes without a physical component
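One way to picture the autonomy slider is as a threshold that decides which tasks go to AI and which stay with people. In the minimal sketch below, each task is scored by how many of the three characteristics above it has. The Task fields, the scoring rule, the route function and the example tasks are all hypothetical illustrations, not a method Karpathy proposes.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    repetitive: bool      # simple, repetitive process
    self_contained: bool  # short task with little context
    purely_digital: bool  # no physical component

def automation_score(task: Task) -> int:
    """Number of the three early-candidate traits the task has (0 to 3)."""
    return sum([task.repetitive, task.self_contained, task.purely_digital])

def route(tasks: list[Task], slider: int) -> dict[str, list[str]]:
    """Send a task to the AI queue if its score reaches the slider value,
    otherwise to a human. Lowering the slider shifts more work to AI."""
    queues: dict[str, list[str]] = {"ai": [], "human": []}
    for task in tasks:
        queue = "ai" if automation_score(task) >= slider else "human"
        queues[queue].append(task.name)
    return queues

tasks = [
    Task("password reset call", True, True, True),
    Task("invoice categorisation", True, True, True),
    Task("on-site equipment check", True, True, False),
    Task("contract negotiation", False, False, True),
]
print(route(tasks, slider=3))  # only tasks with all three traits go to AI
print(route(tasks, slider=2))  # the on-site check moves to the AI queue as well
```

The 80/20 split is not encoded here. The sketch only shows how the share of AI-handled work can be raised step by step, while the remaining tasks stay with humans.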

Notably, although language models are considered „general“, in practice their use is dominated by programming. Karpathy’s explanation: code is text-based, well structured, data-rich – and infrastructure such as editors and diff views already exists. Other areas (presentations, for example) do not have this. Even for pure text tasks, generating economic value away from code is surprisingly hard.

On the bigger picture, Karpathy remains cautious: he sees AI as a continuation of centuries of automation – from the compiler to the search engine. Earlier upheavals such as computers or smartphones did not show up as a visible jump in economic growth, but diffused slowly. He expects a similar, gradual spread for AI too. Importantly: Karpathy explicitly describes himself as optimistic – his scepticism is aimed at unrealistic timelines and at statements he attributes above all to funding and attention incentives.

The honest lesson: sometimes the advice is „no AI“

One statement by Karpathy deserves special attention. During his time as a computer-vision consultant, his value often lay in advising companies against using AI:

I was the AI expert, they described the problem, and my advice was: don’t use AI. That was my value.

For SMEs this is an important message. Not every problem needs an AI system. Anyone investing today should soberly examine the actual capabilities of the technology, rather than falling for the expectation of an „all-knowing tool“. Karpathy illustrates the gap with language tutoring: within minutes, a good human teacher works out what the learner already knows and where they struggle – something today’s models come nowhere near.

Conclusion

Karpathy’s message is uncomfortable for both camps. To the AI sceptics he counters that the tools are real and valuable. To the enthusiasts he replies that reliable, autonomous agents will take years – not months. For decision-makers in the German-speaking mid-market, this points to a pragmatic course: use AI where it demonstrably delivers today (programming, standard tasks, text with clear context), keep humans in the supervising role, and ask honestly with every investment whether the problem even needs AI. In this decade, patience and a sense of proportion beat any bet on a quick breakthrough.

Source: Andrej Karpathy – „We’re summoning ghosts, not building animals“ (Dwarkesh Patel, YouTube)