Andrej Karpathy Keynote at YC AI Startup School

Overview

    Karpathy outlined three software evolution paradigms: Software 1.0 (traditional code), Software 2.0 (neural network weights), and Software 3.0 (LLM prompts in English)

    LLMs function like 1960s-era operating systems with expensive compute requiring cloud-based timesharing, but uniquely diffuse to consumers first rather than governments/corporations

    Partial autonomy applications like Cursor and Perplexity represent the most effective current approach: they combine traditional interfaces with LLM integration, custom GUIs for verification, and autonomy sliders for user control

    Vibe coding democratizes programming by enabling anyone to build software using natural language, though deployment infrastructure remains complex

    Building for AI agents requires LLM-friendly documentation formats, direct API access, and tools that make digital information easily ingestible

Software evolution paradigms

    Karpathy identified three distinct software paradigms: after roughly 70 years of relative stability under Software 1.0, two new paradigms have emerged in rapid succession

    Software 1.0 consists of traditional computer code written by humans to program computers

    Software 2.0 represents neural network weights where developers tune datasets and run optimizers rather than writing code directly

    Software 3.0 emerged with large language models where prompts written in English serve as programs that control LLMs

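    A minimal sketch of the contrast, using a hypothetical call_llm() as a stand-in for whichever LLM API is actually used: the Software 1.0 version encodes the rules by hand, while the Software 3.0 "program" is an English prompt

```python
# Software 1.0: a human writes the rules explicitly in code.
def is_positive_1_0(review: str) -> bool:
    positive = {"great", "excellent", "love", "amazing"}
    negative = {"bad", "terrible", "hate", "awful"}
    words = set(review.lower().split())
    return len(words & positive) >= len(words & negative)

# Software 3.0: the "program" is an English prompt; call_llm is a
# hypothetical placeholder, not a real library function.
PROMPT = ("Classify the sentiment of this review as 'positive' or "
          "'negative'. Reply with one word.\n\nReview: {review}")

def is_positive_3_0(review: str, call_llm) -> bool:
    return call_llm(PROMPT.format(review=review)).strip().lower() == "positive"
```
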
    Karpathy observed at Tesla that Software 2.0 (neural networks) progressively "ate through" the autopilot software stack, replacing C++ code with neural network capabilities

    The same pattern is occurring again with Software 3.0, where LLM-based solutions are replacing both traditional code and neural networks

    GitHub now contains significant amounts of English text interspersed with code, reflecting this paradigm shift

LLM ecosystem analogies

    Karpathy compared LLMs to utilities where labs spend capex to train models (like building electrical grids) and opex to serve intelligence via APIs with metered access

    LLMs also resemble semiconductor fabs due to massive capex requirements and centralized R&D secrets, though software's malleability makes them less defensible

    The strongest analogy positions LLMs as operating systems, with closed-source providers (like Windows/macOS) and open-source alternatives (the Llama ecosystem resembling Linux)

    LLMs function as new computers where the model serves as CPU, context windows act as memory, and the system orchestrates compute for problem-solving

    Current LLM computing resembles 1960s-era mainframes with expensive centralized compute requiring timesharing and thin client access over networks

    Personal LLM computing hasn't emerged yet due to economics, though Mac Minis show promise for batch inference workloads

    LLMs flip the usual technology diffusion pattern: consumers adopt first (e.g., for help with cooking) while governments and corporations lag behind, the opposite of how transformative technologies have historically spread

LLM psychology and limitations

    Karpathy described LLMs as "stochastic simulations of people" or "people spirits" created by autoregressive transformers trained on internet text

    LLMs possess encyclopedic knowledge and memory capabilities far exceeding individual humans, similar to the autistic savant character in Rain Man who could memorize phone books

    Key cognitive deficits include hallucination, poor self-knowledge, and "jagged intelligence" where they're superhuman in some domains but make basic errors humans wouldn't

    LLMs suffer from "anterograde amnesia": unlike human coworkers who learn organizational context over time, LLMs don't natively consolidate knowledge or develop expertise

    Context windows function as working memory that must be programmed directly, similar to protagonists in Memento and 50 First Dates whose memory resets daily

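    One concrete reading of "programming the working memory", as a sketch with invented names (none of this is from the talk): treat the context window like RAM with a fixed budget and decide explicitly what stays resident

```python
def fit_context(messages, budget_tokens, count_tokens):
    """Keep the newest messages that fit within the token budget,
    evicting the oldest first. count_tokens is a placeholder for a
    real tokenizer; all names here are illustrative."""
    kept, used = [], 0
    for msg in reversed(messages):          # newest first
        cost = count_tokens(msg)
        if used + cost > budget_tokens:
            break
        kept.append(msg)
        used += cost
    return kept[::-1]                       # restore chronological order
```
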
    Security limitations include gullibility, susceptibility to prompt injection, and potential data leakage

    Users must simultaneously leverage superhuman capabilities while working around significant cognitive deficits

Partial autonomy applications

    Karpathy advocated for partial autonomy apps over direct LLM interaction, using Cursor as the prime example for coding assistance

    Successful LLM apps share common features: extensive context management, orchestration of multiple LLM calls, application-specific GUIs for human verification, and autonomy sliders

    Cursor demonstrates the autonomy slider concept with tab completion (minimal autonomy), Command+K (chunk-level changes), Command+L (file-level changes), and Command+I (repo-level autonomy)

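    A sketch of the autonomy-slider idea with invented names (this is not Cursor's actual implementation): each level hands the model a larger scope, which also means more for the human to verify per step

```python
from enum import Enum

class Autonomy(Enum):
    TAB = "completion"   # suggest the next edit
    CHUNK = "selection"  # rewrite the selected block (Command+K-style)
    FILE = "file"        # rewrite one file (Command+L-style)
    REPO = "repo"        # multi-file change (Command+I-style)

def scope_for(level: Autonomy, editor) -> str:
    """Map the chosen autonomy level to how much context the model
    may rewrite. `editor` and its fields are hypothetical."""
    return {
        Autonomy.TAB: editor.current_line,
        Autonomy.CHUNK: editor.selection,
        Autonomy.FILE: editor.file_text,
        Autonomy.REPO: editor.repo_dump,
    }[level]
```
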
    Perplexity exemplifies similar patterns with quick search, research, and deep research modes representing different autonomy levels

    The human-AI collaboration pattern involves AI generation and human verification, requiring fast verification loops through visual GUIs rather than text-heavy interfaces

    Karpathy emphasized keeping "AI on the leash" to avoid overwhelming users with 1000-line code diffs that become verification bottlenecks

    Drawing from 5 years at Tesla working on autopilot, he noted that even after a perfect 30-minute Waymo demo in 2013, full driving autonomy remains unsolved 12 years later

    The Iron Man suit analogy illustrates the ideal balance: both an augmentation tool and an autonomous agent, with user-controlled autonomy levels

Human-AI collaboration patterns

    The optimal collaboration model positions humans as verifiers and AIs as generators, requiring extremely fast generation-verification loops

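    As a sketch (all names hypothetical), the loop looks something like this: the model generates, the human verifies, and feedback flows back until the result is accepted or the budget runs out

```python
def generate_verify_loop(task, call_llm, human_review, max_rounds=3):
    """call_llm and human_review are placeholders; human_review
    returns (accepted, notes). Keeping max_rounds and the task size
    small is what keeps each verification step fast."""
    notes = ""
    for _ in range(max_rounds):
        draft = call_llm(f"{task}\n\nReviewer feedback so far:\n{notes}")
        accepted, notes = human_review(draft)
        if accepted:
            return draft
    return None  # escalate to the human rather than ship unverified work
```
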
    Two key strategies accelerate this loop: speeding up verification through visual GUIs that leverage human computer vision capabilities, and keeping AI constrained to manageable chunks

    GUIs provide "highways to your brain" since visual processing is effortless compared to reading text, making verification faster and more enjoyable

    Successful prompting requires concrete, specific instructions, which raise the probability that verification succeeds and prevent wasted generation-verification cycles

    Karpathy's education work separates teacher course creation from student course delivery, using intermediate course artifacts to keep AI constrained to specific syllabi and progressions

    Best practices include working in small incremental chunks, focusing on single concrete tasks, and developing techniques to maintain AI focus and prevent "getting lost in the woods"

Vibe coding democratization

    Vibe coding enables anyone to program using natural language, eliminating the traditional 5-10 year learning curve for software development

    Karpathy's viral tweet about "programming computers in English" became a major meme and now has a Wikipedia page, though he didn't anticipate how popular it would become

    Kids vibe coding represents a "gateway drug to software development" and demonstrates the positive potential of democratized programming

    Karpathy successfully built iOS apps despite not knowing Swift, and created MenuGen (menugen.app) to generate restaurant menu images

    MenuGen provides $5 in free credits but operates as a "negative revenue app" due to high AI generation costs

    The key insight: vibe coding makes the actual coding trivial (hours), but deployment infrastructure remains complex (weeks of DevOps work)

    Traditional setup processes like Google login integration require extensive manual clicking and configuration that should be automated for AI agents

Building for AI agents

    AI agents represent a new category of digital information consumers: "people spirits on the Internet" that need software infrastructure designed for them

    llms.txt files (analogous to robots.txt) can describe a domain's content for LLMs in easily readable markdown

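    A minimal illustration of the format (the site name and URLs below are invented): the llms.txt proposal uses an H1 title, a short blockquote summary, and H2 sections listing markdown links

```markdown
# ExampleCo

> ExampleCo provides a JSON API for restaurant menu data.

## Docs

- [Quickstart](https://example.com/docs/quickstart.md): create an API key and make a first request
- [API reference](https://example.com/docs/api.md): endpoints, parameters, and error codes
```
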
    Companies like Vercel and Stripe are creating LLM-specific documentation in markdown, replacing human-oriented formatting such as bold text and images with plain, parseable text

    Documentation must eliminate "click" instructions and replace them with equivalent curl commands that LLM agents can execute

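    For example, a setup step written for humans ("click Settings, then Create webhook") might be paired with an equivalent command an agent can run directly; the endpoint and fields below are invented

```bash
curl -X POST https://api.example.com/v1/webhooks \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://myapp.example.com/hooks", "events": ["payment.succeeded"]}'
```
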
    Tools like Gitingest convert a GitHub repo into LLM-friendly concatenated text; users simply change the repo URL from github.com to gitingest.com

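    The URL substitution from the talk, as a one-line sketch:

```python
def to_gitingest(github_url: str) -> str:
    # e.g. https://github.com/karpathy/nanoGPT -> https://gitingest.com/karpathy/nanoGPT
    return github_url.replace("github.com", "gitingest.com", 1)
```
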
    DeepWiki goes further by having AI agents analyze repos and generate comprehensive documentation pages

    Anthropic's Model Context Protocol provides a standardized way for applications to communicate directly with AI agents

    While LLMs can potentially navigate traditional interfaces through clicking, meeting them halfway with agent-friendly formats remains more efficient and cost-effective

    The long tail of software that won't adapt to AI agents will require these conversion tools, but active platforms should build native agent support
