Andrej Karpathy Keynote at YC AI Startup School

Overview

    Karpathy outlined three software evolution paradigms: Software 1.0 (traditional code), Software 2.0 (neural network weights), and Software 3.0 (LLM prompts in English)
    LLMs function like 1960s-era operating systems with expensive compute requiring cloud-based timesharing, but uniquely diffuse to consumers first rather than governments/corporations
    Partial autonomy applications like Cursor and Perplexity represent the optimal approach—combining traditional interfaces with LLM integration, custom GUIs for verification, and autonomy sliders for user control
    Vibe coding democratizes programming by enabling anyone to build software using natural language, though deployment infrastructure remains complex
    Building for AI agents requires LLM-friendly documentation formats, direct API access, and tools that make digital information easily ingestible

Software evolution paradigms

    Karpathy identified three distinct software paradigms: after roughly 70 years of stability under traditional programming, two new paradigms have emerged in quick succession in recent years
    Software 1.0 consists of traditional computer code written by humans to program computers
    Software 2.0 represents neural network weights where developers tune datasets and run optimizers rather than writing code directly
    Software 3.0 emerged with large language models where prompts written in English serve as programs that control LLMs
    Karpathy observed at Tesla that Software 2.0 (neural networks) progressively "ate through" the autopilot software stack, replacing C code with neural network capabilities
    The same pattern is occurring again with Software 3.0, where LLM-based solutions are replacing both traditional code and neural networks
    GitHub now contains significant amounts of English text interspersed with code, reflecting this paradigm shift
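The shift above can be illustrated with a toy sentiment-classification task. In Software 1.0 a human writes the logic explicitly; in Software 3.0 the "program" is an English prompt handed to an LLM (in Software 2.0, a trained model's weights would replace the hand-written rules). This is a sketch: the `call_llm` parameter below is a hypothetical stand-in for any chat-completion API, not a specific library.

```python
# Software 1.0: the logic is hand-written by a human programmer.
def sentiment_1_0(text: str) -> str:
    negative_words = {"bad", "awful", "terrible", "broken"}
    hits = sum(word in negative_words for word in text.lower().split())
    return "negative" if hits > 0 else "positive"

# Software 3.0: the "program" is an English prompt; the LLM executes it.
# `call_llm` is a hypothetical stand-in for any chat-completion API.
SENTIMENT_PROMPT = (
    "Classify the sentiment of the following review as exactly one word, "
    "'positive' or 'negative':\n\n{review}"
)

def sentiment_3_0(review: str, call_llm) -> str:
    return call_llm(SENTIMENT_PROMPT.format(review=review)).strip().lower()

print(sentiment_1_0("the update is awful and everything is broken"))  # negative
```

Note that the Software 3.0 version contains no classification logic at all; the English prompt is the program, which is what makes GitHub's growing mix of prose and code unsurprising.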

LLM ecosystem analogies

    Karpathy compared LLMs to utilities where labs spend capex to train models (like building electrical grids) and opex to serve intelligence via APIs with metered access
    LLMs also resemble semiconductor fabs due to massive capex requirements and centralized R&D secrets, though software's malleability makes them less defensible
    The strongest analogy positions LLMs as operating systems with closed-source providers (like Windows/macOS) and open-source alternatives (the Llama ecosystem resembling Linux)
    LLMs function as new computers where the model serves as CPU, context windows act as memory, and the system orchestrates compute for problem-solving
    Current LLM computing resembles 1960s-era mainframes with expensive centralized compute requiring timesharing and thin client access over networks
    Personal LLM computing hasn't emerged yet due to economics, though hardware like Mac Minis shows early promise for running some local inference workloads
    LLMs uniquely flip technology diffusion patterns—consumers adopt first (helping with cooking) while governments and corporations lag behind, opposite of historical technology adoption

LLM psychology and limitations

    Karpathy described LLMs as "stochastic simulations of people" or "people spirits" created by autoregressive transformers trained on internet text
    LLMs possess encyclopedic knowledge and memory capabilities far exceeding individual humans, similar to the autistic savant character in Rain Man who could memorize phone books
    Key cognitive deficits include hallucination, poor self-knowledge, and "jagged intelligence" where they're superhuman in some domains but make basic errors humans wouldn't
    LLMs suffer from "anterograde amnesia"—unlike human coworkers who learn organizational context over time, LLMs don't natively consolidate knowledge or develop expertise
    Context windows function as working memory that must be programmed directly, similar to the protagonists of Memento and 50 First Dates, whose memories reset and never consolidate
    Security limitations include gullibility, susceptibility to prompt injection, and potential data leakage
    Users must simultaneously leverage superhuman capabilities while working around significant cognitive deficits
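Because an LLM retains nothing between calls, any knowledge that should persist has to be re-injected into the context window on every request. A minimal sketch of that working-memory management pattern follows; the `call_llm` wrapper and the `notes` store are hypothetical illustrations, not part of any real API.

```python
# The model retains nothing between calls ("anterograde amnesia"),
# so durable context must be rebuilt and re-injected on every prompt.
class ContextManagedAssistant:
    def __init__(self, call_llm):
        self.call_llm = call_llm    # hypothetical LLM API wrapper
        self.notes: list[str] = []  # knowledge we consolidate on the model's behalf

    def remember(self, fact: str) -> None:
        self.notes.append(fact)

    def ask(self, question: str) -> str:
        # Reconstruct the "working memory" from scratch for this call.
        context = "\n".join(f"- {fact}" for fact in self.notes)
        prompt = f"Known facts:\n{context}\n\nQuestion: {question}"
        return self.call_llm(prompt)

# A stub LLM that echoes its prompt shows the context being re-supplied.
echo = lambda prompt: prompt
assistant = ContextManagedAssistant(echo)
assistant.remember("The deploy script lives in scripts/deploy.sh")
print(assistant.ask("Where is the deploy script?"))
```

The point of the sketch is that consolidation is the application's job, not the model's: the assistant object, not the LLM, is what "learns" over time.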

Partial autonomy applications

    Karpathy advocated for partial autonomy apps over direct LLM interaction, using Cursor as the prime example for coding assistance
    Successful LLM apps share common features: extensive context management, orchestration of multiple LLM calls, application-specific GUIs for human verification, and autonomy sliders
    Cursor demonstrates the autonomy slider concept with tab completion (minimal autonomy), Command+K (chunk-level changes), Command+L (file-level changes), and Command+I (repo-level autonomy)
    Perplexity exemplifies similar patterns with quick search, research, and deep research modes representing different autonomy levels
    The human-AI collaboration pattern involves AI generation and human verification, requiring fast verification loops through visual GUIs rather than text-heavy interfaces
    Karpathy emphasized keeping "AI on the leash" to avoid overwhelming users with 1000-line code diffs that become verification bottlenecks
    Drawing from 5 years at Tesla working on autopilot, he noted that even after a perfect 30-minute Waymo demo in 2013, full driving autonomy remains unsolved 12 years later
    The Iron Man suit analogy illustrates the ideal balance—both augmentation tool and autonomous agent with user-controlled autonomy levels
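The autonomy slider can be sketched as an explicit scope parameter that bounds how much the model may touch per request. The four levels below mirror Cursor's tab / Command+K / Command+L / Command+I progression from the talk; the code itself is an illustrative sketch, not Cursor's implementation.

```python
from enum import Enum

class Autonomy(Enum):
    COMPLETION = 1  # tab: complete a few tokens at the cursor
    CHUNK = 2       # Command+K: edit a selected block
    FILE = 3        # Command+L: edit a whole file
    REPO = 4        # Command+I: make changes across the repository

def allowed_scope(level: Autonomy) -> str:
    """Map a slider position to the largest unit the AI may modify."""
    return {
        Autonomy.COMPLETION: "a few tokens at the cursor",
        Autonomy.CHUNK: "the selected chunk",
        Autonomy.FILE: "the current file",
        Autonomy.REPO: "any file in the repository",
    }[level]

print(allowed_scope(Autonomy.CHUNK))  # the selected chunk
```

The design point is that autonomy is a user-controlled input to every request, not a fixed property of the tool: the same model runs at every level, but the blast radius a human must verify changes.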

Human-AI collaboration patterns

    The optimal collaboration model positions humans as verifiers and AIs as generators, requiring extremely fast generation-verification loops
    Two key strategies accelerate this loop: speeding up verification through visual GUIs that leverage human computer vision capabilities, and keeping AI constrained to manageable chunks
    GUIs provide "highways to your brain" since visual processing is effortless compared to reading text, making verification faster and more enjoyable
    Successful prompting requires concrete, specific instructions to increase verification success probability and avoid spinning cycles
    Karpathy's education work separates teacher course creation from student course delivery, using intermediate course artifacts to keep AI constrained to specific syllabi and progressions
    Best practices include working in small incremental chunks, focusing on single concrete tasks, and developing techniques to maintain AI focus and prevent "getting lost in the woods"
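"Keeping the AI on the leash" can be made mechanical: cap how much generated change a human must verify at once, so a 1000-line diff never lands as a single verification bottleneck. A minimal sketch, with an illustrative chunk size not taken from the talk:

```python
# Cap how much generated change a human must verify at once.
# The 100-line limit is illustrative, not a number from the talk.
MAX_DIFF_LINES = 100

def review_queue(diff: str, max_lines: int = MAX_DIFF_LINES) -> list[str]:
    """Split a generated diff into chunks small enough to verify quickly.

    A 1000-line diff becomes ten reviewable pieces instead of one
    overwhelming wall of changes.
    """
    lines = diff.splitlines()
    return [
        "\n".join(lines[i : i + max_lines])
        for i in range(0, len(lines), max_lines)
    ]

big_diff = "\n".join(f"+ line {n}" for n in range(1000))
print(len(review_queue(big_diff)))  # 10
```

In practice the same constraint is better applied upstream, by asking the AI for one small, concrete change per request, but the gate illustrates why chunk size is the lever that keeps the generation-verification loop fast.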

Vibe coding democratization

    Vibe coding enables anyone to program using natural language, eliminating the traditional 5-10 year learning curve for software development
    Karpathy's viral tweet about "programming computers in English" became a major meme and now has a Wikipedia page, though he couldn't predict its popularity
    Kids vibe coding represents a "gateway drug to software development" and demonstrates the positive potential of democratized programming
    Karpathy successfully built iOS apps despite not knowing Swift, and created MenuGen (menugen.app) to generate restaurant menu images
    MenuGen provides $5 in free credits but operates as a "negative revenue app" due to high AI generation costs
    The key insight: vibe coding makes the actual coding trivial (hours), but deployment infrastructure remains complex (weeks of DevOps work)
    Traditional setup processes like Google login integration require extensive manual clicking and configuration that should be automated for AI agents

Building for AI agents

    AI agents represent a new category of digital information consumers—"people spirits on the Internet" that need software infrastructure designed for them
    llms.txt files (similar in spirit to robots.txt) can describe a domain's content to LLMs in easily readable markdown format
    Companies like Vercel and Stripe are creating LLM-specific documentation in plain markdown, replacing human-oriented formatting such as bold text and images
    Documentation must eliminate "click" instructions and replace them with equivalent curl commands that LLM agents can execute
    Tools like Gitingest convert GitHub repos into LLM-friendly concatenated text, accessed simply by changing a repo URL's domain from github.com to gitingest.com
    DeepWiki goes further by having AI agents analyze repos and generate comprehensive documentation pages
    Anthropic's Model Context Protocol provides a standardized way for applications to communicate directly with AI agents
    While LLMs can potentially navigate traditional interfaces through clicking, meeting them halfway with agent-friendly formats remains more efficient and cost-effective
    The long tail of software that won't adapt to AI agents will require these conversion tools, but active platforms should build native agent support
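The Gitingest URL trick described above is simple enough to automate. A sketch of the rewrite, assuming only what the talk states (swap the github.com domain for gitingest.com to get the repo as concatenated text); the helper function itself is hypothetical:

```python
from urllib.parse import urlparse, urlunparse

def to_gitingest(repo_url: str) -> str:
    """Rewrite a GitHub repo URL to its Gitingest equivalent, which serves
    the repository as a single LLM-friendly concatenated text dump."""
    parts = urlparse(repo_url)
    if parts.netloc != "github.com":
        raise ValueError("expected a github.com URL")
    # ParseResult is a namedtuple, so _replace swaps just the domain.
    return urlunparse(parts._replace(netloc="gitingest.com"))

print(to_gitingest("https://github.com/karpathy/nanogpt"))
# https://gitingest.com/karpathy/nanogpt
```

This is "meeting the agent halfway": one URL rewrite replaces the clicking, cloning, and file-by-file browsing an agent would otherwise have to perform.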