Andrej Karpathy recently cautioned that we’re getting “way too excited” about fully autonomous AI agents – reminding developers to “keep AI on the leash.”1 His point is blunt and true: The challenge with advanced AI isn’t making it act; it’s making sure it acts reliably and within bounds.

The Compounding Problem

Today’s large language models can appear superhuman one moment and then make mistakes “no human ever would.”2 But the reliability problem goes much deeper than individual model errors. In enterprise settings, AI systems must navigate large, unfamiliar action spaces – complex workflows with multiple tools, databases, and decision points that weren’t part of their training. When you chain multiple AI operations together, errors compound: the chance of an error-free run decays exponentially with the number of steps. A small hallucination in step one becomes a major failure by step five.
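To make the compounding concrete, here is a back-of-the-envelope sketch in Python. The 95% per-step figure is purely illustrative, not a measured benchmark:

```python
# Illustrative arithmetic: per-step reliability compounds across a workflow.
# The 0.95 figure is a hypothetical example, not a measured benchmark.
per_step_reliability = 0.95

for steps in (1, 5, 10, 20):
    workflow_reliability = per_step_reliability ** steps
    print(f"{steps:>2} steps: {workflow_reliability:.1%} chance of an error-free run")
# -> 95.0%, 77.4%, 59.9%, 35.8%
```

Even at 95% per-step reliability, a twenty-step workflow finishes cleanly barely a third of the time.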

This is why the traditional “prompt-and-pray” approach – tossing a complex request into an AI model and hoping for the best – breaks down in real-world applications. The more steps involved, the more likely something will go wrong. For enterprises handling multi-step processes like financial analysis, due diligence, or research synthesis, this unpredictability is a deal-breaker. If we can’t trust an AI to stay on task across a complex workflow, we can’t rely on it for business-critical work.

The key, as Karpathy argues, is constraining AI behavior – essentially putting our AI agents on a leash of logic, oversight, and guardrails.

[Image: a Lego man walking a robot]

Beyond Black Box AI: The Case for Constrained Autonomy

The solution lies in fundamentally rethinking how we architect AI systems. Rather than relying on monolithic language models to handle complex tasks end-to-end, we need hybrid architectures that combine the flexibility of neural reasoning with the reliability of symbolic control systems.

Consider a neuro-symbolic approach: probabilistic language models handle reasoning tasks within individual steps, while symbolic logic governs the overall system, enforcing structure, validating outputs against explicit requirements, and making principled decisions about resource allocation and execution paths. This architecture doesn’t constrain intelligence – it channels it.
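A minimal sketch of that division of labor, in Python. Everything here is illustrative – `call_llm` is a canned stand-in for a real model API and the validation rule is a toy – but it shows the shape: probabilistic reasoning inside each step, deterministic gates around it.

```python
import json
from dataclasses import dataclass
from typing import Callable

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model call (canned so the sketch runs)."""
    return '{"summary": "Q3 revenue grew 12% year over year."}'

@dataclass
class Step:
    name: str
    prompt: str
    validate: Callable[[str], bool]  # deterministic, symbolic gate

def run_step(step: Step, max_attempts: int = 3) -> str:
    """Neural reasoning inside the step; symbolic control around it."""
    prompt = step.prompt
    for _ in range(max_attempts):
        output = call_llm(prompt)        # probabilistic component
        if step.validate(output):        # explicit, auditable check
            return output
        prompt += f"\n\nPrevious output failed validation:\n{output}"
    raise RuntimeError(f"step {step.name!r} failed validation after {max_attempts} attempts")

def is_json_object(text: str) -> bool:
    try:
        return isinstance(json.loads(text), dict)
    except json.JSONDecodeError:
        return False

print(run_step(Step("summarize", "Summarize the filing as a JSON object.", is_json_object)))
```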

Such systems begin complex tasks by generating structured execution plans that break work into manageable steps. They analyze task requirements and available resources, then construct logical workflows that determine which operations to perform, in what sequence, and with what computational investment. At each stage, multiple approaches are evaluated in parallel, weighing likely success against cost.
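One way to picture that trade-off – with step names, estimates, and the scoring rule all invented for illustration – is a planner that scores each candidate approach by estimated success against estimated cost and keeps the best:

```python
from dataclasses import dataclass

@dataclass
class Approach:
    name: str
    est_success: float  # planner's estimated probability of success (0..1)
    est_cost: float     # e.g., expected token spend or seconds of latency

def choose(options: list[Approach], cost_weight: float = 0.005) -> Approach:
    """Pick the approach with the best success-versus-cost trade-off."""
    return max(options, key=lambda a: a.est_success - cost_weight * a.est_cost)

plan = {
    "extract_figures": [
        Approach("single_pass_llm", est_success=0.80, est_cost=5.0),
        Approach("chunked_llm_with_checks", est_success=0.95, est_cost=20.0),
    ],
    "cross_verify_numbers": [
        Approach("llm_self_check", est_success=0.70, est_cost=3.0),
        Approach("deterministic_recompute", est_success=0.99, est_cost=1.0),
    ],
}

for step_name, options in plan.items():
    print(f"{step_name}: {choose(options).name}")
```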

Validation happens continuously at multiple levels. Rather than checking only final outputs, each requirement is validated separately using both language models for contextual understanding and deterministic code for structural constraints. When outputs don’t meet requirements, they’re automatically corrected before proceeding. If multiple solutions are generated, the highest-scoring one that satisfies all requirements is selected.
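In code, that two-layer validation might look like the following sketch. The specific checks are invented for illustration: a deterministic structural rule, a canned stand-in for an LLM-based contextual judge, and selection of the highest-scoring candidate that passes both.

```python
import json

def structural_ok(text: str) -> bool:
    """Deterministic check: must parse as a JSON object containing 'total'."""
    try:
        return "total" in json.loads(text)
    except json.JSONDecodeError:
        return False

def contextual_ok(text: str) -> bool:
    """Stand-in for an LLM-based contextual check (canned so this runs offline)."""
    return "TODO" not in text

def quality_score(text: str) -> float:
    """Stand-in for a model-assigned quality score; here, count of cited sources."""
    return float(len(json.loads(text).get("sources", [])))

candidates = [
    '{"total": 1200000, "sources": ["10-K"]}',
    '{"total": 1200000, "sources": ["10-K", "audit letter"]}',
    '{"total": "TODO", "sources": []}',
]

valid = [c for c in candidates if structural_ok(c) and contextual_ok(c)]
best = max(valid, key=quality_score)  # highest-scoring candidate meeting every requirement
print(best)
```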

The result is markedly higher reliability and accuracy in multi-step AI workflows. Unlike traditional AI systems that operate as black boxes, constrained architectures deliver complete transparency through structured execution plans that make every decision point auditable and debuggable. Users gain full visibility into what the agent chose to do and why, while the system validates outputs at each step to ensure accuracy throughout the entire workflow rather than hoping for the best at the end.

This comprehensive approach translates into practical control that enterprises actually need. Users can specify discrete requirements – whether for content style, output structure, safety guardrails, or domain-specific constraints – and the system validates each requirement independently, automatically attempting corrections within budget and providing detailed reports on success or failure. Meanwhile, explicit budget limitations for compute resources and latency constraints are respected as the system chooses execution paths, dynamically balancing thoroughness against efficiency. These systems don’t just produce answers; they deliver complete fulfillment reports showing exactly how each requirement was satisfied, with full transparency about the process and any corrections made along the way.
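The shapes below sketch what such a contract could look like. The field names and types are hypothetical, mirroring the description above rather than any specific product API:

```python
from dataclasses import dataclass, field

@dataclass
class Requirement:
    name: str
    kind: str             # e.g., "style", "structure", "safety", "domain"
    satisfied: bool = False
    corrections: int = 0  # automatic correction attempts made

@dataclass
class Budget:
    max_llm_calls: int    # compute ceiling the planner must respect
    max_latency_s: float  # latency ceiling for the whole workflow

@dataclass
class FulfillmentReport:
    requirements: list[Requirement] = field(default_factory=list)

    def summary(self) -> str:
        return "\n".join(
            f"{r.name} ({r.kind}): "
            f"{'satisfied' if r.satisfied else 'FAILED'}, {r.corrections} correction(s)"
            for r in self.requirements
        )

budget = Budget(max_llm_calls=50, max_latency_s=120.0)  # handed to the planner up front

report = FulfillmentReport([
    Requirement("cite_every_figure", "domain", satisfied=True, corrections=1),
    Requirement("markdown_table_output", "structure", satisfied=True),
])
print(report.summary())
```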

The Enterprise Imperative

This approach transforms how organizations can use AI for complex, high-stakes tasks. Take financial due diligence – a process that might require analyzing hundreds of documents, extracting specific facts, cross-verifying numbers, and presenting findings in a precise format. A traditional AI model might generate a convincing report that includes fabricated facts or overlooks crucial details.

With constrained AI architectures, we can automate such workflows with confidence. The system plans out each step (search documents, identify key facts, compile structured reports), validates every piece against explicit requirements, and delivers results with an audit trail of exactly how each requirement was met.
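That audit trail can be as simple as an append-only log keyed by step and requirement. The entries below are invented to match the due diligence example:

```python
from datetime import datetime, timezone

audit_trail: list[dict] = []  # append-only record of every validated step

def log(step: str, requirement: str, outcome: str) -> None:
    audit_trail.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "step": step,
        "requirement": requirement,
        "outcome": outcome,
    })

log("search_documents", "cover_all_filings_2020_2024", "satisfied")
log("identify_key_facts", "every_figure_cites_a_source", "corrected_then_satisfied")
log("compile_report", "matches_report_template", "satisfied")

for entry in audit_trail:
    print(entry)
```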

Companies no longer have to gamble on clever AI that might go off the rails. Instead, they can deploy constrained AI agents that earn trust through transparent, validated execution – agents that expand what’s possible while staying firmly under control.

Putting Theory into Practice

These aren’t just theoretical concepts. At AI21, we’ve implemented this philosophy in AI21 Maestro, our enterprise system for building reliable knowledge agents. The platform demonstrates that constrained autonomy isn’t a limitation – it’s liberation. By binding AI with the right architectural constraints, we free it to achieve far more in enterprise settings.

Karpathy’s leash is ultimately about accountability, not restriction. It’s exactly what’s needed for AI to become a dependable partner in business-critical work. The future belongs not to AI that acts without bounds, but to AI that acts reliably within them.

  1. https://www.businessinsider.com/openai-cofounder-andrej-karpathy-keep-ai-on-the-leash-2025-6 ↩︎
  2. Ibid. ↩︎