Business Implementation

May 6, 2026

Managing 300M Agent Runs with LangSmith: Clay

I still remember the first time we hit 300 million agent runs in a month at Clay. It was exhilarating but a logistical nightmare. Yet, we orchestrated this massive operation with LangSmith. Every day, we juggle AI integration and cost optimization while maintaining impeccable quality. LangSmith is our ally, handling everything from agent orchestration to cost reconciliation. This isn't theoretical; it's our daily grind. If you're wondering how we manage at this scale, the answer lies in our ability to tweak every detail, anticipate errors, and always aim higher.

Here's how we orchestrate this operation with LangSmith. At Clay, AI operations are a beast of their own: we scale, manage, and optimize agent runs at a volume few teams ever reach, and LangSmith sits at the heart of that daily practice. This article covers large-scale AI integration, agent management, and the trade-off between quality, throughput, and cost. It also covers model agnosticism and how we use the Metaprompter tool to refine our agents, along with production insights, zero-to-one building, cost reconciliation, and observability. Finally, it looks at the challenges of scaling and at the future of long-running, self-healing agents.

Setting the Stage: Clay's AI Integration

At Clay, AI isn't just an add-on; it's the core of how we approach growth. We launched Claygent, our AI web research agent, around mid-2023. At 300 million agent runs a month, it requires robust infrastructure. Each agent run averages between 10 and 30 steps, which shows the complexity and depth of our processes.

But launching such a volume of agents isn't without challenges. Right from the start, managing agent runs proved to be complex without the right tools. This is where LangSmith comes in, allowing us to streamline operations effectively.

"LangSmith transformed our operations by providing unparalleled visibility and insights."

The key figures bear repeating: 300 million runs a month, each averaging 10 to 30 steps. Without LangSmith, sustaining that cadence while keeping quality consistent would be nearly impossible.
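
To put those figures in perspective, here is a rough back-of-the-envelope calculation. It uses only the numbers quoted above (300 million runs a month, 10 to 30 steps per run) and assumes a 30-day month with load spread evenly, which real traffic almost certainly is not.

```python
# Back-of-the-envelope load estimate from the figures quoted in the article.
RUNS_PER_MONTH = 300_000_000
STEPS_PER_RUN = (10, 30)             # low and high end of the quoted average
SECONDS_PER_MONTH = 30 * 24 * 3600   # assumption: 30-day month, uniform load


def monthly_steps(runs, steps_range):
    """Return the (low, high) number of agent steps per month."""
    low, high = steps_range
    return runs * low, runs * high


def per_second(count, seconds=SECONDS_PER_MONTH):
    """Average rate per second for a monthly count."""
    return count / seconds


low_steps, high_steps = monthly_steps(RUNS_PER_MONTH, STEPS_PER_RUN)
print(f"runs per second:  {per_second(RUNS_PER_MONTH):.0f}")
print(f"steps per month:  {low_steps:,} to {high_steps:,}")
print(f"steps per second: {per_second(low_steps):.0f} to {per_second(high_steps):.0f}")
```

Under these assumptions, that works out to roughly 116 runs per second and between 3 and 9 billion steps a month, each of which has to be traced somewhere.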

LangSmith's Role in Agent Management

LangSmith doesn't just manage our agents; it revolutionizes production insights and efficiency. Its model-agnostic approach and Metaprompter tool allow us to easily adapt our agents to various model providers, crucial in a rapidly evolving environment.

A typical workflow with LangSmith begins with agent setup, followed by execution. It's a collaborative process involving 25 to 50 team members, each contributing expertise to optimize performance.

Metaprompter: Enables easy adaptation to different models.
Flexibility: LangSmith is model-agnostic, so switching providers stays straightforward.
Collaboration: Involves many members for enriched insights.
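
To make the model-agnostic idea concrete, here is a minimal sketch in Python. It is not LangSmith's API and not how Metaprompter works; `ChatModel`, `FakeProvider`, `PROMPTS`, and `research` are hypothetical names invented for illustration. The point is only that agent code depends on a small interface, while the prompt is adapted per provider.

```python
from dataclasses import dataclass
from typing import Dict, Protocol


class ChatModel(Protocol):
    """Anything with a complete() method can back an agent step."""

    def complete(self, prompt: str) -> str: ...


@dataclass
class FakeProvider:
    """Hypothetical stand-in for a real provider SDK client."""

    name: str

    def complete(self, prompt: str) -> str:
        # A real adapter would call the provider's API here.
        return f"[{self.name}] {prompt}"


# Hypothetical per-provider prompt templates: the same task,
# phrased differently for each model family.
PROMPTS: Dict[str, str] = {
    "provider_a": "Research the company and answer briefly: {question}",
    "provider_b": "You are a web researcher. Question: {question}",
}


def research(model: ChatModel, provider_key: str, question: str) -> str:
    """Run one research step against whichever model is passed in."""
    prompt = PROMPTS[provider_key].format(question=question)
    return model.complete(prompt)


if __name__ == "__main__":
    for key in PROMPTS:
        print(research(FakeProvider(key), key, "What does this company sell?"))
```

Because `research` only relies on the `ChatModel` interface, swapping providers means adding an adapter and a prompt template rather than rewriting the agent.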

Quality, Throughput, and Cost: The Balancing Act

Ensuring quality while increasing agent run cadence is a balancing act. At Clay, we've optimized our throughput strategies to maximize efficiency. A standout feature of LangSmith is its ability to achieve 99.5% accuracy in cost reconciliation, essential for keeping a tight budget without sacrificing quality.

To balance cost and performance, we have to watch constantly for hidden costs. For example, overloading an agent with tool calls can make costs balloon, which is why close monitoring of cost metrics is crucial.

Throughput Optimization: Strategies to maximize efficiency.
Cost Reconciliation: 99.5% accuracy with LangSmith.
Hidden Costs: Monitor to avoid surprises.
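
The article does not say how the 99.5% figure is computed. One plausible reading, sketched below, compares the cost summed from traced runs with the amount the provider actually billed. The record fields (`provider`, `cost_usd`), the function names, and the sample amounts are all assumptions made for illustration, not Clay's data or LangSmith's schema.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def traced_cost_by_provider(runs: Iterable[dict]) -> Dict[str, float]:
    """Sum the cost recorded on each traced run, grouped by provider."""
    totals: Dict[str, float] = defaultdict(float)
    for run in runs:
        totals[run["provider"]] += run["cost_usd"]
    return dict(totals)


def reconciliation_accuracy(traced: float, billed: float) -> float:
    """1.0 means traced cost matches the invoice exactly (assumed definition)."""
    if billed <= 0:
        raise ValueError("billed amount must be positive")
    return 1.0 - abs(traced - billed) / billed


def providers_below(traced: Dict[str, float], billed: Dict[str, float],
                    threshold: float = 0.995) -> List[str]:
    """List providers whose traced cost drifts too far from the invoice."""
    return sorted(
        name for name, amount in billed.items()
        if reconciliation_accuracy(traced.get(name, 0.0), amount) < threshold
    )


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    runs = [
        {"provider": "provider_a", "cost_usd": 0.5},
        {"provider": "provider_a", "cost_usd": 0.25},
        {"provider": "provider_b", "cost_usd": 0.25},
    ]
    invoices = {"provider_a": 0.75, "provider_b": 0.5}
    traced = traced_cost_by_provider(runs)
    print(providers_below(traced, invoices))
```

A check like this is one way to surface hidden costs: any provider that falls below the threshold has spend that the traces do not explain.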

Challenges and Strategies for Scaling AI Agents

Scaling AI agents presents unique challenges. One key strategy is implementing long-running and self-healing agents. We've learned that observability is crucial for tracking and optimizing performance. Moving from zero to one, then to hundreds of thousands of runs, requires constant adjustment of our practices.

One lesson from this experience is not to rush into short-term fixes that harm long-term performance. Drawing on practices like those described in our article "Optimizing AI Agents: Challenges and Solutions", we've been able to make continuous improvements.

Self-Healing Agents: For sustainable performance.
Observability: Track and adjust in real-time.
Continuous Learning: Avoid rushing into temporary fixes.
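
The article does not define "self-healing" precisely. One simple interpretation, sketched below, is a step that retries with exponential backoff, falls back to an alternative when retries are exhausted, and records every attempt so the behavior stays observable. `run_with_healing` and its parameters are hypothetical names for illustration, not part of LangSmith.

```python
import time
from typing import Callable, List, Optional


def run_with_healing(
    step: Callable[[], str],
    fallback: Callable[[], str],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    events: Optional[List[dict]] = None,
) -> str:
    """Retry a step with exponential backoff, then fall back.

    Every attempt is appended to `events` so it can be inspected or
    shipped to whatever observability backend is in use.
    """
    events = events if events is not None else []
    for attempt in range(1, max_attempts + 1):
        try:
            result = step()
            events.append({"status": "ok", "attempt": attempt})
            return result
        except Exception as exc:  # broad on purpose: any failure triggers healing
            events.append({"status": "error", "attempt": attempt, "error": repr(exc)})
            if attempt < max_attempts:
                sleep(base_delay * 2 ** (attempt - 1))
    events.append({"status": "fallback", "attempt": max_attempts})
    return fallback()


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_step() -> str:
        # Fails twice, then succeeds, to exercise the retry path.
        state["calls"] += 1
        if state["calls"] < 3:
            raise RuntimeError("transient failure")
        return "result"

    log: List[dict] = []
    print(run_with_healing(flaky_step, lambda: "fallback result",
                           base_delay=0.0, events=log))
    print([event["status"] for event in log])
```

The event log is the part that ties this back to observability: without a record of each attempt, a step that quietly heals itself also quietly hides its failure rate.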

The Future of AI Agents: Long-Running and Self-Healing

Looking to the future, long-running and self-healing AI agents represent a major advancement. These agents can operate autonomously, adapting to changes without human intervention. Integrating these features into our workflow is a key goal.

LangSmith will play a crucial role in these future developments, helping us anticipate and prepare for emerging challenges. We must be ready to face potential obstacles while continuing to innovate in this exciting field.

Long-Running Agents: Autonomous and adaptive operation.
Role of LangSmith: Key for future developments.
Preparing for Challenges: Anticipate and adapt.

Managing AI at scale is no small feat, but with the right tools it's doable: with LangSmith, Clay handles 300 million agent runs a month. First, I integrated LangSmith to orchestrate our agents efficiently. This allowed me to optimize quality and throughput while keeping costs in check. Then, I leveraged the Metaprompter tool to maintain model agility, letting us stay model-agnostic. But watch out: you need to tune the number of steps, 10 to 30 per agent run, to avoid cost overruns. Looking ahead, I see real potential in managing AI operations by starting small, iterating, and scaling confidently. For a deeper dive, I recommend watching the original video, "How Clay manages 300M agent runs a month with LangSmith". It offers actionable insights, not just theory.

Frequently Asked Questions

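How does Clay manage 300 million agent runs per month?

Clay uses LangSmith to orchestrate and optimize massive AI agent operations.

What is LangSmith's role in agent management?

LangSmith provides production insights and facilitates agent agility.

What is model agnosticism in the context of LangSmith?

It allows Clay to use various models without being tied to a single provider.

What are the challenges of scaling AI agents?

Challenges include managing costs, maintaining quality, and optimizing throughput.

How does Clay foresee the future of AI agents?

Clay is exploring long-running and self-healing agents for the future.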

Thibault Le Balier

Co-founder & CTO

Coming from the tech startup ecosystem, Thibault has developed expertise in AI solution architecture that he now puts at the service of large companies (Atos, BNP Paribas, beta.gouv). He works on two axes: mastering AI deployments (local LLMs, MCP security) and optimizing inference costs (offloading, compression, token management).
