April 10, 2026

Building AI Agents at Hex: Workflow Challenges

I've spent countless hours at Hex tuning our AI agents to think like human data analysts, and it's quite the challenge. First you have to integrate the various systems, which is no small feat. Then you test the agents' performance, and sometimes you get burned. Unifying the agents matters as much as evaluating them thoroughly, because that's how you catch pitfalls like context overflow or a poorly designed interface. And it's not just about code and algorithms: user experience and contextual memory play a pivotal role. Our goal? 100% accuracy by day 90 of a long-horizon simulation. Right now we're stuck at 24% with Sonnet 4.6. It's a tricky path, but every step brings us closer. Join me, and we'll explore what truly works and what doesn't.

Evolution of AI Agents at Hex

When I started at Hex, we were already deep in the complex world of semantic modeling. At first, it felt like trying to teach an AI what human intuition is. We began with basic semantic models to improve the agents' reasoning. The key was a constant feedback loop: test, fail, adjust. The cycle felt endless, but it was essential for progress.

But let's not kid ourselves: mimicking human analytical thinking is a colossal challenge. It's not just about data; it's about contextual understanding. Domain expertise is the pillar that has shaped our agents' capabilities. Without it, agents are like fish out of water, unable to grasp the subtle nuances that make all the difference.

Making AI Reason Like Humans: Challenges and Solutions

Semantic modeling is the foundation of our agents' reasoning. Think of it as the skeleton onto which we graft the muscles of human intuition. The problem is that AI has no innate intuition, so we have to bridge the gap between the rigid logic of machines and the fluidity of human judgment.
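To make the idea concrete, here's a minimal sketch of what a semantic model can look like in code. It is my own illustration, not Hex's actual implementation: the `Metric` and `SemanticModel` classes, the `orders` table, and its column names are all hypothetical. The point is that business terms like "revenue" and "region" get one agreed definition, so an agent compiles a question into SQL instead of guessing at column names.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Metric:
    """A named business measure and the SQL expression that computes it."""
    name: str
    expression: str   # e.g. "SUM(amount)"
    description: str


@dataclass
class SemanticModel:
    """Maps business vocabulary onto one physical table (hypothetical schema)."""
    table: str
    dimensions: dict = field(default_factory=dict)  # business name -> column
    metrics: dict = field(default_factory=dict)     # business name -> Metric

    def compile(self, metric: str, by: str) -> str:
        """Turn 'metric by dimension' into SQL, rejecting unknown terms."""
        if metric not in self.metrics:
            raise KeyError(f"unknown metric: {metric}")
        if by not in self.dimensions:
            raise KeyError(f"unknown dimension: {by}")
        m = self.metrics[metric]
        column = self.dimensions[by]
        return (
            f"SELECT {column} AS {by}, {m.expression} AS {m.name} "
            f"FROM {self.table} GROUP BY {column}"
        )


# Illustrative model: table and column names are assumptions.
orders = SemanticModel(
    table="orders",
    dimensions={"region": "ship_region"},
    metrics={"revenue": Metric("revenue", "SUM(amount)", "Gross order value")},
)
print(orders.compile("revenue", by="region"))
```

Rejecting unknown metrics and dimensions is the useful part of the design: the agent fails loudly on a term nobody has defined rather than inventing a plausible-looking query.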

This is where context harvesting comes in. By feeding relevant contextual information into each decision, we make our agents more effective. That said, watch out for trade-offs: more context tends to improve accuracy but costs processing time, and the two always have to be balanced. Wanting it all at once is like chasing the wind.
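Here's a rough sketch of that balancing act, under my own assumptions rather than anything Hex has published. The hypothetical `harvest_context` function ranks candidate snippets by naive word overlap with the question and stops adding them once a word budget is spent. A real system would likely use a better relevance score and count tokens instead of words.

```python
def harvest_context(question: str, snippets: list, budget_words: int) -> list:
    """Pick the snippets sharing the most words with the question, within a word budget."""
    question_words = set(question.lower().split())

    def overlap(snippet: str) -> int:
        # Naive relevance score: number of distinct words shared with the question.
        return len(question_words & set(snippet.lower().split()))

    chosen, used = [], 0
    for snippet in sorted(snippets, key=overlap, reverse=True):
        cost = len(snippet.split())
        # Skip irrelevant snippets and anything that would exceed the budget.
        if overlap(snippet) == 0 or used + cost > budget_words:
            continue
        chosen.append(snippet)
        used += cost
    return chosen


# Illustrative notes an agent might have collected (made up for this example).
notes = [
    "revenue is SUM(amount) on the orders table",
    "churn is measured monthly per account",
    "orders table excludes refunded orders",
]
question = "what was revenue from orders last month"
print(harvest_context(question, notes, budget_words=8))
print(harvest_context(question, notes, budget_words=12))
```

A tight budget keeps only the most relevant snippet, while a looser one lets a second snippet in. That is the trade-off in miniature: a bigger budget gives the agent more to reason with, but every extra snippet is more text to process.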

Integration and Performance Evaluation

At Hex, integrating AI agents has become an exercise in orchestration. Initially we had separate agents, each in its own corner, but we quickly realized that unifying them made the whole system more coherent and efficient. We tested this with long-horizon simulations running over 90 days to see how our agents performed over time.

We set an ambitious goal: a 100% success rate on questions by the end of the period. In practice, it was more complex. For instance, Sonnet 4.6 answered only 24% of questions correctly. A disappointing result, but one that reveals current limitations and areas for improvement.
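For readers who want to picture the mechanics, here's a minimal sketch of a long-horizon evaluation loop. It is an assumption-laden toy, not the harness Hex uses: the `Agent` signature, the `run_simulation` function, and the sample questions are all hypothetical. Questions are replayed on the simulated day they are asked, and cumulative accuracy is recorded each time.

```python
from typing import Callable

# Hypothetical agent signature: (simulated day, prompt) -> answer.
Agent = Callable[[int, str], str]


def run_simulation(agent: Agent, questions: list, horizon_days: int = 90) -> dict:
    """Replay (day, prompt, expected) questions in order; return cumulative accuracy by day."""
    correct = asked = 0
    accuracy_by_day = {}
    for day in range(1, horizon_days + 1):
        todays = [q for q in questions if q[0] == day]
        for _, prompt, expected in todays:
            asked += 1
            if agent(day, prompt).strip() == expected:
                correct += 1
        if todays:
            accuracy_by_day[day] = correct / asked
    return accuracy_by_day


def toy_agent(day: int, prompt: str) -> str:
    """Stand-in agent that always gives the same answer."""
    return "42"


# Made-up questions: (day asked, prompt, expected answer).
questions = [
    (1, "How many orders shipped in week one?", "42"),
    (45, "How many accounts churned this month?", "42"),
    (90, "How many regions missed target this quarter?", "17"),
]
results = run_simulation(toy_agent, questions)
print(results)
```

Exact string matching is the crudest possible grader, and a real evaluation would need something more forgiving. The shape is what matters here: one agent, one timeline, and a score you can read off at day 90.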

User Experience and Interface Design

Interface design is a bit like choosing the right tools for a craftsman. We designed intuitive interfaces for technical users so that complexity wouldn't become a barrier. Ephemeral SQL queries became a key tool for interacting with our agents dynamically.
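I'm reading "ephemeral" here as a query that runs once for a single agent step and is never saved as a project artifact. That's my interpretation, not a definition from Hex. Under that assumption, a minimal sketch with Python's built-in sqlite3 module looks like this. The `run_ephemeral_query` helper and the `orders` table are hypothetical.

```python
import sqlite3


def run_ephemeral_query(conn: sqlite3.Connection, sql: str, max_rows: int = 50) -> list:
    """Run one SELECT and return at most max_rows rows; the query itself is not stored."""
    # Crude guard: only allow statements that start with SELECT.
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError("only SELECT statements are allowed")
    cursor = conn.execute(sql)
    try:
        return cursor.fetchmany(max_rows)
    finally:
        cursor.close()


# In-memory database with made-up data, so nothing touches disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("EU", 10.0), ("EU", 5.0), ("US", 7.5)],
)
rows = run_ephemeral_query(
    conn,
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region",
)
print(rows)
```

The row cap is there for the agent's sake as much as the database's: a small, bounded result is easier to fit into a prompt than a full table dump.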

The impact is direct: good UX means easier adoption and increased efficiency. But beware: too much complexity can kill usability. It's a subtle balance to maintain.

Figure: UX/UI design at Hex: balancing complexity and usability.

Future of AI Agents: Long-Horizon Evaluations

The future of AI agents is promising but fraught with challenges. To prepare for them, we are betting on long-horizon evaluations, because memory and context play a crucial role in sustained performance.
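One way to picture the memory problem: over a long run, an agent accumulates more notes than a model's context window can hold, so something has to decide what stays. Here's a minimal sketch of the simplest policy, dropping the oldest notes first. The `ContextMemory` class is hypothetical, it counts words rather than tokens, and it makes no claim about how Hex handles memory.

```python
from collections import deque


class ContextMemory:
    """Keeps the most recent notes that fit within a word budget."""

    def __init__(self, budget_words: int) -> None:
        self.budget_words = budget_words
        self._notes = deque()
        self._used = 0

    def remember(self, note: str) -> None:
        """Add a note, then evict the oldest notes until the budget is respected."""
        self._notes.append(note)
        self._used += len(note.split())
        # Always keep at least the newest note, even if it alone exceeds the budget.
        while self._used > self.budget_words and len(self._notes) > 1:
            evicted = self._notes.popleft()
            self._used -= len(evicted.split())

    def as_prompt(self) -> str:
        """Render the surviving notes, oldest first, for inclusion in a prompt."""
        return "\n".join(self._notes)


memory = ContextMemory(budget_words=8)
memory.remember("day 1 revenue up")
memory.remember("day 2 churn flat")
memory.remember("day 3 ok")
print(memory.as_prompt())
```

Oldest-first eviction is easy to reason about but forgets early facts that may still matter on day 90. That is exactly the kind of weakness a long-horizon evaluation is meant to expose.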

We project exciting developments and potential breakthroughs, but we must also anticipate future trade-offs. As always, the road is paved with uncertainties, but also with fascinating opportunities.

Figure: Preparing future AI agents at Hex.

Building AI agents that think like humans is no small feat, but at Hex, we've found it incredibly rewarding. Here's what I've learned from being in the thick of it:

First, always start with a strong foundation in semantic modeling. Without this, your agent won't understand squat.
Next, user experience is key. If your agent's a mess, no one will use it.
Integrating and unifying different agents is a headache, but when it works, it's really worth it.
And when it comes to testing, the target is for an agent to nail 100% of the questions by day 90. We've seen Sonnet 4.6 hit only 24% by then, so there's work to be done.

Looking ahead, I'm confident that by refining our techniques, we're on the way to making these agents true game changers. But be cautious: it takes continuous iteration and a close watch on the end user.

I highly recommend checking out the original video "How Hex Builds AI Agents" on YouTube. Izzy Miller offers valuable insights, and it's a must-watch for anyone diving into AI agent development.

Frequently Asked Questions

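What are the challenges of making AI agents reason like human analysts?
Challenges include semantic modeling, context harvesting, and intuitive logic.

How does Hex integrate different AI agents?
Hex unifies AI agents through long-horizon simulations and feedback loops.

What is the role of user experience in AI agent development?
UX is crucial for agent adoption and efficiency, requiring a balance between complexity and user-friendliness.

What is long-horizon evaluation for AI agents?
It's a method to test agent performance over extended periods, anticipating future challenges.

How does Hex improve AI agent performance?
By integrating continuous feedback loops and rigorous performance evaluation.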

Thibault Le Balier

Co-founder & CTO

Coming from the tech startup ecosystem, Thibault has developed expertise in AI solution architecture that he now applies for large organizations (Atos, BNP Paribas, beta.gouv). He focuses on two areas: mastering AI deployments (local LLMs, MCP security) and optimizing inference costs (offloading, compression, token management).
