The AI Empiricist: Why Experimentation Trumps Authority

June 11, 2025

A friend is helping run an AI course. Lately, she's been inundated with what essentially boils down to the same question:

Which LLM should I use?

Different people, different projects, but the same hunt for certainty.

Her answer never changes:

You need to run an experiment because you don't know and I don't know.

It’s not the answer they want - but it’s the only one that works.

It’s a pattern I’ve seen often, especially in those newer to AI development. People want definitive answers, but AI doesn't work like that.

This post explores how to shift from seeking certainty to building an experimental mindset - and how that shift leads to better decisions, faster progress, and ultimately better outcomes.

The Empirical Reality

If you don't know which model to use, the good news is: nobody else does either. It's not a lack of expertise. It's the nature of AI.

Context shapes everything. Your data, your constraints, your requirements - they create a performance landscape unique to you.

Generic model benchmarks exist, but they offer signals, not guarantees. Some patterns are clear: certain models handle images better than text; some excel only with structured data. Beyond such basic heuristics, though, the hierarchy breaks down.

That's why experienced practitioners default to experimentation. They don't assume the optimal solution is knowable in advance - they test for it. Every project becomes a new hypothesis to validate.

School rewards knowing the "right" answer. AI requires asking better questions.

Why Empiricism Wins

In AI, not knowing isn't a weakness. It's your edge - if you know how to use it.

The most effective AI teams don't hide uncertainty - they operationalise it, building not-knowing into their workflow and working through it with disciplined experimentation.

The most dangerous phrase in AI development? "Looks good to me."

Moving from opinion-based decisions to evidence-based ones requires deliberate practice. Here's how to build those habits systematically.

Practical Strategies for an Experimental Mindset

Building an experimental mindset is like building any habit - start small, and make it deliberate.

  • Define metrics that matter for your goals. Technical indicators should correlate with business value:
    • For example, say you're building a customer service chatbot and the business goal is to reduce support ticket volume. A response semantic-similarity score is not the right technical indicator. Something like the percentage of chats that escalate to a human agent correlates much more closely with the overarching business metric (the sketch after this list uses exactly this metric).
  • Define what "good enough" means for your use case before testing anything.
  • Build a basic, no-frills solution first. This gives you a testable anchor for comparing more complex approaches.
  • Plan your evaluation methodology deliberately. What data will you use? How will you acquire it? This deserves dedicated planning time.
  • Use real data where possible, or synthetic samples generated from real user queries.
  • Design experiments to isolate what you're actually testing. Random experimentation teaches nothing.
  • Track why things didn't work with the same attention you give to what did. Failed experiments are compressed learning.
  • Debate with metrics, not opinions. Show the data or acknowledge it's just preference.
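
To make these habits concrete, here is a minimal sketch in Python of the kind of experiment the list describes, assuming the chatbot scenario above. Everything in it is illustrative: escalation_rate, baseline_model, candidate_model, sample_chats, and the GOOD_ENOUGH threshold are hypothetical stand-ins, not real APIs. In practice the model functions would call your actual models and the chats would be real transcripts.

    import random

    random.seed(42)  # fixed seed so the comparison is reproducible

    GOOD_ENOUGH = 0.30  # agreed before testing: at most 30% of chats escalate

    def escalation_rate(chats, model):
        """Fraction of chats the model fails to resolve (escalates to a human)."""
        escalated = sum(1 for chat in chats if model(chat) == "escalate")
        return escalated / len(chats)

    # Stand-ins for a baseline and a candidate - replace with real model calls.
    def baseline_model(chat):
        return random.choice(["resolved", "escalate"])

    def candidate_model(chat):
        return random.choices(["resolved", "escalate"], weights=[3, 1])[0]

    # Ideally sampled from real user queries; synthetic here for illustration.
    sample_chats = [f"chat transcript {i}" for i in range(200)]

    baseline = escalation_rate(sample_chats, baseline_model)
    candidate = escalation_rate(sample_chats, candidate_model)

    print(f"baseline escalation rate:  {baseline:.2%}")
    print(f"candidate escalation rate: {candidate:.2%}")
    print("candidate beats baseline:", candidate < baseline)
    print("good enough:", candidate <= GOOD_ENOUGH)

The shape matters more than the details: one metric agreed in advance, a fixed sample so the model is the only thing changing, a baseline to beat, and a pass/fail threshold defined before anything runs.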

Conclusion

Certainty is not a viable strategy in AI. The best practitioners don't chase answers - they build better experiments.

The job of AI leaders is to foster this mindset and create the psychological safety that lets teams fail forward. This means creating space for "I don't know" during project planning. It means celebrating experiments that fail fast over assumptions that fail slowly.

My friend's answer to those questions remains the same: "You need to run an experiment because you don't know and I don't know."

It's still not the answer people want. But teams that embrace it move faster and make better decisions because of it.


© 2025 Peter Wooldridge