"How do you actually test a voice agent?"
This was the first question I asked when I joined ARTPARK (at IISc) five months ago, alongside my mentor and friend Jigar, to build the infrastructure for tackling public health problems with voice agents.
If you have ever deployed a voice agent, you know the pain: it works perfectly in demos, then falls apart in production. Across teams, the pattern is the same.
We built Calibrate to fill this gap by applying a proven paradigm from software engineering to voice agents: unit tests + end-to-end tests.
Calibrate lets you simulate conversations using realistic user personas ("who" the user is) and scenarios ("what" the user is doing). This lets you stress-test failure modes: users who interrupt the agent, hesitate, give partial answers, and more.
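To make the persona/scenario split concrete, here is a toy sketch in Python. This is purely illustrative and is not Calibrate's actual API; the class names, fields, and the "hesitant" style are all invented for the example:

```python
from dataclasses import dataclass

@dataclass
class Persona:
    """'Who' the simulated user is."""
    name: str
    style: str  # e.g. "hesitant" or "direct" (hypothetical styles)

@dataclass
class Scenario:
    """'What' the simulated user is trying to do."""
    goal: str
    full_answer: str  # what a fully cooperative user would say

def build_user_turn(persona: Persona, scenario: Scenario) -> str:
    """Synthesize one user utterance; hesitant users trail off mid-answer."""
    if persona.style == "hesitant":
        words = scenario.full_answer.split()
        return " ".join(words[: max(1, len(words) // 2)]) + " ... um"
    return scenario.full_answer
```

A conversation simulator feeds turns like these to the agent and then scores the resulting transcript, which is how partial answers and hesitation get exercised systematically rather than by hand.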
You don't have to guess which provider is right for you anymore. With Calibrate, you can benchmark different providers (like Google, Sarvam, ElevenLabs and more) for each component of your voice stack (STT, TTS, LLM) on your dataset.
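For the STT component, benchmarking a provider on your own dataset usually comes down to comparing its transcripts against reference transcripts via word error rate (WER). A minimal sketch of the standard edit-distance WER (this shows the metric, not Calibrate's internals):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between first i reference words, first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting i words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting j words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution / match
    return d[len(ref)][len(hyp)] / len(ref)
```

Run this over each provider's output on the same audio set and you have an apples-to-apples comparison instead of a guess.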
This creates a virtuous loop: over time, your test suite grows to capture the key failure modes. You not only ship with confidence, you ensure a bug you have fixed once never slips through again.
Whether you are a PM, a founder, or an ML engineer, Calibrate is built for you.
Start Calibrating now: calibrate.artpark.ai
Calibrate comes with a CLI too:
pip install calibrate-agent
The best part: it is open-source. Forever.
https://github.com/artpark-sahai-org/calibrate
We are early and have a long roadmap ahead. If you are building voice agents and care about the future of responsible Voice AI, let's build it together!
Discord: https://lnkd.in/gpKaY_np
WhatsApp: https://lnkd.in/gXF3w4bR