
I Built an AI System That Makes 1,000 Decisions a Day. Here's Where I Drew the Line.
Venkata Manideep Patibandla
Posted on May 25

#ai #llm #humaninloop #opensource

CostGuard's proxy endpoint makes an autonomous decision on every LLM call that passes through it. It scores the response, compares it against a threshold, and either accepts or rejects in about 1 millisecond, with no human involved.

At first that felt like the right design. Fast, automated, scalable. Exactly what an LLM reliability layer should do.

Then I looked at what it was actually catching and, more importantly, what it was missing, and I had to rethink where automation ends and human judgment needs to begin.

This is what I learned building a system that sits in the hot path of production LLM pipelines, and why I now think human-in-the-loop design is an engineering decision, not just an ethical one.

What CostGuard Actually Does

CostGuard is an HTTP proxy that wraps your LLM calls. You route your agent's requests through it instead of sending them directly to the provider. On every call it:

1. Checks the provider's circuit breaker state
2. Makes the LLM call with a 30-second timeout
3. Scores the response with a heuristic validity scorer (~1ms)
4. Rejects the response and falls back to the next model if the score is below your threshold
5. Logs cost, latency, validity score, and whether fallback was used

Every one of those decisions is automated. No human is involved. At production scale that's the right call: you cannot have a human reviewing every LLM response in a real-time pipeline.

But the automation is only as good as what the scorer can actually detect.

The Flaw I Documented in My Own README

The heuristic scorer in CostGuard's /proxy endpoint works by rewarding statistical markers (confidence intervals, p-values, uncertainty language) and penalizing failure signals like empty outputs, error tracebacks, and refusal phrases.

It catches obvious failures reliably. A model that returns an empty string, an error message, or '
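
To make the score-then-fallback steps described above concrete, here is a minimal sketch of how a heuristic scorer of this kind might look. It is not CostGuard's actual code: the marker patterns, the weights, the 0.5 default threshold, and the names `validity_score`, `call_with_fallback`, and `ProxyResult` are all assumptions made for illustration, and the circuit breaker, timeout, and logging steps are left out.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical patterns: the article names the categories, not the exact rules.
POSITIVE_MARKERS = [
    r"confidence interval",
    r"\bp\s*[<=>]\s*0?\.\d+",                      # e.g. "p < 0.01"
    r"\b(approximately|roughly|uncertain|may|might|likely)\b",
]
FAILURE_MARKERS = [
    r"traceback \(most recent call last\)",         # Python error traceback
    r"\bi (cannot|can't|am unable to)\b",           # refusal phrasing
    r"\bas an ai\b",
]


def validity_score(text: str) -> float:
    """Return a score in [0, 1]. Weights are illustrative assumptions."""
    if not text.strip():
        return 0.0                                  # empty output always fails
    lowered = text.lower()
    score = 0.5                                     # neutral starting point
    score += 0.15 * sum(bool(re.search(p, lowered)) for p in POSITIVE_MARKERS)
    score -= 0.4 * sum(bool(re.search(p, lowered)) for p in FAILURE_MARKERS)
    return max(0.0, min(1.0, score))


@dataclass
class ProxyResult:
    model: str
    text: str
    score: float
    used_fallback: bool


def call_with_fallback(
    prompt: str,
    models: List[str],
    call_model: Callable[[str, str], str],          # (model, prompt) -> response text
    threshold: float = 0.5,                         # assumed default
) -> Optional[ProxyResult]:
    """Try each model in order; accept the first response at or above threshold."""
    for index, model in enumerate(models):
        text = call_model(model, prompt)
        score = validity_score(text)
        if score >= threshold:
            return ProxyResult(model, text, score, used_fallback=index > 0)
    return None                                     # every model was rejected
```

A scorer shaped like this only inspects surface features of the text, so its accept or reject decision says nothing about whether the content is factually correct.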
📰Originally published at dev.to
Staff Writer