coSTAR: How We Ship AI Agents at Databricks Fast, Without Breaking Things

Original URL: https://www.databricks.com/blog/costar-how-we-ship-ai-agents-databricks-fast-without-breaking-things

Article Written: March 20, 2026

Added:

Type: project

Summary

The article describes coSTAR, a methodology developed at Databricks for building and deploying AI agents, centered on automated testing and refinement. It covers the move from a slow, manual review process to a fast, automated testing framework that significantly reduces the time needed to verify a change. Built on MLflow, the approach combines scenario definitions, trace capture, and judge assessments to raise both development velocity and confidence in agent quality. It is designed for a problem specific to AI systems: testing outputs that are non-deterministic.
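The loop the summary describes (define scenarios, capture a trace per run, have a judge assess each trace, refine from the failures) can be sketched in a few lines. This is only an illustrative sketch, not the article's implementation: it does not use MLflow, the names `Scenario`, `Trace`, `run_agent`, `judge`, `evaluate`, and `toy_agent` are all hypothetical, and a simple keyword check stands in for the LLM judge so the example stays deterministic.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Scenario:
    """A test case: a prompt plus the facts a good answer must mention."""
    name: str
    prompt: str
    must_mention: list[str]


@dataclass
class Trace:
    """Record of one agent run: input, steps taken, and final output."""
    scenario: str
    prompt: str
    steps: list[str] = field(default_factory=list)
    output: str = ""


def run_agent(scenario: Scenario, agent: Callable[[str], str]) -> Trace:
    """Run the agent on one scenario and capture what happened."""
    trace = Trace(scenario=scenario.name, prompt=scenario.prompt)
    trace.steps.append("call_agent")
    trace.output = agent(scenario.prompt)
    return trace


def judge(trace: Trace, scenario: Scenario) -> dict:
    """Hypothetical stand-in for an LLM judge: pass only if every
    required fact appears in the output (case-insensitive)."""
    text = trace.output.lower()
    missing = [m for m in scenario.must_mention if m.lower() not in text]
    return {"scenario": trace.scenario, "passed": not missing, "missing": missing}


def evaluate(scenarios: list[Scenario], agent: Callable[[str], str]):
    """Scenario -> Trace -> Assess. The failures feed the Refine step."""
    results = [judge(run_agent(s, agent), s) for s in scenarios]
    failures = [r for r in results if not r["passed"]]
    return results, failures


def toy_agent(prompt: str) -> str:
    """Deterministic placeholder for a real agent."""
    if "cluster" in prompt:
        return "Restart the cluster, then check the driver logs."
    return "I am not sure."


scenarios = [
    Scenario("cluster_restart", "My cluster is stuck, what now?", ["restart", "logs"]),
    Scenario("job_failure", "My job failed overnight, what now?", ["retry"]),
]

results, failures = evaluate(scenarios, toy_agent)
for r in results:
    print(r)
```

In a real setup the judge would be a prompted LLM and the traces would come from a tracing tool rather than a hand-rolled dataclass. The point of the sketch is the shape of the loop: the list of failures is what drives the next change to the agent, and re-running `evaluate` verifies that change.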

💭 Your Thoughts

best-practices

coSTAR stands for coupled Scenario, Trace, Assess, Refine.
Uses an LLM as the judge, driven by a prompt, and judges production traffic data as well.
The agent depends on external tools and infrastructure, and those change too.

Data Problems Addressed

Cost-Effective Data Orchestration Strategies
Comprehensive Grading Systems for AI Outputs
Dynamic Evaluation Frameworks for AI Agents
Automated Testing Frameworks for AI Agents
Dynamic Judge Alignment for AI Systems

Technologies Referenced

MLflow