AI RESEARCH
Notes from evaluating a customer support chat agent system: heuristic evaluators give false signal, retrieval bugs masquerade as LLM failures, and the cost/quality Pareto frontier is rarely where you think [D]
r/MachineLearning
Posting some practical findings from a structured audit of a production customer-support RAG system. Methodology and caveats up front.