AI RESEARCH

Predict, Don't React: Value-Based Safety Forecasting for LLM Streaming

arXiv cs.LG

arXiv:2604.03962v1 Announce Type: cross

In many practical LLM deployments, a single guardrail is used for both prompt and response moderation. Prompt moderation operates on fully observed text, whereas streaming response moderation requires safety decisions to be made over partial generations. Existing text-based streaming guardrails commonly frame this output-side problem as boundary detection