VoxSafeBench: Not Just What Is Said, but Who, How, and Where

arXiv:2604.14548v1 Announce Type: cross As speech language models (SLMs) transition from personal devices into shared, multi-user environments, their responses must account for far more than the words alone. Who is speaking, how they sound, and where the conversation takes place can each turn an otherwise benign request into one that is unsafe, unfair, or privacy-violating.