Face-to-Face: A Video Dataset for Multi-Person Interaction Modeling

arXiv:2603.14794v1 Announce Type: cross Modeling the reactive tempo of human conversation remains difficult because most audio-visual datasets portray isolated speakers delivering short monologues. We