Running Whisper + speaker diarization + summarization entirely on iPhone Neural Engine — lessons from building Chatham

Built a meeting transcription app (Chatham) that runs the entire ML pipeline on-device on iPhone, and wanted to share some technical observations for anyone working on local inference.

The pipeline:

1. Audio capture and VAD segmentation
2. Whisper-based transcription via CoreML on the Neural Engine
3. Speaker diarization using embedding models
4. Summarization and action item extraction

Key learnings:

- CoreML-compiled Whisper models run surprisingly well on the Neural Engine. For English, accuracy is close to that of cloud-hosted Whisper-large.
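The post doesn't say which VAD Chatham uses for step 1, so here is a minimal energy-threshold stand-in that shows what "VAD segmentation" produces. It splits 16 kHz mono audio (Whisper's input rate) into 30 ms frames, marks frames whose RMS energy clears a threshold as speech, and closes a segment only after a run of quiet frames. The names `vad_segments`, `threshold` and `min_silence_frames`, and their default values, are illustrative assumptions. A production app would more likely use a trained VAD model.

```python
import math
from typing import List, Tuple


def frame_rms(samples: List[float], frame_len: int) -> List[float]:
    """RMS energy of consecutive, non-overlapping frames (a trailing partial frame is dropped)."""
    rms = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return rms


def vad_segments(samples: List[float], sample_rate: int = 16000, frame_ms: int = 30,
                 threshold: float = 0.02, min_silence_frames: int = 10) -> List[Tuple[float, float]]:
    """Return (start_sec, end_sec) speech spans.

    Hypothetical energy-based VAD: a segment ends only after
    min_silence_frames consecutive quiet frames, so brief pauses
    inside a sentence do not split it.
    """
    frame_len = sample_rate * frame_ms // 1000
    energies = frame_rms(samples, frame_len)
    segments: List[Tuple[int, int]] = []
    start = None
    silence = 0
    for i, e in enumerate(energies):
        if e >= threshold:
            if start is None:
                start = i          # first loud frame opens a segment
            silence = 0
        elif start is not None:
            silence += 1
            if silence >= min_silence_frames:
                end = i - silence + 1  # index of the first quiet frame in the run
                segments.append((start, end))
                start = None
                silence = 0
    if start is not None:          # audio ended mid-segment
        segments.append((start, len(energies) - silence))
    frame_sec = frame_len / sample_rate
    return [(s * frame_sec, e * frame_sec) for s, e in segments]
```

For example, 0.3 s of silence, 0.6 s of signal, 0.6 s of silence and 0.3 s of signal yields two segments, roughly (0.3, 0.9) and (1.5, 1.8).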
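Whisper's encoder consumes fixed 30-second windows of audio, so between VAD (step 1) and transcription (step 2) the speech segments have to be packed into windows of at most that length. The sketch below is one hypothetical way to do that. It assumes segments are (start_sec, end_sec) tuples in time order, greedily merges neighbours while the merged span (including the gap between them) stays within 30 s, and first splits any segment longer than 30 s. The post doesn't describe Chatham's chunking, and `pack_segments` is an illustrative name.

```python
from typing import List, Tuple

WHISPER_WINDOW_SEC = 30.0  # Whisper processes audio in fixed 30-second windows


def pack_segments(segments: List[Tuple[float, float]],
                  max_len: float = WHISPER_WINDOW_SEC) -> List[Tuple[float, float]]:
    """Greedily merge consecutive speech segments into windows no longer than max_len.

    Hypothetical helper: segments longer than max_len are cut into
    max_len pieces before merging.
    """
    pieces: List[Tuple[float, float]] = []
    for start, end in segments:
        while end - start > max_len:
            pieces.append((start, start + max_len))
            start += max_len
        pieces.append((start, end))

    windows: List[Tuple[float, float]] = []
    for start, end in pieces:
        if windows and end - windows[-1][0] <= max_len:
            windows[-1] = (windows[-1][0], end)  # extend the current window, gap included
        else:
            windows.append((start, end))         # start a new window
    return windows
```

Merging short segments keeps each Whisper call close to a full window, which reduces per-call overhead; the trade-off is that silence gaps inside a window are transcribed too.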
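For step 3, embedding-based diarization usually reduces to clustering: an embedding model maps each speech segment to a fixed-length vector, and segments whose vectors are close are attributed to the same speaker. The post doesn't name the embedding model or the clustering method, so the sketch below assumes the vectors are already computed and uses a simple online scheme. Each segment joins the speaker centroid with the highest cosine similarity if that similarity is at least a threshold, and otherwise starts a new speaker. `assign_speakers` and the 0.75 threshold are hypothetical choices; offline methods such as agglomerative clustering are common alternatives.

```python
import math
from typing import List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def assign_speakers(embeddings: List[List[float]], threshold: float = 0.75) -> List[int]:
    """Label each segment embedding with a speaker index (hypothetical online clustering)."""
    centroids: List[List[float]] = []
    counts: List[int] = []
    labels: List[int] = []
    for emb in embeddings:
        # Find the most similar existing speaker centroid.
        best, best_sim = -1, -1.0
        for k, c in enumerate(centroids):
            sim = cosine(emb, c)
            if sim > best_sim:
                best, best_sim = k, sim
        if best >= 0 and best_sim >= threshold:
            # Join that speaker and update its centroid as a running mean.
            n = counts[best]
            centroids[best] = [(c * n + e) / (n + 1) for c, e in zip(centroids[best], emb)]
            counts[best] = n + 1
            labels.append(best)
        else:
            # No centroid is similar enough: start a new speaker.
            centroids.append(list(emb))
            counts.append(1)
            labels.append(len(centroids) - 1)
    return labels
```

For example, `assign_speakers([[1, 0, 0], [0.9, 0.1, 0], [0, 1, 0], [0.95, 0.05, 0], [0.1, 0.9, 0]])` returns `[0, 0, 1, 0, 1]`. Because the assignment is greedy and order-dependent, an early misassignment can pull a centroid off; that is one reason offline re-clustering after the meeting ends can help.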