How we score speaking when "native-like" is the wrong target - the eval rubric behind Elispeak

I build Elispeak, an AI English speaking coach. The first article in this thread covered what was technically hard. The second covered the user-profile layer that makes Eli (the tutor persona) feel like it remembers you. This one is about the piece that sits underneath both: the eval rubric that decides what "you got better today" actually means. It is the smallest, driest part of the product. It is also the part that keeps every other part honest.