We launched the first AI image benchmark that measures feelings, not pixels.

Every AI image benchmark I've ever read measures the wrong thing. FID tells you how close your outputs' distribution is to a reference set of real images, as seen through an Inception network trained on ImageNet. CLIP score tells you how well your image matches its caption according to a model that was itself trained on captions. Inception Score tells you an ImageNet classifier is confident about your images and that they spread across its classes. Human Preference Score tells you what a model trained on people's quick picks predicts they would click on. None of them tells you whether the image made anybody feel anything.