AI RESEARCH

CAKE: Cloud Architecture Knowledge Evaluation of Large Language Models

arXiv CS.AI

arXiv:2604.05755v1 Announce Type: cross Large language models (LLMs) now serve as co-pilots for software architecture. However, no benchmark currently exists to evaluate how well LLMs actually understand cloud-native software architecture. To fill this gap, we present CAKE, a benchmark of 188 expert-validated questions covering four cognitive levels of Bloom's revised taxonomy (recall, analyze, design, and implement) and five cloud-native topics.