AI RESEARCH

[R] Shadow APIs breaking research reproducibility (arXiv 2603.01919)

r/MachineLearning

Just read this paper auditing shadow APIs (third-party services claiming to provide GPT-5/Gemini access). 187 academic papers used these services, and the most popular one has 5,966 citations.

The findings are bad: performance divergence up to 47%, safety behavior completely unpredictable, and 45% of fingerprint tests failed identity verification. So basically a bunch of research might be built on fake model outputs.

This explains some weird stuff I've seen. I tried reproducing results from a paper last month, used what they claimed was "gpt-4 via api", and the numbers were way off.
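For anyone wanting to put a number on "way off", here is a rough sketch of a relative-gap check between reported and reproduced scores. This is not the paper's divergence metric (I don't know how they define it). The function names, the 5% threshold, and the scores are all made up for illustration.

```python
def relative_divergence(reported: float, reproduced: float) -> float:
    """Percent gap between a paper's reported score and a reproduced score."""
    if reported == 0:
        raise ValueError("reported score must be nonzero")
    return abs(reproduced - reported) / abs(reported) * 100.0


def flag_divergent(reported: dict, reproduced: dict, threshold_pct: float = 5.0) -> dict:
    """Return benchmarks whose reproduced score is off by more than threshold_pct.

    threshold_pct is an arbitrary assumption; pick one that matches the
    run-to-run noise you expect for the benchmark.
    """
    flagged = {}
    for name, reported_score in reported.items():
        if name not in reproduced:
            continue  # nothing to compare against
        gap = relative_divergence(reported_score, reproduced[name])
        if gap > threshold_pct:
            flagged[name] = round(gap, 1)
    return flagged


# Hypothetical numbers, not taken from any paper.
paper_scores = {"benchmark_a": 80.0, "benchmark_b": 50.0}
my_scores = {"benchmark_a": 60.0, "benchmark_b": 50.5}
print(flag_divergent(paper_scores, my_scores))
```

A gap like this doesn't prove a shadow API was involved, since prompts, sampling settings, and model version drift can also move numbers. It only tells you which results are worth a closer look.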