[Benchmark] RTX 5090: Prompt Parsing, Token Generation and Power Level
r/LocalLLaMA
Inspired by I've decided to put my 5090 to the test and see what the curves look like for this device and whether there are any obvious sweet spots (apart from setting it to minimum 400W).

Graphs and outcomes:

Inputs:
Backend: llama.cpp in a Docker container, FA on, batch 2048, max context 122k.
Model:
Quant: Q6_K_P
Hardware: Threadripper 6970, 2-channel RAM 64GB, RTX 5090
Prompt: 30k prompt composed of 3 x 10k copies of the same benchmark for heavy reasoning, math and computations. It was generated by Qwen 3.6 specifically for benchmarking, and I can share it on request.
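For anyone who wants to look for a sweet spot in their own numbers, here is a minimal sketch of one way to do it: divide throughput by the power limit to get tokens per joule, then take the limit with the best ratio. This is not the method used for the graphs in this post. The `Sample` class and both function names are made up for illustration, and using the configured power limit as a stand-in for actual power draw is an assumption (real draw can sit below the limit).

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Sample:
    """One benchmark run at a fixed GPU power limit (hypothetical record type)."""
    power_limit_w: float  # power limit set for the run, in watts
    tokens_per_s: float   # measured throughput (prompt or generation) at that limit


def tokens_per_joule(sample: Sample) -> float:
    """Efficiency of a run: tokens/s divided by watts gives tokens per joule.

    Assumption: the power limit approximates the actual draw during the run.
    """
    if sample.power_limit_w <= 0:
        raise ValueError("power limit must be positive")
    return sample.tokens_per_s / sample.power_limit_w


def sweet_spot(samples: Sequence[Sample]) -> Sample:
    """Return the run with the highest tokens-per-joule ratio."""
    if not samples:
        raise ValueError("need at least one sample")
    return max(samples, key=tokens_per_joule)
```

Feed it one list of samples for prompt parsing and another for token generation, since the two curves do not have to peak at the same power level. Pure efficiency will usually favor the lowest limit, so in practice you may want to filter out runs below a throughput you can live with before calling `sweet_spot`.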