To run Qwen2.5 72B locally at Q3_K_M quantization, you need at least 56.6 GB of GPU VRAM.
Use a lower quantization to fit. Viable for testing at this scale.
Single-card 80GB. Industry-standard for large model inference.
State-of-the-art inference. 3× the bandwidth of A100.
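As a rough illustration of how the 56.6 GB requirement maps onto a given card, here is a minimal sketch that checks whether a card's VRAM covers it and how much is left over. The function names are hypothetical, and the 48 GB capacity is an assumed example of a smaller card; only the 56.6 GB and 80 GB figures come from the text above. It ignores anything beyond the stated requirement, such as longer context windows.

```python
# Requirement quoted above for Qwen2.5 72B at Q3_K_M.
REQUIRED_GB = 56.6


def headroom_gb(vram_gb: float, required_gb: float = REQUIRED_GB) -> float:
    """Return spare VRAM in GB; a negative value means the model does not fit."""
    return round(vram_gb - required_gb, 1)


def verdict(vram_gb: float, required_gb: float = REQUIRED_GB) -> str:
    """Summarize whether a card of the given capacity can hold the model."""
    spare = headroom_gb(vram_gb, required_gb)
    if spare < 0:
        return f"{vram_gb:g} GB: short by {-spare:g} GB, use a lower quantization"
    return f"{vram_gb:g} GB: fits with {spare:g} GB to spare"


if __name__ == "__main__":
    # 48 is an assumed smaller card; 80 matches the single-card option above.
    for capacity in (48, 80):
        print(verdict(capacity))
```

A negative headroom corresponds to the "use a lower quantization" case, while the 80 GB card leaves room to spare.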
Use the interactive calculator to compare Qwen2.5 72B across all available formats.