To run Mixtral 8x7B locally at Q8_0 quantization, you need at minimum 51.7 GB of GPU VRAM.
Use a lower quantization to fit. Viable for testing at this scale.
Single-card 80GB. Industry-standard for large model inference.
State-of-the-art inference. 3× the bandwidth of A100.
Use the interactive calculator to compare Mixtral 8x7B across all available formats.
Open Live Calculator →