
AMD Ryzen AI Max+ 395: Native GPT-OSS 120B Performance Explainer

How the AMD Ryzen AI Max+ 395 runs OpenAI's GPT-OSS 120B locally, why memory matters, and how it compares to RTX 3090.


Short answer: What this chip does

According to AMD, the Ryzen AI Max+ 395 is the only consumer processor that can run OpenAI's GPT-OSS 120B model natively, reaching roughly 30 tokens per second. That speed, combined with its large unified memory pool, lets developers run datacenter-class models locally instead of renting cloud servers.

How is the Ryzen AI Max+ 395 built?

Key specs in plain words

  • CPU: 16 "Zen 5" cores for general work and model prep.
  • NPU: XDNA 2 NPU with 50+ peak AI TOPS to help neural workloads.
  • GPU: Large integrated RDNA 3.5 GPU with 40 compute units (CUs).
  • Memory: 32GB to 128GB of system memory, of which up to 96GB can be assigned as VRAM on Windows (up to 110GB on Linux) via AMD Variable Graphics Memory.

This blend of CPU, NPU, and a big integrated GPU is what lets the chip run huge models on a single consumer box.
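
If you want to confirm how much of that memory the GPU runtime actually sees, here is a minimal sketch, assuming a ROCm-enabled PyTorch build that recognizes this integrated GPU (support varies by driver and ROCm version):

```python
# Minimal sketch: check how much memory the GPU runtime actually sees.
# Assumes a ROCm-enabled PyTorch build; on ROCm, PyTorch exposes the
# device through the torch.cuda namespace.
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(props.name)
    print(f"{props.total_memory / 2**30:.1f} GiB visible to the runtime")
else:
    print("No GPU visible; check drivers and Variable Graphics Memory settings.")
```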

Why memory matters for GPT-OSS 120B

Large language models store weights and activations in GPU memory. The GGML-converted MXFP4 weights for the GPT-OSS 120B need roughly 61GB of VRAM. Most consumer GPUs top out at 24GB or 48GB and can’t fit the whole model. The Ryzen AI Max+ 395 can map up to 96GB of system memory to GPU use, enabling the 120B model to load and run natively.
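
The ~61GB figure follows from the format itself. MXFP4 stores 4-bit values with one shared 8-bit scale per 32-value block, so each weight costs about 4.25 bits on average. A quick back-of-the-envelope check (the parameter count is approximate):

```python
# Rough weight-memory estimate for GPT-OSS 120B in MXFP4.
# MXFP4 packs 4-bit values plus one shared 8-bit scale per 32-value block:
# 4 + 8/32 = 4.25 bits per weight on average.
params = 117e9            # approximate total parameter count
bits_per_weight = 4.25    # MXFP4 effective bits per weight

weight_bytes = params * bits_per_weight / 8
print(f"~{weight_bytes / 1e9:.0f} GB for the weights alone")  # ~62 GB
```

That lands near the ~61GB cited above; the exact figure depends on which tensors stay in higher precision.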

Real-world benchmark summary

Benchmarks show:

  • GPT-OSS 120B on Ryzen AI Max+ 395: up to about 30 tokens/sec, running natively.
  • GPT-OSS 20B on RTX 3090: around 20 tokens/sec; the RTX 3090 lacks the memory to load the 120B model at all.

Sources and test notes are available from AMD and independent reviewers: Wccftech and Hardware-Corner.
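
Tokens per second is simply generated tokens divided by wall-clock decode time. A hypothetical harness like the one below works with any runtime; the `generate` callable is a stand-in, not a specific library API:

```python
# Hypothetical throughput harness: `generate` stands in for whatever
# runtime call you use; it is assumed to emit exactly n_tokens tokens.
import time

def tokens_per_second(generate, prompt, n_tokens=128):
    start = time.perf_counter()
    generate(prompt, max_tokens=n_tokens)
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed
```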

What workloads are a good fit?

Use the Ryzen AI Max+ 395 when you need:

  • Local inference of very large LLMs like GPT-OSS 120B.
  • Low-latency responses without cloud round trips.
  • Privacy-sensitive tasks where data must stay on site.
  • Benchmarks and research on large models without renting datacenter GPUs.

How to load GPT-OSS 120B (high level)

  1. Get the GGML-converted MXFP4 weights for GPT-OSS 120B.
  2. Install an LLM runtime that supports AMD GPUs and the conversion format (see AMD developer guidance).
  3. Enable AMD Variable Graphics Memory so the OS can assign up to 96GB of system RAM as GPU memory.
  4. Run the model and monitor memory use. If you hit limits, lower the context size or offload fewer layers to the GPU.

These steps are high level. For exact commands and config files, follow AMD's setup guides and your runtime's documentation.
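
As a concrete illustration, here is a minimal sketch using llama-cpp-python, one runtime that loads GGUF-format weights on AMD GPUs; the file path and settings are examples, not AMD's official procedure:

```python
# Minimal sketch with llama-cpp-python (a runtime that loads GGUF weights).
# The model path below is an example; use your actual downloaded filename.
from llama_cpp import Llama

llm = Llama(
    model_path="models/gpt-oss-120b-mxfp4.gguf",  # example path
    n_gpu_layers=-1,   # offload every layer to the integrated GPU
    n_ctx=4096,        # a modest context keeps KV-cache memory down
)

out = llm("Summarize unified memory in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```

Watch memory while the model loads; if allocation fails, drop n_ctx first, then reduce n_gpu_layers.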

Optimization tips that help performance

  • Use hipBLASLt and the latest AMD drivers for best GPU math performance.
  • Reduce the context window when you don’t need long history; it lowers RAM use and raises tokens/sec (see the sketch after this list).
  • Use batching wisely: small batches lower latency; larger batches raise throughput.
  • Run on Linux for slightly higher memory mapping (up to 110GB) if your workflow allows it.
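
To see why the context-window tip matters, here is a rough KV-cache estimator; the layer and head counts below are illustrative placeholders, not confirmed GPT-OSS 120B values:

```python
# Rough KV-cache size: K and V tensors per layer, each shaped
# [n_ctx, n_kv_heads, head_dim]. Architecture numbers are placeholders.
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, bytes_per_elem=2):
    return 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem

for n_ctx in (4096, 32768):
    gib = kv_cache_bytes(n_layers=36, n_kv_heads=8, head_dim=64, n_ctx=n_ctx) / 2**30
    print(f"context {n_ctx}: ~{gib:.2f} GiB of KV cache")
```

Cutting the context from 32K to 4K in this toy example shrinks the cache roughly eightfold, which is memory you can spend on weights instead.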

RTX 3090 and other NVIDIA cards — a neutral comparison

The RTX 3090 is powerful and can run models like GPT-OSS 20B at around 20 tokens/sec. But it has only 24GB of VRAM and can’t fit the full GPT-OSS 120B. NVIDIA cards offer strong raw compute, while the Ryzen AI Max+ 395 offers larger usable memory for single-machine, very-large-model work. Consider this trade-off when selecting hardware.

Benchmarks and real tests

Independent testers have also run smaller LLMs and measured token speeds. For example, a Distill Llama-70B Q8 test in LM Studio showed ~3 tokens/sec on some Windows setups with the Ryzen AI Max+ 395. Results vary by model format, runtime, and settings. The key point remains that GPT-OSS 120B fits on the Ryzen AI Max+ 395, something current consumer GPUs cannot do.

What this means for developers and teams

Takeaway: the Ryzen AI Max+ 395 lets teams run datacenter-class LLMs locally without cloud cost or latency. If you need to run GPT-OSS 120B on a single consumer device, this is currently the practical option.

Quick checklist before you buy or test

  • Confirm your target model and its GGML or MXFP4 size (120B needs ~61GB VRAM).
  • Choose at least 64GB of system RAM for mid-size models; for GPT-OSS 120B the 128GB configuration is the practical choice, since the weights alone need ~61GB of GPU-mapped memory.
  • Plan to run the latest AMD drivers and hipBLASLt-enabled runtime.
  • Test with smaller models and work up to 120B to confirm stability.

Final note and one clear recommendation

If you want to run GPT-OSS 120B locally, the Ryzen AI Max+ 395 is a strong, practical choice because of its large convertible memory and balanced AI hardware. For raw FLOPS and smaller-model throughput, high-end NVIDIA cards may still win. Choose the chip that fits the model sizes you need most.

For more detail, read AMD's guide to the Ryzen AI Max+ 395, plus testing reports from Wccftech and Hardware-Corner.

Avery, Tech Journalist & Trend Watcher

Avery covers the tech beat for major publications. Excellent at connecting dots between different industry developments.

AI-GENERATED CONTENT: This article and author profile are created using artificial intelligence.