The result. For the first time, a portion of a widely used large language model has been replaced by a quantum-circuit component, and that component has been executed on real quantum hardware while the model generated text. The language model is Meta's Llama 3.1 8B-Instruct, one of the most popular open-weight LLMs in the world. The quantum hardware is IBM's 156-qubit Quantum System Two (processor ibm_basquecountry, IBM Heron r2). Generating each answer required the quantum processor to execute thousands of circuits, and it did so, end to end, on real superconducting qubits.
What changed in the model. We added a small quantum adapter (roughly 6,000 extra parameters, less than one part in a million of the model's 8 billion weights) inside a single attention layer. Despite the minimal footprint, the language model's perplexity on the standard WikiText benchmark improved by 1.4% (from 8.877 to 8.752), with the entire adapter computation executed on the QPU.
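As a quick sanity check, the two headline numbers can be recomputed directly from the figures reported above (the variable names below are ours, not from the paper):

```python
# Figures reported in this announcement.
baseline_ppl = 8.877           # original Llama 3.1 8B-Instruct on WikiText
quantum_ppl = 8.752            # with the on-QPU quantum adapter
adapter_params = 6_000         # approximate adapter size
model_params = 8_000_000_000   # 8 billion weights in the base model

# Lower perplexity is better, so the improvement is the relative drop.
improvement = (baseline_ppl - quantum_ppl) / baseline_ppl
print(f"Perplexity improvement: {improvement:.1%}")    # 1.4%

# Adapter footprint relative to the full model.
fraction = adapter_params / model_params
print(f"Adapter fraction of weights: {fraction:.1e}")  # 7.5e-07, under one part per million
```

Both reported claims check out: a 1.4% relative reduction in perplexity, and an adapter smaller than 10⁻⁶ of the model's weights.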
Cases where quantum-enhanced Llama answers correctly and the original does not. On questions drawn from MMLU, a benchmark of expert-level knowledge spanning dozens of academic subjects, the quantum-enhanced model fixed errors made by the unmodified Llama:
• Astronomy. "Which of the jovian planets have rings?" Original Llama answers "Saturn" (incorrect). The quantum-enhanced Llama correctly answers "all of the above."
• College biology. "Gene flow between populations results in:" Original Llama answers "disruption of Hardy–Weinberg equilibrium" (incorrect). The quantum-enhanced Llama correctly answers "increase in genetic homogeneity."
These wins were re-tested at multiple sampling temperatures and remained consistent: they are not artefacts of a single lucky generation.
What this is, and what it is not. This is a hardware-feasibility milestone: a demonstration that quantum circuits can be embedded inside a production-scale LLM to improve its capabilities in a scalable way. We do not claim a quantum computational advantage, but rather a proof that the approach is indeed feasible. The significance is that the bridge between today's quantum hardware and today's flagship AI models is no longer hypothetical: it has been built, run, and measured.
Why it matters. Frontier AI runs into a hard physical wall: every parameter must be stored in classical memory, and frontier models already cost hundreds of millions of dollars to train. Quantum hardware represents a qualitatively different way of representing and processing information, with the potential to add capabilities to these models in a scalable way. The work reported here removes the most basic doubt about the combined approach: whether quantum processors and large language models can interoperate at all. They can, and they do, with measurable gains.
- Model: Llama 3.1 8B-Instruct (Meta, 8 billion parameters)
- Quantum hardware: IBM Quantum System Two, 156 qubits, Heron r2
- Added parameters: ≈6,000 (less than 10⁻⁶ of model weights)
- Improvement: WikiText perplexity 8.877 → 8.752 (1.4% better)
- Inference: end-to-end on QPU; verified on real hardware
Check out the paper on arXiv.
Authors: B. Aizpurua, S. Singh, A. Kshetrimayum, S. S. Jahromi, R. Orús.
Contact: roman.orus@multiversecomputing.com.
Affiliation: Multiverse Computing Research