The result. For the first time, a portion of a widely used large language model has been replaced by a quantum-circuit component, and that component has been executed on real quantum hardware while the model generated text. The language model is Meta's Llama 3.1 8B-Instruct, one of the most popular open-weight LLMs in the world. The quantum hardware is IBM's 156-qubit Quantum System Two (processor ibm_basquecountry, IBM Heron r2). Generating each answer required the quantum processor to execute thousands of circuits, and it did so, end to end, on real superconducting qubits.
What changed in the model. We added a small quantum adapter (roughly 6,000 extra parameters, less than one part in a million of the model's 8 billion weights) inside a single attention layer. Despite the minimal footprint, the language model's perplexity on the standard WikiText benchmark improved by 1.4% (from 8.877 to 8.752), with the entire adapter computation executed on the QPU.
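As a quick sanity check, the two headline numbers can be recomputed directly from the figures reported above (the variable names below are ours, not from the paper):

```python
# Figures reported in this announcement.
baseline_ppl = 8.877           # original Llama 3.1 8B-Instruct on WikiText
quantum_ppl = 8.752            # with the on-QPU quantum adapter
adapter_params = 6_000         # approximate adapter size
model_params = 8_000_000_000   # 8 billion weights in the base model

# Lower perplexity is better, so the improvement is the relative drop.
improvement = (baseline_ppl - quantum_ppl) / baseline_ppl
print(f"Perplexity improvement: {improvement:.1%}")    # 1.4%

# Adapter footprint relative to the full model.
fraction = adapter_params / model_params
print(f"Adapter fraction of weights: {fraction:.1e}")  # 7.5e-07, under one part per million
```

Both reported claims check out: a 1.4% relative reduction in perplexity, and an adapter smaller than 10⁻⁶ of the model's weights.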
Cases where quantum-enhanced Llama answers correctly and the original does not. On questions drawn from MMLU, a benchmark of expert-level knowledge spanning dozens of academic subjects, the quantum-enhanced model fixed errors made by the unmodified Llama:
• Astronomy. "Which of the jovian planets have rings?" Original Llama answers "Saturn" (incorrect). The quantum-enhanced Llama correctly answers "all of the above."
• College biology. "Gene flow between populations results in:" Original Llama answers "disruption of Hardy–Weinberg equilibrium" (incorrect). The quantum-enhanced Llama correctly answers "increase in genetic homogeneity."
These wins were re-tested at multiple sampling temperatures and remained consistent: they are not artefacts of a single lucky generation.
What this is, and what it is not. This is a hardware-feasibility milestone: a demonstration that quantum circuits can be embedded inside a production-scale LLM to improve its capabilities in a scalable way. We do not claim a quantum computational advantage, but rather a proof that the approach is indeed feasible. The significance is that the bridge between today's quantum hardware and today's flagship AI models is no longer hypothetical: it has been built, run, and measured.
Why it matters. Frontier AI runs into a hard physical wall: every parameter must be stored in classical memory, and frontier models already cost hundreds of millions of dollars to train. Quantum hardware represents a qualitatively different way of representing and processing information, with the potential to add capabilities to these models in a scalable way. The work reported here removes the most basic doubt about the combined approach: whether quantum processors and large language models can interoperate at all. They can, and they do, with measurable gains.
- Model: Llama 3.1 8B-Instruct (Meta, 8 billion parameters)
- Quantum hardware: IBM Quantum System Two, 156 qubits, Heron r2
- Added parameters: ≈6,000 (less than 10⁻⁶ of model weights)
- Improvement: WikiText perplexity 8.877 → 8.752 (1.4% better)
- Inference: end-to-end on QPU; verified on real hardware
Check out the paper on arXiv.
Authors: B. Aizpurua, S. Singh, A. Kshetrimayum, S. S. Jahromi, R. Orús.
Contact: roman.orus@multiversecomputing.com.
Affiliation: Multiverse Computing Research