April 11, 2025

Multiverse Computing Compresses Llama 3.1-8B and Llama 3.3-70B By 80% With Almost No Precision Loss


DONOSTIA, Spain, April 08, 2025 -- Multiverse Computing, the leader in AI model compression, today released two new AI models compressed by CompactifAI: 80% compressed versions of Llama 3.1-8B and Llama 3.3-70B. Both models have 60% fewer parameters than the originals, 84% greater energy efficiency, and 40% faster inference, and they cut costs by 50% without sacrificing accuracy. AI developers can immediately plug the models into any application – edge, on-premise, or cloud. Multiverse will release CompactifAI-compressed versions of the top LLMs over the coming months.

CompactifAI is Multiverse's proprietary AI compressor. It is the first compressor of its kind, using quantum-inspired tensor networks to make AI systems more efficient and portable, reducing model size by up to 93% with only a 2-3% drop in accuracy – a striking result compared with the 20-30% accuracy loss typical of industry-standard techniques at 50-60% compression.
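To give a rough intuition for how factorization-based compression trades parameters for a small approximation error, here is a minimal sketch using truncated SVD, the simplest relative of the tensor-network decompositions described above. This is an illustrative example only, not Multiverse's actual method: CompactifAI's quantum-inspired tensor networks generalize this idea to higher-order tensors, and the layer sizes and rank below are arbitrary assumptions.

```python
import numpy as np

def compress_layer(W, rank):
    """Approximate a weight matrix W (m x n) as a product A @ B.

    A truncated SVD keeps only the top singular directions, storing
    rank * (m + n) parameters instead of m * n. Real LLM weights have
    enough structure that such factorizations lose little accuracy.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]   # shape (m, rank)
    B = Vt[:rank, :]             # shape (rank, n)
    return A, B

# Example: compress a hypothetical 1024 x 1024 layer to rank 64.
rng = np.random.default_rng(0)
W = rng.standard_normal((1024, 1024))
A, B = compress_layer(W, 64)

original = W.size                # 1024 * 1024 = 1,048,576 parameters
compressed = A.size + B.size     # 64 * (1024 + 1024) = 131,072 parameters
print(f"parameters: {original} -> {compressed} "
      f"({1 - compressed / original:.0%} fewer)")
```

At rank 64 the factorized layer stores 87.5% fewer parameters, in the same spirit as the parameter reductions quoted in this release, though the actual ranks and decompositions used in production models differ.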

“CompactifAI is changing the economics of AI processing and opening up new use cases for AI models,” said Enrique Lizaso Olmos, CEO of Multiverse Computing. “Efforts to curb unwieldy models have come up short. Our novel approach to compression grounded in quantum-inspired techniques makes it possible to pair performance with processing efficiency and gives us a massive edge on LLM providers.”

Multiverse Computing was founded in 2019 by pioneers in quantum-inspired software to develop novel solutions to complex business problems. In 2023 the company began applying its core technology to address the AI energy crisis with CompactifAI.

LLM providers have turned to techniques such as pruning and quantization to compress models but have yet to eliminate the tradeoff between size and performance. For instance, Llama 3.1-8B Slim by CompactifAI requires 300x fewer training tokens than Meta's Llama 3 and 3x fewer training tokens than Nvidia's Llama3.1-Minitron, while outperforming both across benchmarks. For Llama 3.3-70B Slim by CompactifAI, comparative benchmarks show an increase in reasoning capabilities while maintaining the original model's precision.

“We’re rapidly delivering compressed versions of the most powerful LLMs in the world,” said Sam Mugel, Chief Technology Officer at Multiverse. “The advanced capabilities of these two massive models can now fit into smartphones, laptops, and cars, and even into real-world machines like oil rigs and satellites. Our aggressive roadmap to roll out dozens of compressed, leading LLMs could dramatically accelerate the impact of AI in the real world.”

Leading banking, telecommunications, and energy companies are beta users of the two compressed models. Llama 3.1-8B Slim and Llama 3.3-70B Slim are available via API on the CompactifAI platform. For additional information on Llama 3.1-8B Slim, including performance benchmarks and pricing, see here. For more information on Llama 3.3-70B Slim, see here. To learn more about CompactifAI, visit multiversecomputing.com/compactifai.

About Multiverse Computing

Multiverse Computing is the leader in quantum-inspired AI model compression. The company’s deep expertise in quantum software and AI led to the development of CompactifAI, a revolutionary AI model compressor. CompactifAI compresses LLMs by up to 93% with only 2-3% precision loss, reduces computing requirements, and unlocks new use cases for AI across industries. Multiverse Computing is headquartered in Donostia, Spain, with offices across Europe, the US, and Canada. The company now serves over 100 customers globally, including Iberdrola, Bosch, and the Bank of Canada. Multiverse Computing has raised $100M to date with investments from Columbus Venture Partners, the Spanish government, and others. For more information, visit multiversecomputing.com.