July 28, 2025

CompactifAI - Accuracy and Consumption Analysis of a Compressed Llama 3.1 Model by Sopra Steria

The exponential growth of Large Language Models (LLMs) and the rising demand for them have driven an unparalleled surge in computational requirements, bringing significant increases in energy consumption and financial overhead, alongside a race for performance.

Because of the dedicated infrastructure they require and their widespread use, AI systems have a wide range of significant environmental impacts. These impacts can be reduced through several actions: reasoning about use cases and prioritizing AI projects with positive impacts, as well as optimizing software (models, data, languages...) and infrastructure.

This study focuses on model optimization, which is only one part of the approach needed to counter the negative environmental impact of AI. To determine how much compressed models can reduce computation and energy use, Sopra Steria's sustAIn team evaluated CompactifAI from Multiverse Computing, an innovative approach that uses tensor networks alongside other techniques to reduce the number of parameters. The objective was to test the CompactifAI-compressed version of Llama 3.1 8B against the original model and to benchmark cost savings, energy consumption, and model accuracy.
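CompactifAI's actual method is detailed in the report linked at the end of this article. As a purely illustrative sketch of the general idea behind factorized compression (of which tensor networks are a more expressive generalization), the snippet below replaces a dense weight matrix with a truncated low-rank factorization; the layer size, target rank, and random stand-in for trained weights are all hypothetical.

    # Illustrative sketch only: CompactifAI's actual tensor-network method is
    # described in the linked report. This shows the basic parameter-reduction
    # idea: factor a dense m x n weight matrix into two thin factors so the
    # parameter count drops from m*n to r*(m + n) for a rank r << min(m, n).
    import numpy as np

    rng = np.random.default_rng(0)
    m, n, r = 1024, 1024, 64              # hypothetical layer size and rank

    W = rng.standard_normal((m, n))       # random stand-in for trained weights

    # Truncated SVD: keep only the r largest singular values.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :r] * s[:r]                  # shape (m, r)
    B = Vt[:r, :]                         # shape (r, n)

    print(f"parameters: {m * n:,} -> {r * (m + n):,} "
          f"({r * (m + n) / (m * n):.1%} of original)")

    # At inference, y = W @ x becomes y = A @ (B @ x).
    x = rng.standard_normal(n)
    err = np.linalg.norm(W @ x - A @ (B @ x)) / np.linalg.norm(W @ x)
    print(f"relative output error on this random stand-in: {err:.2f}")

On a random matrix like this one the approximation error is large; trained LLM weights typically carry far more structure, which is what makes factorized and tensor-network compression viable in practice.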

Sopra Steria established the sustAIn team, dedicated to frugal AI, within its broader AI initiative, rAIse. The team has worked specifically on AI software efficiency for several years. It has studied various factors influencing the energy consumption of AI systems based on language models (both small and large), as well as on simpler AI methods such as regression and classification algorithms. It has conducted numerous tests, varying the language model, the number of parameters, the quantization technique, and the framework. The goal is to help presales professionals, developers, and project managers become aware of AI's environmental impact so that they can make informed decisions. Through these benchmarks, the team can make impact predictions and provide quantified best practices for more frugal AI, facilitating eco-design from the earliest project stages.

To reduce model size, and consequently energy consumption, the team had previously examined classical model compression methods such as quantization. In internal experiments, these methods delivered energy savings of around 20%, but with limitations, particularly accuracy losses that may be unacceptable for certain systems. Upon learning about CompactifAI's innovative compression method, the team set out to assess whether the efficiency gains were significant and whether the accuracy loss was small enough to support its adoption in their projects.
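For readers unfamiliar with quantization, the sketch below shows a generic symmetric post-training int8 scheme: weights are stored at a quarter of their float32 size, and the rounding error it introduces is the kind of accuracy loss referred to above. This is a minimal illustration, not the team's actual benchmark setup.

    # Minimal sketch of symmetric per-tensor int8 quantization (a generic
    # example, not the team's exact setup or CompactifAI's method).
    import numpy as np

    def quantize_int8(w):
        """Map float32 weights to int8 with a single per-tensor scale."""
        scale = np.abs(w).max() / 127.0          # largest value maps to +/-127
        q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
        return q, scale

    def dequantize(q, scale):
        return q.astype(np.float32) * scale

    rng = np.random.default_rng(0)
    w = rng.standard_normal(1000).astype(np.float32)

    q, scale = quantize_int8(w)
    w_hat = dequantize(q, scale)

    # int8 storage is 4x smaller than float32; the rounding error below is
    # the kind of accuracy loss that may be unacceptable for some systems.
    print(f"max absolute rounding error: {np.abs(w - w_hat).max():.4f}")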

Read the full report here: https://arxiv.org/pdf/2507.08836