
LLaMA 3.1 8B Slim

by CompactifAI
Now Smaller. Faster. Smarter.

Efficiency Without Compromise

Introducing Compressed LLaMA 3.1 8B Slim, the next-generation AI model designed for maximum efficiency. By compressing its size without sacrificing intelligence, we’ve unlocked blazing-fast performance, reduced hardware demands, and lower energy consumption—all while maintaining industry-leading accuracy.

Get Started with LLaMA 3.1 8B Slim

The future of AI isn’t just powerful—it’s efficient, accessible, and built to run anywhere.

Why Choose CompactifAI on LLaMA 3.1 8B?

Ultra-Compact – 80% Reduction in Model Size

Seamless deployment on edge devices, from mobile to IoT.

[Chart: Model Size (GB)]
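The size claim is easy to sanity-check with back-of-the-envelope arithmetic. The sketch below assumes fp16/bf16 weights (2 bytes per parameter) and ignores tokenizer and file-format overhead, so the figures are illustrative rather than exact:

```python
# Illustrative memory estimate for an 8B-parameter model (fp16 assumed).
PARAMS = 8.0e9      # LLaMA 3.1 8B parameter count
BYTES_FP16 = 2      # bytes per parameter in fp16/bf16

base_gb = PARAMS * BYTES_FP16 / 1e9   # ~16 GB for the uncompressed model
slim_gb = base_gb * (1 - 0.80)        # 80% size reduction -> ~3.2 GB

print(f"Original: {base_gb:.1f} GB, Slim: {slim_gb:.1f} GB")
```

A footprint in the low single-digit gigabytes is what makes deployment on phones and IoT-class hardware plausible.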

Lightning-Fast – 40% Inference Speed-Up

Experience lower latency and real-time processing, even on limited hardware.

[Chart: Tokens per Second (tokens/s)]

Precise – 3% Precision Drop

Accuracy stays nearly unchanged after compression.

[Chart: Parameter Count (B)]

Energy-Efficient AI – 85% Increase in Tokens/kWh

Less power, more performance: data centers can serve nearly twice as many users on the same GPU hardware.

[Chart: Energy Efficiency (tokens/kWh)]
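The "nearly twice as many users" figure follows directly from the stated 85% tokens/kWh gain: for a fixed energy budget, served throughput scales with tokens per kWh. A quick illustrative calculation (the baseline user count is hypothetical):

```python
# Throughput scales with tokens/kWh at a fixed power budget (illustrative).
improvement = 1.85        # 85% more tokens per kWh

users_before = 1000       # hypothetical concurrent users on the same GPUs
users_after = users_before * improvement

print(f"~{users_after:.0f} users on the same hardware ({improvement:.2f}x)")
```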

Privacy-First & Scalable

Keep your data secure and localized with on-device intelligence. Perfect for chatbots, automation, content generation, and enterprise AI solutions.

Comparison With Other LLaMA 3.1 8B Compressed Models

Comparison between our compressed LLaMA 3.1 8B Slim and the compressed versions released by Meta (Llama 3.2, 3.2B) and NVIDIA (Llama-3.1-Minitron-4B).

Meta (Llama 3.2): 3.2B parameters (60% compression), 9T training tokens (×300), healed on private data.

Task Performance Comparison

  • Llama-3.2-3B (3.2B) (Meta)

  • Llama-3.1-Minitron-4B (4.5B) (NVIDIA)

  • Llama-3.1-GildaV3 (3.2B) (Multiverse Computing)

[Chart: Accuracy / Score (%)]


Contact

Interested in seeing our Quantum AI software in action? Contact us.