April 28, 2026 · 6 min read

Introducing the LittleLamb 0.3B Model Family

Compact, capable, and built for the edge: three CompactifAI-compressed AI models now available from Multiverse Computing.

Multiverse Computing


Multiverse Computing is releasing a new family of sub-billion-parameter language models under the LittleLamb name. The family includes three new models: LittleLamb 0.3B, LittleLamb 0.3B Tool-Calling, and LittleLamb 0.3B Mobile, all derived from Qwen3-0.6B and compressed with CompactifAI, Multiverse's proprietary compression technology. Each model is roughly half the size of its base, puts a Gemma-270M-class footprint on-device, and retains Qwen3's dual thinking / non-thinking chat modes plus bilingual (English / Spanish) coverage.

The model family is aimed at teams that want modern reasoning and tool-use behavior on memory- and latency-constrained hardware: on-device assistants, offline-capable apps, edge agents, and function calling at the edge. Below we summarize what each model is for and how it benchmarks against its peers.

LittleLamb 0.3B - the bilingual flagship

LittleLamb 0.3B is the general-purpose entry in the family. It is a decoder-only Transformer built on Qwen3-0.6B (0.6B total / 0.44B non-embedding params) and compressed at a 50% rate with CompactifAI, landing at roughly 0.3B parameters, the same size class as gemma3-270m-it and functiongemma-270m-it. It supports a 32K context window, uses 16 query / 8 key-value attention heads (GQA), and offers both thinking (enable_thinking=True) and non-thinking modes via the standard Qwen3 chat template.
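To make the dual chat modes concrete, here is a minimal sketch of running the model with transformers and toggling enable_thinking through the standard Qwen3 chat template. The repository id below is a placeholder, not the confirmed name; see the Hugging Face page linked below for the actual id.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id -- substitute the actual LittleLamb 0.3B repository.
model_id = "multiverse-computing/LittleLamb-0.3B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [{"role": "user", "content": "¿Cuál es la capital de España?"}]

# enable_thinking switches between Qwen3's thinking and non-thinking chat modes.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,  # set to False for non-thinking mode
)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```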

Check out the model and full benchmarks here on Hugging Face

LittleLamb 0.3B Tool-Calling - agentic capability at the edge

The Tool-Calling variant adds native function calling, structured JSON output, and agentic workflow support on top of the compressed LittleLamb base. It detects when to invoke tools, emits Qwen3-style structured tool calls, and consumes tool outputs, all in a sub-300M-parameter footprint suitable for on-device agents.
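As an illustration of what that looks like in practice, here is a minimal sketch of Qwen3-style function calling through the transformers chat template. The repository id and the get_weather tool are placeholders, not part of the release.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id -- substitute the actual Tool-Calling repository.
model_id = "multiverse-computing/LittleLamb-0.3B-Tool-Calling"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

def get_weather(city: str) -> str:
    """Get the current weather for a city.

    Args:
        city: Name of the city to look up.
    """
    return "sunny, 21°C"  # stand-in for a real API call

messages = [{"role": "user", "content": "What's the weather in Madrid right now?"}]

# The chat template serializes the tool schema into the prompt; when the model
# decides a tool is needed, it emits a Qwen3-style
# <tool_call>{"name": ..., "arguments": ...}</tool_call> block.
prompt = tokenizer.apply_chat_template(
    messages,
    tools=[get_weather],
    add_generation_prompt=True,
    tokenize=False,
)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=False))
```

From there, the application executes the requested tool, appends the result as a tool message, and lets the model produce the final answer.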

The headline results are in non-thinking mode on the agentic benchmarks, where additional tool-calling fine-tuning pays off against the uncompressed Qwen3-0.6B base:

  • BFCL v4 (thinking): 51.55 vs. 51.95 for Qwen3-0.6B.
  • BFCL v4 (non-thinking): 50.67 vs. 29.17 for Qwen3-0.6B - a 74% relative improvement.
  • τ²-Bench (non-thinking): 26.67 vs. 15.50 for Qwen3-0.6B.

Check out the model and full benchmarks here on Hugging Face

LittleLamb 0.3B Mobile - built for on-device inference

LittleLamb 0.3B Mobile is the deployment-focused artifact line: the same 0.3B CompactifAI footprint, packaged for efficient serving on constrained hardware, with a BF16 export path (e.g. LittleLamb-Mobile_0.3B_BF16_v1.0.0). It targets on-device assistants, offline-capable apps, and edge nodes where latency, memory, battery, and thermal budgets dominate.
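A minimal loading sketch for the BF16 export, assuming a transformers-compatible checkpoint; the repository id below is an assumption built from the artifact name above and should be confirmed against the published model card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id based on the artifact name above; confirm the exact id on Hugging Face.
model_id = "multiverse-computing/LittleLamb-Mobile_0.3B_BF16_v1.0.0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # keep the BF16 weights rather than upcasting to FP32
    device_map="auto",
)
```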

Head-to-head with Qwen3-0.6B and functiongemma-270m-it, LittleLamb 0.3B Mobile tracks the full-size Qwen3-0.6B closely, and in non-thinking mode it is the best of the three on τ²-Bench and BFCL v4, the two agentic / tool-calling benchmarks in the suite.

Mobile Actions

On the Mobile Actions task, which measures function-calling accuracy on a mobile-style action dataset, LittleLamb 0.3B Mobile outperforms functiongemma-270m-it in both chat modes after equivalent fine-tuning.

Family snapshot

Under the hood

All three LittleLamb models share the same backbone: Qwen3-0.6B, an open-weight (Apache 2.0) causal language model from the Qwen3 family.

We keep the original tokenizer (100+ language coverage) and the dual thinking / non-thinking chat template. CompactifAI then reduces the non-embedding parameter count by roughly 50% while preserving the reasoning behavior that makes Qwen3 attractive at this size. For the Tool-Calling variant, we add an additional fine-tuning pass on function-calling and structured-output data on top of the compressed base.

Availability & intended use

The LittleLamb family is intended for: on-device and edge inference where memory, battery, and thermal budgets are tight; offline or low-connectivity assistants; reasoning tasks that benefit from configurable thinking modes; bilingual (EN / ES) experiences in compact form factors; and (with the Tool-Calling variant) function calling and agentic workflows in resource-constrained environments. For sampling settings in each chat mode, follow the Qwen3-0.6B model card recommendations.
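As a sketch, per-mode generation could look like the following; the sampling values mirror the Qwen3-0.6B model card recommendations at the time of writing and should be verified against that card before use.

```python
import torch

def generate_reply(model, tokenizer, messages, thinking: bool, max_new_tokens: int = 512) -> str:
    """Generate a reply in the requested chat mode with Qwen3-recommended sampling."""
    # Thinking mode: temperature 0.6, top_p 0.95; non-thinking mode: temperature 0.7, top_p 0.8.
    # These mirror the Qwen3-0.6B model card at the time of writing -- verify before deploying.
    sampling = (
        {"temperature": 0.6, "top_p": 0.95, "top_k": 20}
        if thinking
        else {"temperature": 0.7, "top_p": 0.8, "top_k": 20}
    )
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True, enable_thinking=thinking
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(**inputs, do_sample=True, max_new_tokens=max_new_tokens, **sampling)
    return tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
```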

LittleLamb 0.3B and LittleLamb 0.3B Tool-Calling are available now from Multiverse Computing on Hugging Face.

About Multiverse Computing

Multiverse Computing is the leader in quantum-inspired AI model compression. The company’s deep expertise in quantum software led to the development of CompactifAI, a revolutionary compressor that reduces computing requirements and unleashes new use cases for AI across industries. Headquartered in Donostia, Spain, with offices in the United States, Canada, and across Europe, Multiverse serves more than 100 global customers, including Iberdrola, Bosch, and the Bank of Canada.

Want to know more?

Reach out to us at business@multiversecomputing.com