April 28, 2026 · 6 min read

Introducing the LittleLamb 0.3B Model Family

Compact, capable, and built for the edge: three CompactifAI-compressed AI models now available from Multiverse Computing.

Multiverse Computing


Multiverse Computing is releasing a new family of sub-billion-parameter language models under the LittleLamb name. The family includes three new models: LittleLamb 0.3B, LittleLamb 0.3B Tool-Calling, and LittleLamb 0.3B Mobile, all derived from Qwen3-0.6B and compressed with CompactifAI, Multiverse's proprietary compression technology. Each model is roughly half the size of its base, puts a Gemma-270M-class footprint on-device, and retains Qwen3's dual thinking / non-thinking chat modes plus bilingual (English / Spanish) coverage.

The model family is aimed at teams that want modern reasoning and tool-use behavior on memory- and latency-constrained hardware: on-device assistants, offline-capable apps, edge agents, and function calling at the edge. Below we summarize what each model is for and how it benchmarks against its peers.

LittleLamb 0.3B - the bilingual flagship

LittleLamb 0.3B is the general-purpose entry in the family. It is a decoder-only Transformer built on Qwen3-0.6B (0.6B total / 0.44B non-embedding params) and compressed at a 50% rate with CompactifAI, landing at roughly 0.3B parameters, the same size class as gemma3-270m-it and functiongemma-270m-it. It supports a 32K context window, uses 16 query / 8 key-value attention heads (GQA), and offers both thinking (enable_thinking=True) and non-thinking modes via the standard Qwen3 chat template.
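To make the dual chat modes concrete, here is a minimal sketch of running the model with transformers and toggling enable_thinking through the standard Qwen3 chat template. The repository id below is a placeholder, not the confirmed name; see the Hugging Face page linked below for the actual id.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id -- substitute the actual LittleLamb 0.3B repository.
model_id = "multiverse-computing/LittleLamb-0.3B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [{"role": "user", "content": "¿Cuál es la capital de España?"}]

# enable_thinking switches between Qwen3's thinking and non-thinking chat modes.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,  # set to False for non-thinking mode
)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```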

Check out the model and full benchmarks here on Hugging Face

LittleLamb 0.3B Tool-Calling - agentic capability at the edge

The Tool-Calling variant adds native function calling, structured JSON output, and agentic workflow support on top of the compressed LittleLamb base. It detects when to invoke tools, emits Qwen3-style structured tool calls, and consumes tool outputs, all in a sub-300M-parameter footprint suitable for on-device agents.
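As an illustration of what that looks like in practice, here is a minimal sketch of Qwen3-style function calling through the transformers chat template. The repository id and the get_weather tool are placeholders, not part of the release.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id -- substitute the actual Tool-Calling repository.
model_id = "multiverse-computing/LittleLamb-0.3B-Tool-Calling"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

def get_weather(city: str) -> str:
    """Get the current weather for a city.

    Args:
        city: Name of the city to look up.
    """
    return "sunny, 21°C"  # stand-in for a real API call

messages = [{"role": "user", "content": "What's the weather in Madrid right now?"}]

# The chat template serializes the tool schema into the prompt; when the model
# decides a tool is needed, it emits a Qwen3-style
# <tool_call>{"name": ..., "arguments": ...}</tool_call> block.
prompt = tokenizer.apply_chat_template(
    messages,
    tools=[get_weather],
    add_generation_prompt=True,
    tokenize=False,
)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=False))
```

From there, the application executes the requested tool, appends the result as a tool message, and lets the model produce the final answer.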

The headline results are in non-thinking mode on the agentic benchmarks, where additional tool-calling fine-tuning pays off against the uncompressed Qwen3-0.6B base:

  • BFCL v4 (thinking): 51.55 vs. 51.95 for Qwen3-0.6B.
  • BFCL v4 (non-thinking): 50.67 vs. 29.17 for Qwen3-0.6B - a 74% relative improvement.
  • τ²-Bench (non-thinking): 26.67 vs. 15.50 for Qwen3-0.6B.

Check out the model and full benchmarks here on Hugging Face

LittleLamb 0.3B Mobile - built for on-device inference

LittleLamb 0.3B Mobile is the deployment-focused artifact line: the same 0.3B CompactifAI footprint, packaged for efficient serving on constrained hardware, with a BF16 export path (e.g. LittleLamb-Mobile_0.3B_BF16_v1.0.0). It targets on-device assistants, offline-capable apps, and edge nodes where latency, memory, battery, and thermal budgets dominate.
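A minimal loading sketch for the BF16 export, assuming a transformers-compatible checkpoint; the repository id below is an assumption built from the artifact name above and should be confirmed against the published model card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id based on the artifact name above; confirm the exact id on Hugging Face.
model_id = "multiverse-computing/LittleLamb-Mobile_0.3B_BF16_v1.0.0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # keep the BF16 weights rather than upcasting to FP32
    device_map="auto",
)
```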

Head-to-head with Qwen3-0.6B and functiongemma-270m-it, LittleLamb 0.3B Mobile tracks the full-size Qwen3-0.6B closely, and in non-thinking mode it is the best of the three on τ²-Bench and BFCL v4, the two agentic / tool-calling benchmarks in the suite.

Mobile Actions

On the Mobile Actions task, which measures function-calling accuracy on a mobile-style action dataset, LittleLamb 0.3B Mobile outperforms functiongemma-270m-it in both chat modes after equivalent fine-tuning.

Family snapshot

Under the hood

All three LittleLamb models share the same backbone: Qwen3-0.6B, an open-weight (Apache 2.0) causal language model from the Qwen3 family.

We keep the original tokenizer (100+ language coverage) and the dual thinking / non-thinking chat template. CompactifAI then reduces the non-embedding parameter count by roughly 50% while preserving the reasoning behavior that makes Qwen3 attractive at this size. For the Tool-Calling variant, we add an additional fine-tuning pass on function-calling and structured-output data on top of the compressed base.

Availability & intended use

The LittleLamb family is intended for: on-device and edge inference where memory, battery, and thermal budgets are tight; offline or low-connectivity assistants; reasoning tasks that benefit from configurable thinking modes; bilingual (EN / ES) experiences in compact form factors; and (with the Tool-Calling variant) function calling and agentic workflows in resource-constrained environments. For sampling settings in each chat mode, follow the Qwen3-0.6B model card recommendations.
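As a sketch, per-mode generation could look like the following; the sampling values mirror the Qwen3-0.6B model card recommendations at the time of writing and should be verified against that card before use.

```python
import torch

def generate_reply(model, tokenizer, messages, thinking: bool, max_new_tokens: int = 512) -> str:
    """Generate a reply in the requested chat mode with Qwen3-recommended sampling."""
    # Thinking mode: temperature 0.6, top_p 0.95; non-thinking mode: temperature 0.7, top_p 0.8.
    # These mirror the Qwen3-0.6B model card at the time of writing -- verify before deploying.
    sampling = (
        {"temperature": 0.6, "top_p": 0.95, "top_k": 20}
        if thinking
        else {"temperature": 0.7, "top_p": 0.8, "top_k": 20}
    )
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True, enable_thinking=thinking
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(**inputs, do_sample=True, max_new_tokens=max_new_tokens, **sampling)
    return tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
```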

LittleLamb 0.3B and LittleLamb 0.3B Tool-Calling are available now from Multiverse Computing on Hugging Face.

About Multiverse Computing

Multiverse Computing is the leader in quantum-inspired AI model compression. The company’s deep expertise in quantum software led to the development of CompactifAI, a revolutionary compressor that reduces computing requirements and unleashes new use cases for AI across industries. Headquartered in Donostia, Spain, with offices in the United States, Canada, and across Europe, Multiverse serves more than 100 global customers, including Iberdrola, Bosch, and the Bank of Canada.

Want to know more?

Reach out to us at business@multiversecomputing.com