Microsoft’s new Mu language model powers the AI agent in Windows Settings on Copilot+ PCs. Designed for on-device efficiency, Mu runs on NPUs with ultra-fast response, using an encoder-decoder architecture and advanced quantization to deliver seamless, low-latency natural language control of system settings.

Meet Mu: The Tiny Language Model Powering Windows Settings AI
Microsoft just unveiled Mu, a compact yet powerful language model designed for on-device AI tasks. Running locally on Windows Copilot+ PCs, Mu powers the new agent inside Windows Settings. This means natural language commands can now instantly map to system functions without cloud delays.
What’s New with Mu?
Mu is a micro-sized, 330-million-parameter encoder-decoder model optimized to run efficiently on Neural Processing Units (NPUs). Unlike traditional decoder-only models, Mu’s architecture first encodes the input into a latent representation, then decodes the output from it. This design slashes latency and memory use, which is key for real-time, on-device AI.
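Mu’s weights and code aren’t public, but the latency argument is easy to illustrate: the encoder runs once over the whole input, and every decoding step reuses that cached latent instead of reprocessing the prompt. A toy numpy sketch (the pooling “encoder”, vocabulary size, and hidden size are all stand-ins, not Mu’s actual design):

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, D = 100, 16                        # toy vocabulary and hidden size
EMB = rng.standard_normal((VOCAB, D))     # toy token embedding table

def encode(input_ids):
    # Stand-in for the encoder: pool token embeddings into one latent vector.
    return EMB[input_ids].mean(axis=0)

def decode_step(latent, prev_token):
    # Stand-in for one decoder step: score the vocabulary against the
    # cached latent plus the previous token's embedding.
    scores = EMB @ (latent + EMB[prev_token])
    return int(np.argmax(scores))

def generate(input_ids, n_tokens):
    latent = encode(input_ids)            # the input is encoded exactly once
    out, tok = [], input_ids[-1]
    for _ in range(n_tokens):
        tok = decode_step(latent, tok)    # per-token cost is independent
        out.append(tok)                   # of the input length
    return out
```

The point of the structure: in a decoder-only model every generated token attends back over the full prompt, whereas here the prompt’s cost is paid once in `encode`.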
“Mu’s one-time encoding greatly reduces computation and memory overhead.”
Running on Qualcomm Hexagon NPUs, Mu achieves 47% lower first-token latency and nearly 5x faster decoding speed than comparable decoder-only models. Its shapes and operations were tuned specifically to NPU hardware constraints, maximizing parallelism and minimizing memory footprint.
Major Updates: Transformer Upgrades & Training
Mu packs three transformer upgrades: dual LayerNorm for stable training, Rotary Positional Embeddings (RoPE) for better long-context understanding, and Grouped-Query Attention (GQA) to cut attention parameters without losing accuracy. Together, these let the tiny model rival much larger ones on edge devices.
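Mu’s exact attention configuration isn’t public, but the idea behind Grouped-Query Attention is simple: several query heads share one key/value head, shrinking the K/V projections and cache. A minimal numpy sketch (head counts and dimensions here are illustrative, not Mu’s):

```python
import numpy as np

def grouped_query_attention(q, k, v):
    """q: (n_q_heads, T, d); k, v: (n_kv_heads, T, d).

    Each group of n_q_heads // n_kv_heads query heads attends against the
    same shared K/V head, so far fewer K/V parameters are stored."""
    n_q_heads, _, d = q.shape
    n_kv_heads = k.shape[0]
    group = n_q_heads // n_kv_heads
    outs = []
    for h in range(n_q_heads):
        kv = h // group                              # query head -> shared KV head
        scores = q[h] @ k[kv].T / np.sqrt(d)
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)           # softmax over positions
        outs.append(w @ v[kv])
    return np.stack(outs)
```

With 8 query heads sharing 2 K/V heads, the K/V cache is a quarter of the multi-head-attention size, which is exactly the kind of memory saving that matters on an NPU.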
Training started with hundreds of billions of educational tokens, followed by knowledge distillation from Microsoft’s larger Phi models. This approach delivers impressive accuracy despite Mu’s tiny size. Fine-tuning with task-specific data further boosts performance for Windows Settings commands and other tasks.
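Microsoft hasn’t detailed its distillation objective, but the standard knowledge-distillation loss trains the small student to match the large teacher’s temperature-softened output distribution rather than only its top prediction. A toy sketch under that assumption (temperature and logits are illustrative):

```python
import numpy as np

def softmax(logits, T=1.0):
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Minimizing this pushes the student (e.g. a Mu-sized model) toward the
    teacher's (e.g. a Phi model's) full distribution over the vocabulary."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    return float(np.sum(p_t * (np.log(p_t) - np.log(p_s))) * T * T)
```

The loss is zero when the two distributions agree and grows as they diverge, which is why distillation can transfer much of a large model’s behavior into a far smaller one.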
“Mu is nearly comparable in performance despite being one-tenth of the size.”
Optimized for Real-World Use: Quantization & Hardware Collaboration
To run efficiently on Copilot+ PCs, Mu uses advanced post-training quantization, converting weights to 8-bit and 16-bit integers. This reduces memory and compute needs without sacrificing accuracy. Microsoft also partnered with AMD, Intel, and Qualcomm to optimize Mu’s operations for various NPUs.
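The blog doesn’t specify the quantization scheme beyond 8- and 16-bit integer weights; a common post-training approach is symmetric per-tensor int8 quantization, sketched here purely as an illustration:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor post-training quantization of float32 weights.

    Maps the largest-magnitude weight to +/-127; storage drops from
    4 bytes to 1 byte per weight."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover approximate float weights at inference time.
    return q.astype(np.float32) * scale
```

Rounding bounds the per-weight error by half a quantization step (`scale / 2`), which is why weight quantization can cut memory 4x with little accuracy loss.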
The result? Mu can generate over 200 tokens per second on devices like the Surface Laptop 7, enabling smooth, responsive AI-powered settings control.
Why Mu Matters for Windows Users
Changing Windows settings can be tedious. Mu’s agent understands natural language, letting users simply ask for changes. It handles hundreds of settings with ultra-low latency, making Windows more intuitive and accessible.
By scaling training to millions of samples and using synthetic data for diverse phrasing, Microsoft ensured Mu adapts well to real user queries.
Final Thoughts
Mu represents a leap forward in on-device AI—small, fast, and smart enough to power complex tasks without cloud dependency. Windows users with Copilot+ PCs can expect a smoother, more natural way to interact with their system settings.
Stay tuned as Microsoft continues refining Mu and expanding its AI capabilities across Windows.
From the Windows Blog