Microsoft leverages Neural Processing Units (NPUs) in Surface Copilot+ PCs to run Small Language Models (SLMs) efficiently for generative AI tasks. Using Qualcomm AI Hub and Snapdragon X Plus processors, this approach enables fast, low-power local AI inference, improving performance and usability on edge devices.

Unlocking Gen AI Power on Copilot+ PCs with NPUs
Generative AI is evolving fast, and Microsoft’s Copilot+ PCs are leading the charge. While large language models (LLMs) grab headlines, small language models (SLMs) are quietly revolutionizing AI at the edge. These compact models run efficiently on local devices, consuming less power and delivering faster results.
What’s New: Small Language Models Meet NPUs
SLMs shine when paired with Neural Processing Units (NPUs), specialized chips designed for AI workloads. Microsoft’s Surface Copilot+ laptops, powered by Qualcomm Snapdragon X Plus processors, leverage NPUs to run Gen AI tasks smoothly—even on battery.
“SLMs and NPUs together support running powerful Gen AI workloads efficiently on a laptop, even when multitasking.”
This combination means real-time AI applications become practical without cloud dependency or heavy power drains. The Qualcomm AI Hub plays a crucial role by converting AI models into optimized binaries tailored for these NPUs.
Major Updates: Tools and Frameworks for AI at the Edge
Qualcomm AI Hub
This platform simplifies deploying AI models on edge devices. It converts PyTorch, TensorFlow, or ONNX models into QNN binaries ready to run on NPUs. With over 175 pre-optimized models, developers get a head start on integrating AI locally.
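As a rough illustration of that workflow, the sketch below uses the qai_hub Python package to trace a small PyTorch model and submit a compile job; the device name, example model, and compile options are illustrative assumptions, so check Qualcomm AI Hub's documentation for the exact targets and options available for your SoC.

```python
import qai_hub as hub
import torch
import torchvision

# Trace a small PyTorch model to TorchScript; AI Hub accepts traced PyTorch,
# TensorFlow, or ONNX models as input.
model = torchvision.models.mobilenet_v3_small(weights="DEFAULT").eval()
example_input = torch.rand(1, 3, 224, 224)
traced = torch.jit.trace(model, example_input)

# Submit a compile job targeting a Snapdragon X device so AI Hub produces a
# QNN binary ready to run on the NPU (device name and options are illustrative).
compile_job = hub.submit_compile_job(
    model=traced,
    device=hub.Device("Snapdragon X Elite CRD"),
    input_specs=dict(image=(1, 3, 224, 224)),
    options="--target_runtime qnn_context_binary",
)

# Download the optimized binary once compilation finishes.
target_model = compile_job.get_target_model()
target_model.download("mobilenet_v3_small.bin")
```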
ONNX Runtime & Windows AI Foundry
ONNX Runtime offers an open-source engine for running AI models, optimized for Snapdragon processors. However, full support for generative AI on NPUs is still in beta. Meanwhile, Windows AI Foundry provides APIs and pre-built models like Phi-Silica for local inference, currently in preview.
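For a sense of how ONNX Runtime reaches the NPU, a minimal sketch is shown below. It assumes the onnxruntime-qnn package on a Windows on Arm device and uses a placeholder model path and input shape; unsupported operators fall back to the CPU provider.

```python
import numpy as np
import onnxruntime as ort

# Prefer the QNN execution provider (NPU) and fall back to CPU for anything
# the NPU backend cannot handle. "QnnHtp.dll" selects the Hexagon/HTP backend.
session = ort.InferenceSession(
    "model.onnx",  # placeholder path to an ONNX model
    providers=[
        ("QNNExecutionProvider", {"backend_path": "QnnHtp.dll"}),
        "CPUExecutionProvider",
    ],
)

input_name = session.get_inputs()[0].name
dummy_input = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = session.run(None, {input_name: dummy_input})
print(outputs[0].shape)
```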
AI Toolkit for Visual Studio Code
Developers can use this extension to experiment with AI models from Azure AI Foundry and Hugging Face. While CPU-optimized models are available now, NPU support is on the horizon, starting with DeepSeek R1.
Why It Matters: Efficient, Local AI Inference
Running AI locally on NPUs reduces latency, enhances privacy, and cuts cloud costs. Qualcomm AI Hub stands out as the most developer-friendly solution today, offering easy setup and robust hardware utilization.
“Qualcomm AI Hub is the most user-friendly and well-supported solution available at this time.”
Microsoft’s Surface team demonstrated this by running the Phi-3.5 model on a Surface Laptop 13-inch with Snapdragon X Plus. Using precompiled QNN binaries, they achieved efficient local inference with minimal setup complexity.
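The post doesn't include the Surface team's code, but as a loose sketch of what local text generation looks like with the onnxruntime-genai package: the model folder, prompt, and search options below are placeholders, and exact call names can vary between package versions.

```python
import onnxruntime_genai as og

# Load a locally downloaded, pre-optimized model folder (path is a placeholder).
model = og.Model("models/phi-3.5-mini")
tokenizer = og.Tokenizer(model)

params = og.GeneratorParams(model)
params.set_search_options(max_length=256)

generator = og.Generator(model, params)
generator.append_tokens(tokenizer.encode("Explain why NPUs help local inference."))

# Generate one token at a time until the model signals it is finished.
while not generator.is_done():
    generator.generate_next_token()

print(tokenizer.decode(generator.get_sequence(0)))
```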
Getting Started: What You Need to Know
- Check your device’s NPU model via Device Manager.
- Use Python 3.11 and Visual Studio Code for best compatibility.
- Create a Python virtual environment before starting development.
- Download model variants from Qualcomm AI Hub tailored to your SoC (see the device-listing sketch after this list).
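If you have the qai-hub package installed and an API token configured, a quick way to see which target devices match your SoC (and therefore which model variants to download) is roughly the following; the printed fields follow the qai_hub client's Device object as documented.

```python
import qai_hub as hub

# List the devices Qualcomm AI Hub can target; pick the entry matching the
# Snapdragon SoC your laptop reports so you grab the right model variant.
for device in hub.get_devices():
    print(device.name, device.attributes)
```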
In short, Microsoft’s integration of SLMs with NPUs on Copilot+ PCs is a game-changer for edge AI. It empowers developers to build smarter, faster, and more efficient AI experiences right on their laptops.
From the New blog articles in Microsoft Community Hub