Microsoft leverages Neural Processing Units (NPUs) in Surface Copilot+ PCs to run Small Language Models (SLMs) efficiently for generative AI tasks. Using Qualcomm AI Hub and Snapdragon X Plus processors, this approach enables fast, low-power local AI inference, improving performance and usability on edge devices.

Unlocking Gen AI Power on Copilot+ PCs with NPUs
Generative AI is evolving fast, and Microsoft’s Copilot+ PCs are leading the charge. While large language models (LLMs) grab headlines, small language models (SLMs) are quietly revolutionizing AI at the edge. These compact models run efficiently on local devices, consuming less power and delivering faster results.
What’s New: Small Language Models Meet NPUs
SLMs shine when paired with Neural Processing Units (NPUs), specialized chips designed for AI workloads. Microsoft’s Surface Copilot+ laptops, powered by Qualcomm Snapdragon X Plus processors, leverage NPUs to run Gen AI tasks smoothly—even on battery.
“SLMs and NPUs together support running powerful Gen AI workloads efficiently on a laptop, even when multitasking.”
This combination means real-time AI applications become practical without cloud dependency or heavy power drains. The Qualcomm AI Hub plays a crucial role by converting AI models into optimized binaries tailored for these NPUs.
Major Updates: Tools and Frameworks for AI at the Edge
Qualcomm AI Hub
This platform simplifies deploying AI models on edge devices. It converts PyTorch, TensorFlow, or ONNX models into QNN binaries ready to run on NPUs. With over 175 pre-optimized models, developers get a head start on integrating AI locally.
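As a rough illustration of that workflow, the sketch below uses the qai_hub Python package to trace a small PyTorch model and submit a compile job; the device name, example model, and compile options are illustrative assumptions, so check Qualcomm AI Hub's documentation for the exact targets and options available for your SoC.

```python
import qai_hub as hub
import torch
import torchvision

# Trace a small PyTorch model to TorchScript; AI Hub accepts traced PyTorch,
# TensorFlow, or ONNX models as input.
model = torchvision.models.mobilenet_v3_small(weights="DEFAULT").eval()
example_input = torch.rand(1, 3, 224, 224)
traced = torch.jit.trace(model, example_input)

# Submit a compile job targeting a Snapdragon X device so AI Hub produces a
# QNN binary ready to run on the NPU (device name and options are illustrative).
compile_job = hub.submit_compile_job(
    model=traced,
    device=hub.Device("Snapdragon X Elite CRD"),
    input_specs=dict(image=(1, 3, 224, 224)),
    options="--target_runtime qnn_context_binary",
)

# Download the optimized binary once compilation finishes.
target_model = compile_job.get_target_model()
target_model.download("mobilenet_v3_small.bin")
```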
ONNX Runtime & Windows AI Foundry
ONNX Runtime offers an open-source engine for running AI models, optimized for Snapdragon processors. However, full support for generative AI on NPUs is still in beta. Meanwhile, Windows AI Foundry provides APIs and pre-built models like Phi-Silica for local inference, currently in preview.
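For a sense of how ONNX Runtime reaches the NPU, a minimal sketch is shown below. It assumes the onnxruntime-qnn package on a Windows on Arm device and uses a placeholder model path and input shape; unsupported operators fall back to the CPU provider.

```python
import numpy as np
import onnxruntime as ort

# Prefer the QNN execution provider (NPU) and fall back to CPU for anything
# the NPU backend cannot handle. "QnnHtp.dll" selects the Hexagon/HTP backend.
session = ort.InferenceSession(
    "model.onnx",  # placeholder path to an ONNX model
    providers=[
        ("QNNExecutionProvider", {"backend_path": "QnnHtp.dll"}),
        "CPUExecutionProvider",
    ],
)

input_name = session.get_inputs()[0].name
dummy_input = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = session.run(None, {input_name: dummy_input})
print(outputs[0].shape)
```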
AI Toolkit for Visual Studio Code
Developers can use this extension to experiment with AI models from Azure AI Foundry and Hugging Face. While CPU-optimized models are available now, NPU support is on the horizon, starting with DeepSeek R1.
Why It Matters: Efficient, Local AI Inference
Running AI locally on NPUs reduces latency, enhances privacy, and cuts cloud costs. Qualcomm AI Hub stands out as the most developer-friendly solution today, offering easy setup and robust hardware utilization.
“Qualcomm AI Hub is the most user-friendly and well-supported solution available at this time.”
Microsoft’s Surface team demonstrated this by running the Phi-3.5 model on a Surface Laptop 13-inch with Snapdragon X Plus. Using precompiled QNN binaries, they achieved efficient local inference with minimal setup complexity.
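The post doesn't include the Surface team's code, but as a loose sketch of what local text generation looks like with the onnxruntime-genai package: the model folder, prompt, and search options below are placeholders, and exact call names can vary between package versions.

```python
import onnxruntime_genai as og

# Load a locally downloaded, pre-optimized model folder (path is a placeholder).
model = og.Model("models/phi-3.5-mini")
tokenizer = og.Tokenizer(model)

params = og.GeneratorParams(model)
params.set_search_options(max_length=256)

generator = og.Generator(model, params)
generator.append_tokens(tokenizer.encode("Explain why NPUs help local inference."))

# Generate one token at a time until the model signals it is finished.
while not generator.is_done():
    generator.generate_next_token()

print(tokenizer.decode(generator.get_sequence(0)))
```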
Getting Started: What You Need to Know
- Check your device’s NPU model via Device Manager.
- Use Python 3.11 and Visual Studio Code for best compatibility.
- Create a Python virtual environment before starting development.
- Download model variants from Qualcomm AI Hub tailored to your SoC (see the device-listing sketch after this list).
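If you have the qai-hub package installed and an API token configured, a quick way to see which target devices match your SoC (and therefore which model variants to download) is roughly the following; the printed fields follow the qai_hub client's Device object as documented.

```python
import qai_hub as hub

# List the devices Qualcomm AI Hub can target; pick the entry matching the
# Snapdragon SoC your laptop reports so you grab the right model variant.
for device in hub.get_devices():
    print(device.name, device.attributes)
```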
In short, Microsoft’s integration of SLMs with NPUs on Copilot+ PCs is a game-changer for edge AI. It empowers developers to build smarter, faster, and more efficient AI experiences right on their laptops.
From the New blog articles in Microsoft Community Hub