MS Ai Insider

New blog articles in Microsoft Community Hub

Microsoft Launches KAITO on AKS for Azure Local: Simplifying Open-Source LLM Deployment on Edge Kubernetes Clusters

Posted by ailona, May 20, 2025

Microsoft announces the public preview of KAITO on AKS for Azure Local, enabling deployment of open-source large language models (LLMs) on edge Kubernetes clusters. The update simplifies LLM deployment with GPU recommendations, model validation, and monitoring, bringing more AI capability to the edge.

Deploy Open-Source Large Language Models with KAITO on AKS on Azure Local

Microsoft has released a refreshed public preview of KAITO (Kubernetes AI Toolchain Operator) on AKS on Azure Local, aimed at making LLM deployment much easier. If you work with Kubernetes and edge AI, this update removes several long-standing pain points.

What’s New with KAITO on AKS?

KAITO can now be enabled as a cluster extension on AKS enabled by Azure Arc. You can add it during cluster creation or later with the Az CLI. The enablement flow matches the AKS experience in the cloud, so there are no surprises.

Microsoft also addressed common deployment headaches. For example, KAITO recommends a suitable GPU SKU and validates models to prevent out-of-memory (OOM) errors, which saves a lot of trial and error when running large models locally.
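To make the OOM point concrete, here is a rough sizing heuristic of the kind such validation has to perform. This is a minimal sketch, not KAITO's actual logic: it assumes weight memory is simply parameter count times bytes per parameter, and the 20% headroom for KV cache and runtime buffers (overhead_fraction) is an illustrative assumption. The function names are hypothetical.

```python
# Hypothetical helpers: a back-of-the-envelope check of whether a model fits
# in GPU memory. This is NOT KAITO's validation logic, only an illustration.

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1, "int4": 0.5}


def weight_memory_gb(num_params: float, dtype: str = "fp16") -> float:
    """Memory needed for the weights alone, in decimal gigabytes (1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def fits_on_gpu(num_params: float, gpu_memory_gb: float,
                dtype: str = "fp16", overhead_fraction: float = 0.2) -> bool:
    """Assume ~20% extra headroom for KV cache, activations and runtime buffers."""
    needed = weight_memory_gb(num_params, dtype) * (1 + overhead_fraction)
    return needed <= gpu_memory_gb


if __name__ == "__main__":
    # Phi-4 has roughly 14 billion parameters.
    print(f"{weight_memory_gb(14e9):.1f} GB")   # 28.0 GB
    print(fits_on_gpu(14e9, gpu_memory_gb=24))  # False
    print(fits_on_gpu(14e9, gpu_memory_gb=48))  # True
```

By this estimate, a 14-billion-parameter model such as Phi-4 needs about 28 GB for FP16 weights alone, so it will not fit on a single 24 GB GPU without quantization or splitting across devices.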

“The seamless enablement experience makes it easy to get started with LLM deployment and fully consistent with AKS in the cloud.”

Major Updates: Deploy, Fine-Tune, and Monitor

With KAITO, deploying open-source LLMs such as Phi-4, Mistral, or Qwen comes down to writing a YAML manifest. Write it in Visual Studio Code or any other editor, then apply it with kubectl on a cluster with supported GPUs. You can also bring your own models, either from Hugging Face or as private weights.
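As an illustration of the manifest-driven flow, the sketch below builds a KAITO Workspace manifest in Python and prints it as JSON, which kubectl apply -f accepts just like YAML. The field layout follows the kaito.sh/v1alpha1 Workspace examples in the KAITO OSS repo as far as we know; the workspace name, node label, and preset name are hypothetical placeholders, and the exact resource fields used on Azure Local may differ, so check the product docs and the supported-model list before applying it.

```python
import json


def workspace_manifest(name: str, preset: str, node_label: dict) -> dict:
    """Build a KAITO Workspace manifest as a plain dict.

    Assumption: field names follow the kaito.sh/v1alpha1 examples from the
    KAITO OSS repo; verify against the CRD version installed on your cluster.
    """
    return {
        "apiVersion": "kaito.sh/v1alpha1",
        "kind": "Workspace",
        "metadata": {"name": name},
        "resource": {
            # Assumption: target existing GPU nodes by label.
            "labelSelector": {"matchLabels": node_label},
        },
        "inference": {"preset": {"name": preset}},
    }


if __name__ == "__main__":
    manifest = workspace_manifest(
        name="workspace-phi",                  # hypothetical workspace name
        preset="phi-3.5-mini-instruct",        # check the supported presets
        node_label={"apps": "llm-inference"},  # hypothetical node label
    )
    # JSON is valid input for kubectl apply -f, so this can be piped directly.
    print(json.dumps(manifest, indent=2))
```

Saved as, say, make_workspace.py (a hypothetical file name), it can be applied with python make_workspace.py | kubectl apply -f -.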

Fine-tuning is also supported through Parameter-Efficient Fine-Tuning (PEFT) methods such as LoRA and QLoRA. This lets you customize base models directly on your edge Kubernetes cluster without a complex setup.
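Why PEFT is practical on edge hardware comes down to simple arithmetic. LoRA freezes a weight matrix W of shape d_out × d_in and trains only two low-rank factors, B (d_out × r) and A (r × d_in), so the trainable count drops from d_out·d_in to r·(d_in + d_out). The sketch below works this out for a single 4096 × 4096 projection at rank 8; these dimensions are illustrative, not taken from a specific model.

```python
def full_params(d_in: int, d_out: int) -> int:
    """Trainable parameters when fine-tuning the full d_out x d_in matrix."""
    return d_in * d_out


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters for LoRA: B is d_out x rank, A is rank x d_in."""
    return d_out * rank + rank * d_in


if __name__ == "__main__":
    d, r = 4096, 8  # illustrative dimensions, not a specific model
    full = full_params(d, d)
    lora = lora_params(d, d, r)
    print(full)                         # 16777216
    print(lora)                         # 65536
    print(f"{100 * lora / full:.2f}%")  # 0.39%
```

Only about 0.39% of that matrix's parameters are trained. QLoRA goes further by also quantizing the frozen base weights to 4 bits, which shrinks the memory needed to hold the base model during fine-tuning.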

Inference monitoring also got a boost. KAITO uses vLLM as its default inference runtime, and you can visualize its performance with Azure Managed Prometheus and Grafana. A few configuration steps route the metrics to Azure Monitor, giving you real-time insights.
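Because vLLM exposes its metrics in the Prometheus text format, you can sanity-check them before wiring up dashboards. The stdlib-only parser below is a minimal sketch that assumes simple exposition lines (no escaped quotes in label values). The sample text is hypothetical, and metric names such as vllm:num_requests_running vary between vLLM versions. In practice the text would come from the inference pod's /metrics endpoint, for example reached through kubectl port-forward.

```python
import re

# A sample line looks like: metric_name{label="value",...} 1.0 [timestamp]
LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')

# Hypothetical scrape output; real names and labels depend on the vLLM version.
SAMPLE = """\
# HELP vllm:num_requests_running Number of requests currently running.
# TYPE vllm:num_requests_running gauge
vllm:num_requests_running{model_name="phi-4"} 3.0
vllm:num_requests_waiting{model_name="phi-4"} 1.0
vllm:gpu_cache_usage_perc{model_name="phi-4"} 0.42
process_cpu_seconds_total 12.5
"""


def parse_metrics(text: str, prefix: str = "vllm:") -> list:
    """Return (name, labels, value) tuples for samples whose name starts with prefix.

    Skips comment lines (# HELP / # TYPE) and ignores any trailing timestamp.
    Does not handle escaped quotes or '}' inside label values.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = LINE_RE.match(line)
        if match is None or not match.group(1).startswith(prefix):
            continue
        labels = dict(LABEL_RE.findall(match.group(2) or ""))
        samples.append((match.group(1), labels, float(match.group(3))))
    return samples


if __name__ == "__main__":
    for name, labels, value in parse_metrics(SAMPLE):
        print(name, labels, value)
```

For anything beyond a quick check, a dedicated Prometheus client or the managed Prometheus pipeline is the better tool; this only shows what the raw endpoint contains.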

Compare and Evaluate Models Easily

The AI Toolkit extension for Visual Studio Code lets you test and compare LLMs side-by-side. Use built-in evaluators for coherence, fluency, and relevance to pick the best model for your edge use case.

“We’re aiming to supercharge innovation around LLMs and streamline the journey from ideation to real-world impact.”

Why It Matters

Edge AI is growing fast, and use cases such as pipeline leak detection, factory optimization, and GenAI assistants need models running locally to meet latency and compliance requirements. KAITO on AKS on Azure Local simplifies deploying powerful LLMs right where the data lives.

Getting started is just one command away, making it accessible for developers and enterprises alike. Whether you’re experimenting or scaling production, KAITO offers a unified, cloud-consistent experience.

Ready to Dive In?

Explore the KAITO Jumpstart Drops and product docs to start deploying your own LLMs on edge Kubernetes clusters today. Microsoft encourages feedback via the KAITO OSS repo, so don’t hesitate to share your thoughts.

Stay tuned for more updates as KAITO evolves and continues to empower AI at the edge.

KAITO integrates as a cluster extension on AKS enabled by Azure Arc, streamlining cluster creation and day-2 operations.

Supports deployment and fine-tuning of various preset and custom LLMs using simple YAML configurations and kubectl commands.

AI Toolkit extension in Visual Studio Code allows side-by-side LLM comparison with built-in evaluators for coherence, fluency, and relevance.

Inference metrics can be monitored and visualized via Azure Managed Prometheus and Grafana dashboards for enhanced observability.

Use cases include edge AI scenarios like pipeline leak detection, factory optimization, and GenAI assistants, addressing latency and regulatory needs.
