[Offline AI Automation with Foundry Local] - Ailona Lab: The Autonomous Endpoint

Guide to building offline AI automation using Foundry Local, Microsoft Agent Framework, and PyBullet. It outlines a four-agent pipeline that converts natural language to validated JSON plans, executes physics-driven robot actions locally, supports voice input, and compares model performance for real-time control.

Foundry Local and the Microsoft Agent Framework enable on-device AI automation for robotics and simulation. The project runs an OpenAI-compatible LLM, multi-agent orchestration, and PyBullet physics entirely offline. This delivers low-latency, private inference and reproducible simulation without cloud costs.

Main feature/change and impact

Foundry Local provides an OpenAI-compatible endpoint that runs models on-device. The system pairs this endpoint with the Microsoft Agent Framework and PyBullet. A PlannerAgent generates structured JSON plans from natural language. SafetyAgent validates plans against workspace bounds and schemas. ExecutorAgent maps actions to PyBullet calls and executes validated plans in real time. This architecture reduces latency, eliminates external API dependencies, and preserves data privacy for robotics workloads.

Practical implications

Developers can run the entire pipeline on a laptop or edge device with no internet. The SDK auto-selects hardware backends: CUDA GPU, QNN NPU, then CPU. Small models like qwen2.5-coder-0.5b produce valid JSON plans in about five seconds. The constrained JSON schema lets compact models handle planning reliably. Voice input uses local Whisper transcription, keeping audio local. The system supports rapid iteration and swapping executor implementations.

“No API calls leave your machine. No token costs accumulate. No internet connection is needed.”

The implementation separates language reasoning from motion execution through a tool-call API. The LLM outputs high-level tool calls such as pick and move_ee, never joint angles. The executor translates tool calls to inverse kinematics and gripper control in PyBullet. This abstraction allows swapping the executor without re-prompting or retraining the model. The safety layer validates plans independently of kinematics. Getting started requires installing Foundry Local, downloading models, and launching the web UI. The repository includes setup scripts and examples for pick, describe, and move commands. Adding new actions or agents follows documented patterns in action_schema.py and src/agents/. For performance, qwen2.5-coder-0.5b is recommended for interactive control. Next steps are to test the pipeline on target hardware and extend the agent set for domain needs. Validate safety schemas for your workspace and instrument latency metrics. Consider adding VisionAgent or CostEstimatorAgent for perception and planning cost awareness.

Key points from the article:

Foundry Local provides an on-device OpenAI-compatible endpoint.

Four agents handle planning, safety validation, execution, and narration.

LLM outputs constrained JSON to bridge language and motion.

Executor maps actions to PyBullet inverse kinematics and grasp sequences.

Qwen2.5-coder-0.5b offers fastest interactive performance in tests.

Related Coverage:

From the Microsoft Developer Community Blog articles

Main feature/change and impact

Practical implications

Key points from the article:

Related Coverage:

Share this:

Related