oBeaver is a local inference toolkit that runs LLMs on-device using ONNX Runtime and Foundry Local. It supports macOS, Windows, Linux, NPU acceleration, model conversion from Hugging Face, OpenAI-compatible API, embedding and vision-language models, Docker deployment, and a web dashboard.
oBeaver is a local inference toolkit that runs LLMs on your machine using ONNX Runtime and Foundry Local. It adds dual-engine support for wide platform coverage and NPU acceleration without cloud dependency.
Main feature/change and impact
oBeaver introduces a dual-engine architecture combining Foundry Local and ONNX Runtime GenAI. This ensures one-command model runs on macOS and Windows via Foundry Local. ORT GenAI provides Linux support, broader model compatibility, and NPU acceleration. The change reduces cloud dependency and enables device-native acceleration across CPUs, GPUs, and NPUs with minimal developer friction.
Practical implications
Developers can convert Hugging Face models to ONNX with a single command. oBeaver exposes OpenAI-compatible HTTP endpoints for seamless integration. It supports embeddings and vision-language models natively. Docker images enable headless deployment in CI/CD and Kubernetes. Existing OpenAI clients and frameworks work unchanged by swapping base_url.
“local inference shouldn’t be an island; it should fit seamlessly into your existing dev workflow.”
oBeaver matters if you need local control, NPU acceleration, or cross-platform support for LLMs. Next steps: install Python 3.12+, run obeaver init, and test with obeaver run or obeaver serve.
Key points from the article:
- Foundry Local provides zero-friction NPU-first local inference on macOS and Windows.
- ORT engine supports Linux, ONNX conversion, broad model library and NPUs.
- oBeaver offers OpenAI-compatible HTTP API for easy integration with existing clients.
- Built-in convert command automates Hugging Face to ONNX conversion in one step.
- Includes Docker and dashboard for deployment, monitoring, and developer evaluation.
Related Coverage:
From the Microsoft Developer Community Blog articles
