Posted in

oBeaver: Local LLM Inference with ONNX

oBeaver is a local inference toolkit that runs LLMs on-device using ONNX Runtime and Foundry Local. It supports macOS, Windows, Linux, NPU acceleration, model conversion from Hugging Face, OpenAI-compatible API, embedding and vision-language models, Docker deployment, and a web dashboard.

oBeaver is a local inference toolkit that runs LLMs on your machine using ONNX Runtime and Foundry Local. It adds dual-engine support for wide platform coverage and NPU acceleration without cloud dependency.

Main feature/change and impact

oBeaver introduces a dual-engine architecture combining Foundry Local and ONNX Runtime GenAI. This ensures one-command model runs on macOS and Windows via Foundry Local. ORT GenAI provides Linux support, broader model compatibility, and NPU acceleration. The change reduces cloud dependency and enables device-native acceleration across CPUs, GPUs, and NPUs with minimal developer friction.

Practical implications

Developers can convert Hugging Face models to ONNX with a single command. oBeaver exposes OpenAI-compatible HTTP endpoints for seamless integration. It supports embeddings and vision-language models natively. Docker images enable headless deployment in CI/CD and Kubernetes. Existing OpenAI clients and frameworks work unchanged by swapping base_url.

“local inference shouldn’t be an island; it should fit seamlessly into your existing dev workflow.”

oBeaver matters if you need local control, NPU acceleration, or cross-platform support for LLMs. Next steps: install Python 3.12+, run obeaver init, and test with obeaver run or obeaver serve.

Key points from the article:

  • Foundry Local provides zero-friction NPU-first local inference on macOS and Windows.
  • ORT engine supports Linux, ONNX conversion, broad model library and NPUs.
  • oBeaver offers OpenAI-compatible HTTP API for easy integration with existing clients.
  • Built-in convert command automates Hugging Face to ONNX conversion in one step.
  • Includes Docker and dashboard for deployment, monitoring, and developer evaluation.

Related Coverage:

From the Microsoft Developer Community Blog articles