Building Your First Local RAG Application with Foundry Local

A guide to building a fully offline, browser-based RAG application using Foundry Local, Node.js, SQLite, and TF-IDF. It covers the architecture, ingestion, local model management, the retrieval pipeline, trade-offs against CAG, deployment steps, and a minimal sample app for field-ready support agents that never call the cloud.

This post explains a new pattern and toolkit for fully offline RAG apps using Foundry Local. It outlines the architecture, stack, and a step-by-step developer path for a local, browser-based support agent.

Main feature/change and impact

Foundry Local enables on-device model hosting with no cloud calls, API keys, or outbound network traffic. Developers can run Phi-3.5 Mini locally via native SDK bindings for in-process inference. The impact is predictable latency, no external dependencies after model download, and verifiable data provenance for retrieval-augmented responses.

Practical implications

The sample app uses a SQLite vector store with TF-IDF and cosine similarity, avoiding embedding models. This keeps ingestion and retrieval fully offline and deterministic. The runtime supports CPU or NPU, so standard laptops and edge devices can run the agent without GPUs. Runtime document uploads and per-chunk source attribution remain supported.
The architecture reduces operational complexity while preserving RAG benefits like source attribution and lower hallucination risk. For prototypes, use the included Node.js + Express server and single-file front end to iterate quickly. Next steps: ingest your documents, run npm start, and validate answers against source chunks.
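
The TF-IDF and cosine-similarity retrieval described above can be illustrated with a short sketch. This is not the sample app's code: the function names (`tokenize`, `buildIdf`, `vectorize`, `cosine`, `retrieve`), the smoothed IDF formula, and the in-memory array standing in for the SQLite store are all assumptions made for illustration.

```javascript
// Minimal TF-IDF retrieval sketch. All names here are hypothetical,
// and a plain array stands in for the SQLite-backed chunk store.

// Lowercase the text and split on anything that is not a letter or digit.
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Compute an inverse document frequency for every term in the corpus.
function buildIdf(chunks) {
  const docFreq = new Map();
  for (const chunk of chunks) {
    for (const term of new Set(tokenize(chunk.text))) {
      docFreq.set(term, (docFreq.get(term) || 0) + 1);
    }
  }
  const idf = new Map();
  for (const [term, df] of docFreq) {
    // Smoothed IDF (an assumption) so very common terms keep a small weight.
    idf.set(term, Math.log((1 + chunks.length) / (1 + df)) + 1);
  }
  return idf;
}

// Build a sparse TF-IDF vector (term -> weight). Unknown terms are dropped.
function vectorize(text, idf) {
  const tokens = tokenize(text);
  const vec = new Map();
  for (const term of tokens) {
    if (idf.has(term)) vec.set(term, (vec.get(term) || 0) + 1);
  }
  for (const [term, count] of vec) {
    vec.set(term, (count / tokens.length) * idf.get(term));
  }
  return vec;
}

// Cosine similarity between two sparse vectors; 0 if either is empty.
function cosine(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (const [term, weight] of a) {
    normA += weight * weight;
    if (b.has(term)) dot += weight * b.get(term);
  }
  for (const weight of b.values()) normB += weight * weight;
  return normA && normB ? dot / Math.sqrt(normA * normB) : 0;
}

// Score every chunk against the query and return the best matches,
// keeping each chunk's `source` field so answers can cite it.
function retrieve(query, chunks, topK = 3) {
  const idf = buildIdf(chunks);
  const queryVec = vectorize(query, idf);
  return chunks
    .map((chunk) => ({ ...chunk, score: cosine(queryVec, vectorize(chunk.text, idf)) }))
    .filter((chunk) => chunk.score > 0)
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

Because the scores depend only on term counts, the same query over the same chunks always yields the same ranking, which is the deterministic behavior noted above. A real store would likely precompute the chunk vectors at ingestion time rather than on every query.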

Key points from the article:

  • Foundry Local runs models on-device with no outbound network calls.
  • RAG uses chunking, TF-IDF vectors, and cosine similarity for retrieval.
  • SQLite provides a zero-infrastructure vector store via better-sqlite3.
  • CAG is simpler but limited to small, static document sets.
  • Sample app uses Node.js, Express, single-file frontend, and runtime ingestion.

Related coverage: from the Microsoft Developer Community Blog.
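
The chunking step named in the key points could be sketched as follows. The `chunkDocument` helper, its word-based windows, and the default sizes are hypothetical choices for illustration. The article's actual chunking parameters are not stated here.

```javascript
// Split a document into overlapping word windows, tagging each chunk with
// its source so retrieved text can be attributed. The function name and
// the default sizes are assumptions, not values from the sample app.
function chunkDocument(source, text, chunkSize = 200, overlap = 25) {
  if (overlap >= chunkSize) {
    throw new RangeError("overlap must be smaller than chunkSize");
  }
  const words = text.split(/\s+/).filter(Boolean);
  const step = chunkSize - overlap;
  const chunks = [];
  for (let start = 0; start < words.length; start += step) {
    chunks.push({
      source,
      index: chunks.length,
      text: words.slice(start, start + chunkSize).join(" "),
    });
    // Stop once a window has reached the end of the document.
    if (start + chunkSize >= words.length) break;
  }
  return chunks;
}
```

Overlapping windows reduce the chance that a sentence relevant to a query is split across two chunks and missed by both. Each returned object could then be written as one row in the SQLite store.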