Bing is enhancing its search capabilities by transitioning to Large Language Models (LLMs) and Small Language Models (SLMs), optimizing performance with Nvidia’s TensorRT-LLM. This integration significantly reduces latency and costs while improving search accuracy and the user experience. The move promises faster, more precise search results, paving the way for future innovations.

Bing’s Transition to LLM/SLM Models: A New Era in Search Technology
Bing is redefining search technology by integrating Large Language Models (LLMs) and Small Language Models (SLMs). This transition marks a significant milestone in enhancing search capabilities. As search queries grow more complex, the need for more capable models is evident.
What’s New?
While transformer models have served their purpose, they often struggle with efficiency at scale. The introduction of SLMs offers a remarkable ~100x throughput improvement over LLMs. This change allows Bing to process search queries faster while understanding them with greater precision.
“We will not compromise on quality for speed.”
Major Updates: Optimizing with TensorRT-LLM
Managing latency and cost has been a challenge with larger models. To tackle this, Bing has integrated Nvidia’s TensorRT-LLM into its workflow. This optimization tool enhances SLM inference performance significantly.
One key application of TensorRT-LLM is in the ‘Deep Search’ feature. This innovative approach leverages SLMs in real-time to deliver the best possible web results. Understanding user intent and ensuring the relevance of search results are crucial steps in this process.
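Tail latency, not the average, is what users feel, so search latency is typically tracked at a high percentile such as p95. A minimal, illustrative sketch of computing p95 per-batch latency with the nearest-rank method (the timing values here are made up for illustration, not production data):

```python
import math

def p95(latencies_s):
    """95th-percentile latency via the nearest-rank method."""
    ordered = sorted(latencies_s)
    rank = max(1, math.ceil(0.95 * len(ordered)))  # 1-based nearest rank
    return ordered[rank - 1]

# Illustrative per-batch timings in seconds; real values would come from telemetry.
batch_latencies = [3.1, 2.9, 3.0, 3.2, 4.8, 3.0, 2.8, 3.1, 3.3, 5.0]
print(f"p95 latency: {p95(batch_latencies):.2f}s per batch")
```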
Before optimization, the original Transformer model had a 95th percentile latency of 4.76 seconds per batch. After integrating TensorRT-LLM, latency was reduced to 3.03 seconds per batch, while throughput increased from 4.2 to 6.6 queries per second. This optimization not only enhances user experience but also reduces operational costs by 57%.
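As a quick sanity check, the relative improvements follow directly from the figures above (using only the values stated in this post):

```python
# Figures reported above for the Transformer model before/after TensorRT-LLM.
p95_before_s, p95_after_s = 4.76, 3.03   # seconds per batch
qps_before, qps_after = 4.2, 6.6         # queries per second

latency_reduction = (p95_before_s - p95_after_s) / p95_before_s
throughput_gain = qps_after / qps_before - 1

print(f"p95 latency reduced by {latency_reduction:.0%}")  # ~36%
print(f"throughput up {throughput_gain:.0%}")             # ~57%
```

Roughly a 36% latency cut and a 57% throughput gain, which lines up with the reported 57% reduction in operational cost per query served.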
Benefits for Users
The transition to SLM models and TensorRT-LLM brings several advantages:
- Faster Search Results: Users can enjoy quicker response times, making their search experience seamless.
- Improved Accuracy: Enhanced SLM capabilities deliver more accurate and contextualized search results.
- Cost Efficiency: Reduced costs allow Bing to invest in further innovations, keeping it at the forefront of search technology.
Looking Ahead
Bing is committed to refining its search technology. The transition to LLM and SLM models is just the beginning. Exciting advancements are on the horizon, and users can expect more updates as Bing continues to push the boundaries of search technology.
“We are excited about the future possibilities and look forward to sharing more advancements with you.”
From the Bing Blogs