Revolutionizing Search: Bing’s Transition to Language Models for Faster, More Accurate Results

Bing is enhancing its search capabilities by transitioning to Large Language Models (LLMs) and Small Language Models (SLMs), optimizing performance with Nvidia’s TensorRT-LLM. This integration significantly reduces latency and costs while improving search accuracy and user experience. The move promises faster, more precise search results, paving the way for future innovations.

Bing’s Transition to LLM/SLM Models: A New Era in Search Technology

Bing is redefining search technology by integrating Large Language Models (LLMs) and Small Language Models (SLMs). This transition marks a significant milestone in enhancing search capabilities. As search queries become more complex, the need for more powerful models is evident.

What’s New?

While transformer models have served their purpose, they often struggle with efficiency. The introduction of SLMs offers a remarkable ~100x throughput improvement over LLMs. This change allows Bing to process and understand search queries with greater precision.

“We will not compromise on quality for speed.”

Major Updates: Optimizing with TensorRT-LLM

Managing latency and cost has been a challenge with larger models. To tackle this, Bing has integrated Nvidia’s TensorRT-LLM into its workflow. This optimization tool enhances SLM inference performance significantly.

One key application of TensorRT-LLM is the ‘Deep Search’ feature, which runs SLMs in real time to deliver the best possible web results. Understanding user intent and ensuring the relevance of search results are crucial steps in this process.

Before optimization, the original Transformer model had a 95th percentile latency of 4.76 seconds per batch. After integrating TensorRT-LLM, latency was reduced to 3.03 seconds per batch, while throughput increased from 4.2 to 6.6 queries per second. This optimization not only enhances user experience but also reduces operational costs by 57%.
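The figures above are easier to compare as relative changes. The short sketch below only redoes the arithmetic on the numbers quoted in this post; the helper name `percent_change` and the constant names are illustrative, not anything from Bing's code.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Figures quoted in the post (95th percentile latency per batch, queries per second).
P95_LATENCY_BEFORE_S = 4.76
P95_LATENCY_AFTER_S = 3.03
THROUGHPUT_BEFORE_QPS = 4.2
THROUGHPUT_AFTER_QPS = 6.6

latency_change = percent_change(P95_LATENCY_BEFORE_S, P95_LATENCY_AFTER_S)
throughput_change = percent_change(THROUGHPUT_BEFORE_QPS, THROUGHPUT_AFTER_QPS)

print(f"p95 latency change: {latency_change:.1f}%")    # about -36.3%
print(f"throughput change:  {throughput_change:+.1f}%")  # about +57.1%
```

On these numbers, 95th percentile latency falls by roughly 36% and throughput rises by roughly 57%. The throughput gain happens to match the quoted 57% cost figure, though the post does not say how that figure was derived.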

Benefits for Users

The transition to SLMs and TensorRT-LLM brings several advantages:

  • Faster Search Results: Users can enjoy quicker response times, making their search experience seamless.
  • Improved Accuracy: Enhanced SLM capabilities deliver more accurate and contextualized search results.
  • Cost Efficiency: Reduced costs allow Bing to invest in further innovations, keeping it at the forefront of search technology.

Looking Ahead

Bing is committed to refining its search technology. The transition to LLMs and SLMs is just the beginning. Exciting advancements are on the horizon, and users can expect more updates as Bing continues to push the boundaries of search technology.

“We are excited about the future possibilities and look forward to sharing more advancements with you.”

  • Bing’s shift to SLMs offers a ~100x throughput improvement over LLMs.
  • TensorRT-LLM integration reduces model inference time and enhances user experience.
  • Before optimization, the original Transformer model had a 95th percentile latency of 4.76 seconds per batch.
  • Post-integration, 95th percentile latency dropped to 3.03 seconds per batch, with throughput increasing from 4.2 to 6.6 queries per second.
  • The transition allows Bing to invest in further innovations while ensuring cost efficiency.

From the Bing Blogs