Boosting Real-Time Conversations: Windows Developer Blog Introduces Llama 2 Support for DirectML and ONNX Runtime

The Windows Developer Blog has announced preview support for Llama 2 in DirectML, so developers can now run Llama 2 on Windows with DirectML and the ONNX Runtime. The accompanying sample shows progress with Llama 2 7B and relies on first optimizing the model with Olive, an optimization tool for ONNX models. After optimization, Llama 2 7B runs fast enough for real-time conversation on hardware from multiple vendors.

Preview Support for Llama 2 in DirectML: A New Milestone

Microsoft has announced preview support for Llama 2 in DirectML. At Inspire 2023, the company said developers would be able to run Llama 2 on Windows with DirectML and the ONNX Runtime; the preview follows up on that with a working sample.

“We now have a sample showing our progress with Llama 2 7B!”

What’s New?

The company has published a sample of its progress with Llama 2 7B on GitHub. The sample relies on an optimization pass with Olive, an optimization tool for ONNX models. Olive combines graph fusion optimizations from ONNX Runtime with a model architecture optimized for DirectML, which according to the announcement speeds up inference by up to 10X.

Real-time Conversations on Multiple Vendors’ Hardware

After optimization, Llama 2 7B runs fast enough to support real-time conversation on hardware from multiple vendors. Microsoft has also built a UI that shows the optimized model in action.
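One way to put a number on "fast enough" is to measure how many tokens per second a generation loop produces. The announcement does not quote a throughput figure or threshold, so the Python sketch below only shows the measurement itself. The names `tokens_per_second` and `fake_generate` are hypothetical, and `fake_generate` is a stand-in for a real model, not part of Microsoft's sample.

```python
import time
from typing import Callable, Iterable


def tokens_per_second(generate: Callable[[], Iterable[str]]) -> float:
    """Run one generation and return the observed tokens per second."""
    start = time.perf_counter()
    count = sum(1 for _ in generate())  # consume the stream, counting tokens
    elapsed = time.perf_counter() - start
    return count / elapsed if elapsed > 0 else float("inf")


def fake_generate() -> Iterable[str]:
    """Hypothetical stand-in for a model: yields 20 tokens with a short delay."""
    for i in range(20):
        time.sleep(0.001)  # simulated per-token latency
        yield f"tok{i}"


if __name__ == "__main__":
    print(f"{tokens_per_second(fake_generate):.1f} tokens/s")
```

Replacing `fake_generate` with a function that streams tokens from the optimized model would give a comparable figure before and after optimization.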

Partnerships and Support

Microsoft thanked the hardware partners who contributed to this work. Llama 2 has been optimized to run on hardware from AMD, Intel, and NVIDIA.

“Thank you to our hardware partners who helped make this happen.”

Getting Started with Llama 2

Developers who want to try Llama 2 must first request access to the Llama 2 weights from Meta. Microsoft recommends upgrading to the latest drivers for the best performance; AMD, Intel, and NVIDIA have all released optimized graphics drivers for this workload.
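As a rough illustration of what running an ONNX model through DirectML looks like from Python, the sketch below prefers ONNX Runtime's DirectML execution provider when it is available and falls back to the CPU otherwise. It is not the code from Microsoft's sample. The helper names `choose_providers` and `open_session` are hypothetical, and the sketch assumes the `onnxruntime-directml` package is installed and that a model has already been optimized and exported to ONNX.

```python
from typing import List, Sequence

# Execution provider names as registered in ONNX Runtime.
DML_PROVIDER = "DmlExecutionProvider"
CPU_PROVIDER = "CPUExecutionProvider"


def choose_providers(available: Sequence[str]) -> List[str]:
    """Prefer DirectML when present; always keep the CPU as a fallback."""
    providers = []
    if DML_PROVIDER in available:
        providers.append(DML_PROVIDER)
    providers.append(CPU_PROVIDER)
    return providers


def open_session(model_path: str):
    """Create an inference session for an already optimized ONNX model.

    Assumption: onnxruntime is imported lazily so the provider logic above
    can be used without the package installed.
    """
    import onnxruntime as ort

    providers = choose_providers(ort.get_available_providers())
    return ort.InferenceSession(model_path, providers=providers)
```

Calling `open_session` with the path to an optimized model file would then return a session that runs on the GPU through DirectML where possible. The input names and shapes the session expects depend on how the model was exported, so they are left out here.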

What’s Next?

This preview support for Llama 2 in DirectML is just the beginning. Microsoft has hinted at future enhancements to support larger models, fine-tuning, and lower-precision data types. Stay tuned for more exciting updates in this space.

Llama 2 can now be run on Windows with DirectML and the ONNX Runtime.

The sample shows progress with Llama 2 7B, which utilizes Olive for model optimization.

Olive is a powerful optimization tool for ONNX models, capable of speeding up inference times by up to 10X.

After optimization, Llama 2 7B can facilitate real-time conversation on multiple vendors’ hardware.

Optimized graphics drivers have been released by AMD, Intel, and NVIDIA for the best performance with Llama 2.

From the Windows Blog