Nvidia Claims Doubled Inference Performance with H100

September 11, 2023

Nvidia says its new open-source TensorRT-LLM software can dramatically boost the performance of large language models (LLMs) on its GPUs. According to the company, TensorRT-LLM doubles the performance of its H100 compute GPU when running the six-billion-parameter GPT-J LLM. Importantly, the software delivers this improvement without re-training the model.

Nvidia developed TensorRT-LLM specifically to speed up LLM inference, and performance graphs provided by Nvidia indeed show a 2X speed boost for its H100 that the company attributes to software optimizations. A standout feature of TensorRT-LLM is its in-flight batching technique, which addresses the dynamic and diverse workloads of LLMs, whose computational demands can vary greatly.
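
To make the idea behind in-flight batching more concrete, here is a minimal toy simulation in Python. It is not TensorRT-LLM code and does not use its API. The function names, the fixed number of batch slots, and the "one token per request per step" cost model are all simplifying assumptions made for illustration. The sketch compares a static scheme, where a batch runs until its longest request finishes, with a scheme where a finished request frees its slot right away for a waiting one.

```python
from collections import deque


def static_batching_steps(lengths, batch_size):
    """Decoding steps when each fixed batch runs until its longest request ends.

    `lengths` holds the number of output tokens each request needs
    (hypothetical values, assumed to be positive integers).
    """
    steps = 0
    for start in range(0, len(lengths), batch_size):
        batch = lengths[start:start + batch_size]
        steps += max(batch)  # short requests sit idle until the longest is done
    return steps


def in_flight_batching_steps(lengths, batch_size):
    """Decoding steps when finished requests are replaced immediately."""
    waiting = deque(lengths)
    active = []  # tokens still to generate for each running request
    steps = 0
    while waiting or active:
        # Fill any free slots with waiting requests before the next step.
        while waiting and len(active) < batch_size:
            active.append(waiting.popleft())
        # One step generates one token for every active request.
        steps += 1
        active = [remaining - 1 for remaining in active if remaining > 1]
    return steps


if __name__ == "__main__":
    # Hypothetical workload: a mix of short and long generations.
    output_lengths = [2, 8, 3, 8, 2, 2, 3, 2]
    print("static   :", static_batching_steps(output_lengths, batch_size=2))
    print("in-flight:", in_flight_batching_steps(output_lengths, batch_size=2))
```

In this toy model the in-flight variant needs fewer steps whenever short and long requests share a batch, because slots do not sit idle waiting for the longest request. Real schedulers also have to handle prompt processing, memory limits, and request arrival times, which this sketch ignores.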

Read More on Tom's Hardware