Exclusive: Databricks research confirms that Intel’s Gaudi bests Nvidia on price performance for AI accelerators
Nvidia isn’t the only company that makes AI accelerators for training and inference. According to new research, it’s a space where Intel is aggressively competing and excelling too, with its Intel Gaudi 2 technology.

Databricks conducted new research, released today, revealing that Intel Gaudi 2 provides strong performance competition against the industry-leading AI accelerators from Nvidia. The Databricks research found that for large language model (LLM) inference, Gaudi 2 matched the latency of Nvidia H100 systems on decoding and outperformed the Nvidia A100. The research also found that Gaudi 2 inference achieves higher memory bandwidth utilization than the H100 and A100.

Nvidia still provides more training performance on its top-end accelerators. Using the Databricks MosaicML LLM foundry for training, the researchers found that Gaudi 2 achieved the second-fastest single-node LLM training performance after the Nvidia H100, with more than 260 TFLOPS per chip. Overall, the Databricks research reported that based on public cloud pricing, Gaudi 2 has the best price-performance for both training and inference compared to the A100 and H100.

Intel has been providing its own testing results on Gaudi 2 via the MLCommons MLPerf benchmarks for both training and inference. The new data from Databricks provides further third-party validation for Intel on the performance of its Gaudi technology.

“We were impressed by the performance of Gaudi 2, especially the high utilization achieved for LLM inference,” Abhinav Venigalla, lead NLP architect at Databricks, told VentureBeat. “We anticipate further training and inference performance gains using Gaudi 2’s FP8 support, which is available in their latest software release. Due to time constraints, we were only able to examine performance using BF16.”

The Databricks performance numbers come as no surprise to Intel either. Eitan Medina, COO at Habana Labs, an Intel company, told VentureBeat that the report is consistent with the data that Intel measures and with feedback it gets from customers.

“It’s always good to get validation of what we say,” Medina said. “Since many people say that Gaudi is kind of Intel’s best-kept secret, it’s actually important to have these sorts of publication reviews being made available so more and more customers know that Gaudi is a viable alternative.”

Intel continues to post competitive gains for Gaudi

Intel acquired AI chip startup Habana Labs and its Gaudi technology back in 2019 for $2 billion and has been steadily improving the technology in the years since then.

One of the ways that vendors aim to prove performance is with industry-standard benchmarks. Both Nvidia and Intel routinely participate in the MLCommons MLPerf benchmarks for both training and inference, which are updated several times a year. In the latest MLPerf 3.1 training benchmarks released in November, both Nvidia and Intel claimed new LLM training speed records. Several months earlier, in September, the MLPerf 3.1 inference benchmarks were released, also showing solid competitive performance from both Nvidia and Intel.

While benchmarks like MLPerf and the report from Databricks are valuable, Medina noted that many customers rely on their own testing to make sure that the hardware and software stack works for a specific model and use case.

“The maturity of the software stack is incredibly important because people are suspicious of benchmarking organizations where vendors are kind of optimizing the heck out of meeting that specific benchmark,” he said.

According to Medina, MLPerf has its place, because people know that to submit results, a technology stack needs to pass a certain level of maturity. That said, he emphasized that MLPerf results are not something customers will rely on to make a business decision.

“MLPerf results are sort of a maturity filter that organizations use before they invest time in testing,” Medina said.

Gaudi 3 is coming in 2024

The new data on Gaudi 2 comes as Intel is preparing to launch the Gaudi 3 AI accelerator technology in 2024.

Gaudi 2 is built on a 7-nanometer process, while Gaudi 3 is based on a 5-nanometer process and will provide 4x the processing power and double the network bandwidth. Medina said that Gaudi 3 will be launched and in mass production in 2024.

“Gaudi 3 is a product that takes the Gaudi 2 and just delivers performance leadership,” Medina said. “It’s really a huge jump in performance that translates to advantages of performance per dollar and performance per watt.”

Looking beyond Gaudi 3, and likely into 2025, Intel is working on future generations that will converge the company’s high-performance computing (HPC) and AI accelerator technology. Intel also continues to see value in its CPU technologies for AI inference workloads, and recently announced its 5th Gen Xeon processors with AI acceleration.

“CPUs still have a significant percentage of inference, and even fine-tuning can be advantageous on CPUs,” Medina said. “CPUs are participating in the data preparation and, of course, are offered together with the Gaudi accelerator for workloads where the density of the compute for AI is extreme; so the overall strategy is to offer a range of solutions.”