Nvidia’s days of absolute dominance in AI could be numbered because of this key performance benchmark

Rodrigo Liang, CEO and co-founder of SambaNova Systems

SambaNova Systems

Nvidia faces competition in AI inference from startups like SambaNova, Groq, and Cerebras.Inference, the production stage of AI, is seen as a key market by these startups.Startups claim superior speed and efficiency for inference, but there are tradeoffs.

It’s difficult to chip away at a headstart that is essentially trillions of dollars strong, but that’s what Nvidia’s competitors are attempting. They may have the best chance of competing in a type of artificial intelligence computing called inference.

Inference is the production stage of AI computing. Once training a model is done, inference chips produce the outputs and complete tasks based on that training — whether that’s generating a picture or written answers to a prompt.

Rodrigo Liang cofounded SambaNova Systems in 2017 with the aim of going after Nvidia’s already obvious lead. But back then, the artificial intelligence ecosystem was even younger, and inference loads were tiny. With foundation models advancing in size and accuracy, the flip from training machine learning models to using them is coming into view.

Last month, Nvidia CFO Colleen Kress said the company’s data center workloads had reached 40% inference. Liang told Business Insider he expects 90% of AI computing workloads will be in inference in the not-too-distant future.

That’s why several startups are charging aggressively into the inference market — emphasizing where they might outperform the goliath in the space.

SambaNova uses a reconfigurable dataflow unit or RDU instead of Nvidia’s and AMD’s graphics processing units. Liang’s firm purports that its architecture is a better fit for machine learning models since it was designed for that purpose, and not for rendering graphics. It’s an argument that fellow Nvidia challenger Cerebras CEO Andrew Feldman references too.

Nvidia also agrees that inference is the larger market, according to Bernstein analysts who met with Nvidia CFO Colette Kress last week.

Kress believes Nvidia’s “offering is the best for inferencing given their networking strength, the liquid cooling offering, and their ARM CPU, all of which are essential for optimal inferencing,” the analysts wrote. Kress also noted that most of Nvidia’s inference revenue currently comes from recommender engines and search. Nvidia declined to comment for this report.

Liang said the inference market will begin to mature within roughly six months.

SambaNova uses a different architecture for it’s chip than Nvidia or AMD.

SambaNova

Startups are betting on computing speed

To pull customers away from Nvidia, newer players like Groq, Cerebras, and SambaNova are touting speed. In fact, Cerebras and SambaNova claim to offer the fastest inference computing in the world. And neither use GPUs, the type of chip leaders Nvidia and AMD promote.

According to SambaNova, its RDUs are ideal for agentic AI, which can complete functions without much instruction. Speed is an important factor when multiple AI models talk to each other and waiting for an answer can dampen the magic of generative AI.

But there isn’t just one measure of inference speed. The specifications of each model, such as Meta’s Llama, Anthropic’s Claude, or OpenAI’s o1, determine how fast results are generated.

Speed in AI computing results from several engineering factors that go beyond the chip itself. And the way chips are networked together can impact their performance, which means Nvidia chips in one data center may perform differently than the same chip in another data center.

The number of tokens per second that can be consumed (when a prompt goes in) and generated (when a response comes out) is a common metric for AI computing speed. Tokens are a base unit of data, where data could be pixels, words, audio, and beyond. But tokens per second don’t account for latency or lag — which can stem from multiple factors.

It is also difficult to make an apples-to-apples comparison between hardware as performance depends on how the hardware is set up and the software that runs it. Additionally, the models themselves are improving constantly.

In the hopes of advancing the market for inference faster, and edge their way into a market dominated by Nvidia, several newer hardware companies are trying different business models to bypass direct competition with Nvidia and go straight to the companies building AI.

SambaNova offers Meta’s open-source Llama foundation model through its cloud service and Cerebras and Groq have launched similar services. Thus, these companies are competing with both chip design companies like Nvidia and AI foundation model companies like OpenAI.

Artificialanalysis.ai provides public information comparing models that offer inference-as-a-service via API. On Wednesday, the site showed that Cerebras, SambaNova, and Groq were indeed the three fastest APIs for Met’a’s Llama 3.1 70B and 8B models.

Nvidia isn’t included in this comparison because the company doesn’t provide inference-as-a-service. MLPerf shares inference performance benchmarks for hardware computing speed. Nvidia is among the best performing in this dataset, but its startup competitors are not included.

“I think you’re gonna see this inference game open up for all these other alternatives in a much, much broader way than the pre-training market opened up,” Liang said. “Because it was so concentrated with very few players, Jensen could personally negotiate those deals in a way that, for startups, it’s hard to break up,” he continued.

Cerebras’s AI chip is roughly size of a dinner plate.

Cerebras

What’s the catch?

Semianalysis chief analyst Dylan Patel said chip buyers have to consider not just performance, but also all the advantages and expenses that make a difference over a chip’s lifetime. From his view, these startup chips can start to show cracks.

Patel told BI that “GPUs offer superior total cost of ownership per token.”

SambaNova and Cerebras disagree with this.

“There is typically a tradeoff when it comes to speed and cost. Higher inference speed can mean a larger hardware footprint, which in turn demands higher costs,” Liang said, adding that SambaNova makes up for this tradeoff by delivering speed and capacity with fewer chips and, therefore, lower costs.

Cerebras CEO Andrew Feldman disputed the view that GPUs have a lower total cost of ownership, saying “While GPU manufacturers may claim leadership in TCO, this is not a function of technology but rather the big bull horn they have,” he said.

Got a tip or an insight to share? Contact Senior Reporter Emma Cosgrove at ecosgrove@businessinsider.com or use the secure messaging app Signal: 443-333-9088

Read the original article on Business Insider

Eritrean News

Ministry of Labor and Social Welfare Activity Assessment Meeting

President Isaias Holds Talks with Special Chinese Delegation

An important National Initiative with Health, Nutrition, and Development Benefits

Haddas Ertra 21 December 2024

Eritrea Alhaditha 21 December 2024

Activity Assessment Meeting of NCEW Executive Committee

Ancient Relic Discovered in Egri-Mekel

Seminar for Nationals in the Netherlands

Haddas Ertra 20 December 2024

Seethe, Western elites: The world likes Russia despite your hate

News24 | Putin vows ‘destruction’ on Ukraine after Kazan drone attack

Orban blames migration for deadly attack in Germany

Christmas market attack suspect threatened act to ‘attract international attention’ in 2013

Nvidia’s days of absolute dominance in AI could be numbered because of this key performance benchmark

Startups are betting on computing speed

What’s the catch?

Like this:

More From Author

My December birthday used to be overshadowed by holiday celebrations. I make sure my kids’ birthday is celebrated.

Seethe, Western elites: The world likes Russia despite your hate

News24 | Putin vows ‘destruction’ on Ukraine after Kazan drone attack

+ There are no comments

Cancel reply

The Black Sea has a deadly naval mine problem that will long outlast the Ukraine war

US announces $375 million in military aid for Ukraine

You May Also Like:

Eritrean Nationalism: A Historical Journey Towards Identity and Independence.

U.S. foreign relations chair charged with corruption over Egypt ties

Mali, Niger and Burkina Faso sign new Sahel pact

In India, G20 set to welcome African Union as newest permanent member

South Africa’s DA says it remains a Taiwan ally as Tsai visits eSwatini

Gabon’s military coup leaders name Oligui Nguema to replace Bongo

Security Council approves Kenya-led mission in Haiti

Fayulu, still undeclared, warns of compromised Dr Congo elections

Eritrean News

Top Tagged

Startups are betting on computing speed

What’s the catch?

Share this:

Like this:

+ There are no comments

The Black Sea has a deadly naval mine problem that will long outlast the Ukraine war

US announces $375 million in military aid for Ukraine