Prices Plummet 70%: How Did the AI Computing Power Rental Bubble Burst?

Techub News
2024-10-22 17:39:51
Investing in a brand-new H100 GPU right now is likely to result in a loss. It makes sense only in special circumstances, such as being able to buy discounted H100s, having very low electricity costs, or selling an AI product competitive enough to keep the hardware busy.

Source: Eugene Cheah's Substack

Author: Eugene Cheah

Translation: J1N, Techub News

The decline in AI computing power costs will spark a wave of innovation among startups utilizing low-cost resources.

Last year, with AI computing power in tight supply, H100 rental prices climbed as high as $8 per hour; today the market is oversupplied and prices have fallen below $2 per hour. Companies that signed rental contracts early are reselling their reserved capacity rather than letting it sit idle, while most of the market has shifted to open-source models, cutting demand for training new ones. With H100 supply now far exceeding demand, renting is more cost-effective than buying, and investing in new H100s has become unprofitable.

A Brief History of the AI Competition

GPU rental prices skyrocketed: H100s initially rented for about $4.70 per hour and peaked at over $8. The driver was project founders racing to train their AI models quickly enough to secure the next funding round and persuade investors.

ChatGPT launched in November 2022 on the A100 series of GPUs. In March 2023, NVIDIA released the new H100 series, marketing it as delivering three times the performance of the A100 at only twice the price.

This was a huge attraction for AI startups, as the performance of GPUs directly determines the speed and scale of the AI models they can develop. The powerful performance of H100 means these companies can develop AI models that are faster, larger, and more efficient than before, potentially catching up to or surpassing industry leaders like OpenAI. Of course, all of this hinges on their ability to secure enough capital to purchase or rent a large number of H100s.

Given the H100's large performance gains and fierce competition in AI, many startups spent heavily to acquire H100s and accelerate their model training. The surge in demand drove H100 rental prices from the initial $4.70 per hour to over $8.

These startups were willing to pay high rental fees because they were eager to train their models quickly to attract investor attention in the next funding round, aiming for hundreds of millions of dollars to continue expanding their businesses.

For computing centers (GPU farms) holding large numbers of H100s, rental demand was so strong it was "money on the table": AI startups were desperate to rent H100s for training and were even willing to prepay. This let GPU farms lock in long-term rates of $4.70 per hour or higher.

Calculations showed that if GPUs could be rented out continuously at that price, the payback period on an H100 purchase (the time to recover its cost) would be under 1.5 years, after which each GPU could generate over $100,000 in net cash flow annually.
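As a rough check on that payback arithmetic, here is a minimal sketch (assuming the roughly $50,000 unit price cited later in this article, full utilization, and ignoring power and cooling costs):

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours

def payback_years(gpu_cost_usd: float, hourly_rate_usd: float,
                  utilization: float = 1.0) -> float:
    """Years of rental revenue needed to recover the purchase price."""
    annual_revenue = hourly_rate_usd * HOURS_PER_YEAR * utilization
    return gpu_cost_usd / annual_revenue

print(payback_years(50_000, 4.70))  # ~1.21 years at the $4.70 long-term rate
print(payback_years(50_000, 8.00))  # ~0.71 years at the $8 peak rate
```

Real payback runs longer once idle time and operating costs are included; the point is simply that at 2023 rates the headline math looked extremely attractive.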

Due to the sustained high demand for H100 and other high-performance GPUs, investors in GPU farms saw significant profit potential, leading them not only to agree to this business model but also to make larger investments to purchase more GPUs for greater profits.

"Tulip Mania": in history's first recorded speculative bubble, tulip prices soared from 1634 and collapsed in February 1637

With demand for artificial intelligence and big-data processing booming, enterprises rushed to acquire high-performance GPUs (especially NVIDIA's H100), collectively investing on the order of $600 billion in hardware and infrastructure: buying GPUs, building data centers, and so on. But supply-chain delays kept the H100's price high for most of 2023, above $4.70 per hour unless buyers were willing to prepay large deposits. By early 2024, as more suppliers entered the market, the H100 rental price had dropped to about $2.85 per hour, and I began receiving a stream of sales emails, a sign of how much competition had intensified once supply expanded.

Although H100 rentals initially ranged from $8 to $16 per hour, by August 2024 auction-style rental prices had fallen to between $1 and $2 per hour. Market prices look set to decline by 40% or more annually, a far steeper fall than NVIDIA's projection, which assumed prices holding at $4 per hour for the next four years. This rapid decline poses real financial risk for anyone who recently bought new GPUs at high prices, as they may never recover their costs through rentals.
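To see what a 40% annual decline implies: a rate starting at $8 per hour falls to about $8 × 0.6⁴ ≈ $1.04 within four years, roughly a quarter of the $4 per hour that the projection above assumes.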

What Is the Return on a $50,000 H100 Investment?

Ignoring power and cooling costs, an H100 costs approximately $50,000 to buy and has an expected lifespan of 5 years. There are broadly two rental models: short-term on-demand rental and long-term reservation. Short-term rentals are more expensive but more flexible, while long-term reservations are cheaper but more rigid. The analysis below works through both models to see whether an investor can recover costs and turn a profit within 5 years.

Short-term On-Demand Rentals

Rental prices and the corresponding returns:

  • Above $2.85/hour: exceeds stock-market IRR; profitable.
  • Below $2.85/hour: returns fall short of a stock-market investment.
  • Below $1.65/hour: the investment is expected to lose money.

Under a "blended price" model in which rental prices fall to 50% of today's level over the next 5 years: at a sustained $4.50 per hour, the internal rate of return (IRR) exceeds 20% and the investment is profitable; at $2.85 per hour, the IRR is only about 10%, sharply reducing returns. Below $2.85 per hour, returns fall behind the stock market, and below $1.65 per hour investors face serious losses, especially those who bought H100 servers recently.

Note: the "blended price" is an assumption that H100 rental prices decline gradually to half of today's level over the next 5 years. If anything this is optimistic, since market prices are currently falling more than 40% per year, so building a price decline into the model is only reasonable.
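For readers who want to reproduce this kind of figure, here is a minimal IRR sketch. The article does not publish its cost model, so everything below is an illustrative assumption: a linear blended-price slide to 50% over 5 years, full utilization, a hypothetical all-in operating cost of $0.75 per GPU-hour, and zero salvage value.

```python
HOURS_PER_YEAR = 24 * 365

def yearly_cash_flows(price0, years=5, final_fraction=0.5,
                      opex_per_hour=0.75, utilization=1.0):
    """Net cash flow per year under a linear blended-price decline."""
    flows = []
    for t in range(years):
        # Average rental price during year t+1 as the rate slides
        # linearly from price0 down to final_fraction * price0.
        frac = 1.0 - (1.0 - final_fraction) * (t + 0.5) / years
        net_hourly = price0 * frac * utilization - opex_per_hour
        flows.append(net_hourly * HOURS_PER_YEAR)
    return flows

def irr(initial_cost, flows, lo=-0.99, hi=10.0, tol=1e-6):
    """Internal rate of return, found by bisection on the NPV."""
    def npv(r):
        return -initial_cost + sum(cf / (1 + r) ** (t + 1)
                                   for t, cf in enumerate(flows))
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if npv(mid) > 0:
            lo = mid  # NPV still positive: the true IRR is higher
        else:
            hi = mid
    return (lo + hi) / 2

for price in (4.50, 2.85, 1.65):
    r = irr(50_000, yearly_cash_flows(price))
    print(f"start at ${price:.2f}/hr -> IRR ~ {r:.1%}")
# With these assumptions: roughly 43% at $4.50, about 8% at $2.85,
# and a deeply negative IRR at $1.65.
```

With these (assumed) inputs the outputs land in the same ballpark as the thresholds above; different utilization and operating-cost assumptions shift the results substantially.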

Long-term Lease Agreements (3 years or more)

During the AI boom, many established infrastructure providers had already lived through GPU rental boom-and-bust cycles, notably in the early Ethereum proof-of-work era, so in 2023 they launched high-priced 3-to-5-year prepaid rental contracts to lock in profits. These contracts typically charged more than $4 per hour and required 50% to 100% of the rent up front. With AI demand surging, foundation-model companies (especially in image generation) felt they had no choice but to sign: to seize the market window and be first onto the latest GPU clusters, they paid up to finish their target models quickly and stay competitive. Once training was done, however, they no longer needed the GPUs, yet the contracts locked them in. To cut their losses they began reselling the rented capacity, flooding the market with resold GPU supply and pushing down rental prices.

Current H100 Value Chain

Note: the value chain concept was proposed by Michael Porter in his 1985 book "Competitive Advantage." Porter argued that for a company to build a distinctive competitive advantage, it must create more added value in its goods and services; business strategy structures the firm's operations into a series of value-adding activities, and this series is the "value chain."

The H100 value chain spans from hardware to AI inference models, and the participants can be roughly divided into the following categories:

  • Hardware suppliers collaborating with Nvidia
  • Data center infrastructure providers and partners
  • Venture capital funds, large companies, and startups: planning to build foundational models (or have already completed model building)
  • Capacity distributors: Runpod, SFCompute, Together.ai, Vast.ai, GPUlist.ai, etc.

The current H100 value chain runs from hardware suppliers through data-center providers, AI model developers, and capacity distributors to AI inference services. The main pressure on the market comes from capacity distributors continually reselling or re-renting idle H100 capacity, and from the wide adoption of "good enough" open-source models (like Llama 3), which reduces demand for H100s. Together these two forces have produced an H100 oversupply that is pushing market prices down.

Market Trend: The Rise of Open-Source Weight Models

Open-source weight models are models whose weights have been publicly released for free, even though they may lack a formal open-source license, and they are widely used commercially.

Demand for these models is driven mainly by two factors: the arrival of GPT-4-class open-source models (like Llama 3 and DeepSeek-v2), and the maturity and wide adoption of fine-tuned small (8-billion-parameter) and medium-sized (70-billion-parameter) models.

As these open-source models mature, enterprises can easily access and use them to meet the needs of most AI applications, especially in inference and fine-tuning. Although these models may slightly underperform compared to proprietary models in certain benchmark tests, their performance is good enough to handle most commercial use cases. Therefore, with the proliferation of open-source weight models, the market demand for inference and fine-tuning is rapidly growing.

Open-source weight models also have three key advantages:

First, open-source models offer high flexibility, allowing users to fine-tune the models based on specific domains or tasks, better adapting to different application scenarios. Second, open-source models provide reliability, as model weights do not get updated unexpectedly like some proprietary models, avoiding development issues caused by updates and increasing user trust in the models. Finally, they ensure security and privacy, allowing enterprises to ensure that their prompts and customer data are not leaked through third-party API endpoints, reducing data privacy risks. These advantages are driving the continued growth and widespread adoption of open-source models, particularly in inference and fine-tuning.

Demand Shift Among Small and Medium-Sized Model Creators

Small and medium-sized model creators refer to enterprises or startups that lack the capability or plans to train large foundational models (like 70B parameter models) from scratch. With the rise of open-source models, many companies have realized that fine-tuning existing open-source models is more cost-effective than training a new model from scratch. As a result, more and more companies are choosing to fine-tune rather than train models themselves. This significantly reduces the demand for computing resources like H100.

Fine-tuning is much cheaper than training from scratch. The computing resources required for fine-tuning existing models are far less than those needed for training a foundational model from scratch. Training large foundational models typically requires 16 or more H100 nodes, while fine-tuning usually only requires 1 to 4 nodes. This industry shift has reduced the demand for large clusters among small and medium-sized companies, directly decreasing reliance on H100 computing power.
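To make the gap concrete, here is a back-of-the-envelope comparison (a sketch: the node counts come from the paragraph above, while the 8 GPUs per node, the run lengths, and the $2/GPU-hour rate are hypothetical):

```python
GPUS_PER_NODE = 8  # typical H100 HGX node (assumption)

def rental_cost(nodes: int, days: float, usd_per_gpu_hour: float = 2.0) -> float:
    """Total rental cost for a cluster job at a flat hourly GPU rate."""
    return nodes * GPUS_PER_NODE * days * 24 * usd_per_gpu_hour

# Training a foundation model from scratch: 16 nodes for a hypothetical 30 days.
print(rental_cost(16, 30))  # $184,320
# Fine-tuning an open-weight model: 2 nodes for a hypothetical 2 days.
print(rental_cost(2, 2))    # $1,536
```

Under these illustrative numbers, fine-tuning costs two orders of magnitude less than pretraining, which is why the shift toward fine-tuning cuts so deeply into H100 demand.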

Additionally, investment in foundational model creation has decreased. In 2023, many small and medium-sized companies attempted to create new foundational models, but now, unless they can bring innovation (such as better architectures or support for hundreds of languages), there are unlikely to be new foundational model creation projects. This is because there are already sufficiently powerful open-source models, like Llama 3, making it difficult for small companies to justify creating new models. Investor interest and funding have also shifted towards fine-tuning rather than training models from scratch, further reducing demand for H100 resources.

Finally, surplus reserved capacity is adding to supply. Many companies reserved H100 capacity on long-term contracts during the 2023 peak, but after the shift toward fine-tuning they found the reserved nodes unnecessary, and some hardware was already dated by the time it arrived. These unused H100 nodes are now being resold or rented out, further increasing market supply and deepening the H100 glut.

Overall, with the proliferation of model fine-tuning, the reduction in small and medium-sized foundational model creation, and the surplus of reserved nodes, the demand for H100 in the market has significantly decreased, exacerbating the oversupply situation.

Other factors leading to increased GPU computing power supply and decreased demand

Large Model Creators Moving Away from Open Cloud Platforms

Large AI model creators like Facebook, X.AI, and OpenAI are moving from public cloud platforms to their own private computing clusters. First, existing public cloud offerings (clusters on the order of 1,000 nodes) can no longer meet their needs for training ever-larger models. Second, owning the infrastructure is financially attractive: data centers and servers are assets that add to a company's valuation, whereas cloud rent is pure expense that builds nothing. These companies also have the resources and specialist teams to do it, and can even acquire small data-center firms to build and run these systems for them. As they leave the public clouds, market demand for rented compute falls, and capacity they vacate re-enters the market, adding to supply.

Idle and Delayed H100 GPUs Coming Online Simultaneously

The simultaneous arrival of idle and delayed H100 capacity has swollen supply and driven prices down. In 2023, shipment delays kept many H100s offline; those delayed units are now entering the market alongside new H200 and B200 hardware and idle capacity from startups and enterprises. Owners of small and medium-sized clusters, typically 8 to 64 nodes, are sitting on low utilization and dwindling funds, so their goal is to recover cash quickly by renting out capacity cheaply. They compete for customers through fixed rates, auctions, or free-market pricing, and in the auction and free-market models especially, suppliers undercut one another to keep their hardware rented, dragging prices down across the whole market.

Vast.ai essentially operates as a free-market system in which suppliers from around the world compete with one another on price

Cheaper GPU Alternatives

Another major factor: once compute costs blow past budgets, there are plenty of alternatives for AI inference infrastructure, especially for smaller models. There is no need to pay the premium for InfiniBand-connected H100s.

Nvidia Market Segmentation

Cheaper alternatives for AI inference directly erode demand for the H100. The H100 excels at training and fine-tuning, but for inference (running models), many cheaper GPUs suffice, especially with smaller models. Inference does not need the H100's high-end features (such as InfiniBand networking), so users can pick more economical options and save money.

Nvidia itself also offers alternative products in the inference market, such as the L40S, a GPU specifically designed for inference, which has about one-third the performance of H100 but costs only one-fifth. While the L40S is not as effective as H100 for multi-node training, it is powerful enough for single-node inference and fine-tuning of small clusters, providing users with a more cost-effective option.
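Taken at face value, those numbers imply the L40S delivers roughly (1/3) ÷ (1/5) ≈ 1.7 times the inference performance per dollar of an H100, which is why it undercuts H100 rentals for single-node workloads despite much lower absolute performance.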

H100 InfiniBand cluster performance and configuration table (August 2024)

AMD and Intel Alternative Suppliers

Additionally, AMD and Intel have launched lower-priced GPUs, such as AMD's MI300X and Intel's Gaudi 3. These parts perform well on inference and single-node tasks, cost less than the H100, and offer more memory and compute. Although they have not been fully proven in large multi-node cluster training, they are mature enough for inference tasks, making them strong alternatives to the H100.

These cheaper GPUs have proven capable of handling most inference tasks, especially for common model architectures (like Llama 3). Once compatibility issues are resolved, users can switch to these alternatives to cut costs. In short, in inference, and particularly in small-scale inference and fine-tuning, these alternatives are steadily displacing the H100, further reducing demand for it.

Decline in GPU Usage in the Web3 Field

Due to fluctuations in the cryptocurrency market, GPU usage in crypto mining has fallen, and a large number of GPUs have flowed into the cloud market. Hardware limits keep them out of complex AI training, but they handle simpler AI inference well, and for budget-conscious users running smaller models (under 10B parameters) they are highly cost-effective. With optimization, these GPUs can even run large models at lower cost than H100 nodes.

What is the Current Market Like After the GPU Computing Power Rental Bubble Burst?

The challenges facing new entrants: New public cloud H100 clusters entering the market late may struggle to be profitable, and some investors may suffer significant losses.

New public cloud H100 clusters face a narrow path to profitability. Price too low (below $2.25 per hour) and they cannot cover operating costs; price too high ($3 or above) and they lose customers and sit idle. Worse, clusters entering the market late missed the early $4-per-hour window, making cost recovery difficult and leaving investors exposed. Cluster investment has become genuinely challenging and could produce significant losses.
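The arithmetic behind that squeeze: at $2.25 per hour, a fully utilized GPU grosses only about $2.25 × 8,760 ≈ $19,700 per year before power, cooling, and staffing, against roughly $50,000 of capital per card, so even at full utilization a late entrant needs several years of sustained demand just to return capital.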

Profitability for early entrants: Medium or large model creators who signed long-term rental contracts early have already recovered costs and achieved profitability.

Medium and large model creators who rented H100 capacity long-term have already extracted the value: the cost was covered by funds raised during financing, and the clusters served, and still serve, current and future model training. Even where capacity sits unused, they can resell or sublease it for extra income; this pushes market prices down but softens their own losses, and on balance is healthy for the ecosystem.

After the Bubble Burst: The Low-Cost H100 Can Accelerate the Wave of Open-Source AI Adoption

The emergence of low-cost H100s will drive open-source AI development. As H100 prices fall, AI developers and hobbyists can run and fine-tune open-source weight models more cheaply, broadening adoption. If proprietary models (like GPT5++) fail to deliver major technological breakthroughs, the gap between open-source and proprietary models will narrow, and as inference and fine-tuning costs drop, a new wave of AI applications may follow, accelerating the whole market.

Conclusion: Do Not Purchase Brand New H100s

Investing in brand-new H100 GPUs now is likely to result in losses. It makes sense only in special circumstances, such as buying discounted H100s, having very low electricity costs, or selling an AI product competitive enough to keep the hardware earning. If you are weighing an investment, consider allocating the funds elsewhere, or to the stock market, for better returns.
