Prices Plummet 70%: How Did the AI Compute Rental Bubble Burst?

Techub News
2024-10-22 17:39:51
Investing in a brand-new H100 GPU right now is likely to result in a loss. It may be reasonable only under special circumstances, such as being able to purchase discounted H100s, having low electricity costs, or having an AI product that is sufficiently competitive in the market.

Source: Eugene Cheah Substack account

Author: Eugene Cheah

Compiled by: J1N, Techub News

The decline in AI computing costs will spark a wave of innovation among startups utilizing low-cost resources.

Last year, when AI computing power was in short supply, the H100 rental price reached as high as $8 per hour; now, with the market oversupplied, it has dropped below $2 per hour. Some companies signed compute rental contracts early on and, to avoid letting excess capacity go to waste, began reselling their reserved resources. At the same time, most of the market has settled on open-source models, shrinking demand for training new ones. The supply of H100s now far exceeds demand, making renting more cost-effective than buying, and investing in new H100s has become unprofitable.

A Brief History of the AI Race

GPU compute prices skyrocketed: the H100's rental price started around $4.70 per hour and peaked above $8, driven by project founders racing to train their AI models quickly enough to persuade investors and secure their next round of funding.

ChatGPT launched in November 2022, running on A100-series GPUs. In March 2023, NVIDIA released the new H100 series, promoting it as delivering three times the A100's performance at only twice the price.

This was a huge attraction for AI startups, as the performance of GPUs directly determines the speed and scale of the AI models they can develop. The powerful performance of H100 means these companies can develop AI models that are faster, larger, and more efficient than before, potentially catching up to or surpassing industry leaders like OpenAI. Of course, all of this hinges on their ability to secure enough capital to purchase or rent a large number of H100s.

With the significant performance boost of H100 and the fierce competition in the AI field, many startups invested heavily to acquire H100s to accelerate their model training. This surge in demand caused the rental price of H100 to skyrocket, initially at $4.70 per hour, later exceeding $8.

These startups were willing to pay high rental fees because they were eager to train models quickly to attract investor attention in the next funding round, aiming to secure hundreds of millions of dollars to continue expanding their businesses.

For computing centers (farms) holding large numbers of H100 GPUs, rental demand was so intense it was like "money on the table": AI startups were eager to rent H100s to train their models and were even willing to prepay. As a result, GPU farms could rent out their GPUs at long-term rates of $4.70 per hour or higher.

Calculations showed that if they could keep renting out GPUs at this price, the payback period on an H100 purchase (the time needed to recover its cost) would be under 1.5 years. After that, each GPU could generate over $100,000 in net cash flow over its remaining service life.
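As a sanity check on that claim, here is a minimal sketch of the payback arithmetic, assuming the $50,000 purchase price used later in this article, 100% utilization, and no electricity or cooling costs:

```python
# Payback arithmetic for a single H100. Assumes a $50,000 purchase
# price (the figure used later in this article), 100% utilization,
# and ignores electricity and cooling costs.
PURCHASE_PRICE = 50_000   # USD per GPU (assumed)
RENTAL_RATE = 4.70        # USD per GPU-hour
LIFESPAN_YEARS = 5
HOURS_PER_YEAR = 24 * 365

annual_revenue = RENTAL_RATE * HOURS_PER_YEAR      # ~$41,000/year
payback_years = PURCHASE_PRICE / annual_revenue    # ~1.2 years
post_payback = annual_revenue * (LIFESPAN_YEARS - payback_years)

print(f"Annual revenue:          ${annual_revenue:,.0f}")
print(f"Payback period:          {payback_years:.2f} years")
print(f"Cash flow after payback: ${post_payback:,.0f}")
```

Under those (generous) assumptions, payback lands at roughly 1.2 years, and the remaining 3.8 or so years of rental income comfortably exceed $100,000 per GPU.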

Due to the sustained high demand for H100 and other high-performance GPUs, investors in GPU farms saw significant profit potential, leading them not only to agree to this business model but also to make larger investments to purchase more GPUs for greater profits.

"Tulip folly": in the first recorded speculative bubble in history, tulip prices soared from 1634 and collapsed in February 1637

With the growing demand for artificial intelligence and big data processing, enterprises' demand for high-performance GPUs (especially NVIDIA's H100) surged. To support these compute-intensive tasks, enterprises worldwide initially invested about $600 billion in hardware and infrastructure, purchasing GPUs and building data centers to expand computing capacity. However, supply chain delays kept the H100 rental price high for most of 2023, above $4.70 per hour unless buyers were willing to pay large upfront deposits. By early 2024, as more suppliers entered the market, the rental price dropped to around $2.85 per hour, and I began receiving a stream of sales emails, a sign of the competition that the added supply had created.

Although the initial rental price of H100 GPUs was between $8 and $16 per hour, by August 2024, auction-style rental prices had dropped to between $1 and $2 per hour. Market prices are expected to decline by 40% or more annually, far exceeding NVIDIA's forecast of maintaining a price of $4 per hour over four years. This rapid price decline poses financial risks for those who recently purchased high-priced new GPUs, as they may not be able to recover costs through rentals.
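To see how quickly a 40% annual decline diverges from a flat-price assumption, here is a purely illustrative calculation (the 40% figure comes from the paragraph above; the rest is arithmetic):

```python
# Illustrative only: a $4.00/hour rate falling 40% per year,
# versus a model that assumes the rate stays flat at $4.00.
rate = 4.00
for year in range(5):
    print(f"Year {year}: ${rate * 0.6 ** year:.2f}/hour (flat model: $4.00)")
```

By year 4 the declining rate is near $0.52 per hour, roughly an eighth of the flat assumption, which is why revenue models built on a constant $4 rate break down so badly.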

What Is the Rate of Return on a $50,000 H100 Investment?

Excluding electricity and cooling costs, an H100 costs approximately $50,000 to purchase, with an expected lifespan of 5 years. Rentals typically follow one of two models: short-term on-demand rental, which is more expensive but more flexible, and long-term reservation, which is cheaper and more stable. This article analyzes the returns of both models to determine whether investors can recover their costs and turn a profit within 5 years.

Short-term On-demand Rentals

Rental prices and corresponding returns:

  • Above $2.85/hour: exceeds stock-market IRR; profitable.
  • Below $2.85/hour: returns fall short of what the stock market offers.
  • Below $1.65/hour: the investment is expected to lose money.

Using a "mixed price" model prediction, rental prices may drop to 50% of current prices over the next 5 years. If rental prices remain at $4.50 per hour, the internal rate of return (IRR) exceeds 20%, making it profitable; however, when the price drops to $2.85 per hour, the IRR is only 10%, significantly reducing returns. If the price falls below $2.85 per hour, investment returns may even fall below stock market returns, and when the price drops below $1.65 per hour, investors will face severe loss risks, especially for those who recently purchased H100 servers.

Note: The "mixed price" is a hypothesis that assumes the rental price of H100 will gradually decline to half of the current price over the next 5 years. This estimate is considered optimistic, as current market prices are declining by over 40% annually, so considering price declines is reasonable.

Long-term Lease Agreements (3 years or more)

During the AI boom, many established infrastructure providers, drawing on past experience, especially the boom-and-bust cycles of GPU rental prices during Ethereum's early proof-of-work era, introduced high-priced 3-to-5-year prepaid rental contracts in 2023 to lock in profits. These contracts typically charged over $4 per hour and required 50% to 100% of the fees upfront. With AI demand surging, foundational-model companies, particularly in image generation, signed these expensive contracts anyway in order to seize the market window and be first onto the newest GPU clusters to complete their target models and stay competitive. Once training was finished, however, these companies no longer needed the GPUs, yet contract lock-in prevented an easy exit. To cut their losses, they resell the rented capacity to recover part of the cost. The result is a large volume of resold GPU capacity on the market, increasing supply and weighing on rental prices and the supply-demand balance.

Current H100 Value Chain

Note: the value chain (also called value chain analysis or the value chain model) was proposed by Michael Porter in his 1985 book "Competitive Advantage." Porter argued that for a company to build a distinctive competitive advantage, it must create higher added value in its goods and services: business strategy structures the company's operations into a series of value-adding processes, and that series is the "value chain."

The H100 value chain spans from hardware to AI inference models, and the participants can be roughly divided into the following categories:

  • Hardware suppliers collaborating with Nvidia
  • Data center infrastructure providers and partners
  • Venture capital funds, large companies, and startups: planning to build foundational models (or have already completed model building)
  • Capacity distributors: Runpod, SFCompute, Together.ai, Vast.ai, GPUlist.ai, etc.

The current H100 value chain runs from hardware suppliers through data center providers, AI model developers, and capacity distributors to AI inference service providers. The main pressure on the market comes from distributors continuously reselling or renting out idle H100 capacity, combined with the widespread use of "good enough" open-source models (like Llama 3), which reduces demand for H100s. Together these two factors have produced an oversupply of H100s, pushing market prices down.

Market Trends: The Rise of Open-source Weight Models

Open-source weight models are models whose weights have been publicly distributed for free, even where a formal open-source license is lacking, and they are widely used commercially.

Demand for these models is driven by two main factors: the arrival of GPT-4-scale open-source models (like Llama 3 and DeepSeek-v2), and the maturity and broad adoption of small (8-billion-parameter) and medium (70-billion-parameter) fine-tuned models.

As these open-source models mature, enterprises can easily access and use them to meet the needs of most AI applications, especially in inference and fine-tuning. Although these models may slightly lag behind proprietary models in some benchmark tests, their performance is already good enough to handle most commercial use cases. Therefore, as open-source weight models become more prevalent, the market demand for inference and fine-tuning is rapidly growing.

Open-source Weight Models Also Have Three Key Advantages:

First, open-source models offer flexibility: users can fine-tune them for specific domains or tasks, adapting them to different application scenarios. Second, they offer reliability: model weights do not change without notice, unlike some proprietary models, which avoids breakage caused by silent updates and builds user trust. Finally, they offer security and privacy: enterprises can ensure their prompts and customer data never pass through third-party API endpoints, reducing data privacy risk. These advantages have driven the continued growth and broad adoption of open-source models, particularly for inference and fine-tuning.

Demand Shift Among Small and Medium Model Creators

Small and medium model creators are enterprises or startups that lack the capability, or the intention, to train large foundational models (like 70B-parameter models) from scratch. With the rise of open-source models, many companies have realized that fine-tuning an existing open-source model is far more cost-effective than training a new model from scratch, so more and more of them choose to fine-tune rather than train. This significantly reduces demand for compute resources like the H100.

Fine-tuning is much cheaper than training from scratch: it requires far less compute than training a foundational model. Training a large foundational model typically requires 16 or more H100 nodes, while fine-tuning usually needs only 1 to 4. This industry shift has cut demand for large clusters among small and medium companies, directly reducing reliance on H100 compute.

Additionally, investment in foundational model creation has decreased. In 2023, many small and medium companies attempted to create new foundational models; now, unless a team can bring genuine innovation (like a better architecture or support for hundreds of languages), few new foundational-model projects are likely to appear. Sufficiently powerful open-source models like Llama 3 already exist, making it hard for small companies to justify building new ones. Investor interest and funding have also shifted toward fine-tuning rather than training from scratch, further reducing demand for H100 resources.

Finally, surplus reserved capacity is also an issue. Many companies made long-term reservations for H100s during the 2023 peak, but after shifting to fine-tuning they found those reserved nodes unnecessary, and some hardware was already dated by the time it arrived. These unused H100 nodes are now being resold or re-rented, further adding to supply and deepening the H100 glut.

Overall, with the proliferation of model fine-tuning, the reduction in small and medium foundational model creation, and the surplus of reserved nodes, the demand for H100 in the market has significantly decreased, exacerbating the oversupply situation.

Other factors leading to increased GPU computing supply and decreased demand

Large Model Creators Moving Away from Open Cloud Platforms

Large AI model creators like Facebook, X.AI, and OpenAI are gradually moving off public cloud platforms and building their own private computing clusters. First, existing public cloud offerings (say, a 1,000-node cluster) can no longer meet their needs for training ever-larger models. Second, owning the clusters is financially advantageous: purchased data centers, servers, and other assets add to company valuation, whereas cloud rental is pure expense that builds no assets. Third, these companies have the resources and specialist teams, and can even acquire small data center firms, to build and run such systems, so they no longer depend on public clouds. As they leave the public cloud, market demand for rented compute falls, and the capacity they vacate may re-enter the market, increasing supply.

Vast.ai essentially operates as a free market system where suppliers from around the world compete with each other

Idle and Delayed H100 GPUs Coming Online Simultaneously

The simultaneous arrival of idle and previously delayed H100s has increased market supply and pushed prices down. On platforms like Vast.ai, which operate as free markets, global suppliers compete directly on price. In 2023, H100 shipment delays kept much capacity offline; that delayed capacity is now coming online, alongside new H200 and B200 hardware and idle compute from startups and enterprises. Owners of small and medium clusters, typically 8 to 64 nodes, face low utilization and exhausted funds, so their goal is to recover costs quickly by renting out capacity cheaply. They compete for customers through fixed rates, auctions, or free-market pricing; in the auction and free-market models especially, suppliers undercut one another to keep their hardware rented, driving overall market prices sharply lower.

Cheaper GPU Alternatives

Another major factor: once compute costs exceed budgets, plenty of alternative AI inference infrastructure exists, especially for running smaller models, where there is no need to pay the premium for H100 clusters with InfiniBand.

Nvidia Market Segmentation

The emergence of cheaper alternatives for AI inference directly affects market demand for the H100. While the H100 excels at training and fine-tuning AI models, many cheaper GPUs can handle inference (running models), especially for smaller models. Inference does not require the H100's high-end features (such as InfiniBand networking), so users can choose more economical alternatives and save costs.

Nvidia itself offers alternative products for the inference market, such as the L40S, a GPU designed for inference with roughly one-third the H100's performance at one-fifth the price. Although the L40S is weaker for multi-node training, it is powerful enough for single-node inference and for fine-tuning on small clusters, giving users a more cost-effective option.
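Taking those rough ratios at face value (neither figure is a precise benchmark), the performance-per-dollar comparison is straightforward:

```python
# Back-of-the-envelope perf-per-dollar using the article's rough
# ratios: L40S ~ 1/3 of H100 performance at ~ 1/5 of H100 price.
h100_perf, h100_price = 1.0, 1.0    # normalized H100 baseline
l40s_perf, l40s_price = 1 / 3, 1 / 5

ratio = (l40s_perf / l40s_price) / (h100_perf / h100_price)
print(f"L40S performance per dollar vs H100: {ratio:.2f}x")  # ~1.67x
```

For single-node inference, that is roughly 1.67 times the performance per dollar; the trade-off is the weaker multi-node training capability noted above.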

H100 InfiniBand cluster performance and configuration table (August 2024)

AMD and Intel Alternative Suppliers

Additionally, AMD and Intel have launched lower-priced GPUs, such as AMD's MI300 and Intel's Gaudi 3. These perform excellently in inference and single-node tasks, cost less than the H100, and offer more memory and compute. Although they have not yet been fully validated for large multi-node cluster training, they are mature enough for inference, making them strong H100 alternatives.

These cheaper GPUs have proven capable of handling most inference tasks, especially for common model architectures (like Llama 3). Once compatibility issues are resolved, users can adopt them to cut costs. In short, these alternatives are gradually displacing the H100 in inference, especially small-scale inference and fine-tuning, further reducing H100 demand.

Declining GPU Utilization in the Web3 Space

Due to fluctuations in the cryptocurrency market, GPU utilization in crypto mining has decreased, pushing large numbers of GPUs into the cloud market. Hardware limitations make these GPUs unsuitable for complex AI training, but they perform well on simpler inference tasks, and for budget-conscious users running smaller models (under 10B parameters) they are a highly cost-effective choice. With optimization, they can even run large models at lower cost than H100 nodes.

What is the Current Market Like After the AI Computing Rental Bubble?

The challenges facing new entrants: New public cloud H100 clusters entering the market late may struggle to be profitable, and some investors may incur significant losses.

New public cloud H100 clusters entering the market face profitability challenges. If rental prices are set too low (below $2.25 per hour), revenue may not cover operating costs, producing losses; if set too high ($3 per hour or above), customers go elsewhere and capacity sits idle. Late entrants also missed the early $4-per-hour prices, making cost recovery difficult and leaving investors exposed. This makes new cluster investment very challenging, and it could even produce significant losses.
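To see why that pricing window is so unforgiving, here is a minimal break-even sketch. The $50,000 purchase price comes from the article; the amortization horizon, operating cost, and utilization figures are assumptions for illustration only:

```python
# Minimum viable hourly rate for a new H100 cluster entrant.
# Only the $50,000 capex figure is from the article; the other
# numbers are illustrative assumptions.
HOURS_PER_YEAR = 24 * 365

capex_per_gpu = 50_000    # purchase price per GPU (from the article)
amort_years = 5           # write-off horizon (assumed)
opex_per_hour = 0.50      # power, cooling, staff, bandwidth (assumed)
utilization = 0.70        # fraction of hours actually rented (assumed)

# Price at which rental revenue exactly covers amortized capex plus
# opex (opex accrues every hour, revenue only during rented hours).
hourly_capex = capex_per_gpu / (amort_years * HOURS_PER_YEAR)
break_even = (hourly_capex + opex_per_hour) / utilization

print(f"Amortized capex: ${hourly_capex:.2f}/GPU-hour")  # ~$1.14
print(f"Break-even rate: ${break_even:.2f}/GPU-hour")    # ~$2.35
```

Under these assumptions the break-even rate lands around $2.35 per hour, squarely inside the $2.25-to-$3 window described above: price below it and lose money, price above it and risk idle capacity.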

Early entrants' earnings situation: Medium or large model creators who signed long-term rental contracts early have already recovered costs and achieved profitability.

Medium and large model creators who signed long-term H100 rental contracts early have already extracted their value, with the costs covered by funds raised during financing. Even where capacity is not fully utilized, these companies used the clusters for current and future model training, and they can resell or re-rent unused capacity for additional income. That resale pushes market prices down, softening the downside, and on balance it benefits the ecosystem.

After the Bubble Bursts: The Low-cost H100 Can Accelerate the Wave of Open-source AI Adoption

The emergence of low-cost H100 GPUs will drive the development of open-source AI. As H100 prices decline, AI developers and hobbyists can run and fine-tune open-source weight models more affordably, leading to broader adoption of these models. If future closed-source models (like GPT5++) do not achieve significant technological breakthroughs, the gap between open-source and closed-source models will narrow, promoting the development of AI applications. As the costs of AI inference and fine-tuning decrease, it may trigger a new wave of AI applications, accelerating overall market progress.

Conclusion: Do Not Purchase Brand New H100s

Investing in brand new H100 GPUs now is likely to result in losses. However, it may be reasonable to invest only in special circumstances, such as if a project can purchase discounted H100s, has low electricity costs, or if its AI product has sufficient competitiveness in the market. If you are considering investing, it is advisable to allocate funds to other areas or the stock market for better returns.
