China has found a creative way to work around the constraints of NVIDIA’s export-limited AI hardware. DeepSeek has unveiled an impressive project that pushes the Hopper H800 AI accelerators to new heights, reportedly delivering up to eight times the usual TFLOPS.
DeepSeek’s latest release, FlashMLA, is set to transform China’s AI scene by extracting every ounce of potential from NVIDIA’s scaled-down Hopper GPUs. Rather than relying on outside tech, Chinese firms, with DeepSeek at the forefront, are cleverly using software to navigate around hardware limitations. By fine-tuning memory use and resource allocation, DeepSeek’s new framework unlocks performance levels that NVIDIA’s pared-down GPUs weren’t thought capable of.
During its “Open Source Week”, DeepSeek released FlashMLA, an innovative decoding kernel tailor-made for NVIDIA’s Hopper GPUs. Before diving into its mechanics, the headline numbers deserve attention. DeepSeek claims that with FlashMLA, the Hopper H800 can reach 580 TFLOPS in BF16 matrix operations, roughly eight times the industry norm. The kicker: all of this comes without any hardware modifications, with memory bandwidth reportedly climbing to 3000 GB/s, nearly double the H800’s rated peak.
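For context, achieved-throughput figures like these are usually derived by timing a large matrix multiplication and dividing the floating-point operation count by the elapsed time. The PyTorch sketch below illustrates that general methodology only; it is not DeepSeek’s benchmark, and the matrix sizes and iteration count are arbitrary assumptions.

```python
import torch

# Illustrative only: how an "achieved TFLOPS" figure is typically measured.
# Assumes a CUDA-capable GPU; sizes and iteration counts are arbitrary.
M = N = K = 8192
a = torch.randn(M, K, dtype=torch.bfloat16, device="cuda")
b = torch.randn(K, N, dtype=torch.bfloat16, device="cuda")

for _ in range(3):                # warm up to exclude one-time overheads
    torch.matmul(a, b)
torch.cuda.synchronize()

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
iters = 20

start.record()
for _ in range(iters):
    torch.matmul(a, b)
end.record()
torch.cuda.synchronize()

seconds = start.elapsed_time(end) / 1e3 / iters  # elapsed_time returns ms
flops = 2 * M * N * K                            # one multiply-add = 2 FLOPs
print(f"achieved throughput: {flops / seconds / 1e12:.1f} TFLOPS")
```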
With FlashMLA, DeepSeek employs something called “low-rank key-value compression.” In plain terms, this squeezes the large key-value cache that attention builds up into a much smaller latent form that is expanded only when needed, speeding up processing while cutting memory use by 40% to 60%. Its block-based paging system, meanwhile, hands out memory in fixed-size blocks as a task demands it, so variable-length sequences are handled efficiently instead of forcing wasteful worst-case pre-allocation.
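To make both ideas concrete, here is a minimal Python sketch under stated assumptions: per-token keys and values are down-projected into a single small latent vector, cached in that compressed form, and reconstructed on the fly at attention time, while the cache itself lives in fixed-size physical blocks tracked by a per-sequence block table. All names, dimensions, and weights here are hypothetical illustrations rather than DeepSeek’s implementation; the 64-token block size follows the figure mentioned in FlashMLA’s release notes.

```python
import torch

# Hypothetical sizes for illustration; real MLA dimensions come from the model.
d_model, d_latent, block_size = 4096, 512, 64

# Low-rank KV compression: cache one small latent per token instead of
# full keys and values, and expand it only when attention needs it.
w_down = torch.randn(d_latent, d_model) * 0.02   # compression projection
w_up_k = torch.randn(d_model, d_latent) * 0.02   # key reconstruction
w_up_v = torch.randn(d_model, d_latent) * 0.02   # value reconstruction

def compress(h):
    """Store d_latent floats per token rather than 2 * d_model."""
    return w_down @ h

def expand(latent):
    """Rebuild approximate K and V from the cached latent on demand."""
    return w_up_k @ latent, w_up_v @ latent

# Block-based paging: each sequence owns fixed-size physical blocks, so
# memory grows in block_size steps instead of worst-case pre-allocation.
class PagedKVCache:
    def __init__(self, num_blocks):
        self.storage = torch.zeros(num_blocks, block_size, d_latent)
        self.free = list(range(num_blocks))
        self.block_tables = {}   # sequence id -> list of physical block ids
        self.lengths = {}        # sequence id -> tokens cached so far

    def append(self, seq_id, latent):
        n = self.lengths.get(seq_id, 0)
        if n % block_size == 0:  # previous block is full: claim a new one
            self.block_tables.setdefault(seq_id, []).append(self.free.pop())
        block = self.block_tables[seq_id][n // block_size]
        self.storage[block, n % block_size] = latent
        self.lengths[seq_id] = n + 1

cache = PagedKVCache(num_blocks=128)
h = torch.randn(d_model)                       # one token's hidden state
cache.append(seq_id=0, latent=compress(h))
k, v = expand(cache.storage[cache.block_tables[0][0], 0])
```

The saving comes from caching a small latent per token instead of full keys and values, which is the mechanism behind the reported 40% to 60% reduction in memory use.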
FlashMLA underscores a compelling point: success in AI computing isn’t tied to hardware alone. It’s a testament to the multifaceted nature of technological progress, with software playing a crucial role. FlashMLA is presently crafted specifically for Hopper GPUs, and one can only wonder what performance it might unlock on the H800’s full-powered sibling, the H100.