One Card, 700 Billion Parameters, and Less Than Half the Power Draw of NVIDIA Blackwell

Skymizer has unveiled HTX301, a new PCIe solution dedicated to local AI inference that promises to redefine the relationship between performance, power consumption, and infrastructure costs for companies interested in large language models (LLMs). The Taiwanese company claims that its platform can run models of up to 700 billion parameters on a single card, a result that, if confirmed, could significantly reduce reliance on large GPU clusters or cloud infrastructure.

At the core of the project is the HyperThought platform, built around the company's new LPU (Language Processing Unit) intellectual property. Unlike general-purpose GPUs, this architecture is designed specifically for LLM workloads, with a particular focus on energy efficiency and on handling the prefill and decode phases separately but in coordination.

Each card integrates six HTX301 chips and adopts a traditional PCIe design, with memory distributed around the processors. Although the chips are built on a 28 nm manufacturing process, well behind more recent nodes, Skymizer reports impressive figures: 30 tokens per second with just 0.5 TOPS of compute and 100 GB/s of bandwidth. The octa-core LPU architecture is also said to reach 240 tokens/s in Llama 2 7B prefill, while multi-chip scaling allows for up to 1,200 tokens/s.
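
The decode figure can be sanity-checked with a common rule of thumb: when token generation is limited by bandwidth, each new token requires streaming roughly all of the model's weights once, so throughput is about bandwidth divided by weight size. The Python sketch below applies that rule to the quoted 100 GB/s. The source does not say which model or precision the 30 tokens/s figure refers to, so the 7B model at 4 bits per weight is purely an assumption for illustration, and the function names are hypothetical.

```python
def weights_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (10^9 bytes)."""
    return params_billions * bits_per_weight / 8


def decode_tokens_per_s(bandwidth_gb_s: float, model_gb: float) -> float:
    """Rough upper bound on decode speed if every token streams all weights once."""
    return bandwidth_gb_s / model_gb


BANDWIDTH_GB_S = 100.0   # figure quoted by Skymizer
CLAIMED_TOKENS_S = 30.0  # figure quoted by Skymizer

# Assumption (not from the source): a 7B-parameter model quantized to 4 bits.
assumed_model_gb = weights_gb(7, 4)
estimate = decode_tokens_per_s(BANDWIDTH_GB_S, assumed_model_gb)

# Working backwards: how much data per token would the claim imply?
implied_gb_per_token = BANDWIDTH_GB_S / CLAIMED_TOKENS_S

print(f"Assumed model size: {assumed_model_gb:.2f} GB")
print(f"Bandwidth-bound estimate: {estimate:.1f} tokens/s")
print(f"Data per token implied by the claim: {implied_gb_per_token:.2f} GB")
```

Under these assumptions the estimate lands near 28.6 tokens/s, the same order as the claimed 30, which would be consistent with a small, heavily quantized model. The rule ignores KV cache traffic and any on-chip compression, so it is only a rough check.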

One of the most notable aspects relates to memory: HTX301 can reach up to 384 GB using LPDDR4 and LPDDR5, which cost less than HBM or GDDR6/7. Skymizer chose this configuration to balance capacity, bandwidth, and cost, especially for enterprise scenarios focused on on-premise inference.
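
The 700-billion-parameter claim and the 384 GB capacity together imply a tight budget per weight. The sketch below works through that arithmetic in Python. It counts weights only and ignores the KV cache, activations, and any runtime overhead, so the real budget would be smaller. The bit widths are common quantization levels chosen for illustration, not formats confirmed by Skymizer.

```python
CARD_CAPACITY_GB = 384   # maximum memory quoted for HTX301
MODEL_PARAMS_B = 700     # maximum model size quoted, in billions of parameters


def weight_footprint_gb(params_billions: float, bits_per_weight: float) -> float:
    """Weight storage in GB (10^9 bytes) at a given precision."""
    return params_billions * bits_per_weight / 8


def max_bits_per_weight(params_billions: float, capacity_gb: float) -> float:
    """Largest average bits per weight that still fits in the given capacity."""
    return capacity_gb * 8 / params_billions


# Illustrative precisions (assumed, not from the source).
for bits in (16, 8, 4):
    size = weight_footprint_gb(MODEL_PARAMS_B, bits)
    verdict = "fits" if size <= CARD_CAPACITY_GB else "does not fit"
    print(f"{bits:>2}-bit weights: {size:,.0f} GB -> {verdict}")

budget = max_bits_per_weight(MODEL_PARAMS_B, CARD_CAPACITY_GB)
print(f"Average budget: {budget:.2f} bits per weight")
```

By this estimate, a 700B model fits only at roughly 4 bits per weight or fewer, which helps explain why the company places so much weight on compression.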

The company also emphasizes its compression technologies, a crucial element in making large models practical to run. Weight compression (the model's "long-term memory") is claimed to outperform the open-source llama.cpp by 9% to 17.8%, while KV cache compression is said to keep perplexity loss between 0.06% and 3.52%.
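
To see why KV cache compression matters, it helps to estimate how large the cache gets. The sketch below uses the published Llama 2 7B shape (32 layers, 32 key/value heads, head dimension 128) and the standard formula of two tensors (keys and values) per layer per token. The compressed bit widths are hypothetical stand-ins, since the source does not describe Skymizer's actual KV cache format or compression ratio.

```python
GIB = 1024 ** 3


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   n_tokens: int, bits_per_value: int) -> int:
    """Bytes needed to store keys and values for n_tokens of context."""
    values = 2 * n_layers * n_kv_heads * head_dim * n_tokens  # 2 = K and V
    return values * bits_per_value // 8


# Llama 2 7B architecture parameters.
LAYERS, KV_HEADS, HEAD_DIM = 32, 32, 128
CONTEXT_TOKENS = 4096

# 16-bit is the usual uncompressed baseline; 8 and 4 bits are assumed
# compression levels for illustration only.
for bits in (16, 8, 4):
    size = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, CONTEXT_TOKENS, bits)
    print(f"{bits:>2}-bit KV cache, {CONTEXT_TOKENS} tokens: {size / GIB:.2f} GiB")
```

At 16 bits the cache for a single 4,096-token sequence comes to about 2 GiB in this model, so any reduction in bits per value directly frees memory for longer contexts or more concurrent requests.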

The most striking figure, however, is the 240 W TDP, less than half the 600 W quoted for high-end AI PCIe accelerators such as the NVIDIA RTX PRO 6000 Blackwell and AMD Instinct MI350P. That difference could translate into significantly lower operating costs.
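
A rough idea of what that gap could mean on the electricity bill can be sketched as follows. The calculation treats TDP as continuous power draw, which overstates real consumption, and the electricity price and the 24/7 utilization are illustrative assumptions, not figures from Skymizer. It also says nothing about performance per watt, which depends on throughput numbers that are yet to be verified.

```python
HOURS_PER_YEAR = 24 * 365


def annual_energy_kwh(power_w: float, utilization: float = 1.0) -> float:
    """Energy used in one year at a given average power draw."""
    return power_w / 1000 * HOURS_PER_YEAR * utilization


def annual_cost(power_w: float, price_per_kwh: float,
                utilization: float = 1.0) -> float:
    """Yearly electricity cost for one card."""
    return annual_energy_kwh(power_w, utilization) * price_per_kwh


HTX301_TDP_W = 240      # quoted TDP
REFERENCE_TDP_W = 600   # quoted TDP of the high-end cards used for comparison
PRICE_PER_KWH = 0.15    # assumed price, for illustration only

for name, tdp in (("HTX301", HTX301_TDP_W), ("600 W card", REFERENCE_TDP_W)):
    kwh = annual_energy_kwh(tdp)
    cost = annual_cost(tdp, PRICE_PER_KWH)
    print(f"{name}: {kwh:,.1f} kWh/year, about {cost:,.2f} per year")

print(f"TDP ratio: {HTX301_TDP_W / REFERENCE_TDP_W:.0%}")
```

The ratio of the two TDP values is 40%, so under identical utilization the energy bill scales down by the same factor. Cooling and power delivery overheads would widen the absolute difference in a real deployment.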

HTX301 will be previewed at Computex, where Skymizer is expected to showcase what it presents as an extremely ambitious project. If the claimed performance is confirmed, the on-premise enterprise AI market could gain a new category of accelerators.