Close-Up on the ASUS Ascent GX10: The AI Supercomputer with an NVIDIA Heart That Fits on Your Desk

There was a time when a petaFLOP of computing power took up an entire room, requiring industrial cooling systems and sky-high budgets. With the ASUS Ascent GX10, that barrier has fallen: the device measures 150 x 150 x 51 mm and weighs 1.48 kg. Yet, inside it beats one of the most advanced chips ever produced, the NVIDIA GB10 Grace Blackwell Superchip, the same technological heart that powers the NVIDIA DGX Spark platform.

The GX10 positions itself in an emerging segment: that of AI supercomputers "for the desk", designed not for gaming or to replace a traditional workstation, but explicitly for developers, researchers, and data scientists who want to perform inference and fine-tuning of large language models without relying on the cloud.

Architecture and Technical Specifications of the NVIDIA GB10 Grace Blackwell Superchip

The GB10 is a System-on-Chip (SoC) that integrates into a single package:

an NVIDIA Blackwell GPU with 6144 CUDA cores and fifth-generation Tensor Cores with FP4 support, capable of delivering up to 1 petaFLOP of AI performance.
a 20-core high-performance ARM v9.2-A CPU for data preprocessing, task orchestration, and model fine-tuning.
128 GB of LPDDR5X memory in Coherent Unified System Memory configuration: the CPU and GPU share the same memory pool with extremely high bandwidth, eliminating the classic bottleneck of PCIe transfers.

The CPU and GPU are connected via NVIDIA NVLink-C2C, which provides roughly five times the bandwidth of PCIe Gen 5 and keeps the memory model coherent and as transparent as possible to applications.
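
To put the "five times PCIe Gen 5" figure in perspective, here is a rough back-of-the-envelope sketch. It uses the published PCIe 5.0 signalling rate (32 GT/s per lane, 128b/130b encoding) and assumes a full x16 link as the baseline, which the comparison above does not specify. The result is an illustrative estimate, not an official NVLink-C2C specification.

```python
# Back-of-the-envelope estimate; the x16 baseline is an assumption.
PCIE5_GT_PER_S = 32.0   # PCIe 5.0 raw rate per lane, in GT/s
ENCODING = 128 / 130    # 128b/130b line-encoding efficiency
LANES = 16              # assumed baseline: a full x16 link


def pcie5_gb_per_s(lanes: int = LANES) -> float:
    """Usable one-direction PCIe 5.0 bandwidth in GB/s."""
    return PCIE5_GT_PER_S * ENCODING * lanes / 8  # 8 bits per byte


def scaled_link_gb_per_s(factor: float, lanes: int = LANES) -> float:
    """Bandwidth of a link quoted as `factor` times PCIe 5.0."""
    return factor * pcie5_gb_per_s(lanes)


if __name__ == "__main__":
    print(f"PCIe 5.0 x16: {pcie5_gb_per_s():.1f} GB/s per direction")
    print(f"5x that:      {scaled_link_gb_per_s(5.0):.1f} GB/s per direction")
```

Under these assumptions the baseline works out to about 63 GB/s per direction, so a link five times faster would sit in the region of 315 GB/s.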

Ascent GX10 Specifications

CPU: ARM v9.2-A (GB10 Superchip)
GPU: NVIDIA Blackwell (GB10, integrated)
Memory: 128 GB LPDDR5X Coherent Unified
Storage: 1 TB / 2 TB NVMe PCIe 4.0 x4 or 4 TB NVMe PCIe 5.0 x4
Operating System: NVIDIA DGX OS (based on Ubuntu Linux)
Wi-Fi: Wi-Fi 7 (Gig+) 2x2 + Bluetooth 5.4
LAN: 1x 10 GbE
AI Networking: 1x NVIDIA ConnectX-7 NIC
USB Ports (back): 3x USB 3.2 Gen 2×2 Type-C (20 Gbps, DisplayPort 2.1) + 1x USB-C with PD in (180 W EPR PD3.1)
Video: 1x HDMI 2.1b
Power Supply: Up to 180 W via USB-C
Dimensions: 150 x 150 x 51 mm
Weight: 1.48 kg

High-Performance Networking with ConnectX-7 and Other Connectivity

The GX10 integrates the NVIDIA ConnectX-7 NIC, designed not for web browsing but for connecting AI computing nodes. Hardware acceleration for TLS, IPsec, and MACsec ensures encrypted transmissions without impacting the CPU, while support for IEEE 1588v2 PTP allows microsecond-level timing synchronization for real-time edge computing applications.

The most interesting feature, however, is the ability to link two GX10 units in a stack via a QSFP112 400G cable (sold separately): the two systems work as a single node with 256 GB of combined memory, enough to run inference on 405-billion-parameter models such as Llama 3.1 405B.

On the back panel, besides the ConnectX-7, there are three USB 3.2 Gen 2x2 Type-C ports at 20 Gbps with DisplayPort 2.1 support (allowing up to three monitors to be connected) and a dedicated USB-C port for power (180 W EPR PD3.1). Completing the panel are an HDMI 2.1b port, a 10 GbE LAN port, and a Kensington lock slot. Wi-Fi 7 provides next-generation wireless connectivity.

The USB-C port with Power Delivery input is noteworthy: the GX10 is powered through a standard USB-C connector, which simplifies integration into different environments and makes cable management easier.

The ASUS package includes the GX10, the power supply, the power cable, and documentation. The 0.4 m QSFP 400G cable necessary for dual-unit stacking is sold separately and must be requested through local distribution channels.

Cooling

Dissipating the heat of a 180 W AI SoC in a chassis just 51 mm high is no small challenge. ASUS has designed a dual-fan system with 7-level control which, according to the company's figures, provides thermal coverage 1.6 times better than comparable compact systems. The aim is sustained performance under prolonged AI loads, without throttling.

Although we could not take precise and complete temperature readings during our testing, we observed that the ASUS Ascent GX10 is a quiet system and that the NVIDIA Superchip typically does not exceed 80 °C, with some variability depending on the workload and the model being run.

Software Stack: Ready to Use

One of the most appreciated features of the GX10 is that it arrives configured as a complete platform, not as raw hardware to set up from scratch. The system runs NVIDIA DGX OS, an Ubuntu Linux distribution optimized for AI computing, with the following pre-installed and optimized:

CUDA, PyTorch, TensorFlow, Jupyter Notebook
NVIDIA TensorRT for high-performance inference
NVIDIA NIM and Blueprints for pre-built AI microservices
Ollama for rapid prototyping with local open-source models

Among the supported models are DeepSeek R1 (up to 70B parameters in single configuration) and Meta Llama 3.1 up to 405B parameters in dual-GX10 configuration.

Who the Ascent GX10 Is For

The GX10 targets a specific audience and does so unambiguously. It is not a product for gaming or professional video editing: it is a machine for those who work with language models and generative AI every day. AI researchers and developers get a complete local development environment in which to fine-tune models of hundreds of billions of parameters without relying on external APIs and without variable token costs. The fact that the system is based on Ubuntu and already includes PyTorch and Jupyter significantly lowers the entry barrier.

Data scientists and ML engineers can use it as a high-density edge workstation, suitable for contexts where data cannot leave the corporate perimeter (healthcare, fintech, industry). Startups and research labs with limited budgets find it a more economical alternative than the cloud: the hardware is a fixed cost, and running local open-source models incurs no recurring token costs. Enterprise teams operating in regulated sectors (healthcare, financial services, industry) may appreciate the data sovereignty guaranteed by on-premise inference, reducing the risks associated with transmitting sensitive data to cloud services.

The GX10 also serves as a testing and prototyping node: models developed and validated locally can be transferred with minimal modifications to DGX cloud infrastructures or accelerated data centers, thanks to full-stack compatibility with the NVIDIA AI ecosystem.

ASUS Ascent GX10 Tested

To put the ASUS Ascent GX10 to the test, we ran some performance tests using popular models. Lacking a competing Ryzen AI Max platform, we compared it with two other configurations: a CPU-only platform with a 32-core, 64-thread Ryzen Threadripper 9970X and 128 GB of G.Skill Zeta R5 Neo DDR5-6400 memory (32 GB x 4), and the same PC equipped with a GeForce RTX 5090 with 32 GB of VRAM.

The first model we ran on the Ascent GX10 was Qwen3.6-27B (qwen3.6:27b), using Ollama, an open-source program for downloading and running large language models (LLMs) directly on the computer. This is a leading open-weights model released by the Alibaba team; specifically, it is the flagship "dense" (non-MoE) model of the Qwen3.6 family. Its peculiarity is that, despite having "only" 27 billion parameters, it outperforms much larger and heavier models of the previous generation in programming and reasoning benchmarks. We used the quantized version (Q4_K_M).
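
For readers who want to take similar measurements, Ollama exposes a local HTTP API whose non-streaming responses report eval_count (tokens generated) and eval_duration (in nanoseconds). The following is a minimal sketch, not the procedure used for the results in this article. The URL is Ollama's default local endpoint, and the helper names tokens_per_second and benchmark are hypothetical.

```python
import json
import urllib.request

# Ollama's default local endpoint; adjust if the server listens elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Generation speed from Ollama's eval_count and eval_duration (ns)."""
    if eval_duration_ns <= 0:
        raise ValueError("eval_duration_ns must be positive")
    return eval_count / (eval_duration_ns / 1e9)


def benchmark(model: str, prompt: str, url: str = OLLAMA_URL) -> float:
    """Send one non-streaming request and return generated tokens per second."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        stats = json.load(response)
    return tokens_per_second(stats["eval_count"], stats["eval_duration"])
```

With an Ollama server running and the model already pulled, a call such as benchmark("qwen3.6:27b", "Explain unified memory in two sentences.") returns the generation rate for that single request. Averaging several runs with the same prompt gives a steadier number.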

As you can see, ASUS's super-mini PC sits clearly above the AMD CPU, while NVIDIA's flagship consumer GPU is far faster still. This is because the GPU, besides having enough VRAM to hold the model, offers thousands of extremely fast cores that operate in parallel. The ASUS system's GPU has 6144 Blackwell CUDA cores, the same core count as an RTX 5070.

The real strength of the ASUS Ascent GX10 is that, with 128 GB of unified memory, it can hold several models at once and, above all, models too large to fit in the RTX 5090's VRAM. When a model does fit in VRAM, the RTX 5090 keeps an advantage that is more or less pronounced depending on the model.

Moving to the second model, a quantized Gemma4 with 26 billion parameters, the ASUS mini-PC performs even better and narrows the gap to the RTX 5090. The third test, with Gemma4 31B BF16 (gemma4:31b-it-bf16), is instead a classic example of a model that is "too large" to fit in the RTX 5090's VRAM and therefore does not run on the GPU. This does not mean it is unusable: it simply runs on the CPU instead.
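
The arithmetic behind "too large" is simple: the weights need roughly the parameter count times the bytes per weight. The sketch below is a weights-only estimate that ignores the KV cache and runtime overhead, so real requirements are higher. It applies that arithmetic to the 31B BF16 case and to the Llama 3.1 405B dual-unit case mentioned earlier. The 4-bit and 8-bit precisions for the 405B model are assumptions made for illustration, not settings stated in this article.

```python
def weights_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return params_billions * bits_per_weight / 8


def fits(params_billions: float, bits_per_weight: float, memory_gb: float) -> bool:
    """True if the weights alone fit; KV cache and overhead are ignored."""
    return weights_gb(params_billions, bits_per_weight) <= memory_gb


if __name__ == "__main__":
    # (label, parameters in billions, bits per weight, available memory in GB)
    cases = [
        ("31B BF16 on 32 GB of VRAM", 31, 16, 32),
        ("31B BF16 on 128 GB unified memory", 31, 16, 128),
        ("405B at 4 bits on 256 GB (two units)", 405, 4, 256),
        ("405B at 8 bits on 256 GB (two units)", 405, 8, 256),
    ]
    for label, params, bits, memory in cases:
        verdict = "fits" if fits(params, bits, memory) else "does not fit"
        print(f"{label}: {weights_gb(params, bits):.1f} GB of weights, {verdict}")
```

At 16 bits per weight a 31B model needs about 62 GB for the weights alone, nearly twice the RTX 5090's 32 GB. By the same estimate, a 405B model fits in 256 GB only when quantized to around 4 bits per weight.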

In this case, on the ASUS platform, we managed to run the model entirely on the GB10 GPU. On the Threadripper platform, we noticed that the model ran on only the 32 physical cores; with some adjustments we got it to use all 64 threads, but performance worsened, which points to an optimization issue.

We then moved on to an advanced use case: Llama.cpp with Qwen 3.6 35B and MTP. This is a local inference setup in which Llama.cpp (a C++ inference engine, here running on CUDA) hosts Qwen 3.6 35B MoE, a model that occupies the memory of a 35B model but computes with the agility of a 3B one. Enabling MTP (Multi-Token Prediction) works around memory bandwidth limits through native speculative decoding and, hardware permitting, gives a marked boost in tokens per second (tokens/s).
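
To see why speculative decoding helps, consider the standard simplified analysis: if each drafted token is accepted independently with probability a, and k tokens are drafted per step, one verification pass yields on average (1 - a^(k+1)) / (1 - a) tokens instead of one. The sketch below evaluates that formula. The acceptance rates and draft lengths are hypothetical, not measurements from our tests, and the cost of producing the draft tokens is ignored.

```python
def expected_tokens_per_step(acceptance: float, draft_len: int) -> float:
    """Expected tokens emitted per verification pass.

    Assumes each drafted token is accepted independently with probability
    `acceptance`; every pass yields at least one token.
    """
    if not 0.0 <= acceptance <= 1.0:
        raise ValueError("acceptance must be between 0 and 1")
    if draft_len < 0:
        raise ValueError("draft_len must be non-negative")
    if acceptance == 1.0:
        # Every draft token is accepted, plus the token from the pass itself.
        return float(draft_len + 1)
    return (1 - acceptance ** (draft_len + 1)) / (1 - acceptance)


if __name__ == "__main__":
    # Hypothetical acceptance rates and draft lengths, for illustration only.
    for acceptance in (0.5, 0.8):
        for draft_len in (1, 3):
            tokens = expected_tokens_per_step(acceptance, draft_len)
            print(f"a={acceptance}, k={draft_len}: {tokens:.3f} tokens per pass")
```

Since each pass over the weights is limited by memory bandwidth, producing more than one token per pass is what raises tokens per second. The gain shrinks quickly when the drafted tokens are often rejected.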

NVIDIA Nemotron 3 Nano Omni (33b-q4_K_M) is a heavyweight of 33 billion parameters focused on real-time processing of audio, video, and text without intermediaries (hence "Omni"). Thanks to q4_K_M quantization, NVIDIA brings an enterprise-class model directly to enthusiasts' PCs: it requires only 24 GB of VRAM to run locally at full speed, leveraging the hardware acceleration of Tensor Cores without saturating the PCIe bus.

To further assess the ASUS system's AI capabilities, we ran Hermes Agent, an open-source framework for autonomous AI agents developed by Nous Research. This let us verify how responsively the system handles agentic workloads, one of the cutting-edge uses of AI brought to the fore by OpenClaw.

Finally, through OpenCode, we asked a model to recreate the iconic screen from the movie The Matrix, with green code falling from top to bottom. You can see how quickly the Ascent GX10 responded to the request and completed it:

Conclusions

The ASUS Ascent GX10 represents a concrete paradigm shift: the computing power sufficient for serious work with language models of hundreds of billions of parameters is now available in a format that fits in a bag.

The 128 GB unified memory, the integration of ConnectX-7, and the dual-node scalability make it credible not only as a prototyping tool but as production infrastructure for small teams.

The most relevant strengths are the completeness of the software stack, scalability through stacking, and the ability to run major open-source models locally—with all that entails in terms of privacy, latency, and operational costs. The main challenge remains the price, positioning it more as a professional investment than a consumer product.

Even taking that far from secondary factor into account, however, the Ascent GX10 comes out quite well. At about €4000 turnkey it is not exorbitantly expensive, though this is the current price, and a market marked by continuous increases in memory and storage costs could push it higher. The GeForce RTX 5090 alone costs over €3500, sometimes even more than the ASUS mini-PC itself. And for a workstation platform like the one in which we installed the NVIDIA GPU, the total price rises to €8000.

The 128 GB of unified memory installed on the system also gives flexibility to work with different models and several AI agents simultaneously. Ultimately, the ASUS Ascent GX10 is not just a successful product: it is a signal of where the sector is heading. The ability to run large-scale language models locally on a compact, relatively accessible, and ready-to-use machine reduces the gap between enterprise infrastructures and professional users.

For developers, researchers, and companies looking to experiment or deploy AI solutions while keeping control of their data, systems like this could soon become the norm rather than the exception. And, looking at the balance between performance, memory capacity, and price, the GX10 shows how this transition has already begun.