This 8-Year-Old NVIDIA GPU Humiliates Modern Cards in AI: Costs Just $100
Eight years after its debut, the NVIDIA Tesla V100 datacenter GPU is back in the spotlight thanks to a series of benchmarks focused on running large language models (LLMs) locally. The tests, conducted by the Hardware Haven channel, show that a professional card based on the Volta architecture can still compete with much more recent consumer cards in specific AI workloads.
The Tesla V100 was an important milestone for NVIDIA: it was built on Volta, an architecture aimed primarily at the datacenter, and was the first to adopt Tensor Cores, which are now central to the company's AI acceleration. The tested GPU is an SXM2 variant with 5120 CUDA Cores, 640 Tensor Cores, 16 GB of HBM2 memory on a 4096-bit bus, and a bandwidth of 898 GB/s. The declared TDP is 250 W, well below that of today's high-end AI platforms.
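The bus width and bandwidth figures quoted above are tied together by simple arithmetic: peak bandwidth is the bus width in bytes multiplied by the per-pin data rate. As a rough sketch (the function name is illustrative and does not come from the benchmarks), the per-pin rate implied by the quoted numbers can be back-computed like this:

```python
def implied_pin_rate_gbps(bandwidth_gb_s: float, bus_width_bits: int) -> float:
    """Per-pin data rate (Gbit/s) implied by a peak bandwidth and a bus width."""
    bytes_per_transfer = bus_width_bits / 8  # 4096 bits -> 512 bytes per transfer
    return bandwidth_gb_s / bytes_per_transfer


# Figures quoted in the article: 898 GB/s on a 4096-bit HBM2 bus.
rate = implied_pin_rate_gbps(898, 4096)
print(f"{rate:.2f} Gbit/s per pin")
```

The very wide bus is what lets HBM2 reach this bandwidth at a modest per-pin rate.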
At launch, the Tesla V100 cost more than $10,000; today, used 16 GB units can be found on the second-hand market for around $100. However, using an SXM2 GPU in an ordinary desktop PC is not straightforward: the form factor requires a dedicated SXM2-to-PCIe adapter, its own power delivery, and a custom cooling system.
The test required a custom-built cooling solution, made from a 3D-printed duct and a dedicated Noctua fan, because the Tesla V100 is designed for servers, where its fanless heatsink relies on the controlled airflow of the chassis. With the GPU, the adapter, and the modifications combined, the overall project cost exceeded $200.
The results with local LLMs are nonetheless interesting. In the benchmark with GPT-oss 20B, the V100 achieved about 130 tokens per second, compared with approximately 90 tokens per second for the Radeon RX 7800 XT. Against the 12 GB GeForce RTX 3060, the Volta card also showed a significant advantage in tests with Gemma4:e4b run through Ollama and Open WebUI.
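For readers who want to take a comparable measurement themselves, Ollama's `/api/generate` endpoint reports `eval_count` (tokens generated) and `eval_duration` (in nanoseconds) in its response. The following is a minimal sketch of turning those fields into tokens per second. It is not the channel's methodology, and the URL assumes an Ollama server on its default local port:

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def tokens_per_second(response: dict) -> float:
    """Generation speed from Ollama's eval_count and eval_duration (nanoseconds)."""
    return response["eval_count"] / response["eval_duration"] * 1e9


def benchmark(model: str, prompt: str) -> float:
    """Send one non-streaming request to the server and return tokens/s."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as reply:
        return tokens_per_second(json.load(reply))
```

Calling `benchmark("gpt-oss:20b", "Explain HBM2 in one paragraph.")` would return the generation speed for a single prompt. Both the model tag and the prompt here are placeholders, and a fair comparison would average several runs with the same prompt on each card.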
According to the shared data, the V100 generated tokens 42% faster than the RTX 3060 while also delivering better energy efficiency. With both GPUs limited to 100 W, the datacenter card maintained a 41% advantage in tokens per second per watt.
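The efficiency metric is simply throughput divided by power draw, so when both cards are capped at the same wattage, the efficiency gap equals the throughput gap. A small sketch of the arithmetic follows. The function names are illustrative, and the throughput values are normalized (RTX 3060 = 1.00) rather than measured, since the article does not give absolute figures for this test:

```python
def tokens_per_sec_per_watt(tokens_per_sec: float, watts: float) -> float:
    """Energy efficiency: generation speed divided by power draw."""
    return tokens_per_sec / watts


def relative_advantage(a: float, b: float) -> float:
    """Percentage by which a exceeds b."""
    return (a / b - 1) * 100


# Normalized throughput at a 100 W cap (assumes each card actually draws the cap).
v100 = tokens_per_sec_per_watt(1.41, 100)
rtx3060 = tokens_per_sec_per_watt(1.00, 100)
print(f"{relative_advantage(v100, rtx3060):.0f}% better tokens/s/W")

# The same helper applied to the GPT-oss 20B figures: 130 vs 90 tokens/s.
print(f"{relative_advantage(130, 90):.0f}% faster")
```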
The results highlight how some previous-generation professional GPUs can still offer value in modern AI workloads, thanks mainly to their Tensor Cores and the high bandwidth of HBM2 memory. Practical limitations remain, however: the modifications needed to run an SXM2 card in a desktop are not within every user's reach. The 32 GB versions of the Tesla V100, now sold for between $400 and $500, could prove even more interesting for larger AI models, where VRAM capacity remains one of the main operational constraints.
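To illustrate why VRAM capacity matters, the memory needed for a model's weights can be approximated as parameter count times bits per weight. The sketch below is a back-of-the-envelope estimate only: the 2 GB overhead for the KV cache and runtime buffers is an assumed placeholder, and the parameter counts are round hypothetical numbers rather than exact model sizes.

```python
def weights_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (10^9 bytes)."""
    return params_billions * bits_per_weight / 8


def fits(params_billions: float, bits_per_weight: float,
         vram_gb: float, overhead_gb: float = 2.0) -> bool:
    """Rough check: weights plus an assumed fixed overhead must fit in VRAM."""
    return weights_gb(params_billions, bits_per_weight) + overhead_gb <= vram_gb


print(weights_gb(20, 4))   # 20B parameters at 4 bits per weight
print(fits(20, 4, 16))     # on a 16 GB card
print(fits(32, 4, 16))     # a hypothetical 32B model on a 16 GB card
print(fits(32, 4, 32))     # the same model on a 32 GB card
```

Under these assumptions, a 4-bit 20B-parameter model fits on the 16 GB card, while a 4-bit 32B-parameter model needs the 32 GB version.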