Meta Found a Way to Save Millions: Instead of Buying New RAM, It Recycles Old Modules

The continuous expansion of infrastructure dedicated to artificial intelligence is increasing the demand for hardware components and, consequently, the costs of building large data centers. To address this scenario, Meta has chosen a path different from the indiscriminate purchase of new hardware: recovering DDR4 memory from servers that have reached the end of their operating cycle and using it in next-generation platforms through an internally developed ASIC called Vistara.

The company will present the project at ISCA 2026, but a technical document shows that the solution is already operational within its infrastructure, where it is used across millions of servers to support workloads ranging from AI inference to data analysis. According to Meta, servers generally have a lifespan of three to five years, while memory modules can remain fully functional for seven to ten years. As a result, large quantities of DDR4 modules are discarded along with the servers even though they are still fully usable.

At the same time, about 40% of the company's machine fleet cannot be expanded with additional memory, which limits the execution of workloads that require hundreds of gigabytes or even terabytes of RAM. This situation gave rise to the idea of recovering DDR4 modules from retired systems and integrating them into new DDR5-based servers, increasing the available capacity without relying exclusively on newly manufactured memory.

To enable this integration, Meta designed Vistara, a proprietary ASIC that serves as a bridge between DDR4 memory and modern processors through a CXL 2.0/1.1 interface on a PCI Express Gen5 x16 connection. The company explains that the CXL solutions available on the market did not meet its needs. Many products integrate memory and controller in the same device, which prevents the reuse of existing DIMM modules; others do not support DDR4 or were deemed too costly in price or power consumption.

Vistara takes a different approach: it completely separates the controller from the memory, so that DDR4 modules recovered from decommissioned servers can be installed freely. Each ASIC integrates two 72-bit DDR4 channels, supports speeds of up to 3,200 MT/s, and can handle up to 256 GB of memory using 64 GB DIMMs. The chip is also controlled by two RISC-V processors developed for this platform.
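As a rough back-of-envelope check on these figures, the sketch below derives the DIMM count and the theoretical peak bandwidth of the two DDR4 channels, then compares it with the raw bandwidth of a PCIe Gen5 x16 link. It assumes that each 72-bit channel carries 64 data bits plus 8 ECC bits (the usual layout for ECC DIMMs) and that PCIe Gen5 runs at 32 GT/s per lane with 128b/130b encoding. CXL and PCIe protocol overhead is ignored, so real throughput would be lower. All constant and function names are illustrative and do not come from Meta's document.

```python
# Back-of-envelope figures for one Vistara ASIC (illustrative names only).

DDR4_CHANNELS = 2           # from the article
DDR4_RATE_MT_S = 3200       # from the article
DATA_BITS_PER_CHANNEL = 64  # assumption: 72-bit channel = 64 data + 8 ECC bits
MAX_CAPACITY_GB = 256       # from the article
DIMM_SIZE_GB = 64           # from the article

PCIE_GEN5_GT_S = 32         # PCIe 5.0 per-lane signalling rate
PCIE_LANES = 16             # x16 link, from the article
PCIE_ENCODING = 128 / 130   # 128b/130b line encoding used by PCIe 5.0


def ddr4_peak_gb_s(channels, rate_mt_s, data_bits):
    """Theoretical peak bandwidth in GB/s (decimal) across all channels."""
    bytes_per_transfer = data_bits / 8
    return channels * rate_mt_s * 1e6 * bytes_per_transfer / 1e9


def pcie_peak_gb_s(gt_s, lanes, encoding):
    """Raw per-direction link bandwidth in GB/s, before protocol overhead."""
    return gt_s * lanes * encoding / 8


dimms = MAX_CAPACITY_GB // DIMM_SIZE_GB
dimms_per_channel = dimms // DDR4_CHANNELS
ddr4_bw = ddr4_peak_gb_s(DDR4_CHANNELS, DDR4_RATE_MT_S, DATA_BITS_PER_CHANNEL)
pcie_bw = pcie_peak_gb_s(PCIE_GEN5_GT_S, PCIE_LANES, PCIE_ENCODING)

print(f"DIMMs per ASIC:      {dimms} ({dimms_per_channel} per channel)")
print(f"DDR4 peak bandwidth: {ddr4_bw:.1f} GB/s")
print(f"PCIe Gen5 x16 raw:   {pcie_bw:.1f} GB/s per direction")
```

Under these assumptions the two DDR4 channels peak at about 51.2 GB/s, below the roughly 63 GB/s that the x16 link offers per direction, so on paper the link would not be the limiting factor.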

The technology is used in systems that Meta calls MemServer, equipped with AMD Turin processors featuring 158 cores and 316 threads. Each node combines 768 GB of DDR5 memory with 256 GB of recovered DDR4 memory, connected through Vistara cards installed in dedicated rear slots of the chassis. To ensure reliable operation even under high loads, the system uses dedicated cooling, with airflow channeled straight to the CXL modules and high-density memory.

From a software perspective, the DDR4 memory is exposed to the Linux operating system as a separate NUMA node with no associated CPUs. Applications first use the local memory attached directly to the processor and fall back on the additional capacity made available via CXL only when necessary.
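On a Linux machine, a memory-only NUMA node of this kind can be spotted through the standard sysfs tree, where each node directory under `/sys/devices/system/node` has a `cpulist` file that is empty when no CPUs belong to the node. The sketch below is a minimal illustration of that check, not Meta's tooling; the function names are hypothetical, and the root path is a parameter so the logic can be tried against a mock directory.

```python
import glob
import os


def parse_cpulist(text):
    """Parse a sysfs CPU list such as '0-3,8' into a list of CPU ids."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # an empty file yields an empty string
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def cpuless_nodes(sysfs_root="/sys/devices/system/node"):
    """Return the ids of NUMA nodes whose cpulist file is empty."""
    result = []
    for node_dir in glob.glob(os.path.join(sysfs_root, "node[0-9]*")):
        try:
            with open(os.path.join(node_dir, "cpulist")) as f:
                cpus = parse_cpulist(f.read())
        except OSError:
            continue  # skip entries that cannot be read
        if not cpus:
            node_id = int(os.path.basename(node_dir)[len("node"):])
            result.append(node_id)
    return sorted(result)


if __name__ == "__main__":
    print("CPU-less NUMA nodes:", cpuless_nodes())
```

On a system with CXL-attached memory configured as described, the node backing that memory would be expected to appear in this list, while the nodes holding the processor's local DDR5 would not.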

Meta has also made specific modifications to the Linux kernel's CXL driver, stating that these changes have already been merged or are in the process of being merged into the mainline kernel. The company claims that the additional memory significantly reduces Out of Memory (OOM) events when running particularly demanding applications such as Spark, Hive, databases, distributed caching systems, CI/CD pipelines, and inference models for recommendation systems.

According to data reported in the technical document, memory expansion yields a 33% reduction in job restarts and resource fragmentation caused by OOM errors, improving the overall reliability of the infrastructure. The economic impact appears equally significant. Meta states that for some deployments dedicated to distributed AI inference, the new architecture reduces the number of servers needed to perform the same work by up to 25%, with the additional benefit of reusing components that are already available.
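The article does not explain how the 25% figure is derived. One way to see how extra capacity can translate into fewer machines is the sketch below, which assumes a hypothetical workload limited purely by memory capacity, so that the number of nodes is the total footprint divided by the memory per node, rounded up. The 768 GB and 256 GB values are the MemServer figures quoted earlier; the 96 TB footprint and the function name are invented for illustration.

```python
import math


def nodes_needed(footprint_gb, node_memory_gb):
    """Nodes required when memory capacity is the only constraint."""
    return math.ceil(footprint_gb / node_memory_gb)


DDR5_GB = 768             # local memory per node, from the article
CXL_DDR4_GB = 256         # recovered DDR4 per node, from the article
FOOTPRINT_GB = 96 * 1024  # hypothetical aggregate footprint (96 TB)

baseline = nodes_needed(FOOTPRINT_GB, DDR5_GB)
expanded = nodes_needed(FOOTPRINT_GB, DDR5_GB + CXL_DDR4_GB)
reduction = 1 - expanded / baseline

print(f"DDR5 only:       {baseline} nodes")
print(f"DDR5 + CXL DDR4: {expanded} nodes")
print(f"Reduction:       {reduction:.0%}")
```

In this idealized case the count drops from 128 to 96 nodes, a 25% reduction that simply reflects the ratio 768/1024. Whether Meta's reported figure arises this way is not stated, and real workloads are also bounded by CPU, network, and the higher latency of CXL-attached memory.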