TMO, a Facebook mechanism that saves RAM on servers

In a report, Facebook engineers described TMO (Transparent Memory Offloading), a technology introduced last year that significantly saves RAM on servers by moving secondary data that is not needed for the current work to cheaper storage such as NVMe SSDs.

Facebook estimates that TMO saves between 20% and 32% of RAM on each server. The solution is designed for infrastructures where applications run in isolated containers. The kernel-side components of TMO are already included in the Linux kernel.

On the Linux kernel side, the technology relies on the PSI (Pressure Stall Information) subsystem, available since version 4.20.

PSI is already used in various out-of-memory handlers and reports how long tasks wait for various resources (CPU, memory, I/O). With PSI, user-space handlers can assess system load and slowdown patterns more accurately, so anomalies can be detected before they have a noticeable impact on performance.
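As a rough illustration of the data PSI exposes, the sketch below parses the text format of a pressure file such as /proc/pressure/memory. The function name parse_psi and the sample values are made up for this example; only the "some"/"full" line layout with avg10, avg60, avg300 and total fields comes from the kernel interface.

```python
def parse_psi(text):
    """Parse the contents of a PSI file (e.g. /proc/pressure/memory).

    Returns a dict like {"some": {"avg10": 1.25, ..., "total": 123456}, ...}.
    """
    result = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        values = {}
        for field in fields:
            key, _, raw = field.partition("=")
            # "total" is cumulative stall time in microseconds;
            # the avg* fields are percentages over 10, 60 and 300 seconds.
            values[key] = int(raw) if key == "total" else float(raw)
        result[kind] = values
    return result


# Hypothetical sample, in the same layout the kernel produces.
SAMPLE = """some avg10=1.25 avg60=0.40 avg300=0.10 total=123456
full avg10=0.50 avg60=0.10 avg300=0.02 total=65432"""

psi = parse_psi(SAMPLE)
```

On a Linux 4.20+ system with PSI enabled, the same function could be fed the contents of /proc/pressure/memory, or of a container's memory.pressure file in cgroup2.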

In user space, TMO is driven by the Senpai component, which dynamically adjusts the memory limit of application containers via cgroup2 based on data received from PSI.

Senpai uses PSI to detect the first signs of a resource shortage, evaluates how sensitive applications are to slower memory access, and tries to determine the minimum amount of memory a container needs: the data required for the work stays in RAM, while associated data that has settled in the file cache or is not directly in use at the moment is forced out to the swap partition.
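The feedback loop described here can be sketched as a single decision step. This is not Senpai's actual algorithm: the function name, the target pressure, the 1% step and the lower bound are all assumptions chosen for illustration. The idea is only that the limit shrinks while measured pressure stays below a target and grows back once pressure exceeds it.

```python
def next_memory_limit(current_limit, pressure, target_pressure=0.1,
                      step_percent=1, min_limit=64 * 1024 * 1024):
    """Return the next memory limit in bytes for a container.

    pressure        -- recent memory pressure in percent (e.g. PSI "some avg10")
    target_pressure -- hypothetical tolerated pressure level, in percent
    step_percent    -- hypothetical adjustment size, as a percentage of the limit
    """
    delta = current_limit * step_percent // 100
    if pressure < target_pressure:
        # The container shows no sign of stalling: tighten the limit a little.
        new_limit = current_limit - delta
    else:
        # The container is stalling on memory: give some memory back.
        new_limit = current_limit + delta
    return max(new_limit, min_limit)


shrunk = next_memory_limit(1_000_000_000, pressure=0.0)
grown = next_memory_limit(1_000_000_000, pressure=2.5)
```

In a real deployment the result would be written to a cgroup2 control file for the container (memory.high is assumed here), and the step would be repeated every few seconds.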

Transparent Memory Offloading (TMO) is Meta's solution for heterogeneous data center environments. It introduces a new Linux kernel mechanism that measures, in real time, the work lost due to shortages of CPU, memory, and I/O. Guided by this information and without any prior knowledge of the application, TMO automatically adjusts the amount of memory to offload to a heterogeneous device, such as compressed memory or an SSD. It does this based on the performance characteristics of the device and the application's sensitivity to slower memory accesses.

In essence, TMO keeps processes on a "strict diet" in terms of memory consumption, forcing out to the swap partition those unused memory pages whose removal does not noticeably affect performance (for example, pages with code used only during initialization, or single-use data in the disk cache). Unlike swapping in response to low memory, TMO evicts data proactively, based on prediction.

One of the criteria for eviction is that a memory page has not been accessed for 5 minutes. Such pages are called cold pages and, on average, they make up about 35% of an application's memory (depending on the type of application, the share varies from 19% to 65%).
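The five-minute criterion can be illustrated with a small sketch that computes the share of cold pages from last-access timestamps. The function name and the idea of keeping a per-page timestamp list are assumptions for this example; the kernel tracks page activity differently.

```python
COLD_AFTER_SECONDS = 5 * 60  # a page untouched for 5 minutes counts as cold


def cold_fraction(last_access_times, now):
    """Return the fraction of pages whose last access is at least 5 minutes old.

    last_access_times -- hypothetical per-page timestamps, in seconds
    now               -- current time, in the same units
    """
    if not last_access_times:
        return 0.0
    cold = sum(1 for t in last_access_times if now - t >= COLD_AFTER_SECONDS)
    return cold / len(last_access_times)


# Four pages with ages of 700, 600, 10 and 1 seconds: two of them are cold.
fraction = cold_fraction([0, 100, 690, 699], now=700)
```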

Eviction takes into account activity on both anonymous memory pages (memory allocated by the application) and memory used for file caching (allocated by the kernel). In some applications anonymous memory accounts for most of the consumption, while in others the file cache is also very important.

To avoid imbalance when evicting memory, TMO uses a new paging algorithm that evicts anonymous pages and file cache pages proportionally.
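One simple reading of "proportionally" is to split a reclaim target between the two pools according to their sizes. The sketch below shows only that reading; the function name and the size-based weighting are assumptions, and the actual kernel algorithm may weigh the pools by other signals.

```python
def split_reclaim(target_pages, anon_pages, file_pages):
    """Split a reclaim target between anonymous and file cache pages.

    Returns (anon_to_reclaim, file_to_reclaim), weighted by pool size.
    """
    total = anon_pages + file_pages
    if total == 0:
        return 0, 0
    # Never try to reclaim more pages than exist.
    target_pages = min(target_pages, total)
    anon_share = target_pages * anon_pages // total
    file_share = target_pages - anon_share
    return anon_share, file_share


# A container with 300 anonymous pages and 100 file cache pages,
# asked to give up 100 pages in total.
shares = split_reclaim(100, anon_pages=300, file_pages=100)
```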

Pushing infrequently used pages to slower memory does not have a large impact on performance, but it can significantly reduce hardware costs. Data is sent to SSDs or to compressed swap space in RAM. In terms of cost per byte stored, NVMe SSDs are up to 10 times cheaper than compressed RAM.

Finally, if you are interested in knowing more about it, you can consult the details in the following link.



  1.   elian said

    can this be used in normal computers with normal apps?