By Bill Jones, Sr. Solution Architect
The Intel Xeon Scalable Processors (codenamed Skylake) have been available from Intel since Q3 of 2017 and started to ship in vendor platforms a few months later. The changes to the memory layout continue to prompt questions from our clients. In this blog post, we’ll talk about what’s new, what’s good, and what’s challenging.
While writing this post, I learned something worth sharing: a more exact measure of memory speed that you may not be aware of, MegaTransfers per second. I will use "MT/s" in the text below because it rates the delivered speed of the memory rather than the frequency of the clock. For example, when data is transferred on both the rising and falling edges of the clock cycle rather than once per complete cycle, a 400 MHz clock yields 800 MT/s. Cool!
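To make the arithmetic concrete, here is a small sketch of the MT/s conversion described above, plus the peak-bandwidth math that follows from it. The 64-bit (8-byte) data bus is a property of DDR4 DIMMs; the function names are mine, for illustration only.

```python
# Sketch of the MT/s arithmetic described in the text.
# DDR ("double data rate") memory transfers data on both the rising and
# falling clock edges, so transfers per second = 2 x clock frequency.

def clock_mhz_to_mts(clock_mhz: int) -> int:
    """Convert a DDR clock frequency in MHz to MegaTransfers per second."""
    return clock_mhz * 2

def peak_bandwidth_mb_s(mts: int, bus_width_bits: int = 64) -> int:
    """Peak bandwidth in MB/s for a given transfer rate.

    DDR4 DIMMs have a 64-bit (8-byte) data bus, so each transfer moves 8 bytes.
    """
    return mts * (bus_width_bits // 8)

print(clock_mhz_to_mts(400))      # 800 MT/s, the example from the text
print(peak_bandwidth_mb_s(2666))  # 21328 MB/s, roughly 21.3 GB/s per channel
```

This is also why a 2666 MT/s DIMM is often marketed as "DDR4-2666" even though its underlying clock runs at 1333 MHz.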
One of the biggest changes to memory with the new Intel Xeon Scalable Processors is the number of channels and their depth. With the prior two generations of processors, there were four memory channels per socket, and each channel could support up to three DIMMs. Now, there are six channels, each supporting a maximum of two DIMMs.
Another significant change is that some processor models in the Skylake family support more memory than others. Processor models ending with "M" support up to 1.5TB of memory per socket, while ones without the "M" suffix support only 768GB per socket.
What is not new is that the mixing of RDIMM and LRDIMM memory is still not supported.
With the Haswell and Broadwell processor models, how memory channels were populated could have a significant impact on memory speed. Depending on the processor model, the types of memory modules used, and the number of modules per channel, the memory speed could be as low as 1333MT/s, and the maximum speed was 2400MT/s. With Skylake, the memory population rules are greatly simplified (in part by limiting the number of DIMMs per channel to two), and the minimum and maximum memory speeds have increased to 2133MT/s and 2666MT/s, respectively. Depending on the memory configuration and processor model, the memory speed may run at 2666MT/s, 2400MT/s, or 2133MT/s.
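The general rule behind those tiers can be sketched as "the platform runs at the lowest speed any component imposes." The sketch below assumes a simple minimum across the processor's rated speed, the DIMM's rated speed, and any limit tied to DIMMs per channel; the exact limits for a given part come from the vendor's population rules, not from this code.

```python
# Illustrative sketch (assumed model, not a vendor spec): effective memory
# speed is the minimum of the CPU's supported speed, the DIMM's rated
# speed, and any DIMMs-per-channel limit.

def effective_speed_mts(cpu_max_mts: int, dimm_rated_mts: int,
                        population_limit_mts: int) -> int:
    return min(cpu_max_mts, dimm_rated_mts, population_limit_mts)

# A processor rated for 2666 MT/s with 2666 MT/s DIMMs, no population penalty:
print(effective_speed_mts(2666, 2666, 2666))  # 2666

# A lower-tier processor capped at 2400 MT/s pulls the same DIMMs down:
print(effective_speed_mts(2400, 2666, 2666))  # 2400
```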
Also, since most client environments don’t require more than 768GB of RAM per socket (or 1.5TB per dual-socket server), providing separate models for high-memory workloads allows clients to save cost by only buying the high-memory model processors when needed.
Despite Skylake processors having six memory channels per socket, due to the space and heat limitations of dense server solutions, many of the motherboards built for blade and multi-node servers have only eight DIMM slots per socket, not 12! In these environments, two of the memory channels per socket support two DIMM modules each; the other four channels support only one. Intel's recommended best practices for memory population specify that each memory channel should be populated evenly with identical DIMM types. Since there are eight DIMM slots available, it is tempting to populate all eight. Unfortunately, this results in unevenly populated memory channels, which reduces memory performance. The question is: will the reduced memory performance impact my application more than the benefit of having more memory in the server? As you might imagine, the only way to know for sure is to run your application on a server with six DIMMs populated and on one with all eight slots populated, and measure the results yourself.
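The capacity side of that trade-off is simple arithmetic. The sketch below uses 32GB DIMMs as an example size; the balanced configuration puts one identical DIMM on each of the six channels, while filling all eight slots adds a second DIMM to two channels.

```python
# Capacity trade-off on an 8-slot-per-socket board with six channels:
# six identical DIMMs keep every channel balanced; filling all eight
# slots adds capacity but leaves two channels deeper than the rest.

DIMM_GB = 32  # example DIMM size

balanced_per_socket = 6 * DIMM_GB   # one DIMM on each of six channels
all_slots_per_socket = 8 * DIMM_GB  # two channels carry a second DIMM

print(balanced_per_socket)   # 192 GB, evenly populated channels
print(all_slots_per_socket)  # 256 GB, but unevenly populated channels
```

Whether the extra 64GB per socket is worth the performance penalty is exactly the question you have to answer by benchmarking your own application.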
For clients who have tested and validated their own server sizing standards, the added memory channels are driving changes to their current hardware configuration standards. With the Haswell and Broadwell processors, dual-socket servers had eight (8) memory channels. Since one 16GB DIMM usually costs less than two (2) 8GB DIMMs, and one 32GB DIMM costs less than two (2) 16GB DIMMs, clients tend to build their servers with either 16GB or 32GB DIMMs. As a result, these servers have either 128GB or 256GB of memory.
With the release of Skylake, standard dual-socket servers now have twelve (12) memory channels per server; using 16GB or 32GB DIMMs, we recommend building servers with 192GB or 384GB of memory. With Haswell and Broadwell, workloads that required more than 128GB but less than 192GB of RAM would have been optimally sized at 256GB of RAM. Now, those workloads can be accommodated with only 192GB, saving our clients money. However, explaining to application owners why the new company hardware standard for their workload has less memory (only 192GB instead of 256GB) can sometimes be an uncomfortable conversation.
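The sizing arithmetic above, one DIMM per channel on a dual-socket server, can be checked in a few lines:

```python
# One DIMM per channel on a dual-socket server, per the sizing
# discussion in the text.

def server_capacity_gb(channels_per_socket: int, sockets: int,
                       dimm_gb: int) -> int:
    return channels_per_socket * sockets * dimm_gb

# Haswell/Broadwell: 4 channels per socket x 2 sockets
print(server_capacity_gb(4, 2, 16))  # 128 GB
print(server_capacity_gb(4, 2, 32))  # 256 GB

# Skylake: 6 channels per socket x 2 sockets
print(server_capacity_gb(6, 2, 16))  # 192 GB
print(server_capacity_gb(6, 2, 32))  # 384 GB
```

This is where the 192GB and 384GB recommendations come from: the same one-DIMM-per-channel strategy, applied to twelve channels instead of eight.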
Intel and our hardware partners have created a myriad of tools to help Dasher and our clients create valid and optimal memory configurations in their servers. Here are some of the resources we rely on to help our clients deal with the ever-changing memory landscape.
- HPE Memory Configurator
- HPE Datasheet on Gen10 Memory Population Rules
- Dell’s Landing Page for Everything Related to PowerEdge Memory