Latency and throughput figures for LRDIMMs emerge

LRDIMMs exhibit 45% worse latency and 36.7% worse throughput at 3 DPC

LRDIMMs (which are a new standard and incompatible with DDR3 RDIMMs) exhibit significant performance impairment at 3 DPC compared to RDIMM-compatible HCDIMMs:

– LRDIMMs have 45% worse latency than HCDIMMs (235ns vs. 161.9ns for HCDIMMs)
– LRDIMMs have 36.7% worse throughput than HCDIMMs (40.4GB/s vs. 63.9GB/s for HCDIMMs)

And this is when HCDIMMs are run at a SLOWED-DOWN 1066MHz at 3 DPC (in order to match the lower maximum achievable speed of the LRDIMMs).

This comparison – at the SAME speed – highlights the architectural weaknesses of the LRDIMM design, independent of the speeds each module can achieve.

When compared at the MAXIMUM achievable speeds (LRDIMMs at 1066MHz at 3 DPC and HCDIMM at 1333MHz at 3 DPC):

– LRDIMMs have approx. 45% worse latency than HCDIMMs (235ns vs. 161.9ns for HCDIMMs)
– LRDIMMs have 40% worse throughput than HCDIMMs (40.4GB/s vs. 68.1GB/s for HCDIMMs)
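As a quick sanity check, here is a minimal sketch (Python) of how these percentages fall out of the HP loaded-latency and throughput figures discussed later in this post; the helper function and its name are illustrative only.

```python
# A rough check of the percentages quoted above, using the HP loaded-latency
# and throughput figures discussed later in this post.

def pct_worse(lrdimm, hcdimm, higher_is_better=False):
    """How much worse the LRDIMM figure is, with the HCDIMM figure as baseline."""
    if higher_is_better:   # throughput: a lower value is worse
        return (hcdimm - lrdimm) / hcdimm * 100
    return (lrdimm - hcdimm) / hcdimm * 100   # latency: a higher value is worse

# Same speed: both modules at 1066MHz at 3 DPC
print(pct_worse(235.0, 161.9))                       # ~45% worse latency
print(pct_worse(40.4, 63.9, higher_is_better=True))  # ~36.8% worse throughput (the 36.7% above)

# Max achievable speeds at 3 DPC: LRDIMM at 1066MHz, HCDIMM at 1333MHz
print(pct_worse(40.4, 68.1, higher_is_better=True))  # ~40% worse throughput
```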

.

Inphi's inability to deliver credible benchmarks for LRDIMMs

Inphi, as the sole supplier of LRDIMM buffer chipsets, has been shy about posting benchmarks for LRDIMMs. In its recent conference call, Inphi referred to an IBM database benchmark that used Samsung 32GB LRDIMMs. However, the improvement shown there was with respect to PREVIOUS-generation servers, and gave little indication of LRDIMM performance vs. current RDIMMs or HyperCloud HCDIMMs (which are RDIMM-compatible).

LRDIMMs are a new standard and are incompatible with DDR3 RDIMMs. A shortfall in performance makes the use-case for LRDIMMs even more difficult compared to the DDR3 RDIMM-compatible options, i.e. RDIMMs or HyperCloud HCDIMMs.

Netlist in a recent blog post has pointed to some HP numbers (that I had missed earlier) which show exactly how impaired the LRDIMM performance is at 3 DPC vs. DDR3 RDIMM-compatible options – specifically the HyperCloud HCDIMMs.

The HP numbers suggest a serious decline in both “loaded latency” and “throughput” when LRDIMMs are used at 3 DPC (i.e. high memory loading).

.

Inphi seeks validation of LRDIMM performance

In the Inphi Q2 2012 conference call, Inphi talks about the paucity of benchmark information about LRDIMMs (this is odd, given Intel's promotion of LRDIMMs for the Romley rollout).

Inphi talks about an IBM benchmark (that uses Samsung 32GB LRDIMMs) as validation for LRDIMMs – yet that benchmark demonstrates improvement over PREVIOUS generation servers, and says very little about the relative advantage of LRDIMMs vs. the RDIMM-compatible options i.e. RDIMMs or HyperCloud HCDIMMs.

Here is Inphi talking about the benchmark info currently available (the IBM database benchmark) and saying that more benchmarks may become available in the second half of 2012.

It should be noted that Inphi is currently the sole supplier of LRDIMM buffer chipsets (as IDTI and Texas Instruments have skipped LRDIMMs for Romley altogether – although a smaller company called Montage may also offer LRDIMM buffer chipsets in the future).

Inphi, IDTI and Texas Instruments are the top 3 buffer chipset makers. Of these, Inphi makes buffer chipsets for both RDIMMs and LRDIMMs.

http://investors.inphi.com/phoenix.zhtml?c=237726&p=irol-irhome
Webcast
Q2 2012 Inphi Corp Earnings Conference Call (Replay)
07/25/12 at 2:00 p.m. PT

at the 19:50 minute mark:

Ford Tamer – CEO:

Going to the computing market .. uh .. again similar we saw strength in LRDIMM and we see that strength continuing in the second half.

And on the base register business (i.e. for RDIMMs) .. uh .. we see Romley accelerating adoption in the second half, opening up a larger TAM (Total Addressable Market).

So .. uh .. when we look at macroeconomic, we .. uh .. still confident where we are going because we are based on new technologies.

We see a bit of headwinds in our more earlier technology like 10Gig.

at the 20:30 minute mark:

Doug Friedman of RBC Capital:

Great. If you could give us a little bit of idea around the LRDIMM adoption.

Where are you seeing the uptick in demand there, and what is really driving it.

Are you successful in getting new customer engagements, or is this stuff that is happening through channels and through “pull” through the OEM partners.

at the 20:45 minute mark:

Ford Tamer – CEO:

Doug, we had said that we wanted to get some benchmarks out and we are continuing to work on them.

But we are very excited to have benchmarks .. uh .. this quarter which are helping us draw and drive adoption.

And so .. uh .. the benchmark by IBM of having THE fastest TPC-E and TPC-C ever on LRDIMM .. uh .. was as much as 20% and 40% performance improvement .. uh .. compared to next-best server.

It’s just phenomenal.

So .. we we we’ve been saying we we could drive a significant performance advantage and we weren’t able to quantify it until now.

And now we feel like we can stand and talk about a very significant performance with LRDIMM.

So that’s one.

Two, I think .. uh .. we’ve seen announcement on 768GB and 1.5TB by HP and IBM and again that increased capacity means a lot to .. uh .. in-memory .. uh .. database in-memory applications .. uh .. high performance applications, financial trading. And so that’s number two.

Number three, what’s happening is we’ve seen the price of some of these LRDIMM modules drop by almost 50% (that’s because they were priced absurdly high to begin with) as the volumes are picking up.

at the 22:10 minute mark:

So really during the quarter, we’ve seen tremendous support from both our OEM partners .. server OEM partners as well as our module partners.

In addition, we continue to work with some of the software partners and expect additional benchmarks to be published in the second half of the year.

So these benchmarks are helping us clearly articulate the benefit of LRDIMM and drive the adoption.

Doug Friedman of RBC Capital:

Terrific.

Thank you for all the detail.

The followup I have and then I’ll jump back in the queue if I have any more.

.

The IBM database benchmark

The IBM database benchmark that Inphi references does not actually validate LRDIMMs – that benchmark compares current Romley-generation servers (using Samsung 32GB LRDIMMs built with the Inphi LRDIMM buffer chipset) with PREVIOUS-generation pre-Romley servers.

As a result, these “benchmark” results say very little about LRDIMM performance relative to the RDIMM-compatible memory options available right now – RDIMMs and HyperCloud HCDIMMs.

Here is the IBM database benchmark that Inphi is referring to:

ftp://service.boulder.ibm.com/eserver/benchmarks/news/newsblurb_x3650M4_tpce_030612.pdf
IBM posts best 2-processor performance ever published on TPC-E benchmark
IBM System x3650 M4 sets new record for 2-processor server performance on TPC-E

March 6, 2012 … IBM has published a benchmark result that sets a new record for 2-processor performance on the TPC-E benchmark, which is designed to enable clients to more objectively measure and compare the performance and price of OLTP systems.

The IBM System x3650 M4 server achieved 1,863.23 tpsE (transactions per second E) at $207.85 USD / tpsE. (1) This result is faster than all the other currently published TPC-E results for 2-processor servers, and represents a significant performance benefit compared to systems using previous-generation processors. For example, the x3650 M4’s result is more than 45% faster than the HP ProLiant DL380 G7 server’s result. (2)

.

Netlist points to HP numbers for LRDIMMs

The recently created Netlist blog points to performance figures on LRDIMMs (taken from HP docs) that I had not noticed before:

http://www.netlist.com/media/blog/hypercloud-memory-scaling-the-high-density-memory-cliff/
HyperCloud HCDIMM: Scaling the High Density Memory Cliff
July 24, 2012 at 5:00 AM

See text accompanying Figure 4 – which suggests that 1600MHz HCDIMM will be available in Q1 2013.

Figure 5 – confirms the info presented on this blog about the speed slowdown data for 32GB LRDIMMs at 2 DPC and 3 DPC (taken from OEM data sheets).

Figure 6 – quotes HP figures for LRDIMM “loaded latency” and “throughput” and compares them with HyperCloud HCDIMM performance.

32GB LRDIMMs max out at 1066MHz at 3 DPC

As shown in Figure 5 on the Netlist blog:

– HyperCloud HCDIMMs run at a maximum of 1333MHz at 3 DPC
– LRDIMMs run at a maximum of 1066MHz at 3 DPC on Intel plan-of-record servers

This is consistent with what has been stated on this blog – that LRDIMMs cannot run at 1333MHz at 3 DPC on Intel plan-of-record servers.

For example:

http://ddr3memory.wordpress.com/2012/06/29/infographic-memory-buying-guide-for-romley-2-socket-servers/
Infographic – memory buying guide for Romley 2-socket servers
June 29, 2012

Comparing LRDIMMs with (slowed-down) HCDIMMs at 1066MHz

In Figure 6 – Netlist compares:

– 32GB LRDIMMs running at their max achievable 1066MHz at 3 DPC
vs.
– 32GB HyperCloud HCDIMMs running at a SLOWED-DOWN 1066MHz at 3 DPC

HCDIMMs can actually run faster, at 1333MHz at 3 DPC – but are slowed down for the purposes of the comparison with LRDIMMs (which max out at 1066MHz at 3 DPC).

One can see from the results that even against the deliberately slowed-down HCDIMM, the LRDIMM performs significantly worse.

What Netlist is trying to demonstrate is the architectural weakness of the LRDIMM (by running both at the same speed).

Netlist blog quotes numbers for 32GB LRDIMMs at 2 DPC and 3 DPC (Figure 6):

– at 1333MHz at 2 DPC – 138.9ns loaded latency – 68.1GB/s throughput
– at 1066MHz at 3 DPC – 235ns loaded latency – 40.4GB/s throughput

These are taken from the HP docs mentioned below.

Netlist blog also quotes numbers for 32GB HCDIMMs at 2 DPC and 3 DPC (Figure 6):

– at 1333MHz at 2 DPC – 138.9ns loaded latency – 68.1GB/s throughput
– at 1066MHz at 3 DPC – 161.9ns loaded latency – 63.9GB/s throughput

NOTE: while 32GB HCDIMM can run at 1333MHz at 3 DPC, it is being run SLOWER at 1066MHz at 3 DPC so that a comparison can be made with 32GB LRDIMMs running at 1066MHz at 3 DPC – i.e. both at the same speed.

The results demonstrate (Figure 6) that at 1066MHz at 3 DPC:

– 32GB LRDIMM – 235ns loaded latency – 40.4GB/s throughput
– 32GB HCDIMM – 161.9ns loaded latency – 63.9GB/s throughput

So running at the same speed, the HCDIMM has a significant advantage over the LRDIMM.

What about when you compare HCDIMMs running at their full speed of 1333MHz at 3 DPC vs. LRDIMMs running at their full speed of 1066MHz at 3 DPC (on Intel POR servers)?

Netlist goes on to state (the text above Figure 6):

With 24 HCDIMM operating at a 1333 Data Rate, the throughput result is similar to the 68.1GB/s result of only populating 16 HCDIMMS. With LRDIMM, depending on application, it’s possible that overall server performance will decline even as memory and the dollars spent on it increases.

Which means that if you were to compare 32GB LRDIMMs at 1066MHz at 3 DPC with 32GB HCDIMMs running at their full achievable speed of 1333MHz at 3 DPC, the comparison would be even more skewed in HCDIMM’s favor – throughput of 68.1GB/s:

– 32GB LRDIMM – 235ns loaded latency – 40.4GB/s throughput
– 32GB HCDIMM – approx. 161.9ns loaded latency – 68.1GB/s throughput

.

Significance of comparing LRDIMMs with SLOWED-DOWN HCDIMMs running at 1066MHz at 3 DPC

The significance is that even on non-Intel-POR servers there will be a significant performance penalty if you use LRDIMMs over HCDIMMs.

The IBM x3750 M4 server goes beyond the Intel POR (plan-of-record) and implements some motherboard tweaks that “lift all boats”, i.e. allow RDIMMs and LRDIMMs to run faster:

http://ddr3memory.wordpress.com/2012/06/02/memory-choices-for-the-ibm-system-x-x3750-m4-servers-2/
Memory choices for the IBM System X x3750 M4 servers
June 2, 2012

Presumably if HCDIMMs become available on this server, they too would benefit from this leeway (possibly running faster than 1333MHz ?).

In the text accompanying Figure 4 in their blog, Netlist mentions 1600MHz HyperCloud HCDIMM availability in Q1 2013:

http://www.netlist.com/media/blog/hypercloud-memory-scaling-the-high-density-memory-cliff/
HyperCloud HCDIMM: Scaling the High Density Memory Cliff
July 24, 2012 at 5:00 AM

However, even if HCDIMMs were run at their usual 1333MHz at 3 DPC, the latency and throughput advantage over LRDIMMs is likely to remain – as demonstrated by the comparison between LRDIMMs and HCDIMMs both running at 1066MHz at 3 DPC.

The comparison essentially points to architectural weaknesses in the LRDIMM design.

For more on the architectural weaknesses in the LRDIMM design (asymmetrical lines and centralized buffer chipset):

http://ddr3memory.wordpress.com/2012/05/31/lrdimm-latency-vs-ddr4/
LRDIMM latency vs. DDR4
May 31, 2012

.

HP data – LRDIMMs exhibit performance impairment at 3 DPC

The data on LRDIMM latency and throughput performance slowdown at 3 DPC is provided by HP.

Although I had seen those tables in that doc, I had not realized their significance.

Netlist points to the HP doc:

http://h20000.www2.hp.com/bc/docs/support/SupportManual/c03293145/c03293145.pdf
Configuring and using DDR3 memory with HP ProLiant Gen8 Servers
Best Practice Guidelines for ProLiant servers with the Intel® Xeon® E5-2600 processor series
Engineering Whitepaper, 1st Edition

pg. 26:

Appendix A
Sample Configurations for 2P ProLiant Gen8 servers

24 DIMM slot servers:

16GB RDIMM 2-rank (at 1, 2, 3 DPC at 1333MHz and 1600MHz):

8 x 16GB 2R 1333 R 128 8 16GB 2R RDIMM 1333 1 65.7 138.9 75.3 5.1 42.6
8 x 16GB 2R 1600 R 128 8 16GB 2R RDIMM 1600 1 65.3 111.0 87.7 6.0 48.6
16 x 16GB 2R 1333 R 256 16 16GB 2R RDIMM 1333 2 65.7 150.7 72.6 11.7 81.2
16 x 16GB 2R 1600 R 256 16 16GB 2R RDIMM 1600 2 65.0 121.4 83.7 13.8 94.2
24 x 16GB 2R 1333 R 384 24 16GB 2R RDIMM 1066 3 66.1 161.4 60.0 17.5 79.2
24 x 16GB 2R 1600 R 384 24 16GB 2R RDIMM 1066 3 65.4 161.9 59.5 20.3 87.9

32GB LRDIMM (at 1, 2, 3 DPC):

8 x 32GB 4R 1333 L 256 8 32GB 4R LRDIMM 1333 1 66.1 122.1 72.4 18.3 77.7
16 x 32GB 4R 1333 L 512 16 32GB 4R LRDIMM 1333 2 66.8 138.9 68.1 35.3 110.8
24 x 32GB 4R 1333 L 768 24 32GB 4R LRDIMM 1066 3 70.9 235.0 40.4 55.9 121.6

In each row, the key numbers are the Loaded latency (ns) and the Throughput (GB/s) – the fourth-from-last and third-from-last values respectively (e.g. 138.9ns and 68.1GB/s for the 2 DPC LRDIMM row).
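To make those flattened rows easier to read, here is a minimal sketch (Python) that pulls out the type, speed, DPC, loaded latency and throughput from the rows exactly as quoted above. The column positions are inferred from the rows as reproduced here (counting from the end of each row), not taken from the HP document itself, so treat the indexing as an assumption.

```python
# Extract the key columns from the flattened HP table rows quoted above.
rows = [
    "8 x 32GB 4R 1333 L 256 8 32GB 4R LRDIMM 1333 1 66.1 122.1 72.4 18.3 77.7",
    "16 x 32GB 4R 1333 L 512 16 32GB 4R LRDIMM 1333 2 66.8 138.9 68.1 35.3 110.8",
    "24 x 32GB 4R 1333 L 768 24 32GB 4R LRDIMM 1066 3 70.9 235.0 40.4 55.9 121.6",
]

for row in rows:
    t = row.split()
    dimm_type, speed_mhz, dpc = t[-8], t[-7], t[-6]          # e.g. LRDIMM, 1066, 3
    loaded_latency_ns, throughput_gbs = t[-4], t[-3]          # e.g. 235.0, 40.4
    print(f"{dimm_type} at {speed_mhz}MHz, {dpc} DPC: "
          f"{loaded_latency_ns}ns loaded latency, {throughput_gbs}GB/s")
```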

One can see that for LRDIMMs, the loaded latency almost DOUBLES – from 122.1ns to 235.0ns – as you go from 1 DPC to 3 DPC.

And the throughput drops from 72.4GB/s to 40.4GB/s – a drop of roughly 44% – as you go from 1 DPC to 3 DPC.

While the drop from 1 DPC to 2 DPC is modest, the results show that moving from 2 DPC to 3 DPC (DIMMs per channel), 32GB LRDIMMs exhibit a serious impairment in both latency and throughput.
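As a quick sanity check on those two observations, the same HP figures in a short sketch:

```python
# 1 DPC -> 3 DPC degradation for 32GB LRDIMMs, using the HP figures quoted above.
lat_1dpc, lat_3dpc = 122.1, 235.0   # loaded latency (ns)
bw_1dpc,  bw_3dpc  = 72.4, 40.4     # throughput (GB/s)

print(lat_3dpc / lat_1dpc)                   # ~1.92x -> latency nearly doubles
print((bw_1dpc - bw_3dpc) / bw_1dpc * 100)   # ~44%   -> throughput drops by roughly 44%
```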

Netlist is using the 32GB LRDIMM numbers for 2 DPC and 3 DPC (the last two rows of the 32GB LRDIMM table above) in Figure 6 on the Netlist blog.

.

Confirmation of HyperCloud HCDIMM low latency claims

Netlist’s claim of a latency improvement for HyperCloud HCDIMMs over the LRDIMMs is thus confirmed.

The loaded latency numbers for HyperCloud HCDIMM (Figure 6 on Netlist blog) are almost identical to those for 16GB RDIMMs 2-rank at 1333MHz (HP docs above):

– 32GB HCDIMMs at 2 DPC – 138.9ns loaded latency – 68.1GB/s throughput
– 32GB HCDIMMs at 3 DPC – 161.9ns loaded latency – 63.9GB/s throughput
.
– 16 x 16GB 2R 1333 R 256 16 16GB 2R RDIMM 1333 2 65.7 150.7 72.6 11.7 81.2
– 24 x 16GB 2R 1333 R 384 24 16GB 2R RDIMM 1066 3 66.1 161.4 60.0 17.5 79.2

And those are the numbers for the 1333MHz 16GB RDIMMs, which are plain 2-rank modules, compared to the more complex 32GB HCDIMMs.

Comparing the 2-rank 16GB RDIMMs and the more complex 32GB HCDIMMs (2-rank virtual, 4-rank internal) running at the same speed of 1333MHz gives an indication of the architectural superiority of the HCDIMMs, which allows such low-latency operation in a load reduction/rank multiplication module (compare the LRDIMM numbers to see how badly such an effort can turn out).

The analysis on this blog that suggested that HyperCloud HCDIMM latency is close to that of RDIMMs (an amazing achievement) is also confirmed.

.

HCDIMMs as future memory – impact on DDR4

For more on the architectural weaknesses in the LRDIMM design (asymmetrical lines and centralized buffer chipset):

http://ddr3memory.wordpress.com/2012/05/31/lrdimm-latency-vs-ddr4/
LRDIMM latency vs. DDR4
May 31, 2012

This is why DDR4 has adopted not only the load reduction and rank multiplication technology used on LRDIMMs/HCDIMMs, but also the symmetrical lines and decentralized buffer chipset used on HCDIMMs.

On DDR4 borrowing from LRDIMM use of Netlist IP in “load reduction” and “rank multiplication”:

http://ddr3memory.wordpress.com/2012/06/08/ddr4-borrows-from-lrdimm-use-of-load-reduction/
DDR4 borrows from LRDIMM use of load reduction
June 8, 2012

http://ddr3memory.wordpress.com/2012/06/07/jedec-fiddles-with-ddr4-while-lrdimm-burns/
JEDEC fiddles with DDR4 while LRDIMM burns
June 7, 2012

.

HP – 32GB RDIMM 4-rank non-viability
HP – 16GB LRDIMM non-viability

Note that HP does NOT bother listing the 4-rank 32GB RDIMMs (non-viable) or the 16GB LRDIMMs (non-viable).

While IBM announced IBM HyperCloud HCDIMMs at the Romley launch, the HP announcement of HP HyperCloud HCDIMMs came a few weeks later. This HP document predates the HP announcement, which is why it does not mention the HyperCloud HCDIMMs.

As demonstrated on this blog, 16GB LRDIMMs are non-viable vs. RDIMMs.

http://ddr3memory.wordpress.com/2012/06/19/why-are-16gb-lrdimms-non-viable/
Why are 16GB LRDIMMs non-viable ?
June 19, 2012

And 32GB LRDIMMs are non-viable vs. 32GB HyperCloud HCDIMMs.

http://ddr3memory.wordpress.com/2012/06/20/non-viability-of-32gb-rdimms/
Non-viability of 32GB RDIMMs
June 20, 2012

On the non-viability of LRDIMMs in general:

http://ddr3memory.wordpress.com/2012/07/05/examining-lrdimms/
Examining LRDIMMs
July 5, 2012

.

Moving from LRDIMMs to RDIMM-compatible HyperCloud HCDIMMs

When compared at the SAME speed of 1066MHz at 3 DPC (i.e. running HCDIMMs slower than their maximum achievable 1333MHz at 3 DPC to match the slower 1066MHz maximum of the LRDIMMs):

– LRDIMMs have 45% worse latency than HCDIMMs (235ns vs. 161.9ns for HCDIMMs)
– LRDIMMs have 36.7% worse throughput than HCDIMMs (40.4GB/s vs. 63.9GB/s for HCDIMMs)

Conversely, moving from LRDIMMs to HCDIMMs at 1066MHz at 3 DPC, you will see this improvement:

– 31% improvement in latency in going to HCDIMMs (235ns vs. 161.9ns for HCDIMMs)
– 58.2% improvement in throughput in going to HCDIMMs (40.4GB/s vs. 63.9GB/s for HCDIMMs)

When compared at the MAXIMUM achievable speeds (LRDIMMs at 1066MHz at 3 DPC and HCDIMM at 1333MHz at 3 DPC):

– LRDIMMs have approx. 45% worse latency than HCDIMMs (235ns vs. 161.9ns for HCDIMMs)
– LRDIMMs have 40% worse throughput than HCDIMMs (40.4GB/s vs. 68.1GB/s for HCDIMMs)

Conversely, moving from LRDIMMs to HCDIMMs at their maximum achievable speeds at 3 DPC, you will see this improvement:

– approx. 31% improvement in latency in going to HCDIMMs (235ns vs. 161.9ns for HCDIMMs)
– 68.5% improvement in throughput in going to HCDIMMs (40.4GB/s vs. 68.1GB/s for HCDIMMs)
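Note that the “worse” and “improvement” percentages describe the same raw numbers with the baseline swapped (HCDIMM as the baseline for “worse”, LRDIMM as the baseline for “improvement”), which is why 45% worse latency corresponds to a 31% latency reduction. A minimal sketch:

```python
# Same HP numbers, two baselines: "LRDIMM is X% worse" uses the HCDIMM figure
# as the baseline, while "moving to HCDIMM is a Y% improvement" uses the
# LRDIMM figure as the baseline.
lr_lat, hc_lat = 235.0, 161.9   # loaded latency (ns), both at 1066MHz at 3 DPC
lr_bw,  hc_bw  = 40.4, 68.1     # throughput (GB/s): LRDIMM at 1066MHz, HCDIMM at 1333MHz

print((lr_lat - hc_lat) / hc_lat * 100)  # ~45%   -> LRDIMM latency is ~45% worse
print((lr_lat - hc_lat) / lr_lat * 100)  # ~31%   -> ~31% latency reduction moving to HCDIMM
print((hc_bw - lr_bw) / hc_bw * 100)     # ~41%   -> LRDIMM throughput is ~40% worse
print((hc_bw - lr_bw) / lr_bw * 100)     # ~68.5% -> ~68.5% throughput gain moving to HCDIMM
```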

.

Conclusion – narrowing use-case for LRDIMMs

This creates a very difficult use-case for LRDIMMs vs. the RDIMM-compatible HyperCloud HCDIMMs:

– HCDIMMs are DDR3 RDIMM compatible
– LRDIMMs are not compatible with DDR3 RDIMMs (are a new standard)
.
– LRDIMMs have 45% worse latency at 3 DPC vs. HCDIMMs at same speeds
– LRDIMMs have 36.7% worse throughput at 3 DPC vs. HCDIMMs at same speeds
.
– LRDIMMs have approx. 45% worse latency at 3 DPC vs. HCDIMMs at their max speeds
– LRDIMMs have 40% worse throughput at 3 DPC vs. HCDIMMs at their max speeds

Conversely, moving from LRDIMMs to HCDIMMs you will see this improvement:

– 31% improvement in latency in going to HCDIMMs at the same speed
– 58.2% improvement in throughput in going to HCDIMMs at the same speed
.
– approx. 31% improvement in latency in going to HCDIMMs at their max speeds
– 68.5% improvement in throughput in going to HCDIMMs at their max speeds
