High memory loading and protection from power loss
Since in-memory database (IMDB) solutions move data off the hard drive (slow disk access) and into DRAM main memory (fast but expensive), they need far more memory on the server.
Here the bottleneck is the high cost of memory – when you load 384GB of memory onto a 2-socket server, the cost of the memory dwarfs the cost of the server itself. For this reason compression is often used to reduce total memory requirements – it slows access somewhat, but is still much faster than going to hard disk.
Even so, the total memory requirements for in-memory databases can get quite large.
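As a rough illustration of why compression pulls memory requirements down, here is a toy sketch using Python's zlib. The sample record and the repetition factor are purely illustrative (not from any particular IMDB product); real columnar in-memory data tends to be similarly redundant and so compresses well:

```python
import zlib

# Illustrative only: a highly repetitive "table" standing in for
# columnar in-memory data, which tends to contain much redundancy.
record = b'{"customer_id": 1001, "region": "EMEA", "balance": 0}'
table = record * 10_000

packed = zlib.compress(table, level=6)
ratio = len(packed) / len(table)
print(f"{len(table)} bytes -> {len(packed)} bytes (ratio {ratio:.4f})")
```

Decompressing on access costs CPU cycles, which is the "slows down a bit" trade-off mentioned above – but that cost is still orders of magnitude smaller than a disk seek.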
Memory for in-memory databases
When you add a lot of memory to a server, you run into the “high memory loading” issues mentioned in other articles here, which call for load reduction and rank multiplication techniques (Netlist IP) – that is, for LRDIMM/HyperCloud memory modules.
Memory choice remains the same as for virtualization servers – the OEMs generally offer standardized memory – for example, HP offers:
– HP Smart Memory RDIMM
– HP Smart Memory LRDIMM
– HP Smart Memory HyperCloud
All have the same type of error recovery features.
The IBM x3850 server addresses the “enterprise database” market:
IBM System x3850 X5 Product Guide
On the IBM x3850 server the currently qualified memory is:
HyperCloud is currently not available, but when it becomes available it would be preferable to the LRDIMMs (which have performance, latency, price and IP issues).
As examined in this article:
Infographic – memory buying guide for Romley 2-socket servers
June 29, 2012
Servers like the IBM x3850 may also have proprietary memory solutions available, like the IBM MAX5, for memory expansion beyond the Intel PoR (plan of record). These solutions can introduce latency or speed penalties (going through the QPI interface to reach the MAX5 memory expansion card, for example, adds latency), but for in-memory database applications the end result may still be faster than a traditional disk-based database.
One solution would be to not use the proprietary memory expansion capabilities like MAX5 and go with the load reduction solutions like LRDIMMs/HyperCloud which are now available for Romley servers.
On current 2-socket servers with 24 DIMM slots, you can expand memory to 768GB running at 1333MHz (with 32GB HyperCloud, when it becomes available in mid-2012). With LRDIMMs you can reach 768GB, but only at 1066MHz. 32GB RDIMMs (which will be 4-rank for the foreseeable future) cannot deliver 768GB at all – because of rank limitations they top out at 512GB, and at 800MHz. The 32GB HyperCloud, which uses 4Gbit monolithic memory packages, should also be cheaper than 32GB RDIMMs and 32GB LRDIMMs.
With 4-socket servers, you can just double that number to 1.5TB running at an achievable speed of 1333MHz (with 32GB HyperCloud).
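These capacity figures follow from the per-channel rank limit. Here is a minimal sketch of that arithmetic – the 8-ranks-per-channel ceiling, the 3-slots-per-channel layout, and the 2-rank appearance of rank-multiplied modules are typical DDR3/Romley assumptions for illustration, not vendor specifications:

```python
def max_capacity_gb(dimm_gb, ranks_seen, channels, slots_per_channel=3,
                    max_ranks_per_channel=8):
    """Max memory given how many ranks each DIMM presents to the controller."""
    # How many DIMMs per channel fit within the rank budget?
    dimms_per_channel = min(slots_per_channel,
                            max_ranks_per_channel // ranks_seen)
    return dimms_per_channel * channels * dimm_gb

# 2-socket Romley: 8 memory channels, 3 slots each (24 DIMM slots total)
print(max_capacity_gb(32, ranks_seen=4, channels=8))   # 4-rank RDIMM: 512GB
print(max_capacity_gb(32, ranks_seen=2, channels=8))   # rank-multiplied: 768GB
# 4-socket: 16 channels, 48 slots
print(max_capacity_gb(32, ranks_seen=2, channels=16))  # 1536GB (1.5TB)
```

The 4-rank RDIMM is capped at 2 DIMMs per channel (8 ranks ÷ 4), leaving a third of the slots empty; a module that presents only 2 ranks can populate all 3 slots per channel, which is where the 768GB and 1.5TB figures come from.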
On the fragility of memory modules
Memory is susceptible to errors – errors caused by gamma radiation and other sources. In addition, differences between DRAM dies can make one DRAM have more errors than another.
For some background on DRAM error probabilities:
Nightmare on DIMM street
by Robin Harris on Saturday, 10 October, 2009
DRAM errors in the wild: a large-scale field study
With large amounts of memory, the probability of a single-bit error SOMEWHERE in your 1.5TB of memory goes up (i.e. it is roughly double what it would be for 768GB, and so on). The probability of two-bit errors goes up as well.
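The scaling claim can be sketched numerically. Assuming independent per-bit errors at some hypothetical rate (the rate below is purely illustrative – see the field-study link above for measured figures), the probability of at least one error somewhere grows almost linearly with capacity:

```python
import math

def p_any_error(capacity_gb, p_bit=1e-17):
    """P(at least one bit error) = 1 - (1 - p_bit)^bits,
    computed stably via log1p/expm1 since p_bit is tiny."""
    bits = capacity_gb * 8 * 2**30
    return -math.expm1(bits * math.log1p(-p_bit))

p768, p1536 = p_any_error(768), p_any_error(1536)
print(f"768GB: {p768:.3e}  1.5TB: {p1536:.3e}  ratio: {p1536/p768:.3f}")
```

The ratio comes out just under 2 – doubling capacity essentially doubles the chance that some bit somewhere has flipped, whatever the underlying per-bit rate happens to be.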
An analogy can be made with RAID5 – RAID5 was seen as dangerous to use after people realized that with extremely large hard disk sizes (and consequently very long times for RAID recovery), the probability of an error occurring DURING a RAID recovery operation could not be ignored.
Similar considerations need to be given to deal with DRAM memory errors and recovery from those errors.
For in-memory database applications, where the whole database needs to be in memory, even a single uncorrected error could invalidate the integrity of the whole database.
Enter servers with RAS capability
For these reasons, in-memory database applications like SAP HANA generally tend to favor servers that have additional capabilities for managing memory errors and replacing faulty memory modules with replacement ones.
SAP HANA is an in-memory database solution being pushed by SAP to compete with Oracle – the emphasis on in-memory usage leads to huge improvements in database processing capability:
SAP HANA – Adapt or Die?
11 May 2011 Business Intelligence (BI), HANA, In-Memory, Emerging Technologies
For background on how in-memory databases may become more common in the future:
SAP wants to kill Oracle
By cfheoh | May 5, 2012 | Acquisition, Oracle, SAP, Violin Memory
VMware’s Database Play: “Disk Is the New Tape”
Scott M. Fulton· June 7th, 2012
In order to address these reliability needs, Intel server platforms – Westmere-EX (the E7 series, pre-Romley) and the new Romley generation – offer RAS capabilities (Reliability, Availability and Serviceability).
Intel® Processor-based Server Platforms: Enhanced RAS Capabilities
Servers like the IBM x3850 explicitly mention support for in-memory databases and SAP HANA in their documentation:
IBM System x3850 X5 Product Guide
This IBM system, for example, mentions SAP’s HANA in-memory database capability prominently – and offers RAS. RAS-capable servers can do DRAM mirroring, and also offer memory sparing – reserving DRAM memory modules that are used to replace modules exceeding a certain error-count limit.
In addition the OEMs have various techniques (“Chipkill” from IBM for example) for dealing with the memory errors on the memory modules.
Most of these techniques are available on the full range of memory products offered by the OEM.
– HyperCloud – which are compatible with RDIMMs
– LRDIMM – which are a new standard and incompatible with RDIMMs
Non-volatile DDR3 memory modules
In-memory databases have a greater vulnerability to power loss because a lot of data is in the volatile DRAM – when power goes, so does the data.
While the techniques mentioned above tackle memory errors, they will not help you in case of power outage – where all the data in the DRAM will be lost.
Since in-memory databases store all their data in memory – and since continuously checkpointing that data to slower secondary storage like SSDs or hard disks would slow things down – it becomes very important to have features that can save memory module data AFTER a power-loss event has been detected (instead of precautionarily at some point before it).
For an examination of how non-volatile DDR3 memory could enable fast recovery after power loss:
Would non-volatile DRAM have reduced Amazon outage?
July 3, 2012