This might seem counter-intuitive because, again, everyone knows that while NAND flash is far faster than any spinning disk, it’s still orders of magnitude slower than DRAM. What the research team found, however, is that there are instances where servers working with large data sets take huge performance hits if they have to use traditional media to retrieve data. Imagine, for example, that you have to perform a calculation that takes 10ns 99% of the time. The other 1% of the time, however, you have to wait for data to be retrieved from a traditional spinning disk. Latency on a standard hard drive for data lookup is about 10 milliseconds, or 10,000,000 nanoseconds. In other words, a tiny fraction of the task is responsible for an overwhelming percentage of the workload execution time.
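For the curious, here’s that math as a quick back-of-the-envelope sketch in Python, using the article’s own figures (10ns for the in-memory case, 10ms for a disk lookup, and a 1% miss rate):

```python
# Back-of-the-envelope: how rare disk accesses dominate average latency.
# The 10 ns compute time and 10 ms disk latency come from the scenario above;
# the 1% miss rate is the example fraction of operations that must hit disk.

COMPUTE_NS = 10            # fast path: data already in memory
DISK_NS = 10_000_000       # ~10 ms lookup on a spinning disk
MISS_RATE = 0.01           # fraction of operations that go out to disk

avg_ns = (1 - MISS_RATE) * COMPUTE_NS + MISS_RATE * DISK_NS
disk_share = (MISS_RATE * DISK_NS) / avg_ns

print(f"average latency per operation: {avg_ns:,.0f} ns")
print(f"share of time spent waiting on disk: {disk_share:.2%}")
# -> average latency per operation: 100,010 ns
# -> share of time spent waiting on disk: 99.99%
```

Even though only one operation in a hundred touches the disk, the disk is responsible for essentially all of the elapsed time.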
According to the MIT researchers, the break-even point for this kind of scenario is around 5%. If you have to step out to disk more than 5% of the time, you lose so much performance compared with holding the entire data set in RAM that it may make sense to replace, say, a 10TB DRAM configuration with 20TB of NAND flash. The team’s recent presentation at the International Symposium on Computer Architecture showed that 40 servers with 10TB worth of RAM couldn’t handle a 10.5TB computation more effectively than 20 servers with 20TB worth of NAND flash. Not only is NAND flash much cheaper than conventional DRAM, it also uses a fraction of the power.
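To get a feel for where that break-even sits, here’s a rough sketch comparing the expected access latency of a DRAM working set that occasionally spills to disk against simply keeping everything in slower NAND flash. The DRAM and flash latencies below are ballpark assumptions, not figures from the MIT paper, and the team’s roughly 5% number comes from measuring whole applications (including cost and power) rather than a single latency curve:

```python
# Illustrative comparison of expected access latency: a DRAM working set that
# occasionally spills to disk vs. keeping everything in (slower) NAND flash.
# The DRAM and flash latencies are ballpark assumptions, NOT paper figures.

DRAM_NS = 100          # assumed DRAM access latency
FLASH_NS = 100_000     # assumed NAND flash read latency (~100 microseconds)
DISK_NS = 10_000_000   # ~10 ms spinning-disk lookup, as above

for miss_rate in (0.001, 0.01, 0.05, 0.10):
    dram_plus_disk = (1 - miss_rate) * DRAM_NS + miss_rate * DISK_NS
    print(f"miss rate {miss_rate:>5.1%}: DRAM+disk {dram_plus_disk:>12,.0f} ns"
          f"  vs  all-flash {FLASH_NS:,} ns")
# With these assumed latencies the all-flash option pulls roughly even once
# about 1% of lookups fall through to disk and wins decisively by 5%.
```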
“This is not a replacement for DRAM [dynamic RAM] or anything like that,” says Arvind, the Johnson Professor of Computer Science and Engineering at MIT, whose group performed the new work. “But there may be many applications that can take advantage of this new style of architecture. Which companies recognize: Everybody’s experimenting with different aspects of flash. We’re just trying to establish another point in the design space.”
In order to make the system work, the research team constructed a network of flash-based servers. Each of the 20 servers connected to an FPGA, and each FPGA connected to a pair of 500GB NAND flash chips. All of the FPGAs were also linked to one another, which allowed any server to access data stored on any node. By moving some of their code directly onto the FPGAs and building a suite of intelligent prediction tools that cached “hot” data and had it ready to go on demand, the researchers were able to build a Big Data system that could match a conventional DRAM-equipped network on a fraction of its power budget. Over the length of the workload, keeping all of the data in NAND flash proved superior to the DRAM + HDD model.
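The paper’s prediction tools run on the FPGAs themselves, and the details aren’t spelled out here, but the basic idea of keeping “hot” data staged and ready can be illustrated with a minimal cache sketch. The HotDataCache class below is a hypothetical stand-in, not the researchers’ implementation:

```python
from collections import OrderedDict

class HotDataCache:
    """Minimal LRU-style cache: a stand-in for the idea of keeping 'hot'
    pages staged and ready, NOT a reconstruction of the FPGA logic."""

    def __init__(self, capacity, backing_read):
        self.capacity = capacity          # number of pages kept "hot"
        self.backing_read = backing_read  # callable that fetches from flash
        self.pages = OrderedDict()        # page_id -> data, in LRU order

    def read(self, page_id):
        if page_id in self.pages:
            self.pages.move_to_end(page_id)   # hit: mark as recently used
            return self.pages[page_id]
        data = self.backing_read(page_id)     # miss: go out to flash
        self.pages[page_id] = data
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)    # evict the coldest page
        return data

# Hypothetical usage: the lambda stands in for whatever actually talks to flash.
cache = HotDataCache(capacity=1024, backing_read=lambda page_id: f"page-{page_id}")
print(cache.read(42))   # fetched from the backing store, now cached
print(cache.read(42))   # served from the hot set
```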
This specialized approach wouldn’t work in every instance: it requires computational tasks where disk access creates a huge bottleneck, with working sets too large to hold affordably in DRAM but small enough to fit into NAND flash. The FPGA cluster would also need to be tuned for each specific workload. Still, approaches like this could be key to long-term exascale computing and improved power consumption in the datacenter. Modern supercomputers, like the Tianhe-2, continue to use huge RAM loadouts combined with specialized accelerator cards, but if these advances scale to other types of systems, we may see a new type of HPC configuration debut in coming years.