Moore's Law is slowing down and the bag of tricks the industry uses to extract more from each new generation of chips is reaching diminishing returns. It takes much more effort to stay ahead, and it costs much more, eroding the economic benefits and forcing the industry to look for new solutions.
In her opening keynote at IEDM 2017, an annual chip conference that goes back 63 years, AMD's chief executive, Lisa Su, spoke about these challenges and her company's approach of using a multi-chip architecture to "break the restrictions of Moore's Law". The keynote capped a great year for AMD, which delivered a long-awaited redesign of its desktop, server and mobile processors, making it competitive once again with Intel in high-performance computing.
It was also something of a homecoming for Su, who won the IEDM student award as a graduate student at MIT in 1
The PC may be in decline and the smartphone market is showing signs of saturation, but AMD believes that a new era of immersive computing will require a lot of computing power. The CPU and the GPU have been the pillars of high-performance computing. Over the past decade, performance has doubled every 2.4 years for CPUs and every 2.1 years for GPUs, according to AMD data. The efficiency (or performance per watt) of server chips has also doubled every 2.4 years.
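To put those doubling periods in perspective, the short Python sketch below converts a doubling period into a cumulative gain over a decade and an equivalent yearly growth rate. It assumes smooth exponential growth, which is a simplification, and the function names are illustrative rather than anything AMD presented.

```python
def growth_factor(doubling_years: float, elapsed_years: float) -> float:
    """Cumulative gain after elapsed_years if performance doubles
    every doubling_years (assumes smooth exponential growth)."""
    return 2.0 ** (elapsed_years / doubling_years)


def annual_growth_rate(doubling_years: float) -> float:
    """Equivalent compound yearly growth rate, as a fraction."""
    return growth_factor(doubling_years, 1.0) - 1.0


if __name__ == "__main__":
    # Doubling periods quoted in the article (AMD data).
    for name, period in (("CPU", 2.4), ("GPU", 2.1)):
        decade = growth_factor(period, 10.0)
        yearly = annual_growth_rate(period) * 100.0
        print(f"{name}: about {decade:.0f}x per decade, {yearly:.0f}% per year")
```

Under that assumption, a 2.4-year doubling period works out to roughly an 18-fold gain over ten years, and a 2.1-year period to roughly a 27-fold gain.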
But it has not been easy and, interestingly, AMD says that only about 40 percent of these gains come from process technology shrinks. Much of the rest comes from system design and architecture. This includes the integration of more features, microarchitecture innovations, improved power management and software such as better compilers. The new Zen microarchitecture, for example, increased instructions per clock by 52 percent, and each 8-core Epyc die has thousands of sensors to optimize power, improving performance per watt by almost 50 percent.
For now, these tricks continue to deliver solid gains, but about 20 percent comes from an increase in overall power and die size. For example, high-end GPUs have gone from 200 watts to 300 watts as the industry improves heat dissipation. In a typical server chip, only a third of the power now goes directly to computing, since other components such as I/O, caches and on-chip fabrics consume more of it. High-performance chips now measure from 500 to 600 square millimeters and some approach the limits of manufacturing tools (the reticle limit), most notably the huge Nvidia Tesla V100 GPU. Not surprisingly, all this is becoming very expensive, and AMD showed a graph indicating that a 7nm chip will cost more than twice as much as its current 14nm processors. Finally, memory bandwidth has not been able to keep up with the increase in CPU and GPU performance.
All this is what led AMD to switch to a multi-chip architecture for Epyc. The flagship 32-core server chip actually consists of four 8-core "chiplets" on an organic interposer, connected with a proprietary Infinity Fabric that uses high-speed SerDes links. The total die area is a bit larger due to peripheral circuits, a total of 852 square millimeters versus 777 square millimeters for a hypothetical monolithic die, but the yield is so much higher that it costs 40 percent less. It also provides flexibility to design different products. AMD and others also use 3D-stacked DRAM known as high bandwidth memory (HBM). This increases bandwidth, reduces power and reduces the overall footprint and complexity of designs, albeit at a cost, since HBM carries a considerable premium and DRAM prices have been increasing.
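A rough way to see why four small dies can cost less than one large die, despite the extra area, is a simple Poisson yield model, in which the fraction of good dies falls exponentially with die area. The Python sketch below applies that model to the 777 and 852 square millimeter figures quoted above. The model itself and the defect density of 0.1 defects per square centimeter are assumptions chosen for illustration, not AMD's data, and the sketch ignores packaging, test and wafer-edge effects.

```python
import math


def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Fraction of defect-free dies under a simple Poisson yield model."""
    return math.exp(-(die_area_mm2 / 100.0) * defects_per_cm2)


def silicon_per_good_part(total_area_mm2: float, num_dies: int,
                          defects_per_cm2: float) -> float:
    """Silicon area (mm^2) consumed per working part, assuming each die
    is tested before assembly so only good dies are packaged."""
    die_area = total_area_mm2 / num_dies
    return total_area_mm2 / poisson_yield(die_area, defects_per_cm2)


def chiplet_cost_ratio(defects_per_cm2: float) -> float:
    """Silicon cost of 4 chiplets (852 mm^2 total) relative to one
    hypothetical 777 mm^2 monolithic die. Areas are from the article."""
    monolithic = silicon_per_good_part(777.0, 1, defects_per_cm2)
    chiplets = silicon_per_good_part(852.0, 4, defects_per_cm2)
    return chiplets / monolithic


if __name__ == "__main__":
    # 0.1 defects/cm^2 is a hypothetical value, not a figure from AMD.
    ratio = chiplet_cost_ratio(0.1)
    print(f"Chiplet silicon cost relative to monolithic: {ratio:.2f}")
```

With the assumed defect density, the chiplet design uses a little under two-thirds of the silicon per working part, which is in the same range as the 40 percent saving AMD cited. The result is sensitive to the defect density: with no defects at all, the chiplet approach would cost more because of its larger total area.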
The ultimate goal is to stack not only DRAM, but also non-volatile memory, GPUs and other components directly on the processors, Su said. Separately, Sony gave a presentation describing how it is already doing something similar for its CMOS image sensors: stacking the pixel sensor on top of 1Gb of DRAM, which in turn is stacked on top of an image processor. But there is still a lot of work to be done to extend this to high-performance computing, especially in manufacturing the higher-density interconnects, known as through-silicon vias (TSVs), and in dissipating all the heat trapped in that sandwich. Su said she is confident that all these problems are surmountable. The real challenges, she said, are making 3D integration economical and updating software to fully utilize this type of device.
In the legacy world, "the CPU was the center of everything with other chips hanging from it," Su said, but workloads have changed, particularly with the rise of deep learning, and now there is a lot of debate in the industry on whether the CPU, GPU, FPGA or custom ASICs will become the main compute element. "From my point of view, it's all of the above," Su said. "You'll find the world is a heterogeneous place." This will also give more importance to the interconnects that tie all these elements together (AMD is a member of the CCIX and Gen-Z consortiums that develop this technology).
The combination of continued scaling and these techniques will keep delivering at least a doubling of performance every 2.4 years, according to AMD. "We absolutely believe that the performance gains we've seen over the past decade, we can achieve or exceed in the next decade," Su said.