While "replacing 300 CPU-only servers on deep learning training" is hardly a benchmark, 15,500 images per second on ResNet-50 is: just a couple of years ago, training throughput was one to two orders of magnitude lower. Also of interest is the approach Nvidia is taking here: a single compute "node" is meant to handle both AI and HPC workloads at extreme performance (the reference implementation claims two petaflops).
https://www.zdnet.com/article/nvidia-unveils-the-hgx-2-a-server-platform-for-hpc-and-ai-workloads/
The platform’s unique high-precision computing capabilities are designed for the growing number of applications that combine high-performance computing with AI.
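
To put the 15,500 images per second figure in perspective, here is a rough back-of-the-envelope sketch of how long one pass over a training set would take at that rate, and at rates 10x and 100x lower. It assumes the ImageNet-1k training set (1,281,167 images), which the announcement does not specify, and the function names are purely illustrative.

```python
# Rough arithmetic only, not a benchmark.
# Assumption: ImageNet-1k training set size; the source names only ResNet-50.
IMAGENET_TRAIN_IMAGES = 1_281_167


def epoch_seconds(images_per_second: float,
                  dataset_size: int = IMAGENET_TRAIN_IMAGES) -> float:
    """Seconds needed to process every image in the dataset once."""
    return dataset_size / images_per_second


def epoch_times(claimed_rate: float = 15_500.0) -> dict:
    """Epoch time at the claimed rate and at rates 10x and 100x lower."""
    return {factor: epoch_seconds(claimed_rate / factor) for factor in (1, 10, 100)}


if __name__ == "__main__":
    for factor, seconds in epoch_times().items():
        print(f"{factor:>3}x lower throughput: {seconds / 60:.1f} minutes per epoch")
```

Under these assumptions, one epoch takes roughly 83 seconds at the claimed rate, versus about 14 minutes and about 2.3 hours at throughput one and two orders of magnitude lower.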
