Christian Bell, Dan Bonachea, Yannick Cote, Jason Duell, Paul Hargrove, Parry Husbands, Costin Iancu, Michael Welcome, Katherine Yelick
High-end supercomputers are increasingly built out of commodity components, and lack tight integration between the processor and network. This often results in inefficiencies in the communication subsystem, such as high software overheads and/or message latencies. In this paper we use a set of microbenchmarks to quantify the cost of this commoditization, measuring software overhead, latency, and bandwidth on five contemporary supercomputing networks. We compare the performance of the ubiquitous MPI layer to that of lower-level communication layers, and quantify the advantages of the latter for small message performance. We also provide data on the potential for various communication-related optimizations, such as overlapping communication with computation or other communication. Finally, we determine the minimum size needed for a message to be considered 'large' (i.e., bandwidth-bound) on these platforms, and provide historical data on the software overheads of a number of supercomputers over the past decade.