I’ve been really dismayed by the lack of decent, simple tools for testing the available bandwidth between a pair of hosts above 1 gigabit/second. Back when I didn’t have any 10 gigabit connections at home, I used iperf and iperf3. But I now have several 10 gigabit connections on my home network, and since these tools don’t use multithreading effectively, they become CPU bound (on a single core) before they reach the target bandwidth. Tools like ssh and scp have the same problem; they’re single-threaded and become CPU bound long before they saturate a 10 gigabit connection.
When I install a 10 gigabit connection, whether it’s via SFP+ DACs, SFP+ SR optics, or 10GBASE-T, it’s important that I’m able to verify that the connection can sustain transfers somewhere near line rate end-to-end, especially when I’m buying my DACs, transceivers, or shielded Cat6a patch cables from eBay or some other truly inexpensive vendor. I needed a tool that could saturate a 10 gigabit connection and report the data transfer rate at the application level. Due to the overhead of protocol headers and link encapsulation, this number will obviously be lower than the link-level bandwidth, but it’s the number that ultimately matters to an application.
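Roughly speaking, with a standard 1500-byte MTU, IPv4, and TCP timestamps (no jumbo frames), the best you can hope for at the application level on a 10 gigabit link works out to something like:

1500 - 20 (IPv4 header) - 20 (TCP header) - 12 (TCP timestamps) = 1448 payload bytes per segment
1500 + 38 (Ethernet header, FCS, preamble, inter-frame gap) = 1538 bytes on the wire per frame
1448 / 1538 × 10 Gbits/sec ≈ 9.41 Gbits/sec

So anything in the low-to-mid 9s is about as good as it gets without jumbo frames.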
So, I quickly hacked together a multithreaded application to test my connections at home. It will spawn the requested number of threads (on each end) and the server will send data from each thread. Each thread gets its own TCP connection.
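The core of it is nothing fancy. Here’s a rough sketch of the client side just to illustrate the idea: each thread opens its own TCP connection and counts the bytes it receives, and the main thread periodically reports the cumulative aggregate rate. This is not mcperf’s actual code, and it skips the authentication handshake entirely; the usage, port handling, and fixed 10-second run here are made up for illustration.

//  Rough sketch of a per-thread TCP receiver; not mcperf's actual code.
//  Each thread opens its own connection and counts the bytes it reads;
//  the main thread sums the counters and prints a cumulative rate.
#include <netdb.h>
#include <sys/socket.h>
#include <unistd.h>
#include <atomic>
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <string>
#include <thread>
#include <vector>

static std::atomic<uint64_t>  g_bytes{0};
static std::atomic<bool>      g_done{false};

//  Receive on one TCP connection until the main thread says stop.
static void Receiver(std::string host, std::string port)
{
  struct addrinfo   hints, *res = nullptr;
  memset(&hints, 0, sizeof(hints));
  hints.ai_family   = AF_UNSPEC;
  hints.ai_socktype = SOCK_STREAM;
  if (getaddrinfo(host.c_str(), port.c_str(), &hints, &res) != 0) { return; }
  int  fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
  if ((fd >= 0) && (connect(fd, res->ai_addr, res->ai_addrlen) == 0)) {
    std::vector<char>  buf(1 << 20);    //  1 MiB read buffer
    while (! g_done.load(std::memory_order_relaxed)) {
      ssize_t  n = read(fd, buf.data(), buf.size());
      if (n <= 0) { break; }
      g_bytes.fetch_add(n, std::memory_order_relaxed);
    }
  }
  if (fd >= 0) { close(fd); }
  freeaddrinfo(res);
}

int main(int argc, char *argv[])
{
  //  usage (made up for this sketch): mcperf_sketch numThreads host port
  if (argc != 4) {
    fprintf(stderr, "usage: %s numThreads host port\n", argv[0]);
    return 1;
  }
  int  numThreads = atoi(argv[1]);
  std::vector<std::thread>  threads;
  for (int i = 0; i < numThreads; ++i) {
    threads.emplace_back(Receiver, argv[2], argv[3]);
  }
  //  Report the cumulative aggregate rate once per second for 10 seconds.
  auto  start = std::chrono::steady_clock::now();
  for (int sec = 0; sec < 10; ++sec) {
    std::this_thread::sleep_for(std::chrono::seconds(1));
    std::chrono::duration<double>  elapsed =
      std::chrono::steady_clock::now() - start;
    double  gbps = (g_bytes.load() * 8.0) / (elapsed.count() * 1e9);
    printf("bandwidth: %.3f Gbits/sec\n", gbps);
  }
  g_done = true;
  for (auto & t : threads) { t.join(); }
  return 0;
}

The real thing obviously needs authentication, option parsing, and the sending side, but the one-connection-per-thread structure is the whole trick: no single core has to push the full 10 gigabits.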
For a quick hack, it works well.
dwm@www:/home/dwm% mcperf -t 4 -c kiva
bandwidth: 8.531 Gbits/sec
bandwidth: 8.922 Gbits/sec
bandwidth: 9.069 Gbits/sec
bandwidth: 9.148 Gbits/sec
bandwidth: 9.197 Gbits/sec
bandwidth: 9.230 Gbits/sec
bandwidth: 9.253 Gbits/sec
bandwidth: 9.269 Gbits/sec
bandwidth: 9.283 Gbits/sec
Given that I don’t create servers without strong authentication, even if they’ll only run for 10 seconds, I’m using the PeerAuthenticator from libDwmAuth for authentication. The data being sent isn’t encrypted, since that’s not necessary here.
Of course this got me thinking about the number of tools we have today that just don’t cut it in a 10 gigabit network: ssh, scp, ftp, fetch, etc. Even NFS code has trouble saturating a 10 gigabit connection. It seems like eons ago that Herb Sutter wrote “The Free Lunch Is Over”. It was published in 2005. Yet we still have a bunch of tools that are CPU bound because they’re single-threaded. How are we supposed to take full advantage of 10 gigabit and faster networks if the tools we use for file transfer, streaming, etc. are single-threaded and hence CPU bound well before they reach 10 gigabits/second? What happens when I run some fiber at home for NAS and want to run 40 gigabit or (egads!) 100 gigabit? It’s not as if I don’t have the CPU to do 40 gigabits/second; my NAS has 12 cores and 24 threads. But if an application is single-threaded, it becomes CPU bound at around 3.5 gigabits/second on a typical server CPU core. 🙁 Sure, that’s better than 1 gigabit/second, but it’s less than what a single SATA SSD can do, and much less than what an NVMe, M.2, or striped SATA SSD setup can do.
We need tools that aren’t written as if it’s 1999. I suspect that after I polish up mcperf a little bit, I’m going to work on my own replacement for scp so I can at least transfer files without being CPU bound at well below my network bandwidth.