I recently assembled a new workstation for home. My primary need was a machine for software development, including deep learning. This machine is named “thrip”.
Having looked hard at my options, I decided on the AMD Threadripper 3960X as my CPU. A primary driver was, of course, bang for the buck. I wanted PCIe 4.0, at least 18 cores, at least 4-channel RAM, the ability to utilize 256G or more of RAM, and to stay in budget.
By core count alone, the 3960X is more than I needed. On the flip side, it’s constrained to 256G of RAM, and it’s also more difficult to keep cool than most CPUs (280W TDP). But on price per core, and overall performance per dollar, it was the clear winner for my needs.
Motherboard-wise, I wanted 10G ethernet, some USB-C, a reasonable number of USB-A ports, room for 2 large GPUs, robust VRM, and space for at least three NVMe M.2 drives. Thunderbolt 3 would have been nice, but none of the handful of TRX40 boards seem to officially support it (I don’t know if this is an Intel licensing issue or something else). The Gigabyte board has the header and Wendell@Level1Techs seems to have gotten it working, but I didn’t like other aspects of the Gigabyte TRX40 AORUS EXTREME board (the XL-ATX form factor, for example, is still limiting in terms of case options).
I prefer to build my own workstations. It’s not due to being particularly good at it, or winding up with something better than I could get pre-built. It’s that I enjoy the creative process of selecting parts and putting it all together.
I had not assembled a workstation in quite some time; my old i7-2700K machine had met my needs for most of the last 8 years. And due to a global pandemic, it wasn’t a great time to build a new computer. The supply chain has been troublesome for over 6 months now, especially for certain parts (80+ Titanium PSUs of 1000W and above, for example). We’ve also had a huge availability problem with the current GPUs from NVIDIA (RTX 3000 series) and AMD (Radeon 6000 series). And while I wasn’t thrilled about doing a custom water-cooling loop again, I couldn’t find a suitably quiet cooling solution for a Threadripper and a 2080 Ti without going custom loop. Given the constraints, I wound up with these parts as the guts:
Asus TRX40 ROG Zenith II Extreme Alpha motherboard
AMD Threadripper 3960X CPU (24 cores)
256 gigabytes of G.Skill Trident Z Neo Series RGB DDR4-3200 CL16 RAM (8 x 32G)
EVGA RTX 2080 Ti FTW3 Ultra GPU with EK Quantum Vector FTW3 waterblock
Seasonic PRIME TX-850, 850W 80+ Titanium power supply
Watercool HEATKILLER IV PRO for Threadripper, pure copper CPU waterblock
It’s all in a Lian Li PC-O11D XL case. I have three 360mm radiators, ten Noctua 120mm PWM fans, an EK Quantum Kinetic TBE 200 D5 PWM pump, PETG tubing and a whole bunch of Bitspower fittings.
My impressions thus far: it’s fantastic for Linux software development. It’s so nice to be able to run ‘make -j40’ on large C++ projects and have them complete in a timely manner. And it runs cool and very quiet.
I started my computing career at NSFNET at the end of 1991, which then became ANSnet. In those days, we had a home-brewed network monitoring system. I believe most or all of it was originally the brainchild of Bill Norton. Later there were several contributors: Linda Liebengood, myself, and others. The important thing for today’s thoughts: it was named “rover”, and its user interface philosophy was simple but important: “Only show me actionable problems, and do it as quickly as possible.”
To understand this philosophy, you have to know something about the primary users: the network operators in the Network Operations Center (NOC). One of their many jobs was to observe problems, perform initial triage, and document their observations in a trouble ticket. From there they might fix the problem, escalate to network engineering, etc. But it wasn’t expected that we’d have some omniscient tool that could give them all of the data they (or anyone else) needed to resolve the problem. We expected everyone to use their brains, and we wanted our primary problem reporter to be fast and as clutter-free as possible.
For decades now, I’ve spent a considerable amount of time working at home. Sometimes because I was officially telecommuting, at other times just because I love my work and burn midnight hours doing it. As a result, my home setup has become more complex over time. I have 10 gigabit ethernet throughout the house (some fiber, some Cat6A). I have multiple 10 gigabit ethernet switches, all managed. I have three rackmount computers in the basement that run 7×24. I have ZFS pools on two of them, used for nightly backups of all networked machines, source code repository redundancy, Time Machine for my macOS machines, etc. I run my own DHCP service, an internal DNS server, web servers, an internal mail server, my own automated security software to keep my pf tables current, Unifi, etc. I have a handful of Raspberry Pis doing various things. Then there’s all the other devices: desktop computers in my office, a networked laser printer, Roku, AppleTV, Android TV, Nest thermostat, Nest Protects, WiFi access points, laptops, tablet, phone, watch, Ooma, etc. And the list grows over time.
Essentially, my home has become somewhat complex. Without automation, I spend too much time checking the state of things or just being anxious about not having time to check everything at a reasonable frequency. Are my ZFS pools all healthy? Are all of my storage devices healthy? Am I running out of storage space anywhere? Is my DNS service working? Is my DHCP server working? My web server? NFS working where I need it? Is my Raspberry Pi garage door opener working? Are my domains resolvable from the outside world? Are the cloud services I use working? Is my Internet connection down? Is there a guest on my network? A bandit on my network? Is my printer alive? Is my internal mail service working? Are any of my UPS units running on battery? Are there network services running that should not be? What about the ones that should be, like sshd?
I needed a monitoring system that worked like rover: only show me actionable issues. So I wrote my own, and named it “mcrover”. It’s more of a host and service monitoring system than a network monitoring system, but it’s distributed and secure (using ed25519 via libDwmAuth). It’s modern C++, relatively easy to extend, and has some fun bits (ASCII art in the curses client when there are no alerts, for example). Like the old Network Operations Center, I have a dedicated display in my office that shows only the mcrover Qt client, 24 hours a day. Since most of the time there are no alerts to display, the Qt client toggles between the next week’s forecast and a weather radar image when there are no alerts. If there are alerts, the alert display is shown instead, and it will not go away until there are no alerts (or I click the page switch in the UI). The dedicated display is driven by a Raspberry Pi 4B running the Qt client from boot, using EGLFS (no X11). The Raspberry Pi 4 is powered via PoE, and also runs the mcrover service, monitoring local services on the Pi as well as many network services. In fact, the mcrover service runs on every 7×24 general purpose computing device here. mcrover instances can exchange alerts, hence I only need to look at one instance to see what’s being reported by all instances.
This has relieved me of a lot of sysadmin and network admin drudgery. It wasn’t trivial to implement, mostly due to the variety (not the quantity) of things it’s monitoring. But it has proven itself very worthwhile. I’ve been running it for many months now, and I no longer get anxious about keeping up with things like daily/weekly/monthly mail from cron and manual checks. All critical (and some non-critical) things are now checked every 60 seconds, and my attention is only stolen when mcrover finds an actionable issue.
So… an ode to the philosophy of an old system. Don’t make me plow through a bunch of data to find the things I need to address. I’ll do that when there’s a problem, not when there isn’t a problem. For 7×24 general purpose computing devices running Linux, macOS or FreeBSD, I install and run the mcrover service and connect it to the mesh. And it requires very little oomph; it runs just fine on a Raspberry Pi 3 or 4.
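That philosophy can be sketched in a few lines. This is an illustrative sketch in Python, not mcrover’s actual C++ implementation; the check names and thresholds here are hypothetical.

```python
import shutil

def check_disk_space(path="/", min_free_fraction=0.10):
    """Return an alert string if free space is low, else None (no news is good news)."""
    usage = shutil.disk_usage(path)
    free_fraction = usage.free / usage.total
    if free_fraction < min_free_fraction:
        return f"{path}: only {free_fraction:.1%} free"
    return None

def run_checks(checks):
    """Run every check; return only actionable alerts (rover's philosophy)."""
    alerts = []
    for name, check in checks:
        result = check()
        if result is not None:
            alerts.append((name, result))
    return alerts

# One passing check and one simulated failure: only the failure is reported.
checks = [
    ("disk", lambda: check_disk_space("/", min_free_fraction=0.0)),  # always passes
    ("dns",  lambda: "resolver not responding"),                     # simulated failure
]
alerts = run_checks(checks)
```

The point is the return convention: a healthy check produces nothing at all, so a quiet system renders as an empty display rather than a wall of green status rows.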
So why the weather display? It’s just useful to me, particularly in the mowing season where I need to plan ahead for yard work. And I’ve just grown tired of the weather websites. Most are loaded with ads and clutter. All of them are tracking us. Why not just pull the data from tax-funded sources in JSON form and do it myself? I’ve got a dedicated display which doesn’t have any alerts to display most of the time, so it made sense to put it there.
The Qt client using X11, showing the weather forecast.
The Qt client using X11, showing the weather radar.
The curses client, showing ASCII art since there are no alerts to be shown.
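Reducing a forecast document to terse display lines is straightforward. A sketch, assuming JSON shaped like the National Weather Service api.weather.gov forecast response (the embedded sample below is illustrative, with fields trimmed):

```python
import json

# Illustrative sample shaped like an api.weather.gov forecast response.
sample = '''
{
  "properties": {
    "periods": [
      {"name": "Tonight", "temperature": 48, "shortForecast": "Mostly Clear"},
      {"name": "Friday", "temperature": 72, "shortForecast": "Sunny"}
    ]
  }
}
'''

def forecast_lines(doc: str) -> list[str]:
    """Reduce a forecast JSON document to one terse line per period."""
    periods = json.loads(doc)["properties"]["periods"]
    return [f'{p["name"]}: {p["temperature"]}F, {p["shortForecast"]}' for p in periods]

lines = forecast_lines(sample)
```

No ads, no trackers; just the handful of fields worth putting on a wall display.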
I’ve been really dismayed by the lack of decent simple tools for testing the available bandwidth between a pair of hosts above 1 gigabit/second. Back when I didn’t have any 10 gigabit connections at home, I used iperf and iperf3. But I now have several 10 gigabit connections on my home network, and since these tools don’t use multithreading effectively, they become CPU bound (on a single core) before they reach the target bandwidth. Tools like ssh and scp have the same problem; they’re single threaded and become CPU bound long before they saturate a 10 gigabit connection.
When I install a 10 gigabit connection, whether via SFP+ DACs, SFP+ SR optics or 10GbaseT, it’s important that I’m able to test the connection’s ability to sustain somewhere near line-rate transfers end-to-end. Especially when I’m buying my DACs, transceivers or shielded Cat6A patch cables from eBay or a truly inexpensive vendor. I needed a tool that could saturate a 10 gigabit connection and report the data transfer rate at the application level. Due to the additional protocol headers and link encapsulation, this number will be lower than the link-level bandwidth, but it’s the number that ultimately matters to an application.
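As a back-of-the-envelope check on that expectation: with a 1500-byte MTU, 38 bytes of Ethernet overhead per frame, and 40 bytes of IPv4+TCP headers (no options), the best possible TCP goodput on a 10 gigabit link works out to roughly 9.49 gigabits/second.

```python
LINE_RATE_GBPS = 10.0
MTU = 1500            # IP packet size
ETH_OVERHEAD = 38     # preamble (8) + header (14) + FCS (4) + interframe gap (12)
IP_TCP_HEADERS = 40   # IPv4 (20) + TCP (20), no options

payload = MTU - IP_TCP_HEADERS
efficiency = payload / (MTU + ETH_OVERHEAD)      # ~0.949
goodput_gbps = LINE_RATE_GBPS * efficiency       # ~9.49 Gb/s
```

So anything in the low 9s at the application level means the link itself is healthy.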
So, I quickly hacked together a multithreaded application to test my connections at home. It will spawn the requested number of threads (on each end) and the server will send data from each thread. Each thread gets its own TCP connection.
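The structure can be sketched as follows. mcperf itself is C++; this is an illustrative Python sketch run over loopback, with one TCP connection per thread and the client side counting received bytes.

```python
import socket
import threading

CHUNK = 64 * 1024
TOTAL_PER_CONN = 1 << 20   # 1 MiB per connection, kept small for this demo

def serve_one(conn):
    """Send TOTAL_PER_CONN bytes on one connection, then close it."""
    with conn:
        buf = b"\x00" * CHUNK
        sent = 0
        while sent < TOTAL_PER_CONN:
            n = min(CHUNK, TOTAL_PER_CONN - sent)
            conn.sendall(buf[:n])
            sent += n

def client_thread(port, results, i):
    """Receive everything the server sends; record the byte count."""
    received = 0
    with socket.create_connection(("127.0.0.1", port)) as s:
        while True:
            data = s.recv(CHUNK)
            if not data:
                break
            received += len(data)
    results[i] = received

def run(num_threads=4):
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))
    srv.listen(num_threads)
    port = srv.getsockname()[1]

    def accept_loop():
        # One server thread per accepted connection.
        for _ in range(num_threads):
            conn, _ = srv.accept()
            threading.Thread(target=serve_one, args=(conn,)).start()

    threading.Thread(target=accept_loop).start()
    results = [0] * num_threads
    threads = [threading.Thread(target=client_thread, args=(port, results, i))
               for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    srv.close()
    return results

totals = run()
```

In the real tool, each thread also timestamps its transfer so per-connection and aggregate rates can be reported.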
Since I don’t create servers without strong authentication, even ones that will only run for 10 seconds, I’m using the PeerAuthenticator from libDwmAuth. There’s no encryption of the data being sent, since it isn’t necessary here.
Of course this got me thinking about the number of tools we have today that just don’t cut it on a 10 gigabit network: ssh, scp, ftp, fetch, etc. Even NFS code has trouble saturating a 10 gigabit connection. It seems like eons ago that Herb Sutter wrote “The Free Lunch Is Over”; it was published in 2005. Yet we still have a bunch of tools that are CPU bound due to being single-threaded. How are we supposed to take full advantage of 10 gigabit and faster networks if the tools we use for file transfer, streaming, etc. are single-threaded and hence CPU bound well before they reach 10 gigabits/second? What happens when I run some fiber at home for NAS and want to run 40 gigabit or (egads!) 100 gigabit? It’s not as if I don’t have the CPU to do 40 gigabits/second; my NAS has 12 cores and 24 threads. But if an application is single-threaded, it becomes CPU bound at around 3.5 gigabits/second on a typical server CPU core. 🙁 Sure, that’s better than 1 gigabit/second, but it’s less than what a single SATA SSD can do, and much less than what NVMe or striped SATA SSDs can do.
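Taking that rough 3.5 gigabits/second per-core ceiling at face value, the arithmetic for how many threads each target bandwidth needs is simple:

```python
import math

PER_CORE_GBPS = 3.5   # rough single-thread ceiling from the discussion above

# Threads needed to saturate each target link rate, assuming linear scaling.
threads_needed = {target: math.ceil(target / PER_CORE_GBPS)
                  for target in (10, 40, 100)}
# 10 Gb/s -> 3 threads, 40 Gb/s -> 12 threads, 100 Gb/s -> 29 threads
```

Three threads for 10 gigabit is trivial on any modern machine; even 100 gigabit is within reach of a 24-thread NAS, but only if the software is written to use those threads.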
We need tools that aren’t written as if it’s 1999. I suspect that after I polish up mcperf a little bit, I’m going to work on my own replacement for scp so I can at least transfer files without being CPU bound at well below my network bandwidth.
mcblockd added quite a few networks during a 20-minute period today. I don’t have an explanation for the ssh login attempts all coming in during this period, but it’s nice to see that mcblockd happily blocked all of them.
While this is by no means a high rate of attempts, it’s higher than what I normally see.
May 22 11:32:10 ria mcblockd: [I] Added 185.129.60/22 (DK) to ssh_losers for 180 days
May 22 11:32:11 ria mcblockd: [I] Added 89.234.152/21 (FR) to ssh_losers for 180 days
May 22 11:32:45 ria mcblockd: [I] Added 46.233.0/18 (BG) to ssh_losers for 180 days
May 22 11:33:00 ria mcblockd: [I] Added 216.218.222/24 (US) to ssh_losers for 30 days
May 22 11:33:05 ria mcblockd: [I] Added 199.87.154/24 (CA) to ssh_losers for 30 days
May 22 11:33:15 ria mcblockd: [I] Added 78.109.16/20 (UA) to ssh_losers for 180 days
May 22 11:33:18 ria mcblockd: [I] Added 89.38.148/22 (FR) to ssh_losers for 180 days
May 22 11:33:26 ria mcblockd: [I] Added 65.19.167/24 (US) to ssh_losers for 30 days
May 22 11:34:05 ria mcblockd: [I] Added 62.212.64/19 (NL) to ssh_losers for 180 days
May 22 11:35:54 ria mcblockd: [I] Added 190.10.0/17 (CR) to ssh_losers for 180 days
May 22 11:37:16 ria mcblockd: [I] Added 192.42.116/22 (NL) to ssh_losers for 180 days
May 22 11:38:33 ria mcblockd: [I] Added 199.249.223/24 (US) to ssh_losers for 30 days
May 22 11:38:37 ria mcblockd: [I] Added 173.254.216/24 (US) to ssh_losers for 30 days
May 22 11:39:48 ria mcblockd: [I] Added 128.52.128/24 (US) to ssh_losers for 30 days
May 22 11:39:51 ria mcblockd: [I] Added 64.113.32/24 (US) to ssh_losers for 30 days
May 22 11:40:32 ria mcblockd: [I] Added 23.92.27/24 (US) to ssh_losers for 30 days
May 22 11:40:50 ria mcblockd: [I] Added 162.221.202/24 (CA) to ssh_losers for 30 days
May 22 11:42:42 ria mcblockd: [I] Added 91.213.8/24 (UA) to ssh_losers for 180 days
May 22 11:43:37 ria mcblockd: [I] Added 162.247.72/24 (US) to ssh_losers for 30 days
May 22 11:44:34 ria mcblockd: [I] Added 193.110.157/24 (NL) to ssh_losers for 180 days
May 22 11:44:38 ria mcblockd: [I] Added 128.127.104/23 (SE) to ssh_losers for 180 days
May 22 11:45:50 ria mcblockd: [I] Added 179.43.128/18 (CH) to ssh_losers for 180 days
May 22 11:45:55 ria mcblockd: [I] Added 89.144.0/18 (DE) to ssh_losers for 180 days
May 22 11:46:29 ria mcblockd: [I] Added 197.231.220/22 (LR) to ssh_losers for 180 days
May 22 11:46:44 ria mcblockd: [I] Added 195.254.132/22 (RO) to ssh_losers for 180 days
May 22 11:46:54 ria mcblockd: [I] Added 154.16.244/24 (US) to ssh_losers for 30 days
May 22 11:47:52 ria mcblockd: [I] Added 87.118.64/18 (DE) to ssh_losers for 180 days
May 22 11:48:51 ria mcblockd: [I] Added 46.165.224/19 (DE) to ssh_losers for 180 days
May 22 11:50:13 ria mcblockd: [I] Added 178.17.168/21 (MD) to ssh_losers for 180 days
May 22 11:50:47 ria mcblockd: [I] Added 31.41.216/21 (UA) to ssh_losers for 180 days
May 22 11:50:55 ria mcblockd: [I] Added 62.102.144/21 (SE) to ssh_losers for 180 days
May 22 11:51:19 ria mcblockd: [I] Added 64.137.244/24 (CA) to ssh_losers for 30 days
May 22 11:52:28 ria mcblockd: [I] Added 80.244.80/20 (SE) to ssh_losers for 180 days
May 22 11:52:42 ria mcblockd: [I] Added 192.160.102/24 (CA) to ssh_losers for 30 days
May 22 11:53:06 ria mcblockd: [I] Added 176.10.96/19 (CH) to ssh_losers for 180 days
May 22 11:55:38 ria mcblockd: [I] Added 77.248/14 (NL) to ssh_losers for 180 days
May 22 11:56:20 ria mcblockd: [I] Added 199.119.112/24 (US) to ssh_losers for 30 days
May 22 11:56:32 ria mcblockd: [I] Added 94.142.240/21 (NL) to ssh_losers for 180 days
There’s no one even close in terms of ssh login attempts. In a span of two weeks, mcblockd has blocked 47 million more addresses from China. That doesn’t mean I’ve seen 47 million IP addresses in login attempts. It means that China has a lot of address space being used to probe U.S. sites.
Brazil is in second place, but they’re behind by more than a decimal order of magnitude. Below are the current top two countries being blocked by mcblockd, by quantity of address space.
I seriously doubt that Chinese citizens have anything to do with these attempts. I’m told that the Great Firewall blocks most ssh traffic on port 22. Not to mention that China’s Internet connectivity is somewhere near 95th in the world in terms of available bandwidth, so it’d be terribly painful for an ordinary user to use ssh or scp from China to my gateway. I think I can assume this is all government-sponsored probing.
Today I added a new feature to mcblockd to kill pf state for all hosts in a prefix when the prefix is added to one of my pf tables. This isn’t exactly what I want, but it’ll do for now.
mcblockd also now walks the PCB (protocol control block) list and drops TCP connections for hosts in a prefix I’ve just added to a pf table. Fortunately there was sample code in /usr/src/usr.sbin/tcpdrop/tcpdrop.c. The trick here is that I don’t currently have a means of mapping a pf table to where it’s applied (which ports, which interfaces). In the long term I might add code to figure that out, but in the interim I can configure ports and interfaces in mcblockd’s configuration file that will allow me to drop specific connections. For this first pass, I just toast all PCBs for a prefix.
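Generating the equivalent drops from userland can be sketched as follows. This is an illustrative Python sketch, not mcblockd’s in-process PCB walk: it assumes FreeBSD-style `netstat -n` output (where the port is appended to the address with a dot) and emits tcpdrop(8) command lines; the sample addresses are made up.

```python
import ipaddress

def tcpdrop_commands(netstat_lines, prefix):
    """Given FreeBSD `netstat -n` TCP lines, emit a tcpdrop command for
    every connection whose foreign address falls inside `prefix`."""
    net = ipaddress.ip_network(prefix)
    cmds = []
    for line in netstat_lines:
        fields = line.split()
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, foreign = fields[3], fields[4]
        laddr, lport = local.rsplit(".", 1)    # FreeBSD uses addr.port
        faddr, fport = foreign.rsplit(".", 1)
        if ipaddress.ip_address(faddr) in net:
            cmds.append(f"tcpdrop {laddr} {lport} {faddr} {fport}")
    return cmds

sample = [
    "tcp4  0  0  10.0.0.1.22   192.0.2.5.54321   ESTABLISHED",
    "tcp4  0  0  10.0.0.1.22   198.51.100.7.4444 ESTABLISHED",
]
cmds = tcpdrop_commands(sample, "192.0.2.0/24")
```

Doing it in-process against the PCB list avoids the parse-and-exec round trip, but the matching logic is the same.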
The reason I added this feature: I occasionally see simultaneous login attempts from different IP addresses in the same prefix. If I’m going to block the prefix automatically, I want to cut off all of their connections right now, not after all of their connections have ended. Blowing away their pf state works, but leaves a hanging TCP connection in the ESTABLISHED state for a while. I want the PCBs to be cleaned up.
The mcblockd automation has been running for roughly one week. It’s been fairly busy automatically blocking those trying to crack my ssh server. Below is some of the output from a query of the active blocked networks (the summary information for the top 10 countries by the number of addresses being blocked). Interesting to note that the automation has blocked a huge swath of addresses from China. State-sponsored cyberattacks?
One of the many sets of data I collect with mcflow on my gateway is traffic counters for TCP SYN packets I receive but do not SYN ACK. I keep the source IP address, the destination port, and of course timestamps and counters. This type of data generally represents one of three things: probing for vulnerable services which I don’t run, probing for services I do run but block from offenders, or probing for botnet-controlled devices.
The table below shows the top 10 ports for the current week. In the case of ssh and http, I do run those services but mcblockd automatically blocks those who violate my configured policies. I do not run a telnet server anywhere (my IoT devices are of my own design and use ECDH, 2048-bit RSA keys and AES128). I also do not run MS SQL Server or rdp (Remote Desktop). I have no Windows hosts, and if I did, I certainly wouldn’t expose MS SQL Server or Remote Desktop.
Ports 7547 and 5358 are known to be used by Mirai and its descendants. Port 7547 is also a common port used by broadband ISPs for TR-064 services (specifically, TR-069) to manage home routers.
Below is a table showing the SYNs I didn’t SYN ACK by country. This is just the top 10. Note that the top two have large swaths of their IP address space automatically blocked by mcblockd for violating my configured policies. They’re also known state sponsors of cyberattacks, and the evidence is pretty clear here. Much (but not all) of the US stuff is research scanning.
[Table truncated; visible entries: RU (Russian Federation), US (United States).]
What is perhaps interesting about this data: the lines drawn during WWII and the Cold War don’t appear to have changed. I find this very sad. I’m just a tiny single user running a very modest home network, yet I’m a target of Russia and China. And my network is likely much more secure than the average home network. I assume this means that all of us are being probed all of the time, and some of us are probably regularly compromised. I think we (meaning the entire industry) need to consider completely banning telnet and doing something real about securing IoT devices.
Similar to what I have for sshd, I have real time log processing on my web server. The secure remote communication with mcblockd is very nice to have, since my web server is a separate machine from my gateway/firewall. Below you can see offending web server log entries followed immediately by an action from mcblockd. Instant blocking, without my involvement.
188.8.131.52 - [20/Apr/2017:19:08:08] "GET /blog/xmlrpc.php HTTP/1.0" 200 42
Apr 20 19:08:09 ria mcblockd: [I] Added 185.36.100/22 (CZ) to www_losers for 30 days
184.108.40.206 - [20/Apr/2017:19:57:52] "POST /blog/xmlrpc.php HTTP/1.1" 500 -
Apr 20 19:57:52 ria mcblockd: [I] Added 191.101/16 (CL) to www_losers for 90 days
220.127.116.11 - [20/Apr/2017:20:12:07] "GET /blog/xmlrpc.php HTTP/1.0" 200 42
Apr 20 20:12:08 ria mcblockd: [I] Added 5.164/14 (RU) to www_losers for 90 days
18.104.22.168 - [22/Apr/2017:21:59:24] "GET /wp-login.php HTTP/1.1" 404 210
Apr 22 21:59:24 ria mcblockd: [I] Added 160.202.160/22 (KR) to www_losers for 90 days
22.214.171.124 - [23/Apr/2017:00:58:00] "GET /wp-login.php HTTP/1.1" 404 210
Apr 23 00:58:00 ria mcblockd: [I] Added 104.173.193/24 (US) to www_losers for 30 days
126.96.36.199 - [23/Apr/2017:04:18:19] "GET /wp-login.php HTTP/1.1" 404 210
Apr 23 04:18:19 ria mcblockd: [I] Added 191.37.0/17 (BR) to www_losers for 90 days
188.8.131.52 - [23/Apr/2017:07:50:15] "GET /xmlrpc.php HTTP/1.1" 404 208
Apr 23 07:50:15 ria mcblockd: [I] Added 103.229.124/22 (TW) to www_losers for 30 days
184.108.40.206 - [23/Apr/2017:09:40:35] "GET /wp-login.php HTTP/1.1" 404 210
Apr 23 09:40:36 ria mcblockd: [I] Added 61.72/13 (KR) to www_losers for 90 days
220.127.116.11 - [23/Apr/2017:10:30:24] "GET /blog/xmlrpc.php HTTP/1.0" 405 42
Apr 23 10:30:24 ria mcblockd: [I] Added 46.161.0/18 (RU) to www_losers for 90 days
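Picking the offenders out of the access log is the easy part. A minimal sketch (the probe patterns here are hypothetical; the real policies live in mcblockd’s configuration, and the sample IPs are illustrative):

```python
import re

# Hypothetical probe patterns; the real policies are configurable.
PROBE_PATTERNS = ("xmlrpc.php", "wp-login.php")

# source IP ... [timestamp] "METHOD path ..."
LOG_RE = re.compile(r'^(\S+) - \[[^\]]+\] "(?:GET|POST) (\S+)')

def offending_ips(log_lines):
    """Return the source IPs of requests matching known probe patterns."""
    hits = []
    for line in log_lines:
        m = LOG_RE.match(line)
        if m and any(p in m.group(2) for p in PROBE_PATTERNS):
            hits.append(m.group(1))
    return hits

sample = [
    '192.0.2.9 - [20/Apr/2017:19:08:08] "GET /blog/xmlrpc.php HTTP/1.0" 200 42',
    '192.0.2.10 - [20/Apr/2017:19:09:00] "GET /index.html HTTP/1.1" 200 1024',
]
ips = offending_ips(sample)
```

Each matched IP is then handed to mcblockd over its authenticated channel, which maps it to a prefix and applies the configured policy.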
And yes, the threshold policy code works fine. Below is the result of someone trying to log in 5 times over a period of about 26 minutes. Since I have the threshold set to 5 attempts in 30 days, they were well over the threshold, though this would be considered a ‘slow’ attack by some measures.
Apr 21 17:08:59 ria mcblockd: [I] Pending 69.162.73/24 (US) for ssh_losers, 1/5
Apr 21 17:08:59 ria mcblockd: [I] Pending 69.162.73/24 (US) for ssh_losers, 2/5
Apr 21 17:22:14 ria mcblockd: [I] Pending 69.162.73/24 (US) for ssh_losers, 3/5
Apr 21 17:22:14 ria mcblockd: [I] Pending 69.162.73/24 (US) for ssh_losers, 4/5
Apr 21 17:35:21 ria mcblockd: [I] Added 69.162.73/24 (US) to ssh_losers for 30 days
And another over a period of about 91 minutes:
Apr 23 01:39:43 ria mcblockd: [I] Pending 64.179.211/24 (CA) for ssh_losers, 1/5
Apr 23 01:39:43 ria mcblockd: [I] Pending 64.179.211/24 (CA) for ssh_losers, 2/5
Apr 23 02:25:48 ria mcblockd: [I] Pending 64.179.211/24 (CA) for ssh_losers, 3/5
Apr 23 02:25:48 ria mcblockd: [I] Pending 64.179.211/24 (CA) for ssh_losers, 4/5
Apr 23 03:11:15 ria mcblockd: [I] Added 64.179.211/24 (CA) to ssh_losers for 30 days
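The pending/added behavior in these logs amounts to a sliding-window counter per prefix. A sketch of that logic (illustrative Python, not mcblockd itself; the prefix and timestamps are taken from the first log excerpt above):

```python
import time
from collections import defaultdict

class ThresholdPolicy:
    """Block a prefix after `limit` attempts within `window` seconds."""

    def __init__(self, limit=5, window=30 * 24 * 3600):
        self.limit = limit
        self.window = window
        self.attempts = defaultdict(list)   # prefix -> attempt timestamps

    def record(self, prefix, now=None):
        """Record one attempt; return 'Added ...' when the threshold is
        reached, else 'Pending ..., n/limit' like the log lines above."""
        now = time.time() if now is None else now
        stamps = self.attempts[prefix]
        stamps.append(now)
        # Drop attempts that have aged out of the window.
        stamps = [t for t in stamps if now - t <= self.window]
        self.attempts[prefix] = stamps
        if len(stamps) >= self.limit:
            del self.attempts[prefix]
            return f"Added {prefix}"
        return f"Pending {prefix}, {len(stamps)}/{self.limit}"

policy = ThresholdPolicy(limit=5)
# Five attempts over ~26 minutes, mirroring the first excerpt.
events = [policy.record("69.162.73/24", now=t) for t in (0, 0, 795, 795, 1581)]
```

With a 30-day window, even attempts spread hours apart accumulate toward the threshold, which is exactly why the ‘slow’ attackers above still got blocked.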