mcblock: new code for pf rule management from a ‘lazy’ programmer

Good programmers are lazy. We’ll spend a good chunk of time writing new/better code if we know it will save us a lot of time in the future.

Case in point: I recently completely rewrote some old code I use to manage the pf rules on my gateway. Why? Because I had been spending too much time doing things that could be done automatically by software with just a small bit of intelligence; the new code basically codifies the things I’d been doing manually. And also because I’m lazy, in the way that all good programmers are lazy.

Some background…

I’m not the type of person who fusses a great deal about the security of my home network. I don’t have anything to hide, and I don’t have a need for very many services. However, I know enough about Internet security to be wary and to at least protect myself from the obvious. And I prefer to keep out the hosts that have no need to access anything on my home network, including my web server. A very long time ago, I was the victim of an SSH-v1 vulnerability: someone from Romania set up an IRC server on my gateway while I was on vacation in the Virgin Islands. I don’t like someone else using my infrastructure for nefarious purposes.

At the time, it was almost humorous how little the FBI knew about the Internet (next to nothing). I’ll never forget how puzzled the agents were at my home when I was explaining what had happened. The only reason I had called them was that the perpetrator managed to get a credit card number from us (presumably via a man-in-the-middle attack) and used it to order a domain name and hosting services. At the time I had friends with fiber taps at the major exchanges, and managed to track down some of his traffic and eventually a photo of him and his physical address (and of course I had logged a lot of the IRC traffic before I completely shut it down). It didn’t do me any good, since he was a Russian minor living in Romania and the agents knew next to nothing about the Internet. My recollection is hazy, but I think this was circa 1996. I know it was before SSH-v2, and that I was still using Kerberos where I could.

Times have changed (that was nearly 20 years ago). But I continue to keep a close eye on my Internet access. I will never be without my own firewall with all of the flexibility I need.

For a very long time, I’ve used my own software to manage the list of IP prefixes I block from accessing my home network. Way back when, it was hard: we didn’t have things like pf. But all the while I’ve had some fairly simple software to help me manage the list of IP prefixes that I block from accessing my home network and simple log grokking scripts to tell me what looks suspicious.

Way back when, the list was small. It grew slowly for a while, but today it’s pretty much non-stop. And I don’t think of myself as a desirable target. Which probably means that nearly everyone is under regular probing and weak attack attempts.

One interesting thing I’ve observed over the last 5 years or so… the cyberwarfare battle lines could almost be drawn from a very brief lesson on WWI, WWII and the Cold War, with maybe a smattering of foreign policy SNAFUs and socialism/communism versus capitalism and East versus West. In the last 5 years, I’ve primarily seen China, Russia, Italy, Turkey, Brazil and Colombia address space in my logs, with a smattering of former Soviet bloc countries, Iran, Syria and a handful of others. U.S.-based probes are a trickle in comparison. It’s really a sad commentary on the human race, to be honest. I would wager that the countries in my logs are seeing the opposite directed at them: most of their probes and attacks likely originate from the U.S. and its old WWII and NATO allies. Sigh.

Anyway…

My strategy

For about 10 years I’ve been using code I wrote that penalizes repeat attackers by doubling their penalty time each time their address space is re-activated in my blocked list. This has worked well; the gross repeat offenders wind up being blocked for years, while those who only knock once are blocked just long enough to thwart their efforts. Many of them move on and never return (meaning I don’t see more attacks from their address space for a very long time). Some never stop, and I assume some of those are state-sponsored, i.e. they’re being paid to do it. Script kiddies don’t spend years trying to break into the same tiny web site, nor years scanning gobs of broadband address space. Governments are a different story, with a different set of motivations that clearly don’t go away for decades or even centuries.

The failings

The major drawback to what I’ve been doing for years: too much manual intervention, especially adding new entries. It doesn’t help that there is no standard logging format for various externally-facing services and that the logging isn’t necessarily consistent from one version to the next.

My primary goal was to automate the drudgery and to replace the SQL database with something lighter and speedier, while leveraging code and ideas that have worked well for me. I created mcblock: a simple set of C++ classes and a single command-line application whose purpose is to grok logs and automatically add to my pf rules.

Automation

  • I’m not going to name all the ways in which I automatically add offenders, but I’ll mention one: I parse auth.log.0.bz2 every time newsyslog rolls over auth.log. This is fairly easy on FreeBSD; see the entry regarding the R flag and path_to_pid_cmd_file in the newsyslog.conf(5) manpage (a sketch of such an entry follows this list). Based on my own simple heuristics, those who've been offensive will be blocked for at least 30 days, longer if they're repeat offenders; I will soon add policy to permit more elaborate qualifications. What I have today is fast and effective, but I want to add some feeds from my probe detector (which reports on those probing ports on which I have nothing listening) as well as from pflog. I can use those things today to add entries or re-instantiate expired entries, but I want to be able to extend the expiration time of existing active entries for those who continue to probe for days despite not receiving any response packets.
  • My older code used an SQL database, which was OK for most things but made some operations difficult on low-power machines. For example, I like to be able to automatically coalesce adjacent networks before emitting pf rules; it makes the pf rules easier to read. If I already have 5.149.104/24 in my list and I add 5.149.105/24, I prefer emitting a single rule for 5.149.104/23. And if I add 5.149.105/24 but I have an inactive (expired) rule for 5.149.104/22, I prefer to reactivate the 5.149.104/22 rule rather than add a new rule specifically for 5.149.105/24 (a simplified sketch of the coalescing follows this list). My automatic additions always use /24's, but once in a while I will manually add wider rules, knowing that no one from a given address space needs access to anything on my network or that the space is likely being used for state-sponsored cyberattacks. Say Russian government address space, for example; there's nothing a Russian citizen would need from my tiny web site, and I certainly don't have any interest in continuous probes from any state-sponsored foreign entity.
  • Today I'm using a modified version of my Ipv4Routes class template to hold all of the entries. Modified because my normal Ipv4Routes class template uses a vector of unordered_map under the hood (to allow millions of longest-match IPv4 address lookups per second), but I need ordering and also a smaller memory footprint for my pf rule generation. While it's possible to reduce the memory footprint of unordered_map by increasing the load factor, it defeats the purpose (slows it down) when your hash key population isn't well-known and you still wind up with no ordering. Ordering allows the coalescing of adjacent prefixes to proceed quickly, so my modified class template uses map in place of unordered_map. Like my original Ipv4Routes class template, I have separate maps for each prefix length, hence there are 33 of them. Of course I don't have a use for /0, but it's there. I also typically don't have a use for the /32 map, but it's also there. Having the prefix maps separated by netmask length makes it easy to understand how to find wider and narrower matches for a given IP address or prefix, and hence write code that coalesces or expands prefixes. And it's more than fast enough for my needs: it will easily support hundreds of thousands of lookups per second, and I don't need it to be anywhere near as fast as it is. But I only had to change a couple of lines of my existing Ipv4Routes class template to make it work, and then added the new features I needed.
  • I never automatically remove entries from the new database. Historical information is useful: the automation can re-activate an existing but expired entry, which might be a wider prefix than anything I'd allow the automation to create without such information. While heuristics can do some of this fairly reliably, expired entries in the database serve as additional data for heuristics. If I've blocked a /16 before, seeing nefarious traffic from it again can (and usually should) trigger reactivation of a rule for that /16. And then there are things like bogons and private space that should always be available for reactivation if I see packets with source addresses from those spaces coming in on an external interface.
  • Having this all automated means I now spend considerably less time updating my pf rules. Formerly I would find myself manually coalescing the database, deciding when I should use a wider prefix, reading the daily security email from my gateway to make sure I wasn't missing anything, etc. Since I now have unit tests and a real lexer/parser for auth.log, and pf entries are automatically updated and coalesced regularly, I can look at things less often and at my leisure while knowing that at least most of the undesired stuff is being automatically blocked soon after it is identified.
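
The sort of newsyslog.conf entry mentioned in the first bullet above might look something like the line below. The helper script name and path are hypothetical, and the rotation schedule and flags are only an example, not my actual configuration.

# Hypothetical entry: bzip2-compress auth.log on rotation (J), create it if
# missing (C), and run a helper script (R) that feeds the freshly rolled
# auth.log.0.bz2 to mcblock.
/var/log/auth.log    600  7  *  @T00  JCR  /usr/local/sbin/mcblock-authlog.sh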
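
And here is a rough C++ sketch of the coalescing idea and the per-prefix-length maps described in the bullets above. It is not the real mcblock or Ipv4Routes code; the names (PrefixTable, Entry, TryCoalesce) are illustrative, the entry payload is reduced to a single expiration time, and the handling of expired/inactive entries is omitted.

#include <array>
#include <cstdint>
#include <ctime>
#include <map>

struct Entry {
  time_t  expiration;   //  when the block expires; entries are deactivated, never removed
};

class PrefixTable
{
public:
  void Add(uint32_t network, uint8_t maskLen, const Entry & entry)
  {
    network &= MaskBits(maskLen);
    _maps[maskLen][network] = entry;
    TryCoalesce(network, maskLen);
  }

private:
  //  one ordered map per prefix length, /0 through /32, keyed by network address
  std::array<std::map<uint32_t,Entry>,33>  _maps;

  static uint32_t MaskBits(uint8_t maskLen)
  { return (maskLen ? (0xFFFFFFFFU << (32 - maskLen)) : 0); }

  //  If the buddy of @c network is also present at @c maskLen, replace both
  //  with the single covering prefix one bit wider, and repeat.
  //  e.g. 5.149.104/24 + 5.149.105/24 -> 5.149.104/23
  void TryCoalesce(uint32_t network, uint8_t maskLen)
  {
    while (maskLen > 0) {
      uint32_t  buddy = network ^ (1U << (32 - maskLen));
      std::map<uint32_t,Entry>::iterator  it = _maps[maskLen].find(buddy);
      if (it == _maps[maskLen].end())
        break;
      Entry  wider = _maps[maskLen][network];
      if (it->second.expiration > wider.expiration)
        wider = it->second;               //  keep the later expiration
      _maps[maskLen].erase(buddy);
      _maps[maskLen].erase(network);
      --maskLen;
      network &= MaskBits(maskLen);
      _maps[maskLen][network] = wider;
    }
  }
};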

Good programmers are lazy. A few weekends of work is going to save me a lot of time in the future. I should've cobbled this up a long time ago.

Site Health page now shows UPS status and history

I am now collecting UPS status data from my UPS, and there is a new plot on the Site Health page that displays it. I still need to make these plots work more like those on the Site Traffic page, but having the UPS data for battery charge level, UPS load, expected runtime on battery and power consumption is useful to me. I currently have 3 computers plus some other gear running from one UPS, but soon will move a few things to a second UPS to increase my expected on-battery runtime a bit.

Measuring TCP round-trip times, part 5: first round of clean-ups

I added the ability to toggle a chart’s y-axis scale between linear and logarithmic. This has been deployed on the Site Traffic page.
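
The toggle itself only takes a few lines with Wt. Below is a minimal sketch, assuming Wt 3.x and a Wt::Chart::WCartesianChart; it is not the exact code running on the Site Traffic page.

#include <Wt/Chart/WAxis>
#include <Wt/Chart/WCartesianChart>

//  Flip a chart's Y axis between linear and logarithmic scale.
void ToggleYAxisScale(Wt::Chart::WCartesianChart & chart)
{
  Wt::Chart::WAxis & yAxis = chart.axis(Wt::Chart::YAxis);
  if (yAxis.scale() == Wt::Chart::LogScale)
    yAxis.setScale(Wt::Chart::LinearScale);
  else
    yAxis.setScale(Wt::Chart::LogScale);
  chart.update();   //  repaint with the new scale
}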

Code cleanup… when I added the round-trip time plot, I wound up creating a lot of code that is largely a duplicate of code I had for the traffic plot. Obviously there are differences in the data and the presentation, but much of it is similar or the same. Tonight I started looking at trickling common functionality into base classes, functions and possibly a template or two.

I started with the obvious: there’s little sense in having a lot of duplicate code for the basics of the charts. While both instances were of type Wt::Chart::WCartesianChart, I had separate code to set things like the chart palette, handle a click on the chart widget, etc. I’ve moved the common functionality into my own chart class. It’s likely I’ll later use this class on my Site Health page.

Measuring TCP round-trip times, part 4: plotting the data

I added a new plot to the Site Traffic page. This is just another Wt widget in my existing Wt application that displays the traffic chart. Since Wt does not have a box plot, I’m displaying the data as a simple point/line chart. There’s a data series for each of the minimum, 25th percentile, median, 75th percentile and 95th percentile. These are global round-trip time measurements across all hosts that accessed my web site. In a very rough sense, they represent the network distance of the clients of my web site. It’s worth noting that the minimum line typically represents my own traffic, since my workstation shares an ethernet connection with my web server.

Clicking on the chart will display the values (in a table below the chart) for the time that was clicked. I added the same function to the traffic chart while I was in the code. I also started playing with mouse drag tracking so I can later add zooming.

Measuring TCP round-trip times, part 3: data storage classes

I’ve completed the design and initial implementation of some C++ classes for storage of TCP round trip data. These classes are simple, especially since I’m leveraging functionality from the Dwm::IO namespace in my libDwm library.

The Dwm::WWW::TCPRoundTrips class is used to encapsulate a std::vector of round trip times. Each round trip time is represented by a Dwm::TimeValue (a class from libDwm). I don’t really care about the order of the entries in the vector, since a higher-level container holds the time interval in which the measurements were taken; that also means I can use mutating algorithms on the vector when desired.

The Dwm::WWW::TCPHostRoundTrips class contains a std::map of the aforementioned Dwm::WWW::TCPRoundTrips objects, keyed by the remote host IP address (represented by Dwm::Ipv4Address from libDwm). An instance of this class is used to store all round trip data during a given interval. This class also contains a Dwm::TimeInterval (from my libDwm library) representing the measurement interval in which the round trip times were collected.

Both of these classes have OrderStats() members which will fetch order statistics from the encapsulated data. I’m hoping to develop a box plot class for Wt in order to display the order statistics.
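
A rough sketch of how order statistics can be pulled from such a vector is below. It is not the actual OrderStats() implementation: it uses plain doubles (seconds) in place of Dwm::TimeValue, and it takes the vector by value so std::nth_element is free to reorder the copy, which is fine since the order of the entries doesn't matter.

#include <algorithm>
#include <vector>

//  Return the sample at the given quantile (0.0 .. 1.0) of rtts.
double OrderStat(std::vector<double> rtts, double quantile)
{
  if (rtts.empty())
    return 0.0;
  size_t  idx = static_cast<size_t>(quantile * (rtts.size() - 1));
  std::nth_element(rtts.begin(), rtts.begin() + idx, rtts.end());
  return rtts[idx];
}

//  Usage, matching the series plotted on the Site Traffic page:
//    double  minRtt = OrderStat(rtts, 0.0);
//    double  p25    = OrderStat(rtts, 0.25);
//    double  median = OrderStat(rtts, 0.5);
//    double  p75    = OrderStat(rtts, 0.75);
//    double  p95    = OrderStat(rtts, 0.95);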

Measuring TCP round-trip times, part 2: the throwaway

I previously posted about measuring TCP round-trip times from my web server to its clients. Last night I quickly added code to my existing site traffic monitor to perform this task, as the experimental throwaway to validate my design idea. I have not yet designed the data store, only the collection of round-trip times. To see what it’s doing, I syslog the rtt measurements. It appears to be working fine. Here’s some data from Google’s crawlers prowling my site:


May 2 21:37:23 www sitetrafficd[2318]: [I] 66.249.66.43:45825 rtt 123.2 ms
May 2 21:38:49 www sitetrafficd[2318]: [I] 66.249.66.57:38926 rtt 123.6 ms
May 2 21:38:49 www sitetrafficd[2318]: [I] 66.249.66.43:38085 rtt 123.5 ms
May 2 21:40:16 www sitetrafficd[2318]: [I] 66.249.66.143:39725 rtt 137.8 ms
May 2 21:40:16 www sitetrafficd[2318]: [I] 66.249.66.143:37657 rtt 126.2 ms
May 2 21:41:25 www sitetrafficd[2318]: [I] 66.249.66.204:47961 rtt 160.9 ms
May 2 21:41:25 www sitetrafficd[2318]: [I] 66.249.66.143:45623 rtt 121.1 ms
May 2 21:41:47 www sitetrafficd[2318]: [I] 66.249.66.60:36603 rtt 142 ms
May 2 21:42:15 www sitetrafficd[2318]: [I] 66.249.66.204:48875 rtt 123.6 ms
May 2 21:43:15 www sitetrafficd[2318]: [I] 66.249.66.43:56275 rtt 125.8 ms
May 2 21:44:42 www sitetrafficd[2318]: [I] 66.249.66.57:49966 rtt 124.1 ms
May 2 21:44:42 www sitetrafficd[2318]: [I] 66.249.66.204:53209 rtt 122.9 ms
May 2 21:45:59 www sitetrafficd[2318]: [I] 66.249.66.238:46595 rtt 123.8 ms
May 2 21:47:27 www sitetrafficd[2318]: [I] 66.249.66.60:60241 rtt 142.2 ms

I believe I can call the raw measurement design valid. It’s a bonus that it was not difficult to add to my existing data collection application. It’s another bonus that the data collection remains fairly lightweight in user space. My collection process has a resident set size just over 4M, and that’s on a 64-bit machine. My round-trip time measurement resolution is microseconds, and I’m using the timestamps from pcap to reduce scheduler-induced variance. Since I’m not using any special kernel facilities directly, this code should port fairly easily to OS X and Linux.

May 3, 2012
I added the ability to track SYN to SYN ACK round-trip time, so I can run data collection on my desktop or gateway and characterize connections where I’m acting as the TCP client.

Measuring TCP round-trip times, part 1: random thought

I would like to start measuring TCP round-trip times from my web server. This could potentially be done on either my web server or my firewall. But given that I’m already sniffing related packets on my web server for other purposes, it makes sense to do the work there, possibly in the same process.

The idea is simple, and surely unoriginal: measure the time between my server’s SYN ACK and the client’s ACK of my SYN ACK (the last 2/3 of a TCP handshake). Record the wall time, the client IP address, and the time between the transmission of my SYN ACK and the reception of the client’s ACK of my SYN ACK.
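
Below is a condensed sketch of that idea, not my actual collection code: remember the pcap timestamp of each outbound SYN ACK, keyed by client address and port, and compute the delta when the client's ACK of that SYN ACK arrives. It assumes IPv4 over Ethernet, uses a placeholder interface name and port number, and omits error handling, sanity checks on capture lengths, and cleanup of stale entries.

#include <sys/types.h>
#include <netinet/in.h>
#include <netinet/in_systm.h>
#include <netinet/ip.h>
#include <netinet/tcp.h>
#include <net/ethernet.h>
#include <arpa/inet.h>
#include <pcap.h>
#include <cstdint>
#include <cstdio>
#include <map>
#include <utility>

typedef std::pair<uint32_t,uint16_t>       ClientKey;   //  client IP, client port
static std::map<ClientKey,struct timeval>  g_synAcks;   //  when each SYN ACK was seen

static void PacketHandler(u_char *, const struct pcap_pkthdr *hdr,
                          const u_char *pkt)
{
  const struct ip      *iph  = (const struct ip *)(pkt + sizeof(struct ether_header));
  const struct tcphdr  *tcph =
    (const struct tcphdr *)((const u_char *)iph + (iph->ip_hl << 2));

  if ((tcph->th_flags & (TH_SYN|TH_ACK)) == (TH_SYN|TH_ACK)) {
    //  outbound SYN ACK from my server: remember when pcap saw it
    g_synAcks[ClientKey(iph->ip_dst.s_addr, tcph->th_dport)] = hdr->ts;
  }
  else if (tcph->th_flags & TH_ACK) {
    //  possibly the client's ACK of my SYN ACK
    std::map<ClientKey,struct timeval>::iterator  it =
      g_synAcks.find(ClientKey(iph->ip_src.s_addr, tcph->th_sport));
    if (it != g_synAcks.end()) {
      double  rttMs = (hdr->ts.tv_sec - it->second.tv_sec) * 1000.0
                    + (hdr->ts.tv_usec - it->second.tv_usec) / 1000.0;
      printf("%s rtt %.1f ms\n", inet_ntoa(iph->ip_src), rttMs);
      g_synAcks.erase(it);
    }
  }
}

int main(void)
{
  char  errbuf[PCAP_ERRBUF_SIZE];
  //  "em0" and port 80 are placeholders for the real interface and service
  pcap_t  *p = pcap_open_live("em0", 96, 0, 100, errbuf);
  if (! p)
    return 1;
  struct bpf_program  prog;
  pcap_compile(p, &prog, "tcp port 80", 1, PCAP_NETMASK_UNKNOWN);
  pcap_setfilter(p, &prog);
  pcap_loop(p, -1, PacketHandler, 0);
  pcap_close(p);
  return 0;
}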

In the not too distant future, I will upgrade my desktop machine to FreeBSD 9.0-STABLE. At that point I’ll start writing code that utilizes the new h_ertt(4) kernel module.

Much of what I want is actually client-anonymous: an idea of the distribution of network distance of the visitors of my web site. I will want a facility to deal with crawlers, since they’re of less interest to me than human eyeballs and are likely to skew some statistics.

dwmqping revived

Recently I had the need for a program to collect and plot round-trip times from my desktop. I wrote such a program in 2008 for FreeBSD, called dwmqping. I decided to revive it this month.

The old user interface was somewhat ugly (was I aiming for a Star Trek color scheme?), and at some point I’ll probably change it. However, it is functional, and I had forgotten how useful it can be. It can ping one or more destinations, and it uses TCP packets for transmission and BPF for timing, so it’s fairly accurate and not as dramatically affected by process scheduling, etc. as timestamping in application code. Clicking on the plot causes a snapshot to be saved in /tmp; this is useful but also a nuisance since it doesn’t let you supply a filename and is easily triggered when you don’t want a snapshot.

I used Qt for the GUI, and I’ve no plans to change to something other than Qt.

The buttons need some work, but here are some screenshots of what it looks like right now.

This first screenshot is with 2 destinations on my local network: ria and www. Here we have the live plot showing for ria. The packet rate was 1 packet per second. The blue lines and cyan dots in the plot are raw round-trip samples. The yellow line is the median of the last N points, where N was 1600 for this picture. If packet loss is observed, it appears as a red area plot using the Y-axis on the right. This axis is logarithmic because TCP begins to suffer significantly before 10% packet loss.
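
The "median of the last N points" smoothing is nothing fancy. Below is a toy sketch of the idea, not the actual dwmqping code: keep a window of the most recent samples and recompute the median with std::nth_element, which is plenty fast at these sample rates.

#include <algorithm>
#include <cstddef>
#include <deque>
#include <vector>

class WindowedMedian
{
public:
  explicit WindowedMedian(size_t windowSize) : _windowSize(windowSize) {}

  //  Add a new round-trip sample and return the median of the window.
  double Add(double sample)
  {
    _window.push_back(sample);
    if (_window.size() > _windowSize)
      _window.pop_front();
    std::vector<double>  v(_window.begin(), _window.end());
    std::nth_element(v.begin(), v.begin() + v.size() / 2, v.end());
    return v[v.size() / 2];
  }

private:
  size_t              _windowSize;
  std::deque<double>  _window;
};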

The next screenshot is the history plot for a single destination on the other side of the country at 20 packets per second. The history plot is a box or candle plot. The top of the blue line represents the 95th percentile of the samples in the box’s set. I intentionally don’t display the maximum sample; I don’t want a single outlier to skew the Y-axis scale. The top of the cyan box represents the 75th percentile. The line through the cyan box represents the 50th percentile (median). The bottom of the cyan box represents the 25th percentile. The bottom of the blue line represents the minimum sample.

The next screenshot is the distribution plot. It shows the round trip time distribution for the last N samples, where N is 1600 in this case.

CPU gauge and plot added to dwmqlaunch

Having grown tired of the amount of CPU used by gkrellm and how difficult it is to create styles for it (as bad as xmms), I decided to start adding some optional widgets to dwmqlaunch to show the things I was using gkrellm to display. I started with CPU utilization. I used my old gauge code to create a small CPU utilization gauge, and created a new widget to display a CPU utilization plot. They’re at the bottom of the dwmqlaunch screenshot shown to the right of this text. In the plot, the blue bars are live unfiltered data and the orange line is the median of the 10 most recent samples.