Good programmers are lazy. We’ll spend a good chunk of time writing new/better code if we know it will save us a lot of time in the future.
Case in point: I recently completely rewrote some old code I use to manage the pf rules on my gateway. Why? Because I had been spending too much time doing things that could be done automatically by software with just a small bit of intelligence. Basically codifying the things I’ve been doing manually. And also because I’m lazy, in the way that all good programmers are lazy.
Some background…
I’m not the type of person who fusses a great deal about the security of my home network. I don’t have anything to hide, and I don’t have a need for very many services. However, I know enough about Internet security to be wary and to at least protect myself from the obvious. And I prefer to keep out the hosts that have no need to access anything on my home network, including my web server. And a very long time ago, I was a victim of an SSH-v1 issue and someone from Romania set up an IRC server on my gateway while I was on vacation in the Virgin Islands. I don’t like someone else using my infrastructure for nefarious purposes.
At the time, it was almost humorous how little the FBI knew about the Internet (next to nothing). I’ll never forget how puzzled the agents were at my home when I was explaining what had happened. The only reason I had called them was because the perpetrator managed to get a credit card number from us (presumably by a man-in-the-middle attack) and used it to order a domain name and hosting services. At the time I had friends with fiber taps at the major exchanges and managed to track down some of his traffic and eventually a photo of him and his physical address (and of course I had logged a lot of the IRC traffic before I completely shut it down). Didn’t do me any good, since he was a Russian minor living in Romania. The FBI agents knew nothing about the Internet. My recollection is hazy, but I think this was circa 1996. I know it was before SSH-v2, and that I was still using Kerberos where I could.
Times have changed (that was nearly 20 years ago). But I continue to keep a close eye on my Internet access. I will never be without my own firewall with all of the flexibility I need.
For a very long time, I’ve used my own software to manage the list of IP prefixes I block from accessing my home network. Way back when, it was hard: we didn’t have things like pf. But all the while I’ve had some fairly simple software to help me manage that list, along with simple log-grokking scripts to tell me what looks suspicious.
Way back when, the list was small. It grew slowly for a while, but today it’s pretty much non-stop. And I don’t think of myself as a desirable target. Which probably means that nearly everyone is under regular probing and weak attack attempts.
One interesting thing I’ve observed over the last 5 years or so… the cyberwarfare battle lines could almost be drawn from a very brief lesson on WWI, WWII and the Cold War, with maybe a smattering of foreign policy SNAFUs and socialism/communism versus capitalism and East versus West. In the last 5 years, I’ve primarily seen China, Russia, Italy, Turkey, Brazil and Colombia address space in my logs, with a smattering of former Soviet bloc countries, Iran, Syria and a handful of others. U.S.-based probes are a trickle in comparison. It’s really a sad commentary on the human race, to be honest. I would wager that the countries in my logs are seeing the opposite directed at them: most of their probes and attacks likely originate from the U.S. and its old WWII and NATO allies. Sigh.
Anyway…
My strategy
For about 10 years I’ve been using code I wrote that penalizes repeat attackers by doubling their penalty time each time their address space is re-activated in my blocked list. This has worked well; the gross repeat offenders wind up being blocked for years, while those who knock only once are blocked just long enough to thwart their efforts. Many of them move on and never return (meaning I don’t see more attacks from their address space for a very long time). Some never stop, and I assume some of those are state-sponsored, i.e. they’re being paid to do it. Script kiddies don’t spend years trying to break into the same tiny web site, nor years scanning gobs of broadband address space. Governments are a different story, with a different set of motivations that clearly don’t go away for decades or even centuries.
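Here’s a minimal sketch of the doubling idea; the types and names are hypothetical, not mcblock’s actual interface:

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical record for one blocked prefix; not mcblock's real types.
struct BlockEntry {
  std::chrono::system_clock::time_point  expires;           // when the block lapses
  std::chrono::hours                     penalty{24 * 30};  // initial penalty: 30 days
  uint32_t                               activations = 0;   // times (re)activated
};

// Each re-activation doubles the penalty, so one-time offenders age out
// quickly while gross repeat offenders wind up blocked for years.
void Activate(BlockEntry & entry)
{
  if (entry.activations++ > 0) {
    entry.penalty *= 2;
  }
  entry.expires = std::chrono::system_clock::now() + entry.penalty;
}
```

Five re-activations later, the original 30-day penalty has grown to 960 days; that growth is what separates the one-time scanners from the persistent offenders.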
The failings
The major drawback to what I’ve been doing for years: too much manual intervention, especially adding new entries. It doesn’t help that there is no standard logging format for various externally-facing services and that the logging isn’t necessarily consistent from one version to the next.
My primary goal was to automate the drudgery and replace the SQL database with something lighter and speedier, while leveraging code and ideas that have worked well for me. I created mcblock as a simple set of C++ classes and a single command-line application to grok logs and automatically update my pf rules.
Automation
- I’m not going to name all the ways in which I automatically add offenders, but I’ll mention one: I parse auth.log.0.bz2 every time newsyslog rolls over auth.log. This is fairly easy on FreeBSD; see the entry regarding the R flag and path_to_pid_cmd_file in the newsyslog.conf(5) manpage (there's a sample entry after this list). Based on my own simple heuristics, those who've been offensive will be blocked for at least 30 days, longer if they're repeat offenders, and I will soon add policy to permit more elaborate qualifications. What I have today is fast and effective, but I want to add some feeds from my probe detector (which reports on those probing ports where I have nothing listening) as well as from pflog. I can use those today to add entries or re-instantiate expired entries, but I want to be able to extend the expiration time of existing active entries for those who continue to probe for days despite not receiving any response packets.
- My older code used an SQL database, which was OK for most things but made some operations difficult on low-power machines. For example, I like to be able to automatically coalesce adjacent networks before emitting pf rules; it makes the rules easier to read. If I already have 5.149.104/24 in my list and I add 5.149.105/24, I prefer emitting a single rule for 5.149.104/23 (a sketch of the coalescing logic appears after this list). And if I add 5.149.105/24 but have an inactive (expired) rule for 5.149.104/22, I prefer to reactivate the 5.149.104/22 rule rather than add a new rule specifically for 5.149.105/24. My automatic additions always use /24s, but once in a while I will manually add wider rules, knowing that no one from a given address space needs access to anything on my network or that the space is likely being used for state-sponsored cyberattacks. Take Russian government address space, for example; there's nothing a Russian citizen would need from my tiny web site, and I certainly have no interest in continuous probes from any state-sponsored foreign entity.
- Today I'm using a modified version of my Ipv4Routes class template to hold all of the entries. Modified because my normal Ipv4Routes class template uses a vector of unordered_map under the hood (to allow millions of longest-match IPv4 address lookups per second), but here I need ordering and a smaller memory footprint for pf rule generation. While it's possible to reduce the memory footprint of unordered_map by increasing the load factor, doing so defeats the purpose (it slows lookups) when your hash key population isn't well known, and you still wind up with no ordering. Ordering allows the coalescing of adjacent prefixes to proceed quickly, so my modified class template uses map in place of unordered_map. Like the original Ipv4Routes, it keeps a separate map for each prefix length, hence there are 33 of them (a sketch of this layout appears after this list). Of course I have no use for /0, but it's there; the same goes for the /32 map. Having the prefix maps separated by netmask length makes it easy to find wider and narrower matches for a given IP address or prefix, and hence to write code that coalesces or expands prefixes. And it's more than fast enough for my needs: it will easily support hundreds of thousands of lookups per second, far more than I require. I only had to change a couple of lines of my existing Ipv4Routes class template to make it work, then add the new features I needed.
- I never automatically remove entries from the new database, because historical information is useful: automation can re-activate an existing but expired entry, even one with a wider prefix than I would let automation create from scratch. While heuristics can do some of this fairly reliably, expired entries in the database serve as additional data for those heuristics. If I've blocked a /16 before, seeing nefarious traffic from it again can (and usually should) trigger reactivation of the rule for that /16. And then there are things like bogons and private address space that should always be available for reactivation if I see packets with source addresses from those spaces coming in on an external interface.
- Having this all automated means I now spend considerably less time updating my pf rules. Formerly I would find myself manually coalescing the database, deciding when I should use a wider prefix, reading the daily security email from my gateway to make sure I wasn't missing anything, etc. Since I now have unit tests and a real lexer/parser for auth.log, and pf entries are automatically updated and coalesced regularly, I can look at things less often and at my leisure while knowing that at least most of the undesired stuff is being automatically blocked soon after it is identified.
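Regarding the auth.log rollover hook mentioned in the first item above: a newsyslog.conf(5) entry for it might look like the line below. The R flag makes newsyslog run the named file as a shell command after rotation (instead of signaling a PID), and J compresses the rotated log with bzip2; the wrapper script path here is an assumption, standing in for whatever feeds auth.log.0.bz2 to mcblock.

```
# Rotate auth.log at 1000 KB, create it if missing (C), bzip2 the rotated
# file (J), and run a post-rotation command (R) rather than sending a
# signal. The script path is hypothetical.
/var/log/auth.log    600  7     1000 *     JCR   /usr/local/etc/mcblock_rollover.sh
```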
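The adjacent-network coalescing from the second item boils down to a bit test: two distinct prefixes of the same length can be merged exactly when they share a parent one bit shorter. A minimal sketch, using assumed types rather than mcblock’s real ones:

```cpp
#include <cstdint>
#include <optional>

// A network prefix; addr is the network address in host byte order,
// with the host bits assumed to be zero.
struct Prefix {
  uint32_t  addr;
  uint8_t   length;
};

// If a and b are siblings (same parent one bit shorter), return the
// parent; e.g. 5.149.104/24 and 5.149.105/24 coalesce to 5.149.104/23.
std::optional<Prefix> Coalesce(const Prefix & a, const Prefix & b)
{
  if ((a.length != b.length) || (a.length == 0) || (a.addr == b.addr)) {
    return std::nullopt;
  }
  uint8_t   plen  = a.length - 1;
  uint32_t  pmask = plen ? (~uint32_t(0) << (32 - plen)) : 0;
  if ((a.addr & pmask) != (b.addr & pmask)) {
    return std::nullopt;  // not under the same parent prefix
  }
  return Prefix{a.addr & pmask, plen};
}
```

Applied repeatedly over entries kept in prefix order, this merges aligned pairs bottom-up, so four adjacent /24s can collapse into a single /22.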
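And here is a rough sketch of the 33-map layout from the third item: one ordered map per prefix length, which makes longest-match lookups and the wider/narrower scans used for coalescing straightforward. The class and method names are assumptions, not the actual Ipv4Routes interface:

```cpp
#include <array>
#include <cstdint>
#include <map>
#include <optional>

template <typename T>
class PrefixTable {
public:
  void Add(uint32_t addr, uint8_t len, const T & value)
  {
    _maps[len][addr & Mask(len)] = value;
  }

  // Longest match: walk from /32 toward /0, masking the address down to
  // each length and probing that length's map.
  std::optional<T> LongestMatch(uint32_t addr) const
  {
    for (int len = 32; len >= 0; --len) {
      auto  it = _maps[len].find(addr & Mask(len));
      if (it != _maps[len].end()) {
        return it->second;
      }
    }
    return std::nullopt;
  }

private:
  static uint32_t Mask(int len)
  {
    return len ? (~uint32_t(0) << (32 - len)) : 0;
  }

  // One ordered map per prefix length (/0 through /32). std::map keeps
  // keys sorted, so adjacent prefixes are neighbors during iteration,
  // which is what makes coalescing cheap.
  std::array<std::map<uint32_t,T>,33>  _maps;
};
```

Swapping std::map back to unordered_map would recover the hash-based lookup speed, but at the cost of the ordering that the coalescing pass relies on.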
Good programmers are lazy. A few weekends of work will save me a lot of time in the future. I should've cobbled this together a long time ago.