Looking at ‘Synners’ (TCP SYN data)

One of the many sets of data I collect with mcflow on my gateway is traffic counters for TCP SYN packets I receive but do not SYN ACK. I keep the source IP address, the destination port, and of course timestamps and counters. This type of data generally represents one of three things: probing for vulnerable services which I don’t run, probing for services I do run but block from offenders, or probing for botnet-controlled devices.

The table below shows the top 10 ports for the current week. In the case of ssh and http, I do run those services but mcblockd automatically blocks those who violate my configured policies. I do not run a telnet server anywhere (my IoT devices are of my own design and use ECDH, 2048-bit RSA keys and AES128). I also do not run MS SQL Server or rdp (Remote Desktop). I have no Windows hosts, and if I did, I certainly wouldn’t expose MS SQL Server or Remote Desktop.

Ports 7547 and 5358 are known to be used by Mirai and its descendants. Port 7547 is also a common port used by broadband ISPs for TR-064 services (specifically, TR-069) to manage home routers.

Port             Packets    Bytes
22 (ssh)           22116  1168688
23 (telnet)         3740   152784
80 (http)           1601    99216
1433 (ms-sql-s)     1279    52288
81                   917    38016
7547                 515    20620
3389 (rdp)           199     8792
5358                 195     8148
2323                 181     7384
8080                 154     6700

Below is a table showing the SYNs I didn’t SYN ACK by country. This is just the top 10. Note that the top two have large swaths of their IP address space automatically blocked by mcblockd for violating my configured policies. They’re also known state sponsors of cyberattacks, and the evidence is pretty clear here. Much (but not all) of the US stuff is research scanning.

Country                   Packets   Bytes
RU (Russian Federation)     17394  864024
CN (China)                   6038  319116
US (United States)           3077  169932
NL (Netherlands)             1160   47580
TH (Thailand)                 603   33480
UA (Ukraine)                  467   20612
KR (Korea)                    462   19380
BR (Brazil)                   426   18708
FR (France)                   341   17828
TR (Turkey)                   281   11756

What is perhaps interesting about this data: the lines drawn during WWII and the Cold War don’t appear to have changed. I find this very sad. I’m just a tiny single user running a very modest home network, yet I’m a target of Russia and China. And my network is likely much more secure than the average home network. I assume this means that all of us are being probed all of the time, and some of us are probably regularly compromised. I think we (meaning the entire industry) need to consider completely banning telnet and doing something real about securing IoT devices.

mcblockd automation progress

So far, so good. Nice to see this in the logs while I’m working on updates to mcblockd. This shows lines from my auth.log with the corresponding actions invoked in mcblockd. The key takeaway: nearly instantaneous response to login attempts from countries where I have the policy set to low tolerance, and the expected response for “US” networks where I have the tolerance set a little higher.

The way this works…

A mcblocklog process receives all auth.log entries via a pipe from syslogd. It uses a list of regular expressions (in a plain text file) to match offending lines in the log, then posts the matched IP addresses to mcblockd as ‘logHit’ requests. Unlike my previous setup, which periodically parsed entire logs, this happens in real time. mcblockd asks dwmrdapd for prefix and country information, then applies the configured policy. Depending on the policy for the network, mcblockd may instantly add an entry to its database and the pf table, or wait for the policy to be violated (a number of hits over a configured time period). For foreign countries, I have the policy set to trigger on a single offending line, hence mcblockd will immediately add an entry to the pf table. For the U.S., I have the policy set to 5 hits in 7 days. These are experimental settings at the moment; it’s likely I’ll change them.
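The matching step amounts to something like this. mcblocklog is C++ and its real expressions live in a plain text file, so this Python sketch and its two patterns are illustrative, modeled on the sshd log lines shown later in this post:

```python
import re

# Illustrative patterns, not mcblocklog's actual configured expressions.
PATTERNS = [
    re.compile(r'maximum authentication attempts exceeded\s+for \S+\s+'
               r'from (\d{1,3}(?:\.\d{1,3}){3})'),
    re.compile(r'Invalid user \S+ from (\d{1,3}(?:\.\d{1,3}){3})'),
]

def match_offender(line):
    """Return the offending IP address if the line matches, else None."""
    for pat in PATTERNS:
        m = pat.search(line)
        if m:
            return m.group(1)   # this is what gets posted as a logHit
    return None
```

Each matched address is then handed to mcblockd, which does the policy work.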

Also part of the configured policy is how long an entry will live in the pf tables, in days. For countries which have no business connecting to my network, the policy is set much longer than for my own country. This is a commonly desired feature in an IPS (Intrusion Prevention System). Another part of the policy is a ‘widest mask’ setting, which lets me avoid blocking huge swaths of address space from a country to which I want to grant a bit of leniency (say the U.S. and Canada in my case).
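Conceptually, the policy check looks something like this Python sketch. The field names and the catch-all values are my own invention for illustration; only the U.S. numbers (5 hits in 7 days) come from the description above:

```python
from dataclasses import dataclass

# Hypothetical policy shape; not mcblockd's actual configuration schema.
@dataclass
class Policy:
    hits: int         # hits required to trigger a block
    days: int         # window, in days, in which those hits must occur
    block_days: int   # how long the entry lives in the pf tables
    widest_mask: int  # never block a prefix wider than this

POLICIES = {
    'US': Policy(hits=5, days=7, block_days=30, widest_mask=24),
    '*':  Policy(hits=1, days=1, block_days=180, widest_mask=12),
}

def should_block(country, recent_hits):
    pol = POLICIES.get(country, POLICIES['*'])
    return recent_hits >= pol.hits

def clamp_prefix(country, prefix_len):
    """Clamp a prefix length so we never block wider than widest_mask."""
    pol = POLICIES.get(country, POLICIES['*'])
    return max(prefix_len, pol.widest_mask)
```

The clamp is what keeps a lenient country from ever having, say, a /12 blocked over a single noisy /24.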

Probably worth noting that if an address is already covered in the pf tables, mcblockd does nothing.

Also worth noting that the service is secured with libDwmAuth, using ECDH and 2048-bit RSA keys during authentication, then AES128 in GCM mode after authentication.

While the log entries below are for ssh, I have a similar process for web logs and mail server logs.

Apr 19 05:33:30 ria sshd[7695]: error: maximum authentication attempts exceeded
                    for root from 81.100.183.189 port 43973 ssh2 [preauth]
Apr 19 05:33:30 ria mcblockd[1854]: [I] Added 81.96/12 (GB) to ssh_losers

Apr 19 06:09:50 ria sshd[7752]: error: maximum authentication attempts exceeded
                    for root from 36.36.254.10 port 60635 ssh2 [preauth]
Apr 19 06:09:50 ria mcblockd[1854]: [I] Added 36.36/16 (CN) to ssh_losers

Apr 19 09:22:37 ria sshd[8123]: error: maximum authentication attempts exceeded
                    for root from 123.96.0.151 port 60583 ssh2 [preauth]
Apr 19 09:22:37 ria mcblockd[1854]: [I] Added 123.96/15 (CN) to ssh_losers

Apr 19 09:29:38 ria sshd[8129]: Did not receive identification string from 34.205.143.181
Apr 19 09:29:43 ria sshd[8130]: Invalid user support from 34.205.143.181
Apr 19 09:29:43 ria sshd[8130]: Postponed keyboard-interactive for invalid user
                    support from 34.205.143.181 port 53145 ssh2 [preauth]
Apr 19 09:29:43 ria sshd[8130]: error: PAM: authentication error for illegal user
                    support from 34.205.143.181
Apr 19 09:29:43 ria sshd[8130]: Failed keyboard-interactive/pam for invalid user
                    support from 34.205.143.181 port 53145 ssh2
Apr 19 09:29:44 ria mcblockd[1854]: [I] Added 34.205.143/24 (US) to ssh_losers

Apr 19 14:11:40 ria sshd[8666]: error: maximum authentication attempts exceeded
                    for root from 200.73.205.204 port 45585 ssh2 [preauth]
Apr 19 14:11:40 ria mcblockd[1854]: [I] Added 200.73.200/21 (EC) to ssh_losers

Apr 19 14:51:48 ria sshd[9272]: Invalid user admin from 77.39.72.192
Apr 19 14:51:48 ria mcblockd[1854]: [I] Added 77.39.0/17 (RU) to ssh_losers

Apr 19 15:31:18 ria sshd[17218]: Invalid user admin from 193.105.134.184
Apr 19 15:31:18 ria mcblockd[1854]: [I] Added 193.105.134/24 (SE) to ssh_losers

Apr 19 15:34:02 ria sshd[18020]: error: maximum authentication attempts exceeded
                    for root from 85.90.198.244 port 44202 ssh2 [preauth]
Apr 19 15:34:02 ria mcblockd[31598]: [I] Added 85.90.192/19 (UA) to ssh_losers

Apr 19 15:58:13 ria sshd[23696]: error: maximum authentication attempts exceeded
                    for root from 156.213.133.233 port 58400 ssh2 [preauth]
Apr 19 15:58:13 ria mcblockd[31598]: [I] Added 156.192/11 (EG) to ssh_losers

Apr 19 16:04:49 ria sshd[23785]: error: maximum authentication attempts exceeded
                    for root from 171.50.175.114 port 46884 ssh2 [preauth]
Apr 19 16:04:49 ria mcblockd[31598]: [I] Added 171.48/12 (IN) to ssh_losers

Apr 19 16:39:23 ria sshd[23858]: Invalid user support from 181.211.93.159
Apr 19 16:39:23 ria mcblockd[31598]: [I] Added 181.211/16 (EC) to ssh_losers

Apr 19 16:59:10 ria sshd[23914]: Did not receive identification string from 
                    128.40.46.124
Apr 19 16:59:10 ria mcblockd[31598]: [I] Added 128.40/15 (GB) to ssh_losers

Apr 19 18:19:24 ria sshd[24599]: error: maximum authentication attempts exceeded
                    for root from 178.216.100.130 port 52035 ssh2 [preauth]
Apr 19 18:19:24 ria mcblockd[31598]: [I] Added 178.216.96/21 (UA) to ssh_losers

Apr 19 19:21:43 ria sshd[24873]: Invalid user admin from 200.121.233.88
Apr 19 19:21:43 ria mcblockd[31598]: [I] Added 200.121/16 (PE) to ssh_losers

Apr 19 23:12:25 ria sshd[30989]: error: maximum authentication attempts exceeded
                    for root from 131.161.55.11 port 42822 ssh2 [preauth]
Apr 19 23:12:25 ria mcblockd[31598]: [I] Added 131.161.52/22 (HN) to ssh_losers

Apr 20 00:08:10 ria sshd[31282]: error: maximum authentication attempts exceeded
                    for root from 167.250.75.214 port 4837 ssh2 [preauth]
Apr 20 00:08:10 ria mcblockd[31598]: [I] Added 167.250.72/22 (BR) to ssh_losers

Apr 20 00:22:31 ria sshd[31674]: Did not receive identification string from
                    218.93.17.146
Apr 20 00:22:31 ria mcblockd[31598]: [I] Added 218.64/11 (CN) to ssh_losers

Apr 20 00:25:41 ria sshd[31691]: Invalid user admin from 60.178.126.100
Apr 20 00:25:41 ria mcblockd[31598]: [I] Added 60.160/11 (CN) to ssh_losers

Apr 20 00:38:12 ria sshd[31715]: Invalid user ubnt from 119.191.105.117
Apr 20 00:38:12 ria mcblockd[31598]: [I] Added 119.176/12 (CN) to ssh_losers

Apr 20 00:45:53 ria sshd[31733]: Invalid user admin from 123.170.99.10
Apr 20 00:45:53 ria mcblockd[31598]: [I] Added 123.160/12 (CN) to ssh_losers

Apr 20 01:39:27 ria sshd[31845]: error: maximum authentication attempts exceeded
                    for root from 119.193.140.196 port 60716 ssh2 [preauth]
Apr 20 01:39:27 ria mcblockd[31598]: [I] Added 119.192/11 (KR) to ssh_losers

dwmrdapd nearing production-ready: RDAP cache for IDS/IPS applications

I’ve been working on a new IP to country mapping service to be used by my IDS/IPS tools. This post is about the server portion, named dwmrdapd.

dwmrdapd provides a simple service to map an IP address to its registered prefix (in one of the regional registries, i.e. ARIN, RIPE, AFRINIC, LACNIC, APNIC) and its registered country. It maintains a small custom database of the mappings in order to provide quick responses to most queries. When an entry is not found in the database, or the requested entry is more than 30 days old, dwmrdapd will make a new RDAP query to the RDAP server of the corresponding registry.
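The cache logic amounts to something like this Python sketch. The dict cache and the rdap_query callable stand in for dwmrdapd’s custom database and real RDAP client, so the names are illustrative; the 30-day staleness rule is the one described above:

```python
from datetime import datetime, timedelta

MAX_AGE = timedelta(days=30)

def lookup(cache, addr, now, rdap_query):
    """Return prefix/country info for addr, refreshing stale entries."""
    entry = cache.get(addr)
    if entry is None or now - entry['lastUpdated'] > MAX_AGE:
        entry = dict(rdap_query(addr))   # query the registry's RDAP server
        entry['lastUpdated'] = now       # record the refresh time
        cache[addr] = entry
    return entry
```

Most queries never leave the local database; only misses and entries older than 30 days cost an RDAP round trip.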

I’ve been using the service to apply policy to the networks automatically blocked by my firewall. As of this week, I can call it near production-ready.

Most of the trickery in implementing this service revolved around dealing with ARIN’s poor RDAP service. The first problem was that they pad IP octets with leading zeros in startAddress and endAddress, which leads all of the standard string-to-address functions to interpret the numbers as octal. That was relatively easy to handle with a simple regular expression fix. The second problem is that ARIN doesn’t populate the country value. Why, I don’t know. The workaround is to parse all of the vcardArrays for a card with an adr label, parse the label looking for a country name, then map that country name to a 2-letter country code. The latest version of dwmrdapd does this, but it’s still a bit hokey. Some ARIN RDAP responses contain many vcard entries with different countries. There doesn’t seem to be any science to the entries, hence I prioritize non-U.S. cards and fall back to “US” as the country code only as a last resort.
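The leading-zero fix can be as simple as the following sketch (not dwmrdapd’s actual regular expression, which is C++/Crypto-side, but the same idea):

```python
import re

def strip_octet_zeros(dotted):
    """Drop leading zeros from each octet of a dotted-quad string,
    so an inet_aton-style parser won't read "035" as octal."""
    return re.sub(r'\b0+(\d)', r'\1', dotted)
```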

The service itself is secured with libDwmAuth using ECDH, RSA 2048-bit keys and AES128 in GCM mode once authentication is complete. Key management is very similar to that used by ssh, which makes it easy for me to use on my local hosts.

Inside the encryption is just simple JSON. Example output from the simple client:

% dwmrdapc 35.1.1.1
[
   {
      "country" : "US",
      "countryName" : "United States of America",
      "ipv4addr" : "35.1.1.1",
      "lastChanged" : "2014-09-23 18:00",
      "lastUpdated" : "2017-04-20 15:26",
      "prefix" : "35.1/16"
   }
]

This isn’t exactly a new kind of service; we’ve had IP geolocation services since the late 1990s. But I wanted something free, tightly secured, and updated automatically on demand. I also wanted something small data-wise; I don’t need latitude/longitude, etc. And I wanted to take a look at the RDAP services from the registries.

I did look at some other freely available sources of data, one of them being ipdeny.com. While their data is useful for bootstrapping (and I have a program to bootstrap dwmrdapd’s initial database from their country ‘zone files’), I’ve found it lacking in correctness. Possibly through no fault of their own: registry data is messy, especially if you’re fetching it via WHOIS, but even the RDAP data can be very sloppy (cough, ARIN, cough) or abysmally slow (LACNIC).

There are also IRR (Internet Routing Registry) datasets, but they’re not uniform and there’s less participation than some of us would like to see.

Refactoring and adding to libDwmAuth

I’ve been working on some changes and additions to libDwmAuth.

I had started a round of changes to the behind-the-scenes parts of the highest-level APIs to make managing authorized users and MitM prevention easier. In the end, however, I felt I was on the wrong course: my first solution involved too many round trips between client and server, and significant key-generation overhead since I was using ephemeral 2048-bit RSA keys.

I’m now using ECDH for the first step. I have a working implementation with unit tests, using Crypto++. Unfortunately I’m still waiting for curve25519 to show up in Crypto++; in the meantime I’m using secp256r1, despite lingering concerns about the provenance of its parameters.

I also have a rudimentary scheme for MitM prevention that is very similar to the one used by OpenSSH, and client and server authentication based on RSA keys (2048 bits at the moment). I have a known_services file that’s similar to OpenSSH’s known_hosts, and an authorized_keys file that serves the same role as OpenSSH’s. This allows fairly easy management on both the client and server side for my applications.

Obviously I also have a public/private key generator application.

mcblock examples

I recently wrote about a new utility I created to help manage my pf rules, called mcblock. Thus far the most useful part has been the automation of rule addition by grokking logs.

For example, it can parse auth.log on FreeBSD and automatically add entries to my pf rule database. And before adding the entries, it can show you what it would do. For example:

# bzcat /var/log/auth.log.0.bz2 | mcblock -O - 
109.24.194.41        194 hits
  add 109.24.194/24 30 days
103.25.133.151         3 hits
  add 103.25.133/24 30 days
210.151.42.215         3 hits
  add 210.151.42/24 30 days

What I’ve done here is uncompress auth.log.0.bz2 to stdout and pipe it to mcblock to see what it would do. mcblock shows that it would add three entries to my pf rule database, each with an expiration 30 days in the future. I can change the number of days with the -d command line option:

# bzcat /var/log/auth.log.0.bz2 | mcblock -d 60 -O -
109.24.194.41        194 hits
  add 109.24.194/24 60 days
103.25.133.151         3 hits
  add 103.25.133/24 60 days
210.151.42.215         3 hits
  add 210.151.42/24 60 days

By default, mcblock uses a threshold of 3 entries from a given offending IP address in a log file. This can be changed with the -t argument:

# bzcat /var/log/auth.log.0.bz2 |  mcblock -t 1 -O - 
109.24.194.41        194 hits
  add 109.24.194/24 30 days
103.25.133.151         3 hits
  add 103.25.133/24 30 days
210.151.42.215         3 hits
  add 210.151.42/24 30 days
31.44.244.11           2 hits
  add 31.44.244/24 30 days

If I’m happy with these actions, I can tell mcblock to execute them:

# bzcat /var/log/auth.log.0.bz2 | mcblock -t 1 -A -

And then look at one of the entries it added:

# mcblock -s 31.44.244/24
31.44.244.0/24     2015/08/21 - 2015/09/20

This particular address space happens to be from Russia, and is allocated as a /23. So let’s add the /23:

# mcblock -a 31.44.244/23

And then see what entries would match 31.44.244.11:

# mcblock -s 31.44.244.11
31.44.244.0/23     2015/08/21 - 2015/09/20

The /24 was replaced by a /23. Let’s edit this entry to add the registry and the country, and extend the time period:

# mcblock -e 31.44.244/23
start time [2015/08/21 04:37]: 
end time [2015/09/20 04:37]: 2016/02/21 04:37
registry []: RIPE
country []: RU
Entry updated.

And view again:

# mcblock -s 31.44.244.11
31.44.244.0/23     2015/08/21 - 2016/02/21 RIPE     RU

mcblock: new code for pf rule management from a ‘lazy’ programmer

Good programmers are lazy. We’ll spend a good chunk of time writing new/better code if we know it will save us a lot of time in the future.

Case in point: I recently completely rewrote some old code I use to manage the pf rules on my gateway. Why? Because I had been spending too much time doing things that could be done automatically by software with just a small bit of intelligence. Basically codifying the things I’ve been doing manually. And also because I’m lazy, in the way that all good programmers are lazy.

Some background…

I’m not the type of person who fusses a great deal about the security of my home network. I don’t have anything to hide, and I don’t have a need for very many services. However, I know enough about Internet security to be wary and to at least protect myself from the obvious. And I prefer to keep out the hosts that have no need to access anything on my home network, including my web server. And a very long time ago, I was a victim of an SSH-v1 issue and someone from Romania set up an IRC server on my gateway while I was on vacation in the Virgin Islands. I don’t like someone else using my infrastructure for nefarious purposes.

At the time, it was almost humorous how little the FBI knew about the Internet (next to nothing). I’ll never forget how puzzled the agents were at my home when I was explaining what had happened. The only reason I had called them was that the perpetrator managed to get a credit card number from us (presumably via a man-in-the-middle attack) and used it to order a domain name and hosting services. At the time I had friends with fiber taps at the major exchanges, and I managed to track down some of his traffic and eventually a photo of him and his physical address (and of course I had logged a lot of the IRC traffic before I completely shut it down). It didn’t do me any good, since he was a Russian minor living in Romania. The FBI agents knew nothing about the Internet. My recollection is hazy, but I think this was circa 1996. I know it was before SSH-v2, and that I was still using Kerberos where I could.

Times have changed (that was nearly 20 years ago). But I continue to keep a close eye on my Internet access. I will never be without my own firewall with all of the flexibility I need.

For a very long time, I’ve used my own software to manage the list of IP prefixes I block from accessing my home network. Way back when, it was hard: we didn’t have things like pf. But all the while I’ve had some fairly simple software to help me manage the list of IP prefixes that I block from accessing my home network and simple log grokking scripts to tell me what looks suspicious.

Way back when, the list was small. It grew slowly for a while, but today it’s pretty much non-stop. And I don’t think of myself as a desirable target. Which probably means that nearly everyone is under regular probing and weak attack attempts.

One interesting thing I’ve observed over the last 5 years or so: the cyberwarfare battle lines could almost be drawn from a very brief lesson on WWI, WWII and the Cold War, with maybe a smattering of foreign policy SNAFUs and socialism/communism versus capitalism and East versus West. In the last 5 years, I’ve primarily seen China, Russia, Italy, Turkey, Brazil and Colombia address space in my logs, with a smattering of former Soviet bloc countries, Iran, Syria and a handful of others. U.S.-based probes are a trickle in comparison. It’s really a sad commentary on the human race, to be honest. I would wager that the countries in my logs are seeing the opposite directed at them: most of their probes and attacks are likely originating from the U.S. and its old WWII and NATO allies. Sigh.

Anyway…

My strategy

For about 10 years I’ve been using code I wrote that penalizes repeat attackers by doubling their penalty time each time their address space is re-activated in my blocked list. This has worked well; the gross repeat offenders wind up being blocked for years, while those who only knock once are only blocked for a long enough time to thwart their efforts. Many of them move on and never return (meaning I don’t see more attacks from their address space for a very long time). Some never stop, and I assume some of those are state-sponsored, i.e. they’re being paid to do it. Script kiddies don’t spend years trying to break into the same tiny web site nor years scanning gobs of broadband address space. Governments are a different story with a different set of motivations that clearly don’t go away for decades or even centuries.
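The scheme above boils down to exponential penalties. A sketch, assuming a 30-day base period (the base value here is illustrative, not my actual configured value):

```python
BASE_DAYS = 30

def penalty_days(reactivations):
    """Penalty for a prefix whose entry has been re-activated N times:
    the penalty time doubles with each re-activation."""
    return BASE_DAYS * (2 ** reactivations)
```

After a half-dozen re-activations an offender is blocked for more than five years, which is exactly the intended effect on gross repeat offenders.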

The failings

The major drawback to what I’ve been doing for years: too much manual intervention, especially adding new entries. It doesn’t help that there is no standard logging format for various externally-facing services and that the logging isn’t necessarily consistent from one version to the next.

My primary goals were to automate the drudgery and to replace the SQL database with something lighter and speedier, while leveraging code and ideas that have worked well for me. I created mcblock as a simple set of C++ classes and a single command-line application whose purpose is grokking logs and automatically adding to my pf rules.

Automation

  • I’m not going to name all the ways in which I automatically add offenders, but I’ll mention one: I parse auth.log.0.bz2 every time newsyslog rolls over auth.log. This is fairly easy on FreeBSD, see the entry regarding the R flag and path_to_pid_cmd_file in the newsyslog.conf(5) manpage. Based on my own simple heuristics, those who've been offensive will be blocked for at least 30 days. Longer if they're repeat offenders, and I will soon add policy to permit more elaborate qualifications. What I have today is fast and effective, but I want to add some feeds from my probe detector (reports on those probing ports on which I have nothing listening) as well as from pflog. I can use those things today to add entries or re-instantiate expired entries, but I want to be able to extend the expiration time of existing active entries for those who continue to probe for days despite not receiving any response packets.
  • My older code used an SQL database, which was OK for most things but made some operations difficult on low-power machines. For example, I like to be able to automatically coalesce adjacent networks before emitting pf rules; it makes the pf rules easier to read. For example, if I already have 5.149.104/24 in my list and I add 5.149.105/24, I prefer emitting a single rule for 5.149.104/23. And if I add 5.149.105/24 but I have an inactive (expired) rule for 5.149.104/22, I prefer to reactivate the 5.149.104/22 rule rather than add a new rule specifically for 5.149.105/24. My automatic additions always use /24's, but once in a while I will manually add wider rules knowing that no one from a given address space needs access to anything on my network or the space is likely being used for state-sponsored cyberattacks. Say Russian government address space, for example; there's nothing a Russian citizen would need from my tiny web site and I certainly don't have any interest in continuous probes from any state-sponsored foreign entity.
  • Today I'm using a modified version of my Ipv4Routes class template to hold all of the entries. Modified because my normal Ipv4Routes class template uses a vector of unordered_map under the hood (to allow millions of longest-match IPv4 address lookups per second), but I need ordering and also a smaller memory footprint for my pf rule generation. While it's possible to reduce the memory footprint of unordered_map by increasing the load factor, it defeats the purpose (slows it down) when your hash key population isn't well-known and you still wind up with no ordering. Ordering allows the coalescing of adjacent prefixes to proceed quickly, so my modified class template uses map in place of unordered_map. Like my original Ipv4Routes class template, I have separate maps for each prefix length, hence there are 33 of them. Of course I don't have a use for /0, but it's there. I also typically don't have a use for the /32 map, but it's also there. Having the prefix maps separated by netmask length makes it easy to understand how to find wider and narrower matches for a given IP address or prefix, and hence write code that coalesces or expands prefixes. And it's more than fast enough for my needs: it will easily support hundreds of thousands of lookups per second, and I don't need it to be anywhere near as fast as it is. But I only had to change a couple of lines of my existing Ipv4Routes class template to make it work, and then added the new features I needed.
  • I never automatically remove entries from the new database. That's because historical information is useful and the automation can re-activate an existing but expired entry that might be a wider prefix than what I would allow automation to do without such information. While heuristics can do some of this fairly reliably, expired entries in the database serve as additional data for heuristics. If I've blocked a /16 before, seeing nefarious traffic from it again can (and usually should) trigger reactivation of a rule for that /16. And then there are the things like bogons and private space that should always be available for reactivation if I see packets with source addresses from those spaces coming in on an external interface.
  • Having this all automated means I now spend considerably less time updating my pf rules. Formerly I would find myself manually coalescing the database, deciding when I should use a wider prefix, reading the daily security email from my gateway to make sure I wasn't missing anything, etc. Since I now have unit tests and a real lexer/parser for auth.log, and pf entries are automatically updated and coalesced regularly, I can look at things less often and at my leisure while knowing that at least most of the undesired stuff is being automatically blocked soon after it is identified.
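For illustration, Python’s ipaddress module can demonstrate the coalescing idea from the 5.149.104/24 example above; the real implementation is the modified Ipv4Routes C++ class template with one map per prefix length:

```python
import ipaddress

def coalesce(prefixes):
    """Merge adjacent, properly aligned prefixes into wider ones."""
    nets = [ipaddress.ip_network(p) for p in prefixes]
    return [str(n) for n in ipaddress.collapse_addresses(nets)]
```

Only aligned neighbors merge: 5.149.104/24 and 5.149.105/24 form a /23, while 5.149.105/24 and 5.149.106/24 straddle a /23 boundary and stay separate.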

Good programmers are lazy. A few weekends of work is going to save me a lot of time in the future. I should've cobbled this up a long time ago.

depot’s backup space is now ZFS mirror

Last night I installed an HGST Deskstar NAS 4TB drive in depot to pair with the existing HGST Deskstar 4TB drive. I saved the existing data to a ZFS pool on kiva, then wiped the existing HGST Deskstar drive: unmounted the filesystem, deleted the partition, deleted the partitioning scheme.

If you’re doing this for the first time on FreeBSD 10.1 or later, don’t forget to enable ZFS (loading of the kernel module) and tell the system to mount ZFS pools at boot.

Enable ZFS kernel module at boot by adding to /boot/loader.conf:

zfs_load="YES"

Tell the system to mount ZFS pools at boot by adding to /etc/rc.conf:

zfs_enable="YES"

If you haven’t rebooted after changing /boot/loader.conf, you can load the kernel module manually:

# kldload zfs

Before getting started, I changed the default ashift setting to be more amenable to 4k drives (ashift is a power of two, so 12 means 2^12 = 4096-byte alignment):

# sysctl vfs.zfs.min_auto_ashift=12

I then created my ZFS pool. First I created the GPT partitioning scheme on each drive:

# gpart create -s gpt ada0
# gpart create -s gpt ada4

I then created a partition on each, leaving 1 gigabyte of space unused:

# gpart add -t freebsd-zfs -l gpzfs1_0 -b1M -s3725G ada0
# gpart add -t freebsd-zfs -l gpzfs1_1 -b1M -s3725G ada4

I then created the pool:

# zpool create zfs1 mirror /dev/gpt/gpzfs1_0 /dev/gpt/gpzfs1_1

I created my filesystem hierarchy. For now I only need my backups mount point. Since FreeBSD now has lz4_compress enabled by default, I can use lz4 compression. lz4 is considerably faster than lzjb, especially on incompressible data.

# zfs create -o compression=lz4 zfs1/backups

I then copied back the original data that was on the single HGST Deskstar 4TB drive. Since I had disabled Time Machine on my desktop computer in order to move its backups to the ZFS mirror, I re-enabled Time Machine and manually asked it to perform a backup. It worked fine and completed in less than 2 minutes, since I hadn’t changed much on my desktop machine.

First ZFS pool now on kiva

I finally got around to creating the first ZFS pool on my new-to-me server (kiva). At the moment, this particular pool is for backups of other machines.

I am using the 4TB HGST Deskstar drive I bought a little bit ago, and a 4TB HGST Deskstar NAS I bought today. Once installed in hot-swap bays, they showed up as da1 and da2.

I created the GPT partitioning scheme on each:

# gpart create -s gpt da1
# gpart create -s gpt da2

I created a partition on each, leaving 2 gigabytes of space unused. It’s not uncommon for a replacement drive to have slightly less space, and I don’t want to be trapped in a jam if one of the drives fails and I need to use a different type of drive as a replacement. 2 gigabytes seems like a lot of space, but in the grand scheme of this ZFS host, it’s nothing. On FreeBSD, there is no performance penalty for using partitions for ZFS versus using whole disks. This allows me to wait to buy a replacement disk, which means I don’t have spare disks sitting around with their warranty period ticking away without the drives being used. I would always have spare drives on hand in a production environment, but at home it makes sense (especially for backups) to wait for a drive to have some trouble before purchasing its replacement. 4TB drives are readily available locally.

# gpart add -t freebsd-zfs -l gpzfs1_0 -b1M -s3724G da1
# gpart add -t freebsd-zfs -l gpzfs1_1 -b1M -s3724G da2

So now I see:

% gpart show da1
=>        34    7814037101  da1  GPT  (3.6T)
          34          2014       - free -  (1.0M)
        2048    7809794048    1  freebsd-zfs  (3.6T)
  7809796096       4241039       - free -  (2.0G)

% gpart show da2
=>        34    7814037101  da2  GPT  (3.6T)
          34          2014       - free -  (1.0M)
        2048    7809794048    1  freebsd-zfs  (3.6T)
  7809796096       4241039       - free -  (2.0G)

I created the pool:

# zpool create zfs1 mirror /dev/gpt/gpzfs1_0 /dev/gpt/gpzfs1_1

I created my filesystem hierarchy. For now I only need my backups mount point. Since FreeBSD now has lz4_compress enabled by default, I can use lz4 compression. lz4 is considerably faster than lzjb, especially on incompressible data.

# zfs create -o compression=lz4 zfs1/backups

And since I had not yet enabled ZFS on kiva, I added to /boot/loader.conf:

zfs_load="YES"

And added to /etc/rc.conf:

zfs_enable="YES"

After copying over 38 gigabytes of backups from another host, I have this:

% zpool list -v
NAME               SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
zfs1              3.62T  23.9G  3.60T         -     0%     0%  1.00x  ONLINE  -
  mirror          3.62T  23.9G  3.60T         -     0%     0%
    gpt/gpzfs1_0      -      -      -         -      -      -
    gpt/gpzfs1_1      -      -      -         -      -      -

lz4 compression yielded a 37% reduction in disk space for these backups. That’s quite reasonable.

A friend asked me why I was using a mirror. The simple answer is that it’s more reliable than raidzN, and more easily expanded. This machine has 12 hot-swap drive bays, and I don’t expect to need all of them anytime soon (if ever). While a raidzN is more space-efficient, it’s not easily expanded and when one drive from a batch fails, others are often not far behind. Resilvering a raidzN is hard on all of the drives involved, and it’s not uncommon to have another disk fail during a resilvering. Resilvering a raidzN is slower than resilvering a mirror, and array performance suffers dramatically during resilvering of a raidzN. If/when I need to add more space to the pool, I can simply buy two more drives and add another mirror to the pool.

It’s worth noting that ZFS is not a substitute for backups. Here I am using ZFS to store backups of other machines, and it’s very useful for this use case.

Computer reallocation: depot is now web server

I’ve mostly finished migrating my web server to depot.

I did this to gain RAM and CPU (mostly the former). My old web server was an Atom D510 based machine with 4G of RAM. Most of the time this wasn’t a huge issue, but it was holding me back from putting more of my own software on it and I couldn’t easily run multiple jails or bhyve. depot has 32G of RAM and an i5-2405S, which should be sufficient for my needs for a while.

It’s worth noting that the big hog on my web server is mysql. I’m looking to get rid of it, but that means replacing my blog, since WordPress requires mysql. I already wrote my own gallery software to replace gallery3; I have no reason to believe I can’t replace WordPress with something of my own that is simpler and consumes fewer resources. I’m also growing tired of the security issues that regularly crop up with WordPress; I’m certain I can produce something more secure.

New (to me) server up and running: kiva

I recently bought a server from eBay to replace the duties of my storage server (depot). depot will become my web server. I needed to do this because my web server was an Atom D510 based system. I need more RAM than can be addressed with an Atom D510. I also wanted ECC memory in my storage server since I’m about to start using ZFS pools and can use the integrity provided by ECC.

The server I bought is overkill for my current needs, but it was inexpensive because it’s older technology. It is a Supermicro X8DTN+ motherboard with a pair of Xeon L5640 CPUs and 48G of registered ECC RAM (six 8G sticks). It’s in a Supermicro SC826 chassis with a BPN-SAS-826EL backplane. This wasn’t the backplane I wanted; the eBay description was incorrect. However, it’ll work for my needs since I don’t really need more than 4 SAS lanes. As an upside, the cabling is cleaner than SFF to SATA breakout cables. I’m using an LSI 9211-8i PCIe x4 HBA to connect the backplane to the motherboard.

As an aside, I’ve never owned a machine with 12 CPU cores that can run 24 hyperthreads. While the L5640 runs at a paltry 2.26GHz, it is very handy to be able to run gmake -j24 when doing software development. I’m using a Crucial MX100 512G SSD as my OS drive, because it was inexpensive (an Amazon Prime Day deal). I would normally choose a Samsung 850 Pro, but I couldn’t justify the price for setting up this machine. I can always change it later. At any rate, compiles of my software are speedy on this machine, which means I can get back to finishing my BGP-4 development along with some other things.

The new machine is named kiva (thanks to Julie for the name!). Other than the Crucial MX100, it has an HGST 4TB Deskstar that will host backups of other machines. Backups of kiva are currently going to another HGST 4TB Deskstar in depot. kiva is running FreeBSD 10.2-BETA2 (i.e. 10.1-STABLE on its way to 10.2). It is mounted in the rack, but I’ll likely change its position later.