Tracking my “1,000 miles by Sept. 1” cycling goal with software

I decided I wanted a means of tracking progress toward my 1,000 miles of cycling by September 1st. So this weekend I threw together a C++ library and tool to read a fairly simple JSON data file and emit chart.js plots and a table.

The JSON file is pretty straightforward. All I need to do after each ride is add a new entry to the “rides” array. I split the lines in the display below, but in reality I have a single line per “ride” entry. It’s easy to add an entry since I just copy an existing one and edit as needed. The order of the “rides” entries doesn’t matter since my library will sort them by the “odometer” field as needed, so I always add my latest ride as the first “rides” entry just because it’s the fastest (no scrolling).

Note that the “odometer” field for each ride is my bike odometer reading at the end of the given ride. This lets me compute the miles for a given ride by subtracting the odometer reading for the previous ride. In the case of the first ride, I subtract the “startMiles” value, which represents the odometer reading when the goal is created (e.g., the first ride below has odometer 473.8, so it was 473.8 - 461.0 = 12.8 miles).

{
    "rides": [
	{ "time": "6/29/2024 21:30", "odometer": 538.6,
          "minutes": 42, "calories": 481 },
	{ "time": "6/28/2024 15:45", "odometer": 529.1,
          "minutes": 40, "calories": 476 },
	{ "time": "6/27/2024 20:30", "odometer": 519.5,
          "minutes": 40, "calories": 168 },
	{ "time": "6/27/2024 15:00", "odometer": 515.0,
          "minutes": 53, "calories": 578 },
	{ "time": "6/26/2024 19:25", "odometer": 504.6,
          "minutes": 15, "calories": 121 },
	{ "time": "6/26/2024 15:22", "odometer": 500.6,
          "minutes": 34, "calories": 317 },
	{ "time": "6/25/2024 20:40", "odometer": 491.4,
          "minutes": 35, "calories": 430 },
	{ "time": "6/24/2024 23:50", "odometer": 482.6,
          "minutes": 35, "calories": 410 },
	{ "time": "6/24/2024 14:00", "odometer": 473.8,
          "minutes": 60, "calories": 830 }
    ],
    "startTime": "6/24/2024 00:00",
    "goalTime": "9/1/2024 00:00",
    "startMiles": 461.0,
    "goalMiles": 1000
}
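
As a rough illustration of the odometer arithmetic described above, here’s a minimal sketch that reads a goal file and prints per-ride miles and the running total. It is not my actual library; it uses nlohmann/json purely for convenience, and the file name is a placeholder.

#include <algorithm>
#include <fstream>
#include <iostream>
#include <vector>
#include <nlohmann/json.hpp>

int main(int argc, char *argv[])
{
  std::ifstream  is((argc > 1) ? argv[1] : "goal.json");   //  placeholder file name
  nlohmann::json  goal = nlohmann::json::parse(is);

  //  Collect the odometer readings and sort them, since the "rides"
  //  entries may appear in any order in the file.
  std::vector<double>  odometers;
  for (const auto & ride : goal["rides"]) {
    odometers.push_back(ride["odometer"].get<double>());
  }
  std::sort(odometers.begin(), odometers.end());

  //  Each ride's miles is the difference from the previous odometer
  //  reading; the first ride is relative to "startMiles".
  double  prev = goal["startMiles"].get<double>();
  double  total = 0;
  for (double odo : odometers) {
    std::cout << "ride: " << (odo - prev) << " miles\n";
    total += odo - prev;
    prev = odo;
  }
  std::cout << "total: " << total << " of "
            << goal["goalMiles"].get<double>() << " miles\n";
  return 0;
}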

From this simple file I can generate useful charts and a table. I wrote a CGI program that utilizes the C++ library and can emit chart.js graphs and an HTML table. I can embed these with iframes in a web page. That’s exactly what I’m doing to track my progress. You can see these at https://www.rfdm.com/Daniel/Cycling/Goals/20240901.php. And I can of course embed them here too!

One of the things that’s cool about this is that it’s really easy to create a new goal file and just update it after each ride. I can create shorter goals in new files. I can easily change the goal miles or goal time as desired.

I’ll probably create a command line utility to allow me to add rides without editing the file directly, mainly because I can use it to sanity check input (say, an incorrect date/time), and of course I can make it a secure client/server application. And at some point I might make an iOS app that can grab the relevant data via HealthKit APIs so I don’t have to type at all.
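
As a hint of the sort of sanity checking I have in mind, here’s a hypothetical helper (not yet written) that validates a “M/D/YYYY HH:MM” timestamp like the ones in the “time” fields above, rejecting anything malformed or in the future.

#include <time.h>
#include <cstring>
#include <string>

//  Hypothetical validation helper: parse "M/D/YYYY HH:MM" and reject
//  malformed input or times in the future.
static bool ValidRideTime(const std::string & s, time_t & t)
{
  struct tm  tm;
  memset(&tm, 0, sizeof(tm));
  const char  *end = strptime(s.c_str(), "%m/%d/%Y %H:%M", &tm);
  if ((nullptr == end) || ('\0' != *end)) {
    return false;                //  didn't parse, or trailing junk
  }
  tm.tm_isdst = -1;              //  let mktime() determine DST
  t = mktime(&tm);
  return ((t != (time_t)-1) && (t <= time(nullptr)));
}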

Big Tech arms race is out of control; it’s time to bolster defenses

I recently posted about doing my annual web log perusal and adding more networks to the list which I block from accessing my web server. Until 2021 or so, most of the networks I added belonged to what I’d consider hostile netizens, most of them in foreign countries (China far and away the worst offender, followed by Russia, Singapore, Hong Kong and others).

Around 2021 (maybe earlier), I started seeing more attacks coming from cloud infrastructure. Today, after my first full pass of changes (March 17, 2024), it’s the majority.

As an aside, how do I know that many of the adversaries are foreign? Well, after my first pass of changes, it’s pretty clear that a lot of them were far away (by network round-trip time). Below are plots of traffic and round trip times to clients of my web server, from before my changes and right after my changes. Note how the 95th and 75th percentile round trip times dropped dramatically. That’s because many of the nefarious users were far away.

It’s worth noting that much of my traffic is local to my home. I have many Raspberry Pis and other hosts running my ‘mcrover’ software, and one of the things it does is monitor my web server, including queries to my blog and the gallery software for Randy’s site, since they have database backends I want to be sure are running. This background monitoring traffic works out to about 80 kilobits/second on average. The round trip time for this traffic is tiny; for some hosts it’s sub-millisecond since they have 10 gigabit connections (DAC, fiber or 10GbaseT).

Back to the topic at hand…

Late last year I got a question via email from a customer at DigitalOcean as to why I had blocked their access. I didn’t reply. I’ve come to the conclusion that if users won’t police themselves, and their cloud provider won’t police their users (and expects us to help them do the policing!), neither deserves my time. If you, or your customer, are getting 404’s from my website for URLs like ‘/login.php’ (which doesn’t exist and has never existed on my site), you’re either a criminal or an accomplice. Neither of you deserves my time. Your activities are hostile, period.

Since 2021 or so, the more disturbing trend I see seems to be driven by the AI arms race in big tech. Microsoft (and OpenAI), Google and Apple are crawling my web site more frequently, and pulling everything they can find: PDFs, all images, etc. There was a time when I considered it OK, in the sense that it was used only for search purposes, which at least gives back a bit to the web as a whole (search is a useful service, despite the mess it has become). But today, it’s pretty clear to me that the snarfing of data by big tech is not balanced by a benefit to users of the web. It’s also quite aggressive. How do I benefit from Microsoft downloading every PDF and image that’s on my web site? I don’t. I lose (I pay for my bandwidth), they profit. There is literally zero benefit to me, or to you unless you’re Microsoft or OpenAI.

I’ve grown tired of this. I’m not interested in feeding the LLMs, much less allowing cloud infrastructure to be leveraged against my web site. So as of this week, anytime I happen to be doing some block list maintenance and see traffic from a cloud service or big tech, the network will be blocked. And not just small chunks; the entire CIDR allocation. And not just for a short period of time; 2 years. If you manage to get your act together 2 years from now, great. If not, you’ll be blocked for 2 more years. I anticipate the latter, since that’s been the steady course for the last 7 years.

I’ve already blocked large swaths of Amazon, Microsoft, Google, DigitalOcean, Linode, Hetzner, Hurricane Electric, OVH and other cloud services. There is literally no reason I should see requests from any of these address spaces. They’re not human beings reading my web pages. They are nefarious automation, and not policed by the cloud providers. We are the victims. Ideally the cloud providers would use deep packet inspection and proactively shut down such activities. But if no one is going to jail for their paying customers’ illicit activities… nothing gets done. And most of us don’t have time to do the policing for them. It’s MUCH faster for me to just block the cloud provider’s entire address space. I’m not going to play whack-a-mole, nor serve as their unpaid reporter of nefarious activity they’re enabling. A simple buh-bye to the cloud provider and all of their customers is far and away my best option. This isn’t a rash decision on my part; I’ve seen nefarious traffic from the cloud providers for many years, and given that they hold gobs of address space, it’s futile to block only small parts. The only sane course is to block all of their address space. In particular, I’m talking to you Amazon, Google, DigitalOcean, Hetzner, Linode, Microsoft and OVH.

It is highly likely that I’ll soon be blocking crawler address space belonging to Google, Apple, Microsoft and others. Sad but true that the benefit of being indexed by search engines has finally been eclipsed by the downsides of allowing unfettered access from big tech. robots.txt doesn’t get us there because it doesn’t have the granularity we need, and the reality is that many of the crawlers don’t even bother to look at it. Worse, how are we as web site admins supposed to be able to determine how the data is being used? If the request is coming from Google crawler address space, is it for search or for feeding their LLMs? Or for sale to a third party? Or for driving their ad revenue? The reality is that we don’t know, and have no means of knowing. I’d rather my web site disappear from search than have all of my web site data used against me in some manner (which includes directed advertising). And I’m tired of the amount of bandwidth it consumes. Google alone is responsible for over 3,000 downloads from my web site per week. Honestly, it’s pretty disgusting. My site is a TINY personal web site, mostly for my own use. I can’t imagine the barrage of traffic from Google to large web sites.

If you happen to be a victim of the more aggressive blocking I’m about to start and the automation that will maintain it long term, my apologies. But I’m unlikely to be sympathetic to pleas for access. If your neighbor attacks my site and my automation blocks the network you share with them as a result… I am truly sorry, but I don’t have time to poke tiny holes as exceptions. I’ll likely leave the 30 day with geometric escalation policy in place for U.S. broadband users, but if you’re using a VPN through a cloud provider and someone else uses that provider to abuse my web site… you’ll be out of luck for much longer. Again, my apologies. If you want to do something about what’s been happening for years on this front that’s forcing some of us to block large swaths of cloud provider space and probably soon the web crawlers of Google, Microsoft, Apple and others… reach out to your legislators. If you’re in the U.S., maybe just make them aware that cloud services on U.S. soil are being used every minute of every day to attack law-abiding, tax-paying citizens who just want to share knowledge with other citizens. Good luck, godspeed, live long and prosper.

Raspberry Pi garage door opener: Part 3

I finally submitted my order for prototype PCBs for my Raspberry Pi ‘HAT’ that I’ll be using for my Raspberry Pi garage door opener. In the end I wound up with this:

  • 2 relays, used to activate the doors. These are driven by relay drivers.
  • 2 rotary encoder inputs (A and B for 2 encoders), to allow my FreeBSD rotary encoder driver to determine the position of the doors.
  • 2 closed door switch inputs. I’m using Honeywell magnetic switches here.
  • 2 pushbutton inputs. I want to be able to activate the garage doors when standing in front of my garage door unit (inside the garage near the doors).
  • 4 LED outputs. I’m using a tricolor panel-mount LED indicator for each door, which will show green when the door is closed, flashing yellow when the door is moving, steady yellow when the door is open but not moving, and will flash red whenever the door is activated.
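
The indicator logic boils down to something like this; a simplified sketch (assuming yellow is produced by driving the red and green elements of the tricolor LED together), not the actual daemon code.

//  Simplified sketch of the door-state-to-indicator mapping.
enum class DoorState { Closed, Opening, Closing, Open, Activated };

struct Indicator {
  bool  red, green;     //  which LED elements are driven
  bool  flashing;       //  steady or flashing
};

static Indicator IndicatorFor(DoorState state)
{
  switch (state) {
    case DoorState::Closed:     return { false, true,  false };  //  steady green
    case DoorState::Opening:
    case DoorState::Closing:    return { true,  true,  true  };  //  flashing yellow
    case DoorState::Open:       return { true,  true,  false };  //  steady yellow
    case DoorState::Activated:  return { true,  false, true  };  //  flashing red
  }
  return { false, false, false };                                //  unreachable
}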

I’m using anti-vandal pushbuttons for the door activation buttons. Not because I need the anti-vandal feature, but because they are flat and not easy to push accidentally. The ones I’m using have blue ring illumination.

I’m using Apem Q-Series indicator LEDs.

The enclosure I’m using at the moment is a Hammond translucent blue polycarbonate box. Probably larger than I need, but it’ll let me house a POE splitter to power the whole thing via Power over Ethernet.

I still need to finish the mechanical stuff… mainly the mounting and connection of the rotary encoders. I have a drawing for part of it for FrontPanelExpress, but I don’t really need to go that route for myself. The main issue I’m still debating is whether I can come up with something cheaper and lighter than Lovejoy couplings to connect them to the garage doors.

In any event, I believe I have fully functioning backend software, and the web interface works fine. Everything is encrypted, and authentication is required to activate the doors.

Raspberry Pi garage door opener: part 2

I have a fully functioning software suite now for my garage door opener. I have been using a small simulator program on the Raspberry Pi to pull the pins up and down (using the pullup and pulldown resistors). Tonight I plugged in one of the actual rotary encoders, and it works fine. And now that I think about it, I don’t really need the optocouplers on the inputs, since I’m using encoders with NPN open collector outputs. All I need to do is enable the pull-up resistors. This is also true for the garage door closed switches. Hence I am going to draw up a second board with a lot fewer components. The component cost wasn’t significant for the board I have now, but it’ll save on my effort to populate the board. By dropping the optocouplers, I will eliminate 15 components. And technically I could probably eliminate the filtering capacitors too since the encoder cable is shielded. That would eliminate 4 more components.
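
For illustration only, here’s roughly what enabling a pull-up and reading one of these inputs looks like from userland with FreeBSD’s libgpio; the pin number is a placeholder and this isn’t the code I’m actually running.

#include <libgpio.h>
#include <cstdio>

int main()
{
  gpio_handle_t  handle = gpio_open(0);        //  /dev/gpioc0
  if (handle < 0) {
    perror("gpio_open");
    return 1;
  }
  const gpio_pin_t  doorClosedPin = 17;        //  placeholder pin number
  gpio_pin_input(handle, doorClosedPin);       //  configure the pin as an input
  gpio_pin_pullup(handle, doorClosedPin);      //  enable the internal pull-up
  //  With the pull-up enabled and the switch wired to ground, a low
  //  reading means the door-closed switch is closed.
  int  value = gpio_pin_get(handle, doorClosedPin);
  printf("door closed switch reads %s\n", (0 == value) ? "closed" : "open");
  gpio_close(handle);
  return 0;
}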

I hate the amount of board space required for the relays, but I need them. I considered using MOSFETs or Darlingtons, but I decided it was just a bad idea to tie the Raspberry Pi ground to my garage door opener’s ground pin. It’d be a recipe for ground loop disasters. The relays keep the Raspberry Pi isolated. I am using relay drivers to drive the relays, which just saves on component count and board space.

I have a decent web interface now, which runs on my web server and communicates with the Raspberry Pi (encrypted). I have yet to implement the separate up/down logic, but since the web interface shows the movement of the door, it’s not strictly necessary. Door activation works, and I can see whether the door is opening or closing.

My code on the Raspberry Pi will learn the door travel from a full open/close cycle, so the graphic in the web interface closely reflects how far open the door is.
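
The computation is roughly this (a sketch, not the actual code): once a full cycle has provided the closed and fully-open encoder counts, the current count maps linearly to a percentage.

#include <algorithm>

//  Sketch of the "percent open" computation.  closedCount and openCount
//  are the encoder counts learned from a full open/close cycle.
static int PercentOpen(int count, int closedCount, int openCount)
{
  if (openCount == closedCount) {
    return 0;                                //  not yet calibrated
  }
  int  pct = ((count - closedCount) * 100) / (openCount - closedCount);
  return std::max(0, std::min(100, pct));    //  clamp for overshoot/undershoot
}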

Site Health page now shows UPS status and history

I am now collecting UPS status data from my UPS, and there is a new plot on the Site Health page that displays it. I still need to make these plots work more like those on the Site Traffic page, but having the UPS data for battery charge level, UPS load, expected runtime on battery and power consumption is useful to me. I currently have 3 computers plus some other gear running from one UPS, but soon will move a few things to a second UPS to increase my expected on-battery runtime a bit.

Measuring TCP round-trip times, part 5: first round of clean-ups

I added the ability to toggle a chart’s y-axis scale between linear and logarithmic. This has been deployed on the Site Traffic page.
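
The mechanics of the toggle amount to little more than this (Wt 3-era names; a rough sketch, not the actual widget code):

#include <Wt/Chart/WAxis>
#include <Wt/Chart/WCartesianChart>

//  Switch the y-axis between linear and log scale, then repaint the chart.
void SetYAxisLogScale(Wt::Chart::WCartesianChart *chart, bool useLog)
{
  chart->axis(Wt::Chart::YAxis).setScale(useLog ? Wt::Chart::LogScale
                                                : Wt::Chart::LinearScale);
  chart->update();
}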

Code cleanup… when I added the round-trip time plot, I wound up creating a lot of code that is largely a duplicate of code I had for the traffic plot. Obviously there are differences in the data and the presentation, but much of it is similar or the same. Tonight I started looking at moving common functionality into base classes, functions and possibly a template or two.

I started with the obvious: there’s little sense in having a lot of duplicate code for the basics of the charts. While both instances were of type Wt::Chart::WCartesianChart, I had separate code to set things like the chart palette, handle a click on the chart widget, etc. I’ve moved the common functionality into my own chart class. It’s likely I’ll later use this class on my Site Health page.
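
The shape of the shared class is roughly this; a simplified sketch of the idea, not the actual class:

#include <Wt/WContainerWidget>
#include <Wt/WEvent>
#include <Wt/Chart/WCartesianChart>
#include <Wt/Chart/WStandardPalette>

//  Common palette, chart type and click handling live in the shared class;
//  each derived chart overrides ChartClicked() to fill in its own table.
class CommonChart : public Wt::Chart::WCartesianChart
{
public:
  CommonChart(Wt::WContainerWidget *parent = 0)
      : Wt::Chart::WCartesianChart(parent)
  {
    setType(Wt::Chart::ScatterPlot);
    setPalette(new Wt::Chart::WStandardPalette(Wt::Chart::WStandardPalette::Muted));
    clicked().connect(this, &CommonChart::ChartClicked);
  }

protected:
  virtual void ChartClicked(const Wt::WMouseEvent & event)  { }
};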

Measuring TCP round-trip times, part 4: plotting the data

I added a new plot to the Site Traffic page. This is just another Wt widget in my existing Wt application that displays the traffic chart. Since Wt does not have a box plot, I’m displaying the data as a simple point/line chart. There’s a data series for each of the minimum, 25th percentile, median, 75th percentile and 95th percentile. These are global round-trip time measurements across all hosts that accessed my web site. In a very rough sense, they represent the network distance of the clients of my web site. It’s worth noting that the minimum line typically represents my own traffic, since my workstation shares an ethernet connection with my web server.

Clicking on the chart will display the values (in a table below the chart) for the time that was clicked. I added the same function to the traffic chart while I was in the code. I also started playing with mouse drag tracking so I can later add zooming.

Measuring TCP round-trip times, part 3: data storage classes

I’ve completed the design and initial implementation of some C++ classes for storage of TCP round trip data. These classes are simple, especially since I’m leveraging functionality from the Dwm::IO namespace in my libDwm library.

The Dwm::WWW::TCPRoundTrips class is used to encapsulate a std::vector of round trip times. Each round trip time is represented by a Dwm::TimeValue (class from libDwm). I don’t really care about the order of the entries in the vector, since a higher-level container holds the time interval in which the measurements were taken. Since I don’t care about the order of entries in the vector, I can use mutating algorithms on the vector when desired.

The Dwm::WWW::TCPHostRoundTrips class contains a std::map of the aforementioned Dwm::WWW::TCPRoundTrips objects, keyed by the remote host IP address (represented by Dwm::Ipv4Address from libDwm). An instance of this class is used to store all round trip data during a given interval. This class also contains a Dwm::TimeInterval (from my libDwm library) representing the measurement interval in which the round trip times were collected.

Both of these classes have OrderStats() members which will fetch order statistics from the encapsulated data. I’m hoping to develop a box plot class for Wt in order to display the order statistics.
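
For the curious, fetching an order statistic from a vector of round trip times amounts to something like this (a simplified sketch working on plain milliseconds, not the actual OrderStats() implementation):

#include <algorithm>
#include <vector>

//  Return the q'th quantile (q in [0,1]) of the round trip times,
//  expressed here as plain doubles (milliseconds) for simplicity.
//  Mutates the vector, which is fine since entry order doesn't matter.
static double Quantile(std::vector<double> & rtts, double q)
{
  if (rtts.empty()) {
    return 0;
  }
  size_t  idx = static_cast<size_t>(q * (rtts.size() - 1));
  std::nth_element(rtts.begin(), rtts.begin() + idx, rtts.end());
  return rtts[idx];
}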

Measuring TCP round-trip times, part 2: the throwaway

I previously posted about measuring TCP round-trip times from my web server to its clients. Last night I quickly added code to my existing site traffic monitor to perform this task, as the experimental throwaway to validate my design idea. I have not yet designed the data store, only the collection of round-trip times. To see what it’s doing, I syslog the rtt measurements. It appears to be working fine. Here’s some data from Google’s crawlers prowling my site:


May 2 21:37:23 www sitetrafficd[2318]: [I] 66.249.66.43:45825 rtt 123.2 ms
May 2 21:38:49 www sitetrafficd[2318]: [I] 66.249.66.57:38926 rtt 123.6 ms
May 2 21:38:49 www sitetrafficd[2318]: [I] 66.249.66.43:38085 rtt 123.5 ms
May 2 21:40:16 www sitetrafficd[2318]: [I] 66.249.66.143:39725 rtt 137.8 ms
May 2 21:40:16 www sitetrafficd[2318]: [I] 66.249.66.143:37657 rtt 126.2 ms
May 2 21:41:25 www sitetrafficd[2318]: [I] 66.249.66.204:47961 rtt 160.9 ms
May 2 21:41:25 www sitetrafficd[2318]: [I] 66.249.66.143:45623 rtt 121.1 ms
May 2 21:41:47 www sitetrafficd[2318]: [I] 66.249.66.60:36603 rtt 142 ms
May 2 21:42:15 www sitetrafficd[2318]: [I] 66.249.66.204:48875 rtt 123.6 ms
May 2 21:43:15 www sitetrafficd[2318]: [I] 66.249.66.43:56275 rtt 125.8 ms
May 2 21:44:42 www sitetrafficd[2318]: [I] 66.249.66.57:49966 rtt 124.1 ms
May 2 21:44:42 www sitetrafficd[2318]: [I] 66.249.66.204:53209 rtt 122.9 ms
May 2 21:45:59 www sitetrafficd[2318]: [I] 66.249.66.238:46595 rtt 123.8 ms
May 2 21:47:27 www sitetrafficd[2318]: [I] 66.249.66.60:60241 rtt 142.2 ms

I believe I can call the raw measurement design valid. It’s a bonus that it was not difficult to add to my existing data collection application. It’s another bonus that the data collection remains fairly lightweight in user space. My collection process has a resident set size just over 4M, and that’s on a 64-bit machine. My round-trip time measurement resolution is microseconds, and I’m using the timestamps from pcap to reduce scheduler-induced variance. Since I’m not using any special kernel facilities directly, this code should port fairly easily to OS X and Linux.

May 3, 2012
I added the ability to track SYN to SYN/ACK round-trip times, so I can run data collection on my desktop or gateway and characterize connections where I’m acting as the TCP client.
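
For anyone curious about the SYN to SYN/ACK technique, its essence looks something like the sketch below. This is not the sitetrafficd code; it’s a stripped-down, standalone illustration that assumes Ethernet framing, skips length and sanity checks, and uses a placeholder interface name.

#include <sys/types.h>
#include <netinet/in.h>
#include <netinet/ip.h>
#include <netinet/tcp.h>
#include <pcap/pcap.h>
#include <cstdint>
#include <cstdio>
#include <map>
#include <tuple>

//  Key for an outstanding SYN: (src addr, dst addr, src port, dst port).
typedef std::tuple<uint32_t,uint32_t,uint16_t,uint16_t>  FlowKey;
static std::map<FlowKey,struct timeval>  g_synTimes;

static void Handler(u_char *, const struct pcap_pkthdr *hdr, const u_char *pkt)
{
  //  Assumes Ethernet framing (14 byte header); no length checks, for brevity.
  const struct ip  *iph = (const struct ip *)(pkt + 14);
  if (iph->ip_p != IPPROTO_TCP) {
    return;
  }
  const struct tcphdr  *tcph =
    (const struct tcphdr *)((const u_char *)iph + (iph->ip_hl * 4));
  bool  syn = (tcph->th_flags & TH_SYN);
  bool  ack = (tcph->th_flags & TH_ACK);
  if (syn && (! ack)) {
    //  Outgoing SYN: remember when we saw it (pcap's timestamp).
    g_synTimes[FlowKey(iph->ip_src.s_addr, iph->ip_dst.s_addr,
                       tcph->th_sport, tcph->th_dport)] = hdr->ts;
  }
  else if (syn && ack) {
    //  SYN/ACK: look up the reversed key and print the round trip time.
    auto  it = g_synTimes.find(FlowKey(iph->ip_dst.s_addr, iph->ip_src.s_addr,
                                       tcph->th_dport, tcph->th_sport));
    if (it != g_synTimes.end()) {
      double  ms = ((hdr->ts.tv_sec - it->second.tv_sec) * 1000.0)
                 + ((hdr->ts.tv_usec - it->second.tv_usec) / 1000.0);
      printf("rtt %.1f ms\n", ms);
      g_synTimes.erase(it);
    }
  }
}

int main(int argc, char *argv[])
{
  char  errbuf[PCAP_ERRBUF_SIZE];
  const char  *dev = (argc > 1) ? argv[1] : "em0";   //  placeholder interface
  pcap_t  *p = pcap_open_live(dev, 128, 0, 100, errbuf);
  if (! p) {
    fprintf(stderr, "%s\n", errbuf);
    return 1;
  }
  struct bpf_program  prog;
  //  Only look at SYN and SYN/ACK segments.
  if (pcap_compile(p, &prog, "tcp[tcpflags] & tcp-syn != 0", 1,
                   PCAP_NETMASK_UNKNOWN) == 0) {
    pcap_setfilter(p, &prog);
  }
  pcap_loop(p, -1, Handler, nullptr);
  pcap_close(p);
  return 0;
}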