Striping 4 Samsung 990 Pro 2TB on Ubuntu 22.04

On Prime Day I ordered four Samsung 990 Pro 2TB NVMe SSDs to install in my Threadripper machine. I’ve had an unopened Asus Hyper M.2 x16 Gen4 card for years, waiting for drives; I just never got around to finishing the plan for that machine.

The initial impression is positive. Just for fun, I striped all 4 of them and put an ext4 filesystem on the group to grab some out-of-the-box numbers. First up: a simple read test, which yielded more than 24 gigabytes/second. Nice.

dwm@thrip:/hyperx/dwm% fio --name TEST --eta-newline=5s --filename=temp.file --rw=read --size=2g --io_size=10g --blocksize=1024k --ioengine=libaio --fsync=10000 --iodepth=32 --direct=1 --numjobs=1 --runtime=60 --group_reporting

TEST: (g=0): rw=read, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=32
Starting 1 process
TEST: Laying out IO file (1 file / 2048MiB)

TEST: (groupid=0, jobs=1): err= 0: pid=6333: Wed Jul 19 02:11:19 2023
  read: IOPS=25.2k, BW=24.6GiB/s (26.4GB/s)(10.0GiB/407msec)
    slat (usec): min=27, max=456, avg=38.46, stdev=21.03
    clat (usec): min=174, max=10736, avg=1206.67, stdev=443.19
     lat (usec): min=207, max=11193, avg=1245.21, stdev=460.96
    clat percentiles (usec):
     |  1.00th=[  971],  5.00th=[ 1020], 10.00th=[ 1037], 20.00th=[ 1057],
     | 30.00th=[ 1074], 40.00th=[ 1074], 50.00th=[ 1074], 60.00th=[ 1090],
     | 70.00th=[ 1123], 80.00th=[ 1172], 90.00th=[ 1975], 95.00th=[ 2024],
     | 99.00th=[ 2245], 99.50th=[ 2278], 99.90th=[ 7832], 99.95th=[ 9241],
     | 99.99th=[10421]
  lat (usec)   : 250=0.05%, 500=0.24%, 750=0.26%, 1000=1.76%
  lat (msec)   : 2=88.58%, 4=8.87%, 10=0.21%, 20=0.03%
  cpu          : usr=2.71%, sys=96.06%, ctx=144, majf=0, minf=8205
  IO depths    : 1=0.1%, 2=0.1%, 4=0.2%, 8=0.4%, 16=0.8%, 32=98.5%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued rwts: total=10240,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
   READ: bw=24.6GiB/s (26.4GB/s), 24.6GiB/s-24.6GiB/s (26.4GB/s-26.4GB/s), io=10.0GiB (10.7GB), run=407-407msec

Disk stats (read/write):
    dm-0: ios=151773/272, merge=0/0, ticks=47264/0, in_queue=47264, util=83.33%, aggrios=10240/21, aggrmerge=30720/63, aggrticks=3121/2, aggrin_queue=3124, aggrutil=76.09%
  nvme3n1: ios=10240/21, merge=30720/63, ticks=3146/3, in_queue=3149, util=76.09%
  nvme4n1: ios=10240/21, merge=30720/63, ticks=3653/3, in_queue=3657, util=76.09%
  nvme1n1: ios=10240/21, merge=30720/63, ticks=2504/3, in_queue=2507, util=76.09%
  nvme2n1: ios=10240/21, merge=30720/63, ticks=3182/2, in_queue=3184, util=76.09%
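As a sanity check on the summary line, the headline figures can be re-derived from the issued I/O count and the runtime. This is only a minimal sketch: the function name `fio_throughput` is made up for illustration, and the inputs (10240 I/Os of 1024 KiB each in 407 msec) are copied from the output above.

```python
def fio_throughput(total_ios, block_bytes, runtime_ms):
    """Return (IOPS, GiB/s, GB/s) for a completed run."""
    seconds = runtime_ms / 1000.0
    total_bytes = total_ios * block_bytes
    iops = total_ios / seconds
    gib_per_s = total_bytes / seconds / 2**30  # binary units (fio's first figure)
    gb_per_s = total_bytes / seconds / 10**9   # decimal units (fio's parenthesized figure)
    return iops, gib_per_s, gb_per_s


# Values taken from the read run: total=10240 I/Os, bs=1024KiB, run=407msec.
iops, gib_s, gb_s = fio_throughput(10240, 1024 * 1024, 407)
print(f"{iops / 1000:.1f}k IOPS, {gib_s:.1f} GiB/s, {gb_s:.1f} GB/s")
# prints: 25.2k IOPS, 24.6 GiB/s, 26.4 GB/s
```

The whole run lasted 407 msec, so this is better read as a burst figure than as a sustained one.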

A short while later, I ran a simple write test. Here I see more than 13 gigabytes/second.

dwm@thrip:/hyperx/dwm% fio --name TEST --eta-newline=5s --filename=temp.file --rw=write --size=2g --io_size=10g --blocksize=1024k --ioengine=libaio --fsync=10000 --iodepth=32 --direct=1 --numjobs=1 --runtime=60 --group_reporting
TEST: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=32
Starting 1 process

TEST: (groupid=0, jobs=1): err= 0: pid=6682: Wed Jul 19 02:15:31 2023
  write: IOPS=13.7k, BW=13.4GiB/s (14.4GB/s)(10.0GiB/746msec); 0 zone resets
    slat (usec): min=35, max=297, avg=69.19, stdev=14.38
    clat (usec): min=48, max=9779, avg=2242.89, stdev=738.00
     lat (usec): min=105, max=9837, avg=2312.18, stdev=740.08
    clat percentiles (usec):
     |  1.00th=[ 1549],  5.00th=[ 2040], 10.00th=[ 2057], 20.00th=[ 2073],
     | 30.00th=[ 2089], 40.00th=[ 2089], 50.00th=[ 2114], 60.00th=[ 2114],
     | 70.00th=[ 2114], 80.00th=[ 2147], 90.00th=[ 2278], 95.00th=[ 3195],
     | 99.00th=[ 6456], 99.50th=[ 8979], 99.90th=[ 9503], 99.95th=[ 9634],
     | 99.99th=[ 9765]
   bw (  MiB/s): min=13578, max=13578, per=98.92%, avg=13578.00, stdev= 0.00, samples=1
   iops        : min=13578, max=13578, avg=13578.00, stdev= 0.00, samples=1
  lat (usec)   : 50=0.01%, 100=0.02%, 250=0.09%, 500=0.15%, 750=0.16%
  lat (usec)   : 1000=0.19%
  lat (msec)   : 2=1.06%, 4=96.87%, 10=1.46%
    sync (nsec): min=180, max=180, avg=180.00, stdev= 0.00
    sync percentiles (nsec):
     |  1.00th=[  181],  5.00th=[  181], 10.00th=[  181], 20.00th=[  181],
     | 30.00th=[  181], 40.00th=[  181], 50.00th=[  181], 60.00th=[  181],
     | 70.00th=[  181], 80.00th=[  181], 90.00th=[  181], 95.00th=[  181],
     | 99.00th=[  181], 99.50th=[  181], 99.90th=[  181], 99.95th=[  181],
     | 99.99th=[  181]
  cpu          : usr=36.11%, sys=53.56%, ctx=9861, majf=0, minf=14
  IO depths    : 1=0.1%, 2=0.1%, 4=0.2%, 8=0.4%, 16=0.8%, 32=98.5%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,10240,0,1 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
  WRITE: bw=13.4GiB/s (14.4GB/s), 13.4GiB/s-13.4GiB/s (14.4GB/s-14.4GB/s), io=10.0GiB (10.7GB), run=746-746msec

Disk stats (read/write):
    dm-0: ios=0/135825, merge=0/0, ticks=0/12864, in_queue=12864, util=87.00%, aggrios=0/10276, aggrmerge=0/30823, aggrticks=0/1124, aggrin_queue=1125, aggrutil=82.63%
  nvme3n1: ios=0/10275, merge=0/30822, ticks=0/1109, in_queue=1109, util=82.20%
  nvme4n1: ios=0/10276, merge=0/30826, ticks=0/1000, in_queue=1001, util=82.63%
  nvme1n1: ios=0/10276, merge=0/30822, ticks=0/1366, in_queue=1367, util=82.63%
  nvme2n1: ios=0/10277, merge=0/30825, ticks=0/1022, in_queue=1023, util=82.20%
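To put the aggregate numbers in per-drive terms, here is a rough sketch that splits each result evenly across the four drives. The even split is an assumption, though the nearly identical per-drive I/O counts in the disk stats suggest it is close; `per_drive` is a hypothetical helper name.

```python
def per_drive(aggregate_gib_s, drives=4):
    """Evenly split an aggregate stripe bandwidth across its member drives."""
    return aggregate_gib_s / drives


# Aggregate figures taken from the two fio summary lines.
for label, aggregate in (("read", 24.6), ("write", 13.4)):
    print(f"{label}: {per_drive(aggregate):.2f} GiB/s per drive")
# prints:
# read: 6.15 GiB/s per drive
# write: 3.35 GiB/s per drive
```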

It’s worth noting that I don’t consider this configuration a good idea for anything other than scratch space (perhaps for DL training data sets, etc.); 4 striped drives is, as my friend Ben put it, risky. I of course trust SSDs more than spinning rust here, and historically I’ve had no failures with Samsung SSDs, but… that’s a hard thing to judge from just my personal observations and from where the industry has gone. I still have Samsung SATA SSDs from the 830 and 840 series, and they’re still healthy. But… we’ve gone from SLC to MLC to TLC to QLC to… losing a hair of reliability (and a chunk of warranty) at each step. And I’d be remiss if I didn’t mention Samsung’s botched firmware in the last two generations (980 and 990). In fact I’m annoyed that 2 of the 4 drives I received have old firmware that I’ll need to update.
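One way to see which drives carry which firmware is to read the controller attributes the Linux NVMe driver exposes in sysfs. The following is a sketch under assumptions, not the method used for this post: it assumes each controller directory under `/sys/class/nvme` has `model` and `firmware_rev` files, and `nvme_firmware` is a name invented for the example.

```python
from pathlib import Path


def nvme_firmware(sysfs_root="/sys/class/nvme"):
    """Map each NVMe controller name to its (model, firmware revision)."""
    found = {}
    for ctrl in sorted(Path(sysfs_root).glob("nvme*")):
        try:
            model = (ctrl / "model").read_text().strip()
            firmware = (ctrl / "firmware_rev").read_text().strip()
        except OSError:
            continue  # skip entries that lack the expected attributes
        found[ctrl.name] = (model, firmware)
    return found


if __name__ == "__main__":
    for name, (model, firmware) in nvme_firmware().items():
        print(f"{name}: {model} firmware {firmware}")
```

Drives of the same model that report different revisions are the ones to look at before updating.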

Raspberry Pi PoE+ is a typo (should be PoS)

Seriously. ‘S’ is adjacent to ‘E’ on a QWERTY keyboard.

I knew the official PoE+ HATs were pieces of poop before I bought them. This isn’t news. You don’t have to look hard to find Jeff Geerling’s comments, Martin Rowan’s comments, or those of the many others who’ve complained. I had read them literally years before I purchased.

I decided to buy 4 of them despite the problems, for a specific purpose (4 rack-mounted Pi 4B 8G, all powered via PoE alone). I’ve had those running for a few days and they’re working. They’re inefficient, but so far they work.

I also ordered 2 more, with the intent of using one of them on a Raspberry Pi 4B 8G in an Anidees Pro case and keeping the other as a spare. Well, in literally 36 hours, one of them is already dead. I believe it destroyed itself via heat. And therein lies part of the problem. I’ll explain what I casually observed, since I wasn’t taking measurements.

I ran the Pi from USB-C for about a day, without the PoE HAT installed. It was in the Anidees Pro case, fully assembled. It was fine, idling around 37.4C and not seeming to go above 44C when running high loads (make -j4 on some medium-sized C++ projects; a prominent workload for me). Solid proof that for my use, the Anidees Pro case works exactly as intended. The case is a big heatsink. Note that I have the 5mm spacers installed for the lid, so it’s open around the entire top perimeter.

I then installed the PoE+ HAT, with extension headers and the correct length standoffs that are needed in the Anidees Pro case. Note that this activity isn’t trivial; the standoffs occupy the same screw holes as the bottom of the case (from opposite directions), and an unmodified standoff is likely to bottom out as it collides with the end of the case bottom screw. You can shorten the threaded end of the standoff, or do as I did and use shorter standoffs and add a nut and washers to take up some of the thread. I don’t advise shortening the screws for the bottom of the case.

I plugged in the PoE ethernet from my office lab 8-port PoE switch, which has been powering the 4 racked Pis for a few days, and observed the expected horrible noise noted by others. Since I expected it, I immediately unplugged the USB-C power. I continued installing software and started compiling and installing my own software (libDwm, libCredence, mcweather, DwmDns, libDwmWebUtils, mcloc, mcrover, etc.). It was late, so I stopped here. On my way out of the home office, I put my hand on the Pi. It was much warmer than when running from USB-C. In fact, uncomfortably warm. I checked the CPU temperature with vcgencmd; it was under 40C. Hmm. I was tired, so I decided to leave it until the next day and see what happened.
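For reference, a temperature check like this can be scripted. The sketch below assumes the `measure_temp` subcommand of vcgencmd (the post does not say which subcommand was used) and output shaped like `temp=37.4'C`; the helper names are invented for the example. It reports the SoC temperature only and says nothing about how hot the HAT itself is running.

```python
import re
import subprocess


def parse_temp(text):
    """Extract degrees Celsius from output shaped like temp=37.4'C."""
    match = re.search(r"temp=([0-9.]+)", text)
    if match is None:
        raise ValueError(f"unexpected vcgencmd output: {text!r}")
    return float(match.group(1))


def read_soc_temp():
    """Run vcgencmd on the Pi itself and return the SoC temperature."""
    out = subprocess.run(["vcgencmd", "measure_temp"],
                         capture_output=True, text=True, check=True).stdout
    return parse_temp(out)


# Parsing a sample line; call read_soc_temp() when running on a Pi.
print(parse_temp("temp=37.4'C"))  # prints 37.4
```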

In the morning the Pi had no power. I unplugged and plugged both ends of the 12″ PoE cable. Nothing.

It turns out that the PoE+ HAT is dead. Less than 48 hours of runtime. As near as I can tell, it cooked itself. The PoE port on the ethernet switch still works great. The Pi still works great (powered from USB-C after the dead PoE+ HAT was removed).

I find this saddening and unacceptable. “If it’s not tested, it’s broken.” Hey Eben: it’s broken. No, literally, it’s broken. And looks to not even be smoke tested. In fact I’d say it’s worse than the issues with Rev. 1 of the original PoE HAT. This is a design problem, not a testing problem. In other words, the problem occurred at the very beginning of the process. Which means it passed through all of engineering. And this issue lies with leadership, not the engineers.

So not only have you gone backward, you’ve gone further back than you were for Rev. 1 of the PoE HAT. And you discontinued the only good PoE HAT you had? Now I’m just left with, “Don’t believe anything, ANYTHING from the mouth of Eben Upton.”

I’m angry because my trust of Raspberry Pi has been eroding for years, and this is just another lump of coal. We hobbyists were basically screwed for 3 years on availability of all things Pi 4, and you’re still selling a PoE HAT that no one should use.

I’ve been saying this for a couple of years now: there is opportunity for disruption here. While I appreciate the things the Raspberry Pi Foundation has done, I’m starting to feel like I can’t tell anyone that the Pi hardware landscape is great. In fact for many things, it has stagnated.

For anyone for whom 20 US dollars matters, do NOT buy an official PoE+ HAT. Not that it matters… it’s June 2023 and it’s still not trivial to find something to use it with (a Raspberry Pi 3 or 4).

There comes a time when a platform dies. Its makers get lazy after the ecosystem builds around them. I’m wondering if I am seeing that on the horizon for the Raspberry Pi.