mdadm, GPT and kernel; unexpected behavour30 May 2017
This weekend I encountered some rather unexpected interaction between mdadm, GPT and the linux kernel.
I had added a ‘new’ disk to my personal RAID5, growing it from 3 to 4 disks.
mdadm --add /dev/md0 /dev/sdd mdadm --grow --raid-devices=4 --backup-file=/root/grow_md0.bak /dev/md0
After the reshape had completed ~26h later, I rebooted the system to check that everything was working.
The array came up with a disk missing
11720662464 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/3] [UUU_]
The missing drive was showing correctly as
mdadm --examine /dev/sdd was correct
I re-added the missing drive, waited for it to resilver/rebuild and then rebooted again.
Again the disk was missing from the array
An issue showed when I ran
fdisk -l /dev/sdd
The primary GPT table is corrupt, but the backup appears OK, so that will be used. Disk /dev/sdd: 3.7 TiB, 4000787030016 bytes, 7814037168 sectors Units: sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 4096 bytes I/O size (minimum/optimal): 4096 bytes / 4096 bytes Disklabel type: gpt Disk identifier: 1B1FA805-AF84-4575-8D86-4252CF6C6BF1
The disk had previously been used and had an existing; but now empty, GPT table. Although the raid configuration had destroyed the head partition table, the tail partition table had been left intact. This meant the kenel mapped it as a normal drive and stopped mdadm from building the array with the disk.
I removed the drive from the system, and with a USB->SATA blanked out the partition tables
Expert command (? for help): z About to wipe out GPT on /dev/sdd. Proceed? (Y/N): Y GPT data structures destroyed! You may now partition the disk using fdisk or other utilities. Blank out MBR? (Y/N): Y
And then added the disk back to the array with
mdadm --add /dev/md0 /dev/sdd
After the rebuild ~12h I rebooted the system again, this time the disk was correctly included in the array.
Thanks to #Linux-Raid @ freenode for help debugging this issue