Reciprocate zone

where we can give back what we took from the net


Subject: using mdadm and xfs_growfs to remove a disk from a RAID5 array and grow another RAID5 with that disk


Keywords: mdadm, raid 5, reshape array, resize filesystem, remove disk, nfs, xfs


Situation: first of all I'll list my system configuration:


Solution:
As a preamble, after unmounting the NFS shares on the clients, log in to the server as root and shut down NFS sharing:

universo:~# /etc/init.d/nfs-kernel-server stop
Stopping NFS kernel daemon: mountd nfsd.
Unexporting directories for NFS kernel daemon....
		
Note that this is more than strictly necessary: single exports can be withdrawn with exportfs -u <client>:<path-to-share> (or all of them with exportfs -ua), yet we are crude and shut down the whole thing. Then we check that nothing is still exported:
universo:~# exportfs
	
With NFS running this would list every exported directory; it prints nothing, so we are happy. Next we detach the volumes by unmounting them from the system:
universo:~# umount /dev/md0
universo:~# umount /dev/md1
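
Before touching the arrays it may be worth confirming that neither volume is still mounted. A minimal sketch, assuming a Linux-style /proc/mounts; the function name is_mounted and its optional second argument are made up for this example:

```shell
# is_mounted DEVICE [MOUNTS_FILE]
# Exit status 0 if DEVICE appears as a mount source in MOUNTS_FILE, non-zero otherwise.
# MOUNTS_FILE defaults to /proc/mounts; passing a saved copy lets you try the check offline.
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Report any array that is still mounted.
for dev in /dev/md0 /dev/md1; do
    if is_mounted "$dev"; then
        echo "$dev is still mounted"
    fi
done
```

If the loop prints nothing, both volumes are unmounted and it should be safe to go on.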
  
OK, now we can go on tearing off the disk from md0, but first let's peek at what we are dismantling:
universo:~# mdadm -D /dev/md0
/dev/md0:
        Version : 00.90.03
  Creation Time : Sat Aug 25 20:08:59 2007
     Raid Level : raid5
     Array Size : 976767872 (931.52 GiB 1000.21 GB)
    Device Size : 488383936 (465.76 GiB 500.11 GB)
   Raid Devices : 3
  Total Devices : 3
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Thu Jun 26 16:08:20 2008
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : d6edefb0:96b08698:df9cb6b2:0b4a12a3
         Events : 0.480

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1
       2       8       33        2      active sync   /dev/sdc1
  
Quite a healthy array! The -D option (--detail) prints the details of an MD volume. Now we want to remove a disk, say sdc, from the array completely; before doing that, we must tell the kernel to stop using it:
universo:~# mdadm -f /dev/md0 /dev/sdc1
mdadm: set /dev/sdc1 faulty in /dev/md0
universo:~# mdadm -D /dev/md0
/dev/md0:
[…]
          State : clean, degraded
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 1
[…]
    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1
       2       0        0        2      removed

       3       8       33        -      faulty spare   /dev/sdc1
  
The -f option (--fail) tells mdadm to mark the specified device as faulty. The device stays attached to the array, listed as a faulty spare, but md no longer uses it. The array becomes degraded because it has lost its normal geometry (two of three members are active), yet it is still clean because there has been no data loss. Bear in mind that a degraded RAID5 has no redundancy left: if another drive becomes unusable now, data will be lost. sdc1 can now be removed from the array:
universo:~# mdadm -r /dev/md0 /dev/sdc1
mdadm: hot removed /dev/sdc1
universo:~# mdadm -D /dev/md0
/dev/md0:
[…]
          State : clean, degraded
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
[…]
    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1
       2       0        0        2      removed
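
The two steps above can also be given to mdadm in a single invocation, since manage mode accepts several operations on one command line. A cautious sketch: the helper names run and remove_member are invented here, and by default the command is only printed (set DRY_RUN=0 to really execute it, at your own risk):

```shell
# run CMD...: print the command when DRY_RUN is 1 (the default in this sketch),
# execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# remove_member ARRAY DEVICE: mark DEVICE faulty, then hot-remove it from ARRAY.
remove_member() {
    run mdadm "$1" --fail "$2" --remove "$2"
}

remove_member /dev/md0 /dev/sdc1
```

Keeping the dry run as the default makes it harder to degrade the wrong array by accident.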
  
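Now we are ready to add the removed disk to md1, but first we look at that array's state. Note that md1 is built from whole disks rather than partitions, which is why the disk is added below as /dev/sdc and not /dev/sdc1: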
universo:~# mdadm -D /dev/md1
/dev/md1:
        Version : 00.90.03
  Creation Time : Fri May 23 23:09:06 2008
     Raid Level : raid5
     Array Size : 976772864 (931.52 GiB 1000.22 GB)
    Device Size : 488386432 (465.76 GiB 500.11 GB)
   Raid Devices : 3
  Total Devices : 3
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Thu Jun 26 16:08:24 2008
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 128K

           UUID : d8a342fa:f0143196:2e07fe24:7eba3cee (local to host universo)
         Events : 0.34

    Number   Major   Minor   RaidDevice State
       0       8       48        0      active sync   /dev/sdd
       1       8       64        1      active sync   /dev/sde
       2       8       80        2      active sync   /dev/sdf
universo:~# mdadm --add /dev/md1 /dev/sdc
mdadm: added /dev/sdc
universo:~# mdadm -D /dev/md1
/dev/md1:
[…]
          State : clean
 Active Devices : 3
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 1
[…]
    Number   Major   Minor   RaidDevice State
       0       8       48        0      active sync   /dev/sdd
       1       8       64        1      active sync   /dev/sde
       2       8       80        2      active sync   /dev/sdf

       3       8       32        -      spare   /dev/sdc
  
Our newly added device has become a spare! If one of the active devices failed at this point, the spare would be integrated into the array automatically, restoring redundancy and with it the fault tolerance of the array.
The next step is to tell the kernel to change the array geometry to include the extra device. This implies rewriting every stripe, since each parity chunk is now calculated from 3 data chunks instead of 2, so the operation takes time. A lot, possibly, and while it runs the system is to some degree vulnerable to device failure and power loss ...
universo:~# mdadm --grow /dev/md1 --raid-devices=4
mdadm: Need to backup 768K of critical section..
mdadm: ... critical section passed.
universo:~# mdadm -D /dev/md1
/dev/md1:
        Version : 00.91.03
  Creation Time : Fri May 23 23:09:06 2008
     Raid Level : raid5
     Array Size : 976772864 (931.52 GiB 1000.22 GB)
    Device Size : 488386432 (465.76 GiB 500.11 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Thu Jun 26 18:28:22 2008
          State : clean, recovering
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 128K

 Reshape Status : 17% complete
  Delta Devices : 1, (3->4)

           UUID : d8a342fa:f0143196:2e07fe24:7eba3cee (local to host universo)
         Events : 0.54444

    Number   Major   Minor   RaidDevice State
       0       8       48        0      active sync   /dev/sdd
       1       8       64        1      active sync   /dev/sde
       2       8       80        2      active sync   /dev/sdf
       3       8       32        3      active sync   /dev/sdc
  
The critical section is the very beginning of the reshape, where the new layout is written over stripes that still hold data in the old layout; mdadm backs that region up (768K here) before going on. If something goes wrong in the critical section, chances are that you won't see your data again.
Then we have this interesting new report: we have succeeded! Well, not yet, since the array is recovering (but still clean!), which means that it is rewriting all data to comply with the new geometry. To check the progress of the reshape you can read the pseudofile /proc/mdstat:
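
mdadm also accepts a --backup-file option with --grow, which keeps the critical-section backup in a file; that file should live on a device that is not part of the array being reshaped. The sketch below only prints the command by default. The backup path and the helper names run and grow_raid5 are assumptions made for this example, not something taken from the session above:

```shell
# run CMD...: print the command when DRY_RUN is 1 (the default in this sketch),
# execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# grow_raid5 ARRAY NEW_MEMBER_COUNT BACKUP_FILE
# BACKUP_FILE is a hypothetical path; put it on a disk outside ARRAY.
grow_raid5() {
    run mdadm --grow "$1" --raid-devices="$2" --backup-file="$3"
}

grow_raid5 /dev/md1 4 /root/md1-grow.backup
```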
universo:~# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [multipath] [faulty]
md1 : active raid5 sdc[3] sdd[0] sdf[2] sde[1]
      976772864 blocks super 0.91 level 5, 128k chunk, algorithm 2 [4/4] [UUUU]
      [===>.................]  reshape = 18.0% (88114560/488386432) finish=775.9min speed=8595K/sec

md0 : active raid5 sda1[0] sdb1[1]
      976767872 blocks level 5, 64k chunk, algorithm 2 [3/2] [UU_]

unused devices: <none>
  
It can be useful to follow the progress continuously: watch cat /proc/mdstat.
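
For scripting, the progress line can be pulled out of /proc/mdstat with a little awk. A minimal sketch, assuming the mdstat layout shown above; the function name reshape_status is made up, and other kernel versions may format the line differently:

```shell
# reshape_status [MDSTAT_FILE]
# For every array that is reshaping, recovering or resyncing, print
# "<array> <percent done> <estimated time to finish>".
# MDSTAT_FILE defaults to /proc/mdstat.
reshape_status() {
    awk '
        /^md[0-9]+ :/ { dev = $1 }              # remember which array we are in
        /(reshape|recovery|resync) *=/ {
            pct = ""; eta = ""
            for (i = 1; i <= NF; i++) {
                if ($i ~ /%$/) pct = $i          # e.g. 18.0%
                if ($i ~ /^finish=/) {           # e.g. finish=775.9min
                    eta = $i
                    sub(/^finish=/, "", eta)
                }
            }
            print dev, pct, eta
        }
    ' "${1:-/proc/mdstat}"
}
```

Fed the output above, reshape_status prints "md1 18.0% 775.9min". If your mdadm supports it, mdadm --wait /dev/md1 is another option: it simply blocks until the reshape is over.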

Digression: I live in Rome, Italy, which is not a very civilized country; in many respects the Italian people never got out of fascism and are totally stuck in the new mind-raping consumerism. Well, while everybody drools over football or their new air conditioner, it seems nobody notices that we get disconnected from the power grid twice a day. For many reasons I have no backup power source, so I can say that, apart from the critical section, a power loss during the reshape process was not a problem for me. I've tried, twice.

Eventually the reshape finishes. In my experience it is not essential to unmount and unshare the RAID volume during these operations: when the server was powered up again after the outage, the RAID volumes were seamlessly mounted and NFS-shared. RAID support in current kernels is really robust!
Now the volume has grown, but the filesystem on top of it has not. Querying the filesystem size:

universo:~# df /mnt/md1
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/md1             976637952 822611892 154026060  85% /mnt/md1
	
shows that the filesystem size is unchanged, approximately 931 GiB. To enlarge the filesystem, an XFS in this case, we use the standard tools that come with the XFS utilities (xfsprogs). First we check the filesystem for errors and inconsistencies (remember to unmount it before checking; newer xfsprogs releases drop xfs_check in favour of the read-only xfs_repair -n):
universo:~# xfs_check /dev/md1
	
All's OK, so we grow the filesystem. The man page says that, paradoxically, the filesystem must be mounted in order to be extended:
universo:~# mount -t xfs /dev/md1 /mnt/md1 -o noatime,nodiratime
universo:~# xfs_growfs /mnt/md1
meta-data=/dev/md1               isize=256    agcount=32, agsize=7631008 blks
         =                       sectsz=4096  attr=0
data     =                       bsize=4096   blocks=244192256, imaxpct=25
         =                       sunit=32     swidth=160 blks, unwritten=0
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=32768, version=2
         =                       sectsz=4096  sunit=32 blks
realtime =none                   extsz=262144 blocks=0, rtextents=0
data blocks changed from 244192256 to 366289824
universo:~# df /mnt/md1
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/md1             1465028224 822612640 642415584  57% /mnt/md1
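
As a sanity check, the numbers above can be tied together with a bit of shell arithmetic. The two helper names are invented for this sketch; the figures come straight from the mdadm -D and xfs_growfs output:

```shell
# raid5_capacity_kib MEMBERS MEMBER_SIZE_KIB
# Usable RAID5 capacity: one member's worth of space goes to parity.
raid5_capacity_kib() {
    echo $(( ($1 - 1) * $2 ))
}

# xfs_size_kib DATA_BLOCKS BLOCK_SIZE_BYTES
# Size of the XFS data section in KiB.
xfs_size_kib() {
    echo $(( $1 * $2 / 1024 ))
}

raid5_capacity_kib 4 488386432    # array after the reshape
xfs_size_kib 366289824 4096       # filesystem after xfs_growfs
```

Both lines print 1465159296, so the filesystem now fills the whole array. df shows 131072 KiB less, which matches the size of the internal log (32768 blocks of 4 KiB).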
And that's all, folks!


Sources: