Subject: using mdadm and xfs_growfs to remove a disk from a RAID5 array and grow another RAID5 with that disk
Keywords: mdadm, raid 5, reshape array, resize filesystem, remove disk, nfs, xfs
Situation: first of all I'll list my system configuration:
Solution:
As a preamble, after having unmounted the NFS shares from the clients, log in to the server as root and shut down NFS sharing:
universo:~# /etc/init.d/nfs-kernel-server stop
Stopping NFS kernel daemon: mountd nfsd.
Unexporting directories for NFS kernel daemon....
Note that this is an unnecessary measure, as you can unshare single exports via exportfs -u <client>:<path-to-share>, but we take the blunt approach and shut down the whole thing. Then we check that we are clear:
universo:~# exportfs
This would list all exported dirs: no exports, so we are happy. We proceed by unmounting the volumes from the system:
universo:~# umount /dev/md0
universo:~# umount /dev/md1
OK, now we can go on to tear the disk off md0, but first let's peek at what we are dismantling:
universo:~# mdadm -D /dev/md0
/dev/md0:
Version : 00.90.03
Creation Time : Sat Aug 25 20:08:59 2007
Raid Level : raid5
Array Size : 976767872 (931.52 GiB 1000.21 GB)
Device Size : 488383936 (465.76 GiB 500.11 GB)
Raid Devices : 3
Total Devices : 3
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Thu Jun 26 16:08:20 2008
State : clean
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 64K
UUID : d6edefb0:96b08698:df9cb6b2:0b4a12a3
Events : 0.480
Number Major Minor RaidDevice State
0 8 1 0 active sync /dev/sda1
1 8 17 1 active sync /dev/sdb1
2 8 33 2 active sync /dev/sdc1
Seems quite a healthy array! Note that the -D option prints the details of an MD volume. Now we want to completely remove, say, sdc1 from the array, but before doing that we must tell the kernel not to use that device any more:
universo:~# mdadm -f /dev/md0 /dev/sdc1
mdadm: set /dev/sdc1 faulty in /dev/md0
universo:~# mdadm -D /dev/md0
/dev/md0:
[…]
State : clean, degraded
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 1
[…]
Number Major Minor RaidDevice State
0 8 1 0 active sync /dev/sda1
1 8 17 1 active sync /dev/sdb1
2 0 0 2 removed
3 8 33 - faulty spare /dev/sdc1
The -f option tells mdadm to mark the specified device as faulty. The device is still listed in the array, as a "faulty spare", until it is explicitly removed, but the kernel no longer uses it. The state of the array becomes degraded, as it no longer has its normal geometry: it has lost its redundancy, so if another drive became unusable now there would be data loss. It is still clean, because no data has been lost so far. sdc1 can now be removed from the array:
universo:~# mdadm -r /dev/md0 /dev/sdc1
mdadm: hot removed /dev/sdc1
universo:~# mdadm -D /dev/md0
/dev/md0:
[…]
State : clean, degraded
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
[…]
Number Major Minor RaidDevice State
0 8 1 0 active sync /dev/sda1
1 8 17 1 active sync /dev/sdb1
2 0 0 2 removed
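The checks done by eye on the mdadm -D output above can also be scripted. What follows is only a minimal sketch, assuming the "Name : value" layout shown in this post; md_field is a made-up helper name, not an mdadm feature, and the sample text is an excerpt of the output above.

```shell
#!/bin/sh
# md_field NAME: read `mdadm -D` output on stdin and print the value of the
# first "NAME : value" line. Hypothetical helper, not part of mdadm.
md_field() {
    sed -n "s/^[[:space:]]*$1[[:space:]]*:[[:space:]]*//p" | head -n 1
}

# Excerpt of the degraded md0 output shown above, used as sample input.
# On a live system you would pipe in `mdadm -D /dev/md0` instead.
detail='State : clean, degraded
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0'

state=$(printf '%s\n' "$detail" | md_field 'State')
active=$(printf '%s\n' "$detail" | md_field 'Active Devices')
spares=$(printf '%s\n' "$detail" | md_field 'Spare Devices')

echo "state=$state active=$active spares=$spares"

# Only report success if the array is degraded and holds no leftover spare.
case "$state" in
    *degraded*) [ "$spares" -eq 0 ] && echo "sdc1 is out of md0" ;;
esac
```

A check like this is handy as a guard in a script, so that the next step only runs once the disk has really left the array.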
Now we are ready to add the removed device to md1, but first we look at its state:
universo:~# mdadm -D /dev/md1
/dev/md1:
Version : 00.90.03
Creation Time : Fri May 23 23:09:06 2008
Raid Level : raid5
Array Size : 976772864 (931.52 GiB 1000.22 GB)
Device Size : 488386432 (465.76 GiB 500.11 GB)
Raid Devices : 3
Total Devices : 3
Preferred Minor : 1
Persistence : Superblock is persistent
Update Time : Thu Jun 26 16:08:24 2008
State : clean
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 128K
UUID : d8a342fa:f0143196:2e07fe24:7eba3cee (local to host universo)
Events : 0.34
Number Major Minor RaidDevice State
0 8 48 0 active sync /dev/sdd
1 8 64 1 active sync /dev/sde
2 8 80 2 active sync /dev/sdf
universo:~# mdadm --add /dev/md1 /dev/sdc
mdadm: added /dev/sdc
universo:~# mdadm -D /dev/md1
/dev/md1:
[…]
State : clean
Active Devices : 3
Working Devices : 4
Failed Devices : 0
Spare Devices : 1
[…]
Number Major Minor RaidDevice State
0 8 48 0 active sync /dev/sdd
1 8 64 1 active sync /dev/sde
2 8 80 2 active sync /dev/sdf
3 8 32 - spare /dev/sdc
Our newly added device has become a spare! If one of the active devices failed now, the spare would be integrated into the array automatically, restoring redundancy and thus the fault tolerance of the array. We want more space rather than a hot spare, though, so we grow the array to four active devices:
universo:~# mdadm --grow /dev/md1 --raid-devices=4
mdadm: Need to backup 768K of critical section..
mdadm: ... critical section passed.
universo:~# mdadm -D /dev/md1
/dev/md1:
Version : 00.91.03
Creation Time : Fri May 23 23:09:06 2008
Raid Level : raid5
Array Size : 976772864 (931.52 GiB 1000.22 GB)
Device Size : 488386432 (465.76 GiB 500.11 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 1
Persistence : Superblock is persistent
Update Time : Thu Jun 26 18:28:22 2008
State : clean, recovering
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 128K
Reshape Status : 17% complete
Delta Devices : 1, (3->4)
UUID : d8a342fa:f0143196:2e07fe24:7eba3cee (local to host universo)
Events : 0.54444
Number Major Minor RaidDevice State
0 8 48 0 active sync /dev/sdd
1 8 64 1 active sync /dev/sde
2 8 80 2 active sync /dev/sdf
3 8 32 3 active sync /dev/sdc
The critical section is the very first part of the reshape, where the stripes being rewritten overlap the ones still being read; mdadm backs that data up (768K here) before relocating it. If something goes wrong in the critical section, chances are that you won't see your data again. Once it has passed, the reshape goes on in the background and its progress is reported in /proc/mdstat:
universo:~# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [multipath] [faulty]
md1 : active raid5 sdc[3] sdd[0] sdf[2] sde[1]
976772864 blocks super 0.91 level 5, 128k chunk, algorithm 2 [4/4] [UUUU]
[===>.................] reshape = 18.0% (88114560/488386432) finish=775.9min speed=8595K/sec
md0 : active raid5 sda1[0] sdb1[1]
976767872 blocks level 5, 64k chunk, algorithm 2 [3/2] [UU_]
unused devices: <none>
It can be useful to keep an eye on the process to check its progress: watch cat /proc/mdstat.
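If you would rather log the progress than stare at it, the reshape line can be picked apart with sed. This is a sketch under the assumption that the line looks like the one shown above; reshape_progress is a made-up name, and the demo runs on a copy of that output rather than on the live /proc/mdstat.

```shell
#!/bin/sh
# reshape_progress [FILE]: print "<percent> <minutes left>" for the first
# reshape line in FILE (default /proc/mdstat); prints nothing if no reshape
# is running. Hypothetical helper name.
reshape_progress() {
    sed -n 's/.*reshape = *\([0-9.]*\)%.*finish=\([0-9.]*\)min.*/\1 \2/p' \
        "${1:-/proc/mdstat}" | head -n 1
}

# Demo on a copy of the status lines shown above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
md1 : active raid5 sdc[3] sdd[0] sdf[2] sde[1]
976772864 blocks super 0.91 level 5, 128k chunk, algorithm 2 [4/4] [UUUU]
[===>.................] reshape = 18.0% (88114560/488386432) finish=775.9min speed=8595K/sec
EOF

progress=$(reshape_progress "$sample")
rm -f "$sample"
echo "$progress"    # prints: 18.0 775.9
```

Called without an argument the function reads /proc/mdstat, so it could be run from cron and appended to a log file.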
Digression: I live in Rome, Italy, which is not a very civilized country; in many respects the Italian people never got out of fascism and are totally stuck in the new mind-raping consumerism. Well, while everybody drools over football or their new air conditioner, it seems nobody notices that we get disconnected from the power grid twice a day. For many reasons I don't have a backup power source, so I can say that, apart from the critical section, a power loss is not a problem during the reshape process. I've tried, twice.
Eventually the reshape will finish. In my experience it is not essential to unmount and unshare the RAID volume during these operations; in fact, when the server was powered up again after the failure, the RAID volumes were seamlessly mounted and shared over NFS. RAID support in current kernels is really robust!
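Since the reshape takes many hours, a script that continues with the filesystem work needs a way to tell when the array is idle again. Below is a rough sketch that polls /proc/mdstat; md_busy and wait_md_idle are made-up names, and the parsing assumes the layout shown above. mdadm also documents a --wait option that blocks until resync, recovery or reshape activity has finished; check the man page of your version before relying on it.

```shell
#!/bin/sh
# md_busy DEV [FILE]: exit 0 if DEV (e.g. md1) has a reshape, resync or
# recovery in progress according to FILE (default /proc/mdstat), else exit 1.
md_busy() {
    awk -v dev="$1" '
        $1 == dev && $2 == ":" { in_dev = 1; next }   # our array starts here
        /^md[0-9]+ :/          { in_dev = 0 }          # another array starts
        in_dev && /(reshape|resync|recovery) *=/ { busy = 1 }
        END { exit (busy ? 0 : 1) }
    ' "${2:-/proc/mdstat}"
}

# wait_md_idle DEV [SECONDS]: poll the live /proc/mdstat until DEV is idle.
wait_md_idle() {
    while md_busy "$1"; do sleep "${2:-60}"; done
}

# Demo on a copy of the /proc/mdstat output shown earlier.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Personalities : [raid6] [raid5] [raid4] [multipath] [faulty]
md1 : active raid5 sdc[3] sdd[0] sdf[2] sde[1]
976772864 blocks super 0.91 level 5, 128k chunk, algorithm 2 [4/4] [UUUU]
[===>.................] reshape = 18.0% (88114560/488386432) finish=775.9min speed=8595K/sec
md0 : active raid5 sda1[0] sdb1[1]
976767872 blocks level 5, 64k chunk, algorithm 2 [3/2] [UU_]
unused devices: <none>
EOF

md_busy md1 "$sample" && md1_state=busy || md1_state=idle
md_busy md0 "$sample" && md0_state=busy || md0_state=idle
rm -f "$sample"
echo "md1: $md1_state, md0: $md0_state"    # prints: md1: busy, md0: idle
```

On the server itself, wait_md_idle md1 would simply return once the reshape line has disappeared from /proc/mdstat.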
Now the volume has grown, but the filesystem on top of it has not. Querying the filesystem size:
universo:~# df /mnt/md1
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/md1 976637952 822611892 154026060 85% /mnt/md1
shows that the FS size is unchanged, approximately 931 GiB. To enlarge the filesystem (XFS in this case) we use the standard XFS tools. First we check the FS for errors and inconsistencies (remember to unmount the filesystem before checking!):
universo:~# xfs_check /dev/md1
All is OK, so we grow the FS. The man page shows that, paradoxically, the FS must be mounted in order to be extended:
universo:~# mount -t xfs /dev/md1 /mnt/md1 -o noatime,nodiratime
universo:~# xfs_growfs /mnt/md1
meta-data=/dev/md1 isize=256 agcount=32, agsize=7631008 blks
= sectsz=4096 attr=0
data = bsize=4096 blocks=244192256, imaxpct=25
= sunit=32 swidth=160 blks, unwritten=0
naming =version 2 bsize=4096
log =internal bsize=4096 blocks=32768, version=2
= sectsz=4096 sunit=32 blks
realtime =none extsz=262144 blocks=0, rtextents=0
data blocks changed from 244192256 to 366289824
universo:~# df /mnt/md1
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/md1 1465028224 822612640 642415584 57% /mnt/md1
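As a sanity check, the numbers above can be verified with a bit of shell arithmetic: a RAID5 of N devices offers N-1 devices' worth of space. This is just a sketch using the figures printed in this post; raid5_usable_kib is a made-up helper name.

```shell
#!/bin/sh
# raid5_usable_kib N DEVICE_KIB: usable size in KiB of an N-device RAID5,
# where one device's worth of space goes to parity. Hypothetical helper.
raid5_usable_kib() {
    echo $(( ($1 - 1) * $2 ))
}

device_kib=488386432                        # "Device Size" from mdadm -D /dev/md1
before=$(raid5_usable_kib 3 "$device_kib")  # 976772864, the old "Array Size"
after=$(raid5_usable_kib 4 "$device_kib")   # 1465159296

# xfs_growfs counts 4096-byte blocks; convert them to KiB.
xfs_kib=$(( 366289824 * 4 ))                # 1465159296
log_kib=$(( 32768 * 4 ))                    # internal log: 131072

echo "array: $before -> $after KiB"
echo "xfs data: $xfs_kib KiB, minus log: $(( xfs_kib - log_kib )) KiB"
```

The new data block count fills the four-disk array exactly, and in this output the df total (1465028224) equals the data blocks minus the internal log.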
And that's all, folks!