How To Resize RAID Partitions (Grow) (Software RAID)

This article describes how you can grow existing software RAID partitions. I have tested this with non-LVM RAID1 partitions that use ext3 as the file system. I will describe this procedure for an intact RAID array.

1. Preliminary Note

The goal of this exercise was to upgrade the drives in the RAID1 array on the file server without having to move files or reinstall the operating system. Essentially, I wanted to swap the drives and grow the file system.

The server originally had two 500 GB SATA drives, making up two RAID arrays: /dev/md0 (operating system) and /dev/md1 (/home).

[root@waltham ~]# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdb3[1] sda3[0]
1931004864 blocks [2/2] [UU]

md0 : active raid1 sdb1[1] sda1[0]
20474688 blocks [2/2] [UU]

In summary, I took out the primary 500 GB drive and cloned it onto two 2 TB drives. I cloned the primary drive because the boot sector is only written to the primary drive; that way, both clones would have a copy of the boot sector in case that part of either disk is ever corrupted.

In a typical software RAID install, only the primary drive gets a copy of the boot sector. I learned this the hard way.

Once both drives were cloned with Clonezilla, I took out the old drives, put in the two new cloned drives, and booted the system. The detailed steps follow.

When I rebooted the system with the two new 2 TB drives, it recognized that the drives were members of an array, but it would not re-establish the mirror, as you can see:

[root@waltham ~]# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sda3[0]
1931004864 blocks [1/2] [_U]

md0 : active raid1 sda1[0]
20474688 blocks [1/2] [_U]

The primary disk came online, but the other one did not.
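
Spotting this state does not require reading the whole file: in /proc/mdstat, each array's member-status string (such as [UU]) shows an underscore for every missing member. A minimal sketch of a check, assuming the standard /proc/mdstat layout; degraded_arrays is a hypothetical helper name, and the optional file argument makes it easy to try on saved output:

```shell
#!/usr/bin/env bash
# degraded_arrays [FILE]: print the name of every md array whose
# member-status string (e.g. "[UU]") contains "_", i.e. a missing member.
# FILE defaults to /proc/mdstat. Hypothetical helper, not part of mdadm.
degraded_arrays() {
  awk '
    # Array header line, e.g. "md1 : active raid1 sda3[0]": remember the name.
    /^md[0-9]+ *:/ { name = $1; next }
    # First status string after a header, e.g. "[_U]"; "[2/2]" does not match.
    name != "" && match($0, /\[[U_]+]/) {
      if (index(substr($0, RSTART, RLENGTH), "_")) print name
      name = ""
    }
  ' "${1:-/proc/mdstat}"
}
```

Run against the output shown above, it would print md1 and then md0.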

Knowing that the data was intact, since one of the drives had booted fine, I ran fdisk on the drive that did not come up (/dev/sdb).

With fdisk, I deleted the existing 490 GB partition sdb3 and re-created it using the maximum available space, so the new partition was almost 2 TB. I then added the partitions to their respective arrays:

mdadm /dev/md0 --add /dev/sdb1
mdadm /dev/md1 --add /dev/sdb3

/dev/sdb2 and /dev/sda2 are swap partitions.

Once this was done, the arrays started to rebuild. You can follow the progress with the following command:

cat /proc/mdstat
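
Rather than re-running that command by hand, you can poll it until the rebuild finishes. This sketch assumes the usual "recovery = NN%" and "resync = NN%" progress lines in /proc/mdstat; rebuild_in_progress and wait_for_rebuild are hypothetical helpers. mdadm's own --wait option is a built-in alternative.

```shell
#!/usr/bin/env bash
# rebuild_in_progress [FILE]: succeed while any array reports a resync or
# recovery progress line. FILE defaults to /proc/mdstat.
rebuild_in_progress() {
  grep -Eq '(resync|recovery) *=' "${1:-/proc/mdstat}"
}

# wait_for_rebuild [FILE] [SECONDS]: print the progress lines every SECONDS
# (default 60) and return once no rebuild is reported any more.
wait_for_rebuild() {
  local file=${1:-/proc/mdstat} interval=${2:-60}
  while rebuild_in_progress "$file"; do
    grep -E '(resync|recovery) *=' "$file"
    sleep "$interval"
  done
}
```

For example, wait_for_rebuild /proc/mdstat 300 prints the progress every five minutes and returns once both arrays are back in sync.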

Once the mirroring completed, I removed /dev/sda3 from the array and ran fdisk on /dev/sda to resize /dev/sda3 to the full size of the disk.

After that, you need to re-add the partition to the array so that mirroring starts again onto the new, bigger partition. Since /dev/md1 is still defined at 500 GB, the following steps are needed before the extra space can be used.
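
The remove, repartition and re-add cycle described above might look like this for /dev/sda3. This is a sketch, not the exact commands from this upgrade: the run wrapper and replace_member function are hypothetical, and by default (DRY_RUN=1) they only print the commands. After fdisk rewrites the table of a disk whose other partitions are in use, the kernel may keep using the old table until you run partprobe or reboot.

```shell
#!/usr/bin/env bash
# run CMD...: print CMD when DRY_RUN=1 (the default), otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# replace_member MD DISK PART (hypothetical helper): drop PART out of MD,
# repartition DISK interactively, then add PART back so the mirror is
# rebuilt onto the larger partition. Stops at the first failing step.
replace_member() {
  local md=$1 disk=$2 part=$3
  run mdadm "$md" --fail "$part" --remove "$part" || return 1
  run fdisk "$disk" || return 1   # delete and re-create PART at full size
  run mdadm "$md" --add "$part"
}
```

With the default DRY_RUN=1, replace_member /dev/md1 /dev/sda /dev/sda3 only prints the three commands; set DRY_RUN=0 to execute them after checking each one. Marking the member failed before removing it matters because mdadm will not remove a member that is still active.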

2. Intact Array

I will describe how to resize the array /dev/md1, made up of /dev/sda3 and /dev/sdb3.

2.1 Growing An Intact Array

Boot into single-user mode. When the GRUB menu comes up, press ‘e’ to edit the boot entry, select the kernel line, press ‘e’ again, append the word ‘single’ to the end of that line, press Enter, and then press ‘b’ to continue booting. At the root (#) prompt, unmount the file system on the array you want to grow:

umount /home

Then activate your RAID arrays:

cp /etc/mdadm/mdadm.conf /etc/mdadm/mdadm.conf_orig
mdadm --examine --scan >> /etc/mdadm/mdadm.conf

mdadm -A --scan

Now we can grow /dev/md1 as follows:

mdadm --grow /dev/md1 --size=max

--size=max means the largest possible value. You can also specify an explicit size in kibibytes (KiB).
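
Both /proc/mdstat and mdadm's --size option count in 1 KiB (1024-byte) blocks, while df -H reports powers of 1000. A quick conversion, sketched with hypothetical helper names, shows that the numbers in this article line up: md1's 1931004864 blocks are about 1977 GB, consistent with the 2.0T that df -H shows for /home, and md0's 20474688 blocks are about 21 GB.

```shell
#!/usr/bin/env bash
# kib_to_bytes N: convert a count of 1 KiB blocks to bytes.
kib_to_bytes() {
  echo $(( $1 * 1024 ))
}

# kib_to_gb N: convert 1 KiB blocks to decimal gigabytes (10^9 bytes),
# rounded to the nearest whole number, i.e. the units used by "df -H".
kib_to_gb() {
  echo $(( ($1 * 1024 + 500000000) / 1000000000 ))
}
```

For example, kib_to_bytes 1931004864 prints 1977348980736, kib_to_gb 1931004864 prints 1977, and kib_to_gb 20474688 prints 21.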

Then we run a file system check…

e2fsck -f /dev/md1

…, resize the file system…

resize2fs /dev/md1

… and check the file system again:

e2fsck -f /dev/md1
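
Put together, the grow-and-resize steps can be scripted. This is a sketch that assumes the array is /dev/md1 mounted on /home, as in this article; run, fsck_ok and grow_md are hypothetical helpers, and by default (DRY_RUN=1) nothing is executed. e2fsck exits with 0 when the file system is clean and 1 when it corrected errors, so anything higher is treated as a failure. The first check is not optional: resize2fs normally refuses to resize a file system that has not just been checked with e2fsck -f.

```shell
#!/usr/bin/env bash
# run CMD...: print CMD when DRY_RUN=1 (the default), otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# fsck_ok DEV: force a check; exit codes 0 (clean) and 1 (errors fixed) pass.
fsck_ok() {
  local rc=0
  run e2fsck -f "$1" || rc=$?
  [ "$rc" -le 1 ]
}

# grow_md MD MOUNTPOINT: unmount, grow the array to the full partition size,
# then check, resize and re-check the ext2/3/4 file system on it.
grow_md() {
  local md=$1 mnt=$2
  run umount "$mnt" || return 1
  run mdadm --grow "$md" --size=max || return 1
  fsck_ok "$md" || return 1
  run resize2fs "$md" || return 1
  fsck_ok "$md"
}
```

With DRY_RUN=1, grow_md /dev/md1 /home prints the five commands in the same order as above; afterwards, boot back into the normal system as described next.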

Afterwards, boot back into your normal system. df should now show the file system at the full size of the grown array:

[root@waltham ~]# df -H
Filesystem Size Used Avail Use% Mounted on
/dev/md0 21G 5.6G 14G 29%  /
tmpfs 4.1G 0 4.1G 0% /dev/shm
/dev/md1 2.0T 259G 1.6T 15% /home

A likely problem in any multi-tenant network.

Dynamic Host Configuration Protocol (DHCP) is a network protocol that lets a server automatically assign a unique IP address to each computer on the local network, drawn from a defined range of addresses (a scope) configured for that network. A local area network should generally have one, and only one, DHCP server.

For example, when a computer starts up on a local area network, the router, which typically acts as the DHCP server, gives it a unique IP address so it can reach other network resources and the internet. If you introduce a second DHCP server on the network, you wreak havoc on every computer trying to get an IP address. With multiple DHCP servers, computers end up with addresses from different, unrelated subnets: some get a 192.168.1.X address, others a 192.168.2.X, others a 10.1.10.X, and so on. Each machine typically takes the address offered by whichever DHCP server responds first. However, there is still only one real gateway, and computers that land on a different subnet cannot reach it. Since all traffic to the internet passes through that gateway, those computers are effectively offline.
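
Whether a computer can use the gateway comes down to a subnet comparison: its address and the gateway's address must match in the network bits. A minimal sketch of that check; the gateway 10.1.10.1 and the /24 prefix used in the example are assumptions for illustration, since the story below only says the primary network was 10.1.10.X, and the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# ip_to_int A.B.C.D: print a dotted-quad IPv4 address as a 32-bit integer.
ip_to_int() {
  local IFS=.
  set -- $1
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# same_subnet ADDR GATEWAY PREFIX: succeed if ADDR and GATEWAY share the
# first PREFIX bits, i.e. ADDR can reach GATEWAY directly.
same_subnet() {
  local mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$2") & mask )) ]
}
```

With those assumptions, same_subnet 10.1.10.57 10.1.10.1 24 succeeds, while same_subnet 192.168.2.33 10.1.10.1 24 fails, which is exactly the situation of a computer that accepted an offer from a rogue server.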

We have a residential client that provides shared internet access to each tenant in a multi-tenant facility. Two tenants who were moving in decided to add their own routers to the network to provide wireless internet access for all the computers in their units. The problem was that they connected the wrong interface of their routers (a LAN port rather than the WAN/internet port) to the building network. This put multiple DHCP servers on the same network. So when residents many floors away tried to access the internet, they were greeted with a “page not found” error, only because a rogue DHCP server had assigned them an IP address outside the subnet of the building's gateway.

We were notified that the internet was down; however, our monitoring software showed that the internet connection was up, and our monitoring servers would have notified us of the slightest outage. Further investigation revealed that even when we unplugged the main network switch from the internet router, we were still being assigned an IP address. That should never happen, since the router is the only legitimate DHCP server. Voila! A rogue DHCP server! Now we just had to identify which of the 50 units held the rogue router. We isolated one of the routers by trial and error, unplugging various connections until ping responses from the culprit IP stopped. However, further diagnosis turned up a second rogue router. So now we had one DHCP router on 192.168.0.X and another on 192.168.2.X, while the primary network was on 10.1.10.X!
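
Unplugging the uplink and checking whether you still get an address is one test. Another, assuming the diagnostic machine uses the ISC dhclient, is to look at which servers actually handed out its leases: each lease in a dhclient lease file records the server in an "option dhcp-server-identifier" line. The lease-file path varies by distribution, and rogue_dhcp_servers and the expected server address are hypothetical.

```shell
#!/usr/bin/env bash
# rogue_dhcp_servers LEASEFILE EXPECTED: print each DHCP server identifier in
# an ISC dhclient lease file that differs from EXPECTED, once per server.
rogue_dhcp_servers() {
  awk -v ok="$2" '
    $1 == "option" && $2 == "dhcp-server-identifier" {
      sub(/;$/, "", $3)   # strip the trailing semicolon
      if ($3 != ok && !seen[$3]++) print $3
    }
  ' "$1"
}
```

Each rogue server the machine received a lease from is listed once, so a second rogue router shows up as a second address in the output.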

At this point, we configured a diagnostic machine on the building LAN with IP aliases on each subnet. We then opened the configuration page of the router at 192.168.0.1 and disabled its DHCP server and Wi-Fi access. We did the same for the router on 192.168.2.X and, “presto chango”, we finally had access to the true 10.1.10.X subnet. We had each tenant reboot their computers and access points, and their internet connectivity was restored.
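
The IP-alias trick works because a single Linux interface can hold several addresses, one on each subnet, which makes every router's configuration page reachable from one machine. A sketch using the iproute2 ip command that prints the commands rather than running them; the interface name eth0, the .250 host addresses, and the alias_cmds helper are hypothetical, and the addresses must not clash with ones already in use.

```shell
#!/usr/bin/env bash
# alias_cmds DEV ADDR/PREFIX...: print one "ip addr add" command per address,
# so DEV gets an extra address on each subnet being diagnosed.
alias_cmds() {
  local dev=$1 addr
  shift
  for addr in "$@"; do
    printf 'ip addr add %s dev %s\n' "$addr" "$dev"
  done
}
```

For example, alias_cmds eth0 192.168.0.250/24 192.168.2.250/24 prints two commands that can be reviewed and then run as root; after that, the router at 192.168.0.1 is reachable from the diagnostic machine. The matching ip addr del commands remove the aliases when you are done.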

We then notified all the tenants that connectivity had been restored for everyone except the two tenants with the rogue routers. Once those two tenants contacted us, we reconfigured their routers, re-enabled their Wi-Fi, and saved the day.

The beauty of all this was that we managed to do it from 60 miles away. Other than the initial plugging and unplugging of data jacks, we did the rest of the diagnosis remotely.