Fixing a broken mdadm array – failed drive

Sometimes, even with the best intentions, things can go wrong with a RAID array. A drive may fail, or the array may become ‘dirty’ for any number of reasons.

Here, we will go through some simple steps to repair a damaged array.

In our example case, a drive has failed. By running the following command in a terminal, we can get a status update on our array:

sudo mdadm --detail /dev/md0 # Displays detail about /dev/md0

The output:

[Screenshot: failed.png]

You can see the state is listed as “clean, degraded”. This means a drive is missing from the array. Also note that device 1 has been “removed”.
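If you would rather check the state from a script than read it by eye, the sketch below pulls the "State :" line out of the mdadm --detail output. The function names array_state and is_degraded are made up for this example, and it assumes the output uses a "State : ..." line like the one described above.

```shell
# Print the value of the "State :" line from `mdadm --detail` output on stdin.
# The device table header also contains the word "State", but it has no
# "State :" prefix at the start of the line, so it is not matched.
array_state() {
    sed -n -e 's/[[:space:]]*$//' \
           -e 's/^[[:space:]]*State[[:space:]]*:[[:space:]]*//p' | head -n 1
}

# Return success (0) if the state read from stdin includes "degraded".
is_degraded() {
    array_state | grep -q 'degraded'
}

# Usage against a real array (needs mdadm and root privileges):
#   sudo mdadm --detail /dev/md0 | array_state
#   if sudo mdadm --detail /dev/md0 | is_degraded; then echo "md0 is degraded"; fi
```

Both functions only read text from stdin, so they can be tried on saved output before being pointed at a live array.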

Before we do anything, we need to unmount our array (in this case, /dev/md0):

sudo umount /dev/md0 # Unmounts /dev/md0

If you receive a ‘device is busy’ warning, you can find out which process is using the array with the following commands:

fuser -m /dev/md0 # Shows which process number is using /dev/md0
/dev/md0: 538
ps auxw | grep 538 # Shows what process number 538 refers to
damian 538 0.4 2.7 219212 56792 ? SLl Feb11 11:25 rhythmbox

So in this case, it is rhythmbox that is using the drive. Close it and run umount again. If Samba is using the array instead, stop it with the following command, then run umount:

sudo /etc/init.d/samba stop # Stops the Samba process
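To confirm the array really is unmounted before going further, you can look for it in the kernel's mount table. This is a minimal sketch: is_mounted is a name invented for this example, and it assumes the array shows up in /proc/mounts under the same device path you pass in (here /dev/md0).

```shell
# Return success (0) if the device given as $1 is listed as a mount source.
# $2 is the mount table to read; it defaults to /proc/mounts.
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Usage:
#   if is_mounted /dev/md0; then
#       echo "/dev/md0 is still mounted"
#   else
#       echo "/dev/md0 is not mounted"
#   fi
```

The optional second argument is only there so the check can be tried against a saved copy of the mount table.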

Failed Drive has been re-added

In this instance, we can try to re-add the lost device. I believe restarting your computer is a good first step. After the reboot, open a terminal and run the following commands (they use sudo, so there is no need to become root with su first):

sudo mdadm --detail /dev/md0   # Just to check nothing has changed
sudo mdadm --add /dev/md0 /dev/sdc1  # To re-add the faulty (now working) HDD

You should receive the message mdadm: re-added /dev/sdc1. If so, run the following:

sudo mdadm --detail /dev/md0   # To ensure the drive has been re-added successfully

The output should look like this:

[Screenshot: rebuilding.png]

Notice “clean, degraded, recovering”. This is a good sign, as is “spare rebuilding”. These messages mean that the array is rebuilding successfully (so far).

To monitor the rebuild, run the following command:

sudo watch cat /proc/mdstat

This command will display the status of mdadm and refresh it every 2 seconds. When you are done watching, you can press CTRL+C to return to the command line, or you can simply close the terminal window.

[Screenshot: rebuilding2.png]
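If you only want the progress figure rather than the whole status screen, something like the sketch below can extract it. The name rebuild_percent is invented for this example, and it assumes the rebuild line in /proc/mdstat has the usual "recovery = NN.N%" (or "resync = NN.N%") form.

```shell
# Print the rebuild percentage (for example "12.6") found in mdstat-format
# text on stdin. Prints nothing if no recovery or resync line is present.
rebuild_percent() {
    sed -nE 's/.*(recovery|resync)[[:space:]]*=[[:space:]]*([0-9.]+)%.*/\2/p' | head -n 1
}

# Usage:
#   rebuild_percent < /proc/mdstat
#
# To simply block until the rebuild has finished, mdadm also has a wait mode:
#   sudo mdadm --wait /dev/md0
```

An empty result means no rebuild is in progress, which is what you would expect to see once the array is back to normal.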

Once the rebuild has finished, running sudo mdadm --detail /dev/md0 again should show the array as “clean”.

MDADM can be tricky – Jaytag can help. Give us a call on 0845 310 2750 about software and hardware RAID arrays, and we can give advice about choosing the right option for you.

Damian