Simons Blog

Old RAID controllers may destroy your data

Posted on: 2018-04-06

This just happened to a colleague of mine. He had two 3 TB disks in a RAID-1 configuration, attached to an older Marvell RAID controller (though the manufacturer really doesn't matter). Suddenly Windows started complaining about being unable to access folders, and then the disk went away completely. Disk Management just showed it as "unformatted", yet both disks reported as OK. What happened?

The problem is that some older controllers don't know how to handle drives larger than 2 TB (2^32 sectors of 512 bytes each), and they deal with it in the worst way possible: a wrap-around. If the system accesses the sector exactly 2 TB from the start of the disk, it actually gets the first sector, the Master Boot Record. This happens not only when reading but also when writing. The worst part is that the problem doesn't show up immediately, only after you've copied around 2 TB worth of data to the disk. Once you hit that mark, important filesystem information gets overwritten and you suddenly lose your data :( First folders and files become unreadable, then the partition isn't recognized anymore.
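To make the wrap point concrete, here is a small arithmetic sketch. It assumes 512-byte sectors and a controller that only keeps the lower 32 bits of the sector number; both are assumptions derived from the sector number 4294967296 used in the tests further down, not something read out of a particular controller. The helper name wrapped_sector is made up for illustration.

```shell
#!/bin/sh
# Assumption: 512-byte sectors and a 32-bit sector counter in the controller.
SECTOR_SIZE=512
WRAP_SECTORS=4294967296   # 2^32

# Byte offset at which the wrap-around would begin.
wrap_bytes=$((SECTOR_SIZE * WRAP_SECTORS))
echo "wrap point in bytes: $wrap_bytes"

# Hypothetical helper: the sector such a controller would really access
# when asked for sector $1.
wrapped_sector() {
    echo $(( $1 % WRAP_SECTORS ))
}

wrapped_sector 4294967296   # first sector past the limit lands on sector 0
wrapped_sector 4294967306   # ten sectors later lands on sector 10
```

This is also why the limit is slightly above a "marketing" 2 TB: 2^32 sectors of 512 bytes are about 2.2 trillion bytes.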

If you have such a setup, there is an easy way to test this: if you've just bought the disks, try filling them all the way up with data you can afford to lose, for example by copying a large HD movie to the disk a few hundred times. If you have more than 2 TB on your disk and it is still there after a reboot, you're probably in the clear.
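On Linux, the fill test could be scripted roughly like this. It is only a sketch: the function names fill_with_copies and verify_copies, the fill_N.bin file names, and the paths in the usage comment are all made up for illustration. Only run it against a disk whose contents you can afford to lose.

```shell
#!/bin/sh
# fill_with_copies SOURCE_FILE TARGET_DIR COUNT
# Copies SOURCE_FILE into TARGET_DIR COUNT times (fill_1.bin, fill_2.bin, ...).
fill_with_copies() {
    src=$1; dir=$2; count=$3
    i=1
    while [ "$i" -le "$count" ]; do
        cp "$src" "$dir/fill_$i.bin" || return 1
        i=$((i + 1))
    done
    sync   # flush everything to the disk before rebooting
}

# verify_copies SOURCE_FILE TARGET_DIR
# Run this after the reboot: every copy must still match the source.
verify_copies() {
    src=$1; dir=$2
    for f in "$dir"/fill_*.bin; do
        cmp -s "$src" "$f" || { echo "MISMATCH: $f"; return 1; }
    done
    echo "all copies match"
}

# Example (hypothetical paths):
#   fill_with_copies /home/me/movie.mkv /mnt/raid 300
#   ... reboot ...
#   verify_copies /home/me/movie.mkv /mnt/raid
```

Comparing every copy against the source is slower than just looking at the folder, but it also catches files that are still listed while their contents have been damaged.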

To be 100% certain you have to examine the disk.

Linux

Open a terminal and become root (sudo -i), then enter the following commands, replacing /dev/sda with your RAID array. If you don't know which disk to test, run "cat /proc/partitions" to get a list:

dd if=/dev/sda of=/tmp/part1 bs=512 count=10
dd if=/dev/sda of=/tmp/part2 bs=512 count=10 skip=4294967296

If the second "dd" command prints a message like this, everything's OK: your disk is smaller than 2 TB (or you picked the wrong disk, so check again):

dd: ‘/dev/sda’: cannot skip: Invalid argument
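If you'd rather know up front which devices are big enough to be affected, the sizes in /proc/partitions can be filtered. A minimal sketch, assuming the usual layout of that file (a header line, a blank line, then major, minor, size in 1 KiB blocks, and name); the function name list_large_devices is made up for illustration:

```shell
#!/bin/sh
# list_large_devices [FILE]
# Prints every device or partition larger than 2^32 sectors of 512 bytes.
# /proc/partitions counts in 1 KiB blocks, so the threshold is 2^31 blocks.
list_large_devices() {
    awk '$3 ~ /^[0-9]+$/ && $3 + 0 > 2147483648 { print "/dev/" $4 }' \
        "${1:-/proc/partitions}"
}

list_large_devices
```

Anything that is not listed here is too small for the second dd command to succeed.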

If the second "dd" command succeeded, compare the two files with this command:

diff /tmp/part1 /tmp/part2

The diff should print out something like this:

Binary files /tmp/part1 and /tmp/part2 differ

If diff doesn't print anything, the files are identical and your system most likely suffers from the wrap-around (read below). Once you're done, remove the temporary files:

rm /tmp/part1 /tmp/part2
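The steps above can also be wrapped into one function. This is just a sketch of the same procedure: the name check_wrap and its exit codes are made up, it uses cmp instead of diff (cmp is meant for binary comparison), and the optional second argument only exists so the function can be tried out on a small test file.

```shell
#!/bin/sh
# check_wrap DEVICE [OFFSET_SECTORS]
# Compares the first 10 sectors of DEVICE with the 10 sectors starting at
# OFFSET_SECTORS (default 4294967296, i.e. 2^32).
# Returns 0 = contents differ, 1 = identical (possible wrap-around),
#         2 = could not read at the offset.
check_wrap() {
    dev=$1
    offset=${2:-4294967296}
    a=$(mktemp) || return 2
    b=$(mktemp) || { rm -f "$a"; return 2; }
    result=2
    if dd if="$dev" of="$a" bs=512 count=10 2>/dev/null &&
       dd if="$dev" of="$b" bs=512 count=10 skip="$offset" 2>/dev/null &&
       [ -s "$b" ]; then
        if cmp -s "$a" "$b"; then
            echo "IDENTICAL: possible wrap-around"
            result=1
        else
            echo "different: no wrap-around seen"
            result=0
        fi
    else
        echo "could not read at sector $offset (disk too small or wrong device?)"
    fi
    rm -f "$a" "$b"   # temporary files are always cleaned up
    return "$result"
}

# Example (as root, replace /dev/sda with your RAID array):
#   check_wrap /dev/sda
```

Both dd calls only read from the device, so running this does not modify the disk.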

Windows

On Windows we have to get some extra software. Download the HxD Hex-Editor (Freeware) from here: https://mh-nexus.de/en/downloads.php

Then click the button to open your disk. Use the disks labeled "Hard disk 1" or "Hard disk 2", NOT the drive (C: or similar). Open the disk in read-only mode. Take a screenshot of the first sector, then type in sector number 4294967296 and press Enter. Compare the contents of this sector with the one in your screenshot. If they are identical, your system most likely suffers from the wrap-around.

My system has this bug. What next?

First of all, make a backup. Seriously, even if you are far from the 2 TB mark, copy everything to a new, preferably external, disk. Don't postpone it; buy a USB disk immediately if you don't have one with enough capacity to spare.

Then, wipe the disks and decide what to do next:

My system has this bug and I've lost data. What next?

Well, that's bad. Really bad. Recovery is very hard and you will almost certainly not get 100% of your data back. I can recommend OnTrack EasyRecovery, but if you had very valuable data on the disks, you'd better hand them to a recovery specialist.