Catastrophic Disk Failure... it can happen to you too.

Posted by: rjstaines on 26 June 2016

The term 'Catastrophic disk failure' is used in the IT industry to denote the unannounced failure of a hard drive... it's the Operations Manager's nightmare stuff.

But modern disk technology provides us with warnings that something is about to go wrong, giving us the opportunity to acquire a new, replacement disk... usually.

But not always.  Last Friday at around lunchtime my Netgear ReadyNAS Duo suddenly and without any warning announced that one of its two disks had been 'disconnected'.  It was now running in unprotected mode - RAID protection was no longer operative.  One of the disks had simply failed totally, no warning signs, no error reports... it just stopped working and the NAS drive had disconnected it from its RAID array.

Panic would be the usual reaction... almost my entire music collection now spinning on just the one disk, failure of which would cause the loss of everything.

It cannot be denied that this is a scary situation.

Amazon Prime to the rescue... delivery of a replacement hard drive by 09.30 Saturday, installation of the new disk by 09.45 and sit back all day and watch the RAID array being slowly re-built (completing fourteen hours later).  So as of this beautiful, sunny Sunday morning, my music collection is once again safe.

 

What did I learn?

Well there's not a lot you can learn other than the unexpected can and does still happen, even in these days of advanced hard drive technology.

The failed disk was a Seagate - installed in May 2012, so four years plus of service.  The replacement is a WD Red, matching the remaining disk - probably worth paying that bit extra for a 'server rated' disk.   Four years' service from a hard drive seems to be about right... in my computer-maintenance business I always advise customers who have had their PCs for over three years to think carefully about the consequences of a disk failure; you'd be surprised how many folks are not aware they would lose all their photos and other documents because they never thought of this thing called backup!

...but most importantly... while I was sitting there for the 24 hours waiting for Amazon to deliver my replacement disk, the consequences of the second disk failing were constantly at the forefront of my mind. The loss of thousands of albums... unthinkable, catastrophic!   And how many of us would be in a similar situation?  Hair turning grey overnight would become a reality (if I had any).

BUT I'm an IT guy (albeit retired), so you wouldn't expect me to be satisfied with a single level of backup, and I'm not. My ReadyNAS drive is copied each week to a second NAS drive which is located in my garage, away from the house, connected by underground CAT6 cable.  Different folders (shares) are copied each day, so over a week, everything is replicated. Those shares that get updated often (HD albums for example), are copied daily, my CD collection twice weekly, my MP3s weekly.

So having in place a robust, two layer backup strategy enabled me to sleep peacefully last night 
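As an illustration only, the share rotation described above could be sketched in Python along these lines. The share names and the particular weekdays are assumptions made for the example; the post gives only the frequencies (daily, twice weekly, weekly), and this is not the ReadyNAS's own backup job format.

```python
from datetime import date

# Hypothetical share names and weekday choices; the post only states the
# frequencies. Weekdays follow Python's convention: Monday is 0, Sunday is 6.
SCHEDULE = {
    "hd_albums": {0, 1, 2, 3, 4, 5, 6},  # updated often, so copied daily
    "cd_collection": {1, 4},             # twice weekly (Tuesday, Friday assumed)
    "mp3": {6},                          # weekly (Sunday assumed)
}


def shares_due(day):
    """Return the sorted names of the shares scheduled for copying on `day`."""
    return sorted(
        name for name, weekdays in SCHEDULE.items() if day.weekday() in weekdays
    )


def every_share_covered():
    """Check that each share is copied at least once in any given week."""
    return all(len(weekdays) > 0 for weekdays in SCHEDULE.values())


if __name__ == "__main__":
    print(shares_due(date.today()))
```

A table like this makes it easy to confirm that nothing has been left out of the weekly cycle before trusting it.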

But there's a sting in the tail...

Forty years in IT has taught me that you can never plan for every contingency... even the most 'robust' of backup strategies have flaws... and the sting?  You don't discover the flaws until a real disaster happens.   Complacent I'm not... optimistic....always! 

Don't forget, if a disk is involved, the worst can and sooner or later will happen to you too  

Roger

 

Posted on: 26 June 2016 by Bert Schurink

I have already experienced loss of files in total. And it's frustrating especially when you have a large collection, and can't find out what you lost. So double security is definitely required. Good call out with your post...

Posted on: 26 June 2016 by DWO-Naim

+1 - good call on the reminder to have robust back up strategies. Especially if you have a large collection of music and other digital content.

Roger - You have probably considered this already, but is your garage back up raised some distance off the floor (protection against flooding) and on a different (protected?) mains supply (protection against lightning strike/unforeseen power surges)? As you say you never know until a real disaster strikes. Some people I know have their second layer at a more distant location either connected via WAN/Internet or physically move drives.

Posted on: 26 June 2016 by blythe

I have three back-up NAS drives in my house, as well as, like you, one in my garage. 
I also keep another NAS drive in a different house, as well as making occasional manual back-ups to USB hard drive. These last two, in the worst case, might be 6 months out of date but at least that's only likely to be a relatively small number of CDs to re-rip....

Overkill maybe, but I don't fancy having to rip well over 1000 CDs ever again.....

Posted on: 26 June 2016 by DavidDever

(re-edited post)

If you have an older consumer-grade commodity NAS, you will find that it becomes quite difficult to keep the unit up-to-date with the pace of Linux kernel & security updates, as well as vendor-specific updates, necessary to keep the unit in a secure, fully-functional state.

You may also find, as I did, that a full-sized desktop chassis in another room (running an off-the-shelf OS) might end up being more fundamentally reliable, especially if you're using something along the lines of Asset UPnP or MinimServer.

If space precludes a desktop computer as storage device, both Synology as well as QNAP seem to be reasonable vendors with relatively frequent support updates. For these devices, SSDs are definitely a fair option, with prices on a per-gigabyte basis now (here in the U.S., anyway) hovering between $0.25-0.33 – as always, your mileage may vary.
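For a rough sense of what that per-gigabyte range means for a whole drive, here is a minimal arithmetic sketch in Python. The function name and the 1000 GB example capacity are assumptions for illustration; only the $0.25-0.33 per gigabyte range comes from the post.

```python
def ssd_cost_range(capacity_gb, low_per_gb=0.25, high_per_gb=0.33):
    """Return the (low, high) price estimate in US dollars for an SSD.

    The default per-gigabyte prices are the mid-2016 U.S. figures quoted in
    the post; capacity_gb is whatever drive size you are considering.
    """
    return (round(capacity_gb * low_per_gb, 2), round(capacity_gb * high_per_gb, 2))


if __name__ == "__main__":
    low, high = ssd_cost_range(1000)  # hypothetical 1 TB drive
    print(f"1 TB SSD: roughly ${low:.0f} to ${high:.0f}")
```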

Posted on: 26 June 2016 by rjstaines
DWO-Naim posted:

+1 - good call on the reminder to have robust back up strategies. Especially if you have a large collection of music and other digital content.

Roger - You have probably considered this already, but is your garage back up raised some distance off the floor (protection against flooding) and on a different (protected?) mains supply (protection against lightning strike/unforeseen power surges)? As you say you never know until a real disaster strikes. Some people I know have their second layer at a more distant location either connected via WAN/Internet or physically move drives.

Quite right, DWO- the garage has its own power supply and the NAS is mounted on a protected shelf half way up the wall, out of reach of grandchildren and cats. This part of the garage is actually used as an outside office, complete with heating, so the NAS environment is very 'friendly'.  And of course, having a network feed enables not only the NAS to run but also a separate PC that SWMBO uses for her tennis club activities.

I've never been a fan of physically moving devices offsite as other respondents have mentioned... maybe back to my days working in Virgin when we had to schedule a guy or girl out for a few hours, transporting the backup device to its 'safe' location... the need for care during transport and the need for testing the re-connection of the returned device... altogether too many opportunities for things to go wrong.  For me personally I'm a lifetime fan of 'set it and forget it' (oh and have it monitor its own health, of course).

I thought twice about writing this, DWO- but here where I live in the Wirral, stuck between Liverpool and Wales, we've never been flooded, so that hasn't been a consideration.  I realise though, looking out on the rain this afternoon, that I should probably now spend the remainder of the day with my fingers crossed

Posted on: 26 June 2016 by Adam Zielinski
blythe posted:

I have three back-up NAS drives in my house, as well as, like you, one in my garage. 
I also keep another NAS drive in a different house, as well as making occasional manual back-ups to USB hard drive. These last two, in the worst case, might be 6 months out of date but at least that's only likely to be a relatively small number of CDs to re-rip....

Overkill maybe, but I don't fancy having to rip well over 1000 CDs ever again.....

You are not alone in this - I do the same

Posted on: 26 June 2016 by Brubacca

I have 2 ReadyNAS Duo machines. Each is mirrored and #1 backs up to #2.

Posted on: 27 June 2016 by Claus-Thoegersen

Yesterday I played a 24-bit hi-res album from my NAS. Suddenly I got a few dropouts from time to time, and my ns01 stopped playing tracks from the playlist. Strange, since I do not have network issues and the NAS is connected to the ns01 with cables through a Netgear 105 switch. I restarted my ReadyNAS, and from there the disk started to act up and is now reported as not working by the RAIDar utility. I will have to check before I send it back, and it will take days, maybe a week, until this is fixed. This is the second time in 4 years that a disk has failed, and this one is a WD Red disk that should be more reliable in a NAS. But I have a Naim server with an internal hard disk, so I am only missing my hi-res albums and other downloads. If I want to listen I can just copy the albums from my backup disks to a USB stick. I am really glad I did not go for the SSD version of the server, that would have caused much more frustration.

Claus  

Posted on: 27 June 2016 by Mike-B

Hi Claus,  can I ask what the HDD make is.  I ask because I helped someone with his ReadyNAS with the same problem a few months ago & they were Seagate,  & I understand ReadyNAS fit Seagate as standard.  We looked up some reports & found Seagate to have reliability problems,  but to add confusion the latest version of that same report shows Seagate to now be "much improved".      The most popular HDD on the forum seems to be WD "Red",  but I now see WD have reliability issues according to the same report.     I also note the most reliable, HGST, is now owned by WD Co.     What to do ???  I just hope I don't face a HDD failure !!!!   Might go SSD if I could find some reliability data        

Posted on: 27 June 2016 by Huge

Mike, most top-line manufacturers' MLC SSDs have now exceeded the statistical reliability of HDDs when wear leveling is allowed (i.e. probably not Melco!); the change-over came about four years ago.  High reliability, industrial rated SLC SSDs have been more reliable than HDDs for quite a lot longer than that and are currently the most reliable R/W bulk storage medium that is reasonably easily available (but at considerable cost).

The other point is that to maximise the reliability of SSDs you need to keep them as cool as possible (but above 0°C), whereas for HDDs the temperature needs to be allowed to rise a little (optimal is typically somewhere about 30°C to 35°C).

Posted on: 27 June 2016 by Claus-Thoegersen
Mike-B posted:

Hi Claus,  can I ask what the HDD make is.  I ask because I helped someone with his ReadyNAS with the same problem a few months ago & they were Seagate,  & I understand ReadyNAS fit Seagate as standard.  We looked up some reports & found Seagate to have reliability problems,  but to add confusion the latest version of that same report shows Seagate to now be "much improved".      The most popular HDD on the forum seems to be WD "Red",  but I now see WD have reliability issues according to the same report.     I also note the most reliable, HGST, is now owned by WD Co.     What to do ???  I just hope I don't face a HDD failure !!!!   Might go SSD if I could find some reliability data        

WD Red 3 TB. I cannot remember what my previous disk was, but at least that disk lived for 3 years. It is probably a good idea to get another disk and mirror the disks, just to avoid this situation again. I still have music, but it is annoying and takes too much time to get the NAS up and running again.

I am not going to invest in SSDs, they are still too expensive.

Claus

Posted on: 27 June 2016 by Brubacca

All mechanical hard drives will fail at some point.  As others have pointed out RAID is not the only solution.  It is part of the solution.  RAID and a backup are the best way to future-proof your data.  Another recommendation is an off-site backup.  That way if catastrophe strikes your residence the data is elsewhere.

Posted on: 27 June 2016 by Mike-B
Huge posted:

Mike, most top-line manufacturers' MLC SSDs have now exceeded the statistical reliability of HDDs

Thanks Huge,  I've been reading up on reliability & see more or less the same.   I have it all tucked away for that bad NAS day as I don't think I will make the change until I get hints that a drive is showing signs of failing (or it just fails).   All I need to do is select whatever make/model is the best at that time for reliability, speed and/or SQ (if such a thing exists).   I'm surprised at prices, they have more than halved since last time I looked.

Posted on: 29 June 2016 by jon h

"BUT I'm an IT guy (albeit retired), so you wouldn't expect me to be satisfied with a single level of backup, and I'm not. My ReadyNAS drive is copied each week to a second NAS drive which is located in my garage, away from the house, connected by undergrount CAT6 cable.  Different folders (shares) are copied each day, so over a week, everything is replicated. Those shares that get updated often (HD albums for example), are copied daily, my CD collection twice weekly, my MP3s weekly.

So having in place a robust, two layer backup strategy enabled me to sleep peacefully last night "

So a power spike to your house could easily take out both devices. It's possible that both NASes are the same, in which case a bad firmware upgrade could vape both.

What you have described is better than a single point of storage. But not much.

Posted on: 29 June 2016 by DavidDever
Brubacca posted:

All mechanical hard drives will fail at some point.  As others have pointed out RAID is not the only solution.  It is part of the solution.  RAID and a backup are the best way to future-proof your data.  Another recommendation is an off-site backup.  That way if catastrophe strikes your residence the data is elsewhere.

+1 - I cannot stress more the need for off-site backup. In an age where nearly everything is stored in the cloud, multi-site replication in support of disaster recovery is crucial.

Jon's point regarding homogeneous firmware across your devices is also an excellent observation, though, frankly, I'd be very wary of NETGEAR at this point as regards NAS support / regular firmware updates. I believe that Synology and QNAP are both better bets in that respect.

Posted on: 29 June 2016 by Foot tapper

Hi Roger

Helpful advice, much appreciated.

I also have 2 physically separate NAS drives, of different design, both mirrored, with one backing up the other.  Jon makes a good point re firmware, so I'll keep them different from now on, as both are Synology NAS drives.

My main concern has been a lightning or mains spike, so I have recently installed a small APC "Back-UPS CS650" uninterruptible power supply for the main NAS drive, computer, broadband modem, wifi router, ethernet switch & office printer.  

Never thought I'd need it though.

Until 2 days later when the mains supply suddenly went down.

Phew!

Still vulnerable to lightning hitting the cat6 network and frying everything, but I'm not sure how to avoid that one!

Best regards, FT

Posted on: 29 June 2016 by Harry

It's happened to me also Roger. A ReadyNAS Duo box just went pop and took both HDDs with it. More recently one of my WD Red HDDs in the TS470 stopped working spontaneously. WD delivered on their commitments re replacement, I'm happy to report (that's two WD Red HDD failures in two years - both replaced FOC).

As you say, if such a failure causes sighs and the grinding of teeth while the failed hardware is replaced and the backup is dug out of the cupboard, then you've done it right. If it causes tears, regrets and a "back to square one" result, then you've at least learned a valuable lesson. Unfortunately.

Posted on: 29 June 2016 by Claus-Thoegersen

Catastrophes come in bundles. I sent my dead WD disk in to be replaced under warranty, but the disk was physically damaged during transport, so no replacement here! Partly my own fault, but annoying to have wasted time on sending the disk back. And yesterday my ADSL line died, so no music from the main system, since I do not have other ways to control it than using the remote. Maybe I should buy a really simple router just to be able to get DHCP working in a situation like this. Anyway, internet should be back tomorrow and my 2 new WD disks should also arrive. If this continues the internet fairy will have eaten my backups or the 2 USB disks will have died overnight!

Claus

  

Posted on: 29 June 2016 by Harry

It sounds like you've had your quota of misery. Probably all good from now

Posted on: 30 June 2016 by PeterJ

I have a Melco N1A and a ReadyNAS Duo (with WD drives). I also backup to a WD Passport USB drive which I only connect to the devices for backups and normally keep in a fire safe. If all of your backups are always online there is a danger you might be hit with a crypto locker malware infection (and yes I run anti-virus etc) which could take everything out. I did have an Iomega RAID drive pair and both drives failed a few years ago (fortunately not at the same time). I am also ex IT.

Posted on: 30 June 2016 by Huge

Crypto viruses can be stopped by a DMZ implemented in a different OS - I use the NAS OS to do just that, to protect a second disk that isn't accessible from the network.

Posted on: 30 June 2016 by Guy007

I would endorse the HGST (formerly known as Hitachi) Deskstar NAS range.

Prior to getting the QNAP NAS I was using LaCie d2 Quadra drives (which I still use, but as incremental back ups); they are solid drives too - the only issue I had was with a power supply to an older 500GB model.  But as of 2014 Seagate own LaCie.

So it looks like, regardless of the name, WD and Seagate are the main players.  But I look at HGST as the Lexus brand and WD as the Toyota.

And if you are looking for SSDs, I would look at the SanDisk Extreme Pro with its 10 year warranty - ironically, SanDisk is also now owned by WD, as of May 2016.

Posted on: 01 July 2016 by pixies

I have just ordered a 2nd WD Red 2TB HDD for my Synology NAS to enable me to use it in RAID. I have only had 1 HDD in my NAS for several years yet always backup regularly to several external hard drives, however this topic has reminded me to add another level of security, so thanks!

Posted on: 01 July 2016 by Claus-Thoegersen

Problems not over yet. My loss of internet is due to a faulty router. It seems my new ADSL provider has been unlucky with a batch of routers; I was not the only person to lose my internet connection. A new router should be on the way, so it will probably be here Monday or Tuesday. Well, 4G on my iPhone works for most things, but not for internet radio or iRadio on the ns01 and Mu-so. At least I found a Windows DHCP server so I can use my ns01 again and configure my ReadyNAS with the new disks.