by loudog
Have you ever wondered about your hard drive's health? Is it getting along in years? How many hiccups has it had during its life? Does it run hot? Or, the most important question of all: is it ready to die? At one time or another, we have all dealt with these nagging questions.
Some PCLinuxOS users are very adept with the OS and, with a few keystrokes, can launch a program from the CLI (command line interface) and answer any of these nagging questions fairly quickly. But what about those of us who are not quite comfortable in the console yet? What about those of us who are members of the "point and click generation"? What program does PCLinuxOS have that would alleviate or aggravate our suspicions about the hard drive's recent erratic behavior, and provide us with a simple, user-friendly GUI?
Why, Gsmartctl, of course! This little program is an excellent choice when you need to check on your HD's basic health status. Recently, while writing an article on Kdenlive, I noticed that one of my videos was corrupted from the 7 minute 9 second mark onward. This did not make me a happy camper, since the corrupted part of the file was the best part. I had reviewed it just several days before, so I was curious about the cause of the corruption.
After an internet search (being from the point and click generation), I determined that Gsmartctl could provide some of the information I needed to answer those nagging questions. I was pleasantly surprised to find it in the repository. After installing the program from the repository (you will need to install the GUI as well), I started it up to have a look-see. After installation, I found it in the main menu under More Applications > Monitoring.
Let's take a look at the GUI. Before starting, the program will ask for administrator credentials. After we enter our root password, a quaint little window opens, showing which drives Gsmartctl has detected.
Expand the window to your taste and let your mouse hover over different areas for an excellent array of tooltips. This abundance of tooltips can be found throughout the program. Clicking on any device changes the basic information displayed at the top of the window to reflect the selected drive's attributes. We can see that the selected drive has passed the basic health test and that SMART (Self-Monitoring, Analysis and Reporting Technology) is enabled, but offline data collection is disabled. When we try to enable it, we discover that this particular drive's firmware may not support the option.
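For readers who eventually want to try the console route, the same basic health verdict is available from `smartctl`, the command-line tool in the smartmontools package. GUIs of this kind generally act as a front end to it, though that is an assumption about this particular package rather than something stated above. The minimal Python sketch below builds the `smartctl -H` command for a hypothetical device path (`/dev/sda`) and pulls the verdict line out of ATA-style output; the sample text is illustrative, not captured from my drive, and the helper names are hypothetical.

```python
import subprocess
from typing import Optional

# Prefix of the verdict line smartctl prints for ATA/SATA drives when given -H.
HEALTH_PREFIX = "SMART overall-health self-assessment test result:"


def health_command(device: str) -> list[str]:
    """Build the smartctl call that asks for the drive's overall health verdict."""
    return ["smartctl", "-H", device]


def parse_health(output: str) -> Optional[str]:
    """Return the verdict (e.g. 'PASSED') from smartctl -H output, or None if absent."""
    for line in output.splitlines():
        if line.startswith(HEALTH_PREFIX):
            return line[len(HEALTH_PREFIX):].strip()
    return None


def read_health(device: str) -> Optional[str]:
    """Run smartctl for real (needs root and smartmontools) and parse the verdict."""
    # check=False: smartctl's exit status is a bit mask, so non-zero is not always fatal.
    result = subprocess.run(health_command(device), capture_output=True, text=True, check=False)
    return parse_health(result.stdout)


# Illustrative sample text, not real output from the drive in this article.
SAMPLE = """=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
"""

if __name__ == "__main__":
    print(health_command("/dev/sda"))
    print(parse_health(SAMPLE))  # PASSED
```

Calling `read_health("/dev/sda")` for real requires root and smartmontools. For the other fields shown at the top of the window, `smartctl -i` reports whether SMART is enabled, `smartctl -c` lists the drive's offline data collection capabilities, and `smartctl -o on` asks the drive to enable automatic offline data collection, which, as I found, the drive's firmware may simply not support.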
Why don't we take a look at the output and see what that can tell us.
Wow! That's a lot of information. It appears that the drive might be able to use the offline data collection feature, if only we could find a way to enable it. After an exhaustive online search, however, it became apparent that my particular SSD's latest firmware does not support the offline data collection service. Hmmm. Okay, let's move on. While casually browsing your other drives, you may notice that some of the displayed devices do not support SMART either.
These are usually the flash drives and/or CD/DVD drives on the system. A quick check of the "Show output" information tells us that this one is our CD/DVD drive.
To hide the drives that do not support SMART, we simply go to Options > Preferences in the menu at the top of the window.
This new window gives us options we can select or disregard at our discretion. Ticking the "Show SMART capable drives only" box is a matter of personal preference, not a requirement.
If you tick the box, the next time the program is started it will not show the unsupported drives. Why don't we take a closer look at what the program has to tell us about one of our drives? We will select a drive, right-click on it and choose the "View Details" option from the pop-up menu.
The new window we are presented with is full of features and contains the in-depth information we are looking for.
At the top, we see the various tabs that lead deep into the drive's SMART capabilities. The Identity tab is pretty straightforward, showing basic information about the drive. What I want to draw your attention to are the tabs labeled in red font, which indicate problems recorded in the SMART data. I think it might be important to check the error log and see what it reports (noooo, no, no, no! YEP. Crud!)
Yep, SMART has recorded 1,886 errors on this drive so far (errrrrrrrr). Looking closer, we see the latest error was recorded as "Uncorrectable error in data" at 32,788 hours. Hmmm, you chump. Well, well, well, let's see here: if we divide those 32,788 hours by 24, then divide again by 365, we get 3.7 years. Wow! This drive had been powered on for 3.7 years at the time of the last error. With a 3-year warranty, I'm thinking this drive is uuuhhh, well, it starts after the letter E. I believe this one is about ready to be retired.
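The conversion above is easy enough by hand; here is the same arithmetic as a small Python helper (the name `hours_to_years` is hypothetical), using 24 hours per day and 365 days per year and ignoring leap years, just as the estimate above does. Keep in mind that SMART counts powered-on hours, so the result is time spent running, not the drive's calendar age.

```python
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365  # ignores leap years, as the estimate in the text does


def hours_to_years(power_on_hours: float) -> float:
    """Convert a SMART power-on-hours figure to approximate years of running time."""
    return power_on_hours / HOURS_PER_DAY / DAYS_PER_YEAR


if __name__ == "__main__":
    # 32,788 hours: the drive's power-on age when its latest error was logged.
    print(f"{hours_to_years(32_788):.1f} years")  # 3.7 years
```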
"I wonder how old the drive is now," I ask myself. That is actually an important question: once we know the drive's current power-on age, a little math tells us when the error occurred, and for some reason I now need to know. To find the current power-on time, we go to the Attributes tab. Yep, the other one with red font.
Okay, now that we can see the total power-on time in hours, we subtract: 33,792 - 32,788 (the time of the last recorded error) = 1,004 hours, and 1,004 / 24 = about 41.8 days. In other words, measured in power-on time, the last error was reported 41.8 days ago. Since I generally leave my PC running 24/7, I figure the last reported error occurred roughly six weeks ago. Hmmmm, if I recall correctly, that's about the time I was transferring the drives and other parts to a more modern quad-core machine. I vividly remember one of the drives slipping from my hand and falling about 8 inches onto the desk. Owwwie, my liver! At the time, I was hoping I hadn't ruined it. Well, it's not ruined, but it apparently did receive some damage. Looking at the flagged attributes (highlighted in pink), the one that stands out is the Reallocated Sector Count. It is noteworthy because it is a pre-failure type attribute. Let's see what the tooltip says.
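The subtraction and division above can be sketched as a tiny Python function; the name `days_since_error` is hypothetical. The result is power-on time: on a PC that runs 24/7, as mine does, it roughly matches calendar time, while on a machine that is switched off part of the time the error actually happened longer ago than this figure suggests.

```python
def days_since_error(current_hours: int, error_hours: int) -> float:
    """Power-on days elapsed between a logged error and the drive's current age."""
    if error_hours > current_hours:
        raise ValueError("error timestamp cannot be later than the current power-on time")
    return (current_hours - error_hours) / 24


if __name__ == "__main__":
    current, error = 33_792, 32_788  # current power-on hours, hours at the last error
    elapsed = days_since_error(current, error)
    print(f"{current - error} hours = {elapsed:.1f} days of power-on time")
    # 1004 hours = 41.8 days of power-on time
```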
The tooltip informs us that there is no SMART warning yet, but the raw value (the actual count reported by the drive) is not zero, and this could indicate an impending failure. The threshold for this attribute is 140, but the threshold is compared against the attribute's normalized value, not its raw value: SMART only reports the attribute as failing once the normalized value drops to or below 140. The raw value of 211 is a separate figure; the attribute has been flagged simply because it is not zero. If your curiosity gets the best of you and you want a more in-depth description of the whats and whys of the attributes section, go here: https://en.wikipedia.org/wiki/Self-Monitoring,_Analysis,_and_Reporting_Technology. Now it's time to look at the Perform Tests tab.
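To make the threshold logic concrete, here is a simplified Python sketch. It encodes the general SMART rule (an attribute fails when its normalized value drops to or below its threshold) plus an extra check that flags a non-zero raw count for attribute ID 5, Reallocated Sector Count, similar to the notice the tooltip describes. The watch list and the exact rules the GUI applies are assumptions, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SmartAttribute:
    attr_id: int
    normalized: int  # "VALUE" column: vendor-scaled health, higher is better
    threshold: int   # "THRESH" column: compared with the normalized value only
    raw: int         # "RAW_VALUE" column: the drive's actual count


# Assumed watch list: attributes where any non-zero raw count deserves a notice.
# ID 5 is Reallocated Sector Count; the list a real GUI uses may differ.
RAW_WATCH_IDS = {5}


def classify(attr: SmartAttribute) -> str:
    """Classify one attribute; the threshold check uses the normalized value, not raw."""
    if attr.normalized <= attr.threshold:
        return "failing"  # SMART itself would report this attribute as failed
    if attr.attr_id in RAW_WATCH_IDS and attr.raw != 0:
        return "notice"   # no SMART failure yet, but the raw count is non-zero
    return "ok"


if __name__ == "__main__":
    # Threshold 140 and raw 211 come from the text above; the normalized value 190
    # is a made-up placeholder, since the screenshot's value is not quoted.
    realloc = SmartAttribute(attr_id=5, normalized=190, threshold=140, raw=211)
    print(classify(realloc))  # notice
```

Because the normalized value is the one compared against the threshold, the same raw count of 211 would only be reported as failing once the drive's own normalized value fell to 140 or lower.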
The first test I will run is the conveyance self-test, because it is designed to detect damage incurred during transport, which is what I suspect happened when I dropped the drive. Later, I will run the other tests, just for good measure. The conveyance test finished without reporting any errors.
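If you later want to run these self-tests from the console, `smartctl` starts them with its `-t` option (`conveyance`, `short` or `long`, the last being the extended test) and shows the results with `-l selftest`. The Python sketch below only builds those commands for a hypothetical device path, `/dev/sdb`; the helper names are hypothetical as well.

```python
import subprocess

# Self-test types smartctl's -t option accepts for ATA drives
# (not every drive supports the conveyance test).
SELF_TESTS = ("conveyance", "short", "long")


def start_test_command(device: str, test: str) -> list[str]:
    """Build the command that starts one self-test; the drive runs it in the background."""
    if test not in SELF_TESTS:
        raise ValueError(f"unknown self-test type: {test}")
    return ["smartctl", "-t", test, device]


def results_command(device: str) -> list[str]:
    """Build the command that prints the drive's self-test log."""
    return ["smartctl", "-l", "selftest", device]


def run(cmd: list[str]) -> str:
    """Run a smartctl command (needs root and smartmontools) and return its stdout."""
    # check=False: smartctl's exit status is a bit mask, so non-zero is not always fatal.
    return subprocess.run(cmd, capture_output=True, text=True, check=False).stdout


if __name__ == "__main__":
    device = "/dev/sdb"  # hypothetical device path; check yours first
    for test in SELF_TESTS:
        print(" ".join(start_test_command(device, test)))
    print(" ".join(results_command(device)))
```

Start one test at a time: the drive runs each test in the background, and smartctl refuses to start a new test while another is still in progress unless forced, so wait for each to finish before checking the log.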
After running all three tests without errors, I have decided that the drive is OK to use, but based on its hours, errors and warranty, it may need replacing soon. I also copied some gigantic files to the drive, filling it up, and did not get any new errors in the error log. My curiosity is satisfied for the moment, but I will definitely be keeping a close eye on this drive. It must be noted that SMART is in no way a foolproof way to determine whether a drive will fail or whether your data will be safe, but it can be a great tool to help you make informed decisions about when a drive may need replacing, or why it's been acting "funny" lately. Oh, and one last thing before we go: if you see something like this
or this
you may be convinced that Gsmartctl is smarter than you thought. Use your own discretion, and enjoy all things Linux.