akopalypse.net

Badblocks

badblocks is part of the e2fsprogs package.

Even brand-new hard disks can have damaged sectors. If you test the disk before using it, the drive gets the chance to remap bad sectors to spare sectors.
Depending on the computing power and size of the hard disk, a test with badblocks can take several days.
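To get a feeling for "several days", here is a rough estimate. It assumes that a write test (-w) makes eight full passes over the disk (four patterns, each written once and read back once) and that the disk sustains a constant throughput, which real drives do not. The function name badblocks_w_hours and the 200 MB/s figure are made up for illustration.

```shell
#!/bin/bash
# Rough duration of "badblocks -w" in whole hours.
# $1: device size in bytes, $2: assumed sustained throughput in MB/s
badblocks_w_hours() {
    local size_bytes=$1 mb_per_s=$2
    local passes=8   # assumption: 4 patterns, each written once and read back once
    echo $(( size_bytes * passes / (mb_per_s * 1000000) / 3600 ))
}

# 18 TB (18 * 10^12 bytes) at an assumed 200 MB/s
badblocks_w_hours 18000000000000 200   # prints 200, i.e. a bit more than 8 days
```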
Since badblocks was originally written to verify floppy disks, it isn't designed for modern HDDs and has some hardcoded 2^32 limitations. On drives as large as 18 TB, even -b 4096 no longer helps. One can choose a block size of 8192 bytes, but that is widely regarded as a possible cause of false-negative results: no bad blocks are reported although some may still exist.
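The arithmetic behind the block size problem can be checked quickly. This is a sketch that assumes the limit boils down to "the number of blocks must stay below 2^32"; the function name min_badblocks_blocksize is made up for illustration.

```shell
#!/bin/bash
# Smallest power-of-two block size (starting at 1024) for which the block
# count of a device of the given size in bytes stays below 2^32.
# Assumption: the badblocks limit is simply "block count < 2^32".
min_badblocks_blocksize() {
    local size_bytes=$1
    local limit=$((1 << 32))
    local bs=1024
    # round the block count up so a partial last block is counted too
    while [ $(( (size_bytes + bs - 1) / bs )) -ge "$limit" ]; do
        bs=$((bs * 2))
    done
    echo "$bs"
}

# 18 TB as sold (18 * 10^12 bytes); for a real disk the size could come
# from "blockdev --getsize64 /dev/device" instead
min_badblocks_blocksize 18000000000000   # prints 8192
```

With 4096-byte blocks an 18 TB disk has about 4.39 billion blocks, which is just above 2^32 (about 4.29 billion). That is why -b 4096 is no longer enough.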

Attention! The data on the disk will be irretrievably deleted during this operation!

# badblocks -b 8192 -s -w -v /dev/device
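Because the write test destroys everything on the disk, it can be worth checking first that nothing on the device is mounted. This is a minimal sketch: device_in_use is a made-up helper, and it only looks at mount sources, so it will not notice swap, LVM or device-mapper users of the disk.

```shell
#!/bin/bash
# Succeeds (exit 0) if the device, or anything whose name starts with the
# device name (such as its partitions), appears as a mount source in the
# given mounts table. Default table: /proc/mounts.
device_in_use() {
    local dev=$1 mounts=${2:-/proc/mounts}
    awk -v dev="$dev" '
        index($1, dev) == 1 { found = 1 }   # mount source starts with dev
        END { exit !found }
    ' "$mounts"
}

if device_in_use /dev/device; then
    echo "refusing: /dev/device (or a partition on it) is mounted" >&2
else
    echo "not mounted; badblocks -w would be the next step"
fi
```

The prefix match is deliberately conservative: /dev/sda also matches /dev/sdaa, so a false "in use" is possible, which is the safer direction to err in.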

Options:

-b blocksize (default 1024)
-c number of blocks tested at once
-s show progress in percent
-w write-test mode
-n non-destructive mode
-v verbose mode

(-w and -n are mutually exclusive!)

Alternative

This is an alternative, taken from the Arch Linux wiki:

Set up an encryption layer on top of the device:

# cryptsetup open /dev/device name --type plain --cipher aes-xts-plain64

Fill the now opened decrypted layer with zeroes, which get written as encrypted data:

# shred -v -n 0 -z /dev/mapper/name

Compare fresh zeroes with the decrypted layer:

# cmp -b /dev/zero /dev/mapper/name

If it just stops with a message about end of file, the drive is fine. This method is also way faster than badblocks even with a single pass. As the command does a full write, any bad sectors (as known to the disk controller) should also be eliminated.
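The comparison step can be tried out on ordinary files before pointing it at a real device. This is a sketch that assumes a cmp which prints mismatches on stdout and the end-of-file message on stderr (GNU cmp behaves this way); all_zero and the scratch files are made-up names for illustration.

```shell
#!/bin/bash
# Return 0 if every byte of "$1" is zero, 1 otherwise.
# cmp prints a "differ" line on stdout at the first mismatch; when it only
# runs into the end of "$1", stdout stays empty (the EOF note goes to stderr).
all_zero() {
    local diff
    diff=$(cmp -b /dev/zero "$1" 2>/dev/null)
    [ -z "$diff" ]
}

# Demo on scratch files instead of /dev/mapper/name
good=$(mktemp); bad=$(mktemp)
head -c 1048576 /dev/zero > "$good"
{ head -c 4096 /dev/zero; printf 'X'; head -c 4096 /dev/zero; } > "$bad"
all_zero "$good" && echo "good: all zeroes"
all_zero "$bad" || echo "bad: mismatch found"
rm -f "$good" "$bad"
```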
On btrfs and ZFS, the designers have decided that a floppy-era bad block list is not needed any more. They are usually right as long as you write over the defects (see above). Reading will still hang from retrying. If you want to “isolate” the bad blocks like in the old days, use a lower-level solution by partitioning or LVM.

via https://wiki.archlinux.org/title/Badblocks and https://superuser.com/questions/528176/using-badblocks-on-modern-disks

SMART

Install smartmontools. Tests can run in foreground or in background mode. Choose foreground when you need to know right away whether the disk is faulty. Choose background when the disk has to keep working without its throughput suffering from the test.

Is SMART enabled?

# smartctl -i /dev/device

A short test in the foreground (-C), for a disk that is not otherwise in use:

# smartctl -t short -C /dev/device

A long and thorough test, which takes considerably more time than the short one:

# smartctl -t long /dev/device

Results

# smartctl -a /dev/device

The most relevant parameters are:

1 Raw_Read_Error_Rate
5 Reallocated_Sector_Ct
187 Reported_Uncorrect
196 Reallocated_Event_Count
197 Current_Pending_Sector
198 Offline_Uncorrectable

A non-zero value for 199 UDMA_CRC_Error_Count usually means there has been some disruption during data transfer, like a power failure or a loose cable. It does not hint at a fault in the disk itself.
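Scanning the attribute table by eye gets tedious with several disks. The following sketch filters the output of smartctl -A for the sector-related attributes listed above and prints those with a non-zero raw value. Attribute 1 is left out on purpose, on the assumption that its raw value is vendor-specific and can be non-zero on healthy disks. smart_warnings is a made-up name, and the sample lines are invented for illustration, not real results.

```shell
#!/bin/bash
# Reads "smartctl -A" output on stdin and prints every watched attribute
# whose raw value (column 10) is non-zero. Exit status 1 if any was found.
smart_warnings() {
    awk '
        $1 ~ /^(5|187|196|197|198)$/ && $10 + 0 != 0 {
            printf "%s %s raw=%s\n", $1, $2, $10
            bad = 1
        }
        END { exit bad }
    '
}

# Typical use (needs root):  smartctl -A /dev/device | smart_warnings
# Invented sample lines, for illustration only:
smart_warnings <<'EOF' || echo "-> take a closer look at this disk"
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       3
EOF
```

In the sample only attribute 197 is reported. 199 is ignored because it does not point at the disk itself.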

Fast SSD Health check

I found this on the internet. You need smartmontools, sed, awk and bc installed.

#!/bin/bash

#######################################
# Variables                           #
#######################################

SSD_DEVICE="/dev/sda"

ON_TIME_TAG="Power_On_Hours"
WEAR_COUNT_TAG="Wear_Leveling_Count"
LBAS_WRITTEN_TAG="Total_LBAs_Written"
LBA_SIZE=512 # Value in bytes

BYTES_PER_MB=1048576
BYTES_PER_GB=1073741824
BYTES_PER_TB=1099511627776

#######################################
# Get total data written…             #
#######################################

# Get SMART attributes
SMART_INFO=$(sudo /usr/sbin/smartctl -A "$SSD_DEVICE")

# Extract required attributes
# (match the attribute name in column 2; "+ 0" drops leading zeroes but keeps a plain 0)
ON_TIME=$(echo "$SMART_INFO" | awk -v tag="$ON_TIME_TAG" '$2 == tag {print $10}')
WEAR_COUNT=$(echo "$SMART_INFO" | awk -v tag="$WEAR_COUNT_TAG" '$2 == tag {print $4 + 0}')
LBAS_WRITTEN=$(echo "$SMART_INFO" | awk -v tag="$LBAS_WRITTEN_TAG" '$2 == tag {print $10}')

# Convert LBAs -> bytes
BYTES_WRITTEN=$(echo "$LBAS_WRITTEN * $LBA_SIZE" | bc)
MB_WRITTEN=$(echo "scale=3; $BYTES_WRITTEN / $BYTES_PER_MB" | bc)
GB_WRITTEN=$(echo "scale=3; $BYTES_WRITTEN / $BYTES_PER_GB" | bc)
TB_WRITTEN=$(echo "scale=3; $BYTES_WRITTEN / $BYTES_PER_TB" | bc)

# Output results…
echo "------------------------------"
echo " SSD Status:   $SSD_DEVICE"
echo "------------------------------"
echo " On time:      $(echo $ON_TIME | sed ':a;s/\B[0-9]\{3\}\>/,&/;ta') hr"
echo "------------------------------"
echo " Data written:"
echo "           MB: $(echo $MB_WRITTEN | sed ':a;s/\B[0-9]\{3\}\>/,&/;ta')"
echo "           GB: $(echo $GB_WRITTEN | sed ':a;s/\B[0-9]\{3\}\>/,&/;ta')"
echo "           TB: $(echo $TB_WRITTEN | sed ':a;s/\B[0-9]\{3\}\>/,&/;ta')"
echo "------------------------------"
echo " Mean write rate:"
echo "        MB/hr: $(echo "scale=3; $MB_WRITTEN / $ON_TIME" | bc | sed ':a;s/\B[0-9]\{3\}\>/,&/;ta')"
echo "------------------------------"
echo " Drive health: ${WEAR_COUNT} %"
echo "------------------------------"

hdparm

hdparm is a command line utility to set and view hardware parameters of hard disk drives. hdparm can also be used as a simple benchmarking tool. To get information about hard disks, run the following:

# hdparm -I /dev/sda

Power management configuration

Overly aggressive power management can reduce the lifespan of hard drives due to frequent parking and spindowns.

-B	Set the Advanced Power Management feature. Possible values are between 1 and 255, low values mean more aggressive power management and higher values mean better performance. Values from 1 to 127 permit spin-down, whereas values from 128 to 254 do not. A value of 255 completely disables the feature.
-S	Set the standby (spindown) timeout for the drive. The timeout specifies how long to wait in idle (with no disk activity) before turning off the motor to save power. A value of 0 disables spindown, values from 1 to 240 specify multiples of 5 seconds, and values from 241 to 251 specify 1 to 11 units of 30 minutes.
-M	Set the Automatic Acoustic Management feature. Most modern hard disk drives can slow down their head movements to reduce noise. The possible values depend on the disk, and some disks do not support this feature at all.
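The value ranges for -S and -B are easy to mix up, so here is a small sketch that translates them. It covers only the ranges described above; standby_seconds and apm_class are made-up helper names and not part of hdparm.

```shell
#!/bin/bash
# Translate an hdparm -S value into a timeout in seconds.
# Prints 0 for "spindown disabled".
standby_seconds() {
    local v=$1
    if [ "$v" -eq 0 ]; then
        echo 0
    elif [ "$v" -ge 1 ] && [ "$v" -le 240 ]; then
        echo $((v * 5))                      # multiples of 5 seconds
    elif [ "$v" -ge 241 ] && [ "$v" -le 251 ]; then
        echo $(( (v - 240) * 30 * 60 ))      # 1 to 11 units of 30 minutes
    else
        echo "value $v is outside the ranges covered here" >&2
        return 1
    fi
}

# Classify an hdparm -B (APM) value.
apm_class() {
    local v=$1
    if [ "$v" -ge 1 ] && [ "$v" -le 127 ]; then
        echo "spin-down permitted"
    elif [ "$v" -ge 128 ] && [ "$v" -le 254 ]; then
        echo "no spin-down"
    elif [ "$v" -eq 255 ]; then
        echo "APM disabled"
    else
        echo "invalid"
        return 1
    fi
}

standby_seconds 120   # 120 * 5 s = 600 s (10 minutes)
standby_seconds 242   # 2 * 30 min = 3600 s
apm_class 127         # spin-down permitted
```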

To query the current value of -B, pass the parameter without a value:

# hdparm -B /dev/sda

To apply a different value, for example to set APM to 127:

# hdparm -B 127 /dev/sda

Power off a hard disk drive

A typical use case for this feature is a disk connected to a cheap external USB/SATA/FireWire enclosure or bridge. If the enclosure does not properly issue a stop command to the drive when the power switch is turned off, the drive is forced to do an emergency head retract. Doing that regularly will, sooner or later, break the drive. One solution is to run a command that powers off the drive, once you are sure the data has been written to the media:

# hdparm -Y /dev/sdX

Warning: Be absolutely sure that
* the data was actually written to the media. It is also advisable to wait a while so that the drive becomes idle.
* the device, /dev/sdX in the example, is the one you want to power off.
