Data recovery from M.2 NVMe SSD. Script ddrescue-loop v0.2

This article describes a method of extracting data from a faulty SSD for cases where, after an attempt to read any failed sector, the SSD completely stops returning data and only a power cycle helps.

I present a revised version of the ddrescue-loop script with support for USB relay control and uhubctl.

To interrupt the SSD power supply I used a simple and cheap solution: the USB Relay Module LCUS-1 CH340, which is available on Aliexpress. The SSD is connected through the AgeStar 31CBNV1C docking station, which is based on the JMicron JMS583 USB-NVMe bridge.

Let’s consider the recovery process using the example of faulty M.2 NVMe SSDs manufactured by Kimtigo on the Maxio MAP1202 controller.

ddrescue-loop v0.2.1

ddrescue-loop-v0.2.1.gz

#!/bin/sh
#ddrescue-loop script written by gumanzoy <[email protected]>

# Compatible only with Linux, not with other *nix!
# Depends on udev /dev and sysfs /sys kernel interfaces

# For SATA requires AHCI compatible motherboard
# For all Intel and modern AMD platforms (AM4 and newer), check the UEFI Setup
# SATA settings to ensure Port Hot Plug is enabled

# For USB requires lsusb from usbutils package
# And optional uhubctl for power off/on cycle
# Or hardware USB Relay Module LCUS-1 CH340

# [RU] forum thread (discussion)
# https://forum.ixbt.com/topic.cgi?id=11:47589-31

# /* This program is free software. It comes without any warranty, to
# * the extent permitted by applicable law. You can redistribute it
# * and/or modify it under the terms of the Do What The Fuck You Want
# * To Public License, Version 2, as published by Sam Hocevar. See
# * http://www.wtfpl.net/ for more details. */

VERSION=0.2.1

showhelp () {
echo "ddrescue-loop v""$VERSION"" restarts the ddrescue process whenever it exits"
echo "Attention: the order of arguments must be observed"
echo "Options cannot be given in arbitrary order!"
echo "Numeric argument values must be separated by a space"
echo -n "\n"
echo "# ----- SATA ----- SATA ----- SATA ----- SATA ----- SATA -----"
echo "# Stop/start the disk on a SATA port:"
echo "-ata <n> -stop""		""stop the disk on SATA port <n>"
echo "-ata <n> -scan""		""scan SATA port <n>"
echo -n "\n"
echo "# Start recovery from SATA:"
echo "ddrescue-loop -ata <n> [-loop <n>] [-pwc] [-wait <n>] [-act <n>] outfile mapfile [ddrescue options]"
echo -n "\n"
echo "# Specify the number of the SATA port the source disk is connected to:"
echo -n "-ata <n>""		""SATA port number <n>, a digit (see dmesg output)"
echo -n "\n""			""#: "; ls /sys/class/ata_port
echo -n "\n"
echo "# Cyclic stop/restart of the disk on the SATA port:"
echo "-loop <n>""		""<n> maximum number of attempts"
echo -n "\n"
echo "# Disk stop/restart wait timer:"
echo "-wait <n>""		""Time in seconds <n> [10]"
echo -n "\n"
echo "# Override the ATA command timeout:"
echo "-act <n>""		""Time in seconds <n> [30]"
echo -n "\n"
echo "# ------ USB ------ USB ------ USB ------ USB ------ USB -----"
echo "# Power the USB device <ID> off/on using method <hub/rle>:"
echo "-usb <ID> -pwc hub""	""Use uhubctl --search <ID>"
echo "-usb <ID> -pwc rle""	""Use USB relay LCUS-1 CH340 RLETTY=""$RLETTY"
echo -n "\n"
echo "# Start recovery from USB:"
echo "ddrescue-loop -usb <ID> [-loop <n>] [-pwc <hub/rle>] [-wait <n>] outfile mapfile [ddrescue options]"
echo -n "\n"
echo "# Specify the hex VID:PID identifiers of the source USB device:"
echo "-usb <ID>""		""<VID:PID> separated by a colon (see lsusb output)"
echo -n "\n"
echo "# Cyclic restart of ddrescue:"
echo "-loop <n>""		""<n> maximum number of attempts"
echo -n "\n"
echo "# Main:"
echo "outfile""			""Destination device / image file"
echo "mapfile""			""ddrescue map/log file (required)"
echo -n "\n"
echo "# ddrescue options may be given at the end, after mapfile, separated by spaces"
echo "# Support depends on the version. See the manual for the full list. Important ones:"
echo "-P [<n>]""		""Data preview [number of lines], 3 by default"
echo "-b 4096""			""<bytes> sector (physical block) size [default 512]"
echo "-c <n>""			""Cluster size: <n> sectors at a time [default 128]"
echo "-O"" #Recommended!		""Reopen the device file after every error"
echo "-J"" #Optional		""On error, reread the last good sector"
echo "-r <n> #OR -r -1""	""<n> number of retry passes before moving on to trim"
echo "-m <domain.mapfile>""	""Restrict the read area to the domain <file> from ddru_ntfsbitmap"
}

get_ata_host () {
until SCSIHOST=`readlink -f /sys/class/ata_port/ata"$1"/device/host?/scsi_host/host?/` \
&& test -d "$SCSIHOST"; do sleep 1; done
}

get_ata_target () {
until SYSFSTGT=`readlink -f /sys/class/ata_port/ata"$1"/device/host?/target?:?:?/?:?:?:?/` \
&& test -d "$SYSFSTGT"; do sleep 1; done
}

get_ata_dev () {
until INDEV=`readlink -f /dev/disk/by-path/pci-*-ata-"$1"` \
&& test -b "$INDEV"; do sleep 1; done
}

device_delete () {
while test -f "$SYSFSTGT"/delete; do echo 1 > "$SYSFSTGT"/delete; sleep 1; done
}

get_usb_dev_by_path () {
INDEV="/dev/"`basename "$1"`
SYSFSTGT="$1""/device/"
}

get_usb_dev_by_id () {
IDVID=`echo -n "$1" | cut -d ":" -f1`
IDPID=`echo -n "$1" | cut -d ":" -f2`

until get_usb_dev_by_path `udevadm trigger -v -n -s block \
-p ID_VENDOR_ID="$IDVID" -p ID_MODEL_ID="$IDPID"` \
&& test -b "$INDEV"; do sleep 1; done
}

power_cycle () {
if [ -n "$USBID" ] && [ "$PWRCTL" = hub ]; then
uhubctl --search "$USBID" --action cycle --delay "$LOOPWAIT"
elif [ "$PWRCTL" = rle ]; then /bin/echo -en "\xA0\x01\x01\xA2" > "$RLETTY" && \
sleep "$LOOPWAIT" && /bin/echo -en "\xA0\x01\x00\xA1" > "$RLETTY"
fi
}

if [ "$1" = "-h" -o "$1" = "--help" ]; then showhelp
exit; fi

if [ "`whoami`" != "root" ]; then
echo Exit. This script should be run as root !
exit 1; fi

if [ -z "$RLETTY" ] && test -c /dev/ttyUSB0; then RLETTY="/dev/ttyUSB0"
elif [ -n "$RLETTY" ] && ! test -c "$RLETTY"; then
echo "RLETTY=""$RLETTY"" control device not found"; exit 1; fi

if [ -n "$1" ] && [ "$1" = "-ata" ]; then
if [ -n "$2" ] && test -d /sys/class/ata_port/ata"$2"; then
SATAP="$2"; get_ata_host "$SATAP"; shift; shift
else echo -n "Please enter correct port number: "; ls /sys/class/ata_port; exit 1; fi
fi

if [ -n "$1" ] && [ "$1" = "-stop" ] && [ -n "$SATAP" ]; then
get_ata_target "$SATAP"; device_delete; exit; fi

if [ -n "$1" ] && [ "$1" = "-scan" ] && [ -n "$SATAP" ]; then
echo '0 0 0' > "$SCSIHOST"/scan; exit; fi

if [ -n "$1" ] && [ "$1" = "-usb" ] && [ -z "$SATAP" ]; then
if [ -n "$2" ] && lsusb -d "$2"; then
USBID="$2"; get_usb_dev_by_id "$USBID"; shift; shift
else echo "Please enter correct USB Device ID:"
lsusb | cut -d ":" -f2,3 | grep -vi hub
exit 1; fi
fi

if [ -n "$1" ] && [ "$1" = "-loop" ]; then
if [ -n "$2" ] && [ "$2" -gt 0 ]; then
DDLOOP="$2"; shift; shift; fi
else DDLOOP=0
fi

if [ -n "$1" ] && [ "$1" = "-pwc" ]; then
if [ -n "$USBID" ] && [ -n "$2" ] && [ "$2" = "hub" -o "$2" = "rle" ]; then
PWRCTL="$2"; echo "PWRCTL=""$2"; shift; shift
elif [ -n "$RLETTY" ]; then
PWRCTL="rle"; echo "PWRCTL=rle"; shift; fi
fi

if [ -n "$1" ] && [ "$1" = "-wait" ]; then
if [ -n "$2" ] && [ "$2" -gt 0 ]; then
LOOPWAIT="$2"; shift; shift; fi
else LOOPWAIT=10
fi

if [ -n "$1" ] && [ "$1" = "-act" ]; then
if [ -n "$2" ] && [ "$2" -gt 0 ]; then
ATACMDT="$2"; shift; shift; fi
fi

if [ -n "$RLETTY" ] && [ "$PWRCTL" = rle ]; then
stty -F "$RLETTY" 9600 -echo && echo "RLETTY=""$RLETTY"; fi

if [ "$DDLOOP" = 0 ]; then
if [ -n "$USBID" ] && [ "$PWRCTL" = hub ]; then power_cycle; exit
elif [ -n "$RLETTY" ] && [ "$PWRCTL" = rle ]; then power_cycle; exit; fi
fi

if [ -z "$SATAP" ] && [ -z "$USBID" ]; then showhelp
exit; fi

OUTFILE="$1"; shift
MAPFILE="$1"; shift
DDOPTS="$@"

DONE=X
LOOPCOUNT=0

until [ "$DONE" = 0 ]; do

if [ -n "$SATAP" ]; then get_ata_target "$SATAP"; get_ata_dev "$SATAP"
elif [ "$LOOPCOUNT" -gt 0 ] && [ -n "$USBID" ]; then get_usb_dev_by_id "$USBID"
fi

if [ -n "$ATACMDT" ]; then echo "$ATACMDT" > "$SYSFSTGT"/timeout
fi

echo ddrescue "-fd" "$INDEV" "$OUTFILE" "$MAPFILE" "$DDOPTS"
ddrescue "-fd" "$INDEV" "$OUTFILE" "$MAPFILE" $DDOPTS
DONE="$?"

if [ "$DONE" != 0 ] && [ "$DDLOOP" -gt 0 ]; then

  device_delete &
  sleep "$LOOPWAIT"

  if [ -n "$PWRCTL" ]; then power_cycle
  elif [ -n "$SATAP" ]; then while test -d "$SYSFSTGT"; do
  sleep "$LOOPWAIT"; done; fi

  if [ -n "$SATAP" ]; then sleep "$LOOPWAIT"
  echo '0 0 0' > "$SCSIHOST"/scan; fi

  DDLOOP=$(($DDLOOP-1))
  LOOPCOUNT=$(($LOOPCOUNT+1))

  echo "\n\033[1mDDLOOP #""$LOOPCOUNT"; tput sgr0
  date; echo -n "\n"

  sleep "$LOOPWAIT"

else DONE=0
fi
done

This program is free software. It comes without any warranty, to the extent permitted by applicable law. You can redistribute it and/or modify it under the terms of the Do What The Fuck You Want To Public License, Version 2.

How to use. Startup options

The use of SATA devices is discussed in my first article. If you haven’t read it, I recommend that you read it first.

Since I’m not very experienced in sh scripting, and I’m not a programmer at all – I wasn’t particularly clever with parsing parameters. Therefore, there are some important restrictions!

The order of arguments must be observed. Options cannot be given in arbitrary order!
Numeric argument values must be separated from the option by a space.

ddrescue-loop -usb <ID> [-loop N] [-pwc <hub/rle>] [-wait N] outfile mapfile [ddrescue options]

Specify the hex VID:PID identifiers of the source USB device:
-usb <ID> VID:PID separated by a colon (see lsusb output)

ddrescue stop/restart loop:
-loop N Maximum number of attempts, N is an integer. Must be specified.

Device power interruption function:
-pwc hub Use uhubctl --search <ID>
-pwc rle Use USB relay LCUS-1 CH340

Disk stop/restart wait timer:
-wait N Time in seconds. 10 by default.

ddrescue options can be specified at the end, after mapfile. They are passed on to ddrescue itself, so you can specify everything as usual.

Demonstration. Terminal output recordings

When a sector read error occurs, the following messages appear in dmesg:
uas_eh_device_reset_handler start
uas_eh_device_reset_handler success
Device offlined - not ready after error recovery
I/O error, dev sdd, sector 8985600

The device stops returning data and the ddrescue process exits with:
Can't reopen input file: No such device or address
The script switches the relay on to cut the power and off again to restore it, then restarts ddrescue, and reading continues.
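
The restart logic described above can be sketched roughly as follows. This is a simplified illustration, not the script's actual code: retry_until_ok, power_cycle_stub and flaky_copy are hypothetical names, and flaky_copy stands in for ddrescue so that the sketch runs without any hardware.

```sh
#!/bin/sh
# Simplified sketch of the restart loop; all names here are hypothetical.

ATTEMPTS=0   # how many times the command was started
CYCLES=0     # how many power cycles were requested

# Stand-in for the real power cycle (relay or uhubctl in the actual script).
power_cycle_stub () {
    CYCLES=$((CYCLES+1))
}

# Stand-in for ddrescue: fails on the first two attempts, then succeeds.
flaky_copy () {
    [ "$ATTEMPTS" -ge 3 ]
}

# retry_until_ok <max_restarts> <command...>
# Re-run the command until it exits with 0 or the restart limit is used up.
retry_until_ok () {
    limit="$1"; shift
    while :; do
        ATTEMPTS=$((ATTEMPTS+1))
        "$@" && return 0
        [ "$limit" -gt 0 ] || return 1
        limit=$((limit-1))
        power_cycle_stub
    done
}

retry_until_ok 5 flaky_copy
echo "attempts=$ATTEMPTS power_cycles=$CYCLES"
```

In the real script the equivalent of the stub also removes the device from the kernel and waits -wait seconds before the next attempt.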

How it works

Device names such as sda, sdb, sdc are not permanent: the letter changes depending on the order in which devices are connected to the system. So I applied the following solution: the script accepts the VID:PID identifier of the source USB device as input (lsusb displays it). The script then resolves it to the block device /dev/sdX, and it does this again every time the drive has been power cycled, before restarting ddrescue.
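
A minimal sketch of that lookup is shown here. The function names are hypothetical; the udevadm query is the same dry-run call the script uses, and taking only the first result is an assumption that the whole disk is listed before its partitions.

```sh
#!/bin/sh
# Sketch of the VID:PID lookup; function names are hypothetical.

# Split "VID:PID" into its two hex parts.
usb_vid () { printf '%s' "${1%%:*}"; }
usb_pid () { printf '%s' "${1##*:}"; }

# Print the /dev node of the first block device matching VID:PID.
# Uses the same udevadm dry-run query as the script; prints nothing
# if no matching device is present.
find_usb_blockdev () {
    syspath=$(udevadm trigger -v -n -s block \
        -p ID_VENDOR_ID="$(usb_vid "$1")" \
        -p ID_MODEL_ID="$(usb_pid "$1")" | head -n 1)
    [ -n "$syspath" ] && printf '/dev/%s\n' "$(basename "$syspath")"
}

echo "VID=$(usb_vid 152d:0583) PID=$(usb_pid 152d:0583)"
# On a machine with the dock attached one would then call:
#   find_usb_blockdev 152d:0583
```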

USB Relay Module LCUS-1 CH340

Option with USB Type-A
aliexpress.com/item/4001216792789.html
aliexpress.com/item/1005001993993906.html
Option with USB Type-C
aliexpress.com/item/1005004323626598.html
aliexpress.com/item/1005004347242232.html

Using a USB relay means feeding power to the USB/SATA drive through the relay contacts COM and NC, so that power passes while the relay is off and is cut when the relay is commanded on. The script controls the relay using these commands:

echo -en "\xA0\x01\x01\xA2" > "$RLETTY"
sleep "$LOOPWAIT"
echo -en "\xA0\x01\x00\xA1" > "$RLETTY"

Default RLETTY=/dev/ttyUSB0 can be overridden by passing an environment variable before running the script:
RLETTY=/dev/ttyUSB1 ddrescue-loop -pwc rle
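
The two byte strings appear to follow a simple pattern: a start byte 0xA0, the channel number, the state (1 = on, 0 = off), and a final byte equal to the sum of the first three modulo 256. This layout is inferred from the two commands above, not taken from vendor documentation, and relay_frame is a hypothetical helper that only prints the escape string without touching any device.

```sh
#!/bin/sh
# Sketch: build an LCUS-1 style command frame as a printf escape string.
# relay_frame is a hypothetical helper; the frame layout is inferred
# from the two commands used by the script.

# relay_frame <channel> <state: 1=on, 0=off>
relay_frame () {
    sum=$(( (160 + $1 + $2) % 256 ))   # 160 = 0xA0 start byte
    printf '\\xA0\\x%02X\\x%02X\\x%02X\n' "$1" "$2" "$sum"
}

relay_frame 1 1   # relay on  (drive power cut when wired through NC)
relay_frame 1 0   # relay off (drive power restored)
```

For channel 1 this reproduces the two strings the script sends, \xA0\x01\x01\xA2 and \xA0\x01\x00\xA1.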

I also added support for uhubctl. In theory, you can use it instead of a relay, but I don’t have a suitable USB hub to test with. It may be useful for recovering flash drives.
ddrescue-loop -usb <ID> -pwc hub

uhubctl --search "$USBID" --action cycle --delay "$LOOPWAIT"

Connecting the relay to the docking station

The AgeStar 31CBNV1C docking station has a power switch. I connected the relay in its place.

Docking station AgeStar 31CBNV1C

The switch has five contacts. The outer ones are for mounting it to the board. You need to solder to the second and third from the left. I highlighted them in red in the photo.

I desoldered the switch completely, soldered on two wires, tinned their free ends and covered them with heat shrink. Then I assembled the docking station back into its case. Here’s the result.

Dock station + relay

The recovery process itself

I am writing this section also for those who are not familiar with Linux. I am not sure I will manage to explain everything clearly, but I will try. Most notes of this kind are hidden under spoilers.

I use a PC running Debian 11 GNU/Linux (I haven’t updated to Debian 12 yet out of laziness).
The docking station is connected to a USB3 port, and several 3.5″ hard drives are connected to SATA.

I save the image to a file on a disk with the Ext4 file system. The file is created as a sparse file, so space is used only for the data actually copied, not for the entire capacity of the faulty SSD. I also attach the file to /dev/loopN, which allows working with it in the same way as with a physical disk.

It is convenient to use the gnome-disk-utility GUI to create/attach/detach images.

gnome-disk-utility comes from the Gnome environment, but it has relatively few dependencies.

You need to create an image of the same or larger volume. In order not to make a mistake, you can copy and specify the size of the disk in bytes (displayed in gnome-disks in the right panel at the top when selecting the appropriate disk).

In the hamburger menu choose New Disk Image…, select bytes as the size unit in the window, paste the copied number into the field and remove the commas. Specify the name and the path to save to. Press Attach new image…

The created image will be attached to a free /dev/loopN in read/write mode. When you select and attach an existing image file, it is attached read-only by default. Don’t forget to uncheck the box Set up read-only device
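
For those who prefer the terminal, roughly the same can be done with truncate and losetup. This is only a sketch: the function names are hypothetical, the demo creates a tiny 1 MiB image in a temporary directory, and attach_image is defined but not called because it needs root.

```sh
#!/bin/sh
# Sketch: create a sparse image and attach it to a loop device.
# Function names and the demo size are hypothetical.

# make_sparse_image <path> <size_in_bytes>
make_sparse_image () {
    truncate -s "$2" "$1"
}

# attach_image <path>: attach read/write, print the /dev/loopN used (needs root).
attach_image () {
    losetup --find --show --partscan "$1"
}

# Demo with a tiny image; for a real drive take the size from
#   blockdev --getsize64 /dev/sdX
DEMO_DIR=$(mktemp -d)
DEMO_IMG="$DEMO_DIR/ssd.img"
make_sparse_image "$DEMO_IMG" 1048576

echo "apparent size: $(stat -c %s "$DEMO_IMG") bytes"
echo "allocated:     $(du -k "$DEMO_IMG" | cut -f1) KiB"
```

The difference between the apparent size and the allocated size is what makes the file sparse.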

If the SSD partition table is readable, you can build a domain file with the ddru_ntfsbitmap utility (part of ddrutility) before starting the copy. For partitions with the NTFS file system this limits copying to the occupied space only.

Unfortunately, in the case of an SSD this does not speed up the process, it only saves space for the image, because zeros from unused blocks are copied at full speed and without errors. Bad sectors are located where files were stored, and most of the time is spent by the drive on handling errors.

Creating a domain file using ddru_ntfsbitmap

In the commands below, replace X in /dev/sdX with the appropriate device letter (you can look it up in the same gnome-disk-utility).

You need to create a domain file bound to the entire disk, not to a separate partition. Therefore, you need to specify the device /dev/sdX in the commands, not a partition /dev/sdXN

For ddru_ntfsbitmap you need to calculate and specify the partition offset with the --inputoffset key, or -i for short.

First run sudo fdisk -l /dev/sdX
Find the desired NTFS partition in the table and copy the value from the Start column.
Then run the following command, substituting that value for START

sudo ddru_ntfsbitmap -i $((START*512)) -m mftdomain.map /dev/sdX domain.map

The required files domain.map and mftdomain.map will be created.

A few more files whose names begin with __ (double underscore) will also appear. They are not needed and can be deleted.
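
The offset arithmetic can be checked separately. In this sketch part_offset_bytes is a hypothetical helper and 2048 is only an example Start value; the 512-byte default assumes fdisk reports sectors of 512 bytes, which is worth confirming in the Units line of its output.

```sh
#!/bin/sh
# Sketch: convert the fdisk "Start" column (in sectors) to a byte offset.
# part_offset_bytes is a hypothetical helper; 2048 is only an example value.

# part_offset_bytes <start_sector> [logical_sector_size, default 512]
part_offset_bytes () {
    echo $(( $1 * ${2:-512} ))
}

part_offset_bytes 2048
```

The printed number is what would be passed to -i in the ddru_ntfsbitmap command.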

Start the copying process

The ddrescue-loop script can be copied to /usr/local/bin/

Copy it and grant execute permission

zcat ddrescue-loop-v0.2.1.gz | sudo tee /usr/local/bin/ddrescue-loop > /dev/null
sudo chmod +x /usr/local/bin/ddrescue-loop

First, run dmesg -Wt in a separate terminal to see what is happening with the disk.

ddrescue-loop must be run with root rights. dmesg needs them too, but only in distributions where kernel.dmesg_restrict=1 is set by default (Debian is one of them). For brevity I omit sudo in the commands below, but it is implied.

ddrescue-loop -usb 152d:0583 -loop 9999 -pwc rle /dev/loopN mapfile.log -b 4096 -c 32 -O -J

Let’s analyze the following parameters:

  • -usb 152d:0583 this is the VID:PID ID of the docking station (more precisely, the JMicron JMS583 USB-NVMe controller)
    You can view the list of connected devices by running ddrescue-loop -usb

  • -loop 9999 -pwc rle limit the number of restart attempts in a cycle and use a USB relay.

  • /dev/loopN where N is replaced with the appropriate number. It is assumed that the destination image has been created and attached (you can see it in gnome-disk-utility)
    Instead, you can simply specify a path for the image file, and ddrescue will create it itself.

  • mapfile.log The name of the ddrescue map/log file must be specified.

  • -b 4096 be sure to indicate the real size of the sector (physical block)
    The default is 512 and this does not correspond to modern drives, both SSD and HDD.

  • -c 32 limit the size of the cluster (how many sectors ddrescue will try to read at a time in normal mode before switching to trimming). The default is 128, and since we’ve increased the sector size, that’s already too much.

  • -O be sure to specify this, so that ddrescue tries to reopen the device file after every error. This is necessary so that, when further reading is impossible, the ddrescue process exits with an error and the script applies the disk stop/restart method.

  • -J is optional, as an additional check: on an error, reread the last good sector.
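
How -b and -c interact is simple arithmetic: one read request covers sector size × cluster size. The sketch below only does that multiplication; read_size_kib is a hypothetical helper and not part of ddrescue.

```sh
#!/bin/sh
# Sketch: size of one read request for a given -b / -c pair.
# read_size_kib is a hypothetical helper, not part of ddrescue.

# read_size_kib <sector_size_bytes> <cluster_sectors>
read_size_kib () {
    echo $(( $1 * $2 / 1024 ))
}

echo "-b 512  -c 128 : $(read_size_kib 512 128) KiB"
echo "-b 4096 -c 128 : $(read_size_kib 4096 128) KiB"
echo "-b 4096 -c 32  : $(read_size_kib 4096 32) KiB"
```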

If you created a domain file, run the first pass with -m mftdomain.map added at the end of the command line.
Then, when the entire MFT has been read, read all other sectors used by the file system with -m domain.map

Fine tuning during the process

If you stopped ddrescue with Ctrl+C to change the parameters, you can power cycle the drive before restarting, using the relay or uhubctl (if a hub is used instead of the relay):
ddrescue-loop -pwc rle or ddrescue-loop -usb <ID> -pwc hub

At the initial stage you can (but don’t have to) try to read large problem-free areas first. To do this, skip over a cluster of bad sectors by adding the -i option at the end, for example -i 30G, if reading in the forward direction. You can also read backwards; for that, specify -R and -s, for example -R -s 40G

ddrescue-loop -usb 152d:0583 -loop 9999 -pwc rle -wait 4 -act 23 /dev/loopN mapfile.log -b 4096 -c 32 -O -J -m domain.map

Here -wait 4 was added, i.e. the wait timer was reduced from 10 to 4 seconds, and
-act 23 reduces the ATA command timeout from 30 to 23 seconds.

These parameters were selected experimentally for the described case in order to try to reduce the waiting time for reconnection in case of SSD errors.

The -act value was chosen by watching the number of uas_eh_device_reset_handler start / uas_eh_device_reset_handler success messages in dmesg after each failed sector. At 30, the kernel managed to make two reset attempts. At 22, there were already three resets, which takes longer. The optimal value was 23.
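
Internally -act just writes the number into the device's timeout file in sysfs. The sketch below shows reading and changing that value by hand. SYSFS_ROOT and both function names are hypothetical, and the demo deliberately works on a scratch directory with a fake sdX entry instead of the real /sys.

```sh
#!/bin/sh
# Sketch: read and change the per-device command timeout through sysfs.
# SYSFS_ROOT and both function names are hypothetical.

# get_cmd_timeout <sdX>
get_cmd_timeout () {
    cat "$SYSFS_ROOT/block/$1/device/timeout"
}

# set_cmd_timeout <sdX> <seconds>  (needs root on a real system)
set_cmd_timeout () {
    echo "$2" > "$SYSFS_ROOT/block/$1/device/timeout"
}

# Demo against a scratch directory; on a real system use SYSFS_ROOT=/sys
SYSFS_ROOT=$(mktemp -d)
mkdir -p "$SYSFS_ROOT/block/sdX/device"
echo 30 > "$SYSFS_ROOT/block/sdX/device/timeout"

set_cmd_timeout sdX 23
echo "timeout is now $(get_cmd_timeout sdX) s"
```

The value does not survive a reconnect of the drive, which is why the script writes it again before every ddrescue run.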

When the main stages of reading are complete, the block size can be increased for the scraping phase to speed it up, at the cost of quality.
For example, specify -b 16Ki -c 1 or -b 32Ki -c 1 instead of -b 4096 -c 32

When the process is nearing completion

In this case, reading one SSD, including trimming, took about 14 days, and scraping is still not finished.

Intermediate/final result: 98.51%

File systems on the partitions of the resulting image can be mounted by the Linux kernel directly from /dev/loopNpP, where P is the partition number. This can be done using gnome-disk-utility. The image must be in read-only mode for this. It is safer to detach the image and attach it again read-only.

But to improve the results by searching for MFT fragments and for files by signature, it is better to use specialized software. Among the free tools there is TestDisk/PhotoRec.

I have been using the Linux version of DMDE for a long time. Unfortunately, it is closed-source, paid software. However, the free version (for personal non-commercial use only) limits only the number of files that can be recovered at one time, not their size. This is great for assessing the recoverability of files.

The use of such software and the analysis of the various related nuances definitely will not fit into this article. And my knowledge of the internal structure of file systems is very superficial.

Conclusion

I consider the recovery in this case successful. The user’s files were extracted and are readable. In fairness, not everything came out intact, but it hardly makes sense to keep scraping, since only problem areas with a total volume of 1.54GB remain, and quite possibly 90% of them are unreadable.

I hope someone else will find my experience and script useful. I think the need for data recovery from SSDs is not getting any smaller. Don’t forget to remind users about backups.

Thank you for your attention! And good luck with your experiments!
