Sparta, WI, September 2025
The Good
We helped Bob get a new computer, and we helped Brenda with her new computer. Steve has been having some computer trouble recently, so he asked for some help too. It turns out that his computer was stalling out while he was using it, which was causing anger management issues.
The short version is that we saved Steve’s data. The long version is that it took about 24 to 32 hours to solve the problem. Some of it was wasted preemptive research because we did not yet know what was being asked of us.
The Bad
We thought Steve was looking to get a new computer, so John did eight to twelve hours of research on budget computers. As it turned out, Steve just wanted to hand his computer off to have “stuff” uninstalled from the laptop to make it “go faster”. Sadly for him, that is not how these things work. First, uninstalling stuff seldom improves computer speed enough to matter. Second, it is hard to solve a customer’s problem with no understanding of it. We stopped by and talked to him about his budget and how he used the computer because that significantly impacts the solutions. On an unrelated note, we have a list of budget laptops that were on sale in Sparta last week, if you want it.
We learned a few things.
- He only uses a few programs on the laptop for his business. He does not need it for anything else.
- The programs are all business applications, so they are not very demanding.
- The laptop completely stalls out for an extended time (30+ seconds) and becomes unresponsive.
- The laptop gets extremely hot.
There are some common reasons that the Windows OS can stall – virus scanning, installing updates, CPU throttled down due to heat, too many programs running for the CPU/RAM to handle. Usually, these stall the system for only a few seconds or just make tasks run slowly. They do not normally freeze it for 30+ seconds.
There are a few common reasons that a laptop can get very hot – poor design, failed fans, too much pet hair or dust, or an older overworked CPU (especially older AMD CPUs). About this time, Steve mentioned that it gets too hot to keep on his lap…
Wait, what? Too hot to keep on your lap?!?! As it turns out, the laptop cooling vents are on the bottom. While using it, Steve was blocking all airflow to the computer. He appears to enjoy roasting his chestnuts over an AMD CPU. ‘Tis the season!
We warned Steve that heat is bad for a computer and can damage it. I think we told him about a friend of John’s who let her six-year-old daughter play games on her laptop. The daughter put it on a pillow in her lap, which blocked the cooling vents. Excessive heat is especially bad for drives. The friend’s hard drive failed, and John was enlisted to recover data. When he examined the drive, the internal statistics showed that the hard drive got hot enough to boil an egg. Boiling eggs on your hard drive is no more recommended than roasting your chestnuts over your CPU.
Steve finished his invoices the next day … with the laptop on his lap … and handed off the laptop to us.
The first thing we did was buy a thumb drive and back up Steve’s business data. (You back up your data, right?) During this process, we found out what the real issue was. Want to guess what the problem turned out to be? If you guessed failing hard drive, give yourself a cookie. (No, a web browser cookie. You can get them from almost any internet site.) John’s best guess is that, while Steve was working, the virus scanning or search indexing would hit a bad file and stall access to the hard drive until it timed out.
The file backup kept stalling out on the same file during the transfer. When you let the copy run until it timed out, Windows reported an I/O error. These are common symptoms of hard disk drive failure – failing on the same file with an I/O error. It took a few hours to finish copying files, but his business data seemed largely intact. He lost one old invoice from 2019. It is hard to know if anything else was corrupted until he looks at each file.
He could buy a new computer, but part of the cost with buying computers is the software. Even if Steve bought a cheap replacement computer, it would be an additional $200+ in software. Maria had a laptop with one of the programs he needed installed, and Kate did some research to find that the other could be purchased for just over $100. Maria’s laptop is a third viable alternate solution to fixing Steve’s laptop or buying a new laptop and software. Fixing the laptop would be cheapest at under $100. Using Maria’s laptop would be next at $100+, and buying a new computer would be $500+ with software.
John has not done this in a while, so it took some time to remember how to get certain things done and learn newer, better ways to do other things. He was able to copy an image off of Steve’s failing drive onto a new drive. The image was corrupt, but Windows was able to recover and boot. The damage appeared to be in the first 4GB – 22GB, which was likely repairable operating system files. Most commercial laptops (e.g., Hewlett-Packard, Dell, etc.) have recovery software built-in that will fix minor issues. The video driver was also corrupt, so he fixed that. It should properly go into sleep mode again.
Now the laptop just needs to be tested by the user to make sure none of the programs or data he needs are corrupted. The user should consider not treating the laptop like a combination easy bake oven and basketball. Maybe he should buy one of those squeeze toys where the eyes bulge out when you squeeze them. Is this a possible Christmas present? ‘Tis the season!
And, The Ugly
Omitting the steps that were irrelevant, such as trying to access a non-existent directory on his remote storage device, this is how John recovered the hard drive. (You might want to just skip this section, just read the background, or just nod and smile.)
Background
A hard drive is a type of storage device. The two common styles today are HDDs, hard disk drives with spinning magnetic platters, and SSDs, solid state drives with persistent memory chips. A HDD is similar in concept to a record player that can read and write. The main differences are that it has a stack of permanently installed magnetized platters instead of a single plastic disk and a magnet on the tonearm instead of a needle.
At the most basic level, the data is stored by magnetizing very small areas on the platter in concentric rings. In modern drives, the information written to the platter is encoded so that the data can still be decoded when there are a few errors reading the small magnetized areas. When an area of the drive has read errors too many times, the HDD remaps that bad region to a pool of spare good regions on the drive. When the drive starts to fail, you have more read errors than you can correct and/or no more spare good locations to map the bad locations. Under normal conditions, this is when you start losing data a little at a time. Under abnormal conditions, you trip and drop your HDD into a wood chipper and lose everything all at once.
Ones and Zeros, bits, are digital letters. (Can you imagine how short the alphabet song would be with only two letters?) These bits are encoded and written to the HDD by magnetizing small areas of the platter. The arrangement of the data on the drive depends greatly on the application. For modern computers, a HDD is segmented into partitions. A partition is just like what it sounds like. A defined region of the HDD, like fences in a community. This land is my land. That land is your land. Or maybe, this platter is my partition; that platter is your partition.
When you look at your computer, you might see Drive C: and Drive D:. Those could be on partitions on two different HDDs or they could be two partitions of the same HDD. A file system is a way to index and find data on a partition. If the books in a library were your files, the old card catalogue would be your file system. When you put a file system on a drive or partition, it becomes a volume from which your computer can read and write files.
The sequential bits of an entire drive are called an image. Since it is an image of the drive data, the image can contain multiple partitions. Modern commercial Windows computers purchased in a store usually have multiple partitions. These are for booting, Windows system repair, Windows system restore, and the installed Windows operating system that you use from day to day.
This should be enough information to understand the discussions below, but the commands require more knowledge about Linux.
Save the business data
This only works with a laptop that still boots and functions to some degree.
This was a reasonable option because it prioritized saving Steve’s business data, but it put some unnecessary stress on the drive.
- Boot
- Log on
- Insert thumb drive
- Copy files with explorer.
The computer was still functioning with stalls after this step.
Clean up the drive errors
The system was completely stalling on some files and taking 15 minutes to time out. Operating systems and hard drives are designed to handle failures. If the drive fails enough times reading the same data, the drive will relocate the data to a new sector. Trying to read an image from a drive encountering errors can be very slow. There are people with drive recovery programs running for weeks to recover their drive. Checking the disk was intended to force the bad data to be copied into good areas.
This was not a great option because it put significant unnecessary stress on a failing drive. If the drive is too far gone, this can cause catastrophic failure. It did make the laptop not boot for a while, but it did not destroy the drive.
- chkdsk C: /r /f /x
- You can start chkdsk while Windows is running, but it cannot repair the system drive while that drive is in use. Instead, you want to schedule a scan on boot. Once the boot scan starts, you cannot safely stop it. It can take hours. Steve’s took about 2 hours. (FYI, it appears that /x includes /f.)
- It should automatically reboot
The computer booted once after this step, but it failed to boot a second time. (The failing to boot might have been due to a cat getting on the desk and jiggling the power loose, so it ran out of battery right at the end of the scan.) With continued use, the laptop would have started failing to boot within the next few weeks to a month. After imaging the drive, Windows repair was able to make the system boot again.
Attempt to image the drive
This is an attempt to save as much of the drive as possible. This can save the Windows install, installed programs, and business data. The original option was to use the Linux “dd” (data dump) command, but “dd” does not cope well with read errors. A recovery variant called “ddrescue” is intended for this purpose.
After some research, this is the smartest option, and it is where we should have gone immediately after finding out that the drive was failing. If your drive is failing, you want to spend any remaining life of the drive recovering the data as gently as possible.
Linux has free tools that can be used for system recovery, but it requires a lot more work and knowledge than running paid software. Fedora is the version of Linux that John generally uses. Since he could not find an existing CD, he had to burn a new one for this recovery. Since Steve’s computer was not recognizing our external CDROM for booting, John pulled the hard drive out and put it in one of his computers with an internal CDROM.
Remove the Hard Drive
(For this step, we do this for Steve’s laptop and the recovery laptop.)
- Remove screws from back cover
- Gently pry off back cover
- Ground both hands on the metal chassis
- Remove battery or disconnect the power connector for the battery
- Press the power button to discharge residual power
- Unscrew hard drive rails from laptop
- Disconnect hard drive connector (avoid touching the circuit board)
- Unscrew the hard drive rails from the hard drive
- Do not lose the screws
Install the Hard Drive
(For this step, install the failing hard drive in the recovery laptop.)
- Find the screws you probably lost
- Ground both hands on the metal chassis
- Screw on laptop drive rails
- Connect failing hard drive
- Screw hard drive rails into laptop
- Insert battery
- Screw back cover onto the recovery laptop
Boot to Fedora Live
- Configure recovery laptop BIOS to boot to CDROM
- Insert Fedora Live CD into CDROM drive
- Boot
- Confirm boot to Fedora Live CD and NOT install Linux to hard drive
Install ddrescue
The “ddrescue” application does not come standard on the live CD, so it has to be installed from a terminal window.
sudo dnf install ddrescue
Verify the Device to Use
Linux uses the term “device” as the name of many of the low level hardware and software entities. Devices are located under the /dev directory. On Unix-ish systems, drives can appear as /dev/hd* or /dev/sd*. If you have more than one drive, you really want to make sure that you image the right one. There are several tools that can do this, but “lsblk” and “mount” come with Linux.
sudo lsblk
or
sudo lsblk -o NAME,MODEL,SERIAL,SIZE,STATE,FSTYPE
or
mkdir ./Local
sudo mount -r -t ntfs /dev/sdb3 ./Local
# After you are done looking, unmount Local with “umount”
sudo umount ./Local
If the drives are different sizes and partitions, the first is adequate. If the drives have different manufacturers and models, the second one is better. If they are the same models and layout, you might need the third option to see the files on the drive. Steve’s drive had different partitions and model, so it was easy to identify it as “/dev/sdb”.
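Once the device is identified, the drive’s own SMART counters can give a rough idea of how many regions it has already remapped and how hot it runs. The sketch below is not something from the original recovery. It assumes the “smartmontools” package is installed, that the failing drive is /dev/sdb, and that the drive reports the commonly used attribute IDs 5, 194, 197, and 198 (vendors differ, so treat the output as a hint). The function name “smart_summary” is made up for this example.

```shell
#!/usr/bin/env bash
# Print NAME=RAW_VALUE for the SMART attributes that usually matter on a
# failing HDD: 5 (reallocated sectors), 194 (temperature), 197 (pending
# sectors), 198 (offline uncorrectable). Reads `smartctl -A` output on stdin.
smart_summary() {
    # In smartctl's attribute table, column 1 is the ID, column 2 the name,
    # and column 10 the raw value.
    awk '$1 ~ /^(5|194|197|198)$/ { print $2 "=" $10 }'
}

# Typical use on the live system (assumes the failing drive is /dev/sdb):
#   sudo dnf install smartmontools
#   sudo smartctl -A /dev/sdb | smart_summary
```

Non-zero reallocated or pending sector counts are consistent with the kind of read errors described above.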
Mount the Remote Storage
We have an 8TB network attached storage device, so that was the most convenient place to put the disk image. It has to be mounted onto the Linux filesystem to write to it. Leaving off the password causes the mount command to prompt for the password, which can work better when you have special characters in your password.
mkdir ./Remote
sudo mount -o username=user_name -t cifs //ip_address/homes ./Remote
If your recovery system can hold two drives, you can do a direct drive to drive transfer. This is faster, but it can be riskier too. If you reverse the devices, you wipe out all of your data.
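For the drive to drive case, a small guard can catch the most obvious mix-ups before anything is written. This is only a sketch under assumptions: the function name “check_clone_pair” and the device names are made up, and the “target has no partitions” test only makes sense when the new drive really is blank. It does not replace checking the model and serial numbers by eye.

```shell
#!/usr/bin/env bash
# Refuse an obviously wrong source/target pair before a drive-to-drive copy.
# Returns 0 only if both arguments are distinct block devices and the
# target has no partitions (i.e. it looks like a blank drive).
check_clone_pair() {
    local src="$1" dst="$2"
    if [ "$src" = "$dst" ]; then
        echo "source and target are the same device" >&2
        return 1
    fi
    if [ ! -b "$src" ] || [ ! -b "$dst" ]; then
        echo "both arguments must be block devices, e.g. /dev/sdb" >&2
        return 1
    fi
    # lsblk prints the disk itself plus one line per partition.
    if [ "$(lsblk -n -o NAME "$dst" | wc -l)" -gt 1 ]; then
        echo "$dst already has partitions; is it really the new drive?" >&2
        return 1
    fi
}

# Example (device names are assumptions; -f is needed to write to a device):
#   check_clone_pair /dev/sdb /dev/sdc && \
#       sudo ddrescue -n -f /dev/sdb /dev/sdc ./sdb-direct.map
```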
First Pass Low Stress Recovery
The “ddrescue” application is designed to do smart recovery. First, it reads the drive from the front to the back, and it skips ahead when it reaches an error. Next, it reads the drive from back to front, and it skips when it reaches an error. This algorithm reads the patches of good data without stressing the drive by trying to read the bad data. It writes what it learns about good and bad areas into a mapfile that is used to avoid stressing the drive by reading data that has already been recovered.
sudo ddrescue -n /dev/sdb ./Remote/sdb-n.img ./Remote/sdb-n.map
This took 7.5 hours to complete. There were about 250 KB of data that were not recovered. Based on the location of the data on the drive, it was probably parts of the operating system. The pattern seemed to be a stripe of damaged area on the platter, like someone tossed their laptop on the couch and the arm scratched the platters. We are sure this never actually happened …
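The amount of unrecovered data can also be read back out of the mapfile. The sketch below assumes the usual GNU ddrescue mapfile layout: comment lines start with “#”, there is one status line, and every other line is “position size status” in hexadecimal, where “+” means recovered and “?”, “*”, “/”, or “-” mean not recovered yet. The function name “unrecovered_bytes” is made up for this example.

```shell
#!/usr/bin/env bash
# Sum the sizes of all mapfile blocks that are not marked as recovered.
unrecovered_bytes() {
    local map_path="$1" total=0 pos size status
    while read -r pos size status; do
        # Skip comments and blank lines.
        case "$pos" in '#'*|'') continue ;; esac
        # Block lines carry the status in the third field; the status line
        # has it in the second field and therefore falls through here.
        case "$status" in
            '?'|'*'|'/'|'-') total=$(( total + $size )) ;;
        esac
    done < "$map_path"
    echo "$total"
}

# Example: unrecovered_bytes ./Remote/sdb-n.map
```

The result is in bytes, so a value around 250000 would match the figure quoted above.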
Copy the Disk Image
The recovery pass of “ddrescue” puts more stress on the drive, and it updates the image file with any bad areas that are recovered. To avoid damaging the initial image, a copy of the image is used for this pass.
sudo cp ./Remote/sdb-n.img ./Remote/sdb-r.img
sudo cp ./Remote/sdb-n.map ./Remote/sdb-r.map
Copying the image took 2.5 hours.
Second Pass High Stress Recovery
This copy will try to read the bad areas repeatedly to see if it can get the data off the drive. It puts a lot of stress on the drive. This configuration does three passes trying to read the data.
sudo ddrescue -d -r3 /dev/sdb ./Remote/sdb-r.img ./Remote/sdb-r.map
This recovery attempt took about two hours, and it did not collect any new data. You can configure it to infinitely read the data until the drive relocates it. This is very stressful, and we wanted the drive available to image again, in case something went wrong.
Read the Partition Table
A drive has a table on it that explains what the partitions on the drive are. If this information is corrupt, it is much more difficult to recover data. It probably requires scanning the drive for patterns that indicate that there is a partition. Fortunately, this was not required. The partition table should be saved to a text file. If the “parted” application is not available, you might need to install it with “dnf”.
# only if parted is not already available
sudo dnf install parted
sudo parted ./Remote/sdb-r.img
> unit B
> print
The table should give the offsets to the start of the partitions on the hard drive in bytes.
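To keep the table in a text file and pull out just the offsets, parted can also be run non-interactively. This is a sketch, not the exact commands used in the recovery: the output file name “sdb-r.parted.txt” and the function name “partition_offsets” are made up, and the parsing assumes parted’s normal table layout, where the partition number is the first column and the start is the second.

```shell
#!/usr/bin/env bash
# Read the text printed by `parted <image> unit B print` on stdin and print
# "partition_number start_offset_in_bytes" for each partition row.
partition_offsets() {
    awk '$1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+B$/ {
        sub(/B$/, "", $2)   # drop the trailing unit letter
        print $1, $2
    }'
}

# Example: save the full table and list the offsets in one go.
#   sudo parted -s ./Remote/sdb-r.img unit B print \
#       | tee ./Remote/sdb-r.parted.txt | partition_offsets
```

Each number in the second column is a byte offset that can be used when mounting a partition from the image.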
Examine the Disk Image
You can mount the image on Linux to look at what was recovered. The # after “offset=” is the byte offset of the partition within the image, taken from the partition table above. If the partition is FAT32, use “vfat” instead of “ntfs”.
mkdir ./Local
sudo mount -o loop,ro,offset=# -t ntfs ./Remote/sdb-r.img ./Local
# When you are done, unmount it with “umount”
sudo umount ./Local
Shut Down the Recovery Laptop
You do not want the failing drive to be idling and active, so shut down the recovery laptop.
Restore the Hard Drive
Once you have an image, you can restore it on a functioning hard drive. If the failing drive was not too damaged, the system will recover and be usable again.
Remove Failing Hard Drive
Remove the failing hard drive from the recovery machine, as noted above.
Install New Hard Drive
Install the new hard drive into the recovery machine, as noted above.
Boot to Fedora Live
Same as the steps above, except the BIOS should already be set to boot to CDROM.
Mount the Remote Storage
Same as the steps above. Since it is a “live CD”, the Remote dir you created does not persist between boots.
Verify the Device to Use
Some systems are not consistent about what device is assigned, so verify which device you want to write to. It should be easier, since the new drive should be blank. It is still /dev/sdb on the recovery laptop.
Copy the Image to the Drive
There may be bad data in the disk image, but the image on the remote storage should be readable without errors. Using “dd” should be fine for this part, and it should be faster. We copy data in 4M chunks (because it is faster) from the image file to the hard drive device.
sudo dd if=./Remote/sdb-r.img of=/dev/sdb bs=4M status=progress
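After the copy, it is possible to check that the drive really holds what the image holds. The sketch below was not part of the original recovery. It assumes GNU “cmp” (as found on Fedora), and the function name “verify_restore” is made up. Because the new drive is usually larger than the image, it compares only as many bytes as the image contains.

```shell
#!/usr/bin/env bash
# Compare an image file with the beginning of the drive it was written to.
# Succeeds (exit status 0) only if the first <image size> bytes are identical.
verify_restore() {
    local img="$1" dev="$2" size
    # Size of the image in bytes; the arithmetic strips any padding spaces.
    size=$(( $(wc -c < "$img") ))
    cmp -n "$size" "$img" "$dev"
}

# Example, from a root shell because reading the raw device needs root:
#   verify_restore ./Remote/sdb-r.img /dev/sdb && echo "drive matches image"
```

This reads the whole image and the whole copy again, so expect it to take about as long as the copy itself.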
Shut Down Recovery Laptop
We need to re-install the hard drive, so shut down the recovery laptop.
Remove New Drive from Recovery Laptop
As noted above, remove the new drive from the recovery laptop.
Install the New Hard Drive
As noted above, install the new hard drive in Steve’s laptop to replace the one that was failing.
Test Repaired Laptop
Just because the laptop boots and runs does not mean it will be stable. There could be damage to other components from the heat. There could be corrupted data that has not been touched yet. Only time will tell on this.
- Boot the repaired laptop
- Log on
- Let Windows do any system checks, drive repairs, and/or updates
- Test the required functionality
Revert the Recovery Laptop
- As noted above, install the original drive into the recovery laptop
- Revert the BIOS to boot from the hard drive instead of the CDROM
- Boot the recovery laptop and verify that it functions.