Linux Recovery: Cannot SSH to Linux VM due to file system errors (fsck, inodes)
When a Linux VM requires fsck to repair file system issues, manual intervention is required. Below are four examples of how to identify file system issues by looking at the boot diagnostics of a given VM under:
Virtual Machines > VMNAME > All settings > Boot diagnostics
Example (1)
Checking all file systems.
[/sbin/fsck.ext4 (1) -- /] fsck.ext4 -a /dev/sda1
/dev/sda1 contains a file system with errors, check forced.
/dev/sda1: Inodes that were part of a corrupted orphan linked list found.
/dev/sda1: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY
Example (2)
EXT4-fs (sda1): INFO: recovery required on readonly filesystem
EXT4-fs (sda1): write access will be enabled during recovery
EXT4-fs warning (device sda1): ext4_clear_journal_err:4531: Filesystem error recorded from previous mount: IO failure
EXT4-fs warning (device sda1): ext4_clear_journal_err:4532: Making fs in need of filesystem check.
Example (3)
[ 14.252404] EXT4-fs (sda1): Couldn't remount RDWR because of unprocessed orphan inode list. Please unmount/remount instead
An error occurred while mounting /.
Example (4) - This one in particular is the result of a clean fsck. In this specific case there are also additional data disks attached to the VM (/dev/sdc1 and /dev/sde1)
Checking all file systems.
[/sbin/fsck.ext4 (1) -- /] fsck.ext4 -a /dev/sda1
/dev/sda1: clean, 65405/1905008 files, 732749/7608064 blocks
[/sbin/fsck.ext4 (1) -- /tmp] fsck.ext4 -a /dev/sdc1
[/sbin/fsck.ext4 (2) -- /backup] fsck.ext4 -a /dev/sde1
/dev/sdc1: clean, 12/1048576 files, 109842/4192957 blocks
/dev/sde1: clean, 51/67043328 files, 4259482/268173037 blocks
To recover the VM to a normal state, you will need to delete the inaccessible VM while keeping its OS disk, and then deploy a new recovery VM using the same Linux distribution and version as the inaccessible VM.
NOTE: We highly recommend making a backup of the VHD from the inaccessible VM before going through the recovery steps. You can make a backup of the VHD by using Microsoft Storage Explorer, available at https://storageexplorer.com
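If you prefer scripting the backup, a server-side blob copy can also be started from Azure PowerShell. The following is a minimal sketch only; the storage account name and key, and the container and blob names, are placeholders for your own values:

# Placeholders: storage account name/key and the container/blob names are examples only.
$ctx = New-AzureStorageContext -StorageAccountName "mystorageaccount" -StorageAccountKey $storageKey
# Start a server-side copy of the OS disk VHD into a "backups" container in the same account.
Start-AzureStorageBlobCopy -SrcContainer "vhds" -SrcBlob "original-osdisk.vhd" `
    -DestContainer "backups" -Context $ctx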
The steps are described below:
A = Original VM (Inaccessible VM)
B = New VM (New Recovery VM)
- Stop VM A via the Azure Portal
- For a Resource Manager VM, we recommend saving the current VM information before deleting it:
- Azure CLI: azure vm show ResourceGroupName LinuxVmName > ORIGINAL_VM.txt
- Azure PowerShell: Get-AzureRmVM -ResourceGroupName $rgName -Name $vmName
- Delete VM A BUT select “keep the attached disks”
NOTE: The option to keep the attached disks is only available for classic deployments; for Resource Manager VMs, deleting a VM always keeps its OS disk by default.
- Once the lease is cleared, attach the OS disk from VM A as a data disk on VM B via the Azure Portal: Virtual Machines > select “B” > Attach Disk. For a Resource Manager VM this step can also be scripted, as in the sketch below.
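A minimal Azure PowerShell sketch of the attach step, assuming VM B is a Resource Manager VM with unmanaged disks; $vhdUri, the disk name, and the recovery VM name are placeholders:

# Placeholders: $rgName, "RecoveryVM", "vmA-osdisk" and $vhdUri must match your environment.
$vm = Get-AzureRmVM -ResourceGroupName $rgName -Name "RecoveryVM"
# Attach the kept OS disk VHD of VM A as a data disk on LUN 0.
$vm = Add-AzureRmVMDataDisk -VM $vm -Name "vmA-osdisk" -VhdUri $vhdUri -Lun 0 -CreateOption Attach
Update-AzureRmVM -ResourceGroupName $rgName -VM $vm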
- On VM “B” the disk will eventually attach; you can then identify it and repair its file system as described below.
- Locate the device name of the attached disk by looking in the relevant log file on VM “B”; note that each Linux distribution is slightly different:
- grep SCSI /var/log/kern.log (ubuntu, debian)
- grep SCSI /var/log/messages (centos, suse, oracle, redhat)
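The matching lines will look roughly like the following (illustrative output only; the timestamp, SCSI host numbers, and device letter will differ on your VM):

Oct  5 10:07:10 recoveryvm kernel: [  562.017194] sd 5:0:0:0: [sdc] Attached SCSI disk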
- You will not be able to mount the damaged file system yet, so first determine the correct partition against which to run the disk check:
sudo -i
fdisk -l (this will list the attached disks; use it together with df -h)

Sample outputs from both commands:
# fdisk -l
Disk /dev/sdc: 32.2 GB, 32212254720 bytes
255 heads, 63 sectors/track, 3916 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000c23d3
Device Boot Start End Blocks Id System
/dev/sdc1 * 1 3789 30432256 83 Linux
/dev/sdc2 3789 3917 1024000 82 Linux swap / Solaris

# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 29G 2.2G 25G 9% /
tmpfs 776M 0 776M 0% /dev/shm
/dev/sdb1 69G 180M 66G 1% /mnt/resource

After looking at the output of the above commands we can see that sda1 and sdb1 are mounted as part of the local OS, while sdc1 is not mounted; in this case we will run fsck against /dev/sdc1.

NOTE: Prior to running fsck, please capture the following data and send the *.log files to your Microsoft Support Engineer (sdc and sdc1 are used as examples):

fdisk -l /dev/sdc > /var/tmp/fdisk_before.log
dumpe2fs /dev/sdc1 > /var/tmp/dumpe2fs_before.log
tune2fs -l /dev/sdc1 > /var/tmp/tune2fs_before.log
e2fsck -n /dev/sdc1 > /var/tmp/e2fsck_before.log

Now proceed to run fsck on the desired partition:
fsck -yM /dev/sdc1
fsck from util-linux-ng 2.17.2
e2fsck 1.41.12 (17-May-2010)
/dev/sdc1: clean, 57029/1905008 files, 672768/7608064 blocks
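Once fsck reports the partition as clean, you can optionally verify the repair by mounting it before detaching the disk. A minimal sketch (the mount point /mnt/recovery is arbitrary):

mkdir /mnt/recovery
mount /dev/sdc1 /mnt/recovery   # should now mount without errors
ls /mnt/recovery                # spot-check the contents
umount /mnt/recovery            # unmount before detaching the disk in the portal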
- Detach the disk from VM B via the Azure portal
- Recreate the original VM A from the repaired VHD
For a Classic VM:
Recreate the original VM A (Create VM from Gallery, select My Disks); you will see the disk referring to VM A. Select the original Cloud Service name.
For a Resource Manager VM you will need to use either PowerShell or the Azure CLI tools; the articles below have steps to recreate a VM from its original VHD:
Azure PowerShell: How to delete and re-deploy a VM from VHD
Azure CLI: How to delete and re-deploy a VM from VHD
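As a rough outline of what those articles cover, a Resource Manager VM can be rebuilt around the existing unmanaged OS disk along the following lines. This is a sketch only; the VM size, names, NIC ID, and VHD URI are placeholders that should come from the ORIGINAL_VM information saved earlier:

# Placeholders throughout - use the values captured before deleting VM A.
$vm = New-AzureRmVMConfig -VMName "LinuxVmName" -VMSize "Standard_DS1_v2"
# Attach the repaired VHD as the OS disk instead of provisioning a new one.
$vm = Set-AzureRmVMOSDisk -VM $vm -Name "osdisk" -VhdUri $osDiskUri -CreateOption Attach -Linux
# The NIC ID comes from the saved VM information.
$vm = Add-AzureRmVMNetworkInterface -VM $vm -Id $nicId
New-AzureRmVM -ResourceGroupName $rgName -Location $location -VM $vm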