joongul
Posts: 5
Joined: Sun Nov 10, 2019 12:52 am

lvm: pvmove crashed, how to recover?

Fri Nov 29, 2019 2:12 am

Hello,

My system consists of a Raspberry Pi 4 running Raspbian with the 64-bit kernel. My home directory is on an LVM raid5 volume made of two 4T drives and a 2T drive, with a cache on SSD. A new 4T drive arrived yesterday and I started a pvmove from the 2T drive to the new one. pvmove failed with the following dmesg:

[ 9667.686238] INFO: task usb-storage:176 blocked for more than 120 seconds.
[ 9667.686246] Tainted: G C 4.19.75-v8+ #1270
[ 9667.686252] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 9667.686260] usb-storage D 0 176 2 0x00000028
[ 9667.686275] Call trace:
[ 9667.686286] __switch_to+0x94/0xf0
[ 9667.686297] __schedule+0x2f4/0x870
[ 9667.686308] schedule+0x3c/0xa0
[ 9667.686317] schedule_timeout+0x1f4/0x430
[ 9667.686325] wait_for_common+0xc0/0x180
[ 9667.686334] wait_for_completion+0x28/0x38
[ 9667.686346] usb_sg_wait+0xf4/0x160
[ 9667.686357] usb_stor_bulk_transfer_sglist.part.3+0xd0/0x138
[ 9667.686366] usb_stor_bulk_srb+0x90/0xa0
[ 9667.686376] usb_stor_Bulk_transport+0x138/0x370
[ 9667.686385] usb_stor_invoke_transport+0x54/0x4b0
[ 9667.686395] usb_stor_transparent_scsi_command+0x28/0x38
[ 9667.686405] usb_stor_control_thread+0x1c0/0x260
[ 9667.686415] kthread+0x104/0x130
[ 9667.686424] ret_from_fork+0x10/0x18

I rebooted from my rescue system on an SD card, and got the following dmesg during boot:

[ 15.565790] device-mapper: raid: Failed to read superblock of device at position 2
[ 15.601120] md/raid:mdX: not clean -- starting background reconstruction
[ 15.601216] md/raid:mdX: device dm-11 operational as raid disk 0
[ 15.601226] md/raid:mdX: device dm-13 operational as raid disk 1
[ 15.608330] md/raid:mdX: cannot start dirty degraded array.
[ 15.630134] md/raid:mdX: failed to run raid set.
[ 15.630146] md: pers->run() failed ...
[ 15.630250] device-mapper: table: 253:17: raid: Failed to run raid array
[ 15.630260] device-mapper: ioctl: error adding target to table

So I tried to see the current state of the logical volume:

sudo lvs -a
Number of segments in active LV VG/pvmove0 does not match metadata.
Number of segments in active LV VG/pvmove0 does not match metadata.

and the logical volume VG/home is inactive. I have tried vgcfgrestore, both from the backup file and from older archive files, but I cannot make the volume active. All the HDs appear to be working normally.

My questions:
1. pvmove failed but I would think 2 disks out of 3 for my raid5 remain sound, so I should be able to reconstruct my logical volume. How do I do it? I suspect I need to manually edit the metadata, but I don't know what I am doing and I am scared. Google doesn't give me enough information either...

2. Important data is backed up so I can delete the volume and start from scratch, which might be an easier way. But if I try lvremove, I get:

sudo lvremove VG/home
Cannot rename locked LV home_corig_rimage_2
Failed to uncache VG/home.

How can I remove the lock and delete the volume, if I cannot reconstruct the volume?

3. My experience does not give me confidence about lvm+raid. Should I avoid lvm altogether and just use mdadm? I think I will use zfs when it is available as a package.

swampdog
Posts: 300
Joined: Fri Dec 04, 2015 11:22 am

Re: lvm: pvmove crashed, how to recover?

Fri Nov 29, 2019 6:05 am

My system consists of a Raspberry Pi 4 running Raspbian with the 64-bit kernel
I'd suspect the 64-bit kernel..

Code: Select all

[ 9667.686238] INFO: task usb-storage:176 blocked for more than 120 seconds.
..or your usb adapter.

I'm afraid if you really wanted this data you should not have gone bleeding edge and should have done more testing. :-|

'pvmove' (like 'lsof') does not "fail". It can be interrupted at any time before completion and will continue from where it left off when re-run. What it does do, by its nature, is hammer at least two disks, so if your storage wasn't 100% rock solid then faults are likely to trigger. I went through multiple USB/SATA adapters before I found ones that work - sometimes they'd work for days then randomly fail. Eventually I went for a powered adapter.
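If the storage itself turns out to be sound, resuming or backing out of the move is normally just (a rough sketch - run it against your own VG, nothing below is taken from your setup):

Code: Select all

# pvmove with no arguments restarts any unfinished move recorded in the metadata
sudo pvmove
# or, to abandon the move instead
sudo pvmove --abort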

Do not use raid5. It needs four drives and only one can fail. Use raid6 or just use mirroring, raid1. Personally I don't see much advantage to using software raid on an rpi atm. A minimal reasonable disk linux server uses three mdadm devices across 4 disks..

/dev/md0 raid1 /boot (/dev/sda1 /dev/sdb1, with /dev/sdc1 /dev/sdd1 as hot spares)
/dev/md1 raid6 OS (/dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2)
/dev/md2 raid6 DATA (/dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3)
..with a grub MBR install on /dev/sda and /dev/sdb. The rpi has no grub, so /dev/md0 is pointless, and as LVM can do mirroring, why bother with /dev/md1 and /dev/md2?
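For reference, that sort of layout would be built roughly like this (a sketch only - device names are illustrative, not a recipe for the OP's disks):

Code: Select all

# /boot mirrored on sda1/sdb1 with sdc1/sdd1 as hot spares
mdadm --create /dev/md0 --level=1 --raid-devices=2 --spare-devices=2 \
    /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
# OS and data as raid6 across all four disks
mdadm --create /dev/md1 --level=6 --raid-devices=4 /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2
mdadm --create /dev/md2 --level=6 --raid-devices=4 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3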

From what you've posted it sounds like you've only ever had a degraded (3-disk) raid5, and that you aren't actually using software raid but LVM mirroring. It's likely that by trying commands at random you've destroyed any chance of getting the data back, but if you want to attempt it..

Code: Select all

lvs -a -o +devices
lvmdump -a -m
..will give you a hint as to what state things are in. Ordinarily one would first attempt to continue the 'pvmove', but if your underlying issue is hardware then that would be a bad idea. The second option is 'pvmove --abort'. The third option is to wade through the files in that lvmdump to find the LVs that are locked & figure out what options, if any, you have (may involve 'lvconvert').
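For the third option, roughly (the tarball name and its internal layout vary between lvm2 versions, so treat the paths as a guide only):

Code: Select all

# lvmdump -a -m should drop a tarball (lvmdump-<hostname>-<date>.tgz) in the current directory
tar xzf lvmdump-*.tgz
# the saved VG metadata and command output inside are plain text -
# search them for the locked/pvmove LVs to see which segments they still reference
grep -rn pvmove lvmdump-*/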

rpdom
Posts: 15929
Joined: Sun May 06, 2012 5:17 am
Location: Chelmsford, Essex, UK

Re: lvm: pvmove crashed, how to recover?

Fri Nov 29, 2019 6:32 am

swampdog wrote:
Fri Nov 29, 2019 6:05 am
Do not use raid5. It needs four drives and only one can fail.
RAID 5 requires a minimum of three drives of which at least two need to be operational for the data to be accessible.

swampdog
Posts: 300
Joined: Fri Dec 04, 2015 11:22 am

Re: lvm: pvmove crashed, how to recover?

Fri Nov 29, 2019 7:54 am

rpdom wrote:
Fri Nov 29, 2019 6:32 am
swampdog wrote:
Fri Nov 29, 2019 6:05 am
Do not use raid5. It needs four drives and only one can fail.
RAID 5 requires a minimum of three drives of which at least two need to be operational for the data to be accessible.
Depending on how LVM is configured, an extra drive may be required for logging (--mirrorlog). I'm playing it safe because we don't know what instructions the OP used. Many places are just rehashes of old instructions, whereas LVM has advanced. Additionally, I've never seen LVM raid5 in the wild. I'm supposed to be configuring a CentOS 8 "stream" VM next week. I might just try it with raid5 first, as I'm curious.

rpdom
Posts: 15929
Joined: Sun May 06, 2012 5:17 am
Location: Chelmsford, Essex, UK

Re: lvm: pvmove crashed, how to recover?

Fri Nov 29, 2019 8:06 am

I'll admit I've never used LVM RAID by itself. I've always used LVM on top of MD which handled the RAID side of things.
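i.e. something along these lines (a minimal sketch with made-up names - one md mirror presented to LVM as a single PV):

Code: Select all

mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
pvcreate /dev/md0
vgcreate vg00 /dev/md0
lvcreate -L 100G -n home vg00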

swampdog
Posts: 300
Joined: Fri Dec 04, 2015 11:22 am

Re: lvm: pvmove crashed, how to recover?

Fri Nov 29, 2019 8:39 am

rpdom wrote:
Fri Nov 29, 2019 8:06 am
I'll admit I've never used LVM RAID by itself. I've always used LVM on top of MD which handled the RAID side of things.
Yep. It's unusual.

I've just found some info on the Red Hat site. Since RHEL 6.3 the default has changed away from "segment type mirror" to "segment type raid1", so it will depend on what instructions the OP followed. It also leverages the md kernel driver, and there is no --mirrorlog option in that context.

You probably need a Red Hat account, but here's the base URL: https://access.redhat.com/documentation ... id_volumes.
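Either way, checking which segment type the OP actually ended up with should just be a case of (assuming a reasonably recent lvm2):

Code: Select all

lvs -a -o name,segtype,devices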

I have fixed a normal rpi4 LVM SSD by attaching it to a linux box. I don't know (architecture-wise) whether it would be possible to continue a 'pvmove', but as I was able to activate the rpi SSD VG and edit files, it's not to be ruled out.

rpdom
Posts: 15929
Joined: Sun May 06, 2012 5:17 am
Location: Chelmsford, Essex, UK

Re: lvm: pvmove crashed, how to recover?

Fri Nov 29, 2019 9:16 am

swampdog wrote:
Fri Nov 29, 2019 8:39 am
I have fixed a normal rpi4 LVM SSD by attaching it to a linux box. I don't know (architecture-wise) whether it would be possible to continue a 'pvmove', but as I was able to activate the rpi SSD VG and edit files, it's not to be ruled out.
pvmove should be able to resume regardless of architecture. It always performs safe updates: copy the data to the new volume and, only if that succeeds, update the pointers to that data and flush buffers before marking the old data as unused. That holds as long as the underlying hardware doesn't fail - in this case the mdadm RAID seems to have hit a problem, possibly due to a failure of the USB adaptor or its driver.

The first thing to do would be to try and recover the mdadm RAID array. Then look at resuming the pvmove.
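Something like this as a first look (only a sketch, and only relevant if there really is an mdadm-managed array rather than LVM's own raid):

Code: Select all

cat /proc/mdstat          # lists mdadm-managed arrays and their current state
mdadm --detail /dev/md0   # per-array detail; md0 here is just a placeholder name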

joongul
Posts: 5
Joined: Sun Nov 10, 2019 12:52 am

Re: lvm: pvmove crashed, how to recover?

Fri Nov 29, 2019 9:30 am

Hello,

I guess I did not make myself clear in the original post. I am using an LVM raid5 volume for /home, created by using lvconvert --type raid5 to convert a linear volume into raid5. The following is what I get from the lvs command.

Code: Select all

root@courante:~# lvs -a -o +devices
  LV                     VG       Attr       LSize   Pool    Origin       Data%  Meta%  Move     Log Cpy%Sync Convert Devices                                                             
  [chome]                courante Cwi---C--- 100.00g                                                                  chome_cdata(0)                                                      
  [chome_cdata]          courante Cwi---r--- 100.00g                                                                  chome_cdata_rimage_0(0),chome_cdata_rimage_1(0)                     
  [chome_cdata_rimage_0] courante Iwi---r---  50.00g                                                                  /dev/sdd(0)                                                         
  [chome_cdata_rimage_1] courante Iwi---r---  50.00g                                                                  /dev/sde(0)                                                         
  [chome_cmeta]          courante ewi-------  40.00m                                                                  /dev/sdd(12800)                                                     
  home                   courante Cwi---C---   2.00t [chome] [home_corig]                                             home_corig(0)                                                       
  [home_corig]           courante rwi---C---   2.00t                                                                  home_corig_rimage_0(0),home_corig_rimage_1(0),home_corig_rimage_2(0)
  [home_corig_rimage_0]  courante Iwi---r---   1.00t                                                                  /dev/sdb(786435)                                                    
  [home_corig_rimage_0]  courante Iwi---r---   1.00t                                                                  /dev/sdb(0)                                                         
  [home_corig_rimage_1]  courante Iwi---r---   1.00t                                                                  /dev/sda(1)                                                         
  [home_corig_rimage_2]  courante Iwi---r---   1.00t                                                                  pvmove0(0)                                                          
  [home_corig_rmeta_0]   courante ewi---r---   4.00m                                                                  /dev/sdb(786434)                                                    
  [home_corig_rmeta_1]   courante ewi---r---   4.00m                                                                  /dev/sda(0)                                                         
  [home_corig_rmeta_2]   courante ewi---r---   4.00m                                                                  pvmove0(0)                                                          
  [lvol0_pmspare]        courante ewi-------  40.00m                                                                  /dev/sdd(12810)                                                     
  [pvmove0]              courante p-c---m---   1.00t                                    /dev/sdf                      /dev/sdf(1),/dev/sdc(0)                                             
  [pvmove0]              courante p-c---m---   1.00t                                    /dev/sdf                      /dev/sdf(0),/dev/sdc(262144)         
  
At the moment I cannot detect any hardware error: sudo pvs says all drives are working, and neither dmesg nor /var/log/syslog reports a hardware error. The SSD drive used for the cache has the USB quirks option enabled and has given me no problems since.

The pvmove is aborted, but I don't know what to do with the internal volume pvmove0, which makes the volume impossible to remove.

lvchange has options to check/repair or rebuild, but it seems they cannot be applied to a cached volume. I can't remove the cache with the lvconvert --uncache command either, because of the pvmove0 volume.
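For reference, this is the sort of thing I was trying (roughly - courante is my VG):

Code: Select all

# refused because home is a cached LV
sudo lvchange --syncaction repair courante/home
# blocked by the leftover pvmove0 volume
sudo lvconvert --uncache courante/home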

If nothing else works I guess I could remove the metadata related to /home altogether and restore the /home partition from backup, but I am curious whether there's a better way.

swampdog
Posts: 300
Joined: Fri Dec 04, 2015 11:22 am

Re: lvm: pvmove crashed, how to recover?

Fri Nov 29, 2019 10:01 am

rpdom wrote:
Fri Nov 29, 2019 9:16 am
The first thing to do would be to try and recover the mdadm RAID array. Then look at resuming the pvmove.
We'll have to wait for the OP. I'm not sure there is a software raid..
[ 15.565790] device-mapper: raid: Failed to read superblock of device at position 2
[ 15.601120] md/raid:mdX: not clean -- starting background reconstruction
[ 15.601216] md/raid:mdX: device dm-11 operational as raid disk 0
[ 15.601226] md/raid:mdX: device dm-13 operational as raid disk 1
[ 15.608330] md/raid:mdX: cannot start dirty degraded array.
[ 15.630134] md/raid:mdX: failed to run raid set.
[ 15.630146] md: pers->run() failed ...
[ 15.630250] device-mapper: table: 253:17: raid: Failed to run raid array
[ 15.630260] device-mapper: ioctl: error adding target to table
..those device-mapper numbers are high and I believe "md/*mdX" crops up when LVM is using the md driver to resync. I only have access to one box (CentOS 6) using LVM mirroring and it's raid1. I'm able to reboot it atm..

Code: Select all

[admin@sdvmc64smb ~]$ sudo cat /var/log/messages | grep mdX
Nov 29 09:36:51 sdvmc64smb kernel: md/raid1:mdX: active with 2 out of 2 mirrors
Nov 29 09:36:51 sdvmc64smb kernel: created bitmap (1024 pages) for device mdX
Nov 29 09:36:51 sdvmc64smb kernel: mdX: bitmap initialized from disk: read 65 pages, set 0 of 2097128 bits
Nov 29 09:36:52 sdvmc64smb kernel: md/raid1:mdX: active with 2 out of 2 mirrors
Nov 29 09:36:52 sdvmc64smb kernel: created bitmap (1536 pages) for device mdX
Nov 29 09:36:52 sdvmc64smb kernel: mdX: bitmap initialized from disk: read 97 pages, set 0 of 3145696 bits

Code: Select all

[admin@sdvmc64smb ~]$ sudo lvs -a -o +devices
  LV               VG    Attr       LSize    Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices                            
  lv00             vg00  -wi-ao----    1.00g                                                     /dev/sda2(5888)                    
  lv01             vg00  -wi-ao----    4.00g                                                     /dev/sda2(4608)                    
  lv02             vg00  -wi-ao----    2.00g                                                     /dev/sda2(3584)                    
  lv03             vg00  -wi-ao----    2.00g                                                     /dev/sda2(4096)                    
  lv04             vg00  -wi-ao----    1.00g                                                     /dev/sda2(2304)                    
  lv05             vg00  -wi-ao----    4.00g                                                     /dev/sda2(2560)                    
  lv06             vg00  -wi-ao----    8.00g                                                     /dev/sda2(0)                       
  lv07             vg00  -wi-ao----    1.00g                                                     /dev/sda2(2048)                    
  lv08             vg00  -wi-ao----    1.00g                                                     /dev/sda2(5632)                    
  lvfon            vgFON Rwi-aor---    1.50t                                    100.00           lvfon_rimage_0(0),lvfon_rimage_1(0)
  [lvfon_rimage_0] vgFON iwi-aor---    1.50t                                                     /dev/sdb1(1)                       
  [lvfon_rimage_0] vgFON iwi-aor---    1.50t                                                     /dev/sdd1(0)                       
  [lvfon_rimage_0] vgFON iwi-aor---    1.50t                                                     /dev/sdj1(0)                       
  [lvfon_rimage_1] vgFON iwi-aor---    1.50t                                                     /dev/sdc1(1)                       
  [lvfon_rimage_1] vgFON iwi-aor---    1.50t                                                     /dev/sdf1(0)                       
  [lvfon_rimage_1] vgFON iwi-aor---    1.50t                                                     /dev/sdg1(0)                       
  [lvfon_rmeta_0]  vgFON ewi-aor---    4.00m                                                     /dev/sdb1(0)                       
  [lvfon_rmeta_1]  vgFON ewi-aor---    4.00m                                                     /dev/sdc1(0)                       
  lvsmb            vgSMB Rwi-aor--- 1023.99g                                    100.00           lvsmb_rimage_0(0),lvsmb_rimage_1(0)
  [lvsmb_rimage_0] vgSMB iwi-aor--- 1023.99g                                                     /dev/sdk1(0)                       
  [lvsmb_rimage_0] vgSMB iwi-aor--- 1023.99g                                                     /dev/sdk1(65535)                   
  [lvsmb_rimage_0] vgSMB iwi-aor--- 1023.99g                                                     /dev/sde1(0)                       
  [lvsmb_rimage_1] vgSMB iwi-aor--- 1023.99g                                                     /dev/sdi1(0)                       
  [lvsmb_rimage_1] vgSMB iwi-aor--- 1023.99g                                                     /dev/sdi1(65535)                   
  [lvsmb_rimage_1] vgSMB iwi-aor--- 1023.99g                                                     /dev/sdh1(0)                       
  [lvsmb_rmeta_0]  vgSMB ewi-aor---    4.00m                                                     /dev/sdk1(65534)                   
  [lvsmb_rmeta_1]  vgSMB ewi-aor---    4.00m                                                     /dev/sdi1(65534) 
..those rimage/rmeta are scattered all over the place!

swampdog
Posts: 300
Joined: Fri Dec 04, 2015 11:22 am

Re: lvm: pvmove crashed, how to recover?

Fri Nov 29, 2019 10:37 am

joongul wrote:
Fri Nov 29, 2019 9:30 am
Hello,

I guess I did not make myself clear in the original post. I am using an LVM raid5 volume for /home, created by using lvconvert --type raid5 to convert a linear volume into raid5. The following is what I get from the lvs command.

Code: Select all

root@courante:~# lvs -a -o +devices
  LV                     VG       Attr       LSize   Pool    Origin       Data%  Meta%  Move     Log Cpy%Sync Convert Devices                                                             
  [chome]                courante Cwi---C--- 100.00g                                                                  chome_cdata(0)                                                      
  [chome_cdata]          courante Cwi---r--- 100.00g                                                                  chome_cdata_rimage_0(0),chome_cdata_rimage_1(0)                     
  [chome_cdata_rimage_0] courante Iwi---r---  50.00g                                                                  /dev/sdd(0)                                                         
  [chome_cdata_rimage_1] courante Iwi---r---  50.00g                                                                  /dev/sde(0)                                                         
  [chome_cmeta]          courante ewi-------  40.00m                                                                  /dev/sdd(12800)                                                     
  home                   courante Cwi---C---   2.00t [chome] [home_corig]                                             home_corig(0)                                                       
  [home_corig]           courante rwi---C---   2.00t                                                                  home_corig_rimage_0(0),home_corig_rimage_1(0),home_corig_rimage_2(0)
  [home_corig_rimage_0]  courante Iwi---r---   1.00t                                                                  /dev/sdb(786435)                                                    
  [home_corig_rimage_0]  courante Iwi---r---   1.00t                                                                  /dev/sdb(0)                                                         
  [home_corig_rimage_1]  courante Iwi---r---   1.00t                                                                  /dev/sda(1)                                                         
  [home_corig_rimage_2]  courante Iwi---r---   1.00t                                                                  pvmove0(0)                                                          
  [home_corig_rmeta_0]   courante ewi---r---   4.00m                                                                  /dev/sdb(786434)                                                    
  [home_corig_rmeta_1]   courante ewi---r---   4.00m                                                                  /dev/sda(0)                                                         
  [home_corig_rmeta_2]   courante ewi---r---   4.00m                                                                  pvmove0(0)                                                          
  [lvol0_pmspare]        courante ewi-------  40.00m                                                                  /dev/sdd(12810)                                                     
  [pvmove0]              courante p-c---m---   1.00t                                    /dev/sdf                      /dev/sdf(1),/dev/sdc(0)                                             
  [pvmove0]              courante p-c---m---   1.00t                                    /dev/sdf                      /dev/sdf(0),/dev/sdc(262144)         
  
At the moment I cannot detect any hardware error: sudo pvs says all drives are working, and neither dmesg nor /var/log/syslog reports a hardware error. The SSD drive used for the cache has the USB quirks option enabled and has given me no problems since.

The pvmove is aborted, but I don't know what to do with the internal volume pvmove0, which makes the volume impossible to remove.

lvchange has options to check/repair or rebuild, but it seems they cannot be applied to a cached volume. I can't remove the cache with the lvconvert --uncache command either, because of the pvmove0 volume.

If nothing else works I guess I could remove the metadata related to /home altogether and restore the /home partition from backup, but I am curious whether there's a better way.

Code: Select all

vgchange -a y courante
pvmove -i 15
What happened?

joongul
Posts: 5
Joined: Sun Nov 10, 2019 12:52 am

Re: lvm: pvmove crashed, how to recover?

Fri Nov 29, 2019 9:27 pm

swampdog wrote:
Fri Nov 29, 2019 10:37 am

Code: Select all

vgchange -a y courante
pvmove -i 15
What happened?

Code: Select all

jlee@courante:~ $ sudo vgchange -a y courante
  device-mapper: reload ioctl on  (253:17) failed: Input/output error
  Background polling started for 1 logical volume(s) in volume group "courante"
  2 logical volume(s) in volume group "courante" now active
jlee@courante:~ $ sudo pvmove -i 15
  Number of segments in active LV courante/pvmove0 does not match metadata.
  Number of segments in active LV courante/pvmove0 does not match metadata.
  ABORTING: Mirror percentage check failed.
  LVM command executed by lvmpolld failed.
  For more information see lvmpolld messages in syslog or lvmpolld log file.

swampdog
Posts: 300
Joined: Fri Dec 04, 2015 11:22 am

Re: lvm: pvmove crashed, how to recover?

Fri Nov 29, 2019 10:34 pm

joongul wrote:
Fri Nov 29, 2019 9:27 pm
swampdog wrote:
Fri Nov 29, 2019 10:37 am

Code: Select all

vgchange -a y courante
pvmove -i 15
What happened?

Code: Select all

jlee@courante:~ $ sudo vgchange -a y courante
  device-mapper: reload ioctl on  (253:17) failed: Input/output error

Code: Select all

#  dmsetup ls --tree
Output?
..the one matching 253:17 is the bad device (LV).

I'm afraid I'm away until next week. You can try "lvchange -a n" on that LV, then 'pvmove --abort' and "lvremove"(*) in the meantime .. but you've lost that LV, which comes back to a hardware/64-bit kernel fault.

(*) lvremove first, then "lvremove -f" then "lvremove -ff".
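i.e. roughly this sequence (the LV name is whatever dmsetup shows for 253:17 - courante/home is my guess from your earlier output):

Code: Select all

dmsetup ls --tree              # find which LV the failing 253:17 device belongs to
lvchange -a n courante/home    # deactivate it (name is a guess, see above)
pvmove --abort
lvremove courante/home         # then -f, then -ff if it still refuses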

joongul
Posts: 5
Joined: Sun Nov 10, 2019 12:52 am

Re: lvm: pvmove crashed, how to recover?

Fri Nov 29, 2019 10:38 pm

Update: happy ending!

I examined the contents of /etc/lvm/archive and used vgcfgrestore to restore the metadata from right before the pvmove began. The volume was still inactive, but this time I was able to use lvconvert --uncache and then vgchange -ay to make the volume active.

I was able to fsck and mount the /home volume and it appears to work fine. I think I will not use pvmove again; instead I will back up, create a new raid volume, and rsync.
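For the record, roughly the sequence that worked (from memory - the archive file name is specific to my system, yours will differ):

Code: Select all

ls /etc/lvm/archive/                           # pick the archive taken just before the pvmove
sudo vgcfgrestore -f /etc/lvm/archive/courante_00042-123456789.vg courante   # file name is illustrative
sudo lvconvert --uncache courante/home
sudo vgchange -ay courante
sudo fsck /dev/courante/home
sudo mount /dev/courante/home /home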

Thanks for your help!
