Last commit: 430a2857a6 by nicole, "Fixed Admonitions" (2026-06-04 13:46:20 -06:00)

18 KiB

Raw Blame History

Proxmox VE Shared iSCSI/LVM Orphan Disk Audit and Cleanup Procedure

Purpose

This procedure describes how to identify and safely remove orphaned Proxmox VE virtual machine disks from shared iSCSI-backed LVM storage.

It is intended for environments where:

- Proxmox VE is clustered.
- Multiple Proxmox nodes access the same shared iSCSI LUN.
- The shared storage is exposed to Proxmox as LVM storage.
- VM disks are stored as LVM logical volumes.
- Some volumes may remain after VM disk deletion, failed migrations, failed resizes, storage UI inconsistencies, or manual recovery work.

The goal is to reclaim storage space without accidentally deleting disks that are still attached to running or stopped VMs.

Scope

This document focuses on the following storage type:

Proxmox storage type: lvm
Backing storage:      shared iSCSI
Volume group:         vg_proxmox_iscsi
Storage ID example:   iscsi-cluster-lvm

Adjust the storage ID and volume group names as needed for your environment.

Safety Requirements

!!! danger "Never delete based on the Storage UI alone"
    The Proxmox storage UI may show a volume as belonging to a VM because its name follows the pattern `vm-<vmid>-disk-<n>`. That does not prove the disk is currently attached to the VM.

    Always verify against VM configuration files and active QEMU processes before deleting.

!!! warning "Run the audit before running any cleanup commands"
    The audit scripts in this document are read-only. The cleanup commands are destructive. Do not run cleanup commands until the audit output has been reviewed.

!!! warning "Snapshot volumes require extra caution"
    Volumes named like the following may be part of a snapshot chain:

    ```text
    snap_vm-<vmid>-disk-<n>_<snapshot-name>
    ```

    Do not remove snapshot volumes manually unless you have verified that the VM and snapshot are no longer known to Proxmox, no backing chain references them, and no QEMU process has them open.

!!! note "Shared storage does not mean shared config"
    On each node, `/etc/pve/qemu-server/` holds only the config files for VMs currently assigned to that node. An audit run from a single node can therefore falsely report disks that belong to VMs on other nodes as orphaned.

    Run the confirmation script on every node in the cluster.
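
As a complement to running the script on every node: on a healthy cluster, the cluster filesystem mounted at `/etc/pve` normally exposes each node's VM configs under `/etc/pve/nodes/<node>/qemu-server/`, while `/etc/pve/qemu-server/` points only at the local node. The sketch below lists which node holds which VM config. The function name `list_cluster_confs` and its optional root argument are illustrative assumptions, not part of the original procedure.

```shell
# Hypothetical helper: print "<node> <vmid>" for every VM config found under
# <root>/nodes/*/qemu-server/. The root defaults to /etc/pve; it is a
# parameter only so the function can be tried against a test directory.
list_cluster_confs() {
  root="${1:-/etc/pve}"
  for conf in "$root"/nodes/*/qemu-server/*.conf; do
    [ -e "$conf" ] || continue          # the glob matched nothing
    node="${conf#"$root"/nodes/}"       # strip the leading "<root>/nodes/"
    node="${node%%/*}"                  # keep only the node name
    vmid="$(basename "$conf" .conf)"
    printf '%s %s\n' "$node" "$vmid"
  done
}
```

Calling `list_cluster_confs` with no argument on a cluster node should print one line per VM config. It is read-only and does not replace the per-node confirmation in Phase 3.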

Terms

| Term | Meaning |
| --- | --- |
| Attached disk | A disk volume referenced in a VM config, such as `scsi0`, `sata0`, `virtio0`, `efidisk0`, or `tpmstate0`. |
| Orphan disk | A storage volume that exists on shared storage but is not referenced by any VM config on any node and is not opened by any active process. |
| Volume ID | Proxmox storage identifier, such as `iscsi-cluster-lvm:vm-107-disk-1.qcow2`. |
| LV | LVM logical volume, such as `/dev/vg_proxmox_iscsi/vm-107-disk-1.qcow2`. |
| Snapshot chain | A chain of qcow2 backing files or Proxmox snapshot volumes. |
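
The naming patterns in these terms can be parsed mechanically. The following is a minimal sketch, assuming volume names follow the `vm-<vmid>-disk-<n>` or `snap_vm-<vmid>-disk-<n>_<snapshot-name>` patterns used in this document. The function name `vol_vmid` is hypothetical. The extracted VMID is only a hint taken from the name and does not prove the volume is attached to that VM.

```shell
# Hypothetical helper: print the VMID embedded in a volume name, or
# "unknown" when the name does not follow either documented pattern.
vol_vmid() {
  name="$1"
  case "$name" in
    snap_vm-*-disk-*) rest="${name#snap_vm-}" ;;
    vm-*-disk-*)      rest="${name#vm-}" ;;
    *) echo "unknown"; return 0 ;;
  esac
  vmid="${rest%%-*}"                     # text before the next "-"
  case "$vmid" in
    ''|*[!0-9]*) echo "unknown" ;;       # reject empty or non-numeric IDs
    *)           echo "$vmid" ;;
  esac
}
```

For example, `vol_vmid vm-107-disk-1.qcow2` prints `107`.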

Phase 1: Identify Storage Names

Run this on any Proxmox node:

pvesm status
cat /etc/pve/storage.cfg
vgs

Identify the shared iSCSI/LVM storage.

Example:

Storage ID: iscsi-cluster-lvm
VG name:    vg_proxmox_iscsi

For the rest of this document, replace these values if your environment differs:

STORAGE="iscsi-cluster-lvm"
VG="vg_proxmox_iscsi"
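
Instead of reading `storage.cfg` by eye, the volume group for a given storage ID could be looked up with a small parser. This is a sketch under the assumption that the storage is defined as an `lvm:` section with a `vgname` property, as in a typical `storage.cfg`. The function name `vg_for_storage` is hypothetical.

```shell
# Hypothetical helper: print the vgname configured for an "lvm" storage ID.
# Usage: vg_for_storage <storage-id> [path-to-storage.cfg]
vg_for_storage() {
  awk -v id="$1" '
    # Section headers start at column 0, e.g. "lvm: iscsi-cluster-lvm".
    /^[a-z]+:/ { in_section = ($1 == "lvm:" && $2 == id) }
    # Properties are indented below the header, e.g. "vgname vg_proxmox_iscsi".
    in_section && $1 == "vgname" { print $2; exit }
  ' "${2:-/etc/pve/storage.cfg}"
}
```

A possible use is `VG="$(vg_for_storage "$STORAGE")"`, followed by a check that `$VG` is not empty before continuing.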

Phase 2: Run the Storage Orphan Audit

Run the following script on one node that can see the shared LVM storage.

This script does not delete anything.

cat > /root/pve-iscsi-orphan-audit.sh <<'EOF'
#!/usr/bin/env bash
set -u

STORAGE="iscsi-cluster-lvm"
VG="vg_proxmox_iscsi"
OUT="/root/pve-iscsi-orphan-audit-$(hostname)-$(date +%Y%m%d-%H%M%S).txt"

{
  echo "===== PVE ISCSI ORPHAN AUDIT ====="
  echo "Host: $(hostname)"
  echo "Date: $(date)"
  echo "Storage: ${STORAGE}"
  echo "VG: ${VG}"

  echo
  echo "===== STORAGE STATUS ====="
  pvesm status 2>&1 | grep -E "^(Name|${STORAGE})" || true
  vgs "${VG}" 2>&1 || true

  echo
  echo "===== ALL VOLUMES IN ${STORAGE} ====="
  pvesm list "${STORAGE}" 2>&1 || true

  echo
  echo "===== ALL LVs IN ${VG} ====="
  lvs -a -o lv_name,lv_path,lv_size,lv_attr,devices "${VG}" 2>&1 || true

  echo
  echo "===== LOCAL VM CONFIG FILES ====="
  for conf in /etc/pve/qemu-server/*.conf; do
    [ -e "$conf" ] || continue
    echo
    echo "----- $conf -----"
    cat "$conf"
  done

  echo
  echo "===== REFERENCE ANALYSIS - LOCAL CONFIG FILES ONLY ====="
  printf '%-55s | %-8s | %-10s | %-8s | %-8s | %s\n' \
    "volume" "vmid" "referenced" "open" "size" "path"

  {
    pvesm list "${STORAGE}" 2>/dev/null | awk 'NR>1 {print $1}' | sed "s#^${STORAGE}:##"
    lvs --noheadings -o lv_name "${VG}" 2>/dev/null | awk '{print $1}'
  } | sort -u | while read -r vol; do
    [ -n "$vol" ] || continue

    case "$vol" in
      vm-*-disk-*|snap_vm-*-disk-*) ;;
      *) continue ;;
    esac

    vmid="unknown"
    if [[ "$vol" =~ ^vm-([0-9]+)-disk- ]]; then
      vmid="${BASH_REMATCH[1]}"
    elif [[ "$vol" =~ ^snap_vm-([0-9]+)-disk- ]]; then
      vmid="${BASH_REMATCH[1]}"
    fi

    ref="no"
    if grep -R -Fq "$vol" /etc/pve/qemu-server 2>/dev/null; then
      ref="yes"
    fi

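    # lsof usually lists an LV by its resolved /dev/dm-N path, so also check
    # the LVM "device open" flag (sixth lv_attr character) on this node.
    open="no"
    attr="$(lvs --noheadings -o lv_attr "${VG}/${vol}" 2>/dev/null | awk '{print $1}')"
    if [ "${attr:5:1}" = "o" ] || lsof 2>/dev/null | grep -Fq "$vol"; then
      open="yes"
    fi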

    size="$(lvs --noheadings -o lv_size "${VG}/${vol}" 2>/dev/null | awk '{$1=$1;print}')"
    path="$(lvs --noheadings -o lv_path "${VG}/${vol}" 2>/dev/null | awk '{$1=$1;print}')"

    printf '%-55s | %-8s | %-10s | %-8s | %-8s | %s\n' \
      "$vol" "$vmid" "$ref" "$open" "$size" "$path"
  done

  echo
  echo "===== DONE ====="
} | tee "$OUT"

echo
echo "Saved audit to: $OUT"
EOF

chmod +x /root/pve-iscsi-orphan-audit.sh
/root/pve-iscsi-orphan-audit.sh

The script writes a file similar to:

/root/pve-iscsi-orphan-audit-<node>-<timestamp>.txt

How to Read the First Audit

The most important section is:

REFERENCE ANALYSIS - LOCAL CONFIG FILES ONLY

Example:

volume                 | vmid | referenced | open | size
vm-107-disk-0.qcow2    | 107  | no         | no   | 4.00m
vm-107-disk-1.qcow2    | 107  | no         | no   | 256.04g
vm-107-disk-2.qcow2    | 107  | yes        | no   | 4.00m
vm-107-disk-3.qcow2    | 107  | yes        | no   | 256.04g

Interpretation:

| Field | Meaning |
| --- | --- |
| `referenced=yes` | The volume appears in a local VM config file. Do not delete. |
| `referenced=no` | The volume does not appear in local VM configs. It may be orphaned, but confirm across all nodes first. |
| `open=yes` | A process has the volume open. Do not delete. |
| `open=no` | No process on this node has the volume open. Still confirm across all nodes. |
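
The `open` column deserves a second look. `lsof` typically reports a logical volume by its resolved device path (such as `/dev/dm-5`) rather than by LV name, so a name-based match can miss an open device. The `lv_attr` field that the audit already prints carries a "device open" flag in its sixth character. A minimal sketch of reading that flag follows; the function name `lv_attr_is_open` is hypothetical, and like `lsof` the flag only reflects the node where `lvs` runs.

```shell
# Hypothetical helper: succeed when an lv_attr string has the "device open"
# flag set, i.e. its sixth character is "o" (for example "-wi-ao----").
lv_attr_is_open() {
  case "$1" in
    ?????o*) return 0 ;;
    *)       return 1 ;;
  esac
}
```

It could be fed from `lvs --noheadings -o lv_name,lv_attr vg_proxmox_iscsi`, reading each name and attribute pair in a `while read -r name attr` loop.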

!!! warning "Local reference analysis is not enough"
    If a VM runs on another cluster node, its config may not appear on the node where you ran the audit. This can make valid disks look orphaned.

    Continue to Phase 3 before deleting anything.
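
To build a first list of candidates from a saved audit file, the reference-analysis rows can be filtered for `referenced=no` and `open=no`. This sketch assumes the pipe-separated column layout printed by the audit script above. The function name `audit_candidates` is hypothetical, and its output is only a starting list for Phase 3, not a deletion list.

```shell
# Hypothetical helper: print volumes from a saved audit file whose
# "referenced" and "open" columns are both "no".
# Usage: audit_candidates <audit-file>
audit_candidates() {
  awk -F'|' '
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    # Only the reference-analysis rows have six pipe-separated columns.
    NF >= 6 && trim($3) == "no" && trim($4) == "no" { print trim($1) }
  ' "$1"
}
```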

Phase 3: Run Cluster-Wide Confirmation

Run the following script on every Proxmox node in the cluster.

This script is read-only.

cat > /root/pve-cluster-vm-confirm.sh <<'EOF'
#!/usr/bin/env bash
set -u

STORAGE="iscsi-cluster-lvm"
VG="vg_proxmox_iscsi"
OUT="/root/pve-cluster-vm-confirm-$(hostname)-$(date +%Y%m%d-%H%M%S).txt"

{
  echo "===== NODE ====="
  hostname
  date

  echo
  echo "===== CLUSTER RESOURCES - VMS ====="
  pvesh get /cluster/resources --type vm 2>&1 || true

  echo
  echo "===== LOCAL QM LIST ====="
  qm list 2>&1 || true

  echo
  echo "===== QEMU CONFIG FILES PRESENT ====="
  ls -la /etc/pve/qemu-server/ 2>&1 || true

  echo
  echo "===== QEMU CONFIG FILE CONTENTS ====="
  for conf in /etc/pve/qemu-server/*.conf; do
    [ -e "$conf" ] || continue
    echo
    echo "----- $conf -----"
    cat "$conf"
  done

  echo
  echo "===== ALL STORAGE VOLUMES ====="
  pvesm list "${STORAGE}" 2>&1 || true

  echo
  echo "===== ALL LVs ====="
  lvs -a -o lv_name,lv_path,lv_size,lv_attr,devices "${VG}" 2>&1 || true

} | tee "$OUT"

echo
echo "Saved to: $OUT"
EOF

chmod +x /root/pve-cluster-vm-confirm.sh
/root/pve-cluster-vm-confirm.sh

Collect the output file from each node.

Example for a three-node cluster:

/root/pve-cluster-vm-confirm-cluster-node-01-YYYYMMDD-HHMMSS.txt
/root/pve-cluster-vm-confirm-cluster-node-02-YYYYMMDD-HHMMSS.txt
/root/pve-cluster-vm-confirm-cluster-node-03-YYYYMMDD-HHMMSS.txt

How to Read the Cluster Confirmation

For each suspicious volume, search the output files from every node.

Example candidate:

vm-107-disk-1.qcow2

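Check whether it appears in any VM config on any node. Because /etc/pve/qemu-server/ covers only the local node, search the per-node directories under /etc/pve/nodes/ instead:

grep -R -F "vm-107-disk-1.qcow2" /etc/pve/nodes/*/qemu-server/ || true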

If reviewing output files manually, look for config lines such as:

scsi0: iscsi-cluster-lvm:vm-107-disk-1.qcow2
sata0: iscsi-cluster-lvm:vm-107-disk-1.qcow2
virtio0: iscsi-cluster-lvm:vm-107-disk-1.qcow2
efidisk0: iscsi-cluster-lvm:vm-107-disk-1.qcow2
tpmstate0: iscsi-cluster-lvm:vm-107-disk-1.qcow2

If a volume appears in any of those lines, it is attached to a VM and must not be deleted.
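
Searching the collected output files can also be scripted. The sketch below prints only the lines where the volume appears behind a disk-style key, so matches from the storage and LV listings are filtered out. The function name `find_volume_refs` is hypothetical, and the key list adds `ide` and `unused` to the keys named above as an assumption about what else may reference a volume.

```shell
# Hypothetical helper: print "file:line" for every disk-style config line in
# the collected output files that mentions the given volume.
# Usage: find_volume_refs <volume-name> <output-file>...
find_volume_refs() {
  vol="$1"; shift
  # First grep: fixed-string match, always prefixed with the file name.
  # Second grep: keep only lines whose content starts with a disk key.
  grep -H -F -- "$vol" "$@" \
    | grep -E '^[^:]+:[[:space:]]*(scsi|sata|virtio|ide|efidisk|tpmstate|unused)[0-9]+:' \
    || true
}
```

Empty output from `find_volume_refs vm-107-disk-1.qcow2 /root/pve-cluster-vm-confirm-*.txt` suggests no collected config references the volume. Any output means the volume must not be deleted.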

Phase 4: Classify Candidate Volumes

Use the following decision table.

| Condition | Classification | Action |
| --- | --- | --- |
| Volume appears in any VM config on any node | In use | Do not delete |
| Volume is opened by QEMU or another process | In use or unsafe | Do not delete |
| Volume is a `snap_vm-*` snapshot volume | Snapshot-chain item | Inspect snapshot/backing chain before deletion |
| Volume does not appear in any VM config and is not open | Orphan candidate | Eligible for final verification |
| VMID no longer exists in cluster resources and disk is not referenced | Strong orphan | Eligible for cleanup |
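
The decision table can be expressed as a small function, which may help when classifying many volumes consistently. This is a sketch, not a replacement for reviewing each volume. The function name `classify_volume`, its argument order, and the output labels are assumptions. Any value other than an explicit `no` for "referenced" or "open" is treated as in use, so unknown inputs fail safe.

```shell
# Hypothetical helper mirroring the decision table above.
# Usage: classify_volume <name> <referenced yes|no> <open yes|no> <vmid_exists yes|no>
classify_volume() {
  name="$1"; referenced="$2"; open="$3"; vmid_exists="$4"
  if [ "$referenced" != "no" ]; then
    echo "in-use"                      # referenced, or reference state unknown
  elif [ "$open" != "no" ]; then
    echo "in-use-or-unsafe"            # open, or open state unknown
  else
    case "$name" in
      snap_vm-*) echo "snapshot-chain-item" ;;
      *)
        if [ "$vmid_exists" = "no" ]; then
          echo "strong-orphan"
        else
          echo "orphan-candidate"
        fi
        ;;
    esac
  fi
}
```

For example, `classify_volume vm-107-disk-1.qcow2 no no yes` prints `orphan-candidate`, which still requires the final verification in Phase 5.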

Example: Valid VM Disks

If VM 107 has this config:

efidisk0: iscsi-cluster-lvm:vm-107-disk-2.qcow2
sata0: iscsi-cluster-lvm:vm-107-disk-3.qcow2

Then these disks are valid and must not be deleted:

vm-107-disk-2.qcow2
vm-107-disk-3.qcow2

If storage also contains:

vm-107-disk-0.qcow2
vm-107-disk-1.qcow2

and neither appears in any config file on any node, those are orphan candidates.

Phase 5: Final Verification Before Deletion

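For each candidate volume, run the following checks on a node that can see the shared storage. Repeat the open-file check on every node, because lsof and the LVM open flag only reflect the node where they run.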

Replace the volume name as appropriate.

VOL="vm-107-disk-1.qcow2"
STORAGE="iscsi-cluster-lvm"
VG="vg_proxmox_iscsi"

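echo "===== Check config references on every node ====="
# /etc/pve/qemu-server/ covers only the local node; /etc/pve/nodes/ covers the cluster.
grep -R -F -- "$VOL" /etc/pve/nodes/*/qemu-server/ || true

echo
echo "===== Check Proxmox storage listing ====="
pvesm list "$STORAGE" | grep -F -- "$VOL" || true

echo
echo "===== Check LVM volume ====="
# An "o" in the sixth lv_attr position means the device is open on this node.
lvs -a -o lv_name,lv_path,lv_size,lv_attr,devices "$VG" | grep -F -- "$VOL" || true

echo
echo "===== Check whether open by any process ====="
# lsof may list an LV by its /dev/dm-N path, so treat empty output as a weak signal.
lsof | grep -F -- "$VOL" || true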

echo
echo "===== Check qemu-img metadata if device path exists ====="
LVPATH="$(lvs --noheadings -o lv_path "${VG}/${VOL}" 2>/dev/null | awk '{$1=$1;print}')"
if [ -n "$LVPATH" ] && [ -e "$LVPATH" ]; then
  qemu-img info --backing-chain "$LVPATH"
else
  echo "LV path missing or inactive: $LVPATH"
fi

Safe deletion pattern:

grep -R ...          no output
pvesm list ...       shows the volume
lvs ...              shows the volume
lsof ...             no output
qemu-img info ...    no unexpected backing file dependency

!!! danger "Stop if grep finds a reference"
    If the candidate volume appears in any VM config file on any node (`/etc/pve/nodes/*/qemu-server/*.conf`), do not delete it.

!!! danger "Stop if lsof finds a process"
    If `lsof` shows the volume is open, do not delete it.
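
The "no unexpected backing file dependency" check can be made less error-prone by extracting the backing-file lines from the `qemu-img info --backing-chain` output. The sketch below relies on the `backing file:` label that `qemu-img info` prints for images that have a backing file. The function name `backing_files` is hypothetical.

```shell
# Hypothetical helper: read `qemu-img info --backing-chain` output on stdin
# and print the target of every "backing file:" line. The separate
# "backing file format:" lines do not match and are ignored.
backing_files() {
  sed -n 's/^backing file: //p'
}
```

Empty output from `qemu-img info --backing-chain "$LVPATH" | backing_files` suggests the candidate does not depend on another image. It does not show whether another image depends on the candidate. To check that direction, the same filter could be applied to the output for every other volume and searched for the candidate's name.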

Phase 6: Cleanup Commands

Preferred Method: Proxmox Storage Layer

Use pvesm free first.

Example:

pvesm free iscsi-cluster-lvm:vm-107-disk-0.qcow2
pvesm free iscsi-cluster-lvm:vm-107-disk-1.qcow2

Then verify:

pvesm list iscsi-cluster-lvm | grep "vm-107-disk" || true
lvs -a -o lv_name,lv_size,lv_attr,devices vg_proxmox_iscsi | grep "vm-107-disk" || true
vgs vg_proxmox_iscsi
pvesm status | grep -E '^(Name|iscsi-cluster-lvm)'

Expected result:

vm-107-disk-0.qcow2    gone
vm-107-disk-1.qcow2    gone
vm-107-disk-2.qcow2    still present
vm-107-disk-3.qcow2    still present

Fallback Method: Direct LVM Removal

Only use this if pvesm free refuses and the final verification confirms the volume is not referenced and not open.

lvremove /dev/vg_proxmox_iscsi/vm-107-disk-0.qcow2
lvremove /dev/vg_proxmox_iscsi/vm-107-disk-1.qcow2

Then refresh device nodes and verify:

vgscan --mknodes
udevadm settle

pvesm list iscsi-cluster-lvm | grep "vm-107-disk" || true
lvs -a -o lv_name,lv_size,lv_attr,devices vg_proxmox_iscsi | grep "vm-107-disk" || true
vgs vg_proxmox_iscsi

!!! warning "Prefer pvesm free over lvremove"
    `pvesm free` lets Proxmox remove the volume through its storage abstraction. Use direct `lvremove` only when Proxmox refuses and the orphan status is already proven.
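
Because the cleanup commands are destructive, one cautious option is to generate them for review before running anything. The sketch below only prints `pvesm free` commands and executes nothing. The function name `print_free_cmds` is hypothetical.

```shell
# Hypothetical helper: print (do not run) one "pvesm free" command per volume.
# Usage: print_free_cmds <storage-id> <volume-name>...
print_free_cmds() {
  storage="$1"; shift
  for vol in "$@"; do
    printf 'pvesm free %s:%s\n' "$storage" "$vol"
  done
}
```

The printed list can be compared against the verified candidates from Phase 5, then run line by line once a second person has reviewed it.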

Phase 7: Post-Cleanup Validation

After deleting orphan volumes, validate storage and VM health.

pvesm status
vgs vg_proxmox_iscsi
lvs -a -o lv_name,lv_size,lv_attr,devices vg_proxmox_iscsi

Check the affected VM’s config:

qm config 107

Confirm the VM still starts or remains healthy:

qm status 107

If the VM is running, confirm its active QEMU process only references expected disks:

ps auxww | grep "kvm -id 107" | grep -o "/dev/vg_proxmox_iscsi/[^ ,\"]*" | sort -u

Expected example:

/dev/vg_proxmox_iscsi/vm-107-disk-2.qcow2
/dev/vg_proxmox_iscsi/vm-107-disk-3.qcow2
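
The process-side list can be compared with what the VM config declares. The sketch below extracts volume IDs from `qm config` output so the two lists can be checked against each other. The function name `config_volumes` is hypothetical, and the key list (including `ide` and `unused`) is an assumption about which config keys carry disk volumes. CD-ROM entries are skipped.

```shell
# Hypothetical helper: read `qm config <vmid>` output on stdin and print the
# volume ID of each disk-style key, skipping CD-ROM drives.
config_volumes() {
  awk '
    /^(scsi|sata|virtio|ide|efidisk|tpmstate|unused)[0-9]+: / && !/media=cdrom/ {
      split($2, parts, ",")      # drop options such as ",size=256G"
      print parts[1]
    }
  '
}
```

For example, `qm config 107 | config_volumes` should list only the volumes the VM is expected to keep after cleanup.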

Snapshot Volume Handling

Snapshot volumes require additional review.

Examples:

snap_vm-105-disk-0_Fresh_Install.qcow2
snap_vm-106-disk-0_Fresh_Install_FullyUpdated.qcow2

Before deleting a snapshot volume, check:

qm config <vmid>
qm listsnapshot <vmid>
grep -R -F "snap_vm-<vmid>" /etc/pve/nodes/*/qemu-server/ || true
qemu-img info --backing-chain /dev/vg_proxmox_iscsi/vm-<vmid>-disk-<n>.qcow2

If the VM still has a parent: line or qm listsnapshot shows the snapshot, remove it through Proxmox first:

qm delsnapshot <vmid> <snapshot-name>

Only consider manual removal if:

- the VM no longer references the snapshot,
- no backing chain references the snapshot volume,
- no QEMU process has it open,
- and Proxmox cannot delete it normally.

!!! danger "Do not manually delete active snapshot-chain volumes"
    Deleting an active snapshot backing volume can corrupt the VM disk chain.
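
One way to test whether a VM still knows about a snapshot is to derive the snapshot name from the volume name and look for the matching `[<snapshot-name>]` section in the VM config, which is where Proxmox records snapshots. This sketch assumes the `snap_vm-<vmid>-disk-<n>_<snapshot-name>` naming shown in this document, with an optional `.qcow2` suffix. Both function names are hypothetical.

```shell
# Hypothetical helper: print the snapshot name embedded in a snapshot volume
# name, or nothing when the name does not match the assumed pattern.
snap_name_from_volume() {
  printf '%s\n' "$1" \
    | sed -n 's/^snap_vm-[0-9][0-9]*-disk-[0-9][0-9]*_\(.*\)$/\1/p' \
    | sed 's/\.qcow2$//'
}

# Hypothetical helper: succeed when a VM config file contains a
# "[<snapshot-name>]" section header on a line of its own.
# Usage: config_has_snapshot <conf-file> <snapshot-name>
config_has_snapshot() {
  grep -Fxq -- "[$2]" "$1"
}
```

If `config_has_snapshot` succeeds for the VM's config file, the snapshot is still known to Proxmox and should be removed with `qm delsnapshot` rather than by hand.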

Example Cleanup Walkthrough

Scenario

VM 107 has this config:

efidisk0: iscsi-cluster-lvm:vm-107-disk-2.qcow2
sata0: iscsi-cluster-lvm:vm-107-disk-3.qcow2

Storage contains:

vm-107-disk-0.qcow2
vm-107-disk-1.qcow2
vm-107-disk-2.qcow2
vm-107-disk-3.qcow2

disk-0 and disk-1 do not appear in any config and are not open by any process.

Verify

grep -R -F "vm-107-disk-0.qcow2" /etc/pve/nodes/*/qemu-server/ || true
grep -R -F "vm-107-disk-1.qcow2" /etc/pve/nodes/*/qemu-server/ || true

lsof | grep -F "vm-107-disk-0.qcow2" || true
lsof | grep -F "vm-107-disk-1.qcow2" || true

Expected output:

no output

Delete

pvesm free iscsi-cluster-lvm:vm-107-disk-0.qcow2
pvesm free iscsi-cluster-lvm:vm-107-disk-1.qcow2

Validate

pvesm list iscsi-cluster-lvm | grep "vm-107-disk"
lvs -a -o lv_name,lv_size,lv_attr,devices vg_proxmox_iscsi | grep "vm-107-disk"
vgs vg_proxmox_iscsi

Expected remaining volumes:

vm-107-disk-2.qcow2
vm-107-disk-3.qcow2

Technician Checklist

Use this checklist before removing any orphan disk.

- I ran the storage orphan audit.
- I ran the cluster confirmation script on every Proxmox node.
- I confirmed the candidate volume is not referenced in any VM config.
- I confirmed the candidate volume is not open by any process.
- I confirmed the candidate volume is not part of an active snapshot chain.
- I confirmed the VMID relationship is understood.
- I used pvesm free first.
- I used lvremove only if Proxmox refused and the volume was proven orphaned.
- I validated storage state after cleanup.
- I validated the affected VM still references only expected disks.

Quick Reference Commands

List shared storage volumes

pvesm list iscsi-cluster-lvm

List LVs

lvs -a -o lv_name,lv_path,lv_size,lv_attr,devices vg_proxmox_iscsi

Search VM configs

grep -R -F "vm-<vmid>-disk-<n>" /etc/pve/nodes/*/qemu-server/ || true

Check open files

lsof | grep "vm-<vmid>-disk-<n>" || true

Check image metadata

qemu-img info --backing-chain /dev/vg_proxmox_iscsi/vm-<vmid>-disk-<n>.qcow2

Delete via Proxmox

pvesm free iscsi-cluster-lvm:vm-<vmid>-disk-<n>.qcow2

Delete via LVM fallback

lvremove /dev/vg_proxmox_iscsi/vm-<vmid>-disk-<n>.qcow2

Verify storage usage

pvesm status
vgs vg_proxmox_iscsi
