The parent virtual disk has been modified since the child was created

Some VMs in my environment had virtual-mode RDMs on them, along with multiple nested snapshots. Some of the RDMs were subsequently extended at the storage array level, but the storage team didn’t realize there was an active snapshot on the virtual-mode RDMs. This resulted in immediate shutdown of the VMs and a vSphere client error “The parent virtual disk has been modified since the child was created” when attempting to power them back on.

I had done a little bit of work dealing with broken snapshot chains before, but the change in RDM size was outside of my wheelhouse, so we logged a call with VMware support. I learned some very handy debugging techniques from them and thought I’d share that information here. I went back into our test environment and recreated the situation that caused the problem.

In the first example screenshot, we have a VM with no snapshot in place, and we run vmkfstools -q -v10 against the VMDK file. The -q flag means query; -v10 sets the verbosity to level 10.

The command opens up the disk, checks for errors, and reports back to you.
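For reference, the invocation looks like this, run against the descriptor VMDK (the datastore path and VM name are hypothetical):

vmkfstools -q -v10 /vmfs/volumes/datastore1/TEST-VM/TEST-VM.vmdk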

[Screenshot 1_vmkfstools: querying a disk with no snapshots]

 

In the second example, I’ve taken a snapshot of the VM. I’m now passing the snapshot VMDK into the vmkfstools command. You can see the command opening up the snapshot file, then opening up the base disk.

 

[Screenshot 2_vmkfstools: the command opens the snapshot file, then the base disk]

 

In the third example, I pass it the snapshot VMDK for a virtual-mode RDM on the same VM. It traverses the snapshot chain and also correctly reports that the VMDK is a non-passthrough raw device mapping, i.e., a virtual-mode RDM.

 

[Screenshot 3_vmkfstools: traversing the snapshot chain of a virtual-mode RDM]

Part of our problem was that the size of the RDM had increased at the array, but the snapshot still pointed to the old, smaller size. However, even without any changes to the storage, a corrupted snapshot chain can happen during an out-of-space situation.

I have intentionally introduced a drive geometry mismatch in my test VM below. Note that the value after RW in the snapshot TEST-RDM_1-00003.vmdk is 1 less than the value in the base disk TEST-RDM_1.vmdk.

[Screenshot 4_vmkfstools: descriptor files with the mismatched RW values]

 

Now if I run it through the vmkfstools command, it reports the error that we were seeing in the vSphere client in Production when trying to boot the VMs: "The parent virtual disk has been modified since the child was created". But the debugging mode gives you an additional clue that the vSphere client does not: it says that the capacity of each link is different, and it even gives you the values (23068671 != 23068672).

[Screenshot 5_vmkfstools: vmkfstools reporting the capacity mismatch]

The fix was to follow the entire chain of snapshots and ensure everything was consistent. Start with the most current snapshot in the chain. Its "parentCID" value must be equal to the "CID" value of the next disk in the chain, and that next disk is named in the "parentFileNameHint". So TEST-RDM_1-00003.vmdk is looking for a parentCID value of 72861eac, and it expects to find that CID in the file TEST-RDM_1.vmdk.

If you open up TEST-RDM_1.vmdk, you see a CID value of 72861eac, which is correct. You also see an RW value of 23068672. Since this file is the base RDM, this is the correct value. The value in the snapshot is incorrect, so you have to go back and change it to match. All snapshots in the chain must match in the same way.
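To make that concrete, here is roughly what the two descriptor files look like. The snapshot's own CID and the extent file names are hypothetical values for illustration, but the CID/parentCID/RW relationships are the ones described above:

# TEST-RDM_1-00003.vmdk (snapshot descriptor)
CID=4a7b9c2d
parentCID=72861eac
createType="vmfsSparse"
parentFileNameHint="TEST-RDM_1.vmdk"
# Extent description
RW 23068671 VMFSSPARSE "TEST-RDM_1-00003-delta.vmdk"

# TEST-RDM_1.vmdk (base virtual-mode RDM descriptor)
CID=72861eac
parentCID=ffffffff
createType="vmfsRawDeviceMap"
# Extent description
RW 23068672 VMFSRDM "TEST-RDM_1-rdm.vmdk"

The RW 23068671 in the snapshot is the mismatched value that has to be edited back to 23068672.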

[Screenshot 4_vmkfstools: the base disk descriptor with the correct CID and RW values]

 

I change the RW value in the snapshot back to match 23068672. My vmkfstools command now succeeds, and I'm also able to delete the snapshot from the vSphere client.

[Screenshot 6_vmkfstools: vmkfstools succeeding after the fix]

 

Extending Citrix Cache Drives in vSphere

I have a large client running a Citrix XenDesktop farm on top of vSphere. The environment is using PVS to PXE boot desktops. The VM shells were created with a 2GB cache drive. However, the environment has grown and we needed to extend the drive to 3GB.

PowerShell and PowerCLI to the rescue! First, we need to extend the size of the VMDK from 2 to 3GB. The client wanted me to do this in a controlled manner, so I pointed my script to the AD OU containing computer accounts for a specific pool of desktops. I do realize I could have passed a few more of the variables as parameters.

Param(
    # Defaults to dry-run; pass -WhatIf:$false to actually extend disks
    [switch] $WhatIf = $true
)

$LOG_FILE_NAME = "output.txt"

function LogThis($buf)
{
    write-host $buf
    Add-Content -Path $LOG_FILE_NAME $buf
}

if ( Test-Path $LOG_FILE_NAME )
{
    Remove-Item $LOG_FILE_NAME
}

Add-PSSnapin VMware.VimAutomation.Core
Import-Module ActiveDirectory
Connect-VIServer YOURVCENTER.foo.com
$computers = get-adcomputer -Filter * -SearchBase "OU=Some OU2,OU=Some OU,DC=foo,DC=com"
foreach ( $computer in $computers )
{
   LogThis( $computer.Name )
   $vm = Get-VM $computer.Name -ErrorAction SilentlyContinue
   if ( $vm -eq $null)
   {
        LogThis( "Could not locate VM in vCenter" )
   }
   else
   {
        foreach ( $hd in (Get-HardDisk $vm) )
        {
             # 2 GB = 2097152 KB, so anything below 2097153 KB still needs extending
             if ( $hd.CapacityKB -lt 2097153 )
             {
                  if ( $WhatIf -eq $true )
                  {
                     LogThis("Running in whatif mode - would have extended disk.")
                  }

                  else
                  {   
                         # 3 GB = 3145728 KB
                         Set-HardDisk -HardDisk $hd -CapacityKB 3145728 -Confirm:$False
                  }
             }
            else
            {
                LogThis("No disk extension required.")
            }
        }
   }
   LogThis("`r`n")

}
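Since the $WhatIf switch defaults to $true, the script dry-runs unless you explicitly turn that off. Invocation looks something like this (the script file name is my own placeholder):

.\Extend-CacheDrive.ps1 -WhatIf:$false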

Next, I needed a way to expand the partition inside Windows. I thought about some kind of script to disconnect the VMDK, mount it to another VM, and extend it that way, but it seemed too destructive. So I looked at diskpart instead. I first thought I would use a GPO to trigger a startup script, but apparently you can't use those with Citrix PVS: on boot, the VM takes on the identity of the PVS master image, so your WMI filters don't work.

Instead, I went with remote PowerShell invocation of diskpart.exe.

Param(
    [switch] $WhatIf
)

Add-PSSnapin VMware.VimAutomation.Core
Import-Module ActiveDirectory
Connect-VIServer MYVCENTER.foo.com
$computers = get-adcomputer -Filter * -SearchBase "OU=OU2,OU=OU,DC=foo,DC=com"

$LOG_FILE_NAME = "diskpart_output.txt"

function LogThis($buf)
{
    write-host $buf
    Add-Content -Path $LOG_FILE_NAME $buf
}

if ( Test-Path $LOG_FILE_NAME )
{
    Remove-Item $LOG_FILE_NAME
}

foreach ( $computer in $computers )
{
     LogThis( $computer.Name )
     if ( $WhatIf -eq $true )
     {
        LogThis("Would have performed remote script")
     }
     else
     {
        Invoke-Command -ComputerName $computer.Name -ScriptBlock {
            # Build the diskpart script, write it to a file, then run diskpart against it
            $script = @("select disk 0","select partition 1","extend","exit")
            $script | Out-File -Encoding ASCII -FilePath "C:\windows\temp\Diskpart-extend.txt"
            diskpart.exe /S C:\windows\temp\Diskpart-extend.txt
        }
     }
  
}

The Invoke-Command line deserves some explanation.

The diskpart commands I want to run are:
select disk 0
select partition 1
extend
exit

I build an array containing the diskpart commands, then use Out-File to save them to a text file in C:\windows\temp. Then I call diskpart.exe with the /S switch, which executes the commands in that script file. Because I used the -ComputerName parameter, all of this code executes remotely on the desktop.
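To spot-check a desktop afterwards, a quick remote query like this confirms the new C: drive size (DESKTOP01 is a placeholder name):

Invoke-Command -ComputerName DESKTOP01 -ScriptBlock {
    # Report the size of C: so we can confirm the extend took effect
    Get-WmiObject Win32_LogicalDisk -Filter "DeviceID='C:'" |
        Select-Object DeviceID, @{n='SizeGB';e={[math]::Round($_.Size/1GB,1)}}
}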

Hope this post saves you some time.

vSphere Datastore Last Updated timestamp – Take 2

I referenced this in an earlier post, but we continue to have datastore alarm problems on hosts running 4.0U2 connected to a 4.1 vCenter. In some cases, the timestamp on the datastore does not update, so it’s not just the alarm having a problem but also the datastore timestamp. As a stopgap measure, we scheduled a little PowerCLI script to automatically run to ensure all of the datastores are refreshed. We then accelerated our upgrade plans to quickly eliminate the problem from our primary datacenter. We now only have it in our DR site, so it’s not critical anymore, just annoying.

if ( (Get-PSSnapin -Name VMware.VimAutomation.Core -ErrorAction SilentlyContinue) -eq $null )
{
    Add-PsSnapin VMware.VimAutomation.Core
}
 
Connect-VIServer yourserver.foo.com
$ds = Get-Datastore
foreach ( $dst in $ds )
{
   $dsv = $dst | Get-View
   Write-Host "Refreshing "$dsv.Name   
   $dsv.RefreshDatastore()
}
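We scheduled it as a Windows task; the action is just a powershell.exe call along these lines (the script path is a placeholder):

powershell.exe -NoProfile -ExecutionPolicy Bypass -File "C:\Scripts\Refresh-Datastores.ps1"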

Guest NICs disconnected after upgrade

We are upgrading our infrastructure to ESXi 4.1 and had an unexpected result in our Development cluster, where multiple VMs were suddenly disconnected after vMotion. It sounded a lot like a problem I had seen before, where a vSwitch with too few available ports prevents vMotioned VMs from connecting to the switch. You generally avoid this with host profiles, but it's possible a host in the Dev cluster fell out of sync. In any event, the server being upgraded that evening had been rebuilt, and it wasn't worth trying to figure out what the old configuration might have been. I needed to find all VMs that should have been connected but weren't, and reconnect them. I decided that I needed:

  • VMs that were currently Powered On – obviously as Powered Off VMs are all disconnected
  • VMs with NICs currently set to “Connect at Power On” so I could avoid connecting something that an admin had intentionally left disconnected
  • VMs with NICs currently not connected

Note that this script will change network settings and REBOOT VMs if you execute it. I watched the script while it ran: it pings the guest DNS name first to ensure the IP isn't already on the network, then connects the NIC, then pings again to make sure the guest is back on the network. I figured I could Ctrl-C to stop it if something looked wrong. I rebooted all of the guests to avoid any failed-service or failed-task problems that might have occurred while they were disconnected.

$vms = Get-Cluster "Development" | Get-VM | Where { $_.PowerState -eq "PoweredOn" } | Sort-Object Name
foreach ($vm in $vms)
{
    $nics = $vm | Get-NetworkAdapter | Where { $_.ConnectionState.Connected -eq $false -and $_.ConnectionState.StartConnected -eq $true }
    if ($nics -ne $null)
    {
        foreach ( $nic in $nics )
        {
            Write-Host $vm.Name
            Write-Host $nic
            # Ping first to make sure the IP isn't already live on the network
            ping $vm.Guest.HostName -n 5
            $nic | Set-NetworkAdapter -Connected $true -Confirm:$false
        }

        # Ping again to confirm the guest is back, then reboot it
        ping $vm.Guest.HostName -n 5
        $vm | Restart-VMGuest
    }
}

vSphere Datastore Last Updated timestamp

We encountered issues that mysteriously appeared after a patching window on our vCenter server. We had just upgraded to vCenter 4.1 earlier in the month and this was the first time the server had been rebooted since the upgrade.

After reboot, the Last Updated timestamp on all datastores was stuck at the time the vCenter service came online. None of our disk space alarms worked because the datastore statistics were not being updated.

We noticed that the problem only appeared on the 4.0 U2 hosts – datastores connected to 4.1 clusters had no problem.

VMware support acknowledged the timestamp update issue as a problem in 4.0 U2 that was partially addressed in 4.1 and fully addressed in the soon-to-be-released 4.1 U1.

vSphere workingDir configuration parameter erased by Storage vMotion

The workingDir parameter allows you to configure a specific datastore for VM snapshots. We attempted to configure the workingDir parameter on several of our virtual machines. We then found out the hard way that a Storage vMotion erases the workingDir parameter – a snapshot completely filled the datastore containing our C: drive and made a serious mess of the SQL VM.
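For reference, the parameter is a single line in the VM's .vmx file pointing at the directory where snapshot files should land (the datastore and folder here are hypothetical):

workingDir = "/vmfs/volumes/SNAP_DATASTORE/TEST-VM"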

VMware's official position is that workingDir is not compatible with Storage vMotion.

VMware Automatic HBA rescan

When you add a new datastore, vCenter initiates an automatic rescan on the ESX hosts in the cluster. With enough hosts pounding away at the SAN, this creates a so-called "rescan storm", which can cause serious performance problems for its duration. Even if you don't have enough hosts in a cluster to see a real problem, it's pretty inconvenient to have to wait for each rescan when you're trying to add 10 datastores.

To disable the automatic rescan, open your vSphere client:

1. Administration -> vCenter Server
2. Settings -> Advanced Settings
3. Look for "config.vpxd.filter.hostRescanFilter". If it's not there, add it and set it to false.

If it's currently set to true, you have to edit vpxd.cfg instead. Open C:\Documents and Settings\All Users\Application Data\VMware\VMware VirtualCenter\vpxd.cfg and find the hostRescanFilter key, which will look like this:

<hostRescanFilter>true</hostRescanFilter>

Change it to false and restart the vCenter service.

Doing this creates a new problem – you now have to manually force a rescan on every host.

Here is a PowerCLI one-liner to automate this process. It will perform the rescan on each host in the cluster, and only on one host at a time.

Get-Cluster "Production" | Get-VMHost | %{ Get-VMHostStorage $_ -RescanVMFS }

Unable to delete vSphere datastore

I had a nearly empty datastore in one of my vSphere clusters that I wanted to destroy so I could steal some of the LUNs to grow a different datastore. I migrated all machines and files off the datastore so that it was completely empty, but I kept getting an error from vCenter saying it was unable to delete the datastore because the resource was in use. I eventually discovered that a VM in the cluster had its CD-ROM connected to one of the ISOs that I had moved off the datastore. I disconnected the CD-ROM and was good to go.
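If you hit the same thing, a quick PowerCLI check along these lines will find the offenders and disconnect them (OLD_DATASTORE is a placeholder for the datastore name):

# Find VMs whose CD-ROM still points at an ISO on the datastore
Get-VM | Get-CDDrive | Where { $_.IsoPath -like "*OLD_DATASTORE*" } |
    Select-Object @{n="VM";e={$_.Parent}}, IsoPath

# Clear the mappings so the datastore is no longer in use
Get-VM | Get-CDDrive | Where { $_.IsoPath -like "*OLD_DATASTORE*" } |
    Set-CDDrive -NoMedia -Confirm:$false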