Came in to work today to find a VM stuck at "In Progress" while taking a snapshot. We use VMware Data Recovery (vDR) as a complement to our backup plan, and vDR had the unfortunate task of initiating the snapshot that is now hung.
The official error read "An error occurred while quiescing the virtual machine. See the virtual machine's event log for details." One problem with that: I couldn't log in to the system. The snapshot had gotten far enough along to freeze I/O, so I had to jump into the CLI and kill the task.
To kill the task for a VM, get into the host's CLI (in this case through the iDRAC and the local terminal... bad, I know) and run ps | grep vmx to list the running processes, filtered down to the vmx ones.
Locate the parent process ID (the second column) for the hung VM and run kill *parent process ID* to end the process. In this case, it was: kill 465724
***DISCLAIMER: Be very careful doing this. If you don't kill the proper process, you can do harm to your ESXi host.***
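As a rough sketch of the lookup described above, the Python below picks the second column out of ps | grep vmx output for the lines that mention a given VM. The sample output and the VM names SQL01/WEB01 are made up for illustration, and the helper name is hypothetical. It assumes, as the post does, that the second column is the ID to kill, and additionally assumes the VM's name shows up in the vmx world names; check the real output on your host before killing anything.

```python
from typing import Optional

# Hypothetical `ps | grep vmx` output; real ESXi output may differ.
SAMPLE_PS = """\
5169   5169  vmx
5171   5169  vmx-vthread-5:SQL01
5172   5169  vmx-vcpu-0:SQL01
6001   6001  vmx
6003   6001  vmx-vcpu-0:WEB01
"""


def find_parent_pid(ps_output: str, vm_name: str) -> Optional[int]:
    """Return the second-column ID of the first vmx line that mentions vm_name."""
    for line in ps_output.splitlines():
        fields = line.split()
        # Skip short lines and lines that are not vmx worlds for this VM
        # (plain substring match, so pick a VM name that is unambiguous).
        if len(fields) < 2 or "vmx" not in line or vm_name not in line:
            continue
        if fields[1].isdigit():
            return int(fields[1])
    return None
```

For the sample above, find_parent_pid(SAMPLE_PS, "SQL01") gives 5169, which would correspond to running kill 5169; a VM that does not appear returns None rather than a guess.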
Instantly, the stuck task should change to a status of "The attempted operation cannot be performed in the current state (Powered off)."
Now it's time to look into that error. In this case I received a "Volume Shadow Copy Service error: Unexpected error DeviceIoControl", with the rest of the error seemingly pointing at the generic floppy drive. I know this is the culprit because it points right at the VMware Snapshot Provider, in a state of "DoSnapshotSet".
That's incredibly weird. So I uninstalled both the floppy drive and its controller, and also removed the drive from the VM's settings while it was powered off. When I booted the system back up, the floppy drive had reinstalled itself. Very odd, so I just uninstalled the drive and then disabled the controller.
So it's snapshot time again, right? Well, not really. I retried the snapshot and it froze again. Time for some googling, after killing the parent process for the VM again.
What I came up with is that there is apparently an issue with Windows Server 2008 R2 systems running SQL Server 2008. This was a frequent topic across many VMware Communities posts; however, no one ever really had an answer on what was going on inside the VM to cause this problem. We have 4 or 5 SQL servers running, and this is the first system we've run into this problem on.
Anyway, the best workaround I found was to disable the disk.EnableUUID parameter on the VM. Note that disabling this effectively disables VSS for the snapshot (i.e., no VSS quiescing). So I maintain that this is only a workaround and not yet a true solution.
To do this, shut down the VM. Right-click it, choose Edit Settings, open the Options tab, and click "Configuration Parameters".
In the Configuration Parameters pop-up, find the "disk.EnableUUID" setting and change its value to "false".
Click OK a couple of times and boot the system up. Once it's booted, try taking a snapshot with the "Quiesce guest file system" option checked. This time, everything was successful. I ran the test, and I also had the vDR appliance run another snapshot to bring that one up to date.
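The steps above make the change through the vSphere Client. For reference, that setting ends up as a disk.EnableUUID = "false" line in the VM's .vmx file. The sketch below is a hypothetical helper (not a VMware tool) that sets or appends such a line in a .vmx file's text. It assumes the VM is powered off and that you keep a backup of the original .vmx; the GUI route described above remains the safer option.

```python
import re


def set_vmx_option(vmx_text: str, key: str, value: str) -> str:
    """Set key = "value" in .vmx text, replacing an existing entry or appending one."""
    new_line = f'{key} = "{value}"'
    # Match the whole line for this key; case-insensitive matching is an assumption.
    pattern = re.compile(
        rf"^[ \t]*{re.escape(key)}[ \t]*=.*$", re.IGNORECASE | re.MULTILINE
    )
    if pattern.search(vmx_text):
        # A function replacement avoids any backslash escaping in new_line.
        return pattern.sub(lambda _m: new_line, vmx_text, count=1)
    if vmx_text and not vmx_text.endswith("\n"):
        vmx_text += "\n"
    return vmx_text + new_line + "\n"
```

Calling set_vmx_option(text, "disk.EnableUUID", "false") replaces an existing disk.EnableUUID line in place, or adds one at the end if the VM has none yet.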
Hopefully I can do some more research and turn up better answers; at worst, I'll open a support ticket and see if VMware Support can point me in a better direction.
We ran into a HUGE problem with the migration of our file servers to Server 2008 R2 from Server 2003: our techs noticed that the Previous Versions tab was only populating the 64 oldest snapshots. This is a huge problem.
So I did some digging and found that if I opened the Previous Versions tab from an older OS (I used Server 2003 as well as XP Pro), I could see all of the snapshots perfectly. So this now became a 2008/Win7 problem.
For those who don't know, the default maximum number of snapshots is 64. This can be raised in the registry of the system that VSS is running on: go to HKLM\SYSTEM\CurrentControlSet\services\VSS\Settings, add a DWORD value named "MaxShadowCopies", and set it to a decimal value of 512, which is the maximum allowed. More information can be found here: http://technet.microsoft.com/en-us/library/ee923636%28WS.10%29.aspx
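If you make this change on more than one file server, it can help to capture it as a .reg file. The sketch below builds that file's text. The helper and constant names are hypothetical, the 512 ceiling comes from the paragraph above, and the lower bound of 1 is my own assumption. The main point it illustrates: REG_DWORD data in a .reg file is written as eight hex digits, so decimal 512 becomes dword:00000200.

```python
# Hypothetical constant: full hive name, since .reg files do not use "HKLM".
VSS_SETTINGS_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\VSS\Settings"


def max_shadow_copies_reg(count: int = 512) -> str:
    """Return .reg file text that sets MaxShadowCopies to `count`."""
    # 512 is the maximum stated in the post; the minimum of 1 is an assumption.
    if not 1 <= count <= 512:
        raise ValueError("MaxShadowCopies must be between 1 and 512")
    return (
        "Windows Registry Editor Version 5.00\r\n"
        "\r\n"
        f"[{VSS_SETTINGS_KEY}]\r\n"
        # DWORD data is eight lowercase hex digits: 512 -> 00000200.
        f'"MaxShadowCopies"=dword:{count:08x}\r\n'
    )
```

When saving the result, open the file with newline="" so the explicit CRLFs are not doubled on Windows, then import it with regedit. From an elevated prompt, reg add "HKLM\SYSTEM\CurrentControlSet\services\VSS\Settings" /v MaxShadowCopies /t REG_DWORD /d 512 /f should make the same change in one step.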
On the left is a Server 2003 mapping of a 2008 R2 file share; on the right is the same file share mapped on a Windows 7 box. Big difference there.
We ended up opening a case with Microsoft. It turns out this is a known bug with no resolution in sight. However, there is a workaround: SMB2 has to be turned off. This is generally not something you want to do, but it worked in this particular instance. For more information: http://en.wikipedia.org/wiki/Server_Message_Block#SMB2
So to turn off SMB2 and be able to see all of the created snapshots from an OS newer than Server 2003 and/or XP, you have to dive back into the registry. Go to HKLM\SYSTEM\CurrentControlSet\services\LanmanServer\Parameters and create a new DWORD value named "SMB2" with a decimal value of 0. After that change has been made, reboot the system.
Note: Microsoft highly recommends turning SMB2 off at the desktop level rather than the server level.
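The SMB2 switch described above can be captured the same way, as a .reg file. The sketch below renders one. The helper and constant names are hypothetical. The post only covers setting the value to 0; that a value of 1 turns SMB2 back on is my understanding of how this switch works, not something from the Microsoft case, so verify it before relying on it. A reboot is still required either way.

```python
# Hypothetical constant: full hive name, since .reg files do not use "HKLM".
LANMAN_PARAMS_KEY = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\LanmanServer\Parameters"
)


def smb2_server_reg(enabled: bool) -> str:
    """Return .reg text setting SMB2 to 0 (off, as in the post) or 1 (assumed on)."""
    value = 1 if enabled else 0
    return (
        "Windows Registry Editor Version 5.00\r\n"
        "\r\n"
        f"[{LANMAN_PARAMS_KEY}]\r\n"
        # DWORD data is eight hex digits: 0 -> 00000000.
        f'"SMB2"=dword:{value:08x}\r\n'
    )
```

smb2_server_reg(False) produces the workaround from the post. Keeping smb2_server_reg(True) alongside it gives you a ready-made way back once Microsoft ships a fix.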
After the reboot, here's what came up:
Success! Our techs can now help people out with the VSS snaps.