When installing the Dell EqualLogic MEM goes wrong…
For those who don't know, Dell and EqualLogic have been releasing firmware and updates at a massive pace. It's a pace I personally prefer to that of vendors who release updates quarterly.
However, after upgrading to the VMware version of MEM 1.1, things started to go very... very wrong.
I happened to take a screenshot of the successful install of 1.1 and removal of 1.0.9:
After a reboot, none of the volumes returned. None, not a single one. I checked the Management Console and everything was fine, except for the connections tab, which showed zero connections instead of the normal 48.
I then connected to the host and could ping all of the storage array's iSCSI ports without problem.
Since I was already SSH'd into the host, I figured I'd do some log diving.
In the vmkwarning.log I'm hit instantly with a bunch of these:
2012-03-26T20:31:17.714Z cpu13:7053)WARNING: NMP: nmpDeviceAttemptFailover:658:Retry world failover device "naa.6090a0a8c099b78b22e8d4f8d3904f66" - failed to issue command due to Not found (APD), try again...
2012-03-26T20:31:17.714Z cpu13:7053)WARNING: NMP: nmpDeviceAttemptFailover:708:Logical device "naa.6090a0a8c099b78b22e8d4f8d3904f66": awaiting fast path state update...
2012-03-26T20:31:18.714Z cpu2:6930)WARNING: vmw_psp_rr: psp_rrSelectPathToActivate:972:Could not select path for device "naa.6090a0a8c099b78b22e8d4f8d3904f66".
2012-03-26T20:31:18.714Z cpu2:4786)WARNING: vmw_psp_rr: psp_rrSelectPath:1146:Could not select path for device "naa.6090a0a8c099b78b22e8d4f8d3904f66".
2012-03-27T01:32:31.371Z cpu7:5651)WARNING: iscsi_vmk: iscsivmk_ConnReceiveAtomic: vmhba33:CH:1 T:15 CN:0: Failed to receive data: Connection closed by peer
2012-03-27T01:32:31.371Z cpu7:5651)WARNING: iscsi_vmk: iscsivmk_ConnReceiveAtomic: Sess [ISID: TARGET: (null) TPGT: 0 TSIH: 0]
2012-03-27T01:32:31.371Z cpu7:5651)WARNING: iscsi_vmk: iscsivmk_ConnReceiveAtomic: Conn [CID: 0 L: 10.*.*.22:62322 R: 10.*.*.5:3260]
2012-03-27T01:32:31.623Z cpu7:5651)WARNING: iscsi_vmk: iscsivmk_StopConnection: vmhba33:CH:0 T:0 CN:0: iSCSI connection is being marked "OFFLINE" (Event:4)
2012-03-27T01:32:31.623Z cpu7:5651)WARNING: iscsi_vmk: iscsivmk_StopConnection: Sess [ISID: TARGET: (null) TPGT: 0 TSIH: 0]
2012-03-27T01:32:31.623Z cpu7:5651)WARNING: iscsi_vmk: iscsivmk_StopConnection: Conn [CID: 0 L: 10.*.*.21:52273 R: 10.*.*.5:3260]
2012-03-27T03:20:39.154Z cpu4:4980)WARNING: NMP: nmp_SatpGetDefaultPspi:624:Default psp DELL_PSP_EQL_ROUTED for SATP: VMW_SATP_EQL Load Failed! [Not found]
2012-03-27T03:20:39.159Z cpu0:4980)WARNING: ScsiDeviceIO: 6235: The device naa.6090a0a8c099072610ef54ee86018060 does not permit the system to change the sitpua bit to 1.
Then, to add insult to injury, I check out the vmkernel.log and find more horribleness:
2012-03-27T03:17:23.611Z cpu5:5624)iscsi_vmk: iscsivmk_ConnNetRegister: socket 0x41002c0b7640 network resource pool netsched.pools.persist.iscsi associated
2012-03-27T03:17:23.611Z cpu5:5624)iscsi_vmk: iscsivmk_ConnNetRegister: socket 0x41002c0b7640 network tracker id 1 tracker.iSCSI.10.*.*.5 associated
2012-03-27T03:17:23.612Z cpu5:5624)WARNING: iscsi_vmk: iscsivmk_ConnReceiveAtomic: vmhba33:CH:0 T:3 CN:0: Failed to receive data: Connection closed by peer
2012-03-27T03:17:23.612Z cpu5:5624)WARNING: iscsi_vmk: iscsivmk_ConnReceiveAtomic: Sess [ISID: TARGET: (null) TPGT: 0 TSIH: 0]
2012-03-27T03:17:23.613Z cpu5:5624)WARNING: iscsi_vmk: iscsivmk_ConnReceiveAtomic: Conn [CID: 0 L: 10.*.*.21:56710 R: 10.*.*.5:3260]
2012-03-27T03:17:23.613Z cpu5:5624)iscsi_vmk: iscsivmk_ConnRxNotifyFailure: vmhba33:CH:0 T:3 CN:0: Connection rx notifying failure: Failed to Receive. State=Bound
2012-03-27T03:17:23.613Z cpu5:5624)iscsi_vmk: iscsivmk_ConnRxNotifyFailure: Sess [ISID: TARGET: (null) TPGT: 0 TSIH: 0]
2012-03-27T03:17:23.613Z cpu5:5624)iscsi_vmk: iscsivmk_ConnRxNotifyFailure: Conn [CID: 0 L: 10.*.*.21:56710 R: 10.*.*.5:3260]
2012-03-27T03:19:19.127Z cpu14:5624)iscsi_vmk: iscsivmk_ConnNetRegister: socket 0x41002c09d020 network resource pool netsched.pools.persist.iscsi associated
2012-03-27T03:19:19.127Z cpu14:5624)iscsi_vmk: iscsivmk_ConnNetRegister: socket 0x41002c09d020 network tracker id 1 tracker.iSCSI.10.*.*.10 associated
2012-03-27T03:19:19.595Z cpu14:5624)WARNING: iscsi_vmk: iscsivmk_StartConnection: vmhba33:CH:0 T:0 CN:0: iSCSI connection is being marked "ONLINE"
2012-03-27T03:19:19.596Z cpu14:5624)WARNING: iscsi_vmk: iscsivmk_StartConnection: Sess [ISID: 00023d000001 TARGET: iqn.2001-05.com.equallogic:0-8a0906-8bb799c0a-664f90d3f8d4e822-remotesystems TPGT: 1 TSIH: 0]
2012-03-27T03:19:19.596Z cpu14:5624)WARNING: iscsi_vmk: iscsivmk_StartConnection: Conn [CID: 0 L: 10.*.*.21:58840 R: 10.*.*.10:3260]
2012-03-27T03:19:20.344Z cpu22:5497)ScsiScan: 1098: Path 'vmhba33:C2:T0:L0': Vendor: 'EQLOGIC ' Model: '100E-00 ' Rev: '5.2 '
2012-03-27T03:19:20.344Z cpu22:5497)ScsiScan: 1101: Path 'vmhba33:C2:T0:L0': Type: 0x0, ANSI rev: 5, TPGS: 1 (implicit only)
2012-03-27T03:19:20.345Z cpu0:5497)ScsiScan: 1582: Add path: vmhba33:C2:T0:L0
2012-03-27T03:19:20.385Z cpu0:5497)VMKAPICore: 2204: Loading Module vmw_satp_eql
2012-03-27T03:19:20.386Z cpu21:4783)WARNING: UserObj: 3232: Unimplemented operation on 0x410017c05b60/RPC
2012-03-27T03:19:20.386Z cpu21:4783)WARNING: UserObj: 675: Failed to crossdup fd 9, cnxId: 0x80000000 type RPC: Not implemented
2012-03-27T03:19:20.544Z cpu4:6392)Loading module vmw_satp_eql ...
2012-03-27T03:19:20.544Z cpu4:6392)Elf: 1862: module vmw_satp_eql has license VMware
2012-03-27T03:19:20.544Z cpu4:6392)Mod: 4015: Initialization of vmw_satp_eql succeeded with module ID 62.
2012-03-27T03:19:20.544Z cpu4:6392)vmw_satp_eql loaded successfully.
2012-03-27T03:19:20.549Z cpu0:5497)NMP: nmp_LoadPlugin:3188: Plugin DELL_PSP_EQL_ROUTED is not registered!
2012-03-27T03:19:20.549Z cpu0:5497)WARNING: NMP: nmp_SatpGetDefaultPspi:624:Default psp DELL_PSP_EQL_ROUTED for SATP: VMW_SATP_EQL Load Failed! [Not found]
2012-03-27T03:19:20.549Z cpu0:5497)ScsiPath: 4541: Plugin 'NMP' claimed path 'vmhba33:C0:T0:L0'
2012-03-27T03:19:20.551Z cpu10:5497)vmw_psp_fixed: psp_fixedSelectPathToActivateInt:479: Changing active path from NONE to vmhba33:C2:T0:L0 for device "Unregistered Device".
2012-03-27T03:41:46.169Z cpu14:4977)VMWARE SCSI Id: Id for vmhba33:C1:T14:L0
0x60 0x90 0xa0 0xb8 0x60 0x6e 0xd5 0x3a 0xaf 0xf0 0xd4 0x01 0x00 0x00 0xf0 0x0b 0x31 0x30 0x30 0x45 0x2d 0x30
2012-03-27T03:41:46.169Z cpu14:4977)VMWARE SCSI Id: Id for vmhba33:C0:T14:L0
0x60 0x90 0xa0 0xb8 0x60 0x6e 0xd5 0x3a 0xaf 0xf0 0xd4 0x01 0x00 0x00 0xf0 0x0b 0x31 0x30 0x30 0x45 0x2d 0x30
2012-03-27T03:41:46.171Z cpu14:4977)ScsiDeviceIO: 5843: QErr is correctly set to 0x0 for device naa.6090a0b8606ed53aaff0d4010000f00b.
2012-03-27T03:41:46.171Z cpu14:4977)WARNING: ScsiDeviceIO: 6235: The device naa.6090a0b8606ed53aaff0d4010000f00b does not permit the system to change the sitpua bit to 1.
2012-03-27T03:41:46.171Z cpu14:4977)VAAI_FILTER: VaaiFilterClaimDevice:270: Attached vaai filter (vaaip:VMW_VAAIP_EQL) to logical device 'naa.6090a0b8606ed53aaff0d4010000f00b'
2012-03-27T03:41:46.203Z cpu14:4977)ScsiDevice: 3121: Successfully registered device "naa.6090a0b8606ed53aaff0d4010000f00b" from plugin "NMP" of type 0
2012-03-27T03:41:46.205Z cpu14:4977)vmw_psp_fixed: psp_fixedSelectPathToActivateInt:479: Changing active path from NONE to vmhba33:C2:T6:L0 for device "Unregistered Device".
These errors are all covered in the following VMware KB articles, both of which point at the MEM plugin:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1016381
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2004432
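With this much noise in the logs, it can help to pull out only the lines that show the EqualLogic PSP failing to load. Below is a minimal sketch, not an official tool. The helper name psp_load_failures is made up for illustration, and the log path in the usage comment assumes the usual ESXi 5.x location.

```shell
#!/bin/sh
# Sketch: print the log lines showing that DELL_PSP_EQL_ROUTED failed to load.
# psp_load_failures is a hypothetical helper name, not a VMware or Dell tool.
# Usage (path is an assumption): psp_load_failures /var/log/vmkernel.log
psp_load_failures() {
    logfile="$1"
    # Match the two messages seen in the excerpts above:
    #   "Plugin DELL_PSP_EQL_ROUTED is not registered!"
    #   "Default psp DELL_PSP_EQL_ROUTED ... Load Failed!"
    grep -E 'DELL_PSP_EQL_ROUTED.*(is not registered|Load Failed)' "$logfile"
}
```

If this prints anything, the SATP is still asking for a PSP that the host cannot find, which matches the symptoms above.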
First and foremost, I restarted the management agents by doing this in the SSH console:
service mgmt-vmware restart
Still no volumes, so I had the system do a full restart, only to find that there were still no volumes present.
The logs still showed similar errors, so I uninstalled the Dell EqualLogic MEM plugin. To do that, run the following and then reboot:
esxcli software vib remove --vibname dell-eql-host-connection-mgr --vibname dell-eql-hostprofile --vibname dell-eql-routed-psp
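Before and after the removal, it is worth confirming which MEM VIBs the host actually has installed. Here is a hedged sketch. `esxcli software vib list` is the standard listing command, but the helper name dell_eql_vibs is hypothetical, and the sketch assumes the VIB name is the first column of that listing.

```shell
#!/bin/sh
# Sketch: list the names of installed Dell EqualLogic MEM VIBs.
# Reads `esxcli software vib list` output on stdin.
# dell_eql_vibs is a hypothetical helper name.
# Assumption: the VIB name is the first whitespace-separated column.
dell_eql_vibs() {
    awk '$1 ~ /^dell-eql-/ { print $1 }'
}
# On the host: esxcli software vib list | dell_eql_vibs
```

An empty result after the reboot would suggest that the three VIBs named in the remove command are really gone.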
System comes back up, still no volumes. At this point, I did something I'm not sure I really had to do: I disabled the iSCSI software initiator, rebooted, and re-enabled it. It didn't solve anything and might have caused more trouble. I mention it because I'm not certain what the fix really was.
As a last-ditch effort, I took one of the volumes offline and then brought it back online. Amazingly enough, the ESXi host found it. It was given a path selection policy of "Fixed".
I wish I could find the KB article (or maybe it was a blog post) that described what is, essentially, a cached path selection. Evidence of it is in the last three lines of the vmkernel.log excerpt above. By the 03:41 timestamp the MEM plugin had already been uninstalled, so the ESXi host shouldn't have seen that plugin anymore, and on the last line the path selection swapped over to "Fixed".
I haven't made another effort to reinstall the MEM plugin so far. I'd like to do some more testing on a smaller scale before I attempt it again.
Small follow up: I have created a support ticket with Dell. They are suggesting that I switch all of the path policies over to be Round Robin before upgrading.
I'm not exactly thrilled with that response, but will be going through and disabling the MEM Plugin, then trying the update again. Perhaps the plugin needs to be disabled beforehand.
I'll update with any results.
Update
Well, that seems to be the official word. The update process is not as streamlined as I would have hoped. The tech left me with "It is suggested to change all datastore multi pathing to round-robin on ESX 5.x before installing MEM."
I did test it elsewhere, and it works just fine: if you change all of the PSPs from DELL_PSP_EQL_ROUTED to either Round Robin or Fixed, the install of the new MEM completes successfully.
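The pre-upgrade step can be sketched as a small script that prints, rather than runs, the commands needed to move devices off the EqualLogic PSP. This is a sketch under assumptions, not Dell's procedure. The helper name rr_commands is hypothetical, and it assumes the `esxcli storage nmp device list` output puts each device ID on an unindented line followed by indented "Key: value" lines.

```shell
#!/bin/sh
# Sketch: print (not run) the commands that would move every device still on
# DELL_PSP_EQL_ROUTED over to Round Robin (VMW_PSP_RR).
# Reads `esxcli storage nmp device list` output on stdin.
# rr_commands is a hypothetical helper name.
# Assumption: each device block starts with an unindented naa.* line,
# followed by indented "Key: value" lines.
rr_commands() {
    awk '
        /^naa\./ { dev = $1 }
        /^[ \t]+Path Selection Policy: DELL_PSP_EQL_ROUTED/ {
            print "esxcli storage nmp device set --device " dev " --psp VMW_PSP_RR"
        }
    '
}
# On the host: esxcli storage nmp device list | rr_commands
```

Printing the commands first gives a chance to review the device list before anything changes. Once it looks right, the output can be piped to sh to apply it.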
Registering the EQL HIT Kit to a New vCenter
Instead of the upgrade we had planned, we decided to start from scratch and do a full reinstall of our environment. That entailed registering the Dell EqualLogic HIT Kit to a new vCenter.
Start off by opening the console on the VM and logging in (default username: root, default password: eql). Once logged in, select Option 8 to unregister it from the old vCenter.

From there, select Option 4 to configure vCenter. Enter the details for the new vCenter (IP, admin account, password, EQL HIT Kit appliance IP, and an admin email address) and confirm them. The appliance should then connect to vCenter successfully.
Once back at the main screen, select Option 7 to register the appliance with vCenter, and then reboot the appliance.
After the appliance is back at the login prompt, check the vCenter "Solutions and Applications" section and make sure that the EqualLogic utilities are there. For good measure, log in to one of the utilities and ensure the configuration is correct.

Installing the EQL HIT kit – VMware Edition
Start by downloading the HIT kit from http://support.equallogic.com/ in the Downloads section
"Accept" the EULA, and save the OVA file

Through vCenter, go to "File" then "Deploy OVF Template..."

Select the OVA file that was just downloaded

Verify the OVF details

Give the new system a name and a location, along with a datastore to reside on


Choose the disk format (thin or thick) and assign the networking


Verify the information, click finish, and wait for the deployment to complete


Open the console on the new system
Select the green "power on" button
At the login prompt, log in with:
Username: root
Password: eql


Set the Time Zone by selecting Item 1
Select "Americas" by selecting #2

Select "United States" by selecting #47

Select your proper time zone
Verify the information by selecting #1
Verify the information again, hit the "Enter" key

Configure the management network by selecting Item 2
Enter a hostname
Select which NIC to use for the management network (eth0 is default)
Enter an IP
Enter the Network mask
Enter the Gateway
Enter the DNS server
Enter a secondary DNS server
Verify the information, proceed by entering "y"
Verify the information again, hit the "Enter" key

Configure the storage management network by selecting Item 3
Enter an IP
Enter the Network mask
Enter the Gateway
Verify the information, proceed by entering "y"
Verify the information again, hit the "Enter" key

Configure the vCenter by selecting Item 4
Enter the name or address of the vCenter
Enter the name of an admin
Enter and re-enter the admin's password
Enter an administrator's email address
Verify the information, proceed by entering "y"
Verify the information again, hit the "Enter" key

Configure the PS Group by selecting Item 5
Enter the name or IP of the PS group
Enter the name of a PS group admin
Enter and re-enter the admin's password
Verify the information, proceed by entering "y"
Verify the information again, hit the "Enter" key

Configure View (Item 6) if you use it in your environment. We currently do not, so this step was skipped.
Register the plug-in with vCenter by selecting Item 7


Verify the information, proceed by entering "y" and the system will reboot
After the system reboots, go to vCenter and check your plugins to make sure they're there.
Back on the "Home" screen in vCenter, it should look similar to this:

Click on the "Equallogic Datastore Manager" and accept the digital signature, then log in with your normal vCenter credentials


A screen similar to this should appear:




