That… could be a problem…

30 Mar 2012

When installing the Dell EqualLogic MEM goes wrong…

For those who don't know, Dell and EqualLogic have been releasing firmware and updates at a massive pace. I personally prefer that to some of the other vendors, which release updates only quarterly.

However, after upgrading to version 1.1 of the MEM (Multipathing Extension Module) for VMware, things started to go very... very wrong.

I happened to take a screenshot of the successful install of 1.1 and removal of 1.0.9:
MEM Install

After a reboot, none of the volumes returned. None, not a single one. I checked the Management Console and everything looked fine, except for the Connections tab, which showed zero connections instead of the usual 48.

I then connected to the host and could ping all of the storage array's iSCSI ports without a problem.

Since I was already SSH'd into the host, I figured I'd do some log diving.

In vmkwarning.log, I was instantly hit with a bunch of these:

2012-03-26T20:31:17.714Z cpu13:7053)WARNING: NMP: nmpDeviceAttemptFailover:658:Retry world failover device "naa.6090a0a8c099b78b22e8d4f8d3904f66" - failed to issue command due to Not found (APD), try again...
2012-03-26T20:31:17.714Z cpu13:7053)WARNING: NMP: nmpDeviceAttemptFailover:708:Logical device "naa.6090a0a8c099b78b22e8d4f8d3904f66": awaiting fast path state update...
2012-03-26T20:31:18.714Z cpu2:6930)WARNING: vmw_psp_rr: psp_rrSelectPathToActivate:972:Could not select path for device "naa.6090a0a8c099b78b22e8d4f8d3904f66".
2012-03-26T20:31:18.714Z cpu2:4786)WARNING: vmw_psp_rr: psp_rrSelectPath:1146:Could not select path for device "naa.6090a0a8c099b78b22e8d4f8d3904f66".

2012-03-27T01:32:31.371Z cpu7:5651)WARNING: iscsi_vmk: iscsivmk_ConnReceiveAtomic: vmhba33:CH:1 T:15 CN:0: Failed to receive data: Connection closed by peer
2012-03-27T01:32:31.371Z cpu7:5651)WARNING: iscsi_vmk: iscsivmk_ConnReceiveAtomic: Sess [ISID: TARGET: (null) TPGT: 0 TSIH: 0]
2012-03-27T01:32:31.371Z cpu7:5651)WARNING: iscsi_vmk: iscsivmk_ConnReceiveAtomic: Conn [CID: 0 L: 10.*.*.22:62322 R: 10.*.*.5:3260]
2012-03-27T01:32:31.623Z cpu7:5651)WARNING: iscsi_vmk: iscsivmk_StopConnection: vmhba33:CH:0 T:0 CN:0: iSCSI connection is being marked "OFFLINE" (Event:4)
2012-03-27T01:32:31.623Z cpu7:5651)WARNING: iscsi_vmk: iscsivmk_StopConnection: Sess [ISID: TARGET: (null) TPGT: 0 TSIH: 0]
2012-03-27T01:32:31.623Z cpu7:5651)WARNING: iscsi_vmk: iscsivmk_StopConnection: Conn [CID: 0 L: 10.*.*.21:52273 R: 10.*.*.5:3260]

2012-03-27T03:20:39.154Z cpu4:4980)WARNING: NMP: nmp_SatpGetDefaultPspi:624:Default psp DELL_PSP_EQL_ROUTED for SATP: VMW_SATP_EQL Load Failed! [Not found]
2012-03-27T03:20:39.159Z cpu0:4980)WARNING: ScsiDeviceIO: 6235: The device naa.6090a0a8c099072610ef54ee86018060 does not permit the system to change the sitpua bit to 1.
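
Instead of eyeballing vmkwarning.log line by line, you can tally the failover and "Could not select path" warnings per device with a small script. The sketch below is a hypothetical helper of my own, not a Dell or VMware tool. It assumes the ESXi 5.x line prefix shown above (timestamp, cpuN:world), optional WARNING:) and that devices appear as naa. IDs. It is meant to run against a copy of /var/log/vmkwarning.log pulled off the host.

```python
import re
import sys
from collections import Counter

# Prefix used by the ESXi 5.x log lines quoted above (assumed format):
#   2012-03-26T20:31:17.714Z cpu13:7053)WARNING: NMP: ...
LINE_RE = re.compile(
    r"^(?P<ts>\S+Z) cpu(?P<cpu>\d+):(?P<world>\d+)\)"
    r"(?:(?P<level>WARNING): )?(?P<msg>.*)$"
)
DEVICE_RE = re.compile(r"naa\.[0-9a-f]+")

# Message fragments treated as "this device has lost its paths" (assumption).
PATH_FAILURE_MARKERS = ("nmpDeviceAttemptFailover", "Could not select path")


def path_failures_by_device(lines):
    """Count NMP failover / path-selection warnings per naa device ID."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m or m.group("level") != "WARNING":
            continue  # skip non-warning or unrecognised lines
        msg = m.group("msg")
        if any(marker in msg for marker in PATH_FAILURE_MARKERS):
            dev = DEVICE_RE.search(msg)
            if dev:
                counts[dev.group(0)] += 1
    return dict(counts)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python count_path_failures.py vmkwarning.log
    with open(sys.argv[1]) as fh:
        for device, n in sorted(path_failures_by_device(fh).items()):
            print(f"{device}\t{n}")
```

If a single naa ID dominates the output, that volume is the one to chase first. If every EqualLogic volume shows up, the problem is more likely host-wide, as it was here.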

Then, to add insult to injury, I checked vmkernel.log and found more horribleness:

2012-03-27T03:17:23.611Z cpu5:5624)iscsi_vmk: iscsivmk_ConnNetRegister: socket 0x41002c0b7640 network resource pool netsched.pools.persist.iscsi associated
2012-03-27T03:17:23.611Z cpu5:5624)iscsi_vmk: iscsivmk_ConnNetRegister: socket 0x41002c0b7640 network tracker id 1 tracker.iSCSI.10.*.*.5 associated
2012-03-27T03:17:23.612Z cpu5:5624)WARNING: iscsi_vmk: iscsivmk_ConnReceiveAtomic: vmhba33:CH:0 T:3 CN:0: Failed to receive data: Connection closed by peer
2012-03-27T03:17:23.612Z cpu5:5624)WARNING: iscsi_vmk: iscsivmk_ConnReceiveAtomic: Sess [ISID: TARGET: (null) TPGT: 0 TSIH: 0]
2012-03-27T03:17:23.613Z cpu5:5624)WARNING: iscsi_vmk: iscsivmk_ConnReceiveAtomic: Conn [CID: 0 L: 10.*.*.21:56710 R: 10.*.*.5:3260]
2012-03-27T03:17:23.613Z cpu5:5624)iscsi_vmk: iscsivmk_ConnRxNotifyFailure: vmhba33:CH:0 T:3 CN:0: Connection rx notifying failure: Failed to Receive. State=Bound
2012-03-27T03:17:23.613Z cpu5:5624)iscsi_vmk: iscsivmk_ConnRxNotifyFailure: Sess [ISID: TARGET: (null) TPGT: 0 TSIH: 0]
2012-03-27T03:17:23.613Z cpu5:5624)iscsi_vmk: iscsivmk_ConnRxNotifyFailure: Conn [CID: 0 L: 10.*.*.21:56710 R: 10.*.*.5:3260]

2012-03-27T03:19:19.127Z cpu14:5624)iscsi_vmk: iscsivmk_ConnNetRegister: socket 0x41002c09d020 network resource pool netsched.pools.persist.iscsi associated
2012-03-27T03:19:19.127Z cpu14:5624)iscsi_vmk: iscsivmk_ConnNetRegister: socket 0x41002c09d020 network tracker id 1 tracker.iSCSI.10.*.*.10 associated
2012-03-27T03:19:19.595Z cpu14:5624)WARNING: iscsi_vmk: iscsivmk_StartConnection: vmhba33:CH:0 T:0 CN:0: iSCSI connection is being marked "ONLINE"
2012-03-27T03:19:19.596Z cpu14:5624)WARNING: iscsi_vmk: iscsivmk_StartConnection: Sess [ISID: 00023d000001 TARGET: iqn.2001-05.com.equallogic:0-8a0906-8bb799c0a-664f90d3f8d4e822-remotesystems TPGT: 1 TSIH: 0]
2012-03-27T03:19:19.596Z cpu14:5624)WARNING: iscsi_vmk: iscsivmk_StartConnection: Conn [CID: 0 L: 10.*.*.21:58840 R: 10.*.*.10:3260]

2012-03-27T03:19:20.344Z cpu22:5497)ScsiScan: 1098: Path 'vmhba33:C2:T0:L0': Vendor: 'EQLOGIC ' Model: '100E-00 ' Rev: '5.2 '
2012-03-27T03:19:20.344Z cpu22:5497)ScsiScan: 1101: Path 'vmhba33:C2:T0:L0': Type: 0x0, ANSI rev: 5, TPGS: 1 (implicit only)
2012-03-27T03:19:20.345Z cpu0:5497)ScsiScan: 1582: Add path: vmhba33:C2:T0:L0
2012-03-27T03:19:20.385Z cpu0:5497)VMKAPICore: 2204: Loading Module vmw_satp_eql
2012-03-27T03:19:20.386Z cpu21:4783)WARNING: UserObj: 3232: Unimplemented operation on 0x410017c05b60/RPC
2012-03-27T03:19:20.386Z cpu21:4783)WARNING: UserObj: 675: Failed to crossdup fd 9, cnxId: 0x80000000 type RPC: Not implemented
2012-03-27T03:19:20.544Z cpu4:6392)Loading module vmw_satp_eql ...
2012-03-27T03:19:20.544Z cpu4:6392)Elf: 1862: module vmw_satp_eql has license VMware
2012-03-27T03:19:20.544Z cpu4:6392)Mod: 4015: Initialization of vmw_satp_eql succeeded with module ID 62.
2012-03-27T03:19:20.544Z cpu4:6392)vmw_satp_eql loaded successfully.
2012-03-27T03:19:20.549Z cpu0:5497)NMP: nmp_LoadPlugin:3188: Plugin DELL_PSP_EQL_ROUTED is not registered!
2012-03-27T03:19:20.549Z cpu0:5497)WARNING: NMP: nmp_SatpGetDefaultPspi:624:Default psp DELL_PSP_EQL_ROUTED for SATP: VMW_SATP_EQL Load Failed! [Not found]
2012-03-27T03:19:20.549Z cpu0:5497)ScsiPath: 4541: Plugin 'NMP' claimed path 'vmhba33:C0:T0:L0'
2012-03-27T03:19:20.551Z cpu10:5497)vmw_psp_fixed: psp_fixedSelectPathToActivateInt:479: Changing active path from NONE to vmhba33:C2:T0:L0 for device "Unregistered Device".

2012-03-27T03:41:46.169Z cpu14:4977)VMWARE SCSI Id: Id for vmhba33:C1:T14:L0
0x60 0x90 0xa0 0xb8 0x60 0x6e 0xd5 0x3a 0xaf 0xf0 0xd4 0x01 0x00 0x00 0xf0 0x0b 0x31 0x30 0x30 0x45 0x2d 0x30
2012-03-27T03:41:46.169Z cpu14:4977)VMWARE SCSI Id: Id for vmhba33:C0:T14:L0
0x60 0x90 0xa0 0xb8 0x60 0x6e 0xd5 0x3a 0xaf 0xf0 0xd4 0x01 0x00 0x00 0xf0 0x0b 0x31 0x30 0x30 0x45 0x2d 0x30
2012-03-27T03:41:46.171Z cpu14:4977)ScsiDeviceIO: 5843: QErr is correctly set to 0x0 for device naa.6090a0b8606ed53aaff0d4010000f00b.
2012-03-27T03:41:46.171Z cpu14:4977)WARNING: ScsiDeviceIO: 6235: The device naa.6090a0b8606ed53aaff0d4010000f00b does not permit the system to change the sitpua bit to 1.
2012-03-27T03:41:46.171Z cpu14:4977)VAAI_FILTER: VaaiFilterClaimDevice:270: Attached vaai filter (vaaip:VMW_VAAIP_EQL) to logical device 'naa.6090a0b8606ed53aaff0d4010000f00b'
2012-03-27T03:41:46.203Z cpu14:4977)ScsiDevice: 3121: Successfully registered device "naa.6090a0b8606ed53aaff0d4010000f00b" from plugin "NMP" of type 0
2012-03-27T03:41:46.205Z cpu14:4977)vmw_psp_fixed: psp_fixedSelectPathToActivateInt:479: Changing active path from NONE to vmhba33:C2:T6:L0 for device "Unregistered Device".

These errors all appear in the following VMware KB articles, both of which point at the MEM plugin:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1016381
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2004432
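
Three kinds of lines in the vmkernel.log excerpt above matter most. In the first, the routed PSP is reported as not registered. In the second, the EQL SATP fails to load that PSP as its default. In the third, the Fixed PSP activates the paths instead. The sketch below is a hypothetical helper that pulls those three facts out of a saved log. Its regexes are written against the exact ESXi 5.x wording quoted above, so other builds may phrase things differently.

```python
import re
import sys

# Regexes written against the ESXi 5.x wording quoted in the log above (assumption).
NOT_REGISTERED_RE = re.compile(r"Plugin (\S+) is not registered!")
DEFAULT_PSP_FAILED_RE = re.compile(r"Default psp (\S+) for SATP: (\S+) Load Failed!")
FIXED_FALLBACK_RE = re.compile(
    r"vmw_psp_fixed: .*Changing active path from NONE to (\S+)"
)


def summarize_psp_problems(lines):
    """Collect unregistered PSP plugins, SATPs whose default PSP failed to
    load, and the paths that the Fixed PSP activated instead."""
    missing = set()
    failed_defaults = {}  # SATP name -> PSP it could not load
    fixed_paths = []
    for line in lines:
        m = NOT_REGISTERED_RE.search(line)
        if m:
            missing.add(m.group(1))
        m = DEFAULT_PSP_FAILED_RE.search(line)
        if m:
            failed_defaults[m.group(2)] = m.group(1)
        m = FIXED_FALLBACK_RE.search(line)
        if m:
            fixed_paths.append(m.group(1))
    return {
        "missing_psps": sorted(missing),
        "failed_defaults": failed_defaults,
        "fixed_paths": fixed_paths,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python psp_summary.py vmkernel.log
    with open(sys.argv[1]) as fh:
        for key, value in summarize_psp_problems(fh).items():
            print(f"{key}: {value}")
```

Run against the excerpt above, this would report DELL_PSP_EQL_ROUTED as missing, VMW_SATP_EQL as the SATP that wanted it, and two paths picked up by vmw_psp_fixed.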

First and foremost, I restarted the management agents by running this in the SSH session:
service mgmt-vmware restart

Still no volumes, so I did a full reboot of the host, only to find that the volumes were still missing.

I was still seeing similar errors in the logs, so I uninstalled the Dell EqualLogic MEM plugin. To do that, run the following and then reboot:
esxcli software vib remove --vibname dell-eql-host-connection-mgr --vibname dell-eql-hostprofile --vibname dell-eql-routed-psp
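
Before running the removal, it's worth confirming which EqualLogic VIBs are actually installed. esxcli software vib list shows them. The sketch below is a hypothetical helper that takes a saved copy of that output and builds the same esxcli software vib remove command for every VIB whose name starts with dell-eql. It assumes the ESXi 5.x table layout: a header row, a dashed separator, then one VIB per line with the name in the first column.

```python
import shlex
import sys


def eql_vib_remove_command(vib_list_output, prefix="dell-eql"):
    """Build an 'esxcli software vib remove' command for every installed VIB
    whose name starts with prefix; return None if there are none."""
    names = []
    for line in vib_list_output.splitlines():
        fields = line.split()
        if not fields or fields[0] == "Name" or set(fields[0]) == {"-"}:
            continue  # skip blank lines, the header row and the dashed separator
        if fields[0].startswith(prefix):
            names.append(fields[0])
    if not names:
        return None
    args = ["esxcli", "software", "vib", "remove"]
    for name in names:
        args += ["--vibname", name]
    # shlex.quote leaves plain names untouched and quotes anything odd.
    return " ".join(shlex.quote(a) for a in args)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: save 'esxcli software vib list' output to a file, then
    #   python <script> vib_list.txt
    with open(sys.argv[1]) as fh:
        print(eql_vib_remove_command(fh.read()) or "no dell-eql VIBs found")
```

It only prints the command. You still paste it into the SSH session yourself and reboot afterwards.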

The system came back up, still with no volumes. At this point, I did something I'm not sure I really had to do: I disabled the iSCSI software initiator, rebooted, and re-enabled it. It didn't solve anything, and it may have caused more trouble. I mention it only because I'm not certain what the actual fix was.

As a last-ditch effort, I took one of the volumes offline and then brought it back online. Amazingly enough, the ESXi host found it, with a path selection policy of "Fixed".

I wish I could find the KB article (or maybe it was a blog post) that talked about what was, essentially, a cached path selection. Proof of that can be found in the bottom three lines of the log above. At timestamp 03:41 the MEM plugin had already been uninstalled, so the ESXi host shouldn't have seen that plugin anymore, and on the bottom line the device swapped over to "Fixed".

I haven't tried to reinstall the MEM plugin yet. I'd like to do some more testing on a smaller scale before I attempt it again.

Small follow-up: I opened a support ticket with Dell. They suggest switching all of the path selection policies over to Round Robin before upgrading.

I'm not exactly thrilled with that response, but I'll go through and disable the MEM plugin, then try the update again. Perhaps the plugin needs to be disabled beforehand.

I'll update with any results.

Update

Well, that seems to be the official word. The update process is not as streamlined as I would have hoped. The tech left me with "It is suggested to change all datastore multi pathing to round-robin on ESX 5.x before installing MEM."

I did test it elsewhere, and it works just fine: if you first change all of the PSPs from DELL_PSP_EQL_ROUTED to either Round Robin or Fixed, the install of the new MEM completes successfully.

13 Dec 2011

Registering the EQL HIT Kit to a New vCenter

Instead of performing the planned upgrade, we decided to start from scratch and do a full reinstall of our environment. That meant registering the Dell EqualLogic HIT Kit with a new vCenter.

Start by opening the console on the VM and logging in (default username: root, default password: eql). Once logged in, select Option 8 to unregister the appliance from the old vCenter.

HIT Kit Login
HIT Kit Unreg

From there, select Option 4 to configure vCenter. Enter the details for the new vCenter (IP, admin account, password, EQL HIT Kit appliance IP, and an admin email address) and confirm them. The appliance should then connect to vCenter successfully.
HIT Kit Config

Once back at the main screen, select Option 7 to register the appliance with vCenter, and then reboot the appliance.
HIT Kit Reg

After the appliance is back at the login prompt, check the vCenter "Solutions and Applications" section and make sure the EqualLogic utilities are there. For good measure, log in to one of the utilities and make sure the configuration is correct.
vCenter Plugins
vCenter Config

6 Jul 2011

Installing the EQL HIT kit – VMware Edition

Start by downloading the HIT kit from http://support.equallogic.com/ in the Downloads section
"Accept" the EULA, and save the OVA file
OVA Download

Through vCenter, go to "File" then "Deploy OVF Template..."
Deploy OVF Template

Select the OVA file that was just downloaded
OVA Deploy

Verify the OVF details
OVF Details

Give the new system a name and a location, along with a datastore to reside on
Name
Datastore

Choose the disk format (thin or thick) and assign the networking
Disk Format
Networking

Verify the information, click finish, and wait for the deployment to complete
Verify
Complete

Open the console on the new system
Select the green "power on" button
At the login prompt, log in with:
Username: root
Password: eql
Logging In
Initial Menu

Set the Time Zone by selecting Item 1
Select "Americas" by selecting #2
Time Zone Config

Select "United States" by selecting #47
Time Zone Config

Select your proper time zone
Verify the information by selecting #1
Verify the information again, hit the "Enter" key
Time Zone Config

Configure the management network by selecting Item 2
Enter a hostname
Select which NIC to use for the management network (eth0 is default)
Enter an IP
Enter the Network mask
Enter the Gateway
Enter the DNS server
Enter a secondary DNS server
Verify the information, proceed by entering "y"
Verify the information again, hit the "Enter" key
Management Network Config

Configure the storage management network by selecting Item 3
Enter an IP
Enter the Network mask
Enter the Gateway
Verify the information, proceed by entering "y"
Verify the information again, hit the "Enter" key
Networking Config
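
Items 2 and 3 both ask for an IP, a netmask, and a gateway, and a typo there can leave the appliance unreachable. The sketch below is a quick local sanity check using Python's standard ipaddress module. It confirms that the gateway actually falls inside the subnet defined by the IP and netmask. The function name and example addresses are hypothetical, and the check knows nothing about your VLANs or routing.

```python
import ipaddress


def check_interface(ip, netmask, gateway):
    """Return a list of problems with an IP / netmask / gateway triple.

    An empty list only means the values are internally consistent.
    """
    try:
        # Accepts a dotted netmask such as 255.255.255.0 as well as a prefix length.
        iface = ipaddress.IPv4Interface(f"{ip}/{netmask}")
    except ValueError as exc:
        return [f"invalid IP or netmask: {exc}"]
    try:
        gw = ipaddress.IPv4Address(gateway)
    except ValueError as exc:
        return [f"invalid gateway: {exc}"]

    problems = []
    if gw not in iface.network:
        problems.append(f"gateway {gw} is outside {iface.network}")
    if gw == iface.ip:
        problems.append("gateway is the same address as the appliance")
    return problems
```

For example, check_interface("10.0.0.50", "255.255.255.0", "10.0.1.1") returns ["gateway 10.0.1.1 is outside 10.0.0.0/24"]. Run it once for the management values and once for the storage management values.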

Configure the vCenter by selecting Item 4
Enter the name or address of the vCenter
Enter the name of an admin
Enter and re-enter the admin's password
Enter an administrator's email address
Verify the information, proceed by entering "y"
Verify the information again, hit the "Enter" key
vCenter Config

Configure the PS Group by selecting Item 5
Enter the name or IP of the PS group
Enter the name of a PS group admin
Enter and re-enter the admin's password
Verify the information, proceed by entering "y"
Verify the information again, hit the "Enter" key
PS Group Config

Configure View if you use it in your environment; we currently don't, so this step was skipped.
Register the plug-in with vCenter by selecting Item 7
Review
Register

Verify the information, proceed by entering "y" and the system will reboot

After the system reboots, go to vCenter and check your plugins to make sure they're there.
Back on the "Home" screen in vCenter, it should look similar to this:
Plugins

Click on the "EqualLogic Datastore Manager", accept the digital signature, then log in with your normal vCenter credentials
Datastore Manager
HIT/VE Login

You should see a screen similar to this:
HIT/VE Main Screen