When installing the Dell Equallogic MEM goes wrong…
For those who don't know, Dell and Equallogic have been releasing firmware and updates at a massive pace. It's a pace I personally prefer to that of some other vendors, which release updates quarterly.
However, after upgrading to the VMware version of MEM 1.1 things started to go very... very wrong.
I happened to take a screenshot of the successful install of 1.1 and removal of 1.0.9:
After a reboot, none of the volumes returned. None, not a single one. I checked the Management Console, everything was fine, except for the connections tab which showed zero connections instead of the normal 48.
I then connected to the host and could ping all of the storage array's iSCSI ports without problem.
Since I was already SSH'd into the host, I figured I'd do some log diving.
In the vmkwarning.log I'm hit instantly with a bunch of these:
2012-03-26T20:31:17.714Z cpu13:7053)WARNING: NMP: nmpDeviceAttemptFailover:658:Retry world failover device "naa.6090a0a8c099b78b22e8d4f8d3904f66" - failed to issue command due to Not found (APD), try again...
2012-03-26T20:31:17.714Z cpu13:7053)WARNING: NMP: nmpDeviceAttemptFailover:708:Logical device "naa.6090a0a8c099b78b22e8d4f8d3904f66": awaiting fast path state update...
2012-03-26T20:31:18.714Z cpu2:6930)WARNING: vmw_psp_rr: psp_rrSelectPathToActivate:972:Could not select path for device "naa.6090a0a8c099b78b22e8d4f8d3904f66".
2012-03-26T20:31:18.714Z cpu2:4786)WARNING: vmw_psp_rr: psp_rrSelectPath:1146:Could not select path for device "naa.6090a0a8c099b78b22e8d4f8d3904f66".
2012-03-27T01:32:31.371Z cpu7:5651)WARNING: iscsi_vmk: iscsivmk_ConnReceiveAtomic: vmhba33:CH:1 T:15 CN:0: Failed to receive data: Connection closed by peer
2012-03-27T01:32:31.371Z cpu7:5651)WARNING: iscsi_vmk: iscsivmk_ConnReceiveAtomic: Sess [ISID: TARGET: (null) TPGT: 0 TSIH: 0]
2012-03-27T01:32:31.371Z cpu7:5651)WARNING: iscsi_vmk: iscsivmk_ConnReceiveAtomic: Conn [CID: 0 L: 10.*.*.22:62322 R: 10.*.*.5:3260]
2012-03-27T01:32:31.623Z cpu7:5651)WARNING: iscsi_vmk: iscsivmk_StopConnection: vmhba33:CH:0 T:0 CN:0: iSCSI connection is being marked "OFFLINE" (Event:4)
2012-03-27T01:32:31.623Z cpu7:5651)WARNING: iscsi_vmk: iscsivmk_StopConnection: Sess [ISID: TARGET: (null) TPGT: 0 TSIH: 0]
2012-03-27T01:32:31.623Z cpu7:5651)WARNING: iscsi_vmk: iscsivmk_StopConnection: Conn [CID: 0 L: 10.*.*.21:52273 R: 10.*.*.5:3260]
2012-03-27T03:20:39.154Z cpu4:4980)WARNING: NMP: nmp_SatpGetDefaultPspi:624:Default psp DELL_PSP_EQL_ROUTED for SATP: VMW_SATP_EQL Load Failed! [Not found]
2012-03-27T03:20:39.159Z cpu0:4980)WARNING: ScsiDeviceIO: 6235: The device naa.6090a0a8c099072610ef54ee86018060 does not permit the system to change the sitpua bit to 1.
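With this many warnings scrolling by, it can help to boil the file down to a per-component count and pull out the one line that names the missing plugin. Below is a minimal Python sketch of that idea, not a supported tool. The line layout is only assumed from the excerpt above, and the function name `summarize_warnings` is made up for illustration.

```python
import re
from collections import Counter

# Assumed line shape, based only on the excerpt above:
#   <timestamp> cpuN:WORLD)WARNING: <component>: <message>
WARNING_RE = re.compile(
    r"^(?P<ts>\S+) cpu\d+:\d+\)WARNING: (?P<component>[^:]+): (?P<message>.*)$"
)

# The message that names the PSP which failed to load for a SATP.
PSP_FAIL_RE = re.compile(
    r"Default psp (?P<psp>\S+) for SATP: (?P<satp>\S+) Load Failed"
)


def summarize_warnings(lines):
    """Count warnings per component and collect failed (PSP, SATP) pairs."""
    per_component = Counter()
    failed_psps = set()
    for line in lines:
        match = WARNING_RE.match(line.strip())
        if not match:
            continue  # not a WARNING line in the assumed format
        per_component[match.group("component")] += 1
        failure = PSP_FAIL_RE.search(match.group("message"))
        if failure:
            failed_psps.add((failure.group("psp"), failure.group("satp")))
    return per_component, failed_psps


# Three lines copied from the vmkwarning.log excerpt above.
sample = [
    '2012-03-26T20:31:18.714Z cpu2:6930)WARNING: vmw_psp_rr: psp_rrSelectPathToActivate:972:Could not select path for device "naa.6090a0a8c099b78b22e8d4f8d3904f66".',
    '2012-03-27T01:32:31.371Z cpu7:5651)WARNING: iscsi_vmk: iscsivmk_ConnReceiveAtomic: vmhba33:CH:1 T:15 CN:0: Failed to receive data: Connection closed by peer',
    '2012-03-27T03:20:39.154Z cpu4:4980)WARNING: NMP: nmp_SatpGetDefaultPspi:624:Default psp DELL_PSP_EQL_ROUTED for SATP: VMW_SATP_EQL Load Failed! [Not found]',
]

counts, failed = summarize_warnings(sample)
for component, count in counts.items():
    print(component, count)
for psp, satp in failed:
    print(f"PSP {psp} failed to load for SATP {satp}")
```

In practice you would feed it the lines of the real log file instead of `sample`. The failed PSP/SATP pair is the interesting part, since it points straight at the MEM plugin.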
Then, to add insult to injury, I check out the vmkernel.log and find more horribleness:
2012-03-27T03:17:23.611Z cpu5:5624)iscsi_vmk: iscsivmk_ConnNetRegister: socket 0x41002c0b7640 network resource pool netsched.pools.persist.iscsi associated
2012-03-27T03:17:23.611Z cpu5:5624)iscsi_vmk: iscsivmk_ConnNetRegister: socket 0x41002c0b7640 network tracker id 1 tracker.iSCSI.10.*.*.5 associated
2012-03-27T03:17:23.612Z cpu5:5624)WARNING: iscsi_vmk: iscsivmk_ConnReceiveAtomic: vmhba33:CH:0 T:3 CN:0: Failed to receive data: Connection closed by peer
2012-03-27T03:17:23.612Z cpu5:5624)WARNING: iscsi_vmk: iscsivmk_ConnReceiveAtomic: Sess [ISID: TARGET: (null) TPGT: 0 TSIH: 0]
2012-03-27T03:17:23.613Z cpu5:5624)WARNING: iscsi_vmk: iscsivmk_ConnReceiveAtomic: Conn [CID: 0 L: 10.*.*.21:56710 R: 10.*.*.5:3260]
2012-03-27T03:17:23.613Z cpu5:5624)iscsi_vmk: iscsivmk_ConnRxNotifyFailure: vmhba33:CH:0 T:3 CN:0: Connection rx notifying failure: Failed to Receive. State=Bound
2012-03-27T03:17:23.613Z cpu5:5624)iscsi_vmk: iscsivmk_ConnRxNotifyFailure: Sess [ISID: TARGET: (null) TPGT: 0 TSIH: 0]
2012-03-27T03:17:23.613Z cpu5:5624)iscsi_vmk: iscsivmk_ConnRxNotifyFailure: Conn [CID: 0 L: 10.*.*.21:56710 R: 10.*.*.5:3260]
2012-03-27T03:19:19.127Z cpu14:5624)iscsi_vmk: iscsivmk_ConnNetRegister: socket 0x41002c09d020 network resource pool netsched.pools.persist.iscsi associated
2012-03-27T03:19:19.127Z cpu14:5624)iscsi_vmk: iscsivmk_ConnNetRegister: socket 0x41002c09d020 network tracker id 1 tracker.iSCSI.10.*.*.10 associated
2012-03-27T03:19:19.595Z cpu14:5624)WARNING: iscsi_vmk: iscsivmk_StartConnection: vmhba33:CH:0 T:0 CN:0: iSCSI connection is being marked "ONLINE"
2012-03-27T03:19:19.596Z cpu14:5624)WARNING: iscsi_vmk: iscsivmk_StartConnection: Sess [ISID: 00023d000001 TARGET: iqn.2001-05.com.equallogic:0-8a0906-8bb799c0a-664f90d3f8d4e822-remotesystems TPGT: 1 TSIH: 0]
2012-03-27T03:19:19.596Z cpu14:5624)WARNING: iscsi_vmk: iscsivmk_StartConnection: Conn [CID: 0 L: 10.*.*.21:58840 R: 10.*.*.10:3260]
2012-03-27T03:19:20.344Z cpu22:5497)ScsiScan: 1098: Path 'vmhba33:C2:T0:L0': Vendor: 'EQLOGIC ' Model: '100E-00 ' Rev: '5.2 '
2012-03-27T03:19:20.344Z cpu22:5497)ScsiScan: 1101: Path 'vmhba33:C2:T0:L0': Type: 0x0, ANSI rev: 5, TPGS: 1 (implicit only)
2012-03-27T03:19:20.345Z cpu0:5497)ScsiScan: 1582: Add path: vmhba33:C2:T0:L0
2012-03-27T03:19:20.385Z cpu0:5497)VMKAPICore: 2204: Loading Module vmw_satp_eql
2012-03-27T03:19:20.386Z cpu21:4783)WARNING: UserObj: 3232: Unimplemented operation on 0x410017c05b60/RPC
2012-03-27T03:19:20.386Z cpu21:4783)WARNING: UserObj: 675: Failed to crossdup fd 9, cnxId: 0x80000000 type RPC: Not implemented
2012-03-27T03:19:20.544Z cpu4:6392)Loading module vmw_satp_eql ...
2012-03-27T03:19:20.544Z cpu4:6392)Elf: 1862: module vmw_satp_eql has license VMware
2012-03-27T03:19:20.544Z cpu4:6392)Mod: 4015: Initialization of vmw_satp_eql succeeded with module ID 62.
2012-03-27T03:19:20.544Z cpu4:6392)vmw_satp_eql loaded successfully.
2012-03-27T03:19:20.549Z cpu0:5497)NMP: nmp_LoadPlugin:3188: Plugin DELL_PSP_EQL_ROUTED is not registered!
2012-03-27T03:19:20.549Z cpu0:5497)WARNING: NMP: nmp_SatpGetDefaultPspi:624:Default psp DELL_PSP_EQL_ROUTED for SATP: VMW_SATP_EQL Load Failed! [Not found]
2012-03-27T03:19:20.549Z cpu0:5497)ScsiPath: 4541: Plugin 'NMP' claimed path 'vmhba33:C0:T0:L0'
2012-03-27T03:19:20.551Z cpu10:5497)vmw_psp_fixed: psp_fixedSelectPathToActivateInt:479: Changing active path from NONE to vmhba33:C2:T0:L0 for device "Unregistered Device".
2012-03-27T03:41:46.169Z cpu14:4977)VMWARE SCSI Id: Id for vmhba33:C1:T14:L0
0x60 0x90 0xa0 0xb8 0x60 0x6e 0xd5 0x3a 0xaf 0xf0 0xd4 0x01 0x00 0x00 0xf0 0x0b 0x31 0x30 0x30 0x45 0x2d 0x30
2012-03-27T03:41:46.169Z cpu14:4977)VMWARE SCSI Id: Id for vmhba33:C0:T14:L0
0x60 0x90 0xa0 0xb8 0x60 0x6e 0xd5 0x3a 0xaf 0xf0 0xd4 0x01 0x00 0x00 0xf0 0x0b 0x31 0x30 0x30 0x45 0x2d 0x30
2012-03-27T03:41:46.171Z cpu14:4977)ScsiDeviceIO: 5843: QErr is correctly set to 0x0 for device naa.6090a0b8606ed53aaff0d4010000f00b.
2012-03-27T03:41:46.171Z cpu14:4977)WARNING: ScsiDeviceIO: 6235: The device naa.6090a0b8606ed53aaff0d4010000f00b does not permit the system to change the sitpua bit to 1.
2012-03-27T03:41:46.171Z cpu14:4977)VAAI_FILTER: VaaiFilterClaimDevice:270: Attached vaai filter (vaaip:VMW_VAAIP_EQL) to logical device 'naa.6090a0b8606ed53aaff0d4010000f00b'
2012-03-27T03:41:46.203Z cpu14:4977)ScsiDevice: 3121: Successfully registered device "naa.6090a0b8606ed53aaff0d4010000f00b" from plugin "NMP" of type 0
2012-03-27T03:41:46.205Z cpu14:4977)vmw_psp_fixed: psp_fixedSelectPathToActivateInt:479: Changing active path from NONE to vmhba33:C2:T6:L0 for device "Unregistered Device".
These errors are all included in the following VMware KB articles which all point at the MEM plugin:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1016381
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2004432
First and foremost, I restarted the management agents by doing this in the SSH console:
service mgmt-vmware restart
Still no volumes, so I had the system do a full restart, only to find that there were still no volumes present.
I'm still seeing similar errors in the logs, so I go through and uninstall the Dell Equallogic MEM plugin. To do that, do the following and then reboot:
esxcli software vib remove --vibname dell-eql-host-connection-mgr --vibname dell-eql-hostprofile --vibname dell-eql-routed-psp
System comes back up, still no volumes. At this point, I did something I'm not sure I really had to do: I disabled the iSCSI software initiator, rebooted, and re-enabled it. It didn't solve anything, and might have caused more trouble. I mention it because I'm not entirely sure what the fix really was.
Last ditch effort, I put one of the volumes offline and then brought it back online. Amazingly enough, the ESXi host found it. It was given a path selection of "Fixed".
I wish I could find the KB article (or maybe it was a blog post) that described what is, essentially, a cached path selection. Proof of that can be found in the last three lines of the vmkernel.log excerpt above. At the 03:41 timestamp the MEM plugin had already been uninstalled, so the ESXi host shouldn't have seen that plugin anymore. On the last line, the path selection swapped over to "Fixed".
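To see which path selection plugin actually picked up the paths, and when, the "Changing active path" messages can be pulled out of vmkernel.log. The sketch below is a rough Python take on that. The message layout is assumed from the log excerpt above, and the function name `path_activations` is invented for this example.

```python
import re

# Assumed shape of the message, taken from the vmkernel.log excerpt above:
#   <timestamp> cpuN:WORLD)<psp module>: ...Changing active path from X to Y for device ...
ACTIVATE_RE = re.compile(
    r"^(?P<ts>\S+) cpu\d+:\d+\)(?P<psp>[^:\s]+): "
    r".*Changing active path from (?P<old>\S+) to (?P<new>\S+) for device"
)


def path_activations(lines):
    """Return (timestamp, psp_module, new_path) for each path activation."""
    events = []
    for line in lines:
        match = ACTIVATE_RE.match(line.strip())
        if match:
            events.append(
                (match.group("ts"), match.group("psp"), match.group("new"))
            )
    return events


# Lines copied from the vmkernel.log excerpt above.
sample = [
    '2012-03-27T03:19:20.549Z cpu0:5497)NMP: nmp_LoadPlugin:3188: Plugin DELL_PSP_EQL_ROUTED is not registered!',
    '2012-03-27T03:19:20.551Z cpu10:5497)vmw_psp_fixed: psp_fixedSelectPathToActivateInt:479: Changing active path from NONE to vmhba33:C2:T0:L0 for device "Unregistered Device".',
    '2012-03-27T03:41:46.205Z cpu14:4977)vmw_psp_fixed: psp_fixedSelectPathToActivateInt:479: Changing active path from NONE to vmhba33:C2:T6:L0 for device "Unregistered Device".',
]

events = path_activations(sample)
for ts, psp, new_path in events:
    print(ts, psp, new_path)
```

In these sample lines every activation comes from vmw_psp_fixed, which lines up with the volumes showing a path selection of "Fixed".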
I haven't made another effort to reinstall the MEM plugin so far. I'd like to do some more testing on a smaller scale before I attempt it again.
Small follow up: I have created a support ticket with Dell. They are suggesting that I switch all of the path policies over to be Round Robin before upgrading.
I'm not exactly thrilled with that response, but will be going through and disabling the MEM Plugin, then trying the update again. Perhaps the plugin needs to be disabled beforehand.
I'll update with any results.
Update
Well, that seems to be the official word. The update process is not as streamlined as I would have hoped. The tech left me with "It is suggested to change all datastore multi pathing to round-robin on ESX 5.x before installing MEM."
I did happen to test it elsewhere, and it does work just fine: if you change all of the PSPs from DELL_PSP_EQL_ROUTED to either Round Robin or Fixed, the install of the new MEM completes successfully.
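Changing the policy on every volume by hand gets tedious with a lot of datastores. Here is a minimal Python sketch that reads the output of `esxcli storage nmp device list`, finds the devices still on DELL_PSP_EQL_ROUTED, and prints the matching `esxcli storage nmp device set` commands for review. It only builds the command strings and never runs them. The sample output is abbreviated and its exact layout is an assumption on my part, so check it against a real host first. The function names are made up for this example.

```python
def devices_using_psp(device_list_output, psp_name):
    """Return device IDs whose 'Path Selection Policy' equals psp_name."""
    devices = []
    current = None
    for raw in device_list_output.splitlines():
        if raw and not raw[0].isspace():
            # Unindented lines are assumed to be device identifiers (naa.*).
            current = raw.strip()
        elif raw.strip().startswith("Path Selection Policy:"):
            policy = raw.split(":", 1)[1].strip()
            if current and policy == psp_name:
                devices.append(current)
    return devices


def build_set_psp_commands(devices, new_psp="VMW_PSP_RR"):
    """Build (but do not execute) one esxcli command per device."""
    return [
        f"esxcli storage nmp device set --device {device} --psp {new_psp}"
        for device in devices
    ]


# Abbreviated, assumed shape of `esxcli storage nmp device list` output.
# The device IDs are the ones that appear in the logs earlier in this post.
SAMPLE_OUTPUT = """\
naa.6090a0a8c099b78b22e8d4f8d3904f66
   Device Display Name: EQLOGIC iSCSI Disk (naa.6090a0a8c099b78b22e8d4f8d3904f66)
   Storage Array Type: VMW_SATP_EQL
   Path Selection Policy: DELL_PSP_EQL_ROUTED
   Path Selection Policy Device Config: (omitted)
naa.6090a0b8606ed53aaff0d4010000f00b
   Device Display Name: EQLOGIC iSCSI Disk (naa.6090a0b8606ed53aaff0d4010000f00b)
   Storage Array Type: VMW_SATP_EQL
   Path Selection Policy: VMW_PSP_FIXED
"""

routed = devices_using_psp(SAMPLE_OUTPUT, "DELL_PSP_EQL_ROUTED")
commands = build_set_psp_commands(routed)
for command in commands:
    print(command)
```

Printing the commands instead of executing them is deliberate. It leaves a chance to eyeball the list before touching path policies on a production host.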
Registering the EQL HIT Kit to a New vCenter
Instead of the planned upgrade we were going to perform, we decided to start from scratch and do a full reinstall of our environment. So that entailed registering the Dell EqualLogic HIT Kit to a new VirtualCenter.
Start off by opening up the console on the VM and logging in. (Default Username: root Default Password: eql) Once logged in, select Option 8 to unregister it from the old vCenter.

From there, select Option 4 to configure vCenter. Enter the details for the new vCenter (IP, admin account, password, EQL HIT Kit Appliance IP, and an admin email address) and confirm them. The appliance should then connect to vCenter successfully.
Once back at the main screen, select Option 7 to register the appliance with vCenter and then reboot the appliance.
After the appliance is back at the login prompt, check the vCenter "Solutions and Applications" section and make sure that the EqualLogic utilities are there. For good measure, log in to one of the utilities and ensure the configuration is correct.

Updating EqualLogic SANHQ Software
Dell has released an early production access release of SANHQ, version 2.2. It's been updated to include some new features such as Live View, RAID Evaluator, support for connecting the client to multiple SANHQ servers, support for the new EqualLogic FS7500, temperature displays, drive firmware notifications, and the ability to increase the syslog file size.
Personally, the most exciting ones were these:
Live View allows the streaming of data from either an individual member or even a volume.

RAID Evaluator (requires firmware v5.1) takes the group's current data and evaluates it as if a different RAID policy were applied.

To start the install, go to http://support.equallogic.com/, open the Downloads section, find the SAN HeadQuarters section, and download the latest version. Once downloaded, run SANHQSetup32and64.exe.

I was immediately greeted by a screen to install the .NET 4.0 client profile, so click "Yes" to install that.

Accept the license terms and click "Install" (Be patient, it did take some time in my case)


Verify that the installer will upgrade the current installation and click "Continue" to upgrade the current version of SANHQ. After that, the service will be stopped so the upgrade can proceed.


Enable TCP/IP communication, which is required for all of the advanced features such as Live View. The new version is then installed.

SAN HeadQuarters will then update, click "Finish"

The first time SANHQ opens, one of the other new features will greet you: the firewall detection update. You can let it update the firewall automatically or you can create the rules on your own. Personally, I use Group Policy to create my firewall rules, so I chose "No".

You'll then be asked for credentials to the SANHQ Server, and after that you're in


Everything went quite well for me except for one thing... An email was sent out that all of the controllers failed and that the Firmware was upgraded on them. All very odd, and after a quick screenshot, all the worries from those on the email list were put at ease.


PS Series Firmware – V5.1.1-H2 – Another update!
Just got a nice email from Dell that indicates some people have been having problems during the V5.1.1-H1 update process where it actually freezes! Luckily, I had no such problems with my upgrade to H1.
However, I'm still in the lucky situation where I have a PS6500E at my disposal to test with. So it's time to apply the new patch to it and see how things go with this update.
So head back out to the Equallogic Support site: http://support.equallogic.com/ and download the new V5.1.1-H2 firmware and extract it.
H2 appears to be the same size as H1, so far so good.

Open up the Group Manager and log in:

Click on "Group Configuration", then the "Advanced" tab, and click on the "Update Firmware" button:

Enter the admin password, select the newly downloaded and extracted H2 firmware and click "Open"

You should see that the EQL is indeed running a different firmware, then click the "Update" button

Watch as the firmware is transferred to the EQL, then as the firmware is processed, then click the "Restart" button and hope for the best!



We see the status go to "Preparing to restart" then to "Offline"


After about 5 minutes of waiting, we see a refreshing "Member is up to date" message in green

Finally for the real test, the effect on the VMware host. Max Write Latency of 2508ms, that's it!

So in review, it took about 20 minutes from start to finish and everything worked perfectly, just as it did the first go-round. Roughly the same write latency was seen in H2 as was seen in H1. So I'd claim this to be a success. Thanks to the Dell/EQL teams!
Upgrading to EQL Firmware 5.1.1-H1
So if you were like me, you noticed that Dell pulled the v5.1 firmware upgrade for the Equallogics pretty quickly. Then v5.1.1 appeared, only to be taken down days later. Finally, this morning, v5.1.1-H1 appeared!
After seeing all the benefits to the new firmware at the Dell Storage Forum, I've been a bit on the excited side to get this tossed on some hardware and see how much of a difference we're talking!
So to start your firmware upgrade, head out to the Equallogic Support site and download the new Firmware. Once downloaded, extract it. Should look similar to:

Log into the Group Manager. Here's what I'm working with at the moment:

Click on the "Group Configuration", then go to the "Advanced" tab, and click on the "Update firmware..." button:

Enter the administrative password for the group:

Browse to the extracted firmware location and select the file with a tgz extension:

The Firmware Update Manager will tell you the status of the member(s) in the group and give you an action. In this case, the member is running an older version of firmware and there is an "Upgrade" action available.

Click on the "Upgrade" button and it will start a 3 step process, starting with the FTP transfer of the firmware, then having the member process the firmware update, and finally a restart of each controller:





From my experience, the controllers reboot one at a time, never both at once. The inactive controller reboots, then it becomes the active controller, then the newly inactive controller is rebooted.
There was a little noticeable lag during the reboot, nothing major. Certainly wouldn't do this during the middle of a busy or even moderately busy day. VMware reported roughly 2.25 seconds of latency during the process:

Finally, after the reboot of each controller, they're both reporting v5.1.1-H1 firmware!

Initializing an Equallogic – the GUI way
Personally, I rather enjoy getting on the serial cable and setting it up that way. However, a recent experience forced me to set up an Equallogic PS6000E sight unseen. Someone else had already done the dirty work of racking and networking it all together. So we are assuming that the Windows system we're using has at least one NIC on the same switch and/or VLAN that the EQL is plugged in to.
Start off by installing the HIT kit for Windows
Run the "Remote Setup Wizard"

Choose to initialize the PS Series array, click "Next" and wait for the system to find the array


With a bit of good luck, it should show up

Give the newly found array a name, IP, subnet, and gateway. If this is the first one, create a new group or you can even join an existing group. In this case, this is the first one.

Since this is the first one for the group, create a new group with the necessary information

Once everything is entered, allow the array some time to initialize

Upon completion, click "Finish" to exit from the program. The array has been successfully created.

Installing the EQL HIT kit – VMware Edition
Start by downloading the HIT kit from http://support.equallogic.com/ in the Downloads section
"Accept" the EULA, and save the OVA file

Through vCenter, go to "File" then "Deploy OVF Template..."

Select the OVA file that was just downloaded

Verify the OVF details

Give the new system a name and a location, along with a datastore to reside on


Choose the disk format (thin or thick) and assign the networking


Verify the information, click finish, and wait for the deployment to complete


Open the console on the new system
Select the green "power on" button
At the login prompt, login with:
Username: root
Password: eql


Set the Time Zone by selecting Item 1
Select "Americas" by selecting #2

Select "United States" by selecting #47

Select your proper time zone
Verify the information by selecting #1
Verify the information again, hit the "Enter" key

Configure the management network by selecting Item 2
Enter a hostname
Select which NIC to use for the management network (eth0 is default)
Enter an IP
Enter the Network mask
Enter the Gateway
Enter the DNS server
Enter a secondary DNS server
Verify the information, proceed by entering "y"
Verify the information again, hit the "Enter" key

Configure the storage management network by selecting Item 3
Enter an IP
Enter the Network mask
Enter the Gateway
Verify the information, proceed by entering "y"
Verify the information again, hit the "Enter" key
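Since the management and storage network prompts above both ask for an IP, a network mask, and a gateway, a quick sanity check before typing them into the console can save a reconfigure later. This is a minimal Python sketch using the standard `ipaddress` module. The function name and the addresses in the example are hypothetical, not values from my environment.

```python
import ipaddress


def check_network_settings(ip, netmask, gateway):
    """Return a list of problems found in an IP / netmask / gateway triple."""
    try:
        interface = ipaddress.ip_interface(f"{ip}/{netmask}")
    except ValueError as exc:
        return [f"invalid IP or netmask: {exc}"]
    try:
        gateway_address = ipaddress.ip_address(gateway)
    except ValueError as exc:
        return [f"invalid gateway: {exc}"]

    problems = []
    network = interface.network
    if gateway_address not in network:
        problems.append(f"gateway {gateway_address} is outside {network}")
    if gateway_address == interface.ip:
        problems.append("gateway is the same address as the appliance IP")
    if interface.ip in (network.network_address, network.broadcast_address):
        problems.append(f"{interface.ip} is the network or broadcast address")
    return problems


# Hypothetical values for illustration only.
print(check_network_settings("192.168.10.50", "255.255.255.0", "192.168.10.1"))
print(check_network_settings("192.168.10.50", "255.255.255.0", "192.168.20.1"))
```

An empty list means nothing obviously wrong was found. It does not prove the addresses are free or that the VLAN is right.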

Configure the vCenter by selecting Item 4
Enter the name or address of the vCenter
Enter the name of an admin
Enter and re-enter the admin's password
Enter an administrator's email address
Verify the information, proceed by entering "y"
Verify the information again, hit the "Enter" key

Configure the PS Group by selecting Item 5
Enter the name or IP of the PS group
Enter the name of a PS group admin
Enter and re-enter the admin's password
Verify the information, proceed by entering "y"
Verify the information again, hit the "Enter" key

Configure View if you use it in your environment. We currently do not, so this step was skipped.
Register the plug-in with vCenter by selecting Item 7


Verify the information, proceed by entering "y" and the system will reboot
After the system reboots, go to vCenter and check your plugins to make sure they're there.
Back on the "Home" screen in vCenter, it should look similar to this:

Click on the "Equallogic Datastore Manager" and accept the digital signature, then login with your normal vCenter credentials


A screen similar to this should appear:

Server 2008 R2 Throughput Testing
Background:
SAN Back end: 2 Equallogic 6510Es combined in a single storage pool
1 Windows Server 2008R2, base install, not joined to the domain on an ESXi 4.1 host
1 Windows Server 2008R2, base install, not joined to the domain on another ESXi 4.1 host
Connected via VMXNET3 NICs on our Dell PowerConnect 8024F 10G network on a separate VLAN, no other network connections
Test Procedure: Moving an ~8GB zip via UNC path
Changes made to local policy followed by running gpupdate
Default Policy: 69MB/second

Disable: Digitally sign secure communications (always): 96.7MB/second

Disable: Digitally sign secure communications (always) & (if server/client agrees): 345MB/second
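To put those three numbers side by side, here is a small Python sketch that works out the speedup over the default policy and the approximate copy time for the test file. The throughput figures are the ones measured above. Treating the "~8GB" zip as exactly 8192 MB is my assumption, so the times are ballpark only.

```python
# Throughput figures reported above, in MB/second.
RESULTS_MB_PER_S = {
    "default policy": 69.0,
    "'always' disabled": 96.7,
    "'always' and 'if agrees' disabled": 345.0,
}

# Assumption: the "~8GB" zip is treated as exactly 8 * 1024 MB.
FILE_SIZE_MB = 8 * 1024


def compare(results, baseline_key, file_size_mb):
    """Return (name, speedup vs. baseline, seconds to copy the file) rows."""
    baseline = results[baseline_key]
    rows = []
    for name, rate in results.items():
        speedup = round(rate / baseline, 2)
        seconds = round(file_size_mb / rate, 1)
        rows.append((name, speedup, seconds))
    return rows


rows = compare(RESULTS_MB_PER_S, "default policy", FILE_SIZE_MB)
for name, speedup, seconds in rows:
    print(f"{name}: {speedup}x, about {seconds} s")
```

Under that file-size assumption, disabling both signing policies works out to five times the default throughput, taking the copy from roughly two minutes down to under half a minute.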

Initial Thoughts: EQL 6510E
So here I have a brand new Equallogic 6510E, decided to power it on and test it out before the rest of the equipment arrived.
Couple notes to make:
- Check your power button before plugging in the power, the first one had the button depressed already
- The unit needs to have all the drives inserted to work properly, tried two, twelve, twenty four... No luck, need all 48.
- Don't force the drives into the slots. The mounts on the drives are a little more fragile than I would've expected, but if you line it up right there are no problems.
- Default Username and Password are both: grpadmin
We had some problems getting the management network up and running, apparently it was resolved once we received our PowerConnect 8024F switch. However, we did learn some cool stuff as a result. For instance, the EQLs run off of NetBSD. Enter "support" at the command line and *bang* you've opened a door to a whole lot more functionality. Was able to go in, set the Management NIC (ifconfig eth2) to a static IP. Worked out great.
*NOTE* The support command line option is not supported by Dell, use at your own risk.
Racking the EQLs & R710s
Finally got our new rack setup and it's time to toss everything in
After a lot of struggling with the ball bearing rack mounts, it's in!

Add in 3 R710s and we're down to a single rack, amazing stuff

The old home, still a physical box to retire and an unneeded KVM





