Monday, November 16, 2009

Vxplex DISABLED RECOVER state

After a NetApp FAS6080 hosting FCP LUNs failed this weekend, we came into the office to find that many of the servers using those LUNs had offline volumes and disk groups.


Here was the state of the volume in question:

v szdbor006du02 - DISABLED ACTIVE 2727606272 SELECT - fsgen
pl szdbor006du02-01 szdbor006du02 DISABLED RECOVER 2727606272 CONCAT - RW
sd szdbor006ddg01-01 szdbor006du02-01 szdbor006ddg01 0 209646560 0 c1t500A098187197B34d10 ENA
sd szdbor006ddg02-02 szdbor006du02-01 szdbor006ddg02 209648096 943459616 209646560 c1t500A098187197B34d11 ENA
sd szdbor006ddg03-01 szdbor006du02-01 szdbor006ddg03 0 1153107712 1153106176 c3t500A098287197B34d15 ENA
sd szdbor006ddg04-01 szdbor006du02-01 szdbor006ddg04 0 421392384 2306213888 c1t500A09828759382Fd50 ENA

I issued vxrecover on the volume and plex, but the state never changed, and I couldn't find a recovery task with ps or vxtask list. My guess is the recovery state was somehow confused, so here is what I needed to do to fix it:
vxplex -g diskgroup det szdbor006du02-01
This put the plex into a DETACHED STALE state.
vxmend -g diskgroup fix clean szdbor006du02-01
This put the plex into a DETACHED CLEAN state, at which point I could run:
vxvol -g diskgroup startall (I could have given just the volume name as well)
This enabled and started the volume. I then fsck'd and remounted the filesystem.
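
For reference, here is the whole sequence in one place. This is just a sketch: "diskgroup" is the same placeholder used above, and the fsck/mount device paths and mount point are examples, so substitute your own names.

vxprint -g diskgroup -ht szdbor006du02              # confirm the plex is stuck in DISABLED RECOVER
vxtask list                                         # verify no recovery task is actually running
vxplex -g diskgroup det szdbor006du02-01            # detach the stuck plex (DETACHED STALE)
vxmend -g diskgroup fix clean szdbor006du02-01      # mark the plex clean
vxvol -g diskgroup startall                         # start the volume(s) in the disk group
fsck -F vxfs /dev/vx/rdsk/diskgroup/szdbor006du02   # check the filesystem
mount -F vxfs /dev/vx/dsk/diskgroup/szdbor006du02 /some/mountpoint   # remount (example mount point)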

Now to figure out why exactly the FAS6080 crashed just because of an HBA hiccup.
Hope this may be useful if you ever run into the same scenario.

Tuesday, November 3, 2009

NetApp fun for 24 hours

I've been working at our ORC datacenter (Off-Site Records Center) installing two NetApp filers that I moved from our downtown DC. WOW... it was all going so well until I booted up the new FAS3050 filers to replace the older 960 and 980 heads.

1st, the 3050s complained about not seeing any disks they could own. Fixed that by booting into maintenance mode and assigning the disks to the new 3050s (commands sketched below).

2nd, the 3050s complained that the disks had a mismatched ONTAP version on them: 7.3.2 on the disks, while the 3050s had 7.2.# on them.

3rd, a netboot of the 3050s blew up every time. The NIC would just go offline, which would kill the netboot. I tried netbooting from downtown, from HOU, and directly connected to my laptop; none of it worked!

4th, I decided to just reuse the 900s. The first 960 booted up and complained it couldn't grab ownership of the disks (because the 3050 had grabbed them before), so I had to re-rack the 3050, plug the disks back in, and run remove_ownership (also sketched below).

5th, the network port locations weren't communicated correctly, so the ports weren't on the correct VLANs.
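
For items 1 and 4, the maintenance-mode disk commands look roughly like this. This is a from-memory sketch (the disk names are just examples, and the syntax varies a bit between Data ONTAP releases), so double-check it on your version:

disk show -n                  # list disks that currently have no owner
disk assign all               # assign every unowned disk to this controller
disk assign 0a.16 0a.17       # ...or assign specific disks (example disk names)
disk remove_ownership 0a.16   # on the 3050, release a disk so the 960 can claim it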

24 hours later (9:30 AM today), the two replication filers are back online in the ORC datacenter.

This freed up 8 kW of power in the downtown DC, so the DC manager is happy again.

(sorry for the bad grammar and capitalization)

Thursday, October 22, 2009

Veritas SF patches....

I've been troubleshooting an issue on a Solaris 10 host with Veritas SF 4.1MP2.RP3 installed. The issue: when you reuse a LUN number, VxVM and VxDMP don't realize that the LUN is actually new, so DMP tries to use it as an alternate path for another disk. In my case, we had migrated some LUNs to another storage port and HBA on the host for performance reasons. When we later mapped another LUN with the same number as one used previously, DMP freaked out and dropped the volume, which in turn blew up Oracle.
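
If you want to see whether DMP has quietly folded a reused LUN number in as an extra path under an existing disk, the path listing shows it. A quick sketch (the controller and DMP node names below are just examples from a Solaris host):

vxdisk list                                 # a disk reporting more paths than it should have is the giveaway
vxdmpadm getsubpaths ctlr=c1                # every subpath seen through controller c1
vxdmpadm getsubpaths dmpnodename=c1t0d5s2   # the paths grouped under a single DMP device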

While digging through this to find a release with a fix, I remembered this site: http://vias.symantec.com/labs/vpcs. It lists all the Veritas Storage products and their Maintenance Packs, Rolling Patches, and Hotfixes.

Hopefully someone else gets some good use out of it.

Wednesday, October 21, 2009

My first white paper...

I wrote a white paper on HDS Storage Virtualization and VMware a couple of years ago. I had lost track of it, but found it again this morning.

It was my first try at a publication so don't be too critical of me. I thought I would share it as well as plug my storage Vendor/Partner.

Lumenate VMware and Storage Virtualization

Lumenate

Tuesday, October 20, 2009

VMware ESX migration..

How are we going to migrate two large VMware ESX 3.5 U2 environments into our new DC?

Well here are the details of the new environment (servers and storage) and my thoughts!

The servers will be HP c-Class BladeSystem with BL460c blades and Intel Xeon E5550 quad-cores. The storage will be Hitachi Data Systems USP-V.

If I extend the SAN from the PDC to our current DC via the 10 Gb/s link that runs from the airport area to downtown, I can map new LUNs from the PDC array to the current ESX clusters. Moving in that direction lets us complete the first piece of the puzzle: migrating the data to the PDC. Using Storage VMotion, we can migrate the VM data to the PDC USP-V (a quick sketch of that is below). Now the hairy part: how do we migrate the actual VMs to the PDC servers?
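
First, the Storage VMotion sketch. In ESX 3.5 it's driven from the VI Remote CLI (svmotion) rather than the VI Client, at least as far as I've seen. A minimal sketch, assuming the Remote CLI is installed on a workstation that can reach VirtualCenter:

svmotion --interactive     # walks through prompts for the VirtualCenter server, the VM, and the destination datastore

(There is also a non-interactive form of svmotion; check the Remote CLI docs for the exact --vm syntax before scripting it.)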

Well, if we extend the ESX cluster to the PDC BladeSystem over the 10 Gb/s link, that should work just as if we were extending the cluster from one side of the DC to the other. OK, so technically that all should work. What about the CPU differences? Currently the ESX clusters run on Intel Xeon X5355 quad-cores, and according to the VMotion CPU compatibility requirements for Intel CPUs, this will be a problem.

Intel Core CPUs - VMotion CPU compatibility groups (applies to ESX 4.x, ESX Server 3.x, and ESX Server 2.x):

Group A: Without SSSE3, SSE4.1, or SSE4.2. Models include dual-core Xeon LV based on the Intel Core microarchitecture (for example, Sossaman). For A<->B VMotion, apply the SSSE3 mask (not supported).

Group B: With SSSE3; without SSE4.1 or SSE4.2. Models include Intel Xeon CPUs based on the Intel Core microarchitecture, for example Intel Xeon 30xx, 32xx, 51xx, 53xx, 72xx, or 73xx. For B<->C VMotion, apply the SSE4.1 mask (not supported prior to ESX 3.5; experimentally supported for ESX 3.5 and later only).

Group C: With SSSE3 and SSE4.1; without SSE4.2. Models include Intel Xeon CPUs based on the 45nm Intel Core microarchitecture, for example Intel Xeon 31xx, 33xx, 52xx, 54xx, or 74xx. For C<->D VMotion, apply the SSE4.2 mask (not supported prior to ESX 3.5; experimentally supported for ESX 3.5 and later only).

Group D: With SSSE3, SSE4.1, and SSE4.2. Models include Intel Xeon CPUs based on the Intel Nehalem microarchitecture (Core i7), for example Intel Xeon 55xx.
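
One quick way to check which group a given host falls into is to look at the CPU flags from the ESX 3.5 service console (it exposes a Linux-style /proc/cpuinfo, where the features show up as ssse3, sse4_1, and sse4_2). A rough sketch:

grep '^flags' /proc/cpuinfo | head -1 | tr ' ' '\n' | egrep 'ssse3|sse4_1|sse4_2'
# The X5355s (Group B) should only show ssse3; the new E5550s (Group D) should show all three.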

So now what to do? Will this not actually work or just not be a long-term supportable configuration? Feasibility vs Supportability! I just need it to be feasible.

I have mulled over SRM for the migration as well. While it's possible, it would require a TON of pre-work to get each move group's VMs onto the same LUNs, and it would require outages, which I guess are OK, but I just think it's really cool to be able to do it all without downtime.

To be or Not to be...

Monday, October 19, 2009

the little cloud that could...

So with the recent failure in the MS/Sidekick cloud, I got to thinking... what if?

What if we put our data in a cloud ($billions in data assets)?
What if all that data disappeared?
What would the user experience be like with the data in the cloud?
Could we create our own cloud in our Primary Data Center?
Should the cloud be "clustered" for application and data availability?

Oh well, that's what I had running through my head...

Feel free to answer the questions above or add more!

EDIT: http://opensource.sys-con.com/node/1148239 Very interesting take on the Internal (private) Cloud. Thanks Eric!

Wednesday, September 30, 2009

Blog Entries...

I was told that I needed to start a blog. I remembered that I had this one, so maybe I'll start adding some content to it. I prefer to minimize the number of sites that I have to post on or update, but I guess maybe I can try and keep this one up2date.

Tuesday, January 27, 2009

Stardate 200901.27

So the blog begins. I have every other type of social networking/Web 2.0 site, so I figured I may as well start this one as well. That's all for now, igottapoop.