I am thrilled to announce that my blog post on repairs to corrupted VMDKs has been featured in the vSphere Pocketbook Blog Edition! It is truly an honor to be published alongside so many respected members of the VMware community.
Some VMs in my environment had virtual-mode RDMs attached, along with multiple nested snapshots. Some of the RDMs were subsequently extended at the storage array level, but the storage team didn’t realize there was an active snapshot on the virtual-mode RDMs. This resulted in the immediate shutdown of the VMs and a vSphere client error, “The parent virtual disk has been modified since the child was created”, when attempting to power them back on.
I had done a little bit of work dealing with broken snapshot chains before, but the change in RDM size was outside of my wheelhouse, so we logged a call with VMware support. I learned some very handy debugging techniques from them and thought I’d share that information here. I went back into our test environment and recreated the situation that caused the problem.
In this example screenshot, we have a VM with no snapshot in place, and we run vmkfstools -q -v10 against the VMDK file.
-q means query, -v10 is verbosity level 10.
The command opens up the disk, checks for errors, and reports back to you.
In the second example, I’ve taken a snapshot of the VM. I’m now passing the snapshot VMDK into the vmkfstools command. You can see the command opening up the snapshot file, then opening up the base disk.
In the third example, I pass it the snapshot VMDK for a virtual-mode RDM on the same VM – it traverses the snapshot chain and also correctly reports that the VMDK is a non-passthrough raw device mapping, which means it is a virtual-mode RDM.
Part of the problem was that the RDM had been extended at the array, but the snapshot still pointed to the old, smaller size. However, even without any changes to the storage, a corrupted snapshot chain can occur during an out-of-space situation.
I have intentionally introduced a drive geometry mismatch in my test VM below – note that the value after RW in the snapshot TEST-RDM_1-00003.vmdk is 1 less than the value in the base disk TEST-RDM_1.vmdk.
Now if I run it through the vmkfstools command, it reports the same error that we were seeing in the vSphere client in production when trying to boot the VMs – “The parent virtual disk has been modified since the child was created”. But the debugging mode gives you an additional clue that the vSphere client does not – it says that the capacity of each link is different, and it even gives you the values (23068671 != 23068672).
The fix was to follow the entire chain of snapshots and ensure everything was consistent. Start with the most current snapshot in the chain. The “parentCID” value must be equal to the “CID” value in the next disk in the chain, and that next disk is named in the “parentFileNameHint”. So TEST-RDM_1-00003.vmdk is looking for a parentCID value of 72861eac, and it expects to see that in the file TEST-RDM_1.vmdk.
If you open up TEST-RDM_1.vmdk, you see a CID value of 72861eac – this is correct. You also see an RW value of 23068672. Since this file is the base RDM, this is the correct value. The RW value in the snapshot is incorrect, so you have to go back and change it to match. Every link in the chain must match in the same way.
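The consistency rules above (parentCID must match the parent’s CID, and the RW capacity must agree across every link) are easy to check mechanically, since VMDK descriptors are plain text. Here’s a minimal Python sketch of that check – the helper names are my own, not VMware tooling:

```python
import re

def parse_descriptor(text):
    """Extract CID, parentCID, parentFileNameHint, and the RW extent
    size from the plain text of a VMDK descriptor file."""
    fields = {}
    m = re.search(r'^CID=([0-9a-fA-F]+)', text, re.M)
    if m:
        fields['CID'] = m.group(1)
    m = re.search(r'^parentCID=([0-9a-fA-F]+)', text, re.M)
    if m:
        fields['parentCID'] = m.group(1)
    m = re.search(r'^parentFileNameHint="(.+)"', text, re.M)
    if m:
        fields['parent'] = m.group(1)
    m = re.search(r'^RW (\d+)', text, re.M)
    if m:
        fields['RW'] = int(m.group(1))
    return fields

def check_link(child, parent):
    """Return a list of inconsistencies between a snapshot descriptor
    and its parent descriptor."""
    problems = []
    if child.get('parentCID') != parent.get('CID'):
        problems.append('parentCID %s does not match parent CID %s'
                        % (child.get('parentCID'), parent.get('CID')))
    if 'RW' in child and 'RW' in parent and child['RW'] != parent['RW']:
        problems.append('capacity mismatch: %d != %d'
                        % (child['RW'], parent['RW']))
    return problems
```

Fed the descriptors from the example above, the capacity check reports 23068671 != 23068672 – the same mismatch that vmkfstools -q -v10 surfaces.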
Updated May 3, 2014
My autographed copy has arrived!
Updated January 6, 2014
Original Post: January 1, 2014
On New Year’s Eve, @ChrisWahl tweeted:
@StevePantol responded with:
Setting up the switches
Nexus or an MDS
Cables are all ready
One task left before you go
Building fibre channel zones
Adding hosts into the zones
Typin’ world wide names
Lost up in a sea of hex
Thanks for FCNS
Without it you’d be truly vexed
Building fibre channel zones
You will start
Adding hosts into the zones
You’ll never get the port online
Until you bind it to the vfc
You’ll have to keep up with the fight
Until the VSAN membership is right
Hoping to see flogi
Praying to be done and free
The longer that it takes
The greater the insanity
Building fibre channel zones
Adding hosts into the zones
I was attempting to enable statistics on a VNX5300 running the most current release of the R31 train. I found that the stats were apparently already running, but only for a duration of 2 days. I wanted 7 days, so I tried stopping the logging. Unisphere gave me this gem of an error message:
I couldn’t get it working via NaviSecCLI either. On a whim, I fired up an older version of the Unisphere client that was installed on my laptop and it was able to stop and restart statistics collection.
Heartbleed is a major online security vulnerability in a widely used, open-source encryption library named OpenSSL. The OpenSSL library is found in nearly 70% of all webservers on the internet, as well as in many other software products. The vulnerability allows an attacker to compromise the private encryption key of the webserver, and to remotely read pieces of data directly out of the server’s memory. These are both extremely serious flaws. The fix requires IT staff to first update the OpenSSL library, then replace the SSL certificate with a new one. This is both time-consuming and costly. If you are running any Linux-based webserver, particularly one on the public internet, you need to immediately check the version of OpenSSL and remediate if required.
Note that this vulnerability does not exist on a standard Windows web server running IIS.
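If you want a quick scripted check, the snippet below is a rough sketch (my own helper, not an official tool) that classifies an OpenSSL version banner. Heartbleed (CVE-2014-0160) affects OpenSSL 1.0.1 through 1.0.1f; 1.0.1g contains the fix, and the older 0.9.x/1.0.0 branches never had the bug. Note that ssl.OPENSSL_VERSION in Python only reports the library Python itself links against, not necessarily the copy your webserver uses:

```python
import re
import ssl

def is_heartbleed_vulnerable(version_banner):
    """Heuristic check of an OpenSSL version banner such as
    'OpenSSL 1.0.1e 11 Feb 2013'.  Vulnerable: 1.0.1 through 1.0.1f."""
    m = re.search(r'OpenSSL (\d+)\.(\d+)\.(\d+)([a-z]?)', version_banner)
    if not m:
        return False  # unrecognized banner -- check manually
    major, minor, fix = int(m.group(1)), int(m.group(2)), int(m.group(3))
    letter = m.group(4)
    if (major, minor, fix) != (1, 0, 1):
        return False
    # '' (plain 1.0.1) through 1.0.1f are vulnerable; 1.0.1g+ are fixed
    return letter <= 'f'

# Only inspects the OpenSSL that this Python interpreter links against.
print(is_heartbleed_vulnerable(ssl.OPENSSL_VERSION))
```

Treat a False result as a hint, not a clearance – verify the webserver’s own OpenSSL package as well.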
The most recent release of VMware ESXi, version 5.5, is affected by the bug, and there is no patch currently available. Standard security practice is to segment the management IP addresses from the rest of your network to prevent a malicious user from compromising your vSphere host. If you are running 5.5 and you have not segmented the management network, do so immediately. This is the only workaround available as of April 10th, 2014 at 10:15AM Central.
Technical details on the bug can be found here:
Here are official releases from various vendors on this vulnerability:
Barracuda – No official statement found, although I have verbal confirmation that multiple Barracuda products are vulnerable. A patch already exists for the Message Archiver product.
VMware just announced a new recertification policy for the VCP. A VCP certification expires 2 years after it is achieved. You can recertify by taking any VCP or VCAP exam.
Part of VMware’s justification for this change is “Recertification is widely recognized in the IT industry and beyond as an important element of continuing professional growth.” While I do agree with this statement in general, I don’t believe this decision makes much sense for several reasons:
- Other vendors’ certifications – Cisco and Microsoft being two examples – expire after 3 years, not 2. Two years is unnecessarily short, and it’s particularly onerous given the VMware course requirement for VCP certification. It’s hard enough to remain current with all of the vendors’ recertification policies at 3 years.
- Other vendors – again, Cisco and Microsoft as examples – have no version number tied to their certifications. You are simply “MCSE” or “CCNA”. With VMware, you are “VCP3”, “VCP4”, or “VCP5”. The certifications naturally age themselves out. A VCP3 is essentially worthless at this point, the VCP4 is old, and the VCP5 is current. An expiration policy doesn’t need to be in place for this to remain true.
- The timing of this implementation is not ideal. VMware likes to announce releases around VMworld, so we’re looking at August 2014 for 6.0. Most VMware technologists will be interested in keeping certifications with the current major release, so demand for the VCP6 will be high. Will the certification department release 6 in time for everybody to test before expiration? It’s really a waste of my time and money to force me to recertify on 5 when 6 is right around the corner.
- The expiration policy makes no sense in light of the policy on VCAPs and VCPs. Currently, any VCP makes you eligible to take a VCAP in any of the three tracks, and achieving the VCAP in a track automatically gives you a VCP in the same track. This is a significant timesaver for those of us who are heavily invested in VMware – skip the entry level exam and go straight to the advanced exam. VCAP exam development is obviously even slower than VCP exam development. I have doubts that the VCAPs will come out quickly enough to meet the March 2015 deadline.
- Adam Eckerle commented in his blog post, “I also think it is important to point out that I think it encourages individuals to not only keep their skills up to date but also to branch out. If your VCP-DCV is going to expire why not take a look at sitting the VCP-DT or Cloud, or IaaS exams? If you don’t use the Horizon products or vCloud Suite as part of your job that can be difficult.” I agree that in some cases, this might encourage you to pursue a certification in a separate track. Before I had the desktop certifications, I might have considered accelerating exam preparation to meet this recertification date. However, I already own 4 of the 6 VCAPs. Even as a consultant, I have no use for vCloud – there’s just not enough demand from our customers to build a practice area around it. There’s currently no business benefit in pursuing the Cloud track.
It’s VMware’s program and they can do as they please, but I hope they consider 3 years instead of 2 for recertification.
I needed to set up a recurring import of CSV data into SQL Server using BIDS. The source data contained multiple columns with the same name. SQL Server does not like this situation – it errors out with “There is more than one data source column with the name ‘First Name’”. One option is to change the column names manually in the source data, but that’s not a very good solution for a recurring import.
After a little searching I found that you can rename the source columns when you set up your data source. Change each offending name one time when you set up the import, then the error disappears.
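The rename-in-the-designer approach above is the fix I used. If you would rather preprocess the file in a scripted pipeline, a small script can make the header names unique before the import ever sees them. This is a hypothetical alternative sketch, not part of BIDS, and the function names are my own:

```python
import csv
from collections import Counter

def dedupe_headers(headers):
    """Make duplicate column names unique by appending _1, _2, ...
    e.g. ['First Name', 'First Name'] -> ['First Name', 'First Name_1']."""
    seen = Counter()
    out = []
    for name in headers:
        if seen[name]:
            out.append('%s_%d' % (name, seen[name]))
        else:
            out.append(name)
        seen[name] += 1
    return out

def rewrite_csv(src_path, dst_path):
    """Copy a CSV file, fixing only the header row."""
    with open(src_path, newline='') as src, \
         open(dst_path, 'w', newline='') as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        writer.writerow(dedupe_headers(next(reader)))
        writer.writerows(reader)
```

A scheduled job could run this rewrite step right before the SSIS package picks the file up, so the source data never needs manual editing.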
This has been slowly driving me insane since I got my S4. I charge my phone overnight on my nightstand. When it is fully charged, the screen turns on and stays on, bathing the room with enough light to summon Batman. I finally found a solution buried in an Android forum.
If you do not see a Developer options entry in Settings, tap on “About Phone”.
To enable developer mode, tap on the Build number 7 times. Seriously, you have to tap on it 7 times. You’ll get a message that tells you that you are now a developer.
Now go into the Developer options screen and uncheck “Stay awake”. With this setting unchecked, the phone won’t wake up and neither will you.
Today, I passed the Cisco 640-916 DCICT exam, achieving the CCNA Datacenter certification. This was my third attempt. I failed my first attempt by 4%. I failed my second attempt by 1% and wrote about my less-than-stellar customer service experience with Pearson Vue in this post.
I primarily studied with Anthony Sequeira’s CBT Nuggets series – if you have some hands-on experience with basic Nexus configuration tasks, his videos are enough to pass the exam, with one caveat. The exam developers at Cisco have taken a step backward in exam quality compared to the CCNA Routing & Switching. Most Cisco exams don’t expect you to memorize pages of technical specifications, but that’s not the case with this exam. It’s almost as if they hired a few Microsoft exam developers and had them write Microsoft-style “Under which menu option would you find X feature” questions, then mixed those nonsense questions in with the typically straightforward Cisco questions. The result is an annoyingly blended exam that bounces between fair questioning on concepts and worthless memorization. Unfortunately, the straightforward questions aren’t enough to balance out the rote memorization.
While using somebody else’s braindump is against the rules, using your own exam experience is not. If you do happen to fail, my suggestion is to write down all of the areas you were confused by immediately – don’t even wait for the drive home, do it in the parking lot of the testing center. You can then take this extremely valuable information home with you and focus your study. Doing this made me understand exactly what pieces I had to memorize and resulted in a pass.
When you work in IT, and particularly when you work in consulting, people are always on to the next big thing. If you’re not working on the next big thing, you feel like you’re missing something critical.
The buzzword du jour for the last few years has been ‘cloud’. Put it out in the cloud, the cloud makes business more agile, the cloud saves money, and so on. The cloud has its place, but it comes with tradeoffs. You have no control over the environment. You have to continually pay for licensing – in the cloud, you own nothing. You don’t even own the data – legalities aside, it sits on equipment that you don’t own. The cloud provider could go offline at any point, leaving you high and dry. I don’t really think that Amazon will go out of business soon – but what would happen if it did? Backups, obviously, but now you have to look at how to back up and recover your cloud environment.
In the past, if you bought Exchange 2007, you owned it. If your cash flow meant you couldn’t afford Exchange 2010 when it came out, then you kept running 2007. In the new cloud world, you’re bound to a monthly or annual subscription. A cash flow problem means your business stops. I don’t object to looking for cloud solutions to many business problems, but people seem to be rushing to the cloud without considering all of the ramifications.