A quick NSX microsegmentation example

This short post demonstrates the power of NSX. My example is a DMZ full of webservers – you don’t want any of your webservers talking to each other. If one of your webservers happens to be compromised, you don’t want the attacker to then have an internal launching pad to attack the rest of the webservers. They only need to communicate with your application or database servers.

We’ll use my lab’s Compute Cluster A as an a sample. Just pretend it’s a DMZ cluster with only webservers in it.

Compute Cluster A

 

I’ve inserted a rule into my Layer 3 ruleset and named it “Isolate all DMZ Servers”. In my traffic source, you can see that you’re not stuck with IP addresses or groups of IP addresses like a traditional firewall – you can use your vCenter groupings like Clusters, Datacenters, Resource Pools, or Security Tags to name a few.

Rule Source

I add Computer Cluster A as the source of my traffic. I do the same for the destination.

NSX Source Cluster

 

My rule is now ready to publish. As soon as I hit publish changes, all traffic from any VM in this cluster will be blocked if it’s destined for any other VM in this cluster.

Ready to publish

 

Note that these were only Layer3 rules – so we’re secured traffic going between subnets. However, nothing’s stopping webservers on the same subnet from talking to each other. No worries here though, we can implement the same rule at layer 2.

Once this rule gets published, even VMs that are layer 2 adjacent in this cluster will be unable to communicate with each other!

NSX layer 2 block

This is clearly not a complete firewall policy as our default rule is to allow all. We’d have to do more work to allow traffic through to our application or database servers, and we’d probably want to switch our default rule to deny all. However, because these rules are tied to Virtual Center objects and not IP addresses, security policies apply immediately upon VM creation. There is no lag time between VM creation and application of the firewalling policy – it is instantaneous!  Anybody who’s worked in a large enterprise knows it can take weeks or months before a firewall change request is pushed into production.

Of course, you still have flexibility to write IP-to-IP rules, but once you start working with Virtual Center objects and VM tags, you’ll never want to go back.

Alzheimer’s Association – Forgotten Donation

Chris Wahl just put up this blog post showing the donation of royalties from his book, Networking for VMware Administrators. I won my copy for free and at the time I promised to donate to the Alzheimer’s association. I failed to do so, but I have rectified that today.

Below is my personal donation along with VMware’s matching gift. $31.41 is the minimum donation to receive a matching gift with VMware’s matching program.  alz-donation alz-matching

Custom Views and Dashboards in vRealize Operations

This post covers a few of the most common questions my customers ask me as I demonstrate what you can do with vROps. I’m going to take you through an example of needing to frequently check the CPU ready % of your VMs – this was my customer’s most recent request, but know that you can make this happen for any metric collected by vROps.

First, we’re going to create a custom view for CPU Ready % by going to Content>Views, then clicking on the green Plus to add a new View.

1-AddView

I named this one “Custom-CPU Ready” and gave it a description.

2-CustomCPUReady

Next, pick what our View looks like. In this case, I want to see all of the data in a list format, so I pick List.

3-Presentation

Now to select the subjects – these are the objects that the View will be looking at. We want CPU Ready % on Virtual Machines, so we pick the vCenter Adapter and scroll down until we find Virtual Machine.

4a-Subjects

 

4b-Subjects

We now need to find the CPU Ready % metric

5-CPUData

Double-click on it when you find it in the list on the left, it will then appear in the Data section. Change the Sort order to descending because we want to see the VM with the highest CPU ready on top.

6-SelectReady

The Availability options let you control where inside vROps the View will be usable. I lef the defaults.

7-Visibility

You now see the custom view in the Views list.

8-CustomView

How can we use our brand new view? We want to see the CPU ready for all VMs in the Production cluster. Go to Environment, then drill down into the vSphere World until you reach the Production cluster. Click on the Details tab, you can then scroll down and find the custom View that we created. Click on it and all of your VMs show up, sorted by highest CPU ready.

9-ShowReady

 

Let’s say this is a metric that you look at at daily or multiple times a day. You can create a custom dashboard so the metric is immediately visible if you’re using vROps Advanced.

To create a new dashboard, from the Home menu, click Actions, then Create Dashboard

1-AddDashboard

Name the dashboard and select a layout.

2-NameDashboard

We want to show a View in the widget, so we drag View over into the right pane.

2a-WidgetList

Click the Edit icon in the blank View to customize it.

3-EditWidget

Click on Self Provider to allow us to specify the Production Cluster object on the left, then select our Custom CPU Ready View on the right and click Save.

4-AddWidget

The dashboard is now ready. The CPU Ready for the Production VMs will now show up in the dashboard.

5-FinalDashboard

 

 

 

 

VMware Hands-on Labs

It always surprises me how few customers are aware of the Hands-on Labs. The labs are full installations of VMware software running in VMware’s cloud, accessible from any browser and completely free for anyone to use.

Each self-paced lab is guided with a step-by-step lab manual. You can follow the manual from start to finish for a complete look at the specific software product. Alternatively, you can focus on specific learning areas due to the modular structure of the lab manual. You can even ignore the manual completely and use the lab as a playground for self-directed study.

You can give the labs a try here: http://labs.hol.vmware.com/

Moving VMs to a different vCenter

I had to move a number of clusters into a different Virtual Center and I didn’t want to have to deal with manually moving VMs into their correct folders. In my case I happened to have matching folder structures in both vCenters and I didn’t have to worry about creating an identical folder structure on the target vCenter. All I need to do was to record the current folder location and move the VM to the correct folder in the new vCenter.

I first run this script against the source cluster in the source vCenter. It generates a CSV file with the VM name and the VM folder name

$VMCollection = @()
Connect-VIServer "Source-vCenter
$CLUSTERNAME = "MySourceCluster"
 
$vms = Get-Cluster $CLUSTERNAME | Get-VM
foreach ( $vm in $vms )
{
	$Details = New-Object PSObject
	$Details | Add-Member -Name Name -Value $vm.Name -membertype NoteProperty 
	$Details | Add-Member -Name Folder -Value $vm.Folder -membertype NoteProperty
	$VMCollection += $Details
}
 
$VMCollection
$VMCollection | Export-CSV "folders.csv"

Once the first script is run, I disconnect each host from the old vCenter and add it into a corresponding cluster in the new vCenter. I can now run this command aginst the new vCenter to ensure the VMs go back into their original folders.

Connect-VIServer "Dest-vCenter"
$vmlist = Import-CSV "folders.csv"
 
foreach ( $vm in $vmlist )
{
	$vm.Name
	$vm.Folder
	Move-VM -VM $vm.Name -Destination $vm.Folder
}

The parent virtual disk has been modified since the child was created

Some VMs in my environment had virtual-mode RDMs on them, along with multiple nested snapshots. Some of the RDMs were subsequently extended at the storage array level, but the storage team didn’t realize there was an active snapshot on the virtual-mode RDMs. This resulted in immediate shutdown of the VMs and a vSphere client error “The parent virtual disk has been modified since the child was created” when attempting to power them back on.

I had done a little bit of work dealing with broken snapshot chains before, but the change in RDM size was outside of my wheelhouse, so we logged a call with VMware support. I learned some very handy debugging techniques from them and thought I’d share that information here. I went back into our test environment and recreated the situation that caused the problem.

In this example screenshot, we have a VM with no snapshot in place and we run vmkfstools –q –v10  against the vmdk file
-q means query, -v10 is verbosity level 10

The command opens up the disk, checks for errors, and reports back to you.

1_vmkfstools

 

In the second example, I’ve taken a snapshot of the VM. I’m now passing the snapshot VMDK into the vmkfstools command. You can see the command opening up the snapshot file, then opening up the base disk.

 

2_vmkfstools

 

In the third example, I  pass it the snapshot vmdk for a virtual-mode RDM on the same VM –  it traverses the snapshot chain and also correctly reports that the VMDK is a non-passthrough raw device mapping, which means virtual mode RDM.

 

3_vmkfstools

Part of the problem that happened was the size of the RDM changed (increased size) but the snapshot pointed to the wrong smaller size.  However, even without any changes to the storage, a corrupted snapshot chain can  happen  during an out-of-space situation.

I have intentionally introduced a drive geometry mismatch in my test VM below – note that the value after RW in the snapshot TEST-RDM_1-00003.vmdk  is 1 less than the value in the base disk  TEST-RDM_1.vmdk

4_vmkfstools

 

Now if I run it through the vmkfstools command, it reports the error that we were seeing in the vSphere client in Production when trying to boot the VMs – “The parent virtual disk has been modified since the child was created”. But the debugging mode gives you an additional clue that the vSphere client does not give– it says that the capacity of each link is different, and it even gives you the values (20368672 != 23068671).

5_vmkfstools
The fix was to follow the entire chain of snapshots and ensure everything was consistent. Start with the most current snap in the chain. The “parentCID” value must be equal to the “CID” value in the next snapshot in the chain. The next snapshot in the chain is listed in the “parentFileNameHint”.  So TEST-RDM_1-00003.vmdk is looking for a ParentCID value of 72861eac, and it expects to see that in the file TEST-RDM_1.vmdk.

If you open up Test-RDM_1.vmdk, you see a CID value of 72861eac – this is correct.  You also see an RW value of 23068672. Since this file is the base RDM, this is the correct value. The value in the snapshot is incorrect, so you have to go back and change it to match.  All snapshots in the chain must match in the same way.

4_vmkfstools

 

I change the RW value in the snapshot back to match  23068672 – my vmkfstools command succeeds, and I’m also able to delete the snapshot from the vSphere client6_vmkfstools

 

VMware recertification

VMware just announced a new recertification policy for the VCP. A VCP certification expires 2 years after it is achieved. You can recertify by taking any VCP or VCAP exam.

Part of VMware’s justification for this change is “Recertification is widely recognized in the IT industry and beyond as an important element of continuing professional growth.” While I do agree with this statement in general, I don’t believe this decision makes much sense for several reasons:

  • Other vendors – Cisco and Microsoft as two examples – expire after 3 years, not 2 years. Two years is unnecessarily short. It’s also particularly onerous given the VMware course requirement for VCP certification. It’s hard enough to remain current with all of the vendors recertification policies at 3 years.

 

  • Other vendors – again, Cisco and Microsoft as examples – have no version number tied to their certifications. You are simply “MCSE” or “CCNA”. With VMware, you are “VCP3″, “VCP4″, or “VCP5″. The certifications naturally age themselves out. A VCP3 is essentially worthless at this point. The VCP4 is old, and the VCP5 is current. An expiration policy doesn’t need to be in place for this to remain true.

 

  • The timing of this implementation is not ideal. VMware likes to announce releases around VMworld, so we’re looking at August 2014 for 6.0.  Most VMware technologists will be interested in keeping certifications with the current major release, so demand for the VCP6 will be high. Will the certification department release 6 in time for everybody to test before expiration? It’s really a waste of my time and money to force me to recertify on 5 when 6 is right around the corner.

 

  • The expiration policy makes no sense in light of the policy on VCAPs and VCPs. Currently, any VCP makes you eligible to take a VCAP in any of the three tracks, and achieving the VCAP in a track automatically gives you a VCP in the same track. This is a significant timesaver for those of us who are heavily invested in VMware – skip the entry level exam and go straight to the advanced exam. VCAP exam development is obviously even slower than VCP exam development. I have doubts that the VCAPs will come out quickly enough to meet the March 2015 deadline.

 

  • Adam Eckerle commented in his blog post “I also think it is important to point out that I think it encourages individuals to not only keep their skills up to date but also to branch out. If your VCP-DCV is going to expire why not take a look at sitting the VCP-DT or Cloud, or IaaS exams?  If you don’t use the Horizon products or vCloud Suite as part of your job that can be difficult.”I agree that in some cases, this might encourage you to pursue a certification in a separate track. Before I had the desktop certifications, I might have considered accelerating exam preparation to prepare for this recertification date.  However, I already own 4 of 6 VCAPs. Even as a consultant I have no use for vCloud, there’s just not enough demand from our customers to build a practice area around it. There’s currently no business benefit in pursuing the Cloud track.

It’s VMware’s program and they can do as they please, but I hope they consider 3 years instead of 2 for recertification.

Christian Mohn’s blog has a fairly lively discussion going on, and Vladan Seget also has some thoughts and comments

More loathing of Pearson Vue … or, my [redacted] beta exam experience

This is what I get for taking beta exams. I understand. I create my own mess. But that doesn’t change the fact that Pearson Vue is spectacularly incompetent. I’ve previously blogged my strongly negative opinions of Pearson Vue and today’s experience doesn’t do much to improve my outlook.

I sat the [redacted] VMware beta exam today and there were problems with the remote environment. It took me almost 40 minutes to complete the first two questions because the performance of the environment was so lousy. Per my beta exam instructions, after about 15 minutes I asked the Vue proctor to contact VMware for assistance.

The proctor returned to say that we should reboot my local exam station, as if that had anything at all to do with the slow response from the remote lab. I asked her who told her to do that – she had called the Vue helpdesk, not VMware. I told her to call VMware, to which she replied “We can’t do that.”  By then the environment had improved from ‘lousy’ to ‘nearly tolerable’ so I gave up complaining.

After the exam I spoke with the VMware certification team and I received confirmation of the following:

  1. VMware has been assured by Pearson Vue that VCAP candidates will be able to get in contact with VMware’s support team for problems with the exam environment
  2. VMware’s support team has the authority to grant extended time in the event of an environment failure
  3. Pearson Vue has the capability to extend the exam time.

The slow performance of the environment set me back too far to recover;  I was unable to complete the exam. Maybe it will cost me a passing grade, maybe it won’t, but Pearson Vue’s failure to rectify the situation is inexcusable.

VMware load balancing with Cisco C-series and 1225 VIC card

I recently did a UCS C-series rackmount deployment. The servers came with a 10gbps 1225 VIC card and the core routers were a pair of 4500s in VSS mode.

The 1225 VIC card lets you carve virtual NICs from your physical NICs. You can put COS settings directly on the virtual NICS, enabling you to prioritize traffic directly on the physical NIC. For this deployment, I created 3 virtual NIC for each pNIC – Management, vMotion, and VM traffic. By setting COS to 6 for management, 5 for VMs, and 4 for vMotion on the vNICs, I ensure that management traffic is never interrupted, and I also guarantee that VM traffic will be prioritized over vMotion. This still allows me to take full advantage of 10gbps of bandwidth when the VMs are under light load.

Cisco 1225 VIC vNIC

Cisco 1225 VIC vNIC

My VCAP5-DTD exam experience

I took the VCAP5-DTD beta exam on January 3rd, 2013. Like many people, I received the welcome news today that I passed the exam.

I’m laughing a little to myself as I write this post because my certification folder contains a log of my studying. I downloaded the beta blueprint on December 17, 2012, but I already had Microsoft exams scheduled for December 28th.  I did no studying for this VCAP until the day before the exam, January 2rd, where you can clearly see my feverish morning download activity. I will say though that I have several years of View deployments under my belt, so my knowledge on the engineering side was up-to-date and at the front of my mind.

VCAP5-DTD Folder

I downloaded every PDF referenced in the exam blueprint, and I already had most of the product documentation already downloaded. I am primarily a delivery engineer, but to be successful on the exam you need to put on your designer’s hat. I tried to keep that in mind as I pored through the PDFs – it does make a difference because different information will stand out if you actively look for design elements.

My exam was just after lunch and it was well over an hour away, so I left early and brought my Kindle. I continued going through the PDFs until exam time. The sheer volume of information you have to read through makes VMware design exams quite difficult. I suggest reading the answers before you read the question – this helps you identify clues in the question. There are detailed descriptions requiring 6 or more paragraphs of reading just to answer a single multiple choice question.

The GA version of the exam has 115 questions and 6 diagramming scenarios. Keep track of the number of diagramming questions you get so you can budget your time appropriately. You should not spend any more than 15 minutes on a diagram. Keep in mind that 15 * 6 = 90 minutes, leaving you only 105 minutes to answer 109 questions. The pace you have to sustain is mentally exhausting. The beta was even more difficult with 131  questions, plus the expectation to provide comment feedback on the questions.

I found the diagramming questions to be even more involved than the DCD questions.. I’d say the tool was a bit better behaved than the DCD exam, but not by much. It’s easy to get sucked in to a design scenario and waste far too much time. Remember that you’re not designing the perfect system, it just has to be good enough to meet the stated requirements.