NSX per-VM licensing compliance

I had a customer with a production cluster of around 100 VMs. They needed to segment off a few VMs due to PCI compliance, and were looking at a large expense to create a physically separate PCI DMZ. I suggested instead the purchase of our per-VM NSX solution. We sell it in packs of 25 and it comes in at a list price of around $10K. This looked great compared to the $75K solution they were considering.

The problem with per-VM licensing is with compliance. VMware doesn’t have a KB explaining a way to make sure you aren’t using more than the number of licenses that you bought. If you add a 25-pack of NSX licenses to a cluster with 100 VMs in it, the vCenter licensing portal will show that you’re using 100 licenses but only purchased 25. VMware KB 2078615 does say “There is no hard enforcement on the number of Virtual Machines licensed and the product will be under compliance.” However, this post is related to the way per-socket licensing displays when you add it to vCenter, not related to per-VM pricing.

I’ve had a few conversations with the NSX Business Unit (NSBU) and the intent of per-VM licensing is to allow customers to use the NSX distributed firewall without physically segmenting clusters. You can run NSX-protected VMs in a cluster alongside non-NSX-protected VMs. However, you have to take some steps to ensure that you’re remaining in licensing compliance. This post shows you how to do it.

One way to do this is to avoid using ‘any’ in your firewall rules. If all of your firewall rules are VM to VM or security group to security group, all you have to do is keep the total VM count below your purchased VM count. It is difficult to craft a firewall policy without using ‘any’, though this is the simplest method if your requirements lend themselves to this method.

An alternative way is to use security tags. It’s a bit more involved but lets you have precise control over where your NSX security policy is applied.

First, I create two custom tags, Custom_NSX.FirewallDisabled and Custom_NSX.FirewallEnabled


I then assigned tags to my VMs as shown below. The disadvantage to this method is you have to keep making sure that you security tag VMs. But it does make rule writing easier. I’m only creating two groups – NSX enabled and disabled. However, there’s nothing stopping you from creating multiple tags – maybe you have a DMZ1 and DMZ2, maybe PCI and HIPAA are separate tags.

In this case, I assign all of my PCI VMs the FirewallEnabled tag and the rest of my VMs the FirewallDisabled tag.


Now, instead of going to the Firewall section, I go to Service Composer. Don’t be confused by the fact that the security groups already exist – I took the screenshot of the Security Groups tab after I created the groups.


First, I create an NSX_Disabled group with a dynamic membership of CUSTOM_NSX.FirewallDisabled.


Next, I create an NSX_Enabled security group with a dynamic membership of CUSTOM_NSX.FirewallEnabled


I then specifically exclude NSX_Disabled from the NSX_Enabled group. This guarantees that no firewall rules can touch my excluded VMs.


I create a new security policy in Service Composer


In the Firewall Rules section, NSX has something called “Policy’s Security Groups”.  If we assign the policy to the NSX_enabled security group, we can safely use an ‘any’ rule as long as the other side is ‘Policy’s Security Groups’. So source could be ‘any’ if dest is Policy’s Security Groups, or dest could be ‘any’ if source is Policy’s Security Groups. The security group we made enforces that NSX won’t apply rules on VMs that aren’t in the NSX_enabled group.


I then apply my new policy to the NSX_Enabled security group.

policy_apply security-group-select

Doing a security policy this way is a bit more involved than simply using the Firewall section of NSX, but it’s worth considering. It’s a perfect way to ensure 100% compliance in a per-VM model. It’s also helping you unlock the power of NSX – all you have to do is security tag VMs and they automatically get their security policy.


Unable to connect virtual NIC in vCloud Air DRaaS

I had a customer open a service request, they were in the middle of a DR test using vCloud Air DRaaS and were unable to connect 1 virtual machine to the network. It kept erroring out with a generic unable to connect error.

It turns out that their VM had a VMDK sized with a decimal point, like 50.21GB instead of just 50GB. I don’t see it often, but this sometimes happens when P2V a machine. The vCloud Director backend can’t handle the decimal point in the disk size, so it errors out.

I’m not entirely sure why the error happens, but the fix is to resize your source disk to a non-decimal number and run replication again.

A quick NSX microsegmentation example

This short post demonstrates the power of NSX. My example is a DMZ full of webservers – you don’t want any of your webservers talking to each other. If one of your webservers happens to be compromised, you don’t want the attacker to then have an internal launching pad to attack the rest of the webservers. They only need to communicate with your application or database servers.

We’ll use my lab’s Compute Cluster A as an a sample. Just pretend it’s a DMZ cluster with only webservers in it.

Compute Cluster A


I’ve inserted a rule into my Layer 3 ruleset and named it “Isolate all DMZ Servers”. In my traffic source, you can see that you’re not stuck with IP addresses or groups of IP addresses like a traditional firewall – you can use your vCenter groupings like Clusters, Datacenters, Resource Pools, or Security Tags to name a few.

Rule Source

I add Computer Cluster A as the source of my traffic. I do the same for the destination.

NSX Source Cluster


My rule is now ready to publish. As soon as I hit publish changes, all traffic from any VM in this cluster will be blocked if it’s destined for any other VM in this cluster.

Ready to publish


Note that these were only Layer3 rules – so we’re secured traffic going between subnets. However, nothing’s stopping webservers on the same subnet from talking to each other. No worries here though, we can implement the same rule at layer 2.

Once this rule gets published, even VMs that are layer 2 adjacent in this cluster will be unable to communicate with each other!

NSX layer 2 block

This is clearly not a complete firewall policy as our default rule is to allow all. We’d have to do more work to allow traffic through to our application or database servers, and we’d probably want to switch our default rule to deny all. However, because these rules are tied to Virtual Center objects and not IP addresses, security policies apply immediately upon VM creation. There is no lag time between VM creation and application of the firewalling policy – it is instantaneous!  Anybody who’s worked in a large enterprise knows it can take weeks or months before a firewall change request is pushed into production.

Of course, you still have flexibility to write IP-to-IP rules, but once you start working with Virtual Center objects and VM tags, you’ll never want to go back.

Alzheimer’s Association – Forgotten Donation

Chris Wahl just put up this blog post showing the donation of royalties from his book, Networking for VMware Administrators. I won my copy for free and at the time I promised to donate to the Alzheimer’s association. I failed to do so, but I have rectified that today.

Below is my personal donation along with VMware’s matching gift. $31.41 is the minimum donation to receive a matching gift with VMware’s matching program.  alz-donation alz-matching

Moving VMs to a different vCenter

I had to move a number of clusters into a different Virtual Center and I didn’t want to have to deal with manually moving VMs into their correct folders. In my case I happened to have matching folder structures in both vCenters and I didn’t have to worry about creating an identical folder structure on the target vCenter. All I need to do was to record the current folder location and move the VM to the correct folder in the new vCenter.

I first run this script against the source cluster in the source vCenter. It generates a CSV file with the VM name and the VM folder name

$VMCollection = @()
Connect-VIServer "Source-vCenter
$CLUSTERNAME = "MySourceCluster"
$vms = Get-Cluster $CLUSTERNAME | Get-VM
foreach ( $vm in $vms )
	$Details = New-Object PSObject
	$Details | Add-Member -Name Name -Value $vm.Name -membertype NoteProperty 
	$Details | Add-Member -Name Folder -Value $vm.Folder -membertype NoteProperty
	$VMCollection += $Details
$VMCollection | Export-CSV "folders.csv"

Once the first script is run, I disconnect each host from the old vCenter and add it into a corresponding cluster in the new vCenter. I can now run this command aginst the new vCenter to ensure the VMs go back into their original folders.

Connect-VIServer "Dest-vCenter"
$vmlist = Import-CSV "folders.csv"
foreach ( $vm in $vmlist )
	Move-VM -VM $vm.Name -Destination $vm.Folder

The parent virtual disk has been modified since the child was created

Some VMs in my environment had virtual-mode RDMs on them, along with multiple nested snapshots. Some of the RDMs were subsequently extended at the storage array level, but the storage team didn’t realize there was an active snapshot on the virtual-mode RDMs. This resulted in immediate shutdown of the VMs and a vSphere client error “The parent virtual disk has been modified since the child was created” when attempting to power them back on.

I had done a little bit of work dealing with broken snapshot chains before, but the change in RDM size was outside of my wheelhouse, so we logged a call with VMware support. I learned some very handy debugging techniques from them and thought I’d share that information here. I went back into our test environment and recreated the situation that caused the problem.

In this example screenshot, we have a VM with no snapshot in place and we run vmkfstools –q –v10  against the vmdk file
-q means query, -v10 is verbosity level 10

The command opens up the disk, checks for errors, and reports back to you.



In the second example, I’ve taken a snapshot of the VM. I’m now passing the snapshot VMDK into the vmkfstools command. You can see the command opening up the snapshot file, then opening up the base disk.




In the third example, I  pass it the snapshot vmdk for a virtual-mode RDM on the same VM –  it traverses the snapshot chain and also correctly reports that the VMDK is a non-passthrough raw device mapping, which means virtual mode RDM.



Part of the problem that happened was the size of the RDM changed (increased size) but the snapshot pointed to the wrong smaller size.  However, even without any changes to the storage, a corrupted snapshot chain can  happen  during an out-of-space situation.

I have intentionally introduced a drive geometry mismatch in my test VM below – note that the value after RW in the snapshot TEST-RDM_1-00003.vmdk  is 1 less than the value in the base disk  TEST-RDM_1.vmdk



Now if I run it through the vmkfstools command, it reports the error that we were seeing in the vSphere client in Production when trying to boot the VMs – “The parent virtual disk has been modified since the child was created”. But the debugging mode gives you an additional clue that the vSphere client does not give– it says that the capacity of each link is different, and it even gives you the values (20368672 != 23068671).

The fix was to follow the entire chain of snapshots and ensure everything was consistent. Start with the most current snap in the chain. The “parentCID” value must be equal to the “CID” value in the next snapshot in the chain. The next snapshot in the chain is listed in the “parentFileNameHint”.  So TEST-RDM_1-00003.vmdk is looking for a ParentCID value of 72861eac, and it expects to see that in the file TEST-RDM_1.vmdk.

If you open up Test-RDM_1.vmdk, you see a CID value of 72861eac – this is correct.  You also see an RW value of 23068672. Since this file is the base RDM, this is the correct value. The value in the snapshot is incorrect, so you have to go back and change it to match.  All snapshots in the chain must match in the same way.



I change the RW value in the snapshot back to match  23068672 – my vmkfstools command succeeds, and I’m also able to delete the snapshot from the vSphere client6_vmkfstools


VMware load balancing with Cisco C-series and 1225 VIC card

I recently did a UCS C-series rackmount deployment. The servers came with a 10gbps 1225 VIC card and the core routers were a pair of 4500s in VSS mode.

The 1225 VIC card lets you carve virtual NICs from your physical NICs. You can put COS settings directly on the virtual NICS, enabling you to prioritize traffic directly on the physical NIC. For this deployment, I created 3 virtual NIC for each pNIC – Management, vMotion, and VM traffic. By setting COS to 6 for management, 5 for VMs, and 4 for vMotion on the vNICs, I ensure that management traffic is never interrupted, and I also guarantee that VM traffic will be prioritized over vMotion. This still allows me to take full advantage of 10gbps of bandwidth when the VMs are under light load.

Cisco 1225 VIC vNIC

Cisco 1225 VIC vNIC

My VCAP5-DTD exam experience

I took the VCAP5-DTD beta exam on January 3rd, 2013. Like many people, I received the welcome news today that I passed the exam.

I’m laughing a little to myself as I write this post because my certification folder contains a log of my studying. I downloaded the beta blueprint on December 17, 2012, but I already had Microsoft exams scheduled for December 28th.  I did no studying for this VCAP until the day before the exam, January 2rd, where you can clearly see my feverish morning download activity. I will say though that I have several years of View deployments under my belt, so my knowledge on the engineering side was up-to-date and at the front of my mind.

VCAP5-DTD Folder

I downloaded every PDF referenced in the exam blueprint, and I already had most of the product documentation already downloaded. I am primarily a delivery engineer, but to be successful on the exam you need to put on your designer’s hat. I tried to keep that in mind as I pored through the PDFs – it does make a difference because different information will stand out if you actively look for design elements.

My exam was just after lunch and it was well over an hour away, so I left early and brought my Kindle. I continued going through the PDFs until exam time. The sheer volume of information you have to read through makes VMware design exams quite difficult. I suggest reading the answers before you read the question – this helps you identify clues in the question. There are detailed descriptions requiring 6 or more paragraphs of reading just to answer a single multiple choice question.

The GA version of the exam has 115 questions and 6 diagramming scenarios. Keep track of the number of diagramming questions you get so you can budget your time appropriately. You should not spend any more than 15 minutes on a diagram. Keep in mind that 15 * 6 = 90 minutes, leaving you only 105 minutes to answer 109 questions. The pace you have to sustain is mentally exhausting. The beta was even more difficult with 131  questions, plus the expectation to provide comment feedback on the questions.

I found the diagramming questions to be even more involved than the DCD questions.. I’d say the tool was a bit better behaved than the DCD exam, but not by much. It’s easy to get sucked in to a design scenario and waste far too much time. Remember that you’re not designing the perfect system, it just has to be good enough to meet the stated requirements.