Troubleshooting the VMC on AWS API using Firefox

Many thanks to the awesome Nico Vibert for showing me this trick.

I’ve been writing code against the VMC API and have been having problems debugging some of the calls.

For this example, we will work with the NSX-T services API. Services define ports and protocols, e.g. HTTP runs on TCP/80. First we go to the Developer Center, pick our SDDC from the dropdown, then go to the NSX VMC Policy API.

Find the Policy, Inventory, Services objects.

Execute the GET request for /infra/services

We get a ServiceListResult, which we can click on and see all of the services defined in our SDDC (screenshot truncated).

API explorer makes it easy to play with the API and get it working. But when you’re writing code against the API in a programming language, it can be very difficult to figure out where your mistakes are when you can’t successfully call the API programmatically.
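
To make the contrast concrete, here is a minimal sketch of the same GET /infra/services call made from code instead of API Explorer. It assumes you have a CSP API (refresh) token and your SDDC's NSX reverse-proxy URL, both shown as placeholders; the csp-auth-token header and the /policy/api/v1 prefix are how the policy API is commonly reached through the proxy, but verify the exact URL and headers in your own environment – the Firefox technique below is a great way to do exactly that.

import requests

REFRESH_TOKEN = "<your CSP API token>"                      # placeholder
NSX_PROXY_URL = "<your SDDC's NSX reverse-proxy URL>"       # placeholder/assumption

# Exchange the refresh token for a short-lived access token (standard CSP auth endpoint)
auth = requests.post(
    "https://console.cloud.vmware.com/csp/gateway/am/api/auth/api-tokens/authorize",
    params={"refresh_token": REFRESH_TOKEN},
)
auth.raise_for_status()
access_token = auth.json()["access_token"]

# Same call API Explorer makes: GET /infra/services
resp = requests.get(
    f"{NSX_PROXY_URL}/policy/api/v1/infra/services",
    headers={"csp-auth-token": access_token},
)
resp.raise_for_status()
for svc in resp.json()["results"]:          # ServiceListResult -> results[]
    print(svc["display_name"])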

Firefox has a set of awesome Web Developer debugging tools. You can enable them as shown below. For working with APIs, we want the Network debugger.

This opens a debugging pane at the bottom of Firefox. We can filter by typing ‘vmwarevmc.com’ in the filter box, so only URLs containing the VMC API domain vmwarevmc.com show up.

When we execute the GET request, it shows up below.

We can click on the request and see the entire API call – the GET request, URL, headers, Response, etc. If we were doing a PUT request, we would also see the JSON payload in the Request tab.

Using this technique makes it significantly easier to troubleshoot your code. It also highlighted something that I understood incorrectly – I had a mental image of API explorer executing calls on a server running the cloud services portal somewhere. That’s not what’s happening – your own browser is making the API calls.

Using the RVTools import feature in the VMC Sizer

The VMC sizer is a publicly available tool that VMware employees and partners use to correctly size SDDC environments. A recent update added the direct upload of RVTools data, saving you the effort of calculating all of the required inputs yourself. This is great if you need to size an entire vCenter, but in most cases we’re only sizing a portion of it.

The first part of the sizer is basic cluster settings common across all workloads that you need to size.

Next is the workload profile with the new import feature.

I pick RVTools, then select an RVTools export file that I made from my homelab vCenter. You have a few options here, including sizing only Powered On VMs, as well as deciding whether you want to size on Utilized or Provisioned storage and memory.

I click upload, and the sizer automatically calculates and populates the required inputs.

Let’s look at some of the RVTools data from my homelab. In the vInfo sheet, you see 2 VMs named esx03 and esx04. These are nested ESXi hosts and should not be included in the VMC VM count.

The nested hosts show up in the vHost sheet – this will skew the calculations as the sizer thinks we have more physical resources than we actually have.

For the purposes of this demonstration, I want to remove ESX03 and ESX04, and I also picked two other random VMs to remove – DC02 and UTIL01.

I now have to do the following to my RVTools Excel file (a scripted version of the same cleanup is sketched after the list):

  • Delete VMs DC02, ESX03, ESX04, and UTIL01 from the vInfo sheet
  • Delete hosts ESX03 and ESX04 from the vHost sheet
  • Delete VMs DC02, ESX03, ESX04, and UTIL01 from the vMemory sheet
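
If you have to do this repeatedly, the same cleanup can be scripted. Below is a rough sketch using Python and openpyxl; it assumes the export is saved as RVTools.xlsx, that the VM name column in vInfo and vMemory is headed “VM”, that the host name column in vHost is headed “Host”, and that names match exactly – check the headers and name formats in your own export, since RVTools versions differ.

from openpyxl import load_workbook

REMOVE_VMS = {"DC02", "ESX03", "ESX04", "UTIL01"}
REMOVE_HOSTS = {"ESX03", "ESX04"}

def delete_rows(ws, header, names):
    """Delete every row whose value in the named header column matches one of the names."""
    headers = [c.value for c in ws[1]]
    col = headers.index(header) + 1          # 1-based column index
    doomed = [
        row for row in range(2, ws.max_row + 1)
        if str(ws.cell(row=row, column=col).value).upper() in names
    ]
    for row in reversed(doomed):             # delete bottom-up so row indexes stay valid
        ws.delete_rows(row)

wb = load_workbook("RVTools.xlsx")           # assumed filename
delete_rows(wb["vInfo"], "VM", REMOVE_VMS)
delete_rows(wb["vHost"], "Host", REMOVE_HOSTS)
delete_rows(wb["vMemory"], "VM", REMOVE_VMS)
wb.save("RVTools-trimmed.xlsx")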

Now when I import the new spreadsheet, I get the following values (a quick sanity check of the arithmetic follows the list):

  • VM count drops from 13 to 9.
  • Storage per VM increases from 22.95 to 23.14 – this is because none of the 4 deleted VMs had much storage attached to them.
  • vCPU/core drops from 1.92 to 1.88, as we have fewer vCPUs after deleting 4 VMs.
  • vCPU/VM drops from 1.77 to 1.67. This makes sense because the nested ESXi hosts had more vCPUs than anything else in my lab.
  • vRAM/VM increases from 4.76GB to 5.05GB. The 4 deleted VMs had less RAM than most other VMs in my lab, so the average RAM per VM went up even though the total RAM went down.
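
As a quick sanity check, the averages above are consistent with removing four VMs that together had roughly 8 vCPUs and 16 GB of vRAM (values as reported by the sizer, small rounding differences aside):

# Back-of-the-envelope check of the sizer's before/after averages
before_vms, after_vms = 13, 9

vcpu_removed = before_vms * 1.77 - after_vms * 1.67    # ~8 vCPUs across the 4 deleted VMs
vram_removed = before_vms * 4.76 - after_vms * 5.05    # ~16.4 GB across the 4 deleted VMs

print(f"{vcpu_removed:.1f} vCPUs removed -> ~{vcpu_removed / 4:.1f} per VM "
      f"(above the old 1.77 average, so vCPU/VM drops)")
print(f"{vram_removed:.1f} GB vRAM removed -> ~{vram_removed / 4:.1f} GB per VM "
      f"(below the old 4.76 average, so vRAM/VM rises)")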

My lab is tiny so the sizer isn’t going to show huge changes. But if you’re deleting entire clusters worth of VMs, your sizing will be much different.

VMC on AWS multi-edge scenario

In M12, we released the multi-edge feature in VMC on AWS. It is possible to oversaturate the NSX-T edge router under very heavy loads. If that happens, one way to resolve it is to deploy an additional edge pair to spread the load.

Traffic is routed to a specific edge node based on traffic groups that you configure. Traffic groups are built from source subnets – traffic is directed to a specific edge based on the IP address assigned to the VM in VMC on AWS.

Given the scenario:

VM-A = 10.10.10.10
VM-B = 10.10.10.20
VM-C = 10.10.10.30

We could create traffic groups for 10.10.10.10/32, 10.10.10.20/32, and 10.10.10.30/32, creating a dedicated edge for each VM. Multi-edge does get expensive, as you burn two entire hosts’ worth of capacity just for each edge – one for the primary and one for failover.

When you deploy the traffic groups as shown above, the route table in the connected VPC will show the following:

10.10.10.10/32 => ENI1
10.10.10.20/32 => ENI2
10.10.10.30/32 => ENI3

This configuration does nothing to balance the load sourced from EC2 instances in the Connected VPC, meaning if we have 8 EC2 application servers that need to write to VM-A, they will all hit the same ENI – ENI1.

Any EC2 application server in the Connected VPC destined for VM-A will always use ENI1
Any EC2 application server in the Connected VPC destined for VM-B will always use ENI2
Any EC2 application server in the Connected VPC destined for VM-C will always use ENI3

Using the VMC API to troubleshoot the connected VPC

I ran into a problem today where a customer tried to connect their native VPC to VMC on AWS, but had no subnets available, even though they knew there were subnets in the VPC.

The problem ended up being that all of the subnets in the connected VPC were shared subnets – owned by a different account and shared into the account being connected. This is unsupported – connected VPC subnets must exist directly in the connected VPC. However, the CloudFormation stack still succeeds and links the AWS account to the organization.

You can obviously use the AWS console to look at your VPC subnets, but that’s not always an option. Sometimes, based on project timelines and staff availability, all you do on a particular call is have the person with administrative access run the CloudFormation stack to connect the AWS account to the VMC on AWS org; you can then create the SDDC later. If no subnets show up when you get to that point, you either need to wait for somebody with access to the VPC to log on, or you can check it yourself via the API.

To create a test environment, I started a new SDDC build, connected a new AWS account, then ran the CloudFormation stack.

The CloudFormation stack succeeds.

The connection was successful.

Now if I cancel out of the SDDC creation and start creating a different SDDC, my newly connected account starting with 77 is still available as a choice – I can continue with the SDDC creation.

In a related problem, sometimes the customer connects the wrong account and you want it removed from the list. You can also do this via the API – I show this at the bottom of the post.

Using the API

If you have the ability to pull down Python scripts from GitHub, the Python Client for VMware Cloud on AWS (pyVMC) is the easiest way to make the API calls to check on connected VPC subnet compatibility. Clone the repo and put your own refresh token, Org ID, and SDDC ID in config.ini.

You can then use two new commands that I just created – show-connected-accounts and show-compatible-subnets.

Run the show-connected-accounts command (output redacted). This shows us all of the native AWS accounts that are connected to our organization.

$ ./pyVMC.py show-connected-accounts
+--------------------------------------+
|                OrgID                 |
+--------------------------------------+
| 2axxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx |
+--------------------------------------+
+----------------+--------------------------------------+
| Account Number |                  id                  |
+----------------+--------------------------------------+
|  61xxxxxxxxxx  | 08xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx |
|  68xxxxxxxxxx  | 14xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx |
|  80xxxxxxxxxx  | 59xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx |
|  82xxxxxxxxxx  | bbxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx |
+----------------+--------------------------------------+

Then find the connected account ID associated with the AWS account number and use it as an argument to the show-compatible-subnets command.

To show compatible native AWS subnets connected to the SDDC:
        show-compatible-subnets [LINKEDACCOUNTID] [REGION]

In this case I need to connect the AWS account number starting with 68 to an SDDC in us-east-1. I find that account 68xxx is mapped to Linked Account ID 14xxx. The command output is below – you get a list of subnets and whether each is compatible.

$ ./pyVMC.py show-compatible-subnets 14xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx us-east-1
+-----------------------+-------------+
|          vpc          | description |
+-----------------------+-------------+
| vpc-04xxxxxxxxxxxxxxx |     Texx    |
| vpc-0axxxxxxxxxxxxxxx |  Texxxxxxx  |
| vpc-02xxxxxxxxxxxxxxx |     xxxx    |
|      vpc-41xxxxxx     | vmcxxxxxxxx |
+-----------------------+-------------+
+-----------------------+--------------------------+-------------------+--------------------+------------+
|         vpc_id        |        subnet_id         | subnet_cidr_block |        name        | compatible |
+-----------------------+--------------------------+-------------------+--------------------+------------+
| vpc-04xxxxxxxxxxxxxxx | subnet-00xxxxxxxxxxxxxxx |   192.168.xxx/xx  |       Texxx        |   False    |
| vpc-0axxxxxxxxxxxxxxx | subnet-0cxxxxxxxxxxxxxxx |  10.xxxxxxxxx/xx  |    Alxxxxxxxxxx    |    True    |
| vpc-0axxxxxxxxxxxxxxx | subnet-05xxxxxxxxxxxxxxx |   10.xxxxxxx/xx   |    Texxxxxxxxxx    |    True    |
| vpc-0axxxxxxxxxxxxxxx | subnet-08xxxxxxxxxxxxxxx |   10.xxxxxxxx/xx  |    Texxxxxxxxxx    |    True    |
| vpc-0axxxxxxxxxxxxxxx | subnet-0cxxxxxxxxxxxxxxx |   10.xxxxxxxx/xx  |    Texxxxxxxxxx    |    True    |
| vpc-0axxxxxxxxxxxxxxx | subnet-05xxxxxxxxxxxxxxx |   10.xxxxxxxx/xx  |    Texxxxxxxxxx    |    True    |
| vpc-0axxxxxxxxxxxxxxx | subnet-00xxxxxxxxxxxxxxx |  10.xxxxxxxxx/xx  |    Texxxxxxxxxx    |   False    |
| vpc-02xxxxxxxxxxxxxxx | subnet-0cxxxxxxxxxxxxxxx |   172.xxxxxx/xx   |        xxxx        |    True    |
| vpc-02xxxxxxxxxxxxxxx | subnet-00xxxxxxxxxxxxxxx |   172.xxxxxx/xx   |        xxxx        |    True    |
|      vpc-41xxxxxx     |     subnet-63xxxxxx      |   172.xxxxxxx/xx  |        xxxx        |    True    |
|      vpc-41xxxxxx     |     subnet-70xxxxxx      |   172.xxxxxxx/xx  |        xxxx        |    True    |
|      vpc-41xxxxxx     |     subnet-34xxxxxx      |   172.xxxxxxx/xx  |        xxxx        |    True    |
|      vpc-41xxxxxx     |     subnet-0cxxxxxx      |   172.xxxxxxx/xx  |        xxxx        |    True    |
|      vpc-41xxxxxx     |     subnet-25xxxxxx      |   172.xxxxxxx/xx  |        xxxx        |   False    |
|      vpc-41xxxxxx     |     subnet-1exxxxxx      |   172.xxxxxxx/xx  | vmcxxxxxxxxxxxxxxx |    True    |
+-----------------------+--------------------------+-------------------+--------------------+------------+

If pulling down Python repositories from GitHub isn’t your thing, here is how to do it via API calls in the Developer Center.

Go to Developer Center>API Explorer, then click VMware Cloud on AWS>General. Expand the AWS Account Connection Operations category.

Find the /account-link/connected-accounts GET request and execute it

We get back a list of AWS accounts that are connected to this org. Now we need to find which one is connected to the correct AWS account ID. Above, I made a new connection to an AWS account ID that began with 77, so normally we would look for that account ID. However, that account is completely empty with no subnets in it; I’m instead going to look for an account ID that I know has subnets in it – the account ID starts with 68.

I click through the results until I find the one I want – my account ID starting with 68 is in the ConnectedAccount GUID starting with 14. I copy that GUID to my clipboard.

Now find the /account-link/compatible-subnets GET request. The org ID automatically populates; paste the Linked Account ID in, then enter the region where you are trying to deploy – in this case us-east-1.

All of your subnets show up – the important flag is at the bottom – ‘compatible’.
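
For completeness, here is a rough sketch of making those same two calls directly from Python with the requests library, without pyVMC. The /account-link/... paths are the ones shown in API Explorer above; the vmc.vmware.com base URL, the csp-auth-token header, and the linkedAccountId/region parameter names are my assumptions – confirm them against API Explorer (or by watching the calls in the Firefox Network tab) before relying on this.

import requests

REFRESH_TOKEN = "<your CSP API token>"                              # placeholder
ORG_ID = "<your Org ID>"                                            # placeholder
LINKED_ACCOUNT_ID = "<linked account id from the output above>"     # placeholder

access_token = requests.post(
    "https://console.cloud.vmware.com/csp/gateway/am/api/auth/api-tokens/authorize",
    params={"refresh_token": REFRESH_TOKEN},
).json()["access_token"]
headers = {"csp-auth-token": access_token}
base = f"https://vmc.vmware.com/vmc/api/orgs/{ORG_ID}"              # assumed base URL

# List all AWS accounts connected to the org
accounts = requests.get(f"{base}/account-link/connected-accounts", headers=headers).json()
for acct in accounts:
    # field names are assumptions based on the pyVMC table above
    print(acct["account_number"], acct["id"])

# Compatible subnets for one linked account in one region
subnets = requests.get(
    f"{base}/account-link/compatible-subnets",
    headers=headers,
    params={"linkedAccountId": LINKED_ACCOUNT_ID, "region": "us-east-1"},
).json()
print(subnets)   # inspect the output for the per-subnet 'compatible' flag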

I didn’t want my empty linked account with the GUID starting with ’47’ to remain linked anymore, so I deleted it with this call.

Changing the SA Lifetime interval for a VMC on AWS VPN connection

The Security Association (SA) lifetime defines how long a VPN tunnel stays up before the encryption keys are renegotiated. In VMC, the default lifetimes are 86,400 seconds (1 day) for Phase 1 and 3,600 seconds (1 hour) for Phase 2.

I had a customer who needed Phase 2 set to 86,400 seconds. Actually, they were using IKEv2, and there really isn’t a Phase 2 with IKEv2, but IKE is a discussion for a different day. Regardless, if you need the tunnel to stay up for 86,400 seconds, you need to configure the setting as shown in this post. This can only be done via an API call. I will show you how to do it through the VMC API Explorer – you will not need any programming ability to execute the API calls using API Explorer.

Log into VMC and go to Developer Center>API Explorer, pick your SDDC from the dropdown in the Environment section, then click on the NSX VMC Policy API.

Search for VPN in the search box, then find Policy, Networking, Network Services, VPN, IPSec, IPSec Profiles.

Expand the first GET call for /infra/ipsec-vpn-tunnel-profiles

Scroll down until you see the Execute button, then click it to execute the API call. You should get a response of type IPSecVpnTunnelProfileListResult.

Click on the result list to expand the list. The number of profiles will vary by customer – in my lab, we have 11 profiles.

I click on the first one and see my name in it, so I can identify it as the one I created and the one I want to change. I find the key sa_life_time set to 3600 – this is the value that needs to change to 86,400.

Click on the down arrow next to the tunnel profile to download the JSON for this tunnel profile. Open it in a text editor and change the value from 3600 to 86400 (no commas in the number).

Now we need to push our changes back to VMC via a PATCH API call. Find the PATCH call under the GET call and expand it.

Paste the entirety of the JSON from your text editor into the TunnelProfile box. You can see that the 86400 value is visible. Paste the tunnel profile ID into the tunnel-profile-id field – you can find the ID shown as “id” in the JSON file. Click Execute. If successful, you will get a “Status 200, OK” response.

Now to verify. Find the GET request that takes a tunnel profile ID – this will return just a single tunnel profile instead of all of them.

Pass it the tunnel ID and click Execute. You should get a response with a single tunnel profile object.

Click on the response object and you should find an sa_life_time value of 86400.
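
If you do end up scripting this instead of using API Explorer, the whole flow is only a few lines. This is a sketch, not an official procedure: the NSX reverse-proxy URL and csp-auth-token header are assumptions as in the earlier posts, the profile display name is hypothetical, and the /infra/ipsec-vpn-tunnel-profiles path and sa_life_time key are the ones visible in API Explorer above.

import requests

NSX_PROXY_URL = "<your SDDC's NSX reverse-proxy URL>"    # placeholder/assumption
headers = {"csp-auth-token": "<access token>"}           # placeholder

# GET all tunnel profiles, same as the first API Explorer call
profiles = requests.get(
    f"{NSX_PROXY_URL}/policy/api/v1/infra/ipsec-vpn-tunnel-profiles",
    headers=headers,
).json()["results"]

# Pick out the profile to change (display name is hypothetical) and bump the SA lifetime
mine = next(p for p in profiles if p["display_name"] == "my-tunnel-profile")
mine["sa_life_time"] = 86400

# PATCH it back using its id, same as the API Explorer PATCH call
r = requests.patch(
    f"{NSX_PROXY_URL}/policy/api/v1/infra/ipsec-vpn-tunnel-profiles/{mine['id']}",
    headers=headers,
    json=mine,
)
r.raise_for_status()    # expect 200 OK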

Changing a VMC segment type from the Developer Center

Here’s a way to use the API explorer to test out API calls. This is one technique to figure out how the API works before you start writing code in your favorite language.

First I create a disconnected segment in my SDDC

Then I go to the Developer Center in the VMC console, pick API Explorer, NSX VMC Policy API, and pick my SDDC from the dropdown.

Now I need a list of all segments – I find this in /infra/tier-1s/{tier-1-id}/segments

I provide the value ‘cgw’ for tier-1-id and click Execute

It’s easiest to view the results by clicking the down arrow to download the resulting JSON file.

Inside the file I find the section containing my test segment ‘pkremer-api-test’.

        {
            "type": "DISCONNECTED",
            "subnets": [
                {
                    "gateway_address": "192.168.209.1/24",
                    "network": "192.168.209.0/24"
                }
            ],
            "connectivity_path": "/infra/tier-1s/cgw",
            "advanced_config": {
                "hybrid": false,
                "local_egress": false,
                "connectivity": "OFF"
            },
            "resource_type": "Segment",
            "id": "15d1e170-af67-11ea-9b05-2bf145bf35c8",
            "display_name": "pkremer-api-test",
            "path": "/infra/tier-1s/cgw/segments/15d1e170-af67-11ea-9b05-2bf145bf35c8",
            "relative_path": "15d1e170-af67-11ea-9b05-2bf145bf35c8",
            "parent_path": "/infra/tier-1s/cgw",
            "marked_for_delete": false,
            "_create_user": "pkremer@vmware.com",
            "_create_time": 1592266812761,
            "_last_modified_user": "pkremer@vmware.com",
            "_last_modified_time": 1592266812761,
            "_system_owned": false,
            "_protection": "NOT_PROTECTED",
            "_revision": 0
        }

Now I need to update the segment to routed, which I do by finding PATCH /infra/tier-1s/{tier-1-id}/segments/{segment-id}. I fill in the tier-1-id and segment-id values as shown (the segment-id came from the JSON output above as the “id”).

This is the JSON that I pasted into the Segment box:

{
    "type": "ROUTED",
    "subnets": [
        {
            "gateway_address": "192.168.209.1/24",
            "network": "192.168.209.0/24"
        }
    ],
    "advanced_config": {
        "connectivity": "ON"
    },
    "display_name": "pkremer-api-test"
}
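
If you would rather make the same change from code than from API Explorer, a minimal sketch under the same assumptions as earlier (placeholder NSX reverse-proxy URL and csp-auth-token header) might look like this – the path and payload mirror the PATCH above:

import requests

NSX_PROXY_URL = "<your SDDC's NSX reverse-proxy URL>"            # placeholder/assumption
headers = {"csp-auth-token": "<access token>"}                   # placeholder
segment_id = "15d1e170-af67-11ea-9b05-2bf145bf35c8"              # the "id" from the GET output above

payload = {
    "type": "ROUTED",
    "subnets": [{"gateway_address": "192.168.209.1/24", "network": "192.168.209.0/24"}],
    "advanced_config": {"connectivity": "ON"},
    "display_name": "pkremer-api-test",
}

r = requests.patch(
    f"{NSX_PROXY_URL}/policy/api/v1/infra/tier-1s/cgw/segments/{segment_id}",
    headers=headers,
    json=payload,
)
r.raise_for_status()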

The segment is now a routed segment.

VMC on AWS – VPN, DNS Zones, TTLs

My customer reported an issue with DNS zones in VMC, so I needed to set it up in the lab to check the behavior. The DNS service in VMC allows you to specify DNS resolvers to forward requests to. By default, DNS requests are forwarded to 8.8.8.8 and 8.8.4.4. If you’re using Active Directory, you generally set the forwarders to your domain controllers. But some customers need more granular control over DNS forwarding. For example – you could set the default forwarders to domain controllers for vmware.com, but maybe you just acquired Pivotal, and their domain controllers are at pivotal.com. DNS zones allow you to direct any request for *.pivotal.com to a different set of DNS servers.

Step 1

First, I needed an IPsec VPN from my homelab into VMC. I run Ubiquiti gear at home. I decided to go with a policy-based VPN because my team’s VMC lab has many different subnets with lots of overlap with my home network. I went to the Networking & Security tab, Overview screen, which gave me the VPN public IP as well as my appliance subnet. All of the management components, including the DNS resolvers, sit in the appliance subnet, so I will need those available across the VPN. Not shown here is another subnet, 10.47.159.0/24, which contains jump hosts for our VMC lab.

I set up the VPN in the Networks section of the UniFi controller – you add a VPN network just like you add any other wired or wireless network. I add the appliance subnet 10.46.192.0/18 and my jump host subnet 10.47.159.0/24. Peer IP is the VPN public IP, and local WAN IP is the public IP at home.

I could not get SHA2 working and never figured out why. Since this was just a temporary lab scenario, I went with SHA1.

On the VMC side I created a policy-based VPN. I selected the Public Local IP address, which matches the 34.x.x.x VMC VPN IP shown above. The remote Public IP is my public IP at home. Remote networks are 192.168.75.0/24, which contains my laptop, and 192.168.203.0/24, which contains my domain controllers. For local networks I added the appliance subnet 10.46.192.0/18 and 10.47.159.0/24. They show up under their friendly names in the UI.

The VPN comes up.

Now I need to open an inbound firewall rule on the management gateway to allow my on-prem subnets to communicate with vCenter. I populate the Sources object with subnet 192.168.75.0/24 so my laptop can hit vCenter. I also set up a reverse rule (not shown) outbound from vCenter to that same group. This isn’t strictly necessary to get DNS zones to work, since DNS zones can only be configured for the compute gateway – it’s the compute VMs that need the DNS zone – but I wanted to reach vCenter over the VPN.

I create similar rules on the compute gateway to allow communication between my on-prem subnets and anything behind the compute gateway – best practice would be to lock down specific subnets and ports.

I try to ping vCenter 10.46.224.4 from my laptop on 192.168.75.0/24 and it fails. I run a traceroute and I see my first hop is my VPN connection into VMware corporate. I run ‘route print’ on my laptop and see the entire 10.0.0.0/8 is routed to the VPN connection.

This means I will either have to disconnect from the corporate VPN to hit 10.x IP addresses in VMC, or I have to route around the VPN with a static route.

At an elevated command prompt, I run these commands

route add 10.46.192.0 mask 255.255.192.0 192.168.75.1 metric 1 -p
route add 10.47.159.0 mask 255.255.255.0 192.168.75.1 metric 1 -p

This inserts two routes into my laptop’s route table. The -p means persistent, so the route will persist across reboots.

Now when I run route print, I can see the routes for my VMC appliance subnet and jump host subnet.

I can now ping vCenter by private IP from my laptop. I can also ping servers in the jump host subnet.

Now to create a DNS zone – I point to one of my domain controllers on-prem – in production you would of course point to multiple domain controllers.

I flip back to DNS services and edit the Compute Gateway forwarder. The existing DNS forwarders point to our own internal lab domain, and we don’t want to break that communication. What we do want is to have queries destined for my homelab AD redirected to my homelab DNS servers. We add the zone to the FQDN Zones box and click save.

Now we run a test – you can use nslookup, but I downloaded the BIND tools so I can use dig on Windows.

First dig against my homelab domain controller

dig @192.168.203.10 vmctest88.ad.patrickkremer.com

Then against the VMC DNS server 10.46.192.12

dig @10.46.192.12 vmctest88.ad.patrickkremer.com

The correct record appears. You can see the TTL next to the DNS entry at 60 seconds – the VMC DNS server will cache the entry for the TTL that I have configured on-prem. If I dig again, you can see the TTL counting down toward 0.

I do another lookup after the remaining 21 seconds expire and you can see a fresh record was pulled with a new TTL of 60 seconds.

Let’s make a change. I update vmctest88 to point to 192.168.203.88 instead of .188, and I update the TTL to 1 hour.

On-prem results:

VMC results:

This will be cached for 1 hour in VMC.

I switch it back to .188 with a TTL of 60 seconds on-prem, which is reflected instantly.

But in VMC, the query still returns the wrong .88 IP, with the TTL timer counting down from 3600 seconds (1 hour).

My customer had the same caching problem, except their cached TTL was 3 days and we couldn’t wait for it to fix itself. We needed to clear the DNS resolver cache. In order to do that, we go to the API. A big thank you to my coworker Matty Courtney for helping me get this part working.

You could, of course, do this programmatically. But if consuming APIs in Python isn’t your thing, you can do it from the UI. Go to the Developer Center in the VMC console, then API explorer. Pick your Org and SDDC from the dropdowns.

Click on the NSX VMC Policy API

In the NSX VMC Policy API, find Policy, Networking, IP Management, DNS, Forwarder, then this POST operation on the tier-1 DNS forwarder

Fill out the parameter values:
tier-1-id: cgw
action: clear_cache
enforcement_point: /infra/sites/default/enforcement-points/vmc-enforcementpoint

Click Execute

We see Status: 200, OK – success on the clear cache operation. We do another dig against the VMC DNS server – even though we are still within the old 1-hour cache period, the cache has been cleared. The VMC DNS server pulls the latest record from my homelab, and we see the correct .188 IP with a new TTL of 60 seconds.
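
For reference, the same clear-cache call can be scripted. This is a sketch rather than the canonical method: the parameter values are the ones API Explorer asked for above, but the exact query parameter names the API expects are worth confirming by watching the request in the browser (the Firefox Network tab trick from earlier works well here).

import requests

NSX_PROXY_URL = "<your SDDC's NSX reverse-proxy URL>"    # placeholder/assumption
headers = {"csp-auth-token": "<access token>"}           # placeholder

r = requests.post(
    f"{NSX_PROXY_URL}/policy/api/v1/infra/tier-1s/cgw/dns-forwarder",
    headers=headers,
    params={
        "action": "clear_cache",                         # same action as in API Explorer
        # parameter name as shown in API Explorer; verify the wire-level name in your environment
        "enforcement_point": "/infra/sites/default/enforcement-points/vmc-enforcementpoint",
    },
)
r.raise_for_status()    # expect Status: 200, OK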

AD authentication for vCenter in VMC on AWS

VMware has good documentation on setting up Hybrid Linked Mode in VMC, but the docs are a little bit confusing if all you want is Active Directory authentication into the VMC vCenter. This post shows how I was able to configure AD authentication for a VMC on AWS vCenter.

Step 1

I first wanted to build a domain controller in the connected VPC, allowing AD communication across the ENI. If you already have a domain controller accessible via VPN or Direct Connect, you do not need to worry about this part of the configuration – you can skip to Step 2. But I wanted to demonstrate AD communication across the ENI as part of this post. To figure out which EC2 subnet I needed my domain controller in, I looked at the Networking & Security Overview.

I created a Windows 2016 EC2 instance, gave it an IP of 172.20.0.249, and promoted it to a domain controller. My test domain was named poc.test. I needed to open the firewall to allow the management network in VMC to communicate with the domain controller. Best practice would obviously be to restrict communication to only Active Directory ports, but I opened it all up to make things simpler. The 0.0.0.0/0 rule for RDP was to allow domain controller access from the public internet – obviously not something you’d want to do in production, but this is just a temporary lab. The default outbound rule in EC2 is to allow everything, which I left in place.

I also needed to open the compute gateway firewall to allow bidirectional communication across the ENI, which I’ve done below.

Step 2

Once you have a Domain Controller available, you need to point the management gateway DNS to your domain controller. In this example I also pointed the Compute Gateway DNS to the domain controller.

Step 3

Even though you’re not setting up Hybrid Linked Mode, it’s a good idea to use some of the HLM troubleshooting tools to ensure connectivity to the domain controller. I ran the 5 tests shown below against my DC IP 172.20.0.249.

Step 4

Now we need to configure an identity source in the VMC vCenter. Log in as cloudadmin@vmc.local. You can find this under Menu>Administration, then Single Sign On>Configuration, then Identity Sources. Click Add to add an identity source.

Select Active Directory over LDAP in the Identity Source Type dropdown.

Fill out the identity source according to your Active Directory environment. You would want to use the secondary LDAP server in production, and you would never use a Domain Admin account as the LDAP user in production.

Once the identity source is added, you will see it in the list.

Log out as cloudadmin@vmc.local and log in as a domain user.

Even with the correct password, we receive this error. This is OK, as we have not granted any domain user access to our vCenter – all domain users are granted No Access by default.

Log back in as cloudadmin and grant privileges to a domain user. In our case, we want to grant admin rights at the vCenter level, so we click on the vCenter object, then Permissions, then the plus sign to add a permission.

The AD domain should show up in the dropdown.

If you start typing in the User/Group line, a dropdown will auto-populate with matching AD objects. I pick Administrators.

Be careful here – you cannot grant the Administrator role in VMC because you are not an administrator – only VMware support has full administrative access to an SDDC. Instead, grant the CloudAdmin role. Check Propagate to send the permission down the entire tree.

We now see the new permission in the Permissions list.

Now log off as cloudadmin, and log in as the AD user.

Success! You can now grant permissions to Active Directory users.

VMware Event Broker Appliance – Part IX – Deploying the Datastore Usage Email sample script in VMC

In Part VIII, we discussed basic troubleshooting techniques for VEBA. In this post we will deploy a new sample script.

There is a new VEBA sample script in 0.3 that enables you to send emails based on a datastore alarm. Although the functionality is built into on-prem vCenter, you have very little control over the email content. It is not possible to configure email alarms in VMC on AWS because the customer is locked out of that configuration area in vCenter. The sample script is written in PowerShell/PowerCLI.

I wrote extensive documentation on deploying a PowerShell script into VEBA in Part VII. If you don’t already know how to clone a repository with git and how to work with OpenFaaS, check out Part VII. Here I am just going to show the config files and config changes necessary.

We find the code in examples/powercli/datastore-usage-email

NOTE: If you send a large volume of emails from VMC, you need to open a ticket with VMware support to unthrottle SMTP on the shadow VPC that contains your SDDC. AWS by default throttles emails sent out of all their VPCs. You can simply pop open the chat window in the VMC console and request the change. VMware will file a ticket with AWS support and you will be unthrottled.

The default configuration is set to work with Gmail authenticated SMTP. However, many customers run SMTP relays either on-prem or up in VMC for various email-related needs. You can change the SMTP_PORT to 25 and leave SMTP_USERNAME and SMTP_PASSWORD blank to use standard SMTP.

Here’s how I changed it to get it to work using VMware’s SMTP relay to deliver to my corp email address.

 

My modified stack.yml looks like this

Create the secret and push the function with faas-cli

faas-cli secret create vc-hostmaint-config --from-file=vc-hostmaint-config.json --tls-no-verify

faas-cli up --tls-no-verify

Now we need to cause a storage alarm, so we find the default Datastore Usage on Disk alarm and edit it.

We set the warning level very low – here I change it to 7% to force an alarm. Then I click Next through to the end of the config wizard.

The datastore shows a warning. Now we perform the same Edit operation and set the warning percentage back to 70%. The datastore warning should clear.

If everything worked, you should have 2 emails in your inbox – a warning email and a back-to-normal email.

If you don’t see the emails, check your spam folder – you may have to whitelist the emails depending on your spam settings.

If you have issues, troubleshoot the VMware event router logs and the email function logs as shown in the troubleshooting section in Part VIII.

VMware Event Broker Appliance – Part VII – Deploy the Sample Host Maintenance Function

In Part VI of this series, we showed how to sync a fork of our VEBA repository to the upstream repository maintained by VMware.  Back in Part IV, we deployed our first sample function. In this post, we will deploy another sample function – the host maintenance function. This post was updated on 2020-03-07 to include screenshots for the VEBA 0.3 release.

Our first sample function was written in Python. As of the date of this post, the other available samples are all in PowerShell. We will be working with the hostmaint-alarms function. This function will disable alarm actions when you pop a host into maintenance mode. No more alerts for a host that you’re doing maintenance on!

We had a problem in the 0.2 release with secret names colliding. Here is the stack.yml file for our Python tagging function.

Here is the sample file we used to generate the secret named ‘vcconfig’.

Here is the stack.yml for our host maintenance alarms function in the 0.2 release

We no longer have this issue in the 0.3 release as we named our secret vc-hostmaint-config.

Here is the vcconfig.json file we use to configure the secret named ‘vcconfig’ for our PowerShell function.

A problem happens when you use secrets of the same name for scripts that aren’t expecting the same secret format. In 0.2, we had a secret named vcconfig used for both functions, but the secret files have completely different contents. Neither script can read the other’s secret because it wasn’t programmed to do so. The TOML secret file is a configuration file format popular with Python. The PowerShell secret file is simple JSON. This means we need to give each secrets file a different name – one for the Python script and one for PowerShell. Note that it doesn’t have to be this way – there’s nothing stopping a script writer from using TOML for PowerShell and JSON for Python – all that matters is how the script is written. You could write your scripts to use a single secret format and they could all share a single secret.

We now need to change the sample script to point to a different secrets file. In order to do that, we need to create a new secret using our vcconfig.json file.

After editing the file to point to our environment, we push it into the VEBA appliance. I name it ‘vcconfig-hostmaint’, but you can name it whatever you want. To match the current 0.3 script, you should name it ‘vc-hostmaint-config’. If you match what’s in the script, you don’t have to rebuild any container images – the default container image will work. But there are many reasons why you would need to rebuild the container image: any time you want to improve the existing functions, or write your own, you will need to build your own container image. This post will continue on to show how to finish deploying by rebuilding the container image.

To create the secret, remember that you need to log in with faas-cli first; for a refresher, look at Part IV of this series.

Now that we have our secrets file, we need to change our code to use it.

First we edit the first line of script.ps1 in the handler folder. We need to change the secret name to whatever we named it in the CLI – here, I change it to vcconfig-hostmaint.

Looking again at the stack.yml file, we have additional problems. We built a new secret, so we can change the secrets: section to point to vcconfig-hostmaint. Our gateway needs to point to our VEBA appliance. Then we need to worry about our image. Because we changed PowerShell code, we have to rebuild the container image that runs the code. Otherwise, the container that launches is the default container that ships with the VEBA appliance.

 

We sign up for a Docker account

 

Now we download and install Docker Desktop. The installation is simple; you can find the official installation documentation here.

 

After installation there’s a little whale icon in my system tray

I right-click and log in with the account I just created

 

 

Now when I right-click the whale, it shows that I’m signed in.

Now we edit the stack.yml file. We make sure the gateway points to our VEBA, change the secrets section to point to our new secret, and change the image name – the account name gets changed to the Docker account we just created.

Now we need to build, push, and deploy our new container image. The faas-cli function creation documentation shows us the commands we need to use.

First, we need to log in to Docker. Since I had no experience with Docker, it took me forever to figure out this seemingly simple task. I tried many different ways, but all failed with an Unauthorized error.

The simple answer is to make sure you’re logged in to Docker Desktop.

Then you issue a docker login command with no arguments. Just docker login.

 

We now issue the faas-cli build -f stack.yml command.

Multiple screens of information scroll past, but we end with a successful build.

Now we push the containers into the Docker registry with faas-cli push -f stack.yml

Now we deploy the container to our VEBA with faas-cli deploy -f stack.yml --tls-no-verify

PROTIP: Once you understand what the 3 commands do – build, push, deploy – you can use a nifty OpenFaaS shortcut command: faas-cli up --tls-no-verify
This command runs build, push, and deploy in sequence, automatically.

Now we log into vCenter and look at the Alarm actions on a host. They are currently enabled.

 

After the host enters maintenance mode, the alarm actions have been disabled by our function

 

Now we exit maintenance mode and our function enables alarm actions. Success!

 

Finally, we want to verify that we did not break our other Python function, the one that tags a VM when it powers on. We check our test VM and see it has no tags.

After power on, a tag has been applied. Now we have both functions working!

We have now successfully deployed our PowerShell host maintenance function! In Part VIII, we will look at some troubleshooting techniques.