Category Archives: VMConAWS

Changing the SA Lifetime interval for a VMC on AWS VPN connection

The Security Association (SA) lifetime defines how long a VPN tunnel stays up before encryption keys are swapped out. In VMC, the default lifetimes are 86,400 seconds (1 day) for Phase 1 and 3,600 seconds (1 hour) for Phase 2.

I had a customer who needed Phase 2 set to 86,400 seconds. Actually, they were using IKEv2, and there really isn’t a Phase 2 with IKEv2, but IKE is a discussion for a different day. Regardless, if you need the tunnel up for 86,400 seconds, you need to configure the setting as shown in this post. This can only be done via an API call. I will show you how to do it through the VMC API Explorer – you will not need any programming ability to execute the API calls using API Explorer.

Log into VMC and go to Developer Center > API Explorer, pick your SDDC from the dropdown in the Environment section, then click on the NSX VMC Policy API.

Search for VPN in the search box, then find Policy > Networking > Network Services > VPN > IPSec > IPSec Profiles.

Expand the first GET call for /infra/ipsec-vpn-tunnel-profiles

Scroll down until you see the Execute button and click it to execute the API call. You should get a response of type IPSecVpnTunnelProfileListResult.
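For reference, the same call can be made outside the API Explorer. This is a minimal sketch, assuming NSX_PROXY_URL is your SDDC’s NSX reverse-proxy URL and CSP_TOKEN is a valid CSP API token (neither value appears in this post, so adjust for your environment):

# List all IPsec VPN tunnel profiles and show the fields we care about
curl -s -H "csp-auth-token: ${CSP_TOKEN}" \
  "${NSX_PROXY_URL}/policy/api/v1/infra/ipsec-vpn-tunnel-profiles" \
  | jq '.results[] | {id, display_name, sa_life_time}'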

Click on the result list to expand the list. The number of profiles will vary by customer – in my lab, we have 11 profiles.

I click on the first one and see my name in it, so I can identify it as the one I created and the one I want to change. I find the key sa_life_time set to 3600 – this is the value that needs to change to 86,400

Click on the down arrow next to the tunnel profile to download the JSON for this tunnel profile. Open it in a text editor and change the value from 3600 to 86400 (no commas in the number).

Now we need to push our changes back to VMC via a PATCH API call. Find the PATCH call under the GET call and expand it.

Paste the entire JSON from your text editor into the TunnelProfile box – the 86400 value should be visible. Paste the tunnel profile ID into the tunnel-profile-id field – you can find the ID shown as “id” in the JSON file. Click Execute. If successful, you will get a “Status 200, OK” response.
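If you are scripting this instead of using the API Explorer, the PATCH might look like the following sketch (same NSX_PROXY_URL and CSP_TOKEN assumptions as above, with the edited JSON saved as tunnel-profile.json):

# Push the edited profile back; the ID in the URL is the "id" from the JSON
curl -s -X PATCH \
  -H "csp-auth-token: ${CSP_TOKEN}" \
  -H "Content-Type: application/json" \
  -d @tunnel-profile.json \
  "${NSX_PROXY_URL}/policy/api/v1/infra/ipsec-vpn-tunnel-profiles/<tunnel-profile-id>"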

Now to verify. Find the GET request that takes a tunnel profile ID – this will return just a single tunnel profile instead of all of them.

Pass it the tunnel ID and click Execute. You should get a response with a single tunnel profile object.

Click on the response object and you should find an sa_life_time value of 86400.
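The same verification can be scripted – a quick sketch under the same assumptions:

# Fetch the single profile and print just the SA lifetime
curl -s -H "csp-auth-token: ${CSP_TOKEN}" \
  "${NSX_PROXY_URL}/policy/api/v1/infra/ipsec-vpn-tunnel-profiles/<tunnel-profile-id>" \
  | jq '.sa_life_time'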

Changing a VMC segment type from the Developer Center

Here’s a way to use the API explorer to test out API calls. This is one technique to figure out how the API works before you start writing code in your favorite language.

First I create a disconnected segment in my SDDC

Then I go to the Developer Center in the VMC console, pick API Explorer, NSX VMC Policy API, and pick my SDDC from the dropdown.

Now I need a list of all segments – I find this in /infra/tier-1s/{tier-1-id}/segments

I provide the value ‘cgw’ for tier-1-id and click Execute

It’s easiest to view the results by clicking the down arrow to download the resulting JSON file.
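If you prefer the command line, the equivalent listing could look like this (same NSX reverse-proxy URL and CSP token assumptions as in the previous post; jq just trims the output to the interesting fields):

# List segments attached to the cgw tier-1 and show id, name, and type
curl -s -H "csp-auth-token: ${CSP_TOKEN}" \
  "${NSX_PROXY_URL}/policy/api/v1/infra/tier-1s/cgw/segments" \
  | jq '.results[] | {id, display_name, type}'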

Inside the file I find the section containing my test segment ‘pkremer-api-test’.

        {
            "type": "DISCONNECTED",
            "subnets": [
                {
                    "gateway_address": "192.168.209.1/24",
                    "network": "192.168.209.0/24"
                }
            ],
            "connectivity_path": "/infra/tier-1s/cgw",
            "advanced_config": {
                "hybrid": false,
                "local_egress": false,
                "connectivity": "OFF"
            },
            "resource_type": "Segment",
            "id": "15d1e170-af67-11ea-9b05-2bf145bf35c8",
            "display_name": "pkremer-api-test",
            "path": "/infra/tier-1s/cgw/segments/15d1e170-af67-11ea-9b05-2bf145bf35c8",
            "relative_path": "15d1e170-af67-11ea-9b05-2bf145bf35c8",
            "parent_path": "/infra/tier-1s/cgw",
            "marked_for_delete": false,
            "_create_user": "pkremer@vmware.com",
            "_create_time": 1592266812761,
            "_last_modified_user": "pkremer@vmware.com",
            "_last_modified_time": 1592266812761,
            "_system_owned": false,
            "_protection": "NOT_PROTECTED",
            "_revision": 0
        }

Now I need to update the segment to routed, which I do by finding PATCH /infra/tier-1s/{tier-1-id}/segments/{segment-id}. I fill in the tier-1-id and segment-id values as shown (the segment-id came from the JSON output above as the “id”).

This is code that I pasted in the Segment box

{
    "type": "ROUTED",
    "subnets": [
        {
            "gateway_address": "192.168.209.1/24",
            "network": "192.168.209.0/24"
        }
    ],
    "advanced_config": {
        "connectivity": "ON"
    },
    "display_name": "pkremer-api-test"
}
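If you would rather make the same change with curl, a sketch (assuming NSX_PROXY_URL and CSP_TOKEN are set as in the earlier example), with the JSON above saved as segment.json:

# PATCH the segment to ROUTED; the segment ID comes from the earlier GET output
curl -s -X PATCH \
  -H "csp-auth-token: ${CSP_TOKEN}" \
  -H "Content-Type: application/json" \
  -d @segment.json \
  "${NSX_PROXY_URL}/policy/api/v1/infra/tier-1s/cgw/segments/15d1e170-af67-11ea-9b05-2bf145bf35c8"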

The segment is now a routed segment.

VMC on AWS – VPN, DNS Zones, TTLs

My customer reported an issue with DNS zones in VMC, so I needed to reproduce it in the lab to check the behavior. The DNS service in VMC allows you to specify DNS resolvers to forward requests to. By default, DNS is directed to 8.8.8.8 and 8.8.4.4. If you’re using Active Directory, you will generally set the forwarders to your domain controllers. But some customers need more granular control over DNS forwarding. For example – you could set the default forwarders to domain controllers for vmware.com, but maybe you just acquired Pivotal, and their domain controllers are at pivotal.com. DNS zones allow you to direct any request for *.pivotal.com to a different set of DNS servers.

Step 1

First, I needed an IPsec VPN from my homelab into VMC. I run Ubiquiti gear at home. I decided to go with a policy-based VPN because my team’s VMC lab has many different subnets with lots of overlap with my home network. I went to the Networking & Security tab, Overview screen, which gave me the VPN public IP as well as my appliance subnet. All of the management components, including the DNS resolvers, sit in the appliance subnet, so I will need those available across the VPN. Not shown here is another subnet, 10.47.159.0/24, which contains jump hosts for our VMC lab.

I set up the VPN in the Networks section of the UniFi controller – you add a VPN network just like you add any other wired or wireless network. I add the appliance subnet 10.46.192.0/18 and my jump host subnet 10.47.159.0/24. Peer IP is the VPN public IP, and local WAN IP is the public IP at home.

I could not get SHA2 working and never figured out why. Since this was just a temporary lab scenario, I went with SHA1

On the VMC side I created a policy-based VPN. I selected the Public Local IP address, which matches the 34.x.x.x VMC VPN IP shown above. The remote Public IP is my public IP at home. Remote networks – 192.168.75.0/24, which contains my laptop, and 192.168.203.0/24, which contains my domain controllers. Then for local networks I added the appliance subnet 10.46.192.0/18 and 10.47.159.0/24. They show up under their friendly names in the UI.

The VPN comes up.

Now I need to open an inbound firewall rule on the management gateway to allow my on-prem subnets to communicate with vCenter. I populate the Sources object with subnet 192.168.75.0/24 so my laptop can hit vCenter. I also set up a reverse rule (not shown) outbound from vCenter to that same group. This isn’t specifically necessary to get DNS zones to work since we can only do that for the compute gateway – it’s the compute VMs that need the DNS zone. But I wanted to reach vCenter over the VPN.

I create similar rules on the compute gateway to allow communication between my on-prem subnets and anything behind the compute gateway – best practice would be to lock down specific subnets and ports.

I try to ping vCenter 10.46.224.4 from my laptop on 192.168.75.0/24 and it fails. I run a traceroute and I see my first hop is my VPN connection into VMware corporate. I run ‘route print’ on my laptop and see the entire 10.0.0.0/8 is routed to the VPN connection.

This means I either have to disconnect from the corporate VPN to hit 10.x IP addresses in VMC, or route around the VPN with a static route.

At an elevated command prompt, I run these commands

route add 10.46.192.0 mask 255.255.192.0 192.168.75.1 metric 1 -p
route add 10.47.159.0 mask 255.255.255.0 192.168.75.1 metric 1 -p

This inserts two routes into my laptop’s route table. The -p means persistent, so the route will persist across reboots.

Now when I run route print I can see the persistent routes for my VMC appliance subnet and jump host subnet.

I can now ping vCenter by private IP from my laptop. I can also ping servers in the jump host subnet.

Now to create a DNS zone – I point to one of my domain controllers on-prem – in production you would of course point to multiple domain controllers.

I flip back to DNS services and edit the Compute Gateway forwarder. The existing DNS forwarders point to our own internal lab domain, and we don’t want to break that communication. What we do want is to have queries destined for my homelab AD redirected to my homelab DNS servers. We add the zone to the FQDN Zones box and click save.

Now we run a test – you can use nslookup, but I downloaded the BIND tools so I can use dig on Windows.

First dig against my homelab domain controller

dig @192.168.203.10 vmctest88.ad.patrickkremer.com

Then against the VMC DNS server 10.46.192.12

dig @10.46.192.12 vmctest88.ad.patrickkremer.com

The correct record appears. You can see the TTL next to the DNS entry at 60 seconds – the VMC DNS server will cache the entry for the TTL that I have configured on-prem. If I dig again, you can see the TTL counting down toward 0.

I do another lookup after the remaining 21 seconds expire and you can see a fresh record was pulled with a new TTL of 60 seconds.

Let’s make a change. I update vmctest88 to point to 192.168.203.88 instead of .188, and I update the TTL to 1 hour.

On-prem results:

VMC results:

This will be cached for 1 hour in VMC.

I switch it back to .188 with a TTL of 60 seconds on-prem, which is reflected instantly

But in VMC, the query still returns the wrong .88 IP, with the TTL timer counting down from 3600 seconds (1 hour)

My customer had the same caching problem, except their cached TTL was 3 days and we couldn’t wait for it to fix itself. We needed to clear the DNS resolver cache. In order to do that, we go to the API. A big thank you to my coworker Matty Courtney for helping me get this part working.

You could, of course, do this programmatically. But if consuming APIs in Python isn’t your thing, you can do it from the UI. Go to the Developer Center in the VMC console, then API explorer. Pick your Org and SDDC from the dropdowns.

Click on the NSX VMC Policy API

In the NSX VMC Policy API, find Policy > Networking > IP Management > DNS > Forwarder, then this POST operation on the tier-1 DNS forwarder

Fill out the parameter values:
tier-1-id: cgw
action: clear_cache
enforcement_point: /infra/sites/default/enforcement-points/vmc-enforcementpoint

Click Execute
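For reference, the same cache-clear call can be made with curl. This is a sketch under the same NSX reverse-proxy URL and CSP token assumptions as in the earlier posts; the exact query-parameter name for the enforcement point is an assumption, so check the request the API Explorer generates:

# POST the clear_cache action against the cgw DNS forwarder
curl -s -X POST -H "csp-auth-token: ${CSP_TOKEN}" \
  "${NSX_PROXY_URL}/policy/api/v1/infra/tier-1s/cgw/dns-forwarder?action=clear_cache&enforcement_point_path=/infra/sites/default/enforcement-points/vmc-enforcementpoint"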

We see Status: 200, OK – success on the clear cache operation. We do another dig against the VMC DNS server – even though we were still within the old 1 hour cache period, the cache has been cleared. The VMC DNS server pulls the latest record from my homelab, we see the correct .188 IP with a new TTL of 60 seconds.

AD authentication for vCenter in VMC on AWS

VMware has good documentation on setting up Hybrid Linked Mode in VMC, but the docs are a little bit confusing if all you want is Active Directory authentication into the VMC vCenter. This post shows how I was able to configure AD authentication for a VMC on AWS vCenter.

Step 1

I first wanted to build a domain controller in the connected VPC, allowing AD communication across the ENI. If you already have a domain controller accessible via VPN or Direct Connect, you do not need to worry about this part of the configuration, you can skip to Step 2. But I wanted to demonstrate AD communication across the ENI as part of this post. To figure out which EC2 subnet I need my domain controller in, I looked at Networking & Security Overview

I created a Windows 2016 EC2 instance, gave it an IP of 172.20.0.249, and promoted it to a domain controller. My test domain was named: poc.test. I needed to open the firewall to allow the management network in VMC to communicate with the domain controller. Best practice would obviously be to restrict communication to only Active Directory ports, but I opened it all up to make things simpler. The 0.0.0.0/0 for RDP was to allow domain controller access from the public internet – obviously not something you’d want to do in production, but this is just a temporary lab. The default outbound rule in EC2 is to allow everything, which I left in place.

I also needed to open the compute gateway firewall to allow bidirectional communication across the ENI, which I’ve done below.

Step 2

Once you have a Domain Controller available, you need to point the management gateway DNS to your domain controller. In this example I also pointed the Compute Gateway DNS to the domain controller.

Step 3

Even though you’re not setting up Hybrid Linked Mode, it’s a good idea to use some of the HLM troubleshooting tools to ensure connectivity to the domain controller. I ran the 5 tests shown below against my DC IP 172.20.0.249

Step 4

Now we need to configure an identity source in the VMC vCenter. Log in as cloudadmin@vmc.local. You can find this under Menu > Administration, then Single Sign On > Configuration, then Identity Sources. Click Add to add an identity source.

Select Active Directory over LDAP in the Identity Source Type dropdown.

Fill out the identity source according to your Active Directory environment. In production you would want to configure a secondary LDAP server, and you would never use a Domain Admin account as the LDAP user.

Once the identity source is added, you will see it in the list.

Log out as cloudadmin@vmc.local and log in as a domain user.

If we enter the correct password, we receive this error. This is OK as we have not granted any domain user access to our vCenter. All domain users are granted No Access by default.

Log back in as cloudadmin and grant privileges to a domain user. In our case we want to grant admin rights at the vCenter level, so we click on the vCenter object, then Permissions, then the plus to add a permission.

The AD domain should show up in the dropdown.

If you start typing in the User/Group line, a dropdown will auto-populate with matching AD objects. I pick Administrators.

Be careful here – you cannot grant the Administrator role in VMC because you are not an administrator – only VMware support has full administrative access to an SDDC. Instead, grant the CloudAdmin role. Check Propagate to send the permission down the entire tree.

We now see the new permission in the Permissions list.

Now log off as cloudadmin, and log in as the AD user.

Success! You can now grant permissions to Active Directory users.

VMware Event Broker Appliance – Part IX – Deploying the Datastore Usage Email sample script in VMC

In Part VIII, we discussed basic troubleshooting techniques for VEBA. In this post we will deploy a new sample script.

There is a new VEBA sample script in 0.3 that enables you to send emails based on a datastore alarm. Although the functionality is built into on-prem vCenter, you have very little control over the email content. It is not possible to configure email alarms in VMC on AWS because the customer is locked out of that configuration area in vCenter. The sample script is written in PowerShell/PowerCLI.

I wrote extensive documentation on deploying a PowerShell script into VEBA in Part VII. If you don’t already know how to clone a repository with git and how to work with OpenFaaS, check out Part VII. Here I am just going to show the config files and config changes necessary.

We find the code in examples/powercli/datastore-usage-email

NOTE: If you send a large volume of emails from VMC, you need to open a ticket with VMware support to unthrottle SMTP on the shadow VPC that contains your SDDC. AWS by default throttles emails sent out of all their VPCs. You can simply pop open the chat window in the VMC console and request the change. VMware will file a ticket with AWS support and you will be unthrottled.

The default configuration is set to work with GMail authenticated SMTP. However, many customers run SMTP relays either on-prem or up in VMC for various email related needs. You can change the SMTP_PORT to 25 and leave SMTP_USERNAME and SMTP_PASSWORD blank to use standard SMTP.

Here’s how I changed it to get it to work using VMware’s SMTP relay to deliver to my corp email address.


My modified stack.yml looks like this

Create the secret and push the function with faas-cli

faas-cli secret create vc-hostmaint-config --from-file=vc-hostmaint-config.json --tls-no-verify

faas-cli up --tls-no-verify

Now we need to cause a storage alarm. We need to find the default Datastore Usage on Disk alarm and edit it

We change the warning level very low – here I change it to 7% to force an alarm. Then I click Next through to the end of the config wizard.

The datastore shows a warning. Now we perform the same Edit operation and set the warning percentage back to 70%. The datastore warning should clear.

If everything worked, you should have 2 emails in your inbox – a warning email and a back-to-normal email.

If you don’t see the emails, check your spam folder – you may have to whitelist the emails depending on your spam settings.

If you have issues, troubleshoot the VMware event router logs and the email function logs as shown in the troubleshooting section in Part VIII.

VMware Event Broker Appliance – Part VII – Deploy the Sample Host Maintenance Function

In Part VI of this series, we showed how to sync a fork of our VEBA repository to the upstream repository maintained by VMware.  Back in Part IV, we deployed our first sample function. In this post, we will deploy another sample function – the host maintenance function. This post was updated on 2020-03-07 to include screenshots for the VEBA 0.3 release.

Our first sample function was written in Python. As of the date of this post, the other available samples are all in PowerShell. We will be working with the hostmaint-alarms function. This function will disable alarm actions when you pop a host into maintenance mode. No more alerts for a host that you’re doing maintenance on!

We had a problem in the 0.2 release with secret names colliding. Here is the stack.yml file for our python tagging function

Here is the sample file we used to generate the secrets named ‘vcconfig’.

Here is the stack.yml for our host maintenance alarms function in the 0.2 release

We no longer have this issue in the 0.3 release as we named our secret vc-hostmaint-config.

Here is the vcconfig.json file we use to configure the secret named ‘vcconfig’ for our PowerShell function.

A problem happens when you use secrets of the same name for scripts that aren’t expecting the same secrets format. In 0.2, we had a secret named vcconfig used for both functions, but the secret files have completely different configurations. Neither script can read the other’s secret because they weren’t programmed to do so. The Python secret file uses TOML, a configuration file format popular with Python. The PowerShell secret file is simple JSON. This means that we will need to change the secrets file to a different name, one for the Python script and one for PowerShell. Note that it doesn’t have to be this way – there’s nothing stopping a script writer from using TOML format for PowerShell and JSON for Python – all that matters is how the script is written. You could write your scripts to use a single secret format and they could all share a single secret.

We now need to change the sample script to point to a different secrets file. In order to do that, we need to create a new secret using our vcconfig.json file.

After editing the file to point to our environment, we push it into the VEBA appliance. I name it ‘vcconfig-hostmaint’ but you can name it whatever you want. To match the current 0.3 script, you should name it ‘vc-hostmaint-config’. If you match what’s in the script, you don’t have to rebuild any container images – the default container image will work. But there are many reasons why you would need to rebuild the container image: any time you want to improve the existing functions, or write your own, you will need to build your own container image. This post will continue, showing how to finish deploying by rebuilding the container image.

To create the secret file, remember you need to log in with faas-cli first; for a refresher, look at Part IV of this series.
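The secret creation itself is a one-liner – a sketch, assuming you are already logged in with faas-cli and that your edited secrets file is named vcconfig.json as described above:

# Create the secret the host maintenance function will read
faas-cli secret create vcconfig-hostmaint --from-file=vcconfig.json --tls-no-verify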

Now that we have our secrets file, we need to change our code to use it.

First we edit the first line of script.ps1 in the handler folder. We need to change the secret name to whatever we named it in the CLI – here, I change it to vcconfig-hostmaint.

Looking again at the stack.yml file, we have additional problems. We built a new secret, so we can change the secrets: section to point to vcconfig-hostmaint. Our gateway needs to point to our VEBA appliance. Then we need to worry about our image. Because we changed PowerShell code, we have to rebuild the container image that runs the code. Otherwise, the container that launches is the default container that ships with the VEBA appliance.


We sign up for a Docker account


Now we download and install Docker Desktop. The installation is simple; you can find the official installation documentation here.


After installation there’s a little whale icon in my system tray

I right-click and log in with the account I just created


Now when I right-click the whale, it shows that I’m signed in.

Now we edit the stack.yml file.  We make sure our gateway is pointing to our VEBA. We change the secrets to point to our new secret. And we change the image name – the account name gets changed to the Docker account that we just created.

Now we need to build, push, and deploy our new container image. The faas-cli function creation documentation shows us the commands we need to use.

First, we need to log in to Docker. Since I had no experience with Docker, it took me forever to figure out this seemingly simple task. I tried many different ways, but all failed with an Unauthorized error.

The simple answer is to make sure you’re logged into the Docker desktop

Then you issue a docker login command with no arguments. Just docker login.


We now issue the faas-cli build -f stack.yml command.

Multiple screens of information scroll past, but we end with a successful build.

Now we push the containers into the Docker registry with faas-cli push -f stack.yml

Now we deploy the container to our VEBA with faas-cli deploy -f stack.yml --tls-no-verify

PROTIP: Once you understand what the 3 commands do – build, push, deploy – you can use a nifty OpenFaaS shortcut command: faas-cli up --tls-no-verify
This command runs build, push, and deploy in sequence, automatically.
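Putting the whole cycle together – these are the same commands described above, run from the folder containing stack.yml:

# Log in to Docker Hub (uses the Docker Desktop session you just set up)
docker login

# Build the image, push it to your Docker Hub account, and deploy to the VEBA gateway
faas-cli build -f stack.yml
faas-cli push -f stack.yml
faas-cli deploy -f stack.yml --tls-no-verify   # or all three at once: faas-cli up --tls-no-verify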

Now we log into vCenter and look at the Alarm actions on a host. They are currently enabled.


After entering maintenance mode, the alarm actions have been disabled by our function


Now we exit maintenance mode and our function enables alarm actions. Success!


Finally, we want to verify that we did not break our other Python function, the one that tags a VM when it powers on. We check our test VM and see it has no tags.

After power on, a tag has been applied. Now we have both functions working!

We have now successfully deployed our PowerShell host maintenance function! In Part VIII, we will look at some troubleshooting techniques.


VMware Event Broker Appliance – Part IV – Deploying the First Sample Function

In Part III of this series, we created our vCenter tag and cloned the code repository from Github. In Part IV, we will deploy our function and watch it fire in response to a vCenter event. This post was updated 2020-03-07 with new screenshots for the VEBA 0.3 release.

The Python sample function lives at /examples/python/tagging in the repository. The sample function will cause a vCenter tag to be applied to a VM when the VM powers on.
First we want to modify vcconfig.toml

Here we need to change the server to our vCenter server, put in the correct username/password, and change the tag URN to match the URN we retrieved in Part III

To quickly review, one way to get a Tag’s URN is via the Get-Tag PowerCLI cmdlet


vcconfig.toml now looks like this

We continue with the OpenFaaS instructions in the getting started documentation.

When the correct password is used, we should get this:

If you get an error, a number of things could be the cause:

  • The VM might not be fully online; it can take a while for services to come up after initial boot, particularly in a homelab
  • You might be running into a hostname problem – you must set the OPENFAAS_URL to the exact name you used when you deployed the OVF. If you deployed as the FQDN, you have to use the FQDN here. If you deployed with the short name, you must use the short name here.
  • You are typing the incorrect credentials

Next, we want to create a secret named ‘vcconfig’ in faas-cli so we’re not keeping credentials in plaintext files.
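Taken together, the login and secret-creation steps look roughly like this. The hostname and password below are placeholders – use the values from your own deployment – and the vcconfig.toml path is the file from the sample folder:

# Point faas-cli at the VEBA appliance and log in
export OPENFAAS_URL=https://veba01.fqdn.com
faas-cli login --username admin --password 'YOUR_OPENFAAS_PASSWORD' --tls-no-verify

# Create the secret from the edited vcconfig.toml
faas-cli secret create vcconfig --from-file=vcconfig.toml --tls-no-verify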

Next, we edit stack.yml.  The gateway has to change to your VEBA VM – make sure it matches the name you deployed. We also need to look at the topic. The documentation warns us that if we’re trying to monitor a VM powered on event in a DRS cluster, the topic is actually DrsVmPoweredOnEvent. We change the topic: entry to match.

My changed stack.yml looks like this:

We now issue a faas-cli template pull command (only required for the first deployment), and then deploy the function with:
faas-cli deploy -f stack.yml --tls-no-verify


Now the moment of truth, time to test our function! The expected behavior is that when powered on, a VM gets the alert-power-operations tag assigned to it. First we check my test VM to see if it has any tags – it has none.

After we power on the VM, has the tag been assigned?

Success! Note that you may have to hit the refresh button to get the tag to appear in the GUI.

That’s it for your first sample function!

Since this product is open source, you can make changes to it. Even if you’re not a developer, a great way to contribute is to keep documentation up-to-date. In part V, we will explore the basics of using git to commit code back to the repository. Even if you’re not a developer, you can learn the basics of git.

VMware Event Broker Appliance – Part Ia – AWS EventBridge Deployment

Deploying VEBA for AWS EventBridge

Part I of this series covers deploying VEBA. As of the 0.3 VEBA release, we support 2 event processors – built-in OpenFaaS, and external AWS EventBridge. This post covers how to configure AWS EventBridge.

William Lam has a good EventBridge blog showing his setup, and the EventBridge documentation has detailed instructions. I won’t repeat all that content, but I will show you my setup.

First I created a new EventBridge event bus.

Then I created an event forwarding rule

It’s important to make sure you have a filter – I suggest using the following event pattern for testing. You can always change it later to fit whatever events you’re trying to respond to. But if you don’t filter, EventBridge will react to every single event that comes out of your vCenter. If you’re running a Prod vCenter, that’s going to be a LOT of traffic.

I picked these two events because they’re relatively low traffic – most customers aren’t popping their hosts in and out of maintenance mode thousands of times a day.
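For reference, a rule with that kind of narrow filter can also be created from the AWS CLI. This is a hypothetical sketch – the bus and rule names come from the ARN shown below, but the event-pattern keys (detail.subject) are an assumption about how the event router publishes vCenter event names, so verify against a real event before relying on it:

# Create a forwarding rule that only matches host maintenance-mode events
aws events put-rule \
  --name VEBA-event-forwarding \
  --event-bus-name SET-VMC-EventBus \
  --event-pattern '{"detail":{"subject":["EnteredMaintenanceModeEvent","ExitMaintenanceModeEvent"]}}'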

When you deploy VEBA with the EventBridge processor, you have to enter the following settings in the EventBridge section of the OVF deployment. Access key and secret are straightforward – the same IAM credentials that you’d create for programmatic access to any AWS service. Event Bus Name needs to match the event bus you created. The region is easily found by clicking on the region name in the AWS console. The ARN is just a copy-paste from the Rule page in the AWS console – just make sure you copy the Rule ARN and not the Event Bus ARN. In my case it is: arn:aws:events:us-east-1:##########:rule/SET-VMC-EventBus/VEBA-event-forwarding

When you finish deploying VEBA and it boots, you can look at the logfiles to see what’s happening with the Event Broker. Or obviously you can just put a host in maintenance mode and look in CloudWatch to see if a log appears.

To look on the VEBA, enable SSH by following the instructions in Part VIII, or just use the console of the VM.

Type kubectl get pods -A and look for the event router pod

Now type kubectl logs vmware-event-router-5dd9c8f858-7b9pk -n vmware --follow
You are doing the Kubernetes equivalent of tailing a logfile.

Near the top, you see the Event Router connecting to AWS EventBridge and pulling down the configured filters. You then see the EventRouter begin receiving vCenter events

Let’s put a host into maintenance mode and see what happens.

The event was intercepted and sent to EventBridge

Now we take the host out of maintenance mode and we also see the event logged

Let’s check CloudWatch and see if our events were logged – they were!

You can return to Part I of this series for more information on deployment, or move on to Part II to begin working on the prerequisites for deploying code samples to VEBA.


VMware Event Broker Appliance – Part I – Deployment

Introduction

I became aware of the VMware Event Broker Appliance Fling (VEBA) in December, 2019. The VEBA fling is open source code released by VMware which allows customers to easily create event-driven automation based on vCenter Server Events. You can think of it as a way to run scripts based on alarm events – but you’re not limited to only the alarm events exposed in the vCenter GUI. Instead, you have the ability to respond to ANY event in vCenter.

Did you know that an event fires when a user account is created? Or when an alarm is created or reconfigured? How about when a distributed virtual switch gets upgraded or when DRS migrates a VM?  There are more than 1,000 events that fire in vCenter;  you can trap the events and execute code in response using VEBA. Want to send an email and directly notify PagerDuty via API call for an event? It’s possible with VEBA. VEBA frees you from the constraints of the alarms GUI, both in terms of events that you can monitor as well as actions you can take.

VEBA is a client of the vSphere API, just like PowerCLI. VEBA connects to vCenter to listen for events. There is no configuration stored in the vCenter itself, and you can (and should!) use a read-only account to connect VEBA to your vCenter. It’s possible for more than one VEBA to listen to the same single vCenter the same way multiple users can interact via PowerCLI with the same vCenter.

For more details, make sure to check out the VMworld session replay “If This Then That” for vSphere- The Power of Event-Driven Automation (CODE1379UR)

Installing the VEBA – Updated on March 3, 2020 to show the 0.3 release

If you notice screenshots alternating between an appliance named veba01 and veba02, it’s because I alternate between the two as I work with multiple VEBA versions.

The instructions can be found on the VEBA Getting Started page.

First we download the OVF appliance from https://flings.vmware.com/vcenter-event-broker-appliance and deploy it to our cluster


VMC on AWS tip: As with all workload VMs, you can only deploy on the WorkloadDatastore

I learned from my first failed 0.1 VEBA deployment to read the instructions – it says the netmask should be in CIDR format, but I put in 255.255.255.0. In 0.3, the subnet mask/network prefix is a dropdown, eliminating this problem.

Pay attention to the DNS and NTP settings – they are space-separated in the OVF. The instructions have said ‘space separated’ since 0.2, but it’s important to enter them correctly here. NTP settings were new in 0.2.

Proxy support was new in 0.2. I don’t have a proxy server in the lab so I am not configuring it here.

These settings are all straightforward. Definitely disable TLS verification if you’re running in a homelab. One gotcha is passwords – I used blank passwords for the purposes of this screenshot, and the installer lets you move ahead with blank passwords. This resulted in a failure after the appliance booted because it couldn’t connect to vCenter. On the plus side, it led to my capturing how to troubleshoot failures, which is shown further down in this post.

PROTIP: If you’re brand new to this process, it’s easiest to get this working with an administrator account. It is best practice to use a read-only vCenter account. You can easily change this credential after everything is working by following the procedure at the bottom of this post – edit /root/config/event-router-config.json (prior to version 0.4 the file was /root/event-router/config.json) and restart the pod as shown in the directions.

New in 0.3, you can decide whether you want to use AWS EventBridge or OpenFaaS as your event processor.  EventBridge setup is covered in Part Ia of this series. For this setup, we choose OpenFaaS.

Since we’re not configuring EventBridge, we leave this section blank.

The POD CIDR network is an important setting new in 0.2. This network is how the containers inside the appliance communicate with each other. If the IP that you assign to the appliance is in the POD CIDR subnet, the containers will not come up and you will be unable to use the appliance. You must change either the POD CIDR address or the appliance’s IP address so they do not overlap.

This is what the VEBA looks like during first boot

If you end up with this, [IP] in brackets instead of your hostname, something has failed. Did you use a subnet mask instead of CIDR? Did you put comma-separated DNS instead of space-separated? Did you put in the incorrect gateway? If you enabled debugging at deploy time, you can look at /var/log/bootstrap-debug.log for detailed debug logs to help you pinpoint the error. If not, see what you can find in /var/log/bootstrap.log

There is an entire troubleshooting section for this series. When the pod wasn’t working despite successful deployment, I needed to determine why.

Enabling SSH on the appliance is covered in the troubleshooting section. You could also use the VM console without enabling SSH.

kubectl get pods -A lists out the available pods

The pod that is crashing is the event router pod.
kubectl logs vmware-event-router-5dd9c8f858-q57h6 -n vmware will show us the logs for the event router pod. We can see that the vCenter credentials are incorrect.

We need to edit the JSON config file for Event Router.

vi /root/config/event-router-config.json

Oops. The password is blank

Fix the password

Delete and recreate the pod secret

kubectl -n vmware delete secret event-router-config
kubectl -n vmware create secret generic event-router-config --from-file=event-router-config.json

Get the current pod name with
kubectl get pods -A

Delete the event router pod

kubectl -n vmware delete pod vmware-event-router-5dd9c8f858-zsfq5

The event router pod will automatically recreate itself.  Get the new name with
kubectl get pods -A

Now check out the logs with
kubectl logs vmware-event-router-5dd9c8f858-pcq2v -n vmware

We see the event router successfully connecting and beginning to receive events from vCenter

Now we have a VEBA appliance and need to configure it with functions to respond to events. Note that it may take some time for the appliance to respond to HTTP requests, particularly in a small homelab. It took about 4 minutes for it to fully come up after a successful boot in my heavily overloaded homelab. One additional gotcha: the name you input at OVF deployment time is the name encoded in the SSL certificate, so if you input “veba01” as your hostname, you cannot then reach it with https://veba01.fqdn.com.

In part II of this series, I will demonstrate how I configured my laptop with the prereqs to get the first sample function working in VEBA.