Homelab – 2012 to 2019 AD upgrade

I foolishly thought that I would quickly swap out my 2012 domain controllers with 2019 domain controllers, thus beginning a weeks-long saga. I have 2 DCs in my homelab, DC1 and DC2.

Built a new DC, joined it to the domain, promoted it to a DC (promotion ran AD prep for me, nice!), transferred the FSMO roles (all were on DC1), and all looked great! Then I demoted DC1, and all logins failed with ‘Domain Unavailable’.

Uh-oh.

Thankfully, I had my Synology backing up my FSMO role holder DC, so I restored it from scratch. I figured I might have missed something obvious, so I did it again. Same result.

Ran through all sorts of crazy debugging, including ntdsutil commands looking for old metadata to clean up. I found some old artifacts that I thought might have been causing the issue, cleaned them up, and repeated the process. Same result.

Several weeks later I realized what had happened. A failing UPS had taken down my Synology multiple times until I replaced the UPS a few days ago, and guess which VM I never restarted? The Enterprise CA. The CA caused all of this, or at least most of it. Even after I powered the CA back up, I was unable to cleanly transfer all of the FSMO roles. Everything but the Schema Master transferred cleanly, even though all of the roles had transferred cleanly while the CA was down. I had to seize the Schema Master role and manually delete DC1 from ADUC. Thankfully, current versions of AD do the metadata cleanup for you when you delete a DC from ADUC.

In a hilarious irony, I had specifically built the CA on a member server and not a domain controller to avoid upgrade problems.

In summary:

  1. When you don’t administer AD every day, you forget lots of things
  2. No AD upgrade is easy
  3. Make sure you have a domain controller backup before you start
  4. Turn on your CA
  5. Run repadmin /showrepl and dcdiag BEFORE you start messing with the domain
  6. Run repadmin /showrepl and dcdiag AFTER you add a domain controller and BEFORE you remove old domain controllers (see the example after this list)
  7. ntdsutil is like COBOL – old and verbose
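
For items 5 and 6, a minimal health check looks something like this, run from an elevated prompt on a domain controller (redirecting output to files so you can compare before and after is my habit, not a requirement):

repadmin /showrepl > repl-before.txt
repadmin /replsummary >> repl-before.txt
dcdiag /v > dcdiag-before.txt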

The Power of Community

I’ve both given to and received from the virtualization community over my career. I passed my first VCAP with help from the vBrownBags. I’ve delivered vBrownBag Tech Talks at VMworld. I’ve been part of the Chicago VMUG as a participant or VMware employee for as long as I can remember. I’ve lost count of how many times I’ve presented content to VMUGs. Community matters to me. The impact of community is immense, and you can’t predict what kind of positive impact community will have in the moment – you can only look back and connect the dots.

This year was different from all others for me in terms of community. I was awarded vExpert status in 2020, primarily because of the body of work I have had the privilege to generate this year, highlighted in my VEBA series. I’ve learned and done things I never imagined I’d be able to do. I wrote code that is running in a VMware open source product. That’s a crazy thought for a presales person. It wouldn’t have been possible without community.

Without William Lam’s encouragement in December 2019, I would have allowed my lack of development skills and impostor syndrome to stop me from even considering contributing to an open source project. Without his guidance, I would never have been able to make some of the code changes I made.

I would have been forever lost in an ocean of Git and Kubernetes without Michael Gasch’s willingness to spend time teaching me.

PK spent hours teaching me how to write a modern website, enabling me to contribute heavily to the VEBA website.

I wouldn’t have applied for vExpert without encouragement from Robert Guske and Vladimir Velikov.

I doubt she remembers, because it was just an honest comment, but Frankie Gold wrote in Slack “That [file format is] so confusing–which makes your blog post that much more valuable…” That really stuck with me – if this Golang wizard thinks I have a valuable contribution, then I must have something valuable to contribute.

I’ve been using Martin Dekov’s MTU fixer function to help myself learn Python.

Find your community. Contribute to your community. You help it grow as it helps you grow.

My remotely proctored VMware exam experience

After taking my AWS SAA certification exam via remote Pearson VUE proctoring, I wanted to audit the process from the VMware side to help validate that our customers are getting a good experience. I wasn’t sure which exam to register for, but since I’m a VMC on AWS SE, I decided to give the VMware Cloud on AWS Master Services Competency Specialist exam a shot. Fortunately, working with VMC on AWS every day for a year was very good prep for this exam, and I passed, earning the Master Specialist badge for VMware Cloud on AWS 2020.

For the most part, any Pearson VUE exam is the same – same testing engine, same blue background. I expected a similar experience to the AWS exam.

  • Prior to talking to a registration person, you have to complete a check-in process. After you log into the Pearson VUE site and indicate you’re ready to start your exam, you are taken to a check-in screen. You can input your cellphone number and VUE will text you a link, or you can just type the URL manually. You have to take front and back photos of your ID, a photo of your face, and a view of all 4 sides of your seating area. Once you’ve submitted those photos from your phone, you can continue checking in on the website. You could use your webcam and avoid the cellphone process altogether, but it would be tough to get all of the photos you need with a webcam.
  • You get checked in by a registration person, who is not the exam proctor. The registration person can see you on your webcam and provides instructions via chat or verbally.

  • They want your desk empty – pens, pencils, headsets, everything. The only things you should have on your desk are a mouse, keyboard, monitor, webcam, and a laptop if necessary. Unlike last time, the staffer didn’t care about my laptop being on the desk and didn’t question what my docking station was.

  • You’re going to have to use your webcam to show them around the room, so be prepared to take down a monitor-mounted one or to spin your laptop around.

  • If they see or hear somebody in the room with you, it’s an instant fail. Make sure you are in a room with a locked door.

  • Unplug any dual-or-greater monitor configuration before you get into the exam – only a single monitor is allowed. Also unplug any other monitor in the room; my homelab rack and its console monitor are in my office, and the extra monitor was flagged by Pearson as a problem.

  • There is no scratch paper, no dry-erase board, your only option is an electronic notepad built into the testing software. It wasn’t a big deal for this exam but I could see this being a problem for calculations and larger design problems, at least for me – I like to write things down.

  • Unlike my last exam, the proctors were immediately responsive to chat requests for help. I tested this multiple times and got quick responses each time.

  • The process was quite a bit smoother this time around, and it surely beats driving to an exam center.

  • Once you’re in the exam it feels pretty much like taking an exam at any other test center.

VMware Event Broker Appliance – Part XI – Changing options in the OVA installer

In Part X, we talked about building the VEBA OVA from source code. In this post, I will explain the change I made that required me to rebuild the appliance.

It was a relatively simple change. Although it’s best practice to keep SSH turned off, I deploy a LOT of VEBA appliances; I’m always doing some kind of testing to do my part to contribute to this open source project. I usually have to turn on SSH to do what I need to do with the appliance, so I wanted a way to have SSH automatically enabled.

This is a screenshot from the v0.4 appliance that has my change included – just a simple “Enable SSH” checkbox.

If you would like to check out the pull request, you can find it here. There were five files that needed to be changed. I am pasting screenshots from the PR on GitHub; the PR shows you all of the changes made to the code.

manual/photon.xml.template – This file defines all available properties in the OVA. I named my property ‘enable_ssh’.

test/deploy_veba_eventbridge_processor.sh
test/deploy_veba_openfaas_processor.sh

– These files are used for automated deployments of the appliance in either EventBridge or OpenFaaS mode. You can see the VEBA_NOPROXY line in the EventBridge file, where I deleted some inadvertent spacing that I introduced in a prior PR. The change for the SSH feature adds a default value of False for the new property, then a line of code to push the value into the OVF for deployment, sketched below.
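
A simplified, hypothetical sketch of that ovftool invocation (the OVA path and target URL are placeholders, not the actual script contents):

ENABLE_SSH=False   # default value added in the PR

ovftool --acceptAllEulas --skipManifestCheck \
  --prop:enable_ssh="${ENABLE_SSH}" \
  veba.ova 'vi://administrator@vsphere.local@vcenter.lab.local/Datacenter/host/Cluster'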

files/setup.sh – This file extracts the values the user entered into the OVA deployment and places them into variables for use by the rest of the appliance setup scripts.

files/setup-01-os.sh – There are 9 different shell scripts in the files folder that perform various configuration tasks when the appliance is deployed.

In the OS setup file, I removed the default code that stopped and disabled sshd. Instead, I check the ENABLE_SSH variable and only leave sshd running if the box was checked.
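
A minimal sketch of what that conditional can look like, assuming the variable is named ENABLE_SSH as described above (the actual file may differ in detail):

if [ "${ENABLE_SSH}" == "True" ]; then
  systemctl enable sshd
  systemctl start sshd
else
  systemctl stop sshd
  systemctl disable sshd
fi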

After I made all of the code changes, I then built the appliance as shown in Part X to test. Once everything worked, I filed the PR to get my changes incorporated into the product. Special thanks to William Lam for teaching me how this process works.

My remotely proctored AWS Certified Solution Architect – Associate exam experience

I recertified my AWS SAA certification this past Friday. I took the older SAA-C01 exam, which expires on June 30th; it has been replaced by the SAA-C02 exam. The SAA-C01 seemed harder than the SAA-C00 exam that I took 3 years ago; there seemed to be more speeds-and-feeds memorization than I remember the original exam focusing on.

Thank you to A Cloud Guru, the only study tool I used to pass the exam.

This is the first time I ever took a remotely proctored exam. A couple of points from this experience:

  • You get checked in by a registration person, who is not the exam proctor. The registration person can see your webcam, talks to you, and gives you instructions.
  • They’re not kidding when they say they want your desk empty – literally nothing but a keyboard and mouse. They even complained about my laptop being on the desk; it was docked, and it took a fair amount of convincing before they let me continue.
  • You’re going to have to use your webcam to show them around the room, so be prepared to take down a monitor-mounted one or to spin your laptop around.
  • If they see or hear somebody in the room with you, it’s an instant fail. Make sure you are in a room with a locked door.
  • Do yourself a favor and disconnect and unplug any dual-or-greater monitor configuration before you get into the exam.
  • There is no scratch paper, no dry-erase board, your only option is an electronic notepad built into the testing software. It wasn’t a big deal for this exam but I could see this being a big problem for subnetting on a CCNA exam, or calculations on a VCAP design exam.
  • I got the feeling that nobody was really minding the store. I had an issue and clicked on the “chat” button for help; it took somebody about 15 minutes to respond, and they said they would call my computer but never did.
  • Overall it wasn’t all that bad, and it surely was nice to not have to drive back and forth to a testing center. I would do it again for an exam where I wasn’t expecting to have to write notes.

Changing a VMC segment type from the Developer Center

Here’s a way to use the API Explorer to test out API calls. This is one technique for figuring out how an API works before you start writing code in your favorite language.

First I create a disconnected segment in my SDDC.

Then I go to the Developer Center in the VMC console, pick API Explorer, select the NSX VMC Policy API, and pick my SDDC from the dropdown.

Now I need a list of all segments – I find this in /infra/tier-1s/{tier-1-id}/segments

I provide the value ‘cgw’ for tier-1-id and click Execute.

It’s easiest to view the results by clicking the down arrow to download the resulting JSON file.

Inside the file I find the section containing my test segment ‘pkremer-api-test’.

        {
            "type": "DISCONNECTED",
            "subnets": [
                {
                    "gateway_address": "192.168.209.1/24",
                    "network": "192.168.209.0/24"
                }
            ],
            "connectivity_path": "/infra/tier-1s/cgw",
            "advanced_config": {
                "hybrid": false,
                "local_egress": false,
                "connectivity": "OFF"
            },
            "resource_type": "Segment",
            "id": "15d1e170-af67-11ea-9b05-2bf145bf35c8",
            "display_name": "pkremer-api-test",
            "path": "/infra/tier-1s/cgw/segments/15d1e170-af67-11ea-9b05-2bf145bf35c8",
            "relative_path": "15d1e170-af67-11ea-9b05-2bf145bf35c8",
            "parent_path": "/infra/tier-1s/cgw",
            "marked_for_delete": false,
            "_create_user": "pkremer@vmware.com",
            "_create_time": 1592266812761,
            "_last_modified_user": "pkremer@vmware.com",
            "_last_modified_time": 1592266812761,
            "_system_owned": false,
            "_protection": "NOT_PROTECTED",
            "_revision": 0
        }

Now I need to update the segment to routed, which I do by finding PATCH /infra/tier-1s/{tier-1-id}/segments/{segment-id}. I fill in the tier-1-id and segment-id values as shown (the segment-id is the “id” field from the JSON output above).

This is the code I pasted into the Segment box:

{
    "type": "ROUTED",
    "subnets": [
        {
            "gateway_address": "192.168.209.1/24",
            "network": "192.168.209.0/24"
        }
    ],
    "advanced_config": {
        "connectivity": "ON"
    },
    "display_name": "pkremer-api-test"
}

The segment is now a routed segment.
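
Outside the API Explorer, the same operation is a single REST call. Here is a rough curl equivalent, assuming a CSP API token and your SDDC’s NSX proxy URL (both placeholders here) and the JSON above saved as segment.json:

curl -X PATCH \
  -H "csp-auth-token: ${CSP_TOKEN}" \
  -H "Content-Type: application/json" \
  -d @segment.json \
  "${NSX_PROXY_URL}/policy/api/v1/infra/tier-1s/cgw/segments/15d1e170-af67-11ea-9b05-2bf145bf35c8"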

Per-zone DNS resolution for homelabs

One of the problems I’ve had with my homelab is that logging into my corporate VPN every day changes my DNS servers, so I cannot resolve homelab DNS. For the past 4+ years I’ve gotten past this using hosts file entries, which is quite annoying when you’re spinning up workloads dynamically.

I posted this question to the VMware homelab Slack channel, and Steve Tilkens came back with /private/etc/resolver on the Mac. He wrote:

Just create a file in that directory named whatever your lab domain name is (i.e. – “lab.local”) and the contents should contain the following:
nameserver 192.168.0.1
nameserver 192.168.0.2

This didn’t help me on Windows, but it immediately helped another employee.

But then I started Googling around for things like ‘/private/etc/resolver for Windows’, and somewhere I found somebody suggesting the Windows NRPT (Name Resolution Policy Table). The first hit on my search was a Scott Lowe blog post about using the resolver trick on a Mac, so if you want a detailed explanation of the Mac side, check it out.

Anyway, it took me about 10 seconds to open up the Local Group Policy Editor (gpedit.msc) on my laptop and configure it to resolve my AD domain via my homelab domain controllers. Years of searching over!
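
If you would rather not click through gpedit, the same NRPT rule can be created from an elevated PowerShell prompt. A quick sketch using my homelab domain and DNS server (swap in your own values):

Add-DnsClientNrptRule -Namespace ".ad.patrickkremer.com" -NameServers "192.168.203.10"
Get-DnsClientNrptRule   # verify the rule exists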

VMC on AWS – VPN, DNS Zones, TTLs

My customer reported an issue with DNS zones in VMC, so I needed to set it up in the lab to check the behavior. The DNS service in VMC allows you to specify DNS resolvers to forward requests to. By default, DNS is directed to 8.8.8.8 and 8.8.4.4. If you’re using Active Directory, you generally set the forwarders to your domain controllers. But some customers need more granular control over DNS forwarding. For example, you could set the default forwarders to domain controllers for vmware.com, but maybe you just acquired Pivotal, and their domain controllers are at pivotal.com. DNS zones allow you to direct any request for *.pivotal.com to a different set of DNS servers.

Step 1

First, I needed an IPsec VPN from my homelab into VMC. I run Ubiquiti gear at home. I decided to go with a policy-based VPN because my team’s VMC lab has many different subnets with lots of overlap with my home network. I went to the Networking & Security tab, Overview screen, which gave me the VPN public IP as well as my appliance subnet. All of the management components, including the DNS resolvers, sit in the appliance subnet, so I will need it available across the VPN. Not shown here is another subnet, 10.47.159.0/24, which contains jump hosts for our VMC lab.

I set up the VPN in the Networks section of the UniFi controller – you add a VPN network just like you add any other wired or wireless network. I add the appliance subnet 10.46.192.0/18 and my jump host subnet 10.47.159.0/24. The peer IP is the VMC VPN public IP, and the local WAN IP is my public IP at home.

I could not get SHA2 working and never figured out why. Since this was just a temporary lab scenario, I went with SHA1.

On the VMC side I created a policy-based VPN. I selected the Public Local IP address, which matches the 34.x.x.x VMC VPN IP shown above. The remote public IP is my public IP at home. Remote networks: 192.168.75.0/24, which contains my laptop, and 192.168.203.0/24, which contains my domain controllers. For local networks I added the appliance subnet 10.46.192.0/18 and the jump host subnet 10.47.159.0/24. They show up by their friendly names in the UI.

The VPN comes up.

Now I need to open an inbound firewall rule on the management gateway to allow my on-prem subnets to communicate with vCenter. I populate the Sources object with subnet 192.168.75.0/24 so my laptop can hit vCenter. I also set up a reverse rule (not shown) outbound from vCenter to that same group. This isn’t strictly necessary to get DNS zones to work, since zones can only be configured for the compute gateway and it’s the compute VMs that need them. But I wanted to reach vCenter over the VPN.

I create similar rules on the compute gateway to allow communication between my on-prem subnets and anything behind the compute gateway – best practice would be to lock down specific subnets and ports.

I try to ping vCenter at 10.46.224.4 from my laptop on 192.168.75.0/24, and it fails. I run a traceroute and see that my first hop is my VPN connection into VMware corporate. I run ‘route print’ on my laptop and see that the entire 10.0.0.0/8 is routed to the VPN connection.

This means I will either have to disconnect from the corporate VPN to hit 10.x IP addresses in VMC, or route around the VPN with a static route.

At an elevated command prompt, I run these commands:

route add 10.46.192.0 mask 255.255.192.0 192.168.75.1 metric 1 -p
route add 10.47.159.0 mask 255.255.255.0 192.168.75.1 metric 1 -p

This inserts two routes into my laptop’s route table. The -p flag makes the routes persistent, so they survive reboots.

Now when I run ‘route print’, I can see the persistent routes for my VMC appliance subnet and jump host subnet.

I can now ping vCenter by private IP from my laptop. I can also ping servers in the jump host subnet.

Now to create a DNS zone. I point it at one of my on-prem domain controllers; in production you would of course point to multiple domain controllers.

I flip back to DNS Services and edit the Compute Gateway forwarder. The existing DNS forwarders point to our own internal lab domain, and we don’t want to break that communication. What we do want is for queries destined for my homelab AD domain to be redirected to my homelab DNS servers. We add the zone to the FQDN Zones box and click Save.

Now we run a test – you can use nslookup, but I downloaded the BIND tools so I can use dig on Windows.

First, a dig against my homelab domain controller:

dig @192.168.203.10 vmctest88.ad.patrickkremer.com

Then against the VMC DNS server 10.46.192.12:

dig @10.46.192.12 vmctest88.ad.patrickkremer.com

The correct record appears. You can see the TTL next to the DNS entry at 60 seconds – the VMC DNS server will cache the entry for the TTL I configured on-prem. If I dig again, you can see the TTL counting down toward 0.

I do another lookup after the remaining 21 seconds expire and you can see a fresh record was pulled with a new TTL of 60 seconds.

Let’s make a change. I update vmctest88 to point to 192.168.203.88 instead of .188, and I update the TTL to 1 hour.

On-prem results:

VMC results:

This will be cached for 1 hour in VMC.

I switch it back to .188 with a TTL of 60 seconds on-prem, which is reflected instantly.

But in VMC, the query still returns the wrong .88 IP, with the TTL timer counting down from 3600 seconds (1 hour).

My customer had the same caching problem, except their cached TTL was 3 days and we couldn’t wait for it to fix itself. We needed to clear the DNS resolver cache, and to do that, we go to the API. A big thank you to my coworker Matty Courtney for helping me get this part working.

You could, of course, do this programmatically. But if consuming APIs in Python isn’t your thing, you can do it from the UI. Go to the Developer Center in the VMC console, then API explorer. Pick your Org and SDDC from the dropdowns.

Click on the NSX VMC Policy API

In the NSX VMC Policy API, find Policy > Networking > IP Management > DNS > Forwarder, then this POST operation on the tier-1 DNS forwarder:

Fill out the parameter values:
tier-1-id: cgw
action: clear_cache
enforcement_point: /infra/sites/default/enforcement-points/vmc-enforcementpoint

Click Execute

We see Status: 200, OK – success on the clear cache operation. We do another dig against the VMC DNS server, and even though we were still within the old 1-hour cache period, the cache has been cleared. The VMC DNS server pulls the latest record from my homelab, and we see the correct .188 IP with a new TTL of 60 seconds.
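
If you need to script the cache flush instead of clicking through the UI, the Execute button maps to a single POST. A rough curl sketch with the same parameter values as shown in the API Explorer above; the CSP token and NSX proxy URL are placeholders from your own Org and SDDC:

curl -X POST \
  -H "csp-auth-token: ${CSP_TOKEN}" \
  "${NSX_PROXY_URL}/policy/api/v1/infra/tier-1s/cgw/dns-forwarder?action=clear_cache&enforcement_point=/infra/sites/default/enforcement-points/vmc-enforcementpoint"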

AD authentication for vCenter in VMC on AWS

VMware has good documentation on setting up Hybrid Linked Mode in VMC, but the docs are a little confusing if all you want is Active Directory authentication into the VMC vCenter. This post shows how I configured AD authentication for a VMC on AWS vCenter.

Step 1

I first wanted to build a domain controller in the connected VPC, allowing AD communication across the ENI. If you already have a domain controller accessible via VPN or Direct Connect, you do not need to worry about this part of the configuration and can skip to Step 2. But I wanted to demonstrate AD communication across the ENI as part of this post. To figure out which EC2 subnet I needed my domain controller in, I looked at the Networking & Security Overview.

I created a Windows 2016 EC2 instance, gave it an IP of 172.20.0.249, and promoted it to a domain controller. My test domain was named poc.test. I needed to open the firewall to allow the management network in VMC to communicate with the domain controller. Best practice would obviously be to restrict communication to only Active Directory ports, but I opened it all up to make things simpler. The 0.0.0.0/0 rule for RDP allows domain controller access from the public internet – obviously not something you’d want in production, but this is just a temporary lab. The default outbound rule in EC2 allows everything, which I left in place.

I also needed to open the compute gateway firewall to allow bidirectional communication across the ENI, which I’ve done below.

Step 2

Once you have a domain controller available, you need to point the management gateway DNS to it. In this example I also pointed the compute gateway DNS to the domain controller.

Step 3

Even though you’re not setting up Hybrid Linked Mode, it’s a good idea to use some of the HLM troubleshooting tools to verify connectivity to the domain controller. I ran the 5 tests shown below against my DC’s IP, 172.20.0.249.

Step 4

Now we need to configure an identity source in the VMC vCenter. Log in as cloudadmin@vmc.local. You can find this under Menu > Administration, then Single Sign On > Configuration, then Identity Sources. Click Add to add an identity source.

Select Active Directory over LDAP in the Identity Source Type dropdown.

Fill out the identity source according to your Active Directory environment. In production you would want to specify a secondary LDAP server, and you would never use a Domain Admin account as the LDAP user.

Once the identity source is added, you will see it in the list.

Log out as cloudadmin@vmc.local and log in as a domain user.

If we enter the correct password, we receive this error. This is OK, as we have not granted any domain users access to our vCenter yet. All domain users are granted No Access by default.

Log back in as cloudadmin and grant privileges to a domain user. In our case we want to grant admin rights at the vCenter level, so we click on the vCenter object, then Permissions, then the plus sign to add a permission.

The AD domain should show up in the dropdown.

If you start typing in the User/Group line, a dropdown will auto-populate with matching AD objects. I pick Administrators.

Be careful here – you cannot grant the Administrator role in VMC, because you are not an administrator; only VMware support has full administrative access to an SDDC. Instead, grant the CloudAdmin role. Check Propagate to send the permission down the entire tree.

We now see the new permission in the Permissions list.
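
If you have to grant the same permission in many SDDCs, PowerCLI can script it. A hypothetical sketch; the vCenter FQDN and password are placeholders:

Connect-VIServer -Server vcenter.sddc-x-x.vmwarevmc.com -User cloudadmin@vmc.local -Password 'xxxx'
$root = Get-Folder -NoRecursion   # the top-level vCenter object
$role = Get-VIRole -Name CloudAdmin
New-VIPermission -Entity $root -Principal 'POC\Administrators' -Role $role -Propagate:$true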

Now log off as cloudadmin, and log in as the AD user.

Success! You can now grant permissions to Active Directory users.

VMware Event Broker Appliance – Part X – Building the Appliance OVA from source code

In Part IX of this series, we deployed the datastore usage sample function into our appliance. In this post, we discuss how to build the appliance OVA from source code. This blog post would not have been possible without William Lam spending his valuable time teaching the VEBA team about the build process.

The VEBA appliance is built periodically, whenever enough features have been added to warrant a release. However, changes are committed frequently to the development branch. Sometimes you want to deploy a new feature but don’t want to wait for a new release. Or, in my case, I wanted to make a change to the options shown while deploying the OVA, which requires rebuilding the appliance. Everything you need to package the appliance for use is available in the event broker repository.

Step 1 – Build machine

First, you need a build machine. You cannot use Windows for this task, and I run a Windows laptop, so I ended up building an Ubuntu build server in my homelab. Here are all of the packages I added in order to be able to build an OVA.

#Git
apt install git-all

#Unzip
apt install unzip

# OpenFaas command line utility faas-cli
curl -sSL https://cli.openfaas.com | sudo sh

#Packer utility from Hashicorp
wget https://releases.hashicorp.com/packer/1.5.6/packer_1.5.6_linux_amd64.zip
unzip packer_1.5.6_linux_amd64.zip -d /usr/local/bin

#Download ovftool from My VMware
#https://my.vmware.com/group/vmware/get-download?downloadGroup=OVFTOOL440
# I used WinSCP to copy the file to the Linux VM. I’m beginning to see why lots of
# developers like Macs

chmod 744 VMware-ovftool-4.4.0-15722219-lin.x86_64.bundle
./VMware-ovftool-4.4.0-15722219-lin.x86_64.bundle

#Install PowerShell 7.0
# https://docs.microsoft.com/en-us/powershell/scripting/install/installing-powershell-core-on-linux?view=powershell-7
# The documentation tells you to do the following commands:
wget -q https://packages.microsoft.com/config/ubuntu/18.04/packages-microsoft-prod.deb
dpkg -i packages-microsoft-prod.deb
apt-get update
add-apt-repository universe
apt-get install -y powershell

#But the commands result in this error:
#Reading package lists… Done
#Building dependency tree
#Reading state information… Done
#Some packages could not be installed. This may mean that you have
#requested an impossible situation or if you are using the unstable
#distribution that some required packages have not yet been created
#or been moved out of Incoming.
#The following information may help to resolve the situation:

#The following packages have unmet dependencies:
#powershell : Depends: libssl1.0.0 but it is not installable
#Depends: libicu60 but it is not installable
#E: Unable to correct problems, you have held broken packages.

# This problem is fixed in RC https://github.com/PowerShell/PowerShell/releases/tag/v7.0.0-rc.1
# Download the release candidate code, extract it, and add a path to $PATH

# Install PowerCLI
install-module vmware.powercli

# Configure Git
root@build01:~# git config --global user.name "Patrick Kremer"
root@build01:~# git config --global user.email "pkremer@vmware.com"

# Configure git password cache for github
# https://help.github.com/en/github/using-git/caching-your-github-password-in-git

git config --global credential.helper cache
git config --global credential.helper 'cache --timeout=86400'

# Clone my fork, add the upstream repo
git clone https://github.com/kremerpatrick/vcenter-event-broker-appliance.git
git remote add upstream https://github.com/vmware-samples/vcenter-event-broker-appliance.git

Step 2 – Clone the repo

You need to clone the development repo. If you are not familiar with git, take a look at Part V. If you’ve been following the examples and have the code, but just need to make sure you have the latest copy of the development branch, look at Part VI.

Step 3 – Build an ESXi host

If you’ve still got a 6.7 host around, you’re in luck, because as of the date of this blog post, you cannot build the VEBA appliance against a 7.0 host. The build process relies on VNC, which was removed from ESXi in 7.0. My lab is on 7.0, so I had to build a nested ESXi host. I know for a fact that I’ve gotten the native MAC learning feature working in my lab, but I could not get it to work nesting 6.7 inside 7.0. I’m not sure why it wouldn’t work, so I ended up reverting to the tried-and-true promiscuous mode for the outer portgroup, with an untagged portgroup for the inner one. In this screenshot, esx03 is a VM running on physical hosts in cluster CL1, but I added it as a host outside the cluster in vCenter.

Step 4 – Edit JSON files

In the root of the vcenter-event-broker-appliance directory are photon.json and photon-builder.json. You must edit them to match your environment.

root@build01:/var/git/vcenter-event-broker-appliance# cat photon-builder.json
{
  "builder_host": "192.168.30.14",
  "builder_host_username": "root",
  "builder_host_password": "VMware1!",
  "builder_host_datastore": "sm-vsanDatastore",
  "builder_host_portgroup": "VM Network"
}

builder_host is the nested ESXi 6.7 host. Make sure to enter the correct datastore; it’s easy to make a typo, so consider copy-pasting the value. You need to keep the portgroup name “VM Network”, so put a portgroup named “VM Network” on your nested ESXi host. Otherwise, your own portgroup name will be built into the appliance, and the automation scripts in the /test folder will break because they expect “VM Network” to be in the appliance.

Now look at the top of photon.json. Adjust the file to point to the vCenter managing your nested ESXi host.

root@build01:/var/git/vcenter-event-broker-appliance# more photon.json
{
  "variables": {
    "veba_ovf_template": "photon.xml.template",
    "ovftool_deploy_vcenter": "192.168.30.200",
    "ovftool_deploy_vcenter_username": "administrator@vsphere.local",
    "ovftool_deploy_vcenter_password": "VMware1!"
  },

You can also look at photon-version.json – unless you work for VMware, you probably won’t be releasing VEBA, so you won’t really need to adjust this file. But it’s important to note that the output of the build process looks like an official build. If you build the OVA straight out of the development branch, you’ll end up with an appliance name like vCenter_Event_Broker_Appliance_v0.4.0.ova – this of course does not match the actual release v0.4.0, so don’t let the binaries on your machine confuse you.

root@build01:/var/git/vcenter-event-broker-appliance# cat photon-version.json
{
  "version": "0.4.0",
  "description": "Photon Build for vCenter Event Broker Appliance",
  "vm_name": "vCenter_Event_Broker_Appliance",
  "iso_checksum": "f6619bcff94cef63d0d6d7ead7dd3878816ebfa6a1ef5717175bb0d08d4ccc719e4ec7daa7db3c5dc07ea3547fc24412b4dc6827a4ac332ada9d5bfc842c4229",
  "iso_checksum_type": "sha512",
  "iso_url": "http://dl.bintray.com/vmware/photon/3.0/Rev2/iso/Update1/photon-3.0-a0f216d.iso",
  "numvcpus": "2",
  "ramsize": "8192",
  "guest_username": "root",
  "guest_password": "##FILL-IN-SECURE-PASSWORD##"
}

We start the build with build.sh:

build.sh release

The first build will take some time because it has to download the Photon ISO to your system. The ISO is cached, so future builds aren’t as slow.

The script will also sit for some time on the ‘Starting HTTP server’ message. If you look at your nested host, you will eventually see a disk being built.

This is what is running behind the HTTP server – the build script has created a PXE boot host.

The VM will eventually start booting.

If you want to see the PXE boot happening live, you will need to quickly establish a console session on the VM as it boots. You will see packer typing PXE boot commands over VNC. The VM will PXE boot, run commands, then reboot into Photon and end up at a command prompt.

The script completes the rest of the configuration via SSH.

The build will progress until completion.

The OVA file will be in the output-vmware-iso folder. You can see here that I renamed it with my initials so I don’t confuse it with the official 0.4.0 OVA.

root@build01:/var/git/vcenter-event-broker-appliance/output-vmware-iso# ls -al
total 1232992
drwxr-xr-x 2 root root 4096 May 15 23:49 .
drwxr-xr-x 16 root root 4096 May 18 20:43 ..
-rw------- 1 root root 1262570496 May 15 23:44 vCenter_Event_Broker_Appliance_v0.4.0-pk.ova

That’s it for building VEBA. In the next post, we will look at the underlying change that made it necessary for me to rebuild the OVA – how to change the options in the deployment GUI.