VMware Event Broker Appliance – Part XII – Advanced Function Troubleshooting (OpenFaaS)

In Part XI, we discussed changing configuration options in the OVA UI. In this post, we will discuss advanced function troubleshooting.

A VEBA bug report came in saying “Hostmaint-alarms fails on hosts where the cluster name contains a hyphen with a preceeding space”. For example, if you have a cluster named CL-2, the function works. But if your cluster is named CL – 2, the function fails. The bug report states the error message is “/root/function/script.ps1 : Cannot process argument because the value of argument “name” is not valid. Change the value of the “name” argument and run the operation again.”

One way to test this is to actually create a cluster with the name “CL – 2” and enter/exit maintenance mode. This grows tiresome when you’re actively debugging a problem – you don’t want to be continually putting a host in and out of maintenance mode.

Another way to do this is to first use the echo function to intercept the payload from the event router in VEBA. The echo function will drop the JSON into the pod’s logfile.

First, deploy the echo function to your VEBA. You will need to change the gateway to match your VEBA. If you open up the current stack.yml file for the host maintenance function, you find the topics are EnteredMaintenanceModeEvent and ExitMaintenanceModeEvent. These are the topics we want in our echo function.

Deploy the echo function with:

faas-cli deploy -f stack.yml –tls-no-verify

Look at Part IV if you need a refresher on deploying functions.

Once the function is deployed, we need to tail (a.k.a. follow in Kubernetes) the echo function logfile. Look at Part VIII for details on tailing logfiles in Kubernetes

kubectl get pods -A
kubectl logs -n openfaas-fn veba-echo-5dcf46b9c-k7v2s --follow

Then do a maintenance mode operation on a host – in this example, I take a host out of maintenance mode. The logfile contains JSON that I can easily copy and paste into VS Code

2020/09/09 23:29:36 Forking fprocess.
2020/09/09 23:29:36 Query
2020/09/09 23:29:36 Path  /
2020/09/09 23:29:36 Duration: 0.054708 seconds
{"id":"3dd3d87f-9d10-408f-8a0f-41bb74faad08","source":"https://vc01.ad.patrickkremer.com/sdk","specversion":"1.0","type":"com.vmware.event.router/event","subject":"ExitMaintenanceModeEvent","time":"2020-09-09T23:29:34.941977022Z","data":{"Key":334033,"ChainId":334030,"CreatedTime":"2020-09-09T23:29:34.911981Z","UserName":"VSPHERE.LOCAL\\Administrator","Datacenter":{"Name":"HomeLab","Datacenter":{"Type":"Datacenter","Value":"datacenter-2"}},"ComputeResource":{"Name":"CL - 2","ComputeResource":{"Type":"ClusterComputeResource","Value":"domain-c6240"}},"Host":{"Name":"esx04.ad.patrickkremer.com","Host":{"Type":"HostSystem","Value":"host-6244"}},"Vm":null,"Ds":null,"Net":null,"Dvs":null,"FullFormattedMessage":"Host esx04.ad.patrickkremer.com in HomeLab has exited maintenance mode","ChangeTag":""},"datacontenttype":"application/json"}

It is unformatted and messy when you paste it, but save it as a JSON file, I named mine test.json.

Then you can run the Document Formatter with Shift-Alt-F. The result is a much more readable JSON file for you!

Once you have your JSON, you can tail the logfile for the hostmaint function. This obviously assumes you already have the hostmaint function installed. If not, check out Part VII.

kubectl get pods -A
kubectl logs -n openfaas-fn  powercli-entermaint-74bfbd7688-w6t6x  --follow

Now log into the OpenFaaS UI at https://your-veba-URL

Click on the powercli-entermaint function, then click on JSON, and paste the JSON into the Request Body. I am pasting the JSON that includes the space-hyphen problem. Click Invoke.

You should get a success 200 code message at the bottom.

Now look at the logfile that we tailed above – the error message appears. We can now easily troubleshoot the JSON by making changes right in the OpenFaaS UI

All I had to do to get the function working is change the cluster name CL – 2 to CL-2 in the JSON.

When I click Invoke again, the function succeeds. Now I know I can reproduce the issue by adding space-hyphen, and I can fix the issue by removing it.

The issue is that PowerShell is interpreting the space-hyphen as a command line argument. The JSON needs to be treated as a single object and not parsed. PowerShell handles this by enclosing arguments in double quotes, so we know we somehow need to get the JSON argument passed in double quotes.

To figure out why it’s currently failing, we look at the function’s Dockerfile.

It tells us that powershell is invoked as ‘pwsh -command’

The fprocess variable tells us how the arguments are invoked – xargs will take whatever gets piped to it and pass it as an argument to the script.

Thank you to PK for showing me another file, template.yml, which also contains an fprocess variable. This overrides the fprocess variable in the Dockerfile and is the location where we actually need to make a change.

But how do we figure out what to change it to?

This command gets us a BASH shell on the pod running the maintenance function

kubectl get pods -A
kubectl exec -i -t -n openfaas-fn powercli-entermaint-74bfbd7688-w6t6x  -- /bin/bash

You can see the first few lines of the maintenance script running inside the pod

Now we want to try manually executing the function inside the pod. First I need the JSON file inside the pod. Oops, no text editor in this pod. curl is there, so if you can put the JSON file on a webserver accessible to the appliance, you can save it inside the pod.

In my case I decided I wanted vi so I could easily edit the file.

tdnf install vim

I open a new file named 1.json in /root and paste my JSON file into it, then save. This version of the file contains the space-hyphen problem.

I directly execute the function in the pod and get the same error that was reported in the bug.

cat 1.json | xargs pwsh ./function/script.ps1

I copy 1.json to 2.json and change the cluster name from CL – 2 to CL-2. This time it succeeds

cat 2.json | xargs pwsh ./function/script.ps1

Now I have reproduced the problem inside the pod and can work on a fix, which involved a bit of trial and error working with all of the options in xargs. I arrived at this command

 xargs -0 -I '{}' pwsh ./function/script.ps1 "'{}'"

The -0 switch ignores blank spaces and problematic characters like newlines. the -I switch gives me the ability to dictate exactly how my argument gets passed. By default xargs just drops the argument in at the end of the line. But in this case, it will first pass double quotes, then the JSON argument, then end with double quotes. And we get what we want – a JSON argument enclosed in double quotes.

Both versions of the JSON file succeed in the pod.

Then I make the change in template.yml. I have to rebuild the function to test it, look at Part VII for details on how to do this. But it works and I can then submit a pull request to fix this in the main VEBA repo.

That’s all for this post. In Part XIII, we look at deploying a function in a new language that I have never worked with – Go!

2 comments

Leave a Reply

Your email address will not be published. Required fields are marked *