It is quite apparent to anybody following me on Twitter which side of the political spectrum I fall on. The reason for mentioning politics in this post will be revealed in just a few paragraphs.
My technical skills have undergone a significant shift over the past year. I’ve done things I couldn’t have imagined doing last year because I started helping with our open source VEBA Fling. I’ve built appliances, I’ve learned the basics of git, and my own code is running inside VEBA. This was not a core part of my job; it was all done outside of work hours.
A couple of months ago, I became aware of another Fling, the Python Client for VMware Cloud on AWS. I started playing with it because I really did not understand how APIs work, and the Fling masks the API implementation. It’s essentially a command-line interface into VMware Cloud on AWS. The only reason I had any idea what I was doing was the skills I gained working with VEBA. I understood the basics of git and the basics of an open source project, so it was easy for me to grab a copy of the code and play with the tool.
In September, I came across a customer that was having problems connecting a VPC to their VMC on AWS SDDC, and I learned how to troubleshoot this issue via the VMC API, resulting in this post. Then I thought about how much easier it would have been if I could have just run a command in the Python Client for VMware Cloud on AWS. So I started looking at the Python code and copy-pasted my way to this pull request – I added 2 more commands letting anybody troubleshoot this problem from the command line. Once somebody shows you how to do one API call in Python, it’s a lot easier to figure out how to make other calls, and the Python Client for VMC is full of dozens of API calls.
Now that I had written those commands, API calls weren’t quite as mystifying. I started taking an introduction to Python course so I could understand the code instead of just blindly copy-pasting existing code.
Around the same time, mid-September, Nico Vibert mentioned a Fling idea that I was interested in, a bit of an evolution of his Python Client for VMC Fling. So I thought I’d start working on it and see what I could learn. I read a little Python, wrote a little Python, but then I got a little stuck, so I took some more of my online Python course.
I returned to this project at the beginning of October. I wrote a little Python, then read about it, took another course module, then wrote a little more. Repeat.
This is a screenshot of my contributions in the VMware private git instance. Over the course of 3 weeks I amassed dozens of commits as we built out this Fling.
Now to the intersection of professional and personal. If you’re a Republican, the rest of this post is about how I used my skills to fight you, so you may not care to read further. I am the IT Director for the Democratic Party in my county. Due to COVID-19, vote-by-mail utilization has exploded. The County Clerk publishes a daily report on the status of vote by mail – who has requested a ballot, when they requested it, when they returned it, and a status code. Even though voter information is public record and this entire file is freely downloadable on the public internet, I have blurred out the voter names in this screenshot.
What political parties do during an election is something called ballot chase – calling voters who have had ballots rejected, and calling voters who have not yet voted to ensure they vote by election day.
This PDF is a worthless data source for calling voters. I could have people wade through the list and call voters by hand, but that’s not a particularly efficient or scalable process. The formatting of this PDF is such that you can’t copy-paste it as there are no field delimiters. The data is effectively locked away.
And then an idea formed. Could I use Python? I started looking around and found a PDF module, PyPDF2. Through extensive Googling and manipulation of code samples, I found that I could iterate through every row of the PDF and extract the text. And much like working on the Fling, I read a little code, wrote a little code, and spent some time learning more about Python.
For two weeks I’ve worked many hours every evening and managed to make incremental improvements on a script to process the PDF. The first iteration could only extract text. The next one tried to parse out rows, which was fun given that there were no field delimiters in the file. Evening by evening I added a little more functionality. At this point, the script pulls the latest PDF down from the clerk’s office, parses it into a CSV, searches another file to find phone number matches, calculates status code percentages, and emails the file out to the people who need it. Plus we can load it into our manual dialer so people can call voters.
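To give a feel for the parsing and status-percentage steps, here is a minimal sketch rather than the actual script, which stays private. The row layout is an assumption made up for illustration: a name, a request date, an optional return date, and a three-letter status code, all run together with no delimiters. The names `ROW_PATTERN`, `parse_rows`, `rows_to_csv`, and `status_percentages` are hypothetical as well. The idea is that the dates act as anchors for a regular expression even though nothing separates the fields.

```python
import csv
import io
import re
from collections import Counter

# Assumed (hypothetical) row layout with no delimiters, for example:
#   "DOE, JANE09/24/202010/05/2020ACC"
# name, request date, optional return date, three-letter status code.
ROW_PATTERN = re.compile(
    r"^(?P<name>.+?)"
    r"(?P<requested>\d{2}/\d{2}/\d{4})"
    r"(?P<returned>\d{2}/\d{2}/\d{4})?"
    r"(?P<status>[A-Z]{3})$"
)

FIELDS = ["name", "requested", "returned", "status"]


def parse_rows(lines):
    """Turn extracted text lines into dicts, skipping lines that do not match."""
    rows = []
    for line in lines:
        match = ROW_PATTERN.match(line.strip())
        if match is None:
            continue  # page headers, footers, blank lines
        row = match.groupdict()
        row["returned"] = row["returned"] or ""  # ballot not returned yet
        rows.append(row)
    return rows


def rows_to_csv(rows):
    """Serialize the parsed rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def status_percentages(rows):
    """Return the share of each status code, as a percentage of all rows."""
    counts = Counter(row["status"] for row in rows)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {status: round(100 * n / total, 1) for status, n in counts.items()}
```

Calling `parse_rows` on the extracted text lines and passing the result to `rows_to_csv` gives a file a dialer or spreadsheet can load. A real report would need a pattern tuned to its actual columns.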
The first iteration of this script used a horribly inefficient search because I had no idea what I was doing. Then I discovered the pandas library and cut the processing down from 2 hours to 10 minutes. Now I have another tool in my Python toolbox that I can bring back to my work at VMware.
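The post does not say which pandas feature produced the speedup, so the following is only a sketch of one common pattern: replacing a row-by-row search with a single `merge`. The column names (`voter_id`, `status`, `phone`) and both function names are hypothetical, and the matching key in the real data may well be something else, such as name plus address.

```python
import pandas as pd


def match_phones_slow(ballots, phones):
    """Nested loops: compare every ballot row against every phone row."""
    matched = []
    for _, ballot in ballots.iterrows():
        for _, contact in phones.iterrows():
            if ballot["voter_id"] == contact["voter_id"]:
                matched.append({**ballot.to_dict(), "phone": contact["phone"]})
    return pd.DataFrame(matched)


def match_phones_fast(ballots, phones):
    """One join on the shared key; pandas does the matching internally."""
    return ballots.merge(phones, on="voter_id", how="inner")


# Tiny made-up data set, for illustration only.
ballots = pd.DataFrame(
    {
        "voter_id": ["V001", "V002", "V003", "V004"],
        "status": ["ACC", "REQ", "REJ", "REQ"],
    }
)
phones = pd.DataFrame(
    {
        "voter_id": ["V002", "V004", "V005"],
        "phone": ["555-0102", "555-0104", "555-0105"],
    }
)

callable_voters = match_phones_fast(ballots, phones)
```

Both functions return the same matches on this sample. The nested loop does work proportional to the product of the two row counts, which is the kind of cost that can turn minutes into hours on a full voter file.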
I would love to share the code publicly but have no interest in my work helping the Republican party, so it remains in a private repo. Any Republican is just as free as I am to donate their time to their party.