Computer testing vendor Pearson Vue suffered a massive outage this past week – at least most people would call it an outage. Pearson Vue’s spin team tried to say their systems were 100% up, only slow, but countless posts online contradict this.
The issues were first acknowledged on the company’s Facebook page.
A second generic post followed on April 24th.
I first learned of this outage when I walked into a Vue testing center for an exam on April 24, only to discover that they were unable to deliver because Vue’s servers were not accessible. The center called in to Vue, and customer service said all their systems were frozen and nothing could be done.
Pearson Vue put up another April 24th post suggesting that users try scheduling during non-peak hours.
On April 25th came the first of many outright lies posted by Pearson Vue.
This leads you to believe the system is up but slow. This was not the case. I tried many times to log in without success, as did others such as this Facebook poster.
I called multiple times, only to be told by customer service that they could not log in. This happened to people worldwide; here are a few of the many posts on Facebook.
Testing centers were not able to deliver exams, either.
Later in the day on April 25th came a post with another outright lie, saying “our systems are operational, just not optimal.”
That post prompted me to post the following, which was not replied to or acknowledged in any way.
On April 26th, a series of posts came out saying that engineers had found the problem and they were bringing the system back to expected performance levels.
A Facebook post directly under the above message shows a user who still can’t schedule an exam using customer service – the timestamp on this is April 28th, 9:30AM CDT.
On April 28th at 10:30 AM CDT, Pearson Vue had the audacity to ask users to stress test the system for them.
The user impact of this outage has been massive. For me it was more of an inconvenience, but for others there were significant impacts in time, expense, and even their ability to work.
Here is one Facebook post from a user who has no Pearson Vue facility in their country. They have to get a visa to leave the country to sit an exam. In order to get a visa, they have to make an appointment with their embassy. Once they get their appointment, they have to register for the exam and bring printed confirmation. Unable to register for over a week, this user loses the embassy appointment.
I know for a fact that I saw dozens more posts with similar problems – physicians unable to go to board exams, nurses unable to work because of results delays. I wish I had thought to screencap more while this was going on, but I didn’t. It appears that those posts were either eaten up by Facebook (yeah right) or deleted by Pearson (likely, but can never be proven). At least one user wrote a post confirming post removal. None of my posts were deleted.
As an IT professional, I find this outage appalling. The company states this was started by an upgrade. Every place I’ve ever worked upgrades during off hours and rolls back on failure. Pearson deployed a faulty upgrade, then forced its users to pay the price while programmers scrambled desperately to fix their poorly written code. Pearson Vue’s suggestion that they carefully planned and tested their upgrades is nonsense. A proper load test reveals these kinds of failures. Their post from today ‘inviting’ us to load test their fixed system also points to the fact that they are unable or unwilling to test their own systems.
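To show how little it takes to catch this kind of failure before release, here is a minimal load-test sketch in Python. I have no knowledge of Pearson Vue’s actual systems, so everything here is an assumption for illustration: `run_load_test`, `fake_scheduler`, and the failure pattern are hypothetical names and behavior, and a real test would call a staging copy of the scheduling service instead of a stub.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_load_test(send_request, total_requests, concurrency):
    """Call send_request(i) total_requests times using `concurrency` threads.

    send_request is a hypothetical stand-in for one scheduling call: it
    returns normally on success and raises an exception on failure.
    """
    def timed_call(i):
        start = time.perf_counter()
        try:
            send_request(i)
            ok = True
        except Exception:
            ok = False
        return ok, time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed_call, range(total_requests)))

    failures = sum(1 for ok, _ in results if not ok)
    return {
        "requests": total_requests,
        "failures": failures,
        "error_rate": failures / total_requests,
        "max_latency_s": max(elapsed for _, elapsed in results),
    }


def fake_scheduler(i):
    # Hypothetical stub: pretend every fourth login attempt times out.
    if i % 4 == 0:
        raise RuntimeError("login timed out")


stats = run_load_test(fake_scheduler, total_requests=100, concurrency=10)
print(stats)
```

A release gate can then be as simple as refusing to ship when `error_rate` or `max_latency_s` crosses a threshold agreed on before the upgrade.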
The fact that this upgrade caused a global outage for both scheduling and test delivery demonstrates critical failures of their architecture. Are the same webservers used for scheduling also used for exam delivery? Could a breach of vue.com then result in the theft of exam content? Or instead are they separate servers connected to the same backend database? In any event, their architecture is an abomination. The global failure to deliver exams points to only two possibilities. Do all global Vue delivery centers connect to the same datacenter, meaning the ability to deliver exams globally relies on a single point of failure? If so, this is a catastrophic design flaw. If they do have datacenter redundancy, then they deployed their upgrade across the entire system at the same time. This demonstrates atrocious planning. Why would you risk multiple datacenters with the same upgrade?
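The alternative to upgrading everything at once is a staged rollout: upgrade one datacenter, check its health, and only then move to the next. The sketch below is a rough illustration under assumed names, not a description of how Pearson Vue deploys. The datacenter names, the `deploy`, `is_healthy`, and `rollback` callbacks, and the simulated failure in `dc-b` are all hypothetical.

```python
def staged_rollout(datacenters, deploy, is_healthy, rollback):
    """Upgrade datacenters one at a time, halting at the first unhealthy one.

    The failed datacenter is rolled back, and every datacenter after it is
    left untouched on the old version.
    """
    upgraded = []
    for index, dc in enumerate(datacenters):
        deploy(dc)
        if is_healthy(dc):
            upgraded.append(dc)
            continue
        rollback(dc)
        return {
            "upgraded": upgraded,
            "rolled_back": dc,
            "untouched": list(datacenters[index + 1:]),
        }
    return {"upgraded": upgraded, "rolled_back": None, "untouched": []}


# Hypothetical fleet: three datacenters, all starting on the old version.
versions = {"dc-a": "old", "dc-b": "old", "dc-c": "old"}


def deploy(dc):
    versions[dc] = "new"


def rollback(dc):
    versions[dc] = "old"


def is_healthy(dc):
    # Assumption for the demo: the upgrade misbehaves only in dc-b.
    return dc != "dc-b"


outcome = staged_rollout(["dc-a", "dc-b", "dc-c"], deploy, is_healthy, rollback)
print(outcome)
print(versions)
```

In this sketch the bad upgrade never reaches `dc-c`, so at least part of the fleet keeps delivering exams on the old version while the problem is investigated.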
Pearson Vue is a billion-dollar, Fortune 500 corporation. This kind of outage is both unacceptable and inexcusable. Considering the power that Vue wields over people’s careers, it’s frightening to witness the depth of ineptitude demonstrated in this disaster.