Using snapshots to back up SQL under heavy IOPS loads

I see this problem come up frequently in the field – you’ve virtualized your SQL Server and you’re trying to back it up with snapshot-based backup software. In many cases, the SQL Server is under such a heavy write load that it’s impossible to commit the snapshot after the backup is taken – the consolidation simply can’t catch up with the incoming IO. You end up having to take a SQL outage to stop IO long enough to get the snapshot committed.

Here’s one strategy for setting up your virtual SQL Servers to avoid this problem altogether. It uses a disk mode called independent persistent. An independent persistent disk is excluded from snapshots – all data written to it is committed immediately, even while a snapshot is active on the VM. By placing the SQL datafile and logfile drives in independent persistent mode, they are never captured in a snapshot, which eliminates the problem of committing a huge post-backup snapshot.
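If you drive this through the vSphere API instead of the client, flipping an existing disk to independent persistent is a single reconfigure operation. Here’s a minimal sketch, assuming the pyvmomi SDK and a vim.VirtualMachine object you’ve already looked up however you normally do – the disk label is a placeholder, and note the mode can’t be changed while the VM has snapshots:

```python
from pyVmomi import vim


def set_independent_persistent(vm, disk_label):
    """Flip an existing virtual disk to independent persistent mode.

    vm is a pyvmomi vim.VirtualMachine. The VM must have no snapshots
    (typically do this with the VM powered off) or the reconfigure fails.
    """
    for dev in vm.config.hardware.device:
        if (isinstance(dev, vim.vm.device.VirtualDisk)
                and dev.deviceInfo.label == disk_label):
            change = vim.vm.device.VirtualDeviceSpec()
            change.operation = vim.vm.device.VirtualDeviceSpec.Operation.edit
            change.device = dev
            change.device.backing.diskMode = 'independent_persistent'
            return vm.ReconfigVM_Task(
                spec=vim.vm.ConfigSpec(deviceChange=[change]))
    raise ValueError(f"no disk labeled {disk_label!r} on {vm.name}")


# Usage (label is hypothetical): wait on the returned task, then take a
# test snapshot and confirm the disk is no longer included in it.
# task = set_independent_persistent(vm, 'Hard disk 6')
```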

Here’s a disk layout that I’ve used for SQL servers. These drives are set to standard mode, so a snapshot picks them up.

C:\ – 40 GB, OS, SCSI 0:0
D:\ – 30 GB, SQL binaries, SCSI 0:1
L:\ – 1 GB, logs, SCSI 1:0
O:\ – 1 GB, datafiles, SCSI 2:0
Y:\ – 1 GB, backups, SCSI 3:0
Y:\SQLBAK01 – 2 TB+, backups, SCSI 3:1, mounted filesystem under Y:\

Your backup drive is limited to 2 TB minus 512 bytes if you’re using vSphere 5.1 or earlier, but can go up to 62 TB on vSphere 5.5 and later.

These two drives are set to independent persistent mode, so snapshots skip them:

L:\Logs01 – variable size, independent persistent, SCSI 1:1, mounted filesystem under L:\
O:\SQLData01 – variable size, independent persistent, SCSI 2:1, mounted filesystem under O:\
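Before trusting the layout, it’s worth auditing what actually got attached where. Along the same assumed pyvmomi lines, here’s a sketch that prints each disk’s SCSI position, size, and mode – anything holding data or logs should report independent_persistent:

```python
from pyVmomi import vim


def report_disk_layout(vm):
    """Print each virtual disk's SCSI position, size, and disk mode."""
    devices = vm.config.hardware.device
    # Map controller device keys to SCSI bus numbers (SCSI 0, 1, 2, 3)
    buses = {c.key: c.busNumber for c in devices
             if isinstance(c, vim.vm.device.VirtualSCSIController)}
    for dev in devices:
        if isinstance(dev, vim.vm.device.VirtualDisk):
            size_gb = dev.capacityInKB / (1024 * 1024)
            mode = getattr(dev.backing, 'diskMode', '?')
            print(f"SCSI {buses.get(dev.controllerKey, '?')}:"
                  f"{dev.unitNumber}  {dev.deviceInfo.label:<14}"
                  f"{size_gb:>8.0f} GB  {mode}")
```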

Part of the reason we used mountpoints was consistency – no matter what, L: was always logs, O: was always SQL data, and Y: was always backups. There was never a question of whether a particular SQL server had a certain number of drives for a specific purpose – the entire structure lived under a single, consistent drive letter.

Depending on the workload, O:\SQLData01 might hold only one heavily used database on a single LUN, or it might hold a bunch of small databases. When we needed more room, we’d attach another mountpoint set: O:\SQLData02 on SCSI 2:2, L:\Logs02 on SCSI 1:2, and Y:\SQLBAK02 on SCSI 3:2 (the mount step is sketched below). Nightly backup jobs wrote SQL backups out to Y:\. Since the Y: drives are all in standard mode, the backup jobs picked up the dumps in the normal snapshotting process.
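On the Windows side, wiring the new volume under its mountpoint is just mountvol against an empty NTFS directory. A hedged sketch, assuming the new disk is already attached, initialized, and formatted – the volume GUID below is a placeholder (run mountvol with no arguments to list the real ones):

```python
import os
import subprocess

# Placeholder GUID – replace with the real volume path from `mountvol`.
# Backslashes are doubled because Python requires them escaped; this
# renders as \\?\Volume{...}\
NEW_VOLUME = '\\\\?\\Volume{00000000-0000-0000-0000-000000000000}\\'
MOUNT_DIR = r'O:\SQLData02'

# mountvol requires an existing, empty directory on an NTFS volume.
os.makedirs(MOUNT_DIR, exist_ok=True)
subprocess.run(['mountvol', MOUNT_DIR, NEW_VOLUME], check=True)
```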

If you had a complete loss of the SQL VM, you could restore it from backup and you’d still have the L:, O:, and Y: drives with their mountpoint directories (although the mountpoints might not have any disks attached behind them anymore), and you’d restore the databases from the SQL dumps on Y:\. Depending on the nature of the loss, you might have to spend some time manually fixing the mounts.

It took a little bit of work to maintain, but our backups worked every time. Part of setting up a new database was that the DBAs wrote a restore script and stored it in the root of Y:, which got backed up as part of the snapshot. Once the VM came back from a Veeam restore, the DBAs would bring up SQL, run the restore scripts, and we were off and running.

You also need to coordinate your DBAs’ backup schedule carefully with your backup software’s schedule. What you don’t want is SQL backups being written to the Y: drive while you’ve got an active snapshot in place – if the backups are large enough, you could easily fill up the datastore. Some backup software allows you to execute pre-job scripting, and it’s a fairly simple task to add a check for an active SQL backup: if one is running, postpone your backup snapshot and try again later (see the sketch below).
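Here’s a minimal sketch of that pre-job check in Python, assuming pyodbc and Windows authentication against the instance – the connection string is illustrative, so adjust the driver and server to your site. It queries sys.dm_exec_requests for in-flight BACKUP commands and exits nonzero so the backup job knows to postpone:

```python
import sys

import pyodbc

# Illustrative connection string – swap in your own driver/server/auth.
CONN_STR = ('DRIVER={ODBC Driver 17 for SQL Server};'
            'SERVER=sqlvm01;Trusted_Connection=yes')


def sql_backup_running(conn_str=CONN_STR):
    """Return True if any BACKUP DATABASE/LOG request is active."""
    conn = pyodbc.connect(conn_str)
    try:
        row = conn.execute(
            "SELECT COUNT(*) FROM sys.dm_exec_requests "
            "WHERE command IN ('BACKUP DATABASE', 'BACKUP LOG')"
        ).fetchone()
        return row[0] > 0
    finally:
        conn.close()


if __name__ == '__main__':
    # Nonzero exit tells the pre-job hook to postpone the snapshot.
    sys.exit(1 if sql_backup_running() else 0)
```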
