Making NetApp SnapManager “Play Nice”

Implementing NetApp SnapManager for SQL Server

I’m a firm believer in using any software at your disposal to it’s fullest.  There are times though that I curse my own name for adopting this mantra, because let’s face it, some software just refuses to “Play Nice”.  I don’t want to sound too critical here, but NetApp’s SnapManager for SQL Server product is one of those, how shall I say, onerous applications.  If you find yourself running into issues with the tool or are exploring the possibilities of adopting it in your environment, read on as I go over some of its idiosyncrasies.

What is SnapManager?

For those of you having no idea what NetApp’s SnapManager for SQL Server is, it’s effectively NetApp’s proprietary Virtual Disk Service (VDS) backup software solution for SQL Server. I know that may not clear things up on any level, so let’s quickly touch on what VDS is.

Starting with SQL Server 2005 we have the ability to create database backups utilizing the Virtual Disk Service or the Volume Shadow Copy Service (VSS).  This is advantageous for large databases that would generally take many hours via traditional native SQL Backups.  The basic premise of a VDS/VSS backup is a file (in this case our database) is quiesced (aka paused) at the OS level, a new file is created on the SAN to store future activity (deltas), and activity is then resumed at the OS but is now diverted to the newly created file.  What this does is effectively make the original SAN-level file a static point-in-time backup, as no more changes are saved to it.  There’s less data movement because these files are not being copied, so the time of backup and restore operations is greatly reduced.  Now I’m overly trivializing this whole process but that is the general theme as I understand it.  A thorough explanation of the entire architecture can be found here, here, or here if you’re really interested.

So why is this Difficult?

VDS/VSS solutions make use of volume-level backups.  For all intents and purposes of this post, a volume is equivalent to a LUN in Windows.  The ramifications of this is that all files on the volume/LUN must be quiesced at the same time before the backup operation can happen.  Because of this, NetApp has some stringent requirements regarding the setup of the storage behind your instance.  The main points are as follows:

  • System databases must reside on a separate LUN from user databases and are not backed up via VDS backups
  • Separate LUNS must be created for user database logfiles and datafiles
  • Tempdb must be on its own LUN
  • A separate SnapInfo LUN is required to store the streaming backups of the system databases and transaction logs
  • Application binaries and non-database files must be on LUNs not utilizing VDS backups
  • It is recommended to have no more than 35 databases per LUN

What this means is even your most basic instance will have, at minimum, 5 different LUNS (i.e. system, data, log, SnapInfo, tempdb).  This is outright ridiculous, but if you own the product and want to use it, you’re going to have to roll with it.  In order to validate you adhere to these requirements, you must run the Configuration Wizard prior to utilizing Snap backups.  Thankfully it will catch anything that is not setup properly so you’re not furiously troubleshooting why your Snaps didn’t execute successfully.

While mentioning the Configuration Wizard let’s touch on some of its idiosyncrasies.  The only way I’ve found to enable SnapVault technology on an instance is through the Configuration Wizard, and if you’re configuring a fresh instance that isn’t hosting any user databases, this option is greyed out.  To enable the SnapVault configuration page, ensure a user database is hosted on the instance prior to running the Configuration Wizard.  Additionally, when configuring an instance on a cluster, part of the final configuration process will offline the SQL Server service momentarily.  While this outage is all of 15 seconds, it is still an outage, so be aware of this if the instance is live and hosting active connections.

One final quirk of SnapManager for SQL Server is that you can only run one Snap operation against a volume at a time, otherwise you will find yourself encountering this error:

The requested SnapManager operation cannot start at this time because there is another SnapManager operation currently in progress. Please retry this operation after the current operation has finished.

For the most part this isn’t a big concern, but it is quite possible for a transaction log backup to cause a failure in your full backup job reporting the error above if they run in close (time) proximity of one another.  Ensuring your jobs are scheduled with no overlap or even a simple check to see if any other Snap-related jobs are running is generally all that is needed to ensure the error above is not encountered.

Bringing Order to the Chaos

As no two environments are the same, NetApp has provided numerous ways to manage your backups with this solution.  The most common, as implied above, is to run each instance’s backup jobs locally from the SQL Server Job Agent.  This is an easy and straight-forward approach, and NetApp’s Backup Wizard is even good enough to create a job on the instance after you walk through all the options you wish to utilize.

Alternatively, you can manage the backups from a remote server, so long as that server also has SnapManager for SQL Server installed on it.  This is useful if you wish to centralize all of your backup processes or serialize backups across different instances.  Thankfully we are given the option to pass the server/instance name in the NetApp backup command so no rocket science is required if we decide to go this route.

Deciding where to manage backups from will be dependent on your situation and environment.  Since I manage a heterogeneous environment containing numerous clusters and single-instances, I’ve elected to centrally manage my backups from a Central Managment Server (CMS).  Even though I am actively imposing a single point of failure for my environment, my CMS server is a VM so it’s relatively fault-tolerant. Though, in case of a catastrophic event, it’s easy enough to append a schedule to all of my local backup jobs (all named the same) that I keep in case I need to run an ad-hoc backup from the instance itself.

Centralizing the backup process has three distinct benefits for me.   The primary benefit is ease of administration.  If I need to adjust the backup window to avoid patching or some other scheduled maintenance, it’s much easier to handle if the jobs are centralized.  Another benefit, primarily seen in my clusters, is the knowledge that I’m ensuring no I/O spikes due to backup operations since I’m serializing them over the various clustered instances.  Finally, I can utilize my CMS as a verification server to validate my backups.  This shifts the workload away from my live environments which helps maintain a high level of overall performance.

Make it So

There are effectively three ways in which to initiate a backup using SnapManager, as follows:

  1. Graphical User Interface (GUI) Wizard
  2. Command Line Utility (SmsqlJobLauncher.exe)
  3. PowerShell Commands (using the SmsqlShell.psc1 ConsoleFile)

Obviously the GUI is an on-demand, manual process.  You can use it to create a SQL Server Agent job on the database instance you wish to backup, but you cannot schedule a GUI-initiated backup.  For scheduled backups, we will need to determine if we want to manage some sort of batch file or a PowerShell script.  It really comes down to what you’re comfortable using, but since PowerShell is rather versatile (and becoming the prevalent standard for administration), I’ve elected to go that route.

If you’ve never executed a PowerShell script on the machine managing the backups, there is one step you will need to take so you can avoid the following error:

File %1 cannot be loaded because the execution of scripts is disabled on this system. Please see “get-help about_signing” for more details.

To summarize the TechNet article above, open a command prompt (using run as Administrator option) and execute the following:

  1. powershell
  2. Set-ExecutionPolicy RemoteSigned
  3. exit
  4. exit

Now we are allowed to execute unsigned PowerShell scripts locally.  After this, create the PowerShell script file (.ps1) containing your backup commands and put it into an easily accessible location.  An example of what such a file contains is as follows:

new-backup  -svr 'SERVERNAME1\INSTANCE' -RetainBackupDays  2 -lb -bksif -RetainSnapofSnapInfoDays 2 -trlog  -noutm  -updmir  -mgmt daily
new-backup  -svr 'SERVERNAME2\INSTANCE' -RetainBackupDays  2 -lb -bksif -RetainSnapofSnapInfoDays 2 -trlog  -noutm  -updmir  -mgmt daily
new-backup  -svr 'SERVERNAME3\INSTANCE' -RetainBackupDays  2 -lb -bksif -RetainSnapofSnapInfoDays 2 -trlog  -noutm  -updmir  -mgmt daily
new-backup  -svr 'SERVERNAME4\INSTANCE' -RetainBackupDays  2 -lb -bksif -RetainSnapofSnapInfoDays 2 -trlog  -noutm  -updmir  -mgmt daily
new-backup  -svr 'SERVERNAME5\INSTANCE' -RetainBackupDays  2 -lb -bksif -RetainSnapofSnapInfoDays 2 -trlog  -noutm  -updmir  -mgmt daily  -ArchiveBackup  -ArchivedBackupRetention Daily
new-backup  -svr 'SERVERNAME6\INSTANCE' -RetainBackupDays  2 -lb -bksif -RetainSnapofSnapInfoDays 2 -trlog  -noutm  -updmir  -mgmt daily
new-backup  -svr 'SERVERNAME7\INSTANCE' -RetainBackupDays  2 -lb -bksif -RetainSnapofSnapInfoDays 2 -trlog  -noutm  -updmir  -mgmt daily
new-backup  -svr 'SERVERNAME8\INSTANCE' -RetainBackupDays  2 -lb -bksif -RetainSnapofSnapInfoDays 2 -trlog  -noutm  -updmir  -mgmt daily

As mentioned earlier, if you step through the Backup Wizard for each instance, commands similar to above will be generated in a SQL Job if you choose to script the output instead of execute it.  The scripted job utilizes the Command Line Utility, but the command and it’s parameters are exactly the same if executed through PowerShell instead.  This may be the easiest way to construct the backup command for your first few instances until you get a handle on the switches available in the product.

All that remains is to schedule a job that executes this script.  I choose to do this with a SQL Agent job, but any scheduling software is sufficient.  Here is the Operating system (CmdExec) command that I call within my job:

powershell -PSConsoleFile "C:\Program Files\NetApp\SnapManager for SQL Server\SmsqlShell.psc1" -NoLogo -File "C:\Prod-NetAppFullBackups.ps1"

For each (group of) instance(s) you are backing up, at minimum you’ll want to create one job for your full backups and one for your transaction logs.  If you emulate what I have above, you will also want to create a separate job to verify the backups created from this process as verification is not happening during the backup job itself.

Because of the way SnapManager reports progress back to the job, it will easily chew up space in your SQL Agent job history logs.  To avoid only seeing one line of history for any job on the server, I recommend bumping up the size of your job history logs, using the Remove agent history Older than option, or disable the record limitation outright if you’re maintaining job history another way:

Additionally, I also store the output of my jobs to the file system using dynamically generated output logs.  I find it easier to read through logs created this way rather than looking at the SQL Job Agent history or at each instance’s report history through the SnapManager for SQL Server console.

Final Thoughts

The overall architecture of SnapManager backups must to be created with care.  It can quickly go awry if you don’t put much thought into it, but hopefully I’ve helped identify some of SnapManager’s pitfalls and solutions around them.  When the tool is working well, it works really well.  Backups that would normally take hours, take moments, and restores are just as quick.

Part of our job as DBAs is to find the best method to accomplish whatever task is at hand.  Backups are an integral part of what we do, so don’t be afraid to try non-native backup methodologies.  Just make sure that you properly test the water before you jump in.

Finally I want to throw a quick thank you out to Mark Arnold, a Consulting Systems Engineer for NetApp, for setting me straight on some of my more erroneous statements in the initial draft of this post.  Thanks Mark!

John

References (in order of appearance):
http://technet.microsoft.com/en-us/library/cc778187%28WS.10%29.aspx
http://en.wikipedia.org/wiki/Shadow_Copy
http://technet.microsoft.com/en-us/library/cc966520.aspx
http://blogs.msdn.com/b/sqlserverfaq/archive/2009/04/28/informational-shedding-light-on-vss-vdi-backups-in-sql-server.aspx
http://technet.microsoft.com/en-us/library/cc739923%28WS.10%29.aspx
http://www.mssqltips.com/sqlservertip/2220/considerations-for-implementing-snapmanager-for-sql-server/
http://sqlserverpedia.com/wiki/Central_Management_Server
http://technet.microsoft.com/en-us/library/ee176949.aspx#EEAA
https://dbaeyes.wordpress.com/2011/10/17/no-more-static-output-logs/
Advertisements

4 responses to “Making NetApp SnapManager “Play Nice”

  1. John, what about the reporting and Analyses DB’s? should these goes into LUN of User Db’s or LUN of System DB’s ??

    • I would recommend the Reporting DBs go onto the User DB LUN and if you’re talking a datamart or datawarehouse DB, then they should also go on the User DB. If this is a Datawarehouse server, I’m hoping only DBs related to the Datawarehouse are hosted on this server though, but digging into this would put us on a tangent not related to this question. There is a recommended limit to the number of databases you can put onto each LUN (of 35), so if you hit that limit, you should make a new LUN for the new DBs. Only the System DBs (e.g. master, model, system, and the hidden systemresource) should be on the System LUN. This is because SNAPManager takes native SQL backups of these databases instead of SNAPs, so any other DBs you put into this LUN will not utilize the benefits of SnapManager.

      Hope that helps.

  2. John, a lot of what you’ve graciously put up here is more than we’ve been able to gather going through our unnamed NetApp partner. We currently have our tempdb in the same LUN as our System DB’s.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s