SQL 2014 Storage Solutions – Azure Blob Storage

SQL Server 2014 supports both on-prem and cloud database files.  This opens up a series of new possibilities, which will assuredly spark debate.  Here we will dig into a few of the new scenarios and potential use cases.

SQL Server running in Azure – Files in Azure Blob Storage

The scenario where you have your SQL Server running in Azure is likely the most common, or at least the first that comes to mind for this feature.  Running your database with files in Azure Blob Storage provides the kind of dynamic disk allocation that suits agile development and unknown growth patterns.  Handing the storage maintenance concerns off to Azure gives IT and development more time to focus on their applications, which is where most developers would choose to spend their effort.

SQL Server running On-Prem – Files in Azure Blob Storage

Running a database on-prem with files in Azure Blob Storage brings the same managed-storage benefits, but also enables additional HA capabilities.  For example, if the server running your database happens to crash, the database files in Azure are (potentially) accessible from anywhere, and the database can be attached to a new instance – on-prem or in Azure.  Additionally, keeping all of the data and log files in Azure Blob Storage enables AlwaysOn Availability Groups to transition between on-prem and Azure.
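As a rough sketch of what that re-attach could look like (using the demo storage account and file names from later in this post, and assuming the credential for the container already exists on the new instance):

-- Hypothetical sketch: attach a database whose files live in Azure Blob Storage
-- to a new instance. Assumes the credential for the container is already in place there.
CREATE DATABASE [Test_CloudFiles]
ON
(
	FILENAME = N'https://blogdemostorage.blob.core.windows.net/data/Test_CloudFiles.mdf'
),
(
	FILENAME = N'https://blogdemostorage.blob.core.windows.net/data/Test_CloudFiles_log.ldf'
)
FOR ATTACH
GO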

SQL Server running On-Prem – Hybrid (Data in Azure, Log On-Prem)

This scenario is really about tiered data storage.  The setup here is an on-prem database with an on-prem log and data in Azure Blob Storage.  The tiered storage piece comes into play because there would more than likely also be data on-prem.  The primary reason to put files in Azure is that they are accessed infrequently and can tolerate some latency.  An example of this is a partitioned table, whereby old partitions that are not written to and rarely read are moved into a filegroup in Azure (see the sketch below).  The active partition and log remain on-prem to maintain the performance of reads and writes.
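A rough sketch of what that could look like (the database name, filegroup, and archive file name here are made up for illustration, and the credential for the container is assumed to be in place):

-- Hypothetical sketch: add a filegroup backed by a file in Azure Blob Storage
-- to hold cold partitions, while the active data and the log stay on-prem.
ALTER DATABASE [YourDatabase] ADD FILEGROUP [ArchiveFG]
GO

ALTER DATABASE [YourDatabase]
ADD FILE
(
	NAME = N'YourDatabase_Archive',
	FILENAME = N'https://blogdemostorage.blob.core.windows.net/data/YourDatabase_Archive.ndf',
	SIZE = 4096KB
)
TO FILEGROUP [ArchiveFG]
GO
-- Older, rarely read partitions can then be mapped to [ArchiveFG] via the
-- partition scheme, keeping the active partition on local storage.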

One problem with this scenario is that loss of connectivity to the files in Azure will leave the database In Recovery. 

Creating Database Files in Azure Blob Storage

The setup here is similar to steps 1 – 3 of backing up a SQL Server database to Azure Blob Storage.

Setting up your storage

Step 1: Create account

Creating an Azure account is quick and easy, and there is currently a one-month trial available at http://www.windowsazure.com.  Once you request your account, it may take up to 24 hours for it to be provisioned.  Keep this in mind, as you will not be able to begin testing immediately.

Step 2: Add storage

Now that you have your account, you will need to add your storage.  This is quick and easy, with the only steps being to name your storage, pick a location, and choose whether your storage should be geo-redundant or locally redundant.  While backup files can be geo-redundant, data and log files do not currently support this.

[Screenshot: Create Storage Account]

Step 3: Create container

In order to save any files into your storage, you will need to add a container.  To do this, click on the name of your newly created storage.  From here, you can monitor and change the configuration of your storage.  Click on the navigation for “Containers”.  In the “New Container” dialog, pick a name that describes the files the container will hold.  When we set up our container for backups previously, we called it “sqlbackups”.  For our purposes here, we can just call this one “data”.  You also have a choice of access types – generally speaking you will leave this at the default of “Private”.

Now that you have a container created, you will see that the container has a URL.  This is the URL that you will use to point your database files to this container.  The format of this URL is [storage name].blob.core.windows.net/[container].  While the URL shows http://, https:// will also work.

[Screenshot: Storage Container – Data]


Creating the database with files in the cloud

Step 1:  Creating a credential

As with backing up a database to Azure Blob Storage, a credential needs to be in place for accessing your cloud resources.  However, this credential uses different inputs, which was not very intuitive to me.  Notes on this credential:

1. The name of the credential must be the URL of the container (you would need two credentials if you had a data container and a log container)

2. The IDENTITY must be set to “SHARED ACCESS SIGNATURE”

3. The SECRET is not one of the two automatically generated access keys in the storage account, as it would be for creating a backup credential.  Rather, it is a shared access signature generated outside of the Azure portal using the Azure API.  For this, you will either need to write some C# to generate a secret, or use one of the Storage Explorers to generate one.  If you want to write the code, head over here to see how.  Otherwise, here is how to use a storage explorer:

First, you need to get your hands on a storage explorer.  From your storage account quick start guide, click on Storage Explorers.  In this example I am using the Azure Storage Explorer, which is out on CodePlex.

Add your account to the explorer.

[Screenshot: Azure Storage Explorer – Add Account]

Click on the container you will put your files in (“data” in this case), and then click “Security”.  On the “Shared Access Signature” tab, we will generate a secret to use in the credential.

[Screenshot: Generate Shared Access Signature]

Make sure that you check the box for “write”, otherwise your credential will not have access to create the files in your container.  When you click “Generate Signature”, you will get a string that looks like this:

https://blogdemostorage.blob.core.windows.net/data?st=2014-01-13T03%3A16%3A31Z&se=2020-01-13T04%3A16%3A31Z&sr=c&sp=rw&sig=H6c7exHNUpVMY63bBh4nE5Epve5g6oP1xOkW0ybjQTQ%3D

The portion from “sr=” to the end is what we will use for the SECRET in the credential.  You can use everything from “st=” to the end instead, but that will put a time-based restriction on the access.

Now that we have generated a new SECRET, let’s create that credential!

USE MASTER
GO
CREATE CREDENTIAL [https://blogdemostorage.blob.core.windows.net/data]
WITH
	IDENTITY = 'SHARED ACCESS SIGNATURE',
	SECRET = 'sr=c&sp=rw&sig=H6c7exHNUpVMY63bBh4nE5Epve5g6oP1xOkW0ybjQTQ%3D'
GO
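To sanity-check that the credential was created as expected, a quick look at sys.credentials will show it (the secret itself is never displayed):

-- Verify the credential; its name should match the container URL exactly.
SELECT name, credential_identity, create_date
FROM sys.credentials;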

Step 2:  Create the database with files in Azure Blob Storage

All Cloud files

CREATE DATABASE [Test_CloudFiles]
	CONTAINMENT = NONE
ON PRIMARY
(
	NAME = N'Test_CloudFiles',
	FILENAME = N'https://blogdemostorage.blob.core.windows.net/data/Test_CloudFiles.mdf',
	SIZE = 4096KB,
	FILEGROWTH = 1024KB
)
LOG ON
(
	NAME = N'Test_CloudFiles_log',
	FILENAME = N'https://blogdemostorage.blob.core.windows.net/data/Test_CloudFiles_log.ldf',
	SIZE = 1024KB,
	FILEGROWTH = 10%
)
GO

On-Prem log, cloud data

CREATE DATABASE [Test_HybridFiles]
	CONTAINMENT = NONE
ON PRIMARY
(
	NAME = N'Test_HybridFiles',
	FILENAME = N'https://blogdemostorage.blob.core.windows.net/data/Test_HybridFiles.mdf',
	SIZE = 4096KB,
	FILEGROWTH = 1024KB
)
LOG ON
(
	NAME = N'Test_HybridFiles_log',
	FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL12.MSSQLSERVER\MSSQL\DATA\Test_HybridFiles_log.ldf',
	SIZE = 1024KB,
	FILEGROWTH = 10%
)
GO
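To confirm which files ended up in Blob Storage and which stayed local, sys.master_files shows the physical location of each file:

-- List the physical locations (URL or local path) of the test databases' files.
SELECT DB_NAME(database_id) AS database_name, name, physical_name, type_desc
FROM sys.master_files
WHERE database_id IN (DB_ID('Test_CloudFiles'), DB_ID('Test_HybridFiles'));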

Testing the performance of the databases

For a quick performance test, I threw together a script that sequentially inserts 250 integers into a table. 

	create table dbo.test
		(
		id int primary key clustered
		)

	declare @x int = 0

	while (@x < 250)
	begin
		insert into dbo.test(id)
		values(@x)

		set @x = @x + 1
	end

I then ran this script against each database.

On-Prem Data / On-Prem Log  – 0 seconds

Cloud Data / On-Prem Log – 0 seconds

Cloud Data / Cloud Log – 18 seconds

Even on my internet connection, I expected this to be pretty snappy in each scenario.  When working with Azure Blob Storage, it does take some usage to warm the storage up.  It appears that a local log brings the performance back in line, even if your only data file is in Azure Blob Storage.  Presumably this is because the log buffer flush to the transaction log occurs locally.  When I have a bit more time, perhaps I will research whether Log Writer waits occur when the log is in Blob Storage.
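As a starting point for that research, the accumulated wait stats can at least show whether log-flush waits are piling up (a rough sketch; WRITELOG is the wait type associated with log buffer flushes):

-- Rough starting point: check accumulated waits associated with log flushes.
SELECT wait_type, waiting_tasks_count, wait_time_ms, max_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type IN ('WRITELOG', 'LOGBUFFER')
ORDER BY wait_time_ms DESC;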

Changing the default data location when SQL Server won’t start

So I suppose default data locations are important?  Updates were applied to a server recently, and someone included a SQL Server patch that upgraded master and msdb.  SQL Server wouldn’t start after a reboot, and it was a mystery as to what had happened.  In Event Viewer, we were presented with the following:

Cannot recover the master database. SQL Server is unable to run. Restore master from a full backup, repair it, or rebuild it. For more information about how to rebuild the master database, see SQL Server Books Online.

Script level upgrade for database ‘master’ failed because upgrade step ‘sqlagent100_msdb_upgrade.sql’ encountered error 598, state 1, severity 25. This is a serious error condition which might interfere with regular operation and the database will be taken offline. If the error happened during upgrade of the ‘master’ database, it will prevent the entire SQL Server instance from starting. Examine the previous errorlog entries for errors, take the appropriate corrective actions and re-start the database so that the script upgrade steps run to completion.

CREATE FILE encountered operating system error 5(Access is denied.) while attempting to open or create the physical file ‘G:\Logs\temp_MS_AgentSigningCertificate_database_log.LDF’.

CREATE FILE encountered operating system error 5(Access is denied.) while attempting to open or create the physical file ‘G:\Data\temp_MS_AgentSigningCertificate_database.mdf’.

So, G:\Data and G:\Logs were mount points that were set up as default locations for databases.  In order to resolve this issue, we needed to change the default data location to be the location where master was, which was still in the default installation location.  For me, that was:

G:\MSSQL10_50.<InstanceName>\MSSQL\DATA

To do this, we have to dive into the registry (regedit). 

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10_50.<Instance Name>\MSSQLServer

Key: DefaultData – needs to be changed from G:\Data to G:\MSSQL10_50.<Instance Name>\MSSQL\DATA

Key: DefaultLog – needs to be changed from G:\Logs to G:\MSSQL10_50.<Instance Name>\MSSQL\DATA

The “SQLPath” key under HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10_50.<Instance Name>\Setup is also set to the location where the installation has taken place: G:\MSSQL10_50.<Instance Name>\MSSQL

After changing these keys, I was able to start the SQL Server service.  It takes a few seconds for the upgrade scripts to complete, and then you have your instance back.  Now to run a query to see which instances have had their default locations for data and logs changed – one approach is sketched below.
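This sketch uses the undocumented xp_instance_regread extended stored procedure, so treat it as unsupported:

-- Sketch: read the instance's default data and log locations from the registry.
-- xp_instance_regread is undocumented, so use at your own discretion.
DECLARE @DefaultData nvarchar(512), @DefaultLog nvarchar(512);

EXEC master.dbo.xp_instance_regread
	N'HKEY_LOCAL_MACHINE',
	N'Software\Microsoft\MSSQLServer\MSSQLServer',
	N'DefaultData',
	@DefaultData OUTPUT;

EXEC master.dbo.xp_instance_regread
	N'HKEY_LOCAL_MACHINE',
	N'Software\Microsoft\MSSQLServer\MSSQLServer',
	N'DefaultLog',
	@DefaultLog OUTPUT;

SELECT @DefaultData AS DefaultData, @DefaultLog AS DefaultLog;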

Using Azure Blob Storage for SQL Server Backups

With the release of SQL Server 2012 SP1 CU4, backing up databases directly to Azure Blob Storage is available as an option for managing backups.  This provides a great opportunity to leverage the cloud for off-site, geo-redundant storage for disaster recovery that is available from any location.  This article is an overview of how to get started using the cloud as part of your DR plan.

Setting up your storage

Step 1: Create account

Creating an Azure account is quick and easy, and there is currently a one-month trial available at http://www.windowsazure.com.  Once you request your account, it may take up to 24 hours for it to be provisioned.  Keep this in mind, as you will not be able to begin testing immediately.

Step 2: Add storage

Now that you have your account, you will need to add your storage.  This is quick and easy, with the only steps being to name your storage, pick a location, and choose whether your storage should be geo-redundant or locally redundant.

[Screenshot: Create Storage Account]

Step 3: Create container

In order to save any files into your storage, you will need to add a container.  To do this, click on the name of your newly created storage.  From here, you can monitor and change the configuration of your storage.  Click on the navigation for “Containers”.  In the “New Container” dialog, pick a name that describes the files the container will hold.  For our purposes here, we can just call this “sqlbackups”.  You also have a choice of access types – generally speaking you will leave this at the default of “Private”.

Now that you have a container created, you will see that the container has a URL.  This is the URL that you will use to point your backups to this container.  The format of this URL is [storage name].blob.core.windows.net/[container].  While the URL shows http://, https:// will also work.

[Screenshot: Storage Container]


Step 4: Manage access keys – created by default

When you are in your storage account on the dashboard, you will notice “Manage Access Keys” at the bottom of the screen.  Clicking this will take you to a screen with two access keys pre-generated for your storage account.  You will use one of these to create a credential in the next step.

[Screenshot: Manage Access Keys]

Creating your backup

Step 1: Create credential

In order to perform a backup to the cloud storage, you will need to create a credential on the SQL Server instance that you wish to take backups from.  The identity of the credential is the storage name from Azure, and the secret is one of the access keys from step 4 above.

use master
go

CREATE CREDENTIAL BackupCredential WITH IDENTITY = 'blogdemostorage'
,SECRET = 'i/NCk4dWxUce0U+w5kPFT9m9p2UkT5ogWFYdyKEOqRNjLt32DFuu8NO7dMa6oNMqi/EZe2k+F/SuK9EbSkY1AA==' ;

go

Step 2: Backup the db

With your storage account and credential created, all that is left is to issue a backup command.  The biggest difference here is that instead of backing up TO DISK, you are backing up TO URL.  The URL specified in the command is the URL that is associated with the container that you created, with the name of the database backup file appended to it.  Authentication is handled by the credential.  Since you are transmitting this file to the cloud, it is recommended that you perform your backup with compression. 

BACKUP DATABASE AdventureWorks2012
TO URL = 'https://blogdemostorage.blob.core.windows.net/sqlbackups/AdventureWorks2012.bak'
      WITH CREDENTIAL = 'BackupCredential'
     ,COMPRESSION

After a refresh of your container in Azure, your backup file will be there.  Success!
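If you want to verify the backup without downloading it, you can read the header straight from the URL using the same credential:

-- Read the backup header directly from Blob Storage to confirm the backup is readable.
RESTORE HEADERONLY
FROM URL = 'https://blogdemostorage.blob.core.windows.net/sqlbackups/AdventureWorks2012.bak'
WITH CREDENTIAL = 'BackupCredential';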

You may choose to perform a backup using the SSMS GUI.  While backup TO URL exists in SQL Server 2012 SP1 CU4 and beyond, the ability to perform a backup in this manner through the GUI does not exist prior to SQL Server 2014.  After choosing URL instead of DISK as the “Back up to:” destination, you are prompted for the file name, credential, and container.  This will automatically generate the URL for backing up the database.

[Screenshot: Backup to URL GUI]

Restoring a backup from Azure Blob Storage

Restoring a backup from Azure Blob Storage to your SQL Server instance (on-prem or in Azure) is just as easy as taking the backup.  Your options are to download the file, or to restore from URL.  To download the file, simply navigate in the Azure web portal to the container with your backups and click “Download”.  You can then restore the file using commands exactly as you would with any other on-prem file.  If you choose to restore from URL, the modifications are the same as when backing up – create the same credential as you did for taking the backup, and in your restore statement specify the URL and the credential.

RESTORE DATABASE AdventureWorks2012
FROM URL = 'https://blogdemostorage.blob.core.windows.net/sqlbackups/AdventureWorks2012.bak'
WITH
	CREDENTIAL = 'BackupCredential';

As with restoring a backup from disk, you can use MOVE to put the files in a different location, as in the sketch below.
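The logical file names here are assumptions; confirm them first with RESTORE FILELISTONLY FROM URL.

-- Sketch: restore from URL and relocate the files. The logical file names are
-- assumed; verify them with RESTORE FILELISTONLY FROM URL ... WITH CREDENTIAL.
RESTORE DATABASE AdventureWorks2012
FROM URL = 'https://blogdemostorage.blob.core.windows.net/sqlbackups/AdventureWorks2012.bak'
WITH
	CREDENTIAL = 'BackupCredential',
	MOVE 'AdventureWorks2012_Data' TO 'D:\Data\AdventureWorks2012.mdf',
	MOVE 'AdventureWorks2012_Log' TO 'D:\Logs\AdventureWorks2012_log.ldf';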

Adopting a cloud backup strategy:  Compatibility

If you choose to use Azure Blob Storage for your backup strategy, you may have an environment with instances that are not on SQL Server 2012 SP1 CU4 or beyond.  This is resolved using the SQL Server Backup to Azure Tool.  Simply install this tool and point it to a folder, and it manages the transmission of files in that folder to Azure Blob Storage.  This tool also allows you to compress and encrypt backups for transmission, where the version of SQL Server that you are running may not.

Combining rows into comma delimited string

A frequent task in reporting is to combine rows together into a comma delimited string.  While this can be done in a loop or a cursor, there are other ways to perform this task.  If you are only dealing with a single result, there is a quick way to accomplish this with some shorthand:

/*
set up some data
*/

	declare @t table (id int identity(1,1), data varchar(50))
		insert into @t(data)
		values
			('a'),('b'),('c')

/* use a variable to roll up the data */

	declare @mystring varchar(100) = ''

	select
		@mystring = @mystring + ',' + data
	from @t

	select substring(@mystring, 2, len(@mystring)) --get rid of the first comma using SUBSTRING()
	select stuff(@mystring, 1, 1, '') --get rid of the first comma using STUFF()

The output in this case is “a,b,c”, combining the data from the 3 rows into one comma delimited string. Instead of writing the code to loop through the rows and append to the string, we simply have to remove the initial comma before returning the results. This example shows 2 ways to do that, using SUBSTRING() and using STUFF(). While this is a nice shortcut, this doesn’t handle all of the cases that we will be presented with over time. Specifically, we will again need to employ a loop if we need to group these strings in some manner. As it turns out, there is a method by which you can do this without the loop – use XML and STUFF().

To start, let’s perform the same task as above, this time using XML and STUFF().

	declare @t table (id int identity(1,1), data varchar(50))
		insert into @t(data)
		values
			('a'),('b'),('c')

		select
			stuff
			(
				(
				select
					',' + data
				from @t
				for xml path('')
				)
			, 1, 1, '') as mystring

Our output in this case is the same: “a,b,c”.  Breaking this down, we can see that we are selecting all of the rows from the temp table.  The next step is to append a comma to the beginning of each piece of data.  Then, take this data and roll it up into XML, which will bring the data onto a single row.  The role that STUFF() plays here is the same as in the first example – removing the initial comma. We used STUFF() in this case instead of SUBSTRING() because the data is XML. In order to use SUBSTRING(), we would have first had to convert the XML back to VARCHAR.

Now, we can look at an example where we have multiple members that will each have their own comma delimited string.  For this example, we will call that our “grouping key”.

	declare @t2 table (id int identity(1,1), grouping_key int, data varchar(50))
		insert into @t2(grouping_key, data)
		values
			(1, 'a'),(1, 'b'),(1, 'c'), (2, 'd'), (2, 'e')

	select
		grouping_key,
		stuff
		(
			(
			select
				',' + data
			from @t2
			where
				grouping_key = unique_groups.grouping_key
			for xml path('')
			)
		, 1, 1, '') as mystring
	from
		(
		select
			grouping_key
		from @t2
		) unique_groups
	group by
		unique_groups.grouping_key

What we have done here is to use the XML and STUFF() as an inline function to manipulate the data specific to the row we are returning. As such, the output is not a single value. Rather, it is a table of data with our comma delimited strings broken up into 2 rows by the “grouping key”:

grouping_key   mystring
1              a,b,c
2              d,e

One quick note on using XML in this manner – if your data has ampersands in it, you will need to do some replacing to resolve the XML entities (changing “&amp;” back to “&” after the XML conversion).
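One way to handle that is a small variation on the same pattern: have FOR XML return typed XML and pull the string back out with .value(), which un-escapes the entities for you (a sketch of the idea):

	/* variation that un-escapes entities such as &amp; by using the TYPE directive */
	declare @t table (id int identity(1,1), data varchar(50))
		insert into @t(data)
		values
			('a & b'),('c')

		select
			stuff
			(
				(
				select
					',' + data
				from @t
				for xml path(''), type
				).value('.', 'varchar(max)')
			, 1, 1, '') as mystring

The output here is “a & b,c”, with the ampersand intact.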

Enjoy!

Saving Time with Database Snapshots

Recently I ran into an issue where I needed to troubleshoot an ETL process that I was not very familiar with.  After reading through the process to familiarize myself with it, I set up a (nearly) equivalent environment for my testing.  Restoring the databases took about 3 hours, and I had some pressure to get this issue resolved.  I didn’t want to waste any time resetting the environment while I was debugging, so I decided to use a database snapshot on the target data warehouse to minimize reset time.  After a quick glance in BOL to verify that this would satisfy my needs, there it was!

Managing a test database

In a testing environment, it can be useful when repeatedly running a test protocol for the database to contain identical data at the start of each round of testing. Before running the first round, an application developer or tester can create a database snapshot on the test database. After each test run, the database can be quickly returned to its prior state by reverting the database snapshot.

This is precisely what I was looking for.  The database snapshot mechanism allowed me to perform test runs of the ETL and quickly reset the target data warehouse back to its state from the beginning of the test.  The scripts are pretty straightforward:

Create the database snapshot

USE [master]

GO

CREATE DATABASE OriginalDatabaseName_Snapshot --name this whatever you want, representative of your snapshot
ON
    (
    NAME=N'OriginalDataFileName',
    FILENAME=N'c:\DataDirectory\OriginalFileName.ss'
    )
AS SNAPSHOT OF OriginalDatabaseName

Restore from a database snapshot

USE master;

RESTORE DATABASE OriginalDatabaseName FROM
DATABASE_SNAPSHOT='OriginalDatabaseName_Snapshot';

GO

This is very useful for debugging, and it is also a great way to build automated tests that rely on a uniform dataset at the start of each test.  While there are some performance and growth considerations, this is an excellent way to save some time and ensure a reliable test.
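If you want to keep an eye on the snapshots that exist on an instance and the sparse-file space they are consuming, something like this works:

-- List database snapshots and the space their sparse files currently occupy on disk.
SELECT
	s.name AS snapshot_name,
	DB_NAME(s.source_database_id) AS source_database,
	mf.physical_name,
	fs.size_on_disk_bytes / 1024 / 1024 AS size_on_disk_mb
FROM sys.databases AS s
JOIN sys.master_files AS mf
	ON mf.database_id = s.database_id
CROSS APPLY sys.dm_io_virtual_file_stats(s.database_id, mf.file_id) AS fs
WHERE s.source_database_id IS NOT NULL;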

More information about database snapshots:

 http://msdn.microsoft.com/en-us/library/ms175158.aspx