SQL 2014 Storage Solutions – Azure Blob Storage

SQL Server 2014 has support for both on-prem and cloud database files.  This opens up a series of new possibilities which assuredly will spark debate.  Here we will dig into a few of the new scenarios and potential use cases. 

SQL Server running in Azure – Files in Azure Blob Storage

The scenario where you have your SQL Server running in Azure is likely the most common, or at least the first scenario that comes to mind for this feature.  Running your database with files in Azure Blob Storage provides you with the kind of dynamic disk allocation that suits agile development and unknown growth patterns.  Handing storage maintenance concerns off to Azure frees IT and development to focus on their applications, which is where most developers would choose to spend their effort. 

SQL Server running On-Prem – Files in Azure Blob Storage

Running a database on-prem with files in Azure Blob Storage offers the same managed-storage benefits, but also enables additional HA capabilities.  For example, if the server running your database happens to crash, the database files in Azure are (potentially) accessible from anywhere, and the database can be attached to a new instance – on-prem or in Azure.  Additionally, keeping all of the data and log files in Azure Blob Storage enables AlwaysOn Availability Groups to transition between on-prem and Azure. 

SQL Server running On-Prem – Hybrid (Data in Azure, Log On-Prem)

This scenario is really about tiered data storage.  The setup here is an on-prem database, with an on-prem log, and data in Azure Blob Storage.  The tiered-storage piece comes into play because there would more than likely also be data on-prem.  The primary reason for placing files in Azure is that they are accessed infrequently and can tolerate some latency.  An example of this is a partitioned table, whereby old partitions that are not written to and rarely read are moved into a filegroup in Azure.  The active partition and log remain on-prem to maintain read and write performance. 
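As a rough sketch of what that archive tier could look like (the database name, file name, and URL below are placeholders, and a credential for the container – covered later in this article – is assumed to be in place), an extra filegroup can be added with its file in Azure Blob Storage, and old partitions can then be moved onto it:

ALTER DATABASE [Sales] ADD FILEGROUP [ArchiveFG];
GO
ALTER DATABASE [Sales]
ADD FILE
(
	NAME = N'Sales_Archive',
	FILENAME = N'https://blogdemostorage.blob.core.windows.net/data/Sales_Archive.ndf',
	SIZE = 4096KB
)
TO FILEGROUP [ArchiveFG];
GO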

One problem with this scenario is that loss of connectivity to the files in Azure will leave the database In Recovery. 

Creating Database Files in Azure Blob Storage

The setup here is similar to steps 1 – 3 of backing up a SQL Server database to Azure Blob Storage. 

Setting up your storage

Step 1: Create account

Creating an Azure account is quick and easy, and there is currently a 1 month trial available at http://www.windowsazure.com.  Once you request your account, it may take 24 hours for it to be provisioned.  Keep this in mind as you will not be able to begin testing immediately.

Step 2: Add storage

Now that you have your account, you will need to add your storage.  This is quick and easy, with the only steps being to name your storage, pick a location, and choose whether your storage should be geo-redundant or locally redundant.  While backup files can be geo-redundant, data and log files do not currently support this.

[Screenshot: Create Storage Account]

Step 3: Create container

In order to save any files into your storage, you will need to add a container.  To do this, click on the name of your newly created storage.  From here, you can monitor and change the configuration of your storage.  Click on the navigation for “Containers”.  In the “New Container” dialog, pick a name that describes the files that will be in the container.  When we set up our container for backups previously, we called our container “sqlbackups”.  For our purposes here, we can just call this “data”.  You also have the choice of access types – generally speaking you will leave this at the default of “Private”. 

Now that you have a container created, you will see that the container has a URL.  This is the URL that you will use to point your database files to this container.  The format of this URL is [storage account name].blob.core.windows.net/[container].  While the URL shows http://, https:// will also work. 

[Screenshot: Storage Container - Data]

Creating the database with files in the cloud

Step 1:  Creating a credential

As with backing up a database to Azure Blob Storage, a credential needs to be in place for accessing your cloud resources.  However, this credential uses different inputs, which was not very intuitive to me.  Notes on this credential:

1. The name of the credential must be the URL of the container (you would need two credentials if you had a data container and a log container)

2. The IDENTITY must be set to “SHARED ACCESS SIGNATURE”

3. The SECRET is not one of the two automatically generated access keys in the storage account, as it would be for creating a backup credential.  Rather, it is generated outside of the Azure portal using the Azure API.  For this, you will either need to write some C# to generate a secret, or use one of the Storage Explorers.  If you want to write the code, head over here to see how.  Otherwise, here is how to use a storage explorer:

First, you need to get your hands on a storage explorer.  From your storage account quick start guide, click on Storage Explorers.  In this example I am using the Azure Storage Explorer, which is out on CodePlex.

Add account:

[Screenshot: StorageExplorer Add Account]

Click on the container you will put your files in (“data” in this case), and then click “Security”.  On the “Shared Access Signature” tab, we will generate a secret to use in the credential.

[Screenshot: Generate Shared Access Signature]

Make sure that you check the box for “write”, otherwise your credential will not have access to create the files in your container.  When you click “Generate Signature”, you will get a string that looks like this:

https://blogdemostorage.blob.core.windows.net/data?st=2014-01-13T03%3A16%3A31Z&se=2020-01-13T04%3A16%3A31Z&sr=c&sp=rw&sig=H6c7exHNUpVMY63bBh4nE5Epve5g6oP1xOkW0ybjQTQ%3D

The portion of the query string starting at “sr=” is what we will use for the SECRET in the credential.  You could use everything from “st=” to the end instead, but that would put a time-based restriction on the access.

Now that we have generated a new SECRET, let’s create that credential!

USE MASTER
GO
CREATE CREDENTIAL [https://blogdemostorage.blob.core.windows.net/data]
WITH
	IDENTITY = 'SHARED ACCESS SIGNATURE',
	SECRET = 'sr=c&sp=rw&sig=H6c7exHNUpVMY63bBh4nE5Epve5g6oP1xOkW0ybjQTQ%3D'
GO

Step 2:  Create the database with files in Azure Blob Storage

All Cloud files

CREATE DATABASE [Test_CloudFiles]
	CONTAINMENT = NONE
ON PRIMARY
(
	NAME = N'Test_CloudFiles',
	FILENAME = N'https://blogdemostorage.blob.core.windows.net/data/Test_CloudFiles.mdf',
	SIZE = 4096KB,
	FILEGROWTH = 1024KB
)
LOG ON
(
	NAME = N'Test_CloudFiles_log',
	FILENAME = N'https://blogdemostorage.blob.core.windows.net/data/Test_CloudFiles_log.ldf',
	SIZE = 1024KB,
	FILEGROWTH = 10%
)
GO

On-Prem log, cloud data

CREATE DATABASE [Test_HybridFiles]
	CONTAINMENT = NONE
ON PRIMARY
(
	NAME = N'Test_HybridFiles',
	FILENAME = N'https://blogdemostorage.blob.core.windows.net/data/Test_HybridFiles.mdf',
	SIZE = 4096KB,
	FILEGROWTH = 1024KB
)
LOG ON
(
	NAME = N'Test_HybridFiles_log',
	FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL12.MSSQLSERVER\MSSQL\DATA\Test_HybridFiles_log.ldf',
	SIZE = 1024KB,
	FILEGROWTH = 10%
)
GO

Testing the performance of the databases

For a quick performance test, I threw together a script that sequentially inserts 250 integers into a table. 

	create table dbo.test
		(
		id int primary key clustered
		)

	declare @x int = 0

	while (@x < 250)
	begin
		insert into dbo.test(id)
		values(@x)

		set @x = @x + 1
	end

I then ran this script against each database.

On-Prem Data / On-Prem Log  – 0 seconds

Cloud Data / On-Prem Log – 0 seconds

Cloud Data / Cloud Log – 18 seconds

Even on my internet connection, I expected this to be pretty snappy in each scenario.  When working with Azure Blob Storage, it does take some usage to get the disks hot.  It appears that a local log brings the performance back in line, even if your only data file is in Azure Blob Storage.  Presumably this is because the log buffer flush to the transaction log occurs locally.  When I have a bit more time, perhaps I will research whether Log Writer waits are occurring when the log is in Blob Storage. 
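If you want to check for those waits yourself, here is a minimal sketch against the wait stats DMV – WRITELOG is the wait type associated with waiting on log buffer flushes:

	select wait_type, waiting_tasks_count, wait_time_ms, max_wait_time_ms
	from sys.dm_os_wait_stats
	where wait_type = 'WRITELOG'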

Changing the default data location when SQL Server won’t start

So I suppose default data locations are important?  Updates were applied to a server recently, and someone included a SQL Server patch that upgraded master and msdb.  SQL Server would not start after a reboot, and it was a mystery as to what had happened.  In Event Viewer, we were presented with the following:

Cannot recover the master database. SQL Server is unable to run. Restore master from a full backup, repair it, or rebuild it. For more information about how to rebuild the master database, see SQL Server Books Online.

Script level upgrade for database ‘master’ failed because upgrade step ‘sqlagent100_msdb_upgrade.sql’ encountered error 598, state 1, severity 25. This is a serious error condition which might interfere with regular operation and the database will be taken offline. If the error happened during upgrade of the ‘master’ database, it will prevent the entire SQL Server instance from starting. Examine the previous errorlog entries for errors, take the appropriate corrective actions and re-start the database so that the script upgrade steps run to completion.

CREATE FILE encountered operating system error 5(Access is denied.) while attempting to open or create the physical file ‘G:\Logs\temp_MS_AgentSigningCertificate_database_log.LDF’.

CREATE FILE encountered operating system error 5(Access is denied.) while attempting to open or create the physical file ‘G:\Data\temp_MS_AgentSigningCertificate_database.mdf’.

So, G:\Data and G:\Logs were mount points that were set up as default locations for databases.  In order to resolve this issue, we needed to change the default data location to be the location where master was, which was still in the default installation location.  For me, that was:

G:\MSSQL10_50.<InstanceName>\MSSQL\DATA

To do this, we have to dive into the registry (regedit). 

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10_50.<Instance Name>\MSSQLServer

Key: DefaultData – needs to be changed from G:\Data to G:\MSSQL10_50.<Instance Name>\MSSQL\DATA

Key: DefaultLog – needs to be changed from G:\Logs to G:\MSSQL10_50.<Instance Name>\MSSQL\DATA

The “SQLPath” key under HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10_50.<Instance Name>\Setup is also set to the location where the installation has taken place: G:\MSSQL10_50.<Instance Name>\MSSQL

After changing these keys, I was able to start the SQL Server service.  It takes a few seconds for the upgrade scripts to complete, and then you have your instance back.  Now to run a query to see which instances have had their default locations for data and logs changed…
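Here is a rough sketch of how to check an instance’s defaults from T-SQL.  The InstanceDefaultDataPath and InstanceDefaultLogPath server properties only exist on newer versions (SQL Server 2012 and later), so for older instances like this one the (undocumented) xp_instance_regread can read the same registry values shown above:

	--SQL Server 2012 and later
	select
		serverproperty('InstanceDefaultDataPath') as DefaultData,
		serverproperty('InstanceDefaultLogPath') as DefaultLog

	--older versions: read the instance's registry keys directly
	exec master.dbo.xp_instance_regread
		N'HKEY_LOCAL_MACHINE',
		N'Software\Microsoft\MSSQLServer\MSSQLServer',
		N'DefaultData'

	exec master.dbo.xp_instance_regread
		N'HKEY_LOCAL_MACHINE',
		N'Software\Microsoft\MSSQLServer\MSSQLServer',
		N'DefaultLog'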

Using Azure Blob Storage for SQL Server Backups

With the release of SQL Server 2012 SP1 CU2, backing up databases directly to Azure Blob Storage is presented as an option for managing backups.  This provides a great opportunity to leverage the cloud for off-site, geo-redundant data storage for disaster recovery that is available from any location.  This article is an overview of how to get started using the cloud as a part of your DR plan. 

Setting up your storage

Step 1: Create account

Creating an Azure account is quick and easy, and there is currently a 1 month trial available at http://www.windowsazure.com.  Once you request your account, it may take 24 hours for it to be provisioned.  Keep this in mind as you will not be able to begin testing immediately.

Step 2: Add storage

Now that you have your account, you will need to add your storage.  This is quick and easy, with the only steps being to name your storage, pick a location, and choose whether your storage should be geo-redundant or locally redundant.

[Screenshot: Create Storage Account]

Step 3: Create container

In order to save any files into your storage, you will need to add a container.  To do this, click on the name of your newly created storage.  From here, you can monitor and change the configuration of your storage.  Click on the navigation for “Containers”.  In the “New Container” dialog, pick a name that describes the files that will be in the container.  For our purposes here, we can just call this “sqlbackups”.  You also have the choice of access types – generally speaking you will leave this at the default of “Private”. 

Now that you have a container created, you will see that the container has a URL.  This is the URL that you will use to point your backups to this container.  The format of this URL is [storage account name].blob.core.windows.net/[container].  While the URL shows http://, https:// will also work. 

[Screenshot: Storage Container]

Step 4: Manage access keys – created by default

When you are in your storage account on the dashboard, you will notice “Manage Access Keys” at the bottom of the screen.  Clicking this will take you to a screen with 2 access keys pre-generated for your storage account.  You will use these to create a credential in the next step.

[Screenshot: Manage Access Keys]

Creating your backup

Step 1: Create credential

In order to perform a backup to the cloud storage, you will need to create a credential on the SQL Server instance that you wish to take backups from.  The identity of the credential is the storage account name from Azure, and the secret is one of the access keys from step 4 above.

use master
go

CREATE CREDENTIAL BackupCredential WITH IDENTITY = 'blogdemostorage'
,SECRET = 'i/NCk4dWxUce0U+w5kPFT9m9p2UkT5ogWFYdyKEOqRNjLt32DFuu8NO7dMa6oNMqi/EZe2k+F/SuK9EbSkY1AA==' ;

go

Step 2: Backup the db

With your storage account and credential created, all that is left is to issue a backup command.  The biggest difference here is that instead of backing up TO DISK, you are backing up TO URL.  The URL specified in the command is the URL that is associated with the container that you created, with the name of the database backup file appended to it.  Authentication is handled by the credential.  Since you are transmitting this file to the cloud, it is recommended that you perform your backup with compression. 

BACKUP DATABASE AdventureWorks2012
TO URL = 'https://blogdemostorage.blob.core.windows.net/sqlbackups/AdventureWorks2012.bak'
      WITH CREDENTIAL = 'BackupCredential'
     ,COMPRESSION

After a refresh of your container in Azure, your backup file will be there.  Success!

You may choose to perform a backup using the SSMS GUI.  While backup TO URL exists in SQL Server 2012 SP1 CU2 and beyond, the ability to perform a backup in this manner through the GUI does not exist prior to SQL Server 2014.  After choosing URL instead of DISK as the “Back up to:” destination, you are prompted for the file name, credential, and container.  This will automatically generate the URL for backing up the database. 

[Screenshot: Backup to URL GUI]

Restoring a backup from Azure Blob Storage

Restoring a backup from Azure Blob Storage to your SQL Server instance (on-prem or Azure) is just as easy as taking the backup.  Your options are to download the file, or restore from URL.  To download the file, simply navigate in the Azure web portal to the container with your backups, and click “Download”.  You can then restore the file using commands exactly as you would with any other on-prem file.  If you choose to restore from URL, the modifications are the same as the changes when backing up – create the same credential as you did for taking the backup, and in your RESTORE statement specify the URL and the credential.

RESTORE DATABASE AdventureWorks2012
FROM URL = 'https://blogdemostorage.blob.core.windows.net/sqlbackups/AdventureWorks2012.bak'
WITH
	CREDENTIAL = 'BackupCredential';

 As with restoring a backup from disk, you can use MOVE to put the files in a different location.
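For example, here is a sketch of a restore from URL that relocates the files – the logical file names and local paths are placeholders that you would swap for your own:

RESTORE DATABASE AdventureWorks2012
FROM URL = 'https://blogdemostorage.blob.core.windows.net/sqlbackups/AdventureWorks2012.bak'
WITH
	CREDENTIAL = 'BackupCredential',
	MOVE 'AdventureWorks2012_Data' TO 'D:\Data\AdventureWorks2012.mdf',
	MOVE 'AdventureWorks2012_Log' TO 'E:\Logs\AdventureWorks2012_log.ldf';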

Adopting a cloud backup strategy:  Compatibility

If you choose to use Azure Blob Storage for your backup strategy, you may have an environment with instances that are not on SQL Server 2012 SP1 CU2 or later.  This is resolved using the SQL Server Backup to Azure Tool.  Simply install the tool, point it to a folder, and it manages the transmission of files in that folder to Azure Blob Storage.  This tool also allows you to compress and encrypt backups for transmission, where the version of SQL Server that you are running may not.

SQL Server 2014 – Delayed Transaction Durability

SQL Server 2014 has some great features, with my personal favorite being In-Memory OLTP (Hekaton).  However, in CTP2, a new feature was brought to light which has not received as much attention – Delayed Transaction Durability.  A common misconception is that this feature is only available when using In-Memory OLTP.  In reality, this is just another feature that can be put to use to reduce IO contention – in this case, contention on transaction log writes.  Additionally, this feature may be helpful when your application is experiencing waits due to locks: since delaying log durability results in faster commit times, locks are released faster.  The downside is that you may lose data if the log is not flushed to disk for any reason (you can manually execute a flush if needed).
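That manual flush is exposed as a system stored procedure in SQL Server 2014 – a one-liner flushes any delayed durable transactions in the current database’s log to disk:

EXEC sys.sp_flush_log;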

Deploying Delayed Durability

Step 1 – The Database

The database first needs to be configured for delayed durability.  You can force it for the entire database, disable it for the entire database, or leave it configurable at the transaction level.  This is done with an ALTER DATABASE statement.

ALTER DATABASE [DATABASE NAME] SET DELAYED_DURABILITY = {Disabled | Allowed | Forced}

These settings are fairly self-explanatory, but…

    1. Disabled – this means it can’t be used
    2. Forced – every transaction falls under delayed durability where it can (exceptions include things like distributed transactions, which will always be fully durable)
    3. Allowed – this means that each transaction can control whether it is fully durable or delayed durable
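For example, to leave the choice up to each transaction on a hypothetical [TestDB] database:

ALTER DATABASE [TestDB] SET DELAYED_DURABILITY = ALLOWED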

Step 2:  The Transaction

If the database allows transaction durability to be configured (delayed_durability = allowed on the database), we have 2 options.  The first is to configure this in a natively compiled stored procedure (In-Memory OLTP), and the second is in a standard TSQL commit.

In-Memory OLTP (the books online example, I will dig into this further in a subsequent blog post)

CREATE PROCEDURE <procedureName> …
WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER
AS BEGIN ATOMIC WITH
(
	DELAYED_DURABILITY = ON,
	TRANSACTION ISOLATION LEVEL = SNAPSHOT,
	LANGUAGE = N'English'
)
END

The piece that most people seem to be missing is that you can do this without using a natively compiled stored procedure or an in-memory table.  For any transaction (in a database that is marked delayed_durability = allowed), the commit statement can control the durability of the transaction.

BEGIN TRANSACTION

COMMIT WITH (DELAYED_DURABILITY = ON)

There are a few caveats to this.  For example, a distributed transaction will override this setting and proceed with delayed_durability = off.  For more on this, check out books online.

Summary

  1. Easy to deploy and use
  2. Use only if you can tolerate some data loss
  3. Use if you are commonly waiting on transaction log writes
  4. Use if you experience heavy contention and locks

My experience working with SQL Server 2014 on IO optimization to this point has been fantastic.  Between In-Memory OLTP and delayed durability, many of the typical IO bottlenecks can be alleviated.  Of course, there are trade-offs, but each project can now be dissected and analyzed as to whether these new features can be used to optimize the workload.  Whether it is a web application, an ETL process, or something else, I have already pinpointed a number of processes and applications that are prime candidates for these features.  Give them a try – I am sure you will find them useful as well!

SQL 2012 Copy/Paste to Excel Splits Rows

So you just moved to SQL 2012 – excellent news! The very next day, someone is running an ad hoc query, and copy/paste from the results grid is adding rows when they paste into Excel! The data hasn’t changed, so what is the cause? The answer is most likely that your data has carriage returns in it, and you just didn’t notice before. There is a behavioral difference in SQL Server 2012: when you copy data from the grid and paste it into Excel, the carriage returns are taken into account and the data is split into new rows. In previous versions of SQL Server, the carriage returns are not copied as such. The difference appears to stem from moving SSMS into the Visual Studio shell, which took place in SQL Server 2012.

Example:

declare @t table(col1 nvarchar(20))

insert into @t(col1)
values('test' + char(13) + char(13))

insert into @t(col1)
values('test2' + ' ' + char(13) + char(13) + 'test3')

select *
from @t

When you copy these results from SSMS in SQL 2008, and paste into Excel 2010 or Excel 2013, you will see 2 rows.

[Screenshot: results pasted into Excel from SSMS 2008 – 2 rows]

However, if you do the same using SQL 2012, you will have 6 rows. As you can see, the data has been split by the carriage returns.

[Screenshot: results pasted into Excel from SSMS 2012 – 6 rows]

How do we resolve this issue? No…you don’t have to go back to SQL 2008. For starters, you can clean this data in your query by simply replacing the carriage returns as such:

declare @t table(col1 nvarchar(20))

insert into @t(col1)
values('test' + char(13) + char(13))

insert into @t(col1)
values('test2' + ' ' + char(13) + char(13) + 'test3')

select 
	replace(col1, char(13), ''),
	replace(replace(col1, char(13), ''), char(10), '') --get rid of line feeds too
from @t

This solution produces the following:

[Screenshot: cleaned results pasted into Excel]

The great news here is that your data is actually cleaner than it was in SQL 2008. Notice that the extra spaces between “test2” and “test3” are now removed. The carriage returns in SQL 2008 displayed as blank spaces, so the quality of your data was impacted by them even if they did not split your rows. Remember, this was only cleansing for a single character that you did not want. At the very least, if you are experiencing this, you should probably check for line feeds next (char(10)) and replace those as well. You could also make a general-purpose UDF for stripping out all low-ASCII characters.
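As a rough sketch of that kind of general-purpose UDF (the name and the loop-based approach here are just one way to do it), this walks the string and keeps only characters at or above ASCII 32:

create function dbo.StripLowAscii (@input nvarchar(max))
returns nvarchar(max)
as
begin
	declare @i int = 1
	declare @output nvarchar(max) = ''

	--len() ignores trailing spaces, so pad and subtract to count every character
	while @i <= len(@input + 'x') - 1
	begin
		if unicode(substring(@input, @i, 1)) >= 32
			set @output = @output + substring(@input, @i, 1)

		set @i = @i + 1
	end

	return @output
end

You could then wrap your column in dbo.StripLowAscii() instead of nesting REPLACE() calls for each unwanted character.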

This brings us to our next point – Data Quality. Now that you have your workaround, the next step is to get a project together to clean up your old data, and also to clean your data inputs.

Other options here involve some analysis around the process you are using. Should you be running a query and copying the data into Excel? In some cases, other options may be available. For example, this issue does not exist when you load data into PowerPivot. Perhaps a Business Intelligence Semantic Model or PowerPivot Model would be a better solution moving forward.

Finally, feel free to vote up the suggestion on Connect related to this topic. Having too many rows is a much larger deal for most users than some extra whitespace, thus the manner in which this was handled prior to SQL 2012 seems to be the desired output.

Combining rows into comma delimited string

A frequent task in reporting is to combine rows together into a comma delimited string.  While this can be done in a loop or a cursor, there are other ways to perform this task.  If you are only dealing with a single result, there is a quick way to accomplish this with some shorthand:

/*
set up some data
*/

	declare @t table (id int identity(1,1), data varchar(50))
		insert into @t(data)
		values
			('a'),('b'),('c')

/* use a variable to roll up the data */

	declare @mystring varchar(100) = ''

	select
		@mystring = @mystring + ',' + data
	from @t

	select substring(@mystring, 2, len(@mystring)) --get rid of the first comma using SUBSTRING()
	select stuff(@mystring, 1, 1, '') --get rid of the first comma using STUFF()

The output in this case is “a,b,c”, combining the data from the 3 rows into one comma delimited string. Instead of writing the code to loop through the rows and append to the string, we simply have to remove the initial comma before returning the results. This example shows 2 ways to do that, using SUBSTRING() and using STUFF(). While this is a nice shortcut, this doesn’t handle all of the cases that we will be presented with over time. Specifically, we will again need to employ a loop if we need to group these strings in some manner. As it turns out, there is a method by which you can do this without the loop – use XML and STUFF().

To start, lets perform the same task as above, this time using XML and STUFF().

	declare @t table (id int identity(1,1), data varchar(50))
		insert into @t(data)
		values
			('a'),('b'),('c')

		select
			stuff
			(
				(
				select
					',' + data
				from @t
				for xml path('')
				)
			, 1, 1, '') as mystring

Our output in this case is the same: “a,b,c”.  Breaking this down, we can see that we are selecting all of the rows from the table variable.  The next step is to append a comma to the beginning of each piece of data.  Then, this data is rolled up into XML, which brings it onto a single row.  The role that STUFF() plays here is the same as in the first example – removing the initial comma. We used STUFF() in this case instead of SUBSTRING() because the data is XML. In order to use SUBSTRING(), we would have first had to convert the XML back to VARCHAR.

Now, we can look at an example where we have multiple members that will each have their own comma delimited string.  For this example, we will call that our “grouping key”.

	declare @t2 table (id int identity(1,1), grouping_key int, data varchar(50))
		insert into @t2(grouping_key, data)
		values
			(1, 'a'),(1, 'b'),(1, 'c'), (2, 'd'), (2, 'e')

	select
		grouping_key,
		stuff
		(
			(
			select
				',' + data
			from @t2
			where
				grouping_key = unique_groups.grouping_key
			for xml path('')
			)
		, 1, 1, '') as mystring
	from
		(
		select
			grouping_key
		from @t2
		) unique_groups
	group by
		unique_groups.grouping_key

What we have done here is to use the XML and STUFF() as an inline function to manipulate the data specific to the row we are returning. As such, the output is not a single value. Rather, it is a table of data with our comma delimited strings broken up into 2 rows by the “grouping key”:

grouping_key mystring
1 a,b,c
2 d,e

One quick note on using XML in this manner – if your data has ampersands in it, you will need to do some replacing to resolve those conflicts (changing “&amp;” back to “&” due to the XML conversion).
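One alternative, if you would rather not chase down those replacements yourself: adding TYPE to the FOR XML clause and pulling the result back out with .value() converts the XML to a string and un-escapes entities like “&amp;” for you. A quick sketch using the earlier pattern:

	declare @t table (id int identity(1,1), data varchar(50))
		insert into @t(data)
		values
			('a & b'),('c')

		select
			stuff
			(
				(
				select
					',' + data
				from @t
				for xml path(''), type
				).value('.', 'varchar(max)')
			, 1, 1, '') as mystring

The output here is “a & b,c”, with the ampersand intact.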

Enjoy!

SQL Server 2012 SP1 – MsiInstaller is going crazy!

Woah! I just installed SQL Server 2012 SP1 and MsiInstaller is attempting to reconfigure Management Studio constantly!

Event Viewer: Windows Installer reconfigured the product. Product Name: SQL Server 2012 Management Studio. Product Version: 11.1.3000.0. Product Language: 1033. Manufacturer: Microsoft Corporation. Reconfiguration success or error status: 0.

If this is happening to you, the good news is that there is a hotfix. I have had this happen on one of my SQL 2012 SP1 VMs, but I haven’t been able to exactly replicate how the issue came about. The issue has more documentation on Connect, and is resolved by requesting SQL Server 2012 SP1 CU2.

Splitting allocations across multiple records in a single statement

Today I was asked a question about how to appropriately split a quantity across multiple records, ordered by date.  I thought about this for a moment, and then rattled off a few ways that the general problem could be handled.  The first two were to loop through the data and to use a CTE, respectively.  Then the question arose as to whether it could be done in a single statement.  After a moment of pause, I channeled my inner Barney Stinson and proclaimed “Challenge Accepted!”.

As an example of working through this, let’s say this was a payment that needed to be allocated across multiple invoices.  We can set up this data as follows (run this in conjunction with each of the scripts below to set up the data):

declare @invoice Table (invoice_id int, invoiced_amount money, invoice_date date)

	insert into @invoice(invoice_id, invoiced_amount, invoice_date)
	values
		(1, 25, '2012-1-1'),
		(2, 50, '2012-1-2'),
		(3, 200, '2012-1-2'),
		(4, 100, '2012-1-3'),
		(5, 500, '2012-1-4')

declare @payment money = 225

We now have 5 invoices across a few days, and a payment of 225 that we need to allocate across them. Let’s say that our method of applying this payment is by date and order created.

Example 1: Using a While Loop
This is the most common method I have encountered to resolve this problem. We simply loop through the data and keep track of the payment remaining in a variable, while applying the payment information in a table to keep track of what has been done. When there is no longer any payment remaining to apply, we are done.

	declare @accounts_recievable table(invoice_id int, invoice_date date, invoiced_amount money, paid_amount money, remaining_amount as (invoiced_amount - paid_amount))
		insert into @accounts_recievable(invoice_id, invoice_date, invoiced_amount, paid_amount)
		select
			invoice_id,
			invoice_date,
			invoiced_amount,
			0 as paid_amount
		from @invoice

	declare @invoice_id int = 1
	declare @working_payment money = @payment
	declare @working_payment_remaining money = 0

	while @working_payment > 0
	begin
		--find the total amount of the current row
			select
				@working_payment_remaining = @working_payment - invoiced_amount
			from @accounts_recievable t
			where
				t.invoice_id = @invoice_id

		--update the paid amount for the row
			update t
			set t.paid_amount =
				case
					when @working_payment_remaining > 0 then t.invoiced_amount
					else @working_payment
				end
			from @accounts_recievable t
			where
				t.invoice_id = @invoice_id

		set @working_payment = @working_payment_remaining
		set @invoice_id = @invoice_id + 1
	end

	select *
	from @accounts_recievable

Output:

invoice_id invoice_date invoiced_amount paid_amount remaining_amount
1 1/1/2012 25 25 0
2 1/2/2012 50 50 0
3 1/2/2012 200 150 50
4 1/3/2012 100 0 100
5 1/4/2012 500 0 500

Example 2: Using a CTE
With a CTE, we are essentially looping through the data as well. This example assumes an integer invoice_id that is used for ordering, but you could also modify it to use a row number to get an ordered data set.  The first select in the CTE determines the results for the first row.  The second select uses the results from the current row in conjunction with the results from the previous row to determine the running totals.

;with cte(invoice_id, invoice_date, invoiced_amount, paid_amount, payment_remaining)

as

	(
	select
		invoice_id,
		invoice_date,
		invoiced_amount,
		case
			when @payment - invoiced_amount > 0 then invoiced_amount
			else @payment
		end as paid_amount,
		case
			when @payment - invoiced_amount > 0 then @payment - invoiced_amount
			else 0
		end as payment_remaining
	from @invoice i
	where
		invoice_id = 1

	union all

	select
		i.invoice_id,
		i.invoice_date,
		i.invoiced_amount,
		case
			when t.payment_remaining - i.invoiced_amount > 0 then i.invoiced_amount
			else t.payment_remaining
		end as paid_amount,
		case
			when t.payment_remaining - i.invoiced_amount > 0 then t.payment_remaining - i.invoiced_amount
			else 0
		end as payment_remaining
	from cte t
	inner join @invoice i
		on i.invoice_id = t.invoice_id + 1
	)

select
	invoice_id,
	invoice_date,
	invoiced_amount,
	paid_amount,
	invoiced_amount - paid_amount as remaining_amount,
	payment_remaining
from cte

Output:

invoice_id invoice_date invoiced_amount paid_amount remaining_amount payment_remaining
1 1/1/2012 25 25 0 200
2 1/2/2012 50 50 0 150
3 1/2/2012 200 150 50 0
4 1/3/2012 100 0 100 0
5 1/4/2012 500 0 500 0

Example 3: Using SQL Server 2012 Windowing Functions
SQL Server 2012 opens up a variety of new windowing functions.  My favorite is being able to set the boundaries on your function such that you can control your running aggregate.  While we can easily reference the data from the previous row in a CTE or a While Loop, this is more difficult when you are not looping over the data.  However, now that we can set our window boundaries to shift the aggregate, it becomes easier.  In the example below, the “Running total query” is to display the results of each running total.  The keys here are the functions for limiting the boundaries.

Running total query:

	select
		invoice_id,
		invoice_date,
		invoiced_amount,
		sum(invoiced_amount) over (order by invoice_date, invoice_id rows unbounded preceding) as total_applied,
		@payment - sum(invoiced_amount) over (order by invoice_date, invoice_id rows unbounded preceding) as payment_remaining,
		@payment - sum(invoiced_amount) over (order by invoice_date, invoice_id rows between unbounded preceding and 1 preceding) as prior_payment_remaining --shift one left to get the amount previous for the first negative.  the previous amount is the remainder that would subtract from the next invoiced total
	from @invoice

Output:

invoice_id invoice_date invoiced_amount total_applied payment_remaining prior_payment_remaining
1 1/1/2012 25 25 200 NULL
2 1/2/2012 50 75 150 200
3 1/2/2012 200 275 -50 150
4 1/3/2012 100 375 -150 -50
5 1/4/2012 500 875 -650 -150

The first running total (total_applied) uses “rows unbounded preceding”.  This will sum everything from the first row (using the order specified) up to and including the current row.

The second running total (payment_remaining) is like the first, only changed to be the difference of the payment vs. the invoiced amount.

While the first 2 running totals use the default of “current row” as the right boundary, the third running total (prior_payment_remaining) gives us the running total through the previous row by performing the SUM from the first row (“unbounded preceding”) through the row prior to the current row (“1 preceding”).  The importance here is that when applying payment to invoices, there is a specific row at which you can only apply a partial payment and you need to know what the amount is that you can apply.  This happens to be the row where the payment_remaining turns negative. The true amount we want to subtract is the total that was remaining in the previous row, thus the shift in prior_payment_remaining.

Applying the payment using the running totals:

	select
		invoice_id,
		invoice_date,
		invoiced_amount,
		case
			when payment_remaining > 0 then invoiced_amount
			when payment_remaining <= 0 and prior_payment_remaining > 0 then prior_payment_remaining
			else 0
		end as paid_amount,
		case
			when payment_remaining > 0 then 0 --if the applied payment leaves you > 0, then the entire payment can be applied
			when payment_remaining <= 0 and prior_payment_remaining > 0 then invoiced_amount - prior_payment_remaining
			else invoiced_amount
		end as amount_remaining
	from
		(
		select
			invoice_id,
			invoice_date,
			invoiced_amount,
			sum(invoiced_amount) over (order by invoice_date, invoice_id rows unbounded preceding) as total_applied,
			@payment - sum(invoiced_amount) over (order by invoice_date, invoice_id rows unbounded preceding) as payment_remaining,
			@payment - sum(invoiced_amount) over (order by invoice_date, invoice_id rows between unbounded preceding and 1 preceding) as prior_payment_remaining --shift one left to get the amount previous for the first negative.  the previous amount is the remainder that would subtract from the next invoiced total
		from @invoice
		) t

Output:

invoice_id invoice_date invoiced_amount paid_amount amount_remaining
1 1/1/2012 25 25 0
2 1/2/2012 50 50 0
3 1/2/2012 200 150 50
4 1/3/2012 100 0 100
5 1/4/2012 500 0 500

There we have it! 3 ways to achieve the same results.  The last of these methods satisfies allocating this payment across multiple invoices with a single statement.  While many developers and DBAs are familiar with the concept of ROW_NUMBER(), few have taken the time to understand the variety of windowing functions and options that are truly available.  In this case, the usage of “rows unbounded preceding” and “rows between unbounded preceding and 1 preceding” gives us the ability to manipulate our window for the aggregated payment allocations.  This functionality has proven very useful for me, and for those who have taken the time to understand the power that can be derived from applying it.

Using a SQL Server Stored Procedure as a PowerPivot Data Source

I was recently having a debate with an individual about whether you can use a SQL Server stored procedure as a source of data in PowerPivot, or in a Business Intelligence Semantic Model. The answer is “yes, you can”. However, you may encounter an error along the way. Let’s try this example, using the stored procedure below:

create procedure dbo.testproc

as
select 1 as data

  1. In PowerPivot, choose “Get External Data”
  2. Choose SQL Server and enter your connection information
  3. Choose to “write a query that will specify the data to import”
  4. In the SQL Statement, execute the stored procedure – exec dbo.testproc

Success! However, what if you have some more complicated logic, such as using a table variable or temporary table in your sproc? Shouldn’t the results be the same? Let’s change the stored procedure and see.

alter procedure dbo.testproc

as

declare @t table(data int)
insert into @t(data)
values(1)

select data
from @t

Now try to reload your data in PowerPivot with a refresh, and the following error will present itself: We couldn’t refresh the connection. Please go to existing connections and verify they connect to the file or server.  Okay, so now we have an error…not very intuitive as to how to fix it. The stored procedure used to run fine, and if you run it in SSMS it runs fine. The problem here is that we need to modify the stored procedure to SET NOCOUNT ON. So let’s try it again.

alter procedure dbo.testproc

as

set nocount on

declare @t table(data int)
insert into @t(data)
values(1)

select data
from @t

Reload the data in PowerPivot again using refresh – SUCCESS!  While this may not be presented with an error that points you towards a resolution, using a stored procedure to load data in to PowerPivot does work when you have complex logic.

Excel 2013 – Where is PowerPivot?

I just installed Office 2013 – where did PowerPivot go? Well, it isn’t a download that you need to find and install now. Rather, it is an Add-In that you need to enable.

  1. Open Excel
  2. File -> Options -> Add-Ins
  3. Under “Inactive Application Add-ins” you will notice that PowerPivot is listed. Hence, it is “Inactive” post install.
  4. Since it is listed as a “COM Add-in”, select “COM Add-ins” in the “Manage” drop-down at the bottom of this screen, and click “Go…”
  5. Here you can modify the state of the Add-in. Check the box and click “OK”. While you are at it, I would also suggest enabling “Power View” for visualizations.

Why you cannot just toggle the Add-ins from inactive to active without the “manage” step is a bit odd, but it works. Perhaps in the future they will let you double click, right click, check a box, or just drag the add-in to the other area without this extra step.