Splitting allocations across multiple records in a single statement

Today I was asked a question about how to appropriately split a quantity across multiple records, ordered by date.  I thought about this for a moment, and then rattled off a few ways the general problem could be handled.  The first two were to loop through the data and to use a recursive CTE, respectively.  Then the question arose as to whether it could be done in a single statement.  After a moment of pause, I channeled my inner Barney Stinson and proclaimed “Challenge Accepted!”

As an example of working through this, let’s say this was a payment that needed to be allocated across multiple invoices.  We can set up the data as follows (run this in conjunction with each of the scripts below):

declare @invoice Table (invoice_id int, invoiced_amount money, invoice_date date)

	insert into @invoice(invoice_id, invoiced_amount, invoice_date)
	values
		(1, 25, '2012-1-1'),
		(2, 50, '2012-1-2'),
		(3, 200, '2012-1-2'),
		(4, 100, '2012-1-3'),
		(5, 500, '2012-1-4')

declare @payment money = 225

We now have 5 invoices across a few days, and a payment of 225 that we need to allocate across them. Let’s say that our method of applying the payment is by invoice date, and then by the order in which the invoices were created (invoice_id).

Example 1: Using a While Loop
This is the most common method I have encountered for this problem. We simply loop through the data, keeping track of the remaining payment in a variable and recording the applied amounts in a table so we know what has been done. When there is no longer any payment remaining to apply, we are done.

	declare @accounts_recievable table(invoice_id int, invoice_date date, invoiced_amount money, paid_amount money, remaining_amount as (invoiced_amount - paid_amount))
		insert into @accounts_recievable(invoice_id, invoice_date, invoiced_amount, paid_amount)
		select
			invoice_id,
			invoice_date,
			invoiced_amount,
			0 as paid_amount
		from @invoice

	declare @invoice_id int = 1
	declare @working_payment money = @payment
	declare @working_payment_remaining money = 0

	--stop when the payment is consumed or there are no invoices left (an overpayment would otherwise loop forever)
	while @working_payment > 0 and exists (select 1 from @accounts_recievable t where t.invoice_id = @invoice_id)
	begin
		--find the total amount of the current row
			select
				@working_payment_remaining = @working_payment - invoiced_amount
			from @accounts_recievable t
			where
				t.invoice_id = @invoice_id

		--update the paid amount for the row
			update t
			set t.paid_amount =
				case
					when @working_payment_remaining > 0 then t.invoiced_amount
					else @working_payment
				end
			from @accounts_recievable t
			where
				t.invoice_id = @invoice_id

		set @working_payment = @working_payment_remaining
		set @invoice_id = @invoice_id + 1
	end

	select *
	from @accounts_recievable

Output:

invoice_id invoice_date invoiced_amount paid_amount remaining_amount
1 1/1/2012 25 25 0
2 1/2/2012 50 50 0
3 1/2/2012 200 150 50
4 1/3/2012 100 0 100
5 1/4/2012 500 0 500

Example 2: Using a CTE
With a recursive CTE, we are essentially looping through the data as well. This example assumes an integer invoice_id that is used for ordering, but you could also modify it to use a row number to get an ordered data set (a sketch of that variant follows the output below).  The first select in the CTE determines the results for the first row.  The second select uses the results from the previous row in conjunction with the current row to determine the running totals.

;with cte(invoice_id, invoice_date, invoiced_amount, paid_amount, payment_remaining)
as
	(
	select
		invoice_id,
		invoice_date,
		invoiced_amount,
		case
			when @payment - invoiced_amount > 0 then invoiced_amount
			else @payment --the whole payment goes to this invoice when it does not cover the full invoiced amount
		end as paid_amount,
		case
			when @payment - invoiced_amount > 0 then @payment - invoiced_amount
			else 0
		end as payment_remaining
	from @invoice i
	where
		invoice_id = 1

	union all

	select
		i.invoice_id,
		i.invoice_date,
		i.invoiced_amount,
		case
			when t.payment_remaining - i.invoiced_amount > 0 then i.invoiced_amount
			else t.payment_remaining
		end as paid_amount,
		case
			when t.payment_remaining - i.invoiced_amount > 0 then t.payment_remaining - i.invoiced_amount
			else 0
		end as payment_remaining
	from cte t
	inner join @invoice i
		on i.invoice_id = t.invoice_id + 1
	)

select
	invoice_id,
	invoice_date,
	invoiced_amount,
	paid_amount,
	invoiced_amount - paid_amount as remaining_amount,
	payment_remaining
from cte

Output:

invoice_id invoice_date invoiced_amount paid_amount remaining_amount payment_remaining
1 1/1/2012 25 25 0 200
2 1/2/2012 50 50 0 150
3 1/2/2012 200 150 50 0
4 1/3/2012 100 0 100 0
5 1/4/2012 500 0 500 0
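
As mentioned above, if invoice_id cannot be relied upon as a contiguous ordering key, you can drive the recursion off a row number instead. Below is a minimal sketch of that variant, assuming the same @invoice and @payment setup from the top of the post; the @ordered table variable is simply an illustrative staging step for the row numbers.

	declare @ordered table(rn int, invoice_id int, invoice_date date, invoiced_amount money)

	--materialize a contiguous row number in the desired payment-application order
	insert into @ordered(rn, invoice_id, invoice_date, invoiced_amount)
	select
		row_number() over (order by invoice_date, invoice_id),
		invoice_id,
		invoice_date,
		invoiced_amount
	from @invoice

	;with cte(rn, invoice_id, invoice_date, invoiced_amount, paid_amount, payment_remaining)
	as
		(
		select
			rn,
			invoice_id,
			invoice_date,
			invoiced_amount,
			case when @payment - invoiced_amount > 0 then invoiced_amount else @payment end,
			case when @payment - invoiced_amount > 0 then @payment - invoiced_amount else 0 end
		from @ordered
		where rn = 1

		union all

		select
			i.rn,
			i.invoice_id,
			i.invoice_date,
			i.invoiced_amount,
			case when t.payment_remaining - i.invoiced_amount > 0 then i.invoiced_amount else t.payment_remaining end,
			case when t.payment_remaining - i.invoiced_amount > 0 then t.payment_remaining - i.invoiced_amount else 0 end
		from cte t
		inner join @ordered i
			on i.rn = t.rn + 1
		)
	select
		invoice_id,
		invoice_date,
		invoiced_amount,
		paid_amount,
		invoiced_amount - paid_amount as remaining_amount,
		payment_remaining
	from cte

The output matches the CTE example above.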

Example 3: Using SQL Server 2012 Windowing Functions
SQL Server 2012 opens up a variety of new windowing functions and framing options.  My favorite is being able to set the frame boundaries on a window function so that you can control your running aggregate.  While we can easily reference the data from the previous row in a CTE or a while loop, this is more difficult when you are not looping over the data.  However, now that we can set our window boundaries to shift the aggregate, it becomes easier.  In the example below, the “Running total query” displays the results of each running total; the key is the clauses that limit the frame boundaries.

Running total query:

	select
		invoice_id,
		invoice_date,
		invoiced_amount,
		sum(invoiced_amount) over (order by invoice_date, invoice_id rows unbounded preceding) as total_applied,
		@payment - sum(invoiced_amount) over (order by invoice_date, invoice_id rows unbounded preceding) as payment_remaining,
		@payment - sum(invoiced_amount) over (order by invoice_date, invoice_id rows between unbounded preceding and 1 preceding) as prior_payment_remaining --shift the frame back one row to get the remainder available before the current invoice; that prior remainder is the partial amount to apply on the row where the running total first turns negative
	from @invoice

Output:

invoice_id invoice_date invoiced_amount total_applied payment_remaining prior_payment_remaining
1 1/1/2012 25 25 200 NULL
2 1/2/2012 50 75 150 200
3 1/2/2012 200 275 -50 150
4 1/3/2012 100 375 -150 -50
5 1/4/2012 500 875 -650 -150

The first running total (total_applied) uses “rows unbounded preceding”.  This will sum everything from the first row (using the order specified) up to and including the current row.

The second running total (payment_remaining) is like the first, only changed to be the difference of the payment vs. the invoiced amount.

While the first two running totals use the default of “current row” as the right boundary, the third running total (prior_payment_remaining) gives us the running total through the previous row by performing the SUM from the first row (“unbounded preceding”) through the row prior to the current row (“1 preceding”).  This matters because, when applying a payment to invoices, there is a specific row at which only a partial payment can be applied, and you need to know what that amount is.  That happens to be the row where payment_remaining first turns negative, and the amount to apply there is the total that was remaining in the previous row, hence the shift in prior_payment_remaining.
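
As an aside, SQL Server 2012 also adds LAG(), which offers another way to read the previous row’s running total.  A quick sketch against the same @invoice and @payment setup (window functions cannot be nested, so the running total is computed in a derived table first):

	select
		invoice_id,
		payment_remaining,
		lag(payment_remaining) over (order by invoice_date, invoice_id) as prior_payment_remaining --NULL for the first row, since no prior row exists
	from
		(
		select
			invoice_id,
			invoice_date,
			@payment - sum(invoiced_amount) over (order by invoice_date, invoice_id rows unbounded preceding) as payment_remaining
		from @invoice
		) t

This produces the same prior_payment_remaining values as the framed SUM in the running total query above.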

Applying the payment using the running totals:

	select
		invoice_id,
		invoice_date,
		invoiced_amount,
		case
			when payment_remaining > 0 then invoiced_amount
			when payment_remaining <= 0 and prior_payment_remaining > 0 then prior_payment_remaining
			else 0
		end as paid_amount,
		case
			when payment_remaining > 0 then 0 --if there is still payment remaining after this invoice, the invoice is fully covered
			when payment_remaining <= 0 and prior_payment_remaining > 0 then invoiced_amount - prior_payment_remaining
			else invoiced_amount
		end as amount_remaining
	from
		(
		select
			invoice_id,
			invoice_date,
			invoiced_amount,
			sum(invoiced_amount) over (order by invoice_date, invoice_id rows unbounded preceding) as total_applied,
			@payment - sum(invoiced_amount) over (order by invoice_date, invoice_id rows unbounded preceding) as payment_remaining,
			isnull(@payment - sum(invoiced_amount) over (order by invoice_date, invoice_id rows between unbounded preceding and 1 preceding), @payment) as prior_payment_remaining --running remainder through the previous row; isnull covers the first row, where no rows precede and the full payment is still available
		from @invoice
		) t

Output:

invoice_id invoice_date invoiced_amount paid_amount amount_remaining
1 1/1/2012 25 25 0
2 1/2/2012 50 50 0
3 1/2/2012 200 150 50
4 1/3/2012 100 0 100
5 1/4/2012 500 0 500

There we have it: three ways to achieve the same results.  The last of these methods satisfies allocating the payment across multiple invoices with a single statement.  While many developers and DBAs are familiar with ROW_NUMBER(), few have taken the time to understand the variety of windowing functions and options that are truly available.  In this case, “rows unbounded preceding” and “rows between unbounded preceding and 1 preceding” give us the ability to manipulate our window for the aggregated payment allocations.  This functionality has proven very useful for me, and for those who have taken the time to understand the power that can be derived from applying it.


Using a SQL Server Stored Procedure as a PowerPivot Data Source

I was recently having a debate with an individual about whether you can use a SQL Server stored procedure as a source of data in PowerPivot, or in a Business Intelligence Semantic Model. The answer is “yes, you can”. However, you may encounter an error along the way. Let’s try this example, using the stored procedure below:

create procedure dbo.testproc
as
select 1 as data

  1. In PowerPivot, choose “Get External Data”
  2. Choose SQL Server and enter your connection information
  3. Choose to “write a query that will specify the data to import”
  4. In the SQL Statement, execute the stored procedure – exec dbo.testproc

Success! However, what if you have some more complicated logic, such as using a table variable or temporary table in your sproc? Shouldn’t the results be the same? Let’s change the stored procedure and see.

alter procedure dbo.testproc
as

declare @t table(data int)
insert into @t(data)
values(1)

select data
from @t

Now try to reload your data in PowerPivot with a refresh, and the following error will present itself: “We couldn’t refresh the connection. Please go to existing connections and verify they connect to the file or server.”  Okay, so now we have an error that is not very intuitive to fix. The stored procedure used to work fine, and if you run it in SSMS it still runs fine. The problem is that the INSERT now produces an extra “rows affected” message, which interferes with the import, so we need to modify the stored procedure to SET NOCOUNT ON and suppress it. Let’s try it again.

alter procedure dbo.testproc
as
set nocount on

declare @t table(data int)
insert into @t(data)
values(1)

select data
from @t

Reload the data in PowerPivot again using refresh – SUCCESS!  While the error does not point you towards a resolution, using a stored procedure to load data into PowerPivot does work, even when it contains more complex logic.

Excel 2013 – Where is PowerPivot?

I just installed Office 2013 – where did PowerPivot go? Well, it isn’t a download that you need to find and install now. Rather, it is an Add-In that you need to enable.

  1. Open Excel
  2. File -> Options -> Add-Ins
  3. Under “Inactive Application Add-ins” you will notice that PowerPivot is listed; in other words, it is inactive after installation.

  4. Since it is listed as a “COM Add-in”, choose “COM Add-ins” in the “Manage” drop-down at the bottom of this screen, and click “Go…”
  5. Here you can modify the state of the add-in. Check the box for PowerPivot and click “OK”. While you are at it, I would also suggest enabling “Power View” for visualizations.

It is a bit odd that you cannot just toggle the add-ins from inactive to active without the “Manage” step, but it works. Perhaps in the future they will let you double-click, right-click, check a box, or just drag the add-in to the other area without this extra step.

Cross Apply – Yes, you can use a table-valued function without a cursor

I ran into a scenario where I was informed that “table-valued functions are a pain to use because you need to use a cursor”. The problem was that the individual had a blob of text that they wished to break up using a set delimiter, and they wished to use T-SQL. Since I usually reply to questions with an answer, plus an “also remove the cursor, please”, developers have become accustomed to writing queries without cursors. So, I suppose this was a topic that needed some attention at some point.

The answer to this dilemma is that you certainly can use a UDF to break up the blob of text and return the information without employing a cursor. To do this, we can use CROSS APPLY, which will break up the data and return the relevant rows for us.

First, we create a UDF, in this case a splitter function.  There are examples of these everywhere; I threw this one together quickly.  Pass in some text and a single-character delimiter, and get back the split-up results – you get the idea.

create function [dbo].[MySplitter]
	(@Text nvarchar(max), @Delimiter char(1))
returns @Table table(ordinal int identity(1,1), data nvarchar(500))
as
    begin

		declare @working_text nvarchar(max) = @text
		declare @token nvarchar(max) = null
		declare @next_delimiter_pos int = 0

		set @next_delimiter_pos = charindex(@delimiter, @working_text)

		while (@next_delimiter_pos != 0)
		begin
			set @token = ltrim(left(@working_text, @next_delimiter_pos - 1))
			set @working_text = right(@working_text, len(@working_text) - @next_delimiter_pos)

			insert into @table(data)
			values(@token)

			set @next_delimiter_pos = charindex(@delimiter, @working_text)
		end

		if(@next_delimiter_pos = 0 and len(@working_text) > 0)
		begin
			insert into @table(data)
			select ltrim(@working_text)
		end

    return
end

Now, we can use this UDF to demonstrate how data can be returned using CROSS APPLY:

declare @t table(textblob nvarchar(100))

insert into @t
values
       ('Hello - Goodbye'),
       ('Yesterday - Today - Tomorrow')

select
       t.textblob,
       s.ordinal,
       s.data
from @t t
cross apply dbo.mysplitter(t.textblob,'-') s

Results:

textblob                     ordinal data
Hello - Goodbye              1       Hello
Hello - Goodbye              2       Goodbye
Yesterday - Today - Tomorrow 1       Yesterday
Yesterday - Today - Tomorrow 2       Today
Yesterday - Today - Tomorrow 3       Tomorrow

In this case, at least one value is returned for each of the text blobs, so the UDF never returns an empty result set. If it could, and we still wanted to keep those rows (with NULLs in the function’s columns), we would use OUTER APPLY instead of CROSS APPLY.
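
To illustrate, here is a quick sketch (assuming the same dbo.MySplitter from above) with an added row whose blob is empty, so the splitter returns no rows for it:

declare @t2 table(textblob nvarchar(100))

insert into @t2
values
       ('Hello - Goodbye'),
       ('') --the splitter returns zero rows for an empty blob

--cross apply would drop the empty-blob row entirely;
--outer apply keeps it, with NULL for ordinal and data
select
       t.textblob,
       s.ordinal,
       s.data
from @t2 t
outer apply dbo.mysplitter(t.textblob, '-') s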

SQL Server String Comparison with Trailing White Space

AKA: When “=” does not mean “=”

Sounds a bit silly, but it causes a lot of confusion for developers.  Take the following example in SQL Server:

declare @t nvarchar(20) = 'aaaa   '
declare @t2 nvarchar(20) = 'aaaa'

if(@t=@t2)
	print 'yes'
else
	print 'no'

Are you expecting the output to be “no”?  The output will be “yes”, contrary to what most developers expect.  Often, someone will try to detect trailing whitespace with a check such as:

if(data!=rtrim(data))

This will not produce the expected results, however.  When comparing strings with =, SQL Server follows the ANSI SQL standard and pads the shorter string with spaces to the length of the longer one, so trailing whitespace is effectively ignored in the comparison, even when rtrim() is not used.

Because the = comparison will never reveal the trailing spaces on its own, you need to perform a length comparison instead…but pick the right one.

declare @t nvarchar(20) = 'aaaa   '
declare @t2 nvarchar(20) = 'aaaa'

select len(@t)         --4: trailing blanks are excluded
select len(@t2)        --4

select datalength(@t)  --14: 7 characters * 2 bytes per nvarchar character
select datalength(@t2) --8: 4 characters * 2 bytes

As Books Online indicates, LEN() “Returns the number of characters of the specified string expression, excluding trailing blanks.”  DATALENGTH(), however, keeps the trailing spaces and takes them into account when returning the number of bytes in the expression.  One thing to note is that if you are dealing with CHAR instead of VARCHAR data, DATALENGTH() will return the same fixed length for both values (20 in this example, 40 if NCHAR).
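
If you truly need an equality check that respects trailing whitespace, one approach is to combine the standard comparison with a DATALENGTH check; a minimal sketch:

declare @t nvarchar(20) = 'aaaa   '
declare @t2 nvarchar(20) = 'aaaa'

--the datalength check catches the trailing spaces that = ignores
if (@t = @t2 and datalength(@t) = datalength(@t2))
	print 'equal, including trailing whitespace'
else
	print 'not equal when trailing whitespace is considered'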

The majority of cases dealing with strings may not have to deal with trailing white space.  But, if you happen upon a scenario where you need to check for equality inclusive of white space, this is an important distinction to have knowledge of.

SQL 2012 – Unique Constraint Violation Reporting

I wrote previously about some enhancements to the information in the SQL Server log, whereby you can tell which database failed to be connected to. That doesn’t appear to be the only enhancement I have stumbled across while working with SQL 2012. When a unique constraint is violated, you not only find out about the violation, but the value that caused it is also indicated. This message appears as long as you are running the 2012 tools; you will get the same feedback when connecting to a 2008 R2 instance as well.

For example:

create table #temp
	(
	id int,
	constraint pk_temp primary key clustered
		(
		id
		)
	)

insert into #temp(id) values(1)
insert into #temp(id) values(1)

Error:

(1 row(s) affected)

Msg 2627, Level 14, State 1, Line 11

Violation of PRIMARY KEY constraint ‘pk_temp’. Cannot insert duplicate key in object ‘dbo.#temp’. The duplicate key value is (1).

The statement has been terminated.

While you may still have to go back and group the data to find all of the values that violate the constraint, this is quite a nice enhancement for a quick spot check of the data.
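
For that spot check, a simple GROUP BY/HAVING query will surface every duplicate value, not just the first one reported by the error.  A sketch against a hypothetical set of incoming rows:

declare @incoming table(id int)

insert into @incoming(id)
values(1),(1),(2),(2),(3)

--every value that would violate the unique constraint, with how many times it appears
select id, count(*) as duplicate_count
from @incoming
group by id
having count(*) > 1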

Understanding Draft Pick Valuation Offset in Keeper Leagues

In a serious keeper league, you will eventually end up at a point where trading draft picks is brought up as something that would be fun. However, figuring out how to value a pick is not something that is easily decided upon. Every league has different rules, categories, etc., but one thing that is common to just about every league is that players can be ranked to determine who is the best and who is the worst. With that as a guideline, some analysis can be done to determine how valuable a pick is vs. another pick. I will go into further detail on that at another time. What will be analyzed here is the difference in round value due to the chaotic nature of keepers being pulled out of the draft pool.

Below is an example from a 10 team keeper league, where we were between seasons and trades were being discussed after keepers were selected. The question being asked wasn’t whether pick 1 is better than pick 2, rather, how do the rounds compare to each other based on value?

Setup:

First, determine the rank for every player (I know pitchers and batters have different stats, but they still have an overall valuation that gives them a rank in the scope of all players; we don’t need to know the raw stats for this example). Now, remove all keepers if they have not already been removed. At this point, you are left with only players that can be drafted. Assign a sequential draft pick number to the remaining players, ordered by rank, to simulate where they would be drafted (the player with the best rank would be round 1, pick 1).

Analysis:

Since people keep players from all over the ranks, there are gaps. Through this arbitrage between rank and overall draft pick # (in a 10 team league, round 2 pick 1 would be draft pick # 11) we can determine the value. For each draft position, determine the difference between rank and draft pick #.

Now that we have a general metric for determining value beyond draft slot, we need to look at it and see what it means.

1. If there were no gaps in picks, the metric would be the same for all picks.  For example: rank 1 – pick 1 = 0, rank 100 – pick 100 = 0.  If there is an offset, such as 50 total keepers, then the best available player is rank 51: rank 51 – pick 1 = 50, rank 52 – pick 2 = 50.

2. Just as the #1 pick is better than the #2 pick, a smaller number is better than a larger number with this metric. If a player with rank 35 is the best available, and the second best is rank 40, the valuation offset for pick #1 is 34 and for pick #2 is 38. A quick T-SQL sketch of this computation follows.
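
Here is a minimal T-SQL sketch of that computation, assuming a hypothetical @players table with an overall rank and a keeper flag (the table shape and names are illustrative):

declare @players table(player_id int, overall_rank int, is_keeper bit)
declare @team_count int = 10

--load @players with every ranked player and flag the keepers, then:
;with draft_pool as
	(
	select
		overall_rank,
		row_number() over (order by overall_rank) as draft_pick --resequenced draft order of the remaining players
	from @players
	where is_keeper = 0
	)
select
	((draft_pick - 1) / @team_count) + 1 as draft_round,
	avg(cast(overall_rank - draft_pick as decimal(10, 1))) as avg_offset,
	min(overall_rank - draft_pick) as min_offset,
	max(overall_rank - draft_pick) as max_offset
from draft_pool
group by ((draft_pick - 1) / @team_count) + 1
order by draft_round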

Below is what the first 15 rounds of a draft could look like by draft pick value offset:

Round Avg  Min Max
1     48.3 35  53
2     58.5 56  62
3     66.2 62  68
4     71.6 68  76
5     76.2 76  77
6     78.4 77  79
7     80   80  80
8     80.7 80  82
9     82   82  82
10    82.2 82  83
11    83   83  83
12    83.4 83  84
13    84.8 84  85
14    85.2 85  87
15    87   87  87

In chart form:

[Chart: draft pick value offset by round, based on the table above]

As you can see, the difference between rounds is greater at the beginning of the draft. This is due to people keeping players from all over the rankings and throwing others back into the draft pool. Thus, by rank, the first few rounds provide far more added value per draft position than the later rounds. Once round 6 rolls around, there is not a large arbitrage due to keepers. This was to be expected, since those rounds (in this league) are filled with players who would not warrant being kept.

I will be posting some examples soon that use regression analysis and WAR instead of rank to get a more in-depth draft pick valuation.