Using SQL Server Lag and Lead Functions to Show Data Backup Growth

by

I do a small amount of reporting on my database systems to see how different things are performing and attempt to flag issues before they become problems. Recently I came across and started using the SQL Server Lag and Lead functions as part of this proactive process.  I’ve found that by using these functions you can quickly and efficiently generate data that is meaningful especially when comparing various runs of data to each other.

What are the SQL Server Lag and Lead Functions?

Lag is a function that will return you data from the previous row in your result set. Lead will give you the subsequent row.  In versions of SQL prior to 2012 if you wanted to see data from a previous row you would have to do a self-join or is some cases a cursor depending on your knowledge of the language the time allotted for developing the script.

Using SQL Server Lag and Lead Functions Proactively

One of the main tasks for a database administrator is ensuring backups are complete, indexes are defragmented and statistics are updated. As your data footprint gets larger you may find jobs running longer than anticipated or jobs overlapping causing a slowdown.

I will show you how I use the Lag function to determine how much my backups are growing weekly and the impact that is having on timings. Please be aware that the network pipe used for backups and how busy the target system is will have an impact on performance that you may not be able to see but you can still get a good line of reference and may even be able to use SQL Server Lag and Lead Functions to estimate future growth.

Let’s Do This! The Way to Data Backup Growth

We all have msdb and backups, so you can easily do this on your system without having to create or import sample data. Of course, depending on how much backup data you purge you may not be able to see too far back.  Here is a base query I use to retrieve backup information for my full backups (type=D) that occur on Monday (DATEPART = 2) for the “mysmallmediumdb” database and I am going back to review the past 90 days from the current time.

SELECT 
 s.database_name
 ,s.backup_start_date 
 ,s.backup_finish_date
 ,DATEDIFF(SECOND,s.backup_start_date,s.backup_finish_date) as BackupTime_sec
 ,CAST(s.compressed_backup_size/1024/1024 as bigint) as CompressedBackupSize_mb
 ,s.is_copy_only
 ,s.is_snapshot 
FROM
 MSDB..backupset s 
 
WHERE 
 s.database_name = 'mysmallmediumdb'
 AND s.type = 'D'
 AND s.backup_start_date >= DATEADD(DAY,-90, GETDATE())
 AND DATEPART(dw,s.backup_start_date) =2
 
ORDER BY 
 s.backup_start_date DESC

I include is_copy_only and is_snapshot just I so know if those options were selected as they could impact my run times and if they show up I could then exclude them from the query.

This is great and I can see that over time my backups are taking longer and the size is getting bigger, but wouldn’t it be nice to really have some growth numbers?  Let’s add Lag and see the previous rows.

Book Online defines the syntax as:

LAG (scalar_expression [,offset] [,default])

    OVER ( [ partition_by_clause ] order_by_clause )

Based on my query I will need to use the scalar expression that returned me the backup time in seconds and I will also want to see data growth so I have to Lag the backup size too. I want to go 1 row back so that is my offset and I do not want a default (I want nulls to be NULL).  At this point I do not need to partition however if I was doing this for all databases on the system then I could partition by database name. I want to order by the backup start date as that is how I am ordering the query too.

Here is the updated query:

SELECT 
 s.database_name
 ,s.backup_start_date 
 ,s.backup_finish_date
 ,DATEDIFF(SECOND,s.backup_start_date,s.backup_finish_date) as BackupTime_sec
 ,LAG(DATEDIFF(SECOND,s.backup_start_date,s.backup_finish_date),1) OVER (ORDER BY s.backup_start_date) AS PrevTime_sec
 ,CAST(s.compressed_backup_size/1024/1024 as bigint) as CompressedBackupSize_mb
 ,LAG(CAST(s.compressed_backup_size/1024/1024 as bigint)) OVER (ORDER BY s.backup_start_date) AS PrevSize_mb
 ,s.is_copy_only
 ,s.is_snapshot 
FROM
 MSDB..backupset s 
 
WHERE 
 s.database_name = 'mysmallmediumdb'
 AND s.type = 'D'
 AND s.backup_start_date >= DATEADD(DAY,-90,GETDATE())
 AND DATEPART(dw,s.backup_start_date) =2
 
ORDER BY 
 s.backup_start_date DESC

And the screen looks like:

Great, now I can see current backup time and previous time right next to each other and the same goes for backup size and we see the last result which is the oldest date has a null for the previous columns because they do not exist. But I still don’t quite have what I want.

No matter how much you love math you really don’t want to calculate the differences here so I think I need to do some subtraction and see the difference so I need to take my base number and subtract it from my lag.

SELECT 
 s.database_name
 ,s.backup_start_date 
 ,s.backup_finish_date
 ,DATEDIFF(SECOND,s.backup_start_date,s.backup_finish_date) as BackupTime_sec
 ,LAG(DATEDIFF(SECOND,s.backup_start_date,s.backup_finish_date),1) OVER (ORDER BY s.backup_start_date) AS PrevTime_sec
 ,DATEDIFF(SECOND,s.backup_start_date,s.backup_finish_date) - LAG(DATEDIFF(SECOND,s.backup_start_date,s.backup_finish_date),1) OVER (ORDER BY s.backup_start_date) as BackupTimeDiffPrevRun
 ,CAST(s.compressed_backup_size/1024/1024 as bigint) as CompressedBackupSize_mb
 ,LAG(CAST(s.compressed_backup_size/1024/1024 as bigint)) OVER (ORDER BY s.backup_start_date) AS PrevSize_mb
 ,CAST(s.compressed_backup_size/1024/1024 as bigint) - LAG(CAST(s.compressed_backup_size/1024/1024 as bigint)) OVER (ORDER BY s.backup_start_date) as WeeklyGowth_mb
 ,s.is_copy_only
 ,s.is_snapshot 
FROM
 MSDB..backupset s 
 
WHERE 
 s.database_name = 'mysmallmediumdb'
 AND s.type = 'D'
 AND s.backup_start_date >= DATEADD(DAY,-90,GETDATE())
 AND DATEPART(WEEKDAY,s.backup_start_date) =2
 
ORDER BY 
 s.backup_start_date DESC

I removed the copy only and snapshot from the result set to make it fit better on the page and it is not needed for the purpose of this demo.  The screen looks like:

Now that is better, I can see various run-time differences which as I said earlier could be related to network usage amongst other things. But look at the growth numbers. I can see how much I am growing and use this to get an average and anticipate my storage needs over the next few months, for that though I would probably switch the query to get me data for the first day of every month, I used weekly here to get a few more rows returned. I could also take the numbers and get a percentage and continue on even further.

Conclusion on Data Backup Growth

SQL Lead and Lag Functions are fairly simple to use and provide great benefit. Everywhere you are tracking data and want to compare a few columns to their previous or next row you can use Lag or Lead. Remember you can go multiple rows forward and backward by switching the offset. 

If you are seeking assistance with implementing these strategies into your own databases, Fortified Data offers a variety of database consulting services to meet your needs. 

 

Connect with our sales team

Have a question? Please use the form below to contact us today.