Splitting Daytime Into Chunks To Enhance SQL Bulk Time-based Operations Performance

Posted by Ahmed Tarek Hasan on 1/23/2015 03:36:00 AM with No comments




The best way to understand what this post is about is to start with a real scenario.

One of my colleagues was building a system which controls some motors using some readings coming from electronic sensors. The problem was that the frequency of the sensors readings was so high that the system had to insert a few reading records into the SQL database in few milliseconds. This means that in just a second the system had to insert hundreds or even thousands of records into the database.

This for sure was a big challenge and due to some customer needs and business requirements the approved solution was to compress the sensor readings per a fixed time range at the end of the day. This means that at the end of each day the sensor readings should be divided into groups where each group is limited by a time range of half an hour for example and then the readings of each group should be aggregated by taking the average. This way we will end up having hundreds of readings records into the database instead of thousands or even more. Till now everything is somehow good and logical.

A new challenge aroused while working on the SQL routine which will carry out the bulk aggregation process. The performance of this process was not promising. The process used to take a huge amount of time due to the heavy date-time comparisons and groupings. This was what encouraged me to jump in and try to help.

Before applying any changes the process was going like this; getting all sensor readings which were created between 00:00:00 and 00:15:00 and then taking the average for these readings, then, getting all sensor readings which were created between 00:15:00 and 00:30:00 and then taking the average for these readings, and so on......

As you can see this amount of processing and grouping based on time is huge and this was causing the whole process to be a nightmare performance wise.

One of the best ways to enhance the performance of bulk actions is to try to break the whole one big process into small controllable ones. Then, observe each small process and check whether it, for one record, depends only on the record itself and doesn't need to know anything about other records or not. If this happens, then this small process could be held on single record basis at early stage which could be the moment the record was created or something else based on the business requirements. This will help split the huge processing effort on different moments and stages of the system business life cycle and workflow which will make it more controllable and less recognizable.

Now, to apply the same concept on the problem we have on hand right now we should notice that:
  1. The chunks of time ranges (15 minutes) are the same for all readings and they do not change for any reason unless requested by system user which is done manually or through user interference
  2. The creation date of each sensor reading record is known at the moment the record is created
  3. Most of the processing effort is consumed on the date-time comparisons followed by the groupings
This helped me to decide that:
  1. The chunks of time ranges should be defined only once at the system launch and the results should be kept physically in a table to be used as a quick cached reference
  2. The time chunk to which each sensor reading record belongs should be decided at the moment the record is created
  3. This way each record can have an id of the time chunk it belongs to which will cause the grouping process to be much more easier and effortless


Create Database
USE [master]
GO

CREATE DATABASE [DayChunks] ON  PRIMARY 
( NAME = N'DayChunks', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\DayChunks.mdf' , SIZE = 4096KB , MAXSIZE = UNLIMITED, FILEGROWTH = 1024KB )
 LOG ON 
( NAME = N'DayChunks_log', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\DayChunks_log.ldf' , SIZE = 1280KB , MAXSIZE = 2048GB , FILEGROWTH = 10%)
GO

ALTER DATABASE [DayChunks] SET COMPATIBILITY_LEVEL = 100
GO

IF (1 = FULLTEXTSERVICEPROPERTY('IsFullTextInstalled'))
begin
EXEC [DayChunks].[dbo].[sp_fulltext_database] @action = 'enable'
end
GO

ALTER DATABASE [DayChunks] SET ANSI_NULL_DEFAULT OFF 
GO

ALTER DATABASE [DayChunks] SET ANSI_NULLS OFF 
GO

ALTER DATABASE [DayChunks] SET ANSI_PADDING OFF 
GO

ALTER DATABASE [DayChunks] SET ANSI_WARNINGS OFF 
GO

ALTER DATABASE [DayChunks] SET ARITHABORT OFF 
GO

ALTER DATABASE [DayChunks] SET AUTO_CLOSE OFF 
GO

ALTER DATABASE [DayChunks] SET AUTO_CREATE_STATISTICS ON 
GO

ALTER DATABASE [DayChunks] SET AUTO_SHRINK OFF 
GO

ALTER DATABASE [DayChunks] SET AUTO_UPDATE_STATISTICS ON 
GO

ALTER DATABASE [DayChunks] SET CURSOR_CLOSE_ON_COMMIT OFF 
GO

ALTER DATABASE [DayChunks] SET CURSOR_DEFAULT  GLOBAL 
GO

ALTER DATABASE [DayChunks] SET CONCAT_NULL_YIELDS_NULL OFF 
GO

ALTER DATABASE [DayChunks] SET NUMERIC_ROUNDABORT OFF 
GO

ALTER DATABASE [DayChunks] SET QUOTED_IDENTIFIER OFF 
GO

ALTER DATABASE [DayChunks] SET RECURSIVE_TRIGGERS OFF 
GO

ALTER DATABASE [DayChunks] SET  DISABLE_BROKER 
GO

ALTER DATABASE [DayChunks] SET AUTO_UPDATE_STATISTICS_ASYNC OFF 
GO

ALTER DATABASE [DayChunks] SET DATE_CORRELATION_OPTIMIZATION OFF 
GO

ALTER DATABASE [DayChunks] SET TRUSTWORTHY OFF 
GO

ALTER DATABASE [DayChunks] SET ALLOW_SNAPSHOT_ISOLATION OFF 
GO

ALTER DATABASE [DayChunks] SET PARAMETERIZATION SIMPLE 
GO

ALTER DATABASE [DayChunks] SET READ_COMMITTED_SNAPSHOT OFF 
GO

ALTER DATABASE [DayChunks] SET HONOR_BROKER_PRIORITY OFF 
GO

ALTER DATABASE [DayChunks] SET  READ_WRITE 
GO

ALTER DATABASE [DayChunks] SET RECOVERY FULL 
GO

ALTER DATABASE [DayChunks] SET  MULTI_USER 
GO

ALTER DATABASE [DayChunks] SET PAGE_VERIFY CHECKSUM  
GO

ALTER DATABASE [DayChunks] SET DB_CHAINING OFF 
GO


Create Tables
USE [DayChunks]
GO

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE TABLE [dbo].[Settings](
 [ID] [int] IDENTITY(1,1) NOT NULL,
 [Name] [nvarchar](max) NOT NULL,
 [Value] [nvarchar](max) NULL,
 CONSTRAINT [PK_Settings] PRIMARY KEY CLUSTERED 
(
 [ID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]
GO





CREATE TABLE [dbo].[DayChunks](
 [ID] [int] IDENTITY(1,1) NOT NULL,
 [ChunkNumOfMinutes] [int] NOT NULL,
 [ChunkNumOfSeconds] [int] NOT NULL,
 [ChunkStart] [time](0) NOT NULL,
 [ChunkEnd] [time](0) NOT NULL,
 [IsLastChunk] [bit] NOT NULL,
 CONSTRAINT [PK_DayChunks] PRIMARY KEY CLUSTERED 
(
 [ID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

ALTER TABLE [dbo].[DayChunks] ADD  CONSTRAINT [DF_DayChunks_IsLastChunk]  DEFAULT (0) FOR [IsLastChunk]
GO




CREATE TABLE [dbo].[SensorReadings](
 [ID] [int] IDENTITY(1,1) NOT NULL,
 [CreationDateTime] [datetime] NOT NULL,
 [Reading] [float] NOT NULL,
 [DayChunkID] [int] NULL,
 [DayDate] [date] NOT NULL,
 CONSTRAINT [PK_SensorReadings] PRIMARY KEY CLUSTERED 
(
 [ID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

ALTER TABLE [dbo].[SensorReadings] ADD  CONSTRAINT [DF_SensorReadings_CreationDateTime]  DEFAULT (getdate()) FOR [CreationDateTime]
GO

ALTER TABLE [dbo].[SensorReadings] ADD  CONSTRAINT [DF_SensorReadings_DayDate]  DEFAULT (getdate()) FOR [DayDate]
GO





CREATE TABLE [dbo].[AggregatedSensorReadings](
 [ID] [int] IDENTITY(1,1) NOT NULL,
 [DayChunkID] [int] NULL,
 [Reading] [float] NOT NULL,
 [DayDate] [date] NOT NULL,
 CONSTRAINT [PK_AggregatedSensorReadings] PRIMARY KEY CLUSTERED 
(
 [ID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

Create Routines
USE [DayChunks]
GO


SET ANSI_NULLS ON
GO

IF EXISTS
(
 SELECT * FROM dbo.sysobjects
 WHERE id = object_id(N'[dbo].[SetDayChunksSettings]')
 AND OBJECTPROPERTY(id, N'IsProcedure') = 1
)
 
DROP PROCEDURE [dbo].[SetDayChunksSettings]

GO
CREATE PROCEDURE [dbo].[SetDayChunksSettings]
(
    @ChunksNumOfMinutes INT
 , @ChunksNumOfSeconds INT
 , @ChunksStartTime TIME(0)
)
AS
BEGIN
 DELETE FROM Settings
 WHERE Name IN
 (
  'DayChunksStartTime'
  , 'DayChunksNumOfMinutes'
  , 'DayChunksNumOfSeconds'
 )
 
 INSERT INTO Settings
 (
  Name
  , Value
 )
 VALUES
 (
  'DayChunksStartTime'
  , CAST(@ChunksStartTime AS NVARCHAR(MAX))
 )
 ,
 (
  'DayChunksNumOfMinutes'
  , CAST(@ChunksNumOfMinutes AS NVARCHAR(MAX))
 )
 ,
 (
  'DayChunksNumOfSeconds'
  , CAST(@ChunksNumOfSeconds AS NVARCHAR(MAX))
 )
END
GO





IF EXISTS
(
 SELECT * FROM dbo.sysobjects
 WHERE id = object_id(N'[dbo].[CreateDayChunksBasedOnInput]')
 AND OBJECTPROPERTY(id, N'IsProcedure') = 1
)
 
DROP PROCEDURE [dbo].[CreateDayChunksBasedOnInput]

GO
CREATE PROCEDURE [dbo].[CreateDayChunksBasedOnInput]
(
    @ChunksNumOfMinutes INT
 , @ChunksNumOfSeconds INT
 , @ChunksStartTime TIME(0)
)
AS
BEGIN
 DECLARE @MaxEnd TIME(0) = DATEADD(minute, (-1 * @ChunksNumOfMinutes), @ChunksStartTime)
 SET @MaxEnd = DATEADD(second, (-1 * @ChunksNumOfSeconds), @MaxEnd)

 DECLARE @ChunkStart TIME(0) = @ChunksStartTime

 DECLARE @ChunkEnd TIME(0) = DATEADD(minute, @ChunksNumOfMinutes, @ChunksStartTime)
 SET @ChunkEnd = DATEADD(second, @ChunksNumOfSeconds, @ChunkEnd)
 
 TRUNCATE TABLE DayChunks
 
 WHILE (@ChunkEnd < @MaxEnd)
 BEGIN
  INSERT INTO DayChunks
  (
   ChunkNumOfMinutes
   , ChunkNumOfSeconds
   , ChunkStart
   , ChunkEnd
  )
  VALUES
  (
   @ChunksNumOfMinutes
   , @ChunksNumOfSeconds
   , @ChunkStart
   , @ChunkEnd
  )
     
  SET @ChunkStart = DATEADD(minute, @ChunksNumOfMinutes, @ChunkStart)
  SET @ChunkStart = DATEADD(second, @ChunksNumOfSeconds, @ChunkStart)
     
  SET @ChunkEnd = DATEADD(minute, @ChunksNumOfMinutes, @ChunkEnd)
  SET @ChunkEnd = DATEADD(second, @ChunksNumOfSeconds, @ChunkEnd)
 END

 IF(@ChunkEnd = @MaxEnd OR @ChunkEnd > @MaxEnd)
 BEGIN
  IF(@ChunkEnd > @MaxEnd)
  BEGIN
   SET @ChunkEnd = @MaxEnd
  END
  
  INSERT INTO DayChunks
  (
   ChunkNumOfMinutes
   , ChunkNumOfSeconds
   , ChunkStart
   , ChunkEnd
  )
  VALUES
  (
   @ChunksNumOfMinutes
   , @ChunksNumOfSeconds
   , @ChunkStart
   , @ChunkEnd
  )
  
  INSERT INTO DayChunks
  (
   ChunkNumOfMinutes
   , ChunkNumOfSeconds
   , ChunkStart
   , ChunkEnd
   , IsLastChunk
  )
  VALUES
  (
   @ChunksNumOfMinutes
   , @ChunksNumOfSeconds
   , @MaxEnd
   , @ChunksStartTime
   , 1
  )
 END
END
GO





IF EXISTS
(
 SELECT * FROM dbo.sysobjects
 WHERE id = object_id(N'[dbo].[CreateDayChunksBasedOnSettings]')
 AND OBJECTPROPERTY(id, N'IsProcedure') = 1
)
 
DROP PROCEDURE [dbo].[CreateDayChunksBasedOnSettings]
GO

CREATE PROCEDURE [dbo].[CreateDayChunksBasedOnSettings]
AS
BEGIN
 DECLARE @ChunksNumOfMinutes INT
 DECLARE @ChunksNumOfSeconds INT
 DECLARE @ChunksStartTime TIME(0)
 
 IF EXISTS (SELECT TOP 1 ID FROM Settings WHERE Name = 'DayChunksNumOfMinutes')
 BEGIN
  SELECT TOP 1 @ChunksNumOfMinutes = 
   CASE
    WHEN Value IS NULL THEN CAST(0 AS INT)
    ELSE CAST (Value AS INT)
   END
  FROM Settings
  WHERE Name = 'DayChunksNumOfMinutes'
 END
 ELSE
 BEGIN
  SET @ChunksNumOfMinutes = 0
 END

 IF EXISTS (SELECT TOP 1 ID FROM Settings WHERE Name = 'DayChunksNumOfSeconds')
 BEGIN
  SELECT TOP 1 @ChunksNumOfSeconds = 
   CASE
    WHEN Value IS NULL THEN CAST(0 AS INT)
    ELSE CAST (Value AS INT)
   END
  FROM Settings
  WHERE Name = 'DayChunksNumOfSeconds'
 END
 ELSE
 BEGIN
  SET @ChunksNumOfSeconds = 0
 END

 IF EXISTS (SELECT TOP 1 ID FROM Settings WHERE Name = 'DayChunksStartTime')
 BEGIN
  SELECT TOP 1 @ChunksStartTime = 
   CASE
    WHEN Value IS NULL THEN CAST('00:00:00' AS TIME(0))
    ELSE CAST (Value AS TIME(0))
   END
  FROM Settings
  WHERE Name = 'DayChunksStartTime'
 END
 ELSE
 BEGIN
  SET @ChunksStartTime = CAST('00:00:00' AS TIME(0))
 END
 
 EXEC [dbo].[CreateDayChunksBasedOnInput] @ChunksNumOfMinutes, @ChunksNumOfSeconds, @ChunksStartTime
END
GO





SET ANSI_NULLS ON
GO

IF EXISTS
(
 SELECT *
 FROM INFORMATION_SCHEMA.ROUTINES
 WHERE ROUTINE_NAME = 'GetDateTimeDayChunkID'
 AND ROUTINE_SCHEMA = 'dbo'
 AND ROUTINE_TYPE = 'FUNCTION'
)
 
DROP FUNCTION [dbo].[GetDateTimeDayChunkID]
GO

CREATE FUNCTION [dbo].[GetDateTimeDayChunkID] 
(
 @InputDateTime DATETIME
)
RETURNS INT
AS
BEGIN
 DECLARE @Result INT
 DECLARE @InputTime TIME(0) = @InputDateTime
 
 SELECT TOP 1 @Result = ID
 FROM DayChunks
 WHERE @InputTime BETWEEN ChunkStart AND ChunkEnd
 
 IF(@Result IS NULL)
 BEGIN
  SET @Result = (SELECT TOP 1 ID FROM DayChunks WHERE IsLastChunk = 1)
 END
 
 RETURN @Result
END
GO

Testing Solution
The script below tests the solution by simulating sensor readings every one second and the bulk aggregation process is held every one minute.
USE [DayChunks]
GO


-- Setting day chunks settings
-- DayChunksStartTime = '00:00:00'
-- DayChunksNumOfMinutes = 1
-- DayChunksNumOfSeconds = 0
-- This means that the day starts at 00:00:00 and is splitted into chunks each is 1 minute long
EXEC [dbo].[SetDayChunksSettings] 1, 0, '00:00:00'

-- Splitting the day into proper chunks based on the settings set in the previous step
EXEC [dbo].[CreateDayChunksBasedOnSettings]
GO


-- Simulating sensor readings every one second
TRUNCATE TABLE SensorReadings
GO

DECLARE @i INT;
DECLARE @SensorReading FLOAT
DECLARE @Now DATETIME

SET @i = 1;
WHILE (@i <= 90)
BEGIN
 WAITFOR DELAY '00:00:01'
 
 DECLARE @CreationDateTime1 DATETIME = GETDATE()
 DECLARE @RandomReading1 FLOAT
 SELECT TOP 1 @RandomReading1 = RAND()
 
 INSERT INTO SensorReadings
 (
  Reading
  , CreationDateTime
  , DayChunkID
 )
 VALUES
 (
  @RandomReading1
  , @CreationDateTime1
  , [dbo].[GetDateTimeDayChunkID](@CreationDateTime1)
 )
 
 SET  @i = @i + 1;
END


SELECT *
FROM SensorReadings


-- Applying aggregation on sensor readings
TRUNCATE TABLE AggregatedSensorReadings
GO

INSERT INTO AggregatedSensorReadings
(
 DayChunkID
 , Reading
 , DayDate
)
SELECT DayChunkID
, AVG(Reading)
, DayDate
FROM SensorReadings
GROUP BY DayDate, DayChunkID


SELECT AggregatedSensorReadings.*
, DayChunks.ChunkStart
, DayChunks.ChunkEnd
FROM AggregatedSensorReadings
LEFT OUTER JOIN DayChunks
ON DayChunks.ID = AggregatedSensorReadings.DayChunkID

Result

 Settings


DayChunks 1




DayChunks 2


SensorReadings 1


SensorReadings 2


AggregatedSensorReadings



That's it. This is just an example of an application of the main concept of splitting day time into chunks. You can still use this concept and adapt the code to your business needs.


Hope this will help you someday.
Good Luck.