I have been using ColdFusion 2016 and ZingCharts (bundled) to dynamically create charts using SQL Server, with a time series on the X axis. When there are time gaps I would like the line chart to also show a gap, but instead the line is continuous and plots each datapoint consecutively.
Here is the chart as it plots now. There is no gap between the Oct 29 and March dates; the data just run together:
My data are generally in 15-minute increments, but there are stretches of time (days or months) with gaps in the time series. I contacted ZingCharts to ask whether some kind of style tag controls whether the dates are displayed consecutively or with gaps, and there is none; it has to be handled at the data level. If my data were hardcoded I would add null values so that the charts plot gaps in the time series, but my charts are dynamic (a user can choose any number of 7 parameters to add to the chart for a date range they choose). I have found information on how to solve this for hardcoded data, but I'm looking for ideas for dynamically loaded data/series. I have also found information on a deprecated ColdFusion tag for the XML file, isInterpolated="false", but that's no longer an option.
My question is what is the best way to solve this? I found some information about creating a calendar table in SQL Server and unioning that with the table(s) providing the data so that all datetimes would be filled. I was wondering if there's another approach that I'm not thinking of? Thanks for any help, I'm very new at all of this.
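For reference, a minimal sketch of the calendar-table idea mentioned above. The table variables @Calendar and @Readings and their sample rows are hypothetical stand-ins, not the real schema. The calendar is the left side of the join, so every 15-minute slot comes back and slots with no reading carry NULL, which is the kind of value the chart needs to draw a gap.

```sql
-- Hypothetical stand-ins for the real tables (names and columns are assumptions)
DECLARE @Calendar TABLE (TIMEVALUE datetime PRIMARY KEY);
DECLARE @Readings TABLE (TIMEVALUE datetime, stationdesc varchar(50), salinity float);

-- Calendar: every 15-minute slot from 00:00 to 01:00
INSERT INTO @Calendar (TIMEVALUE) VALUES
    ('2016-10-29T00:00:00'), ('2016-10-29T00:15:00'), ('2016-10-29T00:30:00'),
    ('2016-10-29T00:45:00'), ('2016-10-29T01:00:00');

-- Readings: the 00:30 and 00:45 slots are missing
INSERT INTO @Readings (TIMEVALUE, stationdesc, salinity) VALUES
    ('2016-10-29T00:00:00', 'Station01', 12.1),
    ('2016-10-29T00:15:00', 'Station01', 12.3),
    ('2016-10-29T01:00:00', 'Station01', 12.0);

-- Every calendar slot is returned; salinity is NULL where no reading exists
SELECT c.TIMEVALUE, r.salinity
FROM @Calendar AS c
LEFT JOIN @Readings AS r
       ON r.TIMEVALUE = c.TIMEVALUE
      AND r.stationdesc = 'Station01'  -- filter in ON, not WHERE, so empty slots survive
ORDER BY c.TIMEVALUE;
```

With this sample data the query returns five rows, with NULL salinity at 00:30 and 00:45. Putting the station filter in the ON clause matters: moving it to WHERE would discard the NULL rows again.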
Update: Here is the current query for the data, which is a bit complicated. It pulls "Nth" rows based on how many parameters (7 available) are selected and how many days are in the date range:
SELECT
distinct
datepart(year, t.sample_date) as [year]
,datepart(month, t.sample_date) as [month]
,datepart(day, t.sample_date) as [day]
,datepart(hour, t.sample_time) as [hr]
,datepart(minute, t.sample_time) as [min]
,convert(varchar(10), t.sample_date, 1) + ' ' +
RIGHT('0' + CONVERT([varchar](2), DATEPART(HOUR, t.sample_time)), 2) + ':' +
RIGHT('0' + CONVERT([varchar](2), DATEPART(MINUTE, t.sample_time)), 2) AS [datetime]
,t.stationdesc
<cfif isDefined("form.parameter") and ListFindNoCase(form.parameter, "salinity")>,ROUND(t.salinity,1) as salinity</cfif>
<!---plus 6 more parameters--->
FROM (
SELECT
[sample_date]
,sample_time
,stationdesc
<cfif isDefined("form.parameter") and ListFindNoCase(form.parameter, "salinity") >,salinity</cfif>
<!---plus 6 more parameters--->
, row_number() OVER (ORDER BY streamcode) AS rownum
FROM MyUnionizedTables
WHERE stationdesc = (<cfqueryparam value="#form.station#" cfsqltype="cf_sql_varchar">)
AND [sample_date] BETWEEN (<cfqueryparam value='#Form.StartDate#' cfsqltype="cf_sql_date">)
AND (<cfqueryparam value='#Form.EndDate#' cfsqltype="cf_sql_date">)
<cfif isDefined("form.parameter") and ListFindNoCase(form.parameter, "salinity")>and salinity > -25 and salinity <40 and salinity is not NULL </cfif>
<!---plus 6 more parameters--->
GROUP BY sample_date, sample_time, stationdesc, streamcode
<cfif isDefined("form.parameter") and ListFindNoCase(form.parameter, "salinity")>,salinity</cfif>
<!---plus 6 more parameters--->
) AS t
WHERE <!---returning Nth row when record sets (count of days between dates selected) are long--->
<cfif IsDefined("form.station") AND IsDefined("form.parameter") AND #ParamCount# LTE 3 AND form.station eq 'Coastal Bays - Public Landing' and #ctdays# gte 10> t.rownum % 64 = 0
<cfelseif IsDefined("form.parameter") AND #ParamCount# LTE 3 AND #ctDays# gte '5840'> t.rownum % 64 = 0
<!---plus lots more elseifs--->
<cfelseif IsDefined("form.parameter") AND #ParamCount# GTE 7 AND #ctDays# gte '350'> t.rownum % 8 = 0
<cfelse>t.rownum % 1 = 0</cfif>
ORDER BY
datepart(year, t.sample_date)
,datepart(month, t.sample_date)
,datepart(day, t.sample_date)
,datepart(hour, t.sample_time)
,datepart(minute, t.sample_time)
SECOND UPDATE (after Leigh's link to query on GitHub):
So I'd actually been working on a similar query to the one Leigh posted based on the "CTE Expression" section here. I switched to trying to work with her version, which is below. I don't have write access, so I'm working with an existing table. MyDataTable has ~21 million rows, with separate sample_date (datetime) and sample_time (datetime) columns. The dates and times are a PITA: because of the instruments and the way these data are remotely telemetered, we get a datetime column called 'sample_date' with a good date but a bogus time value, and a separate datetime column called 'sample_time' with a bogus date and a good time. There are 125 stations, each with data (for example, temperature) from different starting and ending dates/times, beginning in 2001 through the present. So I need to fill date/time gaps for 125 different stations with differing gaps of time, in data that are normally in 15-minute increments.
--- simulate main table(s)
--CREATE TABLE MyDataTable ( sample_date datetime, sample_time datetime, stationdesc nvarchar, wtemp float)
--- generate all dates within this range
DECLARE @startDate datetime
DECLARE @maxDate datetime
SET @startDate = '2015-01-01'
SET @maxDate = '2016-12-31'
--- get MISSING dates
;WITH missingDates AS
(
SELECT DATEADD(day,1,@startDate) AS TheDate
UNION ALL
SELECT DATEADD(day,1, TheDate)
FROM missingDates
WHERE TheDate < @maxDate
)
SELECT *
--[wtemp]
-- ,[stationdesc]
-- ,[TIMEVALUE]
FROM missingDates mi LEFT JOIN MyDataTable t ON t.sample_date = mi.TheDate
WHERE t.sample_date IS NULL
--and stationdesc = 'Back River - Lynch Point'
--ORDER BY timevalue
OPTION (MAXRECURSION 0)
When I run this query as-is I get only 17 rows of data. TheDate column lists datetimes with dates 12/15-12/31/16 and all times are 00:00:00.000. Query takes 49s.
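One thing that stands out in the query above is that the CTE steps by whole days (and starts on the day after @startDate), so it can only report missing days, not missing 15-minute slots. A minimal sketch of the same recursive CTE stepping by 15 minutes follows; the short two-hour range and the names slots and TheDateTime are assumptions chosen for illustration.

```sql
DECLARE @startDate datetime = '2015-01-01T00:00:00';
DECLARE @maxDate   datetime = '2015-01-01T02:00:00';

;WITH slots AS
(
    -- anchor row: include the start of the range itself
    SELECT @startDate AS TheDateTime
    UNION ALL
    -- recursive step: add 15 minutes until the end of the range is reached
    SELECT DATEADD(minute, 15, TheDateTime)
    FROM slots
    WHERE DATEADD(minute, 15, TheDateTime) <= @maxDate
)
SELECT TheDateTime
FROM slots
ORDER BY TheDateTime
OPTION (MAXRECURSION 0);  -- lift the default limit of 100 recursion levels
```

For this range the CTE yields nine rows, 00:00 through 02:00. Over many years of 15-minute slots a recursive CTE generates a lot of rows on every run, which is one argument for a permanent calendar table instead.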
Meanwhile, my coworker and I have been working on alternate methods.
--Putting data from only 1 station from our big datatable into the new testtable called '_testdatatable'
SELECT station, sample_date, sample_time, wtemp, streamcode, stationdesc, TIMEVALUE
INTO _testdatatable
FROM MyBigDataTable
WHERE (stationdesc = 'Back River')
order by [sample_date],[sample_time]
--Next, make a new table [_testdatatableGap] with all time values in 15min increments from a datetime table we made
SELECT [wtemp]=null
,[streamcode]='ABC1234'
,[stationdesc]= 'Back River'
,[TIMEVALUE]
into [tide].[dbo].[_testdatatableGap]
FROM DateTimeTable
WHERE (TIMEVALUE BETWEEN '4/19/2014' AND getdate())
--Then, get the missing dates from the Gap table and put into the testdatatable
INSERT into [_testdatatable]
( [wtemp]
,[streamcode]
,[stationdesc]
,[TIMEVALUE]
)
(SELECT
[wtemp]=null -- needs this for except to work
,
[streamcode]
,[stationdesc]
,
[TIMEVALUE]
FROM [_testdatatableGap]
EXCEPT
SELECT
[wtemp]=null -- needs this for except to work
,
[streamcode]
,[stationdesc]
,
[TIMEVALUE]
FROM [_testdatatable])
This method created a table with all the 15-minute date/time increments, which resulted in a correctly drawn chart (below). However, we don't know how to scale this up to the full data table with all 125 stations without making multiple tables.
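One possible way to cover many stations without a gap table per station is to cross join the list of stations with the calendar and then left join the data. This is only a sketch under assumptions: @Calendar and @Readings are hypothetical stand-ins with made-up rows, and the real tables would need a date-range filter to keep the result manageable.

```sql
-- Hypothetical stand-ins; the real table and column names may differ
DECLARE @Calendar TABLE (TIMEVALUE datetime PRIMARY KEY);
DECLARE @Readings TABLE (TIMEVALUE datetime, stationdesc varchar(50), wtemp float);

INSERT INTO @Calendar (TIMEVALUE) VALUES
    ('2014-04-19T00:00:00'), ('2014-04-19T00:15:00'), ('2014-04-19T00:30:00');

INSERT INTO @Readings (TIMEVALUE, stationdesc, wtemp) VALUES
    ('2014-04-19T00:00:00', 'Back River', 14.2),
    ('2014-04-19T00:30:00', 'Back River', 14.4),  -- 00:15 missing for Back River
    ('2014-04-19T00:00:00', 'Station01',  13.8),
    ('2014-04-19T00:15:00', 'Station01',  13.9);  -- 00:30 missing for Station01

;WITH stations AS
(
    SELECT DISTINCT stationdesc FROM @Readings
)
-- One row per station per calendar slot; wtemp is NULL where a reading is missing
SELECT s.stationdesc, c.TIMEVALUE, r.wtemp
FROM stations AS s
CROSS JOIN @Calendar AS c
LEFT JOIN @Readings AS r
       ON r.stationdesc = s.stationdesc
      AND r.TIMEVALUE   = c.TIMEVALUE
ORDER BY s.stationdesc, c.TIMEVALUE;
```

With the sample rows this returns six rows (two stations times three slots), with NULL wtemp for Back River at 00:15 and Station01 at 00:30. Because each station has its own start and end dates, a real version would probably also limit the calendar to each station's own first and last reading.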
After working through several suggestions and a lot of research, trial, and error, I think I've solved my problem. I still need to work on the additional complication of sometimes needing to reduce the volume of data returned and graphed, but that part is outside the scope of my original question.
The short version of my answer is:
I made a view of MyBigDataTable with an additional datetime column called "TIMEVALUE".
I made a big permanent datetime calendar table whose datetime column has the same name: "TIMEVALUE".
I then developed a set of SQL queries that
(a) gather data from MyBigDataTable and put it into a #temptable, and
(b) gather datetimes from the calendar table and put them into the same #temptable.
Then, (c) because there will now sometimes be 2 rows for the same datetime, one with data and one with nulls, I run a query that keeps only the row with data when 2 rows match on datetime and station. These data can then be charted.
Here's the SQL (limited here to only 1 parameter, but I have 8):
--Step 1. Check if the temptable exists, if it does then delete it
IF OBJECT_ID('tempdb..#TempTable') IS NOT NULL
BEGIN
DROP TABLE #TempTable
END
;
--Step 2. Create the temptable with data from the parameters, station and dates selected on the .cfm
SET NOCOUNT ON
SELECT
timevalue
,stationdesc
,wtemp
INTO #TempTable
FROM MyBigDataTable
WHERE
stationdesc = 'Station01'
and [timevalue] BETWEEN '5/29/2014' AND '10/01/2016'
GROUP BY
TIMEVALUE
,stationdesc
,wtemp
;
--Step 3. Now select datetimes from a big calendar table, and set stationdesc to the selected station,
--and rest of parameters to null. And do this for the same selected date range
INSERT INTO #TempTable
SELECT
[TIMEVALUE]
,[stationdesc]= 'Station01'
,wtemp=null
FROM MyDatetimeCalendarTable
WHERE [timevalue] BETWEEN '5/29/2014' AND '10/01/2016'
;
--Step 4. Run query on the temptable to gather data for chart, but b/c sometimes there will be 2 rows with the same datetime and station but one with data and one with nulls, this query only gathers the row with data if there are 2 rows with matching datetime and station
SELECT distinct *
FROM #TempTable a
WHERE
a.wtemp is not null
or (
    a.wtemp is null
    and not exists(
        SELECT * FROM #TempTable b
        WHERE a.timevalue=b.timevalue
        and a.stationdesc=b.stationdesc and b.wtemp is not null)
)
ORDER BY timevalue
;
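A possible variation on Steps 3 and 4, sketched under assumptions: if the calendar rows are inserted only where no data row already exists for that datetime and station, the de-duplication in Step 4 is not needed. The table variables @TempTable and @Calendar and their rows are hypothetical stand-ins for #TempTable and the calendar table.

```sql
-- Hypothetical stand-ins for #TempTable and the calendar table
DECLARE @TempTable TABLE (TIMEVALUE datetime, stationdesc varchar(50), wtemp float);
DECLARE @Calendar  TABLE (TIMEVALUE datetime PRIMARY KEY);

INSERT INTO @Calendar (TIMEVALUE) VALUES
    ('2014-05-29T00:00:00'), ('2014-05-29T00:15:00'), ('2014-05-29T00:30:00');

-- Step 2 equivalent: the rows that actually have data (00:15 is missing)
INSERT INTO @TempTable (TIMEVALUE, stationdesc, wtemp) VALUES
    ('2014-05-29T00:00:00', 'Station01', 18.4),
    ('2014-05-29T00:30:00', 'Station01', 18.6);

-- Step 3 variation: add a NULL row only for calendar slots with no data row
INSERT INTO @TempTable (TIMEVALUE, stationdesc, wtemp)
SELECT c.TIMEVALUE, 'Station01', NULL
FROM @Calendar AS c
WHERE NOT EXISTS (
    SELECT 1
    FROM @TempTable AS t
    WHERE t.TIMEVALUE   = c.TIMEVALUE
      AND t.stationdesc = 'Station01'
);

-- Step 4 becomes a plain ordered select: one row per slot, NULL where data is missing
SELECT TIMEVALUE, stationdesc, wtemp
FROM @TempTable
ORDER BY TIMEVALUE;
```

With the sample rows the final select returns three rows, with NULL wtemp at 00:15. Whether this is faster than the original Steps 3 and 4 on the real data would need to be measured.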
I need to fully test it and make some amendments, but I think this satisfies the requirements of an answer, because so far it's doing what I need it to do. Thank you to @Leigh and @Dan Bracuk for their wisdom (and patience!)