Using Insights to take historical snapshots of your data
A common limitation of 3rd-party APIs is that they serve only “current” data. But what if you want to retain the history of your data as it changes over time? In this article, I’ll demonstrate how you can use a simple Insight to record an audit trail of any of your endpoints.
The problem
Suppose your business uses a CRM to manage and bill clients. When you deliver goods to these clients, you use the address that you have recorded against them in your CRM.
As part of your reports, you need to show historical sales for all customers, including the delivery address. Because the CRM doesn’t record the delivery address against your Invoices, you do a simple JOIN to the Client table to load the address from there:
select c.Name, c.Address, i.Total
from Invoice i
inner join Client c on i.ClientID = c.ClientID
The problem is that clients occasionally change address, and once you update the record in your CRM, your reports show the new address against all of their historical sales.
What you really need is a snapshot of the address as it changes over time, so that you can reference the address as it was at the time of each invoice.
The solution
SyncHub lets you easily resolve this problem by creating an Insight which detects changes to your data and records them in a timestamped table that you can reference in your reports.
Caution: technical stuff follows. If you haven’t read up on Insights yet, I recommend you do so - and don’t forget to read about Insight Parameters too, because we’ll be using those.
The solution consists of two parts - a self-referencing Insight which searches for changes, and a Parameter which caches your data, ensuring excellent performance for your Insight. Let’s start with the Parameter itself, as it is the easiest. Copy this SQL into your Query Editor (I have documented the code inline, so read it carefully):
-- Create a table variable to hold every snapshot key
declare @Keys table (
SnapshotKey nvarchar(50)
)
-- Include all existing snapshots by querying the existing records from your Insight
-- Remember, your Insight may not have been created yet, so you need to check
-- if the table exists first. You can call your table whatever you like, but make a note of it
-- as you'll be using it in your other Insight later too. In this case,
-- I have chosen to call it 'harvest_client_snapshot'
if exists (select 1 from information_schema.tables where table_schema='sh_report_cache' and table_name='harvest_client_snapshot')
begin
insert into @Keys
select distinct SnapshotKey from sh_report_cache.harvest_client_snapshot
end
-- We also add a "current" key, which our calling Insight uses to populate the current snapshot
-- The snapshot key is simply a representation of the date and time in an *orderable* format, which means yyyyMMddHHmmss
-- It must be orderable because we get the LATEST version from our snapshot table by taking the max key.
-- It also must be a string, because we need to keep the zero padding (the leading '0' in '05', for example)
declare @SnapshotKey nvarchar(50) = FORMAT(GETUTCDATE(), 'yyyyMMddHHmmss')
insert into @Keys values (@SnapshotKey)
-- Return our list
select * from @Keys
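To see why the zero-padded format matters, here is a small standalone sketch you can run in the Query Editor. The two timestamps are made up purely for illustration; the query formats each one with and without padding and then compares the results as strings.

```sql
-- Illustrative only: two made-up timestamps, several months apart
declare @Earlier datetime2 = '2024-01-15T08:30:00';
declare @Later   datetime2 = '2024-10-02T17:05:09';

select
    -- Zero-padded keys sort the same way as the dates themselves
    format(@Earlier, 'yyyyMMddHHmmss') as PaddedEarlier,
    format(@Later,   'yyyyMMddHHmmss') as PaddedLater,
    case when format(@Earlier, 'yyyyMMddHHmmss') < format(@Later, 'yyyyMMddHHmmss')
         then 'correct order' else 'wrong order' end as PaddedComparison,
    -- Without padding (single 'M' and 'd'), string comparison no longer follows date order
    format(@Earlier, 'yyyyMd') as UnpaddedEarlier,
    format(@Later,   'yyyyMd') as UnpaddedLater,
    case when format(@Earlier, 'yyyyMd') < format(@Later, 'yyyyMd')
         then 'correct order' else 'wrong order' end as UnpaddedComparison;
```

The padded keys compare in date order, whereas the unpadded pair ('2024115' versus '2024102') puts January after October. That is the reason the key is stored as a fixed-width string.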
So far so good. Now, you need to make it a Parameter, so that we can use it in our other Insight:
Great. Now, create a second file in your Query Editor and pop in the following SQL:
-- By pulling the key from our parameters, we enable Insights to cache
-- Note that my parameter name may differ from the one you created in the earlier step
declare @SnapshotKey nvarchar(50)
set @SnapshotKey = '[PARAMETERS.1_harvestclientsnapshotkeys]'
-- Create a table variable to store the latest snapshot of each record
-- The columns here must match what you eventually want to snapshot for your table
declare @MostRecentSnapshot table (
RemoteID nvarchar(200), -- Always grab the RemoteID from the record you are interested in
Name nvarchar(200),
Address nvarchar(1000),
SnapshotKey nvarchar(50)
)
-- Load our latest snapshots. Remember, our Insight may not exist yet, so we need to check
-- first. And note that the table name must match the one in your prior Parameter query
if exists (select 1 from information_schema.tables where table_schema='sh_report_cache' and table_name='harvest_client_snapshot')
begin
;with latest_records as (
select RemoteID, max(SnapshotKey) as LatestSnapshotKey
from sh_report_cache.harvest_client_snapshot
group by RemoteID
)
insert into @MostRecentSnapshot
select s.RemoteID, s.Name, s.Address, s.SnapshotKey
from sh_report_cache.harvest_client_snapshot s
inner join latest_records l on (s.RemoteID = l.RemoteID and s.SnapshotKey = l.LatestSnapshotKey)
end
-- Now just select the records that are in your current realtime table, but whose data differs from that in the snapshot table
select
c.RemoteID, c.Name, c.Address, @SnapshotKey as SnapshotKey
from [CONNECTIONS.harvest].Client c
-- Compare each record to its latest snapshot
left join @MostRecentSnapshot snapshot on (c.RemoteID = snapshot.RemoteID)
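-- Get records where either there is no prior snapshot, or the snapshot is different.
-- INTERSECT is used instead of <> so that a change to or from NULL is also detected
where (
snapshot.RemoteID is null
or not exists (
select c.Name, c.Address
intersect
select snapshot.Name, snapshot.Address
)
)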
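A note on NULLs when detecting changes: in T-SQL, comparing a value to NULL with <> evaluates to unknown rather than true, so a field that goes from empty to populated (or back) can slip through unnoticed. The standalone sketch below uses made-up rows to contrast the two styles of comparison. The table variables and their contents are purely illustrative.

```sql
-- Illustrative only: made-up rows standing in for the live table and the snapshot
declare @Current table (RemoteID nvarchar(200), Name nvarchar(200), Address nvarchar(1000));
declare @Snapshot table (RemoteID nvarchar(200), Name nvarchar(200), Address nvarchar(1000));

insert into @Snapshot values ('1', 'Acme', null),          -- no address recorded yet
                             ('2', 'Globex', '1 Main St');
insert into @Current  values ('1', 'Acme', '5 High St'),   -- address filled in since
                             ('2', 'Globex', '1 Main St'); -- unchanged

select
    c.RemoteID,
    -- '<>' evaluates to UNKNOWN when either side is NULL, so the change is not flagged
    case when c.Address <> s.Address then 1 else 0 end as ChangedUsingNotEquals,
    -- INTERSECT treats two NULLs as equal and NULL vs a value as different
    case when not exists (select c.Name, c.Address intersect select s.Name, s.Address)
         then 1 else 0 end as ChangedUsingIntersect
from @Current c
inner join @Snapshot s on c.RemoteID = s.RemoteID
order by c.RemoteID;
```

For client 1, only the INTERSECT column flags the change; for client 2, neither does. If your snapshot columns can never be NULL, a plain <> comparison behaves the same.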
And that’s it for the SQL - not too bad, right? Next, you just need to create an Insight from this second query:
Important - set your parameter cache
Open the newly created Insight and observe its settings. It is crucial that you configure your Insight to cache your prior records, otherwise it will just continuously overwrite your snapshots with the latest data. To do this, select the Partitions section from your Insight…
And set your cache to zero, which means “immediate”:
Let’s see it in action…
Here is my Insight after its first couple of runs. You can see that on the first run (1) it took all of my Client records, but on the second run (2) it took no records at all:
Now, I’m going to visit my CRM and add the city and suburb to the address of one of my clients…
…and the next time we run our Insight, voila - it has picked up the changed record and recorded it!
Remember: your regular SyncHub sync must have occurred (obviously) before your Insight runs again.
Let’s check the table itself to see what our audit trail looks like:
Daaaaamn…! That’s cool.
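To close the loop on the original report, here is a rough sketch of how the snapshot table could be joined back to your invoices, picking the address that was current when each invoice was issued. It runs standalone against made-up data: the @Invoice table and its ClientRemoteID and IssuedAtUtc columns are my assumptions, so substitute whatever your own invoice table uses, and swap @Snapshot for sh_report_cache.harvest_client_snapshot.

```sql
-- Illustrative only: made-up data. In a real report, @Invoice would be your
-- synced invoice table and @Snapshot would be sh_report_cache.harvest_client_snapshot
declare @Snapshot table (RemoteID nvarchar(200), Name nvarchar(200), Address nvarchar(1000), SnapshotKey nvarchar(50));
declare @Invoice table (InvoiceID int, ClientRemoteID nvarchar(200), IssuedAtUtc datetime2, Total decimal(18,2));

insert into @Snapshot values
    ('1', 'Acme', '1 Old Rd',  '20240101000000'),
    ('1', 'Acme', '9 New Ave', '20240601000000');
insert into @Invoice values
    (100, '1', '2024-03-10T12:00:00', 250.00),  -- issued while at the old address
    (101, '1', '2024-07-20T12:00:00', 400.00);  -- issued after the move

select i.InvoiceID, s.Name, s.Address, i.Total
from @Invoice i
outer apply (
    -- Latest snapshot taken at or before the invoice was issued
    select top 1 snap.Name, snap.Address
    from @Snapshot snap
    where snap.RemoteID = i.ClientRemoteID
      and snap.SnapshotKey <= format(i.IssuedAtUtc, 'yyyyMMddHHmmss')
    order by snap.SnapshotKey desc
) s
order by i.InvoiceID;
```

Two things to keep in mind. The snapshot key records when the Insight noticed the change (in UTC), not when the address actually changed, so the invoice date should be compared in UTC too. And invoices issued before your first snapshot will come back with an empty address, because outer apply finds nothing to match; you may prefer to fall back to the earliest snapshot in that case.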