Building meaningful charts using Xero’s Reporting API
Xero’s Reporting API is an extremely powerful endpoint which provides us with the data required to assemble their most common reports, such as Balance Sheets or Profit & Loss. Unfortunately, with this power comes a level of abstraction that makes it quite tough to wrangle the data into a meaningful chart or report.
In this post, I will demonstrate what the structure of this endpoint means, why it is actually excellent (for us nerds), and the SQL required to coerce it into something that humans (not nerds) can understand.
The structure
Please note: In this tutorial I’ll be using the Profit & Loss report, but the same concepts apply to all of Xero’s reporting endpoints
When you think about your Profit & Loss report, you probably imagine it visually - a list of items down the left hand side, with corresponding values for the month in a separate column. Something, like this (taken from Xero’s own dashboards):
But that’s not how a data-driven person sees it. We see a huge range of considerations when trying to convert this into a consistent format suitable for an API. No, we see something more like this:
Reports are divided into sections, with section totals at the bottom of each. This also indicates that there might be sub-sections (or sub-sub-sections) in some reports
The column that contains the totals on the right, also contains a heading, which is of type string. This means the numbers may be type string as well - making things like
sum
oravg
difficultDifferent variations show a different number of columns, such as a YTD roll-up
…you get the idea…
Abstracting the data
The variance in formats makes it prohibitive to generate a different endpoint for each permutation, so what the Xero team did was abstract the data into its common features - rows & cells. Let’s have a look at what the API returns (I’ve truncated to only the Gross Profit section):
{ "Id": "896ccad4-d9f9-457c-9453-3998692ae1d6", "Status": "OK", "ProviderName": "API Explorer", "DateTimeUTC": "\/Date(1686862568907)\/", "Reports": [ { "ReportID": "ProfitAndLoss", "ReportName": "Profit and Loss", "ReportType": "ProfitAndLoss", "ReportTitles": [ "Profit & Loss", "Demo Company (NZ)", "1 June 2023 to 30 June 2023" ], "ReportDate": "15 June 2023", "UpdatedDateUTC": "\/Date(1686862568907)\/", "Fields": [], "Rows": [ { "RowType": "Header", "Cells": [ { "Value": "" }, { "Value": "30 Jun 23" } ] }, { "RowType": "Section", "Title": "Income", "Rows": [ { "RowType": "Row", "Cells": [ { "Value": "Sales", "Attributes": [ { "Value": "7d05a53d-613d-4eb2-a2fc-dcb6adb80b80", "Id": "account" } ] }, { "Value": "6398.52", "Attributes": [ { "Value": "7d05a53d-613d-4eb2-a2fc-dcb6adb80b80", "Id": "account" } ] } ] }, { "RowType": "SummaryRow", "Cells": [ { "Value": "Total Income" }, { "Value": "6398.52" } ] } ] }, { "RowType": "Section", "Title": "Less Cost of Sales", "Rows": [ { "RowType": "Row", "Cells": [ { "Value": "Purchases", "Attributes": [ { "Value": "573a170b-6792-4cfa-b8ce-6f8fd27f5458", "Id": "account" } ] }, { "Value": "730.43", "Attributes": [ { "Value": "573a170b-6792-4cfa-b8ce-6f8fd27f5458", "Id": "account" } ] } ] }, { "RowType": "SummaryRow", "Cells": [ { "Value": "Total Cost of Sales" }, { "Value": "730.43" } ] } ] }, { "RowType": "Section", "Title": "", "Rows": [ { "RowType": "Row", "Cells": [ { "Value": "Gross Profit" }, { "Value": "5668.09" } ] } ] }, { "RowType": "Section", "Title": "", "Rows": [ { "RowType": "Row", "Cells": [ { "Value": "Net Profit" }, { "Value": "4986.98" } ] } ] } ] } ] }
Wow, that is quite a lot to take in. You’ll probably recognize a few things like your product/service names and the totals. But it’s not immediately obvious how to coerce this into a tabular format.
Step 1/2 - Storing the data in a relational database
The format of the API response is called JSON
- and it’s a common format for APIs to return data in. It’s possible to query and manipulate JSON data, but your reporting tool probably doesn’t have this facility and besides, you have to be pretty nerdy.
So, our first step is to convert this JSON into a relational-database format. Reporting tools love relational data. At SyncHub, below is the structure we came up with.
If you have a SyncHub subscription connected to Xero, these tables will automatically appear in your database, and the data will be kept up to date automatically too.
As you can see, we retained the abstract nature of the data (rows & cells), and augmented with a few other quality-of-life fields such as typed columns and foreign keys back to the Account
records.
At this point you’re probably thinking that we’re just replacing one confusing solution for another. And from a human’s point of view, you’re right. But from your reporting tool’s point of view there is one major new benefit - SQL queries.
Step 2/2 - Querying your Xero reports using SQL queries
Now, this data does not lend itself to your run-of-the-mill SELECT queries. No, for this you need something more powerful - the PIVOT query. I won’t go into details about exactly how the PIVOT query works - you can search Google or ask ChatGPT about that. But in short, in our case it lets us use the imported column values (i.e. month names) as columns headings. I’ll cut straight to the code, pasted below.
Note that I’m using MS SQL, which is the default database provided free as part of your SyncHub subscription. If you’re using an alternative such as Postgres, you’ll have to tweak accordingly.
declare @ReportID nvarchar(500) declare @ColumnNames table (ColumnName nvarchar(500), CellNumber int) -- Get the most recent report select top 1 @ReportID = RemoteID from [CONNECTIONS.xero].ProfitAndLossReportByMonth order by [From] desc -- Some reports do not have a column name in the initial column, as it is implicit. Our pivot below however -- needs a column name, so we'll create one here declare @DefaultColumnName nvarchar(100) = 'Item' -- Get our column names. Most typically these are the months that your report covers (e.g. 'Feb 1979') -- Note that this logic is duplicated in the dynamic query below - REF_COLUMN_NAMES insert into @ColumnNames select case when isnull(cc.Value, '') = '' then 'Item' else cc.Value end as ColumnName, cc.CellNumber from [CONNECTIONS.xero].ReportRowCell cc inner join [CONNECTIONS.xero].ReportRow header on (cc.ReportRowRemoteID = header.RemoteID and header.RowType = 'Header') where header.ReportRemoteID = @ReportID and cc.IsDeleted = 0 and header.IsDeleted = 0 -- Check that we have data if not exists (select 1 from @ColumnNames) begin raiserror('No data is present - please make sure your data is synced and up to date', 16, 1) return end -- Pull the column names into a variable so we can dynamically inject into our query below declare @Columns nvarchar(max) = '' select @Columns += QUOTENAME(ColumnName) + ',' from @ColumnNames order by CellNumber set @Columns = left(@Columns, len(@Columns) - 1) -- Dynamically execute our SQL so that we may inject these column names using string concatenation declare @SQL nvarchar(max) = N' ;with allcells as ( select rr.Title, rr.RowType, rr.RowNumber, rr.RemoteID as RowRemoteID, columnNames.ColumnName, cc.[Value] as CellValue from [CONNECTIONS.xero].ReportRow rr left join [CONNECTIONS.xero].ReportRowCell cc on rr.RemoteID = cc.ReportRowRemoteID -- Cross reference to find the corresponding column name, as per the same logic in REF_COLUMN_NAMES above -- We need this so that we can JOIN later left join ( select case when isnull(cc.Value, '''') = '''' then @DefaultColumnName else cc.Value end as ColumnName, cc.CellNumber from [CONNECTIONS.xero].ReportRowCell cc inner join [CONNECTIONS.xero].ReportRow header on (cc.ReportRowRemoteID = header.RemoteID and header.RowType = ''Header'') where header.ReportRemoteID = @ReportID and cc.IsDeleted = 0 and header.IsDeleted = 0 ) columnNames on cc.CellNumber = columnNames.CellNumber where rr.ReportRemoteID = @ReportID and rr.IsDeleted = 0 ), cellsByColumn as ( -- Pivot the cells so that they are based on the column name select * from allCells pivot ( -- Pivot requires that we use some kind of aggregation, so we'll just use min() here - noting that there is only one value anyway min(CellValue) for ColumnName in (' + @Columns + ') ) as pt ) -- Now just format the final report, using all the (dynically generated) month names as columns select Title, ' + @Columns + ' from cellsByColumn where Rowtype <> ''Header'' order by RowNumber ' -- The work above merely concatenated a long SQL string. Here, we execute it to actually query the database, using sp_executeSQL exec sp_executeSQL @SQL, N'@ReportID nvarchar(500), @DefaultColumnName nvarchar(100)', @ReportID, @DefaultColumnName
If you execute this against your SyncHub data store, you’ll get something along the lines of the results below:
So, how can you use this?
I won’t lie to you - that is a gnarly query. But, that is the nature of BI sometimes, and think how impressed everybody in the office will be! So let’s move on to the final step - how exactly can you use this query in your reports?
Using your own reporting tool
Unfortunately, we can’t provide guidance for every reporting tool, but (assuming your tool supports SQL queries, which it almost certainly does), what you can essentially do is copy/paste this into your editor and tweak accordingly.
Note: you must replace the schema placeholder
[CONNECTIONS.xero]
with whatever the schema name of your SyncHub data store is.
There is also the chance that your reporting tool won’t let you execute dynamically-assembled strings. In this case, you’ll have to either:
hard-code the column names (changing every month, as required)
use our Insights option below, then query the resulting table from your reporting tool
Using SyncHub Insights
For a much easier solution, you can use this query in our Insights platform. Insights will periodically execute the query, and store the results in a new database table. From there, you can either:
query your new database table from your reporting tool, using a simple
select * from…
query; oruse Insights itself to render and share your reports
To conclude
Xero is great. Their Reporting API is great. SyncHub is great.
Happy reporting everybody!