April 01, 2018
In CARTO we have a Slack channel called #using-carto
where Iñigo sets a weekly challenge. We have to use CARTO tools to answer a very open question. Last week it was:
I am visiting Madrid and a friend has given me her BiciMAD card.
I love maps and above all maps made by myself. :)
I wonder how I could create a map that allowed me to see easily
Madrid neighbourhoods and the BiciMAD parkings located in them.
It would be great to select parkings with the greater capacity,
so I could be fairly sure of finding a free place for my bike.
It’s fun to see how we all end up answering the same question with different approaches. We have to document every step we take, pointing out the pains we discover along the way.
A common question here is "How can I insert data into an existing dataset in CARTO?". So my take on this week's #using-carto is to answer not the original challenge but this other question.
BiciMAD is the public bike service of Madrid. They provide an API to check the status of their station network. What I want to do is feed a dataset in CARTO with the results of an API call. It’s a small dataset: 172 bike stations with 11 fields each, mostly numbers. It looks like a job for the SQL API!
As stated in the docs, CARTO SQL API allows you to interact with your tables and data inside CARTO as if you were running SQL statements against a normal database. It even has batch queries to send long-running jobs to the database.
Using the BiciMAD bike stations info as an example, I’ll show three different approaches to update your dataset. In all of them we’ll be constrained by batch queries, which have a 16KB size limit per job, so we’ll take that into account as we proceed.
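The 16KB limit means we can't always send all 172 stations in a single job: we have to split the rendered value tuples into chunks that stay under the limit. A minimal sketch of such a splitter (the function name and the optional `maxBytes` parameter are my own, added here for illustration):

```javascript
// Split already-rendered SQL value tuples, e.g. "(1, 'Quevedo', 15)",
// into batches whose total rendered size stays under the batch-query limit.
function chunkValues(valueTuples, header, maxBytes = 16 * 1024) {
  const batches = [];
  let current = [];
  let size = header.length; // every batch starts with the INSERT header
  for (const tuple of valueTuples) {
    const rendered = tuple + ','; // account for the separating comma
    if (size + rendered.length > maxBytes && current.length > 0) {
      batches.push(current); // this batch is full, start a new one
      current = [];
      size = header.length;
    }
    current.push(tuple);
    size += rendered.length;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}
```

Each resulting batch then becomes one statement in the batch-query job.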
If we don’t need to save historical data, what we can do is erase all data from the dataset and replace it with the result of the BiciMAD API call. This ensures our data is up to date.
As we are using batch queries what we should do is:
Sort the records coming from the API by their ID. We need to delete only the records that will go in the current job. Having them sorted will allow us to know exactly the range of records we need to erase first.
Start writing the INSERT query.
INSERT INTO ${username}.${table} (id, name, dock_bikes, ... , the_geom) VALUES
(1, 'Avda America', 24, ... ),
(2, 'Quevedo', 15, ... ),
(3, 'Suchil', 7, ... ),
DELETE FROM ${username}.${table} WHERE id >= 1 AND id <= 103
[
  "DELETE FROM ...",
  "INSERT INTO ..."
]
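The steps above can be sketched as a single function that turns the sorted API records into a delete-then-insert job (the record fields and the `buildReplaceJob` name are illustrative; the real table has 11 fields):

```javascript
// Build a batch-query job that replaces a range of rows:
// first a DELETE over the ID range, then a multi-row INSERT.
function buildReplaceJob(records, table) {
  const sorted = [...records].sort((a, b) => a.id - b.id);
  const minId = sorted[0].id;
  const maxId = sorted[sorted.length - 1].id;
  const values = sorted
    // Escape single quotes to keep the SQL valid ("O'Donnell" -> "O''Donnell")
    .map(r => `(${r.id}, '${r.name.replace(/'/g, "''")}', ${r.dock_bikes})`)
    .join(',\n');
  return [
    `DELETE FROM ${table} WHERE id >= ${minId} AND id <= ${maxId}`,
    `INSERT INTO ${table} (id, name, dock_bikes) VALUES\n${values}`
  ];
}
```

Because the records are sorted, the DELETE only touches the exact range the INSERT is about to rewrite.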
If you don’t feel comfortable with seeing only part of the data while it’s being updated, you can create UPSERT queries instead of a bulk delete and insert.
// You need a unique index on `id` in order to get a conflict
INSERT INTO ${username}.${table} (id, name, dock_bikes, ..., the_geom)
VALUES (${values}, ${geom})
ON CONFLICT (id) DO
UPDATE SET ${keysAndValues}, the_geom=${geom};
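One way to produce the `${keysAndValues}` fragment of that UPSERT is a small helper that renders every field except `id` as a `column=value` pair (this helper is my own sketch, not part of the original Gist):

```javascript
// Render a record as the SET fragment of the UPSERT,
// e.g. {id: 2, name: 'Quevedo', dock_bikes: 15} -> "name='Quevedo', dock_bikes=15"
function keysAndValues(record) {
  return Object.entries(record)
    .filter(([key]) => key !== 'id') // id is the conflict key, never updated
    .map(([key, value]) =>
      typeof value === 'number'
        ? `${key}=${value}`
        : `${key}='${String(value).replace(/'/g, "''")}'`) // escape quotes
    .join(', ');
}
```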
Neither of the two previous approaches makes sense for this challenge. If someone is interested in knowing the bike availability in real time, she’d use the official app. We cannot keep the same pace.
The most interesting use case is to save the historical data of the stations, so we can run analyses later to get insights about the hottest stations or usage patterns.
And it’s even easier than the two previous strategies. We only need to add a timestamp column and then use a multi-row INSERT statement.
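A minimal sketch of that historical insert, assuming a `measured_at` timestamp column and stamping every row with the database's `now()` (the column and field names here are my own assumptions):

```javascript
// Build one multi-row INSERT per API call; no DELETE needed,
// every snapshot just appends rows tagged with the capture time.
function buildHistoricalInsert(records, table) {
  const values = records
    .map(r => `(${r.id}, ${r.dock_bikes}, ${r.free_bases}, now())`)
    .join(',\n');
  return `INSERT INTO ${table} (station_id, dock_bikes, free_bases, measured_at) VALUES\n${values}`;
}
```

Using `now()` means all rows from the same API call share a timestamp, which makes grouping snapshots trivial later.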
I went for the third approach, updating the table every hour instead of aiming for ‘real-time’.
I implemented a Google Cloud HTTP function that calls the API and then builds the batch query jobs. The code of the function is in this Gist. Then, I use an uptime service that pings that URL every hour. I chose StatusCake but there are plenty of them. I did it like that for convenience but you can set a cron job from a server as well.
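The shape of that Cloud Function looks roughly like this. It is a sketch, not the actual Gist: the `BICIMAD_URL`, `CARTO_USER`, and `CARTO_API_KEY` environment variables and both function names are assumptions of mine, and the query-building step is elided.

```javascript
// Pure helper: assemble the request for CARTO's SQL Batch API
// (POST https://{user}.carto.com/api/v2/sql/job with {"query": [...]}).
function buildJobRequest(user, apiKey, queries) {
  return {
    url: `https://${user}.carto.com/api/v2/sql/job?api_key=${apiKey}`,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ query: queries })
    }
  };
}

// HTTP entry point, pinged every hour by the uptime service.
async function updateStations(req, res) {
  const stations = await (await fetch(process.env.BICIMAD_URL)).json();
  const queries = [/* build the INSERT statements from `stations` here */];
  const { url, options } = buildJobRequest(
    process.env.CARTO_USER, process.env.CARTO_API_KEY, queries);
  const job = await (await fetch(url, options)).json();
  res.status(200).send(job.job_id);
}
```

Keeping `buildJobRequest` pure makes the function easy to test without hitting either API.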
The dataset is public and growing. You can access it here. I plan to let it get fat all through April and then run some analyses on the resulting dataset. Feel free to use it!
Written by Ivan Malagon, front-end lead at CARTO and Hip Hop nerd. So if you like coding or Rap music you should follow me on Twitter or you can drop me a line to hacheka·gmail_com