Anyone who’s worked with Access will probably know about the First and Last aggregate functions that return the first or last value of a particular column in an aggregate recordset. If all you’re working with is Access, then all is fine and dandy (well, I say fine and dandy, you’re still using Access), but if you want to migrate to a capable database engine, then you’re going to be stumped as most other systems don’t have a first or last function.
Consider the following table showing some temperature readings taken throughout the day by a particular sensor:
| Sensor ID | Time | Value |
| 1 | 14/01/2011 10:14 | 28.5 |
| 1 | 14/01/2011 10:24 | 28.7 |
| 2 | 14/01/2011 10:28 | 14.6 |
| 1 | 14/01/2011 10:34 | 28.9 |
| 2 | 14/01/2011 10:38 | 14.8 |
Imagine that we want to summarise this table by sensor and date providing, among other things, the first and last readings of the day. In Access we could probably do something like:
SELECT sensor_id,
DATE(time) AS date,
FIRST(value) AS first_value,
LAST(value) AS last_value
FROM reading
GROUP BY sensor_id,
DATE(time)
ORDER BY sensor_id,
DATE(time)
Query: access_reading
But any database that doesn’t support first or last isn’t going to like this, so what are we going to do? The first thing is to think about the problem in a slightly different way. Instead of thinking of wanting the first and last values, we want the values on the rows with the earliest and latest times for a particular sensor. We can get the earliest and latest times for a sensor by applying the MIN and MAX functions (which a good database should have) to the time column. So under this new way of thinking, we’re going to need some way to map from the date, to the earliest and latest times under that date, and then join back to our original table to get the values. This intermediate query will look something like this:
SELECT sensor_id,
DATE(time) AS date,
MIN(time) AS early_time,
MAX(time) AS late_time
FROM reading
GROUP BY sensor_id,
DATE(time)
Query: reading_aggregate
So now we can join access_readings to reading_aggregate through the sensor_id and date fields on each query, then we can pull in the ‘early_time’ row by joining back to readings on sensor_id and early_time to sensor_id and time, and again for the ‘late_time’ row. This produces some truly beautiful SQL, so I apologise for the mess below:
SELECT reading_date.sensor_id,
reading_date.date,
early_reading.value,
late_reading.value
FROM reading_date JOIN reading AS early_reading ON reading_date.sensor_id = early_reading.sensor_id AND reading_date.early_time = early_reading.date
JOIN reading AS late_reading ON reading_date.sensor_id = late_reading.sensor_id AND reading_date.late_time = late_reading.date
Query: sql_readings
These queries provide a basis for a pattern, you can add further fields from reading to reading_aggregate. You may also notice that there’s no ORDER BY clause, so if you’re not fussed about sorting data, in the query, you don’t need to worry about it and have the database engine waste time doing so.
I should point out that this pattern only works if your equivalent of sensor_id and time form a candidate key (or are the primary key) for your table. If multiple readings can come in at the same time from the same sensor, then you may end up with multiple rows for each sensor in sql_readings. If this applies to you, then you could replace the joins and their clauses with a subquery selecting the TOP 1 row from reading matching the appropriate criteria:
SELECT reading_date.sensor_id,
reading_date.date,
early_reading.value,
late_reading.value
FROM reading_date JOIN (
SELECT TOP 1 *
FROM reading
WHERE reading.sensor_id = reading_date.sensor_id
AND reading.time = reading_date.early_time) AS early_reading
JOIN (
SELECT TOP 1 *
FROM reading
WHERE reading.sensor_id = reading_date.sensor_id
AND reading.time = reading_date.late_time) AS late_reading
Query: sql_multi_readings
This query will execute slower as it needs to run 2 subqueries for each row, so if your data fits the assumptions for sql_readings, use that one, otherwise sql_multi_readings will help you avoid the result of the cross-join between early_reading and late_reading. Note that this sql_multi_readings there are no criteria on the JOINs because they’re handled in the subqueries WHERE clauses.
The pattern for SQL queries needed to replace First and Last has been demonstrated here. If your ‘sensor_id’ and ‘time’ fields form a candidate key for your table, then the sql_readings query will efficiently replace it, otherwise the sql_multi_readings, a less efficient but stricter, query will get the information you want.
P.S. you think this is boring? Try revising Safety Critical Computer Systems!
0 comments:
Post a Comment