#12/04/2010#

"And now for something completely different"

So… I haven’t posted here for quite a while, but I’ve recently been working on an interesting part of our group Integrated Project at uni, and I thought some of the work I’d done might be useful to someone else out there.

For our project my group is creating a bus tracking system. There’s a website where you can work out the buses you need to take to get to your destination, and then there (should be/is) a mobile app that users can use to record the status of the bus they’re catching and let others know whether the bus is on time and/or full.

This obviously needed two important pieces of information: bus stops and bus times. Bus stops are easy: the government maintains a national database of all the bus stops in the UK (known as NaPTAN), which is a 300 MB XML file for the entire UK, or something much smaller for Bath. Being XML, and all nicely documented, the file is easy to extract the useful information from.

Related to NaPTAN is TransXChange, an XML format for bus timetables, but when we asked, Bath and North East Somerset Council (BANES) wasn’t able to give us the data in this nice XML format. No, they gave us OT7 files. OT7 files are unannotated text files with all the information for a particular bus timetable: all the services, stops and times. There seem to be two flavours of the file: one has complete detail about every stop the bus stops at, and the other has the more limited detail you tend to find on public timetables. If you’re looking at making a system involving bus routes, you’ll need the more detailed file to get every point where the bus stops.

There are three datatypes in an OT7 file: numeric, string and null.  Each data item is held on its own line.  Strings are enclosed in quotes, numerics are not and null is simply #NULL#.
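As a rough illustration of those three datatypes, here is a minimal Python sketch of how a single line might be converted into a value. The function names are my own invention, and returning unrecognised lines unchanged (for instance the bracketed first line described below) is an assumption for convenience rather than anything the format dictates.

```python
NULL_TOKEN = "#NULL#"


def parse_value(line):
    """Convert one raw OT7 line into a Python value (sketch)."""
    text = line.strip()
    if text == NULL_TOKEN:
        return None
    # Strings are wrapped in double quotes; "" is an empty string.
    if len(text) >= 2 and text.startswith('"') and text.endswith('"'):
        return text[1:-1]
    # Anything unquoted is assumed to be numeric: try int, then float.
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            pass
    # Assumption: anything else is handed back untouched.
    return text


def parse_lines(lines):
    """Apply parse_value to every line of a file."""
    return [parse_value(line) for line in lines]
```

With this, `parse_lines(open(path))` would give you one Python value per line, which makes the line counting in the rest of this post easier to follow.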

The first line of an OT7 file is a number enclosed in “[ ]”, which usually ends in a 3 in Bath.  Following that is a string representation of the days the timetable covers.  The third line is the date the timetable is valid from and the fifth line contains the date the timetable was made (I guess).

Thereafter follow 4 strings and 4 numbers. The last 3 numbers are important and will tell you how many services, stops and times are in the file. After these comes a variable number of numbers enclosed in quotes. These quoted numbers appear unimportant, but keep a count of how many there are because you’ll need it later for stops.
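Counting the quoted numbers could look something like the sketch below. It assumes you already know the index of the first quoted number (`start`, a hypothetical parameter) and that the run ends at the first line that isn't digits wrapped in quotes; I haven't confirmed that this stopping rule holds for every file.

```python
import re

# A line consisting solely of digits wrapped in double quotes, e.g. "12".
QUOTED_NUMBER = re.compile(r'^"\d+"$')


def count_quoted_numbers(lines, start):
    """Count consecutive quoted-number lines beginning at index `start`."""
    count = 0
    for line in lines[start:]:
        if not QUOTED_NUMBER.match(line.strip()):
            break
        count += 1
    return count
```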

8 lines further on are the days the timetable is valid on, in a less ambiguous format: Monday, Tuesday, Wednesday, tHursday, Friday, Saturday and $unday.
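Reading the emphasised characters in that list as one-character day codes (M, T, W, H, F, S and $), decoding the line might be sketched like this. That the line is simply a string of those characters is my assumption, and `decode_days` is a made-up name.

```python
# Day codes as suggested by the emphasised letters in the list of days.
DAY_CODES = {
    "M": "Monday",
    "T": "Tuesday",
    "W": "Wednesday",
    "H": "Thursday",
    "F": "Friday",
    "S": "Saturday",
    "$": "Sunday",
}


def decode_days(code_string):
    """Turn a string of day codes into a list of day names.

    Characters that are not known codes are ignored (an assumption).
    """
    return [DAY_CODES[ch] for ch in code_string if ch in DAY_CODES]
```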

Three lines below that the services start.  Each service is 5 lines long.  The first line is the service number, the second is a description of the route and the third is the operator name.  The other two are usually blank.
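Since every service takes exactly 5 lines, reading them is just a matter of slicing. The sketch below assumes the lines have already been converted into values, that `start` is the index of the first service line, and that `count` is the number of services from the header; the `Service` type and `read_services` are names I've invented for illustration.

```python
from typing import NamedTuple

SERVICE_LINES = 5  # each service occupies five lines


class Service(NamedTuple):
    number: object       # line 1: service number
    description: object  # line 2: description of the route
    operator: object     # line 3: operator name
    extra: tuple         # lines 4-5: usually blank


def read_services(values, start, count):
    """Read `count` services starting at index `start`.

    Returns the services and the index of the line after the last one.
    """
    services = []
    for i in range(count):
        base = start + i * SERVICE_LINES
        record = values[base:base + SERVICE_LINES]
        if len(record) < SERVICE_LINES:
            raise ValueError("file ended inside a service record")
        services.append(
            Service(record[0], record[1], record[2], tuple(record[3:]))
        )
    return services, start + count * SERVICE_LINES
```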

2 lines after the services is the list of stops. Each stop is a block of lines: the first line is the name of the stop and the last line is the ATCO code that you can look up in the NaPTAN database. In between are 26 lines, plus one more for each of the quoted numbers you should’ve counted earlier.
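If I read my own notes correctly, that makes each stop record 1 + (26 + N) + 1 lines long, where N is the count of quoted numbers from the header. A minimal sketch under that assumption (the function names and parameters are hypothetical, and the values are assumed to be already parsed):

```python
def stop_record_length(quoted_number_count):
    """Lines per stop: name + (26 + N) filler lines + ATCO code."""
    return 1 + (26 + quoted_number_count) + 1


def read_stops(values, start, stop_count, quoted_number_count):
    """Return a list of (name, atco_code) pairs, one per stop."""
    size = stop_record_length(quoted_number_count)
    stops = []
    for i in range(stop_count):
        record = values[start + i * size:start + (i + 1) * size]
        if len(record) < size:
            raise ValueError("file ended inside a stop record")
        # First line is the stop name, last line is the ATCO code.
        stops.append((record[0], record[-1]))
    return stops
```

The ATCO code in each pair is what you would use as the key when looking the stop up in NaPTAN.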

Immediately after the stops is the list of times. Each bus’s entry starts with the service number, followed by 48 lines that appear unimportant, followed by the times for that particular bus at each stop. Times are always separated by 7 lines of “”. Times are listed in the same order as the list of stops, so time[i] relates to stop[i]. At the end of the times are some useful codes associated with the stops, which you may or may not want to ignore.
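Put as offsets, that layout means the first time sits 49 lines after the start of a bus's entry and each later time is 8 lines further on. The sketch below assumes exactly that, with no separator lines before the first time, and assumes you already know the number of stops; it makes no attempt to interpret the time values themselves or the trailing codes.

```python
SKIPPED_LINES = 48  # lines after the service number that appear unimportant
TIME_STRIDE = 8     # one time plus the 7 empty separator lines


def read_bus_times(values, start, stop_count):
    """Return (service_number, times) for the bus entry at index `start`.

    times[i] is the raw value for stop i, in the same order as the stops.
    """
    service_number = values[start]
    first_time = start + 1 + SKIPPED_LINES
    times = [values[first_time + i * TIME_STRIDE] for i in range(stop_count)]
    return service_number, times
```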

It’s worth mentioning that if an OT7 file contains more than one service, not every service will stop at every stop. So if you’re looking to split each service out into an individual database record, for example, you’ll need to work out which stops each service doesn’t stop at (indicated by a blank time).
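Filtering out the stops a bus skips could be sketched as follows. It assumes you already have the stops and one bus's times as two equal-length lists, and that a "blank" time shows up as either an empty string or a null; `stops_served` is a name I've made up.

```python
def stops_served(stops, times):
    """Pair each stop with its time, dropping stops with a blank time."""
    if len(stops) != len(times):
        raise ValueError("expected exactly one time per stop")
    return [
        (stop, time)
        for stop, time in zip(stops, times)
        if time not in ("", None)  # blank time: the bus does not stop here
    ]
```

Running this once per bus gives you just the stops that service actually calls at, which is the shape you would want for one database record per service.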

All the above is from my own analysis and reverse engineering of OT7 files.
