Last night I attended the monthly Dorkbot event at the Centre For Life. As usual there were lots of fascinating talks about a range of subjects, from building a virtual model of Newcastle and Gateshead in Second Life to modeling the Tyne & Wear Metro system in real time in Google Earth. Geek cool at its very best.

One particular topic was especially interesting. John Coburn and Mike Hirst presented an overview of “Culture Grid”, a project designed to open up data held by museums and archives across the North East. They were interested in what people might come up with given free rein to search through the data that has been made available so far.

My interest was piqued.

So this morning I fired up PHP and pointed it at the Culture Grid API. Easy stuff really: Culture Grid uses an SRU mapping over Apache Solr, so it’s all fluffy and standards compliant. Only… no. Things are never that straightforward.
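For anyone who hasn’t met SRU before, a search is just an HTTP GET with a handful of standard parameters. Here’s a minimal sketch in Python (rather than the PHP I was using) of putting one together. The endpoint address, the helper name and the choice of SRU version 1.1 are all assumptions for illustration; the real values need to come from the Culture Grid PDF.

```python
from urllib.parse import urlencode

# Hypothetical placeholder; NOT the real Culture Grid endpoint.
BASE_URL = "http://example.org/sru"


def build_sru_url(query, start=1, limit=10, base_url=BASE_URL):
    """Build an SRU searchRetrieve URL for a CQL query string."""
    params = {
        "operation": "searchRetrieve",  # standard SRU search operation
        "version": "1.1",               # assumed; the service may expect another
        "query": query,                 # CQL query, URL-encoded below
        "startRecord": start,           # SRU record positions start at 1
        "maximumRecords": limit,        # page size
    }
    return base_url + "?" + urlencode(params)


print(build_sru_url("tyne bridge"))
```

Keeping the URL building in one small function means the query, paging and endpoint can each be changed without touching the rest of the script.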

The first problem is the woefully thin documentation. There’s a single PDF that essentially just lists the fields the API can return. A couple of example URLs are included, but they’re obvious ones. That’s all the help you’re going to get. There are no code samples, no libraries for common languages, and apparently no forum or community to turn to for what’s been worked out so far. Admittedly the service is very new and there’s been little time to build up a following, so perhaps that’s forgivable at this stage in the project.

That wouldn’t be such an issue if the data returned by the service worked with the tools we have available. The XML returned by Culture Grid won’t parse with libxml (e.g. via PHP’s SimpleXML wrapper). This is a significant problem for anyone who is relatively new to coding or who doesn’t want to dedicate a fair amount of time to writing something that will work with it. It is accessible in a very basic way by stripping out the “:” namespace separator in all the XML tags, but the resulting XML tree isn’t very useful either. The API accepts an XSL stylesheet, so it should be possible to write a set of transformations that will massage what comes out into something that can be parsed by PHP, but that’s not going to be much fun.
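Stripping the colons throws away the namespace information, which may be part of why the resulting tree isn’t much use. An alternative worth trying is to keep the namespaces and tell the parser about them. Here’s a rough sketch in Python rather than PHP, run against a made-up sample in the general shape of an SRU response. The sample XML, its field names and the helper functions are my own illustration, not actual Culture Grid output, so treat it as a starting point only.

```python
import xml.etree.ElementTree as ET

# Made-up sample in the general shape of an SRU response; the real
# Culture Grid output will have different fields and nesting.
SAMPLE = """<srw:searchRetrieveResponse
    xmlns:srw="http://www.loc.gov/zing/srw/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <srw:numberOfRecords>2</srw:numberOfRecords>
  <srw:records>
    <srw:record>
      <srw:recordData>
        <dc:title>Tyne Bridge under construction</dc:title>
        <dc:subject>bridges</dc:subject>
        <dc:subject>Newcastle</dc:subject>
      </srw:recordData>
    </srw:record>
    <srw:record>
      <srw:recordData>
        <dc:title>Gateshead quayside</dc:title>
      </srw:recordData>
    </srw:record>
  </srw:records>
</srw:searchRetrieveResponse>
"""

# Prefix-to-URI map used when searching; the prefix is only a local alias.
NS = {"srw": "http://www.loc.gov/zing/srw/"}


def local_name(tag):
    """Turn '{namespace-uri}title' into 'title'."""
    return tag.rsplit("}", 1)[-1]


def records_to_dicts(xml_text):
    """Return one dict per record, mapping field name to a list of values."""
    root = ET.fromstring(xml_text)
    results = []
    for data in root.iterfind(".//srw:recordData", NS):
        record = {}
        for field in data.iter():
            # Skip the container itself and any element without real text.
            if field is data or not (field.text and field.text.strip()):
                continue
            name = local_name(field.tag)
            record.setdefault(name, []).append(field.text.strip())
        results.append(record)
    return results


for rec in records_to_dicts(SAMPLE):
    print(rec)
```

Each field maps to a list because a record can repeat an element, as the sample does with its subject entries. PHP’s SimpleXML has namespace-aware methods of its own, so the same idea should carry across, though I haven’t tested that against the live service.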

I think it’s unlikely anyone is going to bother unless they already have a project in mind that’s important to them. Playing with the data just isn’t going to happen unless it’s presented in a better format to start with. As it is, because I’m interested in using the data, I’ve cobbled together some code to parse one of the examples into a standard PHP array: http://www.usrlab.com/code/culturegrid.phps . Note that this is a very rough and ready script. It’ll need a lot of work before it’s actually usable. Feel free to build on it.

This case highlights the biggest problem facing all Open Data initiatives. If the data isn’t in a format that someone can easily access then it’s going to lie fallow, unused and unexplored. As part of Compare The Members I’ve seen lots of mashups that use some of the data sets opened up by the government as part of the push towards more transparent government. It’s notable that the more accessible the data is, regardless of how interesting or controversial it is, the more people will make something out of it. Ease of access is paramount.