What is an Open Data Project? (Part 3)

Written by: Mike Davies, West Yorkshire Combined Authority, 29/09/2014

In part 1 and part 2 we looked at why and how you might approach an open data project. So what exactly is this data that we’re talking about? What does it mean?

Data and Wisdom

“They say there’s enough data in the world to make people confused, but not enough to make them wise.”
Louis Cyphre [misquoted], from the movie Angel Heart

There’s a way of thinking about data’s place in the world, and it’s expressed like this:
Data ⇢ Information ⇢ Knowledge ⇢ Wisdom

Here’s a true-life example to explain the difference between these things:

I’m standing at the bus-stop trying to work out when my next bus is, looking at the timetables – lots of data! I find the bus service I need, and the next times for my bus – this is more like information, data that is relevant to me here and now, that I can use to decide what to do next.

I remember that my last bus home is at 11:00pm (this is knowledge, gained from past experience: the memory of getting the last bus). But it’s actually 11:30… oh, if I had any wisdom, I wouldn’t have stayed in the pub so long that I now have to find a taxi…

Let’s take that analogy and apply it to the complex systems and processes within your own organisation. If you’re trying to understand how all the IT systems within your business operate and link together, or are simply looking for some interesting data to publish on Leeds Data Mill, it’s unlikely that you’ll make first contact with the data itself. Whether it’s in databases running on servers or in files saved in folders, the data is initially completely inaccessible to you.

To get to the data, what you’re really doing is following the data ⇢ knowledge trail, but in reverse:

Knowledge = people that use the systems every day
Information = what the IT systems provide
Data = databases and files

This is why open data is all about people: data on its own doesn’t help you find or understand it; it just is. Your route to the data will be through the people who receive it, use it and view it.

As you follow the rabbit hole towards the data, you’ll be picking up important information about what the data means, even before you’ve seen it. For example:

  • who is responsible for the data?
  • where is it stored?
  • how much of it is there?
  • does it contain personal information about people?

This information is metadata, data about data. (To be honest, the academic distinction between metadata, data and information is not something I think about too much. All that matters is ‘does it seem interesting and useful? Does the metadata help me organise and explain the data that it describes?’)
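As a rough illustration, the four questions above could be written down as a small structured record. This is only a sketch: the class name, field names and all the values are made up for the example, not taken from any real system.

```python
from dataclasses import dataclass, asdict


@dataclass
class DatasetNotes:
    """Answers to the questions you pick up on the way to the data."""
    name: str                # hypothetical label for the dataset
    responsible: str         # who is responsible for the data?
    stored_in: str           # where is it stored?
    approx_records: int      # how much of it is there?
    has_personal_info: bool  # does it contain personal information?

    def needs_privacy_review(self) -> bool:
        # Anything containing personal information should be checked
        # before it is considered for publication.
        return self.has_personal_info


# All values below are invented for illustration.
notes = DatasetNotes(
    name="bus stop timetables",
    responsible="timetable team",
    stored_in="scheduling database",
    approx_records=50000,
    has_personal_info=False,
)

print(asdict(notes))
print("Privacy review needed:", notes.needs_privacy_review())
```

Keeping the answers in one consistent shape like this makes it easier to compare datasets later, whatever tool you actually use to record them.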

Metadata is data! In other words, it needs to be treated like data, in a structured way. There are a few standard ways of describing data (e.g. http://dublincore.org/). Such standards are necessary to help people share and link their various data with each other in a consistent way, and this is key to making Open Data a workable idea. But such standards won’t necessarily help you to make sense of your own data; that’s specific to your organisation. Remember, the point is firstly to make the data understandable and useful within your own organisation, and then to anyone looking at it should you publish it. So there are really 3 sets of metadata:

  1. Describing what the data means to your organisation
  2. Describing what the data that you publish means to people outside your organisation
  3. Describing how the data is published and what its context is (e.g. ‘data for Aug 2014’)
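
One possible way to picture the three sets is as three separate records, with only the outward-facing ones combined for publication. This is a sketch under assumptions: the internal field names, the function name and every value are hypothetical, and the `dc:` keys simply borrow element names from Dublin Core (title, description, publisher, date, format, coverage) rather than following any particular publishing platform's schema.

```python
# 1. What the data means to your organisation (hypothetical fields).
internal = {
    "system": "scheduling database",
    "owner": "timetable team",
    "notes": "stop codes follow an internal numbering scheme",
}

# 2. What the published data means to people outside the organisation.
published = {
    "dc:title": "Example bus timetables",
    "dc:description": "Scheduled departure times by stop and service",
    "dc:publisher": "Example Organisation",
}

# 3. How the data is published and what its context is.
release = {
    "dc:date": "2014-08",
    "dc:format": "text/csv",
    "dc:coverage": "August 2014",
}


def public_record(published: dict, release: dict) -> dict:
    """Merge the two outward-facing sets; internal metadata stays in-house."""
    return {**published, **release}


record = public_record(published, release)
for key, value in record.items():
    print(f"{key}: {value}")
```

Keeping the internal set separate means it can be as detailed and organisation-specific as you like without leaking into what you publish.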

What I’m really saying is, it’s all about the data, but the data is only the start. The real problem is in revealing the hidden relationships between things, between people, places and services… that discussion is for another time maybe!

@johnmaeda: “It’s not about choosing X versus Y. It’s about choosing the right relationship between X and Y.” —@jshefrin

For those of you just beginning your open data trip, I hope these blog posts will inspire you to explore further. Don’t forget, there is a wonderful community here in Leeds and WY to support you!

Mike is a Business Analyst at West Yorkshire Combined Authority (WYCA), which is the official government agency for transport across West Yorkshire. Get in touch with him on Twitter @dotlineform

All views expressed in this article are Mike’s and do not necessarily reflect those of his employer WYCA.

Feature image created by, and used with permission from, Mike Davies via Behance