From Data Processing To Data Psychology: Curating Open Government Data Utility

Guest post by Phil West, a Solutions Architect with the Microsoft Office of Civic Innovation.

A large part of Open Government is what people have termed "open data" - in other words, making what should be publicly-available data more findable, searchable, accessible, useable, and sharable online. Here, I discuss a strategic vision for turning storehouses of data into actionable information that is useful, consumable, and interesting. I also introduce the notion of "Data Psychologists" who curate and analyze how useful, consumable, and interesting data is prior to public release.

Data Cosmetic Surgeons Process Data Formats

I have been struggling with my data relationships for a long time.

My first technical job was working in a “data processing” department as a systems analyst. Simply, my job was to analyze computer systems where we processed data. I took datasets and analyzed their structure, often reformatting them to suit our company's database architectures.

In this role, I never really “consumed” data; rather, I viewed it, sliced it, and diced it. My relationships with data weren't very meaningful to me, in the sense that I cannot recall any of the actual contents for you. I processed prepared data products, like ready-to-eat meals shrink-wrapped in plastic.

My role as a systems analyst was something like a Data Cosmetic Surgeon interacting with the somewhat superficial "body" of data - I worried about the format and appearance of the information, but not the meaningfulness of the information. At this point in my career, I had clinical, superficial relationships with data. My interactions with data were gratifying, but in a narrowly-defined sense of the word.

Modern Data Requires Intimate Attention From Data Internists

But my data relationships have evolved, and matured even. Fast-forwarding two decades, organizations have different relationships with data, too. One of them is an embarrassment of riches: Gigabytes of data! Terabytes! Petabytes, even! And this overload comes in all forms, from vast numbers of small datasets to very few really HUGE datasets.

Despite everyone having lots of data and some tactics to deal with it, however, few organizations seem to have a strategic vision for how to transform massive data into a massive asset. While keeping in mind that some data security measures are absolutely necessary, it still seems like a lot of organizations go beyond securing data - they stifle it. They limit its true potential.

What is the vision for how employees can develop more intimate relationships with data, and turn it into actionable information that is useful, interesting and consumable to themselves, their colleagues, and even the public? Within Federal government circles, the Open Government Directive offers some strategic direction, and some progress on open government data has certainly been made. But I believe that organizations sometimes view data and how to use it in a self-limiting way.

For a long time, Data.gov, the Federal government's official open data website, suffered as something of a bulletin board where agencies would post their datasets (or links to their datasets) as static files in CSV or KML format. These data were in some sense a "grab bag" - you had to buy the whole bag to find out if there was anything useful to you inside it. Maybe geeks and wonks (and 1980s systems analysts) like to download large files and weed through them to discover useful nuggets, but what about the average citizen?

At this point there was a disconnect between Open Government as geeks and wonks see it, and how citizens would find it useful. Would they know what a CSV file is and that they could open it in Microsoft Excel, or would they know how to sort it, analyze it, graph it, etc. to find an answer to their question? Would they have the time and patience? Probably not.

But now, Data.gov is moving into the dynamic data world. In this world, data is stored, displayed, and showcased using methods more familiar to average people. In this world, data can be catalogued and cross-referenced within a government data marketplace. In this world, data servers run standard protocols like OData so that data can be read and written across diverse devices and operating systems. This is the world of Data Internists that touch the "mind" of data.

So - if you are looking for statistics about, say, unemployment per state as a function of average winter temperature, you might also get suggestions about related data, like state trends in unemployment rates over time. The data is also machine-searchable, meaning that applications can query Data.gov datasets, grab what they need, and move on; this is especially important for apps on lower-end devices like phones or tablets.
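To make "grab what they need, and move on" a little more concrete, here is a minimal sketch of how an application might compose an OData-style request that asks only for the rows and columns it wants. The query options `$filter`, `$select`, and `$top` are standard OData system query options; the service root `https://example.invalid/odata`, the entity set `UnemploymentByState`, and the field names are hypothetical placeholders, not actual Data.gov endpoints. The function only builds the URL and makes no network call.

```python
from urllib.parse import quote, urlencode


def build_odata_query(service_root, entity_set, filter_expr=None, select=None, top=None):
    """Compose an OData-style query URL (hypothetical helper, for illustration only)."""
    options = {}
    if filter_expr:
        options["$filter"] = filter_expr        # which rows to return
    if select:
        options["$select"] = ",".join(select)   # which columns to return
    if top is not None:
        options["$top"] = str(top)              # cap on the number of rows

    url = f"{service_root.rstrip('/')}/{entity_set}"
    if options:
        # Leave '$', ',' and quote marks readable; percent-encode spaces and the rest.
        url += "?" + urlencode(options, quote_via=quote, safe="$,'")
    return url


# Hypothetical service root, entity set, and field names.
url = build_odata_query(
    "https://example.invalid/odata",
    "UnemploymentByState",
    filter_expr="State eq 'Ohio' and Year ge 2009",
    select=["State", "Year", "Rate"],
    top=10,
)
print(url)
```

A phone or tablet app that sends a request like this downloads a handful of rows rather than an entire static file, which is the practical difference between the "grab bag" and the dynamic data world described above.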

Data Psychologists Curate Data Usefulness

So – we now have intimate data relationships. We are interacting with the mind and body of data - we have dynamic data in a format that is searchable and usable via an open protocol across a wide variety of devices and platforms. It is housed in a place that provides a way to catalog and showcase the data, and even provides help in cross-referencing to other related datasets. This foundation should permit an ecosystem of internal and external, third-party applications to consume such open government data, and produce truly valuable intelligence (of, by, and for the people, as it were).

You might think we're progressing quite nicely in our relationship with open government data - We've sliced and diced it, rearranged it, cataloged it, cross-referenced it, made it respond to a common set of commands...but something is still just not quite right. How can we develop an even deeper, more meaningful relationship with this data?

Well, if Data Cosmetic Surgeons interact with the body of data, and Data Internists interact with its mind, then Data Psychologists go even deeper and interact with data's very spirit. The Data Psychologist is a master of curation, digging through the nuances of datasets to ensure that the data is useful, interesting, and relevant. Nuances worth examining include the age of the data, the number of data elements, and the overall purpose of the dataset. Every dataset is different, and like developing relationships with patients, Data Psychologists must take each case one at a time.

What makes data useful, interesting, and relevant? Well, each case is certainly different, but a Data Psychologist would distinguish between the usefulness of 30 years' worth of average traffic speed data, and whether the #4 bus was late getting to the corner of Elm and Main on August 6, 1966. The Data Psychologist also ensures that data is consistent; having average rainfall every Wednesday for three years is consistent, and having haphazard readings is not necessarily so. And a Data Psychologist might find that, as with memories, organizations sometimes keep useless data for no apparent good reason.
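The consistency point can be checked mechanically. The sketch below is one possible approach, not a prescribed method: it looks at the gaps between successive reading dates and treats a series as regular only if every gap matches the expected interval. The function names and sample dates are invented for illustration.

```python
from datetime import date, timedelta


def reading_gaps(dates):
    """Return the gaps between successive readings, in chronological order."""
    ordered = sorted(dates)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def is_regular(dates, expected=timedelta(days=7)):
    """True only if there are at least two readings and every gap equals `expected`."""
    gaps = reading_gaps(dates)
    return bool(gaps) and all(gap == expected for gap in gaps)


# Invented sample data: five consecutive Wednesdays versus scattered readings.
weekly = [date(2010, 1, 6) + timedelta(weeks=i) for i in range(5)]
haphazard = [date(2010, 1, 6), date(2010, 1, 9), date(2010, 2, 1)]

print(is_regular(weekly))
print(is_regular(haphazard))
```

A check like this only flags irregular collection; deciding whether the irregular readings are still worth publishing remains a judgment call for the curator.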

A Preliminary Strategic Vision for Open Government Data

In the age of open government, the Feds need to go on a Data Psychologist hiring spree. Agencies will need to psychoanalyze the internal spirit of their data before they introduce it to the widespread public.

I believe that understanding the spirit of data will be the next wave of “data processing” in which we stop clinically handling the data only like Data Cosmetic Surgeons and Internists. This will clean up a lot of useless datasets, and it will make the remaining datasets more interesting in turn – and more likely to power applications that businesses and individual citizens will find helpful and informative.

And thus, we have a preliminary strategic vision for open government data: taking curated data, stored in a dynamic format that can be accessed via an open protocol, and making it easily consumed across a variety of devices and platforms. It will require Data Cosmetic Surgeons and Data Internists, yes, but also Data Psychologists, working together to turn the body, mind, and spirit of open data into actionable information that is useful, interesting and consumable.

Otherwise, we may need to start looking for a Data Coroner.

Photos of body-mind-spirit, Michael Jackson in wax, the internist, and Freud used under Creative Commons.