Data data data!
I love data. I love creating databases, maintaining them, mashing them
up, and creating aggregated information. It pains me when data goes
missing or data is locked in a format that makes it useless to access.
But talk about data is very abstract. What do we mean by data?
1. WHAT is the data?
- Activity information, such as title, contract number, descriptions, points of contact, location, and other "stuff" about what we are doing? e.g. Our activity will train local government officials in XX to administer legal systems. We will train 20 trainers, who will train more trainers throughout the country, and build a website.
- Progress reporting? e.g. Our activity has created the training materials, established the website, and opened an office.
- Impact and outcomes? e.g. As the purpose of our activity is to improve the capacity of the Government of XX to administer laws, our key indicators will be the number of days it takes the government to go from identification to execution, or the public perception of how the government is doing.
- Best practices/lessons learned? e.g. We learned that trainees have to be of mixed gender in order not to reinforce gender bias...
- Success stories? e.g. Jane Smith is a local government official who has improved her abilities in X, Y and Z ways (and here is a photo of her teaching a class of other government officials).
- Other development or economic indicators? e.g. Performance indicators for government capacity, by country, region or city, over time.
2. How is the data:
- Formatted? PDF, JPG, CSV, relational databases, JSON, and Excel are all common formats, and each one has benefits and huge challenges in data sharing.
- Structured? Is it in an unstructured format (e.g. text and PDFs) or structured? Does it use common taxonomies? Does it use common standards and indicators (especially GIS), making it easy to compare, combine and aggregate with other data sources? (For guidelines on international development data standards, check out the International Aid Transparency Initiative.)
- Validated? How "clean" is the data? Does it have a lot of errors? Is it an exhaustive account of this area, or only illustrative?
3. Where is the data?
- Captured and collected? Who is doing the collecting and how are they sharing it? Are they doing it ad hoc or do they have standard formats and templates? Do they use a data reporting tool? Who is reviewing it and making sure it is complete?
- Shared with and reported to? Who is receiving the data, and what are they doing with it? Are they repurposing it or sending it on as is? Are they validating or checking it?
- Maintained/stored/housed? Where are the archives? Where are the past versions? What about backups? What about the long-term lifecycle?
- Accessible? How do we get a copy?
4. When is the data updated?
- Frequency of data capture? Is this data captured daily? Monthly? Annually?
- Frequency of database updates? How frequently is the data we can access updated from the data captured? (Many government datasets are updated only annually.)
- Start and end dates of data? What was the start date of data capture? If the data capture is part of a funded initiative, what happens to the data when the initiative ends?
5. Who has access to it?
- Is the data publicly available? Should it be?
- Copyright or IP issues?
- Are there privacy/security concerns? How are they being addressed?
- Which users need what format to access it? (raw data, aggregated/analyzed, dashboard, metadata tools)
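To make the "Validated?" question under item 2 a little more concrete, here is a minimal Python sketch that counts how many rows in a CSV file are missing required values. The column names (`title`, `contract_number`, `location`) and the sample rows are hypothetical, chosen only to echo the activity fields mentioned under item 1. A real dataset would need its own rules for what counts as "clean".

```python
import csv
import io

# Hypothetical required columns, echoing the activity fields in item 1.
REQUIRED = ["title", "contract_number", "location"]


def count_incomplete_rows(csv_text, required=REQUIRED):
    """Return (total_rows, rows_missing_at_least_one_required_value)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    total = 0
    incomplete = 0
    for row in reader:
        total += 1
        # A value counts as missing if the column is absent or blank.
        if any(not (row.get(col) or "").strip() for col in required):
            incomplete += 1
    return total, incomplete


# Made-up sample data, for illustration only.
SAMPLE = """title,contract_number,location
Legal systems training,ABC-001,XX
Website build,,XX
,ABC-003,
"""

total, incomplete = count_incomplete_rows(SAMPLE)
print(f"{incomplete} of {total} rows are missing required values")
```

A check like this only catches blanks. It says nothing about whether the values that are present are correct, which is why the question of who reviews the data still matters.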
So when we talk about data, we need to be able to answer all of these questions. Only then can we start discussing how to make that data open.
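One way to keep track of which questions have been answered for a given dataset is to write the answers down in a small record. The sketch below is only an illustration: the `DatasetProfile` class, its field names, and the example values are all made up for this post, not part of any standard.

```python
from dataclasses import dataclass, fields


@dataclass
class DatasetProfile:
    """One field per question in the list above; an empty string means unanswered."""
    what: str = ""   # 1. What is the data?
    how: str = ""    # 2. How is it formatted, structured, validated?
    where: str = ""  # 3. Where is it captured, shared, stored?
    when: str = ""   # 4. When is it updated?
    who: str = ""    # 5. Who has access to it?


def unanswered(profile):
    """Return the names of the questions that still have no answer."""
    return [f.name for f in fields(profile) if not getattr(profile, f.name).strip()]


# Hypothetical example values.
profile = DatasetProfile(
    what="Progress reporting for a training activity",
    how="CSV, structured, not yet validated",
    when="Captured monthly",
)
print(unanswered(profile))  # ['where', 'who']
```

Anything the function still lists is a question to settle before talking about opening the data up.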
For some examples of data sets that span the above questions, please see USAID Dev Data Sets