project:discoverabilitythroughstructure

1. http://opendata.admin.ch/de/dataset/je-d-17-02-01-01 lacks important structured metadata:

  • Date range, which is only in the description.
  • Geographical information (“Switzerland”), as this is assumed to be implicit.
  • Multi-language metadata.
  • Keywords like the names of the parties mentioned in the data.

2. Open data is kept in disconnected portals run by individual organizations. Potential users may not be aware of all the relevant portals.

Create a global open data search portal that can forward queries to individual instances and do metadata search by:

  • Keyword
  • Geography
  • Timespan
  • Data Format
  • Licence
  • Category
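
As an illustration, the criteria above could be translated into a request against CKAN's `package_search` action. This is only a sketch: the `MetaQuery` class and `to_ckan_url` function are hypothetical names, and the filter fields `res_format`, `license_id` and `groups` are assumed to be indexed on the target instance (worth verifying per portal). Geography and timespan have no standard CKAN field, so here they are simply sent as extra keywords.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlencode


@dataclass
class MetaQuery:
    """Portal-independent search request (hypothetical structure)."""
    keyword: str = ""
    geography: Optional[str] = None    # no standard CKAN field: sent as keyword
    timespan: Optional[str] = None     # likewise, e.g. "2011"
    data_format: Optional[str] = None  # e.g. "CSV"
    licence: Optional[str] = None      # e.g. "cc-by"
    category: Optional[str] = None     # assumed to map to a CKAN group name


def _quote(value: str) -> str:
    """Wrap a value in double quotes so it is treated as one filter term."""
    return '"' + value.replace('"', '\\"') + '"'


def to_ckan_url(base_url: str, query: MetaQuery, rows: int = 20) -> str:
    """Build a package_search URL for one CKAN instance."""
    # Criteria without a dedicated index field fall back to free text.
    terms = [t for t in (query.keyword, query.geography, query.timespan) if t]
    filters = []
    if query.data_format:
        filters.append("res_format:" + _quote(query.data_format))
    if query.licence:
        filters.append("license_id:" + _quote(query.licence))
    if query.category:
        filters.append("groups:" + _quote(query.category))
    params = {"q": " ".join(terms) or "*:*", "rows": rows}
    if filters:
        params["fq"] = " AND ".join(filters)
    return base_url.rstrip("/") + "/api/3/action/package_search?" + urlencode(params)
```

Keeping the query object independent of CKAN would make it easier to add translators for other portal types later.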

A simple implementation of this would be a site that allows querying multiple CKAN instances. From this point, additional major open data portal types could be targeted. One possible backend for this might be pazpar2.
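
A minimal sketch of such a multi-instance query, assuming each portal exposes the standard CKAN action API at `/api/3/action/package_search`. The function names are hypothetical, and the `fetch` parameter exists only so that the HTTP call can be swapped out (for testing or caching); this is not how pazpar2 works, just the simplest possible fan-out.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_json(url, timeout=10):
    """Download and decode one JSON document."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def search_portal(base_url, keywords, rows=10, fetch=fetch_json):
    """Query one CKAN instance; return its datasets tagged with their origin."""
    url = (base_url.rstrip("/") + "/api/3/action/package_search?"
           + urlencode({"q": keywords, "rows": rows}))
    try:
        payload = fetch(url)
    except Exception:
        # Assumption: an unreachable portal should not fail the whole search.
        return []
    if not payload.get("success"):
        return []
    return [
        {"portal": base_url, "name": ds.get("name"), "title": ds.get("title")}
        for ds in payload["result"]["results"]
    ]


def federated_search(portals, keywords, fetch=fetch_json):
    """Forward the same query to every portal in parallel and merge the hits."""
    with ThreadPoolExecutor(max_workers=max(1, len(portals))) as pool:
        per_portal = pool.map(
            lambda portal: search_portal(portal, keywords, fetch=fetch), portals)
        return [hit for hits in per_portal for hit in hits]
```

Calling `federated_search` with a list of portal base URLs and a keyword string returns one merged list, grouped by portal in the order given. Ranking across portals is left open here.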

Use case

Goal: A user wants to find data that combines election results with demographic characteristics (age, nationality, area, etc.).

User workflow without the meta portal: The user searches different portals with different interfaces to find the information he wants, so he has to enter the same query several times. Once the necessary data is found, he has to combine and visualize it himself using a separate application.

Disadvantages: Searching takes a lot of time, or the necessary data is not found even though it is actually available.

User workflow using the meta portal: The user searches only one portal with one interface. The system helps him find the data by providing filters, suggestions, clustering, and maps. He might find data from repositories that he otherwise wouldn't have searched.

Specific user workflow:

  1. The user enters the keywords zurich elections
  2. The system suggests search terms or displays boxes for disambiguation (e.g. Zurich, the city in Switzerland, versus Zurich, the canton in Switzerland).
  3. User selects “his” Zurich and the list of results is adjusted according to his choice.
  4. User selects relevant datasets and saves them to a “download list”.
  5. He starts a new search zurich demographics.
  6. The system provides facets/filters (age, nationality, sex, education) in order to refine the list of results.
  7. The user again selects relevant datasets and saves them to the “download list” (as in step 4).
  8. The user has the possibility to directly combine and visualize the actual data within the portal itself.
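
The facets in step 6 could be backed by a simple count over the current result list. A minimal sketch, assuming each result is a dictionary with multi-valued fields such as `tags`; the record layout and both function names are hypothetical.

```python
from collections import Counter


def facet_counts(datasets, field):
    """Count how many datasets carry each value of a multi-valued field."""
    counts = Counter()
    for dataset in datasets:
        # set() so a value repeated within one dataset is counted only once
        counts.update(set(dataset.get(field, [])))
    return counts


def apply_filter(datasets, field, value):
    """Keep only the datasets that carry the selected facet value."""
    return [ds for ds in datasets if value in ds.get(field, [])]
```

The counts tell the interface which filter values to offer, and `apply_filter` narrows the list once the user picks one.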

Datasets to experiment with:

Metadata on individual open data resources is often not as comprehensive as it could be. The idea is to extract additional metadata from the data itself. Ideally, this would move towards being able to query and display individual records in each resource.

This could, for example, be implemented as a CKAN extension and/or a standalone executable. CKAN already has the DataStore extension, which allows full-text search of files uploaded to a CKAN instance. Metadata extraction would improve this further.
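
As a rough illustration of extracting metadata from the data itself, the sketch below derives column names, a row count and a year range from CSV content, which would cover a date range that is otherwise only mentioned in a description. The function name and the returned keys are hypothetical, and the year detection is a deliberately naive heuristic: any cell that looks like a four-digit year between 1800 and 2099 is counted, so a numeric value such as a vote count of 2011 would be misread.

```python
import csv
import io
import re

YEAR = re.compile(r"^(1[89]|20)\d{2}$")  # plain four-digit years, 1800-2099


def extract_metadata(csv_text):
    """Derive column names, row count and a year range from CSV content."""
    reader = csv.DictReader(io.StringIO(csv_text))
    years = []
    row_count = 0
    for row in reader:
        row_count += 1
        for value in row.values():
            # Skip missing cells and overflow columns (which are not strings).
            if isinstance(value, str) and YEAR.match(value.strip()):
                years.append(int(value.strip()))
    return {
        "columns": reader.fieldnames or [],
        "rows": row_count,
        "year_range": (min(years), max(years)) if years else None,
    }
```

The same loop could collect distinct values of a column (for example party names) as candidate keywords.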

When searching for open data records, results may be missed because of mismatches between search queries and indexed concepts. For example, a query for “Graubünden” should probably return results tagged with “Chur”. To make this work, an ontology of concepts can be used during indexing or search.

This could be implemented as part of the search portal, or also as a CKAN extension.

GeoNames may be a good source of geographical ontology data.
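
A minimal sketch of query expansion with such an ontology. The `PARENT_OF` table is a tiny hand-written stand-in for real hierarchy data (which might come from GeoNames), and the function names are hypothetical. A search for a region is expanded to include every place known to lie inside it.

```python
from collections import defaultdict, deque

# Hypothetical sample hierarchy: place -> the place that contains it.
PARENT_OF = {
    "Graubünden": "Switzerland",
    "Chur": "Graubünden",
    "Davos": "Graubünden",
}


def expand_place(place, parent_of=PARENT_OF):
    """Return the place followed by every place located inside it."""
    children = defaultdict(list)
    for child, parent in parent_of.items():
        children[parent].append(child)
    found, queue = [place], deque([place])
    while queue:  # breadth-first walk down the hierarchy
        for child in children[queue.popleft()]:
            if child not in found:
                found.append(child)
                queue.append(child)
    return found


def to_query(terms):
    """Join the expanded terms into a single OR query string."""
    return " OR ".join('"' + term + '"' for term in terms)
```

With the sample table, a query for “Graubünden” also matches records tagged “Chur” or “Davos”. Whether to expand at search time (as here) or at indexing time is a design choice the text leaves open.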

Mockups created using https://balsamiq.com/

demo, source code

Future Work:

  • Sorting by title, pub date, relevant date
  • Searching by relevant date (date/date range the information is about)
  • Example of a facet widget that is enabled only for certain searches, e.g. Volksinitiativen (popular initiatives) have datasets for different areas such as canton, municipality, etc.

project/discoverabilitythroughstructure.txt · Last modified: 2019/02/03 21:17 by loleg