Photo Collection Annemarie Schwarzenbach

The metadata comes as a csv file. The images are available in a zip, but they are also published on Wikidata. Wikidata links are provided in the csv, though they were faulty and needed some correcting (URL encoding, replacing «/» with «-»). Places or geocoordinates are not available in the csv: there is a column named «Ort» («place»), but it only contains places in Switzerland.
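The two link corrections mentioned above can be sketched in Python. This is only an illustration: the function name `fix_link`, the split into a base URL and a file title, and the sample values are assumptions, since the source only states that the links needed URL encoding and that «/» was replaced with «-».

```python
from urllib.parse import quote


def fix_link(base: str, title: str) -> str:
    """Build a cleaned link from a base URL and a raw file title.

    Assumption: only the title part of the link is faulty, so the base
    URL is passed through unchanged.
    """
    # Replace slashes inside the title with hyphens, as described in the text.
    cleaned = title.replace("/", "-")
    # Percent-encode spaces, commas, umlauts etc.; keep the colon of "File:".
    return base + quote(cleaned, safe=":")


if __name__ == "__main__":
    # Hypothetical base URL and title, for illustration only.
    print(fix_link("https://example.org/wiki/", "File:Türkei, Ankara/Umgebung.jpg"))
```

The slash replacement is done before the encoding step, because otherwise the slash would be turned into a percent escape and no longer be found.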

Project

The project was really a side project for some members of the rilke-jung correspondence network group during the openGLAM Hackathon in September 2017 at UNIL. There is a website from 2008 that visualizes Annemarie Schwarzenbach's travels and some of her pictures on a map. It does not contain all the images now available on Wikidata. The goal of the project was to geolocate as many images as possible and to test different tools for visualizing them on a map.

As mentioned above, the information in the «Ort» column of the csv is sparse. However, the title often includes a place name, in the form «country, city, description». So the first step was to extract this information. This was done in OpenRefine by splitting the title column at the first and the second comma. Next, the new place names were reconciled with Wikidata. Many of the places were not recognized automatically. In addition, because some titles did not contain any place name, the extracted columns also held words that were not places. As a consequence, 32 of the 3486 images did not get any geolocation. Roughly 500 images had only imprecise place names such as “northern atlantic”, but these could still be given coordinates.

After the reconciliation was finished, the next steps were aimed at getting the coordinates out of Wikidata. Here, we first took a long route that did not give the expected results. First, we wanted to display the Q-numbers in a column, so that we could then fetch additional information from Wikidata. The command used in OpenRefine is

cell.recon.candidates[0].id
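As an aside, the earlier step of splitting the title at the first and the second comma can also be sketched outside OpenRefine. The following Python snippet is a minimal illustration, not the workflow actually used. The function name `split_title` and the sample titles are made up for the example.

```python
from typing import Tuple


def split_title(title: str) -> Tuple[str, str, str]:
    """Split a title of the form «country, city, description».

    Only the first two commas are used as separators, so commas inside
    the description are preserved. Missing parts become empty strings.
    """
    parts = [part.strip() for part in title.split(",", 2)]
    # Pad to three fields for titles with fewer than two commas.
    parts += [""] * (3 - len(parts))
    return parts[0], parts[1], parts[2]


if __name__ == "__main__":
    # Hypothetical titles, for illustration only.
    print(split_title("Persien, Teheran, Strassenszene, Markt"))
    print(split_title("Portrait"))
```

A title without any place name ends up with its first words in the «country» field, which matches the observation above that some extracted values were not places at all.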

Getting the coordinates proved more difficult than expected. The quick and dirty solution applied was to fetch the entire Wikidata entry for each place

"https://www.wikidata.org/w/api.php?action=wbgetentities&ids=" + value + "&format=json"

and then to split the response at “latitude”: and “altitude” to extract the lat/long geocoordinates. When displayed on a map, many of the coordinates turned out to be wrong (e.g. Barcelona in Venezuela instead of Spain). We found out that our command to display the Q-numbers did not return the items we had matched, but simply the first candidate proposed by Wikidata. So we tried another approach. Adding a column based on the column with the reconciled places, using this command

"https://tools.wmflabs.org/openrefine-wikidata/en/fetch_values?item="+cell.recon.match.id+"&prop=P625&label=true&flat=true" 

(where P625 is the Wikidata property for coordinates), created a column with the correct coordinates.
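For comparison, the response of the first attempt (the wbgetentities call) does not have to be split as a string; it can be parsed as JSON. The sketch below is in Python and assumes the usual layout of such a response (entities, then the item id, then claims, P625, mainsnak, datavalue, value). The function name, the placeholder id `Q123` and the sample coordinates are made up for the illustration and are not real data.

```python
import json
from typing import Optional, Tuple


def coordinates_from_entity(response_text: str, qid: str) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude) of the first P625 value, or None."""
    data = json.loads(response_text)
    claims = data.get("entities", {}).get(qid, {}).get("claims", {})
    for statement in claims.get("P625", []):
        snak = statement.get("mainsnak", {})
        # Skip "no value" / "unknown value" statements, which carry no datavalue.
        if snak.get("snaktype") != "value":
            continue
        value = snak["datavalue"]["value"]
        return value["latitude"], value["longitude"]
    return None


# Hand-written, heavily shortened sample response (placeholder id and values).
sample = json.dumps({
    "entities": {
        "Q123": {
            "claims": {
                "P625": [{
                    "mainsnak": {
                        "snaktype": "value",
                        "datavalue": {"value": {"latitude": 41.38, "longitude": 2.17}},
                    }
                }]
            }
        }
    }
})

if __name__ == "__main__":
    print(coordinates_from_entity(sample, "Q123"))
```

Parsing the JSON does not fix the underlying problem described above: if the Q-number passed in is the first candidate rather than the matched item, the coordinates are still those of the wrong place.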

When trying to visualize the images on a map, we ran into the problem that the Wikidata links provided in the csv pointed to the page for the image, not to the actual image file itself. This could be resolved by searching for File: and replacing it with Special:Filepath/. Here's the GREL command used:

value.replace("File:", "Special:Filepath/")

Several tools were examined and tested for displaying the roughly 3500 images.

Tools

  • OpenRefine was used to clean the data and to enrich it by reconciling with Wikidata.
  • Palladio made it possible to show points on a map and to draw different styles of graphs. Displaying the images linked on Wikidata directly on the map is not possible. There is a gallery view where the images can be displayed. Palladio projects need to be exported as json; they cannot be embedded directly for publishing.
  • Viseyes seems to have problems with datasets of more than a few hundred entries. For smaller datasets, it provides a nice view with a map, a timeline and a «story» column.

Data

Team

  • Lena Heizmann
  • Dominik Sievi