project:jung_rilke_correspondance_network


Joint project bringing together three separate projects: the Rilke correspondence, the Jung correspondence, and the ETH Library.

  • List and link your actual and ideal data sources.

ACTUAL

Comment: The Rilke data is cleaner than the Jung data. Some cleaning is needed to make them match:
1) separate sender and receiver; clean up and cluster (OpenRefine)
2) clean up dates and put them in the format that the IT developers need (Perl)
3) clean up place names and match them to geolocators (Dariah-DE)
4) match senders and receivers to Wikidata where possible (OpenRefine; the volume of data is a problem)
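Step 1 above could be prototyped outside OpenRefine along the following lines. This is only a sketch: the separator between sender and receiver in the Jung data is not documented here, so `" | "` is a placeholder, and the function name `split_correspondents` is hypothetical.

```python
from typing import Optional, Tuple


def split_correspondents(
    cell: str, separator: str = " | "
) -> Tuple[Optional[str], Optional[str]]:
    """Split a combined 'sender<separator>recipient' cell into two names.

    The default separator is an assumption, not the real one used in the data.
    Returns (None, None) when the separator does not occur exactly once or a
    side is empty, so ambiguous rows can be reviewed by hand.
    """
    parts = cell.split(separator)
    if len(parts) != 2:
        return (None, None)
    sender, recipient = parts[0].strip(), parts[1].strip()
    if not sender or not recipient:
        return (None, None)
    return (sender, recipient)


if __name__ == "__main__":
    print(split_correspondents("Sender A | Recipient B"))
    print(split_correspondents("Only one name"))
```

Rows that come back as `(None, None)` are the ones worth checking manually before clustering.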

IDEAL

DATA after cleaning:

https://github.com/basimar/hackathon17_jungrilke

  • Description of steps and issues in the process (please correct and refine).

Objective: provide a framework for correspondence by defining a database that can be used not just by these two projects but by others as well, and that works well with visualisation software so that correspondence networks can be seen.

The main issue with the Jung correspondence is data quality. Sender and recipient are in one column, and data cleaning is still needed. Dates also need both cleaning for consistency and transformation to meet the developers' specs (Basil is using Perl).
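The date work is being done in Perl; purely as an illustration, the same idea in Python might look like the sketch below. The input formats in `CANDIDATE_FORMATS` and the ISO 8601 output are assumptions, since neither the raw date formats nor the developer specs are recorded on this page.

```python
from datetime import datetime
from typing import Optional

# Hypothetical input formats; the real Jung data may use others.
CANDIDATE_FORMATS = ("%Y-%m-%d", "%d.%m.%Y", "%d/%m/%Y")


def normalize_date(raw: str) -> Optional[str]:
    """Return the date as YYYY-MM-DD, or None if no known format matches.

    Unparseable values (for example approximate dates) are returned as None
    rather than guessed, so they can be listed and fixed by hand.
    """
    text = raw.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None


if __name__ == "__main__":
    for value in ["15.09.2017", "2017-09-15", "ca. 1925"]:
        print(value, "->", normalize_date(value))
```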

We will look for Wikidata Q codes with OpenRefine, note them, and list the names that need to be created in Wikidata for future use.

Issues with the target database: fields have been defined; SQL databases and visualisation programs are being evaluated. How, and whether, to integrate with Wikidata is still not clear.
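To make the discussion concrete, here is one possible shape for such a database, sketched with SQLite from the Python standard library. The table and column names (`person`, `letter`, `wikidata_qid`, `sent_on`, `place`) are illustrative assumptions and not the fields the team actually defined; the Q code column is optional so the Wikidata question can stay open.

```python
import sqlite3

# Hypothetical schema: not the project's agreed field list.
SCHEMA = """
CREATE TABLE person (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    wikidata_qid TEXT              -- NULL until a Q code is matched
);
CREATE TABLE letter (
    id INTEGER PRIMARY KEY,
    sender_id INTEGER NOT NULL REFERENCES person(id),
    recipient_id INTEGER NOT NULL REFERENCES person(id),
    sent_on TEXT,                  -- ISO 8601 date, NULL if unknown
    place TEXT
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a few placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO person (id, name) VALUES (?, ?)",
        [(1, "Person A"), (2, "Person B")],
    )
    conn.executemany(
        "INSERT INTO letter (sender_id, recipient_id, sent_on) VALUES (?, ?, ?)",
        [(1, 2, "1900-01-01"), (1, 2, "1900-02-01"), (2, 1, "1900-03-01")],
    )
    conn.commit()
    return conn


def edge_list(conn: sqlite3.Connection):
    """Return (sender, recipient, letter_count) rows for a network tool."""
    return conn.execute(
        """
        SELECT s.name, r.name, COUNT(*)
        FROM letter AS l
        JOIN person AS s ON s.id = l.sender_id
        JOIN person AS r ON r.id = l.recipient_id
        GROUP BY s.id, r.id
        ORDER BY s.name, r.name
        """
    ).fetchall()


if __name__ == "__main__":
    print(edge_list(build_demo_db()))
```

Keeping people in their own table means a Q code is stored once per person, and the edge list query gives the sender/recipient/weight triples that most network visualisation tools can import.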

Issues: letters are too detailed to be Wikidata items, although it looks like the senders and recipients have the notability and networks to make it worthwhile. Trying to keep options open.

While the developers build the database to be used with the visualization tool, data is being cleaned and Q codes are being extracted.

Doing this all at once poses some project management challenges, since several people may be working on the same files to clean different data. All files will need to be integrated afterwards.

Additional issues encountered:
  • The Wikidata Q codes that OpenRefine linked to seem to have disappeared. Instructions on how to add the Q codes are here: https://github.com/OpenRefine/OpenRefine/wiki/reconciliation
  • The second file, with over 16,000 lines, appears to be too big for OpenRefine to match with Q codes. Proposed solution: split it into several files.
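The proposed split could be done with a small script such as the sketch below. It assumes the file is a UTF-8 CSV with a header row; the function names and the chunk size of 4000 rows are arbitrary choices, not tested limits of OpenRefine.

```python
import csv
from pathlib import Path
from typing import List


def _write_chunk(out_dir: Path, stem: str, index: int, header, rows) -> Path:
    """Write one chunk, repeating the header so each file stands alone."""
    path = out_dir / f"{stem}_part{index:02d}.csv"
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)
    return path


def split_csv(source: Path, out_dir: Path, rows_per_file: int = 4000) -> List[Path]:
    """Split a CSV into numbered files of at most rows_per_file data rows."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    with source.open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, None)
        if header is None:  # empty input: nothing to split
            return written
        chunk = []
        for row in reader:
            chunk.append(row)
            if len(chunk) == rows_per_file:
                written.append(
                    _write_chunk(out_dir, source.stem, len(written) + 1, header, chunk)
                )
                chunk = []
        if chunk:  # leftover rows that did not fill a whole chunk
            written.append(
                _write_chunk(out_dir, source.stem, len(written) + 1, header, chunk)
            )
    return written
```

Because every part keeps the header row, each file can be loaded into OpenRefine as its own project and the results concatenated afterwards.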

Please add yourself to the list

Flor Méchain (Wikimedia CH): working on cleaning and matching with Wikidata Q codes using OpenRefine.

Lena Heizman (Doda): Mentoring with OpenRefine.

Hugo Martin

Samantha Weiss

Michael Gasser

Irina Schubert

Sylvie Béguelin

Basie Manti

Jérome Zbinden

Deborah Kyburz

Paul Varé

Laurel Zuckerman

Christian Sisi??

Adrien Zemma

Dominik Sievi

  • Last modified: 2017/09/15 16:07
  • by wdparis2017