===== Jung - Rilke Correspondence Networks =====

Joint project bringing together three separate projects: the Rilke correspondence, the C.G. Jung correspondence, and …

Objectives:

  * agree on a common metadata structure for correspondence datasets
  * clean and enrich the existing datasets
  * build a database that can be used not just by these two projects
  * experiment with existing visualization tools
===== Data =====

**ACTUAL INPUT DATA**

  * For the Rilke correspondence: …
  * For the C.G. Jung correspondence: …
Comment: The Rilke data is cleaner than the Jung data. Some cleaning is needed to make the two match:

  - separate sender and receiver; clean up and cluster the names (OpenRefine)
  - clean up the dates and put them in the format the IT developers need (Perl)
  - clean up the placenames and match them to geolocators (Dariah-DE)
  - match senders and receivers to Wikidata where possible (OpenRefine, …)
**METADATA STRUCTURE**

The following fields were included in the common basic data structure:

sysID; callNo; titel; sender; senderID; recipient; recipientID; …
**DATA CLEANSING AND ENRICHMENT**

  * Description of the steps, and issues, in the process (please correct and refine).
The main issue with the Jung correspondence is the data structure: sender and recipient are combined in one column.
The dates also need both cleaning for consistency and conversion to the format the IT developers need.
For geocoding the placenames, OpenRefine was used for the normalization; the normalized names were then matched to geolocators with Dariah-DE.
The C.G. Jung dataset contains sending-location information for 16,619 out of 32,127 letters; 10,271 places were georeferenced. In the Rilke dataset, all sending locations were georeferenced.
For matching senders and recipients to Wikidata Q-codes, OpenRefine was used. Issues were encountered with large files and with recovering Q-codes after successful matching; scholarly expertise is also needed.
Doing all of this at once poses some project-management challenges, since several people may be working on the same files to clean different data. All the files need to be integrated at the end.
DATA after cleaning:

https://
**DATABASE**

Issues: the fields are defined; SQL databases and a visualization program are being evaluated.
How, and whether, to integrate with Wikidata is still not clear.
Issues: the letters are too detailed to be imported as Wikidata items.
While the IT developers are building the database to be used with the visualization tool, the data is being cleaned and the Q-codes are being extracted.
They took the cleaned CSV files and converted them to SQL, then to JSON.
Additional issues encountered:

  * Visualization: …
  * Ensuring that the files from the different projects respect the same structure in their final, cleaned-up versions.
===== Team =====

  * Flor Méchain (Wikimedia CH): working on cleaning and matching with Wikidata Q-codes using OpenRefine.
  * Lena Heizman (Dodis / histHub): Mentoring with OpenRefine.
  * Hugo Martin
  * Samantha Weiss
  * Michael Gasser (Archives, ETH Library): provider of the dataset [[https://
  * Irina Schubert
  * Sylvie Béguelin
  * Basil Marti
  * Jérome Zbinden
  * Deborah Kyburz
  * Paul Varé
  * Laurel Zuckerman
  * Christiane Sibille (Dodis / histHub)
  * Adrien Zemma
  * Dominik Sievi [[user:
===== Video of the presentation =====

{{vimeo>

{{tag>
+ | |||
+ | |||
+ | ===== Team ===== | ||
+ | |||
+ | |||
+ | |||
+ | * Flor Méchain (Wikimedia CH): working on cleaning and matching with Wikidata Q codes using OpenRefine. | ||
+ | * Lena Heizman (Dodis / histHub): Mentoring with OpenRefine. | ||
+ | * Hugo Martin | ||
+ | * Samantha Weiss | ||
+ | * Michael Gasser (Archives, ETH Library): provider of the dataset [[https:// | ||
+ | * Irina Schubert | ||
+ | * Sylvie Béguelin | ||
+ | * Basil Marti | ||
+ | * Jérome Zbinden | ||
+ | * Deborah Kyburz | ||
+ | * Paul Varé | ||
+ | * Laurel Zuckerman | ||
+ | * Christiane Sibille (Dodis / histHub) | ||
+ | * Adrien Zemma | ||
+ | * Dominik Sievi [[user: | ||