project:jung_rilke_correspondance_network (last modified 2017/09/17 14:08 by mgasser; previous revision 2017/09/15 15:28 by wdparis2017)
===== Jung - Rilke Correspondance Network =====
Joint project bringing together three separate projects: Rilke correspondence,

Objectives:
  * agree on a common metadata structure for correspondence datasets
  * clean and enrich the existing datasets
  * build a database that can be used not just by these two projects
  * experiment with existing visualization
===== Data =====
**ACTUAL INPUT DATA**

  * For Jung correspondence:
  * For Rilke correspondence:
4) match senders and receivers to Wikidata where possible (OpenRefine,
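The matching in step 4 is done in OpenRefine against the Wikidata reconciliation service. As an illustration of what happens under the hood, the sketch below builds a reconciliation query batch in the format that reconciliation services accept; the names, the Q5 ("human") type constraint, and the result limit are illustrative assumptions, and no network request is made.

```python
import json

def build_recon_queries(names, type_qid="Q5"):
    """Map each name to a reconciliation query constrained to a Wikidata
    type (Q5 = human), in the batch format used by reconciliation APIs."""
    return {
        f"q{i}": {"query": name, "type": type_qid, "limit": 3}
        for i, name in enumerate(names)
    }

queries = build_recon_queries(["Carl Gustav Jung", "Rainer Maria Rilke"])
# Reconciliation endpoints expect the batch as a JSON string form field.
payload = {"queries": json.dumps(queries)}
```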
**METADATA STRUCTURE**

The following fields were included in the common basic data structure:
sysID; callNo; titel; sender; senderID; recipient; recipientID;
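The field list above reads like a semicolon-delimited header (it is truncated here). A minimal sketch of writing and reading a record with this structure using Python's csv module, assuming ";" as the delimiter; the sample values and Wikidata IDs are hypothetical, and only the fields visible above are used:

```python
import csv
import io

# Fields from the common basic data structure (only those visible above;
# the full list is longer). "titel" is the original German field name.
FIELDS = ["sysID", "callNo", "titel", "sender", "senderID",
          "recipient", "recipientID"]

record = {
    "sysID": "1",
    "callNo": "Hs 1056:1",            # hypothetical call number
    "titel": "Brief an R. M. Rilke",  # hypothetical title
    "sender": "Carl Gustav Jung",
    "senderID": "Q12345",             # hypothetical Wikidata ID
    "recipient": "Rainer Maria Rilke",
    "recipientID": "Q67890",          # hypothetical Wikidata ID
}

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=FIELDS, delimiter=";")
writer.writeheader()
writer.writerow(record)

# Round-trip: read the record back.
rows = list(csv.DictReader(io.StringIO(buf.getvalue()), delimiter=";"))
```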
**DATA CLEANSING AND ENRICHMENT**
  * Description of steps and issues in the process (please correct and refine).
The main issue with the Jung correspondence is data structure: sender and recipient are in one column.
Dates also need both cleaning for consistency and transformation to meet developer specs (Basil using Perl).
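The split-and-normalize steps described above could look like the following sketch (the actual transformation was done in Perl; the " - " separator, the date formats, and the sample values are assumptions for illustration only):

```python
from datetime import datetime

def split_correspondents(cell, sep=" - "):
    """Split a combined 'sender - recipient' cell into two fields.
    The separator is an assumption; the real export may differ."""
    sender, _, recipient = cell.partition(sep)
    return sender.strip(), recipient.strip()

def normalize_date(raw):
    """Try a few date spellings and emit ISO 8601; None if unparsable."""
    for fmt in ("%d.%m.%Y", "%Y-%m-%d", "%d %B %Y"):
        try:
            return datetime.strptime(raw, fmt).date().isoformat()
        except ValueError:
            continue
    return None

sender, recipient = split_correspondents("Jung, C. G. - Rilke, Rainer Maria")
iso = normalize_date("15.09.1917")
```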
For geocoding the placenames, OpenRefine was used to normalize the placenames.
The C.G. Jung dataset contains sending-location information for 16,619 out of 32,127 letters; 10,271 places were georeferenced. In the Rilke dataset all the sending locations were georeferenced.
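Coverage figures like those above can be recomputed from the cleaned records. A toy sketch, with hypothetical field names (place, lat, lon):

```python
def location_coverage(records):
    """Count letters that have a sending place, and those actually
    georeferenced (place plus coordinates). Field names are assumptions."""
    with_place = [r for r in records if r.get("place")]
    geocoded = [r for r in with_place
                if r.get("lat") is not None and r.get("lon") is not None]
    return len(with_place), len(geocoded)

toy = [
    {"place": "Küsnacht", "lat": 47.318, "lon": 8.584},
    {"place": "Muzot", "lat": None, "lon": None},  # place known, not georeferenced
    {"place": "", "lat": None, "lon": None},       # no sending location
]
with_place, geocoded = location_coverage(toy)
```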
For matching senders and recipients
Doing this all at once poses some project management challenges, since several people may be working on the same files to clean different data. Need to integrate all files.

DATA after cleaning:
https://
**DATABASE**
Issues with the target database:
Fields defined; SQL databases and visualization program being evaluated.
How - and whether - to integrate with Wikidata is still not clear.
Issues: letters are too detailed
While the IT team builds the database to be used with the visualization tool, the data is being cleaned and Q codes are being extracted.
They took the cleaned CSV files, converted them to SQL, then JSON.
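The CSV to SQL to JSON chain mentioned above can be sketched with Python's standard library, using an in-memory SQLite table as a stand-in for the real target database; the column names and sample rows are illustrative assumptions:

```python
import csv
import io
import json
import sqlite3

CSV_TEXT = """sysID;sender;recipient
1;Carl Gustav Jung;Rainer Maria Rilke
2;Rainer Maria Rilke;Carl Gustav Jung
"""

# 1) Read the cleaned semicolon-delimited CSV.
rows = list(csv.DictReader(io.StringIO(CSV_TEXT), delimiter=";"))

# 2) Load into SQL (in-memory SQLite stands in for the target database).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE letters (sysID TEXT, sender TEXT, recipient TEXT)")
con.executemany("INSERT INTO letters VALUES (:sysID, :sender, :recipient)", rows)

# 3) Export as JSON for the visualization tool.
cur = con.execute("SELECT sysID, sender, recipient FROM letters ORDER BY sysID")
cols = [c[0] for c in cur.description]
letters = [dict(zip(cols, row)) for row in cur]
as_json = json.dumps(letters)
```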
Additional issues encountered:
  - Visualization:
  - Ensuring that the files from different projects respect the same structure in the final, cleaned-up versions.
===== Team =====
  * Flor Méchain (Wikimedia CH): working on cleaning and matching with Wikidata Q codes using OpenRefine.
  * Lena Heizman (Dodis / histHub): mentoring with OpenRefine.
  * Hugo Martin
  * Samantha Weiss
  * Michael Gasser (Archives, ETH Library): provider of the dataset [[https://
  * Irina Schubert
  * Sylvie Béguelin
  * Basil Marti
  * Jérome Zbinden
  * Deborah Kyburz
  * Paul Varé
  * Laurel Zuckerman
  * Christiane Sibille (Dodis / histHub)
  * Adrien Zemma
  * Dominik Sievi [[user:
{{tag>