===== Jung - Rilke Correspondence Networks =====
  
Joint project bringing together three separate projects: Rilke correspondence, Jung correspondence, and the ETH Library.
  
Objectives:
  * agree on a common metadata structure for correspondence datasets
  * clean and enrich the existing datasets
  * build a database that can be used not just by these two projects but by others as well, and that works well with visualisation software in order to see correspondence networks
  * experiment with existing visualization tools
  
===== Data =====
  
**ACTUAL INPUT DATA**

  * For Jung correspondence: https://opendata.swiss/dataset/c-g-jung-correspondence (three files)
  * For Rilke correspondence: https://opendata.swiss/en/dataset/handschriften-rainer-maria-rilke (two files, images and metadata)
4) match senders and receivers to Wikidata where possible (OpenRefine, problem with volume)
  
**METADATA STRUCTURE**
  
The following fields were included in the common basic data structure:
  
sysID; callNo; titel; sender; senderID; recipient; recipientID; place; placeLat; placeLong; datefrom; dateto; language
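The field list above can be pinned down in code. A minimal sketch, where the field names are taken verbatim from the list ("titel" kept exactly as spelled there) and everything else is illustrative:

```python
import csv
import io

# Field names as agreed in the common metadata structure above.
FIELDS = [
    "sysID", "callNo", "titel", "sender", "senderID",
    "recipient", "recipientID", "place", "placeLat", "placeLong",
    "datefrom", "dateto", "language",
]

def make_row(**values):
    """Build one letter record, filling missing fields with empty strings."""
    unknown = set(values) - set(FIELDS)
    if unknown:
        raise ValueError(f"unknown fields: {unknown}")
    return {f: values.get(f, "") for f in FIELDS}

# Write a record under the common header, using ";" as separator
# to mirror the semicolon-delimited field list above.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=FIELDS, delimiter=";")
writer.writeheader()
writer.writerow(make_row(sysID="000123", sender="C. G. Jung", recipient="R. M. Rilke"))
```

Having a single canonical field order like this makes it easy to check that files from both projects respect the same structure.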
  
**DATA CLEANSING AND ENRICHMENT**
  
  
  * Description of steps, and issues, in process (please correct and refine).
  
  
The main issue with the Jung correspondence is its data structure: sender and recipient are in one column.
Dates also need both cleaning for consistency (e.g. removal of "ca.") and transformation to meet developer specs (Basil, using Perl scripts).
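The Perl cleaning scripts themselves are not part of this page. As an illustration only, the two fixes described above (splitting the combined sender/recipient column and stripping "ca." from dates) might look like this in Python, assuming a hypothetical "/"-separated column layout:

```python
import re

def split_correspondents(cell):
    """Split a combined 'sender / recipient' cell into two fields.
    Assumes the two names are separated by '/' (hypothetical layout)."""
    parts = [p.strip() for p in cell.split("/", 1)]
    return (parts[0], parts[1] if len(parts) > 1 else "")

def clean_date(raw):
    """Remove approximation markers such as 'ca.' for date consistency."""
    return re.sub(r"\bca\.\s*", "", raw).strip()

sender, recipient = split_correspondents("C. G. Jung / Emma Jung")
approx = clean_date("ca. 1921")
```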
  
For geocoding the place names, OpenRefine was used to normalize the place names, and the DARIAH GeoBrowser for the actual geocoding (there were some issues with handling large files). Tests with OpenRefine in combination with Open Street View were done as well.

The C. G. Jung dataset contains sending-location information for 16,619 out of 32,127 letters; 10,271 places were georeferenced. In the Rilke dataset, all the sending locations were georeferenced.

For matching senders and recipients to Wikidata Q-codes, OpenRefine was used. Issues encountered: large files, recovering Q-codes after successful matching (the Q-codes that OpenRefine linked to seem to have disappeared?), and the need for scholarly expertise to identify people who lack clear identification. Instructions on how to add the Q-codes are here: https://github.com/OpenRefine/OpenRefine/wiki/reconciliation
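A possible workaround for the large-file problems mentioned above is to split the input into smaller chunks before loading it into OpenRefine. A hypothetical sketch (chunk size and file naming are arbitrary):

```python
import csv

def split_csv(path, rows_per_chunk=4000):
    """Split a large CSV into smaller files, each repeating the header,
    so that reconciliation can be run chunk by chunk.
    Returns the list of chunk file names."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)
        chunk_paths, writer, out = [], None, None
        for i, row in enumerate(reader):
            if i % rows_per_chunk == 0:
                if out:
                    out.close()
                chunk_path = f"{path}.part{i // rows_per_chunk}.csv"
                chunk_paths.append(chunk_path)
                out = open(chunk_path, "w", newline="", encoding="utf-8")
                writer = csv.writer(out)
                writer.writerow(header)  # every chunk keeps the header
            writer.writerow(row)
        if out:
            out.close()
    return chunk_paths
```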

Doing all of this at once poses some project-management challenges, since several people may be working on the same files to clean different data. All files need to be integrated at the end.

DATA after cleaning:

https://github.com/basimar/hackathon17_jungrilke

**DATABASE**
  
Issues with the target database:
How (and whether) to integrate with Wikidata is still not clear.
  
Issues: letters are too detailed to be imported as Wikidata items, although it looks like the senders and recipients have the notability and networks to make it worthwhile. Trying to keep options open.
  
While the IT team builds the database to be used with the visualization tool, data is being cleaned and Q-codes are being extracted.
They took the cleaned CSV files and converted them to SQL, then to JSON.
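The CSV → SQL → JSON chain described above can be sketched with SQLite. The table layout and file names here are assumptions for illustration, not the team's actual schema:

```python
import csv
import json
import sqlite3

def csv_to_sql_to_json(csv_path, db_path, json_path):
    """Load a cleaned, semicolon-delimited letters CSV into SQLite,
    then dump the table as JSON for the visualization tools."""
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS letters "
                "(sysID TEXT, sender TEXT, recipient TEXT, "
                " place TEXT, datefrom TEXT, dateto TEXT)")
    with open(csv_path, newline="", encoding="utf-8") as f:
        rows = [(r["sysID"], r["sender"], r["recipient"],
                 r.get("place", ""), r.get("datefrom", ""), r.get("dateto", ""))
                for r in csv.DictReader(f, delimiter=";")]
    con.executemany("INSERT INTO letters VALUES (?, ?, ?, ?, ?, ?)", rows)
    con.commit()
    cols = ["sysID", "sender", "recipient", "place", "datefrom", "dateto"]
    data = [dict(zip(cols, row)) for row in con.execute("SELECT * FROM letters")]
    con.close()
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=2)
    return data
```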
  
  
Additional issues encountered:
  
- Visualization: three tools are being tested: 1) Palladio (Stanford; concerns about limits on large files?), 2) Viseyes, and 3) Gephi.
  
- Ensuring that the files from different projects respect the same structure in their final, cleaned-up versions.
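Gephi (like most network tools) can import a simple weighted edge list from CSV. A sketch of deriving one from cleaned letter records, using the sender/recipient field names from the common structure (the record values are made up):

```python
import csv
import io
from collections import Counter

def to_edge_list(records):
    """Aggregate letters into a weighted sender -> recipient edge list
    in the Source,Target,Weight CSV format that Gephi imports."""
    weights = Counter((r["sender"], r["recipient"]) for r in records)
    buf = io.StringIO()
    w = csv.writer(buf)
    w.writerow(["Source", "Target", "Weight"])
    for (src, dst), n in sorted(weights.items()):
        w.writerow([src, dst, n])
    return buf.getvalue()

edges = to_edge_list([
    {"sender": "C. G. Jung", "recipient": "Secretary's office"},
    {"sender": "C. G. Jung", "recipient": "Secretary's office"},
    {"sender": "R. M. Rilke", "recipient": "Lou Andreas-Salome"},
])
```

Weighting the edges by letter count is what makes heavily connected nodes (such as Jung and his secretary's office) stand out in the network view.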
  
===== Visualization (examples) =====

{{:project:rilke_heatmap.jpg?700|}}

Heatmap of Rainer Maria Rilke’s correspondence (visualized with Google Fusion Tables)

{{:project:jung_corr_gephi.jpg?700|}}

Correspondence from and to C. G. Jung visualized as a network. The two large nodes are Carl Gustav Jung (below) and his secretary’s office (above). Visualized with the tool Gephi.

===== Video of the presentation =====

{{vimeo>234627486?medium}}

{{tag>status:concept needs:dev needs:design needs:data needs:expert glam}}

===== Team =====

  * Flor Méchain (Wikimedia CH): working on cleaning and matching with Wikidata Q-codes using OpenRefine.
  * Lena Heizman (Dodis / histHub): Mentoring with OpenRefine.
  * Hugo Martin
  * Samantha Weiss
  * Michael Gasser (Archives, ETH Library): provider of the dataset [[https://opendata.swiss/en/dataset/c-g-jung-correspondence|C. G. Jung correspondence]]
  * Irina Schubert
  * Sylvie Béguelin
  * Basil Marti
  * Jérome Zbinden
  * Deborah Kyburz
  * Paul Varé
  * Laurel Zuckerman
  * Christiane Sibille (Dodis / histHub)
  * Adrien Zemma
  * Dominik Sievi [[user:wdparis2017]]
  