project:jung_rilke_correspondance_network

===== Jung - Rilke Correspondance Network =====
  
Joint project bringing together three separate projects: Rilke correspondence, Jung correspondence and ETH Library.
  
Objectives:
  * agree on a common metadata structure for correspondence datasets
  * clean and enrich the existing datasets
  * build a database that can be used not just by these two projects but by others as well, and that works well with visualisation software in order to reveal correspondence networks
  * experiment with existing visualization tools
  
===== Data =====
* Description of steps, and issues, in Process (please correct and refine).
  
  
The main issue with the Jung correspondence is data quality: sender and recipient are in a single column.
Data cleaning is still needed.
Dates also need both cleaning for consistency and transformation to meet developer specs. (Basil, using Perl)
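The date transformation itself is being done by Basil in Perl; purely as an illustration, a Python sketch of the same normalization step might look like the following. The input formats listed are assumptions for the sketch, not taken from the actual dataset:

```python
import re
from datetime import datetime

# Hypothetical source formats; the real dataset's date formats may differ.
FORMATS = [
    ("%d.%m.%Y", re.compile(r"^\d{1,2}\.\d{1,2}\.\d{4}$")),  # e.g. 03.09.1912
    ("%Y-%m-%d", re.compile(r"^\d{4}-\d{2}-\d{2}$")),        # e.g. 1912-09-03
    ("%d %B %Y", re.compile(r"^\d{1,2} [A-Za-z]+ \d{4}$")),  # e.g. 3 September 1912
]

def normalize_date(raw):
    """Return an ISO 8601 date string (YYYY-MM-DD), or None to flag for manual review."""
    raw = raw.strip()
    for fmt, pattern in FORMATS:
        if pattern.match(raw):
            try:
                return datetime.strptime(raw, fmt).date().isoformat()
            except ValueError:
                return None  # matched the shape but is not a real calendar date
    return None
```

Rows that come back as None would go on the manual-cleaning list rather than being guessed at.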
For geolocation, DARIAH-DE was tried first, but it did not seem able to handle the large file, so we switched to Open Street View.
  
For matching senders and recipients to Wikidata Q codes, OpenRefine was used. Issues encountered: the size of the second file, recovering Q codes after a successful match, and the need for scholarly expertise to identify people without a clear identification.
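One way to recover the Q codes after a successful match is to export the reconciled project to CSV and pull the identifiers out of the Wikidata entity URLs. A minimal sketch, assuming hypothetical `name` and `wikidata_url` columns (the real export's column names may differ):

```python
import csv
import io
import re

Q_CODE = re.compile(r"Q\d+")

def extract_q_codes(csv_text, column="wikidata_url"):
    """Map each row's name to the Q code found in the reconciled URL column, or None."""
    out = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        match = Q_CODE.search(row.get(column, ""))
        out[row["name"]] = match.group(0) if match else None
    return out

sample = (
    "name,wikidata_url\n"
    "Carl Gustav Jung,http://www.wikidata.org/entity/Q41532\n"
    "Unknown Person,\n"
)
```

Unmatched rows surface as None, so they can be routed to a scholar for review instead of silently dropped.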
  
Issues with the target database:
  
As the IT team is building the database to be used with the visualization tool, data is being cleaned and Q codes are being extracted.
They took the cleaned CSV files, converted them to SQL, then to JSON.
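The CSV → SQL → JSON pipeline could look roughly like this sketch; the table layout and column names (sender, recipient, date) are assumptions, not the team's actual schema:

```python
import csv
import io
import json
import sqlite3

def csv_to_json_via_sql(csv_text):
    """Load letter records from CSV into SQLite, then dump them back out as JSON."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE letters (sender TEXT, recipient TEXT, date TEXT)")
    rows = [(r["sender"], r["recipient"], r["date"])
            for r in csv.DictReader(io.StringIO(csv_text))]
    conn.executemany("INSERT INTO letters VALUES (?, ?, ?)", rows)
    cur = conn.execute("SELECT sender, recipient, date FROM letters ORDER BY date")
    records = [dict(zip(("sender", "recipient", "date"), row)) for row in cur]
    return json.dumps(records, ensure_ascii=False)

sample = "sender,recipient,date\nSender A,Recipient B,1912-09-03\n"
```

Going through SQL rather than straight CSV → JSON gives the visualization side a queryable store as well as a flat export.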

Doing this all at once poses some project management challenges, since several people may be working on the same files to clean different data. All the files then need to be integrated.

Additional issues encountered:

- Wikidata Q codes that OpenRefine linked to seem to have disappeared. Instructions on how to add the Q codes are here: https://github.com/OpenRefine/OpenRefine/wiki/reconciliation

- The second file, with over 16,000 lines, appears to be too big for OpenRefine to match with Q codes. Proposed solution: split it into several files. (An attempt is also being made to solve this by increasing the RAM allotted to OpenRefine in its ini file.)

- Visualization: three tools are being tested: 1) Palladio (Stanford), with concerns about limits on large files, 2) Viseyes and 3) Gephi.

- Ensuring that the files from the different projects respect the same structure in the final, cleaned-up versions.

- A scholar is needed to decide which of OpenRefine's Q code proposals is correct; specialist knowledge is required.
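The proposed workaround for the oversized file, splitting it into several smaller ones, can be sketched in Python. The output filenames and the chunk size are arbitrary choices, and the header row is repeated in every part so each file stays valid on its own:

```python
import csv

def split_csv(in_path, chunk_size=4000):
    """Split a large CSV into numbered part files, repeating the header in each."""
    with open(in_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)
        chunk, part, written = [], 0, []
        for row in reader:
            chunk.append(row)
            if len(chunk) == chunk_size:
                part += 1
                written.append(_write_part(in_path, part, header, chunk))
                chunk = []
        if chunk:  # remainder that did not fill a whole chunk
            part += 1
            written.append(_write_part(in_path, part, header, chunk))
    return written

def _write_part(in_path, part, header, rows):
    out_path = f"{in_path}.part{part}.csv"
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return out_path
```

Each part can then be reconciled in OpenRefine separately and the results re-concatenated afterwards.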
  
  
  
Flor Méchain (Wikimedia CH): working on cleaning and matching with Wikidata Q codes using OpenRefine.
  
Lena Heizman (Dodis / histHub): Mentoring with OpenRefine.
  
Hugo Martin
Samantha Weiss
  
Michael Gasser (Archives, ETH Library): provider of the dataset [[https://opendata.swiss/en/dataset/c-g-jung-correspondence|C. G. Jung correspondence]]
  
Irina Schubert
Laurel Zuckerman
  
Christiane Sibille (Dodis / histHub)
  
Adrien Zemma
  * [[user:wdparis2017]]
      
{{tag>status:concept needs:dev needs:design needs:data needs:expert glam}}
  
  • project/jung_rilke_correspondance_network.txt
  • Last modified: 2017/10/26 16:36
  • by mgasser