project:jung_rilke_correspondance_network, revision 2017/09/17 13:04 by mgasser (previous revision 2017/09/15 14:36 by wdparis2017)
===== Jung - Rilke Correspondance Network =====
Joint project bringing together three separate projects: Rilke correspondance, …

Objectives:
  * agree on a common metadata structure for correspondence datasets
  * clean and enrich the existing datasets
  * build a database that can be used not just by these two projects
  * experiment with existing visualization tools
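One objective above is agreeing on a common metadata structure. As a minimal sketch, a shared record for a single letter might look like the following; the field names are illustrative assumptions, not the project's agreed schema.

```python
# Illustrative field set for one letter in a merged correspondence dataset.
# These names are assumptions, not the project's agreed schema.
LETTER_FIELDS = ("sender", "recipient", "date", "place", "source_project")

def make_letter(sender, recipient, date, place="", source_project=""):
    """Build one letter record with a fixed, shared field order."""
    return dict(zip(LETTER_FIELDS, (sender, recipient, date, place, source_project)))
```

A fixed field tuple like this makes it easy to check that the CSV exports from the different projects all carry the same columns.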
===== Data =====
  * Description of steps, and issues, in Process (please correct and refine).
The main issue with the Jung correspondence is data quality: sender and recipient appear in a single column.
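A combined sender/recipient column can be split programmatically before import. A Python sketch, assuming (hypothetically) that the cell uses a separator such as " an " or " to "; the real files may follow a different convention:

```python
import re

# Hypothetical separators between sender and recipient in the combined
# column, e.g. "Carl Jung an Rainer Maria Rilke". Assumptions only.
SEPARATORS = re.compile(r"\s+(?:an|to|->)\s+")

def split_correspondents(cell):
    """Split a combined sender/recipient cell into (sender, recipient).

    Rows that do not match any known separator are returned with an
    empty recipient so they can be flagged for manual review.
    """
    parts = SEPARATORS.split(cell, maxsplit=1)
    if len(parts) != 2:
        return (cell.strip(), "")
    return (parts[0].strip(), parts[1].strip())
```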
Data cleaning is still needed.
The dates also need both cleaning for consistency and transformation to meet the developer specs. (Basil is handling this in Perl.)
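The date normalization can be sketched as follows. The team does this in Perl; this Python version shows the same idea, and the input formats listed are assumptions rather than the actual spec:

```python
from datetime import datetime

# Assumed input date spellings; the real files may contain others.
KNOWN_FORMATS = ("%d.%m.%Y", "%Y-%m-%d", "%d %B %Y")

def normalize_date(raw):
    """Return an ISO 8601 date string, or None when no known format matches."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            pass
    return None  # flag for manual cleaning
```

Rows where the function returns None are exactly the ones still needing manual cleaning.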
For Geolocators:
For matching senders and recipients to Wikidata: look for Q codes with OpenRefine, note them, and list the names that need to be created in Wikidata for future use.
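Outside OpenRefine, candidate Q codes for a name can also be looked up via the Wikidata API's `wbsearchentities` action. A minimal sketch that only builds the query URL (no network call is made here):

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"

def wikidata_search_url(name, language="en"):
    """Build a wbsearchentities query URL for a correspondent's name."""
    params = {
        "action": "wbsearchentities",
        "search": name,
        "language": language,
        "type": "item",
        "format": "json",
    }
    return WIKIDATA_API + "?" + urlencode(params)
```

Fetching that URL returns JSON with candidate items (label, description, Q id); names with no hits are the ones to list for creation in Wikidata.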
Issues with the target database:
While the IT team builds the database to be used with the visualization tool, the data is being cleaned and Q codes are being extracted.
They took the cleaned CSV files, converted them to SQL, then to JSON.
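The CSV → SQL → JSON pipeline can be sketched with the Python standard library alone; the table and column names here are illustrative assumptions, not the project's actual schema:

```python
import csv
import io
import json
import sqlite3

def csv_to_json(csv_text):
    """Load correspondence rows from CSV into SQLite, then dump them as JSON.

    Column names (sender, recipient, date) are assumptions for illustration.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE letters (sender TEXT, recipient TEXT, date TEXT)")
    db.executemany(
        "INSERT INTO letters VALUES (:sender, :recipient, :date)", rows
    )
    letters = [
        dict(zip(("sender", "recipient", "date"), row))
        for row in db.execute("SELECT sender, recipient, date FROM letters")
    ]
    return json.dumps(letters, ensure_ascii=False)
```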
+ | |||
Doing this all at once poses some project management challenges, since several people may be working on the same files to clean different data. All the files need to be integrated at the end.
+ | |||
Additional issues encountered:

- Wikidata Q codes that OpenRefine linked to seem to have disappeared?

- The second file, with over 16,000 lines, appears to be too big for OpenRefine to match with Q codes. Proposed solution: split it into several files. (Also attempting to solve this by increasing the RAM allotted to OpenRefine in its ini file.)

- Visualization:

- Ensuring that the files from different projects respect the same structure in the final, cleaned-up versions.

- Need for a scholar to decide which of OpenRefine's proposed matches are correct.
Flor Méchain (Wikimedia CH): working on cleaning and matching with Wikidata Q codes using OpenRefine.
Lena Heizman (Dodis / histHub): Mentoring with OpenRefine.
Hugo Martin
Samantha Weiss
Michael Gasser
Irina Schubert
Laurel Zuckerman
Christiane Sibille (Dodis / histHub)
Adrien Zemma
  * [[user:
{{tag>