project:hds_out_of_the_box

Differences

This shows you the differences between two versions of the page.

project:hds_out_of_the_box [2016/07/04 12:47] – [Named Entity Recognition] joschne
project:hds_out_of_the_box [2016/07/04 15:12] (current) – [Exploring bibliographic enrichment with OpenRefine] pmau
Line 1: Line 1:
 ===== Historical Dictionary of Switzerland Out of the Box =====
  
-The [[http://www.hls-dhs-dss.ch/english.php|Historical Dictionary of Switzerland]] (HDS) is an academic reference work which documents ``the most important topics and objects of Swiss history from prehistory up to the present''
+The [[http://www.hls-dhs-dss.ch/english.php|Historical Dictionary of Switzerland]] (HDS) is an academic reference work which documents //the most important topics and objects of Swiss history from prehistory up to the present//
  
-The HDS digital edition comprises about XXXX articles organized in 4 main headword groups: \\
+The HDS digital edition comprises about 36,000 articles organized in 4 main headword groups: \\
 - Biographies,\\
 - Families, \\
-- Geographical Entities and \\
+- Geographical entities and \\
 - Thematic contributions.

-Beyond the encyclopaedic description of entities/concepts, each articles contains references to primary and secondary sources which supported authors when writing articles.
+Beyond the encyclopaedic description of entities/concepts, each article contains references to the primary and secondary sources that the authors relied on when writing.
  
  
Line 17: Line 17:
 We have the following data:\\
  
-[[http://make.opendata.ch/wiki/data:glam_ch#metadata_of_the_historical_dictionary_of_switzerland|metadata information]] about HDS articles Historical Dictionary of Switzerland\\
-[[http://make.opendata.ch/wiki/data:glam_ch#journal_de_geneve_gazette_de_lausanne_1914|Le Temps digital archive]] for the year 1914\\
-- bibliographic references of HDS articles\\
+[[http://make.opendata.ch/wiki/data:glam_ch#metadata_of_the_historical_dictionary_of_switzerland|metadata information]] about the articles of the Historical Dictionary of Switzerland, comprising:\\
+  * bibliographic references of HDS articles\\
+  * article titles\\
+[[http://make.opendata.ch/wiki/data:glam_ch#journal_de_geneve_gazette_de_lausanne_1914|Le Temps digital archive]] for the year 1914\\
      
 ===== Goals =====
  
-Our projects revolve around **Linking the HDS to external data** and aim at:\\
-
-** 1. Entity Linking towards HDS**
-
-The objective is to link named entity mentions discovered in historical Swiss newspapers to their correspondant HDS articles.
-
-** 2. Exploring reference citation of HDS articles**
-
-The objective is to reconcile HDS bibliographic data with SwissBib.
+Our projects revolve around **linking the HDS to external data** and aim at:\\
+
+  - **Entity linking towards HDS**\\ The objective is to link named entity mentions discovered in historical Swiss newspapers to their corresponding HDS articles.\\
+
+  - **Exploring reference citations of HDS articles**\\ The objective is to reconcile the bibliographic data contained in HDS articles with SwissBib.
  
  
Line 52: Line 50:
 - working with a more refined NER output which comprises information about name components (first, middle, last names)\\
  
 +=== Some statistics ===
 +In the 23,622 articles from the year 1914 in the «Le Temps digital archive», we linked 90,603 entities, pointing to 1,417 articles of the «Historical Dictionary of Switzerland».
  
-=== Showing Linked Named Entities ===
+{{:project:to15hds.png?500|}}\\
 + 
 + 
 +=== Web Interface ===
  
 We developed a simple web interface for searching in the corpus and displaying the texts with the links.\\
Line 59: Line 62:
  
  
-{{:project:01home.png?direct&500|}}\\
 1. Home\\
-
+{{:project:01home.png?direct&200|}}\\
-{{:project:02search.png?direct&500|}}\\
+\\
 2. Search\\
+{{:project:02search.png?direct&200|}}\\
+\\
+3. Article with links to HDS, Wikipedia and DBpedia\\
+{{:project:03article.png?direct&200|}}\\
+\\

-{{:project:03article.png?direct&500|}}\\
-3. Article with links\\
-
+=== Further work ===
 Further work would include:\\
 - evaluate and improve method.\\
Line 75: Line 79:
  
  
-===== Bibliographic Enrichment =====
+===== Bibliographic enrichment =====

 We work on the list of references in all articles of the HDS, with three goals:
-  - Finding all the sources which are cited in the HDS (several sources are cited multiple times).
-  - Link all the sources with the SwissBib catalog, if possible.
-  - Interactively explore the citation networks of the HDS.
+  - Find all the sources cited in the HDS (several sources are cited multiple times);
+  - Link all the sources with the SwissBib catalog, if possible;
+  - Interactively explore the citation network of the HDS.

-The datasetlists of references in every HDS article:
+The dataset comes from the HDS metadata. It contains the list of references of every HDS article:

 {{:project:hdsrefs.png?direct&500|}}
Line 201: Line 205:
  
 (note that the parentheses around "RERO" have to be removed for the search URL to work).
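
The parenthesis removal mentioned in the note can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the workflow used in the project: the function name ''strip_rero_parentheses'' and the sample identifier are hypothetical, and the real identifier format and search URL are not shown in this excerpt.

```python
import re


def strip_rero_parentheses(identifier: str) -> str:
    """Replace a parenthesised "(RERO)" marker with plain "RERO".

    Identifiers that do not contain "(RERO)" are returned unchanged.
    """
    return re.sub(r"\(RERO\)", "RERO", identifier)


# Hypothetical identifier, for illustration only.
print(strip_rero_parentheses("(RERO)R0000000000"))
```

The cleaned value would then be placed into the search URL in whatever way the catalog expects.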
 +
 +=== Further work ===
 +This is only the first step of a more general effort within the HDS:\\
 + * precisely identify each bibliographic entry in an article (an ID attribute has to be generated)\\
 + * collect references separately by language\\
 + * clean and refine the collected data\\
 + * set up a querying workflow that keeps the ID of the matched target in a reference catalog\\
 + * replace each matching occurrence in the HDS article by a reference to an external catalog\\
 +
 ===== Team =====
  
  • project/hds_out_of_the_box.1467629232.txt.gz
  • Last modified: 2016/07/04 12:47
  • by joschne