Differences
This shows you the differences between two versions of the page.
project:hds_out_of_the_box [2016/07/02 16:15] – [Team] timtom | project:hds_out_of_the_box [2016/07/04 14:53] – [Data] pmau | ||
---|---|---|---|
Line 1: | Line 1: | ||
===== Historical Dictionary of Switzerland Out of the Box ===== | ===== Historical Dictionary of Switzerland Out of the Box ===== | ||
- | The [[http:// | + | The [[http:// |
- | The HDS digital edition comprises about XXXX articles organized in 4 main headword groups: \\ | + | The HDS digital edition comprises about 36.000 |
- Biographies, | - Biographies, | ||
- Families, \\ | - Families, \\ | ||
- | - Geographical | + | - Geographical |
- Thematic contributions. | - Thematic contributions. | ||
- | Beyond the encyclopaedic description of entities/ | + | Beyond the encyclopaedic description of entities/ |
Line 17: | Line 17: | ||
We have the following data:\\ | We have the following data:\\ | ||
- | - [[http:// | + | |
- | - [[http:// | + | * bibliographic references of HDS articles\\ |
- | - bibliographic references of HDS articles\\ | + | * article titles\\ |
+ | * [[http:// | ||
| | ||
===== Goals ===== | ===== Goals ===== | ||
Line 25: | Line 27: | ||
Our projects revolve around **Linking the HDS to external data** and aim at:\\ | Our projects revolve around **Linking the HDS to external data** and aim at:\\ | ||
- | ** 1. Entity | + | ** 1. Entity |
The objective is to link named entity mentions discovered in historical Swiss newspapers to their corresponding HDS articles. | The objective is to link named entity mentions discovered in historical Swiss newspapers to their corresponding HDS articles. | ||
Line 31: | Line 33: | ||
** 2. Exploring reference citation of HDS articles** | ** 2. Exploring reference citation of HDS articles** | ||
- | The objective is to reconcile HDS bibliographic data with SwissBib. | + | The objective is to reconcile HDS bibliographic data contained in articles |
Line 37: | Line 39: | ||
===== Named Entity Recognition ===== | ===== Named Entity Recognition ===== | ||
+ | We used web services to annotate text with named entities:\\ | ||
+ | - Dandelion\\ | ||
+ | - Alchemy\\ | ||
+ | - OpenCalais \\ | ||
+ | |||
+ | |||
+ | {{: | ||
+ | |||
+ | Named entity mentions (persons and places) are matched against entity labels of HDS entries and directly linked when only one HDS entry exists. | ||
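The linking rule described above (link a mention only when exactly one HDS entry carries that label) can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the entry identifiers, the sample labels, the function names and the lowercase normalisation are all assumptions made for the example.

```python
from collections import defaultdict


def build_label_index(entries):
    """Map each normalised label to the set of entry ids that carry it."""
    index = defaultdict(set)
    for entry_id, label in entries:
        index[label.strip().lower()].add(entry_id)
    return index


def link_mentions(mentions, index):
    """Link a mention only when exactly one entry matches its label."""
    links = {}
    for mention in mentions:
        candidates = index.get(mention.strip().lower(), set())
        if len(candidates) == 1:
            links[mention] = next(iter(candidates))
    return links


# Hypothetical entries: (entry id, entity label). The ids are invented.
hds_entries = [
    ("hds-001", "Henri Dunant"),
    ("hds-002", "Genève"),
    ("hds-003", "Bern"),
    ("hds-004", "Bern"),
]

index = build_label_index(hds_entries)
links = link_mentions(["Henri Dunant", "Bern", "Paris"], index)
print(links)
```

In this sketch "Bern" stays unlinked because two entries share the label, and "Paris" stays unlinked because no entry matches; only the unambiguous mention receives a link.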
+ | |||
+ | Further developments would include:\\ | ||
+ | - handling name variants, e.g. 'W.A. Mozart' | ||
+ | - real disambiguation by comparing the newspaper article context with the HDS article context (a first simple similarity could be tf-idf based)\\ | ||
+ | - working with a more refined NER output which includes information about name components (first, middle, last names)\\ | ||
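The tf-idf based disambiguation mentioned in the list above could look something like the following sketch. It was not part of the work described here; the whitespace tokeniser, the smoothed idf formula, the function names and the sample texts are assumptions chosen to keep the example self-contained.

```python
import math
from collections import Counter


def tokenize(text):
    """Very rough tokeniser: lowercase and split on whitespace."""
    return text.lower().split()


def tfidf_vectors(documents):
    """Return one sparse tf-idf vector (dict) per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    vectors = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        # Smoothed idf so that terms present in every document keep a weight.
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in term_freq.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_candidate(context, candidates):
    """Pick the candidate entry whose text is most similar to the context."""
    ids = list(candidates)
    vectors = tfidf_vectors([context] + [candidates[i] for i in ids])
    scores = {i: cosine(vectors[0], vec) for i, vec in zip(ids, vectors[1:])}
    return max(scores, key=scores.get), scores


# Invented newspaper context and candidate entry texts, for illustration only.
context = "the federal council met in bern the capital city"
candidates = {
    "hds-bern-city": "bern capital city of switzerland seat of the federal council",
    "hds-bern-family": "bern family of merchants from basel",
}
best, scores = best_candidate(context, candidates)
print(best)
```

The newspaper context is compared with the text of each candidate HDS entry, and the highest-scoring candidate is proposed as the link target. A real implementation would need proper tokenisation and stop-word handling for French and German.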
+ | |||
+ | === Some statistics === | ||
+ | In the 23.622 articles from the year 1914 in the «Le Temps digital archive», we linked 90.603 entities pointing to 1.417 articles of the «Historical Dictionary of Switzerland». | ||
+ | |||
+ | {{: | ||
+ | |||
+ | |||
+ | === Web Interface === | ||
+ | |||
+ | We developed a simple web interface for searching in the corpus and displaying the texts with the links.\\ | ||
+ | It consists of 3 views: | ||
+ | |||
+ | |||
+ | 1. Home\\ | ||
+ | {{: | ||
+ | \\ | ||
+ | 2. Search\\ | ||
+ | {{: | ||
+ | \\ | ||
+ | 3. Article with links to HDS, Wikipedia and DBpedia\\ | ||
+ | {{: | ||
+ | \\ | ||
+ | |||
+ | === Further work === | ||
+ | Further work would include:\\ | ||
+ | - evaluating and improving the method.\\ | ||
+ | - applying the method to the Historical Dictionary of Switzerland itself for internal linking.\\ | ||
Line 83: | Line 126: | ||
</ | </ | ||
- | Note that the above expression combines the <PUB> column (accessed through value) and the <AUT> column (containing the author' | + | Note that the above expression combines the <PUB> column (accessed through value) and the <AUT> column (containing the author' |
Swissbib queries can return Dublin Core, MARC XML or MARC in JSON format. Dublin Core is the easiest to manipulate, but unfortunately it does not contain the entirety of the returned record. To access the full record, it is necessary to use either MARC XML or MARC JSON. | Swissbib queries can return Dublin Core, MARC XML or MARC in JSON format. Dublin Core is the easiest to manipulate, but unfortunately it does not contain the entirety of the returned record. To access the full record, it is necessary to use either MARC XML or MARC JSON. | ||
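As an illustration of working with the MARC XML format, the sketch below reads subfields from a record using Python's standard library. It does not query Swissbib: the sample record is invented, the helper name is hypothetical, and the way a real response wraps its records is not shown here. The only fixed points are the standard MARC 21 XML namespace and the usual field tags (245 for the title statement, 100 for the main author).

```python
import xml.etree.ElementTree as ET

# Standard MARC 21 XML ("MARCXML") namespace.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Invented record, for illustration only (not real Swissbib output).
SAMPLE_RECORD = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="100" ind1="1" ind2=" ">
    <subfield code="a">Example, Author</subfield>
  </datafield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">An example title</subfield>
  </datafield>
</record>"""


def marc_subfields(record_xml, tag, code):
    """Return the text of every subfield `code` found in datafield `tag`."""
    root = ET.fromstring(record_xml)
    xpath = f".//marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    return [element.text for element in root.findall(xpath, MARC_NS)]


title = marc_subfields(SAMPLE_RECORD, "245", "a")
author = marc_subfields(SAMPLE_RECORD, "100", "a")
print(title, author)
```

The helper returns a list because MARC fields and subfields can repeat; a field that is absent yields an empty list rather than an error.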
Line 168: | Line 211: | ||
===== Team ===== | ===== Team ===== | ||
+ | * Pierre-Marie Aubertel | ||
+ | * Francesco Beretta | ||
* Giovanni Colavizza | * Giovanni Colavizza | ||
- | * Jonas Schneider | ||
* Maud Ehrmann | * Maud Ehrmann | ||
* [[https:// | * [[https:// | ||
- | * Pierre-Marie Aubertel | + | * Jonas Schneider |
- | * Francesco Beretta | + | |
+ | |||
+ | |||
| | ||
- | {{tag> | + | {{tag> |