project:diplomatic_documents_and_swiss_newspapers_in_1914



This project brings together two data sets for the year 1914: the Diplomatic Documents of Switzerland (Dodis) and the Le Temps Historical Archive. Our aim is to find links between the two data sets so that users can search both corpora together more easily. The project consists of two parts:

  1. The geographical browser of the corpora. We extract all places from the Dodis metadata and all places mentioned in each Le Temps article; we then match documents and articles that refer to the same places and visualise them on a map for geographical browsing.
  2. Text similarity search over the corpora. We train two models on the Le Temps corpus: Term Frequency-Inverse Document Frequency (TF-IDF) and Latent Semantic Indexing (LSI) with 25 topics. We then develop a web interface for text similarity search over the corpus and test it with Dodis summaries and full-text documents.
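The place-matching step of the geographical browser can be sketched as an inverted index from place names to documents, built once per corpus and then intersected. This is a minimal sketch, assuming places have already been extracted and normalised to identical strings in both corpora; the IDs and places below are hypothetical toy data, and the geocoding needed to put each shared place on a map is not shown.

```python
from collections import defaultdict

# Hypothetical toy inputs: document/article IDs mapped to places that are
# assumed to be already extracted and normalised (extraction is not shown).
dodis_places = {
    "dodis-A": {"Bern", "Vienna"},
    "dodis-B": {"Paris"},
}
letemps_places = {
    "article-1": {"Vienna", "Belgrade"},
    "article-2": {"Paris", "Bern"},
    "article-3": {"London"},
}


def index_by_place(doc_places):
    """Invert {doc_id: places} into {place: sorted list of doc_ids}."""
    index = defaultdict(set)
    for doc_id, places in doc_places.items():
        for place in places:
            index[place].add(doc_id)
    return {place: sorted(ids) for place, ids in index.items()}


def match_by_place(dodis, articles):
    """Return {place: (dodis_ids, article_ids)} for places present in both corpora."""
    dodis_index = index_by_place(dodis)
    article_index = index_by_place(articles)
    shared = sorted(dodis_index.keys() & article_index.keys())
    return {place: (dodis_index[place], article_index[place]) for place in shared}


if __name__ == "__main__":
    for place, (docs, arts) in match_by_place(dodis_places, letemps_places).items():
        print(place, docs, arts)
```

Each key of the result is one candidate map marker, with the Dodis documents and Le Temps articles that share that place listed together; places mentioned in only one corpus (here "Belgrade" and "London") are dropped.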
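The TF-IDF half of the text similarity search can be illustrated with a small pure-Python sketch: weight each term by its count times log(N/df), then rank articles by cosine similarity to a query such as a Dodis summary. The weighting variant, the tokenizer, and the three French snippets are assumptions for illustration, not the project's actual code or data. The LSI step (projecting TF-IDF vectors onto 25 latent topics via a truncated SVD) is not reproduced here; libraries such as gensim provide it as `LsiModel` with a `num_topics` parameter.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus standing in for Le Temps articles.
ARTICLES = [
    "Le Conseil fédéral proclame la neutralité de la Suisse",
    "Mobilisation générale de l'armée suisse",
    "La récolte de blé dans le canton de Vaud",
]


def tokenize(text):
    """Lowercase and keep runs of Unicode letters (accents preserved)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def build_tfidf(corpus):
    """Return (idf, vectors) using raw term count * log(N / df)."""
    docs = [Counter(tokenize(text)) for text in corpus]
    n = len(docs)
    df = Counter(term for doc in docs for term in doc)
    idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in doc.items()} for doc in docs]
    return idf, vectors


def vectorize(text, idf):
    """Weight a query; terms unseen in the corpus have no IDF and are dropped."""
    counts = Counter(tokenize(text))
    return {t: tf * idf[t] for t, tf in counts.items() if t in idf}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def search(query, idf, vectors, top_k=3):
    """Return the top_k (score, article_index) pairs, best match first."""
    q = vectorize(query, idf)
    scores = [(cosine(q, vec), i) for i, vec in enumerate(vectors)]
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return scores[:top_k]


if __name__ == "__main__":
    idf, vectors = build_tfidf(ARTICLES)
    for score, idx in search("neutralité de la Suisse", idf, vectors):
        print(f"{score:.3f}  {ARTICLES[idx]}")
```

Note that a word occurring in every article (here "de") gets an IDF of zero and so contributes nothing to the ranking; LSI goes further by letting articles that share no exact terms with the query still score through shared latent topics.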
  • Relevant documentation …
  • Blog or forum posts …
  • Tools you used …
  • Last modified: 2015/02/28 14:03
  • by yrochat