

We are trying to analyse bibliographical data using big data technologies (Flink, Elasticsearch, Metafacture).

- more info will follow…

- see also:

Here is a first sketch of what we're aiming at:

We use bibliographical metadata:

Swissbib bibliographical data

  • Catalog of all Swiss university libraries, the Swiss National Library, etc.
  • 960 Libraries / 23 repositories (Bibliotheksverbunde)
  • ca. 30 million records
  • MARC21 XML Format
  • → raw data stored in MongoDB
  • → transformed and clustered data stored in CBS (central library system)
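
To make the MARC21 XML format mentioned above more concrete, here is a minimal sketch of reading a few fields from a single record with Python's standard library. The sample record is invented for illustration and is not taken from Swissbib; the helper names `subfield` and `summarize` are hypothetical. The sketch assumes the records use the standard MARCXML namespace and the usual MARC 21 conventions (control field 001 for the record identifier, 100 $a for the main author, 245 $a for the title). It does not reflect how the project itself processes the data.

```python
import xml.etree.ElementTree as ET

# Standard MARCXML ("MARC21 slim") namespace.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hypothetical sample record, invented for illustration only.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nam a2200000 a 4500</leader>
  <controlfield tag="001">example-0001</controlfield>
  <datafield tag="100" ind1="1" ind2=" ">
    <subfield code="a">Muster, Anna</subfield>
  </datafield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">An example title</subfield>
  </datafield>
</record>"""


def subfield(record, tag, code):
    """Return the text of the first matching subfield, or None if absent."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    element = record.find(path, MARC_NS)
    return element.text if element is not None else None


def summarize(xml_text):
    """Extract identifier, main author and title from one MARCXML record."""
    record = ET.fromstring(xml_text)
    ident = record.find("marc:controlfield[@tag='001']", MARC_NS)
    return {
        "id": ident.text if ident is not None else None,
        "author": subfield(record, "100", "a"),
        "title": subfield(record, "245", "a"),
    }


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

A flat dictionary like this is roughly the shape a record would need before it can be indexed or compared with the JSON sources; for the full data set a streaming parser would be more appropriate than loading each record as a string.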


  • Institutional repository of the University of Basel (document server, open access publications)
  • ca. 50,000 records
  • JSON File


  • Digital Object Identifier (DOI) Registration Agency
  • ca. 90 million records (we use only 30 million)
  • JSON scraped from API


  • Dominique Blaser
  • Jean-Baptiste Genicot
  • Günter Hipler
  • Jacqueline Martinelli
  • Rémy Meja
  • Andrea Notroff
  • Sebastian Schüpbach
  • T
  • Silvia Witzig
  • Last modified: 2017/09/16 15:01
  • by j.martinelli