project:big_data_analytics



We analyse bibliographic data using big data technologies (Flink, Elasticsearch, Metafacture).

- more info will follow…

- see also:

Here is a first sketch of what we're aiming at:

We use bibliographic metadata:

Swissbib bibliographical data https://www.swissbib.ch/

  • Catalogue of all the Swiss university libraries, the Swiss National Library, etc.
  • 960 libraries / 23 library networks (Bibliotheksverbunde)
  • ca. 30 million records
  • MARC21 XML format
  • → raw data stored in MongoDB
  • → transformed and clustered data stored in CBS (central library system)
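Since the raw Swissbib data arrives as MARC21 XML, a minimal sketch of reading one record might look as follows (plain Python standard library; the sample record and its field values are invented for illustration, only the MARC21 slim namespace and field/subfield tags are standard):

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"

# Hypothetical minimal record in MARC21 slim XML, for illustration only.
SAMPLE_RECORD = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nam a2200000 a 4500</leader>
  <controlfield tag="001">123456789</controlfield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Big data analytics :</subfield>
    <subfield code="b">a sketch</subfield>
  </datafield>
</record>"""

def title_of(record_xml: str) -> str:
    """Return the main title (MARC field 245, all subfields) of one record."""
    root = ET.fromstring(record_xml)
    subfields = root.findall(
        f".//{{{MARC_NS}}}datafield[@tag='245']/{{{MARC_NS}}}subfield")
    return " ".join(sf.text.strip() for sf in subfields if sf.text)

print(title_of(SAMPLE_RECORD))
```

In the actual pipeline this kind of field extraction would be done by Metafacture transformations rather than hand-written parsing; the snippet only shows the shape of the input data.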

edoc http://edoc.unibas.ch/

  • Institutional repository of the University of Basel (document server, open-access publications)
  • ca. 50,000 records
  • JSON file

crossref https://www.crossref.org/

  • Digital Object Identifier (DOI) Registration Agency
  • ca. 90 million records (we use only 30 million)
  • JSON scraped from the API
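Crossref exposes its records via a public REST API with cursor-based deep paging. A sketch of the paging logic (endpoint, `rows`/`cursor` parameters, and the `message`/`items`/`next-cursor` response keys follow the documented Crossref REST API; the sample response and DOIs below are invented for illustration):

```python
import urllib.parse

API = "https://api.crossref.org/works"

def page_url(cursor: str = "*", rows: int = 1000) -> str:
    """Build one deep-paging request URL; "*" starts a new cursor session."""
    return API + "?" + urllib.parse.urlencode({"rows": rows, "cursor": cursor})

def parse_page(response: dict) -> tuple[list[str], str]:
    """Extract the DOIs of one result page plus the cursor for the next page."""
    msg = response["message"]
    dois = [item["DOI"] for item in msg["items"]]
    return dois, msg["next-cursor"]

# Invented minimal response in the documented shape:
sample = {"status": "ok",
          "message": {"items": [{"DOI": "10.1000/example.1"},
                                {"DOI": "10.1000/example.2"}],
                      "next-cursor": "AoJ42..."}}
dois, next_cursor = parse_page(sample)
```

A scraper would loop: fetch `page_url(cursor)`, collect the DOIs, and continue with the returned `next-cursor` until a page comes back empty.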

Swissbib team

  • Dominique Blaser
  • Jean-Baptiste Genicot
  • Günter Hipler
  • Jacqueline Martinelli
  • Rémy Meja
  • Andrea Notroff
  • Sebastian Schüpbach
  • T
  • Silvia Witzig
  • project/big_data_analytics.1505567001.txt.gz
  • Last modified: 2017/09/16 15:03
  • by j.martinelli