Differences
This shows you the differences between two versions of the page.
project:big_data_analytics [2017/09/16 17:50] – waddell | project:big_data_analytics [2017/09/18 10:12] (current) – Correct a few typos andrea | ||
---|---|---|---|
Line 3: | Line 3: | ||
We try to analyse bibliographical data using big data technology (flink, elasticsearch, | We try to analyse bibliographical data using big data technology (flink, elasticsearch, | ||
- | - more info will follow... | + | Here a first sketch of what we're aiming at: |
- | + | ||
- | - see also: | + | |
- | - https:// | + | |
- | - https:// | + | |
- | + | ||
- | here a first sketch of what we're aiming at: | + | |
{{: | {{: | ||
Line 15: | Line 9: | ||
===== Datasets ===== | ===== Datasets ===== | ||
- | We use biographical | + | We use bibliographical |
**Swissbib bibliographical data** [[https:// | **Swissbib bibliographical data** [[https:// | ||
- | * Catalog of all the Swiss University Libraries, the Swiss Nationallibrary, etc. | + | * Catalog of all the Swiss University Libraries, the Swiss National Library, etc. |
* 960 Libraries / 23 repositories (Bibliotheksverbunde) | * 960 Libraries / 23 repositories (Bibliotheksverbunde) | ||
* ca. 30 Mio records | * ca. 30 Mio records | ||
Line 36: | Line 30: | ||
* JSON scraped from API | * JSON scraped from API | ||
- | ===== Usecases | + | ===== Use Cases ===== |
=== Swissbib === | === Swissbib === | ||
Line 50: | Line 44: | ||
__Data analyst__: | __Data analyst__: | ||
- | - I wan‘t | + | - I want to get to know better my data. And be faster. |
- | → e.g. I want to know which records don‘t have any entry for ‚year of publication‘. I want to analyse, if these records should be sent through the merging process of CBS. There fore I also want to know, if these records contain other ‚relevant‘ fields, | + | → e.g. I want to know which records don‘t have any entry for ‚year of publication‘. I want to analyze, if these records should be sent through the merging process of CBS. Therefore |
=== edoc === | === edoc === | ||
Line 66: | Line 60: | ||
**elasticsearch** [[https:// | **elasticsearch** [[https:// | ||
- | JAVA based searchengine, results exported in JSON | + | JAVA based search engine, results exported in JSON |
**Flink** [[https:// | **Flink** [[https:// |
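
The data analyst use case above (finding records with no entry for "year of publication") could be expressed as an Elasticsearch `exists` query wrapped in `bool`/`must_not`. The following is only a minimal sketch: the field name `year_of_publication`, the sample records and both helper functions are hypothetical and do not reflect the actual Swissbib index mapping. The second helper applies the same check locally to records that have already been exported as JSON.

```python
import json


def missing_field_query(field):
    """Build an Elasticsearch query body matching documents with no value for `field`."""
    return {"query": {"bool": {"must_not": {"exists": {"field": field}}}}}


def records_missing(records, field):
    """Local counterpart for exported JSON records.

    Mirrors the `exists` semantics: an absent key, null, or an empty list
    counts as missing, while an empty string still counts as present.
    """
    return [r for r in records if r.get(field) in (None, [])]


# Hypothetical sample records; field names are illustrative only.
sample = [
    {"id": "1", "title": "A", "year_of_publication": "1999"},
    {"id": "2", "title": "B"},
    {"id": "3", "title": "C", "year_of_publication": None},
]

if __name__ == "__main__":
    # Query body that could be sent to an index's _search endpoint.
    print(json.dumps(missing_field_query("year_of_publication"), indent=2))
    # IDs of the local sample records lacking a year of publication.
    print([r["id"] for r in records_missing(sample, "year_of_publication")])
```

The same pattern could be extended with further `exists` clauses to check whether the affected records contain other relevant fields before deciding if they should go through the merging process.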