Differences
This shows you the differences between two versions of the page.
project:big_data_analytics [2017/09/16 15:14] – [Team] j.martinelli | project:big_data_analytics [2017/09/18 10:12] (current) – Correct a few typos andrea | ||
---|---|---|---|
Line 3: | Line 3: | ||
We try to analyse bibliographical data using big data technology (flink, elasticsearch, | We try to analyse bibliographical data using big data technology (flink, elasticsearch, | ||
- | - more info will follow... | + | Here a first sketch of what we're aiming at: |
- | + | ||
- | - see also: | + | |
- | - https:// | + | |
- | - https:// | + | |
- | + | ||
- | here a first sketch of what we're aiming at: | + | |
{{: | {{: | ||
Line 15: | Line 9: | ||
===== Datasets ===== | ===== Datasets ===== | ||
- | We use biographical | + | We use bibliographical |
**Swissbib bibliographical data** [[https:// | **Swissbib bibliographical data** [[https:// | ||
- | * Catalog of all the Swiss University Libraries, the Swiss Nationallibrary, etc. | + | * Catalog of all the Swiss University Libraries, the Swiss National Library, etc. |
* 960 Libraries / 23 repositories (Bibliotheksverbunde) | * 960 Libraries / 23 repositories (Bibliotheksverbunde) | ||
* ca. 30 Mio records | * ca. 30 Mio records | ||
Line 36: | Line 30: | ||
* JSON scraped from API | * JSON scraped from API | ||
- | ===== Usecases | + | ===== Use Cases ===== |
=== Swissbib === | === Swissbib === | ||
Line 50: | Line 44: | ||
__Data analyst__: | __Data analyst__: | ||
- | - I wan‘t | + | - I want to get to know better my data. And be faster. |
- | → e.g. I want to know which records don‘t have any entry for ‚year of publication‘. I want to analyse, if these records should be sent through the merging process of CBS. There fore I also want to know, if these records contain other ‚relevant‘ fields, | + | → e.g. I want to know which records don‘t have any entry for ‚year of publication‘. I want to analyze, if these records should be sent through the merging process of CBS. Therefore |
=== edoc === | === edoc === | ||
- | Goal: Enrichment. I want to add missing | + | Goal: Enrichment. I want to add missing |
→ Match the two datasets by author and title | → Match the two datasets by author and title | ||
Line 66: | Line 60: | ||
**elasticsearch** [[https:// | **elasticsearch** [[https:// | ||
- | JAVA based searchengine, results exported in JSON | + | JAVA based search engine, results exported in JSON |
**Flink** [[https:// | **Flink** [[https:// | ||
Line 80: | Line 74: | ||
Visualisation of the results | Visualisation of the results | ||
+ | |||
+ | ===== How to get there ===== | ||
+ | |||
+ | === Usecase 1: Swissbib === | ||
+ | |||
+ | {{: | ||
+ | |||
+ | === Usecase 2: edoc === | ||
+ | |||
+ | {{: | ||
===== Links ===== | ===== Links ===== | ||
Line 98: | Line 102: | ||
| | ||
- | {{tag> | + | {{tag> |
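
The edoc step "Match the two datasets by author and title" could be sketched roughly as follows. This is only an illustration, not the project's actual pipeline: the record layout (dicts with `author` and `title` keys), the function names and the normalisation rules are all assumptions, and a real implementation would likely run in Flink or elasticsearch rather than in plain Python.

```python
import re
import unicodedata


def normalize(text):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", no_accents.lower())
    return " ".join(cleaned.split())


def match_key(record):
    """Build the comparison key from the (hypothetical) author/title fields."""
    return (normalize(record["author"]), normalize(record["title"]))


def match_by_author_and_title(records_a, records_b):
    """Return all (a, b) pairs whose normalised author and title are equal."""
    index = {}
    for rec in records_b:
        index.setdefault(match_key(rec), []).append(rec)
    return [(a, b) for a in records_a for b in index.get(match_key(a), [])]


if __name__ == "__main__":
    # Made-up sample records, for illustration only.
    records_a = [{"author": "Müller, Anna", "title": "Big Data: An Overview"}]
    records_b = [
        {"author": "muller anna", "title": "big data an overview", "year": "2016"},
        {"author": "Other, Person", "title": "Something else", "year": "2015"},
    ]
    for a, b in match_by_author_and_title(records_a, records_b):
        print(a["title"], "<->", b["title"])
```

Exact matching on a normalised key is the simplest option; it misses variants such as abbreviated first names, so a fuzzy comparison might be needed on real data.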