Differences
This shows you the differences between two versions of the page.
| project:big_data_analytics [2017/09/16 15:13] – [Tools] j.martinelli | project:big_data_analytics [2017/09/18 10:12] (current) – Correct a few typos andrea | ||
|---|---|---|---|
| Line 3: | Line 3: | ||
| We try to analyse bibliographical data using big data technology (flink, elasticsearch, | We try to analyse bibliographical data using big data technology (flink, elasticsearch, | ||
| - | - more info will follow... | + | Here a first sketch of what we're aiming at: |
| - | + | ||
| - | - see also: | + | |
| - | - https:// | + | |
| - | - https:// | + | |
| - | + | ||
| - | here a first sketch of what we're aiming at: | + | |
| {{: | {{: | ||
| Line 15: | Line 9: | ||
| ===== Datasets ===== | ===== Datasets ===== | ||
| - | We use biographical | + | We use bibliographical |
| **Swissbib bibliographical data** [[https:// | **Swissbib bibliographical data** [[https:// | ||
| - | * Catalog of all the Swiss University Libraries, the Swiss Nationallibrary, etc. | + | * Catalog of all the Swiss University Libraries, the Swiss National Library, etc. |
| * 960 Libraries / 23 repositories (Bibliotheksverbunde) | * 960 Libraries / 23 repositories (Bibliotheksverbunde) | ||
| * ca. 30 Mio records | * ca. 30 Mio records | ||
| Line 36: | Line 30: | ||
| * JSON scraped from API | * JSON scraped from API | ||
| - | ===== Usecases | + | ===== Use Cases ===== |
| === Swissbib === | === Swissbib === | ||
| Line 50: | Line 44: | ||
| __Data analyst__: | __Data analyst__: | ||
| - | - I wan‘t | + | - I want to get to know better my data. And be faster. |
| - | → e.g. I want to know which records don‘t have any entry for ‚year of publication‘. I want to analyse, if these records should be sent through the merging process of CBS. There fore I also want to know, if these records contain other ‚relevant‘ fields, | + | → e.g. I want to know which records don‘t have any entry for ‚year of publication‘. I want to analyze, if these records should be sent through the merging process of CBS. Therefore |
| === edoc === | === edoc === | ||
| - | Goal: Enrichment. I want to add missing | + | Goal: Enrichment. I want to add missing |
| → Match the two datasets by author and title | → Match the two datasets by author and title | ||
| Line 66: | Line 60: | ||
| **elasticsearch** [[https:// | **elasticsearch** [[https:// | ||
| - | JAVA based searchengine, results exported in JSON | + | JAVA based search engine, results exported in JSON |
| **Flink** [[https:// | **Flink** [[https:// | ||
| Line 80: | Line 74: | ||
| Visualisation of the results | Visualisation of the results | ||
| + | |||
| + | ===== How to get there ===== | ||
| + | |||
| + | === Usecase 1: Swissbib === | ||
| + | |||
| + | {{: | ||
| + | |||
| + | === Usecase 2: edoc === | ||
| + | |||
| + | {{: | ||
| + | ===== Links ===== | ||
| + | |||
| + | Data Ramblers Project Wiki [[https:// | ||
| + | |||
| ===== Team ===== | ===== Team ===== | ||
| Line 94: | Line 102: | ||
| | | ||
| - | {{tag> | + | {{tag> |
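
The edoc use case above says to "match the two datasets by author and title". The page does not show how this is done, so the following is only a minimal sketch of one possible approach: an exact join on normalized author and title strings. The field names `author`, `title` and `id`, the helpers `normalize`, `match_key` and `match_records`, and the sample records are all hypothetical and are not taken from the project.

```python
import re
import unicodedata


def normalize(text):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_accents = "".join(
        c for c in decomposed if not unicodedata.combining(c)
    )
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", without_accents.lower())
    return " ".join(cleaned.split())


def match_key(record):
    """Build the join key from the (assumed) 'author' and 'title' fields."""
    return (
        normalize(record.get("author", "")),
        normalize(record.get("title", "")),
    )


def match_records(left_records, right_records):
    """Return (left, right) pairs whose normalized author and title agree."""
    # Index the right-hand dataset by key so each lookup is a dict access.
    index = {}
    for rec in right_records:
        index.setdefault(match_key(rec), []).append(rec)

    matches = []
    for rec in left_records:
        key = match_key(rec)
        # Skip records with an empty author or title: they would otherwise
        # match every other record that lacks the same field.
        if not all(key):
            continue
        for candidate in index.get(key, []):
            matches.append((rec, candidate))
    return matches


if __name__ == "__main__":
    # Hypothetical sample records, for illustration only.
    left = [
        {"id": "e1", "author": "Müller, Anna", "title": "Big Data in Libraries"},
        {"id": "e2", "author": "", "title": "No Author"},
    ]
    right = [
        {"id": "s1", "author": "muller anna", "title": "Big data in libraries."},
        {"id": "s2", "author": "", "title": "No author"},
    ]
    for a, b in match_records(left, right):
        print(a["id"], "matches", b["id"])
```

Exact matching on normalized keys is cheap but misses spelling variants and differing name orders; a real pipeline would likely need fuzzy matching or additional identifiers on top of this.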