Differences
This shows you the differences between two versions of the page.
project:chparlscraping [2015/09/04 12:10] – yrochat | project:chparlscraping [2015/09/07 16:23] (current) – yrochat | ||
---|---|---|---|
Line 1: | Line 1: | ||
- | ===== Swiss parliament minutes scraping | + | ==== Swiss parliament minutes scraping ==== |
- | In this project, we are planning to: | + | [[http:// |
- | - Scrape | + | Is the Swiss parliament |
- | - Structure them. | + | |
- | - Analysis (person vs. vocabulary, dialogue order, gender). | + | |
- | Our [[https:// | + | The goal of this project is to answer some of these questions and many more. To do this, we are planning to: |
- | ===== Data ===== | + | - Scrape the parliament website in order to retrieve councilors' bios, the topics discussed, and the minutes |
+ | - Structure them in session, intervention (with rank/ | ||
- | * Soon ! | + | In order to perform some analyses, such as: |
+ | - from keywords, who talks about what, by parties, cantons and people | ||
+ | - person vs. vocabulary | ||
+ | - dialogue order | ||
+ | - gender | ||
+ | Our NEW [[https:// | ||
+ | |||
+ | === Data === | ||
+ | Raw data are available as a single JSON file and its .csv counterpart. Because of file-size problems, we are exploring ways to produce several .csv files instead. | ||
+ | **The [[https:// | ||
+ | |||
+ | == Structure of the main JSON (on Giovanni' | ||
+ | |||
+ | list of interventions, | ||
+ | |||
+ | * Link_subject: | ||
+ | * Surname: of the person speaking | ||
+ | * Description: | ||
+ | * Bio: link to the page of the person speaking | ||
+ | * Canton: of origin of the person speaking
+ | * Subject_id: of the subject under discussion (still to be understood: typology) | ||
+ | * Date: of the intervention (DD.MM.YY) | ||
+ | * Group: political group of the person speaking at the moment of the intervention | ||
+ | * Session_title: | ||
+ | * Data: transcript of the intervention | ||
+ | * Name: of the person speaking | ||
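As an illustration of the structure above, here is a minimal Python sketch that loads a list of interventions, parses the DD.MM.YY dates, and counts interventions per political group. It assumes the JSON is a flat list of objects keyed by the field names listed above; the two sample records and all their values are invented for the example and are not taken from the real data.

```python
import json
from collections import Counter
from datetime import datetime

# Hypothetical two-record sample using the field names listed above;
# every value here is made up for illustration.
SAMPLE_JSON = """
[
  {"Name": "Anna", "Surname": "Muster", "Canton": "VD", "Group": "G1",
   "Date": "07.09.15", "Subject_id": "15.001", "Data": "First intervention."},
  {"Name": "Beat", "Surname": "Beispiel", "Canton": "ZH", "Group": "G2",
   "Date": "08.09.15", "Subject_id": "15.001", "Data": "A reply."}
]
"""


def parse_date(value):
    """Parse a DD.MM.YY string into a date object."""
    return datetime.strptime(value, "%d.%m.%y").date()


def interventions_per_group(interventions):
    """Count interventions per political group."""
    return Counter(item["Group"] for item in interventions)


interventions = json.loads(SAMPLE_JSON)
for item in interventions:
    # Replace the raw string with a real date for later sorting or filtering.
    item["Date"] = parse_date(item["Date"])

counts = interventions_per_group(interventions)
```

With the real file, `json.loads(SAMPLE_JSON)` would be replaced by `json.load` on an open file handle; the rest of the sketch stays the same as long as the field names match.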
+ | |||
+ | == Structure of the Parliament API data via Yannick == | ||
+ | |||
+ | * [[http:// | ||
+ | * [[http:// | ||
+ | * [[http:// | ||
+ | * [[https:// | ||
+ | * From those data, a lightweight file describing members of parliament is built ([[https:// | ||
+ | |||
+ | Fields are: id, cantonName, council, firstName, lastName, party, active, birthDate, gender, language, maritalStatus, militaryGrade, partyId, salutationLetter, workLanguage
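A possible use of this members file for the gender analysis is sketched below in Python. It assumes each member has already been read into a dictionary keyed by the field names above; the sample rows and the party and gender values (`P1`, `f`, `m`) are hypothetical, since the actual encoding of those fields is not documented here.

```python
from collections import Counter, defaultdict


def gender_by_party(members):
    """Return {party: Counter({gender: number_of_members})}."""
    table = defaultdict(Counter)
    for member in members:
        table[member["party"]][member["gender"]] += 1
    return table


# Hypothetical rows; only the field names come from the list above.
members = [
    {"id": "1", "lastName": "Muster", "party": "P1", "gender": "f"},
    {"id": "2", "lastName": "Beispiel", "party": "P1", "gender": "m"},
    {"id": "3", "lastName": "Exemple", "party": "P2", "gender": "m"},
]

table = gender_by_party(members)
for party, genders in sorted(table.items()):
    print(party, dict(genders))
```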
+ | |||
+ | == Structure of the final files from Jérémie == | ||
+ | |||
+ | One JSON file plus its .csv counterpart for each Parliament session of the National Council from 1995.
+ | The same datasets are also available split as one JSON/CSV file per legislative session. | ||
+ | |||
+ | == Graph data for Yannick == | ||
+ | |||
+ | graph.csv: edgelist with Source (bio URL as id of the person replying) - Destination (bio URL as id of the person who spoke before) - Subject (id of subject under discussion) - Date (of intervention, | ||
+ | nodes.csv: nodelist with bio id - name - surname - canton - political group | ||
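To show how the edge list could feed a "who talks to whom" graph, the Python sketch below counts how often each person replies to each other person. It assumes graph.csv is comma-delimited with a header row named Source, Destination, Subject, Date; the delimiter, the header names as literal column titles, and the sample ids are assumptions made for the example.

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt; real ids would be bio URLs.
EDGES_CSV = """Source,Destination,Subject,Date
bio/2,bio/1,15.001,07.09.15
bio/1,bio/2,15.001,07.09.15
bio/2,bio/1,15.002,08.09.15
"""


def reply_counts(csv_text):
    """Count replies per directed (source, destination) pair."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter((row["Source"], row["Destination"]) for row in reader)


weights = reply_counts(EDGES_CSV)
for (source, destination), n in weights.most_common():
    print(f"{source} -> {destination}: {n}")
```

The resulting counts can serve as edge weights; joining them with nodes.csv on the bio id would add name, canton, and political group to each node.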
+ | |||
+ | === Results visualization === | ||
+ | |||
+ | * Kibana Dashboard iframe: | ||
+ | < | ||
+ | |||
+ | * [[http:// | ||
+ | <pic sylvain> | ||
+ | |||
+ | * Example viz graph "who talks to who": | ||
+ | <pic yannick> | ||
+ | |||
+ | * Semantic distance between members of parliament: <viz pa> | ||
+ | |||
+ | * A simple gender gap visualization for the current Parliament that kind of summarizes it all: < | ||
===== Team ===== | ===== Team ===== | ||
+ | * Giovanni Colavizza, [[https:// | ||
+ | * Pierre-Alexandre Fonta [[https:// | ||
+ | * [[http:// | ||
+ | * Fabrice Hong, [[https:// | ||
+ | * Jan Iwaszkiewicz, | ||
+ | * Jérémie Knüsel [[https:// | ||
+ | * Sylvain Moesching | ||
+ | * [[user: | ||
+ | * [[http:// | ||
+ | * Douglas Watson, [[https:// | ||
- | * Giovanni Colavizza | ||
- | * Pierre-Alexandre Fonta | ||
- | * [[user: | ||
- | * Fabrice Hong | ||
- | * Jérémie Knüsel | ||
- | * Sylvain Moeshing | ||
- | * Nicolas Ray | ||
- | * [[user: | ||
- | * Douglas Watson | ||
===== Links ===== | ===== Links ===== | ||
- | + | * [[http:// | |
- | * Soon ! | + | * [[http:// |
+ | * [[http:// | ||
| | ||
- | {{tag> | + | {{tag> |