project:chparlscraping [2015/09/05 17:20] – shalf | project:chparlscraping [2015/09/07 16:23] (current) – yrochat |
---|
==== Swiss parliament minutes scraping ==== | ==== Swiss parliament minutes scraping ==== |
Is the Swiss parliament really useful? Once elected, what are our councilors talking about? Who is answering to who? | |
| [[http://parlement.letemps.ch]] |
| |
| Is the Swiss parliament really useful? Once elected, what are our councilors talking about? Who is answering to whom? |
| |
The goal of this project is to answer some of these questions and many more. To do this, we are planning to: | The goal of this project is to answer some of these questions and many more. To do this, we are planning to: |
| |
=== Data === | === Data === |
Raw data are available as one single JSON file, and its .cvs counterpart. We had size problems, thus exploring ways to produce several .csv | Raw data are available as one single JSON file, and its .csv counterpart. We had size problems, thus exploring ways to produce several .csv. |
**The [[https://github.com/douglas-watson/parl-scraping/tree/master/data|final folder for our data is on github]], the [[https://github.com/douglas-watson/parl-scraping/blob/master/data/merged-csv.zip|.csv files are here]].** | **The [[https://github.com/douglas-watson/parl-scraping/tree/master/data|final folder for our data is on github]], the [[https://github.com/douglas-watson/parl-scraping/blob/master/data/with-bio-split-csv.zip|.csv files split by session are here]].** |
| |
== Structure of the main JSON (on Giovanni's side, complement with bio data from Jeremie) == | == Structure of the main JSON (on Giovanni's side, complement with the bio data from the Parliament API) == |
| |
list of interventions, with the following fields: | list of interventions, with the following fields: |
* Data: transcript of the intervention | * Data: transcript of the intervention |
* Name: of the person speaking | * Name: of the person speaking |
| |
== Graph data for Yannick == | |
| |
graph.csv: edgelist with Source (bio url as id of person replying to) - Destination (bio url as id of person talking before) - Subject (id of subject under discussion) - Date (of intervention, YY.MM.DD) | |
nodes.csv: nodelist with bio id - name - surname - canton - political group | |
| |
== Structure of the Parliament API data via Yannick == | == Structure of the Parliament API data via Yannick == |
== Structure of the final files from Jérémie == | == Structure of the final files from Jérémie == |
| |
1 JSON file + its .csv counterpart for each Parliament session from 1995. | 1 JSON file + its .csv counterpart for each Parliament session of the National Council from 1995. |
| The same datasets are also available split as one JSON/CSV file per legislative session. |
| |
| == Graph data for Yannick == |
| |
| graph.csv: edgelist with Source (bio url as id of person replying to) - Destination (bio url as id of person talking before) - Subject (id of subject under discussion) - Date (of intervention, YY.MM.DD) |
| nodes.csv: nodelist with bio id - name - surname - canton - political group |
| |
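A minimal sketch of how graph.csv could be read and summarized, using only the Python standard library. It assumes a comma-separated file with a header row named after the four fields listed above (Source, Destination, Subject, Date); neither the delimiter nor the header names are confirmed on this page. The sample rows and ids are invented for illustration only.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Hypothetical sample rows: the header names and column order are assumed
# from the field description above, and the ids are made up.
SAMPLE_GRAPH_CSV = """Source,Destination,Subject,Date
bio/1,bio/2,s100,15.09.05
bio/3,bio/2,s100,15.09.05
bio/1,bio/2,s101,15.09.07
"""


def read_edges(handle):
    """Yield (source, destination, subject, date) tuples from an edge list."""
    for row in csv.DictReader(handle):
        # Dates are described as YY.MM.DD, hence the two-digit year format.
        date = datetime.strptime(row["Date"], "%y.%m.%d").date()
        yield row["Source"], row["Destination"], row["Subject"], date


def reply_counts(edges):
    """Count how often each (replier, previous speaker) pair occurs."""
    return Counter((src, dst) for src, dst, _subject, _date in edges)


edges = list(read_edges(io.StringIO(SAMPLE_GRAPH_CSV)))
counts = reply_counts(edges)
```

To run this against the real file, replace the `io.StringIO(...)` handle with `open("graph.csv", newline="")`; the resulting counts can serve as edge weights for a "who talks to whom" graph.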
=== Results visualization === | === Results visualization === |
| |
- Kibana Dashboard iframe: | * Kibana Dashboard iframe: |
<code><iframe src="http://178.62.236.56:3335/#/dashboard/Swiss-parliament-minutes-scraping?embed&_a=(filters:!(),panels:!((col:1,id:Total-count,row:1,size_x:3,size_y:2,type:visualization),(col:9,id:Who,row:3,size_x:4,size_y:4,type:visualization),(col:1,id:Parties,row:3,size_x:4,size_y:4,type:visualization),(col:4,id:Members,row:1,size_x:9,size_y:2,type:visualization),(col:5,id:County,row:3,size_x:4,size_y:4,type:visualization)),query:(query_string:(analyze_wildcard:!t,query:'*')),title:'Swiss%20parliament%20minutes%20scraping')&_g=()" height="600" width="800"></iframe></code> | <code><iframe src="http://178.62.236.56:3335/#/dashboard/Swiss-parliament-minutes-scraping?embed&_a=(filters:!(),panels:!((col:1,id:Total-count,row:1,size_x:3,size_y:2,type:visualization),(col:9,id:Who,row:3,size_x:4,size_y:4,type:visualization),(col:1,id:Parties,row:3,size_x:4,size_y:4,type:visualization),(col:4,id:Members,row:1,size_x:9,size_y:2,type:visualization),(col:5,id:County,row:3,size_x:4,size_y:4,type:visualization)),query:(query_string:(analyze_wildcard:!t,query:'*')),title:'Swiss%20parliament%20minutes%20scraping')&_g=()" height="600" width="800"></iframe></code> |
| |
- [[http://178.62.236.56:3335/#/dashboard/Swiss-parliament-minutes-scraping?_a=(filters:!(),panels:!((col:1,id:Total-count,row:1,size_x:3,size_y:2,type:visualization),(col:9,id:Who,row:3,size_x:4,size_y:4,type:visualization),(col:1,id:Parties,row:3,size_x:4,size_y:4,type:visualization),(col:4,id:Members,row:1,size_x:9,size_y:2,type:visualization),(col:5,id:County,row:3,size_x:4,size_y:4,type:visualization)),query:(query_string:(analyze_wildcard:!t,query:'*')),title:'Swiss%20parliament%20minutes%20scraping')&_g=()|Example of a Kibana Dashboard]]: | * [[http://178.62.236.56:3335/#/dashboard/Swiss-parliament-minutes-scraping?_a=(filters:!(),panels:!((col:1,id:Total-count,row:1,size_x:3,size_y:2,type:visualization),(col:9,id:Who,row:3,size_x:4,size_y:4,type:visualization),(col:1,id:Parties,row:3,size_x:4,size_y:4,type:visualization),(col:4,id:Members,row:1,size_x:9,size_y:2,type:visualization),(col:5,id:County,row:3,size_x:4,size_y:4,type:visualization)),query:(query_string:(analyze_wildcard:!t,query:'*')),title:'Swiss%20parliament%20minutes%20scraping')&_g=()|Example of a Kibana Dashboard]]: |
<pic sylvain> | <pic sylvain> |
| |
- Example viz graph "who talks to who": | * Example viz graph "who talks to who": |
<pic yannick> | <pic yannick> |
| |
- Semantic distance between members of parliament: <viz pa> | * Semantic distance between members of parliament: <viz pa> |
| |
- A simple gender gap visualization for the current Parliament: <[[https://docs.google.com/spreadsheets/d/1MiO6w331UMGX4vYTyhsMs5uUAgAUCJfyzPkRqAUSjww/edit?usp=sharing|gsheet shalf]]> | |
| |
| * A simple gender gap visualization for the current Parliament that kind of summarizes it all: <[[https://docs.google.com/spreadsheets/d/1MiO6w331UMGX4vYTyhsMs5uUAgAUCJfyzPkRqAUSjww/edit?usp=sharing|gsheet shalf]]> |
===== Team ===== | ===== Team ===== |
* Giovanni Colavizza, [[https://github.com/Giovanni1085|github: Giovanni1085]] | * Giovanni Colavizza, [[https://github.com/Giovanni1085|github: Giovanni1085]] |
* [[http://shalf.me|Yann Heurtaux]] [[https://twitter.com/shalf|@shalf]], [[https://github.com/shalf|github: shalf]] | * [[http://shalf.me|Yann Heurtaux]] [[https://twitter.com/shalf|@shalf]], [[https://github.com/shalf|github: shalf]] |
* Fabrice Hong, [[https://github.com/fabricehong|github: fabricehong]] | * Fabrice Hong, [[https://github.com/fabricehong|github: fabricehong]] |
* Jérémie Knüsel [[https://twitter.com/ambystome|@ambystome]], [[https://github.com/knuessel|github: knuessel]] | * Jan Iwaszkiewicz, [[https://github.com/jan44|github: jan44]] |
| * Jérémie Knüsel [[https://twitter.com/ambystome|@ambystome]], [[https://github.com/knuesel|github: knuesel]] |
* Sylvain Moesching | * Sylvain Moesching |
* [[user:nray|Nicolas Ray]] | * [[user:nray|Nicolas Ray]] |
* [[http://yro.ch|Yannick Rochat]] [[https://twitter.com/yrochat|@yrochat]], [[https://github.com/yrochat|github: yrochat]] | * [[http://yro.ch|Yannick Rochat]] [[https://twitter.com/yrochat|@yrochat]], [[https://github.com/yrochat|github: yrochat]] |
* Douglas Watson, [[https://github.com/douglas-watson|github: douglas-watson]] | * Douglas Watson, [[https://github.com/douglas-watson|github: douglas-watson]] |
* Jan Iwaszkiewicz, [[https://github.com/jan44|github: jan44]] | |
| |
===== Links ===== | ===== Links ===== |