project:chparlscraping

Differences

This shows you the differences between two versions of the page.

project:chparlscraping [2015/09/04 12:25] yrochatproject:chparlscraping [2015/09/07 16:23] (current) yrochat
Line 1: Line 1:
-===== Swiss parliament minutes scraping =====+==== Swiss parliament minutes scraping ====
  
-In this project, we are planning to:+[[http://parlement.letemps.ch]]
  
-  - Scrape the parliament website in order to retrieve  +Is the Swiss parliament really useful? Once elected, what are our councilors talking about? Who is answering whom?
-    - councilors bio  +
-    - minutes +
-  - Structure them:  +
-    - session +
-    - intervention (with rank/order) +
-    - author +
-    - text +
-  - Analysis  +
-      - person vs. vocabulary +
-      - dialogue order +
-      - gender+
  
-Our [[https://github.com/fabricehong/parl-scraping|github]].+The goal of this project is to answer some of these questions and many more. To do this, we plan to:
  
-===== Data =====+  - Scrape the parliament website in order to retrieve councilors' bios, topics discussed and minutes 
 +  - Structure them by session, intervention (with rank/order), author and text
  
-  * Soon !+In order to perform analyses such as: 
 +    - from keywords, who talks about what, by party, canton and person 
 +    - person vs. vocabulary 
 +    - dialogue order 
 +    - gender
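
The keyword analysis listed above could be sketched roughly as follows. This is only an illustration, not the project's code: it assumes each intervention is a dict carrying the `Group` and `Data` fields described in the Data section, and the function name `keyword_counts_by_group` and the sample records are made up.

```python
import re
from collections import Counter, defaultdict


def keyword_counts_by_group(interventions, keywords):
    """Count how often each keyword occurs in the transcripts of each group."""
    wanted = {keyword.lower() for keyword in keywords}
    counts = defaultdict(Counter)
    for item in interventions:
        # Naive tokenisation on word characters; real transcripts would
        # probably need language-aware processing (assumption).
        for word in re.findall(r"\w+", item["Data"].lower()):
            if word in wanted:
                counts[item["Group"]][word] += 1
    return counts


# Made-up records that only mirror the field names of the main JSON.
sample_texts = [
    {"Group": "G1", "Data": "Energy policy and energy costs"},
    {"Group": "G2", "Data": "Transport and energy"},
]
keyword_counts = keyword_counts_by_group(sample_texts, ["energy", "transport"])
```

Replacing `Group` with `Canton` or with the speaker's name gives the same breakdown by canton or by person.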
  
-===== Team =====+Our NEW [[https://github.com/douglas-watson/parl-scraping|github]] (careful, we had to fork it at the beginning of day 2).
  
-  * Giovanni Colavizza +=== Data === 
-  * Pierre-Alexandre Fonta +Raw data are available as a single JSON file and its .csv counterpart. Because of size problems, we are exploring ways to produce several .csv files. 
-  * [[user:shalf|Yann Heurtaux]] +**The [[https://github.com/douglas-watson/parl-scraping/tree/master/data|final folder for our data is on github]], the [[https://github.com/douglas-watson/parl-scraping/blob/master/data/with-bio-split-csv.zip|.csv files split by session are here]].** 
-  * Fabrice Hong + 
-  * Jérémie Knüsel +== Structure of the main JSON (on Giovanni's side, complement with the bio data from the Parliament API) == 
-  * Sylvain Moeshing+ 
 +list of interventions, with the following fields: 
 + 
 +  * Link_subject: link to the page of the subject under discussion (cf. CuriaVista) 
 +  * Surname: of the person speaking 
 +  * Description: of the subject under discussion 
 +  * Bio: link to the page of the person speaking 
 +  * Canton: of provenance 
 +  * Subject_id: of the subject under discussion (still to be understood: typology) 
 +  * Date: of the intervention (DD.MM.YY) 
 +  * Group: political group of the person speaking at the moment of the intervention 
 +  * Session_title: title of the session (Séance) 
 +  * Data: transcript of the intervention 
 +  * Name: of the person speaking 
 + 
 +== Structure of the Parliament API data via Yannick == 
 + 
 +  * [[http://ws.parlament.ch/councillors/823?format=json|Very rich example fields-wise]] (thus almost complete) for one councillor 
 +  * [[http://ws.parlament.ch/councillors|All councillors]] 
 +  * [[http://www.parlament.ch/e/dokumentation/webservices-opendata/Documents/webservices-info-dritte-e.pdf|API's doc]] 
 +  * [[https://github.com/douglas-watson/parl-scraping/tree/master/biography_retrieval|Code on github]] 
 +  * From those data, we build a lightweight file describing members of parliament ([[https://github.com/douglas-watson/parl-scraping/tree/master/biography_csv_extract|script on github in the same folder]]) 
 + 
 +Fields are: id cantonName council firstName lastName party active birthDate gender language maritalStatus militaryGrade partyId salutationLetter workLanguage 
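
As an illustration only (the actual script is the one linked above), a lightweight CSV with these fields could be written like this. It assumes each councillor record has already been fetched and flattened into a dict keyed by the field names; the function name `councillors_to_csv` and the sample record are hypothetical.

```python
import csv
import io

# Field names copied from the list above.
FIELDS = [
    "id", "cantonName", "council", "firstName", "lastName", "party",
    "active", "birthDate", "gender", "language", "maritalStatus",
    "militaryGrade", "partyId", "salutationLetter", "workLanguage",
]


def councillors_to_csv(records, out):
    """Write one row per councillor, keeping only the fields listed above."""
    writer = csv.DictWriter(
        out,
        fieldnames=FIELDS,
        extrasaction="ignore",  # drop any other keys present in the record
        restval="",             # leave missing fields empty
    )
    writer.writeheader()
    for record in records:
        writer.writerow(record)


# Made-up record; "extra" stands in for the richer fields of a full record.
buffer = io.StringIO()
councillors_to_csv([{"id": 1, "firstName": "Anna", "extra": "ignored"}], buffer)
csv_lines = buffer.getvalue().splitlines()
```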
 + 
 +== Structure of the final files from Jérémie == 
 + 
 +1 JSON file + its .csv counterpart for each Parliament session of the National Council from 1995. 
 +The same datasets are also available split as one JSON/CSV file per legislative session. 
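
Splitting the interventions into one group per session could look roughly like the sketch below. It assumes that `Session_title` from the main JSON identifies a session, which may not match how the published files were actually produced; `split_by_session` is a made-up name.

```python
from collections import defaultdict


def split_by_session(interventions):
    """Group interventions by their Session_title, preserving input order."""
    sessions = defaultdict(list)
    for item in interventions:
        sessions[item["Session_title"]].append(item)
    return dict(sessions)


# Made-up records that only mirror the field names of the main JSON.
sample_sessions = [
    {"Session_title": "Session A", "Surname": "Muster"},
    {"Session_title": "Session B", "Surname": "Exemple"},
    {"Session_title": "Session A", "Surname": "Exemple"},
]
by_session = split_by_session(sample_sessions)
```

Each value in `by_session` can then be written to its own JSON or .csv file.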
 + 
 +== Graph data for Yannick == 
 + 
 +graph.csv: edge list with Source (bio URL as id of the person replying) - Destination (bio URL as id of the person who spoke before) - Subject (id of subject under discussion) - Date (of intervention, YY.MM.DD) 
 +nodes.csv: node list with bio id - name - surname - canton - political group 
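
One way the edge list described above might be derived from the interventions, as a sketch under assumptions: the input is already sorted in speaking order, a reply is taken to be the next intervention on the same `Subject_id`, and consecutive interventions by the same person are skipped. None of these choices is confirmed by the project, and `build_edges` is a hypothetical name.

```python
def build_edges(interventions):
    """Link each speaker to the previous speaker on the same subject."""
    edges = []
    previous_by_subject = {}
    for item in interventions:
        subject = item["Subject_id"]
        previous = previous_by_subject.get(subject)
        # Assumption: a speaker following themselves is not a reply.
        if previous is not None and previous["Bio"] != item["Bio"]:
            edges.append({
                "Source": item["Bio"],           # person replying
                "Destination": previous["Bio"],  # person who spoke before
                "Subject": subject,
                "Date": item["Date"],            # copied as stored
            })
        previous_by_subject[subject] = item
    return edges


# Made-up records; the Bio values stand in for biography page URLs.
sample_debate = [
    {"Bio": "bio/1", "Subject_id": "s1", "Date": "07.09.15"},
    {"Bio": "bio/2", "Subject_id": "s1", "Date": "07.09.15"},
    {"Bio": "bio/2", "Subject_id": "s1", "Date": "07.09.15"},
    {"Bio": "bio/3", "Subject_id": "s2", "Date": "07.09.15"},
    {"Bio": "bio/1", "Subject_id": "s1", "Date": "07.09.15"},
]
edges = build_edges(sample_debate)
```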
 + 
 +=== Results visualization === 
 + 
 +  * Kibana Dashboard iframe: 
 +<code><iframe src="http://178.62.236.56:3335/#/dashboard/Swiss-parliament-minutes-scraping?embed&_a=(filters:!(),panels:!((col:1,id:Total-count,row:1,size_x:3,size_y:2,type:visualization),(col:9,id:Who,row:3,size_x:4,size_y:4,type:visualization),(col:1,id:Parties,row:3,size_x:4,size_y:4,type:visualization),(col:4,id:Members,row:1,size_x:9,size_y:2,type:visualization),(col:5,id:County,row:3,size_x:4,size_y:4,type:visualization)),query:(query_string:(analyze_wildcard:!t,query:'*')),title:'Swiss%20parliament%20minutes%20scraping')&_g=()" height="600" width="800"></iframe></code> 
 + 
 +  * [[http://178.62.236.56:3335/#/dashboard/Swiss-parliament-minutes-scraping?_a=(filters:!(),panels:!((col:1,id:Total-count,row:1,size_x:3,size_y:2,type:visualization),(col:9,id:Who,row:3,size_x:4,size_y:4,type:visualization),(col:1,id:Parties,row:3,size_x:4,size_y:4,type:visualization),(col:4,id:Members,row:1,size_x:9,size_y:2,type:visualization),(col:5,id:County,row:3,size_x:4,size_y:4,type:visualization)),query:(query_string:(analyze_wildcard:!t,query:'*')),title:'Swiss%20parliament%20minutes%20scraping')&_g=()|Example of a Kibana Dashboard]]: 
 +<pic sylvain> 
 + 
 +  * Example viz graph "who talks to whom": 
 +<pic yannick> 
 + 
 +  * Semantic distance between members of parliament: <viz pa> 
 + 
 +  * A simple gender gap visualization for the current Parliament that kind of summarizes it all: <[[https://docs.google.com/spreadsheets/d/1MiO6w331UMGX4vYTyhsMs5uUAgAUCJfyzPkRqAUSjww/edit?usp=sharing|gsheet shalf]]> 
 +===== Team ===== 
 +  * Giovanni Colavizza, [[https://github.com/Giovanni1085|github: Giovanni1085]] 
 +  * Pierre-Alexandre Fonta [[https://twitter.com/pa_fonta|@pa_fonta]], [[https://github.com/pafonta|github: pafonta]] 
 +  * [[http://shalf.me|Yann Heurtaux]] [[https://twitter.com/shalf|@shalf]], [[https://github.com/shalf|github: shalf]] 
 +  * Fabrice Hong, [[https://github.com/fabricehong|github: fabricehong]] 
 +  * Jan Iwaszkiewicz, [[https://github.com/jan44|github: jan44]] 
 +  * Jérémie Knüsel [[https://twitter.com/ambystome|@ambystome]], [[https://github.com/knuesel|github: knuesel]] 
 +  * Sylvain Moesching
   * [[user:nray|Nicolas Ray]]   * [[user:nray|Nicolas Ray]]
-  * [[user:yrochat|Yannick Rochat]] +  * [[http://yro.ch|Yannick Rochat]] [[https://twitter.com/yrochat|@yrochat]], [[https://github.com/yrochat|github: yrochat]] 
-  * Douglas Watson+  * Douglas Watson, [[https://github.com/douglas-watson|github: douglas-watson]]
  
-===== Links ===== 
  
 +===== Links =====
   * [[http://www.parlament.ch/ab/frameset/f/index.htm|Minutes of the parliament]]   * [[http://www.parlament.ch/ab/frameset/f/index.htm|Minutes of the parliament]]
   * [[http://ws.parlament.ch/|Parliament API]]   * [[http://ws.parlament.ch/|Parliament API]]
   * [[http://www.parlament.ch/e/dokumentation/webservices-opendata/Documents/webservices-info-dritte-e.pdf|API doc]]   * [[http://www.parlament.ch/e/dokumentation/webservices-opendata/Documents/webservices-info-dritte-e.pdf|API doc]]
      
-{{tag>elections}}+{{tag>elections politics}}
  • project/chparlscraping.1441362342.txt.gz
  • Last modified: 2015/09/04 12:25
  • by yrochat