project:chparlscraping

project:chparlscraping [2015/09/04 12:10] yrochat
project:chparlscraping [2015/09/06 18:56] – [Swiss parliament minutes scraping] jk
Line 1: Line 1:
-===== Swiss parliament minutes scraping =====+==== Swiss parliament minutes scraping ====
  
-In this project, we are planning to:+[[http://parlement.letemps.ch]]
  
-  - Scrape the parliament website in order to retrieve councilors bio and minutes (with proper structure, explained later). +Is the Swiss parliament really useful? Once elected, what are our councilors talking about? Who is answering whom?
-  - Structure them.  +
-  - Analysis (person vs. vocabulary, dialogue order, gender). +
  
-Our [[https://github.com/fabricehong/parl-scraping|github]].+The goal of this project is to answer some of these questions and many more. To do this, we are planning to:
  
-===== Data =====+  - Scrape the parliament website in order to retrieve the councilors' bios, the topics discussed and the minutes 
 +  - Structure them by session, intervention (with rank/order), author and text
  
-  * Soon !+The aim is to perform analyses such as: 
 +    - who talks about what (based on keywords), by party, canton and person 
 +    - person vs. vocabulary 
 +    - dialogue order 
 +    - gender
  
-===== Team =====+Our NEW [[https://github.com/douglas-watson/parl-scraping|github]] (careful, we had to fork it at the beginning of day 2).
  
-  * Giovanni Colavizza +=== Data === 
-  * Pierre-Alexandre Fonta +Raw data are available as one single JSON file and its .csv counterpart. We ran into file-size problems, so we are exploring ways to produce several .csv files. 
-  * [[user:shalf|Yann Heurtaux]] +**The [[https://github.com/douglas-watson/parl-scraping/tree/master/data|final folder for our data is on github]], the [[https://github.com/douglas-watson/parl-scraping/blob/master/data/with-bio-split-csv.zip|.csv files split by session are here]].**
-  * Fabrice Hong +
-  * Jérémie Knüsel +
-  * Sylvain Moeshing +
-  * Nicolas Ray +
-  * [[user:yrochat|Yannick Rochat]] +
-  Douglas Watson+
  
-===== Links =====+== Structure of the main JSON (Giovanni's side, to be complemented with the bio data from the Parliament API) ==
  
-  * Soon ! +A list of interventions, with the following fields:
-   +
-{{tag>status:concept needs:dev needs:design needs:data needs:expert elections}}+
  
 +  * Link_subject: link to the page of the subject under discussion (cf. CuriaVista)
 +  * Surname: of the person speaking
 +  * Description: of the subject under discussion
 +  * Bio: link to the page of the person speaking
 +  * Canton: canton of origin of the person speaking
 +  * Subject_id: of the subject under discussion (the typology of these ids is still to be understood)
 +  * Date: of the intervention (DD.MM.YY)
 +  * Group: political group of the person speaking at the moment of the intervention
 +  * Session_title: title of the session (Séance)
 +  * Data: transcript of the intervention
 +  * Name: of the person speaking
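
As a minimal sketch of how a list with these fields might be consumed, the Python snippet below parses a small inline sample and counts interventions per political group. The field names come from the list above; the sample records, the placeholder values and the helper names ''parse_date'' and ''count_by_group'' are assumptions made for illustration only, as is reading the two-digit year with ''%y''.

```python
import json
from collections import Counter
from datetime import datetime

# Hypothetical sample records; only the field names come from the wiki page.
SAMPLE = json.loads("""
[
  {"Name": "Anna", "Surname": "Muster", "Group": "Group A", "Canton": "XX",
   "Date": "08.06.15", "Subject_id": "s1", "Session_title": "Session 1",
   "Data": "First intervention."},
  {"Name": "Bruno", "Surname": "Exemple", "Group": "Group B", "Canton": "YY",
   "Date": "08.06.15", "Subject_id": "s1", "Session_title": "Session 1",
   "Data": "A reply."},
  {"Name": "Anna", "Surname": "Muster", "Group": "Group A", "Canton": "XX",
   "Date": "09.06.15", "Subject_id": "s2", "Session_title": "Session 1",
   "Data": "Another intervention."}
]
""")


def parse_date(value):
    """Parse a DD.MM.YY string (the format stated for the Date field)."""
    return datetime.strptime(value, "%d.%m.%y").date()


def count_by_group(interventions):
    """Count interventions per political group."""
    return Counter(item["Group"] for item in interventions)


if __name__ == "__main__":
    print(count_by_group(SAMPLE))
    print(parse_date(SAMPLE[0]["Date"]))
```

The same pattern should extend to counting by ''Canton'' or ''Surname'', assuming every record carries all the fields.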
 +
 +== Structure of the Parliament API data via Yannick ==
 +
 +  * [[http://ws.parlament.ch/councillors/823?format=json|Very rich example fields-wise]] (thus almost complete) for one councillor
 +  * [[http://ws.parlament.ch/councillors|All councillors]]
 +  * [[http://www.parlament.ch/e/dokumentation/webservices-opendata/Documents/webservices-info-dritte-e.pdf|API's doc]]
 +  * [[https://github.com/douglas-watson/parl-scraping/tree/master/biography_retrieval|Code on github]]
 +  * From those data, we build a lightweight file describing the members of parliament ([[https://github.com/douglas-watson/parl-scraping/tree/master/biography_csv_extract|script on github in the same folder]])
 +
 +Fields are: id cantonName council firstName lastName party active birthDate gender language maritalStatus militaryGrade partyId salutationLetter workLanguage
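
A possible way to produce such a lightweight file is sketched below in Python. It is not the script from the repository: it assumes each councillor record is already available as a flat dictionary (the real API response may be nested), and the helper names ''councillor_row'' and ''to_csv'' as well as the sample record are hypothetical. Only the field names are taken from the list above.

```python
import csv
import io

# Field names as listed on the wiki page.
FIELDS = [
    "id", "cantonName", "council", "firstName", "lastName", "party",
    "active", "birthDate", "gender", "language", "maritalStatus",
    "militaryGrade", "partyId", "salutationLetter", "workLanguage",
]


def councillor_row(record):
    """Keep only the listed fields; missing ones become empty strings."""
    return {field: record.get(field, "") for field in FIELDS}


def to_csv(records):
    """Serialize councillor records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow(councillor_row(record))
    return buffer.getvalue()


# Hypothetical record, assumed to be flat; extra keys are dropped.
sample = [{"id": 1, "firstName": "Anna", "lastName": "Muster",
           "gender": "f", "extra": "ignored"}]
output = to_csv(sample)

if __name__ == "__main__":
    print(output)
```

Dropping unknown keys and filling missing ones with empty strings keeps every row the same width, which tends to make the file easier to join with the interventions later.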
 +
 +== Structure of the final files from Jérémie ==
 +
 +1 JSON file + its .csv counterpart for each Parliament session of the National Council from 1995.
 +The same datasets are also available split as one JSON/CSV file per legislative session.
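
Splitting one large list of interventions into one CSV per session could look roughly like the Python sketch below. It assumes the ''Session_title'' field is the grouping key and keeps the result in memory rather than writing files; the function names and sample rows are hypothetical and do not describe the actual scripts.

```python
import csv
import io
from collections import defaultdict


def split_by_session(interventions, key="Session_title"):
    """Group interventions by the value of their session field."""
    groups = defaultdict(list)
    for item in interventions:
        groups[item[key]].append(item)
    return dict(groups)


def session_csv(rows):
    """Serialize one session's rows to CSV text, columns sorted by name."""
    fieldnames = sorted({name for row in rows for name in row})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical rows with only two of the fields, for brevity.
rows = [
    {"Session_title": "Session 1", "Data": "a"},
    {"Session_title": "Session 1", "Data": "b"},
    {"Session_title": "Session 2", "Data": "c"},
]
sessions = split_by_session(rows)

if __name__ == "__main__":
    for title, items in sessions.items():
        print(title)
        print(session_csv(items))
```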
 +
 +== Graph data for Yannick ==
 +
 +graph.csv: edge list with Source (bio URL as id of the person replying) - Destination (bio URL as id of the person who spoke just before) - Subject (id of the subject under discussion) - Date (of the intervention, YY.MM.DD)
 +nodes.csv: node list with bio id - name - surname - canton - political group
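
One way such an edge list could be derived from the interventions is sketched below in Python. It rests on assumptions that the page does not confirm: the interventions are already in speaking order, a "reply" links a speaker to the previous speaker on the same subject, and consecutive interventions by the same person create no edge. The function name ''build_edges'' and the placeholder bio ids are hypothetical, and the date is passed through unchanged rather than converted to YY.MM.DD.

```python
def build_edges(interventions):
    """Link each speaker to the previous speaker on the same subject."""
    edges = []
    last_by_subject = {}  # most recent intervention seen for each subject
    for item in interventions:  # assumed to be in speaking order
        subject = item["Subject_id"]
        speaker = item["Bio"]
        before = last_by_subject.get(subject)
        if before is not None and before["Bio"] != speaker:
            edges.append({
                "Source": speaker,            # person replying
                "Destination": before["Bio"],  # person who spoke before
                "Subject": subject,
                "Date": item["Date"],         # kept as found in the input
            })
        last_by_subject[subject] = item
    return edges


# Hypothetical interventions; "bio/1" and "bio/2" stand in for bio URLs.
talk = [
    {"Bio": "bio/1", "Subject_id": "s1", "Date": "08.06.15"},
    {"Bio": "bio/2", "Subject_id": "s1", "Date": "08.06.15"},
    {"Bio": "bio/1", "Subject_id": "s1", "Date": "08.06.15"},
    {"Bio": "bio/1", "Subject_id": "s2", "Date": "09.06.15"},
]
edges = build_edges(talk)

if __name__ == "__main__":
    for edge in edges:
        print(edge)
```

Tracking the last speaker per subject, rather than globally, avoids linking people who merely spoke one after the other on unrelated subjects.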
 +
 +=== Results visualization ===
 +
 +  * Kibana Dashboard iframe:
 +<code><iframe src="http://178.62.236.56:3335/#/dashboard/Swiss-parliament-minutes-scraping?embed&_a=(filters:!(),panels:!((col:1,id:Total-count,row:1,size_x:3,size_y:2,type:visualization),(col:9,id:Who,row:3,size_x:4,size_y:4,type:visualization),(col:1,id:Parties,row:3,size_x:4,size_y:4,type:visualization),(col:4,id:Members,row:1,size_x:9,size_y:2,type:visualization),(col:5,id:County,row:3,size_x:4,size_y:4,type:visualization)),query:(query_string:(analyze_wildcard:!t,query:'*')),title:'Swiss%20parliament%20minutes%20scraping')&_g=()" height="600" width="800"></iframe></code>
 +
 +  * [[http://178.62.236.56:3335/#/dashboard/Swiss-parliament-minutes-scraping?_a=(filters:!(),panels:!((col:1,id:Total-count,row:1,size_x:3,size_y:2,type:visualization),(col:9,id:Who,row:3,size_x:4,size_y:4,type:visualization),(col:1,id:Parties,row:3,size_x:4,size_y:4,type:visualization),(col:4,id:Members,row:1,size_x:9,size_y:2,type:visualization),(col:5,id:County,row:3,size_x:4,size_y:4,type:visualization)),query:(query_string:(analyze_wildcard:!t,query:'*')),title:'Swiss%20parliament%20minutes%20scraping')&_g=()|Example of a Kibana Dashboard]]:
 +<pic sylvain>
 +
 +  * Example viz graph "who talks to whom":
 +<pic yannick>
 +
 +  * Semantic distance between members of parliament: <viz pa>
 +
 +  * A simple gender gap visualization for the current Parliament that kind of summarizes it all: <[[https://docs.google.com/spreadsheets/d/1MiO6w331UMGX4vYTyhsMs5uUAgAUCJfyzPkRqAUSjww/edit?usp=sharing|gsheet shalf]]>
 +===== Team =====
 +  * Giovanni Colavizza, [[https://github.com/Giovanni1085|github: Giovanni1085]]
 +  * Pierre-Alexandre Fonta [[https://twitter.com/pa_fonta|@pa_fonta]], [[https://github.com/pafonta|github: pafonta]]
 +  * [[http://shalf.me|Yann Heurtaux]] [[https://twitter.com/shalf|@shalf]], [[https://github.com/shalf|github: shalf]]
 +  * Fabrice Hong, [[https://github.com/fabricehong|github: fabricehong]]
 +  * Jérémie Knüsel [[https://twitter.com/ambystome|@ambystome]], [[https://github.com/knuesel|github: knuesel]]
 +  * Sylvain Moesching
 +  * [[user:nray|Nicolas Ray]]
 +  * [[http://yro.ch|Yannick Rochat]] [[https://twitter.com/yrochat|@yrochat]], [[https://github.com/yrochat|github: yrochat]]
 +  * Douglas Watson, [[https://github.com/douglas-watson|github: douglas-watson]]
 +  * Jan Iwaszkiewicz, [[https://github.com/jan44|github: jan44]]
 +
 +===== Links =====
 +  * [[http://www.parlament.ch/ab/frameset/f/index.htm|Minutes of the parliament]]
 +  * [[http://ws.parlament.ch/|Parliament API]]
 +  * [[http://www.parlament.ch/e/dokumentation/webservices-opendata/Documents/webservices-info-dritte-e.pdf|API doc]]
 +  
 +{{tag>elections politics}}
  • project/chparlscraping.txt
  • Last modified: 2015/09/07 16:23
  • by yrochat