project:chparlscraping

Is the Swiss parliament really useful? Once elected, what do our councilors talk about? Who is answering whom?

The goal of this project is to answer some of these questions, and many more. To do this, we plan to:

  1. Scrape the parliament website to retrieve the councilors' bios, the topics discussed and the minutes
  2. Structure them by session, intervention (with rank/order), author and text

This should let us perform analyses such as:

  1. keywords: who talks about what, by party, canton and person
  2. person vs. vocabulary
  3. dialogue order
  4. gender
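As an illustration of the first analysis, here is a minimal sketch that counts keyword occurrences per political group. It assumes interventions are dictionaries with the Group and Data fields described in the JSON structure on this page; the function name, the sample records and the naive whitespace tokenization are illustrative assumptions, not the project's actual code.

```python
from collections import Counter, defaultdict


def keyword_counts_by_group(interventions, keywords):
    """Count how often each keyword appears in the transcripts of each group."""
    wanted = {k.lower() for k in keywords}
    counts = defaultdict(Counter)
    for item in interventions:
        # Naive tokenization: split on whitespace, strip common punctuation.
        for word in item["Data"].lower().split():
            word = word.strip(".,;:!?()\"'")
            if word in wanted:
                counts[item["Group"]][word] += 1
    return counts


# Hypothetical sample records, for illustration only.
sample_interventions = [
    {"Group": "A", "Data": "Energy policy matters. Energy!"},
    {"Group": "B", "Data": "The budget and the energy transition."},
]
keyword_counts = keyword_counts_by_group(sample_interventions, ["energy", "budget"])
```

The same grouping works for cantons or people by swapping the Group key for Canton or Surname.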

Our NEW GitHub repository (careful: we had to fork it at the beginning of day 2).

Data

Raw data are available as one single JSON file, and its .csv counterpart. We ran into file size problems, so we are exploring ways to produce several smaller .csv files.
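One possible way to produce several smaller .csv files is to group interventions by session. The sketch below assumes the field names listed in the main JSON structure on this page and uses Session_title as the grouping key, which is our assumption; it returns the CSV text per session instead of writing files, so the output step can be adapted freely. The sample records are made up.

```python
import csv
import io
from collections import defaultdict

# Field names as listed in the structure of the main JSON.
FIELDS = ["Link_subject", "Surname", "Description", "Bio", "Canton",
          "Subject_id", "Date", "Group", "Session_title", "Data", "Name"]


def split_by_session(interventions):
    """Return a dict mapping each session title to the CSV text of its rows."""
    grouped = defaultdict(list)
    for item in interventions:
        grouped[item.get("Session_title", "")].append(item)

    csv_by_title = {}
    for title, rows in grouped.items():
        buffer = io.StringIO()
        # Missing fields are written as empty cells; unknown keys are ignored.
        writer = csv.DictWriter(buffer, fieldnames=FIELDS, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
        csv_by_title[title] = buffer.getvalue()
    return csv_by_title


# Hypothetical sample records, for illustration only.
sample_rows = [
    {"Session_title": "Session A", "Surname": "Muster", "Data": "First speech"},
    {"Session_title": "Session A", "Surname": "Exemple", "Data": "Reply"},
    {"Session_title": "Session B", "Surname": "Muster", "Data": "Another speech"},
]
csv_by_session = split_by_session(sample_rows)
```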

Structure of the main JSON (on Giovanni's side, to be complemented with bio data from Jérémie)

A list of interventions, each with the following fields:

  • Link_subject: link to the page of the subject under discussion (cf. CuriaVista)
  • Surname: of the person speaking
  • Description: of the subject under discussion
  • Bio: link to the page of the person speaking
  • Canton: of the person speaking
  • Subject_id: of the subject under discussion (its typology is still to be understood)
  • Date: of the intervention (DD.MM.YY)
  • Group: political group of the person speaking at the moment of the intervention
  • Session_title: title of the session (Séance)
  • Data: transcript of the intervention
  • Name: of the person speaking
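A minimal sketch of reading such a list and parsing the Date field, assuming the DD.MM.YY format given above. The record below is invented for illustration, and the parsed_date key is a hypothetical addition, not part of the published data.

```python
import json
from datetime import datetime


def parse_intervention_date(text):
    """Parse a DD.MM.YY string into a date object."""
    # With %y, Python maps 69-99 to 1969-1999 and 00-68 to 2000-2068,
    # which covers data starting in 1995.
    return datetime.strptime(text, "%d.%m.%y").date()


# Hypothetical record using the field names listed above.
raw_json = """[
  {"Surname": "Muster", "Name": "Anna", "Canton": "VD", "Group": "X",
   "Date": "05.09.15", "Subject_id": "S1",
   "Session_title": "Example session", "Data": "Example transcript."}
]"""

interventions = json.loads(raw_json)
for item in interventions:
    item["parsed_date"] = parse_intervention_date(item["Date"])
```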

Graph data for Yannick

  • graph.csv: edge list with Source (bio URL as id of the person replying) - Destination (bio URL as id of the person who spoke before) - Subject (id of the subject under discussion) - Date (of the intervention, YY.MM.DD)
  • nodes.csv: node list with bio id - name - surname - canton - political group
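The edge list could be derived from the interventions roughly as follows. This is a sketch under several assumptions: the input list is already in speaking order, a "reply" is simply the next intervention on the same subject, and consecutive interventions by the same person are skipped. The date is converted from DD.MM.YY to YY.MM.DD to follow the two formats stated on this page. The function name and sample records are hypothetical.

```python
from datetime import datetime


def build_edges(interventions):
    """Link each speaker to the previous speaker on the same subject."""
    edges = []
    last_by_subject = {}  # Subject_id -> previous intervention on that subject
    for item in interventions:
        previous = last_by_subject.get(item["Subject_id"])
        # Assumption: a person following themselves is not a reply.
        if previous is not None and previous["Bio"] != item["Bio"]:
            day = datetime.strptime(item["Date"], "%d.%m.%y")
            edges.append({
                "Source": item["Bio"],           # person replying
                "Destination": previous["Bio"],  # person who spoke before
                "Subject": item["Subject_id"],
                "Date": day.strftime("%y.%m.%d"),
            })
        last_by_subject[item["Subject_id"]] = item
    return edges


# Hypothetical sample records, for illustration only.
sample_speeches = [
    {"Bio": "bio/1", "Subject_id": "S1", "Date": "05.09.15"},
    {"Bio": "bio/2", "Subject_id": "S1", "Date": "05.09.15"},
    {"Bio": "bio/3", "Subject_id": "S2", "Date": "06.09.15"},
    {"Bio": "bio/1", "Subject_id": "S1", "Date": "05.09.15"},
]
edges = build_edges(sample_speeches)
```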

Structure of the Parliament API data via Yannick

Fields are: id, cantonName, council, firstName, lastName, party, active, birthDate, gender, language, maritalStatus, militaryGrade, partyId, salutationLetter, workLanguage
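These fields could feed the gender analysis by joining them with the scraped interventions. The sketch below assumes that Name and Surname in the scraped data match firstName and lastName in the API data exactly, which may not hold in practice (accents, compound names, homonyms). The gender values and the records shown are made up for illustration.

```python
from collections import Counter


def interventions_by_gender(interventions, councillors):
    """Count interventions per gender, matching speakers on (first name, last name)."""
    gender_by_name = {
        (c["firstName"], c["lastName"]): c["gender"] for c in councillors
    }
    counts = Counter()
    for item in interventions:
        key = (item["Name"], item["Surname"])
        # Speakers without a match in the API data are counted as "unknown".
        counts[gender_by_name.get(key, "unknown")] += 1
    return counts


# Hypothetical records, for illustration only.
sample_councillors = [
    {"firstName": "Anna", "lastName": "Muster", "gender": "f"},
    {"firstName": "Jean", "lastName": "Exemple", "gender": "m"},
]
sample_talks = [
    {"Name": "Anna", "Surname": "Muster"},
    {"Name": "Anna", "Surname": "Muster"},
    {"Name": "Jean", "Surname": "Exemple"},
    {"Name": "Max", "Surname": "Inconnu"},
]
gender_counts = interventions_by_gender(sample_talks, sample_councillors)
```

Matching on the bio URL or the API id would be more robust if a mapping between the two is available.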

Structure of the final files from Jérémie

One JSON file, plus its .csv counterpart, for each parliament session since 1995.

Results visualization

Kibana Dashboard iframe:

<iframe src="http://178.62.236.56:3335/#/dashboard/Swiss-parliament-minutes-scraping?embed&_a=(filters:!(),panels:!((col:1,id:Total-count,row:1,size_x:3,size_y:2,type:visualization),(col:9,id:Who,row:3,size_x:4,size_y:4,type:visualization),(col:1,id:Parties,row:3,size_x:4,size_y:4,type:visualization),(col:4,id:Members,row:1,size_x:9,size_y:2,type:visualization),(col:5,id:County,row:3,size_x:4,size_y:4,type:visualization)),query:(query_string:(analyze_wildcard:!t,query:'*')),title:'Swiss%20parliament%20minutes%20scraping')&_g=()" height="600" width="800"></iframe>

Example of a Kibana Dashboard: <pic sylvain>

Example viz graph “who talks to who”: <pic yannick>

  • Last modified: 2015/09/05 17:11
  • by shalf