Swiss parliament minutes scraping
Is the Swiss parliament really useful? Once elected, what do our councillors talk about? Who replies to whom?
The goal of this project is to answer some of these questions and many more. To do this, we plan to:
- Scrape the parliament website to retrieve the councillors' bios, the topics discussed and the minutes
- Structure them by session, intervention (with rank/order), author and text
This should allow us to perform analyses such as:
- person vs. vocabulary
- dialogue order
- gender
Our NEW GitHub repository (careful: we had to fork it at the beginning of day 2).
Data
Raw data are available as a single JSON file and its .csv counterpart. We ran into file-size problems, so we are exploring ways to produce several smaller .csv files instead.
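One possible way to work around the size problem is to cut the single JSON file into several numbered CSV files. The sketch below is only an illustration, not the script in the repository: the function name `split_json_to_csv`, the output file names and the default chunk size are assumptions, and it supposes that the JSON is a list of flat records.

```python
import csv
import json
from pathlib import Path


def split_json_to_csv(json_path, out_dir, chunk_size=50000):
    """Write a JSON list of records to numbered CSV files of at most chunk_size rows.

    Assumption: the JSON file holds a list of flat dictionaries (one per intervention).
    """
    with open(json_path, encoding="utf-8") as f:
        interventions = json.load(f)

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Union of all keys, in first-seen order, so that every file shares one header.
    fieldnames = list(dict.fromkeys(key for row in interventions for key in row))

    paths = []
    for start in range(0, len(interventions), chunk_size):
        # Hypothetical naming scheme: interventions_000.csv, interventions_001.csv, ...
        path = out_dir / f"interventions_{start // chunk_size:03d}.csv"
        with open(path, "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
            writer.writeheader()
            writer.writerows(interventions[start:start + chunk_size])
        paths.append(path)
    return paths
```

Splitting by a fixed number of rows is the simplest choice. Splitting by session or by year would be an alternative if each file should stay meaningful on its own.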
Structure of the main JSON (on Giovanni's side, to be complemented with bio data from Jérémie)
A list of interventions, each with the following fields:
- Link_subject: link to the page of the subject under discussion (cf. CuriaVista)
- Surname: of the person speaking
- Description: of the subject under discussion
- Bio: link to the page of the person speaking
- Canton: of provenance
- Subject_id: of the subject under discussion (still to be understood: typology)
- Date: of the intervention (DD.MM.YY)
- Group: political group of the person speaking at the moment of the intervention
- Session_title: title of the session (Séance)
- Data: transcript of the intervention
- Name: of the person speaking
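As an illustration of how these fields could feed the "person vs. vocabulary" analysis, here is a minimal sketch that counts the words used by each speaker. It assumes each intervention is a dictionary with at least the `Bio` and `Data` fields listed above; the function name `vocabulary_by_speaker` and the very naive tokenisation are our own assumptions, not the analysis actually run.

```python
import re
from collections import Counter, defaultdict

# Naive tokeniser: runs of letters/digits; no stop-word removal or stemming.
WORD_RE = re.compile(r"\w+")


def vocabulary_by_speaker(interventions):
    """Map each speaker (keyed by the Bio URL) to a Counter of lower-cased words."""
    vocab = defaultdict(Counter)
    for item in interventions:
        words = WORD_RE.findall(item.get("Data", "").lower())
        vocab[item["Bio"]].update(words)
    return vocab
```

Calling `vocabulary_by_speaker(interventions)[bio_url].most_common(20)` would then give the 20 most frequent words of one councillor. The Bio URL is used as key rather than Name/Surname because it is also the person id used in the graph files.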
Graph data for Yannick
- graph.csv: edge list with Source (bio URL as id of the person replying), Destination (bio URL as id of the person who spoke before), Subject (id of the subject under discussion) and Date (of the intervention, YY.MM.DD)
- nodes.csv: node list with bio id, name, surname, canton and political group
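The edge list could be derived from the interventions roughly as follows. This is a sketch under several assumptions: interventions are already in speaking order, a reply is simply "the next speaker on the same subject", and consecutive interventions by the same person produce no edge. The function names are hypothetical, and the Date value is copied as-is without converting its format.

```python
import csv
import io

EDGE_FIELDS = ["Source", "Destination", "Subject", "Date"]


def build_edges(interventions):
    """Link each speaker to the previous speaker on the same subject."""
    edges = []
    previous = {}  # Subject_id -> Bio URL of the last speaker seen on that subject
    for item in interventions:
        subject, speaker = item["Subject_id"], item["Bio"]
        before = previous.get(subject)
        # Assumption: no self-loops when someone speaks twice in a row.
        if before is not None and before != speaker:
            edges.append({
                "Source": speaker,      # person replying
                "Destination": before,  # person who spoke before
                "Subject": subject,
                "Date": item["Date"],
            })
        previous[subject] = speaker
    return edges


def edges_to_csv(edges):
    """Serialise the edges to CSV text with the graph.csv column names."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=EDGE_FIELDS)
    writer.writeheader()
    writer.writerows(edges)
    return buffer.getvalue()
```

Repeated Source/Destination pairs are kept as separate rows, so edge weights can be computed later by counting them.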
Structure of the Parliament API data via Yannick
- A very field-rich (thus almost complete) example for one councillor
- From those data, we build a light file describing the members of parliament (script on GitHub in the same folder)
Fields are: id, cantonName, council, firstName, lastName, party, active, birthDate, gender, language, maritalStatus, militaryGrade, partyId, salutationLetter, workLanguage
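Reducing a full API record to the light one amounts to keeping only the fields listed above. The sketch below is not the script from the repository: it assumes that a councillor record is a flat dictionary whose keys match the field names, which may not hold if the API nests some values, and `to_light_record` is a hypothetical helper name.

```python
LIGHT_FIELDS = [
    "id", "cantonName", "council", "firstName", "lastName", "party",
    "active", "birthDate", "gender", "language", "maritalStatus",
    "militaryGrade", "partyId", "salutationLetter", "workLanguage",
]


def to_light_record(councillor):
    """Keep only the light fields; fields missing from the input become None."""
    return {field: councillor.get(field) for field in LIGHT_FIELDS}
```

Using `None` for missing fields keeps every light record on the same set of columns, which makes the later join with the scraped bio data simpler.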
Structure of the final files from Jérémie
<to do>
Results visualization
Kibana Dashboard iframe:
<iframe src="http://178.62.236.56:3335/#/dashboard/Swiss-parliament-minutes-scraping?embed&_a=(filters:!(),panels:!((col:1,id:Total-count,row:1,size_x:3,size_y:2,type:visualization),(col:9,id:Who,row:3,size_x:4,size_y:4,type:visualization),(col:1,id:Parties,row:3,size_x:4,size_y:4,type:visualization),(col:4,id:Members,row:1,size_x:9,size_y:2,type:visualization),(col:5,id:County,row:3,size_x:4,size_y:4,type:visualization)),query:(query_string:(analyze_wildcard:!t,query:'*')),title:'Swiss%20parliament%20minutes%20scraping')&_g=()" height="600" width="800"></iframe>
Example of a Kibana Dashboard: <pic sylvain>
Example viz graph: <pic yannick>
Team
- Giovanni Colavizza, github: Giovanni1085
- Pierre-Alexandre Fonta @pa_fonta, github: pafonta
- Fabrice Hong, github: fabricehong
- Jérémie Knüsel @ambystome, github: knuessel
- Sylvain Moesching
- Douglas Watson, github: douglas-watson
- Jan Iwaszkiewicz, github: jan44