We are trying to analyse historical Tour de France data. We started by checking online resources and extracting data from them. Unfortunately, the data is not easy to extract: the official Tour website offers only very general data, such as the total distance of each Tour and the results. More interesting data, such as stage profiles and stage lengths, is missing.

Ideas we have:

  • It would be very interesting to collect GPS data from the Tour and map all the races. Many services let people upload their bike tracks (e.g. Strava, MapMyRide), and these very likely include stages of at least the most recent Tours. We could probably also get this from the official website…
  • Combine the race data with data on doping or technical enhancements.
  • Various statistics such as nationalities and team tactics (one leader with domestiques, leaders plus sprinters/climbers, etc.).
  • How has the Tour developed? How have the elevation, total distance, distance per stage and number of stages changed? How have team sizes changed?
  • Another interesting challenge would be to get data on individual riders and see how they perform during the race (e.g. TrainingPeaks has data for some riders, including heart rate, power in watts and speed during the race).
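
As a rough illustration of the "how has the Tour developed" idea, the sketch below computes the average stage length per decade. The `Edition` record and its field names are assumptions made for this example, not the layout of any existing dataset, and the sample values are placeholders rather than real Tour figures.

```python
from dataclasses import dataclass


@dataclass
class Edition:
    """One Tour edition. Field names are hypothetical, chosen for this sketch."""
    year: int
    total_km: float
    stages: int


def km_per_stage(edition: Edition) -> float:
    """Average stage length in km for one edition."""
    return edition.total_km / edition.stages


def decade_averages(editions: list[Edition]) -> dict[int, float]:
    """Mean stage length per decade, keyed by the first year of the decade."""
    buckets: dict[int, list[float]] = {}
    for e in editions:
        decade = e.year // 10 * 10
        buckets.setdefault(decade, []).append(km_per_stage(e))
    return {d: sum(v) / len(v) for d, v in sorted(buckets.items())}


if __name__ == "__main__":
    # Placeholder values only, NOT real Tour data.
    sample = [
        Edition(1950, 4000.0, 20),
        Edition(1955, 4400.0, 22),
        Edition(2000, 3000.0, 20),
    ]
    for decade, avg in decade_averages(sample).items():
        print(f"{decade}s: {avg:.1f} km per stage")
```

The same grouping could be reused for elevation, number of stages or team size once those columns are available.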

Besides these sports-related ideas, it would be interesting to correlate external historical events (such as advances in technology or drug abuse in sport) with the trends found in the historical Tour de France data. As a basis for further discussion of these topics, we compiled a dataset by scraping that brings together the average overall performance in the Tour de France over the years with the performance of the winning athlete.

Unfortunately, the data between 1904 and 1914 is incomplete, mainly because a different performance measuring system was applied during the Tour in those years (points vs. time). For the years 1915-1918 and 1940-1946 no data exists because of the First and Second World Wars.
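
When working with the dataset, these gaps need to be handled explicitly so that missing years are not mistaken for zeros. A minimal sketch follows, assuming the year ranges above are inclusive and that each row carries one numeric value for the field average and one for the winner. The function names and the idea of a single numeric "performance" value are assumptions for illustration.

```python
from typing import Optional

# Years without any data according to the text (both world wars), inclusive.
NO_RACE_YEARS = set(range(1915, 1919)) | set(range(1940, 1947))
# Years flagged as incomplete in the text (points vs. time), assumed inclusive.
INCOMPLETE_YEARS = set(range(1904, 1915))


def classify_year(year: int) -> str:
    """Label a year as 'no_race', 'incomplete' or 'ok'."""
    if year in NO_RACE_YEARS:
        return "no_race"
    if year in INCOMPLETE_YEARS:
        return "incomplete"
    return "ok"


def winner_margin(field_avg: Optional[float],
                  winner: Optional[float]) -> Optional[float]:
    """Winner value minus field average, or None if either value is missing."""
    if field_avg is None or winner is None:
        return None
    return winner - field_avg


if __name__ == "__main__":
    for year in (1903, 1910, 1916, 1947):
        print(year, classify_year(year))
```

Returning `None` instead of a number keeps incomplete years out of any averages or trend lines computed later.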

The winner data also includes athletes who were later disqualified, e.g. for doping.

  • should be scraped and parsed
  • used for stages extraction (difficult)
  • one rider's performance

Own compiled dataset

  • project/tour_de_france_history.txt
  • Last modified: 2014/05/24 11:23
  • by dergraf