project:openfooddna [make.opendata.ch wiki]

The basic idea is to carry out a citizen science project to compile an open dataset of DNA information about food and beverages.

Many more details about the project are available on the dedicated wiki.

Take beer. The assumption is that the DNA content (genetic or genomic information about every single living organism that was present during the brewing) may correlate with the type and taste of the brew.

Somebody else (working for a yeast provider in California) seems to be working on this too. The American press (including the NYT) covered his project, but no data or report is available so far.

Luc Henry, ideator/scientist @heluc
Gianpaolo Rando, scientist @randogp
Soraia Binz, designer @supsi
Antoine Logean, engineer @ecolix

Disclaimer: Luc Henry and Gianpaolo Rando run the BeerDeCoded project at the open laboratory Hackuarium in Renens and came to get input about data analysis.

In this project, the genetic data will initially be qualitative sequencing data. The idea is to transform this sequencing data into binary data recording the presence (1) or absence (0) of each organism in a given beer sample.
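
As a minimal sketch of this transformation, assume the sequencing step yields a table of read counts per organism and per sample. The `counts` matrix and the `min_reads` threshold below are hypothetical illustrations, not values from the project:

```octave
% Hypothetical read counts: rows = beer samples, columns = organisms.
counts = [120   0   3;
            0  45   0;
            7   0 210];
% Assumed detection threshold: fewer reads than this is treated as "absent".
min_reads = 5;
% Binary presence (1) / absence (0) table.
P = double(counts >= min_reads)
```

The choice of threshold is a judgment call: a higher value discards more sequencing noise but may also hide rare organisms.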

Since we do not have data yet, we generated a random dataset using this Octave/Matlab code, which generates a table of random binary data for 10 samples and 30 parameters and calculates the Euclidean distance matrix:

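#!/usr/bin/octave
% rows: beer samples
li = 10;
% columns: species and subspecies
co = 30;
% random binary presence (1) / absence (0) matrix
M = round(rand(li,co))
% pairwise Euclidean distances between samples (rows of M)
O = zeros(li);
for i = 1:li
  for j = 1:i
    O(i,j) = norm( M(i,:)-M(j,:) , 2 ); % Euclidean norm between samples i and j
    O(j,i) = O(i,j);                    % the distance matrix is symmetric
  endfor
endfor
O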

We have not plotted this data yet.

We searched for metadata to add to the generated data and make it look more realistic. A few of our findings:

American beers metadata set contains name

Webpage of CraftBeerAnalytics

Download data here

Starting from a table of m samples (S1-Sm) and an [m x n] matrix of binary data (each of the species D1-Dn is either present or absent in each sample), we can build an [m x m] matrix of Euclidean distances between the samples.
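
The [m x m] matrix can also be computed without loops. The following is a sketch on a small made-up matrix `D` (3 samples, 4 species), using the identity |a-b|^2 = |a|^2 + |b|^2 - 2 a.b. It assumes an Octave version with automatic broadcasting:

```octave
% Example presence/absence matrix: m = 3 samples (rows), n = 4 species (columns).
D = [1 0 1 0;
     1 1 0 0;
     0 0 1 1];
% Squared norm of every sample (m x 1 column vector).
sq = sum(D.^2, 2);
% Squared distances: |a|^2 + |b|^2 - 2*a.b for every pair of rows.
E2 = sq + sq' - 2 * (D * D');
% Clamp tiny negative values caused by rounding, then take the root.
E = sqrt(max(E2, 0))
```

For binary data, the squared distance `E2(i,j)` is simply the number of species in which samples i and j differ, which gives the distances a direct interpretation.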

The metadata entries (M1-Mn) can be attached to this matrix and used to generate a plot that contains sample points with associated characteristics. This plot contains every single sample present in the database.
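
One possible way to turn a distance matrix into plot coordinates is classical multidimensional scaling (MDS). This is a suggestion, not a method the project has settled on. The example matrix `E` below is made up, and the sketch uses only core functions:

```octave
% E: m x m Euclidean distance matrix between samples (example values).
E = [0 3 4;
     3 0 5;
     4 5 0];
m = size(E, 1);
% Double-centre the squared distances to obtain a Gram matrix.
J = eye(m) - ones(m) / m;
B = -0.5 * J * (E.^2) * J;
B = (B + B') / 2;                  % enforce symmetry against rounding
% Eigen-decomposition, largest eigenvalues first.
[V, L] = eig(B);
[lambda, idx] = sort(diag(L), 'descend');
% Keep the two leading components as 2-D coordinates, one row per sample.
XY = V(:, idx(1:2)) * diag(sqrt(max(lambda(1:2), 0)))
```

The coordinates can then be drawn with `plot(XY(:,1), XY(:,2), 'o')`, and a metadata field such as the beer name can be attached to each point with `text`.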

A user-friendly interface has to be built so that beer lovers can choose the beers they know and compare their “genetic diversity” with that of the beers available at the bar where they are. The sidebar should contain a search field (with autocompletion); items can then be added either as “unknown” (beers available but not tasted yet, shown as white circles) or as “favourites” (beers that the user knows). Links show metadata characteristics that beers have in common (same country, same brewery, etc.).

We “built” a prototype interface to sketch how we want to present the data.

We have a main display with three visualisation types to switch between: Plot (displays beer samples based on Euclidean distance), Tree (the same, but as a tree), and Rank (ignores distances and ranks by metadata such as alcohol % or bitterness).
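
The Rank view needs no distance computation at all. As a rough sketch, with hypothetical beer names and alcohol values standing in for real metadata:

```octave
% Hypothetical metadata for four samples.
names = {'Beer A', 'Beer B', 'Beer C', 'Beer D'};
abv   = [5.2, 8.0, 4.5, 6.1];       % alcohol by volume, in %
% Rank view: ignore distances and order samples by one metadata field.
[~, order] = sort(abv, 'descend');
ranked = names(order)
```

Swapping `abv` for another numeric field, such as bitterness, yields a different ranking of the same samples.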

Here is the Plot: