Network Reconstruction: The Contribution of Challenges in Machine Learning: Networks of influence are found at all levels of physical, biological, and societal systems: climate networks, gene networks, neural networks, and social networks are a few examples. These networks are not just descriptive of the “State of Nature”, they allow us to make predictions such as forecasting disruptive weather patterns, evaluating the possible effect of a drug, locating the focus of a neural seizure, and predicting the propagation of epidemics. This, in turns, allows us to device adequate interventions or change in policies to obtain desired outcomes: evacuate people before a region is hit by a hurricane, administer treatment, vaccinate, etc. But knowing the network structure is a prerequisite, and this structure may be very hard and costly to obtain with traditional means. For example, the medical community relies on clinical trials, which cost millions of dollars; the neuroscience community engages in connection tracing with election microscopy, which take years before establishing the connectivity of 100 neurons (the brain contains billions).
This presentation will review recent progresses that have been made in network reconstruction methods based solely on observational data. Great advances have been recently made using machine learning. We will analyze the results of several challenges we organized, which point us to new simple and practical methodologies.
2. 2
Motivation
BIG data makes lots of BIG promises, but…
… will the promises be held?
DIFFICULTY
VALUE
Classical statistics Machine learning
What
happened?
How
happened?
Explicative
power
Forecasting
power
http://chalearn.org/
Decisional
power
What will
happen?
11. 11
“Please test your
researchers for ten
years: Randomly pick
half of them and give
them chocolate for
desert and give apples
to the other half. Then
compare the number
of Nobel prizes in the
two populations.”
Perform randomized
controlled experiments http://chalearn.org/
13. 13
• Pioneer work: Glymour, Scheines, Spirtes, Pearl (Turing
Award, 2011), Rubin, in the US, since the 80’s.
• New wave: Hyvärinen, Schölkopf, Bühlmann in the EU.
• Nobel prizes in econometrics: Haavelmo (1989),
Granger (2003), Sargent and Sims (2011).
• DARPA programs: Big mechanisms (2014), upcoming
program (Schwartz, program manager).
Landmark work http://chalearn.org/
15. 15
To make a long story short… http://chalearn.org/
1. Discovering dependencies: easiest = classical
feature selection. Hard to beat!
2. Removing spurious dependencies: harder and
“dangerous” because removing good features
is more harmful than keeping bad ones.
3. Orienting dependencies: hardest.
16. 16
Cause-effect pair challenge
(2013) http://chalearn.org/
Initial impulse: Joris Mooij, Dominik Janzing, and Bernhard Schölkopf.
Examples of algorithms and data: Povilas Daniušis, Arthur Gretton,
Patrik O. Hoyer, Dominik Janzing, Antti Kerminen, Joris Mooij, Jonas
Peters, Bernhard Schölkopf, Shohei Shimizu, Oliver Stegle, and Kun
Zhang, Jakob Zscheischler.
Datasets and result analysis: Isabelle Guyon + Mehreen Saeed +
{Mikael Henaff, Sisi Ma, and Alexander Statnikov}, from NYU.
Website and sample code: Isabelle Guyon +
Phase 1: Ben Hamner (Kaggle) https://www.kaggle.com/c/cause-
effect-pairs
Phase 2: Ivan Judson, Christophe Poulain, Evelyne Viegas,
Michael Zyskowski https://www.codalab.org/competitions/1381
Review, testing: Marc Boullé,
Hugo Jair Escalante, Frederick Eberhardt,
Seth Flaxman, Patrik Hoyer,
Dominik Janzing, Richard Kennaway,
Vincent Lemaire, Joris Mooij,
Jonas Peters, Florin Popescu, Peter Spirtes,
Ioannis Tsamardinos, Jianxin Yin, Kun Zhang.
Mehreen
Evelyne
Joris Dominik
Bernhard
Kun
Ben
Alexander
Marc
20. 20
A B A B
Best fit: A B http://chalearn.org/
21. 21
The data:
A
B
Z
A B
B
A
Z
A <- B
A B
Z
ZBZA
A Z B
A B A | B
Demographics:
Sex Height
Age Wages
Country Education
Latitude Infant mortality
Ecology:
City elevation Temperature
Water level Algal frequency
Elevation Vegetation
Dist. to hydrology Fire
Econometrics:
Mileage Car resell price
Num.rooms House price
Trade price last day Trade price
Medicine:
Cancer vol. Recurrence
Metastasis Prognosis
Age Blood pressure
Genomics (mRNA level):
transcription factor protein
induced
Engineering:
Car model year Horsepower
Number of cylinders MPG
Cache memory Compute power
Roof area Heating load
Cement used Compressive strength
20% 80%
http://chalearn.org/
24. 24
Neural connectomics
Challenge (2014) http://chalearn.org/
Coordinator:
Isabelle Guyon
Data Providers:
Demian Battaglia
Javier Orlandi
Jordi Soriano Fradera
Olav Stetter
Advisors:
Gavin Cawley
Gideon Dror
Hugo-Jair Escalante
Alice Guyon
Vincent Lemaire
Sisi Ma
Eric Peskin
Florin Popescu
Bisakha Ray,
Mehreen Saeed
Alexander Statnikov
Demian
Olav
Jordi
Javier
Bisakha
Mehreen