OWF14 - Plenary Session: Ori Pekelman, Founder, Constellation Matrix
1. Ori Pekelman @ Open World Forum 2014
Combining Big Data & Open Source strategy
2. Me
I am an entrepreneur and a consultant. Check out http://platform.sh, which I have been working on a lot lately.
I am the originator and co-organizer of a bunch of meetups, such as the Functional Languages User Group (btw, happening right now…) and the big informal data group we call ParisDataGeeks (with people like Olivier Grisel and Sam Bessalah… and btw, this one happens all day tomorrow!)
On Social Media I am @OriPekelman
OWF 2014
3. Big Data Small Talk
This is a short talk. There won’t be anything overly technical here.
I don’t remember how this got to be the title of the talk…
If you come tomorrow, you will get an incredible bird’s-eye view of current trends in real-time, big, machine-learning-y data applications.
4. Data this Data that
Data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data … data
Everybody loves the data.
6. Data this Data that
Well, are contractions so hard that with 100 petabytes we can’t do some simple Markov chains in the 24th century?
We say “Big Data” so often these days it has become an extremely vague term.
And when we say “Open Data” we get the same form of vagueness. Let’s try to frame this.
7. What applications of big data are we talking about here?
The machine learning kind; everything else is mostly trivial or just a bit of engineering away.
When we say Machine Learning it basically means:
Ridiculous amounts of data
100% proprietary, mostly about intimate human interactions
Software
Mostly Open Source
Model
Mostly Opaque. Mostly Closed. Some with APIs
Robust Predictions
Mostly about human behaviour
8. The ingredients
Data sources
Proprietary and closed (property of whom?)
Proprietary with some APIs
Open with an open license
Software
Proprietary
Free
Stuff to run the software on
Proprietary
Models
Proprietary and closed
Proprietary with some APIs
?
9. Data property
Us as individuals to a very faint degree
Governments
Google
Apple
Credit Card Companies and banks
People we haven’t heard about
10. On the software ingredient
Big Data is predominantly an Open Source game
How much big data software is not prefixed by « Apache »?
11. Laws of data. I like laws.
« Data expands to fill the space available for storage. »
Parkinson’s law applied to data
« Free disk space is always pronounced in percentage, and the percentage is always a single digit »
My father
12. Parkinson’s law
Cloud technologies represent the ultimate phase in the commoditization of storage and computing power.
Capacity becomes limited only by cost (well, at least in theory).
So if we take Parkinson’s law to the letter, data will expand until we have spent humanity’s last dime.
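Taking the slide literally, one rough way to see why cost becomes the only limit is to compound two rates: how fast the data grows and how fast the unit price of storage falls. The sketch below is illustrative only; `projected_costs`, the growth rate and the price-decline rate are hypothetical, not figures from the talk. Whenever `data_growth * (1 - price_decline)` is above 1, the bill grows without bound.

```python
def projected_costs(initial_cost: float, data_growth: float,
                    price_decline: float, years: int) -> list[float]:
    """Yearly storage bill, assuming (hypothetically) that the data volume is
    multiplied by `data_growth` every year while the price per unit of
    storage is multiplied by (1 - price_decline)."""
    yearly_factor = data_growth * (1.0 - price_decline)
    return [initial_cost * yearly_factor ** year for year in range(years + 1)]


if __name__ == "__main__":
    # Hypothetical rates: data doubles every year, storage gets 20% cheaper.
    for year, cost in enumerate(projected_costs(1.0, 2.0, 0.20, 5)):
        print(f"year {year}: {cost:.2f}x today's bill")
```

With these made-up rates the bill grows by a factor of 1.6 per year, so it is roughly ten times larger after five years (1.6^5 ≈ 10.5), even though each gigabyte keeps getting cheaper.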
13. The Cloud, Data and Free Software
The cloud is, at the very least, orthogonal to the basic idea of free software (the libre variety).
Because what makes free software economically possible is that the marginal cost of duplicating code tends to zero.
The marginal cost of duplicating data grows at best linearly with its volume, and because of Parkinson's law, probably faster than that.
This means that in the list of ingredients we noted before, “data” will by nature be mostly proprietary, because its cost is directly linked to that of machines and Moore’s law is of no help.
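The asymmetry can be put in numbers. The sketch below assumes a flat, hypothetical storage price per gigabyte-month (real prices vary by provider, region and tier) and compares the marginal cost of keeping one more full copy of a small code base with one more copy of a 100-petabyte data set. `marginal_copy_cost` and the sizes used are illustrative assumptions.

```python
# Hypothetical flat price in USD per gigabyte per month; real prices differ.
PRICE_PER_GB_MONTH = 0.02


def marginal_copy_cost(size_gb: float,
                       price_per_gb_month: float = PRICE_PER_GB_MONTH) -> float:
    """Monthly storage cost of one additional full copy of an artifact.

    The cost is proportional to the artifact's size, i.e. it grows
    linearly with the volume being duplicated.
    """
    return size_gb * price_per_gb_month


if __name__ == "__main__":
    code_gb = 0.05           # a hypothetical 50 MB source tree
    data_gb = 100 * 10**6    # 100 PB expressed in decimal gigabytes
    print(f"one more copy of the code: ${marginal_copy_cost(code_gb):,.3f}/month")
    print(f"one more copy of the data: ${marginal_copy_cost(data_gb):,.0f}/month")
```

Under these assumptions the extra copy of the code costs about a tenth of a cent per month, while the extra copy of the data costs about two million dollars per month.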
14. Models
Models are better than data
They are less sparse, more dense
They are data reduced
They always give an answer
They are immediately useful
It’s like the Data → Information → Knowledge (→ Wisdom?) hierarchy.
As we noted before, the models we are talking about are mostly opaque; they do not generate wisdom.
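A toy illustration of "data reduced": fitting an ordinary least-squares line turns a thousand (x, y) points into two numbers, and the resulting model returns an answer for any input, including inputs far outside the data it was fitted on. This is a deliberately simple stand-in, not the kind of model the talk has in mind; `LinearModel` and `fit_line` are hypothetical names. Note that the two coefficients reveal nothing about the individual points behind them.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class LinearModel:
    """The whole 'model': two numbers distilled from the training data."""
    slope: float
    intercept: float

    def predict(self, x: float) -> float:
        # Always returns an answer, whatever x is.
        return self.slope * x + self.intercept


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> LinearModel:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences of at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return LinearModel(slope, mean_y - slope * mean_x)


if __name__ == "__main__":
    xs = list(range(1000))
    ys = [2 * x + 1 for x in xs]
    model = fit_line(xs, ys)
    print(model)                 # LinearModel(slope=2.0, intercept=1.0)
    print(model.predict(5000))   # 10001.0
```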
15. Laws of data. I like laws.
«Information wants to be free »
Stewart Brand
Well, this one is less of a law in the physical sense and more of a moral one. We will get back to this at the end.
16. Laws..
“Hybrid data makes all your data big”
I think that's me… but you know, zeitgeist.
Hybrid data denotes “Data Applications” where the data comes from your own internal data sources and either open or proprietary external sources.
Often enough, mixing data sources has a combinatorial effect. Data locality becomes really important.
Using Predictive APIs means building a Hybrid Data application where you only have access to the resulting model.
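A minimal sketch of the combinatorial effect, assuming a plain hash join on a shared key: when a key repeats on both sides, every internal row pairs with every matching external row, so the output can be as large as the product of the inputs. The record shapes (internal `orders`, external hourly `weather` readings, keyed by `city`) are hypothetical. The join also needs both inputs in one place, which is one concrete reason data locality matters.

```python
from collections import defaultdict
from typing import Any, Iterable

Row = dict[str, Any]


def hash_join(internal: Iterable[Row], external: Iterable[Row], key: str) -> list[Row]:
    """Inner-join two collections of rows on `key` using a hash index.

    Each internal row is combined with every external row sharing its key,
    so repeated keys multiply the number of output rows.
    """
    index: dict[Any, list[Row]] = defaultdict(list)
    for row in external:
        index[row[key]].append(row)
    joined: list[Row] = []
    for left in internal:
        for right in index.get(left[key], []):
            joined.append({**right, **left})  # internal fields win on name clashes
    return joined


if __name__ == "__main__":
    # Hypothetical data: 1,000 internal orders and 24 hourly external readings, one city.
    orders = [{"city": "Paris", "order_id": i} for i in range(1000)]
    weather = [{"city": "Paris", "hour": h, "temp_c": 15 + h % 5} for h in range(24)]
    print(len(hash_join(orders, weather, "city")))  # 24000
```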
17. Watson in the mix
ML requires data: the bigger the data, the more robust your models will be.
Open Source mostly commoditizes the algorithmic and software layer; there is not a lot of secret sauce there.
Players with the most data will probably be able to build more robust models
And as basically all “Data Applications” will be Hybrid ones, we will see more and more applications dependent on external, derived, opaque models
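One textbook way to make "more robust" concrete (an illustration under idealized assumptions, not the talk's own derivation) is the standard error of an estimate. For the mean of n independent observations with standard deviation σ:

```latex
\mathrm{SE}(\bar{x}) = \frac{\sigma}{\sqrt{n}},
\qquad
\frac{\mathrm{SE}(\bar{x})\big|_{100n}}{\mathrm{SE}(\bar{x})\big|_{n}}
  = \frac{\sqrt{n}}{\sqrt{100n}} = \frac{1}{10}.
```

Under these assumptions, a player with 100 times more observations estimates the same quantity ten times more precisely; real models are more complex, but the direction of the effect is the same.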
18. Predictive APIs
The "as a service" crowd is becoming the more potent rival to free software
While most of them will run Open Source solutions in any case,
Most of the value will remain proprietary, and these robust models are going to be at least as important as the software
As a company, blindly going into this means you might very well find yourself extremely dependent on others for some of your core operations
Free software alone will not defend you
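One partial hedge against that dependency, sketched under assumptions: put the external predictive API behind a small interface and keep a simpler in-house model as a fallback. `FallbackPredictor` and the callables below are hypothetical placeholders; no real vendor API is implied. `OSError` is caught because Python's `ConnectionError` and `TimeoutError` derive from it; a real client library may raise its own exception types.

```python
from typing import Callable, Sequence

Features = Sequence[float]
Predictor = Callable[[Features], float]


class FallbackPredictor:
    """Call an external (opaque) predictor, falling back to a local model.

    `remote` stands for a hypothetical predictive-API client; `local` is a
    model you own and can inspect, retrain and run on your own machines.
    """

    def __init__(self, remote: Predictor, local: Predictor) -> None:
        self.remote = remote
        self.local = local
        self.fallback_count = 0  # how often the external dependency failed

    def __call__(self, features: Features) -> float:
        try:
            return self.remote(features)
        except OSError:  # includes ConnectionError and TimeoutError
            self.fallback_count += 1
            return self.local(features)


if __name__ == "__main__":
    def unavailable_service(features: Features) -> float:
        # Simulates an outage or a discontinued external service.
        raise ConnectionError("predictive API unreachable")

    def local_average(features: Features) -> float:
        # A deliberately crude in-house model: the mean of the features.
        return sum(features) / len(features)

    predict = FallbackPredictor(unavailable_service, local_average)
    print(predict([1.0, 2.0, 3.0]), predict.fallback_count)  # 2.0 1
```

Counting fallbacks gives a rough measure of how much of the operation actually depends on the external model.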
20. It’s a social issue too
There is a strong ethical reason we want to fight not only for open source but also for open data
The advent of opaque systems with smart algorithms and an extreme amount of data on us (the proprietary data + as-a-service model) is not only going to be bad for our privacy; it's going to have tangible effects on our livelihoods and on our place in society, as it can introduce an extreme form of information asymmetry at a scale not seen before.
In this domain, more than in others, the actors of Free Software need to be especially vigilant and, by working with the other actors of freedom, make sure we are not building the tools of our own demise.
21. Information wants to be free
Well, if you are stuck in the 2000s and do nightly batches, you are probably not managing your own internal data wealth well. So get on it.
Learn about what we can currently do in Machine Learning. Start having a plan.
Don’t hoard the data. Open it, at least to some extent.
Collaborate on the economic and social framework for open data and open models.
Either because you are a government and you have a moral obligation to defend your citizens.
Or because if you become a consumer only you will not be able to manage your dependency on external opaque sources.
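The contrast between nightly batches and processing data as it arrives can be shown with the simplest possible statistic. A batch job recomputes the mean over all accumulated events once a day; the incremental version below updates it on every event in constant time and memory, so the figure is always current. This is a minimal sketch with hypothetical names (`RunningMean`, `batch_mean`); real streaming systems add distribution, fault tolerance and windowing on top.

```python
class RunningMean:
    """Mean of a stream, updated one event at a time with O(1) memory."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0

    def update(self, value: float) -> float:
        self.count += 1
        # Incremental update: move the mean a 1/count step toward the new value.
        self.mean += (value - self.mean) / self.count
        return self.mean


def batch_mean(values: list[float]) -> float:
    """What a nightly batch job would compute over the day's accumulated events."""
    return sum(values) / len(values)


if __name__ == "__main__":
    events = [3.0, 5.0, 10.0]
    stream = RunningMean()
    for event in events:
        print(stream.update(event))  # 3.0, then 4.0, then 6.0
    print(batch_mean(events))        # 6.0
```

Both give the same answer at the end of the day; the streaming version simply has it available after every event.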
22. #ParisDataGeeks
… and come tomorrow starting at 9am for talks such as:
Algebird: algebra for efficient big data processing. Abstract algebra for data mining, by Sam Bessalah (Software Engineer, Independent)
Context Awareness: from NEST to Google Now and IFTTT. In this talk we will go through some of the most successful use cases of context awareness and explain some of the technology behind the pocket brain we are currently building at Snips. By Dr. Rand Hindi
Apache Kafka: distributed publish-subscribe messaging system, by Charly Clairmont (CTO, Altic)
Data encoding and Metadata for Streams, by Jonathan Winandy (Founder, Primatice)
Next Open Source Big Data Suite: a new low-level approach for big data, by Emmanuel Keller (CEO/CTO, OpenSearchServer)
State of the Art in Machine Learning, by Olivier Grisel (Software Engineer, Inria)
Take back control of your web tracking: go further by doing it yourself, by Clément Stenac (CTO, Dataiku)
Real-time energy data analysis with Apache Storm, by Simon Maby (Software Architect, Octo Technology)