2. Taewook Eom
Data Programmer
Plaster (Planet Master) of Big Data Infra
Pre-Assessor of Hiring Programmers
Mentor of 101 Startup Korea
Twitter: @taewooke
LinkedIn: http://kr.linkedin.com/in/taewookeom
http://www.flickr.com/photos/oreillyconf/10616622085/
3. Santa Clara: Technical
New York (with Cloudera): Financial, Business
Europe: Privacy, Government
Boston: Medical
Strata by O’Reilly: http://strataconf.com/
Web 2.0: Open, Sharing, Participation
Big Data: Making Data Work
Change the World with Data.
4. Data
When hardware became commoditized,
software became valuable.
Now that software is being commoditized,
data is valuable.
– Tim O’Reilly, 2011
Data is like the blood of the enterprise.
– Amr Awadallah, CTO at Cloudera, 2013
5. What is Big Data?
All data that is not a fit for a traditional RDBMS,
whether used for OLTP or analytics purposes
Big Data Architectural Patterns
http://strataconf.com/stratany2013/public/schedule/detail/30397
6. Solving 'Big Data' Challenge Involves More Than Just Managing Volumes of Data
- Gartner, 2011
http://blog.vitria.com/Portals/47881/images/3values-resized-600.png
17. Big Data Space
No one tool is the right fit for all Big Data problems
Do not be afraid to recommend the right solution
for the problem over the popular solution
To do this, you must be aware of the entire ecosystem
Big Data Architectural Patterns
http://strataconf.com/stratany2013/public/schedule/detail/30397
18. Practical Performance Analysis and Tuning for Cloudera Impala
http://strataconf.com/stratany2013/public/schedule/detail/30551
19. Hadoop and the Relational Data Warehouse – When to Use Which?
http://strataconf.com/stratany2013/public/schedule/detail/30964
20. Defining your Big Data Arsenal: NoSQL, Hadoop, and RDBMS
http://strataconf.com/stratany2013/public/schedule/detail/29968
21. Ignite
Signal Detection Theory: Man vs Machine
Kyle Redinger, Co-Founder @VividCortex
http://www.youtube.com/watch?v=Fg6mN-jevds (5 minutes 6 seconds)
http://www.slideshare.net/realkyleredinger/man-vs-machine-signal-detection-theory-and-big-data
22. Signal Detection Theory: Man vs Machine
Remove the obvious and look at what is important
Remember: Less is more.
23. Keynote
Towards Strata 2014
Roger Magoulas, Director of market research at O’Reilly Media
http://www.youtube.com/watch?v=Ytd5VkEgQf8 (5 minutes 26 seconds)
http://strataconf.com/stratany2013/public/schedule/detail/31935
http://www.oreilly.com/data/free/files/stratasurvey.pdf
28. Science is fundamentally about data,
but data is not fundamentally about science
Beyond R and Ph.D.s: The Mythology of Data Science Debunked
Douglas Merrill (ZestFinance)
http://www.youtube.com/watch?v=J2sgObXbIWY (8 minutes 9 seconds)
29. People
A data scientist is a data analyst who lives in California.
– George Roumeliotis (Intuit)
31. Data Businessperson: Business person, Leader, Entrepreneur
Data Creative: Artist, Jack-of-All-Trades, Hacker
Data Researcher: Scientist, Researcher, Statistician
Data Engineer: Engineer, Developer
http://datacommunitydc.org/blog/2012/08/data-scientists-survey-results-teaser/
http://cdn.oreillystatic.com/oreilly/radarreport/0636920029014/Analyzing_the_Analyzers.pdf
32. Scientists think they can code,
software engineers think they are scientists.
Team them up so they collaborate.
– Scott Sorenson (Ancestry.com)
Ancestry.com: Managing Big Data Reaching Back to the 11th Century with Hadoop
33. How Nordstrom Utilizes Human Intelligence to Blend Brick-and-Mortar with Online Commerce
http://strataconf.com/stratany2013/public/schedule/detail/30707
34. Data scientists spend their lives as data janitors
instead of leveraging their skills
– Wes McKinney (DataPad)
Building More Productive Data Science and Analytics Workflows
35. Keynote
Is Bigger Really Better?
Predictive Analytics
with Fine-grained Behavior Data
Foster Provost, Professor at the NYU Stern School of Business
http://www.youtube.com/watch?v=1jzMiAfLH2c (10 minutes 16 seconds)
http://strataconf.com/stratany2013/public/schedule/detail/31685
38. Is Bigger Really Better?
Predictive Analytics with Fine-grained Behavior Data
Predictive does not mean actionable.
– Scott Sorenson (Ancestry.com)
Ancestry.com: Managing Big Data Reaching Back to the 11th Century with Hadoop
39. Is Bigger Really Better?
Predictive Analytics with Fine-grained Behavior Data
More data gives you more precision, not more prediction.
Using multiple datasets to reduce errors when measuring values.
- Ravi Iyer (Ranker.com)
Using Graphs of Data to Understand your Users, Customers, and Employees
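The precision-versus-prediction point can be made concrete with textbook formulas. Assume independent observations with standard deviation σ. The standard error of the mean falls as σ/√n, and independent estimates of the same value can be pooled by inverse-variance weighting. The error in predicting a single new observation from that mean, however, is σ√(1 + 1/n), which never drops below σ however large n gets. Below is a minimal Python sketch under those assumptions. The function names are hypothetical, and the two input estimates in the usage lines are made-up numbers, not data from the talk.

```python
import math
from typing import List, Tuple

def standard_error(sigma: float, n: int) -> float:
    """Uncertainty of the sample mean of n independent draws: sigma / sqrt(n)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return sigma / math.sqrt(n)

def prediction_error(sigma: float, n: int) -> float:
    """Std dev of the error when the sample mean is used to predict one new draw.

    The noise in the new draw (sigma^2) does not shrink with n, so this
    approaches sigma, not zero.
    """
    return math.sqrt(sigma ** 2 + standard_error(sigma, n) ** 2)

def inverse_variance_combine(estimates: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Pool independent (value, standard_error) estimates of the same quantity."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    value = sum(w * v for w, (v, _) in zip(weights, estimates)) / total
    # The pooled standard error is no larger than the best single input's.
    return value, math.sqrt(1.0 / total)

if __name__ == "__main__":
    for n in (100, 10_000):
        print(n, standard_error(1.0, n), prediction_error(1.0, n))
    # Two hypothetical datasets measuring the same value.
    print(inverse_variance_combine([(10.2, 0.5), (9.8, 0.5)]))
```

Going from 100 to 10,000 observations cuts the standard error tenfold (0.1 to 0.01), but the prediction error only moves from about 1.005 to about 1.00005.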
42. Keynote
Big Impact from Big Data
Ken Rudin, Head of Analytics at Facebook
http://www.youtube.com/watch?v=RJFwsZwTBgg (11 minutes 57 seconds)
http://strataconf.com/stratany2013/public/schedule/detail/31903
44. Hadoop is a hammer,
but you need other tools along with it.
Designing Your Data-Centric Organization
Josh Klahr (Pivotal)
http://www.youtube.com/watch?v=D86udfrVzrI (12 minutes)
45. Big Impact from Big Data
The way you organize information
depends on the question
you intend to ask of it.
- Richard Saul Wurman
Building a Data Platform
46. HaDump
: Loading data into Hadoop
for no reason.
Data Science Without a Scientist
http://strataconf.com/stratany2013/public/schedule/detail/31801
47. Big Impact from Big Data
Technical people still don't understand the business needs of business people!
Business people don't know what a table is.
- Anurag Tandon (MicroStrategy)
Inject Big Data into your Corporate DNA: Enable Every Employee to Make Data Driven Decisions
48. Ask the Right Questions
Organizations already have people who know their own data
better than mystical data scientists.
Learning Hadoop is easier than learning the company’s business.
- Gartner, 2012
Defining your Big Data Arsenal: NoSQL, Hadoop, and RDBMS
http://strataconf.com/stratany2013/public/schedule/detail/29968
49. Non-linear Storytelling: Towards New Methods and Aesthetics for Data Narrative
http://strataconf.com/stratany2013/public/schedule/detail/30207
50. Every Soldier is a Sensor: Countering Corruption in Afghanistan
http://strataconf.com/stratany2013/public/schedule/detail/30828
54. Value of Data
Usable < Useful < Actionable
with Impact
If you can't answer "so what?",
you only have facts, not insight.
- Baron Schwartz (VividCortex Inc)
Making Big Data Small
Descriptive (Easy): What happened?
Predictive (Medium): What will happen?
Prescriptive (Hard): What should we do about it?
Hadoop & Data Science for the Enterprise
55. The Future of Hadoop: What Happened & What's Possible?
Doug Cutting, Co-Founder of Hadoop
http://www.youtube.com/watch?v=_WwuZI6AhN8 (14 minutes 41 seconds)
http://strataconf.com/stratany2013/public/schedule/detail/31591
Big Data is the first industry that was created
by open source.
- Jack Norris (MapR Technologies)
Separating Hadoop Myths from Reality
Hadoop is the kernel of the OS for data.
56. Hadoop's Impact on the Future of Data Management
Mike Olson (Cloudera)
http://www.youtube.com/watch?v=puHS2JNKgRM
http://strataconf.com/stratany2013/public/schedule/detail/31380
57. Single:
- S/W & H/W system
- security model
- management model
- metadata model
- audit model
- resource management model
Common: storage & schema
http://www.slideshare.net/cloudera/enterprise-data-hub-the-next-big-thing-in-big-data
58. Last generation of data management is not sufficient
More copies, representations, transformations increase risk
Index once and reuse across workloads, lifecycle
NoSQL: indexing and updates for interactive apps
Hadoop: staging, persistence, and analytics
Data Governance for Regulated Industries Using Hadoop
http://strataconf.com/stratany2013/public/schedule/detail/30738
59. Data Intelligence
Rethink How You See Data
Sharmila Shahani-Mulligan (ClearStory Data)
http://www.youtube.com/watch?v=07hGulTOZGk (9 minutes 6 seconds)
http://strataconf.com/stratany2013/public/schedule/detail/31742
60. The Data Availability Problem
Figure: Information Supply Chain (stages shown: Question, Access, Loading, Sampling, Modeling, Analysis & Discovery, Presentation, Insight). Data Prep – too slow!
Introducing a New Way to Interact with Insight
http://strataconf.com/stratany2013/public/schedule/detail/31743
61. Running Non-MapReduce Big Data applications on Apache Hadoop
http://strataconf.com/stratany2013/public/schedule/detail/30755
62. Apache HBase for Architects
http://strataconf.com/stratany2013/public/schedule/detail/30619
What’s Next for Apache HBase: Multi-tenancy, Predictability, and Extensions.
http://strataconf.com/stratany2013/public/schedule/detail/30857
63. Securing the Apache Hadoop Ecosystem
http://strataconf.com/stratany2013/public/schedule/detail/30302
64. An Introduction to the Berkeley Data Analytics Stack With Spark, Spark Streaming, Shark, Tachyon, and BlinkDB
http://strataconf.com/stratany2013/public/schedule/detail/30959
65. Schema
Information does not exist until a schema is defined
and data is stored in a relational database
- anonymous
Building a Data Platform
http://strataconf.com/stratany2013/public/schedule/detail/31400
66. Lessons Learned From A Decade’s Worth of Big Data At The U.S. National Security Agency (NSA)
http://strataconf.com/stratany2013/public/schedule/detail/30913
67. Managing a Rapidly Evolving Analytics Pipeline
http://strataconf.com/stratany2013/public/schedule/detail/30635
68. SQL on/in Hadoop/HBase Solutions
Stinger/Tez
Shark
Perception is Key: Telescopes, Microscopes and Data
http://strataconf.com/strataeu2013/public/schedule/detail/32351
69. All SQL on Hadoop Solutions are
Missing the Point of Hadoop
Every solution makes you define a schema
- SQL (Structured Query Language) is expressed over an assumed schema
Major reasons why Hadoop has taken off include:
- Ability to load data without defining a schema
- Process data using schema-on-read instead of first defining a schema
Hadoop contains a lot of:
- Raw, granular data sets with potentially inconsistent schemas
- Data sets in JSON, key-value, and other self-describing (non-relational) models
designed for schema-on-read processing
SQL on Hadoop solutions that make you first define a schema are missing
a major part of Hadoop’s usage patterns
Flexible Schema and the End of ETL
http://strataconf.com/stratany2013/public/schedule/detail/31868
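Schema-on-read can be sketched in a few lines. Raw JSON records with inconsistent fields are stored as-is, and each query applies its own schema only when it reads them. The minimal Python sketch below makes several assumptions. It assumes newline-delimited JSON. The field names (`user`, `uid`, `plays`, `device`), the `read_with_schema` helper and both query schemas are hypothetical. It illustrates the idea only and is not how any particular SQL-on-Hadoop engine works.

```python
import json
from typing import Any, Callable, Dict, Iterable, Iterator

# Raw, granular records as they might be loaded into Hadoop: no schema was
# declared at load time, and the records disagree on field names and types.
RAW_LINES = [
    '{"user": "alice", "plays": 3}',
    '{"uid": "bob", "plays": "7", "device": "ios"}',
    '{"user": "carol"}',
]

# A schema maps each output column to a function that extracts it from a record.
Schema = Dict[str, Callable[[Dict[str, Any]], Any]]

def read_with_schema(lines: Iterable[str], schema: Schema) -> Iterator[Dict[str, Any]]:
    """Parse each raw line and project it onto the schema at read time."""
    for line in lines:
        record = json.loads(line)
        yield {column: extract(record) for column, extract in schema.items()}

# Hypothetical query 1: reconcile 'user'/'uid' and coerce plays to int.
plays_schema: Schema = {
    "user": lambda r: r.get("user", r.get("uid")),
    "plays": lambda r: int(r.get("plays", 0)),
}

# Hypothetical query 2: a different schema over the same untouched raw data.
device_schema: Schema = {
    "device": lambda r: r.get("device", "unknown"),
}

if __name__ == "__main__":
    for row in read_with_schema(RAW_LINES, plays_schema):
        print(row)
    # {'user': 'alice', 'plays': 3}
    # {'user': 'bob', 'plays': 7}
    # {'user': 'carol', 'plays': 0}
```

Because no schema is fixed at load time, the second query can read the same raw lines with a different projection. A schema-first SQL layer would have required that choice up front.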
71. Hadoop Adventures At Spotify
http://strataconf.com/stratany2013/public/schedule/detail/30570
73. Quick prototyping is the fastest way to internal advocacy. Ship It!
Cloud == Speed
We don’t always need a complicated solution. KISS
Play to your differentiating strengths. Experience >> Data
Bias towards impact.
It Takes a Village
EASE!! (Emulate, Analyze, Scale, Evaluate)
How Nordstrom Utilizes Human Intelligence to Blend Brick-and-Mortar with Online Commerce
http://strataconf.com/stratany2013/public/schedule/detail/30707
Prototyping is key to overcoming resistance to change
Technical architecture is heavily influenced by how people are organized
Developing a team of experienced Hadoop users can often be done
using internal employees
A culture of experimentation and innovation yields the best results
Ancestry.com: Managing Big Data Reaching Back to the 11th Century with Hadoop
http://strataconf.com/stratany2013/public/schedule/detail/30499