David Douglas, CrinLogic
The Big Data headlines are unrelenting, with each passing day seemingly bringing new discoveries, products, partnerships, venture funds, you name it, into the mix. If anything, it is all a bit confusing. Listening to all this, you might conclude that Big Data will solve most of your problems, place your company miles ahead of your competition, drive your Net Promoter Scores through the roof, and fall just short of solving world hunger (ok…maybe not that far).
And one can’t blame you if you think all one needs to do is install the Hadoop ecosystem of projects, conjure up some possible business use cases, throw some commodity hardware into the mix, attend some training, purchase some Big Data analytics software and VOILA, you have arrived and can enjoy the fruits of your Big Data efforts.
Tongue in cheek aside, the reality is vastly different. This talk is partly a reality check on Big Data implementation strategies (starting with Big Data is easy, becoming proficient is hard, and fully integrating it into a broader enterprise data strategy is very hard) and partly an information-sharing session on what we’re learning as we engage with customers in various industries on Big Data. Among other things we will explore: building the business case; software and hardware requirements analysis; selection process and implementation approaches; what tends to work well, not so well, and what to avoid; and how Big Data is likely to affect enterprise data architecture.
David Douglas is a member of Hadoop-DC User Group and is a co-founder of CrinLogic, a Big Data consultancy based in the greater DC area. He has devoted his 17 years of professional experience to helping clients maximize the value of their strategic IT initiatives. Prior to co-founding CrinLogic, David started two other companies. The first was an angel-backed Sales Force Automation software company he sold in 2002 and the second is a consulting services company that focuses on Agile and Lean software adoption and large-scale program implementation services. He helped start the Data Warehousing practice at American Management Systems and was one of the first consultants to join IBM’s Business Intelligence practice.
3. A little Big Data story to start us out ;)
5/1/2012 3 CrinLogic
4. About me
David Douglas is a member of Hadoop-DC and co-founder of CrinLogic. He has
over 17 years of IT consulting experience with concentration in Business
Intelligence, Agile and Lean software development, and large program
implementations. He is a passionate believer in Big Data and the enormous
possibilities it offers.
CrinLogic is a Big Data consulting firm. Our passion for Big Data is surpassed
only by our curiosity and love of learning. We offer full-service Big Data
consulting and training. Visit us at www.CrinLogic.com. We are based in
DC, Chicago, Austin, and Sarajevo.
david@crinlogic.com
443.413.4038
5. This talk is about…
Some things I’d like you to walk away with
1. A big picture perspective on this market
2. What customers are saying
3. Thoughts on developing a business case
4. Learnings
What this talk is not about
1. A technical discussion of Big Data
6. Data for this talk came from
1. Talking with 30 plus companies from all walks
of life and all stages of maturity
2. Talking with colleagues in the Big Data space
(hardware and software vendors)
3. Current customer engagements
4. Research
My biggest surprise is my awareness of how little I know. No one really
understands how to build a Big Data solution. We are all learning as we go.
7. My Perspective on Big Data
• Enterprise Data Architecture
• Iterative
• Rising tide
• Post adopter syndrome
• Failures
• Business problem focused
• Systems Thinking
8. Yes it is big and growing!
14. No generally accepted definition for “Big Data”
“We don’t generate enough data for that”
“Don’t you need at least 100TBs?”
Or they simply think they are already using Big Data
And lest we forget the 3Vs…
Volume, Variety, Velocity
(just a couple pointers on these)
15. So many products and choices promising so much
• Where
• Database
• Open Source
• Hardware
• Analytical Tools
• Network
17. No generally accepted definition for “Data Scientist”
Are they a critical success factor for Big Data Solutions?
[True or False]
Were they a critical success factor to Business
Intelligence solutions?
OSEMN – Obtain, Scrub, Explore, Model, iNterpret
www.dataists.com Hilary Mason & Chris Wiggins
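For readers who have not seen the five steps (Obtain, Scrub, Explore, Model, Interpret) in action, here is a deliberately tiny Python sketch that walks one toy dataset through them. Everything in it is an assumption made for illustration: the function names, the hard-coded readings, and the "more than 10 away from the median" outlier rule. It is not how Mason and Wiggins implement anything; it only shows how the steps hand off to each other.

```python
import statistics


def obtain():
    # Hypothetical raw records; a real source would be a file, API, or log.
    return ["12.5", "13.1", "n/a", "12.9", "", "40.0", "13.4"]


def scrub(raw):
    # Keep only records that parse as numbers; drop the rest.
    cleaned = []
    for record in raw:
        try:
            cleaned.append(float(record))
        except ValueError:
            continue
    return cleaned


def explore(values):
    # Basic summary statistics to get a feel for the data.
    return {
        "count": len(values),
        "median": statistics.median(values),
        "max": max(values),
    }


def model(values, threshold=10.0):
    # Toy "model" (an assumption): flag values far from the median.
    median = statistics.median(values)
    return [v for v in values if abs(v - median) > threshold]


def interpret(summary, outliers):
    # Turn the numbers into a statement a person can act on.
    return (f"{summary['count']} usable readings, median {summary['median']}; "
            f"{len(outliers)} outlier(s): {outliers}")


values = scrub(obtain())
summary = explore(values)
outliers = model(values)
print(interpret(summary, outliers))
```

Even at this scale, most of the code is obtaining and scrubbing rather than modeling, which matches what customers tend to report.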
18. So tell me the why please?
19. McKinsey’s 5 Value Propositions
1. Make information transparent and usable more
readily
2. Expose variability and enable performance
improvement
3. Better customer segmentations
4. Advanced analytics for better decision making
5. New products
20. Not seeing the Big Analytics Piece
In fact, many of the companies employing Big Data that we’ve
talked to or are working with are not doing Big Data
analytics
21. Tactical versus Strategic
Tactical solves an immediate pain point
•Batch jobs taking too long
•Reaching limit of scalability on current infrastructure
•Budget was reduced recently but still have to deliver
•New project ‘just so happens’ to need this newer technology
Strategic treats Big Data as a strategic direction, which is much harder
•Seeking competitive differentiation
•Creating actual solutions with value
23. Figure out the Business Case
Congratulations! The CEO of a large Financial Services firm has
asked you and your team to map out the company’s Big Data
Strategy so he can present to the board. He is known for being
thorough. Now get to work!!
Of the below choices, which is the best first step?
a. Scour the Internet for Big Data use case success stories for Financial Services and then go talk to VPs in that area
b. Build a virtual cluster on your machine, open a direct link to the Twitter firehose, and show the CEO what the community is saying about him in real time
c. Phone a friend (or CrinLogic)
d. Build relationships, interview all areas of the company, research the market, and consolidate the results [but time-box it to a couple of weeks]
Goal is to identify the most appropriate areas to start…high reward…high visibility
24. This can be helpful…
Sample Large Financial Institution: process map
Develop Business Strategy
• Establish Strategic Imperatives
• Develop Marketing Strategy
• Develop Market Innovations
Acquire Customers
• Develop Card Acquisition Offers
• Develop Acquisition Campaigns
• Identify Prospects
• Solicit Prospects & Promote Offers
• Decision Applications & Book Accounts
• Fulfill on Decisions
Manage Customer Relationship
• Develop Account Management Offers & Policies
• Design Account Management Campaigns
• Identify Customers/Targets
• Communicate Offers/Changes
• Decision Response/Request
• Fulfill on Offers/Changes
Service Customers
• Define Customer Experience
• Develop Servicing Strategies
• Provide Customer Service
• Maintain Accounts
• Process Credit Card Transactions
Manage Delinquencies, Recoveries & Fraud
• Develop Collections Strategies
• Develop Recoveries Strategies
• Develop Fraud Strategies
• Collect on Delinquent Accounts
• Recover Charged Off Accounts
• Detect and Recover Fraud
Manage Finance & Accounting
• Manage Accounting and Reporting
• Manage Treasury
• Manage Planning & Analysis
• Manage Line of Business
• Manage Credit Risk
Supporting processes
• Manage Correspondence
• Manage Funds Disbursements
• Manage Rewards
• Manage Human Resources (Manage HR)
• Manage Information Technology (Manage IT Operations)
• Manage Regulatory Affairs & Compliance (Manage External Compliance)
25. Identify Big Data Impact Areas
[Figure: the same process map as on the previous slide, with each process shaded by expected Big Data impact]
Legend: No Impact, Low Impact, Moderate Impact, High Impact
26. Naturally! Fraud & Recoveries
Manage Delinquencies, Recoveries & Fraud
Develop Collections Strategies
• Analyze Collections Strategies
• Maintain Collections Systems
Develop Recoveries Strategies
Develop Fraud Strategies
• Research Fraud Strategies
• Design/Test Fraud Strategies
• Implement Fraud Strategies
Collect on Delinquent Accounts
• Determine Collections Strategy
• Enter Collections
• Exit Collections Strategy
• Fulfill Collections Strategy
• Monitor Commitments
• Service Collections Account
Recover Charged Off Accounts
• Charge Off Bad Debt
• Process Bankruptcies
• Process Estates
• Process Recoveries Payments
Detect and Recover Fraud
• Detect Fraud
• Decision Identity Fraud
• Decision Transaction Fraud
• Recover Fraud
Legend: No Impact, Low Impact, Moderate Impact, High Impact
27. Be ready to answer these questions
1. Do they currently have an analytics group?
2. Do they make decisions based on data?
3. Do they have data center management skills?
4. Do they have stringent regulatory requirements?
5. What are the current sources of data?
6. What other sources of data are of interest?
7. What are their KPIs?
8. What is the maturity of their enterprise data architecture?
9. What is the maturity of their business intelligence initiative(s)?
10. Others?
28. Predictive Analytics Maturity Model
[Figure: Predictive Analytics Maturity Model, with maturity levels running from 1 (lowest) to 5 (highest)]
Stage labels, highest to lowest: Holy Grail; Analytical Master (Institutionalized Analytics); Analytical Practitioner (in-house insights team); Analytics Amateur (some BI); Localized Analytics (some sales drilldown); Analytics Laggard (really basic analytics)
Capabilities shown, highest to lowest maturity:
• Full picture/context optimization: Global Analytics Suite (Supply Chain/PnP/Mix/Media …) with integrated workflow (ERP …), e.g. Supply Chain Simulator, PnP Simulator, Mix Simulator, Media Buy, […]
• Optimization; actionable, implementable insights; forecasting/full simulation
• Threshold-based insights; preemptive suggestions; forward-looking DSS insights
• Automated insights; automated insight decks
• Specialized/targeted analytics products: Pricing and Promotions, Marketing Mix, Segmentation, Consumerization, Assortment, Churn/Attrition, Supply Chain, […]. The current market is fragmented, overlapping, and highly specialized; many players often produce Excel-based tools
• Full BI suites with some data mining/analytics (mostly built-in): Oracle Suite, Microsoft Suite, SAS, IBM, Spotfire, […]
• BI tools/reporting engine: MS Office tools, BI reporting tools, internal attempts
Other labels: Data/Software; Decision Sciences; MAP 4.0 Product Features
Model courtesy of Dhiraj Rajaram, CEO Mu Sigma, and Joseph de Castelnau, SVP Engineering Nielsen
29. Opportunity Areas
[Figure: bubble chart of opportunity areas. Horizontal axis: IT Strategic Value (Low to High). Vertical axis: Business Strategic Value (Low to High). Size of bubble = Est. Effort]
Callout: Highest benefits are most likely realized when building these products or features
Sources: “Measuring the Business Value of Information Technology”, Intel Press
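One way to turn a chart like this into a ranked shortlist is to score each candidate on the two value axes and divide by estimated effort. The Python sketch below does that. It is only an illustration: the `Opportunity` class, the `priority` formula, the 1 to 5 scales, and every score are assumptions, and the three candidate names are hypothetical examples borrowed from themes elsewhere in this talk rather than customer data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Opportunity:
    name: str
    business_value: int  # vertical axis, 1 (low) to 5 (high)
    it_value: int        # horizontal axis, 1 (low) to 5 (high)
    effort: int          # bubble size, 1 (small) to 5 (large)


def priority(opp: Opportunity) -> float:
    # Assumed scoring rule: combined value per unit of effort.
    return (opp.business_value + opp.it_value) / opp.effort


# Hypothetical candidates with made-up scores.
candidates = [
    Opportunity("Twitter sentiment demo", business_value=1, it_value=2, effort=4),
    Opportunity("Batch job offload", business_value=2, it_value=3, effort=2),
    Opportunity("Fraud detection", business_value=5, it_value=4, effort=3),
]

ranked = sorted(candidates, key=priority, reverse=True)
for opp in ranked:
    print(f"{opp.name}: {priority(opp):.2f}")
```

With these made-up scores, the high-value, moderate-effort candidate comes out on top and the low-value, high-effort one lands last. The formula matters less than forcing an explicit score for each axis.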
30. Opportunity Areas
[Figure: bubble chart of opportunity areas. Horizontal axis: IT Strategic Alignment (Low to High). Vertical axis: Business Strategic Alignment (Low to High). Size of bubble = Est. Effort]
Callout: So why do these get built?
Sources: “Measuring the Business Value of Information Technology”, Intel Press
32. Implementation Approach
• Big Data does not lend itself to a Big Bang approach (does anything,
really?)
• Proofs of concept make perfect sense for gaining traction (a top-down push is
preferable to a federated one)
• As with any effort with such potential, ensure appropriate oversight by a
combination of IT and business executives
Other considerations
• Form a central team with key skills in building Big Data solutions. This consulting
team should help train, mentor, and provide consulting expertise to new initiatives,
which helps ensure consistency in approach.
• There is a price of entry…each new participating area should bring resources to the
table
• Encourage building a community of analytics junkies and support them…the goal is
information sharing and building a Big Data culture
• Preference for consolidation
34. 2009/2010 H/W Recommendations
• 4 x 1TB hard drives
• 2 x Quad-core CPUs, each 2.0-2.5GHz
• 16GB RAM
• Gigabit Ethernet
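To get a feel for what a node spec like this means at cluster scale, here is a minimal Python sketch that estimates usable HDFS capacity. Only the per-node figures come from the slide. The `NodeSpec` class, the `usable_tb` function, the 10-node cluster size, and the 25% reserve for intermediate data are assumptions made for illustration; the replication factor of 3 is the HDFS default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    drives: int      # data drives per node
    drive_tb: float  # capacity per drive, in TB
    ram_gb: int
    cores: int


def usable_tb(spec: NodeSpec, nodes: int, replication: int = 3,
              reserve_fraction: float = 0.25) -> float:
    """Rough usable capacity in TB for a cluster of identical nodes.

    replication=3 is the HDFS default. reserve_fraction is an assumed
    allowance for intermediate and temporary data, not a figure from the talk.
    """
    raw_tb = spec.drives * spec.drive_tb * nodes
    return raw_tb * (1.0 - reserve_fraction) / replication


# Per-node figures from the slide: 4 x 1TB drives, 2 x quad-core CPUs, 16GB RAM.
spec_2009 = NodeSpec(drives=4, drive_tb=1.0, ram_gb=16, cores=8)

# Hypothetical 10-node cluster: 40 TB raw.
print(usable_tb(spec_2009, nodes=10))  # 10.0
```

Under these assumptions, 40 TB of raw disk holds roughly 10 TB of unique data, which is worth keeping in mind when someone quotes raw capacity.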
35. Commodity Hardware Today
1U, approx $4K
• 4-6 x 2TB SATA drives
• 1 or 2 socket, 1 x 6 or 2 x 6 cores
• 4GB core memory
• 24+ GB RAM
2U, approx $6K; 10Gb Ethernet? ($7K)
• 12 x 2TB SATA drives
• 1 or 2 socket, 1 x 6 or 2 x 6 cores
• 4-8GB core memory
• 24+ GB RAM
4U, approx $12K
• 20 or 36 x 2TB SATA drives; 80 x 3TB (???)
• 1 or 2 socket, 1 x 6 or 2 x 6 cores
• 4-8GB core memory
36. Thoughts on Storage TCO
• *Price != Cost and TCA is < 20% of TCO
• $ per TB not an exact science
*David Merrill, Hitachi Data Systems Chief Economist, “Storage Economics: Four Principles for Reducing Total Cost of Ownership” July, 2011
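A quick worked example may help make both bullets concrete. If acquisition cost (TCA) is under 20% of TCO, then TCO is more than five times the purchase price. And the $ per TB figure moves by a factor of three depending on whether you divide by raw disk or by replicated capacity. The Python sketch below uses the approximate 2U node price and drive count from the commodity hardware slide. The function names and the 3x replication factor are assumptions, and applying a storage-economics rule of thumb to a Hadoop node is itself an assumption.

```python
def min_tco(acquisition_cost: float, tca_percent: int = 20) -> float:
    """Lower bound on TCO when acquisition is at most tca_percent of it."""
    if not 0 < tca_percent <= 100:
        raise ValueError("tca_percent must be in (0, 100]")
    return acquisition_cost * 100 / tca_percent


def cost_per_tb(price: float, drives: int, drive_tb: int,
                replication: int = 1) -> float:
    """Price per TB; replication > 1 gives the price per TB of unique data."""
    return price * replication / (drives * drive_tb)


node_price = 6_000  # approx price of the 2U node with 12 x 2TB drives

print(min_tco(node_price))                              # 30000.0
print(cost_per_tb(node_price, drives=12, drive_tb=2))   # 250.0
print(cost_per_tb(node_price, drives=12, drive_tb=2, replication=3))  # 750.0
```

So a $6K node implies a lifetime cost above $30K, and the same hardware can honestly be quoted at $250 or $750 per TB, which is why $ per TB is not an exact science.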
37. ‘New to Big Data’ learnings
• A lot of this is not intuitive…e.g. MapReduce, columnar-based DBs, and we live in an RDBMS world
• Iterative process…if you go in claiming you ‘know’ the use case you want to solve, you are in for a surprise
• Open Source or ‘Free’ is not a big selling point for the larger companies
• It takes a wealth of skills not resident in a single person…understanding of the batch MapReduce framework, networks, grid computing, analytics, subject matter experts, and more
38. Some key learnings from early adopters
• Don’t forget operations…in 2010 Facebook had between 400-500 operations professionals…on par with the entire engineering organization
Source: http://framethink.wordpress.com/2011/01/17/how-facebook-ships-code/
• Be ready to embrace emergent solutions and emergent architecture…and an emergent support base within the company
• Vendor support still needs to catch up…not the level of support companies are used to from established technology vendors
39. Thinking about workloads…latency
Solutions are generally a mix of different paradigms
[Figure: latency spectrum running from High Latency (1 hour plus) to Low Latency (real-time), with a “Start here!” callout]
40. Other Random Learnings
• For many companies, there will likely be a cultural
change required to become good at Big Data
analytics
• Customers have major concerns about security and
cloud
• Don’t tell a risk officer that Hadoop’s replication
framework mitigates need for disaster recovery
• How about you all…any Random Learnings you want
to share?
41. Big Data Analytics Learnings
“Analytics is the act of taking Big Data streams and human-sizing
them for our small data brains.”
Source: http://www.dataspora.com/2010/05/new-tools-for-big-data/#more-182
There are no turnkey solutions in analytics space
(efforts underway to make big data analytics accessible to the non-Data Scientist)
43. Musings
• Chief Data Officer || Chief Data Scientist
• Just because you can retain all this data, does it
mean you should?
• Big Data and virtualization
44. Thinking about Starting a Big Data solution
1. Big Data strategy assessment?
2. Go small (success breeds success)
3. Let RT and near RT come to you…don’t start there
4. Ensure you have the right skills (or bring them in)
5. If only R&D focus then upside may be
limited…business needs to have a seat at the table
Consider hiring a professional Big Data consulting firm to
help in the transition!
45. Some good resources for you
Blogs
Databases and Data Infrastructure
http://www.dbms2.com
http://dbmsmusings.blogspot.com
http://databeta.wordpress.com
http://blogs.gartner.com/donald-feinberg
http://itmarketstrategy.com/
Big Data Analytics
http://hunch.net/
http://ml.typepad.com/
www.dataists.com
http://www.dataspora.com/blog/
http://blog.data-miners.com/
http://www.visualcomplexity.com/vc/blog/
Videos
http://www.youtube.com/watch?v=SS27F-hYWfU&feature=relmfu
http://www.youtube.com/watch?v=2FpO7w6X41I
http://www.youtube.com/watch?v=OmlX3IHb0JE
http://www.youtube.com/watch?src_vid=UaGINWPK068&annotation_id=annotation_65559&v=XAuwAHWpzPc&feature=iv
http://www.youtube.com/watch?v=eUcej07dGu4
http://www.youtube.com/watch?v=viPRny0nq3o