The Five Markers on Your Big Data Journey
- 1. 1© Cloudera, Inc. All rights reserved.
Your customer’s journey to success AND
Your journey to quicker and larger expansions
Making Data Real
- 2.
What we are going to discuss here
1. How advanced analytics is better than data warehousing
2. Being data-driven is a journey, not a project
3. Our most successful customers do five key things
- 3.
Innovation around the world driven by data
• Protect lives: Using data about bets per second and machine learning to promote responsible gambling by customising offers to minimise the customer's vulnerability.
• Drive customer insights: A "smart business" application for small businesses that enables them to see patterns in anonymised data generated by the bank's other customers.
• Connect products & services (IoT): Analyzing acoustic data coming from turbines in real time to monitor their health and predict turbine failures in hydro power stations.
- 4.
Advanced analytics is better than data warehousing
Build your data asset economically and at scale
1. Collect data in native format – enables agility
2. Build history by collecting data prior to its use
Securely share on-prem, in cloud, anywhere
3. Security at the data layer increases flexibility and ability to protect privacy
4. Create community data and drive innovation by sharing across your business
Innovate with analytics and operationalize the insights
5. Analyze data in near real-time
6. Build and deploy machine learning models and other advanced analytics
7. Deliver insights via enterprise, mobile and web applications
- 5.
Think Big.
Start small.
Iterate to success.
Being data-driven is a journey, not a project.
- 6.
Customers ask: what does ‘think big’ mean?
Determine your strategic initiatives.
Read your annual report.
Define a reasonable timeframe and goals.
Typically 3-5 strategic initiatives run in parallel.
They are often segmented by business unit.
At maturity, initiatives cross business units.
- 7.
Customers ask: what does ‘start small’ mean?
Embrace the familiar.
Enhance something you know.
Make a report better with more data.
Then go get a shiny object (new data).
Bring in and integrate.
Showcase your results with a visualization.
- 8.
Customers ask: what does ‘iterate often’ mean?
Break strategic initiatives into quarterly objectives.
Break your quarterly objectives into sprints.
Deliver and visualize outcomes at every sprint exit.
Continuously learn and adapt.
Outcomes can be both positive and negative.
Outcomes can be about the business or about the data.
- 9.
Our Most Successful Customers iterate these Five Things
1. Build a Big Data Culture
Led by an enabled executive sponsor(s). Communication methodologies. Advocating change.
2. Assemble the right team
Tightly aligned team. Mix of seasoned experts and innovators
3. Adopt an agile approach for data engineering, data science, analysis
Successful projects start small, are hypothesis-driven, and iterate to success.
Roadmaps: Document expected direction, yet expect insights to create change
4. Efficiently operationalize insights
Analytics -> Reports, Big Data -> Actions. Create a bridge between Dev and Ops
5. Rightsize your data governance
Rightsize, and iteratively build towards maturity.
- 10.
1. Build a data-driven culture
Description: Foster a culture that iteratively and continuously builds a strong sharing community. Over time, enable many in the organization to become evangelists.
Executive Sponsorship: An executive sponsor for the overall Big Data mission, including advocacy for creating/collecting data, and business stakeholders for individual use cases. Align to strategic initiatives.
Community: Build community through communications about vision, insights, data, platform and technology. Make communications more programmatic across the entire organization with meetups, big data days and hackathons.
Visualizations: Visualize EVERYTHING! Use visualizations to tell stories about the data asset itself (how big it is, how fast it is growing) as well as about the insights found for the business.
- 11.
Logical Information Architecture
An environment that supports new ways of working
Zones: Ingestion Zone, Discovery Zone, Integrated Zone, Production
• Ingestion Zone: The new data engineering team partners with data stewards to ingest full-fidelity raw data.
• Discovery Zone: Trusted power users have broad access and new tools.
• From discovery to production: Continuous deployment concepts move data, models, etc. from the exploratory environment to production.
• Production: Business users have narrowed access with traditional BI tools and applications.
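The access pattern above (broad access for trusted power users, narrowed access for business users) can be sketched as a deny-by-default, role-to-zone grant table. This is a minimal Python sketch under stated assumptions: the role names (`data_engineer`, `power_user`, `business_user`) and the exact grants are hypothetical, and in a real deployment access would be enforced by the platform's security layer rather than by application code.

```python
from enum import Enum


class Zone(Enum):
    """The four zones named on the slide, in pipeline order."""
    INGESTION = "ingestion"
    DISCOVERY = "discovery"
    INTEGRATED = "integrated"
    PRODUCTION = "production"


# Hypothetical role-to-zone grants (an assumption, not from the deck):
# data engineers work across every zone, trusted power users get broad access
# beyond raw ingestion, and business users only see production.
ZONE_ACCESS: dict[str, frozenset[Zone]] = {
    "data_engineer": frozenset(Zone),
    "power_user": frozenset({Zone.DISCOVERY, Zone.INTEGRATED, Zone.PRODUCTION}),
    "business_user": frozenset({Zone.PRODUCTION}),
}


def can_read(role: str, zone: Zone) -> bool:
    """Return True if `role` may read `zone`; unknown roles are denied by default."""
    return zone in ZONE_ACCESS.get(role, frozenset())


def visible_zones(role: str) -> list[str]:
    """List the zones a role can read, in pipeline order."""
    return [z.value for z in Zone if can_read(role, z)]
```

For example, `visible_zones("business_user")` returns only `["production"]`, while an unrecognized role sees nothing. Denying by default keeps a missing grant from silently exposing raw data.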
- 12.
Logical Information Architecture
An environment that supports new ways of working
[Diagram: data moves across the Ingestion Zone, Discovery Zone, Integrated Zone and Production, from Raw to Trusted, through the stages Ingest, Validation & Verification, Enrichment, Transform and Routing.]
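The stages in the diagram can be sketched as a small pipeline in which records land raw, are validated, enriched and transformed, become trusted, and are then routed. This is a minimal Python sketch, not a Cloudera implementation: the `Record` shape, the `customer_id`/`amount` fields, the region lookup used for enrichment, and the routing rule are all hypothetical illustrations.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical record: a payload plus the zone state it is currently in."""
    payload: dict
    zone: str = "raw"


def ingest(rows: list[dict]) -> list[Record]:
    # Ingest: land each row in its native, full-fidelity form (a copy, unchanged).
    return [Record(payload=dict(r)) for r in rows]


def validate(rec: Record, required: tuple[str, ...]) -> bool:
    # Validation & verification: required fields must be present and non-empty.
    return all(rec.payload.get(k) not in (None, "") for k in required)


def enrich(rec: Record, regions: dict[str, str]) -> Record:
    # Enrichment: join reference data (a hypothetical customer -> region lookup).
    rec.payload["region"] = regions.get(rec.payload["customer_id"], "unknown")
    return rec


def transform(rec: Record) -> Record:
    # Transform: normalise types, then mark the record as trusted.
    rec.payload["amount"] = float(rec.payload["amount"])
    rec.zone = "trusted"
    return rec


def route(rec: Record) -> str:
    # Routing: a hypothetical rule sending fully enriched records to production
    # and the rest to the discovery zone for further exploration.
    return "production" if rec.payload["region"] != "unknown" else "discovery"


def run_pipeline(rows, regions, required=("customer_id", "amount")):
    out: dict[str, list[Record]] = {"production": [], "discovery": [], "quarantine": []}
    for rec in ingest(rows):
        if not validate(rec, required):
            out["quarantine"].append(rec)  # stays raw so data stewards can inspect it
            continue
        rec = transform(enrich(rec, regions))
        out[route(rec)].append(rec)
    return out
```

Keeping failed records raw in a quarantine bucket, instead of dropping them, reflects the slide's emphasis on collecting full-fidelity data before its use.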
- 13.
2. Assemble the right team
[Diagram: team roles: Executive; Architecture & Operations; Data Engineering; Data Science, SQL & app development; Vision and Goals]
- 14.
The Important Role of the Executive Sponsor
Description: An essential key to success is a strong executive sponsor for the overall Big Data mission, including advocacy for creating/collecting data, and business stakeholders for individual use cases.
Profile: An executive focused on change and willing to take risks to ensure the success of the business through the Big Data initiatives.
Education: Use every opportunity to bring the topic in front of potential sponsors and stakeholders. Share potential ROI models for the industry and the business (heeding the warning not to overstate).
Advocacy: Build big data success stories from within the business. Advocate for the use of data in new ways. Support the proactive collection of data and lead the charge to assign value to data.
- 15.
Your Infrastructure Team & Architect
Description: Hadoop and the Big Data technology ecosystems change rapidly, so infrastructure architecture is a critical component of your team. Architects need to balance tactical and strategic needs.
Communication: The software and hardware infrastructure is often physically operated by a different group, so the architect needs close collaboration with that group.
Education: Continually explore new technologies, including third-party tools; architects need to stay ahead of the curve. Training (admin, developer) is essential.
Leadership: Be the infrastructure expert and advise on new projects and new requirements from the data management team and the business. Know when to call in the experts on Hadoop and Big Data.
- 16.
Your Data Engineering Team
Description: Data is only useful if users can employ it in a meaningful way. Data engineers have to be committed to making your company’s data the utmost strategic asset, from acquisition to advocacy.
Communication: Document, secure and audit the data. Create simple schemas and search indexes for each data set. Create common profiles, and continually advocate for new and improved data.
Education: Get trained and certified through the Cloudera Administrator, Developer and Data Analyst courses, and become an expert with Navigator and Sentry.
Leadership: Promote and evangelize to educate others on the value of Big Data, and take the lead on data governance. Love the data.
- 17.
Your Data Scientist Team(s)
[Diagram: Data Science at the intersection of Math & Statistical Knowledge, Hacking skills and Subject Matter Expertise, with Curiosity as an important character trait.]
The hybrid data scientist
• Subject Matter Expertise lies in the business.
• Hacking skills can come from existing IT staff or new hires.
• Staff at least one true Ph.D. statistician for model oversight across all teams.
A luxury is finding one or more data scientists who cross these disciplines.
- 18.
Staff for Success: Data Science-as-a-Service
Description: Often a centralized Data Science team can partner with the business to identify data that differentiates, explore use cases to solve, and help jumpstart business teams. Be mindful not to overbuild centrally.
Agility: The team must be able to learn quickly and adapt.
Skills: Hybrid skills of computer science (hacking), domain expertise and at least one true statistician. Data Science training.
Teams: Businesses often find the domain expertise in-house, add MS/Ph.D. candidates from local universities, and hire that one true statistician.
Experts: This team must be the “data experts” for the entire company in order to fulfil the vision of sharing data for maximum innovation.
- 19.
3. Adopt an agile approach to data engineering & science
Description: Agile methodology provides actionable results more rapidly and measures the value gained at each step, in small iterations. Agile should be applied to both the data and the insights project workstreams.
Lower risk: The risk of funding long-running projects with limited business value is small. Use daily results to improve the process or change course.
Lower costs: Infrastructure, data and insights workstreams can run in parallel. This avoids a large build-out of infrastructure and data before any insights are delivered.
Communication: Clear short-term results enable a continuous stream of communications showcasing results or failures.
Team: You can start with a small team and add scrum teams as value is determined and investment becomes available.
- 20.
Agile Methodology Enables Iterative Workstreams
[Diagram: parallel workstreams for Use Case Development (app development, data science), Data Engineering, and Data Governance & Common Profile Development, alongside the EDH Buildout.]
- 21.
Agile Methodology Enables Iterative Workstreams
[Diagram: three parallel workstreams (Agile Use Case Development, Agile Data Ingestion/Management, and Agile Data Governance & Common Profile Development), each with multiple scrum teams delivering successive releases (Release 1 to Release 4) that reach production-ready status, alongside the EDH Buildout.]
- 22.
4. Efficiently operationalize your insights
Description: Apply DevOps concepts from the world of application/web development to the world of data and analytics, both to managing the data that needs to move to production and to the models used to create insights from that data.
DevOps: Operationalizing data and insights from analytics is more like digital and mobile development cycles than traditional ERP or RDBMS applications.
Communication: Start by finding the people who understand both sides of the agile development and IT deployment cycle, and bridge the communication gap between the two sides.
Initially, most DevOps processes were manual: making sure web code was unit and functionally tested, ensuring source code was under version control, etc.
Continuous Delivery: Move towards automating the processes needed to move new analytical code, models and data from development into production. Eventually reach complete automation of those processes, allowing continuous deployment of new analytical models and data into production.
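One automated step in such a continuous-delivery process is a promotion gate that decides, without a manual sign-off, whether a newly trained model may replace the one in production. This is a minimal Python sketch under stated assumptions: the `ModelArtifact` fields, the `holdout_auc` metric, the in-memory `registry` dictionary and the specific checks are hypothetical choices for illustration, not a Cloudera or DevOps-tool API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelArtifact:
    """Hypothetical packaged model: name, version, offline metric and expected input columns."""
    name: str
    version: int
    holdout_auc: float
    input_schema: tuple[str, ...]


def promotion_checks(candidate: ModelArtifact, current: Optional[ModelArtifact],
                     prod_schema: tuple[str, ...], min_gain: float = 0.0) -> list[str]:
    """Return the failed checks; an empty list means the candidate may be promoted."""
    failures = []
    if candidate.input_schema != prod_schema:
        failures.append("input schema does not match production data")
    if current is not None:
        if candidate.version <= current.version:
            failures.append("version must increase")
        if candidate.holdout_auc < current.holdout_auc + min_gain:
            failures.append("no improvement over current production model")
    return failures


def promote(registry: dict[str, ModelArtifact], candidate: ModelArtifact,
            prod_schema: tuple[str, ...]) -> list[str]:
    """Automated gate: replace the production model only if every check passes."""
    failures = promotion_checks(candidate, registry.get(candidate.name), prod_schema)
    if not failures:
        registry[candidate.name] = candidate
    return failures
```

With an empty registry, a first model passes every check and is promoted; a later version with a lower `holdout_auc` is rejected and the current model stays in production. Returning the list of failed checks, rather than a bare boolean, gives the team a concrete reason to communicate at each sprint exit.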
- 23.
5. Rightsize Your Big Data Governance
• Data Stewards: Owners and/or creators of the data. Responsibilities: providing knowledge about the data (e.g. privacy and use-case concerns); documenting and improving the raw data, with a focus on link-ability.
• Data Engineers: Implement the data governance policies. Responsibilities: defining and driving the governance; organizing and hosting the Governance Council; delivering and using tools (e.g. Navigator) to enforce governance.
• Data Governance Council: Business owners of the data governance. Responsibilities: communicating about and enforcing data governance; assigning data steward roles; improving the link-ability of data.
- 24.
Rightsize your data governance: Iterate to maturity
1. Initial – Chaos: “We don’t know what’s in our data hub”
2. Managed – CYA: Basic governance artifact capture
3. Standardized – Self-service: Data curation automation
4. Measured – Automation: Data stewardship and lifecycle automation
5. Optimized – Continuous improvement: Ongoing optimization
- 25.
Data can make what is
impossible today,
possible tomorrow
- 26.
Changing our relationship with the
products and services we consume
- 27.
Improving reliability, quality,
& sustainability