Think Big, A Teradata Company
Rick Farnell, Co-Founder & SVP International
Open Data Platform Support
© 2015 Think Big, a Teradata Company
What we learned over 5 years at Think Big
• Teamwork is Critical
• Skills Matter
• Celebrate Success
Our hunch was right…this market will be BIG
Source: www.Indeed.com April 8, 2015
http://wikibon.org/wiki/v/Big_Data_Vendor_Revenue_and_Market_Forecast_2013-2017
Who is Think Big?
• 100% Big Data focus
• Founded in 2010, with over 100 engagements across 70 clients
• Unlocks client value from big data with data science and data engineering services
• Proven vendor-neutral open source integration expertise
• Agile, team-based development methodology
• Think Big Academy for skills and organizational development
• Global delivery model: on-site services with nearshore and offshore support
Think Big Services Engagement Model
Think Big offers end-to-end Big Data strategy, implementation, and support services focused on helping customers quickly achieve ROI on their Big Data investments.
Engagement phases: Strategy, Implementation, Solution, Support
Services: Big Data Roadmap, Establish Data Lake, Analytic Solutions, Managed Services, Data Lake Optimization
Enterprise Data Lake Software Frameworks
5 Years of Big Data Services with Industry Leaders
Think Big Clients
• eCommerce: 2 of Global Top 5
• Retail: 2 of Global Top 5
• Social Networking: Global #1
• Banking: 4 of Global Top 10
• Credit Issuer: 2 of Global Top 5
• Financial Data Services: 2 of Global Top 5
• Financial Exchanges: Global #2
• Brokerage & Mutual Funds: 2 of Global Top 5
• Asset Management: Global #1
• Semiconductor: 2 of Global Top 5
• Data Storage Devices: 3 of Global Top 5
• Disk Drive Manufacturing: Global #1
• Telecommunications: 2 of Global Top 5
• Media & Advertising: 2 of Global Top 4
• Internet Transaction Security: Global #1
Top 5 Learnings over 5 Years in Big Data
1. Big Data is a journey, not an event.
2. Open source is great for innovation but requires engineering sophistication in design patterns and best practices to continually scale analytics in production.
3. Metadata, lineage, data management, data governance, and data quality are critical to successful enterprise Big Data deployments.
4. A new data platform will not resolve organizational problems.
5. The Hadoop and open source ecosystem is moving at the speed of light, so you must be agile: test, build, learn, repeat.
Think Big Enterprise Data Lake Case Study:
Global Manufacturing Client
Manufacturing Client Overview
Metrics:
• Collecting >2M manufacturing/testing binary files daily across Americas and APAC Facilities
• Collecting from ~500 tables across 6 databases: tens of millions of records daily
• Over 140 analytics users to date
• Over 150 attendees participated in Big Data Platform training
Business Goals for Investment in Enterprise Data Lake:
• Provide Enterprise-wide data access for timely analytics to product quality engineers
• Establish foundation for large scale proactive manufacturing and quality analytics
• Reduce $Millions in costs of scrap waste
• Reduce $Millions in costs of containment processes
• Reduce $Millions in costs of data search parties
• Increase revenue and market share by accelerating time to market of products
Engineering and Ingestion Design are Critical Day 1
1. Ingestion: Use a robust ingestion design with a buffer server zone.
2. Packaging: Robustly package many small files into larger files with metadata.
3. Parse on demand: Increase velocity with the right level of parsing; parse and refine in more detail as needed.
4. Establish zones for control: Establish zones for comprehensive pipeline control: buffer, landing, ingest, core, and publish.
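The packaging step (item 2) can be illustrated with a minimal Python sketch. This is not the client's implementation: the archive format (gzip-compressed tar), the `MANIFEST.json` name, and the manifest fields are all assumptions chosen for illustration. It bundles many small files into one larger archive and records per-file metadata alongside them.

```python
import hashlib
import io
import json
import tarfile


def package_small_files(files: dict[str, bytes], source: str) -> bytes:
    """Bundle many small files into one tar.gz archive plus a JSON manifest.

    The manifest layout (source, name, bytes, sha256) is hypothetical.
    """
    manifest = {
        "source": source,
        "files": [
            {
                "name": name,
                "bytes": len(data),
                "sha256": hashlib.sha256(data).hexdigest(),
            }
            for name, data in sorted(files.items())
        ],
    }
    members = dict(files)
    members["MANIFEST.json"] = json.dumps(manifest, indent=2).encode("utf-8")

    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name, data in sorted(members.items()):
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def read_manifest(package: bytes) -> dict:
    """Read only the manifest, without parsing the packaged files."""
    with tarfile.open(fileobj=io.BytesIO(package), mode="r:gz") as archive:
        return json.load(archive.extractfile("MANIFEST.json"))
```

Reading the manifest without unpacking the payload is one way to defer detailed parsing until it is needed, in the spirit of item 3.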
Think Globally
5. Replication on ingestion: Replicate at ingestion, collecting key metadata information.
6. Data treatment facility: Use HDFS as a giant file system. All data lands there, but some continues through the pipe to other MPP/EDW systems.
7. WAN accelerators: WAN accelerators speed up transfers between Asia and North America.
8. Published data: Processing and format are customized for the audience and consumption pattern.
Governance for Production Workloads
9. Design zones for enterprise governance: Zones have different retention periods and different access patterns. To keep pipeline control within your governance strategy, you must design for this up front.
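One way to design for this up front is to declare each zone's retention period and readers as data, so pipeline jobs can enforce them. The Python sketch below assumes this approach; the zone names come from the ingestion slide, but the retention values and role names are hypothetical and not taken from the case study.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class ZonePolicy:
    name: str
    retention_days: int      # hypothetical values, for illustration only
    readers: frozenset[str]  # hypothetical role names


ZONES = {
    zone.name: zone
    for zone in [
        ZonePolicy("buffer", 7, frozenset({"ingest-service"})),
        ZonePolicy("landing", 30, frozenset({"ingest-service"})),
        ZonePolicy("ingest", 90, frozenset({"ingest-service", "data-engineer"})),
        ZonePolicy("core", 1095, frozenset({"data-engineer", "analyst"})),
        ZonePolicy("publish", 365, frozenset({"analyst", "dashboard"})),
    ]
}


def is_expired(zone: str, landed_on: date, today: date) -> bool:
    """True when data in the zone is older than that zone's retention period."""
    return today - landed_on > timedelta(days=ZONES[zone].retention_days)


def can_read(zone: str, role: str) -> bool:
    """True when the role is allowed to read from the zone."""
    return role in ZONES[zone].readers
```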
Optimize Access Patterns of your Data in Hadoop
10. Relational access pattern: Optimized for known queries. The equivalent of a Ferrari moving a bag of groceries.
11. Hadoop access pattern: Recast your problem. Hadoop is optimal for large batches of work. Optimize hierarchical queries with transitive closure. The equivalent of a freight train moving multiple warehouses of groceries.
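To make the transitive closure point concrete: instead of walking a hierarchy level by level at query time, every (ancestor, descendant) pair can be precomputed once in batch, so a hierarchical query becomes a single lookup or join. The sketch below is a plain in-memory Python illustration of that idea, not the deck's Hadoop job; the function name and the sample hierarchy are hypothetical.

```python
from collections import defaultdict


def transitive_closure(parent_child: list[tuple[str, str]]) -> set[tuple[str, str]]:
    """Return every (ancestor, descendant) pair implied by the direct edges."""
    children = defaultdict(set)
    for parent, child in parent_child:
        children[parent].add(child)

    closure = set()
    for root in list(children):
        stack = list(children[root])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:  # guards against cycles and repeated paths
                continue
            seen.add(node)
            closure.add((root, node))
            stack.extend(children.get(node, ()))
    return closure


# Hypothetical manufacturing hierarchy: plant > line > station > tester.
edges = [("plant", "line"), ("line", "station"), ("station", "tester")]
pairs = transitive_closure(edges)
```

With the closure materialized, "all testers under a plant" is a filter on precomputed pairs rather than a recursive traversal.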
Pipeline Management, Monitoring & Control
12. Data pipeline: End-to-end pipeline visibility is extremely important in an Enterprise Data Lake. Metadata design and the use of best practices are key to this pattern.
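As a rough illustration of pipeline visibility driven by metadata, the Python sketch below records which zone each batch has reached and reports batches that have not yet been published. The zone names follow the ingestion slide; the `PipelineLog` class and its methods are hypothetical, and a real deployment would persist these events rather than hold them in memory.

```python
from dataclasses import dataclass, field

ZONE_ORDER = ["buffer", "landing", "ingest", "core", "publish"]


@dataclass
class PipelineLog:
    # batch_id -> zones visited, in arrival order
    events: dict[str, list[str]] = field(default_factory=dict)

    def record(self, batch_id: str, zone: str) -> None:
        """Note that a batch has arrived in a zone."""
        if zone not in ZONE_ORDER:
            raise ValueError(f"unknown zone: {zone}")
        self.events.setdefault(batch_id, []).append(zone)

    def current_zone(self, batch_id: str) -> str:
        """Most recent zone recorded for the batch."""
        return self.events[batch_id][-1]

    def not_yet_published(self) -> list[str]:
        """Batches that have entered the pipeline but not reached publish."""
        return sorted(b for b, zones in self.events.items() if "publish" not in zones)
```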
Detailed Ingestion Design Patterns
13. Source systems: Hundreds of servers and hundreds of data streams require expert enterprise engineering.
14. Utilities: Layer in logging, monitoring, and scheduling for complete control of your Enterprise Data Lake.
ROI for Enterprise Data Lake Manufacturing Client
Estimated at $11M in year 1 and $30M cumulative through year 2
Operations Engineer: a recent production issue required detailed historical testing data. Our current
systems did not have the required retention for this request. The Big Data team was able to pull and
analyze all the required data from the Big Data Platform in minutes, as opposed to 3+ weeks that we
used to take to pull the data from multiple systems and off tape archive.
Legacy Systems vs. Enterprise Data Lake
• Retention
  Legacy systems: 3-6 months; scattered DBs & tape archive.
  Enterprise data lake: 100% data online in enterprise data lake for 3+ years.
• Coverage
  Legacy systems: Summaries, samples for several data sets.
  Enterprise data lake: 100% parametric data captured in raw form.
• Analysis
  Legacy systems: Reactive, missed quality improvement opportunities.
  Enterprise data lake: Daily dashboard with larger data sets to support proactive improvements.
Think Big Enterprise Data Lake: Industry Leading Assets, Services and Methodology
[Diagram labels: Information Sources; Enterprise Data Lake; Downstream Applications; Dashboard Engine. Steps: Evaluate Source Data, Prepare Data for Ingest, Prepare Source Metadata, Ingest, Collect & Manage Metadata, Apply Structure, Sequence, Compress, Automate, Protect. InfoSec: Perimeter, Authentication, Authorization.]
Think Big International Expansion
• Think Big is expanding, bringing its focus on open source consulting to the international region
• An office in the London Bridge business district in the UK will serve as its international hub
• Think Big is aggressively hiring a team of data engineers, data scientists, technology project managers and sales leaders
• Rick Farnell, Think Big co-founder and SVP, International, will lead the international practice
Photo: London Bridge Business District. Photo credit: Duncan Harris. Courtesy of Flickr. Creative Commons
Think Big International Expansion: Phase 1
Think Big International Hub: London, England
Dublin, Ireland; Munich, Germany; Mumbai, India
Photo credits: London (Duncan Harris), Dublin (Guiseppe Milo), Munich (John Morgan), Mumbai (McKay Savage). Courtesy of Flickr. Creative Commons
Dashboard Engine for Hadoop
Fast Access for Comprehensive Historical Data Stored in Hadoop
• “Right” time data
• Latencies under a second
• Scales easily for thousands of simultaneous users
• Reporting, Visualization and Analytics
Visit our Think Big booth to see our Dashboard Engine demo.
Thank you
Rick Farnell
Co-Founder and SVP International, Think Big
Rick.Farnell@thinkbiganalytics.com

Mais conteúdo relacionado

Mais procurados

Bringing Strategy to Life: Using an Intelligent Data Platform to Become Data ...
Bringing Strategy to Life: Using an Intelligent Data Platform to Become Data ...Bringing Strategy to Life: Using an Intelligent Data Platform to Become Data ...
Bringing Strategy to Life: Using an Intelligent Data Platform to Become Data ...DLT Solutions
 
BAR360 open data platform presentation at DAMA, Sydney
BAR360 open data platform presentation at DAMA, SydneyBAR360 open data platform presentation at DAMA, Sydney
BAR360 open data platform presentation at DAMA, SydneySai Paravastu
 
Regulation and Compliance in the Data Driven Enterprise
Regulation and Compliance in the Data Driven EnterpriseRegulation and Compliance in the Data Driven Enterprise
Regulation and Compliance in the Data Driven EnterpriseDenodo
 
Building Your Enterprise Data Marketplace with DMX-h
Building Your Enterprise Data Marketplace with DMX-hBuilding Your Enterprise Data Marketplace with DMX-h
Building Your Enterprise Data Marketplace with DMX-hPrecisely
 
EMC World 2014 Breakout: Move to the Business Data Lake – Not as Hard as It S...
EMC World 2014 Breakout: Move to the Business Data Lake – Not as Hard as It S...EMC World 2014 Breakout: Move to the Business Data Lake – Not as Hard as It S...
EMC World 2014 Breakout: Move to the Business Data Lake – Not as Hard as It S...Capgemini
 
Slides: Accelerating Queries on Cloud Data Lakes
Slides: Accelerating Queries on Cloud Data LakesSlides: Accelerating Queries on Cloud Data Lakes
Slides: Accelerating Queries on Cloud Data LakesDATAVERSITY
 
The Top 5 Factors to Consider When Choosing a Big Data Solution
The Top 5 Factors to Consider When Choosing a Big Data SolutionThe Top 5 Factors to Consider When Choosing a Big Data Solution
The Top 5 Factors to Consider When Choosing a Big Data SolutionDATAVERSITY
 
The Path to Data and Analytics Modernization
The Path to Data and Analytics ModernizationThe Path to Data and Analytics Modernization
The Path to Data and Analytics ModernizationAnalytics8
 
The Importance of DataOps in a Multi-Cloud World
The Importance of DataOps in a Multi-Cloud WorldThe Importance of DataOps in a Multi-Cloud World
The Importance of DataOps in a Multi-Cloud WorldDATAVERSITY
 
Data-Ed Online Presents: Data Warehouse Strategies
Data-Ed Online Presents: Data Warehouse StrategiesData-Ed Online Presents: Data Warehouse Strategies
Data-Ed Online Presents: Data Warehouse StrategiesDATAVERSITY
 
Analyst Webinar: Best Practices In Enabling Data-Driven Decision Making
Analyst Webinar: Best Practices In Enabling Data-Driven Decision MakingAnalyst Webinar: Best Practices In Enabling Data-Driven Decision Making
Analyst Webinar: Best Practices In Enabling Data-Driven Decision MakingDenodo
 
Cloudera Fast Forward Labs: Accelerate machine learning
Cloudera Fast Forward Labs: Accelerate machine learningCloudera Fast Forward Labs: Accelerate machine learning
Cloudera Fast Forward Labs: Accelerate machine learningCloudera, Inc.
 
¿En qué se parece el Gobierno del Dato a un parque de atracciones?
¿En qué se parece el Gobierno del Dato a un parque de atracciones?¿En qué se parece el Gobierno del Dato a un parque de atracciones?
¿En qué se parece el Gobierno del Dato a un parque de atracciones?Denodo
 
The Evolution of Data Architecture
The Evolution of Data ArchitectureThe Evolution of Data Architecture
The Evolution of Data ArchitectureWei-Chiu Chuang
 
IBM Industry Models and Data Lake
IBM Industry Models and Data Lake IBM Industry Models and Data Lake
IBM Industry Models and Data Lake Pat O'Sullivan
 
6 enriching your data warehouse with big data and hadoop
6 enriching your data warehouse with big data and hadoop6 enriching your data warehouse with big data and hadoop
6 enriching your data warehouse with big data and hadoopDr. Wilfred Lin (Ph.D.)
 
Big Data Made Easy: A Simple, Scalable Solution for Getting Started with Hadoop
Big Data Made Easy:  A Simple, Scalable Solution for Getting Started with HadoopBig Data Made Easy:  A Simple, Scalable Solution for Getting Started with Hadoop
Big Data Made Easy: A Simple, Scalable Solution for Getting Started with HadoopPrecisely
 
Agile BI: How to Deliver More Value in Less Time
Agile BI: How to Deliver More Value in Less TimeAgile BI: How to Deliver More Value in Less Time
Agile BI: How to Deliver More Value in Less TimePerficient, Inc.
 
Real-Time Data Integration for Modern BI
Real-Time Data Integration for Modern BIReal-Time Data Integration for Modern BI
Real-Time Data Integration for Modern BIibi
 

Mais procurados (20)

Bringing Strategy to Life: Using an Intelligent Data Platform to Become Data ...
Bringing Strategy to Life: Using an Intelligent Data Platform to Become Data ...Bringing Strategy to Life: Using an Intelligent Data Platform to Become Data ...
Bringing Strategy to Life: Using an Intelligent Data Platform to Become Data ...
 
BAR360 open data platform presentation at DAMA, Sydney
BAR360 open data platform presentation at DAMA, SydneyBAR360 open data platform presentation at DAMA, Sydney
BAR360 open data platform presentation at DAMA, Sydney
 
Regulation and Compliance in the Data Driven Enterprise
Regulation and Compliance in the Data Driven EnterpriseRegulation and Compliance in the Data Driven Enterprise
Regulation and Compliance in the Data Driven Enterprise
 
Building Your Enterprise Data Marketplace with DMX-h
Building Your Enterprise Data Marketplace with DMX-hBuilding Your Enterprise Data Marketplace with DMX-h
Building Your Enterprise Data Marketplace with DMX-h
 
EMC World 2014 Breakout: Move to the Business Data Lake – Not as Hard as It S...
EMC World 2014 Breakout: Move to the Business Data Lake – Not as Hard as It S...EMC World 2014 Breakout: Move to the Business Data Lake – Not as Hard as It S...
EMC World 2014 Breakout: Move to the Business Data Lake – Not as Hard as It S...
 
Slides: Accelerating Queries on Cloud Data Lakes
Slides: Accelerating Queries on Cloud Data LakesSlides: Accelerating Queries on Cloud Data Lakes
Slides: Accelerating Queries on Cloud Data Lakes
 
The Top 5 Factors to Consider When Choosing a Big Data Solution
The Top 5 Factors to Consider When Choosing a Big Data SolutionThe Top 5 Factors to Consider When Choosing a Big Data Solution
The Top 5 Factors to Consider When Choosing a Big Data Solution
 
The Path to Data and Analytics Modernization
The Path to Data and Analytics ModernizationThe Path to Data and Analytics Modernization
The Path to Data and Analytics Modernization
 
The Importance of DataOps in a Multi-Cloud World
The Importance of DataOps in a Multi-Cloud WorldThe Importance of DataOps in a Multi-Cloud World
The Importance of DataOps in a Multi-Cloud World
 
Data-Ed Online Presents: Data Warehouse Strategies
Data-Ed Online Presents: Data Warehouse StrategiesData-Ed Online Presents: Data Warehouse Strategies
Data-Ed Online Presents: Data Warehouse Strategies
 
Analyst Webinar: Best Practices In Enabling Data-Driven Decision Making
Analyst Webinar: Best Practices In Enabling Data-Driven Decision MakingAnalyst Webinar: Best Practices In Enabling Data-Driven Decision Making
Analyst Webinar: Best Practices In Enabling Data-Driven Decision Making
 
Three Big Data Case Studies
Three Big Data Case StudiesThree Big Data Case Studies
Three Big Data Case Studies
 
Cloudera Fast Forward Labs: Accelerate machine learning
Cloudera Fast Forward Labs: Accelerate machine learningCloudera Fast Forward Labs: Accelerate machine learning
Cloudera Fast Forward Labs: Accelerate machine learning
 
¿En qué se parece el Gobierno del Dato a un parque de atracciones?
¿En qué se parece el Gobierno del Dato a un parque de atracciones?¿En qué se parece el Gobierno del Dato a un parque de atracciones?
¿En qué se parece el Gobierno del Dato a un parque de atracciones?
 
The Evolution of Data Architecture
The Evolution of Data ArchitectureThe Evolution of Data Architecture
The Evolution of Data Architecture
 
IBM Industry Models and Data Lake
IBM Industry Models and Data Lake IBM Industry Models and Data Lake
IBM Industry Models and Data Lake
 
6 enriching your data warehouse with big data and hadoop
6 enriching your data warehouse with big data and hadoop6 enriching your data warehouse with big data and hadoop
6 enriching your data warehouse with big data and hadoop
 
Big Data Made Easy: A Simple, Scalable Solution for Getting Started with Hadoop
Big Data Made Easy:  A Simple, Scalable Solution for Getting Started with HadoopBig Data Made Easy:  A Simple, Scalable Solution for Getting Started with Hadoop
Big Data Made Easy: A Simple, Scalable Solution for Getting Started with Hadoop
 
Agile BI: How to Deliver More Value in Less Time
Agile BI: How to Deliver More Value in Less TimeAgile BI: How to Deliver More Value in Less Time
Agile BI: How to Deliver More Value in Less Time
 
Real-Time Data Integration for Modern BI
Real-Time Data Integration for Modern BIReal-Time Data Integration for Modern BI
Real-Time Data Integration for Modern BI
 

Semelhante a Hadoop 2015: what we larned -Think Big, A Teradata Company

Foundational Strategies for Trust in Big Data Part 1: Getting Data to the Pla...
Foundational Strategies for Trust in Big Data Part 1: Getting Data to the Pla...Foundational Strategies for Trust in Big Data Part 1: Getting Data to the Pla...
Foundational Strategies for Trust in Big Data Part 1: Getting Data to the Pla...Precisely
 
Self-Service Analytics with Guard Rails
Self-Service Analytics with Guard RailsSelf-Service Analytics with Guard Rails
Self-Service Analytics with Guard RailsDenodo
 
Insights into Real World Data Management Challenges
Insights into Real World Data Management ChallengesInsights into Real World Data Management Challenges
Insights into Real World Data Management ChallengesDataWorks Summit
 
Insights into Real-world Data Management Challenges
Insights into Real-world Data Management ChallengesInsights into Real-world Data Management Challenges
Insights into Real-world Data Management ChallengesDataWorks Summit
 
2015 02 12 talend hortonworks webinar challenges to hadoop adoption
2015 02 12 talend hortonworks webinar challenges to hadoop adoption2015 02 12 talend hortonworks webinar challenges to hadoop adoption
2015 02 12 talend hortonworks webinar challenges to hadoop adoptionHortonworks
 
The Maturity Model: Taking the Growing Pains Out of Hadoop
The Maturity Model: Taking the Growing Pains Out of HadoopThe Maturity Model: Taking the Growing Pains Out of Hadoop
The Maturity Model: Taking the Growing Pains Out of HadoopInside Analysis
 
12Nov13 Webinar: Big Data Analysis with Teradata and Revolution Analytics
12Nov13 Webinar: Big Data Analysis with Teradata and Revolution Analytics12Nov13 Webinar: Big Data Analysis with Teradata and Revolution Analytics
12Nov13 Webinar: Big Data Analysis with Teradata and Revolution AnalyticsRevolution Analytics
 
Creatinganext generationbigdataarchitecture-141204150317-conversion-gate02
Creatinganext generationbigdataarchitecture-141204150317-conversion-gate02Creatinganext generationbigdataarchitecture-141204150317-conversion-gate02
Creatinganext generationbigdataarchitecture-141204150317-conversion-gate02email2jl
 
Creating a Next-Generation Big Data Architecture
Creating a Next-Generation Big Data ArchitectureCreating a Next-Generation Big Data Architecture
Creating a Next-Generation Big Data ArchitecturePerficient, Inc.
 
Building a Modern Analytic Database with Cloudera 5.8
Building a Modern Analytic Database with Cloudera 5.8Building a Modern Analytic Database with Cloudera 5.8
Building a Modern Analytic Database with Cloudera 5.8Cloudera, Inc.
 
Increase your ROI with Hadoop in Six Months - Presented by Dell, Cloudera and...
Increase your ROI with Hadoop in Six Months - Presented by Dell, Cloudera and...Increase your ROI with Hadoop in Six Months - Presented by Dell, Cloudera and...
Increase your ROI with Hadoop in Six Months - Presented by Dell, Cloudera and...Cloudera, Inc.
 
Create your Big Data vision and Hadoop-ify your data warehouse
Create your Big Data vision and Hadoop-ify your data warehouseCreate your Big Data vision and Hadoop-ify your data warehouse
Create your Big Data vision and Hadoop-ify your data warehouseJeff Kelly
 
Eliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside HadoopEliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside HadoopHortonworks
 
Eliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside HadoopEliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside HadoopHortonworks
 
Contexti / Oracle - Big Data : From Pilot to Production
Contexti / Oracle - Big Data : From Pilot to ProductionContexti / Oracle - Big Data : From Pilot to Production
Contexti / Oracle - Big Data : From Pilot to ProductionContexti
 
MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...
MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...
MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...MongoDB
 
Turning Petabytes of Data into Profit with Hadoop for the World’s Biggest Ret...
Turning Petabytes of Data into Profit with Hadoop for the World’s Biggest Ret...Turning Petabytes of Data into Profit with Hadoop for the World’s Biggest Ret...
Turning Petabytes of Data into Profit with Hadoop for the World’s Biggest Ret...Cloudera, Inc.
 
Embedded-ml(ai)applications - Bjoern Staender
Embedded-ml(ai)applications - Bjoern StaenderEmbedded-ml(ai)applications - Bjoern Staender
Embedded-ml(ai)applications - Bjoern StaenderDataconomy Media
 
How to implement Hadoop successfully
How to implement Hadoop successfullyHow to implement Hadoop successfully
How to implement Hadoop successfullyAdir Sharabi
 

Semelhante a Hadoop 2015: what we larned -Think Big, A Teradata Company (20)

Foundational Strategies for Trust in Big Data Part 1: Getting Data to the Pla...
Foundational Strategies for Trust in Big Data Part 1: Getting Data to the Pla...Foundational Strategies for Trust in Big Data Part 1: Getting Data to the Pla...
Foundational Strategies for Trust in Big Data Part 1: Getting Data to the Pla...
 
Self-Service Analytics with Guard Rails
Self-Service Analytics with Guard RailsSelf-Service Analytics with Guard Rails
Self-Service Analytics with Guard Rails
 
Insights into Real World Data Management Challenges
Insights into Real World Data Management ChallengesInsights into Real World Data Management Challenges
Insights into Real World Data Management Challenges
 
Insights into Real-world Data Management Challenges
Insights into Real-world Data Management ChallengesInsights into Real-world Data Management Challenges
Insights into Real-world Data Management Challenges
 
2015 02 12 talend hortonworks webinar challenges to hadoop adoption
2015 02 12 talend hortonworks webinar challenges to hadoop adoption2015 02 12 talend hortonworks webinar challenges to hadoop adoption
2015 02 12 talend hortonworks webinar challenges to hadoop adoption
 
The Maturity Model: Taking the Growing Pains Out of Hadoop
The Maturity Model: Taking the Growing Pains Out of HadoopThe Maturity Model: Taking the Growing Pains Out of Hadoop
The Maturity Model: Taking the Growing Pains Out of Hadoop
 
How Businesses use Big Data to Impact the Bottom Line
How Businesses use Big Data to Impact the Bottom LineHow Businesses use Big Data to Impact the Bottom Line
How Businesses use Big Data to Impact the Bottom Line
 
12Nov13 Webinar: Big Data Analysis with Teradata and Revolution Analytics
12Nov13 Webinar: Big Data Analysis with Teradata and Revolution Analytics12Nov13 Webinar: Big Data Analysis with Teradata and Revolution Analytics
12Nov13 Webinar: Big Data Analysis with Teradata and Revolution Analytics
 
Creatinganext generationbigdataarchitecture-141204150317-conversion-gate02
Creatinganext generationbigdataarchitecture-141204150317-conversion-gate02Creatinganext generationbigdataarchitecture-141204150317-conversion-gate02
Creatinganext generationbigdataarchitecture-141204150317-conversion-gate02
 
Creating a Next-Generation Big Data Architecture
Creating a Next-Generation Big Data ArchitectureCreating a Next-Generation Big Data Architecture
Creating a Next-Generation Big Data Architecture
 
Building a Modern Analytic Database with Cloudera 5.8
Building a Modern Analytic Database with Cloudera 5.8Building a Modern Analytic Database with Cloudera 5.8
Building a Modern Analytic Database with Cloudera 5.8
 
Increase your ROI with Hadoop in Six Months - Presented by Dell, Cloudera and...
Increase your ROI with Hadoop in Six Months - Presented by Dell, Cloudera and...Increase your ROI with Hadoop in Six Months - Presented by Dell, Cloudera and...
Increase your ROI with Hadoop in Six Months - Presented by Dell, Cloudera and...
 
Create your Big Data vision and Hadoop-ify your data warehouse
Create your Big Data vision and Hadoop-ify your data warehouseCreate your Big Data vision and Hadoop-ify your data warehouse
Create your Big Data vision and Hadoop-ify your data warehouse
 
Eliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside HadoopEliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside Hadoop
 
Eliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside HadoopEliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside Hadoop
 
Contexti / Oracle - Big Data : From Pilot to Production
Contexti / Oracle - Big Data : From Pilot to ProductionContexti / Oracle - Big Data : From Pilot to Production
Contexti / Oracle - Big Data : From Pilot to Production
 
MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...
MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...
MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...
 
Turning Petabytes of Data into Profit with Hadoop for the World’s Biggest Ret...
Turning Petabytes of Data into Profit with Hadoop for the World’s Biggest Ret...Turning Petabytes of Data into Profit with Hadoop for the World’s Biggest Ret...
Turning Petabytes of Data into Profit with Hadoop for the World’s Biggest Ret...
 
Embedded-ml(ai)applications - Bjoern Staender
Embedded-ml(ai)applications - Bjoern StaenderEmbedded-ml(ai)applications - Bjoern Staender
Embedded-ml(ai)applications - Bjoern Staender
 
How to implement Hadoop successfully
How to implement Hadoop successfullyHow to implement Hadoop successfully
How to implement Hadoop successfully
 

Mais de DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Hadoop 2015: What We Learned - Think Big, A Teradata Company

  • 1. Think Big, A Teradata Company Rick Farnell, Co-Founder & SVP International
  • 2. Open Data Platform Support © 2015 Think Big, a Teradata Company
  • 3. What we learned over 5 years at Think Big: Teamwork is Critical. Skills Matter. Celebrate Success.
  • 4. Our hunch was right… this market will be BIG. Source: www.Indeed.com, April 8, 2015. http://wikibon.org/wiki/v/Big_Data_Vendor_Revenue_and_Market_Forecast_2013-2017
  • 5. Who is Think Big?
      • 100% Big Data focus
      • Founded in 2010, with over 100 engagements across 70 clients
      • Unlocks client value from big data with data science and data engineering services
      • Proven vendor-neutral open source integration expertise
      • Agile, team-based development methodology
      • Think Big Academy for skills and organizational development
      • Global delivery model: on-site services with near-shore and offshore support
  • 6. Think Big Services Engagement Model: Strategy, Implementation, Solution, Support. Think Big offers end-to-end Big Data strategy, implementation and support services focused on helping customers quickly achieve ROI on their Big Data investments. [Diagram labels: Enterprise Data Lake Software Frameworks; Big Data Roadmap; Establish Data Lake; Analytic Solutions; Managed Services; Data Lake Optimization]
  • 7. 5 Years of Big Data Services with Industry Leaders. Think Big Clients:
      • eCommerce: 2 of Global Top 5
      • Retail: 2 of Global Top 5
      • Social Networking: Global #1
      • Banking: 4 of Global Top 10
      • Credit Issuer: 2 of Global Top 5
      • Financial Data Services: 2 of Global Top 5
      • Financial Exchanges: Global #2
      • Brokerage & Mutual Funds: 2 of Global Top 5
      • Asset Management: Global #1
      • Semiconductor: 2 of Global Top 5
      • Data Storage Devices: 3 of Global Top 5
      • Disk Drive Manufacturing: Global #1
      • Telecommunications: 2 of Global Top 5
      • Media & Advertising: 2 of Global Top 4
      • Internet Transaction Security: Global #1
  • 8. Top 5 Learnings over 5 Years in Big Data
      1. Big Data is a journey, not an event.
      2. Open source is great for innovation but requires engineering sophistication in design patterns and best practices to continually scale analytics in production.
      3. Metadata, lineage, data management, data governance and data quality are critical to successful enterprise Big Data deployments.
      4. A new data platform will not resolve organizational problems.
      5. The Hadoop and open source ecosystem is moving at the speed of light; you must be agile. Test, build, learn, repeat.
  • 9. Think Big Enterprise Data Lake Case Study: Global Manufacturing Client
  • 10. Manufacturing Client Overview
      Metrics:
      • Collecting >2M manufacturing/testing binary files daily across Americas and APAC facilities
      • Collecting from ~500 tables across 6 databases: tens of millions of records daily
      • Over 140 analytics users to date
      • Over 150 attendees participated in Big Data Platform training
      Business goals for investment in Enterprise Data Lake:
      • Provide enterprise-wide data access for timely analytics to product quality engineers
      • Establish a foundation for large-scale proactive manufacturing and quality analytics
      • Reduce millions of dollars in costs of scrap waste
      • Reduce millions of dollars in costs of containment processes
      • Reduce millions of dollars in costs of data search parties
      • Increase revenue and market share by accelerating time to market of products
  • 11. Engineering and Ingestion Design are Critical Day 1
      1. Ingestion: Utilize a robust ingestion design with a buffer server zone.
      2. Packaging: Robustly package many small files into larger files with metadata.
      3. Parse on Demand: Increase velocity with the right level of parsing; parse and refine in more detail as needed.
      4. Establish Zones for Control: Establish zones for comprehensive pipeline control: buffer, landing, ingest, core and publish.
  • 12. Think Globally
      5. Replication on Ingestion: Replicate at ingestion, collecting key metadata information.
      6. Data Treatment Facility: Utilize HDFS as a giant file system; all data lands, but some will continue in the pipe to other MPP/EDW systems.
      7. WAN Accelerators: WAN accelerators speed up transfers between Asia and North America.
      8. Published Data: Processing and format are customized for the audience and consumption pattern.
  • 13. Governance for Production Workloads
      9. Design Zones for Enterprise Governance: Zones will have different retention periods and different access patterns. To have pipeline control within your governance strategy, you must design for this upfront.
  • 14. Optimize Access Patterns of your Data in Hadoop
      10. Relational access pattern: Optimized for known queries. The equivalent of a Ferrari moving a bag of groceries.
      11. Hadoop access pattern: Recast your problem. Hadoop is optimal for large batches of work. Optimize hierarchical queries with transitive closure. The equivalent of a freight train moving multiple warehouses of groceries.
  • 15. Pipeline Management, Monitoring & Control
      12. Data Pipeline: End-to-end pipeline visibility is extremely important in an Enterprise Data Lake. Metadata design and the use of best practices are key to this pattern.
  • 16. Detailed Ingestion Design Patterns
      13. Source Systems: Hundreds of servers and hundreds of data streams require expert enterprise engineering.
      14. Utilities: Layer in logging, monitoring and scheduling for complete control of your Enterprise Data Lake.
  • 17. ROI for Enterprise Data Lake: Manufacturing Client. Est. at $11M year 1, $30M cumulative year 2.
      Operations Engineer: "A recent production issue required detailed historical testing data. Our current systems did not have the required retention for this request. The Big Data team was able to pull and analyze all the required data from the Big Data Platform in minutes, as opposed to the 3+ weeks that we used to take to pull the data from multiple systems and off tape archive."
      Legacy Systems vs. Enterprise Data Lake:
      • Retention: 3-6 months in scattered DBs and tape archive (legacy) vs. 100% of data online in the enterprise data lake for 3+ years.
      • Coverage: Summaries and samples for several data sets (legacy) vs. 100% of parametric data captured in raw form.
      • Analysis: Reactive, with missed quality improvement opportunities (legacy) vs. a daily dashboard with larger data sets to support proactive improvements.
  • 18. Think Big Enterprise Data Lake: Industry Leading Assets, Services and Methodology. [Diagram labels: Enterprise Data Lake; Information Sources; Evaluate Source Data; Ingest; Collect & Manage Metadata; Apply Structure; Sequence; Compress; Automate; Protect; Prepare Data for Ingest; Prepare Source Metadata; InfoSec: Perimeter, Authentication, Authorization; Downstream Applications; Dashboard Engine]
  • 19.
  • 20. Think Big International Expansion
      • Think Big is expanding, bringing its focus on open source consulting to the international region
      • An office in the UK’s London Bridge business district will serve as its international hub
      • Think Big is aggressively hiring a team of data engineers, data scientists, technology project managers and sales leaders
      • Rick Farnell, Think Big co-founder and SVP, International, will lead the international practice
      Photo: London Bridge Business District. Photo credit: Duncan Harris. Courtesy of Flickr. Creative Commons.
  • 21. Think Big International Expansion: Phase 1. Locations: London, England (Think Big International Hub); Dublin, Ireland; Munich, Germany; Mumbai, India. Photo credits: London (Duncan Harris), Dublin (Giuseppe Milo), Munich (John Morgan), Mumbai (McKay Savage). Courtesy of Flickr. Creative Commons.
  • 22.
  • 23. Dashboard Engine for Hadoop
      • Fast access to comprehensive historical data stored in Hadoop
      • “Right” time data
      • Latencies under a second
      • Scales easily for thousands of simultaneous users
      • Reporting, visualization and analytics
      Visit our Think Big booth to see our Dashboard Engine demo. [Diagram labels: Information Sources; Enterprise Data Lake; Downstream Applications]
  • 24. Thank you. Rick Farnell, Co-Founder and SVP International, Think Big. Rick.Farnell@thinkbiganalytics.com
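
Slide 11 recommends packaging many small files into larger files with metadata. The slides do not say how this was implemented, so the following is only a minimal sketch of the idea, assuming a plain tar archive with a JSON manifest stored inside it. The function name `package_small_files`, the manifest layout, and the sample file names are all hypothetical.

```python
import hashlib
import io
import json
import tarfile


def package_small_files(files, source_system):
    """Bundle many small payloads into one tar archive plus a JSON manifest.

    `files` maps a file name to its raw bytes. The manifest layout is a
    hypothetical example, not a format described in the slides.
    """
    manifest = {"source_system": source_system, "entries": []}
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, payload in sorted(files.items()):
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
            # Record per-file metadata so each file can be located and verified later.
            manifest["entries"].append({
                "name": name,
                "bytes": len(payload),
                "sha256": hashlib.sha256(payload).hexdigest(),
            })
        # Store the manifest inside the archive so the metadata travels with the data.
        manifest_bytes = json.dumps(manifest, indent=2).encode("utf-8")
        info = tarfile.TarInfo(name="MANIFEST.json")
        info.size = len(manifest_bytes)
        archive.addfile(info, io.BytesIO(manifest_bytes))
    return buffer.getvalue(), manifest


# Hypothetical sample input: two tiny binary test files from one source system.
archive_bytes, manifest = package_small_files(
    {"unit_0001.bin": b"\x00\x01", "unit_0002.bin": b"\x02"},
    source_system="tester-line-a",
)
print(len(manifest["entries"]))  # 2
```

Bundling in this way means the file system tracks one large object rather than millions of tiny ones, while the manifest preserves the per-file metadata the slide calls for.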

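Slide 14 suggests optimizing hierarchical queries with transitive closure. As a rough illustration of that technique (not the client's implementation), the sketch below precomputes every ancestor-descendant pair from a parent-child edge list in one batch pass, so that a later "everything under X" query becomes a simple lookup instead of a recursive walk. The hierarchy values (`plant`, `line`, `station`, `tester`) are made-up sample data.

```python
from collections import defaultdict


def transitive_closure(edges):
    """Return every (ancestor, descendant) pair implied by parent-child edges.

    Written for acyclic hierarchies (a tree or DAG); the `seen` set also
    guarantees termination if the input happens to contain a cycle.
    """
    children = defaultdict(set)
    for parent, child in edges:
        children[parent].add(child)

    closure = set()
    for root in list(children):
        stack = list(children[root])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            closure.add((root, node))
            # .get avoids creating empty entries for leaf nodes.
            stack.extend(children.get(node, ()))
    return closure


# Hypothetical manufacturing hierarchy: plant -> line -> station -> tester.
edges = [("plant", "line"), ("line", "station"), ("station", "tester")]
closure = transitive_closure(edges)
print(sorted(d for a, d in closure if a == "line"))  # ['station', 'tester']
```

The closure is larger than the original edge list, which is the trade-off the slide points at: spend a large batch job once, then answer hierarchical questions with flat joins or filters.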
Editor's Notes

  1. Think Big Americas Office Locations: Mountain View, Chicago, Salt Lake City, Boston, New York