Introduction
This document is based on the MIT Sloan Management Review article on data and analytics at GE titled “Gone
Fishing - For Data”. The document should be viewed as a summary of some of the key points from the article.
The full case study is available at:
http://sloanreview.mit.edu/article/gone-fishing-for-data/
Before getting into the details of GE’s data and analytics efforts, a quick detour is in order to first establish
what is meant by the term “Big Data”.
Big Data: a definition
In simple terms, Big Data refers to a data environment that cannot be handled by traditional technologies.
Big Data is frequently described in terms of three V's (volume, variety and velocity); IBM adds a fourth, veracity.
Figure 1 below illustrates the IBM four V representation of Big Data:
Figure 1: Big Data in dimensions
Figure 1. Four dimensions of big data. Copyright 2012 by IBM. Reprinted with permission.
Please see the Appendix for further elaboration on each of the four V's.
GE’s objective
Turning to GE’s data and analytics efforts, the company uses sensors to collect data about the performance of
its industrial equipment, including turbines, jet engines and factory floors. Ultimately the company’s efforts
are aimed at being able to sell services to its customers based on detailed analysis of data streaming from its
equipment and the ability to predict failures and other key events.
To get things going
In November 2013, GE set out to connect with 25 airlines and to collect and manage engine data from 3.4
million flights. To do this, GE had to build a Data Lake (see table for definition), and it did so with what GE's
Vince Campisi calls "a two-pizza team", meaning a team small enough to be fed with two pizzas.
Seventy days later, GE had created a Data Lake that allowed the company to ingest and connect the full flight
data from the engines, and to integrate that engine data with maintenance visits and parts information. This
data was then provided to GE's data science community to investigate the factors that were reducing time on
wing for customers.
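The integration step described above can be pictured with a small sketch. This is only an illustration under stated assumptions, not GE's implementation: the record layouts, the `engine_id` join key and the `integrate` function are all hypothetical.

```python
from collections import defaultdict

# Hypothetical flight and maintenance records; field names are assumptions.
flights = [
    {"engine_id": "E1", "flight": "F100", "hours": 5.0},
    {"engine_id": "E1", "flight": "F102", "hours": 3.5},
    {"engine_id": "E2", "flight": "F101", "hours": 7.0},
]
maintenance_visits = [
    {"engine_id": "E1", "visit_date": "2014-01-10", "part": "blade"},
]


def integrate(flight_rows, visit_rows):
    """Group flight hours and maintenance visits under each engine id."""
    by_engine = defaultdict(lambda: {"flight_hours": 0.0, "visits": []})
    for row in flight_rows:
        by_engine[row["engine_id"]]["flight_hours"] += row["hours"]
    for row in visit_rows:
        by_engine[row["engine_id"]]["visits"].append(row)
    return dict(by_engine)


combined = integrate(flights, maintenance_visits)
```

With the two sources keyed on the same engine, an analyst could then compare flight hours against maintenance history for each engine.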
What is a Data Lake?
A Data Lake is a central source whose data can be used in a variety of ways by many different internal
customers; some of those uses are of current interest, while others are yet to be discovered. Importantly, a
Data Lake centralises the organisation's data, a capability required in order to break down unwanted data
silos. The growing use of Data Lakes has been made possible by the relatively low cost of large-scale storage on
Hadoop.
A Data Lake brings a different paradigm
As articulated in the article, when using a Data Lake the data is collected in its raw format, with no upfront
modelling (structuring) of the data as would be done in a traditional data warehouse. In taking this approach,
GE accepts that it does not yet know which relationships matter, nor fully what it will find once all of these
data sets are brought together. In summary, GE's Data Lake approach is about collecting data in its raw
format, pumping it into one place in order to break down data silos, and then modelling the data based on the
outcome being solved for.
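The "store raw, model later" idea is often called schema-on-read. The following is a minimal sketch of that idea, assuming JSON-formatted raw records; the record contents, field names and the `read_with_schema` helper are hypothetical and are not taken from the article.

```python
import json

# Hypothetical raw records, kept exactly as they arrived (no upfront schema).
RAW_LAKE = [
    '{"source": "engine", "engine_id": "E1", "flight": "F100", "egt_c": 612}',
    '{"source": "maintenance", "engine_id": "E1", "visit": "2014-01-10", "part": "blade"}',
    '{"source": "engine", "engine_id": "E2", "flight": "F101", "egt_c": 655}',
]


def read_with_schema(raw_lines, source, fields):
    """Apply a schema at read time: keep records from `source`,
    projected onto only the fields the current question needs."""
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        if record.get("source") == source:
            rows.append({name: record.get(name) for name in fields})
    return rows


# Two different "models" over the same raw data, chosen per question.
engine_view = read_with_schema(RAW_LAKE, "engine", ["engine_id", "egt_c"])
maintenance_view = read_with_schema(RAW_LAKE, "maintenance", ["engine_id", "part"])
```

The raw lines are never rewritten; each new question simply defines its own view over them.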
More than just a technology solution
Beyond the technology solution, GE also addressed organisational culture and the hiring and development of
analytics talent. According to Campisi, GE's talent resides in three communities, each with a different data
usage pattern.
1) The data science community.
This community is focused on a very specific item or outcome they are trying to solve, or a question
they are trying to answer. The objective of the data science community is to leverage the Data Lake to
look for the answer to the specific problem.
2) The software engineering community.
This community will operationalise the models created by the data science community into an
analytic application.
3) The traditional business intelligence community, which connects to the Data Lake in order to unlock
and answer questions that are more traditional in nature.
Getting all the plumbing right with Data Engineers
An important component of the functioning of data and analytics within an organisation is the capability to
bridge the data management/IT group and the data science group. This capability is provided by Data
Engineers. As articulated in the article: "Data engineering is a discipline that sits in between the two, makes
data more accessible and provides the tools a data scientist would want to have. It allows the data scientist to
focus more on developing the model, developing the insight, not on how to stitch the information or stitch the
toolset to make it productive.”
Organisations lacking the combination of a Data Lake and Data Engineering capability all too often become
bogged down in data preparation efforts. The harsh reality is that Big Data is messy data and there is no quick
and easy way around it. People often think that because the data is there, it is ready to be used, but that is
seldom the case. Campisi provided a good example of this: "You go out and hunt for these coveted data
scientists and bring them in, only to frustrate them. They spend 80% of time trying to organize the
information. One of our first use cases, before using our current approach with the data lake plus data
engineering we went through 10 months of organizing data and figuring out where it existed and breaking
down silos, in order for someone to actually go after the outcome. It’s not effective.”
To paraphrase the Ancient Mariner, organisations without a Data Lake and Data Engineering capability can
easily find themselves in the situation of "Data, data, every where, / Nor any drop to drink."
Finding people is a challenge
One of GE’s major challenges has been acquiring capable people in the data and analytics domain. This is made
worse by the scale at which GE is doing things. As stated in the article: “Anybody who can spell ‘Hadoop’ is
heavily recruited. It’s hard to find people who’ve really done it at the scale we’re talking about and looking to
do it, so even in the data management space, it’s hard to find talent at the levels we’re constantly searching
for.” Organisations considering data and analytics initiatives should not be deterred by this, but they are
well advised to give as much consideration to the human talent component as to the technology component.
Data governance not to be underestimated
Aside from the challenges of finding the right people, being awash with data brings its own set of challenges.
According to the article, these data governance challenges are dictating the speed at which GE is able to scale
its data and analytics initiative. Also worth noting is that many of these challenges are brought on by
technology so new that there is no precedent for how they should be addressed. Addressing these data
governance challenges for the first time, and doing so consistently, is a critical consideration for organisations
looking to exploit opportunities in data and analytics. The difference between those that succeed and those
that fail could well rest on the strength or weakness of the organisation's data governance foundation.
Summary
The article clearly demonstrates the opportunities opening up to organisations pursuing data and analytics
initiatives. While Big Data has been enabled by technologies like Hadoop, challenges are arising on two fronts.
Firstly, organisations struggle to find people skilled in this environment. Secondly, data governance
challenges are increasing in number and evolving in complexity. These challenges are not trivial, but
organisations that navigate them successfully will be rewarded with opportunities yet to be discovered.
Appendix:
Volume refers to the quantity (gigabytes, terabytes, petabytes etc.) of data that organizations are trying to
harness. Importantly there is no specific measure of volume that defines Big Data, as what constitutes truly
“high” volume varies by industry and even geography. What is clear is that data volumes continue to rise.
Variety refers to different types (forms) of data and data sources. Data types include
numeric, text, image, audio, web, log files etc., whether structured or unstructured. The growth of data
sources such as social media, smart devices, sensors and the Internet of Things has not only resulted in
increases in the volume of data but increases in the types of data as well.
Velocity refers to the speed at which data is created, processed and analysed. Velocity impacts latency, which is
the lag time between when data is created or captured, and when it is processed into an output form for
decision making purposes. Importantly, certain types of data must be analysed in real-time to be of value to
the business, a task that places impossible demands on traditional systems where the ability to capture, store
and analyse data in real-time is severely limited.
Veracity refers to the level of reliability associated with certain types of data. According to IBM some data is
inherently uncertain, for example: sentiment and truthfulness in humans; GPS sensors bouncing among the
skyscrapers of Manhattan; weather conditions; economic factors; and the future. When dealing with these
types of data, no amount of data cleansing can correct for it. Yet despite uncertainty, the data still contains
valuable information. The need to acknowledge and embrace this uncertainty is a hallmark of Big Data.
(IBM, 2012, p. 5)
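The latency mentioned under Velocity, the lag between when data is created or captured and when it is processed, can be expressed as a simple difference of timestamps. The sketch below uses made-up timestamps and a hypothetical `latency_seconds` helper purely for illustration.

```python
from datetime import datetime


def latency_seconds(created_at, processed_at):
    """Lag between data capture and its processed output, in seconds."""
    return (processed_at - created_at).total_seconds()


# Illustrative timestamps only; not taken from the article.
created = datetime(2014, 1, 10, 12, 0, 0)
processed = datetime(2014, 1, 10, 12, 0, 45)
lag = latency_seconds(created, processed)
```

Whether a given lag is acceptable depends on the decision being supported; data that must be analysed in real time tolerates far less lag than data used for periodic reporting.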
Reference:
IBM. (2012). Four dimensions of big data [Diagram]. In Analytics: the real-world use of big data [pdf].
Retrieved from http://public.dhe.ibm.com/common/ssi/ecm/en/gbe03519usen/GBE03519USEN.PDF
IBM. (2012). Analytics: the real-world use of big data [pdf].
Retrieved from http://public.dhe.ibm.com/common/ssi/ecm/en/gbe03519usen/GBE03519USEN.PDF

CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 

Último (20)

A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 

Overview of MIT Sloan case study on GE data and analytics initiative titled Gone Fishing - For Data

Introduction

This document is based on the MIT Sloan Management Review article on data and analytics at GE titled “Gone Fishing - For Data”. The document should be viewed as a summary of some of the key points from the article. The full case study is available at:
http://sloanreview.mit.edu/article/gone-fishing-for-data/

Before getting into the details of GE’s data and analytics efforts, a quick detour is in order to first establish what is meant by the term “Big Data”.

Big Data a definition

In simple terms Big Data refers to a data environment that cannot be handled by traditional technologies. Big Data is frequently described in terms of the three V’s, and if you are at IBM, it is likely to be the four V’s. Figure 1 below illustrates the IBM four V representation of Big Data:

Figure 1: Big Data in dimensions
Figure 1. Four dimensions of big data. Copyright 2012 by IBM. Reprinted with permission.

Please see Appendix A for further elaboration on each of the four V’s.

GE’s objective

Turning to GE’s data and analytics efforts, the company uses sensors to collect data about the performance of its industrial equipment, including turbines, jet engines and factory floors. Ultimately the company’s efforts are aimed at being able to sell services to its customers based on detailed analysis of data streaming from its equipment and the ability to predict failures and other key events.

To get things going

In November 2013, GE set out to connect with 25 airlines and to collect and manage engine data from 3.4 million flights. To do this GE had to build a Data Lake (see table for definition) and it did so with what GE’s Vince Campisi calls “a two-pizza team,” meaning, a team no bigger than the number of people you could feed off of two pizzas.
Seventy days later, GE had created a Data Lake which gave the company the ability to ingest and connect the full flight data from the engines, and also to integrate the engine data with maintenance visits and parts information. This data was then provided to GE’s data science community to look at things that were reducing time on wing for customers.

What is a Data Lake?

A Data Lake is a central source in which data can be used in a variety of ways for many different internal customers, some currently of interest, others to be discovered in the future. Importantly, a Data Lake gives the organisation centralization of data, a capability required in order to break down unwanted data silos. The growing use of Data Lakes has been made possible by the relatively low cost of large-scale storage on Hadoop.

A Data Lake brings a different paradigm

As articulated in the article, when using a Data Lake the data is collected in its raw format and there is no modelling (structuring) of the data up front, as there would be in a traditional data warehouse. In taking this approach, GE starts from the position that it does not yet understand the relationships that matter, and does not fully understand what it is going to find when it brings all of these data sets together.

In summary, GE’s Data Lake approach is all about collecting data in its raw format, pumping it into one place in order to break down data silos, and then modelling the data based on the outcome they are trying to solve for.

More than just a technology solution

Moving beyond the technology solution, GE also addressed organisational culture as well as the hiring and development of analytics talent. According to Campisi, GE’s talent resides in three communities which have different data usage patterns.

1) The data science community. This community is focused on a very specific item or outcome they are trying to solve, or a question they are trying to answer. The objective of the data science community is to leverage the Data Lake to look for the answer to the specific problem.
2) The software engineering community. This community will operationalise the models created by the data science community into an analytic application.
3) The traditional business intelligence community, which connects to the Data Lake in order to unlock and answer questions that are more traditional in nature.

Getting all the plumbing right with Data Engineers

An important component in the functioning of data and analytics within an organisation is the capability to bridge the data management/IT group and the data science group. This capability is provided by Data Engineers. As articulated in the article:

“Data engineering is a discipline that sits in between the two, makes data more accessible and provides the tools a data scientist would want to have. It allows the data scientist to focus more on developing the model, developing the insight, not on how to stitch the information or stitch the toolset to make it productive.”

Organisations lacking the combination of a Data Lake and a Data Engineering capability all too often become bogged down in data preparation efforts. The harsh reality is that Big Data is messy data and there is no quick and easy way around it. People often think that because the data is there, it is ready to be used, but that is seldom the case. Campisi provided a good example of this:

“You go out and hunt for these coveted data scientists and bring them in, only to frustrate them. They spend 80% of time trying to organize the information. One of our first use cases, before using our current approach with the data lake plus data engineering we went through 10 months of organizing data and figuring out where it existed and breaking down silos, in order for someone to actually go after the outcome. It’s not effective.”

To paraphrase the Ancient Mariner, without a Data Lake and a Data Engineering capability organisations can easily find themselves in the situation of:

Data, data, every where,
Nor any drop to drink.

Finding people is a challenge

One of GE’s major challenges has been acquiring capable people in the data and analytics domain. This is made worse by the scale at which GE is doing things. As stated in the article:

“Anybody who can spell “Hadoop” is heavily recruited. It’s hard to find people who’ve really done it at the scale we’re talking about and looking to do it, so even in the data management space, it’s hard to find talent at the levels we’re constantly searching for.”

Organisations considering efforts in the data and analytics space clearly should not refrain from doing so, but are well advised to give as much consideration to the human talent component as to the technology component.

Data governance not to be underestimated

Aside from the challenges of finding the right people, being awash with data brings its own set of challenges. According to the article, these data governance challenges dictate the speed at which GE is able to scale its data and analytics initiative. Also worth noting is that many of these challenges are brought on by technology so new that there is no precedent for how they should be addressed.

Addressing these data governance challenges for the first time, and doing so consistently, is a critical consideration for organisations looking to exploit opportunities in data and analytics. The difference between those that succeed and those that fail could well rest on the strength or weakness of the organisation’s data governance foundation.

Summary

The article clearly demonstrates the opportunities opening up to organisations pursuing data and analytics initiatives. While Big Data has been enabled by technologies like Hadoop, challenges are arising on two fronts. Firstly, organisations face challenges finding people skilled in this environment. Secondly, data governance challenges are increasing in number and evolving in complexity. While these challenges are not trivial, those organisations that successfully navigate them will be rewarded with opportunities yet to be discovered.
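The Data Lake paradigm described earlier (collect data in its raw format first, then model it for the outcome being pursued) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the record layout, the field names such as `egt_c`, and the helper `read_with_schema` are hypothetical and do not come from the article or from GE's actual implementation.

```python
import json

# Hypothetical raw records landed in the lake as-is, with no upfront schema.
raw_lake = [
    '{"source": "engine", "engine_id": "E1", "flight": "F100", "egt_c": 640}',
    '{"source": "maintenance", "engine_id": "E1", "visit": "2013-12-02", "part": "fuel nozzle"}',
    '{"source": "engine", "engine_id": "E2", "flight": "F200", "egt_c": 655}',
]


def read_with_schema(lines, source, fields):
    """Apply structure only at read time: keep one source, project chosen fields."""
    rows = []
    for line in lines:
        record = json.loads(line)
        if record.get("source") == source:
            rows.append({field: record.get(field) for field in fields})
    return rows


# Model the data for one specific question: engine readings for engines
# that have had a maintenance visit.
engine_rows = read_with_schema(raw_lake, "engine", ["engine_id", "egt_c"])
visits = read_with_schema(raw_lake, "maintenance", ["engine_id", "part"])

visited = {visit["engine_id"] for visit in visits}
readings_for_visited = [row for row in engine_rows if row["engine_id"] in visited]
```

The point of the sketch is that the raw records are stored once, untouched, and a different question later could project different fields from the same records without reloading or remodelling the store.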
Appendix:

Volume refers to the quantity (gigabytes, terabytes, petabytes etc.) of data that organizations are trying to harness. Importantly, there is no specific measure of volume that defines Big Data, as what constitutes truly “high” volume varies by industry and even geography. What is clear is that data volumes continue to rise.

Variety refers to different types (forms) of data and data sources. Data types include numeric, text, image, audio, web, log files etc., whether structured or unstructured. The growth of data sources such as social media, smart devices, sensors and the Internet of Things has resulted not only in increases in the volume of data but in increases in the types of data as well.

Velocity refers to the speed at which data is created, processed and analysed. Velocity impacts latency, which is the lag time between when data is created or captured and when it is processed into an output form for decision making purposes. Importantly, certain types of data must be analysed in real-time to be of value to the business, a task that places impossible demands on traditional systems where the ability to capture, store and analyse data in real-time is severely limited.

Veracity refers to the level of reliability associated with certain types of data. According to IBM some data is inherently uncertain, for example: sentiment and truthfulness in humans; GPS sensors bouncing among the skyscrapers of Manhattan; weather conditions; economic factors; and the future. When dealing with these types of data, no amount of data cleansing can correct for it. Yet despite uncertainty, the data still contains valuable information. The need to acknowledge and embrace this uncertainty is a hallmark of Big Data. (IBM, 2012, p. 5)
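The notion of latency given under Velocity (the lag between when data is created or captured and when it is processed into an output) can be made concrete with a small Python sketch. The timestamps and the five-second "real-time" budget are assumptions chosen purely for illustration. They are not figures from the article or from IBM.

```python
from datetime import datetime


def latency_seconds(created_at, processed_at):
    """Lag between when a record was created and when its output was ready."""
    return (processed_at - created_at).total_seconds()


# Hypothetical readings: (time captured, time the analysis output was available).
readings = [
    (datetime(2013, 11, 1, 12, 0, 0), datetime(2013, 11, 1, 12, 0, 2)),
    (datetime(2013, 11, 1, 12, 0, 5), datetime(2013, 11, 1, 12, 0, 35)),
    (datetime(2013, 11, 1, 12, 0, 10), datetime(2013, 11, 1, 12, 0, 14)),
]

latencies = [latency_seconds(created, processed) for created, processed in readings]

# Assumed threshold for what counts as "real-time" in this example.
REAL_TIME_BUDGET_S = 5.0
too_slow = [lag for lag in latencies if lag > REAL_TIME_BUDGET_S]
```

In this sketch the second reading takes 30 seconds to become usable, so it would miss the assumed real-time budget even though the data itself was captured promptly.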
Reference:

IBM. (2012). Four dimensions of big data. [Diagram]. Retrieved from IBM. (2012). Analytics: the real-world use of big data. [pdf]. Retrieved from http://public.dhe.ibm.com/common/ssi/ecm/en/gbe03519usen/GBE03519USEN.PDF

IBM. (2012). Analytics: the real-world use of big data. [pdf]. Retrieved from http://public.dhe.ibm.com/common/ssi/ecm/en/gbe03519usen/GBE03519USEN.PDF