Introduction
This document is based on the MIT Sloan Management Review article on data and analytics at GE titled “Gone
Fishing - For Data”. The document should be viewed as a summary of some of the key points from the article.
The full case study is available at:
http://sloanreview.mit.edu/article/gone-fishing-for-data/
Before getting into the details of GE’s data and analytics efforts, a quick detour is in order to first establish
what is meant by the term “Big Data”.
Big Data: a definition
In simple terms, Big Data refers to a data environment that cannot be handled by traditional technologies.
Big Data is frequently described in terms of the three V’s, and if you are at IBM, it is likely to be the four V’s.
Figure 1 below illustrates the IBM four V representation of Big Data:
Figure 1. Four dimensions of big data. Copyright 2012 by IBM. Reprinted with permission.
Please see Appendix A for further elaboration on each of the four V’s.
GE’s objective
Turning to GE’s data and analytics efforts, the company uses sensors to collect data about the performance of
its industrial equipment, including turbines, jet engines and factory floors. Ultimately the company’s efforts
are aimed at being able to sell services to its customers based on detailed analysis of data streaming from its
equipment and the ability to predict failures and other key events.
To get things going
In November 2013, GE set out to connect with 25 airlines and to collect and manage engine data from 3.4
million flights. To do this, GE had to build a Data Lake (see “What is a Data Lake?” below for a definition), and it did
so with what GE’s Vince Campisi calls “a two-pizza team,” meaning a team no bigger than the number of people
you could feed with two pizzas.
Seventy days later, GE had created a Data Lake that gave the company the ability to ingest and
connect the full flight data from the engines, and to integrate the engine data with maintenance visits and
parts information. This data was then provided to GE’s data science community to investigate what was
reducing time on wing for customers.
What is a Data Lake?
A Data Lake is a central store from which data can be used in a variety of ways by many different internal
customers; some of these uses are of current interest, while others are yet to be discovered. Importantly, a Data
Lake centralises the organisation’s data, a capability required in order to break down unwanted data
silos. The growing use of Data Lakes has been made possible by the relatively low cost of large-scale storage on
Hadoop.
A Data Lake brings a different paradigm
As articulated in the article, when using a Data Lake the data is collected in its raw format and there is no
modelling (structuring) of the data up front, as there would be in a traditional data warehouse. In taking
this approach, GE accepts that it does not yet know which relationships matter or fully understand
what it will find when all of these data sets are brought together. In summary, GE’s
Data Lake approach is about collecting data in its raw format, pumping it into one place in order to break
down data silos, and then modelling the data based on the outcome being sought.
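The contrast with a data warehouse can be illustrated with a minimal Python sketch. It is not GE's implementation: the record layout, the field names (engine_id, egt_c, visit) and the question being asked are all hypothetical, chosen only to show raw records being stored untouched and structure being applied at read time for one specific outcome.

```python
import json

# Hypothetical raw records, kept exactly as they arrived (no up-front modelling).
RAW_LAKE = [
    '{"source": "engine", "engine_id": "E1", "flight": "F100", "egt_c": 610}',
    '{"source": "engine", "engine_id": "E1", "flight": "F101", "egt_c": 655}',
    '{"source": "engine", "engine_id": "E2", "flight": "F200", "egt_c": 598}',
    '{"source": "maintenance", "engine_id": "E1", "visit": "V7", "reason": "borescope"}',
]


def read_source(lake, source):
    """Parse the raw lines at read time and keep only one source's records."""
    records = (json.loads(line) for line in lake)
    return [r for r in records if r["source"] == source]


def peak_temperature_with_visits(lake):
    """Model the data for one question: peak reading and visit count per engine."""
    summary = {}
    for r in read_source(lake, "engine"):
        entry = summary.setdefault(
            r["engine_id"], {"peak_egt_c": r["egt_c"], "visits": 0}
        )
        entry["peak_egt_c"] = max(entry["peak_egt_c"], r["egt_c"])
    # Connect the engine data with maintenance visits for the same engine.
    for r in read_source(lake, "maintenance"):
        if r["engine_id"] in summary:
            summary[r["engine_id"]]["visits"] += 1
    return summary


result = peak_temperature_with_visits(RAW_LAKE)
print(result)
```

A different question would get its own read-time function over the same raw records, which is the sense in which the modelling follows the outcome rather than preceding it.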
More than just a technology solution
Beyond the technology solution, GE also addressed organisational culture as well as the hiring
and development of analytics talent. According to Campisi, GE’s talent resides in three communities which have
different data usage patterns.
1) The data science community.
This community is focused on a very specific item or outcome they are trying to solve, or a question
they are trying to answer. The objective of the data science community is to leverage the Data Lake to
look for the answer to the specific problem.
2) The software engineering community.
This community will operationalise the models created by the data science community into an
analytic application.
3) The traditional business intelligence community.
This community connects to the Data Lake in order to unlock and answer questions that are more
traditional in nature.
Getting all the plumbing right with Data Engineers
An important component of the functioning of data and analytics within an organisation is the capability to
bridge the data management/IT group and the data science group. This capability is provided by Data
Engineers. As articulated in the article: “Data engineering is a discipline that sits in between the two, makes
data more accessible and provides the tools a data scientist would want to have. It allows the data scientist to
focus more on developing the model, developing the insight, not on how to stitch the information or stitch the
toolset to make it productive.”
Organisations lacking the combination of a Data Lake and Data Engineering capability all too often become
bogged down in data preparation efforts. The harsh reality is that Big Data is messy data, and there is no quick
and easy way around it. People often think that because the data is there, it is ready to be used, but that is
seldom the case. Campisi provided a good example of this: “You go out and hunt for these coveted data
scientists and bring them in, only to frustrate them. They spend 80% of time trying to organize the
information. One of our first use cases, before using our current approach with the data lake plus data
engineering we went through 10 months of organizing data and figuring out where it existed and breaking
down silos, in order for someone to actually go after the outcome. It’s not effective.”
To paraphrase the Ancient Mariner: without a Data Lake and Data Engineering capability, organisations can
easily find themselves with “data, data, every where, nor any drop to drink”.
Finding people is a challenge
One of GE’s major challenges has been acquiring capable people in the data and analytics domain, a challenge made
worse by the scale at which GE operates. As stated in the article: “Anybody who can spell ‘Hadoop’ is
heavily recruited. It’s hard to find people who’ve really done it at the scale we’re talking about and looking to
do it, so even in the data management space, it’s hard to find talent at the levels we’re constantly searching
for.” Organisations considering efforts in the data and analytics space should clearly not refrain
from doing so, but they are well advised to give as much consideration to the human talent component as to the
technology component.
Data governance not to be underestimated
Aside from the challenges of finding the right people, being awash with data brings its own set of challenges.
According to the article, these data governance challenges are dictating the speed at which GE is able to scale
its data and analytics initiative. Also worth noting is that many of these challenges are being brought on by
technology that is so new that there is no precedent for how they should be addressed. Addressing these data
governance challenges for the first time, and doing so consistently, is a critical consideration for organisations
looking to exploit opportunities in data and analytics; the difference between those that succeed and
those that fail could well rest on the strength or weakness of the organisation’s data governance foundation.
Summary
The article clearly demonstrates the opportunities open to organisations pursuing data and analytics
initiatives. While Big Data has been enabled by technologies like Hadoop, challenges are arising on two fronts.
First, organisations face challenges finding people skilled in this environment. Second, data governance
challenges are increasing in number and evolving in complexity. While these challenges are not trivial, those
organisations that successfully navigate them will be rewarded with opportunities yet to be
discovered.
Appendix A:
Volume refers to the quantity (gigabytes, terabytes, petabytes etc.) of data that organisations are trying to
harness. Importantly, there is no specific measure of volume that defines Big Data, as what constitutes truly
“high” volume varies by industry and even geography. What is clear is that data volumes continue to rise.
Variety refers to the different types (forms) of data and data sources. Data types include
numeric, text, image, audio, web, log files etc., whether structured or unstructured. The growth of data
sources such as social media, smart devices, sensors and the Internet of Things has increased not only
the volume of data but also the number of data types.
Velocity refers to the speed at which data is created, processed and analysed. Velocity impacts latency, which is
the lag time between when data is created or captured and when it is processed into an output form for
decision-making purposes. Importantly, certain types of data must be analysed in real time to be of value to
the business, a task that places impossible demands on traditional systems, where the ability to capture, store
and analyse data in real time is severely limited.
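The definition of latency above amounts to a subtraction of two timestamps. The short Python sketch below works through it; the timestamps and the latency budgets are made-up values for illustration and do not come from the article or from IBM.

```python
from datetime import datetime, timedelta


def latency(created_at, processed_at):
    """Lag between when data is created and when it is processed for a decision."""
    return processed_at - created_at


def within_budget(created_at, processed_at, budget):
    """True if the data was processed soon enough to still be of value."""
    return latency(created_at, processed_at) <= budget


# Hypothetical sensor reading created at noon and processed 45 seconds later.
created = datetime(2013, 11, 1, 12, 0, 0)
processed = datetime(2013, 11, 1, 12, 0, 45)

lag = latency(created, processed)
print(lag.total_seconds())
print(within_budget(created, processed, timedelta(minutes=1)))
```

Whether a given lag is acceptable depends on the decision the data supports, so the budget is passed in rather than fixed.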
Veracity refers to the level of reliability associated with certain types of data. According to IBM, some data is
inherently uncertain, for example: sentiment and truthfulness in humans; GPS sensors bouncing among the
skyscrapers of Manhattan; weather conditions; economic factors; and the future. When dealing with these
types of data, no amount of data cleansing can correct for the uncertainty. Yet despite the uncertainty, the data
still contains valuable information. The need to acknowledge and embrace this uncertainty is a hallmark of Big Data.
(IBM, 2012, p. 5)
References:
IBM. (2012). Four dimensions of big data [Diagram]. In Analytics: the real-world use of big data [pdf].
Retrieved from http://public.dhe.ibm.com/common/ssi/ecm/en/gbe03519usen/GBE03519USEN.PDF
IBM. (2012). Analytics: the real-world use of big data [pdf].
Retrieved from http://public.dhe.ibm.com/common/ssi/ecm/en/gbe03519usen/GBE03519USEN.PDF

Overview of mit sloan case study on ge data and analytics initiative titled gone fishing - for data

  • 1. 1 Introduction This document is based on the MIT Sloan Management Review article on data and analytics at GE titled “Gone Fishing - For Data”. The document should be viewed as a summary of some of the key points from the article. The full case study is available at: http://sloanreview.mit.edu/article/gone-fishing-for-data/ Before getting into the details of GE’s data and analytics efforts, a quick detour is in order to first establish what is meant by the term “Big Data”. Big Data a definition In simple terms Big Data refers to a data environment that cannot be handled by traditional technologies. Big Data is frequently described in terms of the three V’s, and if you are at IBM, it is likely to be the four V’s. Figure 1 below illustrates the IBM four V representation of Big Data: Figure 1: Big Data in dimensions Figure 1. Four dimensions of big data. Copyright 2012 by IBM. Reprinted with permission. Please see Appendix A for further elaboration on each of the four V’s. GE’s objective Turning to GE’s data and analytics efforts, the company uses sensors to collect data about the performance of its industrial equipment, including turbines, jet engines and factory floors. Ultimately the company’s efforts are aimed at being able to sell services to its customers based on detailed analysis of data streaming from its equipment and the ability to predict failures and other key events. To get things going In November 2013, GE set out to connect with 25 airlines and to collect and manage engine data from 3.4 million flights. To do this GE had to build a Data Lake (see table for definition) and it did so with what GE’s Vince Campisi calls “a two-pizza team,” meaning, a team no bigger than the number of people you could feed off of two pizzas.
  • 2. 2 Seventy days later GE had created a Data Lake which provided the company with the ability to ingest and connect the full flight data from the engines, and also integrate the engine data with maintenance visits and parts information. This data was then provided to GE’s data science community to look at things that were reducing time on wing for customers. What is a Data Lake? A Data Lake is a central source in which data can be used in a variety of ways for many different internal customers, some currently of interest, others to be discovered in the future. Importantly a Data Lake provides the organisation with the centralization of data, a capability required in order to break down unwanted data silos. The growing use of Data Lakes has been made possible by the relatively low cost of large-scale storage on Hadoop. A Data Lake brings a different paradigm As articulated in the article, when using a Data Lake, the data is collected in its raw format and there is no modelling (structuring) of the data up front like what would be done in a traditional data warehouse. Using such an approach GE takes the position that they don’t understand the relationships that matter and don’t understand fully what they are going to find when they bring all of these data sets together. In summary GE’s Data Lake approach is all about collecting data in its raw format, pumping it into one place in order to break down data silos, and then modelling the data based on the outcome they are trying to solve for. More than just a technology solution Moving beyond merely the technology solution GE also addressed organisational culture as well as the hiring and development of analytics talent. According to Campisi GE’s talent resides in three communities which have different data usage patterns. 1) The data science community. This community is focused on a very specific item or outcome they are trying to solve, or a question they are trying to answer. 
The objective of the data science community is to leverage the Data Lake to look for the answer to the specific problem.

2) The software engineering community. This community will operationalise the models created by the data science community into an analytic application.

3) The traditional business intelligence community, which connects to the Data Lake in order to unlock and answer questions that are more traditional in nature.

Getting all the plumbing right with Data Engineers
An important component of the functioning of data and analytics within an organisation is the capability to bridge the data management/IT group and the data science group. This capability is provided by Data Engineers. As articulated in the article:

“Data engineering is a discipline that sits in between the two, makes data more accessible and provides the tools a data scientist would want to have. It allows the data scientist to focus more on developing the model, developing the insight, not on how to stitch the information or stitch the toolset to make it productive.”

Organisations lacking the combination of a Data Lake and Data Engineering capability all too often become bogged down in data preparation efforts. The harsh reality is that Big Data is messy data and there is no quick and easy way around it. People often think that because the data is there, it is ready to be used - but that is seldom the case. Campisi provided a good example of this:

“You go out and hunt for these coveted data scientists and bring them in, only to frustrate them. They spend 80% of time trying to organize the information. One of our first use cases, before using our current approach with the data lake plus data engineering we went through 10 months of organizing data and figuring out where it existed and breaking down silos, in order for someone to actually go after the outcome. It’s not effective.”

To paraphrase the Ancient Mariner, without a Data Lake and Data Engineering capability organisations can easily find themselves in the situation of:

Data, data, every where,
Nor any drop to drink.

Finding people is a challenge
One of GE’s major challenges has been acquiring capable people in the data and analytics domain. This is made worse by the scale at which GE is doing things. As stated in the article:

“Anybody who can spell “Hadoop” is heavily recruited. It’s hard to find people who’ve really done it at the scale we’re talking about and looking to do it, so even in the data management space, it’s hard to find talent at the levels we’re constantly searching for.”

Organisations considering efforts in the data and analytics space clearly should not refrain from doing so, but are well advised to give as much consideration to the human talent component as to the technology component.

Data governance not to be underestimated
Aside from the challenges of finding the right people, being awash with data brings its own set of challenges. According to the article, these data governance challenges are dictating the speed at which GE is able to scale its data and analytics initiative. Also worth noting is that many of these challenges are being brought on by technology that is so new that there is no precedent for how they should be addressed.
Addressing these data governance challenges for the first time, and doing so consistently, is a critical consideration for organisations looking to exploit opportunities in data and analytics, where the difference between those that succeed and those that fail could well rest on the strength or weakness of the organisation’s data governance foundation.

Summary
The article clearly demonstrates the opportunities opening up to organisations pursuing data and analytics initiatives. While Big Data has been enabled by technologies like Hadoop, challenges are arising on two fronts. Firstly, organisations face challenges finding people skilled in this environment. Secondly, data governance challenges are increasing in number and evolving in complexity. While these challenges are not trivial, those organisations that successfully navigate them will be rewarded with opportunities yet to be discovered.
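
The raw-first Data Lake approach described in this document (collect data as it arrives, then model it for a specific outcome) can be illustrated with a small sketch. This is not GE’s implementation: the record layout, the field names (engine_id, egt_c, visit) and both function names are invented for illustration, and a Python list stands in for large-scale storage such as Hadoop.

```python
import json
from collections import defaultdict

# Hypothetical raw records, kept exactly as they arrived (no upfront schema).
RAW_LAKE = [
    '{"source": "flight", "engine_id": "E1", "egt_c": 640}',
    '{"source": "flight", "engine_id": "E1", "egt_c": 700}',
    '{"source": "flight", "engine_id": "E2", "egt_c": 610}',
    '{"source": "maintenance", "engine_id": "E1", "visit": "2014-01-10"}',
]


def ingest(lake, raw_line):
    """Append a record untouched; its structure is decided later, at read time."""
    lake.append(raw_line)


def max_egt_with_visits(lake):
    """Model the raw data for one question: peak temperature and visit count per engine."""
    summary = defaultdict(lambda: {"max_egt_c": None, "visits": 0})
    for line in lake:
        record = json.loads(line)  # parsing happens only when a question is asked
        entry = summary[record["engine_id"]]
        if record["source"] == "flight":
            current = entry["max_egt_c"]
            reading = record["egt_c"]
            entry["max_egt_c"] = reading if current is None else max(current, reading)
        elif record["source"] == "maintenance":
            entry["visits"] += 1
    return dict(summary)
```

A different question (for example, parts usage per maintenance visit) would get its own read-time function over the same untouched records, which is the sense in which modelling follows the outcome rather than preceding ingestion.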
Appendix:

Volume refers to the quantity (gigabytes, terabytes, petabytes etc.) of data that organizations are trying to harness. Importantly, there is no specific measure of volume that defines Big Data, as what constitutes truly “high” volume varies by industry and even geography. What is clear is that data volumes continue to rise.

Variety refers to different types (forms) of data and data sources. Data types include numeric, text, image, audio, web, log files etc., whether structured or unstructured. The growth of data sources such as social media, smart devices, sensors and the Internet of Things has not only resulted in increases in the volume of data but increases in the types of data as well.

Velocity refers to the speed at which data is created, processed and analysed. Velocity impacts latency, which is the lag time between when data is created or captured, and when it is processed into an output form for decision making purposes. Importantly, certain types of data must be analysed in real-time to be of value to the business, a task that places impossible demands on traditional systems where the ability to capture, store and analyse data in real-time is severely limited.

Veracity refers to the level of reliability associated with certain types of data. According to IBM some data is inherently uncertain, for example: sentiment and truthfulness in humans; GPS sensors bouncing among the skyscrapers of Manhattan; weather conditions; economic factors; and the future. When dealing with these types of data, no amount of data cleansing can correct for it. Yet despite uncertainty, the data still contains valuable information. The need to acknowledge and embrace this uncertainty is a hallmark of Big Data. (IBM, 2012, p. 5)
Reference:

IBM. (2012). Four dimensions of big data. [Diagram]. Retrieved from IBM. (2012). Analytics: the real-world use of big data. [pdf]. Retrieved from http://public.dhe.ibm.com/common/ssi/ecm/en/gbe03519usen/GBE03519USEN.PDF

IBM. (2012). Analytics: the real-world use of big data. [pdf]. Retrieved from http://public.dhe.ibm.com/common/ssi/ecm/en/gbe03519usen/GBE03519USEN.PDF