
Data as the New Oil: Producing Value in the Oil and Gas Industry

4,761 views

Oil and gas exploration and production activities generate large amounts of data from sensors, logistics, business operations and more. Given the data volume, variety and velocity, gaining actionable and relevant insights from the data is challenging. Learn about these challenges and how to address them by leveraging big data technologies in this webinar.

During the webinar we will dive deep into approaches for predicting drilling equipment function and failure, a key step towards zero unplanned downtime. In the process of drilling wells, non-productive time due to drilling equipment failure can be expensive. We will highlight how the Pivotal Data Labs team uses big data technologies to build models for predicting drilling equipment function and failure. Models such as these can be used to build essential early warning systems to reduce costs and minimize unplanned downtime.

Panelist:
Rashmi Raghu, Senior Data Scientist, Pivotal

Hosted by:
Tim Matteson, Co-Founder -- Data Science Central

Video replay is available to watch here: http://youtu.be/dhT-tjHCr9E

Published in: Data & Analytics, Technology

Data as the New Oil: Producing Value in the Oil and Gas Industry

  1. Data as the New Oil: Producing Value for the Oil & Gas Industry
  2. Data: The New Oil
     • Oil and gas exploration and production activities generate large amounts of data from sensors, logistics, business operations and more
     • The rise of cost-effective data collection, storage and computing devices is giving an established industry a new boost
     • Producing value from big data is a challenge and an opportunity in the industry
     • The promise of data as "the new oil" is realized when we can tap into its value in a meaningful, cross-functional way to enhance decision-making, which provides the competitive advantage
     (Image: http://commons.wikimedia.org/wiki/File:Rig_wind_river.jpg)
  3. Challenges and Opportunities
     Challenges
     • Current data collection and curation practices are mostly in silos
     • Different data models for data from different functions in the organization
     • Missing or incomplete data for integrating varied data sources
     • Legacy systems that need to be taken into consideration
     • Domain expertise in silos – the ability to work across domains is needed for extracting full value from 'the new oil'
     Opportunities
     • Data Lake concepts and technology allow data to be stored centrally and curated in a meaningful way
     • Comprehensive, single view of the truth:
       – Integration of data assets leads to more informed, powerful models
       – Many "first-of-its-kind" models become possible for the business
       – These models enhance decision making by providing better predictions
     • Real-time application of predictive models can speed up responses to events
  4. Significant Use Cases
     • Predictive Maintenance
       – Model equipment function and failure
       – Optimize maintenance schedules
       – Real-time alerts based on predictive models
     • Seismic Imaging and Inversion Analysis
     • Reservoir Simulation and Management
     • Production Optimization
     • Supply Chain Optimization
     • Energy Trading
  5. Predictive Analytics for Drilling Operations: Predicting Equipment Function and Failure
  6. Predictive Analytics for Drilling Operations
     Business Goals
     • Increase efficiency, reduce costs
     • Take steps towards zero unplanned downtime
     • Predict equipment function for maintenance
     • Provide an early warning system for equipment failure
     • Optimize parameters for drilling operations
     • Reduce health, safety and environmental risks
     Big Data Sources
     • Sensor data
       – Surface and down-hole sensors
       – Measurement While Drilling (MWD)
       – SCADA data
     • Drill operator data
       – Operator comments
       – Activity log / codes
       – Incident reports / logs
     • And more …
  7. Predicting Equipment Function and Failure
     • Business Problem: Predict drilling equipment function and failure – a step towards early warning systems and zero unplanned downtime
     • Motivation: Drilling wells is expensive, and equipment failure during the process adds to the cost. Example: drilling motor damage could account for 35% of rig non-productive time (NPT) and can cost $150,000 per incident [1]
     • Goals:
       – Predict equipment function and failure → this enables:
         • Optimization of parameters for efficient drilling
         • Reducing non-productive drill time (and costs)
         • Reducing failures
       – Provide insights into prominent features impacting operation and failure
     [1] The American Oil & Gas Reporter, April 2014 Cover Story
  8. The Eightfold Path of Data Science: Four Phases and Four Differentiating Factors
     Differentiating factors
     • Technology Selection: Select the right platform and the right set of tools for solving the problem at hand
     • Iterative Approach: Perform each phase in an agile manner, team up with domain experts and SMEs, and iterate as required
     • Creativity: Take the opportunity to innovate at every phase
     • Building a Narrative: Create a fact-based narrative that clearly communicates insights to stakeholders
     Phases
     • Phase 1 – Problem Formulation: Make sure you formulate a problem that is relevant to the goals and pain points of the stakeholders
     • Phase 2 – Data Step: Build the right feature set, making full use of the volume, variety and velocity of all available data
     • Phase 3 – Modeling Step: This is where you move from answering what, where and when to answering why and what if
     • Phase 4 – Application: Create a framework for integrating the model with decision-making processes and taking action using the Internet of Things
  9. Technology Selection
     • Platform for all phases of the analytics cycle
     • Support development of complex and extensible predictive models to predict equipment function and failure
     • Provide a framework for integrating data from multiple sources across data warehouses and rig operators
     • Ability to analyze both structured and unstructured data in a unified manner. For instance:
       – Fast computation of hundreds of features over time windows within hundreds of millions (or billions / trillions) of records of time-series data
       – Natural language processing pipeline for analysis of operator comments to identify failures from unstructured text
     (Technology logos: PL/Python, PL/R)
  10. Predictive Analytics for Drilling Operations
     • Consider two examples:
       – Predicting drill rate-of-penetration (ROP)
       – Predicting drilling equipment failure
     • Primary data sources for these examples:
       – Drill rig sensor data: Depth, Rate of Penetration (ROP), RPM, Torque, Weight on Bit, etc. (billions of records)
       – Operator data: drill bit details, failure details, component details, etc. (hundreds of thousands of records)
     (Process: Data Integration → Feature Building → Modeling)
  11. Comprehensive Data Integration Framework
     • Need a comprehensive framework for data integration at scale:
       – Data cleansing – removing NULLs and outliers, missing value imputation techniques
       – Standardizing columns that are used to join across multiple data sources
     (Diagram: drill rig sensor data and operator data integrated into a single data set)
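A minimal SQL sketch of the cleansing step described on this slide, assuming a Greenplum/HAWQ-style warehouse. The table raw_sensor, its columns (rig_id, ts, depth, rop, rpm, torque), the -999.25 sentinel and the plausibility range are hypothetical placeholders, not values from the webinar.

```sql
-- Minimal cleansing sketch (hypothetical table, columns and sentinel value).
-- Drop NULLs and placeholder readings that typically flag sensor malfunction,
-- and keep only physically plausible values before integration.
CREATE TABLE sensor_clean AS
SELECT rig_id, ts, depth, rop, rpm, torque
FROM raw_sensor
WHERE depth IS NOT NULL
  AND rop IS NOT NULL
  AND depth <> -999.25            -- assumed "no reading" sentinel; varies by vendor
  AND rop BETWEEN 0 AND 1000      -- assumed plausible ROP range; tune per rig
DISTRIBUTED BY (rig_id);          -- GPDB/HAWQ-style distribution clause
```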
  12. Data Integration Challenges
     • Data sources do not use consistent entries in the features / columns that link them (join columns) – e.g. well names
     • Manually entered data (some operator data) is prone to entry errors
       – Hitting several keys
       – Key strokes not appearing (e.g. missing a character / digit)
     • Invalid values for sensor measurements
       – Invalid values could be placeholders for sensor malfunction or non-recording time
       – Duration of invalid values can range from one-off occurrences to several hours
  13. Data Integration Challenges: Standardization of Join Column Entries Across Data Sources
     • Problem: Data sources do not use consistent entries in join columns
     • Resolution options: Derive a canonical representation for the columns
       – Regular expression transformations
       – String edit distance computations → closest-distance matches
       – Plus manual correction
     • Include standardized entries in each table
     Example of inconsistent join-column entries:
       Data Source #1       Data Source #2
       A B C                A-B-C
       PARENT-TEACHER       PARENT-TEACHERS
       GRANDFATHER CLOK     GRANDFATHER_CLOCK
       KOALA 123            KOALA 122
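As an illustration of the canonicalization approach above, a hedged SQL sketch. It assumes the fuzzystrmatch extension (which provides levenshtein()) is installed; the tables wells_src1 / wells_src2 and the column well_name are hypothetical.

```sql
-- Step 1: derive a canonical form via regular-expression transformations:
-- uppercase, then strip everything that is not a letter or digit.
CREATE VIEW wells_src2_canon AS
SELECT well_name,
       regexp_replace(upper(well_name), '[^A-Z0-9]', '', 'g') AS canon_name
FROM wells_src2;

-- Step 2: for each source-2 entry, find the closest source-1 entry by
-- string edit distance (requires the fuzzystrmatch extension).
SELECT DISTINCT ON (s2.well_name)
       s2.well_name AS src2_name,
       s1.well_name AS src1_name,
       levenshtein(s2.canon_name,
                   regexp_replace(upper(s1.well_name), '[^A-Z0-9]', '', 'g')) AS dist
FROM wells_src2_canon s2
CROSS JOIN wells_src1 s1
ORDER BY s2.well_name, dist;   -- review high-distance matches manually
```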
  14. Data Integration Challenges: Manual Entry Errors
     • Problem: Manually entered data is prone to operator entry errors
       – Hitting several keys
       – Key strokes not appearing (e.g. missing a digit / character)
     • Resolution options:
       – Ignore rows if depth does not lie between the previous and next values
       – Replace the value with an interpolated result
     Example:
       Timestamp             Depth
       2014-09-01 00:06:00   13504
       2014-09-02 00:05:00   140068
       2014-09-03 00:07:00   14754
       2014-09-04 00:11:00   15388
       2014-09-05 00:16:00   16100
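A hedged sketch of the two resolution options listed above, using standard SQL window functions. The table operator_log and the columns ts / depth are hypothetical, and the interpolation is a simple midpoint rather than a time-weighted one.

```sql
-- Flag depth entries that do not lie between their neighbours, then replace
-- flagged values with a simple interpolation of the neighbouring depths.
WITH neighbours AS (
    SELECT ts,
           depth,
           lag(depth)  OVER (ORDER BY ts) AS prev_depth,
           lead(depth) OVER (ORDER BY ts) AS next_depth
    FROM operator_log
)
SELECT ts,
       CASE
           WHEN prev_depth IS NOT NULL
            AND next_depth IS NOT NULL
            AND depth NOT BETWEEN prev_depth AND next_depth
           THEN (prev_depth + next_depth) / 2.0   -- interpolated replacement
           ELSE depth                             -- keep the original reading
       END AS depth_cleaned
FROM neighbours
ORDER BY ts;
```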
  15. Understanding Correlations in Data
     • Summary statistics and correlations between variables need to be computed at scale for thousands of variable combinations
     • Able to leverage MADlib's parallel implementation of:
       – the 'summary' function
       – Pearson's correlation
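The calls below illustrate the MADlib functions the slide refers to. The summary() and correlation() invocations follow MADlib's documented interfaces, but argument lists vary slightly across versions; the table and column names are hypothetical.

```sql
-- Descriptive statistics for every column of the integrated feature table.
SELECT madlib.summary('sensor_features',           -- source table (hypothetical)
                      'sensor_features_summary');  -- output table

-- Pairwise Pearson correlations for selected sensor variables.
SELECT madlib.correlation('sensor_features',
                          'sensor_features_corr',
                          'rop, rpm, torque, weight_on_bit');
```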
  16. Big Data Machine Learning in SQL (MADlib: http://madlib.net/)
     Predictive Modeling Library
     • Linear Systems: sparse and dense solvers
     • Matrix Factorization: Singular Value Decomposition (SVD), low-rank
     • Generalized Linear Models: linear regression, logistic regression, multinomial logistic regression, Cox proportional hazards regression, elastic net regularization, sandwich estimators (Huber-White, clustered, marginal effects)
     • Machine Learning Algorithms: Principal Component Analysis (PCA), association rules (affinity analysis, market basket), topic modeling (parallel LDA), decision trees, ensemble learners (random forests), support vector machines, conditional random fields (CRF), clustering (k-means), cross validation
     Descriptive Statistics
     • Sketch-based estimators: CountMin (Cormode-Muthukrishnan), FM (Flajolet-Martin), MFV (most frequent values)
     • Correlation, summary
     Support Modules
     • Array operations, sparse vectors, random sampling, probability functions, PMML export
  17. Complex Feature Set Across Multiple Data Sources
     • Often useful to create features from time series variables rather than just using them raw
     • One such class of features is statistical features created on moving windows of time series data
     • Fast computation of features is possible on Pivotal's MPP platform leveraging window functions in native SQL (and MADlib or PL/R if needed for added functionality)
     (Diagram: a statistical feature computed over a moving time window of a sensor signal)
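A hedged sketch of such moving-window features using native SQL window frames, continuing the hypothetical sensor_clean table from the earlier cleansing sketch; a 60-row frame stands in for whatever time window the analysis actually needs.

```sql
-- Rolling statistics per rig over the preceding 60 readings (including the
-- current row). Higher moments such as skewness can be added via MADlib
-- aggregates or PL/R if plain SQL is not enough.
SELECT rig_id,
       ts,
       avg(rop)          OVER w AS rop_mean_60,
       stddev_samp(rop)  OVER w AS rop_sd_60,
       max(rop) OVER w - min(rop) OVER w AS rop_range_60,
       avg(torque)       OVER w AS torque_mean_60
FROM sensor_clean
WINDOW w AS (PARTITION BY rig_id
             ORDER BY ts
             ROWS BETWEEN 59 PRECEDING AND CURRENT ROW);
```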
  18. Complex Feature Set Across Multiple Data Sources
     • Drill rig sensor data: Depth, Rate of Penetration, Torque, Weight on Bit, RPM, …
     • Operator data: drill bit details, component details, failure events, …
     • Features on time windows: mean, median, standard deviation, range, skewness, …
     → Final set of features on time windows
     Leverage GPDB / HAWQ (+ MADlib and PL/R if needed) for fast computation of hundreds of features over time windows within billions of rows of time-series data
  19. Working with Time Series Data
     • Pivotal GPDB has built-in support for dealing with time series data
       – SQL window functions: e.g. lead, lag, custom windows
       – More details in Pivotal's Time Series Analysis blogs: http://blog.pivotal.io/tag/time-series-analysis
     • Aggregations: by time slice or by custom window; example aggregates: avg, median, variance
     • Mapping: which time slice does an observation at a particular timestamp map to?
     • Pattern detection, rolling averages, gap filling and interpolation, running accumulations
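To make the "mapping" and "gap filling" items concrete, a hedged sketch: bucket each observation into an hourly slice, then left-join against a generated slice series so empty slices surface explicitly. Table and column names remain hypothetical, and the date range and rig filter are arbitrary.

```sql
-- Map each observation to an hourly time slice and aggregate per slice.
WITH sliced AS (
    SELECT date_trunc('hour', ts) AS slice_start,
           avg(rop) AS rop_avg
    FROM sensor_clean
    WHERE rig_id = 42                      -- hypothetical single rig for brevity
    GROUP BY 1
),
-- Generate the full series of slices so gaps show up as NULLs
-- that can be interpolated downstream.
slices AS (
    SELECT timestamp '2014-09-01 00:00' + (i * interval '1 hour') AS slice_start
    FROM generate_series(0, 23) AS i       -- one day of hourly slices
)
SELECT s.slice_start, d.rop_avg            -- NULL rop_avg marks a gap to fill
FROM slices s
LEFT JOIN sliced d ON d.slice_start = s.slice_start
ORDER BY s.slice_start;
```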
  20. Predictive Analytics for Drilling Operations
     Predict function
     • Predict rate-of-penetration (ROP)
       – Linear Regression
       – Elastic Net Regularized Regression (Gaussian)
       – Support Vector Machines
     Predict failure
     • Predict occurrence of equipment failure in a chosen future time window
       – Logistic Regression
       – Elastic Net Regularized Regression (Binomial)
       – Support Vector Machines
     • Predict remaining life of equipment
       – Cox Proportional Hazards Regression
     Elastic Net Regularized Regression
     • Fits the problem statements
     • Ease of interpretation, scoring and operationalization
     • Provides probability of failure in the binomial case
     • Leveraged MADlib's in-database parallel implementation
  21. Background on Elastic Net Regularization
     • Elastic Net regularization seeks to find a weight vector w that, for any given training example set, minimizes
       L(w) + λ [ α·||w||_1 + ((1−α)/2)·||w||_2² ]
       where α ∈ [0,1], λ ≥ 0 and L(w) is the linear/logistic objective function
     • If α = 0 → ridge regularization
     • If α = 1 → LASSO regularization
     • Available in MADlib: http://doc.madlib.net/latest/group__grp__elasticnet.html
     Ordinary Least Squares
     • Advantages: unbiased estimators; significance levels for coefficients
     • Limitations: highly affected by multi-collinearity; requires more records than predictors; no built-in feature selection
     Elastic Net Regularization
     • Advantages: biased estimators but smaller MSE; fewer limitations on the number of predictors; better at handling multi-collinearity; feature selection
     • Limitations: multiple parameters to tune; no significance levels for coefficients
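A hedged sketch of fitting the binomial elastic net in-database with MADlib, as the previous slides mention. The argument order follows MADlib's documented elastic_net_train() interface, though exact parameters differ across versions; the feature table drill_features, its columns, the failure label and the alpha/lambda values are hypothetical.

```sql
-- Train a binomial elastic net on the windowed feature set.
SELECT madlib.elastic_net_train(
    'drill_features',                 -- source table (hypothetical)
    'failure_model',                  -- output table for coefficients
    'failed_next_24h',                -- dependent variable: boolean failure label
    'array[rop_mean_60, rop_sd_60, torque_mean_60, wob_mean_60]',  -- independent variables
    'binomial',                       -- regression family (use 'gaussian' for ROP)
    0.5,                              -- alpha: mix between L1 (LASSO) and L2 (ridge)
    0.1                               -- lambda: overall regularization strength
);

-- The fitted coefficients and intercept land in the output table; inspecting
-- them shows which windowed features drive predicted failure risk.
SELECT * FROM failure_model;
```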
  22. Predictive Analytics for Drilling Operations
     • Predict ROP (chart: ROP time series, actual vs. predicted)
     • Predict equipment failure (chart: ROC curve)
  23. Data Science Platform and Technology Summary
     (Logo slide: platform, PL/Python, PL/R, visualization tooling)
  24. One step closer to zero unplanned downtime …
     • Ability to fully utilize big data – volume, variety and velocity
     • Comprehensive data integration framework for multiple complex data sources
     • Learn and implement best practices for:
       – Data governance policy
       – Data capture techniques, flow, and curation
       – Platform and toolset for the data fabric
     • Build and operationalize complex and extensible predictive models
     Business Impacts
     • Improve efficiency, reduce costs and risks
     • Gain competitive advantage by leveraging the full big data analytics pipeline
  25. A NEW PLATFORM FOR A NEW ERA
