
Transforming into a data-driven enterprise: paths to success

Organisations of all kinds recognise that they must rapidly digitise their businesses to remain competitive in the face of massive technological change. They must develop new business models and new routes to customer and partner engagement using the power of digital. That is why over 80% of the CEOs of large European companies have digital transformation (DX) at the centre of their corporate strategy. As part of this shift, forward-thinking companies are investing heavily in becoming data-driven organisations, building an evidence-based culture that expands their capacity to collect, analyse and monetise data in areas such as enhancing customer experiences, empowering the workforce and rethinking business models.

Published in: Business

  1. Transforming into a data-driven enterprise: Paths to success. Philip Carnelley, Research Director, IDC; Michael Wrisley, Analytic Sales Enablement Director, Intel; Wim Stoop, Senior Product Marketing Manager, Cloudera. © Cloudera, Inc. All rights reserved.
  2. Transforming into a data-driven enterprise: paths to success. Philip Carnelley, Research Director, IDC Europe
  3. Digital Transformation: A Board Level Agenda Item. 80% of large European companies have “DX” at the heart of their corporate strategy. Source: IDC, European DX Survey 2017. © IDC. Visit us at IDC.com and follow us on Twitter: @IDC
  4. Many Organizations Are at a DX Deadlock. Digital Resister 21%, Digital Explorer 26%, Digital Player 29%, Digital Transformer 18%, Digital Disrupter 6%. Highlighted figure: 55%. Source: IDC, European Digital Transformation Maturity Model Benchmark, 2017; n=403, May 2017
  5. Getting the Pulse to Test Our Ideas. 750 business and IT leaders across Western Europe, all major industries. [Bar chart, axis 0 to 200: Finance and Insurance; Telco and Media; Public Sector / Government; Retail; Energy and Utilities; Manufacturing and Automotive.] Source: IDC survey for Cloudera and Intel, 2017
  6. Recognizing the Significance of Big Data Analytics to Digital Transformation. Rated “Important/Very Important”: 43% now, 70% in 2 years. Source: IDC survey for Cloudera and Intel, 2017
  7. The New Digital Platform. [Diagram: EXTERNAL PROCESSES; Connected Processes, Assets, People; INTERNAL PROCESSES; INTELLIGENT CORE; Mobile, IoT, AR/VR, BOT, API.] Source: IDC
  8. The New Digital Platform. [Diagram as on the previous slide, with the INTELLIGENT CORE expanded: Databases, Data Streams, Big Data, AI/ML, Analytics, Decision Support.] Source: IDC
  9. But … [Chart values: 12%, 44%, 33%, 11%. Chart labels: Still exploring; Used in isolated pockets; Enterprise-wide platform being established; Platform available to customers and partners.] Barriers: infrastructure is unsuitable 37%; skills issues 44%; uncoordinated 55%. Source: IDC survey for Cloudera and Intel, 2017
  10. Paths to Success: adopt a flexible hybrid deployment model; seek to exploit advanced analytics / AI; choose a suitable platform for advanced analytics.
  11. What Do We Mean By AI? AI can be viewed in three layers (AI, ML, DL):
      • Artificial intelligence: the broadest term, applying to any technique that enables computers to mimic human intelligence. More precisely, AI is the study and development of software and hardware that attempts to emulate a human being in learning and reasoning.
      • Machine learning: a subset of AI. The process of creating a statistical model from various types of data that performs various functions without having to be programmed by a human. Machine learning models are "trained" by various types of data (often, lots of data). This category includes deep learning.
      • Deep learning: the subset of machine learning composed of algorithms that permit software to train itself to perform tasks, like speech and image recognition, without specifying outcomes or goals. These generally rely on the input of large amounts of data.
      Cognitive computing / AI software systems are self-learning, reasoning systems that can augment or replace human decision-making in situations that involve complexity, very high information volumes, and/or uncertainty. They are adaptive, iterative and contextual, and make a new class of problems computable. AI systems learn as they operate. They replace logic with data as the primary behavior driver. They are therefore critically dependent on (big) data.
  12. Establish a Flexible, Hybrid Deployment Model. Source: IDC survey for Cloudera and Intel, 2017
  13. Seek to Exploit Advanced Analytics, AI and Machine Learning. 46% using open source data science frameworks and languages. [Chart categories: Descriptive, Predictive, Prescriptive, Cognitive analytics. Series: Using now, Planning to use. Values as extracted: 94%, 74%, 17%, 9%, 5%, 20%, 31%, 17%.] Source: IDC survey for Cloudera and Intel, 2017
  14. Establish a Suitable Platform for Big Data, Advanced Analytics and AI. 25%: a quarter of organisations believe data science to be very or extremely important to their Big Data Analytics environment. This will grow. 31%: almost one third of organisations plan to use self-learning and AI techniques, e.g. deep learning and neural nets. Standard hardware platforms have a key role to play.
  15. Recap: Paths to Success: adopt a flexible hybrid deployment model; seek to exploit advanced analytics / AI; choose a suitable platform for advanced analytics.
  16. Digital Winners are Leaders in Information. The more mature an organization is in its information strategy, the more impactful its digital transformation efforts are. Source: IDC Custom Research 2016
  17. Philip Carnelley, Research Director, Enterprise Software, IDC Europe. pcarnelley@idc.com, @PCarnelley
  18. Data Center Group. Michael Wrisley, Industry Technical Specialist
  19. Intel® Xeon® Scalable Processors: scalable performance for widest variety of AI & other datacenter workloads, including deep learning. The AI you need, on the chip you know.
      • Built-in ROI: begin your AI journey today using existing, familiar infrastructure.
      • Potent Performance: DL training in hours rather than days, with up to 113X performance vs. prior gen [2] (2.2x excluding optimized SW [1]).
      • Production Ready: robust support for full range of AI deployments.
      [1, 2] Configuration details on slide: 4, 5, 6. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit: http://www.intel.com/performance Source: Intel measured as of November 2016.
      Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice Revision #20110804
  20. Intel’s Role in Accelerating Analytics & AI: Holistic Strategy from Edge-Cloud to the Enterprise. Artificial Intelligence Solutions:
      • Co-Optimizing Applications
      • Optimized Libraries: Intel® MKL¥, Intel® MKL-DNN¥, Intel® DAAL¥, Intel® Distribution for Python*, Intel® Nervana™ Graph, Movidius MvTensor Library, MLlib*, BigDL
      • Open Source Enabling
      • Hardware/Software: Networking, Lake Crest, Compute, Memory & Storage
      ¥Note: Intel® Data Analytics Acceleration Library, Intel® Math Kernel Library, Intel® Math Kernel Library for Deep Neural Networks, BigDL: Distributed Deep Learning on Apache Spark*, MLlib: Apache Spark’s Scalable Machine Learning Library. *Other names and brands may be claimed as the property of others.
  21. BigDL – DL On Your Existing Infrastructure, Now. Make deep learning more accessible to big data and data science communities. Continue the use of familiar SW tools and HW infrastructure to build deep learning applications:
      • Analyze “big data” using deep learning on the same Apache Hadoop*/Spark* cluster where the data are stored
      • Add deep learning functionalities to the Big Data (Spark) programs and/or workflow
      • Leverage existing Hadoop/Spark clusters to run deep learning applications
      • Dynamically share with other workloads (e.g., ETL, data warehouse, feature engineering, statistical machine learning, graph analytics, etc.)
      *Other names and brands may be claimed as the property of others.
  22. BigDL Industry Support – Start Today! Technology; Cloud Service Providers; End Users
  23. More Resources: www.intel.com/bigdata, www.intel.com/ai, www.intel.com/software. Thank You!
  24. Cloudera Enterprise: the modern platform for machine learning and analytics, optimized for the cloud.
      • Extensible services: Data Engineering, Operational Database, Analytic Database, Data Science
      • Core services: Data Catalog, Ingest & Replication, Security, Governance, Workload Management
      • Storage services: Amazon S3, Microsoft ADLS, HDFS, Kudu
  25. Open platform services. Built for multi-function analytics | Optimized for cloud
      • Unified security – protects sensitive data with consistent controls, even for transient and recurring workloads
      • Consistent governance – enables secure self-service access to all relevant data and increases compliance
      • Easy workload management – increases user productivity and boosts job predictability
      • Flexible ingest and replication – aggregates a single copy of all data, provides disaster recovery, and eases migration
      • Shared catalog – defines and preserves structure and business context of data for new applications and partner solutions
  26. 5 keys to success: 1) Build a data-driven culture 2) Develop the right team and skills 3) Be agile/lean in development 4) Leverage DevOps for production 5) Right-size data governance
  27. World-class training, services, and support. Cloudera University: 3 top big data certifications. Professional Services: fastest route from zero to production. Cloudera Support: SCP-certified support anywhere in the world.
  28. Continued machine learning innovation. Applied research for machine learning and data science: published research subscription service; delivers cutting edge advances in applied ML / AI; accelerates adoption in large enterprises; drives demand for our platform.
  29. Thank you
  30. Data Center Group
  31. Notices and Disclaimers. Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. Check with your system manufacturer or retailer or learn more at intel.com. No computer system can be absolutely secure. Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your purchase. For more complete information about performance and benchmark results, visit http://www.intel.com/performance. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit http://www.intel.com/performance. Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors.
      Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Cost reduction scenarios described are intended as examples of how a given Intel-based product, in the specified circumstances and configurations, may affect future costs and provide cost savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction. Intel does not control or audit third-party benchmark data or the web sites referenced in this document. You should visit the referenced web site and confirm whether referenced data are accurate. © 2017 Intel Corporation. 3D XPoint, Arria, the Arria logo, Intel, the Intel logo, Intel Nervana, Intel Optane, Intel RealSense, Intel Xeon Phi, Stratix and Xeon are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as property of others.
  32. Notices and Disclaimers. Slide 23, under Potent Performance, current footnote #1 (2.2x performance): 2.2X higher deep learning training and inference performance than the prior generation.
      • Platform: 2S Intel® Xeon® Platinum 8180 CPU @ 2.50GHz (28 cores), HT disabled, turbo disabled, scaling governor set to “performance” via intel_pstate driver, 384GB DDR4-2666 ECC RAM. CentOS Linux* release 7.3.1611 (Core), Linux kernel 3.10.0-514.10.2.el7.x86_64. SSD: Intel® SSD DC S3700 Series (800GB, 2.5in SATA 6Gb/s, 25nm, MLC). Performance measured with: Environment variables: KMP_AFFINITY='granularity=fine, compact', OMP_NUM_THREADS=56, CPU Freq set with cpupower frequency-set -d 2.5G -u 3.8G -g performance.
      • Compared with platform: 2S Intel® Xeon® CPU E5-2699 v4 @ 2.20GHz (22 cores), HT enabled, turbo disabled, scaling governor set to “performance” via acpi-cpufreq driver, 256GB DDR4-2133 ECC RAM. CentOS Linux release 7.3.1611 (Core), Linux kernel 3.10.0-514.10.2.el7.x86_64. SSD: Intel® SSD DC S3500 Series (480GB, 2.5in SATA 6Gb/s, 20nm, MLC). Performance measured with: Environment variables: KMP_AFFINITY='granularity=fine, compact,1,0', OMP_NUM_THREADS=44, CPU Freq set with cpupower frequency-set -d 2.2G -u 2.2G -g performance.
      • Neon: ZP/MKL_CHWN branch commit id:52bd02acb947a2adabb8a227166a7da5d9123b6d. Dummy data was used. The main.py script was used for benchmarking, in mkl mode. ICC version used: 17.0.3 20170404, Intel® MKL small libraries version 2018.0.20170425; Inference and training throughput uses FP32 instructions.
  33. Notices and Disclaimers. Slide 23, under Potent Performance, current footnote #2 (113x): https://www.intel.com/content/www/us/en/benchmarks/server/xeon-scalable/xeon-scalable-artificial-intelligence.html Values below are listed as: Platinum 8180 system | E5-2699 v4 system.
      • Platform: 2S Intel® Xeon® Platinum 8180 processor CPU @ 2.50GHz (28 cores) | 2S Intel® Xeon® CPU E5-2699 v4 @ 2.20GHz (22 cores)
      • Hyper Threading: HT disabled | HT enabled
      • Turbo: Turbo disabled | Turbo disabled
      • Driver: Scaling governor set to “performance” via intel_pstate driver | Scaling governor set to “performance” via acpi-cpufreq driver
      • Memory: 384GB DDR4-2666 ECC RAM | 256GB DDR4-2133 ECC RAM
      • OS: CentOS* Linux release 7.3.1611 (Core) | CentOS* Linux release 7.3.1611 (Core)
      • Kernel: Linux kernel 3.10.0-514.10.2.el7.x86_64 | Linux kernel 3.10.0-514.10.2.el7.x86_64
      • SSD: Intel® SSD DC S3700 Series (800GB, 2.5in SATA 6Gb/s, 25nm, MLC) | Intel® SSD DC S3500 Series (480GB, 2.5in SATA 6Gb/s, 20nm, MLC)
      • Performance Measurement Command Variables: Environment variables: KMP_AFFINITY='granularity=fine, compact', OMP_NUM_THREADS=56, CPU Freq set with cpupower frequency-set -d 2.5G -u 3.8G -g performance | Environment variables: KMP_AFFINITY='granularity=fine, compact,1,0', OMP_NUM_THREADS=44, CPU Freq set with cpupower frequency-set -d 2.2G -u 2.2G -g performance
      • Caffe Revision: Caffe: (http://github.com/intel/caffe/), revision f96b759f71b2281835f690af267158b82b150b5c | Caffe: (http://github.com/intel/caffe/), revision f96b759f71b2281835f690af267158b82b150b5c
      • Other Arguments: Training measured with “caffe time” command. Caffe run with “numactl -l”. | Training measured with “caffe time” command.
      • Dataset: For “ConvNet” topologies, dummy dataset was used. For other topologies, data was stored on local storage and cached in memory before training. | Same.
      • Topologies: Topology specs from https://github.com/intel/caffe/tree/master/models/intel_optimized_models (GoogLeNet v1) | Same.
      • Compiler: Intel C++ compiler ver. 17.0.2 20170213 | GCC 4.8.5
      • Library: Intel® MKL small libraries version 2018.0.20170425 | Intel® MKL small libraries version 2017.0.2.20170110
  34. Notices and Disclaimers. Decision Support Workload Performance Comparison: Hardware Configuration. Values are listed as: Platinum 8160 | E5-2699 v4.
      • Number of Nodes in Cluster: 4 (1 master + 3 workers) | 4 (1 master + 3 workers)
      • Number of Sockets per Node: 2 | 2
      • Number of Cores per Node: 48 Cores / 96 Threads | 44 Cores / 88 Threads
      • Clock: 2.1 GHz (3.70 GHz Max) | 2.2 GHz (3.60 GHz Max)
      • Cache: 33 MB L3 Cache | 55 MB Smart Cache
      • Memory: 384GB DDR4 (12 x 32GB, 2666 MT/s) | 384GB DDR4 (24 x 16GB, 2133 MT/s)
      • Storage: 8x800GB SATA SSD | 8x800GB SATA SSD
      • Network: 10 Gigabit | 10 Gigabit
  35. Notices and Disclaimers. Decision Support Workload Performance Comparison: BIOS Knob. Values are listed as: SKX | BDX.
      • BIOS version: SE5C620.86B.01.00.0470.040720170855 | SE5C610.86B.01.01.0018.072020161249
      • Hyper-Threading: Enabled | Enabled
      • Other Options: Default | Default
  36. Notices and Disclaimers. Decision Support Workload Performance Comparison: Software Configuration. The slide shows two tables, and every value is identical on SKX and BDX. The first table appears to correspond to Software Stack A and the second to Software Stack B.
      • First table: OS CentOS 7.3; Kernel 3.10.0-514.el7.x86_64; Java Oracle JDK 1.8.0_121; Hadoop 2.7.3; File System HDFS; Hive 2.0.0; Spark 1.6.3
      • Second table: OS CentOS 7.3; Kernel 3.10.0-514.el7.x86_64; Java Oracle JDK 1.8.0_121; Hadoop 2.7.3; File System HDFS; Hive 3.0.0-SNAPSHOT (commit id: 3330403); Spark 2.0.2
      * Software Stack A: old software stack with old software component versions. ** Software Stack B: new software stack with upgraded software component versions (more software optimizations included, such as Hive Parquet Vectorization).
  37. Notices and Disclaimers. TPCx-BB and Hibench System Configuration: Hardware (each data node). Where two values are given, they are listed as: E5-2697v4 (BDX) | Xeon Platinum 8168 (SKX).
      • Nodes: 8
      • Number of Sockets: 2
      • Number of Cores / Socket: 18 Cores / 36 Threads | 24 Cores / 48 Threads
      • Clock: 2.3 GHz | 2.7 GHz
      • L3 Cache: 45 MB | 33 MB
      • Memory: 768 GB (24 * 32GB Samsung DIMMs @ 2133/2400MHz) | 768 GB (12 * 64GB Micron DIMMs @ 2400MHz)
      • Data Storage (SATA3 SSDs): 2 * 2 TB + 2 * 1 TB
      • Network: 1 * 10 Gbps Ethernet
  38. Notices and Disclaimers. BigBench and Hibench System Configuration: Software. OS CentOS release 7.3; Kernel 3.10.0-514.el7.x86_64; Java 1.8.0_131; Python 2.7.5; Hadoop 2.7.3; File System HDFS; Spark 2.2.0
  39. Data Scientists: Libraries, Frameworks & Tools. Find out more at software.intel.com/ai
      • Intel® Math Kernel Library (Intel® MKL). Overview: high performance math primitives granting low level of control. Primary audience: consumed by developers of higher level libraries and applications. Example usage: framework developers call matrix multiplication, convolution functions.
      • MKL-DNN. Overview: free open source DNN functions for high-velocity integration with deep learning frameworks. Primary audience: consumed by developers of the next generation of deep learning frameworks. Example usage: new framework with functions developers call for max CPU performance.
      • Intel® MLSL. Overview: primitive communication building blocks to scale deep learning framework performance over a cluster. Primary audience: deep learning framework developers and optimizers. Example usage: framework developer calls functions to distribute Caffe training compute across an Intel® Xeon Phi™ cluster.
      • Intel® Data Analytics Acceleration Library (DAAL). Overview: broad data analytics acceleration object oriented library supporting distributed ML at the algorithm level. Primary audience: wider data analytics and ML audience, algorithm level development for all stages of data analytics. Example usage: call distributed alternating least squares algorithm for a recommendation system.
      • Intel® Distribution. Overview: most popular and fastest growing language for machine learning. Primary audience: application developers and data scientists. Example usage: call scikit-learn k-means function for credit card fraud detection.
      • Open Source Frameworks. Overview: toolkits driven by academia and industry for training machine learning algorithms. Primary audience: machine learning app developers, researchers and data scientists. Example usage: script and train a convolution neural network for image recognition.
      • Intel Deep Learning SDK. Overview: accelerate deep learning model design, training and deployment. Primary audience: application developers and data scientists. Example usage: deep learning training and model creation, with optimization for deployment on constrained end device.
      • Intel® Computer Vision SDK. Overview: toolkit to develop and deploy vision-oriented solutions that harness the full performance of Intel CPUs and SOC accelerators. Primary audience: developers who create vision-oriented solutions. Example usage: use deep learning to do pedestrian detection.
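
Slide 4 lists five maturity stages with percentage shares and a separate highlighted figure of 55%. The transcript does not say how that figure is derived. One reading, offered here only as an assumption, is that it combines the Digital Explorer and Digital Player shares. The sketch below checks that arithmetic; the pairing of percentages with stage names follows the order in which they appear on the slide, and the names `MATURITY_SHARES`, `DEADLOCK_STAGES` and `combined_share` are hypothetical.

```python
# Shares in percent, paired with stage names in slide order (an assumption).
MATURITY_SHARES = {
    "Digital Resister": 21,
    "Digital Explorer": 26,
    "Digital Player": 29,
    "Digital Transformer": 18,
    "Digital Disrupter": 6,
}

# Hypothetical grouping: the stages assumed to make up the "deadlock" group.
DEADLOCK_STAGES = ("Digital Explorer", "Digital Player")


def combined_share(shares, stages):
    """Return the summed percentage share of the named stages."""
    return sum(shares[stage] for stage in stages)


if __name__ == "__main__":
    print("All stages:", sum(MATURITY_SHARES.values()), "%")
    print("Explorer + Player:", combined_share(MATURITY_SHARES, DEADLOCK_STAGES), "%")
```

Under this reading the five shares account for the whole sample and the two middle stages add up to the highlighted figure.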

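The benchmark notes on slides 32 and 33 give `OMP_NUM_THREADS=56` for the 2-socket, 28-core platform and `OMP_NUM_THREADS=44` for the 2-socket, 22-core platform. In both cases the value equals the number of physical cores, whether Hyper-Threading is enabled or not. The sketch below reproduces that calculation and assembles the two environment settings. The `Platform` class and its methods are hypothetical helpers, not part of any Intel tool, and the affinity strings are written without the space that appears in the slide text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Platform:
    """Hypothetical description of one benchmark system from the slides."""
    name: str
    sockets: int
    cores_per_socket: int
    kmp_affinity: str

    def physical_cores(self) -> int:
        # One OpenMP thread per physical core, as the slide values suggest.
        return self.sockets * self.cores_per_socket

    def openmp_env(self) -> dict:
        # Environment variables named in the slides; values are strings,
        # as they would be when passed to a process environment.
        return {
            "KMP_AFFINITY": self.kmp_affinity,
            "OMP_NUM_THREADS": str(self.physical_cores()),
        }


SKYLAKE = Platform("2S Xeon Platinum 8180", 2, 28, "granularity=fine,compact")
BROADWELL = Platform("2S Xeon E5-2699 v4", 2, 22, "granularity=fine,compact,1,0")

if __name__ == "__main__":
    for platform in (SKYLAKE, BROADWELL):
        print(platform.name, platform.openmp_env())
```

A dictionary like this could be merged into `os.environ` before launching a benchmark process. The `cpupower frequency-set` commands on the slides are separate system-level steps and are not modeled here.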