Evolution of Big Data at Intel - Crawl, Walk and Run Approach
Gomathy Bala | Director
Chandhu Yalla | Manager & Architect
Key Contributors: Sonja Sandeen, Seshu Edala, Nghia Ngo and Darin Watson
IT BI Big Data Team
Copyright © 2014, Intel Corporation. All rights reserved.
Legal Notices
This presentation is for informational purposes only. INTEL MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY.
The content in this presentation is being shared under NDA.
Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.
* Other names and brands may be claimed as the property of others.
Agenda
• Intel IT Big Data Journey
• Enterprise DW architecture
• BI Big Data 3-Year Roadmap
• Big Data Ecosystem Architecture
• Platform Strategies & BKMs
• Summary
Intel IT Big Data Journey, 2011-2015
• Big Data & Analytics Strategy
• Telmap (1st use case): preproduction online; production online
• Hadoop evaluation
• IDH to CDH; Hadoop 2.0
• $176M BV
• Production: Security BI, Attribute Reduction System, ATM Ellipses Engine, IAH-Retail Analytics
• 6 environments; CDH 5.3; 4 use cases in preproduction
• 12 POC use cases
• 6 use cases in production
• $290K investment; $948/TB
• 3 use cases in production: Smart-What, Marketing-IAH, Incident Predictability
• $6M BV; CDH 5.1
• IAH – Cloud CRM in production
• Enterprise standards, guidance and processes for platform & capabilities
15 Active Use Cases | $290K + 10.5 HC Investment | Delivered $182M BV
Big Data & Analytics Really Delivers!
From the 2014–2015 Intel IT Business Review – Annual Edition
Kim's Video
2014-2017 Vision: Real-Time Enterprise
• Any data source; ERP: CRM, SCM, SRM, ECC, BW, ECCW, MDG, NW; other apps: Custom, Intel, …
• In-memory real-time data platform
• Real-time & self-service analytics platform
• Teradata and Cloudera Hadoop data lake, with data tiering of hot/cold data
• Enterprise data warehouse
• NRT, predictive analytics, BPC, BCS
• Reporting tools, Cloud BI, SaaS, new apps and downstream applications
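The hot/cold data tiering named in this vision can be illustrated with a minimal sketch. It assumes, purely for illustration, that data is partitioned by date and that partitions newer than a hypothetical `hot_days` cutoff stay on the hot tier (for example the in-memory platform or EDW) while older partitions move to the cold tier (for example the Hadoop data lake). The slide does not state Intel's actual tiering rule; `assign_tier` and `hot_days` are illustrative names.

```python
from __future__ import annotations

from datetime import date, timedelta


def assign_tier(partition_dates: list[date], today: date, hot_days: int = 90) -> dict[date, str]:
    """Label each date partition "hot" or "cold".

    Hypothetical rule: partitions dated on or after (today - hot_days) are hot,
    everything older is cold. Real policies may also weigh query frequency,
    data volume and SLAs.
    """
    cutoff = today - timedelta(days=hot_days)
    return {d: ("hot" if d >= cutoff else "cold") for d in partition_dates}


if __name__ == "__main__":
    tiers = assign_tier(
        [date(2015, 1, 15), date(2014, 11, 2), date(2014, 6, 30)],
        today=date(2015, 1, 31),
    )
    for d, tier in sorted(tiers.items()):
        print(d.isoformat(), tier)
```

A pure age cutoff keeps the rule easy to audit; a production policy would more likely combine age with access statistics.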
Enterprise Data Architecture with Hadoop and Other MPP DWH: Current & Future Strategy
• Components: FE tools, CLS/Proxy, high-speed data loader
• Big Data use cases: machine learning, log processing, unstructured data
• Use cases: high-volume counter analytics, text parsing/mining, strategic/operational reporting, interactive reporting
• Use cases: high-concurrency user analytics (Supply/Order); mission-critical analytics (Finance/HR)
• SQL on Hadoop: a percentage of traditional BI use cases
• Platforms: EDW (Mfg data), IMT
BI Big Data | 3-Year Roadmap
• 2015: Big Data + AA on a scalable and well-designed Hadoop platform
• 2016: Big Data + SSAA + Traditional BI
• 2017: Big Data + SSAA + Traditional BI
Roadmap capabilities:
• Evolve IMT + Hadoop
• Data lineage & data catalog
• Streaming capabilities
• Advanced SQL on Hadoop
• ACID semantics
• Evolve Big Data + SSAA per ecosystem roadmaps
• BC/DR
• End-to-end enterprise features
• Enterprise ready: OLAP and traditional DW
• Security (RBAC, ITS/IRS)
• Data governance
• Data discovery
• Self-service AA framework
• IMT + Hadoop
• AVP + Hadoop
• In-memory + near-real-time capabilities
• SQL on Hadoop
Hadoop is an open-source framework designed for big data analytics. It is evolving rapidly, but it will still take a couple of years for it to mature and support traditional BI use cases.
Legend
Orange Text: Traditional BI Capabilities
Green Text: Big Data/AA Capabilities
Big Data Platform – Ecosystem Architecture & Maturity
• User layer: end user, data steward, business analyst, data scientist, developer, auditor
• Presentation layer: data virtualization, data discovery, advanced analytics, advanced visualization
• Analytical layer: machine learning; statistical, numerical, time-series, textual/log, spatial and graph analytics
• Processing layer: batch, NRT/stream and in-memory processing
• Storage model: textual/log DB, hierarchy DB, relational DB, graph DB, columnar DB
• Data management: data integration, data ingestion, data egression, data consumption
• Middleware: continuous integration, dev framework, security, source/target APIs, 3rd-party drivers, enterprise scheduler services, metadata management, workload management
• Infrastructure: platform virtualization, platform management, network management, systems management
• Processes: prescriptive guidance, change, release, governance, engagement, service management, training, support
Legend: majority CDH-offered capabilities vs. other vendor-offered capabilities
BI Big Data Platform
• Hadoop Project Sandbox (CDH 5.3): multiple instances deployed on Intel Cloud & MyCloud environments; TTM to business: 2-3 days
• Hadoop Pre-Production (CDH 5.3): 10 data nodes | 399 TB | 320 vcores; use cases in Dev/POC: 14
• Hadoop Production (CDH 5.3): 22 data nodes | 658 TB | 704 vcores; use cases live in production: 7
• Hadoop 2.0 architecture provides reliability, scalability & performance
• High-availability and scalability design
• Well positioned to meet 2015 business use-case requirements
• Repeatable architecture for faster builds
• Capacity additions: add data nodes (white boxes, Waterfall equipment or HP servers)
• TTM: varies depending on HW (3 weeks to 2 months)
Cluster architecture:
• Two Name Node / Resource Manager nodes (NN high availability) managing the data nodes (heartbeat, balancing, replication) under YARN; scale out by adding data nodes to meet business needs
• Gateway nodes: login (ssh) with AD authentication & authorization to access the cluster, run HDFS commands, submit jobs, etc.
• Management node; job/workflow management
• Connected systems: source data (DB data, unstructured, semi-structured), data movement/ETL, EDW or datamart, visualization tools
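The cluster figures above lend themselves to simple capacity arithmetic. The sketch below derives per-node storage and vcores from the quoted totals and estimates logical HDFS capacity and the number of data nodes needed for a target. It assumes the quoted TB figures are raw disk and that the default HDFS replication factor of 3 applies; the slide states neither. `Cluster` and `nodes_needed` are hypothetical helpers.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    name: str
    data_nodes: int
    raw_tb: float  # storage as quoted on the slide; assumed here to be raw disk
    vcores: int

    def tb_per_node(self) -> float:
        return self.raw_tb / self.data_nodes

    def vcores_per_node(self) -> float:
        return self.vcores / self.data_nodes

    def usable_tb(self, replication: int = 3) -> float:
        # HDFS keeps `replication` copies of every block, so logical capacity is
        # roughly raw capacity / replication (ignores reserved and temp space).
        return self.raw_tb / replication


def nodes_needed(target_usable_tb: float, tb_per_node: float, replication: int = 3) -> int:
    """Data nodes required to hold `target_usable_tb` of logical data."""
    return math.ceil(target_usable_tb * replication / tb_per_node)


preprod = Cluster("Pre-Production", data_nodes=10, raw_tb=399, vcores=320)
prod = Cluster("Production", data_nodes=22, raw_tb=658, vcores=704)

if __name__ == "__main__":
    for c in (preprod, prod):
        print(
            f"{c.name}: {c.tb_per_node():.1f} TB/node, "
            f"{c.vcores_per_node():.0f} vcores/node, "
            f"~{c.usable_tb():.0f} TB usable at 3x replication"
        )
    # Hypothetical target: 300 TB of logical data on Production-sized nodes.
    print(nodes_needed(300, prod.tb_per_node()))
```

Run as a script, this reports 39.9 TB and 32 vcores per node (~133 TB usable) for Pre-Production, 29.9 TB and 32 vcores per node (~219 TB usable) for Production, and 31 nodes for the hypothetical 300 TB target.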
BKMs | Summary
• Skills and resources need time to ramp up.
• Starting small is OK; focus on design and scalability for the platform.
• Technical product evaluation
  - Stick with a distribution built on the core open-source Hadoop stack rather than proprietary software.
• Security is a big deal to Intel; implementing Big Data security capabilities is a key focus.
• To understand the data, use an iterative discovery method with technical, business and modeling teams.
• Intel IT's Big Data journey benefited heavily from the Cloudera partnership.
• Open source will play a big role in advancing Big Data capabilities and analytics.
BI Big Data IT@Intel Resource Info
BI Big Data IT@Intel Resource Links:
1. Hadoop Migration Success Story: How Intel IT Moved to Cloudera
2. Mining Big Data in the Enterprise for Better Business Intelligence
3. Enabling Big Data Platforms and Solutions with Centralized Data Management
4. Integrating Apache Hadoop* into Intel’s Big Data Environment
5. Using a Multiple Data Warehouse Strategy to Improve BI Analytics
To learn more: www.intel.com/bigdata
Q & A
Intel Confidential — Do Not Forward
Backup
Big Data Capability Catalog
Status legend: Enabled (available now), WIP (1-3 months), Planned (3-6+ months)
• Cloudera* Distribution of Hadoop (CDH): HDFS, HDFS Compress, MapReduce, YARN, Zookeeper, Hive, HCatalog, Pig, Mahout, HBase, Accumulo, Whirr, Storm, Spark, Impala, Solr, HUE, Parquet, DataFu, Sentry
• Cloudera Manager* (system management): deployment, monitoring, reporting, diagnostics, alerting, service management, rolling upgrades, config rollbacks
• Cloudera Navigator* (data management): audit, access control, discovery, explore, lineage, lifecycle
• Data integration: Oozie, Kafka, Sqoop, DI Gateway, Flume, SFTP, SMBClient, Camel
• 3rd-party SW/connectors (integration): Sqoop JDBC, other DW, Autosys, SecureGIT, Impala JDBC, Impala ODBC, Hive ODBC, TDCH, Google Analytics, SFDC
• Infrastructure: network, servers, storage, security, OS, Hi-Av, EAM/AD integration
• Process: governance, change, release, engagement, service mgmt., prescriptive guidance, training
List includes only the capabilities planned for the next 6 months.
Migration to Cloudera – 6 BKMs
i. Find Differences with a Comparative Evaluation in a Sandbox Environment
ii. Define Your Strategy for the Cloudera Implementation
iii. Split the Hardware Environment
iv. Upgrade the Hadoop Version
v. Create a Preproduction-to-Production Pipeline
vi. Rebalance the Data
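Step vi, rebalancing, is typically done with the HDFS balancer (`hdfs balancer -threshold <percent>`), which treats a DataNode as balanced when its utilization is within the threshold (default 10 percentage points) of the cluster-wide utilization. The sketch below only reproduces that classification rule so the effect of adding nodes can be reasoned about; it does not move data. The node names and sizes are hypothetical.

```python
from __future__ import annotations


def classify_datanodes(
    nodes: dict[str, tuple[float, float]], threshold_pct: float = 10.0
) -> dict[str, str]:
    """Classify DataNodes using the HDFS balancer's threshold rule.

    `nodes` maps a node name to (used_tb, capacity_tb). A node counts as
    balanced when its utilization is within `threshold_pct` percentage points
    of the cluster-wide utilization (total used / total capacity).
    """
    total_used = sum(used for used, _ in nodes.values())
    total_capacity = sum(cap for _, cap in nodes.values())
    cluster_util = 100.0 * total_used / total_capacity

    status = {}
    for name, (used, cap) in nodes.items():
        util = 100.0 * used / cap
        if util > cluster_util + threshold_pct:
            status[name] = "over-utilized"
        elif util < cluster_util - threshold_pct:
            status[name] = "under-utilized"
        else:
            status[name] = "balanced"
    return status


if __name__ == "__main__":
    # Hypothetical cluster right after a nearly empty data node (dn3) is added.
    cluster = {"dn1": (24.0, 30.0), "dn2": (16.0, 30.0), "dn3": (3.0, 30.0)}
    print(classify_datanodes(cluster))
```

Here cluster utilization is about 47.8%, so dn1 (80%) is over-utilized, dn2 (53.3%) is balanced and dn3 (10%) is under-utilized; the balancer would move blocks from dn1 toward dn3.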
Building Block Strategy for Enterprise Security of Hadoop
• Now (Q1’15): perimeter access with LDAP plus finer-grained controls with Sentry, the second building block toward an enterprise-grade security design.
• Q2’15: add Kerberos to enable more Hadoop components and further secure the platform.
• 2H’15: exploration starting; awaiting the product, with a target to adopt it in production in 2H’15.
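Sentry's fine-grained control follows a role-based model: privileges are granted to roles, roles are granted to groups, and users inherit access through group membership (here assumed to be resolved from LDAP/AD). The sketch below is a simplified, hypothetical in-memory model of that idea, not Sentry's API; `PolicyStore`, `Privilege` and the group, role and database names are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Privilege:
    database: str
    table: str   # "*" grants the action on every table in the database
    action: str  # "SELECT", "INSERT" or "ALL"


@dataclass
class PolicyStore:
    # group name (e.g. resolved from LDAP/AD) -> role names granted to it
    group_roles: dict[str, set[str]] = field(default_factory=dict)
    # role name -> privileges held by that role
    role_privileges: dict[str, set[Privilege]] = field(default_factory=dict)

    def grant_role(self, group: str, role: str) -> None:
        self.group_roles.setdefault(group, set()).add(role)

    def grant_privilege(self, role: str, privilege: Privilege) -> None:
        self.role_privileges.setdefault(role, set()).add(privilege)

    def is_allowed(self, user_groups: set[str], database: str, table: str, action: str) -> bool:
        """Allow the request if any role of any of the user's groups covers it."""
        for group in user_groups:
            for role in self.group_roles.get(group, set()):
                for p in self.role_privileges.get(role, set()):
                    if (
                        p.database == database
                        and p.table in ("*", table)
                        and p.action in ("ALL", action)
                    ):
                        return True
        return False


if __name__ == "__main__":
    store = PolicyStore()
    store.grant_role("bi_analysts", "analyst")
    store.grant_privilege("analyst", Privilege("sales", "*", "SELECT"))
    print(store.is_allowed({"bi_analysts"}, "sales", "orders", "SELECT"))
    print(store.is_allowed({"bi_analysts"}, "sales", "orders", "INSERT"))
```

Granting to groups rather than individual users keeps the policy aligned with the directory: onboarding a user becomes an AD group change, not a Hadoop policy change.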
Hadoop Maturity & Evolution
2005 - 2012: Hadoop 1.0
• MapReduce (batch data processing, cluster resource management) on HDFS 1.0 (redundant, reliable data storage)
+ Scalable data storage and processing platform
+ Positioned for batch processing workloads, Map and Reduce only
+ Apache Hive offers a SQL-like query language
- Lacks reliability and stability
- No support for low-latency queries
2013 - 2014: Hadoop 2.0
• YARN (cluster resource management) on HDFS (redundant, reliable data storage), running batch (MapReduce) and other data processing
• YARN allows multiple applications to run in Hadoop and provides reliability, scalability and performance
• Advanced resource management
• Apache Hive offers a 50x improvement in query performance
• Cloudera Impala supports low-latency query requirements with SQL-92 and SQL-2000 support
2015 - 2017: Applications Run Natively in Hadoop
• YARN (cluster resource management) on HDFS 2.0 (redundant, reliable data storage), running interactive (Impala), in-memory (Spark), batch (MapReduce), online (HBase), graph and other (Search, Storm, etc.) workloads
• Data-at-rest encryption and row-level/cell-level security planned
• Data streaming and search capability
• GraphDB
• Expanded data governance
• IMT + Hadoop integration
• Improved front-end tool integration/support
• Deeper diagnostics for multiple components
2014 Intel IT Vital Statistics
>6,300 IT employees
59 global IT sites
>98,000 Intel employees¹
168 Intel sites in 65 countries
64 data centers
(91 data centers in 2010)
80% of servers virtualized
(42% virtualized in 2010, goal of 75%)
>147,000 devices
100% of laptops encrypted
100% of laptops with SSDs
>43,200 handheld devices
57 mobile applications developed
Source: Information provided by Intel IT as of Jan 2014
¹ Total employee count does not include wholly owned subsidiaries that Intel IT does not directly support.
Big Data in the Industry
• Recommendation engine
• Fraud detection
• Sentiment analytics
• Behavioral targeting
• Customer experience analytics
• Marketing campaign analytics
Learn more about Intel IT’s Initiatives at
www.intel.com/IT
Sharing Intel IT Best Practices With the World
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

Mais de DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Último

Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 

Último (20)

Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 

Evolution of Big Data at Intel - Crawl, Walk and Run Approach

  • 1. Evolution of Big Data at Intel - crawl, walk and run approach Gomathy Bala | Director Chandhu Yalla | Manager & Architect Key Contributors: Sonja Sandeen, Seshu Edala, Nghia Ngo and Darin Watson IT BI Big Data Team
  • 2. Copyright © 2014, Intel Corporation. All rights reserved. Legal Notices: This presentation is for informational purposes only. INTEL MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY. The content in this presentation is being shared under NDA. Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries. * Other names and brands may be claimed as the property of others.
  • 3. Agenda: Intel IT Big Data Journey • Enterprise DW Architecture • BI Big Data 3-Year Roadmap • Big Data Ecosystem Architecture • Platform Strategies & BKMs (best known methods) • Summary
  • 4. Intel IT Big Data Journey (timeline, 2011–2015)
      - Big Data & Analytics strategy
      - Production online
      - Telmap: 1st use case
      - Preproduction online
      - Hadoop evaluation
      - IDH to CDH migration
      - Hadoop 2.0
      - $176M BV
      - In production: Security BI, Attribute Reduction System, ATM Ellipses Engine, IAH-Retail Analytics
      - 6 environments
      - CDH 5.3
      - 4 use cases in preproduction
      - 12 POC use cases
      - 6 use cases in production
      - $290K investment
      - $948/TB
      - 3 use cases in production: Smart-What, Marketing-IAH, Incident Predictability
      - $6M BV
      - CDH 5.1
      - IAH – Cloud CRM in production
      - Enterprise standards, guidance and processes for platform & capabilities
      - Totals: 15 active use cases | $290K + 10.5 HC (headcount) investment | delivered $182M BV (business value)
  • 5. Big Data & Analytics Really Delivers! (From the 2014–2015 Intel IT Business Review, Annual Edition.) Kim's Video
  • 6. 2014–2017 Vision: Real-Time Enterprise (architecture diagram)
      - Any data source; source systems and SAP components shown: ERP, CRM, SCM, SRM, ECC, BW, ECCW, MDG, NW, BPC, BCS, other apps, custom Intel …
      - In-memory real-time data platform
      - Real-time & self-service analytics platform
      - Teradata enterprise data warehouse
      - Cloudera Hadoop data lake
      - Data tiering: hot/cold data
      - Reporting tools, predictive analytics, NRT (near real time), cloud BI SaaS
      - New apps; downstream applications
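The "Data tiering: hot/cold data" box implies a rule for deciding which data stays on the in-memory platform and which moves to the Hadoop data lake. The slide does not state that rule, so the sketch below assumes a simple age-based policy: a dataset untouched for more than a hypothetical 90 days is "cold". The `Dataset` type, `COLD_AFTER_DAYS` and the sample dataset names are all illustrative, not Intel's actual policy.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical policy: data not accessed for more than 90 days is "cold".
COLD_AFTER_DAYS = 90


@dataclass
class Dataset:
    name: str
    last_accessed: date


def assign_tier(ds: Dataset, today: date, cold_after_days: int = COLD_AFTER_DAYS) -> str:
    """Return 'hot' (keep on the in-memory platform) or 'cold' (move to the data lake)."""
    age_days = (today - ds.last_accessed).days
    return "cold" if age_days > cold_after_days else "hot"


def tier_all(datasets, today: date) -> dict:
    """Group dataset names by their assigned tier."""
    tiers = {"hot": [], "cold": []}
    for ds in datasets:
        tiers[assign_tier(ds, today)].append(ds.name)
    return tiers


# Illustrative usage with made-up datasets.
sample = [
    Dataset("orders_current_quarter", date(2015, 5, 30)),
    Dataset("orders_2012", date(2014, 11, 2)),
]
print(tier_all(sample, date(2015, 6, 1)))
```

A real tiering policy would likely also weigh access frequency, data size and query latency requirements; age alone keeps the sketch short.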
  • 7. Enterprise Data Architecture with Hadoop and Other MPP DWH: Current & Future Strategy (diagram, present vs. future)
      - Components: FE tools, CLS/proxy, high-speed data loader, EDW, Mfg data, IMT, SQL on Hadoop
      - Big Data use cases: machine learning, log processing, unstructured data, high-volume counter analytics, text parsing/mining
      - Reporting and analytics use cases: strategic/operational reporting, interactive reporting, high-concurrency user analytics (supply/order), mission-critical analytics (finance/HR)
      - Future: SQL on Hadoop takes on a percentage of traditional BI use cases
  • 8. BI Big Data | 3-Year Roadmap
      - 2015: Big Data + AA (advanced analytics)
      - 2016: Big Data + SSAA (self-service advanced analytics) + Traditional BI
      - 2017: Big Data + SSAA + Traditional BI
      - Planned capabilities: scalable, well-designed Hadoop platform; evolve IMT + Hadoop; data lineage & data catalog; streaming capabilities; advanced SQL on Hadoop; ACID semantics; evolve Big Data + SSAA per ecosystem roadmaps; BC/DR; end-to-end enterprise features; enterprise-ready OLAP and traditional DW; security (RBAC, ITS/IRS); data governance; data discovery; self-service AA framework; IMT + Hadoop; AVP + Hadoop; in-memory + near-real-time capabilities; SQL on Hadoop
      - Note: Hadoop is an open-source framework for distributed storage and processing of large data sets. It is evolving rapidly, but it will still take a couple of years to mature enough to support traditional BI use cases.
  • 9. Big Data Platform – Ecosystem Architecture & Maturity (layered diagram)
      - User layer: end user, data steward, business analyst, data scientist, developer, auditor
      - Presentation layer: data discovery, advanced analytics, advanced visualization, data virtualization, data management
      - Analytical layer: machine learning, statistical, numerical, time series, textual/log, spatial, graph
      - Processing layer: batch processing, in-memory processing, NRT/stream processing
      - Storage model: relational DB, columnar DB, hierarchy DB, graph DB, textual/log DB
      - Infrastructure: platform virtualization, platform management, network management, systems management
      - Middleware: data ingestion, data egression, source/target APIs, 3rd-party drivers, enterprise scheduler services, metadata management, workload management, security, dev framework, continuous integration
      - Data integration and data consumption
      - Processes: prescriptive guidance, change release, governance, engagement, service management, training, support
      - Legend: capabilities mostly offered by CDH vs. capabilities offered by other vendors
      - *Other names and brands may be claimed as the property of others.
  • 10. BI Big Data Platform
      - Hadoop project sandbox (CDH 5.3): multiple instances deployed on Intel Cloud & MyCloud environments; TTM to business: 2–3 days
      - Hadoop preproduction (CDH 5.3): 10 data nodes | 399 TB | 320 vcores; use cases in dev/POC: 14
      - Hadoop production (CDH 5.3): 22 data nodes | 658 TB | 704 vcores; use cases live in production: 7
      - Hadoop 2.0 architecture provides reliability, scalability and performance
      - High-availability and scalability design
      - Well positioned to meet 2015 business use-case requirements
      - Repeatable architecture for faster builds
      - Capacity additions: add data nodes (white boxes, Waterfall equipment or HP servers); TTM varies with hardware (3 weeks to 2 months)
      - Cluster diagram: gateway nodes (login via ssh with AD authentication & authorization; access the cluster, run HDFS commands, submit jobs, etc.); NameNode and ResourceManager pairs (NameNode high availability); data nodes (heartbeat, balancing, replication) under YARN, scaling to meet business needs; management node; job/workflow management
      - Connected systems: source data (DB data, unstructured, semi-structured), data movement/ETL, EDW or datamart DB, data visualization tools
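The cluster figures above make it easy to reason about "add a data node" capacity growth. The sketch below derives per-node capacity from the slide's numbers and estimates how many data nodes a target data volume would need. It assumes the quoted TB figures are raw disk capacity, that HDFS keeps 3 replicas per block (the `dfs.replication` default), and a hypothetical 25% free-space headroom; the 200 TB target is illustrative only.

```python
import math

# Assumption: HDFS stores 3 replicas of each block (the dfs.replication default).
REPLICATION_FACTOR = 3


def per_node(total_tb, total_vcores, data_nodes):
    """Average raw storage (TB) and vcores per data node."""
    return total_tb / data_nodes, total_vcores / data_nodes


def nodes_needed(logical_tb, tb_per_node, replication=REPLICATION_FACTOR, headroom=0.25):
    """Data nodes required to store logical_tb after replication,
    leaving a hypothetical `headroom` fraction of raw disk free."""
    raw_tb = logical_tb * replication
    usable_per_node = tb_per_node * (1.0 - headroom)
    return math.ceil(raw_tb / usable_per_node)


# Figures from the slide: production = 22 nodes, 658 TB, 704 vcores.
prod_tb, prod_vcores = per_node(658, 704, 22)
print(f"Production: {prod_tb:.1f} TB and {prod_vcores:.0f} vcores per data node")
print("Data nodes for 200 TB of logical data:", nodes_needed(200, prod_tb))
```

Both environments work out to 32 vcores per data node; storage per node is about 39.9 TB in preproduction and about 29.9 TB in production.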
  • 11. BKMs | Summary: • Skills and resources need time to ramp up. • Starting small is OK; focus on design and scalability of the platform. • Technical product evaluation: stick with a distribution built on the core open-source Hadoop stack rather than proprietary software. • Security is a big deal at Intel; implementing Big Data security capabilities is a key focus. • Understand the data through an iterative discovery method involving technical, business and modeling teams. • Intel IT's Big Data journey benefited heavily from the Cloudera partnership. • Open source will play a big role in advancing Big Data capabilities and analytics.
  • 12. BI Big Data IT@Intel Resource Links:
      1. Hadoop Migration Success Story: How Intel IT Moved to Cloudera
      2. Mining Big Data in the Enterprise for Better Business Intelligence
      3. Enabling Big Data Platforms and Solutions with Centralized Data Management
      4. Integrating Apache Hadoop* into Intel's Big Data Environment
      5. Using a Multiple Data Warehouse Strategy to Improve BI Analytics
      To learn more: www.intel.com/bigdata
  • 13. Q & A
  • 14. Intel Confidential — Do Not Forward
  • 15. Backup
  • 16. Big Data Capability Catalog: Cloudera* Distribution of Hadoop (CDH)
      - CDH components: HDFS, HDFS compression, MapReduce, YARN, Hive, HCatalog, Pig, Mahout, ZooKeeper, HBase, Accumulo, Whirr, Storm, Spark, Impala, Hue, Solr, Parquet, DataFu, Oozie, Kafka, Sqoop, Flume, Sentry
      - Infrastructure: network, servers, storage, security, OS, high availability, EAM/AD integration
      - Process: governance, change release, engagement, service management, prescriptive guidance, training
      - 3rd-party software/connectors: Sqoop JDBC (other DW), Hive ODBC, Impala JDBC, Impala ODBC, TDCH, Autosys, SecureGIT, Google Analytics, SFDC
      - Data integration: DI gateway, SFTP, SMBClient, Camel
      - System management (Cloudera Manager*): deployment, monitoring, reporting, diagnostics, alerting, service management, rolling upgrades, config rollbacks
      - Data management (Cloudera Navigator*): audit, access control, discovery, explore, lineage, lifecycle
      - Status legend: enabled (available now), WIP (1–3 months), planned (3–6+ months)
      - The list includes only the capabilities planned for the next 6 months.
      - *Other names and brands may be claimed as the property of others.
  • 17. Migration to Cloudera – 6 BKMs
      i. Find differences with a comparative evaluation in a sandbox environment
      ii. Define your strategy for the Cloudera implementation
      iii. Split the hardware environment
      iv. Upgrade the Hadoop version
      v. Create a preproduction-to-production pipeline
      vi. Rebalance the data
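"Rebalance the data" matters after splitting hardware, because newly added data nodes start empty. The HDFS balancer treats the cluster as balanced when every DataNode's disk utilization is within a threshold (10 percentage points by default) of the cluster-wide utilization. The sketch below reproduces only that classification rule, not the balancer's block-moving logic; the node names and TB figures are hypothetical.

```python
def classify_datanodes(used_tb, capacity_tb, threshold_pct=10.0):
    """Label each DataNode 'over', 'under' or 'balanced' relative to the
    cluster-wide utilization, using a balancer-style threshold in percent."""
    cluster_pct = 100.0 * sum(used_tb.values()) / sum(capacity_tb.values())
    labels = {}
    for node, used in used_tb.items():
        node_pct = 100.0 * used / capacity_tb[node]
        if node_pct > cluster_pct + threshold_pct:
            labels[node] = "over"
        elif node_pct < cluster_pct - threshold_pct:
            labels[node] = "under"
        else:
            labels[node] = "balanced"
    return cluster_pct, labels


# Hypothetical cluster: dn3 was just added and holds no data yet.
used = {"dn1": 32, "dn2": 20, "dn3": 0}
capacity = {"dn1": 40, "dn2": 40, "dn3": 40}
avg_pct, labels = classify_datanodes(used, capacity)
print(f"cluster utilization {avg_pct:.1f}%", labels)
```

Here dn1 (80%) is over-utilized and the empty dn3 is under-utilized against a cluster average of about 43%, which is exactly the situation a rebalance step is meant to correct.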
  • 18. Building-Block Strategy for Enterprise Security of Hadoop
      - Now (Q1'15): perimeter access with LDAP plus finer-grained controls with Sentry; the second building block towards an enterprise-grade security design
      - Q2'15: add Kerberos to enable more Hadoop components and further secure the platform
      - 2H'15: exploration starting; awaiting product, with a target to adopt in production in 2H'15
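Sentry's finer-grained controls follow a role-based model: privileges are granted to roles, roles are granted to groups, and a user's groups come from the directory (here, the LDAP/AD perimeter). The sketch below models only that lookup chain in plain Python; it is not Sentry's API, and the role, group and database names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Privilege:
    action: str      # e.g. "SELECT", "INSERT" or "ALL"
    database: str
    table: str = "*"  # "*" covers every table in the database


@dataclass
class PolicyStore:
    role_privileges: dict = field(default_factory=dict)  # role -> set of Privilege
    group_roles: dict = field(default_factory=dict)      # group -> set of role names

    def grant_privilege(self, role, privilege):
        self.role_privileges.setdefault(role, set()).add(privilege)

    def grant_role(self, group, role):
        self.group_roles.setdefault(group, set()).add(role)

    def is_allowed(self, user_groups, action, database, table):
        """True if any role held by any of the user's groups covers the request."""
        for group in user_groups:
            for role in self.group_roles.get(group, ()):
                for p in self.role_privileges.get(role, ()):
                    if (p.database == database and p.table in ("*", table)
                            and p.action in ("ALL", action)):
                        return True
        return False


# Hypothetical policy: the AD group "bi_analysts" may read the "sales" database.
store = PolicyStore()
store.grant_privilege("analyst", Privilege("SELECT", "sales"))
store.grant_role("bi_analysts", "analyst")
print(store.is_allowed({"bi_analysts"}, "SELECT", "sales", "orders"))
```

Keeping group membership in the directory and privileges in the policy store means access changes for a user are made in LDAP/AD, while the policy itself changes only when roles change.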
  • 19. Hadoop Maturity & Evolution
      - Hadoop 1.0 (2005–2012): MapReduce (batch data processing, cluster resource management) on HDFS 1.0 (redundant, reliable data storage)
          Pros: scalable data storage and processing platform; positioned for batch-processing workloads (Map and Reduce only); Apache Hive offers an SQL-like query language
          Cons: lacks reliability and stability; no support for low-latency queries
      - Hadoop 2.0 (2013–2014): YARN (cluster resource management) on HDFS 2.0 (redundant, reliable data storage); applications run natively in Hadoop: batch (MapReduce), interactive (Impala), in-memory (Spark), online (HBase), graph, others (Search, Storm, etc.)
          YARN allows multiple applications to run in Hadoop and provides reliability, scalability and performance; advanced resource management; Apache Hive offers a 50x improvement in query performance; Cloudera Impala supports low-latency query requirements with SQL-92 and SQL-2000 support
      - 2015–2017: data-at-rest encryption and row-level/cell-level security planned; data streaming and search capability; GraphDB; expanded data governance; IMT + Hadoop integration; improved front-end tool integration/support; deeper diagnostics for multiple components
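The Hadoop 1.0 column is limited to "Map and Reduce only" batch processing. The single-process sketch below shows that programming model on a word count: a map phase emits key/value pairs, a shuffle groups values by key, and a reduce phase aggregates each group. It illustrates the model only; it does not use Hadoop's Java MapReduce API or distribute any work.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit (word, 1) for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: sum the counts emitted for one word."""
    return key, sum(values)


def word_count(lines):
    intermediate = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())


print(word_count(["Hadoop runs batch jobs", "batch jobs on Hadoop"]))
```

Under YARN the same job is just one application type among several (Impala, Spark, HBase and so on) sharing the cluster's resources.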
  • 20. 2014 Intel IT Vital Statistics: >6,300 IT employees; 59 global IT sites; >98,000 Intel employees¹; 168 Intel sites in 65 countries; 64 data centers (91 in 2010); 80% of servers virtualized (42% in 2010, goal of 75%); >147,000 devices; 100% of laptops encrypted; 100% of laptops with SSDs; >43,200 handheld devices; 57 mobile applications developed. Source: information provided by Intel IT as of Jan 2014. ¹Total employee count does not include wholly owned subsidiaries that Intel IT does not directly support.
  • 21. Big Data in the Industry: Recommendation Engine, Fraud Detection, Sentiment Analytics, Behavioral Targeting, Customer Experience Analytics, Marketing Campaign Analytics
  • 22. Learn more about Intel IT’s Initiatives at www.intel.com/IT. Sharing Intel IT Best Practices With the World
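Slide 18 pairs LDAP for perimeter access with Sentry for finer-grained controls. As a rough illustration of what "finer-grained" means, the sketch below models a group-to-role-to-privilege chain in the style of Sentry's role-based authorization, where a privilege on a parent object (a database) also covers its children (its tables). It is a minimal sketch under stated assumptions, not Sentry's API: the group, role, and object names are hypothetical, and the user's groups are passed in directly where a real deployment would resolve them from LDAP.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Privilege:
    obj: tuple    # object path prefix, e.g. ("server1", "sales_db") = server > database
    action: str   # "SELECT", "INSERT" or "ALL"

# Hypothetical policy: roles held by each (LDAP-resolved) group, privileges held by each role.
GROUP_ROLES = {
    "bi_analysts": {"analyst"},
    "etl_jobs": {"loader"},
}
ROLE_PRIVILEGES = {
    "analyst": {Privilege(("server1", "sales_db"), "SELECT")},
    "loader": {Privilege(("server1", "sales_db", "staging"), "ALL")},
}

def is_authorized(user_groups, obj, action):
    """Return True if any role granted to the user's groups covers obj and action."""
    for group in user_groups:
        for role in GROUP_ROLES.get(group, set()):
            for priv in ROLE_PRIVILEGES.get(role, set()):
                # A grant on a parent path covers every object beneath it.
                covers_obj = obj[:len(priv.obj)] == priv.obj
                covers_action = priv.action in ("ALL", action)
                if covers_obj and covers_action:
                    return True
    return False

print(is_authorized({"bi_analysts"}, ("server1", "sales_db", "orders"), "SELECT"))  # True
print(is_authorized({"bi_analysts"}, ("server1", "sales_db", "orders"), "INSERT"))  # False
print(is_authorized({"etl_jobs"}, ("server1", "sales_db", "staging"), "INSERT"))    # True
```

Keeping privileges on roles rather than on individual users means access changes happen by editing group membership in the directory, which is the usual motivation for this design.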
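Slide 19 describes Hadoop 1.0 as MapReduce for batch data processing on top of HDFS. The programming model itself fits in a few lines: a map function emits key/value pairs, the framework groups them by key (the shuffle), and a reduce function combines each group. The sketch below is a single-process, in-memory illustration with hypothetical function names; it shows only the data flow, not Hadoop's distribution, fault tolerance, or YARN resource management.

```python
from collections import defaultdict
from itertools import chain

def map_phase(record):
    """Emit (word, 1) for every word in one input line."""
    return [(word.lower(), 1) for word in record.split()]

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)

def run_job(records):
    mapped = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())

counts = run_job(["Hadoop stores data", "Hadoop processes data in batch"])
print(counts["hadoop"], counts["data"], counts["batch"])  # 2 2 1
```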

Editor's Notes

  2. Stream processing or complex event processing: small chunks of data arrive at rapid intervals (a smaller quantum, requiring transformation). E.g. sensor data from manufacturing floors.
     Batch processing: aggregated chunks of data, perhaps collected over a long span, waiting to be analyzed in one run; OLAP processing. E.g. gold path analysis on intel.com.
     In-memory processing: running interactive analytics over large batches of summary/factual data by using memory as the pre-emptive transient store. E.g. SQL aggregates/operational metrics from the OLAP process.
     Machine learning: a class of unsupervised and supervised learning techniques intended for a decision support or expert system.
       Unsupervised learning (no "response" variable, just observations); tools: Mahout.
         Clustering: e.g. customer segmentation, clustering users by age, ethnicity, gender, income standards, geo, profession, and propensity to buy new form factors.
         Frequent pattern mining: e.g. co-branding strategies, such as people buying RealSense cameras also downloading Intel XDK kits within 7 days of purchase.
       Supervised learning (predicting a "response" variable when encountering a new "condition", using the response patterns learned from prior training sets); tools: H2O.
         Regression: e.g. YoY growth for DCG Xeon co-processor shipments was 16% between 2011 and 2014. This year, we will ship 36 million units; current inventory levels are at 23 million.
         Classification: e.g. a customer's (Widgets Inc) responses to email automation and phone calls have been favorable in the last 3 months, and its last upgrade was 2 years ago, so the likelihood of an enterprise upgrade is "high".
     Textual: a class of algorithms that "derive" meaning from what is otherwise flat, left-to-right, top-to-bottom "text". Shred sentence structure into nouns, verbs, adjectives, and adverbs; count entities and turn "text" into "terms" (features). Encode the features into a term-document or "graph" representation so that traditional analytics (supervised and unsupervised machine learning techniques) may be applied. Lucene and Solr are useful for indexing/tokenizing text; NLTK or the Stanford parsers are useful for tagging terms with classes of linguistic tokens such as nouns and verbs. E.g. identifying service management tickets that involve Windows 8.1 issues.
     Log: logs are textual in syntax but lack linguistic rigor, so such content is useful simply indexed as-is and searched. The machines do not "decode" meaning; humans synthesize it and add logical rules when the content is surfaced back via a search interface. E.g. Logstash used to monitor errors in the log4j logs of Hive jobs.
     Spatial: a class of problems that deal with the spatial layout of entities. E.g. every die is sacred: rationing and allocating sub-systems on a die via simulation techniques to minimize wastage and maximize "premium" quality, or optimizing lithographic etches to minimize orthogonal cuts by employing space-filling heuristics.
     Statistical: a class of problems that infer patterns from data that exhibits stochastic characteristics. E.g. computing aggregations such as the stddev, min, max, and average yield of a graphics die, and performing outlier analysis.
     Numerical: a class of problems that deal with data that exhibits deterministic characteristics. E.g. Taguchi methods or iterative Monte Carlo methods that search for global minima/maxima; genetic algorithms, deep learning methods/neural networks, etc.
     Time-series: a class of problems that deal with data that exhibits stochasticity but also temporal/seasonal resonance patterns. E.g. noise-cancellation filters that employ feedback loops, or predicting stock-price movement.
     Graph: a class of problems that compute statistics about entities connected to other entities. E.g. computing the PageRank/link popularity of a web page, congestion patterns of a traffic flow, sewage system planning, etc.
     Storage Models
       Textual/Binary: no DDL. All data is stored row-first, column-next, with only one BLOB column per row. E.g. zip files, mainframes.
       Relational: well-specified DDL, but data is stored row-first (the fields of a row are co-located), with locking semantics at the row level. Yields faster entity retrievals but poorer compression ratios when heterogeneous fields co-exist in the data. The index is built on row offsets. E.g. Oracle, MySQL.
       Columnar: well-specified DDL, but data is stored column-first (all first names are co-stored in one file, last names in another, etc.), with locking semantics at the cell level. Yields faster aggregates (min, max on a single field) and better compression ratios, because all fields of a columnar file are of a homogeneous type. But it lacks atomic consistency, because a single record change turns into mutations in multiple "columnar/co-location" files. E.g. HBase, Cassandra (strictly speaking, column-family stores; HBase does guarantee atomic mutations within a single row).
       Hierarchy: well-specified structural definition, mostly following a denormalized parent-child taxonomy. All fields relevant to a record are stored as a "hierarchic document", such as an XML or JSON document. Yields a strong consistency model because the grain of the data is a "document"; any mutation means a complete, denormalized update of the full JSON or XML document. E.g. MongoDB, CouchDB.
       GraphDB: a native adjacency property graph that stores entities as "vertices" of a graph, relations as "edges", and attributes as "properties". Since indices are built combinatorially on all of entities, relations, and attributes, adjacency mining, filtering, and mutations are performant and atomic. E.g. Neo4j, TitanDB.
  3. SLIDE PURPOSE: Who we are. We are the IT organization at Intel (IT@Intel). Core background information on Intel IT and our mission, goals, and capabilities.
     Key messages: We are the IT organization inside Intel's business. Our organization is a large, diverse, multinational enterprise with a wide variety of operational requirements and needs.
     Our vision is to accelerate Intel's quest to connect and enrich the lives of every person on Earth by the end of the decade.
     Our mission is to grow Intel's business through information technology by facilitating IT consumerization, delivering IT efficiency and continuity through cloud computing, increasing employee productivity through seamless connectivity and security, providing significant business value through business intelligence initiatives, and driving increased collaboration through social computing.
     Review some of the information/key stats shown here.
     Size and location: 6,334 IT employees supporting over 98,000 employees. Note: Intel IT only reflects the number of employees we support directly (we exclude Intel employees who support wholly owned subsidiaries). Remote support is vital.
     Data centers and facilities: 59 data centers worldwide (down from 142 in 2007). [Need to confirm this data: ~55,000 servers (down from 100,000 in 2007) consuming a large electrical and power/cooling load (roughly 55 MW total power). Our data centers also support 300M email messages per month and >2,183 TB of WAN traffic per month.] Our data centers store 45 petabytes of raw storage capacity.
     Employee/client technology: we support over 147K devices (note the >1 device-per-employee ratio; this ratio is growing with support for BYO and custom technology delivery to meet business needs). We have been 80%+ mobile PCs (laptops), our core employee technology standard, since 1997. We have been actively evaluating, enabling, and supporting many companion devices for improved productivity and flexibility. [Need to add what we are doing with tablets (Janet).] >43,200 handhelds across a variety of form factors (phones/tablets), vendors, software, and solutions; the majority of these devices are now EMPLOYEE OWNED.
     Intel IT continues to embrace the consumerization of IT, and mobile applications are a major component of our strategy. We have delivered 57 mobile apps and counting to support new form factors. Our goal is to deliver a seamless, secure experience for our employees across a wide spectrum of devices by putting user experience first.
     Enabled leadership business capabilities: enable a top 25 supply chain (recognized by Gartner, previously AMR Research): #25 in 2009, #18 in 2010, #16 in 2011, #7 in 2012, and #5 in 2013. This is a key focus for IT innovation and has delivered solid business results and competitive differentiation for Intel.
     Additional fun facts: 100% of Intel laptops support SSDs and 100% are deployed with disk encryption.
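The frequent pattern mining example in the editor's notes (RealSense camera buyers who also download the Intel XDK) rests on counting how often items occur together. A minimal sketch of that support-counting step, in the spirit of Apriori-style mining, is below; the basket contents and item names are hypothetical, the 7-day purchase window from the note is left out for brevity, and this is not Mahout's API.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(baskets, min_support):
    """Return item pairs that appear together in at least min_support baskets."""
    counts = Counter()
    for basket in baskets:
        # Sort so each unordered pair is always counted under the same key.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

# Hypothetical per-customer activity baskets.
baskets = [
    {"realsense_camera", "xdk_download"},
    {"realsense_camera", "xdk_download", "ssd"},
    {"ssd"},
    {"realsense_camera"},
]
print(frequent_pairs(baskets, min_support=2))
# {('realsense_camera', 'xdk_download'): 2}
```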
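The Textual note in the editor's notes describes turning text into terms and encoding them as a term-document representation. The sketch below assumes plain lowercase whitespace tokenization in place of the Lucene/Solr tokenizers or NLTK/Stanford taggers the note mentions, and uses hypothetical ticket strings; it builds a count matrix and then finds the tickets that mention Windows 8.1.

```python
from collections import Counter

def term_document_matrix(docs):
    """Return (vocab, rows) where rows[i][j] counts term vocab[j] in docs[i]."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({term for tokens in tokenized for term in tokens})
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        rows.append([counts[term] for term in vocab])
    return vocab, rows

# Hypothetical service management tickets.
tickets = [
    "Windows 8.1 login fails",
    "VPN fails after Windows 8.1 update",
    "Printer offline",
]
vocab, matrix = term_document_matrix(tickets)

col = vocab.index("8.1")
hits = [i for i, row in enumerate(matrix) if row[col] > 0]
print(hits)  # [0, 1]
```

Once tickets are rows of term counts, the same matrix can feed the clustering or classification techniques listed in the notes.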
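For the Statistical note's die-yield example, the aggregations and a basic outlier check fit in a few lines of standard-library Python. This sketch assumes a z-score rule (flag values more than two sample standard deviations from the mean), which is one common choice rather than the method the note prescribes, and the yield values are hypothetical.

```python
import statistics

def yield_summary(yields, z_threshold=2.0):
    """Return min/max/avg/stddev of die yields plus values beyond z_threshold."""
    avg = statistics.mean(yields)
    stddev = statistics.stdev(yields)  # sample standard deviation (needs >= 2 values)
    if stddev == 0:
        outliers = []  # all values identical: nothing can stand out
    else:
        outliers = [y for y in yields if abs(y - avg) / stddev > z_threshold]
    return {"min": min(yields), "max": max(yields), "avg": avg,
            "stddev": stddev, "outliers": outliers}

# Hypothetical per-wafer yields for one graphics die.
summary = yield_summary([0.91, 0.92, 0.90, 0.93, 0.91, 0.92, 0.60])
print(summary["outliers"])  # [0.6]
```

A large outlier also inflates the standard deviation it is measured against, so on small samples robust alternatives such as the median absolute deviation are often preferred.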
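The Graph note cites PageRank/link popularity as an example. Below is a minimal power-iteration sketch over a hypothetical three-page link graph, assuming the commonly used damping factor of 0.85 and spreading the rank of pages without outgoing links evenly across all pages; it is illustrative and not tuned for large graphs.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets the "random jump" share first.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank

links = {"home": ["docs", "blog"], "docs": ["home"], "blog": ["home", "docs"]}
ranks = pagerank(links)
print(max(ranks, key=ranks.get))  # home
```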
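The Storage Models note contrasts row-first and column-first layouts. The sketch below holds the same hypothetical records both ways. Reading one entity is a single lookup in the row layout. An aggregate over one field scans a single homogeneous list in the column layout. Updating one record in the column layout has to touch every column list, which is the multi-file mutation the note describes. Plain Python lists stand in for the on-disk files.

```python
# Hypothetical records held row-first: each record's fields are co-located.
rows = [
    {"first": "Ana", "last": "Silva", "age": 34},
    {"first": "Raj", "last": "Patel", "age": 29},
    {"first": "Li", "last": "Wei", "age": 41},
]

def to_columns(records):
    """Pivot row-first records into one list per field (column-first)."""
    return {field: [r[field] for r in records] for field in records[0]}

def update_record(columns, index, new_record):
    """Write one record into the column layout; returns how many column lists were touched."""
    for field, value in new_record.items():
        columns[field][index] = value
    return len(new_record)

columns = to_columns(rows)

print(rows[1])              # {'first': 'Raj', 'last': 'Patel', 'age': 29}
print(max(columns["age"]))  # 41
print(update_record(columns, 1, {"first": "Raj", "last": "Patil", "age": 30}))  # 3
```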