BigData
Shankar Radhakrishnan, HCL Technologies
July 2011
Big Data in the News
  Savings
    American health care: $300 billion/year
    European public sector: €250 billion/year
  Productivity
    Margins: 60% increase
  Source: McKinsey Global Institute
Topics
  What do we collect today?
  DBMS Landscape
  The Disconnect
  The Need
  What is BigData?
    Characteristics
    Approach
    Architectural Requirements
    Techniques
    Challenges
    Solutions
    Issues
  Deep Dive: Practical Approaches to Big Data
    Hadoop
    Aster Data
    Case Studies
  Big Data @ HCL
    Project Carbon
    SMILe
    CoE - Opportunities, Challenges
What do we collect?
  In 2010, people stored enough data to fill 60,000 Libraries of Congress (the LoC had collected 235 TB as of April 2011)
  YouTube receives 24 hours of video every minute
  5 billion mobile phones were in use in 2010
  Tesco (British retailer) collects 1.5 billion pieces of information to adjust prices and promotions
  Amazon.com: 30% of sales come from its recommendation engine
  Planecast, Mobclix: track-and-target systems deliver contextual promotions
  A Boeing jet engine produces 20 TB/hour for engineers to examine in real time to make improvements
  Sources: Forrester, The Economist, McKinsey Global Institute
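To put the figures above on one scale, here is a rough back-of-the-envelope calculation. It uses only the numbers quoted on the slide. Decimal units (1 EB = 1,000,000 TB) and the 24-hour engine run are assumptions made for illustration; the slide does not state either.

```python
# Back-of-the-envelope check of the volumes quoted on the slide.
# Assumption: decimal units, 1 EB = 1,000,000 TB.
LOC_TB = 235              # Library of Congress holdings, April 2011 (from the slide)
LOC_MULTIPLE = 60_000     # "60,000 Libraries of Congress" (from the slide)
ENGINE_TB_PER_HOUR = 20   # Boeing jet engine figure (from the slide)


def tb_to_eb(terabytes):
    """Convert terabytes to exabytes using decimal units."""
    return terabytes / 1_000_000


stored_2010_tb = LOC_TB * LOC_MULTIPLE
stored_2010_eb = tb_to_eb(stored_2010_tb)

# Hypothetical scenario: one engine monitored continuously for 24 hours.
engine_tb_per_day = ENGINE_TB_PER_HOUR * 24

print(f"Data stored in 2010: {stored_2010_tb:,} TB (about {stored_2010_eb} EB)")
print(f"One engine over 24 hours: {engine_tb_per_day} TB")
```

Under these assumptions the 2010 figure works out to roughly 14 exabytes, and a single engine would produce about twice the April 2011 Library of Congress holdings in one day.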
Collect More
  Business Operations
    Transactions
    Registers
    Gateways
  Customer Information
    CRM
  Product Information
    Barcodes
    RFID
  Web Pages
    Web Repositories
  Unstructured Information
    Social Media
  Signals
    Mobile
    GPS, GeoSpatial
DBMS
  Solutions
    Legacy
    Faster Retrieval
    Efficient Storage
    Divide and Access
    Data Consolidation
    Broader Tables
    Access all as a row
    Fine Grain Access
    Security
    Rules and Policies
  Problems
    Data Growth
    When storage cost is not an issue
    Scalability Issues
    Performance Issues
    New types of requirements
    Deciding what to analyze, when and how?
    Cost of a change in the subject-area to analyze
The Disconnect
  Old DBMS vs. New Data Types/Structures
  Old DBMS vs. New Volume
  Old DBMS vs. New Analysis
  Old DBMS vs. Data Retention
  Old DBMS vs. Data Element Striping
  Old DBMS vs. Data Infrastructure
  Old DBMS vs. One DB Platform for all
The Need
  System that can handle high-volume data
  Perform complex operations
  Scalable
  Robust
  Highly Available
  Fault Tolerant
  Economic
  New Approach
Big Data
  “Tools and techniques to manage different types of data, in high volume, in high velocity with varied requirements to mine them”
  Characteristics
    Size
      Scale up and scale out: Terabyte, Petabyte …
    Structure
      Structured
      Unstructured: Audio, Video, Text, GeoSpatial
      Schema-Less Structures
    Stream
      Torrent of real-time information
    Operation
      Massively Parallel Processing (MPP)
Approach
  Hardware
    Commodity Hardware
    Appliance
    Dynamic Scaling
    Fault Tolerant
    Highly Available
    No constraints on Storage
  Cloud
    Virtual Environment, Storage
  Processing Models
    In-memory
    In-database
  Interfaces/Adapters
  Workload Management
  Distributed Data Processing
  Software
    Frameworks: Hadoop, MapReduce, Vrije, BOOM, Bloom
    Open Source
    Proprietary
Architectural Requirements
  Integration Framework
  Development Framework
  Management Framework
  Modeling Framework
  Processing Framework
  Data Management Framework
Challenges Volumetric Analysis Complexity Streaming Data/Real Time Data Network Topology Infrastructure Pattern-based Strategy
Techniques
  Controlled and Variate Testing
  Mining
  Machine Learning
  Natural Language Processing (NLP)
  Cohort Analysis
  Network or Path Analysis
  Predictive Models
  Crowd Sourcing
  Regression Models
  Sentiment Analysis
  Processing Signals
  Spatial Analytics
  Visualization
  Time-series Analysis
Solutions
  IBM: InfoSphere BigInsights, Streams
  Teradata/Aster Data: nCluster, SQL-MR
  Frameworks
    Hadoop MapReduce
    Infobright*
    Splunk
    Cloudera*
    Cassandra
  NoSQL, NewSQL
    Google’s Big Table
  Appliance
    Teradata
    Netezza (IBM)
  Columnar Databases
    Vertica (HP)
    ParAccel
  Managed Services Available
Issues
  Latency
  Faultiness
  Accuracy
  ACID
    Atomicity
    Consistency
    Isolation
    Durability
  Setup Cost
  Development Cost
  Cost-to-fly
Deep Dive Hadoop
  Top-level Apache project
  Open-source software framework (Java)
  Inspired by Google’s white papers on
    Map/Reduce (MR)
    Google File System (GFS)
    Big Table
  Originally developed to support Apache Nutch
  Designed
    For large-scale data processing
    For batch processing
    For sophisticated analysis
    To deal with structured and unstructured data
  DB Architect’s Hadoop: "Heck Another Darn Obscure Open-source Project"
Why Hadoop?
  Runs on commodity hardware
  Portability across heterogeneous hardware and software platforms
  Shared-nothing architecture
  Scale hardware whenever you want
  System compensates for hardware scaling and issues (if any)
  Run large-scale, high-volume data processes
  Scales well with complex analysis jobs
  (Hardware) “Failure is an option”
  Ideal to consolidate data from both new and legacy data sources
  Highly Integrable
  Value to the business
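The shared-nothing point can be illustrated with a small sketch of hash partitioning, in which each record is owned by exactly one node and no node needs another node's disk or memory to process its share. This is a generic illustration, not Hadoop's actual partitioner. The function names, node names, and the choice of MD5 as the hash are all assumptions made for the example.

```python
import hashlib


def owner_node(key, nodes):
    """Pick the node that owns a key, using a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


def partition(records, nodes):
    """Assign each (key, value) record to exactly one node's local shard."""
    shards = {node: [] for node in nodes}
    for key, value in records:
        shards[owner_node(key, nodes)].append((key, value))
    return shards


# Hypothetical cluster and records, for illustration only.
nodes = ["node-a", "node-b", "node-c"]
records = [("user1", 10), ("user2", 20), ("user1", 5), ("user3", 7)]
shards = partition(records, nodes)

for node, shard in shards.items():
    print(node, shard)
```

Because the hash is deterministic, all records for the same key land on the same node, so per-key work can run locally. Note that this simple modulo scheme reshuffles most keys when the node count changes; real systems handle that case with more care.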
Hadoop Ecosystem
  HDFS        Hadoop Distributed File System
  Map/Reduce  Software framework for clustered, distributed data processing
  ZooKeeper   Coordination service for distributed applications
  Avro        Data serialization
  Chukwa      Data collection system to monitor distributed systems
  HBase       Data storage for distributed large tables
  Hive        Data warehouse
  Pig         High-level query language
  Scribe      Log collection
  UDF         User-defined functions
Hadoop Flow (Example)
  [Diagram labels: Network Storage, Web Servers, Scribe, Oracle, MySQL, Hadoop, Hive, DWH, MySQL, Oracle, Apps, Feeds]
HDFS
  Hadoop Distributed File System
  Master/Slave Architecture
  Runs on commodity hardware
  Fault Tolerant
  Handles large volumes of data
  Provides High Throughput
  Streaming data-access
  Simple file coherency model
  Portable to heterogeneous hardware and software
  Robust
    Handles disk failures, replication (& re-replication)
    Performs cluster rebalancing, data integrity checks
HDFS Architecture
  Name node
    Maps data-nodes
  Data node
    Handles data-blocks
    Replication
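The division of labor between the name node and the data nodes can be sketched with a toy model: the name node keeps a map from block IDs to the data nodes holding replicas, and restores the replica count when a data node fails. This is an illustrative simplification, not the HDFS API. The class name `ToyNameNode`, its methods, the node names, and the least-loaded placement rule are all assumptions; real HDFS placement also takes rack topology into account.

```python
from collections import defaultdict


class ToyNameNode:
    """Toy model only: tracks which data nodes hold each block."""

    def __init__(self, data_nodes, replication=3):
        self.live_nodes = set(data_nodes)
        self.replication = replication
        self.block_map = defaultdict(set)  # block id -> data nodes with a replica

    def _pick_nodes(self, exclude, count):
        # Prefer the least-loaded live nodes that do not already hold the block.
        load = {node: 0 for node in self.live_nodes}
        for holders in self.block_map.values():
            for node in holders:
                if node in load:
                    load[node] += 1
        candidates = sorted(
            (node for node in self.live_nodes if node not in exclude),
            key=lambda node: (load[node], node),
        )
        return candidates[:count]

    def add_block(self, block_id):
        """Place a new block on `replication` distinct data nodes."""
        self.block_map[block_id].update(self._pick_nodes(set(), self.replication))

    def node_failed(self, node):
        """Drop a failed node and re-replicate every block it held."""
        self.live_nodes.discard(node)
        for holders in self.block_map.values():
            holders.discard(node)
            missing = self.replication - len(holders)
            if missing > 0:
                holders.update(self._pick_nodes(holders, missing))


# Hypothetical four-node cluster.
name_node = ToyNameNode(["dn1", "dn2", "dn3", "dn4"], replication=3)
for block in ["blk_1", "blk_2"]:
    name_node.add_block(block)

name_node.node_failed("dn1")
for block, holders in sorted(name_node.block_map.items()):
    print(block, sorted(holders))
```

After `dn1` fails, every block is back to three replicas on the surviving nodes, which is the re-replication behavior the HDFS slides describe.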
Mapper Function
  cat * | grep | sort    | uniq -c | cat > file
  input | map  | shuffle | reduce  | output
Reduce Function
  cat * | grep | sort    | uniq -c | cat > file
  input | map  | shuffle | reduce  | output
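The pipeline analogy above can be sketched in plain Python as a word count, with one function per stage. This runs in a single process and is not Hadoop's API. The function names and the sample input are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group values by key, the role `sort` plays in the pipeline."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_phase(key, values):
    """Reduce: collapse all values for one key, the role of `uniq -c`."""
    return key, sum(values)


def run_job(lines):
    """Chain the stages: input | map | shuffle | reduce | output."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input lines.
counts = run_job(["big data big clusters", "data moves to the cluster"])
print(counts)
```

In a real cluster the map calls run in parallel on the nodes that hold the input blocks, and the shuffle moves each key's values across the network to the reducer responsible for that key.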
Hadoop Environment
Who uses Hadoop?
Deep Dive Aster Data
Aster Data
  Now part of Teradata
  Massively Parallel SQL Layer on MR (MapReduce)
  In-Database Analytics
  Appliance vs. Software Stack Model
  Cloud Options
  nPath and Statistical Options
  Data Integration
nCluster
Deep Dive Case Studies
Aster nCluster @ Mobclix
  [Architecture diagram labels: Mobclix App Library, Smart Phone, App-Ad Fabric, Relevance, Recommendation, Events, Ads, Ad Exchange, Context, BI, Targeting, DB, Amazon EC2, Analytics nCluster, nCluster; data rates: 1 TB/Hour, 1-2 TB/Hour]
  IP of Mobclix

  • 2. Big Data in the News Savings American Health-Care: $300 Billion/Year European Public Sector: €250 Billion/Year Productivity Margins: 60% increase Sources: McKinsey Global Institute
  • 3. Topics What do we collect today? DBMS Landscape The Disconnect The Need What is BigData? Characteristics Approach Architectural Requirements Techniques Challenges Solutions Issues Deep Dive – Practical Approaches to Big Data Hadoop Aster Data Case Studies Big Data @ HCL Project Carbon SMILe CoE - Opportunities, Challenges
  • 4. What do we collect? In 2010, people stored enough data to fill 60,000 Libraries of Congress (LoC collected 235 TB in Apr/2011) YouTube receives 24 hours of video every minute 5 Billion mobile phones in use in 2010 Tesco (British Retailer) collects 1.5 billion pieces of information to adjust prices and promotions Amazon.com: 30% of sales come from its recommendation engine Planecast, Mobclix: Track & Target systems promote contextual promotions A Boeing Jet Engine produces 20 TB/Hour for engineers to examine in real time to make improvements Sources: Forrester, The Economist, McKinsey Global Institute
  • 5. Collect More Business Operations Transactions Registers Gateways Customer Information CRM Product Information Barcodes RFID Web Pages Web Repositories Unstructured Information Social Media Signals Mobile GPS, GeoSpatial
  • 6. DBMS Solutions Legacy Faster Retrieval Efficient Storage Divide and Access Data Consolidation Broader Tables Access all as a row Fine Grain Access Security Rules and Policies Problems Data Growth When storage cost is not an issue Scalability Issues Performance Issues New types of requirements Deciding what to analyze, when and how? Cost of a change in the subject-area to analyze
  • 7. The Disconnect Old DBMS vs. New Data Types/Structures Old DBMS vs. New volume Old DBMS vs. New Analysis Old DBMS vs. Data Retention Old DBMS vs. Data Element Striping Old DBMS vs. Data Infrastructure Old DBMS vs. One DB Platform for all
  • 8. The Need System that can handle high volume data Perform complex operations Scalable Robust Highly Available Fault Tolerant Economic New Approach
  • 9. Big Data “Tools and techniques to manage different types of data, in high volume, in high velocity with varied requirements to mine them” Characteristics Size Scale up and scale out: Terabyte, Petabyte … Structure Structured Unstructured: Audio, Video, Text, GeoSpatial Schema Less Structures Stream Torrent of real-time information Operation Massively Parallel Processing (MPP)
  • 10. Approach Hardware Commodity Hardware Appliance Dynamic Scaling Fault Tolerant Highly Available No constraints on Storage Cloud Virtual Environment, Storage Processing Models In-memory In-database Interfaces/Adapters Workload Management Distributed Data Processing Software Frameworks – Hadoop, MapReduce, Vrije, BOOM, Bloom Open Source Proprietary
  • 11. Architectural Requirements Integration Framework Development Framework Management Framework Modeling Framework Processing Framework Data Management Framework
  • 12. Challenges Volumetric Analysis Complexity Streaming Data/Real Time Data Network Topology Infrastructure Pattern-based Strategy
  • 13. Techniques Controlled and Variate Testing Mining Machine Learning Natural Language Processing (NLP) Cohort Analysis Network or Path Analysis Predictive Models Crowd Sourcing Regression Models Sentiment Analysis Processing Signals Spatial Analytics Visualization Time-series Analysis
  • 14. Solutions IBM: Infosphere BigInsights, Streams Teradata/Aster Data: nCluster, SQL-MR Frameworks Hadoop MapReduce Infobright* Splunk Cloudera* Cassandra NoSQL, NewSQL Google’s Big Table Appliance Teradata Netezza (IBM) Columnar Databases Vertica (HP) ParAccel Managed Services Available
  • 15. Issues Latency Faultiness Accuracy ACID Atomicity Consistency Isolation Durability Setup Cost Development Cost Cost-to-fly
  • 17. Top level Apache project Open source Software Framework - Java Inspired by Google’s white papers on Map/Reduce (MR), Google File System (GFS), Big Table Originally developed to support Apache Nutch Designed for Large scale data processing For batch processing For sophisticated analysis To deal with structured and unstructured data DB Architect’s Hadoop: "Heck Another Darn Obscure Open-source Project"
  • 18. Why Hadoop? Runs on commodity hardware Portability across heterogeneous hardware and software platforms Shared-nothing architecture Scale hardware whenever you want System compensates for hardware scaling and issues (if any) Run large-scale, high volume data processes Scales well with complex analysis jobs (Hardware) “Failure is an option” Ideal to consolidate data from both new and legacy data sources Highly Integrable Value to the business
  • 19. Hadoop Ecosystem HDFS Hadoop Distributed File System Map/Reduce Software framework for Clustered, Distributed data processing ZooKeeper Scheduler Avro Data Serialization Chukwa Data Collection System to monitor Distributed Systems HBase Data storage for distributed large tables Hive Data warehouse Pig High-Level Query Language Scribe Log Collection UDF User Defined Functions
  • 20. Hadoop Flow (Example) Network Storage Web Servers Scribe Oracle MySQL Hadoop Hive DWH MySQL Oracle Apps Feeds
  • 21. HDFS Hadoop Distributed File System Master/Slave Architecture Runs on commodity hardware Fault Tolerant Handle large volumes of data Provides High Throughput Streaming data-access Simple file coherency model Portable to heterogeneous hardware and software Robust Handles disk failures, replication (& re-replication) Performs cluster rebalancing, data integrity checks
  • 26. Mapper Function: cat * | grep | sort | uniq -c | cat > file, analogous to input | map | shuffle | reduce | output
  • 27. Reduce Function: cat * | grep | sort | uniq -c | cat > file, analogous to input | map | shuffle | reduce | output
  • 31. Aster Data Now part of Teradata Massively Parallel SQL Layer on MR (MapReduce) In-Database Analytics Appliance vs. Software Stack Model Cloud Options nPath and Statistical Options Data Integration
  • 33. Deep Dive Case Studies
  • 34. Aster nCluster @ Mobclix Mobclix App Library Smart Phone App-Ad Fabric Relevance Recommendation Events Ads Ad Exchange Context BI DB DB DB DB Targeting DB DB DB DB Amazon EC2 Analytics nCluster Amazon EC2 nCluster 1 TB/Hour 1-2 TB/Hour IP of Mobclix
  • 35. MYNA @ Yahoo Front Page 12,000 Web Servers App Servers 3,500 Logs 12-15 TB/Hour Cut Clean Spray Process Cut Push Logs Highway 12-15 TB/Hour Access Collect Process Query Spray Push MYNA Fabric 3-4 TB/Hour Atlantic DW IP of Yahoo Inc.
  • 36. Front Page Data Mart BI DB DB Web Servers DB App Servers Ads DB Targeting Views Clicks DB DB DB Web Logs Actions Registry FP Datamart FY Datamart DB DB DB DB Property Events Context User Ads Ad Events Data Highway IP of Yahoo Inc.
  • 37. Big Data @ HCL Project Carbon Time-series Analytics HDF5 Data SMILe Hadoop framework Buzzmetrics, Nielsen 'sentiment' data CoE Opportunities DW/BI Expertise Grow talent organically POC Case Studies Challenges New Technology Talent in parts Investments Evangelize
  • 38. Thank You "You either scale to where your customer base takes you or you die" Jim Starkey – Founder and CTO, NimbusDB "Our philosophy is to build infrastructure using the best tools available for the job and we are constantly evaluating better ways to do things when and where it matters." Facebook "In any year we probably generate more data than the Walt Disney Co. did in the first 80 years of existence" Bud Albers - Disney
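
Slides 26 and 27 compare a MapReduce job to a Unix pipeline (grep, sort, uniq -c). The sketch below walks through that analogy in plain, single-process Python. It is an illustration only, not Hadoop's API: the function names `map_phase`, `shuffle_phase`, `reduce_phase` and `run_job` are hypothetical, and a real cluster would spread each phase across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line, pattern):
    """Mapper (the grep step): emit (word, 1) for each word containing pattern."""
    return [(word, 1) for word in line.split() if pattern in word]


def shuffle_phase(pairs):
    """Shuffle (the sort step): group values by key and order the keys."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(key, values):
    """Reducer (the uniq -c step): collapse each key's values into one count."""
    return key, sum(values)


def run_job(lines, pattern):
    """input | map | shuffle | reduce | output, run in a single process."""
    mapped = chain.from_iterable(map_phase(line, pattern) for line in lines)
    return [reduce_phase(key, values) for key, values in shuffle_phase(mapped)]


if __name__ == "__main__":
    sample = ["big data big", "data hadoop"]
    for word, count in run_job(sample, "a"):
        print(count, word)
```

The mapper and reducer share no state and each handles one record or one key at a time, which is the property that lets a framework run many copies of them in parallel on a shared-nothing cluster.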