Enviar pesquisa
Carregar
IBM-Why Big Data?
•
5 gostaram
•
2,849 visualizações
Kun Le
Seguir
Tecnologia
Denunciar
Compartilhar
Denunciar
Compartilhar
1 de 97
Baixar agora
Baixar para ler offline
Recomendados
Big data ibm keynote d advani presentation
Big data ibm keynote d advani presentation
MassTLC
Overview - IBM Big Data Platform
Overview - IBM Big Data Platform
Vikas Manoria
Big Data Platform Landscape by 2017
Big Data Platform Landscape by 2017
Donghui Zhang
Overview of analytics and big data in practice
Overview of analytics and big data in practice
Vivek Murugesan
Ibm big data
Ibm big data
Peter Tutty
What is big data - Architectures and Practical Use Cases
What is big data - Architectures and Practical Use Cases
Tony Pearson
Big Data Scotland 2017
Big Data Scotland 2017
Ray Bugg
Big Data vs Data Warehousing
Big Data vs Data Warehousing
Thomas Kejser
Recomendados
Big data ibm keynote d advani presentation
Big data ibm keynote d advani presentation
MassTLC
Overview - IBM Big Data Platform
Overview - IBM Big Data Platform
Vikas Manoria
Big Data Platform Landscape by 2017
Big Data Platform Landscape by 2017
Donghui Zhang
Overview of analytics and big data in practice
Overview of analytics and big data in practice
Vivek Murugesan
Ibm big data
Ibm big data
Peter Tutty
What is big data - Architectures and Practical Use Cases
What is big data - Architectures and Practical Use Cases
Tony Pearson
Big Data Scotland 2017
Big Data Scotland 2017
Ray Bugg
Big Data vs Data Warehousing
Big Data vs Data Warehousing
Thomas Kejser
Why Infrastructure Matters for Big Data & Analytics
Why Infrastructure Matters for Big Data & Analytics
Rick Perret
Gov & Private Sector Regulatory Compliance: Using Hadoop to Address Requirements
Gov & Private Sector Regulatory Compliance: Using Hadoop to Address Requirements
DataWorks Summit
Telco Big Data Workshop Sample
Telco Big Data Workshop Sample
Alan Quayle
Big Data Overview
Big Data Overview
IMEX Research
Value proposition for big data isv partners 0714
Value proposition for big data isv partners 0714
Niu Bai
Big data analytics, research report
Big data analytics, research report
JULIO GONZALEZ SANZ
Big Data PPT by Rohit Dubey
Big Data PPT by Rohit Dubey
Rohit Dubey
Big data - what, why, where, when and how
Big data - what, why, where, when and how
bobosenthil
NextGen Infrastructure for Big Data
NextGen Infrastructure for Big Data
Ed Dodds
Apache hadoop bigdata-in-banking
Apache hadoop bigdata-in-banking
m_hepburn
Telco Big Data 2012 Highlights
Telco Big Data 2012 Highlights
Alan Quayle
Big data Analytics
Big data Analytics
TUSHAR GARG
Research paper on big data and hadoop
Research paper on big data and hadoop
Shree M.L.Kakadiya MCA mahila college, Amreli
Monitizing Big Data at Telecom Service Providers
Monitizing Big Data at Telecom Service Providers
DataWorks Summit
Big data, data science & fast data
Big data, data science & fast data
Kunal Joshi
Big Data Infrastructure and Analytics Solution on FITAT2013
Big Data Infrastructure and Analytics Solution on FITAT2013
Erdenebayar Erdenebileg
IBM InfoSphere Data Replication for Big Data
IBM InfoSphere Data Replication for Big Data
IBM Analytics
Deutsche Telekom on Big Data
Deutsche Telekom on Big Data
DataWorks Summit
Big Data World Forum
Big Data World Forum
bigdatawf
Big Data on AWS
Big Data on AWS
Amazon Web Services LATAM
Social Data Analytics using IBM Big Data Technologies
Social Data Analytics using IBM Big Data Technologies
Nicolas Morales
Skilwise Big data
Skilwise Big data
Skillwise Group
Mais conteúdo relacionado
Mais procurados
Why Infrastructure Matters for Big Data & Analytics
Why Infrastructure Matters for Big Data & Analytics
Rick Perret
Gov & Private Sector Regulatory Compliance: Using Hadoop to Address Requirements
Gov & Private Sector Regulatory Compliance: Using Hadoop to Address Requirements
DataWorks Summit
Telco Big Data Workshop Sample
Telco Big Data Workshop Sample
Alan Quayle
Big Data Overview
Big Data Overview
IMEX Research
Value proposition for big data isv partners 0714
Value proposition for big data isv partners 0714
Niu Bai
Big data analytics, research report
Big data analytics, research report
JULIO GONZALEZ SANZ
Big Data PPT by Rohit Dubey
Big Data PPT by Rohit Dubey
Rohit Dubey
Big data - what, why, where, when and how
Big data - what, why, where, when and how
bobosenthil
NextGen Infrastructure for Big Data
NextGen Infrastructure for Big Data
Ed Dodds
Apache hadoop bigdata-in-banking
Apache hadoop bigdata-in-banking
m_hepburn
Telco Big Data 2012 Highlights
Telco Big Data 2012 Highlights
Alan Quayle
Big data Analytics
Big data Analytics
TUSHAR GARG
Research paper on big data and hadoop
Research paper on big data and hadoop
Shree M.L.Kakadiya MCA mahila college, Amreli
Monitizing Big Data at Telecom Service Providers
Monitizing Big Data at Telecom Service Providers
DataWorks Summit
Big data, data science & fast data
Big data, data science & fast data
Kunal Joshi
Big Data Infrastructure and Analytics Solution on FITAT2013
Big Data Infrastructure and Analytics Solution on FITAT2013
Erdenebayar Erdenebileg
IBM InfoSphere Data Replication for Big Data
IBM InfoSphere Data Replication for Big Data
IBM Analytics
Deutsche Telekom on Big Data
Deutsche Telekom on Big Data
DataWorks Summit
Big Data World Forum
Big Data World Forum
bigdatawf
Big Data on AWS
Big Data on AWS
Amazon Web Services LATAM
Mais procurados
(20)
Why Infrastructure Matters for Big Data & Analytics
Why Infrastructure Matters for Big Data & Analytics
Gov & Private Sector Regulatory Compliance: Using Hadoop to Address Requirements
Gov & Private Sector Regulatory Compliance: Using Hadoop to Address Requirements
Telco Big Data Workshop Sample
Telco Big Data Workshop Sample
Big Data Overview
Big Data Overview
Value proposition for big data isv partners 0714
Value proposition for big data isv partners 0714
Big data analytics, research report
Big data analytics, research report
Big Data PPT by Rohit Dubey
Big Data PPT by Rohit Dubey
Big data - what, why, where, when and how
Big data - what, why, where, when and how
NextGen Infrastructure for Big Data
NextGen Infrastructure for Big Data
Apache hadoop bigdata-in-banking
Apache hadoop bigdata-in-banking
Telco Big Data 2012 Highlights
Telco Big Data 2012 Highlights
Big data Analytics
Big data Analytics
Research paper on big data and hadoop
Research paper on big data and hadoop
Monitizing Big Data at Telecom Service Providers
Monitizing Big Data at Telecom Service Providers
Big data, data science & fast data
Big data, data science & fast data
Big Data Infrastructure and Analytics Solution on FITAT2013
Big Data Infrastructure and Analytics Solution on FITAT2013
IBM InfoSphere Data Replication for Big Data
IBM InfoSphere Data Replication for Big Data
Deutsche Telekom on Big Data
Deutsche Telekom on Big Data
Big Data World Forum
Big Data World Forum
Big Data on AWS
Big Data on AWS
Destaque
Social Data Analytics using IBM Big Data Technologies
Social Data Analytics using IBM Big Data Technologies
Nicolas Morales
Skilwise Big data
Skilwise Big data
Skillwise Group
Big Data and Analytics: The IBM Perspective
Big Data and Analytics: The IBM Perspective
The_IPA
The Palantir Seeing Stone
The Palantir Seeing Stone
Enterprise 2.0 Conference
High-level languages for Big Data Analytics (Presentation)
High-level languages for Big Data Analytics (Presentation)
Jose Luis Lopez Pino
5 Level of MDM Maturity
5 Level of MDM Maturity
PanaEk Warawit
Big data ppt
Big data ppt
Shweta Sahu
How does Microsoft solve Big Data?
How does Microsoft solve Big Data?
James Serra
Virdata: lessons learned from the Internet of Things and M2M Cloud Services @...
Virdata: lessons learned from the Internet of Things and M2M Cloud Services @...
Nathan Bijnens
Creating a Data-Driven Organization: an executive summary
Creating a Data-Driven Organization: an executive summary
Carl Anderson
Data strategy in a Big Data world
Data strategy in a Big Data world
Craig Milroy
Developing a Data Strategy -- A Guide For Business Leaders
Developing a Data Strategy -- A Guide For Business Leaders
ibi
Becoming a Data-Driven Organization - Aligning Business & Data Strategy
Becoming a Data-Driven Organization - Aligning Business & Data Strategy
DATAVERSITY
Data Strategy
Data Strategy
Jeff Block
Palantir, Quid, RecordedFuture: Augmented Intelligence Frontier
Palantir, Quid, RecordedFuture: Augmented Intelligence Frontier
Daniel Kornev
Data Governance: Keystone of Information Management Initiatives
Data Governance: Keystone of Information Management Initiatives
Alan McSweeney
Four ways data is improving healthcare operations
Four ways data is improving healthcare operations
Tableau Software
Tableau presentation
Tableau presentation
kt166212
Big Data Analytics with Hadoop
Big Data Analytics with Hadoop
Philippe Julio
Destaque
(19)
Social Data Analytics using IBM Big Data Technologies
Social Data Analytics using IBM Big Data Technologies
Skilwise Big data
Skilwise Big data
Big Data and Analytics: The IBM Perspective
Big Data and Analytics: The IBM Perspective
The Palantir Seeing Stone
The Palantir Seeing Stone
High-level languages for Big Data Analytics (Presentation)
High-level languages for Big Data Analytics (Presentation)
5 Level of MDM Maturity
5 Level of MDM Maturity
Big data ppt
Big data ppt
How does Microsoft solve Big Data?
How does Microsoft solve Big Data?
Virdata: lessons learned from the Internet of Things and M2M Cloud Services @...
Virdata: lessons learned from the Internet of Things and M2M Cloud Services @...
Creating a Data-Driven Organization: an executive summary
Creating a Data-Driven Organization: an executive summary
Data strategy in a Big Data world
Data strategy in a Big Data world
Developing a Data Strategy -- A Guide For Business Leaders
Developing a Data Strategy -- A Guide For Business Leaders
Becoming a Data-Driven Organization - Aligning Business & Data Strategy
Becoming a Data-Driven Organization - Aligning Business & Data Strategy
Data Strategy
Data Strategy
Palantir, Quid, RecordedFuture: Augmented Intelligence Frontier
Palantir, Quid, RecordedFuture: Augmented Intelligence Frontier
Data Governance: Keystone of Information Management Initiatives
Data Governance: Keystone of Information Management Initiatives
Four ways data is improving healthcare operations
Four ways data is improving healthcare operations
Tableau presentation
Tableau presentation
Big Data Analytics with Hadoop
Big Data Analytics with Hadoop
Semelhante a IBM-Why Big Data?
Konceptuelt overblik over Big Data, Flemming Bagger, IBM
Konceptuelt overblik over Big Data, Flemming Bagger, IBM
IBM Danmark
Ibm swg day 2012 jhb big data (white)
Ibm swg day 2012 jhb big data (white)
simonje
MICROSTRATEGY - Sessione introduttiva sulla piattaforma di Business Intelligence
MICROSTRATEGY - Sessione introduttiva sulla piattaforma di Business Intelligence
Twinergy
Future Tech for Concur User Conference
Future Tech for Concur User Conference
Michael Fauscette
Social mobile usage Don't Leave Social at the Office
Social mobile usage Don't Leave Social at the Office
Heath McCarthy
ibm
ibm
canaleenergia
Entering engineering workforce_v4
Entering engineering workforce_v4
Tony Pearson
Social Networking 3 - Final
Social Networking 3 - Final
Willie Favero
Accelerate Return on Data
Accelerate Return on Data
Jeffrey T. Pollock
ASAE Tech Conference 2012 Closing Keynote on Disruptive Tech
ASAE Tech Conference 2012 Closing Keynote on Disruptive Tech
Dion Hinchcliffe
The mobile traveler experience
The mobile traveler experience
Kevin May
Cognitive computing big_data_statistical_analytics
Cognitive computing big_data_statistical_analytics
Pietro Leo
Smarter Analytics: Big Data and Predictive Governance
Smarter Analytics: Big Data and Predictive Governance
IBM Danmark
Mobile Megatrends 2008
Mobile Megatrends 2008
Georges-Hubert Delporte
Kim Escherich - How Big Data Transforms Our World
Kim Escherich - How Big Data Transforms Our World
BigDataViz
Palestra "Technology Trends To Watch In 2012 and beyond"
Palestra "Technology Trends To Watch In 2012 and beyond"
Dígitro Tecnologia
Big Data: Industry trends and key players
Big Data: Industry trends and key players
CM Research
Intel Cloud summit: Big Data by Nick Knupffer
Intel Cloud summit: Big Data by Nick Knupffer
IntelAPAC
Ibm software network2012 claudio cinquepalmi #ibmsocialbiz
Ibm software network2012 claudio cinquepalmi #ibmsocialbiz
Claudio Cinquepalmi
big-data-8722-m8RQ3h1.pptx
big-data-8722-m8RQ3h1.pptx
VaishnavGhadge1
Semelhante a IBM-Why Big Data?
(20)
Konceptuelt overblik over Big Data, Flemming Bagger, IBM
Konceptuelt overblik over Big Data, Flemming Bagger, IBM
Ibm swg day 2012 jhb big data (white)
Ibm swg day 2012 jhb big data (white)
MICROSTRATEGY - Sessione introduttiva sulla piattaforma di Business Intelligence
MICROSTRATEGY - Sessione introduttiva sulla piattaforma di Business Intelligence
Future Tech for Concur User Conference
Future Tech for Concur User Conference
Social mobile usage Don't Leave Social at the Office
Social mobile usage Don't Leave Social at the Office
ibm
ibm
Entering engineering workforce_v4
Entering engineering workforce_v4
Social Networking 3 - Final
Social Networking 3 - Final
Accelerate Return on Data
Accelerate Return on Data
ASAE Tech Conference 2012 Closing Keynote on Disruptive Tech
ASAE Tech Conference 2012 Closing Keynote on Disruptive Tech
The mobile traveler experience
The mobile traveler experience
Cognitive computing big_data_statistical_analytics
Cognitive computing big_data_statistical_analytics
Smarter Analytics: Big Data and Predictive Governance
Smarter Analytics: Big Data and Predictive Governance
Mobile Megatrends 2008
Mobile Megatrends 2008
Kim Escherich - How Big Data Transforms Our World
Kim Escherich - How Big Data Transforms Our World
Palestra "Technology Trends To Watch In 2012 and beyond"
Palestra "Technology Trends To Watch In 2012 and beyond"
Big Data: Industry trends and key players
Big Data: Industry trends and key players
Intel Cloud summit: Big Data by Nick Knupffer
Intel Cloud summit: Big Data by Nick Knupffer
Ibm software network2012 claudio cinquepalmi #ibmsocialbiz
Ibm software network2012 claudio cinquepalmi #ibmsocialbiz
big-data-8722-m8RQ3h1.pptx
big-data-8722-m8RQ3h1.pptx
Mais de Kun Le
Ibm big data and analytics for a holistic customer journey
Ibm big data and analytics for a holistic customer journey
Kun Le
Lessons and Challenges from Mining Retail E-Commerce Data
Lessons and Challenges from Mining Retail E-Commerce Data
Kun Le
University of Washington Computer Science & Engineering CSE 403: Software Eng...
University of Washington Computer Science & Engineering CSE 403: Software Eng...
Kun Le
Business Intelligence and Retail
Business Intelligence and Retail
Kun Le
The “Big Data” Ecosystem at LinkedIn
The “Big Data” Ecosystem at LinkedIn
Kun Le
Marketo - Definitive guide to marketing metrics marketing analytics
Marketo - Definitive guide to marketing metrics marketing analytics
Kun Le
Best practices for building and deploying predictive models over big data pre...
Best practices for building and deploying predictive models over big data pre...
Kun Le
Architecting a-big-data-platform-for-analytics 24606569
Architecting a-big-data-platform-for-analytics 24606569
Kun Le
Under the hood_of_mobile_marketing
Under the hood_of_mobile_marketing
Kun Le
IBM-Infoworld Big Data deep dive
IBM-Infoworld Big Data deep dive
Kun Le
Smarter analytics for retailers Delivering insight to enable business success
Smarter analytics for retailers Delivering insight to enable business success
Kun Le
Quoc Le, Stanford & Google - Tera Scale Deep Learning
Quoc Le, Stanford & Google - Tera Scale Deep Learning
Kun Le
IBM - Using Predictive Analytics to Segment, Target and Optimize Marketing
IBM - Using Predictive Analytics to Segment, Target and Optimize Marketing
Kun Le
Big data that drives marketing roi across all channels & campaigns
Big data that drives marketing roi across all channels & campaigns
Kun Le
Mais de Kun Le
(14)
Ibm big data and analytics for a holistic customer journey
Ibm big data and analytics for a holistic customer journey
Lessons and Challenges from Mining Retail E-Commerce Data
Lessons and Challenges from Mining Retail E-Commerce Data
University of Washington Computer Science & Engineering CSE 403: Software Eng...
University of Washington Computer Science & Engineering CSE 403: Software Eng...
Business Intelligence and Retail
Business Intelligence and Retail
The “Big Data” Ecosystem at LinkedIn
The “Big Data” Ecosystem at LinkedIn
Marketo - Definitive guide to marketing metrics marketing analytics
Marketo - Definitive guide to marketing metrics marketing analytics
Best practices for building and deploying predictive models over big data pre...
Best practices for building and deploying predictive models over big data pre...
Architecting a-big-data-platform-for-analytics 24606569
Architecting a-big-data-platform-for-analytics 24606569
Under the hood_of_mobile_marketing
Under the hood_of_mobile_marketing
IBM-Infoworld Big Data deep dive
IBM-Infoworld Big Data deep dive
Smarter analytics for retailers Delivering insight to enable business success
Smarter analytics for retailers Delivering insight to enable business success
Quoc Le, Stanford & Google - Tera Scale Deep Learning
Quoc Le, Stanford & Google - Tera Scale Deep Learning
IBM - Using Predictive Analytics to Segment, Target and Optimize Marketing
IBM - Using Predictive Analytics to Segment, Target and Optimize Marketing
Big data that drives marketing roi across all channels & campaigns
Big data that drives marketing roi across all channels & campaigns
Último
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
Pixlogix Infotech
2024 April Patch Tuesday
2024 April Patch Tuesday
Ivanti
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
marketing932765
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
LoriGlavin3
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
itnewsafrica
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
panagenda
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
BookNet Canada
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)
Kaya Weers
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
panagenda
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
itnewsafrica
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Mark Goldstein
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
IES VE
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
LoriGlavin3
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
Curtis Poe
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
Neo4j
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
Inflectra
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
Pixlogix Infotech
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
LoriGlavin3
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
Nicole Novielli
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
BookNet Canada
Último
(20)
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
2024 April Patch Tuesday
2024 April Patch Tuesday
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
IBM-Why Big Data?
1.
Big Data Paul Zikopoulos,
BA, MBA Director, IBM Information Management Technical Professionals, WW Competitive Database, and Big Data @BigData_paulz IBM Certified Advanced Technical Expert (Clusters and DRDA) IBM Certified Customer Solutions Expert (DBA and BI) paulz_ibm@msn.com March 27, 2012 © 2012 IBM Corporation
2.
Paul C. Zikopoulos,
B.A., M.B.A., is the Director of Technical Professionals for IBM Software Group’s Information Management division and additionally leads the World Wide Database Competitive and Big Data SWAT teams. Paul is an award-winning writer and speaker with more than 18 years of experience in Information Management. Paul has written more than 350 magazine articles and 15 books including Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data, DB2 pureScale: Risk Free Agile Scaling, Break Free with DB2 9.7: A Tour of Cost Saving Features; Information on Demand: Introduction to DB2 9.5 New Features; DB2 Fundamentals Certification for Dummies; DB2 for Dummies; and more. Paul is a DB2 Certified Advanced Technical Expert (DRDA and Clusters) and a DB2 Certified Solutions Expert (BI and DBA). In his spare time, he enjoys all sorts of sporting activities, including running with his dog Chachi, avoiding punches in his MMA training, and trying to figure out the world according to Chloë—his daughter. You can reach him at: paulz_ibm@msn.com. @BigData_paulz 2 © 2012 IBM Corporation
3.
Why Big Data How
We Got Here March 27, 2012 © 2012 IBM Corporation
4.
4
In 2005 there were 1.3 billion RFID tags in circulation… …by the end of 2011, this was about 30 billion and growing even faster © 2012 IBM Corporation
5.
An increasingly sensor-enabled
and instrumented business environment generates HUGE volumes of data with MACHINE SPEED characteristics… 1 BILLION lines of code 5 EACH engine generating 10 TB every 30 minutes! © 2012 IBM Corporation
6.
350B
Transactions/Year Meter Reads every 15 min. 120M – meter reads/month 3.65B – meter reads/day 6 © 2012 IBM Corporation
7.
In August
of 2010, Adam Savage, of “Myth Busters,” took a photo of his vehicle using his smartphone. He then posted the photo to his Twitter account including the phrase “Off to work.” Since the photo was taken by his smartphone, the image contained metadata revealing the exact geographical location the photo was taken By simply taking and posting a photo, Savage revealed the exact location of his home, the vehicle he drives, and the time he leaves for work 7 © 2012 IBM Corporation
8.
The Social Layer
in an Instrumented Interconnected World 4.6 30 billion RFID billion tags today camera 12+ TBs (1.3B in 2005) phones of tweet data world every day wide 100s of millions of GPS data every day ? TBs of enabled devices sold annually 25+ TBs of 2+ log data billion every day people on the 76 million smart Web by meters in 2009… end 2011 200M by 2014 8 © 2012 IBM Corporation
9.
Twitter Tweets per
Second Record Breakers of 2011 Social-media analytics can be used from healthcare to predicting votes Challenges – Volume – Velocity – Variety – Language Processing: consider that Twitter sentences are not well formed and often use urban talk 9 © 2012 IBM Corporation
10.
Can a Social
Media Persona be Monetized? 10 © 2012 IBM Corporation
11.
Extract Intent, Life
Events, Micro Segmentation Attributes Chloe Name, Birthday, Family Tom Sit Not Relevant - Noise Tina Mu Monetizable Intent Jo Jobs Not Relevant - Noise Location Wishful Thinking 11 Relocation Monetizable Intent SPAMbots © 2012 IBM Corporation
12.
12
© 2012 IBM Corporation
13.
1.8 ZB
1 ZB 1 ZB=1T GB 4Trillion 8GB iPods 13 © 2012 IBM Corporation
14.
The Big Data
Opportunity Extracting insight from an immense volume, variety and velocity of data, in context, beyond what was previously possible. Variety: Manage the complexity of data in many different structures, ranging from relational, to logs, to raw text Velocity: Streaming data and large volume data movement Volume: Scale from Terabytes to Petabytes (1K TBs) to Zettabytes (1B TBs) 14 © 2012 IBM Corporation
15.
Data In Motion
Analyzes and correlates 5M+ 500K/sec, 6B+ IPDRs analyzed market messages/sec to execute per day on more than 4 PBs/yr. algorithmic option trades with sustaining 1GBps. average latency of 30 micro-secs. height: height: height: 640 1280 640 directory: directory: directory: directory: width: width: width: ”/img" ”/img" ”/opt" ”/img" 480 1024 480 filename: filename: filename: filename: data: data: data: “cow” “bird” “java” “cat” tuples 15 © 2012 IBM Corporation
16.
The Big Data
Conundrum The economies of deletion have changed…. – Leading us into new opportunities and challenges The percentage of available data an enterprise can analyze is decreasing proportionately to the available to that enterprise Quite simply, this means as enterprises, we are getting “more naive” about our business over time Data AVAILABLE to an organization Data an organization can PROCESS 16 © 2012 IBM Corporation
17.
Applications for Big
Data Analytics Smarter Healthcare Multi-channel Finance Log Analysis sales Homeland Security Traffic Control Telecom Search Quality Manufacturing Trading Analytics Fraud and Risk Retail: Churn, NBO 17 © 2012 IBM Corporation
18.
Bigger and Bigger
Volumes of Data Retailers collect click-stream data from Web site interactions and loyalty card-drive transaction data – This traditional POS information is used by retailer for shopping basket analysis, inventory replenishment, +++ – But data is being provided to suppliers for customer buying analysis Healthcare has traditionally been dominated by paper-based systems, but this information is getting digitized Science is increasingly dominated by big science initiatives – Large-scale experiments generate over 15 PB of data a year and can’t be stored within the data center; then sent to laboratories Financial services are seeing larger volumes through smaller trading sizes, increased market volatility, and technological improvements in automated and algorithmic trading Improved instrument and sensory technology – Large Synoptic Survey Telescope’s GPixel camera generates 6PB+ of image data per year or consider Oil and Gas industry 18 © 2012 IBM Corporation
19.
A Simple Log
File Jun 29 04:03:37 192.168.190.24 nagios: GLOBAL SERVICE EVENT HANDLER: ast-oral- vip; remedy_oracle_db_latch-contention; (null); (null); (null); handle-all- critical-event DATE TIME IP ADDRESS MONITOR Jun 29 04:03:57 192.168.190.24 nagios: SERVICE ALERT: ast-oral- vip;remedy_oracle_db_soft-prase-ratio;CRITICAL;HARD;3;CRITICAL – Soft parse ratio 86.63% TYPE COMPONENT Jun 29 07:50:57 192.168.190.24 nagios: SERVICE ALERT: ast-oral vip;remedy_oracle_db_latch-waiting; OK;SOFT;2;OK – SGA latch xssinfo freelist (#350) sleeping 0.000000% of the time Jun 29 07:50:57 192.168.190.24 nagios: GLOBAL SERVICE EVENT HANDLER: ast-oral- vip;remedy_oracle_db_latch-waiting; (null); (null); (null); handle-all- critical-event 19 © 2012 IBM Corporation
20.
20
© 2012 IBM Corporation
21.
Click Stream Analysis
Use Case Goal: Determine how many abandoned shopping carts there are and where they were abandoned. Input: Web log data (IP address, timestamp, URL, HTTP return codes). Output: List of IP addresses and last page visited by user. Web log input Map Step K2 V2 201.67.34.67 | 10:24:48 | www.ff.com | 200 Make IP 201.67.34.67 10:24:48,www.ff.com 123.99.54.76 | 11:15:21 | www.ff.com/search? | 200 address the 123.99.54.76 11:15:21,www.ff.com/search? 231.67.66.84 | 02:40:33 | www.ff.com/addItem.jsp | 200 key and 1 line timestamp 231.67.66.84 02:40:33,www.ff.com/addItem.jsp 101.23.78.92 | 07:09:56 | www.ff.com/shipping.jsp | 200 and URL the 101.23.78.92 07:09:56,www.ff.com/shipping.jsp 114.43.89.75 | 01:32:43 | www.ff.com | 200 value. 114.43.89.75 01:32:43,www.ff.com 109.55.55.81 | 09:45:31 | www.ff.com/confirmation.jsp | 200 109.55.55.81 09:45:31,www.ff.com/confirmation.jsp K1 – Ignored, V1 – 1 line of log Reduce Step K3 V3 Ensure K4 Shuffle/Sort/Group 201.67.34.67 10:24:48,www.ff.com something is Framework 201.67.34.67 in cart and does this automatically determine last 10:25:15,www.ff.com/search? page visited. V4 www.ff.com/shipping 10:26:23,www.ff.com/shipping 21 © 2012 IBM Corporation
22.
Solution Architecture –
Social Media Analytics for Media and Entertainment Online flow: Data-in-motion analysis Social Media Stream Computing and Analytics Timely Decisions Entity Predictive Data Ingest Text Analytics: Analytics: Analytics: and Prep Timely Insights Profile Action Resolution Determination Dashboard Hadoop System and Analytics Comprehensive Entity Social Media Social Media Predictive Customer Text Analytics Analytics and Data Customer Analytics Models Integration Profiles Offline flow: Data-at-rest analysis Reports 22 © 2012 IBM Corporation
23.
Most Requested Uses
of Big Data Log Analytics & Storage Smart Grid / Smarter Utilities RFID Tracking & Analytics Fraud / Risk Management & Modeling 360° View of the Customer Warehouse Extension Email / Call Center Transcript Analysis Call Detail Record Analysis +++ 23 23 © 2012 IBM Corporation
24.
Why Didn’t We
Use All of the Big Data Before? “IBM gave us an opportunity to turn our plans into something that was very tangible right from the beginning. IBM had experts within data mining, Big Data, and Apache Hadoop and it was clear to use from the beginning we wanted to improve our business, not only today, but also prepare for the challenges we will face in three to five years, we had to go with IBM.” – Lars Christian Christensen VP Plant Siting & Forecasting 24 © 2012 IBM Corporation
25.
25
© 2012 IBM Corporation
26.
Watson’s advanced analytic
capabilities can sort through the equivalent of 26 200 MILLION pages of data to uncover an answer in 3 SECONDS.Corporation © 2012 IBM
27.
IBM Big Data
Strategy: Move the Analytics Closer to the Data New analytic applications drive Analytic Applications the requirements for a big data BI / Exploration / Functional Industry Predictive Content Reporting Visualization App BI / App Analytics Analytics platform Reportin g – Integrate and manage the full IBM Big Data Platform variety, velocity and volume of data – Apply advanced analytics to Visualization Application Systems Development Management information in its native form & Discovery – Visualize all available data for ad- hoc analysis Accelerators – Development environment for building new analytic applications Hadoop Hadoop Stream Stream Data – Workload optimization and System System Computing Computing Warehouse scheduling – Security and Governance Information Integration & Governance 27 © 2012 IBM Corporation
28.
The IBM Big
Data Platform InfoSphere BigInsights Hadoop-based low latency analytics for variety and volume Data-At-Rest MPP Hadoop MPP MPP Stream Information Computing InfoSphere Information Server Integration InfoSphere Streams Low Latency Analytics for High volume data streaming data integration and Velocity, Variety & Volume transformation MPP Data Data-In-Motion Warehouse Netezza High Netezza 1000 Capacity Appliance BI+Ad Hoc Analytics on Queryable Archive for Structured Data Structured Data InfoSphere Warehouse Smart Analytics System Large volume structured Operational Analytics on data analytics Structured Data 28 © 2012 IBM Corporation
29.
Deep Analytics Appliance
– Revolutionized Analytics Purpose-built analytics appliance Dedicated High Performance Speed: 10-100x faster than traditional systems Disk Storage Simplicity: Minimal administration and tuning Scalability: Peta-scale user data capacity Blades With Custom FPGA Accelerators Smart: High-performance advanced analytics 29 © 2012 IBM Corporation
30.
Complementary Analytics
Traditional Approach New Approach Structured, analytical, logical Creative, holistic thought, intuition Data Hadoop Warehouse Streams Transaction Data Web Logs Internal App Data Social Data Structured Structured Unstructured Repeatable Enterprise Unstructured Exploratory Mainframe Data Repeatable Integration Exploratory Text Data: emails Linear Iterative Linear Monthly sales reports Iterativesentiment Brand Profitability analysis Product Sensor data: images strategy OLTP System Datasurveys Customer Maximum asset utilization ERP data Traditional New RFID Sources Sources 30 © 2012 IBM Corporation
31.
Most Client Use
Cases Combine Multiple Technologies Pre-processing Ingest and analyze unstructured data types and convert to structured data Combine structured and unstructured analysis Augment data warehouse with additional external sources, such as social media Combine high velocity and historical analysis Analyze and react to data in motion; adjust models with deep historical analysis Reuse structured data for exploratory analysis Experimentation and ad-hoc analysis with structured data 31 © 2012 IBM Corporation
32.
Data Governance
ed Some im es term “Sche times t e r me Somet a Fi rs t” ma L d m ater” “Sche Separation of Duties Separation of Duties – Users can’t delete their audit logs – No authorized access Privilege Users Privilege Users – Monitoring authorized and privilege – Monitoring authorized and privilege user activities user activities Sensitive Data (in tables) Sensitive Data (in the file system) – Protecting and blocking unauthorized – Protecting and blocking unauthorized access to the system and the data access to the system and the data Workflow for audit and compliance Workflow for audit and compliance controls to validate proper security controls to validate proper security procedures (satisfies regulations!) procedures (satisfies regulations!) 32 Internal Use Only © 2012 IBM Corporation
33.
Data Security Design
Goals ed Some imes term tim “Sche es termed Somet a Fi rs t” ma L m ater” “Sche Minimal impact to the Database Minimal impact to the Big Data server resources server resources No Database configuration changes No Big Data configuration changes Separation of Duties: Preventing Separation of Duties: Preventing privilege users from doing malicious privilege users/jobs from doing activities malicious activities Audit trail with very granular details Audit trail with very granular details of Database activity of Big Data activity Real-time alerting and blocking Real-time alerting and blocking Minimal impact to the network Minimal impact to the network 100% Database activity visibility 100% Big Data activity visibility Heterogeneous support Heterogeneous support 33 Internal Use Only © 2012 IBM Corporation
34.
Guardium
Customer Identification Privacy Master Data Management Master Data Management Data Privacy Data Privacy InfoSphere InfoSphere Optim for Test Data, Quality DB2 Redaction, +++ MDM Stage Data Customer Intelligence Appliance Data Quality Quality Data Models Out-of-the-box analytics Information Server Cognos Pre-built Customer Integration behavioral IBM Global Business Appliance attributes Services IBM Retail Data Model Core Metrics Unica Enterprise Data Warehouse Enterprise Data Warehouse Applications and Operational Analytics Applications and Operational Analytics Online Archive OLTP and Big Data Integration Managing Growth Managing Growth Built-in Integration into Big Data Built-in Integration into Big Data Optim Data Archive DB2 SAP - HANA solidDB Informix DB2 34 © 2012 IBM Corporation
35.
IBM Flattens the
Time to Value Big Data Curve + + + + + + Harden Text/ML Velocity Hadoop ETL Dev. Tooling Visualization File System Analytics Harden 100+Big Data Tooling Dev. Text/ML Velocity Hadoop ETL Visualization File System Analytics Business Partners + + + + OR Signed + 35 © 2012 IBM Corporation
36.
First Ever Forrester
Wave on Big Data “IBM has the deepest Hadoop platform and application portfolio. IBM, an established EDW vendor, has its own Hadoop distribution; an extensive professional services force working on Hadoop projects; extensive R&D programs developing Hadoop technologies; connections to Hadoop from its EDW.” –The Forrester Wave™: Enterprise Hadoop Solutions, 1Q12 36 © 2012 IBM Corporation
37.
Open Source
Technology Behind Hadoop March 27, 2012 © 2012 IBM Corporation
38.
A Difference of
Processing Models SETI@home is a Computational Processing Model – Service for Extraterrestrial Intelligence (SETI) uses unused desktop CPU processing power to perform wide-spread analysis of radio telescope data – Pushes data to the program for processing – Data to function MapReduce is a Data Processing Model – Data processing primitives of Mappers and Reducers – Can be complex to write MapReduce programs, but a very simple configuration change makes it scale to 1000s of nodes – Function to data 38 © 2012 IBM Corporation
39.
Building Blocks of
Hadoop 39 © 2012 IBM Corporation
40.
Hadoop Framework Hadoop
Common – A utility layer that provides access to the HDFS and projects HDFS – Data storage platform for Hadoop framework – Can scale to massive size when distributed over multiple computers MapReduce – Framework to process data across a node clusters – MAP process splits work by first mapping the input across the control nodes of the cluster, then splitting the workload into even smaller data sets and distributing it further throughout the computing cluster • Allows MPP – think DB2 DPF ‘like’ – REDUCE collects and combines the nodes’ answers to deliver a result 40 © 2012 IBM Corporation
41.
Hadoop Explained
Hadoop computation model – Data stored in a distributed file system spanning many inexpensive computers – Bring function to the data – Distribute application to the compute resources where the data is stored Scalable to thousands of nodes and petabytes of data public static class TokenizerMapper public static class TokenizerMapper Hadoop Data Nodes extends Mapper<Object,Text,Text,IntWritable> { extends Mapper<Object,Text,Text,IntWritable> { private final static IntWritable private final static IntWritable one = new IntWritable(1); one = new IntWritable(1); private Text word = new Text(); private Text word = new Text(); public void map(Object key, Text val, Context public void map(Object key, Text val, Context StringTokenizer itr = StringTokenizer itr = new StringTokenizer(val.toString()); new StringTokenizer(val.toString()); 1. Map Phase while (itr.hasMoreTokens()) { while (itr.hasMoreTokens()) { word.set(itr.nextToken()); word.set(itr.nextToken()); context.write(word, one); context.write(word, one); } } (break job into small parts) } } } } public static class IntSumReducer public static class IntSumReducer extends Reducer<Text,IntWritable,Text,IntWrita 2. Shuffle extends Reducer<Text,IntWritable,Text,IntWrita Distribute map private IntWritable result = new IntWritable(); private IntWritable result = new IntWritable(); public void reduce(Text key, public void reduce(Text key, Iterable<IntWritable> val, Context context){ tasks to cluster Iterable<IntWritable> val, Context context){ (transfer interim output int sum = 0; int sum = 0; for (IntWritable v : val) { for (IntWritable v : val) { sum += v.get(); sum += v.get(); . . . . . . for final processing) MapReduce Application 3. Reduce Phase (boil all output down to Shuffle a single result set) Result Set Return a single result set 41 © 2012 IBM Corporation
42.
Growing the Hadoop
Environment Avro: Data serialization system that converts data into a fast, compact binary data format – Avro data can be stored in a file with versioned schemas Chukwa: Large-scale monitoring system that provides insights into the Hadoop distributed file system and MapReduce HBase is a scalable, column-oriented distributed database modeled after Google's BigTable distributed storage system Hive is a data warehouse infrastructure that provides ad hoc query and data summarization for Hadoop Supported data – Hive utilizes SQL-like query language call HiveQL – HiveQL also used by programmers to run custom MapReduce jobs. 42 © 2012 IBM Corporation
43.
Growing the Hadoop
Environment Mahout is a data mining library designed to work on the Hadoop framework – Mahout delivers a core set of algorithms designed for clustering, classification and batch-based filtering. Pig is a high-level programming language and execution framework for parallel computation – Pig works within the Hadoop and MapReduce frameworks ZooKeeper provides coordination, configuration and group services for distributed applications working over the Hadoop stack 43 © 2012 IBM Corporation
44.
Committed to Open
Source Decade of lineage and contributions to the open source community – Apache Hadoop and Jaql, Apache Derby, Apache Geronimo, Apache Jakarta, +++ – Eclipse: founded by IBM – Significant Lucene contributions via IBM Lucene Extension Library (ILEL) – DRDA, XQuery, SQL, XML4J, XERCES, HTTP, Java, Linux, +++ IBM products built on open source – WebSphere: Apache – Rational: Eclipse and Apache – InfoSphere: Eclipse and Apache, +++ IBM’s BigInsights (Hadoop) is 100% open source compatible with no forks 44 © 2012 IBM Corporation
45.
Learn Hadoop: At
Your Place, At Your Pace Making Learning Hadoop Easy and Fun Flexible on-line delivery allows learning @your place and @your pace Free courses, free study materials Cloud-based sandbox for exercises – zero setup 8500+ registered students Hadoop Programming Challenge is sending 3 students to IOD 2011 Conference, all expenses paid 45 © 2012 IBM Corporation
46.
Why IBM
for Big Data The Solution Side Paul Zikopoulos, BA, MBA Director, IBM Information Management Technical Professionals, WW Competitive Database, and Big Data IBM Certified Advanced Technical Expert (Clusters and DRDA) IBM Certified Customer Solutions Expert (DBA and BI) paulz@ca.ibm.com March 27, 2012 © 2012 IBM Corporation
47.
A Big Data
Platform Analytics Excellence In-Motion Operational At-Rest Operational Excellence Text Analytics Toolkit Excellence Harden Hadoop - GPFS Machine Learning Toolkit Surface Area Lock Down Unrivalled…. Industry Accelerators Development Tooling Embrace and Extend Policy Driven Retention & Immutability Role-Based Security Visualization Tooling Adaptive MapReduce Deployment Tooling (“App Workload Manager Store”) Fast Splittable CMX Compression $14B in 5 yrs. on Analytics REST-exposed Administration +++ Open Source IBMHadoop Big Data +++ Platform In-Motion At-Rest Analyze extreme amounts of Beyond traditional data in milliseconds structured data Uses same analytics as BigInsights uses same analytics as Streams BigInsights No forked, not ported: Hadoop Extended with Data can be analyzed on the way operational excellence and security into the enterprise for earlier Netezza for in-database MapReduce pattern detection MPP Data Warehouses 47 © 2012 IBM Corporation
48.
Committed to Innovation
$100M new investment into analytics announced 2Q11 IBM spent $14+ billion in 24 analytics acquisitions in 5 years IBM has the largest commercial research organization on Earth – 200+ mathematicians developing breakthrough analytics Largest patent portfolio in the industry For 19 consecutive years IBM inventors have received the most U.S. Patents 6,000 More than 48 patents in a single year © 2012 IBM Corporation
49.
A Big Data
Platform for Data In-Motion and At-Rest Data Ingest 01011001100011101001001001001 110100101010011100101001111001000100100010010001000100101 11000100101001001011001001010 01100100101001001010100010010 Opportunity Cost Starts Here 01100100101001001010100010010 11000100101001001011001001010 01100100101001001010100010010 Bootstrap 01100100101001001010100010010 Enrich 01100100101001001010100010010 01100100101001001010100010010 11000100101001001011001001010 01100100101001001010100010010 01100100101001001010100010010 01100100101001001010100010010 Adaptive 01100100101001001010100010010 01100100101001001010100010010 Analytics 11000100101001001011001001010 01100100101001001010100010010 Model 01100100101001001010100010010 01100100101001001010100010010 11000100101001001011001001010 49 © 2012 IBM Corporation
50.
Data In Motion
Hear what’s going on miles away to optimize perimeter displacements Perspective: Try to find the word “Zero” in a 1000 MP3 song library in a fraction of a second – Figure out the difference between the sound of a human whisper and the wind 50 © 2012 IBM Corporation
51.
Average cluster size
is 100 nodes one hour What if of this 100-node cluster cost 51 $34 © 2012 IBM Corporation
52.
52
© 2012 IBM Corporation
53.
Ease of Use
for Developers and Users End-user Visualization Development Environment Data exploration, crawling, Familiar coding and tooling and analytics environment, testing, and optimization 53 © 2012 IBM Corporation
54.
What is BigSheets?
Browser-based Big Data analytics tool for business users Big Data Challenges… How can BigSheets help? Business users need a no Spreadsheet-like discovery interface lets programming approach for business users easily analyze Big Data analyzing Big Data with ZERO PROGRAMMING Extremely difficult to find actionable business insights in BUILT-IN “readers” can work data from multiple sources with with data in several common formats different formats – JSON arrays, CSV, TSV, Web crawler output, . . . Translating untapped data into actionable business insights is Users can VISUALLY combine and a common requirement that explore various types of data to identify requires visualization “hidden” insights 54 © 2012 IBM Corporation
55.
55
© 2012 IBM Corporation
56.
8% spread
change affecting brand Over 1M tweets analyzed 2nd most Tweets/sec run rate as of 1Q12 56 © 2012 IBM Corporation
57.
Big Data Made
Easy for the Little Guy USC’s Film Forecaster correctly predicted a clamor for "Hangover 2” that resulted in $100 million opening over Memorial Day weekend – Looked at 250K-500K Tweets and broke down positive and negative messages using a lexicon of 1700 words The Film Forecaster sounds like a big undertaking for USC, but it really came down to one communications masters student who learned Big Sheets in a day, then pulled in the tweets and analyzed them - Ryan Kim 57 © 2012 IBM Corporation
58.
Running Applications from
the Web Console 58 © 2012 IBM Corporation
59.
A Rich Management
Big Data Tool Quick links to common functions Learn more through external Where and how to begin Web resources performing common administrative or analytical tasks 59 © 2012 IBM Corporation
60.
A Rich Management
Big Data Tool 60 © 2012 IBM Corporation
61.
Rich Big Data
Development Environment Eclipse base Development tools – For JQAL, Hive, PIG, Java MapReduce and Text Analytics 61 © 2012 IBM Corporation
62.
The Path to
Efficiency: Declarative Languages Offline Runtime Development Environment Extracted Declarative Language Declarative Language Select Objects Regex create view MonetizableIntent as select P.name as product, Join I.clue as strength Select from Intent I, Product P Intent Product Join Runtime Runtime where Follows(I.clue, P.name, 0, 20) Regex Product and Not(ContainsRegex(/b(not)b/, Cost-based LeftContext(I.clue, 10))); Intent optimization Input … Documents Join Select Product Streams, Text Analytics, Machine Regex High-throughput Learning, SQL all are declarative, simple to learn languages Intent Small memory footprint All have strong development tooling and accelerators Optimizes for tasks: IBM has been optimizing declarative languages for decades, Example, Text IN FACT, IBM INVENTED IT! Analytics needs CPU optimization 62 © 2012 IBM Corporation
63.
Analytic Accelerators Designed
for Variety Text Simple & (listen, verb), Acoustic (radio, noun) Advanced Text Mining in Advanced Microseconds Mathematical Models Predictive ∑ R ( st , at ) population Statistics GeoSpatial Image & Video 63 © 2012 IBM Corporation
64.
Accelerators Improve Time
to Value Telecommunications Retail Customer CDR streaming analytics Intelligence Deep Network Analytics Customer Behavior and Lifetime Value Analysis Finance Social Media Streaming options trading Analytics Insurance and banking DW Sentiment Analytics, Intent to models purchase Public Data Mining Transportation Streaming statistical Real-time monitoring and analysis routing optimization Over 100 sample User Defined Standard Industry Data applications Toolkits Toolkits Models Banking, Insurance, Telco, Healthcare, Retail 64 © 2012 IBM Corporation
65.
Telecommunications CDR Analytic
Accelerator Analyze Call Detail Records in Real Time Streaming Analytic Accelerators – CDR dropped call analysis – Determine VIP customers with service issues – proactive alerts – CDR Adapters – ASN.1, Binary, ASCII – Analytic Operators – CDR de-duplication, dropped call detection, termination reason, customer importance – Visualization – real-time KPI dashboard Data Warehouse Appliance – Integrated network, devices, customer, and services model – Telecom model, KPIs, and KQIs 65 © 2012 IBM Corporation
66.
Without a Big
Data Platform IBM Big Data Platform You Code… Over 100 sample applications and toolkits with industry focused Event toolkits with 300+ functions and Custom SQL Handling and operators! Scripts Multithreading Streams provides development, deployment, runtime, and infrastructure services Check Application Pointing Management Accelerators HA and Toolkits Performance Debug Connectors Optimization “TerraEchos developers can deliver Security applications 45% faster due to the agility of Streams Processing Language…” – Alex Philip, CEO and President 66 © 2012 IBM Corporation
67.
Streams Runtime Illustrated
Optimizing scheduler assigns PEs Dynamically add to nodes, and continually manages nodes and jobs resource allocation Commodity hardware – laptop, Add in SPSS blades or high performance clusters jobs in the flow Meters Company Usage Model Filter Temp Meters Action Text Season Daily Usage Extract Adjust Adjust Contract Text Extract Degree History Compare History Store History X86 Host X86 Host X86 Host X86 Host X86 Host 67 © 2012 IBM Corporation
68.
Streams Runtime Illustrated
PEs on busy nodes, can PEs on failing nodes can be be moved manually by the moved automatically, with Streams administrator communications re-routed Meters Company Usage Model Filter Temp Meters Action Text Season Daily Usage Extract Adjust Adjust Contract Text Extract Degree History Compare History Store History X86 Host X86 Host X86 Host X86 Host X86 Host 68 © 2012 IBM Corporation
69.
How Text Analytics
Works Football World Cup 2010, one team distinguished themselves well, losing to the eventual champions 1- 0 in the Final. Early in the second half, Netherlands’ striker, Arjen Robben, had a breakaway, but the keeper for Spain, Iker Casilas made the save. Winger Andres Iniesta scored for Spain for the win. World Cup 2010 Highlights Arjen Robben Striker Netherlands Iker Casilas Keeper Spain Andres Iniesta Winger Spain 69 © 2012 IBM Corporation
70.
What’s Wrong with
Text Analytics Today Current alternative approaches and infrastructure for text analytics present challenges for analysts – They tend to perform poorly (in terms of accuracy and speed) – They are difficult to use These alternative approaches rely on the raw text flowing only forward through a system of extractors and filters – Inflexible and inefficient approach, often resulting in redundant processing Existing toolkits are also limited in their expressiveness – Analysts having to develop custom code – Programmer Analyst (think Java Developer DBA struggles) – Leads to more delays, complexity, and difficulties getting it right – Biggest factor hurting analyst productivity is the difficulty in determining how the system produced a certain result 70 © 2012 IBM Corporation
71.
Extracting Person and
Phone Relationships 71 © 2012 IBM Corporation
72.
Text Analytic Toolkit
Example Semantically search Enron’s emails to support queries such as “Find Tom’s phone” in order to find his actual phone number – As opposed to emails containing words Tom and Phone Need to be able to accurately extract email entities of type Person and Phone and the relationship between them 72 © 2012 IBM Corporation
73.
The Enron Email
Example Start with a naïve set of rules to identify a PersonPhone relationship built using some sort of editor – Build out an extractor that know how to find a person’s name: Lorraine Smith – Build out an extractor that knows how to find a phone number: 607-205-4493 607.205.4493 Euro spacing xt. ext. … North American spacing +0119054132112 Globalstar GSP-1600 911, 411 +++ 73 © 2012 IBM Corporation
74.
The Tricky Thing
About Sentiment… 74 © 2012 IBM Corporation
75.
Text Analytics Toolkit
System T text analytics engine previously only embedded in IBM products and hidden from end users – Found in Lotus Notes, IBM e-discovery Analyzer, CCI, InfoSphere Warehouse,+++ – Almost a decade since initial release BigInsights is the first time IBM opens up the Text Analytics Engine technology for customization and development BigInsights Text Analytic Toolkit provides developer tools, an easy to use text analytics language, and a set of extractors for fast adoption – Multilingual support, including support for DBCS languages BigInsights includes Annotator Query Language (AQL): SQL-like! – Fully declarative text analytics language – No “black boxes” or modules that can’t be customized. – Tooling for easy customization because you are abstracted from the programmatic details – Competing solutions make use of locked up black-box modules that cannot be customized, which restricts flexibility and are difficult to optimize for performance 75 © 2012 IBM Corporation
76.
Accelerating Analytics –
Explainability Every annotation’s provenance can be visualized Provenance of Morgan Stanley Fax: 205-4493 shows the FullName rule is responsible for generating incorrect annotation – Solutions? • Remove Morgan Stanley from FullName dictionary? • Create new dictionary for CompanyName? Text Analytics Toolkit 76 © 2012 IBM Corporation
77.
Text Analytics Toolkit
Provenance Viewer Major challenge for analysts is determining the lineage of changes that have been applied to text – REALLY difficult to diskern which extractors need to be adjusted to tweak the resulting annotations Provenance Viewer for interactive visualizations to display output annotations for regression, debugging, version enhancements, +++ – Reduce development of extractors by days to weeks 77 © 2012 IBM Corporation
78.
Accelerating Analytics –
discovery and Pattern Matching Contextual Clues discoverer cluster the context surrounding annotation in order to detect frequently occurring patterns Illustrate clustering results between incorrect PersonPhone pairs – Visualize frequent occurrences (bubble size) of clues: i.e. FAX and TELEFAX – Developer improves precision of annotator by adding a rule that filters out PersonPhone pars if the Phone is preceded by a FAX clue – Developer finds call at text as key clue for rule extraction for PersonPhone Text Analytics Toolkit 78 © 2012 IBM Corporation
79.
Accelerating Analytics –
Assisted Development IDE that exposes sophisticated techniques to assist rule developers throughout all states of the development cycle – Promotes agile development because it unifies the business analyst and the developer with a common toolset which fosters understanding • Business Analyst helps mature and validates extractor results • Developer understands visually that AQL enhancements needed to refine rules Text Analytics Toolkit 79 © 2012 IBM Corporation
80.
IBM Text Analytics
Toolkit - Development Toolset Context Sensitive Help Outline View Hyperlink Navigation Procedural Help 80 © 2012 IBM Corporation
81.
BigInsights Text Analytics
Development Difference Content Assist Viewer Regular Expression Builder Context Sensitive Help Outline View Hyperlink Navigation Procedural Help Design Time Validation 81 © 2012 IBM Corporation
82.
Text Analytics Toolkit
– Development Accelerator Eclipse plug-ins enhance analyst productivity – When writing AQL code, the editor features syntax highlighting, and automatic detection of syntax error at design time not runtime – Pre-built extractors and sample test tools, +++ Reduce coding time and debugging by 30-50%+! 82 © 2012 IBM Corporation
83.
Text Analytics Toolkit
– Performance Accelerator AQL code is highly optimized for MapReduce because of its declarative nature Unlike other frameworks, AQL optimizer determines order of execution of the extractor instructions for maximum efficiency Deliver analysis up to 10x faster than other leading alternative frameworks running the same extractors 83 © 2012 IBM Corporation
84.
84
© 2012 IBM Corporation
85.
Business Task: Find
the Pictures that Have Cats 85 © 2012 IBM Corporation
86.
BUT! Your Application
Returns the Following… PRECISION (a measure of exactness) 2/4 RECALL (a measure of coverage) 2/5 86 © 2012 IBM Corporation
87.
IBM Finds RIGHT
Answers Better Than Anyone 87 © 2012 IBM Corporation
88.
IBM’s Hadoop System
Provides Unique Business Value Optimized beyond open source Hadoop – Workload optimization, security, +++ Integration with enterprise systems – Connectors for multiple data sources Accelerators reduce development and implementation times – Industry & application accelerators – Analytic accelerators Visualization tools enable business users to explore Big Data 88 © 2012 IBM Corporation
89.
University of Southern
Cal Political Debate Monitoring Solution to measure public sentiment during the Republican Primary and Presidential Debates Examines trends, volume, and content of millions of public Twitter messages in real-time Analytic accelerators to understand sentiment (positive, negative, neutral) - Stream Computing and visualization Benefits - Real-time display of public sentiment as candidates respond to questions - Debate winner prediction based on public opinion instead of solely political analysts 89 © 2012 IBM Corporation
90.
Cisco turns to
IBM big data for intelligent infrastructure m anagem ent Optimize building energy consumption with centralized monitoring and control of building monitoring system Automates preventive and corrective maintenance of building corrective systems Uses Streams, InfoSphere BigInsights and Cognos - Log Analytics - Energy Bill Forecasting - Energy consumption optimization - Detection of anomalous usage - Presence-aware energy mgt. 90 90 - Policy enforcement 2012 IBM Corporation ©
91.
Stream Computing Provides
Unique Business Value Real-time answers = low latency insight – Better outcomes for time sensitive applications (e.g. fraud detection, network management) Solution when data is too large or expensive to store – Analyze data as it comes to you – Persist data of interest for deeper analysis Insights derived across multiple streams – Fuse streams for new insights 91 © 2012 IBM Corporation
92.
Asian telco reduces
billing costs and improves customer satisfaction Capabilities: Stream Computing Analytic Accelerators Real-time mediation and analysis of 6B CDRs per day Data processing time reduced from 12 hrs to 1 sec Hardware cost reduced to 1/8th Proactively address issues (e.g. dropped calls) impacting customer satisfaction. 92 92 © 2012 IBM Corporation
93.
Data Warehousing Provides
Unique Business Value Consolidate, manage and reconcile data for enterprise business intelligence Establish trust, quality and governance where necessary – Financial data – Credit card data – Healthcare Combine deep and operational analytics Maintain history for trending and historical reporting 93 © 2012 IBM Corporation
94.
Pacific Northwest Smart Grid
Demonstration Project Capabilities: Stream Computing – real-time control system Deep Analytics Appliance – analyze massive data sets Demonstrates scalability from 100 to 500K homes while retaining 10 years’ historical data 60k metered customers in 5 states Accommodates ad hoc analysis of price fluctuation, energy consumption profiles, risk, fraud detection, grid health, etc. 94 © 2012 IBM Corporation
95.
Information Integration Provides
Unique Business Value Movement of large data sets in batch and real time – Parallel processing engine for efficient data movement Governance and trust for Big Data – Lineage and meta data of new Big Data data sources – Profile sources to determine trust Data Quality – Standardize and transform data 95 © 2012 IBM Corporation
96.
Leaders integrate and
govern massive Big Data Marketing Services with InfoSphere big Leader integrates data for customer intelligence Capabilities Utilized: Information Integration – data quality, ETL Deep Analytics Appliance Complex customer data integration for 54M records/hour Processing 5B simultaneous records 96 9696 © 2012 IBM Corporation
97.
@BigData_paulz THINK
PDF http://tinyurl.com/88auawg Hardcopy http://tinyurl.com/7ef9ubs March 27, 2012 © 2012 IBM Corporation 97
Baixar agora