The Analytics Challenges Posed 
by Big Data 
Roger Bradford 
Agilex Technologies 
15 April 2013
2
Standard Big Data View
[Figure: Traditional BI and Big Data regions, with Volume and Velocity among the labeled dimensions]
Source: Forrester Group
3 
Big Data - Volume Examples 
Activity        Rate
E-mail          > 300 Billion*/Day
Text Messages   > 24 Billion/Day
Cell Phones     > 10 Billion Calls/Day
YouTube         > 1 Million New Videos/Day
Twitter         > 500 Million Tweets/Day
Facebook        > 1 Billion Posts/Day
*Short Scale Billion = 1,000 Million = 10^9
4 
Big Data - Velocity Example
5
Big Data Variety Example – Internet Language Usage
[Figure: two pie charts of Internet language usage, one By Website Content and one By User Native Language. Labeled slices include English, Chinese, Spanish, Japanese, Portuguese, German, Arabic, French, Russian, and Other.]
6
Big Data - Variability Example
Functions of 17,209 Genes
7 
Structured and Unstructured Data 
Structured            Unstructured
Sales Data            E-mail
Financial Data        Instant messaging
Climate Data          Tweets
Census Data           Audio
Movie Ratings         Images
Sensor Measurements   Video
Unstructured Information Accounts for more than 80% of all Data in Organizations and is Growing 15X Faster than Structured Data
8 
Challenges: Big Data vs. Hard Problems 
Big Data 
Volume 
Velocity 
Variety 
Variability 
Hard Problems 
Ambiguity 
Nth-order Relations 
Cardinality 
Non-locality
9 
Ambiguity in Text
• Synonymy:
  Common English Nouns have 6-8 Close Synonyms
  Common English Verbs have 9-11
• Polysemy:
  The Word Strike has 30 Common Meanings
• Entity Ambiguity:
  - There are more than 45,000 People Named John Smith in the United States
  - There are more than 300,000 People Named Zhang Wei in China
• Entity Variability:
  Some Person Names in Collections of Interest Occur in over 100 Variants
10
Name Variant Example
Vladimir Putin    Vladimir Poutine   Vladimir V. Putin
Vladmir Putin     Valdimir Putin     Vladimir Vladimirovich Putin
Vladamir Putin    Vladimr Putin      Vladimir Vladimirovitch Putin
Vlaidimir Putin   Vladimir Puttin    Vladimir Vladimirovic Putin
Vladimir Poutin   Putin, Vladimir    Putin, Vladimir Vladimirovitch
Vladimir Puttin   Vladamir Putin     Putin, Vladimir Vladimirovich
Vlademir Putin    Vladimier Putin    V.V. Putin
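Many of the variants above differ from the canonical spelling by only one or two characters. As a rough illustration (the slides do not name a matching technique, so the use of Levenshtein edit distance here is an assumption), a minimal sketch in Python might look like this. The function name `edit_distance` is hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, or substitutions turning a into b."""
    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


canonical = "Vladimir Putin"
for variant in ["Vladmir Putin", "Vladimir Poutine", "Valdimir Putin"]:
    print(variant, edit_distance(canonical, variant))
```

Character-level distance of this kind catches misspellings, but it does not by itself handle reordered forms such as "Putin, Vladimir" or abbreviated forms such as "V.V. Putin", which is part of why entity variability is hard.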
11
Nth-order Relationships
John <-> Bob Relationship:
First Order:  (JOHN, BOB)
Second Order: (JOHN, TOM), (TOM, BOB)
Third Order:  (JOHN, TOM), (TOM, DAVE), (DAVE, BOB)
# of Relations in 5,998 Documents:
First Order:  51,474
Second Order: 11,026,553
Third Order:  68,070,600
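One way to read the John/Bob example is as a shortest path in a graph whose edges link names that appear together in a document. The sketch below takes that reading, which is an assumption rather than something the slides state, and uses breadth-first search to find the order of a relationship. The names `build_cooccurrence_graph` and `relation_order` are hypothetical.

```python
from collections import deque


def build_cooccurrence_graph(documents):
    """Link every pair of names that appear in the same document."""
    graph = {}
    for names in documents:
        for name in names:
            others = [n for n in names if n != name]
            graph.setdefault(name, set()).update(others)
    return graph


def relation_order(graph, source, target):
    """Smallest number of co-occurrence links joining two names,
    or None if the names are not connected."""
    if source not in graph or target not in graph:
        return None
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        name, order = queue.popleft()
        if name == target:
            return order
        for neighbor in graph[name]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, order + 1))
    return None


docs = [["JOHN", "TOM"], ["TOM", "DAVE"], ["DAVE", "BOB"]]
graph = build_cooccurrence_graph(docs)
print(relation_order(graph, "JOHN", "BOB"))  # third-order: JOHN-TOM-DAVE-BOB
```

The counts on the slide suggest why this gets expensive: the number of candidate relations grows sharply with each additional order.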
12 
Cardinality Example – Alias Detection 
                Arthur Bishop   Raul Sanchez   Joel Rifkin   Jose Haddock   William Bonin
Arthur Bishop
Raul Sanchez    .0366
Joel Rifkin     -.0464          .0616
Jose Haddock    .0366           .9675          .0616
William Bonin   .1526           .0125          .0016         .0125
Challenge: Many by Many Comparisons - Processing 10 Million Names Requires 50 Trillion Comparisons
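The 50 trillion figure is consistent with comparing every name against every other name exactly once, which for n names takes n(n-1)/2 comparisons. A small sketch of that arithmetic follows; the function name `pairwise_comparisons` is hypothetical.

```python
def pairwise_comparisons(n):
    """Number of unordered pairs among n items: n * (n - 1) / 2."""
    return n * (n - 1) // 2


# Five names give the ten entries of a lower-triangular matrix.
print(pairwise_comparisons(5))           # 10
# Ten million names give roughly 5 x 10^13, i.e. about 50 trillion.
print(pairwise_comparisons(10_000_000))  # 49999995000000
```

Because the count grows quadratically, doubling the number of names roughly quadruples the work.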
13
Non-locality Example – Clustering Documents
14 
Twitter Example
15 
The Tweet Analysis Problem 
• Volume – 500 Million Tweets per Day Worldwide 
• Challenges: 
  - Very Low Signal to Noise Ratio (31 Million People Follow Lady Gaga)
  - Implicit Context (“Let’s all Meet at Bob’s House”)
  - Incomplete, Conflicting, and Erroneous Information
  - Deliberate Deception (50% of all Tweets are Machine-generated)
16 
Applicable Analytic Techniques 
• Statistical Analysis 
• Categorization 
• Clustering 
• NLP Techniques 
• Semantic Analysis 
In General, Application of such Techniques to 
Big Data Problems is Computationally Intensive
17 
Cloud Enabling 
[Figure: Semantic Indexing Time (in Hours) versus Millions of Documents, comparing a Datacenter Server with Map-Reduce with 64 Nodes]
18 
GPU Enabling 
[Figure: kNN Calculation, Seconds (in Thousands) versus Elements (in Billions), CPU (Intel Xeon X5660) versus GPU (Nvidia Quadro 2000)]
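For readers unfamiliar with the kNN (k-nearest-neighbor) calculation benchmarked here, the following is a minimal brute-force sketch on the CPU in plain Python. It is only meant to show what is being computed. It is not the implementation measured in the chart, and the function name `k_nearest` and the choice of Euclidean distance are assumptions.

```python
import heapq
import math


def k_nearest(query, points, k):
    """Return the indices of the k points closest to query,
    ordered from nearest to farthest (Euclidean distance)."""
    # One distance per point: the cost is proportional to
    # (number of points) x (number of elements per point).
    distances = ((math.dist(query, p), i) for i, p in enumerate(points))
    return [i for _, i in heapq.nsmallest(k, distances)]


points = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.5, 0.0)]
print(k_nearest((0.0, 0.0), points, 2))  # [0, 3]
```

Each distance is independent of the others, which is the property that makes this workload a natural fit for GPU parallelism.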
19
Semantic Enabling
Data -> Semantic Analysis -> Semantic Representation Space
• Accommodates Nth-order Relationships
• Automatically Coalesces Term Variants
• Supports Automated Entity Disambiguation
• Identifies Subtle Relationships
• Can Combine Structured and Unstructured Data
But Not as Well Understood as Structured Data Analysis Techniques
20 
IBM WATSON Winning “Jeopardy” 
• Volume: “Only” 1TB of Data (Mostly Text)
• Velocity: Meeting the 3-second Response Requirement of Jeopardy Required 80 Teraflops of Processing Power
Challenge:
• Question Decomposition
21 
Music Genome 
Objective: Match Liked Songs to Recommended Ones 
• 400 Attributes per Song
• 10 Million Songs
• Each Song Represented by a Vector of Elements
• 140 Trillion Elements
• Distance Function is Calculated between All Songs
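To make the matching objective concrete, the sketch below represents each song as an attribute vector and recommends the song whose vector is closest to a liked one. The slide does not say which distance function is used, so cosine distance is an assumption here, as are the names `cosine_distance` and `recommend` and the three-attribute toy vectors (real vectors would have about 400 attributes).

```python
import math


def cosine_distance(u, v):
    """1 minus the cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def recommend(liked_id, songs, top_n=1):
    """Rank all other songs by distance to the liked song's vector."""
    liked = songs[liked_id]
    ranked = sorted(
        (cosine_distance(liked, vec), song_id)
        for song_id, vec in songs.items()
        if song_id != liked_id
    )
    return [song_id for _, song_id in ranked[:top_n]]


# Toy catalogue with made-up attribute values.
songs = {
    "a": [1.0, 0.0, 0.2],
    "b": [0.9, 0.1, 0.2],
    "c": [0.0, 1.0, 0.0],
}
print(recommend("a", songs))  # ['b']
```

Ranking against every other song is a linear scan per query; doing it for all songs at once is the all-pairs computation the slide refers to.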
22 
Literature-based Discovery 
• PubMed Abstracts
• Gene – Function Relationships Derived Semantically
• 98,074,359 Potential Gene-function Associations
Zukas, A., GO-Driven Literature-Based Discovery using Semantic Analysis, MS Thesis, George Mason University, 2007.
23 
Literature-based Discovery (Cont’d) 
Latent Gene and Function Relationships from the June 2006 Gene Ontology Later Documented in the January 2007 Gene Ontology
Challenges:
• Nth-order Relationships
• Complexity of Relations
24 
Patent Analysis
[Diagram: Patent Databases, Online Technical Literature, and Internal Publications feed a Semantic Representation Space, which supports Prior Art Analysis and White Space Analysis]
Challenges:
• Need for Conceptual Comparisons
• Technical Terminology / Obfuscation
• Convoluted Structure (Claims)
25 
Concept-driven Discovery 
[Diagram elements: Incoming Reporting Stream; Fraud Exemplars (sample text containing terms such as "defraud" and "scheme"); Semantic Representation Space; Continuous Cycling through ALL Names; Generate Alerts]
Issue: Name Disambiguation
26 
Rapid Data Overview 
[Diagram: Incoming Data passes through Clustering into groups labeled Political, Economic, Admin, Technical, and Regulatory]
Challenges:
• Technical Information
• Multilingual Data
27
Crosslingual Document Categorization – Big Data Solution
[Figure: Accuracy + Completeness of Categorization versus Number of Simultaneous Languages. Series: Docs in 13 Languages with English Examples; English Docs with English Examples. Reference band: Range of Human Performance.]
28 
Where is Big Data Analytics Going? 
• Real-time Analysis 
• Multimedia Collections
  - Text
  - Structured Data
  - Audio
  - Video
  - Sensor Data
• Temporal and Spatial Data Integration 
• Interactive Visualization 
• Continuous Retrospective Analysis 
• Advanced Analytics (Especially Semantic Analysis)
29 
Integration of Multimedia Data 
[Diagram: Integrated Analytics drawing on Structured Data, Images, Multi-lingual Text, Audio, Sensor Data, and Video]
Buyer        Seller         Material       Amount    Date
John Smith   Ace Jewelers   Diamond Ring   3 Carat   8/18/06
30 
Spatiotemporal Data Integration 
Challenges:
• Fully Automatic Integration of Spatial, Temporal, and Semantic Information
• Location Disambiguation
31 
Questions or Comments 
Roger Bradford 
Agilex Technologies Inc 
1-703-889-3916 
r.bradford@agilex.com

 
(INDIRA) Call Girl Pune Call Now 8250077686 Pune Escorts 24x7
(INDIRA) Call Girl Pune Call Now 8250077686 Pune Escorts 24x7(INDIRA) Call Girl Pune Call Now 8250077686 Pune Escorts 24x7
(INDIRA) Call Girl Pune Call Now 8250077686 Pune Escorts 24x7
 
Lucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRL
Lucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRLLucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRL
Lucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRL
 
Real Men Wear Diapers T Shirts sweatshirt
Real Men Wear Diapers T Shirts sweatshirtReal Men Wear Diapers T Shirts sweatshirt
Real Men Wear Diapers T Shirts sweatshirt
 
Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...
Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...
Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...
 
(+971568250507 ))# Young Call Girls in Ajman By Pakistani Call Girls in ...
(+971568250507  ))#  Young Call Girls  in Ajman  By Pakistani Call Girls  in ...(+971568250507  ))#  Young Call Girls  in Ajman  By Pakistani Call Girls  in ...
(+971568250507 ))# Young Call Girls in Ajman By Pakistani Call Girls in ...
 
Nanded City ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready ...
Nanded City ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready ...Nanded City ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready ...
Nanded City ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready ...
 
VVIP Pune Call Girls Sinhagad WhatSapp Number 8005736733 With Elite Staff And...
VVIP Pune Call Girls Sinhagad WhatSapp Number 8005736733 With Elite Staff And...VVIP Pune Call Girls Sinhagad WhatSapp Number 8005736733 With Elite Staff And...
VVIP Pune Call Girls Sinhagad WhatSapp Number 8005736733 With Elite Staff And...
 
Shikrapur - Call Girls in Pune Neha 8005736733 | 100% Gennuine High Class Ind...
Shikrapur - Call Girls in Pune Neha 8005736733 | 100% Gennuine High Class Ind...Shikrapur - Call Girls in Pune Neha 8005736733 | 100% Gennuine High Class Ind...
Shikrapur - Call Girls in Pune Neha 8005736733 | 100% Gennuine High Class Ind...
 
VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting High Prof...
VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting  High Prof...VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting  High Prof...
VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting High Prof...
 

II-SDV 2013 The Analytics Challenges Posed by Big Data

  • 1. The Analytics Challenges Posed by Big Data Roger Bradford Agilex Technologies 15 April 2013
  • 2. Standard Big Data View [Figure labels: Velocity, Volume, Big Data, Traditional BI] Source: Forrester Group
  • 3. Big Data - Volume Examples (Activity: Rate) E-mail: > 300 Billion*/Day; Text Messages: > 24 Billion/Day; Cell Phones: > 10 Billion Calls/Day; YouTube: > 1 Million New Videos/Day; Twitter: > 500 Million Tweets/Day; Facebook: > 1 Billion Posts/Day. *Short Scale Billion = 1,000 Million = 10^9
  • 4. Big Data - Velocity Example
  • 5. Big Data Variety Example – Internet Language Usage [Figure: two charts, By Website Content and By User Native Language; languages labeled: English, Chinese, Spanish, Japanese, Portuguese, German, Arabic, French, Russian, Other]
  • 6. Big Data - Variability Example: Functions of 17,209 Genes
  • 7. Structured and Unstructured Data. Structured: Sales Data, Financial Data, Climate Data, Census Data, Movie Ratings, Sensor Measurements. Unstructured: E-mail, Instant Messaging, Tweets, Audio, Images, Video. Unstructured Information Accounts for more than 80% of all Data in Organizations and is Growing 15X Faster than Structured Data
  • 8. Challenges: Big Data vs. Hard Problems. Big Data: Volume, Velocity, Variety, Variability. Hard Problems: Ambiguity, Nth-order Relations, Cardinality, Non-locality
  • 9. Ambiguity in Text • Synonymy: Common English Nouns have 6-8 Close Synonyms; Common English Verbs have 9-11 • Polysemy: The Word Strike has 30 Common Meanings • Entity Ambiguity: There are more than 45,000 People Named John Smith in the United States; There are more than 300,000 People Named Zhang Wei in China • Entity Variability: Some Person Names in Collections of Interest Occur in over 100 Variants
  • 10. Name Variant Example: Vladimir Putin; Vladimir Poutine; Vladimir V. Putin; Vladmir Putin; Valdimir Putin; Vladimir Vladimirovich Putin; Vladamir Putin; Vladimr Putin; Vladimir Vladimirovitch Putin; Vlaidimir Putin; Vladimir Puttin; Vladimir Vladimirovic Putin; Vladimir Poutin; Putin, Vladimir; Putin, Vladimir Vladimirovitch; Vladimir Puttin; Vladamir Putin; Putin, Vladimir Vladimirovich; Vlademir Putin; Vladimier Putin; V.V. Putin
  • 11. Nth-order Relationships. Relationship between John and Bob: First Order: JOHN-BOB; Second Order: JOHN-TOM, TOM-BOB; Third Order: JOHN-TOM, TOM-DAVE, DAVE-BOB. # of Relations in 5,998 Documents: First Order: 51,474; Second Order: 11,026,553; Third Order: 68,070,600
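  • 12. Cardinality Example – Alias Detection. Pairwise scores (lower triangle of the name-by-name matrix): Raul Sanchez / Arthur Bishop: .0366; Joel Rifkin / Arthur Bishop: -.0464; Joel Rifkin / Raul Sanchez: .0616; Jose Haddock / Arthur Bishop: .0366; Jose Haddock / Raul Sanchez: .9675; Jose Haddock / Joel Rifkin: .0616; William Bonin / Arthur Bishop: .1526; William Bonin / Raul Sanchez: .0125; William Bonin / Joel Rifkin: .0016; William Bonin / Jose Haddock: .0125. Challenge: Many by Many Comparisons; Processing 10 Million Names Requires 50 Trillion Comparisons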
  • 15. The Tweet Analysis Problem • Volume: 500 Million Tweets per Day Worldwide • Challenges: Very Low Signal to Noise Ratio (31 Million People Follow Lady Gaga); Implicit Context (“Let’s all Meet at Bob’s House”); Incomplete, Conflicting, and Erroneous Information; Deliberate Deception (50% of all Tweets are Machine-generated)
  • 16. Applicable Analytic Techniques • Statistical Analysis • Categorization • Clustering • NLP Techniques • Semantic Analysis. In General, Application of such Techniques to Big Data Problems is Computationally Intensive
  • 17. Cloud Enabling [Figure: Semantic Indexing Time (in Hours) versus Millions of Documents; series: Datacenter Server, Map-Reduce with 64 Nodes]
  • 18. GPU Enabling [Figure: kNN Calculation, Seconds (in Thousands) versus Elements (in Billions); series: CPU (Intel Xeon X5660), GPU (Nvidia Quadro 2000)]
  • 19. Semantic Enabling [Figure labels: Data, Semantic Analysis, Semantic Representation Space] • Accommodates Nth-order Relationships • Automatically Coalesces Term Variants • Supports Automated Entity Disambiguation • Identifies Subtle Relationships • Can Combine Structured and Unstructured Data. But Not as Well Understood as Structured Data Analysis Techniques
  • 20. IBM WATSON Winning “Jeopardy” • Volume: “Only” 1TB of Data (Mostly Text) • Velocity: Meeting the 3-second Response Requirement of Jeopardy Required 80 Teraflops of Processing Power. Challenge: • Question Decomposition
  • 21. Music Genome. Objective: Match Liked Songs to Recommended Ones • 400 Attributes per Song • 10 Million Songs • Each Song Represented by a Vector of Elements • 140 Trillion Elements • Distance Function is Calculated between All Songs
  • 22. Literature-based Discovery • PubMed Abstracts • Gene – Function Relationships Derived Semantically • 98,074,359 Potential Gene-function Associations. Zukas, A., GO-Driven Literature-Based Discovery using Semantic Analysis, MS Thesis, George Mason University, 2007.
  • 23. Literature-based Discovery (Cont’d). Latent Gene and Function Relationships from the June 2006 Gene Ontology Later Documented in the January 2007 Gene Ontology. Challenges: • Nth-order Relationships • Complexity of Relations
  • 24. Patent Analysis [Figure labels: Patent Databases, Online Technical Literature, Internal Publications, Semantic Representation Space, Prior Art Analysis, White Space Analysis] Challenges: • Need for Conceptual Comparisons • Technical Terminology / Obfuscation • Convoluted Structure (Claims)
  • 25. Concept-driven Discovery [Figure labels: Incoming Reporting Stream, Fraud Exemplars, Semantic Representation Space, Continuous Cycling through ALL Names, Generate Alerts; sample text: “Xxxxxxxxx Xxxxxxxxx defraud Xxxxxxxxx scheme”] Issue: Name Disambiguation
  • 26. Rapid Data Overview [Figure: Clustering of Incoming Data; cluster labels: Political, Economic, Admin, Technical, Regulatory] Challenges: • Technical Information • Multilingual Data
  • 27. Crosslingual Document Categorization – Big Data Solution [Figure: Accuracy + Completeness of Categorization versus Number of Simultaneous Languages; labels: English Docs / English Examples, Docs in 13 Languages / English Examples, Range of Human Performance]
  • 28. Where is Big Data Analytics Going? • Real-time Analysis • Multimedia Collections (Text, Structured Data, Audio, Video, Sensor Data) • Temporal and Spatial Data Integration • Interactive Visualization • Continuous Retrospective Analysis • Advanced Analytics (Especially Semantic Analysis)
  • 29. Integration of Multimedia Data. Integrated Analytics over: Structured Data, Images, Multi-lingual Text, Audio, Sensor Data, Video. Example structured record: Buyer: John Smith; Seller: Ace Jewelers; Material: Diamond Ring; Amount: 3 Carat; Date: 8/18/06
  • 30. Spatiotemporal Data Integration. Challenges: • Fully Automatic Integration of Spatial, Temporal, and Semantic Information • Location Disambiguation
  • 31. Questions or Comments: Roger Bradford, Agilex Technologies Inc, 1-703-889-3916, r.bradford@agilex.com
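
Two of the hard problems named in the slides, cardinality (slide 12) and Nth-order relations (slide 11), can be made concrete with a small sketch. The code below is illustrative only and is not the method used in the presentation: the function names `pairwise_comparisons` and `relation_orders` are hypothetical, and the breadth-first search is a stand-in technique chosen for clarity, assuming that a "relation" is a co-occurrence link between two names and that the order of a relationship is the length of the shortest chain of such links.

```python
from collections import deque


def pairwise_comparisons(n: int) -> int:
    """Number of unordered pairs among n items: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def relation_orders(cooccurrences, source):
    """Return the order of the shortest relationship chain from `source`
    to every other reachable entity (1 = direct, 2 = one intermediary, ...).

    `cooccurrences` is an iterable of (name_a, name_b) pairs; this input
    format is an assumption made for the sketch.
    """
    # Build an undirected graph from the co-occurrence pairs.
    graph = {}
    for a, b in cooccurrences:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    # Breadth-first search visits entities in increasing chain length.
    order = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for neighbour in graph.get(current, ()):
            if neighbour not in order:
                order[neighbour] = order[current] + 1
                queue.append(neighbour)

    del order[source]  # the source is not related to itself
    return order


if __name__ == "__main__":
    # Slide 12: all-pairs comparison of 10 million names.
    print(pairwise_comparisons(10_000_000))

    # Slide 11, third-order case: John-Tom, Tom-Dave, Dave-Bob.
    links = [("John", "Tom"), ("Tom", "Dave"), ("Dave", "Bob")]
    print(relation_orders(links, "John"))
```

For 10 million names the pair count is 49,999,995,000,000, which is the roughly 50 trillion comparisons quoted on slide 12. In the third-order example John reaches Bob only through Tom and Dave, so Bob is reported at order 3. Exhaustive graph traversal of this kind does not scale to the relation counts on slide 11, which is part of the motivation the slides give for semantic representation spaces.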