Enterprise Search:
Addressing the First Problem
of Big Data & Analytics
Raj Dhillon, Ph.D.
Chief Field Technologist
StampedeCon 2016
July 27, 2016
Foundational methodology
Thomas Bayes
(1702-1761)
Foundational methodology
If we toss a coin 100 times and get heads
every time, what’s the probability of getting a
head on the 101st toss?
Traditional probability: 50%
Bayesian inference: 99+%
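One way to arrive at a figure like "99+%" is Laplace's rule of succession. The sketch below assumes a uniform Beta(1, 1) prior over the coin's bias; the slide does not say which prior was intended, so that choice and the function name `rule_of_succession` are illustrative assumptions.

```python
from fractions import Fraction


def rule_of_succession(heads: int, tosses: int) -> Fraction:
    """Posterior predictive P(next toss is heads) under a uniform Beta(1, 1) prior.

    With h heads in n tosses the posterior is Beta(h + 1, n - h + 1),
    whose mean is (h + 1) / (n + 2).
    """
    if not 0 <= heads <= tosses:
        raise ValueError("heads must be between 0 and tosses")
    return Fraction(heads + 1, tosses + 2)


# 100 heads in 100 tosses: the estimate is 101/102, a little over 99%.
estimate = rule_of_succession(100, 100)
print(estimate, float(estimate))
```

A fixed assumption of a fair coin ignores the 100 observations and stays at 50%; the Bayesian estimate moves with the evidence.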
Foundational methodology
Alfred Butts’ letter frequencies
Claude Shannon
(1916-2001)
What is enterprise search and what are its challenges?
Enterprise search is a means of identifying and enabling content from multiple enterprise-type sources to be indexed, searched, and displayed to a defined audience.
An effective enterprise search platform should enable productivity.
Challenges
– Silos
– Volume and Velocity
– Expectations
Productivity depends on effective enterprise search
– The Butler Group reports up to 10% of staff costs are lost because employees are unable to find the right information to do their jobs. (2006)
– In a study of over 1000 middle managers, Accenture found that managers spend up to 2 hours a day searching for information, and more than 50% of the information they obtain has no value to them. (2007)
– According to the New York Times, data scientists spend 50-80% of their time collecting and prepping data. (2014)
– An Aberdeen Group study of 188 organizations that had implemented enterprise search revealed executives at the top performing companies within those examined saved 6 hours a week looking for information, compared to 1 hour for executives at the other companies. (2009)
Overcoming the data silo
Identify and connect
disparate data sources
The data landscape is radically changing
More connected people, apps, and things
generating more data in many forms
Business data, Human data, Machine data
10x faster growth than traditional business data
Why is processing human data different?
– Human Information is made up of ideas, is diverse, and has context
– Ideas don’t exactly match like data does; they have distance.
– Human Information is not static – it’s dynamic and lives everywhere.
Mobile, Texts, Email, Audio, Video, Social Media, Transactional Data, Documents, Search Engine, Images, IT/OT
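The remark that ideas "have distance" rather than matching exactly can be illustrated with a similarity score between texts. The sketch below uses cosine similarity over simple word counts; this is a stand-in chosen for illustration, not the method the presenter's platform uses, and the sample documents are made up.

```python
import math
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between two texts using bag-of-words term counts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical documents: none matches the query exactly, but some are closer.
query = "slow service at the resort"
documents = [
    "the resort service was slow and unfriendly",
    "quarterly revenue report for the board",
    "resort mattresses were very comfortable",
]
ranked = sorted(documents, key=lambda d: cosine_similarity(query, d), reverse=True)
for doc in ranked:
    print(round(cosine_similarity(query, doc), 3), doc)
```

An exact-match lookup would return nothing for this query, while a distance-based ranking still surfaces the closest document first.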
Enterprise Search: Let me Google that for you
Web vs. Enterprise
Content
– Web: Web pages; largely homogeneous
– Enterprise: Variety of data sources; variety of file formats; heterogeneous
Relevance
– Web: Tolerates large number of results, as well as duplicated or overlapping information
– Enterprise: Demands small number of unique results with high degree of specificity
Personalization
– Web: Little personalization expected; expect list of returned results
– Enterprise: Expectation of customized results (data access) aligned with user profiles (role, group, projects, etc.)
Analysis
– Web: Generic
– Enterprise: Domain-specific
Big data requirements for enterprise search
Requirements and Benefits
1. Unifying diverse sets of data
– Allows users to ask questions that haven’t been asked before
2. Identifying what’s relevant
– Increase productivity by streamlining search → users can focus on transforming and extracting the right data for analysis
3. Automatic and real-time
– Content is automatically indexed and available for search, enabling users to find data almost as quickly as it’s being captured
4. Action-oriented / insight driven
– Maximize return on human capital
Tackling big data requirements for enterprise search
Requirements
1. Unifying diverse sets of data
2. Identifying what’s relevant
3. Automatic and real-time
4. Action-oriented / insight driven
How
– Create single view of enterprise content by connecting to different sources and repositories
– Data streamlining
– Automatic query guidance
– Intelligent summarization
– Intelligent highlighting
– Personalization
– Classification and clustering
– Handled via indexing protocol – not directly visible to end users
– Concept navigation / visualizations
– Eduction
– Sentiment
– Classification and clustering
– Machine Learning
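The "single view of enterprise content" idea can be sketched as one inverted index built over several repositories. The source names, document IDs, and the helpers `build_index` and `search` below are hypothetical; a production connector framework would also handle file formats, permissions, and incremental updates.

```python
from collections import defaultdict


def build_index(sources: dict[str, dict[str, str]]) -> dict[str, set[tuple[str, str]]]:
    """Map each term to the (source, doc_id) pairs whose text contains it."""
    index: dict[str, set[tuple[str, str]]] = defaultdict(set)
    for source, docs in sources.items():
        for doc_id, text in docs.items():
            for term in text.lower().split():
                index[term].add((source, doc_id))
    return index


def search(index: dict[str, set[tuple[str, str]]], query: str) -> set[tuple[str, str]]:
    """Return documents containing every query term, regardless of source."""
    terms = query.lower().split()
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


# Hypothetical repositories standing in for separate data silos.
sources = {
    "email": {"m1": "quarterly revenue report attached"},
    "wiki": {"w7": "revenue report template"},
    "crm": {"c3": "customer renewal notes"},
}
index = build_index(sources)
print(search(index, "revenue report"))
```

Because every hit carries its source name, one query returns matches from the email store and the wiki together, which is the point of connecting silos before searching.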
Personalizing data
– Implicit and explicit profiling
– Relationship discovery / community and expertise networks
– Intent-based ranking

– Customer C is linked to Customer E via Customer D
– Customer H is the most influential in Customer B’s network
– Customer A is in Customer B’s network
– Customers F and G purchased the same model last year
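Relationship discovery of the "C is linked to E via D" kind can be sketched as a shortest-path search over a customer graph. The edge list below is hypothetical, chosen only to be consistent with the statements on the slide; the slide does not give the underlying data or the algorithm used.

```python
from collections import deque

# Hypothetical edges consistent with the slide's example statements.
EDGES = [("A", "B"), ("H", "B"), ("C", "D"), ("D", "E")]


def build_graph(edges):
    """Build an undirected adjacency map from (u, v) pairs."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the list of nodes on a shortest path, or None."""
    if start not in graph or goal not in graph:
        return None
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in sorted(graph[node]):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


graph = build_graph(EDGES)
print(shortest_path(graph, "C", "E"))
```

Measuring influence, as in the Customer H example, would need a centrality score over real interaction data, which this sketch does not attempt.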
Classification and Clustering
Managed classification: Create categories using business rules or training
Automatic classification and clustering: Automatically determine categories based on patterns and relationships in information
Example categories: Product performance issues; Side letters; Off balance sheet transactions
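Managed classification with business rules can be sketched as a phrase lookup per category. The category names come from the slide; the trigger phrases and the `classify` helper are illustrative assumptions, and real rule sets would be far richer.

```python
# Category names are from the slide; the trigger phrases are hypothetical.
RULES = {
    "Product performance issues": ["defect", "malfunction", "performance issue"],
    "Side letters": ["side letter", "side agreement"],
    "Off balance sheet transactions": ["off balance sheet", "special purpose entity"],
}


def classify(text, rules=RULES):
    """Return every category with at least one trigger phrase in the text."""
    lowered = text.lower()
    return [
        category
        for category, phrases in rules.items()
        if any(phrase in lowered for phrase in phrases)
    ]


print(classify("The side letter modifies the payment terms."))
print(classify("Auditors flagged off balance sheet exposure and a product defect."))
```

Automatic classification and clustering would instead derive the categories from the documents themselves, for example by grouping texts that score as similar.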
Eduction and Sentiment
Eduction: Apply structure to unstructured data by automatically identifying and extracting terms in documents that lend themselves to key fields
Entity types: Names, Places, IP addresses, Companies, Events, Relationships, Medicines, Airports, Cars, Social Security numbers, Phone numbers, Credit cards, Dates, Holidays, Job titles, Currencies
Sentiment: Decomposition and classification within a sentence to pull out the sentiment surrounding specific topics
Example: “I stayed at the resort last week, and though the mattresses were very nice, the service was awful.”
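Both ideas can be sketched with the standard library: regular expressions for a few entity types, and a clause-by-clause word lookup for sentiment. The patterns, the tiny word lists, and the helper names `educe` and `clause_sentiment` are assumptions made for illustration; they are much cruder than a production eduction or sentiment engine.

```python
import re

# Simplified, hypothetical patterns for three of the entity types listed above.
PATTERNS = {
    "ip_address": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "phone_number": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}

# Toy sentiment lexicons; a real system would use a trained model or a large lexicon.
POSITIVE = {"nice", "great", "clean"}
NEGATIVE = {"awful", "dirty", "rude"}


def educe(text):
    """Extract structured fields from free text, keyed by entity type."""
    return {name: pattern.findall(text) for name, pattern in PATTERNS.items()}


def clause_sentiment(sentence):
    """Split a sentence into clauses and label each clause that carries sentiment."""
    results = []
    for clause in re.split(r",|;|\bbut\b|\bthough\b", sentence.lower()):
        words = re.findall(r"[a-z']+", clause)
        score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
        if score:
            label = "positive" if score > 0 else "negative"
            results.append((clause.strip(" ."), label))
    return results


print(educe("Call 314-555-0100 from 10.0.0.7 on 2016-07-27."))
review = (
    "I stayed at the resort last week, and though the mattresses "
    "were very nice, the service was awful."
)
print(clause_sentiment(review))
```

Scoring each clause separately keeps the positive remark about the mattresses apart from the negative remark about the service, instead of averaging the sentence to neutral.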
Intelligent search with Machine Learning
– Document interpretation / topic and concept identification
– Sentiment analysis
– Query analysis / clustering
– Personalization of content / recommendations
– Categorization / classification of data
– Entity identification
– Ranking results
– Auto-complete / directed navigation
What’s next?
What else are users asking for?
– Improved treatment of poor quality data
– More interactive search / digital assistants
– Streamlined / better defined workflows
– Better visualization / user experience
Extract, Analyze, Connect, Index, Search, Predict
Case Studies
Stanford Children’s Health
Research for healthcare provider ranking study
Challenge
– Quality and clinical effectiveness research on ~115K patients, ~390K
encounters, ~3M documents
– Diverse data types (structured and unstructured) across data silos
involved
– Time constraints vs extensive search scope
Result
– Cross patient search for cohort identification
– Intuitive UI for simple query construction
– Easy clinical note review with highlights, navigation and related
concepts
– Portable queries and results
– Fast indexing
Leading Chinese telecom
Communications service provider industry
Challenge
– Allow users to access information on thousands of public services
directly from their mobile phones – success of this platform depends
on the users’ ability to quickly find information
Result
– Over 740 million subscribers can search through more than 8,000
applications for public service information, including public
transportation schedules, public health records, traffic offenses, and
more
– Users receive more accurate search results than ever before
– Customers get the most relevant and useful information regardless of
the terms they use in the search
Leading financial software, data and media company
Subscribers require up-to-the-second information on market conditions and trends
Challenge
– Deliver search performance at the scale required by the size of its data
repository, 200 million messages, 15-20 million chats daily
– Provide robust, cost-efficient solution with scalability for large and
growing volume of data, supported by small IT headcount
Result
– Detects trends in real-time messaging and chats for subscribers
– Accommodates 10+ billion document entries without compromising
performance today
– Ensures scalability delivers ROI in the future
Leading American multinational telecom
Paying careful attention to every aspect of customer-facing processes and applications
Challenge
– Provide support desk staff with fast access to precise information
required to address customer’s problem
– Improve knowledge management system search capabilities
Result
– Reduced time-to-resolution with fast queries that ensure support
experts can resolve customer issues quickly
– Relevant results as query functionality makes sure that results deliver
information most likely to resolve customer issues
NASCAR
Fan and Media Engagement Center
Challenge
– Economic conditions
– Rapidly changing media landscape (social media growth)
– Revenue pressures from sponsors
– Industry leadership expectation
Result
– Live monitoring and analysis of broadcast, news and social media
– Sponsors’ brand and fan sentiment analyses
– Analytics to support race team sponsorship renewals
– Crisis management
– Build fan base with active engagement
Thank you

Enterprise Search: Addressing the First Problem of Big Data & Analytics - StampedeCon 2016

  • 1. Enterprise Search: Addressing the First Problem of Big Data & Analytics. Raj Dhillon, Ph.D., Chief Field Technologist. StampedeCon 2016, July 27, 2016.
  • 3. Foundational methodology: If we toss a coin 100 times and get heads every time, what’s the probability of getting a head on the 101st toss? Traditional probability: 50%. Bayesian inference: 99+%.
  • 4. Foundational methodology: Alfred Butts’ letter frequencies; Claude Shannon (1916-2001).
  • 5. What is enterprise search and what are its challenges? Enterprise search is a means of identifying and enabling content from multiple enterprise-type sources to be indexed, searched, and displayed to a defined audience. An effective enterprise search platform should enable productivity. Challenges: silos; volume and velocity; expectations.
  • 6. Productivity depends on effective enterprise search
      10%: The Butler Group reports up to 10% of staff costs are lost because employees are unable to find the right information to do their jobs. (2006)
      50%: In a study of over 1000 middle managers, Accenture found that managers spend up to 2 hours a day searching for information, and more than 50% of the information they obtain has no value to them. (2007)
      50-80%: According to the New York Times, data scientists spend 50-80% of their time collecting and prepping data. (2014)
      6 hours: An Aberdeen Group study of 188 organizations that had implemented enterprise search revealed executives at the top performing companies within those examined saved 6 hours a week looking for information, compared to 1 hour for executives at the other companies. (2009)
  • 7. Overcoming the data silo: Identify and connect disparate data sources.
  • 8. The data landscape is radically changing: More connected people, apps, and things generating more data in many forms (business data, human data, machine data). 10x faster growth than traditional business data.
  • 9. Why is processing human data different? Human information is made up of ideas, is diverse, and has context. Ideas don’t exactly match like data does; they have distance. Human information is not static; it’s dynamic and lives everywhere. Diagram labels: Mobile; Texts; Email; Audio; Video; Social Media; Transactional Data; Documents; Search Engine; Images; IT/OT.
  • 10. Enterprise Search: Let me Google that for you (Web vs. Enterprise)
      Content. Web: web pages; largely homogeneous. Enterprise: variety of data sources; variety of file formats; heterogeneous.
      Relevance. Web: tolerates large number of results, as well as duplicated or overlapping information. Enterprise: demands small number of unique results with high degree of specificity.
      Personalization. Web: little personalization expected; expect list of returned results. Enterprise: expectation of customized results (data access) aligned with user profiles (role, group, projects, etc.).
      Analysis. Web: generic. Enterprise: domain-specific.
  • 11. Big data requirements for enterprise search (requirement: benefit)
      1. Unifying diverse sets of data: Allows users to ask questions that haven’t been asked before.
      2. Identifying what’s relevant: Increase productivity by streamlining search, so users can focus on transforming and extracting the right data for analysis.
      3. Automatic and real-time: Content is automatically indexed and available for search, enabling users to find data almost as quickly as it’s being captured.
      4. Action-oriented / insight driven: Maximize return on human capital.
  • 12. Tackling big data requirements for enterprise search
      Requirements: 1. Unifying diverse sets of data; 2. Identifying what’s relevant; 3. Automatic and real-time; 4. Action-oriented / insight driven.
      How: Create single view of enterprise content by connecting to different sources and repositories; data streamlining; automatic query guidance; intelligent summarization; intelligent highlighting; personalization; classification and clustering; handled via indexing protocol (not directly visible to end users); concept navigation / visualizations; eduction; sentiment; classification and clustering; machine learning.
  • 13. Personalizing data: Implicit and explicit profiling; relationship discovery / community and expertise networks; intent-based ranking. Examples: Customer C is linked to Customer E via Customer D. Customer H is the most influential in Customer B’s network. Customer A is in Customer B’s network. Customers F and G purchased the same model last year.
  • 14. Classification and Clustering. Managed classification: Create categories using business rules or training. Automatic classification and clustering: Automatically determine categories based on patterns and relationships in information. Example categories: product performance issues; side letters; off balance sheet transactions.
  • 15. Eduction and Sentiment. Example: “I stayed at the resort last week, and though the mattresses were very nice, the service was awful.” Eduction: Apply structure to unstructured data by automatically identifying and extracting terms in documents that lend themselves to key fields (names, places, IP addresses, companies, events, relationships, medicines, airports, cars, Social Security numbers, phone numbers, credit cards, dates, holidays, job titles, currencies). Sentiment: Decomposition and classification within a sentence to pull out the sentiment surrounding specific topics.
  • 16. Intelligent search with Machine Learning: Document interpretation / topic and concept identification; sentiment analysis; query analysis / clustering; personalization of content / recommendations; categorization / classification of data; entity identification; ranking results; auto-complete / directed navigation.
  • 18. What else are users asking for? Improved treatment of poor quality data; more interactive search / digital assistants; streamlined / better defined workflows; better visualization / user experience. Extract; Analyze; Connect; Index; Search; Predict.
  • 20. Stanford Children’s Health: Research for healthcare provider ranking study
      Challenge: Quality and clinical effectiveness research on ~115K patients, ~390K encounters, ~3M documents; diverse data types (structured and unstructured) across data silos involved; time constraints vs. extensive search scope.
      Result: Cross-patient search for cohort identification; intuitive UI for simple query construction; easy clinical note review with highlights, navigation, and related concepts; portable queries and results; fast indexing.
  • 21. Leading Chinese telecom (communications service provider industry)
      Challenge: Allow users to access information on thousands of public services directly from their mobile phones; the success of this platform depends on the users’ ability to quickly find information.
      Result: Over 740 million subscribers can search through more than 8,000 applications for public service information, including public transportation schedules, public health records, traffic offenses, and more; users receive more accurate search results than ever before; customers get the most relevant and useful information regardless of the terms they use in the search.
  • 22. Leading financial software, data and media company: Subscribers require up-to-the-second information on market conditions and trends
      Challenge: Deliver search performance at the scale required by the size of its data repository (200 million messages, 15-20 million chats daily); provide a robust, cost-efficient solution with scalability for a large and growing volume of data, supported by a small IT headcount.
      Result: Detects trends in real-time messaging and chats for subscribers; accommodates 10+ billion document entries without compromising performance today; ensures scalability delivers ROI in the future.
  • 23. Leading American multinational telecom: Paying careful attention to every aspect of customer-facing processes and applications
      Challenge: Provide support desk staff with fast access to the precise information required to address a customer’s problem; improve knowledge management system search capabilities.
      Result: Reduced time-to-resolution, with fast queries that ensure support experts can resolve customer issues quickly; relevant results, as query functionality makes sure that results deliver the information most likely to resolve customer issues.
  • 24. NASCAR Fan and Media Engagement Center
      Challenge: Economic conditions; rapidly changing media landscape (social media growth); revenue pressures from sponsors; industry leadership expectation.
      Result: Live monitoring and analysis of broadcast, news, and social media; sponsors’ brand and fan sentiment analyses; analytics to support race team sponsorship renewals; crisis management; build fan base with active engagement.
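The coin-toss question on slide 3 can be worked through numerically. The slide does not say which prior produces its "99+%" figure, so the sketch below assumes a uniform Beta(1, 1) prior on the coin's bias, which gives Laplace's rule of succession. The function name `rule_of_succession` is illustrative and not from the deck.

```python
from fractions import Fraction


def rule_of_succession(heads: int, tosses: int) -> Fraction:
    """Probability that the next toss is heads, assuming a uniform Beta(1, 1) prior.

    With that prior, the posterior predictive is (heads + 1) / (tosses + 2).
    """
    return Fraction(heads + 1, tosses + 2)


# Traditional view: a fair coin has no memory, so the answer stays at 1/2.
fair_coin = Fraction(1, 2)

# Bayesian view (under the assumed prior): 100 heads in 100 tosses.
bayesian = rule_of_succession(100, 100)

print(f"fair coin: {float(fair_coin):.2%}")
print(f"Bayesian estimate: {bayesian} = {float(bayesian):.2%}")
```

Under this assumption the estimate is 101/102, which is just over 99% and consistent with the slide's figure. A different prior would give a different number.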
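The relationship-discovery examples on slide 13 ("Customer C is linked to Customer E via Customer D") can be read as path finding in a graph. The deck does not describe how the product does this; the sketch below simply uses a breadth-first search over a small, made-up adjacency list, and `shortest_link` is a hypothetical helper.

```python
from collections import deque


def shortest_link(graph, start, goal):
    """Return the shortest chain of customers from start to goal, or None."""
    if start == goal:
        return [start]
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        for neighbor in graph.get(path[-1], ()):
            if neighbor in seen:
                continue
            if neighbor == goal:
                return path + [neighbor]
            seen.add(neighbor)
            queue.append(path + [neighbor])
    return None


# Hypothetical network: edges chosen to mirror the slide's examples.
network = {
    "A": ["B"],
    "B": ["A"],
    "C": ["D"],
    "D": ["C", "E"],
    "E": ["D"],
}

print(shortest_link(network, "C", "E"))  # C reaches E through D
print(shortest_link(network, "A", "E"))  # no connection in this toy graph
```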
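Eduction, as defined on slide 15, pulls field-like terms out of free text. A minimal way to illustrate the idea is pattern matching with regular expressions. This is only a sketch: the patterns, the sample sentence, and the `educe` function are assumptions for illustration, and a production system would need far more robust rules than these.

```python
import re

# Simplified, US-centric patterns (assumptions for this example only).
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "ip_address": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def educe(text):
    """Map each field name to the list of matching terms found in text."""
    return {name: pattern.findall(text) for name, pattern in PATTERNS.items()}


# Made-up sample using reserved example values.
sample = "Call 314-555-0100 about SSN 123-45-6789 from host 192.0.2.10."
fields = educe(sample)
for name, values in fields.items():
    print(name, values)
```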
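Slide 15 describes sentiment as decomposition and classification within a sentence. The toy sketch below applies that idea to the slide's resort review by splitting on commas and checking each clause against tiny hand-written word lists. The word lists and the `topic_sentiment` function are hypothetical; the deck does not say how the actual product classifies sentiment.

```python
import re

# Tiny illustrative lexicons (assumptions, not from the deck).
POSITIVE = {"nice", "great", "excellent"}
NEGATIVE = {"awful", "poor", "terrible"}
TOPICS = {"mattresses", "service"}


def topic_sentiment(sentence):
    """Assign a sentiment label to each known topic, clause by clause."""
    results = {}
    for clause in sentence.split(","):
        words = set(re.findall(r"[a-z]+", clause.lower()))
        has_positive = bool(words & POSITIVE)
        has_negative = bool(words & NEGATIVE)
        if has_positive == has_negative:
            continue  # no sentiment words, or mixed signals: skip the clause
        label = "positive" if has_positive else "negative"
        for topic in words & TOPICS:
            results[topic] = label
    return results


review = ("I stayed at the resort last week, and though the mattresses "
          "were very nice, the service was awful.")
print(topic_sentiment(review))
```

The sentence as a whole is mixed, but splitting it lets the mattresses and the service receive separate labels, which is the point the slide makes.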