Using Graph Theory to understand Intent & Concepts – January 2013	
  



                               tumra.com	
  
UNDERSTANDING INTENT & CONCEPTS	
  
•  Use case:
    -  Enhancing Social TV user experience
    -  Matching users to content that interests them

•  Topics we’ll cover:
    -  Natural Language Processing
    -  Graph Theory
    -  Machine Learning


  
USE CASE ENHANCED SOCIAL TV	
  
•  Objectives:
    -  Increase engagement with content
    -  Enhance multi-channel user experience

•  We built a prototype solution:
    -  Mines unstructured data in real-time
    -  Understands:
      -  What interests individual users
      -  Entities & Concepts (People, Places, Events)


  
THE CHALLENGE	
  


  
 Help users to “follow the story” regardless of the
 news outlet, integrated web / second-screen	
  




  
                                             Photo Credit: byrion on Flickr (cc)
THE PROBLEM	
  


Unstructured Data   →   Magic?!?!   →   Awesomeness!




  
THE PROBLEM	
  
•  Little useful data to work with…
    -  Streams of continuous live TV
    -  Have to create metadata

•  Where did we start?
    -  Ingest several live news channels
    -  Extract whatever data was available:
      -  In-video text using OCR
      -  Subtitles / Closed Captions


  
STEP 1 NAMED ENTITY RECOGNITION	
  


We used a simple N-Gram model for exact matches;
    then Apache Lucene for everything else…	
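
A minimal sketch of the exact-match step, assuming a tiny in-memory knowledge base. The `KB` contents, the function names and the longest-match-first rule are illustrative assumptions, not the production model; the Lucene fallback for non-exact matches is not shown.

```python
# Hypothetical knowledge base: lower-cased surface form -> entity type.
KB = {
    "david cameron": "Person",
    "angela merkel": "Person",
    "german chancellor": "Concept",
    "debt": "Concept",
    "eurozone": "Place",
}


def ngrams(tokens, max_n):
    """Yield (start, n, text) for every n-gram, longest n first."""
    for n in range(max_n, 0, -1):
        for i in range(len(tokens) - n + 1):
            yield i, n, " ".join(tokens[i:i + n])


def exact_matches(text, kb=KB, max_n=3):
    """Return (surface form, type) pairs found in text, in reading order."""
    tokens = [t.strip(".,;:!?“”\"'").lower() for t in text.split()]
    used = set()   # token positions already claimed by a longer match
    found = []
    for start, n, gram in ngrams(tokens, max_n):
        span = set(range(start, start + n))
        if gram in kb and not (span & used):
            used |= span
            found.append((start, gram, kb[gram]))
    return [(gram, kind) for _, gram, kind in sorted(found)]
```

Calling `exact_matches()` on the example sentence from the next slide returns the two people, the political office, "debt" and "eurozone", each tagged with its type.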
  




  
EXAMPLE N.E.R.	
  

  “David Cameron and the German
Chancellor Angela Merkel meet to
 discuss the debt crisis and signal
their approval for greater eurozone
           integration.”
  


  
INITIAL SOLUTION	
  

Unstructured Data   →   [ NER | NoSQL ]   →   Awesomeness!




  
OH NO!!!
 *facepalm*	
  




     Photo Credit: cesarastudillo on Flickr (cc)
DISAMBIGUATION	
  
•  Which “David Cameron”?
    -  We have many in our Knowledgebase
    -  Sportsmen, actors, painters & characters…

•  Our initial simplistic approach was naïve
    -  Works great with unambiguous matches
    -  Best-case returns top-scoring entity

•  We needed a smarter approach
  
RECAP	
  
•  We have an effectively ‘flat’ KB of Entities
    -    “David Cameron” -> Politician (Person)
    -    “Angela Merkel” -> Politician (Person)
    -    “German Chancellor” -> Political office (Concept)
    -    “Debt” -> Economic concept (Concept)
    -    “Eurozone” -> Economic area (Place)


•  We needed a way to find relationships
   between Entities

  
THE BIG IDEA	
  




Graphs allow us to store relationships between entities, and
graph algorithms allow us to interrogate those connections…	
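
As a rough illustration of the idea (not the Neo4j data model used in the talk), relationships can be held as an undirected adjacency structure and then queried. The class and method names here are hypothetical.

```python
from collections import defaultdict


class EntityGraph:
    """Toy undirected graph: entity name -> set of directly related entities."""

    def __init__(self):
        self.neighbours = defaultdict(set)

    def relate(self, a, b):
        """Store a relationship between two entities (both directions)."""
        self.neighbours[a].add(b)
        self.neighbours[b].add(a)

    def related(self, a, b):
        """True if a and b are directly connected."""
        return b in self.neighbours.get(a, set())

    def connections(self, a):
        """All entities directly connected to a."""
        return set(self.neighbours.get(a, set()))
```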
  
GRAPH DATABASES	
  
   Neo4j        GraphLab        Apache Giraph        GoldenOrb


… of course there are many more open-source & proprietary ones	
  
  
SO, WHICH ONE?	
  


                       ???
… it had to be fast, scalable, and under active development
  

  
STEP 2 BUILDING RELATIONSHIPS	
  

We had 250 million Nodes, and 4 billion Edges…
great initial results but horrendously inefficient!

  Example: “David Cameron” & “Angela Merkel”	
  



  
INITIAL IMPROVEMENTS	
  
•  We didn’t need everything… just:
    -    People: “David Cameron”, “Angela Merkel”
    -    Places: “London”, “Downing Street”, “Eurozone”
    -    Concepts: “Debt”, “President”, “Eurozone”
    -    Things: Companies, Products etc.


•  Pruned the graph using Map/Reduce

•  This reduced the number of Entities…
    -  … but we still had billions of connections
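
The pruning rule above can be sketched on a single machine as a plain filter. This is only an illustration of the rule, not the Map/Reduce job itself; the type names in `KEEP_TYPES` and the sample data in the usage note are assumptions.

```python
# Entity types worth keeping for this use case (assumed labels).
KEEP_TYPES = {"Person", "Place", "Concept", "Thing"}


def prune(node_types, edges, keep=KEEP_TYPES):
    """Drop nodes of unwanted types and every edge that touches them.

    node_types: dict of node name -> type label
    edges: iterable of (node_a, node_b) pairs
    """
    kept_nodes = {name for name, kind in node_types.items() if kind in keep}
    kept_edges = [(a, b) for a, b in edges if a in kept_nodes and b in kept_nodes]
    return kept_nodes, kept_edges
```

For example, a hypothetical node typed `"Date"` would be dropped together with its edges, while edges between two kept nodes survive, which is why the edge count can stay large even after the node count falls.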
  
EXAMPLE PEOPLE, PLACES, CONCEPTS

       “David Cameron and the German
     Chancellor Angela Merkel meet to
      discuss the debt crisis and signal
     their approval for greater eurozone
                integration.”

Legend: Concepts, People, Places
  
DISAMBIGUATION	
  
Nodes: David Cameron (painter), David Cameron (footballer),
       David Cameron (actor), David Cameron (politician),
       Living Person, Politician, Head of State, Angela Merkel



Possibilities: shortest path, number of common connections etc.	
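
Both possibilities can be sketched over a small adjacency map. The edges below are hypothetical, chosen only to resemble the diagram; they are not the real knowledge base.

```python
from collections import deque

# Hypothetical edges for illustration only.
EDGES = [
    ("Angela Merkel", "Living Person"),
    ("Angela Merkel", "Politician"),
    ("Angela Merkel", "Head of State"),
    ("David Cameron (politician)", "Living Person"),
    ("David Cameron (politician)", "Politician"),
    ("David Cameron (politician)", "Head of State"),
    ("David Cameron (actor)", "Living Person"),
    ("David Cameron (footballer)", "Living Person"),
    ("David Cameron (painter)", "Painter"),
]


def build(edges):
    """Build an undirected adjacency map from (a, b) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def common_connections(graph, a, b):
    """Number of nodes directly connected to both a and b."""
    return len(graph.get(a, set()) & graph.get(b, set()))


def shortest_path_length(graph, src, dst):
    """Breadth-first search; returns hop count, or None if unreachable."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None
```

With these assumed edges, hop count alone cannot separate the politician from the actor (both are two hops from Angela Merkel), whereas the common-connection count can.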
  
STEP 3 SIMPLIFYING THE GRAPH	
  

Sure all that extra metadata was tasty but we didn’t
           need it all to solve the use-case…

   So we used Map/Reduce to count the common
                  connections	
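
One common way to count shared connections in a map/reduce style is to emit every pair of entities that hang off the same intermediate node, then sum per pair. The sketch below runs in a single process and uses made-up data; the actual job is not described in the slides.

```python
from collections import Counter
from itertools import combinations


def map_phase(adjacency):
    """For each intermediate node, emit ((entity_a, entity_b), 1) per pair."""
    for _node, entities in adjacency.items():
        for a, b in combinations(sorted(entities), 2):
            yield (a, b), 1


def reduce_phase(pairs):
    """Sum the emitted 1s per entity pair."""
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return counts


# Hypothetical input: intermediate node -> entities connected to it.
ADJACENCY = {
    "Living Person": {"Angela Merkel", "David Cameron (politician)", "David Cameron (actor)"},
    "Politician": {"Angela Merkel", "David Cameron (politician)"},
    "Head of State": {"Angela Merkel", "David Cameron (politician)"},
}

COMMON = reduce_phase(map_phase(ADJACENCY))
```

The resulting counts can replace the intermediate nodes with a single weighted edge per entity pair.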
  


  
SIMPLIFIED	
  
Nodes: Angela Merkel; David Cameron (painter), David Cameron (footballer),
       David Cameron (actor), David Cameron (politician)
Edge labels (number of common connections): 1, 3, 1

       Whoa … that looks a lot like a Least Cost Routing problem
  
LEAST COST PATH	
  
Nodes: Angela Merkel; David Cameron (painter), David Cameron (footballer),
       David Cameron (actor), David Cameron (politician)
Edge costs: 1/1, 1/3, 1/1



               1 / number of common connections = cost	
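
A minimal sketch of the least-cost idea using Dijkstra's algorithm from the standard library. Which candidate carries which weight is an assumption here (the slide only shows the labels 1, 3 and 1), and the function names are hypothetical.

```python
import heapq


def least_cost(weights, source, target):
    """Dijkstra over an undirected graph where cost = 1 / common connections.

    weights: dict of (node_a, node_b) -> number of common connections (> 0)
    Returns the total cost, or None if target is unreachable.
    """
    graph = {}
    for (a, b), common in weights.items():
        cost = 1.0 / common
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))

    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            return cost
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry
        for nxt, step in graph.get(node, ()):
            new_cost = cost + step
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(heap, (new_cost, nxt))
    return None


def disambiguate(weights, candidates, context_entity):
    """Pick the candidate with the cheapest path to the context entity."""
    scored = []
    for candidate in candidates:
        cost = least_cost(weights, candidate, context_entity)
        if cost is not None:
            scored.append((cost, candidate))
    return min(scored)[1] if scored else None


# Assumed assignment of the slide's weights to candidates.
WEIGHTS = {
    ("David Cameron (politician)", "Angela Merkel"): 3,
    ("David Cameron (actor)", "Angela Merkel"): 1,
    ("David Cameron (footballer)", "Angela Merkel"): 1,
}
CANDIDATES = [
    "David Cameron (painter)",
    "David Cameron (footballer)",
    "David Cameron (actor)",
    "David Cameron (politician)",
]
```

Inverting the count means more shared connections give a cheaper edge, so the most strongly related candidate wins the routing problem.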
  
UPDATED SOLUTION	
  

Unstructured Data   →   [ NER | Disambiguation | Neo4j | NoSQL ]   →   Awesomeness!




  
RECAP	
  
•  Graphs allow us to interrogate relationships
    -  Disambiguate when faced with multiple possibilities
    -  Infer more about the context of what’s happening


•  Went through iterations of improvements
    -  Kept our Entity data in NoSQL = TBs
    -  Used the Graph as an index of sorts = GBs


•  Neo4j was a great fit for our needs

  
STEP 4 MAKING IT WORK REAL-TIME	
  

Some queries were taking ‘seconds’ and we needed
 to go a lot faster because TV won’t wait for us …

 Do we really need to check the Graph every time?
  



  
ENTER MACHINE LEARNING	
  
•  We can use simple predictors to estimate
   the likelihood of Entities occurring
    -  e.g. every time we’ve looked for “David Cameron” in
       the past the best match was the Politician


•  Keeping a ‘probabilistic context’ of recent
   Entities allows us to detect shifts in topics
    -  Works especially well on News channels
    -  Reduces the demand on Graph lookups
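
One way such a predictor could look, as a sketch only: keep counts of past resolutions per surface form, answer from the counts when one entity clearly dominates, and fall back to the graph otherwise. The class name, the 0.8 threshold and the context window size are assumptions, not values from the talk.

```python
from collections import Counter, deque


class EntityPredictor:
    """Frequency-based shortcut in front of the (slower) graph lookup."""

    def __init__(self, context_size=50, threshold=0.8):
        self.history = {}                          # surface form -> Counter of entities
        self.context = deque(maxlen=context_size)  # most recently resolved entities
        self.threshold = threshold

    def record(self, surface, entity):
        """Remember that `surface` was resolved to `entity`."""
        self.history.setdefault(surface, Counter())[entity] += 1
        self.context.append(entity)

    def predict(self, surface):
        """Return the dominant entity, or None to signal 'ask the graph'."""
        counts = self.history.get(surface)
        if not counts:
            return None
        entity, hits = counts.most_common(1)[0]
        if hits / sum(counts.values()) >= self.threshold:
            return entity
        return None

    def context_share(self, entity):
        """Fraction of the recent context taken up by `entity`."""
        if not self.context:
            return 0.0
        return sum(1 for e in self.context if e == entity) / len(self.context)
```

A falling `context_share` for the entities that used to dominate is one simple signal that the topic has shifted and the graph should be consulted again.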

  
BAYES THEOREM	
  




Looks complicated, but it’s basically just counting & division
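
To make the "counting & division" point concrete: Bayes' theorem states P(entity | context) = P(context | entity) × P(entity) / P(context), and each term can be estimated from counts of past observations. The sketch below assumes observations are stored as (entity, context term) pairs, which is an illustrative choice rather than the talk's data model.

```python
from collections import Counter


def posterior(observations, context):
    """Estimate P(entity | context) for every entity from raw counts.

    observations: list of (entity, context_term) pairs seen in the past
    context: the context term observed now
    """
    total = len(observations)
    entity_counts = Counter(entity for entity, _ in observations)
    joint_counts = Counter(entity for entity, term in observations if term == context)

    evidence = sum(joint_counts.values()) / total if total else 0.0  # P(context)
    if evidence == 0:
        return {}  # context never seen: nothing to estimate from

    result = {}
    for entity, n_entity in entity_counts.items():
        prior = n_entity / total                       # P(entity)
        likelihood = joint_counts[entity] / n_entity   # P(context | entity)
        result[entity] = likelihood * prior / evidence
    return result
```

For instance, if the politician was seen six times with "eurozone" and once with "football", and the footballer three times with "football", the footballer becomes the more likely match as soon as the context is "football".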
  
                                                         Photo Credit: mattbuck007 on Flickr (cc)
STEP 5 MAKING IT WORK WORLDWIDE	
  


 We solved the problem for English, but what about
                 other languages?	
  




  
LANGUAGE	
  
•  Our core Entities of ‘People’, ‘Places’, &
   ‘Concepts’ are language agnostic…

•  We needed a way to ditch ‘language’ and
   jump straight to entities…
    -  The colour ‘Red’ means the same thing whether
       you call it ‘Rot’, ‘Rouge’ or ‘赤’


•  Again, Graphs could solve the problem
  
LANGUAGE INDEPENDENT	
  
Red, Rouge, Rot, Rojo, Röd, 赤, 紅, أحمر   →   Color: Red
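
In its simplest form, the language-independent lookup is a set of edges from surface forms in any language to a single entity node. The sketch below flattens that into a dictionary; the identifier `"color:red"` and the function name are hypothetical.

```python
# Hypothetical label edges: surface form (any language) -> entity id.
LABEL_EDGES = {
    "red": "color:red",    # English
    "rouge": "color:red",  # French
    "rot": "color:red",    # German
    "rojo": "color:red",   # Spanish
    "röd": "color:red",    # Swedish
    "赤": "color:red",      # Japanese
    "紅": "color:red",      # Chinese
}


def resolve(surface, labels=LABEL_EDGES):
    """Map a surface form to its language-agnostic entity id, if known."""
    return labels.get(surface.strip().casefold())
```

Once text is resolved to entity ids, everything downstream (disambiguation, prediction) can ignore the source language.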
PROBLEM SOLVED	
  


Typical response time ~30ms … relevancy improves
  over time and the system learns new entities ‘online’
  




  
FINAL SOLUTION	
  

Unstructured Data   →   [ NER | Language Model | Machine Learning | Disambiguation | Neo4j | NoSQL ]   →   Awesomeness!




  
ABOUT US	
  
•  We’ve built a product…
    -  Our ‘Digital Marketing Optimization’ platform
       improves conversion rates & customer satisfaction
       for eCommerce & Marketing campaigns
    -  Launches Q1 2013

•  What else do we do?
    -  ‘Big Data’ & ‘Data Science’ professional services
    -  Bespoke prototype & solution development


         “TUMRA” is a transliteration of the Sanskrit word for “BIG”;
        we thought it was a great name … ( and the .COM was available )
  
THANKS FOR LISTENING

         TUMRA … You?
         We’re hiring!
        Data Scientists & Developers
              work@tumra.com

    Questions?

      hello@tumra.com
      twitter.com/tumra

Using Graph theory to understand Intent & Concepts - Neo4j User Group (January 2013)

  • 1. Using Graph Theory to understand Intent & Concepts – January 2013   tumra.com  
  • 2. UNDERSTANDING INTENT & CONCEPTS   •  Use case: -  Enhancing Social TV user experience -  Matching users to content that interests them •  Topics we’ll cover: -  Natural Language Processing -  Graph Theory -  Machine Learning tumra.com  
  • 3. USE CASE ENHANCED SOCIAL TV   •  Objectives: -  Increase engagement with content -  Enhance multi-channel user experience •  We built a prototype solution: -  Mines unstructured data in real-time -  Understands: -  What interests individual users -  Entities & Concepts (People, Places, Events) tumra.com  
  • 4. THE CHALLENGE   THANKS FORtoLISTENING   Help users to “follow the story” regardless of the news outlet, integrated web / second-screen   tumra.com   Photo Credit: byrion on Flickr (cc)
  • 5. THE PROBLEM   Unstructured Data Magic?!?! Awesomeness! tumra.com  
  • 6. THE PROBLEM   •  Little useful data to work with… -  Streams of continuous live TV -  Have to create metadata •  Where did we start? -  Ingest several live news channels -  Extract whatever data was available: -  In-video text using OCR -  Subtitles / Closed Captions tumra.com  
  • 7. STEP 1 NAMED ENTITY RECOGNITION   We used a simple N-Gram model for exact matches; then Apache Lucene for everything else…   tumra.com  
  • 8. EXAMPLE N.E.R.   “David Cameron and the German Chancellor Angela Merkel meets to discuss the debt crisis and signal their approval for greater eurozone integration.”   tumra.com  
  • 9. EXAMPLE N.E.R.   “David Cameron and the German Chancellor Angela Merkel meets to discuss the debt crisis and signal their approval for greater eurozone integration.”   tumra.com  
  • 10. INITIAL SOLUTION   NoSQL Unstructured Awesomeness! Data NER tumra.com  
  • 11. OH NO!!! *facepalm*   Photo Credit: cesarastudillo on Flickr (cc)
  • 12. DISAMBIGUATION   •  Which “David Cameron”? -  We have many in our Knowledgebase -  Sportsmen, actors, painters & characters… •  Our initial simplistic approach was naïve -  Works great with unambiguous matches -  Best-case returns top-scoring entity •  We needed a smarter approach tumra.com  
  • 13. RECAP   •  We have an effectively ‘flat’ KB of Entities -  “David Cameron” -> Politician (Person) -  “Angela Merkel” -> Politician (Person) -  “German Chancellor” -> Political office (Concept) -  “Debt” -> Economic concept (Concept) -  “Eurozone” -> Economic area (Place) •  We needed a way to find relationships between Entities tumra.com  
  • 14. THE BIG IDEA   Graphs allow us to store relationships between entities, and graph algorithms allow us to interrogate those connections…  
  • 15. GRAPH DATABASES   Graph Neo4J Lab Apache Golden Giraph Orb … of course there are many more open-source & proprietary ones   tumra.com  
  • 16. SO, WHICH ONE?   ??? … it had to be fast, scalable, active development   tumra.com  
  • 17. STEP 2 BUILDING RELATIONSHIPS   We had 250 million Nodes, and 4 billion Edges… great initial results but horrendously inefficient! Example: “David Cameron” & “Angela Merkel”   tumra.com  
  • 18.
  • 19.
  • 20. INITIAL IMPROVEMENTS   •  We didn’t need everything… just: -  People: “David Cameron”, “Angela Merkel” -  Places: “London”, “Downing Street”, “Eurozone” -  Concepts: “Debt”, “President”, “Eurozone” -  Things: Companies, Products etc. •  Pruned the graph using Map/Reduce •  This reduced the number of Entities… -  … but we still had billions of connections tumra.com  
  • 21. EXAMPLE PEOPLE, PLACES, CONCEPTS   “David Cameron and the German Chancellor Angela Merkel meets to discuss the debt crisis and signal their approval for greater eurozone integration.”
  • 22. EXAMPLE PEOPLE, PLACES, CONCEPTS   “David Cameron and the German Chancellor Angela Merkel meets to discuss the debt crisis and signal their approval for greater eurozone integration.”   [Highlight legend: Concepts, Places, People]
  • 23. DISAMBIGUATION   [Graph diagram: “Angela Merkel” and the candidates “David Cameron (painter)”, “David Cameron (footballer)”, “David Cameron (actor)” and “David Cameron (politician)”, linked through shared nodes “Living Person”, “Politician” and “Head of State”]   Possibilities: shortest path, number of common connections etc.
  • 24. STEP 3 SIMPLIFYING THE GRAPH   Sure, all that extra metadata was tasty, but we didn’t need it all to solve the use case… So we used Map/Reduce to count the common connections
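Counting common connections follows the usual map/reduce shape: invert the adjacency data, emit one record per pair of entities that share a node, then sum per pair. The sketch below runs in a single process on made-up data; the `GRAPH` contents and the `map_phase` / `reduce_phase` names are assumptions for illustration, not the presenters' job.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy adjacency data: entity -> nodes it is connected to.
GRAPH = {
    "Angela Merkel": {"Living Person", "Politician", "Head of State"},
    "David Cameron (politician)": {"Living Person", "Politician", "Head of State"},
    "David Cameron (actor)": {"Living Person"},
    "David Cameron (painter)": {"Painter"},
}


def map_phase(graph):
    """Emit ((entity_a, entity_b), 1) for every node the two entities share."""
    inverted = defaultdict(list)  # shared node -> entities connected to it
    for entity, neighbours in graph.items():
        for node in neighbours:
            inverted[node].append(entity)
    for entities in inverted.values():
        for a, b in combinations(sorted(entities), 2):
            yield (a, b), 1


def reduce_phase(records):
    """Sum the emitted 1s per entity pair."""
    counts = Counter()
    for pair, value in records:
        counts[pair] += value
    return counts


common = reduce_phase(map_phase(GRAPH))
```

Pairs that share nothing never get emitted, so they are simply absent from the result, which keeps the simplified graph small.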
  • 25. SIMPLIFIED   [Graph diagram: “Angela Merkel” connected to the candidate “David Cameron” nodes (painter, footballer, actor, politician); edges labelled with common-connection counts 1, 3, 1]   Whoa … that looks a lot like a Least Cost Routing problem
  • 26. LEAST COST PATH   [Graph diagram: “Angela Merkel” connected to the candidate “David Cameron” nodes (painter, footballer, actor, politician); edges labelled with costs 1/1, 1/3, 1/1]   1 / number of common connections = cost
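With cost defined as 1 / number of common connections, disambiguation becomes a cheapest-path search from a known entity to each candidate. The sketch below uses a plain Dijkstra search from the standard library. Which candidate carries which count is an assumption here (the diagram's edge-to-node mapping does not survive in the transcript), and `COMMON`, `edge_cost` and `least_cost` are hypothetical names.

```python
import heapq

# Assumed common-connection counts between "Angela Merkel" and each candidate.
COMMON = {
    "David Cameron (politician)": 3,
    "David Cameron (actor)": 1,
    "David Cameron (footballer)": 1,
}


def edge_cost(common_connections):
    """Cost rule from the slide: 1 / number of common connections."""
    return 1.0 / common_connections


def least_cost(graph, source):
    """Dijkstra: cheapest total cost from source to every reachable node."""
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbour, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                heapq.heappush(queue, (new_cost, neighbour))
    return best


WEIGHTED = {"Angela Merkel": {c: edge_cost(n) for c, n in COMMON.items()}}
costs = least_cost(WEIGHTED, "Angela Merkel")
winner = min(COMMON, key=lambda c: costs.get(c, float("inf")))
```

On this toy graph every candidate is one hop away, so the search reduces to picking the cheapest edge; the same function also handles candidates that are only reachable through intermediate entities.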
  • 27. UPDATED SOLUTION   [Diagram] Unstructured Data → NER → Disambiguation → Awesomeness! (stores: NoSQL, Neo4j)
  • 28. RECAP
      •  Graphs allow us to interrogate relationships
          -  Disambiguate when faced with multiple possibilities
          -  Infer more about the context of what’s happening
      •  Went through iterations of improvements
          -  Kept our Entity data in NoSQL = TBs
          -  Used the Graph as an index of sorts = GBs
      •  Neo4j was a great fit for our needs
  • 29. STEP 4 MAKING IT WORK REAL-TIME   Some queries were taking ‘seconds’ and we needed to go a lot faster because TV won’t wait for us … Do we really need to check the Graph every time?
  • 30. ENTER MACHINE LEARNING
      •  We can use simple predictors to estimate the likelihood of Entities occurring
          -  e.g. every time we’ve looked for “David Cameron” in the past, the best match was the Politician
      •  Keeping a ‘probabilistic context’ of recent Entities allows us to detect shifts in topics
          -  Works especially well on News channels
          -  Reduces the demand on Graph lookups
  • 31. BAYES’ THEOREM   Looks complicated, but it’s basically just counting & division   Photo Credit: mattbuck007 on Flickr (cc)
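The "counting & division" remark can be made concrete with a small naive Bayes style predictor: P(entity | context) is proportional to P(entity) × P(context | entity), and both factors are estimated from counts of past resolutions. Everything below is a hedged sketch; the `MentionPredictor` class, its add-one smoothing and the sample observations are assumptions rather than the presenters' model.

```python
from collections import Counter, defaultdict


class MentionPredictor:
    """Estimate which candidate a mention refers to from past resolutions."""

    def __init__(self):
        self.entity_counts = Counter()              # times each entity was the best match
        self.context_counts = defaultdict(Counter)  # entity -> co-occurring entity -> count

    def observe(self, entity, context):
        """Record one resolved mention and the entities seen around it."""
        self.entity_counts[entity] += 1
        for item in context:
            self.context_counts[entity][item] += 1

    def posterior(self, candidates, context):
        """Return normalised scores: prior count times smoothed context likelihoods."""
        scores = {}
        for entity in candidates:
            seen = self.entity_counts[entity]
            # The prior's denominator is the same for every candidate, so it
            # cancels in the final normalisation and is left out here.
            score = seen + 1
            for item in context:
                score *= (self.context_counts[entity][item] + 1) / (seen + 2)
            scores[entity] = score
        total = sum(scores.values())
        return {entity: score / total for entity, score in scores.items()}


predictor = MentionPredictor()
for _ in range(8):
    predictor.observe("David Cameron (politician)", ["Angela Merkel"])
for _ in range(2):
    predictor.observe("David Cameron (footballer)", ["Premier League"])

candidates = ["David Cameron (politician)", "David Cameron (footballer)"]
news = predictor.posterior(candidates, ["Angela Merkel"])
sport = predictor.posterior(candidates, ["Premier League"])
```

When the recent context changes from politics to sport, the ranking flips even though the politician has the larger prior, which is the kind of topic shift a 'probabilistic context' is meant to catch. A confident posterior could be used to skip the graph lookup, falling back to the graph only when the scores are close.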
  • 32. STEP 5 MAKING IT WORK WORLDWIDE   We solved the problem for English, but what about other languages?
  • 33. LANGUAGE
      •  Our core Entities of ‘People’, ‘Places’, & ‘Concepts’ are language agnostic…
      •  We needed a way to ditch ‘language’ and jump straight to entities…
          -  The colour ‘Red’ means the same thing regardless of whether you call it ‘Rot’, ‘Rouge’ or ‘赤’
      •  Again, Graphs could solve the problem
  • 34. LANGUAGE INDEPENDENT   [Graph diagram: the concept node “Color: Red” linked to its labels “Red”, “Rouge”, “Rot”, “Röd”, “Rojo”, “赤”, “紅” and “أحمر”]
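One way to picture the language-independent graph is as label edges that all point at a single concept node, so a lookup in any language lands on the same entity. The sketch below is illustrative only: the `LABEL_EDGES` data, the language codes and the `resolve` helper are hypothetical, and a real system would need per-language normalisation beyond `casefold()`.

```python
# Hypothetical label edges: (surface form, language code, concept node).
LABEL_EDGES = [
    ("Red", "en", "Color: Red"),
    ("Rouge", "fr", "Color: Red"),
    ("Rot", "de", "Color: Red"),
    ("Röd", "sv", "Color: Red"),
    ("Rojo", "es", "Color: Red"),
    ("赤", "ja", "Color: Red"),
    ("紅", "zh", "Color: Red"),
]

# Index the edges by case-folded surface form; one form may name several concepts.
INDEX = {}
for label, _language, concept in LABEL_EDGES:
    INDEX.setdefault(label.casefold(), set()).add(concept)


def resolve(label):
    """Return the set of concept nodes a surface form points to, in any language."""
    return INDEX.get(label.casefold(), set())


german = resolve("rot")
japanese = resolve("赤")
unknown = resolve("blau")
```

Returning a set keeps the door open for ambiguous labels, which can then go through the same disambiguation step as any other entity.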
  • 35. PROBLEM SOLVED   Typical response time ~30ms … relevancy improves over time and the system learns new entities ‘online’
  • 36. FINAL SOLUTION   [Diagram] Unstructured Data → NER, Language Model, Disambiguation, Machine Learning → Awesomeness! (stores: NoSQL, Neo4j)
  • 37. ABOUT US
      •  We’ve built a product…
          -  Our ‘Digital Marketing Optimization’ platform improves conversion rates & customer satisfaction for eCommerce & Marketing campaigns
          -  Launches Q1 2013
      •  What else do we do?
          -  ‘Big Data’ & ‘Data Science’ professional services
          -  Bespoke prototype & solution development
      “TUMRA” is a transliteration of the Sanskrit word for “BIG”; we thought it was a great name … (and the .COM was available)
  • 38. THANKS FOR LISTENING   TUMRA You?   We’re hiring! Data Scientists & Developers: work@tumra.com
  • 39. THANKS FOR LISTENING   Questions?   tumra.com   hello@tumra.com   twitter.com/tumra