Authors:
Ilias Tachmazidis, Grigoris Antoniou,
      Giorgos Flouris, Spyros Kotoulas
                  Partially funded by PlanetData
   Huge data sets coming from
    ◦ the Web, government authorities, scientific
      databases, sensors and more
   Defeasible logic
    ◦ is suitable for encoding commonsense knowledge
      and reasoning
    ◦ avoids triviality of inference due to low-quality data
   Defeasible logic has low complexity
    ◦ The consequences of a defeasible theory D can be
      computed in O(N) time, where N is the number of
      symbols in D
   Reasoning is performed in the presence of
    defeasible rules
   Defeasible logic has been implemented for
    in-memory reasoning; however, that approach
    does not scale to huge data sets
   Solution: scalability/parallelization using the
    MapReduce framework
   Our approach is restricted to single-argument
    reasoning (unary predicates such as bird(X))
   A multi-argument implementation has been
    accepted at ECAI 2012 (to appear):
   Ilias Tachmazidis, Grigoris Antoniou, Giorgos
    Flouris, Spyros Kotoulas, and Lee
    McCluskey, ‘Large-scale Parallel Stratified
    Defeasible Reasoning’, in ECAI (2012)
   Facts
    ◦ e.g. bird(eagle)
   Strict Rules
    ◦ e.g. bird(X) → animal(X)
   Defeasible Rules
    ◦ e.g. bird(X) ⇒ flies(X)
•   Priority Relation (acyclic relation on the set of
    rules)
    – e.g.   r: bird(X) ⇒ flies(X)
             r’: brokenWing(X) ⇒ ¬flies(X)
             r’ > r
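The ingredients listed above (facts, strict rules, defeasible rules, and an acyclic priority relation) can be written down as plain data. The following is only a minimal sketch: the `Rule` class, the label `r0` for the strict rule, and the helper `is_acyclic` are hypothetical names chosen for illustration, and "~" stands in for ¬.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str     # rule label, e.g. "r"
    body: str     # single body predicate, e.g. "bird"
    head: str     # head literal; a leading "~" marks negation
    strict: bool  # True for strict rules, False for defeasible rules


# Example theory from the slides (single-argument predicates only).
facts = {("bird", "eagle")}
rules = [
    Rule("r0", "bird", "animal", strict=True),        # bird(X) -> animal(X); label assumed
    Rule("r", "bird", "flies", strict=False),         # bird(X) => flies(X)
    Rule("r'", "brokenWing", "~flies", strict=False),  # brokenWing(X) => ~flies(X)
]
# Priority relation stored as (superior, inferior) pairs: r' > r.
superior = {("r'", "r")}


def is_acyclic(pairs):
    """Return True when the priority relation contains no cycle."""
    graph = {}
    for high, low in pairs:
        graph.setdefault(high, set()).add(low)
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return True
        if node in visiting:
            return False  # we came back to a node on the current path
        visiting.add(node)
        ok = all(visit(nxt) for nxt in graph.get(node, ()))
        visiting.discard(node)
        done.add(node)
        return ok

    return all(visit(node) for node in list(graph))
```

Checking acyclicity once, when the theory is loaded, keeps that concern out of the reasoning step itself.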
   MapReduce is inspired by similar primitives in LISP and
    other functional languages
   Operates exclusively on <key, value> pairs
   Input and Output types of a MapReduce job:
    ◦   Input : <k1, v1>
    ◦   Map(k1,v1) → list(k2,v2)
    ◦   Reduce(k2, list (v2)) → list(k3,v3)
    ◦   Output : list(k3,v3)
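The type signatures above can be made concrete with a small in-memory simulation. This is a sketch for illustration only, not how a distributed framework is implemented; `run_mapreduce`, `wc_map`, and `wc_reduce` are hypothetical names, and word counting is used simply as a familiar example.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Simulate one MapReduce job on a single machine.

    records: iterable of (k1, v1) pairs
    mapper:  (k1, v1) -> list of (k2, v2)
    reducer: (k2, list of v2) -> list of (k3, v3)
    """
    # Map phase, followed by grouping of intermediate values by key.
    intermediate = defaultdict(list)
    for k1, v1 in records:
        for k2, v2 in mapper(k1, v1):
            intermediate[k2].append(v2)
    # Reduce phase, visiting keys in sorted order.
    output = []
    for k2 in sorted(intermediate):
        output.extend(reducer(k2, intermediate[k2]))
    return output


def wc_map(offset, line):
    # Emit <word, 1> for every word in the line.
    return [(word, 1) for word in line.split()]


def wc_reduce(word, counts):
    # Sum the partial counts collected for one word.
    return [(word, sum(counts))]


result = run_mapreduce([(0, "a b a"), (6, "b c")], wc_map, wc_reduce)
print(result)
```

Only `mapper` and `reducer` are specific to the problem; the driver stays the same for every job.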
   Provides an infrastructure that takes care of
    ◦ distribution of data
    ◦ management of fault tolerance
    ◦ results collection
   For a specific problem
    ◦ the developer writes a few routines that follow
      the general interface
   Rule decomposition
    ◦ the computation of each rule is assigned to a
      computer in the cloud
    ◦ difficult to achieve balanced work distribution
   Data decomposition
    ◦ a subset of data is assigned to each computer in
      the cloud
    ◦ provides more fine-grained partitioning
    ◦ our solution is based on data decomposition
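Data decomposition can be pictured as routing every fact to a worker chosen from its argument value. The sketch below is an assumption about one way to do this, not the partitioning used in the work itself; `partition`, `decompose`, and the CRC32-based hashing are illustrative choices.

```python
import zlib


def partition(argument, num_workers):
    """Pick a worker index for an argument value (stable across runs)."""
    return zlib.crc32(argument.encode("utf-8")) % num_workers


def decompose(facts, num_workers):
    """Split (predicate, argument) facts into one list per worker."""
    parts = [[] for _ in range(num_workers)]
    for predicate, argument in facts:
        parts[partition(argument, num_workers)].append((predicate, argument))
    return parts


example_facts = [
    ("bird", "eagle"),
    ("brokenWing", "eagle"),
    ("bird", "owl"),
    ("bird", "pigeon"),
]
example_parts = decompose(example_facts, 3)
```

Because the worker is derived from the argument alone, all facts about the same individual end up on the same worker, whatever the number of rules.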
   Rule set:
    ◦   r1   : bird(X) → animal(X)
    ◦   r2   : bird(X) ⇒ flies(X)
    ◦   r3   : brokenWing(X) ⇒ ¬flies(X)
    ◦   r3   > r2
   Consider bird(eagle) and brokenWing(owl) as
    facts
   Note that flies(eagle) and ¬flies(owl) do not
    conflict with each other, since they concern
    different argument values
   Reasoning is performed, in isolation, for each
    unique argument value
INPUT: facts in multiple files

   File01: bird(eagle), bird(owl)
   File02: bird(pigeon), brokenWing(eagle), brokenWing(owl)

MAP phase input: <position in file, fact>

   File01   <0, bird(eagle)>
            <11, bird(owl)>
   File02   <0, bird(pigeon)>
            <12, brokenWing(eagle)>
            <29, brokenWing(owl)>
MAP phase output: <argument, predicate>

   <eagle, bird>
   <owl, bird>
   <pigeon, bird>
   <eagle, brokenWing>
   <owl, brokenWing>

Grouping/Sorting

Reduce phase input: <argument, list(predicates)>

   <eagle, <bird, brokenWing>>
   <owl, <bird, brokenWing>>
   <pigeon, <bird>>
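The map and grouping steps above can be sketched in a few lines. This is an illustrative assumption about the fact syntax (one `predicate(argument)` per record); `map_fact` and `group_by_key` are hypothetical names, and the grouping here only imitates what the framework does between the two phases.

```python
import re
from collections import defaultdict

# One fact per record, written as predicate(argument).
FACT = re.compile(r"^\s*(\w+)\((\w+)\)\s*$")


def map_fact(offset, fact):
    """Map <position in file, fact> to <argument, predicate>."""
    match = FACT.match(fact)
    if match is None:
        return []  # skip records that are not well-formed facts
    predicate, argument = match.groups()
    return [(argument, predicate)]


def group_by_key(pairs):
    """Imitate the grouping step: collect all values emitted for a key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


map_input = [
    (0, "bird(eagle)"),
    (11, "bird(owl)"),
    (0, "bird(pigeon)"),
    (12, "brokenWing(eagle)"),
    (29, "brokenWing(owl)"),
]
map_output = [pair for offset, fact in map_input for pair in map_fact(offset, fact)]
reduce_input = group_by_key(map_output)
```

The file offset is ignored by the mapper; only the fact itself determines the emitted key.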
Reduce phase output (final output): conclusions after reasoning

   animal(eagle)
   ¬flies(eagle)

   animal(owl)
   ¬flies(owl)

   animal(pigeon)
   flies(pigeon)
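The reduce step can be sketched for the example rule set r1, r2, r3 with r3 > r2. This is a deliberately simplified stand-in for defeasible reasoning, assuming single-predicate rule bodies and no chaining between rules; it is not the full proof theory of defeasible logic. `Rule`, `negate`, and `reduce_argument` are hypothetical names, and "~" stands in for ¬.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    name: str
    body: str     # single body predicate
    head: str     # head literal; a leading "~" marks negation
    strict: bool  # True for strict rules, False for defeasible rules


RULES = [
    Rule("r1", "bird", "animal", True),
    Rule("r2", "bird", "flies", False),
    Rule("r3", "brokenWing", "~flies", False),
]
SUPERIOR = {("r3", "r2")}  # (superior, inferior) pairs: r3 > r2


def negate(literal):
    return literal[1:] if literal.startswith("~") else "~" + literal


def reduce_argument(argument, predicates):
    """Derive conclusions for one argument value from its predicates."""
    known = set(predicates)
    applicable = [r for r in RULES if r.body in known]
    conclusions = []
    for rule in applicable:
        if not rule.strict:
            # A defeasible conclusion holds only if every applicable rule
            # for the opposite literal is beaten by an applicable supporter.
            attackers = [a for a in applicable if a.head == negate(rule.head)]
            supporters = [s for s in applicable if s.head == rule.head]
            if not all(
                any((s.name, a.name) in SUPERIOR for s in supporters)
                for a in attackers
            ):
                continue
        conclusion = f"{rule.head}({argument})"
        if conclusion not in conclusions:
            conclusions.append(conclusion)
    return conclusions


print(reduce_argument("eagle", ["bird", "brokenWing"]))
print(reduce_argument("pigeon", ["bird"]))
```

Each call sees only the predicates of one argument value, which is what allows the reducers to run independently of one another.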
   This work is the first to explore the feasibility
    of nonmonotonic reasoning over huge data
    sets
   We considered nonmonotonic reasoning in
    the form of defeasible logic and adapted the
    MapReduce framework for parallelization
   Our experimental results demonstrate that
    ◦ defeasible reasoning over billions of facts is
      performant
    ◦ our approach has the potential to scale to trillions
      of facts.

Towards Parallel Nonmonotonic Reasoning with Billions of Facts
