Politecnico di Torino
Dipartimento di Automatica e Informatica
                                                      http://elite.polito.it




                 FaSet: A Set Theory Model for
                               Faceted Search
                Dario Bonino, Fulvio Corno, Laura Farinetti
Outline
       Faceted Search
       Goal
       The FaSet Set-Theoretical Model
       FaSet Relational Implementation




    2                    WI/IAT 2009, Milano, Italy   FaSet
Faceted Classification
       Originated in Library Science
           Ranganathan, 1962
       Content-based classification scheme
       Multi-dimensional
           Facet = classification dimension
       Multi-valued
           Focus = allowed value in one of the facets




Example

    Facets and the allowed foci for each facet:

    Color     Shape       Taste
    Yellow    Cube        Sweet
    Red       Sphere      Bitter
    Orange    Cone        Neutral
    Green     Cylinder    Acid
    Blue
    White
    Black

    An item is described by a choice of foci.
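
As a minimal sketch, the example can be encoded as a mapping from each facet to its allowed foci, with an item described by the foci chosen for it. The function name `is_valid_description` and the sample `item` are hypothetical and only illustrate the idea; they do not come from the slides.

```python
# Facets and their allowed foci, as listed on the slide.
FACETS = {
    "Color": {"Yellow", "Red", "Orange", "Green", "Blue", "White", "Black"},
    "Shape": {"Cube", "Sphere", "Cone", "Cylinder"},
    "Taste": {"Sweet", "Bitter", "Neutral", "Acid"},
}


def is_valid_description(description: dict[str, set[str]]) -> bool:
    """Check that every chosen focus is allowed in its facet."""
    return all(
        facet in FACETS and foci <= FACETS[facet]
        for facet, foci in description.items()
    )


# Hypothetical item: this choice of foci is illustrative only.
item = {"Color": {"Yellow"}, "Shape": {"Sphere"}, "Taste": {"Acid", "Sweet"}}
print(is_valid_description(item))
```

Using a set per facet keeps the description multi-valued: an item may carry several foci in the same facet, or none at all.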




Faceted Search Systems
       Faceted Classification
           Simple, intuitive, versatile, powerful
       Adopted by more and more web sites
           As a classification system for their
            products/items/documents/resources/…
           As a model for the user interface in search, filtering,
            refinement




Examples
       [Three slides of example screenshots; the images are not included in this transcript]

Facets in the real world
       Multi-valued classification
           During classification
           During search
           AND vs OR semantics?
       Hierarchical (nested) facets
           Parents selectable?
       Incomplete classification
       Numerical ranges

       Example facets
           Color: Yellow, Red, Orange, Green, Blue, White, Black, Other
           Shape: Squared (Cube, Parallelepiped), Rounded (Sphere, Cylinder)
           Weight: 0-50 g, 50-100 g, 100+ g


Facets in the Literature

    User Interfaces
        Active research field since ~2000
        Usability studies
            Mainly for search interfaces
        Application case studies
        Web vs desktop environment
        Mainly for multimedia data
    Data and logic model
        Methodologies from Library science (Broughton, Vickery)
        Formal models
            Dynamic Taxonomies (Sacco)
            Uniformities, Lattices (Priss)
            Granular computing
        Less applicable results

Goal of the paper
    Propose a formal model: FaSet
    for representing
        Faceted Classification of resources
        Faceted Search Interfaces for such resource sets
        Searching, Filtering, Ranking operations
    compatible with modern web applications
        Mathematically simple
        Easy mapping to Relational Algebra
        Decouple classification and resources
    versatile and flexible
        Supports all “real-world” variations on Facets

Facets and Foci
    Facets: disjoint sets
        Fa, Fb, Fc, …
    Facet space:
        U = Fa × Fb × Fc × …
    Focus L: subset
        La ⊆ Fa
        Many foci for each facet
    Focus name: index list
        La<i,j,k,…>
    [Figure: facet space U spanned by facets Fa and Fb; foci La<1>, La<2>, La<1,1>, La<1,2> drawn inside Fa]
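
A focus name is an index list, so one natural in-memory form is a tuple of integers. The sketch below converts between the `<i,j,k>` notation used on the slides and such tuples; the helper names `parse_focus_name` and `format_focus_name` are hypothetical.

```python
def parse_focus_name(text: str) -> tuple[int, ...]:
    """Parse '<1,2>' into (1, 2); '<>' gives the empty tuple."""
    inner = text.strip().removeprefix("<").removesuffix(">")
    return tuple(int(part) for part in inner.split(",") if part.strip())


def format_focus_name(name: tuple[int, ...]) -> str:
    """Format (1, 2) back into '<1,2>'."""
    return "<" + ",".join(str(index) for index in name) + ">"


print(parse_focus_name("<1,2>"))
print(format_focus_name((1, 1)))
```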



Hierarchy
    Hierarchical nesting of foci is represented by subset containment
        La<narrower> ⊆ La<broader>
    Focus names are chosen to represent hierarchical containment
        La<i,j,k> ⊆ La<i,j>
        Reminiscent of the Dewey Decimal Classification
    Incomplete taxonomy
        No overlap allowed
        A focus may be larger than the union of its sub-foci
    [Figure: facet Fa with foci La<1> and La<2>; La<1> contains La<1,1> and La<1,2>]



Classification (Facet)
    Resources r are classified w.r.t. the facet space
        “Projection”: r ↓ Fa
    We may only represent projections built by combining foci
        r ↓ Fa = ∪p La<p>
    Just the focus names are needed
        {<1,1>,<2>}
    [Figure: facet Fa with foci La<1>, La<2>, La<1,1>, La<1,2>, and the projection r ↓ Fa]

Classification (Multidimensional)
    On the multi-dimensional space, the Cartesian product is taken
        r ↓ U = (r ↓ Fa) × (r ↓ Fb) × ...
    Just the focus names are needed
    [Figure: r ↓ U drawn as the product of the projections r ↓ Fa and r ↓ Fb]
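
A sketch of the multidimensional case, assuming a classification is stored as a mapping from facet label to a set of focus names. The facet labels `"a"` and `"b"`, the sample values, and the helper `focus_combinations` are hypothetical. Each combination of one focus per facet identifies one block of the Cartesian product.

```python
from itertools import product

# Hypothetical classification of one resource over two facets.
classification = {
    "a": {(1, 1), (2,)},
    "b": {(3,)},
}


def focus_combinations(classification: dict) -> set[tuple]:
    """Enumerate one focus name per facet, in sorted facet order."""
    facets = sorted(classification)
    return set(product(*(sorted(classification[f]) for f in facets)))


for combination in sorted(focus_combinations(classification)):
    print(combination)
```

In practice the product rarely needs to be materialized: storing the per-facet sets is enough, which is why only the focus names are needed.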
     




Searching in FaSet
    Resources r
        Classified as r ↓ U
    Query q
        Expressed uniformly as q ↓ U
    Search = Filtering + Ranking
        Filtering: r is relevant to q iff (r ↓ U) ∩ (q ↓ U) ≠ ∅
        Ranking: estimate the similarity S(q, r) of r to q
    [Figure: query q and resources r1, r2 drawn as regions of the facet space, with axes Fa and Fb]




Filtering
    All resources that match the query, even partially
        (r ↓ U) ∩ (q ↓ U) ≠ ∅
    May be easily computed by checking focus names
    Prefix-compatibility: La<p1> ≍ La<p2> iff
        p1 = p2, or
        p1 is a prefix of p2, or
        p2 is a prefix of p1
    At least one pair of foci, for each facet, must be
     prefix-compatible
        ∀Fa : ∃ La<p1> ∈ q, La<p2> ∈ r : La<p1> ≍ La<p2>
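
The filtering rule can be sketched directly from the definition above. This is an illustration under stated assumptions, not the authors' implementation: focus names are tuples of integers, a query or resource is a mapping from facet label to a set of names, and a facet that is not mentioned is treated as the root focus `<>` (the whole facet). That last choice is an assumption made here to handle incomplete classification.

```python
ROOT = ()  # assumed encoding of the root focus <>


def prefix_compatible(p1: tuple, p2: tuple) -> bool:
    """True when p1 == p2 or one name is a prefix of the other."""
    shared = min(len(p1), len(p2))
    return p1[:shared] == p2[:shared]


def is_relevant(query: dict, resource: dict) -> bool:
    """For every facet, at least one (query, resource) pair must be compatible."""
    facets = set(query) | set(resource)
    return all(
        any(
            prefix_compatible(p1, p2)
            for p1 in query.get(facet, {ROOT})
            for p2 in resource.get(facet, {ROOT})
        )
        for facet in facets
    )


query = {"a": {(1,)}, "b": {(2,)}}
print(is_relevant(query, {"a": {(1, 1)}, "b": {(2, 3)}}))
print(is_relevant(query, {"a": {(1, 1)}, "b": {(3,)}}))
```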

Example
    Focus hierarchy
        L<>
            L<1>: L<1,1>, L<1,2>, L<1,3>
            L<2>: L<2,1>, L<2,2>

    Query and resources (focus names)
        q     <1,3>, <2>
        r1    <1>
        r2    <2,2>
        r3    <1,2>, <1,3>
        r4    <1,1>, <1,2>
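
The example can be checked mechanically. The sketch below applies the prefix-compatibility test to the query and the four resources as read from the slide layout, for a single facet; the tuple encoding and the function names are assumptions made for illustration.

```python
def prefix_compatible(p1: tuple, p2: tuple) -> bool:
    """True when p1 == p2 or one name is a prefix of the other."""
    shared = min(len(p1), len(p2))
    return p1[:shared] == p2[:shared]


def matches(query: set, resource: set) -> bool:
    """Single-facet filter: some query focus is compatible with some resource focus."""
    return any(prefix_compatible(p1, p2) for p1 in query for p2 in resource)


q = {(1, 3), (2,)}
resources = {
    "r1": {(1,)},
    "r2": {(2, 2)},
    "r3": {(1, 2), (1, 3)},
    "r4": {(1, 1), (1, 2)},
}

results = {name: matches(q, foci) for name, foci in resources.items()}
print(results)
# {'r1': True, 'r2': True, 'r3': True, 'r4': False}
```

Under this reading, r1 matches because `<1>` is a prefix of `<1,3>`, r2 because `<2>` is a prefix of `<2,2>`, and r3 through the exact match on `<1,3>`; r4 shares no compatible pair with the query.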




Ranking
    Compute similarity between resource and query
    Often neglected by Faceted Search Interfaces
    Define a Similarity Measure S(q, r) ∈ [0,1]
        Compute similarity between matching foci (deeper
         matches give higher scores)
        Aggregate focus-based similarity measures in the same
         facet (fuzzy sum)
        Normalize facet-level results
        Aggregate facet-based similarity measures across all
         facets (fuzzy product)
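
The slides do not give the formulas, so the following is only one possible instantiation of the steps above, with every concrete choice an assumption: the focus-level score is the shared depth divided by a hypothetical `max_depth` parameter, the fuzzy sum is the probabilistic sum a + b - ab, the fuzzy product is the ordinary product, and the separate normalization step is omitted because the probabilistic sum already stays within [0, 1].

```python
from functools import reduce


def focus_similarity(p1: tuple, p2: tuple, max_depth: int) -> float:
    """Assumed score: 0 if incompatible, else shared depth / max_depth."""
    shared = min(len(p1), len(p2))
    if p1[:shared] != p2[:shared]:
        return 0.0
    return shared / max_depth


def fuzzy_sum(values) -> float:
    """Probabilistic sum (an assumed choice of fuzzy sum)."""
    return reduce(lambda a, b: a + b - a * b, values, 0.0)


def similarity(query: dict, resource: dict, max_depth: int) -> float:
    """Fuzzy sum within each facet, product across the facets of the query."""
    score = 1.0
    for facet, query_names in query.items():
        pair_scores = [
            focus_similarity(p1, p2, max_depth)
            for p1 in query_names
            for p2 in resource.get(facet, ())  # unclassified facet scores 0 here
        ]
        score *= fuzzy_sum(pair_scores)
    return score


q = {"a": {(1, 3), (2,)}}
print(similarity(q, {"a": {(1, 2), (1, 3)}}, max_depth=2))
print(similarity(q, {"a": {(1,)}}, max_depth=2))
```

With these choices an exact match at the deepest level outranks a match on a broader focus, which is the qualitative behaviour described on the slide.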



FaSet Relational Implementation
    The FaSet classification requires
        A constant set of Facets
        A constant set of Foci
        An “index” table storing the list of focus names for each
         resource
    [Figure: the constant sets of Facets and Foci, and the Resource Database]




FaSet Relational Implementation
    The FaSet search algorithm uses
        Set operations
        Universal and existential quantification
        Aggregate operations for computing ranking measures
    Directly supported by Relational DBMS primitives
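
A minimal sketch of how the filtering step might map onto SQL, using Python's built-in `sqlite3` module as a stand-in relational DBMS. The table and column names are hypothetical, and so is the encoding: each focus name is stored as a dot-terminated path string (`<1,3>` becomes `1.3.`) so that prefix-compatibility reduces to a `LIKE` test. The existential quantifier becomes a join, and the universal quantifier over facets becomes a `GROUP BY` with a count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE resource_index (resource TEXT, facet TEXT, focus TEXT);
    CREATE TABLE query_foci (facet TEXT, focus TEXT);
    """
)
# Index table: one row per (resource, facet, focus name).
conn.executemany(
    "INSERT INTO resource_index VALUES (?, ?, ?)",
    [
        ("r1", "a", "1."),
        ("r2", "a", "2.2."),
        ("r3", "a", "1.2."),
        ("r3", "a", "1.3."),
        ("r4", "a", "1.1."),
        ("r4", "a", "1.2."),
    ],
)
# Query q = {<1,3>, <2>} on facet a.
conn.executemany(
    "INSERT INTO query_foci VALUES (?, ?)", [("a", "1.3."), ("a", "2.")]
)

rows = conn.execute(
    """
    SELECT i.resource
    FROM resource_index AS i
    JOIN query_foci AS q
      ON q.facet = i.facet
     AND (i.focus LIKE q.focus || '%' OR q.focus LIKE i.focus || '%')
    GROUP BY i.resource
    HAVING COUNT(DISTINCT i.facet) =
           (SELECT COUNT(DISTINCT facet) FROM query_foci)
    ORDER BY i.resource
    """
).fetchall()
matching = [row[0] for row in rows]
print(matching)
# ['r1', 'r2', 'r3']
```

The trailing dot in each path keeps `1.` from being mistaken for a prefix of `12.`. Ranking could be layered on top with aggregate functions over the joined pairs.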




Future work
    Experimentation of FaSet on sample data sets
        Performance evaluation
    Integration with front-end AJAX interfaces
        CMS module
        MIT Exhibit
    Evaluation of the ranking
     algorithm from the
     Information Retrieval
     point of view



Conclusions - FaSet
    Formally defined faceted Representation & Search
     model
    Light formalism
    Supports hierarchies, nesting, multiple classification,
     incomplete specifications, …
    Compatible with modern web development
     technologies

                                                      Thank
                                                       you!
