Engineering Web Search Applications. Alessandro Bozzon, Marco Brambilla. Vienna, July 5, 2010
About the speakers © 2010 Alessandro Bozzon, Marco Brambilla, July 5, 2010
About the tutorial
Agenda
Agenda
Introduction
Search prevails
Some numbers …
… more numbers … [Ramakrishnan and Tomkins 2007]
Information Retrieval
Information Retrieval Applications [Figure labels: Ad-Hoc query, Static Document Collection, Ranked Result; Document Routing System, Predetermined queries or User profiles, Incoming Documents]
The nature of information retrieval
Information Retrieval is NOT Data Retrieval
The Information Retrieval Process [Figure: generic search-oriented application with a back end and a front end; component labels: Content Management, Query Analysis, Query Interaction, Search, Result Composition, Result Manipulation; data flows: q, q', r, r']
Search Engine vs. Search Application
Characterization of the user information need
Evaluating an IR System
Enterprise search
Case Studies
YaGoBi
The PHAROS Project
The Search Computing Project
Chansonnier
Requirements
Key Requirements and Design Dimensions for Web Search
Data Sources
Data Type
Data Analysis
Search Engine (1)
Search Engine (2)
Query Format
YaGoBi
PHAROS
PHAROS
Query Federation in PHAROS [Figure: query analysis and federation across four engines. Inputs: Keywords ("amsterdam"), Long/Lat (52.37 N, 4.89 E), XPath (where[contains("amsterdam")] and topic[contains("building")]), JPG. Engines: Text search (inverted index), Geo search (R-tree index), XML search (semantic index), Image search (similarity index)]
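The federation idea on this slide can be sketched in a few lines. This is only an illustration under stated assumptions, not the PHAROS implementation: the two toy engines, the `analyse` and `federate` helpers, the document ids, the scores, and the merge-by-summation rule are all hypothetical. Only the query values ("amsterdam", 52.37 N 4.89 E) and the engine types come from the slide.

```python
from collections import defaultdict

# Toy stand-ins for two of the engines named on the slide.
# Document ids and scores are invented for illustration.
def text_search(keywords):
    """Text search: a real engine would consult an inverted index."""
    index = {"amsterdam": {"doc1": 0.9, "doc2": 0.4}}
    return index.get(keywords.lower(), {})

def geo_search(lat_lon):
    """Geo search: a real engine would consult an R-tree index."""
    points = {(52.37, 4.89): {"doc1": 0.8, "doc3": 0.7}}
    return points.get(lat_lon, {})

def analyse(query):
    """Query analysis: route each part of the query to its engine."""
    routes = {"keywords": text_search, "geo": geo_search}
    return [(routes[kind], value) for kind, value in query.items() if kind in routes]

def federate(query):
    """Run every sub-query and merge per-document scores by summation."""
    merged = defaultdict(float)
    for engine, value in analyse(query):
        for doc_id, score in engine(value).items():
            merged[doc_id] += score
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)

results = federate({"keywords": "amsterdam", "geo": (52.37, 4.89)})
print([doc_id for doc_id, _ in results])
```

Summing scores is the simplest possible merge; a real federation layer would normalise scores per engine before combining them.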
User Behavior [Figure, from Ricardo Baeza-Yates, Next Generation Search, 2nd SeCo Workshop, Milan, 24/06/2010: from Start to End, the need "I am craving for a good Wiener Schnitzel and a Sachertorte in Vienna" passes through Search, Menu, Reviews, Map]
Information Seeking [Bates, 2002]. Bates, Marcia J. 2002. Toward an integrated model for information seeking and searching. In: The Fourth International Conference on Information Needs, Seeking and Use in Different Contexts.
Information Foraging
Moving between patches
Information seeking funnel [D. Rose, 2008]
Berrypicking vs. Orienteering vs. Teleporting ...
… vs. exploratory search
Multi-domain Exploratory Search
Multi-domain Exploratory Search
Existing Approaches (1)
Existing Approaches (2)
The note-taking limit [Aula and Russell, 2008]
Liquid Queries
Liquid Queries Definition (1) [Figure: domain labels Concert, Artist, Exhibition, Restaurant, Hotel, Movie, Metro Station, Theatre, Photo, Landmark, News; selected labels Photo, Concert, Metro Station, Restaurant, News, Exhibition, Artist, Hotel; legend: inputs, outputs, GR = global ranking]
Liquid Queries Definition (2) [Figure labels: Photo, Concert, Metro Station, Restaurant, News, Exhibition, Artist, Hotel; Expand]
Result Exploration Support
User Intent [from SIGIR 2008 Tutorial, Baeza-Yates and Jones] [Figure labels: History, nyonya food, Singapore Airlines, Jakarta Weather, Nikon Finepix, Car Rental Kuala Lumpur]
Contextual Content Delivery (from Ricardo Baeza-Yates, Next Generation Search, 2nd Search Computing Workshop, Milan, 24/06/2010). Demo: http://sandbox.yahoo.com/Motif
Relevance: the Top-k problem
Result Diversification [Figure labels: Relevance, Diversity]
User Interface [Figure labels: Shortcuts, Deep Links, Enhanced Results]
User Interface
User Interface
User Interface
User Interface
User Interface
Performance
Other Requirements
Design
Designing Web Search Applications
Search Applications from 1000 feet
Bird's-eye view on Search Applications
Search Application Processes
An example of Indexing Process
Pharos: the architecture
Search Computing: the architecture [Figure legend: Main Query flow; <Uses> relation]
Search Computing: the architecture. High-level query: "Where can I attend a DB scientific conference close to a beautiful beach reachable with cheap flights?" Sub-query 1: "Where can I attend a DB scientific conference?" Sub-query 2: "place close to a beautiful beach?" Sub-query 3: "place reachable with cheap flight?"
Search Computing: the architecture. Low-level query 1: ConfSearch("DB",placeX,dateY). Low-level query 2: TourSearch("Beach",PlaceX). Low-level query 3: Flight("cost<200",PlaceX,DateY)
Search Computing: the architecture. Query plan; service invocations and operators execution; results. Presented results: ESWC-Crete-Olympic, CAISE-Hammamet-Alitalia, TOOLS-Malaga-EasyJet
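The decomposition shown on these three slides can be sketched as three service calls joined on place and date. This is a toy sketch and not the Search Computing engine: the function names `conf_search`, `tour_search`, `flight_search` and `run_plan` are hypothetical stand-ins for ConfSearch, TourSearch and Flight; the dates, prices, and the "HypoConf", "Inlandia" and "HypoAir" rows are invented so that the joins have something to filter out. Conference names, places and airlines are taken from the slide.

```python
def conf_search(topic):
    """ConfSearch stand-in: (conference, place, date) rows. Topic is ignored here."""
    return [
        ("ESWC", "Crete", "date-1"),
        ("CAISE", "Hammamet", "date-2"),
        ("TOOLS", "Malaga", "date-3"),
        ("HypoConf", "Inlandia", "date-4"),  # hypothetical row with no beach
    ]

def tour_search(feature):
    """TourSearch stand-in: the set of places that offer the feature."""
    return {"Crete", "Hammamet", "Malaga"} if feature == "Beach" else set()

def flight_search(max_cost):
    """Flight stand-in: (place, date, airline, cost) rows cheaper than max_cost."""
    flights = [
        ("Crete", "date-1", "Olympic", 150),
        ("Hammamet", "date-2", "Alitalia", 180),
        ("Malaga", "date-3", "EasyJet", 90),
        ("Malaga", "date-3", "HypoAir", 450),   # invented, too expensive
        ("Inlandia", "date-4", "HypoAir", 120),  # invented
    ]
    return [f for f in flights if f[3] < max_cost]

def run_plan():
    """Query plan: join conferences with beaches on place, then with flights on place and date."""
    beaches = tour_search("Beach")
    flights = flight_search(200)
    combos = []
    for conf, place, date in conf_search("DB"):
        if place not in beaches:
            continue
        for f_place, f_date, airline, _cost in flights:
            if (f_place, f_date) == (place, date):
                combos.append(f"{conf}-{place}-{airline}")
    return combos

print(run_plan())
```

The nested loop is a plain join; the order of service invocations and the join strategy are exactly what a query plan would decide in a real system.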
Design Dimensions
Design Dimension | Affected Process | Values
Retrieval Policy | Indexing | Push, Pull
Data Homogeneity | Indexing | Homogeneity, Heterogeneity
Data Analysis | Indexing | Mono Annotation, Multi Annotation; Mono Modal, Multi Modal
Search Technology | Indexing, Query and Result Presentation | Search Engine(s) Type: Homogeneity, Heterogeneity
Query Format | Query and Result Presentation, User Interface | Query Type: Mono Modal, Multi Modal; Mono Domain, Multi Domain
User Interaction | User Interface | Direct, Indirect; Active, Passive
Designing Web Search Applications: an MDD approach
Development Methodology
An example domain model: Content Analysis / ER
An example process model: Content Analysis / BPMN - WebML [Figure labels: Refinement, M2M Transformation, M2T Transformation]
An Example of Complex Process [Figure labels: Analysis of audiovisual content; Incremental analysis of audio-visual content with textual annotations]
Modeling User Interface. Alessandro Bozzon, Model-driven development of Search Based Web Applications, Ph.D. Thesis, Politecnico di Milano, April 2009.
Pattern Example: Faceted Search
Pattern Example: Faceted Search
Pharos: Modeling User Interface. http://www.youtube.com/watch?v=ZpxyNi6Ht50
Pharos: Modeling User Interface. http://www.youtube.com/watch?v=ZpxyNi6Ht50 [Figure labels: Keyword refinement, Faceted refinement, Content-based refinement, Result presentation]
An Example of M2M Transformation: BPMN* to WebML
MDD in Search Computing
Search Computing Model Example: Search Service Model
Search Computing Query Meta-model
Search Computing Model Transformations. Prototype: http://dbgroup.como.polimi.it/brambilla/SeCoMDA
Search Computing DSLs (& Transformations): Panta Rhei. D. Braga, S. Ceri, F. Corcoglioniti, M. Grossniklaus, and S. Vadacca: Panta Rhei: An Execution Model for Queries over Web Information Sources, http://www.search-computing.it/sites/cms.web.seco/files/pantarhei2010.pdf
Implementation
From the models to implementation
Search Framework vs. Search Engine
Open Source Search vs. Open Search (www2010 Tutorial Open Source Tools, Drake & Jones, Yahoo!)
Open Source Search: high-level comparison (extended version of the www2010 Tutorial Open Source Tools, Drake & Jones, Yahoo!)
Product | License | Lang. | Docs | Ranking | Users | Parallel | Scale | Support
Lucene | Apache | Java/C++ | Several | Flexible | Amazon | Yes | TB | 5/5
Zettair | BSD Like | C | HTML, TREC, TXT | Flexible | Research | No | TB | 1/5
Indri | BSD Like | C++ | Many | Very Flexible | Research | Yes | TB | 1.5/5
Sphinx | GPL | C++ | Many | Flexible | Craigslist | Yes | YB | 4/5
Xapian | GPL | C++ | Many | Flexible | GMane | Yes | TB | 3/5
RDBMS | BSD, GPL | C | Limited, Maybe, GB, 4/5 (the source gives only these four values for the remaining six columns)
Open Source Search Benchmark (1)
Open Source Search Benchmark (2)
Lucene
Lucene Indexing Example
Additional Indexing Features
Lucene Querying Example [Figure labels: Simple Term Query, Query Parser]
Additional Querying Features
More Features
Why Open Search?
Open Search APIs
Google Ajax Search API (code snippets from the Google Ajax Search API documentation)
Google Custom Search API
Microsoft BING API
Yahoo! Boss (+ Search Monkey) (WWW 2010 Tutorial Open Search Tools, Drake & Jones) [Figure labels: SearchMonkey, keyterms, Bookmarks]
Search Frameworks: State of the industry
Open Source Search Frameworks
SMILA
Data Model
Chansonnier Data Model
SMILA Architecture [Figure labels: Connectivity, Search, Processing]
Processing Pipelines [Figure labels: Process Invocation, Condition on a record attribute, Condition on an annotation value, Activity Invocation]
Chansonnier Activities
Distribution (EclipseCON 2010: http://www.eclipsecon.org/2010/sessions/?page=sessions&id=1388)
Content Analysis [Figure labels: Media Item, Text Item, Transcoding, Media Artifact Generation, Media Analysis, Media Annotation, Text Analysis, Text Annotation]
Text Processing
Index Terms and Precision/Recall
Text Analysis Process [Figure labels: Document Parsing, Lexical Analysis, Stopwords Removal, Phrases, Stemming, Weighting, Indexing; Structure, Full text, Index Terms]
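The stages named on this slide can be chained into a minimal pipeline. The sketch below is an assumption-laden toy, not the tutorial's code: the stage order follows the order of the later slides, the stopword list is a tiny illustrative one, `stem` is a naive suffix stripper (not the Porter algorithm), weighting is plain term frequency, and the phrase (noun group) stage is omitted. All function names are hypothetical.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"a", "an", "and", "the", "of", "in", "is", "to"}  # tiny illustrative list

def parse(document):
    """Document parsing: drop markup tags, keep the text."""
    return re.sub(r"<[^>]+>", " ", document)

def tokenize(text):
    """Lexical analysis: lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def stem(token):
    """Toy suffix stripping; NOT the Porter algorithm."""
    for suffix in ("ing", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def analyse(document):
    """Parse, tokenize, remove stopwords, stem, and weight by term frequency."""
    tokens = [t for t in tokenize(parse(document)) if t not in STOPWORDS]
    return Counter(stem(t) for t in tokens)

def build_index(docs):
    """Indexing: map each index term to {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, document in docs.items():
        for term, tf in analyse(document).items():
            index[term][doc_id] = tf
    return index

docs = {
    "d1": "<p>Searching the web and indexing pages</p>",
    "d2": "<p>The index of a search engine</p>",
}
index = build_index(docs)
print(sorted(index["search"]))
```

Because of stemming, "Searching" in d1 and "search" in d2 end up under the same index term, which is the recall benefit the stemming stage is meant to provide.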
Document Parsing
Lexical Analysis
Tokenization
Stopword Removal
Phrases (noun groups)
Phrases (noun groups) - Strategies
Thesauri
Thesauri
Stemming and Lemmatization
Stemming
Stemming Example
Tools for text analysis (1)
Tools for text analysis (2)
Multimedia Content Analysis
Audio Segmentation
Video Segmentation (credits: Thorsten Hermes@SSMT2006)
Speech Analysis [Figure labels: ERIC, DAVID, JOHN]
Classification of Music Genre [Figure labels: Rock, Dance!]
Images: Low-level features
Face Identification and Recognition (credits: Thorsten Hermes@SSMT2006)
Image Concept Detection
Image Object Identification
Tools for media analysis (1)
Tools for media analysis (2)
Validation
Disclaimer
Measures for IR Systems
Measuring User Happiness
Evaluation measures
Relevance as a measure of user happiness
Evaluating Relevance (slide note: NOT COVERED HERE)
Information Need Translation
Set-based evaluation
Precision / Recall
F-Measure
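As an illustration of the set-based measures named above, here is a minimal Python sketch using the standard textbook definitions: precision is the fraction of retrieved documents that are relevant, recall is the fraction of relevant documents that are retrieved, and F is their weighted harmonic mean. The function name, the `beta` parameter, and the document identifiers are assumptions made for this example and do not come from the slides.

```python
def precision_recall_f(retrieved, relevant, beta=1.0):
    """Set-based precision, recall, and F-beta for a single query.

    `retrieved` and `relevant` are collections of document ids
    (hypothetical ids; any hashable value works).
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0  # avoid dividing by zero below
    b2 = beta * beta
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


# Four documents retrieved, three relevant overall, two in common.
p, r, f1 = precision_recall_f(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"])
print(p, r, f1)
```

With `beta=1` the F value weights precision and recall equally; a larger `beta` favours recall, a smaller one favours precision.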
Difficulties in using precision/recall
Rank-based evaluation
Measures for rank-based evaluation
Discounted Cumulative Gain (DCG)
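To make DCG concrete, the sketch below assumes one common textbook formulation: the gain at rank 1 is taken as is, and the gain at each later rank i is divided by log2(i). Other variants exist (for example discounting by log2(i + 1)), and this transcript does not show which one the slide used. The function names and the graded relevance values are illustrative assumptions.

```python
import math


def dcg(gains, p=None):
    """Discounted cumulative gain over the first p graded relevance values.

    Assumed variant: DCG_p = rel_1 + sum_{i=2..p} rel_i / log2(i).
    """
    gains = list(gains)[:p]  # p=None keeps the whole ranking
    return sum(g if i == 1 else g / math.log2(i)
               for i, g in enumerate(gains, start=1))


def ndcg(gains, p=None):
    """DCG normalised by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(gains, reverse=True), p)
    return dcg(gains, p) / ideal if ideal > 0 else 0.0


# Graded relevance of the results at ranks 1..6 (hypothetical judgments).
ranking = [3, 2, 3, 0, 1, 2]
print(dcg(ranking), ndcg(ranking))
```

Normalising by the ideal ordering keeps the score in [0, 1], which makes queries with different numbers of relevant documents comparable.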
Preference Judgment: A = preferences in agreement; D = preferences in disagreement; N_r = number of non-relevant docs ranked above relevant doc r, counted within the first R non-relevant docs; R = number of relevant results for the query
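The symbols on this slide match two standard preference-based measures, although the formulas themselves are not part of this transcript. The sketch below therefore assumes the usual definitions: Kendall's tau as (A - D) / (A + D), and BPREF as the mean over relevant documents of 1 - N_r / R. Function names and document ids are hypothetical.

```python
from itertools import combinations


def kendall_tau(ranking_a, ranking_b):
    """(A - D) / (A + D) over all pairs; assumes the same items and no ties."""
    pos_a = {doc: i for i, doc in enumerate(ranking_a)}
    pos_b = {doc: i for i, doc in enumerate(ranking_b)}
    agree = disagree = 0
    for x, y in combinations(ranking_a, 2):
        # A pair agrees when both rankings order x and y the same way.
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) > 0:
            agree += 1
        else:
            disagree += 1
    total = agree + disagree
    return (agree - disagree) / total if total else 0.0


def bpref(ranking, relevant, non_relevant):
    """Assumed form: (1/R) * sum over retrieved relevant docs of (1 - N_r / R)."""
    relevant, non_relevant = set(relevant), set(non_relevant)
    R = len(relevant)
    if R == 0:
        return 0.0
    seen_non_relevant = 0
    score = 0.0
    for doc in ranking:  # unjudged documents are skipped
        if doc in non_relevant:
            seen_non_relevant += 1
        elif doc in relevant:
            n_r = min(seen_non_relevant, R)  # only the first R non-relevant count
            score += 1.0 - n_r / R
    return score / R


print(kendall_tau(["a", "b", "c"], ["a", "c", "b"]))
print(bpref(["r1", "n1", "r2", "n2", "r3"], {"r1", "r2", "r3"}, {"n1", "n2"}))
```

Because BPREF only looks at judged documents, it is typically used when relevance judgments are incomplete; relevant documents that are never retrieved simply contribute zero to the sum.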
Presentation Metrics
Not all results are likely to be reviewed (Source: iprospect.com WhitePaper_2006_SearchEngineUserBehavior.pdf)
Clicks and views depend on rank [Joachims et al., 2005]
Eye Tracking Studies
Heat Maps
Thank you for your attention!
Alessandro Bozzon, Dipartimento di Elettronica e Informazione, Politecnico di Milano, Milano, Italy, http://home.dei.polimi.it/bozzon
Marco Brambilla, Dipartimento di Elettronica e Informazione, Politecnico di Milano, Milano, Italy, http://home.dei.polimi.it/mbrambil
http://www.search-computing.org/book
References - Books
References - Tutorial
References - Papers
References - Papers (continued)
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 

Engineering Web Search Applications

  • 1. Engineering Web Search Applications Alessandro Bozzon Marco Brambilla Vienna July 5, 2010
  • 4. Agenda © 2010 Alessandro Bozzon, Marco Brambilla
  • 6. Introduction
  • 14. The Information Retrieval Process [figure: a generic search-oriented application split into BACKEND and FRONTEND; labeled components: Content Management, Query analysis, Query Interaction, Search, Result Composition, Result Manipulation; labeled flows: q, q’, r, r’]
  • 24. Requirements
  • 35. Query Federation in PHAROS [figure: Query analysis and Federation route each part of a multimodal query to a matching engine: Keywords (“amsterdam”) to Text search over an inverted index; XPath (where[contains(“amsterdam”)] and topic[contains(“building”)]) to XML search over a semantic index; Long/Lat (52.37N 4.89E) to Geo search over an R-tree index; JPG to Image search over a similarity index]
  • 37. Information Seeking [Bates, 2002]. Bates, Marcia J. 2002. Toward an integrated model for information seeking and searching. In: The Fourth International Conference on Information Needs, Seeking and Use in Different Contexts.
  • 57. User Interface
  • 64. Design
  • 66. Search Applications from 1000 feet
  • 67. Bird's-eye view on Search Applications
  • 68. Search Application Processes
  • 69. An example of Indexing Process
  • 70. Pharos: the architecture
  • 71. Search Computing: the architecture [figure legend: Main Query flow; <Uses> relation]
  • 72. Search Computing: the architecture
      High level query: “Where can I attend a DB scientific conference close to a beautiful beach reachable with cheap flights?”
      Sub query 1: “Where can I attend a DB scientific conference?”
      Sub query 2: “place close to a beautiful beach?”
      Sub query 3: “place reachable with cheap flight?”
  • 73. Search Computing: the architecture
      Low level query 1: ConfSearch(“DB”,placeX,dateY)
      Low level query 2: TourSearch(“Beach”,PlaceX)
      Low level query 3: Flight(“cost<200”,PlaceX,DateY)
  • 74. Search Computing: the architecture [figure: Query plan; Services invocations and operators execution; Results; Presented results: ESWC-Crete-Olympic, CAISE-Hammamet-Alitalia, TOOLS-Malaga-EasyJet]
  • 75. Design Dimensions
      Design Dimension | Affected Process | Values
      Retrieval Policy | Indexing | Push, Pull
      Data Homogeneity | Indexing | Homogeneity, Heterogeneity
      Data Analysis | Indexing | Mono Annotation, Multi Annotation; Mono Modal, Multi Modal
      Search Technology | Indexing, Query and Result Presentation | Search Engine(s) Type: Homogeneity, Heterogeneity
      Query Format | Query and Result Presentation, User Interface | Query Type: Mono Modal, Multi Modal; Mono Domain, Multi Domain
      User Interaction | User Interface | Direct, Indirect; Active, Passive
  • 80. An Example of Complex Process: analysis of audiovisual content; incremental analysis of audio-visual content with textual annotations
  • 82. Pattern Example: Faceted Search
  • 83. Pattern Example: Faceted Search
  • 84. Pharos: Modeling User Interface http://www.youtube.com/watch?v=ZpxyNi6Ht50
  • 85. Pharos: Modeling User Interface http://www.youtube.com/watch?v=ZpxyNi6Ht50 [figure labels: KEYWORD REFINEMENT, FACETED REFINEMENT, CONTENT-BASED REFINEMENT, RESULT PRESENTATION]
  • 86. An Example of M2M Transformation: BPMN* → WebML
  • 92. Implementation
  • 96. Open Source Search: High level comparison (extended version of www2010 Tutorial Open Source Tools, Drake & Jones, Yahoo!)
      Product | License | Lang. | Docs | Ranking | Users | Parallel | Scale | Support
      Lucene | Apache | Java/C++ | Several | Flexible | Amazon | Yes | TB | 5/5
      Zettair | BSD Like | C | HTML, TREC, TXT | Flexible | Research | No | TB | 1/5
      Indri | BSD Like | C++ | Many | Very Flexible | Research | Yes | TB | 1.5/5
      Sphinx | GPL | C++ | Many | Flexible | Craigslist | Yes | YB | 4/5
      Xapian | GPL | C++ | Many | Flexible | GMane | Yes | TB | 3/5
      RDBMS | BSD, GPL | C | Limited | Maybe | GB | 4/5
  • 100. Lucene Indexing Example
  • 102. Lucene Querying Example: Simple Term Query; Query Parser
  • 111. Search Frameworks – State of the industry
  • 112. Open Source Search Frameworks
  • 119. Distribution. EclipseCON 2010: http://www.eclipsecon.org/2010/sessions/?page=sessions&id=1388
  • 120. Content Analysis [figure labels: Media Item, Text Item, Transcoding, Media Analysis, Text Analysis, Media Annotation, Text Annotation, Media Artifact Generation]
  • 134. Stemming Example
  • 148. Validation
  • 165. Not all results are likely to be reviewed (Source: iprospect.com WhitePaper_2006_SearchEngineUserBehavior.pdf)
  • 166. Clicks and views depend on rank [Joachims et al, 2005]
  • 167. Eye Tracking Studies
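
Slides 72 to 74 describe how Search Computing splits a high-level query into sub-queries, maps each to a low-level service call (ConfSearch, TourSearch, Flight), and combines the service results on the shared variables PlaceX and DateY. The following is a minimal sketch of that idea, not the Search Computing implementation. The three Python functions are hypothetical stand-ins for the services; the dates, costs, and the non-matching rows are invented for illustration, and only the three conference-place-airline combinations come from slide 74.

```python
# Hypothetical stand-ins for the three services named on slide 73.
# All dates and costs below are invented toy data.

def conf_search(topic):
    """ConfSearch(topic, PlaceX, DateY): conferences on a topic."""
    data = [
        {"conf": "ESWC", "topic": "DB", "place": "Crete", "date": "2010-05-30"},
        {"conf": "CAISE", "topic": "DB", "place": "Hammamet", "date": "2010-06-07"},
        {"conf": "TOOLS", "topic": "DB", "place": "Malaga", "date": "2010-06-28"},
        {"conf": "EXAMPLECONF", "topic": "DB", "place": "Vienna", "date": "2010-07-05"},
    ]
    return [c for c in data if c["topic"] == topic]


def tour_search(feature):
    """TourSearch(feature, PlaceX): places offering a given feature."""
    data = {"Beach": ["Crete", "Hammamet", "Malaga"]}
    return [{"place": p} for p in data.get(feature, [])]


def flight_search(max_cost):
    """Flight("cost<max_cost", PlaceX, DateY): flights under a cost limit."""
    data = [
        {"airline": "Olympic", "place": "Crete", "date": "2010-05-30", "cost": 180},
        {"airline": "Alitalia", "place": "Hammamet", "date": "2010-06-07", "cost": 150},
        {"airline": "EasyJet", "place": "Malaga", "date": "2010-06-28", "cost": 90},
        {"airline": "ExampleAir", "place": "Crete", "date": "2010-05-30", "cost": 450},
    ]
    return [f for f in data if f["cost"] < max_cost]


def run_plan(topic="DB", feature="Beach", max_cost=200):
    """Invoke the three services and join their results.

    Conferences join with tours on place, and with flights on
    both place and date, mirroring the shared PlaceX/DateY variables.
    """
    beach_places = {t["place"] for t in tour_search(feature)}
    flights = flight_search(max_cost)
    results = []
    for c in conf_search(topic):
        if c["place"] not in beach_places:
            continue  # sub-query 2 fails for this place
        for f in flights:
            if f["place"] == c["place"] and f["date"] == c["date"]:
                results.append((c["conf"], c["place"], f["airline"]))
    return results


if __name__ == "__main__":
    for conf, place, airline in run_plan():
        print(f"{conf}-{place}-{airline}")
```

The sketch runs the services sequentially and joins in memory; a real query plan would also choose an invocation order and join strategy, which are left out here.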

Editor's Notes

  1. i.e. it might not be clear to the system whether the user is “recall-oriented” or “precision-oriented”
  2. In information retrieval, users express their information needs as queries submitted to the system. In data management systems such as databases, users are often required to express queries in a formal, structured language (e.g., SQL or XQuery, which have exact-matching predicates and unambiguous semantics). In information retrieval, by contrast, the semantics of a query corresponds to the semantics of its content, which is interpreted in order to retrieve the relevant results. Hence, it is not possible to build a taxonomy of information retrieval queries based, for instance, on the expressive power of the underlying query language. Nonetheless, we can provide a functional classification of queries as follows.
  3. Complex search is characterized by: multiple searches, possibly over multiple sessions and spanning multiple sources of information; a combination of exploration and more directed information-finding activities; the need for note-taking; and variation of the search goal during the search process.
  6. From a high-level perspective, “search” is enabled by mechanisms that extract contents from data repositories (e.g., text files, audio files, video files, databases). Contents are then processed to build an index of the managed information, optimized to answer users’ queries efficiently. Before being indexed, contents are analyzed and enriched with annotations that build the contents’ representation. Along with the index, search relies on ranking models, i.e., mathematical methods that associate a score with the relevance of a content item with respect to a query. Once contents are indexed, multiple user interfaces (e.g., Web applications) let users interact with the search engine by executing queries and displaying the retrieved results.
  7. We define (i) an Indexing process (represented as a dashed line), which addresses the indexing of contents coming from the application data sources (thus involving data retrieval from external sources, transformation or aggregation of the retrieved data and, finally, their indexing); (ii) a Query and Result Presentation (QRP) process (represented as a solid line), which addresses the operations related to query execution, orchestration and result-set composition; and (iii) a User Interaction process (represented as a dotted line), i.e., the way users interact with the application’s functionalities.
  8. One aspect of the proposed development framework is the definition of a methodology for the design and implementation of the application to be produced. A development approach based on a formal methodology and appropriate high-level modeling languages smoothly incorporates change management into the mainstream production life-cycle, and greatly reduces the risk of breaking the software engineering process when changes occur. The proposed methodology follows the MDD approach by relying on incremental, iterative design steps that foster separation of concerns among the actors involved in the SBA design. The Conceptual Design macro activity represents the core of the development lifecycle, since it involves the main design activities. In the terminology of MDD, the BPMN Process Model can be seen as a Computation Independent Model (CIM), which specifies SBA requirements for the CAI and QRP processes; as we will see, the UI process is instead addressed as an Interaction pattern composition activity. The WebML application model is a Platform Independent Model (PIM), which exploits SOA and Web hypertext interfaces as a technical space. Finally, the application code is a Platform Specific Model (PSM) for the Java 2 technical space. Initially, requirements are conceptualized in a Domain Model, which formalizes the essential data objects managed by the application, and a Process Model, which pinpoints the workflow of the CAI, QRP and UI processes. The link between the domain and process models is established by the type of objects that flow between activities. The designed solutions do not take into account domain-specific information such as the schema of the adopted search technologies, or the format of the annotations produced by the analysis components. Nonetheless, the focus on a specific class of applications allows one to include, in the business model, high-level concepts relative to the applications’ domain: for SBAs, for instance, the concepts of query, user, index and so on.
The use of a high-level model combined with coarse-grained domain concepts allows one to address the designed application in perspective, possibly by creating designs that can be applied to classes of applications (e.g., audiovisual search engines) rather than one-off solutions. Abstract-level notation, though, cannot be translated into running code, due to the lack of the platform-specific details (e.g., the technologies adopted by actual search engines, analysis components, deployment platform etc.) needed to enact code generation. The Domain Model and Process Model are then subject to a first (CIM to PIM) transformation, which produces the Application Model and process metadata objects. Coarse-grained design is therefore followed by refinements that take into account more domain-specific information, such as the structure and format of the contents, the annotations and the indexes. To do so, a finer-grained model is adopted, in order to enable the definition of domain- and application-specific details that can lead to automatic code generation. The proposed approach is generic enough to adopt alternative modeling languages, both for process and application design. This slide discusses how to derive an application model from a high-level process model. The proposed framework employs the BPMN modeling language for process specification and the WebML modeling language for the design of hypertexts and Web service orchestrations.
  9. Let’s now take a bird’s-eye view of some reference example designs for all 3 identified SBA processes. The CAI process can be defined as the work to be performed by the actors of an SBA to achieve the indexing of a content item. The goal of the domain model is to formalize the content- and index-related data and metadata managed by the search applications. Such models build on five basic domain concepts:
     + Content Item: an individual information unit which is relevant in a search-based Web application for indexing purposes.
     + Annotation: the textual information associated with a content item for indexing and searching purposes. Such information can be of different kinds: manual annotation, provided by the content provider or by the user, and automatically generated annotation, produced by the search application during the Indexing process.
     + Usage Group: Content Items are published by one or more Content Providers, which are responsible for their publication. A Usage Group is an access profile specified by a content provider to define the set of operations allowed on a given content item for a set of users.
     + Index: the notion of Index, well known in many disciplines of computer science, denotes a data structure designed to optimize speed and performance in finding relevant content items for a search query.
  10. User interaction design, instead, requires a small paradigm shift in the proposed methodology, since we manage it not as a process but as an assembly of standard interaction schemas expressed as patterns. The reason for this shift is the common knowledge that user interaction cannot be expressed as a linear process, given that users act driven by tasks that cannot always be serialized. Traditional information retrieval is inherently based on users searching for information, the so-called “information need”. Recent studies extended the importance of this cognitive process, embedding it into a broader category named information seeking. This extension is motivated by the fact that information needs and retrieval stem from social, cultural, biological, and anthropological contexts, which broaden the ways information is gathered. A commonly accepted taxonomy of information seeking identifies four modes. (This taxonomy considers two orthogonal classification dimensions: Directed and Undirected refer to whether an individual explicitly seeks information by specifying her need by means of a query, or is more or less randomly exposing herself to information; Active and Passive, instead, refer to whether the individual does anything actively to acquire information, or is passively available to absorb information but does not seek it out. With the advent of the so-called Web 2.0, the four information seeking interactions listed in the previous section have been enhanced by the availability of new features. The transformation of end-users from passive recipients of content and communication into active contributors gave them new flavors, providing additional means for all four interaction modes.)
We identified more than 30 patterns, which we organized into 3 categories:
     + Query and result presentation patterns: general-purpose patterns that enable the execution of queries addressed to the search application and the presentation of their results.
     + Information Interaction patterns: patterns for the specification of the four information seeking modalities presented in the previous section.
     + Permission Management patterns: general-purpose patterns that enable usage permission management.
  11. User interaction design, instead, requires a little paradigmatic shift in the proposed methodology, since we manage it not as a process but as an assembly of standard interaction schema expressed as patterns. The reason for this shift stands in the common knowledge that the user interaction cannot be expressed as a linear process, given that users acts driven by task which cannot always be serialized . Traditional information retrieval is inherently based on users searching for information, the so-called “information need” . Recent studies extended the importance of such cognitive process, embedding it into a broader category named information seeking . Such an extension is motivated by the fact that information needs and retrieval stem from social, cultural, biological, and anthropological contexts, that broaden the ways information are gathered. A commonly accepted taxonomy of information seeking has four modes are identified. (This taxonomy considers two orthogonal classification dimensions: Directed and Undirected respectively refer to whether an individual explicitly seeks information by specifying her need by means of a query, or is more or less randomly exposing herself to information; Active and Passive, instead, refer to whether the individual does anything actively to acquire information, or she is passively available to absorb information, but does not seek it out. With the advent of the so-called Web 2.0, the four information seeking interactions listed in the previous section have been enhanced by the availability of new features. The transformation of end-users from passive recipients of content and communication into active contributors gave them new flavors, providing additional means for all the four interaction modes.) 
We identified more than 30 patterns, that we organized into 3 categories: + Query and result presentation patterns, containing general-purpose patterns that enable the execution and the presentation of the results of queries addressed to the search application; + Information Interaction patterns, for the specification of the four information seeking modalities presented in the previous section; + Permission Management patterns, which contains general purpose patterns that enable usage permission management.
  12. User interaction design, instead, requires a little paradigmatic shift in the proposed methodology, since we manage it not as a process but as an assembly of standard interaction schema expressed as patterns. The reason for this shift stands in the common knowledge that the user interaction cannot be expressed as a linear process, given that users acts driven by task which cannot always be serialized . Traditional information retrieval is inherently based on users searching for information, the so-called “information need” . Recent studies extended the importance of such cognitive process, embedding it into a broader category named information seeking . Such an extension is motivated by the fact that information needs and retrieval stem from social, cultural, biological, and anthropological contexts, that broaden the ways information are gathered. A commonly accepted taxonomy of information seeking has four modes are identified. (This taxonomy considers two orthogonal classification dimensions: Directed and Undirected respectively refer to whether an individual explicitly seeks information by specifying her need by means of a query, or is more or less randomly exposing herself to information; Active and Passive, instead, refer to whether the individual does anything actively to acquire information, or she is passively available to absorb information, but does not seek it out. With the advent of the so-called Web 2.0, the four information seeking interactions listed in the previous section have been enhanced by the availability of new features. The transformation of end-users from passive recipients of content and communication into active contributors gave them new flavors, providing additional means for all the four interaction modes.) 
We identified more than 30 patterns, that we organized into 3 categories: + Query and result presentation patterns, containing general-purpose patterns that enable the execution and the presentation of the results of queries addressed to the search application; + Information Interaction patterns, for the specification of the four information seeking modalities presented in the previous section; + Permission Management patterns, which contains general purpose patterns that enable usage permission management.
  13. User interaction design, instead, requires a little paradigmatic shift in the proposed methodology, since we manage it not as a process but as an assembly of standard interaction schema expressed as patterns. The reason for this shift stands in the common knowledge that the user interaction cannot be expressed as a linear process, given that users acts driven by task which cannot always be serialized . Traditional information retrieval is inherently based on users searching for information, the so-called “information need” . Recent studies extended the importance of such cognitive process, embedding it into a broader category named information seeking . Such an extension is motivated by the fact that information needs and retrieval stem from social, cultural, biological, and anthropological contexts, that broaden the ways information are gathered. A commonly accepted taxonomy of information seeking has four modes are identified. (This taxonomy considers two orthogonal classification dimensions: Directed and Undirected respectively refer to whether an individual explicitly seeks information by specifying her need by means of a query, or is more or less randomly exposing herself to information; Active and Passive, instead, refer to whether the individual does anything actively to acquire information, or she is passively available to absorb information, but does not seek it out. With the advent of the so-called Web 2.0, the four information seeking interactions listed in the previous section have been enhanced by the availability of new features. The transformation of end-users from passive recipients of content and communication into active contributors gave them new flavors, providing additional means for all the four interaction modes.) 
We identified more than 30 patterns, that we organized into 3 categories: + Query and result presentation patterns, containing general-purpose patterns that enable the execution and the presentation of the results of queries addressed to the search application; + Information Interaction patterns, for the specification of the four information seeking modalities presented in the previous section; + Permission Management patterns, which contains general purpose patterns that enable usage permission management.
  14. User interaction design, instead, requires a little paradigmatic shift in the proposed methodology, since we manage it not as a process but as an assembly of standard interaction schema expressed as patterns. The reason for this shift stands in the common knowledge that the user interaction cannot be expressed as a linear process, given that users acts driven by task which cannot always be serialized . Traditional information retrieval is inherently based on users searching for information, the so-called “information need” . Recent studies extended the importance of such cognitive process, embedding it into a broader category named information seeking . Such an extension is motivated by the fact that information needs and retrieval stem from social, cultural, biological, and anthropological contexts, that broaden the ways information are gathered. A commonly accepted taxonomy of information seeking has four modes are identified. (This taxonomy considers two orthogonal classification dimensions: Directed and Undirected respectively refer to whether an individual explicitly seeks information by specifying her need by means of a query, or is more or less randomly exposing herself to information; Active and Passive, instead, refer to whether the individual does anything actively to acquire information, or she is passively available to absorb information, but does not seek it out. With the advent of the so-called Web 2.0, the four information seeking interactions listed in the previous section have been enhanced by the availability of new features. The transformation of end-users from passive recipients of content and communication into active contributors gave them new flavors, providing additional means for all the four interaction modes.) 
  15. Thanks to the implemented extensions, we inject more information into the higher-level model, thus leading to: + finer-grained application models + fewer errors + more efficiency. Transformations were implemented in ATL, a language for model transformations. Here’s a graphical example of a model transformation between BPMN* activities and the WebML model, and here’s a hint of how transformations are coded.
  16. + Indri/Lemur: language modeling; BM25, Okapi, cosine similarity, inQuery + Lucene: TF-IDF, weighted by term occurrences; fielded search + Terrier: Okapi BM25, language modeling and TF-IDF; Divergence from Randomness + Your own re-ranking code using open search
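To make one of the ranking models named above concrete, here is a minimal sketch of Okapi BM25 scoring in plain Python. It does not reproduce the code of Indri, Lucene or Terrier: the function name `bm25_scores`, the toy documents, the parameter defaults `k1=1.2` and `b=0.75`, and the smoothed IDF variant are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against `query_terms` (Okapi BM25)."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Smoothed IDF (assumed variant); the "1 +" keeps it non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates via k1; b controls length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical toy collection, already tokenized.
docs = [
    "web search engines rank documents".split(),
    "search search search".split(),
    "model driven engineering of web applications".split(),
]
scores = bm25_scores(["search"], docs)
```

With this toy collection, the document that repeats the query term scores highest, and the document that never mentions it scores zero. Real engines add analysis, fielded weighting and index-time statistics on top of this formula.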
  17. There are not enough comparative benchmarks out there. They are hard to do; we really need standards. Each platform has to be optimized per hardware and data set. There are lots of platforms, with different APIs, options and numerical settings. We need good, diverse data sets, small & large. Lucene was the only solution that produced an index smaller than the input data size. It shaves off an additional 5 megabytes if one runs it in optimize mode, but at the cost of adding another ten seconds to indexing. sphinx and zettair index the fastest. Interestingly, I ran zettair in big-and-fast mode (which sucks up 300+ megabytes of RAM) but it ran slower by 3 seconds (maybe because of the nature of tweets). Xapian ran 5x slower than sqlite (which stores the raw input data in addition to the index) and produced the largest index files. The default index_text method in Xapian stores positional information, which blew the index size up to 529 megabytes. One must use index_text_without_positions to make the size more reasonable. I checked my Xapian code against the examples and documentation to see if I was doing something wrong, but I couldn’t find any discrepancies. I also included a column about development issues I encountered. zettair was by far the easiest to use (simple command line) but required transforming the input data into a new format. I had some text issues with sqlite (which also needs to be recompiled with FTS3 enabled) and with sphinx, given their strict input constraints. sphinx also requires a conf file, and it took some searching to find full examples of one. Lucene, zettair, and Xapian were the most forgiving when it came to accepting text inputs (zero errors).
  18. On the larger data set (3x larger than the Twitter one), zettair’s indexing performance improves (which makes sense, as it is designed more for larger corpora); zettair’s search speed should probably be a bit faster, because its search command line utility prints some unnecessary stats. For multi-searching in sphinx, I developed a Java client (in the hope of making it competitive with Lucene, the one to beat) which connects to the sphinx searchd server via a socket (that’s the API model in their examples). sphinx returned searches the fastest, ~3x faster than Lucene. Its indexing time was also on par with zettair. Lucene obtained the highest relevance and the smallest index size. Its index time could probably be improved by fiddling with its merge parameters, but I wanted to avoid numerical adjustments in this evaluation. Xapian has very similar search performance to Lucene but with significant indexing costs (both time and space > 3x). sqlite has the worst relevance because it doesn’t sort by relevance and does not seem to provide an ORDER BY function to do so.
  19. <!-- When a message arrives on the portType for the operation "process", instantiate a variable named "Request" --> <!-- Typically the request will contain a single Record. Multiple Records are produced, for example, by annotators that examine zip|rar|tgz archives. The extension activity will be executed if the 'workflow-attribute' attribute present on the record contains the value "split". The conditions are expressed as XPath expressions, and the attributes and annotations they use must be explicitly made available to the BPEL workflow through configuration (of org.eclipse.smila.blackboard). -->
  20. RAP – Rich Ajax Platform. G-Eclipse: an extensible framework including a GRID model for seamless integration of GRID/Cloud resources. It supports different Grid/Cloud interfaces, including AWS.
  21. Example: the token “saw”. Stemming: it might return just “s”. Lemmatization: it attempts to return “see” or “saw”, depending on whether the token is used as a verb or as a noun.
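The contrast can be sketched in a few lines of Python. This is a toy illustration, not a real stemmer or lemmatizer: the suffix list, the two-entry lexicon and the function names `crude_stem` and `lemmatize` are all hypothetical. Note that this particular toy stemmer leaves “saw” unchanged, unlike the more aggressive stemmer in the example above. The point is only that a stemmer chops suffixes without looking at context, while a lemmatizer consults a vocabulary and the part of speech.

```python
# Hypothetical suffix list, checked in this order.
SUFFIXES = ("ing", "ed", "es", "s")

# Hypothetical toy lexicon: (token, part of speech) -> lemma.
LEMMA_LEXICON = {
    ("saw", "VERB"): "see",
    ("saw", "NOUN"): "saw",
}


def crude_stem(token):
    """Strip the first matching suffix; no dictionary and no context are used."""
    for suffix in SUFFIXES:
        # Keep at least three characters so very short tokens are not mangled.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def lemmatize(token, pos):
    """Look up the lemma for a token given its part of speech."""
    return LEMMA_LEXICON.get((token, pos), token)


stem_of_searching = crude_stem("searching")
stem_of_saw = crude_stem("saw")
lemma_as_verb = lemmatize("saw", "VERB")
lemma_as_noun = lemmatize("saw", "NOUN")
```

Here the stemmer gives the same answer for “saw” whatever its role in the sentence, whereas the lemmatizer returns “see” for the verb and “saw” for the noun.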