Smart Autocompl... with Solr

The automatic suggestion functionality on most websites is simply a standard Solr fuzzy wildcard query on a concatenated index. For the Dutch public transportation website 9292.nl we created a more contextual autocompletion function. It takes advantage of the separate address fields of the source database, which lets it better understand which address a user means in this highly ambiguous data environment. We will show the methods used and explain how they may help bring autocompletion functionality to the next level for faceted indexes. This talk was given at Berlin Buzzwords 2012.

you complete me

Anne Veling – June 5th, 2012 – Berlin Buzzwords
@anneveling

AGENDA
• 9292.nl Public Transport Site
• Naive Address Autocompletion
• Field Inspection Semantic Autocompletion
• Conclusions

9292.NL
• Largest public transport site of The Netherlands
• 1M travel advice requests per day!
• Completely new site built by Q42
• Linking to existing routing engine
• Moving from multiple input boxes to one
• Mobile applications for Windows, iPhone, Android

DATA
• 10M points
  • Train and metro stations
  • Bus stops
  • Places of Interest
  • Streets
  • Street ranges
  • Addresses
• Highly ambiguous
  • Streets / city names / POI
  • Spelling mistakes
  • No single order

NAIVE IMPLEMENTATION
• One concatenated field in Lucene
• Tune tokenizer/analyzer
• Tune query analyzer
• Tune weights

• Syntax only

[Figure: quality vs. effort, with labels "100%" and "80%"]
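The naive approach can be sketched in a few lines. This is a minimal illustration, not the 9292.nl code. It assumes each location is stored as one concatenated string with a popularity weight; the sample documents and weights are hypothetical. It also uses a hand-rolled analyzer that lowercases, strips diacritics and splits on punctuation. The last query term is matched as a prefix (the wildcard part), and earlier terms may differ by one edit (the fuzzy part). This is roughly what prefix and fuzzy queries do on a single Lucene field. Nothing in the code knows whether a term is a city or a street: it is purely syntactic.

```python
import re
import unicodedata


def analyze(text):
    """Lowercase, strip diacritics and split on anything that is not a letter or digit."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return [t for t in re.split(r"[^0-9a-z]+", stripped.lower()) if t]


def within_one_edit(a, b):
    """True if a and b differ by at most one insertion, deletion or substitution."""
    if len(a) > len(b):
        a, b = b, a
    if len(b) - len(a) > 1:
        return False
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        return a[i + 1:] == b[i + 1:]  # substitution (or identical)
    return a[i:] == b[i + 1:]  # one extra character in b


def naive_suggest(query, docs, limit=5):
    """Match (concatenated_text, weight) pairs: earlier terms fuzzily, the last term as a prefix."""
    terms = analyze(query)
    if not terms:
        return []
    *complete, last = terms
    scored = []
    for text, weight in docs:
        tokens = analyze(text)
        if not any(t.startswith(last) for t in tokens):
            continue
        if all(any(within_one_edit(term, t) for t in tokens) for term in complete):
            scored.append((weight, text))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # highest weight first
    return [text for _, text in scored[:limit]]


# Hypothetical sample documents: one concatenated string per location, plus a weight.
DOCS = [
    ("Zeil Etten-Leur", 3),
    ("Etten-Leur station", 10),
    ("Zeilstraat Amsterdam", 5),
]

print(naive_suggest("zeil", DOCS))          # ['Zeilstraat Amsterdam', 'Zeil Etten-Leur']
print(naive_suggest("eten leur st", DOCS))  # ['Etten-Leur station']
```

Both "Zeil Etten-Leur" and "Zeilstraat Amsterdam" match "zeil", and only the weight decides their order. Tuning the analyzer and the weights is all this approach can do.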

FIELD INSPECTION
• Taking advantage of
  • Number of fields
  • Speed of Lucene
• Query Analysis
  • For each term, query in all fields
  • Does it appear in that field? Count > 0?
  • Use that information to do semantic interpretation
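Field inspection can be sketched in the same way. The schema (`city`, `street`, `station`, `bus_stop`), the `FieldIndex` class and the sample records are all hypothetical. A plain dictionary of postings stands in for a fielded Lucene index. For every query term the sketch asks, per field, how many documents contain that term, and it keeps only the fields where the count is above zero. The last term is matched as a prefix because the user may still be typing. In Lucene, the exact-term count corresponds to a per-field document frequency lookup (for example `IndexReader.docFreq` on a `Term`). That is why asking every field about every term stays cheap.

```python
import re
import unicodedata
from collections import defaultdict

FIELDS = ("city", "street", "station", "bus_stop")  # hypothetical schema


def analyze(text):
    """Lowercase, strip diacritics and split on anything that is not a letter or digit."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return [t for t in re.split(r"[^0-9a-z]+", stripped.lower()) if t]


class FieldIndex:
    """Tiny in-memory stand-in for a fielded index: field -> token -> set of doc ids."""

    def __init__(self, records):
        self.postings = {field: defaultdict(set) for field in FIELDS}
        for doc_id, record in enumerate(records):
            for field in FIELDS:
                for token in analyze(record.get(field, "")):
                    self.postings[field][token].add(doc_id)

    def count(self, field, term, prefix=False):
        """Number of documents whose `field` contains `term` (or a token starting with it)."""
        index = self.postings[field]
        if not prefix:
            return len(index.get(term, ()))
        matching = set()
        for token, doc_ids in index.items():
            if token.startswith(term):
                matching |= doc_ids
        return len(matching)


def inspect_fields(index, query):
    """For each query term, return the fields where it occurs (count > 0) and the counts."""
    terms = analyze(query)
    report = []
    for position, term in enumerate(terms):
        is_last = position == len(terms) - 1  # the last term may still be incomplete
        counts = {f: index.count(f, term, prefix=is_last) for f in FIELDS}
        report.append((term, {f: n for f, n in counts.items() if n > 0}))
    return report


# Hypothetical records with separate address fields.
RECORDS = [
    {"city": "Etten-Leur", "street": "Zeil"},
    {"city": "Etten-Leur", "station": "Etten-Leur"},
    {"city": "Amsterdam", "street": "Zeilstraat"},
]
index = FieldIndex(RECORDS)

for term, fields in inspect_fields(index, "etten leur zeil"):
    print(term, fields)
# etten {'city': 2, 'station': 1}
# leur {'city': 2, 'station': 1}
# zeil {'street': 2}
```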

Query: etten leur zeil
Candidate fields per term: city? station? bus stop? street?
Interpretation: city:etten-leur street:zeil
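Turning those per-field hits into an interpretation such as `city:etten-leur street:zeil` requires grouping adjacent terms into phrases. The sketch below shows one possible way to do this; it is an assumption, not taken from the talk. It enumerates every split of the query into consecutive phrases. Each phrase is assigned a field in which it occurs as a complete value, and the final phrase may be an unfinished prefix. No field is used twice, and interpretations with fewer phrases rank first. `VALUES` is a hypothetical, pre-analyzed set of field values.

```python
import re
import unicodedata


def analyze(text):
    """Lowercase, strip diacritics and split on anything that is not a letter or digit."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return [t for t in re.split(r"[^0-9a-z]+", stripped.lower()) if t]


def phrase_matches(phrase, value, is_last):
    """Exact token match, except that the final query token may be an unfinished prefix."""
    if not is_last:
        return phrase == value
    if len(phrase) > len(value):
        return False
    n = len(phrase)
    return value[:n - 1] == phrase[:-1] and value[n - 1].startswith(phrase[-1])


def interpretations(values, query):
    """Split the query into consecutive phrases, each assigned to a different field."""
    terms = tuple(analyze(query))
    found = []

    def walk(start, parts, used):
        if start == len(terms):
            found.append(parts)
            return
        for end in range(len(terms), start, -1):  # try longer phrases first
            phrase = terms[start:end]
            is_last = end == len(terms)
            for field, field_values in values.items():
                if field in used:
                    continue
                if any(phrase_matches(phrase, v, is_last) for v in field_values):
                    walk(end, parts + [(field, "-".join(phrase))], used | {field})

    if terms:
        walk(0, [], frozenset())
    found.sort(key=len)  # hypothetical ranking: fewer phrases first (stable sort)
    return [" ".join(f"{field}:{text}" for field, text in parts) for parts in found]


# Hypothetical field values, already analyzed into token tuples.
VALUES = {
    "city": {("etten", "leur"), ("amsterdam",)},
    "station": {("etten", "leur")},
    "street": {("zeil",), ("zeilstraat",)},
}

print(interpretations(VALUES, "etten leur zeil"))
# ['city:etten-leur street:zeil', 'station:etten-leur street:zeil']
```

Enumerating all splits is exponential in the number of terms. That is acceptable for the handful of words typed into a location box; a real ranking would also use the counts and weights. Because phrases are matched independently of their position, `zeil etten` yields `street:zeil city:etten`, which addresses the "no single order" problem.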

RESULTS
• Implemented in Scala
• Lucene RequestHandler in Solr
• Ajax front-end

TUNING
• Iterative Tuning
  • Using real user inputs from production log files
• Regression Testing to track index/algorithm changes over time
  • For how many test queries is the expected result
    • The top result?
    • In the top 5?
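The regression metrics can be computed with a small harness. `regression_report`, the test cases and the stub suggester are hypothetical. In practice, the cases would be queries taken from production logs, each paired with the result a user should get, and `suggest` would be the autocompleter under test. The report gives the share of cases where the expected result is ranked first and the share where it appears in the top 5. It also lists the queries that missed.

```python
def regression_report(cases, suggest, k=5):
    """Share of test queries whose expected result is ranked first / within the top k."""
    if not cases:
        raise ValueError("need at least one test case")
    top1 = topk = 0
    misses = []
    for query, expected in cases:
        results = list(suggest(query))
        if results[:1] == [expected]:
            top1 += 1
        if expected in results[:k]:
            topk += 1
        else:
            misses.append(query)
    n = len(cases)
    return {"top1": top1 / n, f"top{k}": topk / n, "misses": misses}


# Hypothetical test cases: logged query -> expected interpretation.
CASES = [
    ("etten leur zeil", "city:etten-leur street:zeil"),
    ("zeil", "street:zeil"),
    ("foo", "city:foo"),
]


def stub_suggest(query):
    """Stand-in for the real autocompleter, returning fixed rankings."""
    return {
        "etten leur zeil": ["city:etten-leur street:zeil"],
        "zeil": ["street:zeilstraat", "street:zeil"],
    }.get(query, [])


print(regression_report(CASES, stub_suggest))
# {'top1': 0.3333333333333333, 'top5': 0.6666666666666666, 'misses': ['foo']}
```

Storing one report per index or algorithm version makes it easy to see when a change improves one class of queries at the expense of another.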

CONCLUSIONS
• Very positive feedback
• Iterative tuning based on actual user input from log files
  • Regression testing
• Lucene is fast
  • Entire type-ahead still within 40ms
• But: partner currently evaluating naive-only approach
  • Sometimes good enough is good enough
• Field Inspection will allow high-quality selection
  • With fallback to naive syntactic search
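The proposed combination of field inspection with a naive fallback could look roughly like this. Both strategies are passed in as plain functions, and the stand-ins here are hypothetical. Semantic results come first, and the list is topped up with naive matches that are not already present. `timed_ms` is a small helper for checking a latency budget such as the 40ms mentioned above. It measures in-process wall-clock time with `time.perf_counter`.

```python
import time


def suggest_with_fallback(query, semantic, naive, limit=5):
    """Field-inspection results first, topped up with naive syntactic matches (no duplicates)."""
    results = list(semantic(query))[:limit]
    for candidate in naive(query):
        if len(results) >= limit:
            break
        if candidate not in results:
            results.append(candidate)
    return results


def timed_ms(fn, *args):
    """Call fn(*args) and return (result, elapsed wall-clock milliseconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, (time.perf_counter() - start) * 1000.0


def semantic(query):
    """Hypothetical field-inspection strategy that only understands one query."""
    return ["Zeil, Etten-Leur"] if query == "etten leur zeil" else []


def naive(query):
    """Hypothetical naive strategy returning the same syntactic matches for any query."""
    return ["Zeil, Etten-Leur", "Etten-Leur station", "Zeilstraat, Amsterdam"]


results, elapsed = timed_ms(suggest_with_fallback, "etten leur zeil", semantic, naive, 2)
print(results)  # ['Zeil, Etten-Leur', 'Etten-Leur station']
print(f"{elapsed:.3f} ms")
```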

THANK YOU
@anneveling
