Lucene revolution eu 2013 dublin writeup


This presentation is loosely based on my two-day writeups of the Lucene Revolution 2013 conference held in Dublin: http://dmitrykan.blogspot.fi/2013/11/lucene-revolution-eu-2013-in-dublin-day.html http://dmitrykan.blogspot.fi/2013/11/lucene-revolution-eu-2013-in-dublin-day_13.html


Lucene
Revolution EU
2013
Dublin
Dmitry Kan

Day 1 (Wednesday): conference
1. Keynote by Michael Busch of Twitter
2. Integrating Solr and Storm by Timothy Potter
3. Lucene at LinkedIn by LinkedIn engineers
4. Additions to the Lucene arsenal by Adrien Grand and Shai Erera

Day 1: conference
5. Shrinking the Haystack with SOLR and OpenNLP
6. Parboiled for generating query parsers: SWAN = SAME, WITHIN, ADJ, NEAR
7. Stump the Chump! ->

Stump the Chump
We use filter queries a lot. Some of these are long boolean queries. Of those, some are static,
i.e. they do not change every day, only occasionally. An example would be:
fq=Country:(Angora OR Russia OR US) // relatively small set of potentially groupable entries
(i.e. group labels can be created to shorten a query).
The others are very dynamic, changing practically every day. An example would be:
fq=UserId:(userid1 OR userid9 OR...) // very long boolean query, with thousands of
ungroupable entries
If we don't cache the dynamic filter queries, we save cache space for useful filter queries but
slow down execution. If we do cache the dynamic filter queries, we risk flushing the cache quickly.
Is there a smart way of handling such a situation?
Regards,
Dmitry Kan
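
One option that Solr itself provides for this kind of trade-off is the `cache=false` local parameter on a filter query, which keeps that particular filter out of the filter cache while other filters are still cached. The sketch below is not the answer given at the session; it only shows how the two example filters from the question could be sent with different caching behaviour. The helper name `build_filter_params` and the `q=*:*` main query are assumptions made for illustration.

```python
from urllib.parse import urlencode


def build_filter_params(countries, user_ids):
    """Build Solr request parameters with one cached and one uncached filter."""
    # Static filter: changes rarely, so it is left to the filter cache (default).
    static_fq = "Country:(" + " OR ".join(countries) + ")"
    # Dynamic filter: changes daily, so ask Solr not to cache it.
    dynamic_fq = "{!cache=false}UserId:(" + " OR ".join(user_ids) + ")"
    # A list of tuples lets the "fq" parameter repeat.
    return [("q", "*:*"), ("fq", static_fq), ("fq", dynamic_fq)]


params = build_filter_params(["Angora", "Russia", "US"], ["userid1", "userid9"])
query_string = urlencode(params)
```

Whether skipping the cache is a net win depends on how expensive the long boolean filter is to evaluate on each request, so this would need measuring on real traffic.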

STUMPED!

Day 2 (Thursday)
1. Discussion panel with LucidWorks CEO
2. “Lucene Search Essentials: Scorers, Collectors and Custom Queries” by Mikhail Khludnev
3. Text classification and Apache Mahout by Isabel Drost

Day 2
4. Turning search upside down by Alan Woodward and Charlie Hull
TOPIC_TAXONOMY
5. What is in a Lucene index by Adrien Grand

Lucene @ Twitter
1. Just Lucene, no SOLR
2. Index in RAM (2 weeks)
3. Postings lists are sorted by time, so that the index reader reads from the end and gets fresh data
4. No commits => no index reopening!
5. Keep promising code -> Apache
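
The idea in point 3 can be illustrated with a toy in-memory index. This is only a sketch of the concept, not Twitter's implementation: documents are appended in arrival order, so each postings list is already sorted by time and the freshest matches sit at the end. The class name `TimeOrderedIndex` and its methods are hypothetical.

```python
from collections import defaultdict
from itertools import islice


class TimeOrderedIndex:
    """Toy append-only index whose postings lists are ordered by arrival time."""

    def __init__(self):
        self.postings = defaultdict(list)  # term -> doc ids in arrival order
        self.next_doc_id = 0

    def add(self, text):
        doc_id = self.next_doc_id
        self.next_doc_id += 1
        # Each distinct term gets the new doc id appended at the end of its list.
        for term in set(text.lower().split()):
            self.postings[term].append(doc_id)
        return doc_id

    def newest(self, term, limit):
        # Walk the postings list backwards and stop after `limit` hits.
        plist = self.postings.get(term, [])
        return list(islice(reversed(plist), limit))


idx = TimeOrderedIndex()
idx.add("lucene at twitter")
idx.add("solr and storm")
idx.add("lucene index")
```

Here `idx.newest("lucene", 1)` only touches the tail of the postings list, which is the point of keeping the list in time order.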

Lucene search essentials
T[0] = "it is what it is"
T[1] = "what is it"
T[2] = "it is a banana"

Lucene search essentials
"a":      {2}
"banana": {2}
"is":     {0, 1, 2}
"it":     {0, 1, 2}
"what":   {0, 1}

T[0] = "it is what it is"
T[1] = "what is it"
T[2] = "it is a banana"
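
The term-to-documents mapping on this slide is an inverted index. A minimal sketch of how it can be built from the three example texts follows; it assumes plain whitespace tokenization, and the function names `build_inverted_index` and `search_and` are made up for illustration rather than taken from Lucene.

```python
from collections import defaultdict

docs = [
    "it is what it is",   # T[0]
    "what is it",         # T[1]
    "it is a banana",     # T[2]
]


def build_inverted_index(texts):
    """Map each term to the sorted list of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(texts):
        for term in text.split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in sorted(index.items())}


def search_and(index, *terms):
    """Return ids of documents containing all given terms (boolean AND)."""
    postings = [set(index.get(term, [])) for term in terms]
    return sorted(set.intersection(*postings)) if postings else []


index = build_inverted_index(docs)
```

A conjunctive query then reduces to intersecting postings lists, for example `search_and(index, "what", "it")`.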

Lucene search essentials
http://www.lib.rochester.edu/index.cfm?PAGE=489

Give analysts tools and they will produce actionable
data => don’t try to outsmart people too much



2. Day 0 (Tuesday): exploring Dublin


15. Shrinking a haystack by ISS


17. What is in Lucene index?
