Apache Solr - search for everyone!

Jaran Flaath

Talk presented at the Baksia meetup in Oslo on November 23rd, 2011.

Apache Solr
- search for everyone!

http://www.flickr.com/photos/malikdhadha/

• Co-founder and R&D Director at Integrasco

• Founder and developer of Notpod

• Leader of javaBin Sørlandet

• Programmer and Open Source enthusiast

Jaran Nilsen
twitter.com/jarannilsen

What is search?
http://www.flickr.com/photos/denverjeffrey/5133538450/

http://www.flickr.com/photos/somegeekintn/3709203268/

This is Apache Solr
• Open Source enterprise search server from
Apache

• Built on Apache Lucene

• Offers additional features to those of Lucene

• Started out as an in-house CNET project for adding
search functionality to the CNET website in 2004

• Donated to the Apache Software Foundation in 2006

• Graduated from incubation status in 2007

• Since version 3.1 (March 2011), Solr and Lucene
share the same codebase

• This means features and fixes are shared between the
projects at a much higher rate

wget http://apache.uib.no/lucene/solr/3.6.1/apache-solr-3.6.1.tgz

tar xvf apache-solr-3.6.1.tgz

cd apache-solr-3.6.1/example/

java -jar start.jar

4 small steps...

cd exampledocs/

./post.sh ipod_other.xml
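
The example documents are plain XML files in Solr's update format, which post.sh sends to the update handler of the running instance. As a rough sketch of what such a file contains, the Python below builds an `<add>` message. The document and its field names are made up for illustration, and the fields would have to exist in your own schema.

```python
import xml.etree.ElementTree as ET


def build_add_message(docs):
    """Build a Solr XML <add> message from a list of field dictionaries."""
    add = ET.Element("add")
    for fields in docs:
        doc = ET.SubElement(add, "doc")
        for name, value in fields.items():
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


# Hypothetical document; the field names are illustrative only.
message = build_add_message([{"id": "demo-1", "name": "Demo cable", "price": 19.95}])
print(message)
```

Saving the printed message to a file and passing it to post.sh should index it in the same way as the bundled example documents, assuming the fields are defined in schema.xml.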

The obvious part – full text searching

http://www.flickr.com/photos/49889874@N05/6877840735/

• q=yourquery

• Example:
q=android AND ios&rows=100
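
The query on the slide contains spaces, so it has to be URL-encoded before it is sent. A minimal sketch in Python, assuming the example server from the four steps above is running on its default address (the base URL is an assumption, adjust it to your setup):

```python
from urllib.parse import urlencode

# Assumed base URL: the example server started with start.jar on its default port.
SOLR_SELECT = "http://localhost:8983/solr/select"


def search_url(query, rows=10):
    """Return a select URL with the query string safely URL-encoded."""
    params = {"q": query, "rows": rows}
    return SOLR_SELECT + "?" + urlencode(params)


url = search_url("android AND ios", rows=100)
print(url)
```

The resulting URL can be opened in a browser or fetched with any HTTP client.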

The Schema
http://www.flickr.com/photos/14804582@N08/2111269218/

Key elements of schema.xml

• Unique identifier

• Default search field

• Types

• Fields and dynamic fields

• Copy fields
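
To make the elements above concrete, here is a small sketch that holds a minimal schema.xml fragment in a Python string and lists its key elements. The fragment is illustrative only: the schema name, the field names and the choice of types are assumptions, not taken from the talk.

```python
import xml.etree.ElementTree as ET

# Illustrative schema.xml fragment; all names here are examples only.
SCHEMA = """
<schema name="demo" version="1.5">
  <types>
    <fieldType name="string" class="solr.StrField"/>
    <fieldType name="text_general" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
  </types>
  <fields>
    <field name="id" type="string" indexed="true" stored="true" required="true"/>
    <field name="title" type="text_general" indexed="true" stored="true"/>
    <field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>
    <dynamicField name="*_s" type="string" indexed="true" stored="true"/>
  </fields>
  <uniqueKey>id</uniqueKey>
  <defaultSearchField>text</defaultSearchField>
  <copyField source="title" dest="text"/>
</schema>
"""

schema = ET.fromstring(SCHEMA)
summary = {
    "unique_key": schema.findtext("uniqueKey"),
    "default_search_field": schema.findtext("defaultSearchField"),
    "types": [t.get("name") for t in schema.iter("fieldType")],
    "fields": [f.get("name") for f in schema.iter("field")],
    "dynamic_fields": [d.get("name") for d in schema.iter("dynamicField")],
    "copy_fields": [(c.get("source"), c.get("dest")) for c in schema.iter("copyField")],
}
for key, value in summary.items():
    print(key, "=", value)
```

In this sketch the copy field feeds `title` into the catch-all `text` field, which is also the default search field, so a query without a field prefix searches the title text.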

Solr configuration
http://www.flickr.com/photos/esetianto/4099842490/

Key elements of solrconfig.xml

• Settings for your search index

• Warm-up routines

• Cache settings

• Replication

• Update chain

Just add this to your URL:

• facet=true&facet.field=field

• Example:
facet=true&facet.field=language
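
A sketch of how the facet parameters fit into a request, and how the counts might be read back. With the JSON response writer, field facets come back by default as one flat list alternating between value and count. The sample response below is invented for illustration, and `q=android` and `wt=json` are assumptions added to make the request complete.

```python
from urllib.parse import urlencode

# Facet parameters from the slide, added to an ordinary query.
params = {"q": "android", "facet": "true", "facet.field": "language", "wt": "json"}
query_string = urlencode(params)
print(query_string)

# Made-up excerpt of a JSON response; the values and counts are invented.
sample_response = {
    "facet_counts": {"facet_fields": {"language": ["en", 120, "no", 45, "de", 7]}}
}


def facet_pairs(response, field):
    """Turn Solr's flat [value, count, value, count, ...] list into pairs."""
    flat = response["facet_counts"]["facet_fields"][field]
    return list(zip(flat[0::2], flat[1::2]))


print(facet_pairs(sample_response, "language"))
```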

Facet queries

&facet=true
&facet.query=price:[* TO 100]
&facet.query=price:[100 TO 200]
&facet.query=price:[200 TO 300]
&facet.query=price:[300 TO 400]
&facet.query=price:[400 TO 500]
&facet.query=price:[500 TO *]
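
Writing the ranges by hand gets tedious, so they can be generated. The sketch below rebuilds the six facet queries from a list of boundaries; the helper name and the match-all query `*:*` are assumptions for illustration. Square brackets are inclusive at both ends, so a price of exactly 100 is counted in both of the first two buckets.

```python
from urllib.parse import urlencode


def price_facet_queries(boundaries, field="price"):
    """Build one facet.query per range, open-ended at both extremes."""
    edges = ["*"] + [str(b) for b in boundaries] + ["*"]
    return [f"{field}:[{low} TO {high}]" for low, high in zip(edges, edges[1:])]


queries = price_facet_queries([100, 200, 300, 400, 500])
# A list of pairs lets the same parameter name repeat in the query string.
params = [("q", "*:*"), ("facet", "true")] + [("facet.query", q) for q in queries]
query_string = urlencode(params)
print(query_string)
```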

Now you want to drill down!
http://www.flickr.com/photos/kk/4712925031/

Just add this to your URL:

• fq=field:value

• Example:
fq=source:facebook.com
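
Filter queries can be repeated, and each one narrows the result set further. A small sketch of combining a main query with several filters: the `source` field is from the slide, while the `language` field and the helper name are assumptions.

```python
from urllib.parse import urlencode


def filtered_query(query, filters):
    """Combine a main query with any number of fq filters."""
    params = [("q", query)] + [("fq", f"{field}:{value}") for field, value in filters]
    return urlencode(params)


# 'source' comes from the slide; 'language' is an assumed extra field.
query_string = filtered_query("android", [("source", "facebook.com"), ("language", "en")])
print(query_string)
```

Values containing spaces or query syntax characters would also need quoting or escaping inside the filter itself, which this sketch leaves out.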

TermVectorComponent

Term vector information aggregator

Scalability
http://www.flickr.com/photos/dickyfeng/3249837481/

Index sharding strategy

[Diagram: a row of Solr instances (instance 1, instance 2, instance 3, instance 4, ... instance N); a search for «ipod OR iphone» is sent to the instances]

Just add this to your URL:

• shards=shard1,shard2

• Example:
q=android&shards=solr1.node.com/solr,solr2.node.com/solr,solr3.node.com/solr
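
The shards parameter is just a comma-separated list of shard addresses, so it can be assembled from a list. A minimal sketch using the host names from the example; the helper name is hypothetical.

```python
from urllib.parse import urlencode

# Host names taken from the slide's example.
SHARDS = ["solr1.node.com/solr", "solr2.node.com/solr", "solr3.node.com/solr"]


def distributed_query(query, shards):
    """Build the query string for a search fanned out over several shards."""
    return urlencode({"q": query, "shards": ",".join(shards)})


query_string = distributed_query("android", SHARDS)
print(query_string)
```

The request can be sent to any one of the instances, which then queries the listed shards and merges the results.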

Replication

[Diagram: the Indexer feeds the Master; a search for «android» is served by the Slave]

Integration of Solr

http://www.flickr.com/photos/certified_su/229016531/

Solr has client support for many different
programming languages
• Ruby
• PHP
• Java
• Scala
• Python
• .NET
• Perl
• JavaScript

Tips & Gotchas
Or: how to avoid the sinkholes!
http://www.flickr.com/photos/67165210@N00/4661419386/

«Figure out what kind of
searches you will be
doing»

«Spend a significant
amount of time
designing schema.xml»

«Add dynamic fields for
ALL your field types»

«Do not use Solr as your
primary data store!»

http://www.flickr.com/photos/11304375@N07/2046228644

http://www.flickr.com/photos/davidw/2201099990/

Thank you!

http://www.jeremiahblatz.com/personal/pics/Australia_Travel_Pictures_2009/day12/164_Sunrise_Great_Barrier_Reef.html

Apache Solr - search for everyone!

1. Apache Solr - search for everyone! http://www.flickr.com/photos/malikdhadha/

2. • Co-founder and R&D Director at Integrasco • Founder and developer of Notpod • Leader of javaBin Sørlandet • Programmer and Open Source enthusiast Jaran Nilsen twitter.com/jarannilsen

3. A global leader in social intelligence

4. What is search? http://www.flickr.com/photos/denverjeffrey/5133538450/

7. http://www.flickr.com/photos/somegeekintn/3709203268/

8. This is Apache Solr • Open Source enterprise search server from Apache • Built on Apache Lucene • Offers additional features to those of Lucene

9. First, a little history...

10. • Started out as an in-house CNET project for adding search functionality to the CNET website in 2004

11. • Started out as an in-house CNET project for adding search functionality to the CNET website. • Donated to the Apache Software Foundation in 2006

12. • Started out as an in-house CNET project for adding search functionality to the CNET website. • Donated to the Apache Software Foundation in 2006 • Graduated from incubation status in 2007

13. • Since version 3.1 (March 2011), Solr and Lucene share the same codebase.

14. • Since version 3.1 (March 2011), Solr and Lucene share the same codebase. • This means features and fixes are shared between the projects at a much higher rate

16. wget http://apache.uib.no/lucene/solr/3.6.1/apache-solr-3.6.1.tgz tar xvf apache-solr-3.6.1.tgz cd apache-solr-3.6.1/example/ java -jar start.jar 4 small steps...

17. ...and we’re up!

19. cd exampledocs/ ./post.sh ipod_other.xml

21. The obvious part – full text searching http://www.flickr.com/photos/49889874@N05/6877840735/

22. • q=yourquery • Example: q=android AND ios&rows=100

27. Don’t worry - it’s not just XML!

28. The Schema http://www.flickr.com/photos/14804582@N08/2111269218/

29. Key elements of schema.xml • Unique identifier • Default search field • Types • Fields and dynamic fields • Copy fields

33. Solr configuration http://www.flickr.com/photos/esetianto/4099842490/

34. Key elements of solrconfig.xml • Settings for your search index • Warm-up routines • Cache settings • Replication • Update chain

35. Features http://xkcd.com/619/

36. Facets

37. Facets

38. Facets

39. Just add this to your URL: • facet=true&facet.field=field • Example: facet=true&facet.field=language

41. Facet queries

42. Facet queries &facet=true &facet.query=price:[* TO 100] &facet.query=price:[100 TO 200] &facet.query=price:[200 TO 300] &facet.query=price:[300 TO 400] &facet.query=price:[400 TO 500] &facet.query=price:[500 TO *]

43. Now you want to drill down! http://www.flickr.com/photos/kk/4712925031/

44. Filter queries

45. Filter queries

46. Filter queries

47. Just add this to your URL: • fq=field:value • Example: fq=source:facebook.com

48. Produce «word clouds»

49. •TermsComponent •TermVectorComponent

50. TermVectorComponent Term vector information aggregator

52. Scalability http://www.flickr.com/photos/dickyfeng/3249837481/

53. •Sharding •Replication

54. Index sharding strategy Solr Solr Solr Solr Solr instance 1 instance 2 instance 3 instance 4 instance N

55. Index sharding strategy Solr Solr Solr Solr Solr instance 1 instance 2 instance 3 instance 4 instance N ipod OR iphone Search

56. Index sharding strategy Solr Solr Solr Solr Solr instance 1 instance 2 instance 3 instance 4 instance N ipod OR iphone Search

57. Just add this to your URL: • shards=shard1,shard2 • Example: q=android&shards=solr1.node.com/solr,solr2.node.com/solr,solr3.node.com/solr

58. Replication Indexer Master Slave android Search

59. Replication configuration

60. Integration of Solr http://www.flickr.com/photos/certified_su/229016531/

61. Solr has support for many different languages • Ruby • PHP • Java • Scala • Python • .NET • Perl • JavaScript

63. Tips & Gotchas Or: how to avoid the sinkholes! http://www.flickr.com/photos/67165210@N00/4661419386/

64. «What data do your clients need?»

65. «Figure out what kind of searches you will be doing»

66. «Spend a significant amount of time designing schema.xml»

67. «Add dynamic fields for ALL your field types»

68. «Do not use Solr as your primary data store!»

69. «The 20 million mark»

70. But most importantly... Don’t panic!

71. http://www.flickr.com/photos/11304375@N07/2046228644

72. http://www.flickr.com/photos/davidw/2201099990/

73. Thank you! http://www.jeremiahblatz.com/personal/pics/Australia_Travel_Pictures_2009/day12/164_Sunrise_Great_Barrier_Reef.html

Editor's Notes

Welcome... The purpose of this talk is to show you how easy it is to get started, and to give you an idea of some of the cool things you can do with Solr without too much effort. I love quick, easy-to-understand introductions to tools that help me get started quickly. Hopefully that’s what I’ll be able to provide you guys with today. Solr is a big project and it would be silly to attempt to cover everything in one evening, so I am going to focus on the features that I believe are the easiest to get started with, and which I am familiar with and have had good experience with. So let’s get cracking...
Most of you already know me, but I see there are some new faces...
Since 2004, Integrasco has been providing social media methodologies and technologies to corporations, agencies, government regulators and other institutions. Dedicated team of vertical experts. Technology platform for analysis of social media, where Apache Solr is a vital component.
But enough about me and Integrasco, I bet that’s not why most of you came to this meetup. Let’s talk search. What is search?
I bet many of you get this image in your head when you think about search. For most people today, the term search is equal to the name Google. Because of this, «to google» has even entered our dictionaries as an officially accepted verb. Search equals input box and search button! In many cases this is true. But as Google has become an expert at, and as we will hopefully discover Solr can help us with as well, it’s not about searching... ->
It’s all about finding: helping your users find the information they are seeking, not having them search for it! You may think I am just playing with words here, but in my opinion there is a big difference between «searching» and «finding». Let’s not fall into a discussion about semantics here, but let’s just say that our job as engineers is to help people find the information they are looking for, and to spend our time doing that efficiently instead of designing search boxes and buttons!
You don’t want your users to have to spend hours searching on your site (or perhaps you do, if your revenue is driven by advertising, but let’s also put that discussion on the shelf for later). We want our users to find what they’re looking for with little effort, and to give them a good experience. The great thing is that Solr can help you with just that: finding stuff. It has a lot of features that can improve your user experience and bring value to your data without too much effort. And as we will see, it does not have to be linked to search engines at all! That’s exactly what I hope to show you some of today.
Apache Solr is an open source...... But before we get into the juicy details...
... let’s learn a little history.
Common codebase since March 2011. Which means...
... They are now sharing features and fixes at a much higher rate than was the case before. Many people wonder what the difference between Solr and Lucene is. Up until this merge, Solr was often considered to be a layer on top of Lucene, providing functionality that was not available in Lucene itself.
What I often find frustrating when looking at new frameworks and technologies is the amount of time and resources you have to invest in order to try them out. I am not talking about reading documentation to get a deeper understanding, which you eventually have to do, but the time you have to invest just to get started... I love those very quick 3-point «getting started» guides that just work. That’s what I hope to leave behind here today. Getting started with Solr is actually very, very simple. Even though my examples here are taken from a Unix environment, I’ve tried to make them as platform and language independent as possible. I myself work in a Java environment, but Solr can be used just as well from many other environments.
I’ve tried to shave the process of getting Solr up and running down as much as possible, and I came down to these four steps. This is all you need to be up and running with a working instance of Solr. There is obviously a lot of configuration and customization you can do to tailor Solr to your specific needs, but to get started playing, this is all you need to do.
Solr is served by Jetty on port 8983 by default, and opening the Solr admin application in a browser yields this view. It’s by no means eye candy, but then again, that is YOUR part of the JOB: to create a good looking application that helps people find the valuable information Solr can serve you. As you see in the middle here, there’s a search box and a search button. Who would have thought that? Let’s click it!
Voila, there’s not much here yet. Obviously, because we haven’t actually indexed anything yet. So let’s add some data.
Solr comes with a good set of example data that you can easily import and index to play around with and see some of Solr’s capabilities. These example documents come with the downloaded package and can be imported using the bash scripts available in the exampledocs folder. Let’s import a couple of iPod related documents and see what happens!
We refresh our search from before and... Behold! We have search results. Don’t be scared by the XML output here, there are several tools available to work with the response!... So, now that we have some data, it takes us to the obvious part of Solr
Full text searching! This is what Solr was made for, and whenever you have a set of documents that you want to query against, you should consider adding a Solr instance to your system and querying the documents there, rather than querying a database. You will quickly see the value it adds! Now, as with most other things in Solr, the querying is done via a parameter in the URL...
Very simple! Now let’s go back to our admin user interface for another example, just to make sure we don’t scare anyone off with URL parameters in addition to the XML!
If you remember the query from before, it was «asterisk, colon, asterisk». Solr INDEXES are composed of FIELDS, and you can specify what field you want to search by querying «field, colon, value». So when we searched for «asterisk, colon, asterisk» we were searching for all values in all fields, hence giving us all documents in the index. Let’s see what happens when we search for documents where the PRICE field contains the value 19.95.
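To make the «field, colon, value» idea concrete, here is a minimal sketch in Python that only builds the query URL. The base URL assumes the example instance from the four steps running locally with a `/solr/select` handler; adjust it to your own setup.

```python
from urllib.parse import urlencode

# Assumption: the example instance runs locally on the default port.
SOLR_SELECT = "http://localhost:8983/solr/select"


def field_query_url(field, value, rows=10):
    """Build a select URL for the query `field:value`."""
    params = {"q": f"{field}:{value}", "rows": rows}
    # urlencode takes care of escaping the colon and the asterisks.
    return f"{SOLR_SELECT}?{urlencode(params)}"


# All documents, then only documents whose price field is 19.95.
print(field_query_url("*", "*"))
print(field_query_url("price", "19.95"))
```

Pasting either URL into a browser should give the same XML response as the admin search box does.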
... Unsurprisingly we get documents matching the price 19.95. Ok, but what if you want to add your own test data to play around with? Let’s look quickly at the input format that was used.
It looks like this. A very simple XML structure that you can easily modify to fit your needs.
Here is the full iPod example we just imported and tested with.
As I said earlier, do not be scared by the XML if you feel it’s overwhelming. There are several different options both for consuming the response and for indexing data into Solr. And you do not need to specify all your input data in files: there are various means for importing from databases and other commonly used data sources. The example we have looked at so far has been centered around products from a shop. Obviously this is not likely to fit all your needs. As with any other technology for handling data, you need to define a DOMAIN, or a SCHEMA....
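If you would rather generate that input format than write it by hand, a small sketch like this one produces the same `<add><doc><field name="...">` structure. The field names and values below are made up for illustration; they need to exist in your schema.

```python
import xml.etree.ElementTree as ET


def build_add_xml(doc):
    """Turn a dict into Solr's <add><doc>...</doc></add> update XML."""
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, values in doc.items():
        # A multi-valued field is written as one <field> element per value.
        if not isinstance(values, (list, tuple)):
            values = [values]
        for value in values:
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


# Hypothetical document, loosely modelled on the shop examples.
print(build_add_xml({"id": "demo-1", "name": "Demo cable", "cat": ["electronics", "connector"], "price": 19.95}))
```

The resulting string can be saved to a file and posted the same way as the bundled example documents.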
In Solr, the Schema is one of the core configurations you need to master. It’s at the core of your Solr solution and it’s important to spend time designing this schema. Let’s see what it looks like...
Of the key elements, we find....
We define the different types we want to have available in our schema...
... And finally we define all the fields that we want in our schema. One neat thing with Solr is the elements defined towards the bottom here: the DYNAMIC FIELDS. These allow us to specify any field we want on a per-document basis, after the index is up and running. This gives us good flexibility in cases where only some documents contain some special data. An EXAMPLE from one of our projects at INTEGRASCO was when we received a batch of data we needed to make available for analysis together with our social data. This data contained a lot of META data that did not make sense for social data, but we needed to be able to perform queries across everything. In this situation our dynamic fields became very handy, as we could simply define them when indexing the new Frankenstein data, and it fit in nicely with the rest of our social data.
The second important part of Solr.
Solrconfig.xml is a complex XML document, but it is as important to spend time with as schema.xml. We will not go into detail on it today as we don’t have time, but here are some of the key elements that you configure in this file: Settings... how Solr should handle your index files etc. Update chain... you can define your own components to introduce into the indexing process.
Now that we’ve got Solr configured and up and running, let’s look closer at what it can DO beyond simple searching!
The first thing I want to show is FACETS. Facets are category counts for search results. This means you can provide details about which categories your search results fall within. And as we will see, these categories can be a lot of things. Faceting is also often referred to as FACETED NAVIGATION. Facets are a very powerful feature of Solr when it comes to navigating your search results, and as we said in the beginning, it’s not just about searching, it’s about finding. And facets are a good way to provide your users with neat ways to navigate your data.
Here is another good example of how facets CAN be used to enable navigation. This is taken from Finn.no. Now, I do not know whether Finn.no is USING Solr and facets for this information, but it’s a good example of how facets can be used to enable powerful and user friendly navigation.
Yet another example of search facets, or faceted navigation. This time from the webshop Komplett.no, where – after you have done a search – you will get details about CATEGORIES, PRODUCERS and PRICE RANGES.
And this is how you do this with Solr. You simply add a few parameters to your QUERY URL, specifying what field, or fields, you wish to receive facet information for, and you’re good to go! To illustrate the output I’ve done a search in our system using the example parameters here, and this would yield....
... This output. As you see here, we get the top 10 languages for my search. You’re not limited to only getting top 10, you choose how many facet counts you want.
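As a rough sketch of those parameters, the snippet below builds a facet request with the standard `facet`, `facet.field` and `facet.limit` parameters. The `language` field name is an assumption taken from the example output; use whatever field your schema defines.

```python
from urllib.parse import urlencode

# Assumption: a local example instance; replace with your own select URL.
SOLR_SELECT = "http://localhost:8983/solr/select"


def facet_url(q, facet_fields, limit=10):
    """Build a query URL asking for the top `limit` values of each facet field."""
    params = [("q", q), ("rows", 0), ("facet", "true"), ("facet.limit", limit)]
    # facet.field may be repeated, once per field you want counts for.
    params += [("facet.field", field) for field in facet_fields]
    return f"{SOLR_SELECT}?{urlencode(params)}"


print(facet_url("solr", ["language"], limit=10))
```

Setting `rows` to 0 is a choice made here to get only the counts back; drop it if you want the matching documents too.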
Another cool thing is that you are not limited to just getting facets for predefined fields. You can also use FACET QUERIES to provide unique value for your data. Here’s one example from our system, where we use facet queries on date fields to generate statistics for provided queries and display trends over time. The way we do this is to add a facet query for each of the timespans we want in our chart. Then we simply render those facet query counts in a chart. Another very common way to use facet queries is to generate price range information in online stores, like we saw in the screenshot from Komplett.no (which we have an example query string for here ... on the next slide).
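The «one facet query per timespan» approach could be sketched like this for a monthly trend chart. The date field name `published` is hypothetical, and each range ends one second before the next month starts because both ends of a `[a TO b]` range are inclusive.

```python
from datetime import datetime, timedelta
from urllib.parse import urlencode

DATE_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # the date format Solr expects


def monthly_facet_queries(field, year):
    """Return twelve range queries, one per month of `year`."""
    starts = [datetime(year, month, 1) for month in range(1, 13)]
    starts.append(datetime(year + 1, 1, 1))
    queries = []
    for start, next_start in zip(starts, starts[1:]):
        end = next_start - timedelta(seconds=1)
        queries.append(f"{field}:[{start.strftime(DATE_FORMAT)} TO {end.strftime(DATE_FORMAT)}]")
    return queries


def trend_url(base, q, field, year):
    """Build a URL with one facet.query parameter per month."""
    params = [("q", q), ("rows", 0), ("facet", "true")]
    params += [("facet.query", fq) for fq in monthly_facet_queries(field, year)]
    return f"{base}?{urlencode(params)}"


print(trend_url("http://localhost:8983/solr/select", "solr", "published", 2011))
```

Each facet query comes back with its own count, which maps directly onto one bar or point in the chart. Price ranges work the same way, with numbers instead of dates.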
So, once we have our FACETs, we may want to drill further down into our data. This is done via FILTER QUERIES. These are constraints you apply to your query to filter the results you get back from Solr.
Here we are going to filter on our facets as we do it in our solution. We have performed a search, and we wish to drill down into this by only looking at Facebook data....
So we choose the facebook Facet...
... and this adds a filter to our existing query, limiting our search to the MEDIA type FACEBOOK. Very easy. And the way we actually do this, in Solr....
… is by adding “fq” parameters to our query. In this case, we added a filter for Facebook. And you can add as many of these as you want, so you’re not limited to filtering on only one thing at a time! Another important thing about filter queries is that they are a big help for optimizing your queries. The reason for this is that Solr caches the result of each filter separately from the main query, so a filter you use again and again is cheap, and filters limit the documents the main query has to consider without taking part in the relevance scoring.
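Here is a small sketch of stacking several “fq” parameters on one query. The `media` and `language` field names are assumptions for illustration, not necessarily what your schema calls them.

```python
from urllib.parse import urlencode


def filtered_url(base, q, filters):
    """Build a query URL with one fq parameter per (field, value) filter."""
    params = [("q", q)]
    # Separate fq parameters are combined by Solr: a document must match all of them.
    params += [("fq", f"{field}:{value}") for field, value in filters]
    return f"{base}?{urlencode(params)}"


# Drill down: first only Facebook data, then also only English.
print(filtered_url("http://localhost:8983/solr/select", "solr", [("media", "facebook")]))
print(filtered_url("http://localhost:8983/solr/select", "solr", [("media", "facebook"), ("language", "en")]))
```

Keeping each constraint in its own fq, rather than one long combined filter, is a design choice in this sketch that fits the click-a-facet style of navigation.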
The next thing I want to talk about, which in some cases is a bit more tricky to get working but is definitely worth it for your users, is WORD CLOUDS (or tag clouds, term clouds, whatever you like to call them). We like to call them BUZZ CLOUDS because they tell us what’s buzzing around in the social media space.
The way you do this is via SPECIAL SEARCH COMPONENTS in Solr. These are components that you enable in your Solr config which provide access to additional information about the terms and term vectors Solr is using for your index and individual documents. The TERMS COMPONENT is a way to get information from Solr about all the terms available in your index, and how many documents they appear in. So you get the DOCUMENT FREQUENCY for all your terms. This can be used for creating an auto-suggest feature for your search box, although newer versions of Solr contain a special component for doing just that, so it’s recommended to use that instead. However, the reason I mention it here is that it is absolutely possible to use this to create such a word cloud for your index. What we decided was a better solution, although a bit more processing heavy, was to use the TermVectorComponent. This component gives you term vector information for individual documents, rather than the entire index. This means we can get information about the terms which are included in our result set and provide a word cloud for that, rather than for the entire index.
The way we are doing this is by performing a query, then aggregating the term vector information for the documents in our search result. This means a bit more processing, since we have to traverse the documents we’re interested in and aggregate the term frequencies for each document. In effect, we are counting how often the terms appear in our result set. We then use this information to render the terms based on how often they appear....
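The aggregation step could look roughly like the sketch below. It assumes the term vector response has already been parsed into a plain mapping of document id to term frequencies; that shape, the function names and the font size range are all made up for illustration, and this is not our production code.

```python
from collections import Counter


def aggregate_terms(term_vectors, stopwords=frozenset()):
    """Sum term frequencies over all documents in the result set."""
    totals = Counter()
    for terms in term_vectors.values():
        for term, frequency in terms.items():
            if term not in stopwords:
                totals[term] += frequency
    return totals


def cloud_weights(totals, top_n=50, min_size=10, max_size=40):
    """Scale the top terms linearly to font sizes for rendering."""
    top = totals.most_common(top_n)
    if not top:
        return {}
    highest, lowest = top[0][1], top[-1][1]
    span = max(highest - lowest, 1)  # avoid dividing by zero when all counts are equal
    return {term: min_size + (count - lowest) * (max_size - min_size) / span for term, count in top}


# Hypothetical, already parsed term vectors for two documents.
vectors = {"doc1": {"solr": 3, "search": 1}, "doc2": {"solr": 2, "the": 5}}
print(cloud_weights(aggregate_terms(vectors, stopwords={"the"})))
```

The traversal is linear in the number of terms across the result set, which is where the extra processing mentioned above comes from.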
... Which in the end enables us to display clouds like this
Now that we’ve looked at some of the powerful, easy to use features of Solr, how do we scale it? The question is: do we continue to build towards the sky, adding more and more memory and processing power, or do we spread it out across smaller instances and distribute across them? How do we ensure that our indexes keep a decent size and that we can distribute our search? There are two key features which are useful for this, and which are easily available in Solr...
... That’s SHARDING and REPLICATION.
From an INDEXING perspective, sharding is about determining where to index documents. You do this using a SHARDING STRATEGY. This can be anything from an ID BASED strategy, where you place documents in different Solr instances based on a unique ID, to strategies based on GEOGRAPHY or USERNAMES. Or, as we do at Integrasco, you can build a strategy based on DATES. The drawback here is that Solr does not support sharded indexing out of the box, so you need to develop the framework for this yourself. The way we have done this is by creating an index writer for each shard and connecting these index writers to our sharding strategy, so it selects the right one to dispatch documents to.
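To give an idea of what a date-based strategy involves, here is a minimal sketch that picks a shard per document by year and groups documents into per-shard batches. The class name, the document shape and the shard addresses are hypothetical; a real setup would hand each batch to the index writer for that shard.

```python
from datetime import date


class DateShardingStrategy:
    """Pick a shard for a document based on the year of its date."""

    def __init__(self, shards_by_year):
        self.shards_by_year = dict(shards_by_year)

    def shard_for(self, doc_date):
        try:
            return self.shards_by_year[doc_date.year]
        except KeyError:
            raise LookupError(f"no shard configured for year {doc_date.year}")


def group_by_shard(strategy, docs):
    """Group documents into one batch per shard, ready for that shard's writer."""
    batches = {}
    for doc in docs:
        batches.setdefault(strategy.shard_for(doc["date"]), []).append(doc)
    return batches


# Hypothetical shard addresses, one Solr instance per year.
strategy = DateShardingStrategy({
    2010: "solr-2010.example.com:8983/solr",
    2011: "solr-2011.example.com:8983/solr",
})
docs = [{"id": "a", "date": date(2010, 5, 1)}, {"id": "b", "date": date(2011, 11, 23)}]
print(group_by_shard(strategy, docs))
```

An ID, geography or username based strategy would only change `shard_for`; the dispatching around it stays the same.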
From the SEARCH perspective, sharding is very easy. You simply provide your COORDINATING INSTANCE (which can be any of your available instances) with a list of the URLs of the shards you wish to distribute the search across. The coordinating instance will then handle DISTRIBUTION of queries to all the specified shards, and CONSOLIDATE the result before it’s sent back to you.
And you do not have to query all shards, it’s a fairly easy job to get your client to only specify relevant shards when performing the query. For instance in the case of Integrasco, where we have sharded based on date – we really don’t have to query the entire cluster when we know we only want to search data for 2011. Then we can select the shards for 2011 and only specify those as the shards we wish the query sent off to.
And how do we do this? Again, by adding some parameters to our query URL containing the addresses of the shards we want to distribute the query across.
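A sketch of selecting only the relevant shards on the client side could look like this. It uses Solr’s `shards` parameter, a comma-separated list of `host:port/path` entries; the host names and the year-to-shard mapping are made up for illustration.

```python
from urllib.parse import urlencode

# Hypothetical mapping from year to shard address (no http:// prefix in the shards parameter).
SHARDS_BY_YEAR = {
    2009: "solr-2009.example.com:8983/solr",
    2010: "solr-2010.example.com:8983/solr",
    2011: "solr-2011.example.com:8983/solr",
}


def sharded_query_url(coordinator, q, years):
    """Build a query URL that only fans out to the shards for the given years."""
    shards = [SHARDS_BY_YEAR[year] for year in years]
    params = {"q": q, "shards": ",".join(shards)}
    return f"{coordinator}?{urlencode(params)}"


# Only search 2011 data; any instance can act as the coordinator.
print(sharded_query_url("http://solr-2011.example.com:8983/solr/select", "solr", [2011]))
```

Leaving out the 2009 and 2010 shards means those instances never see the query at all.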
When it comes to replication, we build a common MASTER SLAVE relationship, where we have a set of masters where we do all our WRITING and a set of slaves where we do all our SEARCHING. This way we can keep the masters fairly cheap on resources: since they do not need to handle any queries, they do not need to keep caches up to date in memory. Multiple slaves. Repeaters.
And the configuration for this is just a couple of lines of XML in your SOLRCONFIG xml file, as you see examples of here. The only bad thing we have seen with the replication feature in Solr is that it is very resource hungry when it is replicating. It uses as much bandwidth as it can get and writes to disk as fast as it can, and in cases where there are large amounts of data in each replication batch, this can quickly lead to unwanted starvation. So make sure you properly test your replication setup before going into production.
When creating slide decks, I love to go on CreativeCommons.org, search for the main topic of the section and use one of the first pictures as the background. That sometimes leads to interesting slides like this one... Anyway, this section is about integration of Solr: how do you integrate it in your project? And not surprisingly, Solr comes with a lot of client libraries for different languages...
As you can see, Solr has good support for many different languages, and most of these are easy-to-use client libraries offering METHODS FOR ACCESSING all of the features we’ve covered in this talk. At Integrasco we’re mostly using Java, so the SolrJ library is what we’re using, and it provides a very easy to use interface for accessing most of the Solr features we need, more or less right out of the box. I think the only two things we have had to do MORE EXTENSIVE DEVELOPMENT of on top of Solr and SolrJ are SHARDED INDEXING and the WORD CLOUDS using the TermVectorComponent.
A very quick example of the use of SolrJ in Java.
Spend sufficient time analyzing and figuring out what data your clients need and are interested in. This should be at the core of your planning and help shape the design of your schema and configuration.
Similarly, figure out what kind of searches you will be doing. And make sure your schema allows for these queries to happen. Also, make sure you configure the appropriate warm-up queries to ensure your cache is performing optimally for the type of queries you are doing.
Once you have the previous two covered, make sure you spend significant time designing your schema.xml. As I said this is at the core of your Solr solution and can be difficult to change once you have a lot of data indexed.
This is a very good practice to follow. You never know when you might need a new field for a document, so make sure you’re prepared with dynamic fields.
Solr is not for storing data! The documentation even says you should not do it because you should expect the index to become corrupted at one time or another.
As discovered by Twitter (and perhaps others), 20 million documents is the max size you should aim for when it comes to your shard size.
There are a lot of levers and switches in Solr to use for optimization and for shaping your search solution to exactly fit your needs. However, don’t mind this in the beginning. Mix and match from the default schema to get familiar with how it works. Start out very simple and build on it. You’ll learn it very quickly and see that search functionality is not just for the large players!
Looking a little further ahead into the future, there are a lot of exciting things going on with Solr. One of the most exciting developments from my point of view is SolrCloud, which is built on ZooKeeper and will offer a much better setup for large search clusters. Also, there’s work going on with building a Solr distribution based on Hadoop for large scale distributed indexing and search. So it will be very interesting to see where Solr takes us over the next couple of years!
This has been a little taste of what Solr has to offer, and hopefully it has shown you that it’s not just for large enterprises or people with huge amounts of data. Hopefully you have seen that Solr can fit very well into situations where you normally would not think of placing a search server, and that you start thinking of new ways to help your users FIND what they’re looking for. Or even better, help them find what they did not know they needed to look for.

Editor's Notes