SlideShare uma empresa Scribd logo
1 de 29
Baixar para ler offline
Adventures in Discoverability with C* and Solr

Patricia Gorla, Systems Engineer
@patriciagorla
@o19s

#CASSANDRAEU

CASSANDRASUMMITEU
About Me
• Solr
• Cassandra
• Information retrieval

Paul Hostetler - phostetler.com

#CASSANDRAEU

CASSANDRASUMMITEU
How Do I Find What I’m Looking For?
Simple

Complex

Aristotle’s birthplace?

All ancient Greek philosophers?

Coordinates of Stagira?

All cities within 100km of Stagira

#CASSANDRAEU

CASSANDRASUMMITEU
How Do I Find What I’m Looking For?
Simple

Complex

Aristotle’s birthplace?
select birthPlace
where name = “Aristotle”;

All ancient Greek philosophers?

Coordinates of Stagira?
select coord
where name = “Stagira”;

All cities within 100km of Stagira

#CASSANDRAEU

CASSANDRASUMMITEU
How Do I Find What I’m Looking For?
Simple
Aristotle’s birthplace?
select birthPlace
where name = “Aristotle”;

Complex
All ancient Greek philosophers?
create index on tag;
select *
where tag = “Greek philosophy”;

Coordinates of Stagira?
select coord
where name = “Stagira”;

#CASSANDRAEU

All cities within 100km of Stagira
???

CASSANDRASUMMITEU
How Do I Find What I’m Looking For?
Simple
Aristotle’s birthplace?
q=Aristotle&fl=birthPlace

All ancient Greek philosophers?
q=Greek philosophy

Coordinates of Stagira?
q=Stagira&fl=point

All cities within 100km of Stagira
q=*:*&fq={!geofilt pt=40.530,
23.752 sfield=point d=100}

#CASSANDRAEU

CASSANDRASUMMITEU
Approaches to Search
• Google Site Search
• MySQL ‘like’ statements

Seth Casteel - littlefriendsphoto.com

#CASSANDRAEU

CASSANDRASUMMITEU
Approaches to Search

• Full-text search
• Ranking (Scoring)
• Tokenization
• Stemming
• Faceting

#CASSANDRAEU

CASSANDRASUMMITEU
Approaches to Search

• Full-text search
• Ranking (Scoring)
• Tokenization
• Stemming
• Faceting

#CASSANDRAEU

and many more!

CASSANDRASUMMITEU
Inverted Index
[1] Pleasure in the job puts perfection in the work.
[2] Education is the best provision for the journey to old age.
[3] If some animals are good at hunting and others are suitable
for hunting, then the gods must clearly smile on hunting.
[4] It is the mark of an educated mind to be able to entertain a
thought without absorbing it.

Term

Freq

Documents

education

[2] [4]

hunting

3

[3]

perfection
#CASSANDRAEU

2
1

[1]
CASSANDRASUMMITEU
Defining Field Types
<fieldType name="text_general" class="solr.TextField" >
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true />
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EnglishPossessiveFilterFactory"/>
<filter class="solr.PorterStemFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"/>
<filter class="solr.StopFilterFactory” ignoreCase="true"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.PorterStemFilterFactory"/>
</analyzer>
</fieldType>
#CASSANDRAEU

CASSANDRASUMMITEU
Index-side Analysis

Punctuation
Stop Words
Lowercase
Stemming

#CASSANDRAEU

Hope is a waking dream.
Hope is a waking dream
Hope
waking dream
hope
waking dream
hope
wake dream

CASSANDRASUMMITEU
Query-side Analysis

Punctuation
Stop Words
Lowercase
Stemming
Synonyms
#CASSANDRAEU

Hope is a waking dream.
Hope is a waking dream
Hope
waking dream
hope
waking dream
hope
wake dream
hope
wake dream
desire awake wish
CASSANDRASUMMITEU
Faceting
facet_fields: {
tags: [hunting: 1, education: 2, work: 2],
locations: [Stagira: 5, Chalcis: 3]
}

#CASSANDRAEU

CASSANDRASUMMITEU
Approaches to Distribution
• Pay Google more $$
• MySQL ‘like’ shards
• Master/Slave replication
• SolrCloud

#CASSANDRAEU

CASSANDRASUMMITEU
“Distributed search is hard.”

#CASSANDRAEU

CASSANDRASUMMITEU
Solr + Cassandra: Datastax Enterprise
• Full-text search
• Tokenization
• Stemming
• Date ranges
• Aggregation
• High Availability
• Distributed Nature

#CASSANDRAEU

CASSANDRASUMMITEU
Examining DBpedia.org Datasets
Person
<http://xmlns.com/foaf/0.1/name> "Aristotle"@en .
<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> Person .
<http://purl.org/dc/elements/1.1/description> "Greek philosopher"@en .
<http://dbpedia.org/ontology/birthPlace> Stagira
<http://dbpedia.org/ontology/deathPlace> Chalcis .

Place
<http://dbpedia.org/resource/Stagira> <http://www.opengis.net/gml/_Feature> .
<http://dbpedia.org/resource/Stagira#lat> "40.5916667" .
<http://dbpedia.org/resource/Stagira#long> "23.7947222" .
<http://dbpedia.org/resource/Stagira#point> "40.591667 23.7947222"@en .

#CASSANDRAEU

CASSANDRASUMMITEU
“Love is a single soul
inhabiting two bodies.”
#CASSANDRAEU

CASSANDRASUMMITEU
Querying for data
curl http://localhost:8983/solr/solr.person/q=Aristotle

#CASSANDRAEU

CASSANDRASUMMITEU
Filtering by location
curl http://localhost:8983/solr/solr.location/select?q=*:*
&spatial=true&fq={!geofilt pt=40.53027,23.7525 sfield=point
d=100}

#CASSANDRAEU

CASSANDRASUMMITEU
“There is no great genius without a
mixture of madness.”
#CASSANDRAEU

CASSANDRASUMMITEU
Schema.xml
Unified Schema
<fields>
<field name="id" type="string" />
<field name="name" type="text" />
<dynamicField name="*Date" type="date" />
<dynamicField name="*Place" type="text" />
<dynamicField name="*Point" type="location" />
<dynamicField name="*_tag"

type="text" />

</fields>

#CASSANDRAEU

CASSANDRASUMMITEU
Upload to DSE, Create Core
curl http://localhost:8983/solr/resource/solr.location/schema.xml 
--data-binary @solr/location_schema.xml 
-H 'Content-type:text/xml; charset=utf-8 '

curl http://localhost:8983/solr/resource/solr.location/solrconfig.xml 
--data-binary @solr/location_solrconfig.xml 
-H 'Content-type:text/xml; charset=utf-8 '
curl http://localhost:8983/solr/admin/cores?action=CREATE&name=solr.location

#CASSANDRAEU

CASSANDRASUMMITEU
Cassandra Schema
Location Schema

gc_grace_seconds=864000 AND

cqlsh:solr> DESC COLUMNFAMILY location;

read_repair_chance=0.100000 AND

CREATE TABLE location (

replicate_on_write='true' AND

id text PRIMARY KEY,
"_docBoost" text,
"_dynFld" text,
location text,

populate_io_cache_on_flush='false' AND
compaction={'class':
'SizeTieredCompactionStrategy'} AND
compression={'sstable_compression':
'SnappyCompressor'};

name text,
solr_query text,
tags text
) WITH COMPACT STORAGE AND
bloom_filter_fp_chance=0.010000 AND
caching='KEYS_ONLY' AND
comment='' AND

CREATE INDEX
solr_location__docBoost_index ON location
("_docBoost");
...
CREATE INDEX
solr_location_solr_query_index ON
location (solr_query);

dclocal_read_repair_chance=0.000000 AND
#CASSANDRAEU

CASSANDRASUMMITEU
What Changes
Solr

Cassandra

• No multiValued fields

• No composite columns

• No JOIN*

• No counter columns

#CASSANDRAEU

CASSANDRASUMMITEU
Bringing it All Together
•

Fault tolerant, available
search

#CASSANDRAEU

CASSANDRASUMMITEU
“Thank you.”

#CASSANDRAEU

CASSANDRASUMMITEU
THANK YOU

pgorla@opensourceconnections.com
@patriciagorla
@o19s
All information, including slides, are on http://github.com/pgorla/million-books
#CASSANDRAEU

CASSANDRASUMMITEU

Mais conteúdo relacionado

Semelhante a Adventures in Discovery with Cassandra and Solr

A noobs lesson on solr (configuration)
A noobs lesson on solr (configuration)A noobs lesson on solr (configuration)
A noobs lesson on solr (configuration)BTI360
 
Solr vs. Elasticsearch - Case by Case
Solr vs. Elasticsearch - Case by CaseSolr vs. Elasticsearch - Case by Case
Solr vs. Elasticsearch - Case by CaseAlexandre Rafalovitch
 
[제1회 루씬 한글분석기 기술세미나] solr로 나만의 검색엔진을 만들어보자
[제1회 루씬 한글분석기 기술세미나] solr로 나만의 검색엔진을 만들어보자[제1회 루씬 한글분석기 기술세미나] solr로 나만의 검색엔진을 만들어보자
[제1회 루씬 한글분석기 기술세미나] solr로 나만의 검색엔진을 만들어보자Donghyeok Kang
 
6 things about perl 6
6 things about perl 66 things about perl 6
6 things about perl 6brian d foy
 
6 more things about Perl 6
6 more things about Perl 66 more things about Perl 6
6 more things about Perl 6brian d foy
 
Introdução a buscas com solr
Introdução a buscas com solrIntrodução a buscas com solr
Introdução a buscas com solrClaudson Oliveira
 
Dev in Santos - Como NÃO fazer pesquisas usando LIKE
Dev in Santos - Como NÃO fazer pesquisas usando LIKEDev in Santos - Como NÃO fazer pesquisas usando LIKE
Dev in Santos - Como NÃO fazer pesquisas usando LIKEFabio Akita
 
Solr vs. Elasticsearch, Case by Case: Presented by Alexandre Rafalovitch, UN
Solr vs. Elasticsearch,  Case by Case: Presented by Alexandre Rafalovitch, UNSolr vs. Elasticsearch,  Case by Case: Presented by Alexandre Rafalovitch, UN
Solr vs. Elasticsearch, Case by Case: Presented by Alexandre Rafalovitch, UNLucidworks
 

Semelhante a Adventures in Discovery with Cassandra and Solr (10)

A noobs lesson on solr (configuration)
A noobs lesson on solr (configuration)A noobs lesson on solr (configuration)
A noobs lesson on solr (configuration)
 
Solr vs. Elasticsearch - Case by Case
Solr vs. Elasticsearch - Case by CaseSolr vs. Elasticsearch - Case by Case
Solr vs. Elasticsearch - Case by Case
 
[제1회 루씬 한글분석기 기술세미나] solr로 나만의 검색엔진을 만들어보자
[제1회 루씬 한글분석기 기술세미나] solr로 나만의 검색엔진을 만들어보자[제1회 루씬 한글분석기 기술세미나] solr로 나만의 검색엔진을 만들어보자
[제1회 루씬 한글분석기 기술세미나] solr로 나만의 검색엔진을 만들어보자
 
Solr Anti Patterns
Solr Anti PatternsSolr Anti Patterns
Solr Anti Patterns
 
6 things about perl 6
6 things about perl 66 things about perl 6
6 things about perl 6
 
6 more things about Perl 6
6 more things about Perl 66 more things about Perl 6
6 more things about Perl 6
 
Introdução a buscas com solr
Introdução a buscas com solrIntrodução a buscas com solr
Introdução a buscas com solr
 
Dev in Santos - Como NÃO fazer pesquisas usando LIKE
Dev in Santos - Como NÃO fazer pesquisas usando LIKEDev in Santos - Como NÃO fazer pesquisas usando LIKE
Dev in Santos - Como NÃO fazer pesquisas usando LIKE
 
Solr vs. Elasticsearch, Case by Case: Presented by Alexandre Rafalovitch, UN
Solr vs. Elasticsearch,  Case by Case: Presented by Alexandre Rafalovitch, UNSolr vs. Elasticsearch,  Case by Case: Presented by Alexandre Rafalovitch, UN
Solr vs. Elasticsearch, Case by Case: Presented by Alexandre Rafalovitch, UN
 
Fuzzing - A Tale of Two Cultures
Fuzzing - A Tale of Two CulturesFuzzing - A Tale of Two Cultures
Fuzzing - A Tale of Two Cultures
 

Último

Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbuapidays
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfOverkill Security
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 

Último (20)

Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 

Adventures in Discovery with Cassandra and Solr

  • 1. Adventures in Discoverability with C* and Solr Patricia Gorla, Systems Engineer @patriciagorla @o19s #CASSANDRAEU CASSANDRASUMMITEU
  • 2. About Me • Solr • Cassandra • Information retrieval Paul Hostetler - phostetler.com #CASSANDRAEU CASSANDRASUMMITEU
  • 3. How Do I Find What I’m Looking For? Simple Complex Aristotle’s birthplace? All ancient Greek philosophers? Coordinates of Stagira? All cities within 100km of Stagira #CASSANDRAEU CASSANDRASUMMITEU
  • 4. How Do I Find What I’m Looking For? Simple Complex Aristotle’s birthplace? select birthPlace where name = “Aristotle”; All ancient Greek philosophers? Coordinates of Stagira? select coord where name = “Stagira”; All cities within 100km of Stagira #CASSANDRAEU CASSANDRASUMMITEU
  • 5. How Do I Find What I’m Looking For? Simple Aristotle’s birthplace? select birthPlace where name = “Aristotle”; Complex All ancient Greek philosophers? create index on tag; select * where tag = “Greek philosophy”; Coordinates of Stagira? select coord where name = “Stagira”; #CASSANDRAEU All cities within 100km of Stagira ??? CASSANDRASUMMITEU
  • 6. How Do I Find What I’m Looking For? Simple Aristotle’s birthplace? q=Aristotle&fl=birthPlace All ancient Greek philosophers? q=Greek philosophy Coordinates of Stagira? q=Stagira&fl=point All cities within 100km of Stagira q=*:*&fq={!geofilt pt=40.530, 23.752 sfield=point d=100} #CASSANDRAEU CASSANDRASUMMITEU
  • 7. Approaches to Search • Google Site Search • MySQL ‘like’ statements Seth Casteel - littlefriendsphoto.com #CASSANDRAEU CASSANDRASUMMITEU
  • 8. Approaches to Search • Full-text search • Ranking (Scoring) • Tokenization • Stemming • Faceting #CASSANDRAEU CASSANDRASUMMITEU
  • 9. Approaches to Search • Full-text search • Ranking (Scoring) • Tokenization • Stemming • Faceting #CASSANDRAEU and many more! CASSANDRASUMMITEU
  • 10. Inverted Index [1] Pleasure in the job puts perfection in the work. [2] Education is the best provision for the journey to old age. [3] If some animals are good at hunting and others are suitable for hunting, then the gods must clearly smile on hunting. [4] It is the mark of an educated mind to be able to entertain a thought without absorbing it. Term Freq Documents education [2] [4] hunting 3 [3] perfection #CASSANDRAEU 2 1 [1] CASSANDRASUMMITEU
  • 11. Defining Field Types <fieldType name="text_general" class="solr.TextField" > <analyzer type="index"> <tokenizer class="solr.StandardTokenizerFactory"/> <filter class="solr.StopFilterFactory" ignoreCase="true /> <filter class="solr.LowerCaseFilterFactory"/> <filter class="solr.EnglishPossessiveFilterFactory"/> <filter class="solr.PorterStemFilterFactory"/> </analyzer> <analyzer type="query"> <tokenizer class="solr.StandardTokenizerFactory"/> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"/> <filter class="solr.StopFilterFactory” ignoreCase="true"/> <filter class="solr.LowerCaseFilterFactory"/> <filter class="solr.PorterStemFilterFactory"/> </analyzer> </fieldType> #CASSANDRAEU CASSANDRASUMMITEU
  • 12. Index-side Analysis Punctuation Stop Words Lowercase Stemming #CASSANDRAEU Hope is a waking dream. Hope is a waking dream Hope waking dream hope waking dream hope wake dream CASSANDRASUMMITEU
  • 13. Query-side Analysis Punctuation Stop Words Lowercase Stemming Synonyms #CASSANDRAEU Hope is a waking dream. Hope is a waking dream Hope waking dream hope waking dream hope wake dream hope wake dream desire awake wish CASSANDRASUMMITEU
  • 14. Faceting facet_fields: { tags: [hunting: 1, education: 2, work: 2], locations: [Stagira: 5, Chalcis: 3] } #CASSANDRAEU CASSANDRASUMMITEU
  • 15. Approaches to Distribution • Pay Google more $$ • MySQL ‘like’ shards • Master/Slave replication • SolrCloud #CASSANDRAEU CASSANDRASUMMITEU
  • 16. “Distributed search is hard.” #CASSANDRAEU CASSANDRASUMMITEU
  • 17. Solr + Cassandra: Datastax Enterprise • Full-text search • Tokenization • Stemming • Date ranges • Aggregation • High Availability • Distributed Nature #CASSANDRAEU CASSANDRASUMMITEU
  • 18. Examining DBpedia.org Datasets Person <http://xmlns.com/foaf/0.1/name> "Aristotle"@en . <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> Person . <http://purl.org/dc/elements/1.1/description> "Greek philosopher"@en . <http://dbpedia.org/ontology/birthPlace> Stagira <http://dbpedia.org/ontology/deathPlace> Chalcis . Place <http://dbpedia.org/resource/Stagira> <http://www.opengis.net/gml/_Feature> . <http://dbpedia.org/resource/Stagira#lat> "40.5916667" . <http://dbpedia.org/resource/Stagira#long> "23.7947222" . <http://dbpedia.org/resource/Stagira#point> "40.591667 23.7947222"@en . #CASSANDRAEU CASSANDRASUMMITEU
  • 19. “Love is a single soul inhabiting two bodies.” #CASSANDRAEU CASSANDRASUMMITEU
  • 20. Querying for data curl http://localhost:8983/solr/solr.person/q=Aristotle #CASSANDRAEU CASSANDRASUMMITEU
  • 21. Filtering by location curl http://localhost:8983/solr/solr.location/select?q=*:* &spatial=true&fq={!geofilt pt=40.53027,23.7525 sfield=point d=100} #CASSANDRAEU CASSANDRASUMMITEU
  • 22. “There is no great genius without a mixture of madness.” #CASSANDRAEU CASSANDRASUMMITEU
  • 23. Schema.xml Unified Schema <fields> <field name="id" type="string" /> <field name="name" type="text" /> <dynamicField name="*Date" type="date" /> <dynamicField name="*Place" type="text" /> <dynamicField name="*Point" type="location" /> <dynamicField name="*_tag" type="text" /> </fields> #CASSANDRAEU CASSANDRASUMMITEU
  • 24. Upload to DSE, Create Core curl http://localhost:8983/solr/resource/solr.location/schema.xml --data-binary @solr/location_schema.xml -H 'Content-type:text/xml; charset=utf-8 ' curl http://localhost:8983/solr/resource/solr.location/solrconfig.xml --data-binary @solr/location_solrconfig.xml -H 'Content-type:text/xml; charset=utf-8 ' curl http://localhost:8983/solr/admin/cores?action=CREATE&name=solr.location #CASSANDRAEU CASSANDRASUMMITEU
  • 25. Cassandra Schema Location Schema gc_grace_seconds=864000 AND cqlsh:solr> DESC COLUMNFAMILY location; read_repair_chance=0.100000 AND CREATE TABLE location ( replicate_on_write='true' AND id text PRIMARY KEY, "_docBoost" text, "_dynFld" text, location text, populate_io_cache_on_flush='false' AND compaction={'class': 'SizeTieredCompactionStrategy'} AND compression={'sstable_compression': 'SnappyCompressor'}; name text, solr_query text, tags text ) WITH COMPACT STORAGE AND bloom_filter_fp_chance=0.010000 AND caching='KEYS_ONLY' AND comment='' AND CREATE INDEX solr_location__docBoost_index ON location ("_docBoost"); ... CREATE INDEX solr_location_solr_query_index ON location (solr_query); dclocal_read_repair_chance=0.000000 AND #CASSANDRAEU CASSANDRASUMMITEU
  • 26. What Changes Solr Cassandra • No multiValued fields • No composite columns • No JOIN* • No counter columns #CASSANDRAEU CASSANDRASUMMITEU
  • 27. Bringing it All Together • Fault tolerant, available search #CASSANDRAEU CASSANDRASUMMITEU
  • 29. THANK YOU pgorla@opensourceconnections.com @patriciagorla @o19s All information, including slides, are on http://github.com/pgorla/million-books #CASSANDRAEU CASSANDRASUMMITEU