PostgreSQL Full-text Search in Django
Paweł Kowalski
Agenda
● What is full-text search
● How it works in PostgreSQL
○ search
○ ranking
● How to use it in Django
● Questions
WHAT IS FULL TEXT SEARCH
Full text search refers to techniques for searching a single computer-stored document or a collection in a full text database.
https://en.wikipedia.org/wiki/Full_text_search
WHAT IS FULL TEXT SEARCH
SELECT *
FROM my_table
WHERE Col1 LIKE '%query%';
SLOW, EXPENSIVE,
NO ORDERING BY RELEVANCE
● LIKE '%query%' (a pattern with a leading wildcard) can't use a regular B-tree index
● Col1 can be very long (e.g. an entire book)
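Why an index helps can be sketched in plain Python. This is a conceptual toy, not PostgreSQL code. `like_scan` mimics a sequential `LIKE '%query%'` scan that has to read every row. `build_inverted_index` builds the word-to-documents map that inverted indexes, such as PostgreSQL's GIN, are based on. The documents and all function names are hypothetical, and tokenization is simplified to runs of letters.

```python
import re
from collections import defaultdict

# Hypothetical rows: id -> text column.
DOCS = {
    1: "If you can dream it, you can do it",
    2: "It's kind of fun to do the impossible",
    3: "If the facts don't fit the theory, change the facts",
}


def like_scan(docs, needle):
    """Like WHERE col LIKE '%needle%': every row's full text is examined."""
    return sorted(doc_id for doc_id, text in docs.items() if needle in text)


def tokenize(text):
    # Simplification: lowercase words made of ASCII letters only.
    return re.findall(r"[a-z]+", text.lower())


def build_inverted_index(docs):
    """Map each word to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def index_lookup(index, word):
    # One dictionary lookup instead of a pass over every document.
    return sorted(index.get(word.lower(), set()))


INDEX = build_inverted_index(DOCS)
print(like_scan(DOCS, "do"))      # [1, 2, 3]  ("don't" matches the substring too)
print(index_lookup(INDEX, "do"))  # [1, 2]
```

A substring scan for "do" also hits "don't" in document 3. The word index returns only documents 1 and 2, and it answers with a single lookup instead of reading every text.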
HOW IT WORKS IN POSTGRESQL
PostgreSQL, please help!
TSVECTOR
Since PostgreSQL 8.3
SELECT to_tsvector(
'english',
'Try not to become a man of success, but rather try to become a man of value'
);
to_tsvector
----------------------------------------------------------------------
'becom':4,13 'man':6,15 'rather':10 'success':8 'tri':1,11 'valu':17
(1 row)
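The positions in the `to_tsvector` output can be reproduced with a small Python sketch. It is a toy and not PostgreSQL's parser. The stop-word list is a hypothetical five-word subset, and `normalize` only lowercases words, so the sketch produces 'become' and 'try' where PostgreSQL's stemmer gives 'becom' and 'tri'. It does show two things accurately: positions are 1-based, and stop words still use up a position.

```python
import re

# Hypothetical five-word stop list; PostgreSQL's 'english' list is much longer.
STOP_WORDS = {"a", "but", "not", "of", "to"}


def toy_tsvector(text, normalize=str.lower):
    """Return {lexeme: [positions]} in the spirit of to_tsvector.

    Positions are 1-based and every word consumes one, stop words included.
    `normalize` stands in for a stemmer; by default it only lowercases.
    """
    vector = {}
    words = re.findall(r"[A-Za-z]+", text)
    for position, word in enumerate(words, start=1):
        lexeme = normalize(word)
        if lexeme in STOP_WORDS:
            continue  # dropped, but its position number is still used up
        vector.setdefault(lexeme, []).append(position)
    return dict(sorted(vector.items()))


quote = ("Try not to become a man of success, "
         "but rather try to become a man of value")
print(toy_tsvector(quote))
# {'become': [4, 13], 'man': [6, 15], 'rather': [10], 'success': [8], 'try': [1, 11], 'value': [17]}
```

The lexemes are not stemmed, but every position matches the PostgreSQL output.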
HOW IT WORKS IN POSTGRESQL
Search Operator: @@
SELECT to_tsvector('If you can dream it, you can do it') @@ 'dream';
?column?
----------
t
(1 row)
SELECT to_tsvector('It''s kind of fun to do the impossible') @@ 'impossible';
?column?
----------
f
(1 row)
HOW IT WORKS IN POSTGRESQL
TO_TSQUERY function
SELECT 'dream'::tsquery, to_tsquery('dream');
tsquery | to_tsquery
--------------+------------
'dream' | 'dream'
(1 row)
SELECT 'impossible'::tsquery, to_tsquery('impossible');
tsquery | to_tsquery
--------------+------------
'impossible' | 'imposs'
(1 row)
HOW IT WORKS IN POSTGRESQL
TO_TSQUERY function
SELECT to_tsvector('It''s kind of fun to do the impossible') @@ to_tsquery('impossible');
?column?
----------
t
(1 row)
HOW IT WORKS IN POSTGRESQL
Query Operators: ! & |
SELECT to_tsvector('If the facts don''t fit the theory, change the facts') @@
to_tsquery('!fact');
SELECT to_tsvector('If the facts don''t fit the theory, change the facts') @@
to_tsquery('theory & !fact');
SELECT to_tsvector('If the facts don''t fit the theory, change the facts') @@
to_tsquery('fiction | theory');
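How `@@` evaluates the `!`, `&` and `|` operators can be sketched in Python as a small parser plus an evaluator. This is a simplified model. It assumes the document has already been reduced to a set of lexemes, and it ignores stemming, positions and weights. `parse_query` and `matches` are hypothetical names. The parser follows tsquery's documented precedence: `!` binds tightest, then `&`, then `|`. Parentheses group sub-expressions.

```python
import re


def parse_query(query):
    """Parse a tsquery-like string into a nested tuple tree.

    Precedence mirrors tsquery: ! binds tightest, then &, then |.
    Operands are single words; parentheses group sub-expressions.
    """
    tokens = re.findall(r"[!&|()]|[a-z]+", query.lower())
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of query")
        pos += 1
        return tokens[pos - 1]

    def parse_or():
        node = parse_and()
        while peek() == "|":
            take()
            node = ("|", node, parse_and())
        return node

    def parse_and():
        node = parse_not()
        while peek() == "&":
            take()
            node = ("&", node, parse_not())
        return node

    def parse_not():
        token = take()
        if token == "!":
            return ("!", parse_not())
        if token == "(":
            node = parse_or()
            if take() != ")":
                raise ValueError("expected ')'")
            return node
        if token in ("&", "|", ")"):
            raise ValueError(f"unexpected {token!r}")
        return ("word", token)

    tree = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected {tokens[pos]!r}")
    return tree


def matches(lexemes, node):
    """Evaluate a parsed query against a set of document lexemes, like @@."""
    op = node[0]
    if op == "word":
        return node[1] in lexemes
    if op == "!":
        return not matches(lexemes, node[1])
    if op == "&":
        return matches(lexemes, node[1]) and matches(lexemes, node[2])
    return matches(lexemes, node[1]) or matches(lexemes, node[2])


# Hypothetical, hand-picked lexemes for the slide's sentence (no real stemming).
facts = {"fact", "fit", "theory", "change"}
for q in ["!fact", "theory & !fact", "fiction | theory"]:
    print(q, "->", matches(facts, parse_query(q)))
# !fact -> False / theory & !fact -> False / fiction | theory -> True
```

The real `to_tsquery` also normalizes each query word with the same configuration as `to_tsvector`. This sketch skips that step.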
HOW IT WORKS IN POSTGRESQL
Some numbers
SELECT COUNT(*) FROM ticketing_event WHERE name ILIKE '%madonna%rebel%heart%tour%';
Time: 78.083 ms
SELECT COUNT(*) FROM ticketing_event WHERE search_vector @@ 'madonna & rebel & heart & tour'::tsquery;
Time: 30.065 ms
SELECT COUNT(*) FROM ticketing_event;
count
-------
68889
Time: 11.440 ms
HOW IT WORKS IN POSTGRESQL
Ranking:
SETWEIGHT, TS_RANK functions
-- a SELECT-list alias can't be used in WHERE, so the vector is built in a subquery
SELECT id, vector1
FROM (
  SELECT post.id,
         setweight(to_tsvector(post.title), 'A') ||
         setweight(to_tsvector(post.content), 'B') AS vector1
  FROM post
) AS weighted
WHERE vector1 @@ to_tsquery('Michael & Jackson')
ORDER BY ts_rank(vector1, to_tsquery('Michael & Jackson')) DESC;
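The effect of `setweight` and `||` can be modelled in Python. This sketch uses a toy representation: a dict from each lexeme to its `(position, weight)` pairs. The helper names are hypothetical. It follows the documented behaviour that `||` shifts the right-hand vector's positions by the largest position in the left-hand vector. It also uses `ts_rank`'s documented default weights {0.1, 0.2, 0.4, 1.0} for D, C, B, A to show why a title hit (A) counts for more than a body hit (B). The sketch does not reimplement `ts_rank` itself.

```python
# Toy tsvector: {lexeme: [(position, weight), ...]} with weight in "A".."D".
# ts_rank's documented default weights, listed for D, C, B, A.
DEFAULT_WEIGHTS = {"D": 0.1, "C": 0.2, "B": 0.4, "A": 1.0}


def setweight(vector, weight):
    """Return a copy with every position labelled `weight`, like setweight()."""
    return {lex: [(pos, weight) for pos, _ in entries] for lex, entries in vector.items()}


def concat(left, right):
    """Combine two vectors like tsvector || tsvector.

    Right-hand positions are shifted by the largest left-hand position,
    so title words and body words keep distinct positions.
    """
    offset = max((pos for entries in left.values() for pos, _ in entries), default=0)
    result = {lex: list(entries) for lex, entries in left.items()}
    for lex, entries in right.items():
        result.setdefault(lex, []).extend((pos + offset, w) for pos, w in entries)
    return dict(sorted(result.items()))


def strongest_weight(vector, lexeme):
    """Largest default weight among the lexeme's positions (0.0 if absent)."""
    return max((DEFAULT_WEIGHTS[w] for _, w in vector.get(lexeme, [])), default=0.0)


# Hypothetical, already-normalized title and content vectors (default weight D).
title = {"michael": [(1, "D")], "jackson": [(2, "D")]}
content = {"jackson": [(1, "D")], "thriller": [(3, "D")]}

combined = concat(setweight(title, "A"), setweight(content, "B"))
print(combined)
# {'jackson': [(2, 'A'), (3, 'B')], 'michael': [(1, 'A')], 'thriller': [(5, 'B')]}
```

'jackson' keeps both its title position (weight A) and its shifted body position (weight B). A ranking function that looks up weights can therefore prefer posts that mention the term in the title.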
HOW IT WORKS IN POSTGRESQL
Ranking:
SETWEIGHT, TS_RANK functions
SELECT ts_rank(to_tsvector('This is an example of document'),
to_tsquery('example')) AS relevancy;
relevancy
-----------
0.0607927
(1 row)
SELECT ts_rank(to_tsvector('This is an example of document'),
to_tsquery('example | unknown')) AS relevancy;
relevancy
-----------
0.0303964
(1 row)
HOW TO USE IT IN DJANGO
● django-pg-fts
● djorm-ext-pgfulltext
[WIP] Refs #3254 -- Add Full Text Search to contrib.postgres (this work shipped as django.contrib.postgres.search in Django 1.10)
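HOW TO USE IT IN DJANGO
SearchVector expression and the __search lookup
from django.contrib.postgres.search import SearchVector

Post.objects.annotate(
    search=SearchVector('title') + SearchVector('content'),
).filter(search='Michael Jackson')
# the __search lookup requires 'django.contrib.postgres' in INSTALLED_APPS
Post.objects.filter(title__search='Michael Jackson')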
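HOW TO USE IT IN DJANGO
SearchVector model field (stored)
from django.contrib.postgres.search import SearchVector, SearchVectorField
from django.db import models

class Post(models.Model):
    title = models.CharField(max_length=100)
    content = models.TextField()
    search_vector = SearchVectorField(null=True)  # filled in after the row exists

Post.objects.filter(search_vector='Michael Jackson')

vector = SearchVector('title', weight='A') + SearchVector('content', weight='B')
# QuerySet.update() computes the vector in SQL and does not send post_save,
# so calling it from a post_save handler won't recurse (post.save() would)
Post.objects.filter(pk=post.pk).update(search_vector=vector)
Update the SearchVectorField in a post_save signal handler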
HOW TO USE IT IN DJANGO
django.contrib.postgres.search.SearchRank
from django.contrib.postgres.search import SearchQuery, SearchRank
from django.db import models

queryset = Post.objects.annotate(
    rank=SearchRank(
        models.F('search_vector'),
        SearchQuery('Michael Jackson')
    ),
)
queryset.filter(rank__gt=0.5).order_by('-rank')
Question Time
