Interfaces to Xapian
Open source search day 2009
C++
#include <xapian.h>
Xapian::WritableDatabase db(path, Xapian::DB_OPEN);
Xapian::Document doc;
doc.add_term("foo");
db.add_document(doc);
Python: xapian
import xapian
db = xapian.WritableDatabase(path, xapian.DB_OPEN)
doc = xapian.Document()
doc.add_term("foo")
db.add_document(doc)
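Both snippets above attach a single term, "foo", to a document and then store that document. As a rough mental model only, the pure-Python sketch below mimics the shape of those two calls with an in-memory inverted index. It is a toy written for illustration and says nothing about how Xapian stores data; `ToyDocument`, `ToyDatabase` and `docids_for` are hypothetical names, and the sequential ids starting at 1 are an assumption.

```python
from collections import defaultdict


class ToyDocument:
    """Hypothetical stand-in for a document: just a set of terms."""

    def __init__(self):
        self.terms = set()

    def add_term(self, term):
        self.terms.add(term)


class ToyDatabase:
    """Hypothetical in-memory index: term -> ids of documents containing it."""

    def __init__(self):
        self._postings = defaultdict(set)
        self._next_docid = 1  # assumption: ids are handed out sequentially

    def add_document(self, doc):
        docid = self._next_docid
        self._next_docid += 1
        for term in doc.terms:
            self._postings[term].add(docid)
        return docid

    def docids_for(self, term):
        # .get() avoids creating an empty entry for unknown terms.
        return sorted(self._postings.get(term, ()))


db = ToyDatabase()
doc = ToyDocument()
doc.add_term("foo")
first_id = db.add_document(doc)
```

Looking up a term is then a dictionary access, which is the property that makes term-based indexes fast to query.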
Python: xappy
from xappy import IndexerConnection, FieldActions, UnprocessedDocument
db = IndexerConnection(path)
db.add_field_action("text", FieldActions.INDEX_FREETEXT)
doc = UnprocessedDocument()
doc.append("text", "foo")
db.add(doc)
Python: xappy
from xappy import IndexerConnection, FieldActions, UnprocessedDocument
db = IndexerConnection(path)
db.add_field_action("text", FieldActions.INDEX_FREETEXT,
                    language="french")
doc = UnprocessedDocument()
doc.append("text", "foo")
db.add(doc)
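The xappy slides differ from the raw bindings in one respect: the caller declares once how each field should be treated, then hands over unprocessed field values. The sketch below is a toy illustration of that idea in plain Python. It does not use xappy; `ToySchema`, the string constant `INDEX_FREETEXT` and the whitespace tokenizer are all assumptions made for this example, and language-specific stemming is deliberately left out.

```python
INDEX_FREETEXT = "freetext"  # hypothetical action name for this toy


class ToySchema:
    """Hypothetical schema: remembers which actions apply to each field."""

    def __init__(self):
        self._actions = {}

    def add_field_action(self, field, action, **options):
        self._actions.setdefault(field, []).append((action, options))

    def process(self, fields):
        """Turn (field, value) pairs into a sorted list of index terms."""
        terms = set()
        for field, value in fields:
            # Fields with no declared action contribute nothing.
            for action, options in self._actions.get(field, []):
                if action == INDEX_FREETEXT:
                    # Toy tokenizer: lowercase and split on whitespace.
                    # A real indexer would also stem using options["language"].
                    terms.update(word.lower() for word in value.split())
        return sorted(terms)


schema = ToySchema()
schema.add_field_action("text", INDEX_FREETEXT, language="french")
terms = schema.process([("text", "Bonjour le monde"), ("ignored", "no action")])
```

The point of the design is that document-producing code never mentions terms or stemmers; only the schema does.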
Python: xappy2.core
from xappy2.core import *
db = IndexerConnection(path)
db.add_field_type("text", TEXT, language="french")
db.add_index("text", StandardAnalyser)
doc = UnprocessedDocument()
doc.append("text", "foo")
db.add(doc)
Python: xappy2.server
REST-based API
Python: xappy2.server
PUT to /v1/dbs/dbname
POST to /v1/dbs/dbname/schema/fields/text
{ 'type': 'text', 'freetext': {'language': 'en'} }
POST to /v1/dbs/dbname/docs
{ 'text': ['foo'] }
(or PUT to /v1/dbs/dbname/docs/docid)
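The request sequence on this slide can be sketched with the standard library alone. The block below only builds the three requests and prints them; nothing is sent. The server address in `BASE_URL`, the JSON content type and the helper `json_request` are assumptions for illustration, not part of the slide.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:8080"  # assumption: hypothetical server address
DB_URL = BASE_URL + "/v1/dbs/dbname"


def json_request(method, url, payload=None):
    """Build (but do not send) an HTTP request with an optional JSON body."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    # Assumption: the server expects JSON bodies labelled as such.
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return Request(url, data=data, headers=headers, method=method)


requests_to_send = [
    # 1. Create the database.
    json_request("PUT", DB_URL),
    # 2. Declare a free-text field called "text".
    json_request("POST", DB_URL + "/schema/fields/text",
                 {"type": "text", "freetext": {"language": "en"}}),
    # 3. Add a document without naming an id in the URL.
    json_request("POST", DB_URL + "/docs", {"text": ["foo"]}),
]

for req in requests_to_send:
    print(req.get_method(), req.full_url)
```

Against a running server, each request could be passed to `urllib.request.urlopen`; the PUT variant mentioned on the slide would use the same body with `/docs/docid` as the path.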
Python: Zope: ore.xapian
Zope-style layer on top of xappy:
class Content( object ):
... implements( interfaces.IIndexable )
Asynchronous loading/updating, event integration, etc.
Python: Django: Djapian
Django integration layer on top of Xapian
import djapian
class EntryIndexer(djapian.Indexer):
    fields = ["text"]
    tags = [("content", "content.text")]
Python: Django: Haystack
Another Django integration layer on top of Xapian
from haystack import indexes
class TextIndex(indexes.SearchIndex):
    text = indexes.CharField(document=True,
                             use_template=True)
Other
Similar stack of interfaces for Ruby, PHP
Java, C# just have bindings, so far
Image Searching with Xappy
db.add_field_action('image', FieldActions.IMGSEEK,
                    terms=True)
doc.append('image', path_to_image_file)
db.add(doc)
# sconn is assumed to be a search connection opened on the same database
query = sconn.query_image_similarity('image', docid='0')
