Importing Wikipedia in Plone
Eric BREHAULT – Plone Conference 2013
ZODB is good at storing objects
● Plone contents are objects,
● we store them in the ZODB,
● everything is fine, end of the story.
But what if ...
... we want to store non-contentish records?
Like polls, statistics, mailing-list subscribers, etc.,
or any business-specific structured data.
Store them as contents anyway
That is a powerful solution.
But there are 2 major problems...
Problem 1: You need to manage a secondary system
● you need to deploy it,
● you need to back it up,
● you need to secure it,
● etc.
Problem 2: I hate SQL
No explanation here.
I think I just cannot digest it...
How to store many records in the ZODB?
● Is the ZODB strong enough?
● Is the ZCatalog strong enough?
My grandmother often told me
"If you want to become stronger, you have to eat your soup."
Where do we find a good soup for Plone?
In a super souper!!!
souper.plone and souper
● It provides both storage and indexing.
● A record can store any persistent, picklable data.
● Created by BlueDynamics.
● Based on ZODB BTrees, node.ext.zodb, and repoze.catalog.
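To picture what "storage plus indexing" means, here is a toy, in-memory model in plain Python. It is not souper's API: the class `ToySoup` and its methods are hypothetical, and plain dicts stand in for the ZODB BTrees and the repoze.catalog catalog that souper actually relies on.

```python
from collections import defaultdict
from itertools import count


class ToySoup:
    """Toy in-memory soup: record storage plus simple field indexes.

    Hypothetical sketch, not souper's API: souper persists records in ZODB
    BTrees and indexes them with repoze.catalog; plain dicts stand in here.
    """

    def __init__(self, indexed_fields):
        self._next_id = count(1)  # incrementing record ids: 1, 2, 3, ...
        self.storage = {}  # record id -> attrs dict
        # field name -> {indexed value -> set of record ids}
        self.indexes = {name: defaultdict(set) for name in indexed_fields}

    def add(self, attrs):
        """Store a copy of ``attrs``, index its fields and return the new id."""
        record_id = next(self._next_id)
        self.storage[record_id] = dict(attrs)
        for name, index in self.indexes.items():
            if name in attrs:
                index[attrs[name]].add(record_id)
        return record_id

    def get(self, record_id):
        """Fetch a record by id, in the spirit of ``soup.get(record_id)``."""
        return self.storage[record_id]

    def find(self, name, value):
        """Return {id: attrs} for records whose field ``name`` equals ``value``.

        The lookup goes through the index instead of scanning every record.
        """
        ids = self.indexes[name].get(value, set())
        return {rid: self.storage[rid] for rid in sorted(ids)}


soup = ToySoup(indexed_fields=["user"])
rid = soup.add({"user": "user1", "text": "foo bar baz"})
matches = soup.find("user", "user1")
```

Lookups on an indexed field go through the index rather than a full scan, which is roughly why an indexed store can hold many records and still answer queries quickly. In this toy, indexed values must be hashable.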
Add a record
>>> from souper.soup import get_soup, Record
>>> soup = get_soup('mysoup', context)
>>> record = Record()
>>> record.attrs['user'] = 'user1'
>>> record.attrs['text'] = u'foo bar baz'
>>> record.attrs['keywords'] = [u'1', u'2', u'ü']
>>> record_id = soup.add(record)
Record in record
>>> record['homeaddress'] = Record()
>>> record['homeaddress'].attrs['zip'] = '6020'
>>> record['homeaddress'].attrs['town'] = 'Innsbruck'
>>> record['homeaddress'].attrs['country'] = 'Austria'
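A nested record holds its own `attrs`, so an indexer that only reads top-level attributes would not see `zip` or `town`. Assuming you want to search on them, one option is a callable indexer that walks into the child record. The sketch below uses a hypothetical `ToyRecord` stand-in and follows the `(obj, default)` calling convention that repoze.catalog uses for callable discriminators; it does not use souper's own indexer classes.

```python
class ToyRecord:
    """Hypothetical stand-in for a record: an ``attrs`` dict plus child records."""

    def __init__(self):
        self.attrs = {}
        self._children = {}

    def __setitem__(self, key, child):
        self._children[key] = child

    def __getitem__(self, key):
        return self._children[key]

    def __contains__(self, key):
        return key in self._children


def nested_attr_indexer(child_key, attr_name):
    """Build a callable returning ``record[child_key].attrs[attr_name]``.

    Like a repoze.catalog callable discriminator, it is called as
    ``indexer(obj, default)`` and returns ``default`` when there is nothing
    to index.
    """
    def indexer(record, default):
        if child_key not in record:
            return default
        return record[child_key].attrs.get(attr_name, default)
    return indexer


record = ToyRecord()
record["homeaddress"] = ToyRecord()
record["homeaddress"].attrs["town"] = "Innsbruck"
town_indexer = nested_attr_indexer("homeaddress", "town")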
Access record
>>> from souper.soup import get_soup
>>> soup = get_soup('mysoup', context)
>>> record = soup.get(record_id)
Query
>>> from repoze.catalog.query import Eq, Contains
>>> [r for r in soup.query(Eq('user', 'user1') & Contains('text', 'foo'))]
[<Record object 'None' at ...>]
or using CQE format
>>> [r for r in soup.query("user == 'user1' and 'foo' in text")]
[<Record object 'None' at ...>]
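The `&` in the first query asks for records matching both conditions. Conceptually, each index yields a set of record ids and the combination is a set intersection (on plain sets, `|` would be a union). A toy illustration with Python sets; `eq`, `contains` and `build_text_index` are hypothetical helpers, not repoze.catalog functions, and the text index is a naive whitespace tokenizer.

```python
def build_text_index(texts):
    """Naive inverted index: {record id: text} -> {lower-cased word: set of ids}."""
    index = {}
    for rid, text in texts.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(rid)
    return index


def eq(field_index, value):
    """Ids whose field equals ``value``; field_index maps value -> set of ids."""
    return set(field_index.get(value, ()))


def contains(text_index, word):
    """Ids whose text contains ``word``; text_index comes from build_text_index."""
    return set(text_index.get(word.lower(), ()))


users = {"user1": {1, 2}, "user2": {3}}  # hypothetical field index on 'user'
texts = build_text_index({1: "foo bar baz", 2: "baz qux", 3: "Foo again"})

both = eq(users, "user1") & contains(texts, "foo")    # intersection: {1}
either = eq(users, "user2") | contains(texts, "baz")  # union: {1, 2, 3}
```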
souper
● a soup container can be moved to a dedicated ZODB mount point,
● it can be shared across multiple independent Plone instances,
● souper works on both Plone and Pyramid.
Plomino & souper
● we use Plomino to build non-content-oriented apps easily,
● we use souper to store huge amounts of application data.
Plomino data storage
Originally, documents (i.e. records) were ATFolder objects.
Capacity: about 30,000.
Plomino data storage
Since Plomino 1.14, documents are pure CMF objects.
Capacity: about 100,000.
Usually the Plomino ZCatalog contains a lot of indexes.
Plomino & souper
With souper, documents are just soup records.
Capacity: several million.
Typical use case
● Store 500,000 addresses,
● be able to query them in full text and display the results on a map.
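For the address use case, the map side usually needs the matching records in a format a web map library can read. A minimal sketch, assuming each address is a dict with hypothetical `label`, `lat` and `lon` keys, that filters on a word and returns GeoJSON (RFC 7946, which orders coordinates as longitude, latitude). The linear scan is for brevity only; with souper the full-text filter would go through a catalog text index instead.

```python
import json
import re


def addresses_to_geojson(addresses, word):
    """Return a GeoJSON FeatureCollection (as a JSON string) of matching addresses.

    ``addresses`` is a list of dicts with hypothetical 'label', 'lat' and 'lon'
    keys; an address matches when ``word`` is one of the words of its label.
    """
    word = word.lower()
    features = [
        {
            "type": "Feature",
            # GeoJSON puts longitude first, then latitude.
            "geometry": {"type": "Point", "coordinates": [a["lon"], a["lat"]]},
            "properties": {"label": a["label"]},
        }
        for a in addresses  # linear scan, for brevity only
        if word in re.findall(r"\w+", a["label"].lower())
    ]
    return json.dumps({"type": "FeatureCollection", "features": features})


addresses = [  # sample input for illustration
    {"label": "6020 Innsbruck, Austria", "lat": 47.26, "lon": 11.39},
    {"label": "Toulouse, France", "lat": 43.60, "lon": 1.44},
]
geojson = addresses_to_geojson(addresses, "Innsbruck")
```

The resulting string can be served to the browser and handed to any map library that reads GeoJSON.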
Demo
What is the limit?
Can we import Wikipedia in souper?
Demo with 400,000 records
Demo with 5.5 million records
Conclusion
● Performance in use is good,
● Plone performance is not impacted.
Use it!
Thoughts
● What about a REST API on top of it?
● Mass import is slow and difficult; could it be improved?
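The slides leave this question open. One common way to keep a large ZODB import manageable is to commit in batches instead of in one huge transaction. The sketch below is an assumption, not what was done for the Wikipedia demo: `add` and `commit` are injected callables, which in a Plone script could be a function that builds a `Record` and calls `soup.add`, and `transaction.commit`.

```python
from itertools import islice


def batched_import(rows, add, commit, batch_size=1000):
    """Feed ``rows`` to ``add`` and call ``commit`` after every ``batch_size`` rows.

    ``add`` and ``commit`` are placeholders for the real storage calls.
    Returns the number of rows imported.
    """
    it = iter(rows)
    total = 0
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            break
        for row in batch:
            add(row)
        commit()  # keep each transaction, and its memory footprint, bounded
        total += len(batch)
    return total


imported = []
n_rows = batched_import(range(2500), imported.append, lambda: None, batch_size=1000)
```

Committing per batch also means a failure only loses the current batch, so a long import can be resumed rather than restarted.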
Makina Corpus
For all questions related to this talk,
please contact Éric Bréhault
eric.brehault@makina-corpus.com
Tel : +33 534 566 958
www.makina-corpus.com
