SlideShare uma empresa Scribd logo
1 de 19
What’s up?

Bouvet BigOne, 2011-10-27
Lars Marius Garshol, <larsga@bouvet.no>
http://twitter.com/larsga




1
The problem with RSS readers

• Many feeds (like newspapers’) are too busy,
  and so unread stories pile up
• In most feeds you are only interested in a small
  subset of the posts
• Staying on top of the flow of news and digging
  up the interesting stuff is hard work




2
What’s up?

• A newsreader that tries solve this for you
• It uses statistics to figure out which news are
  the most interesting for you
    – the statistics are based on feedback from you
• Everything is collected into a single list, sorted
  by relevance and freshness
• Stories sink slowly as they age, so if you don’t
  read them they gradually fade away


3
Like     Not used


    Dislike   Mark as read




4
Very interesting post about beer,
    but the word “beer” doesn’t actually
    appear anywhere

    Probabilities combined with Bayes’s
    theorem.




5
Utterly irrelevant post about sports




6
Adding feeds




Probably the usability Achilles
heel of the system right now




 7
Three implementations

• In-memory single-user version
    – worked well for me for several years
    – wanted to try it out with more users
• Google AppEngine version
    – easy to build and deploy
    – used way too much CPU
• “Traditional” version
    – PostgreSQL backend, ordinary web hosting
    – seems to scale much better


8
The goal

• Make the site pay for its own hosting
    – currently solved by running it on my personal web
      server
    – expect system to outgrow that server soon
• Move to cloud hosting
    – candidates: Amazon EC2, heroku, Google AppEngine
      w/ MySQL
• Income from Google Ads
    – income per user likely to be very low
    – scaling challenge: support enough users to pay for
      computing resources

9
Data structure

                           Feed          Post




                          Subscr         Rated
              User
                          iption          post


• Good
     – fully normalized, no redundancy
     – simple and natural
• Bad
     – showing main page requires many joins
     – limited possibilities for caching
10
Queueing

• The original version would respond to clicks in
  real-time
     – meant recomputing all stories on each up/down vote,
       before showing the page again
     – not really very pleasant user experience
• Changed over to a queue approach
     –   user clicks added to a queue
     –   queue retrieves tasks, processes them, may add more
     –   scheduled tasks injected into queue
     –   admin command-line tool1) to inject tasks when needed
     –   works beautifully

11         1) http://code.google.com/p/whazzup/source/browse/send.py
Google AppEngine experience

• Easy to build, painless to deploy
     – web.py and Python well supported
     – good queue and scheduled tasks APIs
• Datastore and GQL too primitive
     – high latency registers as high CPU usage (costly)
     – very, very limited support for letting the database do
       the work, leads to poor performance
• AppEngine apps require heavy caching to work
     – not really possible with this application
• Would have hit limit of free usage at 4 users
     – not a realistic proposition
12
Example problem

• How to implement aging of posts?
     – that is, reducing score as the posts get older
• Could compute score when loading story list
     – not possible in GQL (no expression language)
• Could run scheduled tasks once an hour
     – in GQL this requires loading all RatedPost objects
       into main process
     – way too resource-intensive
• Just didn’t scale at all

13            (probability * 1000.0) / math.log(ageinsecs)
Current architecture
                                                                            100% Python
                                                                            Based on web.py
          Apache w/ mod_python                                              Single server so far
           Apache w/ mod_python
            Apache w/ mod_python
              Apache w/ mod_python

 cron


              IPC message queue 1)



        Queue worker

          Download   Download   Download                                    PostgreSQL
           thread     thread     thread




         DBM files
          DBM files
           DBM files
14          DBM files                 1) http://semanchuk.com/philip/sysv_ipc/
Aging posts with Postgres

• First attempt
     – load posts, compute in Python, save to DB
     – took 1.1 seconds per subscription
     – with ~50 subscriptions per user, that’s much too slow
• Second attempt
     –   do calculation in the SQL update statement1)
     –   takes 0.5 seconds per user
     –   more than 100 times faster
     –   may still be too slow
          • with 7200 users it would take an hour


15         1)
           http://code.google.com/p/whazzup/source/browse/dbqueue.py?r=4e
More performance tricks

• Loading story pages is a bit expensive
     – because of SQL joins required
     – now handling votes with AJAX, so page doesn’t have
       to be reloaded for every vote
     – next step: caching feed titles and story titles?
• Separate worker threads for feed downloading
     – because feeds may be slow to respond
     – threads save feed XML to disk, then queue task to
       process feed
     – ParseFeed task doesn’t have any network latency

16
Statistics   No perceptible server load
             Bottlenecks:
             • loading story list pages
             • parsing feeds, calculating points




17
Future architecture


     Web frontend   Web frontend    Web frontend


                                        memcached?

         cron       Message queue                      DB cluster
                     (Gearman?)                      (PostgreSQL?)




     Queue worker   Queue worker    Queue worker




                     DBM files?
18
More information

• Blog post
     – http://www.garshol.priv.no/blog/216.html
• Source code
     – http://code.google.com/p/whazzup/
• Pre-alpha trial
     –   open to anyone; sign up if you’re interested
     –   no guarantees about anything
     –   http://whazzup.garshol.priv.no/
     –   currently limited to 100 users (87 accounts available)


19

Mais conteúdo relacionado

Mais procurados

Pure Speed Drupal 4 Gov talk
Pure Speed Drupal 4 Gov talkPure Speed Drupal 4 Gov talk
Pure Speed Drupal 4 Gov talkBryan Ollendyke
 
Capacity Planning
Capacity PlanningCapacity Planning
Capacity PlanningMongoDB
 
Capacity Planning
Capacity PlanningCapacity Planning
Capacity PlanningMongoDB
 
Configuring Apache Servers for Better Web Perormance
Configuring Apache Servers for Better Web PerormanceConfiguring Apache Servers for Better Web Perormance
Configuring Apache Servers for Better Web PerormanceSpark::red
 
PowerShell and SharePoint
PowerShell and SharePointPowerShell and SharePoint
PowerShell and SharePointTalbott Crowell
 
High Performance WordPress II
High Performance WordPress IIHigh Performance WordPress II
High Performance WordPress IIBarry Abrahamson
 
Moving to the Cloud: AWS, Zend, RightScale
Moving to the Cloud: AWS, Zend, RightScaleMoving to the Cloud: AWS, Zend, RightScale
Moving to the Cloud: AWS, Zend, RightScalemmoline
 
Optimizing the performance of WordPress
Optimizing the performance of WordPressOptimizing the performance of WordPress
Optimizing the performance of WordPressJosh Highland Giese
 
Anthony Somerset - Site Speed = Success!
Anthony Somerset - Site Speed = Success!Anthony Somerset - Site Speed = Success!
Anthony Somerset - Site Speed = Success!WordCamp Cape Town
 
HBaseCon 2013: Streaming Data into Apache HBase using Apache Flume: Experienc...
HBaseCon 2013: Streaming Data into Apache HBase using Apache Flume: Experienc...HBaseCon 2013: Streaming Data into Apache HBase using Apache Flume: Experienc...
HBaseCon 2013: Streaming Data into Apache HBase using Apache Flume: Experienc...Cloudera, Inc.
 
Drupal, git and sanity
Drupal, git and sanityDrupal, git and sanity
Drupal, git and sanityCharlie Morris
 
Scaling wordpress for high traffic
Scaling wordpress for high trafficScaling wordpress for high traffic
Scaling wordpress for high trafficRoshan Bhattarai
 
Roshan Bhattarai: Scaling WordPress for high traffic sites
Roshan Bhattarai: Scaling WordPress for high traffic sitesRoshan Bhattarai: Scaling WordPress for high traffic sites
Roshan Bhattarai: Scaling WordPress for high traffic siteswpnepal
 
High Concurrency Architecture and Laravel Performance Tuning
High Concurrency Architecture and Laravel Performance TuningHigh Concurrency Architecture and Laravel Performance Tuning
High Concurrency Architecture and Laravel Performance TuningAlbert Chen
 
HBase: Where Online Meets Low Latency
HBase: Where Online Meets Low LatencyHBase: Where Online Meets Low Latency
HBase: Where Online Meets Low LatencyHBaseCon
 
High Performance Drupal
High Performance DrupalHigh Performance Drupal
High Performance DrupalJeff Geerling
 

Mais procurados (19)

Scaling 101
Scaling 101Scaling 101
Scaling 101
 
Pure Speed Drupal 4 Gov talk
Pure Speed Drupal 4 Gov talkPure Speed Drupal 4 Gov talk
Pure Speed Drupal 4 Gov talk
 
Capacity Planning
Capacity PlanningCapacity Planning
Capacity Planning
 
Capacity Planning
Capacity PlanningCapacity Planning
Capacity Planning
 
Configuring Apache Servers for Better Web Perormance
Configuring Apache Servers for Better Web PerormanceConfiguring Apache Servers for Better Web Perormance
Configuring Apache Servers for Better Web Perormance
 
Azkaban
AzkabanAzkaban
Azkaban
 
PowerShell and SharePoint
PowerShell and SharePointPowerShell and SharePoint
PowerShell and SharePoint
 
High Performance WordPress II
High Performance WordPress IIHigh Performance WordPress II
High Performance WordPress II
 
Moving to the Cloud: AWS, Zend, RightScale
Moving to the Cloud: AWS, Zend, RightScaleMoving to the Cloud: AWS, Zend, RightScale
Moving to the Cloud: AWS, Zend, RightScale
 
WAG the Blog
WAG the BlogWAG the Blog
WAG the Blog
 
Optimizing the performance of WordPress
Optimizing the performance of WordPressOptimizing the performance of WordPress
Optimizing the performance of WordPress
 
Anthony Somerset - Site Speed = Success!
Anthony Somerset - Site Speed = Success!Anthony Somerset - Site Speed = Success!
Anthony Somerset - Site Speed = Success!
 
HBaseCon 2013: Streaming Data into Apache HBase using Apache Flume: Experienc...
HBaseCon 2013: Streaming Data into Apache HBase using Apache Flume: Experienc...HBaseCon 2013: Streaming Data into Apache HBase using Apache Flume: Experienc...
HBaseCon 2013: Streaming Data into Apache HBase using Apache Flume: Experienc...
 
Drupal, git and sanity
Drupal, git and sanityDrupal, git and sanity
Drupal, git and sanity
 
Scaling wordpress for high traffic
Scaling wordpress for high trafficScaling wordpress for high traffic
Scaling wordpress for high traffic
 
Roshan Bhattarai: Scaling WordPress for high traffic sites
Roshan Bhattarai: Scaling WordPress for high traffic sitesRoshan Bhattarai: Scaling WordPress for high traffic sites
Roshan Bhattarai: Scaling WordPress for high traffic sites
 
High Concurrency Architecture and Laravel Performance Tuning
High Concurrency Architecture and Laravel Performance TuningHigh Concurrency Architecture and Laravel Performance Tuning
High Concurrency Architecture and Laravel Performance Tuning
 
HBase: Where Online Meets Low Latency
HBase: Where Online Meets Low LatencyHBase: Where Online Meets Low Latency
HBase: Where Online Meets Low Latency
 
High Performance Drupal
High Performance DrupalHigh Performance Drupal
High Performance Drupal
 

Destaque (19)

XML in software development
XML in software developmentXML in software development
XML in software development
 
Presentacion De Informatica
Presentacion De InformaticaPresentacion De Informatica
Presentacion De Informatica
 
Ontopia tutorial
Ontopia tutorialOntopia tutorial
Ontopia tutorial
 
Web 2.0 and Topic Maps
Web 2.0 and Topic MapsWeb 2.0 and Topic Maps
Web 2.0 and Topic Maps
 
Experiments in genetic programming
Experiments in genetic programmingExperiments in genetic programming
Experiments in genetic programming
 
Ontopia Code Camp
Ontopia Code CampOntopia Code Camp
Ontopia Code Camp
 
Ontopia Code Camp Impressions
Ontopia Code Camp ImpressionsOntopia Code Camp Impressions
Ontopia Code Camp Impressions
 
Linked Open Data
Linked Open DataLinked Open Data
Linked Open Data
 
Semantisk integrasjon
Semantisk integrasjonSemantisk integrasjon
Semantisk integrasjon
 
Tolog optimization
Tolog optimizationTolog optimization
Tolog optimization
 
Tolog Updates
Tolog UpdatesTolog Updates
Tolog Updates
 
Archive integration with RDF
Archive integration with RDFArchive integration with RDF
Archive integration with RDF
 
Social Media Workshop 030409 F I N A L
Social  Media  Workshop 030409 F I N A LSocial  Media  Workshop 030409 F I N A L
Social Media Workshop 030409 F I N A L
 
Topic Maps - Human-oriented semantics?
Topic Maps - Human-oriented semantics?Topic Maps - Human-oriented semantics?
Topic Maps - Human-oriented semantics?
 
Bitcoin - digital gold
Bitcoin - digital goldBitcoin - digital gold
Bitcoin - digital gold
 
Big data 101
Big data 101Big data 101
Big data 101
 
Approximate string comparators
Approximate string comparatorsApproximate string comparators
Approximate string comparators
 
Unicode 101
Unicode 101Unicode 101
Unicode 101
 
NoSQL databases, the CAP theorem, and the theory of relativity
NoSQL databases, the CAP theorem, and the theory of relativityNoSQL databases, the CAP theorem, and the theory of relativity
NoSQL databases, the CAP theorem, and the theory of relativity
 

Semelhante a What's up?

(ATS6-PLAT06) Maximizing AEP Performance
(ATS6-PLAT06) Maximizing AEP Performance(ATS6-PLAT06) Maximizing AEP Performance
(ATS6-PLAT06) Maximizing AEP PerformanceBIOVIA
 
SharePoint Saturday San Antonio: SharePoint 2010 Performance
SharePoint Saturday San Antonio: SharePoint 2010 PerformanceSharePoint Saturday San Antonio: SharePoint 2010 Performance
SharePoint Saturday San Antonio: SharePoint 2010 PerformanceBrian Culver
 
(ATS4-PLAT08) Server Pool Management
(ATS4-PLAT08) Server Pool Management(ATS4-PLAT08) Server Pool Management
(ATS4-PLAT08) Server Pool ManagementBIOVIA
 
SharePoint Saturday The Conference 2011 - SP2010 Performance
SharePoint Saturday The Conference 2011 - SP2010 PerformanceSharePoint Saturday The Conference 2011 - SP2010 Performance
SharePoint Saturday The Conference 2011 - SP2010 PerformanceBrian Culver
 
Architecture Patterns - Open Discussion
Architecture Patterns - Open DiscussionArchitecture Patterns - Open Discussion
Architecture Patterns - Open DiscussionNguyen Tung
 
12-Step Program for Scaling Web Applications on PostgreSQL
12-Step Program for Scaling Web Applications on PostgreSQL12-Step Program for Scaling Web Applications on PostgreSQL
12-Step Program for Scaling Web Applications on PostgreSQLKonstantin Gredeskoul
 
Fundamentals of performance tuning PHP on IBM i
Fundamentals of performance tuning PHP on IBM i  Fundamentals of performance tuning PHP on IBM i
Fundamentals of performance tuning PHP on IBM i Zend by Rogue Wave Software
 
What is Nginx and Why You Should to Use it with Wordpress Hosting
What is Nginx and Why You Should to Use it with Wordpress HostingWhat is Nginx and Why You Should to Use it with Wordpress Hosting
What is Nginx and Why You Should to Use it with Wordpress HostingWPSFO Meetup Group
 
AWS (Hadoop) Meetup 30.04.09
AWS (Hadoop) Meetup 30.04.09AWS (Hadoop) Meetup 30.04.09
AWS (Hadoop) Meetup 30.04.09Chris Purrington
 
Supercharge Application Delivery to Satisfy Users
Supercharge Application Delivery to Satisfy UsersSupercharge Application Delivery to Satisfy Users
Supercharge Application Delivery to Satisfy UsersNGINX, Inc.
 
Realtime traffic analyser
Realtime traffic analyserRealtime traffic analyser
Realtime traffic analyserAlex Moskvin
 
Cloud Connected Devices on a Global Scale (CPN303) | AWS re:Invent 2013
Cloud Connected Devices on a Global Scale (CPN303) | AWS re:Invent 2013Cloud Connected Devices on a Global Scale (CPN303) | AWS re:Invent 2013
Cloud Connected Devices on a Global Scale (CPN303) | AWS re:Invent 2013Amazon Web Services
 
DrupalCampLA 2014 - Drupal backend performance and scalability
DrupalCampLA 2014 - Drupal backend performance and scalabilityDrupalCampLA 2014 - Drupal backend performance and scalability
DrupalCampLA 2014 - Drupal backend performance and scalabilitycherryhillco
 
Java scalability considerations yogesh deshpande
Java scalability considerations   yogesh deshpandeJava scalability considerations   yogesh deshpande
Java scalability considerations yogesh deshpandeIndicThreads
 
Webinar - DreamObjects/Ceph Case Study
Webinar - DreamObjects/Ceph Case StudyWebinar - DreamObjects/Ceph Case Study
Webinar - DreamObjects/Ceph Case StudyCeph Community
 
Architecture for building scalable and highly available Postgres Cluster
Architecture for building scalable and highly available Postgres ClusterArchitecture for building scalable and highly available Postgres Cluster
Architecture for building scalable and highly available Postgres ClusterAshnikbiz
 
My Site is slow - Drupal Camp London 2013
My Site is slow - Drupal Camp London 2013My Site is slow - Drupal Camp London 2013
My Site is slow - Drupal Camp London 2013hernanibf
 

Semelhante a What's up? (20)

CI_CONF 2012: Scaling - Chris Miller
CI_CONF 2012: Scaling - Chris MillerCI_CONF 2012: Scaling - Chris Miller
CI_CONF 2012: Scaling - Chris Miller
 
(ATS6-PLAT06) Maximizing AEP Performance
(ATS6-PLAT06) Maximizing AEP Performance(ATS6-PLAT06) Maximizing AEP Performance
(ATS6-PLAT06) Maximizing AEP Performance
 
SharePoint Saturday San Antonio: SharePoint 2010 Performance
SharePoint Saturday San Antonio: SharePoint 2010 PerformanceSharePoint Saturday San Antonio: SharePoint 2010 Performance
SharePoint Saturday San Antonio: SharePoint 2010 Performance
 
(ATS4-PLAT08) Server Pool Management
(ATS4-PLAT08) Server Pool Management(ATS4-PLAT08) Server Pool Management
(ATS4-PLAT08) Server Pool Management
 
SharePoint Saturday The Conference 2011 - SP2010 Performance
SharePoint Saturday The Conference 2011 - SP2010 PerformanceSharePoint Saturday The Conference 2011 - SP2010 Performance
SharePoint Saturday The Conference 2011 - SP2010 Performance
 
Architecture Patterns - Open Discussion
Architecture Patterns - Open DiscussionArchitecture Patterns - Open Discussion
Architecture Patterns - Open Discussion
 
12-Step Program for Scaling Web Applications on PostgreSQL
12-Step Program for Scaling Web Applications on PostgreSQL12-Step Program for Scaling Web Applications on PostgreSQL
12-Step Program for Scaling Web Applications on PostgreSQL
 
Fundamentals of performance tuning PHP on IBM i
Fundamentals of performance tuning PHP on IBM i  Fundamentals of performance tuning PHP on IBM i
Fundamentals of performance tuning PHP on IBM i
 
What is Nginx and Why You Should to Use it with Wordpress Hosting
What is Nginx and Why You Should to Use it with Wordpress HostingWhat is Nginx and Why You Should to Use it with Wordpress Hosting
What is Nginx and Why You Should to Use it with Wordpress Hosting
 
Background processing with hangfire
Background processing with hangfireBackground processing with hangfire
Background processing with hangfire
 
AWS (Hadoop) Meetup 30.04.09
AWS (Hadoop) Meetup 30.04.09AWS (Hadoop) Meetup 30.04.09
AWS (Hadoop) Meetup 30.04.09
 
Supercharge Application Delivery to Satisfy Users
Supercharge Application Delivery to Satisfy UsersSupercharge Application Delivery to Satisfy Users
Supercharge Application Delivery to Satisfy Users
 
Mulesoft meetup 29.06
Mulesoft meetup 29.06Mulesoft meetup 29.06
Mulesoft meetup 29.06
 
Realtime traffic analyser
Realtime traffic analyserRealtime traffic analyser
Realtime traffic analyser
 
Cloud Connected Devices on a Global Scale (CPN303) | AWS re:Invent 2013
Cloud Connected Devices on a Global Scale (CPN303) | AWS re:Invent 2013Cloud Connected Devices on a Global Scale (CPN303) | AWS re:Invent 2013
Cloud Connected Devices on a Global Scale (CPN303) | AWS re:Invent 2013
 
DrupalCampLA 2014 - Drupal backend performance and scalability
DrupalCampLA 2014 - Drupal backend performance and scalabilityDrupalCampLA 2014 - Drupal backend performance and scalability
DrupalCampLA 2014 - Drupal backend performance and scalability
 
Java scalability considerations yogesh deshpande
Java scalability considerations   yogesh deshpandeJava scalability considerations   yogesh deshpande
Java scalability considerations yogesh deshpande
 
Webinar - DreamObjects/Ceph Case Study
Webinar - DreamObjects/Ceph Case StudyWebinar - DreamObjects/Ceph Case Study
Webinar - DreamObjects/Ceph Case Study
 
Architecture for building scalable and highly available Postgres Cluster
Architecture for building scalable and highly available Postgres ClusterArchitecture for building scalable and highly available Postgres Cluster
Architecture for building scalable and highly available Postgres Cluster
 
My Site is slow - Drupal Camp London 2013
My Site is slow - Drupal Camp London 2013My Site is slow - Drupal Camp London 2013
My Site is slow - Drupal Camp London 2013
 

Mais de Lars Marius Garshol

JSLT: JSON querying and transformation
JSLT: JSON querying and transformationJSLT: JSON querying and transformation
JSLT: JSON querying and transformationLars Marius Garshol
 
Data collection in AWS at Schibsted
Data collection in AWS at SchibstedData collection in AWS at Schibsted
Data collection in AWS at SchibstedLars Marius Garshol
 
NoSQL and Einstein's theory of relativity
NoSQL and Einstein's theory of relativityNoSQL and Einstein's theory of relativity
NoSQL and Einstein's theory of relativityLars Marius Garshol
 
Using the search engine as recommendation engine
Using the search engine as recommendation engineUsing the search engine as recommendation engine
Using the search engine as recommendation engineLars Marius Garshol
 
Linked Open Data for the Cultural Sector
Linked Open Data for the Cultural SectorLinked Open Data for the Cultural Sector
Linked Open Data for the Cultural SectorLars Marius Garshol
 
Introduction to Big Data/Machine Learning
Introduction to Big Data/Machine LearningIntroduction to Big Data/Machine Learning
Introduction to Big Data/Machine LearningLars Marius Garshol
 
Hafslund SESAM - Semantic integration in practice
Hafslund SESAM - Semantic integration in practiceHafslund SESAM - Semantic integration in practice
Hafslund SESAM - Semantic integration in practiceLars Marius Garshol
 
Linking data without common identifiers
Linking data without common identifiersLinking data without common identifiers
Linking data without common identifiersLars Marius Garshol
 
Linking data without common identifiers
Linking data without common identifiersLinking data without common identifiers
Linking data without common identifiersLars Marius Garshol
 
A Citizen's Portal for the City of Bergen
A Citizen's Portal for the City of BergenA Citizen's Portal for the City of Bergen
A Citizen's Portal for the City of BergenLars Marius Garshol
 

Mais de Lars Marius Garshol (20)

JSLT: JSON querying and transformation
JSLT: JSON querying and transformationJSLT: JSON querying and transformation
JSLT: JSON querying and transformation
 
Data collection in AWS at Schibsted
Data collection in AWS at SchibstedData collection in AWS at Schibsted
Data collection in AWS at Schibsted
 
Kveik - what is it?
Kveik - what is it?Kveik - what is it?
Kveik - what is it?
 
Nature-inspired algorithms
Nature-inspired algorithmsNature-inspired algorithms
Nature-inspired algorithms
 
Collecting 600M events/day
Collecting 600M events/dayCollecting 600M events/day
Collecting 600M events/day
 
History of writing
History of writingHistory of writing
History of writing
 
NoSQL and Einstein's theory of relativity
NoSQL and Einstein's theory of relativityNoSQL and Einstein's theory of relativity
NoSQL and Einstein's theory of relativity
 
Norwegian farmhouse ale
Norwegian farmhouse aleNorwegian farmhouse ale
Norwegian farmhouse ale
 
The Euro crisis in 10 minutes
The Euro crisis in 10 minutesThe Euro crisis in 10 minutes
The Euro crisis in 10 minutes
 
Using the search engine as recommendation engine
Using the search engine as recommendation engineUsing the search engine as recommendation engine
Using the search engine as recommendation engine
 
Linked Open Data for the Cultural Sector
Linked Open Data for the Cultural SectorLinked Open Data for the Cultural Sector
Linked Open Data for the Cultural Sector
 
Introduction to Big Data/Machine Learning
Introduction to Big Data/Machine LearningIntroduction to Big Data/Machine Learning
Introduction to Big Data/Machine Learning
 
Hops - the green gold
Hops - the green goldHops - the green gold
Hops - the green gold
 
Hafslund SESAM - Semantic integration in practice
Hafslund SESAM - Semantic integration in practiceHafslund SESAM - Semantic integration in practice
Hafslund SESAM - Semantic integration in practice
 
Semantisk integrasjon
Semantisk integrasjonSemantisk integrasjon
Semantisk integrasjon
 
Linking data without common identifiers
Linking data without common identifiersLinking data without common identifiers
Linking data without common identifiers
 
Linking data without common identifiers
Linking data without common identifiersLinking data without common identifiers
Linking data without common identifiers
 
Deduplication
DeduplicationDeduplication
Deduplication
 
Introduction to SDshare
Introduction to SDshareIntroduction to SDshare
Introduction to SDshare
 
A Citizen's Portal for the City of Bergen
A Citizen's Portal for the City of BergenA Citizen's Portal for the City of Bergen
A Citizen's Portal for the City of Bergen
 

Último

The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 

Último (20)

The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 

What's up?

  • 1. What’s up? Bouvet BigOne, 2011-10-27 Lars Marius Garshol, <larsga@bouvet.no> http://twitter.com/larsga 1
  • 2. The problem with RSS readers • Many feeds (like newspapers’) are too busy, and so unread stories pile up • In most feeds you are only interested in a small subset of the posts • Staying on top of the flow of news and digging up the interesting stuff is hard work 2
  • 3. What’s up? • A newsreader that tries solve this for you • It uses statistics to figure out which news are the most interesting for you – the statistics are based on feedback from you • Everything is collected into a single list, sorted by relevance and freshness • Stories sink slowly as they age, so if you don’t read them they gradually fade away 3
  • 4. Like Not used Dislike Mark as read 4
  • 5. Very interesting post about beer, but the word “beer” doesn’t actually appear anywhere Probabilities combined with Bayes’s theorem. 5
  • 6. Utterly irrelevant post about sports 6
  • 7. Adding feeds Probably the usability Achilles heel of the system right now 7
  • 8. Three implementations • In-memory single-user version – worked well for me for several years – wanted to try it out with more users • Google AppEngine version – easy to build and deploy – used way too much CPU • “Traditional” version – PostgreSQL backend, ordinary web hosting – seems to scale much better 8
  • 9. The goal • Make the site pay for its own hosting – currently solved by running it on my personal web server – expect system to outgrow that server soon • Move to cloud hosting – candidates: Amazon EC2, heroku, Google AppEngine w/ MySQL • Income from Google Ads – income per user likely to be very low – scaling challenge: support enough users to pay for computing resources 9
  • 10. Data structure Feed Post Subscr Rated User iption post • Good – fully normalized, no redundancy – simple and natural • Bad – showing main page requires many joins – limited possibilities for caching 10
  • 11. Queueing • The original version would respond to clicks in real-time – meant recomputing all stories on each up/down vote, before showing the page again – not really very pleasant user experience • Changed over to a queue approach – user clicks added to a queue – queue retrieves tasks, processes them, may add more – scheduled tasks injected into queue – admin command-line tool1) to inject tasks when needed – works beautifully 11 1) http://code.google.com/p/whazzup/source/browse/send.py
  • 12. Google AppEngine experience • Easy to build, painless to deploy – web.py and Python well supported – good queue and scheduled tasks APIs • Datastore and GQL too primitive – high latency registers as high CPU usage (costly) – very, very limited support for letting the database do the work, leads to poor performance • AppEngine apps require heavy caching to work – not really possible with this application • Would have hit limit of free usage at 4 users – not a realistic proposition 12
  • 13. Example problem • How to implement aging of posts? – that is, reducing score as the posts get older • Could compute score when loading story list – not possible in GQL (no expression language) • Could run scheduled tasks once an hour – in GQL this requires loading all RatedPost objects into main process – way too resource-intensive • Just didn’t scale at all 13 (probability * 1000.0) / math.log(ageinsecs)
  • 14. Current architecture 100% Python Based on web.py Apache w/ mod_python Single server so far Apache w/ mod_python Apache w/ mod_python Apache w/ mod_python cron IPC message queue 1) Queue worker Download Download Download PostgreSQL thread thread thread DBM files DBM files DBM files 14 DBM files 1) http://semanchuk.com/philip/sysv_ipc/
  • 15. Aging posts with Postgres • First attempt – load posts, compute in Python, save to DB – took 1.1 seconds per subscription – with ~50 subscriptions per user, that’s much too slow • Second attempt – do calculation in the SQL update statement1) – takes 0.5 seconds per user – more than 100 times faster – may still be too slow • with 7200 users it would take an hour 15 1) http://code.google.com/p/whazzup/source/browse/dbqueue.py?r=4e
  • 16. More performance tricks • Loading story pages is a bit expensive – because of SQL joins required – now handling votes with AJAX, so page doesn’t have to be reloaded for every vote – next step: caching feed titles and story titles? • Separate worker threads for feed downloading – because feeds may be slow to respond – threads save feed XML to disk, then queue task to process feed – ParseFeed task doesn’t have any network latency 16
  • 17. Statistics No perceptible server load Bottlenecks: • loading story list pages • parsing feeds, calculating points 17
  • 18. Future architecture Web frontend Web frontend Web frontend memcached? cron Message queue DB cluster (Gearman?) (PostgreSQL?) Queue worker Queue worker Queue worker DBM files? 18
  • 19. More information • Blog post – http://www.garshol.priv.no/blog/216.html • Source code – http://code.google.com/p/whazzup/ • Pre-alpha trial – open to anyone; sign up if you’re interested – no guarantees about anything – http://whazzup.garshol.priv.no/ – currently limited to 100 users (87 accounts available) 19