Snapguide - Amazon Cloudsearch
Share what you know




   Sam Kimbrel                               sam@snapguide.com
   Software Engineer

Monday, April 1, 13
What is Snapguide?
                               • 1.5 million uniques/month
                               • ~2000 reqs/min across app and web
                               • Python (Pyramid/uWSGI/nginx)
                               • MySQL/Redis
                               • Built primarily on AWS: EC2, RDS, S3, SQS, SNS,
                                 CloudSearch, CloudFront




                                            daniel@snapguide.com • confidential do not distribute



Snapguide on CloudSearch
           • Beta trial users after mentioning Solr on the phone (seriously!)
           • Primary data set: guides
           • Facets: guide topic, “featured” boolean, visibility/ACL flags
           • “autocomplete” search (more later)




{
    "lang": "en",
    "fields": {
        "step_count": "14",
        "author_external_id": "qS878yliQ4mxg_9uHt2AZg",
        "author": "Claire Hesseltine",
        "items": [
            "Preheat oven to 325 degrees Fahrenheit.",
            ...
        ],
        "title": "Make Brown Butter Sea Salt Cookies",
        "featured": 1,
        "summary": "The brown butter adds a nutty, caramel-like taste to these delicious cookies.",
        "topic": [
            "desserts"
        ],
        "main_image_uuid": "43d201c8fd4b4833b83d3f95d112f1c1",
        "like_count": 761,
        "public": "true"
    },
    "version": 1364333310,
    "type": "add",
    "id": "9eabff97e32c4244a8205da3fba442e9"
}
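
A document like the one above could be assembled in Python roughly as follows. This is a minimal sketch, not Snapguide's actual code: the `build_add_document` helper and the keys of the input `guide` dict are hypothetical, only a subset of the fields is shown, and treating the version as a Unix timestamp is an assumption based on how the number above looks.

```python
import json
import time


def build_add_document(guide, version=None):
    """Build one "add" operation for a guide (hypothetical helper).

    `guide` is an illustrative dict; its keys are assumptions and do
    not come from the talk.
    """
    if version is None:
        # Assumption: the version is a Unix timestamp, so a later
        # update always carries a larger version number.
        version = int(time.time())
    return {
        "type": "add",
        "id": guide["uuid"],
        "version": version,
        "lang": "en",
        "fields": {
            "title": guide["title"],
            "summary": guide["summary"],
            "author": guide["author"],
            "topic": list(guide["topics"]),
            "items": list(guide["items"]),
            "like_count": guide["like_count"],
            # Stored as 0/1 so a rank expression can multiply by it.
            "featured": 1 if guide["featured"] else 0,
            "public": "true" if guide["public"] else "false",
        },
    }


guide = {
    "uuid": "9eabff97e32c4244a8205da3fba442e9",
    "title": "Make Brown Butter Sea Salt Cookies",
    "summary": "The brown butter adds a nutty, caramel-like taste "
               "to these delicious cookies.",
    "author": "Claire Hesseltine",
    "topics": ["desserts"],
    "items": ["Preheat oven to 325 degrees Fahrenheit."],
    "like_count": 761,
    "featured": True,
    "public": True,
}

# Document batches are sent as a JSON array of operations.
batch = json.dumps([build_add_document(guide, version=1364333310)])
```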



Queries
           • Guide text search:
             q=cookies
           • Guide search with topic:
             q=cookies&facet=topic&bq=topic:'desserts'
           • “Typeahead”/suggestion search:
             bq=(or 'paper flower' 'paper flower*')
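
The query strings above can be built and URL-encoded with the Python standard library. The sketch below is illustrative only: the helper names are hypothetical, and the backslash escaping of single quotes inside a quoted term is an assumption about the query syntax rather than something stated in the talk.

```python
from urllib.parse import parse_qs, urlencode


def quote_term(text):
    """Wrap a term in single quotes for use inside a bq expression."""
    # Assumption: backslashes and single quotes are backslash-escaped.
    escaped = text.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def topic_query(text, topic):
    """Free-text search restricted to one topic, with topic facet counts."""
    return urlencode([
        ("q", text),
        ("facet", "topic"),
        ("bq", "topic:" + quote_term(topic)),
    ])


def typeahead_query(prefix):
    """Match what the user has typed so far, exactly or as a prefix."""
    bq = "(or {} {})".format(quote_term(prefix), quote_term(prefix + "*"))
    return urlencode([("bq", bq)])


print(topic_query("cookies", "desserts"))
# q=cookies&facet=topic&bq=topic%3A%27desserts%27

# Decode again to show the expression the service would receive.
print(parse_qs(typeahead_query("paper flower"))["bq"][0])
# (or 'paper flower' 'paper flower*')
```

Encoding matters here because the quotes, parentheses, and asterisk in a `bq` expression are not safe to send raw in a URL.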




Result Ranking
           • Use “Compare Rank Expressions”
           • text_relevance is your friend
           • Goals:
                • Boost popular/featured guides
                • Make title/summary matches worth more than item (supplies, step text) matches




min(
    cs.text_relevance({
        "weights": {"title": 2.5, "author": 1.5, "items": 0.1, "summary": 1.5},
        "default_weight": 1
    }),
    1000)
+ min(200, like_count / 10)
+ 100*featured

Offline index updates
           • Extracting guide data to update document is slow
           • Remove update from online web request process
           • Internal-only API endpoints
           • SQS
           • queue_consumer daemon
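
The pattern in the bullets above can be sketched as follows. This is not the real `queue_consumer`: the standard library `queue.Queue` stands in for SQS so the example runs without AWS access, and the function names, the message format, and the `load_guide` / `upload_batch` callables (standing in for the internal-only endpoint and the CloudSearch upload) are all assumptions.

```python
import json
import queue


def enqueue_reindex(q, guide_id):
    """Web request side: only record that a guide changed, then return."""
    q.put(json.dumps({"action": "reindex_guide", "guide_id": guide_id}))


def consume(q, load_guide, upload_batch):
    """Consumer side: drain the queue and upload one document batch."""
    documents = []
    while True:
        try:
            message = json.loads(q.get_nowait())
        except queue.Empty:
            break
        # The slow extraction of guide data happens here, offline.
        documents.append(load_guide(message["guide_id"]))
    if documents:
        upload_batch(documents)
    return len(documents)


# Usage with in-memory stand-ins.
work_queue = queue.Queue()
uploaded_batches = []

enqueue_reindex(work_queue, "guide-1")
enqueue_reindex(work_queue, "guide-2")

processed = consume(
    work_queue,
    load_guide=lambda guide_id: {"type": "add", "id": guide_id},
    upload_batch=uploaded_batches.append,
)
print(processed)  # 2
```

The web request only pays for one small queue write, and batching the drained messages means several guide changes can share a single upload.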




Offline index updates
[Diagram; components: Web server, SQS, Queue consumer, Snapguide DB/Redis, Web server (dedicated to queues), CloudSearch]




Performance
                      SSL is painful




Performance
         but physical proximity (us-west-1) is awesome



Future work
           • Add more domains (users, new features)
           • Search-based suggestion engine
           • Improved ranking/scoring — crawl our social graph




Questions?




    www.snapguide.com
