Amazon Web Services at Mendeley
Dan Harvey
Data Architect



twitter: @danharvey
dan.harvey@mendeley.com
Overview
• What do we do?
• System design
• AWS details
• Future plans
• Summary
Mendeley helps researchers work smarter
1) Install Mendeley Desktop
2) Manage your research papers
 –Automatic data extraction
 –External database integration
 –Automatic bibliography generation
 –Tagging and annotation
3) Mendeley aggregates research data in the cloud
By doing this, Mendeley makes science more
collaborative and transparent
Mendeley in numbers
• 1 million users
• 130 million research articles
 –40 million unique
• 14 million unique files uploaded
 –13 TB in total
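
As a rough back-of-envelope check on these figures, the sketch below derives an average file size and the share of unique articles. It assumes that "TB" means 10**12 bytes and that the 13 TB refers to the 14 million uploaded files; the slide states neither explicitly.

```python
# Back-of-envelope figures derived from the numbers on the slide.
# Assumptions: decimal terabytes, and the 13 TB covers the uploaded files.
TOTAL_BYTES = 13 * 10**12
UNIQUE_FILES = 14_000_000
TOTAL_ARTICLES = 130_000_000
UNIQUE_ARTICLES = 40_000_000

# Average size of one uploaded file, in megabytes (10**6 bytes).
avg_file_mb = TOTAL_BYTES / UNIQUE_FILES / 10**6

# Fraction of article records that remain after de-duplication.
unique_share = UNIQUE_ARTICLES / TOTAL_ARTICLES

print(f"average file size: {avg_file_mb:.2f} MB")
print(f"unique share of articles: {unique_share:.0%}")
```

Under those assumptions the average file is a little under 1 MB, and roughly a third of the article records are unique.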
System Overview
[Diagram: Amazon Web Services (S3, EMR, EC2); web servers (syncing, browsing); docs; usage logs; MySQL; data services; Map Reduce; HBase; HDFS]
File Storage
• Sync to and from clients
 –Backed onto S3

• How to render 13 TB of PDFs?
PDF Previews
• Elastic Beanstalk
• Java servlet
 –Load & render
 –Store into S3
• Quick to prototype
 –Fast iterations
 –No infrastructure to set up
 –Developers in control
 –No upfront cost in hardware
• No dependency on rest of our infrastructure
(Image: © Elastic Beanstalk, Matt Wood, AWS, 2011)
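
The load, render and store steps above can be sketched as follows. This is only an illustration: the talk describes a Java servlet writing to S3, while the sketch uses Python with an in-memory dict standing in for the bucket. The key scheme based on a content hash, the `render` callback and every name here are assumptions, not details from the slides.

```python
import hashlib
from typing import Callable, Dict


def preview_key(pdf_bytes: bytes, page: int) -> str:
    """Build a deterministic storage key from the file contents and page number."""
    digest = hashlib.sha1(pdf_bytes).hexdigest()
    return f"previews/{digest}/page-{page}.png"


def ensure_preview(store: Dict[str, bytes], pdf_bytes: bytes, page: int,
                   render: Callable[[bytes, int], bytes]) -> str:
    """Render and store a preview only if one is not already stored."""
    key = preview_key(pdf_bytes, page)
    if key not in store:
        store[key] = render(pdf_bytes, page)
    return key


# Usage with a stand-in renderer and a dict in place of S3 (both hypothetical).
def fake_render(pdf_bytes: bytes, page: int) -> bytes:
    return b"PNG:" + str(page).encode()


store: Dict[str, bytes] = {}
k1 = ensure_preview(store, b"%PDF-1.4 example", 1, fake_render)
k2 = ensure_preview(store, b"%PDF-1.4 example", 1, fake_render)
```

Because the key depends only on the file contents and the page, a second request for the same page reuses the stored preview instead of rendering again.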
Adapt to take advantage
• Improve delivery
 –CloudFront
 –Faster worldwide

• Re-working for cost saving
 –SQS
 –Spot instances
 –Render when it’s cheapest!
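
The "render when it's cheapest" idea can be sketched as a worker that drains a queue only while the price stays under a ceiling. This is a simplified model, not the actual setup: a `deque` stands in for SQS, `spot_price` is a hypothetical callback, and the prices are invented. In practice the spot market itself decides when spot instances run.

```python
from collections import deque
from typing import Callable, Deque, List


def drain_when_cheap(queue: Deque[str], spot_price: Callable[[], float],
                     max_price: float, render: Callable[[str], None]) -> List[str]:
    """Process queued jobs while the current price stays at or below max_price."""
    done: List[str] = []
    while queue and spot_price() <= max_price:
        job = queue.popleft()
        render(job)
        done.append(job)
    return done


# Hypothetical jobs and hourly prices in USD; the third price exceeds the ceiling.
jobs = deque(["doc-1", "doc-2", "doc-3"])
prices = iter([0.03, 0.04, 0.09])
rendered = drain_when_cheap(jobs, lambda: next(prices), max_price=0.05,
                            render=lambda job: None)
```

Jobs that are not processed stay on the queue, so rendering resumes when the price drops again.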
Article Search
• 40 million papers
• Gives a 40 GB index in Solr

• Variable load

• Moved to EC2
 –Elastic Load Balancer
 –Auto-scale instances
[Chart: twofold variance in traffic over a week]
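
To make the sizing argument concrete, here is a minimal sketch of how an instance count could follow a twofold swing in traffic. The per-instance capacity, the limits and the traffic figures are all hypothetical; the slides give none of them, and this is not the EC2 auto scaling API.

```python
import math


def desired_instances(load_qps: float, per_instance_qps: float,
                      min_instances: int = 1, max_instances: int = 10) -> int:
    """Return enough instances for the load, clamped to the allowed range."""
    needed = math.ceil(load_qps / per_instance_qps)
    return max(min_instances, min(max_instances, needed))


# Hypothetical quiet and busy periods that differ by a factor of two.
quiet = desired_instances(100, per_instance_qps=50)
busy = desired_instances(200, per_instance_qps=50)
```

With these made-up numbers the fleet doubles from 2 to 4 instances at peak, rather than running 4 instances all week.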
Solr Instance Layout
• Master
 –Single instance
 –Matched to indexing load
 –Backed onto EBS
• Slaves
 –HTTP sync to master
 –Pre-built AMI images
 –EC2 auto scaling
[Diagram: Solr master, three Solr slaves, Elastic Load Balancer]
Desktop Client
• Client Downloads
 –From S3
 –Adding CloudFront


• Crash Reports
 –Stack traces into S3
 –Analytic reports on top
 –More focused bug fixing
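
One way the "analytic reports on top" step might work is to group stack traces by their top frames and count each group. The grouping rule, the trace format and the frame names below are assumptions for illustration; the talk does not describe the actual analysis.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Signature = Tuple[str, ...]


def signature(stack_trace: str, depth: int = 2) -> Signature:
    """Use the top `depth` non-empty frames as a grouping signature."""
    frames = [line.strip() for line in stack_trace.splitlines() if line.strip()]
    return tuple(frames[:depth])


def top_crashes(traces: Iterable[str], n: int = 3) -> List[Tuple[Signature, int]]:
    """Return the n most frequent crash signatures with their counts."""
    return Counter(signature(t) for t in traces).most_common(n)


# Hypothetical traces, one frame per line, innermost frame first.
traces = [
    "PdfView::render\nDocument::open\nmain",
    "PdfView::render\nDocument::open\nApp::restore",
    "Sync::push\nNetwork::send\nmain",
]
report = top_crashes(traces)
```

Sorting by frequency puts the crash that affects the most users first, which is what makes the bug fixing more focused.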
The future
• Aim to buy no more hardware

• More Java on Elastic Beanstalk
• SQS - replace queues

• EMR - log analysis
• SimpleDB & S3 for data stores
Problems Faced
• Accounting usage
 –Mix of users on account
 –Start early with this!
 –IAM helps

• Orchestration
 –CloudFormation
 –Elastic Beanstalk
 –Finding we need more
Summary
• Not all or nothing

• Focus on your problem, not “undifferentiated heavy lifting” (Werner Vogels)


• Learn the building blocks provided
• Modular system design helps
Mendeley Binary Battle
• $10,001 prize + $1,000 AWS vouchers
• Collaboration with PLoS
• Prizes to best use of the API

• Judging panel includes
 –Werner Vogels
 –Tim O'Reilly
We’re hiring
     http://mendeley.com/careers/

             or chat to me after

• Lead Mobile Developer, iOS
• Web Developer, PHP/MySQL
• Software Engineer, Java

