Lessons from building a search
engine with Amazon Web Services
            Chirayu Patel
     chirayu@snappyfingers.com
Chirayu Patel
                       Developer
                     SnappyFingers
           Question and Answer Search Engine
                   100% in the cloud



5-Apr-09                CloudCamp - Bangalore
My experiences
                 What worked?
                 What didn’t?
           What could be done better?
                What did I miss?




Recap - AWS?
      •    EC2 – Elastic Compute Cloud
      •    S3 – Simple Storage Service
      •    SQS – Simple Queue Service
      •    SDB – SimpleDB




Why AWS?
      •    Computing Power requirements unknown
      •    Cheap
      •    Availability of multiple services
      •    Easy to implement SnappyFingers architecture
           using AWS services




SnappyFingers
      • Information Retrieval System (IRS)
      • FrontEnd
           – Nothing unique here




Three motivations (behind my decisions)
      • Reluctance to learn
      • Cost Conscious
      • I write buggy code




Architectural Requirements
      •    Loosely Coupled
      •    Scalable
      •    Fault Tolerant
      •    Budget dependent




IRS Architecture
      [Diagram]
      Pipeline (SQS):   Pipe Crawler, Pipe Parser, Pipe Indexer
      Workers (EC2):    Crawler, Parser, Indexer
      Data Store:       EC2 + S3
      Errors:           SDB



Pipes and Pipelines




Pipes and Pipelines
      • Pipes contain jobs
      • A pipeline is a group of pipes
      • Easy to create pipelines and add pipes
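
The relationship between jobs, pipes, and pipelines can be sketched as below. This is only an illustration: the `Pipe` and `Pipeline` classes, their method names, and the in-memory deque standing in for an SQS queue are assumptions, not the SnappyFingers code.

```python
from collections import deque


class Pipe:
    """A named FIFO of jobs (hypothetical in-memory stand-in for an SQS queue)."""

    def __init__(self, name):
        self.name = name
        self._jobs = deque()

    def put(self, job):
        self._jobs.append(job)

    def get(self):
        # Return the oldest job, or None when the pipe is empty.
        return self._jobs.popleft() if self._jobs else None

    def __len__(self):
        return len(self._jobs)


class Pipeline:
    """A named group of pipes."""

    def __init__(self, name):
        self.name = name
        self.pipes = {}

    def add_pipe(self, pipe_name):
        # Adding a pipe is a single call, which keeps pipelines easy to extend.
        pipe = Pipe(pipe_name)
        self.pipes[pipe_name] = pipe
        return pipe


pipeline = Pipeline("irs")
for pipe_name in ("crawler", "parser", "indexer"):
    pipeline.add_pipe(pipe_name)
pipeline.pipes["crawler"].put({"action": "crawl", "url": "http://example.com/"})
```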




Job ORM
      Interfaces: SQS API, SDB API

      class CrawlerJob (JobBase):
          class SDBInterfaceConfig:
              domain_name = settings.CRAWLER_JOB_DOMAIN

          class SQSInterfaceConfig:
              queue_name = settings.CRAWLER_JOB_QUEUE
              timeout = settings.CRAWLER_JOB_TIMEOUT

          class AWSMetaData:
              action = CharField (...)
              url = CharField (...)
              ...
              ...
      Default attributes of each Job:
      • Pipeline Name
      • Status
      • Start Time
      • End Time
      • Id
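
One way a base class could supply the default attributes listed above is sketched here. It is an assumption-laden illustration: this `JobBase`, its `fields` tuple, and the `start` and `to_message` methods are hypothetical and do not reproduce the ORM shown on the slide.

```python
import json
import uuid
from datetime import datetime, timezone


class JobBase:
    """Hypothetical base class that gives every job the default attributes."""

    fields = ()  # subclass-specific attribute names

    def __init__(self, pipeline_name, **values):
        self.id = uuid.uuid4().hex
        self.pipeline_name = pipeline_name
        self.status = "new"
        self.start_time = None
        self.end_time = None
        for name in self.fields:
            setattr(self, name, values.get(name))

    def start(self):
        self.status = "running"
        self.start_time = datetime.now(timezone.utc).isoformat()

    def to_message(self):
        # One flat JSON document can serve as a queue message body or a store record.
        names = ("id", "pipeline_name", "status", "start_time", "end_time")
        names += tuple(self.fields)
        return json.dumps({n: getattr(self, n) for n in names})


class CrawlerJob(JobBase):
    fields = ("action", "url")


job = CrawlerJob("irs", action="crawl", url="http://example.com/")
job.start()
message = json.loads(job.to_message())
```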

Job Processing
      for i in range (num_of_jobs):
          try:
              job = cls.jobclass.sqs_get()  # fetch the next job
              ...                           # process job
          except Exception as e:
              job.job_processing_complete(...)
              fsdebug.mail_admins (..)
              end_transaction(rollback = True)
              job.sdb_save()  # save in error store
          finally:
              job.sqs_del()  # delete the job
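
A self-contained version of the same loop, runnable without AWS, might look like the following. `FakeJob`, its class-level lists, and `process_jobs` are hypothetical stand-ins; only the method names `sqs_get`, `sqs_del`, and `sdb_save` come from the slide. The sketch also guards against an empty queue, which the slide does not show.

```python
class FakeJob:
    """Hypothetical job class that records calls instead of talking to AWS."""

    queue = []        # stands in for the SQS queue
    error_store = []  # stands in for the SimpleDB error domain

    def __init__(self, url):
        self.url = url
        self.status = "new"

    @classmethod
    def sqs_get(cls):
        # Peek at the next job; it stays queued until sqs_del is called.
        return cls.queue[0] if cls.queue else None

    def sqs_del(self):
        FakeJob.queue.remove(self)

    def sdb_save(self):
        FakeJob.error_store.append(self)


def process_jobs(handler, num_of_jobs):
    for _ in range(num_of_jobs):
        job = FakeJob.sqs_get()
        if job is None:
            break  # nothing left to do
        try:
            handler(job)
            job.status = "done"
        except Exception:
            job.status = "error"
            job.sdb_save()  # keep the failed job for inspection
        finally:
            job.sqs_del()   # the job leaves the queue either way


def handler(job):
    if "bad" in job.url:
        raise ValueError("cannot process " + job.url)


FakeJob.queue.extend([FakeJob("http://example.com/ok"),
                      FakeJob("http://example.com/bad")])
process_jobs(handler, num_of_jobs=5)
```

Deleting in `finally` means a failing job is never retried from the queue, so the error store is the only record of it.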



The Good
      • Architecture easy to extend
      • ORM approach is a big time saver
      • Simple to add new services




The Bad
      • Messages may be lost
           – Service Failure
           – SQS deletes messages after 4 days.


      Important: the system should be able to recreate jobs
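
Recreating jobs could be driven by a durable record kept outside the queue. The sketch below assumes such a record exists as a mapping from URL to status and enqueue time; the `records` structure and `jobs_to_recreate` are hypothetical, and the four-day cutoff is the retention period quoted above.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=4)  # retention period quoted on the slide


def jobs_to_recreate(records, now):
    """Return URLs whose queued job has probably expired.

    `records` is a hypothetical durable log: url -> (status, enqueued_at).
    """
    stale = []
    for url, (status, enqueued_at) in records.items():
        if status == "pending" and now - enqueued_at >= RETENTION:
            stale.append(url)
    return sorted(stale)


now = datetime(2009, 4, 5)
records = {
    "http://example.com/a": ("pending", now - timedelta(days=5)),
    "http://example.com/b": ("pending", now - timedelta(hours=1)),
    "http://example.com/c": ("done", now - timedelta(days=9)),
}
lost = jobs_to_recreate(records, now)
```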




Storage




What do we store?
      • Crawler Data – Web Pages
      • Extracted Content – Questions/Answers
      • Backups




Storage Structure

                        Meta Data                        Key + Value


           Postgres                                 S3




ORM
      • Extended Django ORM to support S3

      class S3WebPage (S3Model):
          _allowed_attrs = ["url", "content", ..]
          _name = "S3WebPage"
          ...
          ...
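
A rough idea of how a key/value model with an attribute whitelist could work is sketched below. Everything apart from the names `S3Model`, `S3WebPage`, `_allowed_attrs`, and `_name` is an assumption: a plain dict stands in for the S3 bucket, and `save`, `load`, and the key layout are hypothetical rather than the Django ORM extension itself.

```python
import pickle


class S3Model:
    """Minimal key/value model; a dict stands in for the S3 bucket."""

    _allowed_attrs = []
    _name = "S3Model"
    _bucket = {}  # hypothetical in-memory bucket shared by all models

    def __init__(self, key, **attrs):
        unknown = set(attrs) - set(self._allowed_attrs)
        if unknown:
            raise AttributeError("unexpected attributes: %s" % sorted(unknown))
        self.key = key
        self.attrs = attrs

    def _s3_key(self):
        return "%s/%s" % (self._name, self.key)

    def save(self):
        # Serialize the attribute dict to bytes, as one would before a PUT.
        self._bucket[self._s3_key()] = pickle.dumps(self.attrs)

    @classmethod
    def load(cls, key):
        obj = cls.__new__(cls)
        obj.key = key
        obj.attrs = pickle.loads(cls._bucket[obj._s3_key()])
        return obj


class S3WebPage(S3Model):
    _allowed_attrs = ["url", "content"]
    _name = "S3WebPage"


page = S3WebPage("page-1", url="http://example.com/", content="<html></html>")
page.save()
restored = S3WebPage.load("page-1")
```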




The Good
      • Extremely scalable
      • Possible to store Python objects in S3
      • Latency issues can be solved by using a
        caching layer
      • No need to back up S3 data
      • Storage is cheap
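
The caching-layer point can be illustrated with a read-through cache in front of the store. This is a sketch only: `FAKE_S3`, `fetch_from_s3`, and `cached_get` are hypothetical names, and the standard-library `functools.lru_cache` is used here as a convenient in-process cache, not as the cache the author used.

```python
from functools import lru_cache

FAKE_S3 = {"S3WebPage/page-1": b"<html></html>"}  # stand-in for a bucket
fetch_count = 0  # counts how often the slow path is taken


def fetch_from_s3(key):
    """Pretend to perform a slow GET against S3."""
    global fetch_count
    fetch_count += 1
    return FAKE_S3[key]


@lru_cache(maxsize=1024)
def cached_get(key):
    # Only the first request for a key reaches the backing store.
    return fetch_from_s3(key)


first = cached_get("S3WebPage/page-1")
second = cached_get("S3WebPage/page-1")
```

A cache like this also reduces the number of billable GET requests, at the price of possibly serving stale objects after an update.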



The Bad
      • Postgres + S3 is not an elegant solution
           – Periodic syncing of Postgres and S3 required
      • High transaction costs
           – $0.01 per 1,000 PUT, COPY, POST, or LIST requests
           – $0.01 per 10,000 GET requests
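
To see how these request prices add up, a small calculation helps. The prices are the ones quoted above; the request volumes and the `request_cost` helper are made up for illustration.

```python
from decimal import Decimal

PUT_PRICE_PER_1000 = Decimal("0.01")   # PUT, COPY, POST, or LIST
GET_PRICE_PER_10000 = Decimal("0.01")  # GET


def request_cost(puts, gets):
    """Return the request charges in dollars for the given request counts."""
    return puts * PUT_PRICE_PER_1000 / 1000 + gets * GET_PRICE_PER_10000 / 10000


# Hypothetical volumes: one million writes and ten million reads.
cost = request_cost(puts=1_000_000, gets=10_000_000)
print(cost)
```

At these rates a write costs ten times as much as a read, so a crawler that stores every page individually pays mostly for PUT requests.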




Computing




EC2 – The Good
      • Computing needs are not constant
      • Data transfer to other AWS services is free
      • AMIs per node type




The Bad
      • Missed having a nerve center
           – Budget
           – Job Load
           – CPU load
      • Low-cost 64-bit servers are not available
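
A nerve center of the kind described could combine the three signals (budget, job load, CPU load) into one scaling decision. The function below is purely a thought experiment: `desired_instances`, its parameters, and every threshold are hypothetical, and prices are expressed in whole cents to avoid floating-point rounding.

```python
def desired_instances(current, queued_jobs, avg_cpu,
                      price_cents_per_hour, budget_cents_per_hour,
                      jobs_per_instance=100, cpu_high=0.8):
    """Return how many instances to run; all thresholds are illustrative."""
    # Job load: enough instances to drain the queue (ceiling division).
    wanted = max(1, -(-queued_jobs // jobs_per_instance))
    # CPU load: add one instance when the current fleet is running hot.
    if avg_cpu > cpu_high:
        wanted = max(wanted, current + 1)
    # Budget: never run more instances than the hourly budget allows.
    affordable = budget_cents_per_hour // price_cents_per_hour
    return max(0, min(wanted, affordable))


plan = desired_instances(current=2, queued_jobs=450, avg_cpu=0.5,
                         price_cents_per_hour=10, budget_cents_per_hour=100)
```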




Thank You

How to Troubleshoot Apps for the Modern Connected Worker
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 

Lesson from Building a Search Engine using the cloud

  • 1. Lessons from building a search engine with Amazon Web Services
      Chirayu Patel
      chirayu@snappyfingers.com
  • 2. Chirayu Patel
      Developer, SnappyFingers
      Question and Answer Search Engine
      100% in the cloud
      5-Apr-09 CloudCamp - Bangalore
  • 3. My experiences
      What worked?
      What didn’t?
      What could be done better?
      What did I miss?
  • 4. Recap - AWS?
      • EC2 – Elastic Compute Cloud
      • S3 – Simple Storage Service
      • SQS – Simple Queue Service
      • SDB – SimpleDB
  • 5. Why AWS?
      • Computing Power requirements unknown
      • Cheap
      • Availability of multiple services
      • Easy to implement SnappyFingers architecture using AWS services
  • 6. SnappyFingers
      • Information Retrieval System (IRS)
      • FrontEnd
          – Nothing unique here
  • 7. Three motivations (behind my decisions)
      • Reluctance to learn
      • Cost Conscious
      • I write buggy code
  • 8. Architectural Requirements
      • Loosely Coupled
      • Scalable
      • Fault Tolerant
      • Budget dependent
  • 9. IRS Architecture
      [Diagram] Pipeline (SQS): Pipe Crawler, Pipe Parser, Pipe Indexer. EC2: Crawler, Parser, Indexer. Data Store (EC2 + S3). Errors (SDB).
  • 10. Pipes and Pipelines
  • 11. Pipes and Pipelines
      • Pipes contain jobs
      • A pipeline is a group of pipes
      • Easy to create pipelines and add pipes
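
The relationship on this slide can be sketched in memory, assuming a pipe is a FIFO queue of jobs and a pipeline is a named group of pipes. The `Pipe` and `Pipeline` classes, their methods, and `build_irs_pipeline` are hypothetical names for illustration; the deck backs pipes with SQS, which is replaced here by `collections.deque`.

```python
from collections import deque


class Pipe:
    """A named FIFO queue of jobs (in-memory stand-in for an SQS queue)."""

    def __init__(self, name):
        self.name = name
        self._jobs = deque()

    def put(self, job):
        self._jobs.append(job)

    def get(self):
        # Returns None when the pipe is empty instead of raising.
        return self._jobs.popleft() if self._jobs else None

    def __len__(self):
        return len(self._jobs)


class Pipeline:
    """A named group of pipes, kept in the order they were added."""

    def __init__(self, name):
        self.name = name
        self.pipes = {}

    def add_pipe(self, pipe_name):
        pipe = Pipe(pipe_name)
        self.pipes[pipe_name] = pipe
        return pipe


def build_irs_pipeline():
    # Stage names follow the crawler/parser/indexer pipes in the deck.
    pipeline = Pipeline("irs")
    for stage in ("crawler", "parser", "indexer"):
        pipeline.add_pipe(stage)
    return pipeline
```

Adding a stage is one `add_pipe` call, which is the sense in which pipelines are easy to extend in this sketch.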
  • 12. Job ORM
      [Diagram labels: SQS API, SDB API]

      class CrawlerJob(JobBase):
          class SDBInterfaceConfig:
              domain_name = settings.CRAWLER_JOB_DOMAIN
          class SQSInterfaceConfig:
              queue_name = settings.CRAWLER_JOB_QUEUE
              timeout = settings.CRAWLER_JOB_TIMEOUT
          class AWSMetaData:
              action = CharField(...)
              url = CharField(...)
              ...
              ...

      Default attributes of each Job:
      • Pipeline Name
      • Status
      • Start Time
      • End Time
      • Id
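
One way a base class could read the nested configuration classes is sketched below. This is not the SnappyFingers implementation: `QUEUES` and `DOMAINS` are in-memory stand-ins for SQS and SimpleDB, `sqs_put` is a hypothetical method, and the queue and domain names are made up. Only `JobBase`, `CrawlerJob`, the two config class names, `sdb_save`, and the default attributes come from the slides.

```python
import uuid

# In-memory stand-ins for SQS queues and SimpleDB domains (assumption).
QUEUES = {}
DOMAINS = {}


class JobBase:
    """Looks up queue/domain names on the nested config classes of a subclass."""

    class SQSInterfaceConfig:
        queue_name = "default-queue"

    class SDBInterfaceConfig:
        domain_name = "default-domain"

    def __init__(self, pipeline_name, **attrs):
        # Default attributes listed on the slide.
        self.id = uuid.uuid4().hex
        self.pipeline_name = pipeline_name
        self.status = "new"
        self.start_time = None
        self.end_time = None
        self.attrs = attrs

    def sqs_put(self):
        # Hypothetical helper: enqueue on the queue named by the subclass.
        QUEUES.setdefault(self.SQSInterfaceConfig.queue_name, []).append(self)

    def sdb_save(self):
        # Store the job in the domain named by the subclass, keyed by id.
        DOMAINS.setdefault(self.SDBInterfaceConfig.domain_name, {})[self.id] = self


class CrawlerJob(JobBase):
    class SQSInterfaceConfig:
        queue_name = "crawler-jobs"  # made-up name

    class SDBInterfaceConfig:
        domain_name = "crawler-job-errors"  # made-up name
```

Because the configuration lives on the class, adding a new job type only requires a new subclass with its own queue and domain names.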
  • 13. Job Processing

      for i in range(num_of_jobs):
          try:
              job = cls.jobclass.sqs_get()
              # process job
              ...
          except Exception as e:
              job.job_processing_complete(…)
              fsdebug.mail_admins(..)
              end_transaction(rollback=True)
              job.sdb_save()  # save in error store
          finally:
              job.sqs_del()  # delete the job
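
A runnable, in-memory sketch of the same loop follows. It assumes a `deque` in place of the SQS queue and a plain list in place of the SDB error store; `process_jobs`, `handler`, and `error_store` are hypothetical names. The job is looked up before the `try` block so that the error and cleanup paths never touch an unbound name.

```python
from collections import deque


def process_jobs(queue, error_store, handler, num_of_jobs):
    """Process up to num_of_jobs jobs; failed jobs are recorded in error_store.

    queue is a deque of jobs, error_store is a list, and handler is a
    callable that may raise. Returns the number of jobs handled without error.
    """
    succeeded = 0
    for _ in range(num_of_jobs):
        if not queue:
            break  # nothing left to fetch
        job = queue[0]  # peek: the job stays queued while it is processed
        try:
            handler(job)
            succeeded += 1
        except Exception as exc:
            # Record the failure so the job can be inspected or recreated.
            error_store.append((job, repr(exc)))
        finally:
            queue.popleft()  # delete the job whether or not it failed
    return succeeded
```

Deleting in `finally` keeps one bad job from blocking the queue, at the cost of relying on the error store to recover it.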
  • 14. The Good
      • Architecture easy to extend
      • ORM approach is a big time saver
      • Simple to add new services
  • 15. The Bad
      • Messages may be lost
          – Service failure
          – SQS deletes messages after 4 days
      Important: the system should be able to recreate jobs
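
The deck does not say how jobs are recreated. One possible approach, sketched under the assumption that the system keeps a durable list of URLs it expects to process, is to diff that list against completed and still-queued work. `recreate_missing_jobs` and the `"crawl"` action value are hypothetical; `action` and `url` are the job attributes shown on the Job ORM slide.

```python
def recreate_missing_jobs(expected_urls, completed_urls, queued_urls):
    """Return job dicts for URLs that are neither completed nor queued."""
    done_or_pending = set(completed_urls) | set(queued_urls)
    return [
        {"action": "crawl", "url": url}
        for url in expected_urls
        if url not in done_or_pending
    ]
```

Run periodically, a check like this would re-enqueue work whose message expired or was lost to a service failure.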
  • 16. Storage
  • 17. What do we store?
      • Crawler Data – Web Pages
      • Extracted Content – Questions/Answers
      • Backups
  • 18. Storage Structure
      [Diagram] Meta Data: Postgres. Key + Value: S3.
  • 19. ORM
      • Extended Django ORM to support S3

      class S3WebPage(S3Model):
          _allowed_attrs = ["url", "content", ..]
          _name = "S3WebPage"
          ...
          ...
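
A minimal sketch of what an `S3Model` base class might do with `_allowed_attrs` and `_name` is shown below. It is not the author's Django extension: `FAKE_BUCKET` is a dict standing in for an S3 bucket, and `save`, `load`, and the key layout are assumptions. Attributes are serialized with `pickle`, which is one way to store Python objects as S3 values.

```python
import pickle

# Dict standing in for an S3 bucket: key -> bytes (assumption).
FAKE_BUCKET = {}


class S3Model:
    _allowed_attrs = []
    _name = "S3Model"

    def __init__(self, key, **attrs):
        # Reject attributes the model does not declare.
        unknown = set(attrs) - set(self._allowed_attrs)
        if unknown:
            raise AttributeError("unknown attributes: %s" % sorted(unknown))
        self.key = key
        self.attrs = attrs

    def _s3_key(self):
        # Hypothetical key layout: "<model name>/<object key>".
        return "%s/%s" % (self._name, self.key)

    def save(self):
        FAKE_BUCKET[self._s3_key()] = pickle.dumps(self.attrs)

    @classmethod
    def load(cls, key):
        obj = cls(key)
        obj.attrs = pickle.loads(FAKE_BUCKET[obj._s3_key()])
        return obj


class S3WebPage(S3Model):
    _allowed_attrs = ["url", "content"]
    _name = "S3WebPage"
```

Unpickling is only safe for data the system wrote itself, so a real store would need to trust the bucket contents.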
  • 20. The Good
      • Extremely scalable
      • Possible to store Python objects in S3
      • Latency issues can be solved by using a caching layer
      • No need to backup S3 data
      • Storage is cheap
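
The caching layer mentioned above is not described in the deck. A small read-through LRU cache is one common shape for it, sketched here with hypothetical names (`ReadThroughCache`, `fetch`, `max_items`); `fetch` stands for whatever slow call retrieves a value, such as an S3 GET.

```python
from collections import OrderedDict


class ReadThroughCache:
    """Small LRU cache in front of a slow fetch function."""

    def __init__(self, fetch, max_items=128):
        self._fetch = fetch
        self._max_items = max_items
        self._items = OrderedDict()
        self.misses = 0

    def get(self, key):
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            return self._items[key]
        self.misses += 1
        value = self._fetch(key)  # the slow call
        self._items[key] = value
        if len(self._items) > self._max_items:
            self._items.popitem(last=False)  # evict least recently used
        return value
```

Repeated reads of the same key then cost one fetch, which also reduces the number of billed GET requests.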
  • 21. The Bad
      • Postgres + S3 is not an elegant solution
          – Periodic syncing of Postgres and S3 required
      • High transaction costs
          – $0.01 per 1,000 PUT, COPY, POST, or LIST requests
          – $0.01 per 10,000 GET requests
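
To see how the request prices add up, the sketch below computes the request cost for a workload using the two rates quoted on the slide. The function name and the example workload of one million writes and one million reads are made up for illustration; `Decimal` avoids floating-point rounding in the dollar amounts.

```python
from decimal import Decimal

# Prices as quoted on the slide (2009).
WRITE_PRICE_PER_1000 = Decimal("0.01")  # PUT, COPY, POST, or LIST
GET_PRICE_PER_10000 = Decimal("0.01")


def s3_request_cost(write_requests, get_requests):
    """Return the request cost in dollars for the given request counts."""
    write_cost = Decimal(write_requests) / 1000 * WRITE_PRICE_PER_1000
    get_cost = Decimal(get_requests) / 10000 * GET_PRICE_PER_10000
    return write_cost + get_cost


# Hypothetical workload: 1,000,000 writes and 1,000,000 reads.
example_cost = s3_request_cost(1000000, 1000000)
```

At these rates a write request costs ten times as much as a read, so a crawler that stores every fetched page is dominated by the write charge.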
  • 22. Computing
  • 23. EC2 – The Good
      • Computing needs are not constant
      • Data transfer to other AWS services is free
      • AMIs per node type
  • 24. The Bad
      • Missed having a nerve center
          – Budget
          – Job load
          – CPU load
      • Low-cost 64-bit servers are not available