Databases in the Cloud
     Seminar: Big Data Analytics
                Winter Semester 2012-13



             Moshfiqur Rahman
Agenda
       Cloud Computing Introduction
       Big Data
       RDBMS and Cloud Databases
       Scalability, Elasticity, Availability – New attributes for
        databases
           In RDBMS
           In Cloud Databases
       Challenges in Cloud Databases
       Big Data Analytics in Cloud
       Conclusion

    2
Cloud Computing Introduction
       What is in the Cloud?
           Application as a service
           Hardware and system software
       Public and Private Cloud
       Cloud Computing attributes
           Virtually infinite computing resources
           Start small and grow as needed
           Pay-per-use scheme




    3
Cloud Computing Introduction
       [Figure not reproduced. Source: http://en.wikipedia.org/wiki/Cloud_computing]

4
Big Data
       Large and complex data sets
           Exponential growth
           Structured, semi-structured, unstructured
           Hard to process in a traditional database system
       Challenges with Big Data
           Capture, scrutinization, storage, search, sharing, analysis…
       Big Data sources
           Mobile devices
           Sensors
           Software/server logs
           Cameras and so on…

    5
Big Data Attributes
       Volume
           Many factors contribute to the growth of data, for example, text streams
            from social networks
           Hidden relationships in data
           Data storage cost decreased but data analysis issues increased!
       Variety
           Data can be of all possible formats
           Structured/semi-structured data from RDBMS
           Unstructured data from documents, emails, video, audio, sensors
       Velocity
           Data processing speed must keep up with data production speed
           Streams of real-time data from sensors and social media
           Reacting quickly to increases in data velocity

    6
Relational Database
       A relational database is
           Collection of tables (entities)
               Multiple columns
               Multiple rows (tuples)
       Accessed by SQL
       Join multiple tables to get related data
       Normalization is used to minimize redundancy and
        dependency
       Referential Integrity is used to ensure data consistency
       Managed by Relational Database Management System
        (RDBMS)
       Oracle, MS SQL Server, MySQL, etc.

    7
RDBMS - A misfit for cloud?
       RDBMS has
           Simplicity
           Robustness
           Flexibility
           Performance
           Compatibility
           (Limited) Scalability
       Cloud databases require
           Scalability
           Elasticity
           Availability

    8
Cloud Databases
       Key/value store – a new kind of database management
        system
           Store data as key/value pair
           Targeted at specialized applications where an RDBMS is not
            suitable
       Also known as
           Document-oriented database
           Internet-facing database
           Attribute-oriented database
           Distributed database, etc.



    9
Cloud Databases - Advantages
    Stores data as items
        Customer items, Order items in an e-commerce system
        A single item contains all the relevant data
    Relationships are not eliminated, just simplified
        An Order item contains the keys of the associated Customer item
         and Product items
    Able to scale easily and dynamically
        Allows the user to pay only for used resources
        Allows the vendor to scale their infrastructure depending on
         their entire platform size
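
As a minimal sketch of the item model described above, the following uses a plain Python dict as a stand-in for a key/value store. The key names (`customer:1`, `order:100`) and the `put`/`get` helpers are hypothetical; a real cloud database exposes its own vendor-specific API.

```python
# Hypothetical in-memory key/value store; a dict stands in for the
# vendor-specific API of a real cloud database.
store = {}

def put(key, item):
    store[key] = item

def get(key):
    return store[key]

put("customer:1", {"name": "Alice"})
put("product:7", {"title": "Keyboard", "price": 49.0})
put("product:9", {"title": "Mouse", "price": 19.0})
# The Order item carries the keys of its related items instead of
# relying on a join across normalized tables.
put("order:100", {"customer_key": "customer:1",
                  "product_keys": ["product:7", "product:9"]})

def order_total(order_key):
    """Follow the stored keys to the Product items and sum their prices."""
    order = get(order_key)
    return sum(get(key)["price"] for key in order["product_keys"])

print(order_total("order:100"))  # 68.0
```

Each lookup is a single-key access, which is what lets such a store spread items over many nodes.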



    10
Cloud Databases - Advantages
    Reduces development time
        Less time is spent on object-relational mapping
        Application objects map more easily to key/value database items




    11
Cloud Databases - Disadvantages
    Relationships are not defined in the data model
        The DBMS cannot enforce data integrity
        Deleting an item from a set of related items can leave the data
         inconsistent
    No shared standard
        Each vendor offers a totally different set of APIs
        An application developed for one cloud vendor is hard to port to
         another cloud vendor
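
The integrity problem can be illustrated with a small sketch, again assuming a plain dict as the store and a hypothetical `customer_key` field. Nothing prevents the delete, so the application has to detect the dangling reference itself.

```python
# Hypothetical store contents; "customer_key" is an illustrative field name.
store = {
    "customer:1": {"name": "Alice"},
    "order:100": {"customer_key": "customer:1"},
}

def dangling_references(store):
    """Return (item_key, missing_key) pairs for references to absent items."""
    missing = []
    for key, item in store.items():
        ref = item.get("customer_key")
        if ref is not None and ref not in store:
            missing.append((key, ref))
    return missing

before = dangling_references(store)
del store["customer:1"]  # accepted: no foreign-key constraint stops it
after = dangling_references(store)

print(before)  # []
print(after)   # [('order:100', 'customer:1')]
```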




    12
Scalability
    Desired property of a system to accommodate growing
     amounts of work
        By adding more hardware to a single machine
        By adding more machines (a.k.a. nodes)
    Two ways to scale a system
        Vertically or Scale Up
            New hardware is added to a single node in a system
            Adding more processors or memory to a single machine
        Horizontally or Scale Out
            Add more nodes to a system
            Scaling out from one web-server system to a three web-server
             system

    13
Elasticity
    Ability to spread the workload dynamically over the
     available resources
        Automatically adds more resources when the workload increases
        Automatically shrinks back and releases unneeded
         resources when the workload decreases
    Very important in a cloud environment
        Pay-per-use scheme
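
A rough sketch of why elasticity matters under pay-per-use: the node count follows the load instead of being fixed at the peak. The load figures and the capacity of 500 requests per second per node are made-up assumptions for illustration.

```python
import math

def desired_nodes(load, capacity_per_node, min_nodes=1):
    """Nodes needed to serve `load` requests/s at `capacity_per_node` each."""
    return max(min_nodes, math.ceil(load / capacity_per_node))

# Assumed load per hour (requests/s) and per-node capacity.
hourly_load = [120, 900, 2400, 700, 80]
plan = [desired_nodes(load, capacity_per_node=500) for load in hourly_load]

elastic_node_hours = sum(plan)             # pay only for what is used
static_node_hours = max(plan) * len(plan)  # provisioned for the peak all day

print(plan)                # [1, 2, 5, 2, 1]
print(elastic_node_hours)  # 11
print(static_node_hours)   # 25
```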




    14
Availability
    Allows users to read and write data at any time without
     being blocked
    Response time is virtually constant and does not depend
     on
        Number of concurrent users
        Database size
        Any other system parameter
    Automatic data backups and failover management




    15
Scalability in RDBMS
    RDBMS provides limited scalability
        Scale up on a single node
        Scale out with relatively small numbers of nodes
    Scale up is finite, but the increase in workload can be
     virtually unlimited
    Scale out becomes unmanageable in systems with hundreds or
     thousands of nodes




    16
Elasticity in RDBMS
    RDBMS allows very limited elasticity at the storage and
     web/application server layers
        Add a web server when the workload increases and rebalance
         the load onto the new server
        When the workload decreases, detach the server from the system
         and use it for other purposes
        At the storage layer, more disks can be added
    Replacing an overloaded database server with a bigger
     machine is
        An expensive investment
        An unnecessary investment for a seasonal peak


    17
Availability in RDBMS
    Employs storage redundancy through data
     replication
        Also improves performance for concurrent users
        Provides resiliency in case of a failure
    Data replication is not an easy process
        Replicas must be kept synchronized
        Replicating the whole database makes synchronization easier




    18
Scalability, Elasticity and Availability in
Cloud Databases
    New breed of databases focusing on scalability, elasticity
     and availability
        Key/value stores support nearly limitless scalability
        At the expense of other benefits that come with an RDBMS
    Data is accessed by a single key
        Provides the basis for scalability
        A data item is contained in a single object and handled by a
         single node
    Some modern applications need to access multiple key/value
     pairs atomically
        Online multi-player games, Google Drive
        Hence they require multi-key atomicity
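
To make the single-key argument concrete, here is a minimal sketch of hash-based routing, assuming a fixed number of nodes and hypothetical key names. A single-key operation always lands on exactly one node, while a multi-key operation may span several, which is why multi-key atomicity needs extra machinery.

```python
import hashlib

def node_for(key, num_nodes):
    """Map a key to a node index using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes

def nodes_touched(keys, num_nodes):
    """Set of nodes an operation over `keys` would have to involve."""
    return {node_for(key, num_nodes) for key in keys}

single = nodes_touched(["player:1"], num_nodes=4)
multi = nodes_touched(["player:1", "player:2", "player:3"], num_nodes=4)

print(len(single))  # 1: one key always maps to one node
print(sorted(multi))  # possibly several nodes, depending on the hashes
```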

    19
Scalability, Elasticity and Availability in
Cloud Databases
    Different database implementations
        Google’s MegaStore
        G-Store
        Relational Cloud
        ElasTras




    20
MegaStore
    Uses Bigtable as the underlying system
    Provides multi-key atomicity
        Data Fusion
        Groups multiple key/value pairs into a single collection
        Write-ahead logging
        Two-phase commit to support ACID transactions on a
         collection
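
The idea of logging a batch of writes for a group of keys before applying it can be sketched as follows. This is a toy illustration of write-ahead logging under simplifying assumptions (one process, an in-memory list as the log); it is not MegaStore's actual protocol, and the `EntityGroup` class and its methods are hypothetical names.

```python
class EntityGroup:
    """Toy collection of key/value pairs updated atomically via a log."""

    def __init__(self):
        self.data = {}
        self.log = []  # write-ahead log: one entry per committed batch

    def commit(self, writes):
        # 1. Record the whole batch in the log before touching the data.
        self.log.append(dict(writes))
        # 2. Apply the batch; after a crash it could be replayed from the log.
        self.data.update(writes)

    @classmethod
    def recover(cls, log):
        """Rebuild the state by replaying logged batches in order."""
        group = cls()
        for batch in log:
            group.commit(batch)
        return group

group = EntityGroup()
group.commit({"user:1/name": "Alice", "user:1/photo:1": "beach.jpg"})
group.commit({"user:1/photo:2": "city.jpg"})

recovered = EntityGroup.recover(group.log)
print(recovered.data == group.data)  # True
```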




    21
MegaStore
    Advantages
        Allows entities to be arbitrarily distributed over multiple nodes
        Better performance when an entity group is co-located on a single
         node
    Disadvantages
        Exhibits performance issues when an entity group is distributed
         across multiple nodes




    22
G-Store
    Provides transactional multi-key access over dynamic,
     non-overlapping groups of keys
    Created groups are transient in nature
    Creates abstract groups for on-demand transactional access
        Leader key, follower keys
        Ownership of read/write access transfers to the node hosting
         the key group
        No key may be claimed by multiple groups, and no key may
         be without an owner
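
The non-overlap rule for key groups can be sketched with a small bookkeeping class. `KeyGroupManager` and its methods are hypothetical names for illustration, and the sketch ignores the distributed ownership-transfer protocol entirely; it only shows that a key can belong to at most one group at a time and is released when the transient group is deleted.

```python
class KeyGroupManager:
    """Tracks which transient group, if any, currently owns each key."""

    def __init__(self):
        self.owner = {}   # key -> leader key of the owning group
        self.groups = {}  # leader key -> set of member keys

    def create_group(self, leader, followers):
        members = {leader, *followers}
        taken = sorted(key for key in members if key in self.owner)
        if taken:
            # Groups must not overlap: refuse before changing any state.
            raise ValueError(f"keys already grouped: {taken}")
        for key in members:
            self.owner[key] = leader
        self.groups[leader] = members

    def delete_group(self, leader):
        # Groups are transient: deleting one releases all of its keys.
        for key in self.groups.pop(leader):
            del self.owner[key]

manager = KeyGroupManager()
manager.create_group("player:1", ["player:2", "player:3"])
try:
    manager.create_group("player:4", ["player:3"])
    conflict = None
except ValueError as error:
    conflict = str(error)

print(conflict)  # keys already grouped: ['player:3']
```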




    23
G-Store
    Advantages
        Transactions are efficient because a key group resides on a single node
    Disadvantages
        A group must be small enough to reside on a single node




    24
Relational Cloud
    Focuses extensively on elasticity
    Uses a graph-based partitioning method to split large
     databases across multiple machines
        Workload-aware partitioning strategy
    Frontend transaction trace component
        Keeps track of transactions
        Analyzes the transactions to determine the sets of tuples
         accessed together
        Creates a graph from the transaction trace
        Weights on the edges denote how often the connected tuples
         are accessed together by transactions
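
A minimal sketch of building such a graph from a trace, assuming that nodes are tuples and that an edge weight counts how many transactions touched both of its endpoints. The trace contents and tuple identifiers are invented for illustration; a partitioner would then try to cut the graph along low-weight edges.

```python
from collections import Counter
from itertools import combinations

def co_access_graph(trace):
    """Count, for every pair of tuples, the transactions that touch both."""
    edges = Counter()
    for accessed in trace:
        for a, b in combinations(sorted(set(accessed)), 2):
            edges[(a, b)] += 1
    return edges

# Each entry lists the tuples one transaction accessed (made-up data).
trace = [
    ["acct:1", "acct:2"],
    ["acct:1", "acct:2"],
    ["acct:2", "acct:3"],
    ["acct:4"],
]
graph = co_access_graph(trace)

print(graph[("acct:1", "acct:2")])  # 2
print(graph[("acct:2", "acct:3")])  # 1
```

Here `acct:1` and `acct:2` would be placed on the same machine, since separating them would turn two transactions into distributed ones.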

    25
Relational Cloud
    Advantages
        Uses MySQL and PostgreSQL as backend databases
        Migrates database partitions without causing downtime
        Replicates the data for availability
    Disadvantages
        Scaling the graph representation is difficult as it leads to a
         graph with N nodes and up to N^2 edges for an N-tuple
         database




    27
ElasTras
    A cloud database under research providing better
     scalability and elasticity with transactional data access




                        Figure: System overview of ElasTras

    28
ElasTras
    Two-level Transaction Manager (TM)
        Higher level TM (HTM)
        Owning TM (OTM)
    When a transaction request arrives
        The load balancer applies a load-balancing policy and forwards the
         request to an appropriate HTM
        The HTM decides whether to execute the transaction locally or
         forward it to an OTM
        An OTM has exclusive access rights to the data accessed by a
         single transaction
    System state information and database metadata are managed
     by the Metadata Manager

    29
ElasTras
    Two approaches to partitioning the database
        Static Partitioning
        Dynamic Partitioning
    Static Partitioning
        The database designer defines the partitioning
        ElasTras is responsible for mapping the partitions to their
         specific OTMs
        Also reassigns partitions if the workload increases
        The application has knowledge of the partitions
        ElasTras can provide ACID transactional guarantees because
         transactions execute locally within a partition


    30
ElasTras
    Dynamic Partitioning
        Basis for the elasticity of the data store
        Uses a range-based or hash-based partitioning scheme
        Applications are not aware of the partitions
        Transactions are not guaranteed to be limited to a single
         partition
        Provides mini transactions with restricted transactional
         semantics to ensure scalability and avoid distributed
         transactions
        Mini transactions ensure recovery but not global
         synchronization
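
Range-based partitioning can be sketched with a sorted list of split points. The `RangePartitioner` class and the split points below are hypothetical and not taken from ElasTras; the sketch only shows how a key is mapped to a partition and how adding a split point creates a new partition without the application knowing.

```python
import bisect

class RangePartitioner:
    """Maps keys to partitions delimited by sorted split points."""

    def __init__(self, split_points):
        self.split_points = sorted(split_points)

    def partition_of(self, key):
        # Keys below the first split point go to partition 0, and so on.
        return bisect.bisect_right(self.split_points, key)

    def split(self, new_point):
        # Splitting a range adds a partition, for example when one grows hot.
        bisect.insort(self.split_points, new_point)

partitioner = RangePartitioner(["g", "p"])
before = [partitioner.partition_of(k) for k in ("apple", "grape", "zebra")]
partitioner.split("t")
after = [partitioner.partition_of(k) for k in ("apple", "grape", "zebra")]

print(before)  # [0, 1, 2]
print(after)   # [0, 1, 3]
```

A hash-based scheme would replace `partition_of` with a hash of the key modulo the partition count, trading range scans for a more even spread.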


    31
ElasTras
    Advantages
        Provides transactional guarantees in a scalable manner
        The ability to reassign partitions among OTMs as the workload
         changes ensures elasticity and scalability
        Provides ACID transactions when transactions are limited to a
         single partition
    Disadvantages
        In dynamic partitioning, ElasTras only supports mini
         transactions with restricted transactional semantics to avoid
         distributed transactions
        Mini transactions only ensure recovery but not global
         synchronization

    32
Challenges in Cloud Databases
    Importing data
        Data transport is complex and may incur huge costs
    Auto failover management
        Server crashes, hardware malfunction
        The database must be replicated so that a replica automatically
         takes over if any failure occurs
    Auto scalability and elasticity management
        Scale both throughput and size instantly and automatically
        Very granular increases and decreases in resources




    33
Big Data Analytics
    Big Data Analytics
        Process of analyzing huge amounts of structured,
         semi-structured and unstructured data of various types
        Discovers hidden patterns and unknown correlations in
         data
    Companies are interested in big data analytics to gain a
     competitive advantage over rival companies
        Through effective marketing
        By offering innovative new services




    34
Big Data Analytics
    Big data analytics helps companies make better business
     decisions
    Traditional analytics software is available for data analysis
        Advanced technologies such as predictive analysis, data mining, etc.
    But traditional analytics software
        is not suitable for big data with semi-structured and/or unstructured
         data
        cannot deliver the processing power needed to
         analyze big data
    A new class of big data analytics environments has emerged
        NoSQL databases
        Hadoop
        MapReduce


    35
Big Data Analytics in Cloud
    Databases available as a service in the cloud
        Amazon SimpleDB
        Google AppEngine
        Microsoft SQL Azure
        and so on…
    Limitations in the cloud
        Limits on query execution time, for example, Amazon
         SimpleDB rejects any query that takes more than 5 sec
        Limits on result set size, for example, Google
         AppEngine does not allow users to retrieve more than 1000
         items per query
    These limits make such services impractical for big data analytics

    36
Big Data Analytics in Cloud
    Specialized solutions for big data analytics in the cloud
        Google BigQuery
        Amazon Elastic MapReduce (EMR)




    37
Google BigQuery
    Cloud-based interactive query service for big data
    Implementation of Dremel, a parallel query engine
    Queries execute on a small number of very large
     append-only tables
    Two core technologies
        Columnar storage
            Each record is split into its column values
            The values of each column are stored on a different storage volume
        Tree architecture
            A query is pushed down through the branches of the tree
            Results are aggregated back up from the leaves
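
Both ideas can be sketched in a few lines, assuming toy log records and a two-level tree. This is only an illustration of the principle, not how Dremel or BigQuery are implemented: column values are stored separately so a query reads only the columns it needs, and partial results computed at the leaves are combined at the root.

```python
# Made-up records for illustration.
records = [
    {"url": "/a", "status": 200, "bytes": 512},
    {"url": "/b", "status": 404, "bytes": 128},
    {"url": "/a", "status": 200, "bytes": 256},
]

def to_columns(records):
    """Split row-oriented records into one list of values per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns

def tree_sum(shards):
    """Leaves sum their own shard; the root adds up the partial results."""
    partials = [sum(shard) for shard in shards]  # leaf level
    return sum(partials)                         # root level

columns = to_columns(records)
total_bytes = sum(columns["bytes"])  # touches the "bytes" column only
sharded_total = tree_sum([columns["bytes"][:2], columns["bytes"][2:]])

print(total_bytes)    # 896
print(sharded_total)  # 896
```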


    38
Amazon Elastic MapReduce (EMR)
    A hosted Hadoop framework
    Provides a web service to process huge amounts of data
    Contains a MapReduce framework
        Subdivides the data into smaller chunks and processes them in
         parallel (the “map” function)
        Recombines the results into the final solution (the “reduce” function)
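
The map and reduce steps can be illustrated with a single-process word count. This is a sketch of the programming model only; it does not use Hadoop or the EMR API, and the function names and input chunks are invented for the example.

```python
from collections import defaultdict

def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(pairs):
    """Group the emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Combine the values of each key into the final result."""
    return {key: sum(values) for key, values in grouped.items()}

chunks = ["big data in the cloud", "the cloud scales", "big big data"]
# In a real cluster each chunk would be mapped on a different node.
pairs = [pair for chunk in chunks for pair in map_phase(chunk)]
counts = reduce_phase(shuffle(pairs))

print(counts["big"])    # 3
print(counts["cloud"])  # 2
```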




    39
Google BigQuery vs. Amazon EMR
Head to Head

Google BigQuery                                     Amazon EMR

Interactive data analysis tool for large data sets  A programming framework to process big data

Comparable to Hive but claims to be faster          Accessible by data analysis applications
                                                    developed in Pig, Hive or other programming
                                                    languages using Amazon’s SDK

Designed for fast queries and to be user friendly   Supports implementing complex data
even for non-programmers, with a built-in GUI       processing logic

Good for ad-hoc and trial-and-error interactive     Good for batch processing of large data sets
queries on large data sets for quick analysis and   with time-consuming data conversion and
troubleshooting                                     aggregation

Provides a regular expression engine to             Structuring data depends fully on application
structure unstructured data                         logic

Supports neither large result sets nor              Supports both large result sets and joins of
joins of large tables                               tables

Does not support updating existing data; only       Supports updating existing data
appending data is possible

  40
Conclusion
    End of RDBMS?
    Cloud databases for big data
        Finding relationships in data
        Solving the problems of scalability, elasticity and availability
    Further emerging issues
        Efficient multi-tenancy
        Data privacy




    41
Thanks for your attention




42

 
Indian High Profile Call Girls In Sector 18 Noida 8375860717 Escorts Service
Indian High Profile Call Girls In Sector 18 Noida 8375860717 Escorts ServiceIndian High Profile Call Girls In Sector 18 Noida 8375860717 Escorts Service
Indian High Profile Call Girls In Sector 18 Noida 8375860717 Escorts Servicedoor45step
 
Mapeh Music QUARTER FOUR Grade nine haha
Mapeh Music QUARTER FOUR Grade nine hahaMapeh Music QUARTER FOUR Grade nine haha
Mapeh Music QUARTER FOUR Grade nine hahaJoshuaAcido2
 
Indian High Profile Call Girls In Sector 18 Noida 8375860717 Escorts Service
Indian High Profile Call Girls In Sector 18 Noida 8375860717 Escorts ServiceIndian High Profile Call Girls In Sector 18 Noida 8375860717 Escorts Service
Indian High Profile Call Girls In Sector 18 Noida 8375860717 Escorts Servicedoor45step
 

Último (20)

Escort Service in Al Nahda +971509530047 UAE
Escort Service in Al Nahda +971509530047 UAEEscort Service in Al Nahda +971509530047 UAE
Escort Service in Al Nahda +971509530047 UAE
 
Escort Service in Al Barsha +971509530047 UAE
Escort Service in Al Barsha +971509530047 UAEEscort Service in Al Barsha +971509530047 UAE
Escort Service in Al Barsha +971509530047 UAE
 
Call Girls In Hauz Khas 8375860717 Escorts Service Free Home Delivery
Call Girls In Hauz Khas 8375860717 Escorts Service Free Home DeliveryCall Girls In Hauz Khas 8375860717 Escorts Service Free Home Delivery
Call Girls In Hauz Khas 8375860717 Escorts Service Free Home Delivery
 
Benjamin Portfolio Process Work Slideshow
Benjamin Portfolio Process Work SlideshowBenjamin Portfolio Process Work Slideshow
Benjamin Portfolio Process Work Slideshow
 
Strip Zagor Extra 322 - Dva ortaka.pdf
Strip   Zagor Extra 322 - Dva ortaka.pdfStrip   Zagor Extra 322 - Dva ortaka.pdf
Strip Zagor Extra 322 - Dva ortaka.pdf
 
Govindpuri Call Girls : ☎ 8527673949, Low rate Call Girls
Govindpuri Call Girls : ☎ 8527673949, Low rate Call GirlsGovindpuri Call Girls : ☎ 8527673949, Low rate Call Girls
Govindpuri Call Girls : ☎ 8527673949, Low rate Call Girls
 
A Selection of Tim Walsh's Recent Paintings
A Selection of Tim Walsh's  Recent PaintingsA Selection of Tim Walsh's  Recent Paintings
A Selection of Tim Walsh's Recent Paintings
 
Dxb Call Girl +971509430017 Indian Call Girl in Dxb By Dubai Call Girl
Dxb Call Girl +971509430017 Indian Call Girl in Dxb By Dubai Call GirlDxb Call Girl +971509430017 Indian Call Girl in Dxb By Dubai Call Girl
Dxb Call Girl +971509430017 Indian Call Girl in Dxb By Dubai Call Girl
 
Kristy Soto's Industrial design Portfolio
Kristy Soto's Industrial design PortfolioKristy Soto's Industrial design Portfolio
Kristy Soto's Industrial design Portfolio
 
TOP BEST Call Girls In SECTOR 62 (Noida) ꧁❤ 8375860717 ❤꧂ Female Escorts Serv...
TOP BEST Call Girls In SECTOR 62 (Noida) ꧁❤ 8375860717 ❤꧂ Female Escorts Serv...TOP BEST Call Girls In SECTOR 62 (Noida) ꧁❤ 8375860717 ❤꧂ Female Escorts Serv...
TOP BEST Call Girls In SECTOR 62 (Noida) ꧁❤ 8375860717 ❤꧂ Female Escorts Serv...
 
call girls in Noida New Ashok Nagar 🔝 >༒8448380779 🔝 genuine Escort Service 🔝...
call girls in Noida New Ashok Nagar 🔝 >༒8448380779 🔝 genuine Escort Service 🔝...call girls in Noida New Ashok Nagar 🔝 >༒8448380779 🔝 genuine Escort Service 🔝...
call girls in Noida New Ashok Nagar 🔝 >༒8448380779 🔝 genuine Escort Service 🔝...
 
Abu Dhabi Housewife Call Girls +971509530047 Abu Dhabi Call Girls
Abu Dhabi Housewife Call Girls +971509530047 Abu Dhabi Call GirlsAbu Dhabi Housewife Call Girls +971509530047 Abu Dhabi Call Girls
Abu Dhabi Housewife Call Girls +971509530047 Abu Dhabi Call Girls
 
Karol Bagh Call Girls : ☎ 8527673949, Low rate Call Girls
Karol Bagh Call Girls : ☎ 8527673949, Low rate Call GirlsKarol Bagh Call Girls : ☎ 8527673949, Low rate Call Girls
Karol Bagh Call Girls : ☎ 8527673949, Low rate Call Girls
 
UNIT 5-6 anh văn chuyên nganhhhhhhh.docx
UNIT 5-6 anh văn chuyên nganhhhhhhh.docxUNIT 5-6 anh văn chuyên nganhhhhhhh.docx
UNIT 5-6 anh văn chuyên nganhhhhhhh.docx
 
Value Aspiration And Culture Theory of Architecture
Value Aspiration And Culture Theory of ArchitectureValue Aspiration And Culture Theory of Architecture
Value Aspiration And Culture Theory of Architecture
 
8377087607, Door Step Call Girls In Gaur City (NOIDA) 24/7 Available
8377087607, Door Step Call Girls In Gaur City (NOIDA) 24/7 Available8377087607, Door Step Call Girls In Gaur City (NOIDA) 24/7 Available
8377087607, Door Step Call Girls In Gaur City (NOIDA) 24/7 Available
 
Russian⚡ Call Girls In Sector 39 Noida✨8375860717⚡Escorts Service
Russian⚡ Call Girls In Sector 39 Noida✨8375860717⚡Escorts ServiceRussian⚡ Call Girls In Sector 39 Noida✨8375860717⚡Escorts Service
Russian⚡ Call Girls In Sector 39 Noida✨8375860717⚡Escorts Service
 
Indian High Profile Call Girls In Sector 18 Noida 8375860717 Escorts Service
Indian High Profile Call Girls In Sector 18 Noida 8375860717 Escorts ServiceIndian High Profile Call Girls In Sector 18 Noida 8375860717 Escorts Service
Indian High Profile Call Girls In Sector 18 Noida 8375860717 Escorts Service
 
Mapeh Music QUARTER FOUR Grade nine haha
Mapeh Music QUARTER FOUR Grade nine hahaMapeh Music QUARTER FOUR Grade nine haha
Mapeh Music QUARTER FOUR Grade nine haha
 
Indian High Profile Call Girls In Sector 18 Noida 8375860717 Escorts Service
Indian High Profile Call Girls In Sector 18 Noida 8375860717 Escorts ServiceIndian High Profile Call Girls In Sector 18 Noida 8375860717 Escorts Service
Indian High Profile Call Girls In Sector 18 Noida 8375860717 Escorts Service
 

Databases in the Cloud Seminar Big Data Analytics

  • 1. Databases in the Cloud Seminar: Big Data Analytics Winter Semester 2012-13 Moshfiqur Rahman
  • 2. Agenda
      Cloud Computing Introduction
      Big Data
      RDBMS and Cloud Databases
      Scalability, Elasticity, Availability – New attributes for databases
          In RDBMS
          In Cloud Databases
      Challenges in Cloud Databases
      Big Data Analytics in Cloud
      Conclusion
  • 3. Cloud Computing Introduction
      What is in the Cloud?
          Application as a service
          Hardware and system software
      Public and Private Cloud
      Cloud Computing attributes
          Virtually infinite computing resources
          Start small and grow as needed
          Pay-per-use scheme
  • 4. Cloud Computing Introduction
      [Figure. Source: http://en.wikipedia.org/wiki/Cloud_computing]
  • 5. Big Data
      Large and complex data sets
          Exponential growth
          Structured, semi-structured, unstructured
          Hard to process in traditional database system
      Challenges with Big Data
          Capture, scrutinization, storage, search, sharing, analysis…
      Big Data sources
          Mobile devices
          Sensors
          Software/server logs
          Cameras and so on…
  • 6. Big Data Attributes
      Volume
          Factors contribute to the increase of data, for example, text streams from social networks
          Hidden relationships in data
          Data storage cost decreased but data analysis issues increased!
      Variety
          Data can be of all possible formats
          Structured/semi-structured data from RDBMS
          Unstructured data from documents, emails, video, audio, sensors
      Velocity
          Keep up the data processing speed with data production speed
          Streams of real time data from sensors and social media
          Reacting quickly to the increase of data velocity
  • 7. Relational Database
      A relational database is
          Collection of tables (entities)
          Multiple columns
          Multiple rows (tuples)
      Accessed by SQL
          Join multiple tables to get related data
      Normalization is used to minimize redundancy and dependency
      Referential Integrity is used to ensure data consistency
      Managed by Relational Database Management System (RDBMS)
          Oracle, MS SQL Server, MySQL, etc.
  • 8. RDBMS - A misfit for cloud?
      RDBMS has
          Simplicity
          Robustness
          Flexibility
          Performance
          Compatibility
          (Limited) Scalability
      Cloud databases require
          Scalability
          Elasticity
          Availability
  • 9. Cloud Databases
      Key/value store – a new kind of database management system
          Stores data as key/value pairs
          Targeted at specialized applications where an RDBMS is not suitable
      Also known as
          Document-oriented database
          Internet-facing database
          Attribute-oriented database
          Distributed database, etc.
  • 10. Cloud Databases - Advantages
      Stores data in the format of items
          Customer items, Order items in an e-commerce system
          A single item contains all the relevant data
      Relationships are not deprecated, just simplified
          An Order item contains the keys of the associated Customer item and Product items
      Able to scale easily and dynamically
          Allows the user to pay only for used resources
          Allows the vendor to scale their infrastructure depending on their entire platform size
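The item model on this slide can be sketched with a plain dictionary standing in for the key/value store. The key names (`customer:1`, `order:100`) and field names are hypothetical, chosen only for illustration; a real cloud store would expose its own API.

```python
# Hypothetical in-memory stand-in for a key/value store:
# each key maps to one self-contained item.
store = {
    "customer:1": {"name": "Alice", "email": "alice@example.com"},
    "product:10": {"title": "Keyboard", "price": 49.0},
    "product:11": {"title": "Mouse", "price": 19.0},
    # The order item carries the keys of its related items instead of
    # relying on foreign keys and joins.
    "order:100": {
        "customer_key": "customer:1",
        "product_keys": ["product:10", "product:11"],
    },
}


def load_order(store, order_key):
    """Resolve an order and the items it references, one key lookup each."""
    order = store[order_key]
    customer = store[order["customer_key"]]
    products = [store[key] for key in order["product_keys"]]
    return {"customer": customer, "products": products}


order = load_order(store, "order:100")
order_total = sum(product["price"] for product in order["products"])
```

Every access is a lookup by a single key, which is what lets items be spread over many nodes without coordinating a join.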
  • 11. Cloud Databases - Advantages
      Reduces development time
          Less time is spent on object-relational mapping
          Application objects are easier to map to key/value database items
  • 12. Cloud Databases - Disadvantages
      Relationships are not defined in data models
          DBMS cannot enforce data integrity
          Deleting an item from a set of related items will make the data inconsistent
      No shared standard
          Totally different sets of APIs
          An application developed for one cloud vendor is hard to port to another cloud vendor
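The integrity problem on this slide can be illustrated with a small sketch. Because the store does not know that an order refers to a customer, deleting the customer succeeds silently and the application has to detect the dangling key itself. The store layout and the `dangling_references` helper are hypothetical.

```python
def dangling_references(store):
    """Return (order_key, missing_key) pairs for references that no longer resolve."""
    missing = []
    for key, item in store.items():
        if not key.startswith("order:"):
            continue
        refs = [item["customer_key"], *item["product_keys"]]
        missing.extend((key, ref) for ref in refs if ref not in store)
    return missing


# Hypothetical items; the store itself has no notion of foreign keys.
store = {
    "customer:1": {"name": "Alice"},
    "product:10": {"title": "Keyboard"},
    "order:100": {"customer_key": "customer:1", "product_keys": ["product:10"]},
}

# Nothing stops this delete, even though order:100 still points at the customer.
del store["customer:1"]
problems = dangling_references(store)
```

In an RDBMS a foreign-key constraint would reject the delete (or cascade it); here that check becomes application code.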
  • 13. Scalability
      Desired property of a system to accommodate growing amounts of work
          By adding more hardware in single machine
          By adding more machines (a.k.a. node)
      Two ways to scale a system
          Vertically or Scale Up
              New hardware is added to a single node in a system
              Adding more processors or memory to a single machine
          Horizontally or Scale Out
              Add more nodes to a system
              Scaling out from one web-server system to a three web-server system
  • 14. Elasticity
      Ability to spread the workloads dynamically over the available resources
          Automatically adds more resources when workload increases
          Automatically shrinks back and removes the unneeded resources when workload decreases
      Very important for cloud environment
          Pay-per-use scheme
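A minimal sketch of the grow-and-shrink decision described on this slide, assuming load is measured in requests per second and every node has the same fixed capacity. The function name, parameters, and limits are illustrative assumptions, not taken from any particular cloud platform.

```python
import math


def desired_nodes(load_rps, capacity_per_node_rps, min_nodes=1, max_nodes=100):
    """Return how many nodes the current load needs, clamped to the allowed range."""
    needed = math.ceil(load_rps / capacity_per_node_rps)
    return max(min_nodes, min(max_nodes, needed))


# The node count follows the workload up and back down again.
sizes = [desired_nodes(load, 100) for load in (50, 250, 900, 120)]
```

Under a pay-per-use scheme the shrink step matters as much as the growth step, since idle nodes are still billed.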
  • 15. Availability
      Allows users to read and write data at any time without blocking them
      Response time is virtually constant and does not depend on
          Number of concurrent users
          Database size
          Any other system parameter
      Automatic data backups and failover management
  • 16. Scalability in RDBMS
      RDBMS provides limited scalability
          Scale up on a single node
          Scale out with relatively small numbers of nodes
      Scale up is not infinite, but the increase in workload can be virtually infinite
      Scale out becomes overwhelming in systems with hundreds or thousands of nodes
  • 17. Elasticity in RDBMS
      RDBMS allows very limited elasticity at storage and web/application server layers
          Add a web server when the workload increases and adjust the throughput to distribute the load to the new server
          When the workload decreases, detach the server from the system and use it for different purposes
          At the storage layer, more disks can be added
      Replacing the overloaded database server with a bigger machine
          Expensive investment
          Unnecessary investment for a seasonal hype
  • 18. Availability in RDBMS
      Employs storage redundancy by performing data replication
          Also improves performance for concurrent users
          Provides resiliency in case of a failure
      Data replication is not an easy process
          Synchronization
          Replicate the whole database to make synchronization easier
  • 19. Scalability, Elasticity and Availability in Cloud Databases
      New breed of databases focusing on scalability, elasticity and availability
      Key/value store supports nearly limitless scalability
          At the expense of other benefits that come with an RDBMS
      Data accessed by a single key
          Provides the basis for scalability
          A data item is contained in a single object and handled by a single node
      Some modern applications need to access multiple key/value pairs atomically
          Online multi-player games, Google Drive
          Hence multi-key atomicity is required
  • 20. Scalability, Elasticity and Availability in Cloud Databases
      Different database implementations
          Google’s MegaStore
          G-Store
          Relational Cloud
          ElasTras
  • 21. MegaStore
      Uses Bigtable as the underlying system
      Provides multi-key atomicity
          Data Fusion
              Groups multiple key/value pairs as a single collection
          Write-ahead logging
          Two-phase commit to support ACID transactions on a collection
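The slide names two-phase commit as the mechanism behind multi-key transactions. The sketch below shows the generic textbook protocol only, with in-memory participants; it is not MegaStore's implementation, and the `Participant` class and `two_phase_commit` function are hypothetical names.

```python
class Participant:
    """A node holding part of the data; it stages writes before applying them."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # simulates a node that votes "no"
        self.data = {}
        self.staged = None

    def prepare(self, writes):
        # Phase 1: stage the writes and vote on whether they can be applied.
        if not self.can_commit:
            return False
        self.staged = dict(writes)
        return True

    def commit(self):
        # Phase 2 (all voted yes): make the staged writes visible.
        if self.staged is not None:
            self.data.update(self.staged)
        self.staged = None

    def abort(self):
        # Phase 2 (someone voted no): discard the staged writes.
        self.staged = None


def two_phase_commit(writes_by_participant):
    """Apply all writes or none. Takes a list of (participant, writes) pairs."""
    prepared = []
    for participant, writes in writes_by_participant:
        if participant.prepare(writes):
            prepared.append(participant)
        else:
            for p in prepared:
                p.abort()
            return False
    for p in prepared:
        p.commit()
    return True
```

A real system would also log each phase durably (the write-ahead logging mentioned on the slide) so that a crashed coordinator or participant can recover its decision; the sketch omits that.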
  • 22. MegaStore
      Advantages
          Allows entities to be arbitrarily distributed over multiple nodes
          Better performance when an entity group is co-located on a single node
      Disadvantages
          Exhibits performance issues when an entity group is distributed across multiple nodes
  • 23. G-Store
      Provides transactional multi-key access over dynamic, non-overlapping groups of keys
      Created groups are transient in nature
      Creates an abstract group for on-demand transactional access
          Leader key, follower keys
          Ownership of read/write access transfers to the node hosting the key group
      No key should be claimed by multiple groups; no key should be without an owner
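The grouping rules on this slide (transient groups, a leader with followers, no key in two groups, no key without an owner) can be sketched as simple bookkeeping. This is an illustrative model only, not G-Store's actual protocol; the class and method names are hypothetical, and treating an ungrouped key as its own owner is an assumption of the sketch.

```python
class KeyGroupManager:
    """Tracks which transient group, identified by its leader key, owns each key."""

    def __init__(self):
        self.owner_of = {}  # key -> leader key of the group that owns it
        self.groups = {}    # leader key -> set of member keys

    def owner(self, key):
        # Assumption: a key outside any group is owned by itself.
        return self.owner_of.get(key, key)

    def create_group(self, leader, followers):
        members = {leader, *followers}
        taken = [key for key in members if key in self.owner_of]
        if taken:
            # Groups must not overlap: a key can belong to one group at a time.
            raise ValueError(f"keys already in a group: {sorted(taken)}")
        self.groups[leader] = members
        for key in members:
            self.owner_of[key] = leader

    def delete_group(self, leader):
        # Groups are transient: dissolving one hands each key back to itself.
        for key in self.groups.pop(leader):
            del self.owner_of[key]
```

For example, after `create_group("a", ["b", "c"])` a second group claiming `"c"` is rejected until the first group is deleted.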
  • 24. G-Store
      Advantages
          Transactions are efficient because a key group resides on a single node
      Disadvantages
          A group must be small enough to reside on a single node
  • 25. Relational Cloud
      Works on Elasticity extensively
      Uses a graph-based partitioning method to split large databases across multiple machines
      Workload-aware partitioning strategy
          Frontend transaction trace component keeps track of transactions
          Analyzes the transactions to determine the sets of tuples accessed together
          Creates a graph of transactions
          Weight is given to the edges to denote how often the transactions are executed
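One way to read the graph construction on this slide: tuples are nodes, and an edge between two tuples is weighted by how many traced transactions touched both. The sketch below builds such a co-access graph from a hypothetical trace; the subsequent graph partitioning step is not shown, and the function name and trace format are assumptions.

```python
from collections import Counter
from itertools import combinations


def co_access_graph(transactions):
    """Map each pair of tuple ids to the number of transactions that accessed both."""
    edges = Counter()
    for tuple_ids in transactions:
        # Sort so that (a, b) and (b, a) count as the same undirected edge.
        for a, b in combinations(sorted(set(tuple_ids)), 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical trace: each entry lists the tuple ids one transaction accessed.
trace = [["t1", "t2"], ["t1", "t2", "t3"], ["t4"]]
graph = co_access_graph(trace)
```

Heavily weighted edges (here `t1` and `t2`) mark tuples that a partitioner would try to keep on the same machine, so that most transactions stay local to one partition.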
  • 27. Relational Cloud
      Advantages
          Uses MySQL and PostgreSQL as backend databases
          Migrates database partitions without causing downtime
          Replicates the data for availability
      Disadvantages
          Scaling the graph representation is difficult, as it leads to a graph with N nodes and up to N² edges for an N-tuple database
  • 28. ElasTras
      A cloud database under research providing better scalability and elasticity with transactional data access
      [Figure: System overview of ElasTras]
  • 29. ElasTras
      Two-level Transaction Manager (TM)
          Higher level TM (HTM)
          Owning TM (OTM)
      When a transaction request arrives
          The load balancer applies a load-balancing policy and forwards the request to an appropriate HTM
          The HTM decides whether to execute the transaction locally or forward it to an OTM
          The OTM has exclusive access rights to the data accessed by a single transaction
      System state information and database metadata are managed by the Metadata Manager
  • 30. ElasTras
      Two approaches to partition the database
          Static Partitioning
          Dynamic Partitioning
      Static Partitioning
          Database designer defines the partitioning
          ElasTras is responsible for mapping the partitions to their specific OTMs
              Also reassigns partitions if workload increases
          Application has knowledge of the partitions
          ElasTras can provide ACID transactional guarantees because transactions execute locally within a partition
  • 31. ElasTras
      Dynamic Partitioning
          Basis for the elasticity of the data store
          Uses a range- or hash-based partitioning scheme
          Applications are not aware of the partitions
          Transactions are not guaranteed to be limited to a single partition
          Provides mini transactions with restricted transactional semantics to ensure scalability and avoid distributed transactions
          Mini transactions ensure recovery but not global synchronization
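The two partitioning schemes named on this slide can be sketched as functions that map a key to a partition number. These are generic illustrations rather than ElasTras code; the function names, the choice of SHA-256, and the split points are assumptions.

```python
import bisect
import hashlib


def hash_partition(key, num_partitions):
    """Hash-based: spread keys evenly, at the cost of losing key order."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def range_partition(key, split_points):
    """Range-based: split_points is a sorted list of boundaries.

    Keys below split_points[0] go to partition 0, keys from split_points[0]
    up to (but excluding) split_points[1] go to partition 1, and so on.
    """
    return bisect.bisect_right(split_points, key)


# With boundaries "h" and "p" there are three ranges: [..h), [h..p), [p..].
ranges = [range_partition(k, ["h", "p"]) for k in ("apple", "kiwi", "zebra")]
```

Range partitioning keeps neighbouring keys together, which helps scans; hash partitioning balances load better when keys are skewed. In both cases the application never sees the mapping, matching the slide's point that applications are not aware of the partitions.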
  • 32. ElasTras
      Advantages
          Provides transactional guarantees in a scalable manner
          The OTMs' ability to reassign partitions as the workload changes ensures elasticity and scalability
          Provides ACID transactions when transactions are limited to a single partition
      Disadvantages
          In dynamic partitioning, ElasTras only supports mini transactions with restricted transactional semantics to avoid distributed transactions
          Mini transactions only ensure recovery, not global synchronization
  • 33. Challenges in Cloud Databases
      Importing data
          Data transport is complex and may incur huge cost
      Auto failover management
          Server crashes, hardware malfunction
          The database must be replicated, and a replica must automatically take over and start working if any failure occurs
      Auto scalability and elasticity management
          Scale both throughput and size instantly and automatically
          Very granular growth and shrinking of resources
  • 34. Big Data Analytics
      Big Data Analytics
          Process of analyzing huge amounts of structured, semi-structured and unstructured data of various types
          Discover the hidden patterns and unknown correlations in data
      Companies are interested in big data analytics to achieve competitive advantages over rival companies
          Through effective marketing
          Propose new innovative services
  • 35. Big Data Analytics
      Big data analytics helps companies make better business decisions
      Traditional analytics software is available for data analysis
          Advanced technologies such as predictive analysis, data mining, etc.
      But traditional analytics software
          is not suitable for big data with semi-structured and/or unstructured data
          is not able to meet the demand for processing power needed to analyze big data
      A new class of big data analytics environments has emerged
          NoSQL databases
          Hadoop
          MapReduce
  • 36. Big Data Analytics in Cloud
      Databases available as a service in the Cloud
          Amazon SimpleDB
          Google AppEngine
          Microsoft SQL Azure
          and so on…
      Limitations in Cloud
          Limitations on query execution time, for example, Amazon SimpleDB restricts any query that takes more than 5 seconds
          Limitations on result dataset size, for example, Google AppEngine does not allow users to retrieve more than 1000 items for any query
          Impractical for big data analytics
  • 37. Big Data Analytics in Cloud
      Specialized solutions for big data analytics in cloud
          Google BigQuery
          Amazon Elastic MapReduce (EMR)
  • 38. Google BigQuery
      Cloud-based interactive query service for big data
      Implementation of Dremel, a parallel query engine
      Queries execute on a small number of very large append-only tables
      Two core technologies
          Columnar storage
              Records are separated into column values
              Puts the values of each single column in a different storage volume, forming a tree
          Tree architecture
              Queries are pushed down to the branches of the tree
              Results are aggregated from the leaves
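The two ideas on this slide can be illustrated with a toy sketch: turning row records into per-column lists, and aggregating partial results computed at the leaves. This is a simplified illustration, not how Dremel or BigQuery is implemented; the function names and the assumption that every row has the same fields are mine.

```python
def to_columns(rows):
    """Convert row records into one list per column (assumes identical fields)."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def tree_sum(leaf_shards):
    """Each leaf sums its own shard; the root only combines the partial sums."""
    partial_sums = [sum(shard) for shard in leaf_shards]  # work done at the leaves
    return sum(partial_sums)                              # combined at the root


rows = [
    {"user": "a", "bytes": 10},
    {"user": "b", "bytes": 30},
]
columns = to_columns(rows)
# An aggregate over one column reads only that column, not whole records.
total_bytes = sum(columns["bytes"])
```

Reading a single column instead of full rows is what makes scans over very large tables cheap, and pushing the aggregation to the leaves keeps the root from handling raw data.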
  • 39. Amazon Elastic MapReduce (EMR)
      A hosted Hadoop framework
      Provides a web service to process huge amounts of data
      Contains a MapReduce framework
          Subdivides the data into smaller chunks and processes them in parallel (the “map” function)
          Recombines them into the final solution (the “reduce” function)
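The map and reduce steps described on this slide can be sketched as a word count in plain Python. This runs sequentially in one process and does not use Hadoop or the EMR API; in a real framework each chunk would be mapped on a different node in parallel. The function names and sample chunks are illustrative.

```python
from collections import defaultdict


def map_chunk(chunk):
    """Map step: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def reduce_counts(pairs):
    """Reduce step: combine the emitted pairs into one total per word."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


# The input is split into chunks that could be processed independently.
chunks = ["big data in the cloud", "data in the cloud"]
mapped = [pair for chunk in chunks for pair in map_chunk(chunk)]
word_counts = reduce_counts(mapped)
```

Because `map_chunk` looks at one chunk at a time and shares no state, adding more chunks and more workers scales the map phase without changing the code.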
  • 40. Google BigQuery vs. Amazon EMR Head to Head

      Google BigQuery | Amazon EMR
      Interactive data analysis tool for large data set | A programming framework to process big data
      Comparable to Hive but claims to be faster than that | Accessible by data analysis application developed in Pig, Hive or other programming languages using Amazon’s SDK
      Designed to run faster query and user friendly even for non-programmers with built-in GUI | Supports implementing complex data processing logic
      Good for ad-hoc and trial-and-error interactive query on large dataset for quick analysis and troubleshooting | Good for batch processing of large dataset doing time consuming data conversion and aggregation
      Provides a regular expression engine to structure the unstructured data | Structuring data fully dependent on application logic
      Does not support large result set neither joining of large tables | Supports both large result set and joining of table
      Does not support updating existing data, only append of data is possible | Supports updating existing data
  • 41. Conclusion
      End of RDBMS?
      Cloud databases for big data
          Finding relationships in data
          Solving the problem for scalability, elasticity and availability
      More rising issues
          Efficient multi tenancy
          Data privacy
  • 42. Thanks for your attention