SlideShare uma empresa Scribd logo
1 de 42
Databases in the Cloud
     Seminar: Big Data Analytics
                Winter Semester 2012-13



             Moshfiqur Rahman
Agenda
       Cloud Computing Introduction
       Big Data
       RDBMS and Cloud Databases
       Scalability, Elasticity, Availability – New attributes for
        databases
           In RDBMS
           In Cloud Databases
       Challenges in Cloud Databases
       Big Data Analytics in Cloud
       Conclusion

    2
Cloud Computing Introduction
       What is in the Cloud?
           Application as a service
           Hardware and system software
       Public and Private Cloud
       Cloud Computing attributes
           Virtually infinite computing resources
           Start small and grow as needed
           Pay-per-use scheme




    3
Cloud Computing Introduction




                      Source: http://en.wikipedia.org/wiki/Cloud_computing

4
Big Data
       Large and complex data sets
           Exponential growth
           Structured, semi-structured, unstructured
           Hard to process in traditional database system
       Challenges with Big Data
           Capture, scrutinization, storage, search, sharing, analysis…
       Big Data sources
           Mobile devices
           Sensors
           Software/server logs
           Cameras and so on…

    5
Big Data Attributes
       Volume
           Factors contribute to the increase of data, for example, text streams
            from social networks
           Hidden relationships in data
           Data storage cost decreased but data analysis issues increased!
       Variety
           Data can be of all possible formats
           Structured/semi-structured data from RDBMS
           Unstructured data from documents, emails, video, audio, sensors
       Velocity
           Keep up the data processing speed with data production speed
           Streams of real time data from sensors and social media
           Reacting quickly to the increase of data velocity

    6
Relational Database
       A relational database is
           Collection of tables (entities)
               Multiple columns
               Multiple rows (tuples)
       Accessed by SQL
       Join multiple tables to get related data
       Normalization is used to minimize redundancy and
        dependency
       Referential Integrity is used to ensure data consistency
       Managed by Relational Database Management System
        (RDBMS)
       Oracle, MS SQL Server, MySQL, etc.

    7
RDBMS - A misfit for cloud?
       RDBMS has
           Simplicity
           Robustness
           Flexibility
           Performance
           Compatibility
           (Limited) Scalability
       Cloud databases require
           Scalability
           Elasticity
           Availability

    8
Cloud Databases
       Key/value store – a new kind of database management
        system
           Store data as key/value pair
           Targeted for specialized applications where a RDBMS is not
            suitable
       Also known as
           Document-oriented database
           Internet-facing database
           Attribute-oriented database
           Distributed database, etc.



    9
Cloud Databases - Advantages
    Stores data in format of items
        Customer items, Order items in an e-commerce system
        A single item contains all the relevant data
    Relationships are not deprecated, just simplified
        Order items contains the keys of associated Customer item
         and Product items
    Able to scale easily and dynamically
        Allows the user to pay only for used resources
        Allows the vendor to scale their infrastructure depending on
         their entire platform size



    10
Cloud Databases - Advantages
    Reduce the development time
        By decreasing developing time with object relational data
         mapping
        Easier to map application object to key/value database items




    11
Cloud Databases - Disadvantages
    Relationships are not defined in data models
        DBMS cannot enforce data integrity
        Deleting item from a set of related items will make data
         inconsistent
    No shared standard
        Totally different set of APIs
        Application developed for one cloud vendor is hard to port to
         another cloud vendor




    12
Scalability
    Desired property of a system to accommodate growing
     amounts of work
        By adding more hardware in single machine
        By adding more machines (a.k.a. node)
    Two ways to scale a system
        Vertically or Scale Up
            New hardware is added to a single node in a system
            Adding more processors or memory to a single machine
        Horizontally or Scale Out
            Add more nodes to a system
            Scaling out from one web-server system to a three web-server
             system

    13
Elasticity
    Ability to spread the workloads dynamically over the
     available resources
        Automatically adds more resources when workload increases
        Automatically shrinks back and removes the unneeded
         resources when workload decreases
    Very important for cloud environment
        Pay-per-scheme




    14
Availability
    Allows the user read and write data at any time without
     blocking them
    Response time is virtually constant and does not depend
     on
        Number of concurrent users
        Database size
        Any other system parameter
    Automatic data backups and failover management




    15
Scalability in RDBMS
    RDBMS provides limited scalability
        Scale up on a single node
        Scale out with relatively small numbers of nodes
    Scale up is not infinite but increase in workload can be
     virtually infinite
    Scale out is overwhelming in system with hundreds or
     thousands of nodes




    16
Elasticity in RDBMS
    RDBMS allows very limited elasticity at storage and
     web/application server layers
        Add a web server when the workload increases and adjust the
         throughput to dissipate the loads to the new server
        When workload decreases, detach the server from the system,
         use it for different purposes
        At storage layers, more disks can be added
    Adding a bigger machine and replace the overloaded
     database server
        Expensive investment
        Unnecessary investment for a seasonal hype


    17
Availability in RDBMS
    Employs storage redundancy by performing data
     replication
        Also ensure improved performance for concurrent users
        Provides resiliency in case of a failure
    Data replication is not so easy process
        Synchronization
        Replicate the whole database to make synchronization easier




    18
Scalability, Elasticity and Availability in
Cloud Databases
    New breed of databases focusing on scalability, elasticity
     and availability
        Key/value store supports nearly limitless scalability
        In the expense of other benefits come with RDBMS
    Data accessed by a single key
        Provides the basis for scalability
        Data item is contained in a single object and handled by a
         single node
    Some modern applications need multiple key/value pair
     access atomically
        Online multi-player games, Google Drive
        Hence required multi-key atomicity

    19
Scalability, Elasticity and Availability in
Cloud Databases
    Different database implementations
        Google’s MegaStore
        G-Store
        Relational Cloud
        ElasTras




    20
MegaStore
    Uses Bigtable as the underlying system
    Provides multi-key atomicity
        Data Fusion
        Group multiple key/value pair as single collection
        Write/ahead logging
        Two-phase commit to support ACID transactions on a
         collection




    21
MegaStore
    Advantages
        Allows entities to be arbitrarily distributed over multiple nodes
        Better performance when entity group co-located in a single
         node
    Disadvantages
        Exhibits performance issues when entity group is distributed
         across multiple nodes




    22
G-Store
    Provides transactional multi-key access over dynamic,
     non-overlapping groups of keys
    Created groups are transient in nature
    Creates abstract group for on-demand transaction access
        Leader key, follower keys
        Ownership of read/write access transfers to the node hosting
         the key group
        No key should not be claimed by multiple group, no key should
         be without a owner




    23
G-Store
    Advantages
        Transactions are efficient for key group resides on single node
    Disadvantages
        A group must be small enough to reside on a single node




    24
Relational Cloud
    Works on Elasticity extensively
    Uses a graph-based partitioning method to split large
     databases across multiple machines
        Workload aware partitioning strategy
    Frontend transaction trace component
        keeps track of transactions
        Analyze the transactions to determine the set of tuples
         accessed together
        Creates a graph of transactions
        Weight is given to the edges to denote how often a
         transactions are executed

    25
Relational Cloud




26
Relational Cloud
    Advantages
        Uses the MySQL, Postgre-SQL as backend databases
        Migrate the database partitions without causing downtime
        Replicate the data for availability
    Disadvantages
        Scaling the graph representation is difficult as it leads to a
         graph with N nodes and up to N2 edges for an N-tuple
         database




    27
ElasTras
    A cloud database under research providing better
     scalability and elasticity with transactional data access




                        Figure: System overview of ElasTras

    28
ElasTras
    Two level Transaction Manager (TM)
        Higher level TM (HTM)
        Owning TM (OTM)
    When any transaction request arrives
        Load balancer uses some load balancing policy and forward the
         request to appropriate HTM
        HTM decides whether to execute the transaction locally or
         forward to OTM
        OTM has exclusive access rights to the data accessed by a
         single transaction
    System state information and database metadata managed
     by Metadata Manager

    29
ElasTras
    Two approach to partition the database
        Static Partitioning
        Dynamic Partitioning
    Static Partitioning
        Database designer defines the partitioning
        ElasTras is responsible for mapping the partitions to their
         specific OTMs
        Also reassigns partitions if workload increases
        Application has the knowledge of partitions
        ElasTras can provide ACID transactional guarantees as
         transactions executed locally to a partition


    30
ElasTras
    Dynamic Partitioning
        Basis for the elasticity of the data store
        Uses range or hash based partitioning scheme
        Applications are not aware of the partitions
        Transactions are not guaranteed to be limited to a single
         partition
        Provides mini transactions with restricted transactional
         semantics to ensure scalability and avoid distributed
         transactions
        Mini transactions ensures recovery but no global
         synchronization


    31
ElasTras
    Advantages
        Provides transactional guarantees in scalable manner
        OTM’s reassigning partitions capability with changing workload
         ensures elasticity and scalability
        Provides ACID transactions when transactions are limited to a
         single partition
    Disadvantages
        In dynamic partitioning, ElasTras only supports mini
         transactions with restricted transactional semantics to avoid
         distributed transactions
        Mini transactions only ensure recovery but no global
         synchronization

    32
Challenges in Cloud Databases
    Importing data
        Data transport are complex and may incur huge cost
    Auto failover management
        Server crashes, hardware malfunction
        Database must be replicated, automatically replace and start
         working if any failure occurs
    Auto scalability and elasticity management
        Scale instantly and automatically both throughput and size
        Very granular increases and shrinking back in resources




    33
Big Data Analytics
    Big Data Analytics
        Process of analyzing huge amount of structured, semi-
         structured and unstructured data of variety types
        Discover the hidden patterns and unknown correlations in
         data
    Companies are interested in big data analytics to achieve
     competitive advantages over rival companies
        Through effective marketing
        Propose new innovative services




    34
Big Data Analytics
    Big data analytics help companies make better business
     decisions
    Traditional analytic software are available for data analysis
        Advanced technologies such as predictive analysis, data mining, etc.
    But, traditional analytics software
        is not suitable for big data with semi-structured and/or unstructured
         data
        is not able to handle the demand of processing power needs to
         analyze those big data
    New class of big data analytics environment has emerged
        NoSQL databases
        Hadoop
        MapReduce


    35
Big Data Analytics in Cloud
    Available database as a service in Cloud
        Amazon SimpleDB
        Google AppEngine
        Microsoft SQL Azure
        so on…
    Limitations in Cloud
        Limitations over query execution time, for example, Amazon
         SimpleDB restricts any query which takes more than 5 sec
        Limitations over result dataset size, for example, Google
         AppEngine does not allow users to retrieve more than 1000
         items for any query
    Impractical for big data analytics

    36
Big Data Analytics in Cloud
    Specialized solution for big data analytics in cloud
        Google BigQuery
        Amazon Elastic MapReduce (EMR)




    37
Google BigQuery
    Cloud based interactive query service for big data
    Implementation of Dremel, a parallel query engine
    Query executes on a small number of very large append-
     only tables
    Two core technology
        Columnar storage
            Records are separated in column values
            Put all single column values in different storage volume forming a tree
        Tree architecture
            Query pushing down to the branches of the tree
            Results are aggregated from the leaves


    38
Amazon Elastic MapReduce (EMR)
    A hosted Hadoop framework
    Provides a web service to process huge amounts of data
    Contains a MapReduce framework
        Sub divides the data in smaller chunks and process them in
         parallel (the “map” function)
        Recombines them into final solution (the “reduce” function)




    39
Google BigQuery vs. Amazon EMR
Head to Head

Google BigQuery                                     Amazon EMR
Interactive data analysis tool for large data set   A programming framework to process big data.
Comparable to Hive but claims to be faster          Accessible by data analysis application
than that                                           developed in Pig, Hive or other programming
                                                    languages using Amazon’s SDK
Designed to run faster query and user friendly      Supports implementing complex data
even for non-programmers with built-in GUI          processing logic
Good for ad-hoc and trial-and-error interactive     Good for batch processing of large dataset
query on large dataset for quick analysis and       doing time consuming data conversion and
troubleshooting                                     aggregation
Provides a regular expression engine to             Structuring data fully dependent on application
structure the unstructured data                     logic
Does not support large result set neither           Supports both large result set and joining of
joining of large tables                             table
Does not support updating existing data, only       Supports updating existing data
append of data is possible

  40
Conclusion
    End of RDBMS?
    Cloud databases for big data
        Finding relationships in data
        Solving the problem for scalability, elasticity and availability
    More rising issues
        Efficient multi tenancy
        Data privacy




    41
Thanks for your attention




42

Mais conteúdo relacionado

Mais procurados

AWS Amazon S3 Mastery Bootcamp
AWS Amazon S3 Mastery BootcampAWS Amazon S3 Mastery Bootcamp
AWS Amazon S3 Mastery BootcampMatt Bohn
 
High Availability and Disaster Recovery
High Availability and Disaster RecoveryHigh Availability and Disaster Recovery
High Availability and Disaster RecoveryAkelios
 
Cloud Computing and Services | PPT
Cloud Computing and Services | PPTCloud Computing and Services | PPT
Cloud Computing and Services | PPTSeminar Links
 
Introduction to Amazon Web Services by i2k2 Networks
Introduction to Amazon Web Services by i2k2 NetworksIntroduction to Amazon Web Services by i2k2 Networks
Introduction to Amazon Web Services by i2k2 Networksi2k2 Networks (P) Ltd.
 
Amazon Relational Database Service (Amazon RDS)
Amazon Relational Database Service (Amazon RDS)Amazon Relational Database Service (Amazon RDS)
Amazon Relational Database Service (Amazon RDS)Amazon Web Services
 
Infrastructure as a Service ( IaaS)
Infrastructure as a Service ( IaaS)Infrastructure as a Service ( IaaS)
Infrastructure as a Service ( IaaS)Ravindra Dastikop
 
Aws overview (Amazon Web Services)
Aws overview (Amazon Web Services)Aws overview (Amazon Web Services)
Aws overview (Amazon Web Services)Jatinder Randhawa
 
Azure Overview Arc
Azure Overview ArcAzure Overview Arc
Azure Overview Arcrajramab
 
Introduction to Cloud Computing with AWS (Thai Session)
Introduction to Cloud Computing with AWS (Thai Session)Introduction to Cloud Computing with AWS (Thai Session)
Introduction to Cloud Computing with AWS (Thai Session)Amazon Web Services
 
Introduction to Amazon Web Services (AWS)
Introduction to Amazon Web Services (AWS)Introduction to Amazon Web Services (AWS)
Introduction to Amazon Web Services (AWS)Garvit Anand
 

Mais procurados (20)

AWS Amazon S3 Mastery Bootcamp
AWS Amazon S3 Mastery BootcampAWS Amazon S3 Mastery Bootcamp
AWS Amazon S3 Mastery Bootcamp
 
High Availability and Disaster Recovery
High Availability and Disaster RecoveryHigh Availability and Disaster Recovery
High Availability and Disaster Recovery
 
Cloud Computing
Cloud ComputingCloud Computing
Cloud Computing
 
Cloud Computing and Services | PPT
Cloud Computing and Services | PPTCloud Computing and Services | PPT
Cloud Computing and Services | PPT
 
AWS Basics .pdf
AWS Basics .pdfAWS Basics .pdf
AWS Basics .pdf
 
Introduction to Amazon Web Services by i2k2 Networks
Introduction to Amazon Web Services by i2k2 NetworksIntroduction to Amazon Web Services by i2k2 Networks
Introduction to Amazon Web Services by i2k2 Networks
 
Introduction to Cloud Computing
Introduction to Cloud ComputingIntroduction to Cloud Computing
Introduction to Cloud Computing
 
Intro to AWS: Database Services
Intro to AWS: Database ServicesIntro to AWS: Database Services
Intro to AWS: Database Services
 
Amazon Aurora
Amazon AuroraAmazon Aurora
Amazon Aurora
 
Amazon Relational Database Service (Amazon RDS)
Amazon Relational Database Service (Amazon RDS)Amazon Relational Database Service (Amazon RDS)
Amazon Relational Database Service (Amazon RDS)
 
Amazon Kinesis
Amazon KinesisAmazon Kinesis
Amazon Kinesis
 
Infrastructure as a Service ( IaaS)
Infrastructure as a Service ( IaaS)Infrastructure as a Service ( IaaS)
Infrastructure as a Service ( IaaS)
 
Introduction to Amazon Aurora
Introduction to Amazon AuroraIntroduction to Amazon Aurora
Introduction to Amazon Aurora
 
Aws overview (Amazon Web Services)
Aws overview (Amazon Web Services)Aws overview (Amazon Web Services)
Aws overview (Amazon Web Services)
 
Software as a service
Software as a serviceSoftware as a service
Software as a service
 
AWS for Backup and Recovery
AWS for Backup and RecoveryAWS for Backup and Recovery
AWS for Backup and Recovery
 
Azure Overview Arc
Azure Overview ArcAzure Overview Arc
Azure Overview Arc
 
Introduction to Cloud Computing with AWS (Thai Session)
Introduction to Cloud Computing with AWS (Thai Session)Introduction to Cloud Computing with AWS (Thai Session)
Introduction to Cloud Computing with AWS (Thai Session)
 
Cloud computing
Cloud computingCloud computing
Cloud computing
 
Introduction to Amazon Web Services (AWS)
Introduction to Amazon Web Services (AWS)Introduction to Amazon Web Services (AWS)
Introduction to Amazon Web Services (AWS)
 

Destaque

Big Data Analytics - GTech Seminar
Big Data Analytics - GTech SeminarBig Data Analytics - GTech Seminar
Big Data Analytics - GTech SeminarBijilash Babu
 
Overview of big data in cloud computing
Overview of big data in cloud computingOverview of big data in cloud computing
Overview of big data in cloud computingViet-Trung TRAN
 
Internet of Things (IoT) and Big Data
Internet of Things (IoT) and Big DataInternet of Things (IoT) and Big Data
Internet of Things (IoT) and Big DataGuido Schmutz
 
IoT + Big Data + Cloud + AI Integration Strategy Insights from Patents
IoT + Big Data + Cloud + AI Integration Strategy Insights from PatentsIoT + Big Data + Cloud + AI Integration Strategy Insights from Patents
IoT + Big Data + Cloud + AI Integration Strategy Insights from PatentsAlex G. Lee, Ph.D. Esq. CLP
 
Green Cloud Computing
Green Cloud ComputingGreen Cloud Computing
Green Cloud ComputingSeungyun Lee
 

Destaque (6)

Big Data Analytics - GTech Seminar
Big Data Analytics - GTech SeminarBig Data Analytics - GTech Seminar
Big Data Analytics - GTech Seminar
 
OPC -Connectivity using Java
OPC -Connectivity using JavaOPC -Connectivity using Java
OPC -Connectivity using Java
 
Overview of big data in cloud computing
Overview of big data in cloud computingOverview of big data in cloud computing
Overview of big data in cloud computing
 
Internet of Things (IoT) and Big Data
Internet of Things (IoT) and Big DataInternet of Things (IoT) and Big Data
Internet of Things (IoT) and Big Data
 
IoT + Big Data + Cloud + AI Integration Strategy Insights from Patents
IoT + Big Data + Cloud + AI Integration Strategy Insights from PatentsIoT + Big Data + Cloud + AI Integration Strategy Insights from Patents
IoT + Big Data + Cloud + AI Integration Strategy Insights from Patents
 
Green Cloud Computing
Green Cloud ComputingGreen Cloud Computing
Green Cloud Computing
 

Semelhante a Databases in the Cloud Seminar Big Data Analytics

SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...
SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...
SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...IJCERT JOURNAL
 
Challenges Management and Opportunities of Cloud DBA
Challenges Management and Opportunities of Cloud DBAChallenges Management and Opportunities of Cloud DBA
Challenges Management and Opportunities of Cloud DBAinventy
 
Prepare Your Data For The Cloud
Prepare Your Data For The CloudPrepare Your Data For The Cloud
Prepare Your Data For The CloudIndicThreads
 
NOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQLNOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQLRamakant Soni
 
NOSQL in big data is the not only structure langua.pdf
NOSQL in big data is the not only structure langua.pdfNOSQL in big data is the not only structure langua.pdf
NOSQL in big data is the not only structure langua.pdfajajkhan16
 
Big Data using NoSQL Technologies
Big Data using NoSQL TechnologiesBig Data using NoSQL Technologies
Big Data using NoSQL TechnologiesAmit Singh
 
Quantitative Performance Evaluation of Cloud-Based MySQL (Relational) Vs. Mon...
Quantitative Performance Evaluation of Cloud-Based MySQL (Relational) Vs. Mon...Quantitative Performance Evaluation of Cloud-Based MySQL (Relational) Vs. Mon...
Quantitative Performance Evaluation of Cloud-Based MySQL (Relational) Vs. Mon...Darshan Gorasiya
 
Cidr11 paper32
Cidr11 paper32Cidr11 paper32
Cidr11 paper32jujukoko
 
Megastore providing scalable, highly available storage for interactive services
Megastore providing scalable, highly available storage for interactive servicesMegastore providing scalable, highly available storage for interactive services
Megastore providing scalable, highly available storage for interactive servicesJoão Gabriel Lima
 
عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟datastack
 
A novel solution of distributed memory no sql database for cloud computing
A novel solution of distributed memory no sql database for cloud computingA novel solution of distributed memory no sql database for cloud computing
A novel solution of distributed memory no sql database for cloud computingJoão Gabriel Lima
 
Data management in cloud study of existing systems and future opportunities
Data management in cloud study of existing systems and future opportunitiesData management in cloud study of existing systems and future opportunities
Data management in cloud study of existing systems and future opportunitiesEditor Jacotech
 
Cloud Data Strategy event London
Cloud Data Strategy event LondonCloud Data Strategy event London
Cloud Data Strategy event LondonMongoDB
 

Semelhante a Databases in the Cloud Seminar Big Data Analytics (20)

SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...
SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...
SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...
 
Challenges Management and Opportunities of Cloud DBA
Challenges Management and Opportunities of Cloud DBAChallenges Management and Opportunities of Cloud DBA
Challenges Management and Opportunities of Cloud DBA
 
Prepare Your Data For The Cloud
Prepare Your Data For The CloudPrepare Your Data For The Cloud
Prepare Your Data For The Cloud
 
Preparing your data for the cloud
Preparing your data for the cloudPreparing your data for the cloud
Preparing your data for the cloud
 
NOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQLNOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQL
 
NOSQL in big data is the not only structure langua.pdf
NOSQL in big data is the not only structure langua.pdfNOSQL in big data is the not only structure langua.pdf
NOSQL in big data is the not only structure langua.pdf
 
Bh25352355
Bh25352355Bh25352355
Bh25352355
 
Big Data using NoSQL Technologies
Big Data using NoSQL TechnologiesBig Data using NoSQL Technologies
Big Data using NoSQL Technologies
 
No sq lv2
No sq lv2No sq lv2
No sq lv2
 
Quantitative Performance Evaluation of Cloud-Based MySQL (Relational) Vs. Mon...
Quantitative Performance Evaluation of Cloud-Based MySQL (Relational) Vs. Mon...Quantitative Performance Evaluation of Cloud-Based MySQL (Relational) Vs. Mon...
Quantitative Performance Evaluation of Cloud-Based MySQL (Relational) Vs. Mon...
 
Cidr11 paper32
Cidr11 paper32Cidr11 paper32
Cidr11 paper32
 
Megastore providing scalable, highly available storage for interactive services
Megastore providing scalable, highly available storage for interactive servicesMegastore providing scalable, highly available storage for interactive services
Megastore providing scalable, highly available storage for interactive services
 
Report 1.0.docx
Report 1.0.docxReport 1.0.docx
Report 1.0.docx
 
عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟
 
A novel solution of distributed memory no sql database for cloud computing
A novel solution of distributed memory no sql database for cloud computingA novel solution of distributed memory no sql database for cloud computing
A novel solution of distributed memory no sql database for cloud computing
 
Preparing yourdataforcloud
Preparing yourdataforcloudPreparing yourdataforcloud
Preparing yourdataforcloud
 
Data management in cloud study of existing systems and future opportunities
Data management in cloud study of existing systems and future opportunitiesData management in cloud study of existing systems and future opportunities
Data management in cloud study of existing systems and future opportunities
 
Cloud Data Strategy event London
Cloud Data Strategy event LondonCloud Data Strategy event London
Cloud Data Strategy event London
 
No sql
No sqlNo sql
No sql
 
No sql
No sqlNo sql
No sql
 

Último

exhuma plot and synopsis from the exhuma movie.pptx
exhuma plot and synopsis from the exhuma movie.pptxexhuma plot and synopsis from the exhuma movie.pptx
exhuma plot and synopsis from the exhuma movie.pptxKurikulumPenilaian
 
Lucknow 💋 Call Girls in Lucknow | Service-oriented sexy call girls 8923113531...
Lucknow 💋 Call Girls in Lucknow | Service-oriented sexy call girls 8923113531...Lucknow 💋 Call Girls in Lucknow | Service-oriented sexy call girls 8923113531...
Lucknow 💋 Call Girls in Lucknow | Service-oriented sexy call girls 8923113531...anilsa9823
 
Hazratganj / Call Girl in Lucknow - Phone 🫗 8923113531 ☛ Escorts Service at 6...
Hazratganj / Call Girl in Lucknow - Phone 🫗 8923113531 ☛ Escorts Service at 6...Hazratganj / Call Girl in Lucknow - Phone 🫗 8923113531 ☛ Escorts Service at 6...
Hazratganj / Call Girl in Lucknow - Phone 🫗 8923113531 ☛ Escorts Service at 6...akbard9823
 
Indira Nagar Lucknow #Call Girls Lucknow ₹7.5k Pick Up & Drop With Cash Payme...
Indira Nagar Lucknow #Call Girls Lucknow ₹7.5k Pick Up & Drop With Cash Payme...Indira Nagar Lucknow #Call Girls Lucknow ₹7.5k Pick Up & Drop With Cash Payme...
Indira Nagar Lucknow #Call Girls Lucknow ₹7.5k Pick Up & Drop With Cash Payme...akbard9823
 
Lucknow 💋 Call Girl in Lucknow Phone No 8923113531 Elite Escort Service Avail...
Lucknow 💋 Call Girl in Lucknow Phone No 8923113531 Elite Escort Service Avail...Lucknow 💋 Call Girl in Lucknow Phone No 8923113531 Elite Escort Service Avail...
Lucknow 💋 Call Girl in Lucknow Phone No 8923113531 Elite Escort Service Avail...anilsa9823
 
Charbagh ! (Call Girls) in Lucknow Finest Escorts Service 🥗 8923113531 🏊 Avai...
Charbagh ! (Call Girls) in Lucknow Finest Escorts Service 🥗 8923113531 🏊 Avai...Charbagh ! (Call Girls) in Lucknow Finest Escorts Service 🥗 8923113531 🏊 Avai...
Charbagh ! (Call Girls) in Lucknow Finest Escorts Service 🥗 8923113531 🏊 Avai...gurkirankumar98700
 
Deconstructing Gendered Language; Feminist World-Making 2024
Deconstructing Gendered Language; Feminist World-Making 2024Deconstructing Gendered Language; Feminist World-Making 2024
Deconstructing Gendered Language; Feminist World-Making 2024samlnance
 
FULL ENJOY - 9953040155 Call Girls in Indirapuram | Delhi
FULL ENJOY - 9953040155 Call Girls in Indirapuram | DelhiFULL ENJOY - 9953040155 Call Girls in Indirapuram | Delhi
FULL ENJOY - 9953040155 Call Girls in Indirapuram | DelhiMalviyaNagarCallGirl
 
Call Girl Service In Dubai #$# O56521286O #$# Dubai Call Girls
Call Girl Service In Dubai #$# O56521286O #$# Dubai Call GirlsCall Girl Service In Dubai #$# O56521286O #$# Dubai Call Girls
Call Girl Service In Dubai #$# O56521286O #$# Dubai Call Girlsparisharma5056
 
FULL ENJOY - 9953040155 Call Girls in Sangam Vihar | Delhi
FULL ENJOY - 9953040155 Call Girls in Sangam Vihar | DelhiFULL ENJOY - 9953040155 Call Girls in Sangam Vihar | Delhi
FULL ENJOY - 9953040155 Call Girls in Sangam Vihar | DelhiMalviyaNagarCallGirl
 
Lucknow 💋 Cheap Call Girls In Lucknow Finest Escorts Service 8923113531 Avail...
Lucknow 💋 Cheap Call Girls In Lucknow Finest Escorts Service 8923113531 Avail...Lucknow 💋 Cheap Call Girls In Lucknow Finest Escorts Service 8923113531 Avail...
Lucknow 💋 Cheap Call Girls In Lucknow Finest Escorts Service 8923113531 Avail...anilsa9823
 
FULL ENJOY - 9953040155 Call Girls in Shahdara | Delhi
FULL ENJOY - 9953040155 Call Girls in Shahdara | DelhiFULL ENJOY - 9953040155 Call Girls in Shahdara | Delhi
FULL ENJOY - 9953040155 Call Girls in Shahdara | DelhiMalviyaNagarCallGirl
 
Akola Call Girls #9907093804 Contact Number Escorts Service Akola
Akola Call Girls #9907093804 Contact Number Escorts Service AkolaAkola Call Girls #9907093804 Contact Number Escorts Service Akola
Akola Call Girls #9907093804 Contact Number Escorts Service Akolasrsj9000
 
Patrakarpuram ) Cheap Call Girls In Lucknow (Adult Only) 🧈 8923113531 𓀓 Esco...
Patrakarpuram ) Cheap Call Girls In Lucknow  (Adult Only) 🧈 8923113531 𓀓 Esco...Patrakarpuram ) Cheap Call Girls In Lucknow  (Adult Only) 🧈 8923113531 𓀓 Esco...
Patrakarpuram ) Cheap Call Girls In Lucknow (Adult Only) 🧈 8923113531 𓀓 Esco...akbard9823
 
Roadrunner Lodge, Motel/Residence, Tucumcari NM
Roadrunner Lodge, Motel/Residence, Tucumcari NMRoadrunner Lodge, Motel/Residence, Tucumcari NM
Roadrunner Lodge, Motel/Residence, Tucumcari NMroute66connected
 
Deira Call Girls # 0522916705 # Call Girls In Deira Dubai || (UAE)
Deira Call Girls # 0522916705 #  Call Girls In Deira Dubai || (UAE)Deira Call Girls # 0522916705 #  Call Girls In Deira Dubai || (UAE)
Deira Call Girls # 0522916705 # Call Girls In Deira Dubai || (UAE)wdefrd
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Pari Chowk | Noida
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Pari Chowk | NoidaFULL ENJOY 🔝 8264348440 🔝 Call Girls in Pari Chowk | Noida
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Pari Chowk | Noidasoniya singh
 
Charbagh / best call girls in Lucknow - Book 🥤 8923113531 🪗 Call Girls Availa...
Charbagh / best call girls in Lucknow - Book 🥤 8923113531 🪗 Call Girls Availa...Charbagh / best call girls in Lucknow - Book 🥤 8923113531 🪗 Call Girls Availa...
Charbagh / best call girls in Lucknow - Book 🥤 8923113531 🪗 Call Girls Availa...gurkirankumar98700
 
Islamabad Call Girls # 03091665556 # Call Girls in Islamabad | Islamabad Escorts
Islamabad Call Girls # 03091665556 # Call Girls in Islamabad | Islamabad EscortsIslamabad Call Girls # 03091665556 # Call Girls in Islamabad | Islamabad Escorts
Islamabad Call Girls # 03091665556 # Call Girls in Islamabad | Islamabad Escortswdefrd
 
Authentic # 00971556872006 # Hot Call Girls Service in Dubai By International...
Authentic # 00971556872006 # Hot Call Girls Service in Dubai By International...Authentic # 00971556872006 # Hot Call Girls Service in Dubai By International...
Authentic # 00971556872006 # Hot Call Girls Service in Dubai By International...home
 

Último (20)

exhuma plot and synopsis from the exhuma movie.pptx
exhuma plot and synopsis from the exhuma movie.pptxexhuma plot and synopsis from the exhuma movie.pptx
exhuma plot and synopsis from the exhuma movie.pptx
 
Lucknow 💋 Call Girls in Lucknow | Service-oriented sexy call girls 8923113531...
Lucknow 💋 Call Girls in Lucknow | Service-oriented sexy call girls 8923113531...Lucknow 💋 Call Girls in Lucknow | Service-oriented sexy call girls 8923113531...
Lucknow 💋 Call Girls in Lucknow | Service-oriented sexy call girls 8923113531...
 
Hazratganj / Call Girl in Lucknow - Phone 🫗 8923113531 ☛ Escorts Service at 6...
Hazratganj / Call Girl in Lucknow - Phone 🫗 8923113531 ☛ Escorts Service at 6...Hazratganj / Call Girl in Lucknow - Phone 🫗 8923113531 ☛ Escorts Service at 6...
Hazratganj / Call Girl in Lucknow - Phone 🫗 8923113531 ☛ Escorts Service at 6...
 
Indira Nagar Lucknow #Call Girls Lucknow ₹7.5k Pick Up & Drop With Cash Payme...
Indira Nagar Lucknow #Call Girls Lucknow ₹7.5k Pick Up & Drop With Cash Payme...Indira Nagar Lucknow #Call Girls Lucknow ₹7.5k Pick Up & Drop With Cash Payme...
Indira Nagar Lucknow #Call Girls Lucknow ₹7.5k Pick Up & Drop With Cash Payme...
 
Lucknow 💋 Call Girl in Lucknow Phone No 8923113531 Elite Escort Service Avail...
Lucknow 💋 Call Girl in Lucknow Phone No 8923113531 Elite Escort Service Avail...Lucknow 💋 Call Girl in Lucknow Phone No 8923113531 Elite Escort Service Avail...
Lucknow 💋 Call Girl in Lucknow Phone No 8923113531 Elite Escort Service Avail...
 
Charbagh ! (Call Girls) in Lucknow Finest Escorts Service 🥗 8923113531 🏊 Avai...
Charbagh ! (Call Girls) in Lucknow Finest Escorts Service 🥗 8923113531 🏊 Avai...Charbagh ! (Call Girls) in Lucknow Finest Escorts Service 🥗 8923113531 🏊 Avai...
Charbagh ! (Call Girls) in Lucknow Finest Escorts Service 🥗 8923113531 🏊 Avai...
 
Deconstructing Gendered Language; Feminist World-Making 2024
Deconstructing Gendered Language; Feminist World-Making 2024Deconstructing Gendered Language; Feminist World-Making 2024
Deconstructing Gendered Language; Feminist World-Making 2024
 
FULL ENJOY - 9953040155 Call Girls in Indirapuram | Delhi
FULL ENJOY - 9953040155 Call Girls in Indirapuram | DelhiFULL ENJOY - 9953040155 Call Girls in Indirapuram | Delhi
FULL ENJOY - 9953040155 Call Girls in Indirapuram | Delhi
 
Call Girl Service In Dubai #$# O56521286O #$# Dubai Call Girls
Call Girl Service In Dubai #$# O56521286O #$# Dubai Call GirlsCall Girl Service In Dubai #$# O56521286O #$# Dubai Call Girls
Call Girl Service In Dubai #$# O56521286O #$# Dubai Call Girls
 
FULL ENJOY - 9953040155 Call Girls in Sangam Vihar | Delhi
FULL ENJOY - 9953040155 Call Girls in Sangam Vihar | DelhiFULL ENJOY - 9953040155 Call Girls in Sangam Vihar | Delhi
FULL ENJOY - 9953040155 Call Girls in Sangam Vihar | Delhi
 
Lucknow 💋 Cheap Call Girls In Lucknow Finest Escorts Service 8923113531 Avail...
Lucknow 💋 Cheap Call Girls In Lucknow Finest Escorts Service 8923113531 Avail...Lucknow 💋 Cheap Call Girls In Lucknow Finest Escorts Service 8923113531 Avail...
Lucknow 💋 Cheap Call Girls In Lucknow Finest Escorts Service 8923113531 Avail...
 
FULL ENJOY - 9953040155 Call Girls in Shahdara | Delhi
FULL ENJOY - 9953040155 Call Girls in Shahdara | DelhiFULL ENJOY - 9953040155 Call Girls in Shahdara | Delhi
FULL ENJOY - 9953040155 Call Girls in Shahdara | Delhi
 
Akola Call Girls #9907093804 Contact Number Escorts Service Akola
Akola Call Girls #9907093804 Contact Number Escorts Service AkolaAkola Call Girls #9907093804 Contact Number Escorts Service Akola
Akola Call Girls #9907093804 Contact Number Escorts Service Akola
 
Patrakarpuram ) Cheap Call Girls In Lucknow (Adult Only) 🧈 8923113531 𓀓 Esco...
Patrakarpuram ) Cheap Call Girls In Lucknow  (Adult Only) 🧈 8923113531 𓀓 Esco...Patrakarpuram ) Cheap Call Girls In Lucknow  (Adult Only) 🧈 8923113531 𓀓 Esco...
Patrakarpuram ) Cheap Call Girls In Lucknow (Adult Only) 🧈 8923113531 𓀓 Esco...
 
Roadrunner Lodge, Motel/Residence, Tucumcari NM
Roadrunner Lodge, Motel/Residence, Tucumcari NMRoadrunner Lodge, Motel/Residence, Tucumcari NM
Roadrunner Lodge, Motel/Residence, Tucumcari NM
 
Deira Call Girls # 0522916705 # Call Girls In Deira Dubai || (UAE)
Deira Call Girls # 0522916705 #  Call Girls In Deira Dubai || (UAE)Deira Call Girls # 0522916705 #  Call Girls In Deira Dubai || (UAE)
Deira Call Girls # 0522916705 # Call Girls In Deira Dubai || (UAE)
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Pari Chowk | Noida
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Pari Chowk | NoidaFULL ENJOY 🔝 8264348440 🔝 Call Girls in Pari Chowk | Noida
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Pari Chowk | Noida
 
Charbagh / best call girls in Lucknow - Book 🥤 8923113531 🪗 Call Girls Availa...
Charbagh / best call girls in Lucknow - Book 🥤 8923113531 🪗 Call Girls Availa...Charbagh / best call girls in Lucknow - Book 🥤 8923113531 🪗 Call Girls Availa...
Charbagh / best call girls in Lucknow - Book 🥤 8923113531 🪗 Call Girls Availa...
 
Islamabad Call Girls # 03091665556 # Call Girls in Islamabad | Islamabad Escorts
Islamabad Call Girls # 03091665556 # Call Girls in Islamabad | Islamabad EscortsIslamabad Call Girls # 03091665556 # Call Girls in Islamabad | Islamabad Escorts
Islamabad Call Girls # 03091665556 # Call Girls in Islamabad | Islamabad Escorts
 
Authentic # 00971556872006 # Hot Call Girls Service in Dubai By International...
Authentic # 00971556872006 # Hot Call Girls Service in Dubai By International...Authentic # 00971556872006 # Hot Call Girls Service in Dubai By International...
Authentic # 00971556872006 # Hot Call Girls Service in Dubai By International...
 

Databases in the Cloud Seminar Big Data Analytics

  • 1. Databases in the Cloud Seminar: Big Data Analytics Winter Semester 2012-13 Moshfiqur Rahman
  • 2. Agenda  Cloud Computing Introduction  Big Data  RDBMS and Cloud Databases  Scalability, Elasticity, Availability – New attributes for databases  In RDBMS  In Cloud Databases  Challenges in Cloud Databases  Big Data Analytics in Cloud  Conclusion 2
  • 3. Cloud Computing Introduction  What is in the Cloud?  Application as a service  Hardware and system software  Public and Private Cloud  Cloud Computing attributes  Virtually infinite computing resources  Start small and grow as needed  Pay-per-use scheme 3
  • 4. Cloud Computing Introduction Source: http://en.wikipedia.org/wiki/Cloud_computing 4
  • 5. Big Data  Large and complex data sets  Exponential growth  Structured, semi-structured, unstructured  Hard to process in traditional database system  Challenges with Big Data  Capture, scrutinization, storage, search, sharing, analysis…  Big Data sources  Mobile devices  Sensors  Software/server logs  Cameras and so on… 5
  • 6. Big Data Attributes  Volume  Factors contribute to the increase of data, for example, text streams from social networks  Hidden relationships in data  Data storage cost decreased but data analysis issues increased!  Variety  Data can be of all possible formats  Structured/semi-structured data from RDBMS  Unstructured data from documents, emails, video, audio, sensors  Velocity  Keep up the data processing speed with data production speed  Streams of real time data from sensors and social media  Reacting quickly to the increase of data velocity 6
  • 7. Relational Database  A relational database is  Collection of tables (entities)  Multiple columns  Multiple rows (tuples)  Accessed by SQL  Join multiple tables to get related data  Normalization is used to minimize redundancy and dependency  Referential Integrity is used to ensure data consistency  Managed by Relational Database Management System (RDBMS)  Oracle, MS SQL Server, MySQL, etc. 7
  • 8. RDBMS - A misfit for cloud?  RDBMS has  Simplicity  Robustness  Flexibility  Performance  Compatibility  (Limited) Scalability  Cloud databases require  Scalability  Elasticity  Availability 8
  • 9. Cloud Databases  Key/value store – a new kind of database management system  Store data as key/value pair  Targeted for specialized applications where a RDBMS is not suitable  Also known as  Document-oriented database  Internet-facing database  Attribute-oriented database  Distributed database, etc. 9
  • 10. Cloud Databases - Advantages  Stores data in format of items  Customer items, Order items in an e-commerce system  A single item contains all the relevant data  Relationships are not deprecated, just simplified  Order items contains the keys of associated Customer item and Product items  Able to scale easily and dynamically  Allows the user to pay only for used resources  Allows the vendor to scale their infrastructure depending on their entire platform size 10
  • 11. Cloud Databases - Advantages  Reduce the development time  By decreasing developing time with object relational data mapping  Easier to map application object to key/value database items 11
  • 12. Cloud Databases - Disadvantages  Relationships are not defined in data models  DBMS cannot enforce data integrity  Deleting item from a set of related items will make data inconsistent  No shared standard  Totally different set of APIs  Application developed for one cloud vendor is hard to port to another cloud vendor 12
  • 13. Scalability  Desired property of a system to accommodate growing amounts of work  By adding more hardware in single machine  By adding more machines (a.k.a. node)  Two ways to scale a system  Vertically or Scale Up  New hardware is added to a single node in a system  Adding more processors or memory to a single machine  Horizontally or Scale Out  Add more nodes to a system  Scaling out from one web-server system to a three web-server system 13
  • 14. Elasticity  Ability to spread the workloads dynamically over the available resources  Automatically adds more resources when workload increases  Automatically shrinks back and removes the unneeded resources when workload decreases  Very important for cloud environment  Pay-per-scheme 14
  • 15. Availability  Allows the user read and write data at any time without blocking them  Response time is virtually constant and does not depend on  Number of concurrent users  Database size  Any other system parameter  Automatic data backups and failover management 15
  • 16. Scalability in RDBMS  RDBMS provides limited scalability  Scale up on a single node  Scale out with relatively small numbers of nodes  Scale up is not infinite but increase in workload can be virtually infinite  Scale out is overwhelming in system with hundreds or thousands of nodes 16
  • 17. Elasticity in RDBMS  RDBMS allows very limited elasticity at storage and web/application server layers  Add a web server when the workload increases and adjust the throughput to dissipate the loads to the new server  When workload decreases, detach the server from the system, use it for different purposes  At storage layers, more disks can be added  Adding a bigger machine and replace the overloaded database server  Expensive investment  Unnecessary investment for a seasonal hype 17
  • 18. Availability in RDBMS  Employs storage redundancy by performing data replication  Also ensure improved performance for concurrent users  Provides resiliency in case of a failure  Data replication is not so easy process  Synchronization  Replicate the whole database to make synchronization easier 18
  • 19. Scalability, Elasticity and Availability in Cloud Databases  New breed of databases focusing on scalability, elasticity and availability  Key/value store supports nearly limitless scalability  In the expense of other benefits come with RDBMS  Data accessed by a single key  Provides the basis for scalability  Data item is contained in a single object and handled by a single node  Some modern applications need multiple key/value pair access atomically  Online multi-player games, Google Drive  Hence required multi-key atomicity 19
  • 20. Scalability, Elasticity and Availability in Cloud Databases  Different database implementations  Google’s MegaStore  G-Store  Relational Cloud  ElasTras 20
  • 21. MegaStore  Uses Bigtable as the underlying system  Provides multi-key atomicity  Data Fusion  Group multiple key/value pair as single collection  Write/ahead logging  Two-phase commit to support ACID transactions on a collection 21
  • 22. MegaStore  Advantages  Allows entities to be arbitrarily distributed over multiple nodes  Better performance when entity group co-located in a single node  Disadvantages  Exhibits performance issues when entity group is distributed across multiple nodes 22
  • 23. G-Store  Provides transactional multi-key access over dynamic, non-overlapping groups of keys  Created groups are transient in nature  Creates abstract group for on-demand transaction access  Leader key, follower keys  Ownership of read/write access transfers to the node hosting the key group  No key should not be claimed by multiple group, no key should be without a owner 23
  • 24. G-Store  Advantages  Transactions are efficient for key group resides on single node  Disadvantages  A group must be small enough to reside on a single node 24
  • 25. Relational Cloud  Works on Elasticity extensively  Uses a graph-based partitioning method to split large databases across multiple machines  Workload aware partitioning strategy  Frontend transaction trace component  keeps track of transactions  Analyze the transactions to determine the set of tuples accessed together  Creates a graph of transactions  Weight is given to the edges to denote how often a transactions are executed 25
  • 27. Relational Cloud  Advantages  Uses the MySQL, Postgre-SQL as backend databases  Migrate the database partitions without causing downtime  Replicate the data for availability  Disadvantages  Scaling the graph representation is difficult as it leads to a graph with N nodes and up to N2 edges for an N-tuple database 27
  • 28. ElasTras  A cloud database under research providing better scalability and elasticity with transactional data access Figure: System overview of ElasTras 28
  • 29. ElasTras  Two level Transaction Manager (TM)  Higher level TM (HTM)  Owning TM (OTM)  When any transaction request arrives  Load balancer uses some load balancing policy and forward the request to appropriate HTM  HTM decides whether to execute the transaction locally or forward to OTM  OTM has exclusive access rights to the data accessed by a single transaction  System state information and database metadata managed by Metadata Manager 29
  • 30. ElasTras  Two approach to partition the database  Static Partitioning  Dynamic Partitioning  Static Partitioning  Database designer defines the partitioning  ElasTras is responsible for mapping the partitions to their specific OTMs  Also reassigns partitions if workload increases  Application has the knowledge of partitions  ElasTras can provide ACID transactional guarantees as transactions executed locally to a partition 30
  • 31. ElasTras  Dynamic Partitioning  Basis for the elasticity of the data store  Uses range or hash based partitioning scheme  Applications are not aware of the partitions  Transactions are not guaranteed to be limited to a single partition  Provides mini transactions with restricted transactional semantics to ensure scalability and avoid distributed transactions  Mini transactions ensures recovery but no global synchronization 31
  • 32. ElasTras  Advantages  Provides transactional guarantees in scalable manner  OTM’s reassigning partitions capability with changing workload ensures elasticity and scalability  Provides ACID transactions when transactions are limited to a single partition  Disadvantages  In dynamic partitioning, ElasTras only supports mini transactions with restricted transactional semantics to avoid distributed transactions  Mini transactions only ensure recovery but no global synchronization 32
  • 33. Challenges in Cloud Databases  Importing data  Data transport are complex and may incur huge cost  Auto failover management  Server crashes, hardware malfunction  Database must be replicated, automatically replace and start working if any failure occurs  Auto scalability and elasticity management  Scale instantly and automatically both throughput and size  Very granular increases and shrinking back in resources 33
  • 34. Big Data Analytics  Big Data Analytics  Process of analyzing huge amount of structured, semi- structured and unstructured data of variety types  Discover the hidden patterns and unknown correlations in data  Companies are interested in big data analytics to achieve competitive advantages over rival companies  Through effective marketing  Propose new innovative services 34
  • 35. Big Data Analytics  Big data analytics help companies make better business decisions  Traditional analytic software are available for data analysis  Advanced technologies such as predictive analysis, data mining, etc.  But, traditional analytics software  is not suitable for big data with semi-structured and/or unstructured data  is not able to handle the demand of processing power needs to analyze those big data  New class of big data analytics environment has emerged  NoSQL databases  Hadoop  MapReduce 35
  • 36. Big Data Analytics in Cloud  Available database as a service in Cloud  Amazon SimpleDB  Google AppEngine  Microsoft SQL Azure  so on…  Limitations in Cloud  Limitations over query execution time, for example, Amazon SimpleDB restricts any query which takes more than 5 sec  Limitations over result dataset size, for example, Google AppEngine does not allow users to retrieve more than 1000 items for any query  Impractical for big data analytics 36
  • 37. Big Data Analytics in Cloud  Specialized solution for big data analytics in cloud  Google BigQuery  Amazon Elastic MapReduce (EMR) 37
  • 38. Google BigQuery  Cloud based interactive query service for big data  Implementation of Dremel, a parallel query engine  Query executes on a small number of very large append- only tables  Two core technology  Columnar storage  Records are separated in column values  Put all single column values in different storage volume forming a tree  Tree architecture  Query pushing down to the branches of the tree  Results are aggregated from the leaves 38
  • 39. Amazon Elastic MapReduce (EMR)  A hosted Hadoop framework  Provides a web service to process huge amounts of data  Contains a MapReduce framework  Sub divides the data in smaller chunks and process them in parallel (the “map” function)  Recombines them into final solution (the “reduce” function) 39
  • 40. Google BigQuery vs. Amazon EMR Head to Head Google BigQuery Amazon EMR Interactive data analysis tool for large data set A programming framework to process big data. Comparable to Hive but claims to be faster Accessible by data analysis application than that developed in Pig, Hive or other programming languages using Amazon’s SDK Designed to run faster query and user friendly Supports implementing complex data even for non-programmers with built-in GUI processing logic Good for ad-hoc and trial-and-error interactive Good for batch processing of large dataset query on large dataset for quick analysis and doing time consuming data conversion and troubleshooting aggregation Provides a regular expression engine to Structuring data fully dependent on application structure the unstructured data logic Does not support large result set neither Supports both large result set and joining of joining of large tables table Does not support updating existing data, only Supports updating existing data append of data is possible 40
  • 41. Conclusion  End of RDBMS?  Cloud databases for big data  Finding relationships in data  Solving the problem for scalability, elasticity and availability  More rising issues  Efficient multi tenancy  Data privacy 41
  • 42. Thanks for your attention 42