SQL, NoSQL & BigData in Data Architecture

Venu Anuganti
Nov 2012
http://scalein.com/
http://venublog.com/
Who am I
• Data Architect, Technology Advisor & Seed Investor

• Design, Implement & Support SQL, NoSQL and BigData
  Solutions

   – Industry: Databases, Games, Social, Video, SaaS, Analytics,
     Warehouse, Web, Financial, Telco, Mobile, Advertising & SEM
     Marketing


   – Consulted for 22+ Fortune 500 companies


• http://scalein.com/
Agenda
• Current trends in SQL, NoSQL and BigData

• Why “data architecture” is key for every company

• Key factors in getting the right solution

• Typical Big “Data Architecture”

• Overview of popular data sources, quick comparison

• How to build “data analytics” for “data science”
Data Trends
Current Trends
• A lot of dynamics in the market, too much data

• SQL, NoSQL, BigData & Analytics – buzz everywhere, including
  among investors

• NoSQL and BigData are becoming a hot topic for every engineer,
  team, company & management

• Nothing less than the current tablet war between Apple,
  Microsoft, Google, Amazon and Samsung

• Very good sign, as technology is evolving. But a lot of people
  are getting confused: what solution should I start with?

   – Confusion makes for a slow start for a lot of startups, and even
     for industry leaders trying to make a shift
Current Trends - SQL
• SQL is slowing down.. not really
   – OLTP can’t be replaced easily

• Key factors - Pros
   – Transactional, Concurrency, Consistency & Durability
   – Proven, SQL, JDBC/ODBC, native protocols
   – Widely adopted, fits for all & interoperability
   – Legacy, risk free, easy adoption & expert community
   – Low-latency response times, close to 0 secs
   – Very good for small data-sets, takes advantage of bleeding-edge hardware
     (SSD, Flash cards, high memory, latest CPUs, cloud enabled)
   – Easy read scaling, write scaling needs application logic


• Key factors – Cons
   –   Transactional, Concurrency, Consistency & Durability
   –   Scalability, Clustering, Distributed
   –   Fixed schema, online management
   –   Built-in clustering is hard due to the nature of ACID
   –   Bound by hardware, Scale-UP
Current Trends - NoSQL
• NoSQL is racing

• Key factors - Pros
   –   Overcomes known SQL limitations
   –   Eventually consistent
   –   Clustering, Scalability, Distributed (not all)
   –   Schema free
   –   Each solves its own specific problem
   –   Easy to adopt

• Key factors – Cons
   –   Consistency (varies), Durability
   –   Maturity, major solutions are not yet “production” grade
   –   Does not fit all, individual solution for each problem
   –   Response time, depends on each solution
   –   95% relies on application logic to explore the stored data
Current Trends - BigData
• BigData is the latest industry buzz, trend or …

• Gartner – $28B spend in 2012 & $34B in 2013
   – 2013 top-10 technology trends – 6th place

• Solves large data problems that existed for years
   – Social, User, Mobile growth demanded such a solution (FB
     crossed 1B users, classic example)
   – Google “BigTable” is the key, and new papers like Dremel drive
     it further
   – Amazon “Dynamo” follows
   – Hadoop & its ecosystem are becoming a synonym for BigData

• Combines vast structured/un-structured data
   – Overcomes the legacy warehouse model
   – Brings data analytics & data science
   – Real-time, mining, insights, discovery & complex reporting
Current Trends - BigData
• Key factors - Pros

   –   Can handle any size
   –   Commodity hardware
   –   Scalable, Distributed, Highly Available
   –   Ecosystem & growing community

• Key factors – Cons

   –   Latency
   –   Redundancy, Durability, Maturity
   –   Tradeoff on consistency
   –   Hardware evolution, even though designed for commodity
Data Architecture
Data Architecture
• No standard solution that fits all

• Business and data define the right solution

• It’s all about solving “business” problems

• You need to find the right tool that does the job

   – If company X uses MySQL to scale their 500M users, it does not
     mean you can use MySQL to scale your 100M users

   – If company Y uses MongoDB for storing 100M user profile
     data, it does not mean you can also take it for granted
Key Factors
• Resources are the key
  – A good engineer can make a bad product work
  – A bad engineer can make a good product suck

• Understand the business
  – Data sources & data growth
  – Data consumption
     • end user vs. API vs. data science vs. reporting vs. internal
  – SLA, Response time, Turn around time, Recovery times
  – Cost; Evolve as business grows, don’t over-architect from day-1
  – Capacity planning, leave enough room for failure & growth
Tradeoff – Data Architecture
• Performance vs. Scale vs. Stability

• OLTP vs. OLAP

• Internal vs. External

• Application stack

• Cloud vs. Data center

• Hardware vs. features vs. product vs. cost
Typical “Data” Architecture
Choosing The Right Solution
• Store:

  – SQL, key-value, in-memory, document, graph, bigdata,
    node.js (server end service), s3, azure, file system, …

• Log:

  – Log processing tools for structured/un-structured
    (scripts, splunk, flume, scribe, chukwa, loggly, kibana, …)

• Caching:

  – File System, Use replicas, Write Through Cache (WTC),
    Read From Cache (RFC)
  – CDN/S3/Azure frequent processing, local cache
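
The write-through cache (WTC) idea above can be sketched roughly as follows. This is a minimal illustration, not a production design: the class name `WriteThroughCache`, the plain dict standing in for the backing store, and the hit/miss counters are all assumptions made for the example.

```python
class WriteThroughCache:
    """Write-through cache in front of a slower backing store (hypothetical)."""

    def __init__(self, store):
        self.store = store   # authoritative store; any dict-like object
        self.cache = {}      # in-process cache
        self.hits = 0
        self.misses = 0

    def put(self, key, value):
        # Write-through: update the store first, then the cache,
        # so the cache never holds a value the store does not have.
        self.store[key] = value
        self.cache[key] = value

    def get(self, key):
        # Read from cache; on a miss, load from the store and remember it.
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        value = self.store[key]   # raises KeyError if the key is absent
        self.cache[key] = value
        return value


backing_store = {"user:1": "alice"}   # stand-in for a database
cache = WriteThroughCache(backing_store)
cache.put("user:2", "bob")       # lands in both store and cache
first = cache.get("user:1")      # miss, loaded from the store
second = cache.get("user:1")     # hit, served from the cache
```

Because every write goes to the store before the cache, reads served from the cache are never ahead of the store; the cost is that each write pays the store's latency.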
Choosing The Right Solution ..
• Platform:

  – php, ruby, java, scala, python, c/c++, client/server, rest,
    soap, http, api, etc.

• (Dev)-Operation:

  – OS, file system, automation using puppet/chef,
    security, performance metrics, monitoring, in-depth
    exposure to every layer (nagios, ganglia, zabbix, new relic,
    tsdb, etc.)

• Search:

  – built-in, solr, elastic search, full-text
Data Store Evaluation
Evaluate – Data Store
• Key Evaluation Requirements
  –Transactional, Durability & Consistency
  –Response time
  –Functionality
  –Data characteristics
  –Scalability, Clustering
  –Failover
  –Maintenance, Online changes, Node Management
  –Maturity
  –Community, Support
  –Hosted or Managed
  –Cost, open source
  –Big “NO” to Appliance models, premium cost solutions
Decide what you need
• SQL
  – Relational, transactional processing

• NoSQL
  – Non relational, distributed, high performance and highly
    scalable

• Analytics, Warehouse, BigData
  – Data Warehousing, Analytics, Data science, and reporting

• Combination of all 3
  – Begin with SQL, NoSQL and eventually need BigData/Analytics
    platform
SQL Stores
• Disk based storage, Fixed schema

• Data is stored as table (row by row and columns – row
  store), Durable and transactional

• Mainly B-tree as the indexing mechanism

• Dynamic locking/ Lock free for concurrency control

• Write-ahead log (WAL) / transactional log for crash
  recovery
• Takes advantage of bleeding-edge hardware (SSD, flash cards,
  CPUs, memory, cloud enabled, …)

• Concurrent read/write/update/delete same row
SQL – Good
• Simple or complex aggregation

• Statistics, reports at data store level

• Need access to more than one tuple of information

• Results based on multiple search conditions
  – SELECT foo FROM bar WHERE X=1 AND Y=2

• Fetching of ordered or array of data

• Compatible with many tools
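
As a small illustration of the points above, the sketch below runs a multi-condition search and a store-level aggregation using Python's built-in `sqlite3` module. The table `bar` and columns `foo`, `X`, `Y` come from the slide's query; the `amount` column and the sample rows are assumptions added for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bar (foo TEXT, X INTEGER, Y INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO bar VALUES (?, ?, ?, ?)",
    [("a", 1, 2, 10), ("b", 1, 2, 5), ("c", 1, 3, 7), ("d", 2, 2, 1)],
)

# Results based on multiple search conditions, returned in order.
rows = conn.execute(
    "SELECT foo FROM bar WHERE X = ? AND Y = ? ORDER BY foo", (1, 2)
).fetchall()

# Aggregation and statistics computed at the data store level.
totals = conn.execute(
    "SELECT X, COUNT(*), SUM(amount) FROM bar GROUP BY X ORDER BY X"
).fetchall()
```

Here the filtering, ordering and grouping all happen inside the engine; the application only receives the final rows.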
SQL – Bad

• SQL complexity, parsing cost, client/server
  overhead

• Learning and relational model design

• Performance and Scalability

  – Strictly single node write
  – Sharding causes more trouble operationally
  – Operational maintenance, fire fighting

• Puts a brake on rapid development cycles
NoSQL Stores …
• Non relational, schema free

• Highly Distributed

• Simple CLI, REST, SOAP or API driven

• Eventually consistent, varies from store to store

• Ability to dynamically define new attributes

• Concurrency & Consistency – @application
NoSQL Stores …
• Multiple Types based on storage architecture

• Key Value, KV
     • Very popular for simple key-value lookups; disk/memory

• Document
     • Popular for document type of storage

• Graph
     • Connected graph with entity relationship

• Column Family
     • Key value with fixed column families, allows dynamic columns
       within column family
NoSQL Stores
• Key-Value Stores       • Column Family

  – Dynamo Clones          – BigTable Clones

     •   Membase              • Cassandra
     •   Riak                 • HBase
     •   Redis                • HyperTable
     •   Tokyo Cabinet
     •   Voldemort

• Document Stores        • Graph Databases
     • MongoDB                •   Neo4J
     • CouchDB                •   InfoGrid
     • SimpleDB               •   AllegroGraph
                              •   FlockDB
NoSQL - Good
• Fits very well for volatile data

• High read or write throughput

• Automatic horizontal scalability (Consistent hashing)

• Simple to implement, no investment for schema design

• Application logic defines object model

• Support of MVCC in some form

• Compaction and un-compaction happen at app tier

• In-memory or disk based or combination @performance penalty
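
The consistent hashing mentioned above can be sketched as a hash ring with virtual nodes. This is an illustrative toy, not how any particular store implements it: the class name `HashRing`, the use of MD5, the node names and the default of 100 virtual nodes are all assumptions.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    # Any stable hash works; MD5 is used here only for its even spread.
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring with virtual nodes (hypothetical)."""

    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self._ring = []   # sorted list of (hash, node) points
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each node owns several points on the ring to smooth the load.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def get_node(self, key):
        if not self._ring:
            raise ValueError("ring has no nodes")
        # First ring point at or after the key's hash, wrapping around.
        idx = bisect.bisect_left(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


ring = HashRing(["node-a", "node-b", "node-c"])
owner = ring.get_node("user:42")
```

The useful property is that adding a node only moves the keys that now fall on the new node's points; every other key stays where it was, which is what makes horizontal scaling largely automatic.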
NoSQL - Good
• Rapid development cycles, programmer friendly

• Reduces the footprint at data store level

• NoSQL is in general faster than SQL

• Supports INSERT, DELETE, SELECT

• Data is distributed by KEY over nodes (depends on solution)

• Lists, sets, queues, pub-sub are also supported by some NoSQL –
  Redis, Riak
NoSQL - Bad
• Packing and Un-packing of each key

• Lack of relation from one key to another

• Need to fetch the whole value for a key even when you need only 1 byte

• Concurrency control for the latest copy is up to you

• Data store is merely a storage layer, can’t be used for:

   –   Analytics
   –   Reporting
   –   Aggregation
   –   Ordered values
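
To make these drawbacks concrete, here is a rough sketch of a pure key-value store holding packed values. The dict `kv_store` stands in for the store, JSON stands in for whatever packing format an application might use, and the function names are hypothetical.

```python
import json

kv_store = {}   # stand-in for a key-value store: one opaque blob per key


def put_profile(user_id, profile):
    # Packing: the store only ever sees an opaque value.
    kv_store[f"user:{user_id}"] = json.dumps(profile).encode("utf-8")


def get_field(user_id, field):
    # Reading one field still fetches and unpacks the whole value.
    blob = kv_store[f"user:{user_id}"]
    return json.loads(blob.decode("utf-8"))[field]


def average_age():
    # Aggregation lives in application code: scan and unpack every key.
    ages = [json.loads(blob)["age"] for blob in kv_store.values()]
    return sum(ages) / len(ages)


put_profile(1, {"name": "alice", "age": 30, "city": "SF"})
put_profile(2, {"name": "bob", "age": 40, "city": "NYC"})
city = get_field(1, "city")
mean_age = average_age()
```

Stores that understand the value's structure (document or column-family stores) soften some of these costs, which is part of why the choice depends on the access pattern.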
SQL/NoSQL – Good and Bad
• Performance mainly depends on amount of memory

• When disk bound, both take a hit

   – SQL has advantage due to sequential and read-ahead

• Optimization towards frequently accessed data

   – SQL engines maintain LRU, buffer pool
   – Read from slave nodes, may not be up to date

• SQL Engines are proven and widely in use

• People use WTC – NoSQL & SQL
Analytic Stores
• Data warehousing, mainly for processing large data
  sets

• Data marts, Dimensional, Fact and Aggregate
  tables

• ETL, BI, Reporting, Analytics

• Columnar storage, distribution and compression are the key
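
Why columnar layout and compression go together can be shown with a small sketch. The sample rows, the column names and the choice of run-length encoding are assumptions for illustration; real analytic stores use a range of encodings.

```python
from itertools import groupby

# Row-oriented data: (day, country, clicks). Sample values are made up.
rows = [
    ("2012-11-01", "US", 10),
    ("2012-11-01", "US", 12),
    ("2012-11-01", "IN", 7),
    ("2012-11-02", "IN", 5),
]

# Row store -> column store: one list per column.
columns = {
    name: [row[i] for row in rows]
    for i, name in enumerate(("day", "country", "clicks"))
}


def rle_encode(values):
    """Run-length encode a column as [(value, run_length), ...]."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def rle_decode(runs):
    """Expand run-length pairs back into the original column."""
    return [value for value, count in runs for _ in range(count)]


day_runs = rle_encode(columns["day"])       # repeated values collapse into runs
total_clicks = sum(columns["clicks"])       # the scan touches one column only
```

Values within a single column tend to repeat, so they compress well, and an aggregate over one column never has to read the others.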
Data Analytics
• Data Analytics is critical for every business

  – Combine heterogeneous data sources
     • Weblogs, user activity, transactional data, purchase history,
       user profile, crm, marketing, campaign performance, …
  – Complex Reporting
  – Understand user behavior, geo, interest levels
  – Recommendation
  – User (re)targeting
  – Product usage, features most (not) liked
  – Increase ROI, user satisfaction

• It helps business in every aspect to inspect,
  understand, implement, apply – Waterfall model
Data Science
• Large data helps to build good models due to high
  probability
  –Statistics
  –Predictions
  –Data Analysis
  –Build test models, continuously
     •   AB test
     •   Apply slowly to selected users or clients
     •   Fine tune it
     •   Adopt globally
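
One common way to "apply slowly to selected users" is to hash each user into a stable bucket and raise a rollout percentage over time. The sketch below assumes that approach; the function names, the experiment name `new-ranking` and the use of SHA-256 are illustrative choices, not something the slides prescribe.

```python
import hashlib


def bucket(user_id: str, experiment: str) -> int:
    """Map a user to a stable bucket in [0, 100) for one experiment."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def variant(user_id: str, experiment: str, rollout_percent: int) -> str:
    """Return 'B' for users inside the rollout percentage, else 'A'."""
    return "B" if bucket(user_id, experiment) < rollout_percent else "A"


users = [f"user-{i}" for i in range(1000)]
# Start with roughly 10% of users on the new variant, later widen to about 50%.
at_10 = {u for u in users if variant(u, "new-ranking", 10) == "B"}
at_50 = {u for u in users if variant(u, "new-ranking", 50) == "B"}
```

Because the bucket depends only on the user and the experiment, a user who saw variant B at 10% keeps seeing it at 50%, and setting the percentage to 100 corresponds to adopting the change globally.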
Analytic Stores
• Columnar data warehouse solutions

  – GreenPlum (EMC, DCA appliance)
  – Vertica (HP, appliance coming)
  – ParAccel
  – InfoBright (MySQL based)
  – InfiniDB (open source, Calpont appliance)
  – Netezza (IBM, appliance)
  – XtremeData dbX (appliance)
  – TeraData
Analytic Stores - BigData
• Hadoop is leading the BigData platform

• Rapidly Growing - Analytics Platform

  – HDFS, Map Reduce direct processing
  – HIVE
  – HBASE
  – IMPALA - Cloudera announced last week based on
    Google’s Dremel
  – DRILL – Apache open source version, in the works
  – Google BigQuery
Document Store
•Document Stores

  – Supports a more complex data model than KV
  – Good at handling content management, session,
    profile data
  – Multi index support
  – Dynamic schemas, Nested schemas
  – Auto distributed, eventual consistency
  – MVCC (CouchDB) or app logic (MongoDB)

•MongoDB, SimpleDB: widely adopted in this space

•Use Case: Search by complex patterns & CRUD apps
Column Family Store

• HBase (Apache), Cassandra (Facebook) and Hypertable (Baidu)

   – HBase – CP
   – Cassandra – AP

• Model consists of rows and columns

• Scalability: Splitting of both rows and columns

   – Rows are split across nodes using primary key, range
   – Columns are distributed using groups
   – Horizontal and vertical partitioning can be used simultaneously

• Extension of document store
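
The row and column splitting described above can be sketched as a lookup that combines a key-range split (horizontal) with column groups (vertical). The split points, node names, column families and column names below are all made up for illustration and do not reflect any specific store's layout.

```python
import bisect

# Horizontal partitioning: rows are split by primary-key range.
# Keys below "g" go to node-1, "g" up to "n" to node-2, and so on.
SPLIT_KEYS = ["g", "n", "t"]
NODES = ["node-1", "node-2", "node-3", "node-4"]

# Vertical partitioning: columns are grouped into families.
COLUMN_FAMILIES = {
    "profile": ["name", "email"],
    "activity": ["last_login", "clicks"],
}


def node_for_row(row_key: str) -> str:
    """Pick the node whose key range contains row_key."""
    return NODES[bisect.bisect_right(SPLIT_KEYS, row_key)]


def family_for_column(column: str) -> str:
    """Find the column family that a column belongs to."""
    for family, members in COLUMN_FAMILIES.items():
        if column in members:
            return family
    raise KeyError(column)


def locate(row_key: str, column: str):
    """Both splits applied at once: (node, column family) for one cell."""
    return node_for_row(row_key), family_for_column(column)


placement = locate("mike", "clicks")
```

Range splits keep neighbouring keys together, which helps ordered scans, while column groups let a read touch only the family it needs.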
Graph Store
• Social Graph

• Relationship between entities

• Data modeling on social networks

• Common Use Cases

  –List of friends, Shared with common property
  –Recommendation system
  –Following
  –Followers
  –Common Connections
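
The use cases above map onto simple set operations over an adjacency structure. The sketch below keeps a "follows" graph in memory; the user names and function names are hypothetical, and a real graph database would index and persist these relationships rather than scan them.

```python
from collections import defaultdict

follows = defaultdict(set)   # follows[a] = set of users that a follows


def follow(a, b):
    follows[a].add(b)


def following(user):
    return set(follows.get(user, set()))


def followers(user):
    # Reverse lookup: everyone whose follow set contains this user.
    return {u for u, targets in follows.items() if user in targets}


def common_connections(a, b):
    return following(a) & following(b)


def recommend(user):
    """Suggest friends-of-friends the user does not already follow."""
    candidates = set()
    for friend in following(user):
        candidates |= following(friend)
    return candidates - following(user) - {user}


follow("ann", "bob")
follow("ann", "cat")
follow("dan", "bob")
follow("dan", "cat")
follow("bob", "eve")
follow("cat", "ann")

shared = common_connections("ann", "dan")
suggestions = recommend("ann")
```

The recommendation here is the simplest friends-of-friends rule; ranking by number of shared connections would be a natural next step.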
Cloud Data Stores
• “Database As Service” Models:

  – Amazon RDS, DynamoDB, SimpleDB, PostgreSQL
  – Xeround (MySQL)
  – Microsoft SQL Azure Database (SQL Server)
  – Google App Engine (NoSQL)
  – SalesForce Database.com (Oracle)
  – ClearDB (MySQL)
  – Cloudant(CouchDB)
Finally …

 SQL
Works great, can’t easily scale

 NoSQL
Works great, doesn’t fit all

 Analytics, BigData
Every business needs it
Questions ?

•http://scalein.com/
•http://venublog.com/
•venu@venublog.com
•Twitter: @vanuganti

Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 

Último (20)

Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 

SQL, NoSQL, BigData in Data Architecture

  • 1. SQL, NoSQL & BigData in Data Architecture Venu Anuganti Nov 2012 http://scalein.com/ http://venublog.com/
  • 2. Who am I • Data Architect, Technology Advisor & Seed Investor • Design, Implement & Support SQL, NoSQL and BigData Solutions – Industry: Databases, Games, Social, Video, SaaS, Analytics, Warehouse, Web, Financial, Telco, Mobile, Advertising & SEM Marketing – Consulted for 22+ Fortune 500 companies • http://scalein.com/
  • 3. Agenda • Current trends in SQL, NoSQL and BigData • Why “data architecture” is key for every company • Key factors in getting the right solution • Typical Big “Data Architecture” • Overview of popular data sources, quick comparison • How to build “data analytics” for “data science”
  • 5. Current Trends • A lot of dynamics in the market, too much data • SQL, NoSQL, BigData & Analytics – buzz everywhere, including among investors • NoSQL and BigData are becoming a hot topic for every engineer, team, company & management • No less intense than the current tablet war between Apple, Microsoft, Google, Amazon and Samsung • A very good sign, as technology is evolving. But a lot of people are getting confused: what solution should I start with? – Confusion slows the start for a lot of startups, and slows the shift even for industry leaders
  • 6. Current Trends - SQL • SQL is slowing down.. not really – OLTP can’t be replaced easily • Key factors - Pros – Transactional, Concurrency, Consistency & Durability – Proven, SQL, JDBC/ODBC, native protocols – Widely adopted, fits all & interoperable – Legacy, risk free, easy adoption & expert community – Low-latency response times, close to 0 secs – Very good for small data-sets, takes advantage of bleeding-edge hardware (SSD, Flash cards, high memory, latest CPUs, cloud enabled) – Easy read scaling; write scaling needs application logic • Key factors – Cons – Transactional, Concurrency, Consistency & Durability – Scalability, Clustering, Distributed – Fixed schema, online management – Built-in clustering is hard due to the nature of ACID – Bound by hardware, Scale-UP
  • 7. Current Trends - NoSQL • NoSQL is racing • Key factors - Pros – Overcomes known SQL limitations – Eventually consistent – Clustering, Scalability, Distributed (not all) – Schema free – Each solves its own specific problem – Easy to adopt • Key factors – Cons – Consistency (varies), Durability – Maturity, major solutions are not yet “production” grade – Does not fit all, individual solution for each problem – Response time, depends on each solution – 95% relies on application logic to explore the stored data
  • 8. Current Trends - BigData • BigData is the latest industry buzz, trend or … • Gartner – $28B in 2012 & $34B in 2013 spend – 2013 top-10 technology trends – 6th place • Solves large data problems that existed for years – Social, User, Mobile growth demanded such a solution (FB crossed 1B users, classic example) – Google “BigTable” is the key, and new papers like Dremel drive it further – Amazon “Dynamo” follows – Hadoop & ecosystem is becoming a synonym for BigData • Combines vast structured/un-structured data – Overcomes the legacy warehouse model – Brings data analytics & data science – Real-time, mining, insights, discovery & complex reporting
  • 9. Current Trends - BigData • Key factors - Pros – Can handle any size – Commodity hardware – Scalable, Distributed, Highly Available – Ecosystem & growing community • Key factors – Cons – Latency – Redundancy, Durability, Maturity – Tradeoff on consistency – Hardware evolution, even though designed for commodity
  • 11. Data Architecture • No standard solution that fits all • Business and data define the right solution • It’s all about solving “business” problems • You need to find the right tool that does the job – If company X uses MySQL to scale their 500M users, it does not mean you can use MySQL to scale your 100M users – If company Y uses MongoDB for storing 100M user profile data, it does not mean you can also take it for granted
  • 12. Key Factors • Resources are the key – A good engineer can make a bad product work – A bad engineer can make a good product suck • Understand the business – Data sources & data growth – Data consumption • end user vs. API vs. data science vs. reporting vs. internal – SLA, Response time, Turnaround time, Recovery times – Cost; Evolve as business grows, don’t over-architect from day-1 – Capacity planning, leave enough room for failure & growth
  • 13. Tradeoff – Data Architecture • Performance vs. Scale vs. Stability • OLTP vs. OLAP • Internal vs. External • Application stack • Cloud vs. Data center • Hardware vs. features vs. product vs. cost
  • 14. Typical “Data” Architecture
  • 15.
  • 16. Choosing The Right Solution • Store: – SQL, key-value, in-memory, document, graph, bigdata, node.js (server end service), s3, azure, file system, … • Log: – Log processing tools for structured/un-structured (scripts, splunk, flume, scribe, chukwa, loggly, kibana, …) • Caching: – File System, Use replicas, Write Through Cache (WTC), Read From Cache (RFC) – CDN/S3/Azure frequent processing, local cache
  • 17. Choosing The Right Solution .. • Platform: – php, ruby, java, scala, python, c/c++, client/server, rest, soap, http, api, etc. • (Dev)-Operation: – OS, file system, automation using puppet/chef, security, performance metrics, monitoring, in-depth exposure to every layer (nagios, ganglia, zabbix, new relic, tsdb, etc.) • Search: – built-in, solr, elastic search, full-text
  • 19. Evaluate – Data Store • Key Evaluation Requirements –Transactional, Durability & Consistency –Response time –Functionality –Data characteristics –Scalability, Clustering –Failover –Maintenance, Online changes, Node Management –Maturity –Community, Support –Hosted or Managed –Cost, open source –Big “NO” to Appliance models, premium cost solutions
  • 20. Decide what you need • SQL – Relational, transactional processing • NoSQL – Non relational, distributed, high performance and highly scalable • Analytics, Warehouse, BigData – Data Warehousing, Analytics, Data science, and reporting • Combination of all 3 – Begin with SQL, NoSQL and eventually need BigData/Analytics platform
  • 21. SQL Stores • Disk based storage, Fixed schema • Data is stored as tables (row by row, with columns – row store), Durable and transactional • Mainly B-tree as the indexing mechanism • Dynamic locking / lock-free for concurrency control • Write-ahead log (WAL) / transactional log for crash recovery • Takes advantage of bleeding-edge hardware (SSD, flash cards, CPUs, memory, cloud enabled, …) • Concurrent read/write/update/delete on the same row
  • 22. SQL – Good • Simple or complex aggregation • Statistics, reports at data store level • Need access to more than one tuple of information • Results based on multiple search conditions – SELECT foo FROM bar WHERE X = 1 AND Y = 2 • Fetching of ordered or array of data • Compatible with many tools
  • 23. SQL – Bad • SQL complexity, parsing cost, client/server overhead • Learning and relational model design • Performance and Scalability – Strictly single node write – Sharding causes more trouble operationally – Operational maintenance, fire fighting • Puts a brake on rapid development cycles
  • 24. NoSQL Stores … • Non relational, schema free • Highly Distributed • Simple CLI, REST, SOAP or API driven • Eventually consistent, varies from store to store • Ability to dynamically define new attributes • Concurrency & Consistency – @application
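The strengths listed on the "SQL – Good" slide can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the table `bar` and columns `foo`, `X`, `Y` come from the slide's query, while the sample rows and the helper name `run_demo` are invented for the example.

```python
import sqlite3


def run_demo():
    """Run a multi-condition query and an aggregate inside the data store."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE bar (foo TEXT, X INTEGER, Y INTEGER)")
    # Invented sample rows, for illustration only.
    conn.executemany(
        "INSERT INTO bar (foo, X, Y) VALUES (?, ?, ?)",
        [("a", 1, 2), ("b", 1, 3), ("c", 1, 2), ("d", 2, 2)],
    )
    # Results based on multiple search conditions, returned in order.
    matches = [
        row[0]
        for row in conn.execute(
            "SELECT foo FROM bar WHERE X = ? AND Y = ? ORDER BY foo", (1, 2)
        )
    ]
    # Aggregation computed by the store, not by application code.
    counts = dict(
        conn.execute("SELECT X, COUNT(*) FROM bar GROUP BY X ORDER BY X")
    )
    conn.close()
    return matches, counts
```

Calling `run_demo()` returns the matching `foo` values and a per-`X` row count; both are computed by the SQL engine rather than in application logic.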
  • 25. NoSQL Stores … • Multiple Types based on storage architecture • Key Value, KV • Very popular for simple key-value lookups; disk/memory • Document • Popular for document type of storage • Graph • Connected graph with entity relationship • Column Family • Key value with fixed column families, allows dynamic columns within column family
  • 26. NoSQL Stores • Key-Value Stores (Dynamo clones): Membase, Riak, Redis, Tokyo Cabinet, Voldemort • Column Family (BigTable clones): Cassandra, HBase, HyperTable • Document Stores: MongoDB, CouchDB, SimpleDB • Graph Databases: Neo4J, InfoGrid, AllegroGraph, FlockDB
  • 27. NoSQL - Good • Fits very well for volatile data • High read or write throughput • Automatic horizontal scalability (Consistent hashing) • Simple to implement, no investment for schema design • Application logic defines object model • Support of MVCC in some form • Compaction and un-compaction happen at the app tier • In-memory, disk-based, or a combination (at a performance penalty)
  • 28. NoSQL - Good • Rapid development cycles, programmer friendly • Reduces the footprint at data store level • NoSQL is in general faster than SQL • Supports INSERT, DELETE, SELECT • Data is distributed by KEY over nodes (depends on solution) • Lists, sets, queues, pub-sub are also supported by some NoSQL stores – Redis, Riak
  • 29. NoSQL - Bad • Packing and un-packing of each key • Lack of relation from one key to another • Need to fetch the whole value for a key even when you need 1 byte • Concurrency for the latest copy is your responsibility • Data store is merely a storage layer, can’t be used for: – Analytics – Reporting – Aggregation – Ordered values
  • 30. SQL/NoSQL – Good and Bad • Performance mainly depends on amount of memory • When disk bound, both take a hit – SQL has an advantage due to sequential reads and read-ahead • Optimization towards frequently accessed data – SQL engines maintain LRU, buffer pool – Read from slave nodes, may not be up to date • SQL Engines are proven and widely in use • People use WTC – NoSQL & SQL
  • 31. Analytic Stores • Data warehousing, mainly for processing large data sets • Data marts, Dimensional, Fact and Aggregate tables • ETL, BI, Reporting, Analytics • Columnar storage, Distribution and Compression are the key
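The "consistent hashing" and "data is distributed by KEY over nodes" points can be sketched as a small hash ring. This is a minimal illustration, not how any particular store implements it: the class name `HashRing`, the node names, the use of SHA-256 and the virtual-node count of 64 are all assumptions made for the example.

```python
import bisect
import hashlib


def _hash(text):
    """Map a string to a large integer position on the ring."""
    return int(hashlib.sha256(text.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent-hash ring with virtual nodes (illustrative sketch)."""

    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (position, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each physical node owns several positions to even out the load.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def node_for(self, key):
        # First ring position at or after the key's hash, wrapping around.
        idx = bisect.bisect_left(self._ring, (_hash(key), ""))
        if idx == len(self._ring):
            idx = 0
        return self._ring[idx][1]
```

The design point is that adding a node only takes over the keys that now fall just before its positions; every other key stays on the node it was already assigned to, which is what makes horizontal scaling cheap.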
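The packing and un-packing cost on the "NoSQL - Bad" slide can be illustrated with a plain dictionary standing in for a key-value store. Everything here is hypothetical: the `store` dict, the `user:<id>` key scheme, JSON as the packing format and the `country` field are assumptions chosen for the sketch.

```python
import json

# A plain dict stands in for a remote key-value store (assumption).
store = {}


def put_profile(user_id, profile):
    """Pack the whole profile into one opaque value."""
    store[f"user:{user_id}"] = json.dumps(profile)


def get_field(user_id, field):
    """Reading one field still fetches and unpacks the whole value."""
    blob = store[f"user:{user_id}"]
    return json.loads(blob)[field]


def count_by_country():
    """No store-level aggregation: the application scans and unpacks every value."""
    counts = {}
    for blob in store.values():
        country = json.loads(blob)["country"]
        counts[country] = counts.get(country, 0) + 1
    return counts
```

In this sketch the aggregate that a SQL engine would answer with a single GROUP BY becomes a full scan in application code, which is the trade-off the slide describes.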
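The write-through cache (WTC) and LRU ideas from this slide can be sketched together. This is a simplified illustration under assumptions: the class name `WriteThroughCache` is invented, the backing store is any dict-like object, and real deployments would add expiry and concurrency control.

```python
from collections import OrderedDict


class WriteThroughCache:
    """Write-through cache with LRU eviction in front of a dict-like store."""

    def __init__(self, backing_store, capacity=2):
        self.store = backing_store
        self.capacity = capacity
        self.cache = OrderedDict()  # least recently used entry comes first

    def _remember(self, key, value):
        self.cache[key] = value
        self.cache.move_to_end(key)
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used

    def write(self, key, value):
        # Write-through: the backing store is updated before the cache.
        self.store[key] = value
        self._remember(key, value)

    def read(self, key):
        # Read from cache when possible, otherwise fall back to the store.
        if key in self.cache:
            self.cache.move_to_end(key)
            return self.cache[key]
        value = self.store[key]
        self._remember(key, value)
        return value
```

Because every write reaches the backing store first, the cache can be dropped at any time without losing data; the cost is that write latency is bounded by the store, not by memory.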
  • 32. Data Analytics • Data Analytics is critical for every business – Combine heterogeneous data sources • Weblogs, user activity, transactional data, purchase history, user profile, crm, marketing, campaign performance, … – Complex Reporting – Understand user behavior, geo, interest levels – Recommendation – User (re)targeting – Product usage, features most (not) liked – Increase ROI, user satisfaction • It helps business in every aspect to inspect, understand, implement, apply – Waterfall model
  • 33. Data Science • Large data helps to build good models (more samples, higher statistical confidence) –Statistics –Predictions –Data Analysis –Build test models, continuously • A/B test • Apply slowly to selected users or clients • Fine tune it • Adopt globally
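The "apply slowly to selected users, then adopt globally" step can be sketched as deterministic bucketing. This is one common approach, not necessarily the one used in the talk: the function name `in_rollout`, the experiment label and the choice of SHA-256 with 100 buckets are assumptions.

```python
import hashlib


def in_rollout(user_id, experiment, percent):
    """Return True if this user falls inside the first `percent` buckets.

    The bucket is derived from a hash of the experiment name and user id,
    so a given user always lands in the same bucket for that experiment.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < percent
```

Raising `percent` from 10 to 50 to 100 keeps every previously selected user in the test group while adding new ones, which matches the gradual "fine tune it, adopt globally" flow.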
  • 34. Analytic Stores • Columnar data warehouse solutions – GreenPlum (EMC, DCA appliance) – Vertica (HP, appliance coming) – ParAccel – InfoBright (MySQL based) – InfiniDB (open source, Calpont appliance) – Netezza (IBM, appliance) – XtremeData dbX (appliance) – TeraData
  • 35. Analytic Stores - BigData • Hadoop is leading the BigData platform • Rapidly Growing - Analytics Platform – HDFS, Map Reduce direct processing – HIVE – HBASE – IMPALA - Cloudera announced last week based on Google’s Dremel – DRILL – Apache open source version, in the works – Google BigQuery
  • 36. Document Store •Document Stores – Supports a more complex data model than KV – Good at handling content management, session, profile data – Multi index support – Dynamic schemas, Nested schemas – Auto distributed, eventual consistency – MVCC (CouchDB) or app logic (MongoDB) •MongoDB, SimpleDB: widely adopted in this space •Use Case: Search by complex patterns & CRUD apps
  • 37. Column Family Store • HBase (Apache), Cassandra (Facebook) and HyperTable (Baidu) – HBase – CP – Cassandra – AP • Model consists of rows and columns • Scalability: Splitting of both rows and columns – Rows are split across nodes using primary key, range – Columns are distributed using groups – Horizontal and vertical partitioning can be used simultaneously • Extension of document store
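The row and column splitting described on this slide can be sketched as two independent lookups. This is a deliberately simplified model and not the layout of HBase, Cassandra or HyperTable: the split points, node names and column families below are hypothetical, and real stores allow dynamic columns within a family rather than a fixed mapping.

```python
import bisect

# Hypothetical row-key boundaries: keys < "h", "h" <= keys < "p", keys >= "p".
SPLIT_POINTS = ["h", "p"]
NODES = ["node-1", "node-2", "node-3"]

# Hypothetical column groups (column families).
COLUMN_FAMILIES = {
    "profile": {"name", "email"},
    "activity": {"last_login", "clicks"},
}


def node_for_row(row_key):
    """Horizontal partitioning: pick the node that owns the key's range."""
    return NODES[bisect.bisect_right(SPLIT_POINTS, row_key)]


def family_for_column(column):
    """Vertical partitioning: pick the column group that stores the column."""
    for family, columns in COLUMN_FAMILIES.items():
        if column in columns:
            return family
    raise KeyError(column)


def locate(row_key, column):
    """Both partitioning schemes applied at once."""
    return node_for_row(row_key), family_for_column(column)
```

For example, `locate("harry", "clicks")` resolves the row to the middle key range and the column to the `activity` group, showing how the two splits combine.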
  • 38. Graph Store • Social Graph • Relationship between entities • Data modeling on social networks • Common Use Cases –List of friends, Shared with common property –Recommendation system –Following –Followers –Common Connections
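The use cases on the "Graph Store" slide map onto simple set operations over an adjacency structure. The sketch below uses an in-memory dict rather than a real graph database; the `FOLLOWS` data and the function names are invented for illustration.

```python
# Hypothetical "follows" edges: user -> set of users they follow.
FOLLOWS = {
    "ann": {"bob", "cat"},
    "bob": {"cat", "dan"},
    "cat": {"dan"},
    "dan": set(),
}


def following(user):
    """Users that `user` follows."""
    return FOLLOWS[user]


def followers(user):
    """Users that follow `user` (reverse edge lookup)."""
    return {u for u, targets in FOLLOWS.items() if user in targets}


def common_connections(a, b):
    """Users followed by both `a` and `b`."""
    return FOLLOWS[a] & FOLLOWS[b]


def recommend(user):
    """Friend-of-friend recommendation: followed by my follows, not yet by me."""
    already = FOLLOWS[user] | {user}
    candidates = set()
    for friend in FOLLOWS[user]:
        candidates |= FOLLOWS[friend]
    return candidates - already
```

A graph store keeps these edges indexed in both directions, so lookups like `followers` do not require the full scan used in this sketch.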
  • 39. Cloud Data Stores • “Database As Service” Models: – Amazon RDS, DynamoDB, SimpleDB, PostgreSQL – Xeround (MySQL) – Microsoft SQL Azure Database (SQL Server) – Google App Engine (NoSQL) – SalesForce Database.com (Oracle) – ClearDB (MySQL) – Cloudant (CouchDB)
  • 40. Finally … • SQL: works great, can’t easily scale • NoSQL: works great, doesn’t fit all • Analytics, BigData: every business needs it

Editor's Notes

  1. MySQL employee, 2000-2004. Database companies: MySQL, SOLID, ANTs Data Server, ScaleDB. Part of Yahoo’s cloud initiatives like Sherpa and Mobstor and a platform. MySQL geek; still contribute randomly to MySQL source.
  2. When the web was read-only, things used to scale with one or more systems with caching or LB in the front. But as things change to real-time and interactive, the same architecture can’t keep up. Talk about how Facebook, Twitter and LinkedIn are evolving. Public cloud sucks in performance, but offers elasticity to grow; you need to design systems to balance hardware, performance and scalability.
  3. Now let’s understand different types of data stores
  4. WTC – Write Through Cache; RFC – Read From Cache
  5. Concurrent read/write/update to same row
  6. Employee or user can update his profile fields. Guaranteed durability.
  7. Bunch of cloud-based solutions, which are a bit surprising
  8. Gaming is a classic example for volatile data
  9. Widely adopted for years
  10. DCA: Data Computing Appliance. Talk about analytics and how crucial they are now.