Big Data and Cloud

    Jun 30, 2011
   Schubert Zhang
Who am I
• Schubert Zhang (张松波)

• Chief Architect and Director of Big Data Engineering
  and Cloud
• Researching cloud technology and developing cloud projects and
  products since 2007
• Led the core development team of CMCC “Big Cloud”.
  @Hanborq

• 10 years of telecom product development and tech management.
  @UTStarcom
Agenda
• Introduction of Cloud Storage and Computing

• Big Data and Cloud

• Our Big-Data/Cloud Products and Solutions

• Anything for Discussion …
PART-1:

INTRODUCTION OF CLOUD
STORAGE AND COMPUTING
A Popular Definition of Cloud …
•   Cloud computing is a model for enabling convenient, on-demand network access
    to a shared pool of configurable computing resources (e.g., networks, servers,
    storage, applications, and services) that can be rapidly provisioned and released
    with minimal management effort or service provider interaction.

•   Cloud storage is a model of networked online storage where data is stored on
    multiple servers. Hosting companies operate large data centers, which provide
    the resources according to the requirements of the customer and expose them as
    storage pools, which the customers can themselves use to store files or data
    objects. Physically, the resource may span multiple servers and/or data
    centers.

•   This cloud model promotes availability and is composed of five essential
    characteristics, three service models, and four deployment models.
A Popular Definition of Cloud …
•   Deployment Models
     –   Private Cloud
     –   Community Cloud
     –   Public Cloud
     –   Hybrid Clouds

•   Service Models
     –   Software as a Service (SaaS)
     –   Platform as a Service (PaaS)
     –   Infrastructure as a Service (IaaS)

•   Essential Characteristics
     –   On Demand Self-Service
     –   Broad Network Access
     –   Rapid Elasticity
     –   Resource Pooling
     –   Measured Service

•   Common Characteristics
     –   Massive Scale
     –   Elastic Computing
     –   Homogeneity
     –   Geographic Distribution
     –   Virtualization
     –   Service Orientation
     –   Low Cost Software
     –   Advanced Security
Examples of Famous Cloud Products
•   Google
     –   Google AppEngine (Storage for Database, etc.)
     –   Google Storage (Storage for Objects)
     –   Techs: GFS2/Bigtable/MapReduce/Megastore/Spanner/Pregel/Dremel…

•   Amazon AWS
     –   Simple Storage Service – S3 (Storage for Objects)
     –   Cloud Drive (Online Storage for Individuals)
     –   SimpleDB (Storage for Database)
     –   Elastic Compute Cloud – EC2 (Compute)
     –   Techs: Web-Service-Protocol/Bitstore/Keymap/Dynamo …

•   Rackspace
     –   Cloud Servers (Compute)
     –   Cloud Files (Storage for Objects)
     –   Techs: OpenStack …

•   Facebook
     –   Messages
     –   Photo Storage
     –   Techs: Hive/Scribe/Haystack/Hadoop …

•   Cloudera
     –   Hadoop …
We focus on
    The Technologies Behind the Cloud
• Storage
•   High Scalability
     –    Shared-Nothing
     –    Object-Oriented
     –    NoSQL
     –    …
•   High Availability
     –    Failure-Detecting
     –    Server Clustering
     –    Replication
     –    Eventual Consistency
     –    …
•   Big Data
     –    PB level storage
     –    Structured or non-structured
     –    Information Retrieval
     –    Indexing
     –    Automatic re-sharding/re-partitioning
     –    Automatic load balancing
     –    …
•   High Throughput / Low Latency
     –    Optimized IO and data write/read models.

• Computing
•   High Scalability
•   Parallel Computing Framework
     –    MR - MapReduce
     –    BSP - Bulk Synchronous Parallel
•   Job/Task scheduler
•   Failure rework
•   PDM - Parallel Data Analysis/Mining Algorithms
     –    Simple Statistic/Analysis
     –    Classification/Clustering …
     –    For Recommendation and AD
     –    …
PART-2:

BIG DATA AND CLOUD
Big Data
• Immutable Law of Big Data
  – Volume
  – Variety
  – Velocity


• Need ….
  – Distributed System
     • Many commodity machines
  – Scale-out vs. Scale-up
     • Scale-out: Automatic vs. Manual
Big Data, Big Business
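[Figure: deal and funding values for vendors of Storage Products/Solutions (NAS, limited scale-out) and Data Warehouse (MPP): $2.25B, $400M, $1.7B, $250M, $263M, $2.35B, >>$30.5M (VC). The company logos are not recoverable from the extracted text.]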
The Next Decade in Data Management




A stable system capable of supporting a variety of apps is necessary.
Innovations in databases are a requirement.
New data stores are necessary.
Differentiation between programs will continue until key innovations in data management
platforms become uniform.
Engineering

PART-3:

OUR BIG-DATA/CLOUD PRODUCTS
AND SOLUTIONS
Overview
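•   Cloud Applications (MagicBox, EnterpriseApps …)
•   Cloud Services (web-based, RESTful)
     –   ObjectStorage Service, DataStore Service, MapReduce Service, Compute Service …
•   Cloud Stack
     –   CloudOS, SandStor, PebStor, MapReduce, vCompute, …
•   Cloud Solutions
     –   Electric power, telecom, video, transportation, Internet of Things, Internet, healthcare, government, …
•   Cloud Datasets
     –   Scientific research, NGO, …

•   Built on the Cloud Stack cloud technology products and solutions;
•   Industry application solutions for large-scale data storage and processing: Cloud Solutions;
•   Storage, computing, and application cloud service products for the public and enterprises: Cloud Services;
•   Cloud applications: Cloud Applications.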
Our Focus
• Enterprise Big Data Management

• Leverage cloud technology from Internet backends
Hardware
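We use standard commodity server hardware (PC servers) and network equipment, and build
flexible cluster systems on a big-data cluster software platform. A cluster can range from a
few nodes to several thousand nodes, and storage capacity can reach the PB level.
We rely more on software layer scalability (scale-out) and fault-tolerance.

•   Our approach
     –   Standard commodity PC servers
     –   Built-in storage (a single node can exceed 10TB)
     –   Easy to maintain
     –   Nodes are replaceable
     –   Easy cluster expansion
     –   Flexible networking
     –   Cluster-Level Soft RAID

•   Traditional servers
     –   IBM minicomputers (p5 570)
     –   Lenovo cluster systems (DeepComp 7000G)
     –   Dawning cluster systems (Dawning TC5000)
     –   SUN servers
     –   …

•   Traditional storage systems
     –   NAS systems
     –   SAN systems
     –   Disk arrays

•   Weaknesses of the traditional approach: expensive, hard to scale, many restrictions

•   We reject expensive, hard-to-scale, restrictive minicomputers, hardware-bound clusters,
    and storage devices such as SAN/NAS.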
Products and Features
•   Layers (top to bottom)
     –   Cloud API
     –   Cloud Services: DataStore Cloud, ObjectStorage Cloud, MapReduce Cloud, Compute Cloud
     –   Cloud Stack: SandStor, PebStor, MapReduce, vCompute, all on top of CloudOS
     –   Hardware & OS

•   CloudOS
     –   Distributed Cloud Platform
     –   Commodity Hardware and Cluster
     –   High Scalability
     –   High Reliability (Data Replication)
     –   High Availability
     –   Strong Consistency
     –   High Throughput
     –   Load Balancing
     –   Global Data Access
     –   Global File system
     –   Simplify Complexity of Apps

•   SandStor
     –   Distributed Structured Data Management
     –   Common features of CloudOS
     –   High efficiency Indexing
     –   Multi-level Cache
     –   Compression
     –   Fast random access, Low Latency
     –   Flexible Schema
     –   High Durability, no data loss

•   PebStor
     –   Distributed Blob Data Management
     –   Common features of CloudOS
     –   Efficient indexes and meta mgmt
     –   Efficient storage space mgmt
     –   De-duplicating
     –   Unlimited blob size

•   MapReduce
     –   Flexible Parallel Data Processing Framework
     –   Common features of CloudOS
     –   Large-scale
     –   Highly parallelized
     –   Locality computing
     –   Simple model for programming
     –   Abundant high-level languages and toolkits
     –   Seamlessly integrated with storage system

•   vCompute
     –   Virtual Machines and Computing Resources mgmt
     –   Multi VMs support
     –   Elastic VMs provisioning
     –   Auto-scale
 July 3, 2012
Cloud Service Platform
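Cloud Services                       Similar products or services
ObjectStorage Cloud Service          Amazon S3
                                     Google Storage for Developers
                                     Rackspace Files/OpenStack Swift
                                     Google BlobStore
DataStore Cloud Service              Amazon SimpleDB
                                     Google DataStore
MapReduce Cloud Service              Amazon MapReduce
                                     Hadoop
Video Media Cloud Service …          Video Delivery/Streaming/Transcoding/
                                     Time-shifting/Analytics

  •    Multi-Level Cloud Services:
        –   Infrastructure
        –   Platform
        –   Applications

  •    Cloud Services API
        –   Web-based, accessible from anywhere
        –   RESTful style, simple and easy to use
        –   SDKs for development in multiple languages

  •    Features of the Cloud Services
        –   Users do not need to care about the implementation
        –   Accessible from anywhere
        –   High data reliability
        –   Strong scalability
        –   High availability (99.9%)
        –   Pay for actual usage
        –   Simple and easy to use
        –   APIs follow industry standards/conventions
        –   Rich management and monitoring tools
        –   Strict yet flexible security policies
        –   AAA service integrated across multiple cloud services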
Object Storage Platform
                build another S3
The RockStor Object Storage system provides object storage infrastructure
services that guarantee efficiency, robustness, and load balancing.

•   Layers (top to bottom)
     –   Object Access Layer: providing a client lib
     –   MetaStore Layer: DHT-based consistent overlay network
     –   Data Chunk Store Layer: autonomous overlay network
     –   Clustered storage nodes

•   Properties
     –   Object-Oriented
     –   High Availability
     –   High Scalability
     –   Huge Capacity
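The slide does not say how the DHT-based MetaStore layer places keys. A common technique for this kind of layer is consistent hashing, sketched below under that assumption. The class name, node names, and virtual-node count are hypothetical and are not taken from RockStor.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Toy consistent-hash ring (illustrative only, not RockStor's code)."""

    def __init__(self, nodes, vnodes=64):
        self._ring = []  # sorted list of (token, node) pairs
        for node in nodes:
            self.add_node(node, vnodes)

    @staticmethod
    def _token(key):
        # Map a string to a 64-bit position on the ring.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add_node(self, node, vnodes=64):
        # Each node owns several virtual positions to even out the load.
        for i in range(vnodes):
            bisect.insort(self._ring, (self._token(f"{node}#{i}"), node))

    def remove_node(self, node):
        self._ring = [(t, n) for (t, n) in self._ring if n != node]

    def lookup(self, key, replicas=1):
        """Return the first `replicas` distinct nodes clockwise from the key."""
        if not self._ring:
            return []
        start = bisect.bisect_left(self._ring, (self._token(key), ""))
        owners = []
        for offset in range(len(self._ring)):
            node = self._ring[(start + offset) % len(self._ring)][1]
            if node not in owners:
                owners.append(node)
                if len(owners) == replicas:
                    break
        return owners


ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
owners = ring.lookup("bucket/photo.jpg", replicas=2)
```

Removing a node that does not own a key leaves that key's owners unchanged, which is what keeps re-partitioning cheap when the cluster grows or shrinks.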
Object Storage Cloud Services

   RESTful API example
(a simple object upload / PUT operation)

   Object Storage web-based management system,
   similar to Amazon S3
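The request shown on the original slide is not recoverable from the text, so the following is only a sketch of what a simple S3-style object upload (PUT) could look like. The endpoint `http://storage.example.com`, the bucket and key names, and the path layout are assumptions. Authentication headers, which a real service would require, are left out.

```python
import urllib.parse
import urllib.request


def build_put_request(endpoint, bucket, key, body, content_type="application/octet-stream"):
    """Build (but do not send) a PUT request that uploads `body` as an object.

    The /<bucket>/<key> path layout is an assumed, S3-like convention.
    """
    url = f"{endpoint}/{bucket}/{urllib.parse.quote(key)}"
    headers = {
        "Content-Type": content_type,
        "Content-Length": str(len(body)),
    }
    return urllib.request.Request(url, data=body, headers=headers, method="PUT")


request = build_put_request(
    "http://storage.example.com",  # hypothetical endpoint
    "my-bucket",                   # hypothetical bucket
    "docs/report.txt",             # hypothetical object key
    b"hello",
    content_type="text/plain",
)
# Sending it would be: urllib.request.urlopen(request)
```

Keeping the object key in the URL path and the payload in the request body is what makes the interface easy to use from any HTTP client.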
Performance of S3

Test          Total used time (hour)   Total average latency (us)   Total throughput/sec   Total operations   Total data size (GB)
Write (8KB)   4.93                     132.230                      7084.320               134220800          1024 (=1TB)
Read (8KB)    17.267                   464.012                      2155.119               134220800          1024 (=1TB)

[Chart: dThrou (ops/sec) and count over time; y-axis 0 to 10000, x-axis timestamps in epoch milliseconds from 1306028040000 to 1306045800000.]
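The reported totals can be cross-checked with a little arithmetic. The sketch below assumes that 134220800 is the operation count for each run, that every operation moves one 8KB object, and that 4.93 and 17.267 are total elapsed hours for the write and read runs. The function names are illustrative.

```python
OPS = 134_220_800   # total operations reported for each run
OBJECT_KB = 8       # object size used in both runs


def total_size_gb(ops, object_kb):
    """Total data moved, in GB (1 GB = 1024 * 1024 KB)."""
    return ops * object_kb / 1024 / 1024


def mean_time_per_op_us(hours, ops):
    """Elapsed time divided by operation count, in microseconds."""
    return hours * 3600 / ops * 1e6


def ops_per_second(hours, ops):
    """Average operations per second over the whole run."""
    return ops / (hours * 3600)


size_gb = total_size_gb(OPS, OBJECT_KB)        # close to 1024 GB, i.e. 1TB
write_us = mean_time_per_op_us(4.93, OPS)      # compare with the reported 132.230 us
read_us = mean_time_per_op_us(17.267, OPS)     # compare with the reported 464.012 us
write_rate = ops_per_second(4.93, OPS)
read_rate = ops_per_second(17.267, OPS)
```

Under these assumptions the data size and the two mean latencies agree closely with the reported values. The reported throughput figures are somewhat lower than the whole-run averages for writes, which suggests they were measured differently (for example from sampled intervals).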
DataStore Platform
          build a scalable BDMS
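•   BDMS logical architecture
     –   Application layer: API, SQL, Hadoop MapReduce interfaces
     –   BDMS cluster
     –   Structured/Semi-
     –   High Availability
     –   High Scalability
     –   Big Data

•   BDMS software layer architecture
     –   Data access layer
          •   SQL language, JDBC Driver
          •   API
          •   Import tools
          •   Data analysis interfaces (including the Hadoop integration interface)
     –   Data model and representation layer
          •   Data model and schema definition, storage engine mapping
          •   Index management
          •   Simple relational model
     –   Distributed storage engine layer
          •   WAL, write cache and read cache
          •   Storage file structure and index structure
          •   Data compression and compaction
          •   Data distribution management and indexing
          •   Local analysis engine
     –   Distributed storage platform layer
          •   Distributed data storage
          •   Load balancing
          •   Data replica and consistency management
          •   Data addressing
     –   Cluster service layer
          •   Cluster node network topology
          •   Failure detection
          •   Distributed asynchronous communication framework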
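The storage engine layer lists a WAL (write-ahead log) together with a write cache. As a rough illustration of how those two pieces usually cooperate, and not of the BDMS implementation itself, here is a minimal single-node sketch. The class name `TinyStore`, the JSON-lines log format, and the file name are all hypothetical.

```python
import json
import os
import tempfile


class TinyStore:
    """Minimal WAL + in-memory write cache (illustrative only)."""

    def __init__(self, wal_path):
        self.wal_path = wal_path
        self.memtable = {}  # write cache: latest value per key
        self._replay()      # rebuild the cache from the log after a restart
        self._wal = open(wal_path, "a", encoding="utf-8")

    def _replay(self):
        if not os.path.exists(self.wal_path):
            return
        with open(self.wal_path, encoding="utf-8") as log:
            for line in log:
                record = json.loads(line)
                self.memtable[record["k"]] = record["v"]

    def put(self, key, value):
        # 1. Append to the log and force it to disk first ...
        self._wal.write(json.dumps({"k": key, "v": value}) + "\n")
        self._wal.flush()
        os.fsync(self._wal.fileno())
        # 2. ... then update the in-memory write cache.
        self.memtable[key] = value

    def get(self, key):
        return self.memtable.get(key)

    def close(self):
        self._wal.close()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "wal.log")
    store = TinyStore(path)
    store.put("msisdn:13800000001", "record-1")
    store.close()

    recovered = TinyStore(path)  # simulates a restart
    value_after_restart = recovered.get("msisdn:13800000001")
    recovered.close()
```

Writing to the log before touching the cache is what allows a restarted node to recover every acknowledged write. A real engine would also flush the cache to sorted files and truncate the log, which is omitted here.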
Performance of BDMS
Streaming Ingest Data Throughput
[Chart: write ops/sec (y-axis 0 to 140000) over samples 1 to 705; series: totalThroughput, deltaThroughput.]


SLA of Random Query
Query: select * from table where msisdn > xxx limit N;

Query Result     Response time
limit 1          0.34 second
limit 10         0.31 second
limit 100        0.40 second
limit 1000       0.46 second
limit 10000      1.25 seconds
limit 500000     55.42 seconds

[Chart: percentage of read ops (0% to 100%) by latency bucket, buckets 1 to 33 in units of 100ms.]
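To make the shape of the benchmarked query concrete, the sketch below runs the same kind of range-plus-limit lookup against an in-memory SQLite database. SQLite is only a stand-in for the BDMS. The table name `records`, its columns, and the generated rows are made up, and `ORDER BY` is added so the result is deterministic.

```python
import sqlite3


def make_demo_db(rows=1000):
    """Create an in-memory table keyed by msisdn (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (msisdn INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany(
        "INSERT INTO records VALUES (?, ?)",
        ((13800000000 + i, f"cdr-{i}") for i in range(rows)),
    )
    return conn


def range_query(conn, min_msisdn, limit):
    """Equivalent of: select * from table where msisdn > xxx limit N."""
    cursor = conn.execute(
        "SELECT msisdn, payload FROM records "
        "WHERE msisdn > ? ORDER BY msisdn LIMIT ?",
        (min_msisdn, limit),
    )
    return cursor.fetchall()


conn = make_demo_db()
first_ten = range_query(conn, 13800000500, 10)
```

Because the key column is indexed, the engine can seek to the first matching key and stop after N rows, which is consistent with response time growing with the limit rather than with table size.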
CloudNAS+MagicBox Enterprise Solution

•   Office/SOHO network
     –   Access files via CIFS/NFS/FTP through the BigdataCloud NAS Proxy
     –   MagicBox Client
•   Connection
     –   Web Service RESTful API
•   Company LAN or WAN
     –   Enterprise Private BigdataCloud
     –   MagicBox Service

•   CloudNAS: NAS Proxy + NAS in BigdataCloud
     –   File Server
     –   Archive Server
     –   Backup Server

•   MagicBox: Backup/Sync/Sharing/Versioning
     –   Documents Backup
     –   Collaboration
Parallel Computing Platform
•   Applications launch a job on the MapReduce JobTracker.
•   The dataset is the input, partitioned/split according to a user-defined policy.
•   The JobTracker assigns map tasks and reduce tasks:
     –   Data Split-1 … Data Split-5 → Map-1 … Map-5
     –   Map outputs → Reduce-1, Reduce-2 → Output-1, Output-2
•   Frameworks: MapReduce, BSP
Cloud Management
Thank You Very Much!
          Any more questions?

      schubert.zhang@gmail.com

       http://cloudepr.blogspot.com
http://www.slideshare.net/schubertzhang

Mais conteúdo relacionado

Mais procurados

Big data architecture on cloud computing infrastructure
Big data architecture on cloud computing infrastructureBig data architecture on cloud computing infrastructure
Big data architecture on cloud computing infrastructuredatastack
 
Cloud computing skepticism - But i'm sure
Cloud computing skepticism - But i'm sureCloud computing skepticism - But i'm sure
Cloud computing skepticism - But i'm sureNguyen Duong
 
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale StorageBringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale StorageMapR Technologies
 
Big Data: RDBMS vs. Hadoop vs. Spark
Big Data: RDBMS vs. Hadoop vs. SparkBig Data: RDBMS vs. Hadoop vs. Spark
Big Data: RDBMS vs. Hadoop vs. SparkGraisy Biswal
 
Dataline Tysons Corner 100808 Barry Lynn
Dataline Tysons Corner 100808 Barry LynnDataline Tysons Corner 100808 Barry Lynn
Dataline Tysons Corner 100808 Barry LynnGovCloud Network
 
High Performance Computing and Big Data
High Performance Computing and Big Data High Performance Computing and Big Data
High Performance Computing and Big Data Geoffrey Fox
 
Big data vahidamiri-datastack.ir
Big data vahidamiri-datastack.irBig data vahidamiri-datastack.ir
Big data vahidamiri-datastack.irdatastack
 
Big Data/Hadoop Infrastructure Considerations
Big Data/Hadoop Infrastructure ConsiderationsBig Data/Hadoop Infrastructure Considerations
Big Data/Hadoop Infrastructure ConsiderationsRichard McDougall
 
Prince Building Tech Talk 12102012
Prince Building Tech Talk 12102012Prince Building Tech Talk 12102012
Prince Building Tech Talk 12102012Andy Parsons
 
Big Data Analytics and Ubiquitous computing
Big Data Analytics and Ubiquitous computingBig Data Analytics and Ubiquitous computing
Big Data Analytics and Ubiquitous computingAnimesh Chaturvedi
 
MapR LucidWorks Joint Webinar 121211
MapR LucidWorks Joint Webinar 121211MapR LucidWorks Joint Webinar 121211
MapR LucidWorks Joint Webinar 121211MapR Technologies
 
SoftwareGuru 2009 - Cloud Computing
SoftwareGuru 2009 - Cloud ComputingSoftwareGuru 2009 - Cloud Computing
SoftwareGuru 2009 - Cloud ComputingJose Tam
 
Cloud architecture and deployment: The Kognitio checklist, Nigel Sanctuary, K...
Cloud architecture and deployment: The Kognitio checklist, Nigel Sanctuary, K...Cloud architecture and deployment: The Kognitio checklist, Nigel Sanctuary, K...
Cloud architecture and deployment: The Kognitio checklist, Nigel Sanctuary, K...CloudOps Summit
 
Cisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA DeploymentsCisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA DeploymentsMapR Technologies
 
Capacity Managementand the Cloud
Capacity Managementand the CloudCapacity Managementand the Cloud
Capacity Managementand the Clouddannyq
 
Big Data using NoSQL Technologies
Big Data using NoSQL TechnologiesBig Data using NoSQL Technologies
Big Data using NoSQL TechnologiesAmit Singh
 

Mais procurados (20)

Big data architecture on cloud computing infrastructure
Big data architecture on cloud computing infrastructureBig data architecture on cloud computing infrastructure
Big data architecture on cloud computing infrastructure
 
Cloud computing skepticism - But i'm sure
Cloud computing skepticism - But i'm sureCloud computing skepticism - But i'm sure
Cloud computing skepticism - But i'm sure
 
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale StorageBringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
 
Big Data: RDBMS vs. Hadoop vs. Spark
Big Data: RDBMS vs. Hadoop vs. SparkBig Data: RDBMS vs. Hadoop vs. Spark
Big Data: RDBMS vs. Hadoop vs. Spark
 
Dataline Tysons Corner 100808 Barry Lynn
Dataline Tysons Corner 100808 Barry LynnDataline Tysons Corner 100808 Barry Lynn
Dataline Tysons Corner 100808 Barry Lynn
 
High Performance Computing and Big Data
High Performance Computing and Big Data High Performance Computing and Big Data
High Performance Computing and Big Data
 
Big data vahidamiri-datastack.ir
Big data vahidamiri-datastack.irBig data vahidamiri-datastack.ir
Big data vahidamiri-datastack.ir
 
Big Data/Hadoop Infrastructure Considerations
Big Data/Hadoop Infrastructure ConsiderationsBig Data/Hadoop Infrastructure Considerations
Big Data/Hadoop Infrastructure Considerations
 
Prince Building Tech Talk 12102012
Prince Building Tech Talk 12102012Prince Building Tech Talk 12102012
Prince Building Tech Talk 12102012
 
Big data hadoop rdbms
Big data hadoop rdbmsBig data hadoop rdbms
Big data hadoop rdbms
 
Demystifying Cloud Computing
Demystifying Cloud Computing Demystifying Cloud Computing
Demystifying Cloud Computing
 
Microsoft Cloud Computing E-Book
Microsoft Cloud Computing E-BookMicrosoft Cloud Computing E-Book
Microsoft Cloud Computing E-Book
 
Big Data Analytics and Ubiquitous computing
Big Data Analytics and Ubiquitous computingBig Data Analytics and Ubiquitous computing
Big Data Analytics and Ubiquitous computing
 
Wolfgang Lehner Technische Universitat Dresden
Wolfgang Lehner Technische Universitat DresdenWolfgang Lehner Technische Universitat Dresden
Wolfgang Lehner Technische Universitat Dresden
 
MapR LucidWorks Joint Webinar 121211
MapR LucidWorks Joint Webinar 121211MapR LucidWorks Joint Webinar 121211
MapR LucidWorks Joint Webinar 121211
 
SoftwareGuru 2009 - Cloud Computing
SoftwareGuru 2009 - Cloud ComputingSoftwareGuru 2009 - Cloud Computing
SoftwareGuru 2009 - Cloud Computing
 
Cloud architecture and deployment: The Kognitio checklist, Nigel Sanctuary, K...
Cloud architecture and deployment: The Kognitio checklist, Nigel Sanctuary, K...Cloud architecture and deployment: The Kognitio checklist, Nigel Sanctuary, K...
Cloud architecture and deployment: The Kognitio checklist, Nigel Sanctuary, K...
 
Cisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA DeploymentsCisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA Deployments
 
Capacity Managementand the Cloud
Capacity Managementand the CloudCapacity Managementand the Cloud
Capacity Managementand the Cloud
 
Big Data using NoSQL Technologies
Big Data using NoSQL TechnologiesBig Data using NoSQL Technologies
Big Data using NoSQL Technologies
 

Destaque

The World of Structured Storage System
The World of Structured Storage SystemThe World of Structured Storage System
The World of Structured Storage SystemSchubert Zhang
 
Big Data Engineering Team Meeting 20120223a
Big Data Engineering Team Meeting 20120223aBig Data Engineering Team Meeting 20120223a
Big Data Engineering Team Meeting 20120223aSchubert Zhang
 
Wild Thinking of BigdataBase
Wild Thinking of BigdataBaseWild Thinking of BigdataBase
Wild Thinking of BigdataBaseSchubert Zhang
 
Hadoop compress-stream
Hadoop compress-streamHadoop compress-stream
Hadoop compress-streamSchubert Zhang
 
Bigtable数据模型解决CDR清单存储问题的资源估算
Bigtable数据模型解决CDR清单存储问题的资源估算Bigtable数据模型解决CDR清单存储问题的资源估算
Bigtable数据模型解决CDR清单存储问题的资源估算Schubert Zhang
 
Ganglia轻度使用指南
Ganglia轻度使用指南Ganglia轻度使用指南
Ganglia轻度使用指南Schubert Zhang
 
Hanborq optimizations on hadoop map reduce 20120221a
Hanborq optimizations on hadoop map reduce 20120221aHanborq optimizations on hadoop map reduce 20120221a
Hanborq optimizations on hadoop map reduce 20120221aSchubert Zhang
 
RockStor - A Cloud Object System based on Hadoop
RockStor -  A Cloud Object System based on HadoopRockStor -  A Cloud Object System based on Hadoop
RockStor - A Cloud Object System based on HadoopSchubert Zhang
 
Learning from google megastore (Part-1)
Learning from google megastore (Part-1)Learning from google megastore (Part-1)
Learning from google megastore (Part-1)Schubert Zhang
 
Scrum Agile Development
Scrum Agile DevelopmentScrum Agile Development
Scrum Agile DevelopmentSchubert Zhang
 
DaStor/Cassandra report for CDR solution
DaStor/Cassandra report for CDR solutionDaStor/Cassandra report for CDR solution
DaStor/Cassandra report for CDR solutionSchubert Zhang
 
HBase 0.20.0 Performance Evaluation
HBase 0.20.0 Performance EvaluationHBase 0.20.0 Performance Evaluation
HBase 0.20.0 Performance EvaluationSchubert Zhang
 
Engineering practices in big data storage and processing
Engineering practices in big data storage and processingEngineering practices in big data storage and processing
Engineering practices in big data storage and processingSchubert Zhang
 
Hadoop大数据实践经验
Hadoop大数据实践经验Hadoop大数据实践经验
Hadoop大数据实践经验Schubert Zhang
 
HFile: A Block-Indexed File Format to Store Sorted Key-Value Pairs
HFile: A Block-Indexed File Format to Store Sorted Key-Value PairsHFile: A Block-Indexed File Format to Store Sorted Key-Value Pairs
HFile: A Block-Indexed File Format to Store Sorted Key-Value PairsSchubert Zhang
 
Cassandra Compression and Performance Evaluation
Cassandra Compression and Performance EvaluationCassandra Compression and Performance Evaluation
Cassandra Compression and Performance EvaluationSchubert Zhang
 
Distributed Filesystems Review
Distributed Filesystems ReviewDistributed Filesystems Review
Distributed Filesystems ReviewSchubert Zhang
 
HBase Coprocessor Introduction
HBase Coprocessor IntroductionHBase Coprocessor Introduction
HBase Coprocessor IntroductionSchubert Zhang
 

Big data and cloud

  • 1. Big Data and Cloud Jun 30, 2011 Schubert Zhang
  • 2. Who am I • Schubert Zhang (张松波) • Chief Architect and Director of Big Data Engineering and Cloud • Researching cloud technology and developing cloud projects and products since 2007 • Led the core development team of CMCC “Big Cloud”. @Hanborq • 10 years of telecom product development and tech-management. @UTStarcom
  • 3. Agenda • Introduction of Cloud Storage and Computing • Big Data and Cloud • Our Big-Data/Cloud Products and Solutions • Anything for Discussion …
  • 5. A Popular Definition of Cloud … • Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction. • Cloud storage is a model of networked online storage where data is stored on multiple servers. Hosting companies operate large data centers, which provide the resources according to the requirements of the customer and expose them as storage pools, which the customers can themselves use to store files or data objects. Physically, the resource may span multiple servers and/or data centers. • The cloud model promotes availability and is composed of five essential characteristics, three service models, and four deployment models.
  • 6. A Popular Definition of Cloud …
      • Deployment Models: Private Cloud, Community Cloud, Public Cloud, Hybrid Clouds
      • Service Models: Software as a Service (SaaS), Platform as a Service (PaaS), Infrastructure as a Service (IaaS)
      • Essential Characteristics: On Demand Self-Service, Broad Network Access, Rapid Elasticity, Resource Pooling, Measured Service
      • Common Characteristics: Massive Scale, Elastic Computing, Homogeneity, Geographic Distribution, Virtualization, Service Orientation, Low Cost Software, Advanced Security
  • 7. Examples of Famous Cloud Products
      • Google
          – Google AppEngine (Storage for Database, etc.)
          – Google Storage (Storage for Objects)
          – Techs: GFS2/Bigtable/MapReduce/Megastore/Spanner/Pregel/Dremel…
      • Amazon AWS
          – Simple Storage Service – S3 (Storage for Objects)
          – Cloud Drive (Online Storage for Individuals)
          – SimpleDB (Storage for Database)
          – Elastic Compute Cloud – EC2 (Compute)
          – Techs: Web-Service-Protocol/Bitstore/Keymap/Dynamo …
      • Rackspace
          – Cloud Servers (Compute)
          – Cloud Files (Storage for Objects)
          – Techs: Open Stack …
      • Facebook
          – Messages
          – Photo Storage
          – …
          – Techs: Hive/Scribe/Haystack/Hadoop
      • Cloudera
          – Hadoop …
  • 8. We Focus on the Technologies Behind the Cloud
      • Storage
          • High Scalability
              – Shared-Nothing
              – Object-Oriented
              – NoSQL
              – …
          • High Availability
              – Failure-Detecting
              – Server Clustering
              – Replication
              – Eventual Consistency
              – …
          • Big Data
              – PB level storage
              – Structured or non-structured
              – Indexing
              – Automatic re-sharding/re-partitioning
              – Automatic load balancing
              – …
          • High Throughput / Low Latency
              – Optimized IO and data write/read models.
      • Computing
          • High Scalability
          • Parallel Computing Framework
              – MR - MapReduce
              – BSP - Bulk Synchronous Parallel
          • Job/Task scheduler
          • Failure rework
          • PDM - Parallel Data Analysis/Mining Algorithms
              – Simple Statistic/Analysis
              – Classification/Clustering …
              – Information Retrieval
              – For Recommendation and AD
              – …
  • 10. Big Data
      • Immutable Law of Big Data
          – Volume
          – Variety
          – Velocity
      • Need …
          – Distributed System
              • Many, many commodity machines
          – Scale-out vs. Scale-up
              • Scale-out: Automatic vs. Manual
  • 11. Big Data, Big Business
      • [Figure: dollar figures $2.25B, $400M, $1.7B, $250M, $263M, $2.35B, >>$30.5M (vc); category labels: Storage Products/Solutions, Data Warehouse, NAS; qualifiers: (Limited Scale-out), (MPP)]
  • 12. The Next Decade in Data Management A stable system capable of serving a variety of apps is necessary. Innovations in databases are a requirement. New data stores are necessary. Differentiation between programs will continue until key innovations in data management platforms become uniform.
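  • 14. Overview
      • Layers:
          – Cloud Applications (MagicBox, EnterpriseApps …)
          – Cloud Services (web-based, RESTful) (ObjectStorage Service, DataStore Service, MapReduce Service, Compute Service …)
          – Cloud Stack (CloudOS, SandStor, PebStor, MapReduce, vCompute, …)
      • Cloud Datasets: scientific research, NGO, …
      • Cloud Solutions by industry: Internet of Things, Internet, electric power, telecom, video, transportation, healthcare, government, …
      • Built on the Cloud Stack cloud technology products and solutions;
      • Provide industry application solutions for large-scale data storage and processing: Cloud Solutions;
      • Provide storage, computing, and application cloud service products for the public and enterprises: Cloud Services;
      • Provide cloud applications: Cloud Applications.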
  • 15. Our Focus • Enterprise Big Data Management • Leverage of the Cloud Tech. from Internet Backend
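  • 16. Hardware
      • We use standard commodity server hardware (PC servers) and network equipment, and build flexible cluster systems on the "大数" (Bigdata) cluster software platform. Cluster size can range from a few nodes to several thousand nodes, and storage capacity can reach the PB level.
      • We rely more on software layer scalability (scale-out) and fault-tolerance.
      • Traditional servers: IBM minicomputers (p5 570), Lenovo cluster systems (深腾 7000G), Dawning cluster systems (曙光 TC5000), SUN servers, …
      • Traditional storage systems: NAS systems, SAN systems, disk arrays
          – Weaknesses: expensive, hard to scale, many restrictions
      • Our approach:
          – Standard commodity PC servers
          – Built-in storage (a single node can hold >10TB)
          – Easy to maintain
          – Nodes are replaceable
          – Easy cluster expansion
          – Flexible networking
          – Cluster-Level Soft RAID
      • We reject expensive, hard-to-scale, restrictive minicomputers, hardware-bound clusters, and storage devices such as SAN/NAS.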
  • 17. Products and Features
      • Cloud API / Cloud Services: Cloud DataStore, Cloud ObjectStorage, Cloud MapReduce, Cloud Compute
      • Cloud Stack: SandStor, PebStor, MapReduce, vCompute, all on top of CloudOS
      • Hardware & OS
      • CloudOS
          – Distributed Cloud Platform
          – Commodity Hardware and Cluster Management
          – High Scalability
          – High Reliability (Data Replication)
          – High Availability
          – Auto-scale
          – Strong Consistency
          – High Throughput
          – Load Balancing
          – Global Data Access
          – Global File system
          – Simplify Complexity of Apps
          – High Durability, no data loss
      • SandStor
          – Distributed Structured Data Management
          – Common features of CloudOS
          – Large-scale
          – High-efficiency Indexing
          – Multi-level Cache
          – Compression
          – Fast random access, Low Latency
          – Flexible Schema
      • PebStor
          – Distributed Blob Data Management
          – Common features of CloudOS
          – Efficient indexes and meta mgmt
          – Efficient storage space mgmt
          – De-duplication
          – Unlimited blob size
      • MapReduce
          – Flexible Parallel Data Processing and Computing Framework
          – Common features of CloudOS
          – Highly parallelized
          – Locality computing
          – Simple model for programming
          – Abundant high-level languages and toolkits
          – Seamlessly integrated with storage system
      • vCompute
          – Virtual Machines Resources mgmt
          – Multi VMs support
          – Elastic VMs provisioning
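  • 18. Cloud Service Platform
      • Cloud Services and similar products or services:
          – ObjectStorage Cloud Service: Amazon S3, Google Storage for Developer, Rackspace Files/OpenStack Swift, Google BlobStore
          – DataStore Cloud Service: Amazon SimpleDB, Google DataStore
          – MapReduce Cloud Service: Amazon MapReduce, Hadoop
          – Video Media Cloud Service: Video Delivery/Streaming/Transcoding/Time-shifting/Analytics
          – …
      • Cloud Services API
          – Web-based, available anywhere
          – RESTful style, simple and easy to use
          – SDKs provided for multiple languages
      • Characteristics of Cloud Services
          – Users do not need to care about the implementation
          – Available anywhere
          – High data reliability
          – Highly scalable
          – High availability (99.9%)
          – Pay for actual usage
          – Simple and easy to use
          – APIs follow industry standards/conventions
          – Rich management and monitoring tools
          – Rigorous yet flexible security policies
          – AAA service integrated across multiple cloud services
      • Multi-Level Cloud Services:
          – Infrastructure
          – Platform
          – Applications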
  • 19. Object Storage Platform: build another S3
      • The RockStor Object Storage system provides object storage infrastructure services that guarantee efficiency, robustness and load balance.
      • Layers:
          – Object Access Layer: provides the client lib
          – MetaStore Layer: DHT-based consistent overlay network
          – Data Chunk Store Layer: autonomous overlay network of clustered storage nodes
      • Properties: Object-Oriented, High Availability, High Scalability, Huge Capacity
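The MetaStore layer on slide 19 is described only as a "DHT-based consistent overlay network"; the deck does not say how RockStor places keys. As one common way such a layer can map object keys to nodes, here is a minimal consistent-hashing sketch. The class name `HashRing`, the node names, and the virtual-node count are hypothetical, and this is a generic technique, not RockStor's confirmed design.

```python
import bisect
import hashlib


class HashRing:
    """Generic consistent-hash ring (illustrative; not RockStor's actual code)."""

    def __init__(self, nodes, vnodes=64):
        # Each physical node is placed on the ring at `vnodes` pseudo-random points.
        points = []
        for node in nodes:
            for i in range(vnodes):
                points.append((self._hash(f"{node}#{i}"), node))
        points.sort()
        self._hashes = [h for h, _ in points]
        self._owners = [n for _, n in points]

    @staticmethod
    def _hash(value):
        # 64-bit position on the ring derived from MD5 (used only for placement).
        return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")

    def lookup(self, key):
        # The owner is the first ring point clockwise from the key's position.
        idx = bisect.bisect_right(self._hashes, self._hash(key)) % len(self._hashes)
        return self._owners[idx]


ring3 = HashRing(["node-a", "node-b", "node-c"])
ring4 = HashRing(["node-a", "node-b", "node-c", "node-d"])
keys = [f"bucket/object-{i}" for i in range(1000)]
moved = [k for k in keys if ring3.lookup(k) != ring4.lookup(k)]
print(f"{len(moved)} of {len(keys)} keys change owner when node-d joins")
```

The point of the design is that adding a node only takes over keys for the new node; keys never shuffle between the existing nodes, which is what makes scale-out cheap.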
  • 20. Object Storage Cloud Services
      • RESTful API example (a simple object upload/PUT operation)
      • Object Storage web-based management system
      • Similar to Amazon S3
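Slide 20 shows a simple object upload (PUT) through a RESTful API described as similar to Amazon S3, but the request itself is not preserved in the transcript. Below is a minimal sketch of what such a request could look like. The endpoint `http://objectstore.example.com`, the `/<bucket>/<key>` URL layout, and the helper name `build_put_request` are assumptions for illustration; authentication headers are omitted because the deck does not describe them. The code only builds the request and does not send it.

```python
import base64
import hashlib
from urllib.parse import quote
from urllib.request import Request


def build_put_request(endpoint, bucket, key, data,
                      content_type="application/octet-stream"):
    """Build (but do not send) an S3-style PUT request for one object."""
    # Hypothetical URL layout: <endpoint>/<bucket>/<key>
    url = f"{endpoint}/{quote(bucket)}/{quote(key)}"
    # Base64-encoded MD5 of the body lets a server verify the upload.
    md5_b64 = base64.b64encode(hashlib.md5(data).digest()).decode("ascii")
    headers = {
        "Content-Type": content_type,
        "Content-Length": str(len(data)),
        "Content-MD5": md5_b64,
    }
    return Request(url, data=data, headers=headers, method="PUT")


req = build_put_request("http://objectstore.example.com",
                        "photos", "2011/cat.jpg", b"hello")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would actually send it; a real service would also require some form of request signing or credentials.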
  • 21. Performance of S3
      • [Chart over time; labels: count, latency(us), dThrou(ops/sec)]
      • Write (8KB)
          – Total operations: 134220800
          – Total data size (GB): 1024 (=1TB)
          – Total used time (hour): 4.93
          – Total average latency (us): 132.230
          – Total throughput/sec: 7084.320
      • Read (8KB)
          – Total operations: 134220800
          – Total data size (GB): 1024 (=1TB)
          – Total used time (hour): 17.267
          – Total average latency (us): 464.012
          – Total throughput/sec: 2155.119
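  • 22. DataStore Platform: build a scalable BDMS
      • BDMS logical architecture
          – Interfaces: API, SQL, Hadoop MapReduce interface
          – BDMS cluster
          – Labels: Structured/Semi-, High Availability, High Scalability, Big Data
      • BDMS software layer architecture
          – Application layer
          – Data access layer: SQL language and JDBC Driver; API; import tools; data analysis interfaces (including the Hadoop integration interface)
          – Data model and representation layer: data model and schema definition, storage engine mapping; index management; simple relational model
          – Distributed storage engine layer: WAL, write cache and read cache; storage file structure and index structure; data compression and compaction; data distribution management and indexing; local analysis engine
          – Distributed storage platform layer: distributed data storage; load balancing; data replica and consistency management; data addressing
          – Cluster service layer: cluster node network topology; failure detection; distributed asynchronous communication framework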
  • 23. Performance of BDMS
      • Streaming Ingest Data Throughput
          – [Chart: write ops/sec, y-axis 0 to 140000; series: totalThroughput, deltaThroughput]
      • SLA of Random Query
          – [Chart: percentage of read ops, 0.00% to 100.00%; label: 100ms]
      • Query Result: select * from table where msisdn > xxx limit N;
          – limit 1: 0.34 second
          – limit 10: 0.31 second
          – limit 100: 0.40 second
          – limit 1000: 0.46 second
          – limit 10000: 1.25 seconds
          – limit 500000: 55.42 seconds
  • 24. CloudNAS+MagicBox
      • Enterprise Solution
          – Company LAN or WAN: CIFS/NFS/FTP access through the BigdataCloud NAS Proxy
          – Office/SOHO network: MagicBox Client and MagicBox Service; access files via Web Service (RESTful API)
          – Enterprise Private BigdataCloud
      • CloudNAS: NAS Proxy + NAS in BigdataCloud
          – File Server
          – Archive Server
          – Backup Server
      • MagicBox: Backup/Sync/Sharing/Versioning
          – Documents Backup
          – Collaboration
  • 25. Parallel Computing Platform
      • MapReduce
          – Applications launch a job on the MapReduce JobTracker, with a dataset as input.
          – The input is partitioned/split according to a user-defined policy.
          – The JobTracker assigns map tasks and reduce tasks.
          – Data Split-1 … Data Split-5 are processed by Map-1 … Map-5.
          – Reduce-1 and Reduce-2 produce Output-1 and Output-2.
      • BSP
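The MapReduce flow on slide 25 (splits feed map tasks, whose output is partitioned to reduce tasks) can be sketched in a single process. This is an illustrative simulation with a word-count job, not Hadoop's API and not the scheduler from the slide; the function names `map_fn`, `partition`, `reduce_fn`, and `run_job` are hypothetical, and there is no JobTracker, only a loop standing in for task assignment.

```python
import zlib
from collections import defaultdict


def map_fn(split):
    """Map task: emit (word, 1) for every word in one input split."""
    for word in split.split():
        yield word, 1


def partition(key, num_reducers):
    """Pick the reducer for a key; crc32 keeps the choice stable across runs."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reduce_fn(key, values):
    """Reduce task: sum all counts collected for one key."""
    return key, sum(values)


def run_job(splits, num_reducers=2):
    # Shuffle: group map output by key inside the bucket of its target reducer.
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for split in splits:                      # one map task per split
        for key, value in map_fn(split):
            buckets[partition(key, num_reducers)][key].append(value)
    # Each reducer processes its keys in sorted order and yields one output.
    return [dict(reduce_fn(k, vs) for k, vs in sorted(bucket.items()))
            for bucket in buckets]


splits = ["big data", "cloud data", "big cloud", "data", "big big"]
outputs = run_job(splits, num_reducers=2)
for i, out in enumerate(outputs, start=1):
    print(f"Output-{i}: {out}")
```

Five splits and two reducers mirror the counts on the slide; every key lands in exactly one output, so the outputs can be concatenated without merging.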
  • 27. Thank You Very Much! Any more questions? schubert.zhang@gmail.com http://cloudepr.blogspot.com http://www.slideshare.net/schubertzhang