Big data stores


  1. Introduction to Big Data stores: Key Value stores: Cassandra
     • First developed at Facebook (powered the Inbox Search)
     • Uses decentralized clustered nodes
     • Considered one of the most scalable NOSQL systems
     • Very high availability – no single point of failure
     • Flexible data storage (structured/unstructured)
     • Relatively easy to configure
     • Designed for high transaction rates
     • Java based – available under the Apache License 2.0
  2. Key Value NOSQL Databases: DynamoDB
     • Amazon DynamoDB stores data on Solid State Drives (SSDs).
     • DynamoDB uses cryptographic methods to authenticate users and prevent unauthorized data access.
     • Supports strongly consistent reads, which return the latest values, as well as atomic counters.
     • Relieves developers of the overhead of scaling and replication.
     • Synchronous replication across multiple AWS Availability Zones in a single Region.
     • Combined with other AWS services such as Amazon EMR and AWS Data Pipeline, DynamoDB can support complex analytics and data movement respectively.
  3. Key Value NOSQL Databases: Riak
     • Riak adopts a masterless peer-to-peer architecture.
     • Written in Erlang and C, with some JavaScript.
     • Distributes data and performs replication across nodes with consistent hashing.
     • Riak uses HTTP/REST or a custom binary protocol to exchange data with the cluster/nodes.
     • Riak has two modes of operation: fullsync (synchronization occurs every 6 hours) and real-time (requires a synchronization trigger).
     • When new nodes are added to the cluster, data is rebalanced across nodes with no downtime.
     • Used by 25% of Fortune 50 companies. Users include AT&T, AOL, Ask.com, Best Buy, Boeing and Comcast.
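
The consistent-hashing and rebalancing bullets above can be illustrated with a small sketch. This is a generic hash ring in plain Python, not Riak's actual implementation: the `HashRing` class, the use of MD5, the node names and the number of virtual nodes are all assumptions chosen for illustration.

```python
import bisect
import hashlib


def ring_position(value: str) -> int:
    """Map a string to a position on the ring (MD5 is an arbitrary choice here)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring with virtual nodes (hypothetical helper class)."""

    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes
        self._positions = []  # sorted ring positions
        self._owners = {}     # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each physical node claims several positions to spread load evenly.
        for i in range(self.vnodes):
            position = ring_position(f"{node}#{i}")
            bisect.insort(self._positions, position)
            self._owners[position] = node

    def node_for(self, key: str) -> str:
        # The owner is the first position clockwise from the key, wrapping around.
        idx = bisect.bisect(self._positions, ring_position(key))
        return self._owners[self._positions[idx % len(self._positions)]]


keys = [f"user:{i}" for i in range(1000)]
ring = HashRing(["node-a", "node-b", "node-c"])
before = {key: ring.node_for(key) for key in keys}

ring.add_node("node-d")  # scale out by one node
after = {key: ring.node_for(key) for key in keys}

moved = [key for key in keys if before[key] != after[key]]
print(f"{len(moved)} of {len(keys)} keys moved to the new node")
```

Only the keys that now fall under the new node's ring positions change owner, which is why a cluster built this way can rebalance without reshuffling all of its data.
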
  4. Key Value NOSQL Databases: Redis
     • Redis adopts a master-slave architecture.
     • Slaves are allowed to communicate with each other.
     • Redis is written in ANSI C and is best suited for rapidly changing data of predictable size, e.g. stock analysis.
     • Latency monitoring is disabled by default; the user can enable it by setting a threshold value in the "latency-monitor-threshold" configuration option.
     • Redis is designed to be accessed by trusted users within a trusted environment.
     • Supports hash partitioning or range partitioning (mapping a range of objects to a specific Redis instance).
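
The two partitioning schemes in the last bullet can be sketched in a few lines. This is client-side routing logic in plain Python and does not call Redis at all; the instance names, the ID ranges and the choice of CRC32 as the hash function are assumptions made for the example.

```python
import zlib

# Hypothetical instance names; a real deployment would hold connection details.
INSTANCES = ["redis-0", "redis-1", "redis-2"]

# Range partitioning table: each instance owns a contiguous block of numeric IDs.
RANGES = [
    (0, 10_000, "redis-0"),
    (10_000, 20_000, "redis-1"),
    (20_000, 30_000, "redis-2"),
]


def hash_partition(key: str, instances=INSTANCES) -> str:
    """Hash partitioning: hash the key, then take it modulo the instance count."""
    return instances[zlib.crc32(key.encode("utf-8")) % len(instances)]


def range_partition(object_id: int, ranges=RANGES) -> str:
    """Range partitioning: look up which instance owns the ID's range."""
    for low, high, instance in ranges:
        if low <= object_id < high:
            return instance
    raise KeyError(f"no instance owns id {object_id}")


print(hash_partition("user:42"))
print(range_partition(15_000))
```

Range partitioning needs a maintained table of ranges but keeps neighbouring IDs together; hash partitioning works for any key and spreads load more evenly, at the cost of remapping most keys when the instance count changes.
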
  5. Key Value NOSQL Databases: CouchDB
     • Written in Erlang.
     • Instead of locks, CouchDB uses Multi-Version Concurrency Control (MVCC) to manage concurrent access to the database.
     • CouchDB achieves eventual consistency between multiple databases by using incremental replication.
     • Validates documents using JavaScript functions that approve or deny each document update.
     • CouchDB supports both pull replication (node acts as target) and push replication (node acts as source).
     • CouchDB is best suited for data that changes occasionally.
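
The revision check behind MVCC-style updates can be sketched as follows. This is a toy in-memory store in plain Python, not the CouchDB HTTP API: `TinyDocStore`, `ConflictError` and the revision format (a generation number plus a random suffix) are hypothetical names and choices for this example.

```python
import uuid


class ConflictError(Exception):
    """Raised when a writer presents a revision that is no longer current."""


class TinyDocStore:
    """Toy document store using revision checks instead of locks (hypothetical)."""

    def __init__(self):
        self._docs = {}  # doc_id -> (revision, body)

    def get(self, doc_id):
        revision, body = self._docs[doc_id]
        return revision, dict(body)  # readers get a copy and never block writers

    def put(self, doc_id, body, rev=None):
        current = self._docs.get(doc_id)
        current_rev = current[0] if current else None
        if rev != current_rev:
            # Someone else updated the document since this writer read it.
            raise ConflictError(f"{doc_id}: expected {current_rev}, got {rev}")
        generation = 1 if current_rev is None else int(current_rev.split("-")[0]) + 1
        new_rev = f"{generation}-{uuid.uuid4().hex[:8]}"
        self._docs[doc_id] = (new_rev, dict(body))
        return new_rev


store = TinyDocStore()
first_rev = store.put("invoice-1", {"total": 100})

# Two writers read the same revision; only the first update succeeds.
store.put("invoice-1", {"total": 120}, rev=first_rev)
try:
    store.put("invoice-1", {"total": 90}, rev=first_rev)
except ConflictError as error:
    print("conflict:", error)
```

The losing writer is expected to re-read the document, merge its change and retry, which is how updates stay safe without holding locks.
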
  6. Key Value NOSQL Databases: Azure Table Storage
     • Maximum data size is 200 TB per table.
     • Azure Table returns a maximum of 1000 rows per query.
     • Azure Table Storage provides ACID transactions for CRUD operations on a single entity in a table.
     • The storage access architecture of Azure Table Storage has three layers:
       o Front-End (FE) layer - authenticates and authorizes the request.
       o Partition layer - partitions the object data and performs load balancing.
       o Distributed and replicated File System (DFS) layer - distributes and replicates data across many clusters.
     • Azure Table Storage does not provide a way to represent relationships between data.
     • To provide fault tolerance, the stored data is replicated three times within the region and an additional three times in another region.
  7. Key Value NOSQL Databases: BerkeleyDB
     • Berkeley DB is an embedded database engine and is suitable for storing key/value data.
     • Key and data items are stored in simple structures called DBTs (DBT is an acronym for "database thang"), each containing a reference to memory and a length.
     • Berkeley DB supports concurrency in threads even in database with size.
     • The program accessing Berkeley DB determines how data is to be stored in records.
     • Berkeley DB has three different products:
       o Berkeley DB - contains the database implementations and is written in C.
       o Berkeley DB Java Edition - log-structured storage architecture, coded in pure Java.
       o Berkeley DB XML - specializes in the storage of XML documents.
  8. Column-Family NOSQL databases: HBase
     • First developed at Powerset (to power natural language search).
     • Distributed column-oriented database on top of Hadoop/HDFS.
     • Continuous access to data through multiple master nodes.
     • Linear and modular scalability.
     • Provides interactive commands for manipulating the database.
     • Single-row atomic operations and row-level exclusive locks.
     • Multiple client interfaces, including the native Java library, Thrift, and REST.
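
The idea of single-row atomic operations guarded by row-level exclusive locks can be sketched with one lock per row. This is an in-memory illustration in plain Python, not the HBase client API; `RowLockedTable` and its `increment` method are hypothetical names for this example.

```python
import threading


class RowLockedTable:
    """Toy table where each row has its own exclusive lock (hypothetical)."""

    def __init__(self):
        self._rows = {}                  # row key -> {column: value}
        self._locks = {}                 # row key -> lock guarding that row
        self._guard = threading.Lock()   # protects creation of rows and locks

    def _row_and_lock(self, row_key):
        with self._guard:
            row = self._rows.setdefault(row_key, {})
            lock = self._locks.setdefault(row_key, threading.Lock())
        return row, lock

    def increment(self, row_key, column, amount=1):
        """Atomic read-modify-write on a single row."""
        row, lock = self._row_and_lock(row_key)
        with lock:  # writers to other rows are not blocked
            row[column] = row.get(column, 0) + amount
            return row[column]

    def get(self, row_key, column):
        row, lock = self._row_and_lock(row_key)
        with lock:
            return row.get(column)


table = RowLockedTable()


def worker():
    for _ in range(500):
        table.increment("page:home", "stats:views")


threads = [threading.Thread(target=worker) for _ in range(8)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(table.get("page:home", "stats:views"))
```

Because the lock is scoped to one row, atomicity is guaranteed only within that row; an update spanning several rows would need additional coordination.
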
  9. Column-Family NOSQL databases: BigTable
     • First developed at Google (for structured data).
     • Sparse, distributed, persistent multidimensional sorted map.
     • Self-managing (servers can be added or removed dynamically, and servers adjust to load imbalance).
     • Fault tolerant and persistent.
     • Designed to scale into the petabyte range.
     • Tables are optimized for GFS (Google File System) by being split into multiple tablets.
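
A "sparse, multidimensional sorted map" can be read as a map from (row, column, timestamp) to a value, kept in sorted key order. The sketch below shows that data model in plain Python on a single machine; it is not Bigtable's implementation, and `SparseSortedMap` and its method names are hypothetical.

```python
import bisect


class SparseSortedMap:
    """Toy (row, column, timestamp) -> value map kept in sorted order (hypothetical)."""

    def __init__(self):
        # Timestamps are stored negated so the newest version sorts first.
        self._keys = []    # sorted list of (row, column, -timestamp)
        self._values = {}  # same key -> cell value

    def put(self, row, column, timestamp, value):
        key = (row, column, -timestamp)
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get_latest(self, row, column):
        """Return the newest version of a cell, or None if the cell is empty."""
        idx = bisect.bisect_left(self._keys, (row, column, float("-inf")))
        if idx < len(self._keys) and self._keys[idx][:2] == (row, column):
            return self._values[self._keys[idx]]
        return None

    def scan_rows(self, start_row, end_row):
        """Yield cells for rows in [start_row, end_row), in sorted order."""
        lo = bisect.bisect_left(self._keys, (start_row,))
        hi = bisect.bisect_left(self._keys, (end_row,))
        for row, column, negated_ts in self._keys[lo:hi]:
            yield row, column, -negated_ts, self._values[(row, column, negated_ts)]


table = SparseSortedMap()
table.put("com.cnn.www", "contents:", 3, "<html>v1")
table.put("com.cnn.www", "contents:", 6, "<html>v2")
table.put("com.cnn.www", "anchor:cnnsi.com", 9, "CNN")
table.put("org.example", "contents:", 4, "<html>hello")

print(table.get_latest("com.cnn.www", "contents:"))
for cell in table.scan_rows("com.a", "com.z"):
    print(cell)
```

Only cells that were written take up space (the map is sparse), and because keys are sorted by row, a contiguous row range is a natural unit to split off into a tablet.
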
  10. Column-Family NOSQL databases: HyperTable
     • Developed as in-house software at Zvents.
     • Manages massive sparse tables with timestamped cell versions.
     • Maximum efficiency (less hardware, power, and datacenter space).
     • Good fit for a wide range of applications.
     • Clean semantics.
     • High performance.
  11. Graph NOSQL databases: Neo4j
     • Developed by Neo Technology.
     • Highly scalable and robust.
     • Graph structures with nodes, edges and properties to store data.
     • Provides index-free adjacency.
     • Neo4j is schema free – data does not have to adhere to any convention.
     • ACID – atomic, consistent, isolated and durable for logical units of work.
     • Easy to get started with and use.
     • Support for a wide variety of languages (Java, Python, Perl, Scala, Cypher, etc.).
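
Index-free adjacency means each node holds direct references to its relationships, so a traversal follows pointers rather than consulting a global index. The sketch below models that idea in plain Python; it is not the Neo4j API, and the `Node`, `Relationship` and `reachable` names are hypothetical.

```python
from collections import deque


class Node:
    """Graph node holding direct references to its outgoing relationships."""

    def __init__(self, name, **properties):
        self.name = name
        self.properties = properties
        self.outgoing = []  # list of Relationship objects


class Relationship:
    """Typed, directed edge with properties; registers itself on its start node."""

    def __init__(self, start, rel_type, end, **properties):
        self.start = start
        self.rel_type = rel_type
        self.end = end
        self.properties = properties
        start.outgoing.append(self)


def reachable(start, rel_type, max_depth):
    """Breadth-first traversal that only follows direct node references."""
    seen = {id(start)}
    found = []
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue
        for rel in node.outgoing:
            if rel.rel_type == rel_type and id(rel.end) not in seen:
                seen.add(id(rel.end))
                found.append(rel.end.name)
                queue.append((rel.end, depth + 1))
    return found


alice = Node("alice", age=34)
bob = Node("bob")
carol = Node("carol")
Relationship(alice, "KNOWS", bob, since=2010)
Relationship(bob, "KNOWS", carol)
Relationship(alice, "WORKS_WITH", carol)

print(reachable(alice, "KNOWS", max_depth=2))
```

The cost of each hop depends on the number of relationships attached to the current node, not on the total size of the graph.
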
  12. Document NOSQL databases: MongoDB
     • Developed by the software company 10gen as a service product; later shifted to open source.
     • Document-oriented database.
     • Implemented in C++ for best performance (built for speed).
     • Very low latency access to your data (very little CPU overhead).
     • Auto-sharding for easy scalability.
     • Map/Reduce for aggregation.
     • Full index support for high performance.
     • Language drivers for Ruby/Ruby on Rails, Java, C#, JavaScript, Python, Perl, Erlang, etc.
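
The Map/Reduce aggregation pattern over documents can be sketched as follows. This runs on a plain Python list, not against MongoDB, and the `map_reduce` helper, the sample order documents and their field names are assumptions made for the example.

```python
from collections import defaultdict


def map_reduce(documents, map_fn, reduce_fn):
    """Run a map step per document, group emitted values by key, then reduce."""
    groups = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}


# Hypothetical order documents, shaped like records in a document store.
orders = [
    {"customer": "ann", "amount": 25.0, "status": "paid"},
    {"customer": "bob", "amount": 10.0, "status": "paid"},
    {"customer": "ann", "amount": 15.0, "status": "paid"},
    {"customer": "bob", "amount": 99.0, "status": "cancelled"},
]


def emit_paid_amount(doc):
    # Map: emit (customer, amount) for paid orders only.
    if doc["status"] == "paid":
        yield doc["customer"], doc["amount"]


def total(customer, amounts):
    # Reduce: collapse all values emitted for one key into a single result.
    return sum(amounts)


print(map_reduce(orders, emit_paid_amount, total))
```

Because the reduce step only ever sees the values for a single key, the work can be split across shards and the partial results combined afterwards.
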

Editor's Notes

  • Cassandra (an Apache project) is a NOSQL key-value distributed storage system designed for storing and managing huge amounts of structured or unstructured data over many nodes. Cassandra was first developed at Facebook and has been an Apache top-level project since 2010. Like many other NOSQL systems, Cassandra is designed to run on cheap commodity hardware. It runs on a cluster of many decentralized nodes and offers very elastic scalability: capacity can be added and brought online on the fly. This makes Cassandra an "always on" solution. Because of its distributed architecture, Cassandra also has no single point of failure; it is designed never to go down.

    Some design aspects of Cassandra resemble a traditional database management system, and some of the terminology will look familiar to SQL/DDL database developers. However, Cassandra (like most other NOSQL solutions) does not support a normalized data model.

    Cassandra is hugely popular and is generally considered the most widely implemented of the NOSQL databases. Most users like its low complexity, and many consider it an easy and simple solution for cloud data storage. Its simplicity and elegant design make it a natural choice for many organizations.