SlideShare uma empresa Scribd logo
1 de 88
Understanding and Building Big
Data Architectures
Ran.ga.na.than B, ThoughtWorks
@ran_than
Part 1 - Storage and NoSQL
Agenda
Img-src: http://dev.assets.neo4j.com.s3.amazonaws.com/wp-content/uploads/graph-data-jim-webber-presentation.
png
Basics
Story of Sir William Osler, University of Oxford
Img-src:http://i.dailymail.co.uk/i/pix/2009/07/01/article-1196775-0543FB03000005DC-763_634x364.jpg
Latency
● Hibernia Express
● 3,000-mile fiber-optic
● across the Atlantic Ocean to connect London to New York
● goal for 5ms latency
● To be used by Financial Institutes for trading
Src: http://shop.oreilly.com/product/0636920028048.do
Why distributed?
HDD Speed
❏ ~122MB per sec
❏ 1TB in 2hr 22 minutes
❏ SSDs are 2-3 times faster
Multiple disks reading parallel:
❏ With 100 HDDs, it takes 6 minutes
Data and Compute together
DataCenter
Img-src: https://fortunedotcom.files.wordpress.com/2015/06/screen-shot-2015-06-24-at-11-54-41-am.png?w=1024
Cluster and nodes
Img-src:https://en.wikipedia.org/wiki/Computer_cluster#/media/File:Cubieboard_HADOOP_cluster.JPG
Clusters
1. MultiNode: e.g: Hadoop, each node has some
responsibility.
2. Peer-to-Peer: e.g: Cassandra, all nodes are equal
Cluster Coordination
Activity
Expectations from
Data Systems
Non-functional parts of the application.
Expectations from Architecture
❏ Reliability
❏ Scalability
❏ Maintainability
❏ High Availability
❏ Fault Tolerance
❏ Security
❏ Compliance
❏ Compatibility
Components of
Data Systems
Components
❏ Storage
❏ Cache
❏ Search
❏ Stream Processing
❏ Batch Processing
❏ Data Exchange Protocols
Components
❏ Storage
❏ Cache
❏ Search
❏ Stream Processing
❏ Batch Processing
❏ Data Exchange Protocols
NoSQL Databases
“Database Admins walked into a NoSQL bar. A little while later
they walked out because they couldn’t find a table.”
Why NoSQLs?
❏ Scalability
❏ Cost
❏ Flexibility
❏ Availability
❏ Migrations
CAP theorem
Strong Consistency, High Availability, and Partition-Tolerance
Img-src:http://image.slidesharecdn.com/cap-131117230434-phpapp02/95/dynamo-and-bigtable-in-light-of-the-cap-theorem-12-638.jpg?cb=1384729712
CA
CP “when your business requirements dictate
atomic reads and writes”
Src: http://robertgreiner.com/2014/08/cap-theorem-revisited/
AP “when the system needs to continue to function
in spite of external errors”
Src: http://robertgreiner.com/2014/08/cap-theorem-revisited/
CAP story
Activity
Design ticket booking with scenarios of
CA, CP, AP
ACID
Atomic, Consistent, Isolated, and Durable
BASE
● Basically Available: If a single node fails, part of the
data won't be available, but the entire data layer stays
operational.
● Soft state: Soft state means data that is not persisted
on the disk, yet in case of failure it could be possible to
restore it.
● Eventually consistent: indicates that the system will
become consistent over time, given that the system
doesn't receive input during that time.
Key-Value DBs
Memcached, Redis, Riak, Voldemort, ...
Namespaces, Keys, Values
Implementation 1 - Arrays
● Only int as key
● Values are of same type
Implementation 2 - Associative Arrays
Key Value
user1 Mike
user2 Mary
user3 Nina
On hotspace
Simple Storage Design
- put key value - will add content to file in one line
- get key - will grep for key and return the value from the
file
What are the problems with this?
Activity
How can we improve this?
Simple Storage Design
- Add in memory index, with key and value as byte offset.
What are the problems with this?
Activity
How can we improve this?
Simple Storage Design
- Segments
- Compaction
What are the problems with this?
Activity
How can we improve this?
Simple Storage Design
- Sorted Key-Value
- Sparse index
- SSTable
What are the problems with this?
Activity
How can we improve this?
Simple Storage Design
- Memtable
- Segments as SSTable
Img-src: https://www.igvita.com/2012/02/06/sstable-and-log-structured-storage-leveldb/
Simple Storage Design - Overall
- Writes into RedBlack or AVL trees in memory - memtable
=> faster writes
- When memtable is 64MB, write to disk as SSTable and
clean memtable
- First read from memtable and most recent segments in-
memory sparse index (SSTable) => faster reads
- Run a merging and compaction process in the background
=> lesser storage and faster
Assignment
Let us do the e-commerce design with
only key value pair
Document DBs
CouchDB, MongoDB, MarkLogic, DocumentDB, OrientDB, ...
Img-src: http://blog.philipphauer.de/wp-content/uploads/2015/05/Match-OO-Document.png
Query - MongoDB
Img-src: http://bicortex.com/introduction-to-mongodb-nosql-database-for-sql-developers-part-3/
Query - CouchDB
// emit the first letter of each pokemon's name
var myMapReduceFun = {
map: function (doc) {
emit(doc.name.charAt(0));
},
reduce: '_count'
};
// count the pokemon whose names start with 'P'
pouch.query(myMapReduceFun, {
key: 'P', reduce: true, group: true
}).then(function (result) {
// handle result
}).catch(function (err) {
// handle errors
});
What’s cool?
● Flexible schema.
● Embedded docs come in one read.
●
Not so cool
● Familiarity
● Needs more space.
● Doesn’t speak SQL.
Assignment
Let us do the e-commerce design with
only document database
Graph DBs
Neo4j, TitanDB, OrientDB, ...
Relationships in relational DBs means joining
Why?
❏ Modelling and storing relationships in RDBMS is
complicated
❏ Performance degrades with number and levels of
relationships.
❏ Query complexity grows
❏ Adding new type requires schema redesign
With neo4j, you can traverse 4M+ relationships
per second and core
Img-src: http://blog.octo.com/wp-content/uploads/2012/07/RequestInSQL.png
Squeezing to table structure
Img-src: http://blog.octo.com/wp-content/uploads/2012/07/RequestInGraph.png
Closer to white board model
Graph consists of
❏ Vertices
❏ Edges
Cypher Query - Neo4j
SPARQL - RDF data model
Column oriented
DBs
Apache HBase, Cassandra, BigTable, ...
Src: https://www.safaribooksonline.com/library/view/hbase-the-definitive/9781449314682/httpatomoreillycomsourceoreillyimages889228.png
Src: http://static.oschina.net/uploads/img/201303/12072155_ROPI.gif
Img-src: http://www.slideshare.net/romain_jacotin/undestand-google-bigtable-is-as-easy-as-playing-lego-bricks-lecture-by-romain-jacotin
When is this better?
❏ Huge number of columns, with queries on few columns
❏ Aggregation
❏ Column level update
❏ Column data is uniform; so better compression
Time Series Data
Measurement and Time of measurement done repeatedly
Img src: https://www.safaribooksonline.com/library/view/time-series-databases/9781491920909/images/tsdn_0103.png.jpg
Why - Time Series Data
● Trends
When - Time Series
Data
● Huge amount of data
● Mostly query based on time
● Stock exchange
● Sensor data. E.g: Trucks
● Cell towers for usage patterns
Time Series Data
● IoT
● Logs
Replication
This is useful when you have a ncie photo or color-black as a
background. On this slide only, you can put your elements behind
a master element.
Journals
Master-Follower
Masterless
Sharding
Which bucket has your data?
Master
Slave
- Master for writes and real-time
reads
- Slaves for reads
Table
partitioning
- By rows or columns for parallelizing
reads
Feature
specific DBs
- Different DB servers for specific
features of application
- Can this scale?
Federated
Tables
A Federated Table is a table which points
to a table in another database instance
(mostly on an other server). It can be
seen as a view to this remote database
table.
- Administration overhead
- Security
- Access over network
- Okay for reporting/analytical tasks
Federated
Tables
A Federated Table is a table which points
to a table in another database instance
(mostly on an other server). It can be
seen as a view to this remote database
table.
- Administration overhead
- Security
- Access over network
- Okay for reporting/analytical tasks
Range
based
Split data based on range condition. Eg:
zip code, region.
- Non-uniform distribution
Hash based Take hash of key and modulo operation,
put the data in the server based on
reminder value.
- Uniform distribution
- Range queries may take time
Co-ordinators
- Take request, if key is in the request, talk to correct shard
- Co-ordinate across shards to give the result back
- Monitor health
- Take care of rebalancing
- Can be a random node, which will complete the task
- Set of co-ordinators
Take care, while sharding
● Balance your shards, with proper shard key
● Choose correct number of shards. E.g: 12
● Give time for rebalancing. In case of increasing capacity of
server, add nodes faster, and give time move your shards.
● Shard on denormalized data.
● Try to have shard key as part of your queries.
Where am I?
THANK YOU
For questions or suggestions:
Ran.ga.na.than B
@ran_than

Mais conteúdo relacionado

Mais procurados

Remote DBA Experts SQL Server 2008 New Features
Remote DBA Experts SQL Server 2008 New FeaturesRemote DBA Experts SQL Server 2008 New Features
Remote DBA Experts SQL Server 2008 New FeaturesRemote DBA Experts
 
Migrating to postgresql
Migrating to postgresqlMigrating to postgresql
Migrating to postgresqlbotsplash.com
 
Nuxeo JavaOne 2007 presentation (in original format)
Nuxeo JavaOne 2007 presentation (in original format)Nuxeo JavaOne 2007 presentation (in original format)
Nuxeo JavaOne 2007 presentation (in original format)Stefane Fermigier
 
CBDW2014 - NoSQL Development With Couchbase and ColdFusion (CFML)
CBDW2014 - NoSQL Development With Couchbase and ColdFusion (CFML)CBDW2014 - NoSQL Development With Couchbase and ColdFusion (CFML)
CBDW2014 - NoSQL Development With Couchbase and ColdFusion (CFML)Ortus Solutions, Corp
 
Get More Out of MongoDB with TokuMX
Get More Out of MongoDB with TokuMXGet More Out of MongoDB with TokuMX
Get More Out of MongoDB with TokuMXTim Callaghan
 
In memory databases presentation
In memory databases presentationIn memory databases presentation
In memory databases presentationMichael Keane
 
WiredTiger Overview
WiredTiger OverviewWiredTiger Overview
WiredTiger OverviewWiredTiger
 
Getting started with postgresql
Getting started with postgresqlGetting started with postgresql
Getting started with postgresqlbotsplash.com
 
A Technical Introduction to WiredTiger
A Technical Introduction to WiredTigerA Technical Introduction to WiredTiger
A Technical Introduction to WiredTigerMongoDB
 
A Technical Introduction to WiredTiger
A Technical Introduction to WiredTigerA Technical Introduction to WiredTiger
A Technical Introduction to WiredTigerMongoDB
 
in-memory database system and low latency
in-memory database system and low latencyin-memory database system and low latency
in-memory database system and low latencyhyeongchae lee
 
Elephants vs. Dolphins: Comparing PostgreSQL and MySQL for use in the DoD
Elephants vs. Dolphins:  Comparing PostgreSQL and MySQL for use in the DoDElephants vs. Dolphins:  Comparing PostgreSQL and MySQL for use in the DoD
Elephants vs. Dolphins: Comparing PostgreSQL and MySQL for use in the DoDJamey Hanson
 
Postgres_9.0 vs MySQL_5.5
Postgres_9.0 vs MySQL_5.5Postgres_9.0 vs MySQL_5.5
Postgres_9.0 vs MySQL_5.5Trieu Dao Minh
 
The Great Debate: PostgreSQL vs MySQL
The Great Debate: PostgreSQL vs MySQLThe Great Debate: PostgreSQL vs MySQL
The Great Debate: PostgreSQL vs MySQLEDB
 
What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010jbellis
 
Bookie storage - Apache BookKeeper Meetup - 2015-06-28
Bookie storage - Apache BookKeeper Meetup - 2015-06-28 Bookie storage - Apache BookKeeper Meetup - 2015-06-28
Bookie storage - Apache BookKeeper Meetup - 2015-06-28 Matteo Merli
 
Ten Reasons Why You Should Prefer PostgreSQL to MySQL
Ten Reasons Why You Should Prefer PostgreSQL to MySQLTen Reasons Why You Should Prefer PostgreSQL to MySQL
Ten Reasons Why You Should Prefer PostgreSQL to MySQLanandology
 

Mais procurados (20)

Remote DBA Experts SQL Server 2008 New Features
Remote DBA Experts SQL Server 2008 New FeaturesRemote DBA Experts SQL Server 2008 New Features
Remote DBA Experts SQL Server 2008 New Features
 
Migrating to postgresql
Migrating to postgresqlMigrating to postgresql
Migrating to postgresql
 
Nuxeo JavaOne 2007 presentation (in original format)
Nuxeo JavaOne 2007 presentation (in original format)Nuxeo JavaOne 2007 presentation (in original format)
Nuxeo JavaOne 2007 presentation (in original format)
 
CBDW2014 - NoSQL Development With Couchbase and ColdFusion (CFML)
CBDW2014 - NoSQL Development With Couchbase and ColdFusion (CFML)CBDW2014 - NoSQL Development With Couchbase and ColdFusion (CFML)
CBDW2014 - NoSQL Development With Couchbase and ColdFusion (CFML)
 
Get More Out of MongoDB with TokuMX
Get More Out of MongoDB with TokuMXGet More Out of MongoDB with TokuMX
Get More Out of MongoDB with TokuMX
 
In memory databases presentation
In memory databases presentationIn memory databases presentation
In memory databases presentation
 
Percona FT / TokuDB
Percona FT / TokuDBPercona FT / TokuDB
Percona FT / TokuDB
 
WiredTiger Overview
WiredTiger OverviewWiredTiger Overview
WiredTiger Overview
 
Getting started with postgresql
Getting started with postgresqlGetting started with postgresql
Getting started with postgresql
 
A Technical Introduction to WiredTiger
A Technical Introduction to WiredTigerA Technical Introduction to WiredTiger
A Technical Introduction to WiredTiger
 
A Technical Introduction to WiredTiger
A Technical Introduction to WiredTigerA Technical Introduction to WiredTiger
A Technical Introduction to WiredTiger
 
In-memory database
In-memory databaseIn-memory database
In-memory database
 
in-memory database system and low latency
in-memory database system and low latencyin-memory database system and low latency
in-memory database system and low latency
 
Elephants vs. Dolphins: Comparing PostgreSQL and MySQL for use in the DoD
Elephants vs. Dolphins:  Comparing PostgreSQL and MySQL for use in the DoDElephants vs. Dolphins:  Comparing PostgreSQL and MySQL for use in the DoD
Elephants vs. Dolphins: Comparing PostgreSQL and MySQL for use in the DoD
 
Fudcon talk.ppt
Fudcon talk.pptFudcon talk.ppt
Fudcon talk.ppt
 
Postgres_9.0 vs MySQL_5.5
Postgres_9.0 vs MySQL_5.5Postgres_9.0 vs MySQL_5.5
Postgres_9.0 vs MySQL_5.5
 
The Great Debate: PostgreSQL vs MySQL
The Great Debate: PostgreSQL vs MySQLThe Great Debate: PostgreSQL vs MySQL
The Great Debate: PostgreSQL vs MySQL
 
What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010
 
Bookie storage - Apache BookKeeper Meetup - 2015-06-28
Bookie storage - Apache BookKeeper Meetup - 2015-06-28 Bookie storage - Apache BookKeeper Meetup - 2015-06-28
Bookie storage - Apache BookKeeper Meetup - 2015-06-28
 
Ten Reasons Why You Should Prefer PostgreSQL to MySQL
Ten Reasons Why You Should Prefer PostgreSQL to MySQLTen Reasons Why You Should Prefer PostgreSQL to MySQL
Ten Reasons Why You Should Prefer PostgreSQL to MySQL
 

Destaque

Operational Analytics Using Spark and NoSQL Data Stores
Operational Analytics Using Spark and NoSQL Data StoresOperational Analytics Using Spark and NoSQL Data Stores
Operational Analytics Using Spark and NoSQL Data StoresDATAVERSITY
 
Systems Integration in the NoSQL Era with Apache Camel (Neo4j, CouchDB, AWS S...
Systems Integration in the NoSQL Era with Apache Camel (Neo4j, CouchDB, AWS S...Systems Integration in the NoSQL Era with Apache Camel (Neo4j, CouchDB, AWS S...
Systems Integration in the NoSQL Era with Apache Camel (Neo4j, CouchDB, AWS S...Kai Wähner
 
Map reduce and the art of Thinking Parallel - Dr. Shailesh Kumar
Map reduce and the art of Thinking Parallel   - Dr. Shailesh KumarMap reduce and the art of Thinking Parallel   - Dr. Shailesh Kumar
Map reduce and the art of Thinking Parallel - Dr. Shailesh KumarHyderabad Scalability Meetup
 
Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake...
Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake...Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake...
Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake...NoSQLmatters
 
Con8862 no sql, json and time series data
Con8862   no sql, json and time series dataCon8862   no sql, json and time series data
Con8862 no sql, json and time series dataAnuj Sahni
 
Low-Latency Analytics with NoSQL – Introduction to Storm and Cassandra
Low-Latency Analytics with NoSQL – Introduction to Storm and CassandraLow-Latency Analytics with NoSQL – Introduction to Storm and Cassandra
Low-Latency Analytics with NoSQL – Introduction to Storm and CassandraCaserta
 
Geeknight : Artificial Intelligence and Machine Learning
Geeknight : Artificial Intelligence and Machine LearningGeeknight : Artificial Intelligence and Machine Learning
Geeknight : Artificial Intelligence and Machine LearningHyderabad Scalability Meetup
 
Complex Analytics with NoSQL Data Store in Real Time
Complex Analytics with NoSQL Data Store in Real TimeComplex Analytics with NoSQL Data Store in Real Time
Complex Analytics with NoSQL Data Store in Real TimeNati Shalom
 
Big Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI ProsBig Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI ProsAndrew Brust
 
Real Time Interactive Queries IN HADOOP: Big Data Warehousing Meetup
Real Time Interactive Queries IN HADOOP: Big Data Warehousing MeetupReal Time Interactive Queries IN HADOOP: Big Data Warehousing Meetup
Real Time Interactive Queries IN HADOOP: Big Data Warehousing MeetupCaserta
 
Object Oriented Concept
Object Oriented ConceptObject Oriented Concept
Object Oriented Conceptsmj
 
Time series storage in Cassandra
Time series storage in CassandraTime series storage in Cassandra
Time series storage in CassandraEric Evans
 
NoSQL? No, SQL! - SQL, the underestimated "Big Data" technology
NoSQL? No, SQL! - SQL, the underestimated "Big Data" technologyNoSQL? No, SQL! - SQL, the underestimated "Big Data" technology
NoSQL? No, SQL! - SQL, the underestimated "Big Data" technologyDataGeekery
 
Cloud Big Data Architectures
Cloud Big Data ArchitecturesCloud Big Data Architectures
Cloud Big Data ArchitecturesLynn Langit
 
RWDG Webinar: The New Non-Invasive Data Governance Framework
RWDG Webinar: The New Non-Invasive Data Governance FrameworkRWDG Webinar: The New Non-Invasive Data Governance Framework
RWDG Webinar: The New Non-Invasive Data Governance FrameworkDATAVERSITY
 
SQL/NoSQL How to choose ?
SQL/NoSQL How to choose ?SQL/NoSQL How to choose ?
SQL/NoSQL How to choose ?Venu Anuganti
 
LDM Slides: How Data Modeling Fits into an Overall Enterprise Architecture
LDM Slides: How Data Modeling Fits into an Overall Enterprise ArchitectureLDM Slides: How Data Modeling Fits into an Overall Enterprise Architecture
LDM Slides: How Data Modeling Fits into an Overall Enterprise ArchitectureDATAVERSITY
 

Destaque (20)

Operational Analytics Using Spark and NoSQL Data Stores
Operational Analytics Using Spark and NoSQL Data StoresOperational Analytics Using Spark and NoSQL Data Stores
Operational Analytics Using Spark and NoSQL Data Stores
 
Offline first geeknight
Offline first geeknightOffline first geeknight
Offline first geeknight
 
Serverless architectures
Serverless architecturesServerless architectures
Serverless architectures
 
GeekNight: Evolution of Programming Languages
GeekNight: Evolution of Programming LanguagesGeekNight: Evolution of Programming Languages
GeekNight: Evolution of Programming Languages
 
Systems Integration in the NoSQL Era with Apache Camel (Neo4j, CouchDB, AWS S...
Systems Integration in the NoSQL Era with Apache Camel (Neo4j, CouchDB, AWS S...Systems Integration in the NoSQL Era with Apache Camel (Neo4j, CouchDB, AWS S...
Systems Integration in the NoSQL Era with Apache Camel (Neo4j, CouchDB, AWS S...
 
Map reduce and the art of Thinking Parallel - Dr. Shailesh Kumar
Map reduce and the art of Thinking Parallel   - Dr. Shailesh KumarMap reduce and the art of Thinking Parallel   - Dr. Shailesh Kumar
Map reduce and the art of Thinking Parallel - Dr. Shailesh Kumar
 
Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake...
Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake...Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake...
Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake...
 
Con8862 no sql, json and time series data
Con8862   no sql, json and time series dataCon8862   no sql, json and time series data
Con8862 no sql, json and time series data
 
Low-Latency Analytics with NoSQL – Introduction to Storm and Cassandra
Low-Latency Analytics with NoSQL – Introduction to Storm and CassandraLow-Latency Analytics with NoSQL – Introduction to Storm and Cassandra
Low-Latency Analytics with NoSQL – Introduction to Storm and Cassandra
 
Geeknight : Artificial Intelligence and Machine Learning
Geeknight : Artificial Intelligence and Machine LearningGeeknight : Artificial Intelligence and Machine Learning
Geeknight : Artificial Intelligence and Machine Learning
 
Complex Analytics with NoSQL Data Store in Real Time
Complex Analytics with NoSQL Data Store in Real TimeComplex Analytics with NoSQL Data Store in Real Time
Complex Analytics with NoSQL Data Store in Real Time
 
Big Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI ProsBig Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI Pros
 
Real Time Interactive Queries IN HADOOP: Big Data Warehousing Meetup
Real Time Interactive Queries IN HADOOP: Big Data Warehousing MeetupReal Time Interactive Queries IN HADOOP: Big Data Warehousing Meetup
Real Time Interactive Queries IN HADOOP: Big Data Warehousing Meetup
 
Object Oriented Concept
Object Oriented ConceptObject Oriented Concept
Object Oriented Concept
 
Time series storage in Cassandra
Time series storage in CassandraTime series storage in Cassandra
Time series storage in Cassandra
 
NoSQL? No, SQL! - SQL, the underestimated "Big Data" technology
NoSQL? No, SQL! - SQL, the underestimated "Big Data" technologyNoSQL? No, SQL! - SQL, the underestimated "Big Data" technology
NoSQL? No, SQL! - SQL, the underestimated "Big Data" technology
 
Cloud Big Data Architectures
Cloud Big Data ArchitecturesCloud Big Data Architectures
Cloud Big Data Architectures
 
RWDG Webinar: The New Non-Invasive Data Governance Framework
RWDG Webinar: The New Non-Invasive Data Governance FrameworkRWDG Webinar: The New Non-Invasive Data Governance Framework
RWDG Webinar: The New Non-Invasive Data Governance Framework
 
SQL/NoSQL How to choose ?
SQL/NoSQL How to choose ?SQL/NoSQL How to choose ?
SQL/NoSQL How to choose ?
 
LDM Slides: How Data Modeling Fits into an Overall Enterprise Architecture
LDM Slides: How Data Modeling Fits into an Overall Enterprise ArchitectureLDM Slides: How Data Modeling Fits into an Overall Enterprise Architecture
LDM Slides: How Data Modeling Fits into an Overall Enterprise Architecture
 

Semelhante a Understanding Big Data Architectures

GIDS 2016 Understanding and Building No SQLs
GIDS 2016 Understanding and Building No SQLsGIDS 2016 Understanding and Building No SQLs
GIDS 2016 Understanding and Building No SQLstechmaddy
 
GCP Data Engineer cheatsheet
GCP Data Engineer cheatsheetGCP Data Engineer cheatsheet
GCP Data Engineer cheatsheetGuang Xu
 
Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...
Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...
Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...javier ramirez
 
Elasticsearch for Logs & Metrics - a deep dive
Elasticsearch for Logs & Metrics - a deep diveElasticsearch for Logs & Metrics - a deep dive
Elasticsearch for Logs & Metrics - a deep diveSematext Group, Inc.
 
Big Data and Hadoop in Cloud - Leveraging Amazon EMR
Big Data and Hadoop in Cloud - Leveraging Amazon EMRBig Data and Hadoop in Cloud - Leveraging Amazon EMR
Big Data and Hadoop in Cloud - Leveraging Amazon EMRVijay Rayapati
 
Zing Database – Distributed Key-Value Database
Zing Database – Distributed Key-Value DatabaseZing Database – Distributed Key-Value Database
Zing Database – Distributed Key-Value Databasezingopen
 
Zing Database
Zing Database Zing Database
Zing Database Long Dao
 
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...Reynold Xin
 
Hardware Provisioning
Hardware ProvisioningHardware Provisioning
Hardware ProvisioningMongoDB
 
In-Memory Data Grids - Ampool (1)
In-Memory Data Grids - Ampool (1)In-Memory Data Grids - Ampool (1)
In-Memory Data Grids - Ampool (1)Chinmay Kulkarni
 
What is Distributed Computing, Why we use Apache Spark
What is Distributed Computing, Why we use Apache SparkWhat is Distributed Computing, Why we use Apache Spark
What is Distributed Computing, Why we use Apache SparkAndy Petrella
 
SPL_ALL_EN.pptx
SPL_ALL_EN.pptxSPL_ALL_EN.pptx
SPL_ALL_EN.pptx政宏 张
 
High Performance Computing - Cloud Point of View
High Performance Computing - Cloud Point of ViewHigh Performance Computing - Cloud Point of View
High Performance Computing - Cloud Point of Viewaragozin
 
New Developments in Spark
New Developments in SparkNew Developments in Spark
New Developments in SparkDatabricks
 
Quick Guide to Refresh Spark skills
Quick Guide to Refresh Spark skillsQuick Guide to Refresh Spark skills
Quick Guide to Refresh Spark skillsRavindra kumar
 
Front Range PHP NoSQL Databases
Front Range PHP NoSQL DatabasesFront Range PHP NoSQL Databases
Front Range PHP NoSQL DatabasesJon Meredith
 
Architectural anti-patterns for data handling
Architectural anti-patterns for data handlingArchitectural anti-patterns for data handling
Architectural anti-patterns for data handlingGleicon Moraes
 
Best Practices for Migrating Your Data Warehouse to Amazon Redshift
Best Practices for Migrating Your Data Warehouse to Amazon RedshiftBest Practices for Migrating Your Data Warehouse to Amazon Redshift
Best Practices for Migrating Your Data Warehouse to Amazon RedshiftAmazon Web Services
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 

Semelhante a Understanding Big Data Architectures (20)

GIDS 2016 Understanding and Building No SQLs
GIDS 2016 Understanding and Building No SQLsGIDS 2016 Understanding and Building No SQLs
GIDS 2016 Understanding and Building No SQLs
 
GCP Data Engineer cheatsheet
GCP Data Engineer cheatsheetGCP Data Engineer cheatsheet
GCP Data Engineer cheatsheet
 
Gcp data engineer
Gcp data engineerGcp data engineer
Gcp data engineer
 
Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...
Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...
Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...
 
Elasticsearch for Logs & Metrics - a deep dive
Elasticsearch for Logs & Metrics - a deep diveElasticsearch for Logs & Metrics - a deep dive
Elasticsearch for Logs & Metrics - a deep dive
 
Big Data and Hadoop in Cloud - Leveraging Amazon EMR
Big Data and Hadoop in Cloud - Leveraging Amazon EMRBig Data and Hadoop in Cloud - Leveraging Amazon EMR
Big Data and Hadoop in Cloud - Leveraging Amazon EMR
 
Zing Database – Distributed Key-Value Database
Zing Database – Distributed Key-Value DatabaseZing Database – Distributed Key-Value Database
Zing Database – Distributed Key-Value Database
 
Zing Database
Zing Database Zing Database
Zing Database
 
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
 
Hardware Provisioning
Hardware ProvisioningHardware Provisioning
Hardware Provisioning
 
In-Memory Data Grids - Ampool (1)
In-Memory Data Grids - Ampool (1)In-Memory Data Grids - Ampool (1)
In-Memory Data Grids - Ampool (1)
 
What is Distributed Computing, Why we use Apache Spark
What is Distributed Computing, Why we use Apache SparkWhat is Distributed Computing, Why we use Apache Spark
What is Distributed Computing, Why we use Apache Spark
 
SPL_ALL_EN.pptx
SPL_ALL_EN.pptxSPL_ALL_EN.pptx
SPL_ALL_EN.pptx
 
High Performance Computing - Cloud Point of View
High Performance Computing - Cloud Point of ViewHigh Performance Computing - Cloud Point of View
High Performance Computing - Cloud Point of View
 
New Developments in Spark
New Developments in SparkNew Developments in Spark
New Developments in Spark
 
Quick Guide to Refresh Spark skills
Quick Guide to Refresh Spark skillsQuick Guide to Refresh Spark skills
Quick Guide to Refresh Spark skills
 
Front Range PHP NoSQL Databases
Front Range PHP NoSQL DatabasesFront Range PHP NoSQL Databases
Front Range PHP NoSQL Databases
 
Architectural anti-patterns for data handling
Architectural anti-patterns for data handlingArchitectural anti-patterns for data handling
Architectural anti-patterns for data handling
 
Best Practices for Migrating Your Data Warehouse to Amazon Redshift
Best Practices for Migrating Your Data Warehouse to Amazon RedshiftBest Practices for Migrating Your Data Warehouse to Amazon Redshift
Best Practices for Migrating Your Data Warehouse to Amazon Redshift
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Mais de Hyderabad Scalability Meetup (10)

Turbo charging v8 engine
Turbo charging v8 engineTurbo charging v8 engine
Turbo charging v8 engine
 
Git internals
Git internalsGit internals
Git internals
 
Nlp
NlpNlp
Nlp
 
Internet of Things - GeekNight - Hyderabad
Internet of Things - GeekNight - HyderabadInternet of Things - GeekNight - Hyderabad
Internet of Things - GeekNight - Hyderabad
 
Demystify Big Data, Data Science & Signal Extraction Deep Dive
Demystify Big Data, Data Science & Signal Extraction Deep DiveDemystify Big Data, Data Science & Signal Extraction Deep Dive
Demystify Big Data, Data Science & Signal Extraction Deep Dive
 
Demystify Big Data, Data Science & Signal Extraction Deep Dive
Demystify Big Data, Data Science & Signal Extraction Deep DiveDemystify Big Data, Data Science & Signal Extraction Deep Dive
Demystify Big Data, Data Science & Signal Extraction Deep Dive
 
Java 8 Lambda Expressions
Java 8 Lambda ExpressionsJava 8 Lambda Expressions
Java 8 Lambda Expressions
 
No SQL and MongoDB - Hyderabad Scalability Meetup
No SQL and MongoDB - Hyderabad Scalability MeetupNo SQL and MongoDB - Hyderabad Scalability Meetup
No SQL and MongoDB - Hyderabad Scalability Meetup
 
Apache Spark - Lightning Fast Cluster Computing - Hyderabad Scalability Meetup
Apache Spark - Lightning Fast Cluster Computing - Hyderabad Scalability MeetupApache Spark - Lightning Fast Cluster Computing - Hyderabad Scalability Meetup
Apache Spark - Lightning Fast Cluster Computing - Hyderabad Scalability Meetup
 
Docker by demo
Docker by demoDocker by demo
Docker by demo
 

Último

Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 

Último (20)

Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 

Understanding Big Data Architectures