Big Data ABC @andrefaria
Concepts
Relational 
http://www.thoughtworks.com/insights/blog/nosql-databases-overview
Key Value Stores 
Riak, Memcached, Berkeley DB, HamsterDB, Couchbase, Voldemort, DynamoDB
http://www.thoughtworks.com/insights/blog/nosql-databases-overview
Document Stores 
Think of document databases as key-value stores where the value is examinable
Rich query language + indexes
MongoDB, CouchDB, Terrastore, OrientDB, RavenDB
http://www.thoughtworks.com/insights/blog/nosql-databases-overview
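A minimal sketch of what "the value is examinable" can mean, using a plain in-memory dictionary rather than any real document database. The collection contents and the helper names (get, find, build_index) are hypothetical and chosen only for illustration.

```python
# Hypothetical in-memory "collection": document id -> JSON-like document.
collection = {
    "u1": {"name": "Ana", "city": "Sao Paulo", "tags": ["admin"]},
    "u2": {"name": "Bruno", "city": "Recife"},
    "u3": {"name": "Carla", "city": "Sao Paulo", "age": 31},
}


def get(doc_id):
    """Key-value style access: fetch the whole value by its key."""
    return collection.get(doc_id)


def find(**criteria):
    """Document style access: match on fields inside the value."""
    return [
        doc_id
        for doc_id, doc in collection.items()
        if all(doc.get(field) == wanted for field, wanted in criteria.items())
    ]


def build_index(field):
    """A secondary index: field value -> ids of the documents holding it."""
    index = {}
    for doc_id, doc in collection.items():
        if field in doc:
            index.setdefault(doc[field], []).append(doc_id)
    return index
```

A pure key-value store would only offer something like get; find and build_index are possible because the store can look inside each value.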
Column Family Stores 
Each column family can be compared to a container of rows in an RDBMS table where the key identifies the row 
and the row consists of multiple columns. The difference is that various rows do not have to have the same columns, 
and columns can be added to any row at any time without having to add it to other rows. 
Cassandra, HBase, Hypertable 
http://www.thoughtworks.com/insights/blog/nosql-databases-overview
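The idea that rows in one column family need not share the same columns can be sketched with nested dictionaries. This is only an illustration of the data model, not the API of Cassandra, HBase, or Hypertable; the names users, put, and columns_of are hypothetical.

```python
# Hypothetical column family: row key -> {column name: value}.
users = {}


def put(row_key, column, value):
    """Add or overwrite one column on one row; other rows are untouched."""
    users.setdefault(row_key, {})[column] = value


def columns_of(row_key):
    """List the column names that exist on a given row."""
    return sorted(users.get(row_key, {}))


put("row1", "name", "Ana")
put("row1", "email", "ana@example.com")
put("row2", "name", "Bruno")
# A new column appears on row2 only; row1 needs no change.
put("row2", "twitter", "@bruno")
```

Here row1 and row2 live in the same column family but end up with different sets of columns.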
Graph Databases 
Neo4j, InfiniteGraph, OrientDB, FlockDB
http://www.thoughtworks.com/insights/blog/nosql-databases-overview
Map Reduce
Sharding 
http://dbshards.com/articles/database-sharding-configuration/
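One simple way to shard is to hash each key and take the result modulo the number of shards. The sketch below assumes that scheme; the shard names and the functions shard_for and distribute are hypothetical, and real systems often use more elaborate placement.

```python
import hashlib

# Hypothetical shard names; a real deployment would map these to servers.
SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]


def shard_for(key):
    """Pick a shard from a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]


def distribute(keys):
    """Group keys by the shard that would store them."""
    placement = {name: [] for name in SHARDS}
    for key in keys:
        placement[shard_for(key)].append(key)
    return placement
```

With modulo placement, changing the number of shards moves most keys to a different shard, which is one reason some systems prefer other placement schemes.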
MongoDB
Document-oriented system
Large Data Sets 
Records similar to JSON 
Automatic sharding and MapReduce 
Queries are written in JavaScript
CouchDB
Document-oriented system
JavaScript Interface 
Multi-version concurrency control approach 
Client side needs to handle clashes on writes 
No good built-in method for horizontal scalability 
(but there are external solutions like BigCouch, Lounge, and Pillow)
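What "client side needs to handle clashes on writes" can look like is sketched below with a toy store that rejects writes based on a stale revision. Everything here is an assumption for illustration: the DocumentStore class, the Conflict exception, and the integer revision numbers are hypothetical and are not the real database's API or revision format.

```python
class Conflict(Exception):
    """Raised when a write is based on an outdated revision."""


class DocumentStore:
    def __init__(self):
        self._docs = {}  # doc id -> (revision number, body)

    def read(self, doc_id):
        """Return (revision, body); a missing document has revision 0."""
        return self._docs.get(doc_id, (0, None))

    def write(self, doc_id, based_on_rev, body):
        """Accept the write only if the caller saw the latest revision."""
        current_rev, _ = self.read(doc_id)
        if based_on_rev != current_rev:
            raise Conflict(f"{doc_id}: expected rev {current_rev}, got {based_on_rev}")
        self._docs[doc_id] = (current_rev + 1, body)
        return current_rev + 1


def increment_with_retry(store, doc_id, attempts=3):
    """Client-side clash handling: re-read and retry on conflict."""
    for _ in range(attempts):
        rev, body = store.read(doc_id)
        count = (body or {}).get("count", 0)
        try:
            return store.write(doc_id, rev, {"count": count + 1})
        except Conflict:
            continue
    raise Conflict(f"{doc_id}: gave up after {attempts} attempts")
```

The store never locks a document; it is up to the client to notice the conflict, fetch the latest revision, and decide how to merge or retry.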
Cassandra
Originally an internal Facebook project
Keyspaces and column families
Similar to the data model used by BigTable
Data is sharded and balanced automatically
Redis
Keeps the entire database in RAM
Its values can be complex data structures
BigTable
Structure: tables, row keys, column families, column 
names, timestamps, and cell values 
Designed to handle very large data loads by running on 
big clusters of commodity hardware 
Uses GFS as its underlying storage
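The structure listed above (row keys, column families, column names, timestamps, cell values) can be pictured as a nested map in which each cell keeps several timestamped versions. The sketch below is an in-memory illustration only; the VersionedTable class and its methods are hypothetical and say nothing about how the real system stores data.

```python
import time


class VersionedTable:
    def __init__(self):
        # row key -> "family:column" -> {timestamp: cell value}
        self._rows = {}

    def put(self, row_key, family, column, value, timestamp=None):
        """Store a new version of a cell; older versions are kept."""
        ts = time.time() if timestamp is None else timestamp
        row = self._rows.setdefault(row_key, {})
        row.setdefault(f"{family}:{column}", {})[ts] = value

    def get(self, row_key, family, column, at=None):
        """Return the newest value, or the newest one at or before `at`."""
        cells = self._rows.get(row_key, {}).get(f"{family}:{column}", {})
        candidates = [ts for ts in cells if at is None or ts <= at]
        return cells[max(candidates)] if candidates else None
```

Reading with at set to an earlier timestamp returns the value the cell held at that point, which is what the timestamp dimension adds over a plain row and column lookup.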
HBase
Open source clone of BigTable
Same structure as BigTable
Uses HDFS instead of GFS
Hypertable
Another open source BigTable clone, written in C++
Focus on high performance
DynamoDB 
Key Value System 
Large Distributed Clusters 
Versioning
AWS S3 
HTTP 
Blobs
Riak
Inspired by Amazon's Dynamo
Open source and commercial versions
Key Value System
Large Distributed Clusters
Queries in Erlang or JavaScript
Consistent hashing and a gossip protocol to avoid a centralized index server
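The consistent hashing mentioned above can be sketched as a ring of hash points, where each key belongs to the first node found clockwise from the key's own hash. This is a generic illustration, not the actual implementation of any database; the HashRing class, the replicas parameter, and the use of MD5 are assumptions, and the gossip protocol is not covered.

```python
import bisect
import hashlib


def _hash(value):
    """Map a string to a large integer position on the ring."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes, replicas=100):
        # Each node is placed at `replicas` points on the ring.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key):
        """First node clockwise from the key's position on the ring."""
        idx = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]
```

Because every client can compute node_for locally from the list of nodes, no central index server has to be asked where a key lives. When a node is added, only the keys that now fall to the new node change owner.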
Hadoop
Takes care of running your code across a cluster of machines.
- chunking up the input data 
- sending it to each machine 
- running your code on each chunk 
- checking that the code ran 
- passing any results to the next stage
- sorting between stages 
- sending each chunk of that sorted data to the right machine 
- writing debugging information on each job’s progress
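Several of the steps above (chunking, running code on each chunk, sorting between stages, passing results on) can be imitated in a single process with a word count. This is a sketch of the map and reduce idea only; it does not use any cluster framework's API, and the function names are hypothetical.

```python
from collections import defaultdict


def chunk(lines, size):
    """Split the input into chunks, one per (imaginary) machine."""
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def map_words(lines):
    """Map step: emit a (word, 1) pair for every word in a chunk."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def shuffle(mapped_chunks):
    """Sort/group step: collect all values for the same key together."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_counts(groups):
    """Reduce step: fold each key's values into one result."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines, chunk_size=2):
    """Run the whole pipeline: chunk, map, shuffle, reduce."""
    mapped = [map_words(c) for c in chunk(lines, chunk_size)]
    return reduce_counts(shuffle(mapped))
```

On a real cluster each chunk would be mapped on a different machine and the shuffle would move data over the network; here everything runs in one loop so the stages are easy to follow.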
With Hive, you can program
Hadoop jobs using HiveQL,
a SQL-like language.
Apache Pig 
A procedural data 
processing language 
designed for Hadoop 
Provides a set of functions 
that help with common 
data processing problems
PigPen is map-reduce for Clojure, or distributed Clojure. 
It compiles to Apache Pig, but you don't need to know 
much about Pig to use it.
Hadoop for Logs 
The Flume project is designed to make the data 
gathering process easy and scalable, by running agents 
on the source machines that pass the data updates to 
collectors, which then aggregate them into large chunks 
that can be efficiently written as HDFS files.
The R project is both a 
specialized language and a 
toolkit of modules aimed at 
anyone working with 
statistics.
Lucene is a Java library 
that handles indexing and 
searching large collections 
of documents, and Solr is 
an application that uses 
the library to build a 
search engine server.
Mahout is an open source 
framework that can run 
common machine learning 
algorithms on massive 
datasets. 
The framework makes it 
easy to use analysis 
techniques to implement 
features such as a “People
who bought this also 
bought” recommendation 
engine on your own site.
ZooKeeper 
Coordinates work 
and configuration of 
different Clusters
Serialization 
Needed whenever you pass data between systems or store it in files
JSON 
BSON (Binary JSON) 
Apache Thrift (predefined structure)
Apache Avro (predefined structure)
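The difference between a self-describing format and one with a predefined structure can be sketched with the Python standard library. Here json stands for the self-describing case and the struct module is only a stand-in for the predefined-structure case; it is not how Thrift or Avro encode data, and the record fields and LAYOUT are hypothetical.

```python
import json
import struct

record = {"id": 7, "score": 2.5, "name": "ana"}

# Self-describing: field names travel with every record.
as_json = json.dumps(record).encode("utf-8")

# Predefined structure: both sides agree on the layout ahead of time.
# Hypothetical layout: little-endian int32 id, float64 score, 8-byte name.
LAYOUT = struct.Struct("<id8s")


def pack(rec):
    """Encode a record using the agreed layout (names are not stored)."""
    return LAYOUT.pack(rec["id"], rec["score"], rec["name"].encode("utf-8"))


def unpack(data):
    """Decode bytes back into a record using the same layout."""
    rec_id, score, name = LAYOUT.unpack(data)
    return {"id": rec_id, "score": score, "name": name.rstrip(b"\x00").decode("utf-8")}
```

The fixed layout is more compact because field names are not repeated in each record, but the reader must know the structure in advance.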
blog.andrefaria.com 
datavisionary.net 
andrefaria.com @andrefaria

 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
 

The ABC of Big Data

  • 1. Big Data ABC @andrefaria
  • 2.
  • 5. Key Value Stores: Riak, Memcached, Berkeley DB, HamsterDB, Couchbase, Voldemort, DynamoDB. http://www.thoughtworks.com/insights/blog/nosql-databases-overview
  • 6. Document Stores: think about document databases as key-value stores where the value is examinable. Rich query language + indexes. MongoDB, CouchDB, Terrastore, OrientDB, RavenDB. http://www.thoughtworks.com/insights/blog/nosql-databases-overview
  • 7. Column Family Stores: each column family can be compared to a container of rows in an RDBMS table, where the key identifies the row and the row consists of multiple columns. The difference is that rows do not have to share the same columns, and a column can be added to any row at any time without adding it to other rows. Cassandra, HBase, Hypertable. http://www.thoughtworks.com/insights/blog/nosql-databases-overview
  • 8. Graph Databases: Neo4J, Infinite Graph, OrientDB, FlockDB. http://www.thoughtworks.com/insights/blog/nosql-databases-overview
  • 11. Document-oriented system. Large data sets. Records similar to JSON. Automatic sharding and MapReduce. Queries are written in JavaScript.
  • 12. Document-oriented system. JavaScript interface. Multi-version concurrency control approach. Client side needs to handle clashes on writes. No good built-in method for horizontal scalability (but there are external solutions like BigCouch, Lounge, and Pillow).
  • 13. Originally an internal Facebook project. Keyspaces and column families. Similar to the data model used by BigTable. Data is sharded and balanced automatically.
  • 14. Keeps the entire database in RAM. Its values can be complex data structures.
  • 15. BigTable. Structure: tables, row keys, column families, column names, timestamps, and cell values. Designed to handle very large data loads by running on big clusters of commodity hardware. Uses GFS as its underlying storage.
  • 16. Open source clone of BigTable. Same structure as BigTable. Uses HDFS instead of GFS.
  • 17. Another open source BigTable clone, written in C++. Focus on high performance.
  • 18. DynamoDB. Key-value system. Large distributed clusters. Versioning.
  • 19. AWS S3. HTTP. Blobs.
  • 20. Inspired by AWS DynamoDB. Open source and commercial versions. Key-value system. Large distributed clusters. Queries in Erlang or JavaScript. Consistent hashing and a gossip protocol to avoid a centralized index server.
  • 21. Takes care of running your code across a cluster of machines: chunking up the input data; sending it to each machine; running your code on each chunk; checking that the code ran; passing any results on to the next stage; sorting between stages; sending each chunk of that sorted data to the right machine; writing debugging information on each job’s progress.
  • 22. With Hive, you can program Hadoop jobs using a SQL-like language, HiveQL.
  • 23. Apache Pig: a procedural data processing language designed for Hadoop. Provides a set of functions that help with common data processing problems.
  • 24. PigPen is map-reduce for Clojure, or distributed Clojure. It compiles to Apache Pig, but you don't need to know much about Pig to use it.
  • 25. Hadoop for Logs: the Flume project is designed to make the data gathering process easy and scalable, by running agents on the source machines that pass the data updates to collectors, which then aggregate them into large chunks that can be efficiently written as HDFS files.
  • 26. The R project is both a specialized language and a toolkit of modules aimed at anyone working with statistics.
  • 27. Lucene is a Java library that handles indexing and searching large collections of documents, and Solr is an application that uses the library to build a search engine server.
  • 28. Mahout is an open source framework that can run common machine learning algorithms on massive datasets. The framework makes it easy to use analysis techniques to implement features such as a “People who bought this also bought” recommendation engine on your own site.
  • 29. ZooKeeper: coordinates work and configuration of different clusters.
  • 30. Serialization: needed as you pass data between systems and store it in files at some points.
  • 31. JSON; BSON (Binary JSON); Apache Thrift (predefined structure); Apache Avro (predefined structure).
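
As a rough illustration of the MapReduce stages listed on slide 21 (chunking the input, running code on each chunk, sorting between stages, combining the results), here is a minimal single-process Python sketch. It is not Hadoop code and does not distribute anything across machines; the word-count task and the function names (`chunk`, `map_words`, `shuffle`, `reduce_counts`, `word_count`) are assumptions chosen purely for illustration.

```python
from collections import defaultdict
from itertools import islice


def chunk(lines, size):
    """Split the input into fixed-size chunks, one per simulated machine."""
    it = iter(lines)
    while True:
        block = list(islice(it, size))
        if not block:
            return
        yield block


def map_words(block):
    """Map step: emit a (word, 1) pair for every word in the chunk."""
    return [(word.lower(), 1) for line in block for word in line.split()]


def shuffle(pairs):
    """Sort step between stages: group all values that share a key."""
    groups = defaultdict(list)
    for key, value in sorted(pairs):
        groups[key].append(value)
    return groups


def reduce_counts(groups):
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines, chunk_size=2):
    """Run the stages in order on a list of text lines (hypothetical driver)."""
    pairs = []
    for block in chunk(lines, chunk_size):
        pairs.extend(map_words(block))
    return reduce_counts(shuffle(pairs))


if __name__ == "__main__":
    print(word_count(["big data", "big table", "data"]))
```

In a real cluster, each chunk would be mapped on a different machine and the grouped keys would be routed to the machine responsible for reducing them; the framework also handles the retry and progress-reporting steps that this sketch leaves out.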