SELA DEVELOPER PRACTICE
December 15-19, 2013

Manu Cohen-Yashar

The Cloud, Big Data and
NoSQL

© Copyright SELA software & Education Labs Ltd. | 14-18 Baruch Hirsch St Bnei Brak, 51202 Israel | www.selagroup.com
Agenda
What is the cloud
Data boom
NoSQL
Big Data
Cloud Distributions
What’s next
Make sense of: Cloud, Big Data and NoSQL
How they fit together

Make money!!!
What is the cloud
Cloud Computing is an Idea …
Infrastructure is provisioned by a cloud provider.
Automatic Scale.
Elasticity. Pay as you use.
Availability.
Simple, Automatic, Economic.
Types of Clouds
IaaS
PaaS
SaaS
and more…
Identity as a Service
Connectivity as a Service
Storage as a Service
Lots of Data
Data doubles every 18 months
Pictures
Web sites
Emails
Sensors
Geo Information
Financial Information
Science
Art
. . . (Infinite list)
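
The doubling claim above implies exponential growth. A minimal sketch of the arithmetic, assuming a constant 18-month doubling period; the function name and the 1 TB starting volume are illustrative assumptions, not figures from the talk:

```python
def projected_volume(initial, months, doubling_period_months=18):
    """Volume after `months`, assuming it doubles every `doubling_period_months`."""
    return initial * 2 ** (months / doubling_period_months)


# Hypothetical starting point: 1 TB of data today.
for years in (3, 6, 9):
    print(years, "years ->", projected_volume(1.0, years * 12), "TB")
```

Under this assumption, three years is two doubling periods, so the volume grows fourfold.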
No Limits
With the cloud it is now possible to mount a cluster of any size and run any computation at any scale.
Whoever makes sense of all the available data will rule the world.

The conclusion:
Use the cloud to analyze data at large scale.
Let's Talk about Data
When we think of data we think of …
Data has many forms
Yet data comes in many forms and shapes
Graphs
Time Series
Documents
Blobs
Geo
Sensors
Structured
Unstructured
Web
Non-Relational
Not all types of data fit well into the relational world.
Not all data use cases fit well into the ACID convention.
The relational model does not scale very well.
Difficult to distribute
Difficult to replicate
The CAP Theorem
During a network partition, a distributed system must choose either Consistency or Availability.

[Diagram labels: Sharded NoSQL, RDBMS, Replicated NoSQL]
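
The consistency-versus-availability choice can be sketched with two in-memory replicas. This is a toy model under stated assumptions, not any real database's behaviour: the `Replica` class, the `"CP"`/`"AP"` mode labels, and the `Unavailable` exception are all hypothetical names made up for illustration.

```python
class Unavailable(Exception):
    """Raised when a replica refuses a request to preserve consistency."""


class Replica:
    def __init__(self, mode):
        self.mode = mode          # "CP" or "AP" (illustrative labels)
        self.value = None
        self.peer = None
        self.partitioned = False  # True when the peer cannot be reached

    def write(self, value):
        if self.partitioned:
            if self.mode == "CP":
                raise Unavailable("cannot reach peer; refusing write")
            self.value = value    # AP: accept locally, the peer goes stale
            return
        self.value = value
        self.peer.value = value   # healthy network: replicate synchronously

    def read(self):
        if self.partitioned and self.mode == "CP":
            raise Unavailable("cannot confirm latest value; refusing read")
        return self.value         # AP: may be stale during a partition


def make_pair(mode):
    a, b = Replica(mode), Replica(mode)
    a.peer, b.peer = b, a
    return a, b


def partition(a, b):
    a.partitioned = b.partitioned = True


# AP: both sides keep answering, but the other side serves a stale value.
a, b = make_pair("AP")
a.write("v1")
partition(a, b)
a.write("v2")
print(b.read())   # prints v1

# CP: the system stays consistent by refusing requests.
a, b = make_pair("CP")
a.write("v1")
partition(a, b)
try:
    a.write("v2")
except Unavailable as err:
    print("unavailable:", err)
```

The AP pair keeps serving requests at the cost of a stale read; the CP pair never returns stale data but stops serving until the partition heals.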
NoSQL
Large family of databases
No schema
No relations enforced
Designed for high scale and distribution

Types of NoSQL DB
Key-Value
Wide Column
Document
Graph
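
To make the four families concrete, here is a rough sketch of how similar user data might be shaped in each model, using plain Python structures. The records, field names, and the `followers_of` helper are invented for illustration and do not correspond to any product's API.

```python
import json

# Key-value: an opaque value looked up by a single key.
key_value = {"user:42": json.dumps({"name": "Dana", "city": "Haifa"})}

# Wide column: row key -> column family -> columns; rows may differ in columns.
wide_column = {
    "user:42": {"profile": {"name": "Dana", "city": "Haifa"},
                "stats": {"logins": 17}},
    "user:43": {"profile": {"name": "Omer"}},
}

# Document: a self-describing nested record, queryable by its fields.
documents = [
    {"_id": 42, "name": "Dana", "address": {"city": "Haifa"}, "tags": ["admin"]},
    {"_id": 43, "name": "Omer", "tags": []},
]

# Graph: nodes plus labelled edges; relationships are first-class data.
nodes = {42: {"name": "Dana"}, 43: {"name": "Omer"}}
edges = [(42, "FOLLOWS", 43)]


def followers_of(node_id):
    """Return the ids of nodes with a FOLLOWS edge pointing at `node_id`."""
    return [src for src, label, dst in edges
            if label == "FOLLOWS" and dst == node_id]


print(json.loads(key_value["user:42"])["city"])
print(wide_column["user:42"]["stats"]["logins"])
print([d["name"] for d in documents if "admin" in d["tags"]])
print(followers_of(43))
```

The point of the sketch is the access pattern: a key-value store only knows the key, a document store can filter on fields, and a graph store answers relationship questions directly.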
Motivation for NoSQL
Large Scale and Distribution
Simplicity
Low cost
Good fit with the data model
Volume, Velocity and Variety
Important

There is no one NoSQL solution for all use cases
There are more than 150 possible offerings…
The Cloud and NoSQL
The major cloud providers all offer NoSQL solutions
Azure Tables
Google Bigtable
Amazon DynamoDB

NoSQL databases are deployed on a cluster
There are a large number of cloud hosting offerings for NoSQL clusters
MongoHQ (MongoDB)
Cassandra on Google Compute Engine
Many more
Example – Mongo in Azure
Big Data
What is Big?
“Big” means the data cannot fit on a single machine.

Conclusion:
Big data has to be distributed.
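
One common way to distribute records across machines is hash partitioning (sharding). The sketch below is a simplified illustration, assuming four shards and made-up record keys; production systems often use consistent hashing or range partitioning instead.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def distribute(keys, num_shards):
    """Assign every key to exactly one shard."""
    shards = [[] for _ in range(num_shards)]
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards


# Hypothetical record keys spread over four machines.
keys = [f"record-{i}" for i in range(1000)]
shards = distribute(keys, 4)
print([len(s) for s in shards])
```

Because the hash is stable, any node can compute where a key lives without consulting a central directory.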
Types of Big Data Processing
Query
General Analysis
Classification
Recommendation
Clustering
Auditing and monitoring
More…
Challenges
Develop a parallel algorithm
Reduce network traffic: bring the compute to the data
Monitor and manage a large number of parallel tasks
Survive failures
Performance
Linear scale
Batch Processing vs. Operational Intelligence
Batch Processing
Work on existing data
Provide results within minutes

Operational Intelligence
Work on a stream of data
Provide real-time results
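
The difference between the two styles can be sketched with the same computation done both ways. This is a minimal illustration with made-up readings: the batch function sees the whole data set at once, while the streaming function updates its answer as each event arrives.

```python
def batch_average(readings):
    """Batch: all data already exists; one pass produces one answer."""
    return sum(readings) / len(readings)


def streaming_average(stream):
    """Operational: emit an updated answer after every incoming event."""
    count, total = 0, 0.0
    for value in stream:
        count += 1
        total += value
        yield total / count


readings = [10, 20, 30, 40]
print(batch_average(readings))            # 25.0
print(list(streaming_average(readings)))  # [10.0, 15.0, 20.0, 25.0]
```

The streaming version never needs the full data set in memory, which is what makes real-time results possible on an unbounded stream.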
Distributed File System
No single server can store Big Data files
Distribute files across cluster
Failure is part of the game
Similar API to traditional File Systems
Examples:
HDFS
GFS
Cassandra FS
Mongo FS
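
The core idea behind these file systems, splitting a file into blocks and keeping each block on more than one node, can be sketched as follows. This is a toy model, not the HDFS API: the node names, the 8-byte block size, and the round-robin placement are assumptions for illustration (real systems use much larger blocks and smarter placement), and the sketch assumes `replication` does not exceed the number of nodes.

```python
def split_into_blocks(data, block_size):
    """Cut the file contents into fixed-size blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(blocks, nodes, replication=2):
    """Round-robin placement: block i goes to `replication` consecutive nodes."""
    storage = {node: {} for node in nodes}
    for index, block in enumerate(blocks):
        for r in range(replication):
            node = nodes[(index + r) % len(nodes)]
            storage[node][index] = block
    return storage


def read_file(storage, num_blocks, failed=()):
    """Reassemble the file from any surviving replica of each block."""
    parts = []
    for index in range(num_blocks):
        holders = [node for node, held in storage.items()
                   if index in held and node not in failed]
        if not holders:
            raise IOError(f"block {index} lost")
        parts.append(storage[holders[0]][index])
    return b"".join(parts)


data = b"big data does not fit on a single machine"
blocks = split_into_blocks(data, block_size=8)
storage = place_blocks(blocks, ["node-a", "node-b", "node-c"], replication=2)

# The file is still readable after losing one node.
print(read_file(storage, len(blocks), failed={"node-b"}) == data)
```

With two replicas per block, any single node can fail without data loss, which is how "failure is part of the game" is handled by design rather than by exception.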
Hadoop
Big Data Analysis Platform
Batch Processing
Brings Compute tasks to data nodes
Parallel Processing using Map-Reduce
Open Source
Huge ecosystem
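
The Map-Reduce model itself can be sketched in a few lines with the classic word-count example. This runs in a single process and only mirrors the three phases (map, shuffle, reduce); it is not the Hadoop API, and the function names are illustrative.

```python
from collections import defaultdict


def map_phase(line):
    """Emit (word, 1) for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values emitted for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


print(word_count(["big data", "big cloud"]))  # {'big': 2, 'data': 1, 'cloud': 1}
```

On a real cluster the map calls run in parallel on the nodes that hold the input blocks, and only the small (key, value) pairs travel over the network.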
Hadoop Ecosystem
Writing a valuable Map-Reduce job for Hadoop is not simple
Many open source projects provide abstractions
Pig
Hive
HBase
Sqoop
Mahout
ZooKeeper
More
Hadoop on the Cloud
Hadoop runs on a cluster
You can use a cluster as a service on major cloud offerings
Storm
Real-Time big data analytics
Process streams of data
Can be used with any programming language
Wide integration with data sources
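
Storm describes a computation as a topology of spouts (stream sources) and bolts (processing steps). The sketch below imitates that shape with Python generators; it is only an analogy under that assumption and does not use the Storm API, and all three function names are hypothetical.

```python
from collections import Counter


def sentence_spout(sentences):
    """Spout: the source that emits tuples into the topology."""
    for sentence in sentences:
        yield sentence


def split_bolt(stream):
    """Bolt: transforms each incoming tuple into zero or more tuples."""
    for sentence in stream:
        for word in sentence.split():
            yield word


def count_bolt(stream):
    """Bolt: keeps running state and emits an updated count per tuple."""
    counts = Counter()
    for word in stream:
        counts[word] += 1
        yield word, counts[word]


topology = count_bolt(split_bolt(sentence_spout(["a b", "a"])))
print(list(topology))  # [('a', 1), ('b', 1), ('a', 2)]
```

Each tuple flows through the whole chain as soon as it is emitted, so results are available per event rather than at the end of a job.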
Check your schema
Be open to using NoSQL data stores
Identify your use case and find the right database for you
Create a simple POC
Look for Big Data
Ask yourself: What can I gain from big data?
How can the new data or analysis scope enhance your existing set of capabilities?
What additional opportunities for intervention or process optimisation does it present?

Identify your use case and find the right product and data model.
Look for web distributions and create a simple POC
Questions

Editor's Notes

  1. Consistency: A read sees all previously completed writes.
     Availability: Reads and writes always succeed.
     Partition tolerance: Guaranteed properties are maintained even when network failures prevent some machines from communicating with others.
     https://foundationdb.com/white-papers/the-cap-theorem/
     The basic idea is that if a client writes to one side of a partition, any reads that go to the other side of that partition can't possibly know about the most recent write. Now you're faced with a choice: do you respond to the reads with potentially stale information, or do you wait (potentially forever) to hear from the other side of the partition and compromise availability?