This document gives an overview of several data storage and processing systems: RDBMS, Hadoop, Spark and NoSQL. It covers key concepts and dates for each technology. An RDBMS organizes data into tables of rows and columns, while Hadoop runs MapReduce over HDFS for distributed processing of large datasets. Spark improves on MapReduce by using resilient distributed datasets (RDDs) and caching data in memory, which speeds up iterative jobs. NoSQL systems relax ACID properties in exchange for horizontal scalability and high availability.
7. #DevoxxMA @zouheircadi
RDBMS
Relational Database Management Systems
were invented to let you use one set of data
in multiple ways, including ways that are
unforeseen at the time the database is built
and the 1st applications are written.
Curt Monash, analyst/blogger
CAP
• Consistency
• Every read receives the most recent write or an error
• Availability
• Every request receives a response, without guarantee that
it contains the most recent version
• Partition tolerance
• The system continues to operate despite arbitrary
partitioning due to network failure
• During a partition, if requests are still served, consistency may be sacrificed
• If requests are refused instead, availability is sacrificed
• NoSQL systems may sacrifice consistency
https://en.wikipedia.org/wiki/CAP_theorem
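The trade-off on this slide can be sketched in a few lines of Scala. This is only an illustration under simplifying assumptions: `Replica`, `Mode`, `PreferConsistency` and `PreferAvailability` are hypothetical names, not part of any real database API, and a single boolean stands in for the network partition.

```scala
// Hypothetical model of one replica's read path; not a real database API.
sealed trait Mode
case object PreferConsistency extends Mode  // refuse rather than risk a stale answer
case object PreferAvailability extends Mode // always answer with the local copy

final case class Replica(localValue: String, partitioned: Boolean, mode: Mode) {
  // Right(value) is a response; Left(error) means the read was refused.
  def read(): Either[String, String] =
    if (!partitioned) Right(localValue) // no partition: nothing to give up
    else mode match {
      case PreferConsistency  => Left("unavailable: cannot confirm the most recent write")
      case PreferAvailability => Right(localValue) // may be stale
    }
}
```

With `partitioned = true`, a `PreferConsistency` replica returns an error while a `PreferAvailability` replica returns its local, possibly stale, value.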
Key dates
• October 2003: GFS paper released
• December 2004: "MapReduce: Simplified Data
Processing on Large Clusters" paper released
• January 2006: Hadoop created
• October 2006: 600-machine Hadoop cluster
at Yahoo
• April 2007: 1000-machine Hadoop cluster
at Yahoo
https://en.wikipedia.org/wiki/Apache_Hadoop
Hadoop conclusion
• Map-Reduce has served a great purpose,
though: many, many companies, research
labs and individuals are successfully
bringing Map-Reduce to bear on problems
to which it is suited: brute-force processing
with an optional aggregation.
http://the-paper-trail.org/blog/the-elephant-was-a-trojan-horse-on-the-death-of-map-reduce-at-google/
Hadoop conclusion
• It’s well known in the industry that more
than 10 years ago Google invented
MapReduce, the technology at the heart of
first-generation Hadoop. It’s less well
known that Google moved away from
MapReduce several years ago. Today at its
Google I/O 2014 …
https://www.datanami.com/2014/06/25/google-re-imagines-mapreduce-launches-dataflow/
Spark – Transformation : Map
Input:
Hello World
This Is Devoxx
Morocco
Held In
Casablanca
.map(_.toLowerCase)
Output:
hello world
this is devoxx
morocco
held in
casablanca
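The map transformation on this slide can be tried without a cluster. The sketch below uses a plain Scala `Seq` as a stand-in for an RDD, which is an assumption made for illustration: on an RDD the call is lazy and distributed, but the per-element behavior is the same. `MapSketch` is a hypothetical name.

```scala
object MapSketch {
  // Stand-in for the lines of the input file (same lines as on the slide).
  val lines: Seq[String] =
    Seq("Hello World", "This Is Devoxx", "Morocco", "Held In", "Casablanca")

  // map applies the function to every element: exactly one output per input.
  val lowered: Seq[String] = lines.map(_.toLowerCase)

  def main(args: Array[String]): Unit = lowered.foreach(println)
}
```

Because map is one-to-one, `lowered` always has as many elements as `lines`.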
Spark – Transformation : flatMap
Input:
hello world
this is devoxx
morocco
held in
casablanca
.flatMap(line => line.split("\\s+"))
Output:
hello
world
this
is
devoxx
….
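To see why flatMap is used here rather than map, the following sketch runs both on a plain Scala `Seq`, used as a local stand-in for an RDD (an assumption for illustration). `FlatMapSketch` is a hypothetical name.

```scala
object FlatMapSketch {
  val lines: Seq[String] =
    Seq("hello world", "this is devoxx", "morocco", "held in", "casablanca")

  // map keeps one result per line: a sequence of 5 arrays of words.
  val nested: Seq[Array[String]] = lines.map(line => line.split("\\s+"))

  // flatMap flattens those per-line arrays into a single sequence of 9 words.
  val words: Seq[String] = lines.flatMap(line => line.split("\\s+"))

  def main(args: Array[String]): Unit = words.foreach(println)
}
```

The regular expression needs the escaped backslash (`"\\s+"`) to match whitespace; `"s+"` would split on the letter s instead.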
All in one
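import org.apache.spark.SparkContext

// Master and application name are expected to be supplied at launch (e.g. by spark-submit).
val sc = new SparkContext()
val docs = sc.textFile("hdfs://<path>")
val low = docs.map(line => line.toLowerCase)          // lowercase each line
val words = low.flatMap(line => line.split("\\s+"))   // split lines into words
val counts = words.map(word => (word, 1))             // pair each word with 1
val frequency = counts.reduceByKey(_ + _)             // sum the counts per word
val top = frequency.map(_.swap).top(N)                // N largest (count, word) pairs; N is defined elsewhere
top.foreach(println)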
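The same pipeline can be checked locally on a small input before it is run on a cluster. The sketch below is an approximation using plain Scala collections, not Spark: `groupBy` plus a sum stands in for `reduceByKey`, and a descending sort plus `take` stands in for `top`. `WordCountSketch` and `topWords` are hypothetical names, and the `filter(_.nonEmpty)` step is an addition that drops the empty strings `split` produces for lines with leading whitespace.

```scala
object WordCountSketch {
  // Returns the n most frequent words as (count, word) pairs, largest first.
  def topWords(lines: Seq[String], n: Int): Seq[(Int, String)] = {
    val words = lines
      .map(_.toLowerCase)
      .flatMap(line => line.split("\\s+"))
      .filter(_.nonEmpty)
    val counts = words.map(word => (word, 1))
    // Local equivalent of reduceByKey(_ + _): group pairs by word, then sum the ones.
    val frequency = counts.groupBy(_._1).map { case (word, pairs) => (word, pairs.map(_._2).sum) }
    // Swap to (count, word) so that tuple ordering sorts by count first.
    frequency.toSeq.map(_.swap).sorted(Ordering[(Int, String)].reverse).take(n)
  }

  def main(args: Array[String]): Unit =
    topWords(Seq("a b a", "B c a"), 2).foreach(println)
}
```

For the two input lines in `main`, the word "a" occurs three times and "b" twice, so those are the two pairs returned.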
Constraints on using NoSQL
• Transactions
• Pushing conflict resolution onto the client
cannot be considered progress.
• A necessary evil, often dictated by the business
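What "conflict resolution on the client" can look like is sketched below, under stated assumptions: the store is imagined to return every divergent version ("sibling") of a value, and the application must merge them itself. `ConflictSketch`, `Cart` and `resolve` are hypothetical names, and the union policy is just one possible choice, not the behavior of any particular NoSQL product.

```scala
object ConflictSketch {
  // A shopping cart modeled as a set of item names.
  type Cart = Set[String]

  // Hypothetical client-side merge policy: keep every item seen in any sibling.
  def resolve(siblings: Seq[Cart]): Cart =
    siblings.foldLeft(Set.empty[String])((merged, cart) => merged union cart)

  def main(args: Array[String]): Unit = {
    // One replica still holds the pen, the other saw it removed.
    val merged = resolve(Seq(Set("book", "pen"), Set("book")))
    println(merged) // the removed pen is back in the merged cart
  }
}
```

A union merge cannot tell a removed item from one that was never added, so the deleted item reappears. Every application has to write and get right this kind of logic itself, which is the burden the slide describes.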