Shafaq Abdullah
Principal, Zenprise
   MongoDB features and Architecture
   Business Intelligence: Overview
   Model of Data using SQL vs NoSQL
   Concept of MapReduce
   Real-world use-case of Business Analytics
   Scalability
   Conclusions
   Key-value
   Column
   Document-based
   Graph
   Schema-less data model
   Ad hoc queries
   Scalability
   High Availability
   Speed
CAP: Consistency, Availability, Partition Tolerance
http://www.mongodb.org
Business + smart information = Business Intelligence

   Consists of querying, reporting, and analytics for businesses
   Enables a business to make smart decisions to execute
   Data modeling in SQL
       A one-to-many relation of a node (id, value) with
    other nodes, related by two different relations

      Node: id, value
      Relation: id_node1, id_node2
NoSQL data model mapped onto the relational
database model

      Node: id, value
      Relation: _id, value 1, value 2
   Using complex-type attributes to model data

      Nodes: _id, value, relations [ value 1, value 2, …, value n ]
   No join operation required

   Nesting of documents, resulting in
    instantaneous access to retrieve nodes/

   Supports agile programming methods

   Flexible schema adapts to changing
    business needs
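
The embedded-document model described in the slides above can be sketched with plain JavaScript objects. The field names `_id`, `value`, and `relations` follow the slides; the sample values and the `relatedValues` helper are hypothetical and are not part of any MongoDB API.

```javascript
// Hypothetical sample data: each node embeds its relations as an array,
// so related values live inside the node document itself.
const nodes = [
  { _id: 1, value: "A", relations: [{ _id: 2, value: "B" }, { _id: 3, value: "C" }] },
  { _id: 2, value: "B", relations: [] },
];

// Illustrative helper (an assumption, not a MongoDB call): read the related
// values of a node with one lookup, without joining a separate relation table.
function relatedValues(collection, id) {
  const node = collection.find((n) => n._id === id);
  return node ? node.relations.map((r) => r.value) : [];
}

console.log(relatedValues(nodes, 1));
```

The trade-off of this layout is that related values are duplicated inside each parent document, which is what removes the join.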
   MapReduce
    ◦ Programming model for processing large amounts of
      data in parallel

    ◦ Map: process a list of data to create key/value
      pairs

    ◦ Reduce: process the above pairs to create new
      aggregated key/value pairs
map(k1, v1) -> list(k2, v2)

reduce(k2, list(v2)) -> list(v3)

List: (a;2) (a;4) (b;4) (b;2) (a;1) (c;5)

Map (values grouped by key): (a;[2,4,1]), (b;[4,2]), (c;[5])

Reduce: (a;7), (b;6), (c;5)
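
The worked example above can be reproduced in a few lines of plain JavaScript. This is a minimal in-memory sketch, not a distributed implementation; the function names `mapRecord`, `groupByKey`, `reduceSum`, and `runMapReduce` are hypothetical, and summation is assumed as the reduce operation because it matches the numbers on the slide.

```javascript
// Input list from the slide: (a;2) (a;4) (b;4) (b;2) (a;1) (c;5)
const input = [["a", 2], ["a", 4], ["b", 4], ["b", 2], ["a", 1], ["c", 5]];

// Map: emit one (key, value) pair per input record.
function mapRecord([key, value]) {
  return [[key, value]];
}

// Group: collect all values emitted under the same key.
function groupByKey(pairs) {
  const groups = new Map();
  for (const [key, value] of pairs) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  return groups;
}

// Reduce: fold each key's value list into one aggregate (here, a sum).
function reduceSum(key, values) {
  return values.reduce((total, v) => total + v, 0);
}

function runMapReduce(records) {
  const pairs = records.flatMap(mapRecord);
  const result = {};
  for (const [key, values] of groupByKey(pairs)) {
    result[key] = reduceSum(key, values);
  }
  return result;
}

console.log(runMapReduce(input)); // { a: 7, b: 6, c: 5 }
```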
   Problem


The task is to find the 2 closest
cities in each country

Assume the earth is a 2D plane

The distance between P1 (x1, y1) and
P2 (x2, y2) is computed as
sqrt( (x1 - x2)^2 + (y1 - y2)^2 )
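
The distance formula can be written as a small JavaScript helper. This is a sketch under the slide's flat-plane assumption; treating longitude as x and latitude as y, and the `lat`/`lon` field names, are assumptions borrowed from the map function later in the deck.

```javascript
// Euclidean distance on a flat plane: sqrt((x1 - x2)^2 + (y1 - y2)^2).
// Assumption: each city object carries numeric "lat" (y) and "lon" (x) fields.
function Dist(c1, c2) {
  const dx = c1.lon - c2.lon;
  const dy = c1.lat - c2.lat;
  return Math.sqrt(dx * dx + dy * dy);
}

console.log(Dist({ lat: 0, lon: 0 }, { lat: 3, lon: 4 })); // 5
```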
Computations   Groups
create view city_dist as
select c1.CountryID,
       c1.CityId, c1.City,
       c2.CityId as CityId2, c2.City as City2,
       dist(c1, c2) as Dist
from cities c1
inner join cities c2
   on c1.CountryID = c2.CountryID
  and c1.CityId < c2.CityId;
       /* c1.CityId < c2.CityId: calculate the distance between 2 cities only once */
select city_dist.*
from (
       select CountryID, min(Dist) as MinDist
       from city_dist
       where Dist > 0   /* Avoid cities which share Latitude & Longitude */
       group by CountryID
     ) a
inner join city_dist
   on a.CountryID = city_dist.CountryID
  and a.MinDist = city_dist.Dist;
Map   Reduce   Finalize
function MapCode() {
  // Emit one single-city list per document, keyed by country
  emit(this.CountryID, {
    "data": [
      { "city": this.City, "lat": this.Latitude, "lon": this.Longitude }
    ]
  });
}
function ReduceCode(key, values) {
  // Concatenate the city lists emitted for one country
  var reduced = {"data": []};
  for (var i in values) {
    var inter = values[i];
    for (var j in inter.data) {
      reduced.data.push(inter.data[j]);
    }
  }
  return reduced;
}
function Finalize(key, reduced) {
  if (reduced.data.length == 1) {
    return { "message": "This Country contains only 1 City" };
  }
  var min_dist = 999999999999;
  var city1 = { "city": "" };
  var city2 = { "city": "" };
  var c1; var c2; var d;
  // Compare every pair of cities once (j starts after i)
  for (var i = 0; i < reduced.data.length; i++) {
    for (var j = i + 1; j < reduced.data.length; j++) {
      c1 = reduced.data[i];
      c2 = reduced.data[j];
      d = Dist(c1, c2);  // Dist is assumed to be defined elsewhere
      if (d < min_dist && d > 0) {
        min_dist = d; city1 = c1; city2 = c2;
      }
    }
  }
  // The map function stores the name under "city", not "name"
  return {"city1": city1.city, "city2": city2.city, "dist": min_dist};
}
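
To see the three stages work together outside the database, they can be approximated in plain JavaScript. This is a sketch under stated assumptions: the sample documents are invented, `closestCitiesByCountry` is a hypothetical in-memory driver standing in for the database's map-reduce engine, and `mapCity`, `reduceCities`, `finalizeClosest`, and `dist` are illustrative rewrites that return values instead of calling `emit`.

```javascript
// Hypothetical sample documents; field names follow the map function above.
const cities = [
  { CountryID: 1, City: "A", Latitude: 0, Longitude: 0 },
  { CountryID: 1, City: "B", Latitude: 3, Longitude: 4 },
  { CountryID: 1, City: "C", Latitude: 0, Longitude: 1 },
  { CountryID: 2, City: "D", Latitude: 10, Longitude: 10 },
];

// Flat-plane distance, as assumed in the problem statement.
function dist(c1, c2) {
  return Math.hypot(c1.lat - c2.lat, c1.lon - c2.lon);
}

// Map step: one (CountryID, {data: [city]}) pair per document.
function mapCity(doc) {
  return [doc.CountryID, { data: [{ city: doc.City, lat: doc.Latitude, lon: doc.Longitude }] }];
}

// Reduce step: concatenate the city lists emitted for one country.
function reduceCities(key, values) {
  return { data: values.flatMap((v) => v.data) };
}

// Finalize step: brute-force search over all city pairs in one country.
function finalizeClosest(key, reduced) {
  if (reduced.data.length < 2) return { message: "This Country contains only 1 City" };
  let best = { city1: "", city2: "", dist: Infinity };
  for (let i = 0; i < reduced.data.length; i++) {
    for (let j = i + 1; j < reduced.data.length; j++) {
      const d = dist(reduced.data[i], reduced.data[j]);
      if (d > 0 && d < best.dist) {
        best = { city1: reduced.data[i].city, city2: reduced.data[j].city, dist: d };
      }
    }
  }
  return best;
}

// In-memory driver: group mapped values by key, then reduce and finalize.
function closestCitiesByCountry(docs) {
  const grouped = new Map();
  for (const doc of docs) {
    const [key, value] = mapCity(doc);
    if (!grouped.has(key)) grouped.set(key, []);
    grouped.get(key).push(value);
  }
  const out = {};
  for (const [key, values] of grouped) {
    out[key] = finalizeClosest(key, reduceCities(key, values));
  }
  return out;
}

console.log(closestCitiesByCountry(cities));
```

The pairwise search is quadratic in the number of cities per country, which is acceptable for a sketch but may need a spatial index for large inputs.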
   Shard!
   For write-intensive workloads, increase the number of
    shards

   For read-intensive workloads, increase the number of
    replica-set members within each shard

   Best read performance: each shard's data
    fits in memory
   MongoDB is well suited to BI applications

   The schemaless model allows high performance

   Replication with sharding allows scaling out
   http://www.mongodb.org/
   http://www.jaspersoft.com/
   R. Cattell. Scalable SQL and NoSQL Data Stores.
    http://www.cattell.net/datastores/Datastores.pdf
   C.-T. Chu, S. K. Kim, Y.-A. Lin, Y. Yu, G. R. Bradski, A. Y. Ng, and
    K. Olukotun. Map-reduce for machine learning on multicore. In
    NIPS, pages 281–288, 2006.
   http://www.mongovue.com/2010/11/03/yet-another-mongodb-map-reduce-tutorial/

MongoDB MapReduce Business Intelligence
