Webinar: Big Data Architectures – Beyond the Elephant Ride
June 29, 2012
Question and Answer Session

Q1. What are the differences between Storm and ESBs like Mule?

Storm and ESBs (like Mule) solve very different problems and cannot be compared directly.

The motivation behind ESBs is to standardize and structure loosely coupled
software components so that they can be independently deployed and run in
disparate environments. Communication is through message passing, and with an
ESB, heterogeneous components are able to interact with each other.

Storm is for processing large volumes of data in real time. When we use Storm, we do
not attempt to establish any form of common structure for different components to
collaborate. Rather, Storm enables huge amounts of data to be processed through a
chain of processing units.
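
To make the "chain of processing units" idea concrete, here is a minimal sketch in
plain Python. It does not use the Storm API; the function names are hypothetical and
chosen only for illustration. In Storm's own terminology the source unit would be a
spout and the downstream units would be bolts.

```python
def sentence_source(lines):
    """Source unit: emits raw records one at a time."""
    for line in lines:
        yield line


def split_words(stream):
    """Processing unit 1: turns each sentence into a stream of words."""
    for sentence in stream:
        for word in sentence.lower().split():
            yield word


def running_counts(stream):
    """Processing unit 2: emits an updated count every time a word arrives."""
    counts = {}
    for word in stream:
        counts[word] = counts.get(word, 0) + 1
        yield word, counts[word]


def run_pipeline(lines):
    """Wire the units into a chain and collect the latest count per word."""
    latest = {}
    for word, count in running_counts(split_words(sentence_source(lines))):
        latest[word] = count
    return latest
```

Each record flows through the chain as soon as it arrives, which is the property that
distinguishes this style from batch processing. A real Storm topology would also
distribute these units across machines, which this single-process sketch does not
attempt.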

So when you have large amounts of data that you want to process in real time, we
advise you to use Storm. On the other hand, when you have numerous components
and you want to write a layer that will enable their interaction, use an ESB.

In fact, Storm and an ESB can in theory be integrated so that Storm handles the
streaming analytics part while the ESB caters to service orchestration and
integrations.

Q2. What is the advantage of Giraph and Pregel over more common graph DBs like
Neo4j or InfiniteGraph?

Giraph is an open-source implementation of Pregel meant for large datasets. It
provides a large-scale graph processing infrastructure over Hadoop. Some of the
advantages I’d like to highlight include:

   1.   Distributed and developed especially for large-scale graph processing
   2.   Bulk Synchronous Parallel (BSP) as the execution model
   3.   Fault tolerance through checkpointing
   4.   Runs on standard Hadoop infrastructure

© 2012 Impetus Technologies                                                     Page 1
   5.   Computation is executed in memory
   6.   It can be part of a pipeline in the form of a job
   7.   Vertex-centric API
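
As a rough illustration of the BSP execution model and the vertex-centric style
listed above, the following single-process Python sketch propagates the maximum
value through a graph. It is not Giraph code and does not use the Giraph API; the
function name and data layout are assumptions made for this example, and the graph
dictionary is assumed to contain every vertex as a key.

```python
from collections import defaultdict


def pregel_max(graph, values):
    """Propagate the maximum vertex value to every reachable vertex.

    graph:  dict mapping vertex -> list of neighbour vertices
    values: dict mapping vertex -> initial numeric value
    """
    values = dict(values)

    # Superstep 0: every vertex sends its value to its neighbours.
    inbox = defaultdict(list)
    for vertex, neighbours in graph.items():
        for n in neighbours:
            inbox[n].append(values[vertex])

    # Later supersteps: a vertex only sends messages if its value changed.
    while inbox:
        outbox = defaultdict(list)
        for vertex, messages in inbox.items():
            best = max(messages)
            if best > values[vertex]:
                values[vertex] = best
                for n in graph[vertex]:
                    outbox[n].append(best)
        # Barrier: messages sent now become visible in the next superstep.
        inbox = outbox

    return values
```

The computation stops when no vertex sends a message, which mirrors the "vote to
halt" idea in Pregel. A real deployment would partition the vertices across workers
and checkpoint state between supersteps.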

Please see the answer to Q9 as well.

Q3. What do you recommend for Reporting on top of NoSQL databases?

Technologies coming under NoSQL are relatively new and still evolving. Furthermore,
there are a lot of these technologies, and it is unlikely that a single tool would
work on all of them.

If you could share the exact NoSQL technology that you are either using or planning
to use, we will be able to suggest the right tool.

A few reporting tools, like Intellicus and Jasper, work on HBase, but I guess they
are still keeping an eye on the market to see the direction it is going to take.

I strongly believe that you should see some exciting features in these tools in the
next 6 to 12 months.

Q4. What are the differences between Cassandra and Riak and why would you
choose one over the other?

Cassandra and Riak are popular NoSQL solutions, and each is best suited to
different kinds of use cases. So the choice of one over the other depends entirely
on the business use case that we are trying to solve.

Strengths of Riak over Cassandra

- Adding nodes to a Riak cluster is very easy
- The data model doesn't need to be set up in advance
- You can access it over REST or through the Protocol Buffers API
- Commercial support is available from Basho




Strengths of Cassandra over Riak

- Cassandra is still more popular because of the bigger community using it
- You can access it using CQL, a SQL-like query language
- Scales to petabytes and supports a column-oriented structure
- Enterprise features like rack-awareness are free, which is helpful in large
deployments
- Commercial product support is available from DataStax
- Implementation support is available from third-party commercial service providers
like Impetus (http://wiki.apache.org/cassandra/ThirdPartySupport)

Q5. We planned for a SAN deployment as our storage solution. I have read that
MPP database solutions are optimal on a shared-nothing architecture as DAS
rather than on SAN. Can you please comment on MPP database on SAN vs DAS?

Typically, a SAN can offer higher throughput than DAS but can also have higher
latency under lighter loads. Also, a SAN's available throughput is shared across all
connected nodes. In an MPP data warehousing scenario, multiple nodes connect to the
SAN and therefore share a common bandwidth.

Another point to note is that most queries served by MPP systems involve a large
number of scattered reads across multiple nodes, which pushes bandwidth utilization
on the SAN to its limits. However, with a large cache, high-speed HBAs and
high-speed disks (15K RPM) in the SAN, it should be able to serve a 10-15 node MPP
cluster.

On the other hand, DAS can also provide very good throughput and does not have to
share bandwidth across multiple nodes. The bandwidth offered can be further improved
by using multiple SATA adapters and high-speed disks (10K - 15K RPM). DAS will
probably offer better performance on a cluster with a very high number of nodes.

To summarize, there is no clear winner, and the choice of SAN vs DAS will depend on
various factors like load, underlying technology in the storage system, cache, number
of nodes, etc. Both high-end fibre-based storage technology and newer SATA-based
storage technology (e.g. SATA-3) can offer similar bandwidth. We suggest that a
careful study and capacity planning be conducted on the underlying storage system
before deciding on the storage solution.
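
As a starting point for that capacity planning, the shared-bandwidth argument can be
put into a back-of-the-envelope calculation. The Python sketch below is only a
simplification: it ignores cache hits, latency and contention effects, and the
figures in the usage example are made-up placeholders, not measurements of any real
SAN or DAS product.

```python
def san_per_node_mbps(san_total_mbps, nodes):
    """Bandwidth each node gets if the SAN's total is shared evenly."""
    return san_total_mbps / nodes


def break_even_nodes(san_total_mbps, das_per_node_mbps):
    """Largest cluster size at which the shared SAN still gives every node
    at least as much bandwidth as its own DAS would."""
    return int(san_total_mbps // das_per_node_mbps)


# Hypothetical figures, for illustration only.
SAN_TOTAL_MBPS = 3000.0
DAS_PER_NODE_MBPS = 250.0

limit = break_even_nodes(SAN_TOTAL_MBPS, DAS_PER_NODE_MBPS)
share_at_limit = san_per_node_mbps(SAN_TOTAL_MBPS, limit)
share_at_double = san_per_node_mbps(SAN_TOTAL_MBPS, 2 * limit)
```

Under these assumed numbers, the SAN matches DAS up to the break-even cluster size
and falls to half the per-node bandwidth when the cluster doubles. Substituting
measured throughput for your own hardware is the only way to get a meaningful answer.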

Q6. What architecture components would satisfy the desire to have an integrated
NewSQL environment and be able to marry that data with both adhoc defined user
tables and events detected during unstructured data stream processing?

NewSQL and NoSQL databases/datastores excel in areas where traditional RDBMS
systems have some limitations. In many scenarios, NoSQL/NewSQL databases can
offer significant improvement over RDBMS. Some such cases are:

1. Very high availability for high-traffic data
2. Storing CLOB/text fields that hold denormalized/unstructured data
3. Journal data
4. Performance and scalability

Unstructured data stream processing falls more under the category of CEP (complex
event processing). Eventually we will see NewSQL systems start providing support for
pre-ingestion analytics rather than only the current, traditional post-ingestion
analytics. Currently, you will have to rely on a CEP component to provide event
detection on streaming data, while NewSQL acts as the data sink for this streaming
data. NewSQL can also help in rapid event generation by running analytical queries
much faster than a traditional RDBMS.
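
The division of labour described above (event detection on the stream, with a SQL
store as the sink) can be sketched as follows. This is only an illustration under
stated assumptions: an in-memory SQLite database stands in for the NewSQL sink, a
simple moving-average threshold stands in for a CEP rule, and the table names,
function name and parameters are all hypothetical.

```python
import sqlite3
from collections import deque


def detect_and_store(readings, window=3, threshold=100.0):
    """Store every reading; record an event when the moving average of the
    last `window` readings exceeds `threshold`."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE readings (ts INTEGER, value REAL)")
    db.execute("CREATE TABLE events (ts INTEGER, avg_value REAL)")

    recent = deque(maxlen=window)
    for ts, value in readings:
        recent.append(value)
        # Detection runs on the stream itself, before any query is issued.
        if len(recent) == window:
            avg = sum(recent) / window
            if avg > threshold:
                db.execute("INSERT INTO events VALUES (?, ?)", (ts, avg))
        # The SQL store acts as the sink for the raw stream.
        db.execute("INSERT INTO readings VALUES (?, ?)", (ts, value))

    db.commit()
    return db
```

Ad hoc user tables could then be joined against `events` and `readings` with
ordinary SQL, which is the "marrying" of data the question asks about.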

Q7. Can you compare Neo4j with your recommended graph database?

This is already answered as part of Q2.

Q8. What is your take on MongoDB?

RDBMS is still the most commonly used data store for applications built today. But
the flexibility offered by MongoDB provides advantages with respect to development
speed and overall application performance in many use cases. Like any other
document store, instead of storing data in tables with rows and columns,
MongoDB encapsulates data in loosely defined documents.




There are a lot of document-oriented stores, and the underlying implementation
varies from one data store to another. Some represent a document as XML and
some use JSON. The general rule is that documents are not rigidly defined, and you
can expect a high degree of flexibility when defining data.
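
To show what "not rigidly defined" means in practice, here is a small Python sketch
that uses plain dictionaries and JSON rather than any database driver. The
collection contents and the `find` helper are hypothetical and exist only for this
example.

```python
import json

# Two documents in the same hypothetical "customers" collection.
# They do not share the same set of fields, and one nests a sub-document.
customers = [
    {"name": "Asha", "email": "asha@example.com"},
    {
        "name": "Ravi",
        "phones": ["555-0100", "555-0101"],
        "address": {"city": "Indore"},
    },
]


def find(docs, **criteria):
    """Return documents whose top-level fields match all given criteria."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]


# A document serializes to JSON as-is, with no table schema involved.
as_json = json.dumps(customers[1])
```

In a relational design, the second record would typically need extra tables for the
phone numbers and the address; in a document store, it is kept together as one unit.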

MongoDB is one of the most popular document stores. It is open source, schema-free
and written in C++, and it supports a wide array of programming languages along
with a rich query language that covers many SQL-like operations.

It is a relatively new technology and has a few challenges as well, but with
attractive pricing and relative ease of use, it is definitely becoming a choice for
various small and large companies.

Q9. You didn't mention Neo4j in your graph databases you recommend. Any
particular reason Neo4j wasn't included?

No, there is no particular reason. What’s important here is to understand the
difference between these technologies and where each one fits. If you have an OLAP
and data analytics scenario, Pregel-style systems such as the Hadoop-based Giraph
will be a better fit. If you have an OLTP setup where you want to store and query
connected data for online transaction processing, Neo4j comes into the picture.

For an excellent read on this topic, see:
http://jim.webber.name/2011/08/24/66f1fb4b-83c3-4f52-af40-ee6382ad2155.aspx

Q10. What is the limiting factor in analyzing all data on a real-time basis? Is it
processing power, storage systems, DB systems or something else?

There are challenges in each of the areas you raised, such as storing and processing.
When you process data, it usually has to be loaded into main memory, which
is still expensive. The machines have to be powerful enough to get you the results
fast. Hence, both processing and storage systems are the main bottlenecks.

Also, there is a paradigm shift in the way programming is done. In order to
process the data efficiently, we need to come up with parallel algorithms that are
able to work on this data and utilize the processing power of the machines.



So, to summarize, the three points that I consider limiting factors are memory,
processing and the right set of algorithms.
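
The "right set of algorithms" point usually comes down to restructuring a
computation as partition, process in parallel, then combine. The Python sketch below
shows that shape on a toy problem. The function names are hypothetical, and a thread
pool is used only to keep the example self-contained; for CPU-bound work in Python, a
process pool or a cluster framework would be the realistic choice.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(data, n_chunks):
    """Split a list into at most n_chunks contiguous pieces."""
    size = max(1, -(-len(data) // n_chunks))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def sum_of_squares(chunk):
    """Work done independently on each partition."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, workers=4):
    """Partition the data, process partitions concurrently, combine results."""
    chunks = chunked(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(sum_of_squares, chunks)
        return sum(partials)
```

The algorithm works because the per-partition results can be combined with a simple
sum. Computations that lack such a combine step are the ones that need rethinking
before more hardware will help.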

Q11. What do you recommend for an in-database but very scalable alternative to
SAS for doing advanced math on large datasets?

Assuming that the reference here is to the SAS language, R scripting can be a good
alternative for working with large datasets, as it has good integration with Hadoop
and can scale well using a MapReduce programming interface over R scripts. Revolution
Analytics offers a commercial R distribution with Hadoop integration.

There are other, non-Hadoop options as well, such as Greenplum or Aster, which
have support for specialized advanced math libraries.

Also, SAS now provides integration with Hadoop, which means that you can reuse
some of your SAS programming investments and use Hadoop as the underlying
scalable processing engine for some of the analytical execution.

Q12. Are there any NewSQL platforms that have mastered the functionality around
Workload Management? For instance, without workload management, the high
resource, intense transactions can get in the way of traditional reporting needs...
in other words, is there a NewSQL environment that can be used for traditional and
advanced analysis on the same platform?

NewSQL platforms are certainly evolving as we speak, with many more being built in
stealth mode. We are not aware of any advanced workload management
functionality being provided with any NewSQL platform for now, but that may
change any day.

However, most NewSQL platforms have been designed to work efficiently with either
an OLTP environment or an OLAP environment, rather than both at once.

Q14. Is MongoDB a better solution for any of the scenarios discussed?

MongoDB can be a good option in some use cases for OLTP systems, or for the
transactional system we discussed.



Q15. Do you have recommendations for an indexing solution?

Depending on the data size, you can go for Solr or Elasticsearch as options for
indexing. There are commercial solutions as well, but Solr, with its new scalable
version SolrCloud, can compete with any commercial solution.

            Write to us at bigdata@impetus.com for more information




 
Real-time Streaming Analytics for Enterprises based on Apache Storm - Impetus...
Real-time Streaming Analytics for Enterprises based on Apache Storm - Impetus...Real-time Streaming Analytics for Enterprises based on Apache Storm - Impetus...
Real-time Streaming Analytics for Enterprises based on Apache Storm - Impetus...
 
Accelerating Hadoop Solution Lifecycle and Improving ROI- Impetus On-demand W...
Accelerating Hadoop Solution Lifecycle and Improving ROI- Impetus On-demand W...Accelerating Hadoop Solution Lifecycle and Improving ROI- Impetus On-demand W...
Accelerating Hadoop Solution Lifecycle and Improving ROI- Impetus On-demand W...
 
Deep Learning: Evolution of ML from Statistical to Brain-like Computing- Data...
Deep Learning: Evolution of ML from Statistical to Brain-like Computing- Data...Deep Learning: Evolution of ML from Statistical to Brain-like Computing- Data...
Deep Learning: Evolution of ML from Statistical to Brain-like Computing- Data...
 
SPARK USE CASE- Distributed Reinforcement Learning for Electricity Market Bi...
SPARK USE CASE-  Distributed Reinforcement Learning for Electricity Market Bi...SPARK USE CASE-  Distributed Reinforcement Learning for Electricity Market Bi...
SPARK USE CASE- Distributed Reinforcement Learning for Electricity Market Bi...
 
Enterprise Ready Android and Manageability- Impetus Webcast
Enterprise Ready Android and Manageability- Impetus WebcastEnterprise Ready Android and Manageability- Impetus Webcast
Enterprise Ready Android and Manageability- Impetus Webcast
 
Real-time Streaming Analytics: Business Value, Use Cases and Architectural Co...
Real-time Streaming Analytics: Business Value, Use Cases and Architectural Co...Real-time Streaming Analytics: Business Value, Use Cases and Architectural Co...
Real-time Streaming Analytics: Business Value, Use Cases and Architectural Co...
 
Leveraging NoSQL Database Technology to Implement Real-time Data Architecture...
Leveraging NoSQL Database Technology to Implement Real-time Data Architecture...Leveraging NoSQL Database Technology to Implement Real-time Data Architecture...
Leveraging NoSQL Database Technology to Implement Real-time Data Architecture...
 
Maturity of Mobile Test Automation: Approaches and Future Trends- Impetus Web...
Maturity of Mobile Test Automation: Approaches and Future Trends- Impetus Web...Maturity of Mobile Test Automation: Approaches and Future Trends- Impetus Web...
Maturity of Mobile Test Automation: Approaches and Future Trends- Impetus Web...
 
Big Data Analytics with Storm, Spark and GraphLab
Big Data Analytics with Storm, Spark and GraphLabBig Data Analytics with Storm, Spark and GraphLab
Big Data Analytics with Storm, Spark and GraphLab
 
Webinar maturity of mobile test automation- approaches and future trends
Webinar  maturity of mobile test automation- approaches and future trendsWebinar  maturity of mobile test automation- approaches and future trends
Webinar maturity of mobile test automation- approaches and future trends
 
Next generation analytics with yarn, spark and graph lab
Next generation analytics with yarn, spark and graph labNext generation analytics with yarn, spark and graph lab
Next generation analytics with yarn, spark and graph lab
 
The Shared Elephant - Hadoop as a Shared Service for Multiple Departments – I...
The Shared Elephant - Hadoop as a Shared Service for Multiple Departments – I...The Shared Elephant - Hadoop as a Shared Service for Multiple Departments – I...
The Shared Elephant - Hadoop as a Shared Service for Multiple Departments – I...
 
Performance Testing of Big Data Applications - Impetus Webcast
Performance Testing of Big Data Applications - Impetus WebcastPerformance Testing of Big Data Applications - Impetus Webcast
Performance Testing of Big Data Applications - Impetus Webcast
 

Último

Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 

Último (20)

Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 

Big Data Architectures Webinar Q&A - Storm vs ESBs, Giraph vs Graph DBs

Webinar: Big Data Architectures – Beyond the Elephant Ride
June 29, 2012
Question and Answer Session

Q1. What are the differences between Storm and ESBs like Mule?

Storm and ESBs (like Mule) solve very different problems and are not directly comparable.

The motivation behind an ESB is to standardize and structure loosely coupled software components so that they can be independently deployed and run in disparate environments. Communication is through message passing, and the ESB lets heterogeneous components interact with each other.

Storm is for processing large volumes of data in real time. With Storm, we do not attempt to establish a common structure for different components to collaborate. Rather, Storm enables a huge amount of data to be processed through a chain of processing units.

So when you have large amounts of data that you want to process in real time, we advise you to use Storm. On the other hand, when you have numerous components and want to write a layer that enables their interaction, use an ESB.

In fact, Storm and an ESB can in theory be integrated so that Storm handles the streaming analytics while the ESB caters to service orchestration and integration.

Q2. What is the advantage of Giraph and Pregel over more common graph DBs like Neo4j or InfiniteGraph?

Giraph is an open-source implementation of Pregel meant for large datasets. It provides a large-scale graph processing infrastructure over Hadoop. Some of the advantages I'd like to highlight include:

   1. Distributed and developed specifically for large-scale graph processing
   2. Bulk Synchronous Parallel (BSP) as the execution model
   3. Fault tolerance through checkpointing
   4. Runs on standard Hadoop infrastructure
   5. Computation is executed in memory
   6. Can be part of a pipeline in the form of a job
   7. Vertex-centric API

Please see the answer to Q9 as well.

Q3. What do you recommend for reporting on top of NoSQL databases?

NoSQL technologies are relatively new and still evolving. Furthermore, there are many of them, and it is unlikely that a single tool will work with all of them. If you can share the exact NoSQL technology you are using or planning to use, we will be able to suggest the right tool.

There are very few reporting tools, like Intellicus and Jasper, that work on HBase, and I guess they are still watching the market to see which direction it takes. I strongly believe you should see some exciting features in these tools in the next 6 to 12 months.

Q4. What are the differences between Cassandra and Riak, and why would you choose one over the other?

Cassandra and Riak are popular NoSQL solutions, each best suited to different kinds of use cases. So the choice of one over the other depends entirely on the business use case we are trying to solve.

Strengths of Riak over Cassandra
- Adding nodes to a Riak cluster is very easy
- The data model doesn't need to be set up in advance
- You can access it through REST or the Protocol Buffers API
- Commercial support is available from Basho
Strengths of Cassandra over Riak
- Cassandra is still more popular because of the bigger community using it
- You can access it using Cassandra CQL, a SQL-like language
- Scales to petabytes and supports a columnar structure
- Enterprise features like rack awareness are free, which is helpful in large deployments
- Commercial product support is available from DataStax
- Implementation support is available from third-party commercial service providers like Impetus (http://wiki.apache.org/cassandra/ThirdPartySupport)

Q5. We planned for a SAN deployment as our storage solution. I have read that MPP database solutions are optimal on a shared-nothing architecture with DAS rather than on SAN. Can you please comment on MPP databases on SAN vs DAS?

Typically, a SAN can offer higher throughput than DAS but can also have higher latency under lighter loads. Also, a SAN's available throughput is shared across all connected nodes. In an MPP data warehousing scenario, multiple nodes connect to the SAN and therefore share a common bandwidth. Another point to note is that most queries served by MPP systems involve a large number of scattered reads across multiple nodes, pushing bandwidth utilization on the SAN to its limits. However, with a large cache, high-speed HBAs and high-speed disks (15K RPM) in the SAN, it should be able to serve a 10-15 node MPP cluster.

On the other hand, DAS can also provide very good throughput and does not have to share its bandwidth across multiple nodes. The bandwidth can be further improved by using multiple SATA adapters and high-speed disks (10K - 15K RPM). DAS will probably offer better performance on a cluster with a very high number of nodes.

To summarize, there is no clear winner, and the choice of SAN vs DAS depends on factors like load, the underlying technology in the storage system, cache, number of nodes, etc.
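The bandwidth-sharing argument above can be made concrete with a little arithmetic. The sketch below is only an illustration: the function names and the throughput figures (6000 MB/s for the SAN, 400 MB/s per DAS node) are assumptions picked for the example, not measurements, and it ignores caching, latency and uneven load.

```python
import math


def san_per_node_mbps(san_total_mbps: float, nodes: int) -> float:
    """Even share of a SAN's aggregate throughput when every node reads at once."""
    if nodes < 1:
        raise ValueError("nodes must be >= 1")
    return san_total_mbps / nodes


def crossover_nodes(san_total_mbps: float, das_per_node_mbps: float) -> int:
    """Smallest cluster size at which the per-node SAN share falls below DAS."""
    # The share drops below DAS once nodes > san_total / das_per_node.
    return math.floor(san_total_mbps / das_per_node_mbps) + 1


# Hypothetical figures, chosen only to illustrate the trade-off.
SAN_TOTAL_MBPS = 6000.0
DAS_PER_NODE_MBPS = 400.0

for n in (5, 15, 30):
    share = san_per_node_mbps(SAN_TOTAL_MBPS, n)
    print(f"{n:>2} nodes: SAN share {share:.0f} MB/s vs DAS {DAS_PER_NODE_MBPS:.0f} MB/s")
```

With these assumed numbers the SAN share matches DAS at 15 nodes and falls behind from 16 nodes onward, which is the shape of the trade-off described above; real capacity planning would replace both constants with measured values.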
Both high-end fibre-based storage technology and newer SATA-based storage technology (e.g. SATA-3) can offer similar bandwidth. We suggest that a careful study and capacity planning be conducted on the underlying storage system before deciding on the storage solution.

Q6. What architecture components would satisfy the desire to have an integrated NewSQL environment and be able to marry that data with both ad hoc defined user tables and events detected during unstructured data stream processing?

NewSQL and NoSQL databases/datastores excel in areas where traditional RDBMS systems have limitations. In many scenarios, NoSQL/NewSQL databases can offer significant improvement over an RDBMS. Some cases are:

   1. Very high availability on high-traffic data
   2. Storing CLOB/text data that holds denormalized/unstructured data
   3. Journal data
   4. Performance and scalability

Unstructured data stream processing falls more under the category of CEP (complex event processing), and eventually we will see NewSQL systems start supporting pre-ingestion analytics rather than only the current, traditional post-ingestion analytics. Currently, you will have to rely on a CEP component to detect events on streaming data while NewSQL acts as the data sink for that stream. NewSQL can also help in rapid event generation by running analytical queries much faster than a traditional RDBMS.

Q7. Can you compare Neo4j with your recommended graph database?

Already answered as part of Q2.

Q8. What is your take on MongoDB?

RDBMS is still the most commonly used data store for applications built today. But the flexibility offered by MongoDB provides advantages in development speed and overall application performance in many use cases. Like any other document store, instead of storing data in tables with rows and columns, MongoDB encapsulates data in loosely defined documents.
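To illustrate what "loosely defined documents" means in contrast to rows and columns, here is a minimal sketch that uses plain Python dictionaries and the standard json module rather than any MongoDB driver. The `users` records and the `as_rows` helper are hypothetical and exist only for this example.

```python
import json

# Two hypothetical "user" documents in one collection; they share no fixed schema.
users = [
    {"name": "Asha", "email": "asha@example.com"},
    {"name": "Ben", "phones": ["555-0100", "555-0101"], "address": {"city": "Pune"}},
]


def as_rows(documents):
    """Force documents into one fixed-column table, as a relational schema would."""
    columns = sorted({key for doc in documents for key in doc})
    rows = [tuple(doc.get(col) for col in columns) for doc in documents]
    return columns, rows


columns, rows = as_rows(users)
print(columns)            # every field seen in any document becomes a column
for row in rows:
    print(row)            # fields a document lacks show up as None

# Each document serializes on its own, nested values included.
print(json.dumps(users[1]))
```

The tabular form needs a column for every field that any record might carry and fills the gaps with empty values, while each document carries only its own fields and can nest lists or sub-documents directly.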
There are many document-oriented stores, and the underlying implementation varies between them. Some represent a document as XML and some use JSON. The general rule is that documents are not rigidly defined, and you can expect a high degree of flexibility when defining data.

MongoDB is one of the most popular document stores. It is open source, schema-free and written in C++, and it offers support for a wide array of programming languages along with a SQL-like query language. It is a relatively new technology and has a few challenges as well, but with attractive pricing and relative ease of use, it is definitely becoming a choice for companies both small and large.

Q9. You didn't mention Neo4j among the graph databases you recommend. Any particular reason Neo4j wasn't included?

No, there is no particular reason. What's important is to understand the difference between these technologies and where each fits. If you have an OLAP and data analytics scenario, Pregel-style systems such as Hadoop-based Giraph will be a better fit. If you have an OLTP setup where you want to store and query connected data for online transaction processing, Neo4j comes into the picture.

An excellent read on this is available here: http://jim.webber.name/2011/08/24/66f1fb4b-83c3-4f52-af40-ee6382ad2155.aspx

Q10. What is the limiting factor in analyzing all data on a real-time basis? Is it processing power, storage systems, DB systems or something else?

There are challenges in each of the areas you raised, such as storing and processing. When you process data, it usually has to be loaded into main memory, which is still expensive. The machines have to be powerful enough to get you results fast. Hence, both processing and storage systems are the main bottlenecks. Also, there is a paradigm shift in the way programming is done. To process the data efficiently, we need parallel algorithms that can work on this data and utilize the processing power of the machines.
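As a small illustration of what a parallel algorithm looks like structurally, the sketch below splits the input into chunks, processes the chunks independently and then combines the partial results. The word-count task and the function names are assumptions made for the example. It uses a thread pool from the Python standard library only to show the shape of the computation; for CPU-bound work one would typically swap in a process pool or a cluster framework such as Hadoop.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_words(lines):
    """Map step: count the words in one chunk of lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def parallel_word_count(lines, workers=4):
    """Split the input, count each chunk independently, then merge the partial counts."""
    chunk = max(1, -(-len(lines) // workers))  # ceiling division, at least one line per chunk
    parts = [lines[i:i + chunk] for i in range(0, len(lines), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_words, parts))
    total = Counter()
    for partial in partials:  # combine step: partial counts simply add up
        total.update(partial)
    return total


sample = ["storm spark storm", "hadoop storm", "spark"]
print(parallel_word_count(sample, workers=2))
```

The design point is that the chunks share no state, so the map step can run on as many workers as are available, and the combine step is cheap because partial counts are additive.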
So, to summarize, the three limiting factors I see are memory, processing power and the right set of algorithms.

Q11. What do you recommend as an in-database but very scalable alternative to SAS for doing advanced math on large datasets?

Assuming that the reference here is to the SAS language, R scripting can be a good alternative for working with large datasets, as it has good integration with Hadoop and can scale well by running R scripts through a MapReduce programming interface. Revolution Analytics offers a commercial product for R over Hadoop. There are non-Hadoop options as well, such as Greenplum or Aster, which support specialized advanced math libraries.

Also, SAS now provides integration with Hadoop, which means you can reuse some of your SAS programming investment and use Hadoop as the underlying scalable processing engine for some of the analytical execution.

Q12. Are there any NewSQL platforms that have mastered the functionality around workload management? For instance, without workload management, high-resource, intense transactions can get in the way of traditional reporting needs. In other words, is there a NewSQL environment that can be used for traditional and advanced analysis on the same platform?

NewSQL platforms are evolving every day, with many more being built in stealth mode. We are not aware of any NewSQL platform that currently provides advanced workload management functionality, but that may change any day. However, most NewSQL platforms have been designed to work efficiently with either an OLTP environment or an OLAP environment, rather than both at once.

Q14. Is MongoDB a better solution for any of the scenarios discussed?

MongoDB can be a good option in some OLTP use cases, or for the transactional system we discussed.
Q15. Do you have recommendations for an indexing solution?

Depending on the data size, you can go for Solr and Elastic Solr as options for indexing. There are commercial solutions as well, but Solr, with its new scalable version SolrCloud, can compete with any commercial solution.

Write to us at bigdata@impetus.com for more information.

© 2012 Impetus Technologies