This PowerPoint presentation compares the analytics tools available for business analytics and describes each of them in detail.
2. SAS Enterprise Miner
• A GUI (Graphical User Interface) based analytics
software
• Versatile, powerful & user friendly
• Handles large amounts of data easily
• Easy to learn
• Popularity of SAS E-Miner doesn’t match its
capability due to its high price tag
• One of the most expensive software packages, which
very few companies can afford
3. SAS Enterprise Miner
FEATURES
• Data Preparation, Summarization & Exploration
i. Access & integrate structured and unstructured
data sources
ii. Outlier filtering
iii. Data Partitioning
iv. Integration with R
v. Merge & Append Tools
vi. Univariate & Bivariate statistics and plots
vii. Interactive Variable Binning
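Two of the data preparation steps listed above, outlier filtering and data partitioning, can be sketched in a few lines of Python. This is only an illustration, not how SAS Enterprise Miner implements them: the 1.5 × IQR rule, the 70/30 split, the function names and the sample values are all assumptions made for the example.

```python
import random
import statistics


def filter_outliers(values, k=1.5):
    """Keep values within k * IQR of the quartiles (an assumed rule, Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


def partition(rows, train_fraction=0.7, seed=42):
    """Shuffle rows reproducibly, then split them into training and validation sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical sample: nine ordinary values and one obvious outlier.
data = [10, 12, 11, 13, 12, 11, 10, 14, 13, 250]
clean = filter_outliers(data)
train, valid = partition(clean)
print(len(clean), len(train), len(valid))
```

Filtering before partitioning keeps the extreme value out of both the training and the validation set.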
4. SAS Enterprise Miner
• Advanced Predictive & Descriptive Modeling
i. Clustering & Self Organizing Maps
ii. Market Basket Analysis
iii. Dimension Reduction Techniques
iv. Linear & Logistic Regression
v. Decision Trees
vi. Neural Networks
vii. Time Series Data Mining
viii. Survival Analysis
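As an illustration of one technique from this list, Market Basket Analysis looks for items that are bought together. The sketch below computes two standard measures, support and confidence, on a made-up set of baskets. The data and the function names are hypothetical, and this is not the algorithm SAS Enterprise Miner itself uses.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each basket is the set of items in one purchase.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]


def pair_support(transactions):
    """Fraction of baskets that contain each pair of items."""
    counts = Counter()
    for basket in transactions:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    total = len(transactions)
    return {pair: count / total for pair, count in counts.items()}


def confidence(transactions, antecedent, consequent):
    """Of the baskets containing the antecedent, the share that also contain the consequent."""
    with_antecedent = [b for b in transactions if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)


support = pair_support(baskets)
print(support[("bread", "milk")])            # bread and milk appear together in 2 of 4 baskets
print(confidence(baskets, "bread", "milk"))  # 2 of the 3 bread baskets also contain milk
```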
5. BASE SAS +
• An analytics software preloaded with functions to
perform statistical analysis
• Not user friendly, involves coding
• Cheaper than SAS Enterprise Miner but still
expensive
• SAS+ comprises BASE SAS, SAS/STAT & SAS/ACCESS
Interface to ODBC
6. WPS
• An analytics software which reads, understands
and executes the language of SAS
• Comes loaded with built in functions & procedures
to perform statistical analysis
• Inspired by BASE SAS
• Interface similar to BASE SAS
• Cheaper than BASE SAS +
• Version 3 (WPS) offers the WPS Workbench User
Interface to connect to & run programs in server,
cluster and cloud environments
7. IBM’s SPSS
• IBM’s SPSS is an equivalent of BASE SAS
• Popular Software in Market Research
• Can handle small to mid-size data sets
8. IBM’s SPSS Modeler/Clementine
• IBM’s SPSS Modeler is an equivalent of SAS
Enterprise Miner
• SPSS Modeler offers features such as:
i. Accessing Data
ii. Data Exploration
iii. Summarization & Preparation
iv. Predictive Modeling Techniques such as
regression, clustering, decision trees, neural
networks, self organizing maps etc.
9. R
• World’s most popular open source analytics tool
• Evolved from a language called S, which was also
turned into a GUI based product called S+
• R offers more than 3000 packages
• An R package is a collection of functions which enable:
i. Computations in descriptive statistics
ii. Data Manipulation
iii. Regression Analysis
iv. Advanced Visualization
10. R
• R 3.4.3 (Latest Version)
• Developed by practitioners themselves
• Involves coding in the R language; interface is similar to
BASE SAS
• R language is concise and elegant, uses pre-developed
packages
• Not easy to learn, steep learning curve
• Performs highly complex statistical analyses quickly
11. R
• Excellent statistical & visualization capability
• Faces problems in handling large data sets
• Integrating R with HADOOP allows it to handle large data
sets
• Adoption of R has increased due to use by Facebook,
Google, Bing, Mozilla etc.
• Graphs can be created with several layers, scales,
coordinate systems, smoothing curves
12. Apache HADOOP
• Open source data management software
• Helps companies in analyzing massive data volumes
(Structured & Unstructured)
• Used by eBay, Yahoo, Facebook
• One of the most desired technical skills in the
industry
13. MICROSTRATEGY
• Business Intelligence product with limited analytics
capability
• Easy to learn tool
• Excellent Visualization
• Advanced Integration Capability with R & HADOOP
14. STATISTICA
• Statistics & Analytics software package developed
by STATSOFT
• Features are data analysis, data management,
statistics, data mining, data visualization etc
• GUI based product similar to SAS Enterprise Miner
• Procedure involves :
i. Loading table of data
ii. Applying statistical functions from drop down
menus
• User friendly, Easy to learn
• Advanced analytics capabilities (with large data)
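The two-step procedure above (load a table of data, then apply statistical functions) can be mimicked in code for comparison. The sketch below uses only Python's standard library; the table contents and column names are made up for illustration, and nothing here reflects STATISTICA's own interface.

```python
import csv
import io
import statistics

# Step i: load a table of data (a hypothetical in-memory CSV stands in for a file).
raw_table = """region,sales
North,120
South,95
East,130
West,105
"""
rows = list(csv.DictReader(io.StringIO(raw_table)))

# Step ii: apply statistical functions to one column of the table.
sales = [float(row["sales"]) for row in rows]
summary = {
    "mean": statistics.mean(sales),
    "median": statistics.median(sales),
    "stdev": statistics.stdev(sales),
}
print(summary)
```

A GUI based product hides these steps behind drop down menus, which is what makes it easier to learn than a coding based tool.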
15. KXEN
• KXEN = Knowledge Extraction Engines
• Automated Analytics
• Reduces work of analysts
• Products are based on algorithms developed by
Russian Mathematician Vladimir Vapnik
• Easy to use, easy to learn, fast & can handle large
data sets
• Can produce a large number of models quickly
• Works like a Black Box
16. KXEN
• KXEN Software Packages offer:
i. Data Manipulation
ii. Classification
iii. Regression
iv. Clustering
v. Variable Importance
vi. Segmentation
vii. Time Series
viii. Association Rules
ix. Data Fusion
17. TABLEAU
• GUI based data visualization product similar to
MICROSTRATEGY, focused on Business Intelligence
• Drag & Drop feature offered to analyze data
• Visualizes data & creates interactive dashboards
• Easy to learn
• Gives a good understanding of data
• Not capable of Predictive Analytics
18. Comparison of Analytics Tools
• Measures used to compare the popularity of
Analytics tools are:
i. Level of Activity on E-mails or Discussion
Lists devoted to these tools
ii. Number of Users (Data Analytics
Competitions)
iii. Languages used in Data Mining or Analysis
20. Based on Level of Activity on E-Mails/Discussion Lists
• R & SAS Tools are most popular
• R has dominated in the last few years
• R shows a decline in 2011 due to:
i. Migration to other forums
ii. Emergence of easy to use User Interfaces for R such
as R Commander, Deducer (a GUI for R) & Rattle (a
GUI for Data Mining using R)
22. Based on Number of Users (Data Analytics Competitions)
• “Kaggle.com” hosts data analysis contests
• Companies post Data Analytics Problems with certain
prize money
• R is the most preferred language in these competitions
and also outside them
• 50% of the contest winners were found to be using R
• Other tools are often restricted by licenses etc.,
so R is naturally preferred
25. Based on Languages & Software used in Data Mining or Analytics
• R is the leader among the programming languages used,
followed by Python (2015)
• R is the leader among the software used, followed by
RapidMiner (2015)
• The usage of these languages and software has a
direct relationship with the number of jobs which
have a programming language as their requirement
Most analytics tools involve coding, so choosing one is a trade-off between user friendliness and scalability. GUI based analytics products work well with limited data but become unviable for large data; SAS Enterprise Miner is an exception and can handle large amounts of data.
Results coming out of these models still need to be interpreted and insights derived by an analyst who understands the business.
Works like a Black Box: if one needs to explain the algorithm or methodology to an analyst or end user, it cannot be explained because it is kept secret. It only gives results and doesn’t offer the explanation behind those inferences or conclusions.