This webinar gives an overview of the Pentaho technology stack and then looks in depth at its features: ETL, Reporting, Dashboards, Analytics and Big Data. It also offers a cross-industry perspective on how Pentaho can be leveraged effectively for decision making. It closes by highlighting how, apart from strong technological features, low TCO is central to Pentaho’s value proposition. For BI technology enthusiasts, this webinar presents an easy way to learn an end-to-end analytics tool. For those interested in developing a BI / Analytics toolset for their organization, it presents the option of leveraging low-cost technology. For big data enthusiasts, it presents an overview of how Pentaho has emerged as a leader in the data integration space for Big Data.
Pentaho is one of the leading niche players in Business Intelligence and Big Data Analytics. It offers a comprehensive, end-to-end open source platform for Data Integration and Business Analytics. Pentaho’s leading product, Pentaho Business Analytics, is a data integration, BI and analytics platform composed of ETL, OLAP, reporting, interactive dashboards, ad hoc analysis, data mining and predictive analytics.
Business Intelligence and Big Data Analytics with Pentaho
1. Welcome to the webinar on
Business Intelligence and Big Data Analytics
with Pentaho
Presented by www.compulinkacademy.com & www.ellicium.com
2. Contents
1 An Introduction to Pentaho
2 Overview of Pentaho technology stack
3 Pentaho ETL
4 Data Exploration using Pentaho
5 Big Data with Pentaho
6 Getting started with Pentaho
3. Welcome to Open source world
Open-source software is computer software with its source
code made available and licensed with a license in which
the copyright holder provides the rights to study, change
and distribute the software to anyone and for any purpose.
Open-source software is very often developed in a public,
collaborative manner.
Reporting
• Actuate BIRT
• Jasper Reports
• Pentaho
• Open Reports
Analysis
• JPivot
• Mondrian / Pentaho
• PALO
ETL Tools
• Clover ETL
• Enhydra Octopus
• Talend
• Kettle / Pentaho
BI Platforms
• Jasper
• Pentaho
• SpagoBI
Data Mining / Statistics
• Weka / Pentaho
• R
Databases
• Derby
• Ingres
• MySQL
• PostgreSQL
You already use it!!!
• Napster
• Amazon reviews
• YouTube
• Linux
What it means for BI and analytics
A report by the Standish Group states that adoption of open-source software models has resulted
in savings of about $60 billion per year to consumers.
4. Welcome to Pentaho!!!!
•Commercial open source alternative for business intelligence (BI)
•Founded in 2004 by five founders
•Management - proven BI and open source veterans from Business Objects,
Cognos, Hyperion, JBoss, Oracle, Red Hat, SAS
•Pioneer in commercial open source BI
•Large referenceable customer base, wide range of BI/DW deployments
•It offers a suite of open source Business Intelligence (BI) products called
Pentaho Business Analytics providing data integration, OLAP services,
reporting, dashboarding, data mining and ETL capabilities
6. What analysts are saying about Pentaho
Pentaho is the only open source company featured in the Ovum Decision Matrix
for Business Intelligence. "Pentaho is one of the few vendors that provide a direct
integration into Hadoop and NoSQL databases, allowing users to analyse and visualize
NoSQL data alongside traditional data sources"
Forrester recognized Pentaho as the sole "Strong Performer". "Pentaho provides an
impressive Hadoop data integration tool." Pentaho was cited for its rich functionality
and extensive integration with Apache Hadoop, and for providing certified integration
with distributions from Cloudera, EMC Greenplum and Hortonworks.
Passionned's Business Intelligence Tools Survey highlighted the completeness of the
Pentaho product suite compared to other vendors, as well as Pentaho's significant
cost-saving by pricing products per deployment, not per-user. Pentaho earned
recommendation as a complete enterprise solution.
Pentaho was included in Gartner's Magic Quadrant for Business Intelligence Platforms.
The report, published, offers the analyst firm's insights on business intelligence
vendors who meet an inclusion threshold based on annual sales, capabilities, and
customer survey responses.
7. Pentaho Licensing
The current version of the Pentaho BI Platform will be distributed under
the terms of the GNU General Public License (GPL).
Under the GPL, if you intend to distribute GPL-licensed code to your
customers as part of other software you have created, you may, depending
on the software you have created, be required to release that code under the GPL.
Companies that wish to distribute the Pentaho BI Platform have the option
of purchasing a commercial license from Pentaho Corporation. A
commercial license would exempt you from GPL obligations.
The GNU General Public License (GPL) is the most widely used free
software license, which guarantees end users the freedoms to use, study,
share and modify the software. Derived works can only be distributed
under the same license terms.
11. Delivering Value in Different Deployment Models
Coexistence with traditional proprietary BI
•Minimize risk/exposure with consolidated vendors
•Prove technology and services internally
•Explore the relationship benefits of a transparent model without
software lock-in
Co-deployment with traditional proprietary BI
•Leverage existing investments
•Pragmatically “use what works”
•Reduce overall TCO by incorporating commercial open source
Replacement of traditional proprietary BI
•Upgrade BI capabilities
•Reduce TCO
•Capitalize on the opportunity of a “disruption” (software upgrade,
license change, etc.) in your BI environment
13. Pentaho Kettle ETL
•Pentaho Data Integration (PDI, also called Kettle) is the component of Pentaho responsible
for the Extract, Transform and Load (ETL) processes. Though ETL tools are most frequently
used in data warehouse environments, PDI can also be used for other purposes:
  •Migrating data between applications or databases
  •Exporting data from databases to flat files
  •Bulk-loading data into databases
  •Data cleansing
  •Integrating applications
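The extract, transform and load steps listed above can be sketched in a few lines of plain Python. This is only an illustration of the ETL pattern (here: export from a database to a flat file, with some data cleansing on the way), not how PDI itself is implemented or used. The `sales` table, its columns and the sample rows are hypothetical, and an in-memory SQLite database stands in for a real source system.

```python
import csv
import io
import sqlite3


def extract(conn):
    """Extract: read raw rows from the (hypothetical) source table."""
    return conn.execute("SELECT id, name, amount FROM sales ORDER BY id").fetchall()


def transform(rows):
    """Transform: tidy up names and drop rows that have no amount."""
    cleaned = []
    for row_id, name, amount in rows:
        if amount is None:
            continue  # data cleansing: skip incomplete records
        cleaned.append((row_id, name.strip().title(), round(amount, 2)))
    return cleaned


def load(rows, out):
    """Load: write the cleaned rows to a flat (CSV) file-like object."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", "name", "amount"])
    writer.writerows(rows)


def build_demo_source():
    """Create an in-memory database with made-up sample data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (id INTEGER, name TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [(1, "  alice ", 10.5), (2, "BOB", None), (3, "carol", 7.25)],
    )
    return conn


def run_demo():
    """Run extract -> transform -> load and return the CSV text."""
    conn = build_demo_source()
    out = io.StringIO()
    load(transform(extract(conn)), out)
    conn.close()
    return out.getvalue()


if __name__ == "__main__":
    print(run_demo(), end="")
```

In a graphical ETL tool each of these three functions would roughly correspond to a step on the canvas (a table input, a cleansing step, a text file output) connected by hops rather than function calls.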
15. Pentaho Kettle ETL
Spoon
•GUI that allows you to design transformations and jobs
•Transformations and jobs can be saved as XML files or
stored in a Kettle database repository.
•Spoon is available as both a shell script and a batch file, so the tool
can be used in heterogeneous environments.
Pan
•A program that executes transformations designed in Spoon, read from XML files or from a database
repository.
•Transformations are usually scheduled in batch mode to run automatically at regular
intervals
Carte
•Simple web server to execute transformations and jobs remotely.
•Accepts XML that contains the transformation to execute and the execution
configuration.
•Lets you remotely monitor, start and stop transformations and jobs
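As a rough sketch of how a batch scheduler might invoke Pan on a transformation file, the Python helper below assembles a command line and runs it. The `-file`, `-level` and `-param:` option spellings follow the Linux form of the Pan command line as commonly documented for PDI, but they should be checked against the version you install. The script location `./pan.sh`, the `.ktr` path and the `RUN_DATE` parameter are assumptions made for illustration only.

```python
import subprocess


def build_pan_command(ktr_path, pan_script="./pan.sh", log_level="Basic", params=None):
    """Return the argument list for running one transformation file with Pan."""
    cmd = [pan_script, f"-file={ktr_path}", f"-level={log_level}"]
    # Named parameters are appended in sorted order so the command is reproducible.
    for name, value in sorted((params or {}).items()):
        cmd.append(f"-param:{name}={value}")
    return cmd


def run_transformation(ktr_path, **kwargs):
    """Run Pan and return its exit code (assumed here: 0 means success)."""
    return subprocess.run(build_pan_command(ktr_path, **kwargs)).returncode


if __name__ == "__main__":
    # Only prints the command; call run_transformation() on a machine with PDI installed.
    print(" ".join(build_pan_command("/etl/load_sales.ktr", params={"RUN_DATE": "2014-01-31"})))
```

A wrapper like this could be called from cron or any other scheduler to get the "run automatically at regular intervals" behaviour described for Pan.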
20. Pentaho Dashboards
What is CDE ?
* CDE is a plug-in for Pentaho BI Server, contributed and maintained by Pentaho partner
Webdetails.
* We create dashboards using this tool.
* Community Dashboard Editor (CDE) was created to simplify the creation, editing and rendering
of dashboards.
* CDE is a very powerful and complete tool, combining front end with data sources and custom
components in a seamless way.
CDE has 3 major components:
* Layout
* Components
* Data Sources
CDE was developed based on the MVC-2 architecture of Advanced Java
23. Main Big Data Technologies
Hadoop
• Low cost, reliable scale-out architecture
• Distributed computing
• Proven success in Fortune 500 companies
• Exploding interest
NoSQL Databases
• Huge horizontal scaling and high availability
• Highly optimized for retrieval and appending
• Types: Document stores, Key Value stores, Graph databases
Analytic Databases (Analytic RDBMS)
• Optimized for bulk-load and fast aggregate query workloads
• Types: Column-oriented, MPP, In-memory
24. What makes Pentaho different for big data
Ingestion / Manipulation / Integration
Scheduling
Modeling
Would you rather do this?
… OR THIS?
25. Pentaho Big Data Integration
Pentaho is integrated with Hadoop at many levels
•Traditional ETL - Graphical designer to visually build transformations that read and write data
in Hadoop from/to anywhere and transform the data on the way. No coding required
  •HBase Read/Write
  •Hive, Hive2 SQL Query and Write
  •Impala SQL Query and Write
  •Support for Avro file format and Snappy compression
•Data Orchestration - Graphical designer to visually build and schedule jobs that orchestrate
processing, data movement and most aspects of operationalizing your data preparation.
  •HDFS Copy files
  •Map Reduce Job Execution
  •Pig Script Execution
  •Amazon EMR Job Execution
  •Oozie integration
  •Sqoop Import/Export
  •Pentaho MapReduce Execution
26. Pentaho Big Data Integration
•Pentaho MapReduce - Graphical designer to visually build MapReduce jobs and run
them in the cluster. With a simple, point-and-click alternative to writing Hadoop
MapReduce programs in Java or Pig, Pentaho exposes a familiar ETL-style user
interface.
•Traditional Reporting - All data sources supported above can be used directly or
blended with other data to drive our pixel-perfect reporting engine. The reports can
be secured, parameterized and published to the web. The reports can be mashed up
with other Pentaho visualizations to create dashboards.
•Web Based Interactive Reporting - Pentaho's Metadata layer leverages data stored in
Hive, Hive2 and Impala for WYSIWYG, interactive, self-service reporting.
•Pentaho Analyzer - Leverage your data stored in Impala or Hive2 for interactive visual
analysis with drill through, lasso filtering, zooming, and attribute highlighting for
greater insight.
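For context on the hand-written MapReduce code that Pentaho MapReduce is positioned as an alternative to, the sketch below simulates the map, shuffle and reduce phases of a word count in plain Python. It runs in a single process and does not use the Hadoop API. The function names are chosen for illustration and are not part of Hadoop or Pentaho.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map -> shuffle -> reduce over an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big Data with Pentaho"]))
```

On a real cluster the mapper and reducer run on many nodes and the framework handles the shuffle. A graphical designer lets you express the mapper and reducer logic as transformations instead of code like this.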
28. Getting started with Pentaho
•Download Pentaho from http://community.pentaho.com/
•Download MySQL from
http://dev.mysql.com/downloads/mysql/
• Download CDE from www.webdetails.pt/ctools/cde.html
Read the installation instructions on the following blog:
•http://pentaho-bi-suite.blogspot.in/2013/04/installation-ofpentaho-bi-server.html
•A Pentaho installation guide is also available. Request it
at: info@ellicium.com
29. Thank you !!!
Contact us for customized Pentaho
training on
info@compulinkacademy.com
info@ellicium.com
Or Call Sameer on +91-8793334411