Putting Analytics in Big Data Analytics
Jake Cornelius
Director of Product Management, Pentaho Corporation
Learn more @ http://www.cloudera.com/hadoop/
2. Traditional BI
Diagram labels: Data Source, Data Mart(s), Tape/Trash, unanswered questions (?)
3. Big Data Architecture
Diagram labels: Data Source, Data Lake(s), Data Mart(s), Data Warehouse, Ad-Hoc
4. Pentaho Data Integration
Diagram labels: Hadoop, Pentaho Data Integration, Data Marts / Data Warehouse / Analytical Applications, Design, Deploy, Orchestrate
5. Diagram labels: Load, Optimize, Visualize, Files / HDFS, Hive, DM & DW, Applications & Systems, Hadoop, RDBMS, Web Tier, Reporting / Dashboards / Analysis
6. Diagram labels: HDFS, Hive, DM, Hadoop, RDBMS, Web Tier, Reporting / Dashboards / Analysis
7. Demo
8. Pentaho for Hadoop Announcements
• Pentaho for Hadoop Download Capability
• Includes support for development; production support will follow with GA
• Collaborative effort between Pentaho and the Pentaho Community
• 60+ beta sites over a three-month beta cycle
• Pentaho contributed code for API integration with Hive to the open source Apache Foundation
• Pentaho and Cloudera Partnership
• Combines Pentaho's business intelligence and data integration capabilities with Cloudera's Distribution for Hadoop (CDH)
• Enables business users to take advantage of Hadoop, with the ability to easily and cost-effectively mine, visualize and analyze their Hadoop data
9. Pentaho for Hadoop Announcements (cont.)
• Pentaho and Impetus Technologies Partnership
• Incorporates Pentaho Agile BI and the Pentaho BI Suite for Hadoop into the Impetus Large Data Analytics practice
• First major SI to adopt Pentaho for Hadoop
• Facilitates large data analytics projects, including expert consulting services and best-practices support for Hadoop and nCluster implementations, including deployment on private and public clouds
10. Pentaho for Hadoop Resources & Events
Resources
Download: www.pentaho.com/download/hadoop
Pentaho for Hadoop webpage - resources, press, events, partnerships and more: www.pentaho.com/hadoop
Big Data Analytics: 5-part video series with James Dixon, Pentaho CTO
Events
Hadoop World: NYC - Oct 12, Gold Sponsor, Exhibitor, Richard Daley presenting, ‘Putting Analytics in Big Data Analysis’
London Hadoop User Group - Oct 12, London
Agile BI Meets Big Data - Oct 13, New York City
11. Thank You.
Join the conversation. You can find us on:
Pentaho Facebook Group
@Pentaho
http://blog.pentaho.com
Pentaho - Open Source Business Intelligence Group
Editor's Notes
In a traditional BI system, where we have not been able to store all of the raw data, we have solved the problem by being selective.
First, we selected the attributes of the data that we knew we had questions about. Then we cleansed it, aggregated it to transaction level or higher, and packaged it up in a form that is easy to consume. Then we put it into an expensive system that we could not scale, whether technically or financially. The rest of the data was thrown away or archived on tape, which, for the purposes of analysis, is the same as throwing it away.
TRANSITION
The problem is we don’t know what is in the data that we are throwing away or archiving. We can only answer the questions that we could predict ahead of time.
When we look at the Big Data architecture we described before, we recall that:
* We want to store all of the data, so we can answer both known and unknown questions
* We want to satisfy our standard reporting and analysis requirements
* We want to satisfy ad-hoc needs by providing the ability to dip into the lake at any time to extract data
* We want to balance performance and cost as we scale
We need the ability to take the data in the Data Lake and easily convert it into data suitable for a data mart, data warehouse or ad-hoc data set, without requiring custom Java code.
Fortunately, we have an embeddable data integration engine, written in Java.
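As a rough illustration of what "embeddable" means here, the sketch below runs a transformation designed in PDI from a plain Java program using the Kettle API. The file name weblog_transform.ktr is invented for the example, and exact package names can vary between PDI versions.

import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.trans.Trans;
import org.pentaho.di.trans.TransMeta;

public class RunTransformation {
    public static void main(String[] args) throws Exception {
        // Initialize the Kettle environment (plugins, logging, etc.)
        KettleEnvironment.init();

        // Load a transformation designed in Spoon (file name is hypothetical)
        TransMeta transMeta = new TransMeta("weblog_transform.ktr");

        // Run it in-process and wait for it to finish
        Trans trans = new Trans(transMeta);
        trans.execute(null);
        trans.waitUntilFinished();

        if (trans.getErrors() > 0) {
            throw new RuntimeException("Transformation finished with errors");
        }
    }
}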
We have taken our data integration engine, PDI, and integrated it with Hadoop in a number of different areas:
* We have the ability to move files between Hadoop and external locations (a hand-coded equivalent is sketched after this list)
* We have the ability to read and write HDFS files during data transformations
* We have the ability to execute data transformations within the MapReduce engine
* We have the ability to extract information from Hadoop and load it into external databases and applications
* And we have the ability to orchestrate all of this, so you can integrate Hadoop into the rest of your data architecture with scheduling, monitoring, logging, etc.
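For comparison, here is roughly what the first of those capabilities looks like when coded by hand against the Hadoop FileSystem API. The NameNode URI and file paths are invented for the example.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CopyWeblogToHdfs {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://namenode:8020"); // NameNode URI is an assumption

        FileSystem fs = FileSystem.get(conf);
        Path source = new Path("/tmp/weblog.txt");          // local file, hypothetical
        Path target = new Path("/user/pentaho/weblog.txt"); // HDFS location, hypothetical

        // Only copy the file if it is not already in HDFS
        if (!fs.exists(target)) {
            fs.copyFromLocalFile(source, target);
        }
        fs.close();
    }
}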
Put into diagram form, so we can indicate the different layers in the architecture and also show the scale of the data, we get this Big Data pyramid.
* At the bottom of the pyramid we have Hadoop, containing our complete set of data.
* Higher up we have our data mart layer. This layer has less data in it, but has better performance.
* At the top we have application-level data caches.
* Looking down from the top, from the perspective of our users, they can see the whole pyramid - they have access to the whole structure. The only thing that varies is the query time, depending on what data they want.
* Here we see that the RDBMS layer lets us optimize access to the data. We can decide how much data we want to stage in this layer. If we add more storage in this layer, we can increase performance over a larger subset of the data lake, but it costs more money (illustrated in the sketch below).
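As a hypothetical illustration of that trade-off, the same question can be answered either from the bottom of the pyramid (Hive, scanning all of the raw data, slower) or from the RDBMS layer where an aggregate has been staged (faster). The connection URLs, the weblog table and the pageviews_by_day mart are all invented, and the HiveServer2 JDBC driver and URL shown are newer than this deck; details vary by version.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;

public class SameQuestionTwoLayers {
    public static void main(String[] args) throws Exception {
        // Bottom of the pyramid: ad-hoc query over all of the raw data in Hive (slow, complete)
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        Connection hive = DriverManager.getConnection("jdbc:hive2://namenode:10000/default", "", "");
        ResultSet fromLake = hive.createStatement().executeQuery(
            "SELECT request_date, COUNT(*) FROM weblog GROUP BY request_date ORDER BY request_date");

        // RDBMS layer: the same answer from a pre-aggregated mart table (fast, but only staged data)
        Connection mart = DriverManager.getConnection("jdbc:postgresql://dbhost/analytics", "user", "pass");
        ResultSet fromMart = mart.createStatement().executeQuery(
            "SELECT request_date, pageviews FROM pageviews_by_day ORDER BY request_date");

        while (fromLake.next() && fromMart.next()) {
            System.out.printf("%s  lake=%d  mart=%d%n",
                fromLake.getString(1), fromLake.getLong(2), fromMart.getLong(2));
        }
        hive.close();
        mart.close();
    }
}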
In this demo we will show how easy it is to execute a series of Hadoop and non-Hadoop tasks. We are going to:
TRANSITION 1
Get a weblog file from an FTP server
TRANSITION 2
Make sure the source file does not already exist in the Hadoop file system
TRANSITION 3
Copy the weblog file into Hadoop
TRANSITION 4
Read the weblog and process it - add metadata about the URLs, add geocoding, and enrich the operating system and browser attributes
TRANSITION 5
Write the results of the data transformation to a new, improved data file
TRANSITION 6
Load the data into Hive (this and the next two steps are sketched in code after the list)
TRANSITION 7
Read an aggregated data set from Hadoop
TRANSITION 8
And write it into a database
TRANSITION 9
Slice and dice the data with the database
TRANSITION 10
And execute an ad-hoc query into Hadoop
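Outside of PDI, loading the data into Hive, reading an aggregate back, and writing it into a database would look roughly like the sketch below. All table, column and connection names are invented for the example, and the HiveServer2 JDBC URL is an assumption.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public class LoadAndAggregateWeblog {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        Connection hive = DriverManager.getConnection("jdbc:hive2://namenode:10000/default", "", "");
        Statement hql = hive.createStatement();

        // Load: define a Hive table and move the processed weblog file (already in HDFS) into it
        hql.execute("CREATE TABLE IF NOT EXISTS weblog (" +
                    " request_date STRING, url STRING, country STRING, browser STRING)" +
                    " ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'");
        hql.execute("LOAD DATA INPATH '/user/pentaho/weblog_out' INTO TABLE weblog");

        // Read an aggregated data set from Hadoop
        ResultSet rs = hql.executeQuery(
            "SELECT request_date, COUNT(*) AS pageviews FROM weblog GROUP BY request_date");

        // Write the aggregate into a relational database
        Connection db = DriverManager.getConnection("jdbc:postgresql://dbhost/analytics", "user", "pass");
        PreparedStatement insert = db.prepareStatement(
            "INSERT INTO pageviews_by_day (request_date, pageviews) VALUES (?, ?)");
        while (rs.next()) {
            insert.setString(1, rs.getString(1));
            insert.setLong(2, rs.getLong(2));
            insert.executeUpdate();
        }

        db.close();
        hive.close();
    }
}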