Turning Petabytes of Data into Profit with Hadoop for the World’s Biggest Retailers


PRGX is the world's leading provider of accounts payable audit services and works with leading global retailers. As new forms of data started to flow into their organizations, standard RDBMS systems could not scale to handle them. Now, by using Talend with Cloudera Enterprise, they achieve a 9-10x performance improvement in data processing, reduce errors, and provide more innovative products and services to end customers.

Watch this webinar to learn how PRGX worked with Cloudera and Talend to create a high-performance computing platform for data analytics and discovery that allows them to rapidly process, model, and serve massive amounts of structured and unstructured data.

Published in: Software

  1. Turning Petabytes of Data into Millions in Cost Recapture for the World’s Biggest Retailers
     Case Study with PRGX Global
     Jonathon Whitton | Director Data Services
     © Cloudera, Inc. All rights reserved. © PRGX Global, Inc. All Rights Reserved
  2. Data is Transforming Business
     • Drive customer insights
     • Improve product & services efficiency
     • Lower business risks
     • Modernize architecture
  3. Cloudera Enterprise: Making Hadoop Fast, Easy, and Secure for the Modernized Architecture
     Platform components: Operations; Data Management; Integrate (batch, real-time); Store (filesystem, relational, NoSQL); Process, Analyze, Serve (batch, stream, SQL, search, SDK); Unified Services (resource management, security)
     Hadoop is a new kind of data platform:
     • One place for unlimited data
     • Unified data access
     Cloudera makes it:
     • Fast for business
     • Easy to manage
     • Secure without compromise
  4. Talend: A History of Delivering Innovation
     Revenue growth, 2006–2016, across Data Integration, Master Data Management, Data Quality, Big Data, Application Integration, Hadoop 2.0, Spark & Cloud, and Data Preparation
     Milestones: 1st open source ETL tool; 1st to deliver a DI and AI platform; 1st on YARN & Hadoop 2.0; 1st to market for Spark
     • Our mission: enable the data-driven enterprise
     • The most advanced Big Data integration platform
     • 10X faster development
     • One solution for batch & streaming
     • Deploy machine learning in minutes
  5. Talend and Cloudera
     Diagram: Data Sources (structured data, unstructured data) → Ingest → Talend Data Fabric (application integration, cloud integration, data integration, big data integration, master data management, data preparation) and Cloudera Enterprise (operations, data management, unified services, process/analyze/serve, store, integrate) → Connect / Serve / Load / Model → Enterprise Data Warehouse, Archive, Applications, Reporting, BI System, Modeling
  6. Polling Question
     What is your top requirement for modernizing your big data infrastructure?
     • Capture and secure all data
     • Operational simplicity
     • Variety of tools to interact with data
     • Integration and portability across multiple platforms
     • Timely availability of data
     • Other
  7. About me
     • Jonathon Whitton, Director Data Services at PRGX
     • Started working at PRGX in 2000
     • BA from Duke University
     • MBA from Kennesaw State University
     • LinkedIn.com/in/whitton
     • Twitter: @JonathonWhitton
  8. Global leader in accounts payable recovery audit
     • Serve 75% of the Top 20 Global Retailers, most for more than 10 years
     • Nearly half of recovery audit revenue comes from outside the U.S., in more than 30 countries across 5 continents
     • ~1,400 employees
  9. What do we do for a living? We are in search of errors!
     Aggregate & manipulate client data
     • Receive over 2 million client files annually
     • 2.3 petabytes of data “live” for auditing on average
     • Data includes purchasing, payment, receiving, deals, point of sale, and emails
     Mine client data for overpayments
     • Utilize proprietary data mining tools and techniques
     • Fully document claims
     Recover overpayments from vendors
     • Handle the majority of vendor communications
     • Verify that deductions are taken or payments received
  10. We are a data-driven business dealing with huge data sizes…
      • # files (structured data): 200,000 monthly; 2,400,000 annually
      • Inbound (structured data): 40 TB monthly, 480 TB annually; 200 TB monthly, 2,375 TB annually
      • # emails (unstructured data): 12,500,000 monthly; 150,000,000 annually
  11. … and huge data variety
      • Types of files: EDI, XML, flat file (CSV), flat file (delimited), database backups, spreadsheets, PDFs, TIFF, JPEG, PNG, PRN, emails, microfiche, proprietary formats
      • % by files (structured data): 28% EBCDIC flat, 40% ASCII flat, 25% DB backups, 7% proprietary
      • Client data arrives: daily, weekly, bi-weekly, monthly, quarterly, semi-annually, annually
      • Challenges: unexpected, underestimated, new schema, wrong schema
      • Method received: tape, hard drive, sFTP, email, other media (flash drives, DVDs)
      • Deliver to audit: daily, weekly, bi-weekly, monthly, quarterly, semi-annually, annually
  12. Our legacy solution was designed in the 1990s…
      It had all the problems that legacy systems have:
      • High lead times
      • Re-runs require many operations
      • Highly labor-intensive operations
      It was tweaked and upgraded over the years, but architecturally it remained the same.
      Transformation process: significant re-engineering with zero downtime.
      Our solution had to be flexible and lightning fast!
      HiPer: High Performance Computing
  13. Our evaluation was on potential “architectural stacks”
      Criteria:
      • Functional requirements
      • Information security requirements and privacy protection
      • Performance requirements
      • Technical requirements
      • Vendor information
      19 potential vendors → Filter 1 → short list of 6 vendors → Filter 2 → 5 POC vendors
      • Filter 1: high-level assessment focused on the vendor's overall company strength, the ability of the solution to scale, and technology feasibility at PRGX
      • Filter 2: based on the vendor's willingness to work with us on a POC and their flexibility
  14. Timeline of HiPer program
      • Q3 2013 – Explore technology that can do data transformation much faster
      • Q4 2013 – Ready with problem statement
      • Q1 2014 – Level 1 screening of technology stack options
      • Q2 2014 – Proof of concept
      • Q3 2014 – Selection of Hadoop (Cloudera), Talend stack
      • Q4 2014 – Pilot accounts; planning, prioritization for 2015
      • Q1 2015 – Production begins for top 15 accounts; unstructured analytics proof of concept
      • By Q2 2016 – Remove reliance on AS400; pilot accounts for unstructured analytics
      • By Q4 2016 – 60% of accounts in production; majority of accounts for unstructured analytics
  15. Our solution for structured data processing
      Diagram: a Hadoop cluster holding Data Sets 1–5, feeding Audit Tool Sets 1–3.
      Data acquisition and load:
        1. Talend and related automation
      Data transformation and long-term storage:
        1. HDFS – for long-term data storage
        2. Batch processing – Apache Hive and Apache Spark
        3. Investigative queries (QC) – Apache Impala
        4. Output to RDBMS – Apache Sqoop and Talend
      Infrastructure management:
        1. Cloudera Manager
      Data privacy, security and data lineage:
        1. Cloudera Navigator
        2. Kerberos
      Auditing:
        1. Continue and invest in legacy technology (MSSQL)
        2. Explore use of Hadoop-oriented BI tools
  16. Our solution for unstructured data processing
      Diagram: email data (pst, nsf, …) is converted to EML files, stored long-term in HBase, and indexed in Solr for search by the Email Auditing Tool.
      Data acquisition and load:
        1. In-house automation logic in .NET to convert files to EML files
        2. Load to HBase via Apache Thrift
        3. Apache Tika to read email data (headers, body, attachments)
      Data transformation:
        1. HBase – email storage and updates to individual records
        2. NGData Key-Value Store Indexer (Lily) to update Solr in near real time
        3. Solr – index to query emails
        4. SolrNet – .NET API for Solr querying
        5. Apache Thrift – updates to HBase for audit findings
      Infrastructure management:
        1. Cloudera Manager
      Data privacy, security and data lineage:
        1. Kerberos
        2. Multiple Solr indexes / HBase tables
      Auditing:
        1. Refreshed .NET application to read from Solr / update HBase
  17. How does Cloudera Hadoop impact our business?
      Delivery cycle (structured data processing): before HiPer, loading and preparing data took X weeks, leaving limited time to process and audit data; with HiPer, it takes X/2 weeks, leaving more time for auditing.
      • Improved data-delivery performance for structured and unstructured data by about 9-10 times over the original timelines, via Hive
      • Starting to use Spark
  18. How does Cloudera Hadoop impact our business? Faster analysis in Hadoop
      • Point of sale (POS) data was aggregated into a list of stores with the number of transactions and the sum of quantity per store
      • This would have run for hours in our AS400 environment
      • The 2014 data would not have been readily available before Hadoop's lower cost of storage made it affordable for us to keep more data online
        • No need to go back to our archive
        • No need to request a restore
      Results (field count in source POS data: 14):
      • 2015 data – 13 months (20141201 to 20151231); row count 2,048,666,122; returned 1,385 rows (1 per store) in 90 seconds
      • 2014 data – 13 months (20131201 to 20141231); row count 2,159,626,445; returned 1,365 rows (1 per store) in 90 seconds
  19. How does Talend impact our development?
      Development cycle: Talend Big Data is 40% faster than hand-coding Hive QL, so we move on to the next item that much sooner.
      • Reduced development time for processes in Hadoop by at least 40%
      • With over 200 distinct processes to get through this year, the time savings are very meaningful in achieving our goals
      • No more hand-coding Hive QL
      • Talend decouples the logic from the code that executes on the Hadoop cluster
        • “Upgrade” to Spark via a dropdown, then address the limited component changes, which are clearly identified
        • “Upgrade” to the next big thing
  20. Cloudera Hadoop and Talend impact the future of our business
      Structured data processing:
      • Support for “machine-learning” techniques in the email and buyer pulls, significantly enhancing productivity for audits
      Expanded focus:
      • Key enabler to mine large volumes of data from our customers
      • Ability to re-run and experiment on large sets of data
      • Framework for industry-standard master data infrastructure
      • Backup and archival solution for the future
      • Begin standardization and reusability of processes; eliminate person dependency
  21. Key success factors
      • Create the business case… for multiple years, with quick wins – no IT program succeeds without business backing
      • Plan the program – for multiple years
      • Architect it right
        • Architect keeping your current processes and tools in mind
        • Architect it for different uses
        • Architect it for time!
      • Sell your program… you need champions everywhere – from the CEO/Board to your business, to marketing, to even HR
      • Find the right partner; develop the partnership
  22. Polling Question
      Which Hadoop tools do you foresee using the most to help with your big data challenges?
      • Data ingestion tools
      • Search
      • Impala
      • Kudu
      • Spark
  23. Q&A
  24. Call to Action
      • Test drive Talend software
      • Define a big data project
      • Learn about Hadoop with Cloudera
  25. Why Cloudera and Talend?
      • Scalability
      • Performance
      • Flexibility
      • Stability
      • Support
  26. Thank you
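Slide 10 lists monthly and annual data volumes. As a quick cross-check, the figures below assume that each annual value is roughly twelve times the monthly value. The slide does not say how the annual totals were computed, so this is a sanity check, not a restatement of PRGX's numbers.

```latex
\begin{aligned}
200{,}000 \times 12 &= 2{,}400{,}000\ \text{files} \\
40\ \text{TB} \times 12 &= 480\ \text{TB} \\
200\ \text{TB} \times 12 &= 2{,}400\ \text{TB} \quad (\text{reported: } 2{,}375\ \text{TB}) \\
12{,}500{,}000 \times 12 &= 150{,}000{,}000\ \text{emails} \\
150{,}000{,}000 \div 365 &\approx 410{,}959\ \text{emails per day}
\end{aligned}
```

Every pair scales by exactly 12 except the 200 TB line, which suggests that its monthly figure is a rounded average (2,375 / 12 is about 197.9 TB).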
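Slide 11 notes that 28% of structured files arrive as EBCDIC flat files. The sketch below shows one way to decode fixed-width EBCDIC records in Python with the standard `cp037` codec (IBM code page 037). The code page and the three-field `LAYOUT` are hypothetical assumptions. Real client files may use other EBCDIC code pages and their own record layouts, which the slides do not describe.

```python
"""Decode fixed-width EBCDIC records into named text fields.

Assumptions (hypothetical, not from the slides): code page 037 ("cp037")
and the record layout in LAYOUT below.
"""

# Hypothetical record layout: (field name, start offset, length) in bytes.
LAYOUT = [
    ("vendor_id", 0, 6),
    ("invoice_no", 6, 8),
    ("amount", 14, 10),
]
RECORD_LENGTH = sum(length for _, _, length in LAYOUT)


def decode_record(raw: bytes, codepage: str = "cp037") -> dict:
    """Slice one fixed-width record and decode each field from EBCDIC."""
    if len(raw) != RECORD_LENGTH:
        raise ValueError(f"expected {RECORD_LENGTH} bytes, got {len(raw)}")
    return {
        name: raw[start:start + length].decode(codepage).strip()
        for name, start, length in LAYOUT
    }


def iter_records(data: bytes, codepage: str = "cp037"):
    """Split a file without line terminators into fixed-length records."""
    for offset in range(0, len(data), RECORD_LENGTH):
        yield decode_record(data[offset:offset + RECORD_LENGTH], codepage)


if __name__ == "__main__":
    # Build a sample EBCDIC record by encoding text with the same code page.
    sample = "V00042INV00017    125.50".encode("cp037")
    print(list(iter_records(sample)))
```

This sketch only handles character (zoned text) fields. Mainframe extracts often also contain packed-decimal (COMP-3) or binary fields, and those must be unpacked from the raw bytes rather than passed through a text codec.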
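Slide 15 lists "Output to RDBMS – Apache Sqoop and Talend," with MSSQL as the legacy auditing store. The sketch below builds, but does not run, a `sqoop export` command that pushes an HDFS directory (for example, a Hive table's files) into an existing SQL Server table. It uses only standard Sqoop 1 export options. The host, database, table, HDFS path, and password-file path are hypothetical placeholders.

```python
"""Build (but do not execute) a Sqoop 1 export command line.

All connection details used in the example are hypothetical placeholders.
"""
import shlex
from typing import List


def sqoop_export_args(jdbc_url: str, username: str, password_file: str,
                      table: str, export_dir: str, delimiter: str = ",",
                      mappers: int = 4) -> List[str]:
    """Return the argv list for `sqoop export`."""
    return [
        "sqoop", "export",
        "--connect", jdbc_url,             # JDBC URL of the target RDBMS
        "--username", username,
        "--password-file", password_file,  # avoids a plaintext --password
        "--table", table,                  # target table must already exist
        "--export-dir", export_dir,        # HDFS directory holding the rows
        "--input-fields-terminated-by", delimiter,
        "--num-mappers", str(mappers),     # number of parallel export tasks
    ]


if __name__ == "__main__":
    args = sqoop_export_args(
        "jdbc:sqlserver://mssql.example.internal:1433;databaseName=audit",
        "etl_user",
        "/user/etl/.mssql.password",
        "audit_results",
        "/user/hive/warehouse/audit_results",
    )
    # shlex.join quotes the JDBC URL, whose ';' would otherwise end the
    # command in a POSIX shell.
    print(shlex.join(args))
```

The delimiter must match how the source files were written. Hive's default field delimiter for text tables is the `\001` (Ctrl-A) character, not a comma.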
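On Slide 16, Apache Tika reads email headers, bodies, and attachments from EML files before they are stored in HBase and indexed in Solr. Tika is a Java library, so the sketch below swaps in Python's standard `email` package to illustrate the same extraction step. It turns one EML file into a flat document of the kind a search index might accept. The output field names and the idea of using the HBase row key as `id` are hypothetical assumptions, not PRGX's Solr schema.

```python
"""Extract headers, text body, and attachment names from an EML message.

Stand-in for Apache Tika using Python's stdlib; output field names are
hypothetical.
"""
from email import policy
from email.parser import BytesParser


def eml_to_document(raw: bytes, doc_id: str) -> dict:
    """Parse raw EML bytes into a flat dict suitable for indexing."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    # Prefer the text/plain part; fall back to text/html if that is all
    # there is.
    body_part = msg.get_body(preferencelist=("plain", "html"))
    return {
        "id": doc_id,  # e.g. the HBase row key the email is stored under
        "from": str(msg["From"] or ""),
        "to": str(msg["To"] or ""),
        "subject": str(msg["Subject"] or ""),
        "date": str(msg["Date"] or ""),
        "body": body_part.get_content() if body_part is not None else "",
        # iter_attachments() yields nothing for non-multipart messages.
        "attachments": [part.get_filename()
                        for part in msg.iter_attachments()
                        if part.get_filename()],
    }
```

In the slide's architecture, documents are not pushed to Solr directly. The Lily-based indexer picks up changes from HBase, so a function like this would sit on the load side that writes each email into HBase.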
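Slide 18 describes a per-store POS aggregation (list of stores, number of transactions, sum of quantity) over a 13-month `YYYYMMDD` date window. The sketch below runs the same shape of SQL against an in-memory SQLite database as a stand-in for Impala or Hive. The table and column names are hypothetical. `COUNT(DISTINCT transaction_id)` assumes that each row is a line item. If each row were already one transaction, `COUNT(*)` would be the right count.

```python
"""Per-store POS summary over a date window, using SQLite as a stand-in
for Impala/Hive. Table and column names are hypothetical.
"""
import sqlite3

QUERY = """
SELECT store_id,
       COUNT(DISTINCT transaction_id) AS num_transactions,
       SUM(quantity)                  AS total_quantity
FROM pos_line_items
WHERE sale_date BETWEEN ? AND ?   -- YYYYMMDD strings compare in date order
GROUP BY store_id
ORDER BY store_id
"""


def store_summary(conn, start: str, end: str):
    """Return one (store_id, num_transactions, total_quantity) row per store."""
    return conn.execute(QUERY, (start, end)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pos_line_items ("
                 "store_id INTEGER, transaction_id TEXT, "
                 "sale_date TEXT, quantity INTEGER)")
    conn.executemany(
        "INSERT INTO pos_line_items VALUES (?, ?, ?, ?)",
        [(1, "t1", "20141201", 2), (1, "t1", "20141201", 3),
         (1, "t2", "20150615", 1), (2, "t3", "20151231", 4),
         (2, "t4", "20131130", 9)],  # last row falls outside the window
    )
    print(store_summary(conn, "20141201", "20151231"))
    # [(1, 2, 6), (2, 1, 4)]
```

For scale, the slide's figures imply a throughput of about 22.8 million rows per second for 2015 (2,048,666,122 rows in 90 seconds) and about 24.0 million rows per second for 2014 (2,159,626,445 rows in 90 seconds).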
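Slide 19 combines two figures: development is at least 40% faster with Talend, and there are over 200 processes to build this year. To show what that means in total, let d be a hypothetical average hand-coding effort per process (the slide gives no per-process figure). The calculation reads "at least 40%" as at least a 40% reduction in development time:

```latex
\begin{aligned}
t_{\text{Talend}} &\le (1 - 0.40)\,d = 0.60\,d \\
\text{total savings over 200 processes} &\ge 200 \times (d - 0.60\,d) = 80\,d
\end{aligned}
```

Under that reading, the savings across the year's backlog equal at least the effort of hand-coding 80 processes.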
