Common BI/Big Data
Challenges and Solutions
By Andriy Zabavskyy
& Serhiy Haziyev

January, 2013
SoftServe BI/Big Data Lunch and Learn Workshop in Utah
January 30, 2013

The Common BI/Big Data Challenges and Solutions workshop was presented by seasoned
SoftServe experts Andriy Zabavskyy (BI Architect) and Serhiy Haziyev (Director of
Software Architecture).
This was a complimentary workshop where attendees had the opportunity to learn,
network and share knowledge during the lunch and education session.

About SoftServe Inc.
SoftServe, founded in 1993, is a leading global outsourced product and application
development company dedicated to empowering businesses worldwide by providing end-to-end capabilities from product concept to completion. Utilizing Product Development Services
2.0 (PDS 2.0), we deliver proactive solutions in the areas of SaaS/Cloud, Mobility, BI/Analytics
and UI/UX for industries including Healthcare, Retail, Manufacturing, Logistics, and
Infrastructure & Storage. SoftServe is a rapidly growing global company with 3,000
professionals and offices in North America, Western Europe, Russia and Ukraine.
Agenda

Data
Visualization
Data
Mining

Big Data

Data
Integration

Data
Warehousing
Typical BI Solution
Data Sources

Data Integration

OLTP: CRM,
ERP, Finance

Data Warehouse

Data Mining

Users

Predictive
Prescriptive
Analytics

Data
Warehouse

OLAP cubes

Data Visualization
and Analysis

Flat files
ETL/ELT

Big Data

Reports
Dashboards

Spreadsheets

Legacy System

BI Tools

Analysts
Dashboard & Scorecard
Problem:
▪ Single view from multiple sources
▪ Track performance against company targets

Solution:
▪ Dashboard
▪ KPI and Scorecards

Architecture diagram: Client, Internet, Server Tier
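As a rough illustration of tracking performance against targets, the sketch below merges figures from two sources into one scorecard and assigns each KPI a traffic-light status. The KPI names, the figures, the `Kpi` class and the 90% warning threshold are all hypothetical and are not taken from the workshop.

```python
from dataclasses import dataclass


@dataclass
class Kpi:
    name: str
    actual: float
    target: float
    higher_is_better: bool = True


def attainment(kpi: Kpi) -> float:
    """Ratio of actual to target, inverted for lower-is-better KPIs."""
    if kpi.higher_is_better:
        return kpi.actual / kpi.target
    return kpi.target / kpi.actual


def status(kpi: Kpi, warn_at: float = 0.9) -> str:
    """Traffic-light status; warn_at is an assumed threshold."""
    ratio = attainment(kpi)
    if ratio >= 1.0:
        return "green"
    if ratio >= warn_at:
        return "yellow"
    return "red"


# Hypothetical figures coming from two separate source systems.
crm = {"new_customers": 120}
finance = {"revenue_usd": 460_000.0, "cost_per_order_usd": 12.5}

scorecard = [
    Kpi("New customers", crm["new_customers"], 100),
    Kpi("Revenue (USD)", finance["revenue_usd"], 500_000.0),
    Kpi("Cost per order (USD)", finance["cost_per_order_usd"], 10.0,
        higher_is_better=False),
]

for kpi in scorecard:
    print(f"{kpi.name}: {attainment(kpi):.0%} of target, {status(kpi)}")
```

A real dashboard would read these values from the warehouse or a reporting database rather than from in-memory dictionaries.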
Dashboard & Scorecard: Implementation
Software Vendors Offering

Boxed solutions from
big players

Development Efforts

Customization

(e.g. SAS, SAP, IBI)

Dashboard Frameworks
(e.g. Tableau, QlikView, JasperSoft)

Dashboard libs
(JIDE libs)

Custom defined KPI
Integration Efforts
Custom defined KPI &
Custom built dashboard
framework
Dashboard & Scorecard: Highlights

• Adopting or customizing a ready-made line-of-business
solution can be a painful, long and costly process
• Not all dashboard solutions support multitenancy out of the box
Self Service BI
Problem:
▪ Enable BI users to
explore and analyze data in
a highly customizable manner

BI Users

Data Model

Solution:

Toolset

▪ Expose a data model to users
▪ Provide a toolset with data
exploration and analysis
capabilities
OLAP

In-Memory

RDBMS/
NoSQL
Self Service BI: Implementation
• OLAP engines with proper OLAP
viewers
• BI tools with in-memory engines and
semantic/domain layers
• Report Authoring Tools :
– Microsoft Report Builder
– JasperServer Report Designer
Self Service BI: Traditional vs Agile BI Trade-off
Features
Time to Value
Self Service
Collaboration
Interactivity and UX

Customization
Data Quality
Pixel-perfect
Low cost solutions

Traditional

Agile
Self Service BI: Highlights

• Data consumers need to be trained to use
SSBI tools properly
• Desktop versions from many SSBI vendors are often
more mature than their Web tools
• In-memory capabilities are limited by RAM size
Data Integration Patterns
Patterns: ETL, ELT, Replication, EAI, EII
Diagram axes: scheduled vs. real-time; message/record based vs. large data sets
Source: Microsoft EDW Architecture, Guidance and Deployment Best Practices
ELT
Problem:
• Efficiently process very
large volumes of data within
ever-shortening processing
windows

Solution:
• Perform transformation steps
on target platform
• Set-based processing

Data Warehouse

Semantic Layer
Load

Staging Layer

Transform

Source

Source

Extract
ELT: Highlights

• Some data integration platforms have clearly separated
ETL and ELT components
• Consider using custom scripts native to the target
platform instead of a built-in DI component
ETL vs. ELT

ETL
Flow:
• Data pipelines are used
• Transformations are applied to the data one record at a time
• Intermediate data results are stored in memory
Advantages:
• Complex transformations
• Keeping intermediate results in memory is faster than persisting them to disk
Disadvantages:
• Large data sets could overwhelm the memory
• Updates are more efficient using set-based processing

ELT
Flow:
• Data is loaded into the destination server
• Set-based processing
• Transformations and lookups are within the SQL
Advantages:
• The power of the relational database system can be utilized for very large data sets
Disadvantages:
• Load on RDBMS
• More disk activity
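The difference between record-at-a-time ETL and set-based ELT can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration under assumed names: the table names (`sales_etl`, `staging`, `sales_elt`) and the sample rows are hypothetical, and SQLite stands in for the target warehouse platform.

```python
import sqlite3

# Hypothetical raw source records: (customer, amount as text, country code).
source_rows = [("alice", "10.50", "us"), ("bob", "7.25", "ua"), ("carol", "3.00", "us")]


def etl(conn, rows):
    """ETL: transform each record in application memory, then load the result."""
    conn.execute("CREATE TABLE sales_etl (customer TEXT, amount REAL, country TEXT)")
    for name, amount, country in rows:
        transformed = (name.title(), float(amount), country.upper())
        conn.execute("INSERT INTO sales_etl VALUES (?, ?, ?)", transformed)


def elt(conn, rows):
    """ELT: load raw data into staging, then transform with one set-based statement."""
    conn.execute("CREATE TABLE staging (customer TEXT, amount TEXT, country TEXT)")
    conn.executemany("INSERT INTO staging VALUES (?, ?, ?)", rows)
    conn.execute(
        """
        CREATE TABLE sales_elt AS
        SELECT UPPER(SUBSTR(customer, 1, 1)) || SUBSTR(customer, 2) AS customer,
               CAST(amount AS REAL) AS amount,
               UPPER(country) AS country
        FROM staging
        """
    )


conn = sqlite3.connect(":memory:")
etl(conn, source_rows)
elt(conn, source_rows)

etl_result = conn.execute("SELECT * FROM sales_etl ORDER BY customer").fetchall()
elt_result = conn.execute("SELECT * FROM sales_elt ORDER BY customer").fetchall()
print(etl_result == elt_result)
```

Both paths produce the same rows. What differs is where the transformation work happens: in the integration process for ETL, inside the database engine for ELT.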
Kimball’s Multidimensional EDW
Problem:
• Integrate and consolidate data
from heterogeneous sources
• Keep data history

Data Warehouse

Solution:
• Use multidimensional model to
store data
• Iterate by business lines
• Integrate by conformed
dimensions

Data Sources
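Integration by conformed dimensions can be sketched as two fact tables from different business lines that share one date dimension. The schema below (`dim_date`, `fact_sales`, `fact_support_calls`) and its data are hypothetical, and SQLite is used only to keep the example self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- One conformed dimension shared by both business lines.
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, month TEXT);
    CREATE TABLE fact_sales (date_key INTEGER REFERENCES dim_date, amount REAL);
    CREATE TABLE fact_support_calls (date_key INTEGER REFERENCES dim_date, calls INTEGER);

    INSERT INTO dim_date VALUES (20130101, '2013-01'), (20130201, '2013-02');
    INSERT INTO fact_sales VALUES (20130101, 100.0), (20130101, 50.0), (20130201, 70.0);
    INSERT INTO fact_support_calls VALUES (20130101, 4), (20130201, 1), (20130201, 2);
    """
)

# Drill across: aggregate each fact table on its own, then join the results
# on the attribute of the shared dimension.
query = """
WITH s AS (
    SELECT d.month AS month, SUM(f.amount) AS sales
    FROM fact_sales f JOIN dim_date d ON d.date_key = f.date_key
    GROUP BY d.month
),
c AS (
    SELECT d.month AS month, SUM(f.calls) AS calls
    FROM fact_support_calls f JOIN dim_date d ON d.date_key = f.date_key
    GROUP BY d.month
)
SELECT s.month, s.sales, c.calls
FROM s JOIN c ON c.month = s.month
ORDER BY s.month
"""
report = conn.execute(query).fetchall()
print(report)
```

Because both fact tables use the same `date_key`, a new business line can be added in a later iteration without reworking the existing ones.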
Kimball vs. Inmon
Sources

Data Integration and Data Warehousing

3NF

Inmon Approach

Kimball Approach

Visualization
Kimball vs. Inmon
                    Inmon                      Kimball
Overall approach    Top-down                   Bottom-up
Data orientation    Subject- or data-driven    Process-oriented
Data modeling       Traditional                Multidimensional
Primary audience    IT professionals           End users
DWH: Implementation
• Traditional RDBMS

• Analytical Column-based RDBMS
DWH: Highlights

Implications of column-based storage:
– Additional columns vs. junk dimensions
– Update scenarios should be avoided where
possible
– The partitioning scheme should be carefully designed
to support maintenance activities
Big Data

Big Data axis
Big Data: Hybrid Approach
Problem:
• Under big data circumstances:
– Flexible online analytics
– Access to most detailed raw
data

Operational and
Historical Analytics

Solution:
• Analytical RDBMS for online
analytics
• NoSQL DB as source for
RDBMS and most detailed raw
data

NoSQL
RDBMS/DW

Source
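A toy version of the hybrid approach is sketched below: detailed events stay in their raw form, while an aggregate is loaded into a relational table for online analytics. The list of JSON strings merely stands in for a NoSQL store, SQLite stands in for the analytical RDBMS, and the event fields and the `page_views` table are hypothetical.

```python
import json
import sqlite3
from collections import Counter

# Stand-in for a NoSQL store: schemaless raw events kept at full detail.
raw_events = [
    json.dumps(event)
    for event in [
        {"user": "u1", "page": "/home", "ms": 120},
        {"user": "u2", "page": "/home", "ms": 80},
        {"user": "u1", "page": "/cart", "ms": 200, "referrer": "ad"},
    ]
]


def summarize(events):
    """Aggregate raw events into (page, views) rows for the relational side."""
    views = Counter(json.loads(event)["page"] for event in events)
    return sorted(views.items())


# Stand-in for the analytical RDBMS used for online analytics.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (page TEXT PRIMARY KEY, views INTEGER)")
conn.executemany("INSERT INTO page_views VALUES (?, ?)", summarize(raw_events))

top_pages = conn.execute(
    "SELECT page, views FROM page_views ORDER BY views DESC, page"
).fetchall()

# Drill down to the most detailed raw data when the aggregate is not enough.
cart_detail = [json.loads(e) for e in raw_events if json.loads(e)["page"] == "/cart"]

print(top_pages)
print(cart_detail)
```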
Big Data: Implementation
Sample of Hybrid Approach in HP Operational Analytics Architecture
Big Data isn’t only Hadoop
Storage options compared, left to right: Tape Library, HDFS, Disk Array, Amazon S3, Amazon Glacier
Column groups on the slide: Operation Storage, Retention Storage
Rows with fewer than five values are listed in slide order.

Requirements
Throughput (600 GB load time): 140-500 MB/s (0.3-1.2 h) | 10-30 MB/s (5.5-16 h) | 50-700 MB/s (0.25-4 h) | 2-40 MB/s (83 h)
Max capacity: 30-900 PB | 21+ PB | 16 PB | ~Unlimited
Max file size: ~Unlimited | ~Unlimited | 4-16 TB (OS-limited) | 5 TB | 40 TB
Accessibility: SAN | Java API, HTTP, NFS (MapR) | NFS, CIFS, SAN | REST, SOAP
Scalability: Adding cartridges | Adding nodes | Adding disks | Pay-as-you-go
Reliability: Redundancy | Redundancy (MapR) | Redundancy | 99.99%
Encryption: Yes | Yes* | Yes* | Yes
HIPAA Compliancy: By datacenter | By datacenter | By datacenter | By Amazon | ?
Random access: No | Yes | Yes | Yes | No
Parallel processing: No | Yes | Yes** | No
100 TB Cost: $40-60K | $100-200K | $80-400K | $132-216K/year | $12-96K/year
1 PB Cost: $90-140K | $1-2M | $0.5-4M | $1.1-1.6M/year | $120-360K/year
15 PB Cost: $0.7-1.2M | $15-30M | ~$18M | $9.9-15M/year | $1.8-3.5M/year
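The 600 GB load times quoted next to each throughput range follow from simple division. The helper below reproduces that arithmetic, assuming decimal units (1 GB = 1000 MB); the function name is hypothetical, and the slide's own figures appear to be rounded loosely.

```python
def load_time_hours(size_gb: float, throughput_mb_s: float) -> float:
    """Hours needed to move size_gb at a sustained throughput, with 1 GB = 1000 MB."""
    seconds = size_gb * 1000 / throughput_mb_s
    return seconds / 3600


# Throughput figures taken from the comparison above.
for mb_per_s in (2, 10, 30, 140, 500):
    print(f"{mb_per_s:>4} MB/s -> {load_time_hours(600, mb_per_s):.2f} h")
```

At 2 MB/s the same 600 GB takes more than three days, which is why throughput is listed alongside capacity and cost.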
Big Data: Highlights

• Clickstream analysis is a classic use case
• Scheduled reports are well suited to Hadoop-based
reporting
• The majority of Self Service BI tools need a relational
representation of data
Prediction of Customer Loyalty
Problem:

Prediction

• Predict customer loyalty and profitability

Solution:
• Logistic regression algorithm
• Support vector machines

DM Tool

Historical Data

Algorithm
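To make the logistic regression option concrete, here is a minimal pure-Python sketch trained with stochastic gradient descent on historical data. The two features (orders per month, support complaints), the labels, the learning rate and the epoch count are all hypothetical; a real project would more likely rely on a statistical package.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias with plain stochastic gradient descent on log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b


def predict_proba(w, b, x) -> float:
    """Estimated probability that a customer with features x stays loyal."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical history: (orders per month, support complaints) -> stayed loyal (1) or not (0).
X = [(5, 0), (4, 1), (6, 0), (1, 3), (0, 2), (1, 4)]
y = [1, 1, 1, 0, 0, 0]

w, b = train(X, y)
print(f"frequent buyer: {predict_proba(w, b, (5, 0)):.2f}")
print(f"unhappy customer: {predict_proba(w, b, (0, 3)):.2f}")
```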
Recommendation System
Problem:

Recommendation

• Recommend to customers the most
suitable goods

Solution:

DM Tool

• K-means clustering algorithm
• Collaborative filtering

Historical Data

Algorithm
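Of the two techniques named, collaborative filtering is the easier one to show in a few lines. The sketch below is a simple user-based variant with cosine similarity; the customers, products and ratings are hypothetical, and it does not cover the K-means option.

```python
import math

# Hypothetical purchase ratings: customer -> {product: rating}.
ratings = {
    "ann": {"laptop": 5, "mouse": 4, "keyboard": 5},
    "ben": {"laptop": 4, "mouse": 5},
    "cat": {"desk": 5, "lamp": 4},
}


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse rating vectors."""
    dot = sum(a[item] * b[item] for item in a if item in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(user: str, k: int = 1) -> list:
    """Rank products the user has not rated by similarity-weighted ratings of others."""
    scores = {}
    for other, their in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], their)
        for item, rating in their.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)[:k]


print(recommend("ben"))
```

Here `ben` shares tastes with `ann` and none with `cat`, so the only product with a positive score is the one `ann` rated and `ben` has not yet bought.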
DM Models: Implementation

• Custom algorithm implementation
• Statistical packages like R
• Ready data mining model implementations
DM Models: Highlights

• The approach should be:
Problem -> Data Strategy -> Data Analysis
… and not vice versa
• DM algorithms should be carefully selected
• DM algorithms are highly dependent on the business
domain they are created for
SoftServe BI Maturity Model
• Improving the business

Wisdom

• decision making (executives)
• data mining, forecasting

• Gaining business insight

Knowledge

• analytical reports (analysts)
• dashboards, KPIs, scorecards, slice & dice, data
warehouse, OLAP

• Measuring and monitoring

Information

• consolidated reports (managers)
• charts, parametrized reports, dedicated
reporting database

• Running the business

Data

• personal operational reports
(workers, customers)
• simple reports, OLTP or files
SoftServe BI/BigData Expertise
Big Data and NoSQL

Data Integration

Data Warehouse

BI Platforms
More Info about SoftServe BI Offerings

• http://www.softserveinc.com/en-us/services/software-architecture/
• http://www.softserveinc.com/en-us/services/bi-analytics/

Mais conteúdo relacionado

Mais procurados

Prcn 2019 stage 1264-question-presentation_poster file_id-15
Prcn 2019 stage 1264-question-presentation_poster file_id-15Prcn 2019 stage 1264-question-presentation_poster file_id-15
Prcn 2019 stage 1264-question-presentation_poster file_id-15
madynav
 
Data-Ed Online Presents: Data Warehouse Strategies
Data-Ed Online Presents: Data Warehouse StrategiesData-Ed Online Presents: Data Warehouse Strategies
Data-Ed Online Presents: Data Warehouse Strategies
DATAVERSITY
 
Best Practices: Datawarehouse Automation Conference September 20, 2012 - Amst...
Best Practices: Datawarehouse Automation Conference September 20, 2012 - Amst...Best Practices: Datawarehouse Automation Conference September 20, 2012 - Amst...
Best Practices: Datawarehouse Automation Conference September 20, 2012 - Amst...
Erik Fransen
 
Oracle - Next Generation Datacenter - Alan Hartwell
Oracle - Next Generation Datacenter - Alan HartwellOracle - Next Generation Datacenter - Alan Hartwell
Oracle - Next Generation Datacenter - Alan Hartwell
HPDutchWorld
 
Data warehouse 101-fundamentals-
Data warehouse 101-fundamentals-Data warehouse 101-fundamentals-
Data warehouse 101-fundamentals-
AshishGuleria
 

Mais procurados (20)

DATA MART APPROCHES TO ARCHITECTURE
DATA MART APPROCHES TO ARCHITECTUREDATA MART APPROCHES TO ARCHITECTURE
DATA MART APPROCHES TO ARCHITECTURE
 
Prcn 2019 stage 1264-question-presentation_poster file_id-15
Prcn 2019 stage 1264-question-presentation_poster file_id-15Prcn 2019 stage 1264-question-presentation_poster file_id-15
Prcn 2019 stage 1264-question-presentation_poster file_id-15
 
Business Intelligence Architecture
Business Intelligence ArchitectureBusiness Intelligence Architecture
Business Intelligence Architecture
 
Traditional Data-warehousing / BI overview
Traditional Data-warehousing / BI overviewTraditional Data-warehousing / BI overview
Traditional Data-warehousing / BI overview
 
Project+team+1 slides (2)
Project+team+1 slides (2)Project+team+1 slides (2)
Project+team+1 slides (2)
 
Introduction to Business Intelligence
Introduction to Business IntelligenceIntroduction to Business Intelligence
Introduction to Business Intelligence
 
Bi presentation Designing and Implementing Business Intelligence Systems
Bi presentation   Designing and Implementing Business Intelligence SystemsBi presentation   Designing and Implementing Business Intelligence Systems
Bi presentation Designing and Implementing Business Intelligence Systems
 
Data-Ed Online Presents: Data Warehouse Strategies
Data-Ed Online Presents: Data Warehouse StrategiesData-Ed Online Presents: Data Warehouse Strategies
Data-Ed Online Presents: Data Warehouse Strategies
 
Benefits of a data warehouse presentation by Being topper
Benefits of a data warehouse presentation by Being topperBenefits of a data warehouse presentation by Being topper
Benefits of a data warehouse presentation by Being topper
 
BISMART Bihealth. Microsoft Business Intelligence in health
BISMART Bihealth. Microsoft Business Intelligence in healthBISMART Bihealth. Microsoft Business Intelligence in health
BISMART Bihealth. Microsoft Business Intelligence in health
 
Best Practices: Datawarehouse Automation Conference September 20, 2012 - Amst...
Best Practices: Datawarehouse Automation Conference September 20, 2012 - Amst...Best Practices: Datawarehouse Automation Conference September 20, 2012 - Amst...
Best Practices: Datawarehouse Automation Conference September 20, 2012 - Amst...
 
Microsoft business intelligence
Microsoft business intelligenceMicrosoft business intelligence
Microsoft business intelligence
 
Oracle - Next Generation Datacenter - Alan Hartwell
Oracle - Next Generation Datacenter - Alan HartwellOracle - Next Generation Datacenter - Alan Hartwell
Oracle - Next Generation Datacenter - Alan Hartwell
 
Architecting for analytics
Architecting for analyticsArchitecting for analytics
Architecting for analytics
 
Data Warehouse Tutorial For Beginners | Data Warehouse Concepts | Data Wareho...
Data Warehouse Tutorial For Beginners | Data Warehouse Concepts | Data Wareho...Data Warehouse Tutorial For Beginners | Data Warehouse Concepts | Data Wareho...
Data Warehouse Tutorial For Beginners | Data Warehouse Concepts | Data Wareho...
 
Big Data: It’s all about the Use Cases
Big Data: It’s all about the Use CasesBig Data: It’s all about the Use Cases
Big Data: It’s all about the Use Cases
 
Data warehouse 101-fundamentals-
Data warehouse 101-fundamentals-Data warehouse 101-fundamentals-
Data warehouse 101-fundamentals-
 
2013.12.12 big data heise webcast
2013.12.12 big data heise webcast2013.12.12 big data heise webcast
2013.12.12 big data heise webcast
 
E-Commerce and In-Memory Computing: Crossing the Scalability Chasm
E-Commerce and In-Memory Computing: Crossing the Scalability ChasmE-Commerce and In-Memory Computing: Crossing the Scalability Chasm
E-Commerce and In-Memory Computing: Crossing the Scalability Chasm
 
Performance Considerations in Logical Data Warehouse
Performance Considerations in Logical Data WarehousePerformance Considerations in Logical Data Warehouse
Performance Considerations in Logical Data Warehouse
 

Destaque

IP&A109 Next-Generation Analytics Architecture for the Year 2020
IP&A109 Next-Generation Analytics Architecture for the Year 2020IP&A109 Next-Generation Analytics Architecture for the Year 2020
IP&A109 Next-Generation Analytics Architecture for the Year 2020
Anjan Roy, PMP
 
Rick Bicc Foundation Services
Rick   Bicc Foundation ServicesRick   Bicc Foundation Services
Rick Bicc Foundation Services
dfwcug
 
6 STEPS TO CREATE A SUCCESSFUL BUSINESS INTELLIGENCE STRATEGY
6 STEPS TO CREATE A SUCCESSFUL BUSINESS INTELLIGENCE STRATEGY6 STEPS TO CREATE A SUCCESSFUL BUSINESS INTELLIGENCE STRATEGY
6 STEPS TO CREATE A SUCCESSFUL BUSINESS INTELLIGENCE STRATEGY
George Beaton
 

Destaque (20)

What you Need to Know about Machine Learning?
What you Need to Know about Machine Learning?What you Need to Know about Machine Learning?
What you Need to Know about Machine Learning?
 
JBoss Enterprise Data Services (Data Virtualization)
JBoss Enterprise Data Services (Data Virtualization)JBoss Enterprise Data Services (Data Virtualization)
JBoss Enterprise Data Services (Data Virtualization)
 
Machine learning
Machine learningMachine learning
Machine learning
 
Instant Visualizations in Every Step of Analysis
Instant Visualizations in Every Step of AnalysisInstant Visualizations in Every Step of Analysis
Instant Visualizations in Every Step of Analysis
 
Powering Next Generation Data Architecture With Apache Hadoop
Powering Next Generation Data Architecture With Apache HadoopPowering Next Generation Data Architecture With Apache Hadoop
Powering Next Generation Data Architecture With Apache Hadoop
 
SAP’s vision and strategy on BI & BIG (and small) data
SAP’s vision and strategy on BI & BIG (and small) dataSAP’s vision and strategy on BI & BIG (and small) data
SAP’s vision and strategy on BI & BIG (and small) data
 
Implementing business intelligence
Implementing business intelligenceImplementing business intelligence
Implementing business intelligence
 
IP&A109 Next-Generation Analytics Architecture for the Year 2020
IP&A109 Next-Generation Analytics Architecture for the Year 2020IP&A109 Next-Generation Analytics Architecture for the Year 2020
IP&A109 Next-Generation Analytics Architecture for the Year 2020
 
Innovations in telecom
Innovations in telecomInnovations in telecom
Innovations in telecom
 
Big-Data Server Farm Architecture
Big-Data Server Farm Architecture Big-Data Server Farm Architecture
Big-Data Server Farm Architecture
 
Rick Bicc Foundation Services
Rick   Bicc Foundation ServicesRick   Bicc Foundation Services
Rick Bicc Foundation Services
 
6 STEPS TO CREATE A SUCCESSFUL BUSINESS INTELLIGENCE STRATEGY
6 STEPS TO CREATE A SUCCESSFUL BUSINESS INTELLIGENCE STRATEGY6 STEPS TO CREATE A SUCCESSFUL BUSINESS INTELLIGENCE STRATEGY
6 STEPS TO CREATE A SUCCESSFUL BUSINESS INTELLIGENCE STRATEGY
 
How big data is transforming BI
How big data is transforming BIHow big data is transforming BI
How big data is transforming BI
 
Matinée Micropole DE LA BI A LA DATA INTELLIGENCE 18-10-2016
Matinée Micropole DE LA BI A LA DATA INTELLIGENCE 18-10-2016Matinée Micropole DE LA BI A LA DATA INTELLIGENCE 18-10-2016
Matinée Micropole DE LA BI A LA DATA INTELLIGENCE 18-10-2016
 
Webinar: Bring Your Data to Life with Power BI-2016-01-28
Webinar: Bring Your Data to Life with Power BI-2016-01-28Webinar: Bring Your Data to Life with Power BI-2016-01-28
Webinar: Bring Your Data to Life with Power BI-2016-01-28
 
VBA
VBAVBA
VBA
 
Power BI Desktop screen tour in Thai
Power BI Desktop screen tour in ThaiPower BI Desktop screen tour in Thai
Power BI Desktop screen tour in Thai
 
080827 abramson inmon vs kimball
080827 abramson   inmon vs kimball080827 abramson   inmon vs kimball
080827 abramson inmon vs kimball
 
Kettle: Pentaho Data Integration tool
Kettle: Pentaho Data Integration toolKettle: Pentaho Data Integration tool
Kettle: Pentaho Data Integration tool
 
Big Data and BI Best Practices
Big Data and BI Best PracticesBig Data and BI Best Practices
Big Data and BI Best Practices
 

Semelhante a SoftServe BI/BigData Workshop in Utah

MongoDB Breakfast Milan - Mainframe Offloading Strategies
MongoDB Breakfast Milan -  Mainframe Offloading StrategiesMongoDB Breakfast Milan -  Mainframe Offloading Strategies
MongoDB Breakfast Milan - Mainframe Offloading Strategies
MongoDB
 

Semelhante a SoftServe BI/BigData Workshop in Utah (20)

Choosing technologies for a big data solution in the cloud
Choosing technologies for a big data solution in the cloudChoosing technologies for a big data solution in the cloud
Choosing technologies for a big data solution in the cloud
 
Skillwise Big Data part 2
Skillwise Big Data part 2Skillwise Big Data part 2
Skillwise Big Data part 2
 
Skilwise Big data
Skilwise Big dataSkilwise Big data
Skilwise Big data
 
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
The Practice of Big Data - The Hadoop ecosystem explained with usage scenariosThe Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
 
MongoDB Breakfast Milan - Mainframe Offloading Strategies
MongoDB Breakfast Milan -  Mainframe Offloading StrategiesMongoDB Breakfast Milan -  Mainframe Offloading Strategies
MongoDB Breakfast Milan - Mainframe Offloading Strategies
 
Horses for Courses: Database Roundtable
Horses for Courses: Database RoundtableHorses for Courses: Database Roundtable
Horses for Courses: Database Roundtable
 
Building a modern data warehouse
Building a modern data warehouseBuilding a modern data warehouse
Building a modern data warehouse
 
Transform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big DataTransform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big Data
 
Enable Better Decision Making with Power BI Visualizations & Modern Data Estate
Enable Better Decision Making with Power BI Visualizations & Modern Data EstateEnable Better Decision Making with Power BI Visualizations & Modern Data Estate
Enable Better Decision Making with Power BI Visualizations & Modern Data Estate
 
Self-Service Analytics with Guard Rails
Self-Service Analytics with Guard RailsSelf-Service Analytics with Guard Rails
Self-Service Analytics with Guard Rails
 
Accelerating Data Warehouse Modernization
Accelerating Data Warehouse ModernizationAccelerating Data Warehouse Modernization
Accelerating Data Warehouse Modernization
 
Building a Big Data Solution
Building a Big Data SolutionBuilding a Big Data Solution
Building a Big Data Solution
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 
Building an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureBuilding an Effective Data Warehouse Architecture
Building an Effective Data Warehouse Architecture
 
Which Change Data Capture Strategy is Right for You?
Which Change Data Capture Strategy is Right for You?Which Change Data Capture Strategy is Right for You?
Which Change Data Capture Strategy is Right for You?
 
Hadoop and the Data Warehouse: When to Use Which
Hadoop and the Data Warehouse: When to Use Which Hadoop and the Data Warehouse: When to Use Which
Hadoop and the Data Warehouse: When to Use Which
 
Data Pipelines and Tools to Integrate with Power BI and Spotfire.pdf
Data Pipelines and Tools to Integrate with Power BI and Spotfire.pdfData Pipelines and Tools to Integrate with Power BI and Spotfire.pdf
Data Pipelines and Tools to Integrate with Power BI and Spotfire.pdf
 
SMAC - Social, Mobile, Analytics and Cloud - An overview
SMAC - Social, Mobile, Analytics and Cloud - An overview SMAC - Social, Mobile, Analytics and Cloud - An overview
SMAC - Social, Mobile, Analytics and Cloud - An overview
 
Overcoming Today's Data Challenges with MongoDB
Overcoming Today's Data Challenges with MongoDBOvercoming Today's Data Challenges with MongoDB
Overcoming Today's Data Challenges with MongoDB
 
BAR360 open data platform presentation at DAMA, Sydney
BAR360 open data platform presentation at DAMA, SydneyBAR360 open data platform presentation at DAMA, Sydney
BAR360 open data platform presentation at DAMA, Sydney
 

Último

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 

Último (20)

AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 

SoftServe BI/BigData Workshop in Utah

  • 1. Common BI/Big Data Challenges and Solutions By Andriy Zabavskyy & Serhiy Haziyev January, 2013
  • 2. SoftServe BI/Big Data Lunch and Learn Workshop in Utah January 30, 2013 The Common BI/Big Data Challenges and Solutions presented by seasoned SoftServe experts, Andriy Zabavskyy (BI Architect) and Serhiy Haziyev (Director of Software Architecture). This was a complimentary workshop where attendees had the opportunity to learn, network and share knowledge during the lunch and education session. About SoftServe Inc. SoftServe, founded in 1993, is a leading global outsourced product and application development company dedicated to empowering businesses worldwide by providing end-toend capabilities from product concept to completion. Utilizing Product Development Services 2.0 (PDS 2.0), we deliver proactive solutions in the areas of SaaS/Cloud, Mobility, BI/Analytics and UI/UX for industries including Healthcare, Retail, Manufacturing, Logistics, and Infrastructure & Storage. SoftServe is a rapidly growing global company with 3,000 professionals and offices in North America, Western Europe, Russia and Ukraine.
  • 4. Typical BI Solution Data Sources Data Integration OLTP: CRM, ERP, Finance Data Warehouse Data Mining Users Predictive Prescriptive Analytics Data Warehouse OLAP cubes Data Visualization and Analysis Flat files ETL/ELT Big Data Reports Dashboards Spreadsheets Legacy System BI Tools Analysts
  • 6. Dashboard & Scorecard Client Problem: ▪ Single view from multiple sources ▪ Track performance against company targets Internet Solution: ▪ Dashboard ▪ KPI and Scorecards Server Tier
  • 7. Dashboard & Scorecard: Implementation Software Vendors Offering Boxed solutions from big players Development Efforts Customization (e.g. SAS, SAP, IBI) Dashboard Frameworks (e.g. Tableau, QlikView, JasperSoft) Dashboard libs (JIDE libs) Custom defined KPI Integration Efforts Custom defined KPI & Custom built dashboard framework
  • 8. Dashboard & Scorecard: Highlights • Adopting/Customizing of business lines ready solution could be painful, long and costly process • Not all dashboard solutions support multitenancy out-of-the-box
  • 9. Self Service BI Problem: ▪ Give ability for BI users to explore and analyze data in highly customizable manner BI Users Data Model Solution: Toolset ▪ Expose to users a data model ▪ Give a toolset with data exploring and analysis capabilities OLAP In-Memory RDBMS/ NoSQL
  • 10. Self Service BI: Implementation • OLAP engines with proper OLAP viewers • BI tools with in-memory engines and semantic/domain layers • Report Authoring Tools : – Microsoft Report Builder – JasperServer Report Designer
  • 11. Self Service BI: Traditional vs Agile BI Trade-off Features Time to Value Self Service Collaboration Interactivity and UX Customization Data Quality Pixel-perfect Low cost solutions Traditional Agile
  • 12. Self Service BI: Highlights • Need to educate data consumers to properly use SSBI tools • Desktop versions of many SSBI vendors are often more mature in comparison to Web tools • In-memory capabilities are limited by RAM size
  • 14. Data Integration Patterns Scheduled ETL ELT Replication EAI EII Real-time Message/Record based Large data sets Source: Microsoft EDW Architecture, Guidance and Deployment Best Practices
  • 15. ELT Problem: • Efficiently processing very large volumes of data within ever shortening processing windows Solution: • Perform transformation steps on target platform • Set-based processing Data Warehouse Semantic Layer Load Staging Layer Transform Source Source Extract
  • 16. ELT: Highlights • Some data integration platforms have clearly separated ETL and ELT components • Consider usage of custom scripts native to target platform vs. built-in DI component
  • 17. ETL vs. ELT ETL Flow Advantages Disadvantages ELT  Data pipeline are used  Transformations to the data one record at a time  Intermediate data results are stored in memory  Data is loaded into the destination server  Set-based processing  Transformations and Lookups are within the SQL  Complex transformations  Intermediate results in memory is faster than persisting to disk  The power of the relational database system can be utilized for very large data sets  Large data sets could  Load on RDBMS overwhelm the memory  More disk activity  Updates are more efficient using set-based processing
  • 19. Kimball’s Multidimensional EDW Problem: • Integrate and consolidate data from heterogeneous sources • Keep data history Solution: • Use a multidimensional model to store data • Iterate by business lines • Integrate by conformed dimensions [Diagram labels: Data Sources, Data Warehouse]
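As a rough illustration of "integrate by conformed dimensions", the sketch below lets two fact tables from different business lines share one date dimension, so their measures can be reported side by side. All table and column names (`dim_date`, `fact_sales`, `fact_shipments`) and the sample values are hypothetical, and SQLite is used only as a convenient stand-in for a warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One conformed dimension shared by both business lines.
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, month TEXT);
-- Two fact tables, each built in its own iteration.
CREATE TABLE fact_sales (date_key INTEGER REFERENCES dim_date, amount REAL);
CREATE TABLE fact_shipments (date_key INTEGER REFERENCES dim_date, units INTEGER);

INSERT INTO dim_date VALUES (20130101, '2013-01'), (20130201, '2013-02');
INSERT INTO fact_sales VALUES (20130101, 100.0), (20130101, 50.0), (20130201, 70.0);
INSERT INTO fact_shipments VALUES (20130101, 3), (20130201, 4), (20130201, 1);
""")

# Drill-across query: aggregate each fact table separately, then join the
# results through the shared dimension key.
drill_across = conn.execute("""
SELECT d.month, s.total_amount, sh.total_units
FROM dim_date AS d
JOIN (SELECT date_key, SUM(amount) AS total_amount
      FROM fact_sales GROUP BY date_key) AS s ON s.date_key = d.date_key
JOIN (SELECT date_key, SUM(units) AS total_units
      FROM fact_shipments GROUP BY date_key) AS sh ON sh.date_key = d.date_key
ORDER BY d.month
""").fetchall()

for row in drill_across:
    print(row)
```

Aggregating each fact table before joining avoids double counting when the two tables have different numbers of rows per date.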
  • 20. Kimball vs. Inmon [Diagram labels: Sources, Data Integration and Data Warehousing, 3NF, Inmon Approach, Kimball Approach, Visualization]
  • 21. Kimball vs. Inmon
    – Overall approach: Inmon: top-down; Kimball: bottom-up
    – Data orientation: Inmon: subject- or data-driven; Kimball: process-oriented
    – Data modeling: Inmon: traditional; Kimball: multidimensional
    – Primary audience: Inmon: IT professionals; Kimball: end users
  • 22. DWH: Implementation • Traditional RDBMS • Analytical column-based RDBMS
  • 23. DWH: Highlights Implications of column-based storage: – Additional columns vs. junk dimensions – Update scenarios should be avoided where possible – The partitioning scheme should be designed carefully to support maintenance activities
  • 26. Big Data: Hybrid Approach Problem: • Under big data circumstances: – Flexible online analytics – Access to the most detailed raw data Solution: • Analytical RDBMS for online analytics • NoSQL DB as the source for the RDBMS and as the store for the most detailed raw data [Diagram labels: Source, NoSQL, RDBMS/DW, Operational and Historical Analytics]
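A toy version of the hybrid approach might look like the sketch below. It rests on loud assumptions: a plain Python list of dictionaries stands in for the NoSQL store, SQLite stands in for the analytical RDBMS, and the event fields and the `page_views` table are invented for illustration.

```python
import sqlite3
from collections import Counter

# Stand-in for the NoSQL store: schemaless raw events, kept in full detail.
raw_events = [
    {"user": "u1", "page": "/home", "ts": "2013-01-30T10:00:00"},
    {"user": "u2", "page": "/home", "ts": "2013-01-30T10:05:00", "referrer": "ad"},
    {"user": "u1", "page": "/pricing", "ts": "2013-01-30T10:07:00"},
]

# Aggregate the raw events and load the summary into the relational side.
page_counts = Counter(event["page"] for event in raw_events)
conn = sqlite3.connect(":memory:")  # stand-in for the analytical RDBMS
conn.execute("CREATE TABLE page_views (page TEXT PRIMARY KEY, views INTEGER)")
conn.executemany("INSERT INTO page_views VALUES (?, ?)", page_counts.items())

# Online analytics run against the compact relational summary...
top_page = conn.execute(
    "SELECT page, views FROM page_views ORDER BY views DESC LIMIT 1"
).fetchone()

# ...while the most detailed records are fetched from the raw store on demand.
raw_detail = [event for event in raw_events if event["page"] == top_page[0]]
print(top_page, len(raw_detail))
```

The relational table stays small enough for interactive queries, and nothing from the original events is lost because they remain in the raw store.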
  • 27. Big Data: Implementation • Sample of the hybrid approach in the HP Operational Analytics architecture
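  • 28. Big Data isn’t only Hadoop [Table of storage requirements; columns: Tape Library, HDFS, Disk Array, Amazon S3, Amazon Glacier; column groups: Operation Storage, Retention Storage]
    – Throughput (600 GB load time): Tape Library 140-500 MB/s (0.3-1.2 h); HDFS 10-30 MB/s (5.5-16 h); Disk Array 50-700 MB/s (0.25-4 h); Amazon 2-40 MB/s (83 h)
    – Max capacity: Tape Library 30-900 PB; HDFS 21+ PB; Disk Array 16 PB; Amazon ~Unlimited
    – Max file size: Tape Library ~Unlimited; HDFS ~Unlimited; Disk Array 4-16 TB (OS limited); Amazon S3 5 TB; Amazon Glacier 40 TB
    – Accessibility: Tape Library SAN; HDFS Java API, HTTP, NFS (MapR); Disk Array NFS, CIFS, SAN; Amazon REST, SOAP
    – Scalability: Tape Library adding cartridges; HDFS adding nodes; Disk Array adding disks; Amazon pay-as-you-go
    – Reliability: Tape Library redundancy; HDFS redundancy (MapR); Disk Array redundancy; Amazon 99.99%
    – Encryption: Tape Library Yes; HDFS Yes*; Disk Array Yes*; Amazon Yes
    – HIPAA compliance: Tape Library by datacenter; HDFS by datacenter; Disk Array by datacenter; Amazon by Amazon?
    – Random access: Tape Library No; HDFS Yes; Disk Array Yes; Amazon S3 Yes; Amazon Glacier No
    – Parallel processing: Tape Library No; HDFS Yes; Disk Array Yes**; Amazon No
    – 100 TB cost: Tape Library $40-60K; HDFS $100-200K; Disk Array $80-400K; Amazon S3 $132-216K/year; Amazon Glacier $12-96K/year
    – 1 PB cost: Tape Library $90-140K; HDFS $1-2M; Disk Array $0.5-4M; Amazon S3 $1.1-1.6M/year; Amazon Glacier $120-360K/year
    – 15 PB cost: Tape Library $0.7-1.2M; HDFS $15-30M; Disk Array ~$18M; Amazon S3 $9.9-15M/year; Amazon Glacier $1.8-3.5M/year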
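The load times quoted next to the throughput figures appear to follow from dividing the data size by the throughput. The sketch below checks that arithmetic for the tape library and HDFS figures, assuming 1 GB = 1000 MB; the function name `load_time_hours` is made up for this example, and the slide's own numbers look rounded or truncated, so only an approximate match should be expected.

```python
def load_time_hours(size_gb, throughput_mb_per_s):
    """Hours needed to move size_gb at a sustained throughput (1 GB = 1000 MB)."""
    seconds = size_gb * 1000 / throughput_mb_per_s
    return seconds / 3600


# Tape library: 140-500 MB/s, quoted on the slide as 0.3-1.2 h for 600 GB.
tape_fast = load_time_hours(600, 500)
tape_slow = load_time_hours(600, 140)

# HDFS: 10-30 MB/s, quoted on the slide as 5.5-16 h for 600 GB.
hdfs_fast = load_time_hours(600, 30)
hdfs_slow = load_time_hours(600, 10)

print(f"tape: {tape_fast:.2f}-{tape_slow:.2f} h")
print(f"hdfs: {hdfs_fast:.2f}-{hdfs_slow:.2f} h")
```

The same function can be reused for other sizes or throughputs when comparing storage options for a given processing window.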
  • 29. Big Data: Highlights • Clickstream analysis is a classic use case • Scheduled reports are well suited to Hadoop-based reporting • The majority of Self Service BI tools need a relational representation of the data
  • 31. Prediction of Customer Loyalty Problem: • Predict customer loyalty and profitability Solution: • Logistic regression algorithm • Support vector machines [Diagram labels: Historical Data, Algorithm, DM Tool, Prediction]
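To make the logistic regression option concrete, here is a minimal from-scratch sketch trained with stochastic gradient descent. The two features (purchases in the last year, support complaints), the six training rows and the hyperparameters are all invented for illustration. A real project would more likely use a statistical package or a ready-made data mining model.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, features):
    """Estimated probability that the customer is loyal."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def train_logistic(rows, labels, learning_rate=0.01, epochs=5000):
    """Fit weights and bias by stochastic gradient descent on the log-loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(rows, labels):
            error = predict_proba(weights, bias, features) - label
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


# Hypothetical historical data: (purchases_last_year, support_complaints).
history = [(8, 0), (6, 1), (7, 0), (1, 3), (2, 3), (0, 4)]
stayed_loyal = [1, 1, 1, 0, 0, 0]

weights, bias = train_logistic(history, stayed_loyal)
p_loyal = predict_proba(weights, bias, (7, 1))    # frequent buyer, few complaints
p_at_risk = predict_proba(weights, bias, (1, 4))  # rare buyer, many complaints
print(p_loyal > 0.5, p_at_risk < 0.5)
```

The output of `predict_proba` is a score between 0 and 1, which is convenient when loyalty predictions feed a dashboard threshold or a ranking rather than a hard yes/no decision.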
  • 32. Recommendation System Problem: • Recommend the most suitable goods to customers Solution: • K-means clustering algorithm • Collaborative filtering [Diagram labels: Historical Data, Algorithm, DM Tool, Recommendation]
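Collaborative filtering can be sketched in a user-based form: items are scored for a customer by the ratings of other customers, weighted by how similar their tastes are. The customer names, items and ratings below are hypothetical, and cosine similarity is just one common choice of similarity measure, not necessarily the one the presenters had in mind.

```python
import math

# Hypothetical historical data: customer -> {item: rating}.
ratings = {
    "alice": {"laptop": 5, "mouse": 4, "desk": 1},
    "bob": {"laptop": 5, "mouse": 5, "monitor": 4},
    "carol": {"desk": 5, "chair": 4, "laptop": 1},
}


def cosine_similarity(a, b):
    """Cosine similarity of two rating dicts; unrated items count as zero."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[item] * b[item] for item in shared)
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b)


def recommend(user, all_ratings):
    """Rank items the user has not rated by similarity-weighted ratings."""
    scores = {}
    for other, other_ratings in all_ratings.items():
        if other == user:
            continue
        similarity = cosine_similarity(all_ratings[user], other_ratings)
        for item, rating in other_ratings.items():
            if item not in all_ratings[user]:
                scores[item] = scores.get(item, 0.0) + similarity * rating
    return sorted(scores, key=scores.get, reverse=True)


recommendations = recommend("alice", ratings)
print(recommendations)
```

Here alice's ratings resemble bob's far more than carol's, so the item bob rated highly ranks ahead of carol's.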
  • 33. DM Models: Implementation • Custom algorithm implementation • Statistical packages like R • Ready-made data mining model implementations
  • 34. DM Models: Highlights • The approach should be: Problem -> Data Strategy -> Data Analysis, and not vice versa • DM algorithms should be carefully selected • DM algorithms are highly dependent on the business domain they are created for
  • 35. SoftServe BI Maturity Model
    – Wisdom: improving the business • decision making (executives) • data mining, forecasting
    – Knowledge: gaining business insight • analytical reports (analysts) • dashboards, KPIs, scorecards, slice & dice, data warehouse, OLAP
    – Information: measuring and monitoring • consolidated reports (managers) • charts, parametrized reports, dedicated reporting database
    – Data: running the business • personal operational reports (workers, customers) • simple reports, OLTP or files
  • 36. SoftServe BI/Big Data Expertise • Big Data and NoSQL • Data Integration • Data Warehouse • BI Platforms
  • 37. More Info about SoftServe BI Offerings  http://www.softserveinc.com/en-us/services/software-architecture/  http://www.softserveinc.com/en-us/services/bi-analytics/

Editor's Notes

  1. Split DW and BD