BIG DATA ANALYTICS
REFERENCE ARCHITECTURES AND
CASE STUDIES
BY SERHIY HAZIYEV AND OLHA HRYTSAY
Agenda
▪ Big Data Challenges
▪ Big Data Reference Architectures
▪ Case Studies
▪ 10 Tips for Designing Big Data Solutions
Big Data Challenges
Chart axes: UNSTRUCTURED to STRUCTURED; HIGH, MEDIUM, LOW
Data sources:
▪ Data Storages: RDBMS, NoSQL, Hadoop, file systems etc.
▪ Machine Log Data: application logs, event logs, server data, CDRs, clickstream data etc.
▪ Sensor Data: smart electric meters, medical devices, car sensors, road cameras etc.
▪ Archives: scanned documents, statements, medical records, e-mails etc.
▪ Docs: XLS, PDF, CSV, HTML, JSON etc.
▪ Business Apps: CRM, ERP systems, HR, project management etc.
▪ Social Networks: Twitter, Facebook, Google+, LinkedIn etc.
▪ Public Web: Wikipedia, news, weather, public finance etc.
▪ Media: images, video, audio etc.
Velocity, Variety, Volume, Complexity
Big Data Analytics
Traditional Analytics (BI) vs Big Data Analytics
Focus on
▪ Traditional Analytics (BI): descriptive analytics, diagnostic analytics
▪ Big Data Analytics: predictive analytics, data science
Data Sets
▪ Traditional Analytics (BI): limited data sets, cleansed data, simple models
▪ Big Data Analytics: large scale data sets, more types of data, raw data, complex data models
Supports
▪ Traditional Analytics (BI): causation (what happened, and why?)
▪ Big Data Analytics: correlation (new insight, more accurate answers)
Big Data Analytics Use Cases
▪ Data Discovery: Data Scientists/Analysts; drivers: Volume, Performance
▪ Business Reporting: Business Users; drivers: Data Quality, Self Service
▪ Real Time Intelligence: Intelligent Agents, Consumers; drivers: Low Latency, Reliability
Big Data Analytics Reference Architectures
Architecture Drivers:
▪ Volume
▪ Sources
▪ Throughput
▪ Latency
▪ Extensibility
▪ Data Quality
▪ Reliability
▪ Security
▪ Self-Service
▪ Cost
Reference Architectures:
▪ Extended Relational
▪ Non-Relational
▪ Hybrid
Relational Reference Architecture
Data Sources: Structured, Semi-Structured, Unstructured
Integration: ETL, Messaging, API/ODBC, Replication
Data Storages: Operational Data Stores, Data Marts, Data Warehouses
Analytics: Query & Reporting, OLAP Cubes, Advanced Analytics
Presentation: Web Browsers, Native Desktop, Mobile Devices, Web Services
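The flow from data sources through integration and storage to analytics can be sketched in a few lines of Python. This is only an illustration of the layering: the record fields (`customer`, `amount`), the in-memory list standing in for a warehouse, and all function names are assumptions, not part of the deck.

```python
import json

# Hypothetical semi-structured source records (JSON lines); not from the deck.
RAW_EVENTS = [
    '{"customer": "acme", "amount": "12.50"}',
    '{"customer": "acme", "amount": "7.50"}',
    '{"customer": "globex", "amount": "3.00"}',
    'not valid json',
]


def extract(lines):
    """Extract: parse each line, skipping records that are not valid JSON."""
    for line in lines:
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue


def transform(records):
    """Transform: cast amounts to float so the storage layer holds typed data."""
    for rec in records:
        yield {"customer": rec["customer"], "amount": float(rec["amount"])}


def load(rows, warehouse):
    """Load: append cleansed rows to the (in-memory) warehouse table."""
    warehouse.extend(rows)
    return warehouse


def total_by_customer(warehouse):
    """Analytics: a simple query-and-reporting aggregate over the warehouse."""
    totals = {}
    for row in warehouse:
        totals[row["customer"]] = totals.get(row["customer"], 0.0) + row["amount"]
    return totals


warehouse = load(transform(extract(RAW_EVENTS)), [])
report = total_by_customer(warehouse)
```

Cleansing happens before loading, which matches the "cleansed data" emphasis of traditional analytics; a real integration layer would also cover messaging, API/ODBC access and replication.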
Extended Relational Reference Architecture
Data Sources: Structured, Semi-Structured, Unstructured
Integration: ETL, Messaging, API/ODBC, Replication
Data Storages: Operational Data Stores, Data Marts, Data Warehouses
Analytics: Query & Reporting, OLAP Cubes, Advanced Analytics
Presentation: Web Browsers, Native Desktop, Mobile Devices, Web Services
Legend: key components affected by Big Data challenges
Non-Relational Reference Architecture
Data Sources: Structured, Semi-Structured, Unstructured
Integration: ETL, Messaging, API
Data Storages: Search Engines, Distributed File Systems, NoSQL Databases
Analytics: Query & Reporting, Map Reduce, Advanced Analytics
Presentation: Web Browsers, Native Desktop, Mobile Devices, Web Services
Legend: key components introduced with non-relational movement
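Map Reduce is one of the components this architecture introduces. A minimal single-process sketch of the map, shuffle and reduce phases is shown here, assuming hypothetical clickstream records of the form `(user_id, page)`; a real deployment would run these phases on a distributed framework such as Hadoop.

```python
from collections import defaultdict

# Hypothetical clickstream records: (user_id, page). Not from the deck.
CLICKS = [("u1", "/home"), ("u2", "/home"), ("u1", "/cart"), ("u3", "/home")]


def map_phase(record):
    """Map: emit a (key, 1) pair for the page in each click record."""
    _user, page = record
    yield page, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: sum the counts for one key."""
    return key, sum(values)


mapped = (pair for record in CLICKS for pair in map_phase(record))
page_views = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
```

Because each map call sees one record and each reduce call sees one key, both phases can be spread across nodes, which is what lets the approach scale with data volume.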
Extended Relational vs. Non-Relational Architecture
Architecture drivers compared for Extended Relational and Non-Relational:
▪ Large data volume
▪ Self-service (ad-hoc reporting)
▪ Unstructured data processing
▪ High data model extensibility
▪ High data quality and consistency
▪ Extensive security
▪ Reliability and fault-tolerance
▪ Low latency (near-real time)
▪ Low cost
▪ Skills availability
Relational vs. Non-Relational Architecture
▪ Relational: Rational, Predictable, Traditional
▪ Non-Relational: Agile, Flexible, Modern
Big Data Analytics Use Cases
Use cases: Data Discovery, Business Reporting, Real Time Intelligence
Users: Data Scientists, Business Users, Intelligent Agents, Consumers
Drivers shown: Performance, Volume
Data Discovery: Non-Relational Architecture
Data Sources: Structured, Semi-Structured, Unstructured
Integration: ETL, Messaging, API
Data Storages: Search Engines, Distributed File Systems, NoSQL Databases
Analytics: Query & Reporting, Map Reduce, Advanced Analytics
Presentation: Web Browsers, Native Desktop, Mobile Devices, Web Services
Big Data Analytics Use Cases
Use cases: Data Discovery, Business Reporting, Real Time Intelligence
Users: Data Scientists, Business Users, Intelligent Agents, Consumers
Drivers shown: Data Quality, Self Service
Business Reporting: Hybrid Architecture
Data Sources: Structured, Semi-Structured, Unstructured
Integration: ETL, Messaging, API
Data Storages: Distributed File Systems, Relational DWH/DM, Search Engines
Analytics: Map Reduce, SQL Query & Reporting, Advanced Analytics
Presentation: Web Browsers, Native Desktop, Mobile Devices, Web Services
Legend: Extended Relational components, Non-relational components
Big Data Analytics Use Cases
Use cases: Data Discovery, Business Reporting, Real Time Intelligence
Users: Data Scientists, Business Users, Intelligent Agents, Consumers
Drivers shown: Low Latency, Reliability
Lambda Architecture
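The slide text gives only the title. As commonly described, a Lambda architecture pairs a batch layer that recomputes views over the full history with a speed layer that covers recent data, and a serving layer that merges the two at query time. The sketch below illustrates that idea only; the event log, the cutoff value and the function names are assumptions made for the example.

```python
from collections import Counter

# Hypothetical event log of (timestamp, key) pairs; all names are illustrative.
MASTER_DATASET = [(1, "a"), (2, "b"), (3, "a"), (4, "a"), (5, "b")]
BATCH_CUTOFF = 3  # assume the last batch run covered timestamps <= 3


def batch_view(events, cutoff):
    """Batch layer: recompute counts from the full history up to the cutoff."""
    return Counter(key for ts, key in events if ts <= cutoff)


def speed_view(events, cutoff):
    """Speed layer: count only events that arrived after the last batch run."""
    return Counter(key for ts, key in events if ts > cutoff)


def query(key, batch, speed):
    """Serving layer: answer a query by merging the batch and real-time views."""
    return batch[key] + speed[key]


batch = batch_view(MASTER_DATASET, BATCH_CUTOFF)
speed = speed_view(MASTER_DATASET, BATCH_CUTOFF)
```

Calling `query("a", batch, speed)` combines the two counts, so results stay current between batch runs, which is how the pattern addresses the low-latency and reliability drivers of real-time intelligence.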
Case Study #1: Usage & Billing Analysis
Business Area:
Cloud-based platform for building, deploying, hosting and managing mobile applications
Business Goals:
▪ Provide a development environment for building custom mobile applications
▪ Charge customers for the platform they use with a pay-as-you-go model
Architectural Decisions
Architecture Drivers:
▪ Volume (> 10 TB)
▪ Sources (Semi-structured - JSON)
▪ Throughput (> 10K/sec)
▪ Latency (2 min)
▪ Extensibility (Custom metrics)
▪ Data Quality (Consistency)
▪ Reliability (24/7)
▪ Security (Multitenancy)
▪ Self-Service (Ad-Hoc reports)
▪ Cost (The less the better)
▪ Constraints (Public Cloud)
Trade-off:
                 Extended Relational   Non-Relational
Extensibility             -                  +
Data Quality              +                  -
Self-Service              +                  -
Decisions:
▪ Extended Relational Architecture
▪ Extensibility via Pre-allocated Fields pattern
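One way to read the Pre-allocated Fields pattern is that the table reserves generic spare columns up front, and a per-tenant mapping assigns custom metric names to them, so new metrics need no schema change. The sketch below assumes that reading. SQLite stands in for the warehouse, and the table, column, tenant and metric names are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Fixed schema with spare columns reserved up front (hypothetical names).
conn.execute(
    "CREATE TABLE usage_events ("
    " tenant TEXT, metric_1 REAL, metric_2 REAL, metric_3 REAL)"
)
# Per-tenant metadata: which custom metric lives in which pre-allocated column.
FIELD_MAP = {
    "tenant_a": {"api_calls": "metric_1", "push_messages": "metric_2"},
    "tenant_b": {"storage_mb": "metric_1"},
}
ALLOWED_COLUMNS = {"metric_1", "metric_2", "metric_3"}


def insert_event(tenant, metrics):
    """Store custom metrics in the tenant's pre-allocated columns."""
    row = {"metric_1": None, "metric_2": None, "metric_3": None}
    for name, value in metrics.items():
        row[FIELD_MAP[tenant][name]] = value
    conn.execute(
        "INSERT INTO usage_events VALUES (?, ?, ?, ?)",
        (tenant, row["metric_1"], row["metric_2"], row["metric_3"]),
    )


def total(tenant, metric_name):
    """Ad-hoc report: sum one custom metric without any schema change."""
    column = FIELD_MAP[tenant][metric_name]
    assert column in ALLOWED_COLUMNS  # column names come only from the map
    cur = conn.execute(
        f"SELECT SUM({column}) FROM usage_events WHERE tenant = ?", (tenant,)
    )
    return cur.fetchone()[0]


insert_event("tenant_a", {"api_calls": 100, "push_messages": 5})
insert_event("tenant_a", {"api_calls": 50})
insert_event("tenant_b", {"storage_mb": 2.5})
```

The design keeps the relational strengths chosen in the trade-off (consistency, ad-hoc SQL reports) while offsetting the extensibility weakness; the cost is a fixed upper bound on custom metrics per tenant.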
Solution Architecture
Technologies:
• Amazon Redshift
• Amazon SQS
• Amazon S3
• Elastic Beanstalk
• Jaspersoft BI Professional
• Python
Case Study #2: Clickstream for retail website
Business Area:
Retail. A platform for e-commerce and for collecting customer feedback
Business Goals:
▪ Build an in-house analytics platform for ROI measurement and performance analysis of every product and feature delivered by the e-commerce platform
▪ Provide the ability to understand how end users interact with service content, products, and features on sites
▪ Do clickstream analysis
▪ Perform A/B testing
Architectural Decisions
Architecture Drivers:
▪ Volume (45 TB)
▪ Sources (Semi-structured - JSON)
▪ Throughput (> 20K/sec)
▪ Latency (1 hour)
▪ Extensibility (Custom tags)
▪ Data Quality (Not critical)
▪ Reliability (24/7)
▪ Security (Multitenancy)
▪ Self-Service (Canned reports, Data science)
▪ Cost (The less the better)
▪ Constraints (Public Cloud)
Trade-off:
                     Extended Relational   Non-Relational
Volume/Scalability           +/-                 +
Throughput                    +                  +
Self-Service                  +                 +/-
Extensibility                 -                  +
Decisions:
▪ Non-Relational Architecture
▪ Reporting via Materialized View pattern
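The Materialized View pattern can be sketched as a batch job that precomputes report-ready aggregates from raw events, so canned reports read the small precomputed view instead of scanning raw data. The event fields (`hour`, `tag`), the sample values and the function names below are illustrative assumptions, not details from the case study.

```python
from collections import Counter

# Hypothetical raw clickstream events; field names are illustrative only.
RAW_EVENTS = [
    {"hour": 10, "tag": "banner_a"},
    {"hour": 10, "tag": "banner_a"},
    {"hour": 10, "tag": "banner_b"},
    {"hour": 11, "tag": "banner_a"},
]


def refresh_view(events):
    """Batch job: precompute click counts per (hour, tag) from raw events."""
    return Counter((e["hour"], e["tag"]) for e in events)


def report(view, hour):
    """Canned report: read precomputed counts instead of scanning raw data."""
    return {tag: n for (h, tag), n in view.items() if h == hour}


view = refresh_view(RAW_EVENTS)  # rerun on a schedule to keep the view fresh
```

A periodic refresh trades freshness for cheap reads, which would be compatible with a latency driver measured in hours rather than minutes.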
Solution Architecture
Technologies:
• Amazon S3
• Flume
• Hadoop/HDFS, MapReduce
• HBase
• Oozie
• Hive
Diagram labels: Node 1, Node 2, Node N
10 Tips for Designing Big Data Solutions
▪ Understand data users and sources
▪ Discover architecture drivers
▪ Select proper reference architecture
▪ Do trade-off analysis, address cons
▪ Map reference architecture to technology stack
▪ Prototype, re-evaluate architecture
▪ Estimate implementation efforts
▪ Set up DevOps practices from the very beginning
▪ Advance in solution development through “small wins”
▪ Be ready for changes: big data technologies are evolving rapidly
▪ Leading global Product and Application Development partner founded in 1993
▪ 3,300+ employees across North America, Ukraine and Western Europe
▪ Thousands of successful outsourcing projects!
SaaS/Cloud Solutions . Mobility Solutions . UX/UI . BI/Analytics/Big Data . Software Architecture . Security
Thank You!
SoftServe US Office
One Congress Plaza, 111 Congress Avenue, Suite 2700, Austin, TX 78701
Tel: 512.516.8880
Contacts
Serhiy Haziyev: shaziyev@softserveinc.com
Olha Hrytsay: ohrytsay@softserveinc.com


Editor's Notes

  1. Big Data – data that is too large, complex and dynamic for conventional data tools to capture, store, manage and analyze.
  2. More details can be found here: link to our case study