Enabling Key Business Advantage from Big Data through Advanced Ingest Processing
Ronald S. Indeck, PhD
President and Founder
VelociData, Inc.
Solving the Need for Speed in Big DataOps
www.velocidata.com info@velocidata.com
Today’s Discussion
• Motivations for Advanced Processing
• Total Data Challenges
• Economical Parallelism for IT is Arriving
• Heterogeneous System Architectures (HSA)
• HSA Implementation and Business Benchmarks
• Questions
Big Data?
The Urgency for Gaining Answers in Seconds
Companies that Embrace Analytics Accelerate Performance
“Value Integrators” achieve higher business performance:
‒ 20 times the EBITDA growth
‒ 50% more revenue growth
• “Large-scale data gathering and analytics are quickly becoming a new frontier of competitive differentiation” – HBR
• The challenge for IT is to economically provide real-time, quality data to support business analytics and meet time-bound service level requirements when data are doubling every 12 months
Analytics is creating a competitive advantage
Recognizing “Total Data” Challenges
• Bloor: Databases are more than adequate for the use cases they are
designed to support
• Consider Big Data AND Relational, not OR … think “Total Data”
• The critical unsolved challenge is breaking Total Data flow bottlenecks
• Total Data challenges
• Data volumes exploding
• Data velocity and variety growing
• Data must quickly move between disparate systems
• Processing high volumes on mainframes is expensive
• No spare resources for critical encryption / masking
• Improving or measuring data quality is challenging
Conventional Approaches
• Add more cores and memory to the existing platform
• Push processing into MPP (Teradata, Netezza, …)
• Change the infrastructure (Oracle Exadata, …)
• Use distributed platforms (Hadoop, ...)
These require new skills, time, capital, management, support,
risk … and fail to truly solve the Total Data flow problem
Parallelism in IT Processing is Compelling
• Amdahl’s Law
• High Performance Computing history
• Systems were expensive
• Unique tools and training required
• Scaling performance is often sub-linear
• Issues with timing and thread synchronization
HPC has struggled for 40 years to deliver widespread accessibility mostly due
to cost and poor abstraction, development tools, and design environment
If we could just deliver accessibility at an affordable cost …
• Hardware is now becoming inexpensive
• Application development improvements still needed to enable productivity
→ Abstract through implementation of streaming as the paradigm
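The Amdahl's Law bullet above can be made concrete with a small calculation. This is a minimal sketch, not material from the talk: the function name `amdahl_speedup` and the 95% parallel fraction are illustrative assumptions.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Overall speedup when only `parallel_fraction` of the work scales across `workers`.

    Amdahl's Law: speedup = 1 / ((1 - p) + p / n).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Assumed workload: 95% of the run time can be parallelized.
    for workers in (2, 8, 64, 1024):
        print(workers, round(amdahl_speedup(0.95, workers), 2))
```

With 5% of the work left serial, the speedup can never reach 20x no matter how many workers are added, which is one reason scaling is often sub-linear.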
Complementary Approach: Heterogeneous System Architecture
• Leverage a variety of compute resources
• Not just parallel threads on identical resources
• Right resources at the right times
• Functional elements use appropriate processing components where needed
• Accommodate stream processing
• Source → processing → target
• Streaming data model enables pipelining, data flow acceleration
• Embrace fine-grained pipeline / functional parallelism
• Especially data / direct parallelism
• Separate latency and throughput
• Engineered system
• Manage thread, memory, and resource timing and contention
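The source → processing → target model above can be sketched in software with Python generators. This only illustrates the dataflow shape (each record moves through every stage without an intermediate batch); it does not reproduce the hardware pipelining the slide describes, and all function and field names are hypothetical.

```python
from typing import Callable, Iterable, Iterator, List

Record = dict
Stage = Callable[[Iterator[Record]], Iterator[Record]]


def source(rows: Iterable[Record]) -> Iterator[Record]:
    """Emit records one at a time instead of materializing a batch."""
    for row in rows:
        yield row


def trim_fields(records: Iterator[Record]) -> Iterator[Record]:
    """Processing stage: strip whitespace from every string field."""
    for rec in records:
        yield {k: v.strip() if isinstance(v, str) else v for k, v in rec.items()}


def drop_empty_ids(records: Iterator[Record]) -> Iterator[Record]:
    """Processing stage: discard records whose (hypothetical) `id` field is empty."""
    for rec in records:
        if rec.get("id"):
            yield rec


def run_pipeline(rows: Iterable[Record], stages: List[Stage], target: list) -> list:
    """Chain the stages lazily, then drain the stream into the target."""
    stream = source(rows)
    for stage in stages:
        stream = stage(stream)
    for rec in stream:
        target.append(rec)
    return target


if __name__ == "__main__":
    rows = [{"id": " a "}, {"id": ""}, {"id": "b", "n": 1}]
    print(run_pipeline(rows, [trim_fields, drop_empty_ids], []))
```

Because each stage only sees an iterator, a stage can be swapped or reordered without touching the source or the target.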
Heterogeneous System Architecture
• Standard CPUs: general purpose, “not bad at everything”
- Good branch prediction, fast access to large memory
• Graphics Boards (GPUs): thousands of cores performing very specific tasks
- Excellent matrix and floating point
• FPGA Coprocessors: fully customizable with extreme opportunities for parallelism
- Excels at bit manipulation for regex, cryptography, searching, …
Example: Risk Modeling Application
• Compute “value at risk” for a portfolio
• 1024 stocks
• Evaluate using Monte Carlo simulation
• Brownian motion random walk
• Execute 1 million trials and aggregate results: 1 trial equals 1024 random walks
• Double-precision computation
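A scaled-down sketch of the workload described above: each trial runs one random walk per stock, and the trial results are aggregated into a value-at-risk figure. The model details are assumptions for illustration only (independent stocks, equal weights, geometric Brownian motion, and the drift, volatility, and step-count values); the talk does not give the actual model parameters.

```python
import math
import random


def simulate_portfolio_return(n_stocks, n_steps, mu, sigma, dt, rng):
    """One trial: one geometric Brownian motion walk per stock, equally weighted."""
    total = 0.0
    for _stock in range(n_stocks):
        log_return = 0.0
        for _step in range(n_steps):
            z = rng.gauss(0.0, 1.0)  # standard normal shock
            log_return += (mu - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * z
        total += math.exp(log_return) - 1.0
    return total / n_stocks


def value_at_risk(n_trials, confidence=0.99, n_stocks=16, n_steps=10,
                  mu=0.05, sigma=0.2, dt=1.0 / 252, seed=0):
    """Loss (as a fraction of portfolio value) exceeded in about (1 - confidence) of trials."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    returns = sorted(
        simulate_portfolio_return(n_stocks, n_steps, mu, sigma, dt, rng)
        for _trial in range(n_trials)
    )
    index = int((1.0 - confidence) * n_trials)  # position of the lower-tail return
    return -returns[index]


if __name__ == "__main__":
    print(value_at_risk(n_trials=2000))
```

The trials are independent of one another, which is what makes this kind of simulation a natural fit for the data parallelism discussed earlier.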
Example: Risk Modeling Performance Results
• Baseline [CPU-only]
• 450 thousand walks/second → 37 minutes to execute 1 billion walks
• FPGA + GPU + CPU
• 140 million walks/second → 6 seconds for 1 billion walks
• Speedup of 370x
• Other financial MC simulations are similar
*First use of GPU, FPGA, and CPU in one application
[Figure: application stages 1, 2, and 3 running across an FPGA, a graphics engine, and a chip multiprocessor]
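The throughput and elapsed-time figures above can be cross-checked with a few lines of arithmetic. This sketch uses only the numbers quoted on the slide; the variable names are illustrative. Because the slide's figures are rounded, the elapsed-time ratio reproduces the stated 370x, while the ratio of the quoted throughputs comes out somewhat lower.

```python
TOTAL_WALKS = 1_000_000_000  # "1 billion walks"
CPU_RATE = 450_000           # walks/second, CPU-only baseline
HSA_RATE = 140_000_000       # walks/second, FPGA + GPU + CPU


def seconds_for(walks: int, walks_per_second: int) -> float:
    """Elapsed time implied by a sustained throughput."""
    return walks / walks_per_second


cpu_seconds = seconds_for(TOTAL_WALKS, CPU_RATE)  # about 37 minutes
hsa_seconds = seconds_for(TOTAL_WALKS, HSA_RATE)  # about 7 seconds

elapsed_ratio = (37 * 60) / 6       # from the quoted 37 minutes and 6 seconds
rate_ratio = HSA_RATE / CPU_RATE    # from the quoted throughputs

if __name__ == "__main__":
    print(f"CPU-only: {cpu_seconds / 60:.1f} minutes")
    print(f"FPGA + GPU + CPU: {hsa_seconds:.1f} seconds")
    print(f"speedup from elapsed times: {elapsed_ratio:.0f}x")
    print(f"speedup from throughputs: {rate_ratio:.0f}x")
```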
Stream Processing as an HSA Appliance
• Bundles software, firmware, and hardware into an appliance
• Delivers the right compute resource (CPU, GPU, and FPGA) to the right process at the right time
• Uses other system resources effectively
• High-level abstraction: no need to code, re-train, or acquire new skillsets
• Promotes stream processing for real-time action
• Sources → processing → targets
• Streaming data model enables pipelining for data flow acceleration
Example: VelociData Solution Palette
VelociData Suite | VelociData Solution | Examples | Conventional (records/second) | VelociData (records/second)
Data Transformation | Lookup and Replace | Data enrichment by populating fields from a master file, dictionary translations, etc. (e.g. CP → Cardiopulmonologist) | 3000-6000 | 600,000
Data Transformation | Type Conversions | XML → Fixed; Binary → Char; Date/Time Formats | 1000-2000 | 800,000
Data Transformation | Format Conversions | Rearrange, add, drop, merge, split, and resize fields to change layouts | 1000-10,000 | 650,000
Data Transformation | Key Generation | Hash multiple field values into a unique key (e.g. SHA-2) | 3000-20,000 | > 1,000,000
Data Transformation | Data Masking | Obfuscate data for non-production uses: Persistent or Dynamic; Format preserving; AES-256 | 500-10,000 | > 1,000,000
Data Quality | USPS Address Processing | Standardization, verification, and cleansing (CASS certification in process) | 600-2000 | 400,000
Data Quality | Domain Set Validation | Validate a value based on a list of acceptable values (e.g., all product codes at a retailer; all countries in the world) | 1000-3000 | 750,000
Data Quality | Field Content Validation | Validates based on patterns such as emails, dates, and phone numbers | 1000-3000 | > 1,000,000
Data Quality | Field Content Validation | Data type validation and bounds checking | 3000-6000 | > 1,000,000
Data Platform Conversion | Mainframe Data Conversion | Copybook parsing & data layout discovery; EBCDIC, COMP, COMP-3, … → ASCII, Integer, Float, … | 200-800 | > 200,000
Data Sort | Accelerated Data Sort | Sort data using complex sort keys from multiple fields within records | 7000-20,000 | 1,000,000
Results are system dependent, but the data are intended to provide a magnitude comparison
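Three of the operations listed above (key generation, domain set validation, and field content validation) are simple to show in ordinary software. The sketch below illustrates what each operation does, not how the appliance implements it; the function names, the sample country set, and the deliberately loose email pattern are assumptions.

```python
import hashlib
import re


def generate_key(fields):
    """Hash several field values into one key using SHA-256 (a SHA-2 function)."""
    # A separator keeps ("ab", "c") distinct from ("a", "bc").
    joined = "\x1f".join(fields)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


# Hypothetical domain set; a real one would hold every acceptable value.
ALLOWED_COUNTRIES = frozenset({"US", "CA", "MX"})


def in_domain(value, domain=ALLOWED_COUNTRIES):
    """Domain set validation: accept only values from a known list."""
    return value in domain


# Intentionally loose pattern, for illustration only.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def looks_like_email(value):
    """Field content validation: check the value against a pattern."""
    return EMAIL_PATTERN.fullmatch(value) is not None


if __name__ == "__main__":
    print(generate_key(["Smith", "1970-01-01", "63101"]))
    print(in_domain("US"), in_domain("ZZ"))
    print(looks_like_email("info@velocidata.com"))
```

A hash-based key is unique only in the practical sense that collisions are extremely unlikely, so a pipeline that needs guaranteed uniqueness would still check for duplicates.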
Example of Common ETL Bottlenecks
[Figure: Extract, Transform, Load flow. Sources: CSV, Mainframe, XML, RDBMS, Social Media, Sensor, Hadoop. Tasks #1 through #8 are shown with the ETL Server and Staging DB, with a callout marking candidates for acceleration. Targets: Hadoop, ETL Server, Data Warehouse, Database Appliances, BI Tools, Cloud.]
Example ETL Processes Offloaded
[Figure: the same Extract, Transform, Load flow, with Tasks #1 through #5 shown apart from the ETL Server and Staging DB, which keep Tasks #6 through #8. Sources: CSV, Mainframe, XML, RDBMS, Social Media, Sensor, Hadoop. Targets: Hadoop, ETL Server, Data Warehouse, Database Appliances, BI Tools, Cloud.]
• Keep existing input interfaces
• Remove bottlenecks
• Reduce ETL server workload
• Faster total processing time
Example Mainframe-to-Hadoop Workflow
• Simple, configuration-driven workflow
• Sample shows Mainframe → HDFS
• Data are validated, cleansed, reformatted, enriched, …, along the way
• Enables landing analytics-ready data as fast as it can move across the wire
• Workflow can also work in reverse to return processed data to the mainframe
[Figure: Mainframe Input → Validation → Key Generation → Formatter → Lookup → Address Standardization → CSV Out]
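The idea of a configuration-driven workflow can be sketched as an ordered list of stage names that is resolved to functions at run time. Everything here is hypothetical: the record fields, the stage implementations, and the `WORKFLOW` list are stand-ins, the address standardization stage is omitted, and the product's real configuration format is not shown in the slides.

```python
import csv
import hashlib
import io

LOOKUP = {"CP": "Cardiopulmonologist"}  # dictionary translation, as in the palette example


def validation(rec):
    """Reject records whose id is not numeric (returns None to drop the record)."""
    return rec if rec.get("id", "").isdigit() else None


def key_generation(rec):
    """Add a key derived from two fields."""
    digest = hashlib.sha256(f"{rec['id']}|{rec['specialty']}".encode("utf-8")).hexdigest()
    return {**rec, "key": digest[:16]}


def formatter(rec):
    """Normalize the name field."""
    return {**rec, "name": rec["name"].strip().title()}


def lookup(rec):
    """Replace a code with its dictionary translation when one exists."""
    return {**rec, "specialty": LOOKUP.get(rec["specialty"], rec["specialty"])}


STAGES = {
    "validation": validation,
    "key_generation": key_generation,
    "formatter": formatter,
    "lookup": lookup,
}

# The "configuration": reordering or removing names changes the workflow.
WORKFLOW = ["validation", "key_generation", "formatter", "lookup"]


def run_workflow(records, workflow=WORKFLOW):
    """Apply the configured stages to each record and return the result as CSV text."""
    out = io.StringIO()
    writer = None
    for rec in records:
        for name in workflow:
            rec = STAGES[name](rec)
            if rec is None:
                break  # record was rejected by a stage
        else:
            if writer is None:
                writer = csv.DictWriter(out, fieldnames=list(rec), lineterminator="\n")
                writer.writeheader()
            writer.writerow(rec)
    return out.getvalue()


if __name__ == "__main__":
    print(run_workflow([{"id": "1", "name": " ada lovelace ", "specialty": "CP"}]))
```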
Wire-rate Platform Integration
Enable fast data access between systems
• MPP Platforms (e.g., Teradata): format and improve data for ready insertion into Data Analytics architectures
• ETL Server: preprocess data for fast movement into and out of Data Integration tools
• Mainframe: conversion into and out of EBCDIC and packed decimal formats
• Hadoop: convert data to ASCII and improve quality in flight
VelociData feeds Hadoop pre-processed, quality data for real-time BI efforts
VelociData enables real-time data access by Teradata for operational analytics
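For readers unfamiliar with the mainframe formats named above, here is a minimal sketch of the two conversions in plain Python. It assumes EBCDIC code page 037 (other code pages exist) and handles only the common packed-decimal sign nibbles (0xC and 0xF positive, 0xD negative); the function names are illustrative.

```python
from decimal import Decimal


def ebcdic_to_text(raw: bytes, codec: str = "cp037") -> str:
    """Decode EBCDIC bytes to text; cp037 is an assumed code page."""
    return raw.decode(codec)


def unpack_comp3(raw: bytes, scale: int = 0) -> Decimal:
    """Decode packed decimal (COMP-3): two digits per byte, sign in the final nibble.

    `scale` is the number of implied decimal places taken from the record layout.
    """
    if not raw:
        raise ValueError("empty packed-decimal field")
    digits = []
    for byte in raw[:-1]:
        digits.append(byte >> 4)
        digits.append(byte & 0x0F)
    digits.append(raw[-1] >> 4)       # last byte holds one digit...
    sign_nibble = raw[-1] & 0x0F      # ...and the sign
    if any(d > 9 for d in digits):
        raise ValueError("invalid digit nibble")
    if sign_nibble not in (0x0C, 0x0D, 0x0F):
        raise ValueError("unsupported sign nibble")
    value = int("".join(str(d) for d in digits))
    if sign_nibble == 0x0D:
        value = -value
    return Decimal(value).scaleb(-scale)


if __name__ == "__main__":
    print(ebcdic_to_text("HELLO".encode("cp037")))
    print(unpack_comp3(bytes([0x12, 0x34, 0x5C]), scale=2))
```

In a real layout the field offsets, lengths, and scales come from the COBOL copybook, which is why copybook parsing appears alongside these conversions in the palette.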
Enabling Three Layers of Data Access
• VelociData delivers Hadoop pre-processed, quality data to keep “the lake” clean
• VelociData enables real-time data access for immediate analytics and visualization
• VelociData feeds databases and warehouses pre-analytic, aggregated data for operational analytics
Sources:
• Sensors
• Weblogs
• Transactions
• Mainframe
• Hadoop
• Social Media
• RDBMS
• …
Wire-rate transformations and convergence of fresh and historical data
Accessing Realtime and Historical Data
• Realtime Analysis for Competitive Advantage
• Enabling the speed of business to match business opportunities
• Integrating Historical Data for Operational Excellence
• Informing traditional BI with real-time inputs
[Figure labels: Conventional Batch-oriented BI; Real-time Operational Analytics; Iterative Modeling; Business Excellence]
Stream Processing AND Hadoop
Leveraging stream processing with batch-oriented Hadoop
• Access to more data for analytics
• Process data on ingest (also land raw data if desired)
• Transformation
• Cleansing
• Security
• Never read a COBOL copybook again
• Stream sort for integrating data, aggregation, and dedupe
• …
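The stream sort bullet above (integrating data, aggregation, and dedupe) can be illustrated with the standard library: already-sorted streams are merged lazily on a multi-field key, then duplicates are collapsed and counted. The record fields and function names are hypothetical, and this is a software sketch of the concept rather than the accelerated sort itself.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def merge_sorted_streams(*streams, key):
    """Lazily merge streams that are each already sorted by `key`."""
    return heapq.merge(*streams, key=key)


def dedupe_and_count(sorted_records, key):
    """Collapse runs of records sharing a key; yield (first record, run length)."""
    for _key_value, group in groupby(sorted_records, key=key):
        run = list(group)
        yield run[0], len(run)


if __name__ == "__main__":
    sort_key = itemgetter("customer_id", "date")  # multi-field sort key
    fresh = [
        {"customer_id": 1, "date": "2014-05-02"},
        {"customer_id": 3, "date": "2014-05-01"},
    ]
    historical = [
        {"customer_id": 1, "date": "2014-05-02"},
        {"customer_id": 2, "date": "2014-04-30"},
    ]
    merged = merge_sorted_streams(fresh, historical, key=sort_key)
    for record, count in dedupe_and_count(merged, key=sort_key):
        print(record, count)
```

Merging sorted streams this way never holds more than one record per input in memory, which is what lets fresh and historical data be combined while they are still in flight.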
Examples of Data Challenges Being Solved
• Pharmaceutical discovery query is reduced from 8 days to 20 minutes
• Retailer now integrates full customer data from in-store, online, and mobile sources in real time (processing 50,000 records/s, up from 100/s)
• Property casualty company shortens by five-fold a daily task of processing 540 million records to enable more accurate real-time quoting
• Credit card company reduces mainframe costs and improves analytics performance by integrating historical and fresh data into Hadoop at line rates
• Financial processing network masks 5 million fields/s of production data to sell opportunity information to retailers
• To enable better customer support, a health benefits provider shortens a data integration process from 16 hours to 45 seconds
• Billions of records with multi-field keys are sorted at nearly a million records/s for analytics and data quality
• USPS address standardization at 10 billion/hour for data cleansing on ingest
Thank You!
Questions?

Mais conteúdo relacionado

Mais procurados

Hadoop and Enterprise Data Warehouse
Hadoop and Enterprise Data WarehouseHadoop and Enterprise Data Warehouse
Hadoop and Enterprise Data WarehouseDataWorks Summit
 
Prague data management meetup 2017-01-23
Prague data management meetup 2017-01-23Prague data management meetup 2017-01-23
Prague data management meetup 2017-01-23Martin Bém
 
Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...
Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...
Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...Rittman Analytics
 
Anatomy of a data driven architecture - Tamir Dresher
Anatomy of a data driven architecture - Tamir Dresher   Anatomy of a data driven architecture - Tamir Dresher
Anatomy of a data driven architecture - Tamir Dresher Tamir Dresher
 
From Raw Data to Analytics with No ETL
From Raw Data to Analytics with No ETLFrom Raw Data to Analytics with No ETL
From Raw Data to Analytics with No ETLCloudera, Inc.
 
So You Want to Build a Data Lake?
So You Want to Build a Data Lake?So You Want to Build a Data Lake?
So You Want to Build a Data Lake?David P. Moore
 
Big Data Architecture and Design Patterns
Big Data Architecture and Design PatternsBig Data Architecture and Design Patterns
Big Data Architecture and Design PatternsJohn Yeung
 
Big Data Architecture Workshop - Vahid Amiri
Big Data Architecture Workshop -  Vahid AmiriBig Data Architecture Workshop -  Vahid Amiri
Big Data Architecture Workshop - Vahid Amiridatastack
 
Microsoft Azure Big Data Analytics
Microsoft Azure Big Data AnalyticsMicrosoft Azure Big Data Analytics
Microsoft Azure Big Data AnalyticsMark Kromer
 
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...Hortonworks
 
Big Data with SQL Server
Big Data with SQL ServerBig Data with SQL Server
Big Data with SQL ServerMark Kromer
 
Data warehouse con azure synapse analytics
Data warehouse con azure synapse analyticsData warehouse con azure synapse analytics
Data warehouse con azure synapse analyticsEduardo Castro
 
The modern analytics architecture
The modern analytics architectureThe modern analytics architecture
The modern analytics architectureJoseph D'Antoni
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lakeJames Serra
 
Planing and optimizing data lake architecture
Planing and optimizing data lake architecturePlaning and optimizing data lake architecture
Planing and optimizing data lake architectureMilos Milovanovic
 
Introduction to Data Engineering
Introduction to Data EngineeringIntroduction to Data Engineering
Introduction to Data EngineeringDurga Gadiraju
 
Using Oracle Big Data SQL 3.0 to add Hadoop & NoSQL to your Oracle Data Wareh...
Using Oracle Big Data SQL 3.0 to add Hadoop & NoSQL to your Oracle Data Wareh...Using Oracle Big Data SQL 3.0 to add Hadoop & NoSQL to your Oracle Data Wareh...
Using Oracle Big Data SQL 3.0 to add Hadoop & NoSQL to your Oracle Data Wareh...Mark Rittman
 
An overview of modern scalable web development
An overview of modern scalable web developmentAn overview of modern scalable web development
An overview of modern scalable web developmentTung Nguyen
 

Mais procurados (20)

Hadoop and Enterprise Data Warehouse
Hadoop and Enterprise Data WarehouseHadoop and Enterprise Data Warehouse
Hadoop and Enterprise Data Warehouse
 
Prague data management meetup 2017-01-23
Prague data management meetup 2017-01-23Prague data management meetup 2017-01-23
Prague data management meetup 2017-01-23
 
Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...
Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...
Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...
 
Anatomy of a data driven architecture - Tamir Dresher
Anatomy of a data driven architecture - Tamir Dresher   Anatomy of a data driven architecture - Tamir Dresher
Anatomy of a data driven architecture - Tamir Dresher
 
From Raw Data to Analytics with No ETL
From Raw Data to Analytics with No ETLFrom Raw Data to Analytics with No ETL
From Raw Data to Analytics with No ETL
 
So You Want to Build a Data Lake?
So You Want to Build a Data Lake?So You Want to Build a Data Lake?
So You Want to Build a Data Lake?
 
Big Data Architecture and Design Patterns
Big Data Architecture and Design PatternsBig Data Architecture and Design Patterns
Big Data Architecture and Design Patterns
 
Big Data Architecture Workshop - Vahid Amiri
Big Data Architecture Workshop -  Vahid AmiriBig Data Architecture Workshop -  Vahid Amiri
Big Data Architecture Workshop - Vahid Amiri
 
Microsoft Azure Big Data Analytics
Microsoft Azure Big Data AnalyticsMicrosoft Azure Big Data Analytics
Microsoft Azure Big Data Analytics
 
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...
 
Big Data with SQL Server
Big Data with SQL ServerBig Data with SQL Server
Big Data with SQL Server
 
Data warehouse con azure synapse analytics
Data warehouse con azure synapse analyticsData warehouse con azure synapse analytics
Data warehouse con azure synapse analytics
 
The modern analytics architecture
The modern analytics architectureThe modern analytics architecture
The modern analytics architecture
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
 
Planing and optimizing data lake architecture
Planing and optimizing data lake architecturePlaning and optimizing data lake architecture
Planing and optimizing data lake architecture
 
Yahoo's Next Generation User Profile Platform
Yahoo's Next Generation User Profile PlatformYahoo's Next Generation User Profile Platform
Yahoo's Next Generation User Profile Platform
 
Introduction to Data Engineering
Introduction to Data EngineeringIntroduction to Data Engineering
Introduction to Data Engineering
 
2022 02 Integration Bootcamp
2022 02 Integration Bootcamp2022 02 Integration Bootcamp
2022 02 Integration Bootcamp
 
Using Oracle Big Data SQL 3.0 to add Hadoop & NoSQL to your Oracle Data Wareh...
Using Oracle Big Data SQL 3.0 to add Hadoop & NoSQL to your Oracle Data Wareh...Using Oracle Big Data SQL 3.0 to add Hadoop & NoSQL to your Oracle Data Wareh...
Using Oracle Big Data SQL 3.0 to add Hadoop & NoSQL to your Oracle Data Wareh...
 
An overview of modern scalable web development
An overview of modern scalable web developmentAn overview of modern scalable web development
An overview of modern scalable web development
 

Semelhante a Enabling Key Business Advantage from Big Data through Advanced Ingest Processing - StampedeCon 2014

StreamCentral Technical Overview
StreamCentral Technical OverviewStreamCentral Technical Overview
StreamCentral Technical OverviewRaheel Retiwalla
 
From Data to Services at the Speed of Business
From Data to Services at the Speed of BusinessFrom Data to Services at the Speed of Business
From Data to Services at the Speed of BusinessAli Hodroj
 
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...DataWorks Summit
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureDATAVERSITY
 
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Precisely
 
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...DataStax
 
Data Con LA 2018 - Populating your Enterprise Data Hub for Next Gen Analytics...
Data Con LA 2018 - Populating your Enterprise Data Hub for Next Gen Analytics...Data Con LA 2018 - Populating your Enterprise Data Hub for Next Gen Analytics...
Data Con LA 2018 - Populating your Enterprise Data Hub for Next Gen Analytics...Data Con LA
 
AWS Webcast - Introduction to Amazon Kinesis
AWS Webcast - Introduction to Amazon KinesisAWS Webcast - Introduction to Amazon Kinesis
AWS Webcast - Introduction to Amazon KinesisAmazon Web Services
 
Amazon Redshift with Full 360 Inc.
Amazon Redshift with Full 360 Inc.Amazon Redshift with Full 360 Inc.
Amazon Redshift with Full 360 Inc.Amazon Web Services
 
PayPal datalake journey | teradata - edge of next | san diego | 2017 october ...
PayPal datalake journey | teradata - edge of next | san diego | 2017 october ...PayPal datalake journey | teradata - edge of next | san diego | 2017 october ...
PayPal datalake journey | teradata - edge of next | san diego | 2017 october ...Deepak Chandramouli
 
The Most Trusted In-Memory database in the world- Altibase
The Most Trusted In-Memory database in the world- AltibaseThe Most Trusted In-Memory database in the world- Altibase
The Most Trusted In-Memory database in the world- AltibaseAltibase
 
FSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the Cloud
FSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the CloudFSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the Cloud
FSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the CloudAmazon Web Services
 
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...Databricks
 
Apache CarbonData+Spark to realize data convergence and Unified high performa...
Apache CarbonData+Spark to realize data convergence and Unified high performa...Apache CarbonData+Spark to realize data convergence and Unified high performa...
Apache CarbonData+Spark to realize data convergence and Unified high performa...Tech Triveni
 
Skillwise Big Data part 2
Skillwise Big Data part 2Skillwise Big Data part 2
Skillwise Big Data part 2Skillwise Group
 
Webinar: SQL for Machine Data?
Webinar: SQL for Machine Data?Webinar: SQL for Machine Data?
Webinar: SQL for Machine Data?Crate.io
 

Semelhante a Enabling Key Business Advantage from Big Data through Advanced Ingest Processing - StampedeCon 2014 (20)

StreamCentral Technical Overview
StreamCentral Technical OverviewStreamCentral Technical Overview
StreamCentral Technical Overview
 
From Data to Services at the Speed of Business
From Data to Services at the Speed of BusinessFrom Data to Services at the Speed of Business
From Data to Services at the Speed of Business
 
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
 
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
 
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
 
Data Con LA 2018 - Populating your Enterprise Data Hub for Next Gen Analytics...
Data Con LA 2018 - Populating your Enterprise Data Hub for Next Gen Analytics...Data Con LA 2018 - Populating your Enterprise Data Hub for Next Gen Analytics...
Data Con LA 2018 - Populating your Enterprise Data Hub for Next Gen Analytics...
 
AWS Webcast - Introduction to Amazon Kinesis
AWS Webcast - Introduction to Amazon KinesisAWS Webcast - Introduction to Amazon Kinesis
AWS Webcast - Introduction to Amazon Kinesis
 
Amazon Redshift with Full 360 Inc.
Amazon Redshift with Full 360 Inc.Amazon Redshift with Full 360 Inc.
Amazon Redshift with Full 360 Inc.
 
PayPal datalake journey | teradata - edge of next | san diego | 2017 october ...
PayPal datalake journey | teradata - edge of next | san diego | 2017 october ...PayPal datalake journey | teradata - edge of next | san diego | 2017 october ...
PayPal datalake journey | teradata - edge of next | san diego | 2017 october ...
 
The Most Trusted In-Memory database in the world- Altibase
The Most Trusted In-Memory database in the world- AltibaseThe Most Trusted In-Memory database in the world- Altibase
The Most Trusted In-Memory database in the world- Altibase
 
FSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the Cloud
FSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the CloudFSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the Cloud
FSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the Cloud
 
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
 
DA_01_Intro.pptx
DA_01_Intro.pptxDA_01_Intro.pptx
DA_01_Intro.pptx
 
Apache CarbonData+Spark to realize data convergence and Unified high performa...
Apache CarbonData+Spark to realize data convergence and Unified high performa...Apache CarbonData+Spark to realize data convergence and Unified high performa...
Apache CarbonData+Spark to realize data convergence and Unified high performa...
 
Skilwise Big data
Skilwise Big dataSkilwise Big data
Skilwise Big data
 
Skillwise Big Data part 2
Skillwise Big Data part 2Skillwise Big Data part 2
Skillwise Big Data part 2
 
Big data.ppt
Big data.pptBig data.ppt
Big data.ppt
 
Lecture1
Lecture1Lecture1
Lecture1
 
Webinar: SQL for Machine Data?
Webinar: SQL for Machine Data?Webinar: SQL for Machine Data?
Webinar: SQL for Machine Data?
 

Mais de StampedeCon

Why Should We Trust You-Interpretability of Deep Neural Networks - StampedeCo...
Why Should We Trust You-Interpretability of Deep Neural Networks - StampedeCo...Why Should We Trust You-Interpretability of Deep Neural Networks - StampedeCo...
Why Should We Trust You-Interpretability of Deep Neural Networks - StampedeCo...StampedeCon
 
The Search for a New Visual Search Beyond Language - StampedeCon AI Summit 2017
The Search for a New Visual Search Beyond Language - StampedeCon AI Summit 2017The Search for a New Visual Search Beyond Language - StampedeCon AI Summit 2017
The Search for a New Visual Search Beyond Language - StampedeCon AI Summit 2017StampedeCon
 
Predicting Outcomes When Your Outcomes are Graphs - StampedeCon AI Summit 2017
Predicting Outcomes When Your Outcomes are Graphs - StampedeCon AI Summit 2017Predicting Outcomes When Your Outcomes are Graphs - StampedeCon AI Summit 2017
Predicting Outcomes When Your Outcomes are Graphs - StampedeCon AI Summit 2017StampedeCon
 
Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...
Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...
Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...StampedeCon
 
How to Talk about AI to Non-analaysts - Stampedecon AI Summit 2017
How to Talk about AI to Non-analaysts - Stampedecon AI Summit 2017How to Talk about AI to Non-analaysts - Stampedecon AI Summit 2017
How to Talk about AI to Non-analaysts - Stampedecon AI Summit 2017StampedeCon
 
Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017
Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017
Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017StampedeCon
 
Foundations of Machine Learning - StampedeCon AI Summit 2017
Foundations of Machine Learning - StampedeCon AI Summit 2017Foundations of Machine Learning - StampedeCon AI Summit 2017
Foundations of Machine Learning - StampedeCon AI Summit 2017StampedeCon
 
Don't Start from Scratch: Transfer Learning for Novel Computer Vision Problem...
Don't Start from Scratch: Transfer Learning for Novel Computer Vision Problem...Don't Start from Scratch: Transfer Learning for Novel Computer Vision Problem...
Don't Start from Scratch: Transfer Learning for Novel Computer Vision Problem...StampedeCon
 
Bringing the Whole Elephant Into View Can Cognitive Systems Bring Real Soluti...
Bringing the Whole Elephant Into View Can Cognitive Systems Bring Real Soluti...Bringing the Whole Elephant Into View Can Cognitive Systems Bring Real Soluti...
Bringing the Whole Elephant Into View Can Cognitive Systems Bring Real Soluti...StampedeCon
 
Automated AI The Next Frontier in Analytics - StampedeCon AI Summit 2017
Automated AI The Next Frontier in Analytics - StampedeCon AI Summit 2017Automated AI The Next Frontier in Analytics - StampedeCon AI Summit 2017
Automated AI The Next Frontier in Analytics - StampedeCon AI Summit 2017StampedeCon
 
Enabling Key Business Advantage from Big Data through Advanced Ingest Processing - StampedeCon 2014

  • 1. Enabling Key Business Advantage from Big Data through Advanced Ingest Processing
      Ronald S. Indeck, PhD, President and Founder, VelociData, Inc.
      Solving the Need for Speed in Big DataOps
  • 2. Today’s Discussion
      www.velocidata.com info@velocidata.com
      • Motivations for Advanced Processing
      • Total Data Challenges
      • Economical Parallelism for IT is Arriving
      • Heterogeneous System Architectures (HSA)
      • HSA Implementation and Business Benchmarks
      • Questions
  • 4. The Urgency for Gaining Answers in Seconds
      Companies that Embrace Analytics Accelerate Performance
      “Value Integrators” achieve higher business performance:
          ‒ 20 times the EBITDA growth
          ‒ 50% more revenue growth
      • “Large-scale data gathering and analytics are quickly becoming a new frontier of competitive differentiation” – HBR
      • The challenge for IT is to economically provide real time, quality data to support business analytics and meet time-bound service level requirements when data are doubling every 12 months
      Analytics is creating a competitive advantage
  • 5. Recognizing “Total Data” Challenges
      • Bloor: Databases are more than adequate for the use cases they are designed to support
      • Consider Big Data AND Relational, not OR … think “Total Data”
      • The critical unsolved challenge is breaking Total Data flow bottlenecks
      • Total Data challenges
          • Data volumes exploding
          • Data velocity and variety growing
          • Data must quickly move between disparate systems
          • Processing high volumes on mainframes is expensive
          • No spare resources for critical encryption / masking
          • Improving or measuring data quality is challenging
  • 6. Conventional Approaches
      • Add more cores and memory to the existing platform
      • Push processing into MPP (Teradata, Netezza, …)
      • Change the infrastructure (Oracle Exadata, …)
      • Use distributed platforms (Hadoop, ...)
      These require new skills, time, capital, management, support, risk … and fail to truly solve the Total Data flow problem
  • 7. Parallelism in IT Processing is Compelling
      • Amdahl’s Law
      • High Performance Computing history
          • Systems were expensive
          • Unique tools and training required
          • Scaling performance is often sub-linear
          • Issues with timing and thread synchronization
      HPC has struggled for 40 years to deliver widespread accessibility mostly due to cost and poor abstraction, development tools, and design environment
      If we could just deliver accessibility at an affordable cost …
      • Hardware is now becoming inexpensive
      • Application development improvements still needed to enable productivity
      → Abstract through implementation of streaming as the paradigm
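The slide names Amdahl’s Law without stating it. As a reminder, the law bounds the speedup of a job when only a fraction p of its runtime can be spread over n workers: S = 1 / ((1 - p) + p / n). The sketch below is illustrative only; the function name and the sample fractions are assumptions, not figures from the talk.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Ideal speedup when only `parallel_fraction` of the runtime scales with `workers`."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Hypothetical parallel fractions, chosen only to show the sub-linear scaling.
    for p in (0.50, 0.90, 0.99):
        row = ", ".join(
            f"{n} workers: {amdahl_speedup(p, n):.1f}x" for n in (4, 64, 1024)
        )
        print(f"parallel fraction {p:.2f} -> {row}")
```

The serial fraction caps the result: with 90% of the work parallel, no number of workers yields 10x, which is one reading of the slide's "scaling performance is often sub-linear".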
  • 8. Complementary Approach: Heterogeneous System Architecture
      • Leverage a variety of compute resources
          • Not just parallel threads on identical resources
      • Right resources at the right times
          • Functional elements use appropriate processing components where needed
      • Accommodate stream processing
          • Source → processing → target
          • Streaming data model enables pipelining, data flow acceleration
      • Embrace fine-grained pipeline / functional parallelism
          • Especially data / direct parallelism
      • Separate latency and throughput
      • Engineered system
          • Manage thread, memory, and resource timing and contention
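The source, processing, target model on this slide can be sketched in software with chained generators, where each record flows through every stage before the next record is read. This is only a minimal single-threaded illustration of the streaming paradigm, not the appliance's implementation; the stage names, the record layout, and the lookup table are assumptions made for the example.

```python
from typing import Dict, Iterable, Iterator, List


def source(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Source stage: parse 'code,amount' lines into records one at a time."""
    for line in lines:
        code, amount = line.strip().split(",")
        yield {"code": code, "amount": amount}


def validate(records: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Processing stage: drop records whose amount is not a plain integer."""
    for rec in records:
        if rec["amount"].isdigit():
            yield rec


def enrich(records: Iterable[Dict[str, str]],
           lookup: Dict[str, str]) -> Iterator[Dict[str, str]]:
    """Processing stage: lookup-and-replace enrichment from a master table."""
    for rec in records:
        yield {**rec, "title": lookup.get(rec["code"], "UNKNOWN")}


def target(records: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Target stage: here just collect; a real target would write downstream."""
    return list(records)


def run_pipeline(lines: Iterable[str], lookup: Dict[str, str]) -> List[Dict[str, str]]:
    # Stages are composed lazily, so no intermediate data set is materialized.
    return target(enrich(validate(source(lines)), lookup))


if __name__ == "__main__":
    print(run_pipeline(["CP,120", "XX,abc"], {"CP": "Cardiopulmonologist"}))
```

Generators give the data-flow structure but not hardware parallelism; in the architecture described here, each stage would instead be placed on whichever compute resource suits it.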
  • 9. Heterogeneous System Architecture
      • Standard CPUs: General purpose “not bad at everything”
          ‒ Good branch prediction, fast access to large memory
      • Graphics Boards (GPUs): Thousands of cores performing very specific tasks
          ‒ Excellent matrix and floating point
      • FPGA Coprocessors: Fully customizable with extreme opportunities for parallelism
          ‒ Excels at bit manipulation for regex, cryptography, searching, …
  • 10. Example: Risk Modeling Application
      • Compute “value at risk” for a portfolio
          • 1024 stocks
      • Evaluate using Monte Carlo simulation
          • Brownian motion random walk
      • Execute 1 million trials and aggregate results: 1 trial equals 1024 random walks
      • Double-precision computation
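A scaled-down sketch of the computation this slide describes follows: each trial runs one geometric Brownian motion walk per stock, sums the portfolio, and the loss distribution is then aggregated into a value-at-risk figure. The talk gives no model details, so everything specific here is an assumption: independent stocks with a shared drift and volatility, one share of each, 252 trading days per year, and the function and parameter names.

```python
import math
import random
from typing import List, Sequence


def simulate_portfolio_losses(prices: Sequence[float], drift: float,
                              volatility: float, horizon_days: int,
                              trials: int, steps: int = 10,
                              seed: int = 0) -> List[float]:
    """Return one portfolio loss per trial; each trial is len(prices) random walks."""
    rng = random.Random(seed)
    dt = (horizon_days / 252.0) / steps  # assumed 252 trading days per year
    start_value = sum(prices)
    losses = []
    for _ in range(trials):
        end_value = 0.0
        for price in prices:  # one random walk per stock
            log_return = 0.0
            for _ in range(steps):
                shock = rng.gauss(0.0, 1.0)
                log_return += ((drift - 0.5 * volatility ** 2) * dt
                               + volatility * math.sqrt(dt) * shock)
            end_value += price * math.exp(log_return)
        losses.append(start_value - end_value)
    return losses


def value_at_risk(losses: Sequence[float], confidence: float = 0.99) -> float:
    """Empirical quantile of the loss distribution at the given confidence."""
    ordered = sorted(losses)
    index = min(len(ordered) - 1, int(confidence * len(ordered)))
    return ordered[index]


if __name__ == "__main__":
    # Tiny hypothetical portfolio; the slide's case is 1024 stocks and 1 million trials.
    sample = simulate_portfolio_losses([100.0] * 8, 0.05, 0.2, 10, trials=500)
    print(f"99% VaR over 10 days: {value_at_risk(sample):.2f}")
```

The inner loops are independent across stocks and trials, which is what makes this workload a natural fit for the parallel hardware discussed in the talk.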
  • 11. Example: Risk Modeling Performance Results
      • Baseline [CPU-only]
          • 450 thousand walks/second → 37 minutes to execute 1 billion walks
      • FPGA + GPU + CPU
          • 140 million walks/second → 6 seconds for 1 billion walks
          • Speedup of 370x
      • Other financial MC simulations are similar
      *First use of GPU, FPGA, and CPU in one application
      [Diagram: application stages 1, 2, and 3 running on an FPGA, a graphics engine, and a chip multiprocessor]
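The figures on this slide can be cross-checked with simple arithmetic. The sketch below uses only the numbers quoted above; the variable names are illustrative. It suggests the slide's values are rounded: the quoted rates imply roughly 7 seconds and about 311x, while the quoted times (37 minutes versus 6 seconds) give the stated 370x.

```python
WALKS = 1_000_000_000          # "1 billion walks"
CPU_RATE = 450_000             # walks/second, CPU-only baseline
HSA_RATE = 140_000_000         # walks/second, FPGA + GPU + CPU

walks_per_run = 1_000_000 * 1024       # 1 million trials x 1024 walks each

cpu_seconds = WALKS / CPU_RATE         # about 2222 s, i.e. about 37 minutes
hsa_seconds = WALKS / HSA_RATE         # about 7.1 s
rate_speedup = HSA_RATE / CPU_RATE     # about 311x, from the quoted rates
time_speedup = (37 * 60) / 6           # 370x, from the quoted run times

if __name__ == "__main__":
    print(f"walks per full run: {walks_per_run:,}")
    print(f"CPU-only: {cpu_seconds / 60:.1f} min, heterogeneous: {hsa_seconds:.1f} s")
    print(f"speedup from rates: {rate_speedup:.0f}x, from times: {time_speedup:.0f}x")
```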
  • 12. Stream Processing as an HSA Appliance
      • Bundles software, firmware, and hardware into an appliance
      • Delivers the right compute resource (CPU, GPU, and FPGA) to the right process at the right time
      • Uses other system resources effectively
      • High-level abstraction: no need to code, re-train, or acquire new skillsets
      • Promotes stream processing for real-time action
          • Sources → processing → targets
          • Streaming data model enables pipelining for data flow acceleration
  • 13. Example: VelociData Solution Palette
      Throughput is given in records/second as conventional vs. VelociData.
      • Data Transformation
          • Lookup and Replace: Data enrichment by populating fields from a master file, dictionary translations, etc. (e.g. CP → Cardiopulmonologist). Conventional 3000-6000; VelociData 600,000
          • Type Conversions: XML → Fixed; Binary → Char; Date/Time Formats. Conventional 1000-2000; VelociData 800,000
          • Format Conversions: Rearrange, add, drop, merge, split, and resize fields to change layouts. Conventional 1000-10,000; VelociData 650,000
          • Key Generation: Hash multiple field values into a unique key (e.g. SHA-2). Conventional 3000-20,000; VelociData > 1,000,000
          • Data Masking: Obfuscate data for non-production uses: Persistent or Dynamic; Format preserving; AES-256. Conventional 500-10,000; VelociData > 1,000,000
      • Data Quality
          • USPS Address Processing: Standardization, verification, and cleansing (CASS certification in process). Conventional 600-2000; VelociData 400,000
          • Domain Set Validation: Validate a value based on a list of acceptable values (e.g., all product codes at a retailer; all countries in the world). Conventional 1000-3000; VelociData 750,000
          • Field Content Validation: Validates based on patterns such as emails, dates, and phone numbers. Conventional 1000-3000; VelociData > 1,000,000
          • Field Content Validation: Data type validation and bounds checking. Conventional 3000-6000; VelociData > 1,000,000
      • Data Platform Conversion
          • Mainframe Data Conversion: Copybook parsing & data layout discovery; EBCDIC, COMP, COMP-3, … → ASCII, Integer, Float, … Conventional 200-800; VelociData > 200,000
      • Data Sort
          • Accelerated Data Sort: Sort data using complex sort keys from multiple fields within records. Conventional 7000-20,000; VelociData 1,000,000
      Results are system dependent but data intended to provide magnitude comparison
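To make three of the palette entries concrete, here is a software-only sketch of key generation, domain set validation, and field content validation. It shows what the operations compute, not how the appliance performs them; the separator byte, the country set, the email pattern, and all function names are assumptions for illustration. SHA-256 is used as one member of the SHA-2 family.

```python
import hashlib
import re
from typing import AbstractSet

# Assumed separator so ("ab", "c") and ("a", "bc") hash to different keys.
FIELD_SEPARATOR = "\x1f"


def generate_key(*fields: str) -> str:
    """Key generation: hash multiple field values into one SHA-256 hex key."""
    joined = FIELD_SEPARATOR.join(fields)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


# Hypothetical domain; a real list would hold every acceptable value.
VALID_COUNTRIES = frozenset({"US", "CA", "MX"})


def in_domain(value: str, domain: AbstractSet[str] = VALID_COUNTRIES) -> bool:
    """Domain set validation: accept only values from a known list."""
    return value in domain


# Deliberately loose pattern: something@something.something, no spaces.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def looks_like_email(value: str) -> bool:
    """Field content validation: check a value against a pattern."""
    return EMAIL_PATTERN.fullmatch(value) is not None


if __name__ == "__main__":
    print(generate_key("Smith", "63101", "1970-01-01"))
    print(in_domain("US"), looks_like_email("info@velocidata.com"))
```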
  • 14. Example of Common ETL Bottlenecks
      [Diagram: Extract, Transform, Load flow. Sources: CSV, Mainframe, XML, RDBMS, Social Media, Sensor, Hadoop. Tasks #1 to #8 run on the staging DB and ETL server, with some marked “Candidates for Acceleration”. Targets: Hadoop, ETL Server, Data Warehouse, Database Appliances, BI Tools, Cloud]
  • 15. Example ETL Processes Offloaded
      • Keep Existing Input Interfaces
      • Remove Bottlenecks
      • Reduce ETL Server Workload
      • Faster Total Processing Time
      [Diagram: Extract, Transform, Load flow. Sources: CSV, Mainframe, XML, RDBMS, Social Media, Sensor, Hadoop. Tasks #6 to #8 are grouped with the staging DB and ETL server; Tasks #1 to #5 are shown separately. Targets: Hadoop, ETL Server, Data Warehouse, Database Appliances, BI Tools, Cloud]
  • 16. Example Mainframe-to-Hadoop Workflow
      • Simple, configuration-driven workflow
      • Sample shows Mainframe → HDFS
      • Data are validated, cleansed, reformatted, enriched, …, along the way
      • Enables landing analytics-ready data as fast as it can move across the wire
      • Workflow can also work in reverse to return processed data to the mainframe
      [Diagram stages: Mainframe Input, Validation, Key Generation, Formatter, Lookup, Address Standardization, CSV Out]
  • 17. Wire-rate Platform Integration
      Enable fast data access between systems
      • MPP Platforms (e.g., Teradata): Format and improve data for ready insertion into Data Analytics architectures
      • ETL Server: Preprocess data for fast movement into and out of Data Integration tools
      • Mainframe: Conversion into and out of EBCDIC and packed decimal formats
      • Hadoop: Convert data to ASCII and improve quality in flight
      VelociData feeds Hadoop pre-processed, quality data for real-time BI efforts
      VelociData enables real-time data access by Teradata for operational analytics
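For readers unfamiliar with the mainframe formats mentioned here, the sketch below decodes an EBCDIC text field and a packed decimal (COMP-3) field in plain Python. It assumes code page 037 for EBCDIC and the usual packed layout of two digits per byte with the sign in the final nibble; the function names and the `scale` parameter are illustrative, and a real conversion would take field layouts from the COBOL copybook.

```python
from decimal import Decimal


def ebcdic_to_text(raw: bytes, codec: str = "cp037") -> str:
    """Decode an EBCDIC character field (code page 037 assumed)."""
    return raw.decode(codec)


def unpack_comp3(raw: bytes, scale: int = 0) -> Decimal:
    """Decode a packed decimal field; `scale` is the number of implied decimals."""
    if not raw:
        raise ValueError("empty packed decimal field")
    digits = []
    for byte in raw[:-1]:
        digits.append(byte >> 4)      # high nibble is one digit
        digits.append(byte & 0x0F)    # low nibble is the next digit
    digits.append(raw[-1] >> 4)       # last byte: one digit ...
    sign_nibble = raw[-1] & 0x0F      # ... followed by the sign nibble
    if any(d > 9 for d in digits) or sign_nibble < 0x0A:
        raise ValueError("not a valid packed decimal field")
    value = int("".join(str(d) for d in digits))
    if sign_nibble in (0x0B, 0x0D):   # 0xB and 0xD mark negative values
        value = -value
    return Decimal(value).scaleb(-scale)


if __name__ == "__main__":
    print(ebcdic_to_text(b"\xc8\xc5\xd3\xd3\xd6"))
    print(unpack_comp3(bytes([0x12, 0x34, 0x5D]), scale=2))
```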
  • 18. Enabling Three Layers of Data Access
      • VelociData delivers Hadoop pre-processed, quality data to keep “the lake” clean
      • VelociData enables real-time data access for immediate analytics and visualization
      • VelociData feeds databases and warehouses pre-analytic, aggregated data for operational analytics
      • Sensors
      • Weblogs
      • Transactions
      • Mainframe
      • Hadoop
      • Social Media
      • RDBMS
      • …
      Wire-rate transformations and convergence of fresh and historical data
  • 19. Accessing Realtime and Historical Data
      • Realtime Analysis for Competitive Advantage
          • Enabling the speed of business to match business opportunities
      • Integrating Historical Data for Operational Excellence
          • Informing traditional BI with real-time inputs
      [Diagram labels: Conventional Batch-oriented BI; Real-time Operational Analytics; Iterative Modeling; Business Excellence]
  • 20. Stream Processing AND Hadoop
      Leveraging stream processing with batch-oriented Hadoop
      • Access to more data for analytics
      • Process data on ingest (also land raw data if desired)
          • Transformation
          • Cleansing
          • Security
      • Never read a COBOL copybook again
      • Stream sort for integrating data, aggregation, and dedupe
      • …
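One way to picture "stream sort for integrating data, aggregation, and dedupe" is a merge of already-sorted streams followed by removal of adjacent duplicates on a multi-field key. The sketch below uses Python's `heapq.merge` for that; the record layout, the key fields, and the function names are assumptions for illustration, and the talk does not say the appliance works this way.

```python
import heapq
from typing import Callable, Iterable, Iterator, Tuple

Record = Tuple[str, str, str]  # hypothetical layout: (last_name, zip_code, origin)


def customer_key(record: Record) -> Tuple[str, str]:
    """Multi-field sort key built from the first two fields."""
    return (record[0], record[1])


def merge_sorted_streams(streams: Iterable[Iterable[Record]],
                         key: Callable[[Record], tuple]) -> Iterator[Record]:
    """Lazily merge streams that are each already sorted by `key`."""
    return heapq.merge(*streams, key=key)


def dedupe_sorted(records: Iterable[Record],
                  key: Callable[[Record], tuple]) -> Iterator[Record]:
    """Keep the first record for each key; input must be sorted by `key`."""
    previous = object()  # sentinel that never equals a real key
    for record in records:
        current = key(record)
        if current != previous:
            yield record
            previous = current


if __name__ == "__main__":
    fresh = [("adams", "63101", "fresh"), ("smith", "63130", "fresh")]
    historical = [("adams", "63101", "old"), ("baker", "63105", "old")]
    merged = merge_sorted_streams([fresh, historical], customer_key)
    for row in dedupe_sorted(merged, customer_key):
        print(row)
```

Because `heapq.merge` is stable across its inputs, listing the fresh stream first means the fresh copy of a duplicated key is the one that survives.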
  • 21. Examples of Data Challenges Being Solved
      • Pharmaceutical discovery query is reduced from 8 days to 20 minutes
      • Retailer now integrates full customer data from in-store, on-line, and mobile sources in real time (processing 50,000 records/s, up from 100/s)
      • Property casualty company shortens by five-fold a daily task of processing 540 million records to enable more accurate real-time quoting
      • Credit card company reduces mainframe costs and improves analytics performance by integrating historical and fresh data into Hadoop at line rates
      • Financial processing network masks 5 million fields/s of production data to sell opportunity information to retailers
      • To enable better customer support, a health benefits provider shortens a data integration process from 16 hours to 45 seconds
      • Billions of records with multi-field keys are sorted at nearly a million records/s for analytics and data quality
      • USPS address standardization at 10 billion/hour for data cleansing on ingest