Teradata Proprietary and Confidential
BIG DATA ANALYTICS MEANS
“IN-DATABASE” ANALYTICS
Dr. Bruce Aldridge
Sr. Industry Consultant
Hi-Tech Manufacturing
Teradata
760.458.1376
bruce.aldridge@teradata.com
2 7/30/2013 Teradata Confidential
Overview of Topics
• “Big Data” Analytics
> The problems of extreme data
> Key principles for analytic engines
• Analytic Technologies
> Changing from sequential to parallel
> Design for analytics
• Operationalizing Analytics
> Analytic life cycle management
> Visualization / interacting
What is “Big Data”?
Big Data: any information that’s too fast,
too large or doesn’t fit what you are using
Data Explosion
> Automation of equipment
and business processes
> Sensor integration
> Communication
(networks / web)
> Compliance
Using Big Data
• Collecting data and using
data are different things
> Data Lakes serve as high-volume, low-cost repositories for collection
> Data may be semi-structured or structured; the conversion frequently happens within the repository
> Large amounts of data may be stored for reporting, compliance or investigations
• Unusual or new events provide learning
(Most “big data” will not provide new
information or knowledge)
Guidelines for Big Data
• Collecting ≠ learning ≠ Using data
> Data stored on the appropriate system for its use
> Data mining and statistical tools for learning
> Model publication (PMML) and monitoring for deployment
> Visualization tools critical for all
Extreme data brings new challenges
• New techniques to limit variables for
analysis / modeling
• Emergence of columnar analytics
• Wealth of data results in
more variables than
responses
$y_m = f(x_1, x_2, x_3, x_4, \ldots, x_n)$, where $n > m$
• Data organization struggles
with wide data
(>100,000 columns)
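Why "more variables than responses" forces a limit on variables can be seen in the linear special case. The slide's $f$ is general, so the linear form, the matrix $X$ and the coefficient vector $\boldsymbol{\beta}$ below are assumptions made only for illustration:

```latex
\[
  \mathbf{y} = X\boldsymbol{\beta}, \qquad
  \mathbf{y} \in \mathbb{R}^{m},\quad
  X \in \mathbb{R}^{m \times n},\quad
  \boldsymbol{\beta} \in \mathbb{R}^{n},\quad
  n > m
\]
\[
  \mathrm{rank}(X) \le m < n
  \;\Longrightarrow\;
  X^{\top}X \in \mathbb{R}^{n \times n} \text{ is singular}
\]
```

With $X^{\top}X$ singular, the least-squares normal equations have no unique solution, so some form of variable selection or regularization is needed before a model can be fitted.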
Id  V1   V2   V3  V4  V5   V6  V7   V8   V9
AA  1.2  3.1  41  56  ‘a’  9   0.2  ?    ?
AB  0.9  2.7  41  62  ‘a’  8   0.2  1.1  7
BA  1.0  2.9  42  57  ‘b’  9   0.1  1.1  ?

“pivot”

Id  Col ID  Val
AA  V1      1.2
AA  V3      41
AB  V1      0.9
AB  V8      1.1
AB  V2      2.7

Id  V1   V2   V3  V4  V5   V6  V7   V8   V9
CA  1.2  3.1  41  56  ‘a’  9   0.2  ?    ?
CB  0.9  2.7  41  62  ‘a’  8   0.2  1.1  7
BB  1.0  2.9  42  57  ‘b’  9   0.1  1.1  ?

Id  Col ID  Val
AA  V1      1.2
AA  V3      41
AB  V1      0.9
AB  V8      1.1
AB  V2      2.7
CA  V4      56
CB  V3      41
BB  V1      1.0

Multiple tables add more rows
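The pivot from wide rows to (Id, Col ID, Val) triples can be sketched in a few lines. This is a minimal illustration rather than the deck's implementation: the helper name `wide_to_long`, the shortened sample rows and the use of "?" as the missing-value marker are assumptions taken from the example tables.

```python
# Hypothetical helper; "?" marks a missing cell, as in the slide's tables.
MISSING = "?"

def wide_to_long(rows, id_col="Id"):
    """Turn wide rows (one dict per Id) into (Id, Col ID, Val) triples,
    skipping cells that hold the missing marker."""
    long_rows = []
    for row in rows:
        for col, val in row.items():
            if col == id_col or val == MISSING:
                continue
            long_rows.append((row[id_col], col, val))
    return long_rows

# Shortened versions of the slide's two wide tables.
table_1 = [
    {"Id": "AA", "V1": 1.2, "V2": 3.1, "V3": 41, "V8": "?", "V9": "?"},
    {"Id": "AB", "V1": 0.9, "V2": 2.7, "V3": 41, "V8": 1.1, "V9": 7},
]
table_2 = [
    {"Id": "CA", "V1": 1.2, "V4": 56, "V8": "?"},
]

# A second table only appends rows; the three-column layout never changes.
long_table = wide_to_long(table_1) + wide_to_long(table_2)
for triple in long_table:
    print(triple)
```

Missing cells simply produce no row, which is why the long layout copes with very wide, sparse data.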
Technology Requirements for
“Big Data” Analytics
• Need for large amounts of data storage
• Ability to get at the data (SQL)
• Availability of tools for
> Visualization
> Characterizing, organizing
and cleaning data
> Summarizing (descriptive statistics)
> Analyzing (predictive models,
data discovery)
> Monitoring & reporting
• Analytic Fault Tolerance (massive
systems imply more failures)
• Dynamic growth – ability to add more
capability without “starting over” –
mixing technologies
• ROI
Analytic Tools
• Faster analytics require a different approach – Parallel
> Sequential processing will be limited
> Parallel analytics distributes calculations across multiple nodes
with each node having the data necessary
> Management of calculation (distribution) and collection
• Data is generally stored across multiple nodes, so there is no choice but to bring the analytics to the data.
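The idea of distributing a calculation and then collecting the results can be sketched as follows. It is an illustration under stated assumptions, not a description of any product: threads stand in for nodes, and `local_summary`, `combine` and the sample data are hypothetical names and values.

```python
from concurrent.futures import ThreadPoolExecutor

def local_summary(values):
    """Runs where the data lives: reduce one node's rows to three numbers."""
    return (len(values), sum(values), sum(v * v for v in values))

def combine(summaries):
    """Runs once on the coordinator: merge per-node summaries into the
    global count, mean and (population) variance."""
    n = sum(s[0] for s in summaries)
    total = sum(s[1] for s in summaries)
    total_sq = sum(s[2] for s in summaries)
    mean = total / n
    variance = total_sq / n - mean * mean
    return n, mean, variance

# Hypothetical data, already split across three "nodes".
node_data = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0, 7.0, 8.0, 9.0]]

with ThreadPoolExecutor(max_workers=3) as pool:
    summaries = list(pool.map(local_summary, node_data))

n, mean, variance = combine(summaries)
print(n, mean, variance)
```

Only three numbers per node cross the network, regardless of how many rows each node holds. The sum-of-squares formula is used here for brevity; it can lose precision on large or badly scaled data.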
[Diagram labels: Data; Analytic Modeling Tools; Business Results; Local Data repository; Parallel Analytic Procedures; Simple reporting / management tools; Data]
Putting it all together: Analytic Architecture
[Architecture diagram labels]
Tools: LANGUAGES | MATH & STATS | DATA MINING | BUSINESS INTELLIGENCE | APPLICATIONS
Platforms: FLEXIBLE ANALYTIC / DISCOVERY PLATFORM; REPORTING / MONITOR SYSTEM OF RECORD - DATA WAREHOUSE
Storage: LOW COST – HIGH CAPACITY PARALLEL DATA LAKE (CAPTURE | STORE | REFINE)
Data sources: AUDIO & VIDEO | IMAGES | TEXT | WEB & SOCIAL | MACHINE LOGS | CRM | SCM | ERP
Environment for:
• Low Cost high capacity storage
• High power analytics
• Fault tolerant high performance reporting
• Exploration / visualization across all areas
[Diagram label: Visualization / exploration]
Analytics Process
• Data Preparation: Transform, clean and aggregate data to form a data set suitable for analysis
• Data Exploration: Explore all data with statistical profiling and visualization
• Understand / Model the data: Apply mathematical / relational models to test hypotheses about the data
• Monitor / Model Deployment: Deploy statistical model to run routinely, automatically monitoring for control
[Diagram labels: Sample Data; Build ADS; Modeling ADS; Production ADS; Automated process; SQL In-dbs Function; PMML or UDF Models]
Analytic discovery process
CRISP-DM: Cross-Industry Standard Process for Data Mining
• Business / Data understanding
> Defining objectives and requirements of the business
> Data collection and data profiling / characterization
• Data preparation: joins between tables, attribute selection, cleaning, building new values
• Modeling: analytic algorithms applied and parameters adjusted
• Evaluation: results scored according to objectives and requirements
• Deployment: models and parameters put into on-demand or automatic execution on new data
Analysis – The generation of knowledge
Generation of knowledge is iterative and interactive
• An idea related to a problem or observation is formulated
• Data is collected to support or refute the idea (deduction – what kind of
data is necessary?)
• Analysis is made on the data to validate or refute (induction)
• Results either support / reject idea or suggest modifications
Monitoring
• Known analytic models used for prediction / verification
• Adjust / control based on prediction vs. observation
• Business scoring used to prioritize
[Diagram labels: Data (facts, phenomena); Idea (model, hypothesis, theory, conjecture); Validate; Revise; Monitor / control]
Establishing a Robust Environment
Quality Information
• Master Data Management
• Data Profiling / visualization
• Logical/Physical Model
• Data “correction”
• Data Steward/Cleansing processes
Discovery
• Statistical / Data Mining Tools
• Secure access
• Robust Analysis capabilities
• Visualization / understanding
• Clear and significant results
• Flexibility in data and models
Automation and Alerts
• Simple publication of discovery knowledge
• Automated pattern/anomaly detection
• Business scoring for notification and escalation
• Clear communication of results
Visualization and Reports
• Choice of the tools to match needs (e.g. Dashboard vs. Engineering views)
• Timing and need for data refresh
• Reporting on Core or staging
• Consistent use of metrics/results (e.g. analytics in database vs. at the reporting layer)
Analytics – Key Requirements
• Performance:
> Parallel processing - true shared nothing architecture
> Data structure influences analytics (order of magnitude)
> Management of analytics and data critical
• Fault tolerance
> More nodes WILL result in more failures
> Analytic Fault Tolerance is more than database fault
tolerance – the ability to avoid restarting the analytics
• Different node performance
> Execution in parallel will never be identical – adjust for node
differences
> System expansions must be compatible
• Flexible analytics
> Big data analytics combine queries with analytic functions
> Analytic languages not parallel (in general) – need ability to
add / customize new functions
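Analytic fault tolerance, in the sense of not restarting the whole analysis when one node fails, can be sketched as a checkpointed loop. Everything here is hypothetical and only illustrates the principle: the function `run_with_checkpoints`, the `RuntimeError` used to simulate a lost node, and the in-memory dictionary standing in for durable checkpoint storage.

```python
def run_with_checkpoints(partitions, analytic, completed, max_attempts=3):
    """Run `analytic` on each partition, recording finished results in
    `completed` so a retry only touches partitions that have not finished."""
    for _ in range(max_attempts):
        pending = [pid for pid in partitions if pid not in completed]
        if not pending:
            break
        for pid in pending:
            try:
                completed[pid] = analytic(partitions[pid])
            except RuntimeError:
                # Simulated node failure: leave this partition for the next pass.
                continue
    return completed

calls = []            # every invocation of the analytic, for inspection
failed_once = set()   # remembers that the simulated failure already happened

def flaky_sum(values):
    """Hypothetical analytic that fails the first time it sees [4, 5]."""
    calls.append(tuple(values))
    if values == [4, 5] and "p2" not in failed_once:
        failed_once.add("p2")
        raise RuntimeError("node lost")
    return sum(values)

partitions = {"p1": [1, 2, 3], "p2": [4, 5], "p3": [6]}
results = run_with_checkpoints(partitions, flaky_sum, completed={})
print(results)
print(len(calls))
```

The failed partition is recomputed once, while the partitions that succeeded are never run again.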
Analytic Applications
• Existing parallel analytics
> In-database proprietary
> In-database add-ons (Fuzzy Logix, SAS, Partial R)
> Hybrid (Aster) – Database architecture supporting MapReduce functions
• Many existing applications moving to parallel execution
> SAS: Partnered with Teradata for seamless in-database execution of more analytics
> R: Partnered with Revolution R for rapid data extraction and execution of some analytics in-database AND in parallel
> Spotfire: Execution of aggregation analytics and ability to define in-database analytic functions. Embedded TERR (TIBCO Enterprise Runtime for R)
• Write your own
> MapReduce framework
> User-defined functions
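What a user-defined function looks like from the analyst's side can be shown with Python's built-in `sqlite3` module. SQLite is only a stand-in here, chosen because it ships with Python; it is not the parallel database the slides discuss, and the `value_range` aggregate and `readings` table are hypothetical.

```python
import sqlite3

class Range:
    """User-defined aggregate: max minus min of the values it is fed."""

    def __init__(self):
        self.lo = None
        self.hi = None

    def step(self, value):
        # Called once per row in the group; NULLs are ignored.
        if value is None:
            return
        self.lo = value if self.lo is None else min(self.lo, value)
        self.hi = value if self.hi is None else max(self.hi, value)

    def finalize(self):
        # Called once per group to produce the result.
        return None if self.lo is None else self.hi - self.lo

conn = sqlite3.connect(":memory:")
conn.create_aggregate("value_range", 1, Range)
conn.execute("CREATE TABLE readings (unit TEXT, val REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("A", 1.0), ("A", 4.5), ("B", 2.0), ("B", 2.5), ("B", 7.0)],
)
rows = conn.execute(
    "SELECT unit, value_range(val) FROM readings GROUP BY unit ORDER BY unit"
).fetchall()
print(rows)
```

Once registered, the function is called from ordinary SQL, so the calculation runs inside the database engine next to the data rather than in a separate tool.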
Analytic Libraries and Enhancements
Database built in:
• Descriptive Statistics
• Basic data mining models
(regression, cluster, trees, PCA)
• User defined functions
Partners
• Revolution R, SAS, Fuzzy
Logix, Spotfire, …
Enhancements
• High Speed connections
• “Native” data storage
Dashboard as an Analytic Tool
• The Dashboard becomes a 2-way interface
• User interaction parameterizes and launches new analytics
[Dashboard screenshot labels: Device; Lot; Raw Data; Wafer]
Integration of “Dashboard”
• Reporting / visualization tool with ability to execute custom
functions in-database
> Empower all users - ability to publish in-database analytics to users
Monitor Analytics
• Analytic models are generally published as SQL-compatible queries
• Applying models to data involves:
> Gather and format data for analysis
> Group data into consistent sets
> Screen data
> Apply algorithms
> Evaluate results
• Complete Sequence applied to
> Massive amounts of analyses
> Repetitive / automated analyses
> Scoring / Triage to identify most
significant results
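The sequence above (publish a model as SQL, then gather, group, screen, apply, evaluate and triage) can be sketched end to end. This is a minimal illustration under stated assumptions: SQLite stands in for the database, and the `telemetry` table, its column names, the linear model coefficients and the alert threshold of 1.0 are all hypothetical.

```python
import sqlite3

def model_to_sql(intercept, coefficients):
    """Render a linear model as a SQL expression over column names.
    Column names are trusted identifiers here, not user input."""
    terms = [repr(float(intercept))]
    terms += [f"({float(w)!r} * {col})" for col, w in coefficients.items()]
    return " + ".join(terms)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE telemetry "
    "(unit TEXT, engine_load REAL, oil_temp REAL, observed REAL)"
)
conn.executemany("INSERT INTO telemetry VALUES (?, ?, ?, ?)", [
    ("U1", 1.0, 2.0, 8.0),
    ("U1", 2.0, 2.0, 10.0),
    ("U2", 1.0, 4.0, 20.0),
    ("U2", 2.0, None, 9.0),   # incomplete row, removed by the screen
])

predicted = model_to_sql(1.0, {"engine_load": 2.0, "oil_temp": 2.5})

query = f"""
    SELECT unit, worst_residual
    FROM (
        SELECT unit,
               MAX(ABS(observed - ({predicted}))) AS worst_residual  -- apply
        FROM telemetry
        WHERE engine_load IS NOT NULL AND oil_temp IS NOT NULL       -- screen
        GROUP BY unit                                                -- group
    )
    WHERE worst_residual > ?                                         -- evaluate
    ORDER BY worst_residual DESC                                     -- triage
"""
rows = conn.execute(query, (1.0,)).fetchall()
print(predicted)
print(rows)
```

Because the model is just an expression inside the query, scoring happens in the database and only the flagged units are returned.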
An Analytic Monitor approach
[Diagram components: Stage Data; Standard ETL; Core EDW Data Model; Group Description Table; Group Instance Table; View for Instances; Views for Data (Group Instance); Analytic procedures; Work Tables (optional); Result Table; Workflow Status / Control Table; Alert settings; Reporting, BI & Alert management tools]
• User direct edit of Group Description Table (very infrequent)
• Creation of Views to evaluate load data (installation / dba level user)
• ETL starts Analytic Stored Procedure and verifies completion
Analytic procedures
1) Update Group Instance Table with new / changed data
2) Identify new & core data for required calculations
3) Perform calculations with standard or custom libraries
4) Compare results to business rules or statistical tests
5) Update alert and report flags
Statistical summaries vs……
Powerful in-DB Analysis enables the use of Simple BI Tools
Data Graphs
Powerful in-DB Analysis enables the use of Simple BI Tools
Analysis of Big Data: 76M rows of telemetry (16 of 159 plots/units shown) graphed and stored in-database for evaluation.
[Plot: Engine hrs. vs. date]
Summary
• Analytics on “Big Data” will require one or more high
performance (parallel) systems connected to an
interactive interface
• Support combinations of high volumes (data
lakes), high performance and flexible advanced
analytics
• Tools for understanding, cleansing, discovery AND monitoring are necessary
• Interactive visualization support across all systems
• Management of analytics and fault control

Mais conteúdo relacionado

Mais procurados

Oil and gas big data edition
Oil and gas  big data editionOil and gas  big data edition
Oil and gas big data editionMark Kerzner
 
Next generation Polyglot Architectures using Neo4j by Stefan Kolmar
Next generation Polyglot Architectures using Neo4j by Stefan KolmarNext generation Polyglot Architectures using Neo4j by Stefan Kolmar
Next generation Polyglot Architectures using Neo4j by Stefan KolmarBig Data Spain
 
Turning an idea into a Data-Driven Production System: An Energy Load Forecas...
 Turning an idea into a Data-Driven Production System: An Energy Load Forecas... Turning an idea into a Data-Driven Production System: An Energy Load Forecas...
Turning an idea into a Data-Driven Production System: An Energy Load Forecas...Big Data Spain
 
San Antonio’s electric utility making big data analytics the business of the ...
San Antonio’s electric utility making big data analytics the business of the ...San Antonio’s electric utility making big data analytics the business of the ...
San Antonio’s electric utility making big data analytics the business of the ...DataWorks Summit
 
Achieving a 360 degree view of manufacturing
Achieving a 360 degree view of manufacturingAchieving a 360 degree view of manufacturing
Achieving a 360 degree view of manufacturingDataWorks Summit
 
Getting the most out of Tibco Spotfire
Getting the most out of Tibco SpotfireGetting the most out of Tibco Spotfire
Getting the most out of Tibco SpotfireHerwig Van Marck
 
Pouring the Foundation: Data Management in the Energy Industry
Pouring the Foundation: Data Management in the Energy IndustryPouring the Foundation: Data Management in the Energy Industry
Pouring the Foundation: Data Management in the Energy IndustryDataWorks Summit
 
Consumption based analytics enabled by Data Virtualization
Consumption based analytics enabled by Data VirtualizationConsumption based analytics enabled by Data Virtualization
Consumption based analytics enabled by Data VirtualizationDenodo
 
Artificial Intelligence and Analytic Ops to Continuously Improve Business Out...
Artificial Intelligence and Analytic Ops to Continuously Improve Business Out...Artificial Intelligence and Analytic Ops to Continuously Improve Business Out...
Artificial Intelligence and Analytic Ops to Continuously Improve Business Out...DataWorks Summit
 
Software engineering practices for the data science and machine learning life...
Software engineering practices for the data science and machine learning life...Software engineering practices for the data science and machine learning life...
Software engineering practices for the data science and machine learning life...DataWorks Summit
 
Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...
Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...
Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...Seeling Cheung
 
Open Source in the Energy Industry - Creating a New Operational Model for Dat...
Open Source in the Energy Industry - Creating a New Operational Model for Dat...Open Source in the Energy Industry - Creating a New Operational Model for Dat...
Open Source in the Energy Industry - Creating a New Operational Model for Dat...DataWorks Summit
 
Enabling DataOps with Unified Data Lineage
Enabling DataOps with Unified Data LineageEnabling DataOps with Unified Data Lineage
Enabling DataOps with Unified Data LineageMANTA
 
The Single Most Important Formula for Business Success
The Single Most Important Formula for Business SuccessThe Single Most Important Formula for Business Success
The Single Most Important Formula for Business SuccessDataWorks Summit
 
IIoT + Predictive Analytics: Solving for Disruption in Oil & Gas and Energy &...
IIoT + Predictive Analytics: Solving for Disruption in Oil & Gas and Energy &...IIoT + Predictive Analytics: Solving for Disruption in Oil & Gas and Energy &...
IIoT + Predictive Analytics: Solving for Disruption in Oil & Gas and Energy &...DataWorks Summit
 
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATATIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATAHortonworks
 
Global Data Management – a practical framework to rethinking enterprise, oper...
Global Data Management – a practical framework to rethinking enterprise, oper...Global Data Management – a practical framework to rethinking enterprise, oper...
Global Data Management – a practical framework to rethinking enterprise, oper...DataWorks Summit
 
VP of WW Partners by Alan Chhabra
VP of WW Partners by Alan ChhabraVP of WW Partners by Alan Chhabra
VP of WW Partners by Alan ChhabraBig Data Spain
 

Mais procurados (20)

Oil and gas big data edition
Oil and gas  big data editionOil and gas  big data edition
Oil and gas big data edition
 
Next generation Polyglot Architectures using Neo4j by Stefan Kolmar
Next generation Polyglot Architectures using Neo4j by Stefan KolmarNext generation Polyglot Architectures using Neo4j by Stefan Kolmar
Next generation Polyglot Architectures using Neo4j by Stefan Kolmar
 
Turning an idea into a Data-Driven Production System: An Energy Load Forecas...
 Turning an idea into a Data-Driven Production System: An Energy Load Forecas... Turning an idea into a Data-Driven Production System: An Energy Load Forecas...
Turning an idea into a Data-Driven Production System: An Energy Load Forecas...
 
San Antonio’s electric utility making big data analytics the business of the ...
San Antonio’s electric utility making big data analytics the business of the ...San Antonio’s electric utility making big data analytics the business of the ...
San Antonio’s electric utility making big data analytics the business of the ...
 
Big Data Application Architectures - Fraud Detection
Big Data Application Architectures - Fraud DetectionBig Data Application Architectures - Fraud Detection
Big Data Application Architectures - Fraud Detection
 
Achieving a 360 degree view of manufacturing
Achieving a 360 degree view of manufacturingAchieving a 360 degree view of manufacturing
Achieving a 360 degree view of manufacturing
 
Getting the most out of Tibco Spotfire
Getting the most out of Tibco SpotfireGetting the most out of Tibco Spotfire
Getting the most out of Tibco Spotfire
 
Pouring the Foundation: Data Management in the Energy Industry
Pouring the Foundation: Data Management in the Energy IndustryPouring the Foundation: Data Management in the Energy Industry
Pouring the Foundation: Data Management in the Energy Industry
 
Consumption based analytics enabled by Data Virtualization
Consumption based analytics enabled by Data VirtualizationConsumption based analytics enabled by Data Virtualization
Consumption based analytics enabled by Data Virtualization
 
Artificial Intelligence and Analytic Ops to Continuously Improve Business Out...
Artificial Intelligence and Analytic Ops to Continuously Improve Business Out...Artificial Intelligence and Analytic Ops to Continuously Improve Business Out...
Artificial Intelligence and Analytic Ops to Continuously Improve Business Out...
 
Software engineering practices for the data science and machine learning life...
Software engineering practices for the data science and machine learning life...Software engineering practices for the data science and machine learning life...
Software engineering practices for the data science and machine learning life...
 
Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...
Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...
Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...
 
Open Source in the Energy Industry - Creating a New Operational Model for Dat...
Open Source in the Energy Industry - Creating a New Operational Model for Dat...Open Source in the Energy Industry - Creating a New Operational Model for Dat...
Open Source in the Energy Industry - Creating a New Operational Model for Dat...
 
Enabling DataOps with Unified Data Lineage
Enabling DataOps with Unified Data LineageEnabling DataOps with Unified Data Lineage
Enabling DataOps with Unified Data Lineage
 
The Single Most Important Formula for Business Success
The Single Most Important Formula for Business SuccessThe Single Most Important Formula for Business Success
The Single Most Important Formula for Business Success
 
IIoT + Predictive Analytics: Solving for Disruption in Oil & Gas and Energy &...
IIoT + Predictive Analytics: Solving for Disruption in Oil & Gas and Energy &...IIoT + Predictive Analytics: Solving for Disruption in Oil & Gas and Energy &...
IIoT + Predictive Analytics: Solving for Disruption in Oil & Gas and Energy &...
 
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATATIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
 
CS-Op Analytics
CS-Op AnalyticsCS-Op Analytics
CS-Op Analytics
 
Global Data Management – a practical framework to rethinking enterprise, oper...
Global Data Management – a practical framework to rethinking enterprise, oper...Global Data Management – a practical framework to rethinking enterprise, oper...
Global Data Management – a practical framework to rethinking enterprise, oper...
 
VP of WW Partners by Alan Chhabra
VP of WW Partners by Alan ChhabraVP of WW Partners by Alan Chhabra
VP of WW Partners by Alan Chhabra
 

Destaque

In-Database Analytics Deep Dive with Teradata and Revolution
In-Database Analytics Deep Dive with Teradata and RevolutionIn-Database Analytics Deep Dive with Teradata and Revolution
In-Database Analytics Deep Dive with Teradata and RevolutionRevolution Analytics
 
Data Science as a Commodity: Use MADlib, R, & other OSS Tools for Data Scienc...
Data Science as a Commodity: Use MADlib, R, & other OSS Tools for Data Scienc...Data Science as a Commodity: Use MADlib, R, & other OSS Tools for Data Scienc...
Data Science as a Commodity: Use MADlib, R, & other OSS Tools for Data Scienc...Sarah Aerni
 
Analytics Environment
Analytics EnvironmentAnalytics Environment
Analytics EnvironmentYuu Kimy
 
About alteryx
About alteryxAbout alteryx
About alteryxYuu Kimy
 
In-Database Predictive Analytics
In-Database Predictive AnalyticsIn-Database Predictive Analytics
In-Database Predictive AnalyticsJohn De Goes
 
Io tビジネスモデルに関する考察20161119
Io tビジネスモデルに関する考察20161119Io tビジネスモデルに関する考察20161119
Io tビジネスモデルに関する考察20161119Keiichiro Nabeno
 
[db tech showcase Sapporo 2015] C15:商用RDBをOSSへ Oracle to Postgres 徹底解説 by 株式会...
[db tech showcase Sapporo 2015] C15:商用RDBをOSSへ Oracle to Postgres 徹底解説 by 株式会...[db tech showcase Sapporo 2015] C15:商用RDBをOSSへ Oracle to Postgres 徹底解説 by 株式会...
[db tech showcase Sapporo 2015] C15:商用RDBをOSSへ Oracle to Postgres 徹底解説 by 株式会...Insight Technology, Inc.
 
データからインサイト そして、アイデアの発想へ(CJM/POV/HMW)
データからインサイト そして、アイデアの発想へ(CJM/POV/HMW)データからインサイト そして、アイデアの発想へ(CJM/POV/HMW)
データからインサイト そして、アイデアの発想へ(CJM/POV/HMW)Masanori Kado
 
Pivotal Data Warehouse in the Age of Digital Transformation
Pivotal Data Warehouse in the Age of Digital TransformationPivotal Data Warehouse in the Age of Digital Transformation
Pivotal Data Warehouse in the Age of Digital TransformationVMware Tanzu
 
The ninja elephant, scaling the analytics database in Transwerwise
The ninja elephant, scaling the analytics database in TranswerwiseThe ninja elephant, scaling the analytics database in Transwerwise
The ninja elephant, scaling the analytics database in TranswerwiseFederico Campoli
 
アウトプットし続ける技術〜毎日書くためのマインドセットとスキルセット
アウトプットし続ける技術〜毎日書くためのマインドセットとスキルセットアウトプットし続ける技術〜毎日書くためのマインドセットとスキルセット
アウトプットし続ける技術〜毎日書くためのマインドセットとスキルセットMasanori Saito
 

Destaque (12)

In-Database Analytics Deep Dive with Teradata and Revolution
In-Database Analytics Deep Dive with Teradata and RevolutionIn-Database Analytics Deep Dive with Teradata and Revolution
In-Database Analytics Deep Dive with Teradata and Revolution
 
Data Science as a Commodity: Use MADlib, R, & other OSS Tools for Data Scienc...
Data Science as a Commodity: Use MADlib, R, & other OSS Tools for Data Scienc...Data Science as a Commodity: Use MADlib, R, & other OSS Tools for Data Scienc...
Data Science as a Commodity: Use MADlib, R, & other OSS Tools for Data Scienc...
 
Analytics Environment
Analytics EnvironmentAnalytics Environment
Analytics Environment
 
About alteryx
About alteryxAbout alteryx
About alteryx
 
In-Database Predictive Analytics
In-Database Predictive AnalyticsIn-Database Predictive Analytics
In-Database Predictive Analytics
 
Io tビジネスモデルに関する考察20161119
Io tビジネスモデルに関する考察20161119Io tビジネスモデルに関する考察20161119
Io tビジネスモデルに関する考察20161119
 
[db tech showcase Sapporo 2015] C15:商用RDBをOSSへ Oracle to Postgres 徹底解説 by 株式会...
[db tech showcase Sapporo 2015] C15:商用RDBをOSSへ Oracle to Postgres 徹底解説 by 株式会...[db tech showcase Sapporo 2015] C15:商用RDBをOSSへ Oracle to Postgres 徹底解説 by 株式会...
[db tech showcase Sapporo 2015] C15:商用RDBをOSSへ Oracle to Postgres 徹底解説 by 株式会...
 
データからインサイト そして、アイデアの発想へ(CJM/POV/HMW)
データからインサイト そして、アイデアの発想へ(CJM/POV/HMW)データからインサイト そして、アイデアの発想へ(CJM/POV/HMW)
データからインサイト そして、アイデアの発想へ(CJM/POV/HMW)
 
Pivotal Data Warehouse in the Age of Digital Transformation
Pivotal Data Warehouse in the Age of Digital TransformationPivotal Data Warehouse in the Age of Digital Transformation
Pivotal Data Warehouse in the Age of Digital Transformation
 
ビジネスモデルの作り方
ビジネスモデルの作り方ビジネスモデルの作り方
ビジネスモデルの作り方
 
The ninja elephant, scaling the analytics database in Transwerwise
The ninja elephant, scaling the analytics database in TranswerwiseThe ninja elephant, scaling the analytics database in Transwerwise
The ninja elephant, scaling the analytics database in Transwerwise
 
アウトプットし続ける技術〜毎日書くためのマインドセットとスキルセット
アウトプットし続ける技術〜毎日書くためのマインドセットとスキルセットアウトプットし続ける技術〜毎日書くためのマインドセットとスキルセット
アウトプットし続ける技術〜毎日書くためのマインドセットとスキルセット
 

Semelhante a BIG DATA ANALYTICS MEANS “IN-DATABASE” ANALYTICS

Webinar: The Modern Streaming Data Stack with Kinetica & StreamSets
Webinar: The Modern Streaming Data Stack with Kinetica & StreamSetsWebinar: The Modern Streaming Data Stack with Kinetica & StreamSets
Webinar: The Modern Streaming Data Stack with Kinetica & StreamSetsKinetica
 
FSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the Cloud
FSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the CloudFSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the Cloud
FSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the CloudAmazon Web Services
 
Microsoft Azure Big Data Analytics
Microsoft Azure Big Data AnalyticsMicrosoft Azure Big Data Analytics
Microsoft Azure Big Data AnalyticsMark Kromer
 
Building Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerBuilding Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerDatabricks
 
What's new in SQL Server 2016
What's new in SQL Server 2016What's new in SQL Server 2016
What's new in SQL Server 2016James Serra
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake OverviewJames Serra
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)James Serra
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)James Serra
 
Azure data analytics platform - A reference architecture
Azure data analytics platform - A reference architecture Azure data analytics platform - A reference architecture
Azure data analytics platform - A reference architecture Rajesh Kumar
 
Predictive Analytics - Big Data Warehousing Meetup
Predictive Analytics - Big Data Warehousing MeetupPredictive Analytics - Big Data Warehousing Meetup
Predictive Analytics - Big Data Warehousing MeetupCaserta
 
How a Semantic Layer Makes Data Mesh Work at Scale
How a Semantic Layer Makes  Data Mesh Work at ScaleHow a Semantic Layer Makes  Data Mesh Work at Scale
How a Semantic Layer Makes Data Mesh Work at ScaleDATAVERSITY
 
How to Architect a Serverless Cloud Data Lake for Enhanced Data Analytics
How to Architect a Serverless Cloud Data Lake for Enhanced Data AnalyticsHow to Architect a Serverless Cloud Data Lake for Enhanced Data Analytics
How to Architect a Serverless Cloud Data Lake for Enhanced Data AnalyticsInformatica
 
Rev_3 Components of a Data Warehouse
Rev_3 Components of a Data WarehouseRev_3 Components of a Data Warehouse
Rev_3 Components of a Data WarehouseRyan Andhavarapu
 
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...DATAVERSITY
 
Derfor skal du bruge en DataLake
Derfor skal du bruge en DataLakeDerfor skal du bruge en DataLake
Derfor skal du bruge en DataLakeMicrosoft
 
Alten calsoft labs analytics service offerings
Alten calsoft labs   analytics service offeringsAlten calsoft labs   analytics service offerings
Alten calsoft labs analytics service offeringsSandeep Vyas
 
Technical Deck Delta Live Tables.pdf
Technical Deck Delta Live Tables.pdfTechnical Deck Delta Live Tables.pdf
Technical Deck Delta Live Tables.pdfIlham31574
 
Got data?… now what? An introduction to modern data platforms
Got data?… now what?  An introduction to modern data platformsGot data?… now what?  An introduction to modern data platforms
Got data?… now what? An introduction to modern data platformsJamesAnderson599331
 
Datawa.re: Data warehouse design, development and support just got alot faster
Datawa.re: Data warehouse design, development and support just got alot fasterDatawa.re: Data warehouse design, development and support just got alot faster
Datawa.re: Data warehouse design, development and support just got alot fasterJohn Leonard
 

Semelhante a BIG DATA ANALYTICS MEANS “IN-DATABASE” ANALYTICS (20)

Webinar: The Modern Streaming Data Stack with Kinetica & StreamSets
Webinar: The Modern Streaming Data Stack with Kinetica & StreamSetsWebinar: The Modern Streaming Data Stack with Kinetica & StreamSets
Webinar: The Modern Streaming Data Stack with Kinetica & StreamSets
 
FSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the Cloud
BIG DATA ANALYTICS MEANS “IN-DATABASE” ANALYTICS

  • 1. BIG DATA ANALYTICS MEANS “IN-DATABASE” ANALYTICS
      Teradata Proprietary and Confidential
      Dr. Bruce Aldridge
      Sr. Industry Consultant, Hi-Tech Manufacturing, Teradata
      760.458.1376
      bruce.aldridge@teradata.com
  • 2. Overview of Topics (7/30/2013, Teradata Confidential)
      • “Big Data” Analytics
        > The problems of extreme data
        > Key principles for analytic engines
      • Analytic Technologies
        > Changing from sequential to parallel
        > Design for analytics
      • Operationalizing Analytics
        > Analytic life cycle management
        > Visualization / interacting
  • 3. What is “Big Data”?
      Big Data: any information that’s too fast, too large or doesn’t fit what you are using
      Data Explosion
        > Automation of equipment and business processes
        > Sensor integration
        > Communication (networks / web)
        > Compliance
  • 4. Using Big Data
      • Collecting data and using data are different things
        > Data lakes serve as high-volume, low-cost repositories for collection
        > Data may be semi-structured or structured; the conversion frequently happens within the repository
        > Large amounts of data may be stored for reporting, compliance or investigations
      • Unusual or new events provide learning
        (Most “big data” will not provide new information or knowledge)
  • 5. Guidelines for Big Data
      • Collecting ≠ learning ≠ using data
        > Data stored on the appropriate system for use
        > Data mining and statistical tools for learning
        > Model publication (PMML) and monitoring for deployment
        > Visualization tools critical for all
  • 6. Extreme data brings new challenges
      • New techniques to limit variables for analysis / modeling
      • Emergence of columnar analytics
      • Wealth of data results in more variables than responses
        $y_m = f(x_1, x_2, x_3, x_4, \ldots, x_n)$ where $n > m$
      • Data organization struggles with wide data (>100,000 columns)

      Wide table 1:
        Id  V1   V2   V3  V4  V5   V6  V7   V8   V9
        AA  1.2  3.1  41  56  ‘a’  9   0.2  ?    ?
        AB  0.9  2.7  41  62  ‘a’  8   0.2  1.1  7
        BA  1.0  2.9  42  57  ‘b’  9   0.1  1.1  ?

      After “pivot”:
        Id  Col ID  Val
        AA  V1      1.2
        AA  V3      41
        AB  V1      0.9
        AB  V8      1.1
        AB  V2      2.7

      Wide table 2:
        Id  V1   V2   V3  V4  V5   V6  V7   V8   V9
        CA  1.2  3.1  41  56  ‘a’  9   0.2  ?    ?
        CB  0.9  2.7  41  62  ‘a’  8   0.2  1.1  7
        BB  1.0  2.9  42  57  ‘b’  9   0.1  1.1  ?

      Multiple tables add more rows:
        Id  Col ID  Val
        AA  V1      1.2
        AA  V3      41
        AB  V1      0.9
        AB  V8      1.1
        AB  V2      2.7
        CA  V4      56
        CB  V3      41
        BB  V1      1.0
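
The “pivot” from a wide table to an (Id, Col ID, Val) table can be sketched in a few lines of Python. This is only an illustration of the idea, not the in-database implementation: the function name `pivot_to_tall` and the use of `None` for the slide's “?” (missing) cells are assumptions made for the example.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def pivot_to_tall(wide_rows: List[Row], id_col: str = "Id") -> List[Tuple[object, str, object]]:
    """Turn wide rows into (Id, Col ID, Val) triples, skipping missing cells."""
    tall = []
    for row in wide_rows:
        row_id = row[id_col]
        for col, val in row.items():
            if col == id_col or val is None:
                continue  # a missing value simply produces no row
            tall.append((row_id, col, val))
    return tall


# A few cells from the slide's first wide table; None stands in for "?".
table_1 = [
    {"Id": "AA", "V1": 1.2, "V2": 3.1, "V3": 41, "V8": None, "V9": None},
    {"Id": "AB", "V1": 0.9, "V2": 2.7, "V3": 41, "V8": 1.1, "V9": 7},
]
# A second wide table only appends rows to the tall table; no columns are added.
table_2 = [{"Id": "CA", "V1": 1.2, "V4": 56, "V8": None}]

tall = pivot_to_tall(table_1) + pivot_to_tall(table_2)
```

The tall layout keeps a fixed three-column shape however many variables exist, which is why additional source tables add rows rather than columns.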
  • 7. Technology Requirements for “Big Data” Analytics
      • Need for large amounts of data storage
      • Ability to get at the data (SQL)
      • Availability of tools for
        > Visualization
        > Characterizing, organizing and cleaning data
        > Summarizing (descriptive statistics)
        > Analyzing (predictive models, data discovery)
        > Monitoring & reporting
      • Analytic Fault Tolerance (massive systems imply more failures)
      • Dynamic growth – ability to add more capability without “starting over” – mixing technologies
      • ROI
  • 8. Analytic Tools
      • Faster analytics require a different approach: parallel
        > Sequential processing will be limited
        > Parallel analytics distributes calculations across multiple nodes, with each node holding the data it needs
        > Management of calculation (distribution) and collection
      • Because data is generally stored on multiple nodes, there is no choice but to bring the analytics to the data.
      [Diagram labels: Data; Analytic Modeling Tools; Business Results; Local Data repository; Parallel Analytic Procedures; Simple reporting / management tools; Data]
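
A minimal sketch of the distribute-then-collect pattern described on this slide: each “node” summarizes only the partition it holds, and a coordinator merges the small summaries. The thread pool is merely a stand-in for real database nodes, and the names `local_stats`, `combine` and `node_data` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Partial = Tuple[int, float, float]  # (count, sum, sum of squares)


def local_stats(partition: List[float]) -> Partial:
    """Runs where the data lives: summarize one node's partition."""
    return (len(partition), sum(partition), sum(x * x for x in partition))


def combine(partials: List[Partial]) -> Tuple[float, float]:
    """Collection step: merge per-node summaries into mean and population variance."""
    n = sum(p[0] for p in partials)
    s = sum(p[1] for p in partials)
    ss = sum(p[2] for p in partials)
    mean = s / n
    return mean, ss / n - mean * mean


# Hypothetical data already distributed over three nodes.
node_data = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]

# Each worker sees only its own partition; only the summaries travel.
with ThreadPoolExecutor(max_workers=len(node_data)) as pool:
    partials = list(pool.map(local_stats, node_data))

mean, variance = combine(partials)
```

Only three numbers per node cross the network regardless of partition size. The sum-of-squares formula is used here for brevity; a production version would likely prefer a numerically more stable merge.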
  • 9. Putting it all together: Analytic Architecture
      [Architecture diagram labels:
        Sources: AUDIO & VIDEO; IMAGES; TEXT; WEB & SOCIAL; MACHINE LOGS; CRM; SCM; ERP
        DATA LAKE (CAPTURE | STORE | REFINE): LOW COST – HIGH CAPACITY PARALLEL
        DISCOVERY PLATFORM (FLEXIBLE ANALYTIC / DISCOVERY PLATFORM): LANGUAGES; MATH & STATS; DATA MINING
        SYSTEM OF RECORD - DATA WAREHOUSE (REPORTING / MONITOR): LANGUAGES; MATH & STATS; DATA MINING; BUSINESS INTELLIGENCE; APPLICATIONS
        Visualization / exploration]
      Environment for:
      • Low cost, high capacity storage
      • High power analytics
      • Fault tolerant, high performance reporting
      • Exploration / visualization across all areas
  • 10. [Analytics process diagram]
      • Data Preparation: Transform, clean and aggregate data to form a data set suitable for analysis
      • Monitor / Model Deployment: Deploy statistical model to run routinely, automatically monitoring for control
      • Data Exploration: Explore all data with statistical profiling and visualization
      • Understand / Model the data: Apply mathematical / relational models to test hypotheses about the data
      [Diagram labels: Modeling ADS; Sample Data; Build ADS; Production ADS; Automated process; Analytics Process; SQL; In-dbs Function; PMML or UDF Models]
  • 11. Analytic discovery process
      CRISP – Cross Industry Standard Process – data mining
      • Business / Data understanding
        > Defining objectives and requirements of the business
        > Data collection and data profiling / characterization
      • Data preparation: joins between tables, attribute selection, cleaning, building new values
      • Modeling: analytic algorithms applied and parameters adjusted
      • Evaluation: results scored according to objectives and requirements
      • Deployment: models and parameters put into on-demand or automatic execution on new data
  • 12. Analysis – The generation of knowledge
      Generation of knowledge is iterative and interactive
      • An idea related to a problem or observation is formulated
      • Data is collected to support or refute the idea (deduction – what kind of data is necessary?)
      • Analysis is made on the data to validate or refute (induction)
      • Results either support / reject idea or suggest modifications
      Monitoring
      • Known analytic models used for prediction / verification
      • Adjust / control based on prediction vs. observation
      • Business scoring used to prioritize
      [Diagram labels: Data (facts, phenomena); Idea (model, hypothesis, theory, conjecture); Monitor / control; Validate; Revise]
  • 13. Establishing a Robust Environment
      Quality Information
        > Master Data Management
        > Data Profiling / visualization
        > Logical/Physical Model
        > Data “correction”
        > Data Steward/Cleansing processes
      Discovery
        > Statistical / Data Mining Tools
        > Secure access
        > Robust Analysis capabilities
        > Visualization / understanding
        > Clear and significant results
        > Flexibility in data and models
      Automation and Alerts
        > Simple publication of discovery knowledge
        > Automated pattern/anomaly detection
        > Business scoring for notification and escalation
        > Clear communication of results
      Visualization and Reports
        > Choice of the tools to match needs (e.g. Dashboard vs. Engineering views)
        > Timing and need for data refresh
        > Reporting on Core or staging
        > Consistent use of metrics/results (e.g. analytics in database vs. at the reporting layer)
  • 14. Analytics – Key Requirements
      • Performance
        > Parallel processing - true shared nothing architecture
        > Data structure influences analytics (order of magnitude)
        > Management of analytics and data critical
      • Fault tolerance
        > More nodes WILL result in more failures
        > Analytic Fault Tolerance is more than database fault tolerance – the ability to avoid restarting the analytics
      • Different node performance
        > Execution in parallel will never be identical – adjust for node differences
        > System expansions must be compatible
      • Flexible analytics
        > Big data analytics combine queries with analytic functions
        > Analytic languages not parallel (in general) – need ability to add / customize new functions
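
One way to picture “the ability to avoid restarting the analytics” is a driver that keeps the results of partitions that finished and re-runs only those that failed. The sketch below simulates this in plain Python; the function `run_with_partition_retry`, the node names and the simulated failure are all hypothetical and do not describe any particular product's mechanism.

```python
from typing import Callable, Dict, List


def run_with_partition_retry(
    partitions: Dict[str, List[float]],
    task: Callable[[str, List[float]], float],
    max_attempts: int = 3,
) -> Dict[str, float]:
    """Run task once per partition; retry only the partitions that fail."""
    results: Dict[str, float] = {}
    pending = list(partitions)
    for _ in range(max_attempts):
        still_failing = []
        for name in pending:
            try:
                results[name] = task(name, partitions[name])
            except RuntimeError:
                still_failing.append(name)  # finished partitions are kept as-is
        pending = still_failing
        if not pending:
            return results
    raise RuntimeError(f"partitions failed after {max_attempts} attempts: {pending}")


# Simulated failure: node "n2" fails on its first attempt only.
attempts: Dict[str, int] = {}


def flaky_sum(name: str, values: List[float]) -> float:
    attempts[name] = attempts.get(name, 0) + 1
    if name == "n2" and attempts[name] == 1:
        raise RuntimeError("simulated node failure")
    return sum(values)


results = run_with_partition_retry(
    {"n1": [1.0, 2.0], "n2": [3.0], "n3": [4.0, 5.0]}, flaky_sum
)
```

Nodes `n1` and `n3` run exactly once even though `n2` failed, which is the difference between analytic fault tolerance and simply restarting the whole job.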
  • 15. Analytic Applications
      • Existing parallel analytics
        > In-database proprietary
        > In-database add-ons (Fuzzy Logix, SAS, Partial R)
        > Hybrid (Aster) – Database architecture supporting MapReduce functions
      • Many existing applications moving parallel
        > SAS: Partnered with Teradata for seamless in-database execution of more analytics
        > R: Partnered with Revolution R for rapid data extraction and execution of some analytics in-database AND in parallel
        > Spotfire: Execution of aggregation analytics and ability to define in-database analytic functions. Embedded TERR (Tibco Enterprise Runtime R)
      • Write your own
        > MapReduce framework
        > User defined functions
  • 16. Analytic Libraries and Enhancements
      Database built in:
      • Descriptive Statistics
      • Basic data mining models (regression, cluster, trees, PCA)
      • User defined functions
      Partners
      • Revolution R, SAS, Fuzzy Logix, Spotfire, …
      Enhancements
      • High Speed connections
      • “Native” data storage
  • 17. Dashboard as an Analytic Tool
      • The Dashboard becomes a 2-way interface
      • User interaction parameterizes and launches new analytics
      [Dashboard figure labels: Device; Lot; Raw Data; Wafer]
  • 18. Integration of “Dashboard”
      • Reporting / visualization tool with ability to execute custom functions in-database
        > Empower all users - ability to publish in-database analytics to users
  • 19. Monitor Analytics
      • Analytic models generally are published into SQL compatible queries
      • Applying models to data involves:
        > Gather and format data for analytic
        > Group data into consistent sets
        > Screen data
        > Apply algorithms
        > Evaluate results
      • Complete sequence applied to
        > Massive amounts of analyses
        > Repetitive / automated analyses
        > Scoring / triage to identify most significant results
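
The gather, group, screen, apply, evaluate sequence, followed by scoring for triage, can be sketched as a small pipeline. Everything concrete here is an assumption for illustration: the sample readings, the `LIMIT` business rule, and the use of a simple mean in place of a published model.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Optional, Tuple

# Hypothetical raw readings: (unit id, value); None marks a missing reading.
raw: List[Tuple[str, Optional[float]]] = [
    ("u1", 10.0), ("u1", 11.0), ("u1", None),
    ("u2", 30.0), ("u2", 34.0),
    ("u3", 12.0), ("u3", 12.5),
    ("u4", 20.0), ("u4", 22.0),
]

LIMIT = 15.0  # assumed business rule: flag units whose mean exceeds this value


def monitor(readings: List[Tuple[str, Optional[float]]], limit: float) -> List[Tuple[str, float]]:
    # Gather the data and group it into consistent sets (one set per unit).
    groups: Dict[str, List[float]] = defaultdict(list)
    for unit, value in readings:
        # Screen: drop missing readings before they reach the model.
        if value is not None:
            groups[unit].append(value)
    # Apply the algorithm (a mean, standing in for a published model).
    scores = {unit: mean(values) for unit, values in groups.items()}
    # Evaluate against the rule, then rank so the largest excess comes first.
    alerts = [(unit, score - limit) for unit, score in scores.items() if score > limit]
    return sorted(alerts, key=lambda a: a[1], reverse=True)


alerts = monitor(raw, LIMIT)
```

Sorting by the size of the excess is one simple form of triage: when thousands of analyses run automatically, only the top of this list needs human attention.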
  • 20. An Analytic Monitor approach
      [Diagram labels: Standard ETL; Stage Data; Core EDW Data Model; Views for Data (Group Instance); View for Instances; Group Instance Table; Group Description Table; Work Tables (optional); Result Table; Workflow Status / Control Table; Alert settings; Reporting, BI & Alert management tools]
      • User direct edit of Group Description Table (very infrequent)
      • Creation of Views to evaluate load data (installation / dba level user)
      • ETL starts Analytic Stored Procedure and verifies completion
      Analytic procedures
        1) Update Group Instance Table with new / changed data
        2) Identify new & core data for required calculations
        3) Perform calculations with standard or custom libraries
        4) Compare results to business rules or statistical tests
        5) Update alert and report flags
  • 21. Statistical summaries vs. …
      Powerful in-DB Analysis enables the use of Simple BI Tools
  • 22. Data Graphs
      Powerful in-DB Analysis enables the use of Simple BI Tools
      Analysis of Big Data: 76M rows of Telemetry (16 of 159 plots/units shown) graphed and stored in-database for evaluation.
      [Plots: Engine hrs. vs. date]
  • 23. Summary
      • Analytics on “Big Data” will require one or more high performance (parallel) systems connected to an interactive interface
      • Support combinations of high volumes (data lakes), high performance and flexible advanced analytics
      • Tools for understanding, cleansing, discovery AND monitoring necessary
      • Interactive Visualization support across all systems
      • Management of analytics and fault control