© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Olivier Klein 奧樂凱
Emerging Technologies Solutions
Architect, Asia-Pacific
Modern Data Architectures for
Business Insights at Scale
Data analysis for a better customer experience
• Your business creates and stores
data and logs all the time
• Data points and logs allow you to
understand individual customer
experience and improve it
• Analysis of logs and trails helps
gain insights
Ever Increasing Amount of Data
Volume
Velocity
Variety
Data lifecycle stages: Generation, Collection & Storage, Analytics & Computation, Collaboration & Sharing
• Generation: more devices, lower cost, higher throughput
• Collection & Storage, Analytics & Computation, Collaboration & Sharing: highly constrained
95% of the 1.2 zettabytes
of data in the digital
universe is unstructured
70% of this is user-generated content
Unstructured data growth is explosive, with estimates
of compound annual growth (CAGR) at 62%
from 2008 to 2012.
Source: IDC
GB, TB, PB, EB, ZB
Big Data: Unconstrained data growth
Gartner: User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011
IDC: Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares
Chart: data volume from 1990 to 2020, showing the gap between generated data and data available for analysis
Cloud Computing helps remove constraints
Big Data:
• Potentially massive datasets
• Iterative, experimental style of
data manipulation and analysis
• Frequently not a steady-state
workload; peaks and valleys
• Data is a combination of
structured and unstructured
data in many formats
AWS Cloud:
• Virtually unlimited capacity
• Cost-effective iterative, experimental usage
through on-demand
infrastructure
• Fully scalable infrastructure for
highly variable workloads
• Tools & Services for managing
structured, unstructured and
stream data
Let’s talk business outcomes of data analytics!
Outcome 1 : Modernize and consolidate
• Insights to enhance business applications and
create new digital services
Outcome 2 : Innovate for new revenues
• Personalization, demand forecasting, risk analysis
Outcome 3 : Real-time engagement
• Interactive customer experience, event-driven
automation, fraud detection
Outcome 4 : Automate for expansive reach
• Automation of business processes and physical
infrastructure
Driving Business Outcomes via Data Analytics
• Amazon Redshift: Data Warehouse
• Amazon Elastic MapReduce: Semi-structured
• Amazon Simple Storage Service: Data Storage
• Amazon Glacier: Archive
• Amazon DynamoDB: NoSQL
• Amazon Machine Learning: Predictive Models
• Amazon Kinesis: Streaming
• Other Apps
Use optimal combination of interoperable services
Capabilities shown in the diagram:
1. Ingestion
2. Source data
3. Lifecycle management and cold storage
4. Metadata capture
5. Data governance, security, privacy
6. Self-service discovery, search, access
7. Managing data quality
8. Preparing for analytics
9. Orchestration and job scheduling
10. Capturing data changes
Diagram labels: Analytics, Data store target
Services shown: S3 Upload, Kinesis Firehose, DynamoDB Streams, Snowball, Snowball Edge, Snowmobile, Database Migration Service, AWS Glue, S3, EFS, DynamoDB, RDS, EBS, Glacier, EMR, Athena, Elasticsearch, Redshift, AI, Machine Learning, QuickSight
Modern Data Architecture on AWS
Insights to enhance business applications, new digital services
Technology: Backend system integration, on-prem data center extension, business application
integration, BI provisioning, data lakes, external APIs, access control and logging
Common initiatives
Insights: 360 view of the business
• Legacy data systems migration to enable self-service for business analysts
• Integration of all customer data, from orders, payments, interactions
• Supplier performance for inventory and vendor management
Digitization: Web-service that gives on-demand insights
• Delivery of digital content, with behavior tracking, and upsell (or ads)
• Ordering system for enterprise customers or consumers
Data monetization: Enrich, aggregate, and sell business data
• External data enrichment API, including digital marketing platforms
• Purchasable data sets of anonymized, domain-enriched insights
Outcome 1 : Modernize and Consolidate
Modernize and consolidate
Insights to enhance business applications, new digital services
Diagram columns: Data sources, Ingest, Serving. Diagram layers: Speed (Real-time), Scale (Batch).
Enhancing business applications and creating new digital services takes a few
steps. Business goals often include being an agile, well-run organization
and no longer missing opportunities because people make decisions
without accurate insights. These initiatives focus on giving important
personas fast and secure access to business-relevant insights.
1. Define personas and use case requirements (including UI)
Personas: business users, external buyers, data analysts
2. Locate the data sources that have the information to extract
Data sources: transactions, web logs / cookies, ERP
3. Ingest data through incremental or full loads, across secure connections
Ingest components: AWS Database Migration Service, AWS Direct Connect, AWS Storage Gateway, Internet interfaces, changed data
Fluentd: Open Source Log Collection
https://github.com/fluent/fluentd/
• Fluentd is an open source
data collector to unify data
collection and consumption
• Integration into many data
sources (App Logs, Syslogs,
Twitter etc.)
• Direct integration into AWS
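# Tail the Apache access log and tag each event
<source>
  @type tail
  format apache2
  path /var/log/apache2/access_log
  tag s3.apache.access
</source>
# Ship every event whose tag matches s3.*.* to the S3 bucket
<match s3.*.*>
  @type s3
  s3_bucket myweblogs
  path logs/
</match>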
4. Use Hadoop for large scale ETL, data quality, and preparation [*EMRFS]
Components: AWS Glue, Amazon S3 (raw data), Amazon EMR (ETL), Amazon S3 (clean data)
Amazon S3
• Highly available object storage
• Designed for 99.999999999% annual
data durability
• Replicated across 3 facilities
• Virtually unlimited scale
• Pay only for what you use; no need
to pre-provision
• Allows event notifications to trigger
further action
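As a minimal sketch of how an event notification can trigger further action, the Python below pulls the bucket and object key out of an S3 event payload, as an AWS Lambda function might. The handler name, the sample event, and the returned dictionary are illustrative assumptions, not part of the original deck.

```python
from urllib.parse import unquote_plus


def extract_s3_objects(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    objects = []
    for record in event.get("Records", []):
        s3_info = record.get("s3")
        if s3_info is None:
            continue  # skip records that did not come from S3
        bucket = s3_info["bucket"]["name"]
        # Object keys arrive URL-encoded (spaces become '+'), so decode them.
        key = unquote_plus(s3_info["object"]["key"])
        objects.append((bucket, key))
    return objects


def handler(event, context):
    """Hypothetical Lambda entry point: report the newly created objects."""
    return {"new_objects": extract_s3_objects(event)}


# Hand-written sample event for illustration; real events carry more fields.
sample_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "myweblogs"},
                "object": {"key": "logs/2015/access+log.gz"},
            },
        }
    ]
}
```

In practice the handler would start the next processing step (for example an ETL job) instead of only returning the list.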
Amazon EMR
• Amazon EMR provides fully managed
Hadoop clusters
• Transient and long running clusters
• Direct integration into Amazon S3
• Easy to scale and enable burstable
capacity
• Integration with Amazon EC2 Spot Instances
1 instance x 100 hours = 100 instances x 1 hour
(and with Spot Pricing not only faster but also cheaper)
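The arithmetic behind this claim can be sketched in a few lines of Python. The hourly prices are hypothetical placeholders, and the sketch assumes the job parallelizes perfectly and ignores cluster start-up time and billing granularity.

```python
def cluster_cost(instances, hours, price_per_instance_hour):
    """Total cost is instance-hours multiplied by the hourly price."""
    return instances * hours * price_per_instance_hour


# Hypothetical prices in USD per instance-hour (assumptions for illustration).
ON_DEMAND_PRICE = 0.25
SPOT_PRICE = 0.08

one_slow_instance = cluster_cost(1, 100, ON_DEMAND_PRICE)   # 100 instance-hours
hundred_fast = cluster_cost(100, 1, ON_DEMAND_PRICE)        # also 100 instance-hours
hundred_fast_spot = cluster_cost(100, 1, SPOT_PRICE)        # same work, Spot price

print("1 x 100 h on-demand:", one_slow_instance)
print("100 x 1 h on-demand:", hundred_fast)
print("100 x 1 h on Spot:  ", hundred_fast_spot)
```

Both on-demand runs consume the same 100 instance-hours, so they cost the same while the wide cluster finishes in one hour; the Spot run is cheaper only to the extent that the Spot price is below the on-demand price.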
Amazon EMR
• Amazon EMR supports all common
Hadoop Frameworks such as:
• Spark, Pig, Hive, Hue, Oozie …
• HBase, Presto, Impala …
• Decouples storage from compute
• Allows independent scaling
• Direct Integration with DynamoDB
and S3
5. Stage all data into centralized, highly available, durable storage for further access
Component added at this step: Amazon S3 (staged data, data lake)
6. Load semi-structured data into Hadoop, structured data into the DWH, and application data
into managed legacy application databases
Serving components: Amazon EMR (semi-structured), Amazon Redshift (data warehouse), Amazon RDS (legacy apps)
Amazon Redshift
• Fully managed petabyte-scale data
warehouse
• Scalable number of cluster nodes
• ODBC/JDBC connector for BI tools
using SQL
• Supports Amazon DynamoDB and
Amazon S3 to load data
• Less than a tenth of the cost of traditional
solutions
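To make the Amazon S3 load path concrete, here is a hedged sketch that assembles a Redshift COPY statement as a string. The table name, bucket path, and IAM role ARN are hypothetical, and the helper itself is an illustration rather than an AWS-provided function; the resulting SQL would be run through any SQL client connected to the cluster.

```python
import re

_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?$")


def build_copy_statement(table, s3_uri, iam_role_arn, data_format="CSV"):
    """Build a COPY statement that loads files under s3_uri into table."""
    if not _IDENTIFIER.match(table):
        raise ValueError(f"unexpected table name: {table!r}")
    if not s3_uri.startswith("s3://"):
        raise ValueError("s3_uri must start with s3://")
    fmt = data_format.upper()
    if fmt not in {"CSV", "PARQUET"}:
        raise ValueError(f"format not covered by this sketch: {data_format!r}")
    return (
        f"COPY {table}\n"
        f"FROM '{s3_uri}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"FORMAT AS {fmt};"
    )


# Hypothetical names, for illustration only.
statement = build_copy_statement(
    "sales",
    "s3://hypothetical-clean-data/sales/",
    "arn:aws:iam::123456789012:role/HypotheticalRedshiftLoadRole",
)
print(statement)
```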
7. Data is protected through identity and access management and logging
Components added at this step: AWS IAM, AWS CloudTrail
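As one illustration of protecting data through identity and access management, the sketch below generates an IAM policy document that grants read-only access to a single prefix of a data lake bucket. The bucket name, prefix, and statement IDs are hypothetical; attaching the policy to a user, group, or role is left out.

```python
import json


def read_only_prefix_policy(bucket, prefix):
    """Return an IAM policy allowing list and read access under one S3 prefix."""
    prefix = prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListStagedPrefix",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
                # Restrict listing to keys under the chosen prefix.
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
            {
                "Sid": "ReadStagedObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}/*"],
            },
        ],
    }


# Hypothetical bucket and prefix for the analysts' staged data.
policy = read_only_prefix_policy("hypothetical-data-lake", "staged/")
print(json.dumps(policy, indent=2))
```

Because the policy grants no write or delete actions, analysts using it can query staged data but cannot alter it; access attempts can then be audited through the logging service named in this step.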
8. Data analysts use BI tools of choice to access all serving services
Component added at this step: Amazon QuickSight
Amazon QuickSight
• Fast, cloud-powered, BI service that
makes it easy to build visualizations,
perform ad-hoc analysis, and get insights
from data.
• Connectors for files, third party platforms,
AWS services and other partner BI tools
• In-memory calculation engine (SPICE)
to accelerate analysis and visualization
• $9 per user per month
AWS Marketplace
• Pre-Configured machine images
ready to be launched into virtual
server instances
• Launch applications with 1-Click
• Pay software licenses by the
hour or bring your own license
(BYOL)
9. Business users have enterprise applications enhanced by analytics
10. External parties can buy services or data in a governed, secure way
Component added at this step: Amazon API Gateway
Modernize and consolidate: the full architecture, with Amazon Athena added
Decouple Storage and Compute
Traditionally analytical workloads
required large databases or data
warehouses, with storage and
compute close to each other
Big Data often benefits from
decoupling storage and compute
Amazon S3 offers virtually unlimited
storage at a per GB/month rate
Amazon Athena
Interactive query service that makes it
easy to analyze data in Amazon S3
using standard SQL
• No need to move data
• Query S3 directly & right away
• No infrastructure to set up & manage
• Fast results within seconds
• Pay for just the queries you run
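A minimal sketch of submitting standard SQL to Athena from Python follows. It assumes the boto3 SDK and AWS credentials for the actual call; the database, table, column names, and results bucket are hypothetical stand-ins for a flight-performance data set.

```python
def build_athena_request(sql, database, output_location):
    """Assemble keyword arguments for Athena's StartQueryExecution call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def start_query(sql, database, output_location):
    """Submit the query; requires boto3 and credentials, so it is not run here."""
    import boto3  # imported lazily so the sketch loads without the SDK

    client = boto3.client("athena")
    response = client.start_query_execution(
        **build_athena_request(sql, database, output_location)
    )
    return response["QueryExecutionId"]


# Hypothetical table and column names, for illustration only.
SAMPLE_SQL = (
    "SELECT carrier, AVG(depdelay) AS avg_departure_delay "
    "FROM flights GROUP BY carrier "
    "ORDER BY avg_departure_delay DESC LIMIT 10"
)
request = build_athena_request(
    SAMPLE_SQL, "hypothetical_db", "s3://hypothetical-results-bucket/athena/"
)
```

Athena writes the result set to the output location in S3, so the caller polls the query status and then reads the results from there.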
Athena & QuickSight Demo
Services used: Amazon S3, Amazon Athena, Amazon QuickSight
Analyze past flight performance data stored in S3
Bureau of Transportation Flight Data Statistics
www.transtats.bts.gov
Create visualizations from S3 with Athena & QuickSight
Personalization, demand forecasting, risk analysis
Technology: Advanced analytics, customer segmentations, high volume transactional data, un/semi-
structured data, design of experiment, A/B & hypothesis testing, machine learning
Common initiatives
Personalization: Refine market approaches based on optimal segments
• Offer products to new customers based on clusters of similar individuals
• Launch share of wallet initiatives, understanding likely total spend
• Targeted marketing to capture interests and increase conversion rates
Predict demand: Guide business owners to select the best scenarios
• Launch items or promotions at the optimal time to maximize response
• Modeling for store assortment, product selection, and merchandizing
• New product design, based on known market propensities
Risk measurement: Create freedom to act by quantifying exposures
• Scenario simulation to encourage investments and new offerings
• Supply chain analytics allows for faster confirmation of goods to customers
Outcome 2 : Innovate for new revenues
Innovate for new revenues
Personalization, demand forecasting, risk analysis
Diagram columns: Data sources, Ingest, Serving. Diagram layers: Speed (Real-time), Scale (Batch).
Driving net new revenues is realized by business teams that have access to
skilled analysts, using platforms that can scale up and out, without IT
bottlenecks. Organizations start operating based on what they know about
their customers, and can approach new ventures in terms of confidence
levels. Product launches, campaigns, supply chain management, packaged
services, and customized offerings are designed and executed based on
predictive models.
1. Personas involved in generating new revenues are data scientists, data
analysts (often embedded), business users, and customers/suppliers
Also shown: engagement platforms, AWS IAM, AWS CloudTrail, Amazon CloudWatch
2. Advanced analytics are built from a base of traditional data processing
Base components: transactions, ERP, AWS Direct Connect, AWS Glue, Amazon S3 (raw, staged, and clean data), Amazon EMR (ETL), Amazon EMR, Amazon Redshift, Amazon RDS
3. On-premises storage and databases are connected and converted
Components added at this step: AWS Database Migration Service, AWS Storage Gateway
4. Internet-native data sources, like web and mobile, are captured
Components added at this step: Internet interfaces, web logs / cookies
5. Streaming un/semi-structured data feeds, like social and devices, are
captured
Components added at this step: Amazon Kinesis, connected devices, social media
Stream in Real Time: Amazon Kinesis
• Real-Time Data Processing over
large distributed streams
• Elastic capacity that scales to
millions of events per second
• React in real time to incoming
stream events
• Reliable stream storage
replicated across 3 facilities
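How a large distributed stream spreads events across shards can be sketched as follows: Kinesis hashes each record's partition key with MD5 into a 128-bit integer and routes the record to the shard whose hash key range contains it. The even split of the hash space and the device names below are simplifying assumptions; real streams can have uneven ranges after resharding.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces 128-bit values


def hash_key(partition_key):
    """Map a partition key to a 128-bit integer using MD5."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def shard_ranges(shard_count):
    """Split the hash space into contiguous, roughly equal ranges."""
    step = HASH_SPACE // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * step
        # The last shard absorbs any remainder of the integer division.
        end = HASH_SPACE - 1 if i == shard_count - 1 else (i + 1) * step - 1
        ranges.append((start, end))
    return ranges


def shard_for(partition_key, ranges):
    """Return the index of the shard whose range contains the key's hash."""
    value = hash_key(partition_key)
    for index, (start, end) in enumerate(ranges):
        if start <= value <= end:
            return index
    raise ValueError("hash value outside all shard ranges")


ranges = shard_ranges(4)
for device in ("device-1", "device-2", "device-3"):  # hypothetical keys
    print(device, "-> shard", shard_for(device, ranges))
```

Records with the same partition key always land on the same shard, which preserves their order; adding shards raises capacity by narrowing each shard's range.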
6. Log files and other schemaless data are converted to Parquet and staged
Component added at this step: Amazon S3 (schemaless)
7. Data scientists test hypotheses against un/semi-structured data
Components added at this step: Amazon Athena, Amazon EMR, Amazon Elasticsearch Service
8. Simple analytical models are built against Amazon Machine Learning
Component added at this step: Amazon Machine Learning
Amazon Machine Learning
• Easy to use, managed machine
learning service built for developers
• Machine learning technology based
on Amazon’s internal systems
• Create models using data stored in
Amazon S3, Amazon RDS or Amazon
Redshift
• Request predictions in batch or in real time
9. Complex analytical models are built against EMR (Spark) clusters
Component added at this step: Amazon EMR with Spark MLlib
Apache Spark
• In-memory analytics cluster using RDD
(Resilient Distributed Dataset) for fast
processing
• Spark MLlib offers machine learning out of the box
• Apache Spark can read directly from Amazon S3
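from numpy import array
from pyspark import SparkContext
from pyspark.mllib.clustering import KMeans, KMeansModel
sc = SparkContext(appName="KMeansExample")
data = sc.textFile("s3://...")  # read space-separated features from Amazon S3
parsedData = data.map(lambda line: array([float(x) for x in line.split(' ')]))
model = KMeans.train(parsedData, 2, maxIterations=10, initializationMode="random")  # k = 2
model.save(sc, "MyModel")  # persist the trained model
sameModel = KMeansModel.load(sc, "MyModel")  # load it back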
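To show what a k-means training call computes, here is a small single-machine sketch of the standard assign-and-update loop (Lloyd's algorithm) in plain Python. It is an illustration of the technique under simplifying assumptions, not Spark MLlib's implementation, and the sample points are made up.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, max_iterations=10, seed=0):
    """Return k cluster centers found by alternating assignment and update."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # random initialization from the data
    for _ in range(max_iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centers[i]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster ended up empty.
        new_centers = [
            mean_point(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # assignments are stable, so the algorithm has converged
        centers = new_centers
    return centers


# Made-up sample: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers = sorted(kmeans(points, 2))
print(centers)
```

Spark distributes the same two steps across the cluster: the assignment runs in parallel over partitions of the RDD, and the per-cluster sums are aggregated to update the centers.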
Machine Learning Algorithms
• Classification
• Sentiment analysis – Do people like my new product?
• Linear Regression
• Trend prediction – How much revenue next month?
• Clustering
• Recommendation - Other people bought this!
• Association
• Market basket analysis – Bundled products
• Neural Networks
• Pattern recognition - Speech recognition
Platforms shown: Amazon Machine Learning, Amazon EMR + Spark MLlib, GPU-optimized EC2 instance
Intel® Processor Technologies
Intel® AVX – Dramatically increases performance for highly parallel HPC workloads
such as life science engineering, data mining, financial analysis, media processing
Intel® AES-NI – Enhances security with new encryption instructions that reduce the
performance penalty associated with encrypting/decrypting data
Intel® Turbo Boost Technology – Increases computing power with performance that
adapts to spikes in workloads
Intel® Transactional Synchronization Extensions (TSX) – Enables execution of
transactions that are independent to accelerate throughput
P state & C state control – provides granular performance tuning for cores and sleep
states to improve overall application performance
New X1 Instance - Tons of Memory
• Designed for large-scale, in-memory
applications in the cloud
• Ideal for in-memory databases like SAP
HANA and big data processing apps like
Spark and Presto
• Powered by Intel® Xeon® E7 8880 v3
Haswell processors
• Features up to 2TB of memory and up to
128 vCPUs per instance
• 8X the memory offered by any other Amazon EC2
instance
10. Predictive models are published to data staging
11. Analysts use DWH, EMR, ES to find patterns & measure performance
12. Risk models are evaluated to create new products and assess customers
13. Demand forecasts are loaded into supply chain management systems
14. Personalized offers are broadcast out over notification channels
Components added at this step: Amazon SNS, Amazon Pinpoint
Amazon SNS & Amazon Pinpoint
• Amazon SNS is a fully
managed, cross-platform
mobile push intermediary
service
• Fully scalable to millions
of devices
• Amazon Pinpoint lets you
create targeted
campaigns and measure
engagement and results
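A cross-platform push through Amazon SNS can be sketched as below: one message carries a default text plus platform-specific payloads, each encoded as a JSON string. The offer text and endpoint ARN parameter are hypothetical, and the publish call needs boto3 and AWS credentials, so only the payload builder is exercised here.

```python
import json


def build_push_message(text):
    """Build an SNS message with per-platform payloads as JSON strings."""
    return json.dumps(
        {
            "default": text,  # used for any platform without its own entry
            "APNS": json.dumps({"aps": {"alert": text}}),
            "GCM": json.dumps({"data": {"message": text}}),
        }
    )


def publish_offer(endpoint_arn, text):
    """Send the message to one device endpoint; not run in this sketch."""
    import boto3  # imported lazily so the sketch loads without the SDK

    client = boto3.client("sns")
    return client.publish(
        TargetArn=endpoint_arn,
        Message=build_push_message(text),
        MessageStructure="json",  # tells SNS to pick the payload per platform
    )


message = build_push_message("20% off your next order")  # hypothetical offer
print(message)
```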
Amazon SNS delivers through each platform's push service:
• Apple APNS: Apple iPhones and iPads (iOS)
• Google GCM: Android phones and tablets
• Amazon ADM: Kindle Fire devices
• Windows WNS and MPNS: Windows Phone devices
• Baidu CP: Android phones and tablets in China
Innovate for new revenues: the full architecture, including Amazon SNS and Amazon Pinpoint
Elastic GPUs For EC2
Use Graphics GPUs As If They Were EBS Volumes
Elastic GPUs: GPU Acceleration on-demand
GPU memory options for a current generation EC2 instance: 1 GiB, 2 GiB, 4 GiB, 8 GiB
BREAK
Next up: Real-Time Analytics and Engagement
Interactive customer experience, event-driven automation, fraud detection
Technology: Clickstream/mobile apps/sensor/video (computer vision)/audio (intent comprehension), event
detection and pipelining, in-line scoring, serverless compute, computer vision, deep learning
Common initiatives
Interactive CX: Natural customer journeys with adaptive interfaces
• Behavior-based recommendations, improving personalization along the journey
• Seamless session transfer across UI, from browser to mobile to physical location
• Voice-driven commands, and use of gestures and other natural interfaces
Event-driven automation: Full execution of business process driven by an action
• Order fulfillment, with real-time update notifications to customer
• Fast response to customer complaints/comments over direct or social channels
Fraud detection: Protect customer and business w/ real-time anomaly detection
• Purchase and payment verification, using behavioral models and location assessment
• Application and account opening validation
Outcome 3 : Real-time Engagement
Artificial Intelligence
Alexa, Hello!
The Power of Speech: Alexa
Alexa, the voice service that powers
Echo, provides capabilities, or skills,
that enable customers to interact with
devices using voice
Alexa Skills Kit (ASK) allows everyone
to build and publish their own skills
Skills can be powered by AWS
Lambda
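As a hedged sketch of a skill backed by AWS Lambda, the handler below inspects the request type that Alexa sends and returns a plain-text speech response in the Alexa response format. The intent name and the spoken sentences are hypothetical.

```python
def build_response(speech_text, end_session=True):
    """Wrap text in the JSON structure Alexa expects from a skill."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech_text},
            "shouldEndSession": end_session,
        },
    }


def lambda_handler(event, context):
    """Route launch and intent requests to a spoken reply."""
    request = event["request"]
    if request["type"] == "LaunchRequest":
        # Keep the session open so the user can follow up with a request.
        return build_response("Hello! What would you like to do?", end_session=False)
    if request["type"] == "IntentRequest":
        intent_name = request["intent"]["name"]
        if intent_name == "CheckBalanceIntent":  # hypothetical intent name
            return build_response("Balance lookup is not wired up in this sketch.")
    return build_response("Sorry, I did not understand that.")


launch_event = {"request": {"type": "LaunchRequest"}}
intent_event = {
    "request": {"type": "IntentRequest", "intent": {"name": "CheckBalanceIntent"}}
}
```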
Build your own Alexa Skill!
Components: Amazon Echo, Alexa Skills Kit, AWS Lambda, Facebook Page
Personalized content
- Account access
- Track spending
- Check balances
- Pay bills
- Prevent fraud
Amazon Polly
Turn text into lifelike speech using deep
learning technologies to synthesize
speech that sounds like a human voice
• Unlimited replays
• Returns an MP3 or audio stream
• Lightning-fast response
• Fully managed and low cost
Amazon Polly: Text In, Life-like Speech Out
Text in: “The temperature in WA is 75°F”
Speech out: “The temperature in Washington is 75 degrees Fahrenheit”
Amazon Lex
Conversational interfaces for your
applications, powered by the same
Natural Language Understanding
(NLU) & Automatic Speech Recognition
(ASR) models as Alexa
• Integrated development in AWS console
• Trigger AWS Lambda functions
• Multi-step conversations
• Continually improving ASR & NLU models
• Enterprise connectors
• Fully managed
Intents: A particular goal that the user wants to achieve
Utterances: Spoken or typed phrases that invoke your intent
Slots: Data the user must provide to fulfill the intent
Prompts: Questions that ask the user to input data
Fulfillment: The business logic required to fulfill the user’s intent
Example intent: BookHotel
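The relationship between intents, utterances, slots, prompts, and fulfillment can be sketched as a small data model. This is an illustration of the concepts only, not the Amazon Lex API; the slot names, prompts, and utterances for BookHotel are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Slot:
    name: str
    prompt: str  # question asked while the slot is still empty


@dataclass
class Intent:
    name: str
    utterances: List[str]
    slots: List[Slot] = field(default_factory=list)


def next_prompt(intent: Intent, filled: Dict[str, str]) -> Optional[str]:
    """Return the prompt for the first unfilled slot, or None when complete."""
    for slot in intent.slots:
        if slot.name not in filled:
            return slot.prompt
    return None


def fulfill(intent: Intent, filled: Dict[str, str]) -> str:
    """Stand-in for business logic; runs only once every slot has a value."""
    missing = [s.name for s in intent.slots if s.name not in filled]
    if missing:
        raise ValueError(f"missing slots: {missing}")
    return f"{intent.name} fulfilled with {filled}"


# Hypothetical definition of the BookHotel intent.
book_hotel = Intent(
    name="BookHotel",
    utterances=["Book a hotel", "I need a room"],
    slots=[
        Slot("Location", "Which city will you stay in?"),
        Slot("CheckInDate", "What day do you want to check in?"),
        Slot("Nights", "How many nights will you stay?"),
    ],
)
```

A multi-step conversation is then a loop: ask `next_prompt` until it returns None, and hand the collected slot values to the fulfillment logic, which in Lex would typically be a Lambda function.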
Real-time engagement
Interactive customer experience, event-driven automation, fraud detection
Diagram columns: Data sources, Ingest, Serving. Diagram layers: Speed (Real-time), Scale (Batch).
Provide superior customer service by responding to opportunities in real
time. Fulfill requests for products or services in an automated fashion to
create a strong competitive advantage over those that are unable to.
Assurance becomes a different challenge as speeds increase: fraud
prevention must be adaptive and fast. Another layer of opportunity and
complexity comes from the vast streams of data produced by devices that
measure location, video, behaviors, environmental conditions, and more.
1. Real-time engagement requires personas that develop the analytics,
and platforms for engaging and automating processes
Personas and platforms: data analysts, data scientists, business users, engagement platforms, automation / events
2. Real-time systems are built from a base of advanced data processing
[Architecture diagram, step 3. Same components as step 2, with a further Amazon Kinesis component added]
3. Events are pipelined through Kinesis, into multiple streams, at scale
[Architecture diagram, step 4. Adds a further Amazon EMR component and Amazon S3 (stream data)]
4. Event data is given context and structure in EMR and pushed for batch
Also possible with Spark Streaming!
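Amazon Kinesis + Amazon EMR with Spark Streaming
// Slide shorthand: the real KinesisUtils.createStream takes more arguments (StreamingContext, endpoint, region, ...)
KinesisUtils.createStream("twitter-stream")
  .filter(_.getText.contains("Big Data"))
  .countByWindow(Seconds(5), Seconds(1)) // window length, then slide interval (1 s chosen for illustration)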
Counting tweets on a sliding window
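To make the sliding-window idea concrete without a Spark cluster, here is a minimal plain-Python sketch of the same filter-then-count logic. It is not Spark Streaming code. The class `SlidingWindowCounter`, the helper `count_matching`, and the `(timestamp, text)` tweet tuples are hypothetical, and the sketch assumes events arrive in timestamp order.

```python
from collections import deque


class SlidingWindowCounter:
    """Count events whose timestamp falls within the last `window_seconds`."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def add(self, timestamp):
        # Assumes timestamps arrive in non-decreasing order.
        self._timestamps.append(timestamp)

    def count(self, now):
        # Evict events that have slid out of the window (now - window, now].
        cutoff = now - self.window_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps)


def count_matching(tweets, keyword, window_seconds, now):
    """Filter (timestamp, text) tweets by keyword, then count those in the window."""
    counter = SlidingWindowCounter(window_seconds)
    for timestamp, text in tweets:
        if keyword in text:
            counter.add(timestamp)
    return counter.count(now)
```

The deque keeps eviction cheap because old events are only ever removed from the front, the same property that lets a streaming engine slide its window forward incrementally.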
[Architecture diagram, step 5. Adds Amazon Kinesis Firehose]
5. Kinesis Firehose pumps events into a DWH for near real-time analysis
Amazon Kinesis Firehose
• Fully managed data streaming service to ingest and capture data into your storage or data warehouse
• Ability to batch load, compress or encrypt streaming data
• Elastic to scale to any throughput (no more sharding)
• Charged only per GB processed ($0.035 per GB)
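The batch-and-compress behaviour in the bullets above can be illustrated locally. This is a minimal sketch of size-or-age buffering, not how the Firehose service is implemented: the class name `BatchBuffer`, the thresholds, and the newline-joined gzip payload are all assumptions chosen for illustration.

```python
import gzip


class BatchBuffer:
    """Collect records and emit one gzip-compressed batch per flush."""

    def __init__(self, max_bytes, max_age_seconds):
        self.max_bytes = max_bytes
        self.max_age_seconds = max_age_seconds
        self._records = []
        self._size = 0
        self._opened_at = None

    def add(self, record, now):
        """Buffer `record` (bytes); return a compressed batch if a threshold is hit."""
        if self._opened_at is None:
            self._opened_at = now
        self._records.append(record)
        self._size += len(record)
        too_big = self._size >= self.max_bytes
        too_old = now - self._opened_at >= self.max_age_seconds
        if too_big or too_old:
            return self.flush()
        return None

    def flush(self):
        """Compress and return the buffered records, or None if the buffer is empty."""
        if not self._records:
            return None
        payload = gzip.compress(b"\n".join(self._records))
        self._records = []
        self._size = 0
        self._opened_at = None
        return payload
```

Flushing on whichever threshold is reached first trades latency against batch size: a small age limit keeps delivery near real time, while a large size limit produces fewer, bigger objects for the warehouse load.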
[Architecture diagram, step 6. Adds an event scoring block with AWS Lambda]
6. The event is streamed to a scoring server for processing
[Architecture diagram, step 7. Adds Amazon AI alongside event scoring]
7. Language, intent, and image processing are run and sent for scoring
Amazon Rekognition
Image recognition and analysis powered by deep learning, which lets you search, verify, and organize millions of images
• Easy to use
• Batch analysis
• Real-time analysis
• Continually improving
• Low cost
Example labels detected in an image: Maple, Villa, Plant, Garden, Water, Swimming Pool, Tree, Potted Plant, Backyard
Face attributes returned: Demographic Data, Facial Landmarks, Sentiment Expressed, Image Quality (Brightness: 25.84, Sharpness: 160), General Attributes
Serverless Rekognition Demo
Serverless website that uses Rekognition to identify faces and classify pictures
[Diagram: Mobile, Amazon API Gateway, AWS Lambda, Amazon S3, Amazon DynamoDB, Amazon Rekognition]
CodeFor.Cloud/image
[Architecture diagram, step 8. Components unchanged from step 7]
8. Simple analytical models are checked on-demand against Amazon ML
[Architecture diagram, step 9. Components unchanged from step 7]
9. Complex analytical models are scored against coded models (PMML)
[Architecture diagram, step 10. Adds a second AWS Lambda function]
10. Scored response to the event is processed to be pushed for action
[Architecture diagram, step 11. Adds Amazon DynamoDB]
11. Recommendations are pushed to DynamoDB for low latency serving
[Architecture diagram, step 12. Adds Amazon SQS]
12. Actions are pushed to RDS and SQS for business process automation
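Steps 6 to 12 describe a score-then-route flow. The sketch below shows that flow with in-memory stand-ins: a dictionary plays the low-latency store and a deque plays the queue. Everything in it is hypothetical, including the toy scoring rules, the 0.5 threshold, and the field names. It only illustrates the routing decision, not any AWS API.

```python
from collections import deque


def score_event(event):
    """Toy risk score in [0, 1]; a stand-in for a real scoring model."""
    score = 0.0
    if event.get("amount", 0) > 1000:
        score += 0.6
    if event.get("country") != event.get("home_country"):
        score += 0.3
    return min(score, 1.0)


def route(event, recommendations, actions, threshold=0.5):
    """Send high-score events to the action queue, the rest to the serving store."""
    score = score_event(event)
    if score >= threshold:
        # Stand-in for pushing an action message to a queue.
        actions.append({"event_id": event["id"], "action": "review", "score": score})
    else:
        # Stand-in for writing a recommendation keyed by customer.
        recommendations[event["customer"]] = {"offer": "standard", "score": score}
    return score
```

Keeping scoring separate from routing means the model can change, for example from simple rules to an exported model, without touching the code that decides where results go.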
[Complete architecture diagram for real-time engagement, with Amazon DynamoDB and Amazon SQS both shown]
Demo: Live Twitter Feed Analysis
[Diagram: Twitter stream, Amazon Kinesis, AWS Lambda]
Twitter Blog*: On a typical day (in 2013):
• More than 500 million Tweets sent
• Average 5,700 TPS
* https://blog.twitter.com/2013/new-tweets-per-second-record-and-how
[Diagram labels: Amazon Elasticsearch Service; Kinesis for real-time; 10 TB/day; Amazon S3]
Robinhood Launches Popular No-fee Brokerage Trading Platform on AWS
Robinhood is an investment platform that offers free trades for everyone. It is based in Palo Alto, CA.
• Robinhood’s lean staff used AWS to create a massively scalable securities trading app with strong built-in security and compliance features that supported hundreds of thousands of users at launch
• Saved customers $22 million in commissions since launch, and transacted over $1 billion. All of this scaled up with 2 DevOps resources
• Amazon Redshift has allowed the data science team to identify fraud and fight money laundering, without needing to hire a data science infrastructure team
“We can look at real-time analytics and behaviors on our platform, that wouldn't be available at our scale if we weren't using AWS.”
Miles Wellesley, Head of Business Development
Outcome 4: Automate for expansive reach
Automation of self-service, deployment, policy, and quality assurance
Technology: Self-service, on-demand provisioning, DevOps, spot pricing, AWS CloudFormation, security automation, performance monitoring (CloudWatch & X-Ray), global rollouts
Common initiatives
Self-service:
• Application catalog or portal for all employees, availability determined by role
• Service provisioning backed by automation of policy and governance
Agile development: Use of DevOps to allow very few resources to deploy globally
• CI/CD for software release, build/test, and deployment automation
• Templated infrastructure provisioning and configuration management
• Business rules and policies are "gold coded" to be used for all deployments
• Use of Security by Design (SbD) to codify network, O/S, and encryption
Comprehensive monitoring: Assurance of SLA and issue remediation
• Logging and monitoring of all API calls and executions to ensure SLAs are met
• Analysis of performance variance for faster root cause analysis
[Architecture diagram for "Automate for expansive reach" (automation of self-service, deployment, policy, and quality assurance): the complete real-time engagement architecture, with AWS DevOps added]
AWS Glue
Easily understand your data sources, prepare the data, and load it reliably to data stores and your analytics pipeline
Integrated with: S3, RDS, Redshift & any JDBC-compliant data store
• Build your data catalog
• Generate and edit transformations
• Schedule and run your jobs
AWS Lambda
• Use AWS Lambda to clean and massage incoming data
• Write code to load data from sources (S3, DynamoDB) automatically into your data warehouse (e.g. Amazon Redshift)
• React in real time to incoming events in Amazon Kinesis
[Diagram: AWS Lambda, Amazon Redshift, Amazon Kinesis]
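A minimal sketch of the "clean and massage incoming data" idea, written as a Lambda-style handler for a Kinesis batch. It relies on the Kinesis event shape in which each record carries base64-encoded data under `Records[].kinesis.data`. The JSON payload fields (`user`, `text`) and the cleaning rules are hypothetical. Loading the cleaned rows into a warehouse is left out.

```python
import base64
import json


def handler(event, context):
    """Decode a batch of Kinesis records and return cleaned rows."""
    cleaned = []
    for record in event.get("Records", []):
        raw = base64.b64decode(record["kinesis"]["data"])
        try:
            item = json.loads(raw.decode("utf-8"))
        except ValueError:
            # Skip payloads that are not valid UTF-8 JSON.
            continue
        if not isinstance(item, dict):
            continue
        cleaned.append({
            # Hypothetical cleaning rules: normalize case and whitespace.
            "user": str(item.get("user", "")).strip().lower(),
            "text": " ".join(str(item.get("text", "")).split()),
        })
    return cleaned
```

Skipping malformed records rather than raising keeps one bad payload from failing the whole batch, at the cost of dropping data silently. A real pipeline would likely log or divert such records instead.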
AdRoll: AWS Lambda for log files
Valentino Volonghi, CTO, AdRoll:
“Polling is not a scalable strategy to figure out when new files are added to S3, especially when you add 17M of them per month. So we moved Lambda in front of S3.”
• Cross-platform, cross-device advertising platform
• Offers retargeting based on clickstream data
• 300 TB of new data per month
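The alternative to polling that the quote describes is to let S3 notify a function when an object is created. The sketch below, assuming the standard S3 event notification layout (`Records[].s3.bucket.name` and a URL-encoded `Records[].s3.object.key`), extracts the new objects from such an event. The function name `new_objects` is hypothetical, and what to do with each log file is left to the caller.

```python
from urllib.parse import unquote_plus


def new_objects(event):
    """Return (bucket, key) pairs for objects created in an S3 notification event."""
    found = []
    for record in event.get("Records", []):
        # Only react to creation events such as "ObjectCreated:Put".
        if not record.get("eventName", "").startswith("ObjectCreated"):
            continue
        s3 = record["s3"]
        # Keys arrive URL-encoded, with spaces sent as "+".
        key = unquote_plus(s3["object"]["key"])
        found.append((s3["bucket"]["name"], key))
    return found
```

With this push model, the work done scales with the number of new files rather than with how often a poller scans the bucket.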
Remember everything is an API: SDKs
Java, Python (boto), PHP, .NET, Ruby, Node.js, iOS, Android, Go, JavaScript, C++
Affordable Petabyte-scale Analytics
AWS helps customers maximize the value of Big Data investments while reducing overall IT costs
Amazon S3: secure, highly durable storage, $28.16 / TB / month
Amazon Glacier: data archiving, $7.16 / TB / month
Amazon Kinesis: real-time streaming data load, $0.035 / GB
Amazon EMR: 10-node Spark cluster, $0.15 / hr
Amazon Redshift: petabyte-scale data warehouse, $0.25 / hr
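As a worked example of how these unit prices add up, the sketch below totals a month of usage. The prices are the ones quoted on the slide and may not match current AWS pricing. The workload figures in the usage note and the function name `monthly_cost` are hypothetical.

```python
# Unit prices as quoted on the slide (USD); they may not reflect current pricing.
PRICES = {
    "s3_tb_month": 28.16,       # durable storage, per TB per month
    "glacier_tb_month": 7.16,   # archiving, per TB per month
    "stream_gb": 0.035,         # streaming data load, per GB
    "spark_cluster_hr": 0.15,   # 10-node Spark cluster, per hour
    "warehouse_hr": 0.25,       # data warehouse, per hour
}


def monthly_cost(s3_tb, glacier_tb, streamed_gb, spark_hours, warehouse_hours):
    """Return the estimated monthly bill in USD for the given usage."""
    total = (
        s3_tb * PRICES["s3_tb_month"]
        + glacier_tb * PRICES["glacier_tb_month"]
        + streamed_gb * PRICES["stream_gb"]
        + spark_hours * PRICES["spark_cluster_hr"]
        + warehouse_hours * PRICES["warehouse_hr"]
    )
    return round(total, 2)
```

For a hypothetical month with 10 TB in S3, 50 TB archived, 1,000 GB streamed, 100 Spark cluster hours, and 720 warehouse hours, the terms are 281.60 + 358.00 + 35.00 + 15.00 + 180.00, about 869.60 USD in total.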
Call To Action
• Attend the official AWS Training courses organized by the AWS Authorized local training partner, Iverson Associates Sdn Bhd (www.iverson.com.my).
• Join the AWS Jumpstart (2 hr) session and hear from our customers and partners on how they enabled their teams and successfully deployed on AWS. You also stand a chance to win a free seat in these courses.
• Point of contact: Cheryl Wong, cheryl.wong@iverson.com.my
Courses and dates:
Architecting on AWS: 28 Feb - 2 March
System Operations on AWS: 8-10 March
Developing on AWS: 15-17 March
Big Data on AWS: 19-21 April
Date: 17 Mar 2017
Venue: Iverson Associates Sdn Bhd (303330-M), Suites T113-T114, 3rd Floor, Centrepoint, Lebuh Bandar Utama, Bandar Utama, 47800 Petaling Jaya, Selangor
© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
http://bit.ly/awssummitkl
April 18 | One World Hotel | Kuala Lumpur
Register Now!
Join AWS User Group MY
https://www.facebook.com/groups/awsugmy/
Thank You!
Next up: Q&A

Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Último

Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 

Último (20)

Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdf
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 

Modern Data Architectures for Business Insights at Scale

  • 1. © 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Olivier Klein 奧樂凱 Emerging Technologies Solutions Architect, Asia-Pacific Modern Data Architectures for Business Insights at Scale
  • 2. Data analysis for a better customer experience • Your business creates and stores data and logs all the time • Data points and logs allow you to understand individual customer experience and improve it • Analysis of logs and trails helps gain insights
  • 3. Ever Increasing Amount of Data Volume Velocity Variety
  • 4. Generation Collection & Storage Analytics & Computation Collaboration & Sharing
  • 5. More devices Lower cost Higher throughput Generation Collection & Storage Analytics & Computation Collaboration & Sharing
  • 6. Highly constrained More devices Lower cost Higher throughput Generation Collection & Storage Analytics & Computation Collaboration & Sharing
  • 7. Big Data: Unconstrained data growth • 95% of the 1.2 zettabytes of data in the digital universe is unstructured • 70% of this is user-generated content • Unstructured data growth is explosive, with compound annual growth (CAGR) estimated at 62% from 2008 to 2012 • Source: IDC • Scale shown: GB, TB, PB, EB, ZB
  • 8. Data volume gap, 1990 to 2020: generated data versus data available for analysis • Sources: Gartner, User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011; IDC, Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares
  • 9. Cloud Computing helps remove constraints
  • 10. Big Data: • Potentially massive datasets • Iterative, experimental style of data manipulation and analysis • Frequently not a steady-state workload; peaks and valleys • Data is a combination of structured and unstructured data in many formats AWS Cloud: • Virtually unlimited capacity • Iterative, experimental usage cost through on-demand infrastructure • Fully scalable infrastructure for highly variable workloads • Tools & Services for managing structured, unstructured and stream data
  • 11. Let’s talk business outcomes of data analytics!
  • 12. Outcome 1 : Modernize and consolidate • Insights to enhance business applications and create new digital services Outcome 2 : Innovate for new revenues • Personalization, demand forecasting, risk analysis Outcome 3 : Real-time engagement • Interactive customer experience, event-driven automation, fraud detection Outcome 4 : Automate for expansive reach • Automation of business processes and physical infrastructure Driving Business Outcomes via Data Analytics
  • 13. Use optimal combination of interoperable services • Data Warehouse: Amazon Redshift • Semi-structured: Amazon Elastic MapReduce • Data Storage: Amazon Simple Storage Service • Archive: Amazon Glacier • NoSQL: Amazon DynamoDB • Predictive Models: Amazon Machine Learning • Streaming: Amazon Kinesis • Other Apps
  • 14. Modern Data Architecture on AWS • Numbered capabilities: 1. Ingestion; 2. Source data; 3. Lifecycle management and cold storage; 4. Metadata capture; 5. Data governance, security, privacy; 6. Self-service discovery, search, access; 7. Managing data quality; 8. Preparing for analytics; 9. Orchestration and job scheduling; 10. Capturing data changes • Services shown: S3 Upload, Kinesis Firehose, DynamoDB Streams, Snowball, Snowball Edge, Snowmobile, Database Migration Service, AWS Glue, S3, EFS, DynamoDB, RDS, EBS, Glacier, EMR, Athena, Elasticsearch, Redshift, AI, Machine Learning, QuickSight • Other labels: Data store target, Analytics
  • 15. Insights to enhance business applications, new digital services Technology: Backend system integration, on-prem data center extension, business application integration, BI provisioning, data lakes, external APIs, access control and logging Common initiatives Insights: 360 view of the business • Legacy data systems migration to enable self-service for business analysts • Integration of all customer data, from orders, payments, interactions • Supplier performance for inventory and vendor management Digitization: Web-service that gives on-demand insights • Delivery of digital content, with behavior tracking, and upsell (or ads) • Ordering system for enterprise customers or consumers Data monetization: Enrich, aggregate, and sell business data • External data enrichment API, including digital marketing platforms • Purchasable data sets of anonymized, domain-enriched insights Outcome 1 : Modernize and Consolidate
  • 16. Modernize and consolidate: Insights to enhance business applications, new digital services • Stages: Data sources, Ingest, Speed (Real-time), Scale (Batch), Serving • Enhancing business applications and creating new digital services takes a few steps. The business goal is usually to be an agile, well-run organization and to stop missing opportunities because people make decisions without accurate insights. These initiatives focus on giving important personas fast and secure access to business-relevant insights.
  • 17. Modernize and consolidate: Insights to enhance business applications, new digital services • Step 1: Define personas and use case requirements (including UI) • Stages: Data sources, Ingest, Speed (Real-time), Scale (Batch), Serving • Personas: Business users, External buyers, Data analysts
  • 18. Modernize and consolidate: Insights to enhance business applications, new digital services • Step 2: Locate the data sources that have the information to extract • Stages: Data sources, Ingest, Speed (Real-time), Scale (Batch), Serving • Personas: Business users, External buyers, Data analysts • Data sources: Transactions, Web logs / cookies, ERP
  • 19. Modernize and consolidate: Insights to enhance business applications, new digital services • Step 3: Ingest data through incremental or full loads, across secure connections • Stages: Data sources, Ingest, Speed (Real-time), Scale (Batch), Serving • Personas: Business users, External buyers, Data analysts • Data sources: Transactions, Web logs / cookies, ERP • Ingest: AWS Database Migration Service, AWS Direct Connect, AWS Storage Gateway, Internet Interfaces, Changed Data
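
The difference between the full and incremental loads named in step 3 can be sketched in a few lines. This is a minimal illustration only, assuming each source row carries an `updated_at` timestamp and that the loader stores a watermark from its previous run. The row layout and the `full_load` / `incremental_load` helpers are hypothetical and are not part of AWS Database Migration Service.

```python
from datetime import datetime

# Hypothetical source rows; a real source would be a database table.
SOURCE_ROWS = [
    {"id": 1, "amount": 10.0, "updated_at": datetime(2017, 1, 1, 9, 0)},
    {"id": 2, "amount": 25.5, "updated_at": datetime(2017, 1, 2, 14, 30)},
    {"id": 3, "amount": 7.25, "updated_at": datetime(2017, 1, 3, 8, 15)},
]


def full_load(rows):
    """Copy every row, regardless of when it last changed."""
    return list(rows)


def incremental_load(rows, watermark):
    """Copy only rows changed after the previous run's watermark.

    Returns the changed rows and the new watermark to store for the next run.
    """
    changed = [row for row in rows if row["updated_at"] > watermark]
    new_watermark = max((row["updated_at"] for row in changed), default=watermark)
    return changed, new_watermark


everything = full_load(SOURCE_ROWS)
changed, next_watermark = incremental_load(SOURCE_ROWS, datetime(2017, 1, 2, 0, 0))
print(len(everything), [row["id"] for row in changed], next_watermark)
```

A full load is simpler but moves all data on every run; the incremental variant moves only what changed, at the price of having to persist the watermark between runs.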
  • 20. Fluentd: Open Source Log Collection https://github.com/fluent/fluentd/ • Fluentd is an open source data collector to unify data collection and consumption • Integration into many data sources (App Logs, Syslogs, Twitter etc.) • Direct integration into AWS
      <source>
        type tail
        format apache2
        path /var/log/apache2/access_log
        tag s3.apache.access
      </source>
      <match s3.*.*>
        type s3
        s3_bucket myweblogs
        path logs/
      </match>
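
To make concrete what a collector does with each tailed line before shipping it, here is a rough Python sketch that turns one Apache access log line into a structured record. It is not Fluentd's implementation: the regular expression, the field names, and the `parse_access_log_line` helper are assumptions chosen for illustration, and the sample line is invented.

```python
import re

# Assumed pattern for an Apache common/combined access log line.
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<code>\d{3}) (?P<size>\d+|-)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?$'
)


def parse_access_log_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["code"] = int(record["code"])
    # Apache writes "-" when no response body was sent.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


sample = (
    '203.0.113.7 - - [10/Oct/2016:13:55:36 +0000] '
    '"GET /index.html HTTP/1.1" 200 2326 '
    '"http://example.com/start" "Mozilla/5.0"'
)
record = parse_access_log_line(sample)
print(record["host"], record["method"], record["path"], record["code"], record["size"])
```

Structured records like this are what make the later ETL and query steps practical, since downstream tools can filter on fields such as the status code instead of re-parsing raw text.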
  • 21. Modernize and consolidate: Insights to enhance business applications, new digital services • Step 4: Use Hadoop for large scale ETL, data quality, and preparation [*EMRFS] • Stages: Data sources, Ingest, Speed (Real-time), Scale (Batch), Serving • Personas: Business users, External buyers, Data analysts • Data sources: Transactions, Web logs / cookies, ERP • Ingest: AWS Database Migration Service, AWS Direct Connect, AWS Storage Gateway, Internet Interfaces, Changed Data • Services shown: AWS Glue, Amazon S3 Raw Data, Amazon EMR ETL, Amazon S3 Clean Data
  • 22. Amazon S3 • Highly available object storage • Designed for 99.999999999% annual data durability • Replicated across 3 facilities • Virtually unlimited scale • Pay only for what you use, you don’t need to pre-provision • Allows event notifications to trigger further action Amazon S3
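
The durability figure is easier to grasp as an expected loss rate. The sketch below is a simplified reading that treats the 99.999999999% design target as an independent per-object annual survival probability; the object count is a hypothetical example, and `Decimal` is used to avoid floating-point rounding at eleven nines.

```python
from decimal import Decimal

# 99.999999999% per object per year, as stated on the slide.
DURABILITY = Decimal("0.99999999999")
annual_loss_probability = Decimal(1) - DURABILITY


def expected_objects_lost_per_year(object_count):
    """Expected number of objects lost in one year under the simplified model."""
    return annual_loss_probability * object_count


stored_objects = 10_000_000  # hypothetical bucket size
lost_per_year = expected_objects_lost_per_year(stored_objects)
years_per_lost_object = Decimal(1) / lost_per_year
print(annual_loss_probability, lost_per_year, years_per_lost_object)
```

Under these assumptions, storing ten million objects corresponds to an expected loss of one object roughly every ten thousand years.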
  • 23. Amazon EMR • Amazon EMR is a fully managed Hadoop cluster • Transient and long running clusters • Direct integration into Amazon S3 • Easy to scale and enable burstable capacity • Integration with AWS Spot Market
  • 24. 1 instance x 100 hours = 100 instances x 1 hour (and with Spot Pricing not only faster but also cheaper)
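
The arithmetic behind this slide can be written out directly. The sketch assumes the job parallelizes perfectly and is billed per instance-hour; both hourly prices are made-up values in cents, not actual AWS rates.

```python
def job_cost_cents(instances, hours, price_cents_per_hour):
    """Total cost of a job billed per instance-hour."""
    return instances * hours * price_cents_per_hour


# Hypothetical prices, in cents per instance-hour.
ON_DEMAND_CENTS_PER_HOUR = 50
SPOT_CENTS_PER_HOUR = 15

serial_cost = job_cost_cents(1, 100, ON_DEMAND_CENTS_PER_HOUR)    # finishes in 100 hours
parallel_cost = job_cost_cents(100, 1, ON_DEMAND_CENTS_PER_HOUR)  # finishes in 1 hour
parallel_spot_cost = job_cost_cents(100, 1, SPOT_CENTS_PER_HOUR)  # 1 hour at the spot price

print(serial_cost, parallel_cost, parallel_spot_cost)
```

Both on-demand runs consume the same 100 instance-hours and therefore cost the same, but the parallel run returns results 100 times sooner; a lower spot price then reduces the bill as well.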
  • 25. Amazon EMR • Amazon EMR supports all common Hadoop frameworks, such as Spark, Pig, Hive, Hue, Oozie, HBase, Presto, Impala … • Decouples storage from compute • Allows independent scaling • Direct integration with DynamoDB and S3 • Diagram: Amazon S3, Amazon DynamoDB, Amazon EMR
  • 26. Modernize and consolidate: Insights to enhance business applications, new digital services • Step 5: Stage all data into centralized, highly available, durable storage for further access • Stages: Data sources, Ingest, Speed (Real-time), Scale (Batch), Serving • Personas: Business users, External buyers, Data analysts • Data sources: Transactions, Web logs / cookies, ERP • Ingest: AWS Database Migration Service, AWS Direct Connect, AWS Storage Gateway, Internet Interfaces, Changed Data • Services shown: AWS Glue, Amazon S3 Raw Data, Amazon EMR ETL, Amazon S3 Clean Data, Amazon S3 Staged Data (Data Lake)
  • 27. Modernize and consolidate: Insights to enhance business applications, new digital services • Step 6: Load semi-structured data into Hadoop, structured data into the data warehouse (DWH), and application data into managed legacy application databases • Stages: Data sources, Ingest, Speed (Real-time), Scale (Batch), Serving • Personas: Business users, External buyers, Data analysts • Data sources: Transactions, Web logs / cookies, ERP • Ingest: AWS Database Migration Service, AWS Direct Connect, AWS Storage Gateway, Internet Interfaces, Changed Data • Services shown: AWS Glue, Amazon S3 Raw Data, Amazon EMR ETL, Amazon S3 Clean Data, Amazon S3 Staged Data (Data Lake), Amazon EMR (Semi-structured), Amazon Redshift (Data Warehouse), Amazon RDS (Legacy Apps)
  • 28. Amazon Redshift • Fully managed petabyte-scale data warehouse • Scalable number of cluster nodes • ODBC/JDBC connectors for BI tools using SQL • Loads data from Amazon DynamoDB and Amazon S3 • Less than a tenth of the cost of traditional solutions
  • 29. Modernize and consolidate: Insights to enhance business applications, new digital services • Step 7: Data is protected through identity and access management and logging • Stages: Data sources, Ingest, Speed (Real-time), Scale (Batch), Serving • Personas: Business users, External buyers, Data analysts • Data sources: Transactions, Web logs / cookies, ERP • Ingest: AWS Database Migration Service, AWS Direct Connect, AWS Storage Gateway, Internet Interfaces, Changed Data • Services shown: AWS Glue, Amazon S3 Raw Data, Amazon EMR ETL, Amazon S3 Clean Data, Amazon S3 Staged Data (Data Lake), Amazon EMR (Semi-structured), Amazon Redshift (Data Warehouse), Amazon RDS (Legacy Apps), AWS CloudTrail, AWS IAM
  • 30. Modernize and consolidate: Insights to enhance business applications, new digital services • Step 8: Data analysts use BI tools of choice to access all serving services • Stages: Data sources, Ingest, Speed (Real-time), Scale (Batch), Serving • Personas: Business users, External buyers, Data analysts • Data sources: Transactions, Web logs / cookies, ERP • Ingest: AWS Database Migration Service, AWS Direct Connect, AWS Storage Gateway, Internet Interfaces, Changed Data • Services shown: AWS Glue, Amazon S3 Raw Data, Amazon EMR ETL, Amazon S3 Clean Data, Amazon S3 Staged Data (Data Lake), Amazon EMR (Semi-structured), Amazon Redshift (Data Warehouse), Amazon RDS (Legacy Apps), AWS CloudTrail, AWS IAM, Amazon QuickSight
  • 31. Amazon QuickSight • Fast, cloud-powered BI service that makes it easy to build visualizations, perform ad-hoc analysis, and get insights from data • Connectors for files, third-party platforms, AWS services and other partner BI tools • In-memory calculation engine (SPICE) to accelerate analysis and visualization • $9 per user per month
  • 32.
  • 33. AWS Marketplace • Pre-Configured machine images ready to be launched into virtual server instances • Launch applications with 1-Click • Pay software licenses by the hour or bring your own license (BYOL)
  • 34. Modernize and consolidate: Insights to enhance business applications, new digital services • Step 9: Business users have enterprise applications enhanced by analytics • Stages: Data sources, Ingest, Speed (Real-time), Scale (Batch), Serving • Personas: Business users, External buyers, Data analysts • Data sources: Transactions, Web logs / cookies, ERP • Ingest: AWS Database Migration Service, AWS Direct Connect, AWS Storage Gateway, Internet Interfaces, Changed Data • Services shown: AWS Glue, Amazon S3 Raw Data, Amazon EMR ETL, Amazon S3 Clean Data, Amazon S3 Staged Data (Data Lake), Amazon EMR (Semi-structured), Amazon Redshift (Data Warehouse), Amazon RDS (Legacy Apps), AWS CloudTrail, AWS IAM, Amazon QuickSight
  • 35. Modernize and consolidate: Insights to enhance business applications, new digital services • Step 10: External parties can buy services or data in a governed, secure way • Stages: Data sources, Ingest, Speed (Real-time), Scale (Batch), Serving • Personas: Business users, External buyers, Data analysts • Data sources: Transactions, Web logs / cookies, ERP • Ingest: AWS Database Migration Service, AWS Direct Connect, AWS Storage Gateway, Internet Interfaces, Changed Data • Services shown: AWS Glue, Amazon S3 Raw Data, Amazon EMR ETL, Amazon S3 Clean Data, Amazon S3 Staged Data (Data Lake), Amazon EMR (Semi-structured), Amazon Redshift (Data Warehouse), Amazon RDS (Legacy Apps), AWS CloudTrail, AWS IAM, Amazon QuickSight, Amazon API Gateway
  • 36. Modernize and consolidate: Insights to enhance business applications, new digital services • Stages: Data sources, Ingest, Speed (Real-time), Scale (Batch), Serving • Personas: Business users, External buyers, Data analysts • Data sources: Transactions, Web logs / cookies, ERP • Ingest: AWS Database Migration Service, AWS Direct Connect, AWS Storage Gateway, Internet Interfaces, Changed Data • Services shown: AWS Glue, Amazon S3 Raw Data, Amazon EMR ETL, Amazon S3 Clean Data, Amazon S3 Staged Data (Data Lake), Amazon EMR (Semi-structured), Amazon Redshift (Data Warehouse), Amazon RDS (Legacy Apps), AWS CloudTrail, AWS IAM, Amazon QuickSight, Amazon API Gateway, Amazon Athena
  • 37. Decouple Storage and Compute • Traditionally, analytical workloads required large databases or data warehouses, with storage and compute close to each other • Big Data often benefits from decoupling storage and compute • Amazon S3 offers virtually unlimited storage at a per GB/month rate
  • 38. Amazon Athena: Interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL • No need to move data • Query S3 directly and right away • No infrastructure to set up and manage • Fast results within seconds • Pay for just the queries you run
  • 39. Athena & Quicksight Demo Amazon S3 Amazon Athena Amazon Quicksight Analyze past flight performance data stored in S3 Bureau of Transportation Flight Data Statistics www.transtats.bts.gov Create visualizations from S3 with Athena & Quicksight
  • 40. Personalization, demand forecasting, risk analysis Technology: Advanced analytics, customer segmentations, high volume transactional data, un/semi-structured data, design of experiment, A/B & hypothesis testing, machine learning Common initiatives Personalization: Refine market approaches based on optimal segments • Offer products to new customers based on clusters of similar individuals • Launch share of wallet initiatives, understanding likely total spend • Targeted marketing to capture interests and increase conversion rates Predict demand: Guide business owners to select the best scenarios • Launch items or promotions at the optimal time to maximize response • Modeling for store assortment, product selection, and merchandizing • New product design, based on known market propensities Risk measurement: Create freedom to act by quantifying exposures • Scenario simulation to encourage investments and new offerings • Supply chain analytics allows for faster confirmation of goods to customers Outcome 2 : Innovate for new revenues
  • 41.
  • 42. Innovate for new revenues: Personalization, demand forecasting, risk analysis • Stages: Data sources, Ingest, Speed (Real-time), Scale (Batch), Serving • Net new revenue comes from business teams that have access to skilled analysts and to platforms that can scale up and out without IT bottlenecks. Organizations start operating based on what they know about their customers, and can approach new ventures in terms of confidence levels. Product launches, campaigns, supply chain management, packaged services, and customized offerings are designed and executed based on predictive models.
  • 43. Innovate for new revenues: Personalization, demand forecasting, risk analysis • Step 1: Personas involved in generating new revenues are data scientists, data analysts (often embedded), business users, and customers/suppliers • Stages: Data sources, Ingest, Speed (Real-time), Scale (Batch), Serving • Personas and consumers: Data analysts, Data scientists, Business users, Engagement platforms • Services shown: AWS CloudTrail, AWS IAM, Amazon CloudWatch
  • 44. Innovate for new revenues: Personalization, demand forecasting, risk analysis • Step 2: Advanced analytics are built from a base of traditional data processing • Stages: Data sources, Ingest, Speed (Real-time), Scale (Batch), Serving • Personas and consumers: Data analysts, Data scientists, Business users, Engagement platforms • Data sources: Transactions, ERP • Services shown: AWS Direct Connect, AWS Glue, Amazon S3 Raw Data, Amazon EMR ETL, Amazon S3 Clean Data, Amazon S3 Staged Data (Data Lake), Amazon EMR, Amazon Redshift, Amazon RDS, AWS CloudTrail, AWS IAM, Amazon CloudWatch
  • 45. Innovate for new revenues: Personalization, demand forecasting, risk analysis • Step 3: On-premises storage and databases are connected and converted • Stages: Data sources, Ingest, Speed (Real-time), Scale (Batch), Serving • Personas and consumers: Data analysts, Data scientists, Business users, Engagement platforms • Data sources: Transactions, ERP • Services shown: AWS Direct Connect, AWS Database Migration Service, AWS Storage Gateway, AWS Glue, Amazon S3 Raw Data, Amazon EMR ETL, Amazon S3 Clean Data, Amazon S3 Staged Data (Data Lake), Amazon EMR, Amazon Redshift, Amazon RDS, AWS CloudTrail, AWS IAM, Amazon CloudWatch
  • 46. Innovate for new revenues: Personalization, demand forecasting, risk analysis • Step 4: Internet-native data sources, like web and mobile, are captured • Stages: Data sources, Ingest, Speed (Real-time), Scale (Batch), Serving • Personas and consumers: Data analysts, Data scientists, Business users, Engagement platforms • Data sources: Transactions, ERP, Web logs / cookies • Services shown: AWS Direct Connect, Internet Interfaces, AWS Database Migration Service, AWS Storage Gateway, AWS Glue, Amazon S3 Raw Data, Amazon EMR ETL, Amazon S3 Clean Data, Amazon S3 Staged Data (Data Lake), Amazon EMR, Amazon Redshift, Amazon RDS, AWS CloudTrail, AWS IAM, Amazon CloudWatch
  • 47. Innovate for new revenues: Personalization, demand forecasting, risk analysis • Step 5: Streaming un/semi-structured data feeds, like social media and devices, are captured • Stages: Data sources, Ingest, Speed (Real-time), Scale (Batch), Serving • Personas and consumers: Data analysts, Data scientists, Business users, Engagement platforms • Data sources: Transactions, ERP, Web logs / cookies, Connected devices, Social media • Services shown: AWS Direct Connect, Internet Interfaces, AWS Database Migration Service, AWS Storage Gateway, Amazon Kinesis, AWS Glue, Amazon S3 Raw Data, Amazon EMR ETL, Amazon S3 Clean Data, Amazon S3 Staged Data (Data Lake), Amazon EMR, Amazon Redshift, Amazon RDS, AWS CloudTrail, AWS IAM, Amazon CloudWatch
  • 48. Stream in Real Time: Amazon Kinesis • Real-time data processing over large distributed streams • Elastic capacity that scales to millions of events per second • React in real time to incoming stream events • Reliable stream storage replicated across 3 facilities
  • 49. Innovate for new revenues: Personalization, demand forecasting, risk analysis • Step 6: Log files and other schemaless data are converted to Parquet and staged • Stages: Data sources, Ingest, Speed (Real-time), Scale (Batch), Serving • Personas and consumers: Data analysts, Data scientists, Business users, Engagement platforms • Data sources: Transactions, ERP, Web logs / cookies, Connected devices, Social media • Services shown: AWS Direct Connect, Internet Interfaces, AWS Database Migration Service, AWS Storage Gateway, Amazon Kinesis, AWS Glue, Amazon S3 Raw Data, Amazon S3 Schemaless, Amazon EMR ETL, Amazon S3 Clean Data, Amazon S3 Staged Data (Data Lake), Amazon EMR, Amazon Redshift, Amazon RDS, AWS CloudTrail, AWS IAM, Amazon CloudWatch
  • 50. Innovate for new revenues: Personalization, demand forecasting, risk analysis • Step 7: Data scientists test hypotheses against un/semi-structured data • Stages: Data sources, Ingest, Speed (Real-time), Scale (Batch), Serving • Personas and consumers: Data analysts, Data scientists, Business users, Engagement platforms • Data sources: Transactions, ERP, Web logs / cookies, Connected devices, Social media • Services shown: AWS Direct Connect, Internet Interfaces, AWS Database Migration Service, AWS Storage Gateway, Amazon Kinesis, AWS Glue, Amazon S3 Raw Data, Amazon S3 Schemaless, Amazon EMR ETL, Amazon S3 Clean Data, Amazon S3 Staged Data (Data Lake), Amazon Athena, Amazon EMR, Amazon Elasticsearch, Amazon Redshift, Amazon RDS, AWS CloudTrail, AWS IAM, Amazon CloudWatch
  • 51. Innovate for new revenues: Personalization, demand forecasting, risk analysis • Step 8: Simple analytical models are built with Amazon Machine Learning • Stages: Data sources, Ingest, Speed (Real-time), Scale (Batch), Serving • Personas and consumers: Data analysts, Data scientists, Business users, Engagement platforms • Data sources: Transactions, ERP, Web logs / cookies, Connected devices, Social media • Services shown: AWS Direct Connect, Internet Interfaces, AWS Database Migration Service, AWS Storage Gateway, Amazon Kinesis, AWS Glue, Amazon S3 Raw Data, Amazon S3 Schemaless, Amazon EMR ETL, Amazon S3 Clean Data, Amazon S3 Staged Data (Data Lake), Amazon Machine Learning, Amazon Athena, Amazon EMR, Amazon Elasticsearch, Amazon Redshift, Amazon RDS, AWS CloudTrail, AWS IAM, Amazon CloudWatch
  • 52. Amazon Machine Learning • Easy to use, managed machine learning service built for developers • Machine learning technology based on Amazon’s internal systems • Create models using data stored in Amazon S3, Amazon RDS or Amazon Redshift • Request predictions in batch or in real time
  • 53. Innovate for new revenues: Personalization, demand forecasting, risk analysis • Step 9: Complex analytical models are built on EMR (Spark) clusters • Stages: Data sources, Ingest, Speed (Real-time), Scale (Batch), Serving • Personas and consumers: Data analysts, Data scientists, Business users, Engagement platforms • Data sources: Transactions, ERP, Web logs / cookies, Connected devices, Social media • Services shown: AWS Direct Connect, Internet Interfaces, AWS Database Migration Service, AWS Storage Gateway, Amazon Kinesis, AWS Glue, Amazon S3 Raw Data, Amazon S3 Schemaless, Amazon EMR ETL, Amazon S3 Clean Data, Amazon S3 Staged Data (Data Lake), Amazon Machine Learning, Amazon EMR MLlib, Amazon Athena, Amazon EMR, Amazon Elasticsearch, Amazon Redshift, Amazon RDS, AWS CloudTrail, AWS IAM, Amazon CloudWatch
  • 54. Apache Spark • In-memory analytics cluster using RDD (Resilient Distributed Dataset) for fast processing • Spark MLlib offers machine learning out of the box • Apache Spark can read directly from Amazon S3
      from numpy import array
      from pyspark.mllib.clustering import KMeans, KMeansModel

      # sc is an existing SparkContext; each input line holds space-separated numbers
      data = sc.textFile("s3://...")
      parsedData = data.map(lambda line: array([float(x) for x in line.split(' ')]))
      # Cluster the points into 2 groups, then save and reload the model
      model = KMeans.train(parsedData, 2, maxIterations=10, initializationMode="random")
      model.save(sc, "MyModel")
      sameModel = KMeansModel.load(sc, "MyModel")
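
For readers without a Spark cluster, the idea behind the k-means call on this slide can be illustrated in plain Python. This is a stand-in sketch of the classic Lloyd iteration, not Spark MLlib's implementation: it runs on a single machine, the sample lines are invented, and it seeds the centers with the first `k` points for reproducibility, whereas the slide uses random initialization.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def closest_center(point, centers):
    """Index of the center nearest to the point."""
    return min(range(len(centers)), key=lambda i: squared_distance(point, centers[i]))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(coords) / len(points) for coords in zip(*points)]


def kmeans(points, k, max_iterations=10):
    """Lloyd's algorithm: alternate assignment and center update steps."""
    centers = [list(p) for p in points[:k]]  # deterministic seeding (assumption)
    for _ in range(max_iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[closest_center(p, centers)].append(p)
        # Keep the old center if a cluster ends up empty.
        new_centers = [mean_point(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


# Same input shape as on the slide: space-separated numbers, one point per line.
lines = ["0.0 0.0", "0.2 0.1", "9.0 9.0", "9.2 9.1"]
points = [[float(x) for x in line.split(' ')] for line in lines]
centers = kmeans(points, 2)
print(centers)
```

The two returned centers settle near the two visible groups of points; Spark distributes the same assignment and update steps across a cluster so the approach scales to data stored in Amazon S3.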
  • 55. Machine Learning Algorithms • Classification: sentiment analysis (Do people like my new product?) • Linear regression: trend prediction (How much revenue next month?) • Clustering: recommendation (Other people bought this!) • Association: market basket analysis (bundled products) • Neural networks: pattern recognition (speech recognition) • Platforms shown: Amazon Machine Learning, Amazon EMR + Spark MLlib, GPU-optimized EC2 instances
  • 56. Intel® Processor Technologies • Intel® AVX: dramatically increases performance for highly parallel HPC workloads such as life science engineering, data mining, financial analysis, and media processing • Intel® AES-NI: enhances security with new encryption instructions that reduce the performance penalty of encrypting and decrypting data • Intel® Turbo Boost Technology: increases computing power with performance that adapts to spikes in workloads • Intel® Transactional Synchronization Extensions (TSX): enables execution of independent transactions to accelerate throughput • P-state and C-state control: provides granular performance tuning for cores and sleep states to improve overall application performance
  • 57. New X1 Instance - Tons of Memory • Designed for large-scale, in-memory applications in the cloud • Ideal for in-memory databases like SAP HANA and big data processing apps like Spark and Presto • Powered by Intel® Xeon® E7 8880 v3 Haswell processors • Features up to 2TB of memory and up to 128 vCPUs per instance • 8X the memory offered by any other Amazon EC2 instance
  • 58. Architecture diagram (Innovate for new revenues), step 10: Predictive models are published to data staging
  • 59. Architecture diagram (Innovate for new revenues), step 11: Analysts use the DWH, EMR, and ES (Amazon Athena, Amazon EMR, Amazon ElasticSearch) to find patterns and measure performance
  • 60. Architecture diagram (Innovate for new revenues), step 12: Risk models are evaluated to create new products and assess customers (Amazon RedShift)
  • 61. Architecture diagram (Innovate for new revenues), step 13: Demand forecasts are loaded into supply chain management systems (Amazon RDS)
  • 62. Architecture diagram (Innovate for new revenues), step 14: Personalized offers are broadcast out over notification channels (Amazon SNS, Amazon Pinpoint)
  • 63. Amazon SNS & Amazon Pinpoint • Amazon SNS is a fully managed, cross-platform mobile push intermediary service • Fully scalable to millions of devices • Amazon Pinpoint lets you create targeted campaigns and measure engagement and results • Push platforms reached through Amazon SNS: Apple APNS (iPhones and iPads), Google GCM (Android phones and tablets), Amazon ADM (Kindle Fire devices), Windows WNS and MPNS (Windows Phone devices), Baidu CP (Android phones and tablets in China)
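The cross-platform push described on this slide could look roughly like the sketch below. It assumes an object behaving like `boto3.client("sns")` is passed in; the target ARN and the offer text are hypothetical, and only the APNS and GCM payloads are filled in for brevity.

```python
import json
from typing import Any, Dict


def build_push_message(text: str) -> str:
    """Build a JSON message body with per-platform payloads for SNS."""
    payload = {
        "default": text,  # fallback used for platforms without their own entry
        "APNS": json.dumps({"aps": {"alert": text}}),
        "GCM": json.dumps({"data": {"message": text}}),
    }
    return json.dumps(payload)


def send_offer(client: Any, target_arn: str, text: str) -> Dict[str, Any]:
    """Publish one personalized offer to an endpoint or topic ARN."""
    return client.publish(
        TargetArn=target_arn,       # hypothetical ARN supplied by the caller
        Message=build_push_message(text),
        MessageStructure="json",    # tells SNS to pick the per-platform entry
    )
```

Each platform value is itself a JSON string, which is why `json.dumps` appears twice.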
  • 64. Architecture diagram (Innovate for new revenues), complete view including Amazon SNS and Amazon Pinpoint
  • 65. Elastic GPUs For EC2 • Use Graphics GPUs As If They Were EBS Volumes
  • 66. Elastic GPUs: GPU Acceleration on-demand • Attached to a current-generation EC2 instance
  • 67. Elastic GPUs: GPU Acceleration on-demand • GPU memory sizes: 1 GiB, 2 GiB, 4 GiB, 8 GiB • Attached to a current-generation EC2 instance
  • 69. BREAK Next up: Real-Time Analytics and Engagement
  • 70. Outcome 3: Real-time Engagement • Interactive customer experience, event-driven automation, fraud detection • Technology: clickstream, mobile apps, sensors, video (computer vision), audio (intent comprehension), event detection and pipelining, in-line scoring, serverless compute, deep learning • Common initiatives • Interactive CX (natural customer journeys with adaptive interfaces): behavior-based recommendations, improving personalization along the journey; seamless session transfer across UIs, from browser to mobile to physical location; voice-driven commands, and use of gestures and other natural interfaces • Event-driven automation (full execution of a business process driven by an action): order fulfillment, with real-time update notifications to the customer; fast response to customer complaints and comments over direct or social channels • Fraud detection (protect customer and business with real-time anomaly detection): purchase and payment verification, using behavioral models and location assessment; application and account opening validation
  • 73. The Power of Speech: Alexa • Alexa, the voice service that powers Echo, provides capabilities, or skills, that enable customers to interact with devices using voice • The Alexa Skills Kit (ASK) allows everyone to build and publish their own skills • Skills can be powered by AWS Lambda
  • 74. Build your own Alexa Skill! • Components shown: Amazon Echo, Alexa Skills Kit, AWS Lambda, Facebook Page
  • 75. Personalized content • Account access • Track spending • Check balances • Pay bills • Prevent fraud
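A Lambda-backed skill for a request such as checking a balance could be sketched as below. The request and response shapes follow the Alexa Skills Kit JSON interface; the intent name `CheckBalanceIntent`, the spoken sentences, and the fixed balance are hypothetical placeholders, since the talk does not show the skill's code.

```python
from typing import Any, Dict


def speak(text: str, end_session: bool = True) -> Dict[str, Any]:
    """Wrap plain text in the response envelope Alexa expects."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Entry point that AWS Lambda calls for each Alexa request."""
    request = event["request"]
    if request["type"] == "LaunchRequest":
        return speak("Welcome. Ask me for your balance.", end_session=False)
    if request["type"] == "IntentRequest":
        if request["intent"]["name"] == "CheckBalanceIntent":  # hypothetical intent
            return speak("Your balance is 100 dollars.")  # placeholder value
    return speak("Sorry, I did not understand that.")
```

A real skill would look the balance up in a backend system instead of returning a constant.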
  • 76. Amazon Polly • Turn text into lifelike speech using deep learning technologies to synthesize speech that sounds like a human voice • Unlimited replays • Returns an MP3 or audio stream • Lightning-fast response • Fully managed and low cost
  • 77. Amazon Polly: Text In, Life-like Speech Out • Text in: “The temperature in WA is 75°F” • Speech out: “The temperature in Washington is 75 degrees Fahrenheit”
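The expansion shown on this slide can be illustrated with a toy text normalizer. This is only a sketch of the idea, not how Amazon Polly works internally; the one-entry abbreviation table and the single temperature rule are assumptions chosen to reproduce the slide's example.

```python
import re

# Assumption: a tiny illustrative lookup table, not Polly's real rules.
ABBREVIATIONS = {"WA": "Washington"}


def expand_for_speech(text: str) -> str:
    """Rewrite symbols and abbreviations into words a voice can read aloud."""
    # Expand temperatures such as "75°F" into spoken form.
    text = re.sub(r"(\d+)\s*°F", r"\1 degrees Fahrenheit", text)
    # Expand known abbreviations, matching whole words only.
    for short, full in ABBREVIATIONS.items():
        text = re.sub(rf"\b{re.escape(short)}\b", full, text)
    return text


print(expand_for_speech("The temperature in WA is 75°F"))
# prints: The temperature in Washington is 75 degrees Fahrenheit
```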
  • 78. Amazon Lex • Conversational interfaces for your applications, powered by the same Natural Language Understanding (NLU) and Automatic Speech Recognition (ASR) models as Alexa • Integrated development in the AWS console • Trigger AWS Lambda functions • Multi-step conversations • Continually improving ASR and NLU models • Enterprise connectors • Fully managed
  • 79. Amazon Lex concepts (example intent: BookHotel) • Intents: a particular goal that the user wants to achieve • Utterances: spoken or typed phrases that invoke your intent • Slots: data the user must provide to fulfill the intent • Prompts: questions that ask the user to input data • Fulfillment: the business logic required to fulfill the user’s intent
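The five concepts above can be modelled in a few lines to show how they fit together. This is an illustrative data-structure sketch, not the Amazon Lex API; apart from the intent name `BookHotel`, the slot names, prompts, and utterances are invented for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Slot:
    name: str
    prompt: str  # question asked while the slot is still empty


@dataclass
class Intent:
    name: str
    utterances: List[str]
    slots: List[Slot] = field(default_factory=list)

    def next_prompt(self, filled: Dict[str, str]) -> Optional[str]:
        """Return the prompt for the first unfilled slot, or None when ready."""
        for slot in self.slots:
            if slot.name not in filled:
                return slot.prompt
        return None


def fulfill(filled: Dict[str, str]) -> str:
    """Stand-in for the business logic (for example an AWS Lambda function)."""
    return f"Booked a hotel in {filled['City']} for {filled['Nights']} nights."


book_hotel = Intent(
    name="BookHotel",
    utterances=["Book a hotel", "I need a room"],  # hypothetical phrases
    slots=[Slot("City", "Which city?"), Slot("Nights", "How many nights?")],
)
```

A conversation loop would keep calling `next_prompt` until it returns `None`, then hand the filled slots to `fulfill`.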
  • 80. Real-time engagement: interactive customer experience, event-driven automation, fraud detection • Provide superior customer service by responding to opportunities in real time. • Fulfill requests for products or services in an automated fashion to create a strong competitive advantage over those that cannot. • Assurance becomes a different challenge when speeds increase, and fraud prevention must be adaptive and fast. • Adding another layer of opportunity and complexity is the use of vast streams of data from devices that are measuring location, video, behaviors, environmental conditions, and more.
  • 81. Architecture diagram (Real-time engagement), step 1: Real-time engagement requires personas that develop the analytics (data analysts, data scientists, business users), and platforms for engaging and automating processes (engagement platforms, automation / events)
  • 82. Architecture diagram (Real-time engagement), step 2: Real-time systems are built from a base of advanced data processing
  • 83. Architecture diagram (Real-time engagement), step 3: Events are pipelined through Amazon Kinesis, into multiple streams, at scale
  • 84. Architecture diagram (Real-time engagement), step 4: Event data is given context and structure in Amazon EMR and pushed for batch (Amazon S3 Stream Data)
  • 85. Also possible with Spark Streaming! • Amazon Kinesis feeds Amazon EMR running Spark Streaming • Counting tweets on a sliding window (simplified pseudocode): KinesisUtils.createStream("twitter-stream").filter(_.getText.contains("Big Data")).countByWindow(Seconds(5))
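The sliding-window count on this slide can be illustrated without a cluster. The sketch below is plain Python, not Spark Streaming: it reports a count for every matching event rather than once per batch interval, and the function name and the `(timestamp, text)` event shape are assumptions made for the example.

```python
from collections import deque
from typing import Deque, Iterable, List, Tuple


def count_by_window(events: Iterable[Tuple[float, str]], keyword: str,
                    window_seconds: float) -> List[Tuple[float, int]]:
    """For each matching event, count the matches inside the trailing window.

    `events` are (timestamp_seconds, text) pairs in timestamp order.
    """
    window: Deque[float] = deque()
    counts: List[Tuple[float, int]] = []
    for timestamp, text in events:
        if keyword not in text:
            continue  # the filter() step from the slide
        window.append(timestamp)
        # Drop matches that have aged out of the window.
        while window[0] <= timestamp - window_seconds:
            window.popleft()
        counts.append((timestamp, len(window)))
    return counts


tweets = [(0, "Big Data rocks"), (1, "hello"), (3, "Big Data again"), (6, "more Big Data")]
print(count_by_window(tweets, "Big Data", 5))
# prints: [(0, 1), (3, 2), (6, 2)]
```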
  • 86. Architecture diagram (Real-time engagement), step 5: Amazon Kinesis Firehose pumps events into a DWH for near real-time analysis
  • 87. Amazon Kinesis Firehose • Fully managed data streaming service to ingest and capture data into your storage or data warehouse • Ability to batch load, compress or encrypt streaming data • Elastic to scale to any throughput (no more sharding) • Charged only per GB processed ($0.035 per GB)
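Loading events through Firehose, as described on this slide, might be sketched as follows. The sketch assumes an object behaving like `boto3.client("firehose")` is passed in and relies on the documented limit of 500 records per `PutRecordBatch` call; the delivery stream name is a hypothetical value supplied by the caller, and retrying failed records is left out.

```python
from typing import Any, Iterable, Iterator, List

MAX_BATCH = 500  # PutRecordBatch accepts at most 500 records per call


def chunked(items: List[bytes], size: int) -> Iterator[List[bytes]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def send_events(client: Any, stream_name: str, events: Iterable[str]) -> int:
    """Send newline-delimited events to a delivery stream; return the failed count."""
    encoded = [(event + "\n").encode("utf-8") for event in events]
    failed = 0
    for batch in chunked(encoded, MAX_BATCH):
        response = client.put_record_batch(
            DeliveryStreamName=stream_name,
            Records=[{"Data": data} for data in batch],
        )
        failed += response.get("FailedPutCount", 0)
    return failed
```

The trailing newline keeps records separable once Firehose concatenates them into objects in the destination.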
  • 88. Architecture diagram (Real-time engagement), step 6: The event is streamed to a scoring server for processing (Event Scoring, AWS Lambda)
  • 89. Architecture diagram (Real-time engagement), step 7: Language, intent, and image processing are run and sent for scoring (Amazon AI)
  • 90. Amazon Rekognition • Image recognition and analysis powered by deep learning, which lets you search, verify and organize millions of images • Easy to use • Batch analysis • Real-time analysis • Continually improving • Low cost
  • 92. Amazon Rekognition face analysis output • Demographic data • Facial landmarks • Sentiment expressed • Image quality (Brightness: 25.84, Sharpness: 160) • General attributes
  • 93. Serverless Rekognition Demo • Serverless website that uses Rekognition to identify faces and classify pictures • Components: mobile client, Amazon S3, AWS Lambda, Amazon API Gateway, Amazon DynamoDB, Amazon Rekognition • CodeFor.Cloud/image
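The classification step of such a serverless demo could be sketched as a function that reacts to an S3 upload event. This is a hedged sketch, not the demo's actual code: it assumes an object behaving like `boto3.client("rekognition")` is passed in, and the confidence threshold and label limit are arbitrary example values.

```python
from typing import Any, Dict, List


def labels_for_upload(rekognition: Any, event: Dict[str, Any],
                      min_confidence: float = 80.0) -> List[str]:
    """Return label names for the image referenced by an S3 put event."""
    record = event["Records"][0]["s3"]
    response = rekognition.detect_labels(
        Image={"S3Object": {"Bucket": record["bucket"]["name"],
                            "Name": record["object"]["key"]}},
        MaxLabels=10,                  # example value
        MinConfidence=min_confidence,  # example threshold
    )
    return [label["Name"] for label in response["Labels"]]
```

In a Lambda function, the returned labels would typically be written to a table such as DynamoDB for the website to read.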
  • 94. Architecture diagram (Real-time engagement), step 8: Simple analytical models are checked on-demand against Amazon ML
  • 95. Architecture diagram (Real-time engagement), step 9: Complex analytical models are scored against coded models (PMML)
  • 96. Architecture diagram (Real-time engagement), step 10: Scored response to the event is processed to be pushed for action (AWS Lambda)
  • 97. Architecture diagram (Real-time engagement), step 11: Recommendations are pushed to Amazon DynamoDB for low latency serving
  • 98. Architecture diagram (Real-time engagement), step 12: Actions are pushed to Amazon RDS and Amazon SQS for business process automation
  • 99. Architecture diagram (Real-time engagement), complete view including Amazon DynamoDB and Amazon SQS
  • 100. Demo: Live Twitter Feed Analysis • Components: Twitter stream, Amazon Kinesis, AWS Lambda, Amazon Elasticsearch Service • Twitter Blog*, on a typical day (in 2013): more than 500 million Tweets sent, an average of 5,700 TPS • * https://blog.twitter.com/2013/new-tweets-per-second-record-and-how
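The average rate quoted on this slide follows from simple arithmetic, worked through below. The shard estimate is an added illustration: it assumes the documented Kinesis Streams write limit of 1,000 records per second per shard and ignores the separate per-shard byte limit.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60            # 86,400
TWEETS_PER_DAY = 500_000_000              # "more than 500 million" from the slide
RECORDS_PER_SECOND_PER_SHARD = 1_000      # assumption: Kinesis per-shard write limit

average_tps = TWEETS_PER_DAY / SECONDS_PER_DAY
shards_for_average = math.ceil(average_tps / RECORDS_PER_SECOND_PER_SHARD)

print(round(average_tps))    # prints: 5787
print(shards_for_average)    # prints: 6
```

Peaks run well above the average, so a real stream would need headroom beyond this figure.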
  • 104. Robinhood Launches Popular No-fee Brokerage Trading Platform on AWS • Robinhood is an investment platform that offers free trades for everyone. It is based in Palo Alto, CA. • Robinhood’s lean staff used AWS to create a massively scalable securities trading app with strong built-in security and compliance features that supported hundreds of thousands of users at launch • Saved customers $22 million in commissions since launch, and transacted over $1 billion, all scaled up with 2 DevOps resources • Amazon Redshift has allowed the data science team to identify fraud and fight money laundering, without needing to hire a data science infrastructure team • “We can look at real-time analytics and behaviors on our platform, that wouldn't be available at our scale if we weren't using AWS.” Miles Wellesley, Head of Business Development
  • 105. Outcome 4: Automate for expansive reach • Automation of self-service, deployment, policy, and quality assurance • Technology: self-service, on-demand provisioning, DevOps, spot pricing, AWS CloudFormation, security automation, performance monitoring (CW & XR), global rollouts • Common initiatives • Self-service: application catalog or portal for all employees, with availability determined by role; service provisioning backed by automation of policy and governance • Agile development (use of DevOps to allow very few resources to deploy globally): CI/CD for software release, build/test, and deployment automation; templated infrastructure provisioning and configuration management; business rules and policies are "gold coded" to be used for all deployments; use of Security by Design (SbD) to codify network, O/S, and encryption • Comprehensive monitoring (assurance of SLA and issue remediation): logging and monitoring of all API calls and executions to ensure SLAs are met; analysis of performance variance for faster root cause analysis
  • 106. Architecture diagram (Automate for expansive reach: automation of self-service, deployment, policy, and quality assurance), complete view with AWS DevOps added
  • 107. AWS Glue • Easily understand your data sources, prepare the data, and load it reliably to data stores and your analytics pipeline • Integrated with S3, RDS, Redshift and any JDBC-compliant data store
  • 111. AWS Lambda • Use AWS Lambda to clean and massage incoming data • Write code to load data from sources (S3, DynamoDB) automatically into your data warehouse (e.g. Amazon Redshift) • React in real time to incoming events in Amazon Kinesis
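Cleaning incoming Kinesis data in Lambda, as the bullets above describe, could be sketched like this. The event layout (base64-encoded `data` under each record's `kinesis` key) is the shape Lambda delivers for Kinesis triggers; the assumption that each payload is a JSON object, and the `clean` rules themselves, are hypothetical.

```python
import base64
import json
from typing import Any, Dict, List


def clean(record: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical cleaning step: trim strings and drop empty fields."""
    cleaned = {}
    for key, value in record.items():
        if isinstance(value, str):
            value = value.strip()
        if value not in ("", None):
            cleaned[key] = value
    return cleaned


def lambda_handler(event: Dict[str, Any], context: Any) -> List[Dict[str, Any]]:
    """Decode every Kinesis record in the batch and return the cleaned rows."""
    rows = []
    for item in event["Records"]:
        payload = base64.b64decode(item["kinesis"]["data"])  # data arrives base64-encoded
        rows.append(clean(json.loads(payload)))
    return rows
```

A production handler would load the rows into the warehouse instead of returning them.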
  • 112. AdRoll: AWS Lambda for log files • Cross-platform, cross-device advertising platform • Offers retargeting based on clickstream data • 300 TB of new data per month • “Polling is not a scalable strategy to figure out when new files are added to S3, especially when you add 17M of them per month. So we moved Lambda in front of S3.” Valentino Volonghi, CTO, AdRoll
  • 113. Remember everything is an API • SDKs: Java, Python (boto), PHP, .NET, Ruby, Node.js, iOS, Android, Go, JavaScript, C++
  • 114. Affordable Petabyte-scale Analytics • AWS helps customers maximize the value of Big Data investments while reducing overall IT costs • Amazon S3: secure, highly durable storage, $28.16 / TB / month • Amazon Glacier: data archiving, $7.16 / TB / month • Amazon Kinesis: real-time streaming data load, $0.035 / GB • Amazon EMR: 10-node Spark cluster, $0.15 / hr • Amazon Redshift: petabyte-scale data warehouse, $0.25 / hr
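The unit prices on this slide can be combined into a rough monthly estimate. The prices below are the ones quoted on the slide (they date from the talk and are likely outdated); the workload sizes are invented purely to show the arithmetic.

```python
# Unit prices quoted on the slide.
S3_PER_TB_MONTH = 28.16
GLACIER_PER_TB_MONTH = 7.16
STREAMING_PER_GB = 0.035
SPARK_CLUSTER_PER_HOUR = 0.15
REDSHIFT_PER_HOUR = 0.25

# Hypothetical workload for one month.
s3_tb = 100
glacier_tb = 50
streamed_gb = 1_000
spark_hours = 10
redshift_hours = 720

total = (
    s3_tb * S3_PER_TB_MONTH
    + glacier_tb * GLACIER_PER_TB_MONTH
    + streamed_gb * STREAMING_PER_GB
    + spark_hours * SPARK_CLUSTER_PER_HOUR
    + redshift_hours * REDSHIFT_PER_HOUR
)
print(f"${total:,.2f}")  # prints: $3,390.50
```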
  • 115. Call To Action • Attend the official AWS Training courses organized by the AWS Authorized local training partner, Iverson Associates Sdn Bhd (www.iverson.com.my). • Join the AWS Jumpstart (2 hr) session and hear from our customers and partners on how they enabled their teams and successfully deployed on AWS. Also stand a chance to win a free seat in the above courses. • Point of contact: Cheryl Wong, cheryl.wong@iverson.com.my • Courses and dates: Architecting on AWS (28 Feb - 2 March), System Operations on AWS (8-10 March), Developing on AWS (15-17 March), Big Data on AWS (19-21 April) • Date and venue: 17 Mar 2017, Iverson Associates Sdn Bhd (303330-M), Suites T113-T114, 3rd Floor, Centrepoint, Lebuh Bandar Utama, Bandar Utama, 47800 Petaling Jaya, Selangor
  • 116. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. http://bit.ly/awssummitkl April 18 | One World Hotel | Kuala Lumpur Register Now!
  • 117. Join AWS User Group MY https://www.facebook.com/groups/awsugmy/

Editor's Notes

  1. 50 mins
  2. More: https://aws.amazon.com/blogs/aws/ec2-instance-update-x1-sap-hana-t2-nano-websites/