Module 1 - Principles and Construction of Recommender Systems, with a Focus on Amazon Personalize
Module 2 - Building a Data Analytics System for Recommender Systems
Module 3 - Making E-Commerce Sites Smarter (Amazon Comprehend & Fraud Detector)
Sungmin Kim, Solutions Architect at Amazon Web Services
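9. Collaborative Filtering
(+)
• No need to analyze content.
• Recommendations can be computed from user behavior logs alone.
• Applicable in many domains.
(-)
• Suffers from the cold-start problem.
Content-based Filtering
(-)
• Applicable only where content is rich.
• Scope of application is relatively limited.
(e.g., not easy to apply to music, movies, etc.)
(+)
• Recommendations are possible from content analysis alone.
• Recommendations can be computed without user behavior logs.
• Mitigates the cold-start problem.
Hybrid = Collaborative + Content-based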
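To make the collaborative filtering idea concrete, here is a minimal sketch of item-based collaborative filtering that uses only behavior logs. The interaction data, function names, and the choice of cosine similarity are illustrative assumptions, not something taken from the slides or from how Amazon Personalize works internally.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical behavior log: (user_id, item_id) pairs.
interactions = [
    ("u1", "A"), ("u1", "B"),
    ("u2", "A"), ("u2", "B"), ("u2", "C"),
    ("u3", "B"), ("u3", "C"),
]

def item_similarities(pairs):
    """Cosine similarity between items, based only on which users touched them."""
    users_by_item = defaultdict(set)
    for user, item in pairs:
        users_by_item[item].add(user)
    sims = {}
    for a in users_by_item:
        for b in users_by_item:
            if a == b:
                continue
            overlap = len(users_by_item[a] & users_by_item[b])
            norm = sqrt(len(users_by_item[a]) * len(users_by_item[b]))
            sims[(a, b)] = overlap / norm
    return sims

def recommend(user, pairs, top_n=3):
    """Score unseen items by similarity to items the user already interacted with."""
    sims = item_similarities(pairs)
    seen = {item for u, item in pairs if u == user}
    scores = defaultdict(float)
    for (a, b), similarity in sims.items():
        if a in seen and b not in seen:
            scores[b] += similarity
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(recommend("u1", interactions))        # user with history
print(recommend("new_user", interactions))  # cold start: no history, no scores
```

The second call illustrates the cold-start drawback listed above: a user with no logged behavior gets an empty list, because there is nothing to compare against.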
14. Amazon Personalize: How it works
Data Set Group (Data Sets)
• Interactions: user events / interactions
• Items: item metadata (a.k.a. catalog information, optional)
• Users: user metadata (e.g., demographics, optional)
Solution (Recipes)
• Model selection, training, tuning, and verification
Campaign
• Model hosting and inference
• GetRecommendations
• GetPersonalizedRanking
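16. What data should you prepare?
• Three kinds of data
  • Users
    • User metadata
    • Age, gender, customer membership, etc.
  • Items
    • Item metadata
    • Price, SKU (stock keeping unit), stock availability, etc.
  • (User-Item) Interactions
    • Behavior log data of users' actions on items
    • Purchase (buy), add to cart (cart), view product (view), etc.
• Interactions data is mandatory because it is used to compute recommendations
• Users and Items data is used to exclude data from the recommendation computation and to filter recommendation results
• Input data is stored in S3 in CSV format; the first row must contain the column headers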
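As an illustration of the CSV requirement, the sketch below serializes a few interaction records with a header row first. The column names follow the usual Amazon Personalize Interactions schema convention (USER_ID, ITEM_ID, EVENT_TYPE, TIMESTAMP); the sample rows and the helper name are made up, and uploading the file to S3 is left out.

```python
import csv
import io

# Column names follow the Personalize Interactions schema convention;
# the rows themselves are made-up sample data.
FIELDS = ["USER_ID", "ITEM_ID", "EVENT_TYPE", "TIMESTAMP"]

def interactions_to_csv(rows):
    """Serialize interaction dicts to CSV text, header row first."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

sample_rows = [
    {"USER_ID": "u1", "ITEM_ID": "sku-100", "EVENT_TYPE": "view", "TIMESTAMP": 1600000000},
    {"USER_ID": "u1", "ITEM_ID": "sku-100", "EVENT_TYPE": "cart", "TIMESTAMP": 1600000060},
    {"USER_ID": "u2", "ITEM_ID": "sku-200", "EVENT_TYPE": "buy", "TIMESTAMP": 1600000120},
]

csv_text = interactions_to_csv(sample_rows)
print(csv_text)
```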
20. How much data should you prepare?
Minimum Suggested Data Volume
• More than 50 users.
• More than 50 items.
• More than 1,500 interactions.
※ https://github.com/aws-samples/amazon-personalize-samples/blob/master/PersonalizeCheatSheet2.0.md
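A quick sanity check against these suggested minimums could look like the sketch below. The function name is hypothetical, and "more than" is read here as a strict inequality, which is an interpretation rather than something the slide states.

```python
def meets_suggested_minimum(interactions, min_users=50, min_items=50, min_interactions=1500):
    """interactions: iterable of (user_id, item_id) pairs.

    Returns True when the data exceeds all three suggested minimums.
    """
    interactions = list(interactions)
    users = {user for user, _ in interactions}
    items = {item for _, item in interactions}
    return (
        len(users) > min_users
        and len(items) > min_items
        and len(interactions) > min_interactions
    )

# Synthetic data: 60 users, 70 items, 2,000 interactions.
synthetic = [(f"u{n % 60}", f"i{n % 70}") for n in range(2000)]
print(meets_suggested_minimum(synthetic))
```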
21. Use case by Recipes
User personalization
• Recipes: User-Personalization; HRNN, HRNN-Metadata, HRNN-Coldstart (legacy); Popularity-Count (baseline)
Similar items
• Recipe: SIMS
Personalized ranking
• Recipe: Personalized-Ranking
32. Amazon Personalize
Data Set Group (Data Sets: Users, Items, and Interactions) → Solution (Recipes): model selection, training, tuning, and verification → Campaign: model hosting and inference
38. AWS Step Functions Use Cases
#1: Function orchestration
#2: Branching
#3: Error handling
#4: Human in the loop
#5: Parallel processing
#6: Dynamic parallelism
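Three of these use cases (function orchestration, branching, and error handling) can be sketched in a single Amazon States Language definition, built here as a Python dictionary. The Lambda ARNs, state names, and the `$.status` field are hypothetical placeholders; this is not the workflow used in the talk.

```python
import json

# Hypothetical Lambda ARNs; replace with real function ARNs.
IMPORT_FN = "arn:aws:lambda:us-east-1:123456789012:function:import-dataset"
TRAIN_FN = "arn:aws:lambda:us-east-1:123456789012:function:train-solution"

definition = {
    "Comment": "Sketch: orchestrate two tasks with retry, catch, and a branch",
    "StartAt": "ImportDataset",
    "States": {
        "ImportDataset": {
            "Type": "Task",
            "Resource": IMPORT_FN,
            # Error handling: retry with backoff, then fall through to Failed.
            "Retry": [{
                "ErrorEquals": ["States.TaskFailed"],
                "IntervalSeconds": 30,
                "MaxAttempts": 3,
                "BackoffRate": 2.0,
            }],
            "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "Failed"}],
            "Next": "ImportSucceeded",
        },
        # Branching: continue only if the (assumed) status field says ACTIVE.
        "ImportSucceeded": {
            "Type": "Choice",
            "Choices": [{
                "Variable": "$.status",
                "StringEquals": "ACTIVE",
                "Next": "TrainSolution",
            }],
            "Default": "Failed",
        },
        "TrainSolution": {"Type": "Task", "Resource": TRAIN_FN, "End": True},
        "Failed": {"Type": "Fail", "Error": "WorkflowFailed", "Cause": "Import did not complete"},
    },
}

def transition_targets(states):
    """Collect every state name referenced by Next, Default, Choices, or Catch."""
    targets = set()
    for state in states.values():
        for key in ("Next", "Default"):
            if key in state:
                targets.add(state[key])
        for rule in state.get("Choices", []) + state.get("Catch", []):
            targets.add(rule["Next"])
    return targets

definition_json = json.dumps(definition, indent=2)
print(definition_json)
```

The resulting JSON string is what one would pass as the state machine definition; `transition_targets` is only a local consistency check that every referenced state exists.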
43. Amazon Personalize
Data Set Group (Data Sets: Users, Items, and Interactions) → Solution (Recipes): model selection, training, tuning, and verification → Campaign: model hosting and inference
Event Tracker added alongside the Data Sets
44. Amazon Personalize
Same diagram as slide 43, with Real-Time Events sent to the Event Tracker
51. Filtering Recommendations
• Filter: exclude records from training
  • EVENT_TYPE: use only records of a specific EVENT_TYPE
  • EVENT_VALUE: use only records at or above a specific threshold value
    • (e.g., Review Rating > 2.0, Watch > 5)
• Dynamic Filter: exclude items from the results when calling a recommendation API
  • GetRecommendations
  • GetPersonalizedRanking
Workflow: Dataset Group → Dataset Import → Recipe & Solution → Campaign (GetRecommendations, GetPersonalizedRanking)
※ https://docs.aws.amazon.com/personalize/latest/dg/filter.html
52. Dynamic Filter
• Filtering by item
  • EXCLUDE ItemID WHERE items.genre IN ($GENRE)
  • EXCLUDE ItemID WHERE items.genre IN ("Comedy")
  • INCLUDE ItemID WHERE items.number_of_downloads < 20
• Filtering by interactions
  • INCLUDE ItemID WHERE interactions.event_type IN ("*")
  • EXCLUDE ItemID WHERE interactions.event_type IN ("click", "stream")
• Filtering by item based on user properties
  • EXCLUDE ItemID WHERE items.number_of_downloads < 20 IF CurrentUser.age > 18 AND CurrentUser.age < 30
  • INCLUDE ItemID WHERE items.genre IN ("Comedy") | EXCLUDE ItemID WHERE items.description IN ("classic")
※ Amazon Personalize now supports dynamic filters for applying business rules to your recommendations on the fly
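A filter with a `$GENRE` placeholder gets its values at request time. The sketch below only builds the request arguments; the ARNs and helper name are hypothetical, and the value format (one string of quoted, comma-separated values) reflects my understanding of the Personalize documentation, so verify it before relying on it.

```python
def build_recommendation_request(campaign_arn, user_id, filter_arn, genres, num_results=10):
    """Build keyword arguments for the Personalize Runtime GetRecommendations call."""
    # Assumption: placeholder values are passed as one string of quoted,
    # comma-separated values, e.g. '"Comedy","Action"'.
    genre_values = ",".join('"{}"'.format(genre) for genre in genres)
    return {
        "campaignArn": campaign_arn,
        "userId": user_id,
        "numResults": num_results,
        "filterArn": filter_arn,
        "filterValues": {"GENRE": genre_values},
    }

request = build_recommendation_request(
    campaign_arn="arn:aws:personalize:us-east-1:123456789012:campaign/example",  # hypothetical
    user_id="u1",
    filter_arn="arn:aws:personalize:us-east-1:123456789012:filter/exclude-genre",  # hypothetical
    genres=["Comedy", "Action"],
)
print(request)

# With AWS credentials configured, the call itself would look like:
#   import boto3
#   runtime = boto3.client("personalize-runtime")
#   response = runtime.get_recommendations(**request)
```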
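53. Applying recommendation filtering like business logic
• Using Amazon Personalize filtering: Web Server ← Amazon Personalize. Filtered-out items (X) leave the recommendation result partially filled.
• Applying filtering business logic: Web Server ← Lambda ← Amazon Personalize. Filtered-out items (X) are dropped in Lambda and the recommendation result is fully filled.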
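One way to get a fully filled result is to request more candidates than there are display slots and filter them in your own code (for example, inside a Lambda function). The sketch below shows only that post-processing step; the item IDs, the out-of-stock rule, and the function name are illustrative assumptions.

```python
def filter_and_fill(candidates, is_allowed, slots):
    """Keep allowed items in ranked order until `slots` items are collected."""
    result = []
    for item in candidates:
        if len(result) >= slots:
            break
        if is_allowed(item):
            result.append(item)
    return result

# Over-fetched, ranked recommendation list (hypothetical item IDs).
ranked = ["i1", "i2", "i3", "i4", "i5", "i6", "i7", "i8"]
out_of_stock = {"i2", "i3", "i5", "i6"}

filled = filter_and_fill(ranked, lambda item: item not in out_of_stock, slots=4)
print(filled)
```

If too few candidates survive the filter, the result is still short, so the over-fetch size has to be chosen with the expected rejection rate in mind.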
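54. Batch Recommendation
• Use cases
  - You want to compute and store recommendations for a large number of users at once
  - You want to deliver recommendation results continuously through a batch-based workflow (e.g., sending emails or notifications)
• Cost
  - When it is acceptable to serve users the same recommendation results
  - Batch processing is much more convenient and economical
• How to run a batch inference job
  • AWS web console
  • API call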
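For the API route, the batch job reads its input from a file in S3. As far as I understand the format, user recommendations take one JSON object per line with a `userId` key; the sketch below builds such a file body. The helper name is hypothetical, and the upload and job creation steps are omitted.

```python
import json

def batch_input_lines(user_ids):
    """Build a JSON Lines body: one {"userId": ...} object per line."""
    return "".join(json.dumps({"userId": str(user_id)}) + "\n" for user_id in user_ids)

body = batch_input_lines(["u1", "u2", 42])
print(body)
```

After uploading the body to S3, the job would be started with the Personalize `CreateBatchInferenceJob` API (`create_batch_inference_job` in boto3), pointing at the input and output S3 paths.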
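56. Real-time vs. batch recommendation cost
[Chart: cost versus data volume, for real-time recommendation and batch recommendation]
Cost factors
✓ Training data volume
✓ Model training time
✓ Inference API call time (TPS)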
57. Summary
• Why recommendations matter
- They affect key e-commerce metrics such as Retention Rate, Churn Rate, and Conversion Rate
• Criteria for evaluating recommendations
- A balance of Coverage, Relevance, and Serendipity
• Data required to apply Amazon Personalize
- Users, Items, Interactions
• Amazon Personalize Recipes
- User-Personalization, SIMS, Popularity-Count, Personalized-Ranking
• Steps for applying Amazon Personalize
- Data Ingestion (create Data Set Group & import Data Sets) → Training (create Recipe & Solution) → Inference (create Campaign)
• Automating recommendation computation with AWS Step Functions (MLOps)
• Ways to filter recommendation results
- Filter the Interactions data when computing recommendations
- Filter the results when calling the recommendation API (Dynamic Filter)
- Post-process the computed recommendations (apply business logic)
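61. Agenda
• Background knowledge for data analytics
  • Data structure
  • Data temperature spectrum
  • Data pipeline
• Data required to build a recommender system
  • User behavior logs
  • Recommendation performance metrics
• Data analytics architecture for collecting user behavior logs
• Extending the data analytics system to analyze recommendation performance
• Lessons Learned: Architectural Principles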
64. Data Temperature Spectrum
Hot data → Warm data → Cold data
• Request rate: High → Low
• Cost / GB: High → Low
• Latency: Low → High
• Data volume: Low → High
Vertical axis: Structure (Low to High)
Storage types shown: In-Memory, SQL, NoSQL, Search, Graph, Object Storage, Archive Storage
65. Simplify Big Data Processing
Data → Collect → Store → Process/Analyze (ETL) → Consume → Answers & Insights
Considerations: Time to answer (Latency), Throughput, Cost
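66. Data required for a recommender system
• User behavior logs for computing recommendations
  • Viewing a product detail page
  • Adding to cart
  • Purchasing a product
  • …
• Data for measuring recommendation performance
  • Number of impressions of recommended items
  • Number of clicks on recommended items
  • Display position of recommended items
  • …
Events: visit, view, cart, buy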
67. User behavior log collection
[Amazon Personalize diagram: Data Set Group (Data Sets: Users, Items, and Interactions) → Solution (Recipes): model selection, training, tuning, and verification → Campaign: model hosting and inference; Real-Time Events and Event Tracker]
68. User behavior log collection
[Diagram: Web Server (visit, view, cart, buy), Users, Items, and Interactions data, Amazon Personalize]
69. User behavior log collection
[Diagram: as slide 68, with Amazon Personalize expanded into Dataset Group, Dataset Import, Recipe & Solution, and Campaign, plus the Personalize Runtime API]
70. User behavior log collection
[Diagram: as slide 69, with the Personalize steps shown as an AWS Step Functions workflow]
71. User behavior log collection
[Diagram: as slide 70, with S3 added]
72. User behavior log collection
[Diagram: as slide 71, with the Users, Items, and Interactions data shown in S3]
73. User behavior log collection
[Diagram: Web Server (visit, view, cart, buy), S3 (Users, Items, Interactions), Step Functions, Personalize Runtime API]
How to deliver?
✓ Quickly
✓ Without loss
74. Key Components of Real-time Analytics
Data Source → Stream Ingestion → Stream Storage → Stream Process → Data Sink
• Data Source: devices and/or applications that produce real-time data at high velocity
• Stream Ingestion: data from tens of thousands of data sources can be written to a single stream
• Stream Storage: data are stored in the order they were received for a set duration of time and can be replayed indefinitely during that time
• Stream Process: records are read in the order they are produced, enabling real-time analytics or streaming ETL
• Data Sink: data lake (most common), database (least common)
76. User behavior log collection
[Diagram: Web Server (visit, view, cart, buy), S3 (Users, Items, Interactions), Step Functions, Personalize Runtime API, with the roles Data Source, Stream Storage, and Data Sink labeled]
77. User behavior log collection
[Diagram: as slide 76, with Stream Delivery added]
78. User behavior log collection
[Diagram: as slide 77, with candidate services: Kinesis Data Streams and Managed Streaming for Kafka (Stream Storage), Kinesis Data Firehose (Stream Delivery)]
81. Comparing Amazon Kinesis Data Streams to MSK
Amazon Kinesis Data Streams
• Streams and shards
• AWS API experience
• Throughput provisioning model
• Seamless scaling
• Typically lower costs
• Deep AWS integrations
Amazon MSK
• Topics and partitions
• Open-source compatibility
• Strong third-party tooling
• Cluster provisioning model
• Apache Kafka scaling isn’t seamless to clients
• Raw performance
82. Stream Ingestion
• AWS SDKs
  • Publish directly from application code via APIs
  • AWS Mobile SDK
• Kinesis Agent
  • Monitors log files and forwards lines as messages to Kinesis Data Streams
• Kinesis Producer Library (KPL)
  • Background process aggregates and batches messages
• 3rd party and open source
  • Kafka Connect (kinesis-kafka-connector)
  • fluentd (aws-fluent-plugin-kinesis)
  • Log4J Appender (kinesis-log4j-appender)
  • and more …
Data Source → Stream Ingestion → Stream Storage (Amazon Kinesis Data Streams) → Stream Process → Data Sink
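Publishing directly from application code via the AWS SDK could look roughly like the sketch below. It only builds the arguments for the Kinesis `PutRecord` call; the stream name, the event fields, and the choice of user ID as partition key are illustrative assumptions.

```python
import json
import time

def build_put_record_args(stream_name, user_id, item_id, event_type, timestamp=None):
    """Build keyword arguments for a Kinesis PutRecord call for one behavior event."""
    event = {
        "user_id": user_id,
        "item_id": item_id,
        "event_type": event_type,  # e.g. visit, view, cart, buy
        "timestamp": int(timestamp if timestamp is not None else time.time()),
    }
    return {
        "StreamName": stream_name,
        "Data": json.dumps(event).encode("utf-8"),
        # Assumption: partition by user so one user's events land on one shard.
        "PartitionKey": user_id,
    }

args = build_put_record_args("user-events", "u1", "sku-100", "view", timestamp=1600000000)
print(args)

# With AWS credentials configured, the call itself would look like:
#   import boto3
#   boto3.client("kinesis").put_record(**args)
```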
84. Amazon Kinesis Data Firehose
• Zero administration and seamless elasticity
• Direct-to-data store integration
• Serverless continuous data transformations
• Near real-time
86. Pre-built Data Conversion
• Convert the format of your input data from JSON to a columnar data format (Apache Parquet or Apache ORC) before storing the data in Amazon S3
• Works in conjunction with the transform features to convert other formats to JSON before the data conversion
[Diagram: Data Source → Kinesis Data Firehose (JSON data; schema from AWS Glue Data Catalog) → convert to columnar format → Amazon S3; a /failed prefix is also shown]
87. User behavior log collection
[Diagram: Web Server (visit, view, cart, buy), Stream Storage (Kinesis Data Streams, Managed Streaming for Kafka), Stream Delivery (Kinesis Data Firehose), S3 (Users, Items, Interactions), Step Functions, Personalize Runtime API]
88. User behavior log collection
[Diagram: Web Server (visit, view, cart, buy), Kinesis Data Streams, S3 (Users, Items, Interactions), Step Functions, Personalize Runtime API]
89. User behavior log collection
[Diagram: as slide 88, with Kinesis Data Firehose added]
90. User behavior log collection
[Diagram: as slide 89, with Lambda and Event Tracker added]
91. Recommendation performance analysis
• Key e-commerce metrics
  • Retention Rate (dwell time)
  • Churn Rate
  • Conversion Rate
• Recommendation algorithm metrics - A/B Test
  • Coverage
  • CTR (Click-through Rate)
    ≈ Relevance + Serendipity
If you can’t measure it, you can’t improve it.
– Peter Drucker
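As a small worked example of the algorithm metrics, the sketch below computes CTR and catalog coverage and compares two A/B variants. The definitions used (clicks divided by impressions; distinct recommended items divided by catalog size) are common conventions assumed here, and all numbers are made up.

```python
def click_through_rate(clicks, impressions):
    """CTR = clicks / impressions (0.0 when nothing was shown)."""
    if impressions == 0:
        return 0.0
    return clicks / impressions

def coverage(recommended_items, catalog):
    """Share of the catalog that appeared in at least one recommendation."""
    catalog = set(catalog)
    if not catalog:
        return 0.0
    return len(set(recommended_items) & catalog) / len(catalog)

def best_variant(stats):
    """stats: {variant: (clicks, impressions)} -> variant with the highest CTR."""
    return max(stats, key=lambda name: click_through_rate(*stats[name]))

ab_stats = {"A": (30, 1000), "B": (45, 1000)}  # made-up A/B test counts
print(best_variant(ab_stats))
print(coverage(["a", "b", "a"], ["a", "b", "c", "d"]))
```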
92. Recommendation performance analysis
[Diagram: Web Server (visit, view, cart, buy), Kinesis Data Streams, Kinesis Firehose, S3 (Users, Items, Interactions), Lambda, Event Tracker, Step Functions, Personalize Runtime API]
93. Recommendation performance analysis
[Diagram: Web Server (visit, view, cart, buy), Kinesis Data Streams, Kinesis Firehose, S3 (Users, Items, Interactions), Amazon Personalize; data flow labeled: user behavior logs]
94. Recommendation performance analysis
[Diagram: as slide 93, with the additional events ✓ click ✓ impression ✓ channel and a second data flow labeled: recommendation performance metrics]
95. Recommendation performance analysis
[Diagram: as slide 94, with consumers: Marketer, Data Scientist, Business Analyst]
96. Recommendation performance analysis
[Diagram: as slide 95, with QuickSight added]
97. Amazon QuickSight
Fast BI Service with Pay-per-Session Pricing and ML Insights for everyone
Data sources (relational databases, flat files, and many others) → Data sets (retail data, ops data, marketing data) → Analyses → Dashboards & stories
99. Comparison of SQL Processing Engines
Data Structure: Semi | Semi | Semi | Full
Languages: API/SQL | SQL | SQL | SQL
Data Store: S3 (Glue), S3/HDFS (Spark) | S3/HDFS | S3 | Local
Use case: Transformation | SQL Queries for S3/HDFS | Serverless SQL Queries for S3 | Fully Featured SQL Database
Performance
Engines named on the slide: AWS Glue, Amazon Athena, Amazon Redshift
105. Amazon EMR
Easily run and scale Apache Spark, Hive, Presto, and other big data frameworks
Stack layers: Applications, Framework, Process Layer, Data Layer, Infrastructure
Diagram labels: S3, EMRFS, Amazon S3, Instances, Spot Instances
106. Amazon Kinesis Data Analytics
A managed Apache Flink solution that enables building of sophisticated streaming applications
• Interact with streaming data in real-time using SQL or integrated Apache Flink applications
• Build fully managed and elastic stream processing applications
107. Kinesis Data Analytics for SQL
Data Source → Stream Ingestion → Stream Storage → Data Sink
Example: the input record "It's raining cats and dogs!" is split into (word, count) pairs:
[("It's", 1), ("raining", 1), ("cats", 1), ("and", 1), ("dogs!", 1)]
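The word-count example on this slide can be mimicked in plain Python to show what the streaming query computes. This is only an analogy under the assumption of whitespace tokenization; it is not Kinesis Data Analytics SQL, and the function names are made up.

```python
from collections import Counter

def to_word_pairs(line):
    """Split a record on whitespace and emit one (word, 1) pair per word."""
    return [(word, 1) for word in line.split()]

def count_words(lines):
    """Sum the per-word counts across a sequence of records."""
    totals = Counter()
    for line in lines:
        for word, count in to_word_pairs(line):
            totals[word] += count
    return totals

pairs = to_word_pairs("It's raining cats and dogs!")
print(pairs)
print(count_words(["cats and dogs", "cats"]))
```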
108. Kinesis Data Analytics (SQL)
• STREAM (in-application): a continuously updated entity that you can SELECT from and INSERT into like a TABLE
• PUMP: an entity used to continuously 'SELECT ... FROM' a source STREAM, and INSERT SQL results into an output STREAM
• Create output stream, which can be used to send to a destination
Source → SOURCE STREAM → INSERT & SELECT (PUMP) → DESTINATION STREAM → Destination
116. Use Amazon S3 as your Data lake
• Natively supported by big data frameworks (Spark, Hive, Presto, etc.)
• Decouple storage and compute
• No need to run compute clusters for storage (unlike HDFS)
• Can run transient Amazon EMR clusters with Amazon EC2 Spot Instances
• Multiple & heterogeneous analysis clusters and services can use the same
data
• Designed for 99.999999999% durability
• No need to pay for data replication within a region
• Secure: SSL, client/server-side encryption at rest
• Low cost
126. From Batch to Real-time: Lambda Architecture
Data Source → Stream Ingestion → Stream Storage (streaming data)
• Batch Layer: Stream Delivery → Raw Data Storage → Batch Process → Batch View
• Speed Layer: Stream Process → Real-time View
• Serving Layer: Query & Merge Results → Consumer
127. Collect → Store → Process / Analyze → Consume
Data → Answers & Insights
Services shown: Amazon Kinesis Data Firehose, Amazon Kinesis Data Streams, Amazon Managed Streaming for Apache Kafka, Amazon S3, Amazon Kinesis Data Analytics, AWS Glue, Amazon EMR, AWS Lambda, Amazon Athena, Amazon Redshift, Amazon Elasticsearch Service, Amazon Machine Learning, Amazon QuickSight; ETL
128. Same diagram as slide 127, with the label: Stream Storage
129. Same diagram as slide 127, with the label: Stream Delivery
130. Same diagram as slide 127, with the label: Data Lake
131. Same diagram as slide 127, with the label: Stream/Batch Process (Batch, Speed Layer)
132. Same diagram as slide 127, with the label: Serving Layer
133. Lessons Learned: Architectural Principles
• Build decoupled systems
- Data → Store → Process → Store → Analyze → Answers
• Use the right tool for the job
- Data structure, latency, throughput, access patterns
• Leverage managed and serverless services
- Scalable/elastic, available, reliable, secure, no/low admin
• Use log-centric design patterns
- Immutable logs (data lake), materialized views
• Be cost-conscious
- Big data ≠ Big cost
• Working backwards
- Design from consume to collect
150. AI Driven Social Media Dashboard
https://aws.amazon.com/ko/solutions/implementations/ai-driven-social-media-dashboard/
151. Reference
• Detect sentiment from customer reviews using Amazon Comprehend
• Enable smart text analytics using Amazon Elasticsearch Service and
Amazon Comprehend
• Build a social media dashboard using machine learning and BI services
• Building a custom classifier using Amazon Comprehend
• Build a custom entity recognizer using Amazon Comprehend
154. How can we do fraud detection?
Hand-designed rules vs. automated rule learning from data
155. Hand Designed Rules
If people develop fraud detection rules by hand, the challenges are:
✓ Scalability
✓ Detecting new types of fraud
✓ Domain experts
156. Automated Rule learning from Data
Fraud detection is hard with ML, too
✓ Lack of ML (machine learning) experts
✓ Repeated training and model evaluation
✓ Time-consuming work
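157. Amazon Fraud Detector
A fraud detection service that uses machine learning to make it easy to detect online fraud at scale and in real time
• Pre-built fraud detection model templates
• Automatic creation of custom fraud detection models
• Diverse patterns drawn from Amazon's internal experience
• Integration with Amazon SageMaker
• Integrated review of past evaluations and detection logic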
159. ML template: Online Fraud Insights
• Detect risky events based on an event’s attributes
• Best for detecting potential fraud when historical account/user data is
limited
• Inspired by models and techniques used to protect Amazon.com/AWS
account registration
• Use cases: new account, first transaction, guest checkout
• Inputs: 3 required data elements and 50+ optional
160. Data requirements (for Online Fraud Insights template)
EVENT_TIMESTAMP (required) | Variable 1 | Variable 2 | Variable N | EVENT_LABEL (required)
4/10/2019 11:05 | … | … | … | Legit / 0
4/10/2019 19:34 | … | … | … | Legit / 0
4/10/2019 20:29 | … | … | … | Fraud / 1
… | … | … | … | …
• At least 2 variables required (max 100)
• At least 10K total examples
• At least 500 fraud examples
• Data must reside in S3 (same region as AFD)
• Data should be in CSV format
• First line of CSV file should have headers
• 2 required headers: EVENT_TIMESTAMP and EVENT_LABEL (they should not have any NULL or missing values)
• Maximum file size of 5GB
• Minimum 6 weeks of data
• Recommended: 3-6 months of data
• AFD can handle NULL and missing values (for variables)
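Several of these requirements can be checked locally before uploading a file. The sketch below validates the headers, the variable count, and the row and fraud-example minimums; the function name, the set of labels treated as fraud, and the sample data are assumptions for illustration, and file size and date-range checks are left out.

```python
import csv
import io

REQUIRED_HEADERS = ("EVENT_TIMESTAMP", "EVENT_LABEL")

def check_training_csv(text, fraud_labels=("fraud", "1"), min_rows=10_000, min_fraud=500):
    """Return a list of problems found in CSV training data (empty list = OK)."""
    reader = csv.DictReader(io.StringIO(text))
    headers = reader.fieldnames or []
    problems = [f"missing header {name}" for name in REQUIRED_HEADERS if name not in headers]

    variables = [h for h in headers if h not in REQUIRED_HEADERS]
    if not 2 <= len(variables) <= 100:
        problems.append("need between 2 and 100 variables")

    rows = fraud = 0
    for row in reader:
        rows += 1
        timestamp = (row.get("EVENT_TIMESTAMP") or "").strip()
        label = (row.get("EVENT_LABEL") or "").strip()
        # Only the two required columns must be non-empty; variables may be missing.
        if all(name in headers for name in REQUIRED_HEADERS) and not (timestamp and label):
            problems.append(f"row {rows}: empty required field")
        if label.lower() in fraud_labels:
            fraud += 1

    if rows < min_rows:
        problems.append(f"only {rows} rows; need at least {min_rows}")
    if fraud < min_fraud:
        problems.append(f"only {fraud} fraud rows; need at least {min_fraud}")
    return problems

sample = (
    "EVENT_TIMESTAMP,ip_address,email_address,EVENT_LABEL\n"
    "4/10/2019 11:05,1.2.3.4,a@example.com,legit\n"
    "4/10/2019 19:34,1.2.3.5,,legit\n"
    "4/10/2019 20:29,1.2.3.6,c@example.com,fraud\n"
)
print(check_training_csv(sample, min_rows=3, min_fraud=1))
```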
163. Variables
• You will need to map all the event variables (the Variable 1 … Variable N columns between EVENT_TIMESTAMP and EVENT_LABEL) to a variable type
• Amazon Fraud Detector can also do this automatically when you import the dataset
• For more information, see Variable types.
Variable types: EMAIL_ADDRESS, IP_ADDRESS, PHONE_NUMBER, USERAGENT, FINGERPRINT, PAYMENT_TYPE, CARD_BIN, AUTH_CODE, AVS, BILLING_NAME, BILLING_PHONE, BILLING_ADDRESS_L1, BILLING_ADDRESS_L2, BILLING_CITY, BILLING_STATE, BILLING_COUNTRY, BILLING_ZIP, SHIPPING_NAME, SHIPPING_PHONE, SHIPPING_ADDRESS_L1, SHIPPING_ADDRESS_L2, SHIPPING_CITY, SHIPPING_STATE, SHIPPING_COUNTRY, SHIPPING_ZIP, ORDER_ID, PRODUCT_CATEGORY, CURRENCY_CODE, PRICE, NUMERIC, CATEGORICAL, FREE_FORM_TEXT
164. ML Template: Automated model building
Training data in Amazon S3 →
1. Data Validation
2. Data Enrichment & Transformation
3. Feature Engineering
4. Model Training & Selection
5. Performance Metrics
6. Deployment & Hosting
165. Interactive ML performance metrics
• GUI for defining the optimal decision threshold for the best separation between fraud and legitimate events
• Confusion matrix
• Easily control the trade-off between FP and FN
Part of Fraud Detector UI
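The threshold trade-off can be reproduced offline with a small confusion-matrix calculation. The scores, labels, and thresholds below are made-up illustration data, and the rule "score at or above the threshold means fraud" is an assumption of this sketch.

```python
def confusion_matrix(scores, labels, threshold):
    """Count TP/FP/FN/TN when scores >= threshold are flagged as fraud."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for score, is_fraud in zip(scores, labels):
        flagged = score >= threshold
        if flagged and is_fraud:
            counts["TP"] += 1
        elif flagged:
            counts["FP"] += 1
        elif is_fraud:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts

# Made-up model scores and ground-truth labels (True = fraud).
scores = [950, 800, 620, 400, 300, 120]
labels = [True, True, False, True, False, False]

for threshold in (350, 500, 700):
    print(threshold, confusion_matrix(scores, labels, threshold))
```

Lowering the threshold catches more fraud (fewer FN) at the price of more false alarms (more FP); raising it does the opposite.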
168. Reference
• Catching fraud faster by building a proof of concept in
Amazon Fraud Detector
• Reviewing online fraud using Amazon Fraud Detector and
Amazon A2I
• AWS Fraud Detector Samples