How to Train Your Classifier:
Create a Serverless Machine Learning System with AWS and Python
PyData ✤ November 27th, 2017 ✤ apmetadata@ap.org
Classification
Parrots
Sandwiches
Tags
Why do you want tags on your text content?
● Search, navigation, recommendations
● Aggregation, routing
● Discoverability
○ properties
○ relationships
Taxonomy
Jordan Larson
<http://cv.ap.org/id/9A7FD8FA87AD4A43BDD522B65147A808> ,
ap:associatedState <http://cv.ap.org/id/8083[Nebraska]43E>;
ap:displayLabel "Jordan Larson (Women's volleyball)"@en;
ap:hometown "Hooper, NE"@en;
ap:olympicTeam2016 <http://cv.ap.org/id/46[United States Olympic Team]B73H>;
ap:sport <http://cv.ap.org/id/DA[Volleyball]C8EA>;
dbprop:birthdate "1986-10-16"^^xsd:date;
dcterms:created "2012-07-11T14:30:26-04:00"^^xsd:dateTime;
dcterms:modified "2017-07-25T10:37:49-04:00"^^xsd:dateTime;
a <http://cv.ap.org/c/ProfessionalAthlete>, skos:Concept;
skos:broader <http://cv.ap.org/id/384[Professional Athlete]88>;
skos:definition "American volleyball player."@en;
skos:inScheme <http://cv.ap.org/a#person>;
skos:prefLabel "Jordan Larson"@en;
foaf:gender "Female"@en.
Applying taxonomy to text
Manually
Airlines
Industry
Pan American Airlines Co.
Travel
<Hurricane Harvey>
(AND,
  (MINOC_2,
    (SENT,
      (NOTIN,
        (OR,"Harvey_C","HARVEY_C"),
        (OR,"[Fullname female]","[Fullname male]","[Person]")),
      (OR,"texas","landfall","storm","hurricane","nws",
          "National weather service","evacuate@","surge@","flood@",
          "rain@N","coastal","sandbag@N"...
      )
    )
  )...
Applying taxonomy to text
Rules-based classifier
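The idea behind a rule like the one above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: it does not implement the rule syntax shown on the slide (no sentence scoping, occurrence counts, or stemming operators), and the term list, threshold, and function names are hypothetical.

```python
import re

def words(text):
    """Lowercase word tokens; a stand-in for real tokenization."""
    return set(re.findall(r"[a-z]+", text.lower()))

# Hypothetical, simplified rule for a Hurricane Harvey tag:
# the text must mention "harvey" AND contain at least MIN_HITS storm terms.
STORM_TERMS = {"texas", "landfall", "storm", "hurricane",
               "flood", "rain", "coastal", "surge"}
MIN_HITS = 2

def matches_hurricane_harvey(text):
    tokens = words(text)
    return "harvey" in tokens and len(tokens & STORM_TERMS) >= MIN_HITS
```

A text such as "Hurricane Harvey made landfall in Texas" satisfies the rule, while a text about a person named Harvey with no storm vocabulary does not. Real rules need far more care around disambiguation, which is what the NOTIN clause on the slide is for.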
https://www.flickr.com/photos/notionscapital/15556898221/
Applying taxonomy to text
Statistical classifier
Training data → Training engine → Trained model
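To make the training step concrete, here is a minimal sketch of a statistical text classifier. It uses a small multinomial naive Bayes written with the standard library only; that technique is an assumption chosen for illustration and is not necessarily what the AP training engine uses. The training sentences and function names are made up, and the two classes borrow the Parrots and Sandwiches example from earlier in the deck.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def train(examples):
    """Training engine: turn (text, tag) pairs into a model of word counts."""
    tag_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, tag in examples:
        tag_counts[tag] += 1
        for word in tokenize(text):
            word_counts[tag][word] += 1
            vocab.add(word)
    return {"tags": tag_counts, "words": word_counts, "vocab": vocab}

def classify(model, text):
    """Pick the tag with the highest log-probability for the text."""
    total_docs = sum(model["tags"].values())
    vocab_size = len(model["vocab"])
    best_tag, best_score = None, -math.inf
    for tag, n_docs in model["tags"].items():
        counts = model["words"][tag]
        n_words = sum(counts.values())
        score = math.log(n_docs / total_docs)
        for word in tokenize(text):
            # Laplace smoothing so unseen words do not zero out the score.
            score += math.log((counts[word] + 1) / (n_words + vocab_size))
        if score > best_score:
            best_tag, best_score = tag, score
    return best_tag

# Hypothetical training data.
TRAINING_DATA = [
    ("the parrot squawked and flew from its perch", "Parrots"),
    ("a green parrot can mimic human speech", "Parrots"),
    ("toasted bread with ham and cheese", "Sandwiches"),
    ("the sandwich had bread lettuce and tomato", "Sandwiches"),
]
model = train(TRAINING_DATA)
```

With this toy data, classify(model, "cheese on toasted bread") picks Sandwiches. In practice a library such as scikit-learn would replace the hand-written counting.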
AP Metadata Services
Tag with AP taxonomy
APMS Custom Tagging
Simple four step REST API
Add your own tags and taxonomy
Let’s create a classifier! For dragons
What if I like the AP Taxonomy
but I want to classify with some additional tags?
In this case, documents about dragons
A taxonomy of dragons
(borrowed from screencrush.com)
New documents about dragons
To be classified
A map (with some * )
A fully automated workflow for training and deploying a Lambda-based classifier
Sadly, the expression hic sunt dracones (here be dragons) is an anachronism, but it does appear at least once, on the Hunt-Lenox globe (ca. 1510).
The Hunt-Lenox Globe (NYPL)
* Dragon emojis indicate problems found and (mostly) solved
[Architecture diagram]
Workflow: Client, Step Functions
Scaling: EC2 Auto Scaling
Worker: EC2 (download training data, download dependencies, train model, deploy model)
Classifier: API Gateway, Lambda (classifier.py, classifier.pkl, tags.json)
Creating a classifier
A Lambda-based classifier
• AWS Lambda: run event-driven code without provisioning or managing servers
  • Cost-efficient way to ensure capacity meets demand
• What do we need?
  • Code to invoke the classifier and return results to the user
  • Code dependencies (e.g. scikit-learn)
  • Other supporting artifacts (the trained model, the taxonomy)
  • Permissions for the Lambda function to interact with other AWS services
  • An API endpoint for accessing the Lambda function
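The "code to invoke the classifier and return results" item might look roughly like the sketch below. It is not the deck's actual classifier.py. It assumes an API Gateway Lambda proxy event whose JSON body has a "text" field, a pickled model with a scikit-learn style predict method, and a tags.json that maps tag ids to labels; all of these shapes are assumptions made for illustration.

```python
import json
import os
import pickle

# Assumed artifact names, following classifier.pkl and tags.json in this deck.
MODEL_PATH = os.environ.get("MODEL_PATH", "classifier.pkl")
TAGS_PATH = os.environ.get("TAGS_PATH", "tags.json")

_model = None  # cached so warm invocations skip the load
_tags = None

def _load_artifacts():
    global _model, _tags
    if _model is None:
        with open(MODEL_PATH, "rb") as f:
            _model = pickle.load(f)
        with open(TAGS_PATH) as f:
            _tags = json.load(f)
    return _model, _tags

def build_response(event, model, tags):
    """Classify the text in the request body and format a proxy response."""
    try:
        text = json.loads(event.get("body") or "{}")["text"]
    except (ValueError, KeyError, TypeError):
        error = {"error": "expected a JSON body with a 'text' field"}
        return {"statusCode": 400, "body": json.dumps(error)}
    tag_id = str(model.predict([text])[0])
    body = {"tag": tag_id, "label": tags.get(tag_id, tag_id)}
    return {"statusCode": 200, "body": json.dumps(body)}

def handler(event, context):
    """Lambda entry point."""
    model, tags = _load_artifacts()
    return build_response(event, model, tags)
```

Keeping build_response separate from the artifact loading makes the request handling easy to exercise locally with a stub model.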
Processing user requests
Validate and train
Adding complexity: a workflow for algorithm selection
AWS Step Functions: use visual workflows to coordinate microservices into a single application
Triggers auto-scaling and sends the training request to a worker in the cloud.
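A Step Functions workflow is defined in the Amazon States Language, a JSON document. The sketch below builds one as a Python dictionary for a validate, scale, and train flow. The state names, the "$.valid" field, and the ARNs are placeholders invented for this example, not the deck's real definition.

```python
import json

# Placeholder ARNs: substitute the ARNs of real Lambda functions.
VALIDATE_ARN = "arn:aws:lambda:us-east-1:123456789012:function:validate-request"
SCALE_ARN = "arn:aws:lambda:us-east-1:123456789012:function:scale-up-workers"
TRAIN_ARN = "arn:aws:lambda:us-east-1:123456789012:function:send-training-request"

STATE_MACHINE = {
    "Comment": "Validate a training request, then hand it to a worker",
    "StartAt": "ValidateRequest",
    "States": {
        "ValidateRequest": {"Type": "Task", "Resource": VALIDATE_ARN, "Next": "IsValid"},
        "IsValid": {
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.valid", "BooleanEquals": True, "Next": "ScaleUpWorkers"}
            ],
            "Default": "RejectRequest",
        },
        "ScaleUpWorkers": {"Type": "Task", "Resource": SCALE_ARN, "Next": "SendTrainingRequest"},
        "SendTrainingRequest": {"Type": "Task", "Resource": TRAIN_ARN, "End": True},
        "RejectRequest": {
            "Type": "Fail",
            "Error": "InvalidRequest",
            "Cause": "Training request failed validation",
        },
    },
}

def next_states(state):
    """List every state name that a state can transition to."""
    targets = [state[key] for key in ("Next", "Default") if key in state]
    targets += [choice["Next"] for choice in state.get("Choices", [])]
    return targets

definition_json = json.dumps(STATE_MACHINE, indent=2)
```

The resulting definition_json string is what would be supplied as the state machine definition when creating it in AWS.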
Training and deploying
Training in the cloud
• AWS EC2: scalable computing capacity in the cloud
• Register an Amazon Machine Image (AMI) specifically for training
  • Speeds up provisioning your server
  • Ensures versions match between dependencies and your model
• Prepare dependencies ahead of time to beat AWS Lambda’s size limits
  • If you are using scikit-learn, sklearn-build-lambda can generate an appropriately sized zip
• Save model and taxonomy to disk, add to dependency zip
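The last step above, adding the saved model and taxonomy to the dependency zip, could be sketched as follows. The function names are hypothetical; the file names follow the classifier.pkl and tags.json artifacts named in this deck, and the 50MB figure is the compressed deployment package limit it quotes.

```python
import json
import os
import pickle
import zipfile

LAMBDA_ZIP_LIMIT = 50 * 1024 * 1024  # compressed package limit quoted in this deck

def add_artifacts(zip_path, model, taxonomy,
                  model_name="classifier.pkl", tags_name="tags.json"):
    """Append the trained model and the taxonomy to a dependency zip."""
    with zipfile.ZipFile(zip_path, "a", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(model_name, pickle.dumps(model))
        zf.writestr(tags_name, json.dumps(taxonomy))
    return zip_path

def fits_lambda_limit(zip_path, limit=LAMBDA_ZIP_LIMIT):
    """Check the finished package against the compressed size limit."""
    return os.path.getsize(zip_path) <= limit
```

Checking the size before deployment gives an early signal that the model should live in S3 instead of inside the package.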
Automating deployments
• Serverless Framework: Node.js application for rapid deployment of serverless architectures
• Simplifies the task of creating (and deleting) our classifier Lambdas
  • Provider agnostic, though you may not be
  • Zip artifact support for Lambda creation
Classifying with AWS Lambda
• Be mindful of cold starts
  • Allocating more memory may help
• Store large models in S3 and take advantage of container reuse
  • Download assets to /tmp
  • Check /tmp for cached data before invocation
Item                                 Limit
Deployment package (compressed)      50MB
Deployment package (uncompressed)    250MB
Non-persistent disk space in /tmp    512MB
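The download-then-cache pattern described above can be sketched like this. The bucket and key are supplied by the caller, and cached_asset and its parameters are hypothetical names for this example; the only AWS call used is the boto3 S3 client's download_file.

```python
import os

CACHE_DIR = "/tmp"  # the writable scratch space inside a Lambda container

def _download_from_s3(bucket, key, dest):
    # Imported lazily so the caching logic can be exercised without boto3.
    import boto3
    boto3.client("s3").download_file(bucket, key, dest)

def cached_asset(bucket, key, cache_dir=CACHE_DIR, download=_download_from_s3):
    """Return a local path for the S3 object, downloading only on a cache miss."""
    dest = os.path.join(cache_dir, os.path.basename(key))
    if not os.path.exists(dest):
        partial = dest + ".part"
        download(bucket, key, partial)
        # Rename only after a complete download so a failed transfer
        # is never mistaken for a cached file.
        os.replace(partial, dest)
    return dest
```

On a warm container the file is already in /tmp, so only the first invocation after a cold start pays the download cost.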
How do I measure results?
Confusion matrix

                 Predicted  Predicted  Predicted  Sum of items
                 Eagles     Doves      Pigeons    = 300
Actual Eagles    95         3          2          100 Eagles
Actual Doves     3          72         25         100 Doves
Actual Pigeons   2          23         75         100 Pigeons
Measure your model’s performance per class
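• Precision (for a given class, the number of correct predictions of that class divided by the total number of items predicted as that class)
• Recall (for a given class, the number of correct predictions of that class divided by the total number of items that actually belong to that class)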
Model accuracy:
(95 + 72 + 75) / 300 = 242 / 300 ≈ 81%
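The per-class measures and the overall accuracy can be computed directly from the confusion matrix. The sketch below uses the Eagles, Doves, and Pigeons counts from the slide, with rows as actual classes and columns as predicted classes; the function names are illustrative.

```python
LABELS = ["Eagles", "Doves", "Pigeons"]

# Rows are actual classes, columns are predicted classes.
MATRIX = [
    [95, 3, 2],
    [3, 72, 25],
    [2, 23, 75],
]

def per_class_metrics(matrix, labels):
    metrics = {}
    for i, label in enumerate(labels):
        true_positives = matrix[i][i]
        predicted = sum(row[i] for row in matrix)  # column total
        actual = sum(matrix[i])                    # row total
        metrics[label] = {
            "precision": true_positives / predicted if predicted else 0.0,
            "recall": true_positives / actual if actual else 0.0,
        }
    return metrics

def accuracy(matrix):
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

for label, m in per_class_metrics(MATRIX, LABELS).items():
    print(f"{label}: precision={m['precision']:.2f} recall={m['recall']:.2f}")
print(f"accuracy={accuracy(MATRIX):.3f}")
```

For Doves, precision is 72 / 98 (about 0.73) and recall is 72 / 100 (0.72), which shows how the Doves and Pigeons confusion drags both classes down even though Eagles score 0.95 on both measures.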
How do I improve results?
Training data
• Correctly tagged - quality matters
• Quantity matters too - as long as it’s ‘good’ data!
• Balanced training sets across classes
How do I improve results?
Taxonomy
• Clean taxonomy nodes and structure
• Distinct semantics, use relationships
• Avoid overlapping concepts between nodes
Thank You!
dfox@ap.org
smyles@ap.org
vzielinska@ap.org
apmetadata@ap.org
Learn more about AP Metadata Services
https://developer.ap.org/ap-metadata-services

 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 

Último (20)

DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 

How to Train Your Classifier: Create a Serverless Machine Learning System with AWS and Python

  • 1. How to Train Your Classifier: Create a Serverless Machine Learning System with AWS and Python PyData ✤ November 27th, 2017 ✤ apmetadata@ap.org
  • 3. Tags: Why do you want tags on your text content?
      ● Search, navigation, recommendations
      ● Aggregation, routing
      ● Discoverability
          ○ properties
          ○ relationships
  • 5. Taxonomy: Jordan Larson
      <http://cv.ap.org/id/9A7FD8FA87AD4A43BDD522B65147A808>
          ap:associatedState <http://cv.ap.org/id/8083[Nebraska]43E>;
          ap:displayLabel "Jordan Larson (Women's volleyball)"@en;
          ap:hometown "Hooper, NE"@en;
          ap:olympicTeam2016 <http://cv.ap.org/id/46[United States Olympic Team]B73H>;
          ap:sport <http://cv.ap.org/id/DA[Volleyball]C8EA>;
          dbprop:birthdate "1986-10-16"^^xsd:date;
          dcterms:created "2012-07-11T14:30:26-04:00"^^xsd:dateTime;
          dcterms:modified "2017-07-25T10:37:49-04:00"^^xsd:dateTime;
          a <http://cv.ap.org/c/ProfessionalAthlete>, skos:Concept;
          skos:broader <http://cv.ap.org/id/384[Professional Athlete]88>;
          skos:definition "American volleyball player."@en;
          skos:inScheme <http://cv.ap.org/a#person>;
          skos:prefLabel "Jordan Larson"@en;
          foaf:gender "Female"@en.
  • 6. Applying taxonomy to text: manually. [Diagram labels: Airlines Industry, Pan American Airlines Co., Travel]
  • 8. Applying taxonomy to text: statistical classifier. [Diagram labels: Training data, Training engine, Trained model]
  • 9. AP Metadata Services: tag with AP taxonomy. APMS Custom Tagging: simple four-step REST API; add your own tags and taxonomy.
  • 10. Let’s create a classifier! For dragons. What if I like the AP Taxonomy but I want to classify with some additional tags? In this case, documents about dragons.
  • 11. A taxonomy of dragons (borrowed from screencrush.com). New documents about dragons, to be classified.
  • 12. A map (with some *): a fully automated workflow for training and deploying a Lambda-based classifier. Sadly, the expression "hic sunt dracones" ("here be dragons") is an anachronism, but it does appear at least once, on the Hunt-Lenox Globe (ca. 1510). [Image: The Hunt-Lenox Globe (NYPL)] * Dragon emojis indicate problems found and (mostly) solved.
  • 13. Creating a classifier. [Architecture diagram. Components: Client, Step Functions, EC2 Auto Scaling, EC2, API Gateway, Lambda. Steps: download training data, download dependencies, train model, deploy model. Artifacts: classifier.py, classifier.pkl, tags.json. Lanes: Workflow, Scaling, Worker, Classifier.]
  • 14. A Lambda-based classifier
      • AWS Lambda: run event-driven code without provisioning or managing servers
          • Cost-efficient way to ensure capacity meets demand
      • What do we need?
          • Code to invoke the classifier and return results to the user
          • Code dependencies (e.g. scikit-learn)
          • Other supporting artifacts (the trained model, the taxonomy)
          • Permissions for the Lambda function to interact with other AWS services
          • API endpoint for accessing the Lambda function
  • 15. Processing user requests. [Architecture diagram from slide 13, repeated.]
  • 16. Processing user requests: validate and train. Adding complexity: a workflow for algorithm selection. AWS Step Functions: use visual workflows to coordinate microservices into a single application. The workflow triggers auto-scaling and sends the training request to a worker in the cloud.
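A validate-then-train workflow like this can be written down as a Step Functions state machine in Amazon States Language. The sketch below builds one as a Python dict; the state names, the retry settings, and both Lambda ARNs are hypothetical placeholders, not the definition used in the talk.

```python
import json

# Hypothetical ARNs: replace with functions that exist in your account.
VALIDATE_ARN = "arn:aws:lambda:us-east-1:123456789012:function:validate-request"
TRAIN_ARN = "arn:aws:lambda:us-east-1:123456789012:function:request-training"

STATE_MACHINE = {
    "Comment": "Validate a training request, then hand it to a worker (sketch).",
    "StartAt": "ValidateRequest",
    "States": {
        "ValidateRequest": {
            "Type": "Task",
            "Resource": VALIDATE_ARN,
            "Next": "RequestTraining",
        },
        "RequestTraining": {
            "Type": "Task",
            "Resource": TRAIN_ARN,
            # Retry a failed hand-off a couple of times before giving up.
            "Retry": [
                {
                    "ErrorEquals": ["States.TaskFailed"],
                    "IntervalSeconds": 30,
                    "MaxAttempts": 2,
                    "BackoffRate": 2.0,
                }
            ],
            "End": True,
        },
    },
}


def check_transitions(definition):
    """Return transition targets that do not name a defined state."""
    states = definition["States"]
    targets = [definition["StartAt"]]
    targets += [s["Next"] for s in states.values() if "Next" in s]
    return [t for t in targets if t not in states]


# Serialized form, as Step Functions expects the definition to be supplied.
definition_json = json.dumps(STATE_MACHINE, indent=2)
```

The resulting string could then be passed as the `definition` argument of boto3's `create_state_machine`, together with a name and an IAM role ARN of your own.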
  • 17. Training and deploying. [Architecture diagram from slide 13, repeated.]
  • 18. Training in the cloud
      • AWS EC2: scalable computing capacity in the cloud
      • Register an Amazon Machine Image (AMI) specifically for training
          • Speeds up provisioning your server
          • Ensures versions match between dependencies and your model
      • Prepare dependencies ahead of time to beat AWS Lambda’s size limits
          • If you are using scikit-learn, sklearn-build-lambda can generate an appropriately sized zip
      • Save model and taxonomy to disk, add to dependency zip
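The last step, adding the saved model and taxonomy to the dependency zip, can be done with the standard library alone. This is a sketch under assumptions: the function name `add_artifacts` is hypothetical, artifacts are stored at the root of the archive, and the size check uses the 50 MB compressed-package limit quoted later in the deck.

```python
import os
import zipfile

# Compressed deployment package limit quoted in the slides (50 MB).
MAX_ZIPPED_BYTES = 50 * 1024 * 1024


def add_artifacts(zip_path, artifact_paths):
    """Append artifacts to the dependency zip and return the new zip size.

    Files already present in the archive are skipped, so the function is
    safe to call twice. Raises ValueError if the zip exceeds the limit.
    """
    with zipfile.ZipFile(zip_path, "a", zipfile.ZIP_DEFLATED) as zf:
        existing = set(zf.namelist())
        for path in artifact_paths:
            name = os.path.basename(path)  # store at the archive root
            if name not in existing:
                zf.write(path, arcname=name)
    size = os.path.getsize(zip_path)
    if size > MAX_ZIPPED_BYTES:
        raise ValueError(f"{zip_path} is {size} bytes, over the Lambda limit")
    return size
```

Typical use after training would be `add_artifacts("dependencies.zip", ["classifier.pkl", "tags.json"])`, where the zip name is again only an example.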
  • 19. Automating deployments
      • Serverless Framework: Node.js application for rapid deployment of serverless architectures
      • Simplifies the task of creating (and deleting) our classifier Lambdas
          • Provider agnostic, though you may not be
          • Zip artifact support for Lambda creation
  • 20. Classifying with AWS Lambda. [Architecture diagram from slide 13, repeated.]
  • 21. Classifying with AWS Lambda
      • Be mindful of cold starts
          • Allocating more memory may help
      • Store large models in S3 and take advantage of container reuse
          • Download assets to /tmp
          • Check /tmp for cached data before downloading again
      Lambda limits at the time of the talk:
          Item                                  Limit
          Deployment package (compressed)       50 MB
          Deployment package (uncompressed)     250 MB
          Non-persistent disk space in /tmp     512 MB
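The /tmp caching pattern can be sketched as follows. The helper names are hypothetical, and the downloader is passed in as a parameter so the caching logic can be exercised without AWS credentials; the default uses boto3's `download_file`, which needs boto3 and access to the bucket at run time.

```python
import os

CACHE_DIR = "/tmp"  # Lambda's non-persistent scratch space


def s3_download(bucket, key, dest):
    """Default downloader: fetch one object from S3 to a local path."""
    import boto3  # imported lazily so the cache logic has no hard dependency

    boto3.client("s3").download_file(bucket, key, dest)


def fetch_cached(bucket, key, cache_dir=CACHE_DIR, download=s3_download):
    """Return a local path for the object, downloading only on a cache miss.

    A reused (warm) container still has the file from an earlier invocation,
    so only a cold start pays for the download.
    """
    dest = os.path.join(cache_dir, os.path.basename(key))
    if not os.path.exists(dest):
        partial = dest + ".part"
        download(bucket, key, partial)
        os.replace(partial, dest)  # publish only a completed download
    return dest
```

Writing to a `.part` file first is a design choice: if an invocation times out mid-download, the next one does not mistake a truncated file for a valid cached model.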
  • 22. How do I measure results? Confusion matrix (sum of items = 300)
                          Predicted Eagles   Predicted Doves   Predicted Pigeons   Total
      Actual Eagles                     95                 3                   2   100 Eagles
      Actual Doves                       3                72                  25   100 Doves
      Actual Pigeons                     2                23                  75   100 Pigeons
  • 23. How do I measure results? Measure your model’s performance per class
      • Precision (number of correct predictions for a class divided by the total number of items predicted as that class)
      • Recall (number of correct predictions for a class divided by the total number of items that actually belong to that class)
      • Model accuracy for the eagles/doves/pigeons confusion matrix on slide 22: (95 + 72 + 75) / 300 = 242 / 300 ≈ 81%
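As a worked example, the per-class metrics can be computed directly from the eagles/doves/pigeons confusion matrix. The matrix values come from the slides; the function names are illustrative only.

```python
LABELS = ["Eagles", "Doves", "Pigeons"]

# Rows are actual classes, columns are predicted classes (from the slides).
MATRIX = [
    [95, 3, 2],    # actual Eagles
    [3, 72, 25],   # actual Doves
    [2, 23, 75],   # actual Pigeons
]


def per_class_metrics(matrix, labels):
    """Return {label: {"precision": ..., "recall": ...}} for each class."""
    metrics = {}
    for i, label in enumerate(labels):
        correct = matrix[i][i]
        predicted = sum(row[i] for row in matrix)  # column total
        actual = sum(matrix[i])                    # row total
        metrics[label] = {
            "precision": correct / predicted if predicted else 0.0,
            "recall": correct / actual if actual else 0.0,
        }
    return metrics


def accuracy(matrix):
    """Fraction of all items that sit on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


if __name__ == "__main__":
    for label, m in per_class_metrics(MATRIX, LABELS).items():
        print(f"{label}: precision={m['precision']:.2f} recall={m['recall']:.2f}")
    print(f"accuracy={accuracy(MATRIX):.2%}")
```

Eagles come out at 0.95 for both precision and recall, while Doves (72/98 precision, 0.72 recall) and Pigeons (75/102 precision, 0.75 recall) show where the model confuses the two classes, which a single accuracy figure hides.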
  • 24. How do I improve results? Training data
      • Correctly tagged: quality matters
      • Quantity matters too, as long as it’s ‘good’ data!
      • Balanced training sets across classes
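One quick way to check the balance point before training is to count examples per class. This is a small sketch; the function name and the idea of reporting a largest-to-smallest ratio are illustrative choices, not something prescribed in the talk.

```python
from collections import Counter


def class_balance(labels):
    """Return per-class counts and the largest-to-smallest class size ratio.

    A ratio near 1.0 means the training set is balanced; a large ratio
    flags classes that may need more (good) examples.
    """
    counts = Counter(labels)
    if not counts:
        raise ValueError("no training labels supplied")
    ratio = max(counts.values()) / min(counts.values())
    return counts, ratio
```

For example, a set with 100 documents tagged "eagle" and 25 tagged "dove" yields a ratio of 4.0.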
  • 25. How do I improve results? Taxonomy
      • Clean taxonomy nodes and structure
      • Distinct semantics, use relationships
      • Avoid overlapping concepts between nodes
  • 26. Thank You! dfox@ap.org, smyles@ap.org, vzielinska@ap.org, apmetadata@ap.org. Learn more about AP Metadata Services: https://developer.ap.org/ap-metadata-services