Maintaining High Quality User-Generated Content
Through Machine Learning
Nikhil Dandekar
Quora: Nikhil-Dandekar
Twitter: @nikhilbd
Paula Griffin
Quora: Paula-Griffin-1
Twitter: @paulajgriffin
What is Quora?
Quora is a platform to ask
questions, get useful
answers, and share what
you know with the world.
Incredible answers from credible sources
Not everyone is Peter Norvig.
● Biggest challenges of any user-generated-content site are quality and moderation
● Two (mostly distinct) sets of users to deal with
○ Bad actors trying to cause harm
○ Well-meaning users who miss the mark
Bad actors
Well-meaning users
Growing challenges
● Millions of questions, answers, users, and topics
○ More incentives for bad actors
○ More users who aren’t familiar with Quora norms
● Without active effort, quality gets worse as we scale
● We need solutions that get better as our content grows
Solving these problems together
Writing the rulebook
● First step: deciding what you want on your platform
● “Be Nice, Be Respectful” policy since before our public launch in 2010
○ No hate speech
○ No harassment
○ No retaliation
● Almost all other policies flow from “being helpful” to someone viewing the page
○ Don’t write joke answers
○ Tag content with appropriate topics
Enforcing the rules
● Users can report content and users for violating Quora’s policies
● Starting out: manual review of all reports
● Problems:
○ Many man-hours needed to review all reports
○ Low reporting rates
○ The worst part: someone actually has to see the bad content
Enforcing the rules at scale
● Heuristics and machine learning help us reduce the burden of handling user reports, and
can proactively identify bad content
○ Deal with reported content faster and more cheaply
○ Catch spam, harassment, and other problems before other users see it
○ Automatically fix formatting and grammar in some cases
● Benefits of scale:
○ More content → more choice of good content
○ Ongoing feedback from human review systems
○ More data to train our models
Maintaining high content quality using
Machine Learning
ML Models for quality
● Questions: Adult detection, Question quality classification,
Duplicate questions detector, Overly personal question detector,
Question autocorrection etc.
● Answers + Comments: Adult detection, Answer ranking for
questions, Answer collapsing, BNBR classifier, Harassment classifier,
Spam classifier etc.
● Topics: Duplicate Topics detector, Bad Topic classifier etc.
● Users: Bad actor detection, Bad user-credentials classifier, Fake
name detection, User-topic bio classifier etc.
● Classifiers on other content types, e.g. answer wikis.
Machine Learning for quality: Overview
Algorithms
● RNNs (LSTMs/GRUs) and other deep networks,
Gradient Boosted Decision Trees, Random Forests,
Logistic Regression, LambdaMART, k-means and other
clustering techniques, k-NNs, PageRank etc.
Libraries
● TensorFlow, Keras, scikit-learn, XGBoost, LightGBM,
fastText, RankLib, NLTK, spaCy, etc.
Machine Learning model decision flow
Content → ML model → High-confidence decision?
○ Yes: take automatic action
○ No: ask a human to verify the action
● Some examples of this decision flow:
○ Spam detection
○ BNBR violation detection
○ Question quality classifier
○ Duplicate question detection
○ ...and more
● The more nuanced and sensitive the decision, the
greater the need for human verification
ML decision flow examples
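The decision flow above can be sketched as a confidence threshold on the model's output. This is a minimal illustration, not Quora's implementation: the function name `route`, the model names, and the threshold values are all assumptions chosen for the example. A more sensitive classifier gets a higher threshold, so more of its decisions go to human review.

```python
from typing import Tuple

# Hypothetical per-model thresholds; the slides do not state real values.
# A more sensitive decision (e.g. BNBR) uses a stricter threshold than spam.
THRESHOLDS = {"spam": 0.90, "bnbr": 0.98}


def route(model_name: str, p_violation: float) -> Tuple[str, str]:
    """Decide whether a model score is confident enough to act on automatically.

    p_violation is the model's estimated probability that the content
    violates policy. Returns (route, predicted_label).
    """
    if not 0.0 <= p_violation <= 1.0:
        raise ValueError("p_violation must be a probability in [0, 1]")
    threshold = THRESHOLDS[model_name]
    # Confidence is how far the score is from the undecided midpoint,
    # expressed as the probability of the more likely class.
    confidence = max(p_violation, 1.0 - p_violation)
    label = "violation" if p_violation >= 0.5 else "ok"
    if confidence >= threshold:
        return ("automatic", label)      # high confidence: act automatically
    return ("human_review", label)       # otherwise: ask a human to verify
```

With these assumed thresholds, the same score of 0.93 is acted on automatically by the spam model but sent to a human by the BNBR model.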
Machine Learning data feedback loop
Diagram (cycle): Training data → Train models → Run model on content → User actions and human reviews → Training data
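One round of such a feedback loop might look like the sketch below. Everything here is illustrative: the toy word-based `train` function stands in for a real model, and `feedback_round`, `human_review`, and the 0.9 threshold are hypothetical names and values. The point is only the shape of the loop: low-confidence content goes to a reviewer, and the reviewer's label becomes new training data.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, int]        # (text, label); label 1 = policy violation
Model = Callable[[str], float]   # returns an estimate of P(violation)


def train(data: List[Example]) -> Model:
    """Toy trainer: remembers which words appeared in bad vs. good content."""
    bad, good = set(), set()
    for text, label in data:
        (bad if label == 1 else good).update(text.lower().split())

    def model(text: str) -> float:
        words = text.lower().split()
        if not words:
            return 0.5
        # 1.0 for words seen only in violations, 0.0 for words seen only
        # in good content, 0.5 for unseen or ambiguous words.
        scores = [1.0 if (w in bad and w not in good)
                  else 0.0 if (w in good and w not in bad)
                  else 0.5
                  for w in words]
        return sum(scores) / len(scores)

    return model


def feedback_round(data: List[Example],
                   new_content: List[str],
                   human_review: Callable[[str], int],
                   threshold: float = 0.9) -> Tuple[Model, int]:
    """Run the model on new content, collect human labels, then retrain."""
    model = train(data)
    reviewed = 0
    for text in new_content:
        p = model(text)
        if max(p, 1.0 - p) < threshold:
            # Low confidence: a human decides, and the decision is kept
            # as a new training example.
            data.append((text, human_review(text)))
            reviewed += 1
    return train(data), reviewed
```

Content the model has never seen scores 0.5 and is reviewed once; after retraining, similar content is handled with high confidence.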
Case study: Question quality and automatic
question correction
● Users often ask questions with grammatical and spelling errors
● Example:
○ Which coin/token is next big thing in crypto currencies? And why?
○ Which coin/token is the next big thing in cryptocurrencies? Why?
● These are good questions, but the lack of correct phrasing hurts them
○ Less likely to be answered by experts
○ Harder to catch duplicate questions
○ Can hurt the perception of “quality” of Quora
“Bad” questions on Quora
● Types of errors in questions
○ Grammatical errors, e.g., “How I can ...”
○ Spelling mistakes
○ Missing preposition or article
○ Wrong/missing punctuation
○ Wrong capitalization
○ etc.
● Can we use Machine Learning to automatically correct these questions?
● Started off as an “offroad” hack-week project
● Since shipped
Automatic question correction: research
● Frame the problem like machine translation: “translate” a badly
phrased question into a corrected one
● Final Model:
○ Sequence-to-sequence, character-level RNN (GRU)
with attention
Automatic question correction: Model
● Model Details:
○ Sequence to sequence (encoder-decoder) model
○ Character-level
○ GRUs (Gated Recurrent Units)
○ Attention-based
○ Bidirectional
○ Beam search for decoding
● Tried solving the subproblems individually, but that didn’t work as
well
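Beam search, the decoding step listed above, can be illustrated independently of the network. The sketch below assumes a hypothetical `next_log_probs(prefix)` callback that returns log-probabilities for the next character (in the real system this would come from the decoder); the `EOS` marker and the toy probability table are made up for the example.

```python
import math
from typing import Callable, Dict, List, Tuple

EOS = "\n"  # hypothetical end-of-sequence marker


def beam_search(next_log_probs: Callable[[str], Dict[str, float]],
                beam_width: int = 3,
                max_len: int = 50) -> List[Tuple[str, float]]:
    """Return finished hypotheses as (text, log-probability), best first."""
    beams: List[Tuple[str, float]] = [("", 0.0)]
    finished: List[Tuple[str, float]] = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            for ch, lp in next_log_probs(prefix).items():
                if ch == EOS:
                    finished.append((prefix, score + lp))
                else:
                    candidates.append((prefix + ch, score + lp))
        if not candidates:
            break
        # Keep only the beam_width highest-scoring partial outputs.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    finished.sort(key=lambda c: c[1], reverse=True)
    return finished if finished else beams


# Toy next-character distribution (an assumption, not a trained model).
TABLE = {
    "": {"a": math.log(0.6), "b": math.log(0.4)},
    "a": {"x": math.log(0.5), "y": math.log(0.5)},
    "b": {"z": math.log(1.0)},
}


def toy_model(prefix: str) -> Dict[str, float]:
    # Any prefix not in the table ends the sequence with probability 1.
    return TABLE.get(prefix, {EOS: 0.0})
```

In this toy table, greedy decoding (`beam_width=1`) commits to "a" and ends with a sequence of probability 0.3, while a beam of width 2 keeps "b" alive and finds "bz" with probability 0.4. Production decoders usually add refinements such as length normalization, which this sketch omits.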
● Training
○ Training data: Pairs of [bad question, corrected question]
○ TensorFlow, on a single box with GPUs
○ Training time: 2-3 hours
● Serving:
○ TensorFlow, GPU-based serving
○ Latency: <500 ms p99
● Run on new questions added to Quora
Automatic question correction: System Details
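For a character-level model, each [bad question, corrected question] pair has to be turned into sequences of integer ids. A minimal sketch of that preprocessing step follows; the special tokens, function names, and padding scheme are assumptions for illustration and are not taken from the slides.

```python
from typing import Dict, List, Tuple

# Hypothetical special tokens for padding, start, end, and unknown characters.
PAD, SOS, EOS, UNK = "<pad>", "<sos>", "<eos>", "<unk>"


def build_vocab(pairs: List[Tuple[str, str]]) -> Dict[str, int]:
    """Map every character seen in the training pairs to an integer id."""
    chars = sorted({ch for bad, good in pairs for ch in bad + good})
    return {tok: i for i, tok in enumerate([PAD, SOS, EOS, UNK] + chars)}


def encode(text: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    """Encode text as <sos> + character ids + <eos>, padded to max_len.

    Text longer than max_len is truncated, which may drop the <eos> id.
    """
    ids = [vocab[SOS]] + [vocab.get(ch, vocab[UNK]) for ch in text] + [vocab[EOS]]
    ids = ids[:max_len]
    return ids + [vocab[PAD]] * (max_len - len(ids))


# Usage with the example pair from the slides.
pairs = [(
    "Which coin/token is next big thing in crypto currencies? And why?",
    "Which coin/token is the next big thing in cryptocurrencies? Why?",
)]
vocab = build_vocab(pairs)
source_ids = encode(pairs[0][0], vocab, max_len=80)
target_ids = encode(pairs[0][1], vocab, max_len=80)
```

Working at the character level keeps the vocabulary small and lets the model fix spelling, spacing, and punctuation, which a word-level vocabulary would represent poorly.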
Automatic question correction: Results
● Checks for BNBR violations on questions, answers,
comments.
● Binary classifier
● Training data:
○ Positive: Confirmed BNBR violations
○ Negative: False BNBR reports, other good content
● Model: NN with 1 hidden layer (fastText)
● Same ML decision flow as before
BNBR classification
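fastText's supervised mode reads one example per line, with the label marked by a `__label__` prefix. The sketch below shows how the positive and negative sets described above could be written in that format. The label names `violation` and `ok` and the helper names are assumptions for this example, not details from the slides.

```python
from typing import Iterable, List


def _clean(text: str) -> str:
    # fastText expects one example per line, so collapse newlines and
    # repeated whitespace into single spaces.
    return " ".join(text.split())


def to_fasttext_lines(positives: Iterable[str],
                      negatives: Iterable[str]) -> List[str]:
    """Format labeled content as fastText supervised training lines.

    positives: confirmed BNBR violations.
    negatives: falsely reported content and other good content.
    """
    lines = [f"__label__violation {_clean(t)}" for t in positives]
    lines += [f"__label__ok {_clean(t)}" for t in negatives]
    return lines


def write_training_file(path: str, lines: List[str]) -> None:
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
```

A file written this way could then be passed to the `fasttext` Python package, for example `fasttext.train_supervised(input=path)`, assuming that package is installed. The resulting probabilities would feed the same confidence-based decision flow as the other classifiers.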
● Quality is one of the most important problems we face at Quora
● There are various systems to maintain quality, and we need to use all of them in order to keep up
● Machine Learning solutions help us maintain quality at scale
○ ...but you can’t totally bypass human efforts
In conclusion
Thank you!
Nikhil Dandekar
Quora: Nikhil-Dandekar
Twitter: @nikhilbd
Paula Griffin
Quora: Paula-Griffin-1
Twitter: @paulajgriffin
