OCR with MXNet Gluon

•

8 gostaram•3,032 visualizações

Using MXNet Gluon, we outline a pipeline for Optical Character Recognition that uses a CNN for page detection, SSD (Single Shot Multibox Detection) for line detection and CNN-BiLSTM with CTC Loss for handwriting recognition. We also investigate extensions to improve performance with Lexicon Search and Language Modelling.

Tecnologia

AWS MXNet Applications
OCR with MXNet Gluon
Work of Jonathan Chung (& Thomas Delteil)
Presented by Thom Lane
16th
August 2018

Dataset
IAM Handwriting Database:
657 writers
1,539 pages of scanned text
5,685 isolated and labeled sentences
13,353 isolated and labeled text lines
115,320 isolated and labeled words
http://www.fki.inf.unibe.ch/databases/iam-handwriting-database
(Register for username and password)

Solution
Handwriting Recognition Pipeline

Page Segmentation
Input: Image with handwritten text region
Output: Bounding box of handwritten text
Problem: Single object detection
Solutions:
1) Using MSERs
2) Using CNNs

Using MSERs
Maximally Stable Extremal Regions (MSERs) algorithm

CNN: Training
Start by optimizing Mean Square Error.
When consistent overlap, then fine-tune IOU.

Line Segmentation
Input: Image containing only handwritten text
Output: Bounding boxes for each handwritten line
Problem: Multiple object detection
Solutions:
1) Using SSD for lines
2) Using SSD for words

Using SSD
SSD: Single Shot Multi-box Detector
Customized:
1. Data augmentation
2. Anchor boxes
3. Network architecture
4. Non-maximum suppression

Line SSD: Anchor boxes
Standard Custom
`mxnet.ndarray.contrib.MultiBoxPrior`

Line SSD: Non-Maximum Suppression
`mxnet.ndarray.contrib.box_nms`

Line SSD: Results
Train IOU = 0.593 & Test IOU = 0.573

Handwriting Recognition
Input: Image of single line of handwritten text
Output: Probabilities of each character for each time step.
i.e. an array of shape (sequence_length × vocab. of characters)
Inference Output: String of recognized text

Detection: With CTC loss
Problem:
• Must have label for each time step.
• Or must model alignment.
• Must compress final sequence.
Solution:
• Connectionist Temporal Classification loss.
• Defines a collapsing function.
• Assumes monotonic and many-to-one.
`mxnet.ndarray.contrib.CTCLoss`

Inference: Best Path vs Correct decoding
!"#$ %&$ℎ = −, −
!"#$ %&$ℎ +,$-,$ = . −, − = “”
% “&” = % “ − &” + % “& − ” + % “&&”
= 0.6 ∗ 0.4 + 0.4 ∗ 0.6 + 0.4 ∗ 0.4
= 0.64
.788"9$ +,$-,$ = “&”
% ”” = %(−, −) = 0.36
Can’t check all paths though.
&=-ℎ&>"$_="@A$ℎBCDECFGC_HCFIJK paths.
i.e. 81NO paths!

Inference: Adding Language Model
And evaluate ‘perplexity’ of of each candidate and
take the one with lowest value as final prediction.

Inference: Results
Mean Character Error Rate (CER)
Greedy = 21.170.
Greedy + Lexicon Search: 21.152.
Beam Search + Lexicon Search: CER = 21.058.

Thanks!
And don’t forget to check out:
https://medium.com/apache-mxnet
https://github.com/ThomasDelteil/Gluon_OCR_LSTM_CTC

Mais conteúdo relacionado

Mais procurados

Introduction of Html/css/jsKnoldus Inc.

Forms of learning in aiRobert Antony

offline character recognition for handwritten gujarati textBhumika Patel

Artificial IntelligenceSyed Zaid Irshad

Css Complete NotesEPAM Systems

Deep Learning - Overview of my work IIMohamed Loey

Handwriting Recognition Using Deep Learning and Computer VersionNaiyan Noor

Querying XML: XPath and XQueryKatrien Verbert

PerceptronNagarajan

Oops concept on c#baabtra.com - No. 1 supplier of quality freshers

L11 array listteach4uin

Wrapper classkamal kotecha

ورشة تضمين الكلمات في التعلم العميق Word embeddings workshopiwan_rg

String.pptajeela mushtaq

BTech Pattern Recognition NotesAshutosh Agrahari

The Complete HTMLRohit Buddabathina

1 03 - CSS Introductionapnwebdev

HTML presentation for beginnersjeroenvdmeer

10 mavzu(prezent)EldorKarimov

Handwritten Character RecognitionConstantine Priemski

Mais procurados (20)

Introduction of Html/css/js

Forms of learning in ai

offline character recognition for handwritten gujarati text

Artificial Intelligence

Css Complete Notes

Deep Learning - Overview of my work II

Handwriting Recognition Using Deep Learning and Computer Version

Querying XML: XPath and XQuery

Perceptron

Oops concept on c#

L11 array list

Wrapper class

ورشة تضمين الكلمات في التعلم العميق Word embeddings workshop

String.ppt

BTech Pattern Recognition Notes

The Complete HTML

1 03 - CSS Introduction

HTML presentation for beginners

10 mavzu(prezent)

Handwritten Character Recognition

Semelhante a OCR with MXNet Gluon

Optimizing Terascale Machine Learning Pipelines with Keystone MLSpark Summit

Introduction to Deep LearningOswald Campesato

Distributed Deep Learning on AWS with Apache MXNetAmazon Web Services

Workshop - Introduction to Machine Learning with RShirin Elsinghorst

Neural Networks in the Wild: Handwriting RecognitionJohn Liu

"Source Code Abstracts Classification Using CNN", Vadim Markovtsev, Lead Soft...Dataconomy Media

Aran Khanna, Software Engineer, Amazon Web Services at MLconf ATL 2017MLconf

Driving Moore's Law with Python-Powered Machine Learning: An Insider's Perspe...PyData

Deep Dive on Deep Learning (June 2018)Julien SIMON

Machine Learning in ActionAmazon Web Services

Text cnn on acme ugc moderationMarsan Ma

Assignment-1-NF.docxKhondokerAbuNaim

Memory efficient java tutorial practices and challengesmustafa sarac

MXNet WorkshopAmazon Web Services

Diving into Deep Learning (Silicon Valley Code Camp 2017)Oswald Campesato

Machine intelligence in HR technology: resume analysis at scale - Adrian MihaiSebastian Ruder

Protein Secondary Structure Prediction using Deep Learning methodsChrysoula Kosma

Polymer Brush Data ProcessorCory Bethrant

Haskell for data scienceJohn Cant

Scalable Deep Learning Using Apache MXNetAmazon Web Services

Semelhante a OCR with MXNet Gluon (20)

Optimizing Terascale Machine Learning Pipelines with Keystone ML

Introduction to Deep Learning

Distributed Deep Learning on AWS with Apache MXNet

Workshop - Introduction to Machine Learning with R

Neural Networks in the Wild: Handwriting Recognition

"Source Code Abstracts Classification Using CNN", Vadim Markovtsev, Lead Soft...

Aran Khanna, Software Engineer, Amazon Web Services at MLconf ATL 2017

Driving Moore's Law with Python-Powered Machine Learning: An Insider's Perspe...

Deep Dive on Deep Learning (June 2018)

Machine Learning in Action

Text cnn on acme ugc moderation

Assignment-1-NF.docx

Memory efficient java tutorial practices and challenges

MXNet Workshop

Diving into Deep Learning (Silicon Valley Code Camp 2017)

Machine intelligence in HR technology: resume analysis at scale - Adrian Mihai

Protein Secondary Structure Prediction using Deep Learning methods

Polymer Brush Data Processor

Haskell for data science

Scalable Deep Learning Using Apache MXNet

Mais de Apache MXNet

Recent Advances in Natural Language ProcessingApache MXNet

Fine-tuning BERT for Question AnsweringApache MXNet

Introduction to GluonNLPApache MXNet

Introduction to object tracking with Deep LearningApache MXNet

Introduction to GluonCVApache MXNet

Introduction to Computer VisionApache MXNet

Image Segmentation: Approaches and ChallengesApache MXNet

Introduction to Deep face detection and recognitionApache MXNet

Generative Adversarial Networks (GANs) using Apache MXNetApache MXNet

Deep Learning With Apache MXNet On Video by Ben Taylor @ ziff.aiApache MXNet

Using Java to deploy Deep Learning models with MXNetApache MXNet

AI powered emotion recognition: From Inception to Production - Global AI Conf...Apache MXNet

MXNet Paris Workshop - Intro To MXNetApache MXNet

Apache MXNet ODSC West 2018Apache MXNet

DeepLearning001&ApacheMXNetWithSparkForInference-ACNA2018Apache MXNet

Apache MXNet EcoSystem - ACNA2018Apache MXNet

ONNX and Edge DeploymentsApache MXNet

Distributed Inference with MXNet and SparkApache MXNet

Multivariate Time SeriesApache MXNet

AI On the Edge: Model CompressionApache MXNet

Mais de Apache MXNet (20)

Recent Advances in Natural Language Processing

Fine-tuning BERT for Question Answering

Introduction to GluonNLP

Introduction to object tracking with Deep Learning

Introduction to GluonCV

Introduction to Computer Vision

Image Segmentation: Approaches and Challenges

Introduction to Deep face detection and recognition

Generative Adversarial Networks (GANs) using Apache MXNet

Deep Learning With Apache MXNet On Video by Ben Taylor @ ziff.ai

Using Java to deploy Deep Learning models with MXNet

AI powered emotion recognition: From Inception to Production - Global AI Conf...

MXNet Paris Workshop - Intro To MXNet

Apache MXNet ODSC West 2018

DeepLearning001&ApacheMXNetWithSparkForInference-ACNA2018

Apache MXNet EcoSystem - ACNA2018

ONNX and Edge Deployments

Distributed Inference with MXNet and Spark

Multivariate Time Series

AI On the Edge: Model Compression

Último

Salesforce Community Group Quito, Salesforce 101Paola De la Torre

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo

08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls

Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j

My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar

GenCyber Cyber Security Day PresentationMichael W. Hawkins

WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal

Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard

08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls

Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik

Finology Group – Insurtech Innovation Award 2024The Digital Insurer

Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK

A Call to Action for Generative AI in 2024Results

Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada

From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software

[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745

SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren

Slack Application Development 101 Slidespraypatel2

04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent

OCR with MXNet Gluon

1. AWS MXNet Applications OCR with MXNet Gluon Work of Jonathan Chung (& Thomas Delteil) Presented by Thom Lane 16th August 2018

2. Problem Optical Character Recognition

4. Dataset IAM Handwriting Database

5. Dataset IAM Handwriting Database: 657 writers 1,539 pages of scanned text 5,685 isolated and labeled sentences 13,353 isolated and labeled text lines 115,320 isolated and labeled words http://www.fki.inf.unibe.ch/databases/iam-handwriting-database (Register for username and password)

6. Solution Handwriting Recognition Pipeline

7. Pipeline

8. 1) Page Segmentation

9. Page Segmentation Input: Image with handwritten text region Output: Bounding box of handwritten text Problem: Single object detection Solutions: 1) Using MSERs 2) Using CNNs

10. Using MSERs Maximally Stable Extremal Regions (MSERs) algorithm

11. Using CNN

12. CNN: Data Augmentation

13. CNN: Training Start by optimizing Mean Square Error. When consistent overlap, then fine-tune IOU.

14. CNN: Results

15. 2) Line Segmentation

16. Line Segmentation Input: Image containing only handwritten text Output: Bounding boxes for each handwritten line Problem: Multiple object detection Solutions: 1) Using SSD for lines 2) Using SSD for words

17. Using SSD SSD: Single Shot Multi-box Detector Customized: 1. Data augmentation 2. Anchor boxes 3. Network architecture 4. Non-maximum suppression

18. Line SSD: Data Augmentation

19. Line SSD: Anchor boxes Standard Custom `mxnet.ndarray.contrib.MultiBoxPrior`

20. Line SSD: Anchor boxes

21. Line SSD: Network architecture

22. Line SSD: Non-Maximum Suppression `mxnet.ndarray.contrib.box_nms`

23. Line SSD: Training

24. Line SSD: Results Train IOU = 0.593 & Test IOU = 0.573

25. Line SSD: Results

26. Word SSD

27. 3) Handwriting Recognition

28. Handwriting Recognition Input: Image of single line of handwritten text Output: Probabilities of each character for each time step. i.e. an array of shape (sequence_length × vocab. of characters) Inference Output: String of recognized text

29. Detection: Using CNN-BiLSTM

30. Detection: Adding down-sampling

31. Detection: With CTC loss Problem: • Must have label for each time step. • Or must model alignment. • Must compress final sequence. Solution: • Connectionist Temporal Classification loss. • Defines a collapsing function. • Assumes monotonic and many-to-one. `mxnet.ndarray.contrib.CTCLoss`

32. 70% 10% 5% … … … 100% 0% 0% truth preds

33. Inference: Best Path vs Correct decoding !"#$ %&$ℎ = −, − !"#$ %&$ℎ +,$-,$ = . −, − = “” % “&” = % “ − &” + % “& − ” + % “&&” = 0.6 ∗ 0.4 + 0.4 ∗ 0.6 + 0.4 ∗ 0.4 = 0.64 .788"9$ +,$-,$ = “&” % ”” = %(−, −) = 0.36 Can’t check all paths though. &=-ℎ&>"$_="@A$ℎBCDECFGC_HCFIJK paths. i.e. 81NO paths!

34. Inference: With Beam Search

35. Inference: With CTC Beam Search

36. Inference: Adding Lexicon Search

37. Inference: Adding Language Model And evaluate ‘perplexity’ of of each candidate and take the one with lowest value as final prediction.

38. Inference: Results Mean Character Error Rate (CER) Greedy = 21.170. Greedy + Lexicon Search: 21.152. Beam Search + Lexicon Search: CER = 21.058.

39. Thanks! And don’t forget to check out: https://medium.com/apache-mxnet https://github.com/ThomasDelteil/Gluon_OCR_LSTM_CTC

OCR with MXNet Gluon

Recomendados

Recomendados

Mais conteúdo relacionado

Mais procurados

Mais procurados (20)

Semelhante a OCR with MXNet Gluon

Semelhante a OCR with MXNet Gluon (20)

Mais de Apache MXNet

Mais de Apache MXNet (20)

Último

Último (20)

OCR with MXNet Gluon