Belgium OutSystems User Group: Speech Recognition and OCR

Rui Inocêncio
OutSystems Associate Mobile Developer
Email: rui@globalsmartpro.com
LinkedIn: /rui-inocencio

Speech Recognition and Optical Character Recognition
Using plugins that use Google Artificial Intelligence software to
add some high-end features to your applications

What is Speech Recognition and
how does it relate to artificial
intelligence technology?

• Speech Recognition is the generic name for
software that converts spoken audio to text
(the plugin used in this presentation uses
Google Cloud Speech-to-Text).
• “Powered by machine learning:
Apply the most advanced deep-learning neural
network algorithms to audio for speech
recognition with unparalleled accuracy. Accuracy
improves over time as Google improves the
internal speech recognition technology used by
Google products.”
https://cloud.google.com/speech-to-text/

What is OCR
(Optical Character Recognition)
and how does it relate to artificial
intelligence technology?

• OCR is software that recognizes characters in an
image and outputs them as a string of characters
(the OCR plugin used in this presentation uses
Cloud Vision AI from Google).
• “Vision AI
Google Cloud’s Vision API offers powerful
pre-trained machine learning models through
REST and RPC APIs.”
https://cloud.google.com/vision/

So, what is Machine
Learning?
• Machine Learning is learning in which a
machine can learn on its own without being
explicitly programmed. It is an application of AI
that gives a system the ability to automatically
learn and improve from experience.
https://www.geeksforgeeks.org/difference-between-machine-learning-and-artificial-intelligence/

What are Artificial Neural Networks?
Source: Wikipedia, the free encyclopedia

Why should we use Voice
Recognition technology?

• First of all, it is fancy:
Who has not been amazed while interacting with Alexa and/or
Siri?
• For convenience:
If you have a cooking application, it would be very useful to
give voice commands to turn the page, ask for ingredients,
cooking time, or temperature, and set alarms or timers,
instead of having to touch the screen with dirty fingers.
If you are in your car, it would be nice to use your voice to open
the windows or turn on the radio.
• For necessity:
If you are a factory worker whose hands are
constantly busy and you have to fill in a report, being
able to do it by voice is a differentiating
factor (a much-appreciated feature).

Why should we use OCR
(Optical Character Recognition)
technology?

• Read printed and handwritten text and numbers
from an image.
• Automatically add labels to images.
• Automatically categorize images.
• Compare images.

What does all this have to do with
OutSystems?

• The Speech-to-Text and Cloud Vision AI
software is available to developers through
APIs provided by Google Cloud services.
• You can use those APIs directly in your
applications, or create a plugin that makes it
easier to access those APIs for a specific task.
• Or you can use Forge plugins, already built by
someone else, that make those APIs easier
to use.
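
As a rough illustration of calling one of these APIs directly, the sketch below builds the JSON body of a Cloud Vision text-detection request. The helper name `buildOcrRequest`, the type name, and the sample values are assumptions made for this example; they are not part of the presentation or of any Forge plugin, and actually sending the request (HTTP client, authentication) is left out.

```typescript
// Hypothetical helper: builds the body of a Cloud Vision
// "images:annotate" request that asks for text detection (OCR).
interface OcrRequestBody {
  requests: {
    image: { content: string };                  // base64-encoded image bytes
    features: { type: string }[];                // which detectors to run
    imageContext?: { languageHints: string[] };  // optional language hints
  }[];
}

function buildOcrRequest(base64Image: string, languageHints: string[] = []): OcrRequestBody {
  const request: OcrRequestBody["requests"][number] = {
    image: { content: base64Image },
    features: [{ type: "TEXT_DETECTION" }],
  };
  // Only attach the image context when hints were actually provided.
  if (languageHints.length > 0) {
    request.imageContext = { languageHints };
  }
  return { requests: [request] };
}

// Sample call with a placeholder base64 string (not a real image).
const sampleBody = buildOcrRequest("aGVsbG8=", ["en"]);
console.log(JSON.stringify(sampleBody, null, 2));
```

A body like this would be sent with an HTTP POST to `https://vision.googleapis.com/v1/images:annotate`; a plugin hides that plumbing behind a simpler action.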

OutSystems Forge Plugins
The OutSystems Forge is a repository of reusable, open code modules,
connectors, and UI components to help speed up app delivery time.

• Google Cloud Vision OCR:
An extension that allows applications to use Google's
Cloud Vision API (https://cloud.google.com/vision/) to
perform OCR (Optical Character Recognition) on
images, extracting the characters in the image
as text.
• Speech Recognition Plugin:
An extension that allows applications to use
Google's voice recognition API
(https://cloud.google.com/speech-to-text/)
to transform speech into text.

Speech Recognition Plugin
Supported platforms: Android and iOS
Usage: requires an internet connection
Methods:
• isRecognitionAvailable
• startListening
• stopListening
• getSupportedLanguages
• hasPermission
• requestPermission
Licensing:
Cordova Plugin for Speech Recognition (GitHub), The MIT License (MIT).
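
The usual order in which these methods are called can be sketched as follows: check availability, check (or request) permission, then listen. Only the method names come from the slide; the `SpeechPlugin` interface, its callback signatures, the `listenOnce` helper, and the in-memory `fakePlugin` are assumptions made so the flow can be run without a device. Check the plugin's own documentation for the real signatures.

```typescript
// Assumed shape of the plugin object. The method names are from the slide;
// the callback signatures are assumptions for illustration only.
interface SpeechPlugin {
  isRecognitionAvailable(ok: (available: boolean) => void, fail: (e: string) => void): void;
  hasPermission(ok: (granted: boolean) => void, fail: (e: string) => void): void;
  requestPermission(ok: () => void, fail: (e: string) => void): void;
  startListening(ok: (matches: string[]) => void, fail: (e: string) => void): void;
}

// Check availability, then permission (requesting it if needed), then listen.
function listenOnce(
  plugin: SpeechPlugin,
  onSpeech: (matches: string[]) => void,
  onError: (message: string) => void,
): void {
  plugin.isRecognitionAvailable((available) => {
    if (!available) {
      onError("speech recognition not available");
      return;
    }
    const listen = () => plugin.startListening(onSpeech, onError);
    plugin.hasPermission((granted) => {
      if (granted) {
        listen();
      } else {
        plugin.requestPermission(listen, onError);
      }
    }, onError);
  }, onError);
}

// In-memory stand-in for the real plugin (hypothetical test double).
const fakePlugin: SpeechPlugin = {
  isRecognitionAvailable: (ok) => ok(true),
  hasPermission: (ok) => ok(false),       // forces the requestPermission path
  requestPermission: (ok) => ok(),
  startListening: (ok) => ok(["next page"]),
};

const heard: string[] = [];
listenOnce(fakePlugin, (matches) => { heard.push(...matches); }, (e) => console.error(e));
console.log(heard);
```

Keeping the plugin behind a small interface like this also makes it possible to exercise the voice logic in a browser, where the native plugin is not available.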

The GoogleCloudVisionOCR package contains two modules:
1. An extension written in C# that uses Google's Cloud Vision API to perform OCR on images.
The extension exposes the following actions:
• GetDateAndAmountRegex: Extracts a date and a currency amount from the provided image
using the specified Regular Expressions.
• GetFullText: Extracts the full text from the provided image.
• GetTextAnnotations: Returns a collection of text annotation objects, each identifying an area of
the image where text was detected.
2. A module containing a single Static Entity with the accepted Language Codes to be passed
as Language Hints.
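
To give an idea of what an action like GetDateAndAmountRegex does conceptually, the sketch below applies two regular expressions to text that OCR has already extracted. It is not the extension's code (which is written in C#); the function name `extractDateAndAmount`, the sample patterns, and the sample receipt text are all assumptions for illustration.

```typescript
// Illustration only: applies caller-supplied patterns to OCR output.
interface DateAndAmount {
  date: string | null;
  amount: string | null;
}

function extractDateAndAmount(
  ocrText: string,
  datePattern: RegExp,
  amountPattern: RegExp,
): DateAndAmount {
  // match() returns null when nothing is found; keep the first full match.
  const dateMatch = ocrText.match(datePattern);
  const amountMatch = ocrText.match(amountPattern);
  return {
    date: dateMatch?.[0] ?? null,
    amount: amountMatch?.[0] ?? null,
  };
}

// Sample patterns (assumptions): dd/mm/yyyy dates and amounts like "12.50 EUR".
const datePattern = /\b\d{2}\/\d{2}\/\d{4}\b/;
const amountPattern = /\b\d+[.,]\d{2}\s?(EUR|USD)\b/;

const receipt = "Store 42\nDate: 03/10/2019\nTotal: 12.50 EUR";
console.log(extractDateAndAmount(receipt, datePattern, amountPattern));
```

Passing the patterns in as parameters mirrors the idea of "the specified Regular Expressions": the caller decides which date and currency formats to look for.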

PoC (Proof of Concept):
Using the Speech Recognition Plugin in an application
(Quality Plus)
• Speech recognition was intended to be used as a complete voice
user interface between the user and the application.
• The example application is a report application, and we can
subdivide the voice interface into 2 parts:
a) Navigation controls (to move around the menus
provided in the application)
b) Answering questions (to select and/or answer the
questions presented in the reports)

PoC:
Using Google Cloud Vision OCR in an application
(Quality Plus)
• In our example case (a report application), during a report we
want to take a picture of a serial number or any other identification
number of a product and have the plugin retrieve the numbers and
characters to be added to the database.
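
Once the plugin returns the full text of the picture, the application still has to pick out the identifier. A minimal sketch of that step is shown below, assuming a serial number is a token of at least 6 uppercase letters, digits, or hyphens that contains at least one digit. That format, the function name `findSerialCandidates`, and the sample label text are assumptions, not details from the presentation.

```typescript
// Assumption: a serial number is a token of 6+ characters made of uppercase
// letters, digits, and hyphens, and it contains at least one digit.
function findSerialCandidates(ocrText: string): string[] {
  return ocrText
    .split(/\s+/)
    // Strip punctuation stuck to the start or end of a token (e.g. "S/N:").
    .map((token) => token.replace(/^[^A-Za-z0-9]+|[^A-Za-z0-9]+$/g, ""))
    .filter((token) => /^[A-Z0-9-]{6,}$/.test(token) && /\d/.test(token));
}

// Sample OCR output from a product label (made up for this example).
const labelText = "MODEL X200\nS/N: AB12-34567\nMade in Belgium";
console.log(findSerialCandidates(labelText));
```

Returning every candidate, rather than only the first, lets the application show the options to the user for confirmation before saving to the database.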

So...
It is time to see what we
are talking about.

So... How do we use
these plugins?

Go to the Forge page and download the plugin, or install
it directly in your OutSystems environment:
Speech Recognition Plugin
https://www.outsystems.com/forge/component-overview/2123/speech-recognition-plugin
GoogleCloudVisionOCR
https://www.outsystems.com/forge/component-overview/1572/googlecloudvisionocr

Download the plugin from Forge to your environment

Add as a dependency...

Next section
We are going to see some code...

Speech Recognition
Implementing Speech Recognition in an application
PoC implementation:
- Quality Plus -

What is Quality Plus?

It is a report application...

that uses voice recognition to:
• Answer questions.
• Navigate through the menus.

WE START BY CREATING SOME SCREEN ACTIONS THAT WILL BE USED TO:
- START THE PLUGIN AND CHECK WHETHER IT CAN BE USED.
- USE THE PLUGIN TO CAPTURE THE USER'S SPEECH.
- ASSOCIATE EACH ACTION THAT THE APPLICATION CAN
PERFORM WITH A GIVEN SPEECH COMMAND.
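
The third step, associating speech with application actions, can be sketched as a simple lookup. This is not the REC-FindACommand action from the slides (which is built in OutSystems); the command words, action names, and the `findCommand` function below are assumptions chosen for illustration.

```typescript
// Hypothetical action names for a report application.
type Action = "NEXT_PAGE" | "PREVIOUS_PAGE" | "OPEN_MENU";

// Hypothetical command table: spoken word -> action.
const commandTable = new Map<string, Action>([
  ["next", "NEXT_PAGE"],
  ["back", "PREVIOUS_PAGE"],
  ["menu", "OPEN_MENU"],
]);

// Returns the action for the first known command word found in the
// recognized phrases, or null when no phrase contains a command.
function findCommand(recognizedPhrases: string[]): Action | null {
  for (const phrase of recognizedPhrases) {
    for (const word of phrase.toLowerCase().split(/\s+/)) {
      const action = commandTable.get(word);
      if (action !== undefined) {
        return action;
      }
    }
  }
  return null;
}

console.log(findCommand(["go to the next page"]));
console.log(findCommand(["hello there"]));
```

Speech engines typically return several candidate transcriptions, which is why the sketch accepts a list of phrases and stops at the first one that contains a known command.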

WE STARTED BY CREATING 2 CLIENT ACTIONS...

THEN WE CREATED OTHER ACTIONS IN THE SCREEN...

IN OUR SCREEN WE START BY CALLING A DATA ACTION.

THEN WE USE THE TRIGGER EVENT - ON AFTER FETCH - TO RUN THE ACTION
REC-OnAfterFetch-Start-Plugin.

INSIDE THE ACTIONS

REC-OnAfterFetch-Start-Plugin
Client Action specific to the
target screen

REC-OnAfterFetch-Start-Plugin Client Action on the example screen...

REC-Start-Plugin
Common Client Action

REC-StartPlugin

REC-Start-Listening
Client Action specific to the target
screen

REC-Start-Listening Client Action on the example screen...

REC-PrepareToListen

REC-FindACommand

REC-FindACommand Client Action on the example screen...

(menu with numerical options)

REC-OnAfterFetch-Start-Plugin (menu with numerical options)...

Creating a number to associate with actions...
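
For menus with numerical options, the idea of tying a number to an action can be sketched like this: turn the spoken phrase into a number, then look the number up in the menu. The menu entries, the supported number words, and the function names are assumptions for illustration, not the actual Quality Plus logic.

```typescript
// Hypothetical list of number words the user may say.
const numberWords = new Map<string, number>([
  ["one", 1], ["two", 2], ["three", 3], ["four", 4], ["five", 5],
]);

// Hypothetical menu: option number -> screen to open.
const menuOptions = new Map<number, string>([
  [1, "New report"],
  [2, "Open report"],
  [3, "Settings"],
]);

// Finds the first number in the phrase, spoken as digits ("3") or as a word ("three").
function spokenToNumber(phrase: string): number | null {
  for (const word of phrase.toLowerCase().split(/\s+/)) {
    if (/^\d+$/.test(word)) {
      return Number(word);
    }
    const value = numberWords.get(word);
    if (value !== undefined) {
      return value;
    }
  }
  return null;
}

// Maps a spoken phrase to a menu entry, or null when no valid option was said.
function selectMenuOption(phrase: string): string | null {
  const optionNumber = spokenToNumber(phrase);
  if (optionNumber === null) {
    return null;
  }
  return menuOptions.get(optionNumber) ?? null;
}

console.log(selectMenuOption("option two"));
console.log(selectMenuOption("3"));
```

Numbers tend to be short, distinct words, which is one reason numbered menus are a convenient target for voice navigation.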

REC-FindACommand

REC-FindACommand (menu with numerical options)...

REC-Start-Listening

REC-Start-Listening (menu with numerical options)...

WELL! AND WHAT IF WE WANT TO
ANSWER SOME QUESTIONS?

THEN THINGS START TO
BECOME A LITTLE MORE
COMPLEX...

specific Client Action
answering questions

REC-OnAfterFetch-Start-Plugin CLIENT ACTION

REC-Start-Listening
answering questions

REC-FindACommand
answering questions

And there is more...
Using Cloud Vision AI

Our example application
- Quality Plus -
uses OCR to:
extract the serial number or
any other identification number
from a picture taken by the user
while doing a report.

END
