3. Speech Recognition
and
Optical Character Recognition
Using plugins built on Google artificial intelligence software to
add high-end features to your applications
4. 4 |
What is Speech Recognition and
how does it relate to artificial
intelligence technology?
Speech Recognition and Optical Character Recognition
5. 5 |
• Speech Recognition is the generic name for
software that converts spoken audio to text
(the plugin used in this presentation uses
Google Cloud Speech-to-Text).
• “Powered by machine learning:
Apply the most advanced deep-learning neural
network algorithms to audio for speech
recognition with unparalleled accuracy. Accuracy
improves over time as Google improves the
internal speech recognition technology used by
Google products.”
https://cloud.google.com/speech-to-text/
6. 6 |
What is OCR
(Optical Character Recognition)
and how does it relate to artificial
intelligence technology?
7. 7 |
• OCR
Software that recognizes characters in an
image and produces a string of characters
(the plugin used in this presentation uses
Google Cloud Vision AI).
• “Vision AI
Google Cloud’s Vision API offers powerful
pre-trained machine learning models through
REST and RPC APIs.”
https://cloud.google.com/vision/
8. 8 |
• “Vision AI
Google Cloud’s Vision API offers powerful
pre-trained machine learning models through
REST and RPC APIs.”
https://cloud.google.com/vision/
• “Powered by machine learning:
Apply the most advanced deep-learning neural
network algorithms to audio for speech
recognition with unparalleled accuracy. Accuracy
improves over time as Google improves the
internal speech recognition technology used by
Google products.”
https://cloud.google.com/speech-to-text/
9. 9 |
So, what is Machine
Learning?
• Machine Learning is learning in which a
machine can learn on its own without being
explicitly programmed. It is an application of AI
that gives a system the ability to automatically
learn and improve from experience.
https://www.geeksforgeeks.org/difference-between-machine-learning-and-artificial-intelligence/
10. 10 |
What are Artificial Neural Networks?
Source: Wikipedia, the free encyclopedia
13. 13 |
Why should we use Voice
Recognition technology?
14. 14 |
• First of all, it is fancy:
Who has not been amazed while interacting with Alexa and/or
Siri?
• For convenience:
If you have a cooking application, it would be very useful to
give voice commands to turn the page, ask for
ingredients, cooking time, or temperature, and set alarms or
timers, instead of touching the screen with dirty
fingers.
If you are in your car, it would be nice to use your voice to open
the windows or turn on the radio.
• For necessity:
If you are a factory worker whose hands are
constantly busy and you have to fill out a report, being
able to do it by voice is a
differentiating (and much-appreciated) feature.
15. 15 |
Why should we use OCR
(Optical Character Recognition)
technology?
16. 16 |
• Read printed and handwritten text and numbers
from an image.
• Automatically add labels to images.
• Automatically categorize images.
• Compare images.
17. 17 |
What does all this have to do with
OutSystems?
18. 18 |
• The Speech-to-Text and Cloud Vision AI
software are available to developers through
APIs provided by Google Cloud services.
• You can use those APIs directly in your
applications or create a plugin that makes it
easier to access those APIs for a specific task.
• Or you can use Forge plugins that someone has
already built to simplify the use of those
APIs.
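To give an idea of what "using those APIs directly" involves, here is a minimal TypeScript sketch that only builds the JSON bodies for the two REST endpoints (Vision `images:annotate` and Speech-to-Text `speech:recognize`). The function names are hypothetical, authentication (API key or OAuth token) is left out, and the audio is assumed to be 16 kHz LINEAR16; adjust these to your own setup.

```typescript
// One entry of a Cloud Vision "images:annotate" request (only the fields used here).
interface VisionRequest {
  image: { content: string };
  features: { type: string }[];
  imageContext?: { languageHints: string[] };
}

interface VisionAnnotateBody {
  requests: VisionRequest[];
}

// Body for POST https://vision.googleapis.com/v1/images:annotate
// `base64Image` is the picture encoded as base64 (no "data:" prefix).
function buildOcrRequest(base64Image: string, languageHints: string[]): VisionAnnotateBody {
  const request: VisionRequest = {
    image: { content: base64Image },
    features: [{ type: "TEXT_DETECTION" }],
  };
  // Language hints are optional; only send them when provided.
  if (languageHints.length > 0) {
    request.imageContext = { languageHints: languageHints };
  }
  return { requests: [request] };
}

interface SpeechRecognizeBody {
  config: { encoding: string; sampleRateHertz: number; languageCode: string };
  audio: { content: string };
}

// Body for POST https://speech.googleapis.com/v1/speech:recognize
// Assumption: the audio is 16 kHz, 16-bit linear PCM encoded as base64.
function buildSpeechRequest(base64Audio: string, languageCode: string): SpeechRecognizeBody {
  return {
    config: { encoding: "LINEAR16", sampleRateHertz: 16000, languageCode: languageCode },
    audio: { content: base64Audio },
  };
}
```

A plugin hides exactly this kind of plumbing (request building, authentication, response parsing) behind a few actions.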
19. 19 |
OutSystems Forge Plugins
The OutSystems Forge is a repository of reusable, open code modules,
connectors, and UI components to help speed up app delivery time.
20. 20 |
• Google Cloud Vision OCR
An extension that allows applications to use Google's
Cloud Vision API (https://cloud.google.com/vision/) to
perform OCR (Optical Character Recognition) on
images, extracting the characters in the image
as text.
• Speech Recognition Plugin:
An extension that allows applications to use
Google's voice recognition API
(https://cloud.google.com/speech-to-text/)
to transform speech into text.
21. 21 |
Supported platforms: Android and iOS
Usage: requires an internet connection
Methods:
• isRecognitionAvailable
• startListening
• stopListening
• getSupportedLanguages
• hasPermission
• requestPermission
Licensing:
Cordova Plugin for Speech Recognition (GitHub), The MIT License (MIT).
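The methods listed above are typically called in a fixed order: check availability, make sure the microphone permission is granted, then listen. The TypeScript sketch below shows that flow. The method names come from the slide, but the callback signatures in the `SpeechPlugin` interface are an assumption to verify against the plugin's README, and `makeFakePlugin` is a hypothetical stand-in that lets the flow run off-device.

```typescript
type ErrorCallback = (error: string) => void;

// Assumed callback-style surface of the plugin (parameter shapes are not
// confirmed by the slides; check them against the plugin documentation).
interface SpeechPlugin {
  isRecognitionAvailable(onSuccess: (available: boolean) => void, onError: ErrorCallback): void;
  hasPermission(onSuccess: (granted: boolean) => void, onError: ErrorCallback): void;
  requestPermission(onSuccess: () => void, onError: ErrorCallback): void;
  startListening(onSuccess: (matches: string[]) => void, onError: ErrorCallback,
                 options: { language: string }): void;
}

// Check availability, ensure permission, then listen once.
function listenOnce(plugin: SpeechPlugin, language: string,
                    onText: (matches: string[]) => void, onError: ErrorCallback): void {
  plugin.isRecognitionAvailable(function (available) {
    if (!available) {
      onError("speech recognition is not available");
      return;
    }
    const start = function (): void {
      plugin.startListening(onText, onError, { language: language });
    };
    plugin.hasPermission(function (granted) {
      if (granted) {
        start();
      } else {
        // Ask the user first; start listening only after permission is given.
        plugin.requestPermission(start, onError);
      }
    }, onError);
  }, onError);
}

// Hypothetical in-memory fake that records which methods were called.
function makeFakePlugin(available: boolean, granted: boolean,
                        heard: string[]): { plugin: SpeechPlugin; calls: string[] } {
  const calls: string[] = [];
  const plugin: SpeechPlugin = {
    isRecognitionAvailable: function (onSuccess) { calls.push("isRecognitionAvailable"); onSuccess(available); },
    hasPermission: function (onSuccess) { calls.push("hasPermission"); onSuccess(granted); },
    requestPermission: function (onSuccess) { calls.push("requestPermission"); onSuccess(); },
    startListening: function (onSuccess) { calls.push("startListening"); onSuccess(heard); },
  };
  return { plugin: plugin, calls: calls };
}
```

On a device, the object passed as `plugin` would be the one the Cordova plugin exposes; in OutSystems the Forge plugin wraps these calls as client actions.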
22. 22 |
The package contains two modules
1. An extension written in C# that uses Google's Cloud Vision API to perform OCR on images.
The extension exposes the following actions:
• GetDateAndAmountRegex: Extracts a date and a currency amount from the provided image
using the specified Regular Expressions.
• GetFullText: Extracts the full text from the provided image.
• GetTextAnnotations: Returns a collection of text annotation objects, each identifying an area of
the image where text was detected.
2. A module containing a single Static Entity with the accepted Language Codes to be passed
as Language Hints.
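The idea behind an action like GetDateAndAmountRegex can be illustrated with a short sketch: run OCR, then apply two regular expressions to the extracted text. The real extension is written in C#; the TypeScript below is only an illustration, and the function name, the return shape, and both example patterns (dd/mm/yyyy dates, amounts prefixed by € or $) are assumptions.

```typescript
interface DateAndAmount {
  date: string | null;
  amount: string | null;
}

// Apply the two caller-supplied patterns to the OCR text and return the
// first match of each, or null when a pattern does not match.
function getDateAndAmount(fullText: string, datePattern: RegExp,
                          amountPattern: RegExp): DateAndAmount {
  const dateMatch = datePattern.exec(fullText);
  const amountMatch = amountPattern.exec(fullText);
  return {
    date: dateMatch ? dateMatch[0] : null,
    amount: amountMatch ? amountMatch[0] : null,
  };
}

// Example patterns (assumptions, adapt to your documents):
// dates such as 12/03/2020 and amounts such as "€ 12.50" or "$12,50".
const DATE_PATTERN = /\b\d{2}\/\d{2}\/\d{4}\b/;
const AMOUNT_PATTERN = /[€$]\s?\d+(?:[.,]\d{2})?/;

// Usage with text as it might come back from GetFullText:
const receipt = getDateAndAmount("ACME STORE\n12/03/2020\nTOTAL € 12.50",
                                 DATE_PATTERN, AMOUNT_PATTERN);
```

Keeping the patterns as parameters mirrors the "using the specified Regular Expressions" part of the action description: the caller decides which date and currency formats to accept.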
23. 23 |
PoC (Proof of Concept):
Using the Speech Recognition Plugin in an application
(Quality Plus)
• It was intended to be used as a complete voice user interface
between the user and the application.
• The example application is a report application, and we can
subdivide the voice interface into 2 parts:
a) Navigation controls (to move around the menus
provided in the application)
b) Answering questions (to select and/or answer the
questions presented in the reports)
24. 24 |
PoC:
Using Google Cloud Vision OCR in an application
(Quality Plus)
• In our example case (a report application), during a report we
want to take a picture of a serial number or any other identification
number of a product and have the plugin retrieve the numbers and
characters to be added to the database.
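OCR usually returns more text than the serial number itself (labels, model names, and so on), so some filtering is needed before saving to the database. The sketch below is one possible heuristic, not the approach used in Quality Plus: it assumes a serial number is a token of at least 6 characters made of letters, digits, and hyphens, containing at least one digit. The function name is hypothetical.

```typescript
// Return the first token in the OCR text that looks like a serial number,
// or null when no token qualifies.
// Assumed rule: 6+ characters, only A-Z, 0-9 and "-", with at least one digit.
function extractSerialNumber(ocrText: string): string | null {
  const tokens = ocrText.toUpperCase().split(/\s+/);
  for (let i = 0; i < tokens.length; i++) {
    // Strip punctuation stuck to either end of the token (e.g. "SN123456.").
    const token = tokens[i].replace(/^[^A-Z0-9]+|[^A-Z0-9]+$/g, "");
    if (token.length >= 6 && /^[A-Z0-9-]+$/.test(token) && /\d/.test(token)) {
      return token;
    }
  }
  return null;
}

// Usage with text as it might come back from the OCR plugin:
const serial = extractSerialNumber("Model X200\nS/N: SN-48A7731\nMade in PT");
```

In a real report it is safer to show the extracted value to the user for confirmation before storing it, since OCR can confuse similar characters such as 0 and O.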
26. 26 |
So... how do we use
these plugins?
Go to the Forge page and download the plugin, or install
it directly in your OutSystems environment:
Speech Recognition Plugin
https://www.outsystems.com/forge/component-overview/2123/speech-recognition-plugin
27. 27 |
GoogleCloudVisionOCR
https://www.outsystems.com/forge/component-overview/1572/googlecloudvisionocr
28. 28 |
Download Plugin from Forge to your environment
29. 29 |
Add as a Dependency…
35. 35 |
What is Quality Plus?
36. 36 |
It is a Report Application...
37. 37 |
That uses Voice Recognition to:
• Answer questions.
• Navigate through the menus.
38. 38 |
WE START BY CREATING SOME SCREEN ACTIONS THAT WILL BE USED TO:
- START THE PLUGIN AND CHECK WHETHER IT CAN BE USED.
- USE THE PLUGIN TO CAPTURE USER SPEECH.
- ASSOCIATE EACH ACTION THAT THE APPLICATION CAN
PERFORM WITH A GIVEN SPEECH COMMAND.
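The third point, associating application actions with speech commands, can be sketched as a simple lookup table. This is an illustration only: the phrases, the action names, and the function names below are hypothetical and are not taken from the Quality Plus application. The sketch assumes the recognizer returns a list of candidate transcriptions and picks the first one that matches a known phrase.

```typescript
type VoiceAction = "NEXT" | "BACK" | "OPEN_MENU" | "ANSWER_YES" | "ANSWER_NO";

// Hypothetical command table: spoken phrase -> application action.
const COMMANDS: { phrase: string; action: VoiceAction }[] = [
  { phrase: "next page", action: "NEXT" },
  { phrase: "go back", action: "BACK" },
  { phrase: "open menu", action: "OPEN_MENU" },
  { phrase: "yes", action: "ANSWER_YES" },
  { phrase: "no", action: "ANSWER_NO" },
];

// Lower-case the text, drop punctuation, and collapse repeated spaces so that
// "Next page!" and "next  page" compare equal.
function normalize(text: string): string {
  return text.toLowerCase().replace(/[^a-z0-9 ]/g, " ").replace(/\s+/g, " ").trim();
}

// Go through the candidate transcriptions in order and return the action of
// the first one that matches a known phrase, or null if none matches.
function resolveCommand(matches: string[]): VoiceAction | null {
  for (let i = 0; i < matches.length; i++) {
    const heard = normalize(matches[i]);
    for (let j = 0; j < COMMANDS.length; j++) {
      if (heard === COMMANDS[j].phrase) {
        return COMMANDS[j].action;
      }
    }
  }
  return null;
}
```

Returning null for unknown speech lets the screen decide what to do next, for example asking the user to repeat the command.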
39. 39 |
WE STARTED BY CREATING 2 CLIENT ACTIONS...
40. 40 |
THEN WE CREATE OTHER ACTIONS IN THE SCREEN...
41. 41 |
IN OUR SCREEN WE START BY CALLING A DATA ACTION.
42. 42 |
THEN WE USE THE TRIGGER EVENT - ON AFTER FETCH - TO RUN THE ACTION
REC-OnAfterFech-Start-Plugin.
43. 43 |
RUN THE ACTION REC-OnAfterFech-Start-Plugin.
44. 44 |
INSIDE THE ACTIONS
97. 97 |
Use OCR to:
Extract the Serial Number or
any other identification number
from a picture taken by the user
while doing a report.