BUILDING A DESKTOP ASSISTANT THAT USES VOICE COMMAND
AND COMPLETES SPECIFIED TASKS
SUBMITTED TO
KIIT Deemed to be University
In Partial Fulfillment of the Requirement for the Award of
BACHELOR’S DEGREE IN
ELECTRONICS AND COMPUTER SCIENCE ENGINEERING
BY
PRASUN CHAKRABORTY ROLL-1730041
UNDER THE GUIDANCE OF
PROF.CHANDANI KUMARI
SCHOOL OF ELECTRONICS ENGINEERING
KALINGA INSTITUTE OF INDUSTRIAL TECHNOLOGY
BHUBANESWAR, ODISHA - 751024
KIIT Deemed to be University
School of Electronics Engineering
BHUBANESWAR, ODISHA - 751024
CERTIFICATE
This is to certify that the report entitled
“BUILDING A DESKTOP ASSISTANT THAT USES VOICE COMMAND AND COMPLETES SPECIFIED TASKS”
BY
PRASUN CHAKRABORTY ROLL-1730041
is a record of bona fide work carried out by him, in partial fulfillment of the
requirement for the award of the degree of Bachelor of Engineering in Electronics and
Computer Science Engineering at KIIT Deemed to be University, Bhubaneswar. This
work was done during the year 2020, under my guidance.
Date: / /
(Prof. Guide Name)
CHANDANI KUMARI
ACKNOWLEDGEMENTS
The success and final outcome of this project required a lot of guidance and assistance
from Prof. CHANDANI KUMARI, and I am extremely privileged to have received it
throughout the project. All that I have done is due to her supervision and assistance,
and I would not forget to thank her.
I respect and thank Prof. CHANDANI KUMARI for giving me the opportunity to do
the project work from home during this pandemic and for the support and guidance
that made it possible to complete the project. I am extremely thankful to her for
providing such support and guidance despite her busy schedule managing academic
affairs.
PRASUN CHAKRABORTY
ABSTRACT
A virtual assistant for the desktop, also called a digital assistant, is an application program
that understands natural-language voice commands and completes tasks for the user.
Most digital assistants are operated by voice, so they may also be referred to as voice
assistants. To interact with a digital assistant, one first says a wake word, which
activates the device. Once the wake word has been spoken, the system is ready to be
asked a question. One could then ask “What's the weather like?” and the system will
read the local weather forecast aloud.
As digital assistants become more popular, their capabilities and the tasks they are
able to perform grow as well. Below are a few of the popular activities this desktop
assistant can perform.
• Answering basic questions
• Searching Google/Wikipedia
• Setting an alarm or timer
• Getting information about temperature
• Playing a song
• Reading and writing text files, and many more
In this project the Python programming language is used to develop the application.
CONTENT
1. Literature Review
1.1 An overview of speech recognition
1.2 History
2. Types of speech recognition
2.1 Isolated speech
2.2 Connected speech
2.3 Continuous speech
2.4 Spontaneous speech
3. Basic speech recognition process
4. Introduction to Python
5. Python libraries used in this Project
6. Tools required
7. Use case Diagram
8. List of tasks this application performs
9. Uses of speech recognition
10. Applications
10.1 From medical perspective
10.2 From military perspective
10.3 From education perspective
11. Some factors that may disturb functionalities of the application
12. The future of Speech Recognition
13. References
1. Literature Review
1.1 An overview of Speech Recognition
Speech recognition is a technology that enables a computer to capture the words
spoken by a human with the help of a microphone. These words are then recognized by
the speech recognizer, and finally the system acts according to the voice input.
The process of speech recognition consists of several steps, which are discussed one by
one in the following sections.
1.2 History
The concept of speech recognition started somewhere in the 1940s. In practice, the first
speech recognition program appeared in 1952 at Bell Labs; it recognized digits in a
noise-free environment.
The 1940s and 1950s are considered the foundation period of speech recognition
technology. In this period work was done on the foundational paradigms of speech
recognition, that is, automation and information-theoretic models. The key technologies
developed in this period were filter banks and time normalization methods.
In the 1990s the key technologies developed were methods for
stochastic language understanding, statistical learning of acoustic and language models,
and methods for implementing large-vocabulary speech understanding systems.
After five decades of research, speech recognition technology has finally entered the
marketplace, benefiting users in a variety of ways. The challenge of designing a
machine that truly functions like an intelligent human is still a major one going forward.
2. Types of Speech Recognition
Speech recognition systems can be divided into a number of classes based on the kinds
of utterances they can recognize and the vocabulary they have. A few classes of speech
recognition are described below:
2.1 Isolated Speech
Isolated-word systems usually require a pause between two utterances. This doesn't mean
that the system accepts only a single word, but that it requires one utterance at a time.
2.2 Connected Speech
Connected-word or connected-speech systems are similar to isolated-speech systems but
allow separate utterances with minimal pause between them.
2.3 Continuous Speech
Continuous speech allows the user to speak almost naturally. It is also called
computer dictation.
2.4 Spontaneous Speech
At a basic level, it can be thought of as speech that is natural sounding and not
rehearsed. An ASR (automatic speech recognition) system with spontaneous speech ability
should be able to handle a variety of natural speech features such as words being run
together, “ums” and “ahs”, and even slight stutters.
3. Basic Speech Recognition Process
• Audio Input : The audio (human voice) is fed into the system through a
microphone.
• Analog to Digital : The process of converting the analog signal into digital form is
known as digitization. It involves both sampling and quantization.
• Acoustic Model : An acoustic model is created by taking audio inputs and their text
transcripts and using software to create a statistical representation of the sounds
that make up each word.
• Language Model : Language modeling is used in many natural language processing
applications such as speech recognition. It tries to capture the properties of a language
and to predict the next word in the speech sequence.
• Speech Engine : The job of the speech engine is to convert the input audio into text.
To accomplish this it uses all sorts of data, software algorithms and statistics.
• Output : After all the above steps, the output is produced, that is, operations are
performed according to the voice commands.
[Figure: block diagram with the stages Audio Input, Analog to Digital, Acoustic Model, Language Model, Speech Engine, Output]
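The "Analog to Digital" step can be illustrated with a small sketch. It is not taken from the project's code: the function name, the 440 Hz test tone, the 8000 Hz sampling rate and the 8-bit depth are assumptions chosen only to show sampling (reading the signal at discrete instants) and quantization (rounding each reading to one of a fixed number of levels).

```python
import math


def sample_and_quantize(freq_hz, sample_rate_hz, duration_s, bits):
    """Sample a sine tone and quantize each sample to a signed integer."""
    max_level = 2 ** (bits - 1) - 1              # 127 for 8 bits
    n_samples = round(sample_rate_hz * duration_s)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz                   # sampling: discrete time instants
        amplitude = math.sin(2 * math.pi * freq_hz * t)  # stands in for the analog value
        samples.append(round(amplitude * max_level))     # quantization: nearest level
    return samples


# Illustrative values only: 10 ms of a 440 Hz tone at 8000 Hz, 8 bits per sample.
pcm = sample_and_quantize(freq_hz=440, sample_rate_hz=8000, duration_s=0.01, bits=8)
print(len(pcm), min(pcm), max(pcm))
```

A higher sampling rate captures faster changes in the signal, and more bits per sample reduce the rounding error introduced by quantization.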
4. Introduction to Python
4.1 Python : Python is an interpreted, high-level, general-purpose programming
language. Created by Guido van Rossum and first released in 1991, Python's design
philosophy emphasizes code readability with its notable use of significant whitespace.
It is a high-level programming language which is:
Interpreted: Python is processed at run time by the interpreter.
Interactive: One can use a Python prompt to interact with the interpreter
directly while writing programs.
Object-oriented: Python supports the object-oriented technique of programming.
Beginner’s language: Python is a great language for beginner-level
programmers and supports the development of a wide range of applications.
In this project different techniques have been used for different functionalities.
Those will be discussed one by one.
5. Python Libraries used in this project
• OS : The os module in Python provides functions for interacting with the
operating system. It comes under Python’s standard utility modules and
provides a portable way of using operating-system-dependent functionality.
• Date-time : Python has a module named datetime for working with dates and times.
• Random : Sometimes we want the computer to pick a random number in a given
range, pick a random element from a list, pick a random card from a deck, flip a
coin, etc. The random module provides access to functions that support these types
of operations.
• PyOWM : PyOWM is a client Python wrapper library for the OpenWeatherMap web
APIs. It allows quick and easy consumption of OWM data from Python
applications via a simple object model and in a human-friendly fashion.
• PyAutoGUI : PyAutoGUI is a cross-platform GUI automation Python module for
human beings. It is used to programmatically control the mouse and keyboard.
PyAutoGUI supports Python 2 and 3.
• Requests : Requests is an Apache2-licensed HTTP library, written in Python. It is
designed to be easy for humans to use. This means one doesn't
have to manually add query strings to URLs or form-encode POST data.
Requests allows one to send HTTP/1.1 requests using Python. With it, one can
add content like headers, form data, multipart files, and parameters via simple
Python data structures. It also allows one to access the response data in the
same way.
• Twilio.rest : The Twilio Python Helper Library makes it easy to interact with the
Twilio API from a Python application. The Twilio Python Helper Library supports
Python applications written in Python 2.7 and above. Using the Twilio API one can
automate sending WhatsApp messages, making calls and sending verification codes. In
this project the Twilio API has been used to send a help message to a contact in an
emergency situation.
• Webbrowser : The webbrowser module provides a high-level interface to allow
displaying Web-based documents to users. Under most circumstances, simply
calling the open() function from this module will open the URL using the default browser.
One has to import the module and use the open() function:
webbrowser.open("URL", new=2)
If new is 0, the URL is opened in the same browser window if possible. If new is 1, a
new browser window is opened if possible. If new is 2, a new browser page ("tab") is
opened if possible.
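As a minimal sketch of how a command such as "google search <query>" could use this module, the snippet below builds a search URL and opens it in a new tab. The helper names and the Google URL pattern are assumptions made for illustration; the report does not show how the project builds its URLs.

```python
import webbrowser
from urllib.parse import quote_plus


def build_search_url(query):
    """Return a Google search URL for the query (URL pattern assumed for illustration)."""
    return "https://www.google.com/search?q=" + quote_plus(query)


def open_in_new_tab(url):
    """Open the URL in a new browser tab if possible; returns True on success."""
    return webbrowser.open(url, new=2)


url = build_search_url("weather in Bhubaneswar")
print(url)
# open_in_new_tab(url)  # uncomment on a machine with a browser
```

quote_plus() escapes spaces and special characters so that a spoken query can be placed safely in the URL.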
• Pyttsx : pyttsx is a cross-platform text-to-speech library. The major advantage of
using this library for text-to-speech
conversion is that it works offline. However, pyttsx supports only Python 2.x.
Hence pyttsx3 is used instead, which is modified to work on both Python 2.x and
Python 3.x with the same code.
• Speech-Recognition : Speech recognition has its roots in research done at Bell Labs
in the early 1950s. Early systems were limited to a single speaker and had limited
vocabularies of about a dozen words. Modern speech recognition systems have
come a long way since their ancient counterparts. They can recognize speech from
multiple speakers and have enormous vocabularies in numerous languages.
The first component of speech recognition is, of course, speech. Speech must be
converted from physical sound to an electrical signal with a microphone, and
then to digital data with an analog-to-digital converter. Once digitized, several
models can be used to transcribe the audio to text.
Most modern speech recognition systems rely on what is known as a Hidden
Markov Model (HMM). This approach works on the assumption that a speech
signal, when viewed on a short enough timescale (say, ten milliseconds), can be
reasonably approximated as a stationary process—that is, a process in which
statistical properties do not change over time.
In a typical HMM, the speech signal is divided into 10-millisecond fragments. The
power spectrum of each fragment, which is essentially a plot of the signal’s
power as a function of frequency, is mapped to a vector of real numbers known
as cepstral coefficients. The dimension of this vector is usually small—sometimes as
low as 10, although more accurate systems may have dimension 32 or more.
The final output of the HMM is a sequence of these vectors.
To decode the speech into text, groups of vectors are matched to one or
more phonemes, the fundamental units of speech. This calculation requires training,
since the sound of a phoneme varies from speaker to speaker, and even varies from
one utterance to another by the same speaker. A special algorithm is then applied
to determine the most likely word (or words) that produce the given sequence of
phonemes.
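The text above does not name the algorithm used for decoding. A common choice for finding the most likely hidden-state sequence in an HMM is the Viterbi algorithm, sketched below on a toy model. The two phoneme-like states, the observation labels "burst" and "vowel", and all probabilities are invented for illustration and do not come from any trained recognizer.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely state sequence for the observations."""
    # best[t][s] = (probability of the best path ending in state s at step t, previous state)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            column[s] = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s][obs], p) for p in states
            )
        best.append(column)
    # Backtrack from the most probable final state.
    path = [max(states, key=lambda s: best[-1][s][0])]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path


# Toy model (all numbers are assumptions).
states = ("t", "uw")
start_p = {"t": 0.8, "uw": 0.2}
trans_p = {"t": {"t": 0.3, "uw": 0.7}, "uw": {"t": 0.1, "uw": 0.9}}
emit_p = {"t": {"burst": 0.9, "vowel": 0.1}, "uw": {"burst": 0.2, "vowel": 0.8}}

decoded = viterbi(["burst", "vowel", "vowel"], states, start_p, trans_p, emit_p)
print(decoded)
```

In a real recognizer the observations would be the cepstral vectors described above rather than discrete labels, and the probabilities would be learned during training.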
One can imagine that this whole process may be computationally expensive. In
many modern speech recognition systems, neural networks are used to simplify the
speech signal using techniques for feature transformation and dimensionality
reduction before HMM recognition. Voice activity detectors (VADs) are also used
to reduce an audio signal to only the portions that are likely to contain speech.
This prevents the recognizer from wasting time analyzing unnecessary parts of the
signal.
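The framing and voice-activity ideas above can be sketched in a few lines. This is a simplified, energy-based detector written for illustration: the function names and the threshold are assumptions, and practical VADs use more robust features than raw energy.

```python
def split_into_frames(samples, sample_rate_hz, frame_ms=10):
    """Split samples into consecutive frames of frame_ms milliseconds (a trailing partial frame is dropped)."""
    frame_len = sample_rate_hz * frame_ms // 1000
    return [
        samples[i:i + frame_len]
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)


def speech_frame_indices(samples, sample_rate_hz, threshold):
    """Indices of frames whose energy exceeds the (assumed) threshold."""
    frames = split_into_frames(samples, sample_rate_hz)
    return [i for i, frame in enumerate(frames) if frame_energy(frame) > threshold]


# Toy signal at 1000 Hz: 10 ms of silence, 10 ms of loud samples, 10 ms of silence.
signal = [0] * 10 + [100, -100] * 5 + [0] * 10
active = speech_frame_indices(signal, sample_rate_hz=1000, threshold=50)
print(active)
```

Only the frames listed in active would be passed on to the recognizer; the silent frames are skipped.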
• Wikipedia : The Internet is the single largest source of information, and therefore it
is important to know how to fetch data from various sources. Wikipedia is
one of the largest and most popular sources of information on the Internet.
It is a multilingual online encyclopedia created and maintained as an open
collaboration project by a community of volunteer editors using a wiki-based
editing system.
• Smtplib : Simple Mail Transfer Protocol (SMTP) is a protocol that handles
sending e-mail and routing e-mail between mail servers.
Python provides the smtplib module, which defines an SMTP client session object that
can be used to send mail to any Internet machine with an SMTP or ESMTP
listener daemon.
The SMTP constructor takes the following parameters:
Host - This is the host running the SMTP server. One can specify the IP address of the
host or a domain name like facebook.com. This is an optional argument.
Port - If one provides the host argument, then one needs to specify the port where
the SMTP server is listening. Usually this port would be 25.
Local-hostname - If one’s SMTP server is running on the local machine, then one
can specify just localhost for this option.
An SMTP object has an instance method called sendmail, which is typically used to
do the work of mailing a message. It takes the parameters:
  - The sender - A string with the address of the sender.
  - The receivers - A list of strings, one for each recipient.
  - The message - A message as a string formatted as specified in the various
RFCs.
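A minimal sketch of these parameters in use is given below. The helper names, addresses, host and port are placeholders chosen for illustration, not the project's settings. The message is built with the standard email.message.EmailMessage class so that the string passed to sendmail is correctly formatted.

```python
import smtplib
from email.message import EmailMessage


def build_message(sender, receivers, subject, body):
    """Build a properly formatted e-mail message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(receivers)
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_mail(host, port, sender, receivers, msg):
    """Send the message through the SMTP server listening at host:port."""
    with smtplib.SMTP(host, port) as server:
        server.sendmail(sender, receivers, msg.as_string())


# Placeholder addresses; nothing is sent until send_mail() is called.
mail = build_message("me@example.com", ["friend@example.com"], "Hello",
                     "Sent by the assistant.")
print(mail.as_string())
# send_mail("localhost", 25, "me@example.com", ["friend@example.com"], mail)
```

Public mail providers usually also require an encrypted connection and authentication, for which smtplib offers starttls() and login().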
• Playsound : The playsound module is the simplest module to use for playing sound.
This module works on both Python 2 and Python 3, and is tested to play wav and
mp3 files only. It contains only one method, named playsound(), with one
argument to take the audio filename for playing.
• Plyer : Plyer is a Python library for accessing hardware and platform features.
• Ctypes : ctypes is a foreign function library for Python. It provides C-compatible
data types and allows calling functions in DLLs or shared libraries. It can be used
to wrap these libraries in pure Python.
• Psutil : psutil is a cross-platform Python library used to access system details and
process utilities. It is used to keep track of the utilization of various resources in the
system. Usage of resources like CPU, memory, disks, network and sensors can be
monitored. Hence, this library is used for system monitoring, profiling, limiting
process resources and managing running processes. It is supported in
Python versions 2.6, 2.7 and 3.4+.
• Urllib : urllib is a package that collects several modules for working with URLs:
urllib.request for opening and reading URLs
urllib.error containing the exceptions raised by urllib.request
urllib.parse for parsing URLs
urllib.robotparser for parsing robots.txt files
• Pyspeedtest : A Python library to test network bandwidth using Speedtest.net servers.
One can check ping, download and upload speed using this library.
• Pandas : pandas is a fast, powerful, flexible and easy to use open source data
analysis and manipulation tool, built on top of the Python programming language.
• Matplotlib : Matplotlib is a visualization library in Python for 2D plots
of arrays. It is a multi-platform data visualization library built on NumPy
arrays and designed to work with the broader SciPy stack. It was introduced by
John Hunter in the year 2002.
One of the greatest benefits of visualization is that it gives us visual access to huge
amounts of data in easily digestible form. Matplotlib offers several plot types such as
line, bar, scatter and histogram.
• Beautifulsoup : Beautiful Soup is a Python package for parsing HTML and XML
documents (including those with malformed markup, i.e. non-closed tags; it is named
after tag soup). It creates a parse tree for parsed pages that can be used to extract
data from HTML, which is useful for web scraping.
• Tabulate : Pretty-prints tabular data in Python; it is both a library and a command-line
utility.
The main use cases of the library are:
  - printing small tables without hassle: just one function call, formatting is
guided by the data itself
  - authoring tabular data for lightweight plain-text markup: multiple output
formats suitable for further editing or transformation
  - readable presentation of mixed textual and numeric data: smart column
alignment, configurable number formatting, alignment by a decimal point
• Numpy : NumPy is a Python library used for working with arrays. It also has
functions for working in the domains of linear algebra, Fourier transforms and
matrices. NumPy was created in 2005 by Travis Oliphant. It is an open source
project and can be used freely. NumPy stands for Numerical Python.
• Opencv : OpenCV-Python is a library of Python bindings designed to solve
computer vision problems. ... OpenCV-Python makes use of NumPy, which is a
highly optimized library for numerical operations with a MATLAB-style syntax.
All the OpenCV array structures are converted to and from NumPy arrays.
It is also a free open source library used in real-time image processing. It is used
to process images, videos, and even live streams.
• Wave : The wave module in Python's standard library is an easy interface to the
audio WAV format. The functions in this module can write audio data in raw
format to a file-like object and read the attributes of a WAV file.
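A short sketch of both directions is shown below: it writes a generated tone to an in-memory file-like object and then reads the WAV attributes back. The tone, the sample rate and the helper name are assumptions chosen for illustration; the project itself records audio from a microphone.

```python
import io
import math
import struct
import wave


def write_tone(file_obj, freq_hz=440, sample_rate_hz=8000, duration_s=0.5):
    """Write a mono, 16-bit sine tone to a file-like object in WAV format."""
    n_frames = int(sample_rate_hz * duration_s)
    with wave.open(file_obj, "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(2)            # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate_hz)
        frames = b"".join(
            struct.pack("<h", int(32767 * math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)))
            for n in range(n_frames)
        )
        wav.writeframes(frames)


buffer = io.BytesIO()
write_tone(buffer)

# Read the attributes of the WAV data that was just written.
buffer.seek(0)
with wave.open(buffer, "rb") as wav:
    attributes = (wav.getnchannels(), wav.getsampwidth(),
                  wav.getframerate(), wav.getnframes())
print(attributes)
```

Passing a filename instead of the BytesIO object would write the same data to disk.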
6. Tools Required
• Hardware : Monitor/Display
• Software : Windows 10
Visual Studio Code (IDE)
Google Chrome browser
Python version 3.7
7. Diagrams
8. List of tasks this application performs
When this application is executed, the user first has to set some input fields.
(i) These are the target WhatsApp number and the body of the message. They have to
be set at the start because an emergency alarm feature has been added to this desktop
assistant. To elaborate on this feature, consider the example below.
E.g. Suppose a person leaves home for some purpose and, on the road, something goes
wrong: some people try to attack him, force him to do something or take him with
them, or there is any other kind of disturbing situation. He finds that there is nobody
to help him. He shouts “help me, help me please!”, but nobody is there to listen. Using
the emergency alarm feature he can ask for help directly from a police station,
hospital or any other emergency service. There may be no people to hear his
cry, but his voice assistant is running on his laptop or mobile phone, and when he
shouts “help me!” the digital assistant listens to him, recognizes his voice and sends a
preset message for help to an emergency service within a few seconds.
He can now inform someone about his problem without calling or texting
and without touching his mobile phone; he can ask for help using only his voice.
To activate this feature for an emergency, he first has to set the
inputs: the target WhatsApp number (the number he wants to message
about his problem) and the body of the message (for example, the message he
wants to send to the police station). Once these are set, the digital assistant is
ready to help him outside his home too. The best format for the message body is
<Myself XYZ, My address is UXV, My contact number is XXXXXXXXX, I came to
market, now I am in trouble please help me !!>
He has to set all of this before going outside, so that in a critical situation the
message can be sent using only the “help me!” command, because in a disturbing
situation he would not have time to give all the details needed to trace him.
[**NOTE : To successfully implement this feature the target WhatsApp number should
be registered with the Twilio account, because in this project the Twilio API has been
used for this purpose.**]
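The flow described above can be sketched without any network calls. In the snippet below the function names are hypothetical, and send stands for whatever actually delivers the message (the Twilio API in this project); a plain list is used in its place so the sketch runs anywhere.

```python
def build_help_message(name, address, contact, situation):
    """Fill in the preset message body using the format suggested above."""
    return (f"Myself {name}, my address is {address}, my contact number is {contact}. "
            f"{situation} Now I am in trouble, please help me!")


def handle_utterance(text, preset_message, send):
    """Call send(preset_message) when the trigger phrase is heard.

    Returns True if the message was sent.
    """
    if "help me" in text.lower():
        send(preset_message)
        return True
    return False


# Set before leaving home (placeholder details).
preset = build_help_message("XYZ", "UXV", "XXXXXXXXX", "I came to the market.")

outbox = []  # stands in for the real message sender
was_sent = handle_utterance("Help me, help me please!", preset, outbox.append)
print(was_sent, outbox)
```

Keeping the sender as a parameter means the trigger logic can be tried out without a Twilio account.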
[Screenshot 1: Setting up the application]
[Screenshot 2: WhatsApp message notification on the target number]
[Screenshot 3: Message in the target number’s inbox]
(ii) When the application starts, it shows a birthday reminder if any of the user’s friends
or acquaintances has a birthday on that day. The user just has to set the date of
birth of the person in the application. If the current date matches a preset date,
the application shows a birthday-wish reminder; otherwise it pushes a notification as shown
in the image below.
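The date check behind this reminder can be sketched as follows. The function name, the dictionary layout and the sample names are assumptions made for illustration; only the month and day are compared, since the year of birth never matches the current year.

```python
from datetime import date


def birthdays_today(birthdays, today=None):
    """Return the names whose birthday (month and day) falls on the given date."""
    today = today or date.today()
    return [
        name
        for name, born in birthdays.items()
        if (born.month, born.day) == (today.month, today.day)
    ]


# Hypothetical entries set by the user.
birthdays = {"Asha": date(1999, 3, 14), "Ravi": date(2000, 11, 2)}

matches = birthdays_today(birthdays, today=date(2020, 3, 14))
print(matches)
```

Calling birthdays_today(birthdays) without the today argument checks against the current date, which is what the application would do at start-up.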
• More Tasks and their Commands
Voice Command | Task Description
Ok bro | Activation point for speech recognition
Jarvis | Checks whether Jarvis is listening to the user
Who are you? | Tells the system name, that is, ‘Jarvis’
tell me something about you | Gives a basic description of itself
temperature | Reports the current air temperature
check my battery status | Gives the battery details
check the connection status | Says whether the user is connected to the internet and pushes a notification
check internet speed | Checks and tells the upload, download and ping speed of the user’s network
<user’s query> wikipedia | Searches Wikipedia, shows the result and speaks it aloud
google search <query> | Opens the Google Chrome browser and displays the results for the query
google | Opens Google for the user
google maps | Opens Google Maps
google drive | Opens Google Drive for the user
google translate | Opens Google Translate for the user
find location <place_name> | Searches for the place on Google Maps and shows the result in the browser
open youtube | Opens the YouTube homepage
search youtube <query> | Searches and shows the available content for the user’s query
open udemy | Opens the Udemy homepage
search udemy <course_name> | Shows the matching courses
find geeksforgeeks <subject> | Opens GeeksforGeeks and shows the matching results
open mail | Opens the user’s personal Gmail account
send mail <message_body> <target_mail_id> | Sends mail to anyone using voice command
open whatsapp | Opens WhatsApp for the user
open facebook | Opens Facebook for the user
find facebook <query> | Finds people on Facebook
search live train status <train_no> | Shows the live status for the train number provided by the user
open zoom | Opens the Zoom application
open sublime text | Opens the Sublime Text editor
open calculator | Opens the calculator app
open notepad | Opens Notepad on the desktop
handle file <mode_to_handle> | Reads, writes and saves text files based on the mode selected by the user
play music / I am sad | Plays music from folders
movie <movie_name> | Opens the media player and starts the named movie
take screenshot <file_name> | Takes a screenshot and saves it under the given file name in the specified folder
change wallpaper | Changes the desktop background
take me to my childhood | Shows a childhood photo of the user if there is a specified folder containing such photos
set alarm <hours> <minutes> <am/pm> | Sets an alarm and rings the alarm tone when the set time is reached
set timer <seconds> | Sets a timer for the given number of seconds and rings a warning tone when the deadline is reached
read breaking news | Reads the top 10 global news headlines of the day aloud
india corona cases | Shows results for the top 5 states in India
corona update | Shows total global corona records, such as the number of infected, dead and recovered people
take photo | Captures a picture and saves it to the specified folder
record video | Records video using the desktop camera and saves it to the specified folder
record audio | Records audio using the microphone and saves the recorded file in the specified folder
help me <problem_statement> | Sends WhatsApp messages to the emergency contact through voice command
police <problem_statement> | Sends mail to the local police station mail-id mentioning the problem through voice command
restart my pc | Asks the user to confirm restarting and acts accordingly
shutdown my pc | Asks the user to confirm switching the system off and acts accordingly
exit | Quits the application
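One way to map recognized text onto tasks like those in the table is a dictionary of command phrases and handler functions. The sketch below is an assumption about structure, not the project's actual code: it covers four of the commands, and the handlers only return a description instead of performing the task.

```python
def open_google(argument):
    return "Opening Google"


def google_search(argument):
    return f"Searching Google for '{argument}'"


def open_youtube(argument):
    return "Opening YouTube"


def set_timer(argument):
    return f"Timer set for {int(argument)} seconds"


# Command phrase -> handler (hypothetical subset of the table above).
COMMANDS = {
    "google": open_google,
    "google search": google_search,
    "open youtube": open_youtube,
    "set timer": set_timer,
}


def dispatch(utterance):
    """Find the command phrase the utterance starts with and run its handler."""
    text = utterance.lower().strip()
    # Try longer phrases first so "google search" wins over the shorter "google".
    for phrase in sorted(COMMANDS, key=len, reverse=True):
        if text.startswith(phrase):
            argument = text[len(phrase):].strip()
            return COMMANDS[phrase](argument)
    return "Sorry, I did not understand that"


print(dispatch("Google search python tutorials"))
print(dispatch("set timer 30"))
```

Adding a new voice command then only requires writing a handler and registering its phrase in COMMANDS.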
9. Uses of Speech Recognition programs
Speech recognition is used for two main purposes. The first is dictation, which in the
context of speech recognition is the translation of spoken words into text; the second
is controlling the computer and its applications by voice.
Writing by voice lets a person write 150 words per minute or more if he/she
can speak quickly. In this way speech recognition programs help users get
more done in a short time and save effort too.
10. Applications
10.1 From medical perspective :
People with disabilities can benefit from speech recognition programs. Speech
recognition is especially useful for people who have difficulty using their hands;
in such cases they can use it for operating
computers. Speech recognition is also used in deaf telephony, such as voicemail to
text.
10.2 From military perspective :
Speech recognition is important from a military perspective; in the air force, speech
recognition has definite potential for reducing the pilot's workload. Besides the air
force, such programs can also be used in helicopters, battle management and
other applications.
10.3 From education perspective :
Individuals with learning disabilities who have problems with thought-to-paper
communication can benefit from the software. Some other application areas of speech
recognition technology are described above.
11. Some factors that may disturb functionalities of the application :
• Homonyms : These are words that are spelled differently and have different
meanings but sound the same, for example ‘to’ and ‘two’, or ‘be’ and
‘bee’. It is a challenge for the machine to distinguish between such
words that sound alike.
• Overlapping Speech : A second challenge is to understand the
speech uttered by the user; the machine often takes a wrong command because of
the way a word is uttered.
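Sound-alike words are usually resolved from context, which is the job of the language model described in Section 3. The sketch below uses simple bigram counts over a tiny invented corpus to choose between ‘to’ and ‘two’. The corpus, the function names and the counting scheme are assumptions for illustration; real language models are trained on far more text.

```python
from collections import Counter


def train_bigrams(sentences):
    """Count how often each pair of adjacent words occurs."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        counts.update(zip(words, words[1:]))
    return counts


def pick_homophone(previous_word, candidates, bigram_counts):
    """Choose the candidate seen most often after previous_word."""
    return max(candidates, key=lambda word: bigram_counts[(previous_word, word)])


# Tiny invented corpus.
corpus = [
    "I want to go home",
    "she has two cats",
    "we have two dogs",
    "they want to sing",
]
counts = train_bigrams(corpus)

after_want = pick_homophone("want", ["to", "two"], counts)
after_have = pick_homophone("have", ["to", "two"], counts)
print(after_want, after_have)
```

The acoustic model alone cannot separate the two candidates, so the word that fits the preceding context best is selected.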
12. The future of Speech Recognition :
• Accuracy will become better and better.
• Dictation by speech recognition will gradually become accepted.
• Using speech recognition in collaboration with AI, a system could be developed
that is as intelligent as a human.
• In future, corporate tasks can probably be automated using speech recognition
and Selenium.
13. References :
1. https://pypi.org/
2. https://www.geeksforgeeks.org/
3. https://github.com/github
4. https://www.kdnuggets.com/2020/06/easy-speech-text-python.html
5. https://www.analyticsvidhya.com/blog/2019/07/learn-build-first-speech-to-text-model-python/
18

Mais conteúdo relacionado

Mais procurados

Minor Project Presentation 1
Minor Project Presentation 1Minor Project Presentation 1
Minor Project Presentation 1
Pratishtha Ram
 

Mais procurados (20)

Jarvis
JarvisJarvis
Jarvis
 
Online Voting System ppt
Online Voting System pptOnline Voting System ppt
Online Voting System ppt
 
Virtual personal assistant
Virtual personal assistantVirtual personal assistant
Virtual personal assistant
 
Driver Drowsiness Detection report
Driver Drowsiness Detection reportDriver Drowsiness Detection report
Driver Drowsiness Detection report
 
Online Voting System-using Advanced Java
Online Voting System-using Advanced JavaOnline Voting System-using Advanced Java
Online Voting System-using Advanced Java
 
Python project on Image Based Captcha
Python project on Image Based CaptchaPython project on Image Based Captcha
Python project on Image Based Captcha
 
Introduction to IoT
Introduction to IoTIntroduction to IoT
Introduction to IoT
 
Online voting system ppt by anoop
Online voting system ppt by anoopOnline voting system ppt by anoop
Online voting system ppt by anoop
 
3D Password Presentation
3D  Password Presentation3D  Password Presentation
3D Password Presentation
 
Voice based email for blinds
Voice based email for blindsVoice based email for blinds
Voice based email for blinds
 
Powerpoint presentation on 5G wireless technology
Powerpoint presentation on 5G wireless technologyPowerpoint presentation on 5G wireless technology
Powerpoint presentation on 5G wireless technology
 
Chatbot Abstract
Chatbot AbstractChatbot Abstract
Chatbot Abstract
 
Internship on web development
Internship on web developmentInternship on web development
Internship on web development
 
Internship report on AI , ML & IIOT and project responses full docs
Internship report on AI , ML & IIOT and project responses full docsInternship report on AI , ML & IIOT and project responses full docs
Internship report on AI , ML & IIOT and project responses full docs
 
Minor project Report for "Quiz Application"
Minor project Report for "Quiz Application"Minor project Report for "Quiz Application"
Minor project Report for "Quiz Application"
 
Placement management system
Placement management systemPlacement management system
Placement management system
 
Minor Project Presentation 1
Minor Project Presentation 1Minor Project Presentation 1
Minor Project Presentation 1
 
A.I based chatbot on healthcare and medical science
A.I based chatbot on healthcare and medical scienceA.I based chatbot on healthcare and medical science
A.I based chatbot on healthcare and medical science
 
Online voting system
Online voting systemOnline voting system
Online voting system
 
Android College Application Project Report
Android College Application Project ReportAndroid College Application Project Report
Android College Application Project Report
 

Semelhante a Desktop assistant

this is a jarvis ppt for jarvis ai assistant lovers and this is for you
this is a jarvis ppt for jarvis ai assistant lovers and this is for youthis is a jarvis ppt for jarvis ai assistant lovers and this is for you
this is a jarvis ppt for jarvis ai assistant lovers and this is for you
higev50580
 
VIRTUAL PERSONAL ASSISTANT.pdf
VIRTUAL PERSONAL ASSISTANT.pdfVIRTUAL PERSONAL ASSISTANT.pdf
VIRTUAL PERSONAL ASSISTANT.pdf
AnkushSolanki6
 
An Application for Performing Real Time Speech Translation in Mobile Environment
An Application for Performing Real Time Speech Translation in Mobile EnvironmentAn Application for Performing Real Time Speech Translation in Mobile Environment
An Application for Performing Real Time Speech Translation in Mobile Environment
Association of Scientists, Developers and Faculties
 

Desktop assistant

  • 1. BUILDING A DESKTOP ASSISTANT THAT USES VOICE COMMAND AND COMPLETES SPECIFIED TASKS SUBMITTED TO KIIT Deemed to be University In Partial Fulfillment of the Requirement for the Award of BACHELOR’S DEGREE IN ELECTRONICS AND COMPUTER SCIENCE ENGINEERING BY PRASUN CHAKRABORTY ROLL-1730041 UNDER THE GUIDANCE OF PROF. CHANDANI KUMARI SCHOOL OF ELECTRONICS ENGINEERING KALINGA INSTITUTE OF INDUSTRIAL TECHNOLOGY BHUBANESWAR, ODISHA - 751024
  • 2. KIIT Deemed to be University School of Electronics Engineering BHUBANESWAR, ODISHA - 751024 CERTIFICATE This is to certify that the report entitled “BUILDING A DESKTOP ASSISTANT THAT USES VOICE COMMAND AND COMPLETES SPECIFIED TASKS” BY PRASUN CHAKRABORTY ROLL-1730041 is a record of bona fide work carried out by them, in partial fulfillment of the requirement for the award of the degree of Bachelor of Engineering in Electronics and Computer Science Engineering at KIIT Deemed to be University, Bhubaneswar. This work was done during the year 2020, under my guidance. Date: / / (Prof. Guide Name) CHANDANI KUMARI
  • 3. ACKNOWLEDGEMENTS The success and final outcome of this project required a lot of guidance and assistance from Prof. CHANDANI KUMARI, and I am extremely privileged to have received it throughout the project. All that I have done is due to her supervision and assistance, and I would not forget to thank her. I respect and thank Prof. CHANDANI KUMARI for giving me the opportunity to do the project work from home during this pandemic and for the support and guidance that allowed me to complete it. I am extremely thankful to her for providing such support and guidance despite her busy schedule managing academic affairs. PRASUN CHAKRABORTY
  • 4. ABSTRACT A virtual assistant for the desktop, also called a digital assistant, is an application program that understands natural-language voice commands and completes tasks for the user. Most digital assistants are operated by voice, so they are also referred to as voice assistants. To interact with a digital assistant, one first says a wake word, which activates the device. Once the wake word has been spoken, the system is ready to be asked a question. One could then ask “What about the weather?” and the system will read the forecast for the local area aloud. As digital assistants become more popular, so do their capabilities and the tasks they are able to perform. Below are a few of the activities this desktop assistant can perform:
      • Answering basic questions
      • Searching Google/Wikipedia
      • Setting an alarm or timer
      • Getting information about the temperature
      • Playing a song
      • Reading and writing text files, and many more
    The application is developed in the Python programming language.
  • 5. CONTENT
      1. Literature Review
         1.1 An overview of speech recognition
         1.2 History
      2. Types of speech recognition
         2.1 Isolated speech
         2.2 Connected speech
         2.3 Continuous speech
         2.4 Spontaneous speech
      3. Basic speech recognition process
      4. Introduction to Python
      5. Python libraries used in this project
      6. Tools required
      7. Use case diagram
      8. List of tasks this application performs
      9. Uses of speech recognition
      10. Applications
         10.1 From medical perspective
         10.2 From military perspective
         10.3 From education perspective
      11. Some factors that may disturb functionalities of the application
      12. The future of speech recognition
      13. References
  • 6. 1. Literature Review
    1.1 An overview of Speech Recognition
    Speech recognition is a technology that enables a computer to capture the words spoken by a human with the help of a microphone. These words are then recognized by the speech recognizer, and finally the system acts on the voice input. The process of speech recognition consists of several steps, which are discussed one by one in the following sections.
    1.2 History
    The concept of speech recognition started sometime in the 1940s. The first practical speech recognition program appeared in 1952 at Bell Labs and recognized digits in a noise-free environment. The 1940s and 1950s are considered the foundation period of speech recognition technology; in this period work was done on the foundational paradigms of speech recognition, that is, automation and information-theoretic models. The key technologies developed in this decade were filter banks and time normalization methods. In the 1990s, the key technologies developed were methods for stochastic language understanding, statistical learning of acoustic and language models, and methods for implementing large-vocabulary speech understanding systems. After five decades of research, speech recognition technology has finally entered the marketplace, benefiting users in a variety of ways. Designing a machine that truly functions like an intelligent human remains a major challenge going forward.
  • 7. 2. Types of Speech Recognition
    Speech recognition systems can be divided into a number of classes based on the kinds of utterances they can recognize and the vocabulary they hold. A few of these classes are described below.
    2.1 Isolated Speech
    Isolated-word systems usually require a pause between two utterances. This does not mean that they accept only a single word, but that they require one utterance at a time.
    2.2 Connected Speech
    Connected words, or connected speech, are similar to isolated speech but allow separate utterances with a minimal pause between them.
    2.3 Continuous Speech
    Continuous speech allows the user to speak almost naturally. It is also called computer dictation.
    2.4 Spontaneous Speech
    At a basic level, spontaneous speech can be thought of as speech that is natural sounding and not rehearsed. An ASR system with spontaneous speech ability should be able to handle a variety of natural speech features such as words being run together, “ums” and “ahs”, and even slight stutters.
  • 8. 3. Basic Speech Recognition Process
      • Audio Input: The audio (human voice) is fed into the system through the microphone.
      • Analog to Digital: The process of converting the analog signal into digital form is known as digitization. It involves both sampling and quantization.
      • Acoustic Model: An acoustic model is created by taking audio inputs and their text transcripts and using software to build a statistical representation of the sounds that make up each word.
      • Language Model: Language modeling is used in many natural language processing applications. In speech recognition it tries to capture the properties of the language and to predict the next word in the speech sequence.
      • Speech Engine: The job of the speech engine is to convert the input audio into text. To accomplish this it uses all sorts of data, software algorithms and statistics.
      • Output: After all of the above steps, the output is produced, that is, the operation requested by the voice command is performed.
    [Figure: block diagram with the stages Audio Input, Analog to Digital, Acoustic Model, Language Model, Speech Engine, Output]
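The "Analog to Digital" step above can be illustrated with a small sketch. It is not taken from the project; the function names, the 440 Hz test tone and the 16 kHz sample rate are assumptions chosen only to show sampling (reading the signal at regular intervals) followed by quantization (rounding each reading to a fixed number of integer levels).

```python
import math


def sample_sine(freq_hz, sample_rate_hz, duration_s):
    """Sampling: read a unit-amplitude sine wave at regular time intervals."""
    n_samples = round(sample_rate_hz * duration_s)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate_hz)
            for i in range(n_samples)]


def quantize(samples, bits):
    """Quantization: map each value in [-1.0, 1.0] to a signed integer level."""
    max_level = 2 ** (bits - 1) - 1  # e.g. 127 for 8 bits, 32767 for 16 bits
    return [round(s * max_level) for s in samples]


if __name__ == "__main__":
    analog = sample_sine(freq_hz=440, sample_rate_hz=16000, duration_s=0.01)
    digital = quantize(analog, bits=16)
    print(len(digital), digital[:5])
```

A real sound card does both steps in hardware; the sketch only mimics the arithmetic so that the two terms are concrete.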
  • 9. 4. Introduction to Python
    4.1 Python: Python is an interpreted, high-level, general-purpose programming language. Created by Guido van Rossum and first released in 1991, Python's design philosophy emphasizes code readability with its notable use of significant whitespace. It is a high-level programming language which is:
      Interpreted: Python code is processed at run time by the interpreter.
      Interactive: A Python prompt can be used to interact with the interpreter directly while writing programs.
      Object-oriented: Python supports the object-oriented style of programming.
      Beginner's language: Python is a great language for beginner-level programmers and supports the development of a wide range of applications.
    In this project different techniques have been used for different functionalities. These are discussed one by one.
    5. Python Libraries used in this project
      • OS: The os module provides functions for interacting with the operating system. It is part of Python's standard utility modules and offers a portable way of using operating-system-dependent functionality.
      • Date-time: Python has a module named datetime for working with dates and times.
      • Random: Sometimes we want the computer to pick a random number in a given range, pick a random element from a list, pick a random card from a deck, flip a coin, and so on. The random module provides functions that support these kinds of operations.
      • PyOWM: PyOWM is a client Python wrapper library for the OpenWeatherMap web APIs. It allows quick and easy consumption of OWM data from Python applications via a simple object model and in a human-friendly fashion.
  • 10.
      • PyAutoGUI: PyAutoGUI is a cross-platform GUI automation Python module, used to control the mouse and keyboard programmatically. PyAutoGUI supports Python 2 and 3.
      • Requests: Requests is an Apache2-licensed HTTP library written in Python, designed to make HTTP requests simple for humans. One does not have to add query strings to URLs manually or form-encode POST data. Requests allows one to send HTTP/1.1 requests from Python, adding content such as headers, form data, multipart files and parameters through simple Python data structures, and to access the response data in the same way.
      • Twilio.rest: The Twilio Python Helper Library makes it easy to interact with the Twilio API from a Python application. It supports applications written in Python 2.7 and above. Using the Twilio API one can automate sending WhatsApp messages, making calls and sending verification codes. In this project the Twilio API is used to send a help message to a chosen contact in an emergency.
      • Webbrowser: The webbrowser module provides a high-level interface for displaying web-based documents to users. Under most circumstances, simply calling the open() function from this module opens the URL in the default browser. One has to import the module and call open(): webbrowser.open("URL", new=2). If new is 0, the URL is opened in the same browser window if possible. If new is 1, a new browser window is opened if possible. If new is 2, a new browser page ("tab") is opened if possible.
      • Pyttsx: pyttsx is a cross-platform text-to-speech library. Its major advantage for text-to-speech conversion is that it works offline. However, pyttsx supports only Python 2.x, so this project uses pyttsx3, which is modified to work on both Python 2.x and Python 3.x with the same code.
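The webbrowser usage described on this slide can be sketched as follows. The helper names and the Google search URL pattern are assumptions for illustration and are not taken from the project's source; only webbrowser.open() and its new parameter come from the text above.

```python
import webbrowser
from urllib.parse import quote_plus


def google_search_url(query):
    """Build a search URL; quote_plus escapes spaces and special characters."""
    return "https://www.google.com/search?q=" + quote_plus(query)


def open_in_new_tab(url):
    """Ask the default browser for a new tab (new=2).

    Returns True if a browser could be launched.
    """
    return webbrowser.open(url, new=2)


# Example call (opens a browser tab, so it is left commented out here):
# open_in_new_tab(google_search_url("speech recognition"))
```

Keeping URL construction separate from the call that opens the browser makes the first part easy to check without any side effects.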
  • 11.
      • Speech-Recognition: Speech recognition has its roots in research done at Bell Labs in the early 1950s. Early systems were limited to a single speaker and had limited vocabularies of about a dozen words. Modern speech recognition systems have come a long way since then: they can recognize speech from multiple speakers and have enormous vocabularies in numerous languages.
        The first component of speech recognition is, of course, speech. Speech must be converted from physical sound to an electrical signal with a microphone, and then to digital data with an analog-to-digital converter. Once digitized, several models can be used to transcribe the audio to text.
        Most modern speech recognition systems rely on what is known as a Hidden Markov Model (HMM). This approach works on the assumption that a speech signal, when viewed on a short enough timescale (say, ten milliseconds), can be reasonably approximated as a stationary process, that is, a process whose statistical properties do not change over time.
        In a typical HMM, the speech signal is divided into 10-millisecond fragments. The power spectrum of each fragment, which is essentially a plot of the signal's power as a function of frequency, is mapped to a vector of real numbers known as cepstral coefficients. The dimension of this vector is usually small, sometimes as low as 10, although more accurate systems may have dimension 32 or more. The final output of the HMM is a sequence of these vectors.
        To decode the speech into text, groups of vectors are matched to one or more phonemes, the fundamental units of speech. This calculation requires training, since the sound of a phoneme varies from speaker to speaker, and even from one utterance to another by the same speaker. A special algorithm is then applied to determine the most likely word (or words) that produce the given sequence of phonemes.
        One can imagine that this whole process may be computationally expensive. In many modern speech recognition systems, neural networks are used to simplify the speech signal using techniques for feature transformation and dimensionality reduction before HMM recognition. Voice activity detectors (VADs) are also used to reduce an audio signal to only the portions that are likely to contain speech. This prevents the recognizer from wasting time analyzing unnecessary parts of the signal.
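Two of the ideas above, cutting the signal into 10-millisecond fragments and discarding fragments unlikely to contain speech, can be sketched in a few lines. This is only an illustration under simple assumptions: the energy threshold is arbitrary, the function names are invented, and a plain energy test stands in for a real voice activity detector. It does not compute cepstral coefficients or run an HMM.

```python
def split_into_frames(samples, sample_rate_hz, frame_ms=10):
    """Cut the signal into consecutive fixed-length frames (10 ms by default)."""
    frame_len = sample_rate_hz * frame_ms // 1000
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, frame_len)]


def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)


def speech_frames(samples, sample_rate_hz, threshold=0.01):
    """Crude voice activity detection: keep frames whose energy exceeds threshold."""
    return [frame for frame in split_into_frames(samples, sample_rate_hz)
            if frame_energy(frame) > threshold]


if __name__ == "__main__":
    silence = [0.0] * 160   # 10 ms of silence at 16 kHz
    voiced = [0.5] * 160    # 10 ms of a loud (constant) signal
    kept = speech_frames(silence + voiced, sample_rate_hz=16000)
    print(len(kept))
```

At 16 kHz a 10 ms frame holds 160 samples, so the recognizer would only need to analyze the second frame in this example.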
  • 12.
      • Wikipedia: The Internet is the single largest source of information, so it is important to know how to fetch data from various sources. Wikipedia, one of the largest and most popular sources of information on the Internet, is a multilingual online encyclopedia created and maintained as an open collaboration project by a community of volunteer editors using a wiki-based editing system.
      • Smtplib: Simple Mail Transfer Protocol (SMTP) is a protocol that handles sending e-mail and routing e-mail between mail servers. Python provides the smtplib module, which defines an SMTP client session object that can be used to send mail to any Internet machine with an SMTP or ESMTP listener daemon. The parameters are:
          Host - The host running the SMTP server. One can specify the IP address of the host or a domain name such as facebook.com. This argument is optional.
          Port - If the host argument is provided, one also needs to specify the port on which the SMTP server is listening. Usually this port is 25.
          Local-hostname - If the SMTP server is running on the local machine, one can specify just localhost for this option.
        An SMTP object has an instance method called sendmail, which is typically used to do the work of mailing a message. It takes the parameters:
          • The sender - A string with the address of the sender.
          • The receivers - A list of strings, one for each recipient.
          • The message - A message as a string formatted as specified in the various RFCs.
      • Playsound: The playsound module is the simplest module to use for playing sound. It works on both Python 2 and Python 3 and is tested to play WAV and MP3 files only. It contains only one method, playsound(), which takes the audio filename as its argument.
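The sendmail parameters listed above (sender, receivers, message) can be sketched with the standard library as below. The helper names and the example.com addresses are hypothetical, and the project's own mail code is not shown in the report, so this is only one plausible arrangement. Many providers also require TLS and a login step, which this sketch deliberately leaves out.

```python
import smtplib
from email.message import EmailMessage


def build_message(sender, receivers, subject, body):
    """Create an RFC-compliant message with From, To and Subject headers."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(receivers)
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_mail(host, port, sender, receivers, msg):
    """Open an SMTP session to host:port and hand the message to sendmail()."""
    with smtplib.SMTP(host, port) as server:
        server.sendmail(sender, receivers, msg.as_string())


# Example (needs a reachable SMTP server, so it is left commented out):
# message = build_message("me@example.com", ["you@example.com"], "Hello", "Sent by voice")
# send_mail("localhost", 25, "me@example.com", ["you@example.com"], message)
```

Building the message with EmailMessage avoids formatting the headers by hand, which is where most RFC formatting mistakes come from.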
  • 13.
      • Plyer: Plyer is a Python library for accessing features of the hardware and platform.
      • Ctypes: ctypes is a foreign function library for Python. It provides C-compatible data types and allows calling functions in DLLs or shared libraries. It can be used to wrap these libraries in pure Python.
      • Psutil: psutil is a cross-platform Python library used to access system details and process utilities. It keeps track of the utilization of resources such as CPU, memory, disks, network and sensors. It is therefore used for system monitoring, profiling, limiting process resources and managing running processes. It is supported in Python versions 2.6, 2.7 and 3.4+.
      • Urllib: urllib is a package that collects several modules for working with URLs:
          urllib.request for opening and reading URLs
          urllib.error containing the exceptions raised by urllib.request
          urllib.parse for parsing URLs
          urllib.robotparser for parsing robots.txt files
      • Pyspeedtest: A Python library to test network bandwidth using Speedtest.net servers. One can check ping, download speed and upload speed using this library.
      • Pandas: pandas is a fast, powerful, flexible and easy-to-use open source data analysis and manipulation tool, built on top of the Python programming language.
      • Matplotlib: Matplotlib is a visualization library in Python for 2D plots of arrays. It is a multi-platform data visualization library built on NumPy arrays and designed to work with the broader SciPy stack. It was introduced by John Hunter in the year 2002. One of the greatest benefits of visualization is that it gives visual access to huge amounts of data in an easily digestible form. Matplotlib offers several plot types such as line, bar, scatter and histogram.
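As an illustration of how psutil could back a "check my battery status" command, here is a minimal sketch. The function names and the wording of the spoken sentence are assumptions; psutil.sensors_battery() is the library call that reports the charge percentage and whether the charger is connected, and it returns None on machines without a battery. psutil is imported inside the function so that the formatting helper can be used without the library installed.

```python
def describe_battery(percent, power_plugged):
    """Turn raw battery readings into a sentence the assistant can speak."""
    state = "charging" if power_plugged else "on battery"
    return f"Battery is at {percent:.0f} percent and {state}"


def battery_status():
    """Read the battery through psutil (third-party) and describe it."""
    import psutil  # imported lazily: only needed when the real reading is wanted

    battery = psutil.sensors_battery()
    if battery is None:  # desktops and some virtual machines report no battery
        return "No battery detected"
    return describe_battery(battery.percent, battery.power_plugged)


if __name__ == "__main__":
    print(describe_battery(80, True))
```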
  • 14.
      • Beautifulsoup: Beautiful Soup is a Python package for parsing HTML and XML documents, including documents with malformed markup such as non-closed tags (it is named after "tag soup"). It creates a parse tree for parsed pages that can be used to extract data from HTML, which is useful for web scraping.
      • Tabulate: Pretty-prints tabular data in Python, as a library and a command-line utility. The main use cases of the library are:
          • printing small tables without hassle: just one function call, with formatting guided by the data itself
          • authoring tabular data for lightweight plain-text markup: multiple output formats suitable for further editing or transformation
          • readable presentation of mixed textual and numeric data: smart column alignment, configurable number formatting, alignment by a decimal point
      • Numpy: NumPy is a Python library used for working with arrays. It also has functions for working in the domains of linear algebra, Fourier transforms and matrices. NumPy was created in 2005 by Travis Oliphant. It is an open source project and can be used freely. NumPy stands for Numerical Python.
      • Opencv: OpenCV-Python is a library of Python bindings designed to solve computer vision problems. ... OpenCV-Python makes use of NumPy, which is a highly optimized library for numerical operations with a MATLAB-style syntax. All the OpenCV array structures are converted to and from NumPy arrays. It is also a free, open source library used in real-time image processing. It is used to process images, videos and even live streams.
      • Wave: The wave module in Python's standard library is an easy interface to the WAV audio format. The functions in this module can write audio data in raw format to a file-like object and read the attributes of a WAV file.
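The two abilities of the wave module mentioned above, writing raw audio to a file-like object and reading the attributes of a WAV file, can be tried out with an in-memory buffer. The tone parameters and function names below are arbitrary choices for the sketch, not values from the project.

```python
import io
import math
import struct
import wave


def write_tone(file_obj, freq_hz=440, sample_rate_hz=8000, duration_s=0.5):
    """Write a mono, 16-bit sine tone to a file or file-like object."""
    n_frames = int(sample_rate_hz * duration_s)
    frames = b"".join(
        struct.pack("<h", int(32767 * math.sin(2 * math.pi * freq_hz * i / sample_rate_hz)))
        for i in range(n_frames)
    )
    with wave.open(file_obj, "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(2)            # 2 bytes per sample = 16 bit
        wav.setframerate(sample_rate_hz)
        wav.writeframes(frames)


def read_attributes(file_obj):
    """Return (channels, sample width in bytes, frame rate, number of frames)."""
    with wave.open(file_obj, "rb") as wav:
        return (wav.getnchannels(), wav.getsampwidth(),
                wav.getframerate(), wav.getnframes())


if __name__ == "__main__":
    buffer = io.BytesIO()
    write_tone(buffer)
    buffer.seek(0)                     # rewind before reading the header back
    print(read_attributes(buffer))
```

The same two functions work unchanged with a filename instead of the BytesIO buffer.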
  • 15. 6. Tools Required
      • Hardware: Monitor/Display
      • Software: Windows 10, Visual Studio Code (IDE), Google Chrome browser, Python version 3.7
    7. Diagrams
  • 16. 8. List of tasks this application performs
    When the application is executed, the user has to set some input fields.
    (i) The first inputs are the target WhatsApp number and the body of the message. They have to be set at the start because an emergency alert feature has been added to this desktop assistant. An example illustrates the feature.
    E.g. Suppose a person leaves home for some purpose and runs into trouble on the road: some people try to attack him, force him to do something, or take him with them. He shouts “Help me, help me please!” but nobody is there to help. With the emergency alert feature he can still ask for help directly from a police station, a hospital or any other emergency service. There may be no people to hear his cry, but his voice assistant is running on his laptop or mobile phone. When he shouts “help me!”, the assistant listens, recognizes his voice and sends a preset help message to the emergency service within a few seconds. He can inform someone about his problem without calling, texting or even touching his phone, using only a voice command.
    To have this feature available in an emergency, he must first set the inputs: the target WhatsApp number (the number that should be informed about his problem) and the message body (for example, the message he wants to send to the police station). Once these are set, his digital assistant is ready to help him outside his home too.
    The best format for the message body is <Myself XYZ, My address is UXV, My contact number is XXXXXXXXX, I came to market, now I am in trouble please help me !!> All of this has to be set before going outside, so that in a critical situation the message can be sent with only the “help me!” command, because in such a situation he would not have much time to give all the details needed to trace him.
    [**NOTE: To use this feature, the target WhatsApp number should be registered with the Twilio account, because the Twilio API is used for this purpose in this project.**]
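The emergency feature described above could be wired together roughly as follows. This is a sketch under stated assumptions, not the project's code: the helper names are invented, the trigger test is a plain substring match, and the credentials and phone numbers must come from the user's own Twilio account. The only Twilio call used is Client(...).messages.create(...), with the whatsapp: prefix on both numbers, and the twilio package is imported inside the function so the other helpers can be tried without it.

```python
def is_help_command(spoken_text):
    """True when the recognized speech contains the emergency phrase."""
    return "help me" in spoken_text.lower()


def build_help_message(name, address, contact, situation):
    """Fill the preset message body in the format suggested above."""
    return (f"Myself {name}, my address is {address}, "
            f"my contact number is {contact}. {situation} "
            "Now I am in trouble, please help me!")


def send_whatsapp(account_sid, auth_token, from_number, to_number, body):
    """Send the preset body over WhatsApp through Twilio (third-party package)."""
    from twilio.rest import Client  # imported lazily: requires `pip install twilio`

    client = Client(account_sid, auth_token)
    return client.messages.create(
        body=body,
        from_=f"whatsapp:{from_number}",  # the Twilio WhatsApp sender number
        to=f"whatsapp:{to_number}",       # the preset emergency contact
    )


if __name__ == "__main__":
    if is_help_command("Help me, help me please!"):
        print(build_help_message("XYZ", "UXV", "XXXXXXXXX", "I came to the market."))
```

Preparing the body ahead of time matters here: in an emergency only the short trigger phrase has to be recognized, and everything else is already stored.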
  • 17. [Figures: 1. Setting up the application; 2. WhatsApp message notification on the target number; 3. Message in the target number's inbox]
  • 18. (ii) When the application starts, it checks whether any of the user's friends or acquaintances has a birthday on that day. The user just has to store each person's date of birth in the application. If the current date matches a stored date, the application shows a birthday wish reminder; otherwise it pushes a notification as shown in the image below.
    • More Tasks and their Commands
      Voice Command | Task Description
      Ok bro | Activation point for speech recognition
      Jarvis | Checks whether Jarvis is listening to the user
      Who are you? | Tells the system name, which is 'Jarvis'
      tell me something about you | Gives a basic description of itself
      temperature | Reports the current air temperature
      check my battery status | Gives the battery details
      check the connection status | Says whether the user is connected to the internet and pushes a notification
      check internet speed | Checks and tells the upload, download and ping speed of the user's network
      <user's query> wikipedia | Searches Wikipedia, shows the result and speaks it aloud
      google search <query> | Opens the Google Chrome browser and displays the results for the query
      google | Opens Google for the user
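The birthday check described in item (ii) comes down to comparing the month and day of each stored date of birth with today's date. A minimal sketch, assuming the dates are kept in a dictionary keyed by name (the names and dates below are made up, and the project's actual storage format is not described):

```python
from datetime import date


def birthdays_today(birthdays, today=None):
    """Return the names whose birthday (month and day) falls on `today`."""
    today = today or date.today()
    return [name for name, born in birthdays.items()
            if (born.month, born.day) == (today.month, today.day)]


if __name__ == "__main__":
    friends = {"Asha": date(1999, 5, 17), "Ravi": date(2000, 12, 1)}  # hypothetical data
    matches = birthdays_today(friends)
    if matches:
        print("Birthday reminder:", ", ".join(matches))
    else:
        print("No birthdays today")
```

The year of birth is ignored on purpose, since a reminder should fire every year.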
  • 19.
      Voice Command | Task Description
      google maps | Opens Google Maps
      google drive | Opens Google Drive for the user
      google translate | Opens Google Translate for the user
      find location <place_name> | Searches for the place on Google Maps and shows the result in the browser
      open youtube | Opens the YouTube homepage
      search youtube <query> | Searches YouTube and shows the results for the user's query
      open udemy | Opens the Udemy homepage
      search udemy <course_name> | Shows all matching courses
      find geeksforgeeks <subject> | Opens GeeksforGeeks and shows matching results
      open mail | Opens the user's personal Gmail account
      send mail <message_body> <target_mail_id> | Sends mail to anyone using a voice command
      open whatsapp | Opens WhatsApp for the user
      open facebook | Opens Facebook for the user
      find facebook <query> | Finds people on Facebook
      search live train status <train_no> | Shows the live status of the train number provided by the user
      open zoom | Opens the Zoom application
      open sublime text | Opens the Sublime Text editor
      open calculator | Opens the calculator app
      open notepad | Opens Notepad on the desktop
      handle file <mode_to_handle> | Reads, writes and saves text files based on the mode selected by the user
      play music / I am sad | Plays music from folders
      movie <movie_name> | Opens the media player and starts the named movie
      take screenshot <file_name> | Takes a screenshot and saves it under the given file name in the specified folder
  • 20. 15
    Voice Command: Task Description
    change wallpaper: Changes the desktop background
    take me to my childhood: Shows a childhood photo of the user, if a specified folder containing such photos exists
    set alarm <hours> <minutes> <am/pm>: Sets an alarm and rings the alarm tone when the set time is reached
    set timer <seconds>: Sets a timer for the user-provided number of seconds and rings a warning tone when the deadline is reached
    read breaking news: Reads the day's top 10 global news headlines aloud
    india corona cases: Shows results for the top 5 states in India
    corona update: Shows total global corona figures, such as the number of infected, dead, and recovered people
    take photo: Captures a picture and saves it to the specified folder
    record video: Records video using the desktop camera and saves it to the specified folder
    record audio: Records audio using the microphone and saves the recorded file in the specified folder
    help me <problem_statement>: Sends WhatsApp messages to the emergency contact through voice command
    police <problem_statement>: Sends a mail to the local police station's mail ID describing the problem through voice command
    restart my pc: Asks the user to confirm the restart and acts accordingly
    shutdown my pc: Asks the user to confirm switching the system off and acts accordingly
    exit: Quits the application
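One way a recognized phrase could be mapped to an entry in the command table is prefix matching, with the rest of the phrase passed on as the argument. The sketch below is an assumption about how this might be done, not the report's implementation: the `COMMANDS` mapping, the handler names, and the functions `match_command` and `parse_alarm` are hypothetical. It also assumes the recognizer returns the alarm time as space-separated words such as "7 30 pm".

```python
import re

# Hypothetical mapping from command prefixes to handler names;
# only a small subset of the command table is shown.
COMMANDS = {
    "set alarm": "alarm",
    "set timer": "timer",
    "search youtube": "youtube_search",
    "open youtube": "youtube_home",
    "exit": "quit",
}


def match_command(transcript):
    """Return (handler_name, argument_text) for the longest matching prefix, or None."""
    # Lower-case the text and collapse repeated whitespace.
    text = " ".join(transcript.lower().split())
    # Try longer prefixes first so a short command cannot shadow a longer one.
    for prefix in sorted(COMMANDS, key=len, reverse=True):
        if text == prefix or text.startswith(prefix + " "):
            return COMMANDS[prefix], text[len(prefix):].strip()
    return None


def parse_alarm(args):
    """Convert '<hours> <minutes> <am/pm>' into a 24-hour (hour, minute) pair."""
    match = re.fullmatch(r"(\d{1,2})\s+(\d{1,2})\s*(am|pm)", args.strip().lower())
    if not match:
        raise ValueError(f"cannot parse alarm time: {args!r}")
    hour, minute, meridiem = int(match.group(1)), int(match.group(2)), match.group(3)
    if not (1 <= hour <= 12 and 0 <= minute <= 59):
        raise ValueError(f"alarm time out of range: {args!r}")
    # 12 am maps to hour 0 and 12 pm to hour 12.
    hour = hour % 12 + (12 if meridiem == "pm" else 0)
    return hour, minute
```

For example, `match_command("Set alarm 7 30 pm")` yields the handler name `"alarm"` and the argument `"7 30 pm"`, which `parse_alarm` turns into `(19, 30)`. Phrases that match no prefix return `None`, so the caller can ask the user to repeat the command.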
  • 21. 16
    9. Uses of Speech Recognition program
    Speech recognition is used for two main purposes. The first is dictation, which in the context of speech recognition means translating spoken words into text. The second is controlling the computer and its applications by voice. Writing by voice lets a person write 150 words per minute or more, provided he or she can speak that quickly. In this way speech recognition programs help users get more done in less time and with less effort.
    10. Applications
    10.1 From medical perspective :
    People with disabilities can benefit from speech recognition programs. Speech recognition is especially useful for people who have difficulty using their hands, since they can operate a computer by voice instead. Speech recognition is also used in deaf telephony, for example voicemail to text.
    10.2 From military perspective :
    Speech recognition is important from a military perspective. In the air force it has definite potential for reducing pilot workload. Besides the air force, such programs can also be used in helicopters, in battle management, and in other applications.
    10.3 From education perspective :
    Individuals with learning disabilities who have problems with thought-to-paper communication can benefit from the software. Some other application areas of speech recognition technology are described above.
  • 22. 17
    11. Some factors that may disturb functionalities of the application :
    - Homonyms : Words that are spelled differently and have different meanings but sound the same, for example ‘to’ and ‘two’, or ‘be’ and ‘bee’. It is a challenge for the machine to distinguish between such words and phrases that sound alike.
    - Overlapping Speeches : A second challenge is understanding the speech uttered by the user. The machine often picks up the wrong command because of the way a word is pronounced.
    12. The future of Speech Recognition :
    - Accuracy will keep improving.
    - Dictation by speech recognition will gradually become accepted.
    - Using speech recognition together with AI, systems may be developed that are as intelligent as a human.
    - In future, corporate tasks can probably be automated using speech recognition and Selenium.
    13. References :
    1. https://pypi.org/
    2. https://www.geeksforgeeks.org/
    3. https://github.com/github
    4. https://www.kdnuggets.com/2020/06/easy-speech-text-python.html
    5. https://www.analyticsvidhya.com/blog/2019/07/learn-build-first-speech-to-text-model-python/