Text versus Speech: A Comparison of Tagging Input Modalities for Camera Phones

This document summarizes a research study that compared text and speech input modalities for tagging photos on camera phones. The study tested three hypotheses: 1) speech is preferred over text for tagging, 2) the advantage of speech increases with longer tags, and 3) retrieving photos with speech is not faster than with text. A user study was conducted with three conditions: speech only, text only, and both speech and text. Results showed that speech was not clearly better than text for tagging or retrieving photos. The implications are that systems should support multiple input modalities, enable inspection of audio tags, and allow the modalities to be combined so that each offsets the other's weaknesses.

Research & Development

Text vs. Speech
A Comparison of Tagging Input Modalities
for Camera Phones

Mauro Cherubini, Xavier Anguera,
Nuria Oliver, and Rodrigo de Oliveira

people do not want to tag
their pictures
intro → hypotheses → methodology → results → implications

research question:

Assuming that users are willing to
input at least one tag, which input
modality can help the production and
retrieval of the pictures?

hypothesis 1

Speech is preferred to text as an
annotation mechanism on mobile
phones (objective measure)

Support:
- Mitchard and Winkles (2002)

hypothesis 1-bis

Speech annotations are preferred by
users even if this means spending more
time on the task (subjective measure)

Support:
- Perakakis and Potamianos (2008)

hypothesis 2

The longer the tag the larger the
advantage of voice over text for
annotating pictures on mobile phones

Support:
- Hauptmann and Rudnicky (1990)

hypothesis 3

Retrieving pictures on mobile phones
with speech is not faster than with text
(objective measure)

Support:
- Mills et al. (2000)

the user study
field study (4 weeks)
controlled experiment

T1 - T2 - T3 - T4

3 experimental conditions:
a. Speech only
b. Text only
c. Speech and Text

MAMI

features of MAMI

•  processing is done entirely on the mobile
phone
•  speech is not transcribed
•  to compare the waveforms of the audio tags,
MAMI uses the Dynamic Time Warping
algorithm
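
A minimal sketch of how Dynamic Time Warping can match a spoken query against stored audio tags without transcribing them. This is not MAMI's implementation: the slides do not say which acoustic features MAMI compares, so plain lists of numbers stand in for the audio here, and the names `dtw_distance` and `best_match` are hypothetical.

```python
from math import inf


def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D sequences.

    Assumption: each sequence is a list of numbers standing in for an
    audio tag; the local cost is the absolute difference of two samples.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: step in a only, step in b only, step in both
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[n][m]


def best_match(query, stored_tags):
    """Return the name of the stored tag closest to the query under DTW.

    stored_tags is a hypothetical dict mapping a picture name to its tag sequence.
    """
    return min(stored_tags, key=lambda name: dtw_distance(query, stored_tags[name]))
```

Because DTW lets one sequence stretch or compress relative to the other, the same word spoken more slowly can still align closely with the stored tag, which is why no transcription step is needed.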

task 1: remember the tag
stimulus
retrieval

Pictures taken during the field trial

task 2: remember the context
stimulus
retrieval

TASK 2
PICTURE 1

three little bushes
Garden
Tree
Stairs

task 3: remember the picture
stimulus
retrieval

Text
Audio tags were converted into
textual tags and vice versa

task 4: remember the sequence
assignment
retrieval

TASK 4

Three pictures among the oldest and three pictures among the newest.

metrics

•  time to completion
•  false positives
•  retrieval errors

results H1

results H1-bis
All participants in the BOTH group felt that tagging
with text was more effective than tagging with voice.

Voice: 3.33 [0.81], Text: 4.34 [0.81] (Mean [SD])
1 = completely agree; 5 = completely disagree

results H2

results H3

take away 1:
speech is not a given

the advantage of audio as an input modality for tagging
pictures on mobile phones is not a given

why?
1. retrieval precision
2. privacy

take away 2:
input mistakes
text input mistakes are corrected immediately;
mistakes in audio recordings, by contrast, are less
frequently addressed

take away 3:
memory

speech does not help in memorizing the tags

implication 1:
allow multiple modalities

© Pixar, 2008

implication 2:
enable audio inspection

implication 3:
enable modality synesthesia

© Disney, 1940

Research Development

end
thanks

martigan@gmail.com
mauro@tid.es

http://www.i-cherubini.it/mauro/blog/
http://research.tid.es/multimedia/
