Uma Murthy defended her PhD dissertation on developing digital libraries with superimposed information to support scholarly tasks involving fine-grained information. She acknowledged her family, advisors, colleagues, and funding sources. Her research addressed problems with managing and using heterogeneous and distributed information by developing a digital library prototype called SuperIDR that allows for working with contextualized subdocument information. A qualitative study found that subimages are important for fish identification tasks and that SuperIDR supports uses like marking, annotating, browsing, and searching subimages.
Digital Libraries Support Fine-grain Information Tasks
1. Digital Libraries with Superimposed Information: Supporting Scholarly Tasks that Involve Fine-grain Information. Uma Murthy, PhD Defense, 28 January 2011
2. Acknowledgments: My family; Dr. Edward Fox, Dr. Manuel Pérez-Quiñones, Dr. Ricardo Torres, Dr. Lois Delcambre, Dr. Naren Ramakrishnan, Dr. Eric Hallerman, Lin Tzy Li, Dr. Marcos Goncalves, Yinlin Chen, Nadia Kozievitch, Evandro Ramos, Tiago Falcao, Kapil Ahuja, Dr. John Pitrelli, Dr. Ganesh Ramaswamy, Dr. Andrea Kavanaugh, Dr. Lillian Cassel, Dr. Deborah Tatar, Dr. Donald Orth, Seungwon Yang, Lokeya Venkatachalam, Seonho Kim, Doug Gorton, Ricardo Quintana-Castillo, Monika Akbar, Dave Archer, Susan Price, Rao Shen, Srinivas Vemuri, Xiaoyan Yang, Yonca Haciahmetoglu, Pardha Pyla, Manas Tungare, Sameer Ahuja, Ben Hanrahan, Laurian Vega, Stacy Branham, Tejinder Judge, Rhonda Phillips, Ramya Ravichandar, Hari Pyla, Manjula Iyer, Dr. Noel Greis, Dr. Jack Olin, Venkat Srinivasan, …; NSF grants (Superimposed information, Digital Government, DL curriculum, CTR, ECDL), Microsoft tablet PC grant, CS department, and Graduate school
4. Problems: Information is heterogeneous, voluminous, and distributed across locations, which makes it challenging to manage, organize, access, retrieve, and use. Tools and methods (both paper-based and digital) are not well integrated. Result: ineffective and inefficient task execution.
5. A digital library = repository of collections and metadata + services
6. Scenario: “Find me species that are darters that have a dorsal fin that looks like this, which is connected to another dorsal fin that looks like this, which might have an orange hue on its edge.” This requires searching for subdocuments in the context of other information, including other subdocuments, and then using the results in another task or context.
7. Superimposed information enables working with contextualized subdocuments: superimposed (new) information marks base (existing) information.
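The mark/base relationship can be illustrated with a small data model. This is a minimal sketch under stated assumptions: the class names (`Region`, `Mark`, `SuperimposedLayer`), the rectangular region shape, and the `img://` identifiers are hypothetical and are not SuperIDR's actual schema. What it shows is that base images are never copied or modified; marks only reference them, and links between marks let one subdocument be used in the context of others.

```python
from dataclasses import dataclass, field

# Hypothetical data model for illustration; not SuperIDR's actual schema.

@dataclass(frozen=True)
class Region:
    """Axis-aligned rectangle in base-image pixel coordinates (assumed shape)."""
    x: int
    y: int
    width: int
    height: int

@dataclass
class Mark:
    """Superimposed (new) information that references, but never alters, base information."""
    mark_id: str
    base_uri: str          # identifier of the unchanged base image (hypothetical scheme)
    region: Region         # the marked subimage
    annotation: str = ""   # free-text description of the subimage
    links: list[str] = field(default_factory=list)  # ids of related marks

class SuperimposedLayer:
    """Hypothetical store of marks, kept separate from the base collection."""

    def __init__(self) -> None:
        self._marks: dict[str, Mark] = {}

    def add(self, mark: Mark) -> None:
        self._marks[mark.mark_id] = mark

    def marks_on(self, base_uri: str) -> list[Mark]:
        """All marks on one base image: the subdocument's surrounding context."""
        return [m for m in self._marks.values() if m.base_uri == base_uri]

    def related(self, mark_id: str) -> list[Mark]:
        """Follow a mark's links to other subdocuments, skipping dangling ids."""
        return [self._marks[i] for i in self._marks[mark_id].links if i in self._marks]

# Usage: two connected dorsal fins marked on the same (hypothetical) darter image.
layer = SuperimposedLayer()
layer.add(Mark("m1", "img://darter/042", Region(120, 30, 80, 40),
               "first dorsal fin, orange edge", ["m2"]))
layer.add(Mark("m2", "img://darter/042", Region(200, 35, 90, 40), "second dorsal fin"))
print([m.mark_id for m in layer.related("m1")])  # ['m2']
```

Keeping marks in a separate layer, rather than editing the images, is what lets the same base collection carry many users' subimages and annotations.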
8. Hypothesis: A digital library with superimposed information (SI-DL) provides enhanced support to scholarly tasks that involve working with subdocuments. (Diagram: DL + SI provides enhanced support to scholarly tasks with subdocuments.)
21. Subimage and SuperIDR use: a qualitative study. Research question: How do people use subimages in fish identification, and how does SuperIDR support that use? Focus areas: SuperIDR support for working with subimages in fish identification; contexts and strategies of working with subimages in fish identification; characteristics of subimages and related information.
22. Rationale: Maximize use of SuperIDR. Recruit people with an interest in fish ID; have a longer duration of use, both in a natural setting and in targeted tasks; have them use SuperIDR on their own (data on use in the wild) and in targeted tasks (opportunity to observe use); collect qualitative data, in multiple ways and from multiple sources, on subimage and SuperIDR use in fish ID.
24. Study procedures. Data collected: interview responses; diary entries; log data of SuperIDR use; screen captures of task execution; spoken thoughts during task execution; species ID materials; database image; species ID responses.
25. Participants: 3 groups. Participants were grouped by fisheries and fish identification experience, current projects, and fish identification practices. P2 (male), P6 (male): relatively less experienced, undergraduates (UG) or recent UG. P1 (male), P5 (female): moderately experienced Master’s students, working on theses and/or teaching/research. P3 (male), P4 (female): highly experienced PhD students, working on research projects.
28. Subimage/annotation types (continued): co-presence, morphological comparisons, multiple-part descriptions, connections/relationships, comparison with other information objects. (Examples shown: comparison with other information objects; connections/relationships.)
29. Subimage/annotation types (continued): the information object as a whole; combination of types. (Examples shown: information object as a whole; combination of color and count.)
30. Strategies and contexts that suggest subimage use in fish identification: in learning methods; in identification (top-down approach, comparing similar species); to help identify fishes quickly (in the field versus the lab or the classroom); across fishes of the same species (to deal with variability in appearance); to verify species by manual inspection.
31. Subimage use in SuperIDR: marking and annotating subimages (940 subimages and annotations); browsing subimages in species descriptions, in comparisons, and in search results; text, image, and combined search, with complex objects as queries.
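Combined search with a complex object (text plus an image descriptor) as the query can be illustrated with a simple late-fusion ranker. This is a hedged sketch: the slides do not describe SuperIDR's actual descriptors or fusion method, so the Jaccard text score, the cosine image score, the linear weight `alpha`, and the names `Subimage` and `combined_search` are all assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass

@dataclass
class Subimage:
    """A marked subimage with its annotation and an image descriptor.
    The descriptor (e.g. a normalized color histogram) is an assumption."""
    sub_id: str
    annotation: str
    features: list[float]

def text_score(query: str, annotation: str) -> float:
    """Jaccard overlap of lowercase word sets; a stand-in for a real text ranker."""
    q, a = set(query.lower().split()), set(annotation.lower().split())
    union = q | a
    return len(q & a) / len(union) if union else 0.0

def image_score(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two descriptor vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0

def combined_search(query_text: str, query_features: list[float],
                    items: list[Subimage], alpha: float = 0.5,
                    k: int = 5) -> list[tuple[str, float]]:
    """Rank subimages by alpha * text score + (1 - alpha) * image score."""
    scored = [
        (alpha * text_score(query_text, s.annotation)
         + (1 - alpha) * image_score(query_features, s.features), s.sub_id)
        for s in items
    ]
    scored.sort(key=lambda t: (-t[0], t[1]))  # highest score first, ties by id
    return [(sid, round(score, 3)) for score, sid in scored[:k]]

# Usage with made-up subimages and descriptors.
items = [
    Subimage("a", "dorsal fin orange edge", [0.7, 0.2, 0.1]),
    Subimage("b", "dark spots on side", [0.1, 0.1, 0.8]),
]
print(combined_search("dorsal fin orange", [0.6, 0.3, 0.1], items))  # subimage 'a' ranks first
```

A linear weighted sum is one common fusion choice; setting `alpha` to 1.0 or 0.0 reduces it to pure text or pure image search, respectively.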
35. Participant quotes (slide themes: SI-DL, context, and manually working with information). “It [SuperIDR] is pulling together different ways of getting to information ... So, not only do I have a taxonomy [and] dichotomous key, but it is also supported by images, many images that I have loaded in myself, that I can compare and contrast right there in the program [SuperIDR]. I can annotate the images, so I know that I [am] kind of looking somewhat into their future [use]. And it kind of just pulls all those tools together, more so than [pulling together] information. It gives me many ways of accessing the same information. The more ways you can come to that information, the better [it is]. Because it is always going to make you more confident about the decision that you are making.” [P1 interview] “It depends on how distinct that species [is] and how many other species are similar to that species, I guess … I would never trust the result, I guess, 100 … you know, based on just one picture and a little bit of written text. I would always want to pull up other species that are somewhat similar and just do a visual inspection myself to be sure that it just was not some bad [query] image that I used or a bad search term.” [P3 interview] “... It would not work if you said that this fish has dark spots. You know you get hundreds of species with dark spots. But, if you got down to a few species and you need to know how many they have ...” [P1 interview]
37. Conclusions: Working with subdocuments is important and necessary in many scholarly tasks. An SI-DL provides enhanced support to such scholarly tasks. Treating subdocuments as first-class objects facilitates the management, access, retrieval, and use of subdocuments and associated information. Contributions: superimposed applications; SI-DL definition (metamodel) and prototype (SuperIDR); findings from user studies on the use of SI in scholarly tasks; insights about subimage use in species identification; guidelines for SI-DL design; datasets (images, subimages, annotations)*.
38. Future work: improved CBIR of subimages and improved combined search (e.g., transfer learning); leveraging existing collections to study applicability in other domains; crowdsourcing social media to study SI use in a social network context and the WWW; participatory SI-DL, when personal and institutional DLs come together; comparison of various forms and functions of subdocuments and associated information.
40. Publications related to this research. Published: SuperIDR: A Tablet PC Tool for Image Description and Retrieval (WIPTE, 2010); A Teaching Tool for Parasitology: Enhancing Learning with Annotation and Image Retrieval (ECDL, 2010); Superimposed image description and retrieval for fish species identification (ECDL, 2009); Species identification: fish images with CBIR and annotations (JCDL poster, 2009); Superimposed information architecture for digital libraries (ECDL, 2008); From concepts to implementation and visualization: tools from a team-based approach to IR (SIGIR demo, 2008); Further development of a digital library curriculum: Evaluation approaches and new tools (ICADL, 2007); A superimposed information-supported digital library (JCDL doctoral consortium, 2007); Extending the 5S digital library (DL) framework: From a minimal DL towards a DL reference model (DLF workshop, JCDL, 2007); Enhancing concept mapping tools below and above to facilitate the use of superimposed information (CMC, 2006); Sierra: a superimposed application for enhanced image description and retrieval (ECDL demo, 2006); Using superimposed and context information to find and re-find sub-documents (PIM, 2006); SIMPEL: a superimposed multimedia presentation editor and player (JCDL demo, 2006). Planned: A qualitative study on the use of subimages and of SuperIDR, a prototype digital library with superimposed information, in fish species identification (JCDL, 2011); Extending the 5S framework to provide support for CBIR, complex objects, and superimposed information (journal paper).
41. Other published work: Pedagogical Enhancements to a Course on Information Retrieval (TLIR, 2011); Sustainability of Bits, not just Atoms (CHI sustainability workshop, 2010); Using an iPhone Application for Diversity Recruitment (ASEE-SE, 2009); Building an ontology for crisis, tragedy and recovery (NKOS, 2009); Curatorial Work and Learning in Virtual Environments: A Virtual World Project to Support the NDIIPP Community (JCDL Digital Preservation workshop, 2009); A Methodology and Tool Suite for Evaluation of Accuracy of Interoperating Statistical Natural Language Processing Engines (Interspeech, 2008); VizBlog: a discovery tool for the blogosphere (DigGov, 2007); Re-finding from a Human Information Processing Perspective (PIM, 2006).
44. Photo attributions (Flickr): “A digital library” by HacksHaven; “Art History With Chris And Mac 6/9: Manet: Lecture (Mme Manet and Leon)” by moonflowerdragon; “Korean music” by Homies In Heaven; “Old annotations” by Lorianne DiSabato; “Reading Annotation” by Rosa Say.
47. Summary of findings of the qualitative study: 13 types of subimages/annotations identified from 940 subimages/annotations; subimages are important and necessary in fish identification; identification proceeds in a top-down way; learning uses multiple methods; context is important; combined search and using a complex object as a query; the SI-DL brings capabilities together.
Species identification, analyzing paintings, studying architecture styles, analyzing medical images, etc.
Focus on infrastructure to work with marks
Results to date: case studies; superimposed applications; SuperIDR; SuperIDR evaluation (longitudinal and classroom-based); metamodel. These answered questions about what this DL contains, how it might be realized, how it compares with traditional methods of doing a task, and, to some extent, how subimages/SI are used in scholarly tasks. Not yet answered: how does an SI-DL support the use of subimages in scholarly tasks? This is an opportunity for deeper analysis of subimage use in scholarly tasks.
Collect qualitative data, in multiple ways and from multiple sources, on subimage and SuperIDR use in fish ID. Recruit people with an interest in fish ID. Have a longer duration of use, both in a natural setting and in targeted tasks. Have them use SuperIDR on their own (data on use in the wild) and in targeted tasks (opportunity to observe use), so that there is data on use that relates to task execution. Study setup: Skype, interviews, etc.
3-week-long study: setup, a pre-study interview (for background information and species ID practices), and training; weeks 1 and 2: use (diaries) and tasks; week 3: use (diaries); post-study interview on subimage use in species ID and SuperIDR support of subimage use.
P1, P2, P3, P4, P5, and P6. Undergraduates (P2 and P6): recently took Ichthyology, so freshwater species knowledge is relatively fresh in mind; transitioning from using or referring to several sources to internalizing species ID knowledge; species ID in the classroom; assisted senior students in one or two field projects. Master’s students (P1 and P5; recently took Ichthyology): work with a few species; have just started on research projects; generally rely on memory or refer to a few books, websites, etc. PhD students (P3 and P4): many years of experience; have done a lot of species ID in the field and the lab; work on select species; have almost internalized the species ID process, but still need to refer to information for fishes outside the ones they work on; have developed their own styles of species ID; for the most part, use references to confirm fish identification.
Subimage/annotation types: morphological description (shape, pattern, texture), size, color, presence, counts, location, morphological comparisons, multiple-part descriptions, connections/relationships, comparison with other information objects, the information object as a whole (not about parts), and combinations of the aforementioned types.
Use of subimages is necessary in fish identification. Fish identification activities: learning and identifying species. Learning methods vary: notecards, textbooks, identification keys, notes, printed lists, lists of images in digital documents, websites, etc. Learning focuses on location, habitat, the species’ general physical appearance, and distinguishing characteristics (subimages). Species identification is typically a top-down approach: family, genus, species. Distinguishing characteristics/subimages are used at the genus and species levels, usually to compare and contrast very similar species, eliminating choices until arriving at the species. Identification typically happens in the field (except while taking a class, where much identification is done in class using jarred specimens), and fish must be identified quickly so that they can be released alive, another reason to rely on distinguishing characteristics. Fishes of the same species vary in appearance, but some characteristics, such as black lines or markings, are preserved, so these might be used in identifying a fish.
Usage ranged from 3:20 to 7:15 hours across task and non-task sessions. Identification of species used the top-down approach described earlier. Combined search and complex objects as queries were used. Manual analysis of images is necessary. SuperIDR feedback: it brings together tools to access information, and it supported subimage use well for learning about a species, since there is a lot of information to browse and learn.
Subdocuments: preserve context; support multiple ways to describe, organize, access, retrieve, use, and re-use subdocuments and associated information; support manual as well as automatic ways to work with and process information.
Improve CBIR on subimages and combined search: combined query and search, descriptors for this application, treating subimages separately from whole images, transfer learning, and leveraging knowledge of the types of subimages/annotations to improve search. Leverage existing collections to study applicability in other domains (Flickr group photo notes). Crowdsourcing social media to study SI use in a social network context and the WWW: How do people use others’ tags on photos, others’ notes on images, and others’ annotations on documents (Kindle books)? What activities do they use them for? Do SI and its use help or affect services (search, etc.)? Participatory SI-DL, when personal and institutional DLs come together: How is SI modeled, considering multiple users, institutions, and uses? How can people share information and services in a reusable and interoperable manner in this participatory DL? What are the dynamics of users and uses in such a DL? Comparison of forms of subdocuments and associated information: Marshall’s study of annotations and types, Winget’s study of annotations on structured data (musical scores), subimage/annotation types.