Challenges in Enabling Mixed Media Scholarly Research with Multi-Media Data in a Sustainable Infrastructure

Presentation at the Digital Humanities 2018 Conference, Mexico City, on the development of the Media Suite, an online research environment that facilitates scholarly research using large multimedia collections maintained at archives, libraries and knowledge institutions. The Media Suite unlocks the data on the collection level, item level, and segment level, provides tools that are aligned with the scholarly primitives (discovery, annotation, comparison, linking), and offers a 'workspace' for storing personal mixed media collections and annotations and for running advanced analyses with Jupyter Notebooks and NLP tools.

See the notes for the narrative that goes with the slides.

Published in: Data & Analytics

  1. Challenges in Enabling Mixed Media Scholarly Research with Multi-media Data in a Sustainable Infrastructure. Roeland Ordelman, Technical coordinator, CLARIAH Media Suite; Netherlands Institute for Sound and Vision / University of Twente, The Netherlands
  2. Challenges in Enabling Mixed Media Scholarly Research with Multi-media Data in a Sustainable Infrastructure
  3. COMMON RESEARCH INFRASTRUCTURE. Infrastructure: hardware, software, databases, people, and policies supporting scholars’ information management needs
  4. CLARIAH Centers. Common Lab Research Infrastructure for the Arts and Humanities. SUSTAINABLE: available after the project; maintenance and support; updates and upgrades
  5. Challenges in Enabling Mixed Media Scholarly Research with Multi-media Data in a Sustainable Infrastructure
  6. Media Studies: focus on both “institutional” data collections and collections created by scholars
  7. Media Suite: Unlocking Archive Data for Mixed Media Scholarly Research
  8. What data is in Media Suite V3? MULTIMEDIA: Radio & Television (1.88M items); Newspapers (60M pages); Film (1129 films); Oral History (2744 interviews)
  9. What data is in Media Suite V3? MIXED MEDIA
  10. What data is in Media Suite V3? MIXED MEDIA: Program Guides; Objects; Radio & TV ratings; Photos; Video; Audio
  11. RESEARCH PILOTS (clariah.nl/projecten/research-pilots)
      • Cross-Medial Analysis of WW2 Eyewitness Testimonies
      • Cross-media research of public debates on drugs and regulation
      • Me and Myself: Tracing first person in documentary history in AV-collections
      • Annotating EYE’s Jean Desmet Collection: Towards Mixed Media Analysis in Digital Media History
      • Narrativizing Disruption: How exploratory search can support media researchers to interpret ‘disruptive’ media events as lucid narratives
      • Remediation in Sports News
  12. Challenges in Enabling Mixed Media Scholarly Research with Multi-media Data in a Sustainable Infrastructure
  13. “Unlock data”: Distant reading; Close reading
  14. (1) Discovery & inspection of data sets hidden in archives; (2) Discovery of items in large archival data sets; (3) Accessing items (play, view) from restricted data sets; (4) Discovery of segments in time-based media; (5) Relating and comparing data on the segment level. Distant reading / Close reading
  15. DATA SETS. CKAN: web-based open-source management system for the storage and distribution of open data. Open Archives Initiative (OAI)
  16. Example: DANS registers set Oral History. Common Lab Research Infrastructure for the Humanities
  17. Inspect metadata in Media Suite
  18. “METADATA ARCHAEOLOGY”: manual effort to describe metadata fields
  19. (1) Discovery & inspection of data sets hidden in archives; (2) Discovery of items in large archival data sets; (3) Accessing items (play, view) from restricted data sets; (4) Discovery of segments in time-based media; (5) Relating and comparing data on the segment level
  20. Search Oral History in Media Suite
  21. (1) Discovery & inspection of data sets hidden in archives; (2) Discovery of items in large archival data sets; (3) Accessing items (play, view) from restricted data sets; (4) Discovery of segments in time-based media; (5) Relating and comparing data on the segment level
  22. Federated login: (Dutch) scholars only
  23. (1) Discovery & inspection of data sets hidden in archives; (2) Discovery of items in large archival data sets; (3) Accessing items (play, view) from restricted data sets; (4) Discovery of segments in time-based media; (5) Relating and comparing data on the segment level
  24. Auto Metadata Extraction: large-scale speech recognition
  25. (1) Discovery & inspection of data sets hidden in archives; (2) Discovery of items in large archival data sets; (3) Accessing items (play, view) from restricted data sets; (4) Discovery of segments in time-based media; (5) Relating and comparing data on the segment level
  26. To appear: Content-based Cross-media Recommendations
  27. Design
  28. Co-development
      • Community building
      • User stories!
      • Short iterations (sprints) of 2 weeks: development & testing
      • Liaisons part of development team: Information Specialist, Experienced DH Researcher
      • Workshops, hackathons, data-a-thons
  29. Discussing issues with Gitter
  30. Tracking issues with GitHub
  31. SCHOLARLY PRIMITIVES (Unsworth, 2000; Blanke and Hedges, 2013)
  32. (1) Discover & inspect collections; (2) Faceted search on item level, granular search on segment level; (3) Comparing queries within / across collections; (4) Linking collections; (5) Annotation
  33. Annotation
      • W3C annotation model
      • Requirement in multimedia: add timings or geometry
      • Keep alignment between annotation and source
      • Add your own vocabularies (e.g., onomy.org)
      • Use for filtering and clustering
      • Export for creating visualisations
  34. Usability
  35. WORKSPACE
      • Create virtual personal mixed media collections
      • Create projects
      • Store annotations
      • Upload personal collections
      • Advanced data analysis (Jupyter Notebooks)
      • Advanced data processing
      • Export annotations
  36. Project; Search; Bookmark; Save Bookmark; Save Query
  37. Bookmark view; View Source
  38. Annotation view; View Source; Alignment
  39. Data analysis: Jupyter Notebooks or NLP
  40. Write your own (Python) code to analyze the data in the Media Suite
  41. Example output Jupyter Notebook
  42. Summary…
      (1) Main contribution: enabling mixed media scholarly research for “institutional” multimedia collections
      (2) Bringing the Tools to the Data
      (3) Addressed challenges with respect to:
          • Unlocking the data, enabling distant/close reading
          • Design of the Media Suite, supporting the scholarly primitives
          • Usability of the tools, providing a workspace for saving results, collaboration and further (advanced) analysis
      (4) Lots of possible improvements, collections to add, issues to solve: funding for another 4 years!
  43. Research coordination: Julia Noordegraaf (@jjnoordegraaf). Technical coordination: Roeland Ordelman (@roelandordelman). mediasuite.clariah.nl
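
Slide 33 names the W3C annotation model and the multimedia requirement to add timings or geometry while keeping the annotation aligned with its source. The sketch below illustrates one way such an annotation could look, using the W3C Web Annotation Data Model with a Media Fragments selector (`t=start,end` for time, `xywh=x,y,w,h` for a region). It is not the Media Suite's actual implementation: the function names, the `example.org` URL, and the tag values are hypothetical and chosen only for illustration.

```python
import json

# Constants defined by the W3C Web Annotation and Media Fragments specifications.
ANNO_CONTEXT = "http://www.w3.org/ns/anno.jsonld"
MEDIA_FRAGMENTS_SPEC = "http://www.w3.org/TR/media-frags/"


def fragment_value(start_s, end_s, region=None):
    """Build a Media Fragments string for a time span and optional pixel region."""
    if end_s <= start_s:
        raise ValueError("end_s must be greater than start_s")
    value = f"t={start_s},{end_s}"
    if region is not None:
        x, y, w, h = region
        value += f"&xywh={x},{y},{w},{h}"
    return value


def make_segment_annotation(source_url, start_s, end_s, tag, region=None):
    """Return a Web Annotation (as a dict) that tags one segment of a media item.

    Hypothetical helper: the source URL stays in the target, so the
    annotation remains aligned with the item it describes.
    """
    return {
        "@context": ANNO_CONTEXT,
        "type": "Annotation",
        "body": {"type": "TextualBody", "value": tag, "purpose": "tagging"},
        "target": {
            "source": source_url,
            "selector": {
                "type": "FragmentSelector",
                "conformsTo": MEDIA_FRAGMENTS_SPEC,
                "value": fragment_value(start_s, end_s, region),
            },
        },
    }


def filter_by_tag(annotations, tag):
    """Keep only the annotations whose body carries the given tag."""
    return [a for a in annotations if a["body"]["value"] == tag]


if __name__ == "__main__":
    # Hypothetical media item and tags, for illustration only.
    item = "https://example.org/media/interview-001.mp4"
    annotations = [
        make_segment_annotation(item, 12.5, 20, "eyewitness"),
        make_segment_annotation(item, 40, 55, "archive-footage", region=(10, 20, 100, 50)),
    ]
    print(json.dumps(filter_by_tag(annotations, "eyewitness"), indent=2))
```

Because each annotation is plain JSON-LD, the same records can be filtered or clustered by tag, as in `filter_by_tag`, or exported for visualisation without further conversion.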
