Media Suite: Unlocking Archives for Mixed Media Scholarly Research

Presentation at the CLARIN 2018 Conference, October 2018, Pisa, Italy


  1. 1. Media Suite: Unlocking Archives for Mixed Media Scholarly Research. Roeland Ordelman, Technical coordinator, CLARIAH Media Suite. Netherlands Institute for Sound and Vision / University of Twente, The Netherlands
  2. 2. Media Studies: focus on both “institutional” data collections and collections created by scholars
  3. 3. Which data are in the Media Suite V3? Radio & Television (1.88M items), Newspapers (60M pages), Film (1129 films), Oral History (2744 interviews). MULTIMEDIA
  4. 4. Which data are in the Media Suite V3? MIXED MEDIA
  5. 5. RESEARCH PILOTS: Cross-Medial Analysis of WW2 Eyewitness Testimonies; Cross-media research of public debates on drugs and regulation; Me and Myself: Tracing first person in documentary history in AV-collections; Annotating EYE’s Jean Desmet Collection: Towards Mixed Media Analysis in Digital Media History; Narrativizing Disruption: How exploratory search can support media researchers to interpret ‘disruptive’ media events as lucid narratives; Remediation in Sports News. See clariah.nl/projecten/research-pilots
  6. 6. Media Suite: enabling Mixed Media Scholarly Research with Multi-media Data in a Sustainable Infrastructure
  7. 7. CLARIAH Centers. Common Lab Research Infrastructure for the Arts and Humanities. SUSTAINABLE: ✓ Available after the project ✓ Maintenance and support ✓ Updates and upgrades
  8. 8. Architecture principles: 1. Centers are responsible for data quality and for facilitating access to data 2. Authorized access using a federated authentication mechanism 3. Data is connected to a shared “workspace” (VRE) for various forms of analysis … 4. … that provides exports of data in various formats for using tools outside the closed environment 5. The Media Suite provides the interface on the underlying architecture
  9. 9. 1. Centers are responsible for data quality and facilitate access to data
  10. 10. REGISTER COLLECTION, HARVEST COLLECTION METADATA, SEARCH COLLECTION. Actors: Collection Owner, Media Suite, Scholar. CKAN: web-based open source management system for the storage and distribution of open data. Open Archives Initiative (OAI). ISSUE: Persistent link to source file. ISSUE: IPR (e.g., no subtitles)
  11. 11. Example: DANS registers the Oral History set
  12. 12. “METADATA ARCHEOLOGY”: manual effort to describe metadata fields. ISSUE: Resources for manual effort
  13. 13. Tools for inspection of metadata
  14. 14. 2. Authorized access using a federated authentication mechanism
  15. 15. Secure play-out and viewing ISSUE: Not always available
  16. 16. Federated login
  17. 17. 3. Data is connected to a shared “workspace” (VRE) for analysis. ISSUE: Currently semi-shared
  18. 18. WORKSPACE ✓ Create virtual personal mixed media collections ✓ Create projects ✓ Store annotations ✓ Upload personal collections ✓ Advanced data analysis (Jupyter Notebooks) ✓ Advanced data processing ✓ Export annotations
  19. 19. Data analysis: Jupyter Notebooks or NLP. ISSUE: Robust pipelines
  20. 20. Write your own (Python) code to analyze the data in the Media Suite ISSUE: expertise
  21. 21. Example output Jupyter Notebook
  22. 22. Auto Metadata Extraction – Large scale speech recognition 350K hours processed until now
  23. 23. Poster slam 11:00 – 11:30 tomorrow
  24. 24. 4. Provide exports of data for tools outside
  25. 25. Media Suite is just an interface on the underlying infrastructure…. Speech Suite
  26. 26. Media Suite: Unlocking Archives for Mixed Media Scholarly Research
  27. 27. Co-development: user stories! Short iterations (sprints) of 2 weeks: development & testing. Liaisons part of development team: • Information Specialist • Experienced DH Researcher. Community building: workshops, hack-a-thons, data-a-thons
  28. 28. Discussing issues with Gitter
  29. 29. Tracking issues with GitHub
  30. 30. SCHOLARLY PRIMITIVES (Unsworth, 2000; Blanke and Hedges, 2013)
  31. 31. “Unlock data” Distant reading Close reading
  32. 32. 1. Discovery & Inspection of data sets hidden in archives 2. Discovery of items in large archival data sets 3. Accessing items (play, view) from restricted data sets 4. Discovery of segments in time-based media 5. Relating and comparing data on the segment level. Distant reading / Close reading
  33. 33. Search Oral History in Media Suite
  34. 34. Project Search Bookmark Save Bookmark Save Query
  35. 35. Bookmark view View Source
  36. 36. Annotation view: View Source, Alignment. ISSUE: Complex interface
  37. 37. Private collection Apply enrichment or a “pipeline”
  38. 38. To appear: Content-based Cross-media Recommendations
  39. 39. Issues/investments: 1. Registered collections: persistent link (data management) 2. Registered collections: rights don’t permit (legal) 3. Metadata archeology: manual resources (funding) 4. Play-out/view: not always available (funding) 5. Shared workspace: semi-shared (infra development) 6. Advanced analysis: expertise scholars (training) 7. Advanced analysis: robust pipelines (benchmarking) 8. Workspace: complex interface (interaction design)
  40. 40. Summary: Main contribution: enabling mixed media scholarly research for “institutional” multimedia collections. Bringing the Tools to the Data: in progress but already useful: ✓ Unlocking the data, enabling distant/close reading ✓ Supporting the scholarly primitives ✓ Providing a workspace for saving annotations, creating collections and options for (advanced) analysis
  41. 41. Research coordination: Julia Noordegraaf @jjnoordegraaf. Technical coordination: Roeland Ordelman @roelandordelman. DEMO & QUESTIONS AT THE BAZAAR. mediasuite.clariah.nl
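
Slide 10 mentions harvesting collection metadata via the Open Archives Initiative (OAI) after a collection is registered. The slides do not show how the Media Suite implements this, so the following is only a minimal sketch of what an OAI-PMH harvest step could look like in Python (the language slide 20 names for notebooks). The endpoint `https://example.org/oai`, the set name `oral-history`, the sample record, and the helper names `list_records_url` and `parse_records` are all hypothetical; only the OAI-PMH request parameters and XML namespaces come from the protocol itself.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical response, inlined so the sketch runs without network access.
SAMPLE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:interview-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example interview</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>
"""


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def parse_records(xml_text):
    """Return (records, resumption_token) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        identifier = rec.findtext(
            "oai:header/oai:identifier", default="", namespaces=NS
        )
        titles = [t.text for t in rec.iterfind(".//dc:title", NS) if t.text]
        records.append({"identifier": identifier, "titles": titles})
    # A non-empty token means the endpoint has more pages to fetch.
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS)
    return records, (token or None)


url = list_records_url("https://example.org/oai", set_spec="oral-history")
records, token = parse_records(SAMPLE)
```

In a real harvest, the response for `url` would be fetched over HTTP and the request repeated with the returned resumption token until none is given; the parsed records could then feed the search index that scholars query.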
