The Australian Newspapers Digitisation Program
Helping communities access and explore
       their newspaper heritage.


     Rose Holley – Manager Newspaper Digitisation Program
         http://www.nla.gov.au/ndp rholley@nla.gov.au
             Australian Media Traditions Conference
      23 November 2007, Charles Sturt University, Bathurst
                                                             1
Status of the Program
November 2006 Minister for Arts and
Sports approval

Budget approval: $8 million for 3 million
pages over 4 years

Contracts signed with digitisation suppliers

April 2007 program pilot phase
commences
                                            2
Content and Coverage
National Content
Initially a title from each state
Focus on major titles from each state first
Anticipated that ‘regional’ titles may be contributed later

Coverage: published between 1803 – 1954 (out of copyright)

Titles shown on map: Northern Territory Times, Courier Mail, West Australian, Advertiser, Sydney Gazette, Canberra Times, Argus, Mercury

                                                                                      3
First Newspaper
        • First page of first
          Australian newspaper
          ever published


          The Sydney Gazette and New
          South Wales Advertiser
          Saturday March 5 1803




                                       4
Through 150 years
• Up to 1954 (after which
  copyright applies),
  and later where publishers
  agree.


The Argus 22 August 1945




                                 5
Relationship - ANPLAN
Website: http://www.nla.gov.au/anplan/




                                         6
Keep Up to Date with Progress
• Website: http://www.nla.gov.au/ndp/




                                        7
National Help
• NLA working with State and Territory
  Libraries as part of ANPLAN.
• Libraries suggest titles and dates and
  provide microfilm for digitising.
• ANPLAN members and other stakeholders
  will provide feedback on the search and
  delivery prototype.
• Developing model for national contribution
  of regional newspapers.
                                           8
Process in brief
 National sourcing of selected newspaper microfilm masters.

 Contractor in Sydney scans masters to TIFF files.

 NLA performs quality assurance and adds metadata.

 Contractor in India processes TIFF files: OCR, zoning, XML markup.

 NLA quality-assures files, ingests them into the system and creates derivatives for delivery.
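
The hand-offs in this process can be sketched as an ordered list of stages, each with an owner and an output. This is only an illustrative model: the `Stage` type, the stage labels and the `handoffs` helper are assumptions made for the sketch, not names used by the NLA or contractor systems.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str    # hypothetical label for this sketch
    owner: str   # who performs the step, per the slide
    output: str  # what the step hands to the next one


PIPELINE = [
    Stage("source", "Libraries nationally", "microfilm masters"),
    Stage("scan", "Contractor (Sydney)", "TIFF page images"),
    Stage("image_qa", "NLA", "checked images with metadata"),
    Stage("ocr", "Contractor (India)", "OCR text, zones and XML markup"),
    Stage("final_qa_ingest", "NLA", "ingested files and delivery derivatives"),
]


def handoffs(pipeline):
    """Return (from_owner, to_owner, item) for each transfer between stages."""
    return [(a.owner, b.owner, a.output) for a, b in zip(pipeline, pipeline[1:])]


for src, dst, item in handoffs(PIPELINE):
    print(f"{src} -> {dst}: {item}")
```

Listing the transfers this way makes it visible that material passes between the NLA and an external contractor at every step, which is why quality assurance appears twice.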


                                                          9
Logistics
Australia (State Capitals – Sydney/Canberra)
USA (Virginia) - India (Hyderabad, Chennai)




                                           10
6 Month Progress
• IT Infrastructure and storage implemented at NLA

• Content management and ingest software developed by
  NLA to support workflow

• Quality assurance and production software developed by
  US/India contractor

• Pilot data sent to contractors to test workflows, systems
  and software against agreed project spec.


                                                              11
Next 6 months
• Acceptance of pilot data then commence
  production phase (3 million pages)

• Development of search and delivery prototype

• Public launch of service with a good body of
  content in 2008

• Progressive addition of content – national
  program ongoing
                                                 12
Technology – internal NLA
 Old newspapers being processed and delivered
          using latest digital technology

• NLA developing in house:
  – Ingest and storage system
  – Workflow and content management system including
    quality assurance module
  – Search and delivery system

• NLA providing:
  – System Infrastructure
     (storage, backup, disaster recovery)

                                                   13
Infrastructure and Storage




    Online Storage – 70 TB:
•   Working space for images in processing: 40 TB for 1 million pages
•   Search and delivery derivatives: 30 TB for 3 million pages
•   XML files, database systems and indexes: 1 TB

    Offline Storage – unlimited for master images on tape.
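
As a rough check on these figures, the per-page storage they imply can be worked out directly. The sketch below assumes decimal units (1 TB = 10^12 bytes), which the slide does not specify. Note that the itemised figures sum to 71 TB, slightly above the 70 TB headline, so the headline is presumably rounded.

```python
TB = 10**12  # assumption: decimal terabytes
MB = 10**6   # assumption: decimal megabytes

working_per_page = 40 * TB / 1_000_000   # 40 TB for 1 million pages
delivery_per_page = 30 * TB / 3_000_000  # 30 TB for 3 million pages
itemised_total_tb = 40 + 30 + 1          # working + derivatives + XML/indexes

print(working_per_page / MB)   # 40.0 (MB of working space per page)
print(delivery_per_page / MB)  # 10.0 (MB of derivatives per page)
print(itemised_total_tb)       # 71
```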

                                                                      14
Establishing Workflows




                         15
Technology - external
• Scanning microfilm
  using
  Flexscan/Eclipse
  scanner and latest
  software (nextstar)
  from NextScan
  www.nextscan.com

20,000 pages a week.
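
To put the scanning rate in context, the short calculation below estimates how long the 3 million page target would take at 20,000 pages a week. It assumes the rate is constant and applies to the whole program, neither of which the slides state.

```python
PAGES_TARGET = 3_000_000  # program target from the budget slide
PAGES_PER_WEEK = 20_000   # scanning rate quoted on this slide

weeks = PAGES_TARGET / PAGES_PER_WEEK
years = weeks / 52

print(weeks)            # 150.0
print(round(years, 1))  # 2.9
```

Under those assumptions scanning alone fits within the four-year budget period.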

                                16
Scanning Contractor




                      17
Digital Images returned to NLA




                                 18
Quality Assurance at NLA
                Use 2 widescreen
                monitors placed
                vertically. Can view
                complete page
                within context of
                issue.

                Add metadata, sort
                out missing and
                duplicate pages
                within an issue.

                Prepare batches to
                send for OCR.
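
The slide describes this check as operator work on screen. As a minimal sketch of what "missing and duplicate pages within an issue" means in practice, the function below compares the page numbers found for an issue against the expected page count. The function name and inputs are hypothetical and do not describe the NLA's quality assurance module.

```python
from collections import Counter


def check_issue(page_numbers, expected_pages):
    """Report missing and duplicated page numbers for one issue.

    page_numbers: page numbers recorded for the scanned images, in any order.
    expected_pages: how many pages the issue should have (assumed known).
    """
    counts = Counter(page_numbers)
    missing = [p for p in range(1, expected_pages + 1) if counts[p] == 0]
    duplicates = sorted(p for p, n in counts.items() if n > 1)
    return missing, duplicates


missing, duplicates = check_issue([1, 2, 2, 4, 5, 6], expected_pages=6)
print("missing:", missing)        # missing: [3]
print("duplicates:", duplicates)  # duplicates: [2]
```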
                                     19
Metadata




           20
Page verification




                    21
22
Technology - external
Software developed to:
• Zone areas and articles on a page
• Flag continuing articles across multiple pages
• Categorise articles on a page
• OCR text on a page
• Re-key headings and first 4 lines of text.
• Deliver XML files (ALTO) and METS/MODS
  files.
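
ALTO is an XML schema for OCR output in which each text block contains text lines, and each line contains `String` elements carrying the recognised word in a `CONTENT` attribute. The sketch below reads the text of each block from a small ALTO-style fragment. The sample document is invented for illustration and omits the namespace; real ALTO files declare a version-specific namespace, which the `local` helper strips.

```python
import xml.etree.ElementTree as ET

# Invented sample for illustration; not taken from the program's data.
SAMPLE = """<alto>
  <Layout>
    <Page ID="P1">
      <PrintSpace>
        <TextBlock ID="TB1">
          <TextLine>
            <String CONTENT="SYDNEY" WC="0.98"/>
            <SP/>
            <String CONTENT="GAZETTE" WC="0.91"/>
          </TextLine>
          <TextLine>
            <String CONTENT="Saturday" WC="0.85"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def local(tag):
    """Strip an XML namespace prefix such as '{uri}String' down to 'String'."""
    return tag.rsplit("}", 1)[-1]


def block_text(alto_xml):
    """Map each TextBlock ID to its list of text lines."""
    root = ET.fromstring(alto_xml)
    blocks = {}
    for block in root.iter():
        if local(block.tag) != "TextBlock":
            continue
        lines = []
        for line in block.iter():
            if local(line.tag) != "TextLine":
                continue
            words = [s.get("CONTENT", "") for s in line.iter()
                     if local(s.tag) == "String"]
            lines.append(" ".join(words))
        blocks[block.get("ID")] = lines
    return blocks


print(block_text(SAMPLE))
```

Keeping the text grouped by block matters here because zoning assigns blocks to articles, so article-level search depends on that grouping.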

                                                   23
India Facility - Hyderabad




                             24
25
Quality Assurance




                    26
OCR Accuracy




               27
Batch reporting




                  28
Acceptance Criteria




                      29
Prototype Development
Under discussion:
• Derivative sizes and zoom technology
  testing
• Search and Browse features
• Results and refinement of results
• User interaction with source (web 2.0)
• Interface design

                                           30
Digital Newspaper Searching
• Newspapers full text searchable
• Image captions searchable
• Search across multiple papers e.g. by
  person's name.
• Refine searching by:
  – Date
  – Newspaper title
  – State published
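
The search-then-refine behaviour listed above can be sketched as a full-text match followed by optional filters on date, newspaper title and state. The `Article` type, the `search` function and the sample article text are all invented for this sketch; the prototype's actual design was still under discussion at the time.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Article:
    title: str       # newspaper title
    state: str       # state or territory of publication
    published: date
    text: str        # OCR text of the article


def search(articles, query, *, title=None, state=None, start=None, end=None):
    """Case-insensitive text match, then refine by any filters supplied."""
    q = query.lower()
    hits = []
    for a in articles:
        if q not in a.text.lower():
            continue
        if title is not None and a.title != title:
            continue
        if state is not None and a.state != state:
            continue
        if start is not None and a.published < start:
            continue
        if end is not None and a.published > end:
            continue
        hits.append(a)
    return hits


# Invented article text; the titles and dates appear elsewhere in the slides.
ARTICLES = [
    Article("The Argus", "VIC", date(1945, 8, 22), "Example notice naming John Smith."),
    Article("Canberra Times", "ACT", date(1928, 7, 26), "Example report about J. Smith."),
    Article("Canberra Times", "ACT", date(1928, 7, 26), "Example shipping news."),
]

for hit in search(ARTICLES, "smith", state="ACT"):
    print(hit.title, hit.published)
```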
                                          31
Refine search by categories
•   News
•   Advertising
•   Birth, death and marriage notices
•   Obituaries
•   Editorial commentary and letters
•   Shipping News
•   Arts and leisure
•   Detailed lists, results, guides
                                       32
Search Illustrations
Categorised as:
• Photo
• Cartoon
• Map
• Graph
• Illustration
Captions searchable
                      Canberra Times 26 July 1928 page 6
                                                           33
Browsing and Viewing
• Browse papers page by page
• Zoom in and out of image
  – to read small text
  – to view context of article within page layout


• Print article or entire page or issue



                                                    34
Zoom technology




                  35
Testing derivative sizes and zooming




                                 36
Prototype wireframe




                      37
Other features
Under discussion:
• OCR correction by users
• Personal annotation of articles by users
• Tagging results
• Creating public sets (for historical events)
• Clustering results
• Searching across other relevant resources (paid
  subscription services, international resources,
  other digital resources)
                                                38
Prototype release
• To be released to stakeholders who have
  given microfilm content
• Stakeholders able to view their data
• Feedback on data quality and search
  functionality
• Amendments made and then ‘search and
  delivery version 1’ released to a wider
  group for testing and feedback before
  public launch in 2008.
                                            39
Pilot Data
• Canberra Times
• Sydney Gazette
• Northern Territory Times
• South Australia Advertiser
• Hobart Town Gazette, Courier, Colonial, Mercury
• Melbourne Argus
• Perth Gazette
• West Australian
• Brisbane Courier Mail
(12 titles, 8000 issues = 50,000 pages = 500,000 articles)
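
The pilot totals imply some useful averages, worked through below. The final line extrapolates the pilot's articles-per-page ratio to the full 3 million pages; that extrapolation is an assumption of this sketch, not a figure from the program.

```python
issues, pages, articles = 8_000, 50_000, 500_000  # pilot totals from the slide

pages_per_issue = pages / issues
articles_per_page = articles / pages
projected_articles = int(3_000_000 * articles_per_page)  # assumes the ratio holds

print(pages_per_issue)     # 6.25
print(articles_per_page)   # 10.0
print(projected_articles)  # 30000000
```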


                                                             40
http://www.nla.gov.au/ndp   41

FWD Group - Insurer Innovation Award 2024The Digital Insurer
 

Último (20)

GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 

The Australian Newspapers Digitisation Program: Helping Communities Access and Explore their Newspaper Heritage - Keynote. 2007

  • 1. Helping communities access and explore their newspaper heritage. Rose Holley, Manager, Newspaper Digitisation Program (http://www.nla.gov.au/ndp, rholley@nla.gov.au). Australian Media Traditions Conference, 23 November 2007, Charles Sturt University, Bathurst.
  • 2. Status of the Program: November 2006, Minister for Arts and Sports approval; budget approval, $8 million for 3 million pages over 4 years; contracts signed with digitisation suppliers; April 2007, program pilot phase commences.
  • 3. Content and Coverage. National content: initially a title from each state; focus on major titles from each state first; anticipated that ‘regional’ titles may be contributed later. Coverage: published between 1803 and 1954 (out of copyright). Titles shown on the map: Northern Territory Times, Courier Mail, West Australian, Advertiser, Sydney Gazette, Canberra Times, Argus, Mercury.
  • 4. First Newspaper: first page of the first Australian newspaper ever published, The Sydney Gazette and New South Wales Advertiser, Saturday March 5 1803.
  • 5. Through 150 years: up to 1954 (when copyright applies), and later if agreement with publishers. The Argus, 22 August 1945.
  • 6. Relationship - ANPLAN. Website: http://www.nla.gov.au/anplan/
  • 7. Keep Up to Date with Progress. Website: http://www.nla.gov.au/ndp/
  • 8. National Help: NLA working with State and Territory Libraries as part of ANPLAN; libraries suggest titles and dates and provide microfilm for digitising; ANPLAN members and other stakeholders will provide feedback on the search and delivery prototype; developing a model for national contribution of regional newspapers.
  • 9. Process in brief: (1) national sourcing of selected newspaper microfilm masters; (2) masters scanned to TIFF files by the contractor in Sydney; (3) NLA performs quality assurance and adds metadata; (4) the contractor in India processes the TIFF files (OCR, zoning, XML markup); (5) NLA quality-assures the files, ingests them into the system, and creates derivatives for delivery.
  • 10. Logistics: Australia (State Capitals – Sydney/Canberra), USA (Virginia), India (Hyderabad, Chennai).
  • 11. 6 Month Progress: IT infrastructure and storage implemented at NLA; content management and ingest software developed by NLA to support workflow; quality assurance and production software developed by US/India contractor; pilot data sent to contractors to test workflows, systems and software against the agreed project spec.
  • 12. Next 6 months: acceptance of pilot data, then commence production phase (3 million pages); development of search and delivery prototype; public launch of service with a good body of content in 2008; progressive addition of content, with the national program ongoing.
  • 13. Technology - internal NLA: old newspapers being processed and delivered using the latest digital technology. NLA developing in house: ingest and storage system; workflow and content management system, including quality assurance module; search and delivery system. NLA providing: system infrastructure (storage, backup, disaster recovery).
  • 14. Infrastructure and Storage. Online storage (70 TB): working space for images in processing, 40 TB for 1 million pages; search and delivery derivatives, 30 TB for 3 million pages; XML files, database systems and indexes, 1 TB. Offline storage: unlimited, for master images on tape.
  • 16. Technology - external: scanning microfilm using Flexscan/Eclipse scanner and latest software (nextstar) from NextScan (www.nextscan.com), 20,000 pages a week.
  • 19. Quality Assurance at NLA: staff use 2 widescreen monitors placed vertically, so a complete page can be viewed within the context of its issue. They add metadata, sort out missing and duplicate pages within an issue, and prepare batches to send for OCR.
  • 20. Metadata
  • 23. Technology - external. Software developed to: zone areas and articles on a page; flag continuing articles across multiple pages; categorise articles on a page; OCR text on a page; re-key headings and first 4 lines of text; deliver XML files (ALTO) and METS/MODS files.
  • 24. India Facility - Hyderabad
  • 30. Prototype Development. Under discussion: derivative sizes and zoom technology testing; search and browse features; results and refinement of results; user interaction with source (web 2.0); interface design.
  • 31. Digital Newspaper Searching: newspapers full-text searchable; image captions searchable; search across multiple papers, e.g. by a person's name. Refine searching by: date; newspaper title; state published.
  • 32. Refine search by categories: news; advertising; birth, death and marriage notices; obituaries; editorial commentary and letters; shipping news; arts and leisure; detailed lists, results, guides.
  • 33. Search Illustrations. Categorised as: photo; cartoon; map; graph; illustration. Captions searchable. Example: Canberra Times, 26 July 1928, page 6.
  • 34. Browsing and Viewing: browse papers page by page; zoom in and out of the image, to read small text and to view the context of an article within the page layout; print an article, an entire page or an issue.
  • 36. Testing derivative sizes and zooming
  • 38. Other features. Under discussion: OCR correction by users; personal annotation of articles by users; tagging results; creating public sets (for historical events); clustering results; searching across other relevant resources (paid subscription services, international resources, other digital resources).
  • 39. Prototype release: to be released to stakeholders who have given microfilm content; stakeholders will be able to view their data and give feedback on data quality and search functionality; amendments will then be made and ‘search and delivery version 1’ released to a wider group for testing and feedback before public launch in 2008.
  • 40. Pilot Data: Canberra Times; Sydney Gazette; Northern Territory Times; South Australia Advertiser; Hobart Town Gazette, Courier, Colonial, Mercury; Melbourne Argus; Perth Gazette; West Australian; Brisbane Courier Mail (12 titles, 8000 issues = 50,000 pages = 500,000 articles).
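
The figures quoted on slides 2, 14, 16 and 40 imply some per-page ratios that the slides do not state directly. The sketch below derives them as a rough sanity check only. It assumes decimal units (1 TB = 1,000,000 MB), a constant scanning rate, and that the whole $8 million budget is divided evenly across pages. None of these assumptions comes from the presentation, and the variable and function names are illustrative.

```python
# Rough ratios implied by the slide figures. All inputs are taken from the
# slides; the unit convention and the even-spread assumptions are ours.
MB_PER_TB = 1_000_000  # assumption: decimal terabytes


def mb_per_page(terabytes: float, pages: int) -> float:
    """Average storage per page, in megabytes."""
    return terabytes * MB_PER_TB / pages


# Slide 14: online storage allocations.
working_mb = mb_per_page(40, 1_000_000)    # working space per page in processing
delivery_mb = mb_per_page(30, 3_000_000)   # delivery derivatives per page

# Slides 2 and 16: 3 million pages at 20,000 pages scanned per week.
weeks_to_scan = 3_000_000 / 20_000

# Slide 2: budget spread evenly over all pages (covers far more than scanning).
cost_per_page = 8_000_000 / 3_000_000

# Slide 40: pilot data ratios.
pages_per_issue = 50_000 / 8_000
articles_per_page = 500_000 / 50_000

for label, value in [
    ("working MB per page", working_mb),
    ("delivery MB per page", delivery_mb),
    ("weeks to scan 3 million pages", weeks_to_scan),
    ("budget dollars per page", cost_per_page),
    ("pilot pages per issue", pages_per_issue),
    ("pilot articles per page", articles_per_page),
]:
    print(f"{label}: {value:.2f}")
```

The scanning estimate of about 150 weeks fits within the 4-year program only if the 20,000 pages a week rate is sustained throughout, which the slides do not claim.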