“Many Hands Make Light Work”
http://www.nla.gov.au/ndp
Objectives
National Program and Content: West Australian, Northern Territory Times, Courier Mail, Advertiser, Sydney Morning Herald, Sydney Gazette, Argus, Mercury, Canberra Times
Overview
Behind the scenes…
The technical bit
Development cycle
Home page of beta: http://ndpbeta.nla.gov.au
Search words – December 2008 (www.wordle.net)
Search phrases – December 2008 (www.wordle.net)
 
 
User interaction
To login or not to login?
Browse by page or search
Interaction at article level
Add a tag ‘titanic sinking’
 
Add a comment
OCR text on left for correcting
After enhancements
Tag cloud or tag fog??
Most used tag
Tagging enables ‘marking records’
User profile page
Text Correction – method 1
Text correction – method 2
One article corrected by many
View all corrections on this article
Births, Deaths and Marriages
Many different users correct just the names
Comments 1. Some users add further information about the content and people mentioned in the article.
Comments 2. Some users add notes on the physical state of the image or difficulties they are having with text correction.
Sample of user activity Nov 08
Text correction activity
Top text correctors
Big picture rankings
Measuring improvements
“Who are the text correctors?” (Flickr: LucLeqay)
Why correct text?
Motivating factors http://www.pickthebrain.com/blog/21-proven-motivation-tactics/
Maintaining motivation
Profiles of top correctors
 
 
Understanding genealogists
Opinions of users
http://www.nla.gov.au/ndp/project_details/documents/ANDP_TextCorrectionComments.pdf
http://www.nla.gov.au/ndp/project_details/documents/ANDP_PositiveFeedbackBetaDec2008.pdf
Requests from users
Tricky questions How to improve the method and what should ‘power user mode’ be like?
Notes and Tags – Associated with.. Line, word…
Lessons learnt
The power
Future potential of text enhancement
Website:  http://www.nla.gov.au/ndp

Many Hands Make Light Work: Public Collaborative Text Correction in Australian Historic Newspapers. Keynote. April 2009
Uploaded by Rose Holley

The strategic rebuilding and positioning of UNSW Canberra Special Collections...
The strategic rebuilding and positioning of UNSW Canberra Special Collections...The strategic rebuilding and positioning of UNSW Canberra Special Collections...
The strategic rebuilding and positioning of UNSW Canberra Special Collections...Rose Holley
 
Crowdsourcing based curation and user engagement in digital library design
Crowdsourcing based curation and user engagement in digital library designCrowdsourcing based curation and user engagement in digital library design
Crowdsourcing based curation and user engagement in digital library designRose Holley
 
National Archives of Australia. AVAMS Project Achievements August 2014
National Archives of Australia. AVAMS Project Achievements August 2014National Archives of Australia. AVAMS Project Achievements August 2014
National Archives of Australia. AVAMS Project Achievements August 2014Rose Holley
 
Building and Managing Online Communities
Building and Managing Online CommunitiesBuilding and Managing Online Communities
Building and Managing Online CommunitiesRose Holley
 
Collecting sharing and improving data: changing roles for librarians and user...
Collecting sharing and improving data: changing roles for librarians and user...Collecting sharing and improving data: changing roles for librarians and user...
Collecting sharing and improving data: changing roles for librarians and user...Rose Holley
 
Resource Sharing in Australia: 'Find' and 'Get' in Trove - Making 'Getting' b...
Resource Sharing in Australia: 'Find' and 'Get' in Trove - Making 'Getting' b...Resource Sharing in Australia: 'Find' and 'Get' in Trove - Making 'Getting' b...
Resource Sharing in Australia: 'Find' and 'Get' in Trove - Making 'Getting' b...Rose Holley
 
The Australian Women's Weekly now available in Trove: An overview of the digi...
The Australian Women's Weekly now available in Trove: An overview of the digi...The Australian Women's Weekly now available in Trove: An overview of the digi...
The Australian Women's Weekly now available in Trove: An overview of the digi...Rose Holley
 
Ideas for how volunteers at cultural heritage institutions can help, using Tr...
Ideas for how volunteers at cultural heritage institutions can help, using Tr...Ideas for how volunteers at cultural heritage institutions can help, using Tr...
Ideas for how volunteers at cultural heritage institutions can help, using Tr...Rose Holley
 
Finding Information Just Got Easier for Historians. Lachlan Macquarie:200 yea...
Finding Information Just Got Easier for Historians. Lachlan Macquarie:200 yea...Finding Information Just Got Easier for Historians. Lachlan Macquarie:200 yea...
Finding Information Just Got Easier for Historians. Lachlan Macquarie:200 yea...Rose Holley
 
Crowdsourcing Strategies for Archives, Nov 2010
Crowdsourcing Strategies for Archives, Nov 2010Crowdsourcing Strategies for Archives, Nov 2010
Crowdsourcing Strategies for Archives, Nov 2010Rose Holley
 
Social metadata for libraries, archives and museums: Research findings from t...
Social metadata for libraries, archives and museums: Research findings from t...Social metadata for libraries, archives and museums: Research findings from t...
Social metadata for libraries, archives and museums: Research findings from t...Rose Holley
 
A model for incorporating e-resources into Trove, September 2010
A model for incorporating e-resources into Trove, September 2010A model for incorporating e-resources into Trove, September 2010
A model for incorporating e-resources into Trove, September 2010Rose Holley
 
Developments in Access to Art Information: Trove. Presentation at ARLIS confe...
Developments in Access to Art Information: Trove. Presentation at ARLIS confe...Developments in Access to Art Information: Trove. Presentation at ARLIS confe...
Developments in Access to Art Information: Trove. Presentation at ARLIS confe...Rose Holley
 
Consultation Forum: Music Australia and Trove Transition, September 2010, IAM...
Consultation Forum: Music Australia and Trove Transition, September 2010, IAM...Consultation Forum: Music Australia and Trove Transition, September 2010, IAM...
Consultation Forum: Music Australia and Trove Transition, September 2010, IAM...Rose Holley
 
Trove: More Than a Treasure? ALIA Conference Presentation 2010 Brisbane by Ro...
Trove: More Than a Treasure? ALIA Conference Presentation 2010 Brisbane by Ro...Trove: More Than a Treasure? ALIA Conference Presentation 2010 Brisbane by Ro...
Trove: More Than a Treasure? ALIA Conference Presentation 2010 Brisbane by Ro...Rose Holley
 
Trove: A Government 2.0 Showcase August 2010, Australian Parliament
Trove: A Government 2.0 Showcase August 2010, Australian ParliamentTrove: A Government 2.0 Showcase August 2010, Australian Parliament
Trove: A Government 2.0 Showcase August 2010, Australian ParliamentRose Holley
 
Legal Research using digitised historic Australian Newspapers August 2010, by...
Legal Research using digitised historic Australian Newspapers August 2010, by...Legal Research using digitised historic Australian Newspapers August 2010, by...
Legal Research using digitised historic Australian Newspapers August 2010, by...Rose Holley
 
Trove: Explore Like Never Before. Key Features of Trove May 2010
Trove: Explore Like Never Before. Key Features of Trove May 2010Trove: Explore Like Never Before. Key Features of Trove May 2010
Trove: Explore Like Never Before. Key Features of Trove May 2010Rose Holley
 
Trove: Innovation In Access To Information. June 2010
Trove: Innovation In Access To Information. June 2010Trove: Innovation In Access To Information. June 2010
Trove: Innovation In Access To Information. June 2010Rose Holley
 
Report to CBC on the growth of the Kingston Organic Community Garden, Canberr...
Report to CBC on the growth of the Kingston Organic Community Garden, Canberr...Report to CBC on the growth of the Kingston Organic Community Garden, Canberr...
Report to CBC on the growth of the Kingston Organic Community Garden, Canberr...Rose Holley
 

Mais de Rose Holley (20)

The strategic rebuilding and positioning of UNSW Canberra Special Collections...
The strategic rebuilding and positioning of UNSW Canberra Special Collections...The strategic rebuilding and positioning of UNSW Canberra Special Collections...
The strategic rebuilding and positioning of UNSW Canberra Special Collections...
 
Crowdsourcing based curation and user engagement in digital library design
Crowdsourcing based curation and user engagement in digital library designCrowdsourcing based curation and user engagement in digital library design
Crowdsourcing based curation and user engagement in digital library design
 
National Archives of Australia. AVAMS Project Achievements August 2014
National Archives of Australia. AVAMS Project Achievements August 2014National Archives of Australia. AVAMS Project Achievements August 2014
National Archives of Australia. AVAMS Project Achievements August 2014
 
Building and Managing Online Communities
Building and Managing Online CommunitiesBuilding and Managing Online Communities
Building and Managing Online Communities
 
Collecting sharing and improving data: changing roles for librarians and user...
Collecting sharing and improving data: changing roles for librarians and user...Collecting sharing and improving data: changing roles for librarians and user...
Collecting sharing and improving data: changing roles for librarians and user...
 
Resource Sharing in Australia: 'Find' and 'Get' in Trove - Making 'Getting' b...
Resource Sharing in Australia: 'Find' and 'Get' in Trove - Making 'Getting' b...Resource Sharing in Australia: 'Find' and 'Get' in Trove - Making 'Getting' b...
Resource Sharing in Australia: 'Find' and 'Get' in Trove - Making 'Getting' b...
 
The Australian Women's Weekly now available in Trove: An overview of the digi...
The Australian Women's Weekly now available in Trove: An overview of the digi...The Australian Women's Weekly now available in Trove: An overview of the digi...
The Australian Women's Weekly now available in Trove: An overview of the digi...
 
Ideas for how volunteers at cultural heritage institutions can help, using Tr...
Ideas for how volunteers at cultural heritage institutions can help, using Tr...Ideas for how volunteers at cultural heritage institutions can help, using Tr...
Ideas for how volunteers at cultural heritage institutions can help, using Tr...
 
Finding Information Just Got Easier for Historians. Lachlan Macquarie:200 yea...
Finding Information Just Got Easier for Historians. Lachlan Macquarie:200 yea...Finding Information Just Got Easier for Historians. Lachlan Macquarie:200 yea...
Finding Information Just Got Easier for Historians. Lachlan Macquarie:200 yea...
 
Crowdsourcing Strategies for Archives, Nov 2010
Crowdsourcing Strategies for Archives, Nov 2010Crowdsourcing Strategies for Archives, Nov 2010
Crowdsourcing Strategies for Archives, Nov 2010
 
Social metadata for libraries, archives and museums: Research findings from t...
Social metadata for libraries, archives and museums: Research findings from t...Social metadata for libraries, archives and museums: Research findings from t...
Social metadata for libraries, archives and museums: Research findings from t...
 
A model for incorporating e-resources into Trove, September 2010
A model for incorporating e-resources into Trove, September 2010A model for incorporating e-resources into Trove, September 2010
A model for incorporating e-resources into Trove, September 2010
 
Developments in Access to Art Information: Trove. Presentation at ARLIS confe...
Developments in Access to Art Information: Trove. Presentation at ARLIS confe...Developments in Access to Art Information: Trove. Presentation at ARLIS confe...
Developments in Access to Art Information: Trove. Presentation at ARLIS confe...
 
Consultation Forum: Music Australia and Trove Transition, September 2010, IAM...
Consultation Forum: Music Australia and Trove Transition, September 2010, IAM...Consultation Forum: Music Australia and Trove Transition, September 2010, IAM...
Consultation Forum: Music Australia and Trove Transition, September 2010, IAM...
 
Trove: More Than a Treasure? ALIA Conference Presentation 2010 Brisbane by Ro...
Trove: More Than a Treasure? ALIA Conference Presentation 2010 Brisbane by Ro...Trove: More Than a Treasure? ALIA Conference Presentation 2010 Brisbane by Ro...
Trove: More Than a Treasure? ALIA Conference Presentation 2010 Brisbane by Ro...
 
Trove: A Government 2.0 Showcase August 2010, Australian Parliament
Trove: A Government 2.0 Showcase August 2010, Australian ParliamentTrove: A Government 2.0 Showcase August 2010, Australian Parliament
Trove: A Government 2.0 Showcase August 2010, Australian Parliament
 
Legal Research using digitised historic Australian Newspapers August 2010, by...
Legal Research using digitised historic Australian Newspapers August 2010, by...Legal Research using digitised historic Australian Newspapers August 2010, by...
Legal Research using digitised historic Australian Newspapers August 2010, by...
 
Trove: Explore Like Never Before. Key Features of Trove May 2010
Trove: Explore Like Never Before. Key Features of Trove May 2010Trove: Explore Like Never Before. Key Features of Trove May 2010
Trove: Explore Like Never Before. Key Features of Trove May 2010
 
Trove: Innovation In Access To Information. June 2010
Trove: Innovation In Access To Information. June 2010Trove: Innovation In Access To Information. June 2010
Trove: Innovation In Access To Information. June 2010
 
Report to CBC on the growth of the Kingston Organic Community Garden, Canberr...
Report to CBC on the growth of the Kingston Organic Community Garden, Canberr...Report to CBC on the growth of the Kingston Organic Community Garden, Canberr...
Report to CBC on the growth of the Kingston Organic Community Garden, Canberr...
 

Many Hands Make Light Work: Public Collaborative Text Correction in Australian Historic Newspapers. Keynote. April 2009

Editor's Notes

  1. Thank you for inviting me to speak here today. Before I begin I would like to acknowledge the hard work of the ANDP team over the last 2 years. Our team was small, consisting of only 6 people, and we worked closely together with a shared vision and goal to achieve what I will show you today.
  2. All the information I will speak about today is available on the ANDP website. The address is www.nla.gov.au/ndp Under the project details tab are several papers, reports and all previous presentations. We have a high level of transparency with the program and this website has proved to be a useful information tool for the public, librarians and stakeholders. All information about titles to be digitised is available under the ‘selected titles’ tab.
  3. The overall objective of the Australian Newspaper Digitisation Program is to improve access to Australian newspapers, focusing first on content that is out of copyright – so up until the end of 1954. Up until now people wishing to research historic Australian newspapers needed to go to libraries across Australia and scroll through reels of microfilm. This program aimed to provide an online service that will let people anywhere, anytime access these newspapers via the internet. The service is now available. It is free. You can full text search across every page of every newspaper in the service, including advertising, cartoons, letters to the editor as well as the news and sports articles.
  4. Every state and territory library in Australia is involved in this national program. By 2011 we will have digitised 4 million newspaper pages, that’s about 40 million articles. The Sydney Morning Herald will comprise 600,000 pages of this. Each state has selected a major daily newspaper to begin with. We are working in collaboration with the state and territory libraries to digitise these first 4 million pages since many of them own the microfilm copies of the newspapers that we use to create the digital images from. Regional newspapers will be included from this year onwards. Regional titles are being contributed from libraries around Australia.
  5. The program started 2 years ago and we have digitised 1.8 million pages from microfilm so far. It is a 2 step process. Firstly microfilm is scanned into digital images by our contractor in Sydney and then the pages are sent to our contractor in India for Optical Character Recognition (OCR) processing. This makes them full text searchable. After quality assurance they are made available to the public through the Australian Newspapers service. The Beta service was released to the public in July 2008. It now contains 360,000 pages (3.5 million articles) and is being very well used. We will add another 40 million articles into the service by 2011.
  6. Today I am going to mainly talk about data enhancement by the public, including text correction. However I would just like to say that the development of the public search service is only one aspect of the overall program. Behind the scenes we have been undertaking significant software development – we have designed 2 systems, the Newspapers Content Management System which includes Quality Assurance Modules and the Search and Delivery System. We have also upgraded our infrastructure and purchased 63 TB of storage so far for the national newspapers storage infrastructure. All aspects of the digitisation process are being outsourced (some offshore). The ANDP team of 6 has been responsible for all aspects of the project. In addition we have employed some university students on a casual basis to undertake the quality assurance processes.
  7. On the technical side of things we are using a MySQL database and a Lucene search index. It was not our preference to undertake software development to the extent we have, but since there were no solutions available off the shelf we have gone down this path. It is our intent to share the code as open source for both systems sometime in the near future. We have had a lot of interest from other national libraries and institutions who wish to obtain the code and/or assist us with software development.
  8. The development cycle for the search and delivery system was first to release a prototype to state and territory libraries for feedback in 2007. We then developed a beta version in 2008 which had a public release. It is our intent this year to move the beta version into a version 1 and officially launch the service very soon.
  9. This is the home page of Australian Newspapers beta. Users can either keyword search or browse by date, title or state. The service is being heavily used with around 28,000 keyword searches per day and an unknown number of browses. We have not widely publicised the service as yet since we are still in a beta version.
  10. People are predominantly searching for names in the service. This is a visual image of search terms. The most searched names are John, William, Thomas, George and James.
  11. Search phrases also remain pretty similar from month to month, with phrases often being a personal name in combination with a term such as births, deaths, murders or shipping. In December 2008 ‘christmas’ was also a popular search term.
  12. Most of the users have found out about the service from genealogy blogs and forums. This is an example of a popular international forum where the news of OCR text correction wings its way from Mary in Italy to William in Gateshead UK, (PAGE)
  13. to Zoe in London, to Uncle John in Bedfordshire, and then to Harry’s mum in Brisbane in a matter of minutes.
  14. The features discussed on the forums that the public are using are adding tags and comments to articles, and correcting the text within articles. We’ll look at each of these in turn.
  15. Firstly when a user comes to the site they can choose whether or not they want to login. It is not mandatory to login even if using the tag, comment or text correction features. The benefit of logging in is that users can track their activity and if they are a top corrector they may appear in the top correctors hall of fame. To date we have 3000 registered users, out of over 300,000 unique users. Of the 3000 registered users 1300 are regular text correctors. We do not know how many unregistered (anonymous) users are correcting text.
  16. Newspapers have a hierarchy of issue, pages in an issue, and articles on a page, which is reflected in the system. It is easy to navigate between the levels when browsing newspapers. This shows the page level view. On this screen you can move the frame splitter on the left to entirely hide the left bar and view only the newspaper image if you want. To access the enhancement features the user needs to go to the article level. If you do a keyword search instead of a browse you will come to article view immediately.
  17. This is the article view. Users can zoom in or out and choose to view the article in the context of the entire page. They can also navigate to any other page within the newspaper issue. The electronically generated text created through the OCR process is displayed on the left hand side. This is also where the users can use the 3 enhancement features. Users can tag the article with keywords and they can write comments and notes about the article. If users login they will be able to choose to make their tags and comments public or private. So they can share their comments with all users or they can add their own private research notes that only they can access. One feature that we believe is innovative and not available in any other online newspaper service is the ability for the user to correct the electronically generated text. There are a number of reasons why the electronically created text is not always 100% accurate, mainly due to the quality of the original newspaper that the image was created from. Users can correct the text by clicking on the ‘Help fix this text’ button. We will now use these features on this article. The article we are looking at is the first report in an Australian paper of the sinking of the Titanic. It’s in the Northern Territory Times on 19 April 1912.
  18. I want to tag the article with ‘titanic sinking’. If a user does not login when they first enter the service then the first time they want to enhance an article they will be offered the option to login. At this point they can either login or enter the captcha to verify they are human (and not a robot attempting to do something undesirable).
  19. Once logged in or verified with captcha a user can enter their tags.
  20. Now I want to add a comment. Those of you who read this article may have noticed that it was reported that all passengers were safely rescued from the Titanic and the weather was calm. I’ll just add a comment to say this was unfortunately not the case.
  21. Now I have zoomed in on the image and if the OCR text was inaccurate I would edit it in the box on the left. In this article the text is actually very accurate so has either OCR’d very well, or already been corrected by someone else.
  22. Now we can review the article with all the enhancements we have made showing on the left: tags, comments and corrections. We can view the history of all the enhancements (both ours and other people’s). So those were the basics, but let’s take a closer look at users’ activity with the enhancement features over the last 6 months.
  23. Adding tags has been a hugely popular activity for users. 46,000 tags have been added. However of these the vast majority are for personal names and only 34 tags have been used more than 100 times… This has led not to a useful tag cloud, but to tag fog! The screenshot shows the ‘John’ fog. Most of the tags have been used fewer than 10 times. Of the 46,000 tags, 16,500 are unique. The use of tags is surprising because we were dubious initially about the value of tags for articles when every article is full-text searchable and, if the name you are looking for is incorrect, you can edit it so that you can find it again. It certainly appears that people are using tags to try and track their research. Very few services, if any, have enabled tagging of full-text items; most tagging is for image collections only, so what we are seeing here is new to us.
  24. The most used tag (one of only 5 that jump from the fog) is LLRSA which we have now discovered is short for the Light Railway Research Society of Australia. They have 250 members who are using the tag to record their group research.
  25. Tagging enables ‘marking’ or ‘saving’ of records into a group so that you can come back to them later. There is currently no other method to save a group of articles, other than bookmarking them.
  26. Each user has a profile page where they can view their latest tagging, commenting and text correction activities. The user profile pages are visible to other users. At this stage users cannot edit their profiles. It is desirable however that users are able to edit and personalise their profiles so they can share information about themselves and their research interests with other users.
  27. By browsing user profile pages we can see 2 distinct methods that people use to correct text. This first profile shows us that this user is looking at lots of different articles with a similar subject (flying saucers and UFOs) and just correcting a few lines in each article. The profile shows the article, the date changed, the old text and the new text.
  28. The next user profile shows method 2 – find an interesting article and then correct the whole article. Two of our top correctors are correcting long articles on gruesome murders, this is a popular theme. Text correctors report doing 1-3 hrs of text correction at a sitting on average. The average visitor spends 17 minutes searching and reading articles in a session.
  29. Several people can correct the same article. All corrections are saved and viewable in the history of the article. All versions of corrections are searched. It is the last correction that is visible in the left hand pane. Articles are corrected by many users when they are either very long, very significant, or very illegible. For example this article is in the first Australian newspaper, the Sydney Gazette and NSW Advertiser of March 1803. Around 20 people have made corrections to this article. It is particularly challenging because of its use of the long s (ſ), which looks like an f, in place of an s.
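The behaviour described in note 29 (every correction kept, the latest one displayed, all versions searched) can be sketched as follows. This is a minimal in-memory illustration only: the `Line` class, its method names, the user names and the sample text are all hypothetical, and the talk does not describe how the service's MySQL database or Lucene index actually store corrections.

```python
from dataclasses import dataclass, field


@dataclass
class Line:
    """One line of article text plus its correction history (hypothetical model)."""
    ocr_text: str  # raw OCR output, never discarded
    corrections: list[tuple[str, str]] = field(default_factory=list)  # (user, new_text)

    def correct(self, user: str, new_text: str) -> None:
        # Each correction is appended, so earlier versions stay in the history.
        self.corrections.append((user, new_text))

    @property
    def displayed(self) -> str:
        # Readers see the most recent correction, or the raw OCR if none exists.
        return self.corrections[-1][1] if self.corrections else self.ocr_text

    def all_versions(self) -> list[str]:
        return [self.ocr_text] + [text for _, text in self.corrections]

    def matches(self, term: str) -> bool:
        # Search covers every version, so the original terms remain retrievable.
        needle = term.lower()
        return any(needle in version.lower() for version in self.all_versions())


# Invented sample text, not taken from a real article.
line = Line(ocr_text="Mr. Jobn Srnith arrived")
line.correct("user_a", "Mr. John Srnith arrived")
line.correct("user_b", "Mr. John Smith arrived")
```

Keeping the raw OCR and every intermediate version searchable means a mistaken correction cannot hide an article from someone searching for the original wording.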
  30. This is the text correction history of this article, showing all the different users and what parts they corrected.
  31. Another regular activity of text correctors is methodically working through the family notices to correct names in the births, marriages, and deaths columns. This is a perfect example of a barely legible births column in the image on the right. We can see that it has already been corrected by a user and we can view the corrections.
  32. The raw OCR text has basically come out as rubbish (on the left) and users here have just fixed the names but not the rest of the words in the line. This means that other people will now be able to find these names.
  33. The comments feature was originally for researchers to annotate articles. It changed its name from annotations to notes to comments after user feedback. Some users are annotating the articles and adding further information about the content of the article or people mentioned in the article.
  34. Other users are adding comments on the physical state of the image or difficulties and questions they have around text correction. We have observed users using the comments to communicate with other users.
  35. We are not moderating text correction and this was a risk that both we and the users were aware of. To date no vandalism of text has been reported to us or noticed by us. By being transparent about the lack of moderation and giving a high level of trust to our users we appear to have gained a committed, responsible and dedicated group of text correctors. Some have likened it to Wikipedia. However if a user was to change something incorrectly we can see by this example that it would not take long for another user to notice it and correct it. In the example 3 different users are correcting the same article and helping each other in a matter of minutes. The users are therefore moderating each other’s corrections at the moment. In the worst-case scenario that something was changed totally incorrectly, other users would be aware of this since they can all still see the image. Also the search engine searches all text, even corrections of corrections, so the original terms are still retrievable. Users have been using the comments field to communicate with each other and ask for help, as this example also shows. This is because there is no other forum for them to communicate with each other at present.
  36. Since the release of the service in July 2008 text correction has remained consistent among a core group of 1300 correctors who have mostly been doing the same amount per month. Between 300,000 and 400,000 lines of text are corrected per month in 15-20,000 articles. There was a slight dip in November which was due to no new articles being added that month (which many users said de-motivated them). However text correction increased in January, despite there still being no new content added. Perhaps a lot of people were staying inside in the 40 degree heat looking for things to do with air-con on?
  37. In the first 6 months a total of 2 million lines in 100,000 articles had been corrected. The top 5 correctors had consistently remained in the top 5 each month and were working up to 45 hrs per week on text correction. Top correctors are correcting up to 30,000 lines per month. We had many users saying that text correction is proving to be an ‘addictive’ or compulsive activity. They sat down to fix a few words for 5 minutes and before they knew it 3 hours had passed. This was very interesting.
  38. Due to user demand we instigated the ‘hall of fame’ into the beta service. The top 5 correctors show on the home page and also in the hall of fame. Originally the hall of fame only showed the top 10 but users wanted to see more, so now it is anyone who has corrected more than 5000 lines per month. Users are still asking for entire league tables however so they can see where they are in the big picture. This is a motivating factor for them. During development it was suggested that we need to use gaming technologies to encourage people to correct text but this has so far not proved necessary!
  39. One of the things we have not been able to do is to measure the overall improvement in OCR accuracy. This is due to the difficulty of measuring the OCR accuracy of the raw text to start with. I have written a paper published in D-Lib this month (March/April) about the difficulties of measuring OCR accuracy. A simple but resource intensive solution may be to compare words in an article with words in a dictionary before and after text correction. We are supplied with a page level ‘word confidence’ figure from the OCR engine (where 0 is poor confidence and 1 is good). As a matter of interest we have plotted the text corrections in articles on a page against the existing OCR engine provided page confidence levels for the entire corpus to date. The corrected lines have been scaled back by a factor of 7 so that they are more easily compared. The graph shows that corrections are above the page number curve for low confidence, below the page number curve for high confidence, and about the same for mid confidence. So lower confidence pages tend to attract slightly more corrections proportionally than higher confidence pages, but the effect isn’t that pronounced. Pages with low confidence make up 10.6% of the corpus and they get 16.7% of the corrections; pages with high confidence make up 20.4% of the corpus and they get just 18.4% of the corrections. 69% of the corpus is of average confidence and these pages get 64% of the corrections. It would be entirely feasible, as some users have suggested, to actively ‘serve up’ the articles on pages that have low page confidences if we wanted to target these for corrections.
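Two ideas from note 39 can be illustrated with a short sketch: the dictionary comparison before and after correction, and the proportional share of corrections per confidence band. The function name, the toy dictionary and the sample sentences are invented for illustration; only the percentages come from the note. This is a rough proxy under those assumptions, not the method the program used, and personal names (the most searched terms) would not appear in an ordinary dictionary.

```python
import re


def dictionary_hit_rate(text: str, dictionary: set[str]) -> float:
    """Fraction of words in `text` found in `dictionary` (0.0 if there are no words)."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return sum(word in dictionary for word in words) / len(words)


# Toy dictionary and sentences, invented for illustration only.
dictionary = {"the", "ship", "sank", "in", "calm", "weather"}
raw_ocr = "tbe ship sank in calrn weather"
corrected = "the ship sank in calm weather"

before = dictionary_hit_rate(raw_ocr, dictionary)   # 4 of 6 words recognised
after = dictionary_hit_rate(corrected, dictionary)  # 6 of 6 words recognised

# Percentages quoted in the note: (share of corpus pages, share of corrections).
shares = {"low": (10.6, 16.7), "average": (69.0, 64.0), "high": (20.4, 18.4)}

# A ratio above 1 means the band attracts more corrections than its size predicts.
ratio = {band: corrections / corpus for band, (corpus, corrections) in shares.items()}
```

With the quoted figures, low confidence pages attract roughly 1.6 times their proportional share of corrections, while average and high confidence pages attract slightly less than theirs.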
  40. So after all this activity the most common question people kept asking me was “Who are these people?” and also “Why do they do it?” Some people even suspected that the text correctors were really library staff, which is not the case. The text correctors are real, normal people. We sent some of them a survey to find answers to our questions about how long they spend correcting, why they do it, what motivates them, what would motivate them to do more or less? The responses were very interesting.
  41. The three main reasons for correcting text were: (1) we’re helping to provide an accurate record of Australian history; (2) we want to record family names and help others as we go; (3) we think it is a useful cause that will help all Australians, the Library, and ourselves, and we are willing to give time for this.
  42. The motivating factors given were no different to those that motivate anyone to do anything: for example they enjoy it, they have their own research goals, they think about the main outcome (i.e. making it better for everyone), they have been given a high level of trust and respect to do the job, and it is a challenge.
  43. To maintain or increase their motivation they again gave standard motivational answers. Things we had not done which they would like were to give them detailed instructions on how to do the job, to create for them a feeling of team spirit and being part of a virtual community, to recognise their achievements and acknowledge they were making a difference, and lastly to give them more content. They said the more content they were given the more they would do. Many noted that we had not publicised the service in any way or called for volunteers and the potential to harness a lot more volunteers was vast.
  44. All our top 5 correctors are Australians living in Victoria, New South Wales, and Queensland, with one in America. The five turned out to be 6 since one was a married couple sharing a logon to do research. Of the 6, 4 are female and 2 male. One is working full-time, one is a stay at home mum and 4 are retired. They are aged between 38 and 65. Three of the correctors are correcting as a volunteer ‘do good’ activity and trying to think up topics to correct, whereas the other 3 are correcting around their own areas of family history and local research. 2 of the 6 are also transcribing shipping records and births, marriages and deaths for other organisations. Here are some quotes from some of our top correctors. Julie is our top corrector and has corrected 2,500 articles so far. She is in her thirties and is a stay at home mum. She mainly corrects articles on local history and murder and corrects whole articles at a time. She says “ I enjoy the correction – it’s a great way to learn more about past history and things of interest whilst doing a service to the community by correcting text for the benefit of others” I keep doing because of the knowledge that you are doing something that will benefit future people that wish to access articles on their family history.
  45. Catherine is located in Washington DC and works full-time as the Director of an e-commerce company. She says “I enjoy typing, want to do something useful and find the content fascinating. I do it to benefit others”. Also she does not watch much TV. Lyn and Maurie a retired couple work on it together as part of their family history shipping research. They also do voluntary work for the mariners records. They say “ We get sick of doing housework, we find text correction addictive and it helps us and other people. How can you not correct errors when you see them?”.
  46. Mick is recently retired from IT. He says “ I thought I could be of some assistance to the project. It benefits me and other people. It helps with my family research. I would do more if I had broadband and did not have to share the computer with the rest of my family!” Fay is retired, she says “I enjoy the challenge, I need something to do in my spare time and it benefits me and others”
  47. Many of our current text correctors are genealogists. Genealogists do things that other groups of people may not. There is a genealogists ‘to do’ list that is circulating on blogs at the moment. It gives a useful insight into the life of a genealogist. One thing that is very important to them is what they call ‘random acts of genealogical kindness” where they may do something helpful for someone else that will help them trace their family tree. They also do organised acts of kindness such as transcribing births, marriages and deaths records. Genealogists very quickly get to grips with new technology if it helps them access resources or achieve one of their objectives.
  48. We have been gathering feedback from users for 6 months about the beta service and text correction in particular. The feedback has been overwhelmingly positive, and thousands of suggestions and comments have been received. The feedback was gathered from a survey form, from e-mails, by observation of users, by statistics, and by lurking on forums and blogs (going into the users’ spaces). The users have given us valuable feedback so that we can better meet their needs. Some of their ideas match our own; others are innovative and fresh, and we had not thought of them ourselves.
  49. The main requests from users for improvements are as follows: improve the text correction feature (so they can do more); have more advanced searching, including the ability to define and search across enhancement layers, e.g. tags only, tags and corrected text only, tags and comments; have a communication mechanism, e.g. a forum; enhancement of user profiles; more statistics and where they are in the big picture of text correctors; alerting for new content coming into the database; guidelines for enhancement activities.
  50. Some of the questions we now have to answer are: how can we improve the text correction functionality, and if there is a quick mode and a power user mode, what should they be like? This is a mockup of a possible improved method.
  51. We’ve had lots of discussions about what tags and comments should be associated with – the issue, the page, the article, a line, a word. This is an early mockup of pinning a ‘post it’ type note to an article. Although visually this method made it easier for users to understand the difference between correcting text and adding a note, it created confusion: users did not know where to pin the note on the image, and when many users attached tags or comments to the same image you could no longer see the image. Hence we reverted to the textual approach. What the tag or comment is associated with is important when considering how to search across the enhancement layers. Also, if we enable searching of layers together, e.g. tags and comments, decisions will have to be made about how the relevancy ranking should work. At present the relevancy ranking is based on a number of standard things with the addition of categories. Hence a news article or family notice will appear higher in the list than an advertisement or sports result.
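As a rough illustration of the category-based ranking just described, here is a minimal sketch. It is not the service's actual ranking code: the weights, the category names, the `Article` class and the term-counting `base_score` are all assumptions made up for the example. The only thing taken from the talk is that news articles and family notices should appear above advertisements and sports results.

```python
from dataclasses import dataclass

# Hypothetical category weights. The talk states only the ordering
# (news and family notices above advertisements and sports results),
# not any actual numbers.
CATEGORY_BOOST = {
    "news": 2.0,
    "family_notice": 2.0,
    "advertisement": 1.0,
    "sports_result": 1.0,
}


@dataclass
class Article:
    title: str
    category: str
    text: str


def base_score(article, query):
    """Toy stand-in for the 'standard' relevance signals: count query-term occurrences."""
    words = article.text.lower().split()
    return sum(words.count(term) for term in query.lower().split())


def rank(articles, query):
    """Return matching articles, highest boosted score first."""
    scored = [
        (base_score(a, query) * CATEGORY_BOOST.get(a.category, 1.0), a)
        for a in articles
    ]
    matching = [(score, a) for score, a in scored if score > 0]
    # Sort on the score only, so Article objects are never compared.
    return [a for score, a in sorted(matching, key=lambda pair: pair[0], reverse=True)]
```

With this kind of scheme, an advertisement and a news article that match a query equally well are separated purely by the category weight. Searching tags and comments together would need a similar decision about how much each layer contributes.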
  52. The lessons we have learnt to date are that engaging with users and building virtual communities is just as important to the users as providing the data itself. They want to be part of a community. By giving the users a high level of trust we have built commitment and loyalty in the community. Another lesson we have learnt is that using the term ‘text correction’ is not always helpful. It implies that something will be corrected and the old version deleted, which has caused concern to stakeholders and to the public. However, as users undertake the activity it has become apparent that what they are doing is ‘enhancing’ or ‘enriching’ the data. They are actually creating layers on top of the original data, and all the layers can be transparent and searched separately or jointly. The term ‘enhancement of data’ has not yet become common terminology in Australian libraries, but it will not be long before it is commonly understood by both the public and libraries. Lastly, we know that the Australian Newspapers service has had a big ‘social impact’ on people’s lives and the genealogical community. We are unable to quantitatively measure the impact or predict what may happen next.
  53. Traditionally libraries have held the power and control over data, but the Australian Newspapers service is shifting that power to the community. Recently Barack Obama, speaking on community engagement and volunteering, said “Don’t under-estimate the power of people who join together … They can accomplish amazing things”. This is true. People want to achieve amazing things and we as librarians have the power to give them both the data and the tools to do this – they will do the rest themselves. The challenge for the library now is how to nurture, sustain and grow this virtual community we have created and its resulting activities.
  54. The future potential of text enhancement is mind-boggling when you think of it in the world context. In Australia alone we have 21 million people, more than half of whom have internet access at home and so could potentially be volunteers. The FamilyIndexSearch project reports that in their first year they had 2000 volunteers and by their third year they had 160,000 volunteers correcting birth, marriage and death records. The Australian Newspapers program has the potential to match this easily. But why just think about Australian Newspapers? This functionality could be applied to many other full-text resources; indeed, a global centre could be established where users decide what types of materials from which countries they wish to enhance. The future is exciting and open.
  55. That brings me to the end of my talk. I could of course talk a lot longer but I wanted to give you the opportunity to be able to ask me some questions. There is a full report on the activity of text correctors called ‘many hands make light work’ on the website. Thank you.