2. Automatic Metadata Generation
Automatic metadata generation is a machine process that combines
metadata extraction and metadata harvesting.
Metadata extraction uses automatic indexing
techniques to search and obtain resource content and
produce structured metadata according to metadata
standards.
Metadata harvesting is performed by machine and collects
tagged metadata created by machines or humans.
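As a rough illustration of metadata extraction, the sketch below mines a resource's text for its most frequent terms and labels the result as structured metadata. The term-frequency approach, the tiny stop-word list, and the helper names `extract_keywords` and `extract_metadata` are assumptions made for this example; they are not techniques named in the slides, and real automatic indexing is considerably more sophisticated.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "by", "or"}


def extract_keywords(text, limit=3):
    """Return the most frequent non-stop-word terms found in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(word for word in words if word not in STOP_WORDS)
    # Sort by descending frequency, then alphabetically so ties are stable.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:limit]]


def extract_metadata(text):
    """Label the extracted terms; the field name mirrors Dublin Core's 'subject'."""
    return {"subject": extract_keywords(text)}


sample = (
    "Libraries catalog digital resources. Digital libraries describe "
    "resources with metadata so resources can be found."
)
record = extract_metadata(sample)
print(record)
```

The point of the sketch is the shape of the process: content goes in, labeled metadata comes out, with no human in the loop.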
3. Why choose automatic metadata generation
over manually created metadata?
Advantages:
Efficiency
Cost
Consistency
4. Automatic Metadata Generation: Concepts and Examples
Metadata extraction. The process of automatically pulling (extracting) metadata from a resource’s content. Resource content is mined to produce structured (“labeled”) metadata for object representation.
Example: Metadata extraction for a Web page involves extracting metadata from the resource’s content that is displayed via a Web browser.
Metadata harvesting. The process of automatically collecting resource metadata already embedded in or associated with a resource. The harvested metadata is originally produced by humans or by fully or semiautomatic processes supported by software.
Example: Metadata harvested from a Web page is found in the “header” source code of an HTML (or XHTML) resource (e.g., “Keywords” META tags). Metadata for a Microsoft WORD document is found under file properties (e.g., “Type of file,” which is automatically generated, and “Keywords,” which can be added by a resource author).
Fully-automatic metadata generation. Complete (or total) reliance on automatic processes to create metadata.
Example: Web editing software (e.g., Macromedia’s Dreamweaver and Microsoft’s FrontPage) and selected document software (e.g., Microsoft WORD and Acrobat) automatically produce metadata at the time a resource is created or updated (e.g., “Date of creation” or “Date modified”) without human intervention.
Semi-automatic metadata generation. Partial reliance on software to create metadata; a combination of fully-automatic and human processes to create metadata.
Example: (1) Fully-automatic techniques are used to generate metadata (e.g., “Keywords”) as a first pass, and software then presents the metadata to a person, who may manually edit the metadata. (2) Software may present a person (e.g., resource author or Web architect) with a “template” that guides the manual input of metadata, and then automatically converts the metadata to appropriate encoding (e.g., XML tags). The software may even automatically embed metadata in a resource.
Greenberg (2005), p. 25
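To make the harvesting case concrete, here is a minimal sketch that collects META tags already embedded in a Web page's header, using Python's standard `html.parser` module. The `MetaHarvester` class and the sample page are hypothetical and exist only for illustration; nothing here is taken from a specific harvesting application.

```python
from html.parser import HTMLParser


class MetaHarvester(HTMLParser):
    """Collect name/content pairs from <meta> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names before calling this.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content:
            self.metadata[name.lower()] = content


# Made-up page used only to exercise the harvester.
page = """<html><head>
<title>Sample page</title>
<meta name="Keywords" content="metadata, harvesting">
<meta name="Author" content="A. Writer">
</head><body><p>Body text is ignored by the harvester.</p></body></html>"""

harvester = MetaHarvester()
harvester.feed(page)
harvested = harvester.metadata
print(harvested)
```

Unlike extraction, the harvester never interprets the body content: it only gathers metadata that a person or another program has already tagged.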
5. The AMeGA (Automatic Metadata Generation Applications) Project
Created to “identify and recommend functionalities for
automatic metadata generation applications”
Discusses the current state of automatic metadata generation
applications
Identifies problem areas
Conducted a survey of metadata experts
Suggests functionalities that future applications should
incorporate
Found at:
http://www.loc.gov/catdir/bibcontrol/lc_amega_final_report.pdf
6. Problems with current automatic metadata applications:
Do not support standard bibliographic functions and element
qualifications
Sophisticated automatic indexing algorithms have not been
incorporated into metadata applications
Automatic metadata applications are developed separately from one
another
There are no standards for creating automatic metadata generation
applications
7. The purpose of the survey conducted by AMeGA was to:
Learn what libraries are currently doing for
metadata creation
See whether they are aware of current automatic metadata generation
applications
See which developments they would most like to see in
metadata creation
Survey participants: 217 completed the survey
75.2% of participants had three or more years of cataloging and/or indexing experience
Participants by role:
•29.5% administrators/executives
•28.3% catalogers/metadata librarians
•Remaining percentage divided among 8 other categories
Participants by organization type:
•40.7% Academic library
•13.4% Government agency/department
•12.8% Academic community (not the library)
•11.6% Government library
•9.3% Non-profit organization
•8.1% Corporation/company
•1.2% Public library
•0.1% Corporate library
•2.3% Other
8. Top 4 metadata standards used in the libraries where participants worked: MARC, DC
simple, DC qualified, and EAD.
Top 4 metadata standards used in the non-library organizations where participants worked: DC
simple, DC qualified, MARC, and DC application profile.
94 Organizations were using 1 metadata system
55 Organizations were using 2 metadata systems
22 Organizations were using 3 metadata systems
6 Organizations were using 4 metadata systems
4 Organizations were using 5 metadata systems
2 Organizations were using 6 metadata systems
1 Organization was using 7 metadata systems
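The distribution above can be tallied to see how many organizations responded to this question and how many systems a typical organization used. The totals below are computed from the counts on this slide; they are not figures quoted from the report.

```python
# Number of metadata systems in use -> number of organizations (from the list above).
systems_to_orgs = {1: 94, 2: 55, 3: 22, 4: 6, 5: 4, 6: 2, 7: 1}

total_orgs = sum(systems_to_orgs.values())
total_systems = sum(count * orgs for count, orgs in systems_to_orgs.items())
mean_systems = total_systems / total_orgs
share_single = systems_to_orgs[1] / total_orgs

print(f"{total_orgs} organizations, {mean_systems:.2f} systems on average")
print(f"{share_single:.1%} use a single system")
```

By this tally, 184 organizations reported 333 systems in total, about 1.81 per organization, and roughly half relied on a single system.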
The most common Metadata Generation systems being used (in order of most used):
Custom/in-house
CONTENTdm
Endeavor/Voyager
OCLC/Innovative Interfaces
OCLC/Connexion
Microsoft Access
XMetaL
NoteTab (or similar text editor)
XMLSpy
DSpace
(etc.)
Greenberg (2005), p. 24
9. Survey participants were asked a series of experience or opinion
questions regarding the automatic metadata generation of digital
document-like objects using the Dublin Core Metadata Element Set.
Participants either experienced or predicted the greatest accuracy for technical
metadata (ID, language, format).
Less accuracy was predicted for subject and description, since these require
intellectual judgment.
When asked whether they would devote a “moderate” amount of
resources to research on intellectual metadata (subject,
description) or to complete automation of physical metadata (ID, format,
language), participants were divided.
A majority of participants believed that research on generating metadata for nontextual
and foreign-language material is important and valuable.
70% of participants would like applications to run automatic algorithms,
allowing human evaluation and editing afterwards.
Most participants would also want to be able to incorporate subject
schemes, content creation guidelines, and cataloging and metadata examples
into metadata generation applications.
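The workflow most participants preferred, automatic algorithms first and human evaluation and editing afterwards, can be sketched as three small steps. All function names, the resource dictionary, and the XML element names below are hypothetical and chosen only for illustration; the encoding step uses Python's standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET


def automatic_first_pass(resource):
    """Hypothetical automatic step: derive candidate metadata from a resource."""
    return {
        "title": resource["text"].splitlines()[0].strip(),
        "format": resource["mime_type"],
        "subject": "",  # left blank on purpose: subject needs intellectual judgment
    }


def apply_human_edits(candidate, edits):
    """Merge a reviewer's corrections over the automatic candidate record."""
    reviewed = dict(candidate)
    reviewed.update(edits)
    return reviewed


def to_xml(record):
    """Encode the reviewed record as XML (element names are illustrative only)."""
    root = ET.Element("metadata")
    for field, value in record.items():
        ET.SubElement(root, field).text = value
    return ET.tostring(root, encoding="unicode")


resource = {"text": "Annual Report 2004\nBody of the report...", "mime_type": "text/html"}
candidate = automatic_first_pass(resource)
reviewed = apply_human_edits(candidate, {"subject": "Library administration"})
xml_record = to_xml(reviewed)
print(xml_record)
```

Keeping the human edit as a separate step means the automatic output is never overwritten silently; the reviewer's changes are layered on top of it.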
10. Based on the results of the survey, AMeGA created a list of
functionalities needed in automatic metadata generation
applications:
The system should be able to configure profiles before metadata
generation
The system should automatically identify and collect any
metadata associated with a resource
The system should enhance and refine manually generated and
automatically generated metadata
The system should automatically evaluate the quality of
metadata and provide a rating score
The system should be able to create metadata for nontextual
resources
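One simple way to picture the rating-score functionality is a completeness measure: the share of expected elements that actually carry a value. This is only a sketch under that assumption; the slides do not say how quality should be scored, and the `REQUIRED_ELEMENTS` tuple (named after Dublin Core elements) and the `completeness_score` function are hypothetical.

```python
# Hypothetical set of expected elements, named after Dublin Core elements.
REQUIRED_ELEMENTS = (
    "title", "creator", "subject", "description", "date", "format", "language",
)


def completeness_score(record, required=REQUIRED_ELEMENTS):
    """Return the fraction of required elements with a non-empty value (0.0 to 1.0)."""
    filled = 0
    for element in required:
        value = record.get(element)
        # Treat missing, None, and whitespace-only values as unfilled.
        if value is not None and str(value).strip():
            filled += 1
    return filled / len(required)


record = {
    "title": "Annual Report 2004",
    "format": "text/html",
    "language": "en",
    "subject": " ",
}
score = completeness_score(record)
print(f"Completeness: {score:.0%}")
```

Completeness says nothing about whether the values are accurate, so a real application would likely combine it with other checks before reporting a single score.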
11. Conclusion
Experimental researchers and metadata experts need to work
together on developing applications.
Application standards need to be created.
Much more funding and research need to be devoted to
automatic metadata generation.
The priority now is to develop metadata
generation applications that automatically identify and
collect metadata, aid human metadata generation, enhance
previously created metadata, and evaluate the quality of
metadata.
12. References
DCMI (2008). Dublin Core Metadata Initiative: Scorpion. Retrieved from
http://www.dublincore.org/tools/tools/tool-11.shtml
Greenberg, J. (2003). Metadata Generation: Processes, People and Tools. Bulletin of the
American Society for Information Science and Technology, 29(2).
Retrieved from http://www.asis.org/Bulletin/Dec-02/greenberg.html
Greenberg, J., Spurgin, K., & Crystal, A. (2005). Final Report for the AMeGA (Automatic Metadata
Generation Applications) Project. Retrieved from
http://www.loc.gov/catdir/bibcontrol/lc_amega_final_report.pdf
Greenberg, J., Spurgin, K., & Crystal, A. (2006). Functionalities for automatic metadata generation
applications: a survey of metadata experts’ opinions. International Journal of Metadata, Semantics
and Ontologies, 1(1), 3-20.
Ojokoh, B., Adewale, O., & Falaki, S. (2009). Automated document metadata extraction. Journal of
Information Science, 35(5), 563-570.
Park, J., & Lu, C. (2009). Application of semi-automatic metadata generation in libraries: Types,
tools, and techniques. Library & Information Science Research, 31(4), 225-231.
Shafer, K. E. (2001). Automatic Subject Assignment via the Scorpion System. Journal of
Library Administration, 34(1/2), 187.
Shafer, K. E. (2001). Evaluating Scorpion Results. Journal of Library Administration, 34(3/4), 237.
Su, S. T., Long, Y., & Cromwell, D. E. (2002). E2M: Automatic Generation of MARC-Formatted Metadata by
Crawling E-Publications. Information Technology & Libraries, 21(4), 171-180.
13. Thank you!
For any questions or concerns, please contact me at:
hachilde@uncg.edu
_________
It’s been a wonderful class with everyone! Good luck in all
of your future endeavors! I hope to see you all around!