In a perfect world, all articles consistently contain sufficient metadata to describe the resource. We know this is not the reality, so we are motivated to investigate the evolution of the metadata that is present when authors and publishers supply their own. Because applying metadata takes time, we recognize that each news article author has a limited metadata budget with which to spend their time and effort. How are they spending this budget? What are the top metadata categories in use? How did they grow over time? What purpose do they serve? We also recognize that not all metadata fields are used equally. What is the growth of individual fields over time? Which fields experienced the fastest adoption? In this paper, we review 227,724 HTML news articles from 29 outlets captured by the Internet Archive between 1998 and 2016. Upon reviewing the metadata fields in each article, we discovered that 2010 began a metadata renaissance as publishers embraced metadata for improved search engine ranking, search engine tracking, social media tracking, and social media sharing. When analyzing individual fields, we find that one application of metadata stands out above all others: social cards -- the cards generated by platforms like Twitter when one shares a URL. Once a metadata standard was established for cards in 2010, its fields were adopted by 20% of articles in the first year and reached more than 95% adoption by 2016. This rate of adoption surpasses efforts like schema.org and Dublin Core by a fair margin. When confronted with these results on how news publishers spend their metadata budget, we must conclude that it is all about the cards.
It’s All About The Cards: Sharing on Social Media Encouraged HTML Metadata Growth
1. @shawnmjones @WebSciDL
Shawn M. Jones · Valentina Neblitt-Jones · Martin Klein
Los Alamos National Laboratory, Research Library
Michele C. Weigle · Michael L. Nelson
Old Dominion University, Web Science and Digital Libraries Research Group
2. @shawnmjones @WebSciDL
Metadata is key to organizing content and providing context
Creating metadata takes time and effort.
Web page authors can add metadata to their pages with HTML’s META element.
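As a minimal sketch of what reading META elements from a page can look like, the following uses only Python's standard `html.parser`. The class name `MetaFieldParser`, the helper `extract_meta_fields`, and the sample HTML are hypothetical illustrations and not the tooling used in the study. The choice to accept `name`, `property`, and `itemprop` as field keys is also an assumption.

```python
from html.parser import HTMLParser


class MetaFieldParser(HTMLParser):
    """Collect field/content pairs from META elements (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # Assumption: a field is keyed by name=, property= (as OGP does),
        # or itemprop=, whichever appears first in that order.
        key = attr.get("name") or attr.get("property") or attr.get("itemprop")
        content = attr.get("content")
        if key and content is not None:
            self.fields[key.lower()] = content


def extract_meta_fields(html):
    """Return a dict mapping metadata field names to their content values."""
    parser = MetaFieldParser()
    parser.feed(html)
    return parser.fields


# Hypothetical sample page, not taken from the dataset.
sample = """
<html><head>
<meta name="description" content="An example news article.">
<meta name="keywords" content="news, example">
<meta property="og:title" content="Example headline">
</head><body></body></html>
"""
fields = extract_meta_fields(sample)
```

Here `fields` holds one entry per META element found in the sample page, which is the kind of per-article field list the later counts would be built from.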
5. @shawnmjones @WebSciDL
Past studies focused on Dublin Core and show that systems favor certain fields
title is the most popular field per 10 studies
description is the second most popular field per 6 studies
6. @shawnmjones @WebSciDL
Our study evaluates the evolution of metadata usage over time
Web archives capture web page HTML, JavaScript, CSS, and embedded content as mementos.
Mementos have a specific capture date and time, their memento-datetime.
Each memento represents an author’s behavior at that specific time.
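To illustrate the idea of a memento-datetime, here is a small sketch that reads the capture time out of an Internet Archive memento URI. It assumes the common Wayback Machine URI layout with a 14-digit `YYYYMMDDhhmmss` timestamp interpreted as UTC. The function name `parse_uri_m` and the example URI are hypothetical, and this is not necessarily how the study obtained its datetimes.

```python
import re
from datetime import datetime, timezone

# Assumed layout: https://web.archive.org/web/<14-digit timestamp>/<original URI>
# An optional two-letter modifier such as "id_" may follow the timestamp.
URI_M_PATTERN = re.compile(
    r"^https?://web\.archive\.org/web/(\d{14})(?:[a-z]{2}_)?/(.+)$"
)


def parse_uri_m(uri_m):
    """Split a memento URI into (memento-datetime, original URI)."""
    match = URI_M_PATTERN.match(uri_m)
    if match is None:
        raise ValueError(f"not a recognized memento URI: {uri_m}")
    stamp, original = match.groups()
    captured = datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    return captured, original


# Hypothetical memento URI for illustration.
captured, original = parse_uri_m(
    "https://web.archive.org/web/20160101123000/http://example.com/news/story.html"
)
```

Grouping mementos by `captured.year` is one plausible way to study how an outlet's metadata changed over time.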
[Figure: example mementos labeled 2/28/2021, 3/20/2021, and 3/27/2021]
7. @shawnmjones @WebSciDL
We thank Max Grusky for access to the NEWSROOM dataset
NEWSROOM contains 1.3 million mementos of news articles that contain metadata.
All articles contain at least an HTML description field.
NEWSROOM’s mementos were captured by the Internet Archive between 1998 and 2016.
9. @shawnmjones @WebSciDL
In 1998, the mean number of metadata fields used was 2; by 2016, it was 39
The sharp increase in 2006 may be an artifact of the uneven sampling in the dataset.
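A per-year mean like the one on this slide could be computed along the following lines. This is a sketch under assumptions: each memento is represented as a `(year, field names)` pair, distinct field names are counted once per memento, and the toy records are invented for illustration rather than drawn from NEWSROOM.

```python
from collections import defaultdict
from statistics import mean


def mean_fields_per_year(mementos):
    """Map each capture year to the mean number of distinct fields per memento."""
    counts = defaultdict(list)
    for year, field_names in mementos:
        # Assumption: repeated field names within one memento count once.
        counts[year].append(len(set(field_names)))
    return {year: mean(values) for year, values in sorted(counts.items())}


# Invented toy records: (capture year, metadata field names).
toy_mementos = [
    (1998, ["description", "keywords"]),
    (1998, ["description"]),
    (2016, ["description", "og:title", "og:image", "twitter:card"]),
]
yearly_means = mean_fields_per_year(toy_mementos)
```

Because the mean is taken within each year, a year with few sampled mementos can swing sharply, which is consistent with the slide's caution about uneven sampling.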
If we look at each individual metadata field, how are they being used?
10. @shawnmjones @WebSciDL
We grouped metadata fields into categories
Metadata usage exploded after 2008.
A category’s size = percentage of articles that contain at least one metadata field from that category.
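The category-size definition above can be sketched directly in code. The category names, their member fields, and the toy articles below are hypothetical stand-ins. The study's actual category assignments are not listed on this slide.

```python
def category_usage(articles, categories):
    """Return, per category, the percentage of articles that contain
    at least one metadata field belonging to that category."""
    total = len(articles)
    usage = {}
    for name, members in categories.items():
        hits = sum(1 for field_names in articles if members & set(field_names))
        usage[name] = 100.0 * hits / total if total else 0.0
    return usage


# Hypothetical categories and field assignments, for illustration only.
example_categories = {
    "social cards": {"og:title", "og:image", "twitter:card"},
    "Dublin Core": {"dc.title", "dc.creator"},
}
# Invented articles, each given as a list of metadata field names.
example_articles = [
    ["description", "og:title", "og:image"],
    ["description", "twitter:card"],
    ["description", "dc.title"],
    ["description"],
]
usage = category_usage(example_articles, example_categories)
```

An article with several fields from one category still counts once for that category, so the percentages measure how many articles use a category, not how many fields they carry.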
11. @shawnmjones @WebSciDL
We evaluated the use of the fields specified in HTML standards from HTML 2.0 to HTML 5
keywords are still in use even though most search engines do not process them.
author usage is on the rise.
The heavy use of description is an artifact of the dataset.
12. @shawnmjones @WebSciDL
To contrast with previous studies, we analyzed the adoption of Dublin Core
Dublin Core’s usage has not grown much compared to other categories.
13. @shawnmjones @WebSciDL
Schema.org is designed to assist search engines
SEO experts imply better placement among search results for pages using schema.org, but the adoption rate seems moderate.
14. @shawnmjones @WebSciDL
Other search engine metadata usage has not grown much either
We see very similar usage for metadata related to identifying pages for Google and Bing.
15. @shawnmjones @WebSciDL
Metadata that supports sharing on social media has experienced a renaissance
Social cards are summaries of web pages shared on social media.
They are built from authors’ web page metadata.
Example fields: twitter:image, twitter:title, twitter:description
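As a rough illustration of how a card can be assembled from page metadata, the sketch below maps the three fields named on this slide onto a simple card structure. The `SocialCard` type and `build_card` function are hypothetical. Real platforms apply their own rules for rendering, truncation, and fallbacks.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SocialCard:
    """A simplified card: what a platform might show for a shared URL."""
    title: Optional[str]
    description: Optional[str]
    image: Optional[str]


def build_card(fields):
    """Build a card from a dict of metadata fields (field name -> content)."""
    return SocialCard(
        title=fields.get("twitter:title"),
        description=fields.get("twitter:description"),
        image=fields.get("twitter:image"),
    )


# Invented metadata for a hypothetical article.
card = build_card({
    "twitter:title": "Example headline",
    "twitter:description": "A one-sentence summary of the article.",
    "twitter:image": "https://example.com/lead-photo.jpg",
})
```

Any field the author leaves out simply comes back as `None` here, which mirrors why authors who want a complete card have an incentive to supply every field.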
16. @shawnmjones @WebSciDL
Usage of OGP (Facebook) fields for social cards has skyrocketed since it was introduced
Card fields required per testing are outlined in red.
Additional card fields required per documentation are in dotted red.
There has been far less growth for fields not related to social cards.
17. @shawnmjones @WebSciDL
The Twitter Card standard shows the same meteoric rise in metadata usage specific to social cards
Card fields that proved to be required when we tested creating cards with Twitter are outlined in red.
Additional card fields required per documentation are in dotted red.
The growing field usage mirrors their Facebook counterparts.
Twitter will use OGP fields, but only if twitter:card is specified.
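The fallback behavior described on this slide can be sketched as a small resolution function. The pairing of each `twitter:` field with an `og:` counterpart and the function name `resolve_twitter_card` are assumptions made for illustration. This is not Twitter's implementation.

```python
# Assumed pairing of Twitter card fields with their OGP counterparts.
OGP_FALLBACKS = {
    "twitter:title": "og:title",
    "twitter:description": "og:description",
    "twitter:image": "og:image",
}


def resolve_twitter_card(fields):
    """Return the effective card fields, or None when no card is generated.

    OGP values are consulted only when twitter:card is present, as the
    slide describes.
    """
    card_type = fields.get("twitter:card")
    if not card_type:
        return None
    resolved = {"twitter:card": card_type}
    for twitter_field, ogp_field in OGP_FALLBACKS.items():
        value = fields.get(twitter_field) or fields.get(ogp_field)
        if value:
            resolved[twitter_field] = value
    return resolved


# Invented examples: OGP fields with and without twitter:card.
with_card = resolve_twitter_card({
    "twitter:card": "summary",
    "og:title": "Example headline",
    "og:image": "https://example.com/lead-photo.jpg",
})
without_card = resolve_twitter_card({"og:title": "Example headline"})
```

In this sketch a page that carries only OGP fields yields no card at all, while adding the single `twitter:card` field lets the OGP values stand in for the missing Twitter ones.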
18. @shawnmjones @WebSciDL
Facebook supports non-OGP fields as part of its Marketing API
Facebook’s sharing debugger implies that authors need to supply fb:app_id for Facebook to generate a card, but it works fine without it.
Many of the articles we reviewed contained a blank string or “dummy value” for this field.
19. @shawnmjones @WebSciDL
In conclusion: It’s all about the cards
• We analyzed 227,724 mementos of news articles to understand how authors used their metadata budget.
• In 2008, metadata usage exploded.
• When we break down usage by individual fields, we see that authors favor fields associated with social cards.
• This insight can help future metadata standard authors understand what spurs metadata adoption.
S. M. Jones, V. Neblitt-Jones, M. C. Weigle, M. Klein, and M. L. Nelson, “It's All About The Cards: Sharing on Social Media Probably Encouraged HTML Metadata Growth,” ACM/IEEE Joint Conference on Digital Libraries, 2021. [preprint: https://arxiv.org/abs/2104.04116]