@shawnmjones @WebSciDL
It’s All About The Cards: Sharing on Social Media Encouraged HTML Metadata Growth
Shawn M. Jones · Valentina Neblitt-Jones · Martin Klein
Los Alamos National Laboratory, Research Library
Michele C. Weigle · Michael L. Nelson
Old Dominion University, Web Science and Digital Libraries Research Group

Metadata is key to organizing content and providing context
Creating metadata takes time and effort.
Web page authors can add metadata to their pages with HTML’s META element.
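
As a rough illustration of how META elements carry metadata, the sketch below pulls field names and values out of a page using Python's standard-library HTML parser. The sample page, the class name `MetaFieldParser`, and the helper `extract_meta_fields` are hypothetical and are not taken from the study's tooling.

```python
from html.parser import HTMLParser


class MetaFieldParser(HTMLParser):
    """Collect (field name, content) pairs from META elements."""

    def __init__(self):
        super().__init__()
        self.fields = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Standard HTML fields use the `name` attribute; OGP fields use `property`.
        field = attributes.get("name") or attributes.get("property")
        if field:
            self.fields.append((field.lower(), attributes.get("content") or ""))


def extract_meta_fields(html):
    """Return the metadata fields found in an HTML document, in page order."""
    parser = MetaFieldParser()
    parser.feed(html)
    return parser.fields


# Hypothetical page used only for illustration.
SAMPLE_PAGE = """
<html><head>
<title>Example article</title>
<meta name="description" content="A short summary of the article.">
<meta name="keywords" content="metadata, web archives">
<meta property="og:title" content="Example article">
</head><body></body></html>
"""

for field, content in extract_meta_fields(SAMPLE_PAGE):
    print(f"{field}: {content}")
```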

Web page authors have many choices in metadata standards

Creating metadata is expensive
How do authors spend their metadata budget?

Past studies focused on Dublin Core, and show that systems favor certain fields
title is the most popular field, per 10 studies.
description is the second most popular field, per 6 studies.

Our study evaluates the evolution of metadata usage over time
Web archives capture web page HTML, JavaScript, CSS, and embedded content as mementos.
Mementos have a specific capture date and time, their memento-datetime.
Each memento represents an author’s behavior at that specific time.
[Figure: example mementos dated 2/28/2021, 3/20/2021, and 3/27/2021]
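
To make the memento-datetime concrete: in the Memento protocol (RFC 7089), a memento's capture time is exposed in the `Memento-Datetime` HTTP response header as an HTTP-date in GMT. The sketch below parses such values and orders them chronologically. The header values and function names are hypothetical, chosen only to match the dates on the slide.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Hypothetical Memento-Datetime header values, not taken from the study's data.
HEADER_VALUES = [
    "Sat, 27 Mar 2021 08:30:00 GMT",
    "Sun, 28 Feb 2021 12:00:00 GMT",
    "Sat, 20 Mar 2021 17:45:00 GMT",
]


def parse_memento_datetime(header_value):
    """Parse a Memento-Datetime header value into a timezone-aware datetime."""
    return parsedate_to_datetime(header_value)


def order_mementos(header_values):
    """Return the parsed memento-datetimes from oldest to newest."""
    return sorted(parse_memento_datetime(value) for value in header_values)


for captured in order_mementos(HEADER_VALUES):
    print(captured.isoformat())
```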

We thank Max Grusky for access to the NEWSROOM dataset
NEWSROOM contains 1.3 million mementos of news articles that contain metadata.
All articles contain at least an HTML description field.
NEWSROOM’s mementos were captured by the Internet Archive between 1998 and 2016.

We sampled 277,724 mementos of news articles from the 39 outlets found in NEWSROOM

In 1998, the mean number of metadata fields used was 2; by 2016, it was 39
The sharp increase in 2006 may be an artifact of the uneven sampling in the dataset.
If we look at each individual metadata field, how is it being used?
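
A minimal sketch of how a per-year mean like the one above could be computed, assuming each memento is reduced to a capture year and a list of field names, and assuming a field is counted once per memento even if it repeats. The toy records and the function name are hypothetical; they do not reproduce the study's numbers.

```python
from collections import defaultdict
from statistics import mean


def mean_fields_per_year(mementos):
    """Map each capture year to the mean number of distinct fields per memento.

    `mementos` is an iterable of (year, field_names) pairs.
    """
    counts_by_year = defaultdict(list)
    for year, field_names in mementos:
        # Assumption: repeated fields in one memento count only once.
        counts_by_year[year].append(len(set(field_names)))
    return {year: mean(counts) for year, counts in sorted(counts_by_year.items())}


# Hypothetical toy records for illustration only.
TOY_MEMENTOS = [
    (1998, ["description", "keywords"]),
    (1998, ["description"]),
    (2016, ["description", "og:title", "twitter:card"]),
]

print(mean_fields_per_year(TOY_MEMENTOS))
```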

We grouped metadata fields into categories
Metadata usage exploded after 2008.
A category’s size = percentage of articles that contain at least one metadata field from that category.
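
The category-size definition above can be sketched as follows. The mapping from field names to categories is a hypothetical, prefix-based stand-in; the slides do not say how the authors assigned fields to categories.

```python
from collections import Counter

# Hypothetical mapping for illustration; not the study's actual grouping.
PREFIX_TO_CATEGORY = (
    ("og:", "OGP"),
    ("twitter:", "Twitter Cards"),
    ("dc.", "Dublin Core"),
    ("dcterms.", "Dublin Core"),
)
HTML_STANDARD_FIELDS = {"description", "keywords", "author"}


def categorize(field):
    """Assign one field name to a category."""
    name = field.lower()
    for prefix, category in PREFIX_TO_CATEGORY:
        if name.startswith(prefix):
            return category
    return "HTML standard" if name in HTML_STANDARD_FIELDS else "Other"


def category_sizes(articles):
    """Percentage of articles with at least one field from each category.

    `articles` is a list in which each item lists one article's field names.
    """
    articles_per_category = Counter()
    for field_names in articles:
        # A set ensures each article counts at most once per category.
        for category in {categorize(field) for field in field_names}:
            articles_per_category[category] += 1
    return {
        category: 100 * count / len(articles)
        for category, count in articles_per_category.items()
    }


# Hypothetical toy articles for illustration only.
TOY_ARTICLES = [
    ["description", "og:title", "og:image"],
    ["description", "twitter:card"],
    ["description"],
    ["DC.title", "description"],
]

print(category_sizes(TOY_ARTICLES))
```

Counting each article once per category keeps the size a percentage of articles rather than a count of fields, which matches the definition on the slide.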

We evaluated the use of the fields specified in HTML standards from HTML 2.0 to HTML 5
keywords are still in use even though most search engines do not process them.
author usage is on the rise.
The heavy use of description is an artifact of the dataset.

To contrast with previous studies, we analyzed the adoption of Dublin Core
Dublin Core’s usage has not grown much compared to other categories.

Schema.org is designed to assist search engines
SEO experts imply better placement among search results for pages using schema.org, but the adoption rate seems moderate.

Other search engine metadata usage has not grown much either
We see very similar usage for metadata related to identifying pages for Google and Bing.

Metadata that supports sharing on social media has experienced a renaissance
Social cards are summaries of web pages shared on social media.
They are built from authors’ web page metadata.
Example card fields: twitter:image, twitter:title, twitter:description

Usage of OGP (Facebook) fields for social cards has skyrocketed since it was introduced
Card fields required per testing are outlined in red.
Additional card fields required per documentation are in dotted red.
There has been far less growth for fields not related to social cards.

The Twitter Card standard shows the same meteoric rise in metadata usage specific to social cards
Card fields found to be required when we tested creating cards with Twitter are outlined in red.
Additional card fields required per documentation are in dotted red.
The growing field usage mirrors that of their Facebook counterparts.
Twitter will use OGP fields, but only if twitter:card is specified.
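
The fallback rule stated above can be sketched as a small lookup. This is an illustration of the rule as worded on the slide, not Twitter's actual implementation. The function name, the three card parts, and the treatment of pages that lack twitter:card (Twitter-specific fields are still read, OGP fields are ignored) are assumptions.

```python
CARD_PARTS = ("title", "description", "image")


def resolve_twitter_card(meta):
    """Pick the value for each card part from a dict of field name -> content.

    A twitter:* field is preferred. The matching og:* field is used as a
    fallback only when twitter:card is specified, per the rule above.
    """
    has_card_type = bool(meta.get("twitter:card", "").strip())
    card = {}
    for part in CARD_PARTS:
        value = meta.get(f"twitter:{part}", "").strip()
        if not value and has_card_type:
            value = meta.get(f"og:{part}", "").strip()
        if value:
            card[part] = value
    return card


# Hypothetical metadata for illustration only.
WITH_CARD_TYPE = {
    "twitter:card": "summary",
    "og:title": "Example article",
    "og:description": "A short summary of the article.",
    "twitter:image": "https://example.com/a.png",
}
WITHOUT_CARD_TYPE = {"og:title": "Example article"}

print(resolve_twitter_card(WITH_CARD_TYPE))
print(resolve_twitter_card(WITHOUT_CARD_TYPE))
```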

Facebook supports non-OGP fields as part of its Marketing API
Facebook’s sharing debugger implies that authors need to supply fb:app_id for Facebook to generate a card, but it works fine without it.
Many of the articles we reviewed contained a blank string or “dummy value” for this field.

In conclusion: It’s all about the cards
• We analyzed 227,724 mementos of news articles to understand how authors used their metadata budget.
• In 2008, metadata usage exploded.
• When we break down usage by individual fields, we see that authors favor fields associated with social cards.
• This insight can help future metadata standard authors understand what spurs metadata adoption.
S. M. Jones, V. Neblitt-Jones, M. C. Weigle, M. Klein, and M. L. Nelson, “It's All About The Cards: Sharing on Social Media Probably Encouraged HTML Metadata Growth,” ACM/IEEE Joint Conference on Digital Libraries, 2021. [preprint: https://arxiv.org/abs/2104.04116]

Platformless Horizons for Digital AdaptabilityWSO2
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 

Último (20)

MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 

It’s All About The Cards: Sharing on Social Media Encouraged HTML Metadata Growth

  • 1. @shawnmjones @WebSciDL It’s All About The Cards: Sharing on Social Media Encouraged HTML Metadata Growth. Shawn M. Jones, Valentina Neblitt-Jones, Martin Klein (Los Alamos National Laboratory, Research Library); Michele C. Weigle, Michael L. Nelson (Old Dominion University, Web Science and Digital Libraries Research Group).
  • 2. Metadata is key to organizing content and providing context. Creating metadata takes time and effort. Web page authors can add metadata to their pages with HTML’s META element.
  • 3. Web page authors have many choices in metadata standards.
  • 4. Creating metadata is expensive. How do authors spend their metadata budget?
  • 5. Past studies focused on Dublin Core and show that systems favor certain fields. title is the most popular field, per 10 studies; description is the second most popular field, per 6 studies.
  • 6. Our study evaluates the evolution of metadata usage over time. Web archives capture web page HTML, JavaScript, CSS, and embedded content as mementos. Mementos have a specific capture date and time, their memento-datetime. Each memento represents an author’s behavior at that specific time. Example dates shown: 2/28/2021, 3/20/2021, 3/27/2021.
  • 7. We thank Max Grusky for access to the NEWSROOM dataset. NEWSROOM contains 1.3 million mementos of news articles that contain metadata. All articles contain at least an HTML description field. NEWSROOM’s mementos were captured by the Internet Archive between 1998 and 2016.
  • 8. We sampled 277,724 mementos of news articles from the 39 outlets found in NEWSROOM.
  • 9. In 1998, the mean number of metadata fields used was 2; by 2016, it was 39. The sharp increase in 2006 may be an artifact of the uneven sampling in the dataset. If we look at individual metadata fields, how are they being used?
  • 10. We grouped metadata fields into categories. Metadata usage exploded after 2008. A category’s size is the percentage of articles that contain at least one metadata field from that category.
  • 11. We evaluated the use of the fields specified in HTML standards from HTML 2.0 to HTML 5. The keywords field is still in use even though most search engines do not process it. Usage of the author field is on the rise. The heavy use of description is an artifact of the dataset.
  • 12. To contrast with previous studies, we analyzed the adoption of Dublin Core. Dublin Core’s usage has not grown much compared to other categories.
  • 13. Schema.org is designed to assist search engines. SEO experts imply better placement among search results for pages using schema.org, but the adoption rate seems moderate.
  • 14. Other search engine metadata usage has not grown much either. We see very similar usage for metadata related to identifying pages for Google and Bing.
  • 15. Metadata that supports sharing on social media has experienced a renaissance. Social cards are summaries of web pages shared on social media. They are built from authors’ web page metadata (fields shown: twitter:image, twitter:title, twitter:description).
  • 16. Usage of OGP (Facebook) fields for social cards has skyrocketed since it was introduced. Card fields required per testing are outlined in red. Additional card fields required per documentation are in dotted red. There has been far less growth for fields not related to social cards.
  • 17. The Twitter Card standard shows the same meteoric rise in metadata usage specific to social cards. The card fields required after we tested creating cards with Twitter are outlined in red. Additional card fields required per documentation are in dotted red. The growing field usage mirrors their Facebook counterparts. Twitter will use OGP fields, but only if twitter:card is specified.
  • 18. Facebook supports non-OGP fields as part of its Marketing API. Facebook’s sharing debugger implies that authors need to supply fb:app_id for Facebook to generate a card, but it works fine without it. Many of the articles we reviewed contained a blank string or “dummy value” for this field.
  • 19. In conclusion: It’s all about the cards. We analyzed 227,724 mementos of news articles to understand how authors used their metadata budget. In 2008, metadata usage exploded. When we break down usage by individual fields, we see that authors favor fields associated with social cards. This insight can help future metadata standard authors understand what spurs metadata adoption. S. M. Jones, V. Neblitt-Jones, M. C. Weigle, M. Klein, and M. L. Nelson, “It's All About The Cards: Sharing on Social Media Probably Encouraged HTML Metadata Growth,” ACM/IEEE Joint Conference on Digital Libraries, 2021. [preprint: https://arxiv.org/abs/2104.04116]
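
The measurement described in the slides (collecting the fields in each page's META elements, grouping them into categories, and reporting the percentage of articles that use at least one field from each category) can be sketched roughly as follows. This is a minimal illustration using only Python's standard library, not the authors' actual pipeline. The names `MetaFieldParser`, `extract_fields`, `categorize`, `category_share`, and `SAMPLE_PAGE` are hypothetical, and the prefix rules are a simplifying assumption about how fields map to the categories named in the talk.

```python
from collections import Counter
from html.parser import HTMLParser

# Illustrative subset of the HTML-standard fields named in the slides.
HTML_STANDARD_FIELDS = {"description", "keywords", "author"}


class MetaFieldParser(HTMLParser):
    """Collect the field name of every META element that has a content attribute."""

    def __init__(self):
        super().__init__()
        self.fields = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # OGP fields are usually given in `property`; most others use `name`.
        field = attributes.get("name") or attributes.get("property")
        if field and "content" in attributes:
            self.fields.append(field.strip().lower())


def extract_fields(html):
    """Return the lowercased metadata field names found in one page."""
    parser = MetaFieldParser()
    parser.feed(html)
    parser.close()
    return parser.fields


def categorize(field):
    """Map a field name to a category (assumed prefix rules, not the paper's)."""
    if field.startswith("og:"):
        return "OGP"
    if field.startswith("twitter:"):
        return "Twitter Cards"
    if field.startswith("fb:"):
        return "Facebook (non-OGP)"
    if field.startswith(("dc.", "dcterms.")):
        return "Dublin Core"
    if field in HTML_STANDARD_FIELDS:
        return "HTML standard"
    return "Other"


def category_share(pages):
    """Percentage of pages that contain at least one field from each category."""
    counts = Counter()
    for html in pages:
        # A set ensures each category is counted at most once per page.
        counts.update({categorize(field) for field in extract_fields(html)})
    return {category: 100.0 * n / len(pages) for category, n in counts.items()}


# Hypothetical page used only to exercise the functions above.
SAMPLE_PAGE = """<html><head>
<meta name="description" content="An example article.">
<meta property="og:title" content="Example">
<meta name="twitter:card" content="summary">
<meta name="DC.creator" content="A. Author">
</head><body></body></html>"""

if __name__ == "__main__":
    print(extract_fields(SAMPLE_PAGE))
    print(category_share([SAMPLE_PAGE]))
```

Applying `category_share` separately to the mementos from each year would give one data point per category per year, which is the kind of series the category slides describe. The mean number of fields per article for a year would be the average length of `extract_fields` over that year's mementos.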