3. Categorizing Data Providers
• US Government Agencies
• Agencies are divided according to the US Federal Government Reference Model
• Each agency is in charge of publishing its related datasets
• The Data.gov catalog also provides topic-related categorization
Nooshin Allahyari
5. Dataset Collection
• All 25 datasets were collected from Data.gov
• Datasets are in RDF format
• Difficulties running huge datasets
• Using different tools as endpoints
▫ Virtuoso commercial version as SPARQL endpoint
Easy to install
GUI
Lots of visual tools
SQL, SQL tools, and connection tools
• Increasing the number of datasets for reliability
8. Ontology Vocabulary Usage
• FEA Reference Model Ontology (RMO)
• Vocabulary Related to Government Context
▫ General Vocabulary
Country
State
City
▫ Government programs, Services:
Health Program
Cultural Program
9. Annotation Property
• Useful for providing additional information about datasets. All datasets have:
▫ rdfs:label
▫ rdfs:comment
▫ No language tag or metadata
Some datasets from the Italy dataset catalog in TWC LOGD contain a language tag.
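Checking whether an annotation value carries a language tag can be sketched as below. This is a minimal illustration that assumes the literals are available as N-Triples style strings; the function name `language_tag` and the two sample labels are hypothetical and are not taken from the datasets.

```python
import re

# Matches a plain or language-tagged literal in N-Triples style,
# e.g. "Station" (no tag) or "Stazione"@it (tagged).
LITERAL = re.compile(
    r'^"(?P<text>(?:[^"\\]|\\.)*)"'
    r'(?:@(?P<lang>[A-Za-z]+(?:-[A-Za-z0-9]+)*))?$'
)

def language_tag(literal):
    """Return the language tag of a literal, or None if it has none."""
    match = LITERAL.match(literal.strip())
    if match is None:
        # Typed literals ("3"^^<...>) and malformed input end up here.
        raise ValueError(f"not a plain or language-tagged literal: {literal!r}")
    return match.group("lang")

# Hypothetical label values, for illustration only.
labels = ['"Station"', '"Stazione"@it']
untagged = [lit for lit in labels if language_tag(lit) is None]
```

Running the same check over every rdfs:label and rdfs:comment value of a dataset would show whether any annotation in it is language tagged.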
11. Concept Coverage
• Same Concept in all datasets
• Metadata for Data.gov wiki and TWC LOGD
Prefix    Concept
foaf      Homepage
rdfs      isDefinedBy
dcterms   Source
dgtwc     uses-property
dgtwc     number-of-triples
dgtwc     number-of-properties
dgtwc     number-of-entries
12. Concept Coverage
• General concepts related to government
• Low coverage of concepts
• Multi-name concepts
Concept                Coverage (%)
State                  48%
City                   32%
State-Abbreviation     16%
Region                 12%
Zip                    12%
Country                8%
Country origin code    8%
Area code              8%
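The coverage figures can be read as the share of the 25 collected datasets that use each concept. The sketch below makes that arithmetic explicit. The per-concept dataset counts are an assumption: they are back-computed from the reported percentages and are not stated on the slides.

```python
TOTAL_DATASETS = 25  # collection size stated on the Dataset Collection slide

# ASSUMPTION: counts back-computed from the reported percentages
# (e.g. 48% of 25 datasets = 12); the slides do not list them.
datasets_using = {
    "State": 12,
    "City": 8,
    "State-Abbreviation": 4,
    "Region": 3,
    "Zip": 3,
    "Country": 2,
    "Country origin code": 2,
    "Area code": 2,
}

def coverage(concept):
    """Percentage of the collected datasets that use the given concept."""
    return 100 * datasets_using[concept] / TOTAL_DATASETS

for concept in datasets_using:
    print(f"{concept:<22}{coverage(concept):.0f}%")
```

Every reported percentage is a multiple of 4%, which is consistent with a collection of 25 datasets.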
14. Case-Based Analysis
• Three datasets from the same agency in the same category
▫ Department of Veterans Affairs
dataset1213
dataset1288
dataset1290
• The query results for each dataset show that all three have similar concepts
State
City
VISN
Station
15. Case-Based Analysis-1288
• The query lists every station with its VISN code in each city and determines the state in which the city is located:
SELECT DISTINCT ?city ?station ?visn ?st
WHERE
{
  ?s <http://www.data.gov/semantic/data/alpha/1288/dataset-1288.rdf#city> ?city .
  OPTIONAL { ?s <http://www.data.gov/semantic/data/alpha/1288/dataset-1288.rdf#station> ?station }
  OPTIONAL { ?s <http://www.data.gov/semantic/data/alpha/1288/dataset-1288.rdf#visn> ?visn }
  OPTIONAL { ?s <http://www.data.gov/semantic/data/alpha/1288/dataset-1288.rdf#st> ?st }
}
State  VISN  Station  City
"NJ"   "3"   "561"    "East Orange"
"NY"   "3"   "620"    "Montrose"
"NY"   "3"   "630"    "New York Harbor"
"NY"   "3"   "632"    "Northport"
"DE"   "4"   "460"    "Wilmington"
"PA"   "4"   "503"    "Altoona"
"PA"   "4"   "529"    "Butler"
"WV"   "4"   "540"    "Clarksburg"
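The queries in this case study share one shape: a per-dataset namespace, one required property, and a few optional ones. A small helper that assembles such a query might look like the sketch below. The helper name `build_query` and its parameters are hypothetical; the namespace pattern is the one visible in the query URIs, and the property names passed in must match what each dataset actually uses.

```python
# Namespace pattern as it appears in the query URIs on the slides.
BASE = "http://www.data.gov/semantic/data/alpha/{id}/dataset-{id}.rdf#"

def build_query(dataset_id, required, optional=()):
    """Assemble a SELECT DISTINCT query over one dataset's properties.

    required: property names that every result row must have.
    optional: property names wrapped in OPTIONAL { ... }.
    """
    ns = BASE.format(id=dataset_id)
    variables = " ".join(f"?{name}" for name in (*required, *optional))
    lines = [f"SELECT DISTINCT {variables}", "WHERE", "{"]
    for name in required:
        lines.append(f"  ?s <{ns}{name}> ?{name} .")
    for name in optional:
        lines.append(f"  OPTIONAL {{ ?s <{ns}{name}> ?{name} }}")
    lines.append("}")
    return "\n".join(lines)

# Rebuilds the dataset 1288 query from its property names.
query_1288 = build_query(1288, ["city"], ["station", "visn", "st"])
print(query_1288)
```

Because the property names differ between datasets (for example `st` here and `state` in dataset 1213), they are passed in explicitly rather than hard-coded.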
16. Case-Based Analysis-1290
• The query lists every station with its VISN code in each city and determines the state in which the city is located:
SELECT DISTINCT ?city ?station ?visn ?st
WHERE
{
  ?s <http://www.data.gov/semantic/data/alpha/1288/dataset-1288.rdf#city> ?city .
  OPTIONAL { ?s <http://www.data.gov/semantic/data/alpha/1288/dataset-1288.rdf#station> ?station }
  OPTIONAL { ?s <http://www.data.gov/semantic/data/alpha/1288/dataset-1288.rdf#visn> ?visn }
  OPTIONAL { ?s <http://www.data.gov/semantic/data/alpha/1288/dataset-1288.rdf#st> ?st }
}
State  VISN  Station  City
"ME"   "1"   "402"    "Togus"
"VT"   "1"   "405"    "White River Junction"
"MA"   "1"   "518"    "Bedford"
"MA"   "1"   "523"    "West Roxbury"
"NH"   "1"   "608"    "Manchester"
"MA"   "1"   "631"    "Northampton"
"RI"   "1"   "650"    "Providence"
"CT"   "1"   "689"    "West Haven"
17. Case-Based Analysis-1213
• The query lists the VISN of each city and determines the state in which the city is located:
SELECT DISTINCT ?visn ?city ?state
WHERE
{
  ?s <http://www.data.gov/semantic/data/alpha/1213/dataset-1213.rdf#visn> ?visn .
  ?s <http://www.data.gov/semantic/data/alpha/1213/dataset-1213.rdf#city> ?city .
  ?s <http://www.data.gov/semantic/data/alpha/1213/dataset-1213.rdf#state> ?state
}
State  VISN  City
"CT"   "1"   "West Haven"
"MA"   "1"   "Bedford"
"MA"   "1"   "West Roxbury"
"MA"   "1"   "Northampton"
"ME"   "1"   "Togus"
"NH"   "1"   "Manchester"
"RI"   "1"   "Providence"
"VT"   "1"   "White River Junction"
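The rows shown for dataset 1290 and dataset 1213 describe the same facilities under different property names. The sketch below makes that comparison concrete using only the rows printed on the slides. The `CANONICAL` mapping is a hand-written assumption for illustration, and the property names follow the queries as printed.

```python
# Rows copied from the result tables shown for dataset 1290 and dataset 1213.
rows_1290 = [
    {"st": "ME", "visn": "1", "station": "402", "city": "Togus"},
    {"st": "VT", "visn": "1", "station": "405", "city": "White River Junction"},
    {"st": "MA", "visn": "1", "station": "518", "city": "Bedford"},
    {"st": "MA", "visn": "1", "station": "523", "city": "West Roxbury"},
    {"st": "NH", "visn": "1", "station": "608", "city": "Manchester"},
    {"st": "MA", "visn": "1", "station": "631", "city": "Northampton"},
    {"st": "RI", "visn": "1", "station": "650", "city": "Providence"},
    {"st": "CT", "visn": "1", "station": "689", "city": "West Haven"},
]
rows_1213 = [
    {"state": "CT", "visn": "1", "city": "West Haven"},
    {"state": "MA", "visn": "1", "city": "Bedford"},
    {"state": "MA", "visn": "1", "city": "West Roxbury"},
    {"state": "MA", "visn": "1", "city": "Northampton"},
    {"state": "ME", "visn": "1", "city": "Togus"},
    {"state": "NH", "visn": "1", "city": "Manchester"},
    {"state": "RI", "visn": "1", "city": "Providence"},
    {"state": "VT", "visn": "1", "city": "White River Junction"},
]

# ASSUMPTION: hand-written mapping from dataset-specific property names
# to one shared concept name; no such mapping exists in the datasets.
CANONICAL = {"st": "state", "state": "state", "visn": "visn", "city": "city"}

def normalize(rows):
    """Project rows onto the shared concepts (state, visn, city)."""
    result = set()
    for row in rows:
        renamed = {CANONICAL[k]: v for k, v in row.items() if k in CANONICAL}
        result.add((renamed["state"], renamed["visn"], renamed["city"]))
    return result

same_facilities = normalize(rows_1290) == normalize(rows_1213)
print(same_facilities)
```

Once `st` and `state` are treated as one concept, the two excerpts contain the same set of (state, VISN, city) combinations, which illustrates the multi-name concept problem within a single agency.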
21. Conclusion
• No government ontology has been used in the experimental datasets
• Weak vocabulary usage in US Government
• Multi-vocabulary usage for the same concept
• Multi-vocabulary usage within the same government agency
• Lack of a well-defined, coherent, and consistent government ontology