Adrian Stevenson, Senior Technical Coordinator, Jisc Manchester
Tools for Data Manipulation
UKAD Open Refine Workshop, Jisc London, 18th March 2016
Tools for Data Manipulation - Workshop resources at http://data.archiveshub.ac.uk/workshops/ukad2016/
Workshop Resources
Available from:
http://data.archiveshub.ac.uk/workshops/ukad2016/readme.html
Link to Open Refine and plugins
Link to example data used for workshop
Link to completed Open Refine project from today's workshop
Open Refine
OpenRefine (formerly Google Refine) is a powerful tool for
working with messy data: cleaning it; transforming it from
one format into another; and extending it with web
services and external data.
Main Uses:
• Explore data
• Clean and transform data
• Reconcile and match data
Installing and running Open Refine
Download from:
http://openrefine.org/download.html
Run Open Refine, then in a web browser go to: http://127.0.0.1:3333/
Select ‘Create Project’ and browse for the Archives Hub
example CSV data file
Note: You may need to clear the browser cache to see new projects
Clean and Transform - Facets and Clustering
Strip leading and trailing white space
Transform to upper case or title case
Split multi-valued cells, or Edit column > Split into several columns
Facet on label
Order facet values by count
Cluster and rename rows
Undo
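As a rough illustration of what key-collision clustering does, the sketch below groups label variants that reduce to the same normalised key. It is a simplified approximation written for this example, not Open Refine's own implementation, and the sample labels are made up.

```python
import re
import string
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Build a key that ignores case, punctuation, accents and word order."""
    text = value.strip().lower()
    # Decompose accented characters and drop the combining marks.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Remove ASCII punctuation, then split on whitespace.
    text = re.sub("[" + re.escape(string.punctuation) + "]", "", text)
    tokens = sorted(set(text.split()))
    return " ".join(tokens)


def cluster(values):
    """Return groups of distinct spellings that share the same fingerprint."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return [sorted(group) for group in groups.values() if len(group) > 1]


# Hypothetical label values, for illustration only.
labels = ["Webb, Beatrice", "webb  beatrice", "Beatrice Webb.", "Shaw, George Bernard"]
clusters = cluster(labels)
print(clusters)
```

Each cluster is only a candidate for merging; as in the workshop, the variants should be checked by eye before they are renamed to a single value.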
Clean - Remove Duplicate rows
Sort on column with duplicates and reorder permanently
Facet duplicates to check
Watch for Open Refine switching from rows view to records view
Edit cells > Blank Down
Facet by blank
Remove all matching
The essence of Open Refine is using facets and filters to isolate
rows, then invoking commands that affect all of those rows together
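The sort, Blank Down, facet-by-blank and remove sequence above can be sketched outside Open Refine as follows. This is a minimal Python analogue under stated assumptions: rows are plain dictionaries, and the column names and values are hypothetical.

```python
def blank_down(rows, column):
    """Return copies of the rows with consecutive repeated values in `column` blanked."""
    result = []
    previous = object()  # sentinel that never equals a real cell value
    for row in rows:
        new_row = dict(row)
        if row[column] == previous:
            new_row[column] = ""
        else:
            previous = row[column]
        result.append(new_row)
    return result


def remove_duplicates(rows, column):
    """Sort on `column`, blank repeated values, then drop the blanked rows."""
    ordered = sorted(rows, key=lambda row: row[column])
    blanked = blank_down(ordered, column)
    # Rows that were already blank in `column` are dropped too, so check them first.
    return [row for row in blanked if row[column] != ""]


# Hypothetical rows, for illustration only.
rows = [
    {"PersonURI": "person/b", "Resource": "r1"},
    {"PersonURI": "person/a", "Resource": "r2"},
    {"PersonURI": "person/b", "Resource": "r3"},
]
unique = remove_duplicates(rows, "PersonURI")
print(unique)
```

Only the first row for each value survives, so any differing data in the other columns of the removed rows is lost. That is the reason for faceting the duplicates to check them before removing anything.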
URIs
Linked Data Design Issues
Triples
http://www.w3.org/DesignIssues/LinkedData.html
Triples
Triple statements
» ‘Things’ have ‘properties’ with ‘values’
» Subject - Predicate - Object
Repository - Provides Access To - Archival Resource
Jane Austen - Is Author Of - Pride and Prejudice
Triples are the basis of RDF and Linked Data
owl:sameAs
Hub Person - owl:sameAs - VIAF Person
<http://data.archiveshub.ac.uk/id/person/nra/webbmarthabeatrice1858-1943socialreformer>
owl:sameAs
<http://viaf.org/viaf/86607236> .
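To make the subject, predicate, object structure concrete, the sketch below writes the same owl:sameAs statement as a single N-Triples line. The helper name is made up for this example, and it only handles triples whose three parts are all URIs.

```python
# Full URI that the owl:sameAs prefixed name expands to.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def to_ntriples(subject: str, predicate: str, obj: str) -> str:
    """Serialise one triple of three URIs as an N-Triples line."""
    return f"<{subject}> <{predicate}> <{obj}> ."


hub_person = (
    "http://data.archiveshub.ac.uk/id/person/nra/"
    "webbmarthabeatrice1858-1943socialreformer"
)
viaf_person = "http://viaf.org/viaf/86607236"

line = to_ntriples(hub_person, OWL_SAME_AS, viaf_person)
print(line)
```

Literal values such as names or dates would need quoting and escaping, which this sketch deliberately leaves out.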
Matching Names to VIAF
You may need to join columns together to give a more
consistent name form, e.g. using:
cells["FamilyName"].value + ", " + cells["GivenName"].value + ", " + cells["Dates"].value
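The same join can be sketched in Python with blank cells skipped, so that a missing date or given name does not break the whole name. The column names come from the expression above; the function name and the skip-blanks behaviour are assumptions for illustration, not what the GREL expression itself does.

```python
def join_name(row, columns=("FamilyName", "GivenName", "Dates"), separator=", "):
    """Join the non-blank values of `columns` into one name string."""
    parts = []
    for column in columns:
        value = row.get(column)
        # Skip missing cells and cells that contain only white space.
        if value is not None and str(value).strip():
            parts.append(str(value).strip())
    return separator.join(parts)


complete = {"FamilyName": "Webb", "GivenName": "Martha Beatrice", "Dates": "1858-1943"}
no_dates = {"FamilyName": "Webb", "GivenName": "Martha Beatrice", "Dates": None}

print(join_name(complete))
print(join_name(no_dates))
```

Skipping blanks avoids trailing commas, but it also means joined names can have different numbers of parts, which may affect how well they match.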
Matching Names to VIAF
VIAF reconciliation service details at:
http://iphylo.blogspot.co.uk/2013/04/reconciling-author-names-using-open.html
You may need to add it as a ‘standard service’ under Reconcile >
Start reconciling. Service URL is:
http://iphylo.org/~rpage/phyloinformatics/services/reconciliation_viaf.php
Other reconciliation services, e.g. LCSH, at:
https://github.com/OpenRefine/OpenRefine/wiki/Reconcilable-Data-Sources
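For orientation, a standard reconciliation service receives a batch of lookups as a JSON object in a form parameter named `queries`. The sketch below only builds and decodes such a request body; it sends nothing, and whether any particular service URL still responds is not something this example assumes. The function name and the limit of 3 are illustrative choices.

```python
import json
from urllib.parse import parse_qs, urlencode


def build_reconcile_body(names, limit=3):
    """Build a form-encoded `queries` body with one entry per name."""
    queries = {
        f"q{index}": {"query": name, "limit": limit}
        for index, name in enumerate(names)
    }
    return urlencode({"queries": json.dumps(queries)})


body = build_reconcile_body(["Webb, Martha Beatrice, 1858-1943"])

# Decode the body again to show the structure a service would receive.
decoded = json.loads(parse_qs(body)["queries"][0])
print(decoded)
```

Open Refine builds these requests itself once a service has been added, so this is only useful for understanding or testing a service by hand.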
RDF Export
Download the RDF Refine Extension from http://refine.deri.ie/
Unzip
Open Project > Browse workspace directory
Create an ‘extensions’ folder (if it doesn’t exist)
Copy the unzipped RDF Refine folder into the ‘extensions’ folder in the workspace directory
Restart Open Refine
Need to create a column with VIAF URIs for export:
"http://viaf.org/viaf/"+cell.recon.match.id
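The VIAF URI column used for RDF export is a simple concatenation of a base URI and the identifier of the matched record. A minimal Python sketch of that step follows; it assumes the match identifiers are already available as plain strings and returns None for rows with no accepted match, which is a choice made for this example.

```python
VIAF_BASE = "http://viaf.org/viaf/"


def viaf_uri(match_id):
    """Return the VIAF URI for a reconciled match id, or None if there is no match."""
    if match_id is None or not str(match_id).strip():
        return None
    return VIAF_BASE + str(match_id).strip()


print(viaf_uri("86607236"))
print(viaf_uri(None))
```

Handling the unmatched case explicitly keeps rows without a match from producing a broken URI in the exported data.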
Matching Subjects to LCSH
Click the RDF button in the top right corner and select ‘Add reconciliation
service’, then ‘Based on SPARQL endpoint’.
Add the following parameters:
Name: LCSH
Endpoint URL: http://sparql.freeyourmetadata.org/
Graph URI: http://id.loc.gov/authorities/subjects
Type: Virtuoso
Label properties: check only skos:prefLabel
Martha Beatrice Webb
Place of birth: Gloucester, England
Place of death: Liphook, Hampshire, England
Life dates: 1858-1943
Epithet: social reformer and historian
Family name: Webb
Image
Biographical Notes
from: Beatrice Webb letters
Beatrice Webb (1858 - 1943). Fabian Socialist, social reformer, writer,
historian, diarist. Wife, collaborator and assistant of Sidney Webb,
later Lord Passfield. Together they contributed to the radical
ideology first of the Liberal Party and later of the Labour Party.
from: Beatrice Webb, A summer holiday in Scotland, 1884.
Beatrice Webb (1858-1943), nee Potter, social reformer and diarist.
Married to Sidney Webb, pioneers of social science. She was
involved in many spheres of political and social activity including the
Labour Party, Fabianism, social observation, investigations into
poverty, development of socialism, the foundation of the National
Health Service and post war welfare state, the London School of
Works
Our Partnership
My Apprenticeship
The case for the factory acts
Beatrice Webb’s diaries; edited by Margaret Cole
The Diary
Knows
http://dbpedia.org/page/George_Bernard_Shaw
http://dbpedia.org/page/Sidney_Webb,_1st_Baron_Passfield
Contact
Adrian Stevenson
Senior Technical Coordinator
Jisc Manchester
http://www.jisc.ac.uk
adrian.stevenson@jisc.ac.uk
http://www.twitter.com/adrianstevenson
https://www.linkedin.com/in/adrianstevenson
CC License
This presentation is available under a Creative Commons
Non Commercial-Share Alike licence:
http://creativecommons.org/licenses/by-nc/2.0/uk/



Editor's Notes

  1. Hub used mainly for linked data project where we wanted to match to VIAF. Will come to later in the workshop.
  2. Review options on import screen. Talk through the example data and the purpose of the columns.
  3. Facet
  4. Mention that a facet on duplicates for person URI doesn’t necessarily mean you want to remove the rows, as the Archival Resource URIs may be different. It depends on what you want to do. More tutorials: http://kb.refinepro.com/2011/08/remove-duplicate.html and http://enipedia.tudelft.nl/wiki/OpenRefine_Tutorial#Deduplicate_entries
  5. Explain why might want to reconcile to VIAF. Other recon services at https://github.com/OpenRefine/OpenRefine/wiki/Reconcilable-Data-Sources
  6. http://www.w3.org/DesignIssues/LinkedData.html
  7. If any of the cells in the columns are blank, the merge will fail for that row. To fix, create a facet of blank cells with "Text Facet" ⇒ "Customized Facets" ⇒ "Facet by Blank". Then use "Edit Cells" ⇒ "Transform ..." and enter a string with a space: ' '. This also has its limitations, as some names have an inconsistent number of commas.
  8. Talk through faceting on judgment. How to check and accept reconciled rows. Explain that this is why the Hub URI and Archival Resource URI are included, for manual checking.
  9. Mock-up of the Linking Lives interface shows the way data is brought together.