Tools for Data Manipulation - UKAD Open Refine Workshop

Held at Jisc London 18th March 2016.

Details at http://www.nationalarchives.gov.uk/archives-sector/engaging-with-ukad.htm

  1. Adrian Stevenson, Senior Technical Coordinator, Jisc Manchester
     Tools for Data Manipulation: UKAD Open Refine Workshop, Jisc London, 18th March 2016
  2. Workshop Resources
     Tools for Data Manipulation - Workshop resources at http://data.archiveshub.ac.uk/workshops/ukad2016/
     Available from: http://data.archiveshub.ac.uk/workshops/ukad2016/readme.html
       • Link to Open Refine and plugins
       • Link to example data used for workshop
       • Link to completed Open Refine project from today’s workshop
  3. Open Refine
     OpenRefine (formerly Google Refine) is a powerful tool for working with messy data: cleaning it; transforming it from one format into another; and extending it with web services and external data.
     Main uses:
       • Explore data
       • Clean and transform data
       • Reconcile and match data
  4. Installing and running Open Refine
       • Download from: http://openrefine.org/download.html
       • Run it, then in a web browser go to: http://127.0.0.1:3333/
       • Select ‘create project’ and browse for the Archives Hub example CSV data file
       • Note: you may need to clear the browser cache to see new projects
  5. Clean and Transform - Facets and Clustering
       • Strip white space
       • Transform: upper case, title case
       • Split multi-valued cells, or Edit column > Split into several columns
       • Facet on label
       • Order by count
       • Cluster and rename rows
       • Undo
  6. Clean - Remove Duplicate Rows
       • Sort on the column with duplicates and reorder rows permanently
       • Facet duplicates to check
       • Watch for Open Refine switching from rows view to records view
       • Edit cells > Blank down
       • Facet by blank
       • Remove all matching rows
     The essence of Open Refine is using facets and filters to isolate rows, then invoking commands that affect all of those rows together.
  8. Linked Data (LD) Design Issues: http://www.w3.org/DesignIssues/LinkedData.html
       • URIs
       • Triples
  9. Triples
     Triple statements:
       » ‘Things’ have ‘properties’ with ‘values’
       » Subject - Predicate - Object
     Examples:
       • Repository - Provides Access To - Archival Resource
       • Jane Austen - Is Author Of - Pride and Prejudice
     Triples are the basis of RDF and Linked Data.
  10. owl:sameAs
      Hub Person - owl:sameAs - VIAF Person
      <http://data.archiveshub.ac.uk/id/person/nra/webbmarthabeatrice1858-1943socialreformer> owl:sameAs <http://viaf.org/viaf/86607236> .
  11. Matching Names to VIAF
      You may need to join columns together, for example to give a more consistent name form, e.g. using:
      cells["FamilyName"].value + ", " + cells["GivenName"].value + ", " + cells["Dates"].value
  12. Matching Names to VIAF
      VIAF reconciliation service details at: http://iphylo.blogspot.co.uk/2013/04/reconciling-author-names-using-open.html
      May need to add as a ‘standard service’ under Reconcile > Start reconciling. Service URL is: http://iphylo.org/~rpage/phyloinformatics/services/reconciliation_viaf.php
      Other reconciliation services, e.g. LCSH, at: https://github.com/OpenRefine/OpenRefine/wiki/Reconcilable-Data-Sources
  13. RDF Export
       • Download the RDF Refine extension from http://refine.deri.ie/
       • Unzip
       • Open Project > Browse workspace directory
       • Create an ‘extensions’ folder (if it doesn’t exist)
       • Copy the unzipped RDF Refine folder to workspace directory
       • Restart Open Refine
      Need to create a column with VIAF URIs for export:
      "http://viaf.org/viaf/"+cell.recon.match.id
  14. Matching Subjects to LCSH
      Click the RDF button in the top right corner, select ‘Add reconciliation service’ > ‘Based on SPARQL endpoint’. Add the following parameters:
        • Name: LCSH
        • Endpoint URL: http://sparql.freeyourmetadata.org/
        • Graph URI: http://id.loc.gov/authorities/subjects
        • Type: Virtuoso
        • Label properties: check only skos:prefLabel
  15. Martha Beatrice Webb
        • Place of birth: Gloucester, England
        • Place of death: Liphook, Hampshire, England
        • Life dates: 1858-1943
        • Epithet: social reformer and historian
        • Family name: Webb
      (Image)
      Biographical Notes
        • From: Beatrice Webb letters
          Beatrice Webb (1858 - 1943). Fabian Socialist, social reformer, writer, historian, diarist. Wife, collaborator and assistant of Sidney Webb, later Lord Passfield. Together they contributed to the radical ideology first of the Liberal Party and later of the Labour Party.
        • From: Beatrice Webb, A summer holiday in Scotland, 1884.
          Beatrice Webb (1858-1943), nee Potter, social reformer and diarist. Married to Sidney Webb, pioneers of social science. She was involved in many spheres of political and social activity including the Labour Party, Fabianism, social observation, investigations into poverty, development of socialism, the foundation of the National Health Service and post war welfare state, the London School of
      Works
        • Our Partnership
        • My Apprenticeship
        • The case for the factory acts
        • Beatrice Webb’s diaries; edited by Margaret Cole
        • The Diary
      Knows
        • http://dbpedia.org/page/George_Bernard_Shaw
        • http://dbpedia.org/page/Sidney_Webb,_1st_Baron_Passfield
  16. Contact
      Adrian Stevenson, Senior Technical Coordinator, Jisc Manchester
        • http://www.jisc.ac.uk
        • adrian.stevenson@jisc.ac.uk
        • http://www.twitter.com/adrianstevenson
        • https://www.linkedin.com/in/adrianstevenson
  17. CC License
      This presentation is available under a Creative Commons Non Commercial-Share Alike licence: http://creativecommons.org/licenses/by-nc/2.0/uk/
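
The ‘Cluster and rename rows’ step on slide 5 groups label variants that differ only in case, spacing, punctuation or word order. The slides do not say which clustering method was used in the workshop; the Python sketch below assumes a key-collision approach that approximates Open Refine's fingerprint keying. The function names (`fingerprint`, `cluster`) and the sample labels are hypothetical, chosen only for illustration.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Reduce a label to a normalised key (an approximation of fingerprint keying)."""
    text = value.strip().lower()
    # Fold accented characters to their ASCII base letters.
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    # Drop punctuation, keeping letters, digits, underscores and whitespace.
    text = re.sub(r"[^\w\s]", "", text)
    # Sort and de-duplicate tokens so that word order does not matter.
    return " ".join(sorted(set(text.split())))


def cluster(values):
    """Group values sharing a fingerprint; keep only groups with real variants."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(set(group)) > 1]


# Made-up sample labels: two variants of one name and one unrelated name.
labels = ["Webb, Beatrice", "beatrice webb ", "Shaw, George Bernard"]
clusters = cluster(labels)
```

Each returned group is a candidate for renaming to a single preferred form, which is the decision Open Refine leaves to the user in its clustering dialogue.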
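
The duplicate-removal recipe on slide 6 (sort, Blank down, facet by blank, remove matching rows) can be sketched outside Open Refine as follows. This is a minimal Python illustration, not Open Refine's own code: rows are assumed to be dictionaries, and the function names, the `label` and `id` columns, and the sample data are all hypothetical.

```python
def blank_down(rows, column):
    """Blank a cell when it repeats the value of the row directly above it."""
    result = []
    previous = None
    for row in rows:
        row = dict(row)  # copy so the input rows are left untouched
        if row[column] == previous:
            row[column] = ""
        else:
            previous = row[column]
        result.append(row)
    return result


def remove_duplicate_rows(rows, column):
    """Sort on the column, blank down, then drop the rows left blank."""
    ordered = sorted(rows, key=lambda row: row[column])  # stable sort
    blanked = blank_down(ordered, column)
    return [row for row in blanked if row[column] != ""]


# Made-up sample data with one repeated label.
rows = [
    {"id": 1, "label": "Webb, Beatrice"},
    {"id": 2, "label": "Shaw, George Bernard"},
    {"id": 3, "label": "Webb, Beatrice"},
]
deduped = remove_duplicate_rows(rows, "label")
```

Because the sort is stable, the first row of each duplicate group in the original order is the one that survives. As in Open Refine, rows whose cell was already blank before the operation would be removed as well, so those are worth checking first.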
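
The column join on slide 11 is written in Open Refine's expression language. For readers preparing data before import, an equivalent can be sketched in Python. The column names come from the slide; the function name `name_form` and the sample row are assumptions, and skipping empty parts is a choice made here rather than something the slide's expression does.

```python
def name_form(row):
    """Join family name, given name and dates into one 'Family, Given, Dates' label."""
    parts = [row.get("FamilyName"), row.get("GivenName"), row.get("Dates")]
    # Skip missing or empty parts so the label never contains ", ," sequences.
    return ", ".join(part.strip() for part in parts if part and part.strip())


# Sample row using the details shown on slide 15.
row = {"FamilyName": "Webb", "GivenName": "Martha Beatrice", "Dates": "1858-1943"}
label = name_form(row)
```

A consistent form such as this gives the reconciliation service a better chance of matching the name against VIAF headings.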
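
Slides 10 and 13 fit together: the VIAF URI column built from the reconciliation match identifier becomes the object of an owl:sameAs statement about the Hub person. The Python sketch below shows that relationship by writing one N-Triples line by hand. It is an illustration only, since in the workshop the export is done by the RDF Refine extension; the function names are hypothetical, and the match identifier is passed in as a plain string.

```python
VIAF_BASE = "http://viaf.org/viaf/"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def viaf_uri(match_id: str) -> str:
    """Mirror the slide's expression: "http://viaf.org/viaf/" + cell.recon.match.id."""
    return VIAF_BASE + match_id


def same_as_triple(hub_uri: str, match_id: str) -> str:
    """Serialise subject, predicate and object as a single N-Triples line."""
    return f"<{hub_uri}> <{OWL_SAME_AS}> <{viaf_uri(match_id)}> ."


# URIs taken from slide 10.
hub_person = (
    "http://data.archiveshub.ac.uk/id/person/nra/"
    "webbmarthabeatrice1858-1943socialreformer"
)
triple = same_as_triple(hub_person, "86607236")
```

The result is the same statement as on slide 10, with the `owl:` prefix expanded to its full namespace URI.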
