Intro to OpenRefine
An overview & walkthrough to get you started.
• intro/overview (15 min)
• walkthrough (45 min)
• intro to advanced (10 min)
• q&a (20 min)
http://www.txdhc.org/txdhc-training-webcast-materials/
Jennifer Hecker & Liz Grumbach
“a tool for working with messy data”
Cleaning up data that:
• is in a simple tabular format
• is inconsistently formatted
• has inconsistent terminology
Refine lets you:
• get an overview of a data set
• resolve inconsistencies
• split data up into more granular parts
• match local data up to other data sets
• enhance a data set with data from other sources
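The “split data up into more granular parts” item corresponds to Refine’s split and join operations on multi-valued cells. As a rough sketch of the idea in plain Python (not OpenRefine’s own code), assuming a hypothetical `subjects` column that uses semicolons as separators, the functions below split each cell into one value per row and then join the values back together:

```python
def split_multivalued(rows, column, separator):
    """Return new rows with `column` split on `separator`, one value per row."""
    result = []
    for row in rows:
        for part in row[column].split(separator):
            new_row = dict(row)             # copy the other columns unchanged
            new_row[column] = part.strip()  # trim stray spaces around each value
            result.append(new_row)
    return result


def join_multivalued(rows, key, column, separator):
    """Inverse operation: gather the values that share `key` back into one cell."""
    joined = {}
    for row in rows:
        joined.setdefault(row[key], []).append(row[column])
    return {k: separator.join(v) for k, v in joined.items()}


# Hypothetical records; the column names are made up for illustration.
records = [
    {"id": "1", "subjects": "Texas; Music; Fanzines"},
    {"id": "2", "subjects": "Archives"},
]

for r in split_multivalued(records, "subjects", ";"):
    print(r)
```

One design difference: this sketch copies the other columns onto every new row, whereas OpenRefine leaves them blank on the extra rows and treats the group as a single record.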
https://cms-assets.tutsplus.com/uploads/users/199/posts/20843/image/text-facet-openrefine.png
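A text facet essentially lists each distinct value in a column with a count of how many rows contain it, which is why near-duplicates stand out. A minimal Python sketch of that counting, using a made-up column of city names:

```python
from collections import Counter

# Hypothetical column values with the kind of inconsistencies a text facet exposes.
values = ["Austin", "austin", "Austin ", "Houston", "Austin", "College Station"]


def text_facet(column_values):
    """Count each distinct value, most common first."""
    return Counter(column_values).most_common()


for value, count in text_facet(values):
    print(f"{value!r}: {count}")
```

Here the list is sorted by count; in Refine you can sort a facet's choices by name or by count. Note that "Austin", "austin" and "Austin " are counted as three different values, which is exactly the kind of inconsistency the facet helps you spot.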
https://cms-assets.tutsplus.com/uploads/users/199/posts/20843/image/clustering-openrefine.png
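One of the clustering methods Refine offers is key collision with a “fingerprint” key: each value is normalized (trimmed, lowercased, punctuation removed, accents folded to ASCII, tokens sorted and de-duplicated), and values whose keys match are grouped for review. The Python below is an approximation of that keying idea for illustration, not OpenRefine’s implementation; the sample names are invented.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Approximate a fingerprint key: normalize, then sort unique tokens."""
    value = value.strip().lower()
    value = re.sub(r"[^\w\s]", "", value)  # drop punctuation and other symbols
    # Fold accented characters to plain ASCII (é -> e).
    value = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    tokens = sorted(set(value.split()))     # split, de-duplicate, sort tokens
    return " ".join(tokens)


def cluster(values):
    """Group values whose keys collide; keep only groups with 2+ distinct values."""
    groups = defaultdict(set)
    for v in values:
        groups[fingerprint(v)].add(v)
    return [sorted(g) for g in groups.values() if len(g) > 1]


names = [
    "Texas A&M University",
    "texas a&m university",
    "University, Texas A&M",
    "Université Laval",
    "Universite Laval",
    "Rice University",
]

for group in cluster(names):
    print(group)
```

Refine also provides other methods, such as n-gram fingerprints and nearest-neighbor matching, which catch different kinds of variation; as in the demo, you review each proposed cluster before merging.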
Freebase Gridworks = Google Refine = OpenRefine = Refine
…ask some questions about your data set:
• What type of data is it & what format is it in?
• What’s the size of your data set?
• What question do you want to ask your data?
• What do you need to do to find the answer?
Excel: familiar, better for data entry, cut-and-paste operations, no paging to navigate
Google Spreadsheets: similar to Excel, can get external data relatively easily, easy to collaborate and share
Google Fusion Tables: good if you just want to filter, easy to share
Text editor: a powerful text editor can do many things
Unix tools: more challenging to use, but quick, and some things (finding, sorting) are easy
Writing code: the most sophisticated, and the most to learn!
<And now Liz attempts the dangerous LIVE DEMO!>
Regular expressions
• “wildcards on steroids” that allow for more granular data manipulation (http://www.regular-expressions.info)
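To illustrate, here is a small Python sketch using the standard `re` module on a hypothetical column of messy dates: one pattern pulls out a plausible four-digit year, and another rewrites `M/D/YYYY` dates as ISO `YYYY-MM-DD` (assuming month-first order). Inside Refine you would write similar patterns in GREL functions such as `match()` or `replace()`; the Python here only demonstrates the regex logic.

```python
import re

# Hypothetical messy date values from a spreadsheet column.
dates = ["ca. 1923", "5/1/1923", "1923-05-01", "undated", "circa 1880s"]

# A plausible year (1500-2099) that is not part of a longer run of digits.
YEAR = re.compile(r"(?<!\d)(1[5-9]\d{2}|20\d{2})(?!\d)")

# A whole-cell US-style date; assumes month comes before day.
US_DATE = re.compile(r"^(\d{1,2})/(\d{1,2})/(\d{4})$")


def extract_year(value):
    """Return the first plausible four-digit year in `value`, or None."""
    match = YEAR.search(value)
    return match.group(1) if match else None


def us_to_iso(value):
    """Rewrite M/D/YYYY as YYYY-MM-DD; leave anything else untouched."""
    match = US_DATE.match(value)
    if not match:
        return value
    month, day, year = match.groups()
    return f"{year}-{int(month):02d}-{int(day):02d}"


for d in dates:
    print(d, "|", extract_year(d), "|", us_to_iso(d))
```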
Transformations using the General Refine Expression Language (GREL)
• kind of like a formula in Excel
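For comparison with an Excel formula: a GREL expression such as `value.trim().toTitlecase()` is applied to every cell in a column at once. The Python below mimics that behavior on a hypothetical column; the whitespace-collapsing step is an extra (Refine offers it separately as a common transform), and Python’s `str.title()` may not capitalize exactly like GREL in edge cases.

```python
import re


def transform(value):
    """Rough Python analogue of value.trim().toTitlecase(), plus whitespace collapsing."""
    value = value.strip()               # like GREL trim()
    value = re.sub(r"\s+", " ", value)  # extra step: collapse internal runs of whitespace
    return value.title()                # like GREL toTitlecase(); rules may differ slightly


# Hypothetical column; the expression runs once per cell, like filling a formula down.
column = ["  austin  ", "SAN   ANTONIO", "el paso"]
print([transform(v) for v in column])
```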
Retrieve data from online sources
• example: use names to retrieve birth/death dates from the Virtual International Authority File (VIAF)
Match data to external data sources using
• extensions for RDF, DBpedia, Named-Entity Recognition (NER), etc.
• ‘reconciliation’ services
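Reconciliation services used by OpenRefine generally follow a simple protocol: the client sends a batch of queries as a JSON object keyed by IDs such as `q0`, and the service returns candidate matches with an `id`, `name`, `score` and a `match` flag. The sketch below, based on our understanding of that protocol, only builds the payload and picks a candidate from a response; it makes no network call, and the response shown is invented for illustration.

```python
import json


def build_queries(names, limit=3):
    """Batch names into a 'queries' JSON object keyed q0, q1, ..."""
    return json.dumps({f"q{i}": {"query": name, "limit": limit}
                       for i, name in enumerate(names)})


def best_match(response, key):
    """Pick the candidate flagged as a match, else the highest-scoring one (or None)."""
    candidates = response.get(key, {}).get("result", [])
    for candidate in candidates:
        if candidate.get("match"):
            return candidate
    return max(candidates, key=lambda c: c["score"], default=None)


print(build_queries(["Mark Twain", "Zora Neale Hurston"]))

# Invented response, shaped like a reconciliation reply, for illustration only.
response = {
    "q0": {"result": [
        {"id": "X1", "name": "Example Person", "score": 88.0, "match": False},
        {"id": "X2", "name": "Example Person (1900-1980)", "score": 95.5, "match": False},
    ]},
    "q1": {"result": []},
}
print(best_match(response, "q0"))
```

Refine itself shows you these candidates so you can confirm or reject each one; choosing automatically by score, as here, is a simplification.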
Use the ‘cross’ function to compare the contents of two Refine projects, or to share data between the two projects.
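Conceptually, `cross` looks up rows in another project whose key column equals the current cell’s value; in GREL this is typically written along the lines of `cell.cross("Other Project", "Key Column")`. The plain-Python sketch below shows the same lookup-and-copy idea between two hypothetical tables (the project and column names are made up).

```python
def cross(value, other_rows, key_column):
    """Return the rows of another table whose `key_column` equals `value`."""
    return [row for row in other_rows if row.get(key_column) == value]


# Two hypothetical "projects": letters, and a list of people with life dates.
letters = [{"author": "Jane Doe"}, {"author": "John Roe"}]
people = [{"name": "Jane Doe", "dates": "1850-1920"}]

# Copy a value across from the matching row, leaving a blank when nothing matches.
for letter in letters:
    matches = cross(letter["author"], people, "name")
    letter["author_dates"] = matches[0]["dates"] if matches else ""

print(letters)
```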
• TxDHC blog post on this webinar: http://www.txdhc.org/txdhc-training-webcast-materials/
• The OpenRefine Wiki: https://github.com/OpenRefine/OpenRefine/wiki
• OpenRefine User Documentation: https://github.com/OpenRefine/OpenRefine/wiki/Documentation-For-Users
• The ‘Free your metadata’ site: http://freeyourmetadata.org...
• …and book: http://book.freeyourmetadata.org
• The OpenRefine mailing list and forum: http://groups.google.com/d/forum/openrefine
http://bit.ly/1uGPd0f
Please email us if you have any questions:
Jennifer = jenniferraehecker@gmail.com
Liz = egrumbac@tamu.edu
credits * acknowledgements * citations
These slides were developed by Jennifer Hecker (j.hecker@Austin.utexas.edu) and Liz Grumbach (egrumbac@tamu.edu)
on behalf of the University of Texas Libraries, Texas A&M’s Initiative for Digital Humanities, Media and Culture, and the Texas
Digital Humanities Consortium, using many resources including the wonderful course material developed by Owen
Stephens on behalf of the British Library (http://www.meanboyfriend.com/overdue_ideas/2014/11/working-with-data-using-openrefine/).
Unless otherwise stated, all images, audio or video content are separate works with their own license, and should not be
assumed to be CC-BY in their own right. This work is licensed under a Creative Commons Attribution 4.0 International
License http://creativecommons.org/licenses/by/4.0/. It is suggested when crediting this work, you include the phrase
“Developed by Liz Grumbach and Jennifer Hecker on behalf of the University of Texas, Texas A&M, and the TXDHC.”
Thanks to the University of Texas Libraries, Texas A&M’s Initiative for Digital Humanities, and the Texas Digital Humanities
Consortium for facilitating this presentation.


TXDHC OpenRefine Training


Speaker Notes

  1. Howdy there everybody! Thanks for joining this inaugural webinar from the Texas Digital Humanities Consortium. We are testing out this format for ongoing consortial training use.
  2. This session is being recorded, and you may follow along with these slides or access the recording, slides and supplementary materials on the Texas Digital Humanities Consortium website. Also on the website is a link to a three-question survey, and we would very much appreciate any feedback you are willing to provide. During the webinar, Liz and I are going to trade off presenting and chat-window-monitoring duties. Please be patient and cross your fingers for us!
  3. I’m going to introduce today’s presenters very quickly. I’m Jennifer Hecker and I work at the University of Texas Libraries. I specialize in bringing my years of experience as an archivist to bear on our digital access challenges. I also work in the digital humanities space, coordinating collaborations and projects with students, faculty and staff all over UT. In addition, I direct the Austin Fanzine Project and do a lot of outreach and mentoring work. Liz Grumbach works for the Initiative for Digital Humanities, Media, and Culture at Texas A&M as a Research Associate, where she supports faculty, staff, and student Digital Humanities projects and endeavors. She's also the Project Manager for the Advanced Research Consortium and 18thConnect.org, where she organizes peer review, supports the creation of digital editions, and maintains the digital records for all ARC research nodes. She's involved in the management of the Early Modern OCR Project (eMOP), which aims to teach machines how to read early modern fonts and make open source software packages available to other institutions seeking to auto-generate transcriptions of large page image data sets.
  4. An open-source tool for working with messy data. Runs in a browser, but locally – your data don’t leave your machine. Active development community – people creating extensions – and discussion list.
  5. This is some of the basic stuff you might use Refine for. In a little bit, Liz is going to walk you through these functions. Refine does a lot more, too, but today we’re just going to get your feet wet. I’ll come back after the demo and talk a little bit about some of the more advanced possibilities that you can explore…
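  6. Refine lets you get an overview of a data set, resolve inconsistencies, split data up into more granular parts, match local data up to other data sets, and enhance a data set with data from other sources.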
  7. Here’s a slide from a webinar I attended a couple of weeks ago. It’s an example of OpenRefine in action – here being used to normalize data as one step in the workflow of a larger metadata aggregation project. So what does it look like?
  8. Refine lets you split out data that is in one cell into multiple cells, and vice versa.
  9. Here are some simple examples of what we mean when we talk about “normalizing metadata”. Refine lets you easily batch edit data so that it uniformly adheres to your standards.
  10. Here’s what text faceting looks like. It’s useful for getting an overview of your data. Here you can quickly see some inconsistencies you might want to address.
  11. Refine also lets you do something called clustering. [change slide] This is my personal favorite part!
  12. Here’s a little bigger view… Liz will go into more detail during the demo, but basically, Refine groups together values that it thinks are similar, based on a number of factors you can adjust, so that you can review, modify and batch edit them. Faceting and clustering are by far the two functions I tend to use most in Refine.
  13. A little background: in conversation, you’ll probably hear three of these four names for this tool. Nobody calls it Freebase Gridworks anymore, but the other three are all common. The tool began as Freebase Gridworks at Metaweb, which Google acquired and renamed the tool Google Refine. Google then stopped supporting the project, and it continued as a community-run open-source project, hence the name OpenRefine. Lots of folks, myself included, take the lazy approach and just call it Refine.
  14. There are a number of tools out there that can help you manipulate data sets in a variety of ways. How do you know which is right for you? First, ask yourself some questions about your data.
  15. Here’s a matrix that can help guide your tool selection. It’s not comprehensive; there are certainly more tools out there, and all of these tools do more than the brief descriptions above would imply (for example, Google Fusion Tables can be used to geocode location information and automatically generate maps). But these are the most common tools, and this gives you an idea of what to expect from each of them… OK, now I’m going to attempt to hand over the presentation to Liz, a couple hundred miles to my east.
  16. OK, so now that you’re all excited about what you can do with Refine, I’m going to quickly run through some of the more advanced functions. By using regular expressions, which I’ve seen described as “wildcards on steroids”, you can more finely filter and manipulate your data.
  17. Combined with those same regular expressions, GREL, the General Refine Expression Language, lets you perform transformations on your data in Refine.
  18. Using various community-developed extensions, which you can easily select and install, you can retrieve data from online sources such as VIAF and match data to external sources such as DBpedia.
  19. Thanks for tuning in y’all! We hope this was helpful and we welcome any questions or feedback y’all might have!