Thousands of services, from translation and text analysis to mapping and URL shortening, are now available through APIs. However, if you are not a developer who can integrate an API into your workflow, you cannot use these services to process or enrich your data. Good news: with OpenRefine this is no longer the case!
Using Add Column by Fetching URL, you can easily query those services. We will review how to find APIs, query them, and parse the results in Refine, with various examples and usage types.
2. Working with API in Refine - @magdmartin 2
Next Meet Up
● Next Meet Up
– April 21 - 6PM – 7.45PM
– May 21 - 6PM – 7.45PM
– June 18 - 6PM – 7.45PM
● Poll
– Use case presentations: learn how others use Refine
– Learn a specific set of functionality
– Hack session where everybody comes with a laptop and data to clean
Working with API in Refine
● What is Refine?
● What is an API?
● How to make an API call?
● Hands On!
Bridge the Technology Gap
[Diagram: roles placed by how well they understand the data (business skills) and know how to transform data (technical skills)]
● Data Clerk – Data Entry, Spreadsheet user (filter, sort, manual editing, basic formulas)
● Domain Expert – Spreadsheet Warriors (split and join, format, transpose, complex formulas)
● Basic Knowledge of Scripting (normalization, parsing, schema alignment, API, fuzzy matching, scheduling)
● DBA, Data Science, ETL Engineer
A Lean Data Model
[Diagram: cycle of Measure / Check, Build / Do, Learn / Think, Plan / Act]
● Discovery, Wrangling: in-application feedback (personal usage)
● Profiling, Preparation: ad hoc usage, reporting, migration
● Quality, Transformation: industrialization, integration
What is an API?
● Each API provider is like a restaurant:
– Menu = API
– Placing an Order = Executing an API Call
– Different food on the menu = Parameters
– Food = the System's Response
● Each API has its own rules
– Need to register (or not)
– How frequently you can make a call
– GET or POST request (Refine supports only GET)
– Price and Service Level Agreement
● See Mashape or ProgrammableWeb to discover APIs
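To make the restaurant analogy concrete, here is a minimal Python sketch (outside Refine) of how an "order" is assembled: a base address plus parameters, encoded into a single GET URL. The endpoint `https://api.example.com/menu` and the parameter names `dish` and `key` are hypothetical and only serve the illustration.

```python
from urllib.parse import urlencode


def build_get_url(base_url, params):
    """Return base_url followed by the URL-encoded query parameters."""
    return base_url + "?" + urlencode(params)


# Hypothetical provider and parameters, for illustration only.
order = build_get_url(
    "https://api.example.com/menu",
    {"dish": "fish and chips", "key": "YOUR_KEY"},
)
print(order)
```

Fetching that URL is the "placing an order" step; what comes back (JSON, XML, ...) is the system's response.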
Examples of API Providers
Everythinglocation: verify and geocode address data
● Give an address string
● Returns a validated, parsed and geocoded address
Timezonedb: free time zone database for cities of the world
● Give a lat/long
● Returns the timezone and time
Making an API call: Everythinglocation
I'd like the following address cleaned and parsed, with its long / lat: 1111 Bayhill Drive San Bruno
https://saas.loqate.com/rest/?lqtkey=[APIKEY]&p=v%2Bg&ctry=USA&addr=1111%20bayhill%20drive%20san%20bruno
● API address: https://saas.loqate.com/rest/
● Key: lqtkey=[APIKEY]
● Verify & Geocode: p=v%2Bg
● Country: ctry=USA
● Address: addr=1111%20bayhill%20drive%20san%20bruno
● Documentation
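The parts of this request URL can also be pulled apart programmatically, which is a handy way to check what each parameter really carries. The sketch below uses only Python's standard library and the URL from the slide; the variable names are illustrative.

```python
from urllib.parse import urlsplit, parse_qs

url = ("https://saas.loqate.com/rest/?lqtkey=[APIKEY]&p=v%2Bg"
       "&ctry=USA&addr=1111%20bayhill%20drive%20san%20bruno")

parts = urlsplit(url)

# Everything before the "?" is the API address.
api_address = f"{parts.scheme}://{parts.netloc}{parts.path}"

# parse_qs decodes percent-escapes and returns a list per parameter;
# each parameter appears once here, so keep the first value.
params = {name: values[0] for name, values in parse_qs(parts.query).items()}

print(api_address)
for name, value in params.items():
    print(f"{name} = {value}")
```

Note that `p=v%2Bg` decodes to `v+g`, and the `%20` sequences in `addr` decode back to spaces.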
Making an API call: Timezonedb
I'd like the timezone for this coordinate: 53.7833, 1.75
http://api.timezonedb.com/?lat=53.7833&lng=1.75&key=<Your_API_Key>
● API address: http://api.timezonedb.com/
● Latitude: lat=53.7833
● Longitude: lng=1.75
● Key: key=<Your_API_Key>
In JSON format:
http://api.timezonedb.com/?lat=53.7833&lng=1.75&format=json&key=<Your_API_Key>
● Format: format=json
● Documentation
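As a rough sketch, the two Timezonedb URLs above can be generated from the same inputs, with the `format=json` parameter added only on request. The function name `timezonedb_url` and its `as_json` flag are made up for this example; only the parameter names `lat`, `lng`, `format` and `key` come from the slide.

```python
from urllib.parse import urlencode


def timezonedb_url(lat, lng, api_key, as_json=False):
    """Build a Timezonedb request URL like the ones shown on the slide."""
    params = {"lat": lat, "lng": lng}
    if as_json:
        # Same parameter as in the slide's second URL.
        params["format"] = "json"
    params["key"] = api_key
    return "http://api.timezonedb.com/?" + urlencode(params)


print(timezonedb_url(53.7833, 1.75, "YOUR_API_KEY"))
print(timezonedb_url(53.7833, 1.75, "YOUR_API_KEY", as_json=True))
```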
Working with API
Cheat Sheet
● Create a project with Fortune 1000 data: http://cs.brown.edu/~pavlo/fortune1000/
● Concatenate the Address:
cells['address'].value + ' ' + cells['city'].value + ' ' + cells['state'].value + ' ' + cells['zipcode'].value
● Call Everythinglocation:
'http://saas.loqate.com/rest/?lqtkey=[YOURKEY]&p=v%2Bg&ctry=USA&addr=' + value.replace(" ","%20")
● Retrieve only the JSON results
value.replace('{"status":"OK","results":[','').replace(']}','')
● Extract Longitude and Latitude
value.parseJson()['Longitude']
value.parseJson()['Latitude']
● Call Timezonedb
'http://api.timezonedb.com/?lat=' + cells['Latitude'].value + '&lng=' + cells['Longitude'].value + '&key=[YOURKEY]'
● Extract ZoneName
value.split('<zoneName>')[1].split('</zoneName>')[0]
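For readers who want to try the same steps outside Refine, here is a minimal Python sketch of the cheat sheet's logic: concatenate the address, URL-encode it, pick the first JSON result, and read `zoneName` from the XML. No network call is made. The sample row and both sample payloads are made up: they only mirror the shapes implied by the expressions above (a `results` array with `Latitude` / `Longitude`, and a `<zoneName>` element) and are not real API output.

```python
import json
import xml.etree.ElementTree as ET
from urllib.parse import quote


def full_address(row):
    """Concatenate the address columns, as in the first GREL expression."""
    return " ".join(row[field] for field in ("address", "city", "state", "zipcode"))


def first_result(response_text):
    """Parse the JSON response and return the first entry of 'results'."""
    return json.loads(response_text)["results"][0]


def zone_name(xml_text):
    """Return the text of the first <zoneName> element, or None if absent."""
    return ET.fromstring(xml_text).findtext(".//zoneName")


# Sample row (illustrative values).
row = {"address": "1111 Bayhill Drive", "city": "San Bruno",
       "state": "CA", "zipcode": "94066"}
encoded = quote(full_address(row))  # spaces become %20

# Made-up payloads that only imitate the assumed response shapes.
sample_json = '{"status":"OK","results":[{"Latitude":"37.0","Longitude":"-122.0"}]}'
sample_xml = "<result><status>OK</status><zoneName>Sample/Zone</zoneName></result>"

geo = first_result(sample_json)
print(encoded)
print(geo["Latitude"], geo["Longitude"])
print(zone_name(sample_xml))
```

Parsing the JSON and XML with a real parser avoids the string `replace` and `split` tricks, which break as soon as the response changes shape (for example a status other than OK).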