This document introduces Yahoo Query Language (YQL) and provides an overview of its features and capabilities. YQL allows users to query, filter, and join data across web services using an SQL-like language. It provides a single API specification and allows accessing and modifying internet data through a uniform query language. Examples demonstrate how to retrieve Twitter trending topics for a location and get related news articles through a single YQL query. The document also discusses open data tables, the YQL console, and how to contribute new tables to the system.
1. Yahoo Query Language: an Introduction
select * from internet
Mirek Grymuza – mirek@yahoo-inc.com
Josh Gordineer – joshgord@yahoo-inc.com
2. What are we going to cover?
• What, why and brief history of YQL
• Overview of YQL features, YQL Console
• Get into more detail with: YQL in practice
4. My application
my awesome application
•multiple data sources
•different specs and formats
•multiple connections
•api changes to deal with
•no arbitrary sources without work
5. Enter YQL
•single API spec
•SQL-like
•select/insert/update/delete
•let YQL optimize queries
•powerful
my awesome application
6. So what can YQL do?
SELECT * FROM flickr.photos.info WHERE photo_id IN
(SELECT id FROM flickr.photos.search(1) WHERE text IN
(SELECT content FROM search.termextract WHERE context IN
(SELECT body FROM nyt.article.search WHERE apikey='key' AND query='obama' LIMIT 1)))
show: lists the supported tables
desc: describes the structure of a table
select: fetches data
insert/update/delete: modify data
use: use an Open Data Table
set: define key-values across Open Data Tables
The statement
7. Filtering, paging, projection
• Table data can be filtered in the WHERE clause either:
–Remotely by the table data source provider or
–Locally by the YQL engine
• YQL tries to present “rows” of data
–Abstracts away “paging” views of data sources
–Presents a “subset” of paging tables by default
• In YQL, fields are analogous to the columns of a table;
multiple fields are delimited by commas
select Title,Address from local.search(0,10) where query="sushi" and
location="san francisco, ca" and Rating.AverageRating="4.5" LIMIT 2
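The three ideas on this slide (paging hidden behind a stream of rows, filtering in the WHERE clause, and projection of selected fields) can be sketched in Python. This is only an illustration of the concepts, not how the YQL engine is implemented; the LISTINGS data and the helper names fetch_page, rows, get_path and select are all hypothetical.

```python
from itertools import islice

# Hypothetical in-memory stand-in for a paged remote source (not a real YQL table).
LISTINGS = [
    {"Title": "Sushi A", "Address": "1 Main St", "Rating": {"AverageRating": "4.5"}},
    {"Title": "Sushi B", "Address": "2 Main St", "Rating": {"AverageRating": "4"}},
    {"Title": "Sushi C", "Address": "3 Main St", "Rating": {"AverageRating": "4.5"}},
    {"Title": "Sushi D", "Address": "4 Main St", "Rating": {"AverageRating": "4.5"}},
    {"Title": "Sushi E", "Address": "5 Main St", "Rating": {"AverageRating": "3"}},
]


def fetch_page(start, count):
    """Pretend remote call: return one page of raw rows."""
    return LISTINGS[start:start + count]


def rows(page_size=2):
    """Hide paging: yield rows one at a time across page boundaries."""
    start = 0
    while True:
        page = fetch_page(start, page_size)
        if not page:
            return
        yield from page
        start += page_size


def get_path(row, dotted):
    """Resolve a dotted field name such as 'Rating.AverageRating'."""
    value = row
    for part in dotted.split("."):
        value = value[part]
    return value


def select(fields, where, limit):
    """Filter locally, project the requested fields, then apply LIMIT."""
    matching = (r for r in rows()
                if all(get_path(r, k) == v for k, v in where.items()))
    projected = ({f: get_path(r, f) for f in fields} for r in matching)
    return list(islice(projected, limit))


result = select(["Title", "Address"], {"Rating.AverageRating": "4.5"}, limit=2)
```

Because the pipeline is built from generators, pages are fetched only until the limit is satisfied, which mirrors the idea of presenting a subset of a paged source by default.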
8. Joining across sources
• Sub-select works the same as normal select except it can
only return a “leaf” element value or attribute
• Parallelizes execution
• Example: How to get an international weather forecast?
Join two services in different companies:
select * from weather.forecast where location in (select id from xml where
url="http://xoap.weather.com/search/search?where=prague" and
itemPath="search.loc")
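A rough Python sketch of the sub-select pattern follows: the inner query returns only leaf values, and the outer query is run once per value, here in parallel with a thread pool. The slides do not describe how YQL parallelizes execution, so the thread pool is an assumption, and the LOCATION_IDS and FORECASTS data are made-up stand-ins for the two remote services.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the two services; ids and forecasts are made up.
LOCATION_IDS = {"prague": ["loc-1", "loc-2"]}
FORECASTS = {"loc-1": "sunny", "loc-2": "rain"}


def search_locations(where):
    """Inner query: return leaf values (location ids) only."""
    return LOCATION_IDS.get(where, [])


def forecast(location_id):
    """Outer query for a single value produced by the inner query."""
    return {"location": location_id, "forecast": FORECASTS[location_id]}


def join(where):
    """Run the inner query once, then fan the outer query out in parallel."""
    ids = search_locations(where)
    with ThreadPoolExecutor(max_workers=4) as pool:
        # pool.map returns results in the same order as the input ids.
        return list(pool.map(forecast, ids))


prague = join("prague")
```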
9. Post-query manipulation
• YQL includes built-in functions such as sort, unique,
truncate, tail, reverse...
• Simple post-SELECT processing can be performed by
appending the “pipe” symbol and a function to the end of the
statement:
SELECT … | sort(field="item.date")
SELECT … | unique(field="item.title") | …
• Functions only operate on the data being returned by the
query, nothing to do with the tables or data sources
themselves
select * from social.profile where guid in (select guid from
social.connections where owner_guid=me) | sort(field="nickname")
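The pipe behaviour can be approximated in Python as plain functions applied in order to the rows a query returned. This is a minimal sketch under assumptions: the exact semantics and parameter names of the built-in YQL functions are not spelled out in the slides, so the implementations below (for example, truncate keeping the first rows and tail keeping the last) are a best guess, and the sample rows are hypothetical.

```python
def sort(rows, field):
    """Order rows by one field, ascending."""
    return sorted(rows, key=lambda r: r[field])


def unique(rows, field):
    """Keep the first row seen for each distinct value of the field."""
    seen = set()
    out = []
    for r in rows:
        if r[field] not in seen:
            seen.add(r[field])
            out.append(r)
    return out


def truncate(rows, count):
    """Keep the first `count` rows."""
    return rows[:count]


def tail(rows, count):
    """Keep the last `count` rows."""
    return rows[-count:] if count > 0 else []


def reverse(rows):
    """Reverse the row order."""
    return rows[::-1]


def pipe(rows, *steps):
    """Apply (function, kwargs) steps left to right, like `query | f | g`."""
    for func, kwargs in steps:
        rows = func(rows, **kwargs)
    return rows


# Hypothetical query result; the functions never touch the data source itself.
profiles = [{"nickname": "zed"}, {"nickname": "amy"},
            {"nickname": "zed"}, {"nickname": "bob"}]

result = pipe(profiles,
              (unique, {"field": "nickname"}),
              (sort, {"field": "nickname"}),
              (truncate, {"count": 2}))
```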
10. How do you benefit?
SELECT * FROM INTERNET
(INSERT/UPDATE/DELETE)
Uniform method for accessing and modifying
internet data and services
Simplify and enrich data and service
access via uniform query language and
execute tables
11. Now let’s review - what is YQL?
• Cloud web service with SQL-Like Language
–Familiar to developers
• Synonymous with Data access
–Expressive enough to get the right data.
• Self describing - show, desc table
• Allows you to query, filter, join and update data across any
structured data on the web / web services
–And Yahoo’s Sherpa cloud storage
• All in Real time
• Inject business logic with execute element
12. YQL Since Launch...
• open data tables, environment files
• execute element - April
• new paging model
• insert/update/delete, jsonp-x - July
• set verb, yql.storage, debug mode, multi env
• y.rest, y.query with timeouts
• custom cache, query alias
• meta element
• extend execute to add libraries, functions
• console cache, shortener and query builder
• lots of various data tables since then and more being added
Launched October 28, 2008; an enhancement or new feature added every month since (timeline: 2009, 2010)
13. ...where is YQL today?
Most popular tables this month?
~6B table requests in October
on track to 7B in November
Popular since launch?
14. YQL Console
• http://developer.yahoo.com/yql/console/
• Hosted site which executes YQL queries
• Swiss Army Knife for YQL Developers
• Design and debug quickly
How many tables?
• default tables – 175
• community tables – 772
• total - 947
19. What is YQL?
• “The Yahoo! Query Language is an expressive SQL-like
language that lets you query, filter, and join data across
Web services.”
• So what does that mean?
• Be “lazy” – Let YQL take care of the data
–Allows you to focus on innovation not on API’s
20. The Problem
• Fetch the Yahoo! News articles for Twitter trending topics
in San Francisco
• And be “lazy” i.e. use YQL
21. YQL Tables
• Built-in Tables
–Maintained by the YQL Team (or Yahoo!)
–fantasy sports, weather, answers, flickr, geo, music,
search, upcoming, mail …
• Data Tables
–Specialized tables to fetch raw data from the web
–atom, csv, html, json, xml …
23. Open Data Tables
• Brings the power of YQL to any API
• Open Data Table Schema defines mapping between YQL
and Endpoint
–http://query.yahooapis.com/v1/schema/table.xsd
• Supply the open table with the “use” statement
• Supply multiple open tables with an “env” query parameter
–ENV file contains multiple USE statements
–Loads environment prior to executing YQL query
24. Open Data Table Example
<?xml version="1.0" encoding="UTF-8"?>
<table xmlns="http://query.yahooapis.com/v1/schema/table.xsd">
<bindings>
<select itemPath="matching_trends.trends.trend"
produces="XML">
<urls>
<url>http://api.twitter.com/1/trends/{woeid}.xml</url>
</urls>
<inputs>
<key id="woeid" paramType="path" required="true" />
</inputs>
</select>
</bindings>
</table>
25. url and key Elements
<url>http://api.twitter.com/1/trends/{woeid}.xml</url>
• Provides the resource location for your API
<key id="woeid" paramType="path" required="true" />
• Defines the parameters for the API and provides a binding
for the YQL where clause
• paramType can be query or path
• the required attribute is optional
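To make the binding concrete, here is a small Python sketch that reads the table definition from the example and turns a WHERE value into the request URL. It is one reader's interpretation of the url and key elements, not the YQL engine's actual behaviour; build_url is a hypothetical helper, and the handling of paramType="query" keys is an assumption.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote, urlencode

NS = {"t": "http://query.yahooapis.com/v1/schema/table.xsd"}

# The table definition from the example slide, with straight quotes throughout.
TABLE_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<table xmlns="http://query.yahooapis.com/v1/schema/table.xsd">
  <bindings>
    <select itemPath="matching_trends.trends.trend" produces="XML">
      <urls>
        <url>http://api.twitter.com/1/trends/{woeid}.xml</url>
      </urls>
      <inputs>
        <key id="woeid" paramType="path" required="true" />
      </inputs>
    </select>
  </bindings>
</table>"""


def build_url(table_xml, where):
    """Map WHERE-clause values onto the table's url template (hypothetical helper)."""
    root = ET.fromstring(table_xml)
    select = root.find("t:bindings/t:select", NS)
    url = select.find("t:urls/t:url", NS).text.strip()
    query = {}
    for key in select.findall("t:inputs/t:key", NS):
        name = key.get("id")
        if name not in where:
            if key.get("required") == "true":
                raise ValueError(f"missing required key: {name}")
            continue
        if key.get("paramType") == "path":
            # Path keys fill the {name} placeholder in the URL.
            url = url.replace("{" + name + "}", quote(str(where[name]), safe=""))
        else:
            # Assumed: query keys become ordinary query-string parameters.
            query[name] = where[name]
    if query:
        url += "?" + urlencode(query)
    return url


trends_url = build_url(TABLE_XML, {"woeid": 2487956})
```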
26. Running YQL Queries
• Console
–http://developer.yahoo.com/yql/console
–Quickly discover tables and iterate on queries
• Public Endpoint
–http://query.yahooapis.com/v1/public/yql
–No Auth
–Rate limit 1K/hour per IP
• Authenticated Endpoint
–http://query.yahooapis.com/v1/yql
–OAuth
–10x higher rate limits
27. YQL Webservice Basics cont’d
• Query passed in as the “q” query parameter
–http://query.yahooapis.com/v1/public/yql?q=show%20tables
• Execute as a simple HTTP GET
–curl http://query.yahooapis.com/v1/public/yql?q=show%20tables
• Also available for PUT, POST and DELETE
–curl -d "q=show%20tables" http://query.yahooapis.com/v1/public/yql
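The same request strings can be produced from Python with the standard library. The sketch below only builds the URL and the form body; it does not contact the service. The helper names yql_url and yql_body are hypothetical, and the endpoint is the public one quoted in the slides.

```python
from urllib.parse import quote, urlencode

PUBLIC_ENDPOINT = "http://query.yahooapis.com/v1/public/yql"


def yql_url(query, endpoint=PUBLIC_ENDPOINT):
    """Build a GET URL with the statement in the "q" query parameter."""
    # quote_via=quote encodes spaces as %20 rather than "+".
    return endpoint + "?" + urlencode({"q": query}, quote_via=quote)


def yql_body(query):
    """Build the form body for a POST-style request (as with curl -d)."""
    return urlencode({"q": query}, quote_via=quote)


get_url = yql_url("show tables")
post_body = yql_body("show tables")
```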
32. Community Tables
• Someone may have done the work for you already
–http://datatables.org
• Tables are hosted on GitHub
–https://github.com/yql/yql-tables
• Use the env query parameter to include all community
tables in a request
–env=store://datatables.org/alltableswithkeys
34. Contributing
Process for adding/updating tables on Git
1. Fork the YQL Tables project
2. Clone your Fork
3. Make your changes
4. Commit and Push Changes
5. Make Pull Request
6. YQL Table Admin will moderate and merge changes
and generate new push to datatables.org
• Steps 1-5 are standard Git procedures, step 6 is unique
• Git Tutorials
–http://help.github.com/forking
–http://thinkvitamin.com/code/starting-with-git-cheat-sheet
35. Twitter Trending News Query
select abstract, url from search.news where query in (
select trend from twitter.trends.location where
woeid=2487956
)
Retrieves news results for the latest Twitter trending topics in
San Francisco
• Combines numerous API calls into a single YQL query
• Filters search.news response from 5 fields into just 2
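As a sketch of how this query might be sent, the Python below composes the nested statement and encodes it together with the community env parameter. It assumes that twitter.trends.location is loaded through the community env file mentioned earlier, which the slides do not state outright; sub_select is a hypothetical helper, and no request is actually made.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

PUBLIC_ENDPOINT = "http://query.yahooapis.com/v1/public/yql"
COMMUNITY_ENV = "store://datatables.org/alltableswithkeys"


def sub_select(outer, field, inner):
    """Nest one statement inside another with an IN clause (hypothetical helper)."""
    return f"{outer} where {field} in ({inner})"


inner = "select trend from twitter.trends.location where woeid=2487956"
query = sub_select("select abstract, url from search.news", "query", inner)

# Both parameters are percent-encoded; nothing is sent over the network.
url = PUBLIC_ENDPOINT + "?" + urlencode(
    {"q": query, "env": COMMUNITY_ENV}, quote_via=quote)

# Decoding the URL again recovers the original statement and env value.
params = parse_qs(urlsplit(url).query)
```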