3. Creating a Stream: Specification. Work out what you want your stream to do. What do you want the elements to contain? What sources do you want the data to come from? What is your budget for data acquisition? Who is this data for?
4. Creating a Stream: Definition. Write a Stream Definition that implements your specification.
5. Creating a Stream: Filtered Data. Retrieve the data filtered by your stream through one of: JSON API, HTTP Streaming, WebSockets Streaming, or RSS.
6. Creating a Stream in DataSift. Step 1: Select the Create Stream button on any page on DataSift.
7. Creating a Stream in DataSift. Step 2: Fill in the title, description, and tags for your Stream. The Title and Description will be shown next to your Stream. The Tags will be used for search and categorisation of your Stream. Enabling the Private checkbox will make your Stream visible only to you.
8. Creating a Stream in DataSift. Step 3: Create your first stream definition. This is the Stream Editor. There is a default stream definition already inserted for you. Why not try changing "hello world" to a different value? For example:
interaction.content contains "cat"
9. Creating a Stream in DataSift. Step 4: Hit the Save button. Your Stream is now saved. You can use the breadcrumbs to go back and see a live preview of the results.
10. FSDL: Filtered Stream Definition Language. FSDL is the language used to write Stream Definitions for DataSift. The language takes the following basic format:
<term> <logical operator> <term> <logical operator>
There must be at least one term in a definition. All terms must be separated by logical operators. A logical operator is either "and" or "or".
11. FSDL: Nested Rule. On the previous slide, we had this definition outline:
<term> <logical operator> <term> <logical operator>
A term is either a "nested rule" or a "predicate". A nested rule includes the result of another stream within the logic of this one. The syntax for a nested rule is:
rule "<stream identifier>"
The stream identifier is a 32-character alphanumeric string, available from the DataSift page of the stream you wish to include, or through the API.
12. FSDL: Nested Rule Example. This is an example of a simple FSDL definition:
interaction.content contains "justinbieber"
The Stream Identifier for this definition is 4e8e6772337d0b993391ee6417171b79. The stream will include every item whose content contains "justinbieber". We can create another rule to filter this down further, using the nested rule syntax:
rule "4e8e6772337d0b993391ee6417171b79" and language.tag == "en"
This performs the same filtering as the first stream, but additionally keeps only content determined to be in English, using the language.tag == "en" predicate. In this case, the logical operator separating the two terms is "and".
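The nested rule term above can be generated programmatically. The following is a minimal Python sketch, not part of DataSift itself: the helper name nested_rule is hypothetical, and the only check applied is the 32-character alphanumeric format described on the previous slide.

```python
import re

# Assumption: a stream identifier is exactly 32 alphanumeric characters,
# as described on the Nested Rule slide.
STREAM_ID = re.compile(r"[0-9A-Za-z]{32}")


def nested_rule(stream_id: str) -> str:
    """Return an FSDL nested-rule term for the given stream identifier."""
    if not STREAM_ID.fullmatch(stream_id):
        raise ValueError(f"not a 32-character alphanumeric identifier: {stream_id!r}")
    return f'rule "{stream_id}"'


# Rebuild the definition from the slide: the earlier stream, English only.
definition = nested_rule("4e8e6772337d0b993391ee6417171b79") + ' and language.tag == "en"'
print(definition)
```

Validating the identifier before building the string catches copy-and-paste mistakes early, before the definition is saved in the Stream Editor.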
13. FSDL: Predicates. A predicate is formed of three items (a target, an operator, and an argument) in the following format:
<target> <operator> <argument>
In the previous example, we saw this predicate used to filter the results of another rule:
language.tag == "en"
In this example, the target is "language.tag", the operator is "==" (equals), and the argument is "en". The full list of targets, operators, and the arguments they require is in the DataSift Support Documentation.
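To make the three parts concrete, here is a minimal Python sketch that splits a single predicate on whitespace. It is an illustration only, not DataSift's parser: the names Predicate and split_predicate are hypothetical, and it assumes the target and operator contain no spaces.

```python
from typing import NamedTuple, Optional


class Predicate(NamedTuple):
    target: str
    operator: str
    argument: Optional[str]  # None when the predicate has no argument part


def split_predicate(text: str) -> Predicate:
    """Split '<target> <operator> <argument>' into its three parts."""
    # Split at most twice so an argument containing spaces stays in one piece.
    parts = text.strip().split(None, 2)
    if len(parts) < 2:
        raise ValueError(f"expected at least a target and an operator: {text!r}")
    argument = parts[2] if len(parts) == 3 else None
    return Predicate(parts[0], parts[1], argument)


p = split_predicate('language.tag == "en"')
print(p.target, p.operator, p.argument)
```

The argument is returned with its surrounding quotes intact; stripping them is left to the caller.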
14. FSDL: Example Predicates. The following are examples of simple predicates:
interaction.content contains "#rdgtweetup"
twitter.user.friends_count >= 1000
interaction.content contains_word "net"
interaction.geo exists
author.username in "dtsn,nickhalstead,chris_alexander,datasift"
15. FSDL: Example Definitions. Here are examples of more complex definitions composed of multiple terms:
(interaction.content contains "Justin Bieber" OR interaction.content contains "Justin Beiber")
(interaction.content contains "Nokia" OR interaction.content contains "Motorola" OR interaction.content contains "Palm") AND interaction.content contains "phone"
interaction.content contains "#rdgfestival" OR interaction.content contains "#readingfestival" OR rule "4315e367618830de6224c479f35db4ca"
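Definitions like these are tedious to type by hand when the keyword list is long. The following Python sketch assembles the phone-brand example from a list of keywords. The helpers contains and any_of are hypothetical names introduced for illustration, and the sketch does not escape quotation marks inside a phrase, since the slides do not cover how FSDL handles that.

```python
from typing import Iterable


def contains(target: str, phrase: str) -> str:
    """Build a 'contains' predicate for one target and one phrase."""
    return f'{target} contains "{phrase}"'


def any_of(terms: Iterable[str]) -> str:
    """Join terms with OR and wrap the group in parentheses."""
    terms = list(terms)
    if not terms:
        raise ValueError("a definition needs at least one term")
    return "(" + " OR ".join(terms) + ")"


# Any of the three brand names, and the word "phone".
brands = any_of(contains("interaction.content", b) for b in ("Nokia", "Motorola", "Palm"))
definition = brands + " AND " + contains("interaction.content", "phone")
print(definition)
```

Wrapping the OR group in parentheses keeps the grouping explicit once the AND term is appended, matching the second example on the slide.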
16. API Calls. Most DataSift functionality is available through API calls. All of these calls use a semi-RESTful interface, similar to the Twitter API. Supported data formats include JSON, JSONP, XML and PHP (serialized). Each call is fully documented on the DataSift Support site.
17. Retrieving Stream Data. Once you have configured your stream with a definition and verified that it is correct, you can connect to your stream in several ways. The JSON API is simple and similar to how you would access Twitter Search. The HTTP Stream is similar to the Twitter firehose, giving a constant stream of data through a single connection. WebSockets streaming is similar to the HTTP Stream but is meant for client-side connections from supported web browsers. RSS is also available, and is recommended for lower-volume feeds only. All services are fully documented on the DataSift Support site.
18. Questions. You can get more help, support, examples and user content on the DataSift Support website (http://support.datasift.net). You can also ask us on Twitter: @datasift