2. Agenda
• Kinesis Firehose and Redshift
• Build a Streaming Solution for Log Analytics
o Step 1 Set Up Redshift DB and Table
o Step 2 Create Firehose Delivery Stream and Configure Data Transformation
o Step 3 Send Data to Firehose Delivery Stream
o Step 4 Query and Analyze the Data from Redshift
o Step 5 Monitor Streaming Data Pipeline
3. Kinesis Firehose
Load streaming data into Amazon S3, Amazon Redshift, and Amazon Elasticsearch Service
6. Data Flow Overview
• Kinesis Producer UI: Generate web logs
• Amazon Kinesis Firehose: Transform raw data to structured data; deliver processed web logs to Redshift
• Amazon Redshift: Run SQL queries on processed web logs
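The first hop of this flow (web logs into Firehose) can also be driven from a script rather than the Kinesis Producer UI. The following is a minimal sketch under stated assumptions, not the method used in this deck: the helper names `chunked` and `send_log_lines` and the stream name `weblogs-stream` are hypothetical, and `client` is expected to be a boto3 Firehose client. It relies on the `put_record_batch` call, which accepts up to 500 records per request.

```python
def chunked(items, size):
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def send_log_lines(client, stream_name, lines, batch_size=500):
    """Send raw log lines to a Firehose delivery stream in batches.

    `client` is assumed to expose put_record_batch(), as a boto3
    Firehose client does. Returns the number of records that the
    service reported as failed.
    """
    failed = 0
    for batch in chunked(list(lines), batch_size):
        # One record per log line, newline-terminated so that records
        # stay separable after Firehose concatenates them.
        records = [
            {"Data": (line.rstrip("\n") + "\n").encode("utf-8")}
            for line in batch
        ]
        response = client.put_record_batch(
            DeliveryStreamName=stream_name,
            Records=records,
        )
        failed += response.get("FailedPutCount", 0)
    return failed


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   client = boto3.client("firehose")
#   send_log_lines(client, "weblogs-stream", open("access.log"))
```

A non-zero return value means some records were rejected; a production sender would inspect the per-record responses and retry those records.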
22. Sample Data
219.134.32.117 - - [16/Feb/2017:09:38:20 -0800] "GET /wp-content HTTP/1.1" 200 4521 "-" "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/5.1; .NET CLR 3.8.23015.5)"
95.169.41.62 - - [16/Feb/2017:09:38:20 -0800] "PUT /app/main/posts HTTP/1.1" 200 3883 "-" "Mozilla/5.0 (Windows NT 6.2; Trident/7.0; rv:11.0) like Gecko"
221.147.191.247 - - [16/Feb/2017:09:38:20 -0800] "GET /explore HTTP/1.1" 200 6579 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1) AppleWebKit/538.0.1 (KHTML, like Gecko) Chrome/38.0.895.0 Safari/538.0.1"
179.96.123.130 - - [16/Feb/2017:09:38:20 -0800] "GET /list HTTP/1.1" 200 560 "-" "Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:5.4) Gecko/20100101 Firefox/5.4.6"
132.119.12.76 - - [16/Feb/2017:09:38:20 -0800] "PUT /explore HTTP/1.1" 200 3131 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_0 rv:5.0; AZ) AppleWebKit/535.1.0 (KHTML, like Gecko) Version/4.0.3 Safari/535.1.0"
74.113.56.92 - - [16/Feb/2017:09:38:20 -0800] "DELETE /app/main/posts HTTP/1.1" 200 7069 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_9) AppleWebKit/532.1.0 (KHTML, like Gecko) Chrome/15.0.877.0 Safari/532.1.0"
23. After Data Transformation
1.133.158.104,16/Feb/2017:10:26:46 -0800,GET,/search/tag/list,HTTP/1.1,200,9523,-,"Mozilla/5.0 (Windows; U; Windows NT 5.3) AppleWebKit/531.1.1 (KHTML, like Gecko) Chrome/24.0.827.0 Safari/531.1.1"
194.189.242.208,16/Feb/2017:10:26:46 -0800,GET,/explore,HTTP/1.1,200,8202,-,"Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 5.0; Trident/5.1)"
210.104.234.68,16/Feb/2017:10:26:46 -0800,GET,/wp-content,HTTP/1.1,200,6523,-,"Mozilla/5.0 (Windows; U; Windows NT 5.0) AppleWebKit/538.0.2 (KHTML, like Gecko) Chrome/19.0.804.0 Safari/538.0.2"
12.140.32.105,16/Feb/2017:10:26:46 -0800,PUT,/wp-admin,HTTP/1.1,200,9273,-,"Mozilla/5.0 (compatible; MSIE 7.0; Windows NT 6.0; Trident/6.0)"
208.53.124.37,16/Feb/2017:10:26:46 -0800,GET,/explore,HTTP/1.1,200,5187,-,"Mozilla/5.0 (Windows; U; Windows NT 5.2) AppleWebKit/531.2.1 (KHTML, like Gecko) Chrome/36.0.842.0 Safari/531.2.1"
113.80.90.8,16/Feb/2017:10:26:46 -0800,PUT,/wp-content,HTTP/1.1,200,4431,-,"Mozilla/5.0 (Windows; U; Windows NT 5.2) AppleWebKit/534.1.1 (KHTML, like Gecko) Chrome/23.0.886.0 Safari/534.1.1"
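The step from the raw sample data to the comma-separated form above can be sketched as a Firehose data-transformation Lambda function. This is an illustrative sketch, not the function used in the deck: the regular expression, the field names, and the helper `to_csv_line` are assumptions. The handler follows the Firehose transformation contract, in which each incoming record carries a `recordId` and base64-encoded `data`, and each returned record carries the same `recordId`, a `result` (`Ok`, `Dropped`, or `ProcessingFailed`), and base64-encoded `data`.

```python
import base64
import re

# Apache combined log format, matching the sample data shown earlier.
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"\s*$'
)

FIELDS = ("host", "time", "method", "path", "protocol",
          "status", "size", "referrer", "agent")


def to_csv_line(raw_line):
    """Convert one raw log line to a CSV line, or return None if it does not parse.

    Only the user agent is quoted, as in the transformed sample above.
    Limitation of this sketch: a request path containing a comma would
    need quoting as well.
    """
    match = LOG_PATTERN.match(raw_line.strip())
    if match is None:
        return None
    *plain, agent = match.group(*FIELDS)
    return ",".join(plain) + ',"' + agent + '"\n'


def lambda_handler(event, context):
    """Firehose transformation entry point: one output record per input record."""
    output = []
    for record in event["records"]:
        raw = base64.b64decode(record["data"]).decode("utf-8")
        line = to_csv_line(raw)
        if line is None:
            # Return unparseable records unchanged and flag them as failed.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
        else:
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": base64.b64encode(line.encode("utf-8")).decode("utf-8"),
            })
    return {"records": output}
```

Each transformed line ends with a newline so that the records delivered to S3, and then loaded into Redshift, remain one row per line.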
26. Query Data
• Find distribution of response codes over days
SELECT TRUNC(request_time), response_code, COUNT(*) FROM
weblogs GROUP BY 1,2 ORDER BY 1,3 DESC;
• Count the number of 404 response codes
SELECT COUNT(*) FROM weblogs WHERE response_code = 404;
• Show the request path that most often returns “PAGE NOT FOUND” (404)
SELECT TOP 1 request_path, COUNT(*) FROM weblogs WHERE
response_code = 404 GROUP BY 1 ORDER BY 2 DESC;