1© StreamSets, Inc. All rights reserved.
Project Ouroboros
Using StreamSets Data Collector to Help Manage
the StreamSets Open Source Community
Pat Patterson / Director of Evangelism
@metadaddy / pat@streamsets.com
2© StreamSets, Inc. All rights reserved.
Who Am I?
Pat Patterson / pat@streamsets.com / @metadaddy
Past: Sun Microsystems, Salesforce
Present: Director of Evangelism, StreamSets
I run far 🏃♂️
3© StreamSets, Inc. All rights reserved.
Who is StreamSets?
Seasoned leadership team
Customer base from global 8000: 50%
Unique commercial downloaders: 2000+
Open source downloads worldwide: 3,000,000+
Broad connectivity: 50+
History of innovation
streamsets.com/about-us
4© StreamSets, Inc. All rights reserved.
The StreamSets DataOps Platform
Data Lake
5© StreamSets, Inc. All rights reserved.
A Swiss Army Knife for Data
6© StreamSets, Inc. All rights reserved.
Parse Fastly CDN logs
Extract records relating to downloads
Gain insights
Companies downloading the binaries
Geographic reach
Metrics for different binary artifacts
Objective
7© StreamSets, Inc. All rights reserved.
Bash script to download S3 objects using AWS CLI tool
sed, grep, sort, uniq, awk, diff, xargs, curl
Complex, hard to maintain, slow, essentially ‘write-only’ code
cut -f 1 -d ' ' merge.log|sort|uniq > ips
diff --new-line-format="" --unchanged-line-format="" ips allips > newips
cat newips|xargs -L 1 -I% curl -s http://ipinfo.io/%/org|cut -f 2- -d ' '|sort|uniq>orgs && subl orgs
Before
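As a rough illustration of what the first two shell commands above compute (the unique client IPs in the merged log, minus the IPs already seen), here is a minimal Python sketch. The function name `new_ips` and the in-memory inputs are hypothetical, the IP is assumed to be the first space-separated field of each line, and the organisation lookup against ipinfo.io is left out.

```python
def new_ips(log_lines, known_ips):
    """Return sorted client IPs that appear in log_lines but not in known_ips.

    Roughly mirrors `cut -f 1 -d ' ' | sort | uniq` followed by the diff step.
    Blank lines are skipped (an assumption; the shell version would keep them).
    """
    seen = {line.split(" ", 1)[0] for line in log_lines if line.strip()}
    return sorted(seen - set(known_ips))


if __name__ == "__main__":
    sample_log = [
        "104.155.191.102 GET /a",
        "10.0.0.7 GET /b",
        "104.155.191.102 GET /c",
    ]
    print(new_ips(sample_log, known_ips=["10.0.0.7"]))
```

Even this small version is easier to read and test than the pipeline of `cut`, `diff`, and `xargs`, which is the maintainability problem the slide describes.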
8© StreamSets, Inc. All rights reserved.
Mission creep
Inertia
Why???
Image Nyah S / Pexels / Pexels License
9© StreamSets, Inc. All rights reserved.
Data Flow
Amazon S3 → StreamSets Data Collector → MySQL
10© StreamSets, Inc. All rights reserved.
Parse Fastly CDN log lines, send data to MySQL
<134>2017-07-09T12:01:13Z cache-sjc3636 StreamSetsS3Bucket[60550]: 104.155.191.102 "-" "-" Sun, 09 Jul 2017 12:01:12 GMT GET /datacollector/latest/parcel/manifest.json 200 1295
Let’s Get Started!
11© StreamSets, Inc. All rights reserved.
Grok Patterns are designed for exactly this!
Standard patterns for timestamps, HTTP verbs, filenames
<%{NUMBER:priority}>%{TIMESTAMP_ISO8601:timestamp} %{HOSTNAME:cachenode} %{WORD:logname}[%{NUMBER:pid}]: %{IP:ip} "-" "-" %{DATESTAMP_FASTLY:datestamp} %{WORD:verb} %{PATH:file} %{NUMBER:code} %{SIZE_OR_NULL}
Simple, Right?
12© StreamSets, Inc. All rights reserved.
First Cut
13© StreamSets, Inc. All rights reserved.
What??? An HTTP request isn’t supposed to include the protocol like that!
Fastly records whatever the client sends, no matter how dumb.
But...
Record1-Error SERVICE_ERROR_001 - Cannot parse record from message 'rawData': com.streamsets.pipeline.api.service.dataformats.DataParserException: LOG_PARSER_03 - Log line '<134>2017-07-09T12:01:13Z cache-sjc3636 StreamSetsS3Bucket[60550]: 104.155.191.102 "-" "-" Sun, 09 Jul 2017 12:01:12 GMT GET https://archives.streamsets.com/datacollector/latest/parcel/STREAMSETS_DATACOLLECTOR-1.1.4-el6.parcel 404 0' does not conform to 'Grok Format
14© StreamSets, Inc. All rights reserved.
<%{NUMBER:priority}>%{TIMESTAMP_ISO8601:timestamp} %{HOSTNAME:cachenode} %{WORD:logname}[%{NUMBER:pid}]: %{IP:ip} "-" "-" %{DATESTAMP_FASTLY:datestamp} %{WORD:verb} %{NOTSPACE:file} %{NUMBER:code} %{SIZE_OR_NULL}
Solution: Be Permissive with your Input
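To make the difference between the two Grok patterns concrete, here is a minimal sketch that approximates them with Python regular expressions. It is not how Data Collector evaluates Grok. The sub-expressions are simplified stand-ins: `/\S*` plays the role of `%{PATH}` (must start with a slash), `\S+` plays the role of `%{NOTSPACE}`, and the regexes used for the custom `DATESTAMP_FASTLY` and `SIZE_OR_NULL` patterns are assumptions, since their definitions are not shown on the slides.

```python
import re


def build_pattern(file_regex):
    """Build a regex for the Fastly log line, with a pluggable 'file' field."""
    return re.compile(
        r"<(?P<priority>\d+)>"
        r"(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z) "
        r"(?P<cachenode>\S+) "
        r"(?P<logname>\w+)\[(?P<pid>\d+)\]: "
        r"(?P<ip>\d{1,3}(?:\.\d{1,3}){3}) "
        r'"-" "-" '
        # Assumed shape of DATESTAMP_FASTLY, e.g. "Sun, 09 Jul 2017 12:01:12 GMT"
        r"(?P<datestamp>\w{3}, \d{2} \w{3} \d{4} \d{2}:\d{2}:\d{2} GMT) "
        r"(?P<verb>\w+) "
        r"(?P<file>" + file_regex + r") "
        r"(?P<code>\d+) "
        # Assumed shape of SIZE_OR_NULL: a byte count or "-"
        r"(?P<size>\d+|-)$"
    )


STRICT = build_pattern(r"/\S*")     # stand-in for %{PATH:file}
PERMISSIVE = build_pattern(r"\S+")  # stand-in for %{NOTSPACE:file}

PREFIX = (
    "<134>2017-07-09T12:01:13Z cache-sjc3636 StreamSetsS3Bucket[60550]: "
    '104.155.191.102 "-" "-" Sun, 09 Jul 2017 12:01:12 GMT GET '
)
GOOD = PREFIX + "/datacollector/latest/parcel/manifest.json 200 1295"
ODD = PREFIX + (
    "https://archives.streamsets.com/datacollector/latest/parcel/"
    "STREAMSETS_DATACOLLECTOR-1.1.4-el6.parcel 404 0"
)

if __name__ == "__main__":
    for name, pattern in (("strict", STRICT), ("permissive", PERMISSIVE)):
        for label, line in (("good", GOOD), ("odd", ODD)):
            print(name, label, bool(pattern.match(line)))
```

With these stand-ins, the strict pattern accepts the ordinary line and rejects the one whose request target is a full URL, while the permissive pattern accepts both.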
15© StreamSets, Inc. All rights reserved.
Even if you think you know the data schema - test with real data!
First Lesson Learned
16© StreamSets, Inc. All rights reserved.
Second Cut
17© StreamSets, Inc. All rights reserved.
But
Performance SUCKED!
18© StreamSets, Inc. All rights reserved.
Solution: Duplicate the Data
CREATE TABLE download (
id int(11) AUTO_INCREMENT,
ip varchar(64),
date datetime,
file varchar(767),
PRIMARY KEY (`id`),
KEY `date_idx` (`date`),
KEY `file_idx` (`file`)
);
19© StreamSets, Inc. All rights reserved.
Third Cut
20© StreamSets, Inc. All rights reserved.
30x Better Performance!
21© StreamSets, Inc. All rights reserved.
Filtering Downloads
22© StreamSets, Inc. All rights reserved.
Fit the data model to the data
Second Lesson Learned
23© StreamSets, Inc. All rights reserved.
Look up company details from IP via KickFire API
What’s Next?
24© StreamSets, Inc. All rights reserved.
Fourth Cut
25© StreamSets, Inc. All rights reserved.
com.streamsets.pipeline.api.base.OnRecordErrorException: HTTP_01 -
Error fetching resource. Status: 429 Reason: You have reached the
maximum calls per second
org.glassfish.jersey.message.internal.EntityInputStream@4cb3922b
But...
KickFire API is rate limited!
To deliver optimum performance to all of our API customers, KickFire
balances transaction loads by using rate limits
26© StreamSets, Inc. All rights reserved.
Solution - Rate Limit
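The slide's screenshot is not part of this transcript, so the exact Data Collector setting used is not shown. As a generic illustration of client-side rate limiting, here is a minimal sketch that spaces calls at least a fixed interval apart. The class name `RateLimiter` and its parameters are hypothetical, and the clock and sleep functions are injectable purely so the behaviour can be checked without waiting.

```python
import time


class RateLimiter:
    """Space successive calls at least 1/calls_per_second seconds apart."""

    def __init__(self, calls_per_second, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / calls_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next call may proceed

    def wait(self):
        """Block until the next call is allowed, then reserve the following slot."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval


if __name__ == "__main__":
    limiter = RateLimiter(calls_per_second=5)
    for ip in ["104.155.191.102", "10.0.0.7", "10.0.0.8"]:
        limiter.wait()  # call this immediately before each API request
        print("would look up", ip)
```

Throttling on the client side keeps the pipeline under the per-second limit instead of relying on retries after a 429 response.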
27© StreamSets, Inc. All rights reserved.
com.streamsets.pipeline.api.base.OnRecordErrorException: HTTP_01 - Error
fetching resource. Status: 429 Reason: You have reached the maximum calls
per month org.glassfish.jersey.message.internal.EntityInputStream@4cb3922b
But...
KickFire API has a monthly call limit!
28© StreamSets, Inc. All rights reserved.
Solution - Don’t Ask For Data We Already Have
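The idea of not asking for data that is already stored can be sketched as a lookup that consults known results before calling the external API. This is an illustrative sketch only: the class `CompanyLookup`, the `fetch` callable, and the in-memory dictionary (standing in for the database of previously resolved IP addresses) are all hypothetical names, not part of the pipeline shown in the talk.

```python
class CompanyLookup:
    """Resolve an IP to company details, calling the API only for unknown IPs."""

    def __init__(self, fetch, known=None):
        self._fetch = fetch               # callable: ip -> company details
        self._known = dict(known or {})   # stand-in for the stored results
        self.api_calls = 0                # counts calls that hit the API

    def company_for(self, ip):
        if ip not in self._known:
            self.api_calls += 1
            self._known[ip] = self._fetch(ip)
        return self._known[ip]


if __name__ == "__main__":
    def fake_fetch(ip):
        # Placeholder for the real, metered API call.
        return "Org for " + ip

    lookup = CompanyLookup(fake_fetch, known={"10.0.0.7": "Example Corp"})
    for ip in ["10.0.0.7", "104.155.191.102", "104.155.191.102"]:
        print(ip, "->", lookup.company_for(ip))
    print("API calls made:", lookup.api_calls)
```

Every IP that is already known costs nothing against the monthly quota, so the number of metered calls tracks the number of new IP addresses rather than the number of downloads.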
29© StreamSets, Inc. All rights reserved.
Know your API’s non-functional constraints!
Third Lesson Learned
30© StreamSets, Inc. All rights reserved.
Fifth Cut
31© StreamSets, Inc. All rights reserved.
Leave to run for a few weeks...
Image © Itzuvit / Wikimedia Commons / CC-BY-SA-3.0
32© StreamSets, Inc. All rights reserved.
com.streamsets.pipeline.api.base.OnRecordErrorException: HTTP_01 -
Error fetching resource. Status: 429 Reason: You have reached the
maximum calls per month
org.glassfish.jersey.message.internal.EntityInputStream@4cb3922b
But...
KickFire’s monthly call limit strikes again!
33© StreamSets, Inc. All rights reserved.
Root Cause
Seeing large numbers of downloads from the same few IP addresses
Data Collector has a microbatch architecture: database writes are committed at the end of the batch
A new IP address isn’t visible in the database until the start of the next batch
So the pipeline was still making repeated requests to KickFire for the same IP address within a batch!
34© StreamSets, Inc. All rights reserved.
Solution - Deduplicate records on IP Address
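Because writes only become visible after a batch commits, a check against the database cannot catch repeats inside the batch being processed. A minimal sketch of deduplicating on IP address within a batch follows. The function name `ips_to_look_up`, the record layout (a dict with an `ip` key), and the `committed_ips` argument are hypothetical, and this is not the processor configuration used in the talk.

```python
def ips_to_look_up(batch, committed_ips):
    """Return each IP in the batch at most once, skipping IPs already stored.

    `committed_ips` represents what the database held when the batch started;
    IPs first seen earlier in this same batch are tracked locally.
    """
    seen = set(committed_ips)
    pending = []
    for record in batch:
        ip = record["ip"]
        if ip not in seen:
            seen.add(ip)
            pending.append(ip)
    return pending


if __name__ == "__main__":
    batch = [
        {"ip": "104.155.191.102", "file": "/a"},
        {"ip": "104.155.191.102", "file": "/b"},
        {"ip": "10.0.0.7", "file": "/a"},
        {"ip": "104.155.191.102", "file": "/c"},
    ]
    print(ips_to_look_up(batch, committed_ips={"10.0.0.7"}))
```

Without the local `seen` set, every record for a new IP in the batch would trigger its own API call, which matches the root cause described on the previous slide.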
35© StreamSets, Inc. All rights reserved.
Data Collector operates batch-by-batch - design your pipelines accordingly!
Fourth Lesson Learned
36© StreamSets, Inc. All rights reserved.
The Finished Article
37© StreamSets, Inc. All rights reserved.
A Closer Look
38© StreamSets, Inc. All rights reserved.
No plan survives first contact with the enemy
Helmuth von Moltke the Elder, "On Strategy" (1871)
Ultimate Lesson Learned
Image in the public domain
39© StreamSets, Inc. All rights reserved.
or
Ultimate Lesson Learned
40© StreamSets, Inc. All rights reserved.
Everybody has a plan until they get punched in the mouth
Mike Tyson (1987)
Ultimate Lesson Learned
Image © Abelito Roldan / Flickr / CC BY 2.0
41© StreamSets, Inc. All rights reserved.
September 3-5, 2019
Tue, Sep 3 - Training & Tutorials
Wed-Thu, Sep 4-5, Keynote & Breakouts
Hilton Financial District
(Tue|Wed|Thur)
42© StreamSets, Inc. All rights reserved.
Questions?
43© StreamSets, Inc. All rights reserved.
Thank you
Pat Patterson / Director of Evangelism
@metadaddy / pat@streamsets.com
