Graphite cluster setup blueprint
IP: 10.4.0.2
Host: graphite-02
IP: 10.4.0.1
Host: graphite-01
[cache:a]
line_receiver_interface = 0.0.0.0
line_receiver_port = 2103
pickle_receiver_interface = 0.0.0.0
pickle_receiver_port = 2104
cache_query_interface = 0.0.0.0
cache_query_port = 7102
[cache:b]
line_receiver_interface = 0.0.0.0
line_receiver_port = 2203
pickle_receiver_interface = 0.0.0.0
pickle_receiver_port = 2204
cache_query_interface = 0.0.0.0
cache_query_port = 7202
[relay]
line_receiver_interface = 0.0.0.0
line_receiver_port = 2003
pickle_receiver_interface = 0.0.0.0
pickle_receiver_port = 2004
relay_method = consistent-hashing
replication_factor = 1
Destinations = [ 10.4.0.1:2104:a,
10.4.0.1:2204:b,
10.4.0.2:2104:a,
10.4.0.2:2204:b ]
[cache:a]
line_receiver_interface = 0.0.0.0
line_receiver_port = 2103
pickle_receiver_interface = 0.0.0.0
pickle_receiver_port = 2104
cache_query_interface = 0.0.0.0
cache_query_port = 7102
[cache:b]
line_receiver_interface = 0.0.0.0
line_receiver_port = 2203
pickle_receiver_interface = 0.0.0.0
pickle_receiver_port = 2204
cache_query_interface = 0.0.0.0
cache_query_port = 7202
[relay]
line_receiver_interface = 0.0.0.0
line_receiver_port = 2003
pickle_receiver_interface = 0.0.0.0
pickle_receiver_port = 2004
relay_method = consistent-hashing
replication_factor = 1
Destinations = [ 10.4.0.1:2104:a,
10.4.0.1:2204:b,
10.4.0.2:2104:a,
10.4.0.2:2204:b ]
[aggregator]
line_receiver_interface = 0.0.0.0
line_receiver_port = 2013
pickle_receiver_interface = 0.0.0.0
pickle_receiver_port = 2014
[aggregator]
line_receiver_interface = 0.0.0.0
line_receiver_port = 2013
pickle_receiver_interface = 0.0.0.0
pickle_receiver_port = 2014
load balancer
IP: 10.4.0.10
Host: graphite
TCP ports: 2003, 2004; HTTP port: 80
[webapp]
port = 80
memcache_hosts = [
"rf-1.cache.amazonaws.com" ]
cluster_servers = [
"10.4.0.2:80" ]
remote_rendering = false
carbonlink_hosts = [
"10.4.0.1:7102",
"10.4.0.1:7202" ]
[webapp]
port = 80
memcache_hosts = [
"rf-1.cache.amazonaws.com" ]
cluster_servers = [
"10.4.0.1:80" ]
remote_rendering = false
carbonlink_hosts = [
"10.4.0.2:7102",
"10.4.0.2:7202" ]
adobrosynets@recordedfuture.com
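The agents in the diagram write through the load balancer using carbon's plaintext ("line") protocol on port 2003. A minimal client sketch (the helper names are hypothetical; the balancer address 10.4.0.10:2003 is taken from the diagram):

```python
import socket
import time

def format_line(path, value, timestamp):
    # carbon line protocol: one datapoint per line,
    # "<metric path> <value> <unix timestamp>\n"
    return '%s %s %d\n' % (path, value, timestamp)

def send_metric(path, value, host='10.4.0.10', port=2003):
    # Agents send to the load balancer, which spreads the
    # connections across the carbon-relays behind it.
    line = format_line(path, value, int(time.time()))
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode('ascii'))
```

The same balancer also fronts port 2004 for the pickle protocol, which batches many datapoints per request and is what the relays themselves use for cache destinations.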
IP: 10.4.0.2
Host: graphite-02
IP: 10.4.0.1
Host: graphite-01
[cache:a]
line_receiver_port = 2103
pickle_receiver_port = 2104
cache_query_port = 7102
[cache:b]
line_receiver_port = 2203
pickle_receiver_port = 2204
cache_query_port = 7202
[relay]
line_receiver_port = 2003
pickle_receiver_port = 2004
relay_method = consistent-hashing
replication_factor = 1
Destinations = [
10.4.0.1:2104:a,10.4.0.1:2204:b,
10.4.0.2:2104:a,10.4.0.2:2204:b,
10.4.0.3:2104:a,10.4.0.3:2204:b
]
[cache:a]
line_receiver_port = 2103
pickle_receiver_port = 2104
cache_query_port = 7102
[cache:b]
line_receiver_port = 2203
pickle_receiver_port = 2204
cache_query_port = 7202
[relay]
line_receiver_port = 2003
pickle_receiver_port = 2004
relay_method = consistent-hashing
replication_factor = 1
Destinations = [
10.4.0.1:2104:a,10.4.0.1:2204:b,
10.4.0.2:2104:a,10.4.0.2:2204:b,
10.4.0.3:2104:a,10.4.0.3:2204:b
]
[aggregator]
line_receiver_port = 2013
pickle_receiver_port = 2014
[aggregator]
line_receiver_port = 2013
pickle_receiver_port = 2014
load balancer
IP: 10.4.0.10
Host: graphite
TCP ports: 2003, 2004; HTTP port: 80
[webapp]
memcache_hosts = [
"rf-1.cache" ]
cluster_servers = [
"10.4.0.2:80",
"10.4.0.3:80" ]
carbonlink_hosts = [
"10.4.0.1:7102",
"10.4.0.1:7202" ]
[webapp]
memcache_hosts = [
"rf-1.cache" ]
cluster_servers = [
"10.4.0.1:80",
"10.4.0.3:80" ]
carbonlink_hosts = [
"10.4.0.2:7102",
"10.4.0.2:7202" ]
IP: 10.4.0.3
Host: graphite-03
[cache:a]
line_receiver_port = 2103
pickle_receiver_port = 2104
cache_query_port = 7102
[cache:b]
line_receiver_port = 2203
pickle_receiver_port = 2204
cache_query_port = 7202
[relay]
line_receiver_port = 2003
pickle_receiver_port = 2004
relay_method = consistent-hashing
replication_factor = 1
Destinations = [
10.4.0.1:2104:a,10.4.0.1:2204:b,
10.4.0.2:2104:a,10.4.0.2:2204:b,
10.4.0.3:2104:a,10.4.0.3:2204:b
]
[aggregator]
line_receiver_port = 2013
pickle_receiver_port = 2014
[webapp]
memcache_hosts = [
"rf-1.cache" ]
cluster_servers = [
"10.4.0.1:80",
"10.4.0.2:80" ]
carbonlink_hosts = [
"10.4.0.3:7102",
"10.4.0.3:7202" ]
Key points
- Many nodes, each running carbon-relay, the webapp, and one or more carbon-cache processes.
- Run at least two carbon-cache processes per node to make full use of the hardware
(typically one process per CPU core).
- All carbon-cache instances use the same schema definitions for whisper files.
- All monitoring agents (statsd/sensu/gdash/codahale/collectd/etc.) send and query metrics
through the load-balancer front end (HAProxy or ELB).
- Each carbon-relay may route metrics to any carbon-cache instance on any Graphite server in the cluster.
- All carbon-relays use the 'consistent-hashing' method and carry exactly the same DESTINATIONS list
(carbon.conf DESTINATIONS; the lists must contain the same members, and keeping them in the same order is safest).
- All webapp processes share exactly the same memcache instance(s)
(local_settings.py MEMCACHE_HOSTS).
- Each webapp may query only local carbon-cache instances
(local_settings.py CARBONLINK_HOSTS).
- A webapp's CLUSTER_SERVERS may list not only the other webapps but also the webapp itself
(local_settings.py, as of version 0.9.10).
- Each webapp's CARBONLINK_HOSTS must contain only the local instances from DESTINATIONS
(order is not important).
- On AWS EC2, the Graphite nodes should be deployed in the same Region.
- The aggregator is not that useful; it is better to aggregate elsewhere (statsd/diamond) and send the results to Graphite.
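The consistent-hashing point can be sketched as follows. This is a simplified, hypothetical re-implementation (carbon's real ring lives in carbon/hashing.py and differs in detail), but it shows why every relay must carry an identical DESTINATIONS list: the ring, and therefore the routing, is derived purely from that list.

```python
import hashlib
from bisect import bisect

class HashRing:
    # Simplified consistent-hash ring; carbon's real implementation
    # places multiple replica points per destination, as done here.
    def __init__(self, destinations, replicas=100):
        self.ring = []
        for dest in destinations:
            for i in range(replicas):
                self.ring.append((self._position('%s:%d' % (dest, i)), dest))
        self.ring.sort()

    @staticmethod
    def _position(key):
        # First 4 hex digits of the md5 digest -> integer in 0..65535
        return int(hashlib.md5(key.encode('utf-8')).hexdigest()[:4], 16)

    def get_node(self, metric):
        # Walk clockwise from the metric's position to the next ring point.
        pos = self._position(metric)
        points = [p for p, _ in self.ring]
        return self.ring[bisect(points, pos) % len(self.ring)][1]

destinations = ['10.4.0.1:2104:a', '10.4.0.1:2204:b',
                '10.4.0.2:2104:a', '10.4.0.2:2204:b']
# Two relays built from the same list always agree on where a metric goes.
relay1, relay2 = HashRing(destinations), HashRing(destinations)
assert relay1.get_node('servers.web01.cpu.user') == relay2.get_node('servers.web01.cpu.user')
```

If one relay's list differed even by one entry, the rings would diverge and the same metric could be written to different whisper files on different nodes.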
webapp/graphite/storage.py

STORE = Store(settings.DATA_DIRS, remote_hosts=settings.CLUSTER_SERVERS)

class Store:
  def __init__(self, directories=[], remote_hosts=[]):
    self.directories = directories
    self.remote_hosts = remote_hosts
    self.remote_stores = [ RemoteStore(host) for host in remote_hosts
                           if not is_local_interface(host) ]
  ...
  def find_first(self, query):
    ...
    remote_requests = [ r.find(query) for r in self.remote_stores if r.available ]
    ...

It is safe to have exactly the same CLUSTER_SERVERS option for all webapps in a cluster
(less template work with Chef/Puppet): each webapp filters out its own local interfaces.
There are some edge cases, though: https://github.com/graphite-project/graphite-web/issues/222
CARBONLINK_HOSTS should contain only the local carbon-cache instances, not the whole DESTINATIONS list.
The webapp takes care of selecting the proper carbon-cache instance for each metric, even though it has a
different list of items in its hash ring.
https://answers.launchpad.net/graphite/+question/228472
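Putting the settings together, a local_settings.py fragment for graphite-01 in the three-node layout might look like this. This is a hypothetical fragment with values taken from the diagrams; memcached's default port 11211 is an assumption.

```python
# Hypothetical local_settings.py fragment for graphite-01 (10.4.0.1).
CLUSTER_SERVERS = ['10.4.0.1:80', '10.4.0.2:80', '10.4.0.3:80']  # same list on every node is safe
MEMCACHE_HOSTS = ['rf-1.cache.amazonaws.com:11211']              # shared by all webapps
CARBONLINK_HOSTS = ['127.0.0.1:7102:a', '127.0.0.1:7202:b']      # local cache instances only
```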
webapp/graphite/render/datalib.py

# Data retrieval API
def fetchData(requestContext, pathExpr):
  ...
  if requestContext['localOnly']:
    store = LOCAL_STORE
  else:
    store = STORE

  for dbFile in store.find(pathExpr):
    log.metric_access(dbFile.metric_path)
    dbResults = dbFile.fetch( timestamp(startTime), timestamp(endTime) )
    try:
      cachedResults = CarbonLink.query(dbFile.real_metric)
      results = mergeResults(dbResults, cachedResults)
    except:
      log.exception()
      results = dbResults
    if not results:
      continue
    ...
  return seriesList
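The mergeResults step above is what makes fresh data visible: datapoints still buffered in carbon-cache fill the gaps a whisper read leaves behind. A simplified, hypothetical version of that merge:

```python
def merge_results(db_values, cached_points, start, step):
    # db_values: whisper datapoints aligned to 'start' with 'step' spacing
    # cached_points: (timestamp, value) pairs still held by carbon-cache
    values = list(db_values)
    for ts, value in cached_points:
        i = (ts - start) // step
        if 0 <= i < len(values) and values[i] is None:
            values[i] = value   # cache fills the holes the disk read left
    return values

# A point written seconds ago is in the cache but not yet on disk:
merged = merge_results([1.0, None, None], [(120, 5.0)], start=100, step=10)
# -> [1.0, None, 5.0]
```

This is why CARBONLINK_HOSTS must point at the caches that actually own the metric: querying the wrong instance returns nothing and the most recent datapoints simply stay None until the cache flushes.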
Useful links:
http://graphite.readthedocs.org
http://www.aosabook.org/en/graphite.html
http://rcrowley.org/articles/federated-graphite.html
http://bitprophet.org/blog/2013/03/07/graphite/
http://boopathi.in/blog/the-graphite-story-directi
https://answers.launchpad.net/graphite/+question/228472
