Understanding your Kubernetes storage capabilities is important for running a proper cluster in production. In this session I will demonstrate how to use Sherlock, an open-source platform written to test persistent NVMe/TCP storage in Kubernetes, either via synthetic workloads or via a variety of databases, all easily done and summarized to give you an estimate of the IOPS, latency, and throughput your storage can provide to the Kubernetes cluster.
Testing Persistent Storage Performance in Kubernetes with Sherlock
1. Brought to you by
Testing Persistent Storage
Performance in Kubernetes
with Sherlock
Sagy Volkov
Distinguished Performance Architect at Lightbits Labs
2. Sagy Volkov
Distinguished Performance Architect
■ Based in Boulder, Colorado
■ Mantra: Displaying performance results should be tuned to the audience.
■ 5th startup I’ve joined :)
■ Climbed (so far) 18 out of 58 fourteeners in Colorado
3. Why Storage Performance on Kubernetes?
■ More and more enterprise customers are moving tier 1 & 2 traditional applications to K8s.
■ Performance is a key factor in determining SLAs.
■ K8s focuses more on compute resources and less on storage.
■ Storage can reside inside the K8s cluster (converged or internal) or outside it (external).
4. Why Sherlock?
■ Started as a project for a financial entity that wanted “real life” performance numbers from authentic applications rather than just generating IOs via fio (for example).
■ It can still be challenging for new K8s users to deploy databases on K8s and then understand how to “hammer” the storage to its full potential.
■ Grew into a tool to easily test storage performance on K8s.
5. Why NVMe/TCP and Lightbits?
■ We wrote the spec and code for the nvme_tcp module in the Linux kernel.
■ Kernel 4.10 (and of course easily backported)
■ Many companies have joined since then to contribute.
■ TCP infrastructure exists in every datacenter, a much simpler alternative to NVMe over Fabrics/FC.
■ Lightbits is the only SDS that provides latency close to, and IOPS even better than, local NVMe storage.
■ Elastic Raid, Intelligent Flash Management and core storage features.
6. Storage Performance
■ Latency/IOPS/BW - see what is really important to you or your project.
■ Make sure to test storage under duress as well.
■ Measure recovery time (per storage) both with and without your application running.
■ SDS is network sensitive, so be careful what you choose.
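As a starting point for a duress test, a minimal fio job sketch might look like the following. The target directory and all values here are illustrative assumptions, not a recommendation:

```ini
; Sketch of a sustained random-read duress job for fio.
; directory, size, and runtime are assumptions - tune to your environment.
[duress-randread]
directory=/mnt/test
ioengine=libaio
rw=randread
bs=4k
iodepth=32
numjobs=4
size=10G
runtime=300
time_based=1
group_reporting=1
```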
8. Sherlock (1)
■ Currently supports 4 databases and 3 workloads (and more are coming):
● PostgreSQL (sysbench and pgbench)
● MySQL (sysbench)
● MS SQL server (HammerDB)
● MongoDB (YCSB)
9. Sherlock (2)
■ Written in bash because it exists everywhere. A Python version is coming.
■ Forces databases to spread evenly across worker nodes.
■ Does the heavy lifting of creating deployments, populating data, and running the actual benchmarks.
■ Easily scriptable to run multiple options.
■ Runs from any Linux/Mac machine, either inside or outside the cluster, as long as you can access the K8s API.
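Because everything is driven from the CLI, scripting multiple options can be as simple as a loop over run names. A minimal sketch (the repeat-run scheme is an assumption; the script and config file names match the examples later in the deck):

```shell
#!/usr/bin/env bash
# Sketch: repeat the same workload three times, tagging each run with a
# distinct name so the summarized results stay separate.
sweep_runs() {
  for i in 1 2 3; do
    # Dry run: print the command; drop the echo to execute for real.
    echo "./run_database_workload-parallel -b sysbench -j run -c sherlock.config -n psql-run${i}"
  done
}
sweep_runs
```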
10. Layout of the Project
sherlock
├── Databases <- scripts for creating/deleting DBs, running workloads and config files
│ └── sample_config_files <- very good examples here.
├── containers <- source of all containers
│ ├── benchmark_container
│ │ ├── mongo-yum
│ │ ├── pgbench_tests
│ │ ├── runs
│ │ └── ycsb
│ ├── fio_container
│ │ └── runs
│ └── stats_container
│ └── scripts
└── fio <- yup, there’s still a way to run fio within sherlock
11. Configuration
■ We’re testing storage (and application) performance, so the K8s cluster should be idle.
■ All worker nodes should be equal from a HW perspective; if not, use the lowest node as the baseline to calculate resources.
■ An equal number of DBs will be created on all worker nodes.
■ Same number and size of PVs. Not too small - we’re not testing the buffer cache.
■ Do the math on what each worker has in terms of resources, and allocate CPU cycles and memory for the database pods *and* the benchmark pods.
■ I usually leave 25% headroom.
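Translating those rules into a config might look like the sketch below. The variable names here are hypothetical, not Sherlock's actual keys - see Databases/sample_config_files in the repo for real examples:

```shell
# Hypothetical config sketch (names are illustrative, not Sherlock's real keys).
STORAGE_CLASS="lightbits-nvme-tcp"  # StorageClass under test
DB_PER_WORKER=2                     # equal DB count on every worker node
PVC_SIZE="100Gi"                    # big enough that we don't just test buffer cache
DB_CPU=4                            # CPU for each database pod
DB_MEMORY="8Gi"                     # memory for each database pod
BENCH_CPU=2                         # CPU for each benchmark pod
# With 16 cores per worker: 2*(4+2)=12 cores used, ~25% left as headroom.
```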
13. Database Deployment
create_databases -c <path to config file>
$ ./create_databases
Now using project "postgresql" on server "https://api.vaelin.ocsonazure.com:6443".
You can add applications to this project with the 'new-app' command. For example, try:
oc new-app ruby~https://github.com/sclorg/ruby-ex.git
to build a new example application in Ruby. Or use kubectl to deploy a simple Kubernetes application:
kubectl create deployment hello-node --image=gcr.io/hello-minikube-zero-install/hello-node
20200924:04:12:45: Creating postgresql database pod postgresql-0 on node vaelin-nkmld-worker-eastus21-bwr2l
persistentvolumeclaim/postgresql-pvc-0 created
deployment.apps/postgresql-0 created
service/postgresql-0 created
pod/postgresql-0-6cd6667847-r9hlw condition met
20200924:04:13:13: Creating postgresql database pod postgresql-1 on node vaelin-nkmld-worker-eastus21-jt7qb
persistentvolumeclaim/postgresql-pvc-1 created
deployment.apps/postgresql-1 created
service/postgresql-1 created
pod/postgresql-1-64bbb6f567-csvfn condition met
20200924:04:13:44: Creating postgresql database pod postgresql-2 on node vaelin-nkmld-worker-eastus21-tvflp
persistentvolumeclaim/postgresql-pvc-2 created
deployment.apps/postgresql-2 created
service/postgresql-2 created
14. Loading Data
run_database_workload-parallel -b sysbench -j prepare -c <path to config file>
$ ./run_database_workload-parallel -b sysbench -j prepare -c sherlock.config
20200924:05:01:30: Starting sysbench job for prepare in deployment postgresql-0 with database ip 10.129.2.12 ...
20200924:05:01:31: job.batch/sysbench-prepare-postgresql-0-maccnb-494r6 is using sysbench pod
sysbench-prepare-postgresql-0-maccnb-494r6-tsv2d on node vaelin-nkmld-worker-eastus21-bwr2l
20200924:05:01:31: Starting sysbench job for prepare in deployment postgresql-1 with database ip 10.128.2.12 ...
20200924:05:01:32: job.batch/sysbench-prepare-postgresql-1-maccnb-xfrr2 is using sysbench pod
sysbench-prepare-postgresql-1-maccnb-xfrr2-fr7v9 on node vaelin-nkmld-worker-eastus21-jt7qb
20200924:05:01:32: Starting sysbench job for prepare in deployment postgresql-2 with database ip 10.131.0.23 ...
20200924:05:01:32: job.batch/sysbench-prepare-postgresql-2-maccnb-td6cr is using sysbench pod
sysbench-prepare-postgresql-2-maccnb-td6cr-g78kj on node vaelin-nkmld-worker-eastus21-tvflp
20200924:05:01:33: Starting sysbench job for prepare in deployment postgresql-3 with database ip 10.129.2.13 ...
20200924:05:01:33: job.batch/sysbench-prepare-postgresql-3-maccnb-gqrc4 is using sysbench pod
sysbench-prepare-postgresql-3-maccnb-gqrc4-92xxn on node vaelin-nkmld-worker-eastus21-bwr2l
20200924:05:01:33: Starting sysbench job for prepare in deployment postgresql-4 with database ip 10.128.2.13 ...
20200924:05:01:34: job.batch/sysbench-prepare-postgresql-4-maccnb-wb4d8 is using sysbench pod
sysbench-prepare-postgresql-4-maccnb-wb4d8-4gz2h on node vaelin-nkmld-worker-eastus21-jt7qb
20200924:05:01:34: Starting sysbench job for prepare in deployment postgresql-5 with database ip 10.131.0.24 ...
…
15. Running Workload
run_database_workload-parallel -b sysbench -j run -c <path to config file> -n <some name for the run>
$ ./run_database_workload-parallel -b sysbench -j run -c sherlock.config -n psql1
clusterrole.rbac.authorization.k8s.io/system:openshift:scc:hostnetwork added: "default"
20200924:06:04:52: Starting to collect stats on worker node vaelin-nkmld-worker-eastus21-bwr2l...
20200924:06:04:53: Starting to collect stats on worker node vaelin-nkmld-worker-eastus21-jt7qb...
20200924:06:04:54: Starting to collect stats on worker node vaelin-nkmld-worker-eastus21-tvflp...
20200924:06:04:54: Starting to collect stats on sds node vaelin-nkmld-cephnode-eastus21-5c7xj...
20200924:06:04:55: Starting to collect stats on sds node vaelin-nkmld-cephnode-eastus21-5fv74...
20200924:06:04:56: Starting to collect stats on sds node vaelin-nkmld-cephnode-eastus21-8fwqv...
20200924:06:04:56: Starting sysbench job for run in deployment postgresql-0 with database ip 10.129.2.12 ...
20200924:06:04:57: job.batch/sysbench-run-postgresql-0-psql1-9ccs5 is using sysbench pod sysbench-run-postgresql-0-psql1-9ccs5-2krw5 on node
vaelin-nkmld-worker-eastus21-bwr2l
20200924:06:04:57: Starting sysbench job for run in deployment postgresql-1 with database ip 10.128.2.12 ...
20200924:06:04:58: job.batch/sysbench-run-postgresql-1-psql1-c6r6k is using sysbench pod sysbench-run-postgresql-1-psql1-c6r6k-q85sg on node
vaelin-nkmld-worker-eastus21-jt7qb
…
…
20200924:06:05:01: Waiting for jobs to complete ...
job.batch/sysbench-run-postgresql-3-psql1-tg8cn condition met
job.batch/stats-sds-psql1-vaelin-nkmld-cephnode-eastus21-8fwqv-gg89r condition met
job.batch/sysbench-run-postgresql-4-psql1-nnzwt condition met
job.batch/sysbench-run-postgresql-2-psql1-nsggv condition met
…
…