
Load test Cassandra – The native way – Part 2: The How

In the previous post we talked briefly about why we need to load test a service. In this post we’ll pick up where we left off and look at how to load test Cassandra. So, let’s get started with this amazingly powerful tool: cassandra-stress.


Things to keep in mind:

  • A load test should simulate the real-world scenario, so it is very important to keep the setup as close as possible to the one in production. It is highly recommended to use a separate node/host close to the cluster for load testing (e.g. deploy the load test server in the same region if your deployment is in AWS).
  • Do not use any node from the cluster itself for load testing. Since cassandra-stress comes bundled with the Cassandra distribution, it is tempting to run the tool directly on one of the nodes. However, cassandra-stress is a heavyweight process that can consume a lot of JVM resources and, in turn, cloud your node’s performance.
  • Keep in mind that cassandra-stress is not a distributed program, so when testing a cluster we need to make sure that memory on the load-generating host is not a bottleneck. I would recommend a host with at least 16 GB of memory.
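
The checklist above can be turned into a quick pre-flight check for the load-generating host. This is only a sketch: the function name `preflight_problems`, the host names, and the `MIN_MEMORY_BYTES` constant are illustrative assumptions and not part of cassandra-stress.

```python
MIN_MEMORY_BYTES = 16 * 1024**3  # the 16 GB recommended above


def preflight_problems(load_host, cluster_nodes, memory_bytes):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if load_host in cluster_nodes:
        problems.append(f"{load_host} is a cluster node; use a separate host")
    if memory_bytes < MIN_MEMORY_BYTES:
        gib = memory_bytes / 1024**3
        problems.append(f"only {gib:.1f} GiB of memory; 16 GiB or more is recommended")
    return problems


if __name__ == "__main__":
    # Hypothetical host names, for illustration only.
    nodes = {"cass-1", "cass-2", "cass-3"}
    print(preflight_problems("loadgen-1", nodes, 32 * 1024**3))  # []
    print(preflight_problems("cass-2", nodes, 8 * 1024**3))
```

On Linux, the physical memory of the current host can usually be read with `os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")` and passed in as `memory_bytes`.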

How to use cassandra-stress:

Step 1 : The configuration file

The configuration file tells cassandra-stress how to prepare the keyspace, the table, and the data for the load test. We need to configure a handful of properties that define the keyspace, the table, the data distribution for the test, and the queries to test.

  • keyspace: Keyspace name
  • keyspace_definition: Define the keyspace
  • table: Table name
  • table_definition: Define the table
  • columnspec: Column distribution specifications
  • insert: Batch ratio distribution specifications
  • queries: A list of queries you wish to run against the schema

# Keyspace name
keyspace: keyspace_to_load_test
# The CQL for creating a keyspace (optional if it already exists)
keyspace_definition: |
  CREATE KEYSPACE keyspace_to_load_test with replication = {'class': 'SimpleStrategy', 'replication_factor' : '3'}
# Table name
table: table_to_load_test
# The CQL for creating the table you wish to stress (optional if it already exists)
table_definition: |
  CREATE TABLE table_to_load_test (
    id uuid,
    column1 text,
    column2 int,
    PRIMARY KEY((id), column1))
### Column Distribution Specifications ###
columnspec:
  - name: id
    population: GAUSSIAN(1..1000000, 500000, 15) # Normal distribution to mimic the production load
  - name: column1
    size: uniform(5..20) # Anywhere from 5 to 20 characters
    cluster: fixed(5) # Assuming that we would have 5 distinct carriers
  - name: column2
    population: uniform(100..500) # Integer seeds anywhere from 100 to 500
### Batch Ratio Distribution Specifications ###
insert:
  partitions: fixed(1) # Each insert touches a single partition
  select: fixed(1)/5 # We want to update 1/5th of the rows in the partition at any given time
  batchtype: UNLOGGED # Use unlogged batches
#
# A list of queries you wish to run against the schema
#
queries:
  queryForUseCase:
    cql: select * from table_to_load_test where id = ? and column1 = ?
    fields: samerow # Bind both parameters from the same generated row
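
To get a feel for what the columnspec distributions generate, they can be approximated outside the tool. The sketch below assumes that `GAUSSIAN(min..max, mean, stdev)` is a normal distribution with the given mean and standard deviation, kept inside the range, and that `uniform(min..max)` includes both ends. It uses Python's `random` module rather than cassandra-stress's own generators, so the exact values will differ; the helper names are made up for illustration.

```python
import random


def gaussian(rng, lo, hi, mean, stdev):
    """Approximate GAUSSIAN(lo..hi, mean, stdev): a normal draw clamped to the range."""
    return min(hi, max(lo, round(rng.gauss(mean, stdev))))


def uniform(rng, lo, hi):
    """Approximate uniform(lo..hi): every integer in the range is equally likely."""
    return rng.randint(lo, hi)


rng = random.Random(42)  # fixed seed so repeated runs give the same samples

# id: population GAUSSIAN(1..1000000, 500000, 15)
id_seeds = [gaussian(rng, 1, 1_000_000, 500_000, 15) for _ in range(10_000)]
# column1: size uniform(5..20)
column1_sizes = [uniform(rng, 5, 20) for _ in range(10_000)]

print("distinct id seeds:", len(set(id_seeds)))
print("id seed range:", min(id_seeds), "to", max(id_seeds))
print("column1 sizes:", min(column1_sizes), "to", max(column1_sizes))
```

Under this reading, a standard deviation of 15 keeps nearly all id seeds within a few dozen of 500000, so the run would touch on the order of a hundred distinct partitions. If production traffic is spread over many more keys, a wider standard deviation may mimic it better.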

Now that we have the configuration file ready, we can use it to run our load test with the cassandra-stress tool. Let’s see how to run the tool.


Step 2 : Command options

The cassandra-stress tool comes bundled with your Cassandra distribution download. You will find it in apache-cassandra-<version>/tools/bin/. You can learn more about the available options through the tool’s help option. In this post I will go through one example and show you how to run the tool.

cassandra-stress user profile=stresstest.yaml duration=4h 'ops(insert=100,queryForUseCase=1)' cl=LOCAL_QUORUM -node <node list separated by commas> -rate threads=450 throttle=30000/s -graph file="stress-result-4h-ratelimit-clients.html" title=Stress-test-4h -log file=result.log

Let’s go over the options I used one by one to understand what they mean. This is by no means a comprehensive explanation; I highly recommend giving the documentation a good read to learn more about these options.

  • user: Tells cassandra-stress to run the load test against a user-specified schema.
  • profile: Location of the configuration file (YAML file).
  • duration: How long the load test should run.
  • ops: Operations from the YAML file to include in the load test, with their relative weights. In our example these are insert and queryForUseCase.
  • cl: Consistency level for the operations.
  • node: Nodes in the cluster.
  • rate: Number of client threads and the peak ops/sec limit.
  • graph: Graphical report of the run. Specify the file name and the title of the report.
  • log: Log file name.
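
When the same test is run repeatedly with different durations or rates, it can help to assemble the command programmatically. The following is a minimal sketch: `build_stress_command` is a hypothetical helper (not part of Cassandra), the node addresses are placeholders, and it only covers the options used in this post.

```python
import shlex
import subprocess


def build_stress_command(profile, duration, ops, nodes, threads, throttle,
                         cl="LOCAL_QUORUM", graph_file=None, graph_title=None,
                         log_file=None):
    """Assemble the argument list for `cassandra-stress user ...`."""
    ops_spec = ",".join(f"{name}={weight}" for name, weight in ops.items())
    cmd = [
        "cassandra-stress", "user",
        f"profile={profile}",
        f"duration={duration}",
        f"ops({ops_spec})",
        f"cl={cl}",
        "-node", ",".join(nodes),
        "-rate", f"threads={threads}", f"throttle={throttle}/s",
    ]
    if graph_file:
        cmd += ["-graph", f"file={graph_file}"]
        if graph_title:
            cmd.append(f"title={graph_title}")
    if log_file:
        cmd += ["-log", f"file={log_file}"]
    return cmd


cmd = build_stress_command(
    profile="stresstest.yaml",
    duration="4h",
    ops={"insert": 100, "queryForUseCase": 1},
    nodes=["10.0.0.1", "10.0.0.2"],  # placeholder addresses, replace with your nodes
    threads=450,
    throttle=30000,
    graph_file="stress-result-4h-ratelimit-clients.html",
    graph_title="Stress-test-4h",
    log_file="result.log",
)
print(shlex.join(cmd))  # shell-quoted form, handy for copy and paste
# subprocess.run(cmd, check=True)  # uncomment on a host with cassandra-stress on PATH
```

Passing the arguments as a list to `subprocess.run` avoids shell quoting issues with the parentheses in the ops argument.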

It is as simple as this. The tool will now run for the duration specified and output a detailed report on the run.
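
Before starting a run this long, it is worth estimating how much work it will generate. The sketch below assumes the throttle is sustained for the whole duration (so the result is an upper bound) and that the ops weights are applied proportionally; `expected_operations` is a made-up helper name.

```python
def expected_operations(throttle_per_sec, duration_hours, weights):
    """Upper bound on operations per type if the throttle is sustained throughout."""
    total = throttle_per_sec * duration_hours * 3600
    weight_sum = sum(weights.values())
    return {name: total * w // weight_sum for name, w in weights.items()}


# Values from the example command: throttle=30000/s, duration=4h, insert=100, queryForUseCase=1
counts = expected_operations(30000, 4, {"insert": 100, "queryForUseCase": 1})
for name, count in counts.items():
    print(f"{name}: at most about {count:,} operations")
```

For the example command this comes to at most 432 million operations in total, roughly 428 million inserts and 4.3 million queries, which is useful for checking that the cluster has enough disk space for the test.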

I hope you found this helpful. I would be delighted to answer any questions about it.
