Load test Cassandra – The native way – Part 2: The How

In the previous post we talked briefly about why we need to load test a service. In this post we’ll pick up where we left off and look at how to load test Cassandra. So, let’s get started on how to use this amazingly powerful tool: cassandra-stress.

Things to keep in mind:

A load test should simulate the real-world scenario, so it is very important to keep the setup as close as possible to the one in production. It is highly recommended to use a separate node/host in proximity to the cluster for load testing (e.g. deploy the load test server in the same region if your deployment is in AWS).
Do not use any node from the cluster itself for load testing. Since cassandra-stress comes bundled with the Cassandra distribution, it is tempting to run it directly on one of the nodes. However, cassandra-stress is a heavyweight process that can consume a lot of resources and in turn cloud that node’s performance.
Keep in mind that cassandra-stress is not a distributed program: it runs on a single host. So, to test a whole cluster, we need to make sure that memory on that host is not a bottleneck. I would recommend a host with at least 16 GB of memory.
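
The points above can be turned into a quick pre-flight check before a run. The Python sketch below is only an illustration: the function name `preflight_problems`, the sample addresses and the way memory is read are my own assumptions and not part of cassandra-stress. The 16 GB threshold is the recommendation from above.

```python
import os

MIN_MEMORY_BYTES = 16 * 1024**3  # the 16 GB recommendation from above


def total_memory_bytes():
    """Physical memory of this host (relies on os.sysconf, so Unix-like systems only)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def preflight_problems(local_addresses, cluster_nodes, memory_bytes):
    """Return a list of human-readable problems; an empty list means 'go'."""
    problems = []
    # The load generator must not be one of the nodes under test.
    overlap = sorted(set(local_addresses) & set(cluster_nodes))
    if overlap:
        problems.append("load generator is also a cluster node: " + ", ".join(overlap))
    if memory_bytes < MIN_MEMORY_BYTES:
        problems.append("host has less than 16 GB of memory")
    return problems


if __name__ == "__main__":
    # Hypothetical addresses, for illustration only.
    nodes = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
    for problem in preflight_problems(["10.0.0.50"], nodes, total_memory_bytes()):
        print("WARNING:", problem)
```

Proximity to the cluster (same region, low latency) is not checked here; that still needs a look at where the host is deployed.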

How to use `cassandra-stress`:

Step 1 : The configuration file

The configuration file tells the cassandra-stress tool how to prepare the keyspace, the table and the data for the load test. We need to configure a handful of properties that define the keyspace, the table, the data distribution for the test and the queries to run.

keyspace	Keyspace name
keyspace_definition	Define keyspace
table	Table name
table_definition	Define the table definition
columnspec	Column Distribution Specifications
insert	Batch Ratio Distribution Specifications
queries	A list of queries you wish to run against the schema

	# Keyspace name
	keyspace: keyspace_to_load_test

	# The CQL for creating a keyspace (optional if it already exists)
	keyspace_definition: |
	  CREATE KEYSPACE keyspace_to_load_test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '3'};

	# Table name
	table: table_to_load_test

	# The CQL for creating a table you wish to stress (optional if it already exists)
	table_definition: |
	  CREATE TABLE table_to_load_test (
	    id uuid,
	    column1 text,
	    column2 int,
	    PRIMARY KEY ((id), column1)
	  )

	### Column Distribution Specifications ###

	columnspec:
	  - name: id
	    population: GAUSSIAN(1..1000000, 500000, 15) # Normal distribution (mean 500000, stdev 15) to mimic the production load

	  - name: column1
	    size: uniform(5..20) # Anywhere from 5 to 20 characters
	    cluster: fixed(5) # Assuming that we would have 5 distinct carriers (5 rows per partition)

	  - name: column2
	    size: uniform(100..500) # Size range 100..500 (size mainly matters for variable-length types such as text)

	### Batch Ratio Distribution Specifications ###

	insert:
	  partitions: fixed(1) # Each insert touches a single partition
	  select: fixed(1)/5 # We want to update 1/5th of the rows in the partition at any given time
	  batchtype: UNLOGGED # Use unlogged batches

	#
	# A list of queries you wish to run against the schema
	#
	queries:
	  queryForUseCase:
	    cql: select * from table_to_load_test where id = ? and column1 = ?
	    fields: samerow

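
To get a feel for what the `population` setting of `id` produces, here is a small Python sketch that imitates it outside of Cassandra. It rests on my reading of the tool’s help text, where `GAUSSIAN(min..max, mean, stdev)` takes an explicit mean and standard deviation. The helper name `sample_ids`, the clamping to the range and the sample count are assumptions made for illustration, and the real generator inside cassandra-stress may differ in detail.

```python
import random


def sample_ids(count, low, high, mean, stdev, seed=42):
    """Draw `count` integers from a normal distribution, clamped to [low, high]."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same sample
    samples = []
    for _ in range(count):
        value = round(rng.gauss(mean, stdev))
        samples.append(min(high, max(low, value)))
    return samples


if __name__ == "__main__":
    # Same numbers as GAUSSIAN(1..1000000, 500000, 15) in the profile above.
    ids = sample_ids(100_000, 1, 1_000_000, 500_000, 15)
    print("distinct values:", len(set(ids)))
    print("smallest/largest:", min(ids), max(ids))
```

With a standard deviation of 15, the samples stay within roughly a hundred values on either side of 500000, so a run would keep hitting a fairly small set of partitions. If the production keys are spread more widely, a larger standard deviation is worth considering.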

Now that we have this configuration file ready, we can use it to run our load test with the cassandra-stress tool. Let’s see how to run the tool now.

Step 2 : Command options

The cassandra-stress tool comes bundled with your Cassandra distribution download. You will find it in apache-cassandra-<version>/tools/bin/. You can learn more about the available options through the tool’s help option. In this post I will go through an example and show you how to run the tool.

cassandra-stress user profile=stresstest.yaml duration=4h 'ops(insert=100,queryForUseCase=1)' cl=LOCAL_QUORUM -node <nodelist separated by commas> -rate 'threads=450' throttle=30000/s -graph file="stress-result-4h-ratelimit-clients.html" title=Stress-test-4h -log file=result.log

Let’s go over the options I used one by one to understand what they mean. This is by no means a comprehensive explanation; I would highly recommend giving the documentation a good read to learn more about these options.

user	Tells cassandra-stress to run the load test against a user-specified schema.
profile	Path to the configuration (YAML) file.
duration	How long the load test should run.
ops	Operations from the YAML file to include in the load test, here `insert` and `queryForUseCase`. The numbers set the ratio between them, so this run issues about 100 inserts for every query.
cl	Consistency level for your operations
node	Nodes in the cluster
rate	# of threads and peak ops/sec limit
graph	Graphical report of the run. Specify the file name and title of the report
log	Log file name

It is as simple as this. The tool will now run for the duration specified and output a detailed report on the run.
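
If you run the same test repeatedly with different durations or node lists, it can help to assemble the command from parameters instead of editing one long line. The Python sketch below does that using only the options from the example above. The function name `build_stress_command` and the node addresses are hypothetical, and the script only prints the command rather than running it.

```python
import shlex


def build_stress_command(profile, duration, ops, nodes, threads, throttle,
                         cl="LOCAL_QUORUM", graph_file=None, graph_title=None,
                         log_file=None):
    """Assemble the cassandra-stress argument list used in this post."""
    # ops maps an operation name from the YAML file to its ratio weight.
    ops_spec = ",".join(f"{name}={weight}" for name, weight in ops.items())
    command = [
        "cassandra-stress", "user",
        f"profile={profile}",
        f"duration={duration}",
        f"ops({ops_spec})",
        f"cl={cl}",
        "-node", ",".join(nodes),
        "-rate", f"threads={threads}", f"throttle={throttle}/s",
    ]
    if graph_file:
        command += ["-graph", f"file={graph_file}"]
        if graph_title:
            command.append(f"title={graph_title}")
    if log_file:
        command += ["-log", f"file={log_file}"]
    return command


if __name__ == "__main__":
    cmd = build_stress_command(
        profile="stresstest.yaml",
        duration="4h",
        ops={"insert": 100, "queryForUseCase": 1},
        nodes=["10.0.0.1", "10.0.0.2", "10.0.0.3"],  # hypothetical addresses
        threads=450,
        throttle=30000,
        graph_file="stress-result-4h-ratelimit-clients.html",
        graph_title="Stress-test-4h",
        log_file="result.log",
    )
    # shlex.join quotes the ops(...) token so a shell does not interpret the parentheses.
    print(shlex.join(cmd))
```

The returned list can also be handed to `subprocess.run` as-is, which avoids shell quoting altogether.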

I hope you found this helpful. I would certainly be delighted to answer any questions about it.

Vish's brainstorm

Ramblings about all the stuff I find interesting in technology around me.
