cassandra, libraries, scala

Access Cassandra using phantom – A type-safe scala library

This post is a quick tutorial on how to use Scala for interacting with Cassandra database. We will look into how to use phantom library for achieving this. phantom is a library that is written in Scala and acts as a wrapper on top of the cassandra-driver (Java driver for Cassandra).

The library phantom offers a very concise and typesafe way to write and design Cassandra interactions using Scala. In this post, I will be sharing my 2 cents for people getting started in using this library.

Prerequisites

Before getting started, I would like to make a few assumptions to keep this post concise and short as much as possible. My assumptions are that;

  1. You know what Cassandra is.
  2. You have Cassandra running locally or on a cluster for this tutorial.
  3. You have a piece of basic knowledge of what is Scala and sbt.

In this post, let’s write a simple database client for accessing User information. For simplicity sake, our User model would just have id, firstName, lastName and emailId as its fields. Since cassandra is all about designing your data-model based on your query use-case, we shall have two use-cases for accessing data and hence shall define two tables in cassandra.

The tables would have the following schema.

CREATE KEYSPACE user_keyspace WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'};
CREATE TABLE user_keyspace.user_by_first_name (
firstname text PRIMARY KEY,
email text,
id uuid,
lastname text
);
CREATE TABLE user_keyspace.user_by_id (
id uuid PRIMARY KEY,
email text,
firstname text,
lastname text
);
view raw keyspace.cql hosted with ❤ by GitHub

Make sure your database has this schema configured before proceeding further.


Step 0: Add phantom library to the dependencies

Let us start by creating a simple sbt scala project and add the following dependency to build.sbt file to the libraryDependencies.

"com.outworkers" % "phantom-dsl_2.12" % "2.30.0"

Step 1: Define your model class

In cassandra, it is totally acceptable to denormalize the data and have multiple tables for the same model. phantom is designed to allow this. We define one case class as the data holder and several model classes that correspond to our different use cases. Let us go ahead and define the model and case classes for our User data-model.

// Base model used as DTO
case class User(id: UUID, firstName: String, lastName: String, email: String)
view raw User.scala hosted with ❤ by GitHub

Now, we define the tables that correspond to the tables in Cassandra. This allows phantom to construct the queries for us in a typesafe manner.

The following are the use cases for which we would need clients;

  1. To access the data using the user id. (user_by_id table)
  2. To access the data using the user’s first name. (user_by_first_name table)

We create appropriate models that reflect these tables in cassandra and define low-level queries using phantom-dsl to allow phantom to construct queries for us so we don’t have to write any cql statements ever in our application.

// Query based model
abstract class UserById extends Table[UserById, User] {
// Override table metadata
override def tableName: String = "user_by_id"
// Define the table schema
object id extends UUIDColumn with PartitionKey
object fName extends StringColumn {
override def name: String = "firstName"
}
object lName extends StringColumn {
override def name: String = "lastName"
}
object email extends StringColumn
// Define low level queries
def createUserById(uuid: UUID, firstName: String, lastName: String, emailId: String): Future[ResultSet] = {
insert
.value(_.id, uuid)
.value(_.fName, firstName)
.value(_.lName, lastName)
.value(_.email, emailId)
.future()
}
def getUserById(uuid: UUID): Future[Option[User]] = select
.all()
.where(_.id eqs uuid)
.one()
}
abstract class UserByFirstName extends Table[UserByFirstName, User] {
// Override table metadata
override def tableName: String = "user_by_first_name"
// Define the table schema
object fName extends StringColumn with PartitionKey {
override def name: String = "firstName"
}
object id extends UUIDColumn
object lName extends StringColumn {
override def name: String = "lastName"
}
object email extends StringColumn
// Define low level queries
def createUserByUserName(uuid: UUID, firstName: String, lastName: String, emailId: String): Future[ResultSet] = {
insert
.value(_.id, uuid)
.value(_.fName, firstName)
.value(_.lName, lastName)
.value(_.email, emailId)
.future()
}
def getUserByFirstName(firstName: String): Future[Option[User]] = select
.all()
.where(_.fName eqs firstName)
.one()
}
view raw UserModel.scala hosted with ❤ by GitHub

You can access the state of the project until this step in GitHub here.

Now that we have our models and low level queries defined, we need to design an effective way to manage our session and database instances.

Step 2: Encapsulate database instances in a Provider

Since we have interactions with multiple tables for a single model User, phantom provides a way to encapsulate these database instances at one place and manage it for us. This way, our implementations won’t be scattered around and will be effectively managed with the appropriate session object.

So, in this step, we define an encapsulation for all the database instances by extending phantom‘s Database class. This is where we create instances for the models we defined in the above step.

import com.outworkers.phantom.dsl._
// This class will encapsulate all the valid database instances
class UserDatabase(implicit cassandraConnection: CassandraConnection)
extends Database[UserDatabase](connector = cassandraConnection) {
object userByIdInstance extends UserById with Connector
object userByFirstName extends UserByFirstName with Connector
}
// This trait will act as a database instance provider.
trait UserDbProvider extends DatabaseProvider[UserDatabase]

Notice that we also defined a trait extending DatabaseProvider[UserDatabase] in the above snippet. We will in the next step discuss how this trait is useful.

You can access the state of the project until this step here.

Step 3: Orchestrate your queries using DatabaseProvider

Now, that you have your model and database instances in place, exposing these methods may not be a good design. What if you need some kind of data-validation/data-enrichment before accessing these methods. Well, to be very specific in our case, we would need to enter data into two tables when a User record is created. To serve such purposes, we need an extra layer of abstraction for accessing the queries we defined. This is the exact purpose of DatabaseProvider trait.

We define our database access layer (like a service) by extending the trait DatabaseProvider and orchestrate our low-level queries so that the application can access data with the right abstraction.

import com.outworkers.phantom.dsl._
import scala.concurrent.Future
trait UserDatabaseService extends UserDbProvider {
// Orchestrate your low level queries appropriately.
def createUser(uuid: UUID, firstName: String, lastName: String, email:String): Future[Option[UUID]] =
database.userByIdInstance.createUserById(uuid, firstName, lastName, email)
.flatMap(_ => {
database.userByFirstName.createUserByUserName(uuid, firstName, lastName, email)
.map(rs => if(rs.wasApplied) Some(uuid) else None)
})
def selectUserById(uuid: UUID): Future[Option[User]] = database.userByIdInstance.getUserById(uuid)
def selectUserByFirstName(firstName: String): Future[Option[User]] = database.userByFirstName.getUserByFirstName(firstName)
}

You can see that we were able to combine both the inserts (to user_by_id & user_by_first_name) in one method call. We could have definitely, done some sort of validation or data-transformation if we wanted to here.

You can access the state of the project until this step here.

Step 4: Putting everything together

We are all set to the database service we created so far. Lets see how this is done.

We start out by creating our CassandraConnection instance. This is how we can inject out cassandra configuration into phantom and let it manage the database session for us.

implicit val cassandraConnection: CassandraConnection = {
ContactPoint.local.keySpace("user_keyspace")
}

Here we are using cassandra installed locally, hence we used ContactPoint.local. Ideally in real-world we would have to use ContactPoints(hosts).

Then we create our database encapsulation instance and the service object.

object UserDatabase extends UserDatabase
object databaseService extends UserDatabaseService {
override def database: UserDatabase = UserDatabase
}
view raw Service.scala hosted with ❤ by GitHub

Now, it is as simple as just calling a bunch of methods to test out if our database interactions work.

val userPair = for{
uuid <- databaseService.createUser(UUID.nameUUIDFromBytes("Sumanth".getBytes), "Sumanth", "Kumar", "sumanth@indywiz.com")
userByFirstName <- databaseService.selectUserByFirstName("Sumanth")
userById <- databaseService.selectUserById(uuid.get)
} yield (userById, userByFirstName)
val res = userPair.map {
case (userById, userByFirstName) =>
logger.info(s"User by Id : ${userById.get}")
logger.info(s"User by first name: ${userByFirstName.get}")
}
view raw Call.scala hosted with ❤ by GitHub

You can find the final state of the project here.

We might have had a lot of constructs like Database, DatabaseProvider, etc, bundled with phantom-dsl. But in my opinion, this is something that makes it more than just yet another scala dsl-library. It is because of these design constructs, phantom implicitly promotes good design for people using it.

Hope you found my rambling useful.

Standard