Apache Cassandra

Apache Cassandra is a highly scalable partitioned row store. Rows are organized into tables with a required primary key.

Partitioning means that Cassandra can distribute your data across multiple machines in an application-transparent manner. Cassandra will automatically repartition as machines are added to and removed from the cluster.

Row store means that, like relational databases, Cassandra organizes data by rows and columns. The Cassandra Query Language (CQL) is a close relative of SQL.

For more information, see the Apache Cassandra web site.

Issues should be reported on the Cassandra Jira.

Requirements

  • Java: see supported versions in build.xml (search for property "java.supported").

  • Python: for cqlsh, see bin/cqlsh (search for function "is_supported_version").
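
Both values can be looked up without opening the files. The following is a minimal sketch, assuming it is run from the root of the source tree; `show_requirement` is a hypothetical helper name, and it only prints the matching lines rather than interpreting them.

```shell
# Hypothetical helper: print the lines of a file that contain a search term,
# prefixed with their line numbers, so the supported versions can be read off.
show_requirement() {
  file="$1"
  term="$2"
  if [ ! -f "$file" ]; then
    echo "not found: $file (run this from the Cassandra source root)" >&2
    return 1
  fi
  # -F treats the term as a fixed string, so the dot in "java.supported"
  # is matched literally.
  grep -nF -- "$term" "$file"
}

# Usage, from the root of the source tree:
#   show_requirement build.xml java.supported
#   show_requirement bin/cqlsh is_supported_version
```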

Getting started

This short guide will walk you through getting a basic single-node cluster up and running, and demonstrate some simple reads and writes. For a more complete guide, please see the Apache Cassandra website’s Getting Started Guide.

First, we’ll unpack our archive:

$ tar -zxvf apache-cassandra-$VERSION.tar.gz
$ cd apache-cassandra-$VERSION

After that, we start the server. Running the startup script with the -f argument will cause Cassandra to remain in the foreground and log to standard output; it can be stopped with Ctrl-C.

$ bin/cassandra -f
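
Because the server stays in the foreground, the client is run from a second terminal, and the node may need a little while before it accepts connections. A minimal sketch of a retry loop for scripted setups follows; `wait_until` is a hypothetical helper, not something shipped with Cassandra, and the probe command in the usage comment is only one reasonable choice.

```shell
# Hypothetical helper: run a command once per second until it succeeds,
# giving up after the given number of attempts.
wait_until() {
  attempts="$1"
  shift
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # Discard the probe's output; only its exit status matters here.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Usage, from a second terminal in the same directory:
#   wait_until 60 bin/cqlsh -e 'DESCRIBE CLUSTER' && echo "node is accepting CQL"
```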

Now let’s try to read and write some data using the Cassandra Query Language:

$ bin/cqlsh

The command-line client is interactive, so if everything worked you should be sitting in front of a prompt:

Connected to Test Cluster at localhost:9042.
[cqlsh 6.3.0 | Cassandra 5.0-SNAPSHOT | CQL spec 3.4.8 | Native protocol v5]
Use HELP for help.
cqlsh>

As the banner says, you can use 'help;' or '?' to see what CQL has to offer, and 'quit;' or 'exit;' when you’ve had enough fun. But let’s try something slightly more interesting:

cqlsh> CREATE KEYSPACE schema1
       WITH replication = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
cqlsh> USE schema1;
cqlsh:schema1> CREATE TABLE users (
                 user_id varchar PRIMARY KEY,
                 first varchar,
                 last varchar,
                 age int
               );
cqlsh:schema1> INSERT INTO users (user_id, first, last, age)
               VALUES ('jsmith', 'John', 'Smith', 42);
cqlsh:schema1> SELECT * FROM users;
 user_id | age | first | last
---------+-----+-------+-------
  jsmith |  42 |  John | Smith
cqlsh:schema1>

If your session looks similar to what’s above, congrats, your single-node cluster is operational!

For more on what commands are supported by CQL, see the CQL reference. A reasonable way to think of it is as "SQL minus joins and subqueries, plus collections."
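
To make the "plus collections" part concrete, the sketch below extends the users table from the session above with a set-valued column. It is an illustration only: the emails column, the example address, the file name collections_demo.cql, and the helper name `write_collections_demo` are all assumptions, not part of this guide. The helper only writes the statements to a file; running them needs the node started earlier.

```shell
# Hypothetical helper: write a few CQL statements that exercise a collection
# column to the given file (default: collections_demo.cql).
write_collections_demo() {
  out="${1:-collections_demo.cql}"
  cat > "$out" <<'EOF'
USE schema1;
-- Add a set-valued column; 'emails' is an illustrative name.
ALTER TABLE users ADD emails set<text>;
-- Add one element to the set without reading the row first.
UPDATE users SET emails = emails + {'jsmith@example.com'} WHERE user_id = 'jsmith';
SELECT user_id, emails FROM users;
EOF
}

# Usage, with the single-node cluster from this guide still running:
#   write_collections_demo
#   bin/cqlsh -f collections_demo.cql
```

The statements are kept in a file so that they can be reviewed before anything is sent to the cluster.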

Wondering where to go from here?