Installing a Cassandra Cluster Step By Step

Setting Up an Apache Cassandra Cluster

📊 Apache Cassandra is a highly scalable, distributed NoSQL database designed to handle large amounts of data across many commodity servers. Its masterless, peer-to-peer architecture provides high availability and fault tolerance with no single point of failure. In this blog, we’ll walk through setting up a basic Cassandra cluster with detailed steps and commands.

⚙️ Prerequisites

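  • Operating System: Cassandra runs well on Linux distributions such as Ubuntu, CentOS, or Red Hat. The commands in this guide use apt and assume Ubuntu or Debian.
  • Java: Each Cassandra release line supports specific Java versions. The 3.11 packages installed below (the 311x repository) run on Java 8; Java 11 support arrived with Cassandra 4.0. Install the same Java version on every server in the cluster.
  • Servers: At least two servers are needed for a cluster setup (to demonstrate replication and data distribution).
  • Networking: All servers must be able to reach each other. Open port 7000 (inter-node communication) between the nodes and port 9042 (CQL clients) to wherever your clients run; port 7001 is also needed if you enable inter-node TLS.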
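
On Ubuntu, one way to open those ports is the ufw firewall, restricted to the cluster’s own subnet rather than the whole internet. The sketch below only prints the rules so you can review them before applying anything. The helper name print_ufw_rules and the 192.168.1.0/24 subnet (chosen to match the example IPs used later) are assumptions; CentOS or Red Hat hosts would use firewalld instead.

```shell
#!/usr/bin/env bash
# Sketch: print (not apply) ufw rules that open Cassandra ports to one subnet.
# The subnet and port list are assumptions; adjust them to your network.
# Usage: print_ufw_rules SUBNET PORT...
print_ufw_rules() {
  local subnet="$1"
  shift
  local port
  for port in "$@"; do
    echo "ufw allow proto tcp from ${subnet} to any port ${port}"
  done
}

# Example: inter-node (7000) and CQL client (9042) traffic from 192.168.1.0/24.
print_ufw_rules 192.168.1.0/24 7000 9042
```

After reviewing the printed rules (with the function loaded in your shell), you could apply them with:

print_ufw_rules 192.168.1.0/24 7000 9042 | sudo sh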

🔧 Step 1: Setting Up the Environment

Update Packages on all nodes:

sudo apt-get update
sudo apt-get upgrade

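Install Java 8 (if not already installed), since the Cassandra 3.11 packages used below do not support Java 11:

sudo apt-get install openjdk-8-jdk -y

Confirm Java installation:

java -version

The output should report version 1.8.x (Java 8). If several JDKs are installed, select Java 8 with sudo update-alternatives --config java.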
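
Which Java you need depends on the Cassandra release line: the 3.11 packages from the 311x repository run on Java 8, and Java 11 support only arrived with Cassandra 4.0. A quick scripted check on each node can catch a mismatch before Cassandra fails to start. This is a minimal sketch; java_major_version and require_java are hypothetical helper names, and the parsing assumes the usual version "..." line that java -version prints to stderr.

```shell
#!/usr/bin/env bash
# Sketch: read the Java major version from `java -version` output.
# Java 8 reports itself as "1.8.0_xxx"; Java 9 and later report "11.0.x", "17", etc.
java_major_version() {
  # $1: the text printed by `java -version` (which writes to stderr)
  local version
  version=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  case "$version" in
    1.*) version="${version#1.}"; echo "${version%%.*}" ;;  # 1.8.0_292 -> 8
    *)   echo "${version%%[.+-]*}" ;;                        # 11.0.20 -> 11
  esac
}

# Fail (exit status 1) unless the installed Java has the wanted major version.
require_java() {
  local want="$1" have
  have=$(java_major_version "$(java -version 2>&1)")
  if [ "$have" != "$want" ]; then
    echo "Java ${want} required, found '${have:-none}'" >&2
    return 1
  fi
  echo "Java ${have} found"
}
```

On a node prepared for the 311x packages, require_java 8 should print "Java 8 found".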

📥 Step 2: Download and Install Apache Cassandra

Add the Cassandra Repository and import its signing keys:

echo "deb https://debian.cassandra.apache.org 311x main" | sudo tee -a /etc/apt/sources.list.d/cassandra.sources.list
wget -q -O - https://downloads.apache.org/cassandra/KEYS | sudo apt-key add -

Install Cassandra:

sudo apt-get update
sudo apt-get install cassandra -y

Verify Installation:

sudo systemctl status cassandra

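Check that the node is running and reports UN (Up/Normal). Right after startup, nodetool may fail to connect for a minute or so; retry if it does: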

nodetool status
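
Cassandra can take a minute or so to finish starting after installation or a restart, and nodetool status fails until it has. Rather than retrying by hand, a small loop can wait for the node to accept connections. A sketch, assuming bash’s built-in /dev/tcp redirection and the timeout command from coreutils; it polls the CQL port 9042, and the function name, attempt count, and delay are arbitrary choices.

```shell
#!/usr/bin/env bash
# Sketch: wait until HOST:PORT accepts TCP connections (bash /dev/tcp + coreutils timeout).
# Usage: wait_for_port HOST PORT [ATTEMPTS] [DELAY_SECONDS]
wait_for_port() {
  local host="$1" port="$2" attempts="${3:-30}" delay="${4:-2}" i
  for (( i = 1; i <= attempts; i++ )); do
    # Probe in a child bash so `timeout` can kill a connect that hangs.
    if timeout 2 bash -c ': > "/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
      echo "${host}:${port} is accepting connections"
      return 0
    fi
    sleep "$delay"
  done
  echo "${host}:${port} still closed after ${attempts} attempts" >&2
  return 1
}
```

At this stage the node still uses the package defaults, which accept clients on localhost, so a typical call is:

wait_for_port 127.0.0.1 9042 && nodetool status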

🛠️ Step 3: Configure Cassandra for Cluster Setup

Edit the primary configuration file cassandra.yaml located in /etc/cassandra/:

sudo nano /etc/cassandra/cassandra.yaml

Modify the following parameters:

    • Cluster Name: Set the same cluster name on all nodes.
cluster_name: 'MyCassandraCluster'
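    • Seeds: List the IP addresses of the seed nodes. Use the same list on every node.
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "192.168.1.1,192.168.1.2"
    • Listen Address: Set to this node’s own IP address (192.168.1.1 on the first node, 192.168.1.2 on the second, and so on).
listen_address: 192.168.1.1
    • RPC Address: Set to the node’s IP address, or to 0.0.0.0 to accept clients on all interfaces. With 0.0.0.0 you must also set broadcast_rpc_address to the node’s IP, or Cassandra will refuse to start.
rpc_address: 0.0.0.0
broadcast_rpc_address: 192.168.1.1
    • Endpoint Snitch: For a simple single-datacenter cluster, use the same snitch on every node:
endpoint_snitch: SimpleSnitch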

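Save and close the file, and make the same changes on every node. Because the Debian package starts Cassandra automatically, each node has already run once as a single-node cluster named 'Test Cluster', and it will refuse to start under the new cluster_name until its saved system data is cleared. On these freshly installed nodes, stop the service, clear the system tables, and start it again, seed nodes first (never clear data on a node that holds data you need):

sudo systemctl stop cassandra
sudo rm -rf /var/lib/cassandra/data/system/*
sudo systemctl start cassandra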
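
With more than a couple of nodes, editing cassandra.yaml by hand in nano is easy to get wrong. The same Step 3 settings can be applied non-interactively with GNU sed. This sketch assumes the stock 3.11 file layout (each key at the start of a line, a - seeds: "..." entry, and a commented # broadcast_rpc_address: line); configure_node is a hypothetical name, and the values must not contain |, &, or quotes. It also sets broadcast_rpc_address, which Cassandra requires whenever rpc_address is 0.0.0.0.

```shell
#!/usr/bin/env bash
# Sketch: apply the Step 3 settings to a cassandra.yaml file in place (GNU sed).
# Usage: configure_node YAML_FILE CLUSTER_NAME SEEDS NODE_IP
configure_node() {
  local yaml="$1" cluster="$2" seeds="$3" ip="$4"
  [ -f "$yaml" ] || { echo "no such file: $yaml" >&2; return 1; }
  cp "$yaml" "${yaml}.bak"   # keep a copy of the original file
  sed -i \
    -e "s|^cluster_name:.*|cluster_name: '${cluster}'|" \
    -e "s|^\( *- seeds:\).*|\1 \"${seeds}\"|" \
    -e "s|^listen_address:.*|listen_address: ${ip}|" \
    -e "s|^rpc_address:.*|rpc_address: 0.0.0.0|" \
    -e "s|^#* *broadcast_rpc_address:.*|broadcast_rpc_address: ${ip}|" \
    -e "s|^endpoint_snitch:.*|endpoint_snitch: SimpleSnitch|" \
    "$yaml"
}
```

Run it on each node with that node’s own IP. Because sudo cannot call a shell function directly, load the function in a root shell (for example after sudo -i) and then run:

configure_node /etc/cassandra/cassandra.yaml MyCassandraCluster "192.168.1.1,192.168.1.2" 192.168.1.2

The seeds line keeps its original indentation, so the YAML structure is preserved.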

🔍 Step 4: Verify the Cluster

Use nodetool status to check the cluster’s status:

nodetool status

The output should list every node in the cluster with status UN (Up/Normal). A node marked DN is down or unreachable; Step 6 covers the usual causes.
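
To check the cluster from a script rather than by eye, note that each node line in nodetool status starts with a two-letter code: U or D for up/down, then N, L, J, or M for normal, leaving, joining, or moving. A sketch that reads the output on stdin and fails unless exactly the expected number of nodes are UN; check_cluster is a hypothetical name, and the parsing assumes the default layout with the node address in the second column.

```shell
#!/usr/bin/env bash
# Sketch: check `nodetool status` output read from stdin.
# Usage: nodetool status | check_cluster EXPECTED_NODES
check_cluster() {
  local expected="$1"
  awk -v want="$expected" '
    # Node lines start with a status/state code such as UN, DN, UJ or UL.
    $1 ~ /^[UD][NLJM]$/ {
      total++
      if ($1 == "UN") { up++ } else { printf "not ready: %s (%s)\n", $2, $1 }
    }
    END {
      printf "%d of %d node(s) are UN (expected %d)\n", up, total, want
      # Exit 0 only when every expected node is present and UN.
      exit ((up + 0 == want + 0 && total + 0 == want + 0) ? 0 : 1)
    }'
}
```

For the two-node example cluster:

nodetool status | check_cluster 2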

📂 Step 5: Create a Keyspace and Test Replication

Open the CQL shell (cqlsh) against one of the nodes:

cqlsh 192.168.1.1

Create a Keyspace:

CREATE KEYSPACE test_keyspace WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 2};

Use the Keyspace and Create a Table:

USE test_keyspace;
CREATE TABLE users (user_id UUID PRIMARY KEY, name TEXT, email TEXT);

Insert Data and Query:

INSERT INTO users (user_id, name, email) VALUES (uuid(), 'John Doe', 'john.doe@example.com');
SELECT * FROM users;
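
The statements above write and read a row but do not by themselves show where it was replicated. A little arithmetic predicts what to look for. With SimpleStrategy and replication_factor 2 on two nodes, every row lives on both nodes, so nodetool status test_keyspace should show 100% under Owns (effective) for each node. A QUORUM request needs floor(RF/2) + 1 replicas, which is 2 here, so QUORUM reads and writes fail while either node is down, whereas cqlsh’s default consistency level ONE still succeeds. The helpers below just encode those formulas; their names are hypothetical, and the ownership estimate assumes evenly balanced token ranges.

```shell
#!/usr/bin/env bash
# Sketch: back-of-the-envelope numbers for a SimpleStrategy keyspace.

# quorum_size RF: replicas that must respond for QUORUM, floor(RF/2) + 1.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# tolerated_down RF: replicas that can be down while QUORUM still succeeds.
tolerated_down() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

# effective_ownership RF NODES: approximate % of all data stored on each node,
# assuming balanced token ranges; capped at 100 when RF >= NODES.
effective_ownership() {
  local rf="$1" nodes="$2"
  if (( rf >= nodes )); then
    echo 100
  else
    echo $(( 100 * rf / nodes ))
  fi
}
```

To see the replicas of one specific row, pass a user_id returned by the SELECT to nodetool getendpoints test_keyspace users <user_id>; with this setup it should list both node IPs.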

📡 Step 6: Troubleshooting and Seed Connection

Seed Connection Explanation:

Seed nodes act as the entry point for new nodes joining the cluster. When a node starts, it contacts the seed nodes to learn about the cluster topology. Ensure the following for proper connectivity:

  • All nodes can ping the seed nodes by their IP addresses.
  • Ports 7000 (intra-node communication) and 9042 (client communication) are open on all nodes.
  • Every node, seed or not, lists the same seed IPs in cassandra.yaml, and those IPs match the seeds’ listen_address values.
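
These checks can be scripted and run from each node. The sketch probes every seed on the given ports with bash’s /dev/tcp (so it works where nc is not installed) and prints the cluster_name, seeds, and endpoint_snitch lines from a local cassandra.yaml so they can be compared across nodes. The function names and output format are assumptions. A FAIL can also simply mean Cassandra is not running on that seed yet, since nothing listens on 7000 or 9042 until it starts.

```shell
#!/usr/bin/env bash
# Sketch: pre-flight checks for seed connectivity and config consistency.

# port_open HOST PORT: succeed if a TCP connection can be made within 3 seconds.
port_open() {
  timeout 3 bash -c ': > "/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# check_seed_ports "IP1,IP2" PORT...: report OK/FAIL for every seed/port pair.
check_seed_ports() {
  local seeds="$1" seed port rc=0
  local -a seed_list
  shift
  IFS=',' read -r -a seed_list <<< "$seeds"
  for seed in "${seed_list[@]}"; do
    seed="${seed// /}"   # tolerate "ip1, ip2"
    for port in "$@"; do
      if port_open "$seed" "$port"; then
        echo "OK   ${seed}:${port}"
      else
        echo "FAIL ${seed}:${port}"
        rc=1
      fi
    done
  done
  return "$rc"
}

# show_cluster_settings YAML_FILE: print the values that must match on every node.
show_cluster_settings() {
  grep -E '^cluster_name:|^ *- seeds:|^endpoint_snitch:' "$1"
}
```

On each node, for example:

check_seed_ports "192.168.1.1,192.168.1.2" 7000 9042
show_cluster_settings /etc/cassandra/cassandra.yaml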

Troubleshooting Steps:

  1. Seed Node Connectivity Issues:
    • Ping Test: Confirm that all nodes can communicate with the seed nodes:
      ping <seed-node-IP>
      

      If the ping fails, verify network configurations and firewalls.

    • Ports Test: Ensure ports 7000 (intra-node communication) and 9042 (client communication) are open. Use the nc command (both ports only answer once Cassandra is running on the seed):
      nc -zv <seed-node-IP> 7000
      nc -zv <seed-node-IP> 9042
      
    • DNS Resolution: If using hostnames instead of IPs for seeds, ensure the DNS resolves correctly:
      nslookup <seed-node-hostname>
      
  2. Common Errors and Fixes:
    • Error: Node Not Joining Cluster
      • Cause: Incorrect seed IPs or mismatched cluster name.
      • Fix: Ensure the same cluster_name is set in all cassandra.yaml files, and the seeds list is accurate.
    • Error: JVM Heap Size Issues
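      • Cause: The heap is too small for the workload (OutOfMemoryError or long garbage-collection pauses in system.log).
      • Fix: Set an explicit heap size by uncommenting and editing the -Xms and -Xmx lines in /etc/cassandra/jvm.options (keep both values equal), then restart Cassandra: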
        -Xms4G
        -Xmx4G
        
    • Error: Read/Write Timeout
      • Cause: High load or network latency.
      • Fix: Raise the timeouts in cassandra.yaml on every node (the defaults are 5000 ms for reads and 2000 ms for writes), then restart Cassandra:
        read_request_timeout_in_ms: 10000
        write_request_timeout_in_ms: 10000
        
  3. Logs and Debugging:
    • Use Cassandra logs for detailed error messages:
      sudo tail -f /var/log/cassandra/system.log
      

      Look for messages about gossip, seed discovery, or schema disagreements.

  4. Testing Cluster State:
    • Validate the cluster setup with:
      nodetool status
      

      Check the UN (Up and Normal) status for all nodes. If nodes are down, revisit configurations.
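
For the log check in item 3, a short summary of system.log can show where to start reading. This sketch counts ERROR and WARN entries and shows the latest ones mentioning gossip, seeds, or schema, the topics listed above. It assumes the default log layout in which each entry begins with the level (for example WARN  [main] ...); summarize_log is a hypothetical name, and the keyword list is only a starting point.

```shell
#!/usr/bin/env bash
# Sketch: summarize a Cassandra system.log.
# Usage: summarize_log [LOG_FILE]   (defaults to /var/log/cassandra/system.log)
summarize_log() {
  local log="${1:-/var/log/cassandra/system.log}"
  [ -r "$log" ] || { echo "cannot read $log" >&2; return 1; }
  # grep -c prints 0 (and exits 1) when nothing matches; the count is still echoed.
  echo "ERROR lines: $(grep -c '^ERROR ' "$log")"
  echo "WARN lines:  $(grep -c '^WARN ' "$log")"
  echo "--- last 5 ERROR/WARN lines mentioning gossip, seed or schema ---"
  grep -E '^(ERROR|WARN) ' "$log" | grep -iE 'gossip|seed|schema' | tail -n 5
}
```

Run it as a user that can read the log (for example root), optionally passing another log path as the first argument.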

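For the heap issue in item 2, it helps to know what Cassandra picks on its own before hard-coding -Xms/-Xmx. When those lines are left commented out, the stock cassandra-env.sh sizes the heap as max(min(½ RAM, 1024 MB), min(¼ RAM, 8192 MB)), so a 16 GB machine gets roughly a 4 GB heap. The sketch below re-implements that rule for comparison (it does not read Cassandra’s own settings); default_heap_mb and total_ram_mb are hypothetical names, and total_ram_mb assumes Linux’s /proc/meminfo.

```shell
#!/usr/bin/env bash
# Sketch: the default heap rule from the stock cassandra-env.sh,
#   max(min(1/2 RAM, 1024 MB), min(1/4 RAM, 8192 MB))
# default_heap_mb RAM_MB -> heap size in MB
default_heap_mb() {
  local ram_mb="$1"
  local half=$(( ram_mb / 2 ))
  local quarter=$(( half / 2 ))
  if (( half > 1024 )); then half=1024; fi
  if (( quarter > 8192 )); then quarter=8192; fi
  if (( half > quarter )); then echo "$half"; else echo "$quarter"; fi
}

# total_ram_mb: total memory of this Linux host in MB, read from /proc/meminfo.
total_ram_mb() {
  awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo
}
```

For example, default_heap_mb "$(total_ram_mb)" prints the size Cassandra would choose on the current machine. If you set the heap yourself in jvm.options, keep -Xms and -Xmx equal, as in the example above.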
 
