Installing a Cassandra Cluster Step By Step
Setting Up an Apache Cassandra Cluster
⚙️ Prerequisites
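- Operating System: Cassandra runs on Linux distributions such as Ubuntu, CentOS, or Red Hat. The commands in this guide use Ubuntu/Debian (apt); on CentOS or Red Hat, use yum/dnf and Cassandra's RPM repository instead.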
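- Java: The required Java version depends on the Cassandra release. Cassandra 3.11 requires Java 8, Cassandra 4.0 and 4.1 run on Java 8 or 11, and Cassandra 5.0 runs on Java 11 or 17. Install the same Java version on each server in the cluster.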
- Servers: At least two servers are needed for a cluster setup (to demonstrate replication and data distribution).
- Networking: All servers must be able to reach each other. Open port 7000 for intra-node communication (7001 if intra-node TLS is enabled) and port 9042 for CQL client connections.
🔧 Step 1: Setting Up the Environment
Update Packages on all nodes:
sudo apt-get update
sudo apt-get upgrade
Install Java (if not already installed):
sudo apt-get install openjdk-11-jdk -y
Confirm Java installation:
java -version
The output should report the installed version (for example, openjdk version "11.0.x"). Make sure it is a version your Cassandra release supports.
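To confirm the Java version from a script (for example, when checking several nodes), the major version can be parsed from the java -version text. This is a minimal bash sketch; java_major is a hypothetical helper name, and it assumes the usual output format, where Java 8 reports 1.8.0_x and later releases report 11.0.x, 17.0.x, and so on.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the major Java version from `java -version` output.
# Java 8 reports "1.8.0_x"; Java 9 and later report "11.0.x", "17.0.x", etc.
java_major() {
  local output="$1" version
  # Take the first quoted version string, e.g. 11.0.20 or 1.8.0_382.
  version=$(printf '%s\n' "$output" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  if [[ -z "$version" ]]; then
    return 1
  fi
  if [[ "$version" == 1.* ]]; then
    # Legacy numbering: 1.8.0_382 -> 8.0_382
    version=${version#1.}
  fi
  # Keep everything before the first ".", "_" or "-".
  echo "${version%%[._-]*}"
}

# Usage on a node (java -version writes to stderr, hence 2>&1):
#   java_major "$(java -version 2>&1)"
```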
📥 Step 2: Download and Install Apache Cassandra
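Add the Cassandra Repository. The series name 41x selects Cassandra 4.1, which runs on the Java 11 installed above (the older 311x series supports only Java 8):
echo "deb [signed-by=/etc/apt/keyrings/apache-cassandra.asc] https://debian.cassandra.apache.org 41x main" | sudo tee -a /etc/apt/sources.list.d/cassandra.sources.list
sudo mkdir -p /etc/apt/keyrings
sudo wget -q -O /etc/apt/keyrings/apache-cassandra.asc https://downloads.apache.org/cassandra/KEYS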
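The series name in the repository line (311x, 41x, and so on) determines which Java versions the installed packages support, and a mismatch between the two is a common reason Cassandra fails to start. The sketch below encodes the Java versions the Cassandra documentation lists for each series; java_supported is a hypothetical helper, and the table is an assumption to check against the release notes of the exact version you install.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if Java major version $2 is supported by
# the Cassandra apt series $1 (311x, 40x, 41x, 50x).
# The table is an assumption based on the documented requirements.
java_supported() {
  local series="$1" java="$2"
  case "$series" in
    311x)    [[ "$java" == 8 ]] ;;
    40x|41x) [[ "$java" == 8 || "$java" == 11 ]] ;;
    50x)     [[ "$java" == 11 || "$java" == 17 ]] ;;
    *)       echo "unknown series: $series" >&2; return 2 ;;
  esac
}

# Example: the 311x repository combined with OpenJDK 11 is a mismatch.
#   java_supported 311x 11 || echo "install Java 8 or use the 41x repository"
```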
Install Cassandra:
sudo apt-get update
sudo apt-get install cassandra -y
Verify Installation:
sudo systemctl status cassandra
Check that the node reports itself as up (nodetool may fail to connect for a minute or so while Cassandra is still starting):
nodetool status
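Right after the service starts, nodetool and cqlsh can fail until Cassandra finishes starting. A small polling loop avoids guessing how long to wait. This bash sketch (wait_for_port is a hypothetical name) waits for the CQL native port 9042, which Cassandra opens near the end of startup; it relies on bash's /dev/tcp redirection and the coreutils timeout command.

```bash
#!/usr/bin/env bash
# Hypothetical helper: wait until host:port accepts TCP connections.
# Returns 0 once connected, 1 if the deadline (default 120 s) passes.
wait_for_port() {
  local host="$1" port="$2" timeout_s="${3:-120}"
  local deadline=$(( SECONDS + timeout_s ))
  while (( SECONDS < deadline )); do
    # Open (and immediately close) a TCP connection via /dev/tcp;
    # `timeout 2` keeps one attempt from hanging on a filtered port.
    if timeout 2 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
      return 0
    fi
    sleep 1
  done
  return 1
}

# Usage after starting the service (use the node's IP instead of
# 127.0.0.1 if rpc_address is bound to that IP):
#   wait_for_port 127.0.0.1 9042 120 && nodetool status
```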
🛠️ Step 3: Configure Cassandra for Cluster Setup
Edit the primary configuration file cassandra.yaml located in /etc/cassandra/:
sudo nano /etc/cassandra/cassandra.yaml
Modify the following parameters:
- Cluster Name: Set the same cluster name on all nodes.
cluster_name: 'MyCassandraCluster'
- Seeds: Specify the IP addresses of the seed nodes, using the same list on every node.
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "192.168.1.1,192.168.1.2"
- Listen Address: Set to this node’s own IP address (for example, 192.168.1.2 on the second node).
listen_address: 192.168.1.1
- RPC Address: Set to the node’s IP address, or to 0.0.0.0 to accept client connections on all interfaces. With 0.0.0.0 you must also set broadcast_rpc_address to the node’s IP address; otherwise Cassandra refuses to start.
rpc_address: 0.0.0.0
broadcast_rpc_address: 192.168.1.1
- Endpoint Snitch: For a simple single-data-center cluster, use:
endpoint_snitch: SimpleSnitch
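Save and close the file. The Debian package starts Cassandra immediately after installation, so each node has already initialized itself with the default cluster name ('Test Cluster') and will refuse to start with the new cluster_name. On a fresh node with no data you need to keep, stop the service, clear its data directories, and start it again. Start the seed nodes first, then the other nodes one at a time:
sudo systemctl stop cassandra
sudo rm -rf /var/lib/cassandra/data /var/lib/cassandra/commitlog /var/lib/cassandra/saved_caches /var/lib/cassandra/hints
sudo systemctl start cassandra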
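As an alternative to editing the file by hand on each node, the Step 3 settings can be applied with sed before (re)starting the service. This is a sketch under stated assumptions: configure_node is a hypothetical helper, it requires GNU sed (the default on Ubuntu and Debian), and it expects the stock cassandra.yaml layout, with the seeds on a "- seeds:" line and broadcast_rpc_address commented out. Because it sets rpc_address to 0.0.0.0, it also sets broadcast_rpc_address to the node's IP. Keep a backup of the original file.

```bash
#!/usr/bin/env bash
# Hypothetical helper: apply the Step 3 settings to a cassandra.yaml.
#   $1 = path to cassandra.yaml   $2 = this node's IP
#   $3 = comma-separated seed IPs $4 = cluster name (optional)
# Assumes GNU sed and the stock file layout.
configure_node() {
  local yaml="$1" node_ip="$2" seeds="$3" cluster="${4:-MyCassandraCluster}"
  sed -i \
    -e "s/^cluster_name:.*/cluster_name: '${cluster}'/" \
    -e "s/^\([[:space:]]*- seeds:\).*/\1 \"${seeds}\"/" \
    -e "s/^listen_address:.*/listen_address: ${node_ip}/" \
    -e "s/^rpc_address:.*/rpc_address: 0.0.0.0/" \
    -e "s/^#\{0,1\}[[:space:]]*broadcast_rpc_address:.*/broadcast_rpc_address: ${node_ip}/" \
    -e "s/^endpoint_snitch:.*/endpoint_snitch: SimpleSnitch/" \
    "$yaml"
}

# Example for the second node, run as root (for example after sudo -i):
#   cp /etc/cassandra/cassandra.yaml /etc/cassandra/cassandra.yaml.bak
#   configure_node /etc/cassandra/cassandra.yaml 192.168.1.2 "192.168.1.1,192.168.1.2"
```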
✅ Step 4: Verify the Cluster
Use nodetool status to check the cluster’s status:
nodetool status
Every node should appear in the output with the status UN (Up/Normal). If a node is missing, check its seeds and cluster_name settings.
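For scripted checks, the nodetool status output can be reduced to a pass/fail result. This awk-based sketch (all_nodes_up is a hypothetical name) counts node rows by their two-letter state code and assumes the standard output layout, in which each node row starts with a code such as UN or DN followed by the node's address.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `nodetool status` output on stdin, print any
# node that is not UN, print a summary, and return 0 only if at least
# one node row was found and every node is UN.
all_nodes_up() {
  awk '
    # Node rows start with a state code: U/D (up/down) + N/L/J/M.
    $1 ~ /^[UD][NLJM]$/ {
      total++
      if ($1 == "UN") up++
      else print "not ready: " $2 " (" $1 ")"
    }
    END {
      printf "%d of %d nodes UN\n", up, total
      if (total > 0 && up == total) exit 0
      exit 1
    }
  '
}

# Usage: nodetool status | all_nodes_up
```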
📂 Step 5: Create a Keyspace and Test Replication
Open the CQL shell (cqlsh), connecting to any node:
cqlsh 192.168.1.1
Create a Keyspace:
CREATE KEYSPACE test_keyspace WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 2};
Use the Keyspace and Create a Table:
USE test_keyspace;
CREATE TABLE users (user_id UUID PRIMARY KEY, name TEXT, email TEXT);
Insert Data and Query:
INSERT INTO users (user_id, name, email) VALUES (uuid(), 'John Doe', 'john.doe@example.com');
SELECT * FROM users;
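The keyspace statement above can also be generated from a script. This sketch (keyspace_cql is a hypothetical helper) adds IF NOT EXISTS so it can be rerun safely and, as a sanity check of its own rather than a Cassandra rule, refuses a replication_factor larger than the number of nodes, since SimpleStrategy cannot place more copies than there are nodes.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a CREATE KEYSPACE statement using
# SimpleStrategy. Args: keyspace name, replication factor, node count.
keyspace_cql() {
  local name="$1" rf="$2" nodes="$3"
  # Sanity check: 1 <= replication_factor <= number of nodes.
  if (( rf < 1 || rf > nodes )); then
    echo "replication_factor $rf is not between 1 and $nodes" >&2
    return 1
  fi
  printf "CREATE KEYSPACE IF NOT EXISTS %s WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': %d};\n" "$name" "$rf"
}

# Usage against the first node of a two-node cluster:
#   cqlsh 192.168.1.1 -e "$(keyspace_cql test_keyspace 2 2)"
# Then see which nodes hold a given row (paste a user_id from the SELECT):
#   nodetool getendpoints test_keyspace users <user_id>
```

nodetool getendpoints lists the nodes that store a given partition key; with replication_factor 2 on a two-node cluster, it should list both nodes.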
📡 Step 6: Troubleshooting and Seed Connection
Seed Connection Explanation:
Seed nodes act as the entry point for new nodes joining the cluster. When a node starts, it contacts the seed nodes to learn about the cluster topology. Ensure the following for proper connectivity:
- All nodes can ping the seed nodes by their IP addresses.
- Ports 7000 (intra-node communication) and 9042 (client communication) are open on all nodes.
- The seed list in cassandra.yaml is identical on all nodes and contains the correct seed IPs.
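To check that every node uses the same seed list, the seeds can be extracted from each node's cassandra.yaml and compared. This sed-based sketch (yaml_seeds is a hypothetical helper) assumes the stock layout, with all seeds on a single "- seeds:" line, and strips the optional :port suffix that Cassandra 4.x writes in its default seed entry.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the seed addresses from a cassandra.yaml,
# one per line, with surrounding spaces and any ":port" suffix removed.
yaml_seeds() {
  sed -n 's/^[[:space:]]*- seeds:[[:space:]]*"\{0,1\}\([^"]*\)"\{0,1\}.*/\1/p' "$1" \
    | tr ',' '\n' \
    | sed 's/^[[:space:]]*//; s/[[:space:]]*$//; s/:[0-9]*$//' \
    | grep -v '^$'
}

# Usage: compare copies of two nodes' files (file names are hypothetical);
# no output from diff means the seed lists match.
#   diff <(yaml_seeds node1-cassandra.yaml) <(yaml_seeds node2-cassandra.yaml)
```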
Troubleshooting Steps:
- Seed Node Connectivity Issues:
  - Ping Test: Confirm that all nodes can communicate with the seed nodes:
ping -c 3 <seed-node-IP>
  If the ping fails, verify network configurations and firewalls. Some networks block ICMP, so also run the port test before concluding that a seed is unreachable.
  - Ports Test: Ensure ports 7000 (intra-node communication) and 9042 (client communication) are open. Use the nc command:
nc -zv <seed-node-IP> 7000
nc -zv <seed-node-IP> 9042
  - DNS Resolution: If using hostnames instead of IPs for seeds, ensure they resolve correctly:
nslookup <seed-node-hostname>
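- Common Errors and Fixes:
  - Error: Node Not Joining Cluster
    - Cause: Incorrect seed IPs or mismatched cluster name.
    - Fix: Ensure the same cluster_name is set in all cassandra.yaml files, and that the seeds list is accurate.
  - Error: JVM Heap Size Issues
    - Cause: The heap is too small for the workload (look for OutOfMemoryError or long garbage-collection pauses in the logs).
    - Fix: Set the heap size in /etc/cassandra/jvm.options (jvm-server.options in Cassandra 4.x), giving the initial and maximum heap the same value:
-Xms4G
-Xmx4G
  - Error: Read/Write Timeout
    - Cause: High load or network latency.
    - Fix: Increase the timeouts in cassandra.yaml (the defaults are 5000 ms for reads and 2000 ms for writes). Longer timeouts mask the symptom, so also look into the load or latency behind it:
read_request_timeout_in_ms: 10000
write_request_timeout_in_ms: 10000
    In Cassandra 4.1 the stock file spells these read_request_timeout: 5000ms and write_request_timeout: 2000ms; the older _in_ms names are still accepted.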
- Logs and Debugging:
  - Use the Cassandra logs for detailed error messages:
sudo tail -f /var/log/cassandra/system.log
  Look for messages about gossip, seed discovery, or schema disagreements.
- Testing Cluster State:
  - Validate the cluster setup with:
nodetool status
  Check that every node shows UN (Up and Normal). If a node is missing or shows DN (Down), revisit its configuration.
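The ping and nc checks above can be combined into one report across all seeds. This bash sketch (check_ports is a hypothetical helper) probes ports 7000 and 9042 with bash's /dev/tcp redirection rather than nc, so it also works where netcat is not installed. Port 7000 listens on listen_address and 9042 on rpc_address, so a failure on only one of them usually points at that setting or at a firewall rule for that port.

```bash
#!/usr/bin/env bash
# Hypothetical helper: probe ports 7000 (intra-node) and 9042 (CQL) on
# each host given as an argument. Prints OK/FAIL per host:port and
# returns 1 if any probe fails.
check_ports() {
  local failed=0 host port
  for host in "$@"; do
    for port in 7000 9042; do
      # `timeout 3` bounds each attempt on filtered (silently dropped) ports.
      if timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
        echo "OK   $host:$port"
      else
        echo "FAIL $host:$port"
        failed=1
      fi
    done
  done
  return "$failed"
}

# Usage from any node, passing the seed IPs from cassandra.yaml:
#   check_ports 192.168.1.1 192.168.1.2
```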
