Apache Cassandra overview

1. Introduction

Apache Cassandra is a distributed, wide-column NoSQL database designed to store large volumes of data across many servers while remaining highly available and scaling horizontally.

It was originally developed at Facebook, released as open source in 2008, and became a top-level Apache project in 2010.

2. Key Characteristics of Cassandra

Distributed database system
No single point of failure
High availability
Linear scalability (throughput grows roughly in proportion to the number of nodes)
Designed for write-heavy workloads

3. Cassandra Data Model (Brief)

Cassandra uses a wide-column (column-family) data model, in which each row is located by its primary key.

Main Elements

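Keyspace – Similar to a database; also defines how its data is replicated
Table – Similar to a table in RDBMS
Row – Identified by a primary key (partition key plus optional clustering columns)
Column – Name–value pair, stored with a write timestamp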
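
To make the hierarchy concrete, here is a minimal sketch in plain Python. It is not the Cassandra API and does not talk to a cluster; it only mirrors the keyspace, table, row, and column levels with nested dictionaries. The table name `users` and the sample values are hypothetical.

```python
# Toy in-memory model of the hierarchy: keyspace -> table -> row -> columns.
# Illustration only; names and values are made up for this sketch.

keyspace = {            # keyspace: comparable to a database
    "users": {          # table: comparable to an RDBMS table
        "u1": {"name": "Asha", "city": "Pune"},  # row, looked up by primary key "u1"
        "u2": {"name": "Ravi"},                  # a row need not fill every column
    }
}

def get_row(ks, table, primary_key):
    """Return the columns (name-value pairs) of one row, or None if absent."""
    return ks[table].get(primary_key)

def put_column(ks, table, primary_key, column, value):
    """Set one column; the row is created if the primary key is new."""
    ks[table].setdefault(primary_key, {})[column] = value

put_column(keyspace, "users", "u2", "city", "Delhi")
print(get_row(keyspace, "users", "u2"))  # {'name': 'Ravi', 'city': 'Delhi'}
```

Note that every lookup goes through the primary key, which is also how Cassandra expects rows to be read.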

4. Cassandra Architecture (High Level)

Uses a peer-to-peer architecture
All nodes are equal (no master)
Data is distributed across nodes by consistent hashing of the partition key
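
The idea of consistent hashing can be sketched in a few lines of Python. This is a simplified model under stated assumptions: each node owns a single token, and MD5 stands in for the hash function (Cassandra's default partitioner uses Murmur3). The class `Ring` and the node names are hypothetical.

```python
import bisect
import hashlib

def token(value: str) -> int:
    """Map a string to a position on the ring (MD5 is an assumption of this sketch)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

class Ring:
    def __init__(self, nodes):
        # Each node gets one token; sorting lets us walk the ring clockwise.
        self._entries = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._entries]

    def owner(self, partition_key: str) -> str:
        """Return the first node whose token is >= the key's token, wrapping around."""
        i = bisect.bisect_left(self._tokens, token(partition_key))
        return self._entries[i % len(self._entries)][1]

ring = Ring(["node-a", "node-b", "node-c"])
for key in ["user:1", "user:2", "user:3"]:
    print(key, "->", ring.owner(key))
```

Because the owner is computed from the key alone, any node can work out where a row lives without asking a master.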

5. Features of Apache Cassandra

1. High Availability

Data replicated across multiple nodes
Requests are served by the remaining replicas when a node is down (no manual failover)

2. Scalability

Easy horizontal scaling
Add or remove nodes without downtime
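
A short Python sketch shows why nodes can be added cheaply when data is placed by consistent hashing: only the keys that fall into the new node's range change owner. The sketch assumes one token per node and uses MD5 as a stand-in hash; the helper `owner` and all node and key names are hypothetical. Real clusters typically assign several tokens (virtual nodes) to each node so that the load is spread more evenly.

```python
import bisect
import hashlib

def token(value):
    """Position of a string on the ring (MD5 is an assumption of this sketch)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

def owner(nodes, key):
    """Node responsible for `key`: first node token >= key token, wrapping around."""
    entries = sorted((token(n), n) for n in nodes)
    tokens = [t for t, _ in entries]
    i = bisect.bisect_left(tokens, token(key))
    return entries[i % len(entries)][1]

keys = [f"key-{i}" for i in range(1000)]
before = {k: owner(["node-a", "node-b", "node-c"], k) for k in keys}
after = {k: owner(["node-a", "node-b", "node-c", "node-d"], k) for k in keys}

# Keys whose owner changed when node-d joined.
moved = [k for k in keys if before[k] != after[k]]
print(len(moved), "of", len(keys), "keys moved")

# Every moved key now lives on the new node; nothing shuffles between old nodes.
print(all(after[k] == "node-d" for k in moved))
```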

3. Fault Tolerance

No single point of failure
System continues working even if a node fails
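
The following Python sketch illustrates how replication keeps data reachable after a node failure. It assumes a simple placement rule in which the replicas of a key are its owner plus the next nodes clockwise on the ring, and it uses MD5 as a stand-in hash. The functions `replicas` and `live_replicas` and the node names are hypothetical.

```python
import bisect
import hashlib

def token(value):
    """Position of a string on the ring (MD5 is an assumption of this sketch)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

def replicas(nodes, key, replication_factor):
    """Owner of the key plus the next nodes clockwise, until RF replicas are chosen."""
    entries = sorted((token(n), n) for n in nodes)
    tokens = [t for t, _ in entries]
    start = bisect.bisect_left(tokens, token(key))
    count = min(replication_factor, len(entries))
    return [entries[(start + i) % len(entries)][1] for i in range(count)]

def live_replicas(nodes, key, replication_factor, down):
    """Replicas that can still serve the key when the nodes in `down` have failed."""
    return [n for n in replicas(nodes, key, replication_factor) if n not in down]

nodes = ["node-a", "node-b", "node-c", "node-d"]
chosen = replicas(nodes, "user:42", replication_factor=3)
print("replicas:", chosen)

# Fail the first replica: two copies of the row remain reachable.
survivors = live_replicas(nodes, "user:42", 3, down={chosen[0]})
print("still serving:", survivors)
```

With a replication factor of 3, the row stays readable as long as at least one of its three replicas is up and the chosen consistency level can be met.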

4. Tunable Consistency

The consistency level can be chosen for each read or write operation
Trades consistency against availability and latency

5. High Performance

Optimized for fast writes
Suitable for real-time applications

6. Consistency in Cassandra

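Cassandra is eventually consistent by default, but the consistency level can be set for each read or write. Commonly used levels are:

ONE – One replica must respond
QUORUM – A majority of the replicas must respond
ALL – All replicas must respond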
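
The arithmetic behind these levels can be sketched in Python. The function names are hypothetical and only the three levels listed above are handled. The rule used is the standard one: a quorum is a strict majority of the replicas, and a read is guaranteed to see the latest write when the read and write replica counts add up to more than the replication factor (R + W > RF).

```python
def required_replicas(level: str, replication_factor: int) -> int:
    """Number of replicas that must respond for a given consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1  # strict majority
    if level == "ALL":
        return replication_factor
    raise ValueError(f"level not handled in this sketch: {level}")

def reads_see_latest_write(read_level, write_level, replication_factor):
    """True when the read and write replica sets must overlap (R + W > RF)."""
    r = required_replicas(read_level, replication_factor)
    w = required_replicas(write_level, replication_factor)
    return r + w > replication_factor

rf = 3
for level in ("ONE", "QUORUM", "ALL"):
    print(level, required_replicas(level, rf))

print(reads_see_latest_write("QUORUM", "QUORUM", rf))  # True: 2 + 2 > 3
print(reads_see_latest_write("ONE", "ONE", rf))        # False: 1 + 1 <= 3
```

Lower levels respond faster and tolerate more failed nodes, while higher levels give stronger guarantees at the cost of availability.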

7. Use Cases of Cassandra

Time-series data
Messaging systems
IoT applications
Real-time analytics
Recommendation engines

8. Advantages of Cassandra

Highly available
Handles large data sets by spreading them across many nodes
No downtime during scaling
Reliable and fault tolerant

9. Limitations

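Limited support for complex queries (queries are expected to filter on the primary key)
Joins are not supported; data is usually denormalized instead
Steep learning curve, especially for data modelling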
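
Because joins are not available, a common approach is to keep one table per query pattern and write the same data to each. The Python sketch below illustrates that idea with plain dictionaries rather than a real cluster; the table names `orders_by_id` and `orders_by_customer` and the sample orders are hypothetical.

```python
from collections import defaultdict

orders_by_id = {}                       # answers: "get order X"
orders_by_customer = defaultdict(list)  # answers: "list the orders of customer Y"

def save_order(order_id, customer_id, total):
    """Write the same order into both tables, one per query pattern."""
    order = {"order_id": order_id, "customer_id": customer_id, "total": total}
    orders_by_id[order_id] = order
    orders_by_customer[customer_id].append(order)

save_order("o1", "c1", 250)
save_order("o2", "c1", 120)
save_order("o3", "c2", 75)

# Each query is a single key lookup; no join is needed at read time.
print([o["order_id"] for o in orders_by_customer["c1"]])  # ['o1', 'o2']
print(orders_by_id["o3"]["total"])                        # 75
```

The cost of this design is extra storage and extra writes, which fits a database that is optimized for fast writes.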

Notes By Kiran Sir
