I recently got engaged (NOTE: dragged) to work on HA testing for a vCloud Director installation.
The setup is a full-blown vCD setup that will be used for external customers. Full-blown meaning: customized front end, 3 x cells, 2 x AMQP, 3 x Cassandra nodes, and a partridge in a pear tree.
Anyway, I just wanted to document some ad-hoc tasks we did:
How to NUKE / clear the Cassandra DB:
- Stop the cassandra service.
- Delete the contents of the following directories using rm -rf <directory>:
  - /var/lib/cassandra/commitlog/
  - /var/lib/cassandra/data/
  - /var/lib/cassandra/saved_caches/
- OPTIONAL: rm -rf <directory> removes the directory itself, so you may need to recreate the directories (commitlog, data, and saved_caches). Make sure the directories are owned by the cassandra user and cassandra group by doing the following:
  chown -R cassandra:cassandra <directory>
- Start the cassandra service.
Do this for all the nodes.
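The wipe steps above can be sketched as a small script. This is only a sketch under a few assumptions that are not from the setup itself: the service is controlled with `service cassandra stop|start`, the directories live under the default /var/lib/cassandra (`saved_caches` is the default name in cassandra.yaml, so adjust the list if your config points elsewhere), and the `run`, `wipe_cassandra`, `CASSANDRA_ROOT`, and `DRY_RUN` names are made up for illustration. By default it only prints what it would do.

```shell
#!/bin/sh
# Sketch: wipe the on-disk state of the local Cassandra node.
# Assumptions: "service cassandra ..." controls the service and the data
# lives under /var/lib/cassandra. All names below are illustrative.
set -eu

CASSANDRA_ROOT="${CASSANDRA_ROOT:-/var/lib/cassandra}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = really run them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

wipe_cassandra() {
  run service cassandra stop
  for dir in commitlog data saved_caches; do
    # ":?" aborts if CASSANDRA_ROOT is empty, so we never rm -rf "/<dir>"
    run rm -rf "${CASSANDRA_ROOT:?}/${dir}"
    # rm -rf removed the directory itself, so recreate it with the right owner
    run mkdir -p "${CASSANDRA_ROOT}/${dir}"
    run chown -R cassandra:cassandra "${CASSANDRA_ROOT}/${dir}"
  done
  run service cassandra start
}
```

To use it, source the file on a node, call `wipe_cassandra` once to review the printed commands, then set `DRY_RUN=0` and call it again as root. Repeat on every node.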
RabbitMQ / Queue Name not included in the HA mode
What's happening: The queue is not failing over to the surviving nodes during a fail-over.
Cause: The queue is not covered by an HA policy, so it is not mirrored to the other nodes.
To enable: execute the following on one of the RabbitMQ nodes (the empty pattern "" matches every queue name):
rabbitmqctl set_policy ha-all "" '{"ha-mode":"all","ha-sync-mode":"automatic"}'
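As a rough sketch, the policy can be applied and then checked in one go. The `run` and `apply_ha_policy` helpers and the `DRY_RUN` switch are made up for this example; `rabbitmqctl set_policy`, `list_policies`, and `list_queues name policy` are standard rabbitmqctl commands. This assumes a RabbitMQ version that still supports classic mirrored queues. By default the script only prints the commands, with each argument quoted so the empty pattern stays visible.

```shell
#!/bin/sh
# Sketch: apply the ha-all mirroring policy and check that queues picked it up.
# Helper names and DRY_RUN are illustrative, not part of RabbitMQ.
set -eu

DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = really run them

POLICY_NAME="ha-all"
QUEUE_PATTERN=""          # empty regex: matches every queue name
POLICY_JSON='{"ha-mode":"all","ha-sync-mode":"automatic"}'

# Print the command with every argument quoted, or execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+'
    printf " '%s'" "$@"
    printf '\n'
  else
    "$@"
  fi
}

apply_ha_policy() {
  run rabbitmqctl set_policy "$POLICY_NAME" "$QUEUE_PATTERN" "$POLICY_JSON"
  # Show the policies that are now defined
  run rabbitmqctl list_policies
  # Each queue should report ha-all in its policy column
  run rabbitmqctl list_queues name policy
}
```

Running `apply_ha_policy` on a single node is enough, since policies are shared across the cluster.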
Enable Node HA by changing the replication factor for KairosDB from within the Cassandra database
For vCD <9.0, KairosDB is needed to store metrics. KairosDB runs on top of Cassandra and, by default, its replication factor is set to 1. This means each piece of metrics data is stored on only one node (the data is striped across the nodes), so a failure of one node makes some of the metrics data inaccessible.
To get around this, you need to explicitly tell Cassandra to REPLICATE the kairosdb keyspace across all nodes.
How:
- cqlsh <node IP of one of the nodes>
  - You only have to do this on one node.
- Issue the following command:
  - ALTER KEYSPACE kairosdb WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
  - kairosdb is the name of the keyspace where the metrics are stored. The replication factor of 3 matches the 3 Cassandra nodes in this setup.
- For each of the nodes:
  - nodetool repair kairosdb
  - Make sure to complete the repair BEFORE proceeding to the next node.
- Verify that it was successful by issuing:
  - nodetool status
  - Owns should be 100% for all the nodes.
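The whole replication change can be sketched as a script run from an admin host. Several things here are assumptions for illustration only: the node IPs in `NODES` are placeholders, the nodes are reachable over ssh with nodetool on the PATH, and the helper names (`run`, `alter_cql`, `replicate_keyspace`) are made up. The sketch also passes the keyspace to `nodetool status`, which makes the Owns column report effective ownership for that keyspace. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch: raise the kairosdb replication factor, then repair node by node.
# NODES holds placeholder IPs; ssh access to each node is an assumption.
set -eu

KEYSPACE="${KEYSPACE:-kairosdb}"
REPLICATION_FACTOR="${REPLICATION_FACTOR:-3}"
NODES="${NODES:-10.0.0.11 10.0.0.12 10.0.0.13}"   # hypothetical node IPs
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = really run them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Build the CQL statement from the variables above.
alter_cql() {
  printf "ALTER KEYSPACE %s WITH replication = {'class': 'SimpleStrategy', 'replication_factor': %s};" \
    "$KEYSPACE" "$REPLICATION_FACTOR"
}

replicate_keyspace() {
  first_node="${NODES%% *}"
  # The schema change only needs to be issued on one node
  run cqlsh "$first_node" -e "$(alter_cql)"
  # ssh blocks until the repair returns, so nodes are repaired one at a time;
  # set -e stops the loop if a repair fails
  for node in $NODES; do
    run ssh "$node" nodetool repair "$KEYSPACE"
  done
  # Owns should read 100% for every node once all repairs are done
  run ssh "$first_node" nodetool status "$KEYSPACE"
}
```

Call `replicate_keyspace` once to review the printed commands, then set `DRY_RUN=0` to run them for real.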
Hope that helps
Cheers,