Wednesday, January 26, 2011

Simple Python script for generating Cassandra initial tokens

When using the RandomPartitioner, you should specify the initial tokens yourself. The Cassandra Operations wiki page says:
Using a strong hash function means RandomPartitioner keys will, on average, be evenly spread across the Token space, but you can still have imbalances if your Tokens do not divide up the range evenly, so you should specify InitialToken to your first nodes as i * (2**127 / N) for i = 0 .. N-1. In Cassandra 0.7, you should specify initial_token in cassandra.yaml.
Here is a simple Python script for generating them:
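#! /usr/bin/python
# Python 2 script: print an evenly spaced initial token for each node.
import sys
if (len(sys.argv) > 1):
    # Use the node count given on the command line.
    num=int(sys.argv[1])
else:
    # Otherwise ask for it interactively.
    num=int(raw_input("How many nodes are in your cluster? "))
for i in range(0, num):
    print 'node %d: %d' % (i, (i*(2**127)/num))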
The script takes the number of nodes as a command-line argument, or prompts for it if none is given. For three nodes, it produces the following output:
node 0: 0
node 1: 56713727820156410577229101238628035242
node 2: 113427455640312821154458202477256070485
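The script above is written for Python 2, where `/` on integers is integer division. For readers on Python 3, here is a minimal sketch of the same calculation; the function name `initial_tokens` is just an illustrative choice, not something from Cassandra itself. It multiplies before dividing and uses `//` so the tokens stay exact integers rather than turning into floats.

```python
def initial_tokens(num_nodes):
    """Return evenly spaced RandomPartitioner tokens for num_nodes nodes."""
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    token_space = 2 ** 127  # size of the RandomPartitioner token range
    # Multiply before dividing, and use floor division (//) so the result
    # is an exact integer instead of a float under Python 3.
    return [i * token_space // num_nodes for i in range(num_nodes)]


if __name__ == "__main__":
    # Print the tokens for a three-node cluster, one node per line.
    for node, token in enumerate(initial_tokens(3)):
        print("node %d: %d" % (node, token))
```

Run as a script, this should print the same three lines as above. Note that the order of operations matters slightly: computing `i * (2**127 // N)` instead truncates before multiplying, so for three nodes the last token would come out one lower.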
This post was adapted from this; I just updated the script and corrected the formula.