cassandradatastax-enterprise

Adding single token nodes to existing datastax cassandra Cluster and data transfer is not working


Adding a new single token per nodes to existing datastax cluster and data transfer is not working. Process followed is mentioned below. Please update me if the process i followed is wrong.Thanks

We have 3 Single token range datastax nodes in our AWS EC2 Datacenter, both Search and Graph enabled. We are planning to add 3 more nodes into into our datacenter. We are currently using DseSimpleSnitch and Simple network topology for our keyspace.Also our current replication factor is 2.

Node 1 : 10.10.1.36
Node 2 : 10.10.1.46
Node 3 : 10.10.1.56

 cat /etc/default/dse | grep -E 'GRAPH_ENABLED=|SOLR_ENABLED='
   GRAPH_ENABLED=1  
   SOLR_ENABLED=1  

Datacenter : SearchGraph

Address     Rack          Status   State    Load      Owns Token               
10.10.1.46  rack1       Up     Normal  760.14 MiB  ? -9223372036854775808                  
10.10.1.36  rack1       Up     Normal  737.69 MiB  ? -3074457345618258603                   
10.10.1.56  rack1       Up     Normal  752.25 MiB  ? 3074457345618258602                   

Step (1) For adding 3 new node into our datacenter first we changed our keyspace topology and snitch to network aware.

1)Changed the snitch. cat /etc/dse/cassandra/cassandra.yaml | grep endpoint_snitch: endpoint_snitch: GossipingPropertyFileSnitch

cat /etc/dse/cassandra/cassandra-rackdc.properties |grep -E 'dc=|rack='
  dc=SearchGraph
  rack=rack1

2) (a) Shut down all the nodes, then restart them.

(b) Run a sequential repair and nodetool cleanup on each node.

3)Changed keyspace topology.

ALTER KEYSPACE tech_app1 WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'SearchGraph' : 2};
ALTER KEYSPACE tech_app2 WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'SearchGraph' : 2};
ALTER KEYSPACE tech_chat WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'SearchGraph' : 2};

Reference : http://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsChangeKSStrategy.html , http://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsSwitchSnitch.html

Step (2) For updating token range and setting up new cassandra node, we follow below process.

1) Recalculate token range

root@ip-10-10-1-36:~# token-generator

DC #1:

Node #1:  -9223372036854775808
Node #2:  -6148914691236517206
Node #3:  -3074457345618258604
Node #4:                    -2
Node #5:   3074457345618258600
Node #6:   6148914691236517202

2) Installed Datastax enterprise same version on new nodes.

3) Stopped the node service and and cleared the data.

4) (a) Assigned token range in following manner to new node.

Node 4: 10.10.2.96     Range: -2 
Node 5: 10.10.2.97     Range: 3074457345618258600
Node 6: 10.10.2.86     Range: 6148914691236517202

4) (b) Configured cassandra.yaml on each new node:

Node 4 :

cluster_name: 'SearchGraph' 
num_tokens: 1
initial_token: -2  
parameters: 
- seeds: "10.10.1.46, 10.10.1.56" 
listen_address: 10.10.2.96 
rpc_address: 10.10.2.96 
endpoint_snitch: GossipingPropertyFileSnitch

Node 5 :

cluster_name: 'SearchGraph' 
num_tokens: 1
initial_token: 3074457345618258600  
parameters: 
- seeds: "10.10.1.46, 10.10.1.56" 
listen_address: 10.10.2.97 
rpc_address: 10.10.2.97
endpoint_snitch: GossipingPropertyFileSnitch

Node 6 :

cluster_name: 'SearchGraph' 
num_tokens: 1
initial_token: 6148914691236517202   
parameters: 
- seeds: "10.10.1.46, 10.10.1.56" 
listen_address: 10.10.2.86 
rpc_address: 10.10.2.86 
endpoint_snitch: GossipingPropertyFileSnitch

5) Changed the snitch.

cat /etc/dse/cassandra/cassandra.yaml | grep endpoint_snitch:
endpoint_snitch: GossipingPropertyFileSnitch

cat /etc/dse/cassandra/cassandra-rackdc.properties |grep -E 'dc=|rack='
dc=SearchGraph
rack=rack1

6) Start DataStax Enterprise on each new node in two minutes intervals with consistent.rangemovement turned off:

JVM_OPTS="$JVM_OPTS -Dcassandra.consistent.rangemovement=false

7) After the new nodes are fully bootstrapped, used nodetool move to assign the new initial_token for existing nodes as per token recalculation done at step 4(a). Process done on each node one at a time.

On  Node 1(10.10.1.36)  :  nodetool move -3074457345618258603
On  Node 2(10.10.1.46)  :  nodetool move -9223372036854775808
On  Node 3(10.10.1.56)  :  nodetool move  3074457345618258602

Datacenter: SearchGraph

Address     Rack        Status State   Load            Owns                Token

10.10.1.46  rack1       Up     Normal  852.93 MiB ? -9223372036854775808
10.10.1.36  rack1       Up     Moving  900.12 MiB ? -3074457345618258603
10.10.2.96  rack1       UP     Normal  465.02 KiB ? -2
10.10.2.97  rack1       Up     Normal  109.16 MiB ? 3074457345618258600
10.10.1.56  rack1       Up     Moving  594.49 MiB ? 3074457345618258602
10.10.2.86  rack1       Up     Normal  663.94 MiB ? 6148914691236517202

Post Updated:

But we are getting following error while joining nodes.

AbstractSolrSecondaryIndex.java:1884 - Cannot find core chat.chat_history
AbstractSolrSecondaryIndex.java:1884 - Cannot find core chat.history
AbstractSolrSecondaryIndex.java:1884 - Cannot find core search.business_units
AbstractSolrSecondaryIndex.java:1884 - Cannot find core search.feeds
AbstractSolrSecondaryIndex.java:1884 - Cannot find core search.feeds_2
AbstractSolrSecondaryIndex.java:1884 - Cannot find core search.knowledegmodule
AbstractSolrSecondaryIndex.java:1884 - Cannot find core search.userdetails
AbstractSolrSecondaryIndex.java:1884 - Cannot find core search.userdetails_2
AbstractSolrSecondaryIndex.java:1884 - Cannot find core search.vault_details
AbstractSolrSecondaryIndex.java:1884 - Cannot find core search.workgroup
AbstractSolrSecondaryIndex.java:1884 - Cannot find core cloud.feeds
AbstractSolrSecondaryIndex.java:1884 - Cannot find core cloud.knowledgemodule
AbstractSolrSecondaryIndex.java:1884 - Cannot find core cloud.organizations
AbstractSolrSecondaryIndex.java:1884 - Cannot find core cloud.userdetails
AbstractSolrSecondaryIndex.java:1884 - Cannot find core cloud.vaults
AbstractSolrSecondaryIndex.java:1884 - Cannot find core cloud.workgroup

Node joining failed with following error :

ERROR [main] 2017-08-10 04:22:08,449  DseDaemon.java:488 - Unable to start DSE server.
com.datastax.bdp.plugin.PluginManager$PluginActivationException: Unable to activate plugin com.datastax.bdp.plugin.SolrContainerPlugin


Caused by: java.lang.IllegalStateException: Cannot find secondary index for core ekamsearch.userdetails_2, did you create it? 
If yes, please consider increasing the value of the dse.yaml option load_max_time_per_core, current value in minutes is: 10

ERROR [main] 2017-08-10 04:22:08,450  CassandraDaemon.java:705 - Exception encountered during startup
java.lang.RuntimeException: com.datastax.bdp.plugin.PluginManager$PluginActivationException: Unable to activate plugin

Has anyone encountered these errors or warnings before?


Solution

  • Token Assign Issue ::

    1) I had wrongly assigned token range in Step 4) (a). Assign token which 
       bisect or trisect the value which are generated using  
       "token-generator"
             Node 4: 10.10.2.96     Range: -6148914691236517206 
             Node 5: 10.10.2.97     Range: -2
             Node 6: 10.10.2.86     Range: 6148914691236517202
    
    Note : We don't need to change the token range of existing nodes in data   
           center.No need to follow procedure in Step 7 which i have mentioned 
           above.
    

    Solr Issue resolved : Cannot find cor ::

    Increased load_max_time_per_core value in  dse.yaml configuration file, 
    still i was receving the error.Finalys solved the issue 
    by following method
    
         1) Started the new nodes as non-solr and wait for all cassandra data  
            to migrate to joining nodes.
         2) Add the parameter auto_bootstrap: False directive to the 
            cassandra.yaml file
         3) Re-start the same nodes after enabling solr. Changed parameter 
            SOLR_ENABLED=1 in /etc/default/dse
         3) Re-index in all new joined nodes. I had to reloaded all core 
            required with the reindex=true and distributed=false parameters in 
            new  joined nodes. 
            Ref : http://docs.datastax.com/en/archived/datastax_enterprise/4.0/datastax_enterprise/srch/srchReldCore.html