7 Kafka Zookeeper to Kraft Migration

The following sections describe the procedure for migrating Kafka from Zookeeper to KRaft.

Kafka cluster management using Zookeeper has already been deprecated and will be removed in an upcoming Kafka release. It is recommended to use the Kafka KRaft controller for cluster management and to migrate Zookeeper-based Kafka cluster deployments to KRaft-controlled deployments.

This section provides an overview of the migration flow and outlines the procedure to facilitate the migration of a Zookeeper-based Kafka cluster to a KRaft-controlled Kafka cluster in Data Director.

7.1 Migration of Kafka Cluster to Kraft mode

This procedure is required to migrate a Kafka cluster running in Zookeeper mode to KRaft mode. Follow the steps below to complete the migration.

Note:

The parameter global.kafka.kraftEnabled must remain set to false during the migration from Zookeeper mode to KRaft mode.

Provision the KRaft Controller Quorum
1. Modify the ocnadd-custom-values-<wg-group>.yaml file created for the worker group (or default worker group) under the relevant chart folder, and update the following parameter:
```
global.kafka.spawnKraftController: false  ## ---> Update it to 'true'
```
2. Perform a Helm upgrade for the default Worker Group or the Worker Group in a separate namespace, and wait for the kraft-controller replicas to come up in the running state:
```
helm upgrade <worker-group-release-name> -f ocnadd-custom-values-<wg-group>.yaml --namespace <worker-group-namespace> <helm_chart>
```
  Example:
```
helm upgrade ocnadd-wg -f ocnadd-custom-values-wg-group.yaml --namespace dd-worker-group ocnadd_wg
```
3. Verify that the KRaft controller pods are successfully spawned and the kraft-controller service is created:
```
pod/kraft-controller-0      1/1     Running     0     2m18s
pod/kraft-controller-1      1/1     Running     0     2m03s
pod/kraft-controller-2      1/1     Running     0     109s

service/kraft-controller    ClusterIP   None           <none>   9085/TCP   3m26s
service/kraft-controller-cs ClusterIP   10.233.23.86   <none>   9080/TCP   3m26s
```
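
The verification above can also be scripted. The following is a minimal sketch, not part of the product tooling: the function name pods_ready is hypothetical, and it assumes that pod names start with the prefix shown in the sample output (kraft-controller). It reads pod listing lines on standard input and succeeds only if every matching pod is Running with all of its containers ready.

```shell
#!/usr/bin/env bash
# pods_ready PREFIX
# Reads "kubectl get pods" style lines (NAME READY STATUS ...) on stdin.
# Succeeds only if at least one pod whose name starts with PREFIX is listed
# and every such pod is Running with all containers ready (for example 1/1).
# pods_ready is a hypothetical helper name used for illustration.
pods_ready() {
  local prefix="$1"
  awk -v prefix="$prefix" '
    {
      name = $1
      sub(/^pod\//, "", name)             # tolerate the "pod/" prefix in combined listings
      if (index(name, prefix) != 1) next  # skip headers, services, and other pods
      seen++
      split($2, ready, "/")
      if ($3 != "Running" || ready[1] != ready[2]) bad++
    }
    END { exit (seen > 0 && bad == 0) ? 0 : 1 }
  '
}
```

A possible invocation is `kubectl get pods -n <worker-group-namespace> --no-headers | pods_ready kraft-controller`. The same check could be reused with the prefix kafka-broker after the later Helm upgrades, assuming the broker pods follow that naming.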

Put the Kafka Brokers in Migration Mode
1. Modify the ocnadd-custom-values-<wg-group>.yaml file and update the following parameter:
```
global.kafka.migrationBroker: false  ## ---> Update it to 'true'
```
2. Perform a Helm upgrade for the Worker Group, and wait for the kafka-broker replicas to restart:
```
helm upgrade <worker-group-release-name> -f ocnadd-custom-values-<wg-group>.yaml --namespace <worker-group-namespace> <helm_chart>
```
  Example:
```
helm upgrade ocnadd-wg -f ocnadd-custom-values-wg-group.yaml --namespace dd-worker-group ocnadd_wg
```
3. Verify that all Kafka broker pods are restarted and are in the running state.

Migrate the Kafka Brokers to Use KRaft Mode
1. Modify the ocnadd-custom-values-<wg-group>.yaml file and update the following parameter:
```
global.kafka.kraftBroker: false  ## ---> Update it to 'true'
```
2. Perform a Helm upgrade for the Worker Group, and wait for the kafka-broker replicas to restart:
```
helm upgrade <worker-group-release-name> -f ocnadd-custom-values-<wg-group>.yaml --namespace <worker-group-namespace> <helm_chart>
```
  Example:
```
helm upgrade ocnadd-wg -f ocnadd-custom-values-wg-group.yaml --namespace dd-worker-group ocnadd_wg
```
3. Verify that all Kafka broker and KRaft controller pods are restarted and are in the running state.

Finalize the Migration
Caution:
- Some users prefer to wait for a week or two before finalizing the migration. Although this requires keeping the Zookeeper cluster running for a while longer, it can help validate KRaft mode in your cluster.
- After completing this procedure, reverting to Zookeeper mode is no longer possible. If any error is encountered during the previous steps, perform a rollback by following the section Rolling Back to Zookeeper Mode During The Migration.
- Once the migration is finalized, rollback of the Data Director deployment to a previous release is not supported.
1. Modify the ocnadd-custom-values-<wg-group>.yaml file and update the following parameter:
```
global.kafka.finalizeMigration: false  ## ---> Update it to 'true'
```
2. Perform a Helm upgrade for the Worker Group, and wait for the kafka-broker replicas to restart:
```
helm upgrade <worker-group-release-name> -f ocnadd-custom-values-<wg-group>.yaml --namespace <worker-group-namespace> <helm_chart>
```
  Example:
```
helm upgrade ocnadd-wg -f ocnadd-custom-values-wg-group.yaml --namespace dd-worker-group ocnadd_wg
```
3. Verify that all Kafka broker and KRaft controller pods are restarted and are in the running state.

Deprovision Zookeeper
1. Modify the ocnadd-custom-values-<wg-group>.yaml file and update the following parameter to deprovision the Zookeeper cluster:
```
global.kafka.removeZookeeper: false  ## ---> Update it to 'true'
```
  Note: The following parameters must also be updated if OCCM was used to generate the certificates for Zookeeper:
```
global.kafka.occmZookeeperClientUUID: 0969e413-7aa4-48cd-9300-1932de778272  ## UUID of ZOOKEEPER-SECRET-CLIENT-<namespace> from OCCM
global.kafka.occmZookeeperServerUUID: 8352fabe-8da4-4bf5-8123-c12e5a78965b  ## UUID of ZOOKEEPER-SECRET-SERVER-<namespace> from OCCM
```
  The UUIDs can be obtained by logging in to the OCCM UI and searching for the certificates ZOOKEEPER-SECRET-CLIENT-<namespace> and ZOOKEEPER-SECRET-SERVER-<namespace>.
2. Perform a Helm upgrade for the Worker Group and wait for the Zookeeper pods to terminate:
```
helm upgrade <worker-group-release-name> -f ocnadd-custom-values-<wg-group>.yaml --namespace <worker-group-namespace> <helm_chart>
```
  Example:
```
helm upgrade ocnadd-wg -f ocnadd-custom-values-wg-group.yaml --namespace dd-worker-group ocnadd_wg
```
3. Verify that the Zookeeper secrets have been deleted. No Zookeeper secrets should be listed for the worker group namespace:
```
kubectl get secret -n <worker-group-namespace>
```
4. Verify that the Zookeeper PVCs have been deleted. No Zookeeper PVCs should be listed for the worker group namespace:
```
kubectl get pvc -n <worker-group-namespace>
```
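
The two checks above can be automated along the following lines. This is only a sketch: the function name no_zookeeper_left is hypothetical, and it assumes that the names of the Zookeeper secrets and PVCs contain the string "zookeeper", which should be confirmed against the actual resource names in your deployment.

```shell
#!/usr/bin/env bash
# no_zookeeper_left
# Reads resource names on stdin, one per line (for example the output of
# "kubectl get secret -o name"). Prints any name containing "zookeeper"
# (case-insensitive) to stderr and fails if at least one was found.
# Assumption: Zookeeper resources carry "zookeeper" in their names.
no_zookeeper_left() {
  local leftovers
  leftovers="$(grep -i 'zookeeper' || true)"
  if [ -n "$leftovers" ]; then
    printf 'Still present:\n%s\n' "$leftovers" >&2
    return 1
  fi
}
```

For example, `kubectl get secret -n <worker-group-namespace> -o name | no_zookeeper_left` and `kubectl get pvc -n <worker-group-namespace> -o name | no_zookeeper_left` would both be expected to succeed once deprovisioning is complete.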

Caution:

After the migration procedure is completed, ensure that the following parameters are reverted to their default values in the corresponding ocnadd-custom-values-<wg-group>.yaml file. Failure to do so may impact subsequent upgrades.


kafka:
  kraftEnabled: false
  spawnKraftController: false
  migrationBroker: false
  kraftBroker: false
  finalizeMigration: false
  removeZookeeper: false
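
As a safeguard before later upgrades, the values file can be scanned for migration parameters that are still enabled. The sketch below is illustrative only: the function name migration_flags_reset is hypothetical, and the match is line-based on the six key names listed above, so it does not validate YAML nesting or catch quoted values such as "true".

```shell
#!/usr/bin/env bash
# migration_flags_reset FILE
# Succeeds only if none of the six migration parameters is set to true in
# FILE. Offending lines are printed with their line numbers.
# Assumption: each parameter sits on its own line as "key: value".
migration_flags_reset() {
  local file="$1"
  local keys='kraftEnabled|spawnKraftController|migrationBroker|kraftBroker|finalizeMigration|removeZookeeper'
  if grep -En "^[[:space:]]*(${keys}):[[:space:]]*true([[:space:]]|#|\$)" "$file"; then
    return 1
  fi
}
```

A possible invocation is `migration_flags_reset ocnadd-custom-values-<wg-group>.yaml`, run from the relevant chart folder.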

7.2 Rolling Back to Zookeeper Mode During The Migration

While the Kafka cluster is still in migration mode, it is possible to revert to Zookeeper mode. The steps to follow depend on how far the migration has progressed. To determine how to revert to Zookeeper mode, identify the last migration stage that was completed from the stages listed in this section and take the corresponding action.
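
The stages and actions detailed in the rest of this section can be summarized as a simple lookup. The sketch below only restates that mapping; the function name rollback_actions and the stage labels passed to it are hypothetical shorthand and are not product parameters.

```shell
#!/usr/bin/env bash
# rollback_actions STAGE
# Prints, in order, the rollback actions for the last completed migration
# stage. Stage labels are hypothetical shorthand for the stages in this
# section. Returns 1 for a finalized migration (no rollback possible) and
# 2 for an unknown label.
rollback_actions() {
  case "$1" in
    provision-controllers)
      printf '%s\n' "Deprovision the KRaft controller quorum" ;;
    brokers-in-migration-mode)
      printf '%s\n' \
        "Deprovision the KRaft controller quorum" \
        "Delete /controller and /migration using zookeeper-shell.sh" \
        "Revert the Kafka broker migration mode" ;;
    brokers-in-kraft-mode)
      printf '%s\n' \
        "Revert the KRaft mode in Kafka brokers" \
        "Deprovision the KRaft controller quorum" \
        "Delete /controller and /migration using zookeeper-shell.sh" \
        "Revert the Kafka broker migration mode" ;;
    finalized)
      printf '%s\n' "Rollback is not possible; use disaster recovery from backup"
      return 1 ;;
    *)
      printf 'Unknown stage: %s\n' "$1" >&2
      return 2 ;;
  esac
}
```

For example, `rollback_actions brokers-in-migration-mode` lists the three actions required when the brokers were put in migration mode but not yet switched to KRaft mode.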

Provision the Kraft controller quorum

Stage: Provision the Kraft controller quorum

Action Required: Deprovision the Kraft controller quorum

Detailed steps to move back to Zookeeper mode:

1. Modify the ocnadd-custom-values-<wg-group>.yaml file created for the worker group (or default worker group) under the relevant chart folder and update the following parameter:
```
global.kafka.spawnKraftController: true  ## ---> Update it to 'false'
```
2. Perform a Helm upgrade for the default Worker Group or the Worker Group in a separate namespace, and wait for the kraft-controller replicas to terminate:
```
helm upgrade <worker-group-release-name> -f ocnadd-custom-values-<wg-group>.yaml --namespace <worker-group-namespace> <helm_chart>
```
  Example:
```
helm upgrade ocnadd-wg -f ocnadd-custom-values-wg-group.yaml --namespace dd-worker-group ocnadd_wg
```

Put the Kafka brokers in the migration mode

Stage: Put the Kafka brokers in the migration mode

Action Required:

1. Deprovision the KRaft controller quorum.
2. Using zookeeper-shell.sh, run delete /controller so that one of the brokers can become the new old-style controller. Additionally, run get /migration followed by delete /migration to clear the migration state from ZooKeeper. This will allow you to re-attempt the migration in the future. The data read from /migration can also be useful for debugging.
  Note: It is important to perform the zookeeper-shell.sh steps quickly to minimize the amount of time the cluster operates without a controller. Until the /controller znode is deleted, you can ignore any errors in the broker log related to failing to connect to the KRaft controller. These error logs should disappear after the cluster fully reverts to pure ZooKeeper mode.
3. Revert the Kafka broker migration mode.

Detailed steps to move back to Zookeeper mode:

Modify the ocnadd-custom-values-<wg-group>.yaml file located under the relevant chart folder for the worker group (or the default worker group), and update the following parameter:
```
global.kafka.spawnKraftController: true   # ---> Update to 'false'
```
Perform a Helm upgrade for the default Worker Group or the Worker Group in a separate namespace, then wait for the kraft-controller replicas to terminate.
```
helm upgrade <worker-group-release-name> -f ocnadd-custom-values-<wg-group>.yaml --namespace <worker-group-namespace> <helm_chart>
```
Example:
```
helm upgrade ocnadd-wg -f ocnadd-custom-values-wg-group.yaml --namespace dd-worker-group ocnadd_wg
```
Using zookeeper-shell.sh, run delete /controller so that one of the brokers can become the new old-style controller. Additionally, run get /migration followed by delete /migration to clear the migration state from ZooKeeper. This will allow you to re-attempt the migration in the future. The data retrieved from /migration can also be useful for debugging.
Steps:
1. Exec into the ZooKeeper pod:
```
kubectl exec -it -n <namespace> zookeeper-0 -- bash
```
2. Change directory to the Kafka binaries:
```
cd kafka/bin/
```
3. Run zookeeper-shell.sh and execute the following commands:
```
./zookeeper-shell.sh localhost:2181
delete /controller
get /migration
delete /migration
quit
exit
```
4. Perform the above steps on all ZooKeeper instances.
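
Because these steps should be completed quickly, they can also be run without an interactive session. The following is a sketch under stated assumptions rather than a supported procedure: the function name clear_zk_migration_state and the KUBECTL override variable are hypothetical, the pod name zookeeper-0 and the kafka/bin/ path are taken from the steps above, and it relies on zookeeper-shell.sh accepting a single command after the connection string.

```shell
#!/usr/bin/env bash
# clear_zk_migration_state NAMESPACE [POD]
# Runs "delete /controller", "get /migration", and "delete /migration" one
# after another through zookeeper-shell.sh inside the ZooKeeper pod.
# POD defaults to zookeeper-0. Set KUBECTL=echo to preview the commands
# instead of running them (KUBECTL is a hypothetical override variable).
# The commands are not aborted on failure; review the output of each one.
clear_zk_migration_state() {
  local namespace="$1" pod="${2:-zookeeper-0}"
  local zk_cmd
  for zk_cmd in 'delete /controller' 'get /migration' 'delete /migration'; do
    "${KUBECTL:-kubectl}" exec -n "$namespace" "$pod" -- \
      bash -c "cd kafka/bin/ && ./zookeeper-shell.sh localhost:2181 ${zk_cmd}"
  done
}
```

Running `KUBECTL=echo clear_zk_migration_state <namespace>` first prints the three kubectl invocations so that they can be reviewed before they are executed against the cluster.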
Modify the ocnadd-custom-values-<wg-group>.yaml file created for the worker group (or default worker group) under the relevant chart folder and update it as below:
```
global.kafka.migrationBroker: true   # ---> Update to 'false'
```
Perform a Helm upgrade for the default Worker Group or the Worker Group in a separate namespace, and wait for the kafka-broker replicas to restart.
```
helm upgrade <worker-group-release-name> -f ocnadd-custom-values-<wg-group>.yaml --namespace <worker-group-namespace> <helm_chart>
```
Example:
```
helm upgrade ocnadd-wg -f ocnadd-custom-values-wg-group.yaml --namespace dd-worker-group ocnadd_wg
```

Migrate the Kafka brokers to use Kraft mode

Stage: Migrate the Kafka brokers to use Kraft mode

Action Required:

1. Revert the KRaft mode in Kafka brokers.
2. Deprovision the KRaft controller quorum.
3. Using zookeeper-shell.sh, run delete /controller so that one of the brokers can become the new old-style controller. Additionally, run get /migration followed by delete /migration to clear the migration state from ZooKeeper. This will allow you to re-attempt the migration in the future. The data read from /migration can also be useful for debugging.
  Note: It is important to perform the zookeeper-shell.sh steps quickly to minimize the time the cluster operates without a controller. Until the /controller znode is deleted, you can ignore any errors in the broker logs related to failing to connect to the KRaft controller. These error logs should disappear after the cluster fully reverts to pure ZooKeeper mode.
4. Revert the Kafka broker migration mode.

Detailed steps to move back to Zookeeper mode:

Modify the ocnadd-custom-values-<wg-group>.yaml file created for the worker group (or default worker group) under the relevant chart folder and update it as below:
```
global.kafka.kraftBroker: true  ##---> Update it to 'false'
```
Perform a Helm upgrade for the default Worker Group or the Worker Group in a separate namespace, and wait for the kafka-broker replicas to restart:
```
helm upgrade <worker-group-release-name> -f ocnadd-custom-values-<wg-group>.yaml --namespace <worker-group-namespace> <helm_chart> 
```
Example:
```
helm upgrade ocnadd-wg -f ocnadd-custom-values-wg-group.yaml --namespace dd-worker-group ocnadd_wg
```
Modify the ocnadd-custom-values-<wg-group>.yaml file created for the worker group (or default worker group) under the relevant chart folder and update it as below:
```
global.kafka.spawnKraftController: true  ##---> Update it to 'false'
```
Perform a Helm upgrade for the default Worker Group or the Worker Group in a separate namespace, and wait for the kraft-controller replicas to terminate:
```
helm upgrade <worker-group-release-name> -f ocnadd-custom-values-<wg-group>.yaml --namespace <worker-group-namespace> <helm_chart> 
```
Example:
```
helm upgrade ocnadd-wg -f ocnadd-custom-values-wg-group.yaml --namespace dd-worker-group ocnadd_wg
```
Using zookeeper-shell.sh, run delete /controller so that one of the brokers can become the new old-style controller. Then, run get /migration followed by delete /migration to clear the migration state from ZooKeeper. This will allow you to re-attempt the migration in the future. The data retrieved from /migration can also be useful for debugging.
Steps:
1. Exec into the ZooKeeper pod:
```
kubectl exec -it -n <namespace> zookeeper-0 -- bash
```
2. Change directory to Kafka binaries:
```
cd kafka/bin/
```
3. Run zookeeper-shell.sh and execute the following commands:
```
./zookeeper-shell.sh localhost:2181
delete /controller
get /migration
delete /migration
quit
exit
```
4. Perform the above steps on all ZooKeeper instances.
Modify the ocnadd-custom-values-<wg-group>.yaml file created for the worker group (or default worker group) under the relevant chart folder and update it as below:
```
global.kafka.migrationBroker: true  ##---> Update it to 'false'
```
Perform a Helm upgrade for the default Worker Group or the Worker Group in a separate namespace, and wait for the kafka-broker replicas to restart:
```
helm upgrade <worker-group-release-name> -f ocnadd-custom-values-<wg-group>.yaml --namespace <worker-group-namespace> <helm_chart> 
```
Example:
```
helm upgrade ocnadd-wg -f ocnadd-custom-values-wg-group.yaml --namespace dd-worker-group ocnadd_wg
```

Finalize the migration

Stage: Finalize the migration

Action Required:

Once the migration has been finalized, the cluster cannot be reverted to Zookeeper mode.
Some users choose to wait for a week or two before finalizing the migration. Although this requires keeping the ZooKeeper cluster running for a while longer, it can help validate KRaft mode stability in your cluster.

Detailed steps to move back to Zookeeper mode:

Disaster recovery should be carried out using the latest available backup, typically taken as part of the nightly backup process. For detailed steps, refer to the "Deployment Failure" section in the Oracle Communications Network Analytics Suite Installation, Upgrade, and Fault Recovery Guide.