11 Upgrading the UIM Cloud Native Environment

This chapter describes the tasks you perform to apply a change or upgrade to a component in the cloud native environment.

Creating a detailed upgrade plan can be a complex process. Start by mapping your use case to an upgrade path. Each upgrade path identifies a sequence of activities that aligns to a CD stage. Once you know the activity sequence, look up the detailed steps for each activity to assemble the complete set of steps to perform.

Upgrade paths consist of activities that fall into the following two main categories:

Operational Procedures
Component Upgrade Procedures

Operational Procedures

Operational procedures change the operating state of UIM. UIM cloud native provides the mechanism to change the operational state, as described in "Running Operational Procedures".

The flowcharts in this chapter use the following image to depict an operational procedure:

Description of the illustration uim_cn_deploy_operational-procedure.png

Component Upgrade Procedures

These are the steps that perform a component upgrade. Each procedure is one of the following types:

UIM Cloud Native Procedures: UIM cloud native owns the component and therefore the upgrade procedure for that component. UIM cloud native provides the mechanism to perform the upgrade via the scripts that are bundled with the UIM cloud native toolkit.
An example of this is a change to a value in a UIM cloud native specification file (shape, project, and instance).

The flowcharts in this chapter use the following image to depict a UIM cloud native owned procedure.

Description of the illustration uim_cn_deploy_osm-cn-owned-procedure.png

External Procedures: These procedures are for components that are part of the UIM cloud native operating environment, but are out of the control of UIM cloud native. UIM cloud native does not determine how to apply the upgrade, but provides recommendations on the operational state of UIM accompanying the upgrade.
An example would be updating the operating system on a worker node.

The flowcharts in this chapter use the following image to depict an external upgrade procedure.

Description of the illustration uim_cn_deploy_external-upgrade-procedure-image.png

Miscellaneous upgrade procedures: There are some procedures that require special handling and are not captured in any of the upgrade paths. These are described in "Miscellaneous Upgrade Procedures".

Rolling Restart

Occasionally, you may need to restart the UIM managed servers in a rolling fashion, one at a time. A rolling restart does not cause downtime; it only reduces capacity for a limited period. To trigger a rolling restart, invoke the restart-instance.sh script. The script can restart the whole instance in a rolling fashion, only the admin server, or all the managed servers in a rolling fashion. Some operations pushed via the upgrade-instance.sh script, such as image updates and tuning parameter changes, can trigger a rolling restart automatically.

Identifying Your Upgrade Path

To prepare a detailed upgrade plan, map your upgrade use case to an upgrade path. The following table lists the upgrade paths for some common use cases. If your use case is not listed, see "Upgrade Path Flow Chart", which guides you through the decisions involved in preparing a specific upgrade path.

Table 11-1 Common Upgrade Paths

Upgrade Type	Component	Upgrade Path	Requires Changing Image?
Cartridge Management	Deploy new cartridge version with Ruleset code (where the ruleset refers to Java files)	Online change, application upgrade, cartridge deployment	Yes
Cartridge Management	Redeploy a cartridge against an existing cartridge version with Ruleset code (where the ruleset refers to Java files)	Online change, application upgrade, cartridge deployment	Yes
Cartridge Management	Deploy new cartridge version without Ruleset code	Online change, online cartridge deployment	No
Cartridge Management	Redeploy a cartridge against an existing cartridge version without Ruleset code	Online change, online cartridge deployment	No
Configuration and Tuning	UIM cluster size (scaling up or down)	Online change, application upgrade	Not applicable
Configuration and Tuning	Java parameters (memory, GC, and so on)	Online change, application upgrade	Not applicable
Configuration and Tuning	WebLogic domain configuration (WDT such as JMS Queue configuration)	Online change, application upgrade	No
Configuration and Tuning	UIM configuration parameters (custom-extensions.properties)	Online change, application upgrade	No
Database Storage Management	DB purges	Offline change, PDB upgrade	No
Security parameters	WebLogic Password change (poms cache coordination)	Miscellaneous upgrade procedures	No
Security parameters	UIM Schema Password Change	Miscellaneous upgrade procedures	No
Software Upgrade and Patching	UIM release or patch upgrade with Database change	Offline change, PDB upgrade, application upgrade	Yes
Software Upgrade and Patching	Fusion Middleware upgrade	Online change, application upgrade (some exceptions needing offline change)	Yes
Software Upgrade and Patching	UIM patch upgrade without Database change	Online change, application upgrade (some exceptions needing offline change)	Yes
Software Upgrade and Patching	Fusion Middleware overlay patches (for example, PSU or one-off patch)	Online change, application upgrade (some exceptions needing offline change)	Yes
Software Upgrade and Patching	Java upgrade	Online change, application upgrade	Yes
Software Upgrade and Patching	Linux	Online change, application upgrade	Yes
Software Upgrade and Patching	Custom code or third-party tool (custom image)	Online change, application upgrade (some exceptions needing offline change)	Yes
Software Upgrade and Patching	UIM cloud native toolkit	The release dictates the constraints.	Not applicable
Shared infrastructure	Operating system or hardware on worker node	Online change, external procedure	No
Shared infrastructure	Docker	Online change, external procedure	No
Shared infrastructure	WebLogic Operator minor upgrade (backward compatible)	Online change, external procedure	No
Shared infrastructure	WebLogic Operator major upgrade (non-backward compatible)	Online change, external procedure	No

Once you understand the activities in your upgrade path, you can begin to map out the sequence of activities that you need to perform.

Offline Change Upgrade Paths

Offline changes are those that require UIM to be shut down before the change can be applied.

All offline upgrades must start with a Scale Down procedure and end with a Scale Up procedure. You can find the explicit steps to perform these activities in "Running Operational Procedures".

Once the cluster has been scaled down, you will need to perform either an external procedure (referencing documentation for the component) or follow a UIM cloud native owned procedure. See "UIM Cloud Native Upgrade Procedures" for details.

Figure 11-1 Offline Change Upgrade Paths

As an example, if your use case is to perform DB purges, then the upgrade path is "Offline change, PDB upgrade". The actual steps are as follows:

Scale Down
- Edit the instance specification file to set cluster size to 0.
- Run upgrade-instance.sh.
PDB Upgrade
- Edit the instance specification file to include the purge command.
- Run install-uimdb.sh with the command appropriate for the purge use case.
Scale Up
- Edit the instance specification file to return the cluster size to its original value (1-18).
- Run upgrade-instance.sh.
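
The sequence above can be sketched as a small wrapper script. This is a minimal sketch and not part of the UIM cloud native toolkit: the function names, the DRY_RUN switch, and the pass-through of the install-uimdb.sh arguments are assumptions made for illustration. The specification edits stay manual, and the purge arguments are deliberately not spelled out; take them from the output of install-uimdb.sh -h.

```shell
#!/usr/bin/env bash
# Sketch of the offline DB purge sequence (hypothetical helper, not a toolkit script).
# With DRY_RUN=1 (the default here) commands are printed instead of executed.
set -eu

DRY_RUN="${DRY_RUN:-1}"

# run CMD...: execute the command, or only print it in dry-run mode.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

# confirm MSG: announce a manual edit; outside dry-run, wait for Enter.
confirm() {
  echo "manual step: $1"
  if [ "$DRY_RUN" != "1" ]; then
    printf 'press Enter when done... '
    read -r _reply
  fi
}

# offline_purge PROJECT INSTANCE SPEC_PATH PURGE_ARGS...
# PURGE_ARGS are passed to install-uimdb.sh unchanged (assumption: you supply
# the command that fits your purge use case).
offline_purge() {
  local project="$1" instance="$2" spec_path="$3"
  shift 3

  confirm "set the cluster size to 0 in the instance specification"
  run "$UIM_CNTK/scripts/upgrade-instance.sh" -p "$project" -i "$instance" -s "$spec_path"

  confirm "add the purge command to the instance specification"
  run "$UIM_CNTK/scripts/install-uimdb.sh" "$@"

  confirm "return the cluster size to its original value in the instance specification"
  run "$UIM_CNTK/scripts/upgrade-instance.sh" -p "$project" -i "$instance" -s "$spec_path"
}
```

Running the function in dry-run mode first shows the exact command order before anything touches the instance; setting DRY_RUN=0 makes it pause at each manual edit and then run the real scripts.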

Online Change Upgrade

Online changes are changes for which UIM can remain running while the component upgrade is performed. There is, therefore, no operational procedure at the start of the flow, but some paths include a rolling restart after the upgrade procedure is performed.

The component upgrade will either be an external procedure (referencing documentation for the component) or follow a UIM cloud native owned procedure described in "UIM Cloud Native Upgrade Procedures".

If explicit post-upgrade operational activities are required, you can find details in "Running Operational Procedures".

The following flowchart illustrates online change upgrade paths.

Figure 11-2 Online Change Upgrade Paths

Exceptions and Unsupported Tasks

Exceptions

The following require shutdown:

Some UIM patches
Some Oracle Fusion Middleware overlay patches
Oracle Fusion Middleware version upgrades

Unsupported Tasks

Adding, modifying, and deleting users or groups in embedded LDAP is not supported through an upgrade procedure. To make changes to users and groups, the instance must be deleted and re-created.

UIM Cloud Native Upgrade Procedures

The UIM cloud native owned upgrade procedures are:

PDB upgrade
UIM application upgrade
Online cartridge deployment

Change or upgrade procedures that are dictated by UIM cloud native are applied using the scripts and the configuration provided in the toolkit.

PDB Upgrade Procedure

Changes impacting the PDB can be found in any of the specification files - project, instance or shape.

Examples include updating the UIM DB Installer image.

To perform a PDB upgrade procedure:

Make the necessary modifications in your specification files.
Invoke $UIM_CNTK/scripts/install-uimdb.sh with the command appropriate for your use case.
To see a list of options, invoke with -h.

UIM Application Upgrade

Changes impacting the UIM application can be found in any of the specification files - project, instance or shape.

Examples include changing an existing value, changing the UIM image or supplying something new such as a secret or a new WDT extension.

To perform UIM application upgrade:

Make the necessary modifications in your specification files.
Invoke $UIM_CNTK/scripts/upgrade-instance.sh to push out the changes you just made to the running instance. This also triggers introspection for upgrade paths where introspection is required.
In upgrade paths where a manual restart is required, restart the instance. See "Restarting the Instance" for details.

Updating the Default Settings for Coherence Cluster

After you upgrade the UIM application, update the default settings for the Coherence cluster in the WebLogic console.

To update the default settings for the Coherence cluster:

Open the WebLogic console.
Under the Domain Structure section, expand Environment and select Coherence Clusters.

The Settings for defaultCoherenceCluster page appears.
Under the Members tab:
1. Under the Servers section, deselect AdminServer.
2. Under the Cluster section, select the required clusters.
Click Save.

The default settings for the Coherence cluster are updated.

Online Cartridge Deployment

The online deployment mode supports deployment of new cartridges. The procedure depends on the type of cartridge. Cartridges are classified as follows:

Simple cartridge (such as entity specifications, Groovy, or Drools code)
Custom Extension cartridge (Java code, configuration files, images, custom applications, Java libraries, Aspects, and localization)

For Simple Cartridges, deployment can be performed without any upgrade path.

For Custom Extension Cartridges, perform the deployment as follows:

Build a customized image.
In your project specification, change the image name to that of the customized image.
Upgrade the instance.
Deploy cartridges.

Upgrades to Infrastructure

From the point of view of UIM instances, upgrades to the cloud infrastructure fall into two categories: rolling upgrades and one-time upgrades.

Note:

All infrastructure upgrades must continue to meet the supported types and versions listed in the UIM documentation's certification statement.

In a rolling upgrade, with proper high-availability planning (such as anti-affinity rules), the instance as a whole remains available while parts of it undergo temporary outages. Examples are Kubernetes worker node OS upgrades, Kubernetes version upgrades, and Docker version upgrades.

One-time upgrades affect a given instance all at once. The instance as a whole suffers either an operational outage or a control outage. Examples are a WebLogic Operator upgrade and, in some deployments, an Ingress Controller upgrade.

Kubernetes and Docker Infrastructure Upgrades

Follow standard Kubernetes and Docker practices to upgrade these components. The impact at any point should be limited to one node: a master node (Kubernetes and OS) or a worker node (Kubernetes, OS, and Docker). Before you upgrade a worker node, cordon and drain it so that all of its pods move to other worker nodes. This assumes your cluster has the capacity to host them; you may have to temporarily add a worker node or two.

For UIM instances, any pods on the cordoned worker suffer an outage until they come up on other workers. However, their messages and orders are redistributed to the remaining managed server pods, and processing continues at reduced capacity until the affected pods relocate and initialize. As each worker undergoes this process in turn, pods continue to terminate and start up elsewhere, but as long as the instance has pods on both affected and unaffected nodes, it continues to process orders.
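
Taking a worker node out of service and returning it can be sketched with standard kubectl commands. This is a minimal sketch under stated assumptions: the function names and the DRY_RUN switch are hypothetical, and the two drain flags are a common choice rather than a UIM requirement, so check them against your cluster's policies. The node upgrade itself happens between the two functions and is an external procedure.

```shell
#!/usr/bin/env bash
# Sketch: cordon and drain a worker before an external upgrade, then return it.
# With DRY_RUN=1 (the default here) commands are printed instead of executed.
set -eu

DRY_RUN="${DRY_RUN:-1}"

# run CMD...: execute the command, or only print it in dry-run mode.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

# drain_worker NODE: stop new pods from landing on NODE, then evict its pods.
drain_worker() {
  local node="$1"
  run kubectl cordon "$node"
  # --ignore-daemonsets: DaemonSet-managed pods are skipped rather than blocking the drain.
  # --delete-emptydir-data: allow eviction of pods that use emptyDir volumes
  # (assumption: losing that scratch data is acceptable in your environment).
  run kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data
}

# restore_worker NODE: make NODE schedulable again after the upgrade.
restore_worker() {
  local node="$1"
  run kubectl uncordon "$node"
}
```

Handling one worker at a time, and waiting for the relocated managed server pods to become ready before moving to the next, keeps the instance processing at reduced capacity instead of going down.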

WebLogic Operator Upgrade

To upgrade the WebLogic Operator, follow the Operator documentation. As long as the target version can coexist in a Kubernetes cluster with the current version, you can perform a phased cutover. Perform a fresh install of the new version of the Operator into a new namespace, and set up RBAC there identically to your existing Operator namespace. Once the new Operator is functioning, for each UIM cloud native project, unregister the project from the old Operator and register it with the new Operator. You can do this at your convenience on a per-project basis. When all projects have been switched to the new Operator, you can safely delete the old Operator.

export WLSKO_NS=old-namespace
$UIM_CNTK/scripts/unregister-namespace -p project -t wlsko
export WLSKO_NS=new-namespace
$UIM_CNTK/scripts/register-namespace -p project -t wlsko

All instances in the transitioned project are impacted by this operation. However, there is no order processing outage during the transition. There is a control outage, during which no changes can be pushed to the instances (using upgrade-instance.sh or delete-instance.sh). Also, during the control outage, the termination of a pod does not immediately trigger healing. Once the transition of the project is complete, the new Operator reacts to any changed state (whether in the cluster, such as pod termination, or in pushed changes, such as instance upgrades) and runs the required actions.

Ingress Controller Upgrade

Follow the documentation of your chosen Ingress Controller to perform an upgrade. Depending on the Ingress Controller used and its deployment in your Kubernetes environment, the UIM instances it serves may see a wide set of impacts, ranging from no impact at all (if the Ingress Controller supports a clustered approach and can be upgraded that way) to a complete outage.

Consider Traefik, which the UIM cloud native toolkit uses as its sample Ingress Controller:

An approach identical to that of WebLogic Operator upgrade can be followed for Traefik upgrade. The new Traefik can be installed into a new namespace, and one-by-one, projects can be unregistered from the old Traefik and registered with the new Traefik.

export TRAEFIK_NS=old-namespace
$UIM_CNTK/scripts/unregister-namespace -p project -t traefik
export TRAEFIK_NS=new-namespace
$UIM_CNTK/scripts/register-namespace -p project -t traefik

During this transition, there will be an outage in terms of the outside world interacting with UIM. Any data that flows through the ingress will be blocked until the new Traefik takes over. This includes GUI traffic, order injection, API queries, and SAF responses from external systems. This outage will affect all the instances in the project being transitioned.
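
The per-project cutover used for both the WebLogic Operator and Traefik can be sketched as a loop over projects. This is a minimal sketch: the switch_namespace function, its argument order, and the DRY_RUN switch are hypothetical; only the unregister-namespace and register-namespace invocations and the WLSKO_NS and TRAEFIK_NS variables come from the commands shown in this chapter.

```shell
#!/usr/bin/env bash
# Sketch: move projects from an old Operator or Traefik namespace to a new one.
# With DRY_RUN=1 (the default here) commands are printed instead of executed.
set -eu

DRY_RUN="${DRY_RUN:-1}"

# run CMD...: execute the command, or only print it in dry-run mode.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

# switch_namespace TARGET OLD_NS NEW_NS PROJECT...
# TARGET is "wlsko" or "traefik"; it selects the environment variable to set.
switch_namespace() {
  local target="$1" old_ns="$2" new_ns="$3" ns_var project
  shift 3
  case "$target" in
    wlsko)   ns_var=WLSKO_NS ;;
    traefik) ns_var=TRAEFIK_NS ;;
    *) echo "unknown target: $target" >&2; return 1 ;;
  esac
  for project in "$@"; do
    # "env VAR=value cmd" sets the variable for that one command only.
    run env "$ns_var=$old_ns" "$UIM_CNTK/scripts/unregister-namespace" -p "$project" -t "$target"
    run env "$ns_var=$new_ns" "$UIM_CNTK/scripts/register-namespace" -p "$project" -t "$target"
  done
}
```

Because each project is unregistered and registered back to back, the control outage (Operator) or ingress outage (Traefik) for that project lasts only as long as the two commands take.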

Miscellaneous Upgrade Procedures

This section describes miscellaneous upgrade scenarios.

Network File System (NFS)

If an instance has been created successfully but its NFS configuration must change, the change cannot be made to a running UIM instance. In this case, the procedure is as follows:

Perform a fast delete. See "Running Operational Procedures" for details.
Update the NFS details in the instance specification.
Start the instance.

Security Parameters

To set the security parameters:

Perform a fast delete. See "Running Operational Procedures" for details.
Update the secrets for WebLogic, PDB credentials, or UIM Schema credentials.
Start the instance.

Running Operational Procedures

This section describes the tasks you perform on the UIM server in response to a planned upgrade to the UIM cloud native environment. You must consider whether the change to the environment affects UIM processing so fundamentally that UIM should not run while the upgrade is applied, or whether UIM can run during the upgrade but must be restarted afterward to process the change properly.

The operational procedures are performed using the UIM cloud native specification files and scripts.

The operational procedures you perform for upgrading your cloud environment are:

Triggering introspection
Scaling down the cluster
Scaling up the cluster
Restarting the instance
Fast delete
- Shutting down the cluster
- Starting up the cluster

Triggering Introspection

When any of the specification files have changed, invoke the upgrade-instance.sh script to trigger the operator's introspector to examine the change and apply it to the running instance.

Scaling Down the Cluster

The scaling down procedure described here applies only in the context of the upgrade flow diagram, where scaling down means scaling to 0 managed servers. In general, scaling can change the cluster size to any value from 0 through 18.

To scale down the cluster, edit the instance specification and change the clusterSize parameter to 0. This terminates all the managed server pods, but leaves the admin server up and running.

Apply the change to the running Helm release by running the upgrade script:

$UIM_CNTK/scripts/upgrade-instance.sh -p project -i instance -s $SPEC_PATH

Scaling Up the Cluster

The scaling up procedure described here applies only in the context of the upgrade flow diagram, where scaling up means returning to the initial cluster size. In general, scaling can change the cluster size to any value from 0 through 18.

To scale up the cluster, edit the instance specification and change the value of the clusterSize parameter to its original value to return the cluster to its previous operational state.

Apply the change to the running Helm release by running the upgrade script:

$UIM_CNTK/scripts/upgrade-instance.sh -p project -i instance -s $SPEC_PATH
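
The specification edits for scaling down and back up can be scripted so that the original size is not lost. This is a minimal sketch, assuming the instance specification contains a single line of the form "clusterSize: N"; the helper names are hypothetical, and a specification with a different layout needs a YAML-aware tool instead of sed.

```shell
#!/usr/bin/env bash
# Sketch: read and change clusterSize in an instance specification file.
set -eu

# valid_cluster_size N: succeed only for an integer from 0 through 18.
valid_cluster_size() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ "$1" -le 18 ]
}

# get_cluster_size FILE: print the current clusterSize value.
get_cluster_size() {
  sed -n -E 's/^[[:space:]]*clusterSize:[[:space:]]*([0-9]+).*/\1/p' "$1" | head -n 1
}

# set_cluster_size FILE N: rewrite the value in place, keeping FILE.bak as a backup.
set_cluster_size() {
  local file="$1" size="$2"
  if ! valid_cluster_size "$size"; then
    echo "clusterSize must be an integer from 0 through 18: $size" >&2
    return 1
  fi
  # Only the clusterSize line is touched; the first number on it is the value.
  sed -i.bak -E "/^[[:space:]]*clusterSize:/ s/[0-9]+/${size}/" "$file"
}
```

A typical use is to save the result of get_cluster_size before the offline change, call set_cluster_size with 0 and run upgrade-instance.sh, and afterward call set_cluster_size with the saved value and run upgrade-instance.sh again.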

Restarting the Instance

The UIM cloud native toolkit provides a script (restart-instance.sh) that you can use to perform different flavors of restarts on a running instance of UIM cloud native.

The usage of the restart-instance.sh script is as follows:

restart-instance.sh parameters
      -p projectName : mandatory
      -i instanceName : mandatory
      -s specPath : mandatory; locations of specification files
      -m customExtPath : optional; locations of custom extension files
      -r restartType : mandatory; what kind of restart is requested
    # specPath and customExtPath take a colon(:) delimited list of directories
    # restartType can take the following values:
      * full: Restarts the whole instance (rolling restart)
      * admin: Restarts the WebLogic Admin Server only
      * ms: Restarts all the Managed Servers (rolling restart)

    # or just -h for help

For example, to restart a complete cluster, run the following command:

$UIM_CNTK/scripts/restart-instance.sh -p project -i instance -s $SPEC_PATH -r full
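
When restarts are automated, it can help to reject an invalid restart type before the script is invoked. The following is a minimal sketch: the restart_uim wrapper and the DRY_RUN switch are hypothetical, while the options and the three restartType values come from the usage text above.

```shell
#!/usr/bin/env bash
# Sketch: validate the restart type, then call restart-instance.sh.
# With DRY_RUN=1 (the default here) the command is printed instead of executed.
set -eu

DRY_RUN="${DRY_RUN:-1}"

# run CMD...: execute the command, or only print it in dry-run mode.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

# restart_uim PROJECT INSTANCE SPEC_PATH RESTART_TYPE
restart_uim() {
  local project="$1" instance="$2" spec_path="$3" restart_type="$4"
  case "$restart_type" in
    full|admin|ms) ;;
    *) echo "restartType must be full, admin, or ms: $restart_type" >&2; return 1 ;;
  esac
  run "$UIM_CNTK/scripts/restart-instance.sh" -p "$project" -i "$instance" -s "$spec_path" -r "$restart_type"
}
```

For example, restart_uim project instance "$SPEC_PATH" ms requests a rolling restart of the managed servers only and leaves the admin server untouched.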

Fast Delete

When the entire domain, including the admin server, needs to be taken offline, perform the full shutdown procedure followed by the full startup procedure. You can use this to perform a "fast delete" or "dehydration" of the domain instead of a full delete-instance operation, which raises the concern of secrets and other prerequisites being deleted. To quickly restore the domain, perform the startup procedure.

Shutting Down the Cluster

To shut down the cluster, edit the instance specification and set the serverStartPolicy parameter to Never, adding the parameter if it is not present. This terminates all the pods.

# Operational control parameters 
# scope - domain or cluster 
serverStartPolicy: Never

Apply the change to the running Helm release by running the upgrade script:

$UIM_CNTK/scripts/upgrade-instance.sh -p project -i instance -s $SPEC_PATH

Starting Up the Cluster

To start up the cluster, edit the instance specification and either comment out the serverStartPolicy parameter or set its value to IfNeeded. This starts up all the pods.

# Operational control parameters 
# scope - domain or cluster 
serverStartPolicy: IfNeeded

Apply the change to the running Helm release by running the upgrade script:

$UIM_CNTK/scripts/upgrade-instance.sh -p project -i instance -s $SPEC_PATH
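
The "add or modify" edit for the shutdown and startup procedures can be sketched as one helper. This is a minimal sketch, assuming serverStartPolicy is a top-level key in the instance specification; the function name is hypothetical, and it accepts only the two values this section uses.

```shell
#!/usr/bin/env bash
# Sketch: set serverStartPolicy in an instance specification file.
set -eu

# set_start_policy FILE POLICY: POLICY is Never (shut down) or IfNeeded (start up).
set_start_policy() {
  local file="$1" policy="$2"
  case "$policy" in
    Never|IfNeeded) ;;
    *) echo "unsupported policy: $policy" >&2; return 1 ;;
  esac
  if grep -q -E '^[[:space:]]*serverStartPolicy:' "$file"; then
    # Key already present and not commented out: replace its value in place.
    sed -i.bak -E "/^[[:space:]]*serverStartPolicy:/ s/:.*/: ${policy}/" "$file"
  else
    # Key absent or commented out: append it as a top-level entry.
    printf 'serverStartPolicy: %s\n' "$policy" >> "$file"
  fi
}
```

After either call, run upgrade-instance.sh as shown above to apply the change to the running Helm release.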

Upgrade Path Flow Chart

A combined view of the flowcharts, along with the main decision points, helps you compare the different flows and identify their common steps and divergences. It can also help when you automate parts of the process.

The first decision to make is whether UIM can be running when you apply the change. Typically, UIM needs to be shut down for PDB-impacting scenarios and for the exceptions listed in the "Exceptions and Unsupported Tasks" section.
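
For automation, this first decision can be encoded as a small function. This is a minimal sketch of the rule of thumb stated above, not a replacement for the flowcharts: the function names and the yes/no inputs are hypothetical, and the answers themselves still come from your analysis of the change.

```shell
#!/usr/bin/env bash
# Sketch: choose between the offline and online flows.
set -eu

# upgrade_mode PDB_IMPACTING LISTED_EXCEPTION
# Each argument is "yes" or "no". Prints "offline" or "online".
upgrade_mode() {
  if [ "$1" = "yes" ] || [ "$2" = "yes" ]; then
    echo "offline"
  else
    echo "online"
  fi
}

# activity_sequence MODE: print the activities for the chosen mode, in order.
activity_sequence() {
  case "$1" in
    offline) printf '%s\n' "scale down" "component upgrade" "scale up" ;;
    online)  printf '%s\n' "component upgrade" "rolling restart (only where the path requires it)" ;;
    *) echo "unknown mode: $1" >&2; return 1 ;;
  esac
}
```

For example, activity_sequence "$(upgrade_mode yes no)" lists the offline sequence for a PDB-impacting change.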

The following flowchart illustrates the flow for offline upgrades and various scenarios.

Figure 11-3 Upgrade Path Flow for Offline Changes

The following flowchart illustrates the flow for online upgrades and various scenarios.

Figure 11-4 Upgrade Path Flow for Online Changes
