Use Case: Automate Operating System Patching Using a Self-Hosted Instance
As an organization, I want to automate operating system patching for Oracle Base Database Service DBNode resources by using a self-hosted instance with Fleet Application Management.
This use case describes an example in which you need to apply operating system patches to Oracle Base Database Service resources. Because Oracle Base Database Service does not natively support operating system patching, you can use the self-hosted instance feature in Fleet Application Management to automate operating system patching on DBNode resources and keep the systems up to date.
For more information about self-hosted instances, see Self-Hosted Instances in Fleet Application Management.
To automate operating system patching for Oracle Base Database Service DBNode resources by using the self-hosted instance feature in Fleet Application Management, follow these steps:
- 1. Create and configure the Compute instance
- 2. Enable instance principal authentication for the instance
- 3. Create secrets
- 4. Prepare the DBNode operating system patching script on the instance
- 5. Run the script by using the self-hosted instance with a runbook
1. Create and configure the Compute instance
Create a self-hosted Compute instance in Oracle Cloud Infrastructure.
2. Enable instance principal authentication for the instance
Configure instance principal authentication so that the instance can access the SSH account credentials and private keys for the target DBNodes.
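As a rough illustration of what instance principal authentication looks like from a script running on the instance, the following sketch builds the OCI clients that the example script later in this topic uses. It assumes that the OCI Python SDK (`oci`) is installed on the instance and that the instance belongs to a dynamic group with suitable policies. The helper name `build_clients` and the layout of the returned dictionary are hypothetical and are not part of Fleet Application Management.

```python
def build_clients():
    """Create OCI service clients that authenticate as the instance itself.

    Assumption: this runs on an OCI Compute instance that is a member of a
    dynamic group whose policies grant access to the required resources.
    """
    import oci  # imported lazily so this module loads even without the SDK

    # The signer obtains short-lived credentials for the instance.
    signer = oci.auth.signers.InstancePrincipalsSecurityTokenSigner()

    # With a signer, the config dictionary can be empty; no API key file is needed.
    clients = {
        "database": oci.database.DatabaseClient(config={}, signer=signer),
        "compute": oci.core.ComputeClient(config={}, signer=signer),
        "network": oci.core.VirtualNetworkClient(config={}, signer=signer),
        "identity": oci.identity.IdentityClient(config={}, signer=signer),
        "secrets": oci.secrets.SecretsClient(config={}, signer=signer),
    }
    # The signer also exposes the tenancy OCID and the region of the instance.
    return signer.tenancy_id, signer.region, clients
```

If the signer cannot be created, the instance is most likely not running in OCI or is not covered by a dynamic group, so that is a sensible place to stop and report the problem.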
3. Create secrets
- Open the navigation menu, select Identity & Security, and then select Vault.
- If no vault exists, create one. See Creating a Vault.
- Create a secret in the vault for the SSH private key. See Creating a Secret in a Vault.
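Once the SSH private key is stored as a secret, a script on the instance can read it back at run time. The following is a minimal sketch of a `get_secret_content` helper like the one that the example script calls later in this topic. It assumes that `secrets_client` behaves like `oci.secrets.SecretsClient` and that the secret content is Base64 encoded; the error handling is deliberately minimal.

```python
import base64


def get_secret_content(secrets_client, secret_id):
    """Return the decoded text of the current version of a Vault secret.

    `secrets_client` is expected to behave like oci.secrets.SecretsClient.
    Returns None when the secret bundle carries no content.
    """
    response = secrets_client.get_secret_bundle(secret_id)
    content = response.data.secret_bundle_content.content
    if not content:
        return None
    # Secret bundle content is delivered Base64 encoded.
    return base64.b64decode(content).decode("utf-8")
```

The returned text can be wrapped in `StringIO` and passed to the SSH library, which avoids writing the private key to disk on the instance.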
4. Prepare the DBNode operating system patching script on the instance
5. Run the script by using the self-hosted instance with a runbook
Run the DBNode operating system patching script by using a self-hosted instance in a runbook. The process consists of configuring the instance, defining a runbook, and monitoring the run.
Example DBNode operating system patching script
The following is an example script for applying operating system patches to a DBNode. Specify the database display name and select the option to run, such as precheck or update.
# Template script: lines marked "Handle the exception and exit" and values in
# <angle brackets> are placeholders to replace. Helper functions such as
# parse_arguments() and get_db_nodes() are not shown in this excerpt.
import json
import time
import logging
from io import StringIO

import oci
import paramiko

logger = logging.getLogger(__name__)  # assumed logger setup, not shown in the original excerpt


def main():
    """Main function to orchestrate the DBaaS patching process."""
    args = parse_arguments()
    logger.info(f"Starting script with arguments: display-name={args.display_name}, option={args.option}")
    print(f"Starting script with arguments: display-name={args.display_name}, option={args.option}")

    # Get tenancy and region
    tenancy_id, region = get_tenancy_and_region()

    # Initialize OCI clients
    db_client, compute_client, virtual_network_client, identity_client, secrets_client = initialize_oci_clients(region)

    # Get all compartments
    try:
        compartments = oci.pagination.list_call_get_all_results(identity_client.list_compartments, tenancy_id).data
        logger.info(f"Retrieved {len(compartments)} compartments")
    except oci.exceptions.ServiceError as e:
        logger.error(f"Failed to list compartments: {str(e)}. Exiting program.")
        # Handle the exception and exit

    # Get DB System by display name
    db_system, compartment_id = get_db_system_by_display_name(db_client, compartments, args.display_name)
    if not db_system:
        # Handle the exception and exit

    # Get DB Nodes
    db_nodes = get_db_nodes(db_client, compartment_id, db_system.id)
    if not db_nodes:
        # Handle the exception and exit

    # Retrieve secret for SSH
    secret_id = ...  # Fetch the secret ID from the arguments or another suitable mechanism, if required
    private_key_content = get_secret_content(secrets_client, secret_id)
    if not private_key_content:
        # Handle the exception and exit

    # Process each DB Node
    for node in db_nodes:
        node_ip = get_node_ip(virtual_network_client, node.vnic_id)
        if not node_ip:
            # Handle the exception and exit

        # Initialize SSH client example using any suitable library based on your use case
        ssh_client = paramiko.SSHClient()
        ssh_client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        try:
            logger.info(f"Connecting to {node_ip} as <user>")
            private_key_file = StringIO(private_key_content)
            private_key = paramiko.RSAKey.from_private_key(private_key_file)
            ssh_client.connect(node_ip, username=user, pkey=private_key)
            logger.info(f"Connected to {node_ip}")
        except Exception as e:
            # Handle the exception and exit

        # Check DCS agent status and attempt to restart if down
        # Handle the agent check if required
        ...

        # Determine storage type to check if ASM is used if required
        is_asm = identify_storage_type(ssh_client, command)

        # Perform precheck or update
        if args.option == "precheck":
            if not os_update_precheck(ssh_client, node_ip, is_asm):
                # Handle the exception and exit
            logger.info(f"OS update precheck completed successfully on {node_ip}")
        elif args.option == "update":
            if not os_update_precheck(ssh_client, node_ip, is_asm):
                # Handle the exception and exit
            if not os_update(ssh_client, node_ip, is_asm, secrets_client, secret_id):
                # Handle the exception and exit
            logger.info(f"OS update completed successfully on {node_ip}")
        ssh_client.close()

def os_update(ssh_client, node_ip, is_asm, secrets_client, secret_id):
    # Pre-patching checks
    if is_asm:
        # Check grid user permissions
        output, error = execute_ssh_command(ssh_client, command, user, sudo=<yes/no>, sudo_user=user)
        if error:
            # Handle the exception and exit
        # Check CRS status
        output, error = execute_ssh_command(ssh_client, command, user, sudo=<yes/no>)
        if error or output != "<expected outcome>":
            # Handle the exception and exit
        logger.info(f"CRS is online on {node_ip}")
        # Check DB processes
        output, error = execute_ssh_command(ssh_client, command, user, sudo=<yes/no>)
        if error or int(output) <= <expected outcome>:
            # Handle the exception and exit
        logger.info(f"DB services are up on {node_ip} with {output} processes")
    else:
        # Check DB processes
        output, error = execute_ssh_command(ssh_client, command, user, sudo=<yes/no>, sudo_user=user)
        if error or int(output) <= <expected outcome>:
            # Handle the exception and exit
        logger.info(f"DB services are up on {node_ip} with {output} processes")
        # Check alert log for startup
        output, error = execute_ssh_command(ssh_client, command, user, sudo=<yes/no>, sudo_user=user)
        if error or int(output) <= <expected outcome>:
            # Handle the exception and exit
        logger.info(f"Database startup confirmed in alert log on {node_ip}")

    # Kernel control check
    output, error = execute_ssh_command(ssh_client, command, user, sudo=<yes/no>, sudo_user=user)
    if error:
        # Handle the exception and exit
    kernel = output
    if "<kernel version 1>" in kernel:
        repo_file = "<version suitable repo>"
    elif "<kernel version 2>" in kernel:
        logger.warning(f"Node {node_ip} is running a version, which is end of life. Skipping OS patching.")
        return False
    else:
        repo_file = "<version suitable repo>"

    # Start OS patching
    output, error = execute_ssh_command(ssh_client, command, user, sudo=<yes/no>)
    if error:
        # Handle the exception and exit
    if not output:
        logger.error(f"No output from dbcli update server on {node_ip}, cannot proceed. Exiting program.")
        # Handle the exception and exit
    logger.info(f"dbcli update output: {output}")
    try:
        job_data = json.loads(output)
        job_id = job_data.get('jobId')
        if not job_id:
            logger.error(f"No jobId found in dbcli update server output on {node_ip}. Exiting program.")
            # Handle the exception and exit
        logger.info(f"Update Job ID: {job_id}")
    except json.JSONDecodeError:
        # Handle the exception and exit

    # Monitor job status every 5 minutes for up to 3 hours
    start_time = time.time()
    timeout = 10800  # 3 hours in seconds
    polling_interval = 300  # 5 minutes in seconds
    while True:
        output, error = execute_ssh_command(ssh_client, command, user, sudo=<yes/no>)
        if error:
            logger.error(f"Failed to check job status for {job_id} on {node_ip}: {error}. Exiting program.")
            # Handle the exception and exit
        if not output:
            logger.error(f"No output from dbcli describe job for {job_id} on {node_ip}. Exiting program.")
            # Handle the exception and exit
        logger.info(f"Job {job_id} status output: {output}")
        try:
            job_data = json.loads(output)
            status = job_data.get('status')
            if not status:
                logger.error(f"No status found in dbcli describe-job output for {job_id} on {node_ip}. Exiting program.")
                # Handle the exception and exit
            logger.info(f"Job {job_id} status: {status}")
            if status == "Success":
                logger.info(f"OS patching job {job_id} completed successfully on {node_ip}")
                break
            elif status == "Failure":
                logger.error(f"OS patching job {job_id} failed on {node_ip}. Exiting program.")
                # Handle the exception and exit
            elif status in ["Running", "InProgress", "In_Progress"]:
                elapsed = time.time() - start_time
                if elapsed > timeout:
                    logger.error(f"OS patching job {job_id} timed out after 3 hours on {node_ip}. Exiting program.")
                    # Handle the exception and exit
                logger.info(f"Job {job_id} still {status}, checking again in 5 minutes")
                time.sleep(polling_interval)
            else:
                logger.error(f"Unexpected job status for {job_id} on {node_ip}: {status}. Exiting program.")
                # Handle the exception and exit
        except json.JSONDecodeError:
            # Handle the exception and exit

    # Shut down CRS/DB before reboot
    if is_asm:
        output, error = execute_ssh_command(ssh_client, command, user, sudo=<yes/no>)
        logger.info(f"Pre-reboot CRS status output (as root): {output}")
        if output == <expected outcome>:
            logger.info(f"CRS is up, shutting down CRS on {node_ip} as root")
            if error:
                # Handle the exception and exit
            time.sleep(120)
        else:
            logger.info(f"CRS is already down on {node_ip}, proceeding with reboot")
    else:
        output, error = execute_ssh_command(ssh_client, command, user, sudo=<yes/no>, sudo_user=user)
        logger.info(f"Pre-reboot database processes output: {output}")
        print(f"Pre-reboot database processes output: {output}")
        if output == <expected outcome>:
            logger.info(f"Database is up, shutting down database on {node_ip}")
            if error:
                # Handle the exception and exit
            output, error = execute_ssh_command(ssh_client, command, user, sudo=<yes/no>, sudo_user=user)  # check trace log
            if output != <expected outcome>:
                logger.error(f"Database shutdown incomplete on {node_ip}, expected 'Shutting down instance' in alert log. Exiting program.")
                # Handle the exception and exit
            time.sleep(120)
        else:
            logger.info(f"Database is already down on {node_ip}, proceeding with reboot")

    # Reboot the server
    output, error = execute_ssh_command(ssh_client, command, user, sudo=<yes/no>)
    if error:
        # Handle the exception and exit
    logger.info(f"Initiated reboot on {node_ip}")
    time.sleep(120)  # Wait for reboot to initiate

    # Check host status with fresh SSH client
    start_time = time.time()
    timeout = 1440  # 24 minutes in seconds
    new_ssh_client = None
    while True:
        new_ssh_client = paramiko.SSHClient()
        new_ssh_client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        try:
            # Attempt an SSH connection to confirm that the node is online
        except Exception as e:
            # Handle the exception; retry until the timeout below is reached
            elapsed = time.time() - start_time
            if elapsed > timeout:
                logger.error(f"Node {node_ip} failed to come online after {timeout} seconds. Exiting program.")
                # Handle the exception and exit
            logger.info(f"{node_ip} not up yet. Waiting 30 seconds...")
            time.sleep(30)

    # Post-reboot wait and checks with new SSH client
    # Perform post-reboot service startup if needed
    ...
    # Perform post-reboot checks if required
    ...
    logger.info(f"OS update completed successfully on {node_ip}")
    new_ssh_client.close()
    return True
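The example script calls `execute_ssh_command` throughout but does not show it. The following is one possible sketch, assuming a Paramiko-style client whose `exec_command` returns stdin, stdout, and stderr streams. The `build_remote_command` helper, the use of `bash -c`, and the choice to report an error only when the exit status is nonzero are all assumptions made for illustration, not the behavior of the original helper.

```python
import shlex


def build_remote_command(command, sudo=False, sudo_user=None):
    """Wrap a shell command so that it runs through sudo when requested."""
    if not sudo:
        return command
    quoted = shlex.quote(command)  # keep the whole command as one bash -c argument
    if sudo_user:
        return f"sudo -u {shlex.quote(sudo_user)} bash -c {quoted}"
    return f"sudo bash -c {quoted}"


def execute_ssh_command(ssh_client, command, user, sudo=False, sudo_user=None):
    """Run a command over SSH and return (output, error) as stripped strings.

    `user` is the login user of the SSH session and is kept only to mirror
    the call signature used in the example script.
    """
    remote = build_remote_command(command, sudo=sudo, sudo_user=sudo_user)
    _stdin, stdout, stderr = ssh_client.exec_command(remote)
    # Read both streams before asking for the exit status.
    output = stdout.read().decode("utf-8", errors="replace").strip()
    err_text = stderr.read().decode("utf-8", errors="replace").strip()
    exit_status = stdout.channel.recv_exit_status()
    if exit_status != 0:
        return output, err_text or f"exit status {exit_status}"
    # Many tools print warnings to stderr even on success, so ignore them here.
    return output, ""
```

Returning an empty error string on success keeps the `if error:` checks in the script from treating harmless warnings as failures; a stricter variant could return the stderr text unconditionally.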
For more information about Database Command Line Interface (DBCLI) commands, see the Oracle Database CLI Reference.
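The example script parses JSON output from DBCLI twice, once to read `jobId` and once to read `status`. As a sketch only, that repeated logic could be factored into small helpers like the ones below. The helper names `parse_job_field` and `classify_job_status` are hypothetical; the field names and status strings are the ones that the example script checks, and the sketch assumes, as the script does, that the command output is a JSON object.

```python
import json


def parse_job_field(output, field):
    """Return one field from JSON command output, or None if it is unusable."""
    try:
        data = json.loads(output)
    except (json.JSONDecodeError, TypeError):
        return None
    if not isinstance(data, dict):
        return None
    return data.get(field) or None


def classify_job_status(status):
    """Map a job status string to done, failed, running, or unknown."""
    if status == "Success":
        return "done"
    if status == "Failure":
        return "failed"
    if status in ("Running", "InProgress", "In_Progress"):
        return "running"
    return "unknown"
```

A polling loop can then call `classify_job_status(parse_job_field(output, "status"))` and branch on the four outcomes, treating "unknown" the same way the script treats an unexpected status.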