Caso de uso: automatización de la aplicación de parches del sistema operativo mediante una instancia alojada
Como organización, quiero automatizar la aplicación de parches del sistema operativo para los recursos DBNode de Oracle Base Database Service mediante una instancia autoalojada con Fleet Application Management.
En este caso de uso se describe un ejemplo en el que necesita aplicar parches al sistema operativo para los recursos de Oracle Base Database Service. Debido a que Oracle Base Database Service no proporciona soporte nativo para la aplicación de parches del sistema operativo, puede utilizar la función de instancia autoalojada en Fleet Application Management para automatizar la aplicación de parches del sistema operativo en recursos DBNode y mantener los sistemas actualizados.
Para obtener información sobre las instancias autoalojadas, consulte Instancias autoalojadas en Fleet Application Management.
Realice los siguientes pasos para automatizar la aplicación de parches del sistema operativo para los recursos DBNode de Oracle Base Database Service mediante la función de instancia autoalojada en Fleet Application Management:
- 1. Creación y configuración de la instancia informática
- 2. Activar autenticación de principal de instancia para la instancia
- 3. Crear secretos
- 4. Preparación del script de aplicación de parches del sistema operativo DBNode en la instancia
- 5. Ejecución del script mediante la instancia alojada automáticamente con un libro de ejecución
1. Creación y configuración de la instancia informática
Cree una instancia informática autoalojada en Oracle Cloud Infrastructure.
2. Activar autenticación de principal de instancia para la instancia
Configure la autenticación del principal de instancia para permitir que la instancia acceda a las credenciales de la cuenta SSH y las claves privadas para el destino DBNodes.
3. Crear secretos
- Abra el menú de navegación , seleccione Identidad y seguridad y, a continuación, Almacén.
- Si no existe ningún almacén, cree uno nuevo. Consulte Creación de un Almacén.
- Cree un secreto en Vault para la clave privada SSH. Consulte Creación de un secreto en un almacén.
4. Preparación del script de aplicación de parches del sistema operativo DBNode en la instancia
5. Ejecución del script mediante la instancia alojada automáticamente con un libro de ejecución
Ejecute el script de aplicación de parches del sistema operativo DBNode mediante una instancia autoalojada en un libro de ejecución. El proceso implica configurar la instancia, definir un libro de ejecución y supervisar el proceso.
Ejemplo de script de aplicación de parches del sistema operativo DBNode
A continuación, se muestra un script de ejemplo para aplicar un parche al sistema operativo DBNode. Especifique la base de datos display name y seleccione el options que desea realizar, como la comprobación previa o la actualización.
def main():
"""Main function to orchestrate the DBaaS patching process."""
args = parse_arguments()
logger.info(f"Starting script with arguments: display-name={args.display_name}, option={args.option}")
print(f"Starting script with arguments: display-name={args.display_name}, option={args.option}")
# Get tenancy and region
tenancy_id, region = get_tenancy_and_region()
# Initialize OCI clients
db_client, compute_client, virtual_network_client, identity_client, secrets_client = initialize_oci_clients(region)
# Get all compartments
try:
compartments = oci.pagination.list_call_get_all_results(identity_client.list_compartments, tenancy_id).data
logger.info(f"Retrieved {len(compartments)} compartments")
except oci.exceptions.ServiceError as e:
logger.error(f"Failed to list compartments: {str(e)}. Exiting program.")
//handle exception and exit
# Get DB System by display name
db_system, compartment_id = get_db_system_by_display_name(db_client, compartments, args.display_name)
if not db_system:
//handle exception and exit
# Get DB Nodes
db_nodes = get_db_nodes(db_client, compartment_id, db_system.id)
if not db_nodes:
//handle exception and exit
# Retrieve secret for SSH
secret_id = //handle fetching secrets from vault if required either from arguments or a suitable mechanism
private_key_content = get_secret_content(secrets_client, secret_id)
if not private_key_content:
//handle exception and exit
# Process each DB Node
for node in db_nodes:
node_ip = get_node_ip(virtual_network_client, node.vnic_id)
if not node_ip:
//handle exception and exit
# Initialize SSH client example using any suitable library based on your use case
ssh_client = paramiko.SSHClient()
ssh_client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
try:
logger.info(f"Connecting to {node_ip} as <user>")
private_key_file = StringIO(private_key_content)
private_key = paramiko.RSAKey.from_private_key(private_key_file)
ssh_client.connect(node_ip, username=user, pkey=private_key)
logger.info(f"Connected to {node_ip}")
except Exception as e:
//handle exception and exit
# Check DCS agent status and attempt to restart if down
//handle agent check if required
...
# Determine storage type to check if ASM is used if required
is_asm = identify_storage_type(ssh_client,command)
# Perform precheck or update
if args.option == "precheck":
if not os_update_precheck(ssh_client, node_ip, is_asm):
//handle exception and exit
logger.info(f"OS update precheck completed successfully on {node_ip}")
elif args.option == "update":
if not os_update_precheck(ssh_client, node_ip, is_asm):
//handle exception and exit
if not os_update(ssh_client, node_ip, is_asm, secrets_client, secret_id):
//handle exception and exit
logger.info(f"OS update completed successfully on {node_ip}")
ssh_client.close()
def os_update(ssh_client, node_ip, is_asm, secrets_client, secret_id):
# Pre-patching checks
if is_asm:
# Check grid user permissions
output, error = execute_ssh_command(ssh_client, command, user, sudo=<yes/no>, sudo_user=user)
if error:
//handle exception and exit
# Check CRS status
output, error = execute_ssh_command(ssh_client, command, user, sudo=<yes/no>)
if error or output != "<expected outcome>":
//handle exception and exit
logger.info(f"CRS is online on {node_ip}")
# Check DB processes
output, error = execute_ssh_command(ssh_client, command, user, sudo=<yes/no>)
if error or int(output) <= <expected outcome>:
//handle exception and exit
logger.info(f"DB services are up on {node_ip} with {output} processes")
else:
# Check DB processes
output, error = execute_ssh_command(ssh_client, command, user, sudo=<yes/no>, sudo_user=user)
if error or int(output) <= <expected outcome>:
//handle exception and exit
logger.info(f"DB services are up on {node_ip} with {output} processes")
# Check alert log for startup
output, error = execute_ssh_command(ssh_client, command, user, sudo=<yes/no>, sudo_user=user)
if error or int(output) <= <expected outcome>:
//handle exception and exit
logger.info(f"Database startup confirmed in alert log on {node_ip}")
# Kernel control check
output, error = execute_ssh_command(ssh_client, command, user, sudo=<yes/no>, sudo_user=user)
if error:
//handle exception and exit
kernel = output
if "<kernel version 1>" in kernel:
repo_file = "<version suitable repo>"
elif "<kernel version 2>" in kernel:
logger.warning(f"Node {node_ip} is running a version, which is end of life. Skipping OS patching.")
return False
else:
repo_file = "<version suitable repo>"
# Start OS patching
output, error = execute_ssh_command(ssh_client, command, user, sudo=<yes/no>)
if error:
//handle exception and exit
if not output:
logger.error(f"No output from dbcli update server on {node_ip}, cannot proceed. Exiting program.")
//handle exception and exit
logger.info(f"dbcli update output: {output}")
try:
job_data = json.loads(output)
job_id = job_data.get('jobId')
if not job_id:
logger.error(f"No jobId found in dbcli update server output on {node_ip}. Exiting program.")
//handle exception and exit
logger.info(f"Update Job ID: {job_id}")
except json.JSONDecodeError:
//handle exception and exit
# Monitor job status every 5 minutes for up to 3 hours
start_time = time.time()
timeout = 10800 # 3 hours in seconds
polling_interval = 300 # 5 minutes in seconds
while True:
output, error = execute_ssh_command(ssh_client, command, user, sudo=<yes/no>)
if error:
logger.error(f"Failed to check job status for {job_id} on {node_ip}: {error}. Exiting program.")
//handle exception and exit
if not output:
logger.error(f"No output from dbcli describe job for {job_id} on {node_ip}. Exiting program.")
//handle exception and exit
logger.info(f"Job {job_id} status output: {output}")
try:
job_data = json.loads(output)
status = job_data.get('status')
if not status:
logger.error(f"No status found in dbcli describe-job output for {job_id} on {node_ip}. Exiting program.")
//handle exception and exit
logger.info(f"Job {job_id} status: {status}")
if status == "Success":
logger.info(f"OS patching job {job_id} completed successfully on {node_ip}")
break
elif status == "Failure":
logger.error(f"OS patching job {job_id} failed on {node_ip}. Exiting program.")
//handle exception and exit
elif status in ["Running", "InProgress", "In_Progress"]:
elapsed = time.time() - start_time
if elapsed > timeout:
logger.error(f"OS patching job {job_id} timed out after 3 hours on {node_ip}. Exiting program.")
//handle exception and exit
logger.info(f"Job {job_id} still {status}, checking again in 5 minutes")
time.sleep(polling_interval)
else:
logger.error(f"Unexpected job status for {job_id} on {node_ip}: {status}. Exiting program.")
//handle exception and exit
except json.JSONDecodeError:
//handle exception and exit
# Shutdown CRS/DB before reboot
if is_asm:
output, error = execute_ssh_command(ssh_client, command, user, sudo=<yes/no>)
logger.info(f"Pre-reboot CRS status output (as root): {output}")
if output == <expected outcome>:
logger.info(f"CRS is up, shutting down CRS on {node_ip} as root")
if error:
//handle exception and exit
time.sleep(120)
else:
logger.info(f"CRS is already down on {node_ip}, proceeding with reboot")
else:
output, error = execute_ssh_command(ssh_client, command, user, sudo=<yes/no>, sudo_user=user)
logger.info(f"Pre-reboot database processes output: {output}")
print(f"Pre-reboot database processes output: {output}")
if output == <expected outcome>:
logger.info(f"Database is up, shutting down database on {node_ip}")
if error:
//handle exception and exit
output, error = execute_ssh_command(ssh_client, command, user, sudo=<yes/no>, sudo_user=user) # check trace log
if output != <expected outcome>:
logger.error(f"Database shutdown incomplete on {node_ip}, expected 'Shutting down instance' in alert log. Exiting program.")
//handle exception and exit
time.sleep(120)
else:
logger.info(f"Database is already down on {node_ip}, proceeding with reboot")
# Reboot the server
output, error = execute_ssh_command(ssh_client, command, user, sudo=<yes/no>)
if error:
//handle exception and exit
logger.info(f"Initiated reboot on {node_ip}")
time.sleep(120) # Wait for reboot to initiate
# Check host status with fresh SSH client
start_time = time.time()
timeout = 1440 # 24 minutes in seconds
new_ssh_client = None
while True:
new_ssh_client = paramiko.SSHClient()
new_ssh_client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
try:
//attempt connecting SSH to ensure its online
except Exception as e:
//handle exception and exit
elapsed = time.time() - start_time
if elapsed > timeout:
logger.error(f"Node {node_ip} failed to come online after {timeout} seconds. Exiting program.")
//handle exception and exit
logger.info(f"{node_ip} not up yet. Waiting 30 seconds...")
time.sleep(30)
# Post-reboot wait and checks with new SSH client
# Perform post-reboot service startup if needed
...
# Perform post-reboot checks if required
...
logger.info(f"OS update completed successfully on {node_ip}")
new_ssh_client.close()
return True
Para obtener más información sobre los comandos de la interfaz de línea de comandos de la base de datos (DBCLI), consulte la Referencia de la CLI de Oracle Database.