Use Case: Automating Operating System Patching Using a Self-Hosted Instance
As an organization, I want to automate operating system patching for Oracle Base Database Service DBNode resources by using a self-hosted instance with Fleet Application Management.
This use case describes an example in which you need to patch the operating system of Oracle Base Database Service resources. Because Oracle Base Database Service doesn't provide native support for operating system patching, you can use the self-hosted instance feature in Fleet Application Management to automate operating system patching on DBNode resources and keep the systems up to date.
For information about self-hosted instances, see Self-Hosted Instances in Fleet Application Management.
Perform the following steps to automate operating system patching for Oracle Base Database Service DBNode resources by using the self-hosted instance feature in Fleet Application Management.
1. Create and set up the compute instance
Create a self-hosted compute instance in Oracle Cloud Infrastructure.
2. Enable instance principal authentication for the instance
Set up instance principal authentication so that the instance can access the SSH account credentials and private keys for the target DBNodes.
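The sample script in this topic calls an initialize_oci_clients(region) helper without defining it. The following is a minimal sketch of such a helper, not the script's actual implementation. It assumes that the OCI Python SDK (the oci package) is installed on the instance and that a dynamic group and policies already grant the instance the access it needs. The order of the returned clients mirrors the call in the sample script.

```python
def initialize_oci_clients(region):
    """Create OCI service clients signed with the instance principal.

    Assumption: this runs on an OCI compute instance that belongs to a
    dynamic group with suitable policies. The body is illustrative and is
    not taken from the sample script.
    """
    import oci  # OCI Python SDK, imported lazily so the module loads without it

    # The signer obtains short-lived credentials from the instance itself,
    # so no API key or config file is needed.
    signer = oci.auth.signers.InstancePrincipalsSecurityTokenSigner()
    config = {"region": region}

    db_client = oci.database.DatabaseClient(config, signer=signer)
    compute_client = oci.core.ComputeClient(config, signer=signer)
    virtual_network_client = oci.core.VirtualNetworkClient(config, signer=signer)
    identity_client = oci.identity.IdentityClient(config, signer=signer)
    secrets_client = oci.secrets.SecretsClient(config, signer=signer)
    return db_client, compute_client, virtual_network_client, identity_client, secrets_client
```

If the signer cannot be created, the instance is usually missing from the dynamic group or the policies are not yet in effect.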
3. Create secrets
- Open the navigation menu, select Identity & Security, and then select Vault.
- If no vault exists, create one. See Creating a Vault.
- Create a secret in the vault for the SSH private key. See Creating a Secret in a Vault.
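The sample script in this topic reads the SSH private key with a get_secret_content(secrets_client, secret_id) helper that it does not define. The following is a minimal sketch of how such a helper could look. It assumes that secrets_client is an oci.secrets.SecretsClient, that secret_id is the OCID of the secret created above, and that the secret holds Base64-encoded content. Returning None on failure is a choice made here to match the way the script checks the result.

```python
import base64


def get_secret_content(secrets_client, secret_id):
    """Return the decoded secret text, or None if it cannot be read.

    Assumption: secrets_client behaves like oci.secrets.SecretsClient and
    the secret bundle carries Base64-encoded content.
    """
    try:
        bundle = secrets_client.get_secret_bundle(secret_id).data
        encoded = bundle.secret_bundle_content.content
        return base64.b64decode(encoded).decode("utf-8")
    except Exception as exc:  # broad on purpose: the caller only checks for None
        print(f"Could not read secret {secret_id}: {exc}")
        return None
```

Avoid logging the returned value, because it contains the private key.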
4. Prepare the DBNode operating system patching script on the instance
5. Run the script by using the self-hosted instance with a runbook
Run the DBNode operating system patching script by using a self-hosted instance in a runbook. This involves configuring the instance, defining a runbook, and monitoring the execution.
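When the runbook runs the script, the script needs the DB system display name and the operation to perform. The sample script in this topic reads them as args.display_name and args.option from a parse_arguments() helper that it does not define. The following is a minimal sketch of that helper using argparse. The flag names --display-name and --option are assumptions inferred from the attribute names and log messages in the script.

```python
import argparse


def parse_arguments(argv=None):
    """Parse the command-line arguments used by the patching script.

    Assumption: flag names are inferred from args.display_name and
    args.option in the sample script; adjust them to your runbook.
    """
    parser = argparse.ArgumentParser(
        description="Patch the operating system of Base Database Service DB nodes."
    )
    parser.add_argument(
        "--display-name",
        required=True,
        help="Display name of the DB system to patch",
    )
    parser.add_argument(
        "--option",
        required=True,
        choices=["precheck", "update"],
        help="Operation to run: precheck only, or precheck followed by update",
    )
    # argv=None makes argparse read sys.argv[1:], which is what a runbook needs.
    return parser.parse_args(argv)
```

With a hypothetical file name, the runbook task could then call the script as: python3 dbnode_os_patch.py --display-name <display name> --option precheck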
Sample DBNode operating system patching script
The following is a sample script for patching the DBNode operating system. Specify the database display name and select the option to run, such as precheck or update. The script is an outline: placeholders such as <user>, <yes/no>, and <expected outcome>, and the "handle exception and exit" markers, show where you must supply values and logic for your environment.
import json
import logging
import sys
import time
from io import StringIO

import oci
import paramiko

logger = logging.getLogger(__name__)

# Helper functions such as parse_arguments(), get_tenancy_and_region(), and
# execute_ssh_command() are not shown in this sample.


def main():
    """Main function to orchestrate the DBaaS patching process."""
    args = parse_arguments()
    logger.info(f"Starting script with arguments: display-name={args.display_name}, option={args.option}")
    print(f"Starting script with arguments: display-name={args.display_name}, option={args.option}")
    # Get tenancy and region
    tenancy_id, region = get_tenancy_and_region()
    # Initialize OCI clients
    db_client, compute_client, virtual_network_client, identity_client, secrets_client = initialize_oci_clients(region)
    # Get all compartments
    try:
        compartments = oci.pagination.list_call_get_all_results(identity_client.list_compartments, tenancy_id).data
        logger.info(f"Retrieved {len(compartments)} compartments")
    except oci.exceptions.ServiceError as e:
        logger.error(f"Failed to list compartments: {str(e)}. Exiting program.")
        sys.exit(1)  # handle the exception and exit
    # Get DB System by display name
    db_system, compartment_id = get_db_system_by_display_name(db_client, compartments, args.display_name)
    if not db_system:
        sys.exit(1)  # handle the error and exit
    # Get DB Nodes
    db_nodes = get_db_nodes(db_client, compartment_id, db_system.id)
    if not db_nodes:
        sys.exit(1)  # handle the error and exit
    # Retrieve secret for SSH
    secret_id = ...  # placeholder: obtain the secret OCID from the arguments or another suitable mechanism
    private_key_content = get_secret_content(secrets_client, secret_id)
    if not private_key_content:
        sys.exit(1)  # handle the error and exit
    # Process each DB Node
    for node in db_nodes:
        node_ip = get_node_ip(virtual_network_client, node.vnic_id)
        if not node_ip:
            sys.exit(1)  # handle the error and exit
        # Initialize SSH client example using any suitable library based on your use case
        ssh_client = paramiko.SSHClient()
        ssh_client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        try:
            logger.info(f"Connecting to {node_ip} as <user>")
            private_key_file = StringIO(private_key_content)
            private_key = paramiko.RSAKey.from_private_key(private_key_file)
            ssh_client.connect(node_ip, username=user, pkey=private_key)
            logger.info(f"Connected to {node_ip}")
        except Exception as e:
            sys.exit(1)  # handle the exception and exit
        # Check DCS agent status and attempt to restart if down
        # handle agent check if required
        ...
        # Determine storage type to check if ASM is used if required
        is_asm = identify_storage_type(ssh_client, command)
        # Perform precheck or update
        if args.option == "precheck":
            if not os_update_precheck(ssh_client, node_ip, is_asm):
                sys.exit(1)  # handle the error and exit
            logger.info(f"OS update precheck completed successfully on {node_ip}")
        elif args.option == "update":
            if not os_update_precheck(ssh_client, node_ip, is_asm):
                sys.exit(1)  # handle the error and exit
            if not os_update(ssh_client, node_ip, is_asm, secrets_client, secret_id):
                sys.exit(1)  # handle the error and exit
            logger.info(f"OS update completed successfully on {node_ip}")
        ssh_client.close()


def os_update(ssh_client, node_ip, is_asm, secrets_client, secret_id):
    """Patch the OS on one DB node and reboot it; return True on success."""
    # Pre-patching checks
    if is_asm:
        # Check grid user permissions
        output, error = execute_ssh_command(ssh_client, command, user, sudo=<yes/no>, sudo_user=user)
        if error:
            return False  # handle the error and exit
        # Check CRS status
        output, error = execute_ssh_command(ssh_client, command, user, sudo=<yes/no>)
        if error or output != "<expected outcome>":
            return False  # handle the error and exit
        logger.info(f"CRS is online on {node_ip}")
        # Check DB processes
        output, error = execute_ssh_command(ssh_client, command, user, sudo=<yes/no>)
        if error or int(output) <= <expected outcome>:
            return False  # handle the error and exit
        logger.info(f"DB services are up on {node_ip} with {output} processes")
    else:
        # Check DB processes
        output, error = execute_ssh_command(ssh_client, command, user, sudo=<yes/no>, sudo_user=user)
        if error or int(output) <= <expected outcome>:
            return False  # handle the error and exit
        logger.info(f"DB services are up on {node_ip} with {output} processes")
        # Check alert log for startup
        output, error = execute_ssh_command(ssh_client, command, user, sudo=<yes/no>, sudo_user=user)
        if error or int(output) <= <expected outcome>:
            return False  # handle the error and exit
        logger.info(f"Database startup confirmed in alert log on {node_ip}")
    # Kernel control check
    output, error = execute_ssh_command(ssh_client, command, user, sudo=<yes/no>, sudo_user=user)
    if error:
        return False  # handle the error and exit
    kernel = output
    if "<kernel version 1>" in kernel:
        repo_file = "<version suitable repo>"
    elif "<kernel version 2>" in kernel:
        logger.warning(f"Node {node_ip} is running a version that is end of life. Skipping OS patching.")
        return False
    else:
        repo_file = "<version suitable repo>"
    # Start OS patching
    output, error = execute_ssh_command(ssh_client, command, user, sudo=<yes/no>)
    if error:
        return False  # handle the error and exit
    if not output:
        logger.error(f"No output from dbcli update server on {node_ip}, cannot proceed. Exiting program.")
        return False  # handle the error and exit
    logger.info(f"dbcli update output: {output}")
    try:
        job_data = json.loads(output)
        job_id = job_data.get('jobId')
        if not job_id:
            logger.error(f"No jobId found in dbcli update server output on {node_ip}. Exiting program.")
            return False  # handle the error and exit
        logger.info(f"Update Job ID: {job_id}")
    except json.JSONDecodeError:
        return False  # handle the exception and exit
    # Monitor job status every 5 minutes for up to 3 hours
    start_time = time.time()
    timeout = 10800  # 3 hours in seconds
    polling_interval = 300  # 5 minutes in seconds
    while True:
        output, error = execute_ssh_command(ssh_client, command, user, sudo=<yes/no>)
        if error:
            logger.error(f"Failed to check job status for {job_id} on {node_ip}: {error}. Exiting program.")
            return False  # handle the error and exit
        if not output:
            logger.error(f"No output from dbcli describe job for {job_id} on {node_ip}. Exiting program.")
            return False  # handle the error and exit
        logger.info(f"Job {job_id} status output: {output}")
        try:
            job_data = json.loads(output)
            status = job_data.get('status')
            if not status:
                logger.error(f"No status found in dbcli describe-job output for {job_id} on {node_ip}. Exiting program.")
                return False  # handle the error and exit
            logger.info(f"Job {job_id} status: {status}")
            if status == "Success":
                logger.info(f"OS patching job {job_id} completed successfully on {node_ip}")
                break
            elif status == "Failure":
                logger.error(f"OS patching job {job_id} failed on {node_ip}. Exiting program.")
                return False  # handle the error and exit
            elif status in ["Running", "InProgress", "In_Progress"]:
                elapsed = time.time() - start_time
                if elapsed > timeout:
                    logger.error(f"OS patching job {job_id} timed out after 3 hours on {node_ip}. Exiting program.")
                    return False  # handle the error and exit
                logger.info(f"Job {job_id} still {status}, checking again in 5 minutes")
                time.sleep(polling_interval)
            else:
                logger.error(f"Unexpected job status for {job_id} on {node_ip}: {status}. Exiting program.")
                return False  # handle the error and exit
        except json.JSONDecodeError:
            return False  # handle the exception and exit
    # Shutdown CRS/DB before reboot
    if is_asm:
        output, error = execute_ssh_command(ssh_client, command, user, sudo=<yes/no>)
        logger.info(f"Pre-reboot CRS status output (as root): {output}")
        if output == <expected outcome>:
            logger.info(f"CRS is up, shutting down CRS on {node_ip} as root")
            # (the shutdown command itself is omitted in this sample)
            if error:
                return False  # handle the error and exit
            time.sleep(120)
        else:
            logger.info(f"CRS is already down on {node_ip}, proceeding with reboot")
    else:
        output, error = execute_ssh_command(ssh_client, command, user, sudo=<yes/no>, sudo_user=user)
        logger.info(f"Pre-reboot database processes output: {output}")
        print(f"Pre-reboot database processes output: {output}")
        if output == <expected outcome>:
            logger.info(f"Database is up, shutting down database on {node_ip}")
            # (the shutdown command itself is omitted in this sample)
            if error:
                return False  # handle the error and exit
            output, error = execute_ssh_command(ssh_client, command, user, sudo=<yes/no>, sudo_user=user)  # check trace log
            if output != <expected outcome>:
                logger.error(f"Database shutdown incomplete on {node_ip}, expected 'Shutting down instance' in alert log. Exiting program.")
                return False  # handle the error and exit
            time.sleep(120)
        else:
            logger.info(f"Database is already down on {node_ip}, proceeding with reboot")
    # Reboot the server
    output, error = execute_ssh_command(ssh_client, command, user, sudo=<yes/no>)
    if error:
        return False  # handle the error and exit
    logger.info(f"Initiated reboot on {node_ip}")
    time.sleep(120)  # Wait for reboot to initiate
    # Check host status with fresh SSH client
    start_time = time.time()
    timeout = 1440  # 24 minutes in seconds
    new_ssh_client = None
    while True:
        new_ssh_client = paramiko.SSHClient()
        new_ssh_client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        try:
            # Attempt an SSH connection to confirm that the node is back online
            ...
            break  # connected, so stop waiting
        except Exception:
            # Connection failed: the node may still be rebooting
            new_ssh_client.close()
            elapsed = time.time() - start_time
            if elapsed > timeout:
                logger.error(f"Node {node_ip} failed to come online after {timeout} seconds. Exiting program.")
                return False  # handle the error and exit
            logger.info(f"{node_ip} not up yet. Waiting 30 seconds...")
            time.sleep(30)
    # Post-reboot wait and checks with new SSH client
    # Perform post-reboot service startup if needed
    ...
    # Perform post-reboot checks if required
    ...
    logger.info(f"OS update completed successfully on {node_ip}")
    new_ssh_client.close()
    return True
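The sample script calls execute_ssh_command(ssh_client, command, user, sudo=..., sudo_user=...) throughout and expects an (output, error) pair back, but it does not define the helper. The following is a minimal sketch under stated assumptions, not the script's actual implementation: ssh_client is a connected paramiko.SSHClient, user is the login user and is used only in error text, sudo wraps the command with sudo -u for sudo_user (root when omitted), and error is whatever the command wrote to stderr.

```python
import shlex


def execute_ssh_command(ssh_client, command, user, sudo=False, sudo_user=None):
    """Run a command over an open SSH session and return (output, error).

    Assumptions (not from the sample script): `user` only appears in the
    error text, and `error` is the stripped stderr of the remote command.
    """
    if sudo:
        target = sudo_user or "root"
        # Quote both parts so the remote shell sees one user and one command.
        command = f"sudo -u {shlex.quote(target)} bash -c {shlex.quote(command)}"
    try:
        _stdin, stdout, stderr = ssh_client.exec_command(command)
        output = stdout.read().decode().strip()
        error = stderr.read().decode().strip()
    except Exception as exc:
        return "", f"Command failed for {user}: {exc}"
    return output, error
```

Because the sample script treats any non-empty error as a failure, commands that print warnings to stderr while succeeding would need extra handling, for example checking the exit status instead.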
For more information about Database Command Line Interface (DBCLI) commands, see the Oracle Database CLI Reference.
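The sample script reads the jobId and status fields from the JSON text that the DBCLI commands return, and repeats the same parse-and-check steps in two places. Those steps could be factored into one helper. The following is a minimal sketch, where parse_job_field is a hypothetical name and the payload in the usage lines is invented for illustration, not captured from real dbcli output.

```python
import json


def parse_job_field(output, field):
    """Return the named field from JSON command output, or None on failure.

    None covers three cases the sample script treats as errors: output that
    is not valid JSON, JSON that is not an object, and a missing or empty
    field.
    """
    try:
        data = json.loads(output)
    except (json.JSONDecodeError, TypeError):
        return None
    if not isinstance(data, dict):
        return None
    return data.get(field) or None


# Illustrative payload only; real dbcli output contains more fields.
sample_output = '{"jobId": "example-job-id", "status": "Running"}'
job_id = parse_job_field(sample_output, "jobId")
status = parse_job_field(sample_output, "status")
```

The caller can then log and exit on None in one place instead of handling json.JSONDecodeError at every call.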