Deploy a Migrated MongoDB Workload to Oracle Autonomous Transaction Processing Serverless@Google Cloud

Migrate an existing workload that uses a document database, in this case MongoDB, to Google Cloud and Oracle Autonomous Transaction Processing Serverless deployed in Google Cloud. Autonomous Transaction Processing is a cloud document database service that makes it simple to modernize the development of your JSON-centric applications alongside other multi-model workloads.

Workloads and applications that use documents and document databases to evolve data schemas and applications are popular because of the flexibility they offer developers. Schema flexibility, rapid development, and scalability enable rapid prototyping of application features, easier application evolution, and the ability to start with small applications and features, build them iteratively, and scale them to serve a large user base. However, these workloads have their challenges, including weaker transactional guarantees, limited query versatility, and the inability to support other workloads on documents, such as analytics or machine learning.

What if these workloads could keep the advantages of document databases and also gain the benefits of relational databases? For instance, they could have stronger transactional guarantees and added functionality such as analytics and machine learning, without the need to replicate data to another database or system.

Autonomous Transaction Processing (ATP) Serverless is a fully automated database service optimized to run transactional, analytical, and batch workloads concurrently. To accelerate performance, it’s preconfigured for row format, indexes, and data caching while providing scalability, availability, transparent security, and real-time operational analytics. Application developers and DBAs can rapidly and cost-effectively develop and deploy applications without sacrificing functionality or atomicity, consistency, isolation, and durability (ACID) properties.

Functional Architecture

This architecture assumes, as a starting point, that a workload with an application and a MongoDB database exists, either on-premises or in the cloud, and will be migrated to Google Cloud and Oracle Database@Google Cloud. It describes the future state architecture, its benefits, how it can be deployed, and what additional features you can use to augment the existing workload.

One of the key features used in this architecture is Oracle Database API for MongoDB, which enables applications to interact with collections of JSON documents in Oracle Database using MongoDB drivers, tools, and SDKs. Existing application code can work with data stored in Autonomous Transaction Processing Serverless, without the need to refactor code.

The following diagram depicts a typical application composed of a database, back-end, and front-end tiers.

Illustration: mongodb-atp-s-google-logical-arch-migration.png

The MEAN stack is a popular stack used to implement this pattern:

MongoDB: Document database
Express: Back-end framework
Angular: Front-end framework
Node.js: Back-end server

This document uses a MEAN stack as an example of an existing deployment that will be migrated to Google Cloud and ATP Serverless.

The migration of this workload to Google Cloud and ATP Serverless is straightforward and consists, at a high level, of the following steps:

Deploy an ATP Serverless instance, enabling the Oracle Database API for MongoDB at creation time.
Migrate metadata and data from MongoDB to ATP Serverless.
Deploy application servers to run Node.js and Express using Google Cloud Run, VMs, containers, or Kubernetes, in the same region as ATP Serverless.
Deploy the back-end application code to the application servers.
Connect the back-end application to ATP Serverless using the same MongoDB tools and drivers used on the current application.
Connect users to the new application URI.
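
As a rough illustration of the connection step, the sketch below assembles a connection string of the kind the Oracle Database API for MongoDB expects. The host name, user, and password are hypothetical placeholders, and the option list is an assumption based on the commonly documented pattern; copy the exact string shown for your own ATP Serverless instance rather than relying on this one. The example is in Python for brevity, but the resulting string is what any MongoDB driver, including the Node.js driver used in a MEAN stack, would receive.

```python
from urllib.parse import quote_plus


def build_mongodb_api_uri(host: str, user: str, password: str, port: int = 27017) -> str:
    """Build a MongoDB-style connection string for a database user (sketch)."""
    # URL-encode the credentials so characters such as '@' or '/' do not break the URI.
    credentials = f"{quote_plus(user)}:{quote_plus(password)}"
    # Assumed option set; verify it against the connection string
    # displayed for your own database instance.
    options = (
        "authMechanism=PLAIN&authSource=$external"
        "&ssl=true&retryWrites=false&loadBalanced=true"
    )
    # The database name in the path is the database user (schema) name.
    return f"mongodb://{credentials}@{host}:{port}/{quote_plus(user)}?{options}"


if __name__ == "__main__":
    # Hypothetical host and credentials, for illustration only.
    print(build_mongodb_api_uri("db-host.example.com", "app_user", "p@ss/word"))
```

Because only the connection string changes, the back-end code that opens the connection and queries collections can typically stay as it is.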

Note that this reference architecture focuses on the deployment of the migrated workload and not on the migration process itself. For more details on the migration process, see the Explore More section.

After the workload is migrated to ATP Serverless, several features are available to augment the existing functionality, whether that is to 1) support additional nonfunctional requirements, such as easily improving scalability, resiliency, or high availability, or 2) have additional functional features such as operational reporting, analytics, and machine learning in place, without the need to copy data out of the database.

To improve scalability and high availability, use the Autonomous Transaction Processing Serverless auto scaling feature. With a single click or API call, it allows the workload to use up to 3 times the baseline capacity without any downtime. Note that Autonomous Transaction Processing Serverless uses Oracle Real Application Clusters (Oracle RAC) technology for high availability. For the back-end tier, use either Compute Engine managed instance groups with autoscaling or a managed service such as Cloud Run with autoscaling to enable application high availability and scalability.
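
The "up to 3 times the baseline capacity" rule can be turned into a quick sizing check. The following is a minimal sketch: the function names and the generic "capacity units" are assumptions made for illustration, not part of any Oracle API.

```python
def autoscale_ceiling(baseline_units: float, autoscaling_enabled: bool = True,
                      factor: int = 3) -> float:
    """Return the maximum capacity the workload can use (sketch).

    With auto scaling enabled, the ceiling is `factor` times the baseline
    (3x, per the text above); otherwise it is the baseline itself.
    """
    if baseline_units <= 0:
        raise ValueError("baseline_units must be positive")
    return baseline_units * factor if autoscaling_enabled else baseline_units


def fits_peak(baseline_units: float, peak_units: float,
              autoscaling_enabled: bool = True) -> bool:
    """Check whether an expected peak demand stays within the ceiling."""
    return peak_units <= autoscale_ceiling(baseline_units, autoscaling_enabled)


if __name__ == "__main__":
    # Hypothetical sizing: a baseline of 4 units and an expected peak of 10 units.
    print(autoscale_ceiling(4))   # 12
    print(fits_peak(4, 10))       # True
    print(fits_peak(4, 13))       # False
```

If the expected peak exceeds the ceiling, the baseline itself needs to be raised, since auto scaling alone would not cover the demand.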

Since ATP Serverless is built on top of multi-model, multi-workload database technology, you can add features that rely on relational, spatial, graph or vector data types that work alongside the existing application.

Physical Architecture

The physical architecture includes Autonomous Transaction Processing Serverless deployed using delegated subnets in two Google Cloud regions to support high availability. OCI services support automatic backup to Oracle Cloud Infrastructure Object Storage.

The architecture supports the following:

Front-end tier
- Application users can connect from the internet.
- User connection is routed to the active region that is running the application, using a Global Cloud Load Balancer.
- User connection is secured using Cloud Armor.
- User connection to the application is load balanced using an external global application load balancer.
Back-end tier
- Application is deployed in a high availability fashion using Cloud Run.
- Cloud Run autoscaling is used to achieve horizontal scalability.
Database tier
- ATP Serverless provides high availability, as Oracle Real Application Clusters (Oracle RAC) and several database nodes underpin the service instance. Therefore, by default the database tier is highly available and resilient.
- Oracle Database API for MongoDB enabled in ATP Serverless allows you to use existing application code without changes.
- The Oracle Database API for MongoDB is highly resilient, and that resiliency is guaranteed internally by ATP Serverless.
- ATP Serverless can use auto scaling, adjusting when the system load increases and decreases.
- ATP Serverless business continuity is achieved through cross-region Autonomous Data Guard.
Disaster Recovery
- The second region is deployed with a similar topology to reduce the overall recovery time objective.
- Use a warm DR strategy to reduce the overall RTO. In a warm DR strategy, the back-end tier cloud resources are already provisioned alongside the ATP Serverless standby database.
- Alternatively, you can provision the back-end tier resources only in the event of a failure, which decreases the cost of running the DR resources but increases the overall RTO.
Networking
- All application incoming traffic from on-premises and from the internet is routed by Cloud Load Balancer.
- ATP Serverless is deployed with a private endpoint to increase the security posture.
- Cloud Run is deployed using a Serverless VPC Access connector placed in an application subnet within the VPC, to reach the ATP Serverless instance.
- The Serverless VPC Access connector allows establishing private connectivity to the ATP Serverless instance.
Security
- All data is secure in transit and at rest.
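
The trade-off between the warm DR strategy and on-demand provisioning described under Disaster Recovery can be made concrete with simple arithmetic. The sketch below assumes the recovery steps run one after another; the step names and all durations are hypothetical placeholders, not measured values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoverySteps:
    """Durations, in minutes, of each recovery step (illustrative only)."""
    detect_failure_min: float
    database_failover_min: float
    provision_backend_min: float
    redirect_traffic_min: float


def estimate_rto(steps: RecoverySteps, warm_standby: bool) -> float:
    """Estimate the recovery time objective, assuming sequential steps."""
    total = (steps.detect_failure_min
             + steps.database_failover_min
             + steps.redirect_traffic_min)
    # With a warm standby the back-end tier already exists in the second
    # region, so provisioning time is only paid in the on-demand case.
    if not warm_standby:
        total += steps.provision_backend_min
    return total


if __name__ == "__main__":
    # Hypothetical durations chosen only to show the calculation.
    steps = RecoverySteps(detect_failure_min=5, database_failover_min=10,
                          provision_backend_min=30, redirect_traffic_min=5)
    print(estimate_rto(steps, warm_standby=True))   # 20
    print(estimate_rto(steps, warm_standby=False))  # 50
```

Under these assumed numbers, the on-demand approach adds the full back-end provisioning time to the RTO, which is the cost of not running standby resources continuously.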

The following potential design improvements are not depicted in this deployment, for simplicity:

Automate application Disaster Recovery using Cloud Monitoring alerts, Pub/Sub and Cloud Functions.
Leverage a hub and spoke topology to enforce centralized network security.
Leverage a network firewall, deployed in the hub VPC, to improve the overall security posture by inspecting all traffic and enforcing policies.
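
The DR automation idea above (Cloud Monitoring alerts delivered through Pub/Sub to a function) could be sketched as follows. This shows only the decision logic a function might run on the base64-encoded Pub/Sub message data. The payload shape, an `incident` object with `state` and `policy_name` fields, is an assumption about the alert notification format that should be checked against a real message, and the policy name is hypothetical. The failover action itself is deliberately left out.

```python
import base64
import json


def should_fail_over(pubsub_data_b64: str, watched_policy: str) -> bool:
    """Return True when the alert reports an open incident for the watched policy."""
    try:
        payload = json.loads(base64.b64decode(pubsub_data_b64).decode("utf-8"))
    except (ValueError, UnicodeDecodeError):
        # Malformed messages never trigger a failover.
        return False
    if not isinstance(payload, dict):
        return False
    incident = payload.get("incident") or {}
    return (incident.get("state") == "open"
            and incident.get("policy_name") == watched_policy)


if __name__ == "__main__":
    # Hypothetical alert payload, encoded the way Pub/Sub delivers message data.
    alert = {"incident": {"state": "open", "policy_name": "primary-region-down"}}
    data = base64.b64encode(json.dumps(alert).encode("utf-8")).decode("ascii")
    print(should_fail_over(data, "primary-region-down"))  # True
```

Keeping the decision in a pure function makes it easy to test separately from the code that would actually trigger the failover.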

Illustration: mongodb-atp-s-google-physical-arch.png

The architecture has the following Google components:

Google Cloud region
A Google Cloud region is a geographical area that contains data centers and infrastructure for hosting resources. Regions are made up of zones, which are isolated from each other within the region.
Google Cloud zone
A zone in Google Cloud is a deployment area for resources within a region. Zones are isolated from each other within a region, and are treated as a single failure domain.
Google Cloud Project
A Google Cloud Project is required to use Google Workspace APIs and build Google Workspace add-ons or apps. A Cloud Project forms the basis for creating, enabling, and using all Google Cloud services, including managing APIs, enabling billing, adding and removing collaborators, and managing permissions.
Google Virtual Private Cloud
Google Virtual Private Cloud (VPC) provides networking functionality to Compute Engine virtual machine (VM) instances, Google Kubernetes Engine (GKE) containers, database services, and serverless workloads. VPC provides global, scalable, and flexible networking for your cloud-based service.
Google Subnets
Each Google Virtual Private Cloud (VPC) network consists of one or more IP address ranges called subnets. Subnets are regional resources that have IP address ranges associated with them.
Google Load Balancer
Google Load Balancer provides automated traffic distribution from a single entry point to multiple servers in the back end.
Google Cloud Armor
Google Cloud Armor is a network security service that provides enterprise-grade protection against distributed denial-of-service (DDoS) attacks and web application threats—such as cross-site scripting (XSS) and SQL injection—by combining always-on DDoS defense, a web application firewall (WAF) with preconfigured and customizable rules, and adaptive, machine learning–based threat detection for applications.
Google Cloud Run
Google Cloud Run is a fully managed, serverless compute platform that lets you run containerized applications, APIs, and batch jobs directly on Google’s scalable infrastructure with automatic scaling, integrated security, and no need to manage servers or clusters.

The architecture has the following Oracle components:

OCI region
An OCI region is a localized geographic area that contains one or more data centers, hosting availability domains. Regions are independent of other regions, and vast distances can separate them (across countries or even continents).
Oracle Autonomous Database
Oracle Autonomous Database is a fully-managed, preconfigured database environment that you can use for transaction processing and data warehousing workloads. You do not need to configure or manage any hardware, or install any software. OCI handles creating, backing up, patching, upgrading, and tuning the database.
Oracle Autonomous Data Guard
Oracle Autonomous Data Guard enables a standby (peer) database to provide data protection and disaster recovery for your Autonomous Database instance. It provides a comprehensive set of services that create, maintain, manage, and monitor one or more standby databases to enable production Oracle databases to remain available without interruption. Oracle Data Guard maintains these standby databases as copies of the production database. Then, if the production database becomes unavailable because of a planned or an unplanned outage, you can switch any standby database to the production role, minimizing the downtime associated with the outage.
OCI Object Storage
OCI Object Storage provides access to large amounts of structured and unstructured data of any content type, including database backups, analytic data, and rich content such as images and videos. You can safely and securely store data directly from applications or from within the cloud platform. You can scale storage without experiencing any degradation in performance or service reliability.

Use standard storage for "hot" storage that you need to access quickly, immediately, and frequently. Use archive storage for "cold" storage that you retain for long periods of time and seldom or rarely access.

Architecture Variant

This variant of the proposed physical architecture uses a customer-managed Oracle REST Data Services deployment running in each application server. However, the fully managed MongoDB API provided by ATP Serverless is the best solution for most workloads since it is easier to manage.

If there are requirements to manually control the configuration and management of Oracle REST Data Services, then using customer-managed Oracle REST Data Services is an option. For example, to allow the application to use larger connection pools.

Note:

Use this architecture variant if there is a specific workload requirement to do so. Only advanced users should deploy this architecture variant.

This section only describes the differences compared to the previously described physical architecture, so all physical architecture design principles are valid unless stated otherwise.

The following architecture diagram depicts how the variant is deployed. For simplicity, only the cloud resources deployed in the JSON Workload VCN are depicted, since the rest of the deployment is the same as the physical architecture described earlier.

Illustration: mongodb-atp-s-google-arch-variant.png

The following describes the front-end and back-end tiers for the variant:

The incoming user requests are distributed by Cloud Load Balancer, which uses a Cloud Run serverless Network Endpoint Group (NEG) as its backend. This setup enables horizontal scaling and eliminates any single point of failure.
The back-end application is deployed as Cloud Run containers.
Cloud Run scales containers horizontally depending on the load on the existing containers.
Create, install, and configure the container with the application and Oracle REST Data Services, which enables both to run in the same container.
Each worker runs the container image that colocates the application and Oracle REST Data Services in the same runtime environment.
Customer-managed Oracle REST Data Services workers are configured to enable the MongoDB API, so that the application can connect to the database using MongoDB tools and drivers.
Customer-managed Oracle REST Data Services is configured to adjust to the workload non-functional requirements, for instance, by configuring larger connection pools or using a different database service.
Both the back-end code and the customer-managed Oracle REST Data Services are preinstalled and preconfigured in the container image used on the workers. When Cloud Run scales horizontally, new workers are able to run the back-end application and connect to the database after provisioning.

Recommendations

Use the following recommendations as a starting point to further improve and evolve the workload. Your requirements might differ from the architecture described here.

Application Deployment
- Consider using a container-based deployment on Google Kubernetes Engine (GKE) if you need advanced orchestration, networking, and security features that might not be available in Cloud Run.
Security
- Consider using Oracle Data Safe to further increase the workload security posture and perform database auditing.
Observability
- Consider using Google Cloud Monitoring to monitor ATP Serverless metrics alongside the monitoring data of all other Google Cloud services.
Disaster Recovery
- Consider automating and orchestrating disaster recovery for all the layers of the stack using a Google Partner solution or custom scripts that detect failures and initiate failover processes.
Operational Efficiency
- If the ATP Serverless workload is part of a wider database fleet, consider using Elastic Pools for increased cost efficiency.
- Consider enabling Oracle Cloud Infrastructure Database Management, an OCI service that provides a comprehensive set of database performance monitoring and management features, to streamline management of the ATP Serverless instance.
Application Evolution
- Consider deploying operational analytics and real-time reporting in ATP Serverless using SQL and a front-end such as Oracle APEX or Looker, without moving data out of the database, for trusted and real-time data analysis.
- Consider using ATP Serverless for machine learning using Oracle Machine Learning (OML), to build and train models with JSON data without any need for data movement and to deploy the models alongside the existing workload for efficient inferencing.
- For additional use cases beyond the application core, consider using ATP Serverless Select AI and database views querying JSON and holding metadata, so that users can query JSON data using natural language.
- Consider using ATP Serverless to store additional data types (relational, vector, spatial or graph) for added workload functionality and flexibility.

Explore More

Learn more about the features of this architecture:

Review these additional resources:

Acknowledgments

Author: José Cruz
Contributors: Massimo Castelli, Simon Griffith, Hermann Baer, Matt DeMarco, Julian Dontcheff