4.1 Installing OpenMetadata
Prerequisites
Install the following libraries and software applications before installing OpenMetadata (OM):
- Oracle Linux version 8.
- Linux libraries: the "Development Tools" package group and the packages gcc, gcc-c++, sqlite-devel, python39-devel, cyrus-sasl-devel, bzip2-devel, libffi, libffi-devel, openssl-devel, mysql, and mysql-devel
- MySQL version 8.0.3.2
- JDK version 17
- Python version 3.10 only
Note:
Do not use Python 3.11 or later for the OpenMetadata/Airflow setup, as it can cause pkg_resources or package compatibility errors during dependency installation or at runtime. Create the Python virtual environment with Python 3.10.x, and use the Python 3.10 constraints file in the Airflow constraint URL.
- Create the following databases in MySQL:
    CREATE DATABASE openmetadata_db;
    CREATE DATABASE airflow_db CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
Before creating users, check the active MySQL password validation policy:
    SHOW VARIABLES LIKE 'validate_password%';
Then create the users and grant privileges:
    CREATE USER 'openmetadata_user'@'%' IDENTIFIED BY '<OPENMETADATA_USER_PASSWORD>';
    CREATE USER 'airflow_user'@'%' IDENTIFIED BY '<AIRFLOW_USER_PASSWORD>';
    CREATE USER 'airflow_user'@'localhost' IDENTIFIED BY '<AIRFLOW_USER_PASSWORD>';
    GRANT ALL PRIVILEGES ON openmetadata_db.* TO 'openmetadata_user'@'%' WITH GRANT OPTION;
    GRANT ALL PRIVILEGES ON airflow_db.* TO 'airflow_user'@'%' WITH GRANT OPTION;
    GRANT ALL PRIVILEGES ON airflow_db.* TO 'airflow_user'@'localhost' WITH GRANT OPTION;
    COMMIT;
Note:
The sample passwords used in this guide (such as openmetadata_password and airflow_pass) are examples only and may be rejected if MySQL password validation is enabled. Use passwords that comply with the configured MySQL password policy. Run SHOW VARIABLES LIKE 'validate_password%'; to review the active policy before creating users. If required, temporarily lower the password policy only with DBA and security approval, and restore the required policy after user creation.
- Install Apache Airflow version 2.9.3. To do so:
- (Optional) Set up a proxy, if your network requires it:
    export http_proxy=<YOUR_PROXY_URL>
    export https_proxy=<YOUR_PROXY_URL>
Note:
Skip this step if a proxy is not required.
- Define the Airflow installation settings and environment variables. Run the following commands:
    export INSTALL_DIR=<YOUR_INSTALL_DIR>
    export AIRFLOW_HOME=$INSTALL_DIR/airflow
    export MYSQL_DB_HOST=<YOUR_MYSQL_DB_HOST>
    export MYSQL_DB_PORT=<YOUR_MYSQL_DB_PORT>
    export PYTHON_INS_DIR=<PYTHON_INSTALLATION_PATH>
    export PYTHON_VENV_DIR=$PYTHON_INS_DIR/venv
- Create the required Airflow directories. Run the following commands:
    mkdir -p "$AIRFLOW_HOME"
    chmod -R 755 "$AIRFLOW_HOME"
- Create and activate a Python 3.10.x virtual environment. Run the following commands:
    export PYTHON_INS_DIR=<PYTHON_INSTALLATION_PATH>
    export PYTHON_VENV_DIR=$PYTHON_INS_DIR/venv
    cd "$INSTALL_DIR"
    python3.10 -m venv "$PYTHON_VENV_DIR"
    source "$PYTHON_VENV_DIR/bin/activate"
- Upgrade pip to the latest version. Run the following command:
    python3 -m pip install --upgrade pip
- Install the required dependencies.
    pip install "openmetadata-managed-apis~=1.7.7"
    pip install "openmetadata-ingestion[all]~=1.7.7"
    pip install "apache-airflow==2.9.3" --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.9.3/constraints-3.10.txt"
    pip install "python-daemon>=3.0.0"
- (Optional) Remove unneeded Apache Airflow providers, keeping the docker and http providers. Run the following command:
    pip freeze | grep "apache-airflow-providers" | grep -Ev "docker|http" | xargs pip uninstall -y
Note:
Run this step only if you want to prune unused providers.
- Configure the Airflow database connection. To do so:
- Set the required environment variables:
    export AIRFLOW_DB=<YOUR_DB_NAME>
    export DB_USER=<YOUR_DB_USER>
    export DB_PASSWORD=<YOUR_DB_PASSWORD>
    export DB_SCHEME=mysql+pymysql
    export AIRFLOW_ADMIN_USER=<ADMIN_USERNAME>
    export AIRFLOW_ADMIN_PASSWORD=<ADMIN_PASSWORD>
    export AIRFLOW_ADMIN_EMAIL=<ADMIN_EMAIL>
- Build and export the SQLAlchemy database connection string. Run the following command:
    export AIRFLOW__DATABASE__SQL_ALCHEMY_CONN="mysql+pymysql://${DB_USER}:${DB_PASSWORD}@${MYSQL_DB_HOST}:${MYSQL_DB_PORT}/${AIRFLOW_DB}"
Note:
DB_PASSWORD must match the MySQL password created for airflow_user in Step 6. If the password contains special characters (such as @, /, +, or :), URL-encode the password before using it in the SQLAlchemy connection string. For example, p@ssw0rd must be encoded as p%40ssw0rd. Unencoded special characters cause database connection errors at startup.
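The URL-encoding described in the note can be scripted rather than done by hand. The following is a minimal sketch, not part of the product: urlencode and DB_PASSWORD_RAW are hypothetical names introduced here, the values are stand-ins, and python3 is assumed to be on the PATH (for example, from the activated Python 3.10 virtual environment).

```shell
# Hypothetical helper: percent-encode a string for use inside a connection URL.
# Assumes python3 is on PATH; urllib.parse.quote with safe="" also encodes "/".
urlencode() {
  python3 -c 'import sys, urllib.parse; print(urllib.parse.quote(sys.argv[1], safe=""))' "$1"
}

# Stand-in values for illustration only; replace them with your own settings.
DB_USER="airflow_user"
DB_PASSWORD_RAW='p@ssw0rd'
MYSQL_DB_HOST="localhost"
MYSQL_DB_PORT="3306"
AIRFLOW_DB="airflow_db"

# Encode the password first, then build the connection string as in the step above.
DB_PASSWORD="$(urlencode "$DB_PASSWORD_RAW")"
export AIRFLOW__DATABASE__SQL_ALCHEMY_CONN="mysql+pymysql://${DB_USER}:${DB_PASSWORD}@${MYSQL_DB_HOST}:${MYSQL_DB_PORT}/${AIRFLOW_DB}"
echo "$AIRFLOW__DATABASE__SQL_ALCHEMY_CONN"
```

Only the password is encoded; the separators between user, password, host, and port must stay literal.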
- Update airflow.cfg by running the following commands:
    sed -i "s#\(sql_alchemy_conn = \).*#\1${AIRFLOW__DATABASE__SQL_ALCHEMY_CONN}#" "$AIRFLOW_HOME/airflow.cfg"
    sed -i "s#\(hostname_callable = \).*#\1socket.gethostname#" "$AIRFLOW_HOME/airflow.cfg"
    sed -i "s#\(auth_backends = \).*#\1airflow.api.auth.backend.basic_auth,airflow.api.auth.backend.session#" "$AIRFLOW_HOME/airflow.cfg"
    sed -i "s#\(executor = \).*#\1LocalExecutor#" "$AIRFLOW_HOME/airflow.cfg"
- Initialize the Airflow database. Run the following command:
    airflow db init
- Create an Administrator Account. To do so:
- Create the Airflow administrator user by running the following command:
    airflow users create \
      --username "$AIRFLOW_ADMIN_USER" \
      --firstname <ADMIN_FIRSTNAME> \
      --lastname <ADMIN_LASTNAME> \
      --role Admin \
      --email "$AIRFLOW_ADMIN_EMAIL" \
      --password "$AIRFLOW_ADMIN_PASSWORD"
- (Optional) Run the following command to apply any pending database migrations:
    airflow db migrate
- Start Apache Airflow. To do so:
- Start Airflow in standalone mode:
    airflow standalone
- To run Airflow in the background, use the following command:
    airflow standalone >> ./airflow.log 2>&1 &
- Alternatively, start the webserver and scheduler as separate processes:
    airflow webserver &
    airflow scheduler &
Note:
- Replace all <...> placeholders with values specific to your environment.
- Ensure that Python 3.10 and pip are installed, a MySQL instance is running and accessible, and access to PyPI and GitHub is available.
- Follow the steps in order, providing the necessary values at each point.
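The Python 3.10 requirement in these notes can be checked before any installation step runs. The following is a minimal sketch under stated assumptions: python_version_ok is a hypothetical helper name, and the version strings in the loop are sample inputs, not values read from your system.

```shell
# Hypothetical helper: succeed only for Python 3.10 or 3.10.x version strings.
python_version_ok() {
  case "$1" in
    3.10|3.10.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Sample inputs for illustration. On a real host, pass the interpreter's own version:
#   python_version_ok "$(python3.10 -c 'import platform; print(platform.python_version())')"
for version in 3.10.12 3.11.4; do
  if python_version_ok "$version"; then
    echo "$version: supported for this setup"
  else
    echo "$version: not supported for this setup"
  fi
done
```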
- Install and Start OpenSearch:
- Navigate to your intended install directory. To do so, run the following command:
    cd <OpenSearch Directory>
- Download OpenSearch installer version 2.7.0 from the following site:
https://artifacts.opensearch.org/releases/bundle/opensearch/2.7.0/opensearch-2.7.0-linux-x64.tar.gz
- Extract the package and set permissions:
    chmod 777 opensearch-2.7.0-linux-x64.tar.gz
    tar -xvf opensearch-2.7.0-linux-x64.tar.gz
    rm -f opensearch-2.7.0-linux-x64.tar.gz
    chmod -R 755 opensearch-2.7.0/
    echo 'plugins.security.disabled: true' >> opensearch-2.7.0/config/opensearch.yml
- Start OpenSearch as a background process. To do so, run the following commands:
    cd opensearch-2.7.0/
    ./bin/opensearch -d -p pid
Note:
Ensure that OpenSearch is fully started and operational before starting OpenMetadata or any other dependent services.
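One way to act on this note is to poll OpenSearch until it answers before starting dependent services. The following is a minimal sketch, not part of the product: wait_until is a hypothetical helper that retries any command, and the curl probe in the comment assumes OpenSearch listens on http://localhost:9200 with the security plugin disabled, as configured above.

```shell
# Hypothetical helper: run a command up to <attempts> times, sleeping <delay>
# seconds after each failure. Returns 0 on the first success, 1 if all attempts fail.
wait_until() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Intended use (assumes curl is installed and the default port is unchanged):
#   wait_until 30 2 curl -fsS http://localhost:9200 || echo "OpenSearch did not respond" >&2

# Self-contained demonstration with commands that always succeed or always fail.
wait_until 3 0 true && echo "probe succeeded"
wait_until 2 0 false || echo "probe gave up after 2 attempts"
```

Keeping the probe as an argument means the same helper can wait on other services by swapping in a different command.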
- Create and update the set_env.env environment variable file in the installation directory.
    OMD_INS_DIR=<OPEN_METADATA_INSTALLATION_PATH>
    LOCAL_REPO_DIR=<OPEN_METADATA_INSTALLATION_PATH>/local_repo
    AIRFLOW_HOME=<OPEN_METADATA_INSTALLATION_PATH>/airflow
    PYTHON_INS_DIR=<PYTHON_INSTALLATION_PATH>
    PYTHON_VENV_DIR=$PYTHON_INS_DIR/venv
    MYSQL_DB_HOST=<DATABASE_HOST>
    MYSQL_DB_PORT=<DATABASE_PORT>
    HTTP_PROXY_URL=http://<PROXY_HOST>:80
    HTTPS_PROXY_URL=http://<PROXY_HOST>:80
    AUTHENTICATION_CLIENT_ID=<AUTHENTICATION_CLIENT_ID>
    AUTHENTICATION_AUTHORITY=<AUTHENTICATION_AUTHORITY>
Consider the following file as an example:
    OMD_INS_DIR=/scratch/openmetadata-ins-dir
    LOCAL_REPO_DIR=/scratch/openmetadata-ins-dir/local_repo
    AIRFLOW_HOME=/scratch/openmetadata-ins-dir/airflow
    PYTHON_INS_DIR=/scratch/openmetadata-ins-dir/python310
    PYTHON_VENV_DIR=$PYTHON_INS_DIR/venv
    MYSQL_DB_HOST=localhost
    MYSQL_DB_PORT=3306
Note:
Replace the example paths (such as /scratch/openmetadata-ins-dir) with your actual installation directories.
- Install OpenMetadata version 1.7.7. To do so:
- Create the OpenMetadata database and user if they were not already created in Step 6. Run the following commands:
    CREATE DATABASE IF NOT EXISTS openmetadata_db;
    CREATE USER IF NOT EXISTS 'openmetadata_user'@'%' IDENTIFIED BY '<OPENMETADATA_USER_PASSWORD>';
    GRANT ALL PRIVILEGES ON openmetadata_db.* TO 'openmetadata_user'@'%' WITH GRANT OPTION;
- Create a Working Directory.
    mkdir openmetadata
    cd openmetadata
- Download and extract OpenMetadata version 1.7.7:
    wget https://github.com/open-metadata/OpenMetadata/releases/download/1.7.7-release/openmetadata-1.7.7.tar.gz
    tar -xzf openmetadata-1.7.7.tar.gz
- Configure openmetadata.yaml to connect to the MySQL database. To do so:
- Edit the following file:
    <OM-INSTALL-DIR>/openmetadata-1.7.7/conf/openmetadata.yaml
- Update the MySQL database connection details:
    driverClass: ${DB_DRIVER_CLASS:-com.mysql.cj.jdbc.Driver}
    user: ${DB_USER:-<YOUR_DB_USER>}
    password: ${DB_USER_PASSWORD:-<YOUR_DB_PASSWORD>}
    url: jdbc:${DB_SCHEME:-mysql}://${DB_HOST:-<YOUR_DB_HOST>}:${DB_PORT:-<YOUR_DB_PORT>}/${OM_DATABASE:-<YOUR_DB_NAME>}?${DB_PARAMS:-allowPublicKeyRetrieval=true&useSSL=false&serverTimezone=UTC}
Note:
- <YOUR_DB_USER> – MySQL username with privileges for OpenMetadata
- <YOUR_DB_PASSWORD> – Password for the MySQL user
- <YOUR_DB_HOST> – Hostname or IP address of the MySQL server (for example, localhost)
- <YOUR_DB_PORT> – MySQL port number (default: 3306)
- <YOUR_DB_NAME> – Database name (default: openmetadata_db)
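The connection settings use a variable-with-default form in which an environment variable, when set, overrides the value written in the file. Bash offers the same ${VAR:-default} expansion, so the resulting JDBC URL can be previewed from the shell. The following is a minimal sketch that assumes the OpenMetadata server resolves these placeholders the way the shell does; preview_jdbc_url is a hypothetical helper, and the defaults are the example values from the list above.

```shell
# Hypothetical helper: print the JDBC URL the url template would yield, assuming
# the placeholders behave like shell ${VAR:-default} expansions.
preview_jdbc_url() {
  printf 'jdbc:%s://%s:%s/%s\n' \
    "${DB_SCHEME:-mysql}" \
    "${DB_HOST:-localhost}" \
    "${DB_PORT:-3306}" \
    "${OM_DATABASE:-openmetadata_db}"
}

# No overrides set: every field falls back to its default.
( unset DB_SCHEME DB_HOST DB_PORT OM_DATABASE; preview_jdbc_url )

# DB_SCHEME still set to the Airflow value from the earlier steps: the scheme changes.
( unset DB_HOST DB_PORT OM_DATABASE; DB_SCHEME="mysql+pymysql"; preview_jdbc_url )
```

The second call illustrates a possible pitfall: DB_SCHEME and DB_USER were exported for Airflow earlier in this guide, so if OpenMetadata is started from the same shell, those values may override the ones intended here. Unsetting them or starting from a fresh shell avoids the clash.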
- Set OpenSearch as the search type. To do so, run the following commands:
    cd <OM-INSTALL-DIR>/openmetadata-1.7.7
    sed -i 's#\(searchType: ${SEARCH_TYPE:-\).*#\1opensearch}#' conf/openmetadata.yaml
- Prepare the OpenMetadata Database and Indexes. To do so, run the following commands:
    cd <OM-INSTALL-DIR>/openmetadata-1.7.7
    ./bootstrap/openmetadata-ops.sh drop-create
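The sed substitution that sets the search type can be tried on a sample line before it is applied to the real file. The following is a minimal sketch: set_search_type is a hypothetical wrapper, and the sample line is an assumption about how the setting looks, so the actual default in conf/openmetadata.yaml may differ.

```shell
# Assumed sample line; the real default in conf/openmetadata.yaml may differ.
sample='  searchType: ${SEARCH_TYPE:- "elasticsearch"}'

# Same substitution as in the step above, applied to stdin instead of the file.
set_search_type() {
  sed 's#\(searchType: ${SEARCH_TYPE:-\).*#\1opensearch}#'
}

# Show the line before and after the substitution.
printf '%s\n' "$sample"
printf '%s\n' "$sample" | set_search_type
```

Running grep searchType conf/openmetadata.yaml after the real edit is a quick way to confirm that the file changed as intended.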
- Start Airflow, OpenSearch, and OpenMetadata services.
Use the following sample script to start Airflow, OpenSearch, and OpenMetadata services. Replace all placeholder values with values applicable to your environment before running the script.
Note:
Ensure that the Python virtual environment uses Python 3.10.x. Do not use Python 3.11 or later for this OpenMetadata/Airflow setup.
    #!/bin/bash
    set -euo pipefail

    ###############################################################################
    # OpenMetadata, Airflow, and OpenSearch Startup Script
    #
    # Replace the placeholder values in the "Customer-specific configuration" section
    # before running this script.
    ###############################################################################

    ###############################################################################
    # Customer-specific configuration
    ###############################################################################
    OMD_INS_DIR="<OPENMETADATA_INSTALLATION_DIRECTORY>"
    PYTHON_INS_DIR="<PYTHON_3_10_INSTALLATION_DIRECTORY>"
    PYTHON_VENV_DIR="<PYTHON_VIRTUAL_ENVIRONMENT_DIRECTORY>"
    AIRFLOW_INS_DIR="<AIRFLOW_HOME_DIRECTORY>"
    OPENSEARCH_DIR="<OPENSEARCH_INSTALLATION_DIRECTORY>"
    OPENMETADATA_DIR="<OPENMETADATA_INSTALLATION_DIRECTORY>/<OPENMETADATA_VERSION_DIRECTORY>"
    JAVA_HOME="<JAVA_17_INSTALLATION_DIRECTORY>"

    AIRFLOW_ADMIN_USER="<AIRFLOW_ADMIN_USER>"
    AIRFLOW_ADMIN_PASSWORD="<AIRFLOW_ADMIN_PASSWORD>"

    ###############################################################################
    # Example configuration
    ###############################################################################
    # OMD_INS_DIR="/scratch/openmetadata-ins-dir"
    # PYTHON_INS_DIR="/scratch/openmetadata-ins-dir/python310"
    # PYTHON_VENV_DIR="/scratch/openmetadata-ins-dir/python310/venv"
    # AIRFLOW_INS_DIR="/scratch/openmetadata-ins-dir/airflow"
    # OPENSEARCH_DIR="/scratch/openmetadata-ins-dir/opensearch-2.7.0"
    # OPENMETADATA_DIR="/scratch/openmetadata-ins-dir/openmetadata-1.7.7"
    # JAVA_HOME="/scratch/openmetadata-ins-dir/jdk-17.0.2"
    #
    # AIRFLOW_ADMIN_USER="admin"
    # AIRFLOW_ADMIN_PASSWORD="admin"

    ###############################################################################
    # Environment setup
    ###############################################################################
    export JAVA_HOME
    export PATH="$JAVA_HOME/bin:$PYTHON_INS_DIR/bin:$PATH"
    export AIRFLOW_HOME="$AIRFLOW_INS_DIR"

    echo "Using OMD_INS_DIR=$OMD_INS_DIR"
    echo "Using PYTHON_INS_DIR=$PYTHON_INS_DIR"
    echo "Using PYTHON_VENV_DIR=$PYTHON_VENV_DIR"
    echo "Using AIRFLOW_HOME=$AIRFLOW_HOME"
    echo "Using OPENSEARCH_DIR=$OPENSEARCH_DIR"
    echo "Using OPENMETADATA_DIR=$OPENMETADATA_DIR"
    echo "Using JAVA_HOME=$JAVA_HOME"

    echo "Checking Java version..."
    java -version

    echo "Checking Python version..."
    python3 --version

    echo "Activating Python virtual environment..."
    source "$PYTHON_VENV_DIR/bin/activate"

    echo "Validating Python virtual environment version..."
    python -c "import sys; exit(0 if sys.version_info[:2] == (3, 10) else 1)" || {
        echo "Python 3.10.x is required for this OpenMetadata/Airflow setup. Do not use Python 3.11 or later."
        exit 1
    }

    echo "Checking Airflow version..."
    airflow version

    ###############################################################################
    # Start Airflow
    ###############################################################################
    echo "Starting Airflow standalone..."
    cd "$OMD_INS_DIR"
    airflow standalone >> "$OMD_INS_DIR/airflow.log" 2>&1 &
    echo $! > "$OMD_INS_DIR/airflow.pid"
    echo "Airflow PID: $(cat "$OMD_INS_DIR/airflow.pid")"
    deactivate

    ###############################################################################
    # Start OpenSearch
    ###############################################################################
    echo "Starting OpenSearch..."
    cd "$OPENSEARCH_DIR"
    ./bin/opensearch -d -p pid
    echo "OpenSearch PID file: $OPENSEARCH_DIR/pid"

    ###############################################################################
    # Start OpenMetadata
    ###############################################################################
    echo "Starting OpenMetadata..."
    cd "$OPENMETADATA_DIR"
    ./bin/openmetadata-server-start.sh conf/openmetadata.yaml >> "$OPENMETADATA_DIR/output.log" 2>&1 &
    echo $! > "$OPENMETADATA_DIR/output.pid"
    echo "OpenMetadata PID: $(cat "$OPENMETADATA_DIR/output.pid")"

    ###############################################################################
    # Validation
    ###############################################################################
    echo "Startup commands completed."
    echo "Review logs if services are not reachable:"
    echo " Airflow: $OMD_INS_DIR/airflow.log"
    echo " OpenSearch: $OPENSEARCH_DIR/logs/"
    echo " OpenMetadata: $OPENMETADATA_DIR/output.log"
    echo "Validation commands:"
    echo " curl -i -u ${AIRFLOW_ADMIN_USER}:${AIRFLOW_ADMIN_PASSWORD} http://localhost:8080/api/v1/dags"
    echo " curl -i http://localhost:9200"
    echo " curl -i http://localhost:8585/api/v1/system/version"
- Stop Airflow, OpenSearch, and OpenMetadata services.
Use the following sample script to stop Airflow, OpenSearch, and OpenMetadata services. Replace all placeholder values with values applicable to your environment before running the script.
    #!/bin/bash
    set -euo pipefail

    ###############################################################################
    # OpenMetadata, Airflow, and OpenSearch Shutdown Script
    #
    # Replace the placeholder values in the "Customer-specific configuration" section
    # before running this script.
    ###############################################################################

    ###############################################################################
    # Customer-specific configuration
    ###############################################################################
    OMD_INS_DIR="<OPENMETADATA_INSTALLATION_DIRECTORY>"
    PYTHON_INS_DIR="<PYTHON_3_10_INSTALLATION_DIRECTORY>"
    PYTHON_VENV_DIR="<PYTHON_VIRTUAL_ENVIRONMENT_DIRECTORY>"
    OPENSEARCH_DIR="<OPENSEARCH_INSTALLATION_DIRECTORY>"
    OPENMETADATA_DIR="<OPENMETADATA_INSTALLATION_DIRECTORY>/<OPENMETADATA_VERSION_DIRECTORY>"

    ###############################################################################
    # Example configuration
    ###############################################################################
    # OMD_INS_DIR="/scratch/openmetadata-ins-dir"
    # PYTHON_INS_DIR="/scratch/openmetadata-ins-dir/python310"
    # PYTHON_VENV_DIR="/scratch/openmetadata-ins-dir/python310/venv"
    # OPENSEARCH_DIR="/scratch/openmetadata-ins-dir/opensearch-2.7.0"
    # OPENMETADATA_DIR="/scratch/openmetadata-ins-dir/openmetadata-1.7.7"

    ###############################################################################
    # Helper functions
    ###############################################################################
    stop_pid() {
        local name="$1"
        local pid="$2"
        if [ -z "$pid" ]; then
            return 0
        fi
        if kill -0 "$pid" 2>/dev/null; then
            echo "Stopping $name PID $pid"
            kill -TERM "$pid" 2>/dev/null || true
        else
            echo "$name PID $pid is not running."
        fi
    }

    stop_matching_processes() {
        local pattern="$1"
        local description="$2"
        local pids
        pids=$(pgrep -f "$pattern" || true)
        if [ -n "$pids" ]; then
            echo "Stopping $description:"
            echo "$pids"
            echo "$pids" | xargs -r kill -TERM 2>/dev/null || true
        else
            echo "No running process found for $description"
        fi
    }

    force_kill_matching_processes() {
        local pattern="$1"
        local description="$2"
        local pids
        pids=$(pgrep -f "$pattern" || true)
        if [ -n "$pids" ]; then
            echo "Force stopping $description:"
            echo "$pids"
            echo "$pids" | xargs -r kill -KILL 2>/dev/null || true
        fi
    }

    ###############################################################################
    # Stop Airflow
    ###############################################################################
    echo "Stopping Airflow..."
    if [ -f "$OMD_INS_DIR/airflow.pid" ]; then
        AIRFLOW_PID=$(cat "$OMD_INS_DIR/airflow.pid")
        stop_pid "Airflow startup PID" "$AIRFLOW_PID"
    else
        echo "Airflow PID file not found at $OMD_INS_DIR/airflow.pid"
    fi

    stop_matching_processes "$PYTHON_VENV_DIR/bin/airflow webserver|airflow webserver" "Airflow webserver"
    stop_matching_processes "$PYTHON_VENV_DIR/bin/airflow triggerer|airflow triggerer" "Airflow triggerer"
    stop_matching_processes "airflow executor -- LocalExecutor" "Airflow LocalExecutor executor"
    stop_matching_processes "airflow worker -- LocalExecutor" "Airflow LocalExecutor workers"
    stop_matching_processes "airflow scheduler -- DagFileProcessorManager" "Airflow DAG file processor manager"
    stop_matching_processes "gunicorn.*airflow-webserver|gunicorn: worker \[airflow-webserver\]" "Airflow gunicorn webserver workers"
    stop_matching_processes "airflow standalone" "Airflow standalone"

    sleep 8

    force_kill_matching_processes "$PYTHON_VENV_DIR/bin/airflow webserver|airflow webserver" "remaining Airflow webserver"
    force_kill_matching_processes "$PYTHON_VENV_DIR/bin/airflow triggerer|airflow triggerer" "remaining Airflow triggerer"
    force_kill_matching_processes "airflow executor -- LocalExecutor" "remaining Airflow LocalExecutor executor"
    force_kill_matching_processes "airflow worker -- LocalExecutor" "remaining Airflow LocalExecutor workers"
    force_kill_matching_processes "airflow scheduler -- DagFileProcessorManager" "remaining Airflow DAG file processor manager"
    force_kill_matching_processes "gunicorn.*airflow-webserver|gunicorn: worker \[airflow-webserver\]" "remaining Airflow gunicorn webserver workers"
    force_kill_matching_processes "airflow standalone" "remaining Airflow standalone"

    rm -f "$OMD_INS_DIR/airflow.pid"

    ###############################################################################
    # Stop OpenMetadata
    ###############################################################################
    echo "Stopping OpenMetadata..."
    if [ -f "$OPENMETADATA_DIR/output.pid" ]; then
        OM_PID=$(cat "$OPENMETADATA_DIR/output.pid")
        stop_pid "OpenMetadata" "$OM_PID"
    else
        echo "OpenMetadata PID file not found at $OPENMETADATA_DIR/output.pid"
    fi

    stop_matching_processes "OpenMetadataApplication" "OpenMetadata application"
    sleep 5
    force_kill_matching_processes "OpenMetadataApplication" "remaining OpenMetadata application"
    rm -f "$OPENMETADATA_DIR/output.pid"

    ###############################################################################
    # Stop OpenSearch
    ###############################################################################
    echo "Stopping OpenSearch..."
    if [ -f "$OPENSEARCH_DIR/pid" ]; then
        OS_PID=$(cat "$OPENSEARCH_DIR/pid")
        stop_pid "OpenSearch" "$OS_PID"
    else
        echo "OpenSearch PID file not found at $OPENSEARCH_DIR/pid"
    fi

    stop_matching_processes "OpenSearch" "OpenSearch"
    sleep 5
    force_kill_matching_processes "OpenSearch" "remaining OpenSearch"
    rm -f "$OPENSEARCH_DIR/pid"

    ###############################################################################
    # Validation
    ###############################################################################
    echo "Shutdown commands completed."
    echo "Validation commands:"
    echo " ps -ef | grep -E 'airflow|gunicorn|OpenMetadataApplication|OpenSearch' | grep -v grep"
    echo " ss -lntp | grep -E '8080|8585|9200'"
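The port check that the shutdown script prints can be turned into a small filter that reports which service ports still have a listener. The following is a minimal sketch, not part of the product: ports_in_use is a hypothetical helper, it assumes ss -lnt style output in which the local address ends in :PORT followed by whitespace, and the sample text stands in for real ss output.

```shell
# Hypothetical helper: read `ss -lnt` style output on stdin and print each of the
# given ports that still appears as ":PORT" followed by whitespace.
ports_in_use() {
  local input port
  input="$(cat)"
  for port in "$@"; do
    if printf '%s\n' "$input" | grep -Eq ":${port}[[:space:]]"; then
      printf '%s\n' "$port"
    fi
  done
}

# Sample text for illustration. On a real host, pipe in the live output instead:
#   ss -lnt | ports_in_use 8080 8585 9200
sample_output='LISTEN 0 128 0.0.0.0:8080 0.0.0.0:*
LISTEN 0 128 0.0.0.0:22 0.0.0.0:*'

printf '%s\n' "$sample_output" | ports_in_use 8080 8585 9200
```

An empty result for all three ports suggests that Airflow (8080), OpenMetadata (8585), and OpenSearch (9200) have released their listeners.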