4.1 Installing OpenMetadata
Prerequisites
Install the following libraries and software applications before installing OpenMetadata (OM):
- Oracle Linux version 8.
- Linux libraries: "Development Tools" and gcc gcc-c++ sqlite-devel python39-devel cyrus-sasl-devel bzip2-devel libffi libffi-devel openssl-devel mysql mysql-devel
- MySQL version 8.0.3.2
- JDK version 17
- Python version 3.10 or greater
- Create the following databases in MySQL:
CREATE DATABASE openmetadata_db;
CREATE DATABASE airflow_db CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
CREATE USER 'openmetadata_user'@'%' IDENTIFIED BY 'openmetadata_password';
CREATE USER 'airflow_user'@'%' IDENTIFIED BY 'airflow_pass';
CREATE USER 'airflow_user'@'localhost' IDENTIFIED BY 'airflow_pass';
GRANT ALL PRIVILEGES ON openmetadata_db.* TO 'openmetadata_user'@'%' WITH GRANT OPTION;
GRANT ALL PRIVILEGES ON airflow_db.* TO 'airflow_user'@'%' WITH GRANT OPTION;
GRANT ALL PRIVILEGES ON airflow_db.* TO 'airflow_user'@'localhost' WITH GRANT OPTION;
commit;
- Install Apache Airflow version 2.9.3. To do so:
- (Optional) Set up a proxy, if your network requires it:
export http_proxy=<YOUR_PROXY_URL>
export https_proxy=<YOUR_PROXY_URL>
Note:
Skip this step if a proxy is not required.
- Define the Airflow installation settings and environment variables. To do so:
- Specify the installation directory (for example, /your/airflow/install/dir).
- Identify the MySQL host and port (for example, localhost, 3306).
- Provide the Airflow database name, database user, and password (for example, airflow_db, airflow_user, airflow_pass).
- Define the Airflow administrator username, password, and email address.
Example exports (replace the placeholders with values specific to your environment):
export INSTALL_DIR=<YOUR_INSTALL_DIR>
export AIRFLOW_HOME=$INSTALL_DIR/airflow
export MYSQL_DB_HOST=<YOUR_MYSQL_DB_HOST>
export MYSQL_DB_PORT=<YOUR_MYSQL_DB_PORT>
- Create the required Airflow directories. Run the following commands:
mkdir -p "$AIRFLOW_HOME"
chmod -R 755 "$AIRFLOW_HOME"
- Create and activate a Python virtual environment. Run the following commands:
cd "$INSTALL_DIR"
python3 -m venv venv
source venv/bin/activate
- Upgrade pip to the latest version. Run the following command:
python3 -m pip install --upgrade pip
- Install the required dependencies. Adjust the versions as required to align with your compatibility matrix:
pip install "openmetadata-managed-apis~=1.7.7"
pip install "openmetadata-ingestion[all]~=1.7.7"
pip install "apache-airflow==2.9.3"
pip install "python-daemon>=3.0.0"
- (Optional) Remove unneeded Apache Airflow providers. Run the following command:
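pip freeze | grep "apache-airflow-providers" | grep -Ev "docker|http" | xargs pip uninstall -y
Note:
Run this step only if you want to prune unused providers. The -E option makes grep treat "docker|http" as an alternation, so the docker and http providers are kept.
- Configure the Airflow database connection. To do so: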
- Set the required environment variables:
export AIRFLOW_DB=<YOUR_DB_NAME>
export DB_USER=<YOUR_DB_USER>
export DB_PASSWORD=<YOUR_DB_PASSWORD>
export DB_SCHEME=mysql+pymysql
export AIRFLOW_ADMIN_USER=<ADMIN_USERNAME>
export AIRFLOW_ADMIN_PASSWORD=<ADMIN_PASSWORD>
export AIRFLOW_ADMIN_EMAIL=<ADMIN_EMAIL>
- Build and export the SQLAlchemy database connection string. Run the following command:
export AIRFLOW__DATABASE__SQL_ALCHEMY_CONN="mysql+pymysql://${DB_USER}:${DB_PASSWORD}@${MYSQL_DB_HOST}:${MYSQL_DB_PORT}/${AIRFLOW_DB}"
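The connection string can also be assembled by a small helper that fails early when one of the variables is missing. The following is a minimal sketch, not part of the Airflow or OpenMetadata tooling: the function name build_airflow_conn is hypothetical, and it assumes the variables from the previous steps have been exported.

```shell
# Hypothetical helper: prints the SQLAlchemy URL built from the variables
# exported in the previous steps, or fails if one of them is unset or empty.
build_airflow_conn() {
    for name in DB_SCHEME DB_USER DB_PASSWORD MYSQL_DB_HOST MYSQL_DB_PORT AIRFLOW_DB; do
        # printenv only sees exported variables, which is what Airflow will see too.
        if [ -z "$(printenv "$name")" ]; then
            echo "Missing required variable: $name" >&2
            return 1
        fi
    done
    printf '%s://%s:%s@%s:%s/%s\n' \
        "$DB_SCHEME" "$DB_USER" "$DB_PASSWORD" \
        "$MYSQL_DB_HOST" "$MYSQL_DB_PORT" "$AIRFLOW_DB"
}

# Example usage:
#   conn="$(build_airflow_conn)" && export AIRFLOW__DATABASE__SQL_ALCHEMY_CONN="$conn"
```

A password that contains characters such as @ or / would need to be percent-encoded before it is placed in the URL; this sketch does not do that.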
- Update airflow.cfg by running the following commands:
sed -i "s#\(sql_alchemy_conn = \).*#\1${AIRFLOW__DATABASE__SQL_ALCHEMY_CONN}#" "$AIRFLOW_HOME/airflow.cfg"
sed -i "s#\(hostname_callable = \).*#\1socket.gethostname#" "$AIRFLOW_HOME/airflow.cfg"
sed -i "s#\(auth_backends = \).*#\1airflow.api.auth.backend.basic_auth,airflow.api.auth.backend.session#" "$AIRFLOW_HOME/airflow.cfg"
sed -i "s#\(executor = \).*#\1LocalExecutor#" "$AIRFLOW_HOME/airflow.cfg"
- Initialize the Airflow database. Run the following command:
airflow db init
- Create an administrator account. To do so:
- Create the Airflow administrator user by running the following command:
airflow users create \
    --username "$AIRFLOW_ADMIN_USER" \
    --firstname <ADMIN_FIRSTNAME> \
    --lastname <ADMIN_LASTNAME> \
    --role Admin \
    --email "$AIRFLOW_ADMIN_EMAIL" \
    --password "$AIRFLOW_ADMIN_PASSWORD"
- (Optional) Run the following command to apply any pending database migrations:
airflow db migrate
- Start Apache Airflow. To do so:
- Start Airflow in standalone mode:
airflow standalone
- To run Airflow in the background, use the following command:
airflow standalone >> ./airflow.log 2>&1 &
- Alternatively, start the webserver and scheduler as separate processes:
airflow webserver &
airflow scheduler &
Note:
- Replace all <...> placeholders with values specific to your environment.
- Ensure that Python 3.10 and pip are installed, a MySQL instance is running and accessible, and access to PyPI and GitHub is available.
- Follow the steps in order, providing the necessary values at each point.
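The Python and pip prerequisites named in the note can be verified before the installation starts. The following is a minimal sketch under stated assumptions: the helper names version_ge and check_python_prereqs are hypothetical, and version_ge relies on the version sort (sort -V) provided by GNU coreutils.

```shell
# Hypothetical helper: succeeds when version $1 is greater than or equal to $2.
# Uses `sort -V` (version sort) and checks that $2 sorts first.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: checks that python3 is at least 3.10 and that pip is usable.
check_python_prereqs() {
    command -v python3 >/dev/null 2>&1 || { echo "python3 not found" >&2; return 1; }
    py_version="$(python3 -c 'import platform; print(platform.python_version())')"
    version_ge "$py_version" "3.10" || { echo "Python $py_version is older than 3.10" >&2; return 1; }
    python3 -m pip --version >/dev/null 2>&1 || { echo "pip is not available" >&2; return 1; }
    echo "Python $py_version and pip found"
}

# Example usage:
#   check_python_prereqs || exit 1
```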
- Install and Start OpenSearch:
- Navigate to your intended install directory. To do so, run the following command:
cd <OpenSearch Directory>
- Download OpenSearch installer version 2.7.0 from the following site:
https://artifacts.opensearch.org/releases/bundle/opensearch/2.7.0/opensearch-2.7.0-linux-x64.tar.gz
- Extract the downloaded release and set permissions:
chmod 777 opensearch-2.7.0-linux-x64.tar.gz
tar -xvf opensearch-2.7.0-linux-x64.tar.gz
rm -f opensearch-2.7.0-linux-x64.tar.gz
chmod -R 755 opensearch-2.7.0/
echo 'plugins.security.disabled: true' >> opensearch-2.7.0/config/opensearch.yml
- Start OpenSearch as a background process. To do so, run the following commands:
cd opensearch-2.7.0/
./bin/opensearch -d -p pid
Note:
Ensure that OpenSearch is fully started and operational before proceeding with the startup of OpenMetadata or any other dependent services.
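One way to confirm that OpenSearch is up is to poll its HTTP endpoint until it answers. The following is a minimal sketch, assuming curl is installed and that OpenSearch listens on its default HTTP port 9200 on the local host (plain HTTP, because the security plugin was disabled above). The function name wait_for_url is hypothetical.

```shell
# Hypothetical helper: polls a URL until it answers or the attempts run out.
# $1 = URL, $2 = maximum attempts (default 30), $3 = seconds between attempts (default 2)
wait_for_url() {
    url="$1"
    attempts="${2:-30}"
    delay="${3:-2}"
    i=1
    while [ "$i" -le "$attempts" ]; do
        # --fail makes curl return non-zero on HTTP error responses as well.
        if curl --silent --fail --output /dev/null "$url"; then
            echo "Service at $url is responding"
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    echo "Service at $url did not respond after $attempts attempts" >&2
    return 1
}

# Example usage (host and port are assumptions; adjust to your environment):
#   wait_for_url "http://localhost:9200" 30 2
```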
- Create the set_env.env environment variable file and update it with the locations used in the installation directory. Provide the following details for each variable in the file:
OMD_INS_DIR=<OPEN METADATA INSTALLATION PATH>
LOCAL_REPO_DIR=<OPEN METADATA INSTALLATION PATH>/local_repo
AIRFLOW_HOME=<OPEN METADATA INSTALLATION PATH>/airflow
PYTHON_VENV_PATH=<PYTHON PATH>
HTTP_PROXY_URL=http://<PROXY HOST>:80
HTTPS_PROXY_URL=http://<PROXY HOST>:80
MYSQL_DB_HOST=<DATABASE HOST>
MYSQL_DB_PORT=<DATABASE PORT>
AUTHENTICATION_CLIENT_ID=<AUTHENTICATION_CLIENT_ID>
AUTHENTICATION_AUTHORITY=<AUTHENTICATION_AUTHORITY>
Consider the following file as an example:
OMD_INS_DIR=/scratch/openmetadata-ins-dir
LOCAL_REPO_DIR=/scratch/openmetadata-ins-dir/local_repo
AIRFLOW_HOME=/scratch/openmetadata-ins-dir/airflow
PYTHON_INS_DIR=/scratch/openmetadata-ins-dir/python39
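Once the file contains real values, it can be loaded into the current shell and checked for missing entries. The following is a minimal sketch: the function name load_env_file is hypothetical, and it assumes every placeholder has been replaced, because a line that still contains a <...> placeholder is not valid shell syntax.

```shell
# Hypothetical helper: sources an env file and reports any listed variable that is empty.
# $1 = path to the env file (include a slash, for example ./set_env.env)
# Remaining arguments = names of variables that must be set.
load_env_file() {
    env_file="$1"
    shift
    [ -r "$env_file" ] || { echo "Cannot read $env_file" >&2; return 1; }
    set -a              # export every variable assigned while the file is sourced
    . "$env_file"
    set +a
    missing=0
    for name in "$@"; do
        if [ -z "$(printenv "$name")" ]; then
            echo "Variable $name is empty or missing in $env_file" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example usage:
#   load_env_file ./set_env.env OMD_INS_DIR LOCAL_REPO_DIR AIRFLOW_HOME
```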
- Install OpenMetadata v1.7.7. To do so:
Note:
Replace the example paths (such as /scratch/openmetadata-ins-dir) with your actual installation directories throughout the steps below.
- Create the OpenMetadata database (MySQL). To do so:
Note:
This step is typically completed as part of Step 6 in this section. If not already performed, complete it before proceeding.
Connect to your MySQL server and run the following commands. MySQL 8.0 no longer accepts IDENTIFIED BY inside a GRANT statement, so the user is created first:
CREATE DATABASE openmetadata_db;
CREATE USER 'openmetadata_user'@'%' IDENTIFIED BY '<YOUR_PASSWORD>';
GRANT ALL PRIVILEGES ON openmetadata_db.* TO 'openmetadata_user'@'%' WITH GRANT OPTION;
Note:
Replace <YOUR_PASSWORD> with your chosen password.
- Create a working directory. Choose a directory location for the installation, then run the following commands:
mkdir openmetadata
cd openmetadata
- Download and extract OpenMetadata. To do so:
- In the working directory created in the previous step, download the OpenMetadata 1.7.7 installation package and extract its contents.
- Run the following command:
wget https://github.com/open-metadata/OpenMetadata/releases/download/1.7.7-release/openmetadata-1.7.7.tar.gz
tar -xzf openmetadata-1.7.7.tar.gz
- Configure openmetadata.yaml to connect to the MySQL database. To do so:
- Edit the following file:
<OM Working Directory>/conf/openmetadata.yaml
- Update the following placeholders for the MySQL database connection to match your environment:
driverClass: ${DB_DRIVER_CLASS:-com.mysql.cj.jdbc.Driver}
user: ${DB_USER:-<YOUR_DB_USER>} # e.g., openmetadata_user
password: ${DB_USER_PASSWORD:-<YOUR_DB_PASSWORD>} # e.g., strongpassword
url: jdbc:${DB_SCHEME:-mysql}://${DB_HOST:-<YOUR_DB_HOST>}:${DB_PORT:-<YOUR_DB_PORT>}/${OM_DATABASE:-<YOUR_DB_NAME>}?${DB_PARAMS:-allowPublicKeyRetrieval=true&useSSL=false&serverTimezone=UTC}
Note:
- <YOUR_DB_USER> – MySQL username with privileges for OpenMetadata
- <YOUR_DB_PASSWORD> – Password for the MySQL user
- <YOUR_DB_HOST> – Hostname or IP address of the MySQL server (for example, localhost)
- <YOUR_DB_PORT> – MySQL port number (default: 3306)
- <YOUR_DB_NAME> – Database name (default: openmetadata_db)
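The ${VAR:-default} form in the file suggests that each value is taken from an environment variable when one is set, and from the default otherwise. Under that assumption, the resulting JDBC URL can be previewed in the shell before the server is started. The following is a minimal sketch: the function name preview_jdbc_url is hypothetical, and the fallback values are the examples and defaults from the placeholder list above.

```shell
# Hypothetical helper: prints the JDBC URL that results from the current DB_*
# variables, falling back to the example values from the placeholder list.
preview_jdbc_url() {
    printf 'jdbc:%s://%s:%s/%s?%s\n' \
        "${DB_SCHEME:-mysql}" \
        "${DB_HOST:-localhost}" \
        "${DB_PORT:-3306}" \
        "${OM_DATABASE:-openmetadata_db}" \
        "${DB_PARAMS:-allowPublicKeyRetrieval=true&useSSL=false&serverTimezone=UTC}"
}

# Example usage:
#   DB_HOST=<YOUR_DB_HOST> DB_PORT=<YOUR_DB_PORT> preview_jdbc_url
```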
- Set OpenSearch as the search type. To do so, run the following command:
sed -i 's#\(searchType: ${SEARCH_TYPE:[^"]*"\).*#\1opensearch"}#' conf/openmetadata.yaml
- Prepare the OpenMetadata database and indexes. To do so, run the following commands:
cd <OM Working Directory>
./bootstrap/openmetadata-ops.sh drop-create
Example:
cd /scratch/openmetadata-ins-dir/openmetadata-1.7.7 # <== Example; use your chosen path
./bootstrap/openmetadata-ops.sh drop-create
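As its name suggests, drop-create appears to drop existing tables before recreating them, so a confirmation prompt can guard against running it by accident on a populated database. The following is a minimal sketch; the function name confirm_then_run is hypothetical and is not part of the OpenMetadata scripts.

```shell
# Hypothetical helper: runs the given command only after the operator types "yes".
confirm_then_run() {
    printf 'This may delete existing OpenMetadata data. Type "yes" to continue: '
    read -r answer
    if [ "$answer" = "yes" ]; then
        "$@"
    else
        echo "Aborted" >&2
        return 1
    fi
}

# Example usage:
#   confirm_then_run ./bootstrap/openmetadata-ops.sh drop-create
```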
- Start the Airflow, OpenSearch, and OpenMetadata services. To do so:
Note:
Replace all example paths shown below (for example, /scratch/openmetadata-ins-dir) with the base installation directory specific to your environment.
- Activate the Python virtual environment by running the following commands:
export OMD_INS_DIR=<your_installation_dir>
export PYTHON_INS_DIR=$OMD_INS_DIR/python310 # Python installation directory
export PYTHON_VENV_DIR=$PYTHON_INS_DIR/venv
export AIRFLOW_INS_DIR=$OMD_INS_DIR/airflow
export JAVA_HOME=$OMD_INS_DIR/jdk-17.0.2
export PATH="$JAVA_HOME/bin:$PATH"
java -version
export PATH="$PYTHON_INS_DIR/bin:$PATH"
python3 --version
source $PYTHON_VENV_DIR/bin/activate
- Start Airflow by running the following commands:
export AIRFLOW_HOME=$AIRFLOW_INS_DIR
airflow version
cd $OMD_INS_DIR
airflow standalone >> ./airflow.log 2>&1 &
echo $! > airflow.pid
- Deactivate the Python virtual environment by running the following command:
deactivate
- Start the OpenSearch service by running the following commands:
cd $OMD_INS_DIR/opensearch-2.7.0/
./bin/opensearch -d -p pid
- Start the OpenMetadata service by running the following commands:
cd $OMD_INS_DIR/openmetadata-1.7.7/
./bin/openmetadata-server-start.sh conf/openmetadata.yaml >> output.log 2>&1 &
echo $! > output.pid
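The PID files written by the start commands (airflow.pid, output.pid, and the OpenSearch pid file) can later be used to stop the services. The following is a minimal sketch; the function name stop_from_pidfile is hypothetical, and whether stopping the recorded PID also stops any child processes depends on how each launcher starts them.

```shell
# Hypothetical helper: stops the process whose PID is stored in a file.
# $1 = path to the PID file (for example airflow.pid or output.pid)
stop_from_pidfile() {
    pidfile="$1"
    [ -r "$pidfile" ] || { echo "No PID file at $pidfile" >&2; return 1; }
    pid="$(cat "$pidfile")"
    if kill -0 "$pid" 2>/dev/null; then
        kill "$pid"     # sends SIGTERM so the process can shut down cleanly
        echo "Sent SIGTERM to process $pid"
    else
        echo "Process $pid is not running"
    fi
    rm -f "$pidfile"
}

# Example usage:
#   stop_from_pidfile "$OMD_INS_DIR/openmetadata-1.7.7/output.pid"
#   stop_from_pidfile "$OMD_INS_DIR/airflow.pid"
```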
Note:
- Ensure that Airflow and OpenSearch are fully started and running successfully before launching OpenMetadata.
- Run all commands as a user with appropriate permissions to access configuration files and log directories.
- Review output.log, airflow.log, and the OpenSearch logs for any startup errors or warnings.
- If an error occurs, verify all configuration paths, environment variable values, and service dependencies.
- If custom installation directories are used, ensure that all references to those paths are consistently updated across configuration files, environment variables, and command executions.
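A quick first pass over the logs is to count the lines that mention an error or exception. The following is a minimal sketch: the function name count_log_errors is hypothetical, and the search pattern is only an assumption about what a problem looks like in these logs, so a count of zero does not prove a clean startup.

```shell
# Hypothetical helper: prints how many lines in a log file mention an error or exception.
# $1 = path to the log file
count_log_errors() {
    logfile="$1"
    [ -r "$logfile" ] || { echo "Cannot read $logfile" >&2; return 1; }
    # grep -c prints the count; it exits non-zero when the count is 0, so mask that.
    grep -ciE 'error|exception' "$logfile" || true
}

# Example usage:
#   count_log_errors "$OMD_INS_DIR/airflow.log"
#   count_log_errors "$OMD_INS_DIR/openmetadata-1.7.7/output.log"
```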