4.1 Installing OpenMetadata
Prerequisites
Install the following libraries and software applications before installing OpenMetadata (OM):
- Oracle Linux version 8.
- Linux libraries: "Development Tools" and gcc gcc-c++ sqlite-devel python39-devel cyrus-sasl-devel bzip2-devel libffi libffi-devel openssl-devel mysql mysql-devel
- MySQL version 8.0.3.2
- JDK version 17
- Python version 3.10
- Create the following databases in MySQL:
CREATE DATABASE openmetadata_db;
CREATE DATABASE airflow_db CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
CREATE USER 'openmetadata_user'@'%' IDENTIFIED BY 'openmetadata_password';
CREATE USER 'airflow_user'@'%' IDENTIFIED BY 'airflow_pass';
CREATE USER 'airflow_user'@'localhost' IDENTIFIED BY 'airflow_pass';
GRANT ALL PRIVILEGES ON openmetadata_db.* TO 'openmetadata_user'@'%' WITH GRANT OPTION;
GRANT ALL PRIVILEGES ON airflow_db.* TO 'airflow_user'@'%' WITH GRANT OPTION;
GRANT ALL PRIVILEGES ON airflow_db.* TO 'airflow_user'@'localhost' WITH GRANT OPTION;
commit;
- Install Apache Airflow version 2.8.4. To do so:
- (Optional) Set up a proxy, if your network requires it:
export http_proxy=<YOUR_PROXY_URL>
export https_proxy=<YOUR_PROXY_URL>
Note: Skip this step if a proxy is not required.
- Define the Airflow installation settings and environment variables. To do so:
- Specify the installation directory (for example, /your/airflow/install/dir).
- Identify the MySQL host and port (for example, localhost, 3306).
- Provide the Airflow database name, database user, and password (for example, airflow_db, airflow_user, airflow_pass).
- Define the Airflow administrator username, password, and email address.
Example exports (replace the placeholders with values specific to your environment):
export INSTALL_DIR=<YOUR_INSTALL_DIR>
export AIRFLOW_HOME=$INSTALL_DIR/airflow
export MYSQL_DB_HOST=<YOUR_MYSQL_DB_HOST>
export MYSQL_DB_PORT=<YOUR_MYSQL_DB_PORT>
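The later steps depend on these variables being set in the current shell session. The following is a minimal sketch of a guard that reports any that are unset or empty, assuming a POSIX-compatible shell. The helper name require_vars is hypothetical and is not part of Airflow or OpenMetadata.

```shell
# Hypothetical helper: report any of the named variables that are unset or empty.
require_vars() {
  rv_missing=0
  for rv_name in "$@"; do
    # Indirect lookup of the variable named in $rv_name (empty if unset).
    eval "rv_value=\${$rv_name:-}"
    if [ -z "$rv_value" ]; then
      echo "Missing environment variable: $rv_name" >&2
      rv_missing=1
    fi
  done
  return "$rv_missing"
}

# Check the variables exported in the example above.
if require_vars INSTALL_DIR AIRFLOW_HOME MYSQL_DB_HOST MYSQL_DB_PORT; then
  echo "All installation variables are set."
else
  echo "Set the missing variables before continuing." >&2
fi
```

The function returns a non-zero status when anything is missing, so it can also gate a longer script (for example, require_vars INSTALL_DIR AIRFLOW_HOME || exit 1).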
- Create the required Airflow directories. Run the following commands:
mkdir -p "$AIRFLOW_HOME"
chmod -R 755 "$AIRFLOW_HOME"
- Create and activate a Python virtual environment. Run the following commands:
cd "$INSTALL_DIR"
python3 -m venv venv
source venv/bin/activate
- Upgrade pip to the latest version. Run the following command:
python3 -m pip install --upgrade pip
- Install the required dependencies. Adjust the versions as required to align with your compatibility matrix:
pip install "openmetadata-managed-apis~=1.7.5"
pip install "openmetadata-ingestion[all]~=1.7.5"
pip install "apache-airflow==2.8.4"
pip install "python-daemon>=3.0.0"
- (Optional) Remove unneeded Apache Airflow providers. Run the following command:
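pip freeze | grep "apache-airflow-providers" | grep -Ev "docker|http" | xargs pip uninstall -y
Note: Run this step only if you want to prune unused providers. The -E option is required so that grep treats "docker|http" as an alternation and keeps the docker and http providers.
- Configure the Airflow database connection. To do so: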
- Set the required environment variables:
export AIRFLOW_DB=<YOUR_DB_NAME>
export DB_USER=<YOUR_DB_USER>
export DB_PASSWORD=<YOUR_DB_PASSWORD>
export DB_SCHEME=mysql+pymysql
export AIRFLOW_ADMIN_USER=<ADMIN_USERNAME>
export AIRFLOW_ADMIN_PASSWORD=<ADMIN_PASSWORD>
export AIRFLOW_ADMIN_EMAIL=<ADMIN_EMAIL>
- Build and export the SQLAlchemy database connection string. Run the following command:
export AIRFLOW__DATABASE__SQL_ALCHEMY_CONN="mysql+pymysql://${DB_USER}:${DB_PASSWORD}@${MYSQL_DB_HOST}:${MYSQL_DB_PORT}/${AIRFLOW_DB}"
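The connection string embeds the password directly, so characters such as @, :, or / in the password would change how the URL is parsed. The following sketch percent-encodes the password before building the string. The helper name urlencode is hypothetical, it assumes ASCII input, and the fallback values are the example names used earlier in this section.

```shell
# Hypothetical helper: percent-encode a string for use inside a URL (ASCII input assumed).
urlencode() {
  ue_rest="$1"
  ue_out=""
  while [ -n "$ue_rest" ]; do
    ue_tail="${ue_rest#?}"            # everything after the first character
    ue_char="${ue_rest%"$ue_tail"}"   # the first character
    case "$ue_char" in
      [A-Za-z0-9._~-]) ue_out="${ue_out}${ue_char}" ;;
      *) ue_out="${ue_out}$(printf '%%%02X' "'$ue_char")" ;;
    esac
    ue_rest="$ue_tail"
  done
  printf '%s\n' "$ue_out"
}

# Fallbacks are the example values from this guide; real values come from the exports above.
DB_USER="${DB_USER:-airflow_user}"
DB_PASSWORD="${DB_PASSWORD:-airflow_pass}"
MYSQL_DB_HOST="${MYSQL_DB_HOST:-localhost}"
MYSQL_DB_PORT="${MYSQL_DB_PORT:-3306}"
AIRFLOW_DB="${AIRFLOW_DB:-airflow_db}"

AIRFLOW__DATABASE__SQL_ALCHEMY_CONN="mysql+pymysql://${DB_USER}:$(urlencode "$DB_PASSWORD")@${MYSQL_DB_HOST}:${MYSQL_DB_PORT}/${AIRFLOW_DB}"
export AIRFLOW__DATABASE__SQL_ALCHEMY_CONN
```

If the encoded value is later written into airflow.cfg, the % characters may need additional escaping there. Treat that as an assumption to verify against the Airflow configuration documentation.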
- Update airflow.cfg by running the following commands:
sed -i "s#\(sql_alchemy_conn = \).*#\1${AIRFLOW__DATABASE__SQL_ALCHEMY_CONN}#" "$AIRFLOW_HOME/airflow.cfg"
sed -i "s#\(hostname_callable = \).*#\1socket.gethostname#" "$AIRFLOW_HOME/airflow.cfg"
sed -i "s#\(auth_backends = \).*#\1airflow.api.auth.backend.basic_auth,airflow.api.auth.backend.session#" "$AIRFLOW_HOME/airflow.cfg"
sed -i "s#\(executor = \).*#\1LocalExecutor#" "$AIRFLOW_HOME/airflow.cfg"
- Initialize the Airflow database. Run the following command:
airflow db init
- Create an administrator account. To do so:
- Create the Airflow administrator user by running the following command:
airflow users create \
    --username "$AIRFLOW_ADMIN_USER" \
    --firstname <ADMIN_FIRSTNAME> \
    --lastname <ADMIN_LASTNAME> \
    --role Admin \
    --email "$AIRFLOW_ADMIN_EMAIL" \
    --password "$AIRFLOW_ADMIN_PASSWORD"
- (Optional) Run the following command to apply any pending database migrations:
airflow db migrate
- Start Apache Airflow. To do so:
- Start Airflow in standalone mode:
airflow standalone
- To run Airflow in the background, use the following command:
airflow standalone >> ./airflow.log 2>&1 &
- Alternatively, start the webserver and scheduler as separate processes:
airflow webserver &
airflow scheduler &
Note:
- Replace all <...> placeholders with values specific to your environment.
- Ensure that Python 3.10 and pip are installed, a MySQL instance is running and accessible, and access to PyPI and GitHub is available.
- Follow the steps in order, providing the necessary values at each point.
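When Airflow is started in the background, it can take a short while before the webserver accepts requests. The following sketch retries a command until it succeeds or the attempts run out. The helper name wait_for is hypothetical, and the commented usage assumes that curl is installed and that the webserver listens on localhost port 8080 and serves a /health endpoint. Adjust both to match your configuration.

```shell
# Hypothetical helper: run a command up to N times, pausing between failed attempts.
# Usage: wait_for <attempts> <delay_seconds> <command> [args...]
wait_for() {
  wf_attempts="$1"
  wf_delay="$2"
  shift 2
  wf_try=1
  while [ "$wf_try" -le "$wf_attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    wf_try=$((wf_try + 1))
    # Pause only when another attempt remains.
    if [ "$wf_try" -le "$wf_attempts" ]; then
      sleep "$wf_delay"
    fi
  done
  return 1
}

# Example (assumed host, port, and endpoint; uncomment after starting Airflow):
# wait_for 30 2 curl -fsS http://localhost:8080/health && echo "Airflow webserver is responding."
```

The function returns 0 as soon as the command succeeds and 1 if every attempt fails, so it can be chained with && or || in an installation script.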
- Download the OM installer version 1.7.5. For more information, see https://docs.open-metadata.org/latest/releases/all-releases.
- Create the set_env.env environment variable file and update it with the locations and connection details for your installation. Provide the following details for each variable in the file:
OMD_INS_DIR=<OPEN METADATA INSTALLATION PATH>
LOCAL_REPO_DIR=<OPEN METADATA INSTALLATION PATH>/local_repo
AIRFLOW_HOME=<OPEN METADATA INSTALLATION PATH>/airflow
PYTHON_VENV_PATH=<PYTHON PATH>
HTTP_PROXY_URL=http://<PROXY HOST>:80
HTTPS_PROXY_URL=http://<PROXY HOST>:80
MYSQL_DB_HOST=<DATABASE HOST>
MYSQL_DB_PORT=<DATABASE PORT>
AUTHENTICATION_CLIENT_ID=<AUTHENTICATION_CLIENT_ID>
AUTHENTICATION_AUTHORITY=<AUTHENTICATION_AUTHORITY>
Consider the following file as an example:
OMD_INS_DIR=/scratch/openmetadata-ins-dir
LOCAL_REPO_DIR=/scratch/openmetadata-ins-dir/local_repo
AIRFLOW_HOME=/scratch/openmetadata-ins-dir/airflow
PYTHON_INS_DIR=/scratch/openmetadata-ins-dir/python39
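A key that is assigned twice or left out of set_env.env is easy to miss by eye. The following sketch checks a KEY=VALUE file for both problems. The helper names duplicate_keys and missing_keys are hypothetical, the list of required keys is an illustrative subset of the variables listed above, and the file is assumed to be in the current directory.

```shell
# Hypothetical helper: print every key that is assigned more than once in a KEY=VALUE file.
duplicate_keys() {
  awk -F= '/^[A-Za-z_][A-Za-z0-9_]*=/ { if (++seen[$1] == 2) print $1 }' "$1"
}

# Hypothetical helper: print each required key that has no assignment in the file.
# Usage: missing_keys <file> <key> [key...]
missing_keys() {
  mk_file="$1"
  shift
  for mk_key in "$@"; do
    grep -q "^${mk_key}=" "$mk_file" || echo "$mk_key"
  done
}

# Run the checks only if the file exists in the current directory.
if [ -f set_env.env ]; then
  echo "Duplicate keys:"
  duplicate_keys set_env.env
  echo "Missing keys:"
  missing_keys set_env.env OMD_INS_DIR LOCAL_REPO_DIR AIRFLOW_HOME MYSQL_DB_HOST
fi
```

Both helpers print nothing when the file passes the check, so empty output under each heading means no problem was found.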
- Run the OM installer. For more information about the installation, see https://docs.open-metadata.org/latest/quick-start.