4.1 Installing OpenMetadata

Prerequisites

Install the following libraries and software applications before installing OpenMetadata (OM):

  1. Oracle Linux version 8.
  2. Linux libraries: the "Development Tools" package group, plus gcc gcc-c++ sqlite-devel python39-devel cyrus-sasl-devel bzip2-devel libffi libffi-devel openssl-devel mysql mysql-devel
  3. MySQL version 8.0.32
  4. JDK version 17
  5. Python version 3.10
  6. Create the following databases and users in MySQL, and grant the required privileges:
    CREATE DATABASE openmetadata_db;
    CREATE DATABASE airflow_db CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
    CREATE USER 'openmetadata_user'@'%' IDENTIFIED BY 'openmetadata_password';
    CREATE USER 'airflow_user'@'%' IDENTIFIED BY 'airflow_pass';
    CREATE USER 'airflow_user'@'localhost' IDENTIFIED BY 'airflow_pass';
    GRANT ALL PRIVILEGES ON openmetadata_db.* TO 'openmetadata_user'@'%' WITH GRANT OPTION;
    GRANT ALL PRIVILEGES ON airflow_db.* TO 'airflow_user'@'%' WITH GRANT OPTION;
    GRANT ALL PRIVILEGES ON airflow_db.* TO 'airflow_user'@'localhost' WITH GRANT OPTION;
    commit;
    
  7. Install Apache Airflow Version 2.8.4. To do so:
    1. (Optional) Set up a proxy, if your network requires it:
      export http_proxy=<YOUR_PROXY_URL>
      export https_proxy=<YOUR_PROXY_URL>
      

      Note:

      Skip this step if a proxy is not required.
    2. Define the Airflow installation settings and environment variables. To do so:
      1. Specify the installation directory (for example, /your/airflow/install/dir).
      2. Identify the MySQL host and port (for example, localhost, 3306).
      3. Provide the Airflow database name, database user, and password (for example, airflow_db, airflow_user, airflow_pass).
      4. Define the Airflow administrator username, password, and email address.

      Example exports (replace the placeholders with values specific to your environment):

      export INSTALL_DIR=<YOUR_INSTALL_DIR>
      export AIRFLOW_HOME=$INSTALL_DIR/airflow
      export MYSQL_DB_HOST=<YOUR_MYSQL_DB_HOST>
      export MYSQL_DB_PORT=<YOUR_MYSQL_DB_PORT>
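
Before continuing, it can help to confirm that the variables from this step are set. The following is a minimal sketch and not part of the official procedure; the helper name check_required_vars is hypothetical.

```shell
# Hypothetical helper (not part of Airflow or OpenMetadata):
# report any of the named variables that are unset or empty.
check_required_vars() {
  missing=0
  for name in "$@"; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "Missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Verify the variables exported in this step.
check_required_vars INSTALL_DIR AIRFLOW_HOME MYSQL_DB_HOST MYSQL_DB_PORT \
  || echo "Set the missing variables before continuing." >&2
```

The helper returns a non-zero status when any variable is missing, so it can also gate a script with `check_required_vars ... || exit 1`.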
      
    3. Create the required Airflow directories. Run the following commands:

      mkdir -p "$AIRFLOW_HOME"
      chmod -R 755 "$AIRFLOW_HOME"

    4. Create and activate a Python virtual environment. Run the following commands:

      cd "$INSTALL_DIR"
      python3 -m venv venv
      source venv/bin/activate

    5. Upgrade pip to the latest version. Run the following command:

      python3 -m pip install --upgrade pip

    6. Install the required dependencies. Adjust the versions as required to align with your compatibility matrix:
      pip install "openmetadata-managed-apis~=1.7.5"
      pip install "openmetadata-ingestion[all]~=1.7.5"
      pip install "apache-airflow==2.8.4"
      pip install "python-daemon>=3.0.0"
      
    7. (Optional) Remove unneeded Apache Airflow providers. Run the following command:

      pip freeze | grep "apache-airflow-providers" | grep -Ev "docker|http" | xargs pip uninstall -y

      Note:

      Run this step only if you want to prune unused providers.
    8. Configure the Airflow database connection. To do so:
      1. Set the required environment variables:
        export AIRFLOW_DB=<YOUR_DB_NAME>
        export DB_USER=<YOUR_DB_USER>
        export DB_PASSWORD=<YOUR_DB_PASSWORD>
        export DB_SCHEME=mysql+pymysql
        
        export AIRFLOW_ADMIN_USER=<ADMIN_USERNAME>
        export AIRFLOW_ADMIN_PASSWORD=<ADMIN_PASSWORD>
        export AIRFLOW_ADMIN_EMAIL=<ADMIN_EMAIL>
        
      2. Build and export the SQLAlchemy database connection string. Run the following command:

        export AIRFLOW__DATABASE__SQL_ALCHEMY_CONN="${DB_SCHEME}://${DB_USER}:${DB_PASSWORD}@${MYSQL_DB_HOST}:${MYSQL_DB_PORT}/${AIRFLOW_DB}"
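
To review the assembled connection string without exposing the password in a terminal or log, the value can be printed in masked form. This is a minimal sketch; the helper name mask_conn_password is hypothetical, and it assumes the URL has the scheme://user:password@host:port/database shape built in this step.

```shell
# Hypothetical helper: print a connection URL with the password replaced by ***.
# The greedy ".*@" matches up to the last "@", so a password that itself
# contains "@" is still fully masked.
mask_conn_password() {
  printf '%s\n' "$1" | sed -E 's#(://[^:/@]+):.*@#\1:***@#'
}

# Example with the sample credentials used earlier in this section.
mask_conn_password "mysql+pymysql://airflow_user:airflow_pass@localhost:3306/airflow_db"
# prints mysql+pymysql://airflow_user:***@localhost:3306/airflow_db
```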

    9. Update airflow.cfg by running the following commands:
      sed -i "s#\(sql_alchemy_conn = \).*#\1${AIRFLOW__DATABASE__SQL_ALCHEMY_CONN}#" $AIRFLOW_HOME/airflow.cfg
      sed -i "s#\(hostname_callable = \).*#\1socket.gethostname#" $AIRFLOW_HOME/airflow.cfg
      sed -i "s#\(auth_backends = \).*#\1airflow.api.auth.backend.basic_auth,airflow.api.auth.backend.session#" $AIRFLOW_HOME/airflow.cfg
      sed -i "s#\(executor = \).*#\1LocalExecutor#" $AIRFLOW_HOME/airflow.cfg
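
After the sed commands run, the updated values can be read back to confirm that each substitution matched a line. The following is a minimal sketch; get_cfg_value is a hypothetical helper, and it assumes the "key = value" layout that the sed patterns above rely on. The sql_alchemy_conn key is left out of the loop because its value contains the database password.

```shell
# Hypothetical helper: print the value of the first "key = value" line in a file.
get_cfg_value() {
  sed -n "s#^$1 = ##p" "$2" | head -n 1
}

# Read back the settings changed in this step, if airflow.cfg exists.
cfg="${AIRFLOW_HOME:-.}/airflow.cfg"
if [ -f "$cfg" ]; then
  for key in hostname_callable auth_backends executor; do
    echo "$key = $(get_cfg_value "$key" "$cfg")"
  done
fi
```

An empty value in the output suggests that the key was not found in the expected form, in which case the corresponding sed command made no change.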
      
    10. Initialize the Airflow database. Run the following command:

      airflow db init

    11. Create an administrator account. To do so:
      1. Create the Airflow administrator user by running the following command:

        airflow users create \
          --username "$AIRFLOW_ADMIN_USER" \
          --firstname <ADMIN_FIRSTNAME> \
          --lastname <ADMIN_LASTNAME> \
          --role Admin \
          --email "$AIRFLOW_ADMIN_EMAIL" \
          --password "$AIRFLOW_ADMIN_PASSWORD"

      2. (Optional) Run the following command to apply any pending database migrations:

        airflow db migrate

    12. Start Apache Airflow. To do so:
      1. Start Airflow in standalone mode:
        airflow standalone
        
      2. To run Airflow in the background, use the following command:

        airflow standalone >> ./airflow.log 2>&1 &

      3. Alternatively, start the webserver and scheduler as separate processes:
        airflow webserver &
        airflow scheduler &
        

        Note:

        • Replace all <...> placeholders with values specific to your environment.
        • Ensure that Python 3.10 and pip are installed, a MySQL instance is running and accessible, and access to PyPI and GitHub is available.
        • Follow the steps in order, providing the necessary values at each point.
  8. Download the OM installer version 1.7.5. For more information, see https://docs.open-metadata.org/latest/releases/all-releases.
  9. Create the set_env.env environment variable file, and update it with the installation paths and connection details for your environment.
    Provide a value for each of the following variables in the file:
    OMD_INS_DIR=<OPEN METADATA INSTALLATION PATH>
    LOCAL_REPO_DIR=<OPEN METADATA INSTALLATION PATH>/local_repo
    AIRFLOW_HOME=<OPEN METADATA INSTALLATION PATH>/airflow
    PYTHON_VENV_PATH=<PYTHON PATH>
    HTTP_PROXY_URL=http://<PROXY HOST>:80
    HTTPS_PROXY_URL=http://<PROXY HOST>:80
    MYSQL_DB_HOST=<DATABASE HOST>
    MYSQL_DB_PORT=<DATABASE PORT>
    AUTHENTICATION_CLIENT_ID=<AUTHENTICATION_CLIENT_ID>
    AUTHENTICATION_AUTHORITY=<AUTHENTICATION_AUTHORITY>

    Consider the following file as an example:

    OMD_INS_DIR=/scratch/openmetadata-ins-dir
    LOCAL_REPO_DIR=/scratch/openmetadata-ins-dir/local_repo
    AIRFLOW_HOME=/scratch/openmetadata-ins-dir/airflow
    PYTHON_INS_DIR=/scratch/openmetadata-ins-dir/python39
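
Before running the installer, the file can be checked for variables that are absent or left empty. This is a minimal sketch; missing_env_keys is a hypothetical helper, and the key list passed to it is an assumption that should be adjusted to the variables your installer version expects.

```shell
# Hypothetical helper: print each key that has no non-empty KEY=value line
# in the given env file.
missing_env_keys() {
  file=$1
  shift
  for key in "$@"; do
    grep -Eq "^${key}=.+" "$file" || echo "$key"
  done
}

# Check a few of the variables listed above, if the file is present.
if [ -f set_env.env ]; then
  missing_env_keys set_env.env OMD_INS_DIR LOCAL_REPO_DIR AIRFLOW_HOME
fi
```

No output means that every listed key has a value; any key printed still needs to be filled in.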
    

  10. Run the OM installer. For more information about the installation, see https://docs.open-metadata.org/latest/quick-start.