4.1 Installing OpenMetadata

Prerequisites

Install the following libraries and software applications before installing OpenMetadata (OM):

  1. Oracle Linux version 8.
  2. Linux libraries: the "Development Tools" package group, and the packages gcc gcc-c++ sqlite-devel python39-devel cyrus-sasl-devel bzip2-devel libffi libffi-devel openssl-devel mysql mysql-devel
  3. MySQL version 8.0.3.2
  4. JDK version 17
  5. Python version 3.10 or greater
  6. Create the following databases and users in MySQL:
    CREATE DATABASE openmetadata_db;
    CREATE DATABASE airflow_db CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
    CREATE USER 'openmetadata_user'@'%' IDENTIFIED BY 'openmetadata_password';
    CREATE USER 'airflow_user'@'%' IDENTIFIED BY 'airflow_pass';
    CREATE USER 'airflow_user'@'localhost' IDENTIFIED BY 'airflow_pass';
    GRANT ALL PRIVILEGES ON openmetadata_db.* TO 'openmetadata_user'@'%' WITH GRANT OPTION;
    GRANT ALL PRIVILEGES ON airflow_db.* TO 'airflow_user'@'%' WITH GRANT OPTION;
    GRANT ALL PRIVILEGES ON airflow_db.* TO 'airflow_user'@'localhost' WITH GRANT OPTION;
    commit;
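
Before moving on, it can help to confirm that both databases were actually created. The following is a minimal sketch, assuming the mysql command-line client is installed and can reach the server. The helper name db_exists is hypothetical and is not part of MySQL or OpenMetadata.

```shell
# Hypothetical helper: succeeds if the named database exists.
# Usage: db_exists <database> [mysql client options...]
db_exists() {
  db_name=$1
  shift
  # -N suppresses the column header, -B prints plain batch output.
  # LIKE treats "_" as a wildcard, so grep -x enforces an exact match.
  mysql "$@" -N -B -e "SHOW DATABASES LIKE '${db_name}'" | grep -qx "${db_name}"
}
```

For example, db_exists airflow_db -h localhost -u root -p returns success only when a database named exactly airflow_db is listed.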
    
  7. Install Apache Airflow Version 2.9.3. To do so:
    1. (Optional) Set up a proxy, if your network requires it:
      export http_proxy=<YOUR_PROXY_URL>
      export https_proxy=<YOUR_PROXY_URL>
      

      Note:

      Skip this step if a proxy is not required.
    2. Define the Airflow installation settings and environment variables. To do so:
      1. Specify the installation directory (for example, /your/airflow/install/dir).
      2. Identify the MySQL host and port (for example, localhost, 3306).
      3. Provide the Airflow database name, database user, and password (for example, airflow_db, airflow_user, airflow_pass).
      4. Define the Airflow administrator username, password, and email address.

      Example exports (replace the placeholders with values specific to your environment):

      export INSTALL_DIR=<YOUR_INSTALL_DIR>
      export AIRFLOW_HOME=$INSTALL_DIR/airflow
      export MYSQL_DB_HOST=<YOUR_MYSQL_DB_HOST>
      export MYSQL_DB_PORT=<YOUR_MYSQL_DB_PORT>
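
Later steps fail in confusing ways if one of these variables is empty, so a quick check can save time. The following is a minimal sketch; the helper name require_vars is hypothetical.

```shell
# Hypothetical helper: report every named variable that is unset or empty.
# Returns non-zero if at least one variable is missing.
require_vars() {
  missing=0
  for var_name in "$@"; do
    # Indirect lookup of the variable whose name is stored in var_name.
    eval "var_value=\${${var_name}:-}"
    if [ -z "${var_value}" ]; then
      echo "Missing required variable: ${var_name}" >&2
      missing=1
    fi
  done
  return "${missing}"
}
```

For example, require_vars INSTALL_DIR AIRFLOW_HOME MYSQL_DB_HOST MYSQL_DB_PORT prints one line per missing variable and returns a non-zero status if any is missing.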
      
    3. Create the required Airflow directories. Run the following commands:

      mkdir -p "$AIRFLOW_HOME"
      chmod -R 755 "$AIRFLOW_HOME"

    4. Create and activate a Python virtual environment. Run the following commands:

      cd "$INSTALL_DIR"
      python3 -m venv venv
      source venv/bin/activate

    5. Upgrade pip to the latest version. Run the following command:

      python3 -m pip install --upgrade pip

    6. Install the required dependencies. Adjust the versions as required to align with your compatibility matrix:
      pip install "openmetadata-managed-apis~=1.7.7"
      pip install "openmetadata-ingestion[all]~=1.7.7"
      pip install "apache-airflow==2.9.3"
      pip install "python-daemon>=3.0.0"
      
    7. (Optional) Remove unneeded Apache Airflow providers. Run the following command:

      pip freeze | grep "apache-airflow-providers" | grep -Ev "docker|http" | xargs pip uninstall -y

      Note:

      Run this step only if you want to prune unused providers.
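
Because the uninstall is not easily undone, a dry run that only lists the affected packages may be useful first. The following is a minimal sketch; the helper name providers_to_remove is hypothetical. It uses grep -E so that the | between docker and http is treated as alternation.

```shell
# Hypothetical helper: read "pip freeze" output on stdin and print the
# Airflow provider packages that would be removed (all except docker and http).
providers_to_remove() {
  grep "apache-airflow-providers" | grep -Ev "docker|http"
}
```

Run pip freeze | providers_to_remove and review the list before piping it to pip uninstall.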
    8. Configure the Airflow database connection. To do so:
      1. Set the required environment variables:
        export AIRFLOW_DB=<YOUR_DB_NAME>
        export DB_USER=<YOUR_DB_USER>
        export DB_PASSWORD=<YOUR_DB_PASSWORD>
        export DB_SCHEME=mysql+pymysql
        
        export AIRFLOW_ADMIN_USER=<ADMIN_USERNAME>
        export AIRFLOW_ADMIN_PASSWORD=<ADMIN_PASSWORD>
        export AIRFLOW_ADMIN_EMAIL=<ADMIN_EMAIL>
        
      2. Build and export the SQLAlchemy database connection string. Run the following command:

        export AIRFLOW__DATABASE__SQL_ALCHEMY_CONN="mysql+pymysql://${DB_USER}:${DB_PASSWORD}@${MYSQL_DB_HOST}:${MYSQL_DB_PORT}/${AIRFLOW_DB}"
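
To check that the connection string was assembled correctly without echoing the password to the terminal, the string can be masked before printing. The following is a minimal sketch; the helper name mask_conn is hypothetical.

```shell
# Hypothetical helper: print a connection URL with the password replaced by ****.
mask_conn() {
  # Everything between "user:" and the last "@" is treated as the password.
  printf '%s\n' "$1" | sed 's#://\([^:/@]*\):.*@#://\1:****@#'
}
```

Pass it the connection string variable exported in this step and confirm that the user, host, port, and database name are the ones you expect.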

    9. Update airflow.cfg by running the following commands:
      sed -i "s#\(sql_alchemy_conn = \).*#\1${AIRFLOW__DATABASE__SQL_ALCHEMY_CONN}#" $AIRFLOW_HOME/airflow.cfg
      sed -i "s#\(hostname_callable = \).*#\1socket.gethostname#" $AIRFLOW_HOME/airflow.cfg
      sed -i "s#\(auth_backends = \).*#\1airflow.api.auth.backend.basic_auth,airflow.api.auth.backend.session#" $AIRFLOW_HOME/airflow.cfg
      sed -i "s#\(executor = \).*#\1LocalExecutor#" $AIRFLOW_HOME/airflow.cfg
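
The sed commands make no change and report no error when a key is absent from airflow.cfg, so it is worth reading the values back. The following is a minimal sketch, assuming each setting is written as "key = value" at the start of a line; the helper name cfg_value is hypothetical.

```shell
# Hypothetical helper: print the first value assigned to a key in an INI-style file.
# Usage: cfg_value <key> <path to airflow.cfg>
cfg_value() {
  sed -n "s/^$1 = //p" "$2" | head -n 1
}
```

For example, cfg_value executor "$AIRFLOW_HOME/airflow.cfg" should print LocalExecutor after the edits above. Note that reading back sql_alchemy_conn prints the database password.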
      
    10. Initialize the Airflow database. Run the following command:

      airflow db init

    11. Create an administrator account. To do so:
      1. Create the Airflow administrator user by running the following command:

        airflow users create \
          --username "$AIRFLOW_ADMIN_USER" \
          --firstname <ADMIN_FIRSTNAME> \
          --lastname <ADMIN_LASTNAME> \
          --role Admin \
          --email "$AIRFLOW_ADMIN_EMAIL" \
          --password "$AIRFLOW_ADMIN_PASSWORD"

      2. (Optional) Run the following command to apply any pending database migrations:

        airflow db migrate

    12. Start Apache Airflow. To do so:
      1. Start Airflow in standalone mode:
        airflow standalone
        
      2. To run Airflow in the background, use the following command:

        airflow standalone >> ./airflow.log 2>&1 &

      3. Alternatively, start the webserver and scheduler as separate processes:
        airflow webserver &
        airflow scheduler &
        

        Note:

        • Replace all <...> placeholders with values specific to your environment.
        • Ensure that Python 3.10 and pip are installed, a MySQL instance is running and accessible, and access to PyPI and GitHub is available.
        • Follow the steps in order, providing the necessary values at each point.
  8. Install and Start OpenSearch:
    1. Navigate to your intended install directory. To do so, run the following command:

      cd <OpenSearch Directory>

    2. Download OpenSearch installer version 2.7.0 from the following site:

      https://artifacts.opensearch.org/releases/bundle/opensearch/2.7.0/opensearch-2.7.0-linux-x64.tar.gz

    3. Extract the downloaded archive and set permissions. Run the following commands:

      chmod 777 opensearch-2.7.0-linux-x64.tar.gz
      tar -xvf opensearch-2.7.0-linux-x64.tar.gz
      rm -f opensearch-2.7.0-linux-x64.tar.gz
      chmod -R 755 opensearch-2.7.0/
      echo 'plugins.security.disabled: true' >> opensearch-2.7.0/config/opensearch.yml
    4. Start OpenSearch as a background process. To do so, run the following commands:

      cd opensearch-2.7.0/
      ./bin/opensearch -d -p pid

      Note:

      Ensure that OpenSearch is fully started and operational before proceeding with the startup of OpenMetadata or any other dependent services.
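
One way to confirm that OpenSearch is up is to poll its HTTP port until it answers. The following is a minimal sketch, assuming curl is installed and that OpenSearch listens on its default port 9200 over plain HTTP (the security plugin is disabled in the previous step). The helper name wait_for_http is hypothetical, and the timeout is approximate.

```shell
# Hypothetical helper: poll a URL until it responds or the timeout expires.
# Usage: wait_for_http <url> [timeout_seconds]
wait_for_http() {
  url=$1
  remaining=${2:-60}
  while [ "${remaining}" -gt 0 ]; do
    # -s silences progress output, -f makes HTTP errors (4xx/5xx) count as failure.
    if curl -sf -o /dev/null "${url}"; then
      return 0
    fi
    sleep 1
    remaining=$((remaining - 1))
  done
  return 1
}
```

For example, wait_for_http http://localhost:9200 120 returns success as soon as OpenSearch responds, or failure after roughly two minutes.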
  9. Create the set_env.env environment variable file, and update it with the paths and connection details for your installation.
    Provide the following details for each variable in the file:
    OMD_INS_DIR=<OPEN METADATA INSTALLATION PATH>
    LOCAL_REPO_DIR=<OPEN METADATA INSTALLATION PATH>/local_repo
    AIRFLOW_HOME=<OPEN METADATA INSTALLATION PATH>/airflow
    PYTHON_VENV_PATH=<PYTHON PATH>
    HTTP_PROXY_URL=http://<PROXY HOST>:80
    HTTPS_PROXY_URL=http://<PROXY HOST>:80
    MYSQL_DB_HOST=<DATABASE HOST>
    MYSQL_DB_PORT=<DATABASE PORT>
    AUTHENTICATION_CLIENT_ID=<AUTHENTICATION_CLIENT_ID>
    AUTHENTICATION_AUTHORITY=<AUTHENTICATION_AUTHORITY>

    Consider the following file as an example:

    OMD_INS_DIR=/scratch/openmetadata-ins-dir
    LOCAL_REPO_DIR=/scratch/openmetadata-ins-dir/local_repo
    AIRFLOW_HOME=/scratch/openmetadata-ins-dir/airflow
    PYTHON_INS_DIR=/scratch/openmetadata-ins-dir/python39
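
A file in this KEY=VALUE form can be loaded into the current shell so that every variable is exported to child processes. The following is a minimal sketch; the helper name load_env_file is hypothetical, and it assumes that all <...> placeholders have already been replaced, because an unreplaced placeholder would be read as a shell redirection.

```shell
# Hypothetical helper: load KEY=VALUE pairs from a file and export them.
load_env_file() {
  env_file=$1
  # A bare file name would be looked up on PATH by ".", so anchor it to the current directory.
  case "${env_file}" in
    */*) ;;
    *) env_file="./${env_file}" ;;
  esac
  if [ ! -r "${env_file}" ]; then
    echo "Cannot read ${env_file}" >&2
    return 1
  fi
  set -a            # export every variable assigned while the file is read
  . "${env_file}"
  set +a
}
```

For example, load_env_file set_env.env followed by echo "$OMD_INS_DIR" should print the installation path from the file.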
    

  10. Install OpenMetadata v1.7.7. To do so:

    Note:

    Replace the example paths (such as /scratch/openmetadata-ins-dir) with your actual installation directories throughout the steps below.
    1. Create the OpenMetadata Database (MySQL). To do so:

      Note:

      This step is typically completed as part of Step 6 in this section. If not already performed, complete it before proceeding.

      Connect to your MySQL server and execute the following commands:

       CREATE DATABASE openmetadata_db;

       CREATE USER 'openmetadata_user'@'%' IDENTIFIED BY '<YOUR_PASSWORD>';

       GRANT ALL PRIVILEGES ON openmetadata_db.* TO 'openmetadata_user'@'%' WITH GRANT OPTION;

      Note:

      Replace <YOUR_PASSWORD> with your chosen password.
    2. Create a working directory. Choose a location for the installation, and then run the following commands:

      mkdir openmetadata
      cd openmetadata

    3. Download and extract OpenMetadata. To do so:
      1. Change to the working directory created in the previous step.
      2. Download the OpenMetadata 1.7.7 installation package and extract its contents. Run the following commands:

        wget https://github.com/open-metadata/OpenMetadata/releases/download/1.7.7-release/openmetadata-1.7.7.tar.gz
        tar -xzf openmetadata-1.7.7.tar.gz

    4. Configure openmetadata.yaml to connect to the MySQL database. To do so:
      1. Edit the following file:

        <OM Working Directory>/conf/openmetadata.yaml

      2. Update the following placeholders for the MySQL database connection to match your environment:

        driverClass: ${DB_DRIVER_CLASS:-com.mysql.cj.jdbc.Driver}
        user: ${DB_USER:-<YOUR_DB_USER>} # e.g., openmetadata_user
        password: ${DB_USER_PASSWORD:-<YOUR_DB_PASSWORD>} # e.g., strongpassword
        url: jdbc:${DB_SCHEME:-mysql}://${DB_HOST:-<YOUR_DB_HOST>}:${DB_PORT:-<YOUR_DB_PORT>}/${OM_DATABASE:-<YOUR_DB_NAME>}?${DB_PARAMS:-allowPublicKeyRetrieval=true&useSSL=false&serverTimezone=UTC}

        Note:

        • <YOUR_DB_USER> – MySQL username with privileges for OpenMetadata
        • <YOUR_DB_PASSWORD> – Password for the MySQL user
        • <YOUR_DB_HOST> – Hostname or IP address of the MySQL server (for example, localhost)
        • <YOUR_DB_PORT> – MySQL port number (default: 3306)
        • <YOUR_DB_NAME> – Database name (default: openmetadata_db)
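
The ${NAME:-default} placeholders read like shell parameter expansion: the environment variable is used when it is set, and the text after :- is used otherwise. Assuming the OpenMetadata server resolves them with those semantics, the resulting JDBC URL can be previewed in the shell. The following is a minimal sketch; the helper name preview_jdbc_url is hypothetical, and the fallback values are the examples and defaults named in this section, not values read from openmetadata.yaml.

```shell
# Hypothetical helper: preview the JDBC URL using shell-style defaults.
# Fallbacks (mysql, localhost, 3306, openmetadata_db) are illustrative assumptions.
preview_jdbc_url() {
  printf 'jdbc:%s://%s:%s/%s\n' \
    "${DB_SCHEME:-mysql}" \
    "${DB_HOST:-localhost}" \
    "${DB_PORT:-3306}" \
    "${OM_DATABASE:-openmetadata_db}"
}
```

With none of the variables set, the function prints the URL built from the fallbacks. Exporting DB_HOST or DB_PORT beforehand changes only the corresponding part of the URL.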
    5. Set OpenSearch as the search type. To do so, run the following command from the OpenMetadata working directory:

      sed -i 's#\(searchType: \${SEARCH_TYPE:[^"]*"\).*#\1opensearch"}#' conf/openmetadata.yaml

    6. Prepare the OpenMetadata database and indexes. To do so, run the following commands:

      cd <OM Working Directory>
      ./bootstrap/openmetadata-ops.sh drop-create

      Example:

      cd /scratch/openmetadata-ins-dir/openmetadata-1.7.7      # Example; use your chosen path
      ./bootstrap/openmetadata-ops.sh drop-create

  11. Start the Airflow, OpenSearch, and OpenMetadata services. To do so:

    Note:

    Replace all example paths shown below (for example, /scratch/openmetadata-ins-dir) with the base installation directory specific to your environment.
    1. Activate the Python virtual environment by running the following commands:

      export OMD_INS_DIR=<your_installation_dir>
      export PYTHON_INS_DIR=$OMD_INS_DIR/python310      # Python installation directory
      export PYTHON_VENV_DIR=$PYTHON_INS_DIR/venv
      export AIRFLOW_INS_DIR=$OMD_INS_DIR/airflow
      export JAVA_HOME=$OMD_INS_DIR/jdk-17.0.2
      export PATH="$JAVA_HOME/bin:$PATH"
      java -version
      export PATH="$PYTHON_INS_DIR/bin:$PATH"
      python3 --version
      source $PYTHON_VENV_DIR/bin/activate
    2. Start Airflow by running the following commands:

      export AIRFLOW_HOME=$AIRFLOW_INS_DIR
      airflow version
      cd $OMD_INS_DIR
      airflow standalone >> ./airflow.log 2>&1 & echo $! > airflow.pid
    3. Deactivate the Python virtual environment by running the following command:

       deactivate

    4. Start the OpenSearch service by running the following commands:

      cd $OMD_INS_DIR/opensearch-2.7.0/
      ./bin/opensearch -d -p pid
    5. Start the OpenMetadata service by running the following commands:

      cd $OMD_INS_DIR/openmetadata-1.7.7/
      ./bin/openmetadata-server-start.sh conf/openmetadata.yaml >> output.log 2>&1 & echo $! > output.pid

    Note:

    • Ensure that Airflow and OpenSearch are fully started and running successfully before launching OpenMetadata.
    • Run all commands as a user with appropriate permissions to access configuration files and log directories.
    • Review output.log, airflow.log, and the OpenSearch logs for any startup errors or warnings.
    • If an error occurs, verify all configuration paths, environment variable values, and service dependencies.
    • If custom installation directories are used, ensure that all references to those paths are consistently updated across configuration files, environment variables, and command executions.
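
The start commands above record process IDs in airflow.pid, pid (in the OpenSearch directory), and output.pid. A quick way to see whether each service is still alive is to test those IDs. The following is a minimal sketch; the helper name is_running is hypothetical, and it assumes that each recorded PID still belongs to the service process, which may not hold if a start script hands off to a child process.

```shell
# Hypothetical helper: succeed if the PID stored in a file belongs to a live process.
is_running() {
  pid_file=$1
  # An absent or empty PID file means there is nothing to check.
  [ -s "${pid_file}" ] || return 1
  pid=$(cat "${pid_file}")
  # kill -0 sends no signal; it only tests whether the process can be signalled.
  kill -0 "${pid}" 2>/dev/null
}
```

For example, is_running "$OMD_INS_DIR/airflow.pid" returns success while the Airflow process started in step 2 is alive. A failure is a cue to review the corresponding log file.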