Setting up cgroups for the Dgraph

Control groups, or cgroups, are a Linux kernel feature that lets you allocate resources such as CPU time and system memory to specific processes or groups of processes. If you need to host the Dgraph on nodes that also run Spark, you must use cgroups to ensure that sufficient resources remain available to it.

Warning: Because the Dgraph and Spark are both memory-intensive processes, hosting them on the same nodes is not recommended and should only be done if absolutely necessary. Although you can use the --memory-limit flag to limit the Dgraph's memory consumption, Spark isn't aware of this limit and will continue to use as much memory as it needs, regardless of other processes.

To do this, enable cgroups in Hadoop and create one for YARN that limits the percentage of CPU and the amount of memory it can consume. Then create a separate cgroup for the Dgraph that allocates appropriate amounts of memory and swap space to it.

To set up cgroups:

  1. If your system doesn't currently have the libcgroup package, install it as root.
    This creates /etc/cgconfig.conf, which configures cgroups.
  2. Enable the cgconfig service to run automatically:
    chkconfig cgconfig on
  3. Create a cgroup for YARN. This must be done within Hadoop. For instructions, refer to the documentation for your Hadoop distribution.
    The YARN cgroup should limit the amounts of CPU and memory allocated to all YARN containers. The appropriate limits to set depend on your system and the amount of data you will process. At a minimum, you should reserve the following for the Dgraph:
    • 10GB of RAM
    • 2 CPU cores
    The number of CPU cores YARN is allowed to use must be specified as a percentage. For example, on a quad-core machine, YARN should get only two of the cores, or 50%. On an eight-core machine, YARN could get up to six of them, or 75%. When setting this amount, remember that allocating more cores to the Dgraph will improve its performance.
  4. Create a cgroup for the Dgraph by adding the following to /etc/cgconfig.conf:
    # Create a Dgraph cgroup named "dgraph"
    group dgraph {
        # Specify which users can edit this group
        perm {
            admin {
                uid = $BDD_USER;
            }
            # Specify which users can add tasks for this group
            task {
                uid = $BDD_USER;
            }
        }
        # Set the memory and swap limits for this group
        memory {
            # Set memory limit to 10GB
            memory.limit_in_bytes = 10000000000;
            # Set memory + swap limit to 12GB
            memory.memsw.limit_in_bytes = 12000000000;
        }
    }
    Replace $BDD_USER with the name of the bdd user. cgconfig.conf is not processed by a shell, so the variable is not expanded for you.
    Note: The values given above for memory.limit_in_bytes (10GB) and memory.memsw.limit_in_bytes (12GB) are the absolute minimum requirements. Use higher values if possible.
  5. Restart cgconfig to apply your changes:
    service cgconfig restart
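The CPU arithmetic in step 3 can be sketched as a small shell function. This is a minimal sketch, not part of Big Data Discovery or Hadoop: the function name yarn_cpu_percent is hypothetical, and it assumes you reserve a whole number of cores for the Dgraph (2 by default, the minimum listed above) and round the result down. In Apache Hadoop, this percentage typically ends up in the yarn.nodemanager.resource.percentage-physical-cpu-limit property, but check your distribution's documentation for where it is actually set.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with BDD or Hadoop): print the whole-number
# percentage of CPU that YARN may use after reserving cores for the Dgraph.
# Usage: yarn_cpu_percent TOTAL_CORES [RESERVED_CORES]
yarn_cpu_percent() {
    total_cores=$1
    reserved=${2:-2}   # 2 cores is the documented minimum for the Dgraph
    if [ "$total_cores" -le "$reserved" ]; then
        echo "error: need more than $reserved cores to share with YARN" >&2
        return 1
    fi
    # Integer division rounds down, so YARN never gets more than its share.
    echo $(( (total_cores - reserved) * 100 / total_cores ))
}

yarn_cpu_percent 4   # prints 50
yarn_cpu_percent 8   # prints 75
```

On a six-core machine the function prints 66 rather than 66.7; rounding down errs on the side of leaving the Dgraph its full reservation.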
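Before restarting cgconfig, you may want to sanity-check the byte values chosen for the dgraph group. The sketch below is a hypothetical helper, not part of libcgroup: check_dgraph_limits compares the two values against the minimums from step 4 (10GB and 12GB, written in decimal bytes as in the example) and against the kernel rule that the memory + swap limit cannot be lower than the memory limit.

```shell
#!/bin/sh
# Hypothetical helper: validate the values planned for memory.limit_in_bytes
# and memory.memsw.limit_in_bytes in the "dgraph" group of /etc/cgconfig.conf.
MIN_MEM=10000000000     # 10GB, the documented minimum RAM for the Dgraph
MIN_MEMSW=12000000000   # 12GB, the documented minimum RAM + swap

check_dgraph_limits() {
    mem=$1
    memsw=$2
    if [ "$mem" -lt "$MIN_MEM" ]; then
        echo "memory.limit_in_bytes is below the 10GB minimum" >&2
        return 1
    fi
    if [ "$memsw" -lt "$MIN_MEMSW" ]; then
        echo "memory.memsw.limit_in_bytes is below the 12GB minimum" >&2
        return 1
    fi
    # memsw counts memory plus swap, so it can never be smaller than mem.
    if [ "$memsw" -lt "$mem" ]; then
        echo "memory.memsw.limit_in_bytes must be >= memory.limit_in_bytes" >&2
        return 1
    fi
    # Report how much swap the difference between the two limits allows.
    echo "ok: $(( (memsw - mem) / 1000000000 ))GB of swap allowed"
}

check_dgraph_limits 10000000000 12000000000   # prints "ok: 2GB of swap allowed"
```

Pass the same two numbers you entered in cgconfig.conf; a non-zero return status means at least one value needs to change.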