JOIN clause

JOIN clauses allow records from multiple statements and/or named states to be combined, based on a relationship between certain attributes in these statements.

JOIN clauses, which conform to a subset of the SQL standard, do a join with the specified join condition. The join condition may be an arbitrary Boolean expression referring to the attributes in the FROM statement. The expression must be enclosed in parentheses.

The JOIN clause always modifies a FROM clause. Two named sources (one or both of which can be named states) can be indicated in the FROM clause. Fields must be dot-qualified to indicate which source they come from, except in queries from a single table.

Self-join is supported. Statement aliasing is required for self-join.

Both input tables must result from DEFINE or RETURN statements (that is, from intermediate results).

Any number of joins can be performed in a single statement.

The syntax of JOIN is as follows:
FROM <statement1> [alias]
   [INNER,CROSS,LEFT,RIGHT,FULL] JOIN <statement2> [alias]
   ON (Boolean-expression) [JOIN <statementN> [alias] ON (Boolean-expression)]*
where statement is either a statement or a named state. Note that you can put multiple JOIN clauses under a FROM clause, but there must be exactly one FROM clause in any statement.

Types of joins

EQL supports the following types of joins:

  • INNER JOIN: INNER JOIN joins records on the left and right sides, then filters the result records by the join condition. That means that only rows for which the join condition is TRUE are included. If you do not specify the join type, JOIN defaults to INNER JOIN. Note that the INNER keyword can be used only with JOIN, and EQL will throw an error if it is used with the other join types.
  • LEFT JOIN, RIGHT JOIN, and FULL JOIN: LEFT JOIN, RIGHT JOIN, and FULL JOIN (collectively called outer joins) extend the result of an INNER JOIN with records from a side for which no record on the other side matched the join condition. When such an additional record is included from one side, the record in the join result contains NULLs for all attributes from the other side. LEFT JOIN includes all such rows from the left side, RIGHT JOIN includes all such rows from the right side, and FULL JOIN includes all such rows from either side.
  • CROSS JOIN: The result of CROSS JOIN is the Cartesian product of the left and right sides. Each result record has the assignments from both of the corresponding records from the two sides.

Keep in mind that if not used correctly, joins can cause the Dgraph to grow beyond available RAM because they can easily create very large results. For example, a CROSS JOIN of a result with 100 records and a result with 200 records would contain 20,000 records. Two best practices are to avoid CROSS JOIN if possible and to be careful with ON conditions so that the number of results are reasonable.

INNER JOIN example

The following INNER JOIN example finds employees whose sales in a particular subcategory account for more than 10% of that subcategory's total:
DEFINE EmployeeTotals AS 
SELECT 
   ARB(DimEmployee_FullName) AS Name, 
   SUM(FactSales_SalesAmount) AS Total
FROM SaleState
GROUP BY DimEmployee_EmployeeKey, ProductSubcategoryName;

DEFINE SubcategoryTotals AS 
SELECT 
   SUM(FactSales_SalesAmount) AS Total 
FROM SaleState
GROUP BY ProductSubcategoryName;

RETURN Stars AS 
SELECT 
   EmployeeTotals.Name AS Name, 
   EmployeeTotals.ProductSubcategoryName AS Subcategory, 
   100 * EmployeeTotals.Total / SubcategoryTotals.Total AS Pct 
FROM EmployeeTotals 
   INNER JOIN SubcategoryTotals 
   ON (EmployeeTotals.ProductSubcategoryName = SubcategoryTotals.ProductSubcategoryName) 
HAVING Pct > 10

Self-join example

The following self-join using INNER JOIN computes cumulative daily sales totals per employee:
DEFINE Days AS 
SELECT 
   FactSales_OrderDateKey AS DateKey, 
   DimEmployee_EmployeeKey AS EmployeeKey, 
   ARB(DimEmployee_FullName) AS EmployeeName, 
   SUM(FactSales_SalesAmount) AS DailyTotal
FROM SaleState
GROUP BY DateKey, EmployeeKey;

RETURN CumulativeDays AS 
SELECT 
   SUM(PreviousDays.DailyTotal) AS CumulativeTotal, 
   Day.DateKey AS DateKey, 
   Day.EmployeeKey AS EmployeeKey, 
   ARB(Day.EmployeeName) AS EmployeeName 
FROM Days Day 
   JOIN Days PreviousDays 
   ON (PreviousDays.DateKey <= Day.DateKey) 
GROUP BY DateKey, EmployeeKey

LEFT JOIN examples

The following LEFT JOIN example computes the top 5 subcategories along with an Other bucket, for use in a pie chart:
DEFINE Totals AS 
SELECT 
   SUM(FactSales_SalesAmount) AS Total
FROM SaleState
GROUP BY ProductSubcategoryName;

DEFINE Top5 AS 
SELECT 
   ARB(Total) AS Total 
FROM Totals
GROUP BY ProductSubcategoryName
ORDER BY Total DESC PAGE(0,5);

RETURN Chart AS 
SELECT 
   COALESCE(Top5.ProductSubcategoryName, 'Other') AS Subcategory, 
   SUM(Totals.Total) AS Total 
FROM Totals 
   LEFT JOIN Top5 
   ON (Totals.ProductSubcategoryName = Top5.ProductSubcategoryName) 
GROUP BY Subcategory 
The following LEFT JOIN computes metrics for each product in a particular region, ensuring all products appear in the list even if they have never been sold in that region:
DEFINE Product AS 
SELECT 
   ProductAlternateKey AS Key, 
   ARB(ProductName) AS Name
FROM SaleState
GROUP BY Key;

DEFINE RegionTrans AS 
SELECT 
   ProductAlternateKey AS ProductKey, 
   FactSales_SalesAmount AS Amount 
FROM SaleState
WHERE DimSalesTerritory_SalesTerritoryRegion='United Kingdom';

RETURN Results AS 
SELECT 
   Product.Key AS ProductKey, 
   ARB(Product.Name) AS ProductName, 
   COALESCE(SUM(RegionTrans.Amount), 0) AS SalesTotal, 
   COUNT(RegionTrans.Amount) AS TransactionCount 
FROM Product 
   LEFT JOIN RegionTrans 
   ON (Product.Key = RegionTrans.ProductKey) 
GROUP BY ProductKey

FULL JOIN example

The following FULL JOIN computes the top 10 employees' sales totals for the top 10 products, ensuring that each employee and each product appears in the result:
DEFINE TopEmployees AS 
SELECT 
   DimEmployee_EmployeeKey AS Key, 
   ARB(DimEmployee_FullName) AS Name, 
   SUM(FactSales_SalesAmount) AS SalesTotal 
FROM SaleState
GROUP BY Key 
ORDER BY SalesTotal DESC 
PAGE (0,10);

DEFINE TopProducts AS 
SELECT 
   ProductAlternateKey AS Key, 
   ARB(ProductName) AS Name, 
   SUM(FactSales_SalesAmount) AS SalesTotal 
FROM SaleState
GROUP BY Key 
ORDER BY SalesTotal DESC 
PAGE (0,10);

DEFINE EmployeeProductTotals AS 
SELECT 
   DimEmployee_EmployeeKey AS EmployeeKey, 
   ProductAlternateKey AS ProductKey, 
   SUM(FactSales_SalesAmount) AS SalesTotal 
FROM SaleState
GROUP BY EmployeeKey, ProductKey 
HAVING [EmployeeKey] IN TopEmployees AND [ProductKey] IN TopProducts;

RETURN Results AS 
SELECT 
   TopEmployees.Key AS EmployeeKey, 
   TopEmployees.Name AS EmployeeName, 
   TopEmployees.SalesTotal AS EmployeeTotal, 
   TopProducts.Key AS ProductKey, 
   TopProducts.Name AS ProductName, 
   TopProducts.SalesTotal AS ProductTotal, 
   EmployeeProductTotals.SalesTotal AS EmployeeProductTotal 
FROM EmployeeProductTotals 
   FULL JOIN TopEmployees 
   ON (EmployeeProductTotals.EmployeeKey = TopEmployees.Key) 
   FULL JOIN TopProducts 
   ON (EmployeeProductTotals.ProductKey = TopProducts.Key)

CROSS JOIN example

The following CROSS JOIN example finds the percentage of total sales each product subcategory represents:
DEFINE GlobalTotal AS 
SELECT 
   SUM(FactSales_SalesAmount) AS GlobalTotal
FROM SaleState
GROUP;

DEFINE SubcategoryTotals AS 
SELECT 
   SUM(FactSales_SalesAmount) AS SubcategoryTotal 
FROM SaleState
GROUP BY ProductSubcategoryName;

RETURN SubcategoryContributions AS 
SELECT 
   SubcategoryTotals.ProductSubcategoryName AS Subcategory, 
   SubcategoryTotals.SubcategoryTotal / GlobalTotal.GlobalTotal AS Contribution 
FROM SubcategoryTotals 
   CROSS JOIN GlobalTotal