JOIN
clauses allow records from multiple statements and/or named states to be combined, based on a relationship between certain attributes in these statements.
JOIN
clauses, which conform to a subset of the SQL standard, do a join with the specified join condition. The join condition may be an arbitrary Boolean expression referring to the attributes in the FROM
statement. The expression must be enclosed in parentheses.
The JOIN
clause always modifies a FROM
clause. Two named sources (one or both of which can be named states) can be indicated in the FROM
clause. Fields must be dot-qualified to indicate which source they come from, except in queries from a single table.
Self-join is supported. Statement aliasing is required for self-join.
Both input tables must result from DEFINE
or RETURN
statements (that is, from intermediate results).
Any number of joins can be performed in a single statement.
JOIN
is as follows:
FROM <statement1> [alias] [INNER,CROSS,LEFT,RIGHT,FULL] JOIN <statement2> [alias] ON (Boolean-expression) [JOIN <statementN> [alias] ON (Boolean-expression)]*where statement is either a statement or a named state. Note that you can put multiple
JOIN
clauses under a FROM
clause, but there must be exactly one FROM
clause in any statement.Types of joins
EQL supports the following types of joins:
INNER JOIN
joins records on the left and right sides, then filters the result records by the join condition. That means that only rows for which the join condition is TRUE
are included. If you do not specify the join type, JOIN
defaults to INNER JOIN
. Note that the INNER
keyword can be used only with JOIN
, and EQL will throw an error if it is used with the other join types.LEFT JOIN
, RIGHT JOIN
, and FULL JOIN
(collectively called outer joins) extend the result of an INNER JOIN
with records from a side for which no record on the other side matched the join condition. When such an additional record is included from one side, the record in the join result contains NULLs for all attributes from the other side. LEFT JOIN
includes all such rows from the left side, RIGHT JOIN
includes all such rows from the right side, and FULL JOIN
includes all such rows from either side.CROSS JOIN
is the Cartesian product of the left and right sides. Each result record has the assignments from both of the corresponding records from the two sides.Keep in mind that if not used correctly, joins can cause the Dgraph to grow beyond available RAM because they can easily create very large results. For example, a CROSS JOIN
of a result with 100 records and a result with 200 records would contain 20,000 records. Two best practices are to avoid CROSS JOIN
if possible and to be careful with ON
conditions so that the number of results are reasonable.
INNER JOIN example
INNER JOIN
example finds employees whose sales in a particular subcategory account for more than 10% of that subcategory's total:
DEFINE EmployeeTotals AS SELECT ARB(DimEmployee_FullName) AS Name, SUM(FactSales_SalesAmount) AS Total FROM SaleState GROUP BY DimEmployee_EmployeeKey, ProductSubcategoryName; DEFINE SubcategoryTotals AS SELECT SUM(FactSales_SalesAmount) AS Total FROM SaleState GROUP BY ProductSubcategoryName; RETURN Stars AS SELECT EmployeeTotals.Name AS Name, EmployeeTotals.ProductSubcategoryName AS Subcategory, 100 * EmployeeTotals.Total / SubcategoryTotals.Total AS Pct FROM EmployeeTotals INNER JOIN SubcategoryTotals ON (EmployeeTotals.ProductSubcategoryName = SubcategoryTotals.ProductSubcategoryName) HAVING Pct > 10
Self-join example
INNER JOIN
computes cumulative daily sales totals per employee:
DEFINE Days AS SELECT FactSales_OrderDateKey AS DateKey, DimEmployee_EmployeeKey AS EmployeeKey, ARB(DimEmployee_FullName) AS EmployeeName, SUM(FactSales_SalesAmount) AS DailyTotal FROM SaleState GROUP BY DateKey, EmployeeKey; RETURN CumulativeDays AS SELECT SUM(PreviousDays.DailyTotal) AS CumulativeTotal, Day.DateKey AS DateKey, Day.EmployeeKey AS EmployeeKey, ARB(Day.EmployeeName) AS EmployeeName FROM Days Day JOIN Days PreviousDays ON (PreviousDays.DateKey <= Day.DateKey) GROUP BY DateKey, EmployeeKey
LEFT JOIN examples
LEFT JOIN
example computes the top 5 subcategories along with an Other bucket, for use in a pie chart:
DEFINE Totals AS SELECT SUM(FactSales_SalesAmount) AS Total FROM SaleState GROUP BY ProductSubcategoryName; DEFINE Top5 AS SELECT ARB(Total) AS Total FROM Totals GROUP BY ProductSubcategoryName ORDER BY Total DESC PAGE(0,5); RETURN Chart AS SELECT COALESCE(Top5.ProductSubcategoryName, 'Other') AS Subcategory, SUM(Totals.Total) AS Total FROM Totals LEFT JOIN Top5 ON (Totals.ProductSubcategoryName = Top5.ProductSubcategoryName) GROUP BY Subcategory
LEFT JOIN
computes metrics for each product in a particular region, ensuring all products appear in the list even if they have never been sold in that region:
DEFINE Product AS SELECT ProductAlternateKey AS Key, ARB(ProductName) AS Name FROM SaleState GROUP BY Key; DEFINE RegionTrans AS SELECT ProductAlternateKey AS ProductKey, FactSales_SalesAmount AS Amount FROM SaleState WHERE DimSalesTerritory_SalesTerritoryRegion='United Kingdom'; RETURN Results AS SELECT Product.Key AS ProductKey, ARB(Product.Name) AS ProductName, COALESCE(SUM(RegionTrans.Amount), 0) AS SalesTotal, COUNT(RegionTrans.Amount) AS TransactionCount FROM Product LEFT JOIN RegionTrans ON (Product.Key = RegionTrans.ProductKey) GROUP BY ProductKey
FULL JOIN example
FULL JOIN
computes the top 10 employees' sales totals for the top 10 products, ensuring that each employee and each product appears in the result:
DEFINE TopEmployees AS SELECT DimEmployee_EmployeeKey AS Key, ARB(DimEmployee_FullName) AS Name, SUM(FactSales_SalesAmount) AS SalesTotal FROM SaleState GROUP BY Key ORDER BY SalesTotal DESC PAGE (0,10); DEFINE TopProducts AS SELECT ProductAlternateKey AS Key, ARB(ProductName) AS Name, SUM(FactSales_SalesAmount) AS SalesTotal FROM SaleState GROUP BY Key ORDER BY SalesTotal DESC PAGE (0,10); DEFINE EmployeeProductTotals AS SELECT DimEmployee_EmployeeKey AS EmployeeKey, ProductAlternateKey AS ProductKey, SUM(FactSales_SalesAmount) AS SalesTotal FROM SaleState GROUP BY EmployeeKey, ProductKey HAVING [EmployeeKey] IN TopEmployees AND [ProductKey] IN TopProducts; RETURN Results AS SELECT TopEmployees.Key AS EmployeeKey, TopEmployees.Name AS EmployeeName, TopEmployees.SalesTotal AS EmployeeTotal, TopProducts.Key AS ProductKey, TopProducts.Name AS ProductName, TopProducts.SalesTotal AS ProductTotal, EmployeeProductTotals.SalesTotal AS EmployeeProductTotal FROM EmployeeProductTotals FULL JOIN TopEmployees ON (EmployeeProductTotals.EmployeeKey = TopEmployees.Key) FULL JOIN TopProducts ON (EmployeeProductTotals.ProductKey = TopProducts.Key)
CROSS JOIN example
CROSS JOIN
example finds the percentage of total sales each product subcategory represents:
DEFINE GlobalTotal AS SELECT SUM(FactSales_SalesAmount) AS GlobalTotal FROM SaleState GROUP; DEFINE SubcategoryTotals AS SELECT SUM(FactSales_SalesAmount) AS SubcategoryTotal FROM SaleState GROUP BY ProductSubcategoryName; RETURN SubcategoryContributions AS SELECT SubcategoryTotals.ProductSubcategoryName AS Subcategory, SubcategoryTotals.SubcategoryTotal / GlobalTotal.GlobalTotal AS Contribution FROM SubcategoryTotals CROSS JOIN GlobalTotal