The size of the input for a statement can have a big impact on the
evaluation time of the query.
The input for a statement is defined by the
FROM clause. When possible, use an already completed
result from another statement instead of using collection records, to avoid
inputting unnecessary records.
Consider the following queries. In the first query, the input to each
statement is of a size on the order of the navigation state. In the first two
statements, Sums and Totals, the data is aggregated at two levels of
granularity. In the last statement, the data set is accessed again for the sole
purpose of identifying the month/year combinations that are present in the
data. The computations of interest are derived from previously-computed
results:
DEFINE Sums AS SELECT
SUM(a) AS MonthlyTotal
FROM SaleState
GROUP BY month,year;
DEFINE Totals AS SELECT
SUM(a) AS YearlyTotal
FROM SaleState
GROUP BY year;
DEFINE Result AS SELECT
Sums[month,year].MonthlyTotal AS MonthlyTotal,
Sums[month,year].MonthlyTotal/Totals[year].YearlyTotal AS Fraction
FROM SaleState
GROUP BY month,year
In the following rewritten version of the same query, the Dgraph
database for this data set is accessed only once. The first statement accesses
the database to compute the monthly totals. The second statement has been
modified to compute yearly totals using the results of the first statement.
Assuming that there are many records per month, the savings could be multiple
orders of magnitude. Finally, the last statement has also been modified to use
the results of the first statement. The first statement has already identified
all of the valid month/year combinations in the data set. Rather than accessing
the broader data set (possibly millions of records) just to identify the valid
combinations, the month/year pairs are read from the much smaller (probably
several dozen records) previous result:
DEFINE Sums AS SELECT
SUM(a) AS MonthlyTotal
FROM SalesState
GROUP BY month,year;
DEFINE Totals AS SELECT
SUM(MonthlyTotal) AS YearlyTotal
FROM Sums
GROUP year;
DEFINE Result AS SELECT
MonthlyTotal AS MonthlyTotal,
MonthlyTotal/Totals[year].YearlyTotal AS Fraction
FROM Sums
Defining constants independent of data set size
A common practice is to define constants for a query
through a single group, as shown in the first query below. The input for this
query is the entire navigation state, even though nothing from the input is
used:
DEFINE Constants AS SELECT
500 AS DefaultQuota
FROM SaleState
GROUP
Since none of the input is actually needed, restrict the input to the
smallest size possible with a very restrictive filter, such as the one shown in
this second example:
DEFINE Constants AS SELECT
500 AS DefaultQuota
FROM SaleState
WHERE FactSales_ProductKey IS NOT NULL
GROUP
In the example, FROM SalesState is the unique property key for the
Sales collection.