RECORD_IN_FAST_SAMPLE is a row function that returns a Boolean indicating whether the current record is in the sample of the records in the named state.
The syntax of the RECORD_IN_FAST_SAMPLE function is:
RECORD_IN_FAST_SAMPLE(<double_literal>)
where double_literal specifies the size of the requested sample, expressed as a fraction of the total number of records. The sample size must be between 0.0 and 1.0 (inclusive). For example, a value of 0.1 returns approximately 10% of the records in the state.
RECORD_IN_FAST_SAMPLE is intended to be a fast and convenient function for reducing the size of data sent from the Dgraph to Studio (for example, when generating approximate visualizations like heat maps). However, the function does not compute a truly random sample. That is, it is not the case that each record in the collection has the same probability of being chosen, and it is not the case that each subset of k records has the same probability of being chosen as every other subset of k records.
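To make the distinction concrete, the following minimal Python sketch contrasts a truly random sample with a hypothetical shortcut that keeps one record per fixed-size block. The shortcut is an assumption chosen purely for illustration; it is not a description of how the Dgraph selects records, which is not specified here. The names uniform_sample and strided_sample are likewise invented for this sketch.

```python
import random


def uniform_sample(records, fraction, rng):
    """Truly random sample: every subset of size k is equally likely."""
    k = round(len(records) * fraction)
    return rng.sample(records, k)


def strided_sample(records, fraction):
    """Hypothetical fast shortcut: keep the first record of each block.

    NOT the Dgraph's algorithm; only an example of a sample that has
    the right size but is not uniformly random.
    """
    if fraction <= 0:
        return []
    step = round(1 / fraction)
    return records[::step]


records = list(range(1000))
rng = random.Random(0)  # fixed seed so the run is repeatable

uniform = uniform_sample(records, 0.1, rng)
strided = strided_sample(records, 0.1)

# Both samples contain 10% of the records.
print(len(uniform), len(strided))

# In the strided sample, only positions divisible by 10 can ever be
# chosen, so records do not all have the same chance of selection.
print(all(r % 10 == 0 for r in strided))
```

Both samples have the requested size, but only the first gives every record, and every subset of k records, the same chance of selection. A sample of the second kind can be adequate for an approximate visualization while being unsuitable for statistical estimates.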
Restrictions on function use
The restrictions on the use of the RECORD_IN_FAST_SAMPLE function are:
The function can appear only in a WHERE condition.
The function cannot appear in a CASE expression or as an argument to another function.
The function can be used only in a statement that selects FROM a single state. EQL will signal an error if RECORD_IN_FAST_SAMPLE occurs in a statement FROM another statement, FROM a view, or FROM a JOIN or CROSS.
Any violation of these restrictions will result in an EQL checking error.
The following example uses the function in a WHERE clause:
RETURN Results AS SELECT TotalSales AS Sales FROM SalesState WHERE RECORD_IN_FAST_SAMPLE(0.1)
RECORD_IN_FAST_SAMPLE may be used with any of the Boolean operators, as in this similar query:
RETURN Results AS SELECT TotalSales AS Sales FROM SalesState WHERE TotalSales IS NOT NULL AND RECORD_IN_FAST_SAMPLE(0.1)
Note on sampling and joins
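Keep in mind that sampling the inputs of a join is not equivalent to sampling the result of the join. For example, consider the following query:
DEFINE s1 AS SELECT ... FROM State1 WHERE RECORD_IN_FAST_SAMPLE(0.1);
DEFINE s2 AS SELECT ... FROM State2 WHERE RECORD_IN_FAST_SAMPLE(0.1);
RETURN s3 AS SELECT .. FROM s1 JOIN s2 ON (...)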
The results of s1 and s2 contain roughly 10% of the records from State1 and State2, respectively. However, in general, the results of s3 will contain far fewer than 10% of the records it would have had if the previous statements had not been sampled.
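The size of this effect can be estimated with a small simulation. The Python sketch below rests on two simplifying assumptions that are not part of EQL: each record is kept independently with probability 0.1 (which RECORD_IN_FAST_SAMPLE does not guarantee), and the join key matches exactly one record on each side. The helpers bernoulli_sample and inner_join are hypothetical names used only for this illustration.

```python
import random


def bernoulli_sample(rows, fraction, rng):
    """Keep each row independently with probability `fraction`."""
    return [row for row in rows if rng.random() < fraction]


def inner_join(left, right):
    """Join two lists of (key, payload) rows on equal keys."""
    by_key = {}
    for key, payload in right:
        by_key.setdefault(key, []).append(payload)
    return [(key, lp, rp) for key, lp in left for rp in by_key.get(key, [])]


rng = random.Random(42)  # fixed seed so the run is repeatable

# Two stand-in "states" with a one-to-one join key (an assumption).
state1 = [(k, f"a{k}") for k in range(10_000)]
state2 = [(k, f"b{k}") for k in range(10_000)]

full = inner_join(state1, state2)          # join without sampling
s1 = bernoulli_sample(state1, 0.1, rng)    # roughly 10% of State1
s2 = bernoulli_sample(state2, 0.1, rng)    # roughly 10% of State2
s3 = inner_join(s1, s2)                    # join of the two samples

ratio = len(s3) / len(full)
print(f"s1: {len(s1)}, s2: {len(s2)}, s3: {len(s3)}, ratio: {ratio:.4f}")
```

Under these assumptions a joined pair survives only if both of its records were sampled, so the expected fraction of surviving pairs is 0.1 × 0.1 = 0.01, about 1% rather than 10%. With other key distributions the exact figure differs, but the joined result still generally shrinks much more than either input.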