RECORD_IN_FAST_SAMPLE

RECORD_IN_FAST_SAMPLE is a row function that returns a Boolean indicating whether the current record belongs to a sample of the records in the named state.

The syntax of the RECORD_IN_FAST_SAMPLE function is:
RECORD_IN_FAST_SAMPLE(<double_literal>)
where double_literal specifies the size of the requested sample, expressed as a fraction of the total number of records. The sample size must be between 0.0 and 1.0 (inclusive). For example, a value of 0.1 selects approximately 10% of the records in the state.
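Because the argument is a literal fraction rather than a record count, client code that wants roughly a fixed number of records has to convert that count into a fraction before building the query. The following Python sketch shows one way to do that. The helper name fast_sample_condition is hypothetical, the total record count is assumed to be known to the client already, and the fixed-point formatting is an assumption about what makes a safe double literal; the resulting sample size is still only approximate.

```python
def fast_sample_condition(target_records, total_records):
    """Build a RECORD_IN_FAST_SAMPLE condition aiming for roughly target_records rows.

    Hypothetical client-side helper; it is not part of EQL or Studio.
    """
    if total_records <= 0:
        raise ValueError("total_records must be positive")
    # Clamp to the documented range of 0.0 to 1.0 (inclusive).
    fraction = min(1.0, max(0.0, target_records / total_records))
    # Fixed-point formatting keeps the argument a plain literal such as 0.002500
    # (assumption: this avoids exponent notation the parser might not accept).
    return f"RECORD_IN_FAST_SAMPLE({fraction:.6f})"


if __name__ == "__main__":
    # Roughly 5,000 records out of a state holding 2,000,000 records.
    print("WHERE " + fast_sample_condition(5000, 2_000_000))
```

The returned string can be placed in the WHERE clause of a statement that satisfies the usage restrictions for this function.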

RECORD_IN_FAST_SAMPLE is intended as a fast, convenient way to reduce the amount of data sent from the Dgraph to Studio (for example, when generating approximate visualizations such as heat maps). However, the function does not compute a truly random sample. That is, not every record in the collection has the same probability of being chosen, and not every subset of k records has the same probability of being chosen as every other subset of k records.
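For contrast, the following Python sketch spells out the two properties that a truly random (simple random) sample does have. It describes only that idealized case and says nothing about how the Dgraph actually selects records; the function names are illustrative and not part of EQL.

```python
import math
from itertools import combinations


def inclusion_probability(n, k):
    """Chance that one given record is in a simple random sample of k out of n records."""
    return k / n


def subset_probability(n, k):
    """Chance that a simple random sample of size k is one particular k-record subset."""
    return 1 / math.comb(n, k)


if __name__ == "__main__":
    records = ["r1", "r2", "r3", "r4", "r5"]
    k = 2
    # Enumerate every possible sample of k records.
    subsets = list(combinations(records, k))
    # In a simple random sample, each of these subsets is equally likely ...
    print(len(subsets), subset_probability(len(records), k))
    # ... and each record has the same chance of being included.
    print(inclusion_probability(len(records), k))
```

Results computed from a RECORD_IN_FAST_SAMPLE sample should therefore be treated as approximations rather than as statistically unbiased estimates.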

Restrictions on function use

The restrictions for using the RECORD_IN_FAST_SAMPLE function are:
  • It may appear only as a per-statement WHERE condition.
  • It may not appear inside a CASE expression or as an argument to another function.
  • It is allowed only in statements that are FROM a single state. EQL will signal an error if RECORD_IN_FAST_SAMPLE occurs in a statement FROM another statement, FROM a view, or FROM a JOIN or CROSS.

Any violation of these restrictions will result in an EQL checking error.

This simple example illustrates the use of the function with the WHERE clause:
RETURN Results AS
SELECT TotalSales AS Sales
FROM SalesState
WHERE RECORD_IN_FAST_SAMPLE(0.1)

RECORD_IN_FAST_SAMPLE may be used with any of the Boolean operators, as in this similar query:
RETURN Results AS
SELECT TotalSales AS Sales
FROM SalesState
WHERE TotalSales IS NOT NULL AND RECORD_IN_FAST_SAMPLE(0.1)

Note on sampling and joins

Although you may not sample the results of a join (see the third restriction above), you may join the results of sampling. However, be aware that you may not get the desired results. For example, consider this query:
DEFINE s1 AS
  SELECT ...
  FROM State1
  WHERE RECORD_IN_FAST_SAMPLE(0.1);
DEFINE s2 AS
  SELECT ...
  FROM State2
  WHERE RECORD_IN_FAST_SAMPLE(0.1);
RETURN s3 AS
  SELECT ...
  FROM s1 JOIN s2 ON (...)

The results of s1 and s2 contain roughly 10% of the records from State1 and State2, respectively. However, in general, the results of s3 will contain far fewer than 10% of the records it would have had if the previous statements had not been sampled, because a joined record survives only when its matching records were selected in both samples.
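The size of this shrinkage can be estimated with a quick simulation. The Python sketch below assumes an idealized case: both states share the same join keys one-to-one, and each state is sampled independently with every record kept with probability equal to the fraction. RECORD_IN_FAST_SAMPLE does not guarantee that kind of sampling, so the figure is only a rough guide; the function names are illustrative.

```python
import random


def sample_keys(keys, fraction, rng):
    """Keep each key independently with probability `fraction`.

    Idealized stand-in for sampling; not the Dgraph's actual algorithm.
    """
    return {key for key in keys if rng.random() < fraction}


def joined_fraction(n_records=100_000, fraction=0.1, seed=0):
    """Fraction of the unsampled join that survives when both inputs are sampled."""
    rng = random.Random(seed)
    keys = range(n_records)  # assumption: State1 and State2 match one-to-one on key
    s1 = sample_keys(keys, fraction, rng)
    s2 = sample_keys(keys, fraction, rng)
    joined = s1 & s2  # inner join on key: the key must be in both samples
    # The unsampled join would contain n_records rows.
    return len(joined) / n_records


if __name__ == "__main__":
    print(f"surviving fraction of the join: {joined_fraction():.4f}")
```

Under these assumptions a joined record survives with probability of about 0.1 × 0.1 = 0.01, so the sampled join holds roughly 1% of the unsampled join rather than 10%.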