Data mining FAQs

Why data mining?

  1. What is the primary motivation for applying statistical data mining to the AERS database?

    The Oracle Empirica Signal application acts as a screening device to sort through several million potential drug/event combinations. You can use the application to prioritize tasks and decide what is random noise versus associations that need further investigation. The output is available as new questions are raised.

  2. How might the availability of a safety data mining system affect the way that safety evaluators go about doing their work?

    The data mining feature identifies and investigates potential problems in spontaneous safety data. You can use data mining to identify potential signals, prioritize pharmacovigilance activities, and complete evidence-based reviews.

Interpretation of results

  1. How can you determine that you are seeing more of some AE than expected for a particular drug when you do not know the number of prescriptions filled for that drug (the denominator)?

    The technique of disproportionality analysis, upon which the Oracle Empirica Signal application is based, uses all reports that mention a drug as a surrogate for the exposure.

  2. Does the lack of exposure data make it impossible to differentiate between a relatively safe drug that reports few events and another drug that reports more events?

    The method looks at particular adverse events rather than overall reporting rate. If prescription data is available, then overall reporting rate is worth looking at. In most circumstances, however, the method is less biased than a reporting rate for a particular AE.

  3. Does the fact that you do not know the reporting rate for a drug and/or event (for example, what percentage of true drug-related adverse events are actually reported to FDA) create a problem in obtaining quantitative estimates of product-event association?

    The fact that the reporting rate is unusually high or low for a particular drug does not affect the results as long as the reporting rate for that drug is uniformly high or low across all events. An unusually high or low reporting rate for a particular event does not affect the results as long as the reporting rate for that event is uniformly high or low across all drugs. It is possible that publicity about a particular problem with a drug might cause the report rates for particular product-event combinations to increase without corresponding increases in the overall report rates for the drug or the event. If that happens, the corresponding associations from disproportionality methods, such as MGPS, become biased upward.

Counts

  1. Where does the notion of an expected count come from?

    Observed counts for each product-event combination of interest:

    Event                 Drug of interest                  All other drugs
    Event of interest     a = N                             b
    All other events      c                                 d

    Expected counts (E):

    Event                 Drug of interest                  All other drugs
    Event of interest     E(a) = ((a+b)(a+c))/(a+b+c+d)     E(b) = ((a+b)(b+d))/(a+b+c+d)
    All other events      E(c) = ((c+d)(a+c))/(a+b+c+d)     E(d) = ((c+d)(b+d))/(a+b+c+d)

    The quantity E(a) is the number of product-event combinations of interest that would be expected if the event of interest appears in reports independently of the drug of interest.

    Observed count / Expected count = Relative Reporting Ratio (N/E = RR).

    Do not confuse the use of RR as an abbreviation for Relative Reporting Ratio in the Oracle Empirica Signal application with the concept of relative risk, which cannot be estimated from spontaneous report frequencies. Relative Reporting Ratio is also different from reporting rate, which can only be estimated if you know both the number of reported and unreported events. RR in the context of data mining of a spontaneous report database is the ratio of the number of occurrences of the product-event combination of interest to the expected number if the proportion of reports with that event among reports involving that drug were the same as the overall proportion in the database. RR is a disproportionality measure.
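    As a minimal sketch of the arithmetic above, the following Python computes E(a) and RR = N/E from the four cell counts a, b, c, and d. The function names and the sample counts are hypothetical and chosen only for illustration; they are not part of the Oracle Empirica Signal application.

```python
def expected_count(a, b, c, d):
    """Expected count E(a) if the event appears independently of the drug."""
    total = a + b + c + d
    # (a + b) = all reports with the event, (a + c) = all reports with the drug
    return (a + b) * (a + c) / total


def relative_reporting_ratio(a, b, c, d):
    """Observed count N = a divided by the expected count E(a)."""
    return a / expected_count(a, b, c, d)


# Hypothetical counts, for illustration only.
a, b, c, d = 20, 80, 480, 9420
print(expected_count(a, b, c, d))            # 5.0
print(relative_reporting_ratio(a, b, c, d))  # 4.0
```

    In this made-up table the drug appears in 500 of 10,000 reports and the event in 100, so 5 joint reports would be expected under independence; the 20 observed reports give RR = 4.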

  2. What does it mean if the expected count is less than 1?

    Based on the total number of reports separately for the drug and event, it is expected, statistically, that there would be fewer than one report of the combination if the drug and event are not associated. The expected count (E) for a product-event combination can be less than 1 when either the number of reports for the drug, or the number of reports for the AE, is small. It is common for E to be less than 1 (sometimes even E < 0.001) for a rarely reported drug combined with a rarely reported event. This causes the ratio N/E to fluctuate wildly when N goes from 0 to 1 or more.
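    The instability can be seen with a few lines of Python. This is only an illustration with made-up expected counts; the helper name raw_ratio is hypothetical.

```python
def raw_ratio(n, e):
    """Raw disproportionality ratio N/E."""
    return n / e


# Rarely reported drug and event: each extra report moves N/E by 1/E.
small_e = 0.001
print([raw_ratio(n, small_e) for n in (0, 1, 2)])

# Frequently reported pair: one extra report barely moves the ratio.
large_e = 50.0
print([raw_ratio(n, large_e) for n in (50, 51)])  # [1.0, 1.02]
```

    With E = 0.001, a single report takes N/E from 0 to roughly 1,000, whereas with E = 50 one more report changes the ratio by only 0.02.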

  3. How does the EBGM statistic differ from the concept of an observed count (for a product-event combination) divided by an expected count?

    EBGM is an improved estimate of the Relative Reporting Ratio (RR). Technically, EBGM is defined as the exponential of the expected value of log(RR), where, in this case, the probabilities determining the expected value are not based on assuming independence of drugs and events, but are based on statistical measures of the dependencies exhibited in the database overall and for each product-event combination. (In Bayesian terminology, this is called the posterior distribution.) EBGM has the property that it is nearly identical to N/E when the counts are moderately large, but it shrinks towards the average value of N/E (typically ~1.0) when N/E is unreliable because of the instability of small counts.

    The posterior probability distribution also supports the calculation of a lower and an upper one-sided 95% confidence limit (EB05 and EB95) for the Relative Reporting Ratio. Together, these two limits bound a 90% interval.
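    The shrinkage behavior can be sketched in Python under a deliberately simplified assumption: a single gamma prior with mean 1 on the true ratio, combined with a Poisson count. This is not the prior that MGPS fits to the database, and the hyperparameters alpha and beta below are hypothetical values picked for illustration. Under this assumption the posterior is Gamma(alpha + N, rate = beta + E), and the exponential of the posterior expected value of the log ratio is exp(digamma(alpha + N) - log(beta + E)).

```python
import math


def digamma(x):
    """Digamma function for x > 0 (recurrence plus asymptotic series)."""
    result = 0.0
    while x < 6.0:          # shift x upward until the series is accurate
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 / x - series


def shrunk_geometric_mean(n, e, alpha=0.5, beta=0.5):
    """exp(E[log ratio]) under a Gamma(alpha + n, rate = beta + e) posterior.

    alpha and beta are hypothetical prior parameters (prior mean alpha/beta = 1),
    not values used by the application.
    """
    return math.exp(digamma(alpha + n) - math.log(beta + e))


# Moderately large counts: the estimate stays close to N/E = 10.
print(round(shrunk_geometric_mean(100, 10.0), 2))

# Tiny expected count: N/E is 1000, but the estimate is pulled towards 1.
print(round(shrunk_geometric_mean(1, 0.001), 2))
```

    The design point is that the amount of shrinkage depends on how much data supports the ratio, which is the property the answer above attributes to EBGM.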

EB05 and EB95

  1. Explain the meaning of EB05 and EB95.

    The Oracle Empirica Signal application distinguishes between the observed value of RR = N/E and the true value of the Relative Reporting Ratio that would be observed if a much larger database (so large that sampling error would be negligible) were to be drawn from the same conceptual population of reports. This unknown quantity, sometimes abbreviated as true RR or true N/E, can be bounded probabilistically:

    EB05 is a value such that there is about a 5% probability that the true value of N/E lies below it.

    EB95 is a value such that there is about a 5% probability that the true value of N/E lies above it.

    The interval from EB05 to EB95 is the 90% confidence interval. The confidence interval is a way to describe the uncertainty in an estimate due to small sample size (the larger the number of reports, the narrower the confidence interval). EB05 and EB95 describe the probability distribution for the correct answer. For example, an EB05 of 3 means that there is a 95% chance that the true Relative Reporting Ratio is at least 3.
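    To illustrate how the 5th and 95th percentiles of a posterior distribution narrow as the number of reports grows, the sketch below draws samples from a gamma posterior and reads off the two percentiles. It assumes a simplified single gamma prior with hypothetical parameters alpha and beta; the application's actual posterior comes from the MGPS model, so the numbers here are illustrative only.

```python
import random


def posterior_percentiles(n, e, alpha=0.5, beta=0.5, draws=20000, seed=0):
    """Approximate 5th and 95th percentiles of Gamma(alpha + n, rate = beta + e).

    alpha and beta are hypothetical prior parameters, not application settings.
    """
    rng = random.Random(seed)
    shape = alpha + n
    scale = 1.0 / (beta + e)  # random.gammavariate takes a scale, not a rate
    samples = sorted(rng.gammavariate(shape, scale) for _ in range(draws))
    return samples[int(0.05 * draws)], samples[int(0.95 * draws)]


# Same raw ratio N/E = 10, very different amounts of data.
print(posterior_percentiles(3, 0.3))     # wide interval
print(posterior_percentiles(100, 10.0))  # much narrower interval
```

    Both calls have N/E = 10, but the interval for N = 3 is far wider relative to its center than the interval for N = 100.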

  2. What is the relationship of other safety data mining report statistics, such as the Proportional Reporting Ratio, the Reporting Odds Ratio, and the Information Component, to EBGM (EB05, EB95)?

    These different disproportionality statistics all get at basically the same things, and the values for each in any given situation are likely to be similar, provided the sample size is large.

    MGPS allows the use of stratification for elimination of some confounding effects, so it will typically provide lower and more realistic scores if there is a confounding variable involved, such as age or gender, compared to a statistic that does not involve stratification. (Randomized trials are the only way of being sure there is no confounding in a dataset.)

    You can specify whether or not you want to calculate Proportional Reporting Ratios (PRR) and Reporting Odds Ratios (ROR) for a data mining run. Further, for a data mining run that is stratified, the PRR and ROR calculations can also be computed using stratification.
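    For orientation, the unstratified PRR and ROR can be written in terms of the same cell counts a, b, c, and d used for the observed-count table (a and c are reports for the drug of interest; a and b are reports with the event of interest). The sketch below uses the standard textbook definitions and hypothetical counts; it does not reproduce the application's stratified calculation.

```python
def prr(a, b, c, d):
    """Proportional Reporting Ratio: event share for the drug vs. all other drugs."""
    return (a / (a + c)) / (b / (b + d))


def ror(a, b, c, d):
    """Reporting Odds Ratio: odds of the event for the drug vs. all other drugs."""
    return (a * d) / (b * c)


def rr(a, b, c, d):
    """Relative Reporting Ratio N/E with E = (a+b)(a+c)/(a+b+c+d)."""
    return a * (a + b + c + d) / ((a + b) * (a + c))


# Hypothetical counts, for illustration only.
counts = (20, 80, 480, 9420)
print(rr(*counts), prr(*counts), ror(*counts))
```

    For these counts the three statistics come out at about 4.0, 4.75, and 4.9, which is the kind of broad agreement the answer above describes for large samples.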

  3. When we apply data mining to evaluate the significance of many different product-event combinations, are we making a multiple comparison? Does this cause problems?

    Multiple comparisons refers to searching out the largest association (such as the largest value of N/E among a large number of product-event pairs) in a database and then evaluating it statistically as if that product-event pair had been the focus of the study before looking at the data. Bayesian shrinkage estimates shrink the raw values of N/E (that is, they modify the estimate of N/E towards 1 based on a theoretically sound statistical method) so that all values of EB05 have the same 95% chance of being no larger than the true RR (Relative Reporting Ratio), even if you search out the largest value of EB05 in the output table.

Threshold EB05

  1. If MGPS yields a ranked list of associations according to the EBGM signal score, what threshold should be used for determining that an association is an alert that warrants further investigation?

    There is no absolute threshold. An alert is an indication that there may be a signal. Where to put the cut-off point demanding further investigation is a business decision that is based, among other considerations, on the nature of the data, seriousness of the AE, available resources, and other signal discovery mechanisms.

  2. Why has the threshold EB05 > 2 been recommended?

    Although the threshold EB05 > 1 might meet a typical definition of statistical significance, there are many biases in spontaneous report data and costs to following up each alert. Therefore, Oracle suggests you use a somewhat higher threshold. The value 2 is a round number that is often reasonable for relatively rare AEs. (For a rare AE, a doubling of the number of reports may not be a large absolute increase.)

    For further discussion, see:

    Szarfman A, Machado SG, O'Neill RT. Use of screening algorithms and computer systems to efficiently signal higher-than-expected combinations of drugs and events in the US FDA's spontaneous reports database. Drug Saf. 2002;25(6):381-92.

  3. If there is a product-event combination with an EB05 greater than my chosen threshold (for example, EB05>2), what can I say with certainty about the relationship between the drug and the event? Are they definitely related? Are they causally related?

    This value suggests only that there is a statistical association between this drug and event. The association may indicate a causal relationship or a reverse causal relationship; that is, the AE is an indication for the drug. It may indicate a relationship that is already labeled, or it may indicate a relationship that is not clinically significant for other reasons. The most frequent explanation is that there is a third variable common to both the drug and the event, often related to the drug’s indication (confounding by indication).

    You should conduct further investigation, including reading the label, doing a literature review, or, at the very least, expert evaluation of the individual medical records leading to the series of reports, to determine whether they seem causally related. Additional studies, such as clinical trials or case control studies, may be warranted in some cases.

False positives and negatives

  1. What is meant by alerts potentially being false positives?

    A false positive occurs when a signaling statistic, such as the EBGM or EB05 value, exceeds a pre-set threshold (such as EB05 > 2), while, in reality, the drug is either not related to the event or is related, but not causally.

  2. What is meant by alerts potentially being false negatives?

    A false negative occurs when a signaling statistic, such as the EBGM or EB05 value, does not exceed a pre-set threshold (such as EB05 > 2), while, in reality, the drug is causally related to the event.

  3. How do I set up an alerting system that gives hardly any false positives and hardly any false negatives? Is this possible?

    No. While the chances of finding false positives or false negatives can be reduced, you cannot eliminate them. It is possible to decrease the chances of finding false positives or false negatives by improving the current statistical algorithms with the addition of extra information, such as stratification, or by improving the data collection itself. Otherwise, any decrease in false positives is automatically accompanied by an increase in false negatives, and vice versa.

    There are two possible reasons for a false positive: there actually is no correlation between the drug and event, or there is a correlation, but it is not causal. The first can be dealt with to some degree statistically (although there will always be unresolved uncertainty for small sample sizes). The second is not a statistical issue, and can only be dealt with through medical knowledge.

    Similarly, a false negative can be due to a small sample or be the result of masking from another signal. Given a hypothesis based on medical knowledge you can reduce the masking. Oracle is experimenting with techniques for providing clues about when such masking may be occurring.

    A third reason for either a false positive or a false negative is that you are using the wrong threshold.

Signaling sensitivity

  1. What is meant by the sensitivity and specificity of signaling systems?

    Sensitivity is defined as the percentage of truly causal associations that are identified by the alerting system. Low sensitivity means a high number of false negatives. For example, in a database with counts for 100,000 product-event pairs where 1,000 of the pairs have a truly causal relationship, if MGPS identifies 700 out of the 1,000 true associations, the sensitivity of MGPS is 700/1000 or 70%. This calculation assumes that the true status of every product-event pair is known. You have to have a gold standard.

    Specificity indicates the percentage of noncausal (including nonexistent) associations for which the system does not show an alert. Low specificity means a high false positive rate. If MGPS declares that 3,000 of the 99,000 non-causal pairs are associated, that is a 96,000/99,000 = 97% specificity rate or a 3% false-positive rate. Again, this calculation assumes that you know the actual negatives.

    Because there is no gold standard in most pharmacovigilance situations, you cannot usually estimate the values of sensitivity and specificity very well. Sometimes, the gold standard is whether a reaction is on the label. This is problematic: it is a difficult and subjective task to transform the language on a label into an accurate and complete list of MedDRA terms, and the label itself may be wrong or incomplete.
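    The two percentages in the example can be reproduced with a short Python sketch. The function names are hypothetical, and, as noted above, the inputs presuppose a gold standard that is rarely available in practice.

```python
def sensitivity(true_positives, false_negatives):
    """Share of truly causal pairs that the system alerts on."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives, false_positives):
    """Share of noncausal pairs that the system does not alert on."""
    return true_negatives / (true_negatives + false_positives)


# 700 of 1,000 causal pairs are identified.
print(sensitivity(700, 300))                 # 0.7

# 3,000 of 99,000 noncausal pairs are flagged, so 96,000 are not.
print(round(specificity(96_000, 3_000), 4))  # 0.9697
```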

  2. If EB05 is large, does that mean that you can be 95% sure that there is a true signal?

    No, you can only be 95% sure that an association this large will hold up as the database becomes larger and larger. When EB05 > 2, almost all the false positives are due to noncausal associations that are likely to persist as more data is collected, such as events being related to indications for the drugs, or publicity effects. A large EB05 rules out mere statistical fluctuation, but not other spurious causes of the association. Thus, medical knowledge is very important for interpretation of data mining results.

  3. If my threshold for signaling is EB05 > 2, is a product-event combination with an EB05 < 2 definitely not a signal?

    The application distinguishes between an alert that occurs when a data mining statistic exceeds a predetermined threshold and a signal that might be declared upon review of all relevant information. Since there can easily be strong evidence of causality from other information in the absence of a data mining alert, EB05 < 2 would not rule out the possibility of a signal. Considering the data mining results, you would require EB95 < 1, rather than EB05 < 2, to say that there is strong statistical evidence against a positive association between the drug and the event. Remember, all of the values of true RR that lie within the interval (EB05, EB95) are reasonably compatible with the observed data.
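    The distinction between "no alert" and "evidence against an association" can be summarized as a small decision rule. The sketch below is only an illustration of the reasoning in this answer; the function name, the returned labels, and the default threshold are assumptions, not application behavior.

```python
def screening_status(eb05, eb95, alert_threshold=2.0):
    """Classify a product-event pair from its EB05 and EB95 values."""
    if eb05 > alert_threshold:
        # Statistic exceeds the chosen threshold: an alert, not yet a signal.
        return "alert"
    if eb95 < 1.0:
        # The whole interval lies below 1.
        return "strong statistical evidence against a positive association"
    # Values of true RR above 1 are still compatible with the data.
    return "inconclusive"


print(screening_status(2.5, 6.0))
print(screening_status(1.2, 3.5))
print(screening_status(0.3, 0.8))
```

    A pair with EB05 = 1.2 and EB95 = 3.5 falls in the inconclusive group: it does not alert at EB05 > 2, but a positive association is not ruled out either.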

  4. Are there any guidelines regarding the smallest N (for a product-event combination) for which you can get a meaningful signal score?

    It is not necessary to take the sample size into account. The methodology does that. In practice, however, the built-in conservatism of the MGPS methodology and the choice of threshold prevent EB05 > 2 when analyzing AERS data unless N is at least 3. This suggests that reducing the threshold to EB05 > 1 might be advised as part of screening for very serious adverse events, if you want to be warned whenever the first two reports show up significantly earlier than expected for a new drug.

Working with new drugs and small sample sizes

  1. If a drug is new and only has a handful of reports (e.g., 50) overall in the AERS database, can useful data mining signals be obtained?

    Yes, the EB05 statistic takes the sample size into account.

  2. What about the case of an event term for which there is a small number of reports (<100) overall?

    Yes, the EB05 statistic takes the sample size into account.

Stratification and subsets

  1. What is the difference between stratification and subsetting?

    Stratification is a way of normalizing data to eliminate confounding by some variable or variables. You generally stratify by age group, gender, and receipt year. The Oracle Empirica Signal application calculates the expected count separately for each stratum, then sums those counts and uses this total expected count for the RR and EBGM calculations. This process can be used, for example, to deal with confounding due to a drug being administered only to the elderly or only to males, or to the drug being new on the market. This process is often called adjustment for confounding variables in the epidemiology literature. Other possible confounding variables are not generally available in AERS data.
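    The effect of summing per-stratum expected counts can be sketched as follows. The strata, counts, and function names are hypothetical; the example assumes a drug reported only in older patients and an event that is also reported more often in older patients.

```python
def pooled_expected(strata):
    """Expected count computed once from the pooled (unstratified) totals."""
    drug = sum(s[0] for s in strata)
    event = sum(s[1] for s in strata)
    total = sum(s[2] for s in strata)
    return drug * event / total


def stratified_expected(strata):
    """Expected count computed per stratum, then summed."""
    return sum(drug * event / total for drug, event, total in strata if total > 0)


# Each tuple: (reports with the drug, reports with the event, all reports).
strata = [
    (0, 10, 1000),     # younger patients: the drug is not reported at all
    (100, 100, 1000),  # older patients
]
observed = 10

print(pooled_expected(strata), round(observed / pooled_expected(strata), 2))  # 5.5 1.82
print(stratified_expected(strata), observed / stratified_expected(strata))    # 10.0 1.0
```

    Without stratification the pair looks disproportionate (RR about 1.8); once age group is taken into account, the observed count matches the expected count exactly.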

    Subsetting is a way of repeating the entire data mining analysis using different subsets of the database. Subsets are typically used to study how the strength of a product-event association evolved over time, but can also be used to compare strength of associations for different age or gender subsets. For each subset, the application calculates the observed count and expected count, then uses those values to calculate a separate RR and EBGM value for each subset. Different subsets of the reports can be overlapping or non-overlapping, depending on how the subset run parameters are specified.

  2. Is it meaningful to compare signal scores across subsets (e.g., signal scores for males versus signal scores for females)?

    Yes. Non-overlapping confidence intervals can alert you to the possibility of a gender difference in response to the drug.
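    A check for non-overlapping intervals is simple to express. In this sketch each interval is an (EB05, EB95) pair; the function name and the sample values are hypothetical.

```python
def intervals_overlap(first, second):
    """Return True if two (EB05, EB95) intervals share any value."""
    low1, high1 = first
    low2, high2 = second
    return low1 <= high2 and low2 <= high1


males = (2.1, 3.4)    # hypothetical (EB05, EB95) for the male subset
females = (4.0, 7.5)  # hypothetical (EB05, EB95) for the female subset
print(intervals_overlap(males, females))  # False
```

    A result of False is a prompt for closer review of a possible difference between the subsets, not proof of one.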

Cross-drug comparisons

  1. Under what circumstances might it be meaningful to compare signal scores across different drugs?

    Non-overlapping confidence intervals can alert you to the possibility of a difference between the safety profiles of the drugs. Alternatively, if the confidence intervals overlap a lot, the signal scores can help corroborate evidence that there is no difference.

  2. What are the advantages and disadvantages of including concomitant medications (as well as suspect drugs) in the data mining analysis?

    An advantage of including concomitant medications is that you can eliminate the bias from physicians who pre-determined the suspect drugs. Each drug will be weighted equally. In addition, including both concomitant and suspect drugs provides a larger background case set for each drug and may reveal associations unsuspected by the reporter.

    A disadvantage is that potentially valuable judgments from physicians regarding which drug(s) is/are suspect are not taken into account.

    More research is needed to understand whether it is advantageous to include concomitant drugs in an MGPS analysis. Because logistic regression analysis is specifically designed to assess the results of polytherapy, Oracle advises always using a data configuration that includes concomitant drugs in logistic regressions.

  3. What are the advantages and disadvantages of fully breaking apart combination drugs into their component ingredients for purposes of data mining?

    An advantage is that the number of unique drugs in the database becomes much smaller, and the counts of each unique ingredient are higher. It is also easier to implement drug classification.

    A disadvantage is that ingredients are equally weighted no matter whether they were part of combination drugs or not.

Combining reports in a single data mining analysis

  1. Is it acceptable to combine foreign and domestic reports in a single data mining run?

    Yes, although there is only a small proportion of foreign reports (12%) in the FOI FDA AERS+SRS data. WHO data contains approximately 50% cases reported in the US and 50% with foreign sources. It is possible to specify that report source (US/non-US) be used as an additional stratification variable, which may help protect from biases due to differential reporting rates in US and non-US reports.

  2. Is it acceptable to combine health professional and consumer reports in a single data mining analysis?

    It can be interesting to compare the two to see whether there is a significant difference. It is not clear that consumer reports are more likely to give false positives or negatives than health professional reports, given that the application examines disproportional reporting as opposed to absolute numbers.

    Subsetting by the report source or occupation code of the reporter allows comparison of values calculated for reports by consumers to those reported by health care professionals. Also, you can use stratification on these variables to eliminate the possibility of confounding based on report source.

  3. Is it acceptable to combine serious and non-serious adverse events in a single data mining analysis? What if some drugs have non-serious reports entered into AERS and others you are interested in, for purposes of comparing signal scores, do not?

    Yes, although AERS contains mostly serious events. Including large numbers of non-serious events probably improves the performance of the system, since some analyses have shown that the ratios of non-serious event reports are a reasonable surrogate for the exposure ratios. If, for some reason, non-serious reports have been suppressed to a much greater degree for some drugs than for others, the result would be to bias comparisons of alerting scores across the two sets of drugs.

Data issues

  1. What is the meaning of a drug showing a positive alerting score with a particular event sometimes being confounded by polytherapy?

    If a drug is typically co-prescribed with another drug that causes a particular event, then the first drug is statistically associated with the event as well.

  2. What is the impact of cloaking, where a very strong association with one drug masks your ability to see a moderately strong association with another drug?

    If a drug has a very strong association with a given event, this may mean that it accounts for a relatively large proportion of the reports for that event in the database. This would tend to raise the expected value for this event for all drugs and, thus, mask the signal (lower observed / expected) for another drug.
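    A small numerical sketch shows the mechanism. All counts are made up, and removing drug A's reports from the background is used here only to expose the effect; it assumes that no report mentions both drugs and is not a description of how the application handles masking.

```python
def relative_reporting_ratio(n, drug_reports, event_reports, total_reports):
    """N/E with E = drug_reports * event_reports / total_reports."""
    expected = drug_reports * event_reports / total_reports
    return n / expected


# Full database: 10,000 reports, 500 with the event.
# Drug A accounts for 1,000 reports, 400 of them with the event.
# Drug B has 200 reports, 10 of them with the event.
print(relative_reporting_ratio(10, 200, 500, 10_000))            # 1.0

# Same data with drug A's 1,000 reports (400 with the event) set aside.
print(round(relative_reporting_ratio(10, 200, 100, 9_000), 2))   # 4.5
```

    Drug A's reports inflate the background rate of the event, so drug B's ratio of 4.5 against the remaining reports shrinks to 1.0 against the full database.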

  3. If a potential signal of interest may be spread across multiple MedDRA Preferred Terms (PTs), because different reporters may choose slightly different terminology to describe essentially the same effect, will this reduce the ability to see the signal of interest?

    Yes, this is possible, although significant associations for several of the possible PTs are a good indication of something real. If you suspect that this spreading is happening, then it is possible in the Oracle Empirica Signal application to map the suspect PTs into a new pseudo PT, called a custom term, to look at the combined score. Alternatively, data mining analysis can be based on MedDRA HLT or HLGT, rather than PT.

  4. MGPS essentially compares the safety profile (relative frequency of different AEs) of each drug against a background comparator of all drugs. Would it be better to choose a more narrowly defined background? For example, to understand the profile of Cerivastatin, why not limit the background to all other statins?

    Because the background reports serve as a surrogate for exposure, the bigger the denominator, the better. Oracle performed an experiment to look into this; the results showed more noise with smaller databases. To compare a drug against others in its class to see differences, look at the EB05 scores for each of the drugs run against the entire background. Looking at a drug against its own class also means that it is unlikely that you would find a previously unknown class effect.

  5. Are data mining results skewed by publicity effects?

    This can occur, and it is one of the possible causes to consider when doing case evaluation of the potential signals.

Comparing EBGM values

  1. When comparing different EBGM values, if a given drug D has an EBGM signal score of 4.0 for event E1 and 8.0 for event E2, is it correct to say that the risk of E2 appears to be twice as high?

    No. At most, you could say (if the confidence interval was very narrow) that the drug had 4 times as many reports for E1 as expected and 8 times as many reports for E2 as expected. To estimate risks, you have to know the reporting rates and exposure rates, neither of which can be estimated from spontaneous reporting data alone.

  2. What about comparing EBGM values across two drugs? Suppose, for event E, drug D1 has an EBGM score of 10 and drug D2 has an EBGM score of 20. Is the risk of the event twice as high with drug D2 as with drug D1?

    No. The scores can be compared using their confidence intervals, and a difference is credible only if the intervals do not overlap. In this example, the interpretation is that the association between D2 and the event is twice as strong as the association between D1 and the event. But associations can differ for reasons unrelated to causation.

  3. What do EBGM scores < 1 signify?

    A score of EBGM < 1 indicates that the product-event combination is occurring less frequently than expected statistically. If EBGM were < 1 and N were high, this might indicate a possible therapeutic effect.