After editing, any data still missing for recency-of-use and frequency-of-use questions (for drugs other than alcohol, cocaine, and marijuana) were statistically imputed using sequential hot-deck imputation. The first step in this procedure is to sort the data file progressively on recency of use of alcohol, marijuana, and cocaine; age; gender; Hispanic origin; race/ethnicity; and a State indicator variable (i.e., California, Arizona, or remainder of the United States). The hot-deck procedure then replaces a missing item on a particular record with the last nonmissing response for that item encountered on a preceding record in the sorted file. This procedure is appropriate for the recency-of-use and most frequency-of-use variables because the level of item nonresponse for them is low.
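The carry-forward rule of sequential hot-deck imputation can be sketched as follows. This is a minimal illustration, not the NHSDA production code: the record layout, the field names (`age`, `gender`, `recency`), and the choice to leave a value missing when no earlier donor exists are all assumptions made for the example.

```python
from typing import Any, Optional

def sequential_hot_deck(records: list[dict[str, Any]],
                        sort_keys: list[str],
                        target: str) -> list[dict[str, Any]]:
    """Impute missing (None) values of `target` by sequential hot deck.

    Records are copied, sorted lexicographically on `sort_keys`, and each
    missing value is replaced by the last *reported* value of `target`
    seen earlier in the sorted file.
    """
    ordered = sorted((dict(r) for r in records),
                     key=lambda r: tuple(r[k] for k in sort_keys))
    last_donor: Optional[Any] = None
    for rec in ordered:
        if rec[target] is None:
            # Assumption: if no earlier donor exists (a missing value at the
            # top of the file), the value simply stays missing in this sketch.
            rec[target] = last_donor
        else:
            last_donor = rec[target]  # only reported values act as donors
    return ordered

# Hypothetical records: sort keys and recency codes are illustrative only.
data = [
    {"age": 30, "gender": "F", "recency": 1},
    {"age": 30, "gender": "F", "recency": None},
    {"age": 25, "gender": "M", "recency": 3},
    {"age": 30, "gender": "M", "recency": None},
]
out = sequential_hot_deck(data, ["age", "gender"], "recency")
print([r["recency"] for r in out])
```

In this toy file the last record (age 30, male) has no earlier donor in its own sort cell, so it borrows from the preceding female record. This is why the sort order, which places similar respondents next to each other, matters for the quality of the imputed values.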
Missing data for the variables on frequency of use of alcohol, cocaine, and marijuana in the past 12 months were statistically imputed using a regression-based method of imputation. This imputation procedure involves estimating a polytomous logistic model using a number of respondent characteristics. The explanatory variables used in these models included those variables used in the recency-of-use hot-deck imputation procedure, such as recency of use of alcohol, marijuana, and cocaine; age; gender; Hispanic origin; race/ethnicity; and State. After the model parameters were estimated, the resulting model was used to predict a categorical value for each frequency-of-use item nonresponse. The model-based imputation procedure is appropriate for alcohol, cocaine, and marijuana frequencies for two reasons: (a) the relative amount of nonresponse or faulty responses to these questions is larger than what is observed for the recency-of-use and other frequency-of-use items, and (b) the model-based procedure allows a greater number of statistically significant explanatory variables to contribute to imputing a response compared to what is possible with the hot-deck method.
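A regression-based imputation of a categorical frequency item can be sketched with a polytomous (softmax) logistic model fitted to the complete cases. This is a minimal sketch under several assumptions: the model is fitted here by plain gradient descent with a small ridge penalty rather than by whatever estimation routine the NHSDA used, the predictors are two hypothetical 0/1 indicators, and each missing item is assigned its most probable category (the text does not say whether the predicted value was the modal category or a random draw from the predicted distribution).

```python
import numpy as np

def fit_polytomous_logit(X, y, n_classes, lr=0.5, n_iter=5000, l2=1e-4):
    """Fit a polytomous (softmax) logistic model by batch gradient descent.

    X: (n, k) float array of predictors (an intercept column is added here).
    y: length-n int array of category codes 0..n_classes-1.
    Returns the (k+1, n_classes) coefficient matrix.
    """
    n, k = X.shape
    Xb = np.hstack([np.ones((n, 1)), X])
    Y = np.eye(n_classes)[y]                          # one-hot targets
    W = np.zeros((k + 1, n_classes))
    for _ in range(n_iter):
        scores = Xb @ W
        scores -= scores.max(axis=1, keepdims=True)   # numerical stability
        P = np.exp(scores)
        P /= P.sum(axis=1, keepdims=True)             # predicted probabilities
        W -= lr * (Xb.T @ (P - Y) / n + l2 * W)       # gradient step with ridge penalty
    return W

def impute_categories(X, y):
    """Replace NaN entries of y with the most probable category under a
    model fitted to the complete cases."""
    missing = np.isnan(y)
    y_obs = y[~missing].astype(int)
    W = fit_polytomous_logit(X[~missing], y_obs, n_classes=int(y_obs.max()) + 1)
    Xm = np.hstack([np.ones((missing.sum(), 1)), X[missing]])
    y_imp = y.copy()
    y_imp[missing] = np.argmax(Xm @ W, axis=1)
    return y_imp

# Hypothetical toy data: two 0/1 predictors and a 3-level frequency code
# that depends on them; the last three records have the item missing.
patterns = [((0, 0), 0), ((0, 1), 0), ((1, 0), 1), ((1, 1), 2)]
X_obs = np.array([x for x, _ in patterns] * 5, dtype=float)
y_obs = np.array([c for _, c in patterns] * 5, dtype=float)
X = np.vstack([X_obs, [[0, 1], [1, 0], [1, 1]]])
y = np.concatenate([y_obs, [np.nan] * 3])
print(impute_categories(X, y)[-3:])
```

Unlike the hot deck, which copies one neighbor's answer, the fitted model pools information from all complete cases and lets every predictor contribute to the imputed category.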
The main advantage of imputation is that it simplifies the calculation of estimates. Its use can reduce the bias caused by missing data and thus improve the accuracy of estimates. In the 1998 NHSDA, however, the potential impact of bias due to item nonresponse and the impact of imputation on the estimates themselves were quite small because item nonresponse was less than 2% for most of the drug use recency questions.
Sampling errors denote the random fluctuations that occur in estimates based on samples drawn from a population; such variations can be eliminated only by conducting a complete census. Using the same procedures, different samples drawn from the same population would be expected to result in different estimates. Many of these observed estimates would differ to some degree from the "true" population value, and these differences are due to sampling error. The variance of an estimate is the basic measure of this type of error.
To account for the complex features of the NHSDA sample design (such as unequal selection probabilities, stratification, and clustering), the variance estimates of the NHSDA drug use statistics in this report are computed with SUDAAN, a survey data analysis software package (see footnote 5). Estimates of means or proportions, such as drug use prevalence, are nonlinear statistics whose variances cannot be expressed in closed form. SUDAAN estimates the variance of such nonlinear statistics using a first-order Taylor series approximation of the deviations of the estimates from their expected values. The resulting variance estimates are approximately unbiased for sufficiently large sample sizes.
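The Taylor series (linearization) idea can be illustrated for a weighted proportion p = Σwy / Σw. This sketch uses the standard textbook approximation in which primary sampling units (PSUs) are treated as sampled with replacement within strata; it does not reproduce SUDAAN's actual computations, design options, or any finite population corrections, and the function name and inputs are hypothetical.

```python
from collections import defaultdict

def linearized_variance(y, w, stratum, psu):
    """Taylor-linearization variance of the weighted proportion
    p = sum(w*y) / sum(w), treating PSUs as sampled with replacement
    within strata."""
    W = sum(w)
    p = sum(wi * yi for wi, yi in zip(w, y)) / W
    # Linearized value w*(y - p)/W of each record, summed to the PSU level.
    psu_totals = defaultdict(float)
    for yi, wi, h, c in zip(y, w, stratum, psu):
        psu_totals[(h, c)] += wi * (yi - p) / W
    # Group the PSU totals by stratum.
    by_stratum = defaultdict(list)
    for (h, _), z in psu_totals.items():
        by_stratum[h].append(z)
    var = 0.0
    for zs in by_stratum.values():
        n_h = len(zs)
        if n_h < 2:
            continue  # assumption: a single-PSU stratum contributes nothing here
        zbar = sum(zs) / n_h
        var += n_h / (n_h - 1) * sum((z - zbar) ** 2 for z in zs)
    return p, var

# Four 0/1 responses, equal weights, one stratum.
# Each respondent is its own PSU (behaves like simple random sampling):
p1, v_unclustered = linearized_variance([1, 0, 0, 1], [1] * 4, [1] * 4, [1, 2, 3, 4])
# Like-minded respondents share a PSU (strong clustering):
p2, v_clustered = linearized_variance([1, 1, 0, 0], [1] * 4, [1] * 4, [1, 1, 2, 2])
print(p1, v_unclustered, p2, v_clustered)
```

With one record per PSU the estimate reduces to the familiar s²/n (here 1/12), while clustering of similar respondents triples the variance to 0.25 for the same point estimate of 0.5.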
For a given variance estimate, the associated design effect is the ratio of the design-based variance estimate to the variance that would have been obtained from a simple random sample of the same size. Because the combined design features of stratification, clustering, and unequal weighting are expected to increase the variance, the design effect should virtually always be greater than one. For prevalence rates near zero, however, the variance-inflating effects of unequal weighting and clustering are sometimes underestimated, yielding design effects of less than one. Because the corresponding variance estimates are then considered anomalously small, two additional variance estimates are computed as quality control measures: the first reflects only the stratification and unequal weighting effects, and the second assumes simple random sampling. The variance estimate used for constructing confidence intervals is the maximum of these three estimates.
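The design effect and the "maximum of three variances" rule can be expressed in a few lines. This is a sketch under stated assumptions: the simple-random-sampling variance is taken as p(1 − p)/n with no finite population correction, and the stratification-and-weighting-only variance is supplied as an input because the text does not describe how it was computed. All numbers in the example are hypothetical.

```python
def srs_variance(p: float, n: int) -> float:
    """Variance of a proportion under simple random sampling: p(1-p)/n."""
    return p * (1.0 - p) / n

def variance_for_ci(v_design: float, v_strat_wt: float, p: float, n: int):
    """Return (design effect, variance used for the confidence interval).

    v_design:   full design-based variance (stratification, clustering, weights)
    v_strat_wt: variance reflecting only stratification and unequal weighting
    """
    v_srs = srs_variance(p, n)
    deff = v_design / v_srs
    # Guard against anomalously small design-based variances.
    return deff, max(v_design, v_strat_wt, v_srs)

# Hypothetical rare-event case where clustering effects were underestimated.
deff, v_used = variance_for_ci(v_design=8e-6, v_strat_wt=1.2e-5, p=0.01, n=1000)
print(deff, v_used)
```

Here the design effect is about 0.81, which flags the design-based variance as suspiciously small, so the larger stratification-and-weighting variance is used instead.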
The 95% confidence intervals for the drug use proportions and corresponding population estimates are constructed based on the logit transformation. Because the drug use proportions in the NHSDA are frequently small, the logit transformation has been used for this report to yield asymmetric interval boundaries. These asymmetric intervals are more balanced with respect to the probability that the interval is above or below the true population value than is the case for standard symmetric confidence intervals.
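To illustrate the method, let

$$L = \operatorname{logit}(p) = \ln\left[\frac{p}{1-p}\right],$$

where "ln" denotes the natural logarithm, and let the 95% confidence interval for L have endpoints

$$A = L - 1.96\left(\frac{\mathrm{SE}(p)}{p(1-p)}\right), \qquad B = L + 1.96\left(\frac{\mathrm{SE}(p)}{p(1-p)}\right),$$

where the quantity in parentheses that is multiplied by 1.96 estimates the standard error (SE) of L. Applying the inverse logistic transformation to the confidence interval endpoints, A and B, yields a 95% confidence interval for the proportion, P, as

$$\frac{1}{1+\exp(-A)} < P < \frac{1}{1+\exp(-B)},$$

where "exp" denotes the exponential function, the inverse of the natural logarithm.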
The lower and upper confidence interval endpoints for percentage estimates are obtained by multiplying the lower and upper endpoints for proportions by 100. The confidence interval for the corresponding population estimate is obtained by multiplying the confidence interval endpoints by the estimated number of individuals in the population subgroup constituting the base or denominator of the associated proportion.
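The logit-based interval described in this section can be sketched as follows. The inputs in the example (a proportion of 0.33, a standard error of 0.007, and a base population of 1,000) are hypothetical values chosen for illustration, not figures from the report's tables; in practice SE(p) would come from the design-based variance.

```python
import math

def logit_ci(p: float, se_p: float, z: float = 1.96) -> tuple[float, float]:
    """95% confidence interval for a proportion via the logit transformation.

    The SE of L = ln[p/(1-p)] is approximated by SE(p) / [p(1-p)].
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    L = math.log(p / (1.0 - p))
    se_L = se_p / (p * (1.0 - p))
    A, B = L - z * se_L, L + z * se_L
    # Inverse logistic transformation of the endpoints.
    return 1.0 / (1.0 + math.exp(-A)), 1.0 / (1.0 + math.exp(-B))

def population_ci(p: float, se_p: float, base: float) -> tuple[float, float]:
    """Scale the proportion limits by the subgroup's estimated population."""
    lo, hi = logit_ci(p, se_p)
    return lo * base, hi * base

lo, hi = logit_ci(0.33, 0.007)
print(f"{100 * lo:.1f}% to {100 * hi:.1f}%")   # percentage limits
print(population_ci(0.33, 0.007, 1000))        # limits for a base of 1,000
```

Because p is below 0.5, the interval extends slightly farther above the estimate than below it; this asymmetry is what the logit transformation is meant to produce.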
For tables in this report, each estimate of the number of users of the drug in the defined subgroup (as well as its corresponding estimated percentage of the subgroup's total population) is accompanied by a lower and an upper confidence limit. For example, in the lower portion of Table 3A, the "observed estimate" for the total number of people who have "ever used" marijuana is 72,070,000. The "lower limit" is 69,122,000, and the "upper limit" is 75,080,000. These estimates mean that one can be 95% confident that the total number of people who have used marijuana at least once in their lifetime lies between 69,122,000 and 75,080,000, with the best 1998 NHSDA estimate being 72,070,000. The corresponding percentage estimates for the lower and upper confidence limits are 31.6% and 34.4%, respectively, with the best estimate being 33.0%.
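As a rough consistency check, the published Table 3A limits can be run backward through the logit formulas. Because A = L − 1.96·SE(L) and B = L + 1.96·SE(L), the midpoint of the limits on the logit scale should recover the point estimate, and their half-width divided by 1.96 approximates SE(L). The percentages are rounded to one decimal place, so the recovered values are approximate, and the implied SE(p) is an inference from the published limits, not a value reported in the table.

```python
import math

def logit(x: float) -> float:
    return math.log(x / (1.0 - x))

# Published (rounded) percentage limits for lifetime marijuana use, Table 3A.
lower, upper = 0.316, 0.344

# The logit-scale midpoint of the limits recovers L = logit(p).
L_mid = (logit(lower) + logit(upper)) / 2
p_implied = 1.0 / (1.0 + math.exp(-L_mid))

# Half-width / 1.96 approximates SE(L); multiplying by p(1-p) converts
# it back to an approximate SE(p).
se_L = (logit(upper) - logit(lower)) / (2 * 1.96)
se_p = se_L * p_implied * (1.0 - p_implied)

# On the count scale the interval is visibly asymmetric.
below = 72_070_000 - 69_122_000   # 2,948,000
above = 75_080_000 - 72_070_000   # 3,010,000
print(round(p_implied, 4), round(se_p, 4), below, above)
```

The recovered point estimate is within rounding of the published 33.0%, and the upper half of the count interval is wider than the lower half, as expected for a proportion below 0.5.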
As in other publications in the NHSDA series, estimates with low precision are not reported. The suppression criterion is based on the size of the estimate and on the relative standard error (RSE) of the estimate, defined as the standard error of an estimate divided by the estimate itself. Specifically, cell percentages and the corresponding estimates of numbers of users are suppressed if at least one of the following three criteria is met:
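(1) p < .0005 or p ≥ .9995
(2) RSE[-ln(p)] > 0.175 when p < 0.5
(3) RSE[-ln(1-p)] > 0.175 when p > 0.5

where, using a first-order Taylor series approximation,

$$\mathrm{RSE}[-\ln(p)] \approx \frac{\mathrm{SE}(p)/p}{-\ln(p)}, \qquad \mathrm{RSE}[-\ln(1-p)] \approx \frac{\mathrm{SE}(p)/(1-p)}{-\ln(1-p)}.$$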
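The three suppression criteria can be coded as a single check. This sketch assumes the RSEs of −ln(p) and −ln(1 − p) are approximated by a first-order Taylor series, i.e., RSE[−ln(p)] ≈ (SE(p)/p)/(−ln p) and RSE[−ln(1 − p)] ≈ (SE(p)/(1 − p))/(−ln(1 − p)); the function name and the example inputs are hypothetical.

```python
import math

def suppress(p: float, se_p: float, max_rse: float = 0.175) -> bool:
    """Return True if an estimate meets any of the three suppression criteria.

    Assumed Taylor-series approximations:
      RSE[-ln(p)]   ~ (SE(p)/p) / (-ln p)
      RSE[-ln(1-p)] ~ (SE(p)/(1-p)) / (-ln(1-p))
    """
    if p < 0.0005 or p >= 0.9995:                    # criterion (1)
        return True
    if p < 0.5:                                      # criterion (2)
        return (se_p / p) / (-math.log(p)) > max_rse
    if p > 0.5:                                      # criterion (3)
        return (se_p / (1.0 - p)) / (-math.log(1.0 - p)) > max_rse
    return False  # p == 0.5: neither (2) nor (3) applies as the criteria are written

# Hypothetical inputs:
print(suppress(0.0001, 0.00001))  # too close to zero
print(suppress(0.33, 0.007))      # precise enough to publish
print(suppress(0.01, 0.01))       # RSE of -ln(p) about 0.22
print(suppress(0.90, 0.05))       # RSE of -ln(1-p) about 0.22
```

Working on the −ln scale makes the rule stricter for small proportions than a plain RSE cutoff on p would be near 0.5, while still allowing reasonably precise rare-event estimates to be shown.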
4 These 1998 population projections were based on the 1990 U.S. Census counts.
5 Shah, B.V., Barnwell, B.G., & Bieler, G.S. (1997). SUDAAN user's manual: Version 7.5. Research Triangle Park, NC: Research Triangle Institute.
This page was last updated on June 03, 2008.