National and State Estimates of Drug Abuse Treatment Gap 

Appendix B: State Estimation Methodology

B.1 Background

In response to the need for State-level information on substance abuse problems, the Substance Abuse and Mental Health Services Administration (SAMHSA) began developing and testing small area estimation (SAE) methods for the National Household Survey on Drug Abuse (NHSDA) in 1994 under a contract with RTI of Research Triangle Park, North Carolina. That developmental work used logistic regression models with data from the combined 1991 to 1993 NHSDAs and local area indicators, such as drug-related arrests, alcohol-related death rates, and block group/tract-level characteristics from the 1990 Census that were found to be associated with substance abuse. In 1996, the results were published for 25 States for which there were sufficient sample data (Office of Applied Studies [OAS], 1996). A subsequent report described the methodology in detail and noted areas in which improvements were needed (Folsom & Judkins, 1997).

The increasing need for State-level estimates of substance use led to the decision to expand the NHSDA to provide estimates for all 50 States and the District of Columbia on an annual basis beginning in 1999. It was determined that, with the use of modeling similar to that used with the 1991 to 1993 NHSDA data in conjunction with a sample designed for State-level estimation, a sample of about 67,500 persons would be sufficient to make reasonably precise estimates.

The State-based NHSDA sample design implemented in 1999 had the following characteristics:

In preparation for the modeling of the 1999 data, RTI used the data from the combined 1994-96 NHSDAs to develop an improved methodology that utilized more local area data and produced better estimates of the accuracy of the State estimates (Folsom, Shah, & Vaish, 1999). That effort involved the development of procedures that would validate the results for geographic areas with large samples. This work was reviewed by a panel with expertise in small area estimation.3 They approved of the methodology, but suggested further improvements for the modeling to be used to produce the 1999 State estimates. Those improvements have been incorporated into the methodology finally used for the State estimates included in this report. The methodology, called Survey-Weighted Hierarchical Bayes Estimation (HB), is described below.

B.2 Goals of Modeling

There were several goals underlying the estimation process. The first was to model substance use-related rates at the lowest possible level and aggregate over the levels to form the State estimates. The chosen level was the set of 32 cells formed by crossing four age groups (12 to 17, 18 to 25, 26 to 34, 35 or older), four race/ethnicity categories (white, not Hispanic; black, not Hispanic; Hispanic; other), and gender within each block group. Estimated population counts for each of the 32 cells in each block group are obtained from a private vendor. This level of aggregation was desired because the first stage of NHSDA sample selection was at the block group level, so there would be data at this level to fit a model. In addition, a great deal of information from the Census was available at the block group level for use as predictors in the models. If substance use-related rates could be estimated for each of the 32 cells at the block group level, it would only be necessary to multiply by the estimated population counts and aggregate to the State level.
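The multiply-and-aggregate step can be sketched as follows. The rates, counts, and data structure here are illustrative stand-ins, not NHSDA values.

```python
# Hypothetical sketch of the aggregation described above: model-based
# rates for the age/race/gender cells within each block group are
# weighted by projected population counts and summed to a higher level.
# All names and numbers are made up for illustration.

def state_estimate(block_groups):
    """block_groups: list of dicts, each with per-cell 'rate' and 'pop'."""
    users = sum(cell["rate"] * cell["pop"]
                for bg in block_groups
                for cell in bg["cells"])
    pop = sum(cell["pop"] for bg in block_groups for cell in bg["cells"])
    return users / pop  # State-level prevalence rate

bgs = [
    {"cells": [{"rate": 0.05, "pop": 120}, {"rate": 0.02, "pop": 300}]},
    {"cells": [{"rate": 0.04, "pop": 200}, {"rate": 0.01, "pop": 380}]},
]
print(round(state_estimate(bgs), 4))  # → 0.0238
```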

Another goal of the estimation process was to include the sampling weight in the model in such a way that the small area estimates would converge to the design-based (sample-weighted) estimate when they are aggregated to a sufficient sample size. There was a desire for the estimates to have this characteristic so that there would be consistency with the survey-weighted national estimates based on the entire sample.

A third goal was to include as much local source data as possible, especially data related to each substance use measure. This would help provide a better fit beyond the strictly sociodemographic information. The desire was to use national sources of these data so that there would be consistency of collection and estimation methodology across States.

Recognizing that estimates based solely on these "fixed" effects would not reflect differences across States due to differences in laws, enforcement activities, advertising campaigns, outreach activities, and other such unique State contributions, a fourth goal was to include "random" effects to compensate for these differences. The types of random effects that could be supported by NHSDA data were a function of the sample size and the model fit to the sample data. For the 1999 survey, random effects were included at the State level and for substate regions composed of three (typically neighboring) FI regions. Although this grouping of three FI regions was principally motivated by the need to accumulate enough sample to support good model fitting for the low prevalence NHSDA outcomes, it was also reasoned that it would be possible to produce substate HB estimates for areas composed of these FI region groups once 2 or 3 years of NHSDA data were available, because that would yield substate region samples of at least 400 respondents. For substate areas like counties and large municipalities that do not conform to the substate region boundaries, HB estimates could be derived from their elemental block group-level contributions, but the direct survey data employed in the estimation of the associated substate region effects would not be restricted to the county or city of interest. This mismatch of FI region and county/large municipality boundaries weakens the theoretical appeal of the associated HB estimate. For this reason, substate HB estimates probably should be restricted to areas that can be matched reasonably well to FI region groups.

One of the difficulties of typical SAE has been obtaining good estimates of the accuracy of the estimates with prediction intervals that give a good representation of the true probability of coverage of the intervals. Therefore, the final major goal was to provide accurate prediction intervals—ones that would approach the usual sample-based intervals as the sample size increases.

B.3 Predictors Used in Logistic Regression Models

Local area data used as potential predictor variables in the logistic regression models were obtained from several sources, including Claritas, the Census Bureau, the FBI (Uniform Crime Reports), Health Resources and Services Administration (Area Resource File), SAMHSA (Uniform Facility Data Set), and the National Center for Health Statistics (mortality data). The list of sources and the actual variables that were selected as independent predictors for each age group for the estimation of the treatment gap are provided below.

B.3.1 Sources of Data

B.3.2 Predictor Variables in Final Model, by Age Group

Age Group 1 (Ages 12 to 17)

Age Group 2 (Ages 18 to 25)

Age Group 3 (Ages 26 to 34)

Age Group 4 (Ages 35 or Older)

B.4 Method of Selecting Independent Variables for the Models

For the 1999 SAE exercise, independent variables for modeling each of the substance use measures were first identified by a CHAID (Chi-squared Automatic Interaction Detector) algorithm. CHAID does not use sample weights. Prior to this process, all the continuous variables were categorized using deciles and were treated as ordinal in CHAID. Region was treated as a nominal categorical variable in CHAID. Significant independent variables from each model that were final nodes in the tree-growing process were identified as indicator variables destined for inclusion at a later step.

Independently, a SAS stepwise logistic regression model was fit for each dependent variable by age group. The SAS stepwise procedure was used because it could quickly run all of the variables for all of the models, although it was recognized that the software would not take into account the complex sample design and the weights. The independent variables included all the first-order or linear polynomial trend contrasts across the 10 levels of the categorized variables, as well as the gender, region, and race variables. Significant variables (at the 3 percent level) were identified from this process. Based on this list, another list of variables was created that included the second- and third-order polynomials and the interactions of the first-order polynomials with the gender, race, and region variables.

Next, the variables from the CHAID process and the SAS process were entered into a SAS stepwise logistic model at the 1 percent significance level. Because of past concerns about overfitting of the data in earlier estimation using the 1991 to 1993 NHSDA data, the significance levels were made quite stringent. These variables were then entered into a SUDAAN logistic regression model because the SUDAAN software would adjust for the effects of the weights and other aspects of the complex sample design. All variables that were still significant at the 1 percent significance level were entered into the survey-weighted HB process.
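As an illustration of the first screening step (not the actual CHAID/SAS/SUDAAN code), the sketch below categorizes a hypothetical continuous predictor into deciles and builds the linear trend contrast across the ten ordered levels:

```python
# Illustrative sketch: decile categorization of a continuous predictor
# and the first-order (linear) trend contrast across the 10 levels, as
# described above. The predictor values are simulated, not NHSDA data.
import random

random.seed(0)
x = [random.gauss(50, 10) for _ in range(1000)]  # hypothetical predictor

# Decile cut points from the empirical distribution
xs = sorted(x)
cuts = [xs[int(len(xs) * k / 10)] for k in range(1, 10)]

def decile(v):
    """Return the decile category 1..10 for value v."""
    return 1 + sum(v > c for c in cuts)

# Linear trend contrast across the 10 ordered levels, centered so the
# coefficients (-4.5, -3.5, ..., 4.5) sum to zero
linear = {d: d - 5.5 for d in range(1, 11)}
scores = [linear[decile(v)] for v in x]
print(min(scores), max(scores))  # → -4.5 4.5
```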

Independently, a factor-analytic approach was used to determine the important variables to include in the model. This approach would allow the data to self-identify the important dimensions. The concern here was to use an alternative method that would have a certain face validity. That method was utilized to identify an independent set of variables that were then processed through the HB estimation. The results, however, in terms of model fit and prediction intervals were generally not as good as with the CHAID/SAS/SUDAAN screening process for candidate independent variables. Also, the factor-analytic approach involved an inherently subjective step to attribute names to the various factor loadings, and the interest was more in the predictive ability of variables than in a substantive description of the dimensions. Nevertheless, it was encouraging that the two approaches gave reasonably similar results. For these reasons, the estimates in this report are based on the method that started with the CHAID process.

To select variables for the 2000 treatment gap model, an alternative to the 1999 approach was also implemented. This alternative, designed to further reduce the risk of overfitting, involved splitting the 2000 sample into two halves with the 7,200 sample area segments (block clusters) used as sampling units for the splitting. One of those half-samples was designated the training sample, and its complement was assigned the role of validation sample. The 1999 variable selection strategy was then applied to the training sample with a less stringent 10 percent significance level for retaining variables. Note that with a sample size one-half as large, the training sample would yield standard errors for the logistic regression coefficients that were expected to be inflated by a factor of 1.4. Therefore, a training sample significance level of 7 percent would be expected to yield a significance level of 1 percent in the full sample. The 10 percent level was chosen for the training sample after trying several alternatives. Once the variables were chosen using the training sample, the model was refit on the validation sample and variables that were not significant at the 10 percent level were dropped. The two alternative models resulting from the 1999 variable selection method and the new 2000 alternative were both subjected to the internal benchmarking validation exercise described later in this appendix (Section B.7). The new method produced small area estimates that were noticeably less biased for the 26 or older age groups and the 12 or older age groups. Based on this result, the alternative set of predictor variables was chosen.
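The significance-level correspondence claimed above can be checked under a normal approximation; the critical value and functions below are standard statistical quantities, not taken from the report's software:

```python
# Check of the claim above: with standard errors inflated by sqrt(2) in
# the half sample, a coefficient just significant at the 1 percent level
# in the full sample lands at roughly the 7 percent level in the
# training sample (normal approximation).
import math

def two_sided_p(z):
    """Two-sided normal p-value for test statistic z."""
    return 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))

z_full = 2.576                   # critical value at the 1 percent level
z_half = z_full / math.sqrt(2)   # same coefficient, SE inflated by sqrt(2)
print(round(two_sided_p(z_half), 2))  # → 0.07, i.e. the ~7 percent level
```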

B.5 General Model Description

The model can be characterized as a complex mixed model (including both fixed and random effects) of the form:

η = Xα + ZU

Each of the symbols represents a matrix or vector. The leading term Xα is the usual (fixed) regression contribution, and ZU represents random effects for the States and FI region groups that the data will support and for which estimates are desired. Not obvious from the notation is that the form of the model is a logistic model used to estimate dichotomous data. The vector η has elements ln[π_ijk/(1 − π_ijk)], where π_ijk is the propensity for the kth person in the jth FI composite region in the ith State to engage in the behavior of interest (e.g., to use marijuana in the past month). Also not obvious from the notation is that the model fitting utilizes the final "sample" weights as discussed above. The "sample" weights have been adjusted for nonresponse and poststratified to known Census counts.

The estimate for each State behaves like a "weighted" average of the direct survey estimate in that State and the predicted value based on the national regression model. The "weights" in this case are functions of the relative precision of the sample-based estimate for the State and the predicted estimate based on the national model. The eight large States have large samples, and thus more "weight" is given to the sample estimate relative to the model-based regression estimate. The 42 small States and the District of Columbia put relatively more "weight" on the regression estimate because of their smaller samples. The national regression estimate actually uses national parameters that are based on the full sample of approximately 72,000 persons; however, the regression estimate for a specific State is based on applying the national regression parameters to that State's "local" county, block group, and tract-level predictor variables and summing to the State level. Therefore, even the national regression component of the estimate for a State includes "local" State data.
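The "weighted average" behavior can be sketched as a precision-weighted composite; the estimates and variances below are made-up illustrations, not NHSDA quantities:

```python
# Hypothetical illustration of the shrinkage described above: the
# composite estimate combines the direct survey estimate and the
# model-based prediction in proportion to their precisions (inverse
# variances). All values are invented.

def composite(direct, var_direct, model, var_model):
    w = (1 / var_direct) / (1 / var_direct + 1 / var_model)
    return w * direct + (1 - w) * model

# Large-State case: small direct variance, so the composite stays near
# the direct survey value.
print(round(composite(5.0, 0.1, 4.0, 0.4), 2))  # → 4.8
# Small-State case: large direct variance, so the composite is pulled
# toward the model-based regression value.
print(round(composite(5.0, 0.8, 4.0, 0.2), 2))  # → 4.2
```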

The goal then was to come up with the best estimates of α and U. This would lead to the best estimates of η, which would in turn lead to the best estimates of π. Once the best estimate of π for each block group and each age/race/gender cell within a block group has been obtained, the results could be weighted by the projected Census population counts at that level to make estimates for any geographic area larger than a block group.

B.6 Implementation of Modeling

Solving the model equation in the above section is not straightforward; it involves a series of iterative steps to generate values of the desired fixed and random effects from the underlying joint distribution. The details of the technique will be described in a methodological report currently in progress. In the interim, the basic process can be described as follows.

Let α denote the matrix of fixed effects, u_i denote the matrix of State random effects (i = 1, ..., 51), and v_ij denote the matrix of FI composite region effects j within State i. Because the goal is to estimate separate models for four age groups, it is assumed that the random effect vectors are four-variate normal with null mean vectors and 4×4 covariance matrices D_u and D_v, respectively. To estimate the individual effects, a Bayesian approach is used to represent the joint density function given the data by f(α, u, v, D_v, D_u | y). According to the Bayes process, this can be estimated once the conditional distributions are known:

f1(α | u, v, D_v, D_u, y),  f2(D_v, D_u | α, u, v, y),  and f3(u, v | α, D_v, D_u, y).

To generate random draws from these distributions, Markov Chain Monte Carlo (MCMC) processes need to be used. These are a body of methods for generating pseudo-random draws from probability distributions via Markov chains. A Markov chain is fully specified by its starting distribution P(X0) and the transition kernel P(Xt |Xt-1).

Each MCMC step that involves the vector of binary outcome variables y in the conditioning set needs first to be modified by defining a pseudo-likelihood using survey weights. In defining pseudo-likelihood, weights are introduced after scaling them to the effective sample size based on a suitable design effect. Note that with the pseudo-likelihood, the covariance matrix of the pseudo-score functions is no longer equal to the pseudo-information matrix; therefore, a sandwich-type of covariance matrix was used to compute the design effect. In this process, weights are largely assumed to be noninformative (i.e., unrelated to the outcome variable y). The assumption of noninformative weights is useful in finding tractable expressions for the appropriate information matrix of the pseudo-score functions. The pseudo-log-likelihood remains an unbiased estimate of the finite-population log-likelihood regardless of this assumption.
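The scaling of weights to the effective sample size can be illustrated with the Kish unequal-weighting design effect; the report does not name its exact design-effect estimator, so the formula below is an assumption for illustration:

```python
# Sketch of the weight scaling described above, assuming the Kish
# unequal-weighting design effect (an assumption; the report does not
# specify its exact design-effect estimator).

def scale_to_effective_n(weights):
    """Rescale survey weights so they sum to the effective sample size."""
    n = len(weights)
    sw = sum(weights)
    sw2 = sum(w * w for w in weights)
    deff = n * sw2 / (sw * sw)   # Kish unequal-weighting design effect
    n_eff = n / deff             # effective sample size
    return [w * n_eff / sw for w in weights]

w = [1.0, 2.0, 3.0, 4.0]
scaled = scale_to_effective_n(w)
print(round(sum(scaled), 4))  # → 3.3333, the effective n rather than n = 4
```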

    Step I [α | u, v, y] (note that this does not depend on D_u, D_v)

With a flat prior for α, the conditional posterior is proportional to the pseudo-likelihood function. For large samples, this posterior can be approximated by the multivariate normal distribution with mean vector equal to the pseudo-maximum likelihood estimate and with asymptotic covariance matrix having the associated sandwich form. Assuming that the survey weights are noninformative makes the age group specific α vectors conditionally independent of each other. Therefore, the α vectors can be updated separately at each MCMC cycle.

    Step II [u_i | α, v_i, D_u, y] (this does not depend on D_v)

Here, the conditional posterior is proportional to the product of the prior g(u_i | ·) and the pseudo-likelihood function f(y | ·), as well as the prior p(α, D_u); this last prior can be omitted because it does not involve u_i. Calculating the denominator (the normalization constant) of the posterior distribution for u_i requires multidimensional integration and is numerically intractable. To get around this problem, the Metropolis-Hastings (M-H) algorithm is used, which requires a dominating density convenient for Monte Carlo sampling. For this purpose, the mode and curvature of the conditional posterior distribution are used; these can be obtained directly from its numerator. A Gaussian distribution with matching mode and curvature then defines the dominating density for M-H. As with the age group specific α parameters, the State-specific random effect vectors u_i are conditionally independent of each other and can be updated separately at each MCMC cycle.
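A minimal one-dimensional sketch of this kind of update is shown below, using an independence Metropolis-Hastings step with a Gaussian dominating density centered at the posterior mode; the target is a toy log-posterior, not the NHSDA pseudo-likelihood:

```python
# Toy independence Metropolis-Hastings step in the spirit of the update
# above: a Gaussian dominating density matched to the posterior mode and
# made slightly overdispersed. Everything here is illustrative.
import math
import random

random.seed(1)

def log_post(u):
    """Toy unnormalized log-posterior: Normal(1, 0.5^2)."""
    return -0.5 * (u - 1.0) ** 2 / 0.25

mode, sd = 1.0, 0.7  # matched mode; proposal spread chosen a bit wide

def log_q(x):
    """Log of the Gaussian dominating density, up to a constant."""
    return -0.5 * (x - mode) ** 2 / sd ** 2

def mh_step(u):
    prop = random.gauss(mode, sd)
    # Independence M-H ratio: target ratio times reversed proposal ratio
    log_r = log_post(prop) - log_post(u) + log_q(u) - log_q(prop)
    if log_r >= 0 or random.random() < math.exp(log_r):
        return prop
    return u

u, draws = 0.0, []
for _ in range(5000):
    u = mh_step(u)
    draws.append(u)
print(round(sum(draws) / len(draws), 2))  # posterior mean, near 1.0
```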

    Step III [v_ij | α, u_i, D_v, y] (this does not depend on D_u)

    Similar to step II.

    Step IV [D_u | u], [D_v | v] (here, u and v include all the information from y)

Here, the pseudo-likelihood involving design weights comes in implicitly through the conditioning parameters u and v evaluated at the current cycle. An exact conditional posterior distribution is obtained because the inverse Wishart priors for D_u and D_v are conjugate.
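Under assumed prior hyperparameters (the report does not state the ones used), the conjugate update can be sketched as follows: with random effect vectors u_i distributed N(0, D) and prior D ~ InvWishart(nu0, Psi0), the full conditional for D is InvWishart(nu0 + m, Psi0 + sum of u_i u_i-transpose).

```python
# Sketch of the conjugate inverse Wishart update in Step IV, with
# invented prior hyperparameters and 2-dimensional effect vectors
# (4-dimensional in the actual model, one component per age group).

def iw_posterior(nu0, Psi0, effects):
    """Return posterior (nu, Psi) given a list of random-effect vectors."""
    p = len(Psi0)
    # Scatter matrix: sum over effects of the outer product u u^T
    S = [[sum(u[a] * u[b] for u in effects) for b in range(p)]
         for a in range(p)]
    Psi = [[Psi0[a][b] + S[a][b] for b in range(p)] for a in range(p)]
    return nu0 + len(effects), Psi

effects = [[0.2, -0.1], [0.4, 0.3]]  # two hypothetical State effects
nu, Psi = iw_posterior(5, [[1.0, 0.0], [0.0, 1.0]], effects)
print(nu, [[round(x, 2) for x in row] for row in Psi])
# → 7 [[1.2, 0.1], [0.1, 1.1]]
```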

Remarks

  • In the NHSDA application, three FI regions were combined to form a minimum of four substate regions with corresponding random effects. This was done to ensure adequate sample sizes for estimation purposes.

  • There is self-calibration built into the modeling. This is achieved via design effect scaling of the survey weights incorporated in the conditional posterior density, so that small area estimates for large States become asymptotically equivalent to the direct estimates. Similarly, survey-weighted estimates of the fixed parameters (in particular, the intercept) calibrate the aggregate of small area estimates to the national direct estimate.

  • For posterior variance estimation purposes, the survey weights were largely assumed to be noninformative. The survey design effects on the posterior variance are therefore restricted to unequal weighting effects. It was assumed that all the design-related clustering effects are represented by between-State and between-substate (within State) variability of random effects. This does not take care of variability at lower levels of clustering. However, sample size is not sufficient at lower levels to support stable estimates of random effects for area segments.

  • If the logistic mixed model fits well, the variance estimates should be reasonable. The self-calibration property provides some protection against model breakdown. Research is currently under way to develop a new MCMC algorithm that fully accounts for survey design effects on the small area estimate posterior prediction intervals.

B.7 Validation and Other Results

The following validation methodology was implemented at the time of the estimation of the 2000 percentage treatment gap and is specific to this measure. Validation was also conducted earlier at the time of the first release of the 1999 NHSDA data (OAS, 2000) and was based on the seven variables discussed in that report. Subsequently, an error in the imputation program was discovered and corrected, and the corrected file was used for the validation of the treatment gap estimation. Further information about the impact of the error on the previously released data from the 1999 NHSDA is provided in the 2000 Summary of Findings (OAS, 2001).

To validate the fit of the SAE models, the eight large sample States were used as internal benchmarks. For this purpose, 6 pseudo-FI regions were created within each large sample State by pooling the 48 initial regions into 6 groups of 8. Each of these 6 pseudo-FI regions was then expected to have 16 area segments per calendar quarter. For each of these pseudo-FI region-by-quarter sets of 16 area segments, any segments devoid of interviews were first randomly replaced by a selection from the non-empty segments in the set. The completed set of 16 segments from each pseudo-FI region-by-quarter combination was then randomly partitioned into 8 replicates of 2 segments each. When combined, each pair of large sample States had 12 pseudo-FI regions. By pooling one segment pair from each of the 48 pseudo-FI region-by-quarter combinations, 8 substate replicates were formed. Each of these 8 substate replicates mimicked the size and design structure of a small sample State.
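The random partitioning of one pseudo-FI region-by-quarter set can be sketched as follows (the segment IDs are made up):

```python
# Illustrative sketch of the partitioning described above: the 16 area
# segments in one pseudo-FI region-by-quarter set are randomly split
# into 8 replicates of 2 segments each.
import random

random.seed(42)
segments = [f"seg{i:02d}" for i in range(16)]  # hypothetical segment IDs
random.shuffle(segments)
replicates = [segments[i:i + 2] for i in range(0, 16, 2)]
print(len(replicates), [len(r) for r in replicates])
# → 8 [2, 2, 2, 2, 2, 2, 2, 2]
```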

Having created 8 pseudo-small State samples and associated universe-level files for each of the 4 paired States, SAEs were then produced for the 32 pseudo-States. Table B.4 shows these 32 substate SAEs and their direct survey-weighted analogs for the percentage treatment gap.4 Relative absolute biases of the substate estimates are shown where the full State sample direct estimate is used as the benchmark value.

The State-specific relative absolute bias (RB) quantities in Table B.4 equal the absolute differences of the averaged eight substate small area estimates and the State full sample design-based benchmark (e.g., California and Texas) divided by the benchmark. The average relative absolute bias (ARB) is the simple average of the RBs across the four combined-State pairs. The average relative bias across the 32 pseudo-States was only about 4 percent. This implies that, on average, for a pseudo-State (similar in design and sample size to the 42 small States and the District of Columbia) with an estimated 2 percent treatment gap, the estimate would be within about 0.08 percentage points of the benchmark value.
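The closing arithmetic can be verified directly:

```python
# Worked arithmetic for the statement above: a 4 percent average
# relative bias applied to an estimated 2 percent treatment gap
# corresponds to about 0.08 percentage points.
estimate = 2.0      # estimated treatment gap, in percent
rel_bias = 0.04     # average relative absolute bias
print(estimate * rel_bias)  # → 0.08 (percentage points)
```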

To compare the overall precision of the small area estimates with the direct survey estimates, the 95 percent Bayes prediction intervals, which fully account for the posterior variance of the fixed and random effect parameters, were compared with the corresponding direct survey 95 percent confidence intervals. These results are displayed in Table B.5.

The SAE and direct intervals are summarized by showing average ratios of the relative interval widths (the interval width for a State divided by the corresponding estimate for that State) by State and overall averages of the ratios across States. The average relative width across the 32 pseudo-States is about 1.80. This indicates generally that the confidence intervals for direct design-based estimates based on the same sample size would be 1.8 times larger than the prediction intervals resulting from the HB approach. The HB estimates are equivalent in precision to a direct estimate based on a sample that is 3.3 times larger. The tables also present the average relative root mean square (RMSQ), a measure that takes into consideration both the (small) bias and the variance in the HB estimation.
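The relationship between the interval-width ratio and the equivalent sample size follows from the 1/sqrt(n) scaling of confidence interval widths:

```python
# Worked arithmetic for the precision comparison above: interval width
# scales with 1/sqrt(n), so matching an interval 1.8 times narrower
# requires a sample about 1.8^2 times larger.
width_ratio = 1.80
sample_factor = width_ratio ** 2
print(round(sample_factor, 2))  # → 3.24, i.e. roughly 3.3 times larger
```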

B.8 Caveats

Table B.1 shows the screening, interview, and overall response rate for each State and the District of Columbia. As mentioned in the text, these variable response rates can be associated with variable levels of nonresponse bias. In addition, there may be varying levels of response bias as a result of underreporting (and sometimes overreporting) use of illicit substances. For 1999 and 2000, the assumption being made is that the biases from these two sources are constant across States so that comparisons among States still hold.

Another possible contributor to bias in the State estimates, and the estimates in general, was the effect of editing and imputation. In developing the editing and imputation process, the desire was to minimize the amount of editing, typically somewhat subjective, and instead let the random imputation process supply any partially missing information. Overall, the percentage of imputed information was quite small for most substances. For example, respondents may have indicated that they used the drug in their lifetime or in the past year, but left blank the question about use in the past month. The method is based on a multivariate imputation in which some demographic and other substance use information from the respondent is used to determine a donor who is similar in those characteristics but has supplied data for the drug in question. Often, information was also available from the partial respondent on the recency of drug use. For many of the records, this auxiliary information was available. For a small portion, no auxiliary information was available, in which case a random donor with similar drug use patterns and demographic characteristics was used.
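A greatly simplified hot-deck sketch of the donor idea described above is given below; the actual NHSDA procedure is multivariate and matches on many more characteristics, and all records here are invented:

```python
# Simplified donor-based (hot-deck) imputation sketch: a respondent with
# a missing past-month item receives the value from a randomly chosen
# donor who matches on age group and past-year use.
import random

random.seed(7)

def impute(recipient, respondents):
    """Fill missing past-month use from a matched donor."""
    donors = [r for r in respondents
              if r["age_group"] == recipient["age_group"]
              and r["past_year"] == recipient["past_year"]
              and r["past_month"] is not None]
    recipient["past_month"] = random.choice(donors)["past_month"]
    return recipient

respondents = [
    {"age_group": 2, "past_year": True, "past_month": True},
    {"age_group": 2, "past_year": True, "past_month": False},
    {"age_group": 3, "past_year": False, "past_month": False},
]
rec = {"age_group": 2, "past_year": True, "past_month": None}
print(impute(rec, respondents)["past_month"] in (True, False))  # → True
```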

 

Table B.1 2000 NHSDA Weighted CAI Screening and Interview Response Rates, by State

State                     Screening       Interview       Overall
                          Response Rate   Response Rate   Response Rate

Total                     92.84           73.93           68.64
Alabama                   95.50           77.98           74.47
Alaska                    95.43           80.24           76.58
Arizona                   92.99           73.78           68.61
Arkansas                  97.19           81.00           78.73
California                90.99           69.50           63.24
Colorado                  94.84           75.26           71.37
Connecticut               89.83           71.36           64.10
Delaware                  92.91           68.25           63.42
District of Columbia      93.50           85.56           80.00
Florida                   94.64           75.73           71.67
Georgia                   92.95           69.76           64.84
Hawaii                    91.95           78.45           72.14
Idaho                     93.94           74.45           69.94
Illinois                  88.71           65.59           58.19
Indiana                   92.62           73.87           68.42
Iowa                      94.78           80.00           75.83
Kansas                    92.28           73.45           67.79
Kentucky                  95.79           84.14           80.59
Louisiana                 95.04           80.81           76.80
Maine                     92.39           78.46           72.49
Maryland                  94.88           76.88           72.94
Massachusetts             89.77           66.45           59.65
Michigan                  93.19           73.18           68.20
Minnesota                 94.66           80.62           76.32
Mississippi               93.60           79.14           74.07
Missouri                  92.25           70.80           65.31
Montana                   94.91           80.21           76.13
Nebraska                  93.13           74.58           69.46
Nevada                    92.08           74.44           68.54
New Hampshire             92.41           75.12           69.42
New Jersey                91.96           66.56           61.21
New Mexico                97.43           80.80           78.72
New York                  88.78           73.73           65.46
North Carolina            94.51           73.19           69.17
North Dakota              94.43           79.46           75.03
Ohio                      94.89           75.79           71.92
Oklahoma                  93.06           74.85           69.66
Oregon                    91.87           73.91           67.90
Pennsylvania              94.37           73.50           69.36
Rhode Island              91.26           74.11           67.63
South Carolina            94.69           77.84           73.71
South Dakota              95.15           76.67           72.95
Tennessee                 90.25           72.45           65.39
Texas                     94.72           78.12           74.00
Utah                      95.11           83.44           79.36
Vermont                   92.62           80.80           74.83
Virginia                  91.44           75.18           68.75
Washington                93.59           75.45           70.61
West Virginia             95.19           78.17           74.41
Wisconsin                 94.33           75.06           70.81
Wyoming                   95.41           76.61           73.09

Source: SAMHSA, Office of Applied Studies, National Household Survey on Drug Abuse, 2000.

 

Table B.2 Estimated Numbers (in Thousands) of Persons Aged 12 or Older, by Age Group and State: 2000

                                       Age Group (Years)
State                     Total        12-17      18-25      26 or Older

Total                     223,280      23,368     28,984     170,927
Alabama                   3,654        371        476        2,807
Alaska                    491          63         71         357
Arizona                   3,866        434        516        2,916
Arkansas                  2,159        225        279        1,655
California                25,736       2,851      3,513      19,371
Colorado                  3,411        358        452        2,601
Connecticut               2,701        257        308        2,136
Delaware                  630          65         79         487
District of Columbia      424          44         58         321
Florida                   12,693       1,178      1,368      10,147
Georgia                   6,354        680        863        4,811
Hawaii                    975          95         115        764
Idaho                     1,083        130        165        789
Illinois                  9,768        998        1,306      7,465
Indiana                   4,949        512        665        3,772
Iowa                      2,390        249        319        1,822
Kansas                    2,155        240        293        1,622
Kentucky                  3,287        329        435        2,524
Louisiana                 3,561        418        519        2,624
Maine                     1,047        103        122        822
Maryland                  4,281        421        510        3,349
Massachusetts             5,119        504        611        4,004
Michigan                  7,918        832        1,032      6,053
Minnesota                 3,954        431        539        2,985
Mississippi               2,270        259        323        1,688
Missouri                  4,534        476        596        3,462
Montana                   776          85         100        591
Nebraska                  1,376        154        189        1,032
Nevada                    1,544        146        184        1,214
New Hampshire             1,007        105        120        782
New Jersey                6,717        629        783        5,305
New Mexico                1,490        174        211        1,105
New York                  14,782       1,476      1,825      11,480
North Carolina            6,365        651        777        4,936
North Dakota              535          62         77         396
Ohio                      9,292        951        1,212      7,129
Oklahoma                  2,744        306        367        2,072
Oregon                    2,827        276        355        2,197
Pennsylvania              10,117       988        1,186      7,943
Rhode Island              821          84         95         642
South Carolina            3,130        326        386        2,418
South Dakota              619          73         88         458
Tennessee                 4,657        464        598        3,595
Texas                     16,057       1,877      2,368      11,813
Utah                      1,715        248        326        1,142
Vermont                   512          55         63         394
Virginia                  5,648        563        691        4,395
Washington                4,784        487        606        3,691
West Virginia             1,553        141        195        1,216
Wisconsin                 4,376        476        590        3,310
Wyoming                   425          49         61         315

Source: SAMHSA, Office of Applied Studies, National Household Survey on Drug Abuse, 2000.

 

Table B.3 Survey Sample Size for Persons Aged 12 or Older, by Age Group and State: 2000

                                       Age Group (Years)
State                     Total        12-17      18-25      26 or Older

Total                     71,764       25,717     22,613     23,434
Alabama                   936          294        337        305
Alaska                    833          294        257        282
Arizona                   927          292        303        332
Arkansas                  960          310        364        286
California                5,022        2,365      1,354      1,303
Colorado                  911          278        298        335
Connecticut               891          299        262        330
Delaware                  928          321        297        310
District of Columbia      918          259        340        319
Florida                   3,478        1,194      1,140      1,144
Georgia                   1,145        520        330        295
Hawaii                    945          309        307        329
Idaho                     894          311        283        300
Illinois                  3,660        1,262      1,128      1,270
Indiana                   1,061        405        353        303
Iowa                      921          284        324        313
Kansas                    897          291        323        283
Kentucky                  1,018        341        345        332
Louisiana                 939          356        278        305
Maine                     901          321        234        346
Maryland                  967          332        317        318
Massachusetts             1,002        378        298        326
Michigan                  3,576        1,234      1,090      1,252
Minnesota                 893          297        306        290
Mississippi               917          309        320        288
Missouri                  893          314        302        277
Montana                   914          276        334        304
Nebraska                  906          311        291        304
Nevada                    925          305        284        336
New Hampshire             883          280        246        357
New Jersey                1,200        553        289        358
New Mexico                874          315        267        292
New York                  3,589        1,160      1,142      1,287
North Carolina            1,043        418        326        299
North Dakota              896          288        320        288
Ohio                      3,678        1,227      1,215      1,236
Oklahoma                  973          303        374        296
Oregon                    864          288        275        301
Pennsylvania              3,997        1,474      1,195      1,328
Rhode Island              950          293        324        333
South Carolina            855          275        269        311
South Dakota              855          289        272        294
Tennessee                 947          367        285        295
Texas                     4,020        1,498      1,307      1,215
Utah                      1,031        362        372        297
Vermont                   981          344        320        317
Virginia                  1,047        437        274        336
Washington                1,006        408        289        309
West Virginia             950          322        286        342
Wisconsin                 1,119        453        312        354
Wyoming                   828          301        255        272

Source: SAMHSA, Office of Applied Studies, National Household Survey on Drug Abuse, 2000.

 

Table B.4 Simulated Substate Prevalence Rates, Relative Absolute Bias, and Root Mean Square for Persons Needing But Not Receiving Treatment for an Illicit Drug Problem in the Past Year: 2000

                                    Needing But Not Receiving Treatment for an Illicit Drug Problem
                                    Total     12-17     18-25     26 or Older
California and Texas SAE            2.18      5.30      4.79      1.21
California and Texas DBE            2.01      5.34      4.95      0.95
CA_TX1                              2.25      4.69      4.20      1.51
CA_TX2                              2.27      6.24      5.17      1.12
CA_TX3                              2.34      5.97      5.09      1.27
CA_TX4                              2.59      5.37      5.71      1.59
CA_TX5                              2.13      5.11      4.93      1.16
CA_TX6                              2.12      4.98      4.69      1.21
CA_TX7                              1.96      4.36      4.43      1.13
CA_TX8                              2.09      5.28      4.20      1.21
RMSQ                               13.75     11.03     10.36     38.26
REL ABS BIAS                       10.46      1.66      3.06     34.08
New York and Florida SAE            1.84      3.90      6.78      0.85
New York and Florida DBE            1.82      3.48      7.04      0.85
NY_FL1                              1.70      3.98      6.23      0.76
NY_FL2                              1.88      4.04      6.94      0.87
NY_FL3                              1.93      4.51      7.36      0.81
NY_FL4                              1.88      3.93      6.69      0.92
NY_FL5                              1.82      3.66      6.30      0.93
NY_FL6                              1.69      4.13      5.80      0.78
NY_FL7                              1.59      3.68      5.37      0.77
NY_FL8                              2.02      3.78      7.52      0.99
RMSQ                                7.37     15.84     12.32      9.65
REL ABS BIAS                        0.37     13.97      7.33      0.94
Ohio and Michigan SAE               1.64      3.84      5.44      0.70
Ohio and Michigan DBE               1.66      4.00      5.59      0.67
OH_MI1                              1.52      3.21      5.34      0.64
OH_MI2                              1.72      3.91      5.27      0.82
OH_MI3                              1.75      3.98      5.49      0.81
OH_MI4                              1.59      4.35      5.00      0.63
OH_MI5                              1.60      4.46      5.16      0.61
OH_MI6                              1.62      3.49      5.73      0.66
OH_MI7                              1.80      3.21      6.01      0.89
OH_MI8                              1.63      3.91      5.25      0.70
RMSQ                                5.31     12.04      6.37     16.34
REL ABS BIAS                        0.37      4.68      3.25      7.16
Pennsylvania and Illinois SAE       1.74      3.40      5.92      0.86
Pennsylvania and Illinois DBE       1.70      3.17      5.84      0.85
PA_IL1                              1.83      3.02      5.67      1.05
PA_IL2                              1.69      4.03      5.05      0.84
PA_IL3                              1.69      3.22      6.47      0.72
PA_IL4                              1.75      3.85      5.74      0.83
PA_IL5                              2.03      4.10      6.98      0.96
PA_IL6                              1.65      3.06      5.47      0.86
PA_IL7                              1.77      3.23      6.85      0.77
PA_IL8                              1.78      3.39      5.23      1.01
RMSQ                                7.65     16.41     11.93     13.63
REL ABS BIAS                        4.11     10.08      1.54      4.09
AVERAGE RMSQ                        8.52     13.83     10.24     19.47
AVERAGE REL ABS BIAS                3.83      7.60      3.79     11.57

Note: Relative Absolute Bias = |Combined State Design-Based Estimate (DBE) - Mean of Eight Substate Small Area Estimates (SAE)|/Combined State Design-Based Estimate.

Note: Root Mean Square (RMSQ) = Sqrt(Mean Squared Differences of Substate Small Area Estimates with Respect to Combined State Design-Based Estimates)/Combined State Design-Based Estimate.

Source: SAMHSA, Office of Applied Studies, National Household Survey on Drug Abuse, 2000.
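The two accuracy measures defined in the notes above can be reproduced directly from the table. The sketch below, assuming the rounded "Total" column values for California and Texas from Table B.4, computes the relative absolute bias and RMSQ; the results differ slightly from the published 10.46 and 13.75 because the table entries are rounded to two decimal places.

```python
import math

# Rounded values from Table B.4, "Total" column (percentages).
dbe = 2.01  # combined California/Texas design-based estimate
saes = [2.25, 2.27, 2.34, 2.59, 2.13, 2.12, 1.96, 2.09]  # eight substate SAEs

# Relative Absolute Bias = |DBE - mean(SAE)| / DBE, as a percentage.
rel_abs_bias = abs(dbe - sum(saes) / len(saes)) / dbe * 100

# RMSQ = sqrt(mean((SAE - DBE)^2)) / DBE, as a percentage.
rmsq = math.sqrt(sum((s - dbe) ** 2 for s in saes) / len(saes)) / dbe * 100

print(f"Relative absolute bias: {rel_abs_bias:.2f}%")  # ~10.39 (published: 10.46)
print(f"RMSQ: {rmsq:.2f}%")                            # ~13.69 (published: 13.75)
```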

 

Table B.5 Ratio of Relative Width of Design-Based Confidence Intervals to Small Area Estimation Prediction Intervals for Persons Needing But Not Receiving Treatment for an Illicit Drug Problem in the Past Year: 2000

                                    Ratio of Relative Width
                                    Total     12-17     18-25     26 or Older
CA_TX1                              1.35      1.60      1.95      1.03
CA_TX2                              1.37      1.13      1.09      5.19
CA_TX3                              1.42      1.38      1.58      1.89
CA_TX4                              1.69      1.61      1.81      1.87
CA_TX5                              1.19      1.15      1.57      2.37
CA_TX6                              1.63      1.13      1.77      2.69
CA_TX7                              1.54      1.43      2.04      2.73
CA_TX8                              1.78      1.41      1.94      3.17
California and Texas                1.37      1.09      1.18      1.83
AVERAGE OVER 8 SUBSTATES            1.50      1.35      1.72      2.62
NY_FL1                              1.39      1.91      1.30      5.42
NY_FL2                              2.57      1.77      1.26      2.99
NY_FL3                              1.42      1.91      1.65      2.57
NY_FL4                              2.96      2.17      1.65      2.89
NY_FL5                              2.42      2.41      1.55      2.74
NY_FL6                              2.00      1.61      1.36      3.55
NY_FL7                              1.54      1.85      1.84      2.62
NY_FL8                              1.73      2.18      1.40      1.88
New York and Florida                1.61      1.27      1.16      1.75
AVERAGE OVER 8 SUBSTATES            2.01      1.97      1.50      3.08
OH_MI1                              2.34      1.74      2.24      5.12
OH_MI2                              2.16      1.90      1.24      2.18
OH_MI3                              2.15      1.30      1.78      2.73
OH_MI4                              2.03      1.70      1.65      5.23
OH_MI5                              1.55      1.17      1.99      *
OH_MI6                              1.59      1.49      1.42      5.48
OH_MI7                              1.84      2.20      1.17      1.73
OH_MI8                              1.55      1.49      1.80      1.17
Ohio and Michigan                   1.37      1.22      1.01      1.42
AVERAGE OVER 8 SUBSTATES            1.90      1.62      1.66      3.38
PA_IL1                              1.63      2.36      1.79      1.23
PA_IL2                              2.52      1.86      2.24      3.75
PA_IL3                              1.74      2.00      1.46      *
PA_IL4                              1.59      1.34      2.12      2.31
PA_IL5                              1.66      1.49      1.29      2.11
PA_IL6                              1.79      2.12      1.32      1.94
PA_IL7                              1.90      2.51      1.37      5.44
PA_IL8                              2.12      1.94      1.49      1.76
Pennsylvania and Illinois           1.48      1.38      1.30      1.36
AVERAGE OVER 8 SUBSTATES            1.87      1.95      1.64      2.65

* Relative width not computed due to design-based estimate of zero.

Note: Relative Width Ratio = (Length of Design-Based Confidence Interval/Design-Based Estimate)/(Length of Small Area Estimate Prediction Interval/Small Area Estimate).

Source: SAMHSA, Office of Applied Studies, National Household Survey on Drug Abuse, 2000.
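The relative width ratio defined in the note above compares interval precision after normalizing each interval's length by its point estimate; a ratio above 1 means the design-based interval is relatively wider than the small area estimation prediction interval. A minimal sketch, using hypothetical interval endpoints rather than values from Table B.5:

```python
def relative_width_ratio(dbe_ci, dbe, sae_pi, sae):
    """(DBE interval length / DBE) divided by (SAE interval length / SAE)."""
    dbe_rel = (dbe_ci[1] - dbe_ci[0]) / dbe
    sae_rel = (sae_pi[1] - sae_pi[0]) / sae
    return dbe_rel / sae_rel

# Hypothetical example: both estimates are 2.0 percent, but the design-based
# confidence interval is 1.0 point wide versus 0.4 for the SAE prediction
# interval, so the SAE interval is 2.5 times tighter in relative terms.
ratio = relative_width_ratio((1.5, 2.5), 2.0, (1.8, 2.2), 2.0)
print(round(ratio, 2))  # 2.5
```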



Appendix B: End Notes

3 The panel included William Bell of the U.S. Bureau of the Census; Partha Lahiri of the University of Nebraska; Balgobin Nandram of Worcester Polytechnic Institute and the National Center for Health Statistics; Wesley Schaible, formerly Associate Commissioner for Research and Evaluation at the Bureau of Labor Statistics; and Alan Zaslavsky of Harvard University. Other attendees involved in the development or discussion were Ralph Folsom, Judith Lessler, Avinash Singh, and Akhil Vaish of RTI and Doug Wright of SAMHSA.

4 The validation results were based on a preliminary model; therefore, the combined State estimates shown in Table B.4 generally will not agree with estimates made by combining the corresponding State estimates from Table 6 or 7 in Chapter 3.


This page was last updated on June 03, 2008.