The first line of defense lies in any inaccuracy of the initial review. If the claims reviewers made subjective judgments about the adequacy of the documentation in the claims files, or if the provider is able to supplement the manually reviewed documentation with the missing data, the validity of the deficient samples can be severely compromised under GIGO, the axiom of “garbage in, garbage out.”

The second line of defense relates to “due process,” or procedural fairness requirements governing the validity and accuracy of the methodology used by the agency in determining the sample to be extrapolated. This is not the same as being “processed duly.” The agency utilizing statistical sampling techniques has the burden of establishing that the sample developed is in fact random and statistically valid. In the seminal case of *Chaves County Home Health Services v. Sullivan*, 931 F.2d 914 (D.C. Cir. 1991), the District of Columbia Circuit Court of Appeals held that the use of statistical sampling techniques was not in and of itself a violation of due process of law “in light of fairly low risk of error so long as the extrapolation is made from a representative sample and is statistically significant.” *Chaves* at 922.

Is the sample truly representative, and is the extrapolation statistically significant? There are two general sources of guidance as to the representative accuracy of the sample and the statistical significance of the extrapolation, and both really relate to the precision tolerances of the sample. The first source emanates from the rules and procedures adopted by the government. The second derives from generally accepted standards within those disciplines regularly engaged in the practice of statistical analysis. HCFA originally published its own Sampling Guidelines Appendix (“SGA”) in the Medicare Carriers Manual setting out minimum standards to assure the integrity of the sample. The SGA identified the basic sampling unit as “a service, a bill, or a beneficiary for a particular period of time.”

The SGA identified a number of factors affecting the accuracy of the sample – the time frame of the sample, the size of the sample, the size of the claim amount sought, the stratification of the sample universe, the randomness of the selection, and the complete documentation of the process so as to enable others to reproduce the results. The SGA explicitly recognized the second source: “persons with competence in statistical sampling can provide effective guidance in using more sophisticated techniques which might ensure a better result for the same degree of effort.” It specifically listed Cochran, W.G., *Sampling Techniques*, 2nd edition, New York: John Wiley and Sons, 1963; Hansen, Morris H., William N. Hurwitz and William G. Madow, *Sample Survey Methods and Theory*, New York: John Wiley and Sons, 1953; and Kish, Leslie, *Survey Sampling*, New York: John Wiley and Sons, 1965, as “useful sampling references.”

The effect of compliance with the SGA “minimum standards” was to assure that certain precision standards in the results were achieved. Unfortunately, in practice, oversight agencies tended to ignore the standards or to farm out the sampling process to subcontractors who were unfamiliar with them. This was particularly true with respect to the selection of the sample size to be used. Instead of using statistically significant sample sizes to achieve acceptable accuracy tolerances in the result, there was a tendency to select an arbitrary number of, say, 100 or 200 when the SGA and other authorities might require a minimum of 400. (The actual minimum number required can be determined mathematically and almost always ends up being an odd number like 353 rather than a round number like 200; a round number is almost always an indication of a too-low arbitrary figure chosen because of agency resource limitations.) See In the Case of *American Health Care Services*, HICN 103-01-0077A (2000), before the Social Security Administration, Office of Hearings and Appeals, overturning a $1,248,747.00 overpayment determination. There, a representative of the Office of Inspector General testified that the OIG could not have looked at the required minimum 400-unit sample size due to “lack of audit resources and availability of staff,” and the OIG failed to preserve the sample “frames” and other data to permit a replication of the sampling.
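The point that a statistically derived minimum sample size is rarely a round number can be illustrated with the standard formula for estimating a mean within a given tolerance, with a finite-population correction. This is a minimal sketch; the z-score, tolerance, and universe figures below are hypothetical, not drawn from any actual audit.

```python
import math

def minimum_sample_size(universe_size, std_dev, mean, z=1.96, rel_precision=0.10):
    """Minimum sample size to estimate the mean within +/- rel_precision of its
    value, at the confidence level implied by z, with finite-population correction."""
    e = rel_precision * mean                  # absolute tolerance on the estimate
    n0 = (z * std_dev / e) ** 2               # infinite-population sample size
    n = n0 / (1 + (n0 - 1) / universe_size)   # finite-population correction
    return math.ceil(n)

# Hypothetical universe of 5,000 claims with substantial variability:
n = minimum_sample_size(universe_size=5000, std_dev=80.0, mean=100.0)
print(n)  # → 235, an "odd" number, not an arbitrary round 100 or 200
```

As the text observes, the mathematically required figure almost never lands on a convenient round number; a suspiciously round sample size is itself a red flag.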

Some states have adopted their own standards for statistical sampling, while many have not. The applicability of the federal Medicare standards to state Medicaid recovery actions is nowhere made explicit, but the logic of their use in state actions flows from the fact that both programs involve the recovery of federal funds.

In 2001, HCFA replaced the SGA with PMB-01-01, which eliminated the minimum sample size and sampling detail requirements contained in the SGA. It also suggests that a probability sample and its stated results are “always” valid, a claim unsupported by any professional literature on the subject. This is a prime example of the principle that if you can’t win by playing by the rules (or even not by the rules), just get rid of the rules. The fundamental problem for the government is that the procedure used by the agency must still stand up to due process requirements in order to be upheld, and the effect of this loosey-goosey dilution of precision in the sampling requirements has yet to be determined in the courts.

Sample size and reproducibility are but two factors affecting the accuracy of a representative sample. A representative sample is one from which all bias has been removed. A basic sample is random if every name or thing in the whole group has a mathematically equal chance to be in the sample. The question is how accurately the sample can be taken to represent the whole universe, measured in figures (i.e., “probable error” and “standard error”).
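What “a mathematically equal chance” and reproducibility mean in practice can be sketched in a few lines. The claim frame, identifiers, and seed below are hypothetical; the point is that a documented frame and a documented seed let anyone replicate the draw exactly, which is precisely what the OIG failed to preserve in *American Health Care Services*.

```python
import random

# The sampling frame: every claim in the universe, each with an equal chance.
# (Hypothetical frame of 5,000 claim identifiers.)
frame = [f"TCN-{i:05d}" for i in range(1, 5001)]

seed = 20010315  # documenting the seed is what makes the draw reproducible
rng = random.Random(seed)
sample = rng.sample(frame, k=235)  # simple random sample, without replacement

# Anyone with the preserved frame and seed can replicate the identical sample:
assert random.Random(seed).sample(frame, k=235) == sample
```

Without the preserved frame and seed (or equivalent documentation), the selection cannot be verified as random at all.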

The degree to which the sample universe is homogeneous is an important accuracy factor. The greater the degree of heterogeneity, the more difficult and complicated the process. In a recent audit in Colorado, the state Department of Healthcare Financing utilized sample units described as TCNs (Transaction Control Numbers). These were individual billings for one patient for varying periods, encompassing different units of service. No effort was made to use more homogeneous units or to “stratify” the disparate elements of the units into discrete categories so as to establish greater reliability in the sample accuracy. Further, the state predicated its analysis on “rows” of TCNs. It discarded entire rows of units of service when there was any defect in the documentation of any individual TCN in the row.
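The effect of stratification on a heterogeneous universe can be sketched numerically: splitting disparate units into more homogeneous categories sharply reduces the variability within each category. The dollar figures below are hypothetical illustrations, not the Colorado data.

```python
import statistics

# Hypothetical heterogeneous universe: small routine visits mixed with
# large multi-service episodes, all billed under one kind of unit.
claims = [25, 30, 28, 35, 32, 900, 950, 1100, 27, 33, 980, 29]

# Unstratified: one pooled group with a very large spread.
pooled_sd = statistics.pstdev(claims)

# Stratified: split at a (hypothetical) dollar threshold into two strata.
low = [c for c in claims if c < 500]
high = [c for c in claims if c >= 500]

print(round(pooled_sd, 1), round(statistics.pstdev(low), 1),
      round(statistics.pstdev(high), 1))
# Each stratum's standard deviation is far smaller than the pooled one,
# so estimates made within strata are far more precise.
```

An auditor who ignores an obvious stratification, as described above, is sampling from the noisy pooled distribution instead.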

There were also two huge “outliers” in the selected samples, which alone accounted for twenty percent of the sample claims. The final result was an asymmetrical distribution of the sample. (A normal distribution looks like a bell curve, with the mean being the same value as the median.) The state’s mean of $100.69 diverged sharply from the median of $33.72, reflecting a demonstrable lack of precision in the integrity of the sample as fairly representing the universe of TCNs.
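The mean-versus-median check described above is easy to run on any sample. The claim amounts below are hypothetical, not the Colorado figures, but they show the same signature: a couple of outliers drag the mean far above the median.

```python
import statistics

# Hypothetical sample: many small claims plus two large outliers.
sample = [20, 25, 30, 28, 35, 32, 40, 22, 38, 2500, 3100]

mean = statistics.mean(sample)
median = statistics.median(sample)
print(round(mean, 2), median)
# In a roughly normal (symmetrical) distribution the mean and median are close;
# a mean several times the median signals a skewed, outlier-dominated sample.
```

A defense expert can make this comparison from the agency's own sample data in minutes.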

Government agencies will sometimes rely on concepts like the “central limit theorem” to compensate for the lack of stratification or homogeneity in the sample. The theorem provides that when you are averaging over different elements, as you average over more and more of them, even though the distribution of those elements may be of any sort, the averages become closer and closer to a normal or “Gaussian” distribution – a standard distribution. The problem is that it takes a very large sample to reach the standard distribution, and the argument is therefore circular.
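The circularity can be seen in a small simulation: averages of draws from a skewed distribution only settle toward a symmetrical, normal shape as each average covers many elements. This is a sketch with hypothetical parameters (an exponential distribution standing in for any skewed claims distribution).

```python
import random
import statistics

rng = random.Random(42)  # seeded so the simulation is reproducible

def sample_means(n_per_mean, n_means=2000):
    """Means of n_per_mean draws from a highly skewed (exponential) distribution."""
    return [statistics.mean(rng.expovariate(1.0) for _ in range(n_per_mean))
            for _ in range(n_means)]

# How skewed do the averages remain as more elements are averaged?
gaps = {}
for n in (2, 10, 100):
    means = sample_means(n)
    gaps[n] = statistics.mean(means) - statistics.median(means)  # crude skewness measure
    print(n, round(gaps[n], 3))
# The mean-median gap shrinks toward zero only as each average covers
# more and more elements, which is exactly the large sample the agency
# failed to take in the first place.
```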

It is amazing how infrequently agencies calculate the coefficient of variation (“COV”) of the sample, which is a mathematical measure of the imprecision of the sample: the higher the value, the more imprecise the sample. The COV is the best overall measure of the validity of the sample. In order to achieve improvement in the tolerances as measured by the COV, the agency must adjust for outliers, stratify sample categories, and/or increase the sample size.
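The COV computation itself is trivial, which makes its frequent omission all the more striking. A sketch with hypothetical sample values, showing how adjusting for outliers tightens the measure:

```python
import statistics

def coefficient_of_variation(values):
    """Standard deviation as a fraction of the mean: higher means less precise."""
    return statistics.stdev(values) / statistics.mean(values)

# Hypothetical outlier-laden sample, versus the same sample with the
# outliers pulled out for separate, individual review.
sample = [20, 25, 30, 28, 35, 32, 40, 22, 38, 2500, 3100]
trimmed = [v for v in sample if v < 500]

print(round(coefficient_of_variation(sample), 2))   # very imprecise
print(round(coefficient_of_variation(trimmed), 2))  # far tighter
```

An agency that never computes this number has no quantitative basis for claiming its sample is precise.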

Once statistically acceptable precision in the sample is established, there are a number of methods of extrapolation that can be applied to reach a representative amount – the choice of method is generally not statistically significant in recovery actions unless clerical errors exist.
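One common approach, mean-per-unit extrapolation, can be sketched in a few lines: the average overpayment found in the sample is projected across the whole universe of claims. All figures below are hypothetical.

```python
# Mean-per-unit extrapolation (hypothetical figures): the average overpayment
# found per sampled claim, projected across the entire claims universe.
sample_overpayments = [0, 45.50, 0, 120.00, 0, 33.25, 78.10, 0]  # per sampled claim
universe_size = 5000

mean_overpayment = sum(sample_overpayments) / len(sample_overpayments)
extrapolated_total = mean_overpayment * universe_size
print(round(extrapolated_total, 2))
```

The arithmetic is simple; the fight, as the surrounding text makes clear, is over whether the sample feeding it was valid in the first place.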

Despite the perception of mathematical unassailability, overpayment and fraud determinations developed through statistical sampling are not always developed with sufficient care and precision to overcome the basic constraints of due process of law and fundamental fairness. There is almost always room to develop a formidable defense in statistical sampling recoupment actions.
