Three-way ROC surfaces are based on a generalization of dichotomous ROC analysis to three-class diagnostic tests. The discriminatory power of three-class diagnostic tests is measured by the volume under the ROC surface. This measure can be given a probabilistic interpretation similar to the equivalence of the c-index to the area under the ROC curve. This article presents a method to calculate nonparametric estimates of the variance of the volume under the surface using Mann-Whitney U statistics. As a simple extension of this result, it is possible to calculate covariance estimates for the volume under the surface. This allows the statistical comparison of two tests used for diagnostic tasks with three possible outcomes. The formulas derived are validated on synthetic data and applied to a three-class data set of pigmented skin lesions. It is shown that a neural network algorithm trained on clinical data and lesion features performs better than one trained on only the lesion features.
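The probabilistic interpretation mentioned above can be illustrated with a minimal sketch. It assumes the simplest setting, in which the test produces a single scalar score and the three classes are expected to be ordered along it; the article's own setting and estimator may differ. Under that assumption, the volume under the surface can be estimated as the fraction of triples, one case drawn from each class, whose scores fall in the correct order, in direct analogy to the Mann-Whitney form of the area under the ROC curve. The function name `vus_estimate` and the strict treatment of ties are illustrative choices, not taken from the article.

```python
import numpy as np

def vus_estimate(x1, x2, x3):
    """Fraction of triples (a, b, c), one score from each class, with a < b < c.

    Hypothetical helper: assumes scalar scores and an expected class order
    of class 1 < class 2 < class 3.
    """
    # Reshape so that broadcasting enumerates every triple across the classes.
    a = np.asarray(x1, dtype=float)[:, None, None]
    b = np.asarray(x2, dtype=float)[None, :, None]
    c = np.asarray(x3, dtype=float)[None, None, :]
    # Ties are counted as incorrectly ordered (an assumption of this sketch).
    return float(np.mean((a < b) & (b < c)))
```

With perfectly separated classes the estimate is 1, and for a test with no discriminatory power it is expected to be close to 1/6, since only one of the six possible orderings of a triple is correct. The full array of triples is held in memory, so this form suits small samples only.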