I am currently using 10-fold cross-validation for model selection with data
that is clustered in 91 groups. The models I am using employ either
cluster-robust standard errors or random intercepts.
I have found that the random assignment of observations to folds can
substantially influence the prediction errors and even change which models
have the lowest prediction errors.
I am wondering if anyone knows the correct procedure for running k-fold
cross-validation when the grouping of the observations matters. I am
considering generating ten different fold assignments and then comparing
the means and standard deviations of the resulting prediction errors.
Unfortunately, this would be a very computationally intensive process and I
am hoping for a more efficient procedure.
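For what it's worth, the procedure described above can be sketched roughly as follows. This is only an illustration, not your actual setup: fold assignment is done at the cluster level (so no cluster is split across training and test sets), the CV is repeated over several random assignments, and the prediction errors are summarized by their mean and standard deviation. The `fit_predict` argument is a hypothetical stand-in for whichever model is being compared.

```python
import numpy as np

def cluster_folds(groups, k, rng):
    """Assign each cluster (not each observation) to one of k folds,
    so correlated observations never straddle train and test sets."""
    ids = rng.permutation(np.unique(groups))
    fold_of = {g: i % k for i, g in enumerate(ids)}
    return np.array([fold_of[g] for g in groups])

def repeated_grouped_cv(X, y, groups, fit_predict, k=10, n_repeats=10, seed=0):
    """Run cluster-level k-fold CV n_repeats times with different random
    fold assignments; return the mean and SD of the CV prediction errors."""
    errors = []
    for r in range(n_repeats):
        rng = np.random.default_rng(seed + r)
        folds = cluster_folds(groups, k, rng)
        sq_err = np.empty(len(y))
        for f in range(k):
            test = folds == f
            # fit on the training clusters, predict the held-out clusters
            yhat = fit_predict(X[~test], y[~test], X[test])
            sq_err[test] = (y[test] - yhat) ** 2
        errors.append(sq_err.mean())  # CV mean squared error for this repeat
    errors = np.array(errors)
    return errors.mean(), errors.std(ddof=1)
```

The repeats are independent, so if the computational cost is the concern they parallelize trivially; the SD across repeats directly quantifies how sensitive the error estimate is to the fold assignment.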
Note: Increasing k seems to increase the variability of the prediction
errors from one set of fold assignments to another.
Thank you for your input.
Anthony
-----------------------------------------------
Anthony A. Pezzola
[log in to unmask]
(02) 354-7823
Professor of Political Science
Instituto de Ciencia Política
Pontificia Universidad Católica de Chile
Santiago de Chile
**********************************************************
Political Methodology E-Mail List
Editors: Melanie Goodrich, <[log in to unmask]>
Delia Bailey, <[log in to unmask]>
**********************************************************
Send messages to [log in to unmask]
To join the list, cancel your subscription, or modify
your subscription settings visit:
http://polmeth.wustl.edu/polmeth.php
**********************************************************