What’s the difference between cross-validation and a test set?

Cross-validation reuses your training data in multiple splits to estimate performance and guide model selection. A separate test set is held back until the end to provide a final, unbiased check on the chosen model.
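As a minimal sketch of that division of labor, assuming scikit-learn and its bundled breast-cancer dataset (both illustrative choices, not part of the original answer): the test set is split off first, cross-validation runs only on the remaining training data, and the test set is scored once at the end.

```python
# Assumes scikit-learn is installed; the dataset and model are illustrative.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Hold back 20% as a test set; it stays untouched until the very end.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Cross-validation uses only the training portion and guides model choices.
cv_scores = cross_val_score(model, X_train, y_train, cv=5)
print(f"CV accuracy: {cv_scores.mean():.3f} ± {cv_scores.std():.3f}")

# After choosing the model, fit on all training data and score the test set once.
model.fit(X_train, y_train)
test_score = model.score(X_test, y_test)
print(f"Test accuracy: {test_score:.3f}")
```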

How do I choose the right number of folds (k) for K-Fold?

Common starting points are k = 5 or k = 10 because they often balance stability and compute cost. If training is expensive or the dataset is very large, smaller k can be practical; if data is scarce, slightly larger k can help.

When should I use nested cross-validation?

Use nested cross-validation when you are tuning hyperparameters and comparing models, and you want a less biased estimate of real-world performance. It’s most useful for smaller datasets or high-stakes comparisons, where optimistic validation scores can mislead decisions.

Cross-Validation Strategies in Machine Learning (2026 Guide)

Updated on February 01, 2026 · 5-minute read

Cross-validation is a practical way to estimate how well a model will perform on new, unseen data. Instead of relying on a single train/test split, you evaluate the model across multiple splits and summarize the results.

Used well, it helps you spot overfitting early, compare model choices more fairly, and report performance with more confidence. In 2026, it's also a standard component in many ML workflows, so understanding the trade-offs between strategies is essential.

What cross-validation is and why it matters

Cross-validation repeatedly trains your model on one portion of the data and validates it on another. Each split gives you a score; the full set of scores shows how sensitive your results are to the particular sample you trained on.

This matters because a single split can be lucky or unlucky. Cross-validation reduces “split randomness” and gives a more dependable view of generalization, especially when data is limited.

The core workflow

Most cross-validation strategies follow the same structure. The difference is how you create the splits and what assumptions those splits respect (class balance, time order, groups, and so on).

A reliable workflow looks like this:

Split the dataset into training and validation parts using a strategy that matches the data.
Fit the full pipeline (preprocessing plus model) on the training split only.
Evaluate on the validation split with metrics that match the problem.
Repeat across splits, then summarize results (mean plus variability).
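The four steps above can be written as an explicit loop. This is a minimal sketch, assuming scikit-learn, its bundled breast-cancer dataset, and F1 as the metric (all illustrative choices). In practice `cross_val_score` wraps the same loop.

```python
# Assumes scikit-learn; dataset, model, and metric are illustrative.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# 1. Choose a split strategy that matches the data (binary classification here).
splitter = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

scores = []
for train_idx, val_idx in splitter.split(X, y):
    # 2. Fit the full pipeline (scaling + model) on the training split only.
    pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    pipeline.fit(X[train_idx], y[train_idx])

    # 3. Evaluate on the validation split with a metric suited to the problem.
    preds = pipeline.predict(X[val_idx])
    scores.append(f1_score(y[val_idx], preds))

# 4. Summarize: mean plus variability across folds.
scores = np.array(scores)
print(f"F1 per fold: {np.round(scores, 3)}")
print(f"Mean F1: {scores.mean():.3f} (std {scores.std():.3f})")
```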

A note on pipelines and data leakage

Any preprocessing step that learns from data must be fit inside each training split. That includes scaling, encoding, imputation, feature selection, and target-based transformations.

If you fit preprocessing on the full dataset before splitting, validation scores can become inflated. This is one of the most common reasons cross-validation results look great but fail in production.
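A small experiment shows the effect. This sketch assumes scikit-learn and uses pure random noise, so no honest evaluation should beat chance (about 50% accuracy). Selecting features on the full dataset before splitting typically produces a much higher score than running the same selection inside a pipeline.

```python
# Assumes scikit-learn; the data is synthetic noise with random labels.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
# 60 samples, 2,000 random features, random binary labels: nothing to learn.
X = rng.normal(size=(60, 2000))
y = rng.integers(0, 2, size=60)

# Leaky: feature selection sees every row, including future validation rows.
X_selected = SelectKBest(f_classif, k=20).fit_transform(X, y)
leaky = cross_val_score(LogisticRegression(), X_selected, y, cv=5).mean()

# Correct: feature selection is refit inside each training fold.
pipeline = make_pipeline(SelectKBest(f_classif, k=20), LogisticRegression())
proper = cross_val_score(pipeline, X, y, cv=5).mean()

print(f"Leaky CV accuracy:  {leaky:.2f}")   # typically far above chance
print(f"Proper CV accuracy: {proper:.2f}")  # typically close to 0.5
```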

Strategy 1: K-Fold cross-validation

K-Fold cross-validation divides your dataset into k roughly equal folds. You train k times, each time holding out one fold for validation and training on the other k − 1 folds.

Common choices are k = 5 or k = 10, balancing compute cost and stability. For very large datasets or expensive training, smaller values can be more practical.
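A minimal sketch comparing the two common choices, assuming scikit-learn, its bundled diabetes regression dataset, and a ridge model (illustrative choices). Larger k means more fits, each trained on a larger share of the data.

```python
# Assumes scikit-learn; dataset and model are illustrative.
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score

X, y = load_diabetes(return_X_y=True)
model = Ridge(alpha=1.0)

results = {}
for k in (5, 10):
    # shuffle=True guards against any ordering in how rows were stored.
    kfold = KFold(n_splits=k, shuffle=True, random_state=0)
    scores = cross_val_score(model, X, y, cv=kfold, scoring="r2")
    results[k] = scores
    print(f"k={k:>2}: mean R^2 = {scores.mean():.3f}, std = {scores.std():.3f}")
```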

Stratified K-Fold for classification

For classification, especially with imbalanced classes, use Stratified K-Fold. It keeps class proportions similar across folds, so each validation fold is more representative.

This reduces the risk that one fold contains too few examples of a rare class, which can make your evaluation unstable or misleading.
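The difference is easy to see on sorted, imbalanced labels. This sketch assumes scikit-learn and a toy label vector with 90 negatives followed by 10 positives; it counts positives in each validation fold.

```python
# Assumes scikit-learn; the labels are a toy example.
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Imbalanced labels, sorted as they might be in a raw export.
y = np.array([0] * 90 + [1] * 10)
X = np.zeros((len(y), 1))  # features do not affect how these folds are built

plain = [int(y[val].sum()) for _, val in KFold(n_splits=5).split(X, y)]
stratified = [int(y[val].sum()) for _, val in StratifiedKFold(n_splits=5).split(X, y)]

print("KFold positives per fold:          ", plain)       # [0, 0, 0, 0, 10]
print("StratifiedKFold positives per fold:", stratified)  # [2, 2, 2, 2, 2]
```

Without stratification, four of the five validation folds contain no positives at all, so metrics such as recall cannot be computed on them.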

Repeated K-Fold for more stable estimates

If scores vary a lot from fold to fold, Repeated K-Fold can help. You run K-Fold multiple times with different shuffles, then summarize all scores together.

It increases compute cost, but can produce a more stable estimate when the dataset is small, noisy, or sensitive to sampling.
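A minimal sketch, assuming scikit-learn and its bundled breast-cancer dataset. For classification, `RepeatedStratifiedKFold` combines repetition with stratification; `RepeatedKFold` is the unstratified counterpart.

```python
# Assumes scikit-learn; dataset and model are illustrative.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# 5 folds x 3 repeats = 15 fits; each repeat reshuffles before splitting.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=0)
scores = cross_val_score(model, X, y, cv=cv)

print(f"{len(scores)} scores, mean = {scores.mean():.3f}, std = {scores.std():.3f}")
```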

Strategy 2: Leave-One-Out cross-validation (LOOCV)

Leave-One-Out cross-validation trains the model once per sample. Each run holds out a single data point for validation and uses all remaining points for training.

LOOCV can be useful for small datasets with fast models, but it requires one fit per sample, so it is usually computationally expensive. Its performance estimate can also have high variance, because the training sets overlap almost completely, so it is not automatically "better" than K-Fold.
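A minimal sketch, assuming scikit-learn, a small synthetic dataset, and a k-nearest-neighbors model (all illustrative). Each individual score is 0 or 1 because only one point is held out, so only the average is meaningful.

```python
# Assumes scikit-learn; the synthetic data and model are illustrative.
from sklearn.datasets import make_classification
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# A small dataset, where one fit per sample is still affordable.
X, y = make_classification(n_samples=40, n_features=5, random_state=0)

loo = LeaveOneOut()
# 40 samples -> 40 fits, each validated on a single held-out point.
scores = cross_val_score(KNeighborsClassifier(n_neighbors=3), X, y, cv=loo)

print(f"Splits: {loo.get_n_splits(X)}, LOOCV accuracy: {scores.mean():.3f}")
```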

Strategy 3: Group-aware cross-validation

If your rows are not independent, you need group-aware splitting. Examples include multiple records per user, patient, device, store, session, or account.

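In those cases, standard K-Fold can leak information across folds, because rows from the same group (for example, the same patient) can appear in both training and validation, letting the model score well by recognizing the group rather than learning patterns that generalize. Use Group K-Fold or Leave-One-Group-Out so each group stays entirely in either training or validation.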
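A minimal sketch, assuming scikit-learn and toy data with 12 hypothetical users contributing 5 rows each. The check confirms that no user appears on both sides of any split.

```python
# Assumes scikit-learn; users, features, and labels are synthetic.
import numpy as np
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(0)
# 12 users with 5 records each; rows from the same user are related.
groups = np.repeat(np.arange(12), 5)
X = rng.normal(size=(len(groups), 3))
y = rng.integers(0, 2, size=len(groups))

gkf = GroupKFold(n_splits=4)
overlaps = []
for fold, (train_idx, val_idx) in enumerate(gkf.split(X, y, groups=groups)):
    train_users = set(groups[train_idx].tolist())
    val_users = set(groups[val_idx].tolist())
    # Each user should sit entirely in training or entirely in validation.
    overlaps.append(len(train_users & val_users))
    print(f"Fold {fold}: validation users = {sorted(val_users)}")
```

`LeaveOneGroupOut` works the same way, with one split per group.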

Strategy 4: Time series cross-validation

For time series, random shuffling breaks the timeline and creates unrealistic training conditions. A model that is trained on observations from after its validation period can look strong but fail when deployed.

Use walk-forward validation: train on earlier periods and validate on later periods, repeating by expanding or rolling the training window. This mirrors how models are used in production, where you always predict the future from the past.
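A minimal sketch of both window styles, assuming scikit-learn's `TimeSeriesSplit` and 20 observations already sorted by time. `max_train_size` caps the training window to turn the expanding window into a rolling one.

```python
# Assumes scikit-learn; rows are assumed to be sorted oldest to newest.
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(20).reshape(-1, 1)  # index 0 is the oldest observation

splitters = [
    # Expanding window: the training set grows with each split.
    ("expanding", TimeSeriesSplit(n_splits=4)),
    # Rolling window: keep at most the 8 most recent observations.
    ("rolling", TimeSeriesSplit(n_splits=4, max_train_size=8)),
]

splits = {}
for name, splitter in splitters:
    splits[name] = list(splitter.split(X))
    for train_idx, val_idx in splits[name]:
        # Training rows always come strictly before validation rows.
        assert train_idx.max() < val_idx.min()
        print(f"{name:9} train {train_idx.min()}-{train_idx.max()}, "
              f"validate {val_idx.min()}-{val_idx.max()}")
```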

Strategy 5: ShuffleSplit (Monte Carlo cross-validation)

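ShuffleSplit repeatedly samples random train/validation splits with a fixed validation size. It is flexible, since the number of splits and the validation size are chosen independently, and it can be cheaper than full K-Fold on large datasets because you can run fewer splits.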

It is a good option when you have plenty of data and want quick repeated estimates. It is not a good fit for time series or grouped data unless you use a variant that respects those constraints, such as GroupShuffleSplit for grouped data.
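A minimal sketch, assuming scikit-learn and its bundled breast-cancer dataset. Unlike K-Fold, a given row can land in several validation sets or in none.

```python
# Assumes scikit-learn; dataset and model are illustrative.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import ShuffleSplit, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# 10 independent random splits, each holding out 20% for validation.
ss = ShuffleSplit(n_splits=10, test_size=0.2, random_state=0)
scores = cross_val_score(model, X, y, cv=ss)

print(f"mean = {scores.mean():.3f}, std = {scores.std():.3f}")
```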

Strategy 6: Nested cross-validation for honest model selection

If you tune hyperparameters and report validation performance from the same folds, results can become optimistic. Nested cross-validation reduces this bias by separating model selection from model evaluation.

It uses two loops:

The outer loop estimates generalization performance.
The inner loop chooses hyperparameters (grid search, random search, or Bayesian optimization).

Nested cross-validation is especially helpful when comparing multiple model families on smaller datasets.
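In scikit-learn, one common way to express the two loops is to pass a `GridSearchCV` object (the inner loop) to `cross_val_score` (the outer loop). This is a minimal sketch under that assumption; the SVM model, parameter grid, and dataset are illustrative.

```python
# Assumes scikit-learn; model, grid, and dataset are illustrative.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

pipeline = make_pipeline(StandardScaler(), SVC())
param_grid = {"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.01]}

# Inner loop: tunes hyperparameters using only each outer training split.
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
search = GridSearchCV(pipeline, param_grid, cv=inner_cv)

# Outer loop: scores the whole tuning procedure on folds it never tuned on.
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
nested_scores = cross_val_score(search, X, y, cv=outer_cv)

print(f"Nested CV accuracy: {nested_scores.mean():.3f} ± {nested_scores.std():.3f}")
```

The outer scores estimate how well the tuning procedure generalizes, not one fixed hyperparameter setting, because each outer fold may select different values.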

Practical checklist to avoid leakage and misleading scores

Cross-validation is only as trustworthy as the pipeline around it. Use this checklist to keep evaluations realistic:

Fit preprocessing steps inside each fold using a proper pipeline.
Choose a split strategy that matches the data structure (time, groups, imbalance).
Avoid “future features” in time series (features must be available at prediction time).
Report mean performance and variability across folds, not only the best score.
After selecting a final model, evaluate it once on a separate, held-out test set when possible.

How to choose the right strategy

Pick the strategy that matches how your data is generated and how the model will be used. A quick guide:

General tabular data: K-Fold (often 5 or 10 folds).
Imbalanced classification: Stratified K-Fold.
Repeated measurements per entity: Group K-Fold or Leave-One-Group-Out.
Time series or forecasting: walk-forward validation.
Heavy tuning and model comparisons: nested cross-validation.
Large datasets with quick repeated checks: ShuffleSplit.

If you are unsure, start with K-Fold (or Stratified K-Fold for classification) and then switch to group-aware or time-aware splitting if independence assumptions do not hold.
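The quick guide can be encoded as a small helper. `choose_splitter` is a hypothetical function written for illustration (it is not part of any library); it assumes scikit-learn splitters and covers only the split-strategy cases, not nested CV.

```python
# Assumes scikit-learn; choose_splitter is a hypothetical helper.
from sklearn.model_selection import GroupKFold, KFold, StratifiedKFold, TimeSeriesSplit


def choose_splitter(task, has_groups=False, is_time_series=False, n_splits=5):
    """Hypothetical helper mapping the quick guide to scikit-learn splitters."""
    if is_time_series:
        # Walk-forward: train on the past, validate on the future.
        return TimeSeriesSplit(n_splits=n_splits)
    if has_groups:
        # Keep each group on one side of every split (pass groups= to split()).
        return GroupKFold(n_splits=n_splits)
    if task == "classification":
        return StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    return KFold(n_splits=n_splits, shuffle=True, random_state=0)


print(type(choose_splitter("classification")).__name__)  # StratifiedKFold
print(type(choose_splitter("regression", has_groups=True)).__name__)  # GroupKFold
```

Time order is checked first because shuffling time series breaks the timeline even when groups or class imbalance are also present.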

Next steps

If you want a structured way to build these evaluation habits and apply them across projects, explore Code Labs Academy’s Data Science & AI Bootcamp.

You can also browse all courses to find a path that matches your goals and schedule.