% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/xgb.cv.R
\name{xgb.cv}
\alias{xgb.cv}
\title{Cross Validation}
\usage{
xgb.cv(
  params = xgb.params(),
  data,
  nrounds,
  nfold,
  prediction = FALSE,
  showsd = TRUE,
  metrics = list(),
  objective = NULL,
  custom_metric = NULL,
  stratified = "auto",
  folds = NULL,
  train_folds = NULL,
  verbose = TRUE,
  print_every_n = 1L,
  early_stopping_rounds = NULL,
  maximize = NULL,
  callbacks = list(),
  ...
)
}
\arguments{
\item{params}{List of XGBoost parameters which control the model building process.
See the \href{https://xgboost.readthedocs.io/en/latest/parameter.html}{online documentation}
and the documentation for \code{\link[=xgb.params]{xgb.params()}} for details.

Should be passed as list with named entries. Parameters that are not specified in this
list will use their default values.

A list of named parameters can be created through the function \code{\link[=xgb.params]{xgb.params()}}, which
accepts all valid parameters as function arguments.}

\item{data}{An \code{xgb.DMatrix} object, with corresponding fields like \code{label} or bounds as required
for model training by the objective.

Note that only the basic \code{xgb.DMatrix} class is supported - variants such as \code{xgb.QuantileDMatrix}
or \code{xgb.ExtMemDMatrix} are not supported here.}

\item{nrounds}{Max number of boosting iterations.}

\item{nfold}{The original dataset is randomly partitioned into \code{nfold} equal size subsamples.}

\item{prediction}{A logical value indicating whether to return the test fold predictions
from each CV model. This parameter engages the \code{\link[=xgb.cb.cv.predict]{xgb.cb.cv.predict()}} callback.}

\item{showsd}{Logical value whether to show standard deviation of cross validation.}

\item{metrics}{List of evaluation metrics to be used in cross validation,
when it is not specified, the evaluation metric is chosen according to objective function.
Possible options are:
\itemize{
\item \code{error}: Binary classification error rate
\item \code{rmse}: Root mean square error
\item \code{logloss}: Negative log-likelihood function
\item \code{mae}: Mean absolute error
\item \code{mape}: Mean absolute percentage error
\item \code{auc}: Area under curve
\item \code{aucpr}: Area under PR curve
\item \code{merror}: Exact matching error used to evaluate multi-class classification
}}

\item{objective}{Customized objective function. Should take two arguments: the first one will be the
current predictions (either a numeric vector or matrix depending on the number of targets / classes),
and the second one will be the \code{data} DMatrix object that is used for training.

It should return a list with two elements \code{grad} and \code{hess} (in that order), as either
numeric vectors or numeric matrices depending on the number of targets / classes (same
dimension as the predictions that are passed as first argument).}

\item{custom_metric}{Customized evaluation function. Just like \code{objective}, should take two arguments,
with the first one being the predictions and the second one the \code{data} DMatrix.

Should return a list with two elements \code{metric} (name that will be displayed for this metric,
should be a string / character), and \code{value} (the number that the function calculates, should
be a numeric scalar).

Note that even if passing \code{custom_metric}, objectives also have an associated default metric that
will be evaluated in addition to it. In order to disable the built-in metric, one can pass
parameter \code{disable_default_eval_metric = TRUE}.}

\item{stratified}{Logical flag indicating whether sampling of folds should be stratified
by the values of outcome labels. For real-valued labels in regression objectives,
stratification will be done by discretizing the labels into up to 5 buckets beforehand.

If passing "auto", will be set to \code{TRUE} if the objective in \code{params} is a classification
objective (from XGBoost's built-in objectives, doesn't apply to custom ones), and to
\code{FALSE} otherwise.

This parameter is ignored when \code{data} has a \code{group} field - in such case, the splitting
will be based on whole groups (note that this might make the folds have different sizes).

Value \code{TRUE} here is \strong{not} supported for custom objectives.}

\item{folds}{List with pre-defined CV folds (each element must be a vector of test fold's indices).
When folds are supplied, the \code{nfold} and \code{stratified} parameters are ignored.

If \code{data} has a \code{group} field and the objective requires this field, each fold (list element)
must additionally have two attributes (retrievable through \code{attributes}) named \code{group_test}
and \code{group_train}, which should hold the \code{group} to assign through \code{\link[=setinfo.xgb.DMatrix]{setinfo.xgb.DMatrix()}} to
the resulting DMatrices.}

\item{train_folds}{List specifying which indices to use for training. If \code{NULL}
(the default) all indices not specified in \code{folds} will be used for training.

This is not supported when \code{data} has \code{group} field.}

\item{verbose}{If 0, xgboost will stay silent. If 1, it will print information about performance.
If 2, some additional information will be printed out.
Note that setting \code{verbose > 0} automatically engages the
\code{xgb.cb.print.evaluation(period=1)} callback function.}

\item{print_every_n}{When passing \code{verbose>0}, evaluation logs (metrics calculated on the
data passed under \code{evals}) will be printed every nth iteration according to the value passed
here. The first and last iteration are always included regardless of this 'n'.

Only has an effect when passing data under \code{evals} and when passing \code{verbose>0}. The parameter
is passed to the \code{\link[=xgb.cb.print.evaluation]{xgb.cb.print.evaluation()}} callback.}

\item{early_stopping_rounds}{Number of boosting rounds after which training will be stopped
if there is no improvement in performance (as measured by the evaluatiation metric that is
supplied or selected by default for the objective) on the evaluation data passed under
\code{evals}.

Must pass \code{evals} in order to use this functionality. Setting this parameter adds the
\code{\link[=xgb.cb.early.stop]{xgb.cb.early.stop()}} callback.

If \code{NULL}, early stopping will not be used.}

\item{maximize}{If \code{feval} and \code{early_stopping_rounds} are set, then this parameter must be set as well.
When it is \code{TRUE}, it means the larger the evaluation score the better.
This parameter is passed to the \code{\link[=xgb.cb.early.stop]{xgb.cb.early.stop()}} callback.}

\item{callbacks}{A list of callback functions to perform various task during boosting.
See \code{\link[=xgb.Callback]{xgb.Callback()}}. Some of the callbacks are automatically created depending on the
parameters' values. User can provide either existing or their own callback methods in order
to customize the training process.}

\item{...}{Not used.

Some arguments that were part of this function in previous XGBoost versions are currently
deprecated or have been renamed. If a deprecated or renamed argument is passed, will throw
a warning (by default) and use its current equivalent instead. This warning will become an
error if using the \link[=xgboost-options]{'strict mode' option}.

If some additional argument is passed that is neither a current function argument nor
a deprecated or renamed argument, a warning or error will be thrown depending on the
'strict mode' option.

\bold{Important:} \code{...} will be removed in a future version, and all the current
deprecation warnings will become errors. Please use only arguments that form part of
the function signature.}
}
\value{
An object of class 'xgb.cv.synchronous' with the following elements:
\itemize{
\item \code{call}: Function call.
\item \code{params}: Parameters that were passed to the xgboost library. Note that it does not
capture parameters changed by the \code{\link[=xgb.cb.reset.parameters]{xgb.cb.reset.parameters()}} callback.
\item \code{evaluation_log}: Evaluation history stored as a \code{data.table} with the
first column corresponding to iteration number and the rest corresponding to the
CV-based evaluation means and standard deviations for the training and test CV-sets.
It is created by the \code{\link[=xgb.cb.evaluation.log]{xgb.cb.evaluation.log()}} callback.
\item \code{niter}: Number of boosting iterations.
\item \code{nfeatures}: Number of features in training data.
\item \code{folds}: The list of CV folds' indices - either those passed through the \code{folds}
parameter or randomly generated.
\item \code{best_iteration}: Iteration number with the best evaluation metric value
(only available with early stopping).
}

Plus other potential elements that are the result of callbacks, such as a list \code{cv_predict} with
a sub-element \code{pred} when passing \code{prediction = TRUE}, which is added by the \code{\link[=xgb.cb.cv.predict]{xgb.cb.cv.predict()}}
callback (note that one can also pass it manually under \code{callbacks} with different settings,
such as saving also the models created during cross validation); or a list \code{early_stop} which
will contain elements such as \code{best_iteration} when using the early stopping callback (\code{\link[=xgb.cb.early.stop]{xgb.cb.early.stop()}}).
}
\description{
The cross validation function of xgboost.
}
\details{
The original sample is randomly partitioned into \code{nfold} equal size subsamples.

Of the \code{nfold} subsamples, a single subsample is retained as the validation data for testing the model,
and the remaining \code{nfold - 1} subsamples are used as training data.

The cross-validation process is then repeated \code{nrounds} times, with each of the
\code{nfold} subsamples used exactly once as the validation data.

All observations are used for both training and validation.

Adapted from \url{https://en.wikipedia.org/wiki/Cross-validation_\%28statistics\%29}
}
\examples{
data(agaricus.train, package = "xgboost")

dtrain <- with(agaricus.train, xgb.DMatrix(data, label = label, nthread = 2))

cv <- xgb.cv(
  data = dtrain,
  nrounds = 3,
  params = xgb.params(
    nthread = 2,
    max_depth = 3,
    objective = "binary:logistic"
  ),
  nfold = 5,
  metrics = list("rmse","auc")
)
print(cv)
print(cv, verbose = TRUE)

}
