Title: | Predictive (Classification and Regression) Models Homologator |
---|---|
Description: | Methods to unify the different ways of creating predictive models and their different predictive formats for classification and regression. It includes methods such as K-Nearest Neighbors Schliep, K. P. (2004) <doi:10.5282/ubm/epub.1769>, Decision Trees Leo Breiman, Jerome H. Friedman, Richard A. Olshen, Charles J. Stone (2017) <doi:10.1201/9781315139470>, ADA Boosting Esteban Alfaro, Matias Gamez, Noelia García (2013) <doi:10.18637/jss.v054.i02>, Extreme Gradient Boosting Chen & Guestrin (2016) <doi:10.1145/2939672.2939785>, Random Forest Breiman (2001) <doi:10.1023/A:1010933404324>, Neural Networks Venables, W. N., & Ripley, B. D. (2002) <ISBN:0-387-95457-0>, Support Vector Machines Bennett, K. P. & Campbell, C. (2000) <doi:10.1145/380995.380999>, Bayesian Methods Gelman, A., Carlin, J. B., Stern, H. S., & Rubin, D. B. (1995) <doi:10.1201/9780429258411>, Linear Discriminant Analysis Venables, W. N., & Ripley, B. D. (2002) <ISBN:0-387-95457-0>, Quadratic Discriminant Analysis Venables, W. N., & Ripley, B. D. (2002) <ISBN:0-387-95457-0>, Logistic Regression Dobson, A. J., & Barnett, A. G. (2018) <doi:10.1201/9781315182780> and Penalized Logistic Regression Friedman, J. H., Hastie, T., & Tibshirani, R. (2010) <doi:10.18637/jss.v033.i01>. |
Authors: | Oldemar Rodriguez R. [aut, cre], Andres Navarro D. [aut], Ariel Arroyo S. [aut], Diego Jimenez A. [aut] |
Maintainer: | Oldemar Rodriguez R. <[email protected]> |
License: | GPL (>=2) |
Version: | 2.2.0 |
Built: | 2024-11-03 04:21:01 UTC |
Source: | https://github.com/PROMiDAT/traineR |
Function that graphs the distribution of individuals and shows their category according to a categorical variable.
categorical.predictive.power(
  data,
  predict.variable,
  variable.to.compare,
  ylab = "",
  xlab = "",
  main = paste("Variable Distribution", variable.to.compare, "according to", predict.variable),
  col = NA
)
data |
A data frame. |
predict.variable |
Character type. The name of the variable to predict. This name must be part of the columns of the data frame. |
variable.to.compare |
Character type. The name of the categorical variable to compare. This name must be part of the columns of the data frame. |
ylab |
A character string that describes the y-axis on the graph. |
xlab |
A character string that describes the x-axis on the graph. |
main |
Character type. The main title of the chart. |
col |
A vector that specifies the colors of the categories of the variable to predict. |
A ggplot object.
With this function we can analyze the predictive power of a categorical variable.
cars <- datasets::mtcars
cars$cyl <- as.factor(cars$cyl)
cars$vs <- as.factor(cars$vs)
categorical.predictive.power(cars, "vs", "cyl")
Creates the confusion matrix.
confusion.matrix(newdata, prediction)
newdata |
matrix or data frame of test data. |
prediction |
a prmdt prediction object. |
A matrix with predicted and actual values.
data("iris") n <- seq_len(nrow(iris)) .sample <- sample(n, length(n) * 0.75) data.train <- iris[.sample,] data.test <- iris[-.sample,] modelo.knn <- train.knn(Species~., data.train) modelo.knn prob <- predict(modelo.knn, data.test, type = "prob") prob prediccion <- predict(modelo.knn, data.test, type = "class") prediccion confusion.matrix(data.test, prediccion)
data("iris") n <- seq_len(nrow(iris)) .sample <- sample(n, length(n) * 0.75) data.train <- iris[.sample,] data.test <- iris[-.sample,] modelo.knn <- train.knn(Species~., data.train) modelo.knn prob <- predict(modelo.knn, data.test, type = "prob") prob prediccion <- predict(modelo.knn, data.test, type = "class") prediccion confusion.matrix(data.test, prediccion)
Returns a matrix of contrasts for use with train.kknn.
contr.dummy(n, contrasts = TRUE)
n |
A vector containing levels of a factor, or the number of levels. |
contrasts |
A logical value indicating whether contrasts should be computed. |
A matrix with n rows and n-1 columns for contr.ordinal, a matrix with n rows and n columns for contr.dummy and a vector of length n for contr.metric.
Returns a matrix of contrasts for use with train.kknn.
contr.metric(n, contrasts = TRUE)
n |
A vector containing levels of a factor, or the number of levels. |
contrasts |
A logical value indicating whether contrasts should be computed. |
A matrix with n rows and n-1 columns for contr.ordinal, a matrix with n rows and n columns for contr.dummy and a vector of length n for contr.metric.
Returns a matrix of contrasts for use with train.kknn.
contr.ordinal(n, contrasts = TRUE)
n |
A vector containing levels of a factor, or the number of levels. |
contrasts |
A logical value indicating whether contrasts should be computed. |
A matrix with n rows and n-1 columns for contr.ordinal, a matrix with n rows and n columns for contr.dummy and a vector of length n for contr.metric.
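As an illustration of the documented output shapes, the sketch below (an assumption, not taken from the package examples) calls the three contrast functions directly, either on the levels of a factor or on the number of levels.
# hypothetical illustration of the three contrast functions
lev <- levels(iris$Species)   # three factor levels
contr.dummy(lev)              # matrix with n rows and n columns
contr.ordinal(3)              # matrix with n rows and n-1 columns
contr.metric(3)               # numeric vector of length n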
Calculates the confusion matrix, overall accuracy, overall error and the category accuracy for a classification problem and the Root Mean Square Error, Mean Absolute Error, Relative Error and Correlation for a regression problem.
general.indexes(newdata, prediction, mc = NULL)
newdata |
matrix or data frame of test data. |
prediction |
a prmdt prediction object. |
mc |
(optional) a matrix for calculating the indices. If mc is given as a parameter, newdata and prediction are not necessary. |
A list with the appropriate error and precision measurements. The class of this list is indexes.prmdt.
# Classification data("iris") n <- seq_len(nrow(iris)) .sample <- sample(n, length(n) * 0.75) data.train <- iris[.sample,] data.test <- iris[-.sample,] modelo.knn <- train.knn(Species~., data.train) prediccion <- predict(modelo.knn, data.test, type = "class") general.indexes(data.test, prediccion) # Regression len <- nrow(swiss) sampl <- sample(x = 1:len,size = len*0.20,replace = FALSE) ttesting <- swiss[sampl,] ttraining <- swiss[-sampl,] model.knn <- train.knn(Infant.Mortality~.,ttraining) prediccion <- predict(model.knn, ttesting) prediccion general.indexes(ttesting, prediccion)
# Classification data("iris") n <- seq_len(nrow(iris)) .sample <- sample(n, length(n) * 0.75) data.train <- iris[.sample,] data.test <- iris[-.sample,] modelo.knn <- train.knn(Species~., data.train) prediccion <- predict(modelo.knn, data.test, type = "class") general.indexes(data.test, prediccion) # Regression len <- nrow(swiss) sampl <- sample(x = 1:len,size = len*0.20,replace = FALSE) ttesting <- swiss[sampl,] ttraining <- swiss[-sampl,] model.knn <- train.knn(Infant.Mortality~.,ttraining) prediccion <- predict(model.knn, ttesting) prediccion general.indexes(ttesting, prediccion)
Function that graphs the importance of the variables.
importance.plot(model, col = "steelblue")
model |
fitted model object. |
col |
the color of the chart bars. |
A ggplot object.
With this function we can identify how important the variables are for the generation of a predictive model.
ggplot, train.adabag, boosting
data <- iris
n <- nrow(data)
sam <- sample(1:n, n * 0.75)
training <- data[sam, ]
testing <- data[-sam, ]
model <- train.adabag(formula = Species ~ ., data = training, minsplit = 2,
                      maxdepth = 30, mfinal = 10)
importance.plot(model)
Function that graphs the density of individuals and shows their category according to a numerical variable.
numerical.predictive.power(
  data,
  predict.variable,
  variable.to.compare,
  ylab = "",
  xlab = "",
  main = paste("Variable Density", variable.to.compare, "according to", predict.variable),
  col = NA
)
data |
A data frame. |
predict.variable |
Character type. The name of the variable to predict. This name must be part of the columns of the data frame. |
variable.to.compare |
Character type. The name of the numeric variable to compare. This name must be part of the columns of the data frame. |
ylab |
A character string that describes the y-axis on the graph. |
xlab |
A character string that describes the x-axis on the graph. |
main |
Character type. The main title of the chart. |
col |
A vector that specifies the colors of the categories of the variable to predict. |
A ggplot object.
With this function we can analyze the predictive power of a numerical variable.
numerical.predictive.power(iris,"Species","Sepal.Length")
Plotting prmdt models
## S3 method for class 'prmdt'
plot(x, ...)
x |
A prmdt model. |
... |
optional arguments to the print or format method. |
a plot of a model.
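A hedged sketch of typical usage (assuming a tree-based model such as train.rpart, whose underlying object has a natural graphical representation):
# fit a decision tree with the traineR wrapper and plot the resulting prmdt model
data("iris")
model.rpart <- train.rpart(Species ~ ., iris)
plot(model.rpart)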
Returns the prediction for an ada model.
## S3 method for class 'ada.prmdt'
predict(object, newdata, type = "class", n.iter = NULL, ...)
object |
an ada.prmdt model object. |
newdata |
an optional data frame in which to look for variables with which to predict. |
type |
type of prediction 'prob' or 'class' (default). |
n.iter |
number of iterations to consider for the prediction. By default this is iter from the ada call (n.iter < iter). |
... |
additional arguments affecting the predictions produced. |
a vector or matrix of predictions for ada model.
Returns the prediction for a boosting model.
## S3 method for class 'adabag.prmdt'
predict(object, newdata, type = "class", ...)
object |
an adabag.prmdt model object. |
newdata |
an optional data frame in which to look for variables with which to predict. |
type |
type of prediction 'prob' or 'class' (default). |
... |
additional arguments affecting the predictions produced. |
a vector or matrix of predictions for an adabag model.
Returns the prediction for a naiveBayes model.
## S3 method for class 'bayes.prmdt'
predict(object, newdata, type = "class", threshold = 0.001, eps = 0, ...)
object |
a bayes.prmdt model object. |
newdata |
an optional data frame in which to look for variables with which to predict. |
type |
type of prediction 'prob' or 'class' (default). |
threshold |
Value replacing cells with 0 probabilities. |
eps |
double for specifying an epsilon-range to apply Laplace smoothing (to replace zero or close-zero probabilities by threshold). |
... |
additional arguments affecting the predictions produced. |
a vector or matrix of predictions for bayes model.
Returns the prediction for a gbm model.
## S3 method for class 'gbm.prmdt'
predict(
  object,
  newdata,
  type = "class",
  n.trees = NULL,
  single.tree = FALSE,
  ...
)
object |
a gbm.prmdt model object. |
newdata |
an optional data frame in which to look for variables with which to predict. |
type |
type of prediction 'prob' or 'class' (default). |
n.trees |
Number of trees used in the prediction. n.trees may be a vector in which case predictions are returned for each iteration specified |
single.tree |
If single.tree=TRUE then predict.gbm returns only the predictions from tree(s) n.trees. |
... |
additional arguments affecting the predictions produced. |
a vector or matrix of predictions for a gbm model.
Returns the prediction for a glm model.
## S3 method for class 'glm.prmdt'
predict(
  object,
  newdata,
  type = "class",
  se.fit = FALSE,
  dispersion = NULL,
  terms = NULL,
  na.action = na.pass,
  ...
)
object |
a glm.prmdt model object. |
newdata |
an optional data frame in which to look for variables with which to predict. |
type |
type of prediction 'prob' or 'class' (default). |
se.fit |
logical switch indicating if standard errors are required. |
dispersion |
the dispersion of the GLM fit to be assumed in computing the standard errors. If omitted, that returned by summary applied to the object is used. |
terms |
with type = "terms" by default all terms are returned. A character vector specifies which terms are to be returned. |
na.action |
function determining what should be done with missing values in newdata. The default is to predict NA. |
... |
additional arguments affecting the predictions produced. |
a vector or matrix of predictions for glm model.
Returns the prediction for a glmnet model.
## S3 method for class 'glmnet.prmdt'
predict(object, newdata, type = "class", s = NULL, ...)
object |
a glmnet.prmdt model object. |
newdata |
an optional data frame in which to look for variables with which to predict. |
type |
type of prediction 'prob' or 'class' (default). |
s |
the value(s) of the penalty parameter lambda at which predictions are required. |
... |
additional arguments affecting the predictions produced. |
Returns the prediction for a train.kknn model.
## S3 method for class 'knn.prmdt'
predict(object, newdata, type = "class", ...)
object |
a knn.prmdt model object. |
newdata |
an optional data frame in which to look for variables with which to predict. |
type |
type of prediction 'prob' or 'class' (default). |
... |
additional arguments affecting the predictions produced. |
a vector or matrix of predictions for knn model.
Returns the prediction for an lda model.
## S3 method for class 'lda.prmdt'
predict(object, newdata, type = "class", ...)
object |
an lda.prmdt model object. |
newdata |
an optional data frame in which to look for variables with which to predict. |
type |
type of prediction 'prob' or 'class' (default). |
... |
additional arguments affecting the predictions produced. |
a vector or matrix of predictions for lda model.
Returns the prediction for a neuralnet model.
## S3 method for class 'neuralnet.prmdt'
predict(object, newdata, type = "class", ...)
object |
a neuralnet.prmdt model object. |
newdata |
an optional data frame in which to look for variables with which to predict. |
type |
type of prediction 'prob' or 'class' (default). |
... |
additional arguments affecting the predictions produced. |
a vector or matrix of predictions for a neuralnet model.
Returns the prediction for an nnet model.
## S3 method for class 'nnet.prmdt'
predict(object, newdata, type = "class", ...)
object |
an nnet.prmdt model object. |
newdata |
an optional data frame in which to look for variables with which to predict. |
type |
type of prediction 'prob' or 'class' (default). |
... |
additional arguments affecting the predictions produced. |
a vector or matrix of predictions for nnet model.
Returns the prediction for a qda model.
## S3 method for class 'qda.prmdt'
predict(object, newdata, type = "class", ...)
object |
a qda.prmdt model object. |
newdata |
an optional data frame in which to look for variables with which to predict. |
type |
type of prediction 'prob' or 'class' (default). |
... |
additional arguments affecting the predictions produced. |
a vector or matrix of predictions for qda model.
Returns the prediction for a randomForest model.
## S3 method for class 'randomForest.prmdt'
predict(
  object,
  newdata,
  type = "class",
  norm.votes = TRUE,
  predict.all = FALSE,
  proximity = FALSE,
  nodes = FALSE,
  cutoff,
  ...
)
object |
a randomForest.prmdt model object. |
newdata |
an optional data frame in which to look for variables with which to predict. |
type |
type of prediction 'prob' or 'class' (default). |
norm.votes |
Should the vote counts be normalized (i.e., expressed as fractions)? Ignored if object$type is regression. |
predict.all |
Should the predictions of all trees be kept? |
proximity |
Should proximity measures be computed? An error is issued if object$type is regression. |
nodes |
Should the terminal node indicators (an n by ntree matrix) be returned? If so, they are in the “nodes” attribute of the returned object. |
cutoff |
(Classification only) A vector of length equal to number of classes. The ‘winning’ class for an observation is the one with the maximum ratio of proportion of votes to cutoff. Default is taken from the forest$cutoff component of object (i.e., the setting used when running randomForest). |
... |
additional arguments affecting the predictions produced. |
a vector or matrix of predictions for a randomForest model.
Returns the prediction for an rpart model.
## S3 method for class 'rpart.prmdt'
predict(object, newdata, type = "class", na.action = na.pass, ...)
object |
an rpart.prmdt model object. |
newdata |
an optional data frame in which to look for variables with which to predict. |
type |
type of prediction 'prob' or 'class' (default). |
na.action |
a function to determine what should be done with missing values in newdata. The default is to pass them down the tree using surrogates in the way selected when the model was built. Other possibilities are na.omit and na.fail. |
... |
additional arguments affecting the predictions produced. |
a vector or matrix of predictions for rpart model.
Returns the prediction for an svm model.
## S3 method for class 'svm.prmdt'
predict(
  object,
  newdata,
  type = "class",
  decision.values = FALSE,
  ...,
  na.action = na.omit
)
object |
an svm.prmdt model object. |
newdata |
an optional data frame in which to look for variables with which to predict. |
type |
type of prediction 'prob' or 'class' (default). |
decision.values |
Logical controlling whether the decision values of all binary classifiers computed in multiclass classification shall be computed and returned. |
... |
additional arguments affecting the predictions produced. |
na.action |
A function to specify the action to be taken if ‘NA’s are found. The default action is na.omit, which leads to rejection of cases with missing values on any required variable. An alternative is na.fail, which causes an error if NA cases are found. (NOTE: If given, this argument must be named.) |
a vector or matrix of predictions for an svm model.
Returns the prediction for an xgb.train model.
## S3 method for class 'xgb.Booster.prmdt'
predict(
  object,
  newdata,
  type = "class",
  missing = NA,
  outputmargin = FALSE,
  ntreelimit = NULL,
  predleaf = FALSE,
  predcontrib = FALSE,
  approxcontrib = FALSE,
  predinteraction = FALSE,
  reshape = FALSE,
  ...
)
object |
an xgb.Booster.prmdt model object. |
newdata |
an optional data frame in which to look for variables with which to predict. |
type |
type of prediction 'prob' or 'class' (default). |
missing |
Only used when the input is a dense matrix. Pick a float value that represents missing values in the data (e.g., sometimes 0 or another extreme value is used). |
outputmargin |
whether the prediction should be returned in the form of the original untransformed sum of predictions from the boosting iterations. E.g., setting outputmargin=TRUE for logistic regression would result in predictions for log-odds instead of probabilities. |
ntreelimit |
Deprecated, use iterationrange instead. |
predleaf |
whether to predict leaf indices. |
predcontrib |
whether to return feature contributions to individual predictions (see Details). |
approxcontrib |
whether to use a fast approximation for feature contributions (see Details). |
predinteraction |
whether to return contributions of feature interactions to individual predictions (see Details). |
reshape |
whether to reshape the vector of predictions to a matrix form when there are several prediction outputs per case. This option has no effect when either of predleaf, predcontrib, or predinteraction flags is TRUE. |
... |
additional arguments affecting the predictions produced. |
a vector or matrix of predictions for xgb model.
Function that graphs the balance of the different categories of a column of a data frame.
prediction.variable.balance(
  data,
  predict.variable,
  ylab = "Number of individuals",
  xlab = "",
  main = paste("Variable Distribution", predict.variable),
  col = NA
)
data |
A data frame. |
predict.variable |
Character type. The name of the variable to predict. This name must be part of the columns of the data frame. |
ylab |
A character string that describes the y-axis on the graph. |
xlab |
A character string that describes the x-axis on the graph. |
main |
Character type. The main title of the chart. |
col |
A vector that specifies the colors of the categories represented by bars within the chart. |
A ggplot object.
With this function we can identify if the data is balanced or not, according to the variable to be predicted.
prediction.variable.balance(iris,"Species")
Printing prmdt index object
## S3 method for class 'indexes.prmdt'
print(x, ...)
x |
A prmdt index object |
... |
optional arguments to the print or format method. |
a print of the results of a prediction model.
Printing prmdt prediction object
## S3 method for class 'prediction.prmdt'
print(x, ...)
x |
A prmdt prediction object |
... |
optional arguments to the print or format method. |
a print of the prediction of a model.
Printing prmdt models
## S3 method for class 'prmdt'
print(x, ...)
x |
A prmdt model. |
... |
optional arguments to the print or format method. |
a print of the information of a model.
Function that calculates the area of the ROC curve of a prediction with only 2 categories.
ROC.area(prediction, real)
prediction |
A vector of real numbers representing the prediction score of a category. |
real |
A vector with the real categories of the individuals in the prediction. |
The value of the area (numeric).
iris2 <- dplyr::filter(iris, (Species == "setosa") | (Species == "virginica"))
iris2$Species <- factor(iris2$Species, levels = c("setosa", "virginica"))
sam <- sample(1:100, 20)
ttesting <- iris2[sam, ]
ttraining <- iris2[-sam, ]
model <- train.rpart(Species ~ ., ttraining)
prediction.prob <- predict(model, ttesting, type = "prob")
ROC.area(prediction.prob$prediction[, 2], ttesting$Species)
Function that plots the ROC curve of a prediction with only 2 categories.
ROC.plot(prediction, real, .add = FALSE, color = "red")
prediction |
A vector of real numbers representing the prediction score of a category. |
real |
A vector with the real categories of the individuals in the prediction. |
.add |
A logical value that indicates if it should be added to an existing graph |
color |
Color of the ROC curve in the graph |
A plot object.
iris2 <- dplyr::filter(iris, (Species == "setosa") | (Species == "virginica"))
iris2$Species <- factor(iris2$Species, levels = c("setosa", "virginica"))
sam <- sample(1:100, 20)
ttesting <- iris2[sam, ]
ttraining <- iris2[-sam, ]
model <- train.rpart(Species ~ ., ttraining)
prediction.prob <- predict(model, ttesting, type = "prob")
ROC.plot(prediction.prob$prediction[, 2], ttesting$Species)
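Because of the .add argument, curves from several models can be overlaid on one plot; the sketch below is an assumption that reuses ttraining, ttesting and the first curve from the example above.
# overlay the ROC curve of a second model on the existing plot
model2 <- train.randomForest(Species ~ ., ttraining)
prob2 <- predict(model2, ttesting, type = "prob")
ROC.plot(prob2$prediction[, 2], ttesting$Species, .add = TRUE, color = "blue")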
Returns a scaled data.frame.
scaler(df)
df |
A data.frame only with numeric variables. |
A data.frame.
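A minimal sketch of typical usage (an assumption, not taken from the package examples): scale the numeric columns of iris, leaving the factor column out.
# scaler expects a data.frame with numeric variables only
iris.scaled <- scaler(iris[, -5])
summary(iris.scaled)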
Provides a wrapping function for the ada function.
train.ada(formula, data, ..., subset, na.action = na.rpart)
formula |
a symbolic description of the model to be fit. |
data |
an optional data frame containing the variables in the model. |
... |
arguments passed to rpart.control. For stumps, use rpart.control(maxdepth=1,cp=-1,minsplit=0,xval=0). maxdepth controls the depth of trees, and cp controls the complexity of trees. The priors should also be fixed through the parms argument as discussed in the second reference. |
subset |
an optional vector specifying a subset of observations to be used in the fitting process. |
na.action |
a function that indicates how to process ‘NA’ values. Default=na.rpart. |
An object of class ada.prmdt with additional model information that allows the results to be homogenized.
The parameter information was taken from the original ada function. The internal function is from the ada package.
data("Puromycin") n <- seq_len(nrow(Puromycin)) .sample <- sample(n, length(n) * 0.75) data.train <- Puromycin[.sample,] data.test <- Puromycin[-.sample,] modelo.ada <- train.ada(state~., data.train) modelo.ada prob <- predict(modelo.ada, data.test , type = "prob") prob prediccion <- predict(modelo.ada, data.test , type = "class") prediccion
data("Puromycin") n <- seq_len(nrow(Puromycin)) .sample <- sample(n, length(n) * 0.75) data.train <- Puromycin[.sample,] data.test <- Puromycin[-.sample,] modelo.ada <- train.ada(state~., data.train) modelo.ada prob <- predict(modelo.ada, data.test , type = "prob") prob prediccion <- predict(modelo.ada, data.test , type = "class") prediccion
Provides a wrapping function for the boosting function.
train.adabag(
  formula,
  data,
  boos = TRUE,
  mfinal = 100,
  coeflearn = "Breiman",
  minsplit = 20,
  maxdepth = 30,
  ...
)
formula |
a symbolic description of the model to be fit. |
data |
an optional data frame containing the variables in the model. |
boos |
if TRUE (by default), a bootstrap sample of the training set is drawn using the weights for each observation on that iteration. If FALSE, every observation is used with its weights. |
mfinal |
an integer, the number of iterations for which boosting is run or the number of trees to use. Defaults to mfinal=100 iterations. |
coeflearn |
if 'Breiman'(by default), alpha=1/2ln((1-err)/err) is used. If 'Freund' alpha=ln((1-err)/err) is used. In both cases the AdaBoost.M1 algorithm is used and alpha is the weight updating coefficient. On the other hand, if coeflearn is 'Zhu' the SAMME algorithm is implemented with alpha=ln((1-err)/err)+ ln(nclasses-1). |
minsplit |
the minimum number of observations that must exist in a node in order for a split to be attempted. |
maxdepth |
Set the maximum depth of any node of the final tree, with the root node counted as depth 0. For values greater than 30, rpart will give nonsense results on 32-bit machines. |
... |
arguments passed to rpart.control or adabag::boosting. For stumps, use rpart.control(maxdepth=1,cp=-1,minsplit=0,xval=0). maxdepth controls the depth of trees, and cp controls the complexity of trees. |
An object of class adabag.prmdt with additional model information that allows the results to be homogenized.
The parameter information was taken from the original boosting and rpart.control functions. The internal function is from the adabag package.
data <- iris
n <- nrow(data)
sam <- sample(1:n, n * 0.75)
training <- data[sam, ]
testing <- data[-sam, ]
model <- train.adabag(formula = Species ~ ., data = training, minsplit = 2,
                      maxdepth = 30, mfinal = 10)
model
predict <- predict(object = model, testing, type = "class")
predict
Provides a wrapping function for the naiveBayes function.
train.bayes(formula, data, laplace = 0, ..., subset, na.action = na.pass)
formula |
A formula of the form class ~ x1 + x2 + .... Interactions are not allowed. |
data |
Either a data frame of predictors (categorical and/or numeric) or a contingency table. |
laplace |
positive double controlling Laplace smoothing. The default (0) disables Laplace smoothing. |
... |
Currently not used. |
subset |
For data given in a data frame, an index vector specifying the cases to be used in the training sample. (NOTE: If given, this argument must be named.) |
na.action |
A function to specify the action to be taken if NAs are found. The default action is not to count them for the computation of the probability factors. An alternative is na.omit, which leads to rejection of cases with missing values on any required variable. (NOTE: If given, this argument must be named.) |
An object of class bayes.prmdt with additional model information that allows the results to be homogenized.
The parameter information was taken from the original naiveBayes function. The internal function is from the e1071 package.
# Classification data("iris") n <- seq_len(nrow(iris)) .sample <- sample(n, length(n) * 0.75) data.train <- iris[.sample,] data.test <- iris[-.sample,] modelo.bayes <- train.bayes(Species ~., data.train) modelo.bayes prob <- predict(modelo.bayes, data.test, type = "prob") prob prediccion <- predict(modelo.bayes, data.test, type = "class") prediccion # Regression len <- nrow(swiss) sampl <- sample(x = 1:len,size = len*0.20,replace = FALSE) ttesting <- swiss[sampl,] ttraining <- swiss[-sampl,] model.bayes <- train.bayes(Infant.Mortality~.,ttraining) prediction <- predict(model.bayes, ttesting) prediction
# Classification data("iris") n <- seq_len(nrow(iris)) .sample <- sample(n, length(n) * 0.75) data.train <- iris[.sample,] data.test <- iris[-.sample,] modelo.bayes <- train.bayes(Species ~., data.train) modelo.bayes prob <- predict(modelo.bayes, data.test, type = "prob") prob prediccion <- predict(modelo.bayes, data.test, type = "class") prediccion # Regression len <- nrow(swiss) sampl <- sample(x = 1:len,size = len*0.20,replace = FALSE) ttesting <- swiss[sampl,] ttraining <- swiss[-sampl,] model.bayes <- train.bayes(Infant.Mortality~.,ttraining) prediction <- predict(model.bayes, ttesting) prediction
Provides a wrapping function for the gbm function.
train.gbm(
  formula,
  data,
  distribution = "bernoulli",
  weights,
  var.monotone = NULL,
  n.trees = 100,
  interaction.depth = 1,
  n.minobsinnode = 10,
  shrinkage = 0.001,
  bag.fraction = 0.5,
  train.fraction = 1,
  cv.folds = 0,
  keep.data = TRUE,
  verbose = FALSE,
  class.stratify.cv = NULL,
  n.cores = NULL
)
formula |
a symbolic description of the model to be fit. |
data |
an optional data frame containing the variables in the model. |
distribution |
Either a character string specifying the name of the distribution to use or a list with a component name specifying the distribution and any additional parameters needed. |
weights |
an optional vector of weights to be used in the fitting process. Must be positive but do not need to be normalized. |
var.monotone |
an optional vector, the same length as the number of predictors, indicating which variables have a monotone increasing (+1), decreasing (-1), or arbitrary (0) relationship with the outcome. |
n.trees |
Integer specifying the total number of trees to fit. This is equivalent to the number of iterations and the number of basis functions in the additive expansion. Default is 100. |
interaction.depth |
Integer specifying the maximum depth of each tree (i.e., the highest level of variable interactions allowed). A value of 1 implies an additive model, a value of 2 implies a model with up to 2-way interactions, etc. Default is 1. |
n.minobsinnode |
Integer specifying the minimum number of observations in the terminal nodes of the trees. Note that this is the actual number of observations, not the total weight. |
shrinkage |
a shrinkage parameter applied to each tree in the expansion. Also known as the learning rate or step-size reduction; values from 0.001 to 0.1 usually work, but a smaller learning rate typically requires more trees. The default here is 0.001. |
bag.fraction |
the fraction of the training set observations randomly selected to propose the next tree in the expansion. This introduces randomness into the model fit. |
train.fraction |
The first train.fraction * nrows(data) observations are used to fit the gbm and the remainder are used for computing out-of-sample estimates of the loss function. |
cv.folds |
Number of cross-validation folds to perform. If cv.folds > 1 then gbm, in addition to the usual fit, will perform a cross-validation and calculate an estimate of the generalization error, returned in cv.error. |
keep.data |
a logical variable indicating whether to keep the data and an index of the data stored with the object. Keeping the data and index makes subsequent calls to gbm.more faster at the cost of storing an extra copy of the dataset. |
verbose |
Logical indicating whether or not to print out progress and performance indicators (TRUE). If this option is left unspecified for gbm.more, then it uses verbose from object. Default is FALSE. |
class.stratify.cv |
Logical indicating whether or not the cross-validation should be stratified by class. |
n.cores |
The number of CPU cores to use. The cross-validation loop will attempt to send different CV folds off to different cores. If n.cores is not specified by the user, it is guessed using the detectCores function in the parallel package. |
An object of class gbm.prmdt with additional model information that allows the results to be homogenized.
The parameter information was taken from the original gbm function. The internal function is from the gbm package.
# Classification
data <- iris
n <- nrow(data)
sam <- sample(1:n, n * 0.75)
training <- data[sam, ]
testing <- data[-sam, ]
model <- train.gbm(formula = Species ~ ., data = training)
model
predict <- predict(object = model, testing)
predict

# Regression
len <- nrow(swiss)
sampl <- sample(x = 1:len, size = len * 0.10, replace = FALSE)
ttesting <- swiss[sampl, ]
ttraining <- swiss[-sampl, ]
model.gbm <- train.gbm(Infant.Mortality ~ ., ttraining, distribution = "gaussian")
prediction <- predict(model.gbm, ttesting)
prediction
Provides a wrapping function for the glm function.
train.glm(
  formula,
  data,
  family = binomial,
  weights,
  subset,
  na.action,
  start = NULL,
  etastart,
  mustart,
  offset,
  control = list(...),
  model = TRUE,
  method = "glm.fit",
  x = FALSE,
  y = TRUE,
  singular.ok = TRUE,
  contrasts = NULL,
  ...
)
formula |
an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted. The details of model specification are given under ‘Details’. |
data |
an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which glm is called. |
family |
a description of the error distribution and link function to be used in the model. For glm this can be a character string naming a family function, a family function or the result of a call to a family function. For glm.fit only the third option is supported. (See family for details of family functions.) |
weights |
an optional vector of ‘prior weights’ to be used in the fitting process. Should be NULL or a numeric vector. |
subset |
an optional vector specifying a subset of observations to be used in the fitting process. |
na.action |
a function which indicates what should happen when the data contain NAs. The default is set by the na.action setting of options, and is na.fail if that is unset. The ‘factory-fresh’ default is na.omit. Another possible value is NULL, no action. Value na.exclude can be useful. |
start |
starting values for the parameters in the linear predictor. |
etastart |
starting values for the linear predictor. |
mustart |
starting values for the vector of means. |
offset |
this can be used to specify an a priori known component to be included in the linear predictor during fitting. This should be NULL or a numeric vector of length equal to the number of cases. One or more offset terms can be included in the formula instead or as well, and if more than one is specified their sum is used. See model.offset. |
control |
a list of parameters for controlling the fitting process. For glm.fit this is passed to glm.control. |
model |
a logical value indicating whether model frame should be included as a component of the returned value. |
method |
the method to be used in fitting the model. The default method "glm.fit" uses iteratively reweighted least squares (IWLS): the alternative "model.frame" returns the model frame and does no fitting. User-supplied fitting functions can be supplied either as a function or a character string naming a function, with a function which takes the same arguments as glm.fit. If specified as a character string it is looked up from within the stats namespace. |
x, y |
For glm: logical values indicating whether the response vector and model matrix used in the fitting process should be returned as components of the returned value. For glm.fit: x is a design matrix of dimension n * p, and y is a vector of observations of length n. |
singular.ok |
logical; if FALSE a singular fit is an error. |
contrasts |
an optional list. See the contrasts.arg of model.matrix.default. |
... |
For glm: arguments to be used to form the default control argument if it is not supplied directly. For weights: further arguments passed to or from other methods. |
An object of class glm.prmdt with additional model information that allows the results to be homogenized.
The parameter information was taken from the original glm function. The internal function is from the stats package.
# Classification data("Puromycin") n <- seq_len(nrow(Puromycin)) .sample <- sample(n, length(n) * 0.65) data.train <- Puromycin[.sample,] data.test <- Puromycin[-.sample,] modelo.glm <- train.glm(state~., data.train) modelo.glm prob <- predict(modelo.glm, data.test , type = "prob") prob prediccion <- predict(modelo.glm, data.test , type = "class") prediccion # Regression len <- nrow(swiss) sampl <- sample(x = 1:len,size = len*0.20,replace = FALSE) ttesting <- swiss[sampl,] ttraining <- swiss[-sampl,] model.glm <- train.glm(Infant.Mortality~.,ttraining, family = "gaussian") prediction <- predict(model.glm, ttesting) prediction
# Classification data("Puromycin") n <- seq_len(nrow(Puromycin)) .sample <- sample(n, length(n) * 0.65) data.train <- Puromycin[.sample,] data.test <- Puromycin[-.sample,] modelo.glm <- train.glm(state~., data.train) modelo.glm prob <- predict(modelo.glm, data.test , type = "prob") prob prediccion <- predict(modelo.glm, data.test , type = "class") prediccion # Regression len <- nrow(swiss) sampl <- sample(x = 1:len,size = len*0.20,replace = FALSE) ttesting <- swiss[sampl,] ttraining <- swiss[-sampl,] model.glm <- train.glm(Infant.Mortality~.,ttraining, family = "gaussian") prediction <- predict(model.glm, ttesting) prediction
Provides a wrapping function for the glmnet function.
train.glmnet(
  formula,
  data,
  standardize = TRUE,
  alpha = 1,
  family = "multinomial",
  cv = TRUE,
  ...
)
formula |
A formula of the form groups ~ x1 + x2 + ... That is, the response is the grouping factor and the right hand side specifies the (non-factor) discriminators. |
data |
An optional data frame, list or environment from which variables specified in formula are preferentially to be taken. |
standardize |
Logical flag for x variable standardization, prior to fitting the model sequence. The coefficients are always returned on the original scale. Default is standardize=TRUE. If variables are in the same units already, you might not wish to standardize. See details below for y standardization with family="gaussian". |
alpha |
The elasticnet mixing parameter. alpha=1 is the lasso penalty, and alpha=0 the ridge penalty. |
family |
Either a character string representing one of the built-in families, or else a glm() family object. For more information, see Details section below or the documentation for response type (above). |
cv |
Logical. Whether to perform cross-validation to find the best value of the penalty parameter lambda and save this value in the model. The value can then be used by the predict() function. |
... |
Arguments passed to or from other methods. |
An object of class glmnet.prmdt with additional model information that allows the results to be homogenized.
The parameter information was taken from the original glmnet function. The internal function is from the glmnet package.
# Classification
len <- nrow(iris)
sampl <- sample(x = 1:len, size = len * 0.20, replace = FALSE)
ttesting <- iris[sampl, ]
ttraining <- iris[-sampl, ]
model.glmnet <- train.glmnet(Species ~ ., ttraining)
prediction <- predict(model.glmnet, ttesting)
prediction

# Regression
len <- nrow(swiss)
sampl <- sample(x = 1:len, size = len * 0.20, replace = FALSE)
ttesting <- swiss[sampl, ]
ttraining <- swiss[-sampl, ]
model.glmnet <- train.glmnet(Infant.Mortality ~ ., ttraining, family = "gaussian")
prediction <- predict(model.glmnet, ttesting)
prediction
Provides a wrapping function for the train.kknn function.
train.knn(
  formula,
  data,
  kmax = 11,
  ks = NULL,
  distance = 2,
  kernel = "optimal",
  ykernel = NULL,
  scale = TRUE,
  contrasts = c(unordered = "contr.dummy", ordered = "contr.ordinal"),
  ...
)
formula |
A formula object. |
data |
Matrix or data frame. |
kmax |
Maximum number of k, if ks is not specified. |
ks |
A vector specifying values of k. If not null, this takes precedence over kmax. |
distance |
Parameter of Minkowski distance. |
kernel |
Kernel to use. Possible choices are "rectangular" (which is standard unweighted knn), "triangular", "epanechnikov" (or beta(2,2)), "biweight" (or beta(3,3)), "triweight" (or beta(4,4)), "cos", "inv", "gaussian" and "optimal". |
ykernel |
Window width of an y-kernel, especially for prediction of ordinal classes. |
scale |
logical; scale variables to have equal standard deviation. |
contrasts |
A vector containing the 'unordered' and 'ordered' contrasts to use. |
... |
Further arguments passed to or from other methods. |
An object of class knn.prmdt with additional model information that allows the results to be homogenized.
The parameter information was taken from the original train.kknn function. The internal function is from the kknn package.
# Classification data("iris") n <- seq_len(nrow(iris)) .sample <- sample(n, length(n) * 0.75) data.train <- iris[.sample,] data.test <- iris[-.sample,] modelo.knn <- train.knn(Species~., data.train) modelo.knn prob <- predict(modelo.knn, data.test, type = "prob") prob prediccion <- predict(modelo.knn, data.test, type = "class") prediccion # Regression len <- nrow(swiss) sampl <- sample(x = 1:len,size = len*0.20,replace = FALSE) ttesting <- swiss[sampl,] ttraining <- swiss[-sampl,] model.knn <- train.knn(Infant.Mortality~.,ttraining) prediction <- predict(model.knn, ttesting) prediction
# Classification data("iris") n <- seq_len(nrow(iris)) .sample <- sample(n, length(n) * 0.75) data.train <- iris[.sample,] data.test <- iris[-.sample,] modelo.knn <- train.knn(Species~., data.train) modelo.knn prob <- predict(modelo.knn, data.test, type = "prob") prob prediccion <- predict(modelo.knn, data.test, type = "class") prediccion # Regression len <- nrow(swiss) sampl <- sample(x = 1:len,size = len*0.20,replace = FALSE) ttesting <- swiss[sampl,] ttraining <- swiss[-sampl,] model.knn <- train.knn(Infant.Mortality~.,ttraining) prediction <- predict(model.knn, ttesting) prediction
Provides a wrapping function for the lda function.
train.lda(formula, data, ..., subset, na.action)
formula |
A formula of the form groups ~ x1 + x2 + ... That is, the response is the grouping factor and the right hand side specifies the (non-factor) discriminators. |
data |
An optional data frame, list or environment from which variables specified in formula are preferentially to be taken. |
... |
Arguments passed to or from other methods. |
subset |
An index vector specifying the cases to be used in the training sample. (NOTE: If given, this argument must be named.) |
na.action |
Function to specify the action to be taken if NAs are found. The default action is for the procedure to fail. An alternative is na.omit, which leads to rejection of cases with missing values on any required variable. (NOTE: If given, this argument must be named.) |
An object of class lda.prmdt with additional model information that allows the results to be homogenized.
The parameter information was taken from the original lda function. The internal function is from the MASS package.
len <- nrow(iris)
sampl <- sample(x = 1:len, size = len * 0.20, replace = FALSE)
ttesting <- iris[sampl, ]
ttraining <- iris[-sampl, ]
model.lda <- train.lda(Species ~ ., ttraining)
model.lda
prediction <- predict(model.lda, ttesting)
prediction
Provides a wrapping function for the neuralnet function.
train.neuralnet(
  formula,
  data,
  hidden = 1,
  threshold = 0.01,
  stepmax = 1e+05,
  rep = 1,
  startweights = NULL,
  learningrate.limit = NULL,
  learningrate.factor = list(minus = 0.5, plus = 1.2),
  learningrate = NULL,
  lifesign = "none",
  lifesign.step = 1000,
  algorithm = "rprop+",
  err.fct = "sse",
  act.fct = "logistic",
  linear.output = TRUE,
  exclude = NULL,
  constant.weights = NULL,
  likelihood = FALSE
)
formula |
a symbolic description of the model to be fitted. |
data |
a data frame containing the variables specified in formula. |
hidden |
a vector of integers specifying the number of hidden neurons (vertices) in each layer. |
threshold |
a numeric value specifying the threshold for the partial derivatives of the error function as stopping criteria. |
stepmax |
the maximum steps for the training of the neural network. Reaching this maximum leads to a stop of the neural network's training process. |
rep |
the number of repetitions for the neural network's training. |
startweights |
a vector containing starting values for the weights. Set to NULL for random initialization. |
learningrate.limit |
a vector or a list containing the lowest and highest limit for the learning rate. Used only for RPROP and GRPROP. |
learningrate.factor |
a vector or a list containing the multiplication factors for the upper and lower learning rate. Used only for RPROP and GRPROP. |
learningrate |
a numeric value specifying the learning rate used by traditional backpropagation. Used only for traditional backpropagation. |
lifesign |
a string specifying how much the function will print during the calculation of the neural network. 'none', 'minimal' or 'full'. |
lifesign.step |
an integer specifying the stepsize to print the minimal threshold in full lifesign mode. |
algorithm |
a string containing the algorithm type to calculate the neural network. The following types are possible: 'backprop', 'rprop+', 'rprop-', 'sag', or 'slr'. 'backprop' refers to backpropagation, 'rprop+' and 'rprop-' refer to the resilient backpropagation with and without weight backtracking, while 'sag' and 'slr' induce the usage of the modified globally convergent algorithm (grprop). See Details for more information. |
err.fct |
a differentiable function that is used for the calculation of the error. Alternatively, the strings 'sse' and 'ce' which stand for the sum of squared errors and the cross-entropy can be used. |
act.fct |
a differentiable function that is used for smoothing the result of the cross product of the covariate or neurons and the weights. Additionally the strings, 'logistic' and 'tanh' are possible for the logistic function and tangent hyperbolicus. |
linear.output |
logical. If act.fct should not be applied to the output neurons set linear output to TRUE, otherwise to FALSE. |
exclude |
a vector or a matrix specifying the weights, that are excluded from the calculation. If given as a vector, the exact positions of the weights must be known. A matrix with n-rows and 3 columns will exclude n weights, where the first column stands for the layer, the second column for the input neuron and the third column for the output neuron of the weight. |
constant.weights |
a vector specifying the values of the weights that are excluded from the training process and treated as fix. |
likelihood |
logical. If the error function is equal to the negative log-likelihood function, the information criteria AIC and BIC will be calculated. Furthermore, the usage of confidence.interval is meaningful. |
An object of class neuralnet.prmdt with additional model information that allows the results to be homogenized.
The parameter information was taken from the original neuralnet function. The internal function is from the neuralnet package.
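This entry ships without an example; the following regression sketch is an assumption that mirrors the other train.* examples (convergence may require scaling the predictors, adjusting hidden, or increasing stepmax).
# Regression sketch with a small network
len <- nrow(swiss)
sampl <- sample(x = 1:len, size = len * 0.20, replace = FALSE)
ttesting <- swiss[sampl, ]
ttraining <- swiss[-sampl, ]
model.neuralnet <- train.neuralnet(Infant.Mortality ~ ., ttraining,
                                   hidden = c(4), stepmax = 1e+06, linear.output = TRUE)
prediction <- predict(model.neuralnet, ttesting)
prediction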
Provides a wrapping function for the nnet function.
train.nnet(formula, data, weights, ..., subset, na.action, contrasts = NULL)
formula |
A formula of the form class ~ x1 + x2 + ... |
data |
Data frame from which variables specified in formula are preferentially to be taken. |
weights |
(case) weights for each example – if missing defaults to 1. |
... |
arguments passed to or from other methods. |
subset |
An index vector specifying the cases to be used in the training sample. (NOTE: If given, this argument must be named.) |
na.action |
A function to specify the action to be taken if NAs are found. The default action is for the procedure to fail. An alternative is na.omit, which leads to rejection of cases with missing values on any required variable. (NOTE: If given, this argument must be named.) |
contrasts |
a list of contrasts to be used for some or all of the factors appearing as variables in the model formula. |
An object of class nnet.prmdt with additional model information that allows the results to be homogenized.
The parameter information was taken from the original nnet function. The internal function is from the nnet package.
# Classification data("iris") n <- seq_len(nrow(iris)) .sample <- sample(n, length(n) * 0.75) data.train <- iris[.sample,] data.test <- iris[-.sample,] modelo.nn <- train.nnet(Species~., data.train, size = 20) modelo.nn prob <- predict(modelo.nn, data.test, type = "prob") prob prediccion <- predict(modelo.nn, data.test, type = "class") prediccion # Regression len <- nrow(swiss) sampl <- sample(x = 1:len,size = len*0.20,replace = FALSE) ttesting <- swiss[sampl,] ttraining <- swiss[-sampl,] model.knn <- train.nnet(Infant.Mortality~.,ttraining, size = 20) prediction <- predict(model.knn, ttesting) prediction
# Classification data("iris") n <- seq_len(nrow(iris)) .sample <- sample(n, length(n) * 0.75) data.train <- iris[.sample,] data.test <- iris[-.sample,] modelo.nn <- train.nnet(Species~., data.train, size = 20) modelo.nn prob <- predict(modelo.nn, data.test, type = "prob") prob prediccion <- predict(modelo.nn, data.test, type = "class") prediccion # Regression len <- nrow(swiss) sampl <- sample(x = 1:len,size = len*0.20,replace = FALSE) ttesting <- swiss[sampl,] ttraining <- swiss[-sampl,] model.knn <- train.nnet(Infant.Mortality~.,ttraining, size = 20) prediction <- predict(model.knn, ttesting) prediction
Provides a wrapping function for the qda function.
train.qda(formula, data, ..., subset, na.action)
formula |
A formula of the form groups ~ x1 + x2 + ... That is, the response is the grouping factor and the right hand side specifies the (non-factor) discriminators. |
data |
An optional data frame, list or environment from which variables specified in formula are preferentially to be taken. |
... |
Arguments passed to or from other methods. |
subset |
An index vector specifying the cases to be used in the training sample. (NOTE: If given, this argument must be named.) |
na.action |
Function to specify the action to be taken if NAs are found. The default action is for the procedure to fail. An alternative is na.omit, which leads to rejection of cases with missing values on any required variable. (NOTE: If given, this argument must be named.) |
An object of class qda.prmdt with additional model information that allows the results to be homogenized.
The parameter information was taken from the original qda function. The internal function is from the MASS package.
len <- nrow(iris)
sampl <- sample(x = 1:len, size = len * 0.20, replace = FALSE)
ttesting <- iris[sampl, ]
ttraining <- iris[-sampl, ]
model.qda <- train.qda(Species ~ ., ttraining)
model.qda
prediction <- predict(model.qda, ttesting)
prediction
Provides a wrapping function for the randomForest function.
train.randomForest(formula, data, ..., subset, na.action = na.fail)
formula |
a formula describing the model to be fitted (for the print method, a randomForest object). |
data |
an optional data frame containing the variables in the model. By default the variables are taken from the environment which randomForest is called from. |
... |
optional parameters to be passed to the low level function randomForest.default. |
subset |
an index vector indicating which rows should be used. (NOTE: If given, this argument must be named.) |
na.action |
A function to specify the action to be taken if NAs are found. (NOTE: If given, this argument must be named.) |
An object of class randomForest.prmdt with additional information about the model that allows the results to be homogenized.
The parameter information was taken from the original randomForest function.
The internal function is from the randomForest package.
# Classification
data("iris")
n <- seq_len(nrow(iris))
.sample <- sample(n, length(n) * 0.75)
data.train <- iris[.sample,]
data.test <- iris[-.sample,]
modelo.rf <- train.randomForest(Species~., data.train)
modelo.rf
prob <- predict(modelo.rf, data.test, type = "prob")
prob
prediccion <- predict(modelo.rf, data.test, type = "class")
prediccion

# Regression
len <- nrow(swiss)
sampl <- sample(x = 1:len, size = len*0.20, replace = FALSE)
ttesting <- swiss[sampl,]
ttraining <- swiss[-sampl,]
model.rf <- train.randomForest(Infant.Mortality~., ttraining)
prediction <- predict(model.rf, ttesting)
prediction
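Arguments of the underlying randomForest.default can be passed through the ... argument. A minimal sketch, reusing data.train and data.test from the classification example above and assuming that ntree, mtry and importance are forwarded:

# ntree, mtry and importance are randomForest.default arguments assumed to be
# forwarded through "...".
modelo.rf2 <- train.randomForest(Species~., data.train,
                                 ntree = 500, mtry = 2, importance = TRUE)
prediccion2 <- predict(modelo.rf2, data.test, type = "class")
prediccion2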
Provides a wrapping function for the rpart function.
train.rpart(
  formula,
  data,
  weights,
  subset,
  na.action = na.rpart,
  method,
  model = TRUE,
  x = FALSE,
  y = TRUE,
  parms,
  control,
  cost,
  ...
)
formula |
a formula, with a response but no interaction terms. If this is a data frame, it is taken as the model frame. |
data |
an optional data frame in which to interpret the variables named in the formula. |
weights |
optional case weights. |
subset |
optional expression saying that only a subset of the rows of the data should be used in the fit. |
na.action |
the default action deletes all observations for which y is missing, but keeps those in which one or more predictors are missing. |
method |
one of "anova", "poisson", "class" or "exp". If method is missing then the routine tries to make an intelligent guess. If y is a survival object, then method = "exp" is assumed, if y has 2 columns then method = "poisson" is assumed, if y is a factor then method = "class" is assumed, otherwise method = "anova" is assumed. It is wisest to specify the method directly, especially as more criteria may be added to the function in the future. Alternatively, method can be a list of functions named init, split and eval. Examples are given in the file ‘tests/usersplits.R’ in the sources, and in the vignettes ‘User Written Split Functions’. |
model |
if logical: keep a copy of the model frame in the result? If the input value for model is a model frame (likely from an earlier call to the rpart function), then this frame is used rather than constructing new data. |
x |
keep a copy of the x matrix in the result. |
y |
keep a copy of the dependent variable in the result. If missing and model is supplied this defaults to FALSE. |
parms |
optional parameters for the splitting function. Anova splitting has no parameters. Poisson splitting has a single parameter, the coefficient of variation of the prior distribution on the rates. The default value is 1. Exponential splitting has the same parameter as Poisson. For classification splitting, the list can contain any of: the vector of prior probabilities (component prior), the loss matrix (component loss) or the splitting index (component split). The priors must be positive and sum to 1. The loss matrix must have zeros on the diagonal and positive off-diagonal elements. The splitting index can be gini or information. The default priors are proportional to the data counts, the losses default to 1, and the split defaults to gini. |
control |
a list of options that control details of the rpart algorithm. See rpart.control. |
cost |
a vector of non-negative costs, one for each variable in the model. Defaults to one for all variables. These are scalings to be applied when considering splits, so the improvement on splitting on a variable is divided by its cost in deciding which split to choose. |
... |
arguments to rpart.control may also be specified here. |
An object of class rpart.prmdt with additional information about the model that allows the results to be homogenized.
The parameter information was taken from the original rpart function.
The internal function is from the rpart package.
# Classification
data("iris")
n <- seq_len(nrow(iris))
.sample <- sample(n, length(n) * 0.75)
data.train <- iris[.sample,]
data.test <- iris[-.sample,]
modelo.rpart <- train.rpart(Species~., data.train)
modelo.rpart
prob <- predict(modelo.rpart, data.test, type = "prob")
prob
prediccion <- predict(modelo.rpart, data.test, type = "class")
prediccion

# Regression
len <- nrow(swiss)
sampl <- sample(x = 1:len, size = len*0.20, replace = FALSE)
ttesting <- swiss[sampl,]
ttraining <- swiss[-sampl,]
model.rpart <- train.rpart(Infant.Mortality~., ttraining)
prediction <- predict(model.rpart, ttesting)
prediction
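The method, parms, control and cost arguments described above can be set explicitly. A minimal sketch, reusing data.train and data.test from the classification example above and assuming these arguments are forwarded to rpart as the signature indicates:

# method = "class" forces classification splitting; parms selects the
# information-gain splitting index; control limits the size of the tree.
modelo.rpart2 <- train.rpart(Species~., data.train, method = "class",
                             parms = list(split = "information"),
                             control = rpart::rpart.control(minsplit = 10, maxdepth = 4))
prediccion2 <- predict(modelo.rpart2, data.test, type = "class")
prediccion2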
Provides a wrapping function for the svm function.
train.svm(formula, data, ..., subset, na.action = na.omit, scale = TRUE)
formula |
a symbolic description of the model to be fit. |
data |
an optional data frame containing the variables in the model. By default the variables are taken from the environment which ‘svm’ is called from. |
... |
additional parameters for the low level fitting function svm.default |
subset |
An index vector specifying the cases to be used in the training sample. (NOTE: If given, this argument must be named.) |
na.action |
A function to specify the action to be taken if NAs are found. The default action is na.omit, which leads to rejection of cases with missing values on any required variable. An alternative is na.fail, which causes an error if NA cases are found. (NOTE: If given, this argument must be named.) |
scale |
A logical vector indicating the variables to be scaled. If scale is of length 1, the value is recycled as many times as needed. Per default, data are scaled internally (both x and y variables) to zero mean and unit variance. The center and scale values are returned and used for later predictions. |
An object of class svm.prmdt with additional information about the model that allows the results to be homogenized.
The parameter information was taken from the original svm function.
The internal function is from the e1071 package.
# Classification
data("iris")
n <- seq_len(nrow(iris))
.sample <- sample(n, length(n) * 0.75)
data.train <- iris[.sample,]
data.test <- iris[-.sample,]
modelo.svm <- train.svm(Species~., data.train)
modelo.svm
prob <- predict(modelo.svm, data.test, type = "prob")
prob
prediccion <- predict(modelo.svm, data.test, type = "class")
prediccion

# Regression
len <- nrow(swiss)
sampl <- sample(x = 1:len, size = len*0.20, replace = FALSE)
ttesting <- swiss[sampl,]
ttraining <- swiss[-sampl,]
model.svm <- train.svm(Infant.Mortality~., ttraining)
prediction <- predict(model.svm, ttesting)
prediction
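The scale and na.action arguments described above, together with svm tuning parameters passed through ..., can be set explicitly. A minimal sketch, reusing data.train and data.test from the classification example above and assuming kernel and cost are forwarded to svm:

# kernel and cost (svm arguments) are assumed to be forwarded through "...";
# scale = FALSE disables the internal scaling described above.
modelo.svm2 <- train.svm(Species~., data.train, kernel = "linear", cost = 10,
                         scale = FALSE, na.action = na.omit)
prediccion2 <- predict(modelo.svm2, data.test, type = "class")
prediccion2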
Provides a wrapping function for the xgb.train function.
train.xgboost(
  formula,
  data,
  nrounds,
  watchlist = list(),
  obj = NULL,
  feval = NULL,
  verbose = 1,
  print_every_n = 1L,
  early_stopping_rounds = NULL,
  maximize = NULL,
  save_period = NULL,
  save_name = "xgboost.model",
  xgb_model = NULL,
  callbacks = list(),
  eval_metric = "mlogloss",
  extra_params = NULL,
  booster = "gbtree",
  objective = NULL,
  eta = 0.3,
  gamma = 0,
  max_depth = 6,
  min_child_weight = 1,
  subsample = 1,
  colsample_bytree = 1,
  ...
)
formula |
a symbolic description of the model to be fit. |
data |
training dataset. xgb.train accepts only an xgb.DMatrix as the input. xgboost, in addition, also accepts matrix, dgCMatrix, or name of a local data file. |
nrounds |
max number of boosting iterations. |
watchlist |
named list of xgb.DMatrix datasets to use for evaluating model performance. Metrics specified in either eval_metric or feval will be computed for each of these datasets during each boosting iteration, and stored in the end as a field named evaluation_log in the resulting object. When either verbose>=1 or cb.print.evaluation callback is engaged, the performance results are continuously printed out during the training. E.g., specifying watchlist=list(validation1=mat1, validation2=mat2) allows tracking the performance of each round's model on mat1 and mat2. |
obj |
customized objective function. Returns gradient and second order gradient with given prediction and dtrain. |
feval |
customized evaluation function. Returns list(metric='metric-name', value='metric-value') with given prediction and dtrain. |
verbose |
If 0, xgboost will stay silent. If 1, it will print information about performance. If 2, some additional information will be printed out. Note that setting verbose > 0 automatically engages the cb.print.evaluation(period=1) callback function. |
print_every_n |
Print each n-th iteration evaluation messages when verbose>0. Default is 1 which means all messages are printed. This parameter is passed to the cb.print.evaluation callback. |
early_stopping_rounds |
If NULL, the early stopping function is not triggered. If set to an integer k, training with a validation set will stop if the performance doesn't improve for k rounds. Setting this parameter engages the cb.early.stop callback. |
maximize |
If feval and early_stopping_rounds are set, then this parameter must be set as well. When it is TRUE, it means the larger the evaluation score the better. This parameter is passed to the cb.early.stop callback. |
save_period |
when it is non-NULL, model is saved to disk after every save_period rounds, 0 means save at the end. The saving is handled by the cb.save.model callback. |
save_name |
the name or path for periodically saved model file. |
xgb_model |
a previously built model to continue the training from. Could be either an object of class xgb.Booster, or its raw data, or the name of a file with a previously saved model. |
callbacks |
a list of callback functions to perform various tasks during boosting. See callbacks. Some of the callbacks are automatically created depending on the parameters' values. Users can provide either existing or their own callback methods in order to customize the training process. |
eval_metric |
eval_metric evaluation metrics for validation data. Users can pass a self-defined function to it. Default: the metric is assigned according to the objective (rmse for regression, error for classification, and mean average precision for ranking). A list is provided in the details section. |
extra_params |
the list of parameters. The complete list of parameters is available at http://xgboost.readthedocs.io/en/latest/parameter.html. |
booster |
booster which booster to use, can be gbtree or gblinear. Default: gbtree. |
objective |
objective specify the learning task and the corresponding learning objective, users can pass a self-defined function to it. The default objective options are below: + reg:linear linear regression (Default). + reg:logistic logistic regression. + binary:logistic logistic regression for binary classification. Output probability. + binary:logitraw logistic regression for binary classification, output score before logistic transformation. + num_class set the number of classes. To use only with multiclass objectives. + multi:softmax set xgboost to do multiclass classification using the softmax objective. Class is represented by a number and should be from 0 to num_class - 1. + multi:softprob same as softmax, but prediction outputs a vector of ndata * nclass elements, which can be further reshaped to ndata, nclass matrix. The result contains predicted probabilities of each data point belonging to each class. + rank:pairwise set xgboost to do ranking task by minimizing the pairwise loss. |
eta |
eta controls the learning rate: scale the contribution of each tree by a factor of 0 < eta < 1 when it is added to the current approximation. Used to prevent overfitting by making the boosting process more conservative. A lower value for eta implies a larger value for nrounds: a low eta value means a model more robust to overfitting but slower to compute. Default: 0.3 |
gamma |
gamma minimum loss reduction required to make a further partition on a leaf node of the tree. The larger, the more conservative the algorithm will be. |
max_depth |
max_depth maximum depth of a tree. Default: 6 |
min_child_weight |
min_child_weight minimum sum of instance weight (hessian) needed in a child. If the tree partition step results in a leaf node with the sum of instance weight less than min_child_weight, then the building process will give up further partitioning. In linear regression mode, this simply corresponds to minimum number of instances needed to be in each node. The larger, the more conservative the algorithm will be. Default: 1 |
subsample |
subsample subsample ratio of the training instance. Setting it to 0.5 means that xgboost randomly collected half of the data instances to grow trees and this will prevent overfitting. It makes computation shorter (because less data to analyse). It is advised to use this parameter with eta and increase nrounds. Default: 1 |
colsample_bytree |
colsample_bytree subsample ratio of columns when constructing each tree. Default: 1 |
... |
other parameters to pass to params. |
An object of class xgb.Booster.prmdt with additional information about the model that allows the results to be homogenized.
The parameter information was taken from the original xgb.train function.
The internal function is from the xgboost package.
# Classification
data("iris")
n <- seq_len(nrow(iris))
.sample <- sample(n, length(n) * 0.75)
data.train <- iris[.sample,]
data.test <- iris[-.sample,]
modelo.xg <- train.xgboost(Species~., data.train, nrounds = 10, maximize = FALSE)
modelo.xg
prob <- predict(modelo.xg, data.test, type = "prob")
prob
prediccion <- predict(modelo.xg, data.test, type = "class")
prediccion

# Regression
len <- nrow(swiss)
sampl <- sample(x = 1:len, size = len*0.20, replace = FALSE)
ttesting <- swiss[sampl,]
ttraining <- swiss[-sampl,]
model.xgb <- train.xgboost(Infant.Mortality~., ttraining, nrounds = 10, maximize = FALSE)
prediction <- predict(model.xgb, ttesting)
prediction
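The boosting parameters described above (nrounds, eta, max_depth, subsample and colsample_bytree, among others) are arguments of train.xgboost itself and can be tuned directly. A minimal sketch, reusing data.train and data.test from the classification example above:

# A more conservative configuration: smaller learning rate (eta), shallower
# trees and row/column subsampling, compensated by more boosting rounds.
modelo.xg2 <- train.xgboost(Species~., data.train, nrounds = 50, eta = 0.1,
                            max_depth = 4, subsample = 0.8, colsample_bytree = 0.8,
                            verbose = 0, maximize = FALSE)
prediccion2 <- predict(modelo.xg2, data.test, type = "class")
prediccion2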
Methods to unify the different ways of creating predictive models and their different predictive formats for classification and regression. It includes methods such as K-Nearest Neighbors Schliep, K. P. (2004) <doi:10.5282/ubm/epub.1769>, Decision Trees Leo Breiman, Jerome H. Friedman, Richard A. Olshen, Charles J. Stone (2017) <doi:10.1201/9781315139470>, ADA Boosting Esteban Alfaro, Matias Gamez, Noelia García (2013) <doi:10.18637/jss.v054.i02>, Extreme Gradient Boosting Chen & Guestrin (2016) <doi:10.1145/2939672.2939785>, Random Forest Breiman (2001) <doi:10.1023/A:1010933404324>, Neural Networks Venables, W. N., & Ripley, B. D. (2002) <ISBN:0-387-95457-0>, Support Vector Machines Bennett, K. P. & Campbell, C. (2000) <doi:10.1145/380995.380999>, Bayesian Methods Gelman, A., Carlin, J. B., Stern, H. S., & Rubin, D. B. (1995) <doi:10.1201/9780429258411>, Linear Discriminant Analysis Venables, W. N., & Ripley, B. D. (2002) <ISBN:0-387-95457-0>, Quadratic Discriminant Analysis Venables, W. N., & Ripley, B. D. (2002) <ISBN:0-387-95457-0>, Logistic Regression Dobson, A. J., & Barnett, A. G. (2018) <doi:10.1201/9781315182780> and Penalized Logistic Regression Friedman, J. H., Hastie, T., & Tibshirani, R. (2010) <doi:10.18637/jss.v033.i01>.
Package: | traineR |
Type: | Package |
Version: | 2.2.0 |
Date: | 2023-11-09 |
License: | GPL (>=2) |
Maintainer: Oldemar Rodriguez Rojas <[email protected]>
Oldemar Rodriguez Rojas <[email protected]>
Andres Navarro D
Ariel Arroyo S
Diego Jiménez
Plotting prmdt ada models
varplot(x, ...)
x |
An ada prmdt model. |
... |
optional arguments to the print or format method. |
A plot of the importance of variables.
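A minimal usage sketch. It assumes train.ada, the ADA Boosting wrapper of this package, as the source of the ada prmdt model, and restricts iris to two classes because ada fits binary responses:

data("iris")
# ada handles binary classification, so keep only two species for this sketch;
# train.ada is assumed here as the traineR constructor of ada prmdt models.
iris2 <- droplevels(iris[iris$Species != "setosa", ])
model.ada <- train.ada(Species~., iris2)
varplot(model.ada)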