What’s so special about {kindling}
This package is designed to be flexible enough for any machine
learning task; even time series and image classification can be
supported. Yes, you can do both linear regression and logistic
regression with a few extra steps: a customized optimizer and loss
function. The train_nn() function (available in
>v0.3.x) supports this through the optimizer / optimizer_args
arguments and the loss argument. In both cases, the key is to remove all
hidden layers and rely entirely on the output layer and the appropriate
loss function to recover the classical model’s behavior.
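As a rough sketch of what such a call can look like (the optimizer, loss,
epochs, and verbose arguments appear in the examples below; the assumption
here is that optimizer_args accepts a named list forwarded to the torch
optimizer, and the specific settings are illustrative only):

box::use(kindling[train_nn])

# Hypothetical sketch: custom optimizer settings plus a custom torch loss.
fit = train_nn(
  mpg ~ .,
  data = mtcars,
  optimizer = "adam",
  optimizer_args = list(weight_decay = 1e-4), # assumption: named list of extras
  loss = torch::nnf_l1_loss,                  # any torch loss function
  epochs = 100,
  verbose = FALSE
)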
Setup
box::use(
kindling[train_nn, act_funs, args],
recipes[
recipe, step_dummy, step_normalize,
all_nominal_predictors, all_numeric_predictors
],
rsample[initial_split, training, testing],
yardstick[metric_set, rmse, rsq, accuracy, mn_log_loss],
dplyr[mutate, select],
tibble[tibble]
)

Linear Regression as a Special Case
A standard linear regression model predicts a continuous outcome as a weighted sum of inputs — no nonlinearity, no hidden layers. A neural network recovers this exactly when:
- There are no hidden layers (hidden_neurons = integer(0), or simply omit it),
- The output activation is the identity (i.e., no activation), and
- The loss function is MSE (loss = "mse"), the standard choice; a different
loss can be substituted, but then the fit no longer targets OLS.
Under these conditions, gradient descent minimizes the same objective as ordinary least squares, and the learned weights converge to the OLS solution given sufficient epochs and a small learning rate.
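Written out, with no hidden layers the network computes a plain affine map,
and minimizing MSE is exactly the least-squares problem (a standard identity,
stated here for reference):

$$
\hat{y} = Xw + b, \qquad
\mathcal{L}(w, b) = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - x_i^\top w - b\right)^2,
$$

so the minimizer of $\mathcal{L}$ coincides with the OLS solution
$\hat{w} = (X^\top X)^{-1} X^\top y$ (with the intercept column absorbed into $X$).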
Data
We use mtcars to predict fuel efficiency
(mpg) from the other variables.
set.seed(42)
split = initial_split(mtcars, prop = 0.8)
train = training(split)
test = testing(split)
rec = recipe(mpg ~ ., data = train) |>
  step_normalize(all_numeric_predictors())

Fitting the model
To build a network with no hidden units, the hidden_neurons parameter
of train_nn() accepts any of the following:
- NULL
- an empty vector, c()
- omitting the argument entirely
In this example, the empty vector c() is used, which collapses the
network to a single linear layer from inputs to output. Pairing
optimizer = "rmsprop" with a small learn_rate mirrors classical
gradient descent. Note that this fit uses torch::nnf_l1_loss (least
absolute deviations) rather than "mse", to show that any torch loss
function can be plugged in; swap in loss = "mse" to target the OLS
objective exactly.
lm_nn = train_nn(
  mpg ~ .,
  data = train,
  hidden_neurons = c(),        # no hidden layers: a single linear map
  loss = torch::nnf_l1_loss,   # L1 loss; use "mse" for the OLS objective
  optimizer = "rmsprop",
  learn_rate = 0.01,
  epochs = 200,
  verbose = FALSE
)
lm_nn

##
## ========================== Generalized Neural Network ==========================
##
##
## -- Model Summary ---------------------------------------------------------------
## -------------------------------------------------------------------
## NN Model Type : FFNN n_predictors : 10
## Number of Epochs : 200 n_response : 1
## Hidden Layer Units : reg. : None
## Number of Hidden Layers : 0 Device : cpu
## Pred. Type : regression :
## -------------------------------------------------------------------
##
##
##
## -- Activation Functions --------------------------------------------------------
## -------------------------------------------------
## Layer {} : No act function applied
## Output Activation : No act function applied
## -------------------------------------------------
##
##
##
## -- Architecture Spec -----------------------------------------------------------
## --------------------------------------------------------------
## nn_layer : N/A before_output_transform : N/A
## out_nn_layer : N/A after_output_transform : N/A
## nn_layer_args : N/A last_layer_args : N/A
## layer_arg_fn : N/A input_transform : N/A
## forward_extract : N/A :
## --------------------------------------------------------------
Evaluation
preds = predict(lm_nn, newdata = test)
tibble(
truth = test$mpg,
estimate = preds
) |>
  metric_set(rmse, rsq)(truth = truth, estimate = estimate)

## # A tibble: 2 × 3
## .metric .estimator .estimate
## <chr> <chr> <dbl>
## 1 rmse standard 4.40
## 2 rsq standard 0.930
Comparison with lm()
lm_fit = lm(mpg ~ ., data = train)
tibble(
truth = test$mpg,
estimate = predict(lm_fit, newdata = test)
) |>
  metric_set(rmse, rsq)(truth = truth, estimate = estimate)

## # A tibble: 2 × 3
## .metric .estimator .estimate
## <chr> <chr> <dbl>
## 1 rmse standard 4.88
## 2 rsq standard 0.493
With loss = "mse", the two models should produce very similar RMSE and
R² values; any small gap reflects that gradient descent is an iterative
approximation, while lm() solves for the exact OLS coefficients
directly. Here the fits differ more noticeably because the network
minimized the L1 objective rather than squared error. Refitting with
loss = "mse" and more epochs, or switching to optimizer = "lbfgs" (if
supported), will close the gap.
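As a quick sketch of that refit (same arguments as above, only the loss
swapped and the epoch count raised; the exact numbers are illustrative):

ols_nn = train_nn(
  mpg ~ .,
  data = train,
  hidden_neurons = c(),   # still no hidden layers
  loss = "mse",           # squared error: the OLS objective
  optimizer = "rmsprop",
  learn_rate = 0.01,
  epochs = 1000,          # more epochs to converge toward the exact solution
  verbose = FALSE
)

tibble(
  truth = test$mpg,
  estimate = predict(ols_nn, newdata = test)
) |>
  metric_set(rmse, rsq)(truth = truth, estimate = estimate)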
Logistic Regression as a Special Case
Logistic regression models a binary or multiclass outcome by passing a linear combination of inputs through a sigmoid or softmax activation. A neural network with:
- No hidden layers,
- A sigmoid output for binary classification (or softmax for multiclass), and
- Cross-entropy (loss = "cross_entropy") as the loss function

is mathematically equivalent to logistic regression.
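In symbols (again just the standard identity), minimizing cross-entropy
for this architecture is the same as maximizing the logistic-regression
likelihood:

$$
p_i = \sigma\!\left(x_i^\top w + b\right), \qquad
\mathcal{L}(w, b) = -\frac{1}{n}\sum_{i=1}^{n}\Big[y_i \log p_i + (1 - y_i)\log(1 - p_i)\Big],
$$

which is exactly the negative log-likelihood that
glm(..., family = binomial()) minimizes (up to the 1/n factor).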
Binary Logistic Regression
We use the Sonar dataset from {mlbench} to
distinguish rocks from mines (binary outcome).
data("Sonar", package = "mlbench")
sonar = Sonar
set.seed(42)
split_s = initial_split(sonar, prop = 0.8, strata = Class)
train_s = training(split_s)
test_s = testing(split_s)
rec_s = recipe(Class ~ ., data = train_s) |>
step_normalize(all_numeric_predictors())
logit_nn = train_nn(
  Class ~ .,
  data = train_s,
  hidden_neurons = c(),     # no hidden layers: a linear map into the output
  loss = "cross_entropy",   # cross-entropy = logistic-regression likelihood
  optimizer = "adam",
  learn_rate = 0.01,
  epochs = 200,
  verbose = FALSE
)
logit_nn

##
## ========================== Generalized Neural Network ==========================
##
##
## -- Model Summary ---------------------------------------------------------------
## -----------------------------------------------------------------------
## NN Model Type : FFNN n_predictors : 60
## Number of Epochs : 200 n_response : 2
## Hidden Layer Units : reg. : None
## Number of Hidden Layers : 0 Device : cpu
## Pred. Type : classification :
## -----------------------------------------------------------------------
##
##
##
## -- Activation Functions --------------------------------------------------------
## -------------------------------------------------
## Layer {} : No act function applied
## Output Activation : No act function applied
## -------------------------------------------------
##
##
##
## -- Architecture Spec -----------------------------------------------------------
## --------------------------------------------------------------
## nn_layer : N/A before_output_transform : N/A
## out_nn_layer : N/A after_output_transform : N/A
## nn_layer_args : N/A last_layer_args : N/A
## layer_arg_fn : N/A input_transform : N/A
## forward_extract : N/A :
## --------------------------------------------------------------
preds_s = predict(logit_nn, newdata = test_s, type = "response")
tibble(
truth = test_s$Class,
estimate = preds_s
) |>
  accuracy(truth = truth, estimate = estimate)

## # A tibble: 1 × 3
## .metric .estimator .estimate
## <chr> <chr> <dbl>
## 1 accuracy binary 0.744
Comparison with glm() / nnet::multinom()
box::use(nnet[multinom])
glm_fit = glm(Class ~ ., data = train_s, family = binomial())
glm_probs = predict(glm_fit, newdata = test_s, type = "response")  # P(Class == "R")

tibble(
  truth = test_s$Class,
  estimate = factor(ifelse(glm_probs < 0.5, "M", "R"), levels = levels(test_s$Class))
) |>
  accuracy(truth = truth, estimate = estimate)

## # A tibble: 1 × 3
## .metric .estimator .estimate
## <chr> <chr> <dbl>
## 1 accuracy binary 0.698
Again, accuracy should be comparable between the two approaches. The neural network version converges iteratively, so the match is not guaranteed to be exact, but both are optimizing the same cross-entropy objective over a linear model.
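For completeness, the multinom import from the setup block gives the
multiclass-capable analogue; as a sketch, on this binary problem it fits
the same logistic model:

mn_fit = multinom(Class ~ ., data = train_s, trace = FALSE)

tibble(
  truth = test_s$Class,
  estimate = predict(mn_fit, newdata = test_s)  # default type = "class"
) |>
  accuracy(truth = truth, estimate = estimate)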
