This function simulates a right-censored survival dataset from a fixed
Weibull model with three covariates. It is intended **only** as a simple
helper for examples, unit tests, and reproducible illustrations of
FastSurvivalSVM functionality.
data_generation(n, prop_cen)
A data.frame with the following columns:
tempo: Observed time (minimum of the event time and the censoring time).
cens: Event indicator (1 = event, 0 = right censoring).
x1: Continuous covariate generated from \(N(1, 1^2)\).
x2: Continuous covariate generated from \(N(2, 2^2)\).
x3: Continuous covariate generated from an exponential distribution with rate 1.
The data-generating mechanism is not general-purpose and is not meant to
cover arbitrary covariate structures or censoring schemes. The model is
hard-coded with three covariates (x1, x2, x3), a
Weibull baseline distribution, and a specific nonlinear predictor.
The true event times time_t follow a Weibull distribution with
shape parameter shape = 2 and scale parameter scale = 5,
modified by a nonlinear predictor:
$$
\eta = x_1 \log(|x_2|) + 2 \sin(x_3 - x_2)^2,
$$
and the event times are generated via the inverse CDF method:
$$
T = \left( -\log(1 - U) \right)^{1/\text{shape}} \times
\text{scale} \times \exp(-\eta),
$$
where \(U \sim \text{Uniform}(0, 1)\).
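The inverse-CDF step above can be sketched in a few lines of base R. This is a minimal illustration, not the package's source code: the variable names are hypothetical, and the sine term is read as \((\sin(x_3 - x_2))^2\), which is an assumption about the notation in the formula.

```r
# Illustrative sketch only; names are hypothetical, not package internals.
set.seed(1)
n     <- 5L
shape <- 2
scale <- 5

# Covariates as described in the returned columns
x1 <- rnorm(n, mean = 1, sd = 1)
x2 <- rnorm(n, mean = 2, sd = 2)
x3 <- rexp(n, rate = 1)

# Nonlinear predictor.
# Assumption: the sine term is interpreted as (sin(x3 - x2))^2.
eta <- x1 * log(abs(x2)) + 2 * sin(x3 - x2)^2

# Inverse-CDF draw: T = (-log(1 - U))^(1/shape) * scale * exp(-eta)
u      <- runif(n)
time_t <- (-log(1 - u))^(1 / shape) * scale * exp(-eta)

time_t
```

Under this reading, each event time is, conditional on the covariates, a Weibull draw with shape 2 and scale \(5 \exp(-\eta)\), so the same values could be obtained with `qweibull(u, shape = shape, scale = scale * exp(-eta))`.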
Right censoring is imposed by randomly selecting approximately
n * prop_cen individuals and replacing each selected individual's
event time with a uniform draw between the smallest time_t in the
sample and that individual's own time_t. For those individuals, the
event indicator cens is set to 0.
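The censoring step described above might look roughly like the following in base R. This is a sketch under stated assumptions rather than the function's actual implementation: the stand-in event times, the use of `round()` to turn n * prop_cen into a count, and the helper names `n_cen` and `idx` are all hypothetical.

```r
# Illustrative sketch only; not the package's source code.
set.seed(2)
time_t   <- rweibull(10L, shape = 2, scale = 5)  # stand-in event times
prop_cen <- 0.3

n     <- length(time_t)
n_cen <- round(n * prop_cen)        # assumption: how "approximately" is resolved
idx   <- sample.int(n, size = n_cen)  # individuals chosen to be censored

tempo <- time_t          # observed time starts as the event time
cens  <- rep(1L, n)      # 1 = event observed

# Censored individuals get a uniform draw between the sample minimum
# of time_t and their own event time; their indicator becomes 0.
tempo[idx] <- runif(n_cen, min = min(time_t), max = time_t[idx])
cens[idx]  <- 0L

data.frame(tempo = tempo, cens = cens)
```

Because each censoring time is drawn below the individual's own event time, the observed time of a censored individual never exceeds the event time it replaces.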
set.seed(123)
df <- data_generation(n = 300L, prop_cen = 0.1)
head(df)
#>         tempo cens        x1         x2         x3
#> 1   1.4801101    1 0.4395244  0.5695156 0.33901239
#> 2   2.3390461    1 0.7698225  0.4946221 0.43618362
#> 3 788.5807128    1 2.5587083  0.1229226 0.73802924
#> 4  17.1329416    1 1.0705084 -0.1050266 0.57069724
#> 5   1.6826516    1 1.1292877  1.1256809 0.08437658
#> 6   0.0492852    1 2.7150650  2.6623583 1.35933656
if (reticulate::py_module_available("sksurv")) {
  # Example: using this toy dataset with FastKernelSurvivalSVM
  fit <- fastsvm(
    data       = df,
    time_col   = "tempo",
    delta_col  = "cens",
    kernel     = "rbf",
    alpha      = 1,
    rank_ratio = 0
  )
  score(fit, df)
}
#> [1] 0.8438216