Example data generator for testing FastSurvivalSVM — data

This function simulates a right-censored survival dataset from a fixed Weibull model with three covariates. It is intended **only** as a simple helper for examples, unit tests, and reproducible illustrations of FastSurvivalSVM functionality.

data_generation(n, prop_cen)

Arguments

n: Integer. Sample size.
prop_cen: Numeric in $(0, 1)$. Approximate proportion of censored observations among the n generated individuals.

Value

A data.frame with the following columns:

tempo: Observed time (minimum between event time and censoring time).
cens: Event indicator: 1 = event, 0 = right censoring.
x1: Continuous covariate generated from $N(1, 1^2)$.
x2: Continuous covariate generated from $N(2, 2^2)$.
x3: Continuous covariate generated from an exponential distribution with rate 1.

Details

The data-generating mechanism is not general-purpose and is not meant to cover arbitrary covariate structures or censoring schemes. The model is hard-coded with three covariates (x1, x2, x3), a Weibull baseline distribution, and a specific nonlinear predictor.

The true event times time_t follow a Weibull distribution with shape parameter shape = 2 and scale parameter scale = 5, modified by a nonlinear predictor:

$$ \eta = x_1 \log(|x_2|) + 2 \sin(x_3 - x_2)^2. $$

The event times are generated via the inverse CDF method:

$$ T = \left( -\log(1 - U) \right)^{1/\text{shape}} \times \text{scale} \times \exp(-\eta), $$

where $U \sim \text{Uniform}(0, 1)$.
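As a worked step, the inverse CDF expression can be recovered from the standard Weibull distribution function. This is a sketch of the textbook derivation, not a statement about the package source; the symbols $F_0$ and $T_0$ are introduced here only for illustration and denote the baseline distribution function and the baseline event time before the predictor is applied:

$$ F_0(t) = 1 - \exp\left\{ -\left( t / \text{scale} \right)^{\text{shape}} \right\}, \quad t > 0. $$

Setting $U = F_0(T_0)$ and solving for $T_0$ gives

$$ T_0 = \text{scale} \times \left( -\log(1 - U) \right)^{1/\text{shape}}, \qquad T = T_0 \times \exp(-\eta). $$

Taking logarithms,

$$ \log T = \log(\text{scale}) - \eta + \frac{1}{\text{shape}} \log\left( -\log(1 - U) \right), $$

so the predictor acts as in an accelerated failure time model: larger values of $\eta$ shorten the event times. With shape = 2 and scale = 5 the formula reduces to $T = 5 \sqrt{-\log(1 - U)} \, \exp(-\eta)$, and at $\eta = 0$ the median event time ($U = 0.5$) is $5 \sqrt{\log 2} \approx 4.16$.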

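Censoring is imposed by randomly selecting approximately n * prop_cen individuals and replacing each selected individual's event time with a uniform draw between the smallest simulated time_t in the sample and that individual's own time_t. For those individuals, the event indicator cens is set to 0, so their observed time tempo never exceeds their true event time. All other individuals keep cens = 1 and tempo equal to time_t.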
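The mechanism described above can be sketched in base R as follows. This is a minimal illustration written from the description on this page, not the package's actual source: the function name simulate_weibull_toy, the use of round() for the number of censored individuals, the order of the random draws, and the reading of $\sin(x_3 - x_2)^2$ as the square of the sine are all assumptions, so the output is not expected to match data_generation() draw for draw.

```r
# Hypothetical re-implementation sketch of the documented mechanism.
# Name, rounding rule, and draw order are assumptions, not package code.
simulate_weibull_toy <- function(n, prop_cen, shape = 2, scale = 5) {
  # Covariates as documented: N(1, 1^2), N(2, 2^2), Exp(rate = 1)
  x1 <- rnorm(n, mean = 1, sd = 1)
  x2 <- rnorm(n, mean = 2, sd = 2)
  x3 <- rexp(n, rate = 1)

  # Nonlinear predictor; sin(.)^2 is taken as (sin(.))^2 (assumption)
  eta <- x1 * log(abs(x2)) + 2 * sin(x3 - x2)^2

  # Inverse CDF draw from the Weibull baseline, rescaled by exp(-eta)
  u <- runif(n)
  time_t <- (-log(1 - u))^(1 / shape) * scale * exp(-eta)

  # Pick roughly n * prop_cen individuals to censor (round() is an assumption)
  n_cen <- round(n * prop_cen)
  idx <- sample.int(n, n_cen)

  tempo <- time_t
  cens <- rep(1L, n)

  # Censoring time: uniform between the smallest time_t and the individual's own
  tempo[idx] <- runif(n_cen, min = min(time_t), max = time_t[idx])
  cens[idx] <- 0L

  data.frame(tempo = tempo, cens = cens, x1 = x1, x2 = x2, x3 = x3)
}

set.seed(123)
toy <- simulate_weibull_toy(n = 300L, prop_cen = 0.1)
mean(toy$cens == 0)  # realized censoring proportion
```

Because the censoring time is drawn below the individual's own event time, every censored observation is censored strictly by construction, which is why the realized censoring proportion tracks prop_cen so closely.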

Examples

set.seed(123)
df <- data_generation(n = 300L, prop_cen = 0.1)
head(df)
#>         tempo cens        x1         x2         x3
#> 1   1.4801101    1 0.4395244  0.5695156 0.33901239
#> 2   2.3390461    1 0.7698225  0.4946221 0.43618362
#> 3 788.5807128    1 2.5587083  0.1229226 0.73802924
#> 4  17.1329416    1 1.0705084 -0.1050266 0.57069724
#> 5   1.6826516    1 1.1292877  1.1256809 0.08437658
#> 6   0.0492852    1 2.7150650  2.6623583 1.35933656

if (reticulate::py_module_available("sksurv")) {
  # Example: using this toy dataset with FastKernelSurvivalSVM
  fit <- fastsvm(
    data       = df,
    time_col   = "tempo",
    delta_col  = "cens",
    kernel     = "rbf",
    alpha      = 1,
    rank_ratio = 0
  )

  score(fit, df)
}
#> [1] 0.8438216