Create datasets from a hypothetical cohort study. Because correlation structures are common in udder health studies, the generated datasets mimic the collection of two milk samples, taken 1 month apart, from each quarter of a random sample of 30 cows per herd in 100 dairy herds. The first milk sample (S1) is used to identify quarters at risk of intramammary infection (IMI) at the beginning of the cohort, while the second (S2) is used to identify the outcome (acquisition of a new IMI).

Three hypothetical exposures, E_q, E_c and E_h (quarter, cow and herd level), are generated with a known strength of association (OR ≈ 3.0). As is often the case (Dufour et al., 2012), exposures are equally associated with the odds of a prevalent IMI on the first milk sample and with the odds of IMI acquisition on the second sample. Exposures are not associated with the odds of eliminating an existing IMI (OR = 1.0). If S. aureus or CNS is chosen, default parameters are used. Otherwise the user has to provide their own.

make_data(n_herd, n_cow, bact = c("saureus", "cns", "other"), E_hPr = NULL,
  E_cPr = NULL, E_qPr = NULL, sigma_sqhPr = NULL, sigma_sqcPr = NULL,
  b0_Pr = NULL, OR_hPr = NULL, OR_cPr = NULL, OR_qPr = NULL,
  sigma_sqhI = NULL, sigma_sqcI = NULL, b0_I = NULL, OR_hI = NULL,
  OR_cI = NULL, OR_qI = NULL, sigma_sqhEl = NULL, sigma_sqcEl = NULL,
  b0_El = NULL, se_parms = NULL, sp_parms = NULL, se_series = NULL,
  sp_series = NULL, se_parall = NULL, sp_parall = NULL, se_tri = NULL,
  sp_tri = NULL)

Arguments

n_herd

Number of herds.

n_cow

Number of cows per herd.

bact

Type of bacteria: S. aureus, CNS, or other. If other, the user has to provide all parameters.

E_hPr

Exposure distribution (0 to 1) of the binary herd-level (h) predictor. S. aureus and CNS = 0.5.

E_cPr

Exposure distribution (0 to 1) of the binary cow-level (c) predictor. S. aureus and CNS = 0.5.

E_qPr

Exposure distribution (0 to 1) of the binary quarter-level (q) predictor. S. aureus and CNS = 0.5.

sigma_sqhPr

Herd-level variance (sigma_sq) for prevalence of intramammary infection (IMI). S. aureus = 0.14; CNS = 0.363.

sigma_sqcPr

Cow-level variance for prevalence of IMI. S. aureus = 2.25; CNS = 0.294.

b0_Pr

Intercept for IMI prevalence; aiming at a prevalence of 2.5%. S. aureus = -6.7; CNS = -2.15.

OR_hPr

OR of association between herd-level variable and IMI prevalence. S. aureus and CNS = 3.

OR_cPr

OR of association between cow-level variable and IMI prevalence. S. aureus and CNS = 3.

OR_qPr

OR of association between quarter-level variable and IMI prevalence. S. aureus and CNS = 3.

sigma_sqhI

Herd-level variance for incidence of IMI. S. aureus = 0.838; CNS = 0.27.

sigma_sqcI

Cow-level variance for incidence of IMI. S. aureus = 2.926; CNS = 0.256.

b0_I

Intercept for IMI incidence. S. aureus = -8.3; CNS = -2.4.

OR_hI

OR of association between herd-level variable and IMI incidence. S. aureus and CNS = 3.

OR_cI

OR of association between cow-level variable and IMI incidence. S. aureus and CNS = 3.

OR_qI

OR of association between quarter-level variable and IMI incidence. S. aureus and CNS = 3.

sigma_sqhEl

Herd-level variance for elimination of IMI. S. aureus = 0.15; CNS = 0.112.

sigma_sqcEl

Cow-level variance for elimination of IMI. S. aureus = 2.246; CNS = 0.7.

b0_El

Intercept for IMI persistency. S. aureus = -0.6; CNS = 1.6.

se_parms

Vector for mode and x to determine shape parameters of Beta distribution of sensitivity (Se). S. aureus = c(0.90, 0.85); CNS = c(0.60, 0.55).

sp_parms

Vector for mode and x to determine shape parameters of Beta distribution of specificity (Sp). S. aureus = 1; CNS = c(0.95, 0.90).

se_series

Se improvement or loss associated with sampling strategy: duplicate series. S. aureus = -0.1; CNS = -0.25.

sp_series

Sp improvement or loss associated with sampling strategy: duplicate series. S. aureus = 0; CNS = 0.05.

se_parall

Se improvement or loss associated with sampling strategy: duplicate parallel. S. aureus = 0.1; CNS = 0.15.

sp_parall

Sp improvement or loss associated with sampling strategy: duplicate parallel. S. aureus = 0; CNS = -0.05.

se_tri

Se improvement or loss associated with sampling strategy: triplicate (2 out of 3). S. aureus = 0; CNS = 0.

sp_tri

Sp improvement or loss associated with sampling strategy: triplicate (2 out of 3). S. aureus = 0; CNS = 0.10.
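To make the roles of the intercept, odds ratios and variance arguments concrete, here is a minimal sketch of a random-intercept logistic model for the true status at the first sampling. It is not the internal code of make_data(): the model structure, the use of four quarters per cow, and every object name not listed above (d, cow_id, u_h, u_c, eta) are assumptions made for illustration. The numeric values are the CNS defaults documented above.

```r
# Sketch only: assumed two-level random-intercept logistic model for true S1 status.
# Values are the documented CNS defaults; the structure itself is an assumption.
set.seed(1)
n_herd <- 100
n_cow <- 30
n_quarter <- 4

b0_Pr <- -2.15
sigma_sqhPr <- 0.363
sigma_sqcPr <- 0.294
OR_hPr <- 3
OR_cPr <- 3
OR_qPr <- 3

# One row per quarter; quarter varies fastest, then cow, then herd
d <- expand.grid(quarter = seq_len(n_quarter),
                 cow = seq_len(n_cow),
                 herd = seq_len(n_herd))
cow_id <- (d$herd - 1) * n_cow + d$cow  # unique cow index across herds

# Binary exposures drawn once per herd, per cow and per quarter (probability 0.5)
d$E_h <- rbinom(n_herd, 1, 0.5)[d$herd]
d$E_c <- rbinom(n_herd * n_cow, 1, 0.5)[cow_id]
d$E_q <- rbinom(nrow(d), 1, 0.5)

# Normal random intercepts; variances are converted to standard deviations
u_h <- rnorm(n_herd, 0, sqrt(sigma_sqhPr))[d$herd]
u_c <- rnorm(n_herd * n_cow, 0, sqrt(sigma_sqcPr))[cow_id]

# Linear predictor on the logit scale; each OR enters as log(OR)
eta <- b0_Pr + log(OR_hPr) * d$E_h + log(OR_cPr) * d$E_c +
  log(OR_qPr) * d$E_q + u_h + u_c
d$S1 <- rbinom(nrow(d), 1, plogis(eta))

mean(d$S1)  # simulated true prevalence at the first sampling
```

The incidence (`*I`) and elimination (`*El`) arguments would presumably enter analogous models applied to quarters that are negative or positive at S1, respectively.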

Value

A data frame with variables:

herd

Herd id.

cow

Cow id.

quarter

Quarter id.

S1

First milk sample true status.

S2

Second milk sample true status.

E_h

Herd-level exposure.

E_c

Cow-level exposure.

E_q

Quarter-level exposure.

S1i

Misclassified first milk sample.

S2i

Misclassified second milk sample.

S1_series

Misclassified first milk sample based on duplicate series sampling strategy.

S2_series

Misclassified second milk sample based on duplicate series sampling strategy.

S1_parall

Misclassified first milk sample based on duplicate parallel sampling strategy.

S2_parall

Misclassified second milk sample based on duplicate parallel sampling strategy.

S1_tri

Misclassified first milk sample based on triplicate sampling strategy.

S2_tri

Misclassified second milk sample based on triplicate sampling strategy.
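The misclassified columns can be thought of as the true status passed through an imperfect test. The sketch below illustrates that idea under loud assumptions: Se and Sp are fixed at the CNS modes instead of being drawn from a Beta distribution, the sampling-strategy shift is simply added to Se and Sp, and the helpers misclassify() and clamp() are hypothetical names that are not part of the package.

```r
# Sketch only: misclassify a true 0/1 status given sensitivity and specificity.
# misclassify() and clamp() are hypothetical helpers, not package functions.
misclassify <- function(truth, se, sp) {
  stopifnot(se >= 0, se <= 1, sp >= 0, sp <= 1)
  # Infected quarters test positive with probability Se,
  # uninfected quarters test positive with probability 1 - Sp
  p_pos <- ifelse(truth == 1, se, 1 - sp)
  rbinom(length(truth), 1, p_pos)
}

# Keep shifted Se/Sp values inside [0, 1]
clamp <- function(x) pmin(pmax(x, 0), 1)

set.seed(42)
truth <- rbinom(10000, 1, 0.15)  # hypothetical true status, not package output

se <- 0.60          # CNS mode from se_parms, used here as a fixed value
sp <- 0.95          # CNS mode from sp_parms, used here as a fixed value
se_series <- -0.25  # documented CNS shift for duplicate samples in series
sp_series <- 0.05

obs_single <- misclassify(truth, se, sp)
# Assumption: the strategy shift is added to the single-sample Se and Sp
obs_series <- misclassify(truth, clamp(se + se_series), clamp(sp + sp_series))

# Apparent prevalence under each strategy next to the true prevalence
c(true = mean(truth), single = mean(obs_single), series = mean(obs_series))
```

Comparing a misclassified column with its true counterpart (for example S1i against S1) in the same way gives a quick check of how much bias a given sampling strategy introduces.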

References

Dufour, S., Dohoo, I.R., Barkema, H.W., DesCôteaux, L., DeVries, T.J., Reyher, K.K., Roy, J.-P., Scholl, D.T., 2012. Epidemiology of coagulase-negative staphylococci intramammary infection in dairy cattle and the effect of bacteriological culture misclassification. Journal of Dairy Science 95(6):3110-3124.

Examples

# Initiate a list to store the n data frames
sim_list <- vector("list", 5)
# Do not forget to set seed for replication
set.seed(123)
sim_list <- replicate(n = 5, expr = make_data(100, 30, "saureus"), simplify = FALSE)
# Or with a progress bar
require(pbapply)
#> Loading required package: pbapply
sim_list <- pbreplicate(n = 5, expr = make_data(100, 30, "cns"), simplify = FALSE)
#> |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 01s
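Once a list of simulated data frames is available, a per-replicate summary is often the next step. The sketch below shows one way to do that. So that it runs without the package, it uses toy_make_data(), a hypothetical stand-in that returns only S1 and S1i columns with made-up probabilities. With the real function, the same sapply() call would be applied to the list returned by replicate().

```r
# Hypothetical stand-in for make_data(): returns only the S1 and S1i columns,
# generated with arbitrary illustrative probabilities
toy_make_data <- function(n = 1000) {
  S1 <- rbinom(n, 1, 0.15)
  S1i <- ifelse(S1 == 1, rbinom(n, 1, 0.60), rbinom(n, 1, 0.05))
  data.frame(S1 = S1, S1i = S1i)
}

set.seed(123)
sim_list <- replicate(n = 5, expr = toy_make_data(), simplify = FALSE)

# One row per simulated dataset: true and apparent prevalence at first sampling
prev <- t(sapply(sim_list, function(d) {
  c(true = mean(d$S1), apparent = mean(d$S1i))
}))
prev
```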