Create datasets from a hypothetical cohort study. Because correlation structures are common in udder health studies, the generated datasets mimic the collection of two milk samples, taken 1 month apart, from each quarter of a random sample of 30 cows per herd, from 100 dairy herds. The first milk sample (S1) is used to identify quarters at risk of intramammary infection (IMI) at the beginning of the cohort, while the second (S2) is used to identify the outcome (acquisition of a new IMI). Three hypothetical exposures, E_q, E_c, and E_h (quarter, cow, and herd level), with known strength of association (OR~3.0) are generated. As is often the case (Dufour et al., 2012), exposures are equally associated with the odds of a prevalent IMI on the first milk sample and with the odds of IMI acquisition on the second sample. Exposures are not associated with the odds of eliminating an existing IMI (OR=1.0). If S. aureus or CNS is chosen, default parameters are used. Otherwise the user has to provide their own.
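The data-generating idea described above can be sketched in a few lines of base R. This is a minimal illustration, not the package's actual implementation: it assumes a logistic model with normally distributed herd and cow random intercepts, four quarters per cow, and uses the S. aureus prevalence defaults documented for this function. All object names (`d`, `u_h`, `u_c`, `lp`) are made up for the example.

```r
# Illustrative sketch only: simulate true S1 status with herd- and cow-level clustering
set.seed(1)
n_herd <- 100; n_cow <- 30; n_q <- 4   # 4 quarters per cow is an assumption

# One row per quarter; cow_id is unique across herds
d <- expand.grid(quarter = seq_len(n_q), cow = seq_len(n_cow), herd = seq_len(n_herd))
d$cow_id <- (d$herd - 1) * n_cow + d$cow

# Binary exposures at herd, cow, and quarter level (exposure probability 0.5)
E_h <- rbinom(n_herd, 1, 0.5)
E_c <- rbinom(n_herd * n_cow, 1, 0.5)
d$E_h <- E_h[d$herd]
d$E_c <- E_c[d$cow_id]
d$E_q <- rbinom(nrow(d), 1, 0.5)

# Random intercepts on the logit scale (S. aureus prevalence variances)
u_h <- rnorm(n_herd, 0, sqrt(0.14))
u_c <- rnorm(n_herd * n_cow, 0, sqrt(2.25))

# Linear predictor: intercept + log(OR) for each exposure + random effects
lp <- -6.7 + log(3) * (d$E_h + d$E_c + d$E_q) + u_h[d$herd] + u_c[d$cow_id]

# True infection status on the first sample
d$S1 <- rbinom(nrow(d), 1, plogis(lp))
```

Incidence on S2 would follow the same pattern with the incidence intercept, variances, and ORs, applied to quarters with `S1 == 0`.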
make_data(n_herd, n_cow, bact = c("saureus", "cns", "other"), E_hPr = NULL, E_cPr = NULL, E_qPr = NULL, sigma_sqhPr = NULL, sigma_sqcPr = NULL, b0_Pr = NULL, OR_hPr = NULL, OR_cPr = NULL, OR_qPr = NULL, sigma_sqhI = NULL, sigma_sqcI = NULL, b0_I = NULL, OR_hI = NULL, OR_cI = NULL, OR_qI = NULL, sigma_sqhEl = NULL, sigma_sqcEl = NULL, b0_El = NULL, se_parms = NULL, sp_parms = NULL, se_series = NULL, sp_series = NULL, se_parall = NULL, sp_parall = NULL, se_tri = NULL, sp_tri = NULL)
Argument | Description |
---|---|
n_herd | Number of herds. |
n_cow | Number of cows per herd. |
bact | Type of bacteria: S. aureus, CNS, or other. If other, the user has to provide the parameters. |
E_hPr | Exposure distribution (0 to 1) of the binary herd-level (h) predictor. S. aureus and CNS = 0.5. |
E_cPr | Exposure distribution (0 to 1) of the binary cow-level (c) predictor. S. aureus and CNS = 0.5. |
E_qPr | Exposure distribution (0 to 1) of the binary quarter-level (q) predictor. S. aureus and CNS = 0.5. |
sigma_sqhPr | Herd-level variance (sigma_sq) for prevalence of intramammary infection (IMI). S. aureus = 0.14; CNS = 0.363. |
sigma_sqcPr | Cow-level variance for prevalence of IMI. S. aureus = 2.25; CNS = 0.294. |
b0_Pr | Intercept for IMI prevalence; aiming at a prevalence of 2.5%. S. aureus = -6.7; CNS = -2.15. |
OR_hPr | OR of association between herd-level variable and IMI prevalence. S. aureus and CNS = 3. |
OR_cPr | OR of association between cow-level variable and IMI prevalence. S. aureus and CNS = 3. |
OR_qPr | OR of association between quarter-level (observation-level) variable and IMI prevalence. S. aureus and CNS = 3. |
sigma_sqhI | Herd-level variance for incidence of IMI. S. aureus = 0.838; CNS = 0.27. |
sigma_sqcI | Cow-level variance for incidence of IMI. S. aureus = 2.926; CNS = 0.256. |
b0_I | Intercept for IMI incidence. S. aureus = -8.3; CNS = -2.4. |
OR_hI | OR of association between herd-level variable and IMI incidence. S. aureus and CNS = 3. |
OR_cI | OR of association between cow-level variable and IMI incidence. S. aureus and CNS = 3. |
OR_qI | OR of association between quarter-level (observation-level) variable and IMI incidence. S. aureus and CNS = 3. |
sigma_sqhEl | Herd-level variance for elimination of IMI. S. aureus = 0.15; CNS = 0.112. |
sigma_sqcEl | Cow-level variance for elimination of IMI. S. aureus = 2.246; CNS = 0.7. |
b0_El | Intercept for IMI persistency. S. aureus = -0.6; CNS = 1.6. |
se_parms | Vector for mode and x to determine shape parameters of Beta distribution of sensitivity (Se). S. aureus = c(0.90, 0.85); CNS = c(0.60, 0.55). |
sp_parms | Vector for mode and x to determine shape parameters of Beta distribution of specificity (Sp). S. aureus = 1; CNS = c(0.95, 0.90). |
se_series | Se improvement or loss associated with sampling strategy: duplicate series. S. aureus = -0.1; CNS = -0.25. |
sp_series | Sp improvement or loss associated with sampling strategy: duplicate series. S. aureus = 0; CNS = 0.05. |
se_parall | Se improvement or loss associated with sampling strategy: duplicate parallel. S. aureus = 0.1; CNS = 0.15. |
sp_parall | Sp improvement or loss associated with sampling strategy: duplicate parallel. S. aureus = 0; CNS = -0.05. |
se_tri | Se improvement or loss associated with sampling strategy: triplicate (2 out of 3). S. aureus = 0; CNS = 0. |
sp_tri | Sp improvement or loss associated with sampling strategy: triplicate (2 out of 3). S. aureus = 0; CNS = 0.10. |
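To make the role of the Se/Sp arguments concrete, here is a minimal sketch of how a true infection status could be misclassified. It rests on two assumptions that the documentation does not confirm: that the sampling-strategy values are added to the single-sample Se and Sp, and that fixed Se/Sp values (here the CNS modes, 0.60 and 0.95) can stand in for draws from the Beta distributions. `misclassify()` and `clamp01()` are hypothetical helpers, not functions of the package.

```r
# Hypothetical helper (not part of the package): misclassify a true 0/1 status
misclassify <- function(truth, se, sp) {
  stopifnot(all(truth %in% c(0, 1)), se >= 0, se <= 1, sp >= 0, sp <= 1)
  # Infected quarters test positive with probability se;
  # uninfected quarters test positive with probability 1 - sp
  p_pos <- ifelse(truth == 1, se, 1 - sp)
  rbinom(length(truth), 1, p_pos)
}

# Keep adjusted Se/Sp inside [0, 1]
clamp01 <- function(x) pmin(pmax(x, 0), 1)

set.seed(42)
truth <- rbinom(1000, 1, 0.1)   # arbitrary true status for illustration
se <- 0.60; sp <- 0.95          # CNS modes; fixed here instead of Beta draws

# Assumption: strategy offsets are added to the single-sample Se/Sp
obs_single <- misclassify(truth, se, sp)
obs_series <- misclassify(truth, clamp01(se - 0.25), clamp01(sp + 0.05))
obs_parall <- misclassify(truth, clamp01(se + 0.15), clamp01(sp - 0.05))
```

Under these assumptions, series interpretation trades sensitivity for specificity, and parallel interpretation does the reverse.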
A data frame with variables:
Herd id.
Cow id.
Quarter id.
First milk sample true status.
Second milk sample true status.
Herd-level exposure.
Cow-level exposure.
Quarter-level exposure.
Misclassified first milk sample.
Misclassified second milk sample.
Misclassified first milk sample based on duplicate series sampling strategy.
Misclassified second milk sample based on duplicate series sampling strategy.
Misclassified first milk sample based on duplicate parallel sampling strategy.
Misclassified second milk sample based on duplicate parallel sampling strategy.
Misclassified first milk sample based on triplicate sampling strategy.
Misclassified second milk sample based on triplicate sampling strategy.
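As a rough illustration of how the true and misclassified columns can be compared, the sketch below computes the incidence risk (new IMI on S2 among quarters negative on S1) from a toy data frame. The column names `S1`, `S2`, `S1_obs`, and `S2_obs` are hypothetical stand-ins, since the actual variable names are not listed here, and the values are invented for the example.

```r
# Toy data; column names are hypothetical stand-ins for the returned variables
toy <- data.frame(
  S1     = c(0, 0, 0, 1, 0, 1),   # true status, first sample
  S2     = c(0, 1, 0, 1, 1, 0),   # true status, second sample
  S1_obs = c(0, 0, 1, 1, 0, 0),   # misclassified first sample
  S2_obs = c(0, 1, 0, 1, 0, 0)    # misclassified second sample
)

# Incidence risk: proportion positive on S2 among quarters negative on S1
incidence <- function(s1, s2) mean(s2[s1 == 0] == 1)

true_inc <- incidence(toy$S1, toy$S2)          # 0.5
obs_inc  <- incidence(toy$S1_obs, toy$S2_obs)  # 0.25
```

Misclassification affects both the at-risk population (through S1) and the outcome (through S2), which is why the two estimates can differ.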
Dufour, S., Dohoo, I.R., Barkema, H.W., DesCôteaux, L., DeVries, T.J., Reyher, K.K., Roy, J.-P., Scholl, D.T., 2012. Epidemiology of coagulase-negative staphylococci intramammary infection in dairy cattle and the effect of bacteriological culture misclassification. Journal of Dairy Science 95(6):3110-3124.
# Initiate a list to store the n data frames
sim_list <- vector("list", 5)

# Do not forget to set seed for replication
set.seed(123)
sim_list <- replicate(n = 5, expr = make_data(100, 30, "saureus"), simplify = FALSE)

# Or with a progress bar
library(pbapply)
sim_list <- pbreplicate(n = 5, expr = make_data(100, 30, "saureus"), simplify = FALSE)