Running Simulation
By default, the simulated double knockout datasets are stored under data/ in the current working directory; use getwd() to check which directory that is.
The default values for the parameters of the simulated CRISPR screens are based on empirical assumptions, as follows:
Abbreviations
KO, knockout; SKO, single knockout; DKO, double knockout; %, percentage; GI, genetic interaction; std. dev., standard deviation.
Default Values of Tunable Parameters
Initialized Library Parameters
coverage: 100
n_guide_g: 3
moi: 0.3
sd_freq0: 1/3.29 (chosen so that the 95th and 5th percentiles of the SKO count distribution differ 10-fold)
GI Parameters
p_gi: 0.03
sd_gi: 1.5
Gene Class Parameters
Proportion of genes in each gene class (by theoretical phenotype)
pt_neg: 0.15
pt_pos: 0.05
pt_wt: 0.75
pt_ctrl: 0.05
Mean and std. dev. of theoretical phenotype
mu_neg: -0.75
sd_neg: 0.1
mu_pos: 0.75
sd_pos: 0.1
sd_wt: 0.25
Guide Parameters
Proportion of high-efficacy guides and CRISPR mode
p_high: 1
mode: CRISPRn-100%Eff
Mean and std. dev. of guide efficacy
mu_high: 0.9
sd_high: 0.1
mu_low: 0.05
sd_low: 0.07
Cell Doublings Parameters
size.bottleneck: 2
n.bottlenecks: 1
n.iterations: 30
Randomization Parameter
rseed: NULL
Miscellaneous
path: current working directory
cores_free: 1
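The value 1/3.29 listed for sd_freq0 can be reproduced with a short calculation. The sketch below assumes the SKO counts are normally distributed on the log10 scale (an assumption made here for illustration; the package internals are not shown in this document). Under that assumption, the 5th and 95th percentiles lie about 3.29 standard deviations apart, so a 10-fold gap (1 unit on the log10 scale) gives a standard deviation of roughly 1/3.29.

```r
# Assumption (not confirmed by the package source): SKO counts are
# log-normal, i.e. normal on the log10 scale.
fold_change <- 10

# Distance between the 95th and 5th percentiles of a standard normal,
# in units of standard deviation (about 3.29).
z_span <- qnorm(0.95) - qnorm(0.05)

# A 10-fold difference equals log10(10) = 1 on the log10 scale, so the
# standard deviation that produces it is 1 / z_span.
sd_freq0 <- log10(fold_change) / z_span

print(round(z_span, 2))
print(sd_freq0)
```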
Run Simulation by Default
To run a simulation with the default settings, name it with sample_name and specify the number of single genes with n. Make sure that the number of genes in each gene class (n multiplied by the class proportion) is an integer. Each run returns a quick Simulation Settings Summary. After a successful run, the number of cores used for parallel computing and the run time in hours are also recorded. For example:
dkosim(sample_name = "test", n = 40)
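To see which values of n satisfy the integer requirement under the default gene class proportions, a quick check in base R may help. This is an illustrative sketch only: the helper is_whole and the search range 1 to 100 are hypothetical choices, and the check assumes that the class size is computed as n times the class proportion.

```r
# Default gene class proportions listed above.
pt <- c(neg = 0.15, pos = 0.05, wt = 0.75, ctrl = 0.05)

# Hypothetical helper: TRUE when x is a whole number up to a small
# floating-point tolerance.
is_whole <- function(x, tol = 1e-8) abs(x - round(x)) < tol

# Search a small range of n for values where every class size is whole.
candidate_n <- 1:100
ok <- vapply(candidate_n, function(n) all(is_whole(n * pt)), logical(1))
valid_n <- candidate_n[ok]
print(valid_n)

# Class sizes for the example above (n = 40).
counts_40 <- round(40 * pt)
print(counts_40)
```

With the default proportions, the smallest proportion is 0.05, so n has to be a multiple of 20 for every class size to be whole.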
Run Customized Simulation
Alternatively, you may adjust any of the tunable parameters as desired. Make sure that the proportions of all gene classes add up to 1 and that the resulting number of genes in each class is an integer. You may also change the output directory using path; by default, the simulated data and log are written to the current project's working directory. The randomization seed can be set with rseed so that the same subset of gene pairs has GI across multiple runs.
For example:
dkosim(sample_name = "test",
       coverage = 10,
       n = 60,
       n_guide_g = 2,
       sd_freq0 = 1/3.29,
       moi = 0.3,
       p_gi = 0.03,
       sd_gi = 1.5,
       pt_neg = 0.15,
       pt_pos = 0.05,
       pt_wt = 0.75,
       pt_ctrl = 0.05,
       mu_neg = -0.75,
       sd_neg = 0.1,
       mu_pos = 0.75,
       sd_pos = 0.1,
       sd_wt = 0.25,
       p_high = 0.8,
       mode = "CRISPRn",
       mu_high = 0.8,
       sd_high = 0.2,
       mu_low = 0.1,
       sd_low = 0.08,
       size.bottleneck = 3,
       n.bottlenecks = 2,
       n.iterations = 30,
       rseed = 111,
       path = ".",
       cores_free = 2)
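Before launching a customized run, it can be useful to confirm the two constraints on the gene class inputs. The following base R sketch checks the values from the call above; the list custom_args and the tolerance are hypothetical and not part of the package, and the check again assumes class sizes are n times the class proportion.

```r
# Gene class inputs copied from the customized call above.
custom_args <- list(n = 60,
                    pt_neg = 0.15, pt_pos = 0.05,
                    pt_wt = 0.75, pt_ctrl = 0.05)

pt <- unlist(custom_args[c("pt_neg", "pt_pos", "pt_wt", "pt_ctrl")])

# Constraint 1: proportions sum to 1. all.equal() compares with a
# tolerance, which avoids false alarms from floating-point rounding.
sums_to_one <- isTRUE(all.equal(sum(pt), 1))

# Constraint 2: every class size is a whole number of genes.
counts <- custom_args$n * pt
counts_whole <- all(abs(counts - round(counts)) < 1e-8)

# Stop early with an error if either constraint is violated.
stopifnot(sums_to_one, counts_whole)
print(round(counts))
```

Comparing with all.equal() rather than == is a deliberate choice here, since decimal proportions such as 0.15 are not represented exactly in floating point.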
For example output, see Section 4 of the pre-built DKOsimR vignettes (PDF).