Welcome to WGBSSuite Pages.

We have developed WGBSSuite to allow anyone to assess which WGBS analysis package is the best fit for their data. We provide a tool that analyses your data so that data of the same type can be simulated. This simulated data can then be used to benchmark existing analysis methods, either automatically (BSmooth, MethylKit and MethylSig) or independently with any other package. Doing this will help you identify the best package for your downstream analysis and hence increase the quality of your analysis overall.

The Simulator

The simulator works in three stages. (A) First, the locations of the CpGs are simulated using a 2-state hidden Markov model. (B) Next, the methylation status of each CpG is simulated using a modulated hidden Markov model that ensures that CpG sites that are close together are more likely to have coordinated behaviour. (C) Finally, the methylation profile is calculated, producing a simulated number of methylated and unmethylated reads at each CpG. The statistical framework for this simulation can be switched between binomial and negative-binomial, depending on which distribution you think better fits your data.
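As a rough illustration of the three stages, the following Python sketch mimics the idea on a toy scale. It is not the WGBSSuite implementation: the function names, the transition and emission probabilities, the exponential gap model, the distance-decay formula and the coverage of 30 are all assumptions chosen for the example, and only the binomial option is shown.

```python
import math
import random


def simulate_cpg_positions(n_sites, rng, p_stay=0.95, mean_gap=(10.0, 200.0)):
    """(A) Two-state chain over CpG density: state 0 = dense, state 1 = sparse (assumed values)."""
    positions = []
    state = 0
    pos = 0
    for _ in range(n_sites):
        if rng.random() > p_stay:
            state = 1 - state  # switch between the dense and the sparse state
        # Gap to the next CpG: at least 2 bp plus an exponential draw for the current state.
        pos += 2 + int(rng.expovariate(1.0 / mean_gap[state]))
        positions.append(pos)
    return positions


def simulate_methylation_states(positions, rng, p_switch=0.5, decay=100.0):
    """(B) Distance-modulated chain: nearby CpGs rarely change state, distant ones change more freely."""
    states = [rng.random() < 0.5]
    for prev, cur in zip(positions, positions[1:]):
        # The switching probability grows with distance and saturates at p_switch.
        p = p_switch * (1.0 - math.exp(-(cur - prev) / decay))
        states.append((not states[-1]) if rng.random() < p else states[-1])
    return states


def simulate_reads(states, rng, coverage=30, p_meth=(0.1, 0.9)):
    """(C) Binomial counts: each read is methylated with a state-dependent probability."""
    counts = []
    for methylated_state in states:
        p = p_meth[1] if methylated_state else p_meth[0]
        methylated = sum(rng.random() < p for _ in range(coverage))
        counts.append((methylated, coverage - methylated))
    return counts


rng = random.Random(1)  # fixed seed so the toy run is reproducible
positions = simulate_cpg_positions(1000, rng)
states = simulate_methylation_states(positions, rng)
counts = simulate_reads(states, rng)
print(positions[:5], counts[:5])
```

Switching to a negative-binomial framework would mean replacing the draw in stage (C) with an overdispersed one; stages (A) and (B) would stay the same in this sketch.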

The Benchmarking

The benchmarking currently uses MethylKit, BSmooth, MethylSig and the Fisher Exact Test. We will be working to incorporate more in the future; however, it is very easy to simulate data and use it to test any analysis software of your choice. The benchmarking that we have carried out so far shows that performance is very context dependent, with different packages performing very differently depending on coverage and distributional assumption. As an example, the following is an ROC analysis based on binomially (D) and negative-binomially (E) distributed methylation counts.
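To make the ROC and AUC terminology concrete, here is a minimal Python sketch of how such a curve can be computed when the truth is known, as it is for simulated data. The function names, the example p-values and the truth labels are invented for illustration; this is not the code used by benchmark_WGBS.R.

```python
def roc_curve(p_values, is_dm):
    """Sweep a significance threshold over the p-values and return (fpr, tpr) lists."""
    n_pos = sum(1 for flag in is_dm if flag)
    n_neg = len(is_dm) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one true positive and one true negative site")
    order = sorted(range(len(p_values)), key=lambda i: p_values[i])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(order):
        # Sites with identical p-values cross the threshold together.
        j = i
        while j < len(order) and p_values[order[j]] == p_values[order[i]]:
            if is_dm[order[j]]:
                tp += 1
            else:
                fp += 1
            j += 1
        fpr.append(fp / n_neg)
        tpr.append(tp / n_pos)
        i = j
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2.0
        for k in range(1, len(fpr))
    )


# Hypothetical per-site p-values from some package, and the simulated truth
# (True = the site was simulated as differentially methylated).
p_values = [0.001, 0.02, 0.03, 0.2, 0.4, 0.8]
truth = [True, True, False, True, False, False]

fpr, tpr = roc_curve(p_values, truth)
area = auc(fpr, tpr)
print(round(area, 3))  # 0.889
```

An AUC of 1 means every truly differential site is ranked ahead of every non-differential one, while 0.5 corresponds to a random ranking.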

Before starting

Install R: You will need to install the correct version of R for your operating system. To do this, visit the R mirror closest to your location from: http://www.r-project.org/

Install BSmooth: Download and install the code for bsseq from here: http://rafalab.jhsph.edu/bsmooth/

Install MethylSig: Download and install the code for MethylSig from here: http://sartorlab.ccmb.med.umich.edu/node/17

Install MethylKit: Download and install the code for MethylKit from here: https://code.google.com/p/methylkit/

Installing the software

Clone the code into a directory for which you have read/write permissions, e.g. /home/USERNAME/bin/WGBSSuite/

git clone https://github.com/SystemsGeneticsSG/WGBSSuite.git

cd /home/USERNAME/bin/WGBSSuite

If the R directory of the clone is not already in your PATH, you can add it using the following command:

export PATH=$PATH:/home/USERNAME/bin/WGBSSuite/R

Quickstart

You will need to have your data in the correct format (see below), then follow these steps:

Analysing your data: From the command line simply type:

Rscript analyse_WGBS.R

This will run the analysis script in interactive mode. For more details on how to run this script, read the advanced guide below. The results are written to a folder that you select at runtime and include graphs and statistics about the dataset. The output is a set of parameters that can be used to parametrise the simulation, together with a summary document of the input data, e.g. analysis_of_real_data.pdf. This document can be used to visualise properties of a dataset, for example the distribution of read counts or of the methylated proportion.

Simulating data: From the command line simply type:

Rscript simulate_WGBS.R interactive

This will run the simulation script in interactive mode. For more details on how to run this script, read the advanced guide below. The results are written to a folder that you select at runtime and include graphs and the raw data, which can be used for the benchmarking step or within R to test any WGBS software.

Benchmarking the tools: From the command line simply type:

Rscript benchmark_WGBS.R interactive

This will run the benchmark script in interactive mode. For more details on how to run this script, read the advanced guide below. You will be required to enter the filename of the data you simulated in the previous step as one of the options. This will produce ROC, AUC and runtime plots for the dataset.

Advanced Guide

Simulating and benchmarking the data in non-interactive mode: This runs the simulation and benchmarking in a loop and produces an averaged ROC, AUC and runtime analysis. This approach should be used to get the most accurate idea of the performance of a package. The result will be a set of simulated data, all stored in /tmp/myWGBSanalysis, together with the individual and the averaged analysis plots.

Rscript simulate_WGBS.R multi 5000 0.9203 0.076 0.1 0.1 29 29 3 2 0.1 0.5 0.019,0.002 /tmp/myWGBSanalysis binomial 10

Using the simulated data on other packages: The following command will create a file that can be used to test any WGBS package:

Rscript simulate_WGBS.R 5000 0.9203 0.076 0.1 0.1 29 29 3 2 0.1 0.5 0.019,0.002 /tmp/myWGBSanalysis binomial

Once this has executed you will find the simulated data in the file “/tmp/myWGBSanalysis_5000_1_0.65_0.2_0.35_0.8_0.9_0.1_0.1_0.9_29_29_0.9203_0.076_0.1_0.1_0_0.1_0.5_.txt”. The data in this file has the following format:

blocks of 4 columns for each replica as follows:

This information is sufficient to test any differential methylation software.
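If you want to load the simulated file outside R, the block structure can be recovered with a few lines of Python. This is only a sketch under stated assumptions: it assumes the columns are whitespace-separated and that every line consists solely of the 4-column blocks (no header and no extra leading columns), neither of which is confirmed here. The function name is hypothetical and the meaning of the four columns is deliberately left uninterpreted.

```python
def split_into_replicates(line, block_size=4):
    """Split one line of the file into consecutive blocks of `block_size` fields, one block per replicate."""
    fields = line.split()  # assumption: whitespace-separated columns
    if len(fields) % block_size != 0:
        raise ValueError(
            "expected a multiple of %d columns, got %d" % (block_size, len(fields))
        )
    return [fields[i:i + block_size] for i in range(0, len(fields), block_size)]


# Made-up line with two replicates; the values are placeholders, not real output.
example_line = "1 2 3 4 5 6 7 8"
blocks = split_into_replicates(example_line)
print(len(blocks), blocks[0])
```

Check the first few lines of your own output file before relying on this, and adjust the separator or skip a header line if needed.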

Authors and Contributors

@owenrackham

Support or Contact

Please email me at owen.rackham@duke-nus.edu.sg