~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
DRAGEN (Differential Regulation based enrichment Analysis for GENe sets)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Author: Shining Ma
Date: 2013/7/4

1.Preparations
2.Input format
3.Output format

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1.Preparations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
First, download and install the GNU Scientific Library (GSL), a numerical
library for C and C++ programmers. The website is:
http://www.gnu.org/software/gsl/

Then download the "DRAGEN" software from the software page of our website.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2.Input format
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The input command is:

    ./DRAGEN expression.txt sample_size_1 sample_size_2 genelist.txt network.txt geneset.txt permutation_times output_file

expression.txt: The file that contains the two-phenotype expression data.
It is a tab-separated text file holding the expression matrix, with columns
indicating samples and rows indicating genes. Row and column names are not
included.

sample_size_1: The sample size of Phenotype 1.

sample_size_2: The sample size of Phenotype 2.

genelist.txt: The list of gene names, in one column. Each row gives the name
of the gene in the corresponding row of the expression matrix. No row or
column headers are needed.

network.txt: A tab-separated file with two columns. The first column is the
name of the TF gene and the second is the name of the target gene. No row or
column headers are needed.

geneset.txt: The file that contains the gene sets, tab-separated. Each row is
one gene set, with the gene set name in the first column. The genes in the
gene set are listed after the name.

permutation_times: The number of sampling permutations.

output_file: The output path and file name.
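
As an illustration of the file layouts described above, the following Python sketch writes a tiny set of input files and assembles the matching command line. The gene names, expression values, permutation count and output name are invented for illustration, and the helper functions are hypothetical (they are not part of DRAGEN). It also assumes that the Phenotype 1 samples occupy the first sample_size_1 columns of the matrix, which the description above does not state explicitly.

```python
import csv
import os
import tempfile


def write_tsv(path, rows):
    """Write rows as tab-separated text with no header line."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerows(rows)


def make_toy_inputs(workdir):
    """Create the four input files with made-up data; return their paths."""
    genes = ["TF1", "G1", "G2"]
    # One row per gene, one column per sample (assumed: 2 samples of
    # Phenotype 1 followed by 2 samples of Phenotype 2).
    expression = [
        [5.1, 4.9, 7.8, 8.0],
        [2.0, 2.2, 3.9, 4.1],
        [6.3, 6.1, 6.2, 6.4],
    ]
    network = [["TF1", "G1"], ["TF1", "G2"]]      # TF gene, target gene
    gene_sets = [["SET_A", "TF1", "G1", "G2"]]    # set name, then its genes

    names = ("expression.txt", "genelist.txt", "network.txt", "geneset.txt")
    paths = {name: os.path.join(workdir, name) for name in names}
    write_tsv(paths["expression.txt"], expression)
    write_tsv(paths["genelist.txt"], [[gene] for gene in genes])
    write_tsv(paths["network.txt"], network)
    write_tsv(paths["geneset.txt"], gene_sets)
    return paths


def build_command(paths, size_1, size_2, permutations, output_file):
    """Assemble the arguments in the order given in the input command."""
    return [
        "./DRAGEN",
        paths["expression.txt"], str(size_1), str(size_2),
        paths["genelist.txt"], paths["network.txt"], paths["geneset.txt"],
        str(permutations), output_file,
    ]


workdir = tempfile.mkdtemp()
paths = make_toy_inputs(workdir)
command = build_command(paths, 2, 2, 1000, os.path.join(workdir, "result.txt"))
print(" ".join(command))
```

The sketch only prints the command. Running it requires the DRAGEN binary in the current directory.
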
Example files are available on the software page.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
3.Output format
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The output file contains five columns: the gene set name, the number of
edges, the score, the p-value and the FDR.
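
The five output columns can be read back with a few lines of Python. This is a minimal sketch under assumptions that the description above does not confirm: the columns are separated by tabs or spaces, there is no header line, and the edge count is numeric. The names GeneSetResult, parse_result_line and significant_sets are hypothetical helpers, and the sample lines are invented, not real DRAGEN output.

```python
from typing import NamedTuple


class GeneSetResult(NamedTuple):
    name: str
    edges: int
    score: float
    p_value: float
    fdr: float


def parse_result_line(line):
    """Split one output row into its five columns.

    Splitting from the right keeps a gene set name intact even if it
    contains spaces (assumption: the last four columns are numeric).
    """
    name, edges, score, p_value, fdr = line.strip().rsplit(None, 4)
    # int(float(...)) accepts an edge count written as "12" or "12.0".
    return GeneSetResult(name, int(float(edges)), float(score),
                         float(p_value), float(fdr))


def significant_sets(lines, fdr_cutoff=0.05):
    """Return the rows with FDR at or below the cutoff, smallest FDR first."""
    results = [parse_result_line(line) for line in lines if line.strip()]
    kept = [result for result in results if result.fdr <= fdr_cutoff]
    return sorted(kept, key=lambda result: result.fdr)


# Invented rows, for illustration only.
sample = [
    "SET_A\t12\t3.41\t0.001\t0.02\n",
    "SET_B\t5\t0.87\t0.4\t0.6\n",
]
hits = significant_sets(sample)
print([hit.name for hit in hits])
```

To apply it to a real result, pass the lines of the file given as output_file, for example significant_sets(open(path)). The 0.05 cutoff is an arbitrary choice for the example.
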