Frequently Asked Questions
Running GSEA with DecoPath
To run GSEA using DecoPath, you will need:
- An expression dataset
- A file containing class labels (e.g., normal and tumor) for samples in the expression dataset.
You can submit RNA-Seq, microarray, and ChIP-Seq data. Gene identifiers must be HUGO Gene Nomenclature Committee (HGNC) symbols.
The expression data you submit to run GSEA should be in a comma-separated (*.csv), tab-separated (*.tsv) or plain-text (*.txt) file. The first and second column labels should be 'name' and 'description', respectively, and all subsequent column labels should be the sample identifiers. Each row should contain an HGNC symbol, a description (which can be left as NA) and then the expression value of that gene in each of the samples.
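The layout described above can be checked before uploading. The following is a minimal sketch, not part of DecoPath: the function name `check_expression_table`, the sample identifiers and the expression values are all hypothetical, and a tab-separated file is assumed.

```python
import csv
import io


def check_expression_table(text, delimiter="\t"):
    """Return a list of problems found in an expression table (empty if none)."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return ["file is empty"]
    header, body = rows[0], rows[1:]
    problems = []
    # The first two column labels are fixed; the rest are sample identifiers.
    if header[:2] != ["name", "description"]:
        problems.append("first two column labels must be 'name' and 'description'")
    if len(header) < 3:
        problems.append("no sample columns found")
    for line_no, row in enumerate(body, start=2):
        if len(row) != len(header):
            problems.append(f"line {line_no}: expected {len(header)} fields, got {len(row)}")
            continue
        # Every column after 'description' should hold a numeric expression value.
        for value in row[2:]:
            try:
                float(value)
            except ValueError:
                problems.append(f"line {line_no}: non-numeric expression value {value!r}")
                break
    return problems


# Hypothetical table with two genes and three samples.
example = (
    "name\tdescription\tsample_1\tsample_2\tsample_3\n"
    "TP53\tNA\t5.1\t4.8\t7.2\n"
    "BRCA1\tNA\t2.3\t2.9\t3.4\n"
)
print(check_expression_table(example))  # prints []
```

For a comma-separated file, pass `delimiter=","`.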
Much like a file in the categorical class (CLS) format, the class labels file (a comma-separated (*.csv), tab-separated (*.tsv) or plain-text (*.txt) file) should specify the class label of each sample in the dataset.
For example, for a dataset with expression values for samples that fall into two distinct classes (e.g., normal and tumor), the class labels file should have a column with the same sample identifiers as the ones in the expression dataset file you submit and another column with their corresponding class labels. Please ensure the order of the samples in the expression dataset is the same as the order of samples in the class labels file. Below is an example of a class labels file:
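As an illustrative sketch of the ordering requirement, the snippet below compares the sample order of an expression dataset with that of a class labels file. It is not DecoPath's own validation. The column labels `sample` and `class_label`, the presence of a header row in the class labels file, and all sample identifiers are assumptions made for this example.

```python
import csv
import io


def expression_sample_ids(expression_text, delimiter="\t"):
    """Sample identifiers are every column label after 'name' and 'description'."""
    header = next(csv.reader(io.StringIO(expression_text), delimiter=delimiter))
    return header[2:]


def label_sample_ids(labels_text, delimiter="\t"):
    """Sample identifiers from the first column of the class labels file.

    Assumes one header row, which is skipped (an assumption of this sketch).
    """
    rows = list(csv.reader(io.StringIO(labels_text), delimiter=delimiter))
    return [row[0] for row in rows[1:] if row]


def first_order_mismatch(expression_text, labels_text, delimiter="\t"):
    """Return the first position where the two orders differ, or None if they match."""
    expected = expression_sample_ids(expression_text, delimiter)
    found = label_sample_ids(labels_text, delimiter)
    for position, (a, b) in enumerate(zip(expected, found)):
        if a != b:
            return position
    if len(expected) != len(found):
        return min(len(expected), len(found))
    return None


# Hypothetical inputs: three samples, two classes.
expression = "name\tdescription\tsample_1\tsample_2\tsample_3\nTP53\tNA\t5.1\t4.8\t7.2\n"
labels = "sample\tclass_label\nsample_1\tnormal\nsample_2\tnormal\nsample_3\ttumor\n"
swapped = "sample\tclass_label\nsample_2\tnormal\nsample_1\tnormal\nsample_3\ttumor\n"

print(first_order_mismatch(expression, labels))   # prints None
print(first_order_mismatch(expression, swapped))  # prints 0
```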
Yes, you can use DecoPath to run GSEA on a pre-ranked list of genes.
To run GSEA pre-ranked, upload a file in comma-separated (*.csv), tab-separated (*.tsv) or plain-text (*.txt) format that contains the following two columns:
- HGNC symbols
- Class difference metric for the HGNC symbols
In this case, do not include a header (column names) in the file. See the example below:
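One way such a header-less, two-column file could be produced is sketched below. The gene scores are hypothetical, `write_preranked` is a name chosen for this example, and sorting by descending score is an assumption of the sketch rather than a documented DecoPath requirement.

```python
import csv
import io


def write_preranked(scores, handle, delimiter="\t"):
    """Write 'symbol<delimiter>score' rows with no header, highest score first."""
    writer = csv.writer(handle, delimiter=delimiter, lineterminator="\n")
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    for symbol, score in ranked:
        writer.writerow([symbol, score])


# Hypothetical class difference metric per HGNC symbol.
scores = {"BRCA1": -1.2, "TP53": 2.5, "MYC": 0.4}

buffer = io.StringIO()
write_preranked(scores, buffer)
print(buffer.getvalue(), end="")
# TP53	2.5
# MYC	0.4
# BRCA1	-1.2
```

To write to disk instead, open the target file with `newline=""` and pass the file handle in place of `buffer`.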
Running ORA with DecoPath
To run ORA using DecoPath, you will need:
- A gene list
To run ORA, submit a file that contains a single column with HGNC symbols in a comma-separated (*.csv), tab-separated (*.tsv) or plain-text (*.txt) format.
Submitting your own results
You can also submit the results of an enrichment analysis and use the DecoPath visualizations and functionalities to compare and explore the consensus around different pathway databases. If you do opt to submit your own results, you must use the databases provided in DecoPath. We strongly recommend using gene set files from the following links:
If you submit ORA results, ensure they are in a comma-separated (*.csv), tab-separated (*.tsv) or plain-text (*.txt) file that contains the columns pathway, p_value and q_value. For example:
If you submit GSEA results, ensure they are in a comma-separated (*.csv), tab-separated (*.tsv) or plain-text (*.txt) file that contains the columns pathway, es, nes, p_value and q_value. For example:
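The required columns for both kinds of results can be checked with a few lines of code. This is a sketch only: the helper `missing_columns` and the result values are hypothetical, and a comma-separated file is assumed.

```python
import csv
import io

# Column names as listed above for each analysis type.
REQUIRED_COLUMNS = {
    "ora": ["pathway", "p_value", "q_value"],
    "gsea": ["pathway", "es", "nes", "p_value", "q_value"],
}


def missing_columns(text, analysis, delimiter=","):
    """Return the required columns that are absent from the file's header row."""
    header = next(csv.reader(io.StringIO(text), delimiter=delimiter), [])
    present = {label.strip() for label in header}
    return [column for column in REQUIRED_COLUMNS[analysis] if column not in present]


# Hypothetical ORA results with a single pathway.
ora_results = "pathway,p_value,q_value\nTCA Cycle,0.001,0.02\n"

print(missing_columns(ora_results, "ora"))   # prints []
print(missing_columns(ora_results, "gsea"))  # prints ['es', 'nes']
```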
Performing differential gene expression analysis
No, this is an optional field and can be skipped.
You can run DecoPath without performing differential expression analysis. If you do provide fold changes, DecoPath also generates visualizations that identify genes that are differentially expressed according to a fold change cutoff.
The file you submit containing log2 fold changes should be a comma-separated (*.csv), tab-separated (*.tsv) or plain-text (*.txt) file with the columns gene_symbol, log2fc, p-value and q-value. Please ensure that the log2 fold changes you upload come from the same experimental groups as the expression dataset or GSEA results file (in the case of GSEA), or correspond to the genes in the gene list (in the case of ORA). For example:
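To illustrate what a fold change cutoff means for a file with these columns, the sketch below selects genes whose absolute log2 fold change meets a threshold. The cutoff of 1.0, the use of the absolute value, the function name and all values in the table are assumptions of this example. The text above does not state how DecoPath applies its cutoff.

```python
import csv
import io


def genes_passing_cutoff(text, cutoff=1.0, delimiter=","):
    """Return gene symbols whose |log2fc| is at least `cutoff`, in file order."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [
        row["gene_symbol"]
        for row in reader
        if abs(float(row["log2fc"])) >= cutoff
    ]


# Hypothetical fold change table.
table = (
    "gene_symbol,log2fc,p-value,q-value\n"
    "TP53,2.1,0.001,0.01\n"
    "MYC,-0.3,0.40,0.55\n"
    "BRCA1,-1.5,0.004,0.02\n"
)

print(genes_passing_cutoff(table))  # prints ['TP53', 'BRCA1']
```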
You will require two files (comma-separated (*.csv), tab-separated (*.tsv) or plain-text (*.txt) files):
- Un-normalized read counts in the form of a matrix of integer values
- Class labels
The files must be consistently ordered such that the samples in the counts matrix are in the same order as the samples in the class labels file. If you are running GSEA, you only need to submit one class labels file corresponding to both the expression dataset to perform GSEA and the counts matrix to calculate the fold changes. Example files are given below:
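Because the counts must be un-normalized integers, a quick check for stray non-integer cells can catch a normalized matrix before upload. The sketch below assumes a layout that is not specified above: genes as rows, the first column holding gene identifiers, and a header row of sample identifiers. The function name and the data are hypothetical.

```python
import csv
import io


def non_integer_cells(text, delimiter=","):
    """Return (line number, sample, value) for every cell that is not a whole count."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return []
    samples = rows[0][1:]  # header row: first label is the gene column
    bad = []
    for line_no, row in enumerate(rows[1:], start=2):
        for sample, value in zip(samples, row[1:]):
            cell = value.strip()
            # Accept only non-negative whole numbers written with ASCII digits.
            if not (cell.isascii() and cell.isdigit()):
                bad.append((line_no, sample, value))
    return bad


# Hypothetical counts matrix; the value 45.5 suggests normalized data.
counts = "gene,sample_1,sample_2\nTP53,120,98\nMYC,45.5,60\n"

print(non_integer_cells(counts))  # prints [(3, 'sample_1', '45.5')]
```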
Running DecoPath with other databases
Currently, we only provide four databases (KEGG, Reactome, WikiPathways and PathBank) to run pathway analysis. You can add more databases by uploading files in the gene matrix transposed (GMT) format containing gene sets and their corresponding HGNC symbols. You must also upload mapping files between your uploaded database and each of the other databases you select as input.
So, if you wish to run the comparative analysis on KEGG, Reactome and your uploaded database, you first need to create mappings between KEGG and your uploaded database as well as between Reactome and your uploaded database. A mapping pairs equivalent pathways; for example, the 'Citric acid cycle (TCA cycle)' pathway from Reactome is equivalent to 'TCA Cycle' from WikiPathways. You can see example files here:
Once you have prepared these files, you can follow this link to run DecoPath using your preferred gene set databases.
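For reference, a GMT file is tab-delimited with one gene set per line: the gene set name, a description, and then the member genes. The sketch below reads that layout into a dictionary so an uploaded file can be inspected first. The function name `parse_gmt` is hypothetical, and the gene set contents are illustrative rather than taken from any database.

```python
def parse_gmt(text):
    """Map each gene set name to its list of gene symbols."""
    gene_sets = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        # Field 0 is the gene set name, field 1 a description, the rest are genes.
        name, genes = fields[0], fields[2:]
        gene_sets[name] = [gene for gene in genes if gene]
    return gene_sets


# Illustrative single-line GMT content.
example_gmt = "TCA Cycle\thypothetical description\tCS\tACO2\tIDH1\n"

print(parse_gmt(example_gmt))  # prints {'TCA Cycle': ['CS', 'ACO2', 'IDH1']}
```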