GeneMerge Documentation


Publications

The following documents describe the rationale and statistical methods behind GeneMerge:


Stand-alone GeneMerge Execution

GeneMerge is called as follows:
./GeneMerge.pl gene-association.file description.file population.file study.file output.filename
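
For scripted or repeated runs, the same call can be assembled from Python. The sketch below is not part of GeneMerge: the helper names and the file names are hypothetical, and it assumes GeneMerge.pl is executable in the current directory. It only shows the five arguments in the order given above.

```python
import subprocess

def build_genemerge_command(association, description, population, study, output,
                            script="./GeneMerge.pl"):
    """Return the argument list in the order shown in the call above."""
    return [script, str(association), str(description),
            str(population), str(study), str(output)]

def run_genemerge(*args, **kwargs):
    """Run the command; subprocess raises CalledProcessError on a non-zero exit status."""
    return subprocess.run(build_genemerge_command(*args, **kwargs), check=True)

# Hypothetical file names, for illustration only.
command = build_genemerge_command(
    "fly.assoc.txt", "fly.desc.txt", "population.txt", "study.txt", "results.txt"
)
print(" ".join(command))
```

Calling run_genemerge with the same five arguments would then launch the program itself.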


How to make your own Gene Association Files

Structured text files for use with GeneMerge are available for download and a description of these files can be found here.

However, it's easy to make your own gene association files for use with GeneMerge. Just use any text editor to make two text files with the following formats:

Gene Association file
genename tab functionID;
genename tab functionID;
genename tab functionID;functionID;

Description file
functionID tab description_of_function
functionID tab description_of_function
functionID tab description_of_function

Here's an example of a Gene Association file for Drosophila melanogaster

GeneAssociation File Screenshot

The FBgn numbers are FlyBase gene names and the GO:XXXXXXX terms are Gene Ontology IDs for specific functions. The whitespace between a gene name and its IDs is a single tab. Each ID is followed by a semi-colon, so when more than one ID is associated with a gene the IDs are separated by semi-colons and the line still ends in one.
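
To make the line format concrete, here is a minimal Python sketch that splits one Gene Association line into a gene name and its list of IDs. The function name is hypothetical and the gene name and IDs are made up in the FBgn / GO style; this is not code from GeneMerge.

```python
def parse_association_line(line):
    """Split 'genename<TAB>ID;ID;' into (genename, [ID, ...])."""
    gene, _, id_field = line.rstrip("\n").partition("\t")
    # Every ID ends with a semi-colon, so splitting leaves a trailing
    # empty string, which is dropped here.
    ids = [term for term in id_field.split(";") if term]
    return gene, ids

# Made-up gene name and IDs, for illustration only.
example = "FBgn0000001\tGO:0000001;GO:0000002;\n"
gene, ids = parse_association_line(example)
print(gene, ids)
```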


Here's an example of a Description file:

Description File Screenshot

The ID terms here are Gene Ontology IDs for specific functions. Each human-readable functional description follows its ID after a single tab. Note that these lines do not have to end in semi-colons.
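
A Description file can be read in the same spirit. The following Python sketch, with hypothetical function names and made-up IDs and descriptions, loads the lines into a dictionary keyed by ID. It assumes one ID and one description per line, separated by the first tab.

```python
def parse_description_line(line):
    """Split 'functionID<TAB>description' at the first tab."""
    function_id, description = line.rstrip("\n").split("\t", 1)
    return function_id, description

def load_descriptions(lines):
    """Build {functionID: description}, ignoring blank lines."""
    return dict(parse_description_line(line) for line in lines if line.strip())

# Made-up IDs and descriptions, for illustration only.
example_lines = [
    "GO:0000001\texample function one\n",
    "GO:0000002\texample function two\n",
]
descriptions = load_descriptions(example_lines)
print(descriptions["GO:0000002"])
```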

You can use a text editor and a spreadsheet program to make these files. The following are typical steps for creating gene-association and description files using Word and Excel on a Mac:

    1. Download a spreadsheet with the genomic data you are interested in
    2. Open it in Excel
    3. Organize the data so that there are two columns, one with genenames, the adjacent column with IDs
    4. Copy and paste the two columns into Word using Paste Special --> "unformatted text"
    5. Do a search and replace on the line endings to add semi-colons: replace ^p with ;^p.
    6. Save file as "text"

Description files can be made along the same lines; just skip step 5. If there are no IDs for your genomic data, just make them up in Excel. A list of numbers works fine, as long as each function/category gets a unique ID.
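
The same steps can also be done without Word or Excel. The Python sketch below is one possible way, not the author's procedure: it numbers each distinct category to make up unique IDs, then produces Gene Association lines (with the trailing semi-colon from step 5) and Description lines (without it). The function names, gene names, and categories are all hypothetical.

```python
def make_up_ids(categories):
    """Give each distinct category a unique number, in order of first appearance."""
    ids = {}
    for category in categories:
        if category not in ids:
            ids[category] = str(len(ids) + 1)
    return ids

def association_lines(rows):
    """Turn (genename, IDs) pairs into 'genename<TAB>IDs;' lines."""
    return [f"{gene}\t{ids};" for gene, ids in rows]

# Made-up two-column data: gene name and functional category.
annotations = [("geneA", "transport"), ("geneB", "signaling"), ("geneC", "transport")]

ids = make_up_ids(category for _, category in annotations)
assoc = association_lines((gene, ids[category]) for gene, category in annotations)
desc = [f"{function_id}\t{category}" for category, function_id in ids.items()]

print("\n".join(assoc))
print("\n".join(desc))
```

Writing the two lists to plain text files, one line per entry, gives files in the formats described above.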


Understanding the output

Output is a tab-delimited text file that can be opened in a spreadsheet program like Excel, either by cutting and pasting from a text editor or by importing it "as tab delimited." The output file lists each gene-association term found in the study set along with its English description, frequency in the population set, frequency in the study set, and statistical enrichment score, both uncorrected and corrected. Below is a breakdown of each column header.

GMRG_Term            GeneMerge term, for example a GO identifier "GO:0001234"
Pop_freq             fraction of genes in the population with this term
Pop_frac             fraction of genes in the population with this term (whole numbers)
Study_frac           fraction of genes in the study set with this term (whole numbers)
Raw_es               P-value
e-score              Bonferroni corrected P-value
Description          GeneMerge term's English description
Contributing_genes   All the genes that are associated with this term in the study set
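
As an alternative to a spreadsheet, the term rows can be read with a short script. The Python sketch below rests on assumptions that should be checked against a real output file: that the table begins with a header row whose first field is GMRG_Term, and that every term row has exactly the eight tab-separated fields listed above. Lines that do not fit, such as summary lines, are skipped. The function name and all sample values, including the cell formats, are invented.

```python
import csv
import io

COLUMNS = ["GMRG_Term", "Pop_freq", "Pop_frac", "Study_frac",
           "Raw_es", "e-score", "Description", "Contributing_genes"]

def read_term_rows(handle):
    """Yield one dict per term row found after the header row."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    seen_header = False
    for fields in reader:
        if not seen_header:
            # Skip everything up to and including the header row.
            seen_header = fields[:1] == ["GMRG_Term"]
            continue
        if len(fields) == len(COLUMNS):
            yield dict(zip(COLUMNS, fields))

# Invented sample; the exact cell formats are assumptions.
sample = (
    "\t".join(COLUMNS) + "\n"
    "GO:0001234\t0.01\t10/1000\t1/50\t0.001\t0.05\tmade-up description\tgeneA\n"
)

rows = list(read_term_rows(io.StringIO(sample)))
for row in rows:
    print(row["GMRG_Term"], row["e-score"], row["Description"])
```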


Here's an example of a GeneMerge output file:

Output File Screenshot


The output file also lists the total number of population and study genes, the total number of GeneMerge terms examined, and the number of genes that have terms associated with them. Genes that have no gene-association data are listed as well. Finally, the number of population non-singletons, i.e. the number of terms that contribute to the Bonferroni correction, is also given.
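
To show how the non-singleton count relates the two scores, here is a minimal Python sketch of a standard Bonferroni correction: the raw P-value is multiplied by the number of tests and capped at 1. Treating the number of population non-singletons as the number of tests follows the description above, but the cap at 1 and the exact arithmetic GeneMerge uses are assumptions. The function name and the numbers are made up.

```python
def bonferroni(raw_p, n_tests):
    """Standard Bonferroni correction: multiply by the number of tests, cap at 1."""
    return min(1.0, raw_p * n_tests)

# Made-up values: a raw P-value and a count of population non-singletons.
raw_es = 0.0005
non_singletons = 40
print(bonferroni(raw_es, non_singletons))
```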


Further details on creating gene-association files, installing the program, and troubleshooting can be found in the pdf file:
GeneMerge Documentation.



GeneMerge - Post-genomic analysis, data mining, and hypothesis testing

Cristian I. Castillo-Davis
Department of Biology
University of Maryland
castill0@umd.edu