For csh or tcsh:

```
setenv PATH ${PATH}:/your/IDEA/installation/directory
```

For bash:

```
export PATH=$PATH:/your/IDEA/installation/directory
```

IDEA expects different input files depending on whether you have multiple datasets or a single dataset, and on whether you intend to use existing phylogenetic trees. All these files should be in the same directory (or directory hierarchy).
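The first line of the sample sequence file below gives the number of sequences and the number of aligned sites (5 and 537), with gap characters counted as sites. The Python sketch below checks that header against the sequences that follow. It assumes the layout of that sample, with each name on its own line prefixed by ">" and sequences that may wrap over several lines. This is an assumption about the format, not a statement of everything IDEA or PAML accept. `check_paml_alignment` is a hypothetical helper.

```python
def check_paml_alignment(text: str) -> dict[str, str]:
    """Parse an alignment whose first line is '<n_seqs> <n_sites>'.

    Assumes (hypothetically) that each name line starts with '>' and that a
    sequence may continue over several lines. Returns {name: sequence} and
    raises ValueError if the header does not match the data.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    n_seqs, n_sites = (int(tok) for tok in lines[0].split()[:2])
    seqs: dict[str, str] = {}
    name = None
    for ln in lines[1:]:
        if ln.startswith(">"):
            name = ln[1:].strip()
            seqs[name] = ""
        elif name is None:
            raise ValueError("sequence data found before the first name line")
        else:
            seqs[name] += ln.replace(" ", "")  # join wrapped lines; gaps ('-') count as sites
    if len(seqs) != n_seqs:
        raise ValueError(f"header says {n_seqs} sequences, found {len(seqs)}")
    for nm, seq in seqs.items():
        if len(seq) != n_sites:
            raise ValueError(f"{nm}: header says {n_sites} sites, found {len(seq)}")
    return seqs
```

Running it on a file before submitting a job catches a mismatched header early, before PAML itself reports an error.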
Example sequence file:

```
5 537
>species1
ATGGCATGTAAAGTTGATAAAGCTTTAGAGCATTCTACCCAAAATGAAGCACCCTCA------------AAAAATTATATGAACAATTTGTGTTATTACAAAAATAATGAATTAAAAAAAATAGACTCATCATATTTTCAAGATAAGTATTTAGGATTATTTTTTGGAGCTTCATGGTGTAAATATTGTGTATCATTTATAAATAATTTGAATTTATTTAAAACCTACTTTCCCTTTTTTGAAATAATATATATACCATTTGATCAAACATATACAGATTATATCAATTTTTTAAAAAATACTAATTTTTATAGTTTACCTTTTGATAATTATTTATATATAGCTAATAAATTTAAAGTCACAAATTTGCCATCTTTTATTATTATAGCACCCAATAATAATATCCTTGTTAGGGATGGAGTGCAATTAATTAAAACTGACAACTATATAAACAACTTCAAATCCTTGATAAAAAATTATACAATACACCCCAAAACATTCAAATCAAATAATCGATTTTTCGATTTATTCTACAAT
>species2
ATGGCTTGTAAAGTTGATAAAGCTCCAGAGCATCCTACCCAAAATGAAGTACCCTCA------------CAAAATTATATGAACAATTTATGTTATTACAAAAATAATGAATTAAAAAAAATAGACTCATCATATTTTCAAGATAAATATTTAGGATTATTTTTTGGAGCCTCATGGTGTAAATATTGTGTATCATTTATAAATAATTTGAATTTATTTAAAACCTACTTCCCCTTCTTTGAAATCATATATATACCATTTGATCAAACATATACAGATTATATTAATTTTTTAAAAAATACAAATTTCTATAGCTTACCTTTTGATAATTATTTATATATAGCTAATAAATTTAAAGTCAAAAATTTACCATCATTTATTATTATAGCACCCAATAATAATATCTTG---AGGGATGGTGTGCAATTAATTAAAACAGACACCTATCTAAATAATTTCAAATCATTGATAAAAAATTATACAATACACCCAAAAACATTCAAATCAAATAACCGATTTTTCGACTTATTCTACAAT
>species3
ATGGCATGTAAAGTTGATAAAGTTTTAGAGCATCCTACCCAAAATGAAGAAACCTCA------------AAAAATTATATGAACAATTTGTTTTATTACAAAAATAATGAATTAAAAAAAATAGACTCATCATATTTTCAAGATAAATATTTAGGATTATTTTTTGGAGCCTCATGGTGTAAATATTGTGTATCATTTATAAATAATTTAAATTTATTTAAAACTTATTTTCCTTTTTTTGAAATTATATATATACCATTTGATCAAACATATACAGATTATATTAATTTTTTAAAAAATACAAATTTTTATACTTTACCTTTTGATAATTATTTATATATAGCTAATAAATTTAAAGTCAAAAATTTGCCATCTTTTATTATTATAGCACCAAATAATAATATACTTGTTAGGGATGGAGTACAATTAATTAAAACTGACAATTATGTAAATAATTTCAAATCTTTGATAAAAAATTATACAATACACCCCAAAACATTCAAATCAAATAATCGATTTTTCGACTTATTCTACAAT
>species4
ATGGCGTGCCAAGTTGATAACCCCCCTAAAACATACCCAAACGATAAAACAGCTGAATACGAAAAGTACGCAAATTATATGAACTATCTATATTATTATCAAAATAATGAATTAAAAAAAATCGATTCCTCTTATTTTAAAGATAAATATTTAGGATTATTTTTTGGAGCTTCATGGTGTAAATACTGTGTAACCTTTATAGATAGCTTAAATATATTTAAAAAGAACTTCCCCAATGTTGAAATTATATATATACCATTTGATAGAACATATCAAGAGTACCAATCCTTTTTAAAAAATACAAACTTTTATGCTTTACCTTTTGATAATTATTTATATATATGTAAA
AAGTATCAAATAAAAAATCTACCTTCCTTTATGTTAATTACACCTAATAATAATATACTAGTAAAGGATGCAGCACAATTAATTAAAACAGATGAATATATAAATAATTTAAAATCATTAATAAAAAATTATATCATACATCCTAAAACGTTTCAATTTAATAATCGCTTTTTTGATTTGTTTCGTAAT
>species5
ATGAAATGCCAAGTGGATCGCCCCGTTACACCAAACGAAGAGCTAAATGGGGGCCAACAAAATGTAGCCAAAAATTACATCCCCCATTTGTATCAATTCCAAAATAATGAAATGAAAAAAATCGATGCGTCTTACTTTGATAATAAATATCTGGGGCTATTTTTTGGAGCATCCTGGTGCAGGTATTGCGTAACTTTCATCCAAAAAATAAATTTTTTTAAAAAGAATTTCCCCTTTATAGAAATTATATACATCCCTTTTGACAAGACATATAATGATTATATAGCTTTCCTAAAAGGGACCGACTTTTACAGCCTTCCTTTTGATAACTATCTCTACGTTTGCAAAAAATTTAATGTTCAAAATTTGCCATCCTTTATGATCATAGCCCCCAACAACAATGTGCTCGTCAAGGATGCCGTGCAGCTCATCAAGACGGATGCCTACGTGGCGAACTTCAAGTCGTTGGTGAAAAATTACACAATTCACCCGAACCAGTTTAAGTTTGGCAACCGATTTTTCGACTTATTTTGCGCA
```

Example dataset list (sequence files only):

```
gene1.PAMLseq
gene2.PAMLseq
gene3.PAMLseq
```

Example dataset list with a tree file for each sequence file:

```
gene1.PAMLseq gene1.tree
gene2.PAMLseq gene2.tree
gene3.PAMLseq gene3.tree
```

Example dataset list for files organized in a directory hierarchy:

```
set1/alignments/gene1.PAMLseq set1/trees/gene1.tree
set1/alignments/gene2.PAMLseq set1/trees/gene2.tree
set2/alignments/gene3.PAMLseq set2/trees/gene3.tree
```

To start IDEA, run:

```
idea
```
You should see the following window:

Fig. 1: IDEA Start Page

| Option | Description |
|---|---|
| PAML program | Choose codeml to perform codon- or amino-acid-based analysis. Choose baseml to perform nucleotide-based analysis. |
| IDEA mode | Multi-dataset mode is the preferred method for analyzing multiple datasets. It uses grid resources to execute many jobs simultaneously. |
| Dataset name list (multi-dataset mode) | The name of the dataset list file described in section 3.1.3. |
| Input directory (multi-dataset mode) | This directory should contain all the files described in section 3.1, except possibly the dataset list. |
| Output directory (multi-dataset mode) | The directory where all output of an IDEA run will go. It is recommended that this not be the same as the input directory, and that no other files be stored in it. If you specify a directory that does not already exist, IDEA will attempt to create it for you. |
| runmode | If you have only two sequences for each dataset and are running codeml, set runmode to -2 to perform a pairwise analysis. This saves computation time and lets you view a specialized output display. If your dataset list includes both pairwise and non-pairwise datasets, setting runmode to -2 results in a pairwise analysis for the pairwise datasets and a standard analysis for the non-pairwise datasets. Setting runmode to 0 always results in a standard analysis for all datasets. Extra ω values are ignored in pairwise mode, as are certain PAML parameters. |
| Extra omega values (codeml) | This IDEA feature lets you specify additional starting values for ω. Separate multiple values with spaces. PAML will be run once for each starting value, including the starting value given as the parameter omega in the left column. Afterwards, IDEA will choose the best run for each dataset and evolution model. Extra ω values are ignored in pairwise mode (runmode -2). |
| PAML Parameters | Mouse over the … For more detailed documentation, please refer to the PAML manual (version 4 or 3.15). |
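With extra ω values, each dataset and model is fitted several times, and only one run is kept. The Python sketch below shows one way to make that choice. It assumes "best" means the highest log-likelihood, which is the usual criterion for maximum-likelihood fits but is an assumption about IDEA's internals. The `Run` record, its fields and `best_runs` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    dataset: str        # dataset (sequence file) name
    model: str          # evolution model label (hypothetical naming)
    omega_start: float  # starting value of omega used for this PAML run
    lnl: float          # log-likelihood reported by PAML for this run


def best_runs(runs: list[Run]) -> dict[tuple[str, str], Run]:
    """Keep, for each (dataset, model) pair, the run with the highest lnL."""
    best: dict[tuple[str, str], Run] = {}
    for run in runs:
        key = (run.dataset, run.model)
        if key not in best or run.lnl > best[key].lnl:
            best[key] = run
    return best
```

For example, with made-up log-likelihoods, two starting values for model 8 on gene1 would reduce to the single run with the higher lnL. Trying several starting values guards against a run that converges to a local optimum.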
You can also load the parameters from a codeml.ctl or baseml.ctl file you already have. To do so, choose "Load configuration..." in the File menu. After loading your configuration, you may modify parameters as you see fit.

… .summary.
If you are running codeml and chose at least one nested pair of NSsites models, an additional file with the suffix .lrt will be produced. It contains the results of a likelihood ratio test performed by running the PAML program chi2.

| Standard output | Standard error |
|---|---|
| idea.create-tree.<dataset name>.out.yyyy-mm-dd | idea.create-tree.<dataset name>.err.yyyy-mm-dd |
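The table gives the naming pattern for the log files of the tree-creation step. When a job fails, a small helper that builds those names can make it quicker to find the right logs. The sketch below assumes `<dataset name>` is substituted verbatim and that the date is the yyyy-mm-dd form shown. `create_tree_logs` is a hypothetical helper, not part of IDEA.

```python
from datetime import date


def create_tree_logs(dataset: str, day: date) -> tuple[str, str]:
    """Return the (standard output, standard error) log names for one dataset."""
    stamp = day.isoformat()  # yyyy-mm-dd
    return (f"idea.create-tree.{dataset}.out.{stamp}",
            f"idea.create-tree.{dataset}.err.{stamp}")
```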
… (seqtype = 1; model = 0), although a small subset of its features is also applicable to baseml analyses and certain other codeml analyses.

… NSsites parameter.
Use the … and … buttons to sort across datasets. Use the … and … buttons to sort the models within each dataset.

A … button is also available. Press this button to see an interactive histogram of the data points in the corresponding column. In non-pairwise mode, you will be prompted to select a single model of evolution on which to base the histogram. Use the "Number of bins" slider to adjust the number of bins dynamically between 2 and 50 (and thus the bin size). You can also switch models by selecting a different model from the drop-down box at the top of the window. Press the "Save" button to save the histogram as a JPEG image.

Fig. 2: Adjustable Histogram
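Changing the number of bins changes the bin width, because the same range of values is split into more or fewer intervals. The Python sketch below shows this relationship for equal-width bins spanning the smallest to the largest value. Equal-width binning is an assumption about how IDEA draws its histogram, and `histogram` is a hypothetical helper; only the 2 to 50 range comes from the slider described above.

```python
def histogram(values: list[float], n_bins: int) -> tuple[float, list[int]]:
    """Equal-width histogram over [min, max]; returns (bin_width, counts)."""
    n_bins = max(2, min(50, n_bins))      # clamp to the slider's 2..50 range
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0     # avoid zero width when all values are equal
    counts = [0] * n_bins
    for v in values:
        i = min(int((v - lo) / width), n_bins - 1)  # the maximum goes in the last bin
        counts[i] += 1
    return width, counts
```

Doubling the number of bins halves the bin width, so more bins show finer structure at the cost of fewer data points per bin.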

NSsites is used to specify models that allow ω to vary across sites. The available choices include pairs of nested models that are identical except that one allows some sites to have ω > 1. Comparing the likelihood of the observed data under each model therefore amounts to a test of the hypothesis that some sites are under positive selection. For each such pair you selected, a likelihood ratio test should have been performed as part of the final step of the IDEA pipeline. If the test is significant at the 5% level, a … will be visible to the right of the likelihood score for the alternative (more complicated) model; otherwise, an … will be visible in its place. Mousing over either icon will bring up a display of the details of the likelihood calculation similar to that below.
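IDEA performs this test by running PAML's chi2 program. The Python sketch below computes the same textbook quantities independently: the statistic 2(lnL1 − lnL0) for the alternative and null models, compared against a chi-square distribution whose degrees of freedom equal the difference in the number of free parameters (for example, 2 for model 7 versus model 8). It is not PAML's code; `chi2_sf` and `lrt` are hypothetical helpers, and the 5% threshold matches the text above.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term = math.exp(-x / 2)
        total = term
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return total
    # Odd df: the df = 1 tail plus correction terms
    # sqrt(2x/pi) * exp(-x/2) * x^(i-1) / (1*3*...*(2i-1)).
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for i in range(1, (df + 1) // 2):
        total += term
        term *= x / (2 * i + 1)
    return total


def lrt(lnl_null: float, lnl_alt: float, df: int, alpha: float = 0.05):
    """Return (statistic, p-value, significant?) for a pair of nested models."""
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))  # negative values are clamped to 0
    p = chi2_sf(stat, df)
    return stat, p, p < alpha
```

For instance, log-likelihoods of -1000.0 (null) and -995.0 (alternative) with 2 degrees of freedom give a statistic of 10 and p = e^-5 ≈ 0.0067, which is significant at the 5% level.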
In non-pairwise mode, the bottom of the screen is devoted to the display of selected sites. Click on any row in the table (except rows for model 0-one-ratio) to bring up a display of selected sites for that dataset and evolution model. You will see the amino acid sequence for that dataset. Above each amino acid are one or more colored bars representing the probability that ω at that site falls into each of the site classes allowed by the model. (Although ω can really take on continuous values, PAML's models assume it has one of several discrete values.) A color key listing the ω value for each site class can be found at the top left of the display. The height of each colored bar is proportional to the probability that ω at that site is in the matching site class; probabilities sum to 1. The most probable site class for each site is shown as a colored square under the amino acid. Some models, such as 2-PositiveSelection, 3-discrete and 8-beta&w>1, may have a site class for which ω > 1 (indicating that positive selection is likely at certain sites). In such cases, amino acids for which the most likely ω value is > 1 will be shown in white on black. The numbers directly above the colored bars indicate position (in amino acids) on the sequence.
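The display rules in the paragraph above can be summarized as: for each site, take the site class with the highest probability, and highlight the site if that class has ω > 1. The Python sketch below applies those rules to per-site class probabilities. The input layout (a list of class ω values plus one probability list per site) is an assumption for illustration; it does not parse PAML's RST file, and `summarize_sites` is a hypothetical helper.

```python
def summarize_sites(omegas: list[float], posteriors: list[list[float]],
                    tol: float = 0.01) -> list[tuple[int, float, bool]]:
    """Return (most probable class, its omega, omega > 1) for each site.

    omegas:     omega value of each discrete site class (the color key)
    posteriors: for each site, the probability of each site class
    """
    summary = []
    for site, probs in enumerate(posteriors, start=1):
        if len(probs) != len(omegas):
            raise ValueError(f"site {site}: expected {len(omegas)} classes")
        if abs(sum(probs) - 1.0) > tol:  # allow for rounding in printed values
            raise ValueError(f"site {site}: probabilities sum to {sum(probs):.3f}")
        best = max(range(len(probs)), key=lambda k: probs[k])
        summary.append((best, omegas[best], omegas[best] > 1.0))
    return summary
```

A site flagged `True` corresponds to an amino acid drawn in white on black.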
The selected sites display will be based on PAML's Bayes empirical Bayes analysis when it is available; it will be based on PAML's Naïve empirical Bayes analysis otherwise. The analysis used will be noted in the title of the selected sites display.
Occasionally, the selected sites cannot be visualized. This is usually the result of errors during the execution of PAML, which are in turn usually caused by errors in the input data. In the case of such a failure, you are given the opportunity to browse the text output of the display-creation command in order to find the error.
Note: ω values ≥ 100 for site classes may be shown incorrectly in the color key because ω values ≥ 1000 may be truncated in PAML's output (for example, an ω value of 1100 for a site class could be truncated to 100). This is a known problem in PAML and does not affect the overall ω values listed in the data table.
The portion of the code that creates this picture was contributed by Jonathan Crabtree.

In non-pairwise mode, the right side of the screen is devoted to the display of phylogenetic trees. Click on the Tree button in any row to display a phylogenetic tree for the selected dataset based on the selected model. Alternatively, you can select a dataset and model from the drop-down boxes in the tree area and press Display. The tree is drawn to scale, using the branch lengths estimated by PAML. To see a close-up view, click "click to enlarge".
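"Drawn to scale" means that each node's horizontal position is the sum of the branch lengths on the path from the root. The Python sketch below illustrates that idea for a tree in Newick format, the format PAML uses for trees. It is an illustration only, not IDEA's own drawing code. It assumes a single Newick string without whitespace inside names, and `parse_newick` and `leaf_positions` are hypothetical helpers.

```python
def parse_newick(text: str):
    """Parse a Newick string into nested (name, branch_length, children) tuples."""
    s = "".join(text.split())  # drop whitespace (assumes names contain none)
    if not s.endswith(";"):
        s += ";"
    pos = 0

    def node():
        nonlocal pos
        children = []
        if s[pos] == "(":
            pos += 1                      # skip '('
            children.append(node())
            while s[pos] == ",":
                pos += 1                  # skip ','
                children.append(node())
            if s[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1                      # skip ')'
        start = pos
        while s[pos] not in ":,);":       # optional node name
            pos += 1
        name = s[start:pos]
        length = 0.0
        if s[pos] == ":":                 # optional branch length
            pos += 1
            start = pos
            while s[pos] not in ",);":
                pos += 1
            length = float(s[start:pos])
        return name, length, children

    return node()


def leaf_positions(tree, x: float = 0.0, out=None) -> dict[str, float]:
    """Horizontal position of each leaf: summed branch lengths from the root."""
    if out is None:
        out = {}
    name, length, children = tree
    x += length
    if not children:
        out[name] = x
    for child in children:
        leaf_positions(child, x, out)
    return out
```

For the tree `((a:0.1,b:0.2):0.05,c:0.3);`, leaves a, b and c end up at 0.15, 0.25 and 0.3 from the root.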
The portion of the code that creates this picture was contributed by Jonathan Badger.

| File | Description |
|---|---|
| list.txt | This should be a list with the name of one sequence file on each line. There may optionally be a second column containing the name of one tree file for each sequence file. For the purposes of the output display, the sequence files listed in list.txt need not actually exist. |
| <sequence file name>.lrt | (Optional) One such file may be provided for each sequence file listed in list.txt. To view LRT results, generate these files by calling idea-D-parse-output.pl. |
| <sequence file name>.mlc -OR- <sequence file name>.SINGLE.PAMLout.merged | One such file is required for each sequence file listed in list.txt. This should be a PAML output file. |
| <sequence file name>.mst | To view selected sites, one such file is required for each sequence file listed in list.txt. This should be an RST file as output by PAML. (The .mst suffix reflects the fact that IDEA generates a merged RST file.) |
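Writing list.txt by hand becomes tedious for many datasets. The Python sketch below builds its lines from a directory tree, one sequence file per line with an optional tree file as a second column. Several details are assumptions: sequence files end in .PAMLseq as in the examples in this manual, the two columns are separated by a single space, and a tree is matched by name (gene1.PAMLseq to gene1.tree), first next to the alignment and then in a sibling `trees` directory as in the set1/alignments and set1/trees layout. `build_dataset_list` is a hypothetical helper.

```python
from pathlib import Path


def build_dataset_list(input_dir: str) -> list[str]:
    """Return list.txt lines: '<sequence file> [<tree file>]', paths relative to input_dir."""
    root = Path(input_dir)
    lines = []
    for seq in sorted(root.rglob("*.PAMLseq")):
        # Hypothetical matching rule: same base name with a .tree suffix,
        # looked up next to the alignment, then in a sibling "trees" directory.
        candidates = [seq.with_suffix(".tree")]
        if seq.parent != root:
            candidates.append(seq.parent.parent / "trees" / f"{seq.stem}.tree")
        tree = next((c for c in candidates if c.is_file()), None)
        line = seq.relative_to(root).as_posix()
        if tree is not None:
            line += " " + tree.relative_to(root).as_posix()
        lines.append(line)
    return lines
```

To save the result, something like `Path(input_dir, "list.txt").write_text("\n".join(lines) + "\n")` would do.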
| Multiple-dataset analyses | Single-dataset analyses |
|---|---|
| <dataset name>.PAMLtree*<br><dataset name>.wX_X.lnf*<br><dataset name>.wX_X.rst*<br><dataset name>.mst** (merged rst)<br><dataset name>.wX_X.PAMLout*<br><dataset name>.mlc** (merged PAMLout)<br><dataset name>.summary | 2NG.dN*<br>ONLY.wX_X.lnf*<br>ONLY.wX_X.rst*<br>ONLY.mst** (merged rst)<br>rates*** (merged PAMLout)<br><output file name>.summary<br><output file name>.lrt |
Egan A., Mahurkar A., Crabtree J., Badger J.H., Carlton J.M. and Silva J.C. IDEA: Interactive Display for Evolutionary Analyses. BMC Bioinformatics 2008, 9:524.