Large Graph Layout
LGL is a compendium of
applications for making the visualization of large networks and trees
tractable. LGL was specifically motivated by the need to make the
visualization and exploration of large biological networks more
accessible. Essentially the network is a graph, which is the
data that you define, and LGL is responsible for showing it to you.
Some graph based interpretations of biological data investigated in
this lab are shown below in the Current Applications section.
LGL blossomed into a large project from scratch, and
has undergone two significant rewrites. The first incorporated an
iterative layout, and the second incorporated many data types and
algorithms from the Boost
Graph Library. As expected, bugs have crept in between rewrites, but
much effort has been focused on minimizing them. To report any
problems with any of the LGL related programs, please contact Alex Adai at
[ alex dot adai at ucsf dot edu ]. It
should be noted that all of the LGL programs are sensitive to extra
blank lines trailing the ends of any of the input files, so leave those
out. LGL and its peripheral
applications were developed at the University
of Texas at Austin in the Marcotte
Lab with support from the National Science Foundation.
The submission page is a publicly available server to which one may submit
an edge file and get the coordinates of the layout in return. The
server is a free service offered to anyone with the hope that it will
be useful to them. After the edge file is submitted the coordinates
from the layout are emailed to you. The server can also send you
a '.lgl' edge file, which is necessary for viewing your results with
supplementary programs (lglview and genVrml.pl).
The time for a given run
varies with system load and usage, but layouts on average can be very
quick (less than a minute) for graphs having tens of thousands of
edges and vertices. The server is expected to have more options over
time, and is currently limited to defaults set in the LGL programs.
This should be sufficient for smaller graphs, but larger graph layouts
can vary significantly with the optional arguments that the server does not yet expose. I
apologize for this, and hope to address this issue in the near future.
The submission server has
internal settings that may reject user submissions for different
reasons. Before each submission the load average of the server is
checked to be under certain limits. User limits are also imposed on the
edge file that one submits: it must stay under the file size limit,
the maximum vertex limit, and the maximum edge limit. To see what
the limits are for the server, click on the 'Show Limits' button. One can
reduce the size of an edge file by substituting simple integers from 1 to n,
where n is the number of vertices, for long vertex names.
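The integer substitution described above can be sketched in a few lines of Python. This is only an illustration and not part of LGL: the function name relabel_edges is hypothetical, and it assumes a white space delimited edge list with an optional trailing weight, as in the .ncol format. Keep the returned mapping so you can translate the integers back to the original names after the layout.

```python
def relabel_edges(lines):
    """Replace vertex names with integers 1..n, in order of first appearance.

    Hypothetical helper, not part of LGL. Each input line is assumed to be
    'name1 name2 [weight]'. Blank lines are skipped.
    Returns (relabelled_lines, name_to_integer_mapping).
    """
    ids = {}
    out = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        u, v, rest = fields[0], fields[1], fields[2:]
        for name in (u, v):
            if name not in ids:
                ids[name] = len(ids) + 1  # next unused integer, starting at 1
        out.append(" ".join([str(ids[u]), str(ids[v]), *rest]))
    return out, ids
```

For example, relabel_edges(["proteinA proteinB 0.5", "proteinB proteinC"]) gives the lines "1 2 0.5" and "2 3" plus the mapping of names to integers.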
Two files are necessary for
looking at the results of your layout. The first is the edge file and
the second is the coordinates. While these files are the minimum, other
types of input are allowable for highlighting your 2D or 3D layout.
Such additions can include coloring the edges and vertices, and labelling.
lglview will work on Windows ONLY UNDER JAVA VERSION 1.4.1_07. The
same may go for Mac OS X. You can download version 1.4.1_07 from Sun in
the Archive area.
lglview ( lglview.jar for JAVA Version
1.4.1 and lglview.jar for JAVA
Version 1.4.2 ) is a JAVA application written solely for viewing 2D
graphs generated by LGL, although it can view graphs generated by
other means if 2D coordinates are available, such as parsing 'lineto'
calls in existing .ps graph files. You must at least have Sun's
version 1.4.1 to safely run the viewer. The absolute coordinates
matter less than the relative positions of the vertices,
since lglview will rescale all the coordinates anyway. Please
understand that lglview is literally a few months old, and should be
considered a work in progress. The README
file is highly recommended. There is also an examples
dir available that has some sample files.
For viewing 3D graphs a PERL
script, genVrml.pl, is available to
generate a VRML file,
which is viewable with a VRML
browser. The perl
script has options for edge and vertex coloring, URL anchoring, text
labels, and more. genVrml.pl uses the VRML
module, which is freely available from CPAN. It also requires an internally
developed (and not yet documented) module, LGLFormatHandler.pm. Both are
necessary to compile and run the PERL script, so they must be in your
PERL @INC path. You don't have to use or call these modules directly;
the script will. This script does not generate optimal VRML code,
but necessity or interest (or outside advice) could elicit a revision.
For usage of genVrml.pl just run the script without any arguments,
and read the output. The command genVrml.pl edges.lgl layout.coords
(where edges.lgl is your edge file and layout.coords is a 3D layout)
will get some VRML code going and get you started. The output VRML
file is always the coords file + '.wrl'. So in the short example
above, the output file would be layout.coords.wrl.
File Formats
There are 2 different file formats
used for the edge files, denoted by the file suffixes .lgl (LGL format) and .ncol. The .ncol edge files
are simple 2 column files where each line of the
file holds two vertex names, white space delimited, optionally followed by a weight:
vertex1name vertex2name [optionalWeight]
The graphs here are undirected, and LGL is pretty particular about
that. So if you have an edge A <-> B then you should not also have an
edge B <-> A. As far as the .ncol file is concerned, you should NOT have both
vertex1name vertex2name
vertex2name vertex1name
in the same file, nor should
any vertex have an edge to itself.
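A minimal sketch of a pre-check for these rules follows. It is not part of LGL, and the name clean_ncol is hypothetical. It drops blank lines, self edges, and any edge whose pair of vertices has already been seen in either order, keeping the first occurrence (and its weight, if any).

```python
def clean_ncol(lines):
    """Return .ncol lines with blank lines, self edges, and repeated
    undirected edges removed. Hypothetical helper, not part of LGL."""
    seen = set()
    kept = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # blank or malformed line
        u, v = fields[0], fields[1]
        if u == v:
            continue  # a vertex may not have an edge to itself
        key = frozenset((u, v))  # same key for 'A B' and 'B A'
        if key in seen:
            continue  # edge already present in one direction
        seen.add(key)
        kept.append(" ".join(fields))
    return kept
```

Given the lines "A B 1.0", "B A", "C C" and "B C", this keeps only "A B 1.0" and "B C".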
The second format is the LGL file format (.lgl file
suffix). This is yet another graph file format that tries to be as
stingy as possible with space, while keeping the edge file in a human
readable (not binary) format. The format itself is like the following:
# vertex1name
vertex2name [optionalWeight]
vertex3name [optionalWeight]
Here, the first vertex of an edge is preceded with a pound sign '#'.
Then each vertex that shares an edge with that vertex is listed
one per line on subsequent lines. Again, you can't have directed edges
in the file, so you should NOT have both
# vertex1name
vertex2name
# vertex2name
vertex1name
in the same file.
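To make the relationship between the two formats concrete, here is a rough Python sketch that regroups .ncol lines into the .lgl layout described above. It is an illustration only, not the converter shipped with LGL. The name ncol_to_lgl is hypothetical, and the sketch assumes that a space follows the '#', that an optional weight trails the neighbour name, and that the input already lists each undirected edge only once.

```python
def ncol_to_lgl(lines):
    """Group 'u v [weight]' lines under a '# u' header per first vertex.

    Hypothetical helper, not part of LGL. Assumes the input holds each
    undirected edge once and contains no self edges.
    """
    adjacency = {}  # dicts keep insertion order, so headers follow the input
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        u, v = fields[0], fields[1]
        weight = fields[2:3]  # empty list when no weight is given
        adjacency.setdefault(u, []).append(" ".join([v, *weight]))
    out = []
    for u, neighbours in adjacency.items():
        out.append("# " + u)     # header line for the first vertex
        out.extend(neighbours)   # one neighbour per line
    return out
```

The input lines "A B 0.5", "A C" and "B C" come out as "# A", "B 0.5", "C", "# B", "C".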
Current In Lab Applications
Visualizing Protein Family Relationships
Visualizing protein homology
Visualizing Predicted Functional Links in Proteins
Applications outside the Lab
Visualizing the internet - http://www.opte.org
Let us know how you are using LGL...
The gallery is a collection of
different graphs and trees generated by LGL from different sources of
biological data. All 2D images used lglview. All 3D images used
genVrml.pl to make the VRML code
SCOP sunid Hierarchy The data is here.
A Protein Homology Graph (32,727 Proteins with
1,206,654 Edges). Color coded based on layout hierarchy. The data is here.
Zoomed Region of the Yeast Protein
Interaction Map in 3D (VRML) . The data is here.
Zooming regions of the "Minimum Spanning Protein Homology Tree" -
302,832 Vertices (Proteins) ...
[ Full Size ] [Full Size 4000x4000 Pixels!]
[ Zoom 1] [Zoom 2] [Zoom 3] [Zoom 4]
[ Largest Connected Set
4000x4000 Pixels! ] - The data is here.
An edge is colored blue if it connects 2 proteins from the same
species, and red if it connects 2 proteins from 2 different species. If
that information is not available the edges are colored based on layout
hierarchy. In the "Largest Connected Set" the edges are colored as
above, but they remain white if there is no species information
available. Also, the proteins are colored based on their COG (Clusters
of Orthologous Groups) membership.
The Protein Homology Network [ LargeImage
| SmallImage ]
The source code is available as a zipped tar file.
The programs will only compile on Linux systems with gnu compilers.
While you are free to port and use
LGL on other operating systems and compilers, I will probably not be
of much help with support. If you aren't sure about compiling the
programs yourself, try the online
submission page first.
All files provided with LGL fall under the terms
of the GNU General
Public License. By downloading LGL you agree to the terms of that license.
Source Code: LGL.tar.gz
Update - 9/2005 - LGL and Boost code was slightly updated to compile under GNU 4.X compilers, but
there will be several warnings from the Boost library if you do use newer compilers. The Boost
library is now included in the archive, since it is patched.
You will need the boost library 1.30.2 if you want this older version of LGL.
Source Code: LGL.tar.gz
Update - 12/2003 - All JAVA source code is now included in LGL.
Update - 2/2004 - Boost Version 1.31 has changes in the random number
generator library that are probably responsible for the break in LGL
compilation. Version 1.30.2 works fine, and it is recommended until a
patch is made for the fix. Thanks to Paul Brunk for pointing this out.
A README is available in the archive after you unpack it. To get you started:
prompt$ tar -zxvf LGL.tar.gz       # This unpacks it into a dir called 'LGL-1.1'
prompt$ cd LGL-1.1                 # Now take a look at the README
prompt$ ./setup -i                 # To install the programs into the ./bin dir
prompt$ ./setup -c conf_file       # To get an example config file to modify and run LGL
prompt$ ./bin/lgl.pl -c conf_file  # After modifying the config for your needs, run LGL
After running the above you now have coordinates of your vertices in
space. Now you have to run the right program to look at your results
(Your edge file and final.coords file will probably be elsewhere and
not in the current working directory).
prompt$ ./perls/genVrml.pl edgeFile.lgl final.coords # If your layout was 3D, then you can make some VRML
prompt$ java -jar lglview.jar edgeFile.lgl final.coords # If your layout was 2D, use the JAVA browser to see it
Drawing Trees (and Graphs) Nicely
The software is designed to draw arbitrarily large trees/graphs so the
underlying algorithm has no functions for minimizing edge overlaps or
other features specific for trees. Although functions exist for such
things, LGL doesn't have an implementation because the layouts would
then not have such scalability. However, there are some tricks for
doing layouts with trees. First you need to do the following:
1) Make sure your tree is in a singly connected set. That is, every node is reachable from
every other node in the graph by traversing edges. If it isn't, your
layouts will be awkward or undefined.
2) Make sure your tree is in .lgl format. You have to use the base programs, so only .lgl format will do, and not
.ncol or any other format.
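Both requirements can be checked before running a layout. The following is a sketch under stated assumptions, not code from LGL: the names read_lgl and is_singly_connected are hypothetical, and the parser assumes the .lgl layout described in the File Formats section (a '#' header line followed by one neighbour per line, with any trailing weight ignored). Connectivity is tested with a plain breadth-first search.

```python
from collections import defaultdict, deque


def read_lgl(lines):
    """Parse .lgl lines into an undirected adjacency map.

    Hypothetical helper, not part of LGL. Accepts '# name' or '#name'
    headers and ignores anything after the neighbour name (e.g. a weight).
    """
    adjacency = defaultdict(set)
    current = None
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if fields[0].startswith("#"):
            name = fields[0][1:] or (fields[1] if len(fields) > 1 else "")
            if not name:
                raise ValueError("header line without a vertex name")
            current = name
            adjacency[current]  # register the vertex even if it has no edges
        else:
            if current is None:
                raise ValueError("neighbour line before any '#' header")
            adjacency[current].add(fields[0])
            adjacency[fields[0]].add(current)
    return adjacency


def is_singly_connected(adjacency):
    """True if every vertex is reachable from every other vertex."""
    if not adjacency:
        return True
    start = next(iter(adjacency))
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return len(seen) == len(adjacency)
```

If is_singly_connected(read_lgl(open("sample.lgl"))) is False, the file holds more than one connected set and should be split before it is handed to lglayout2D or lglayout3D.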
Now you can use lglayout2D directly (or lglayout3D) in the "bin"
directory of the archive you downloaded and compiled. lglayout2D (or
lglayout3D) is designed specifically for layouts of singly connected
sets. Running it without arguments gives the argument list. A good
option to toy with is -q. This gives the suggested edge distances. For
trees you might want to run it as:
prompt$ lglayout2D -q.05 sample.lgl # Check your path to lglayout2D
What the -q option does is set the equilibrium distance of the edges.
The smaller the q value, the closer together it draws the nodes, so
the edge lengths aren't as long. You can experiment with that side of
it. Something that helps is coloring the edges based on hierarchy - giving
light edge colors to the higher order edges and darker colors for the
lower order edges. That is what was done in the gallery
file of SCOP. If you run a
layout without adjusting the -q option you may see what I call the hairdryer effect. That happens when
all of the edges, in particular those to leaf nodes, are piled on top
of each other in layouts.
Another option is to dive into the code; specifically the function
placementFormula in calcFuncs.C. That function determines the placement
distances for successive layers in the tree. If you feel your tree is
getting too compressed then changing the return value for that function
to a much higher number will give the layout "more room". Admittedly,
this is probably the least desirable method, not to mention a total
hack job, but I've had to do it for certain layouts.
However, for trees with perfectly non overlapping edges such as those
drawn with phylogenetic programs you may have to use other programs
that are made to view such trees. Those specialized programs will
provide more clear layouts. LGL is meant to be generic and can't
provide clearer layouts than software specialized for such layouts.
Another example is visualizing metabolic pathways - there you also have
to minimize the edge overlaps and present the layout in a more
symmetric manner with specific labels. The obvious drawback of such
specialized programs is usually scalability.
Alex Adai - alex dot adai at ucsf dot edu
Edward Marcotte - marcotte at icmb dot utexas dot edu
Adai AT, Date SV, Wieland S, Marcotte EM. LGL: creating a map of protein
function with an algorithm for visualizing very large biological
networks. J Mol Biol. 2004 Jun 25;340(1):179-90.
and content are Copyright (C) 2002 Alex Adai