Large Graph Layout


Contents
Introduction
Submission Page
Viewing Results
File Formats
Current Applications
Gallery
Download
Usage
Tips
Contact
Cite


Introduction

    LGL is a compendium of applications for making the visualization of large networks and trees tractable. LGL was specifically motivated by the need to make the visualization and exploration of large biological networks more accessible. Essentially, the network is a graph built from data that you define, and LGL is responsible for showing it to you. Some graph-based interpretations of biological data investigated in this lab are shown below in the Current Applications section.
    LGL grew from scratch into a large project and has undergone two significant rewrites. The first incorporated an iterative layout, and the second incorporated many data types and algorithms from the Boost Graph Library. As expected, bugs have crept in between rewrites, but much effort has been focused on minimizing them. To report problems with any of the LGL-related programs, please contact Alex Adai at [ alex dot adai at ucsf dot edu ]. Note that all of the LGL programs are sensitive to extra blank lines trailing the ends of input files, so leave those out. LGL and its peripheral applications were developed at the University of Texas at Austin in the Marcotte Lab with support from the National Science Foundation.
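
Since trailing blank lines cause trouble, it can help to clean an input file before using it. The following is a minimal Python sketch of one way to do that. The function name is hypothetical and not part of LGL, and it assumes that a single newline terminating the last line is acceptable.

```python
def strip_trailing_blank_lines(text):
    """Return text without blank lines at the end.

    Assumption: one newline terminating the last line is fine for LGL;
    only empty or whitespace-only lines at the end are removed.
    """
    lines = text.splitlines()
    # Drop lines at the end that are empty or contain only whitespace.
    while lines and not lines[-1].strip():
        lines.pop()
    return "\n".join(lines) + "\n" if lines else ""
```

To use it, read the edge file into a string, pass it through the function, and write the result back out before running any LGL program on it.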

Submission Page

    The submission page is a publicly available server to which one may submit an edge file and get the coordinates of the layout in return. The server is a free service offered to anyone, in the hope that it will be useful. After the edge file is submitted, the coordinates from the layout are emailed to you. The server can also send you a '.lgl' edge file, which is necessary for viewing your results with the supplementary programs (lglview and genVrml.pl).
    The time for a given run varies with system load and usage, but layouts can be very quick on average (less than a minute) for graphs having tens of thousands of edges and vertices. The server is expected to gain more options over time; it is currently limited to the defaults set in the LGL programs. These should be sufficient for smaller graphs, but layouts of larger graphs can vary significantly with the optional arguments that the server does not yet expose. I apologize for this, and hope to address this issue in the near future.
    The submission server has internal settings that may reject a submission for several reasons. Before each submission, the load average of the server is checked to be under certain limits. Limits are also imposed on the edge file that one submits: it must be under the file size limit, the maximum vertex limit, and the maximum edge limit, all of which are revealed by the 'Show Limits' button. One can reduce the size of an edge file by substituting the integers 1 to n, where n is the number of vertices, for long vertex names.
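
The integer substitution can be sketched in Python as follows. This is only an illustration: the function name is hypothetical, and it assumes the input is a list of lines in the two-column .ncol layout described under File Formats, with an optional weight that is carried through unchanged.

```python
def rename_vertices(ncol_lines):
    """Replace vertex names with integers 1..n in order of first appearance.

    Returns the renamed lines and the name -> integer mapping, so the
    original names can be restored after the layout comes back.
    """
    ids = {}
    out = []
    for line in ncol_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or incomplete lines
        a, b = fields[0], fields[1]
        for name in (a, b):
            if name not in ids:
                ids[name] = len(ids) + 1
        # Keep any optional weight column as it was.
        out.append(" ".join([str(ids[a]), str(ids[b])] + fields[2:]))
    return out, ids
```

Keep the returned mapping around; without it the integers in the returned coordinates cannot be matched back to your original names.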

Viewing Results

    Two files are necessary for looking at the results of your layout. The first is the edge file and the second is the coordinates file. While these files are the minimum, other types of input are allowed for highlighting your 2D or 3D layout. Such additions include coloring the edges and vertices, labelling, and more.

2D 

Important Note: lglview will work on Windows ONLY UNDER JAVA VERSION 1.4.1_07. The same may go for Mac OS X. You can download version 1.4.1_07 from Sun in the Archive area. lglview ( lglview.jar for JAVA Version 1.4.1 and lglview.jar for JAVA Version 1.4.2 ) is a JAVA application written solely for viewing 2D graphs generated by LGL, although it can view graphs generated by other means if 2D coordinates are available, for example by parsing 'lineto' calls in existing .ps graph files. You must have at least Sun's JAVA version 1.4.1 to safely run the viewer. The absolute coordinates are less important than the relative positions of the vertices, since lglview rescales all the coordinates anyway. Please understand that lglview is literally a few months old, and should be considered a work in progress. Reading the README file is highly recommended. There is also an examples dir available that has some sample files.
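
To see why only relative positions matter once coordinates are rescaled, here is a small Python sketch of a uniform rescale into the unit box. This is not lglview's actual code; the function name and the choice of a uniform min-max rescale are assumptions made for illustration.

```python
def rescale_to_unit_box(points):
    """Shift and uniformly scale a non-empty list of (x, y) points into [0, 1].

    Assumption: one scale factor for both axes, so the shape of the
    layout is preserved. lglview's own rescaling may differ.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    min_x, min_y = min(xs), min(ys)
    # Guard against a zero span when all points coincide.
    span = max(max(xs) - min_x, max(ys) - min_y) or 1.0
    return [((x - min_x) / span, (y - min_y) / span) for x, y in points]
```

Two layouts that differ only by a shift or a uniform scale give the same result, which is why coordinates produced by other programs can be viewed without first converting them to any particular range.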

3D
    For viewing 3D graphs, a Perl script, genVrml.pl, is available to generate a VRML file, which is viewable with a VRML browser. The script has options for edge and vertex coloring, URL anchoring, text labels, and more. genVrml.pl uses the VRML module, which is freely available from CPAN. It also requires an internally developed (and not yet documented) module, LGLFormatHandler.pm. Both are necessary to compile and run the script, so they must be in your Perl @INC path. You don't have to use or call these modules directly, but the script will. This script does not generate optimal VRML code, but necessity or interest (or outside advice) could elicit a revision. For usage of genVrml.pl, just run the script without any arguments and read the output. The command genVrml.pl edges.lgl layout.coords (where edges.lgl is your edge file and layout.coords is a 3D layout) will generate some VRML code and get you started. The output VRML file is always named after the coords file with '.wrl' appended, so in the short example above the output file would be layout.coords.wrl.

File Formats

    There are 2 different file formats used for edge files, denoted by the file suffixes .lgl (LGL format) and .ncol. The .ncol edge files are simple whitespace-delimited files with one edge per line, given as two vertex names optionally followed by a weight:

vertex1name vertex2name [optionalWeight]
vertex1name vertex3name [optionalWeight]
...

The graphs here are undirected, and LGL is pretty particular about that. So if you have an edge A <-> B then you should not have an edge B <-> A. As far as the .ncol file is concerned, you should NOT have

vertex1name vertex2name
vertex2name vertex1name

in the same file nor should any vertex have an edge to itself.
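
A quick check for both problems can be sketched in Python. The function name is hypothetical and this is not an LGL tool; it simply applies the two rules above to a list of .ncol lines.

```python
def find_ncol_problems(ncol_lines):
    """Return (line_number, message) pairs for self loops and repeated edges."""
    seen = set()
    problems = []
    for number, line in enumerate(ncol_lines, start=1):
        fields = line.split()
        if not fields:
            continue  # blank line, nothing to check
        if len(fields) < 2:
            problems.append((number, "fewer than two vertices"))
            continue
        a, b = fields[0], fields[1]
        if a == b:
            problems.append((number, "self loop"))
            continue
        # An unordered pair, so "A B" and "B A" count as the same edge.
        key = frozenset((a, b))
        if key in seen:
            problems.append((number, "duplicate edge"))
        seen.add(key)
    return problems
```

An empty result means the file satisfies both rules; otherwise the line numbers point at the entries to remove.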
    The second format is the LGL file format (.lgl file suffix). This is yet another graph file format, one that tries to be as stingy as possible with space while keeping the edge file in a human-readable (not binary) format. The format looks like the following:

# vertex1name
vertex2name [optionalWeight]
vertex3name [optionalWeight]

Here, the first vertex of an edge is preceded with a pound sign '#'.  Then each vertex that shares an edge with that vertex is listed one per line on subsequent lines. Again, you can't have directed edges in the file so you should NOT have

# vertex1name
vertex2name
# vertex2name
vertex1name

in the same file.
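
The relation between the two formats can be sketched in Python by grouping .ncol edges under their first vertex. This is an illustration only, not the converter used by LGL or the submission server; the function name is hypothetical, and the input is assumed to already follow the undirected rules above (each edge listed once, no self loops).

```python
def ncol_to_lgl(ncol_lines):
    """Group .ncol edges by their first vertex and emit .lgl-style lines."""
    neighbors = {}  # first vertex -> list of "neighbor [weight]" strings
    for line in ncol_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or incomplete lines
        head, rest = fields[0], fields[1:]
        neighbors.setdefault(head, []).append(" ".join(rest))
    out = []
    # Dicts keep insertion order, so vertices appear in first-seen order.
    for head, entries in neighbors.items():
        out.append("# " + head)
        out.extend(entries)
    return out
```

Each long vertex name that starts several edges is written once after a '#' instead of once per edge, which is where the space saving of the .lgl format comes from.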

Current In Lab Applications

    Visualizing Protein Family Relationships

    Visualizing protein homology

    Visualizing Predicted Functional Links in Proteins

Applications outside the Lab

    Visualizing the internet - http://www.opte.org

Let us know how you are using LGL...

Gallery

    The gallery is a collection of different graphs and trees generated by LGL from different sources of biological data. All 2D images were made with lglview. All 3D images used genVrml.pl to make the VRML code.

SCOP sunid Hierarchy. The data is here.
A Protein Homology Graph (32,727 Proteins with 1,206,654 Edges). Color coded based on layout hierarchy. The data is here.
Zoomed Region of the Yeast Protein Interaction Map in 3D (VRML) . The data is here.

Zooming regions of the "Minimum Spanning Protein Homology Tree" - 302,832 Vertices (Proteins) ...

[ Full Size ] [Full Size 4000x4000 Pixels!] [ Zoom 1] [Zoom 2] [Zoom 3] [Zoom 4]
[ Largest Connected Set 4000x4000 Pixels! ] - The data is here.

An edge is colored blue if it connects 2 proteins from the same species, and red if it connects 2 proteins from 2 different species. If that information is not available the edges are colored based on layout hierarchy. In the "Largest Connected Set" the edges are colored as above, but they remain white if there is no species information available. Also, the proteins are colored based on their COG (Clusters of Orthologous Groups) membership.
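
The edge coloring rule used for the "Largest Connected Set" can be written as a small Python function. This is a sketch of the rule as described, not the code that produced the images; the function name and the use of None for a missing species are assumptions, and it returns a color name rather than anything in the color file format that lglview or genVrml.pl expect.

```python
def edge_color(species_a, species_b):
    """Pick an edge color from the species of its two proteins.

    Assumption: None stands for "no species information available".
    """
    if species_a is None or species_b is None:
        return "white"  # no species information for at least one end
    return "blue" if species_a == species_b else "red"
```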

The Protein Homology Network [ LargeImage | SmallImage ]

Download

    The source code is available as a zipped tar file. The programs will only compile on Linux systems with GNU compilers. While you are free to port and use LGL on other operating systems and compilers, I will probably not be of much help with support. If you aren't sure about compiling the programs yourself, try the online submission page first.

    All files provided with LGL fall under the terms of the GNU General Public License. By downloading LGL you agree to the terms of that license.

Version 1.1
Source Code: LGL.tar.gz
Update - 9/2005 - LGL and Boost code was slightly updated to compile under GNU 4.X compilers, but there will be several warnings from the Boost library if you do use newer compilers. The Boost library is now included in the archive, since it is patched.


Version 1.0
You will need the Boost library 1.30.2 if you want this older version of LGL.
Source Code: LGL.tar.gz
Update - 12/2003 - All JAVA source code is now included in LGL.
Update - 2/2004 - Boost Version 1.31 has changes in the random number generator library that are probably responsible for the break in LGL compilation. Version 1.30.2 works fine, and it is recommended until a patch is made for the fix. Thanks to Paul Brunk for pointing this out.

Usage

A README is available in the archive after you unpack it. To get you going try...

prompt$ tar -zxvf LGL.tar.gz # This unpacks it into a dir called 'LGL-1.1'
prompt$ cd LGL-1.1  # Now take a look at the README!
prompt$ ./setup -i # To install the programs into the ./bin dir
prompt$ ./setup -c conf_file # To get an example config file to modify and run LGL
prompt$ ./bin/lgl.pl -c conf_file # After modifying the config for your needs, run LGL

After running the above you now have coordinates of your vertices in space. Now you have to run the right program to look at your results (Your edge file and final.coords file will probably be elsewhere and not in the current working directory).

prompt$ ./perls/genVrml.pl edgeFile.lgl final.coords # If your layout was 3D, then you can make some VRML coords
prompt$ java -jar lglview.jar edgeFile.lgl final.coords # If your layout was 2D, use the JAVA browser to see your results

Tips

    Drawing Trees (and Graphs) Nicely

The software is designed to draw arbitrarily large trees/graphs, so the underlying algorithm has no functions for minimizing edge overlaps or other features specific to trees. Although algorithms exist for such things, LGL does not implement them because the layouts would then lose their scalability. However, there are some tricks for doing layouts with trees. First you need to do the following:

1) Make sure your tree is in a singly connected set. That is, every node is reachable from every other node in the graph by traversing edges. If it isn't, your layouts will be awkward or undefined.
2) Make sure your tree is in .lgl format. You have to use the base programs so only .lgl format will do, and not .ncol or any other format.
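
Both requirements can be checked together with a short Python sketch that reads .lgl lines and tests whether the graph is a single connected set. The function names are hypothetical and not part of LGL, and the input is assumed to be well formed (every '#' is followed by a vertex name).

```python
from collections import deque


def read_lgl_adjacency(lgl_lines):
    """Build an undirected adjacency map from .lgl-style lines."""
    adjacency = {}
    current = None
    for line in lgl_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # A '#' line names the first vertex of the edges that follow.
            current = line[1:].split()[0]
            adjacency.setdefault(current, set())
        elif current is not None:
            neighbor = line.split()[0]  # ignore the optional weight
            adjacency[current].add(neighbor)
            adjacency.setdefault(neighbor, set()).add(current)
    return adjacency


def is_connected(adjacency):
    """Breadth-first search from one vertex; True if it reaches all of them."""
    if not adjacency:
        return True
    start = next(iter(adjacency))
    seen = {start}
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        for neighbor in adjacency[vertex]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return len(seen) == len(adjacency)
```

If the check fails, one option is to lay out each connected piece separately, or to use the lgl.pl driver shown under Usage instead of calling the base programs directly.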

Now you can use lglayout2D directly (or lglayout3D) in the "bin" directory of the archive you downloaded and compiled. lglayout2D (or lglayout3D) is designed specifically for layouts of singly connected sets. Running it without arguments gives the argument list. A good option to toy with is -q. This gives the suggested edge distances. For trees you might want to run it as:

prompt$ lglayout2D -q.05 sample.lgl  # Check your path to lglayout2D

What the -q option does is set the equilibrium distance of the edges. The smaller the q value, the closer together it draws the nodes, so the edge lengths aren't as long. You can experiment with that side of it. It also helps to color the edges based on hierarchy, giving light edge colors to the higher order edges and darker colors to the lower order edges. That is what was done in the gallery file of SCOP. If you run a layout without adjusting the -q option you may see what I call the hairdryer effect. That happens when all of the edges, in particular those to leaf nodes, are piled on top of each other in layouts.

Another option is to dive into the code, specifically the function placementFormula in calcFuncs.C. That function determines the placement distances for successive layers in the tree. If you feel your tree is getting too compressed, then changing the return value of that function to a much higher number will give the layout "more room". This is probably the least desirable method, not to mention a total hack job, but I've had to do it for certain layouts.

However, for trees with perfectly non-overlapping edges, such as those drawn with phylogenetic programs, you may have to use other programs that are made to view such trees. Those specialized programs will provide clearer layouts; LGL is meant to be generic and can't match software specialized for such layouts. Another example is visualizing metabolic pathways: there you also have to minimize the edge overlaps and present the layout in a more symmetric manner with specific labels. The obvious drawback of such specialized programs is usually scalability.

Contact

Alex Adai - alex dot adai at ucsf dot edu
Edward Marcotte - marcotte at icmb dot utexas dot edu

Cite

Adai AT, Date SV, Wieland S, Marcotte EM. LGL: creating a map of protein function with an algorithm for visualizing very large biological networks. J Mol Biol. 2004 Jun 25;340(1):179-90.


All programs and content are Copyright (C) 2002 Alex Adai
Send any questions, comments, bugs, etc. to [ alex dot adai at ucsf dot edu ]