Calculating Graph Layout
odgi Layout
PangyPlot relies on odgi to calculate the 2D layout of nodes.
The GFA file therefore needs to be converted into odgi format (*.og).
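The conversion is typically done with odgi build. The following is a minimal sketch rather than part of PangyPlot: it assumes odgi is installed and on the PATH, the file names graph.gfa and graph.og are hypothetical, and the helper name odgi_build_command is illustrative only.

```python
import os
import shutil
import subprocess


def odgi_build_command(gfa_path, og_path):
    """Return the argument list for converting a GFA file to odgi format."""
    # -g names the input GFA file, -o names the output *.og file.
    return ["odgi", "build", "-g", gfa_path, "-o", og_path]


# Hypothetical file names, used only for illustration.
command = odgi_build_command("graph.gfa", "graph.og")
print(" ".join(command))

# Run the conversion only when odgi is installed and the input file exists.
if shutil.which("odgi") and os.path.exists("graph.gfa"):
    subprocess.run(command, check=True)
```

The command is built as an argument list so that file names containing spaces are passed to odgi unchanged.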
From the odgi documentation:
The odgi layout command computes 2D layouts of the graph using stochastic gradient descent (SGD). The input graph must be sorted and id-compacted. The algorithm itself is described in Graph Drawing by Stochastic Gradient Descent. The force-directed graph drawing algorithm minimizes the graph’s energy function or stress level.
Script Generator
PangyPlot provides an interactive script generator that walks you through the preprocessing steps:
python pangyplot.py preprocess
This will prompt you for the ODGI file path, number of threads, whether to sort the graph, path priorities, GPU acceleration, and output directory. It produces a ready-to-run bash script with the appropriate odgi sort, odgi layout, and odgi view commands.
GPU acceleration is auto-detected and enabled by default if available.
Manual Steps
To do the one-dimensional sort of the graph:
odgi sort -i ${INPUT}.og -o ${OUTPUT}.og --optimize -Y -H paths.txt
We highly recommend the -H flag to specify which paths to prioritize. The paths.txt file lists one path name per line, in order of priority.
The primary reference path should be set as the first path in this file.
The --optimize flag is also needed: it compacts the node IDs, which odgi layout requires (the input graph must be sorted and id-compacted).
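A paths.txt file in this format can be written by hand or generated. The sketch below is one possible way to do it, not a PangyPlot utility: the function name write_path_priorities and the path names are hypothetical and should be replaced with the path names in your own graph.

```python
from pathlib import Path


def write_path_priorities(reference_path, other_paths, out_file="paths.txt"):
    """Write one path name per line, with the reference path first."""
    ordered = [reference_path]
    for name in other_paths:
        # Skip duplicates so that each path appears exactly once.
        if name not in ordered:
            ordered.append(name)
    Path(out_file).write_text("\n".join(ordered) + "\n")
    return ordered


# Hypothetical path names, used only for illustration.
ordered = write_path_priorities(
    "ref#chr1",
    ["sampleA#chr1", "ref#chr1", "sampleB#chr1"],
)
print(ordered)
```

Passing the reference path as a separate argument keeps it first in the file regardless of how the remaining paths are ordered.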
The command used to calculate the layout:
odgi layout -i ${INPUT}.og -o ${OUTPUT}.lay -T ${OUTPUT}.lay.tsv
Note
The -t flag can be used to specify the number of threads to use.
The -P flag can be used to enable progress output.
The --gpu flag can be added for odgi layout if odgi was built with CUDA support. This speeds up the layout calculation significantly (https://arxiv.org/abs/2409.00876).
To convert back to GFA for ingestion into PangyPlot:
odgi view -i ${INPUT}.og -g > ${OUTPUT}.gfa
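The three manual steps can be chained so that the sorted graph feeds both the layout and the GFA export. The sketch below only assembles the command strings and does not run them. It is not PangyPlot's script generator: the function name, the prefix argument, and the .sorted.og naming are assumptions made for illustration, and only the flags shown above are used.

```python
import shlex


def preprocessing_commands(input_og, prefix, paths_file="paths.txt",
                           threads=None, gpu=False):
    """Return the sort, layout, and view commands as shell strings."""
    # Assumed naming scheme: the sorted graph is written next to the outputs.
    sorted_og = f"{prefix}.sorted.og"

    sort_cmd = ["odgi", "sort", "-i", input_og, "-o", sorted_og,
                "--optimize", "-Y", "-H", paths_file]

    layout_cmd = ["odgi", "layout", "-i", sorted_og,
                  "-o", f"{prefix}.lay", "-T", f"{prefix}.lay.tsv"]
    if threads is not None:
        layout_cmd += ["-t", str(threads)]
    if gpu:
        # Only valid when odgi was built with CUDA support.
        layout_cmd.append("--gpu")

    view_cmd = ["odgi", "view", "-i", sorted_og, "-g"]

    return [
        shlex.join(sort_cmd),
        shlex.join(layout_cmd),
        # odgi view writes GFA to standard output, so redirect it to a file.
        shlex.join(view_cmd) + " > " + shlex.quote(f"{prefix}.gfa"),
    ]


# Hypothetical input file and output prefix.
commands = preprocessing_commands("graph.og", "out", threads=8, gpu=True)
for line in commands:
    print(line)
```

Exporting the GFA from the sorted graph keeps the order of its S lines consistent with the rows of the layout TSV.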
The *.lay.tsv output is structured as follows:
| idx | X | Y | component |
|---|---|---|---|
| 0 | 1000 | 12547.3115187589 | 0 |
| 1 | 165426 | 10586.0915549587 | 0 |
| 2 | 165426 | 7320.81894996611 | 0 |
| 3 | 165427 | 14814.159085348 | 0 |
| 4 | 165427 | 14425.5419673736 | 0 |
| 5 | 165445 | 15525.0135879779 | 0 |
| 6 | 165445 | 12244.877453525 | 0 |
| 7 | 165446 | 12979.6128977908 | 0 |
| … | … | … | … |
For each S line in the GFA file, two coordinate pairs are calculated: one for the start position of the segment and one for its end position.
For example, for the first S line, the start position is given by the row with idx = 0 and the end position by the row with idx = 1.
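The pairing of rows to segments can be sketched as follows. This is an illustrative reader, not PangyPlot's own parser: it assumes the TSV is tab-separated with a header row naming the columns idx, X, Y, and component, and the function name read_segment_coordinates is hypothetical. The sample data reuses the first four rows shown above.

```python
import csv
import io


def read_segment_coordinates(handle):
    """Return [(start_xy, end_xy), ...], one entry per segment in S-line order."""
    reader = csv.DictReader(handle, delimiter="\t")
    # Index each point by its idx so the pairing does not depend on row order.
    points = {
        int(row["idx"]): (float(row["X"]), float(row["Y"]))
        for row in reader
    }
    if len(points) % 2 != 0:
        raise ValueError("expected an even number of layout rows")
    # Segment i starts at idx 2*i and ends at idx 2*i + 1.
    return [(points[2 * i], points[2 * i + 1]) for i in range(len(points) // 2)]


sample = (
    "idx\tX\tY\tcomponent\n"
    "0\t1000\t12547.3115187589\t0\n"
    "1\t165426\t10586.0915549587\t0\n"
    "2\t165426\t7320.81894996611\t0\n"
    "3\t165427\t14814.159085348\t0\n"
)

segments = read_segment_coordinates(io.StringIO(sample))
for number, (start, end) in enumerate(segments):
    print(number, start, end)
```

For a real file, pass an open file handle instead of the in-memory sample, for example `open("out.lay.tsv", newline="")`, where out.lay.tsv is a placeholder name.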
Bandage Layout
Bandage can also be used to calculate the 2D layout of nodes.
After opening the GFA file in Bandage, the layout can be exported via File -> Export Layout in the Bandage layout format (*.layout). The graph has to be small enough to be opened in Bandage, which is not always possible since Bandage loads the entire graph into memory.