Calculating Bubbles

BubbleGun is a tool that identifies topological structures in a genome graph stored as a GFA file. These structures can be nested within each other, forming a hierarchical chain of superstructures.
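To make the input concrete, here is a minimal sketch of reading a GFA 1 file into a plain adjacency structure. It is not how BubbleGun or PangyPlot parses GFA. The toy graph, the function name parse_gfa, and the decision to ignore link orientation are all assumptions made for illustration.

```python
from collections import defaultdict

# A tiny, made-up GFA 1 graph: S lines are segments, L lines are links.
TOY_GFA = "\n".join([
    "H\tVN:Z:1.0",
    "S\ts1\tACGT",
    "S\ts2\tG",
    "S\ts3\tT",
    "S\ts4\tCCA",
    "L\ts1\t+\ts2\t+\t0M",
    "L\ts1\t+\ts3\t+\t0M",
    "L\ts2\t+\ts4\t+\t0M",
    "L\ts3\t+\ts4\t+\t0M",
])


def parse_gfa(text):
    """Return (segments, edges) from GFA 1 text (hypothetical helper)."""
    segments = {}             # segment name -> sequence
    edges = defaultdict(set)  # segment name -> names of linked segments
    for line in text.splitlines():
        fields = line.split("\t")
        if fields[0] == "S":
            segments[fields[1]] = fields[2]
        elif fields[0] == "L":
            # Simplification: the orientation columns (+/-) are ignored,
            # so every link is treated as a forward edge.
            edges[fields[1]].add(fields[3])
    return segments, dict(edges)


segments, edges = parse_gfa(TOY_GFA)
print(sorted(segments))     # ['s1', 's2', 's3', 's4']
print(sorted(edges["s1"]))  # ['s2', 's3']
```

In this toy graph, s1 branches to s2 and s3, which both rejoin at s4. Real graphs use both orientations, so a faithful parser would have to track the strand on each side of every link.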

Note

BubbleGun is bundled with PangyPlot (vendored under BubbleGun/ at the repo root) and is invoked automatically during pangyplot add. You do not need to install it separately. This section is for informational purposes.

Structure Definitions

Segment
A contiguous chunk of sequence with no variation. Segments are the basic nodes that make up a graph genome.
Bubble
An acyclic, directed subgraph with a source node and a sink node. Every path through the bubble must pass through both the source and the sink.
Bubble Source/Sink
The entry and exit points of a bubble.
Bubble Chain
A sequence of bubbles where the sink of one directly connects to the source of the next, forming a larger structure.
Compacted Graph
A genome graph simplified by merging consecutive, non-branching segments into single nodes while preserving all variation points.

[Figure from the BubbleGun publication.]
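The definitions above can be illustrated with a small sketch. It finds only simple bubbles (a source that branches into nodes that all rejoin at one sink) and then links them into chains. This is not BubbleGun's algorithm, which also handles nested structures. The toy graph and the function names are hypothetical, and the sketch assumes that two bubbles are chained when the sink of one is the same node as the source of the next.

```python
from collections import defaultdict

# Hypothetical toy graph: two bubbles in a row,
# a -> {b, c} -> d -> {e, f} -> g.
EDGES = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"),
         ("d", "e"), ("d", "f"), ("e", "g"), ("f", "g")]


def simple_bubbles(edges):
    """Return (source, sink) pairs for simple bubbles only."""
    succ, pred = defaultdict(set), defaultdict(set)
    for u, v in edges:
        succ[u].add(v)
        pred[v].add(u)

    found = []
    for source in sorted(succ):
        branches = set(succ[source])
        if len(branches) < 2:
            continue
        # Each branch is entered only from the source and has one exit.
        if any(pred[b] != {source} or len(succ[b]) != 1 for b in branches):
            continue
        sinks = {next(iter(succ[b])) for b in branches}
        if len(sinks) != 1:
            continue
        sink = sinks.pop()
        # The sink is entered only through the bubble's branches.
        if sink != source and pred[sink] == branches:
            found.append((source, sink))
    return found


def bubble_chains(bubbles):
    """Group bubbles whose sink is the source of the next bubble."""
    by_source = {source: (source, sink) for source, sink in bubbles}
    all_sinks = {sink for _, sink in bubbles}
    chains = []
    for source, sink in bubbles:
        if source in all_sinks:
            continue  # this bubble continues a chain started elsewhere
        chain = [(source, sink)]
        # Follow sink -> source links, stopping if a bubble repeats.
        while chain[-1][1] in by_source and by_source[chain[-1][1]] not in chain:
            chain.append(by_source[chain[-1][1]])
        chains.append(chain)
    return chains


bubbles = simple_bubbles(EDGES)
chains = bubble_chains(bubbles)
print(bubbles)  # [('a', 'd'), ('d', 'g')]
print(chains)   # [[('a', 'd'), ('d', 'g')]]
```

Node d acts as the sink of the first bubble and the source of the second, so the two bubbles form a single chain.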

Compacting the Graph

Compacting the graph before bubble detection removes long stretches of trivial, non-branching nodes. Without compaction, these nodes artificially break up bubble chains, making bubbles look smaller or fragmented. Once they are merged, bubble sources and sinks become clear, bubble boundaries are preserved, and bubble chains reflect the true size of the underlying variation. Compaction also reduces graph noise and improves performance.

PangyPlot attempts to compact the graph during bubble detection so that bubble chains aren’t disrupted. The data is still stored uncompacted, so a single bubble source or sink may correspond to multiple uncompacted segments.
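As a rough illustration of compaction, the sketch below merges linear runs of nodes and records which original segments each compacted node contains. It is not PangyPlot's or BubbleGun's implementation. The function name compact, the toy graph, and the choice to name each compacted node after its first segment are assumptions, and the input is assumed to be acyclic with orientation ignored.

```python
from collections import defaultdict


def compact(nodes, edges):
    """Merge linear runs of nodes; return (members, compacted_edges)."""
    succ, pred = defaultdict(set), defaultdict(set)
    for u, v in edges:
        succ[u].add(v)
        pred[v].add(u)

    def extends(u, v):
        # u -> v can be merged when it is u's only exit and v's only entry.
        return u != v and succ[u] == {v} and pred[v] == {u}

    members = {}  # compacted node id -> ordered original segments
    owner = {}    # original segment -> compacted node id
    for start in nodes:
        # Skip nodes that sit in the middle or at the end of a linear run.
        if len(pred[start]) == 1 and extends(next(iter(pred[start])), start):
            continue
        run = [start]
        while len(succ[run[-1]]) == 1:
            nxt = next(iter(succ[run[-1]]))
            if not extends(run[-1], nxt):
                break
            run.append(nxt)
        members[start] = run
        for seg in run:
            owner[seg] = start

    # Keep only the edges that were not absorbed into a run.
    compacted_edges = sorted({(owner[u], owner[v])
                              for u, v in edges if not extends(u, v)})
    return members, compacted_edges


# Hypothetical toy graph: a -> b -> c, a bubble c -> {d, e} -> f,
# then f -> g -> h.
NODES = ["a", "b", "c", "d", "e", "f", "g", "h"]
EDGES = [("a", "b"), ("b", "c"), ("c", "d"), ("c", "e"),
         ("d", "f"), ("e", "f"), ("f", "g"), ("g", "h")]

members, compacted_edges = compact(NODES, EDGES)
print(members)
# {'a': ['a', 'b', 'c'], 'd': ['d'], 'e': ['e'], 'f': ['f', 'g', 'h']}
print(compacted_edges)
# [('a', 'd'), ('a', 'e'), ('d', 'f'), ('e', 'f')]
```

After compaction, the bubble's source is the merged node a and its sink is the merged node f. Each maps back to three uncompacted segments, which is the situation described above for stored data.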