Quick Start

Prerequisites

  • Python 3.11 or higher recommended.

  • odgi required to prepare custom data.

git clone https://github.com/ScottMastro/pangyplot.git
cd pangyplot
pip install -r requirements.txt

gunicorn is additionally recommended for production deployment but is not required for local development (Flask’s built-in server is used in that case). See the commented line in requirements.txt.

Quick Start - Running PangyPlot

python pangyplot.py run --db hprc.clip --ref GRCh38 --annotations gencode48.chrY

This should launch a local web server at http://127.0.0.1:5700 with chrY data that is included with the codebase.
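To confirm from a script that the server actually came up, a small probe such as the one below may help. It is a minimal sketch, not part of PangyPlot: the helper name server_is_up is hypothetical, and the only detail taken from this guide is the default address http://127.0.0.1:5700.

```python
import urllib.error
import urllib.request


def server_is_up(url="http://127.0.0.1:5700", timeout=2.0):
    """Return True if anything answers HTTP at `url` (hypothetical helper)."""
    # Ignore any proxy environment variables: this is a localhost check.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded, just with an error status code.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or similar: nothing is listening.
        return False


if __name__ == "__main__":
    print("PangyPlot reachable:", server_is_up())
```

Run it in a second terminal after starting pangyplot.py run; a False result usually means the server is still loading or is listening on a different port.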

What is it doing?

pangyplot run loads the specified database (--db) and launches the Flask web server.

The database is loaded from datastore/graphs/{db}. This directory is expected to contain one subdirectory per chromosome (e.g. datastore/graphs/hprc.clip/chrY). Each chromosome directory holds the database files created from a GFA file.

The reference path (--ref) is used to specify the primary reference path.

The optional gene annotation file (--annotations) is similarly loaded from datastore/annotations/{ref}/{annotations} (e.g. datastore/annotations/GRCh38/gencode48.chrY).
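The path conventions above can be sketched in a few lines of Python, which is handy for checking a datastore before launching the server. This is an illustration only: expected_paths and list_chromosomes are hypothetical helpers, not PangyPlot functions, and the sketch assumes the datastore directory sits in the current working directory.

```python
from pathlib import Path


def expected_paths(db, ref, annotations=None, root="datastore"):
    """Build the locations described above (hypothetical helper)."""
    root = Path(root)
    paths = {"graph_dir": root / "graphs" / db}
    if annotations is not None:
        paths["annotations"] = root / "annotations" / ref / annotations
    return paths


def list_chromosomes(graph_dir):
    """Return sorted chromosome subdirectory names; empty if the database is missing."""
    graph_dir = Path(graph_dir)
    if not graph_dir.is_dir():
        return []
    return sorted(p.name for p in graph_dir.iterdir() if p.is_dir())


if __name__ == "__main__":
    paths = expected_paths("hprc.clip", "GRCh38", "gencode48.chrY")
    print("graph directory:", paths["graph_dir"])
    print("chromosomes found:", list_chromosomes(paths["graph_dir"]))
    print("annotations:", paths["annotations"])
```

An empty chromosome list suggests that the --db name does not match a directory under datastore/graphs.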

Quick Start - Loading Prepared Data

wget https://zenodo.org/records/17173731/files/chrY.zip
unzip chrY.zip

mkdir -p datastore/graphs/hprc.prepared
mv chrY datastore/graphs/hprc.prepared/chrY

python pangyplot.py run --db hprc.prepared --ref GRCh38

What is it doing?

HPRC chromosome data has been preprocessed and is available at https://doi.org/10.5281/zenodo.17173731. Here we manually set up the directory structure that PangyPlot expects and move the prepared chrY data into it.

Zipping up the directory structure is a convenient way to share prepared PangyPlot data.
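As a rough sketch of the reverse direction, the snippet below packages one chromosome directory so that the archive unpacks to a top-level chrY/ folder, matching the unzip and mv steps above. The helper zip_chromosome is hypothetical and not part of PangyPlot; it relies only on the standard library.

```python
import shutil
from pathlib import Path


def zip_chromosome(db_dir, chrom, out_dir="."):
    """Zip db_dir/chrom into out_dir/chrom.zip (hypothetical helper)."""
    db_dir = Path(db_dir)
    if not (db_dir / chrom).is_dir():
        raise FileNotFoundError(f"no chromosome directory: {db_dir / chrom}")
    # Resolve first so the output location does not depend on the working directory.
    base = (Path(out_dir) / chrom).resolve()
    # root_dir/base_dir ensure archive entries start with `chrom/`,
    # not with the full datastore path.
    archive = shutil.make_archive(
        str(base), "zip", root_dir=str(db_dir), base_dir=chrom
    )
    return Path(archive)


if __name__ == "__main__":
    print(zip_chromosome("datastore/graphs/hprc.prepared", "chrY"))
```

Keeping the chromosome name as the only top-level entry means a recipient can drop the unpacked folder into any datastore/graphs/{db} directory.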

Quick Start - Preparing Data

Tip

The steps below can be generated for you interactively with pangyplot preprocess, which writes a tailored shell (or SLURM) script from a few prompts. The manual walkthrough below is kept for reference and for cases where you want finer control over the individual odgi invocations.

cd pangyplot
wget https://s3-us-west-2.amazonaws.com/human-pangenomics/pangenomes/freeze/freeze1/minigraph-cactus/hprc-v1.1-mc-grch38/hprc-v1.1-mc-grch38.chroms/chrY.vg

# convert to odgi format - odgi cannot read GFA files with W-lines
vg convert --no-wline chrY.vg -f > chrY_unsorted.gfa
odgi build -O -g chrY_unsorted.gfa -o chrY_unsorted.og

# one-dimensional sort
odgi paths -L -i chrY_unsorted.og | grep GRCh38 > path_sort_order.txt
odgi paths -L -i chrY_unsorted.og | grep CHM13 >> path_sort_order.txt
odgi sort -t 4 --optimize -Y -H path_sort_order.txt -i chrY_unsorted.og -o chrY.og -P

# create layout file
odgi layout -t 4 -i chrY.og --tsv chrY.lay.tsv -P

# create GFA file
odgi view -i chrY.og -g > chrY.gfa

python pangyplot.py add --ref GRCh38 --chr chrY --db hprc.test --gfa chrY.gfa --layout chrY.lay.tsv
python pangyplot.py status --db hprc.test
python pangyplot.py run --db hprc.test --ref GRCh38

What is it doing?

This is how the data was prepared for the previous example. PangyPlot requires a GFA file and a layout file to create the database. Here we optimize the graph for the primary reference path GRCh38 during the 1D sort by listing the GRCh38 paths first, followed by the CHM13 paths, in path_sort_order.txt.
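The two grep commands that build path_sort_order.txt amount to a prioritized substring filter over the path names printed by odgi paths -L. The sketch below reproduces that logic in Python for illustration; build_sort_order is a hypothetical helper, and the path names in the example are made up rather than taken from the HPRC graph.

```python
def build_sort_order(path_names, priorities=("GRCh38", "CHM13")):
    """Mimic `grep GRCh38 > f; grep CHM13 >> f` (hypothetical helper).

    One pass per pattern, substring match, original order kept within
    each pass. As with the shell version, a name that matches more than
    one pattern is listed once per match.
    """
    ordered = []
    for pattern in priorities:
        ordered.extend(name for name in path_names if pattern in name)
    return ordered


if __name__ == "__main__":
    # Illustrative names only; real names come from `odgi paths -L`.
    names = ["HG002#1#ctg1", "CHM13#0#chrY", "GRCh38#0#chrY"]
    for name in build_sort_order(names):
        print(name)
```

Paths that match none of the patterns are left out of the file, exactly as with grep.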