User Guide#
This guide covers how to use g4hunterpy3 in detail, including the Python API and the command-line interface.
Background: How G4Hunter Scoring Works#
G4Hunter assigns a per-base score to every nucleotide in a sequence:
Guanine (G): positive score equal to the length of the G-run (capped at 4). For example, each G in a GGG run scores +3.
Cytosine (C): negative score whose magnitude equals the length of the C-run (capped at -4). For example, each C in a CCCC run scores -4.
A, T, and other bases: score 0.
These per-base scores are then averaged over a sliding window (default 25 nt) to produce a G4Hunter score for each window position. Windows whose absolute score meets or exceeds a threshold are reported as candidate G4-forming regions. Overlapping windows are merged into contiguous regions.
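The scoring rule and windowing described above can be sketched in a few lines of NumPy. This is an illustrative re-implementation, not the g4hunterpy3 source: the function names `per_base_scores` and `sliding_means` are hypothetical, and details such as case handling are assumptions.

```python
import numpy as np

def per_base_scores(seq: str, cap: int = 4) -> np.ndarray:
    """Score each base by the length of its G-run (+) or C-run (-), capped."""
    seq = seq.upper()  # assumption: lower-case bases are treated like upper-case
    scores = np.zeros(len(seq), dtype=int)
    i = 0
    while i < len(seq):
        base = seq[i]
        j = i
        # Advance j to the end of the run of identical bases starting at i.
        while j < len(seq) and seq[j] == base:
            j += 1
        run = min(j - i, cap)
        if base == "G":
            scores[i:j] = run
        elif base == "C":
            scores[i:j] = -run
        i = j
    return scores

def sliding_means(scores: np.ndarray, window: int = 25) -> np.ndarray:
    """Mean score of every full window; empty if the sequence is too short."""
    if len(scores) < window:
        return np.array([], dtype=float)
    return np.convolve(scores, np.ones(window) / window, mode="valid")

demo_scores = per_base_scores("GGGACCCCT")
demo_means = sliding_means(per_base_scores("GGGGTTTTGGGG"), window=4)
# Indices of windows whose absolute mean meets the 1.2 threshold.
demo_hits = [i for i, m in enumerate(demo_means) if abs(m) >= 1.2]
```

Here `demo_scores` mixes positive values for the GGG run with negative values for the CCCC run, and `demo_hits` keeps only the windows at either end of the toy sequence, where the G-runs dominate.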
Recommended thresholds:
1.2 — good compromise for identifying many true G4 motifs
1.5 — high-confidence predictions (precision >90%)
2.0 — very high propensity
For more information see Bedrat et al. 2016.
Python API#
The core functionality lives in g4hunterpy3.core.
Scanning a single sequence#
Use scan_sequence() for the simplest workflow:
from g4hunterpy3.core import scan_sequence
seq = "ATGGGGATTTTGGGGCCCGGGGATTTGGGG"
window_scores, hits, regions = scan_sequence(
    seq, window_size=25, threshold=1.2
)
This returns three objects:
window_scores: a NumPy array of per-window mean scores.
hits: a list of WindowHit objects, one per window that passes the threshold.
regions: a list of Region objects formed by merging overlapping hits.
Working with WindowHits and Regions#
Each WindowHit has start, end (0-based, end-exclusive), and score attributes:
for h in hits:
    print(f"Window [{h.start}:{h.end}] score={h.score:.2f}")
Each Region adds sequence, length, and n_windows:
for r in regions:
    print(f"Region [{r.start}:{r.end}] len={r.length} "
          f"score={r.score:.2f} ({r.n_windows} windows merged)")
    print(f" Sequence: {r.sequence}")
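Because start and end are 0-based and end-exclusive, they can be used directly as Python slice bounds. The sketch below uses a hypothetical stand-in class, `Interval`, rather than the library's own WindowHit or Region, and shows a conversion to a 1-based, end-inclusive convention. Whether the CLI output files use exactly that convention is an assumption here.

```python
from dataclasses import dataclass

@dataclass
class Interval:
    """Hypothetical stand-in exposing the start/end/score attributes described above."""
    start: int    # 0-based, inclusive
    end: int      # 0-based, exclusive
    score: float

def slice_sequence(seq: str, iv: Interval) -> str:
    """Half-open coordinates map directly onto a Python slice."""
    return seq[iv.start:iv.end]

def to_one_based(iv: Interval) -> tuple:
    """Convert to 1-based, end-inclusive coordinates: only the start shifts."""
    return iv.start + 1, iv.end

demo_seq = "ATGGGGATTT"
demo_iv = Interval(start=2, end=6, score=4.0)
demo_sub = slice_sequence(demo_seq, demo_iv)   # the four Gs
demo_1based = to_one_based(demo_iv)
demo_len = demo_iv.end - demo_iv.start         # length needs no +1 in half-open form
```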
Scanning a FASTA file#
Use scan_fasta() to iterate over all records in a FASTA file:
from g4hunterpy3.core import scan_fasta
results = scan_fasta("sequences.fasta", window_size=25, threshold=1.2)
for record_id, (window_scores, hits, regions) in results.items():
    print(f">{record_id}: {len(hits)} hits, {len(regions)} regions")
Step-by-step API#
For more control, you can call the individual functions:
from g4hunterpy3.core import (
    base_scores,
    window_mean_scores,
    find_window_hits,
    merge_overlapping_windows,
)
seq = "GGGGTTTTGGGG"
# Step 1: per-base scores
bs = base_scores(seq)
# array([ 4, 4, 4, 4, 0, 0, 0, 0, 4, 4, 4, 4])
# Step 2: sliding-window means
ws = window_mean_scores(bs, window_size=4)
# Step 3: find windows above threshold
hits = find_window_hits(ws, window_size=4, threshold=1.0)
# Step 4: merge overlapping hits into regions
regions = merge_overlapping_windows(hits, seq, base_score_array=bs)
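To make the merging step concrete, here is a minimal standalone sketch of interval merging over half-open (start, end) windows. It is not the library's merge_overlapping_windows(): the name `merge_intervals` is hypothetical, region scoring is omitted, and treating windows that merely touch as separate regions is an assumption.

```python
def merge_intervals(windows):
    """Merge overlapping half-open (start, end) windows.

    Returns a list of (start, end, n_windows) tuples sorted by start.
    """
    merged = []
    for start, end in sorted(windows):
        # Strict "<" means windows that only touch (start == previous end)
        # are kept apart; this is an assumption, not confirmed library behavior.
        if merged and start < merged[-1][1]:
            prev_start, prev_end, n = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), n + 1)
        else:
            merged.append((start, end, 1))
    return merged

# Window hits of size 4 at start positions 0, 1, 2 and 6, 7, 8.
demo_regions = merge_intervals(
    [(0, 4), (1, 5), (2, 6), (6, 10), (7, 11), (8, 12)]
)
```

With these inputs the six windows collapse into two regions of three windows each.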
Plotting#
The g4hunterpy3.plotting module provides two visualization functions.
Simple plot — a line plot of sliding-window scores:
from g4hunterpy3.core import scan_sequence
from g4hunterpy3.plotting import simple_plot
seq = "ATGGGGATTTTGGGGCCCGGGGATTTGGGG"  # same example sequence as in "Scanning a single sequence"
ws, hits, regions = scan_sequence(seq, window_size=25, threshold=1.2)
simple_plot(ws, "output_scores.pdf")
Complex plot — a binned heatmap suitable for large genomes:
from g4hunterpy3.plotting import complex_plot
complex_plot(
    hits,
    genome_length=len(seq),
    out_pdf="output_complex.pdf",
    nbins=500,
    score=1.2,
    strand_agnostic=True,
    highlight_regions=[[1000, 2000], [5000, 6000]],
)
Command-Line Interface#
After installation, the g4hunterpy3 command is available in your terminal.
Basic usage#
g4hunterpy3 -i <input.fasta> -o <output_directory> [options]
CLI options#
| Option | Short | Default | Description |
|---|---|---|---|
| | -i | (required) | Path to the input FASTA file. |
| | -o | (required) | Output directory (created if it doesn’t exist). |
| | -w | 25 | Sliding window size in bases. |
| | -s | 1.2 | Absolute score threshold for calling hits. |
| --info | | off | Print sequence info (length, hit/region counts). |
| --simple-plot | | off | Write a PDF line plot of sliding-window scores. |
| --complex-plot | | off | Write a PDF binned heatmap (for large sequences). |
| --complex-plot-nbins | | 1000 | Number of bins for the complex plot. |
| | | 95 | Percentile for y-axis limit in complex plot. |
| --strand-agnostic | | off | Use absolute scores (ignores strand) in complex plot. |
| --highlight-regions | | | Regions to highlight ( |
CLI examples#
Basic analysis:
g4hunterpy3 -i sequences.fasta -o results/
Custom window size and stricter threshold:
g4hunterpy3 -i genome.fasta -o output/ -w 30 -s 1.5
Print sequence info:
g4hunterpy3 -i sequences.fasta -o results/ --info
Generate plots:
# simple line plot
g4hunterpy3 -i sequences.fasta -o results/ --simple-plot
# complex binned heatmap for a genome
g4hunterpy3 -i genome.fasta -o results/ --complex-plot --complex-plot-nbins 500
Highlight genomic regions on complex plot:
g4hunterpy3 -i genome.fasta -o results/ \
    --complex-plot \
    --highlight-regions 1000:2000 5000:6000 8000:9000
Strand-agnostic vs strand-specific:
# strand-specific (default): blue = C-rich, red = G-rich
g4hunterpy3 -i genome.fasta -o results/ --complex-plot
# strand-agnostic: all G4-forming regions in red
g4hunterpy3 -i genome.fasta -o results/ --complex-plot --strand-agnostic
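When driving the CLI from a Python script, it can help to assemble the argument list programmatically. The helper below is a hypothetical convenience, not part of g4hunterpy3, and it uses only flags that appear in the examples above.

```python
def build_command(fasta, outdir, window=None, threshold=None,
                  complex_plot=False, highlight=()):
    """Assemble a g4hunterpy3 argument list from the flags shown in the examples."""
    cmd = ["g4hunterpy3", "-i", str(fasta), "-o", str(outdir)]
    if window is not None:
        cmd += ["-w", str(window)]
    if threshold is not None:
        cmd += ["-s", str(threshold)]
    if complex_plot:
        cmd.append("--complex-plot")
    if highlight:
        # Regions are passed as START:END tokens, as in the example above.
        cmd.append("--highlight-regions")
        cmd += [f"{start}:{end}" for start, end in highlight]
    return cmd

demo_cmd = build_command("genome.fasta", "output/", window=30, threshold=1.5)
demo_plot_cmd = build_command(
    "genome.fasta", "results/", complex_plot=True,
    highlight=[(1000, 2000), (5000, 6000)],
)
```

The resulting list can be passed to `subprocess.run(demo_cmd, check=True)` once g4hunterpy3 is installed; passing a list rather than a single string avoids shell-quoting issues with file names.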
Output files#
For each FASTA record, the CLI writes:
Per-window hit file (<header>-W<k>-S<threshold>.txt): tab-separated with columns Start, End, Sequence, Length, Score (1-based coordinates).
Merged region file (<header>-Merged.txt): tab-separated with columns Start, End, Sequence, Length, Score, NBR (1-based coordinates).
Plot files (optional):
<header>-ScorePlot.pdf: simple line plot (with --simple-plot)
<header>-ComplexScorePlot.pdf: binned heatmap (with --complex-plot)
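The tab-separated files can be loaded with the standard csv module. The sketch below parses a merged-region table using the column names listed above. It assumes a single header row (controlled by the hypothetical `has_header` flag), and the one-row sample is made up purely for illustration, not real tool output.

```python
import csv
import io

COLUMNS = ["Start", "End", "Sequence", "Length", "Score", "NBR"]

def read_regions(handle, has_header=True):
    """Parse a merged-region table into a list of dicts with typed fields."""
    reader = csv.reader(handle, delimiter="\t")
    if has_header:
        next(reader, None)  # assumption: the first line holds column names
    rows = []
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        row = dict(zip(COLUMNS, fields))
        row["Start"] = int(row["Start"])
        row["End"] = int(row["End"])
        row["Length"] = int(row["Length"])
        row["Score"] = float(row["Score"])
        rows.append(row)  # NBR is left as text; its type is not stated here
    return rows

# Made-up sample content, for illustration only.
sample = "Start\tEnd\tSequence\tLength\tScore\tNBR\n3\t6\tGGGG\t4\t4.0\t1\n"
demo_rows = read_regions(io.StringIO(sample))
```

For a real file, replace the `io.StringIO` object with `open(path, newline="")`.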
Understanding scores#
Positive scores → G-rich regions (G4-forming on the forward strand).
Negative scores → C-rich regions (G4-forming on the reverse strand).
Score magnitudes:
|score| ≥ 1.2: moderate propensity
|score| ≥ 1.5: high propensity
|score| ≥ 2.0: very high propensity
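The sign and magnitude rules above can be combined into a small helper for annotating results. The function name `describe_score` and the label strings are hypothetical; only the cut-offs come from this guide.

```python
def describe_score(score: float) -> tuple:
    """Return (strand, propensity) for a G4Hunter score using the cut-offs above."""
    magnitude = abs(score)
    if magnitude >= 2.0:
        level = "very high"
    elif magnitude >= 1.5:
        level = "high"
    elif magnitude >= 1.2:
        level = "moderate"
    else:
        level = "below threshold"
    # Positive scores are G-rich (forward strand); negative are C-rich (reverse).
    if score > 0:
        strand = "forward"
    elif score < 0:
        strand = "reverse"
    else:
        strand = "none"
    return strand, level

demo_labels = [describe_score(s) for s in (1.6, -2.3, 0.5, -1.2)]
```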
How to Cite#
Please cite the original G4Hunter paper and link to the g4hunterpy3 repository:
Bedrat, A., Lacroix, L. & Mergny, J.-L. Re-evaluation of G-quadruplex propensity with G4Hunter. Nucleic Acids Res. 44, 1746–1759 (2016). doi:10.1093/nar/gkw006