API Reference

Core Module

Core algorithms for G4Hunter scanning.

The G4Hunter approach assigns a per-base score based on runs of G or C:

- G runs contribute positive scores (1..4), capped at 4.
- C runs contribute negative scores (-1..-4), capped at -4.
- Other bases contribute 0.

A sliding-window average over these base scores yields a “G4Hunter score” per window. Windows whose absolute score meets or exceeds a threshold are reported as candidate G4-forming regions.

This module provides a small, testable API around those steps.

class g4hunterpy3.core.Region(start: int, end: int, sequence: str, length: int, score: float, n_windows: int)

Bases: object

A merged region formed by overlapping/adjacent WindowHits.

Parameters:

start (int) – 0-based start index of the region.
end (int) – 0-based end index (exclusive) of the region.
sequence (str) – Sequence slice for the region (original sequence[start:end]).
length (int) – Region length, equal to end - start.
score (float) – Region score. By default, this implementation uses the mean of the per-base scores across the region (rounded to 2 decimals), matching the original script’s behavior.
n_windows (int) – Number of window hits that were merged into this region.

end: int

length: int

n_windows: int

score: float

sequence: str

start: int

class g4hunterpy3.core.WindowHit(start: int, end: int, score: float)

Bases: object

A single scoring window that passes the threshold.

Parameters:

start (int) – 0-based start index of the window.
end (int) – 0-based end index (exclusive) of the window.
score (float) – Mean G4Hunter score for the window (mean of base scores in window).

end: int

score: float

start: int

g4hunterpy3.core.base_scores(seq: str | ndarray) → ndarray

Compute per-base G4Hunter scores for a sequence.

This is a refactor of the original BaseScore routine. Runs of G (or g) contribute positive scores and runs of C (or c) contribute negative scores. The magnitude is capped at 4 and applied to every base in the run.

Parameters:

seq (str or numpy.ndarray) – Input DNA sequence (may contain lower/upper case).

Returns:

Array of shape (len(seq),) with integer scores in [-4, 4].

Return type:

numpy.ndarray

Notes

Any character other than G/g/C/c receives score 0.
Runs longer than 4 are scored as 4 (or -4) for all positions in the run.
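The run-length rule described above can be sketched in pure Python. This is only an illustration of the documented behavior, not the library's implementation: `sketch_base_scores` is a hypothetical name, it returns a plain list rather than a NumPy array, and it assumes that upper- and lower-case letters of the same base belong to the same run.

```python
from typing import List


def sketch_base_scores(seq: str) -> List[int]:
    """Hypothetical stand-in for base_scores: score each base by its G/C run."""
    n = len(seq)
    scores = [0] * n
    i = 0
    while i < n:
        base = seq[i].upper()
        # Walk to the end of the current run (assumption: case-insensitive runs).
        j = i
        while j < n and seq[j].upper() == base:
            j += 1
        if base in ("G", "C"):
            magnitude = min(j - i, 4)  # run length, capped at 4
            value = magnitude if base == "G" else -magnitude
            for k in range(i, j):
                scores[k] = value  # every base in the run gets the same score
        i = j
    return scores


print(sketch_base_scores("GGgTCCCCCCA"))
# [3, 3, 3, 0, -4, -4, -4, -4, -4, -4, 0]
```

The run of three G/g bases scores 3 at each position, while the run of six C bases is capped at -4.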

g4hunterpy3.core.find_window_hits(window_scores: ndarray, window_size: int, threshold: float) → List[WindowHit]

Identify scoring windows whose absolute score passes a threshold.

Parameters:

window_scores (numpy.ndarray) – Sliding-window mean scores (output of window_mean_scores).
window_size (int) – Window length in bases.
threshold (float) – Threshold applied to absolute window score.

Returns:

Each hit corresponds to one window start position i. WindowHit uses 0-based indexing with end exclusive.

Return type:

list of WindowHit
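As a rough illustration of the thresholding step, the sketch below builds one hit per window start whose absolute score passes the threshold. `SketchHit` and `sketch_find_hits` are hypothetical stand-ins for `WindowHit` and `find_window_hits`, and the `>=` comparison is an assumption taken from the `scan_sequence` description of hits.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class SketchHit:
    """Hypothetical stand-in for WindowHit (0-based start, exclusive end)."""
    start: int
    end: int
    score: float


def sketch_find_hits(
    window_scores: Sequence[float], window_size: int, threshold: float
) -> List[SketchHit]:
    # Window i covers bases [i, i + window_size).
    return [
        SketchHit(start=i, end=i + window_size, score=score)
        for i, score in enumerate(window_scores)
        if abs(score) >= threshold  # assumption: inclusive comparison
    ]


hits = sketch_find_hits([4.0, 0.5, -2.0], window_size=4, threshold=1.5)
print([(h.start, h.end, h.score) for h in hits])
# [(0, 4, 4.0), (2, 6, -2.0)]
```

Because the absolute value is compared, the C-rich window with score -2.0 is reported alongside the G-rich one.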

g4hunterpy3.core.merge_overlapping_windows(hits: Sequence[WindowHit], seq: str, base_score_array: ndarray | None = None) → List[Region]

Merge overlapping/adjacent window hits into regions.

The original script merged windows when their start positions were consecutive (difference of 1). This produces regions that are the union of a run of overlapping windows.

Parameters:

hits (sequence of WindowHit) – Window hits, ideally sorted by start.
seq (str) – Original sequence the hits are defined on.
base_score_array (numpy.ndarray, optional) – Per-base score array for seq. If not supplied, it will be computed.

Returns:

Merged regions with region score computed as mean per-base score across the region (rounded to 2 decimals), consistent with the original script output.

Return type:

list of Region

Notes

If hits is empty, returns an empty list.
Windows are treated as overlapping if the next start is <= current_end - 1. For consecutive starts and fixed window size, this matches the original.
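The merge rule in the notes above can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's code: `SketchRegion` and `sketch_merge` are hypothetical names, hits are passed as plain `(start, end)` tuples, and the per-base scores are supplied by the caller.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class SketchRegion:
    """Hypothetical stand-in for Region."""
    start: int
    end: int
    sequence: str
    length: int
    score: float
    n_windows: int


def sketch_merge(
    hits: Sequence[Tuple[int, int]], seq: str, base_scores: Sequence[int]
) -> List[SketchRegion]:
    spans: List[List[int]] = []  # each entry is [start, end, n_windows]
    for start, end in sorted(hits):
        # Rule from the notes: overlap if next start <= current_end - 1.
        if spans and start <= spans[-1][1] - 1:
            spans[-1][1] = max(spans[-1][1], end)
            spans[-1][2] += 1
        else:
            spans.append([start, end, 1])
    regions = []
    for start, end, n_windows in spans:
        # Region score: mean per-base score, rounded to 2 decimals.
        mean = sum(base_scores[start:end]) / (end - start)
        regions.append(
            SketchRegion(start, end, seq[start:end], end - start, round(mean, 2), n_windows)
        )
    return regions


seq = "GGGGTTTTTTTTCCCC"
scores = [4] * 4 + [0] * 8 + [-4] * 4
regions = sketch_merge([(0, 4), (1, 5), (11, 15), (12, 16)], seq, scores)
for r in regions:
    print(r.start, r.end, r.sequence, r.score, r.n_windows)
# 0 5 GGGGT 3.2 2
# 11 16 TCCCC -3.2 2
```

Note that the region score is recomputed from the per-base scores over the merged span, so it is not an average of the window scores that were merged.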

g4hunterpy3.core.scan_fasta(fasta_path: str | Path, window_size: int = 25, threshold: float = 1.5) → Dict[str, Tuple[ndarray, List[WindowHit], List[Region]]]

Scan every record in a FASTA file.

Parameters:

fasta_path (str or pathlib.Path) – Path to FASTA file.
window_size (int, default=25) – Sliding window length in bases.
threshold (float, default=1.5) – Absolute score threshold for calling windows as hits.

Returns:

Mapping from record id to (window_scores, hits, regions).

Return type:

dict
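To make the shape of the returned mapping concrete, here is a sketch of how FASTA records might be collected by record id before each sequence is scanned. How the library actually parses FASTA is not documented here, so `read_fasta_records` is a hypothetical helper, and taking the first whitespace-delimited token of the header as the record id is an assumption based on common FASTA conventions.

```python
import io
from typing import Dict, List, Optional, TextIO


def read_fasta_records(handle: TextIO) -> Dict[str, str]:
    """Hypothetical helper: map record id -> concatenated sequence."""
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    chunks: List[str] = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if current_id is not None:
                records[current_id] = "".join(chunks)
            header = line[1:].split()
            # Assumption: the id is the first token of the header line.
            current_id = header[0] if header else ""
            chunks = []
        elif current_id is not None:
            chunks.append(line)  # sequence may span several lines
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


fasta_text = ">chr1 test record\nGGGG\nTTTT\n>chr2\nCCCC\n"
records = read_fasta_records(io.StringIO(fasta_text))
print(records)
# {'chr1': 'GGGGTTTT', 'chr2': 'CCCC'}
```

In `scan_fasta`, each such record id would key a `(window_scores, hits, regions)` tuple rather than the raw sequence.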

g4hunterpy3.core.scan_sequence(seq: str, window_size: int = 25, threshold: float = 1.5) → Tuple[ndarray, List[WindowHit], List[Region]]

Run G4Hunter scoring on a single DNA sequence.

Parameters:

seq (str) – DNA sequence to scan.
window_size (int, default=25) – Sliding window length in bases.
threshold (float, default=1.5) – Absolute score threshold for calling windows as hits.

Returns:

window_scores (numpy.ndarray) – Sliding-window mean score array.
hits (list of WindowHit) – Per-window hits whose absolute score >= threshold.
regions (list of Region) – Merged hit regions.

Examples

>>> ws, hits, regions = scan_sequence("GGGGTTTTGGGG", window_size=4, threshold=1.0)
>>> len(hits) > 0
True
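The example input can also be traced by hand using only the scoring rules described in this reference. The snippet below does not call the library; it recomputes the intermediate values in plain Python, assuming an inclusive `>=` threshold comparison, to show why at least one window passes.

```python
from itertools import groupby

seq = "GGGGTTTTGGGG"
window_size, threshold = 4, 1.0

# Per-base scores: each G/C run gets its (capped) run length, other bases get 0.
base = []
for letter, run in groupby(seq.upper()):
    n = len(list(run))
    value = {"G": min(n, 4), "C": -min(n, 4)}.get(letter, 0)
    base.extend([value] * n)

# Mean score of every window of `window_size` consecutive bases.
means = [
    sum(base[i:i + window_size]) / window_size
    for i in range(len(base) - window_size + 1)
]

# Window starts whose absolute mean passes the threshold.
hit_starts = [i for i, m in enumerate(means) if abs(m) >= threshold]

print(base)        # [4, 4, 4, 4, 0, 0, 0, 0, 4, 4, 4, 4]
print(means)       # [4.0, 3.0, 2.0, 1.0, 0.0, 1.0, 2.0, 3.0, 4.0]
print(hit_starts)  # [0, 1, 2, 3, 5, 6, 7, 8]
```

Only the window starting at index 4, which covers the four T bases, falls below the threshold.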

g4hunterpy3.core.window_mean_scores(scores: ndarray, window_size: int) → ndarray

Compute sliding-window mean scores.

Parameters:

scores (numpy.ndarray) – Per-base score array (output of base_scores).
window_size (int) – Window length in bases (k in the original script).

Returns:

Array of shape (len(scores) - window_size + 1,) containing the mean score for each window starting at i.

Return type:

numpy.ndarray

Raises:

ValueError – If window_size is < 1 or larger than the sequence length.
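A sliding-window mean with the documented output shape and error condition can be sketched with a running sum. `sketch_window_means` is a hypothetical name, and the running-sum approach is just one way to do this; the library may compute the means differently (for example with NumPy).

```python
from typing import List, Sequence


def sketch_window_means(scores: Sequence[float], window_size: int) -> List[float]:
    """Hypothetical stand-in for window_mean_scores using a running sum."""
    if window_size < 1 or window_size > len(scores):
        raise ValueError("window_size must be between 1 and len(scores)")
    running = sum(scores[:window_size])
    means = [running / window_size]
    for i in range(window_size, len(scores)):
        # Slide the window by one base: add the new base, drop the oldest.
        running += scores[i] - scores[i - window_size]
        means.append(running / window_size)
    return means


print(sketch_window_means([4, 4, 4, 4, 0, 0], 4))
# [4.0, 3.0, 2.0]
```

Six scores with a window of 4 yield 6 - 4 + 1 = 3 window means, matching the documented output shape.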

Plotting Module

g4hunterpy3.plotting.complex_plot(hits: list, genome_length: int, out_pdf: Path, nbins: int = 1000, score: float = 1.2, percentile_to_use: int = 95, dpi: int = 300, figsize: tuple = (8, 1.5), colorbar_vmax: float = 3.0, colorbar_vmin: float = None, highlight_regions: list = None, strand_agnostic: bool = True)

Save a PDF plot of window-hit scores binned along the full-length sequence.

Parameters:

hits (list) – List of hit objects with start, end, and score attributes. Each hit represents a window with a calculated G4Hunter score.
genome_length (int) – Length of the genome/sequence being analyzed; this is needed to map the hits to the full length sequence.
out_pdf (Path) – Output PDF file path.
nbins (int, optional) – Number of bins for the complex plot, by default 1000.
score (float, optional) – Score threshold used for calling hits; it also sets the floor of the colorbar. By default 1.2.
percentile_to_use (int, optional) – Percentile of scores to use within each bin (e.g., 95 for 95th percentile), by default 95.
dpi (int, optional) – Dots per inch for the output PDF, by default 300.
figsize (tuple, optional) – Figure size as (width, height) in inches, by default (8, 1.5).
colorbar_vmax (float, optional) – Maximum value for colorbar, by default 3.0.
colorbar_vmin (float, optional) – Minimum value for colorbar, by default max(score, 0.0).
highlight_regions (list, optional) – List of [start, end] pairs defining regions to highlight on the x-axis. Each element should be a list or tuple with two integers representing the start and end positions (1-based) of the region to highlight. Highlighted regions are shown as yellow vertical spans with alpha=0.5. By default None (no highlighting).
strand_agnostic (bool, optional) – If True, plot absolute G4 scores (ignoring strand). If False, plot raw scores, which can be negative for C-rich regions. Set to True for dsDNA sequences, where both strands can form G4s. By default True.

Returns:

No return value but writes to file.

Return type:

None
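The interaction of `nbins`, `percentile_to_use`, and `strand_agnostic` can be illustrated with a small binning sketch. The actual binning inside `complex_plot` is not documented here, so everything below is an assumption: `sketch_bin_hits` is a hypothetical helper, hits are plain `(start, end, score)` tuples, each hit is assigned to the bin containing its start, and a nearest-rank percentile is used.

```python
import math
from typing import List, Optional, Sequence, Tuple


def sketch_bin_hits(
    hits: Sequence[Tuple[int, int, float]],
    genome_length: int,
    nbins: int = 1000,
    percentile: int = 95,
    strand_agnostic: bool = True,
) -> List[Optional[float]]:
    """Hypothetical helper: one summary score per bin, None for empty bins."""
    bins: List[List[float]] = [[] for _ in range(nbins)]
    for start, _end, score in hits:
        # Assumption: a hit belongs to the bin that contains its start.
        idx = min(start * nbins // genome_length, nbins - 1)
        bins[idx].append(abs(score) if strand_agnostic else score)
    summary: List[Optional[float]] = []
    for values in bins:
        if not values:
            summary.append(None)
            continue
        ordered = sorted(values)
        # Assumption: nearest-rank percentile within the bin.
        rank = max(math.ceil(percentile / 100 * len(ordered)) - 1, 0)
        summary.append(ordered[rank])
    return summary


hits = [(0, 25, 1.6), (10, 35, -2.4), (900, 925, 1.3)]
print(sketch_bin_hits(hits, genome_length=1000, nbins=4))
# [2.4, None, None, 1.3]
print(sketch_bin_hits(hits, genome_length=1000, nbins=4, strand_agnostic=False))
# [1.6, None, None, 1.3]
```

With `strand_agnostic=True` the C-rich hit contributes its magnitude (2.4) to the first bin; with `strand_agnostic=False` its negative score sorts below the G-rich hit, so the bin reports 1.6.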

g4hunterpy3.plotting.simple_plot(scores: ndarray, out_pdf: Path, dpi: int = 300, line_color='red', line_width=0.8) → None

Save a PDF plot of the sliding-window scores.

Parameters:

scores (np.ndarray) – Array of sliding-window scores.
out_pdf (Path) – Output PDF file path.
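dpi (int, optional) – Dots per inch for the output PDF, by default 300.
line_color (optional) – Color of the plotted line, by default 'red'.
line_width (optional) – Width of the plotted line, by default 0.8.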

Returns:

No return value but writes to file.

Return type:

None