API Reference
Core Module
Core algorithms for G4Hunter scanning.
The G4Hunter approach assigns a per-base score based on runs of G or C:
- G runs contribute positive scores (1..4), capped at 4.
- C runs contribute negative scores (-1..-4), capped at -4.
- Other bases contribute 0.
A sliding-window average over these base scores yields a “G4Hunter score” per window. Windows whose absolute score meets or exceeds a threshold are reported as candidate G4-forming regions.
This module provides a small, testable API around those steps.
- class g4hunterpy3.core.Region(start: int, end: int, sequence: str, length: int, score: float, n_windows: int)
Bases: object
A merged region formed by overlapping/adjacent WindowHits.
- Parameters:
start (int) – 0-based start index of the region.
end (int) – 0-based end index (exclusive) of the region.
sequence (str) – Sequence slice for the region (original sequence[start:end]).
length (int) – Region length, equal to end - start.
score (float) – Region score. By default, this implementation uses the mean of the per-base scores across the region (rounded to 2 decimals), matching the original script’s behavior.
n_windows (int) – Number of window hits that were merged into this region.
- end: int
- length: int
- n_windows: int
- score: float
- sequence: str
- start: int
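Because Region coordinates are 0-based with an exclusive end, reports that use 1-based inclusive positions need a small conversion. The sketch below illustrates it with a hypothetical stand-in dataclass, RegionSketch, that only mirrors the documented fields; it is not the library's class, and to_one_based_inclusive is an illustrative helper name.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RegionSketch:
    """Hypothetical stand-in that mirrors the documented Region fields."""
    start: int       # 0-based start index
    end: int         # 0-based end index, exclusive
    sequence: str    # original sequence[start:end]
    length: int      # end - start
    score: float
    n_windows: int


def to_one_based_inclusive(region: RegionSketch) -> Tuple[int, int]:
    """Convert a half-open 0-based span to 1-based inclusive coordinates."""
    # [start, end) in 0-based terms covers positions start+1 .. end in 1-based terms.
    return region.start + 1, region.end


seq = "ATGGGGTA"
region = RegionSketch(start=2, end=6, sequence=seq[2:6], length=4, score=4.0, n_windows=1)
print(to_one_based_inclusive(region))  # (3, 6)
```

The span length is unchanged by the conversion: 6 - 3 + 1 equals end - start.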
- class g4hunterpy3.core.WindowHit(start: int, end: int, score: float)
Bases: object
A single scoring window that passes the threshold.
- Parameters:
start (int) – 0-based start index of the window.
end (int) – 0-based end index (exclusive) of the window.
score (float) – Mean G4Hunter score for the window (mean of base scores in window).
- end: int
- score: float
- start: int
- g4hunterpy3.core.base_scores(seq: str | ndarray) -> ndarray
Compute per-base G4Hunter scores for a sequence.
This is a refactor of the original BaseScore routine. Runs of G (or g) contribute positive scores and runs of C (or c) contribute negative scores. The magnitude is capped at 4 and applied to every base in the run.
- Parameters:
seq (str or numpy.ndarray) – Input DNA sequence (upper and lower case are both accepted).
- Returns:
Array of shape (len(seq),) with integer scores in [-4, 4].
- Return type:
numpy.ndarray
Notes
Any character other than G/g/C/c receives score 0.
Runs longer than 4 are scored as 4 (or -4) for all positions in the run.
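The run-based scoring rule can be sketched in pure Python as follows. This is a minimal illustration of the behavior described above, not the library's implementation: base_scores_sketch is a hypothetical name, and it returns a plain list rather than a numpy.ndarray.

```python
from itertools import groupby
from typing import List


def base_scores_sketch(seq: str) -> List[int]:
    """Per-base G4Hunter-style scores (illustrative, returns a list)."""
    scores: List[int] = []
    # groupby splits the upper-cased sequence into runs of identical bases.
    for base, run in groupby(seq.upper()):
        run_length = len(list(run))
        magnitude = min(run_length, 4)  # runs longer than 4 are capped at 4
        if base == "G":
            value = magnitude
        elif base == "C":
            value = -magnitude
        else:
            value = 0  # any character other than G/C scores 0
        # Every base in the run receives the same score.
        scores.extend([value] * run_length)
    return scores


print(base_scores_sketch("GGGtccAGGGGGG"))
# [3, 3, 3, 0, -2, -2, 0, 4, 4, 4, 4, 4, 4]
```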
- g4hunterpy3.core.find_window_hits(window_scores: ndarray, window_size: int, threshold: float) -> List[WindowHit]
Identify scoring windows whose absolute score passes a threshold.
- Parameters:
window_scores (numpy.ndarray) – Sliding-window mean scores (output of window_mean_scores).
window_size (int) – Window length in bases.
threshold (float) – Threshold applied to absolute window score.
- Returns:
One WindowHit per window start position i whose absolute score passes the threshold. WindowHit uses 0-based indexing with an exclusive end, so a window starting at i spans [i, i + window_size).
- Return type:
list of WindowHit
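A minimal sketch of the thresholding step, under stated assumptions: Hit is a hypothetical stand-in for WindowHit, find_hits_sketch is an illustrative name, and the comparison uses >= as described for scan_sequence.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Hit:
    """Hypothetical stand-in for WindowHit (0-based, end exclusive)."""
    start: int
    end: int
    score: float


def find_hits_sketch(window_scores: Sequence[float], window_size: int,
                     threshold: float) -> List[Hit]:
    """Keep windows whose absolute mean score is at least the threshold."""
    return [
        Hit(start=i, end=i + window_size, score=score)
        for i, score in enumerate(window_scores)
        if abs(score) >= threshold  # the sign is ignored, so C-rich windows also pass
    ]


hits = find_hits_sketch([4.0, 3.0, -2.0, 0.5], window_size=4, threshold=2.0)
print([(h.start, h.end, h.score) for h in hits])
# [(0, 4, 4.0), (1, 5, 3.0), (2, 6, -2.0)]
```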
- g4hunterpy3.core.merge_overlapping_windows(hits: Sequence[WindowHit], seq: str, base_score_array: ndarray | None = None) -> List[Region]
Merge overlapping/adjacent window hits into regions.
The original script merged windows when their start positions were consecutive (difference of 1). This produces regions that are the union of a run of overlapping windows.
- Parameters:
hits (sequence of WindowHit) – Window hits, ideally sorted by start.
seq (str) – Original sequence the hits are defined on.
base_score_array (numpy.ndarray, optional) – Per-base score array for seq. If not supplied, it will be computed.
- Returns:
Merged regions with region score computed as mean per-base score across the region (rounded to 2 decimals), consistent with the original script output.
- Return type:
list of Region
Notes
If hits is empty, returns an empty list.
Windows are treated as overlapping if the next start is <= current_end - 1. For consecutive starts and fixed window size, this matches the original.
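The merge rule in the notes above can be sketched as below. This is an illustration only: merge_hits_sketch is a hypothetical name, hits are passed as plain (start, end) tuples, regions are returned as dictionaries rather than Region objects, and the per-base scores must be supplied by the caller.

```python
from typing import Dict, List, Sequence, Tuple


def merge_hits_sketch(spans: Sequence[Tuple[int, int]], seq: str,
                      per_base: Sequence[int]) -> List[Dict[str, object]]:
    """Merge overlapping half-open spans and score each merged region."""
    merged: List[List[int]] = []  # each entry is [start, end, n_windows]
    for start, end in sorted(spans):
        # Overlap rule from the notes: next start <= current_end - 1.
        if merged and start <= merged[-1][1] - 1:
            merged[-1][1] = max(merged[-1][1], end)
            merged[-1][2] += 1
        else:
            merged.append([start, end, 1])

    regions = []
    for start, end, n_windows in merged:
        region_scores = per_base[start:end]
        regions.append({
            "start": start,
            "end": end,
            "sequence": seq[start:end],
            "length": end - start,
            # Region score: mean per-base score, rounded to 2 decimals.
            "score": round(sum(region_scores) / len(region_scores), 2),
            "n_windows": n_windows,
        })
    return regions


seq = "GGGGTTTTTTCCCC"
per_base = [4, 4, 4, 4, 0, 0, 0, 0, 0, 0, -4, -4, -4, -4]
regions = merge_hits_sketch([(0, 4), (1, 5), (10, 14)], seq, per_base)
print([(r["start"], r["end"], r["score"], r["n_windows"]) for r in regions])
# [(0, 5, 3.2, 2), (10, 14, -4.0, 1)]
```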
- g4hunterpy3.core.scan_fasta(fasta_path: str | Path, window_size: int = 25, threshold: float = 1.5) -> Dict[str, Tuple[ndarray, List[WindowHit], List[Region]]]
Scan every record in a FASTA file.
- Parameters:
fasta_path (str or pathlib.Path) – Path to FASTA file.
window_size (int, default=25) – Sliding window length in bases.
threshold (float, default=1.5) – Absolute score threshold for calling windows as hits.
- Returns:
Mapping from record id to (window_scores, hits, regions).
- Return type:
dict
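For readers unfamiliar with the FASTA layout, the sketch below shows one way records could be collected into a mapping keyed by record id. How scan_fasta actually parses files is not documented here, so this is an assumption throughout: read_fasta_sketch is a hypothetical helper, and taking the first whitespace-delimited token of the header as the record id is a common convention rather than confirmed library behavior.

```python
import io
from typing import Dict, Iterable, List


def read_fasta_sketch(lines: Iterable[str]) -> Dict[str, str]:
    """Collect FASTA records into {record_id: sequence} (illustrative)."""
    parts: Dict[str, List[str]] = {}
    current_id = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Assumed convention: id is the first token after '>'.
            header = line[1:].split()
            current_id = header[0] if header else ""
            parts[current_id] = []
        elif current_id is not None:
            parts[current_id].append(line)  # sequence may span several lines
    return {record_id: "".join(chunks) for record_id, chunks in parts.items()}


example = io.StringIO(">chr1 demo record\nGGGGTTTT\nGGGG\n>chr2\nCCCCAAAA\n")
sequences = read_fasta_sketch(example)
print(sequences)
# {'chr1': 'GGGGTTTTGGGG', 'chr2': 'CCCCAAAA'}
```

Each sequence in such a mapping could then be scanned on its own, giving one (window_scores, hits, regions) result per record id.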
- g4hunterpy3.core.scan_sequence(seq: str, window_size: int = 25, threshold: float = 1.5) -> Tuple[ndarray, List[WindowHit], List[Region]]
Run G4Hunter scoring on a single DNA sequence.
- Parameters:
seq (str) – DNA sequence to scan.
window_size (int, default=25) – Sliding window length in bases.
threshold (float, default=1.5) – Absolute score threshold for calling windows as hits.
- Returns:
window_scores (numpy.ndarray) – Sliding-window mean score array.
hits (list of WindowHit) – Per-window hits whose absolute score >= threshold.
regions (list of Region) – Merged hit regions.
Examples
>>> ws, hits, regions = scan_sequence("GGGGTTTTGGGG", window_size=4, threshold=1.0)
>>> len(hits) > 0
True
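To see why this example produces hits, the arithmetic can be traced by hand. The pure-Python sketch below follows the scoring rules described in this module without calling the library; the variable names are illustrative, and it stops before the merge step.

```python
from itertools import groupby

seq = "GGGGTTTTGGGG"
window_size, threshold = 4, 1.0

# Step 1: per-base scores (G runs positive, C runs negative, capped at 4).
per_base = []
for base, run in groupby(seq.upper()):
    n = len(list(run))
    sign = {"G": 1, "C": -1}.get(base, 0)
    per_base.extend([sign * min(n, 4)] * n)

# Step 2: mean score of every window of length window_size.
window_means = [
    sum(per_base[i:i + window_size]) / window_size
    for i in range(len(per_base) - window_size + 1)
]

# Step 3: window starts whose absolute mean is at least the threshold.
hit_starts = [i for i, score in enumerate(window_means) if abs(score) >= threshold]

print(per_base)      # [4, 4, 4, 4, 0, 0, 0, 0, 4, 4, 4, 4]
print(window_means)  # [4.0, 3.0, 2.0, 1.0, 0.0, 1.0, 2.0, 3.0, 4.0]
print(hit_starts)    # [0, 1, 2, 3, 5, 6, 7, 8]
```

Only the window starting at index 4, which covers TTTT, falls below the threshold.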
- g4hunterpy3.core.window_mean_scores(scores: ndarray, window_size: int) -> ndarray
Compute sliding-window mean scores.
- Parameters:
scores (numpy.ndarray) – Per-base score array (output of base_scores).
window_size (int) – Window length in bases (k in the original script).
- Returns:
Array of shape (len(scores) - window_size + 1,) containing the mean score for each window starting at i.
- Return type:
numpy.ndarray
- Raises:
ValueError – If window_size is < 1 or larger than the sequence length.
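A sliding-window mean with the documented output length and error condition can be sketched with a running sum. This is an illustration, not the library's code: window_mean_scores_sketch is a hypothetical name and it works on plain sequences instead of numpy arrays.

```python
from typing import List, Sequence


def window_mean_scores_sketch(scores: Sequence[float], window_size: int) -> List[float]:
    """Mean of each length-window_size window, one value per start index."""
    if window_size < 1 or window_size > len(scores):
        raise ValueError("window_size must be between 1 and len(scores)")
    # Sum the first window once, then slide it one base at a time.
    window_sum = sum(scores[:window_size])
    means = [window_sum / window_size]
    for i in range(window_size, len(scores)):
        # Add the base entering the window, drop the base leaving it.
        window_sum += scores[i] - scores[i - window_size]
        means.append(window_sum / window_size)
    return means


print(window_mean_scores_sketch([4, 4, 4, 4, 0, 0], window_size=4))
# [4.0, 3.0, 2.0]
```

The result has len(scores) - window_size + 1 entries, here 6 - 4 + 1 = 3.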
Plotting Module
- g4hunterpy3.plotting.complex_plot(hits: list, genome_length: int, out_pdf: Path, nbins: int = 1000, score: float = 1.2, percentile_to_use: int = 95, dpi: int = 300, figsize: tuple = (8, 1.5), colorbar_vmax: float = 3.0, colorbar_vmin: float = None, highlight_regions: list = None, strand_agnostic: bool = True)
Save a PDF plot of window-hit scores summarized in bins along the full sequence.
- Parameters:
hits (list) – List of hit objects with start, end, and score attributes. Each hit represents a window with a calculated G4Hunter score.
genome_length (int) – Length of the genome/sequence being analyzed; this is needed to map the hits to the full length sequence.
out_pdf (Path) – Output PDF file path.
nbins (int, optional) – Number of bins for the complex plot, by default 1000.
score (float, optional) – Score threshold that was used for calling hits; it sets the floor of the colorbar. By default 1.2.
percentile_to_use (int, optional) – Percentile of scores to use within each bin (e.g., 95 for 95th percentile), by default 95.
dpi (int, optional) – Dots per inch for the output PDF, by default 300.
figsize (tuple, optional) – Figure size as (width, height) in inches, by default (8, 1.5).
colorbar_vmax (float, optional) – Maximum value for colorbar, by default 3.0.
colorbar_vmin (float, optional) – Minimum value for colorbar, by default max(score, 0.0).
highlight_regions (list, optional) – List of [start, end] pairs defining regions to highlight on the x-axis. Each element should be a list or tuple with two integers representing the start and end positions (1-based) of the region to highlight. Highlighted regions are shown as yellow vertical spans with alpha=0.5. By default None (no highlighting).
strand_agnostic (bool, optional) – If True, use absolute G4 scores (ignoring strand) for plotting. If False, use raw scores (which can be negative for C-rich regions). Set to True for dsDNA sequences, where both strands can form G4s. By default True.
- Returns:
No return value but writes to file.
- Return type:
None
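The nbins, percentile_to_use, and strand_agnostic parameters suggest that hit scores are summarized per bin before plotting. The sketch below shows one plausible way to do that. It is an assumption, not the library's method: binned_percentile_sketch is a hypothetical name, hits are given as (start, score) tuples, each hit is assigned to a bin by its start position, and the percentile uses the nearest-rank definition, which can differ from interpolating implementations.

```python
import math
from typing import List, Optional, Sequence, Tuple


def binned_percentile_sketch(hits: Sequence[Tuple[int, float]], genome_length: int,
                             nbins: int, percentile: float,
                             strand_agnostic: bool = True) -> List[Optional[float]]:
    """Summarize hit scores per bin with a nearest-rank percentile."""
    bins: List[List[float]] = [[] for _ in range(nbins)]
    for start, score in hits:
        # Map a 0-based position to a bin index, clamped to the last bin.
        index = min(start * nbins // genome_length, nbins - 1)
        bins[index].append(abs(score) if strand_agnostic else score)

    summary: List[Optional[float]] = []
    for values in bins:
        if not values:
            summary.append(None)  # no hits fell into this bin
            continue
        ordered = sorted(values)
        # Nearest-rank percentile: smallest value covering the requested fraction.
        rank = max(1, math.ceil(percentile / 100 * len(ordered)))
        summary.append(ordered[rank - 1])
    return summary


demo_hits = [(0, 1.5), (10, -2.5), (20, 1.8), (95, -3.0)]
print(binned_percentile_sketch(demo_hits, genome_length=100, nbins=4, percentile=95))
# [2.5, None, None, 3.0]
```

With strand_agnostic=False the same input keeps the signs, so the C-rich hits no longer dominate their bins.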
- g4hunterpy3.plotting.simple_plot(scores: ndarray, out_pdf: Path, dpi: int = 300, line_color='red', line_width=0.8) -> None
Save a PDF plot of the sliding-window scores.
- Parameters:
scores (np.ndarray) – Array of sliding-window scores.
out_pdf (Path) – Output PDF file path.
dpi (int, optional) – Dots per inch for the output PDF, by default 300.
- Returns:
No return value but writes to file.
- Return type:
None