API Reference

Core Module

Core algorithms for G4Hunter scanning.

The G4Hunter approach assigns a per-base score based on runs of G or C:

  • G runs contribute positive scores (1..4), capped at 4.

  • C runs contribute negative scores (-1..-4), capped at -4.

  • Other bases contribute 0.

A sliding-window average over these base scores yields a “G4Hunter score” per window. Windows whose absolute score meets or exceeds a threshold are reported as candidate G4-forming regions.

This module provides a small, testable API around those steps.

class g4hunterpy3.core.Region(start: int, end: int, sequence: str, length: int, score: float, n_windows: int)

Bases: object

A merged region formed by overlapping/adjacent WindowHits.

Parameters:
  • start (int) – 0-based start index of the region.

  • end (int) – 0-based end index (exclusive) of the region.

  • sequence (str) – Sequence slice for the region (original sequence[start:end]).

  • length (int) – Region length, equal to end - start.

  • score (float) – Region score. By default, this implementation uses the mean of the per-base scores across the region (rounded to 2 decimals), matching the original script’s behavior.

  • n_windows (int) – Number of window hits that were merged into this region.

end: int
length: int
n_windows: int
score: float
sequence: str
start: int

class g4hunterpy3.core.WindowHit(start: int, end: int, score: float)

Bases: object

A single scoring window that passes the threshold.

Parameters:
  • start (int) – 0-based start index of the window.

  • end (int) – 0-based end index (exclusive) of the window.

  • score (float) – Mean G4Hunter score for the window (mean of base scores in window).

end: int
score: float
start: int

g4hunterpy3.core.base_scores(seq: str | ndarray) → ndarray

Compute per-base G4Hunter scores for a sequence.

This is a refactor of the original BaseScore routine. Runs of G (or g) contribute positive scores and runs of C (or c) contribute negative scores. The magnitude is capped at 4 and applied to every base in the run.

Parameters:

seq (str or numpy.ndarray) – Input DNA sequence (may contain lower/upper case).

Returns:

Array of shape (len(seq),) with integer scores in [-4, 4].

Return type:

numpy.ndarray

Notes

  • Any character other than G/g/C/c receives score 0.

  • Runs longer than 4 are scored as 4 (or -4) for all positions in the run.
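The run-based scoring rule described above can be illustrated with a minimal pure-Python sketch. It is not the library's implementation: `sketch_base_scores` is a hypothetical name, it returns a plain list rather than a numpy.ndarray, and it assumes that mixed-case runs such as "Gg" count as a single run.

```python
from itertools import groupby


def sketch_base_scores(seq):
    """Hypothetical stand-in for base_scores; returns a list of ints."""
    scores = []
    # Upper-casing first means "Gg" is treated as one run (an assumption).
    for base, run in groupby(seq.upper()):
        run_length = len(list(run))
        if base == "G":
            value = min(run_length, 4)    # G runs: +1..+4, capped at 4
        elif base == "C":
            value = -min(run_length, 4)   # C runs: -1..-4, capped at -4
        else:
            value = 0                     # every other character scores 0
        # Every base in the run receives the same capped value.
        scores.extend([value] * run_length)
    return scores


print(sketch_base_scores("GGGttCCAGGGGGG"))
# [3, 3, 3, 0, 0, -2, -2, 0, 4, 4, 4, 4, 4, 4]
```

The final run of six Gs shows the cap: each of its positions scores 4 rather than 6.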

g4hunterpy3.core.find_window_hits(window_scores: ndarray, window_size: int, threshold: float) → List[WindowHit]

Identify scoring windows whose absolute score passes a threshold.

Parameters:
  • window_scores (numpy.ndarray) – Sliding-window mean scores (output of window_mean_scores).

  • window_size (int) – Window length in bases.

  • threshold (float) – Threshold applied to absolute window score.

Returns:

Each hit corresponds to one window start position i. WindowHit uses 0-based indexing with end exclusive.

Return type:

list of WindowHit
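As a rough illustration of the thresholding step, the sketch below turns a list of window scores into hit records. `SketchHit` and `sketch_find_hits` are hypothetical names, not part of g4hunterpy3, and the `>=` comparison is an assumption taken from the scan_sequence description of hits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchHit:
    """Hypothetical stand-in for WindowHit (0-based, end exclusive)."""
    start: int
    end: int
    score: float


def sketch_find_hits(window_scores, window_size, threshold):
    hits = []
    for i, score in enumerate(window_scores):
        # Assumption: a window passes when |score| >= threshold.
        if abs(score) >= threshold:
            hits.append(SketchHit(start=i, end=i + window_size, score=score))
    return hits


for hit in sketch_find_hits([0.5, 1.6, -1.8, 1.5, 0.0], window_size=25, threshold=1.5):
    print(hit)
```

With these inputs the windows starting at 1, 2 and 3 pass. The negative score at start 2 (a C-rich window) is kept because the comparison uses the absolute value.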

g4hunterpy3.core.merge_overlapping_windows(hits: Sequence[WindowHit], seq: str, base_score_array: ndarray | None = None) → List[Region]

Merge overlapping/adjacent window hits into regions.

The original script merged windows when their start positions were consecutive (difference of 1). This produces regions that are the union of a run of overlapping windows.

Parameters:
  • hits (sequence of WindowHit) – Window hits, ideally sorted by start.

  • seq (str) – Original sequence the hits are defined on.

  • base_score_array (numpy.ndarray, optional) – Per-base score array for seq. If not supplied, it will be computed.

Returns:

Merged regions with region score computed as mean per-base score across the region (rounded to 2 decimals), consistent with the original script output.

Return type:

list of Region

Notes

  • If hits is empty, returns an empty list.

  • Windows are treated as overlapping if the next start is <= current_end - 1. For consecutive starts and fixed window size, this matches the original.
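The merge rule in the notes above can be sketched in a few lines. This is an illustration under stated assumptions, not the library's code: `SketchRegion` and `sketch_merge` are hypothetical names, hits are simplified to `(start, end)` tuples, and the per-base scores are passed in explicitly instead of being computed.

```python
from dataclasses import dataclass


@dataclass
class SketchRegion:
    """Hypothetical stand-in for Region."""
    start: int
    end: int
    sequence: str
    length: int
    score: float
    n_windows: int


def sketch_merge(hits, seq, per_base_scores):
    """Merge (start, end) windows; end is exclusive, as in WindowHit."""
    spans = []  # each entry: [start, end, n_windows]
    for start, end in sorted(hits):
        if spans and start <= spans[-1][1] - 1:
            # Overlaps the current span: extend it and count the window.
            spans[-1][1] = max(spans[-1][1], end)
            spans[-1][2] += 1
        else:
            spans.append([start, end, 1])
    regions = []
    for start, end, n_windows in spans:
        region_scores = per_base_scores[start:end]
        # Region score: mean per-base score, rounded to 2 decimals.
        mean_score = round(sum(region_scores) / len(region_scores), 2)
        regions.append(
            SketchRegion(start, end, seq[start:end], end - start, mean_score, n_windows)
        )
    return regions


seq = "GGGGTTGGGGAAAAAACCCC"
per_base = [4, 4, 4, 4, 0, 0, 4, 4, 4, 4, 0, 0, 0, 0, 0, 0, -4, -4, -4, -4]
hits = [(0, 4), (1, 5), (2, 6), (16, 20)]
for region in sketch_merge(hits, seq, per_base):
    print(region)
```

The first three windows collapse into one region covering positions 0 to 6 (score 2.67, three windows). The window at 16 stays separate and scores -4.0.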

g4hunterpy3.core.scan_fasta(fasta_path: str | Path, window_size: int = 25, threshold: float = 1.5) → Dict[str, Tuple[ndarray, List[WindowHit], List[Region]]]

Scan every record in a FASTA file.

Parameters:
  • fasta_path (str or pathlib.Path) – Path to FASTA file.

  • window_size (int, default=25) – Sliding window length in bases.

  • threshold (float, default=1.5) – Absolute score threshold for calling windows as hits.

Returns:

Mapping from record id to (window_scores, hits, regions).

Return type:

dict

g4hunterpy3.core.scan_sequence(seq: str, window_size: int = 25, threshold: float = 1.5) → Tuple[ndarray, List[WindowHit], List[Region]]

Run G4Hunter scoring on a single DNA sequence.

Parameters:
  • seq (str) – DNA sequence to scan.

  • window_size (int, default=25) – Sliding window length in bases.

  • threshold (float, default=1.5) – Absolute score threshold for calling windows as hits.

Returns:

  • window_scores (numpy.ndarray) – Sliding-window mean score array.

  • hits (list of WindowHit) – Per-window hits whose absolute score >= threshold.

  • regions (list of Region) – Merged hit regions.

Examples

>>> ws, hits, regions = scan_sequence("GGGGTTTTGGGG", window_size=4, threshold=1.0)
>>> len(hits) > 0
True

g4hunterpy3.core.window_mean_scores(scores: ndarray, window_size: int) → ndarray

Compute sliding-window mean scores.

Parameters:
  • scores (numpy.ndarray) – Per-base score array (output of base_scores).

  • window_size (int) – Window length in bases (k in the original script).

Returns:

Array of shape (len(scores) - window_size + 1,) containing the mean score for each window starting at i.

Return type:

numpy.ndarray

Raises:

ValueError – If window_size is < 1 or larger than the sequence length.
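The output shape and the error condition above can be checked with a small sketch. `sketch_window_means` is a hypothetical pure-Python stand-in that uses a running sum and returns a list. The library itself returns a numpy.ndarray and may compute the means differently.

```python
def sketch_window_means(scores, window_size):
    """Hypothetical stand-in for window_mean_scores; returns a list of floats."""
    n = len(scores)
    if window_size < 1 or window_size > n:
        raise ValueError("window_size must be between 1 and len(scores)")
    # Running sum: add the base entering the window, subtract the one leaving.
    window_sum = sum(scores[:window_size])
    means = [window_sum / window_size]
    for i in range(window_size, n):
        window_sum += scores[i] - scores[i - window_size]
        means.append(window_sum / window_size)
    return means


# Per-base scores for "GGGGTTTTGGGG" under the run-based rule.
per_base = [4, 4, 4, 4, 0, 0, 0, 0, 4, 4, 4, 4]
print(sketch_window_means(per_base, 4))
# [4.0, 3.0, 2.0, 1.0, 0.0, 1.0, 2.0, 3.0, 4.0]
```

Twelve bases with a window of 4 give 12 - 4 + 1 = 9 window means, one per window start.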

Plotting Module

g4hunterpy3.plotting.complex_plot(hits: list, genome_length: int, out_pdf: Path, nbins: int = 1000, score: float = 1.2, percentile_to_use: int = 95, dpi: int = 300, figsize: tuple = (8, 1.5), colorbar_vmax: float = 3.0, colorbar_vmin: float = None, highlight_regions: list = None, strand_agnostic: bool = True)

Save a PDF plot of window-hit scores binned along the sequence.

Parameters:
  • hits (list) – List of hit objects with start, end, and score attributes. Each hit represents a window with a calculated G4Hunter score.

  • genome_length (int) – Length of the genome/sequence being analyzed; this is needed to map the hits to the full length sequence.

  • out_pdf (Path) – Output PDF file path.

  • nbins (int, optional) – Number of bins for the complex plot, by default 1000.

  • score (float, optional) – Score threshold used for calling hits; it sets the floor of the colorbar, by default 1.2.

  • percentile_to_use (int, optional) – Percentile of scores to use within each bin (e.g., 95 for 95th percentile), by default 95.

  • dpi (int, optional) – Dots per inch for the output PDF, by default 300.

  • figsize (tuple, optional) – Figure size as (width, height) in inches, by default (8, 1.5).

  • colorbar_vmax (float, optional) – Maximum value for colorbar, by default 3.0.

  • colorbar_vmin (float, optional) – Minimum value for colorbar, by default max(score, 0.0).

  • highlight_regions (list, optional) – List of [start, end] pairs defining regions to highlight on the x-axis. Each element should be a list or tuple with two integers representing the start and end positions (1-based) of the region to highlight. Highlighted regions are shown as yellow vertical spans with alpha=0.5. By default None (no highlighting).

  • strand_agnostic (bool, optional) – If True, use absolute G4 scores (ignoring strand) for plotting. If False, use raw scores (which can be negative for C-rich regions). Set to True for dsDNA sequences, where both strands can form G4s. By default True.

Returns:

No return value but writes to file.

Return type:

None
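To make the roles of nbins, percentile_to_use and strand_agnostic more concrete, here is one plausible way to summarize hits per bin. Everything in it is an assumption for illustration: `sketch_bin_hits` is a hypothetical name, hits are simplified to `(start, end, score)` tuples, each hit is assigned to the bin containing its midpoint, and the percentile uses the nearest-rank method. The actual binning and percentile logic of complex_plot may differ.

```python
import math


def sketch_bin_hits(hits, genome_length, nbins=1000, percentile=95, strand_agnostic=True):
    """Return one summary score per bin, or None for bins without hits."""
    bin_width = genome_length / nbins
    per_bin = [[] for _ in range(nbins)]
    for start, end, score in hits:
        value = abs(score) if strand_agnostic else score
        # Assumption: a hit belongs to the bin that contains its midpoint.
        midpoint = (start + end) / 2
        index = min(int(midpoint / bin_width), nbins - 1)
        per_bin[index].append(value)
    summary = []
    for values in per_bin:
        if not values:
            summary.append(None)
            continue
        ordered = sorted(values)
        # Nearest-rank percentile (an assumption, not the library's method).
        rank = max(1, math.ceil(percentile / 100 * len(ordered)))
        summary.append(ordered[rank - 1])
    return summary


hits = [(0, 25, 1.6), (5, 30, -2.0), (60, 85, 1.3)]
print(sketch_bin_hits(hits, genome_length=100, nbins=4))
# [2.0, None, 1.3, None]
print(sketch_bin_hits(hits, genome_length=100, nbins=4, strand_agnostic=False))
# [1.6, None, 1.3, None]
```

With strand_agnostic enabled, the C-rich hit (score -2.0) dominates the first bin through its absolute value. With it disabled, the raw negative score sorts below 1.6, so the bin reports 1.6.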

g4hunterpy3.plotting.simple_plot(scores: ndarray, out_pdf: Path, dpi: int = 300, line_color='red', line_width=0.8) → None

Save a PDF plot of the sliding-window scores.

Parameters:
  • scores (np.ndarray) – Array of sliding-window scores.

  • out_pdf (Path) – Output PDF file path.

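  • dpi (int, optional) – Dots per inch for the output PDF, by default 300.

  • line_color (str, optional) – Color of the plotted line, by default 'red'.

  • line_width (float, optional) – Width of the plotted line, by default 0.8.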

Returns:

No return value but writes to file.

Return type:

None