Coordinates

Pure functions for genomic coordinate math: circular genome wrapping, interval overlap, distance and relative-position calculation, strand-aware offsets, region expansion, and promoter and terminator region extraction.

Normalization

normalize

normalize(pos: int, length: int, circular: bool = True) -> int

Wrap a position into the range [0, length).

When circular is False, pos is returned unchanged: the caller is on a linear chromosome, where wrapping does not apply.

Parameters:

Name	Type	Description	Default
`pos`	`int`	Genomic position (may be negative or >= length).	required
`length`	`int`	Chromosome or sequence length.	required
`circular`	`bool`	Whether to apply modular wrapping. Defaults to `True` for backward compatibility.	`True`

Returns:

Type	Description
`int`	Position wrapped to `[0, length)` if `circular` is true, otherwise `pos` unchanged.

Examples:

>>> normalize(1091300, 1091291)
9
>>> normalize(-5, 100, circular=False)
-5
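
The wrapping relies on Python's `%` operator, which returns a non-negative result for a positive modulus, so negative positions wrap backwards from the end of the sequence. The sketch below uses a standalone copy of the function body from the source listing on this page so that it runs without the package installed; the example values are illustrative.

```python
def normalize(pos: int, length: int, circular: bool = True) -> int:
    # Standalone copy of the listed implementation, for illustration only.
    return pos % length if circular else pos


# Five bases before the origin of a 100 bp circular sequence is position 95.
print(normalize(-5, 100))    # 95
# A position equal to the length wraps back to the origin.
print(normalize(100, 100))   # 0
# Positions already in range are returned as-is.
print(normalize(42, 100))    # 42
```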

Source code in src/seqchain/primitives/coordinates.py

def normalize(pos: int, length: int, circular: bool = True) -> int:
    """Wrap a position into the range ``[0, length)``.

    When *circular* is ``False``, returns *pos* unchanged — the caller
    is on a linear chromosome and wrapping is not applicable.

    Args:
        pos: Genomic position (may be negative or >= *length*).
        length: Chromosome or sequence length.
        circular: Whether to apply modular wrapping.  Defaults to
            ``True`` for backward compatibility.

    Returns:
        Position wrapped to ``[0, length)`` if circular, else *pos*.

    Examples:
        >>> normalize(1091300, 1091291)
        9
        >>> normalize(-5, 100, circular=False)
        -5
    """
    return pos % length if circular else pos

Overlap & distance

interval_overlap

interval_overlap(a_start: int, a_end: int, b_start: int, b_end: int, chrom_length: int | None = None) -> int

Compute the overlap in bp between two intervals.

In linear mode (chrom_length is None), performs a standard interval overlap calculation.

In circular mode (chrom_length given), coordinates are first normalised with % chrom_length and origin-wrapping features (where start > end after normalisation) are handled. This replicates the logic of legacy get_overlap from targets.py.

Parameters:

Name	Type	Description	Default
`a_start`	`int`	Start of the first interval (the "target").	required
`a_end`	`int`	End of the first interval.	required
`b_start`	`int`	Start of the second interval (the "feature").	required
`b_end`	`int`	End of the second interval.	required
`chrom_length`	`int \| None`	Chromosome length for circular mode. `None` (default) selects linear mode.	`None`

Returns:

Type	Description
`int`	Overlap in base pairs. `0` if the intervals do not overlap.

Examples:

>>> interval_overlap(10, 20, 15, 25)
5
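
One way to reason about the circular case is to split any origin-wrapping interval into two linear pieces and sum the pairwise linear overlaps. The sketch below illustrates that idea only; it is not the library's implementation, and the names `linear_overlap`, `split_circular`, and `circular_overlap_by_splitting` are hypothetical.

```python
def linear_overlap(a_start, a_end, b_start, b_end):
    """Standard linear overlap in bp, never negative."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def split_circular(start, end, chrom_length):
    """Split an interval into linear pieces after wrapping both ends."""
    start %= chrom_length
    end %= chrom_length
    if start <= end:
        return [(start, end)]
    # start > end after wrapping: the interval crosses the origin.
    return [(start, chrom_length), (0, end)]


def circular_overlap_by_splitting(a_start, a_end, b_start, b_end, chrom_length):
    """Sum linear overlaps over every pair of pieces (illustrative sketch)."""
    total = 0
    for a_s, a_e in split_circular(a_start, a_end, chrom_length):
        for b_s, b_e in split_circular(b_start, b_end, chrom_length):
            total += linear_overlap(a_s, a_e, b_s, b_e)
    return total


# Feature spanning the origin of a 1000 bp chromosome: 950 -> 1050 (i.e. 50).
print(circular_overlap_by_splitting(10, 30, 950, 1050, 1000))    # 20
print(circular_overlap_by_splitting(960, 980, 950, 1050, 1000))  # 20
print(circular_overlap_by_splitting(500, 520, 950, 1050, 1000))  # 0
```

Splitting keeps each comparison linear, which makes the origin-wrapping cases easier to check by hand.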

Source code in src/seqchain/primitives/coordinates.py

def interval_overlap(
    a_start: int,
    a_end: int,
    b_start: int,
    b_end: int,
    chrom_length: int | None = None,
) -> int:
    """Compute the overlap in bp between two intervals.

    In **linear mode** (``chrom_length`` is ``None``), performs a standard
    interval overlap calculation.

    In **circular mode** (``chrom_length`` given), coordinates are first
    normalised with ``% chrom_length`` and origin-wrapping features
    (where ``start > end`` after normalisation) are handled. This
    replicates the logic of legacy ``get_overlap`` from ``targets.py``.

    Args:
        a_start: Start of the first interval (the "target").
        a_end: End of the first interval.
        b_start: Start of the second interval (the "feature").
        b_end: End of the second interval.
        chrom_length: Chromosome length for circular mode.
            ``None`` (default) selects linear mode.

    Returns:
        Overlap in base pairs. ``0`` if the intervals do not overlap.

    Examples:
        >>> interval_overlap(10, 20, 15, 25)
        5
    """
    if chrom_length is None:
        # Linear overlap
        overlap = min(a_end, b_end) - max(a_start, b_start)
        return max(0, overlap)

    # Circular mode — replicate legacy get_overlap exactly
    a_start = a_start % chrom_length
    a_end = a_end % chrom_length
    b_start = b_start % chrom_length
    b_end = b_end % chrom_length

    # Determine which interval wraps and which is the "feature"
    # Legacy convention: b is the feature
    if b_start > b_end:  # Feature wraps around origin
        if a_start < b_end:  # Target in early part (after origin)
            overlap_start = a_start
            overlap_end = min(a_end, b_end)
        elif a_start >= b_start:  # Target in later part (before origin)
            overlap_start = a_start
            overlap_end = min(a_end, chrom_length)
            if a_end < b_end:  # Target also wraps — add early overlap
                overlap_end += min(a_end, b_end)
        else:
            return 0
    else:  # Normal feature (no wrapping)
        overlap_start = max(a_start, b_start)
        overlap_end = min(a_end, b_end)

    overlap = overlap_end - overlap_start
    return overlap if overlap >= 0 else 0

distance_to_feature

distance_to_feature(pos: int, feat_start: int, feat_end: int, chrom_length: int | None = None) -> int

Compute the shortest distance from a position to a feature.

Returns 0 when pos is inside [feat_start, feat_end). When chrom_length is given, the shortest circular-genome distance is returned (wrapping around the origin if that path is shorter).

Parameters:

Name	Type	Description	Default
`pos`	`int`	Query position.	required
`feat_start`	`int`	Feature start (inclusive).	required
`feat_end`	`int`	Feature end (exclusive).	required
`chrom_length`	`int \| None`	Chromosome length for circular mode. `None` (default) selects linear mode.	`None`

Returns:

Type	Description
`int`	Distance in base pairs (always >= 0).

Examples:

>>> distance_to_feature(90, 100, 200)
10
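
The effect of `chrom_length` is easiest to see with a feature near the end of the chromosome and a query just past the origin. The sketch below uses a standalone copy of the logic from the source listing on this page (with `Optional[int]` in place of `int | None` so it runs on older Python versions); the coordinates are illustrative.

```python
from typing import Optional


def distance_to_feature(pos: int, feat_start: int, feat_end: int,
                        chrom_length: Optional[int] = None) -> int:
    # Standalone copy of the listed logic, for illustration only.
    if feat_start <= pos < feat_end:
        return 0
    d_start = abs(pos - feat_start)
    d_end = abs(pos - feat_end)
    if chrom_length is not None:
        # Also consider the path that crosses the origin.
        d_start = min(d_start, chrom_length - d_start)
        d_end = min(d_end, chrom_length - d_end)
    return min(d_start, d_end)


# Feature occupying the last 10 bp of a 1000 bp chromosome.
linear = distance_to_feature(5, 990, 1000)
circular = distance_to_feature(5, 990, 1000, chrom_length=1000)
print(linear)    # 985: the only path runs along the chromosome
print(circular)  # 5: the path across the origin is shorter
```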

Source code in src/seqchain/primitives/coordinates.py

def distance_to_feature(
    pos: int,
    feat_start: int,
    feat_end: int,
    chrom_length: int | None = None,
) -> int:
    """Compute the shortest distance from a position to a feature.

    Returns ``0`` when *pos* is inside ``[feat_start, feat_end)``.
    When *chrom_length* is given, the shortest circular-genome distance
    is returned (wrapping around the origin if that path is shorter).

    Args:
        pos: Query position.
        feat_start: Feature start (inclusive).
        feat_end: Feature end (exclusive).
        chrom_length: Chromosome length for circular mode.
            ``None`` (default) selects linear mode.

    Returns:
        Distance in base pairs (always >= 0).

    Examples:
        >>> distance_to_feature(90, 100, 200)
        10
    """
    if feat_start <= pos < feat_end:
        return 0

    d_start = abs(pos - feat_start)
    d_end = abs(pos - feat_end)

    if chrom_length is not None:
        # Circular: also consider wrap-around distances
        d_start = min(d_start, chrom_length - d_start)
        d_end = min(d_end, chrom_length - d_end)

    return min(d_start, d_end)

relative_position

relative_position(pos: int, feat_start: int, feat_end: int, chrom_length: int = 0) -> float

Compute the fractional position within a feature.

Returns 0.0 at feat_start and 1.0 at feat_end. Values outside [0, 1] indicate the position is outside the feature.

When chrom_length is positive and the feature uses virtual coordinates (feat_end > chrom_length), positions in the wrapped portion (pos < feat_start) are shifted by chrom_length so the fraction is computed correctly.

Parameters:

Name	Type	Description	Default
`pos`	`int`	Query position.	required
`feat_start`	`int`	Feature start.	required
`feat_end`	`int`	Feature end.	required
`chrom_length`	`int`	Chromosome length for origin-wrapping features. `0` (default) disables wrapping adjustment.	`0`

Returns:

Type	Description
`float`	Fraction in the range `(-inf, +inf)`. `0.0` for zero-length features.

Examples:

>>> relative_position(150, 100, 200)
0.5
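
The virtual-coordinate adjustment can be illustrated with a feature that runs past the end of the chromosome. The sketch below uses a standalone copy of the logic from the source listing on this page; the 1000 bp chromosome and the feature at 900 to 1100 are illustrative values.

```python
def relative_position(pos: int, feat_start: int, feat_end: int,
                      chrom_length: int = 0) -> float:
    # Standalone copy of the listed logic, for illustration only.
    length = feat_end - feat_start
    if length == 0:
        return 0.0
    if chrom_length > 0 and feat_end > chrom_length and pos < feat_start:
        # Shift a position in the wrapped portion into virtual coordinates.
        pos = pos + chrom_length
    return (pos - feat_start) / length


# Feature 900..1100 on a 1000 bp circular chromosome (wraps the origin).
print(relative_position(50, 900, 1100, chrom_length=1000))   # 0.75
# Without chrom_length the same query is treated as far before the feature.
print(relative_position(50, 900, 1100))                      # -4.25
# A position before the origin needs no shift.
print(relative_position(950, 900, 1100, chrom_length=1000))  # 0.25
```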

Source code in src/seqchain/primitives/coordinates.py

def relative_position(
    pos: int,
    feat_start: int,
    feat_end: int,
    chrom_length: int = 0,
) -> float:
    """Compute the fractional position within a feature.

    Returns ``0.0`` at *feat_start* and ``1.0`` at *feat_end*. Values
    outside ``[0, 1]`` indicate the position is outside the feature.

    When *chrom_length* is positive and the feature uses virtual
    coordinates (``feat_end > chrom_length``), positions in the wrapped
    portion (``pos < feat_start``) are shifted by *chrom_length* so the
    fraction is computed correctly.

    Args:
        pos: Query position.
        feat_start: Feature start.
        feat_end: Feature end.
        chrom_length: Chromosome length for origin-wrapping features.
            ``0`` (default) disables wrapping adjustment.

    Returns:
        Fraction in the range ``(-inf, +inf)``. ``0.0`` for
        zero-length features.

    Examples:
        >>> relative_position(150, 100, 200)
        0.5
    """
    length = feat_end - feat_start
    if length == 0:
        return 0.0
    if chrom_length > 0 and feat_end > chrom_length and pos < feat_start:
        pos = pos + chrom_length
    return (pos - feat_start) / length

offset_in_feature

offset_in_feature(target_start: int, target_end: int, feat_start: int, feat_end: int, strand: str | int, chrom_length: int) -> int

Compute a strand-aware offset of a target within a genomic feature.

Replicates the logic of legacy get_offset from targets.py. All four coordinates are normalised with % chrom_length before the calculation, and origin-wrapping genes (where feat_start > feat_end after normalisation) are handled correctly.

For forward-strand features the offset is the distance from the feature start to the target start. For reverse-strand features it is the distance from the target end to the feature end.

Parameters:

Name	Type	Description	Default
`target_start`	`int`	Target start position.	required
`target_end`	`int`	Target end position.	required
`feat_start`	`int`	Feature (gene) start position.	required
`feat_end`	`int`	Feature (gene) end position.	required
`strand`	`str \| int`	Strand indicator. Accepted values: `"+"`, `"F"`, `1` for forward; `"-"`, `"R"`, `-1` for reverse.	required
`chrom_length`	`int`	Chromosome length for coordinate normalisation.	required

Returns:

Type	Description
`int`	Signed offset in base pairs.

Examples:

>>> offset_in_feature(1500, 1520, 1000, 2000, "+", 1091291)
500

Raises:

Type	Description
`ValueError`	If strand is not a recognized strand indicator.
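
For a feature that does not wrap the origin, the strand rule reduces to two subtractions. The sketch below is a simplified illustration under that assumption, not the library function: `strand_sign` is a hypothetical stand-in for the private strand validator, and `simple_offset` is a hypothetical name that skips normalisation and origin-wrapping.

```python
def strand_sign(strand) -> int:
    """Hypothetical helper: map a strand indicator to +1 or -1."""
    if strand in ("+", "F", 1):
        return 1
    if strand in ("-", "R", -1):
        return -1
    raise ValueError(f"Unrecognized strand: {strand!r}")


def simple_offset(target_start, target_end, feat_start, feat_end, strand):
    """Offset for a non-wrapping feature with in-range coordinates (assumption)."""
    if strand_sign(strand) == 1:
        # Forward strand: distance from feature start to target start.
        return target_start - feat_start
    # Reverse strand: distance from target end to feature end.
    return feat_end - target_end


print(simple_offset(1500, 1520, 1000, 2000, "+"))  # 500
print(simple_offset(1500, 1520, 1000, 2000, "-"))  # 480
# The result is signed: a target before a forward-strand gene is negative.
print(simple_offset(900, 920, 1000, 2000, "+"))    # -100
```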

Source code in src/seqchain/primitives/coordinates.py

def offset_in_feature(
    target_start: int,
    target_end: int,
    feat_start: int,
    feat_end: int,
    strand: str | int,
    chrom_length: int,
) -> int:
    """Compute a strand-aware offset of a target within a genomic feature.

    Replicates the logic of legacy ``get_offset`` from ``targets.py``.
    All four coordinates are normalised with ``% chrom_length`` before
    the calculation, and origin-wrapping genes (where ``feat_start >
    feat_end`` after normalisation) are handled correctly.

    For **forward-strand** features the offset is the distance from the
    feature start to the target start. For **reverse-strand** features
    it is the distance from the target end to the feature end.

    Args:
        target_start: Target start position.
        target_end: Target end position.
        feat_start: Feature (gene) start position.
        feat_end: Feature (gene) end position.
        strand: Strand indicator. Accepted values: ``"+"``, ``"F"``,
            ``1`` for forward; ``"-"``, ``"R"``, ``-1`` for reverse.
        chrom_length: Chromosome length for coordinate normalisation.

    Returns:
        Signed offset in base pairs.

    Examples:
        >>> offset_in_feature(1500, 1520, 1000, 2000, "+", 1091291)
        500

    Raises:
        ValueError: If *strand* is not a recognized strand indicator.
    """
    sign = _validate_strand(strand)
    # Normalize all coordinates
    ts = target_start % chrom_length
    te = target_end % chrom_length
    fs = feat_start % chrom_length
    fe = feat_end % chrom_length

    if sign == 1:
        if fs > fe:  # Gene wraps around origin
            if ts < fe:  # Target in early part (after origin)
                return ts + (chrom_length - fs)
            else:  # Target in later part
                return ts - fs
        else:  # Normal gene
            return ts - fs
    else:
        if fs > fe:  # Gene wraps around origin
            if te < fe:  # Target in early part
                return (chrom_length - te) + (fe - fs)
            else:  # Target in later part
                return fe - te
        else:  # Normal gene
            return fe - te

Region geometry

expand_region

expand_region(start: int, end: int, upstream: int, downstream: int, chrom_length: int | None = None) -> tuple[int, int]

Expand an interval by the given number of bases on each side.

In linear mode the start is clamped to 0 (no negative coordinates). In circular mode coordinates wrap via modulo.

Parameters:

Name	Type	Description	Default
`start`	`int`	Interval start.	required
`end`	`int`	Interval end.	required
`upstream`	`int`	Bases to add before start.	required
`downstream`	`int`	Bases to add after end.	required
`chrom_length`	`int \| None`	Chromosome length for circular wrapping. `None` (default) selects linear mode.	`None`

Returns:

Type	Description
`tuple[int, int]`	`(new_start, new_end)` tuple.

Examples:

>>> expand_region(1000, 2000, 500, 500)
(500, 2500)
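
The two modes differ most visibly when the expansion runs past position 0. The sketch below uses a standalone copy of the logic from the source listing on this page (with `Optional[int]` in place of `int | None`); the coordinates are illustrative.

```python
from typing import Optional, Tuple


def expand_region(start: int, end: int, upstream: int, downstream: int,
                  chrom_length: Optional[int] = None) -> Tuple[int, int]:
    # Standalone copy of the listed logic, for illustration only.
    new_start = start - upstream
    new_end = end + downstream
    if chrom_length is not None:
        new_start = new_start % chrom_length
        new_end = new_end % chrom_length
    else:
        new_start = max(0, new_start)
    return (new_start, new_end)


# Linear mode: the start is clamped at 0; the end is not bounded.
print(expand_region(100, 200, 500, 500))                    # (0, 700)
# Circular mode: the start wraps to the far side of the origin,
# so the returned start is greater than the returned end.
print(expand_region(100, 200, 200, 50, chrom_length=1000))  # (900, 250)
```

In circular mode a returned start greater than the end indicates that the expanded region crosses the origin.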

Source code in src/seqchain/primitives/coordinates.py

def expand_region(
    start: int,
    end: int,
    upstream: int,
    downstream: int,
    chrom_length: int | None = None,
) -> tuple[int, int]:
    """Expand an interval by the given number of bases on each side.

    In **linear mode** the start is clamped to 0 (no negative
    coordinates). In **circular mode** coordinates wrap via modulo.

    Args:
        start: Interval start.
        end: Interval end.
        upstream: Bases to add before *start*.
        downstream: Bases to add after *end*.
        chrom_length: Chromosome length for circular wrapping.
            ``None`` (default) selects linear mode.

    Returns:
        ``(new_start, new_end)`` tuple.

    Examples:
        >>> expand_region(1000, 2000, 500, 500)
        (500, 2500)
    """
    new_start = start - upstream
    new_end = end + downstream

    if chrom_length is not None:
        new_start = new_start % chrom_length
        new_end = new_end % chrom_length
    else:
        new_start = max(0, new_start)

    return (new_start, new_end)

promoter_region

promoter_region(gene_start: int, gene_end: int, strand: str | int, upstream: int = 500) -> tuple[int, int]

Derive promoter coordinates from gene boundaries and strand.

The promoter is the region of upstream bp immediately before the transcription start site. For forward-strand genes it ends at gene_start and covers the bases at lower coordinates; for reverse-strand genes it begins at gene_end and covers the bases at higher coordinates.

Parameters:

Name	Type	Description	Default
`gene_start`	`int`	Gene start position (0-based).	required
`gene_end`	`int`	Gene end position (exclusive).	required
`strand`	`str \| int`	Strand indicator. Accepted values: `"+"`, `"F"`, `1` for forward; `"-"`, `"R"`, `-1` for reverse.	required
`upstream`	`int`	Promoter length in bp. Defaults to 500.	`500`

Returns:

Type	Description
`tuple[int, int]`	`(promoter_start, promoter_end)` tuple. The start may be negative for genes near position 0; use `normalize()` to wrap on circular genomes.

Examples:

>>> promoter_region(10000, 11000, "+")
(9500, 10000)

Raises:

Type	Description
`ValueError`	If strand is not a recognized strand indicator.
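
The strand dependence and the negative-start case can be sketched as follows. This is an illustration rather than the packaged function: `strand_sign` is a hypothetical stand-in for the private strand validator, and wrapping is done with a plain `%`, which is what `normalize()` does in circular mode according to the listing on this page.

```python
def strand_sign(strand) -> int:
    """Hypothetical helper: map a strand indicator to +1 or -1."""
    if strand in ("+", "F", 1):
        return 1
    if strand in ("-", "R", -1):
        return -1
    raise ValueError(f"Unrecognized strand: {strand!r}")


def promoter_region(gene_start, gene_end, strand, upstream=500):
    # Same branch logic as the listed implementation.
    if strand_sign(strand) == 1:
        return (gene_start - upstream, gene_start)
    return (gene_end, gene_end + upstream)


CHROM_LENGTH = 1091291  # chromosome length used in other examples on this page

# Reverse-strand gene: the promoter sits past gene_end.
print(promoter_region(10000, 11000, "-"))   # (11000, 11500)

# Forward-strand gene near position 0: the start comes back negative.
start, end = promoter_region(200, 1200, "+")
print((start, end))                         # (-300, 200)

# Wrap both coordinates for a circular genome.
wrapped = (start % CHROM_LENGTH, end % CHROM_LENGTH)
print(wrapped)                              # (1090991, 200)
```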

Source code in src/seqchain/primitives/coordinates.py

def promoter_region(
    gene_start: int,
    gene_end: int,
    strand: str | int,
    upstream: int = 500,
) -> tuple[int, int]:
    """Derive promoter coordinates from gene boundaries and strand.

    The promoter is the region of *upstream* bp immediately before the
    transcription start site. For forward-strand genes that is upstream
    of *gene_start*; for reverse-strand genes it is downstream of
    *gene_end*.

    Args:
        gene_start: Gene start position (0-based).
        gene_end: Gene end position (exclusive).
        strand: Strand indicator. Accepted values: ``"+"``, ``"F"``,
            ``1`` for forward; ``"-"``, ``"R"``, ``-1`` for reverse.
        upstream: Promoter length in bp. Defaults to 500.

    Returns:
        ``(promoter_start, promoter_end)`` tuple. The start may be
        negative for genes near position 0; use
        `normalize()` to wrap on circular genomes.

    Examples:
        >>> promoter_region(10000, 11000, "+")
        (9500, 10000)

    Raises:
        ValueError: If *strand* is not a recognized strand indicator.
    """
    sign = _validate_strand(strand)
    if sign == 1:
        return (gene_start - upstream, gene_start)
    else:
        return (gene_end, gene_end + upstream)

terminator_region

terminator_region(gene_start: int, gene_end: int, strand: str | int, downstream: int = 500) -> tuple[int, int]

Derive terminator coordinates from gene boundaries and strand.

The terminator is the region of downstream bp immediately after the transcription stop site. For forward-strand genes it begins at gene_end and covers the bases at higher coordinates; for reverse-strand genes it ends at gene_start and covers the bases at lower coordinates.

Parameters:

Name	Type	Description	Default
`gene_start`	`int`	Gene start position (0-based).	required
`gene_end`	`int`	Gene end position (exclusive).	required
`strand`	`str \| int`	Strand indicator. Accepted values: `"+"`, `"F"`, `1` for forward; `"-"`, `"R"`, `-1` for reverse.	required
`downstream`	`int`	Terminator length in bp. Defaults to 500.	`500`

Returns:

Type	Description
`tuple[int, int]`	`(terminator_start, terminator_end)` tuple. The start may be negative for reverse-strand genes near position 0; use `normalize()` to wrap on circular genomes.

Examples:

>>> terminator_region(10000, 11000, "+")
(11000, 11500)

Raises:

Type	Description
`ValueError`	If strand is not a recognized strand indicator.
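
The terminator mirrors the promoter: for the same gene boundaries and flank length, the terminator of a forward-strand gene covers the same coordinates as the promoter of a reverse-strand gene, and vice versa. The sketch below illustrates that symmetry; `strand_sign` is a hypothetical stand-in for the private strand validator, and both region functions are reduced to the branch logic shown in the listings on this page.

```python
def strand_sign(strand) -> int:
    """Hypothetical helper: map a strand indicator to +1 or -1."""
    if strand in ("+", "F", 1):
        return 1
    if strand in ("-", "R", -1):
        return -1
    raise ValueError(f"Unrecognized strand: {strand!r}")


def promoter_region(gene_start, gene_end, strand, upstream=500):
    if strand_sign(strand) == 1:
        return (gene_start - upstream, gene_start)
    return (gene_end, gene_end + upstream)


def terminator_region(gene_start, gene_end, strand, downstream=500):
    if strand_sign(strand) == 1:
        return (gene_end, gene_end + downstream)
    return (gene_start - downstream, gene_start)


# Reverse-strand gene: the terminator sits before gene_start.
print(terminator_region(10000, 11000, "-"))  # (9500, 10000)

# The two flanks swap roles when the strand flips.
print(terminator_region(10000, 11000, "+") == promoter_region(10000, 11000, "-"))  # True
print(terminator_region(10000, 11000, "-") == promoter_region(10000, 11000, "+"))  # True
```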

Source code in src/seqchain/primitives/coordinates.py

def terminator_region(
    gene_start: int,
    gene_end: int,
    strand: str | int,
    downstream: int = 500,
) -> tuple[int, int]:
    """Derive terminator coordinates from gene boundaries and strand.

    The terminator is the region of *downstream* bp immediately after the
    transcription stop site. For forward-strand genes that is downstream
    of *gene_end*; for reverse-strand genes it is upstream of
    *gene_start*.

    Args:
        gene_start: Gene start position (0-based).
        gene_end: Gene end position (exclusive).
        strand: Strand indicator. Accepted values: ``"+"``, ``"F"``,
            ``1`` for forward; ``"-"``, ``"R"``, ``-1`` for reverse.
        downstream: Terminator length in bp. Defaults to 500.

    Returns:
        ``(terminator_start, terminator_end)`` tuple. The start may be
        negative for reverse-strand genes near position 0; use
        `normalize()` to wrap on circular genomes.

    Examples:
        >>> terminator_region(10000, 11000, "+")
        (11000, 11500)

    Raises:
        ValueError: If *strand* is not a recognized strand indicator.
    """
    sign = _validate_strand(strand)
    if sign == 1:
        return (gene_end, gene_end + downstream)
    else:
        return (gene_start - downstream, gene_start)