Coordinates

Pure functions for genomic coordinate math: circular genome wrapping, interval overlap, distance calculation, strand-aware offset, and promoter region extraction.

Normalization

normalize

normalize(pos: int, length: int, circular: bool = True) -> int

Wrap a position into the range [0, length).

When circular is False, returns pos unchanged — the caller is on a linear chromosome and wrapping is not applicable.

Parameters:

Name Type Description Default
pos int

Genomic position (may be negative or >= length).

required
length int

Chromosome or sequence length.

required
circular bool

Whether to apply modular wrapping. Defaults to True for backward compatibility.

True

Returns:

Type Description
int

Position wrapped to [0, length) if circular, else pos.

Examples:

>>> normalize(1091300, 1091291)
9
>>> normalize(-5, 100, circular=False)
-5

Source code in src/seqchain/primitives/coordinates.py
def normalize(pos: int, length: int, circular: bool = True) -> int:
    """Wrap a position into the range ``[0, length)``.

    When *circular* is ``False``, returns *pos* unchanged — the caller
    is on a linear chromosome and wrapping is not applicable.

    Args:
        pos: Genomic position (may be negative or >= *length*).
        length: Chromosome or sequence length.
        circular: Whether to apply modular wrapping.  Defaults to
            ``True`` for backward compatibility.

    Returns:
        Position wrapped to ``[0, length)`` if circular, else *pos*.

    Examples:
        >>> normalize(1091300, 1091291)
        9
        >>> normalize(-5, 100, circular=False)
        -5
    """
    return pos % length if circular else pos
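
As a quick check of the wrapping behaviour, the sketch below repeats the one-line body from the listing so that it runs without the package installed. It then tries a negative position, a position past the end, a position equal to the length, and the linear passthrough. The name GENOME_LENGTH is illustrative only; its value is the length used in the docstring example.

```python
def normalize(pos: int, length: int, circular: bool = True) -> int:
    # Same body as the listing, repeated so the snippet is standalone.
    return pos % length if circular else pos


GENOME_LENGTH = 1_091_291  # illustrative constant, taken from the docstring example

# Python's % returns a non-negative result for a positive divisor,
# so positions before the origin wrap to the end of the genome.
wrapped_negative = normalize(-5, GENOME_LENGTH)                  # 1091286
wrapped_overflow = normalize(GENOME_LENGTH + 9, GENOME_LENGTH)   # 9
at_length = normalize(GENOME_LENGTH, GENOME_LENGTH)              # 0
linear = normalize(-5, GENOME_LENGTH, circular=False)            # -5 (unchanged)

print(wrapped_negative, wrapped_overflow, at_length, linear)
```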

Overlap & distance

interval_overlap

interval_overlap(a_start: int, a_end: int, b_start: int, b_end: int, chrom_length: int | None = None) -> int

Compute the overlap in bp between two intervals.

In linear mode (chrom_length is None), performs a standard interval overlap calculation.

In circular mode (chrom_length given), coordinates are first normalised with % chrom_length and origin-wrapping features (where start > end after normalisation) are handled. This replicates the logic of legacy get_overlap from targets.py.

Parameters:

Name Type Description Default
a_start int

Start of the first interval (the "target").

required
a_end int

End of the first interval.

required
b_start int

Start of the second interval (the "feature").

required
b_end int

End of the second interval.

required
chrom_length int | None

Chromosome length for circular mode. None (default) selects linear mode.

None

Returns:

Type Description
int

Overlap in base pairs. 0 if the intervals do not overlap.

Examples:

>>> interval_overlap(10, 20, 15, 25)
5

Source code in src/seqchain/primitives/coordinates.py
def interval_overlap(
    a_start: int,
    a_end: int,
    b_start: int,
    b_end: int,
    chrom_length: int | None = None,
) -> int:
    """Compute the overlap in bp between two intervals.

    In **linear mode** (``chrom_length`` is ``None``), performs a standard
    interval overlap calculation.

    In **circular mode** (``chrom_length`` given), coordinates are first
    normalised with ``% chrom_length`` and origin-wrapping features
    (where ``start > end`` after normalisation) are handled. This
    replicates the logic of legacy ``get_overlap`` from ``targets.py``.

    Args:
        a_start: Start of the first interval (the "target").
        a_end: End of the first interval.
        b_start: Start of the second interval (the "feature").
        b_end: End of the second interval.
        chrom_length: Chromosome length for circular mode.
            ``None`` (default) selects linear mode.

    Returns:
        Overlap in base pairs. ``0`` if the intervals do not overlap.

    Examples:
        >>> interval_overlap(10, 20, 15, 25)
        5
    """
    if chrom_length is None:
        # Linear overlap
        overlap = min(a_end, b_end) - max(a_start, b_start)
        return max(0, overlap)

    # Circular mode — replicate legacy get_overlap exactly
    a_start = a_start % chrom_length
    a_end = a_end % chrom_length
    b_start = b_start % chrom_length
    b_end = b_end % chrom_length

    # Determine which interval wraps and which is the "feature"
    # Legacy convention: b is the feature
    if b_start > b_end:  # Feature wraps around origin
        if a_start < b_end:  # Target in early part (after origin)
            overlap_start = a_start
            overlap_end = min(a_end, b_end)
        elif a_start >= b_start:  # Target in later part (before origin)
            overlap_start = a_start
            overlap_end = min(a_end, chrom_length)
            if a_end < b_end:  # Target also wraps — add early overlap
                overlap_end += min(a_end, b_end)
        else:
            return 0
    else:  # Normal feature (no wrapping)
        overlap_start = max(a_start, b_start)
        overlap_end = min(a_end, b_end)

    overlap = overlap_end - overlap_start
    return overlap if overlap >= 0 else 0
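
To make the circular branch concrete, the sketch below copies the function body from the listing (so it runs standalone) and applies it to a feature that crosses the origin of a hypothetical 100 bp chromosome. The chromosome length and all coordinates are made up for illustration, and the targets used here do not themselves cross the origin.

```python
from typing import Optional


def interval_overlap(
    a_start: int,
    a_end: int,
    b_start: int,
    b_end: int,
    chrom_length: Optional[int] = None,
) -> int:
    # Body copied from the listing so the snippet is self-contained.
    if chrom_length is None:
        return max(0, min(a_end, b_end) - max(a_start, b_start))

    a_start, a_end = a_start % chrom_length, a_end % chrom_length
    b_start, b_end = b_start % chrom_length, b_end % chrom_length

    if b_start > b_end:  # feature wraps around the origin
        if a_start < b_end:  # target sits just after the origin
            overlap_start = a_start
            overlap_end = min(a_end, b_end)
        elif a_start >= b_start:  # target sits just before the origin
            overlap_start = a_start
            overlap_end = min(a_end, chrom_length)
            if a_end < b_end:
                overlap_end += min(a_end, b_end)
        else:
            return 0
    else:  # feature does not wrap
        overlap_start = max(a_start, b_start)
        overlap_end = min(a_end, b_end)

    overlap = overlap_end - overlap_start
    return overlap if overlap >= 0 else 0


CHROM = 100  # hypothetical chromosome length

# Feature 90..110 normalises to start=90, end=10, i.e. it wraps.
before_origin = interval_overlap(95, 99, 90, 110, CHROM)  # 4
after_origin = interval_overlap(2, 8, 90, 110, CHROM)     # 6
far_away = interval_overlap(50, 60, 90, 110, CHROM)       # 0
linear = interval_overlap(10, 20, 15, 25)                 # 5

print(before_origin, after_origin, far_away, linear)
```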

distance_to_feature

distance_to_feature(pos: int, feat_start: int, feat_end: int, chrom_length: int | None = None) -> int

Compute the shortest distance from a position to a feature.

Returns 0 when pos is inside [feat_start, feat_end). When chrom_length is given, the shortest circular-genome distance is returned (wrapping around the origin if that path is shorter).

Parameters:

Name Type Description Default
pos int

Query position.

required
feat_start int

Feature start (inclusive).

required
feat_end int

Feature end (exclusive).

required
chrom_length int | None

Chromosome length for circular mode. None (default) selects linear mode.

None

Returns:

Type Description
int

Distance in base pairs (always >= 0).

Examples:

>>> distance_to_feature(90, 100, 200)
10

Source code in src/seqchain/primitives/coordinates.py
def distance_to_feature(
    pos: int,
    feat_start: int,
    feat_end: int,
    chrom_length: int | None = None,
) -> int:
    """Compute the shortest distance from a position to a feature.

    Returns ``0`` when *pos* is inside ``[feat_start, feat_end)``.
    When *chrom_length* is given, the shortest circular-genome distance
    is returned (wrapping around the origin if that path is shorter).

    Args:
        pos: Query position.
        feat_start: Feature start (inclusive).
        feat_end: Feature end (exclusive).
        chrom_length: Chromosome length for circular mode.
            ``None`` (default) selects linear mode.

    Returns:
        Distance in base pairs (always >= 0).

    Examples:
        >>> distance_to_feature(90, 100, 200)
        10
    """
    if feat_start <= pos < feat_end:
        return 0

    d_start = abs(pos - feat_start)
    d_end = abs(pos - feat_end)

    if chrom_length is not None:
        # Circular: also consider wrap-around distances
        d_start = min(d_start, chrom_length - d_start)
        d_end = min(d_end, chrom_length - d_end)

    return min(d_start, d_end)
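
The difference between the two modes is easiest to see with a query near the end of the chromosome and a feature near its start. The sketch below repeats the function body from the listing so it runs standalone; the 1000 bp chromosome length and the coordinates are illustrative assumptions.

```python
from typing import Optional


def distance_to_feature(
    pos: int,
    feat_start: int,
    feat_end: int,
    chrom_length: Optional[int] = None,
) -> int:
    # Body copied from the listing so the snippet is self-contained.
    if feat_start <= pos < feat_end:
        return 0

    d_start = abs(pos - feat_start)
    d_end = abs(pos - feat_end)

    if chrom_length is not None:
        # Also consider the path that goes through the origin.
        d_start = min(d_start, chrom_length - d_start)
        d_end = min(d_end, chrom_length - d_end)

    return min(d_start, d_end)


# Query at 990, feature at [10, 50) on a hypothetical 1000 bp chromosome.
linear = distance_to_feature(990, 10, 50)                       # 940
circular = distance_to_feature(990, 10, 50, chrom_length=1000)  # 20
inside = distance_to_feature(30, 10, 50, chrom_length=1000)     # 0

print(linear, circular, inside)
```

In linear mode the nearest edge is the feature end, 940 bp away. In circular mode the path through the origin to the feature start is only 20 bp.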

relative_position

relative_position(pos: int, feat_start: int, feat_end: int, chrom_length: int = 0) -> float

Compute the fractional position within a feature.

Returns 0.0 at feat_start and 1.0 at feat_end. Values outside [0, 1] indicate the position is outside the feature.

When chrom_length is positive and the feature uses virtual coordinates (feat_end > chrom_length), positions in the wrapped portion (pos < feat_start) are shifted by chrom_length so the fraction is computed correctly.

Parameters:

Name Type Description Default
pos int

Query position.

required
feat_start int

Feature start.

required
feat_end int

Feature end.

required
chrom_length int

Chromosome length for origin-wrapping features. 0 (default) disables wrapping adjustment.

0

Returns:

Type Description
float

Fraction in the range (-inf, +inf). 0.0 for zero-length features.

Examples:

>>> relative_position(150, 100, 200)
0.5

Source code in src/seqchain/primitives/coordinates.py
def relative_position(
    pos: int,
    feat_start: int,
    feat_end: int,
    chrom_length: int = 0,
) -> float:
    """Compute the fractional position within a feature.

    Returns ``0.0`` at *feat_start* and ``1.0`` at *feat_end*. Values
    outside ``[0, 1]`` indicate the position is outside the feature.

    When *chrom_length* is positive and the feature uses virtual
    coordinates (``feat_end > chrom_length``), positions in the wrapped
    portion (``pos < feat_start``) are shifted by *chrom_length* so the
    fraction is computed correctly.

    Args:
        pos: Query position.
        feat_start: Feature start.
        feat_end: Feature end.
        chrom_length: Chromosome length for origin-wrapping features.
            ``0`` (default) disables wrapping adjustment.

    Returns:
        Fraction in the range ``(-inf, +inf)``. ``0.0`` for
        zero-length features.

    Examples:
        >>> relative_position(150, 100, 200)
        0.5
    """
    length = feat_end - feat_start
    if length == 0:
        return 0.0
    if chrom_length > 0 and feat_end > chrom_length and pos < feat_start:
        pos = pos + chrom_length
    return (pos - feat_start) / length
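
The virtual-coordinate adjustment can be illustrated with a feature that runs past the end of a hypothetical 1000 bp chromosome. The sketch below repeats the function body from the listing so it runs standalone; all coordinates are made up for the example.

```python
def relative_position(
    pos: int,
    feat_start: int,
    feat_end: int,
    chrom_length: int = 0,
) -> float:
    # Body copied from the listing so the snippet is self-contained.
    length = feat_end - feat_start
    if length == 0:
        return 0.0
    if chrom_length > 0 and feat_end > chrom_length and pos < feat_start:
        pos = pos + chrom_length
    return (pos - feat_start) / length


inside = relative_position(150, 100, 200)    # 0.5
past_end = relative_position(250, 100, 200)  # 1.5  (outside the feature)
before = relative_position(50, 100, 200)     # -0.5 (outside the feature)

# Feature [950, 1050) in virtual coordinates; position 25 lies in the
# part that wrapped past the origin.
wrapped = relative_position(25, 950, 1050, chrom_length=1000)  # 0.75
unadjusted = relative_position(25, 950, 1050)                  # -9.25

empty = relative_position(5, 7, 7)  # 0.0 for a zero-length feature

print(inside, past_end, before, wrapped, unadjusted, empty)
```

Without chrom_length the same query is reported as far outside the feature, which is why callers working with origin-wrapping features need to pass it.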

offset_in_feature

offset_in_feature(target_start: int, target_end: int, feat_start: int, feat_end: int, strand: str | int, chrom_length: int) -> int

Compute a strand-aware offset of a target within a genomic feature.

Replicates the logic of legacy get_offset from targets.py. All four coordinates are normalised with % chrom_length before the calculation, and origin-wrapping genes (where feat_start > feat_end after normalisation) are handled correctly.

For forward-strand features the offset is the distance from the feature start to the target start. For reverse-strand features it is the distance from the target end to the feature end.

Parameters:

Name Type Description Default
target_start int

Target start position.

required
target_end int

Target end position.

required
feat_start int

Feature (gene) start position.

required
feat_end int

Feature (gene) end position.

required
strand str | int

Strand indicator. Accepted values: "+", "F", 1 for forward; "-", "R", -1 for reverse.

required
chrom_length int

Chromosome length for coordinate normalisation.

required

Returns:

Type Description
int

Signed offset in base pairs.

Examples:

>>> offset_in_feature(1500, 1520, 1000, 2000, "+", 1091291)
500

Raises:

Type Description
ValueError

If strand is not a recognized strand indicator.

Source code in src/seqchain/primitives/coordinates.py
def offset_in_feature(
    target_start: int,
    target_end: int,
    feat_start: int,
    feat_end: int,
    strand: str | int,
    chrom_length: int,
) -> int:
    """Compute a strand-aware offset of a target within a genomic feature.

    Replicates the logic of legacy ``get_offset`` from ``targets.py``.
    All four coordinates are normalised with ``% chrom_length`` before
    the calculation, and origin-wrapping genes (where ``feat_start >
    feat_end`` after normalisation) are handled correctly.

    For **forward-strand** features the offset is the distance from the
    feature start to the target start. For **reverse-strand** features
    it is the distance from the target end to the feature end.

    Args:
        target_start: Target start position.
        target_end: Target end position.
        feat_start: Feature (gene) start position.
        feat_end: Feature (gene) end position.
        strand: Strand indicator. Accepted values: ``"+"``, ``"F"``,
            ``1`` for forward; ``"-"``, ``"R"``, ``-1`` for reverse.
        chrom_length: Chromosome length for coordinate normalisation.

    Returns:
        Signed offset in base pairs.

    Examples:
        >>> offset_in_feature(1500, 1520, 1000, 2000, "+", 1091291)
        500

    Raises:
        ValueError: If *strand* is not a recognized strand indicator.
    """
    sign = _validate_strand(strand)
    # Normalize all coordinates
    ts = target_start % chrom_length
    te = target_end % chrom_length
    fs = feat_start % chrom_length
    fe = feat_end % chrom_length

    if sign == 1:
        if fs > fe:  # Gene wraps around origin
            if ts < fe:  # Target in early part (after origin)
                return ts + (chrom_length - fs)
            else:  # Target in later part
                return ts - fs
        else:  # Normal gene
            return ts - fs
    else:
        if fs > fe:  # Gene wraps around origin
            if te < fe:  # Target in early part
                return (chrom_length - te) + (fe - fs)
            else:  # Target in later part
                return fe - te
        else:  # Normal gene
            return fe - te
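
The sketch below exercises the forward and reverse cases on an ordinary gene, plus a forward-strand gene that wraps the origin. The function body is copied from the listing. The helper _validate_strand is not shown on this page, so the version here is a hypothetical stand-in written from the documented accepted values; the real helper may differ. The 1000 bp chromosome in the wrapped case is also an assumption.

```python
from typing import Union


def _validate_strand(strand: Union[str, int]) -> int:
    # Hypothetical stand-in for the real helper, which is not shown here.
    if strand in ("+", "F", 1):
        return 1
    if strand in ("-", "R", -1):
        return -1
    raise ValueError(f"unrecognized strand: {strand!r}")


def offset_in_feature(target_start, target_end, feat_start, feat_end,
                      strand, chrom_length):
    # Body copied from the listing so the snippet is self-contained.
    sign = _validate_strand(strand)
    ts = target_start % chrom_length
    te = target_end % chrom_length
    fs = feat_start % chrom_length
    fe = feat_end % chrom_length

    if sign == 1:
        if fs > fe:  # gene wraps around the origin
            if ts < fe:  # target in the part after the origin
                return ts + (chrom_length - fs)
            return ts - fs
        return ts - fs
    else:
        if fs > fe:  # gene wraps around the origin
            if te < fe:
                return (chrom_length - te) + (fe - fs)
            return fe - te
        return fe - te


# Ordinary gene [1000, 2000), target [1500, 1520).
forward = offset_in_feature(1500, 1520, 1000, 2000, "+", 1_091_291)  # 500
reverse = offset_in_feature(1500, 1520, 1000, 2000, "-", 1_091_291)  # 480

# Forward gene 950..1050 on a hypothetical 1000 bp chromosome; the target
# at 10..30 lies 50 bp (950..1000) plus 10 bp (0..10) from the gene start.
wrapped_forward = offset_in_feature(10, 30, 950, 1050, "+", 1000)    # 60

print(forward, reverse, wrapped_forward)
```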

Region geometry

expand_region

expand_region(start: int, end: int, upstream: int, downstream: int, chrom_length: int | None = None) -> tuple[int, int]

Expand an interval by the given number of bases on each side.

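In linear mode the start is clamped to 0 (no negative coordinates); the end is returned as computed and is not clamped. In circular mode both coordinates wrap modulo chrom_length, so the returned start can be greater than the end when the expanded region crosses the origin.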

Parameters:

Name Type Description Default
start int

Interval start.

required
end int

Interval end.

required
upstream int

Bases to add before start.

required
downstream int

Bases to add after end.

required
chrom_length int | None

Chromosome length for circular wrapping. None (default) selects linear mode.

None

Returns:

Type Description
tuple[int, int]

(new_start, new_end) tuple.

Examples:

>>> expand_region(1000, 2000, 500, 500)
(500, 2500)

Source code in src/seqchain/primitives/coordinates.py
def expand_region(
    start: int,
    end: int,
    upstream: int,
    downstream: int,
    chrom_length: int | None = None,
) -> tuple[int, int]:
    """Expand an interval by the given number of bases on each side.

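    In **linear mode** the start is clamped to 0 (no negative
    coordinates); the end is not clamped. In **circular mode** both
    coordinates wrap modulo *chrom_length*, so the returned start can
    exceed the end when the region crosses the origin.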

    Args:
        start: Interval start.
        end: Interval end.
        upstream: Bases to add before *start*.
        downstream: Bases to add after *end*.
        chrom_length: Chromosome length for circular wrapping.
            ``None`` (default) selects linear mode.

    Returns:
        ``(new_start, new_end)`` tuple.

    Examples:
        >>> expand_region(1000, 2000, 500, 500)
        (500, 2500)
    """
    new_start = start - upstream
    new_end = end + downstream

    if chrom_length is not None:
        new_start = new_start % chrom_length
        new_end = new_end % chrom_length
    else:
        new_start = max(0, new_start)

    return (new_start, new_end)
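
The two modes give different results as soon as the expansion reaches a chromosome boundary. The sketch below repeats the function body from the listing so it runs standalone and compares the modes on a hypothetical 1000 bp chromosome; the coordinates are illustrative.

```python
from typing import Optional, Tuple


def expand_region(
    start: int,
    end: int,
    upstream: int,
    downstream: int,
    chrom_length: Optional[int] = None,
) -> Tuple[int, int]:
    # Body copied from the listing so the snippet is self-contained.
    new_start = start - upstream
    new_end = end + downstream

    if chrom_length is not None:
        new_start = new_start % chrom_length
        new_end = new_end % chrom_length
    else:
        new_start = max(0, new_start)

    return (new_start, new_end)


# Expansion that runs off the left edge.
linear_left = expand_region(100, 300, 500, 50)                       # (0, 350)
circular_left = expand_region(100, 300, 500, 50, chrom_length=1000)  # (600, 350)

# Expansion that runs off the right edge: linear mode leaves the end as is.
linear_right = expand_region(900, 950, 0, 200)                       # (900, 1150)
circular_right = expand_region(900, 950, 0, 200, chrom_length=1000)  # (900, 150)

print(linear_left, circular_left, linear_right, circular_right)
```

A circular result with start greater than end, such as (600, 350), denotes a region that crosses the origin.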

promoter_region

promoter_region(gene_start: int, gene_end: int, strand: str | int, upstream: int = 500) -> tuple[int, int]

Derive promoter coordinates from gene boundaries and strand.

The promoter is the region of upstream bp immediately before the transcription start site. For forward-strand genes that is upstream of gene_start; for reverse-strand genes it is downstream of gene_end.

Parameters:

Name Type Description Default
gene_start int

Gene start position (0-based).

required
gene_end int

Gene end position (exclusive).

required
strand str | int

Strand indicator. Accepted values: "+", "F", 1 for forward; "-", "R", -1 for reverse.

required
upstream int

Promoter length in bp. Defaults to 500.

500

Returns:

Type Description
tuple[int, int]

(promoter_start, promoter_end) tuple. The start may be negative for genes near position 0; use normalize() to wrap on circular genomes.

Examples:

>>> promoter_region(10000, 11000, "+")
(9500, 10000)

Raises:

Type Description
ValueError

If strand is not a recognized strand indicator.

Source code in src/seqchain/primitives/coordinates.py
def promoter_region(
    gene_start: int,
    gene_end: int,
    strand: str | int,
    upstream: int = 500,
) -> tuple[int, int]:
    """Derive promoter coordinates from gene boundaries and strand.

    The promoter is the region of *upstream* bp immediately before the
    transcription start site. For forward-strand genes that is upstream
    of *gene_start*; for reverse-strand genes it is downstream of
    *gene_end*.

    Args:
        gene_start: Gene start position (0-based).
        gene_end: Gene end position (exclusive).
        strand: Strand indicator. Accepted values: ``"+"``, ``"F"``,
            ``1`` for forward; ``"-"``, ``"R"``, ``-1`` for reverse.
        upstream: Promoter length in bp. Defaults to 500.

    Returns:
        ``(promoter_start, promoter_end)`` tuple. The start may be
        negative for genes near position 0; use
        `normalize()` to wrap on circular genomes.

    Examples:
        >>> promoter_region(10000, 11000, "+")
        (9500, 10000)

    Raises:
        ValueError: If *strand* is not a recognized strand indicator.
    """
    sign = _validate_strand(strand)
    if sign == 1:
        return (gene_start - upstream, gene_start)
    else:
        return (gene_end, gene_end + upstream)
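
The note about negative starts can be seen with a forward-strand gene close to the origin. The sketch below copies promoter_region and normalize from the listings on this page. The helper _validate_strand is not shown here, so the version below is a hypothetical stand-in built from the documented accepted values. The gene coordinates are illustrative, and the genome length is the one used in the normalize docstring example.

```python
from typing import Tuple, Union


def _validate_strand(strand: Union[str, int]) -> int:
    # Hypothetical stand-in for the real helper, which is not shown here.
    if strand in ("+", "F", 1):
        return 1
    if strand in ("-", "R", -1):
        return -1
    raise ValueError(f"unrecognized strand: {strand!r}")


def normalize(pos: int, length: int, circular: bool = True) -> int:
    return pos % length if circular else pos


def promoter_region(gene_start: int, gene_end: int,
                    strand: Union[str, int], upstream: int = 500) -> Tuple[int, int]:
    # Body copied from the listing so the snippet is self-contained.
    sign = _validate_strand(strand)
    if sign == 1:
        return (gene_start - upstream, gene_start)
    return (gene_end, gene_end + upstream)


GENOME_LENGTH = 1_091_291  # illustrative, from the normalize docstring example

# Forward gene 200 bp from the origin: the promoter start is negative.
raw = promoter_region(200, 1200, "+")                      # (-300, 200)
wrapped = tuple(normalize(p, GENOME_LENGTH) for p in raw)  # (1090991, 200)

# Reverse gene: the promoter lies at higher coordinates than gene_end.
reverse = promoter_region(10000, 11000, "-")               # (11000, 11500)

# A shorter promoter window.
short = promoter_region(10000, 11000, 1, upstream=100)     # (9900, 10000)

print(raw, wrapped, reverse, short)
```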

terminator_region

terminator_region(gene_start: int, gene_end: int, strand: str | int, downstream: int = 500) -> tuple[int, int]

Derive terminator coordinates from gene boundaries and strand.

The terminator is the region of downstream bp immediately after the transcription stop site. For forward-strand genes that is downstream of gene_end; for reverse-strand genes it is upstream of gene_start.

Parameters:

Name Type Description Default
gene_start int

Gene start position (0-based).

required
gene_end int

Gene end position (exclusive).

required
strand str | int

Strand indicator. Accepted values: "+", "F", 1 for forward; "-", "R", -1 for reverse.

required
downstream int

Terminator length in bp. Defaults to 500.

500

Returns:

Type Description
tuple[int, int]

(terminator_start, terminator_end) tuple. The start may be negative for reverse-strand genes near position 0; use normalize() to wrap on circular genomes.

Examples:

>>> terminator_region(10000, 11000, "+")
(11000, 11500)

Raises:

Type Description
ValueError

If strand is not a recognized strand indicator.

Source code in src/seqchain/primitives/coordinates.py
def terminator_region(
    gene_start: int,
    gene_end: int,
    strand: str | int,
    downstream: int = 500,
) -> tuple[int, int]:
    """Derive terminator coordinates from gene boundaries and strand.

    The terminator is the region of *downstream* bp immediately after the
    transcription stop site. For forward-strand genes that is downstream
    of *gene_end*; for reverse-strand genes it is upstream of
    *gene_start*.

    Args:
        gene_start: Gene start position (0-based).
        gene_end: Gene end position (exclusive).
        strand: Strand indicator. Accepted values: ``"+"``, ``"F"``,
            ``1`` for forward; ``"-"``, ``"R"``, ``-1`` for reverse.
        downstream: Terminator length in bp. Defaults to 500.

    Returns:
        ``(terminator_start, terminator_end)`` tuple. The start may be
        negative for reverse-strand genes near position 0; use
        `normalize()` to wrap on circular genomes.

    Examples:
        >>> terminator_region(10000, 11000, "+")
        (11000, 11500)

    Raises:
        ValueError: If *strand* is not a recognized strand indicator.
    """
    sign = _validate_strand(strand)
    if sign == 1:
        return (gene_end, gene_end + downstream)
    else:
        return (gene_start - downstream, gene_start)
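
Read side by side, the two listings mirror each other: for the same window length, the terminator of a gene on one strand covers the same coordinates as the promoter of that gene on the other strand. The sketch below checks this using copies of both function bodies. The helper _validate_strand is not shown on this page, so the version here is a hypothetical stand-in based on the documented accepted values.

```python
from typing import Tuple, Union


def _validate_strand(strand: Union[str, int]) -> int:
    # Hypothetical stand-in for the real helper, which is not shown here.
    if strand in ("+", "F", 1):
        return 1
    if strand in ("-", "R", -1):
        return -1
    raise ValueError(f"unrecognized strand: {strand!r}")


def promoter_region(gene_start: int, gene_end: int,
                    strand: Union[str, int], upstream: int = 500) -> Tuple[int, int]:
    sign = _validate_strand(strand)
    if sign == 1:
        return (gene_start - upstream, gene_start)
    return (gene_end, gene_end + upstream)


def terminator_region(gene_start: int, gene_end: int,
                      strand: Union[str, int], downstream: int = 500) -> Tuple[int, int]:
    # Body copied from the listing so the snippet is self-contained.
    sign = _validate_strand(strand)
    if sign == 1:
        return (gene_end, gene_end + downstream)
    return (gene_start - downstream, gene_start)


forward_term = terminator_region(10000, 11000, "+")  # (11000, 11500)
reverse_term = terminator_region(10000, 11000, "-")  # (9500, 10000)

# Mirror relationship between the two functions.
mirror_ok = (
    forward_term == promoter_region(10000, 11000, "-")
    and reverse_term == promoter_region(10000, 11000, "+")
)

print(forward_term, reverse_term, mirror_ok)
```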