Locus Annotation

Annotate regions with overlapping genome features: gene name, locus tag, feature type, strand, relative position, and intergenic status.

Built on IntervalIndex overlap queries: given a genome's gene index, it enriches each Region's tags with the context of every feature the Region overlaps.

annotate

Pure locus annotation — enrich a Region with overlapping feature info.

For each input Region, finds overlapping features via an IntervalIndex and computes overlap, offset, and relative position using coordinate primitives. Produces one output Region per (input Region, overlapping feature) pair, enriched with annotation tags.

Dependencies: primitives.intervals, primitives.coordinates, region, stdlib.

annotate_locus_from_features

annotate_locus_from_features(region: Region, features: Iterable[Region], chrom_length: int | None = None) -> Iterator[Region]

Annotate a Region from a pre-computed iterable of overlapping features.

Core annotation engine. Accepts the overlapping features directly so the caller decides how to compute them — via an IntervalIndex query or a sweep-line buffer. If no features overlap, yields the Region unchanged.
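The sweep-line option mentioned above can be sketched roughly as follows. This is a minimal illustration, not the library's implementation: it works on plain (start, end) tuples, assumes half-open linear coordinates and inputs sorted by start, and the name sweep_overlaps is hypothetical.

```python
from typing import Iterator

Interval = tuple[int, int]  # (start, end), assumed half-open and linear


def sweep_overlaps(
    hits: list[Interval], features: list[Interval]
) -> Iterator[tuple[Interval, list[Interval]]]:
    """Yield each hit with the features overlapping it.

    Hypothetical helper: both inputs must be sorted by start.
    """
    buffer: list[Interval] = []  # features that may still overlap later hits
    next_feat = 0
    for hit_start, hit_end in hits:
        # Pull in every feature that starts before this hit ends.
        while next_feat < len(features) and features[next_feat][0] < hit_end:
            buffer.append(features[next_feat])
            next_feat += 1
        # Features ending at or before this hit's start cannot overlap it,
        # nor any later hit, because hits are sorted by start.
        buffer = [f for f in buffer if f[1] > hit_start]
        # An earlier, longer hit may have buffered features that start
        # after this hit ends, so filter on start before yielding.
        yield (hit_start, hit_end), [f for f in buffer if f[0] < hit_end]


hits = [(0, 100), (10, 20), (150, 160)]
features = [(5, 15), (50, 60), (155, 170)]
pairs = list(sweep_overlaps(hits, features))
```

Each yielded feature list could then be handed to an annotation function in place of an index query result.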

Tags added per feature overlap:

  • feature_name: the feature's name (e.g. locus_tag)
  • feature_type: the feature's type tag (e.g. "gene", "CDS")
  • feature_start, feature_end, feature_strand: feature coordinates
  • overlap: bp of overlap between hit and feature
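  • offset: offset of the hit within the feature; strand-aware only when the feature strand is "+" or "-" and chrom_length is given, otherwise region.start - feat.start
  • relative_pos: fractional position of the hit's start within the feature
  • feature_locus_tag, feature_gene, feature_product: copied from the feature's locus_tag, gene, and product tags when present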
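As a rough illustration of the numeric tags in the linear case, the sketch below uses hypothetical helpers (linear_overlap, linear_relative_pos) that are not the library's interval_overlap or relative_position. It assumes half-open coordinates, takes the offset as the plain start-to-start difference, and assumes relative position means the fraction of the feature length at which the hit starts.

```python
def linear_overlap(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Base pairs shared by two half-open linear intervals (0 if disjoint)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def linear_relative_pos(pos: int, feat_start: int, feat_end: int) -> float:
    """Fraction of the feature length at which pos falls (assumed definition)."""
    length = feat_end - feat_start
    return (pos - feat_start) / length if length > 0 else 0.0


# A hit at 150-160 against a gene at 100-200.
overlap = linear_overlap(150, 160, 100, 200)
offset = 150 - 100  # start-to-start difference, strand ignored
rel_pos = linear_relative_pos(150, 100, 200)
```

With these assumptions the hit overlaps the gene by 10 bp, starts 50 bp into it, and sits at the halfway point of the feature.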

Parameters:

  • region (Region, required): Input Region to annotate.
  • features (Iterable[Region], required): Overlapping feature Regions (pre-filtered by the caller).
  • chrom_length (int | None, default None): Chromosome length for circular-aware calculations. If None, linear coordinates are assumed.
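To make "circular-aware" concrete, here is one way overlap on a circular chromosome might be computed. It is a sketch under stated assumptions, not the library's interval_overlap: intervals are half-open, an interval with start > end is assumed to wrap past the origin, and the names _split and circular_overlap are hypothetical.

```python
def _split(start: int, end: int, chrom_length: int) -> list[tuple[int, int]]:
    """Split a possibly origin-spanning interval into linear pieces."""
    if start <= end:
        return [(start, end)]
    # Assumed convention: start > end means the interval wraps past the origin.
    return [(start, chrom_length), (0, end)]


def circular_overlap(
    a_start: int, a_end: int, b_start: int, b_end: int, chrom_length: int
) -> int:
    """Base pairs shared by two half-open intervals on a circular chromosome."""
    total = 0
    # The pieces of each interval are disjoint, so pairwise overlaps add up
    # without double counting.
    for s1, e1 in _split(a_start, a_end, chrom_length):
        for s2, e2 in _split(b_start, b_end, chrom_length):
            total += max(0, min(e1, e2) - max(s1, s2))
    return total


# A gene spanning the origin (950 -> 50) on a 1000 bp chromosome.
wrapped_hit = circular_overlap(990, 10, 950, 50, 1000)
linear_hit = circular_overlap(0, 10, 950, 50, 1000)
```

Splitting at the origin reduces the circular case to ordinary linear overlap, which keeps the arithmetic easy to check by hand.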

Yields:

  • Region: Annotated Region objects with enriched tags.

Examples:

>>> gene = Region("c", 100, 200, strand="+", name="g1",
...               tags={"feature_type": "gene"})
>>> hit = Region("c", 150, 160)
>>> results = list(annotate_locus_from_features(hit, [gene]))
>>> results[0].tags["feature_name"]
'g1'

Source code in src/seqchain/primitives/annotate.py
def annotate_locus_from_features(
    region: Region,
    features: Iterable[Region],
    chrom_length: int | None = None,
) -> Iterator[Region]:
    """Annotate a Region from a pre-computed iterable of overlapping features.

    Core annotation engine.  Accepts the overlapping features directly so
    the caller decides how to compute them — via an `IntervalIndex` query
    or a sweep-line buffer.  If no features overlap, yields the Region
    unchanged.

    Tags added per feature overlap:

    - ``feature_name``: the feature's name (e.g. locus_tag)
    - ``feature_type``: the feature's type tag (e.g. "gene", "CDS")
    - ``feature_start``, ``feature_end``, ``feature_strand``: feature
      coordinates
    - ``overlap``: bp of overlap between hit and feature
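    - ``offset``: offset of the hit within the feature; strand-aware only
      when the feature strand is "+" or "-" and ``chrom_length`` is given,
      otherwise ``region.start - feat.start``
    - ``relative_pos``: fractional position of the hit's start within the
      feature
    - ``feature_locus_tag``, ``feature_gene``, ``feature_product``: copied
      from the feature's tags when present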

    Args:
        region: Input Region to annotate.
        features: Overlapping feature Regions (pre-filtered by the caller).
        chrom_length: Chromosome length for circular-aware calculations.
            If ``None``, linear coordinates are assumed.

    Yields:
        Annotated `Region` objects with enriched ``tags``.

    Examples:
        >>> gene = Region("c", 100, 200, strand="+", name="g1",
        ...               tags={"feature_type": "gene"})
        >>> hit = Region("c", 150, 160)
        >>> results = list(annotate_locus_from_features(hit, [gene]))
        >>> results[0].tags["feature_name"]
        'g1'
    """
    found = False
    for feat in features:
        found = True
        overlap = interval_overlap(
            region.start,
            region.end,
            feat.start,
            feat.end,
            chrom_length,
        )

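        # A strand-aware offset is computed only when the feature has a
        # definite strand and a chromosome length is available; otherwise
        # fall back to the plain start-to-start difference.
        feat_strand = feat.strand
        if feat_strand in ("+", "-") and chrom_length is not None:
            offset = offset_in_feature(
                region.start,
                region.end,
                feat.start,
                feat.end,
                feat_strand,
                chrom_length,
            )
        else:
            offset = region.start - feat.start

        # Fractional position of the hit's start within the feature;
        # a missing chrom_length is passed through as 0.
        rel_pos = relative_position(
            region.start, feat.start, feat.end,
            chrom_length=chrom_length or 0,
        )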

        new_tags = {
            **region.tags,
            "feature_name": feat.name,
            "feature_type": feat.tags.get("feature_type", ""),
            "feature_start": feat.start,
            "feature_end": feat.end,
            "feature_strand": feat.strand,
            "overlap": overlap,
            "offset": offset,
            "relative_pos": rel_pos,
        }

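        # Copy selected feature tags, when present, under a feature_ prefix.
        for key in ("locus_tag", "gene", "product"):
            if key in feat.tags:
                new_tags[f"feature_{key}"] = feat.tags[key]

        yield replace(region, tags=new_tags)

    # No features were supplied: pass the input Region through unchanged.
    if not found:
        yield region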
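The pass-through behaviour in the listing above (yield the input unchanged when no feature is supplied) gives callers a simple way to spot hits with no overlapping feature. The sketch below mirrors that pattern on plain dicts. The names tag_hits and is_intergenic are hypothetical, and treating a missing feature_name tag as "intergenic" is an assumed caller-side convention, not something the source defines.

```python
from typing import Iterable, Iterator


def tag_hits(tags: dict, feature_names: Iterable[str]) -> Iterator[dict]:
    """Yield one tagged copy per feature, or the input unchanged if none."""
    found = False  # a flag works even when feature_names is a one-shot iterator
    for name in feature_names:
        found = True
        yield {**tags, "feature_name": name}
    if not found:
        yield tags


def is_intergenic(tags: dict) -> bool:
    """Assumed convention: no feature_name tag means nothing overlapped."""
    return "feature_name" not in tags


annotated = list(tag_hits({"score": 7}, ["g1", "g2"]))
unannotated = list(tag_hits({"score": 7}, iter([])))
```

The check only works if input tags never carry a feature_name of their own, so a pipeline relying on it would need to guarantee that upstream.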

annotate_locus

annotate_locus(region: Region, gene_index: IntervalIndex, chrom_length: int | None = None) -> Iterator[Region]

Annotate a Region with overlapping feature information.

Queries gene_index for features overlapping region, then delegates to annotate_locus_from_features for the actual math.
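The query-then-delegate split depends on only one index method, overlapping(chrom, start, end). A toy stand-in with that method might look like the sketch below. ToyIndex and Feature are hypothetical names, the lookup is a linear scan over half-open intervals, and none of this reflects how IntervalIndex is actually implemented.

```python
from collections import defaultdict
from typing import Iterator, NamedTuple


class Feature(NamedTuple):
    chrom: str
    start: int
    end: int
    name: str


class ToyIndex:
    """Hypothetical stand-in exposing only overlapping(chrom, start, end)."""

    def __init__(self, features: list[Feature]) -> None:
        self._by_chrom: dict[str, list[Feature]] = defaultdict(list)
        for feat in features:
            self._by_chrom[feat.chrom].append(feat)

    def overlapping(self, chrom: str, start: int, end: int) -> Iterator[Feature]:
        # Linear scan; a real index would use a tree or sorted arrays.
        for feat in self._by_chrom.get(chrom, []):
            if feat.start < end and start < feat.end:  # half-open overlap test
                yield feat


idx = ToyIndex([Feature("c", 100, 200, "g1"), Feature("c", 300, 400, "g2")])
names = [f.name for f in idx.overlapping("c", 150, 160)]
```

Because the annotation step takes any iterable of features, an object like this can be swapped in for tests without touching the annotation logic.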

Parameters:

  • region (Region, required): Input Region to annotate.
  • gene_index (IntervalIndex, required): An IntervalIndex of reference features.
  • chrom_length (int | None, default None): Chromosome length for circular-aware overlap and offset calculations. If None, linear coordinates are assumed.

Yields:

  • Region: Annotated Region objects with enriched tags.

Examples:

>>> from seqchain.primitives.intervals import IntervalIndex
>>> gene = Region("c", 100, 200, strand="+", name="g1",
...               tags={"feature_type": "gene"})
>>> idx = IntervalIndex.build([gene])
>>> hit = Region("c", 150, 160)
>>> results = list(annotate_locus(hit, idx))
>>> results[0].tags["feature_name"]
'g1'

Source code in src/seqchain/primitives/annotate.py
def annotate_locus(
    region: Region,
    gene_index: IntervalIndex,
    chrom_length: int | None = None,
) -> Iterator[Region]:
    """Annotate a Region with overlapping feature information.

    Queries *gene_index* for features overlapping *region*, then
    delegates to `annotate_locus_from_features` for the actual math.

    Args:
        region: Input Region to annotate.
        gene_index: An `IntervalIndex` of reference features.
        chrom_length: Chromosome length for circular-aware overlap
            and offset calculations. If ``None``, linear coordinates
            are assumed.

    Yields:
        Annotated `Region` objects with enriched ``tags``.

    Examples:
        >>> from seqchain.primitives.intervals import IntervalIndex
        >>> gene = Region("c", 100, 200, strand="+", name="g1",
        ...               tags={"feature_type": "gene"})
        >>> idx = IntervalIndex.build([gene])
        >>> hit = Region("c", 150, 160)
        >>> results = list(annotate_locus(hit, idx))
        >>> results[0].tags["feature_name"]
        'g1'
    """
    features = gene_index.overlapping(region.chrom, region.start, region.end)
    yield from annotate_locus_from_features(region, features, chrom_length)