Locus Annotation
Annotate regions with overlapping genome features: gene name, locus tag, feature type, strand, relative position, and intergenic status.
Built on IntervalIndex overlap queries — feed it a genome's gene index
and it enriches each Region's tags with the nearest gene context.
annotate
Pure locus annotation — enrich a Region with overlapping feature info.
For each input Region, finds overlapping features via an IntervalIndex
and computes overlap, offset, and relative position using coordinate
primitives. Produces one output Region per input-feature pair, enriched
with annotation tags.
Dependencies: primitives.intervals, primitives.coordinates, region, stdlib.
annotate_locus_from_features
annotate_locus_from_features(region: Region, features: Iterable[Region], chrom_length: int | None = None) -> Iterator[Region]
Annotate a Region from a pre-computed iterable of overlapping features.
Core annotation engine. Accepts the overlapping features directly so
the caller decides how to compute them — via an IntervalIndex query
or a sweep-line buffer. If no features overlap, yields the Region
unchanged.
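To illustrate the sweep-line option mentioned above, here is a minimal sketch of a buffer that pairs each query with its overlapping features. It is an illustration only, not the library's implementation: the `Interval` tuple and the `sweep_overlaps` helper are hypothetical names, and the sketch assumes half-open coordinates, a single chromosome, and both inputs sorted by start.

```python
from typing import Iterable, Iterator, NamedTuple


class Interval(NamedTuple):
    """Hypothetical stand-in for a Region: half-open [start, end)."""
    start: int
    end: int
    name: str = ""


def sweep_overlaps(
    queries: Iterable[Interval], features: Iterable[Interval]
) -> Iterator[tuple[Interval, list[Interval]]]:
    """Yield (query, overlapping features). Both inputs must be sorted by start."""
    feature_iter = iter(features)
    pending = next(feature_iter, None)
    buffer: list[Interval] = []  # features that may still overlap a later query

    for query in queries:
        # Pull in every feature that starts before this query ends.
        while pending is not None and pending.start < query.end:
            buffer.append(pending)
            pending = next(feature_iter, None)
        # Features ending at or before this query's start cannot overlap this
        # query or any later one, because queries are sorted by start.
        buffer = [f for f in buffer if f.end > query.start]
        # An earlier, longer query may have buffered features that start
        # beyond this query's end, so filter on start as well.
        yield query, [f for f in buffer if f.start < query.end]


if __name__ == "__main__":
    genes = [Interval(5, 15, "a"), Interval(50, 60, "b"), Interval(155, 170, "c")]
    hits = [Interval(0, 100, "q1"), Interval(10, 20, "q2"), Interval(150, 160, "q3")]
    for hit, overlapping in sweep_overlaps(hits, genes):
        print(hit.name, [g.name for g in overlapping])
```

Each list produced this way could then be passed as the `features` argument, so the annotation step itself never needs to know how the overlaps were found.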
Tags added per feature overlap:
- feature_name: the feature's name (e.g. locus_tag)
- feature_type: the feature's type tag (e.g. "gene", "CDS")
- feature_start, feature_end, feature_strand: feature coordinates
- overlap: bp of overlap between hit and feature
- offset: strand-aware offset of the hit within the feature
- relative_pos: fractional position within the feature
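The three computed tags can be sketched for the linear case as follows. This is an illustration under stated assumptions, not the library's implementation: coordinates are taken to be half-open, `offset` is assumed to be the distance from the feature's 5' end to the nearest edge of the hit, and `relative_pos` is assumed to be that offset divided by the feature length. `locus_tags` is a hypothetical helper name.

```python
def locus_tags(
    hit_start: int,
    hit_end: int,
    feat_start: int,
    feat_end: int,
    feat_strand: str = "+",
) -> dict[str, float]:
    """Compute overlap, offset and relative_pos for half-open linear intervals."""
    # Overlap in bp; zero when the intervals do not intersect.
    overlap = max(0, min(hit_end, feat_end) - max(hit_start, feat_start))

    if feat_strand == "-":
        # Assumption: on the minus strand the 5' end is feat_end,
        # so the offset is measured backwards from it.
        offset = feat_end - hit_end
    else:
        offset = hit_start - feat_start

    length = feat_end - feat_start
    relative_pos = offset / length if length > 0 else 0.0
    return {"overlap": overlap, "offset": offset, "relative_pos": relative_pos}


if __name__ == "__main__":
    # A 10 bp hit in the middle of a 100 bp plus-strand gene.
    print(locus_tags(150, 160, 100, 200, "+"))
    # {'overlap': 10, 'offset': 50, 'relative_pos': 0.5}
```

Under these assumptions the same hit on a minus-strand gene gets an offset of 40 and a relative position of 0.4, which is what "strand-aware" means here.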
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| region | Region | Input Region to annotate. | required |
| features | Iterable[Region] | Overlapping feature Regions (pre-filtered by the caller). | required |
| chrom_length | int \| None | Chromosome length for circular-aware calculations. If … | None |
Yields:
| Type | Description |
|---|---|
| Region | Annotated … |
Examples:
>>> gene = Region("c", 100, 200, strand="+", name="g1",
... tags={"feature_type": "gene"})
>>> hit = Region("c", 150, 160)
>>> results = list(annotate_locus_from_features(hit, [gene]))
>>> results[0].tags["feature_name"]
'g1'
Source code in src/seqchain/primitives/annotate.py
annotate_locus
annotate_locus(region: Region, gene_index: IntervalIndex, chrom_length: int | None = None) -> Iterator[Region]
Annotate a Region with overlapping feature information.
Queries gene_index for features overlapping region, then
delegates to annotate_locus_from_features for the actual math.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| region | Region | Input Region to annotate. | required |
| gene_index | IntervalIndex | An … | required |
| chrom_length | int \| None | Chromosome length for circular-aware overlap and offset calculations. If … | None |
Yields:
| Type | Description |
|---|---|
| Region | Annotated … |
Examples:
>>> from seqchain.primitives.intervals import IntervalIndex
>>> gene = Region("c", 100, 200, strand="+", name="g1",
... tags={"feature_type": "gene"})
>>> idx = IntervalIndex.build([gene])
>>> hit = Region("c", 150, 160)
>>> results = list(annotate_locus(hit, idx))
>>> results[0].tags["feature_name"]
'g1'
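The `chrom_length` parameter matters when a region or feature wraps around the origin of a circular chromosome. As a sketch of what a circular-aware overlap could look like (an assumption, not the library's implementation), the hypothetical `circular_overlap` below represents a wrapping interval as `start > end` with half-open coordinates, splits it into linear pieces, and sums the pairwise overlaps.

```python
def split_circular(start: int, end: int, chrom_length: int) -> list[tuple[int, int]]:
    """Split a half-open interval into linear pieces; start > end means it wraps."""
    if start <= end:
        return [(start, end)]
    # Wrapping interval: one piece up to the chromosome end, one from the origin.
    return [(start, chrom_length), (0, end)]


def circular_overlap(
    a_start: int, a_end: int, b_start: int, b_end: int, chrom_length: int
) -> int:
    """Overlap in bp between two intervals on a circular chromosome."""
    total = 0
    for s1, e1 in split_circular(a_start, a_end, chrom_length):
        for s2, e2 in split_circular(b_start, b_end, chrom_length):
            total += max(0, min(e1, e2) - max(s1, s2))
    return total


if __name__ == "__main__":
    # Gene spans the origin of a 1000 bp circular chromosome: 950 -> 50.
    # Hit also spans the origin: 990 -> 10.
    print(circular_overlap(990, 10, 950, 50, 1000))  # 20
    # A non-wrapping hit that only touches the gene's tail past the origin.
    print(circular_overlap(40, 60, 950, 50, 1000))   # 10
```

Without a chromosome length there is no way to tell where a wrapping interval ends, which is presumably why the parameter is optional and defaults to `None`.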