Intervals¶
Max-end segment tree for O(log n + k) interval overlap queries.
IntervalIndex is the data structure behind IntervalTrack.regions().
Build an index from Regions, then query it with chromosome + coordinate range to get all overlapping entries.
intervals
¶
Fast interval overlap lookups and set operations.
IntervalIndex provides O(log n + k) overlap queries backed by a
max-end segment tree. intersect_intervals() computes pairwise
intersections of two region sets — the fundamental spatial join primitive.
No external dependencies — uses bisect from stdlib.
IntervalIndex
¶
Index for fast overlap, nearest, and proximity queries on Regions.
Backed by a per-chromosome segment tree over coordinate-sorted
(start, end, region) tuples. Construction is O(n log n) via
sorting; overlap queries are O(log n + k) where k is the number
of results — bisect prunes the right boundary and the max-end tree
prunes the left.
Examples:
>>> idx = IntervalIndex.build([
... Region("chr1", 100, 200),
... Region("chr1", 150, 250),
... ])
>>> len(idx.overlapping("chr1", 120, 160))
2
Source code in src/seqchain/primitives/intervals.py
build
staticmethod
¶
Construct an index from a sequence of Regions.
Regions are grouped by chromosome and sorted by start position (ties broken by end position).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
regions
|
Sequence[Region]
|
Iterable of |
required |
Returns:
| Type | Description |
|---|---|
IntervalIndex
|
A new |
Examples:
>>> idx = IntervalIndex.build([Region("chr1", 10, 20)])
>>> idx.overlapping("chr1", 15, 25)
[Region(chrom='chr1', start=10, end=20, strand='.', score=0.0, name='', tags={})]
Source code in src/seqchain/primitives/intervals.py
overlapping
¶
Return all regions that overlap the query interval.
A region overlaps if region.start < end and
region.end > start (half-open interval intersection).
Complexity is O(log n + k) where k is the result count.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
chrom
|
str
|
Chromosome name. |
required |
start
|
int
|
Query interval start (inclusive). |
required |
end
|
int
|
Query interval end (exclusive). |
required |
Returns:
| Type | Description |
|---|---|
list[Region]
|
List of overlapping |
list[Region]
|
ordered by start position. |
Examples:
>>> idx = IntervalIndex.build([Region("chr1", 10, 20)])
>>> idx.overlapping("chr1", 15, 25)
[Region(chrom='chr1', start=10, end=20, strand='.', score=0.0, name='', tags={})]
Source code in src/seqchain/primitives/intervals.py
nearest
¶
Return the single nearest region to a position.
Distance is measured from pos to the closest edge of each region (0 if pos is inside the region).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
chrom
|
str
|
Chromosome name. |
required |
pos
|
int
|
Query position. |
required |
max_distance
|
int | None
|
If given, only consider regions within this
many bp. |
None
|
Returns:
| Type | Description |
|---|---|
Region | None
|
The nearest |
Region | None
|
the chromosome has no regions (or none within |
Region | None
|
max_distance). |
Examples:
>>> idx = IntervalIndex.build([Region("chr1", 100, 200)])
>>> idx.nearest("chr1", 50)
Region(chrom='chr1', start=100, end=200, strand='.', score=0.0, name='', tags={})
Source code in src/seqchain/primitives/intervals.py
within_distance
¶
Return all regions within a given distance of a position.
Distance is measured from pos to the closest edge of each
region (0 if pos is inside the region). Complexity is
O(log n + k) — delegates to overlapping() with an
expanded window.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
chrom
|
str
|
Chromosome name. |
required |
pos
|
int
|
Query position. |
required |
distance
|
int
|
Maximum distance in bp (inclusive). |
required |
Returns:
| Type | Description |
|---|---|
list[Region]
|
List of |
list[Region]
|
distance bp, ordered by start position. |
Examples:
>>> idx = IntervalIndex.build([
... Region("chr1", 100, 200),
... Region("chr1", 500, 600),
... ])
>>> len(idx.within_distance("chr1", 50, 100))
1
Source code in src/seqchain/primitives/intervals.py
__len__
¶
Total number of indexed regions across all chromosomes.
Returns:
| Type | Description |
|---|---|
int
|
Region count. |
chromosomes
¶
Return the list of chromosomes present in the index.
Returns:
| Type | Description |
|---|---|
list[str]
|
Sorted list of chromosome names. |
Examples:
>>> idx = IntervalIndex.build([Region("chr2", 0, 10), Region("chr1", 0, 10)])
>>> idx.chromosomes()
['chr1', 'chr2']
Source code in src/seqchain/primitives/intervals.py
intersect_intervals
¶
intersect_intervals(query: Iterable[Region], reference: Iterable[Region]) -> Iterator[tuple[Region, Region]]
Compute pairwise intersections of two region sets.
For each query region, finds all overlapping reference regions and
yields (fragment, ref) pairs. The fragment is the query region
with coordinates clipped to the intersection — it preserves the
query's name, strand, score, and tags. The ref is the original
reference region (unmodified).
Uses a sorted sweep-line grouped by chromosome. Reference regions
are grouped by chromosome and sorted by start; each query region
scans only its own chromosome's sorted slice, stopping early when
ref.start >= query.end.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
query
|
Iterable[Region]
|
Input regions to intersect. |
required |
reference
|
Iterable[Region]
|
Reference regions to intersect against. |
required |
Yields:
| Type | Description |
|---|---|
Region
|
|
Region
|
clipped to the overlap with ref. Query regions with no |
tuple[Region, Region]
|
overlapping reference regions are silently dropped. |
Examples:
>>> q = [Region("chr1", 100, 300)]
>>> r = [Region("chr1", 0, 200, name="A"), Region("chr1", 200, 400, name="B")]
>>> [(f.start, f.end, ref.name) for f, ref in intersect_intervals(q, r)]
[(100, 200, 'A'), (200, 300, 'B')]
Source code in src/seqchain/primitives/intervals.py
overlay
¶
overlay(query: Iterable[Region], reference: Iterable[Region], *, name_tag: str | None = None) -> Iterator[Region]
Spatial join with tag merge.
For each query region, finds all overlapping reference regions via
intersect_intervals(), clips to the intersection, and merges
the reference region's tags into the fragment. If name_tag is
given, promotes the reference region's name field into a tag on
the output fragment.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
query
|
Iterable[Region]
|
Input regions to overlay. |
required |
reference
|
Iterable[Region]
|
Reference regions to join against. |
required |
name_tag
|
str | None
|
If given, sets |
None
|
Yields:
| Type | Description |
|---|---|
Region
|
Regions with coordinates clipped to the intersection and tags |
Region
|
merged from both query and reference. Query regions with no |
Region
|
overlapping reference are silently dropped. |
Examples:
>>> q = [Region("chr1", 100, 300, name="state", tags={"mark": "H3"})]
>>> r = [Region("chr1", 0, 200, name="promoter", tags={"gene": "G1"})]
>>> result = list(overlay(q, r, name_tag="feature_type"))
>>> result[0].tags["feature_type"]
'promoter'
>>> result[0].tags["gene"]
'G1'
>>> result[0].tags["mark"]
'H3'