Genome
Load genomes from GenBank and FASTA files. The Genome dataclass holds
sequences, features, organism metadata, and topology (circular/linear)
in a frozen, immutable structure.
from seqchain.io.genome import load_genbank
genome = load_genbank("yeast.gb")
print(genome.chrom_lengths) # {'chrI': 230218, 'chrII': 813184, ...}
print(genome.genes()[:3]) # First 3 gene features
Genome (dataclass)
Genome(name: str, organisms: dict[str, str], sequences: dict[str, str], features: tuple[Region, ...], topologies: dict[str, str])
Immutable container for a parsed genome.
Holds sequences, features, organism names, and topology information for one or more chromosomes or contigs.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `name` | `str` | Genome name (typically the filename stem). | required |
| `organisms` | `dict[str, str]` | Mapping of chromosome name to organism string. | required |
| `sequences` | `dict[str, str]` | Mapping of chromosome name to uppercase DNA string. | required |
| `features` | `tuple[Region, ...]` | All extracted features across all chromosomes. | required |
| `topologies` | `dict[str, str]` | Mapping of chromosome name to topology (circular or linear). | required |
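To illustrate what a frozen, immutable container means in practice, here is a minimal sketch using a hypothetical stand-in class, `MiniGenome`. It is not the seqchain `Genome`; the `chrom_lengths` implementation shown is an assumption based on the output in the introductory example.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class MiniGenome:
    """Hypothetical stand-in for Genome; not the seqchain class."""

    name: str
    sequences: dict[str, str]
    topologies: dict[str, str]

    @property
    def chrom_lengths(self) -> dict[str, int]:
        # Assumed behaviour: derive lengths on demand from the stored sequences.
        return {chrom: len(seq) for chrom, seq in self.sequences.items()}


genome = MiniGenome(
    "demo",
    {"chrI": "ATGC", "chrM": "GGCCTA"},
    {"chrI": "linear", "chrM": "circular"},
)
print(genome.chrom_lengths)  # {'chrI': 4, 'chrM': 6}

try:
    genome.name = "renamed"  # frozen dataclasses reject attribute assignment
except FrozenInstanceError:
    print("cannot reassign fields on a frozen dataclass")
```

Note that `frozen=True` only blocks reassigning fields. A dict stored in a field can still be mutated in place, so callers should treat the mappings as read-only by convention.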
chrom_lengths (property)
chroms (property)
features_on
Filter features to a single chromosome.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `chrom` | `str` | Chromosome name to filter by. | required |
Returns:
| Type | Description |
|---|---|
| `tuple[Region, ...]` | Tuple of regions on the given chromosome. |
Examples:
>>> r = Region("chr1", 10, 20)
>>> Genome("g", {}, {}, (r,), {}).features_on("chr1")
(Region(chrom='chr1', start=10, end=20, strand='.', score=0.0, name='', tags={}),)
Source code in src/seqchain/io/genome.py
genes
Filter features to gene-type only.
Returns:
| Type | Description |
|---|---|
| `tuple[Region, ...]` | Tuple of regions whose `feature_type` tag is `gene`. |
Examples:
>>> r = Region("c", 0, 10, tags={"feature_type": "gene"})
>>> Genome("g", {}, {}, (r,), {}).genes()
(Region(chrom='c', start=0, end=10, strand='.', score=0.0, name='', tags={'feature_type': 'gene'}),)
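The two filters can be sketched as plain tuple comprehensions. `MiniRegion` below is a hypothetical stand-in that mirrors the fields visible in the `Region` reprs above. The free functions are an assumed equivalent of the methods, not the library's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MiniRegion:
    """Hypothetical stand-in mirroring the Region fields shown in the reprs."""

    chrom: str
    start: int
    end: int
    strand: str = "."
    score: float = 0.0
    name: str = ""
    tags: dict = field(default_factory=dict)


def features_on(features, chrom):
    # Keep only the regions located on the requested chromosome.
    return tuple(r for r in features if r.chrom == chrom)


def genes(features):
    # Keep only the regions tagged as gene features.
    return tuple(r for r in features if r.tags.get("feature_type") == "gene")


features = (
    MiniRegion("chr1", 10, 20, tags={"feature_type": "gene"}),
    MiniRegion("chr1", 30, 40, tags={"feature_type": "CDS"}),
    MiniRegion("chr2", 5, 15, tags={"feature_type": "gene"}),
)
print(len(features_on(features, "chr1")))  # 2
print([r.chrom for r in genes(features)])  # ['chr1', 'chr2']
```

Returning tuples rather than lists keeps the results immutable, in line with the frozen container they come from.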
feature_index
Build an IntervalTrack from all features.
Returns:
| Type | Description |
|---|---|
| `IntervalTrack` | An IntervalTrack built from all features in this genome. |
is_circular
Check whether a chromosome has circular topology.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `chrom` | `str` | Chromosome name. | required |
Returns:
| Type | Description |
|---|---|
| `bool` | True if the chromosome has circular topology, False otherwise. |
topological_sequence
Return sequence with overhang appended for circular chromosomes.
For circular chromosomes, appends the first `overhang` bases of the sequence to the end, enabling alignment across the origin. For linear chromosomes, returns the sequence unchanged.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `chrom` | `str` | Chromosome name. | required |
| `overhang` | `int` | Number of bases to append from the start. | `100000` |
Returns:
| Type | Description |
|---|---|
| `str` | The (possibly extended) DNA string. |
Examples:
>>> g = Genome("g", {}, {"c": "ATGC"}, (), {"c": "circular"})
>>> g.topological_sequence("c", overhang=2)
'ATGCAT'
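To show why the overhang matters, the sketch below searches for a motif that spans the origin of a circular sequence. `extend_for_topology` is a hypothetical standalone function written for illustration. It assumes any topology value other than `"circular"` leaves the sequence unchanged.

```python
def extend_for_topology(sequence: str, topology: str, overhang: int = 100_000) -> str:
    """Append the first `overhang` bases when the topology is circular."""
    if topology == "circular":
        # Slicing past the end is safe: it simply returns the whole sequence.
        return sequence + sequence[:overhang]
    return sequence


seq = "GCAAAAAT"  # on a circle, the final "AT" is followed by the leading "GC"
motif = "ATGC"

print(motif in seq)  # False: the motif wraps around the origin

extended = extend_for_topology(seq, "circular", overhang=3)
print(extended)              # GCAAAAATGCA
print(extended.find(motif))  # 6
```

A hit that starts before `len(seq)` but ends after it lies across the origin. Its end coordinate can be mapped back onto the circle with a modulo by the original length.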
load_genbank
Parse a GenBank file into a Genome.
Supports plain .gb and gzip-compressed .gb.gz files.
Extracts all features except source as Region objects.
BioPython is imported inside the function body (deferred import) so that the rest of the library can be used without it installed.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `str \| Path` | Path to a GenBank file (`.gb` or `.gb.gz`). | required |
Returns:
| Type | Description |
|---|---|
| `Genome` | The parsed Genome. |
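The deferred-import pattern described above can be sketched as follows. `non_source_features` is a hypothetical helper, not part of seqchain. It reads a plain (uncompressed) GenBank file with BioPython's `SeqIO.parse` and skips `source` features. The error message and the returned tuple layout are assumptions made for illustration.

```python
from pathlib import Path


def non_source_features(path):
    """Return (record_id, feature_type, start, end) tuples, skipping 'source'."""
    try:
        # Deferred import: BioPython is needed only when this function runs.
        from Bio import SeqIO
    except ImportError as exc:
        raise ImportError("BioPython is required to read GenBank files") from exc

    rows = []
    with open(Path(path)) as handle:
        for record in SeqIO.parse(handle, "genbank"):
            for feature in record.features:
                if feature.type == "source":
                    continue  # 'source' spans the whole record and is not a real feature
                rows.append(
                    (
                        record.id,
                        feature.type,
                        int(feature.location.start),
                        int(feature.location.end),
                    )
                )
    return rows
```

Because the import sits inside the function body, importing the enclosing module succeeds without BioPython installed. The dependency is checked only when a GenBank file is actually loaded.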
load_fasta
Parse a FASTA file into a Genome.
Produces a Genome with sequences only: no features and no organism metadata. All chromosomes default to linear topology unless listed in `circular`.
Supports plain and gzip-compressed (.gz) files. BioPython is not required; the parser uses only the standard library.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `str \| Path` | Path to a FASTA file. | required |
| `circular` | `set[str] \| None` | Set of chromosome names to mark as circular. | `None` |
Returns:
| Type | Description |
|---|---|
| `Genome` | A Genome containing sequences only. |
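A stdlib-only FASTA reader of the kind described above might look like the following sketch. `read_fasta` and `topologies_for` are hypothetical helpers, not the seqchain implementation. Taking the record name from the first whitespace-delimited token of the header and using the string `"linear"` for non-circular records are both assumptions.

```python
import gzip
import tempfile
from pathlib import Path


def read_fasta(path):
    """Return {record_name: uppercase sequence} from a plain or .gz FASTA file."""
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    sequences = {}
    name, chunks = None, []
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if name is not None:
                    sequences[name] = "".join(chunks).upper()
                header = line[1:].split()
                name = header[0] if header else ""  # first token is the record name
                chunks = []
            elif name is not None:
                chunks.append(line)
    if name is not None:
        sequences[name] = "".join(chunks).upper()  # flush the final record
    return sequences


def topologies_for(sequences, circular=None):
    """Mark every record linear unless its name appears in `circular`."""
    circular = circular or set()
    return {n: ("circular" if n in circular else "linear") for n in sequences}


# Demo on a temporary gzip-compressed file.
with tempfile.TemporaryDirectory() as tmp:
    fasta = Path(tmp) / "demo.fa.gz"
    with gzip.open(fasta, "wt") as handle:
        handle.write(">chrI demo\nacgt\nAC\n>chrM\nggcc\n")
    seqs = read_fasta(fasta)

print(seqs)  # {'chrI': 'ACGTAC', 'chrM': 'GGCC'}
print(topologies_for(seqs, circular={"chrM"}))  # {'chrI': 'linear', 'chrM': 'circular'}
```

Collecting sequence lines in a list and joining once per record avoids repeated string concatenation on long chromosomes.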