Sequence
Pure functions for DNA sequence math. No dependencies beyond stdlib.
Basics
reverse_complement
Return the reverse complement of a DNA sequence.
Complements A<->T and C<->G while preserving N. Case is preserved (lowercase input yields lowercase output).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| seq | str | DNA sequence containing characters in | required |
Returns:
| Type | Description |
|---|---|
| str | The reverse-complemented string. |
Examples:
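A minimal sketch of the behavior described above, assuming a translation-table approach. The function body is a stand-in written for illustration and is not taken from the library source.

```python
# Stand-in for the documented function (assumed implementation, not the library source).
# Maps each base to its complement; N maps to itself, and case is preserved.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the result."""
    return seq.translate(_COMPLEMENT)[::-1]


print(reverse_complement("ATGCn"))  # nGCAT
```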
Source code in src/seqchain/primitives/sequence.py
gc_content
Compute the GC fraction of a DNA sequence.
Counts G and C bases (case-insensitive) and divides by the total sequence length.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| seq | str | DNA sequence. | required |
Returns:
| Type | Description |
|---|---|
| float | GC fraction in the range |
| float | empty string. |
Examples:
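A minimal sketch of the calculation described above. The body is illustrative rather than the library source, and returning 0.0 for an empty string is an assumption made here to avoid a division by zero.

```python
# Stand-in for the documented function (assumed implementation).
def gc_content(seq: str) -> float:
    """Fraction of G and C bases over the total length, case-insensitive."""
    if not seq:
        return 0.0  # assumption: empty input yields 0.0
    upper = seq.upper()
    return (upper.count("G") + upper.count("C")) / len(seq)


print(gc_content("ATGC"))  # 0.5
print(gc_content("ggcc"))  # 1.0
```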
is_valid_dna
Check whether a sequence contains only unambiguous DNA bases.
Accepts A, T, G, C in either case. Rejects IUPAC
ambiguity codes (N, R, Y, etc.) and empty strings.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| seq | str | Candidate DNA string. | required |
Returns:
| Type | Description |
|---|---|
| bool | |
| bool | is non-empty; |
Examples:
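A minimal sketch of the check described above, assuming a simple set-membership test. It is illustrative only and not the library source.

```python
# Stand-in for the documented function (assumed implementation).
_UNAMBIGUOUS = frozenset("ACGTacgt")


def is_valid_dna(seq: str) -> bool:
    """True only for non-empty strings made entirely of A, T, G, C (either case)."""
    return bool(seq) and all(ch in _UNAMBIGUOUS for ch in seq)


print(is_valid_dna("ATgc"))  # True
print(is_valid_dna("ATGN"))  # False
print(is_valid_dna(""))      # False
```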
melting_temp
Estimate melting temperature using the nearest-neighbor method.
Implements the SantaLucia (1998) unified parameters with a salt correction for monovalent cations.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| seq | str | DNA oligonucleotide sequence (case-insensitive). Must be at least 2 nt for a meaningful result. | required |
| oligo_conc | float | Total oligonucleotide concentration in mol/L. Defaults to 250 nM. | 2.5e-07 |
| na_conc | float | Monovalent cation (Na+) concentration in mol/L. Defaults to 50 mM. | 0.05 |
Returns:
| Type | Description |
|---|---|
| float | Predicted melting temperature in degrees Celsius, rounded to one decimal place. Returns |
Examples:
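A rough sketch of a nearest-neighbor calculation with the SantaLucia (1998) unified parameters. Several details are assumptions rather than confirmed library behavior: the entropy salt correction 0.368 × (N − 1) × ln[Na+], the use of C_T/4 for non-self-complementary duplexes, and the omission of the symmetry correction for self-complementary oligos. The sketch expects unambiguous bases only.

```python
import math

# SantaLucia (1998) unified parameters: dinucleotide -> (dH kcal/mol, dS cal/(mol*K)).
_NN = {
    "AA": (-7.9, -22.2), "TT": (-7.9, -22.2),
    "AT": (-7.2, -20.4), "TA": (-7.2, -21.3),
    "CA": (-8.5, -22.7), "TG": (-8.5, -22.7),
    "GT": (-8.4, -22.4), "AC": (-8.4, -22.4),
    "CT": (-7.8, -21.0), "AG": (-7.8, -21.0),
    "GA": (-8.2, -22.2), "TC": (-8.2, -22.2),
    "CG": (-10.6, -27.2), "GC": (-9.8, -24.4),
    "GG": (-8.0, -19.9), "CC": (-8.0, -19.9),
}
# Initiation terms, applied once per terminal base pair.
_INIT = {"G": (0.1, -2.8), "C": (0.1, -2.8), "A": (2.3, 4.1), "T": (2.3, 4.1)}
_R = 1.987  # gas constant in cal/(mol*K)


def melting_temp(seq: str, oligo_conc: float = 2.5e-07, na_conc: float = 0.05) -> float:
    """Illustrative stand-in; not the library implementation."""
    seq = seq.upper()
    dh = ds = 0.0
    for end in (seq[0], seq[-1]):
        h, s = _INIT[end]
        dh, ds = dh + h, ds + s
    for i in range(len(seq) - 1):
        h, s = _NN[seq[i:i + 2]]
        dh, ds = dh + h, ds + s
    # Assumed salt correction on entropy for monovalent cations.
    ds += 0.368 * (len(seq) - 1) * math.log(na_conc)
    tm_kelvin = 1000.0 * dh / (ds + _R * math.log(oligo_conc / 4.0))
    return round(tm_kelvin - 273.15, 1)


print(melting_temp("ATGCATGCATGCATGCATGC"))
```

Under these assumptions, a GC-rich oligo yields a higher estimate than an AT-rich oligo of the same length, and raising na_conc raises the estimate.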
Distance
hamming_distance
Count the number of positions where two sequences differ.
Both sequences must be the same length.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| a | str | First sequence. | required |
| b | str | Second sequence (same length as a). | required |
Returns:
| Type | Description |
|---|---|
| int | Number of mismatched positions. |
Raises:
| Type | Description |
|---|---|
| ValueError | If a and b have different lengths. |
Examples:
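A minimal sketch of the behavior described above, including the length check. The body and the error message are illustrative and not the library source.

```python
# Stand-in for the documented function (assumed implementation).
def hamming_distance(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


print(hamming_distance("ATGC", "ATGG"))  # 1
```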
edit_distance
Compute the Levenshtein edit distance between two sequences.
Uses a single-row dynamic-programming approach for O(min(m, n)) space.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| a | str | First sequence. | required |
| b | str | Second sequence. | required |
Returns:
| Type | Description |
|---|---|
| int | Minimum number of single-character insertions, deletions, or substitutions to transform a into b. |
Examples:
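The single-row approach can be sketched as follows. This is an illustrative version, not the library source: it keeps one DP row sized to the shorter input, which gives the O(min(m, n)) space bound.

```python
# Stand-in for the documented function (assumed implementation).
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using a single rolling DP row."""
    if len(a) < len(b):
        a, b = b, a  # the row is indexed by the shorter string
    row = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        # diag holds the value up and to the left of the current cell.
        diag, row[0] = row[0], i
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            diag, row[j] = row[j], min(
                row[j] + 1,      # deletion
                row[j - 1] + 1,  # insertion
                diag + cost,     # substitution or match
            )
    return row[-1]


print(edit_distance("ATGC", "ATC"))  # 1
```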
diff
Produce a character-level diff between two equal-length sequences.
Matching positions are represented by '.' and mismatches show
the character from b.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| a | str | Reference sequence. | required |
| b | str | Query sequence (same length as a). | required |
Returns:
| Type | Description |
|---|---|
| str | Diff string of the same length as a and b. |
Raises:
| Type | Description |
|---|---|
| ValueError | If a and b have different lengths. |
Examples:
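A minimal sketch of the diff format described above: '.' for a match, the character from b for a mismatch. The body and the error message are illustrative and not the library source.

```python
# Stand-in for the documented function (assumed implementation).
def diff(a: str, b: str) -> str:
    """Character-level diff of two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return "".join("." if x == y else y for x, y in zip(a, b))


print(diff("ATGC", "ATCC"))  # ..C.
```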
IUPAC & pattern matching
expand_iupac
Convert an IUPAC degenerate nucleotide pattern to a regex string.
Each IUPAC ambiguity code is replaced by the corresponding regex
character class (e.g. N -> [ATGC], V -> [ACG]).
Concrete bases and unrecognised characters pass through unchanged.
Input is uppercased before conversion.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| pattern | str | IUPAC nucleotide pattern (e.g. | required |
Returns:
| Type | Description |
|---|---|
| str | Regex-compatible string. |
Examples:
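A minimal sketch of the conversion described above, assuming a lookup table of the standard IUPAC ambiguity codes. Only the N and V classes are stated in the text; the base order inside the other character classes is an assumption, and the body is not the library source.

```python
# Standard IUPAC ambiguity codes mapped to regex character classes.
# N and V follow the documented examples; the ordering of the rest is assumed.
_IUPAC = {
    "R": "[AG]", "Y": "[CT]", "S": "[GC]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ATGC]",
}


def expand_iupac(pattern: str) -> str:
    """Uppercase the pattern and replace ambiguity codes with character classes."""
    return "".join(_IUPAC.get(ch, ch) for ch in pattern.upper())


print(expand_iupac("ngg"))  # [ATGC]GG
```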
pattern_matches
Test whether an IUPAC pattern fully matches a concrete sequence.
The pattern must cover the entire sequence (anchored full-match). Both pattern and seq are compared case-insensitively.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| pattern | str | IUPAC nucleotide pattern (e.g. | required |
| seq | str | Concrete DNA sequence to test. | required |
Returns:
| Type | Description |
|---|---|
| bool | |
Examples:
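A minimal sketch of an anchored, case-insensitive match, assuming the pattern is first expanded to a regex and then tested with re.fullmatch. The lookup table and function body are illustrative and not the library source.

```python
import re

# Assumed IUPAC lookup table (ordering inside each class is illustrative).
_IUPAC = {
    "R": "[AG]", "Y": "[CT]", "S": "[GC]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ATGC]",
}


def pattern_matches(pattern: str, seq: str) -> bool:
    """True when the expanded pattern covers the whole sequence."""
    regex = "".join(_IUPAC.get(ch, ch) for ch in pattern.upper())
    # fullmatch anchors at both ends, so partial matches are rejected.
    return re.fullmatch(regex, seq.upper()) is not None


print(pattern_matches("NGG", "agg"))   # True
print(pattern_matches("NGG", "AGGG"))  # False
```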
find_pattern
Find all occurrences of an IUPAC pattern on both strands.
Scans the forward strand and then the reverse complement. All
positions are reported in forward-strand coordinates. Overlapping
matches are included (e.g. "AGGG" yields NGG at positions 0
and 1).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| seq | str | Reference DNA sequence to search. | required |
| pattern | str | IUPAC nucleotide pattern (e.g. | required |
Yields:
| Type | Description |
|---|---|
| int | |
| str | is |
| str | coordinate. |
Examples:
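A rough sketch of a two-strand scan with overlapping matches, using a regex lookahead. The shape of the yielded values is an assumption: this version yields hypothetical (start, strand) pairs with strand "+" or "-", and the library's actual tuple layout may differ. Reverse-strand hits are converted to forward-strand start coordinates as len(seq) - start - match_length.

```python
import re
from typing import Iterator, Tuple

# Assumed IUPAC lookup table (ordering inside each class is illustrative).
_IUPAC = {
    "R": "[AG]", "Y": "[CT]", "S": "[GC]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ATGC]",
}
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def find_pattern(seq: str, pattern: str) -> Iterator[Tuple[int, str]]:
    """Yield (forward-strand start, strand) for every match; illustrative only."""
    body = "".join(_IUPAC.get(ch, ch) for ch in pattern.upper())
    # A zero-width lookahead lets finditer report overlapping matches.
    regex = re.compile("(?=(" + body + "))")
    forward = seq.upper()
    reverse = forward.translate(_COMPLEMENT)[::-1]
    for m in regex.finditer(forward):
        yield m.start(), "+"
    for m in regex.finditer(reverse):
        # Map the reverse-complement offset back to forward coordinates.
        yield len(forward) - m.start() - len(m.group(1)), "-"


print(list(find_pattern("AGGG", "NGG")))  # [(0, '+'), (1, '+')]
print(list(find_pattern("CCT", "NGG")))   # [(0, '-')]
```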