regex_map
Find IUPAC nucleotide patterns in genome sequences. Expands patterns
like NGG into regex ([ATGC]GG) and scans both strands. Supports
circular genome topologies with origin-spanning matches.
Use this for PAM scanning in CRISPR guide design, or any motif search where you need all occurrences of a degenerate pattern.
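The expansion step described above can be sketched as follows. This is a minimal illustration, not the library's own code: `iupac_to_regex` and the `IUPAC` table are hypothetical names, and the only behaviour taken from the text is that `NGG` becomes `[ATGC]GG`.

```python
import re

# Standard IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ATGC",
}

def iupac_to_regex(pattern: str) -> str:
    """Expand an IUPAC pattern into a regex string (hypothetical helper)."""
    parts = []
    for code in pattern.upper():
        bases = IUPAC[code]  # raises KeyError on non-IUPAC characters
        # Unambiguous bases stay literal; degenerate codes become a character class.
        parts.append(bases if len(bases) == 1 else f"[{bases}]")
    return "".join(parts)

regex = iupac_to_regex("NGG")
# Forward-strand start offsets of non-overlapping matches in a toy sequence.
hits = [m.start() for m in re.finditer(regex, "TTAGGCCGGA")]
```

Here `regex` is `[ATGC]GG` and `hits` holds the forward-strand start offsets of the two NGG sites in the toy sequence.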
regex_map(sequences: dict[str, str], pattern: str, *, topologies: dict[str, str] | None = None) -> Iterator[Region]
Scan genome sequences for an IUPAC motif pattern.
Scans both forward and reverse strands of every chromosome for
occurrences of pattern, yielding a Region per match.
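One way to scan both strands is to search the sequence and its reverse complement, then map reverse-strand hits back to forward coordinates. The sketch below assumes 0-based, half-open coordinates and reports overlapping sites via a lookahead. Both are assumptions for illustration, and `scan_both_strands` is a hypothetical helper rather than part of the documented API.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def scan_both_strands(seq: str, regex: str):
    """Yield (start, end, strand, matched) in forward-strand coordinates."""
    # Wrapping the regex in a lookahead lets overlapping sites all be reported.
    overlapping = re.compile(f"(?=({regex}))")
    for m in overlapping.finditer(seq):
        yield m.start(1), m.end(1), "+", m.group(1)
    n = len(seq)
    for m in overlapping.finditer(reverse_complement(seq)):
        # Offsets on the reverse complement are mirrored back onto the forward strand.
        yield n - m.end(1), n - m.start(1), "-", m.group(1)

sites = list(scan_both_strands("CCAAGG", "[ATGC]GG"))
```

The reverse-strand hit covers forward positions 0 to 3 (`CCA`), whose reverse complement is the matched `TGG`.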
For circular chromosomes, also scans a junction window around the origin to find motif sites that straddle position 0.
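The junction scan can be sketched as below, assuming a fixed-length motif of `motif_len` bases and a window built from the last and first `motif_len - 1` bases. The function name, window size, and return shape are illustrative assumptions, not the library's actual implementation, and only the forward strand is shown.

```python
import re

def scan_circular_junction(seq: str, regex: str, motif_len: int):
    """Yield (start, matched) for forward-strand matches that wrap past the origin."""
    n = len(seq)
    k = motif_len - 1
    if k <= 0 or n < motif_len:
        return
    # Window = last k bases followed by the first k bases of the chromosome.
    window = seq[-k:] + seq[:k]
    for m in re.finditer(f"(?=({regex}))", window):
        start, end = m.start(1), m.end(1)
        # Keep only matches that cross the seam between the two halves;
        # anything else would already be found by the linear scan.
        if start < k < end:
            # Convert the window offset to a genome coordinate.
            yield n - k + start, m.group(1)

wrapped = list(scan_circular_junction("GGTTTTA", "[ATGC]GG", 3))
```

In this toy chromosome the final `A` at position 6 joins the leading `GG`, so the only origin-spanning site starts at 6.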
Each Region carries:
- `name`: the matched sequence string
- `tags["pattern"]`: the original IUPAC pattern
- `tags["matched"]`: the concrete matched sequence
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `sequences` | `dict[str, str]` | Mapping of chromosome name to uppercase DNA string. | required |
| `pattern` | `str` | IUPAC nucleotide pattern (e.g. …) | required |
| `topologies` | `dict[str, str] \| None` | Mapping of chromosome name to … | `None` |
Yields:
| Type | Description |
|---|---|
| `Region` | One `Region` per match; … coordinates. |
Examples: