CRISPR Guide Design
Complete guide-design workflow driven by nuclease presets. Four steps: scan (find PAM sites), interpret (extract spacers), score (off-target alignment), and annotate (gene overlap).
from seqchain.io.genome import load_genbank
from seqchain.recipes import load_preset
from seqchain.recipes.crispr import design_guides
genome = load_genbank("yeast.gb")
guides = design_guides(genome, load_preset("spcas9"))
Each step is individually callable; design_guides() is the default composition.
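Because every stage consumes and produces an iterator, the composition can be pictured as nested generators. The sketch below is a toy model under stated assumptions, not seqchain code: scan, interpret, score, annotate, and pipeline are hypothetical names that pass plain dicts instead of Regions, the PAM is a literal string on the forward strand only, and the "score" is GC fraction standing in for off-target alignment. What it does show is that nothing is materialized except the fixed-size batches inside the score stage.

```python
from itertools import islice
from typing import Iterable, Iterator

# Toy, hypothetical stand-ins for the four stages. They pass plain dicts
# rather than seqchain Regions and exist only to show generator composition.


def scan(seq: str, pam: str = "GG") -> Iterator[dict]:
    """Yield the start of every (possibly overlapping) literal PAM match, forward strand only."""
    start = seq.find(pam)
    while start != -1:
        yield {"pam_start": start}
        start = seq.find(pam, start + 1)


def interpret(hits: Iterable[dict], seq: str, spacer_len: int) -> Iterator[dict]:
    """Attach the spacer that sits immediately 5' of each PAM hit."""
    for hit in hits:
        begin = hit["pam_start"] - spacer_len
        if begin < 0:
            continue  # boundary hit: not enough sequence for a full spacer
        yield {**hit, "spacer": seq[begin:hit["pam_start"]]}


def score(guides: Iterable[dict], batch_size: int = 2) -> Iterator[dict]:
    """Materialize guides only in fixed-size batches, inside this stage."""
    it = iter(guides)
    while batch := list(islice(it, batch_size)):
        # A real scorer could hand the whole batch to an aligner here;
        # this toy just records GC fraction as a placeholder score.
        for guide in batch:
            gc = sum(base in "GC" for base in guide["spacer"]) / len(guide["spacer"])
            yield {**guide, "score": gc}


def annotate(guides: Iterable[dict]) -> Iterator[dict]:
    """Placeholder annotation step: this toy has no gene model."""
    for guide in guides:
        yield {**guide, "gene": None}


def pipeline(seq: str, spacer_len: int = 4) -> Iterator[dict]:
    # Nothing executes until the caller starts iterating.
    return annotate(score(interpret(scan(seq), seq, spacer_len)))
```

Each stage can also be called on its own, e.g. list(scan("AGGG")) returns just the raw PAM hits, mirroring how the individual seqchain steps can be used outside design_guides().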
CRISPRPreset (dataclass)
CRISPRPreset(name: str, pam: str, spacer_len: int, pam_direction: str, description: str = '', align_pam_with_spacer: bool = False, seed_alignment: bool = False, seed_length: int | None = None, report_all_hits: bool = False, max_reported_hits: int | None = None, mismatches: int = 0)
Configuration for a CRISPR nuclease system.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Nuclease name (e.g. …). | required |
| pam | str | PAM pattern in IUPAC notation (e.g. …). | required |
| spacer_len | int | Guide spacer length in bp. | required |
| pam_direction | str | … | required |
| description | str | Human-readable description. | '' |
| align_pam_with_spacer | bool | If … | False |
| seed_alignment | bool | If … | False |
| seed_length | int \| None | Seed length for … | None |
| report_all_hits | bool | If … | False |
| max_reported_hits | int \| None | Maximum alignments to report per query (…). | None |
| mismatches | int | Number of allowed mismatches. | 0 |
Examples:
>>> CRISPRPreset("SpCas9", "NGG", 20, "downstream", "Cas9")
CRISPRPreset(name='SpCas9', pam='NGG', spacer_len=20, pam_direction='downstream', description='Cas9', align_pam_with_spacer=False, seed_alignment=False, seed_length=None, report_all_hits=False, max_reported_hits=None, mismatches=0)
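The pam field is an IUPAC pattern such as "NGG". One common way to scan for such a pattern without alignment is to translate each IUPAC code into a regex character class and search both strands. The sketch below assumes that approach; iupac_to_regex and find_pam_sites are hypothetical helpers, not seqchain functions, and coordinates are 0-based and half-open on the forward strand.

```python
import re

# IUPAC nucleotide codes as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Complement table for IUPAC codes (R<->Y, K<->M, B<->V, D<->H; S, W, N map to themselves).
IUPAC_COMPLEMENT = str.maketrans("ACGTRYKMBVDHSWN", "TGCAYRMKVBHDSWN")


def iupac_to_regex(pattern: str) -> re.Pattern:
    body = "".join(IUPAC[code] for code in pattern.upper())
    # A zero-width lookahead lets finditer report overlapping sites,
    # e.g. both NGG hits in "AGGG".
    return re.compile(f"(?=({body}))")


def find_pam_sites(seq: str, pam: str) -> list[tuple[int, int, str]]:
    """Return sorted (start, end, strand) tuples for PAM matches on both strands."""
    seq = seq.upper()
    size = len(pam)
    forward = iupac_to_regex(pam)
    # A minus-strand PAM shows up on the forward strand as the reverse complement
    # of the pattern (NGG -> CCN).
    reverse = iupac_to_regex(pam.upper().translate(IUPAC_COMPLEMENT)[::-1])
    hits = [(m.start(), m.start() + size, "+") for m in forward.finditer(seq)]
    hits += [(m.start(), m.start() + size, "-") for m in reverse.finditer(seq)]
    return sorted(hits)
```

For example, find_pam_sites("AGGTCCA", "NGG") reports the AGG site on the plus strand and the CCA site (a minus-strand NGG) on the minus strand.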
design_guides
design_guides(genome: Genome, preset: CRISPRPreset, *, chrom: str | None = None, threads: int = 1) -> Iterator[Region]
Run the full CRISPR guide design pipeline.
Pure streaming pipeline: Scan → Interpret → Score → Annotate. The only materialization is the batching inside score_off_targets, and it happens inside that operation, not here.
The annotation step is a dual-sorted sweep-line via annotate_with_genes, so memory is O(genes_in_window), never O(genome).
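The O(genes_in_window) bound follows from sweeping two start-sorted streams together: a gene enters the window once it could overlap the current guide and leaves once it ends before the current guide starts. The sketch below illustrates that idea on a single chromosome; Interval and annotate_overlaps are hypothetical names, and the heap-keyed-on-end window is one possible design, not necessarily what annotate_with_genes does internally.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Interval(NamedTuple):
    start: int  # 0-based, inclusive
    end: int    # exclusive
    name: str


def annotate_overlaps(guides: Iterable[Interval],
                      genes: Iterable[Interval]) -> Iterator[tuple[Interval, list[str]]]:
    """Yield (guide, overlapping gene names). Both inputs must be sorted by start."""
    gene_iter = iter(genes)
    pending = next(gene_iter, None)
    window: list[tuple[int, int, Interval]] = []  # min-heap keyed on gene end
    order = 0  # tie-breaker so the heap never has to compare Interval objects
    for guide in guides:
        # Admit every gene that starts before this guide ends.
        while pending is not None and pending.start < guide.end:
            heapq.heappush(window, (pending.end, order, pending))
            order += 1
            pending = next(gene_iter, None)
        # Evict genes ending at or before this guide's start; later guides start no earlier.
        while window and window[0][0] <= guide.start:
            heapq.heappop(window)
        # A gene admitted for an earlier, longer guide may still start after this one ends.
        hits = sorted(gene.name for _, _, gene in window if gene.start < guide.end)
        yield guide, hits
```

Memory is bounded by the genes currently in the window rather than by the whole gene list, and both inputs can themselves be lazy iterators.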
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| genome | Genome | A loaded … | required |
| preset | CRISPRPreset | Nuclease preset defining PAM, spacer length, etc. | required |
| chrom | str \| None | Chromosome to scan. Defaults to all chromosomes. | None |
| threads | int | Number of bowtie threads for off-target scoring. | 1 |
Yields:
| Type | Description |
|---|---|
| Region | Fully annotated guide Regions. |
Examples:
>>> from seqchain.io.genome import load_genbank
>>> from seqchain.recipes import load_preset
>>> from seqchain.recipes.crispr import design_guides
>>> genome = load_genbank("yeast.gb")
>>> guides = design_guides(genome, load_preset("spcas9"))
Source code in src/seqchain/recipes/crispr.py
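The score step relies on bowtie for off-target search (hence the threads parameter). To make concrete what such a search counts, the sketch below does the same job by brute force: it counts every position on either strand where a spacer aligns with at most max_mismatches substitutions. count_off_targets is a hypothetical helper; it ignores the PAM, indels, and any seed rules, and is only practical for tiny sequences.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    """Number of substitutions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def count_off_targets(spacer: str, genome: dict[str, str], max_mismatches: int) -> int:
    """Count alignments on either strand with <= max_mismatches substitutions, on-target included."""
    size = len(spacer)
    targets = (spacer, revcomp(spacer))  # plus-strand and minus-strand forms of the spacer
    hits = 0
    for seq in genome.values():
        for i in range(len(seq) - size + 1):
            window = seq[i:i + size]
            hits += sum(hamming(window, t) <= max_mismatches for t in targets)
    return hits
```

Under these assumptions, a guide whose count stays at 1 (its own site) for the allowed mismatch budget has no off-target matches.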
interpret_guides
interpret_guides(regions: Iterable[Region], sequences: dict[str, str], preset: CRISPRPreset, *, topologies: dict[str, str] | None = None) -> Iterator[Region]
Transform PAM hits into domain-meaningful guide Regions.
Yields guide Regions with full-footprint coordinates, a hash of those coordinates as the name, and tags including guide_id, pam_start, pam_end, spacer, and guide_seq. PAM hits at chromosome boundaries are silently skipped.
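Turning a PAM hit into a guide means taking the spacer from the correct side of the PAM, which depends on both the hit's strand and pam_direction. The sketch below assumes that "downstream" means the PAM lies 3' of the spacer on the guide's strand (as for SpCas9's NGG) and "upstream" means it lies 5', that chromosomes are linear, and that the name is a truncated SHA-1 of the footprint coordinates. interpret_hit and its return keys are hypothetical, loosely modeled on the tags listed above.

```python
import hashlib
from typing import Optional

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def interpret_hit(chrom: str, seq: str, pam_start: int, pam_end: int, strand: str,
                  spacer_len: int, pam_direction: str = "downstream") -> Optional[dict]:
    """Build a guide record from one PAM hit (0-based, half-open forward-strand coordinates)."""
    # In forward coordinates the spacer follows the PAM for "+/upstream" and "-/downstream".
    spacer_after_pam = (strand == "+") == (pam_direction == "upstream")
    if spacer_after_pam:
        sp_start, sp_end = pam_end, pam_end + spacer_len
    else:
        sp_start, sp_end = pam_start - spacer_len, pam_start
    if sp_start < 0 or sp_end > len(seq):
        return None  # boundary hit on a linear chromosome: skip it
    spacer, pam = seq[sp_start:sp_end], seq[pam_start:pam_end]
    if strand == "-":
        spacer, pam = revcomp(spacer), revcomp(pam)
    start, end = min(sp_start, pam_start), max(sp_end, pam_end)
    # Name the guide by hashing its footprint coordinates (assumed scheme).
    name = hashlib.sha1(f"{chrom}:{start}-{end}:{strand}".encode()).hexdigest()[:12]
    guide_seq = spacer + pam if pam_direction == "downstream" else pam + spacer
    return {"name": name, "start": start, "end": end, "strand": strand,
            "pam_start": pam_start, "pam_end": pam_end,
            "spacer": spacer, "guide_seq": guide_seq}
```

Because the name depends only on coordinates and strand, the same site always receives the same identifier across runs.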
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| regions | Iterable[Region] | PAM hit Regions (output of a …). | required |
| sequences | dict[str, str] | Mapping of chromosome name to DNA string. | required |
| preset | CRISPRPreset | Nuclease preset defining spacer length and PAM direction. | required |
| topologies | dict[str, str] \| None | Mapping of chromosome name to … | None |
Yields:
| Type | Description |
|---|---|
| Region | Guide Regions. |
Examples:
>>> from seqchain.recipes import load_preset
>>> preset = load_preset("spcas9")
>>> # hits: PAM hit Regions; seq: the DNA string for chromosome "chr"
>>> interpreted = list(interpret_guides(hits, {"chr": seq}, preset))
Source code in src/seqchain/recipes/crispr.py
configure_mapper
configure_mapper(preset: CRISPRPreset, *, topologies: dict[str, str] | None = None) -> Callable[..., Iterator[Region]]
Create a mapper for the preset's PAM pattern.
Returns a functools.partial(bowtie_map, ...) when alignment
fields are set on the preset (e.g. mismatches > 0 or
align_pam_with_spacer), otherwise a bound
regex_map() function.
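The dispatch described above can be pictured with functools.partial: both branches return a callable with the same call shape, so downstream code does not need to know which backend was chosen. In the sketch below, Preset, regex_scan, mismatch_scan, and make_mapper are hypothetical stand-ins (not seqchain's regex_map or bowtie_map), the pattern is plain ACGT, and only mismatches and align_pam_with_spacer are treated as "alignment fields".

```python
import functools
import re
from dataclasses import dataclass
from typing import Callable, Iterator


@dataclass
class Preset:
    """Hypothetical, trimmed stand-in for CRISPRPreset."""
    pam: str
    mismatches: int = 0
    align_pam_with_spacer: bool = False


def regex_scan(seq: str, *, pattern: str) -> Iterator[int]:
    """Exact search: yield start offsets of (possibly overlapping) matches."""
    for match in re.finditer(f"(?=({pattern}))", seq):
        yield match.start()


def mismatch_scan(seq: str, *, pattern: str, mismatches: int) -> Iterator[int]:
    """Stand-in for an aligner: allow up to `mismatches` substitutions."""
    size = len(pattern)
    for i in range(len(seq) - size + 1):
        if sum(a != b for a, b in zip(seq[i:i + size], pattern)) <= mismatches:
            yield i


def make_mapper(preset: Preset) -> Callable[[str], Iterator[int]]:
    # Alignment-style settings route to the mismatch-tolerant backend.
    if preset.mismatches > 0 or preset.align_pam_with_spacer:
        return functools.partial(mismatch_scan, pattern=preset.pam, mismatches=preset.mismatches)
    return functools.partial(regex_scan, pattern=preset.pam)
```

Binding the preset's settings up front keeps the returned mapper a single-argument callable, which is what lets the pipeline treat both backends interchangeably.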
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| preset | CRISPRPreset | Nuclease preset defining PAM and alignment parameters. | required |
| topologies | dict[str, str] \| None | Mapping of chromosome name to … | None |
Returns:
| Type | Description |
|---|---|
| Callable[..., Iterator[Region]] | A callable that takes … and returns an iterator of Regions. |
Examples:
>>> from seqchain.recipes import load_preset
>>> mapper = configure_mapper(load_preset("spcas9"))
>>> callable(mapper)
True