Bowtie Mapping¶

Align spacer sequences against a reference genome using Bowtie. Handles index building, query generation, alignment, and result parsing.

Supports both standard mode (-v, sequence-space mismatches) and spy mode (-n, seed-based alignment with PAM included in the query).

bowtie ¶

Alignment-based mapping via Bowtie 1.

Provides bowtie_map() which handles the full workflow: PAM scanning, spacer extraction, FASTA/FASTQ creation, bowtie-build, bowtie alignment, SAM parsing, coordinate normalization, and Region yielding.

Two alignment modes are supported:

Standard mode (-v mismatch counting): reports only the best unique hit per query. Used for on-target mapping.
Spy mode (-n seed-based alignment): reports all hits up to max_reported_hits, with PAM prepended to the query. Used for comprehensive off-target analysis.

Requires bowtie and bowtie-build on $PATH (installed via mamba).

bowtie_map ¶

bowtie_map(sequences: dict[str, str], pam: str, *, spacer_len: int = 20, pam_direction: str = 'downstream', mismatches: int = 0, align_pam_with_spacer: bool = False, seed_alignment: bool = False, seed_length: int | None = None, report_all_hits: bool = False, max_reported_hits: int | None = None, threads: int = 1, topologies: dict[str, str] | None = None) -> Iterator[Region]

Scan sequences for PAM sites and align via bowtie.

Scans genomes for PAM sites, extracts spacers, aligns them against a topological reference via bowtie, parses the SAM output, and yields Regions with alignment metadata.

Parameters:

Name	Type	Description	Default
`sequences`	`dict[str, str]`	Mapping of chromosome name to uppercase DNA string.	required
`pam`	`str`	PAM pattern in IUPAC notation (e.g. `"NGG"`).	required
`spacer_len`	`int`	Guide spacer length in bp. Defaults to `20`.	`20`
`pam_direction`	`str`	`"downstream"` or `"upstream"`. Defaults to `"downstream"`.	`'downstream'`
`mismatches`	`int`	Number of allowed mismatches. Defaults to `0`.	`0`
`align_pam_with_spacer`	`bool`	If `True`, prepend/append PAM to the query (spy strategy). Defaults to `False`.	`False`
`seed_alignment`	`bool`	If `True`, use `-n` mode instead of `-v`. Defaults to `False`.	`False`
`seed_length`	`int \| None`	Seed length for `-n` mode. Defaults to `None`.	`None`
`report_all_hits`	`bool`	If `True`, use `-a` to report all hits. Defaults to `False`.	`False`
`max_reported_hits`	`int \| None`	Cap on reported alignments (`-m` flag). Defaults to `None`.	`None`
`threads`	`int`	Number of bowtie threads. Defaults to `1`.	`1`
`topologies`	`dict[str, str] \| None`	Mapping of chrom name to `"circular"` or `"linear"`. If `None`, all chroms treated as linear.	`None`

Yields:

Type	Description
`Region`	`Region` objects for each alignment hit, with tags:
`Region`	`pattern`, `matched`, `spacer`, `target`,
`Region`	`mismatches`, `guide_seq`.

Raises:

Type	Description
`RuntimeError`	If `bowtie` or `bowtie-build` is not found on `$PATH`.

Examples:

>>> hits = list(bowtie_map({"chr1": "ATGCNGG..."}, pam="NGG"))

Source code in src/seqchain/operations/map/bowtie.py

def bowtie_map(
    sequences: dict[str, str],
    pam: str,
    *,
    spacer_len: int = 20,
    pam_direction: str = "downstream",
    mismatches: int = 0,
    align_pam_with_spacer: bool = False,
    seed_alignment: bool = False,
    seed_length: int | None = None,
    report_all_hits: bool = False,
    max_reported_hits: int | None = None,
    threads: int = 1,
    topologies: dict[str, str] | None = None,
) -> Iterator[Region]:
    """Scan sequences for PAM sites and align via bowtie.

    Scans genomes for PAM sites, extracts spacers, aligns them against
    a topological reference via bowtie, parses the SAM output, and
    yields Regions with alignment metadata.

    Args:
        sequences: Mapping of chromosome name to uppercase DNA string.
        pam: PAM pattern in IUPAC notation (e.g. ``"NGG"``).
        spacer_len: Guide spacer length in bp. Defaults to ``20``.
        pam_direction: ``"downstream"`` or ``"upstream"``. Defaults to
            ``"downstream"``.
        mismatches: Number of allowed mismatches. Defaults to ``0``.
        align_pam_with_spacer: If ``True``, prepend/append PAM to the
            query (spy strategy). Defaults to ``False``.
        seed_alignment: If ``True``, use ``-n`` mode instead of ``-v``.
            Defaults to ``False``.
        seed_length: Seed length for ``-n`` mode. Defaults to ``None``.
        report_all_hits: If ``True``, use ``-a`` to report all hits.
            Defaults to ``False``.
        max_reported_hits: Cap on reported alignments (``-m`` flag).
            Defaults to ``None``.
        threads: Number of bowtie threads. Defaults to ``1``.
        topologies: Mapping of chrom name to ``"circular"`` or
            ``"linear"``. If ``None``, all chroms treated as linear.

    Yields:
        `Region` objects for each alignment hit, with tags:
        ``pattern``, ``matched``, ``spacer``, ``target``,
        ``mismatches``, ``guide_seq``.

    Raises:
        RuntimeError: If ``bowtie`` or ``bowtie-build`` is not found
            on ``$PATH``.

    Examples:
        >>> hits = list(bowtie_map({"chr1": "ATGCNGG..."}, pam="NGG"))  # doctest: +SKIP
    """
    topo = topologies or {}

    with tempfile.TemporaryDirectory(prefix="seqchain_bowtie_") as tmpdir:
        if not sequences:
            return

        # 1. Build topological FASTA reference
        ref_fasta = os.path.join(tmpdir, "ref.fa")
        chrom_lengths = write_reference_fasta(sequences, ref_fasta, topo)

        # 2. Build bowtie index
        index_prefix = os.path.join(tmpdir, "ref")
        build_bowtie_index(ref_fasta, index_prefix)

        # 3. Scan PAMs and extract spacers → write FASTQ queries
        query_fastq = os.path.join(tmpdir, "queries.fq")
        query_map = _write_query_fastq(
            sequences, query_fastq, pam, spacer_len, pam_direction,
            align_pam_with_spacer, topo,
        )

        if not query_map:
            return

        # 4. Run bowtie alignment
        sam_file = os.path.join(tmpdir, "alignments.sam")
        run_bowtie_alignment(
            index_prefix,
            query_fastq,
            sam_file,
            mismatches=mismatches,
            seed_alignment=seed_alignment,
            seed_length=seed_length,
            report_all_hits=report_all_hits,
            max_reported_hits=max_reported_hits,
            threads=threads,
        )

        # 5. Parse SAM and yield Regions
        pam_regex = re.compile(expand_iupac(pam.upper()))
        yield from _parse_sam(
            sam_file, ref_fasta, query_map, chrom_lengths, pam_regex,
            sequences, pam, pam_direction, spacer_len,
            align_pam_with_spacer, topo,
        )