Skip to content

Bowtie Mapping

Align spacer sequences against a reference genome using Bowtie. Handles index building, query generation, alignment, and result parsing.

Supports both standard mode (-v, sequence-space mismatches) and spy mode (-n, seed-based alignment with PAM included in the query).

bowtie

Alignment-based mapping via Bowtie 1.

Provides bowtie_map() which handles the full workflow: PAM scanning, spacer extraction, FASTA/FASTQ creation, bowtie-build, bowtie alignment, SAM parsing, coordinate normalization, and Region yielding.

Two alignment modes are supported:

  • Standard mode (-v mismatch counting): reports only the best unique hit per query. Used for on-target mapping.
  • Spy mode (-n seed-based alignment): reports all hits up to max_reported_hits, with PAM prepended to the query. Used for comprehensive off-target analysis.

Requires bowtie and bowtie-build on $PATH (installed via mamba).

bowtie_map

bowtie_map(sequences: dict[str, str], pam: str, *, spacer_len: int = 20, pam_direction: str = 'downstream', mismatches: int = 0, align_pam_with_spacer: bool = False, seed_alignment: bool = False, seed_length: int | None = None, report_all_hits: bool = False, max_reported_hits: int | None = None, threads: int = 1, topologies: dict[str, str] | None = None) -> Iterator[Region]

Scan sequences for PAM sites and align via bowtie.

Scans genomes for PAM sites, extracts spacers, aligns them against a topological reference via bowtie, parses the SAM output, and yields Regions with alignment metadata.

Parameters:

Name Type Description Default
sequences dict[str, str]

Mapping of chromosome name to uppercase DNA string.

required
pam str

PAM pattern in IUPAC notation (e.g. "NGG").

required
spacer_len int

Guide spacer length in bp. Defaults to 20.

20
pam_direction str

"downstream" or "upstream". Defaults to "downstream".

'downstream'
mismatches int

Number of allowed mismatches. Defaults to 0.

0
align_pam_with_spacer bool

If True, prepend/append PAM to the query (spy strategy). Defaults to False.

False
seed_alignment bool

If True, use -n mode instead of -v. Defaults to False.

False
seed_length int | None

Seed length for -n mode. Defaults to None.

None
report_all_hits bool

If True, use -a to report all hits. Defaults to False.

False
max_reported_hits int | None

Cap on reported alignments (-m flag). Defaults to None.

None
threads int

Number of bowtie threads. Defaults to 1.

1
topologies dict[str, str] | None

Mapping of chrom name to "circular" or "linear". If None, all chroms treated as linear.

None

Yields:

Type Description
Region

Region objects for each alignment hit, with tags:

Region

pattern, matched, spacer, target,

Region

mismatches, guide_seq.

Raises:

Type Description
RuntimeError

If bowtie or bowtie-build is not found on $PATH.

Examples:

>>> hits = list(bowtie_map({"chr1": "ATGCNGG..."}, pam="NGG"))
Source code in src/seqchain/operations/map/bowtie.py
def bowtie_map(
    sequences: dict[str, str],
    pam: str,
    *,
    spacer_len: int = 20,
    pam_direction: str = "downstream",
    mismatches: int = 0,
    align_pam_with_spacer: bool = False,
    seed_alignment: bool = False,
    seed_length: int | None = None,
    report_all_hits: bool = False,
    max_reported_hits: int | None = None,
    threads: int = 1,
    topologies: dict[str, str] | None = None,
) -> Iterator[Region]:
    """Scan sequences for PAM sites and align via bowtie.

    Scans genomes for PAM sites, extracts spacers, aligns them against
    a topological reference via bowtie, parses the SAM output, and
    yields Regions with alignment metadata.

    Args:
        sequences: Mapping of chromosome name to uppercase DNA string.
        pam: PAM pattern in IUPAC notation (e.g. ``"NGG"``).
        spacer_len: Guide spacer length in bp. Defaults to ``20``.
        pam_direction: ``"downstream"`` or ``"upstream"``. Defaults to
            ``"downstream"``.
        mismatches: Number of allowed mismatches. Defaults to ``0``.
        align_pam_with_spacer: If ``True``, prepend/append PAM to the
            query (spy strategy). Defaults to ``False``.
        seed_alignment: If ``True``, use ``-n`` mode instead of ``-v``.
            Defaults to ``False``.
        seed_length: Seed length for ``-n`` mode. Defaults to ``None``.
        report_all_hits: If ``True``, use ``-a`` to report all hits.
            Defaults to ``False``.
        max_reported_hits: Cap on reported alignments (``-m`` flag).
            Defaults to ``None``.
        threads: Number of bowtie threads. Defaults to ``1``.
        topologies: Mapping of chrom name to ``"circular"`` or
            ``"linear"``. If ``None``, all chroms treated as linear.

    Yields:
        `Region` objects for each alignment hit, with tags:
        ``pattern``, ``matched``, ``spacer``, ``target``,
        ``mismatches``, ``guide_seq``.

    Raises:
        RuntimeError: If ``bowtie`` or ``bowtie-build`` is not found
            on ``$PATH``.

    Examples:
        >>> hits = list(bowtie_map({"chr1": "ATGCNGG..."}, pam="NGG"))  # doctest: +SKIP
    """
    topo = topologies or {}

    with tempfile.TemporaryDirectory(prefix="seqchain_bowtie_") as tmpdir:
        if not sequences:
            return

        # 1. Build topological FASTA reference
        ref_fasta = os.path.join(tmpdir, "ref.fa")
        chrom_lengths = write_reference_fasta(sequences, ref_fasta, topo)

        # 2. Build bowtie index
        index_prefix = os.path.join(tmpdir, "ref")
        build_bowtie_index(ref_fasta, index_prefix)

        # 3. Scan PAMs and extract spacers → write FASTQ queries
        query_fastq = os.path.join(tmpdir, "queries.fq")
        query_map = _write_query_fastq(
            sequences, query_fastq, pam, spacer_len, pam_direction,
            align_pam_with_spacer, topo,
        )

        if not query_map:
            return

        # 4. Run bowtie alignment
        sam_file = os.path.join(tmpdir, "alignments.sam")
        run_bowtie_alignment(
            index_prefix,
            query_fastq,
            sam_file,
            mismatches=mismatches,
            seed_alignment=seed_alignment,
            seed_length=seed_length,
            report_all_hits=report_all_hits,
            max_reported_hits=max_reported_hits,
            threads=threads,
        )

        # 5. Parse SAM and yield Regions
        pam_regex = re.compile(expand_iupac(pam.upper()))
        yield from _parse_sam(
            sam_file, ref_fasta, query_map, chrom_lengths, pam_regex,
            sequences, pam, pam_direction, spacer_len,
            align_pam_with_spacer, topo,
        )