Skip to content

Tn-seq Pipeline

Transposon insertion profiling: Discover → Trim → Map → Count. Raw FASTQ in, IntervalTrack of insertion counts out. Supports Himar1 (TA sites) and Tn5 (any position), Bowtie or BWA alignment, and staggered primer libraries.

Validated against TRANSIT's TPP on Sassetti lab Himar1 data (Pearson r = 0.998).

TnSeqResult dataclass

TnSeqResult(track: IntervalTrack, structure: TnSeqReadStructure, total_reads: int, junction_reads: int, mapped_reads: int, insertion_sites_hit: int, total_possible_sites: int, saturation: float)

Result of a Tn-seq processing pipeline run.

Contains an insertion-count Track plus summary statistics. The Track holds one Region per possible insertion site (e.g. every TA dinucleotide for Himar1), each with a read count in its score field and tags["count"].

Parameters:

Name Type Description Default
track IntervalTrack

IntervalTrack of insertion site Regions.

required
structure TnSeqReadStructure

Read structure learned during discovery.

required
total_reads int

Total reads in the junction file.

required
junction_reads int

Reads containing the transposon terminus.

required
mapped_reads int

Junction reads that aligned to the reference.

required
insertion_sites_hit int

Sites with at least one read.

required
total_possible_sites int

Total enumerable insertion sites.

required
saturation float

Fraction of sites hit (sites_hit / total_possible).

required

Examples:

>>> result.saturation
0.42

TnSeqPipeline dataclass

TnSeqPipeline(transposon: str = 'himar1', mapper: str = 'bowtie', mismatches: int = 1, threads: int = 1, min_trimmed_length: int = 20)

Tn-seq processing pipeline: Discover -> Trim -> Map -> Count.

Processes raw Tn-seq FASTQ files into an insertion-count Track. The Track contains one Region per possible insertion site with read counts. Export to .wig or .bed is handled by the I/O layer (write_wig(), write_bed()).

Parameters:

Name Type Description Default
transposon str

Preset name or literal terminus sequence. Defaults to "himar1".

'himar1'
mapper str

Alignment tool — "bowtie" or "bwa". BWA uses bwa aln + bwa samse (same algorithm as TRANSIT's TPP). Defaults to "bowtie".

'bowtie'
mismatches int

Allowed mismatches for alignment. Defaults to 1.

1
threads int

Alignment threads. Defaults to 1.

1
min_trimmed_length int

Minimum length for a trimmed read to be mapped. Defaults to 20.

20

Examples:

>>> pipeline = TnSeqPipeline(transposon="himar1")
>>> result = pipeline.run("R1.fq.gz", "R2.fq.gz", "ref.fna")

run

run(reads1: str, reads2: str | None = None, reference: str = '') -> TnSeqResult

Run the full Tn-seq pipeline.

Parameters:

Name Type Description Default
reads1 str

Path to R1 FASTQ file.

required
reads2 str | None

Optional path to R2 FASTQ file.

None
reference str

Path to reference genome FASTA.

''

Returns:

Type Description
TnSeqResult

TnSeqResult with insertion-count Track and stats.

Raises:

Type Description
RuntimeError

If the selected mapper is not on $PATH.

FileNotFoundError

If input files don't exist.

ValueError

If mapper is not "bowtie" or "bwa".

Examples:

>>> result = TnSeqPipeline().run("R1.fq.gz", "R2.fq.gz", "H37Rv.fna")
Source code in src/seqchain/recipes/_draft/tnseq.py
def run(
    self,
    reads1: str,
    reads2: str | None = None,
    reference: str = "",
) -> TnSeqResult:
    """Run the full Tn-seq pipeline.

    Args:
        reads1: Path to R1 FASTQ file.
        reads2: Optional path to R2 FASTQ file.
        reference: Path to reference genome FASTA.

    Returns:
        `TnSeqResult` with insertion-count Track and stats.

    Raises:
        RuntimeError: If the selected mapper is not on ``$PATH``.
        FileNotFoundError: If input files don't exist.
        ValueError: If *mapper* is not ``"bowtie"`` or ``"bwa"``.

    Examples:
        >>> result = TnSeqPipeline().run("R1.fq.gz", "R2.fq.gz", "H37Rv.fna")
    """
    if self.mapper == "bowtie":
        for tool in ("bowtie", "bowtie-build"):
            if not shutil.which(tool):
                raise RuntimeError(
                    f"{tool} not found on $PATH. "
                    "Install via: mamba install bowtie"
                )
    elif self.mapper == "bwa":
        if not shutil.which("bwa"):
            raise RuntimeError(
                "bwa not found on $PATH. Install via: mamba install bwa"
            )
    else:
        raise ValueError(
            f"Unknown mapper: {self.mapper!r}. Use 'bowtie' or 'bwa'."
        )

    # Step 1: Discover read structure
    log.info("Step 1: Discovering read structure...")
    structure = tnseq_discover(reads1, reads2, transposon=self.transposon)
    log.info(
        "Discovery: junction=%s, terminus=%s, trim=%d, swap=%s",
        structure.junction_read,
        structure.transposon_terminus,
        structure.trim_position,
        structure.need_swap,
    )

    # Determine junction file path
    junction_path = reads2 if structure.need_swap else reads1
    terminus = structure.transposon_terminus
    rc_terminus = reverse_complement(terminus)
    insertion_motif = structure.insertion_motif

    # Read reference genome
    log.info("Loading reference genome from %s...", reference)
    ref_sequences = load_fasta(reference).sequences
    total_bp = sum(len(s) for s in ref_sequences.values())
    log.info(
        "Reference: %d sequence(s), %d bp", len(ref_sequences), total_bp,
    )

    with tempfile.TemporaryDirectory(prefix="seqchain_tnseq_") as tmpdir:
        # Step 2: Trim junction reads (per-read terminus search)
        log.info("Step 2: Trimming junction reads...")
        trimmed_fasta = os.path.join(tmpdir, "trimmed.fa")
        total_reads, junction_reads = self._trim_reads(
            junction_path, terminus, rc_terminus, trimmed_fasta,
        )
        log.info(
            "Trimming: %d total, %d junction (%.1f%%)",
            total_reads,
            junction_reads,
            100 * junction_reads / total_reads if total_reads > 0 else 0,
        )

        if junction_reads == 0:
            log.warning("No junction reads found — returning empty track")
            track, total_sites, _ = self._build_track(
                ref_sequences, Counter(), insertion_motif,
            )
            return TnSeqResult(
                track=track,
                structure=structure,
                total_reads=total_reads,
                junction_reads=0,
                mapped_reads=0,
                insertion_sites_hit=0,
                total_possible_sites=total_sites,
                saturation=0.0,
            )

        # Step 3: Map trimmed reads to reference
        log.info("Step 3: Mapping trimmed reads to reference (%s)...",
                 self.mapper)
        sam_file = os.path.join(tmpdir, "aligned.sam")
        ref_for_align = prepare_reference(reference, tmpdir)
        if self.mapper == "bwa":
            indexed_ref = self._bwa_index(ref_for_align, tmpdir)
            self._bwa_align(indexed_ref, trimmed_fasta, sam_file)
        else:
            index_prefix = os.path.join(tmpdir, "ref")
            build_bowtie_index(ref_for_align, index_prefix)
            run_bowtie_alignment(
                index_prefix,
                trimmed_fasta,
                sam_file,
                mismatches=self.mismatches,
                threads=self.threads,
                query_is_fasta=True,
            )

        # Step 4: Count insertions per site
        log.info("Step 4: Counting insertions...")
        motif_len = len(insertion_motif) if insertion_motif else 0
        insertion_counts, mapped_reads = self._count_insertions(
            sam_file, motif_len,
        )
        log.info("Mapping: %d reads mapped", mapped_reads)

    # Build Track with all possible insertion sites
    track, total_sites, sites_hit = self._build_track(
        ref_sequences, insertion_counts, insertion_motif,
    )

    saturation = sites_hit / total_sites if total_sites > 0 else 0.0
    log.info(
        "Result: %d/%d sites hit (%.1f%% saturation), %d mapped reads",
        sites_hit, total_sites, 100 * saturation, mapped_reads,
    )

    return TnSeqResult(
        track=track,
        structure=structure,
        total_reads=total_reads,
        junction_reads=junction_reads,
        mapped_reads=mapped_reads,
        insertion_sites_hit=sites_hit,
        total_possible_sites=total_sites,
        saturation=saturation,
    )