Exact Barcode Counting

Count exact barcode matches in sequencing reads. Given a barcode library and a read structure, count_barcodes() counts how many reads contain each barcode at the expected position.

exact

Exact-match barcode counting.

The module-level function count_barcodes() was migrated from legacy/barcoder/heuristicount.py. It uses a learned ReadStructure to extract the barcode and its flanks at a known offset, then validates the extracted sequence against a flank-augmented library via set lookup.
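
The lookup described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the module's implementation: add_flanks, count_exact, the example flanks, and the 0-based offset are hypothetical names and values chosen for the example, and only forward-strand reads are handled.

```python
from collections import Counter


def add_flanks(barcodes: set[str], left: str, right: str) -> set[str]:
    """Return each barcode wrapped in its expected left and right flanks."""
    return {left + bc + right for bc in barcodes}


def count_exact(reads, library, *, offset, bc_len, left, right):
    """Count reads whose flank+barcode window exactly matches the library.

    offset is assumed to be the 0-based position of the barcode in the read.
    """
    augmented = add_flanks(library, left, right)
    start = offset - len(left)           # window begins at the left flank
    end = offset + bc_len + len(right)   # and ends after the right flank
    counts = Counter()
    for read in reads:
        if start < 0 or end > len(read):
            continue                     # read too short to hold the window
        window = read[start:end]
        if window in augmented:          # average O(1) set membership test
            barcode = window[len(left):len(left) + bc_len]
            counts[barcode] += 1
    return counts


# Hypothetical reads: flanks "TT" and "GG" around a 6-base barcode.
reads = ["TTACGTAAGG", "TTACGTCAGG", "TTGGGGAAGG", "TTACGTAAGG"]
counts = count_exact(
    reads, {"ACGTAA", "GGGGAA"}, offset=2, bc_len=6, left="TT", right="GG"
)
```

Matching the barcode together with its flanks in one set lookup means a read is counted only when the barcode sits at the expected position with the expected neighbours, which is what makes the match "exact".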

count_barcodes

count_barcodes(reads1: str, library: set[str], *, structure: ReadStructure, reads2: str | None = None) -> TableTrack

Count barcodes in reads using the learned structure.

Parameters:

Name       Type           Description                              Default
reads1     str            Path to primary reads file.              required
library    set[str]       Set of known barcode sequences.          required
structure  ReadStructure  Learned read structure from discovery.   required
reads2     str | None     Optional path to paired-end reads file.  None

Returns:

Type        Description
TableTrack  TableTrack with barcode counts. Undocumented barcodes (valid flanks but not in library) have a "*" suffix.
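
Because undocumented barcodes are distinguished only by the "*" suffix, downstream code usually needs to separate the two groups. The sketch below assumes the counts have already been read out into a plain dict mapping barcode to count; how to obtain that mapping from a TableTrack is not shown on this page, and split_counts is a hypothetical helper.

```python
def split_counts(
    counts: dict[str, float],
) -> tuple[dict[str, float], dict[str, float]]:
    """Split counts into (documented, undocumented) by the "*" suffix.

    The suffix is stripped from the undocumented keys.
    """
    documented: dict[str, float] = {}
    undocumented: dict[str, float] = {}
    for barcode, count in counts.items():
        if barcode.endswith("*"):
            undocumented[barcode[:-1]] = count   # drop the trailing "*"
        else:
            documented[barcode] = count
    return documented, undocumented


# Hypothetical counts in the shape described above.
example = {"ACGTAA": 12.0, "GGGGAA": 3.0, "TTTTCC*": 1.0}
documented, undocumented = split_counts(example)
```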

Examples:

>>> track = count_barcodes("r.fq", lib, structure=s)
Source code in src/seqchain/operations/quantify/exact.py
def count_barcodes(
    reads1: str,
    library: set[str],
    *,
    structure: ReadStructure,
    reads2: str | None = None,
) -> TableTrack:
    """Count barcodes in reads using the learned structure.

    Args:
        reads1: Path to primary reads file.
        library: Set of known barcode sequences.
        structure: Learned read structure from discovery.
        reads2: Optional path to paired-end reads file.

    Returns:
        `TableTrack` with barcode counts. Undocumented barcodes
        (valid flanks but not in library) have ``"*"`` suffix.

    Examples:
        >>> track = count_barcodes("r.fq", lib, structure=s)
    """
    bc_len = structure.barcode_length
    is_paired = reads2 is not None

    # Build flank-augmented library sets
    bcs_rev: set[str] = set()
    for bc in library:
        bcs_rev.add(reverse_complement(bc))

    L_fwd = structure.left_flank
    R_fwd = structure.right_flank
    L_rev = structure.reverse_left_flank
    R_rev = structure.reverse_right_flank

    bcs_with_flanks_fwd = _add_flanks(library, L_fwd, R_fwd)
    bcs_with_flanks_rev = _add_flanks(bcs_rev, L_rev, R_rev)

    # Compute extraction starts (offset - flank length)
    L_fwd_len = _safe_len(L_fwd)
    R_fwd_len = _safe_len(R_fwd)
    L_rev_len = _safe_len(L_rev)
    R_rev_len = _safe_len(R_rev)

    L_fwd_start = structure.barcode_offset - L_fwd_len
    L_rev_start = (
        (structure.reverse_offset - L_rev_len)
        if structure.reverse_offset is not None
        else None
    )

    # Stream and count
    log.info("Counting...")
    total_counts: PyCounter = PyCounter()
    chunk_stream = read_fastq_chunks(
        reads1, path2=reads2 if is_paired else None,
    )

    for chunk in chunk_stream:
        chunk_counts, _ = _process_chunk(
            chunk,
            bcs_with_flanks_fwd,
            bcs_with_flanks_rev,
            L_fwd_start,
            L_rev_start,
            bc_len,
            L_fwd,
            R_fwd,
            L_rev,
            R_rev,
            structure.need_swap,
        )
        total_counts.update(chunk_counts)

    # Split into documented and undocumented, build TableTrack
    data: dict[str, float] = {}
    documented = 0
    undocumented = 0
    for barcode, count in total_counts.items():
        data[barcode] = float(count)
        if barcode.endswith("*"):
            undocumented += count
        else:
            documented += count

    log.info(
        "Counted %s documented, %s undocumented",
        f"{documented:,}", f"{undocumented:,}",
    )
    return TableTrack(TrackLabel("barcode_counts"), data)
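
The streaming loop in the listing above merges per-chunk counters into one running total, so memory use depends on the number of distinct barcodes and not on the number of reads. A self-contained sketch of that pattern follows; chunked and count_chunk are hypothetical stand-ins for read_fastq_chunks and _process_chunk, and plain strings stand in for FASTQ records.

```python
from collections import Counter
from typing import Iterable, Iterator


def chunked(items: Iterable[str], size: int) -> Iterator[list[str]]:
    """Yield successive lists of at most `size` items."""
    chunk: list[str] = []
    for item in items:
        chunk.append(item)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:                      # emit the final, possibly shorter chunk
        yield chunk


def count_chunk(chunk: list[str]) -> Counter:
    """Count occurrences within a single chunk."""
    return Counter(chunk)


total: Counter = Counter()
for chunk in chunked(["AAA", "CCC", "AAA", "GGG", "AAA"], size=2):
    # Counter.update() adds to existing counts; it does not overwrite them.
    total.update(count_chunk(chunk))
```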