Bin

Summarize a track into fixed-size genomic bins. Binning is the bridge from a SignalTrack, which does not support map_scores, to an IntervalTrack, which supports all transforms.

The mean method works on any Track. The max, min, and sum methods require raw_signal() (SignalTrack only).

bin

Bin-based transforms for signal tracks.

Converts continuous signal data into discrete intervals by summarizing signal values within fixed-size genomic bins.
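To make "fixed-size genomic bins" concrete, here is a minimal sketch of how a chromosome could be tiled into bins. The helper `fixed_bins` is hypothetical and is not part of seqchain. It assumes half-open coordinates and that the last bin is clipped at the chromosome end; the library's own binning may treat a trailing partial bin differently.

```python
from typing import Iterator


def fixed_bins(
    chrom_sizes: dict[str, int], bin_size: int
) -> Iterator[tuple[str, int, int]]:
    """Yield (chrom, start, end) half-open bins tiling each chromosome."""
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    for chrom, length in chrom_sizes.items():
        for start in range(0, length, bin_size):
            # Assumption: the final bin is clipped at the chromosome end.
            yield chrom, start, min(start + bin_size, length)


bins = list(fixed_bins({"chr1": 1000}, 200))
print(len(bins))  # 5
print(bins[-1])   # ('chr1', 800, 1000)
```

Under these assumptions a 1000 bp chromosome with 200 bp bins gives five bins, which matches the bin count in the `bin_summarize` doctest.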

Examples:

>>> from seqchain.transform.bin import bin_summarize
>>> intervals = bin_summarize(signal_track, 200, {"chr1": 10000})

bin_summarize

bin_summarize(track: 'Track', bin_size: int, chrom_sizes: dict[str, int], method: str = 'mean') -> IntervalTrack

Summarize a track's signal into fixed-size genomic bins.

"mean" uses signal_at() and works on any track type. "max", "min", and "sum" require per-base resolution via raw_signal() (only available on SignalTrack).
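The difference between the four methods is easiest to see on a small per-base vector. The sketch below only illustrates the arithmetic: `summarize` and the sample values are hypothetical, and the mean is computed here from per-base values, whereas the library obtains it through `signal_at()`.

```python
from statistics import fmean

# Hypothetical per-base values for a 6 bp stretch; not real data.
per_base = [0.0, 2.0, 4.0, 1.0, 1.0, 4.0]

# One aggregation function per method name.
aggregators = {"mean": fmean, "max": max, "min": min, "sum": sum}


def summarize(values: list[float], bin_size: int, method: str) -> list[float]:
    """Aggregate consecutive windows of bin_size values with the chosen method."""
    agg = aggregators[method]
    return [
        agg(values[start:start + bin_size])
        for start in range(0, len(values), bin_size)
    ]


print(summarize(per_base, 3, "mean"))  # [2.0, 2.0]
print(summarize(per_base, 3, "max"))   # [4.0, 4.0]
print(summarize(per_base, 3, "min"))   # [0.0, 1.0]
print(summarize(per_base, 3, "sum"))   # [6.0, 6.0]
```

The two bins have the same mean but different minima, which is why a per-base method can show structure that the mean hides.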

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| track | 'Track' | Input track. | required |
| bin_size | int | Bin size in base pairs. | required |
| chrom_sizes | dict[str, int] | Chromosome name to length mapping. | required |
| method | str | Summarization method. One of "mean" (default), "max", "min", "sum". | 'mean' |

Returns:

| Type | Description |
|---|---|
| IntervalTrack | An IntervalTrack with one Region per bin. |

Raises:

| Type | Description |
|---|---|
| ValueError | If method is not recognized, or if a per-base method is requested on a track without raw_signal(). |
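The two error conditions can be exercised without the library using stand-in classes. Everything in this sketch is hypothetical: `RegionOnlyTrack`, `PerBaseTrack`, `check_method`, and the two method sets are illustrative names that are only assumed to mirror the checks in the source.

```python
PERBASE_METHODS = {"max", "min", "sum"}   # assumed to mirror _PERBASE_METHODS
ALL_METHODS = {"mean"} | PERBASE_METHODS  # assumed to mirror _METHODS


class RegionOnlyTrack:
    """Hypothetical stand-in for a track with region-level scores only."""


class PerBaseTrack:
    """Hypothetical stand-in for a track that exposes raw_signal()."""

    def raw_signal(self, chrom: str, start: int, end: int) -> list[float]:
        return [0.0] * (end - start)


def check_method(track: object, method: str) -> None:
    """Raise ValueError under the same two conditions described above."""
    if method not in ALL_METHODS:
        raise ValueError(f"Unknown method {method!r}")
    if method in PERBASE_METHODS and not hasattr(track, "raw_signal"):
        raise ValueError(f"method={method!r} requires per-base resolution")


cases = [
    (RegionOnlyTrack(), "mean"),
    (RegionOnlyTrack(), "max"),
    (PerBaseTrack(), "max"),
    (PerBaseTrack(), "median"),
]
for track, method in cases:
    try:
        check_method(track, method)
        print(type(track).__name__, method, "accepted")
    except ValueError as exc:
        print(type(track).__name__, method, "rejected:", exc)
```

In this sketch only the second and fourth cases are rejected: a per-base method on a track without `raw_signal()`, and an unrecognized method name.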

Examples:

>>> t = bin_summarize(atac_signal, 200, {"chr1": 1000})
>>> len(t)
5
Source code in src/seqchain/transform/bin.py
def bin_summarize(
    track: "Track",
    bin_size: int,
    chrom_sizes: dict[str, int],
    method: str = "mean",
) -> IntervalTrack:
    """Summarize a track's signal into fixed-size genomic bins.

    ``"mean"`` uses ``signal_at()`` and works on any track type.
    ``"max"``, ``"min"``, and ``"sum"`` require per-base resolution
    via ``raw_signal()`` (only available on SignalTrack).

    Args:
        track: Input track.
        bin_size: Bin size in base pairs.
        chrom_sizes: Chromosome name to length mapping.
        method: Summarization method.  One of ``"mean"`` (default),
            ``"max"``, ``"min"``, ``"sum"``.

    Returns:
        An IntervalTrack with one Region per bin.

    Raises:
        ValueError: If *method* is not recognized, or if a per-base
            method is requested on a track without ``raw_signal()``.

    Examples:
        >>> t = bin_summarize(atac_signal, 200, {"chr1": 1000})
        >>> len(t)  # doctest: +SKIP
        5
    """
    if method not in _METHODS:
        raise ValueError(
            f"Unknown method {method!r}, expected one of {', '.join(str(m) for m in _METHODS)}"
        )

    if method in _PERBASE_METHODS and not hasattr(track, "raw_signal"):
        raise ValueError(
            f"method={method!r} requires per-base resolution. "
            f"{type(track).__name__} only has region-level scores. "
            f"Use method='mean' or convert your source data to a SignalTrack."
        )

    _agg = {"max": max, "min": min, "sum": sum}

    regions: list[Region] = []
    for b in bin_query(track, chrom_sizes, bin_size):
        if method == "mean":
            score = b.signal
        else:
            vals = track.raw_signal(b.chrom, b.start, b.end)
            score = _agg[method](vals) if vals else 0.0
        regions.append(
            Region(chrom=b.chrom, start=b.start, end=b.end, score=score)
        )

    return IntervalTrack(track, regions)