Threshold¶

Score-based filtering and binarization. Use binarize to convert continuous scores to presence/absence (1/0) at a cutoff. Use filter_by_score to keep only entries within a score range.

binarize ¶

binarize(regions: Iterable[Region], predicate: Callable[[float], float]) -> Iterator[Region]

Apply a score predicate to each Region's score.

Yields copies of input Regions with scores replaced by the predicate's output (typically 0.0 or 1.0).

Parameters:

Name	Type	Description	Default
`regions`	`Iterable[Region]`	Input regions to transform.	required
`predicate`	`Callable[[float], float]`	Function mapping score to new score.	required

Yields:

Type	Description
`Region`	Regions with transformed scores.

Examples:

>>> from seqchain.region import Region
>>> rs = [Region("chr1", 0, 10, score=5.0)]
>>> list(binarize(rs, make_predicate(3.0)))[0].score
1.0

Source code in src/seqchain/transform/threshold.py

def binarize(
    regions: Iterable[Region],
    predicate: Callable[[float], float],
) -> Iterator[Region]:
    """Apply a score predicate to each Region's score.

    Yields copies of input Regions with scores replaced by
    the predicate's output (typically 0.0 or 1.0).

    Args:
        regions: Input regions to transform.
        predicate: Function mapping score to new score.

    Yields:
        Regions with transformed scores.

    Examples:
        >>> from seqchain.region import Region
        >>> rs = [Region("chr1", 0, 10, score=5.0)]
        >>> list(binarize(rs, make_predicate(3.0)))[0].score
        1.0
    """
    for r in regions:
        yield replace(r, score=predicate(r.score))

filter_by_score ¶

filter_by_score(track: Track, min_score: float | None = None, max_score: float | None = None) -> Track

Keep only entries within a score range.

NaN entries are excluded when any bound is set, and pass through when both bounds are None.

Parameters:

Name	Type	Description	Default
`track`	`Track`	The track to transform.	required
`min_score`	`float \| None`	Minimum score (inclusive). `None` means no lower bound.	`None`
`max_score`	`float \| None`	Maximum score (inclusive). `None` means no upper bound.	`None`

Returns:

Type	Description
`Track`	A new Track with only matching entries.

Examples:

>>> from seqchain.track import TableTrack, TrackLabel
>>> t = TableTrack(TrackLabel("t"), {"a": 1.0, "b": 5.0, "c": 10.0})
>>> t2 = filter_by_score(t, min_score=2.0, max_score=8.0)
>>> t2.keys()
['b']

Source code in src/seqchain/transform/threshold.py

def filter_by_score(
    track: Track,
    min_score: float | None = None,
    max_score: float | None = None,
) -> Track:
    """Keep only entries within a score range.

    NaN entries are excluded when any bound is set, and pass through
    when both bounds are ``None``.

    Args:
        track: The track to transform.
        min_score: Minimum score (inclusive). ``None`` means no
            lower bound.
        max_score: Maximum score (inclusive). ``None`` means no
            upper bound.

    Returns:
        A new Track with only matching entries.

    Examples:
        >>> from seqchain.track import TableTrack, TrackLabel
        >>> t = TableTrack(TrackLabel("t"), {"a": 1.0, "b": 5.0, "c": 10.0})
        >>> t2 = filter_by_score(t, min_score=2.0, max_score=8.0)
        >>> t2.keys()
        ['b']
    """
    has_bound = min_score is not None or max_score is not None

    def _keep(s: float) -> bool:
        if math.isnan(s):
            return not has_bound
        if min_score is not None and s < min_score:
            return False
        if max_score is not None and s > max_score:
            return False
        return True

    return track.filter_entries(_keep)