Multiplicity

Hit-class classification for mapped regions. Groups regions by a key field and classifies each group by its hit count as "unique", "multi", or "unmapped".

Use this after alignment to separate single-mapping guides from multi-mappers.

HitClass dataclass

HitClass(count: int, category: str)

Classification result for one group key.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `count` | `int` | Number of regions in this group. | required |
| `category` | `str` | One of "unique", "multi", or "unmapped". | required |

Examples:

>>> HitClass(1, "unique")
HitClass(count=1, category='unique')
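
The repr in the example above matches what Python's `dataclasses` module generates by default. A minimal sketch of how such a class could be declared, assuming a plain `@dataclass` with no extra options (whether the real class is frozen or uses slots is not shown on this page):

```python
from dataclasses import dataclass


@dataclass
class HitClass:
    """Stand-in for the documented class; the decorator options are an assumption."""

    count: int     # number of regions in the group
    category: str  # "unique", "multi", or "unmapped"


hit = HitClass(1, "unique")
print(hit)  # HitClass(count=1, category='unique')
```

Since dataclasses also generate `__eq__`, two `HitClass` values with the same fields compare equal, which is convenient when asserting on classification results in tests.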

classify

classify(regions: Sequence[Region], group_by: str = 'name', expected: Iterable[str] | None = None) -> dict[str, HitClass]

Group regions by a key and classify each group's hit count.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `regions` | `Sequence[Region]` | Sequence of Region objects to classify. | required |
| `group_by` | `str` | Field name ("name", "chrom", "strand", etc.) or a tag key to group by. | `'name'` |
| `expected` | `Iterable[str] \| None` | Optional iterable of key values that must appear in the output even when no regions match them (these get category="unmapped"). | `None` |

Returns:

| Type | Description |
| --- | --- |
| `dict[str, HitClass]` | Dict mapping each group key to a HitClass. |
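
Because the return value is an ordinary dict, separating single-mapping guides from multi-mappers comes down to filtering on `category`. The sketch below uses a hand-built result and a stand-in `HitClass` so that it runs without seqchain installed. The helper `split_by_category` and the guide names are hypothetical, not part of the library.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class HitClass:
    """Stand-in with the two documented fields."""

    count: int
    category: str


def split_by_category(result: dict[str, HitClass]) -> dict[str, list[str]]:
    """Hypothetical helper: bucket group keys by their category."""
    buckets: dict[str, list[str]] = defaultdict(list)
    for key, hit in result.items():
        buckets[hit.category].append(key)
    return dict(buckets)


# Hand-built stand-in for the dict that classify(...) returns.
result = {
    "g1": HitClass(2, "multi"),
    "g2": HitClass(1, "unique"),
    "g3": HitClass(0, "unmapped"),
}

buckets = split_by_category(result)
print(buckets["unique"])  # ['g2']
```

A category with no members is simply absent from `buckets`, so use `buckets.get("multi", [])` when a category may be empty.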

Examples:

>>> from seqchain.region import Region
>>> regions = [Region("chr1", 0, 10, name="g1"), Region("chr1", 20, 30, name="g1")]
>>> result = classify(regions)
>>> result["g1"].category
'multi'

Source code in src/seqchain/primitives/multiplicity.py
def classify(
    regions: Sequence[Region],
    group_by: str = "name",
    expected: Iterable[str] | None = None,
) -> dict[str, HitClass]:
    """Group regions by a key and classify each group's hit count.

    Args:
        regions: Sequence of `Region` objects.
        group_by: Field name (``"name"``, ``"chrom"``, ``"strand"``, etc.)
            or a tag key to group by. Defaults to ``"name"``.
        expected: Optional iterable of key values that must appear in the
            output even when no regions match them (these get
            ``category="unmapped"``).

    Returns:
        Dict mapping each group key to a `HitClass`.

    Examples:
        >>> from seqchain.region import Region
        >>> regions = [Region("chr1", 0, 10, name="g1"), Region("chr1", 20, 30, name="g1")]
        >>> result = classify(regions)
        >>> result["g1"].category
        'multi'
    """
    counts: dict[str, int] = {}

    if expected is not None:
        for key in expected:
            counts[key] = 0

    for region in regions:
        key = _extract_key(region, group_by)
        if key is None:
            continue
        counts[key] = counts.get(key, 0) + 1

    result: dict[str, HitClass] = {}
    for key, count in counts.items():
        result[key] = HitClass(count, _category(count))
    return result