Chromatin Annotation

Truth-table lookup over binned signal tracks. Given a set of named signal tracks and a YAML-configured set of state definitions (mark present/absent), assigns a chromatin state label to each genomic bin.

States describe mark combinations only — no gene names, no functional interpretation. Gene features are a separate annotation layer.

Configuration

TrackThreshold dataclass

TrackThreshold(name: str, threshold: float | None = None)

Per-track threshold for present/absent classification.

Parameters:

name (str, required)
    Track name (must match a key in the tracks dict).

threshold (float | None, default: None)
    Signal value above which the track is "present". None means the threshold must be computed at runtime via a threshold function.

Examples:

>>> TrackThreshold("ATAC", 2.0)
TrackThreshold(name='ATAC', threshold=2.0)
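
The present/absent call for a single bin can be pictured as a plain comparison against the threshold. The helper below is a hypothetical stand-in written for illustration, not the library's own make_predicate; it assumes that "above" means a strict greater-than comparison.

```python
from typing import Callable


def make_threshold_predicate(threshold: float) -> Callable[[float], bool]:
    """Return a predicate that is True when a signal exceeds ``threshold``.

    Hypothetical helper for illustration; assumes a strict ``>`` comparison.
    """

    def is_present(signal: float) -> bool:
        return signal > threshold

    return is_present


# Threshold of 2.0, as in TrackThreshold("ATAC", 2.0)
is_present = make_threshold_predicate(2.0)
calls = [is_present(s) for s in (0.5, 2.0, 3.1)]
print(calls)  # [False, False, True]
```

Under this assumption a signal exactly equal to the threshold is called "absent"; if the library uses an inclusive comparison, the middle value would flip.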

StateDefinition dataclass

StateDefinition(name: str, requirements: dict[str, Literal['present', 'absent']])

A state defined by track presence/absence requirements.

Parameters:

name (str, required)
    State label (e.g. "open_active").

requirements (dict[str, Literal['present', 'absent']], required)
    Map of track name to "present" or "absent". Tracks not mentioned are wildcards (ignored).

Examples:

>>> StateDefinition("open_active", {"ATAC": "present", "H3K4me3": "present"})
StateDefinition(name='open_active', requirements={'ATAC': 'present', 'H3K4me3': 'present'})
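
The wildcard rule can be sketched as a check that only inspects the tracks a state names. The function state_matches and the sample calls below are hypothetical and self-contained; they illustrate the matching rule described above, not the library's internal code.

```python
from typing import Literal, Mapping

Requirement = Literal["present", "absent"]


def state_matches(
    requirements: Mapping[str, Requirement],
    calls: Mapping[str, bool],
) -> bool:
    """Check one bin's binary calls against a state's requirements.

    ``calls`` maps track name to True (present) or False (absent).
    Tracks that ``requirements`` does not mention are never inspected,
    which is what makes them wildcards.
    """
    return all(
        calls[track] == (required == "present")
        for track, required in requirements.items()
    )


bin_calls = {"ATAC": True, "H3K4me3": True, "H3K27me3": False}
open_active = {"ATAC": "present", "H3K4me3": "present"}
silenced = {"ATAC": "absent", "H3K27me3": "present"}

print(state_matches(open_active, bin_calls))  # True (H3K27me3 is a wildcard)
print(state_matches(silenced, bin_calls))     # False
```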

ChromatinConfig dataclass

ChromatinConfig(organism: str, bin_size: int, thresholds: list[TrackThreshold], states: list[StateDefinition], default_state: str = 'unclassified')

Configuration for chromatin state annotation.

Parameters:

organism (str, required)
    Organism name.

bin_size (int, required)
    Bin size in bp for genome walking.

thresholds (list[TrackThreshold], required)
    Per-track thresholds for present/absent calls.

states (list[StateDefinition], required)
    Ordered list of state definitions (priority order).

default_state (str, default: 'unclassified')
    Label for bins matching no defined state.

Examples:

>>> config = ChromatinConfig(
...     organism="S. cerevisiae",
...     bin_size=200,
...     thresholds=[TrackThreshold("ATAC", 2.0)],
...     states=[StateDefinition("open", {"ATAC": "present"})],
...     default_state="unclassified",
... )
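
Why the order of states matters can be shown with a small first-match classifier. This is a sketch under an assumption: it reads "priority order" as "the first state whose requirements all hold wins". The implementation of classify_truth_table is not shown in this page, so classify_bin below is a hypothetical illustration only.

```python
from typing import Mapping, Sequence


def classify_bin(
    states: Sequence[tuple[str, Mapping[str, str]]],
    calls: Mapping[str, bool],
    default_state: str = "unclassified",
) -> str:
    """Return the first state (in list order) whose requirements all hold."""
    for name, requirements in states:
        if all(
            calls[track] == (required == "present")
            for track, required in requirements.items()
        ):
            return name
    # No state matched: fall back to the default label
    return default_state


# The more specific state is listed first so that it takes precedence.
states = [
    ("open_active", {"ATAC": "present", "H3K4me3": "present"}),
    ("open", {"ATAC": "present"}),
]

print(classify_bin(states, {"ATAC": True, "H3K4me3": True}))    # open_active
print(classify_bin(states, {"ATAC": True, "H3K4me3": False}))   # open
print(classify_bin(states, {"ATAC": False, "H3K4me3": False}))  # unclassified
```

If the two states were listed in the opposite order, "open" would absorb every bin with ATAC present and "open_active" would never be assigned.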

chromatin_config_from_dict

chromatin_config_from_dict(data: dict) -> ChromatinConfig

Build a ChromatinConfig from a parsed YAML dict.

Tracks with missing or null threshold values are accepted — the threshold must be provided at runtime via a threshold function.

Parameters:

data (dict, required)
    Dictionary with chromatin config fields. The function reads the keys "tracks" (required), "states", "priority", "organism", "bin_size", and "default_state".

Returns:

ChromatinConfig
    A ChromatinConfig.

Examples:

>>> cfg = chromatin_config_from_dict({"type": "chromatin", ...})
Source code in src/seqchain/operations/annotate/chromatin.py
def chromatin_config_from_dict(data: dict) -> ChromatinConfig:
    """Build a ChromatinConfig from a parsed YAML dict.

    Tracks with missing or null ``threshold`` values are accepted —
    the threshold must be provided at runtime via a threshold function.

    Args:
        data: Dictionary with chromatin config fields.

    Returns:
        A `ChromatinConfig`.

    Examples:
        >>> cfg = chromatin_config_from_dict({"type": "chromatin", ...})  # doctest: +SKIP
    """
    thresholds = []
    for name, info in data["tracks"].items():
        if isinstance(info, dict) and "threshold" in info and info["threshold"] is not None:
            thresholds.append(TrackThreshold(name=name, threshold=info["threshold"]))
        else:
            thresholds.append(TrackThreshold(name=name, threshold=None))

    priority = data.get("priority") or data.get("states", {}).keys()

    states = []
    for state_name in priority:
        reqs = data["states"][state_name]
        states.append(StateDefinition(name=state_name, requirements=dict(reqs)))

    return ChromatinConfig(
        organism=data.get("organism", ""),
        bin_size=data.get("bin_size", 200),
        thresholds=thresholds,
        states=states,
        default_state=data.get("default_state", "unclassified"),
    )
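
The dictionary shape this function expects can be read off the source above. The sketch below uses made-up values with the key names the function reads, and mirrors two of its steps (threshold extraction and priority resolution) in plain Python so that it runs without the library.

```python
# Hypothetical parsed-YAML dict. Key names follow the function above;
# all values are invented for illustration.
data = {
    "type": "chromatin",  # appears in the docstring example; not read by the function
    "organism": "S. cerevisiae",
    "bin_size": 200,
    "tracks": {
        "ATAC": {"threshold": 2.0},
        "H3K4me3": {"threshold": None},  # explicit null: resolved at runtime
        "H3K27me3": None,                # no mapping at all: also resolved at runtime
    },
    "states": {
        "open": {"ATAC": "present"},
        "open_active": {"ATAC": "present", "H3K4me3": "present"},
    },
    "priority": ["open_active", "open"],
}

# Mirrors the threshold loop: anything without a usable value becomes None.
thresholds = {
    name: info.get("threshold") if isinstance(info, dict) else None
    for name, info in data["tracks"].items()
}
print(thresholds)  # {'ATAC': 2.0, 'H3K4me3': None, 'H3K27me3': None}

# Mirrors the priority resolution: explicit list first, else dict order.
priority = list(data.get("priority") or data.get("states", {}).keys())
print(priority)  # ['open_active', 'open']
```

Without the "priority" key, the states would be tried in the order they appear in the "states" mapping, here "open" before "open_active".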

Function

annotate_chromatin

annotate_chromatin(config: ChromatinConfig, tracks: dict[str, Track], chroms: dict[str, int], threshold_fn: PredicateFactory | None = None) -> Iterator[Region]

Annotate all chromosomes and yield state-labeled Regions.

Each input track is wrapped in a BinaryAdapter that applies the per-mark threshold on-the-fly via signal_at(). No intermediate track is materialized. The output is the lazy iterator from classify_truth_table() — one Region at a time.

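When a track's threshold is None in the config, threshold_fn is called with the track's scores to compute a predicate. If both are None, a ValueError is raised. Because annotate_chromatin is a generator, the error surfaces when iteration begins, not when the function is called.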

Parameters:

config (ChromatinConfig, required)
    A ChromatinConfig loaded from YAML.

tracks (dict[str, Track], required)
    Named input tracks keyed by track name. A track that is listed in config.thresholds but missing from this dict is replaced by a ZeroTrack.

chroms (dict[str, int], required)
    Chromosome name to size mapping.

threshold_fn (PredicateFactory | None, default: None)
    Optional predicate factory that computes a predicate from scores. Called when a track's config threshold is None.

Yields:

Region
    State-labeled Regions with name set to the state label.

Raises:

ValueError
    If a track has no threshold and no threshold_fn.

Examples:

>>> for r in annotate_chromatin(config, {"ATAC": t}, {"chrI": 230218}):
...     print(r.name)
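
How chroms and bin_size together define the bins can be sketched as a walk over fixed-width intervals. The generator iter_bins is hypothetical; it assumes 0-based half-open coordinates and a truncated final bin, neither of which is confirmed by this page.

```python
from typing import Iterator


def iter_bins(
    chroms: dict[str, int], bin_size: int
) -> Iterator[tuple[str, int, int]]:
    """Yield (chrom, start, end) for consecutive fixed-width bins.

    Assumes half-open [start, end) intervals; the last bin on each
    chromosome is clipped to the chromosome size.
    """
    for chrom, size in chroms.items():
        for start in range(0, size, bin_size):
            yield chrom, start, min(start + bin_size, size)


bins = list(iter_bins({"chrI": 230218}, bin_size=200))
print(len(bins))  # 1152
print(bins[0])    # ('chrI', 0, 200)
print(bins[-1])   # ('chrI', 230200, 230218)
```
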
Source code in src/seqchain/operations/annotate/chromatin.py
def annotate_chromatin(  # batch_boundary
    config: ChromatinConfig,
    tracks: dict[str, Track],
    chroms: dict[str, int],
    threshold_fn: PredicateFactory | None = None,
) -> Iterator[Region]:
    """Annotate all chromosomes and yield state-labeled Regions.

    Each input track is wrapped in a ``BinaryAdapter`` that applies
    the per-mark threshold on-the-fly via ``signal_at()``.  No
    intermediate track is materialized.  The output is the lazy
    iterator from ``classify_truth_table()`` — one Region at a time.

    When a track's threshold is ``None`` in the config, *threshold_fn*
    is called to compute a predicate from the track's scores.
    If both are ``None``, raises `ValueError`.

    Args:
        config: A `ChromatinConfig` loaded from YAML.
        tracks: Named input tracks keyed by track name.
        chroms: Chromosome name to size mapping.
        threshold_fn: Optional predicate factory that computes a
            predicate from scores.  Called when a track's config
            threshold is ``None``.

    Yields:
        State-labeled Regions with ``name`` set to the state label.

    Raises:
        ValueError: If a track has no threshold and no threshold_fn.

    Examples:
        >>> for r in annotate_chromatin(config, {"ATAC": t}, {"chrI": 230218}):
        ...     print(r.name)
    """
    binary_tracks: dict[str, Track] = {}
    for t in config.thresholds:
        if t.name in tracks:
            track = tracks[t.name]
            if t.threshold is not None:
                predicate = make_predicate(t.threshold)
            elif threshold_fn is not None:
                predicate = threshold_fn(track.scores())
            else:
                raise ValueError(
                    f"Track {t.name!r} has no threshold and no "
                    f"threshold_fn was provided"
                )
            binary_tracks[t.name] = BinaryAdapter(track, predicate)
        else:
            binary_tracks[t.name] = ZeroTrack(t.name)

    yield from classify_truth_table(
        binary_tracks,
        chroms,
        bin_size=config.bin_size,
        states=config.states,
        default_state=config.default_state,
    )
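
A threshold_fn can be any callable that takes a track's scores and returns a predicate. The sketch below builds one from a quantile. It is hypothetical: it assumes that track.scores() yields an iterable of floats and that the predicate maps one signal value to a bool, which matches how the source above calls threshold_fn but is not confirmed for PredicateFactory itself.

```python
from typing import Callable, Iterable

Predicate = Callable[[float], bool]


def quantile_threshold(q: float) -> Callable[[Iterable[float]], Predicate]:
    """Build a predicate factory that calls values above quantile ``q`` present."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")

    def factory(scores: Iterable[float]) -> Predicate:
        ordered = sorted(scores)
        if not ordered:
            raise ValueError("cannot derive a threshold from an empty track")
        # Index-based cutoff, clamped to the last element
        cutoff = ordered[min(int(q * len(ordered)), len(ordered) - 1)]
        return lambda signal: signal > cutoff

    return factory


threshold_fn = quantile_threshold(0.75)
is_present = threshold_fn([1.0, 2.0, 3.0, 4.0, 10.0, 0.5, 0.2, 8.0])

print(is_present(10.0))  # True  (cutoff is 8.0 for these scores)
print(is_present(8.0))   # False
print(is_present(3.0))   # False
```

A factory like this would be passed as annotate_chromatin(config, tracks, chroms, threshold_fn=quantile_threshold(0.75)) and applied only to tracks whose configured threshold is None.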