Signal
Signal detection primitives for count-based structure discovery.
Pure functions for deciding whether observed counts represent real signal
or noise. Used by heuristic_discover() and tnseq_discover() to auto-detect read structure
(barcode offsets, transposon flanks) from raw sequencing data.
is_signal_boundary
is_signal_boundary(count_at_n: int, count_at_n_plus_1: int, *, alphabet_size: int = 4, safety_margin: float = 0.75) -> bool
Test whether extending a sequence by one base crosses a signal boundary.
When a real flanking sequence ends at length N, its count will be roughly
alphabet_size times the count of any extension to length N+1, because the
next base is random and splits the counts alphabet_size ways. The threshold
is therefore alphabet_size * safety_margin, relaxed below the theoretical
ratio to tolerate sampling noise.
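A minimal sketch of the ratio test described above, assuming the check is `count_at_n >= alphabet_size * safety_margin * count_at_n_plus_1`. Whether the real comparison is inclusive is an assumption; the source file is authoritative.

```python
def is_signal_boundary(
    count_at_n: int,
    count_at_n_plus_1: int,
    *,
    alphabet_size: int = 4,
    safety_margin: float = 0.75,
) -> bool:
    """Sketch: True when the count drops by at least the relaxed K-way split."""
    # If the base after position N is random, the best extension keeps only
    # about 1/alphabet_size of the reads; safety_margin relaxes that ratio.
    threshold = alphabet_size * safety_margin  # 3.0 for DNA with the defaults
    # Assumption: the comparison is inclusive (>=).
    return count_at_n >= threshold * count_at_n_plus_1


# Counts stay flat while still inside the flank: no boundary yet.
print(is_signal_boundary(1000, 980))  # False
# Counts drop about 4x once the next base is random: boundary crossed.
print(is_signal_boundary(1000, 260))  # True
```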
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| count_at_n | int | Observation count for the candidate at length N. | required |
| count_at_n_plus_1 | int | Observation count for the best candidate at length N+1. | required |
| alphabet_size | int | Size of the sequence alphabet: 4 for DNA, 20 for protein. | 4 |
| safety_margin | float | Fraction of the theoretical ratio to require. The default of 0.75 requires a 3x drop for DNA instead of the theoretical 4x. | 0.75 |
Returns:
| Type | Description |
|---|---|
| bool | True if count_at_n is at least alphabet_size * safety_margin times count_at_n_plus_1. |
Note
At very low counts (below ~30), Poisson sampling noise may exceed the safety margin. Callers processing sparse data should enforce a minimum observation count before relying on this test.
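One way a caller might follow this advice is to walk candidate lengths and stop applying the ratio test once counts get sparse. The helper `find_flank_length` and the floor `MIN_COUNT = 30` below are hypothetical, chosen to match the ~30 guideline; the ratio test is re-sketched inline (inclusive comparison assumed) so the block runs on its own.

```python
from typing import Optional, Sequence

MIN_COUNT = 30  # hypothetical floor matching the ~30 guideline in the Note


def is_signal_boundary(count_at_n, count_at_n_plus_1, *, alphabet_size=4, safety_margin=0.75):
    # Re-sketched ratio test; the inclusive comparison is an assumption.
    return count_at_n >= alphabet_size * safety_margin * count_at_n_plus_1


def find_flank_length(best_counts: Sequence[int], min_count: int = MIN_COUNT) -> Optional[int]:
    """Hypothetical helper: return the flank length N, or None if undecided.

    best_counts[i] is the count of the best candidate of length i + 1.
    """
    for i in range(len(best_counts) - 1):
        count_at_n, count_at_n_plus_1 = best_counts[i], best_counts[i + 1]
        if count_at_n < min_count:
            # Too sparse: Poisson noise could swamp the safety margin.
            return None
        if is_signal_boundary(count_at_n, count_at_n_plus_1):
            return i + 1  # lengths are 1-based
    return None  # no boundary seen within the scanned lengths


print(find_flank_length([1000, 990, 985, 980, 240, 60]))  # 4
print(find_flank_length([20, 5]))  # None (too sparse to decide)
```

Returning None on sparse data, rather than guessing, leaves the caller free to sample more reads and retry.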
Examples:
Source code in src/seqchain/primitives/signal.py
is_dominant
Test whether the top candidate dominates the runner-up.
Used for offset and orientation convergence: the most common offset
must be at least dominance times as frequent as the second most common
before sampling can stop.
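A minimal sketch of the dominance rule, assuming an inclusive comparison plus the `runner_up_count <= 0` shortcut noted in the return description. The signature is assumed (the rendered page does not show it), and `offset_converged` is a hypothetical caller that ranks observed offsets with `collections.Counter`.

```python
from collections import Counter


def is_dominant(top_count: int, runner_up_count: int, dominance: float = 2.0) -> bool:
    """Sketch: True when the leader is at least `dominance` times the runner-up."""
    if runner_up_count <= 0:
        return True  # no competitor at all
    return top_count >= dominance * runner_up_count


def offset_converged(offsets: Counter, dominance: float = 2.0) -> bool:
    """Hypothetical caller: has one offset pulled clearly ahead of the rest?"""
    ranked = offsets.most_common(2)
    if not ranked:
        return False  # nothing observed yet
    top = ranked[0][1]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0
    return is_dominant(top, runner_up, dominance)


print(offset_converged(Counter({12: 90, 13: 30, 11: 5})))  # True: 90 >= 2 * 30
print(offset_converged(Counter({12: 50, 13: 40})))  # False: 50 < 2 * 40
```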
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| top_count | int | Observation count for the leading candidate. | required |
| runner_up_count | int | Observation count for the second candidate. | required |
| dominance | float | Required ratio of top to runner-up. | 2.0 |
Returns:
| Type | Description |
|---|---|
| bool | True if top_count is at least dominance times runner_up_count, or if there is no runner-up (runner_up_count <= 0). |
Examples:
Source code in src/seqchain/primitives/signal.py
diversity_saturated
Test whether sampling has seen enough diversity to be confident.
Sampling should continue until observed >= factor * expected.
The default factor of 5 means we want 5x oversampling before
declaring saturation.
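The stopping rule above reduces to a single comparison. A minimal sketch, assuming the signature implied by the parameter table (the rendered page omits it); the 10,000-barcode library is an illustrative number.

```python
def diversity_saturated(observed: int, expected: int, factor: float = 5.0) -> bool:
    """Sketch: True once observed reaches factor * expected."""
    return observed >= factor * expected


library_size = 10_000  # illustrative expected population size
print(diversity_saturated(49_999, library_size))  # False: keep sampling
print(diversity_saturated(50_000, library_size))  # True: 5x oversampled
```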
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| observed | int | Number of observations (diversity hits, novel reads, novel barcodes, etc.). | required |
| expected | int | Expected population size (typically library size). | required |
| factor | float | Required oversampling ratio. | 5.0 |
Returns:
| Type | Description |
|---|---|
| bool | True if observed >= factor * expected. |
Examples: