# Sequence Dojo — Competition Rules (Draft v0.3)
See also:
* [SPEC.md](./SPEC.md) (protocol and artifact formats)
* [SCORING.md](./SCORING.md) (ranking and anti-hardcode rules)
* [REVEAL.md](./REVEAL.md) (lifecycle phases and what becomes public)
* [PLATFORM.md](./PLATFORM.md) (implementation guidance)
## 0. Purpose
Sequence Dojo is a competition for **programmatic inference** of integer sequences under partial disclosure.
* A **Setter** provides a deterministic program that generates an integer sequence.
* The **Platform** validates the setter program and publishes a **commitment hash** plus partial disclosed terms.
* A **Solver** submits a deterministic program that reconstructs the first `N` terms (default `N=200`).
* The Platform judges solutions by **exact equality** in a controlled environment.
This competition is designed to be:
* **reproducible** (platform-defined environment and canonicalization)
* **verifiable** (platform-generated hash commitment and later reveal)
* **robust** (no manual copy/paste of long integer lists)
---
## 1. Roles
### 1.1 Setter
* Submits a problem package (`problem.json` + `setter.py`) to the Platform.
* Must comply with all safety, determinism, and resource constraints.
### 1.2 Solver
* Submits a solution package (`solver.py`) to the Platform.
* Must comply with safety and resource constraints.
### 1.3 Platform (Judge)
* Validates setter submissions.
* Publishes only after validation passes.
* Computes and publishes commitment hashes and disclosure data.
* Executes solver submissions and issues verdicts.
* Reveals setter code after judging closes.
---
## 2. Definitions
### 2.1 Sequence
Each problem defines an integer sequence:
`a_0, a_1, a_2, ...`
All terms are Python integers (`int`).
### 2.2 Check Length
Unless specified otherwise, the Platform checks:
* `N_check = 200` terms, i.e. `a_0..a_199`.
### 2.3 Disclosure (Default)
The Platform discloses:
* the first 50 **odd-index** terms:
`a_1, a_3, ..., a_99`
The Platform does **not** disclose even-index terms.
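As a sketch, the default disclosure can be derived mechanically from a validated setter; `default_disclosure` and the squares setter below are illustrative assumptions, not part of the spec:

```python
def default_disclosure(seq, count=50):
    # First `count` odd-index terms: a_1, a_3, ..., a_{2*count - 1}
    return [seq(2 * i + 1) for i in range(count)]

# Hypothetical setter: a_n = n**2
disclosed = default_disclosure(lambda n: n * n)
```

With the default `count=50`, the last disclosed term is `a_99`, matching the rule above.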
---
## 3. Submission Types
### 3.1 Setter Submission Package
A setter submits a package containing:
#### `problem.json`
Minimal required fields:
```json
{
  "title": "Trial 002",
  "interface": "seq",
  "N_check": 200
}
```
* `interface`: `"seq"` or `"gen"`
* `N_check`: defaults to `200` if omitted
#### `setter.py`
Must implement **exactly one** interface:
**Option A**
```python
def seq(n: int) -> int:
    ...
```
**Option B**
```python
def gen(N: int) -> list[int]:
    ...
```
Return values must be Python `int`.
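For illustration only, a minimal Option A setter (the Fibonacci sequence stands in for any deterministic rule; it is not an official example problem):

```python
# Illustrative setter.py implementing Option A: a_n = n-th Fibonacci number.
def seq(n: int) -> int:
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a
```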
### 3.2 Solver Submission Package
A solver submits a package containing:
#### `solver.py`
Recommended interface:
```python
def solver() -> list[int]:
    ...
```
It must return exactly `N_check` integers:
`[a_0, a_1, ..., a_{N_check-1}]`
> Program-only submissions are mandatory; pasting long integer lists is not an official submission format.
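A minimal sketch of a conforming `solver.py`, assuming (hypothetically) that the solver has inferred the rule `a_n = n**2` from the disclosed odd-index terms:

```python
N_CHECK = 200  # taken from the published problem record

def solver() -> list[int]:
    # Hypothetical inferred rule: a_n = n**2
    return [n * n for n in range(N_CHECK)]
```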
---
## 4. Mandatory Constraints
### 4.1 Purity & Determinism
Setter programs must be **pure and deterministic**:
* no file I/O
* no network access
* no subprocess execution
* no system time, clock, or environment variable reads
* no external state
Randomness is allowed only if **fully deterministic** (fixed seed, reproducible output).
Solver programs must follow the same sandbox safety constraints.
### 4.2 Dependency Whitelist
Allowed imports:
* `sympy`, `math`, `fractions`, `itertools`
Disallowed imports include (not exhaustive):
* `os`, `pathlib`, `subprocess`, `socket`, `requests`, `time`, `datetime`
The Platform enforces this using static scans and runtime import interception.
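The static-scan half of that enforcement could be sketched with the standard `ast` module; this gate logic is an assumption about one possible implementation, not the Platform's actual scanner:

```python
import ast

ALLOWED = {"sympy", "math", "fractions", "itertools"}

def scan_imports(source: str) -> set[str]:
    """Return top-level module names imported by `source` that are not whitelisted."""
    violations = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module] if node.module else []
        else:
            continue
        for name in names:
            root = name.split(".")[0]  # e.g. "os.path" -> "os"
            if root not in ALLOWED:
                violations.add(root)
    return violations
```

A non-empty result would fail the Static Gate; the runtime interception layer catches dynamic imports the scan cannot see.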
### 4.3 Resource Limits (Default Season Settings)
* Setter must generate `a_0..a_{N_check-1}` within **1 second** on the Platform standard machine.
* Setter code length ≤ **100 lines** (Platform-defined effective line counting).
* Setter UTF-8 character count ≤ **5000**.
* Platform may apply comparable limits to solver programs.
> All limits and counting rules must be published and stable for the duration of a season.
---
## 5. Platform Validation & Publication (Commit–Reveal)
### 5.1 Validation Gates (Setter)
A problem is published only if it passes all gates:
1. **Static Gate**: line/char limits, banned imports/symbols.
2. **Sandbox Gate**: no I/O, no network, no subprocess; runtime import whitelist.
3. **Performance Gate**: generate first `N_check` terms within 1 second.
4. **Determinism Gate**: repeated runs produce identical outputs.
If any gate fails, the submission is rejected and not published.
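The Determinism Gate, for instance, reduces to running the generator repeatedly and comparing outputs (a sketch; a real Platform would also vary process state between runs):

```python
def determinism_gate(gen, N_check: int, runs: int = 3) -> bool:
    # Pass iff every run produces an identical list of terms.
    outputs = [gen(N_check) for _ in range(runs)]
    return all(out == outputs[0] for out in outputs[1:])

# A pure generator trivially passes:
def gen(N: int) -> list[int]:
    return [n * (n + 1) // 2 for n in range(N)]  # triangular numbers
```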
### 5.2 Canonicalization (for Hashing)
The Platform canonicalizes `setter.py` before hashing:
* UTF-8 encoding
* normalize newlines to `\n`
* strip trailing blank lines at EOF
* leave line-trailing whitespace unchanged
### 5.3 Commitment Hash
The Platform computes:
* `P_hash = SHA-256(canonical_setter.py_bytes)`
**Setters must not self-report hashes.** The Platform-generated `P_hash` is authoritative.
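Under one reading of 5.2 (canonical text ends with exactly one `\n`; an assumption, since the rule only says trailing blank lines are stripped), the commitment could be computed as:

```python
import hashlib

def canonicalize(source: str) -> bytes:
    text = source.replace("\r\n", "\n").replace("\r", "\n")  # normalize newlines
    text = text.rstrip("\n") + "\n"  # strip trailing blank lines; keep one final newline
    return text.encode("utf-8")      # line-trailing whitespace is left untouched

def p_hash(source: str) -> str:
    return hashlib.sha256(canonicalize(source)).hexdigest()
```

Note that two sources differing only in newline style or trailing blank lines hash identically.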
### 5.4 Published Problem Record
After validation, the Platform publishes a record containing at least:
* `problem_id`, `title`
* `P_hash`
* `interface`, `N_check`
* disclosure data (default: odd-index first 50)
The Platform must generate disclosure values directly from the validated setter output (no manual transcription).
### 5.5 Reveal
After the problem closes, the Platform reveals `setter.py` (and canonicalization policy) so anyone can verify:
* `SHA-256(canonical(setter.py)) == P_hash`
---
## 6. Judging & Scoring
### 6.1 Ground Truth
The Platform generates ground truth `a_true[0..N_check-1]` from the validated setter program.
### 6.2 Solver Output
The Platform runs the solver program to obtain `a_hat[0..N_check-1]`.
### 6.3 Verdict
* **Stage Pass**: first 100 terms match exactly
`a_hat[0:100] == a_true[0:100]`
* **Reward**: first 200 terms match exactly
`a_hat[0:200] == a_true[0:200]`
The Platform should report:
* pass/fail
* earliest mismatch index (if any)
* expected value vs actual value at mismatch
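The verdict and mismatch report can be sketched as follows, assuming the solver output has already been validated to contain exactly `N_check` integers (per 3.2):

```python
def verdict(a_hat: list[int], a_true: list[int]) -> dict:
    mismatch = next(
        ((i, a_true[i], a_hat[i])  # (index, expected, actual)
         for i in range(len(a_true)) if a_hat[i] != a_true[i]),
        None,
    )
    return {
        "stage_pass": a_hat[:100] == a_true[:100],
        "reward": a_hat[:200] == a_true[:200],
        "first_mismatch": mismatch,
    }
```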
---
## 7. Setter Incentives (Optional Rule)
A platform may introduce setter rewards to encourage **stable, extrapolatable** problems.
Intended target (informative):
* Problems where matching the first 100 terms usually means the solver has captured the true generative principle,
  so Stage passers tend to also reach 200 (the distribution of “how far you stay correct” is skewed toward 200).
* This discourages “trap” problems that are easy to fit up to 100 terms but diverge beyond 100.
### 7.1 Recommended minimum rule (stage→reward consistency)
One recommended rule:
* Let `S100` be solvers who pass Stage (first 100 correct),
* Let `S200` be solvers who receive Reward (first 200 correct).
Setter reward condition (piecewise):
* If `|S100| <= 3`, require `S100 ⊆ S200` (all stage passers get reward).
* If `|S100| >= 4`, require `|S200| / |S100| >= 0.9`.
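The piecewise rule above can be sketched directly (note that `S200` is necessarily a subset of `S100`, since 200 correct terms implies the first 100 are correct):

```python
def stage_reward_consistent(s100: set, s200: set) -> bool:
    if len(s100) <= 3:
        return s100 <= s200  # every Stage passer must also receive Reward
    return len(s200) / len(s100) >= 0.9
```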
### 7.2 Optional published diagnostics (extrapolation profile)
Platforms may additionally publish a per-problem diagnostic after Reveal to make “extrapolatability” concrete.
One simple metric is the **longest correct prefix length** for each Stage passer:
* For each solver in `S100`, define `L = max k in [100..200]` such that `a_hat[0:k] == a_true[0:k]`.
* Publish summary statistics of `L` over `S100` (e.g., median, 10th percentile, histogram bins).
This does not change solver verdicts; it is intended for transparency, postmortems, and optional setter incentives.
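The prefix metric can be sketched as below; for a Stage passer the result is always at least 100:

```python
def longest_correct_prefix(a_hat: list[int], a_true: list[int]) -> int:
    # Largest k such that a_hat[0:k] == a_true[0:k]
    k = 0
    for x, y in zip(a_hat, a_true):
        if x != y:
            break
        k += 1
    return k
```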
### 7.3 Optional quantitative setter reward (brevity × difficulty)
Some platforms may want a **numerical** setter reward score that increases when:
* the setter program is shorter (in canonical bytes), and
* the problem is harder (fewer solvers reach Reward), while still being extrapolatable (not a trap).
This section is informative and defines one compatible approach.
#### 7.3.1 Brevity factor (setter length)
Let:
* `L_set = len(canonical_setter.py_bytes)`
Define a bounded brevity factor:
* `B_set = exp(-beta_set * L_set)` in `(0, 1]`
`beta_set` is a season parameter (example scale: `beta_set = 1/800` matches the solver brevity scale).
#### 7.3.2 Difficulty factor (participation-adjusted)
Let:
* `T` be the set of solver submissions for the problem (within the season window),
* `R = |S200|` (Reward Correct count),
* `U = |T|` (submission count).
Define a difficulty factor that avoids rewarding “nobody tried” problems:
* Require `U >= U_min` and `R >= 1` for any setter reward to apply.
* `D = clamp( log((U + 1) / (R + 1)) / log((U + 1) / 2), 0, 1 )`
Intuition:
* If many people try (`U` large) but few reach 200 (`R` small), `D` approaches 1.
* If most submissions reach 200, `D` approaches 0.
All parameters (`U_min`, which should be at least `2` so that the denominator `log((U + 1) / 2)` is positive) and counting rules must be published for the season.
#### 7.3.3 Extrapolatability factor (anti-trap)
Reuse the stage→reward consistency idea as a multiplier:
* `E = 0` if the recommended minimum rule (7.1) fails.
* Otherwise `E = clamp(|S200| / max(1, |S100|), 0, 1)`.
This makes the setter reward collapse to 0 for classic “100 fits, 200 fails” traps.
#### 7.3.4 Consensus / rarity boost (optional)
Let `K` be the number of distinct solving method tags among Reward Correct solutions (see `SCORING.md`):
* `K = |{method_tag among Reward Correct submissions}|`
Define a small multiplicative boost:
* `C = 1 + gamma * log2(max(1, K))`
where `gamma` is a season parameter (e.g., `gamma = 0.1`).
#### 7.3.5 Final setter reward score (example)
An example per-problem setter reward score:
* `setter_score = SETTER_BASE * B_set * D * E * C`
with `SETTER_BASE` and all parameters published for the season.
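Putting 7.3.1-7.3.5 together, one possible scoring function looks like the sketch below; the default parameter values are the examples given above, and the whole function is informative, not normative:

```python
import math

def clamp(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))

def setter_score(L_set: int, U: int, s100: set, s200: set, k_methods: int,
                 beta_set: float = 1 / 800, gamma: float = 0.1,
                 U_min: int = 5, SETTER_BASE: float = 100.0) -> float:
    R = len(s200)
    if U < U_min or R < 1:
        return 0.0                                   # "nobody tried" guard
    B = math.exp(-beta_set * L_set)                  # brevity (7.3.1)
    D = clamp(math.log((U + 1) / (R + 1))            # difficulty (7.3.2)
              / math.log((U + 1) / 2), 0, 1)
    if len(s100) <= 3:                               # extrapolatability (7.3.3)
        ok = s100 <= s200
    else:
        ok = len(s200) / len(s100) >= 0.9
    E = clamp(len(s200) / max(1, len(s100)), 0, 1) if ok else 0.0
    C = 1 + gamma * math.log2(max(1, k_methods))     # rarity boost (7.3.4)
    return SETTER_BASE * B * D * E * C               # final score (7.3.5)
```

Because `E` multiplies the whole product, a trap problem (Stage passers who fail Reward) scores exactly zero regardless of brevity or difficulty.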
---
## 8. Transparency Requirements
For each season, the Platform must publish and keep stable:
* Python version, library versions (e.g., sympy)
* standard machine class/spec
* timing method (wall vs CPU)
* line/character counting rules
* canonicalization policy used for hashing
---
## 9. Disputes
* Platform sandbox results are authoritative.
* After reveal, any participant may reproduce results locally to verify integrity.
* If a discrepancy is found, the Platform must publish a postmortem including:
* canonicalization details
* environment versions
* reproduction steps
---
## 10. Versioning
This document is **Draft v0.3**. Breaking changes require a new season or an explicit version bump. Non-breaking clarifications may be issued at any time but must not change outcomes of already-published problems.