# PLATFORM.md — Sequence Dojo Platform Implementation Guide (Draft v0.3)
This document is for maintainers and implementers of the **Sequence Dojo platform**. It describes how the platform should validate setter submissions, generate commitments, publish problems, sandbox execution, judge solvers, and reveal setters under a **platform-enforced Commit–Reveal protocol**.
> Design goals: **reproducible**, **verifiable**, **safe**, and **low-friction**.
See also:
* [SPEC.md](./SPEC.md) (normative protocol and artifact formats)
* [RULES.md](./RULES.md) (participant-facing constraints)
* [REVEAL.md](./REVEAL.md) (lifecycle phases and disclosure)
---
## 1. Platform Responsibilities
The Platform must:
1. **Validate** setter packages before publication
2. **Canonicalize** and compute `P_hash` (SHA-256) for the setter source
3. **Generate disclosure data** from true outputs (no manual transcription)
4. **Publish** a public problem record (`published.json`)
5. **Execute and judge** solver submissions in a sandbox
6. **Reveal** setter source after the problem closes, enabling third-party verification of `P_hash`
7. Provide **clear diagnostics** on failures (static gate, sandbox, runtime, mismatch)
---
## 2. Standard Interfaces
### 2.1 Setter interfaces
The platform must support exactly one of the following per season (recommended: `seq`):
* `seq(n: int) -> int`
* `gen(N: int) -> list[int]`
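A minimal setter under the `seq` interface might look like this. The recurrence is purely illustrative (not a real season problem); the point is the shape of the contract: pure, deterministic, integer in, integer out.

```python
def seq(n: int) -> int:
    """Illustrative setter: a(0) = 0, a(1) = 1, a(n) = 3*a(n-1) + 2*a(n-2).

    Pure and deterministic: no I/O, no randomness, no environment access.
    """
    a, b = 0, 1
    for _ in range(n):
        a, b = b, 3 * b + 2 * a
    return a
```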
### 2.2 Solver interface (recommended)
* `solver() -> list[int]` returning exactly `N_check` integers
> Keep the solver interface fixed and simple; avoid allowing “paste-a-list” solutions.
---
## 3. Artifact Formats
### 3.1 Setter package
Expected files:
* `problem.json`
* `setter.py`
### 3.2 Solver package
Expected files:
* `solver.py`
* optional `solution.json`
### 3.3 Published problem record
The platform outputs a `published.json` containing:
* `problem_id`, `title`
* `P_hash`
* `interface`, `N_check`
* `disclosure` (default: odd-index first 50)
* `timestamp`
* `platform` metadata (versions + canonicalization policy)
---
## 4. Validation Pipeline (Gates)
A setter submission is published **only if it passes all gates**. Fail fast with clear error messages.
### Gate A — Static validation (no execution)
**Inputs**: `setter.py`, `problem.json`
Checks:
1. **Line limit**: effective lines ≤ 100
2. **Character limit**: UTF-8 char count ≤ 5000
3. **AST parse**: parse must succeed
4. **Import whitelist**:
* allowed: `sympy`, `math`, `fractions`, `itertools`
* disallowed (non-exhaustive): `os`, `pathlib`, `subprocess`, `socket`, `requests`, `time`, `datetime`
5. **Dangerous builtins** (recommend reject if referenced):
* `open`, `eval`, `exec`, `compile`, `__import__`, `input`
6. **Suspicious patterns** (recommend reject if present):
* attribute access to `__dict__`, `__class__`, `__mro__`, `__subclasses__`
* `globals()`, `locals()`
* `getattr`/`setattr` on unknown targets
* `importlib` usage
**Outputs**:
* pass/fail
* list of violations with line/column if possible
> Implementation hint: use Python `ast` to scan `Import`, `ImportFrom`, `Name`, `Attribute`, `Call`.
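Following that hint, here is a sketch of the Gate A scanner. The whitelist and error-code strings reuse the values named in this document; treat the exact set of checks as a starting point, not an exhaustive policy (it does not yet cover `getattr`/`setattr` targets or `importlib`).

```python
import ast

ALLOWED_IMPORTS = {"sympy", "math", "fractions", "itertools"}
DANGEROUS_NAMES = {"open", "eval", "exec", "compile", "__import__", "input"}
DUNDER_ATTRS = {"__dict__", "__class__", "__mro__", "__subclasses__"}

def static_violations(source: str) -> list[str]:
    """Return a list of Gate A violations found by walking the AST."""
    try:
        tree = ast.parse(source)
    except SyntaxError as e:
        return [f"E_STATIC_AST_PARSE: {e}"]
    violations = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            # For `import a.b` check "a"; for `from a import x` check "a".
            mods = ([alias.name for alias in node.names]
                    if isinstance(node, ast.Import) else [node.module or ""])
            for mod in mods:
                if mod.split(".")[0] not in ALLOWED_IMPORTS:
                    violations.append(
                        f"E_STATIC_IMPORT_FORBIDDEN: {mod} (line {node.lineno})")
        elif isinstance(node, ast.Name) and node.id in DANGEROUS_NAMES:
            violations.append(
                f"E_STATIC_DANGEROUS_BUILTIN: {node.id} (line {node.lineno})")
        elif isinstance(node, ast.Attribute) and node.attr in DUNDER_ATTRS:
            violations.append(
                f"E_STATIC_DANGEROUS_BUILTIN: .{node.attr} (line {node.lineno})")
    return violations
```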
---
### Gate B — Sandbox safety validation (controlled execution)
**Goal**: ensure the code cannot perform I/O/network/subprocess and cannot import forbidden modules at runtime.
Recommended baseline approach (portable, “good enough”):
* Run code in a **separate process**
* Use **resource limits**:
* CPU time limit
* wall time limit
* memory limit
* Use a **restricted import hook**:
* allow only whitelist modules
* deny everything else
* Provide a **restricted builtins** dict:
* remove/replace `open`, `eval`, `exec`, `compile`, `__import__`, etc.
Stronger approach (optional, more robust):
* containerized sandbox (Docker/Firejail/nsjail) with no network, read-only FS, limited syscalls
Minimum requirements:
* No file writes
* No network access
* No subprocess execution
* No clock/time APIs (or ensure they return fixed values)
**Outputs**:
* pass/fail
* if fail: indicate which capability was attempted (e.g., forbidden import, file access)
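A baseline sketch of the restricted-builtins-plus-import-hook approach follows. This is explicitly not a security boundary on its own (see §12.1): it must be paired with a separate process and OS-level resource limits. `run_setter` and the whitelisted builtin set are illustrative choices, not a spec.

```python
import builtins

ALLOWED_MODULES = {"sympy", "math", "fractions", "itertools"}

def restricted_import(name, globals=None, locals=None, fromlist=(), level=0):
    """Import hook that only admits whitelisted top-level modules."""
    if name.split(".")[0] not in ALLOWED_MODULES:
        raise ImportError(f"E_SANDBOX_FORBIDDEN_IMPORT: {name}")
    return __import__(name, globals, locals, fromlist, level)

def run_setter(source: str) -> dict:
    """Exec setter source with a restricted builtins dict.

    Baseline only: run this inside a separate, resource-limited process;
    a pure-Python restriction is bypassable by adversarial code.
    """
    safe_builtins = {name: getattr(builtins, name)
                     for name in ("range", "len", "int", "list", "abs", "min",
                                  "max", "sum", "enumerate", "zip")}
    safe_builtins["__import__"] = restricted_import  # routes `import` statements
    env = {"__builtins__": safe_builtins}
    exec(source, env)
    return env
```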
---
### Gate C — Performance validation (1 second)
**Goal**: setter must generate `a[0..N_check-1]` within time limit on the platform standard machine.
Procedure:
1. Load the setter in the sandbox
2. Generate first `N_check` terms (default 200)
3. Measure runtime (see §7 timing policy)
4. Fail if runtime > 1 second
Record:
* wall time
* CPU time (if available)
* peak RSS memory (optional but useful)
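The timing step itself can be simple: wall time via `time.perf_counter` around end-to-end generation. `time_generation` is a hypothetical helper, and in production the measurement runs inside the sandboxed process rather than the parent.

```python
import time

def time_generation(seq, n_check: int = 200, limit_s: float = 1.0):
    """Generate a[0..n_check-1] and measure end-to-end wall time.

    Returns (terms, elapsed_seconds, within_limit).
    """
    start = time.perf_counter()
    terms = [seq(n) for n in range(n_check)]
    elapsed = time.perf_counter() - start
    return terms, elapsed, elapsed <= limit_s
```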
---
### Gate D — Determinism validation
**Goal**: repeated runs must produce identical results.
Procedure:
1. Run generation twice in the same environment (fresh process recommended)
2. Compare the full `N_check` list
3. Fail if any mismatch
This catches:
* non-fixed random seeds
* time/environment dependence
* nondeterministic iteration order / hashing dependence (less likely in pure numeric code, but still possible)
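The check itself is a straight comparison of two full runs; the sketch below does both runs in-process for brevity, whereas production should use a fresh sandboxed process per run so that cached module state cannot mask nondeterminism.

```python
def determinism_check(generate, n_check: int = 200) -> bool:
    """Run generation twice and compare the full output lists.

    In production, each run should happen in a fresh sandboxed process.
    """
    first = generate(n_check)
    second = generate(n_check)
    return first == second
```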
---
## 5. Canonicalization & Hashing (Commit)
The platform must produce a stable `P_hash` using a **documented canonicalization policy**.
### 5.1 Canonicalization policy (recommended)
Given `setter.py` raw bytes:
1. Decode as UTF-8 (reject if invalid)
2. Normalize newlines to `\n`
3. Strip trailing blank lines at end-of-file
4. Keep all other bytes exactly as-is (do not reformat code)
The canonical bytes are then hashed:
* `P_hash = SHA-256(canonical_bytes)`
> The platform must publish this policy in `published.json.platform.canonicalization`.
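A sketch of the recommended policy. Note that "strip trailing blank lines" is realized here as "exactly one trailing newline"; that is one possible convention, and whichever convention you pick must be the one published in `published.json.platform.canonicalization`.

```python
import hashlib

def canonicalize(raw: bytes) -> bytes:
    """Recommended policy: UTF-8 decode, LF newlines, trim trailing blanks.

    One convention among several; the chosen convention must be published.
    """
    text = raw.decode("utf-8")  # raises on invalid UTF-8 (reject the pack)
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    text = text.rstrip("\n") + "\n"  # exactly one trailing newline (a choice)
    return text.encode("utf-8")

def p_hash(raw: bytes) -> str:
    """P_hash = SHA-256 over the canonical bytes, hex-encoded."""
    return hashlib.sha256(canonicalize(raw)).hexdigest()
```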
### 5.2 Why canonicalization
Without canonicalization, the same logical source can hash differently due to:
* CRLF vs LF
* trailing newlines
* packaging artifacts
Canonicalization makes hash verification robust and predictable.
---
## 6. Disclosure Generation (No Human Copy/Paste)
After Gates A–D pass:
1. Generate `a_true = [a_0..a_{N_check-1}]`
2. Build disclosure values by rule:
* default: `odd_first_50 = [a_1, a_3, …, a_99]`
3. Store disclosure into `published.json`
The platform must never accept manually entered disclosure lists.
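Deriving the disclosure mechanically is a one-liner over the true outputs; `build_disclosure` is an illustrative helper name.

```python
def build_disclosure(a_true: list[int], rule: str = "odd_first_50") -> list[int]:
    """Derive disclosure values directly from the generated sequence.

    Never accept human-transcribed lists; the slice below is the only source.
    """
    if rule == "odd_first_50":
        return a_true[1:100:2]  # a_1, a_3, ..., a_99
    raise ValueError(f"unknown disclosure rule: {rule}")
```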
---
## 7. Timing & Resource Accounting
### 7.1 Timing
Pick one timing definition and publish it for the season:
* **Wall time** is recommended (closer to user experience).
* Measure inside the sandbox process.
Implementation notes:
* decide whether compilation/import overhead counts toward the limit (choose one policy and stick to it for the season)
* simplest: measure end-to-end generation of `N_check` terms inside the sandbox
### 7.2 Limits
At minimum enforce:
* wall time ≤ 1 second for `N_check=200` (setter)
* memory ≤ platform-defined cap (recommend: 256–1024 MB)
* recursion depth: default Python limit is fine; optionally cap to prevent abuse
---
## 8. Judging Solver Submissions
### 8.1 Load & run solver
Execute `solver.py` in the same sandbox policy used for setters (or stricter).
Call:
* `solver()` → list of ints of length `N_check`
Validate:
* type is list
* length is exactly `N_check`
* all elements are Python `int`
### 8.2 Compare to ground truth
Compute:
* `first_mismatch_index` (if any)
* stage pass: match first 100
* reward: match first 200
Return a structured verdict object:
```json
{
"ok": false,
"stage_pass": true,
"reward": false,
"first_mismatch": {
"index": 145,
"expected": 2918000611027443,
"got": 2917990611027443
}
}
```
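One way to compute that verdict, assuming length/type validation from §8.1 has already passed. Treating `ok` as a mirror of `reward` is an assumption here; the spec may give `ok` a different meaning.

```python
def judge(got: list[int], truth: list[int],
          stage_n: int = 100, reward_n: int = 200) -> dict:
    """Compare solver output to ground truth and build a verdict object.

    Assumes len(got) == N_check was already validated. `ok` mirrors
    `reward` below, which is an assumption, not a spec requirement.
    """
    first_mismatch = next(
        (i for i, (g, t) in enumerate(zip(got, truth)) if g != t), None)
    stage_pass = first_mismatch is None or first_mismatch >= stage_n
    reward = first_mismatch is None and len(got) >= reward_n
    verdict = {"ok": reward, "stage_pass": stage_pass, "reward": reward}
    if first_mismatch is not None:
        verdict["first_mismatch"] = {
            "index": first_mismatch,
            "expected": truth[first_mismatch],
            "got": got[first_mismatch],
        }
    return verdict
```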
---
## 9. Reveal Procedure
After the problem is closed (or per policy):
1. Publish the original `setter.py` (or the canonicalized version—choose one and document it)
2. Publish the canonicalization policy
3. (Optional) publish validation logs (gate results, runtime, determinism checks)
Third parties should be able to verify:
* `SHA-256(canonicalize(setter.py)) == P_hash`
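A third-party verifier needs nothing beyond the revealed source, the published hash, and the published canonicalization policy. The sketch inlines the §5.1 policy (LF newlines, single trailing newline); a real verifier should read the policy from `published.json` rather than hard-code it.

```python
import hashlib

def verify_reveal(setter_source: bytes, published_p_hash: str) -> bool:
    """Recompute the commitment from revealed source and compare.

    Inlines the canonicalization sketch from section 5.1 (an assumption;
    read the actual policy from published.json in a real verifier).
    """
    text = setter_source.decode("utf-8")
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    canonical = (text.rstrip("\n") + "\n").encode("utf-8")
    return hashlib.sha256(canonical).hexdigest() == published_p_hash
```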
---
## 10. Error Codes & Diagnostics (Recommended)
Use stable, machine-readable error codes.
Examples:
### Static gate errors
* `E_STATIC_LINE_LIMIT`
* `E_STATIC_CHAR_LIMIT`
* `E_STATIC_IMPORT_FORBIDDEN`
* `E_STATIC_DANGEROUS_BUILTIN`
* `E_STATIC_AST_PARSE`
### Sandbox/runtime errors
* `E_SANDBOX_FORBIDDEN_IMPORT`
* `E_SANDBOX_IO_ATTEMPT`
* `E_SANDBOX_SUBPROCESS_ATTEMPT`
* `E_TIMEOUT`
* `E_OOM`
### Interface errors
* `E_INTERFACE_MISSING`
* `E_INTERFACE_BAD_RETURN_TYPE`
* `E_INTERFACE_BAD_LENGTH`
* `E_INTERFACE_NON_INT_ELEMENT`
### Determinism
* `E_NONDETERMINISTIC_OUTPUT`
### Judging mismatch
* `E_MISMATCH`
Diagnostics should include:
* which gate failed
* the offending module/symbol if applicable
* earliest mismatch index for judging
---
## 11. Minimal CLI Workflow (Suggested)
A minimal working CLI typically includes:
### `validate <setter_pack_dir>`
* run Gates A–D
* print pass/fail + diagnostics
### `publish <setter_pack_dir> --out published.json`
* run validate
* canonicalize + compute `P_hash`
* generate disclosure
* write `published.json`
### `judge <published.json> <solver_pack_dir>`
* load setter by problem_id (or stored ground truth)
* run solver
* compare and output verdict JSON
### `reveal <published.json> --out reveal_dir`
* export setter source + metadata
* optionally export logs
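The subcommand layout above maps directly onto `argparse` subparsers; the sketch below wires up arguments only (handlers omitted), and the program name `dojo` is hypothetical.

```python
import argparse

def build_cli() -> argparse.ArgumentParser:
    """Subcommand skeleton mirroring the workflow above (handlers omitted)."""
    p = argparse.ArgumentParser(prog="dojo")
    sub = p.add_subparsers(dest="cmd", required=True)
    sub.add_parser("validate").add_argument("setter_pack_dir")
    pub = sub.add_parser("publish")
    pub.add_argument("setter_pack_dir")
    pub.add_argument("--out", default="published.json")
    j = sub.add_parser("judge")
    j.add_argument("published_json")
    j.add_argument("solver_pack_dir")
    r = sub.add_parser("reveal")
    r.add_argument("published_json")
    r.add_argument("--out", default="reveal_dir")
    return p
```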
---
## 12. Implementation Notes & Tradeoffs
### 12.1 Sandboxing in Python is hard
A pure-Python “restricted builtins” approach is not bulletproof against adversarial code. For early prototypes it is acceptable, but for public competitions consider process- or container-level sandboxes.
### 12.2 Keep the interface small
Favor `seq(n)` or `solver()` and fixed `N_check`. This reduces complexity and avoids ambiguous I/O contracts.
### 12.3 Treat all disclosure as generated artifacts
Trial 002 showed how easy it is to corrupt a problem by copying large integers manually. This platform must prevent that class of failure entirely.
---
## 13. Season Configuration (Recommended)
A `season.toml` / `config.json` can fix:
* allowed imports
* banned modules/builtins
* N_check, disclosure rule
* time/memory limits
* canonicalization policy
* judging thresholds (100/200)
The platform should embed the relevant configuration in `published.json.platform` for auditability.