# PLATFORM.md — Sequence Dojo Platform Implementation Guide (Draft v0.3)
This document is for maintainers and implementers of the **Sequence Dojo platform**. It describes how the platform should validate setter submissions, generate commitments, publish problems, sandbox execution, judge solvers, and reveal setters under a **platform-enforced Commit–Reveal protocol**.
> Design goals: **reproducible**, **verifiable**, **safe**, and **low-friction**.
See also:
* [SPEC.md](./SPEC.md) (normative protocol and artifact formats)
* [RULES.md](./RULES.md) (participant-facing constraints)
* [REVEAL.md](./REVEAL.md) (lifecycle phases and disclosure)
---
## 1. Platform Responsibilities
The Platform must:
1. **Validate** setter packages before publication
2. **Canonicalize** and compute `P_hash` (SHA-256) for the setter source
3. **Generate disclosure data** from true outputs (no manual transcription)
4. **Publish** a public problem record (`published.json`)
5. **Execute and judge** solver submissions in a sandbox
6. **Reveal** setter source after the problem closes, enabling third-party verification of `P_hash`
7. Provide **clear diagnostics** on failures (static gate, sandbox, runtime, mismatch)
---
## 2. Standard Interfaces
### 2.1 Setter interfaces
The platform must support exactly one of the following per season (recommended: `seq`):
* `seq(n: int) -> int`
* `gen(N: int) -> list[int]`
### 2.2 Solver interface (recommended)
* `solver() -> list[int]` returning exactly `N_check` integers
> Keep the solver interface fixed and simple; avoid allowing “paste-a-list” solutions.
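As an illustrative sketch (not a real problem), a setter exposing `seq` and a matching solver might look like the following; the sequence rule and the 0-indexing convention are assumptions made for the example.

```python
# setter.py — illustrative seq(n) setter (0-indexing assumed for the example)
def seq(n: int) -> int:
    # Placeholder rule: sum of squares 0^2 + 1^2 + ... + n^2
    return n * (n + 1) * (2 * n + 1) // 6


# solver.py — illustrative solver() returning exactly N_check terms
N_CHECK = 200  # assumed to match the published N_check

def solver() -> list[int]:
    # A real solver reconstructs the hidden rule; here we simply reuse the same formula.
    return [k * (k + 1) * (2 * k + 1) // 6 for k in range(N_CHECK)]
```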
---
## 3. Artifact Formats
### 3.1 Setter package
Expected files:
* `problem.json`
* `setter.py`
### 3.2 Solver package
Expected files:
* `solver.py`
* optional `solution.json`
### 3.3 Published problem record
The platform outputs a `published.json` containing:
* `problem_id`, `title`
* `P_hash`
* `interface`, `N_check`
* `disclosure` (default: odd-index first 50)
* `timestamp`
* `platform` metadata (versions + canonicalization policy)
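An illustrative `published.json` with the fields listed above; all values are placeholders, the `values` list is truncated, and the nested `disclosure` layout is an assumption (only the field names above are specified).

```json
{
  "problem_id": "sd-2025-001",
  "title": "Example problem",
  "P_hash": "9b74c9897bac770ffc029102a200c5de1c1835420b6b9942dd4f1b3a7bd3e236",
  "interface": "seq",
  "N_check": 200,
  "disclosure": {
    "rule": "odd_first_50",
    "values": [1, 5, 14]
  },
  "timestamp": "2025-06-01T12:00:00Z",
  "platform": {
    "version": "0.3.0",
    "canonicalization": "utf8-lf-strip-trailing-blank-lines"
  }
}
```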
---
## 4. Validation Pipeline (Gates)
A setter submission is published **only if it passes all gates**. Fail fast with clear error messages.
### Gate A — Static validation (no execution)
**Inputs**: `setter.py`, `problem.json`
Checks:
1. **Line limit**: effective lines ≤ 100
2. **Character limit**: UTF-8 char count ≤ 5000
3. **AST parse**: parse must succeed
4. **Import whitelist**:
* allowed: `sympy`, `math`, `fractions`, `itertools`
* disallowed (non-exhaustive): `os`, `pathlib`, `subprocess`, `socket`, `requests`, `time`, `datetime`
5. **Dangerous builtins** (recommend reject if referenced):
* `open`, `eval`, `exec`, `compile`, `__import__`, `input`
6. **Suspicious patterns** (recommend reject if present):
* attribute access to `__dict__`, `__class__`, `__mro__`, `__subclasses__`
* `globals()`, `locals()`
* `getattr`/`setattr` on unknown targets
* `importlib` usage
**Outputs**:
* pass/fail
* list of violations with line/column if possible
> Implementation hint: use Python `ast` to scan `Import`, `ImportFrom`, `Name`, `Attribute`, `Call`.
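A minimal sketch of such a static gate, assuming the whitelist and limits above; error codes follow §10 where they exist, and the suspicious-attribute code is a hypothetical addition.

```python
import ast

ALLOWED_IMPORTS = {"sympy", "math", "fractions", "itertools"}
DANGEROUS_BUILTINS = {"open", "eval", "exec", "compile", "__import__", "input",
                      "globals", "locals", "getattr", "setattr"}
SUSPICIOUS_ATTRS = {"__dict__", "__class__", "__mro__", "__subclasses__"}

def static_gate(source: str) -> list[str]:
    """Return a list of violation messages; an empty list means Gate A passes."""
    violations = []
    # Simplified limits: counts every line/character ("effective lines" may exclude blanks/comments).
    if len(source.splitlines()) > 100:
        violations.append("E_STATIC_LINE_LIMIT")
    if len(source) > 5000:
        violations.append("E_STATIC_CHAR_LIMIT")
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return violations + [f"E_STATIC_AST_PARSE: {exc}"]
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            modules = [node.module or ""]
        else:
            modules = []
        for mod in modules:
            if mod.split(".")[0] not in ALLOWED_IMPORTS:
                violations.append(f"E_STATIC_IMPORT_FORBIDDEN: {mod} (line {node.lineno})")
        if isinstance(node, ast.Name) and node.id in DANGEROUS_BUILTINS:
            violations.append(f"E_STATIC_DANGEROUS_BUILTIN: {node.id} (line {node.lineno})")
        if isinstance(node, ast.Attribute) and node.attr in SUSPICIOUS_ATTRS:
            violations.append(f"E_STATIC_SUSPICIOUS_ATTR: {node.attr} (line {node.lineno})")
    return violations
```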
---
### Gate B — Sandbox safety validation (controlled execution)
**Goal**: ensure the code cannot perform I/O/network/subprocess and cannot import forbidden modules at runtime.
Recommended baseline approach (portable, “good enough”):
* Run code in a **separate process**
* Use **resource limits**:
* CPU time limit
* wall time limit
* memory limit
* Use a **restricted import hook**:
* allow only whitelist modules
* deny everything else
* Provide a **restricted builtins** dict:
* remove/replace `open`, `eval`, `exec`, `compile`, `__import__`, etc.
Stronger approach (optional, more robust):
* containerized sandbox (Docker/Firejail/nsjail) with no network, read-only FS, limited syscalls
Minimum requirements:
* No file writes
* No network access
* No subprocess execution
* No clock/time APIs (or ensure they return fixed values)
**Outputs**:
* pass/fail
* if fail: indicate which capability was attempted (e.g., forbidden import, file access)
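A minimal sketch of the baseline approach above (restricted builtins plus an import hook and POSIX resource limits). It is meant to run inside a disposable worker process and, as §12.1 notes, it is not bulletproof against adversarial code.

```python
import builtins
import resource

ALLOWED_IMPORTS = {"sympy", "math", "fractions", "itertools"}
REMOVED_BUILTINS = {"open", "eval", "exec", "compile", "input", "__import__"}

_real_import = builtins.__import__

def _restricted_import(name, globals=None, locals=None, fromlist=(), level=0):
    # Deny anything outside the whitelist at runtime (Gate B complement to Gate A).
    if name.split(".")[0] not in ALLOWED_IMPORTS:
        raise ImportError(f"E_SANDBOX_FORBIDDEN_IMPORT: {name}")
    return _real_import(name, globals, locals, fromlist, level)

def run_sandboxed(source: str, entry: str, arg):
    """Execute `entry(arg)` from submitted source under restricted builtins and limits."""
    # POSIX resource limits: CPU seconds and address space (values are illustrative).
    resource.setrlimit(resource.RLIMIT_CPU, (2, 2))
    resource.setrlimit(resource.RLIMIT_AS, (512 * 1024 * 1024, 512 * 1024 * 1024))
    safe_builtins = {k: v for k, v in vars(builtins).items() if k not in REMOVED_BUILTINS}
    safe_builtins["__import__"] = _restricted_import
    namespace = {"__builtins__": safe_builtins}
    exec(compile(source, "<submission>", "exec"), namespace)  # exec'd by the platform harness
    return namespace[entry](arg)
```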
---
### Gate C — Performance validation (1 second)
**Goal**: the setter must generate `a[0..N_check-1]` within the time limit on the platform's standard machine.
Procedure:
1. Load the setter in the sandbox
2. Generate first `N_check` terms (default 200)
3. Measure runtime (see §7 timing policy)
4. Fail if runtime > 1 second
Record:
* wall time
* CPU time (if available)
* peak RSS memory (optional but useful)
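A minimal sketch of this check, using wall time per §7 and a hypothetical `generate_terms(source, n)` helper that produces the first `n` terms inside the sandbox. Note that the `time` ban applies to submitted code, not to the platform harness.

```python
import time

TIME_LIMIT_S = 1.0
N_CHECK = 200

def performance_gate(generate_terms, setter_source: str) -> dict:
    """Gate C: measure wall time for producing the first N_CHECK terms."""
    start = time.perf_counter()
    terms = generate_terms(setter_source, N_CHECK)  # runs inside the sandbox process
    wall = time.perf_counter() - start
    if wall > TIME_LIMIT_S:
        return {"ok": False, "error": "E_TIMEOUT", "wall_time_s": wall}
    return {"ok": True, "wall_time_s": wall, "terms": terms}
```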
---
### Gate D — Determinism validation
**Goal**: repeated runs must produce identical results.
Procedure:
1. Run generation twice in the same environment (fresh process recommended)
2. Compare the full `N_check` list
3. Fail if any mismatch
This catches:
* non-fixed random seeds
* time/environment dependence
* nondeterministic iteration order / hashing dependence (less likely in pure numeric code, but still possible)
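A minimal sketch of the determinism check, reusing the same hypothetical `generate_terms` helper; in practice each call should go through a fresh sandbox process.

```python
def determinism_gate(generate_terms, setter_source: str, n_check: int = 200) -> dict:
    """Gate D: two generations of the first n_check terms must be identical."""
    first = generate_terms(setter_source, n_check)   # ideally in a fresh process
    second = generate_terms(setter_source, n_check)  # ideally in a fresh process
    if first != second:
        index = next((i for i, (a, b) in enumerate(zip(first, second)) if a != b),
                     min(len(first), len(second)))
        return {"ok": False, "error": "E_NONDETERMINISTIC_OUTPUT", "first_divergence": index}
    return {"ok": True}
```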
---
## 5. Canonicalization & Hashing (Commit)
The platform must produce a stable `P_hash` using a **documented canonicalization policy**.
### 5.1 Canonicalization policy (recommended)
Given `setter.py` raw bytes:
1. Decode as UTF-8 (reject if invalid)
2. Normalize newlines to `\n`
3. Strip trailing blank lines at end-of-file
4. Keep all other bytes exactly as-is (do not reformat code)
The canonical bytes are then hashed:
* `P_hash = SHA-256(canonical_bytes)`
> The platform must publish this policy in `published.json.platform.canonicalization`.
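A minimal sketch of this policy and the commitment hash; normalizing to exactly one final newline is one reasonable reading of step 3 and should itself be documented.

```python
import hashlib

def canonicalize(raw: bytes) -> bytes:
    """Apply §5.1: UTF-8, LF newlines, no trailing blank lines."""
    text = raw.decode("utf-8")                              # step 1: raises on invalid UTF-8
    text = text.replace("\r\n", "\n").replace("\r", "\n")   # step 2: normalize newlines to \n
    text = text.rstrip("\n") + "\n"                         # step 3: exactly one final newline
    return text.encode("utf-8")

def compute_p_hash(raw: bytes) -> str:
    """P_hash = SHA-256 over the canonical bytes, as a hex digest."""
    return hashlib.sha256(canonicalize(raw)).hexdigest()
```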
### 5.2 Why canonicalization
Without canonicalization, the same logical source can hash differently due to:
* CRLF vs LF
* trailing newlines
* packaging artifacts
Canonicalization makes hash verification robust and predictable.
---
## 6. Disclosure Generation (No Human Copy/Paste)
After Gates A–D pass:
1. Generate `a_true = [a_0..a_{N_check-1}]`
2. Build disclosure values by rule:
* default: `odd_first_50 = [a_1, a_3, …, a_99]`
3. Store disclosure into `published.json`
The platform must never accept manually entered disclosure lists.
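A minimal sketch of the default rule (0-indexed `a_true`, the 50 odd-index values `a_1, a_3, …, a_99`); the returned dict shape is illustrative.

```python
def build_disclosure(a_true: list[int]) -> dict:
    """Default disclosure: the 50 odd-index terms a_1, a_3, ..., a_99."""
    odd_first_50 = a_true[1:100:2]       # indices 1, 3, ..., 99
    assert len(odd_first_50) == 50, "requires N_check >= 100"
    return {"rule": "odd_first_50", "values": odd_first_50}
```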
---
## 7. Timing & Resource Accounting
### 7.1 Timing
Pick one timing definition and publish it for the season:
* **Wall time** is recommended (closer to user experience).
* Measure inside the sandbox process.
Implementation notes:
* decide whether compilation/import overhead is excluded (pick one policy and apply it consistently)
* simplest: measure end-to-end generation of the first `N_check` terms inside the sandbox
### 7.2 Limits
At minimum enforce:
* wall time ≤ 1 second for `N_check=200` (setter)
* memory ≤ platform-defined cap (recommend: 256–1024 MB)
* recursion depth: default Python limit is fine; optionally cap to prevent abuse
---
## 8. Judging Solver Submissions
### 8.1 Load & run solver
Execute `solver.py` in the same sandbox policy used for setters (or stricter).
Call:
* `solver()` → list of ints of length `N_check`
Validate:
* type is list
* length is exactly `N_check`
* all elements are Python `int`
### 8.2 Compare to ground truth
Compute:
* `first_mismatch_index` (if any)
* stage pass: match first 100
* reward: match first 200
Return a structured verdict object:
```json
{
  "ok": false,
  "stage_pass": true,
  "reward": false,
  "first_mismatch": {
    "index": 145,
    "expected": 2918000611027443,
    "got": 2917990611027443
  }
}
```
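A minimal sketch of the comparison step that produces this verdict, assuming the length and type checks from §8.1 already passed; here `ok` is read as "all `N_check` terms match", consistent with the example above.

```python
def judge(solver_output: list[int], a_true: list[int]) -> dict:
    """Compare solver output against ground truth and build the verdict object."""
    first_mismatch = None
    for i, (got, expected) in enumerate(zip(solver_output, a_true)):
        if got != expected:
            first_mismatch = {"index": i, "expected": expected, "got": got}
            break
    stage_pass = first_mismatch is None or first_mismatch["index"] >= 100
    reward = first_mismatch is None           # all N_check (default 200) terms match
    verdict = {"ok": reward, "stage_pass": stage_pass, "reward": reward}
    if first_mismatch is not None:
        verdict["first_mismatch"] = first_mismatch
    return verdict
```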
---
## 9. Reveal Procedure
After the problem is closed (or per policy):
1. Publish the original `setter.py` (or the canonicalized version—choose one and document it)
2. Publish the canonicalization policy
3. (Optional) publish validation logs (gate results, runtime, determinism checks)
Third parties should be able to verify:
* `SHA-256(canonicalize(setter.py)) == P_hash`
---
## 10. Error Codes & Diagnostics (Recommended)
Use stable, machine-readable error codes.
Examples:
### Static gate errors
* `E_STATIC_LINE_LIMIT`
* `E_STATIC_CHAR_LIMIT`
* `E_STATIC_IMPORT_FORBIDDEN`
* `E_STATIC_DANGEROUS_BUILTIN`
* `E_STATIC_AST_PARSE`
### Sandbox/runtime errors
* `E_SANDBOX_FORBIDDEN_IMPORT`
* `E_SANDBOX_IO_ATTEMPT`
* `E_SANDBOX_SUBPROCESS_ATTEMPT`
* `E_TIMEOUT`
* `E_OOM`
### Interface errors
* `E_INTERFACE_MISSING`
* `E_INTERFACE_BAD_RETURN_TYPE`
* `E_INTERFACE_BAD_LENGTH`
* `E_INTERFACE_NON_INT_ELEMENT`
### Determinism
* `E_NONDETERMINISTIC_OUTPUT`
### Judging mismatch
* `E_MISMATCH`
Diagnostics should include:
* which gate failed
* the offending module/symbol if applicable
* earliest mismatch index for judging
---
## 11. Minimal CLI Workflow (Suggested)
A minimal working CLI typically includes:
### `validate <setter_pack_dir>`
* run Gates A–D
* print pass/fail + diagnostics
### `publish <setter_pack_dir> --out published.json`
* run validate
* canonicalize + compute `P_hash`
* generate disclosure
* write `published.json`
### `judge <published.json> <solver_pack_dir>`
* load setter by problem_id (or stored ground truth)
* run solver
* compare and output verdict JSON
### `reveal <published.json> --out reveal_dir`
* export setter source + metadata
* optionally export logs
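A minimal argparse skeleton for these four subcommands; only the argument wiring is shown, and the dispatch to the gates, publishing, and judging steps sketched earlier is left as a comment.

```python
import argparse

def main() -> None:
    parser = argparse.ArgumentParser(prog="dojo")
    sub = parser.add_subparsers(dest="command", required=True)

    p_validate = sub.add_parser("validate")
    p_validate.add_argument("setter_pack_dir")

    p_publish = sub.add_parser("publish")
    p_publish.add_argument("setter_pack_dir")
    p_publish.add_argument("--out", default="published.json")

    p_judge = sub.add_parser("judge")
    p_judge.add_argument("published_json")
    p_judge.add_argument("solver_pack_dir")

    p_reveal = sub.add_parser("reveal")
    p_reveal.add_argument("published_json")
    p_reveal.add_argument("--out", dest="reveal_dir", required=True)

    args = parser.parse_args()
    # Dispatch to the corresponding pipeline steps (validate/publish/judge/reveal) here.
    print(f"command: {args.command}")

if __name__ == "__main__":
    main()
```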
---
## 12. Implementation Notes & Tradeoffs
### 12.1 Sandboxing in Python is hard
A pure-Python “restricted builtins” approach is not bulletproof against adversarial code. For early prototypes it is acceptable, but for public competitions consider process- or container-level sandboxes.
### 12.2 Keep the interface small
Favor `seq(n)` or `solver()` and fixed `N_check`. This reduces complexity and avoids ambiguous I/O contracts.
### 12.3 Treat all disclosure as generated artifacts
Trial 002 showed how easy it is to corrupt a problem by copying large integers manually. This platform must prevent that class of failure entirely.
---
## 13. Season Configuration (Recommended)
A `season.toml` / `config.json` can fix:
* allowed imports
* banned modules/builtins
* N_check, disclosure rule
* time/memory limits
* canonicalization policy
* judging thresholds (100/200)
The platform should embed the relevant configuration in `published.json.platform` for auditability.
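An illustrative `config.json` covering the fields above; key names and values are placeholders rather than a fixed schema.

```json
{
  "allowed_imports": ["sympy", "math", "fractions", "itertools"],
  "banned_builtins": ["open", "eval", "exec", "compile", "__import__", "input"],
  "N_check": 200,
  "disclosure_rule": "odd_first_50",
  "limits": {"wall_time_s": 1.0, "memory_mb": 512},
  "canonicalization": "utf8-lf-strip-trailing-blank-lines",
  "judging": {"stage_pass_prefix": 100, "reward_prefix": 200}
}
```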