# Sequence Dojo — Submission & Publication Protocol (SPEC) (Draft v0.3)

This document is the protocol surface for Sequence Dojo: artifact formats, required interfaces, and what the Platform must publish and later reveal.

See also:

* [RULES.md](./RULES.md) (participant-facing constraints)
* [REVEAL.md](./REVEAL.md) (lifecycle phases and disclosure)
* [PLATFORM.md](./PLATFORM.md) (implementation guidance for gates and sandboxing)
* [SCORING.md](./SCORING.md) (ranking among correct solvers)

This document defines a **platform-enforced** workflow for publishing Sequence Dojo problems under a **Commit–Reveal** protocol. The core idea is simple:

* **Setters submit code + metadata to the Platform**
* The **Platform validates** safety, determinism, and performance
* Only then does the Platform **publish the problem statement + platform-generated hash**
* After judging ends, the Platform **reveals the setter code** so anyone can verify the hash

The protocol is designed to eliminate:

* “simulated” or self-reported hashes
* transcription errors in disclosed data
* unsafe or non-deterministic setter programs
* manual copy/paste failures in solver submissions

---

# 1. Roles

### Setter

Creates a hidden deterministic program that generates an integer sequence.

### Solver

Infers the hidden logic from disclosed terms and submits a program that generates the first 200 terms.

### Platform (Judge)

Validates submissions, generates commitments (hashes), publishes problems, runs judging, and performs reveal.

---

# 2. Submission Package (Setter → Platform)

A setter submits a **package** (directory or zip) containing at minimum:

## 2.1 `problem.json` (metadata)

Example:

```json
{
  "title": "Trial 002",
  "author": "setter_handle",
  "version": "1.0",
  "interface": "seq",
  "N_check": 200,
  "reveal_policy": "after_judged",
  "notes": "pure function, deterministic, no I/O",
  "expected_runtime_ms": 100
}
```

**Required fields**

* `title`: human-readable name
* `interface`: `"seq"` or `"gen"`
* `N_check`: integer number of terms to generate and check (`200` in the default configuration)

**Optional fields**

* `author`, `version`, `notes`, `expected_runtime_ms`
* `reveal_policy`: `"after_judged"` (default) or other platform-defined options
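As an illustration of the field rules above, a Platform-side metadata check might look like the following minimal sketch. The function name `validate_problem_metadata` is hypothetical, and the requirement that `N_check` be positive is an assumption that the spec does not state.

```python
import json

# Required fields and their expected JSON types, taken from the list above.
REQUIRED_FIELDS = {"title": str, "interface": str, "N_check": int}
ALLOWED_INTERFACES = {"seq", "gen"}


def validate_problem_metadata(raw_json: str) -> dict:
    """Parse problem.json text and check the required fields (hypothetical helper)."""
    meta = json.loads(raw_json)
    if not isinstance(meta, dict):
        raise ValueError("problem.json must contain a JSON object")
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in meta:
            raise ValueError(f"missing required field: {field}")
        value = meta[field]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            raise ValueError(f"field {field!r} must be {expected_type.__name__}")
    if meta["interface"] not in ALLOWED_INTERFACES:
        raise ValueError("interface must be 'seq' or 'gen'")
    if meta["N_check"] <= 0:  # assumption: a non-positive N_check is meaningless
        raise ValueError("N_check must be positive")
    # Apply the documented default for the optional reveal policy.
    meta.setdefault("reveal_policy", "after_judged")
    return meta
```

Calling it with `'{"title": "Trial 002", "interface": "seq", "N_check": 200}'` returns the parsed metadata with `reveal_policy` filled in as `"after_judged"`.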

## 2.2 `setter.py` (the generator)

The setter must implement **exactly one** of the following interfaces:

### Interface A (recommended)

```python
def seq(n: int) -> int:
```

### Interface B

```python
def gen(N: int) -> list[int]:
```

Every returned term must be a Python `int`. For invalid inputs, the function should raise an exception.
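One way the Platform could normalize both interfaces into a single list of terms is sketched below. The helper name `first_terms` is hypothetical, and the strict `type(t) is int` check (which rejects `bool` and library integer types) is one possible reading of "must be Python `int`", not a rule the spec spells out.

```python
from types import SimpleNamespace


def first_terms(setter_module, n_check: int) -> list[int]:
    """Return [a_0, ..., a_{n_check-1}] from a module exposing seq or gen."""
    has_seq = callable(getattr(setter_module, "seq", None))
    has_gen = callable(getattr(setter_module, "gen", None))
    if has_seq == has_gen:
        # Either both interfaces or neither is defined.
        raise ValueError("setter must define exactly one of seq(n) or gen(N)")
    if has_seq:
        terms = [setter_module.seq(n) for n in range(n_check)]
    else:
        terms = list(setter_module.gen(n_check))
    if len(terms) != n_check:
        raise ValueError(f"expected {n_check} terms, got {len(terms)}")
    for index, term in enumerate(terms):
        if type(term) is not int:  # assumption: only the built-in int is accepted
            raise TypeError(f"term {index} is {type(term).__name__}, not int")
    return terms


# Usage with a stand-in for an imported setter module.
demo_setter = SimpleNamespace(seq=lambda n: n * n)
print(first_terms(demo_setter, 5))  # [0, 1, 4, 9, 16]
```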

---

# 3. Mandatory Constraints (Setter Program)

The setter program must satisfy all constraints below.

## 3.1 Pure Function & Determinism

The program must not:

* read/write files
* access the network
* spawn subprocesses
* read system time / clock
* read environment variables or any external state

Randomness is allowed **only if deterministic**, i.e. a fixed seed is hard-coded such that output is fully reproducible.
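As a hedged example of "deterministic randomness", the sketch below uses a hand-rolled linear congruential generator with a hard-coded seed, since `random` is not among the allowed imports listed in 3.2. The seed, multiplier, and modulus are illustrative values, not anything the spec prescribes.

```python
def seq(n: int) -> int:
    """Pseudo-random looking but fully reproducible term a_n."""
    if not isinstance(n, int) or n < 0:
        raise ValueError("n must be a non-negative integer")
    state = 12345  # hard-coded seed (illustrative value)
    for _ in range(n + 1):
        # Linear congruential step; the constants are arbitrary example values.
        state = (1103515245 * state + 12345) % 2**31
    return state % 1000
```

Because the state is rebuilt from the same seed on every call, repeated runs return identical terms, which is what Gate D checks.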

## 3.2 Dependency Whitelist

Allowed imports:

* `sympy`, `math`, `fractions`, `itertools`

Disallowed imports include (not exhaustive):

* `os`, `pathlib`, `subprocess`, `socket`, `requests`, `time`, `datetime`

The Platform enforces this via static scanning and runtime import interception.

## 3.3 Resource Limits

* Must generate `a_0..a_{N_check-1}` within **1 second** on the Platform standard machine
* Code length ≤ **100 lines** (Platform-defined “effective code lines”)
* Total UTF-8 character count ≤ **5000** (including whitespace and newlines)
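The size limits above could be measured roughly as follows. This is only a sketch: the spec leaves "effective code lines" to the Platform, so the rule used here (skip blank and comment-only lines) is an assumption, as is counting characters as Unicode code points via `len`.

```python
MAX_EFFECTIVE_LINES = 100
MAX_CHARS = 5000


def effective_line_count(source: str) -> int:
    """Count lines that are neither blank nor comment-only (assumed rule)."""
    count = 0
    for line in source.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            count += 1
    return count


def within_size_limits(source: str) -> bool:
    """Check both limits; whitespace and newlines count toward characters."""
    return (
        effective_line_count(source) <= MAX_EFFECTIVE_LINES
        and len(source) <= MAX_CHARS
    )
```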

---

# 4. Platform Validation Gates (Publish Allowed Only After Passing)

The Platform must perform the following checks before publishing any problem.

## Gate A — Static Validation

* Measure effective lines and character count (must pass)
* Parse AST and reject disallowed imports/symbols
* Reject obviously dangerous primitives (platform-defined), typically including:

  * `open`, `eval`, `exec`, `compile`, `__import__`
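A minimal sketch of the AST portion of Gate A, using only the standard `ast` module, is shown below. The function name `static_violations` is hypothetical, and the check is deliberately shallow: it flags non-whitelisted imports and bare uses of the listed names, but it would not catch indirect access such as attribute lookups on `builtins`.

```python
import ast

ALLOWED_IMPORTS = {"sympy", "math", "fractions", "itertools"}
BANNED_NAMES = {"open", "eval", "exec", "compile", "__import__"}


def static_violations(source: str) -> list[str]:
    """Return human-readable findings; an empty list means the scan passed."""
    findings = []
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] not in ALLOWED_IMPORTS:
                    findings.append(f"line {node.lineno}: import {alias.name}")
        elif isinstance(node, ast.ImportFrom):
            root = (node.module or "").split(".")[0]
            # Relative imports (level > 0) are rejected as well.
            if node.level != 0 or root not in ALLOWED_IMPORTS:
                findings.append(f"line {node.lineno}: from {node.module} import ...")
        elif isinstance(node, ast.Name) and node.id in BANNED_NAMES:
            findings.append(f"line {node.lineno}: use of {node.id}")
    return findings
```

For example, `static_violations("import os\nx = eval('1')\n")` reports two findings, while a setter that only imports `math` yields an empty list.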

## Gate B — Sandbox Safety Validation

Execute `setter.py` in a restricted sandbox:

* block filesystem access
* block network access
* block subprocess creation
* enforce import whitelist at runtime

Any violation fails the submission.
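The runtime import whitelist could be illustrated with the in-process sketch below, which swaps the `__import__` hook seen by the setter's own code. This is not a real sandbox: blocking filesystem, network, and subprocess access would require OS-level isolation, which is outside what this example shows. The names `guarded_import` and `load_setter` are hypothetical.

```python
import builtins

ALLOWED_IMPORTS = {"sympy", "math", "fractions", "itertools"}


def guarded_import(name, globals=None, locals=None, fromlist=(), level=0):
    """Stand-in for __import__ that only admits whitelisted top-level modules."""
    if level != 0 or name.split(".")[0] not in ALLOWED_IMPORTS:
        raise ImportError(f"import of {name!r} is not whitelisted")
    return builtins.__import__(name, globals, locals, fromlist, level)


def load_setter(source: str) -> dict:
    """Execute setter source with a reduced builtins table and return its namespace."""
    safe_builtins = dict(vars(builtins))
    safe_builtins["__import__"] = guarded_import
    for banned in ("open", "eval", "exec", "compile"):
        safe_builtins.pop(banned, None)
    namespace = {"__builtins__": safe_builtins, "__name__": "setter"}
    # Only code running in this namespace sees the guarded hook; whitelisted
    # libraries keep the normal import machinery for their own internals.
    exec(compile(source, "setter.py", "exec"), namespace)
    return namespace
```

Loading a setter that does `import math` succeeds, whereas `load_setter("import os\n")` raises `ImportError`.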

## Gate C — Performance Validation

Run the setter to compute `a[0..N_check-1]`.

* Total runtime must be ≤ 1 second
* Platform may record runtime/memory metrics for later analytics
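A rough sketch of the timing measurement follows. It assumes wall-clock time via `time.perf_counter` (the spec leaves the timing method to the Platform) and a hypothetical `generate(N)` callable that returns the first `N` terms. Note that it measures after the fact and cannot interrupt a runaway setter; a hard timeout would need a separate process.

```python
import time


def timed_generation(generate, n_check: int = 200, limit_seconds: float = 1.0):
    """Run generate(n_check) once and report (terms, elapsed_seconds, passed)."""
    start = time.perf_counter()
    terms = generate(n_check)
    elapsed = time.perf_counter() - start
    return terms, elapsed, elapsed <= limit_seconds


# Usage with a trivial generator standing in for a validated setter.
terms, elapsed, passed = timed_generation(lambda N: [n * n for n in range(N)])
print(len(terms), passed)
```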

## Gate D — Determinism Validation

Run the same generation at least twice in the same environment.

* Outputs must match exactly, element by element
* Any nondeterminism fails the submission
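The repeated-run comparison can be sketched as below, again assuming a hypothetical `generate(N)` callable. Running twice inside one process is the minimum the gate describes; it would not expose nondeterminism that only appears across separate interpreter launches.

```python
import itertools


def first_divergence(generate, n_check: int = 200, runs: int = 2):
    """Return the first index where any rerun differs from the first run, else None."""
    reference = list(generate(n_check))
    for _ in range(runs - 1):
        again = list(generate(n_check))
        for index, (expected, actual) in enumerate(zip(reference, again)):
            if expected != actual:
                return index
        if len(again) != len(reference):
            # Common prefix matched but one run produced more terms.
            return min(len(again), len(reference))
    return None


# A generator with hidden state changes its output between runs.
counter = itertools.count()
flaky = lambda N: [next(counter)] + [0] * (N - 1)
print(first_divergence(flaky, 5))  # 0
```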

---

# 5. Canonicalization & Platform Commitment (Hash)

**Setters must not self-report hashes.**

The Platform produces the hash after canonicalizing the source.

## 5.1 Canonicalization Rules

The Platform canonicalizes `setter.py` into bytes by:

* forcing UTF-8 encoding
* normalizing all line endings to `\n`
* removing trailing empty lines at end-of-file
* leaving line-trailing whitespace unchanged

## 5.2 Commitment Hash

The Platform computes:

* `P_hash = SHA-256(canonical_setter_py_bytes)`

This `P_hash` is the official commitment.
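A minimal sketch of canonicalization followed by hashing, using the standard `hashlib` module, is given below. Two points are assumptions rather than stated rules: a line counts as "empty" only if it has zero characters (whitespace-only lines are kept), and the canonical form carries no final newline after the last non-empty line.

```python
import hashlib


def canonicalize(source: str) -> bytes:
    """Apply the canonicalization rules above and return UTF-8 bytes."""
    # Normalize Windows (\r\n) and old Mac (\r) line endings to \n.
    text = source.replace("\r\n", "\n").replace("\r", "\n")
    lines = text.split("\n")
    # Drop trailing empty lines; trailing whitespace inside lines is untouched.
    while lines and lines[-1] == "":
        lines.pop()
    return "\n".join(lines).encode("utf-8")


def p_hash(source: str) -> str:
    """Hex-encoded SHA-256 of the canonical bytes."""
    return hashlib.sha256(canonicalize(source)).hexdigest()


# The same code with different line endings maps to the same commitment.
unix = "def seq(n):\n    return n\n"
windows = "def seq(n):\r\n    return n\r\n\r\n"
print(p_hash(unix) == p_hash(windows))  # True
```

The same two functions are all a third party would need after reveal to recompute the hash from the published source.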

---

# 6. Publication Output (Platform → Public)

After passing Gates A–D, the Platform publishes a **public problem record**, e.g. `published.json`:

```json
{
  "problem_id": "<sha256>",
  "title": "Trial 002",
  "P_hash": "<sha256>",
  "interface": "seq",
  "N_check": 200,
  "disclosure": {
    "type": "odd_first_50",
    "values": ["<a_1>", "<a_3>", "...", "<a_99>"]
  },
  "timestamp": "2026-02-13T00:00:00Z"
}
```

`problem_id` is the stable public identifier and is required to equal `P_hash` (the commitment hash). The two fields are
kept distinct in the schema to preserve conceptual clarity, but the Platform must enforce equality.

## 6.1 Disclosure Data (No Manual Copy/Paste)

The Platform must generate disclosure values directly from the validated setter output.
For the default rule `odd_first_50`, the disclosure is:

* `[a_1, a_3, …, a_99]`

This eliminates transcription errors.
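Deriving the `odd_first_50` disclosure from the validated output is a single slice, as the sketch below shows. The function name is hypothetical, and how the values are serialized into `published.json` is left open here.

```python
def odd_first_50(terms: list[int]) -> list[int]:
    """Return [a_1, a_3, ..., a_99] from the full list of generated terms."""
    if len(terms) < 100:
        raise ValueError("need at least 100 terms to disclose a_1..a_99")
    # Start at index 1, stop before index 100, step by 2: 50 values in total.
    return terms[1:100:2]


disclosed = odd_first_50(list(range(200)))
print(len(disclosed), disclosed[0], disclosed[-1])  # 50 1 99
```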

---

# 7. Solver Submissions (Solver → Platform)

To prevent copy/paste errors, the Platform accepts **only program submissions** as official solutions.

A solver submits:

* `solver.py`
* optional `solution.json` (notes, approach)

### Default interface (recommended)

```python
def solver() -> list[int]:
```

The function must return exactly `N_check` integers (`200` by default), representing:

* `[a_0, a_1, …, a_{N_check-1}]`

Alternatively, the Platform may accept the `seq`/`gen` interfaces for solvers instead, but the accepted solver interface must be fixed for each season.
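The output-shape requirement could be checked with something like the following sketch before judging. The helper name `check_solver_output` is hypothetical, and rejecting `bool` values is an assumption about what counts as an integer.

```python
def check_solver_output(output, n_check: int = 200) -> list[int]:
    """Validate that solver() returned a list of exactly n_check integers."""
    if not isinstance(output, list):
        raise TypeError("solver() must return a list")
    if len(output) != n_check:
        raise ValueError(f"expected {n_check} terms, got {len(output)}")
    for index, value in enumerate(output):
        # bool is a subclass of int, so it is excluded explicitly (assumption).
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"term {index} is not an int")
    return output


def solver() -> list[int]:
    """Toy solver that guesses a_n = n^2."""
    return [n * n for n in range(200)]


print(len(check_solver_output(solver())))  # 200
```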

## 7.1 Solver Constraints

The Platform should enforce the same safety/determinism constraints for solver code as for setter code (at minimum: no I/O, no network, no subprocess).

---

# 8. Judging Rules

The Platform computes ground truth from the setter:

* `a_true = [a_0..a_{N_check-1}]` (that is, `a_0..a_199` with the default `N_check = 200`)

It computes the solver output:

* `a_hat = solver()`

The result is judged by exact equality.

## 8.1 Stage Pass & Reward

* **Stage Pass**: first 100 terms match exactly
  `a_hat[0:100] == a_true[0:100]`
* **Reward**: first 200 terms match exactly
  `a_hat[0:200] == a_true[0:200]`

The Platform should report:

* pass/fail
* earliest mismatch index (if any)
* expected vs. actual value at mismatch
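The comparison and report described above might be sketched as follows, assuming both lists have already been validated to contain 200 integers. The function name `judge` and the shape of the returned dictionary are illustrative choices, not a prescribed report format.

```python
def judge(a_true: list[int], a_hat: list[int]) -> dict:
    """Compare solver output with ground truth and build a small report."""
    # Earliest index where the two sequences disagree, if any.
    mismatch = next(
        (i for i, (t, h) in enumerate(zip(a_true, a_hat)) if t != h),
        None,
    )
    report = {
        "stage_pass": a_hat[0:100] == a_true[0:100],
        "reward": a_hat[0:200] == a_true[0:200],
        "mismatch_index": mismatch,
        "expected": None,
        "actual": None,
    }
    if mismatch is not None:
        report["expected"] = a_true[mismatch]
        report["actual"] = a_hat[mismatch]
    return report


truth = list(range(200))
guess = list(truth)
guess[150] = -1  # wrong only in the second half
print(judge(truth, guess))
```

In this usage example the report shows a Stage Pass without the Reward, with the earliest mismatch at index 150 (expected `150`, actual `-1`).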

---

# 9. Reveal Policy (After Judging)

Once the problem is closed (or after judging ends), the Platform reveals:

* the full `setter.py` source
* the canonicalization policy
* (optionally) runtime and determinism check logs

Anyone can verify that:

* `SHA-256(canonical(setter.py)) == P_hash`

---

# 10. Platform Transparency Requirements

For reproducibility, the Platform must publicly fix and disclose:

* Python version and relevant library versions (e.g., sympy)
* standard machine class/spec
* timing method (wall time vs CPU time)
* effective line-count rule and character-count rule
* canonicalization policy used for hashing

These must remain stable for at least a full season.

---

# 11. Non-Compliance

A setter submission is rejected (not published) if it violates any constraint or fails any gate.

A solver submission is rejected if it violates sandbox rules, fails to return properly formatted output, or exceeds resource limits.

The Platform may rate-limit or suspend repeat offenders.

---

## Appendix A — Recommended Folder Layout

Setter package:

```
trial-002/
  problem.json
  setter.py
```

Solver package:

```
solution/
  solver.py
  solution.json   (optional)
```