
📖 Consolidate v6 docs and add implementation plan#4994

Draft
justaugustus wants to merge 5 commits into ossf:main from justaugustus:v6-docs

Conversation

@justaugustus
Member

What kind of change does this PR introduce?

Documentation: consolidate Scorecard v6 documents under docs/v6/ and add a
dependency-ordered implementation plan for Phase 1.

What is the current behavior?

v6 documents are scattered across openspec/changes/osps-baseline-conformance/
and docs/. There is no implementation plan showing dependency ordering
between v6 work items.

What is the new behavior (if this is a feature change)?

All v6 documents are consolidated under docs/v6/.

The implementation plan covers Phase 1 (OSPS Baseline Level 1 conformance
evidence) with 6 steps ordered by dependency:

  • Step 0: OpenFeature feature flag infrastructure
  • Step 1: Framework abstraction (proven with existing checks)
  • Step 2: JSON output for conformance results
  • Step 3: OSPS Baseline as second framework
  • Step 4: Human review of L1 coverage analysis
  • Step 5: Complete L1 coverage (gap probes)

The plan also includes:

  • Feature promotion table for existing flagged features
  • Forge support scope (GitHub in Phase 1; GitLab and Azure DevOps deferred)
  • Codebase reuse map documenting existing infrastructure to extend
  • Recommendations pending approval (marked as such)

Cross-document links in ROADMAP.md, decisions.md, and proposal.md have been
updated to reflect the new file locations.

  • Tests for the changes have been added (for bug fixes/features)

N/A — documentation only.

Which issue(s) this PR fixes

NONE

Special notes for your reviewer

This is a follow-up to PR #4952 (merged). The proposal and decisions documents
are unchanged except for link fixes. The implementation plan (plan.md) is new
content. Several recommendations are marked "pending approval" for Steering
Committee discussion.

Does this PR introduce a user-facing change?

NONE

justaugustus and others added 5 commits March 31, 2026 19:47
Signed-off-by: Stephen Augustus <foo@auggie.dev>
Add dependency-ordered implementation plan for Scorecard v6 and fix broken
links after docs/v6/ consolidation.

Implementation steps:
- Step 0: OpenFeature infrastructure (enables all v6 work)
- Step 1: Evidence model + framework abstraction (core types)
- Step 2: Conformance engine + applicability (core evaluation)
- Step 3: Output formats, staggered (JSON → in-toto → Gemara → OSCAL)
- Step 4: L1 probe coverage + metadata ingestion (parallel with Steps 2-3)
- Step 5: Probe catalog extraction (downstream tool integration)
- Steps 6-8: Phase 2 (release integrity, attestation, evidence bundles)
- Steps 9-11: Phase 3 (enforcement detection, multi-repo, attestation GA)

Key design decisions:
- v6 is a clean, backwards-compatible successor (no parallel v5 maintenance)
- OpenFeature for granular feature gating during v5→v6 transition
- FeatureGate field on checker.Check replaces hard-coded delete list
- Feature flag wrapper at internal/featureflags/ (not public API)
- Explicit phase gates: Phase 1 must prove value before Phase 2 begins

Link fixes:
- docs/ROADMAP.md: update proposal and coverage links to docs/v6/
- docs/v6/decisions.md: update coverage link (now same directory)
- docs/v6/proposal.md: update coverage link (now same directory)

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Stephen Augustus <foo@auggie.dev>
Restructure Phase 1 to deliver complete OSPS Baseline Level 1 conformance
evidence using existing infrastructure where possible. Key changes:

**Ordering: Prove abstractions with existing code first**
- Step 0: OpenFeature with existing env vars (SCORECARD_V6, SCORECARD_EXPERIMENTAL)
- Step 1: Framework abstraction proven with existing checks before building OSPS
- Step 2: JSON output extension (use existing format, defer other formats)
- Step 3: OSPS Baseline as second framework (uses proven abstraction)
- Step 4: Complete L1 coverage (all 9 gap controls closed)

**Phase 1 success criteria:**
- Complete L1 control coverage (all 9 gap controls + existing coverage validated)
- Framework abstraction proven with checks before OSPS Baseline
- Production-ready conformance results in extended JSON
- Existing checks, probes, scores unchanged (v6 is additive)

**Key findings from investigation:**
- Probes produce findings (reusable)
- Check evaluation logic produces 0-10 scores (NOT reusable for conformance)
- Pattern is reusable: "take findings, apply rules, produce verdict"
- Don't shoehorn - checks and conformance have different semantics
- Metadata ingestion already exists via checks/fileparser/ (no new infrastructure)
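The reusable pattern noted above (take findings, apply rules, produce a verdict) can be sketched in Go. The types here are stripped-down stand-ins for `finding.Finding` and `finding.Outcome`, not the real Scorecard definitions, and the one-probe-per-control rule is a simplification:

```go
package main

import "fmt"

// Outcome is a sketch-level stand-in for finding.Outcome.
type Outcome int

const (
	OutcomeFalse Outcome = iota
	OutcomeTrue
	OutcomeNotApplicable
)

// Finding is a stripped-down stand-in for finding.Finding.
type Finding struct {
	Probe   string
	Outcome Outcome
}

// Verdict is a conformance result, deliberately distinct from a 0-10 score.
type Verdict string

const (
	Pass    Verdict = "PASS"
	Fail    Verdict = "FAIL"
	Unknown Verdict = "UNKNOWN"
)

// EvaluateControl applies the pattern: take findings, apply a rule,
// produce a verdict. requiredProbe is whichever probe maps to the control.
func EvaluateControl(findings []Finding, requiredProbe string) Verdict {
	for _, f := range findings {
		if f.Probe != requiredProbe {
			continue
		}
		switch f.Outcome {
		case OutcomeTrue:
			return Pass
		case OutcomeFalse:
			return Fail
		}
	}
	// No observation for this control: UNKNOWN, never a false PASS or FAIL.
	return Unknown
}

func main() {
	findings := []Finding{{Probe: "hasLicenseFile", Outcome: OutcomeTrue}}
	fmt.Println(EvaluateControl(findings, "hasLicenseFile"))    // PASS
	fmt.Println(EvaluateControl(findings, "hasSecurityPolicy")) // UNKNOWN
}
```

Note how the same findings feed both paths: the conformance rule is separate from check scoring, matching the "don't shoehorn" finding above.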

**Deferred to Phase 2:**
- Probe catalog extraction (wait for framework abstraction to stabilize)
- Additional output formats (in-toto, Gemara, OSCAL)
- Cron infrastructure (storage/serving cost evaluation needed)
- Level 2/3 controls, attestation, multi-repo

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Stephen Augustus <foo@auggie.dev>
Capture recommendations (pending approval) from review discussion:

Feature flag changes:
- Simplify to two flags: scorecard.experimental and scorecard.v6
- Add feature promotion table for existing flagged features (Webhooks, SBOM,
  raw format, SARIF must be promoted/migrated in v6)
- Remove per-feature granular flags (deferred until actual need arises)
- Add testing strategy recommendation (e2e runs twice: default + v6)

JSON schema:
- Add Option B recommendation: unified evaluations key instead of
  checks + conformance as parallel top-level keys
- Preserve backward compatibility via old schema as default

Control catalog:
- Replace versioned data file with importing security-baseline Go package
- Control definitions from upstream; probe mappings in Scorecard

Coverage validation:
- Add Step 3.5: human review of L1 coverage analysis before writing probes
- Gap probe estimates subject to validated coverage analysis

Forge support:
- Document GitHub (primary), GitLab (where probes work), Azure DevOps
  (deferred), local directory (file-based probes only)
- Controls unsupported on a forge produce UNKNOWN, not FAIL

Baseline levels:
- Document that L1/L2/L3 are one framework with levels, not three frameworks

Housekeeping:
- Remove checkmarks from Phase 1 Complete (not yet done)
- Move resolved questions to Resolved decisions section
- Remove probe catalog from Phase 1 (already moved to Phase 2)

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Stephen Augustus <foo@auggie.dev>
Add section documenting existing infrastructure that v6 should extend rather
than duplicate, based on comprehensive codebase review.

- Map execution pipeline integration point (conformance evaluator consumes
  Result.Findings after probes run, no parallel pipeline needed)
- Document 13 reusable components with file locations and how v6 uses each
- Identify 4 duplication risks: Framework Result interface may over-abstract,
  finding.Outcome types may already cover conformance status, applicability
  could use existing NotApplicable outcome, gap probes may overlap existing

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Stephen Augustus <foo@auggie.dev>
@codecov

codecov bot commented Apr 1, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 69.67%. Comparing base (353ed60) to head (7e8b9b0).
⚠️ Report is 337 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4994      +/-   ##
==========================================
+ Coverage   66.80%   69.67%   +2.87%     
==========================================
  Files         230      251      +21     
  Lines       16602    15654     -948     
==========================================
- Hits        11091    10907     -184     
+ Misses       4808     3873     -935     
- Partials      703      874     +171     

Comment on lines 235 to 236
3. **UNKNOWN-first honesty.** If Scorecard cannot observe a control, the
status is UNKNOWN with an explanation — never a false PASS or FAIL.
Contributor

@JamieMagee Apr 4, 2026


I like this principle a lot, and I want to flag something for when non-GitHub platforms come into scope.

Probes operate on RawResults and don't know which platform produced the data. When ListReleases() returns ErrUnsupportedFeature on ADO, the raw check swallows the error, and the probe sees "no releases." It returns NotApplicable instead of UNKNOWN. Those mean different things: "I looked and there's nothing there" vs. "I couldn't look."

Doesn't need solving in Phase 1, but it'd help to say so in the principle text. Something like: "For non-GitHub platforms, distinguishing UNKNOWN from NOT_APPLICABLE requires platform capability metadata, deferred to a later phase."

Comment on lines +27 to +31
**Forge support in Phase 1:**
- **GitHub:** Primary target (full L1 coverage)
- **GitLab:** Deferred to a future phase
- **Azure DevOps:** Deferred to a future phase
- **Local directory:** Conformance results for file-based probes only
Contributor

@JamieMagee Apr 4, 2026


Deferral makes sense. A few things that'd be easier to account for now while the abstractions are still being designed:

Framework.Evaluate(findings) (Step 1) doesn't have a way to say "this probe couldn't run because the platform doesn't support it." Adding that later means changing the interface after other things depend on it.

Also, ErrUnsupportedFeature handling in raw varies: license.go falls back to file detection, security_policy.go silently skips. Picking a canonical pattern before the conformance layer builds on top would save future headaches.

And the enriched JSON schema could include a reason field on UNKNOWN statuses from day one. Cheap now, painful later once consumers depend on the shape.

Comment on lines +167 to +168
// Evaluate takes probe findings and produces framework-specific results
Evaluate(findings []finding.Finding) (Result, error)
Contributor

@JamieMagee Apr 4, 2026


Thinking about how this plays out on ADO: how does the evaluation layer distinguish "finding is absent because the platform can't see it" from "the project just doesn't do this thing"? Right now branch_protection.go catches ErrUnsupportedFeature and the probe downstream sees empty data with no context.

Maybe a PlatformCapabilities input alongside findings, or a new outcome like OutcomeUnobservable with a reason string. Just so the UNKNOWN-first principle can work beyond GitHub when the time comes.
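The PlatformCapabilities idea could be sketched as below. All names here (`PlatformCapabilities`, `EvaluateProbe`, the probe name) are hypothetical illustrations of the suggestion, not existing Scorecard types:

```go
package main

import "fmt"

// Finding is a stripped-down stand-in for finding.Finding.
type Finding struct {
	Probe   string
	Present bool
}

// PlatformCapabilities is the hypothetical extra input: which probes this
// forge can actually support.
type PlatformCapabilities map[string]bool

type Status string

const (
	Pass         Status = "PASS"
	Fail         Status = "FAIL"
	Unobservable Status = "UNKNOWN" // the platform cannot expose the data
)

// EvaluateProbe distinguishes "I couldn't look" from "I looked and there's
// nothing there" by consulting capabilities before interpreting findings.
func EvaluateProbe(findings []Finding, probe string, caps PlatformCapabilities) Status {
	if supported, ok := caps[probe]; ok && !supported {
		return Unobservable // can't see it: never a false FAIL
	}
	for _, f := range findings {
		if f.Probe == probe && f.Present {
			return Pass
		}
	}
	return Fail // observable, and genuinely absent
}

func main() {
	caps := PlatformCapabilities{"releasesAreSigned": false} // e.g. ADO
	fmt.Println(EvaluateProbe(nil, "releasesAreSigned", caps)) // UNKNOWN
}
```

The alternative (an `OutcomeUnobservable` carried on the finding itself) pushes the same information through the existing findings channel instead of a new parameter, avoiding an interface change later.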

- Metadata ingestion layer v1 — Security Insights as first supported source (BR-03.01, BR-03.02, QA-04.01); architecture supports additional metadata sources
- Scorecard control catalog extraction — Extract Scorecard checks into an in-project control framework representation that uses the same unified framework abstraction as OSPS Baseline. This enables checks and OSPS Baseline controls to be treated uniformly within the evaluation layer.

### Phase 2: Release integrity + Level 2 core
Contributor

@JamieMagee Apr 4, 2026


For the Phase 2 release integrity work: six controls depend on "releases" (BR-02.01, BR-04.01, BR-06.01, LE-02.02, LE-03.02, QA-02.02). The concept maps cleanly to GitHub Releases, but ADO doesn't have a real equivalent. ADO teams ship through Azure Artifacts feeds, classic release pipelines, or pipeline artifacts, all of which work differently.

A probe that only understands GitHub Releases would return NotApplicable for an ADO project that ships fine through Azure Artifacts. Probably just worth a note near Phase 2: "release-related probes will need platform-specific implementations."

| Webhooks check | `SCORECARD_EXPERIMENTAL` | "remove this check when v6 is released" | Promote to always-on in Phase 1 |
| SBOM check | `SCORECARD_EXPERIMENTAL` | "remove this check when v6 is released" | Promote to always-on in Phase 1 |
| Raw output format | `SCORECARD_V6` | none | Promote to always-on in Phase 1 |
| Azure DevOps support | `SCORECARD_EXPERIMENTAL` | none | Keep behind `scorecard.experimental` |
Contributor

@JamieMagee Apr 4, 2026


More of a question: what would graduating ADO from scorecard.experimental look like?
