Deterministic Rule-Based Core · Optional LLM Review
All skill inferences are produced by explicit, deterministic, auditable rules applied to raw GitHub API activity data. The same input always produces the same output, and every score is fully traceable to the evidence behind it. An optional LLM review layer (configurable in Settings) can provide human-readable summaries and recommendations, but it does not affect scores, confidence levels, or evidence trails. We do not use opaque AI scoring.

Data Sources

GitHub REST API Only
All data is fetched exclusively from the GitHub REST API. This means:
  • No local Git repository cloning or mining is performed
  • Analysis is limited to what the GitHub API exposes (may not include all historical data)
  • Private repository access requires a properly scoped Personal Access Token
  • API rate limits apply and may affect ingestion completeness for large repositories
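Because rate limits can interrupt ingestion, a client typically inspects GitHub's standard `X-RateLimit-*` response headers before the next call. The helper below is an illustrative sketch, not part of the platform's codebase:

```python
import time

def seconds_until_reset(headers, now=None):
    """Return how long to wait before the next GitHub API call, based on
    the standard X-RateLimit-Remaining / X-RateLimit-Reset response
    headers. Returns 0.0 while calls remain in the current window.
    (Illustrative helper, not the platform's actual client code.)"""
    remaining = int(headers.get("X-RateLimit-Remaining", 1))
    if remaining > 0:
        return 0.0
    reset_at = float(headers.get("X-RateLimit-Reset", 0))  # epoch seconds
    now = time.time() if now is None else now
    return max(0.0, reset_at - now)
```

For example, with `X-RateLimit-Remaining: 0` and a reset timestamp 60 seconds in the future, the helper returns `60.0`, which the caller can pass to a sleep before retrying.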
What we ingest via GitHub API
  • Commits: SHA, author, message, timestamp, additions, deletions, files changed (via detail endpoint)
  • Pull Requests: Author, title, state, merge status, timestamps, additions, deletions, changed files (via detail endpoint)
  • Reviews: Reviewer, state (approved/changes_requested/commented), timestamp
  • Review Comments: Reviewer, file path, body, timestamp
  • Files: Path, detected language (by file extension heuristic)
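The file-extension heuristic mentioned above can be sketched as a simple suffix lookup. The extension table here is a hypothetical example, not the platform's actual mapping:

```python
# Hypothetical extension table; the platform's real mapping may differ.
EXTENSION_LANGUAGES = {
    ".py": "Python", ".ts": "TypeScript", ".tsx": "TypeScript",
    ".js": "JavaScript", ".go": "Go", ".rs": "Rust", ".java": "Java",
    ".yml": "YAML", ".yaml": "YAML", ".md": "Markdown",
}

def detect_language(path):
    """Map a file path to a language name by extension only. Anything
    unrecognized (including many config files) falls back to 'Other',
    which is exactly why config files may be miscategorized."""
    lowered = path.lower()
    for ext, lang in EXTENSION_LANGUAGES.items():
        if lowered.endswith(ext):
            return lang
    return "Other"
```

Note that extensionless files such as `Dockerfile` or `Makefile` land in the fallback bucket under this approach.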

Metrics

Metric · Description · Limitations
  • Contribution Activity · Commits and PRs per contributor per month · Volume does not indicate quality; pair programming and mob programming are not captured.
  • Code Ownership · Proportion of changes to each file by each contributor · Based on change frequency, not code criticality; refactoring skews results.
  • Code Churn · Ratio of deletions to total changes · High churn may indicate refactoring (positive) or rework (negative); context is needed.
  • Bus Factor · Minimum number of contributors needed to cover 50% of files · Based on files touched, not knowledge depth; does not account for documentation or mentoring.
  • Language Distribution · File changes grouped by detected language · Language is detected by file extension only; config files may be miscategorized.
  • Review Participation · Reviews and comments per contributor as a proportion of total PRs · Does not measure review quality, depth, or impact.
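The bus factor metric (minimum contributors to cover 50% of files) can be approximated greedily over the sets of files each contributor has touched. This is a sketch of one plausible approach, not necessarily the platform's exact algorithm:

```python
def bus_factor(files_by_contributor, coverage=0.5):
    """Greedy approximation: add contributors in descending order of
    distinct files touched until their combined files cover `coverage`
    of all files seen. Exact minimum set cover is NP-hard, so a greedy
    pass is a common practical stand-in."""
    all_files = set().union(*files_by_contributor.values()) if files_by_contributor else set()
    if not all_files:
        return 0
    target = coverage * len(all_files)
    covered = set()
    count = 0
    for _, files in sorted(files_by_contributor.items(),
                           key=lambda kv: len(kv[1]), reverse=True):
        covered |= files
        count += 1
        if len(covered) >= target:
            break
    return count
```

A repository where one contributor touched three of five files yields a bus factor of 1, which is the warning signal this metric exists to surface.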

Inference Rules

Rule · What It Infers · Key Criteria · Limitations
  • language_familiarity_v1 · Familiarity with a programming language · Repeated commits in files of that language, weighted by language share and commit share · Based on file extensions, not code quality; familiarity does not equal proficiency.
  • module_familiarity_v1 · Familiarity with a code module/directory · Repeated changes across multiple files in the same directory structure, weighted by module share · Module boundaries derived from directory structure may not match logical domains.
  • review_participation_v1 · Active participation in code review · Sustained review activity over time (minimum 2 reviews, with higher thresholds for higher confidence) · Does not distinguish rubber-stamp approvals from thorough reviews.
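The threshold structure of review_participation_v1 can be sketched as below. Only the 2-review minimum comes from the rule description; the specific cut-offs for medium and high confidence are illustrative assumptions, not the platform's real values:

```python
def review_confidence(review_count, active_months):
    """Illustrative confidence mapping for review_participation_v1.
    The rule needs at least 2 reviews to fire at all; the higher
    thresholds shown here are assumed, not documented values."""
    if review_count < 2:
        return None  # rule does not fire
    if review_count >= 20 and active_months >= 6:
        return "high"
    if review_count >= 8 and active_months >= 3:
        return "medium"
    return "low"
```

Keeping the thresholds in one deterministic function is what makes each confidence level auditable: the same inputs always map to the same level.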

Scoring & Confidence

Score (0.0 - 1.0)

Reflects volume, consistency, and proportion of observed activity. Higher scores mean more repeated, sustained activity. Scores are not proficiency ratings.
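A score built from volume, consistency, and proportion might be blended as below. The components, weights, and rounding are hypothetical, since the exact formula is not documented here; the point is only that the combination is a fixed, deterministic function:

```python
def skill_score(volume, consistency, proportion, weights=(0.4, 0.3, 0.3)):
    """Hypothetical weighted blend of three components, each already
    normalized to [0, 1]. The weights are illustrative assumptions,
    not the platform's actual values."""
    components = (volume, consistency, proportion)
    score = sum(w * c for w, c in zip(weights, components))
    return round(min(1.0, max(0.0, score)), 2)  # clamp to [0.0, 1.0]
```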

Confidence Levels
  • High: Substantial evidence from many data points over an extended period
  • Medium: Moderate evidence; more data would strengthen the inference
  • Low: Limited evidence; treat as a preliminary signal only

What This Platform Does NOT Do

Background Processing Model

How ingestion works

Ingestion runs as an in-process background task within the web server. When you trigger ingestion for a repository, the pipeline runs through three sequential stages: GitHub API data fetch, metrics computation, and skill inference.

Current guarantees and limitations
  • In-process only: Processing runs inside the web server process — there is no external job queue or background worker. If the process restarts mid-run, that run is lost.
  • Durability: Each pipeline run is tracked as a ProcessingRun record with status (running / completed / failed) and error details
  • Timeout: A 600-second per-repository timeout prevents indefinite hangs
  • No retry: Failed ingestion must be manually re-triggered; there is no automatic retry
  • No queue: Concurrent ingestion tasks run in the same process; resource contention is possible when many repositories are ingested simultaneously
  • Visibility: Pipeline run status is visible via the ingestion status API (/api/ingestion/status)
  • Partial failure: If metrics or inference fail after ingestion succeeds, the run is marked as partially failed and the granular error is recorded.
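The three-stage, in-process pipeline with a per-repository timeout and a ProcessingRun status record could be sketched as follows, assuming an asyncio-based server. The ProcessingRun fields mirror the description above, but the stage functions and exact structure are placeholders:

```python
import asyncio
from dataclasses import dataclass

@dataclass
class ProcessingRun:
    """Tracks one pipeline run, mirroring the record described above."""
    repo: str
    status: str = "running"  # running / completed / failed
    error: str = None

async def run_pipeline(run, stages, timeout=600.0):
    """Run the stages (fetch -> metrics -> inference) sequentially inside
    the server process, bounded by a per-repository timeout. There is no
    retry: a timeout or stage error marks the run failed and records the
    error, and re-triggering is left to the operator."""
    try:
        async def _all_stages():
            for stage in stages:
                await stage(run.repo)
        await asyncio.wait_for(_all_stages(), timeout=timeout)
        run.status = "completed"
    except Exception as exc:  # includes TimeoutError from wait_for
        run.status = "failed"
        run.error = f"{type(exc).__name__}: {exc}"
```

Because the task lives in the web server process, a restart mid-run simply abandons the coroutine; only the ProcessingRun record (if persisted) remains as evidence the run started.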

Principles