AI writes the code. Who reviews it? SecureReview is the first OpenEnv harness that trains and grades agents on real security review — supply chain, infrastructure-as-code, and database migrations.
# Suggested by LLM during code generation
openai==1.3.0
langchain-utils==0.5.2       # hallucinated, not on PyPI
streamlit-helpers==0.3.1     # slopsquat opportunity
torch-helpers==1.9.0         # real pkg is "torch"
chromadb-client==0.4.5       # real pkg is "chromadb"
embedding-models==2.1.0      # does not exist
vector-store==1.2.3          # does not exist
ai-toolkit==0.8.0            # generic squat target

resource "aws_security_group" "db" {
  ingress {
    cidr_blocks = ["0.0.0.0/0"]          # Postgres open to internet
  }
}

resource "aws_db_instance" "analytics" {
  username                = "admin"
  password                = "Sup3rSecret!2023"   # in TF state
  publicly_accessible     = true                 # public RDS
  storage_encrypted       = false                # unencrypted
  backup_retention_period = 0                    # no PITR
}

resource "aws_s3_bucket" "exports" {
  acl = "public-read"                    # public bucket
}

-- table: 4.2B rows · 1.4k writes/sec
-- deploy: rolling, 0s downtime budget
CREATE INDEX idx_dev_metric_time                 -- no CONCURRENTLY
  ON telemetry_records(device_id, metric, recorded_at);

ALTER TABLE telemetry_records
  SET (fillfactor = 100);                        -- kills HOT updates

CLUSTER telemetry_records                        -- AccessExclusiveLock
  USING idx_dev_metric_time;                     -- on a 4B-row table

CREATE INDEX idx_payload
  ON telemetry_records USING gin(tags);          -- no jsonb_path_ops
Typosquats, hallucinated PyPI packages, and dependencies pinned to versions with known CVEs. Supply-chain literacy.
CIS violations in Terraform / K8s — public buckets, wildcard IAM, privileged containers. Multi-file cloud reasoning.
SQL migrations against live production context — table sizes, write throughput, downstream services. Judgment, not lint.
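To make the dependency-review task concrete, here is a minimal sketch of one check a reviewer might run: resolving each pinned name against a package index. It is illustrative only and is not SecureReview's grader. The names `parse_pins`, `flag_unknown`, and the injected `exists` callback are hypothetical; in practice `exists` could query PyPI's JSON endpoint (`https://pypi.org/pypi/<name>/json`, which returns 404 for unregistered names), while the example uses a small in-memory stand-in so it runs offline.

```python
import re
from typing import Callable, Iterable

# Matches "name==version" pins; comments and blank lines do not match.
PIN = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*==\s*([^\s#]+)")


def parse_pins(lines: Iterable[str]) -> list[tuple[str, str]]:
    """Extract (name, version) pairs from requirements-style lines."""
    pins = []
    for line in lines:
        match = PIN.match(line)
        if match:
            pins.append((match.group(1).lower(), match.group(2)))
    return pins


def flag_unknown(lines: Iterable[str], exists: Callable[[str], bool]) -> list[str]:
    """Return pinned names for which the index lookup reports no package."""
    return [name for name, _ in parse_pins(lines) if not exists(name)]


# Hypothetical stand-in for a real index lookup (assumption for this example).
KNOWN = {"openai", "torch", "chromadb"}

requirements = [
    "# Suggested by LLM during code generation",
    "openai==1.3.0",
    "torch-helpers==1.9.0",
    "chromadb-client==0.4.5",
]

flagged = flag_unknown(requirements, lambda name: name in KNOWN)
print(flagged)
```

An existence check like this only catches names that are not registered. A typosquat that has already been published would pass, so a fuller review would also compare each name against known-good packages.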
# 1. start a dependency review episode
curl -X POST https://sam25kat-securereview.hf.space/reset \
  -d '{"task_id": "dependency_review"}'

# 2. mark complete to receive the F1-graded reward
curl -X POST https://sam25kat-securereview.hf.space/step \
  -d '{"action": {"action_type": "mark_complete"}}'
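The quickstart refers to an F1-graded reward. The grader itself is not shown here, so the following is only a rough sketch of how such a score could be computed, assuming reported findings and ground-truth findings are compared as sets of identifiers. The function name `f1_reward` and the finding labels are hypothetical.

```python
def f1_reward(reported: set[str], truth: set[str]) -> float:
    """Harmonic mean of precision and recall over two sets of findings."""
    true_pos = len(reported & truth)
    if true_pos == 0:
        # No correct findings (or nothing reported): score is zero.
        return 0.0
    precision = true_pos / len(reported)  # share of reported findings that are real
    recall = true_pos / len(truth)        # share of real findings that were reported
    return 2 * precision * recall / (precision + recall)


# Hypothetical episode: four seeded issues, three reported, one false positive.
truth = {"langchain-utils", "torch-helpers", "vector-store", "ai-toolkit"}
reported = {"langchain-utils", "torch-helpers", "openai"}

score = f1_reward(reported, truth)
print(round(score, 3))  # 0.571
```

Under this scheme an agent is penalized both for missing real issues and for flagging clean dependencies, so reporting everything is not a winning strategy.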