Right Marginal Data: Curation That Scales Without Collapsing
Once you can collect clean data reliably, the hard problems change. How a pipeline handles active learning, deduplication at scale, contamination, and synthetic anchoring is what separates competent pipelines from expert ones.