Curated list of Awesome Training Data! (Data Labeling, Annotation, Discovery, Workflow, etc.)
Maintained by Diffgram
Contributions welcome!
- Diffgram Training Data (Data Labeling, Annotation, Workflow) for all Data Types (Image, Video, 3D, Text, Geo, Audio, more) at scale.
- CVAT Computer Vision Annotation Tool
- DOS (dos-kernel) Trust kernel for fleets of AI agents: verifies what an agent actually did from evidence the agent cannot forge (git ancestry, test results); its reward() verdict gates which agent trajectories may enter a training set, rejecting "resolved" claims the evidence refutes.
- Cleanlab Data-centric AI library for finding label errors and data quality issues in training sets.
- Training Data for Machine Learning (Anthony Sarkis, O'Reilly, 2022)