Code to reproduce results from the paper:
CROWDLAB: Supervised learning to infer consensus labels and quality scores for data with multiple annotators
NeurIPS 2022 Human in the Loop Learning Workshop
This repository benchmarks algorithms that estimate:
This repository is intended only for scientific purposes. To apply the CROWDLAB algorithm to your own multi-annotator data, you should instead use the implementation from the official cleanlab library.
Code to benchmark methods for active learning with multiple data annotators can be found in the active_learning_benchmarks folder.
To run the model training and benchmark, you need to install the following dependencies:
pip install ./cleanlab
pip install ./crowd-kit
pip install -r requirements.txt
Note that our cleanlab/ and crowd-kit/ folders here contain forks of the cleanlab and crowd-kit libraries. These forks differ from the main libraries as follows:
cleanlab fork contains various multi-annotator algorithms studied in the benchmark (to obtain consensus labels and compute consensus and annotator quality scores) that are not present in the main library.
crowd-kit fork addresses some numeric underflow issues in the original library (needed for properly ranking examples by their quality). Instead of operating directly on probabilities, our fork does calculations on log-probabilities with the log-sum-exp trick.
To benchmark various multi-annotator algorithms using given predictions from already trained classifier models, run the following notebooks:
To generate the multi-annotator datasets and train the image classifier considered in our benchmarks, run the following notebooks:
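As background on the crowd-kit fork described above, the following is a minimal sketch of the log-sum-exp trick, not the code used in the fork. The function names `logsumexp` and `normalize_log_probs` and the example scores are hypothetical and chosen only for illustration. The sketch shows why multiplying many small probabilities can underflow to zero, which makes examples impossible to rank, while the same computation in log-space stays well defined.

```python
import numpy as np


def logsumexp(log_values: np.ndarray) -> float:
    """Compute log(sum(exp(log_values))) without underflow.

    Assumes at least one entry is finite. Subtracting the maximum
    guarantees the largest exponent is exp(0) = 1.
    """
    max_log = np.max(log_values)
    return float(max_log + np.log(np.sum(np.exp(log_values - max_log))))


def normalize_log_probs(log_joint: np.ndarray) -> np.ndarray:
    """Turn unnormalized log-scores into probabilities that sum to 1."""
    return np.exp(log_joint - logsumexp(log_joint))


# Hypothetical unnormalized log-scores for three classes, e.g. the sum of
# many per-annotator log-likelihood terms for a single example.
log_joint = np.array([-1000.0, -1001.0, -1002.0])

# Naive approach: exponentiate first. Every entry underflows to 0.0 in
# float64, so the normalizing sum is 0 and the scores cannot be compared.
naive = np.exp(log_joint)
naive_sum = naive.sum()

# Log-space approach: normalize with log-sum-exp, then exponentiate.
posterior = normalize_log_probs(log_joint)

print("naive sum:", naive_sum)
print("posterior:", posterior)
```

In practice, `scipy.special.logsumexp` offers the same operation with additional handling for edge cases such as all entries being negative infinity.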