multiannotator-benchmarks

Benchmarking methods for classification data labeled by multiple annotators

Code to reproduce results from the paper:

CROWDLAB: Supervised learning to infer consensus labels and quality scores for data with multiple annotators
NeurIPS 2022 Human in the Loop Learning Workshop

This repository benchmarks algorithms that estimate:

  1. A consensus label for each example that aggregates the individual annotations.
  2. A confidence score for the correctness of each consensus label.
  3. A rating for each annotator which estimates the overall correctness of their labels.
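To make the three estimated quantities concrete, here is a minimal sketch of a plain majority-vote baseline. It is not the CROWDLAB algorithm and not code from this repository. The function name `majority_vote_baseline`, the array layout (rows are examples, columns are annotators), and the use of `-1` to mark a missing annotation are assumptions made only for this illustration.

```python
import numpy as np

def majority_vote_baseline(labels, num_classes):
    """Hypothetical helper: majority-vote consensus with agreement-based scores.

    labels: (n_examples, n_annotators) integer array; -1 marks a missing
    annotation (an assumption made for this sketch).
    """
    labels = np.asarray(labels)
    observed = labels >= 0

    # Count the votes each class received for every example.
    counts = np.stack(
        [(labels == k).sum(axis=1) for k in range(num_classes)], axis=1
    )

    # 1. Consensus label: the most-voted class (ties go to the lowest class index).
    consensus = counts.argmax(axis=1)

    # 2. Confidence score: fraction of an example's annotators who chose the consensus.
    n_votes = observed.sum(axis=1)
    confidence = counts.max(axis=1) / np.maximum(n_votes, 1)

    # 3. Annotator rating: fraction of an annotator's labels that match the consensus.
    agrees = (labels == consensus[:, None]) & observed
    rating = agrees.sum(axis=0) / np.maximum(observed.sum(axis=0), 1)

    return consensus, confidence, rating

# Toy data: 4 examples, 3 annotators, 3 classes.
example_labels = [
    [0, 0, 1],
    [1, 1, -1],
    [2, 0, 2],
    [-1, 1, 1],
]
consensus, confidence, rating = majority_vote_baseline(example_labels, num_classes=3)
```

A baseline like this uses only the annotations themselves. Per the paper title above, CROWDLAB additionally uses supervised learning, that is, a trained classifier's predictions.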

This repository is intended only for scientific purposes. To apply the CROWDLAB algorithm to your own multi-annotator data, you should instead use the implementation from the official cleanlab library.

Code to benchmark methods for active learning with multiple data annotators can be found in the active_learning_benchmarks folder.

Install Dependencies

To run the model training and benchmark, you need to install the following dependencies:

pip install ./cleanlab
pip install ./crowd-kit
pip install -r requirements.txt

Note that our cleanlab/ and crowd-kit/ folders here contain forks of the cleanlab and crowd-kit libraries. These forks differ from the main libraries as follows:

Run Benchmarks

To benchmark various multi-annotator algorithms using predictions from already-trained classifier models, run the following notebooks:

  1. benchmark.ipynb - runs the benchmarks and saves the results to CSV
  2. benchmark_results_[…].ipynb - visualizes the benchmark results in plots

Generate Data and Train Classifier Model

To generate the multi-annotator datasets and train the image classifier considered in our benchmarks, run the following notebooks:

  1. preprocess_data.ipynb - preprocesses the dataset
  2. create_labels_df.ipynb - generates correct absolute label paths for images in preprocessed data
  3. xval_model_train.ipynb / xval_model_train_perfect_model.ipynb - trains a model and obtains predicted class probabilities for each image