Best Practices: Using MTurk for tagging training data

This week, we came across a blog post by Hernán Correa that shares thoughts and best practices for using Amazon Mechanical Turk (MTurk) to gather the training data needed to develop machine learning models.

In this blog post, Hernán shares a few thoughts and best practices that almost any Requester will find valuable. Here are some excerpts about how to design great tasks and how to review and approve submissions from Worker customers:

About Task Design

  • "Tasks should be kept as simple as possible so that every worker understands what the task is about."
  • "…include at least three examples of HITs that have already been categorized. You might also find it useful to highlight relevant information in the examples so that workers can complete the task more easily."
  • "The most important component of every HIT is its instructions. The more straightforward instructions are, the better the data you get is likely to be."

About Reviewing and Approving Work

  • "If you are planning on evaluating the results of your HITs manually (which we believe might be a good practice), you should make it clear to [MTurk Workers] so that they know what to expect. This way, if Turkers do not like the evaluation method you will use, they will be able to decide not to work on your tasks."
  • "Here's a word on ethics: if you do reject work -which is not a good idea,- do not use that data in your projects. Using data from rejected work might be really bad for both you and Turkers and it is, of course, unethical."
  • "…[MTurk Workers] whose HITs have been rejected -and therefore don't get paid- are likely to get upset -even if you have provided them with information about the evaluation criteria for the responses to your HITs. This will probably lead to their posting negative comments about you and your HITs on forums"

You can check out the entire blog post here: