Mechanical Turk

As I introduced in a previous post, Specific Text-to-Text Generation Tasks, Mechanical Turk is a website that allows researchers to post HITs, or Human Intelligence Tasks. We can post HITs in which the Turker rates the paraphrases produced by our machine translator. The Turker rates each paraphrase from 1 to 5 in two categories: meaning retention and grammaticality. A score of 5 means the paraphrase has perfect meaning retention or no grammatical errors, respectively.

In order to post HITs on Mechanical Turk, I must create a webpage that takes in the sentence pair data, lets the user rate the sentence pairs, returns the data so it can be evaluated, and works with the Mechanical Turk interface. With the help of Ryan Cotterell, we began developing the webpage. Note that $MT refers to /home/ryanc/research/aws-mturk-clt-1.3.0/.

First, we set up the experiment properties, found in $MT/hits/paraphrase_judgements/:

## External HIT Properties

title:Compression Task
description:Rate Meaning and Grammar of Paraphrases
keywords:english, paraphrasing

To set up the questions for the HITs and add the sentence pairs to the webpage remotely, we added this line to the exp.question file at $MT/hits/paraphrase_judgements/exp.question:


This allows us to import three sentence pairs onto the webpage. The sentence pairs are defined in $MT/hits/paraphrase_judgements/exp.input. For testing purposes only, we defined example sentence pairs that will not be used when running actual HITs:

url | original1 | original2 | original3 | paraphrase1 | paraphrase2 | paraphrase3
 | He kicked the bucket | Hello | Goodbye | He died. | Hola | Sayonara

The url is in the first column. Each originalN sentence is paired with the matching paraphraseN sentence; for example, original1 (“He kicked the bucket”) is paired with paraphrase1 (“He died.”).
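To make the column layout concrete, here is a minimal JavaScript sketch that reads an exp.input-style table and groups each originalN sentence with its paraphraseN partner. It assumes the file is tab-delimited with the header row first; parseInput and pairsFromRow are illustrative names, not part of the Mechanical Turk command-line tools.

```javascript
// Sketch: parse an exp.input-style table.
// ASSUMPTION: tab-delimited, header row first, one HIT per following row.
function parseInput(text) {
  const lines = text.split(/\r?\n/).filter((line) => line.trim() !== "");
  if (lines.length === 0) return [];
  const header = lines[0].split("\t");
  return lines.slice(1).map((line) => {
    const cells = line.split("\t");
    const row = {};
    header.forEach((name, i) => {
      row[name] = cells[i] ?? ""; // missing trailing cells become empty strings
    });
    return row;
  });
}

// Pair original1 with paraphrase1, original2 with paraphrase2, and so on,
// stopping at the first originalN column that does not exist.
function pairsFromRow(row) {
  const pairs = [];
  for (let n = 1; ("original" + n) in row; n++) {
    pairs.push({ original: row["original" + n], paraphrase: row["paraphrase" + n] ?? "" });
  }
  return pairs;
}
```

Run on the test row above, pairsFromRow returns three pairs, the first being “He kicked the bucket” with “He died.”. Pairing by column index rather than by position keeps the code correct if the columns are reordered in the file.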

The results can be found in the $MT/hits/paraphrase_judgements directory after Turkers evaluate the paraphrasing data.
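Once results come back, the two ratings for each pair can be averaged across Turkers. The sketch below is a hypothetical helper, not part of the Mechanical Turk tools, and rests on assumptions about the results file: it is tab-delimited with a header row, values may be wrapped in double quotes, and each rating sits in a column named after the form field with an Answer. prefix (for example Answer.is_meaningful1) holding the radio value qual1 through qual5. Check those names against a real results file before relying on it.

```javascript
// Sketch: average the 1-5 ratings per sentence pair.
// ASSUMPTIONS: tab-delimited with a header row; cells may be double-quoted;
// rating columns are "Answer.is_meaningful<N>" / "Answer.is_grammatical<N>" with values "qual1".."qual5".
function unquote(cell) {
  return cell.replace(/^"(.*)"$/, "$1");
}

// "qual4" -> 4; anything else (blank, unexpected value) -> null
function qualToScore(value) {
  const match = /^qual([1-5])$/.exec(value);
  return match ? Number(match[1]) : null;
}

function averageRatings(text, pairCount) {
  const lines = text.split(/\r?\n/).filter((line) => line.trim() !== "");
  if (lines.length === 0) return [];
  const header = lines[0].split("\t").map(unquote);
  const rows = lines.slice(1).map((line) => line.split("\t").map(unquote));
  const averages = [];
  for (let n = 1; n <= pairCount; n++) {
    const result = { pair: n };
    for (const category of ["is_meaningful", "is_grammatical"]) {
      const col = header.indexOf("Answer." + category + n);
      const scores = rows
        .map((cells) => (col === -1 ? null : qualToScore(cells[col] ?? "")))
        .filter((score) => score !== null);
      // null when no Turker gave a usable rating for this pair and category
      result[category] = scores.length > 0
        ? scores.reduce((sum, score) => sum + score, 0) / scores.length
        : null;
    }
    averages.push(result);
  }
  return averages;
}
```

Blank or malformed answers are skipped rather than counted as zero, so a single unanswered question does not drag a pair's average down.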

This HIT appears only in the sandbox, as defined in the script $MT/bin/. This ensures that no Turkers attempt this HIT until we are ready to post actual data.

Regarding the HTML code, it is important to make sure the webpage integrates with Mechanical Turk. We must include code to record Turker information, including each Turker's user ID and how many HITs they have worked on. This way we can evaluate whether Turkers should receive the reward for their HITs (usually on the order of $0.10 per HIT). It is also important to include a submit button that interacts with Mechanical Turk:

<form id="mturk_form" method="POST" action="" accept-charset="UTF-8">

<input type="submit"/>
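For an external HIT, Mechanical Turk loads the page with assignmentId, hitId, turkSubmitTo, and (once the HIT is accepted) workerId appended to its URL; while a worker is only previewing the HIT, assignmentId is set to ASSIGNMENT_ID_NOT_AVAILABLE. The answers are POSTed to turkSubmitTo followed by /mturk/externalSubmit, and the submitted form must include an assignmentId field. The minimal sketch below reads those parameters so the page can record the worker ID, fill in the form's action, and disable the submit button during preview. The function name mturkContext and the hidden #assignmentId input in the comments are hypothetical.

```javascript
// Sketch: read the parameters Mechanical Turk appends to an external HIT's URL.
// URL and URLSearchParams are standard in browsers and in Node.
function mturkContext(href) {
  const params = new URL(href).searchParams;
  const assignmentId = params.get("assignmentId") || "";
  const turkSubmitTo = params.get("turkSubmitTo") || "";
  return {
    assignmentId,
    hitId: params.get("hitId") || "",
    workerId: params.get("workerId") || "", // absent while the HIT is only being previewed
    // Preview mode: MTurk sends this placeholder instead of a real assignment ID
    isPreview: assignmentId === "" || assignmentId === "ASSIGNMENT_ID_NOT_AVAILABLE",
    // Answers are POSTed to <turkSubmitTo>/mturk/externalSubmit
    submitAction: turkSubmitTo ? turkSubmitTo.replace(/\/$/, "") + "/mturk/externalSubmit" : ""
  };
}

// Browser wiring (assumes jQuery and a hypothetical hidden input inside the form:
// <input type="hidden" name="assignmentId" id="assignmentId" value="">):
// const ctx = mturkContext(window.location.href);
// $("#mturk_form").attr("action", ctx.submitAction);
// $("#assignmentId").val(ctx.assignmentId);
// $("#mturk_form input[type=submit]").prop("disabled", ctx.isPreview);
```

Deriving the action from turkSubmitTo means the same page works unchanged in the sandbox and on the live site.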

The rest of the setup adds the text areas and radio buttons for each paraphrase pair. To set up the first sentence pair:

<div name="stimulus1" id="stimulus1">
<p>Paraphrase Exemplar</p>
<textarea readonly rows="1" cols="70" name="original1" id="original1" style="font-size: 16px"></textarea>
<textarea readonly rows="1" cols="70" name="paraphrase1" id="paraphrase1" style="font-size: 16px"></textarea>

<p>Please rate the <b><u>preservation of meaning</u></b> of the compression pair on a scale
from 1 to 5:</p>
<input type="radio" name="is_meaningful1" value="qual5" /> 5
<input type="radio" name="is_meaningful1" value="qual4" /> 4
<input type="radio" name="is_meaningful1" value="qual3" /> 3
<input type="radio" name="is_meaningful1" value="qual2" /> 2
<input type="radio" name="is_meaningful1" value="qual1" /> 1
<p>Please rate the <b><u>grammaticality</u></b> of the compression pair on a scale
from 1 to 5:</p>
<input type="radio" name="is_grammatical1" value="qual5" /> 5
<input type="radio" name="is_grammatical1" value="qual4" /> 4
<input type="radio" name="is_grammatical1" value="qual3" /> 3
<input type="radio" name="is_grammatical1" value="qual2" /> 2
<input type="radio" name="is_grammatical1" value="qual1" /> 1
</div>

The code above creates the two text areas for the sentence pair and two groups of radio buttons, each running from 5 down to 1: one for meaning retention and one for grammaticality. The sentences are then filled in from the page's URL parameters:

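// Fill the paraphrase1 text area with the "paraphrase1" URL parameter (requires jQuery)
$("#paraphrase1").text(gup("paraphrase1"));

// gup ("get URL parameter"): return the value of paramname from the page URL, or "" if absent
function gup(paramname) {
  var regexS = "[\\?&]" + paramname + "=([^&#]*)";
  var regex = new RegExp(regexS);
  var tmpURL = decodeURI(window.location.href); // decodeURI to preserve UTF
  var results = regex.exec(tmpURL);
  if (results == null)
    return "";
  return results[1];
}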
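gup() decodes the whole URL with decodeURI, which restores UTF-8 text but, by design, leaves escapes for reserved characters such as %2C (a comma) encoded and does not turn + into a space. The sketch below repeats gup's logic as gupFrom, taking the URL as an argument so it can run outside the browser, next to an alternative, getParam, built on the standard URLSearchParams API. Both names are illustrative; which one fits depends on how the sentences are encoded when they are placed in the HIT URL.

```javascript
// Same logic as gup(), but the URL is passed in instead of read from window.location.
function gupFrom(href, paramname) {
  const regex = new RegExp("[\\?&]" + paramname + "=([^&#]*)");
  // decodeURI turns %C3%A9 into "é" but keeps reserved escapes like %2C, and keeps "+"
  const results = regex.exec(decodeURI(href));
  return results === null ? "" : results[1];
}

// Alternative: URLSearchParams fully decodes each value, including %2C and "+".
function getParam(href, paramname) {
  return new URL(href).searchParams.get(paramname) ?? "";
}
```

For a sentence such as “Hola, amigo” sent as Hola%2C%20amigo, gupFrom yields Hola%2C amigo while getParam yields Hola, amigo. decodeURI also throws a URIError on a malformed escape, whereas URLSearchParams tolerates it.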

As of right now, the code for each paraphrase pair is duplicated on every HIT, and I hope to figure out how to write a loop that goes through the sentence pairs and adds each one to its own set of text areas. I also have yet to add instructions to the HITs or to record the worker information successfully. Another consideration is making the webpage as user-friendly as possible, to minimize user errors when filling out the HITs.
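One way to remove the duplicated per-pair markup is to generate it in a loop. The plain JavaScript sketch below rebuilds the stimulus block for pair N with the same ids and names as the hand-written block, and adds a check that lists any rating left blank so the page can stop the submission and tell the Turker what is missing. stimulusHtml, radioGroup, missingRatings, and the #stimuli container are hypothetical names; the browser wiring in the comments assumes jQuery and the gup() function shown earlier are loaded.

```javascript
// Rating categories, using the same field-name prefixes as the hand-written block
const CATEGORIES = [
  { name: "is_meaningful", label: "preservation of meaning" },
  { name: "is_grammatical", label: "grammaticality" }
];

// Five radio buttons, 5 down to 1, with values qual5..qual1
function radioGroup(name) {
  let html = "";
  for (let score = 5; score >= 1; score--) {
    html += `<input type="radio" name="${name}" value="qual${score}" /> ${score}\n`;
  }
  return html;
}

// Markup for sentence pair n (sentences are filled in later with .text(), not interpolated here)
function stimulusHtml(n) {
  let html = `<div name="stimulus${n}" id="stimulus${n}">\n<p>Paraphrase Exemplar</p>\n`;
  for (const field of ["original", "paraphrase"]) {
    html += `<textarea readonly rows="1" cols="70" name="${field}${n}" id="${field}${n}" style="font-size: 16px"></textarea>\n`;
  }
  for (const category of CATEGORIES) {
    html += `<p>Please rate the <b><u>${category.label}</u></b> of the compression pair on a scale from 1 to 5:</p>\n`;
    html += radioGroup(category.name + n);
  }
  return html + "</div>\n";
}

// Names of rating groups with no answer; answers maps field name -> value
function missingRatings(answers, pairCount) {
  const missing = [];
  for (let n = 1; n <= pairCount; n++) {
    for (const category of CATEGORIES) {
      if (!answers[category.name + n]) missing.push(category.name + n);
    }
  }
  return missing;
}

// Browser wiring (assumes jQuery, gup(), and a hypothetical empty <div id="stimuli"> inside the form):
// for (let n = 1; n <= 3; n++) {
//   $("#stimuli").append(stimulusHtml(n));
//   $("#original" + n).text(gup("original" + n));
//   $("#paraphrase" + n).text(gup("paraphrase" + n));
// }
// $("#mturk_form").on("submit", function (event) {
//   const answers = {};
//   $(this).serializeArray().forEach((field) => { answers[field.name] = field.value; });
//   const missing = missingRatings(answers, 3);
//   if (missing.length > 0) { event.preventDefault(); alert("Please rate: " + missing.join(", ")); }
// });
```

Because unchecked radio buttons are not included in a form's submitted fields, an empty entry in answers is exactly the signal that a rating was skipped.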

Aside: make sure to add the following to the header of the page:

<script type="text/javascript"
<script src="js/jquery.cookie.js"></script>

To develop the webpage more quickly, I plan on adapting Courtney Napoles’s Mechanical Turk webpage, which closely resembles the type of HITs we are interested in posting.

