python amazon-web-services boto3 mechanicalturk

Error message when submitting HIT to Amazon Mechanical Turk


I have a problem submitting a HIT to the Amazon Mechanical Turk sandbox.

I'm using the following code to submit a HIT:

external_content = """"
<ExternalQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd">
  <ExternalURL>https://MY_HOST_GOES_HERE/</ExternalURL>
  <FrameHeight>400</FrameHeight>
</ExternalQuestion>
"""

import boto3

import os

region_name = 'us-east-1'

aws_access_key_id = 'MYKEY'
aws_secret_access_key = 'MYSECRETKEY'

endpoint_url = 'https://mturk-requester-sandbox.us-east-1.amazonaws.com'

# Uncomment this line to use in production
# endpoint_url = 'https://mturk-requester.us-east-1.amazonaws.com'

client = boto3.client('mturk',
                      endpoint_url=endpoint_url,
                      region_name=region_name,
                      aws_access_key_id=aws_access_key_id,
                      aws_secret_access_key=aws_secret_access_key,
                      )

# This will return $10,000.00 in the MTurk Developer Sandbox
print(client.get_account_balance()['AvailableBalance'])


response = client.create_hit(Question=external_content,
                             LifetimeInSeconds=60 * 60 * 24,
                             Title="Answer a simple question",
                             Description="Help research a topic",
                             Keywords="question, answer, research",
                             AssignmentDurationInSeconds=120,
                             Reward='0.05')

# The response includes several helpful fields
hit_group_id = response['HIT']['HITGroupId']
hit_id = response['HIT']['HITId']

# Let's construct a URL to access the HIT
sb_path = "https://workersandbox.mturk.com/mturk/preview?groupId={}"
hit_url = sb_path.format(hit_group_id)

print(hit_url)

The error message I get is:

botocore.exceptions.ClientError: An error occurred (ParameterValidationError) when calling the CreateHIT operation: There was an error parsing the XML question or answer data in your request.  Please make sure the data is well-formed and validates against the appropriate schema. Details: Content is not allowed in prolog. (1493572622889 s)

What could be causing this? As far as I can tell, the XML fully validates against the schema hosted on Amazon's servers.

The HTML returned by the external host is:

<!DOCTYPE html>
<html>
<head>
<meta http-equiv='Content-Type' content='text/html; charset=UTF-8'/>
<script src='https://s3.amazonaws.com/mturk-public/externalHIT_v1.js' type='text/javascript'></script>
</head>
<body>
<!-- HTML to handle creating the HIT form -->
<form name='mturk_form' method='post' id='mturk_form' action='https://workersandbox.mturk.com/mturk/externalSubmit'>
<input type='hidden' value='' name='assignmentId' id='assignmentId'/>
<!-- This is where you define your question(s) --> 
<h1>Please name the company that created the iPhone</h1>
<p><textarea name='answer' rows=3 cols=80></textarea></p>
<!-- HTML to handle submitting the HIT -->
<p><input type='submit' id='submitButton' value='Submit' /></p></form>
<script language='Javascript'>turkSetAssignmentID();</script>
</body>
</html>

Thank you


Solution

  • The clue is in this part of the message: "Details: Content is not allowed in prolog." In XML, the prolog is everything that comes before the root element (an optional XML declaration, a DOCTYPE, comments, processing instructions and whitespace). The parser is telling you it found ordinary text there, where none is allowed. This usually happens when a junk character (think smart quotes or a non-printable ASCII value) sneaks in before the first tag, and these can be a real pain in the butt to diagnose.

    In your case, it's a little easier to debug but still just as frustrating. Check out this line:

    external_content = """"
    

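    It turns out that Python needs only three quotes (""") to open a multi-line (triple-quoted) string. Your fourth " therefore became the first character of the string itself, so MTurk received a stray " in front of <ExternalQuestion>, right where the parser allows only prolog content. Change that line to this: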

    external_content = """
    

    And you're golden. I just tested it and it works. Sorry for all the frustration, but hopefully this unblocks you. Happy Sunday!
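To see the stray character for yourself, you can compare the two literals side by side. This is a minimal sketch: the XML body is shortened to an empty <ExternalQuestion/> placeholder rather than the full question, since only the opening quotes matter here.

```python
# Four quotes: the first three open the triple-quoted string,
# and the fourth becomes the string's first character.
buggy = """"
<ExternalQuestion/>
"""

# Three quotes: the string starts with a newline, then the XML.
fixed = """
<ExternalQuestion/>
"""

# repr() makes the leading characters visible.
print(repr(buggy[:3]))  # '"\n<'
print(repr(fixed[:2]))  # '\n<'
```

Printing repr() of the first few characters of external_content before sending it is also a quick way to spot smart quotes or non-printable characters, which are otherwise invisible in a normal print().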
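If you want to catch this class of mistake before the request reaches MTurk, one option is to parse the Question string locally first. The sketch below assumes a plain well-formedness check with the standard library's xml.etree.ElementTree is enough for your purposes; check_question_xml is a hypothetical helper name (not part of boto3), and it does not validate against the ExternalQuestion XSD.

```python
import xml.etree.ElementTree as ET

# Namespace copied from the ExternalQuestion element in the question.
EXTERNAL_QUESTION_NS = (
    "http://mechanicalturk.amazonaws.com/"
    "AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd"
)


def check_question_xml(question: str) -> ET.Element:
    """Hypothetical pre-flight helper (not part of boto3).

    Raises ValueError if `question` is not well-formed XML (for example,
    because of stray text before the root element) or if its root is not
    an ExternalQuestion element. Does NOT validate against the XSD.
    """
    try:
        root = ET.fromstring(question)
    except ET.ParseError as exc:
        # Expat reports the line and column of the offending character.
        raise ValueError(f"Question XML is not well-formed: {exc}") from exc
    expected_tag = f"{{{EXTERNAL_QUESTION_NS}}}ExternalQuestion"
    if root.tag != expected_tag:
        raise ValueError(f"Unexpected root element: {root.tag}")
    return root


if __name__ == "__main__":
    good = """
<ExternalQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd">
  <ExternalURL>https://MY_HOST_GOES_HERE/</ExternalURL>
  <FrameHeight>400</FrameHeight>
</ExternalQuestion>
"""
    check_question_xml(good)  # passes silently
    try:
        check_question_xml('"' + good)  # same stray quote as in the question
    except ValueError as err:
        print(err)
```

Calling something like this on external_content just before client.create_hit(...) would have flagged the stray quote locally. Because it only checks well-formedness and the root tag, MTurk can still reject XML that is well-formed but does not match the schema.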