For the past couple of days, I have focused on specific text-to-text generation tasks, including legalese to plain English translations and prose to poetry translations.
Legalese is the style of writing used in legal documents, and it can be very difficult for the general public to understand. Signed in 2010, the Plain Writing Act requires that Federal agencies promote clear communication so that the public can understand new laws and regulations that may affect them. Other plain-language movements have pushed in the same direction, for example by encouraging lawyers to use plain English when they address a jury. An obvious market exists for translation tools that can effectively compress and simplify legalese.
As stated in my previous post, I have been collecting translations from legalese to plain English. I now have more than 200 passage translations, along with approximately 700 key phrase translations. The key phrases include English-to-English rewordings, such as replacing “concerning the matter of” with the more colloquial “about,” and translations of Latin phrases that are common in legalese, such as rendering “corpus delicti” as “material evidence.”
A major challenge of this translation task is that sentence boundaries do not line up: when legalese is translated into plain English, sentences are often combined and sometimes deleted. An accurate translation will therefore need to paraphrase across sentence boundaries rather than one sentence at a time. Another challenge is the visual layout of the translation: if the legalese contains a list of items or notes, the plain English version restructures the list so that each item gets its own line. For example, the legalese statement, “Federal employees are required to participate in this program if they are involved in the direct care of animals or their living quarters or have direct contact with animals (live or dead), their viable tissues, body fluids or waste” will be formatted as
“Federal employees must participate in this program if they:
- Take care of animals;
- Take care of animal living quarters; or
- Have other direct contact with:
  - Live or dead animals;
  - Viable animal tissues; or
  - Animal body fluids or waste.”
I think a good place to start with this translation task is paraphrasing single sentences, specifically by using the key phrase translations I have collected.
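As a rough illustration of that starting point, the sketch below applies a key phrase table to a single sentence. It is only a sketch under stated assumptions: the names `PHRASE_TABLE`, `build_pattern`, and `simplify` are hypothetical, the table holds just the two example phrases quoted above rather than the full collection, and plain string substitution is a stand-in for whatever the actual paraphrasing system ends up doing.

```python
import re

# Hypothetical phrase table: only the two examples quoted in this post.
PHRASE_TABLE = {
    "concerning the matter of": "about",
    "corpus delicti": "material evidence",
}


def build_pattern(table):
    """Compile one regex that matches any key phrase as a whole-word span."""
    # Try longer phrases first so they win over shorter, overlapping ones.
    phrases = sorted(table, key=len, reverse=True)
    alternation = "|".join(re.escape(p) for p in phrases)
    return re.compile(r"\b(" + alternation + r")\b", re.IGNORECASE)


def simplify(sentence, table=PHRASE_TABLE):
    """Replace every known legalese phrase in a single sentence.

    Matching ignores case; the replacement is inserted exactly as stored,
    so capitalization at the start of a sentence is not restored.
    """
    pattern = build_pattern(table)
    return pattern.sub(lambda m: table[m.group(0).lower()], sentence)


if __name__ == "__main__":
    print(simplify("The memo was written concerning the matter of the corpus delicti."))
```

A lookup like this cannot merge or drop sentences, so it only covers the single-sentence case and leaves the cross-sentence challenge described earlier untouched.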
Another interesting topic is the translation from prose to poetry. Writing poetry is viewed as a human intelligence task, one that artificial intelligence could not possibly perform successfully. In fact, I have not been able to find any research papers discussing this translation task. The closest topics that I have found are translating Shakespeare to plain English (as discussed in the following research paper) and foreign language translation of poetry (discussed in this research paper). The latter addresses some of the difficulties to expect when dealing with poetic rhythm and rhyme schemes.
Because prose-to-poetry translations are scarce on the internet, I have been unable to collect any passage translations. To go forward with this translation task, it may be necessary to post Human Intelligence Tasks (HITs) on Amazon Mechanical Turk. The problem is that this would require “Turkers” to write poetry, and every Turker would produce a different output. My suggestion is to collect input in the form of paraphrases and synonyms instead of sentence pairs. One reason is that collecting prose-to-poetry sentence pairs seems challenging, if not impossible. Another is that this type of translation will depend more on rhyming and syllable recognition, which will require a collection of synonyms and paraphrases, along with a collection of words or phrases that rhyme and convey the same meaning.
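To make the rhyming and syllable idea concrete, here is a small sketch of how a synonym could be chosen to rhyme with a target word. Everything in it is an assumption for illustration: the function names are hypothetical, the synonym list is made up, and both the syllable count and the rhyme test are crude spelling heuristics. A pronunciation dictionary would likely be far more reliable than spelling for either job.

```python
import re


def count_syllables(word):
    """Rough estimate: count vowel groups and discount a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # "made" has two vowel groups but one syllable; "table" keeps both.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def rhyme_key(word):
    """Spelling-based key: the last vowel group plus any trailing consonants."""
    word = word.lower()
    match = re.search(r"[aeiouy]+[^aeiouy]*$", word)
    return match.group(0) if match else word


def rhymes(a, b):
    """Treat two different words as rhyming when their spelling keys match.

    This misses rhymes spelled differently and mishandles silent-e words.
    """
    return a.lower() != b.lower() and rhyme_key(a) == rhyme_key(b)


def pick_rhyming_synonym(synonyms, target):
    """Return the first synonym that rhymes with target, or None if none do."""
    for candidate in synonyms:
        if rhymes(candidate, target):
            return candidate
    return None


if __name__ == "__main__":
    # Hypothetical synonym set for "evening", rhymed against "light".
    print(pick_rhyming_synonym(["dusk", "sundown", "night"], "light"))
    print(count_syllables("sundown"))
```

The same lookup could be filtered by `count_syllables` to keep a line within a target meter, which is where a collection of paraphrases of different lengths would become useful.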
In addition to researching legalese-to-plain English and prose-to-poetry translations, I have also spent more time looking at the Java code and the pipeline of the Joshua Decoder to learn more about how the decoder works. I plan on attending a tutorial on the Joshua Decoder tomorrow to further my understanding.