I am still focusing on rhyming the final words in a haiku. I changed how synonym accuracy affects the final value: instead of adding p(e|f) and p(f|e), I now multiply them together so that accuracy plays a larger role in the final value. To compare differences between the original algorithm […]
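To see why multiplying changes the ranking, here is a minimal sketch. The two candidate synonyms and their probabilities are made up for illustration, and the function names are hypothetical; the real score may combine these terms with other factors that the excerpt does not show.

```python
def additive_score(p_e_given_f, p_f_given_e):
    """Original combination: the two translation probabilities are summed."""
    return p_e_given_f + p_f_given_e


def multiplicative_score(p_e_given_f, p_f_given_e):
    """Revised combination: the two translation probabilities are multiplied."""
    return p_e_given_f * p_f_given_e


# Hypothetical candidates: (p(e|f), p(f|e)). Not real data.
candidates = {
    "balanced": (0.5, 0.5),  # moderately likely in both directions
    "lopsided": (0.9, 0.1),  # likely in one direction only
}

for name, (p_ef, p_fe) in candidates.items():
    print(name, additive_score(p_ef, p_fe), multiplicative_score(p_ef, p_fe))
```

With these numbers the sum cannot tell the two candidates apart, while the product ranks the balanced candidate higher, because one weak direction drags the whole score down.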

This week, I began delving deeper into my prose-to-poetry translation task. Juri told me to focus just on rhyming the last word of already-written haikus. Hopefully, from here we can expand to paraphrasing the lines and finally to breaking up English prose sentences into poetic lines to create the best possible rhymed […]

Support Vector Machines (SVMs) are supervised learning models used for classification. Classification is the problem of determining which class an unseen observation belongs to by using a trained classifier. There is a good article on the theory behind SVMs that can be accessed here: SVM. I am going to use a classifier […]
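As a rough illustration of "train a classifier, then label an unseen observation", here is a minimal sketch of a linear SVM trained by stochastic subgradient descent on the hinge loss. It is a simplification, not the classifier used in this project: it has no bias term (so the boundary passes through the origin), the toy points are made up, and the function names and hyperparameters are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(points, labels, epochs=50, lr=0.1, reg=0.01):
    """Subgradient descent on the L2-regularized hinge loss (no bias term)."""
    w = [0.0] * len(points[0])
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * dot(w, x)
            # The regularizer shrinks the weights on every step.
            w = [wi * (1 - lr * reg) for wi in w]
            # Points inside the margin also pull the weights toward their class.
            if margin < 1:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 or -1 depending on which side of the boundary x falls."""
    return 1 if dot(w, x) >= 0 else -1


# Made-up, linearly separable training data with labels +1 and -1.
points = [(2, 2), (3, 1), (-2, -2), (-3, -1)]
labels = [1, 1, -1, -1]

w = train_linear_svm(points, labels)
print(predict(w, (4, 3)), predict(w, (-1, -4)))  # two unseen observations
```

A real experiment would more likely use an existing SVM library with a bias term and kernel options; the sketch only shows the shape of the train-then-classify loop.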

Ann Irvine asked me to help with her project distinguishing scientific bird names from common bird names. Ann is trying to classify different bird names based on whether the common name is a direct translation of the scientific name. A direct translation relies mostly on Latin and Greek roots, usually with two different roots combined to […]

I had difficulty running the legalese from the document I translated through the parser. The other 5 text files I have sent through the parser took approximately 100 seconds to finish. This text file (sentences_2.legal) took more than one day to finish running, and the line count of the corpus.legal.Parsed file was 20 instead of […]

I have been trying to find good ways to distinguish legalese from plain English in code. I need to come up with rules to teach the decoder so that, when presented with unseen phrases or challenging sentences, it will have a better success rate at translating into plain English. The easier ones […]

I have compiled 411 different legalese-to-plain-English sentence pairs. I found 285 sentence pairs on the internet; sources include plainlanguage.gov and the Michigan Bar Journal. Out of the 285, I am only taking the 131 sentence pairs where the plain English structurally resembles the legalese. I have disregarded the other 154 pairs that I found […]

Dr. Callison-Burch has asked me to perform Word Alignment HITs on Mechanical Turk. These HITs are only available to researchers working with him because of the difficulty of the task. Word alignment, as defined by Wikipedia, is the natural language processing task of identifying translation relationships among the words (or more rarely multiword units) in a bitext, resulting […]
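One common way to picture a word alignment is as a set of index pairs linking positions in the two sentences. The sketch below assumes that representation; the sentence pair, the links, and the helper name are made up for illustration and do not come from the actual HITs.

```python
def aligned_pairs(source, target, links):
    """Turn (source_index, target_index) links into readable word pairs."""
    return [(source[i], target[j]) for i, j in sorted(links)]


# Hypothetical English-Spanish sentence pair.
english = ["the", "green", "house"]
spanish = ["la", "casa", "verde"]

# Each link says: English word i translates to Spanish word j.
# Note the crossing links, since the adjective and noun swap order.
links = {(0, 0), (1, 2), (2, 1)}

for en_word, es_word in aligned_pairs(english, spanish, links):
    print(en_word, "<->", es_word)
```

Using a set of pairs rather than a one-to-one mapping leaves room for a word to align to several words, or to none at all.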

As I introduced in a previous post, Specific Text-to-Text Generation Tasks, Mechanical Turk is a website that allows researchers to post HITs, or Human Intelligence Tasks. We can post HITs where the Turker must rank the paraphrases produced by our machine translator. The Turker will rank the paraphrases from 1 to 5 based on two […]

I am now beginning to get into paraphrasing tasks, moving away from the two translation experiments that I ran and analyzed (the very small English-to-Spanish translation experiment and the larger English-to-Spanish translation during the Joshua tutorial). Juri gave me a large amount of paraphrasing input data that he asked me to […]