Monthly Archives: July 2013

Rhyming Poetry with Haikus

This week, I began delving deeper into my prose-to-poetry translation task. Juri told me to focus just on rhyming the last word of already-written haikus. Hopefully, from here we can expand to paraphrasing the lines and finally to breaking up English prose sentences into poetic lines to create the best possible rhymed […]

Support Vector Machines

Support Vector Machines (SVMs) are supervised learning models used for classification. Classification is the problem of determining which category an unseen observation belongs to, using a trained classifier. There is a good article on the theory behind SVMs that can be accessed here: SVM. I am going to use a classifier […]
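To make the idea of classifying an unseen observation concrete, here is a minimal sketch of a linear SVM trained by subgradient descent on the hinge loss. It is only an illustration and not the classifier used in the post: the toy points, the labels, and the names `train_linear_svm` and `predict` are all made up for this example, and the learning rate, regularization strength, and epoch count are arbitrary choices.

```python
# Toy 2-D training data (hypothetical): label +1 for one class, -1 for the other.
POINTS = [(2.0, 2.0), (3.0, 3.0), (2.0, 3.0),
          (-2.0, -2.0), (-3.0, -3.0), (-2.0, -3.0)]
LABELS = [1, 1, 1, -1, -1, -1]


def train_linear_svm(points, labels, lr=0.1, lam=0.01, epochs=100):
    """Fit weights w and bias b by subgradient descent on the
    L2-regularized hinge loss. Hyperparameters are illustrative only."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Point is inside the margin or misclassified:
                # shrink w (regularizer) and push it toward y * x.
                w = [wi - lr * (lam * wi - y * xi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Point is safely classified: only the regularizer acts.
                w = [wi - lr * lam * wi for wi in w]
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


w, b = train_linear_svm(POINTS, LABELS)
print(predict(w, b, (4.0, 4.0)))    # an unseen point near the +1 cluster
print(predict(w, b, (-4.0, -4.0)))  # an unseen point near the -1 cluster
```

In practice, a library implementation would normally be used instead of a hand-written training loop. The sketch is only meant to show what "trained classifier" and "unseen observation" refer to.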

Bird Annotations

Ann Irvine asked me to help with her project distinguishing scientific bird names from common bird names. Ann is trying to classify different bird names based on whether the common name is a direct translation of the scientific name. A direct translation relies mostly on Latin and Greek roots, usually with two different roots combined to […]

Parsed Data Analysis II

I had difficulty running the legalese from the document I translated through the parser. The other five text files I sent through the parser took approximately 100 seconds to finish. This text file (sentences_2.legal) took more than one day to finish running, and the line count of the corpus.legal.Parsed file was 20 instead of […]

Parsed Data Analysis

I have been trying to determine good ways to distinguish legalese from plain English in code. I need to come up with rules to teach the decoder so that, when presented with unseen phrases or challenging sentences, it will have a better success rate translating them into plain English. The easier ones […]

Legalese to Plain English Translation

I have compiled 411 different legalese-to-plain-English sentence pairs. I found 285 sentence pairs on the internet; sources include plainlanguage.gov and the Michigan Bar Journal. Out of the 285, I am only taking the 131 sentence pairs where the plain English structurally resembles the legalese. I have disregarded the other 154 pairs that I found […]