KINGSTON, R.I., May 1, 2017 - Timothy Leonard of East Greenwich took home the Liberty Mutual award for best poster at this year’s New England Statistics Symposium (NESS), hosted at the University of Connecticut on April 22. He developed the work under the supervision of statistics professors and through courses such as Statistical Analysis of Network Data and Independent Study in Statistics.
The winning poster was chosen from a pool of submissions from some of the most prestigious universities in New England. A URI student has won the best poster award in each of the past three years. In 2015, Daven Amin won the award, advised by Professor Natallia Katenka. In 2016, Anton Lobach received the award, advised by Professor Gavino Puggioni. Professor Katenka also advised this year’s winner.
The main goal of NESS is to bring together statisticians from all over New England to share their research in statistics and related fields, discuss emerging issues, and network with colleagues.
NESS activities are usually spread over two days and include a series of short courses, parallel research presentations, keynote talks, and student poster sessions.
For his first-place poster, Leonard developed a statistical graph model that measures the tendency for objects of the same type to occur sequentially in sequenced data. When applied to the machine learning problem of authorship prediction, the model is on average 96 percent accurate when given writing samples of 10,000 words each from five authors, with 30 observations per author. The technique, called an assortative mixture model, can be applied to any graph whose vertices have an attribute. In his case, the assortative mixture was of English parts of speech.
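To give a rough sense of what "the tendency for objects of the same type to occur sequentially" can mean in practice, the sketch below computes a standard assortativity coefficient for categorical labels over consecutive items in a sequence. It is an illustration only and is not Leonard's model, whose details are not given here. The function name `sequential_assortativity` and the tag labels are hypothetical, and the tags are typed by hand rather than produced by a part-of-speech tagger.

```python
from collections import Counter


def sequential_assortativity(labels):
    """Assortativity of labels over consecutive pairs in a sequence.

    Near 1: items of the same type tend to follow one another.
    Near 0: no more same-type adjacency than chance would give.
    Negative: items of the same type tend to avoid one another.
    """
    pairs = list(zip(labels, labels[1:]))
    if not pairs:
        raise ValueError("need at least two labels")
    total = len(pairs)

    # How often each label appears as the first or second item of a pair.
    first = Counter(src for src, _ in pairs)
    second = Counter(dst for _, dst in pairs)

    # Observed share of pairs whose two items carry the same label.
    same = sum(1 for src, dst in pairs if src == dst) / total
    # Share of same-label pairs expected if adjacency were random.
    expected = sum(first[t] * second[t] for t in first) / total**2

    if expected == 1:
        raise ValueError("undefined when every item has the same label")
    return (same - expected) / (1 - expected)


# Hypothetical, hand-written tag sequences (assumption: not real tagger output).
clustered = ["NOUN", "NOUN", "NOUN", "VERB", "VERB", "VERB"]
alternating = ["NOUN", "VERB", "NOUN", "VERB", "NOUN"]
print(sequential_assortativity(clustered))
print(sequential_assortativity(alternating))
```

A score computed this way could serve as one numeric feature per writing sample; how the actual model turns such measurements into authorship predictions is not described in this release.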
“The idea came from a language model I developed over the summer where I tagged parts of speech to create Markov chains,” said Leonard. “The innovation of including all of the parts of speech, as opposed to picking and choosing what words or parts of speech to include in the language model, is new. No one has attempted to graphically analyze parts of speech in totality and apply the results to machine learning.”
A Markov chain is a stochastic, or randomly determined, model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event.
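As a minimal sketch of that definition, the following estimates first-order transition probabilities from a sequence of part-of-speech tags, so that each tag's probability depends only on the tag before it. The function name `transition_probabilities` and the sample tags are assumptions made for illustration; this is not the language model Leonard built.

```python
from collections import Counter, defaultdict


def transition_probabilities(tags):
    """Estimate P(next tag | current tag) from one tag sequence."""
    counts = defaultdict(Counter)
    # Count how often each tag is immediately followed by each other tag.
    for current, following in zip(tags, tags[1:]):
        counts[current][following] += 1

    # Normalize each row of counts so its probabilities sum to 1.
    return {
        current: {
            following: n / sum(row.values())
            for following, n in row.items()
        }
        for current, row in counts.items()
    }


# Hypothetical tags for a sentence such as "The dog chased the small cat".
tags = ["DET", "NOUN", "VERB", "DET", "ADJ", "NOUN"]
probabilities = transition_probabilities(tags)
print(probabilities["DET"])
```

In this toy sequence a determiner is followed once by a noun and once by an adjective, so each transition out of `DET` is estimated at one half.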
“It was the influence of my professors in the computer science and statistics department that led me to analyze the language model in the way that I did,” said Leonard. “They provided me with the direction and the tools to look at the model from a new perspective.”
Leonard hopes that his research leads to advances in how machines understand language. As for human applications, the model yields a grammar signature of the writer when applied to word graphs. His technique may make it possible to score grammar automatically rather than grading it by hand, as is done now.
Olivia Ross, an intern in the Marketing and Communications Department at URI and public relations major, wrote this press release.