Shakespeare Statistics


Authorship of The Life and Death of Jack Straw

All data were generated by R Stylo. See: Computational Stylistics Group Homepage


Rolling Classify uses classifiers such as nsc (nearest shrunken centroid), svm (support vector machine) and delta. They were applied to word frequencies (mf1w), character bigrams (mf2c) and character trigrams (mf3c). The improved methodology draws on a large number of reference texts, all of them single-authored and well attributed. Restricting large corpora to their core plays ensured that no bias arose. Window sizes range from 1000 to 7000 words in steps of 1000 words, and a slice overlap of 250 words keeps the results comparable with those of Rolling Delta.

The classifiers rest on different mathematical kernels, which explains the differences in their results. Nsc has a rather low decision level, whereas svm has a high one and is more reliable for that reason, too. Vocabulary, however, is less reliable than character bigrams and trigrams.
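The windowing and the three feature types described above can be sketched in a few lines of Python. This is only an illustration, not the Stylo implementation: the function names are hypothetical, and "overlap" is assumed to mean the number of words shared by two consecutive slices, so that each slice starts `size - overlap` words after the previous one.

```python
from collections import Counter

# Window sizes named in the text: 1000 to 7000 words in steps of 1000.
WINDOW_SIZES = list(range(1000, 8000, 1000))
# Slice overlap named in the text (interpretation as shared words is an assumption).
SLICE_OVERLAP = 250


def slices(words, size, overlap):
    """Yield consecutive windows of `size` words sharing `overlap` words.

    A text shorter than one window yields a single, shorter slice.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    last_start = max(len(words) - size, 0)
    for start in range(0, last_start + 1, step):
        yield words[start:start + size]


def word_frequencies(words):
    """Relative frequencies of single words (the mf1w feature type)."""
    counts = Counter(words)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def char_ngram_frequencies(text, n):
    """Relative frequencies of character n-grams (n=2 for mf2c, n=3 for mf3c)."""
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()}


def features_for_slice(window):
    """Collect all three feature types for one slice of words."""
    text = " ".join(window)
    return {
        "mf1w": word_frequencies(window),
        "mf2c": char_ngram_frequencies(text, 2),
        "mf3c": char_ngram_frequencies(text, 3),
    }
```

Each slice of the disputed play would then be handed to a classifier together with the same features computed for the reference texts. In practice only the most frequent items of each feature type are kept, a cut-off that this sketch leaves out.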
A majority of attributions favours William Shakespeare as the author of The Life and Death of Jack Straw, followed by Samuel Rowley, particularly in the nsc classifications.




In contrast to Rolling Delta, the differently scoring classifiers yield a large number of candidate authors, depending on the selection of variables (mf1w, mf2c and mf3c) and on window size. The chart above is based on a pre-selected set of plays:
chettle_hoffman.txt; daniels_cleop.txt; greene_friarbb.txt; kyd_soliman.txt; kyd_spanpure.txt; lodge_mariusscilla.txt; lyly_motherbombie.txt; mar_tamburlain1.txt; mar_tamburlain2.txt; mars_antmellid.txt; mars_malcontent.txt; nashe_summerslast.txt; peele_oldwives.txt; row_whenysee.txt; shak_hamlet.txt; shak_thnight.txt; sidney_marcantonie.txt; wilson_3ladieslondon.text;
It is the larger windows, however, that show a clear preference for Shakespeare. Only two of his core plays were used, to avoid a bias towards authors with a larger corpus.
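The balance of the reference set can be checked directly against the file list given above. The following Python sketch assumes that the part of each file name before the underscore identifies the author. That is an inference from the naming pattern, not something stated on this page, and the function name is hypothetical.

```python
from collections import Counter

# File names copied verbatim from the list of reference plays.
REFERENCE_FILES = [
    "chettle_hoffman.txt", "daniels_cleop.txt", "greene_friarbb.txt",
    "kyd_soliman.txt", "kyd_spanpure.txt", "lodge_mariusscilla.txt",
    "lyly_motherbombie.txt", "mar_tamburlain1.txt", "mar_tamburlain2.txt",
    "mars_antmellid.txt", "mars_malcontent.txt", "nashe_summerslast.txt",
    "peele_oldwives.txt", "row_whenysee.txt", "shak_hamlet.txt",
    "shak_thnight.txt", "sidney_marcantonie.txt",
    "wilson_3ladieslondon.text",
]


def author_key(filename):
    """Return the prefix before the first underscore (assumed author tag)."""
    return filename.split("_", 1)[0]


# Number of reference plays per assumed author tag.
plays_per_author = Counter(author_key(name) for name in REFERENCE_FILES)

for author, count in sorted(plays_per_author.items()):
    print(f"{author}: {count}")
```

Under that assumption no author tag is represented by more than two plays, which agrees with the statement that only two of Shakespeare's core plays were included.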
Compare these evaluations with the results of Rolling Delta and the General Imposters Method.