Shakespeare Statistics


The General Imposters Method and The Life and Death of Jack Straw

All data were generated with R Stylo. See:


In his blog post "Authorship verification with the package 'stylo'" of May 30, 2018 (https://computationalstylistics.github.io/blog/imposters/), Maciej Eder described a new feature of the stylo package, "namely the General Imposters (GI) method, also referred to as the second verification system, introduced by Koppel and Winter (2014) and applied to the study of Julius Caesar's disputed writings (Kestemont et al., 2016a)." Eder then quotes the authors: "[t]he general intuition behind the GI, is not to assess whether two documents are simply similar in writing style, given a static feature vocabulary, but rather, it aims to assess whether two documents are significantly more similar to one another than other documents, across a variety of stochastically impaired feature spaces (Eder, 2012; Stamatatos, 2006), and compared to random selections of so-called distractor authors (Juola, 2015), also called 'imposters'." (Kestemont et al., 2016a: 88).
For the authorship attribution of the early history play The Life and Death of Jack Straw (pr. 1593), the following play texts were placed in the corresponding folder:
anon_jackstraw.txt; chettle_hoffman.txt; greene_friarbb.txt; kyd_soliman.txt; kyd_spanpure.txt; lodge_mariusscilla.txt; lyly_motherbombie.txt; mar_tamburlain1.txt; mar_tamburlain2.txt; mars_antmellid.txt; mars_malcontent.txt; nashe_summerslast.txt; peele_oldwives.txt; row_whenysee.txt; shak_hamlet.txt; shak_thnight.txt; sidney_marcantonie.txt; wilson_3ladieslondon.txt
For each of these texts, word frequencies were examined with GI. The number of words was set to 5000, and the classic Delta method was used in each case, supplemented by the so-called Wurzburg distance and the Ruzicka metric.
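The three distance measures named above can be illustrated with a minimal Python sketch. It is not the stylo code: it assumes the usual textbook definitions (classic Delta as the mean absolute difference of z-scored frequencies, the Wurzburg distance as cosine distance on the same z-scores, and Ruzicka as the min-max distance on raw frequencies). The function names and the tiny frequency table are hypothetical and serve only as an illustration.

```python
import math

def z_scores(table):
    """Scale every feature (column) to zero mean and unit standard deviation."""
    columns = list(zip(*table))
    means = [sum(col) / len(col) for col in columns]
    # Sample standard deviation; a constant column falls back to 1.0.
    sds = [
        math.sqrt(sum((x - m) ** 2 for x in col) / (len(col) - 1)) or 1.0
        for col, m in zip(columns, means)
    ]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in table]

def burrows_delta(za, zb):
    """Classic Delta: mean absolute difference of two z-scored vectors."""
    return sum(abs(a - b) for a, b in zip(za, zb)) / len(za)

def cosine_delta(za, zb):
    """Cosine (Wurzburg) distance: 1 minus the cosine of two z-scored vectors."""
    dot = sum(a * b for a, b in zip(za, zb))
    norm = math.sqrt(sum(a * a for a in za)) * math.sqrt(sum(b * b for b in zb))
    return 1.0 - dot / norm

def ruzicka(a, b):
    """Ruzicka (min-max) distance on raw, non-negative frequencies."""
    return 1.0 - sum(map(min, a, b)) / sum(map(max, a, b))

# Hypothetical relative frequencies: three texts (rows), four words (columns).
table = [
    [0.050, 0.030, 0.020, 0.010],
    [0.048, 0.033, 0.018, 0.012],
    [0.030, 0.040, 0.025, 0.005],
]
z = z_scores(table)
print(burrows_delta(z[0], z[1]), cosine_delta(z[0], z[1]), ruzicka(table[0], table[1]))
```

All three functions return smaller values for more similar texts, which is why they can be swapped in and out of the same verification procedure.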
Rather than calling the function imposters() directly, a script by Jan Rybicki examined 5000 words and then optimized the results by providing a lower and an upper boundary for the range of uncertain results. Authors scoring above the upper value were counted as identified by the respective method: Delta alone, Delta with the Wurzburg distance, or Delta with the Ruzicka distance. The GI function assumes that all the texts to be analyzed are already pre-processed and represented in the form of a matrix of feature frequencies (usually words). It contrasts, in several iterations, the text in question against (1) some texts written by possible candidates for authorship, that is, the authors suspected of being the actual author, and (2) a selection of "imposters", that is, authors who could not have written the text to be assessed. The allocation resulted in the following tabular overview:
Table 1
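The iterative contrast described above, together with the lower and upper boundaries, can be sketched roughly as follows. This is a simplified Python illustration and not the stylo implementation or Jan Rybicki's script: the function names, the default shares of 0.5, the fixed seed and the example boundaries are all assumptions made for the sketch, and only the min-max (Ruzicka) distance is used.

```python
import random

def minmax_distance(a, b, idx):
    """Ruzicka (min-max) distance restricted to the feature indices in idx."""
    num = sum(min(a[i], b[i]) for i in idx)
    den = sum(max(a[i], b[i]) for i in idx)
    return 1.0 - num / den if den else 0.0

def gi_score(test, candidate_texts, imposter_texts, iterations=100,
             feature_share=0.5, imposter_share=0.5, seed=0):
    """Share of iterations in which a candidate text is closer to the test
    text than every sampled imposter text."""
    rng = random.Random(seed)
    n_features = len(test)
    k_feat = max(1, int(n_features * feature_share))
    k_imp = max(1, int(len(imposter_texts) * imposter_share))
    hits = 0
    for _ in range(iterations):
        # Each iteration sees a random subset of features and of imposters.
        idx = rng.sample(range(n_features), k_feat)
        imposters = rng.sample(imposter_texts, k_imp)
        best_candidate = min(minmax_distance(test, c, idx) for c in candidate_texts)
        best_imposter = min(minmax_distance(test, t, idx) for t in imposters)
        if best_candidate < best_imposter:
            hits += 1
    return hits / iterations

def verdict(score, lower, upper):
    """Three-way decision: scores between the boundaries stay undecided."""
    if score > upper:
        return "attributed"
    if score < lower:
        return "rejected"
    return "uncertain"

# Hypothetical frequency vectors and boundaries, for illustration only.
disputed = [0.05, 0.03, 0.02, 0.01]
candidate = [[0.05, 0.03, 0.02, 0.01]]
others = [[0.01, 0.02, 0.03, 0.05], [0.02, 0.02, 0.02, 0.02]]
score = gi_score(disputed, candidate, others)
print(score, verdict(score, lower=0.4, upper=0.6))
```

A score is the proportion of iterations won by the candidate, so it always lies between 0 and 1; only scores above the upper boundary count as an attribution, which mirrors the rule stated above for the table.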


Delta and the very precise Ruzicka metric both favor William Shakespeare as the author of The Life and Death of Jack Straw.

Compare these evaluations with the results of Rolling Classify and of Rolling Delta.