Reading by Numbers
An Outcome of The Australian Literary and Publishing History Project led by Katherine Bode
  • Methodology

    Downloadable data for the following topics is available:

    Why have I made these datasets available?

    These datasets represent months of work collecting and collating the information in AustLit on Australian novels, and significantly expanding this information to enable empirical analysis of major research questions in Australian literary studies. There are still many other research areas that could be investigated using these datasets. Why, then, am I making them publicly available, rather than continuing to analyse them, and publish the results, myself?

    Part of the answer to this question is that there is far more information in these datasets than one person could hope to explore and interpret in a lifetime. I hope, in making them available, to increase the likelihood that others will feel motivated to analyse and interpret this information to enhance our understanding of the history of the Australian novel.

    Making these datasets freely available also contributes to what I see as fundamental methodological imperatives of quantitative literary scholarship: openness, testability and accountability. No one would publish a work of literary criticism about a text that no one else has access to. Readers of that criticism need to read the text to consider whether they agree with the interpretations offered. Likewise, the 'source texts' of quantitative literary studies (the datasets) must be available, so that others can explore and query the nature of the data and the interpretations presented, and in so doing, assess the arguments made and, if necessary, challenge them.

    How were these datasets created?

    The first two datasets are based on data in AustLit, supplemented by further research. I extracted the records for these datasets using the following steps:

    1. Performing guided searches in AustLit, asking for Type – 'single work' – and Form – 'novel' – records for particular year ranges;
    2. Displaying these results as tagged text (NOTE: during the period when I created and updated these datasets – January 2007 to December 2011 – AustLit would not display more than 999 records as tagged text; as long as this remains the case, those wishing to extract data via this process will need to design searches that return fewer than 1,000 results);
    3. Copying and pasting these results into a text file;
    4. Using command-line tools in the terminal to group the data, then copying and pasting the results into Excel.
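
    Because of the 999-record display limit noted in step 2, the year ranges in step 1 have to be chosen so that each search stays under it. The following is a minimal Python sketch of one way to plan those ranges, not the procedure actually used here. It assumes you already know (hypothetically) how many novel records AustLit holds for every year in the span, with zero for empty years, and it greedily groups consecutive years until adding the next year would exceed the limit.

```python
from typing import Dict, List, Tuple

# Display limit for tagged text described in step 2 (searches must return < 1,000).
MAX_RECORDS = 999


def plan_year_ranges(counts: Dict[int, int], limit: int = MAX_RECORDS) -> List[Tuple[int, int]]:
    """Group consecutive years into (start, end) ranges whose combined record
    count stays at or below `limit`.

    `counts` is a hypothetical per-year record count; it should include every
    year in the span (use 0 for years with no records), because each returned
    range covers all years from start to end inclusive.
    """
    ranges: List[Tuple[int, int]] = []
    start = None  # first year of the range being built
    prev = None   # last year added to that range
    total = 0     # records accumulated in the current range
    for year in sorted(counts):
        n = counts[year]
        if n > limit:
            # A single year over the limit cannot be fixed by choosing year ranges.
            raise ValueError(f"{year} alone has {n} records; narrow the search further")
        if start is None:
            start, total = year, n
        elif total + n > limit:
            # Close the current range and start a new one at this year.
            ranges.append((start, prev))
            start, total = year, n
        else:
            total += n
        prev = year
    if start is not None:
        ranges.append((start, prev))
    return ranges


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(plan_year_ranges({1945: 400, 1946: 500, 1947: 300, 1948: 900, 1949: 50}))
```

    With those hypothetical counts the sketch proposes three searches: 1945 to 1946, 1947 alone, and 1948 to 1949. A greedy pass is not guaranteed to produce the fewest searches, but it is simple and always respects the limit.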

    This process left me with Excel files that initially included the type, title, author, year of publication, publisher and genre/s for Australian novels first published between 1830 and 1899 and between 1945 and 2009. I then added information to the datasets as my research developed and specific questions emerged (for full descriptions of the content of these datasets see below).
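
    Steps 3 and 4 turn the pasted tagged text into a table with one row per novel. The sketch below does the same job in Python, writing CSV (which Excel opens) in place of the terminal-and-spreadsheet route described above. The tagged-text layout it expects is an assumption, not AustLit's documented format: records separated by blank lines, one `TAG: value` pair per line, with repeated tags (such as several genres) joined into a single cell. The tag names and column list in `FIELDS` are hypothetical and would need to be matched to the real export.

```python
import csv
import io
from typing import Dict, Iterable, List

# Hypothetical column names, matching the fields listed above
# (type, title, author, year of publication, publisher, genre/s).
FIELDS = ["type", "title", "author", "year", "publisher", "genre"]


def parse_tagged_text(text: str) -> List[Dict[str, str]]:
    """Split pasted tagged text into one dict per record.

    Assumes (hypothetically) blank-line-separated records made of
    'TAG: value' lines; tags are lower-cased to form dict keys.
    """
    records: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # A blank line ends the current record.
            if current:
                records.append(current)
                current = {}
            continue
        tag, sep, value = line.partition(":")  # split on the first colon only
        if not sep:
            continue  # ignore lines that are not 'TAG: value'
        key, value = tag.strip().lower(), value.strip()
        if key in current:
            # Repeated tags (e.g. several genres) share one cell.
            current[key] += "; " + value
        else:
            current[key] = value
    if current:
        records.append(current)
    return records


def to_csv(records: Iterable[Dict[str, str]]) -> str:
    """Write records as CSV; missing fields become empty cells and
    tags not listed in FIELDS are dropped."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for rec in records:
        writer.writerow(rec)
    return buf.getvalue()


if __name__ == "__main__":
    # Placeholder records, for illustration only.
    sample = (
        "TYPE: single work\nTITLE: Example Novel\nAUTHOR: A. Writer\n"
        "YEAR: 1950\nGENRE: romance\nGENRE: historical fiction\n\n"
        "TYPE: single work\nTITLE: Another: A Novel\nYEAR: 1951\n"
    )
    print(to_csv(parse_tagged_text(sample)))
```

    Splitting on the first colon only keeps titles that themselves contain a colon intact. Further columns added later in the research could simply be appended to `FIELDS`.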

    The third, fourth and fifth datasets were created with Dr Tara Murphy, who works in the Schools of Information Technologies and Physics at the University of Sydney, and research assistant Jonathan Hutchinson, then an Honours student in the School of Information Technologies. Directed by Tara, Jonathan wrote a script that automatically extracted the 'works about' Australian novelists from AustLit, and Tara then analysed the results to produce these datasets. The third dataset, 'Critical attention to Australian novelists overall, 1945 to 2009', shows the overall results of this extraction, listing for each year in this period the top fifty Australian novelists ranked by the number of 'works about' they received. The fourth and fifth datasets show these results, from 1950 to 2009, for 'works about' published in newspapers and in academic journals respectively. Publications were identified as newspapers or academic (peer-reviewed) journals manually. (NOTE: regarding academic publications, titles were categorised retrospectively, based on whether they were peer-reviewed in 2007.)
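
    The ranking behind these three datasets amounts to counting 'works about' per novelist per year and keeping the top fifty. The Python sketch below shows that counting step only; it is not the script Jonathan wrote or Tara's analysis. It assumes, hypothetically, that each extracted 'work about' has already been reduced to a `(year, novelist, venue_kind)` tuple, where `venue_kind` is a manually assigned label such as `"newspaper"` or `"journal"`. Breaking ties alphabetically is also an assumption, made only so the output is deterministic.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def top_novelists_by_year(
    records: Iterable[Tuple[int, str, str]],
    n: int = 50,
    kind: Optional[str] = None,
) -> Dict[int, List[Tuple[str, int]]]:
    """Rank novelists by number of 'works about' in each year.

    records: hypothetical (year, novelist, venue_kind) tuples, one per work about.
    n:       how many novelists to keep per year (fifty in the datasets above).
    kind:    if given, count only works from that venue type, e.g. "newspaper"
             or "journal" (mirroring the fourth and fifth datasets).
    """
    per_year: Dict[int, Counter] = defaultdict(Counter)
    for year, novelist, venue_kind in records:
        if kind is not None and venue_kind != kind:
            continue
        per_year[year][novelist] += 1

    ranked: Dict[int, List[Tuple[str, int]]] = {}
    for year, counts in sorted(per_year.items()):
        # Most 'works about' first; ties broken alphabetically (an assumption).
        ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
        ranked[year] = ordered[:n]
    return ranked


if __name__ == "__main__":
    # Placeholder records, for illustration only.
    sample = [
        (1950, "A", "newspaper"),
        (1950, "B", "journal"),
        (1950, "A", "journal"),
        (1951, "C", "newspaper"),
    ]
    print(top_novelists_by_year(sample))
    print(top_novelists_by_year(sample, kind="journal"))
```

    Filtering by `kind` before counting, rather than ranking first and filtering afterwards, means a novelist's newspaper ranking is unaffected by how much academic attention they received, which is what separate newspaper and journal datasets imply.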
