This March I’ll be presenting at the ACRL 2015 conference with Christine Murray (Bates College) on teaching data literacy in the library. To help me prepare and perhaps preview our discussion, I thought I’d post a few thoughts on the blog to get the juices flowing. Let’s begin with some definitions as they appear in both the library literature and the scholarship of statistics education in order to answer the question: what is data literacy?
In Libraryland, “data literacy” seems to be the most popular term (over statistical literacy, quantitative literacy, and numeracy), and consists of two aspects: information literacy and data management. From an information literacy perspective, the emphasis is on statistics, which are considered a special form of information but one that still falls under the information literacy umbrella. For example, Schield (2004:6) describes statistical literacy as the critical consumption of statistical information when used as evidence in arguments. Similarly, Stephenson and Caravello (2007) advocate for librarians to promote statistical literacy by assisting learners to locate and evaluate authoritative statistical sources, recalling Standards 2 and 3 of the 2000 ACRL Information Literacy Standards, as well as reference classics like the annual Statistical Abstract of the United States.
From the data management perspective, the emphasis is on data rather than statistics, and focuses on the organizational skills needed to create, process, and preserve original data sets. Returning to Schield (2004:7), he defines data literacy as the ability to obtain and manipulate data, but reserves these skills for certain fields of study such as business or the social sciences. Carson et al. (2011:631), based on interviews with faculty and GIS students, emphasize the importance of data management and curation skills required to “store, describe, organize, track, preserve, and interoperate data.” There is plenty of literature on data management, a hot topic in Libraryland fueled by interest in e-science initiatives, new data requirements for federal grants, and the creation of institutional repositories. In my experience, though, discussion of data management is often divorced from statistical literacy, perhaps due to its focus on faculty and other experts rather than data novices. Calzada Prado and Marzal (2013) do attempt to unify the information literacy and data management aspects under one rubric, although their proposal for five data literacy standards is largely derivative of the soon-to-be-sunsetted 2000 ACRL Information Literacy Standards, which doesn’t bode well for their wider adoption.
Turning away from librarianship, we find that statisticians and statistics educators typically use the term “statistical literacy” to describe the knowledge, skills, and dispositions surrounding their field. One widely cited exposition of statistical literacy is that of Iddo Gal (2002:2-3), who identifies two interrelated components: the ability to interpret and critically evaluate statistical information, as well as the ability to discuss and communicate one’s understanding, opinions, and concerns regarding such statistical information. Gal (2002:4) further describes a model of interrelated knowledge elements and dispositions that together enable statistically literate behavior. Gal’s definition will no doubt look familiar to information literacy librarians, incorporating the evaluative and communicative aspects of information literacy along with the dispositions and affective components we find highlighted under the new ACRL Framework.
But what is the nature of statistical information, the object of Gal’s model for statistical literacy? It may be helpful to consider this in terms put forth by George Cobb and David S. Moore (1997). In their oft cited article on statistics pedagogy, they break down statistical analysis into three interrelated phases: data production, data analysis, and formal inference. Each of these phases produces statistical information requiring varying levels of contextual and mathematical knowledge.
Data production includes aspects of the research process related to designing a study, creating a data set, and preparing the data for short term and long term analysis. Viewed from the library, the data production phase is most closely associated with data management skills. Data analysis, next in Cobb and Moore’s schema, consists of the exploratory and descriptive phase of data-driven research. This includes examining the data set to discover trends or outliers, and using descriptive statistics to reduce large amounts of data into summary information such as measures of central tendency and variance (e.g. mean, median, mode, range, percentiles, standard deviation). Through this analysis, researchers can make hypotheses or predictions about phenomena revealed by the data. Finally, formal inference can be used to draw conclusions about a population from findings in sample data. Here we find the notorious formulas full of Greek letters such as Student’s t-test, chi-square test, ANOVA, and regression models. I’ll return to Cobb and Moore’s pedagogical advice in a future post.
So back to the original question: what is data literacy?
I suggest librarians borrow heavily from statistics educators when trying to answer this question. To paraphrase Gal and apply his definition to Cobb and Moore’s three phases of statistical analysis, the simplest definition of data literacy is the ability to interpret, evaluate, and communicate statistical information. Central to this ability is an understanding of how statistical information is created, encompassing data production, data analysis, and formal inference. In other words, data literacy includes the ability to evaluate the modes of data production, including the underlying research design and means of sampling, and how this impacts the possible findings. Data literacy also includes the ability to interpret the results of formal inference tests, including confidence intervals and the probability that findings are representative of a population rather than coincidental to the given sample. And finally, data literacy includes the ability to interpret and communicate about the descriptive statistics learners and citizens encounter everyday, from unemployment rates to political polling.
And what about data management? Ultimately it belongs to the data production phase of Cobb and Moore’s schema, and is perhaps one aspect of data literacy that, as Schield intimated, can be reserved for the specialists. While the data literate person can identify and evaluate the soundness of a research design and data collection methods, perhaps only trained practitioners need the specialized skills to carry out a full-fledged project involving data curation and advanced tools. And in most instances, teaching these skills is beyond the purview of librarians. Stay tuned for more on this and data literacy instruction in the library.