We live in a world of big data: the amount of information collected on human behavior each day is staggering, and exponentially greater than at any time in the past. Additionally, powerful algorithms are capable of churning through seas of data to uncover patterns. Providing a simple and accessible introduction to data mining, Paul Attewell and David B. Monaghan discuss how data mining substantially differs from conventional statistical modeling familiar to most social scientists. The authors also empower social scientists to tap into these new resources and incorporate data mining methodologies in their analytical toolkits. Data Mining for the Social Sciences demystifies the process by describing the diverse set of techniques available, discussing the strengths and weaknesses of various approaches, and giving practical demonstrations of how to carry out analyses using tools in various statistical software packages.
PART 1. CONCEPTS
1. What Is Data Mining?
2. Contrasts with the Conventional Statistical Approach
3. Some General Strategies Used in Data Mining
4. Important Stages in a Data Mining Project
PART 2. WORKED EXAMPLES
5. Preparing Training and Test Datasets
6. Variable Selection Tools
7. Creating New Variables Using Binning and Trees
8. Extracting Variables
10. Classification Trees
11. Neural Networks
13. Latent Class Analysis and Mixture Models
14. Association Rules
Paul Attewell is Distinguished Professor of Sociology at the Graduate Center of the City University of New York, where he teaches doctoral-level courses on quantitative methods, including data mining, as well as courses on the sociology of education and on social stratification. Professor Attewell is the principal investigator of a grant from the National Science Foundation that supports an interdisciplinary initiative on data mining in the social and behavioral sciences and education. In projects funded by the Spencer, Gates, and Ford Foundations, Paul Attewell has also studied issues of access and inequality in K-12 schools and in higher education. One of his previous books, Passing the Torch: Does Higher Education for the Disadvantaged Pay Off Across the Generations?, won the Grawemeyer Prize in Education and the American Educational Research Association’s prize for outstanding book in 2009.
David B. Monaghan is a doctoral candidate in Sociology at the Graduate Center of the City University of New York, and has taught courses on quantitative research methods, demography, and education. His research is focused on the relationship between higher education and social stratification.
"Attewell and Monaghan show us how to find our way in the mountains of big data generated by administrative records, commercial transactions, and online traffic. They explain clearly why our usual methods won't work and how social scientists can apply data science innovations to answer our questions. Their introduction is useful now and also prepares us for developments yet to come."—Michael Hout, New York University
"The analysis of big data is becoming a core enterprise in social research. Paul Attewell and David Monaghan provide an excellent and accessible introduction to data mining tools that forward-looking social scientists can use for analyzing such data. I highly recommend it for experienced researchers and graduate students alike."—Glenn Firebaugh, author of Seven Rules for Social Research
"Most social scientists are unaware of the latest data mining tools that are available to help them discover patterns in the data they analyze. With lucid prose, clear examples, and sound practical advice, Data Mining for the Social Sciences will open the eyes of many."—Stephen L. Morgan, Johns Hopkins University