Data Mining for the Social Sciences: An Introduction by Paul Attewell, David Monaghan

By Paul Attewell, David Monaghan

We live in a world of big data: the amount of information collected on human behavior each day is staggering, and exponentially greater than at any time in the past. At the same time, powerful algorithms are capable of churning through seas of data to uncover patterns. Providing a simple and accessible introduction to data mining, Paul Attewell and David B. Monaghan discuss how data mining differs substantially from the conventional statistical modeling familiar to most social scientists. The authors also empower social scientists to tap into these new resources and incorporate data mining methodologies in their analytical toolkits. Data Mining for the Social Sciences demystifies the process by describing the diverse set of techniques available, discussing the strengths and weaknesses of various approaches, and giving practical demonstrations of how to carry out analyses using tools in various statistical software packages.


Best demography books

The Geography of American Poverty: Is There a Need for Place-Based Policies?

Partridge and Rickman explore the wide geographic disparities in poverty across the United States. Their focus on the spatial dimensions of U.S. poverty reveals distinct differences across states, metropolitan areas, and counties and leads them to consider why antipoverty policies have succeeded in some places and failed in others.

American Diversity: A Demographic Challenge for the Twenty-First Century

Demographers explore population diversity in the United States.

English Population History from Family Reconstitution 1580–1837

English Population History from Family Reconstitution is the second part of the single most important demographic enquiry of the past generation, the first part being The Population History of England, 1541-1871. This study proves that family reconstitution has been particularly successful in obtaining accurate information about the demography of past populations.

Extra resources for Data Mining for the Social Sciences: An Introduction

Sample text

However, publications commonly report an overall classification error rate instead: (n21 + n12)/(n11 + n12 + n21 + n22). Some articles report a measure called sensitivity, defined as n22/(n21 + n22). The false-negative rate is defined as the proportion of predicted negatives that were in reality positive: n21/(n11 + n21). Some also report a measure known as specificity, defined as n11/(n11 + n12). The false-positive rate is defined as the proportion of predicted positives that were in reality negative: n12/(n12 + n22).
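The five measures above can be computed directly from the four cell counts. The following is a minimal sketch, not the authors' code. It assumes, as the formulas imply, that n_ij counts cases whose observed class is i and whose predicted class is j, with 1 meaning negative and 2 meaning positive. The function name `classification_metrics` and the example counts are made up for illustration.

```python
def classification_metrics(n11, n12, n21, n22):
    """Compute the measures defined in the text from a 2x2 table.

    Assumed layout (inferred from the formulas, not stated in the excerpt):
    n_ij = number of cases with observed class i and predicted class j,
    where 1 = negative and 2 = positive.
    """
    total = n11 + n12 + n21 + n22
    return {
        # Share of all cases that were misclassified.
        "error_rate": (n21 + n12) / total,
        # Share of observed positives that were predicted positive.
        "sensitivity": n22 / (n21 + n22),
        # Share of observed negatives that were predicted negative.
        "specificity": n11 / (n11 + n12),
        # Share of predicted negatives that were in reality positive.
        "false_negative_rate": n21 / (n11 + n21),
        # Share of predicted positives that were in reality negative.
        "false_positive_rate": n12 / (n12 + n22),
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
metrics = classification_metrics(n11=50, n12=10, n21=5, n22=35)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The sketch does not guard against empty rows or columns; a denominator of zero would raise `ZeroDivisionError`.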

When the predictive model (usually in the form of an equation) derived from a particular training sample is applied to a completely separate test sample, containing different observations or cases, then one can compare the predicted values obtained from the model to the observed values in the new dataset and determine how well they fit. This second step provides a trustworthy assessment of how well the predictive model works for data which were not used before. The overfitting will “drop out” or fail to help in prediction of the test or holdout data because the part of the model that described chance patterns in the training data (the overfitted part) will fail to predict anything useful in the second or test dataset.
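The training and test logic described above can be illustrated with a small simulation. This is a sketch under stated assumptions, not a procedure taken from the book: the data are synthetic (a straight line plus random noise), the split is a simple 50/50 holdout, and the "memorizer" is a deliberately overfitted stand-in model that returns the outcome of the nearest training case. All function names are hypothetical.

```python
import random


def mse(model, data):
    """Mean squared error of a model (a function of x) on (x, y) pairs."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


def fit_line(data):
    """Ordinary least squares for a single predictor."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def fit_memorizer(data):
    """Deliberately overfitted model: copy the y of the nearest training x."""
    return lambda x: min(data, key=lambda pair: abs(pair[0] - x))[1]


# Synthetic cases: y = 2x + 1 plus Gaussian noise (assumed for illustration).
rng = random.Random(0)
cases = [(i / 10, 2.0 * (i / 10) + 1.0 + rng.gauss(0, 2.0)) for i in range(200)]
rng.shuffle(cases)
train, test = cases[:100], cases[100:]  # separate observations in each sample

line = fit_line(train)
memorizer = fit_memorizer(train)

results = {
    "line": (mse(line, train), mse(line, test)),
    "memorizer": (mse(memorizer, train), mse(memorizer, test)),
}
for name, (train_error, test_error) in results.items():
    print(f"{name}: train MSE = {train_error:.2f}, test MSE = {test_error:.2f}")
```

The memorizer reproduces every training outcome exactly, noise included, so its training error is zero. That apparent fit does not carry over to the test sample, which is the sense in which the overfitted part of a model "drops out" on holdout data.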

In part this is simply a matter of time and effort: it is extremely time-consuming to examine nonlinearity for many predictors. However, there is something more basic going on than simply time and convenience. A large part of conventional statistics has been built upon the concept of correlation—the extent to which as one variable increases in value, the other also changes. A whole dataset can be represented by a correlation matrix or a variance-covariance matrix that summarizes the relations between variables.
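To make the idea of summarizing a dataset by a correlation matrix concrete, here is a minimal sketch in plain Python. The three variables are invented for illustration: `linear` is an exact linear function of `x`, and `squared` is an exact but nonlinear function of `x`. The helper names `pearson` and `correlation_matrix` are hypothetical.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / sqrt(var_a * var_b)


def correlation_matrix(columns):
    """Return a nested dict of pairwise correlations between named columns."""
    names = list(columns)
    return {r: {c: pearson(columns[r], columns[c]) for c in names} for r in names}


# Invented data: one linear and one purely nonlinear function of x.
x = [-2, -1, 0, 1, 2]
columns = {
    "x": x,
    "linear": [2 * v + 1 for v in x],
    "squared": [v ** 2 for v in x],
}

matrix = correlation_matrix(columns)
for row_name, row in matrix.items():
    print(row_name, [round(value, 3) for value in row.values()])
```

In this toy dataset the correlation between `x` and `linear` is 1, while the correlation between `x` and `squared` is 0 even though `squared` is completely determined by `x`. A correlation matrix therefore records linear association only, which is one reason nonlinear relations have to be examined separately.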
