Main Authors: | Bruce, Peter C. 1953- ; Bruce, Andrew |
---|---|
Format: | Book |
Language: | English |
Published: | Beijing ; Boston ; Farnham ; Sebastopol ; Tokyo : O'Reilly, 2017 |
Edition: | First edition |
Subjects: | Data Science ; Statistics ; Data Mining ; Data Analysis ; Big Data |
Links: | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=029364482&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
Item Description: | This record also covers later, unchanged reprints |
Physical Description: | xvi, 298 pages : illustrations, diagrams |
ISBN: | 9781491952962 |
MARC
LEADER | 00000nam a2200000 c 4500 | ||
---|---|---|---|
001 | BV043955725 | ||
003 | DE-604 | ||
005 | 20181119 | ||
007 | t| | ||
008 | 161212s2017 xx a||| |||| 00||| eng d | ||
020 | |a 9781491952962 |c pbk. |9 978-1-4919-5296-2 | ||
035 | |a (OCoLC)990292891 | ||
035 | |a (DE-599)BVBBV043955725 | ||
040 | |a DE-604 |b ger |e rda | ||
041 | 0 | |a eng | |
049 | |a DE-355 |a DE-1028 |a DE-706 |a DE-1050 |a DE-11 |a DE-739 |a DE-20 |a DE-384 |a DE-898 |a DE-523 |a DE-1046 | ||
084 | |a SK 850 |0 (DE-625)143263: |2 rvk | ||
084 | |a ST 530 |0 (DE-625)143679: |2 rvk | ||
100 | 1 | |a Bruce, Peter C. |d 1953- |e Verfasser |0 (DE-588)1104275260 |4 aut | |
245 | 1 | 0 | |a Practical statistics for data scientists |b 50 essential concepts |c Peter Bruce and Andrew Bruce |
250 | |a First edition | ||
264 | 1 | |a Beijing ; Boston ; Farnham ; Sebastopol ; Tokyo |b O'Reilly |c 2017 | |
300 | |a xvi, 298 Seiten |b Illustrationen, Diagramme | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
500 | |a Hier auch später erschienene, unveränderte Nachdrucke | ||
650 | 0 | 7 | |a Data Science |0 (DE-588)1140936166 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Statistik |0 (DE-588)4056995-0 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Data Mining |0 (DE-588)4428654-5 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Datenanalyse |0 (DE-588)4123037-1 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Big Data |0 (DE-588)4802620-7 |2 gnd |9 rswk-swf |
689 | 0 | 0 | |a Data Mining |0 (DE-588)4428654-5 |D s |
689 | 0 | |5 DE-604 | |
689 | 1 | 0 | |a Data Science |0 (DE-588)1140936166 |D s |
689 | 1 | 1 | |a Big Data |0 (DE-588)4802620-7 |D s |
689 | 1 | 2 | |a Statistik |0 (DE-588)4056995-0 |D s |
689 | 1 | 3 | |a Datenanalyse |0 (DE-588)4123037-1 |D s |
689 | 1 | |8 1\p |5 DE-604 | |
700 | 1 | |a Bruce, Andrew |e Verfasser |0 (DE-588)170744183 |4 aut | |
856 | 4 | 2 | |m Digitalisierung UB Passau - ADAM Catalogue Enrichment |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=029364482&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
883 | 1 | |8 1\p |a cgwrk |d 20201028 |q DE-101 |u https://d-nb.info/provenance/plan#cgwrk | |
943 | 1 | |a oai:aleph.bib-bvb.de:BVB01-029364482 |
Record in the Search Index
DE-BY-OTHR_call_number | F 03/ST 530 B886 |
---|---|
DE-BY-OTHR_katkey | 5837051 |
DE-BY-OTHR_location | 02 |
DE-BY-OTHR_media_number | 067004151372 067004151394 |
DE-BY-UBR_call_number | 8912/SK 850 B886 |
DE-BY-UBR_katkey | 5837051 |
DE-BY-UBR_location | UB Handapparat Naturwiss. Prof. Hartig |
DE-BY-UBR_media_number | TEMP12592686 |
_version_ | 1835105699865034752 |
adam_text | Table of Contents
Preface xiii
1. Exploratory Data Analysis 1
Elements of Structured Data 2
Further Reading 4
Rectangular Data 5
Data Frames and Indexes 6
Nonrectangular Data Structures 7
Further Reading 8
Estimates of Location 8
Mean 9
Median and Robust Estimates 10
Example: Location Estimates of Population and Murder Rates 12
Further Reading 13
Estimates of Variability 13
Standard Deviation and Related Estimates 15
Estimates Based on Percentiles 17
Example: Variability Estimates of State Population 18
Further Reading 19
Exploring the Data Distribution 19
Percentiles and Boxplots 20
Frequency Table and Histograms 21
Density Estimates 24
Further Reading 26
Exploring Binary and Categorical Data 26
Mode 28
Expected Value 28
Further Reading 29
Correlation 29
Scatterplots 32
Further Reading 34
Exploring Two or More Variables 34
Hexagonal Binning and Contours (Plotting Numeric versus Numeric Data) 34
Two Categorical Variables 37
Categorical and Numeric Data 38
Visualizing Multiple Variables 40
Further Reading 42
Summary 42
2. Data and Sampling Distributions 43
Random Sampling and Sample Bias 44
Bias 46
Random Selection 47
Size versus Quality: When Does Size Matter? 48
Sample Mean versus Population Mean 49
Further Reading 49
Selection Bias 50
Regression to the Mean 51
Further Reading 53
Sampling Distribution of a Statistic 53
Central Limit Theorem 55
Standard Error 56
Further Reading 57
The Bootstrap 57
Resampling versus Bootstrapping 60
Further Reading 60
Confidence Intervals 61
Further Reading 63
Normal Distribution 64
Standard Normal and QQ-Plots 65
Long-Tailed Distributions 67
Further Reading 69
Student's t-Distribution 69
Further Reading 72
Binomial Distribution 72
Further Reading 74
Poisson and Related Distributions 74
Poisson Distributions 75
Exponential Distribution 75
Estimating the Failure Rate 76
Weibull Distribution 76
Further Reading 77
Summary 77
3. Statistical Experiments and Significance Testing 79
A/B Testing 80
Why Have a Control Group? 82
Why Just A/B? Why Not C, D...? 83
For Further Reading 84
Hypothesis Tests 85
The Null Hypothesis 86
Alternative Hypothesis 86
One-Way, Two-Way Hypothesis Test 87
Further Reading 88
Resampling 88
Permutation Test 88
Example: Web Stickiness 89
Exhaustive and Bootstrap Permutation Test 92
Permutation Tests: The Bottom Line for Data Science 93
For Further Reading 93
Statistical Significance and P-Values 93
P-Value 96
Alpha 96
Type 1 and Type 2 Errors 98
Data Science and P-Values 98
Further Reading 99
t-Tests 99
Further Reading 101
Multiple Testing 101
Further Reading 104
Degrees of Freedom 104
Further Reading 106
ANOVA 106
F-Statistic 109
Two-Way ANOVA 110
Further Reading 111
Chi-Square Test 111
Chi-Square Test: A Resampling Approach 112
Chi-Squared Test: Statistical Theory 114
Fisher's Exact Test 115
Relevance for Data Science 117
Further Reading 118
Multi-Arm Bandit Algorithm 119
Further Reading 122
Power and Sample Size 122
Sample Size 123
Further Reading 125
Summary 125
4. Regression and Prediction 127
Simple Linear Regression 127
The Regression Equation 129
Fitted Values and Residuals 131
Least Squares 132
Prediction versus Explanation (Profiling) 133
Further Reading 134
Multiple Linear Regression 134
Example: King County Housing Data 135
Assessing the Model 136
Cross-Validation 138
Model Selection and Stepwise Regression 139
Weighted Regression 141
Prediction Using Regression 142
The Dangers of Extrapolation 143
Confidence and Prediction Intervals 143
Factor Variables in Regression 145
Dummy Variables Representation 145
Factor Variables with Many Levels 147
Ordered Factor Variables 149
Interpreting the Regression Equation 150
Correlated Predictors 150
Multicollinearity 151
Confounding Variables 152
Interactions and Main Effects 153
Testing the Assumptions: Regression Diagnostics 155
Outliers 156
Influential Values 158
Heteroskedasticity, Non-Normality and Correlated Errors 161
Partial Residual Plots and Nonlinearity 164
Polynomial and Spline Regression 166
Polynomial 167
Splines 168
Generalized Additive Models 170
Further Reading 172
Summary 172
5. Classification 173
Naive Bayes 174
Why Exact Bayesian Classification Is Impractical 175
The Naive Solution 176
Numeric Predictor Variables 178
Further Reading 178
Discriminant Analysis 179
Covariance Matrix 180
Fisher's Linear Discriminant 180
A Simple Example 181
Further Reading 183
Logistic Regression 184
Logistic Response Function and Logit 184
Logistic Regression and the GLM 186
Generalized Linear Models 187
Predicted Values from Logistic Regression 188
Interpreting the Coefficients and Odds Ratios 188
Linear and Logistic Regression: Similarities and Differences 190
Assessing the Model 191
Further Reading 194
Evaluating Classification Models 194
Confusion Matrix 195
The Rare Class Problem 196
Precision, Recall, and Specificity 197
ROC Curve 198
AUC 200
Lift 201
Further Reading 202
Strategies for Imbalanced Data 203
Undersampling 204
Oversampling and Up/Down Weighting 204
Data Generation 205
Cost-Based Classification 206
Exploring the Predictions 206
Further Reading 208
Summary 208
6. Statistical Machine Learning 209
K-Nearest Neighbors 210
A Small Example: Predicting Loan Default 211
Distance Metrics 213
One Hot Encoder 214
Standardization (Normalization, Z-Scores) 215
Choosing K 217
KNN as a Feature Engine 218
Tree Models 219
A Simple Example 221
The Recursive Partitioning Algorithm 222
Measuring Homogeneity or Impurity 224
Stopping the Tree from Growing 225
Predicting a Continuous Value 227
How Trees Are Used 227
Further Reading 228
Bagging and the Random Forest 228
Bagging 230
Random Forest 230
Variable Importance 233
Hyperparameters 236
Boosting 237
The Boosting Algorithm 238
XGBoost 239
Regularization: Avoiding Overfitting 241
Hyperparameters and Cross-Validation 245
Summary 247
7. Unsupervised Learning
Principal Components Analysis 250
A Simple Example 251
Computing the Principal Components 254
Interpreting Principal Components 254
Further Reading 257
K-Means Clustering 257
A Simple Example 258
K-Means Algorithm 260
Interpreting the Clusters 261
Selecting the Number of Clusters 263
Hierarchical Clustering 265
A Simple Example 266
The Dendrogram 266
The Agglomerative Algorithm 268
Measures of Dissimilarity 268
Model-Based Clustering 270
Multivariate Normal Distribution 270
Mixtures of Normals 272
Selecting the Number of Clusters 274
Further Reading 276
Scaling and Categorical Variables 276
Scaling the Variables 277
Dominant Variables 278
Categorical Data and Gower's Distance 280
Problems with Clustering Mixed Data 283
Summary 284
Bibliography 285
Index 287
|
any_adam_object | 1 |
author | Bruce, Peter C. 1953- Bruce, Andrew |
author_GND | (DE-588)1104275260 (DE-588)170744183 |
author_facet | Bruce, Peter C. 1953- Bruce, Andrew |
author_role | aut aut |
author_sort | Bruce, Peter C. 1953- |
author_variant | p c b pc pcb a b ab |
building | Verbundindex |
bvnumber | BV043955725 |
classification_rvk | SK 850 ST 530 |
ctrlnum | (OCoLC)990292891 (DE-599)BVBBV043955725 |
discipline | Informatik Mathematik |
edition | First edition |
format | Book |
id | DE-604.BV043955725 |
illustrated | Illustrated |
indexdate | 2024-12-20T17:49:42Z |
institution | BVB |
isbn | 9781491952962 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-029364482 |
oclc_num | 990292891 |
open_access_boolean | |
owner | DE-355 DE-BY-UBR DE-1028 DE-706 DE-1050 DE-11 DE-739 DE-20 DE-384 DE-898 DE-BY-UBR DE-523 DE-1046 |
owner_facet | DE-355 DE-BY-UBR DE-1028 DE-706 DE-1050 DE-11 DE-739 DE-20 DE-384 DE-898 DE-BY-UBR DE-523 DE-1046 |
physical | xvi, 298 Seiten Illustrationen, Diagramme |
publishDate | 2017 |
publishDateSearch | 2017 |
publishDateSort | 2017 |
publisher | O'Reilly |
record_format | marc |
spellingShingle | Bruce, Peter C. 1953- Bruce, Andrew Practical statistics for data scientists 50 essential concepts Data Science (DE-588)1140936166 gnd Statistik (DE-588)4056995-0 gnd Data Mining (DE-588)4428654-5 gnd Datenanalyse (DE-588)4123037-1 gnd Big Data (DE-588)4802620-7 gnd |
subject_GND | (DE-588)1140936166 (DE-588)4056995-0 (DE-588)4428654-5 (DE-588)4123037-1 (DE-588)4802620-7 |
title | Practical statistics for data scientists 50 essential concepts |
title_auth | Practical statistics for data scientists 50 essential concepts |
title_exact_search | Practical statistics for data scientists 50 essential concepts |
title_full | Practical statistics for data scientists 50 essential concepts Peter Bruce and Andrew Bruce |
title_fullStr | Practical statistics for data scientists 50 essential concepts Peter Bruce and Andrew Bruce |
title_full_unstemmed | Practical statistics for data scientists 50 essential concepts Peter Bruce and Andrew Bruce |
title_short | Practical statistics for data scientists |
title_sort | practical statistics for data scientists 50 essential concepts |
title_sub | 50 essential concepts |
topic | Data Science (DE-588)1140936166 gnd Statistik (DE-588)4056995-0 gnd Data Mining (DE-588)4428654-5 gnd Datenanalyse (DE-588)4123037-1 gnd Big Data (DE-588)4802620-7 gnd |
topic_facet | Data Science Statistik Data Mining Datenanalyse Big Data |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=029364482&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT brucepeterc practicalstatisticsfordatascientists50essentialconcepts AT bruceandrew practicalstatisticsfordatascientists50essentialconcepts |