Principles and methods for data science:
Other contributors: | Rao, Arni S. R. Srinivasa (editor), Rao, Calyampudi Radhakrishna (editor) |
---|---|
Format: | Book |
Language: | English |
Published: | Amsterdam : North-Holland, [2020] |
Series: | Handbook of statistics, volume 43 |
Subjects: | Data Science, Prinzip, Methode, Aufsatzsammlung |
Links: | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=032201484&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
Extent: | xvii, 478 pages, illustrations, diagrams, 24 cm |
ISBN: | 9780444642110 0444642110 |
Internal format
MARC
LEADER | 00000nam a2200000 cb4500 | ||
---|---|---|---|
001 | BV046792552 | ||
003 | DE-604 | ||
005 | 20200721 | ||
007 | t| | ||
008 | 200703s2020 xx a||| |||| 00||| eng d | ||
020 | |a 9780444642110 |9 978-0-444-64211-0 | ||
020 | |a 0444642110 |9 0-444-64211-0 | ||
035 | |a (OCoLC)1164659551 | ||
035 | |a (DE-599)BVBBV046792552 | ||
040 | |a DE-604 |b ger |e rda | ||
041 | 0 | |a eng | |
049 | |a DE-739 |a DE-703 |a DE-473 |a DE-384 |a DE-210 | ||
084 | |a SK 840 |0 (DE-625)143261: |2 rvk | ||
245 | 1 | 0 | |a Principles and methods for data science |c edited by Arni S.R. Srinivasa Rao, C.R. Rao |
264 | 1 | |a Amsterdam |b North-Holland |c [2020] | |
264 | 4 | |c © 2020 | |
300 | |a xvii, 478 Seiten |b Illustrationen, Diagramme |c 24 cm | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
490 | 1 | |a Handbook of statistics |v volume 43 | |
650 | 0 | 7 | |a Methode |0 (DE-588)4038971-6 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Prinzip |0 (DE-588)4175725-7 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Data Science |0 (DE-588)1140936166 |2 gnd |9 rswk-swf |
655 | 7 | |0 (DE-588)4143413-4 |a Aufsatzsammlung |2 gnd-content | |
689 | 0 | 0 | |a Data Science |0 (DE-588)1140936166 |D s |
689 | 0 | 1 | |a Prinzip |0 (DE-588)4175725-7 |D s |
689 | 0 | 2 | |a Methode |0 (DE-588)4038971-6 |D s |
689 | 0 | |5 DE-604 | |
700 | 1 | |a Rao, Arni S. R. Srinivasa |0 (DE-588)1143256220 |4 edt | |
700 | 1 | |a Rao, Calyampudi Radhakrishna |d 1920-2023 |0 (DE-588)119285924 |4 edt | |
830 | 0 | |a Handbook of statistics |v volume 43 |w (DE-604)BV000002510 |9 43 | |
856 | 4 | 2 | |m Digitalisierung UB Passau - ADAM Catalogue Enrichment |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=032201484&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
943 | 1 | |a oai:aleph.bib-bvb.de:BVB01-032201484 |
Record in the search index
_version_ | 1819321823051907072 |
---|---|
adam_text | Contents
Contributors
Preface

1. Markov chain Monte Carlo methods: Theory and practice
David A. Spade
  1 Introduction
  2 Introduction to Bayesian statistical analysis
    2.1 Noninformative prior distributions
    2.2 Informative prior distributions
    2.3 Bayesian estimation
  3 Markov chain Monte Carlo background
    3.1 Discrete-state Markov chains
    3.2 General state space Markov chain theory
  4 Common MCMC algorithms
    4.1 The Metropolis-Hastings algorithm
    4.2 Multivariate Metropolis-Hastings
    4.3 The Gibbs sampler
    4.4 Slice sampling
    4.5 Reversible jump MCMC
  5 Markov chain Monte Carlo in practice
    5.1 MCMC in regression models
    5.2 Random effects models
    5.3 Bayesian generalized linear models
    5.4 Hierarchical models
  6 Assessing Markov chain behavior
    6.1 Using the theory to bound the mixing time
    6.2 Output-based convergence diagnostics
    6.3 Using auxiliary simulations to bound mixing time
    6.4 Examining sampling frequency
  7 Conclusion
  References
  Further reading

2. An information and statistical analysis pipeline for microbial metagenomic sequencing data
Shinji Nakaoka and Keisuke H. Ota
  1 Introduction
  2 A brief overview of shotgun metagenomic sequencing analysis
    2.1 Sequence assembly and contig binning
    2.2 Annotation of taxonomy, protein, metabolic, and biological functions
    2.3 Statistical analysis and machine learning
    2.4 Reconstruction of pseudo-dynamics and mathematical modeling
    2.5 Construction of analysis pipeline with reproducibility and portability
  3 Computational tools and resources
    3.1 Tools and software
    3.2 Public resources and databases
    3.3 Do-It-Yourself information analysis pipeline for metagenomic sequences
  4 Notes
  Acknowledgments
  References

3. Machine learning algorithms, applications, and practices in data science
Kalidas Yeturu
  1 Introduction
  2 Supervised methods
    2.1 Data sets
    2.2 Linear regression
    2.3 Logistic regression
    2.4 Support vector machine—Linear kernel
    2.5 Decision tree
        Outline of decision tree
    2.6 Ensemble methods
    2.7 Bias-variance trade off
    2.8 Cross validation and model selection
    2.9 Multiclass and multivariate scenarios
    2.10 Regularization
    2.11 Metrics in machine learning
  3 Practical considerations in model building
    3.1 Noise in the data
    3.2 Missing values
    3.3 Class imbalance
    3.4 Model maintenance
  4 Unsupervised methods
    4.1 Clustering
    4.2 Comparison of clustering algorithms over data sets
    4.3 Matrix factorization
    4.4 Principal component analysis
    4.5 Understanding the SVD algorithm
    4.6 Data distributions and visualization
  5 Graphical methods
    5.1 Naive Bayes algorithm
    5.2 Expectation maximization
        Example of email spam and nonspam problem—Posing as graphical model
    5.3 Markovian networks
        Topic modeling of audio data
        Topic modeling of image data
  6 Deep learning
    6.1 Neural network
    6.2 Encoder
    6.3 Convolutional neural network
    6.4 Recurrent neural network
    6.5 Generative adversarial network
  7 Optimization
  8 Artificial intelligence
    8.1 Notion of state space and search
    8.2 State space—Search algorithms
    8.3 Planning algorithms
    8.4 Formal logic
    8.5 Resolution by refutation method
    8.6 AI framework adaptability issues
  9 Applications and laboratory exercises
    9.1 Automatic differentiation
    9.2 Machine learning exercises
    9.3 Clustering exercises
    9.4 Graphical model exercises
    9.5 Data visualization exercises
    9.6 Deep learning exercises
  References

4. Bayesian model selection for high-dimensional data
Naveen Naidu Narisetty
  1 Introduction
  2 Classical variable selection methods
    2.1 Best subset selection
    2.2 Stepwise selection methods
    2.3 Criterion functions
  3 The penalization framework
    3.1 LASSO and generalizations
    3.2 Nonconvex penalization
    3.3 Variable screening
  4 The Bayesian framework for model selection
  5 Spike and slab priors
    5.1 Point mass spike prior
    5.2 Continuous spike priors
    5.3 Spike and slab LASSO
  6 Continuous shrinkage priors
    6.1 Bayesian LASSO
    6.2 Horseshoe prior
    6.3 Global-local shrinkage priors
    6.4 Regularization of Bayesian priors
    6.5 Prior elicitation—Hyperparameter selection
  7 Computation
    7.1 Direct exploration of the model space
    7.2 Gibbs sampling
    7.3 EM algorithm
    7.4 Approximate algorithms
  8 Theoretical properties
    8.1 Consistency properties of the posterior mode
    8.2 Posterior concentration
    8.3 Pairwise model comparison consistency
    8.4 Strong model selection consistency
  9 Implementation
  10 An example
  11 Discussion
  Acknowledgments
  References

5. Competing risks: Aims and methods
Ronald B. Geskus
  1 Introduction
  2 Research aim: Explanation vs prediction
    2.1 In-hospital infection and discharge
    2.2 Causes of death after HIV infection
    2.3 AIDS and pre-AIDS death
  3 Basic quantities and their estimators
    3.1 Definitions and notation
    3.2 Data setup
    3.3 Nonparametric estimation
    3.4 Standard errors and confidence intervals
    3.5 Regression models
    3.6 Software
  4 Time-varying covariables and the subdistribution hazard
    4.1 Overall survival
    4.2 Spectrum in causes of death
    4.3 Summary
  5 Confusion
    5.1 What is the appropriate analysis?
    5.2 Is a marginal analysis feasible in practice?
    5.3 If we fit a Cox model, do we need to assume that the competing risks are independent?
    5.4 Is a regression model for the subdistribution hazard (such as a Fine and Gray model) the only truly competing risks regression model?
    5.5 Is the subdistribution hazard a quantity that can be given an interpretation?
  Acknowledgment
  References

6. High-dimensional statistical inference: Theoretical development to data analytics
Deepak Nag Ayyala
  1 Introduction
  2 Mean vector testing
    2.1 Independent observations
    2.2 Projection-based tests
    2.3 Random projections
    2.4 Other approaches
    2.5 Dependent observations
  3 Covariance matrix
    3.1 Estimation
    3.2 Hypothesis testing
  4 Discrete multivariate models
    4.1 Multinomial distribution
    4.2 Compound multinomial models
    4.3 Other distributions
  5 Conclusion
  References

7. Big data challenges in genomics
Hongyan Xu
  1 Introduction
  2 Next-generation sequencing
  3 Data integration
  4 High dimensionality
  5 Computing infrastructure
  6 Dimension reduction
  7 Data smoothing
  8 Data security
  9 Example
  10 Conclusion
  References

8. Analysis of microarray gene expression data using information theory and stochastic algorithm
Narayan Behera
  1 Introduction
    1.1 Gene clustering algorithms
  2 Methodology
    2.1 Discretization
    2.2 Genetic algorithm
    2.3 The evolutionary clustering algorithm
  3 Results
    3.1 Synthetic data
    3.2 Real data
  4 Section A: Studies on gastric cancer dataset (GDS1210)
    4.1 Comparison of the algorithms based on the classification accuracy of samples
    4.2 Analysis of classificatory genes
    4.3 Comparison of algorithms based on the representative genes
    4.4 Study of gene distribution in clusters
  5 Section B: A brief study on colon cancer dataset
    5.1 Comparison of the algorithms based on classification accuracy
  6 Section C: A brief study on brain cancer (medulloblastoma metastasis) dataset (GDS232)
    6.1 Comparison of the algorithms based on the classification accuracy
  7 Conclusion
  Appendices
    Appendix A: A brief overview of the OCDD algorithm
    Appendix B: Smoothing and Chi-square test method
  References
  Further reading

9. Human life expectancy is computed from an incomplete sets of data: Modeling and analysis
Arni S.R. Srinivasa Rao and James R. Carey
  1 Introduction
  2 Life expectancy of newly born babies
  3 Numerical examples
  Acknowledgments
  Appendix. Analysis of the life expectancy function
  References

10. Support vector machines: A robust prediction method with applications in bioinformatics
Arnout Van Messem
  1 Introduction
  2 Mathematical prerequisites
    2.1 Topology
    2.2 Probability and measure theory
    2.3 Functional and convex analysis
    2.4 Derivatives in normed spaces
    2.5 (†) Convex programs, Lagrange multipliers and duality
  3 An introduction to support vector machines
    3.1 (†) The generalized portrait algorithm
    3.2 (†) The hard margin SVM
    3.3 (†) The soft margin SVM
    3.4 (†) Empirical risk minimization and support vector machines
    3.5 Kernels and the reproducing kernel Hilbert space
    3.6 (†) Loss functions
    3.7 Bouligand-derivatives of loss functions
    3.8 Shifting the loss function
  4 (†) An introduction to robustness and influence functions
  5 Properties of SVMs
    5.1 Existence, uniqueness and consistency of SVMs
    5.2 Robustness of SVMs
  6 (†) Applications
    6.1 Predicting blood pressure through BMI in the presence of outliers
    6.2 Breast cancer distant metastasis through gene expression
    6.3 Splice site detection
  References

Index
|
any_adam_object | 1 |
author2 | Rao, Arni S. R. Srinivasa Rao, Calyampudi Radhakrishna 1920-2023 |
author2_role | edt edt |
author2_variant | a s r s r asrs asrsr c r r cr crr |
author_GND | (DE-588)1143256220 (DE-588)119285924 |
author_facet | Rao, Arni S. R. Srinivasa Rao, Calyampudi Radhakrishna 1920-2023 |
building | Verbundindex |
bvnumber | BV046792552 |
classification_rvk | SK 840 |
ctrlnum | (OCoLC)1164659551 (DE-599)BVBBV046792552 |
discipline | Mathematik |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01832nam a2200433 cb4500</leader><controlfield tag="001">BV046792552</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20200721 </controlfield><controlfield tag="007">t|</controlfield><controlfield tag="008">200703s2020 xx a||| |||| 00||| eng d</controlfield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9780444642110</subfield><subfield code="9">978-0-444-64211-0</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">0444642110</subfield><subfield code="9">0-444-64211-0</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)1164659551</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV046792552</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">rda</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-739</subfield><subfield code="a">DE-703</subfield><subfield code="a">DE-473</subfield><subfield code="a">DE-384</subfield><subfield code="a">DE-210</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">SK 840</subfield><subfield code="0">(DE-625)143261:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Principles and methods for data science</subfield><subfield code="c">edited by Arni S.R. Srinivasa Rao, C.R. 
Rao</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Amsterdam</subfield><subfield code="b">North-Holland</subfield><subfield code="c">[2020]</subfield></datafield><datafield tag="264" ind1=" " ind2="4"><subfield code="c">© 2020</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">xvii, 478 Seiten</subfield><subfield code="b">Illustrationen, Diagramme</subfield><subfield code="c">24 cm</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="490" ind1="1" ind2=" "><subfield code="a">Handbook of statistics</subfield><subfield code="v">volume 43</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Methode</subfield><subfield code="0">(DE-588)4038971-6</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Prinzip</subfield><subfield code="0">(DE-588)4175725-7</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Data Science</subfield><subfield code="0">(DE-588)1140936166</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="655" ind1=" " ind2="7"><subfield code="0">(DE-588)4143413-4</subfield><subfield code="a">Aufsatzsammlung</subfield><subfield code="2">gnd-content</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Data Science</subfield><subfield code="0">(DE-588)1140936166</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" 
ind2="1"><subfield code="a">Prinzip</subfield><subfield code="0">(DE-588)4175725-7</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="2"><subfield code="a">Methode</subfield><subfield code="0">(DE-588)4038971-6</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Rao, Arni S. R. Srinivasa</subfield><subfield code="0">(DE-588)1143256220</subfield><subfield code="4">edt</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Rao, Calyampudi Radhakrishna</subfield><subfield code="d">1920-2023</subfield><subfield code="0">(DE-588)119285924</subfield><subfield code="4">edt</subfield></datafield><datafield tag="830" ind1=" " ind2="0"><subfield code="a">Handbook of statistics</subfield><subfield code="v">volume 43</subfield><subfield code="w">(DE-604)BV000002510</subfield><subfield code="9">43</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Passau - ADAM Catalogue Enrichment</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=032201484&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="943" ind1="1" ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-032201484</subfield></datafield></record></collection> |
genre | (DE-588)4143413-4 Aufsatzsammlung gnd-content |
genre_facet | Aufsatzsammlung |
id | DE-604.BV046792552 |
illustrated | Illustrated |
indexdate | 2024-12-20T19:01:08Z |
institution | BVB |
isbn | 9780444642110 0444642110 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-032201484 |
oclc_num | 1164659551 |
open_access_boolean | |
owner | DE-739 DE-703 DE-473 DE-BY-UBG DE-384 DE-210 |
owner_facet | DE-739 DE-703 DE-473 DE-BY-UBG DE-384 DE-210 |
physical | xvii, 478 Seiten Illustrationen, Diagramme 24 cm |
publishDate | 2020 |
publishDateSearch | 2020 |
publishDateSort | 2020 |
publisher | North-Holland |
record_format | marc |
series | Handbook of statistics |
series2 | Handbook of statistics |
spellingShingle | Principles and methods for data science Handbook of statistics Methode (DE-588)4038971-6 gnd Prinzip (DE-588)4175725-7 gnd Data Science (DE-588)1140936166 gnd |
subject_GND | (DE-588)4038971-6 (DE-588)4175725-7 (DE-588)1140936166 (DE-588)4143413-4 |
title | Principles and methods for data science |
title_auth | Principles and methods for data science |
title_exact_search | Principles and methods for data science |
title_full | Principles and methods for data science edited by Arni S.R. Srinivasa Rao, C.R. Rao |
title_fullStr | Principles and methods for data science edited by Arni S.R. Srinivasa Rao, C.R. Rao |
title_full_unstemmed | Principles and methods for data science edited by Arni S.R. Srinivasa Rao, C.R. Rao |
title_short | Principles and methods for data science |
title_sort | principles and methods for data science |
topic | Methode (DE-588)4038971-6 gnd Prinzip (DE-588)4175725-7 gnd Data Science (DE-588)1140936166 gnd |
topic_facet | Methode Prinzip Data Science Aufsatzsammlung |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=032201484&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
volume_link | (DE-604)BV000002510 |
work_keys_str_mv | AT raoarnisrsrinivasa principlesandmethodsfordatascience AT raocalyampudiradhakrishna principlesandmethodsfordatascience |