Data science at the command line
Contributor: | Janssens, Jeroen, 1983- (author) |
---|---|
Format: | Book |
Language: | English |
Published: | Beijing ; Boston ; Farnham : O'Reilly, August 2021 |
Edition: | Second edition |
Subjects: | Befehlszeile ; Big Data ; Datenanalyse ; Data Science |
Links: | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=035217605&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=035217605&sequence=000003&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA |
Description: | From the imprint: "The author maintains an online version at https://github.com/jeroenjanssens/data-science-at-the-command-line." |
Extent: | xxii, 257 pages : illustrations, diagrams |
ISBN: | 9781492087915 |
Internal format
MARC
LEADER | 00000nam a2200000 c 4500 | ||
---|---|---|---|
001 | BV049878175 | ||
003 | DE-604 | ||
005 | 20241120 | ||
007 | t| | ||
008 | 240919s2021 xxua||| |||| 00||| eng d | ||
020 | |a 9781492087915 |c pbk |9 978-1-4920-8791-5 | ||
035 | |a (OCoLC)1466908111 | ||
035 | |a (DE-599)BVBBV049878175 | ||
040 | |a DE-604 |b ger |e rda | ||
041 | 0 | |a eng | |
044 | |a xxu |c XD-US | ||
049 | |a DE-384 | ||
084 | |a ST 530 |0 (DE-625)143679: |2 rvk | ||
084 | |a XC 4011 |0 (DE-625)152514:12917 |2 rvk | ||
084 | |8 1\p |a 004 |2 23sdnb | ||
100 | 1 | |a Janssens, Jeroen |d 1983- |e Verfasser |0 (DE-588)1060854597 |4 aut | |
245 | 1 | 0 | |a Data science at the command line |c Jeroen Janssens |
250 | |a Second edition | ||
264 | 1 | |a Beijing ; Boston ; Farnham |b O'Reilly |c August 2021 | |
300 | |a xxii, 257 Seiten |b Illustrationen, Diagramme | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
500 | |a Aus dem Impressum: "The author maintains an online version at https://github.com/jeroenjanssens/data-science-at-the-command-line." | ||
650 | 0 | 7 | |a Befehlszeile |0 (DE-588)7518108-3 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Big Data |0 (DE-588)4802620-7 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Datenanalyse |0 (DE-588)4123037-1 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Data Science |0 (DE-588)1140936166 |2 gnd |9 rswk-swf |
689 | 0 | 0 | |a Data Science |0 (DE-588)1140936166 |D s |
689 | 0 | 1 | |a Befehlszeile |0 (DE-588)7518108-3 |D s |
689 | 0 | |5 DE-604 | |
689 | 1 | 0 | |a Big Data |0 (DE-588)4802620-7 |D s |
689 | 1 | 1 | |a Datenanalyse |0 (DE-588)4123037-1 |D s |
689 | 1 | |5 DE-604 | |
776 | 0 | 8 | |i Erscheint auch als |n Online-Ausgabe |z 978-1-492-08788-5 |
856 | 4 | 2 | |m Digitalisierung UB Augsburg - ADAM Catalogue Enrichment |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=035217605&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
856 | 4 | 2 | |m Digitalisierung UB Augsburg - ADAM Catalogue Enrichment |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=035217605&sequence=000003&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA |3 Klappentext |
883 | 1 | |8 1\p |a onx |d 20140820 |q DE-101 |u https://d-nb.info/provenance/plan#onx | |
943 | 1 | |a oai:aleph.bib-bvb.de:BVB01-035217605 |
Record in the search index
_version_ | 1820961176954077184 |
---|---|
adam_text |
Table of Contents
Foreword
Preface
1. Introduction: Data Science Is OSEMN; Obtaining Data; Scrubbing Data; Exploring Data; Modeling Data; Interpreting Data; Intermezzo Chapters; What Is the Command Line?; Why Data Science at the Command Line?; The Command Line Is Agile; The Command Line Is Augmenting; The Command Line Is Scalable; The Command Line Is Extensible; The Command Line Is Ubiquitous; Summary; For Further Exploration
2. Getting Started: Getting the Data; Installing the Docker Image; Essential Unix Concepts; The Environment; Executing a Command-Line Tool; Five Types of Command-Line Tools; Combining Command-Line Tools; Redirecting Input and Output; Working with Files and Directories; Managing Output; Help!; Summary; For Further Exploration
3. Obtaining Data: Overview; Copying Local Files to the Docker Container; Downloading from the Internet; Introducing curl; Saving; Other Protocols; Following Redirects; Decompressing Files; Converting Microsoft Excel Spreadsheets to CSV; Querying Relational Databases; Calling Web APIs; Authentication; Streaming APIs; Summary; For Further Exploration
4. Creating Command-Line Tools: Overview; Converting One-Liners into Shell Scripts; Step 1: Create a File; Step 2: Give Permission to Execute; Step 3: Define a Shebang; Step 4: Remove the Fixed Input; Step 5: Add Arguments; Step 6: Extend Your PATH; Creating Command-Line Tools with Python and R; Porting the Shell Script; Processing Streaming Data from Standard Input; Summary; For Further Exploration
5. Scrubbing Data: Overview; Transformations, Transformations Everywhere; Plain Text; Filtering Lines; Extracting Values; Replacing and Deleting Values; CSV; Bodies and Headers and Columns, Oh My!; Performing SQL Queries on CSV; Extracting and Reordering Columns; Filtering Rows; Merging Columns; Combining Multiple CSV Files; Working with XML/HTML and JSON; Summary; For Further Exploration
6. Project Management with Make: Overview; Introducing Make; Running Tasks; Building, for Real; Adding Dependencies; Summary; For Further Exploration
7. Exploring Data: Overview; Inspecting Data and Its Properties; Header or Not, Here I Come; Inspect All the Data; Feature Names and Data Types; Unique Identifiers, Continuous Variables, and Factors; Computing Descriptive Statistics; Column Statistics; R One-Liners on the Shell; Creating Visualizations; Displaying Images from the Command Line; Plotting in a Rush; Creating Bar Charts; Creating Histograms; Creating Density Plots; Happy Little Accidents; Creating Scatter Plots; Creating Trend Lines; Creating Box Plots; Adding Labels; Going Beyond Basic Plots; Summary; For Further Exploration
8. Parallel Pipelines: Overview; Serial Processing; Looping Over Numbers; Looping Over Lines; Looping Over Files; Parallel Processing; Introducing GNU Parallel; Specifying Input; Controlling the Number of Concurrent Jobs; Logging and Output; Creating Parallel Tools; Distributed Processing; Get List of Running AWS EC2 Instances; Running Commands on Remote Machines; Distributing Local Data Among Remote Machines; Processing Files on Remote Machines; Summary; For Further Exploration
9. Modeling Data: Overview; More Wine, Please!; Dimensionality Reduction with Tapkee; Introducing Tapkee; Linear and Nonlinear Mappings; Regression with Vowpal Wabbit; Preparing the Data; Training the Model; Testing the Model; Classification with SciKit-Learn Laboratory; Preparing the Data; Running the Experiment; Parsing the Results; Summary; For Further Exploration
10. Polyglot Data Science: Overview; Jupyter; Python; R; RStudio; Apache Spark; Summary; For Further Exploration
11. Conclusion: Let’s Recap; Three Pieces of Advice; Be Patient; Be Creative; Be Practical; Where to Go from Here; The Command Line; Shell Programming; Python, R, and SQL; APIs; Machine Learning; Getting in Touch
List of Command-Line Tools
Index
O'Reilly
Data Science at the Command Line
This thoroughly revised guide demonstrates how the flexibility of the command line can help you become a more efficient and productive data scientist. You’ll learn how to combine small yet powerful command-line tools to quickly obtain, scrub, explore, and model your data. To get you started, author Jeroen Janssens provides a Docker image packed with over 100 Unix power tools, useful whether you work with Windows, macOS, or Linux.
You'll quickly discover why the command line is an agile, scalable, and extensible technology. Even if you're comfortable processing data with Python or R, you'll learn how to greatly improve your data science workflow by leveraging the command line's power. This book is ideal for data scientists, analysts, engineers, system administrators, and researchers.
• Obtain data from websites, APIs, databases, and spreadsheets
• Perform scrub operations on text, CSV, HTML, XML, and JSON files
• Explore data, compute descriptive statistics, and create visualizations
• Manage your data science workflow
• Create your own tools from one-liners and existing Python or R code
• Parallelize and distribute data-intensive pipelines
• Model data with dimensionality reduction, regression, and classification algorithms
• Leverage the command line from Python, Jupyter, R, RStudio, and Apache Spark
"The first edition of Data Science at the Command Line was one of the most comprehensive and clear references when I was a novice in the art, and now with the second edition, I'm again learning new tools and applications from it."
-Dan Nguyen, Data Scientist, former News Application Developer at ProPublica, and former Lorry I. Lokey Visiting Professor in Professional Journalism at Stanford University
Jeroen Janssens runs Data Science Workshops, a training and coaching firm that organizes in-company courses, inspiration sessions, and hackathons, both in person and online. Previously, he was an assistant professor at Jheronimus Academy of Data Science and a data scientist at Elsevier in Amsterdam and various startups in New York City. Jeroen holds a PhD in machine learning from Tilburg University and an MSc in artificial intelligence from Maastricht University. |
any_adam_object | 1 |
author | Janssens, Jeroen 1983- |
author_GND | (DE-588)1060854597 |
author_facet | Janssens, Jeroen 1983- |
author_role | aut |
author_sort | Janssens, Jeroen 1983- |
author_variant | j j jj |
building | Verbundindex |
bvnumber | BV049878175 |
classification_rvk | ST 530 XC 4011 |
ctrlnum | (OCoLC)1466908111 (DE-599)BVBBV049878175 |
discipline | Informatik Medizin |
edition | Second edition |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>00000nam a2200000 c 4500</leader><controlfield tag="001">BV049878175</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20241120</controlfield><controlfield tag="007">t|</controlfield><controlfield tag="008">240919s2021 xxua||| |||| 00||| eng d</controlfield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9781492087915</subfield><subfield code="c">pbk</subfield><subfield code="9">978-1-4920-8791-5</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)1466908111</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV049878175</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">rda</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="044" ind1=" " ind2=" "><subfield code="a">xxu</subfield><subfield code="c">XD-US</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-384</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 530</subfield><subfield code="0">(DE-625)143679:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">XC 4011</subfield><subfield code="0">(DE-625)152514:12917</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="8">1\p</subfield><subfield code="a">004</subfield><subfield code="2">23sdnb</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Janssens, Jeroen</subfield><subfield code="d">1983-</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)1060854597</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" 
ind2="0"><subfield code="a">Data science at the command line</subfield><subfield code="c">Jeroen Janssens</subfield></datafield><datafield tag="250" ind1=" " ind2=" "><subfield code="a">Second edition</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Beijing ; Boston ; Farnham</subfield><subfield code="b">O'Reilly</subfield><subfield code="c">August 2021</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">xxii, 257 Seiten</subfield><subfield code="b">Illustrationen, Diagramme</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">Aus dem Impressum: "The author maintains an online version at https://github.com/jeroenjanssens/data-science-at-the-command-line."</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Befehlszeile</subfield><subfield code="0">(DE-588)7518108-3</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Big Data</subfield><subfield code="0">(DE-588)4802620-7</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Datenanalyse</subfield><subfield code="0">(DE-588)4123037-1</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Data Science</subfield><subfield code="0">(DE-588)1140936166</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield 
tag="689" ind1="0" ind2="0"><subfield code="a">Data Science</subfield><subfield code="0">(DE-588)1140936166</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="1"><subfield code="a">Befehlszeile</subfield><subfield code="0">(DE-588)7518108-3</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="689" ind1="1" ind2="0"><subfield code="a">Big Data</subfield><subfield code="0">(DE-588)4802620-7</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="1" ind2="1"><subfield code="a">Datenanalyse</subfield><subfield code="0">(DE-588)4123037-1</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="1" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="776" ind1="0" ind2="8"><subfield code="i">Erscheint auch als</subfield><subfield code="n">Online-Ausgabe</subfield><subfield code="z">978-1-492-08788-5</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Augsburg - ADAM Catalogue Enrichment</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=035217605&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Augsburg - ADAM Catalogue Enrichment</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=035217605&sequence=000003&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Klappentext</subfield></datafield><datafield tag="883" ind1="1" ind2=" "><subfield code="8">1\p</subfield><subfield 
code="a">onx</subfield><subfield code="d">20140820</subfield><subfield code="q">DE-101</subfield><subfield code="u">https://d-nb.info/provenance/plan#onx</subfield></datafield><datafield tag="943" ind1="1" ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-035217605</subfield></datafield></record></collection> |
id | DE-604.BV049878175 |
illustrated | Illustrated |
indexdate | 2025-01-11T13:58:25Z |
institution | BVB |
isbn | 9781492087915 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-035217605 |
oclc_num | 1466908111 |
open_access_boolean | |
owner | DE-384 |
owner_facet | DE-384 |
physical | xxii, 257 Seiten Illustrationen, Diagramme |
publishDate | 2021 |
publishDateSearch | 2021 |
publishDateSort | 2021 |
publisher | O'Reilly |
record_format | marc |
spelling | Janssens, Jeroen 1983- Verfasser (DE-588)1060854597 aut Data science at the command line Jeroen Janssens Second edition Beijing ; Boston ; Farnham O'Reilly August 2021 xxii, 257 Seiten Illustrationen, Diagramme txt rdacontent n rdamedia nc rdacarrier Aus dem Impressum: "The author maintains an online version at https://github.com/jeroenjanssens/data-science-at-the-command-line." Befehlszeile (DE-588)7518108-3 gnd rswk-swf Big Data (DE-588)4802620-7 gnd rswk-swf Datenanalyse (DE-588)4123037-1 gnd rswk-swf Data Science (DE-588)1140936166 gnd rswk-swf Data Science (DE-588)1140936166 s Befehlszeile (DE-588)7518108-3 s DE-604 Big Data (DE-588)4802620-7 s Datenanalyse (DE-588)4123037-1 s Erscheint auch als Online-Ausgabe 978-1-492-08788-5 Digitalisierung UB Augsburg - ADAM Catalogue Enrichment application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=035217605&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis Digitalisierung UB Augsburg - ADAM Catalogue Enrichment application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=035217605&sequence=000003&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA Klappentext 1\p onx 20140820 DE-101 https://d-nb.info/provenance/plan#onx |
spellingShingle | Janssens, Jeroen 1983- Data science at the command line Befehlszeile (DE-588)7518108-3 gnd Big Data (DE-588)4802620-7 gnd Datenanalyse (DE-588)4123037-1 gnd Data Science (DE-588)1140936166 gnd |
subject_GND | (DE-588)7518108-3 (DE-588)4802620-7 (DE-588)4123037-1 (DE-588)1140936166 |
title | Data science at the command line |
title_auth | Data science at the command line |
title_exact_search | Data science at the command line |
title_full | Data science at the command line Jeroen Janssens |
title_fullStr | Data science at the command line Jeroen Janssens |
title_full_unstemmed | Data science at the command line Jeroen Janssens |
title_short | Data science at the command line |
title_sort | data science at the command line |
topic | Befehlszeile (DE-588)7518108-3 gnd Big Data (DE-588)4802620-7 gnd Datenanalyse (DE-588)4123037-1 gnd Data Science (DE-588)1140936166 gnd |
topic_facet | Befehlszeile Big Data Datenanalyse Data Science |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=035217605&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=035217605&sequence=000003&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT janssensjeroen datascienceatthecommandline |