Availability: Cleaning data for effective data science

Cleaning data for effective data science: doing the other 80% of the work with Python, R, and command-line tools

Bibliographic details
Contributor:	Mertz, David (Author)
Format:	Electronic e-book
Language:	English
Published:	[Place of publication not identified] Packt Publishing Limited 2021
Subjects:	Computational biology > Methods | Database management | Data integrity | Python (Computer program language) | R (Computer program language) | Computational Biology > methods | Data Analysis | Data Accuracy | Python (Programming language) | Bases de données ; Gestion | Intégrité des données | Python (Langage de programmation) | R (Langage de programmation) | Qualité des données | Database design & theory | Data capture & analysis | Mathematical theory of computation | Machine learning | Information architecture | Computers ; Data Processing | Computers ; Machine Theory | Computers ; Data Modeling & Design | Computational biology | Fulltext | Internet Resources | Methods (Music)
Links:	https://learning.oreilly.com/library/view/-/9781801071291/?ar
Summary:	A comprehensive guide for data scientists to master effective data cleaning tools and techniques.
Key Features: Master data cleaning techniques in a language-agnostic manner; learn from intriguing hands-on examples from numerous domains, such as biology, weather data, demographics, physics, time series, and image processing; work with detailed, commented, well-tested code samples in Python and R.
Book Description: It is something of a truism in data science, data analysis, or machine learning that most of the effort needed to achieve your actual purpose lies in cleaning your data. Written in David's signature friendly and humorous style, this book discusses in detail the essential steps performed in every production data science or data analysis pipeline and prepares you for data visualization and modeling results. The book dives into the practical application of tools and techniques needed for data ingestion, anomaly detection, value imputation, and feature engineering. It also offers long-form exercises at the end of each chapter to practice the skills acquired. You will begin by looking at data ingestion of data formats such as JSON, CSV, SQL RDBMSes, HDF5, NoSQL databases, files in image formats, and binary serialized data structures. Further, the book provides numerous example data sets and data files, which are available for download and independent exploration. Moving on from formats, you will impute missing values, detect unreliable data and statistical anomalies, and generate synthetic features that are necessary for successful data analysis and visualization goals. By the end of this book, you will have acquired a firm understanding of the data cleaning process necessary to perform real-world data science and machine learning tasks.
What you will learn: How to think carefully about your data and ask the right questions; identify problem data pertaining to individual data points; detect problem data in the systematic "shape" of the data; remediate data integrity and hygiene problems; prepare data for analytic and machine learning tasks; impute values into missing or unreliable data; generate synthetic features that are more amenable to data science, data analysis, or visualization goals.
Who this book is for: This book is designed to benefit software developers, data scientists, aspiring data scientists, and students who are interested in data analysis or scientific computing. Basic familiarity with statistics, general concepts in machine learning,...
Extent:	1 online resource
ISBN:	9781801074407 1801074402 9781801071291

Internal format

MARC


LEADER	00000cam a22000002 4500
001	ZDB-30-ORH-063078791
003	DE-627-1
005	20240228121336.0
007	cr uuu---uuuuu
008	210427s2021 xx |||||o 00| ||eng c
020			|a 9781801074407 |c electronic bk. |9 978-1-80107-440-7
020			|a 1801074402 |c electronic bk. |9 1-80107-440-2
020			|a 9781801071291 |9 978-1-80107-129-1
035			|a (DE-627-1)063078791
035			|a (DE-599)KEP063078791
035			|a (ORHE)9781801071291
035			|a (DE-627-1)063078791
040			|a DE-627 |b ger |c DE-627 |e rda
041			|a eng
082	0		|a 005.7 |2 23
100	1		|a Mertz, David |e VerfasserIn |4 aut
245	1	0	|a Cleaning data for effective data science |b doing the other 80% of the work with Python, R, and command-line tools |c David Mertz
264		1	|a [Erscheinungsort nicht ermittelbar] |b Packt Publishing Limited |c 2021
300			|a 1 Online-Ressource
336			|a Text |b txt |2 rdacontent
337			|a Computermedien |b c |2 rdamedia
338			|a Online-Ressource |b cr |2 rdacarrier
520			|a A comprehensive guide for data scientists to master effective data cleaning tools and techniques Key Features Master data cleaning techniques in a language-agnostic manner Learn from intriguing hands-on examples from numerous domains, such as biology, weather data, demographics, physics, time series, and image processing Work with detailed, commented, well-tested code samples in Python and R Book Description It is something of a truism in data science, data analysis, or machine learning that most of the effort needed to achieve your actual purpose lies in cleaning your data. Written in David's signature friendly and humorous style, this book discusses in detail the essential steps performed in every production data science or data analysis pipeline and prepares you for data visualization and modeling results. The book dives into the practical application of tools and techniques needed for data ingestion, anomaly detection, value imputation, and feature engineering. It also offers long-form exercises at the end of each chapter to practice the skills acquired. You will begin by looking at data ingestion of data formats such as JSON, CSV, SQL RDBMSes, HDF5, NoSQL databases, files in image formats, and binary serialized data structures. Further, the book provides numerous example data sets and data files, which are available for download and independent exploration. Moving on from formats, you will impute missing values, detect unreliable data and statistical anomalies, and generate synthetic features that are necessary for successful data analysis and visualization goals. By the end of this book, you will have acquired a firm understanding of the data cleaning process necessary to perform real-world data science and machine learning tasks.
What you will learn How to think carefully about your data and ask the right questions Identify problem data pertaining to individual data points Detect problem data in the systematic "shape" of the data Remediate data integrity and hygiene problems Prepare data for analytic and machine learning tasks Impute values into missing or unreliable data Generate synthetic features that are more amenable to data science, data analysis, or visualization goals. Who this book is for This book is designed to benefit software developers, data scientists, aspiring data scientists, and students who are interested in data analysis or scientific computing. Basic familiarity with statistics, general concepts in machine learning,...
650		0	|a Computational biology |x Methods
650		0	|a Database management
650		0	|a Data integrity
650		0	|a Python (Computer program language)
650		0	|a R (Computer program language)
650		2	|a Computational Biology |x methods
650		2	|a Data Analysis
650		2	|a Data Accuracy
650		4	|a Python (Programming language)
650		4	|a Bases de données ; Gestion
650		4	|a Intégrité des données
650		4	|a Python (Langage de programmation)
650		4	|a R (Langage de programmation)
650		4	|a Qualité des données
650		4	|a Database design & theory
650		4	|a Data capture & analysis
650		4	|a Mathematical theory of computation
650		4	|a Machine learning
650		4	|a Information architecture
650		4	|a Computers ; Data Processing
650		4	|a Computers ; Machine Theory
650		4	|a Computers ; Data Modeling & Design
650		4	|a Computational biology
650		4	|a Data integrity
650		4	|a Database management
650		4	|a Python (Computer program language)
650		4	|a R (Computer program language)
650		4	|a Fulltext
650		4	|a Internet Resources
650		4	|a Methods (Music)
776	1		|z 1801071292
776	0	8	|i Erscheint auch als |n Druck-Ausgabe |z 1801071292
966	4	0	|l DE-91 |p ZDB-30-ORH |q TUM_PDA_ORH |u https://learning.oreilly.com/library/view/-/9781801071291/?ar |m X:ORHE |x Aggregator |z lizenzpflichtig |3 Volltext
912			|a ZDB-30-ORH
912			|a ZDB-30-ORH
951			|a BO
912			|a ZDB-30-ORH
049			|a DE-91

Record in the search index

DE-BY-TUM_katkey	ZDB-30-ORH-063078791
_version_	1821494834206081025
adam_text
any_adam_object
author	Mertz, David
author_facet	Mertz, David
author_role	aut
author_sort	Mertz, David
author_variant	d m dm
building	Verbundindex
bvnumber	localTUM
collection	ZDB-30-ORH
ctrlnum	(DE-627-1)063078791 (DE-599)KEP063078791 (ORHE)9781801071291
dewey-full	005.7
dewey-hundreds	000 - Computer science, information, general works
dewey-ones	005 - Computer programming, programs, data, security
dewey-raw	005.7
dewey-search	005.7
dewey-sort	15.7
dewey-tens	000 - Computer science, information, general works
discipline	Informatik
format	Electronic eBook
fullrecord	<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>05031cam a22007332 4500</leader><controlfield tag="001">ZDB-30-ORH-063078791</controlfield><controlfield tag="003">DE-627-1</controlfield><controlfield tag="005">20240228121336.0</controlfield><controlfield tag="007">cr uuu---uuuuu</controlfield><controlfield tag="008">210427s2021 xx \|\|\|\|\|o 00\| \|\|eng c</controlfield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9781801074407</subfield><subfield code="c">electronic bk.</subfield><subfield code="9">978-1-80107-440-7</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">1801074402</subfield><subfield code="c">electronic bk.</subfield><subfield code="9">1-80107-440-2</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9781801071291</subfield><subfield code="9">978-1-80107-129-1</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-627-1)063078791</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)KEP063078791</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(ORHE)9781801071291</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-627-1)063078791</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-627</subfield><subfield code="b">ger</subfield><subfield code="c">DE-627</subfield><subfield code="e">rda</subfield></datafield><datafield tag="041" ind1=" " ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="082" ind1="0" ind2=" "><subfield code="a">005.7</subfield><subfield code="2">23</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Mertz, David</subfield><subfield code="e">VerfasserIn</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Cleaning data for effective data 
science</subfield><subfield code="b">doing the other 80% of the work with Python, R, and command-line tools</subfield><subfield code="c">David Mertz</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">[Erscheinungsort nicht ermittelbar]</subfield><subfield code="b">Packt Publishing Limited</subfield><subfield code="c">2021</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">1 Online-Ressource</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="a">Text</subfield><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="a">Computermedien</subfield><subfield code="b">c</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="a">Online-Ressource</subfield><subfield code="b">cr</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">A comprehensive guide for data scientists to master effective data cleaning tools and techniques Key Features Master data cleaning techniques in a language-agnostic manner Learn from intriguing hands-on examples from numerous domains, such as biology, weather data, demographics, physics, time series, and image processing Work with detailed, commented, well-tested code samples in Python and R Book Description It is something of a truism in data science, data analysis, or machine learning that most of the effort needed to achieve your actual purpose lies in cleaning your data. Written in David's signature friendly and humorous style, this book discusses in detail the essential steps performed in every production data science or data analysis pipeline and prepares you for data visualization and modeling results. 
The book dives into the practical application of tools and techniques needed for data ingestion, anomaly detection, value imputation, and feature engineering. It also offers long-form exercises at the end of each chapter to practice the skills acquired. You will begin by looking at data ingestion of data formats such as JSON, CSV, SQL RDBMSes, HDF5, NoSQL databases, files in image formats, and binary serialized data structures. Further, the book provides numerous example data sets and data files, which are available for download and independent exploration. Moving on from formats, you will impute missing values, detect unreliable data and statistical anomalies, and generate synthetic features that are necessary for successful data analysis and visualization goals. By the end of this book, you will have acquired a firm understanding of the data cleaning process necessary to perform real-world data science and machine learning tasks. What you will learn How to think carefully about your data and ask the right questions Identify problem data pertaining to individual data points Detect problem data in the systematic "shape" of the data Remediate data integrity and hygiene problems Prepare data for analytic and machine learning tasks Impute values into missing or unreliable data Generate synthetic features that are more amenable to data science, data analysis, or visualization goals. Who this book is for This book is designed to benefit software developers, data scientists, aspiring data scientists, and students who are interested in data analysis or scientific computing. 
Basic familiarity with statistics, general concepts in machine learning,...</subfield></datafield><datafield tag="650" ind1=" " ind2="0"><subfield code="a">Computational biology</subfield><subfield code="x">Methods</subfield></datafield><datafield tag="650" ind1=" " ind2="0"><subfield code="a">Database management</subfield></datafield><datafield tag="650" ind1=" " ind2="0"><subfield code="a">Data integrity</subfield></datafield><datafield tag="650" ind1=" " ind2="0"><subfield code="a">Python (Computer program language)</subfield></datafield><datafield tag="650" ind1=" " ind2="0"><subfield code="a">R (Computer program language)</subfield></datafield><datafield tag="650" ind1=" " ind2="2"><subfield code="a">Computational Biology</subfield><subfield code="x">methods</subfield></datafield><datafield tag="650" ind1=" " ind2="2"><subfield code="a">Data Analysis</subfield></datafield><datafield tag="650" ind1=" " ind2="2"><subfield code="a">Data Accuracy</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Python (Programming language)</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Bases de données ; Gestion</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Intégrité des données</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Python (Langage de programmation)</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">R (Langage de programmation)</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Qualité des données</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Database design & theory</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Data capture & analysis</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Mathematical theory of computation</subfield></datafield><datafield tag="650" ind1=" " 
ind2="4"><subfield code="a">Machine learning</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Information architecture</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Computers ; Data Processing</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Computers ; Machine Theory</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Computers ; Data Modeling & Design</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Computational biology</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Data integrity</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Database management</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Python (Computer program language)</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">R (Computer program language)</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Fulltext</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Internet Resources</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Methods (Music)</subfield></datafield><datafield tag="776" ind1="1" ind2=" "><subfield code="z">1801071292</subfield></datafield><datafield tag="776" ind1="0" ind2="8"><subfield code="i">Erscheint auch als</subfield><subfield code="n">Druck-Ausgabe</subfield><subfield code="z">1801071292</subfield></datafield><datafield tag="966" ind1="4" ind2="0"><subfield code="l">DE-91</subfield><subfield code="p">ZDB-30-ORH</subfield><subfield code="q">TUM_PDA_ORH</subfield><subfield code="u">https://learning.oreilly.com/library/view/-/9781801071291/?ar</subfield><subfield code="m">X:ORHE</subfield><subfield code="x">Aggregator</subfield><subfield code="z">lizenzpflichtig</subfield><subfield 
code="3">Volltext</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">ZDB-30-ORH</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">ZDB-30-ORH</subfield></datafield><datafield tag="951" ind1=" " ind2=" "><subfield code="a">BO</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">ZDB-30-ORH</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-91</subfield></datafield></record></collection>
id	ZDB-30-ORH-063078791
illustrated	Not Illustrated
indexdate	2025-01-17T11:20:40Z
institution	BVB
isbn	9781801074407 1801074402 9781801071291
language	English
open_access_boolean
owner	DE-91 DE-BY-TUM
owner_facet	DE-91 DE-BY-TUM
physical	1 Online-Ressource
psigel	ZDB-30-ORH TUM_PDA_ORH ZDB-30-ORH
publishDate	2021
publishDateSearch	2021
publishDateSort	2021
publisher	Packt Publishing Limited
record_format	marc
spelling	Mertz, David VerfasserIn aut Cleaning data for effective data science doing the other 80% of the work with Python, R, and command-line tools David Mertz [Erscheinungsort nicht ermittelbar] Packt Publishing Limited 2021 1 Online-Ressource Text txt rdacontent Computermedien c rdamedia Online-Ressource cr rdacarrier A comprehensive guide for data scientists to master effective data cleaning tools and techniques Key Features Master data cleaning techniques in a language-agnostic manner Learn from intriguing hands-on examples from numerous domains, such as biology, weather data, demographics, physics, time series, and image processing Work with detailed, commented, well-tested code samples in Python and R Book Description It is something of a truism in data science, data analysis, or machine learning that most of the effort needed to achieve your actual purpose lies in cleaning your data. Written in David's signature friendly and humorous style, this book discusses in detail the essential steps performed in every production data science or data analysis pipeline and prepares you for data visualization and modeling results. The book dives into the practical application of tools and techniques needed for data ingestion, anomaly detection, value imputation, and feature engineering. It also offers long-form exercises at the end of each chapter to practice the skills acquired. You will begin by looking at data ingestion of data formats such as JSON, CSV, SQL RDBMSes, HDF5, NoSQL databases, files in image formats, and binary serialized data structures. Further, the book provides numerous example data sets and data files, which are available for download and independent exploration. Moving on from formats, you will impute missing values, detect unreliable data and statistical anomalies, and generate synthetic features that are necessary for successful data analysis and visualization goals. 
By the end of this book, you will have acquired a firm understanding of the data cleaning process necessary to perform real-world data science and machine learning tasks. What you will learn How to think carefully about your data and ask the right questions Identify problem data pertaining to individual data points Detect problem data in the systematic "shape" of the data Remediate data integrity and hygiene problems Prepare data for analytic and machine learning tasks Impute values into missing or unreliable data Generate synthetic features that are more amenable to data science, data analysis, or visualization goals. Who this book is for This book is designed to benefit software developers, data scientists, aspiring data scientists, and students who are interested in data analysis or scientific computing. Basic familiarity with statistics, general concepts in machine learning,... Computational biology Methods Database management Data integrity Python (Computer program language) R (Computer program language) Computational Biology methods Data Analysis Data Accuracy Python (Programming language) Bases de données ; Gestion Intégrité des données Python (Langage de programmation) R (Langage de programmation) Qualité des données Database design & theory Data capture & analysis Mathematical theory of computation Machine learning Information architecture Computers ; Data Processing Computers ; Machine Theory Computers ; Data Modeling & Design Computational biology Fulltext Internet Resources Methods (Music) 1801071292 Erscheint auch als Druck-Ausgabe 1801071292
spellingShingle	Mertz, David Cleaning data for effective data science doing the other 80% of the work with Python, R, and command-line tools Computational biology Methods Database management Data integrity Python (Computer program language) R (Computer program language) Computational Biology methods Data Analysis Data Accuracy Python (Programming language) Bases de données ; Gestion Intégrité des données Python (Langage de programmation) R (Langage de programmation) Qualité des données Database design & theory Data capture & analysis Mathematical theory of computation Machine learning Information architecture Computers ; Data Processing Computers ; Machine Theory Computers ; Data Modeling & Design Computational biology Fulltext Internet Resources Methods (Music)
title	Cleaning data for effective data science doing the other 80% of the work with Python, R, and command-line tools
title_auth	Cleaning data for effective data science doing the other 80% of the work with Python, R, and command-line tools
title_exact_search	Cleaning data for effective data science doing the other 80% of the work with Python, R, and command-line tools
title_full	Cleaning data for effective data science doing the other 80% of the work with Python, R, and command-line tools David Mertz
title_fullStr	Cleaning data for effective data science doing the other 80% of the work with Python, R, and command-line tools David Mertz
title_full_unstemmed	Cleaning data for effective data science doing the other 80% of the work with Python, R, and command-line tools David Mertz
title_short	Cleaning data for effective data science
title_sort	cleaning data for effective data science doing the other 80 of the work with python r and command line tools
title_sub	doing the other 80% of the work with Python, R, and command-line tools
topic	Computational biology Methods Database management Data integrity Python (Computer program language) R (Computer program language) Computational Biology methods Data Analysis Data Accuracy Python (Programming language) Bases de données ; Gestion Intégrité des données Python (Langage de programmation) R (Langage de programmation) Qualité des données Database design & theory Data capture & analysis Mathematical theory of computation Machine learning Information architecture Computers ; Data Processing Computers ; Machine Theory Computers ; Data Modeling & Design Computational biology Fulltext Internet Resources Methods (Music)
topic_facet	Computational biology Methods Database management Data integrity Python (Computer program language) R (Computer program language) Computational Biology methods Data Analysis Data Accuracy Python (Programming language) Bases de données ; Gestion Intégrité des données Python (Langage de programmation) R (Langage de programmation) Qualité des données Database design & theory Data capture & analysis Mathematical theory of computation Machine learning Information architecture Computers ; Data Processing Computers ; Machine Theory Computers ; Data Modeling & Design Computational biology Fulltext Internet Resources Methods (Music)
work_keys_str_mv	AT mertzdavid cleaningdataforeffectivedatasciencedoingtheother80oftheworkwithpythonrandcommandlinetools

Availability

Read online