Coping with Missing, Invalid, and Duplicate Data in R
Learn the essential steps of data preparation: missing value imputation, outlier detection, and duplicate removal.
Data preparation is part of nearly every data analytics project, so these skills are highly valuable. In this course, Coping with Missing, Invalid, and Duplicate Data in R, you will learn the main steps of data preparation. First, you will learn how to handle duplicate data. Next, you will discover that missing values prevent many R functions from working properly, which limits your R toolset until you have dealt with every NA. Finally, you will explore how to detect outliers and invalid data, and how they can introduce bias into your analysis. When you’re finished with this course, you will understand why missing values, outliers, and duplicates are problematic, how to detect them, and how to remove them from the dataset.
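As a rough illustration of the three problems named above, the following minimal base R sketch walks through each one on a toy data frame. The data, the column names `id` and `value`, and the choice of the 1.5 * IQR rule for outliers are assumptions made for this example, not material taken from the course.

```r
# Hypothetical toy data; names and values are illustrative only.
df <- data.frame(
  id    = c(1, 2, 2, 3, 4, 5),
  value = c(10, 12, 12, NA, 11, 95)
)

# Duplicates: duplicated() flags rows that repeat an earlier row.
dup_flags <- duplicated(df)
df_unique <- df[!dup_flags, ]

# Missing values: mean() returns NA unless told to skip missing entries.
mean_with_na    <- mean(df_unique$value)                # NA
mean_without_na <- mean(df_unique$value, na.rm = TRUE)  # computed on non-missing values
df_complete     <- df_unique[complete.cases(df_unique), ]

# Outliers: the 1.5 * IQR rule, one common convention (assumed here).
q     <- unname(quantile(df_complete$value, c(0.25, 0.75)))
fence <- 1.5 * (q[2] - q[1])
is_outlier <- df_complete$value < q[1] - fence |
  df_complete$value > q[2] + fence

# Rows flagged as potential outliers; whether to drop them is a judgment call.
df_complete[is_outlier, ]
```

Dropping incomplete rows with `complete.cases()` is only the simplest option; imputation keeps those rows at the cost of extra modeling assumptions.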
Author Name: Martin Burger
Author Description:
Martin studied biostatistics and worked for several pharmaceutical companies before he became a data science consultant and author. He has published over 15 courses on R, Tableau 9, and other data science subjects. His main focus is analytics software such as R and SPSS, but he is also interested in modern data visualization tools such as Tableau. When he is not busy coding, blogging, or working out new teaching concepts, you may find him skiing or hiking in the Alps.
Table of Contents
- Course Overview (1min)
- Managing Duplicate Data (28mins)
- Managing Missing Data (37mins)
- Outlier and Invalid Data Detection (37mins)
- Further Resources and Summary (14mins)