SkillSoft Explore Course – Lambers, Inc.

Aspire Data Analyst to Data Scientist Data Science Track 4: Data Scientist

The imbalanced-learn library, which integrates with Pandas ML (machine learning), offers several techniques for addressing class imbalance in datasets used for classification. In this course, explore oversampling, undersampling, and combinations of the two. Begin by using Pandas ML to explore a dataset in which samples are not evenly distributed across target classes. Then apply oversampling with the RandomOverSampler class in the imbalanced-learn library, build a classification model with the oversampled data, and evaluate its performance. Next, learn how to create a balanced dataset with the Synthetic Minority Oversampling Technique (SMOTE) and how to undersample a dataset by applying the Near Miss, Cluster Centroids, and Neighborhood Cleaning Rule techniques. Then look at ensemble classifiers for imbalanced data, apply combination samplers, and find correlations in a dataset. Finally, learn how to build a multilabel classification model, explore the use of principal component analysis (PCA), and combine oversampling with PCA in building a classification model. The exercise involves working with imbalanced datasets.

Asset ID	it_dsmdladj_04_enus
Course Type	Video Course
Course Category	Data Analyst to Data Scientist

Objectives

Machine & Deep Learning Algorithms: Imbalanced Datasets Using Pandas ML

Course Overview
use Pandas ML to explore a dataset where the samples are not evenly distributed across the target classes
apply the technique of oversampling using the RandomOverSampler class in the imbalanced-learn library, build a classification model with the oversampled data, and evaluate its performance
create a balanced dataset using the Synthetic Minority Oversampling Technique and build and evaluate a classification model with that data
perform undersampling operations on a dataset by applying the Near Miss, Cluster Centroids, and Neighborhood Cleaning Rule techniques
use the EasyEnsembleClassifier and BalancedRandomForestClassifier available in the imbalanced-learn library to build classification models with imbalanced data
apply a combination of oversampling and undersampling using the SMOTETomek and SMOTEENN techniques
use Pandas and Seaborn to visualize the correlated fields in a dataset
train and evaluate a classification model to predict the quality ratings of red wines
transform a dataset containing multiple features into a handful of principal components and build a classification model using the reduced-dimension dataset
combine the use of oversampling and PCA in building a classification model
recall the techniques used by algorithms for undersampling and oversampling data and the use of combined samplers
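
As a rough illustration of the ideas behind the oversampling and undersampling objectives above, the following plain-Python sketch rebalances a toy dataset. It is not the imbalanced-learn implementation and is not taken from the course: the helper names (random_oversample, random_undersample, smote_like_point) and the toy data are hypothetical, and the SMOTE-style helper is a simplification that interpolates between two given minority rows rather than selecting among nearest neighbors.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until every
    class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, lab in zip(rows, labels) if lab == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))  # sampling with replacement
            out_labels.append(cls)
    return out_rows, out_labels


def random_undersample(rows, labels, seed=0):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_rows, out_labels = [], []
    for cls in counts:
        pool = [r for r, lab in zip(rows, labels) if lab == cls]
        for r in rng.sample(pool, target):  # sampling without replacement
            out_rows.append(r)
            out_labels.append(cls)
    return out_rows, out_labels


def smote_like_point(a, b, rng):
    """Simplified SMOTE-style step: create a synthetic row at a random
    position on the line segment between two minority-class rows."""
    t = rng.random()
    return tuple(x + t * (y - x) for x, y in zip(a, b))


# Hypothetical toy dataset: 9 rows of class 0 and 3 rows of class 1.
rows = [(float(i), float(i % 3)) for i in range(12)]
labels = [0] * 9 + [1] * 3

over_rows, over_labels = random_oversample(rows, labels)
under_rows, under_labels = random_undersample(rows, labels)

print("original:    ", dict(Counter(labels)))
print("oversampled: ", dict(Counter(over_labels)))
print("undersampled:", dict(Counter(under_labels)))
print("synthetic:   ", smote_like_point(rows[9], rows[10], random.Random(1)))
```

With the toy data, oversampling yields 9 rows per class and undersampling yields 3 rows per class. In the imbalanced-learn library itself, samplers such as RandomOverSampler and SMOTE expose this kind of operation through a fit_resample(X, y) method.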