Statistical Data Visualization Training with Seaborn
Project: Statistical Data Visualization with Seaborn
Welcome to this project-based course on Statistical Data Visualization with Seaborn.
Producing visualizations is an important first step in exploring and analyzing real-world data sets.
As such, visualization is an indispensable method in any data scientist's toolbox.
It is also a powerful tool for identifying problems in analyses and for illustrating results.
In this project, we will employ the statistical data visualization library, Seaborn,
to discover and explore the relationships in the Breast Cancer Wisconsin (Diagnostic) data set.
We will use the results of the exploratory data analysis (EDA) from the previous project,
Breast Cancer Diagnosis – Exploratory Data Analysis, to drop correlated features and
to implement feature selection and feature extraction methods: correlation-based feature selection,
univariate feature selection, recursive feature elimination, principal component analysis (PCA), and tree-based feature selection.
Lastly, we will build a boosted decision tree classifier with XGBoost to classify tumors as either malignant or benign.
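
As a rough preview of the feature-reduction steps listed above, here is a minimal sketch. It is not the course's own code. It assumes the copy of the Breast Cancer Wisconsin (Diagnostic) data bundled with scikit-learn is an acceptable stand-in for the course data, and the correlation cutoff of 0.9, the choice of k = 5 features, and all variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.preprocessing import StandardScaler

# scikit-learn ships a copy of the Breast Cancer Wisconsin (Diagnostic) data.
# In this copy the target is encoded as 0 = malignant, 1 = benign.
data = load_breast_cancer(as_frame=True)
X, y = data.data, data.target

# 1. Drop one feature from each highly correlated pair.
CORR_THRESHOLD = 0.9  # assumed cutoff, chosen only for illustration
corr = X.corr().abs()
# Keep only the strict upper triangle so each pair is inspected once.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [col for col in upper.columns if (upper[col] > CORR_THRESHOLD).any()]
X_reduced = X.drop(columns=to_drop)

# 2. Univariate feature selection: keep the K features with the highest
#    ANOVA F-score against the diagnosis label.
K = min(5, X_reduced.shape[1])  # assumed value of k
selector = SelectKBest(score_func=f_classif, k=K).fit(X_reduced, y)
selected = list(X_reduced.columns[selector.get_support()])

# 3. Feature extraction: PCA on standardized features.
X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=2).fit(X_scaled)
explained = pca.explained_variance_ratio_

print(f"Dropped {len(to_drop)} correlated features")
print("Selected features:", selected)
print("Variance explained by first two components:", explained.round(3))
```

Recursive feature elimination, tree-based selection, and the XGBoost classifier are left out here to keep the sketch short. Each of them would start from the same `X` and `y`.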