Python Scatter Plots
Scatter Plots in Jupyter Notebooks
This article describes how to create scatter plots from a CSV dataset file using Matplotlib and Seaborn. The CSV dataset file (penguins_size.csv) was downloaded from Kaggle: https://www.kaggle.com/datasets/parulpandey/palmer-archipelago-antarctica-penguin-data?resource=download. The first scatter plot uses Matplotlib to display the Antarctic penguin species and compare their culmen (beak) length and depth.
Load the dataset into a DataFrame
Use pandas to read the CSV data into a DataFrame (df_p), then write a statement to create a second DataFrame with the NaN records dropped.
data:image/s3,"s3://crabby-images/ce113/ce1133daca0ef420afcb38d03cea7cd17ed08285" alt="Python Data output"
Code for second dataframe dropping the NaN values & resetting the index.
data:image/s3,"s3://crabby-images/00c0f/00c0f6ee3755b6fac4ddd8d7f8d04fee09e96592" alt="Python dataframe DropNA for NaN values"
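The code in the screenshots above is not reproduced as text, so here is a minimal sketch of the same two steps. It assumes the column names of the Kaggle file (species, island, culmen_length_mm, culmen_depth_mm, flipper_length_mm, body_mass_g, sex) and uses a few made-up stand-in rows so it runs without the download. The name df_penguin for the second DataFrame is taken from later in the article.

```python
import io

import pandas as pd

# Stand-in rows so the sketch runs without the file. With the Kaggle
# download in place, use pd.read_csv("penguins_size.csv") instead.
# The column names are an assumption about the Kaggle file's header.
sample_csv = io.StringIO(
    "species,island,culmen_length_mm,culmen_depth_mm,"
    "flipper_length_mm,body_mass_g,sex\n"
    "Adelie,Torgersen,39.1,18.7,181,3750,MALE\n"
    "Adelie,Torgersen,,,,,\n"
    "Chinstrap,Dream,46.5,17.9,192,3500,FEMALE\n"
    "Gentoo,Biscoe,50.0,16.3,230,5700,MALE\n"
)

df_p = pd.read_csv(sample_csv)

# Drop every row that contains a NaN, then renumber the index from 0.
df_penguin = df_p.dropna().reset_index(drop=True)

print(df_p.shape, df_penguin.shape)
```

Passing drop=True to reset_index discards the old index rather than keeping it as an extra column.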
Matplotlib Scatter Plot - Create a list of species from the dataframe for the legend. Code the scatter plot with the dataframe's culmen length on the X-axis and culmen depth on the Y-axis, grouped by penguin species. s = the marker size in points squared. c = the marker color(s): a single color, a sequence of colors, or a 2-dimensional array whose rows are RGB or RGBA values. plt.legend adds the species legend to the plot, alongside the axis labels and title.
data:image/s3,"s3://crabby-images/5ef25/5ef25e781d4c1b268431f3c2cd3491f75c70a4b4" alt="matplotlib scatterplot"
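One way to write such a plot is sketched below. It is not necessarily the code in the screenshot: the column names, the sample values, and the color choices are assumptions for illustration, and each species is drawn with its own scatter call so that it gets its own legend entry.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a Jupyter notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for the cleaned dataframe (column names assumed).
df_penguin = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Chinstrap", "Chinstrap", "Gentoo", "Gentoo"],
    "culmen_length_mm": [39.1, 40.3, 46.5, 50.0, 46.1, 50.0],
    "culmen_depth_mm": [18.7, 18.0, 17.9, 19.5, 13.2, 16.3],
})

# List of species for the legend, in order of first appearance.
species_list = list(df_penguin["species"].unique())
colors = ["tab:blue", "tab:orange", "tab:green"]  # illustrative choice

fig, ax = plt.subplots(figsize=(8, 6))
for name, color in zip(species_list, colors):
    subset = df_penguin[df_penguin["species"] == name]
    ax.scatter(
        subset["culmen_length_mm"],
        subset["culmen_depth_mm"],
        s=40,        # marker area in points squared
        c=color,     # one color for every point of this species
        label=name,  # text shown in the legend
    )

ax.set_xlabel("Culmen length (mm)")
ax.set_ylabel("Culmen depth (mm)")
ax.set_title("Culmen length vs. depth by species")
ax.legend(title="Species")
```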
Seaborn Scatter Plot - Using the same dataframe, df_penguin, the Seaborn scatterplot code requires the dataframe columns for the x-axis and y-axis values, plus the hue parameter to color the data separately for each species.
data:image/s3,"s3://crabby-images/74a23/74a2396ea4f85541ed172d4b04e191f2729e8d55" alt="Seaborn scatterplot"
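A minimal sketch of the Seaborn version follows, again with made-up sample rows and assumed column names. The x, y, and hue arguments name dataframe columns, and Seaborn builds the legend from the hue column on its own.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a Jupyter notebook
import pandas as pd
import seaborn as sns

# Made-up stand-in for the cleaned dataframe (column names assumed).
df_penguin = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Chinstrap", "Chinstrap", "Gentoo", "Gentoo"],
    "culmen_length_mm": [39.1, 40.3, 46.5, 50.0, 46.1, 50.0],
    "culmen_depth_mm": [18.7, 18.0, 17.9, 19.5, 13.2, 16.3],
})

# hue colors the points by species and adds a matching legend.
ax = sns.scatterplot(
    data=df_penguin,
    x="culmen_length_mm",
    y="culmen_depth_mm",
    hue="species",
)
ax.set_title("Culmen length vs. depth by species")
```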
Seaborn Scatterplot with Linear Regression Model - Create the code to output the chart, including the size, title, Y-axis, and data for sns.lmplot.
data:image/s3,"s3://crabby-images/329c6/329c65ef6008c302e6bdd8920417aaf6c9e49f3a" alt="Seaborn Scatterplot + Linear Regression model code"
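As a rough illustration, sns.lmplot draws the scatter points and fits one regression line per hue level. The sketch below assumes the same column names as before and uses made-up sample values. The height, aspect, title, and axis labels are illustrative choices rather than the author's settings.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a Jupyter notebook
import pandas as pd
import seaborn as sns

# Made-up stand-in for the cleaned dataframe (column names assumed).
df_penguin = pd.DataFrame({
    "species": ["Adelie"] * 4 + ["Chinstrap"] * 4 + ["Gentoo"] * 4,
    "culmen_length_mm": [39.1, 40.3, 36.7, 41.1,
                         46.5, 50.0, 51.3, 45.4,
                         46.1, 50.0, 48.7, 47.6],
    "culmen_depth_mm": [18.7, 18.0, 19.3, 17.6,
                        17.9, 19.5, 19.2, 18.7,
                        13.2, 16.3, 14.1, 14.5],
})

# lmplot is figure-level: it returns a FacetGrid, and its size is set
# with height (inches) and aspect (width = height * aspect).
g = sns.lmplot(
    data=df_penguin,
    x="culmen_length_mm",
    y="culmen_depth_mm",
    hue="species",
    height=6,
    aspect=1.3,
)
g.set_axis_labels("Culmen length (mm)", "Culmen depth (mm)")
g.ax.set_title("Culmen length vs. depth with linear fits")
```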
Using the same code for the sns.lmplot, this example uses markers to differentiate the species (^ = triangle, s = square, * = star). Information on color palettes may be found on the seaborn website: https://seaborn.pydata.org/tutorial/color_palettes.html
data:image/s3,"s3://crabby-images/8c272/8c2721522d71040319bcc032255cb7d446bae3b2" alt="Seaborn Scatterplot + LM code including markers and palette"
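The markers and palette arguments can be sketched as follows. The list of markers needs one entry per hue level, and "Set2" is just one example palette name, not necessarily the one used in the screenshot. Column names and sample values are assumptions, as before.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a Jupyter notebook
import pandas as pd
import seaborn as sns

# Made-up stand-in for the cleaned dataframe (column names assumed).
df_penguin = pd.DataFrame({
    "species": ["Adelie"] * 4 + ["Chinstrap"] * 4 + ["Gentoo"] * 4,
    "culmen_length_mm": [39.1, 40.3, 36.7, 41.1,
                         46.5, 50.0, 51.3, 45.4,
                         46.1, 50.0, 48.7, 47.6],
    "culmen_depth_mm": [18.7, 18.0, 19.3, 17.6,
                        17.9, 19.5, 19.2, 18.7,
                        13.2, 16.3, 14.1, 14.5],
})

species_markers = ["^", "s", "*"]  # triangle, square, star: one per species

g = sns.lmplot(
    data=df_penguin,
    x="culmen_length_mm",
    y="culmen_depth_mm",
    hue="species",
    markers=species_markers,
    palette="Set2",  # illustrative palette name
    height=6,
)
```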
Custom Scatterplot Legend - additional scatterplot with missing information.
Write code to produce a scatterplot to display the differences in Culmen length & depth by the sex of the Penguins. The output appears to have a value of "." for the Sex of one of the dataframe records.
data:image/s3,"s3://crabby-images/1ee6d/1ee6dd8720ae0a84ced1ed947251e66534223334" alt="Penguin Body Mass by Species & Sex"
Inspecting the dataframe confirms the issue: one record has the additional value "." in the sex column.
data:image/s3,"s3://crabby-images/dae34/dae34ca77f9ef726d91480df98ff6897b7ac2e2f" alt="DataFrame with additional value of '.'"
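A check along these lines would surface the stray value. The sketch assumes the column is named sex and holds MALE and FEMALE for the valid records. The sample rows are made up.

```python
import pandas as pd

# Made-up rows; the "." entry mimics the stray value described above.
df_penguin = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Gentoo", "Gentoo", "Gentoo"],
    "sex": ["MALE", "FEMALE", ".", "FEMALE", "MALE"],
})

# Count each distinct value in the sex column.
print(df_penguin["sex"].value_counts())

# Pull out the rows whose sex is neither of the expected values.
odd_rows = df_penguin[~df_penguin["sex"].isin(["MALE", "FEMALE"])]
print(odd_rows)
```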
Create a custom legend to account for this stray value, using a list of labels for the legend.
data:image/s3,"s3://crabby-images/9cb43/9cb435da8a6de40308763bce5099e46a934bb9aa" alt="Lists of legend values for plot"
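One possible way to build such a legend is sketched here. It fixes the order of the sex values with hue_order and then relabels the legend from a list. The label text ("Unknown" for the "." value) and the column names are assumptions, and the screenshot may take a different approach.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a Jupyter notebook
import pandas as pd
import seaborn as sns

# Made-up rows, including one record with "." as its sex.
df_penguin = pd.DataFrame({
    "culmen_length_mm": [39.1, 36.7, 44.5, 46.5, 50.0, 45.4],
    "culmen_depth_mm": [18.7, 19.3, 15.7, 17.9, 16.3, 14.6],
    "sex": ["MALE", "FEMALE", ".", "FEMALE", "MALE", "FEMALE"],
})

hue_order = ["MALE", "FEMALE", "."]             # raw values, in a fixed order
legend_labels = ["Male", "Female", "Unknown"]   # display text, same order

ax = sns.scatterplot(
    data=df_penguin,
    x="culmen_length_mm",
    y="culmen_depth_mm",
    hue="sex",
    hue_order=hue_order,
)

# Reuse the handles seaborn created, but swap in the custom labels.
handles, _ = ax.get_legend_handles_labels()
ax.legend(handles=handles[-len(legend_labels):], labels=legend_labels, title="Sex")
```

Because hue_order fixes the order of the handles, each label in the list lines up with the right group of points.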
Custom Scatterplots using "Row" - An alternate method would be to generate a clean dataframe to create the scatterplots.
data:image/s3,"s3://crabby-images/3fe04/3fe044bb051bcbab7cc84877221692b904ba566c" alt="Cleansed dataframe version for penguins lmplot"
By adding the row parameter to the lmplot, two separate scatterplots are produced: one for male penguins and one for female penguins.
data:image/s3,"s3://crabby-images/f5688/f56882880297d222b147896fbed11f96c2b87511" alt="2 grids - one for Male penguins, one for Female penguins"
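Both steps can be sketched together: filter the dataframe down to the valid sex values, then pass row="sex" to sns.lmplot. The name df_clean, the column names, and the sample values are assumptions for illustration. A hue argument is left out here to keep the example small.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a Jupyter notebook
import pandas as pd
import seaborn as sns

# Made-up rows, including one record with "." as its sex.
df_penguin = pd.DataFrame({
    "culmen_length_mm": [39.1, 41.1, 50.0, 48.7, 44.5,
                         36.7, 40.3, 45.4, 46.5],
    "culmen_depth_mm": [18.7, 17.6, 16.3, 15.1, 15.7,
                        19.3, 18.0, 14.6, 17.9],
    "sex": ["MALE", "MALE", "MALE", "MALE", ".",
            "FEMALE", "FEMALE", "FEMALE", "FEMALE"],
})

# Keep only the rows with a valid sex, then renumber the index.
df_clean = df_penguin[df_penguin["sex"].isin(["MALE", "FEMALE"])]
df_clean = df_clean.reset_index(drop=True)

# row="sex" draws one subplot per distinct value, stacked vertically.
g = sns.lmplot(
    data=df_clean,
    x="culmen_length_mm",
    y="culmen_depth_mm",
    row="sex",
    height=4,
)
```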