Jupyter Notebook – Lab Session 1 – Exploring a Dataset with Pandas and NumPy

Importing Libraries

import pandas as pd
import numpy as np

Explanation: Import Pandas for tabular data and NumPy for numerical operations, using their conventional aliases pd and np.

Loading the Dataset

df = pd.read_csv('/path_to_your_dataset.csv')

Explanation: Load the dataset into a Pandas DataFrame.
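If no CSV file is at hand, the remaining steps can still be tried on a small in-memory dataset. The sketch below is only an illustration: the column names (name, department, age, salary) and all values are hypothetical and are not taken from the lab's dataset.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a CSV file on disk.
csv_text = """name,department,age,salary
Asha,Engineering,29,72000
Ben,Sales,41,58000
Chen,Engineering,35,
Dina,,52,91000
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)   # (number of rows, number of columns)
print(df.dtypes)  # empty fields are read as NaN
```

The two empty fields become null values, which makes this sample convenient for the null-handling steps later in the session.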

Display First Few Rows

df.head()

Explanation: Display the first five rows to understand the structure.

Display Last Few Rows

df.tail()

Explanation: Display the last five rows of the dataset.

Dataset Information

df.info()

Explanation: Get an overview of the DataFrame, including each column's data type, its count of non-null values, and the memory usage.

Descriptive Statistics

df.describe()

Explanation: Get summary statistics (count, mean, standard deviation, min, quartiles, and max) for each numeric column. The 50% row is the median.
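By default describe() only covers numeric columns. A minimal sketch with made-up data (the age and city columns are hypothetical) showing where the median sits in the output and how to include non-numeric columns:

```python
import pandas as pd

# Hypothetical data: one numeric and one text column.
df = pd.DataFrame({
    "age": [29, 41, 35, 52],
    "city": ["Pune", "Delhi", "Pune", "Goa"],
})

summary = df.describe()            # numeric columns only by default
median_age = summary.loc["50%", "age"]
print(median_age)                  # same value as df["age"].median()

# include="all" adds count, unique, top and freq for text columns.
print(df.describe(include="all"))
```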

Column Names

df.columns

Explanation: List all column names in the dataset.

Shape of the Dataset

df.shape

Explanation: Get the number of rows and columns.

Check for Null Values

df.isnull().sum()

Explanation: Count null values in each column.
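Raw counts are easier to judge next to the share of rows they represent. A small sketch, assuming hypothetical columns a and b:

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing values in both columns.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan],
    "b": ["x", None, "z", "w"],
})

null_counts = df.isnull().sum()   # number of nulls per column
null_share = df.isnull().mean()   # fraction of nulls per column

print(null_counts)
print(null_share)
```

The mean works here because isnull() returns booleans, and True counts as 1 when averaged.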

Drop Rows with Null Values

df_cleaned = df.dropna()

Explanation: Remove every row that contains at least one null value and store the result in a new DataFrame. The original df is left unchanged.

Fill Null Values

df.fillna(value='Unknown', inplace=True)

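Explanation: Fill null values in every column with the placeholder 'Unknown', modifying df in place. A numeric column that receives the string will typically no longer be numeric.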
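When columns have different types, a per-column fill is often more useful than a single placeholder. The following is a sketch under assumptions: the department and salary columns are hypothetical, and using the median for the numeric column is just one possible choice.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one text and one numeric column.
df = pd.DataFrame({
    "department": ["Engineering", None, "Sales"],
    "salary": [72000.0, np.nan, 58000.0],
})

# Pass a dict to fill each column with a value that matches its type.
filled = df.fillna({
    "department": "Unknown",
    "salary": df["salary"].median(),
})

print(filled)
print(filled.dtypes)  # salary stays numeric
```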

Unique Values in a Column

df['column_name'].unique()

Explanation: Display unique values in a specific column.

Value Counts

df['column_name'].value_counts()

Explanation: Count the occurrences of each unique value in a column.

Filter Rows by Condition

df_filtered = df[df['column_name'] > some_value]

Explanation: Filter rows based on a condition.
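Conditions can be combined. A short sketch with hypothetical department and age columns:

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({
    "department": ["Engineering", "Sales", "Engineering", "HR"],
    "age": [29, 41, 35, 52],
})

# Combine conditions with & (and) or | (or).
# Each condition needs its own parentheses.
senior_engineers = df[(df["department"] == "Engineering") & (df["age"] > 30)]

# isin() keeps rows whose value appears in a list.
sales_or_hr = df[df["department"].isin(["Sales", "HR"])]

print(senior_engineers)
print(sales_or_hr)
```

Python's plain `and` / `or` do not work element-wise on a Series, which is why the `&` and `|` operators are used.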

Selecting Multiple Columns

df[['column1', 'column2']]

Explanation: Select and display specific columns.

Add a New Column

df['new_column'] = df['column1'] + df['column2']

Explanation: Add a new column by combining values from other columns.
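NumPy functions operate directly on DataFrame columns, which is handy for derived columns. A minimal sketch, assuming hypothetical price and quantity columns and an arbitrary threshold of 50:

```python
import numpy as np
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({"price": [10.0, 25.0, 40.0], "quantity": [3, 1, 2]})

# Arithmetic between columns is applied row by row.
df["total"] = df["price"] * df["quantity"]

# np.where picks one of two values per row based on a condition.
df["size"] = np.where(df["total"] >= 50, "large", "small")

print(df)
```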

Rename Columns

df.rename(columns={'old_name': 'new_name'}, inplace=True)

Explanation: Rename columns for better readability.

Sorting Values

df.sort_values(by='column_name', ascending=False)

Explanation: Sort the dataset by a specific column in descending order. The call returns a new sorted DataFrame and leaves df unchanged.
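To keep a sorted result, assign it to a variable. A short sketch with made-up name and age columns:

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({"name": ["Asha", "Ben", "Chen"], "age": [29, 41, 35]})

# Sort by age, highest first, and renumber the rows from 0.
df_sorted = df.sort_values(by="age", ascending=False).reset_index(drop=True)

print(df_sorted)
print(df)  # the original order is untouched
```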

Drop a Column

df.drop('column_name', axis=1, inplace=True)

Explanation: Remove a specific column.

Group By and Aggregate

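df.groupby('column_name').sum(numeric_only=True)

Explanation: Group by a column and apply an aggregate function like sum. Passing numeric_only=True restricts the sum to numeric columns, so text columns are not concatenated.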
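Several aggregates can be computed in one pass with agg(). This sketch assumes hypothetical department and salary columns, and the output column names are chosen only for illustration:

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({
    "department": ["Engineering", "Sales", "Engineering", "Sales"],
    "salary": [72000, 58000, 80000, 62000],
})

# Named aggregation: output_name=(input_column, function).
summary = df.groupby("department").agg(
    total_salary=("salary", "sum"),
    mean_salary=("salary", "mean"),
    headcount=("salary", "count"),
)

print(summary)
```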

Calculate Mean of a Column

df['column_name'].mean()

Explanation: Calculate the mean of a specific column.

Calculate Median of a Column

df['column_name'].median()

Explanation: Calculate the median of a specific column.

Standard Deviation of a Column

df['column_name'].std()

Explanation: Calculate the standard deviation of a specific column.

Detecting Outliers

df[(df['column_name'] > upper_limit) | (df['column_name'] < lower_limit)]

Explanation: Detect outliers by specifying upper and lower limits.
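The snippet leaves upper_limit and lower_limit undefined. One common convention, assumed here rather than prescribed by this lab, is the 1.5 × IQR rule. A sketch with made-up values:

```python
import pandas as pd

# Hypothetical data with one obviously extreme value.
df = pd.DataFrame({"column_name": [10, 12, 11, 13, 12, 11, 95]})

# Interquartile range: distance between the 25th and 75th percentiles.
q1 = df["column_name"].quantile(0.25)
q3 = df["column_name"].quantile(0.75)
iqr = q3 - q1

# Assumed rule of thumb: anything beyond 1.5 * IQR from the quartiles.
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

outliers = df[(df["column_name"] > upper_limit) | (df["column_name"] < lower_limit)]
print(outliers)
```

The multiplier 1.5 is a convention, not a law. A different factor or a domain-specific threshold may suit a given dataset better.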

Apply Custom Function

df['new_column'] = df['column_name'].apply(lambda x: x * 2)

Explanation: Apply a custom function to each value in a column.
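For simple arithmetic, a vectorized expression gives the same result as apply() and is usually faster. apply() is most useful when the logic does not map onto a built-in operation. A sketch, where the label function is hypothetical:

```python
import pandas as pd

df = pd.DataFrame({"column_name": [1, 2, 3]})

# Same result, two ways.
doubled_apply = df["column_name"].apply(lambda x: x * 2)
doubled_vectorized = df["column_name"] * 2


def label(x):
    """Hypothetical custom rule: tag each value as even or odd."""
    return "even" if x % 2 == 0 else "odd"


df["parity"] = df["column_name"].apply(label)
print(df)
```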

Pivot Table

df.pivot_table(values='value_column', index='index_column', columns='column_name')

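Explanation: Create a pivot table that summarizes value_column for each combination of index_column and column_name. By default, the values in each cell are averaged.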
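The aggregation can be changed with aggfunc. A minimal sketch, assuming hypothetical region, product, and sales columns:

```python
import pandas as pd

# Hypothetical sales records.
df = pd.DataFrame({
    "region": ["North", "North", "South", "South", "North"],
    "product": ["A", "B", "A", "B", "A"],
    "sales": [100, 150, 200, 50, 300],
})

# One row per region, one column per product, cells hold total sales.
table = df.pivot_table(
    values="sales",
    index="region",
    columns="product",
    aggfunc="sum",
    fill_value=0,  # used for region/product pairs with no rows
)

print(table)
```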

Correlation Matrix

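df.corr(numeric_only=True)

Explanation: Calculate the pairwise correlation matrix for numeric columns. In recent Pandas versions, numeric_only=True is needed when the DataFrame also contains non-numeric columns; otherwise the call raises an error.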
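Another way to restrict the calculation to numeric columns, which should behave the same across Pandas versions, is to select them explicitly first. A sketch with made-up hours, score, and name columns:

```python
import pandas as pd

# Hypothetical data: two numeric columns and one text column.
df = pd.DataFrame({
    "hours": [1, 2, 3, 4, 5],
    "score": [52, 58, 65, 70, 78],
    "name": ["a", "b", "c", "d", "e"],
})

# Keep only numeric columns, then compute pairwise Pearson correlation.
corr = df.select_dtypes(include="number").corr()

print(corr)
```

Values close to 1 or -1 indicate a strong linear relationship. Values near 0 indicate little linear relationship.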

Visualizing with Histograms

df['column_name'].hist()

Explanation: Plot a histogram of a numeric column to view its distribution. Pandas plotting relies on Matplotlib, which must be installed.

Scatter Plot

df.plot.scatter(x='column_x', y='column_y')

Explanation: Create a scatter plot to see relationships between two columns.

Box Plot

df.boxplot(column='column_name')

Explanation: Generate a box plot to identify the spread and outliers.

Live Example of Data set Attached

DOWNLOAD from HERE – CLICK HERE

Rajesh Kumar