Importing Libraries
import pandas as pd
import numpy as np
Explanation: Import pandas for tabular data and NumPy for numerical operations, using the conventional aliases pd and np.
Loading the Dataset
df = pd.read_csv('/path_to_your_dataset.csv')
Explanation: Load the dataset into a Pandas DataFrame.
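To try the loading step without a file on disk, a minimal sketch can read CSV text from memory instead. The column names and values below are hypothetical sample data, not part of any real dataset.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file on disk.
csv_text = """name,age,city
Asha,29,Pune
Ben,,Leeds
Chen,41,
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)   # number of rows and columns
print(df.dtypes)  # empty fields are read as NaN, which affects the dtype
```

With a real file, the io.StringIO(...) argument is simply replaced by the path to the CSV.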
Display First Few Rows
df.head()
Explanation: Display the first five rows to understand the structure.
Display Last Few Rows
df.tail()
Explanation: Display the last five rows of the dataset.
Dataset Information
df.info()
Explanation: Get an overview of the DataFrame, including each column's data type, non-null count, and memory usage.
Descriptive Statistics
df.describe()
Explanation: Get summary statistics (count, mean, standard deviation, min, quartiles, and max) for each numeric column. The 50% row is the median.
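By default describe() skips text columns. The sketch below, built on a small hypothetical DataFrame, compares the default output with include="all", which also summarizes non-numeric columns.

```python
import pandas as pd

# Hypothetical sample data: one numeric and one text column.
df = pd.DataFrame({
    "score": [10, 20, 30, 40],
    "grade": ["a", "b", "a", "c"],
})

# Default: numeric columns only.
numeric_summary = df.describe()

# include="all" adds unique/top/freq rows for non-numeric columns.
full_summary = df.describe(include="all")

print(numeric_summary)
print(full_summary)
```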
Column Names
df.columns
Explanation: List all column names in the dataset.
Shape of the Dataset
df.shape
Explanation: Get the number of rows and columns.
Check for Null Values
df.isnull().sum()
Explanation: Count null values in each column.
Drop Rows with Null Values
df_cleaned = df.dropna()
Explanation: Return a new DataFrame without any row that contains at least one null value. The original df is left unchanged.
Fill Null Values
df.fillna(value='Unknown', inplace=True)
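Explanation: Fill null values in place with a placeholder. The same string is applied to every column, numeric ones included, so this form suits text columns best.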
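The three null-handling steps above can be tried together on a small example. This is a sketch on hypothetical data; filling each column with a value of its own type (here the median for the numeric column) is one common choice, not the only one.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value per column.
df = pd.DataFrame({
    "city": ["Pune", None, "Leeds"],
    "age": [29.0, np.nan, 41.0],
})

# Count nulls per column.
null_counts = df.isnull().sum()

# Drop every row that has at least one null (returns a new DataFrame).
df_dropped = df.dropna()

# Fill per column: a text placeholder for 'city', the median for 'age'.
df_filled = df.fillna({"city": "Unknown", "age": df["age"].median()})

print(null_counts)
print(df_filled)
```

Passing a dictionary to fillna keeps the numeric column numeric, which a single string placeholder would not.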
Unique Values in a Column
df['column_name'].unique()
Explanation: Display unique values in a specific column.
Value Counts
df['column_name'].value_counts()
Explanation: Count the occurrences of each unique value in a column.
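One difference between the two calls is how they treat missing values. The sketch below uses a hypothetical column to show that unique() includes the missing entry, while value_counts() leaves it out unless dropna=False is passed.

```python
import pandas as pd

# Hypothetical column with a repeated value and one missing entry.
df = pd.DataFrame({"column_name": ["a", "b", "a", None, "a"]})

# unique() returns each distinct value, including the missing one.
uniques = df["column_name"].unique()

# value_counts() ignores missing values by default.
counts = df["column_name"].value_counts()

# dropna=False counts the missing values as well.
counts_with_na = df["column_name"].value_counts(dropna=False)

print(uniques)
print(counts)
print(counts_with_na)
```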
Filter Rows by Condition
df_filtered = df[df['column_name'] > some_value]
Explanation: Filter rows based on a condition.
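Conditions can also be combined. The following sketch uses hypothetical columns and a hypothetical threshold; note that each condition needs its own parentheses and that & and | are used instead of and and or.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "age": [17, 25, 34, 52],
    "city": ["X", "Y", "X", "Y"],
})

some_value = 18  # threshold chosen for illustration

# Single condition: rows where age is above the threshold.
adults = df[df["age"] > some_value]

# Two conditions combined with & (logical AND).
adults_in_x = df[(df["age"] > some_value) & (df["city"] == "X")]

print(adults)
print(adults_in_x)
```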
Selecting Multiple Columns
df[['column1', 'column2']]
Explanation: Select and display specific columns.
Add a New Column
df['new_column'] = df['column1'] + df['column2']
Explanation: Add a new column by combining values from other columns.
Rename Columns
df.rename(columns={'old_name': 'new_name'}, inplace=True)
Explanation: Rename columns for better readability.
Sorting Values
df.sort_values(by='column_name', ascending=False)
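Explanation: Sort the dataset by a specific column in descending order. The sorted result is returned as a new DataFrame, so assign it to a variable to keep it.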
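A short sketch on hypothetical data shows that sort_values leaves the original DataFrame untouched and that the old index labels travel with the rows unless the index is reset.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "item": ["pen", "book", "bag"],
    "price": [2.5, 12.0, 30.0],
})

# Sort by price, highest first; df itself is not modified.
sorted_df = df.sort_values(by="price", ascending=False)

# Optionally renumber the rows 0..n-1 after sorting.
sorted_reset = sorted_df.reset_index(drop=True)

print(sorted_df)
print(sorted_reset)
```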
Drop a Column
df.drop('column_name', axis=1, inplace=True)
Explanation: Remove a specific column.
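Adding, renaming, and dropping columns can also be done without inplace=True by chaining methods that each return a new DataFrame. This is a sketch of that alternative style on hypothetical data, reusing the placeholder column names from above.

```python
import pandas as pd

# Hypothetical sample data using the placeholder names from this guide.
df = pd.DataFrame({
    "column1": [1, 2],
    "column2": [10, 20],
    "old_name": ["x", "y"],
})

# Each step returns a new DataFrame; the original df is unchanged.
result = (
    df.assign(new_column=df["column1"] + df["column2"])  # add a column
      .rename(columns={"old_name": "new_name"})          # rename a column
      .drop(columns=["column1"])                         # drop a column
)

print(result)
```

Keeping df unchanged makes it easier to rerun a notebook cell without side effects.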
Group By and Aggregate
df.groupby('column_name').sum(numeric_only=True)
Explanation: Group rows by a column and sum the numeric columns within each group. Other aggregate functions such as mean() or count() work the same way.
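When only some columns should be aggregated, or several statistics are needed at once, the column can be selected first or agg can be used. The sketch below assumes hypothetical region and sales columns.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "region": ["N", "S", "N", "S"],
    "sales": [100, 200, 150, 60],
    "rep": ["a", "b", "c", "d"],
})

# Sum a single column per group.
totals = df.groupby("region")["sales"].sum()

# Several named aggregations in one call.
summary = df.groupby("region").agg(
    total_sales=("sales", "sum"),
    mean_sales=("sales", "mean"),
)

print(totals)
print(summary)
```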
Calculate Mean of a Column
df['column_name'].mean()
Explanation: Calculate the mean of a specific column.
Calculate Median of a Column
df['column_name'].median()
Explanation: Calculate the median of a specific column.
Standard Deviation of a Column
df['column_name'].std()
Explanation: Calculate the standard deviation of a specific column.
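The three statistics can be compared on a small hypothetical column. One detail worth knowing is that pandas computes the sample standard deviation (ddof=1) by default; passing ddof=0 gives the population version.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"column_name": [2, 4, 4, 4, 5, 5, 7, 9]})

mean_value = df["column_name"].mean()
median_value = df["column_name"].median()

# Default is the sample standard deviation (divides by n - 1).
sample_std = df["column_name"].std()

# ddof=0 gives the population standard deviation (divides by n).
population_std = df["column_name"].std(ddof=0)

print(mean_value, median_value, sample_std, population_std)
```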
Detecting Outliers
df[(df['column_name'] > upper_limit) | (df['column_name'] < lower_limit)]
Explanation: Select the rows whose values lie above upper_limit or below lower_limit, two thresholds you define beforehand.
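The text does not say how to choose the limits. One widely used convention, shown here as an assumption rather than the only option, is the 1.5 x IQR rule: values more than 1.5 interquartile ranges outside the first or third quartile are flagged. The data is hypothetical.

```python
import pandas as pd

# Hypothetical sample data with one obviously extreme value.
df = pd.DataFrame({"column_name": [10, 12, 11, 13, 12, 11, 95]})

# Quartiles and interquartile range.
q1 = df["column_name"].quantile(0.25)
q3 = df["column_name"].quantile(0.75)
iqr = q3 - q1

# Limits from the 1.5 x IQR rule (an assumed convention).
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Same filter as above, now with computed limits.
outliers = df[(df["column_name"] > upper_limit) | (df["column_name"] < lower_limit)]

print(lower_limit, upper_limit)
print(outliers)
```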
Apply Custom Function
df['new_column'] = df['column_name'].apply(lambda x: x * 2)
Explanation: Apply a custom function to each value in a column.
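For simple arithmetic such as doubling, plain column operations give the same result as apply and are usually faster. apply remains useful for logic that has no direct column-wise form. The sketch below uses hypothetical data and a hypothetical label function.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"column_name": [1, 2, 3]})

# Element-wise with apply.
df["doubled_apply"] = df["column_name"].apply(lambda x: x * 2)

# Same result with a vectorized column operation.
df["doubled_vectorized"] = df["column_name"] * 2


def label(x):
    """Hypothetical custom rule: tag values of 2 or more as 'high'."""
    return "high" if x >= 2 else "low"


# apply with a named function.
df["label"] = df["column_name"].apply(label)

print(df)
```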
Pivot Table
df.pivot_table(values='value_column', index='index_column', columns='column_name')
Explanation: Create a pivot table that shows the mean of value_column (the default aggregation) for each combination of index_column and column_name.
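A small worked example makes the reshaping easier to see. The data below is hypothetical and reuses the placeholder names from the call above; the second table passes aggfunc="sum" to replace the default mean.

```python
import pandas as pd

# Hypothetical sample data using the placeholder column names.
df = pd.DataFrame({
    "index_column": ["A", "A", "B", "B", "A"],
    "column_name": ["x", "y", "x", "y", "x"],
    "value_column": [1, 2, 3, 4, 5],
})

# Default aggregation is the mean of each (index, column) group.
table = df.pivot_table(
    values="value_column", index="index_column", columns="column_name"
)

# The same layout with sums instead of means.
totals = df.pivot_table(
    values="value_column",
    index="index_column",
    columns="column_name",
    aggfunc="sum",
)

print(table)
print(totals)
```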
Correlation Matrix
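df.corr(numeric_only=True)
Explanation: Calculate the pairwise Pearson correlation matrix for the numeric columns. In recent pandas versions, numeric_only=True is needed when the DataFrame also contains text columns.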
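The sketch below computes a correlation matrix on hypothetical data that includes a text column. It assumes a pandas version in which corr accepts the numeric_only parameter.

```python
import pandas as pd

# Hypothetical sample data: y rises with x, z falls with x.
df = pd.DataFrame({
    "x": [1, 2, 3, 4],
    "y": [2, 4, 6, 8],
    "z": [4, 3, 2, 1],
    "label": ["a", "b", "c", "d"],
})

# Text columns are excluded from the calculation.
corr = df.corr(numeric_only=True)

print(corr)
```

Values close to 1 indicate a strong positive linear relationship, and values close to -1 a strong negative one.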
Visualizing with Histograms
df['column_name'].hist()
Explanation: Plot a histogram of a column to view its distribution. The pandas plotting methods require matplotlib to be installed.
Scatter Plot
df.plot.scatter(x='column_x', y='column_y')
Explanation: Create a scatter plot to see relationships between two columns.
Box Plot
df.boxplot(column='column_name')
Explanation: Generate a box plot to see the median, the spread, and any outliers in a column.