okidk
About Menu Contact

Pandas Cheat Sheet


Dataframe Manipulation

How to create a new dataframe using a mask to filter by date range:

mask = (df['date'] >= start_date) & (df['date'] <= end_date)
df2 = df.loc[mask]

How to filter a dataframe by column value

df2 = df[df['column_name']== 'filter_value']

How to remove column from dataframe

df = df.drop(['column_name'], axis=1)

How to reorder dataframe columns by moving last column to first

cols = df.columns.tolist()
cols = cols[-1:] + cols[:-1]
df = df[cols]

How to rename dataframe column

df = df.rename(columns={'old_name': 'new_name'})

How to merge two dataframes (equivalent to left join)

df = df.merge(df2,on='shared_column_name')

How to group dataframe column by sum of values and reset index

df = df.groupby(["column_a", "column_b"]).sum().reset_index()

How to sort dataframe by column values, high to low

df.sort_values(by=['col1'],ascending=False)

How to convert dataframe column values to list

new_list = df['column'].tolist()

How to convert list of integers to a string

string_name = ','.join([str(elem) for elem in list_name])

How to trim whitespace from column values

df['column'] = df['column'].str.strip()

How to convert column values to lowercase

df['column'] = df['column'].str.lower()

Printf total rows in dataframe

print(len(df.index))

How to save dataframe to csv

df.to_csv('file.csv')

How to convert csv to dataframe

df = pd.read_csv('file.csv')


Dataframe Time Manipulation

How to create first of month column from date column

df['month'] = df['date'].values.astype('datetime64[M]')

How to convert dataframe series column to date format

df['date'] = pd.to_datetime(df['date'], infer_datetime_format=True).dt.date

How to convert datetime object to date

date = date.date()

calculate days between two dates

start_date = datetime.strptime(start_date, '%Y-%m-%d')
end_date = datetime.strptime(end_date, '%Y-%m-%d')
days = (end_date - start_date).days

How to group data by date period

df = df.groupby(pd.Grouper(key="date", freq="1M")).sum().reset_index()


Dataframe Value Manipulation

How to replace strings in column

df['column'] = df['column'].str.replace('$', '')

How to change column data type to datetime obj

df['column'] = df['column'].astype('datetime64[ns]')

How to replace NaN with 0

df['column'] = df['column'].fillna(0)

How to change dataframe values in a specific column that are found within a list

df.loc[(df["column"].isin(["a", "b", "c", "d"]), "column")] = "new_value"

How to correct mixed types of dollars and convert to float

df['column'] = df['column'].replace({'\$ -': '','\$': '', ',': ''}, regex=True).astype(float)


Lambda functions

How to generate column values based on different column values

df['column_b'] = df['column_a'].apply(lambda x: 1 if x > 0 else 0)

How to generate column values with division and avoid dividing by 0

def div(x, y):
if y > 0:
return round(x / y, 2)
else:
return 1

df['percentage'] = df.apply(lambda x: div(x['column_a'], x['column_b']), axis=1)