Pandas and SQL

Reading Pandas DataFrame using SQL select statements

# import sqldf from pandasql
from pandasql import sqldf
# create a short alternative for sqldf using lambda function to avoid passing globals() or locals() every time while calling sqldf
# this command below will give as alternative to sqldf and automatically pass globals() or locals() we just need to pass the query
pysqldf = lambda q: sqldf(q, globals())
# import pandas
import pandas as pd
# read from csv file and use 'header' attribute if column headers are not in 1st line of csv file
customer_df=pd.read_csv("D:\\PowerBI\\Datasets\\World Wide Importers\\DimCustomer.csv", header=1)
customer_df.head(1)
Customer KeyWWI Customer IDCustomerBill To CustomerCategoryBuying GroupPrimary ContactPostal CodeCredit LimitValid FromValid ToLineage Key

0

0

0

Unknown

NaN

NaN

NaN

NaN

NaN

? -

00:00.0

##############################################...

0

# create your SQL query that you want to excute on the DataFrame
q='''
SELECT "Customer Data" 
FROM customer_df LIMIT 4
'''
# use the custom sqldf method created above
pysqldf(q)
Customer KeyWWI Customer IDWWI Customer ID

0

0

0

Unknown

Read from SQL to DF

install sqlalchemy to us pandas's read_sql and to_sql functions

alternatively you can use sqlalchemy with a little extra code

!pip install sqlalchemy
Requirement already satisfied: sqlalchemy in c:\python3\lib\site-packages (1.4.31)
Requirement already satisfied: greenlet!=0.4.17 in c:\python3\lib\site-packages (from sqlalchemy) (1.1.2)


WARNING: You are using pip version 21.2.4; however, version 22.0.3 is available.
You should consider upgrading via the 'C:\Python3\python.exe -m pip install --upgrade pip' command.

install pymysql (it is an extension for sqlalchemy to enable it to connect to mysql sources)

!pip install pymysql
# import sqlalchemy
import sqlalchemy
# import pymysql
import pymysql

create a connection for pandas using slqalchemy function

sqlalchemy.create_engine('mysql+pymysql://username:@host_name/db_name')

conection_engine = sqlalchemy.create_engine('mysql+pymysql://root:@localhost/pandas_sql')
df_customer=pd.read_sql_query("SELECT * FROM customer", conection_engine)
df_customer.head()
IdNameCity

0

1

Ajeet

Ghaziabad

1

2

Rashid

Noida

2

3

Ajay

Gurgaon

3

4

Ramu

Hyderbad

4

5

Rajesh

Bangalore

Once you read the data from SQL and store into a DataFrame you can do anything you want with it

# let's say after doing some transformation and modificaton you have a new DataFrame (df_customer_modified)
df_customer_modified=df_customer

Now if you want you can save this DataFrame in you SQL Database

# write DataFrame into SQL database (if the mentioned table name already exists then this qury will fail)
df_customer_modified.to_sql("customer_modified", conection_engine)

Congratulation !! 🎉

  • You can now access/read DataFrames using SQL Select statements instead of using the traditional Datafram syntax.

  • You can also read data from SQL and save the result as DataFrame.

  • You can save an existing dataframe directly into sql table.

Last updated