Pandas and SQL
Reading Pandas DataFrame using SQL select statements
# import sqldf from pandasql
from pandasql import sqldf
# create a short alternative for sqldf using lambda function to avoid passing globals() or locals() every time while calling sqldf
# this command below will give as alternative to sqldf and automatically pass globals() or locals() we just need to pass the query
pysqldf = lambda q: sqldf(q, globals())
# import pandas
import pandas as pd
# read from csv file and use 'header' attribute if column headers are not in 1st line of csv file
customer_df=pd.read_csv("D:\\PowerBI\\Datasets\\World Wide Importers\\DimCustomer.csv", header=1)
customer_df.head(1)
Customer Key
WWI Customer ID
Customer
Bill To Customer
Category
Buying Group
Primary Contact
Postal Code
Credit Limit
Valid From
Valid To
Lineage Key
0
0
0
Unknown
NaN
NaN
NaN
NaN
NaN
? -
00:00.0
##############################################...
0
# create your SQL query that you want to excute on the DataFrame
q='''
SELECT "Customer Data"
FROM customer_df LIMIT 4
'''
# use the custom sqldf method created above
pysqldf(q)
Customer Key
WWI Customer ID
WWI Customer ID
0
0
0
Unknown
Read from SQL to DF
install sqlalchemy
to us pandas's read_sql
and to_sql
functions
sqlalchemy
to us pandas's read_sql
and to_sql
functionsalternatively you can use sqlalchemy with a little extra code
!pip install sqlalchemy
Requirement already satisfied: sqlalchemy in c:\python3\lib\site-packages (1.4.31)
Requirement already satisfied: greenlet!=0.4.17 in c:\python3\lib\site-packages (from sqlalchemy) (1.1.2)
WARNING: You are using pip version 21.2.4; however, version 22.0.3 is available.
You should consider upgrading via the 'C:\Python3\python.exe -m pip install --upgrade pip' command.
install pymysql (it is an extension for sqlalchemy to enable it to connect to mysql sources)
!pip install pymysql
# import sqlalchemy
import sqlalchemy
# import pymysql
import pymysql
create a connection for pandas using slqalchemy function
sqlalchemy.create_engine('mysql+pymysql://username
:@host_name
/db_name
')
conection_engine = sqlalchemy.create_engine('mysql+pymysql://root:@localhost/pandas_sql')
df_customer=pd.read_sql_query("SELECT * FROM customer", conection_engine)
df_customer.head()
Id
Name
City
0
1
Ajeet
Ghaziabad
1
2
Rashid
Noida
2
3
Ajay
Gurgaon
3
4
Ramu
Hyderbad
4
5
Rajesh
Bangalore
Once you read the data from SQL and store into a DataFrame you can do anything you want with it
# let's say after doing some transformation and modificaton you have a new DataFrame (df_customer_modified)
df_customer_modified=df_customer
Now if you want you can save this DataFrame in you SQL Database
# write DataFrame into SQL database (if the mentioned table name already exists then this qury will fail)
df_customer_modified.to_sql("customer_modified", conection_engine)
Congratulation !! 🎉
You can now access/read DataFrames using SQL Select statements instead of using the traditional Datafram syntax.
You can also read data from SQL and save the result as DataFrame.
You can save an existing dataframe directly into sql table.
Last updated
Was this helpful?