ConsoleFlare
  • Python
    • Python Installation
    • Pandas and SQL
  • Projects
    • Data Analytics Project
      • Courier Analytics Challenge
      • Solution
    • Skytrax Airline Review Analysis Pipeline
      • Setting up Azure SQL Database
      • SkyTrax Web Scraping
  • Reporting
    • Power BI
      • Installation
      • Data Sources
      • Important Links
  • PySpark & Databricks
    • Spark vs Hadoop
    • Cluster Computing
    • PySpark
    • Databricks Introduction
    • PySpark in Databricks
    • Reading Data with PySpark
    • PySpark Transformation Methods
    • Handling Duplicate Data
    • PySpark Action Methods
    • PySpark Native Functions
    • Partitioning
    • Bucketing
    • Partitioning vs Bucketing
  • Live Data Streaming
    • Spark Streaming
      • Installation Issues
      • Jupyter Notebook Setup
  • Data Pipeline
    • Azure Data Factory
  • Blockchain
    • Smart Contract Guide
      • Setting up a Node project
      • Developing smart contracts
  • Interview Questions
    • SQL Interview Questions
    • Power BI Interview Questions
  • T-SQL Exercises
    • Exercise 0
    • Exercise 1
    • Exercise 2
    • Exercise 3
  • CHEAT SHEET
    • Ultimate SQL Server Cheat Sheet
Powered by GitBook
On this page
  • Reading Pandas DataFrame using SQL select statements
  • Read from SQL to DF
  • Congratulation !! 🎉

Was this helpful?

  1. Python

Pandas and SQL

Reading Pandas DataFrame using SQL select statements

# import sqldf from pandasql
from pandasql import sqldf
# create a short alternative for sqldf using lambda function to avoid passing globals() or locals() every time while calling sqldf
# this command below will give as alternative to sqldf and automatically pass globals() or locals() we just need to pass the query
pysqldf = lambda q: sqldf(q, globals())
# import pandas
import pandas as pd
# read from csv file and use 'header' attribute if column headers are not in 1st line of csv file
customer_df=pd.read_csv("D:\\PowerBI\\Datasets\\World Wide Importers\\DimCustomer.csv", header=1)
customer_df.head(1)
Customer Key
WWI Customer ID
Customer
Bill To Customer
Category
Buying Group
Primary Contact
Postal Code
Credit Limit
Valid From
Valid To
Lineage Key

0

0

0

Unknown

NaN

NaN

NaN

NaN

NaN

? -

00:00.0

##############################################...

0

# create your SQL query that you want to excute on the DataFrame
q='''
SELECT "Customer Data" 
FROM customer_df LIMIT 4
'''
# use the custom sqldf method created above
pysqldf(q)
Customer Key
WWI Customer ID
WWI Customer ID

0

0

0

Unknown

Read from SQL to DF

install sqlalchemy to us pandas's read_sql and to_sql functions

alternatively you can use sqlalchemy with a little extra code

!pip install sqlalchemy
Requirement already satisfied: sqlalchemy in c:\python3\lib\site-packages (1.4.31)
Requirement already satisfied: greenlet!=0.4.17 in c:\python3\lib\site-packages (from sqlalchemy) (1.1.2)


WARNING: You are using pip version 21.2.4; however, version 22.0.3 is available.
You should consider upgrading via the 'C:\Python3\python.exe -m pip install --upgrade pip' command.

install pymysql (it is an extension for sqlalchemy to enable it to connect to mysql sources)

!pip install pymysql
# import sqlalchemy
import sqlalchemy
# import pymysql
import pymysql

create a connection for pandas using slqalchemy function

sqlalchemy.create_engine('mysql+pymysql://username:@host_name/db_name')

conection_engine = sqlalchemy.create_engine('mysql+pymysql://root:@localhost/pandas_sql')
df_customer=pd.read_sql_query("SELECT * FROM customer", conection_engine)
df_customer.head()
Id
Name
City

0

1

Ajeet

Ghaziabad

1

2

Rashid

Noida

2

3

Ajay

Gurgaon

3

4

Ramu

Hyderbad

4

5

Rajesh

Bangalore

Once you read the data from SQL and store into a DataFrame you can do anything you want with it

# let's say after doing some transformation and modificaton you have a new DataFrame (df_customer_modified)
df_customer_modified=df_customer

Now if you want you can save this DataFrame in you SQL Database

# write DataFrame into SQL database (if the mentioned table name already exists then this qury will fail)
df_customer_modified.to_sql("customer_modified", conection_engine)

Congratulation !! 🎉

  • You can now access/read DataFrames using SQL Select statements instead of using the traditional Datafram syntax.

  • You can also read data from SQL and save the result as DataFrame.

  • You can save an existing dataframe directly into sql table.

PreviousPython InstallationNextData Analytics Project

Last updated 1 month ago

Was this helpful?