SQL is one of the most relevant languages for databases, so at some point you will need to interact with one to get, store or modify data. Fortunately, you can do it in Python using SQLAlchemy.
In this post I will show how to do it while giving some advice based on problems I have faced.
In order to connect to a SQL database you will need to create an engine. The best way is to use the functions `create_engine` and `engine.url.URL`. For example:
```python
import pandas as pd  # Used later to read/insert data
import sqlalchemy as sa

engine = sa.create_engine(
    sa.engine.url.URL(
        drivername=drivername,
        username="root",
        password="1234",
        host="localhost",
        port=port,
        database="db",
    ),
)
```
You will obviously need to use your own credentials for `username`, `password`, `host` and `database`.
For the `drivername` and `port`, below you have an example for some popular SQL databases:
```yaml
postgres:
  drivername: "postgresql"
  port: 5432
redshift:
  drivername: "redshift+psycopg2"
  port: 5439
my_sql:
  drivername: "mysql+pymysql"
  port: 3306
```
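For instance, this is what the engine from before looks like with the PostgreSQL values filled in (the credentials and database name are placeholders, as before). Note that on SQLAlchemy 1.4 and later the URL is built with `sa.engine.url.URL.create` instead:

```python
# PostgreSQL engine with the default port; on SQLAlchemy 1.4+
# replace sa.engine.url.URL(...) with sa.engine.url.URL.create(...)
engine = sa.create_engine(
    sa.engine.url.URL(
        drivername="postgresql",
        username="root",
        password="1234",
        host="localhost",
        port=5432,
        database="db",
    ),
)
```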
More info about SQL engines can be found in the SQLAlchemy documentation.
Retrieving a pandas dataframe is easy: you can directly use the pandas `read_sql_query` function:
```python
with engine.connect() as connection:
    df = pd.read_sql_query("SELECT * FROM my_table", connection)
```
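If you need to filter on a Python value, `read_sql_query` also accepts a `params` argument, which is safer than building the query with string formatting. A small sketch (the placeholder style, here `%(name)s`, depends on your driver):

```python
# Parameterized query; placeholder syntax varies by DBAPI driver
query = "SELECT * FROM my_table WHERE id > %(min_id)s"
with engine.connect() as connection:
    df = pd.read_sql_query(query, connection, params={"min_id": 10})
```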
You can also run `UPDATE`, `DELETE` or other SQL statements by creating a connection with the engine as follows:
```python
with engine.connect() as connection:
    connection.execute("DELETE FROM my_table WHERE id > 10")
```
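Note that in SQLAlchemy 1.4 and later, raw SQL strings must be wrapped in `sa.text()` and writes committed explicitly; one way is a transactional block:

```python
# SQLAlchemy 1.4+/2.0 style: engine.begin() commits on success
with engine.begin() as connection:
    connection.execute(sa.text("DELETE FROM my_table WHERE id > 10"))
```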
It is also possible to run stored procedures and to pass parameters to them. To do so you need to access the raw connection of the engine.
```python
# The "?" placeholder style is DBAPI-dependent (e.g. pyodbc);
# commit on the connection, not the cursor
connection = engine.raw_connection()
with connection.cursor() as cursor:
    cursor.execute("my_stored_procedure ?, ?", ["param_1", "param_2"])
connection.commit()
```
If you don't have input parameters you can just run `cursor.execute("my_stored_procedure")`.
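Depending on the driver, you can also use the standard DBAPI `callproc` helper instead of writing the call by hand (it is available, for example, in `pymysql` and `psycopg2`):

```python
# Call a stored procedure through the DBAPI-standard helper
connection = engine.raw_connection()
with connection.cursor() as cursor:
    cursor.callproc("my_stored_procedure", ["param_1", "param_2"])
connection.commit()
```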
Ideally, inserting data into SQL is as easy as calling the dataframe function `to_sql`. For example:
```python
df.to_sql(name="my_table", con=engine, if_exists="append", index=False)
```
Be careful when using `if_exists="replace"`, because it will drop the SQL table and create it again. This caused me some problems when I was inserting Japanese text into a table, because the encoding was lost in the process. I usually use `index=False` because the index will be a dummy one. If that is not your case, first call `df.reset_index()`.
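Depending on your pandas version, `to_sql` also accepts `chunksize` and `method` parameters that can speed up large inserts (`method="multi"` batches several rows per `INSERT`):

```python
# Batched insert; values here are illustrative, tune chunksize to your data
df.to_sql(name="my_table", con=engine, if_exists="append", index=False,
          chunksize=1000, method="multi")
```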
The function `df.to_sql` works well with MySQL, but I had serious performance problems when working with PostgreSQL. Luckily there is a workaround:
```python
from io import StringIO

output = StringIO()
df.to_csv(output, sep=";", header=False, index=False, columns=df.columns)

# Jump back to the start of the stream before reading from it
output.seek(0)

# Insert df into postgres using COPY, which is much faster than INSERTs
connection = engine.raw_connection()
with connection.cursor() as cursor:
    cursor.copy_from(output, "my_table", sep=";", null="NULL", columns=df.columns)
connection.commit()
```
It is not as simple as before, but I got a 10x improvement. Be careful to use the same separator in `df.to_csv` and `cursor.copy_from`.
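Note that `copy_from` is specific to `psycopg2`. If your table or column names need quoting, `psycopg2` also offers `copy_expert`, which takes a full `COPY` statement; a sketch using the same separator and `NULL` conventions as above:

```python
# Same stream as before, rewound; COPY ... FROM STDIN reads it directly
output.seek(0)
with connection.cursor() as cursor:
    cursor.copy_expert(
        "COPY my_table FROM STDIN WITH (FORMAT csv, DELIMITER ';', NULL 'NULL')",
        output,
    )
connection.commit()
```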