Data Engineering Best Practices
This page is still W.I.P and solely exists for my own reference.
Avoid Multiple SparkSessions and SparkContexts
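Creating multiple SparkSessions and SparkContexts can cause problems: only one SparkContext can be active per JVM, so trying to construct a second one while the first is still running raises an error. It's a best practice to use the SparkSession.builder.getOrCreate() method, which returns the existing SparkSession if one is already active, or creates a new one if not.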
# Import SparkSession from pyspark.sql
from pyspark.sql import SparkSession
# Get the active SparkSession, or create one if none exists
spark_session = SparkSession.builder.getOrCreate()
# Print spark_session
print(spark_session)