PySpark Transformation Methods
This tutorial covers some of the most common data transformation methods in PySpark.
# avg and max are used below; note that this import shadows Python's built-in max
from pyspark.sql.functions import avg, max
df.select("column1", "column2")
df.filter(df.column_name > 10)
df.groupBy("column1", "column2").agg(avg("column3"), max("column4"))
df1.join(df2, "column_name", "inner")
df.drop("column1", "column2")
df.withColumn("new_column", df.column1 + df.column2)
df.sort(df.column1.desc())
df.distinct()
df.limit(10)