# PySpark Native Functions

PySpark provides a wide range of built-in functions that operate on DataFrame columns. They live in the `pyspark.sql.functions` module and can be imported as follows:

```python
from pyspark.sql.functions import *
```

Some examples of commonly used functions include:

* `sum()`: calculates the sum of the values in a column.

```python
df.agg(sum("column1"))
```

* `avg()`: calculates the average (mean) of the values in a column.

```python
df.agg(avg("column1"))
```

* `min()`: returns the minimum value of a column.

```python
df.agg(min("column1"))
```

* `max()`: returns the maximum value of a column.

```python
df.agg(max("column1"))
```

* `concat()`: concatenates two or more columns (typically string columns) into one.

```python
df.select(concat(col("column1"), col("column2")))
```

These functions can be used inside both `select()` and `agg()`. For example, an aggregate can be computed in `select()` and given a readable name with `alias()`:

```python
df.select(sum("column1").alias("sum_column1"))
```

You can also pass column expressions to the `filter()` method to keep only the rows that satisfy a condition:

```python
df.filter(col("column1") > 10)
```

These functions can also be used with the `withColumn()` method to add a new column to a DataFrame.

```python
df.withColumn("new_column", col("column1") + col("column2"))
```

You can also use the `when()` function, together with the `otherwise()` method of the column it returns, to create a new column based on a condition.

```python
from pyspark.sql.functions import col, when

# "high" where column1 > 10, otherwise "low"
df.withColumn("new_column", when(col("column1") > 10, "high").otherwise("low"))
```

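You can also use the `ifnull()` and `nullif()` functions to handle missing values. `ifnull()` replaces nulls with a fallback value. `nullif()` works the other way: it turns a given value into null.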

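```python
from pyspark.sql.functions import col, ifnull, lit, nullif

# Requires Spark 3.5+. Both arguments are columns, so constants are wrapped in lit().
df.select(ifnull(col("column1"), lit(0)))   # null in column1 -> 0
df.select(nullif(col("column1"), lit(0)))   # 0 in column1 -> null
```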

These are just a few of the built-in functions that PySpark provides. Many more are available, so check the [documentation](https://docs.databricks.com/) for the full list and the latest updates. When working on Databricks, also make sure that any additional libraries your code needs are installed on the cluster.

