Clean
drop_all_parameters_null_columns(df)
Drops all columns that contain only null values in the parameters column.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | A PySpark DataFrame | required |
Source code in delta_utils/clean.py
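The behaviour described above can be illustrated on plain Python data. This is only a sketch, not the code in delta_utils/clean.py: it assumes that "parameters" is a struct-like column (modelled here as a dict in every row) and that a nested field is dropped when it is null in every row. The helper name `drop_all_null_parameters` is hypothetical.

```python
from typing import Any, Dict, List


def drop_all_null_parameters(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Remove keys under "parameters" whose value is None in every row.

    Hypothetical helper: assumes every row carries a "parameters" dict.
    """
    # Collect every key that appears under "parameters" in any row.
    keys = {key for row in rows for key in row["parameters"]}
    # Keep a key if at least one row holds a non-null value for it.
    keep = {
        key
        for key in keys
        if any(row["parameters"].get(key) is not None for row in rows)
    }
    # Rebuild each row with only the surviving parameter keys.
    return [
        {**row, "parameters": {k: v for k, v in row["parameters"].items() if k in keep}}
        for row in rows
    ]


rows = [
    {"id": 1, "parameters": {"a": None, "b": "x"}},
    {"id": 2, "parameters": {"a": None, "b": None}},
]
print(drop_all_null_parameters(rows))
```

Here `a` is null in every row and is removed, while `b` is kept because one row has a value for it.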
fix_invalid_column_names(df)
Replaces all characters that are invalid in Spark column names with their ASCII numbers.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | The dataframe with invalid column names | required |
Returns:
| Name | Type | Description |
|---|---|---|
| DataFrame | DataFrame | Returns a dataframe that has no invalid column names |
Source code in delta_utils/clean.py
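As a rough illustration of the renaming idea, the sketch below works on plain strings rather than a DataFrame. Two things are assumptions and are not confirmed by this page: the set of invalid characters (taken here to be the ones Delta Lake rejects in column names) and the exact replacement format (here the character's decimal code, with no prefix or separator). The names `INVALID_CHARS` and `fix_name` are hypothetical.

```python
# Assumption: the characters Delta Lake rejects in column names.
INVALID_CHARS = " ,;{}()\n\t="


def fix_name(name: str) -> str:
    """Replace each invalid character with its decimal ASCII code.

    Hypothetical helper; the real replacement format may differ.
    """
    return "".join(str(ord(ch)) if ch in INVALID_CHARS else ch for ch in name)


print(fix_name("my col"))      # my32col
print(fix_name("price(usd)"))  # price40usd41
```

On a DataFrame, the same mapping would be applied to every entry of `df.columns`.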
flatten(df, nested_names=True, column_delimiter='_')
Takes a nested dataframe and flattens it.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | The dataframe you want to flatten | required |
| nested_names | bool | Whether to use nested names for the flattened columns | True |
Returns:
| Name | Type | Description |
|---|---|---|
| DataFrame | DataFrame | Returns a flatter dataframe |
Source code in delta_utils/clean.py
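The effect of `nested_names` and `column_delimiter` can be sketched on a nested dict standing in for one row. This is an illustration under assumptions, not the library's implementation: it assumes that `nested_names=True` prefixes each leaf with its parent names joined by `column_delimiter`, and that `nested_names=False` keeps only the leaf name. The helper `flatten_record` is hypothetical.

```python
from typing import Any, Dict


def flatten_record(
    record: Dict[str, Any],
    nested_names: bool = True,
    column_delimiter: str = "_",
    prefix: str = "",
) -> Dict[str, Any]:
    """Flatten a nested dict into a single-level dict (hypothetical helper)."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        # Assumption: with nested_names, parent names are joined to the key.
        name = f"{prefix}{column_delimiter}{key}" if (nested_names and prefix) else key
        if isinstance(value, dict):
            # Recurse into nested structs, carrying the current name as prefix.
            flat.update(flatten_record(value, nested_names, column_delimiter, name))
        else:
            flat[name] = value
    return flat


row = {"id": 1, "user": {"name": "Ann", "address": {"city": "Oslo"}}}
print(flatten_record(row))
# {'id': 1, 'user_name': 'Ann', 'user_address_city': 'Oslo'}
print(flatten_record(row, nested_names=False))
# {'id': 1, 'name': 'Ann', 'city': 'Oslo'}
```

Note that in this sketch, leaf-only names can collide when two nested structs share a field name, in which case the later value overwrites the earlier one.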