Clean
drop_all_parameters_null_columns(df)
Drops all columns that contain only null values in the parameters column.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | A PySpark DataFrame | required |
Source code in delta_utils/clean.py
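As a rough illustration of the idea, the following pure-Python sketch works on plain dictionaries instead of a PySpark DataFrame. It assumes that parameters is a nested structure and that a field is dropped only when it is null in every row; both points are assumptions, and the helper name drop_all_null_parameter_fields is hypothetical, not part of delta_utils.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def drop_all_null_parameter_fields(rows: List[Row]) -> List[Row]:
    """Hypothetical pure-Python analogue, not the delta_utils implementation."""
    # Collect every field name that holds a non-null value in at least one row.
    keep = {
        name
        for row in rows
        for name, value in (row.get("parameters") or {}).items()
        if value is not None
    }
    # Rebuild each row, keeping only those fields inside "parameters".
    cleaned = []
    for row in rows:
        params = row.get("parameters") or {}
        kept = {k: v for k, v in params.items() if k in keep}
        cleaned.append({**row, "parameters": kept})
    return cleaned


rows = [
    {"id": 1, "parameters": {"speed": 3, "mode": None}},
    {"id": 2, "parameters": {"speed": None, "mode": None}},
]
cleaned = drop_all_null_parameter_fields(rows)
```

Here mode is null in every row, so it is removed, while speed is kept because one row has a value for it.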
fix_invalid_column_names(df)
Replaces all characters that are invalid in Spark column names with ASCII numbers.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | The dataframe with invalid column names | required |
Returns:
Name | Type | Description |
---|---|---|
DataFrame | DataFrame | Returns a dataframe that has no invalid column names |
Source code in delta_utils/clean.py
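The renaming rule can be sketched on plain strings without Spark. This sketch assumes the invalid characters are the set Spark rejects when writing Parquet or Delta tables (space, comma, semicolon, braces, parentheses, newline, tab, equals sign) and that each one is replaced by its decimal ASCII code; the exact character set and replacement format used by delta_utils may differ. The names INVALID_CHARS and fix_name are hypothetical.

```python
import re

# Assumed set of invalid characters: space , ; { } ( ) newline tab =
INVALID_CHARS = re.compile(r"[ ,;{}()\n\t=]")


def fix_name(name: str) -> str:
    """Replace each invalid character with its decimal ASCII code (assumed format)."""
    return INVALID_CHARS.sub(lambda match: str(ord(match.group())), name)


names = ["order id", "price(usd)", "ok_name"]
fixed = [fix_name(name) for name in names]
# fixed == ["order32id", "price40usd41", "ok_name"]
```

Names that contain no invalid characters pass through unchanged, so the rule is safe to apply to every column.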
flatten(df, nested_names=True)
Takes a nested dataframe and flattens it.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | The dataframe you want to flatten | required |
nested_names | bool | Whether the flattened columns should use nested names | True |
Returns:
Name | Type | Description |
---|---|---|
DataFrame | DataFrame | Returns a flattened dataframe |
Source code in delta_utils/clean.py
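To make the effect of nested_names concrete, here is a minimal pure-Python sketch that flattens a nested dictionary rather than a PySpark DataFrame. It assumes that nested names are built by joining parent and child names with an underscore, and that nested_names=False keeps only the innermost name; the separator and the exact naming in delta_utils are not confirmed by this page. The helper flatten_record is hypothetical.

```python
from typing import Any, Dict


def flatten_record(
    record: Dict[str, Any], nested_names: bool = True, prefix: str = ""
) -> Dict[str, Any]:
    """Hypothetical analogue of flatten for plain dictionaries."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        # Assumed naming: parent_child when nested_names is True, else just child.
        name = f"{prefix}_{key}" if (nested_names and prefix) else key
        if isinstance(value, dict):
            # Recurse into nested structures, carrying the current name as prefix.
            flat.update(flatten_record(value, nested_names, name))
        else:
            flat[name] = value
    return flat


record = {"id": 1, "address": {"city": "Oslo", "geo": {"lat": 59.9}}}
with_nested = flatten_record(record, nested_names=True)
without_nested = flatten_record(record, nested_names=False)
# with_nested    == {"id": 1, "address_city": "Oslo", "address_geo_lat": 59.9}
# without_nested == {"id": 1, "city": "Oslo", "lat": 59.9}
```

Under these assumptions, dropping the nested names gives shorter column names but can cause collisions when two nested structures share a field name.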