`ice_pick.utils`

Module Contents

Classes

SQLTracker

Functions

`snowpark_query`(session, sql[, non_select, dry, collect])	Non-select queries include statements such as "show databases;".
`_get_schemas`(→ list)
`_get_unique_struct_fields`(→ list)
`_handle_struct_name_type_mismatch`(→ list)
`_create_union_dataframe`(→ snowflake.snowpark.DataFrame)
`_extend_schema`(→ snowflake.snowpark.DataFrame)
`_add_null_cols`(→ list)
`_populate_union_df`(→ snowflake.snowpark.DataFrame)
`concat_standalone`(→ snowflake.snowpark.DataFrame)	Returns a unioned dataframe from the input list of dataframes based on column names.
`melt_standalone`(→ snowflake.snowpark.DataFrame)	Unpivots a dataframe from wide to long format, turning the `value_vars` columns into variable/value rows.
`isna`()
`isnull`()
`pivot`()
`get_dummies`()

class ice_pick.utils.SQLTracker(func)

__call__(*args, **kwargs)

ice_pick.utils.snowpark_query(session, sql, non_select=False, dry=False, collect=False)

Non-select queries include statements such as "show databases;".

ice_pick.utils._get_schemas(union_dfs: list) → list

ice_pick.utils._get_unique_struct_fields(struct_field_list: list) → list

ice_pick.utils._handle_struct_name_type_mismatch(unique_structs_list: list) → list

ice_pick.utils._create_union_dataframe(session: snowflake.snowpark.Session, unique_structs_type_list: list) → snowflake.snowpark.DataFrame

ice_pick.utils._extend_schema(df: snowflake.snowpark.DataFrame, union_df: snowflake.snowpark.DataFrame) → snowflake.snowpark.DataFrame

ice_pick.utils._add_null_cols(union_dfs: list, union_df: snowflake.snowpark.DataFrame) → list

ice_pick.utils._populate_union_df(ext_dfs: list, union_df: snowflake.snowpark.DataFrame) → snowflake.snowpark.DataFrame

ice_pick.utils.concat_standalone(session: snowflake.snowpark.Session, union_dfs: list) → snowflake.snowpark.DataFrame

Returns a unioned dataframe from the input list of dataframes, matching columns by name. Primarily intended for cases where the input dataframes do not have the same number of columns, which the base union function does not support. Columns missing from an input dataframe are added to it with null values before the union.

Parameters:

session (Session) – session object
union_dfs (list) – A list of the input snowpark dataframes to union

Returns:

A snowpark dataframe with the unioned input dataframes

Return type:

snowpark.DataFrame

Example

>>> from snowflake.snowpark.types import FloatType, IntegerType, StringType, StructField, StructType
>>> from ice_pick.utils import concat_standalone

>>> schema_1 = StructType([StructField("a", IntegerType()), StructField("b", StringType())])
>>> schema_2 = StructType([StructField("a", FloatType()), StructField("c", StringType())])
>>> schema_3 = StructType([StructField("a", IntegerType()), StructField("c", StringType())])
>>> schema_4 = StructType([StructField("c", StringType()), StructField("d", StringType())])

>>> df_1 = session.create_dataframe([[1, "snow"], [3, "flake"]], schema_1)
>>> df_2 = session.create_dataframe([[2.0, "ice"], [4.0, "pick"]], schema_2)
>>> df_3 = session.create_dataframe([[6, "test_1"], [7, "test_2"]], schema_3)
>>> df_4 = session.create_dataframe([["testing_d", "testing_f"], ["testing_g", "testing_h"]], schema_4)

>>> union_dfs = [df_1, df_2, df_3, df_4]
>>> unioned_df = concat_standalone(session, union_dfs)
>>> unioned_df.show()
----------------------------------------
|"A"   |"B"    |"C"        |"D"        |
----------------------------------------
|1.0   |snow   |NULL       |NULL       |
|3.0   |flake  |NULL       |NULL       |
|2.0   |NULL   |ice        |NULL       |
|4.0   |NULL   |pick       |NULL       |
|6.0   |NULL   |test_1     |NULL       |
|7.0   |NULL   |test_2     |NULL       |
|NULL  |NULL   |testing_d  |testing_f  |
|NULL  |NULL   |testing_g  |testing_h  |
----------------------------------------
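The union-by-name behaviour described above can be illustrated without a Snowflake connection. The sketch below is not the library's implementation: `union_by_name` is a hypothetical helper that works on plain lists of row dictionaries, and the int-to-float promotion for column "A" is an assumption modelled on the output shown in the example.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def union_by_name(frames: List[List[Row]]) -> List[Row]:
    """Hypothetical stand-in: union row dicts by column name, padding gaps with None."""
    # Collect column names in order of first appearance across all frames.
    columns: List[str] = []
    for frame in frames:
        for row in frame:
            for name in row:
                if name not in columns:
                    columns.append(name)

    # Assumption: a column holding a float in any frame is treated as float everywhere.
    float_cols = {
        name
        for frame in frames
        for row in frame
        for name, value in row.items()
        if isinstance(value, float)
    }

    unioned: List[Row] = []
    for frame in frames:
        for row in frame:
            new_row: Row = {}
            for name in columns:
                value = row.get(name)  # None when the frame lacks this column
                is_plain_int = isinstance(value, int) and not isinstance(value, bool)
                if name in float_cols and is_plain_int:
                    value = float(value)
                new_row[name] = value
            unioned.append(new_row)
    return unioned


# Same data as the example above, expressed as row dictionaries.
df_1 = [{"A": 1, "B": "snow"}, {"A": 3, "B": "flake"}]
df_2 = [{"A": 2.0, "C": "ice"}, {"A": 4.0, "C": "pick"}]
df_3 = [{"A": 6, "C": "test_1"}, {"A": 7, "C": "test_2"}]
df_4 = [{"C": "testing_d", "D": "testing_f"}, {"C": "testing_g", "D": "testing_h"}]

unioned = union_by_name([df_1, df_2, df_3, df_4])
for row in unioned:
    print(row)
```

Each output row carries all four columns, with `None` standing in for the NULL values in the table above.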

ice_pick.utils.melt_standalone(session: snowflake.snowpark.Session, df: snowflake.snowpark.DataFrame, id_vars: list, value_vars: list, var_name: str = 'variable', value_name: str = 'value') → snowflake.snowpark.DataFrame
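The parameter names in this signature mirror those of pandas `melt`, which suggests a wide-to-long unpivot. The sketch below shows that reshaping on plain row dictionaries. It assumes pandas-like semantics and is not taken from the library: `melt_rows` is a hypothetical helper, and the row order `melt_standalone` itself returns may differ.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def melt_rows(
    rows: List[Row],
    id_vars: List[str],
    value_vars: List[str],
    var_name: str = "variable",
    value_name: str = "value",
) -> List[Row]:
    """Hypothetical stand-in: unpivot value_vars into (var_name, value_name) pairs."""
    melted: List[Row] = []
    # Emit one output row per (value column, input row) pair.
    for column in value_vars:
        for row in rows:
            out = {name: row[name] for name in id_vars}  # identifier columns are kept
            out[var_name] = column        # which column the value came from
            out[value_name] = row[column]  # the value itself
            melted.append(out)
    return melted


wide = [
    {"ID": 1, "Q1": 10, "Q2": 20},
    {"ID": 2, "Q1": 30, "Q2": 40},
]

long_rows = melt_rows(wide, id_vars=["ID"], value_vars=["Q1", "Q2"])
for row in long_rows:
    print(row)
```

Two input rows with two value columns produce four output rows, each holding the identifier, the source column name, and the value.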