ice_pick.utils

Module Contents

Classes

Functions

snowpark_query(session, sql[, non_select, dry, collect])

Non-select queries include statements such as "show databases;".

_get_schemas(→ list)

_get_unique_struct_fields(→ list)

_handle_struct_name_type_mismatch(→ list)

_create_union_dataframe(→ snowflake.snowpark.DataFrame)

_extend_schema(→ snowflake.snowpark.DataFrame)

_add_null_cols(→ list)

_populate_union_df(→ snowflake.snowpark.DataFrame)

concat_standalone(→ snowflake.snowpark.DataFrame)

Returns a unioned dataframe from the input list of dataframes based on column names.

melt_standalone(→ snowflake.snowpark.DataFrame)

Returns an unpivoted dataframe, converting the value_vars columns from wide to long format.

isna()

isnull()

pivot()

get_dummies()

class ice_pick.utils.SQLTracker(func)
__call__(*args, **kwargs)
ice_pick.utils.snowpark_query(session, sql, non_select=False, dry=False, collect=False)
Non-select queries include statements such as:

"show databases;"

ice_pick.utils._get_schemas(union_dfs: list) → list
ice_pick.utils._get_unique_struct_fields(struct_field_list: list) → list
ice_pick.utils._handle_struct_name_type_mismatch(unique_structs_list: list) → list
ice_pick.utils._create_union_dataframe(session: snowflake.snowpark.Session, unique_structs_type_list: list) → snowflake.snowpark.DataFrame
ice_pick.utils._extend_schema(df: snowflake.snowpark.DataFrame, union_df: snowflake.snowpark.DataFrame) → snowflake.snowpark.DataFrame
ice_pick.utils._add_null_cols(union_dfs: list, union_df: snowflake.snowpark.DataFrame) → list
ice_pick.utils._populate_union_df(ext_dfs: list, union_df: snowflake.snowpark.DataFrame) → snowflake.snowpark.DataFrame
ice_pick.utils.concat_standalone(session: snowflake.snowpark.Session, union_dfs: list) → snowflake.snowpark.DataFrame

Returns a unioned dataframe from the input list of dataframes based on column names. Primarily intended for cases where the input dataframes do not have the same columns, which the base union function does not support. Columns missing from an input dataframe are added to it with null values before the union.

Parameters:
  • session (Session) – session object

  • union_dfs (list) – A list of the input snowpark dataframes to union

Returns:

A snowpark dataframe with the unioned input dataframes

Return type:

snowpark.DataFrame

Example

>>> from snowflake.snowpark.types import StructType, StructField, IntegerType, FloatType, StringType
>>> from ice_pick.utils import concat_standalone
>>> # `session` is an existing snowflake.snowpark.Session
>>> schema_1 = StructType([StructField("a", IntegerType()), StructField("b", StringType())])
>>> schema_2 = StructType([StructField("a", FloatType()), StructField("c", StringType())])
>>> schema_3 = StructType([StructField("a", IntegerType()), StructField("c", StringType())])
>>> schema_4 = StructType([StructField("c", StringType()), StructField("d", StringType())])
>>> df_1 = session.create_dataframe([[1, "snow"], [3, "flake"]], schema_1)
>>> df_2 = session.create_dataframe([[2.0, "ice"], [4.0, "pick"]], schema_2)
>>> df_3 = session.create_dataframe([[6, "test_1"], [7, "test_2"]], schema_3)
>>> df_4 = session.create_dataframe([["testing_d", "testing_f"], ["testing_g", "testing_h"]], schema_4)
>>> union_dfs = [df_1, df_2, df_3, df_4]
>>> unioned_df = concat_standalone(session, union_dfs)
>>> unioned_df.show()
----------------------------------------
|"A"   |"B"    |"C"        |"D"        |
----------------------------------------
|1.0   |snow   |NULL       |NULL       |
|3.0   |flake  |NULL       |NULL       |
|2.0   |NULL   |ice        |NULL       |
|4.0   |NULL   |pick       |NULL       |
|6.0   |NULL   |test_1     |NULL       |
|7.0   |NULL   |test_2     |NULL       |
|NULL  |NULL   |testing_d  |testing_f  |
|NULL  |NULL   |testing_g  |testing_h  |
----------------------------------------
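
The union-by-name behaviour described above can be illustrated without a Snowflake connection. The following is a minimal plain-Python sketch of the idea only, not the library's implementation: `union_by_name` is a hypothetical helper, rows are modelled as dictionaries rather than Snowpark dataframes, and the type reconciliation visible in the example output (integer values of "A" appearing as floats) is deliberately left out.

```python
from typing import Any


def union_by_name(tables: list[list[dict[str, Any]]]) -> list[dict[str, Any]]:
    """Union several row lists by column name, filling absent columns with None.

    Hypothetical helper for illustration; it is not part of ice_pick.
    """
    # Collect every column name in first-seen order across all inputs.
    columns: list[str] = []
    for table in tables:
        for row in table:
            for name in row:
                if name not in columns:
                    columns.append(name)

    # Re-emit each row against the combined column list; missing values become None.
    return [
        {name: row.get(name) for name in columns}
        for table in tables
        for row in table
    ]


t1 = [{"A": 1, "B": "snow"}, {"A": 3, "B": "flake"}]
t2 = [{"A": 2.0, "C": "ice"}]
t3 = [{"C": "testing_d", "D": "testing_f"}]

unioned = union_by_name([t1, t2, t3])
for row in unioned:
    print(row)
```

Every output row carries the full set of columns A, B, C and D, with None standing in for the NULL values shown in the Snowpark output.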
ice_pick.utils.melt_standalone(session: snowflake.snowpark.Session, df: snowflake.snowpark.DataFrame, id_vars: list, value_vars: list, var_name: str = 'variable', value_name: str = 'value') → snowflake.snowpark.DataFrame

Returns an unpivoted (melted) dataframe. The columns listed in value_vars are converted from wide to long format, while the id_vars columns are kept as identifiers. Each output row holds the identifier values, the name of the source column (in the var_name column), and that column's value (in the value_name column).

Parameters:
  • session (Session) – session object

  • df (snowpark.DataFrame) – A snowpark dataframe to melt

  • id_vars (list) – Column names to use as identifiers

  • value_vars (list) – Column names to unpivot

  • var_name (str, default 'variable') – Name of the variable column

  • value_name (str, default 'value') – Name of the value column

Returns:

The unpivoted snowpark dataframe

Return type:

snowpark.DataFrame

Example

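>>> from snowflake.snowpark.types import StructType, StructField, IntegerType, StringType
>>> from ice_pick.utils import melt_standalone
>>> # `session` is an existing snowflake.snowpark.Session
>>> schema = StructType([StructField("A", StringType()), StructField("B", IntegerType()), StructField("C", IntegerType())])
>>> df = session.create_dataframe([["a", 1, 2], ["b", 3, 4], ["c", 5, 6]], schema)
>>> melt_df = melt_standalone(session, df, ["A"], ["B", "C"])
>>> melt_df.show()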
------------------------------
|"A"  |"VALUE"  |"VARIABLE"  |
------------------------------
|a    |1        |B           |
|b    |3        |B           |
|c    |5        |B           |
|a    |2        |C           |
|b    |4        |C           |
|c    |6        |C           |
------------------------------
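
The wide-to-long reshaping shown in the example can be sketched in plain Python. This is an illustration under stated assumptions rather than the library's code: `melt_rows` is a hypothetical helper that works on lists of dictionaries, its parameter names mirror those documented above, and it keeps column names in lowercase instead of the uppercase names Snowflake reports.

```python
from typing import Any


def melt_rows(
    rows: list[dict[str, Any]],
    id_vars: list[str],
    value_vars: list[str],
    var_name: str = "variable",
    value_name: str = "value",
) -> list[dict[str, Any]]:
    """Unpivot value_vars into (var_name, value_name) pairs, keeping id_vars.

    Hypothetical helper for illustration; it is not part of ice_pick.
    """
    melted: list[dict[str, Any]] = []
    # One pass per unpivoted column, so output rows are grouped by source column.
    for column in value_vars:
        for row in rows:
            out = {key: row[key] for key in id_vars}  # identifier columns are copied as-is
            out[var_name] = column                    # which column the value came from
            out[value_name] = row[column]             # the value itself
            melted.append(out)
    return melted


rows = [
    {"A": "a", "B": 1, "C": 2},
    {"A": "b", "B": 3, "C": 4},
    {"A": "c", "B": 5, "C": 6},
]

melted = melt_rows(rows, ["A"], ["B", "C"])
for row in melted:
    print(row)
```

Three input rows with two value columns yield six output rows, matching the shape of the Snowpark output above.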
ice_pick.utils.isna()
ice_pick.utils.isnull()
ice_pick.utils.pivot()
ice_pick.utils.get_dummies()