ice_pick.utils
Module Contents
Classes
Functions
- ice_pick.utils.snowpark_query(session, sql, non_select=False, dry=False, collect=False)
Non-select queries include statements such as "show databases;".
- ice_pick.utils._get_schemas(union_dfs: list) -> list
- ice_pick.utils._get_unique_struct_fields(struct_field_list: list) -> list
- ice_pick.utils._handle_struct_name_type_mismatch(unique_structs_list: list) -> list
- ice_pick.utils._create_union_dataframe(session: snowflake.snowpark.Session, unique_structs_type_list: list) -> snowflake.snowpark.DataFrame
- ice_pick.utils._extend_schema(df: snowflake.snowpark.DataFrame, union_df: snowflake.snowpark.DataFrame) -> snowflake.snowpark.DataFrame
- ice_pick.utils._add_null_cols(union_dfs: list, union_df: snowflake.snowpark.DataFrame) -> list
- ice_pick.utils._populate_union_df(ext_dfs: list, union_df: snowflake.snowpark.DataFrame) -> snowflake.snowpark.DataFrame
- ice_pick.utils.concat_standalone(session: snowflake.snowpark.Session, union_dfs: list) -> snowflake.snowpark.DataFrame
Returns a unioned dataframe from the input list of dataframes, matching columns by name. Primarily intended for cases where the input dataframes do not all have the same columns, which the base union function does not support. Columns missing from an input dataframe are added to it with null values before the union.
- Parameters:
session (Session) – session object
union_dfs (list) – A list of the input snowpark dataframes to union
- Returns:
A snowpark dataframe with the unioned input dataframes
- Return type:
snowpark.DataFrame
Example
>> schema_1 = StructType([StructField("a", IntegerType()), StructField("b", StringType())])
>> schema_2 = StructType([StructField("a", FloatType()), StructField("c", StringType())])
>> schema_3 = StructType([StructField("a", IntegerType()), StructField("c", StringType())])
>> schema_4 = StructType([StructField("c", StringType()), StructField("d", StringType())])
>> df_1 = session.create_dataframe([[1, "snow"], [3, "flake"]], schema_1)
>> df_2 = session.create_dataframe([[2.0, "ice"], [4.0, "pick"]], schema_2)
>> df_3 = session.create_dataframe([[6, "test_1"], [7, "test_2"]], schema_3)
>> df_4 = session.create_dataframe([["testing_d", "testing_f"], ["testing_g", "testing_h"]], schema_4)
>> union_dfs = [df_1, df_2, df_3, df_4]
>> unioned_df = concat_standalone(session, union_dfs)
>> unioned_df.show()
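The column-matching behavior described for concat_standalone can be illustrated without a Snowflake session. The following is only a plain-Python sketch of the idea (union by column name, with missing columns filled by nulls); it is not ice_pick's implementation. The helper name `union_by_name` and the list-of-dicts row representation are assumptions made for illustration, and the sketch does not attempt the type reconciliation that `_handle_struct_name_type_mismatch` suggests the library performs.

```python
from typing import Any

Row = dict[str, Any]


def union_by_name(tables: list[list[Row]]) -> list[Row]:
    """Hypothetical helper: union row lists by column name, filling gaps with None."""
    # Collect every column name in first-seen order across all inputs.
    columns: list[str] = []
    for table in tables:
        for row in table:
            for name in row:
                if name not in columns:
                    columns.append(name)
    # Re-emit each row with the full column set; absent columns become None.
    return [
        {name: row.get(name) for name in columns}
        for table in tables
        for row in table
    ]


table_1 = [{"a": 1, "b": "snow"}, {"a": 3, "b": "flake"}]
table_2 = [{"a": 2.0, "c": "ice"}, {"a": 4.0, "c": "pick"}]
unioned = union_by_name([table_1, table_2])
for row in unioned:
    print(row)
```

Each output row carries the columns a, b and c; rows from the first input have None for c, and rows from the second have None for b.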
- ice_pick.utils.melt_standalone(session: snowflake.snowpark.Session, df: snowflake.snowpark.DataFrame, id_vars: list, value_vars: list, var_name: str = 'variable', value_name: str = 'value') -> snowflake.snowpark.DataFrame
Returns an unpivoted (melted) dataframe, converting the input from wide to long format. The columns listed in id_vars are kept as identifiers, and the columns listed in value_vars are unpivoted into two columns: one holding the original column name (var_name) and one holding the corresponding value (value_name).
- Parameters:
session (Session) – session object
df (snowpark.DataFrame) – A snowpark dataframe to melt
id_vars (list) – Column names to use as identifiers
value_vars (list) – Column names to unpivot
var_name (str, default 'variable') – Name of the variable column
value_name (str, default 'value') – Name of the value column
- Returns:
A snowpark dataframe with the unpivoted data
- Return type:
snowpark.DataFrame
Example
>> schema = StructType([StructField("A", StringType()), StructField("B", IntegerType()), StructField("C", IntegerType())])
>> df = session.create_dataframe([['a', 1, 2], ['b', 3, 4], ['c', 5, 6]], schema)
>> melt_df = session.melt(df, ['A'], ['B', 'C'])
>> melt_df.show()
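To make the roles of id_vars, value_vars, var_name and value_name concrete, here is a plain-Python sketch of an unpivot over the same sample data. It is an illustration only, not ice_pick's implementation: the helper name `melt_rows` and the list-of-dicts row representation are assumptions, and the row order shown here says nothing about the order Snowpark would return.

```python
from typing import Any

Row = dict[str, Any]


def melt_rows(
    rows: list[Row],
    id_vars: list[str],
    value_vars: list[str],
    var_name: str = "variable",
    value_name: str = "value",
) -> list[Row]:
    """Hypothetical helper: unpivot value_vars into (var_name, value_name) pairs."""
    melted: list[Row] = []
    for column in value_vars:
        for row in rows:
            # Keep the identifier columns unchanged.
            out = {key: row[key] for key in id_vars}
            # Record which column the value came from, and the value itself.
            out[var_name] = column
            out[value_name] = row[column]
            melted.append(out)
    return melted


rows = [
    {"A": "a", "B": 1, "C": 2},
    {"A": "b", "B": 3, "C": 4},
    {"A": "c", "B": 5, "C": 6},
]
melted = melt_rows(rows, ["A"], ["B", "C"])
for row in melted:
    print(row)
```

Three input rows with two value columns yield six output rows, each with the columns A, variable and value.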
- ice_pick.utils.isna()
- ice_pick.utils.isnull()
- ice_pick.utils.pivot()
- ice_pick.utils.get_dummies()