You are viewing documentation about an older version (1.25.0).

modin.pandas.DataFrame.assign

DataFrame.assign(**kwargs) → DataFrame

Assign new columns to a DataFrame.

Returns a new object with all original columns in addition to new ones. Existing columns that are re-assigned will be overwritten.

Parameters: **kwargs (dict of {str: callable or Series}) – The column names are the keywords. If a value is callable, it is computed on the DataFrame and the result is assigned to the new column. The callable must not modify the input DataFrame (Snowpark pandas does not check this). If a value is not callable (e.g. a Series, scalar, or array), it is assigned as is.
Returns: A new DataFrame with the new columns in addition to all the existing columns.
Return type: DataFrame

Notes

Assigning multiple columns within the same assign is possible. Later items in **kwargs may refer to newly created or modified columns in df; items are computed and assigned into df in order.
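
The in-order evaluation described above can be sketched in plain Python, with a dict of lists standing in for a DataFrame. The helper `assign_sketch` is hypothetical and exists only for illustration; it is not part of Snowpark pandas and says nothing about how the library is implemented. It relies on Python preserving keyword-argument order.

```python
def assign_sketch(columns, **kwargs):
    """Illustrative stand-in for assign(); a dict of lists plays the DataFrame."""
    # Copy the columns so the input is left unchanged ("returns a new object").
    result = {name: list(values) for name, values in columns.items()}
    for name, value in kwargs.items():  # keyword order is preserved
        # A callable sees every column assigned so far, including earlier kwargs.
        result[name] = value(result) if callable(value) else value
    return result


temps = {"temp_c": [17.0, 25.0]}
out = assign_sketch(
    temps,
    temp_f=lambda d: [c * 9 / 5 + 32 for c in d["temp_c"]],
    temp_k=lambda d: [(f + 459.67) * 5 / 9 for f in d["temp_f"]],
)
print(list(out))    # ['temp_c', 'temp_f', 'temp_k']
print(list(temps))  # ['temp_c']
```

Because `temp_k` is listed after `temp_f`, its callable can read the `temp_f` column that was created one step earlier; swapping the two keywords in this sketch would raise a `KeyError`.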
If an array of the wrong length is passed to assign, Snowpark pandas truncates the array if it is too long, or repeats its last element until the array reaches the correct length if it is too short. This differs from native pandas, which raises a ValueError if the length of the array does not match the length of df. Snowpark pandas behaves this way to preserve its lazy evaluation paradigm.
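
The length rule above can be modeled on plain Python lists. This is a minimal sketch of the described behavior only: `fit_to_length` is a hypothetical name, not a Snowpark pandas function, and the handling of an empty input is an assumption, since the note does not cover that case.

```python
def fit_to_length(values, n):
    """Hypothetical helper: truncate or pad `values` to exactly n items."""
    values = list(values)
    if not values:
        # Assumption: the note does not say what happens for an empty array.
        raise ValueError("cannot pad an empty sequence")
    if len(values) >= n:
        return values[:n]  # too long: drop the extra items
    # Too short: repeat the last element until the length is n.
    return values + [values[-1]] * (n - len(values))


print(fit_to_length([10, 11], 3))              # [10, 11, 11]
print(fit_to_length([10, 11, 12, 13, 14], 3))  # [10, 11, 12]
```

The two calls mirror the `new_col` examples in the Examples section, where a three-row DataFrame receives a two-element and a five-element list.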

Examples

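>>> # Assumes an active Snowpark session has already been created.
>>> import modin.pandas as pd
>>> import snowflake.snowpark.modin.plugin
>>> df = pd.DataFrame({'temp_c': [17.0, 25.0]},
...                   index=['Portland', 'Berkeley'])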
>>> df
          temp_c
Portland    17.0
Berkeley    25.0

>>> df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)
          temp_c  temp_f
Portland    17.0    62.6
Berkeley    25.0    77.0

>>> df.assign(temp_f=df['temp_c'] * 9 / 5 + 32)
          temp_c  temp_f
Portland    17.0    62.6
Berkeley    25.0    77.0

>>> df.assign(temp_f=lambda x: x['temp_c'] * 9 / 5 + 32,
...           temp_k=lambda x: (x['temp_f'] + 459.67) * 5 / 9)
          temp_c  temp_f  temp_k
Portland    17.0    62.6  290.15
Berkeley    25.0    77.0  298.15

>>> df = pd.DataFrame({'col1': [17.0, 25.0, 22.0]})
>>> df
   col1
0  17.0
1  25.0
2  22.0

>>> df.assign(new_col=[10, 11])
   col1  new_col
0  17.0       10
1  25.0       11
2  22.0       11

>>> df.assign(new_col=[10, 11, 12, 13, 14])
   col1  new_col
0  17.0       10
1  25.0       11
2  22.0       12