modin.pandas.merge_asofΒΆ
- modin.pandas.merge_asof(left, right, on: str | None = None, left_on: str | None = None, right_on: str | None = None, left_index: bool = False, right_index: bool = False, by: str | list[str] | None = None, left_by: str | None = None, right_by: str | None = None, suffixes: Suffixes = ('_x', '_y'), tolerance: int | Timedelta | None = None, allow_exact_matches: bool = True, direction: str = 'backward') pd.DataFrame [source]ΒΆ
Perform a merge by key distance.
This is similar to a left-join except that we match on nearest key rather than equal keys. Both DataFrames must be sorted by the key. For each row in the left DataFrame:
A βbackwardβ search selects the last row in the right DataFrame whose βonβ key is less than or equal to the leftβs key. A βforwardβ search selects the first row in the right DataFrame whose βonβ key is greater than or equal to the leftβs key. A βnearestβ search selects the row in the right DataFrame whose βonβ key is closest in absolute distance to the leftβs key.
Optionally match on equivalent keys with βbyβ before searching with βonβ.
- Parameters:
on (label) β Field name to join on. Must be found in both DataFrames. The data MUST be ordered. Furthermore, this must be a numeric column such as datetimelike, integer, or float. On or left_on/right_on must be given.
left_on (label) β Field name to join on in left DataFrame.
right_on (label) β Field name to join on in right DataFrame.
left_index (bool) β Use the index of the left DataFrame as the join key.
right_index (bool) β Use the index of the right DataFrame as the join key.
by (column name or list of column names) β Match on these columns before performing merge operation.
left_by (column name) β Field names to match on in the left DataFrame.
right_by (column name) β Field names to match on in the right DataFrame.
suffixes (2-length sequence (tuple, list, β¦)) β Suffix to apply to overlapping column names in the left and right side, respectively.
tolerance (int or Timedelta, optional, default None) β Select asof tolerance within this range; must be compatible with the merge index.
allow_exact_matches (bool, default True) β If True, allow matching with the same βonβ value (i.e. less-than-or-equal-to / greater-than-or-equal-to) If False, donβt match the same βonβ value (i.e., strictly less-than / strictly greater-than).
direction (βbackwardβ (default), βforwardβ, or βnearestβ) β Whether to search for prior, subsequent, or closest matches.
- Return type:
Snowpark pandas
DataFrame
Examples
>>> left = pd.DataFrame({"a": [1, 5, 10], "left_val": ["a", "b", "c"]}) >>> left a left_val 0 1 a 1 5 b 2 10 c >>> right = pd.DataFrame({"a": [1, 2, 3, 6, 7], "right_val": [1, 2, 3, 6, 7]}) >>> right a right_val 0 1 1 1 2 2 2 3 3 3 6 6 4 7 7 >>> pd.merge_asof(left, right, on="a") a left_val right_val 0 1 a 1 1 5 b 3 2 10 c 7 >>> pd.merge_asof(left, right, on="a", allow_exact_matches=False) a left_val right_val 0 1 a NaN 1 5 b 3.0 2 10 c 7.0 >>> pd.merge_asof(left, right, on="a", direction="forward") a left_val right_val 0 1 a 1.0 1 5 b 6.0 2 10 c NaN
Here is a real-world times-series example:
>>> quotes = pd.DataFrame( ... { ... "time": [ ... pd.Timestamp("2016-05-25 13:30:00.023"), ... pd.Timestamp("2016-05-25 13:30:00.023"), ... pd.Timestamp("2016-05-25 13:30:00.030"), ... pd.Timestamp("2016-05-25 13:30:00.041"), ... pd.Timestamp("2016-05-25 13:30:00.048"), ... pd.Timestamp("2016-05-25 13:30:00.049"), ... pd.Timestamp("2016-05-25 13:30:00.072"), ... pd.Timestamp("2016-05-25 13:30:00.075") ... ], ... "bid": [720.50, 51.95, 51.97, 51.99, 720.50, 97.99, 720.50, 52.01], ... "ask": [720.93, 51.96, 51.98, 52.00, 720.93, 98.01, 720.88, 52.03] ... } ... ) >>> quotes time bid ask 0 2016-05-25 13:30:00.023 720.50 720.93 1 2016-05-25 13:30:00.023 51.95 51.96 2 2016-05-25 13:30:00.030 51.97 51.98 3 2016-05-25 13:30:00.041 51.99 52.00 4 2016-05-25 13:30:00.048 720.50 720.93 5 2016-05-25 13:30:00.049 97.99 98.01 6 2016-05-25 13:30:00.072 720.50 720.88 7 2016-05-25 13:30:00.075 52.01 52.03 >>> trades = pd.DataFrame( ... { ... "time": [ ... pd.Timestamp("2016-05-25 13:30:00.023"), ... pd.Timestamp("2016-05-25 13:30:00.038"), ... pd.Timestamp("2016-05-25 13:30:00.048"), ... pd.Timestamp("2016-05-25 13:30:00.048"), ... pd.Timestamp("2016-05-25 13:30:00.048") ... ], ... "price": [51.95, 51.95, 720.77, 720.92, 98.0], ... "quantity": [75, 155, 100, 100, 100] ... } ... ) >>> trades time price quantity 0 2016-05-25 13:30:00.023 51.95 75 1 2016-05-25 13:30:00.038 51.95 155 2 2016-05-25 13:30:00.048 720.77 100 3 2016-05-25 13:30:00.048 720.92 100 4 2016-05-25 13:30:00.048 98.00 100 >>> pd.merge_asof(trades, quotes, on="time") time price quantity bid ask 0 2016-05-25 13:30:00.023 51.95 75 51.95 51.96 1 2016-05-25 13:30:00.038 51.95 155 51.97 51.98 2 2016-05-25 13:30:00.048 720.77 100 720.50 720.93 3 2016-05-25 13:30:00.048 720.92 100 720.50 720.93 4 2016-05-25 13:30:00.048 98.00 100 720.50 720.93