You are viewing documentation about an older version (1.27.0). View latest version

modin.pandas.merge_asofΒΆ

modin.pandas.merge_asof(left, right, on: str | None = None, left_on: str | None = None, right_on: str | None = None, left_index: bool = False, right_index: bool = False, by: str | list[str] | None = None, left_by: str | None = None, right_by: str | None = None, suffixes: Suffixes = ('_x', '_y'), tolerance: int | Timedelta | None = None, allow_exact_matches: bool = True, direction: str = 'backward') β†’ pd.DataFrame[source]ΒΆ

Perform a merge by key distance.

This is similar to a left-join except that we match on nearest key rather than equal keys. Both DataFrames must be sorted by the key. For each row in the left DataFrame:

A β€œbackward” search selects the last row in the right DataFrame whose β€˜on’ key is less than or equal to the left’s key. A β€œforward” search selects the first row in the right DataFrame whose β€˜on’ key is greater than or equal to the left’s key. A β€œnearest” search selects the row in the right DataFrame whose β€˜on’ key is closest in absolute distance to the left’s key.

Optionally match on equivalent keys with β€˜by’ before searching with β€˜on’.

Parameters:
  • left (DataFrame or named Series.) –

  • right (DataFrame or named Series.) –

  • on (label) – Field name to join on. Must be found in both DataFrames. The data MUST be ordered. Furthermore, this must be a numeric column such as datetimelike, integer, or float. On or left_on/right_on must be given.

  • left_on (label) – Field name to join on in left DataFrame.

  • right_on (label) – Field name to join on in right DataFrame.

  • left_index (bool) – Use the index of the left DataFrame as the join key.

  • right_index (bool) – Use the index of the right DataFrame as the join key.

  • by (column name or list of column names) – Match on these columns before performing merge operation.

  • left_by (column name) – Field names to match on in the left DataFrame.

  • right_by (column name) – Field names to match on in the right DataFrame.

  • suffixes (2-length sequence (tuple, list, …)) – Suffix to apply to overlapping column names in the left and right side, respectively.

  • tolerance (int or Timedelta, optional, default None) – Select asof tolerance within this range; must be compatible with the merge index.

  • allow_exact_matches (bool, default True) – If True, allow matching with the same β€˜on’ value (i.e. less-than-or-equal-to / greater-than-or-equal-to) If False, don’t match the same β€˜on’ value (i.e., strictly less-than / strictly greater-than).

  • direction (β€˜backward’ (default), β€˜forward’, or β€˜nearest’) – Whether to search for prior, subsequent, or closest matches.

Return type:

Snowpark pandas DataFrame

Examples

>>> left = pd.DataFrame({"a": [1, 5, 10], "left_val": ["a", "b", "c"]})
>>> left
    a left_val
0   1        a
1   5        b
2  10        c
>>> right = pd.DataFrame({"a": [1, 2, 3, 6, 7], "right_val": [1, 2, 3, 6, 7]})
>>> right
   a  right_val
0  1          1
1  2          2
2  3          3
3  6          6
4  7          7
>>> pd.merge_asof(left, right, on="a")
    a left_val  right_val
0   1        a          1
1   5        b          3
2  10        c          7
>>> pd.merge_asof(left, right, on="a", allow_exact_matches=False)
    a left_val  right_val
0   1        a        NaN
1   5        b        3.0
2  10        c        7.0
>>> pd.merge_asof(left, right, on="a", direction="forward")
    a left_val  right_val
0   1        a        1.0
1   5        b        6.0
2  10        c        NaN
Copy

Here is a real-world times-series example:

>>> quotes = pd.DataFrame(
...    {
...        "time": [
...            pd.Timestamp("2016-05-25 13:30:00.023"),
...            pd.Timestamp("2016-05-25 13:30:00.023"),
...            pd.Timestamp("2016-05-25 13:30:00.030"),
...            pd.Timestamp("2016-05-25 13:30:00.041"),
...            pd.Timestamp("2016-05-25 13:30:00.048"),
...            pd.Timestamp("2016-05-25 13:30:00.049"),
...            pd.Timestamp("2016-05-25 13:30:00.072"),
...            pd.Timestamp("2016-05-25 13:30:00.075")
...        ],
...        "bid": [720.50, 51.95, 51.97, 51.99, 720.50, 97.99, 720.50, 52.01],
...        "ask": [720.93, 51.96, 51.98, 52.00, 720.93, 98.01, 720.88, 52.03]
...    }
... )
>>> quotes
                     time     bid     ask
0 2016-05-25 13:30:00.023  720.50  720.93
1 2016-05-25 13:30:00.023   51.95   51.96
2 2016-05-25 13:30:00.030   51.97   51.98
3 2016-05-25 13:30:00.041   51.99   52.00
4 2016-05-25 13:30:00.048  720.50  720.93
5 2016-05-25 13:30:00.049   97.99   98.01
6 2016-05-25 13:30:00.072  720.50  720.88
7 2016-05-25 13:30:00.075   52.01   52.03
>>> trades = pd.DataFrame(
...    {
...        "time": [
...            pd.Timestamp("2016-05-25 13:30:00.023"),
...            pd.Timestamp("2016-05-25 13:30:00.038"),
...            pd.Timestamp("2016-05-25 13:30:00.048"),
...            pd.Timestamp("2016-05-25 13:30:00.048"),
...            pd.Timestamp("2016-05-25 13:30:00.048")
...        ],
...        "price": [51.95, 51.95, 720.77, 720.92, 98.0],
...        "quantity": [75, 155, 100, 100, 100]
...    }
... )
>>> trades
                     time   price  quantity
0 2016-05-25 13:30:00.023   51.95        75
1 2016-05-25 13:30:00.038   51.95       155
2 2016-05-25 13:30:00.048  720.77       100
3 2016-05-25 13:30:00.048  720.92       100
4 2016-05-25 13:30:00.048   98.00       100
>>> pd.merge_asof(trades, quotes, on="time")
                     time   price  quantity     bid     ask
0 2016-05-25 13:30:00.023   51.95        75   51.95   51.96
1 2016-05-25 13:30:00.038   51.95       155   51.97   51.98
2 2016-05-25 13:30:00.048  720.77       100  720.50  720.93
3 2016-05-25 13:30:00.048  720.92       100  720.50  720.93
4 2016-05-25 13:30:00.048   98.00       100  720.50  720.93
Copy