ExtractText 2025.10.2.19
Bundle
org.apache.nifi | nifi-standard-nar
Description
Evaluates one or more Regular Expressions against the content of a FlowFile and assigns the results to FlowFile Attributes. Regular Expressions are entered by adding user-defined properties; the name of each property is the Attribute Name into which the result is placed. How the attributes are generated depends on whether named capture groups are enabled.

If named capture groups are not enabled: the first capture group, if found, is placed into that attribute name. In addition, every capture group, including the matching string sequence itself, is provided at that attribute name with an index appended. The exception is an optional capturing group that does not match. For example, given the attribute name "regex" and the expression "abc(def)?(g)", an attribute "regex.1" with a value of "def" is added if "def" matched. If "def" did not match, no attribute named "regex.1" is added, but an attribute named "regex.2" with a value of "g" is added regardless.

If named capture groups are enabled: each named capture group, if found, is placed into an attribute whose name combines the property name and the group name. If inclusion of capture group 0 is enabled, the matching string sequence itself is also placed into the attribute name. If multiple matches are enabled, an index is applied after the first set of matches. The exception is again an optional capturing group that does not match. For example, given the attribute name "regex" and the expression "abc(?<NAMED>def)?(?<NAMED-TWO>g)", an attribute "regex.NAMED" with the value "def" is added if "def" matched, and an attribute "regex.NAMED-TWO" with the value "g" is added whenever "g" matched, whether or not "def" did.

The value of each property must be a valid Regular Expression with one or more capturing groups. If named capture groups are enabled, all capture groups must be named; otherwise the processor configuration fails validation.

If the Regular Expression matches more than once, only the first match is used unless the property enabling repeating capture groups is set to true. If any provided Regular Expression matches, the FlowFile is routed to 'matched'. If no provided Regular Expression matches, the FlowFile is routed to 'unmatched' and no attributes are applied to it.
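The attribute naming rules above can be illustrated with plain `java.util.regex`. This is a minimal sketch, not the processor's implementation: the class `CaptureGroupSketch` and its two helper methods are hypothetical, only the first match is considered, and capture group 0 is always included in the indexed variant. The hyphenated group name from the text is written as `NAMEDTWO` here because `java.util.regex` group names may contain only ASCII letters and digits.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaptureGroupSketch {

    // Hypothetical helper (not NiFi code): indexed attributes for the first match.
    static Map<String, String> extractIndexed(String attributeName, String regex, String content) {
        Map<String, String> attributes = new LinkedHashMap<>();
        Matcher matcher = Pattern.compile(regex).matcher(content);
        if (!matcher.find()) {
            return attributes; // no match, so no attributes are added
        }
        for (int i = 0; i <= matcher.groupCount(); i++) {
            String value = matcher.group(i);
            if (value == null) {
                continue; // optional group that did not take part in the match
            }
            attributes.put(attributeName + "." + i, value);
            if (i == 1) {
                attributes.put(attributeName, value); // first capture group also fills the bare name
            }
        }
        return attributes;
    }

    // Hypothetical helper (not NiFi code): named attributes for the first match.
    // The caller passes the group names explicitly to keep the sketch short.
    static Map<String, String> extractNamed(String attributeName, String regex,
                                            List<String> groupNames, String content) {
        Map<String, String> attributes = new LinkedHashMap<>();
        Matcher matcher = Pattern.compile(regex).matcher(content);
        if (!matcher.find()) {
            return attributes;
        }
        for (String name : groupNames) {
            String value = matcher.group(name);
            if (value != null) {
                attributes.put(attributeName + "." + name, value);
            }
        }
        return attributes;
    }

    public static void main(String[] args) {
        System.out.println(extractIndexed("regex", "abc(def)?(g)", "abcdefg"));
        System.out.println(extractIndexed("regex", "abc(def)?(g)", "abcg"));
        System.out.println(extractNamed("regex", "abc(?<NAMED>def)?(?<NAMEDTWO>g)",
                List.of("NAMED", "NAMEDTWO"), "abcg"));
    }
}
```

For the input "abcg", the optional group does not participate in the match, so the indexed map holds "regex.0" and "regex.2" but no "regex.1", and the named map holds only "regex.NAMEDTWO".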
Input Requirement
REQUIRED
Supports Sensitive Dynamic Properties
false
Properties
| Property | Description |
|---|---|
| Character Set | The Character Set in which the file is encoded. |
| Enable Canonical Equivalence | Indicates that two characters match only when their full canonical decompositions match. |
| Enable Case-insensitive Matching | Indicates that two characters match even if they are in a different case. Can also be specified via the embedded flag (?i). |
| Enable DOTALL Mode | Indicates that the expression '.' should match any character, including a line terminator. Can also be specified via the embedded flag (?s). |
| Enable Literal Parsing of the Pattern | Indicates that metacharacters and escape characters should be given no special meaning. |
| Enable Multiline Mode | Indicates that '^' and '$' should match just after and just before a line terminator or end of sequence, instead of only the beginning or end of the entire input. Can also be specified via the embedded flag (?m). |
| Enable Unicode Predefined Character Classes | Specifies conformance with the Unicode Technical Standard #18: Unicode Regular Expression Annex C: Compatibility Properties. Can also be specified via the embedded flag (?U). |
| Enable Unicode-aware Case Folding | When used with 'Enable Case-insensitive Matching', matches in a manner consistent with the Unicode Standard. Can also be specified via the embedded flag (?u). |
| Enable Unix Lines Mode | Indicates that only the '\n' line terminator is recognized in the behavior of '.', '^', and '$'. Can also be specified via the embedded flag (?d). |
| Enable Named Group Support | If set to true, when named groups are present in the regular expression, the name of the group is used in the attribute name instead of the group index. All capturing groups must be named; if the number of groups (not including capture group 0) does not equal the number of named groups, validation fails. |
| Enable Repeating Capture Group | If set to true, every string matching the capture groups is extracted. Otherwise, if the Regular Expression matches more than once, only the first match is extracted. |
| Include Capture Group 0 | Indicates that Capture Group 0 should be included as an attribute. Capture Group 0 represents the entirety of the regular expression match, is typically not used, and could have considerable length. |
| Maximum Buffer Size | Specifies the maximum amount of data to buffer (per FlowFile) in order to apply the regular expressions. FlowFiles larger than the specified maximum are not fully evaluated. |
| Maximum Capture Group Length | Specifies the maximum number of characters a capture group value can have. Any characters beyond the maximum are truncated. |
| Permit Whitespace and Comments in Pattern | In this mode, whitespace is ignored, and embedded comments starting with # are ignored until the end of a line. Can also be specified via the embedded flag (?x). |
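Several of the matching options above name embedded flags that are also understood by `java.util.regex.Pattern`. The following sketch shows how those flags change matching in plain Java, under the assumption that each processor property behaves like the `Pattern` constant of the same meaning. The class `FlagSketch` and its `found` helper are hypothetical names used only for illustration.

```java
import java.util.regex.Pattern;

public class FlagSketch {

    // Hypothetical helper: true if the regex, compiled with the given flags,
    // is found anywhere in the input.
    static boolean found(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).find();
    }

    public static void main(String[] args) {
        String content = "first line\nSECOND line";

        // Case-insensitive matching: flag constant and embedded (?i) are interchangeable.
        System.out.println("CASE_INSENSITIVE: " + found("second", Pattern.CASE_INSENSITIVE, content));
        System.out.println("(?i):             " + found("(?i)second", 0, content));

        // DOTALL lets '.' cross the line terminator between the two lines.
        System.out.println("(?s):             " + found("(?s)first.*SECOND", 0, content));

        // MULTILINE lets '^' match at the start of the second line.
        System.out.println("(?m):             " + found("(?m)^SECOND", 0, content));

        // UNIX_LINES treats only '\n' as a line terminator, so '.' may match '\r'.
        System.out.println("UNIX_LINES:       " + found("a.\n", Pattern.UNIX_LINES, "a\r\n"));

        // LITERAL removes the special meaning of '.', so "a.c" no longer matches "abc".
        System.out.println("LITERAL:          " + found("a.c", Pattern.LITERAL, "abc"));

        // COMMENTS ignores pattern whitespace and '#' comments.
        System.out.println("COMMENTS:         "
                + found("SECOND \\s line # trailing comment", Pattern.COMMENTS, content));
    }
}
```

Setting a flag through a property and embedding it in the expression are alternatives; an embedded flag keeps the option visible next to the pattern it affects.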
Relationships
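| Name | Description |
|---|---|
| matched | FlowFiles are routed to this relationship when a Regular Expression is successfully evaluated and the FlowFile is modified as a result. |
| unmatched | FlowFiles are routed to this relationship when no provided Regular Expression matches the content of the FlowFile. |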
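The routing rule can be sketched in a few lines of Java. This is an illustration only, assuming that one matching expression among the user-defined properties is enough for 'matched'; the class `RoutingSketch`, the method `route`, and the sample property names are hypothetical.

```java
import java.util.Map;
import java.util.regex.Pattern;

public class RoutingSketch {

    // Hypothetical helper (not NiFi code): returns the relationship name
    // for the given content and the configured property-to-regex map.
    static String route(Map<String, String> regexByAttribute, String content) {
        for (String regex : regexByAttribute.values()) {
            if (Pattern.compile(regex).matcher(content).find()) {
                return "matched"; // one matching expression is sufficient
            }
        }
        return "unmatched"; // no expression matched
    }

    public static void main(String[] args) {
        Map<String, String> properties = Map.of(
                "order.id", "order=(\\d+)",
                "user.name", "user=(\\w+)");
        System.out.println(route(properties, "user=alice"));
        System.out.println(route(properties, "nothing relevant"));
    }
}
```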