modules.filters
Classes
- class modules.filters.TableSeriesSelector
Select a series from a table or tables.
- Inherits from:
PipeModule
Methods:
- __init__(mname: str | None = None, auto_run: bool = True, table: PortTypeHint.TableCollection | PortTypeHint.TableData | None = None, select_field: tuple[str, str] | str | None = None) None
Initialize TableSeriesSelector object.
Parameters
- table: TableCollection | TableData
The table or table collection to select from.
- select_field: tuple[str, str] | str, default: None
The field to select from. If a tuple, it is used for a TableCollection: the first element is the table name and the second element is the field name. If a str, it is used for a TableData and is the field name.
Ports
- InputTable: PortTypeHint.TableCollection | PortTypeHint.TableData
The table to select from.
- OutputTableSeries: PortTypeHint.TableSeries
The selected table series.
Attributes:
- InputTable: PortReference[PortTypeHint.TableCollection | PortTypeHint.TableData]
- OutputTableSeries: PortReference[PortTypeHint.TableSeries]
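The two forms of select_field can be illustrated with a minimal sketch. It does not use the library: a table is modeled as a plain dict of column name to values and a collection as a dict of table name to table, and select_series is a hypothetical helper, not part of modules.filters.

```python
# Hypothetical stand-ins: a "table" is a dict of column -> values,
# a "collection" is a dict of table name -> table.
def select_series(data, select_field):
    """Resolve select_field as (table, field) for a collection, or as a field for a table."""
    if isinstance(select_field, tuple):
        table_name, field_name = select_field
        return data[table_name][field_name]
    return data[select_field]


boreholes = {"id": ["BH1", "BH2"], "depth": [12.5, 30.0]}
collection = {"boreholes": boreholes}

from_table = select_series(boreholes, "depth")
from_collection = select_series(collection, ("boreholes", "depth"))
```

Both calls return the same column; only the addressing differs.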
- class modules.filters.TableFieldsSelector
Select or remove columns from TableData / TableCollection.
- Inherits from:
PipeModule
Methods:
- __init__(mname: str = 'TableFieldsSelector', auto_run: bool = True, table: PortTypeHint.TableCollection | PortTypeHint.TableData | None = None, select_fields: list[str] | dict[str, list[str]] | None = None, operation: Literal[select, remove] = 'select', rename_output: TableMetadata | dict | None = None, rename_nested_table: dict[str, TableMetadata | dict] | None = None) None
Initialize TableFieldsSelector object.
Parameters
- select_fields: list[str] | dict[str, list[str]] | None
How the value is interpreted depends on the input type:
- TableData + dict[str, list[str]]: at least one key must equal this table's name or title.
- TableCollection + list[str]: the same labels are applied to each nested table; unknown labels are skipped for both select and remove.
- TableCollection + dict[str, list[str]]: keys are table names or titles. For targeted tables, select raises KeyError on unknown field labels; remove skips unknown fields. Omitted tables are deep-copied unchanged.
- operation: {"select", "remove"}, default: "select"
"select" keeps only the resolved columns; "remove" drops the resolved columns. List indexing keeps TableData for a single retained column.
- rename_output: TableMetadata | dict | None, optional
Optional renaming of the output table or collection shell. Accepts a gdi.dataclass.tables.TableMetadata instance or a plain dict with only the keys "name", "title", and/or "description" (all optional). String values must be non-empty. Examples:
TableMetadata(name="my_table", title="My title", description="…")
{"name": "my_table"}
{"title": "My title", "description": "A description"}
For TableData input, this applies to the output TableData. For TableCollection input, it applies to the output collection object (not to individual nested tables; use rename_nested_table for those).
- rename_nested_table: dict[str, TableMetadata | dict] | None, optional
Only used when the input is a TableCollection. Must be None or a dict mapping each nested table to a gdi.dataclass.tables.TableMetadata instance or an identity dict (same shape as rename_output). Each key is the nested table's name or title before this module. If a table's name and title differ and both appear as keys, the two values must be identical. Example:
{
"钻孔土层": TableMetadata(name="钻孔土层_fields", title="土层 (subset)"),
"钻孔水位": {"name": "gwt_fields"},
}
Renaming a nested table's name updates main_table, sub_tables, and dict primary_key keys on the collection to stay consistent.
Ports
- InputTable: PortTypeHint.TableCollection | PortTypeHint.TableData
The table to select from.
- OutputTable: PortTypeHint.TableData | PortTypeHint.TableCollection
The table or table collection after the field operation.
Attributes:
- InputTable: PortReference[PortTypeHint.TableCollection | PortTypeHint.TableData]
- OutputTable: PortReference[PortTypeHint.TableData | PortTypeHint.TableCollection]
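The select and remove rules for a TableCollection can be sketched in plain Python. This is an illustration of the documented behavior under simplifying assumptions, not the module's code: tables are dicts of column to values, matching by title is left out, and apply_fields is a hypothetical name.

```python
import copy


def apply_fields(collection, select_fields, operation="select"):
    """Sketch of the documented rules; collection is {table name: {column: values}}."""
    if isinstance(select_fields, list):
        # list form: same labels for every nested table, unknown labels skipped
        targets = {name: select_fields for name in collection}
        strict = False
    else:
        # dict form: only targeted tables change; select is strict about labels
        targets = select_fields
        strict = True

    out = {}
    for name, table in collection.items():
        if name not in targets:
            out[name] = copy.deepcopy(table)  # omitted tables are copied unchanged
            continue
        labels = targets[name]
        if operation == "select":
            missing = [f for f in labels if f not in table]
            if strict and missing:
                raise KeyError(f"unknown fields in {name!r}: {missing}")
            out[name] = {f: list(table[f]) for f in labels if f in table}
        else:  # "remove" skips unknown fields in both forms
            out[name] = {f: list(v) for f, v in table.items() if f not in labels}
    return out
```

With a list, a label missing from one table is silently skipped there; with a dict and operation "select", the same missing label raises KeyError.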
- class modules.filters.TableSelector
Select a table from a TableCollection.
- Inherits from:
PipeModule
Methods:
- __init__(mname: str | None = None, auto_run: bool = True, tables: PortTypeHint.TableCollection | None = None, table_name: str | None = None, table_idx: int | None = None) None
Initialize TableSelector object.
Parameters
- tables: TableCollection
The table collection to select from.
- table_name: str, default: None
The name or title of the table to select. If not None, table_idx is ignored.
- table_idx: int, default: None
The index of the table in the table collection.
Attributes:
- InputTables: PortReference[PortTypeHint.TableCollection]
- OutputTable: PortReference[PortTypeHint.TableData]
- class modules.filters.TableCollectionSelector
Select tables or remove tables from a TableCollection.
- Inherits from:
PipeModule
Methods:
- __init__(mname: str = 'TableCollectionSelector', auto_run: bool = True, tables: PortTypeHint.TableCollection | None = None, table_names: list[str] | None = None, table_idxs: list[int] | None = None, operation: Literal[select, remove] = 'select') None
Initialize TableCollectionSelector object.
Parameters
- tables: TableCollection
The table collection to select from.
- table_names: list[str] | None
The names or titles of the tables to select or remove. Names that are not in the table collection are ignored. If not None, table_idxs is ignored.
- table_idxs: list[int] | None
The indices of the tables to select or remove. Out-of-range indices are ignored.
- operation: Literal["select", "remove"]
The operation to perform on the table collection. With "select", the listed tables are kept. With "remove", the listed tables are dropped.
Notes
A new table collection is created, so the original table collection is not modified. main_table, sub_tables, and primary_key on the output are derived from the input and restricted to the tables that remain after the operation.
If table_names and table_idxs are both None: with operation "select", None is returned; with operation "remove", the original table collection is returned.
- execute() PortTypeHint.TableCollection | None
Execute the table selection or removal operation.
Returns
- TableCollection or None
The resulting table collection after the selection or removal operation.
Attributes:
- InputTables: PortReference[PortTypeHint.TableCollection]
- OutputTables: PortReference[PortTypeHint.TableCollection]
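The precedence and fallback rules described in the Notes can be sketched as follows. The collection is modeled as an ordered dict of table name to table, select_tables is a hypothetical helper, and treating negative indices as out of range is an assumption the source does not confirm.

```python
def select_tables(collection, table_names=None, table_idxs=None, operation="select"):
    """collection: dict of table name -> table; insertion order gives the index."""
    names = list(collection)
    if table_names is not None:
        # table_names wins over table_idxs; unknown names are ignored
        chosen = {n for n in table_names if n in collection}
    elif table_idxs is not None:
        # out-of-range indices are ignored (negative indices too: an assumption)
        chosen = {names[i] for i in table_idxs if 0 <= i < len(names)}
    else:
        # neither given: "select" yields None, "remove" yields the input itself
        return None if operation == "select" else collection
    if operation == "select":
        return {n: t for n, t in collection.items() if n in chosen}
    return {n: t for n, t in collection.items() if n not in chosen}
```

In the selecting and removing branches a new dict is returned, mirroring the note that the original collection is not modified.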
- class modules.filters.TablesQuery
Filter tables using SQL-like query template expressions with dynamic UI variables.
This module provides a flexible, template-based approach to filtering tables using pandas query syntax. Template variables allow end-users to change query values through the UI without editing the query directly.
Supports:
Single TableData filtering
TableCollection filtering (all tables independently)
Related table filtering with cascade (main → children)
Auto-detection of table relationships from TableCollection
Dynamic template variables with auto-generated or explicit UI schemas
Template Variable Syntax
Use curly braces for variable placeholders: {variable_name}. String values are auto-quoted based on value_type, so no manual quotes are needed:
"`年份` == {tpl_year}" with value_type="int" → `年份` == 2007
"`国家` == {tpl_country}" with value_type="str" → `国家` == 'US'
Important: All template variable names MUST start with ‘tpl_’ prefix to avoid conflicts with class attributes.
Variable Configuration Template variables can be configured in two ways:
TemplateVariableConfig: For auto-generated schemas (auto_select, auto_range)
UIAttributeSchema: For explicit schema control (direct use)
Examples
Simple single table filter (no variables)
>>> TablesQuery(
...     query_template="year == 2007 and country == 'US'"
... )
With auto-generated schema (unique values dropdown)
>>> query_module = TablesQuery(
...     query_template="`年份` == {tpl_year} and `国家` == {tpl_country}",
...     template_variables={
...         "tpl_year": TemplateVariableConfig(
...             title="年份",  # "Year"
...             default=2007,
...             value_type="int",
...             schema_type="auto_select",
...         ),
...         "tpl_country": TemplateVariableConfig(
...             title="国家",  # "Country"
...             default="US",
...             value_type="str",  # Auto-quoted in query
...             schema_type="auto_select",
...         )
...     }
... )
>>> # Users can change values via attributes:
>>> query_module.tpl_year = 2008
>>> query_module.tpl_country = "CN"
With explicit UIAttributeSchema (full control)
>>> TablesQuery(
...     query_template="`状态` == {tpl_status}",
...     template_variables={
...         "tpl_status": StringAttributeSchema(
...             title="状态",  # "Status"
...             default="active",
...             selections=["active", "inactive"],
...             selections_name=["活动", "非活动"]  # "Active", "Inactive"
...         )
...     }
... )
Auto-range for numeric filtering (slider/number with min/max)
>>> TablesQuery(
...     query_template="`Depth` > {tpl_min_depth}",
...     template_variables={
...         "tpl_min_depth": TemplateVariableConfig(
...             title="最小深度",  # "Minimum depth"
...             default=10,
...             value_type="float",
...             schema_type="auto_range",
...         )
...     }
... )
- Inherits from:
PipeModule
Methods:
- __init__(mname: str = 'TablesQuery', auto_run: bool = True, tables: PortTypeHint.TableCollection | PortTypeHint.TableData | None = None, query_template: str | None = None, template_variables: dict[str, TemplateVariableConfig | UIAttributeSchema] | None = None, main_table: str | None = None, related_tables: list[str] | None = None, join_key: str | None = None, cascade_to_children: bool = True, debug_mode: bool = False, none_error_type: Literal[error, warning, gdi_warning, ignore] = 'warning') None
Initialize TablesQuery.
Examples
A template_variables mapping:
{
"tpl_year": TemplateVariableConfig(title="年份", default=2007, value_type="int"),
"tpl_status": StringAttributeSchema(title="状态", default="active", selections=[...])
}
Parameters
- main_table: str, optional
Name or title of the main table to filter (for related table mode). If not provided, the TableCollection's main_table is used if available.
- related_tables: list[str], optional
Names or titles of related tables to filter based on the main table results. If not provided, the TableCollection's sub_tables are used if available.
- join_key: str, optional
Column name or title used to join the main and related tables. If not provided, the TableCollection's primary_key is used if available.
- cascade_to_children: bool, default: True
When table relationships exist (either explicit or from the TableCollection):
True: apply the query to the main table first, then cascade the filtered keys to the children.
False: apply the query independently to all tables (ignores relationships).
- debug_mode: bool, default: False
If True, show detailed error messages for query evaluation.
- none_error_type: Literal["error", "warning", "gdi_warning", "ignore"], default: "warning"
How to handle None values in template variables:
"error": raise a ValueError (strict validation).
"warning": print a UserWarning to the console.
"gdi_warning": print a GDIWarning to the console (shows a warning in GDIM).
"ignore": skip the None check entirely (use when None is acceptable).
- update_ui_schema(reset: bool = False) dict[str, UIAttributeSchema]
Generate UI schemas for query template and template variables.
- execute() PortTypeHint.TableCollection | PortTypeHint.TableData | None
Execute the query filter with template variable substitution.
Properties:
- template_variables
Get the template variables configuration.
Attributes:
- InputTables: PortReference[PortTypeHint.TableCollection | PortTypeHint.TableData]
- OutputTables: PortReference[PortTypeHint.TableCollection | PortTypeHint.TableData]
- TEMPLATE_VARIABLE_PREFIX: str = 'tpl_'
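The template substitution described for TablesQuery (placeholders in curly braces, the mandatory tpl_ prefix, auto-quoting for value_type "str") can be sketched with the standard library. render_query and its arguments are hypothetical; how the module itself performs the substitution is not shown in this reference.

```python
import re

PREFIX = "tpl_"  # mirrors TEMPLATE_VARIABLE_PREFIX


def render_query(template, values, value_types):
    """Replace {tpl_*} placeholders; values declared as "str" are quoted."""
    def replace(match):
        name = match.group(1)
        if not name.startswith(PREFIX):
            raise ValueError(f"template variable {name!r} must start with {PREFIX!r}")
        value = values[name]
        if value_types.get(name) == "str":
            return repr(str(value))  # adds the quotes the user did not type
        return str(value)

    return re.sub(r"\{(\w+)\}", replace, template)


query = render_query(
    "year == {tpl_year} and country == {tpl_country}",
    {"tpl_year": 2007, "tpl_country": "US"},
    {"tpl_year": "int", "tpl_country": "str"},
)
```

The rendered string is an ordinary pandas query expression of the kind the class description refers to.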
- class modules.filters.RelatedTablesRowFilter
Filter rows of a series of related tables based on a condition.
- Inherits from:
_BaseTablesRowFilter
Methods:
- __init__(mname: str | None = None, auto_run: bool = True, tables: PortTypeHint.TableCollection | None = None, filter_by: str | None = None, filter_operation: Literal[(eq, ne, gt, lt, ge, le, in, not_in, contains, starts_with, ends_with)] | None = None, filter_value: Any = None, min_items: int = 0, max_items: int | None = None, filter_by_read_only: bool = False, filter_operation_read_only: bool = False, fiter_by_visible: bool = True, filter_operation_visible: bool = True, filter_value_type: Literal[(str, int, float, list, bool)] | None = None, filter_value_choices: Literal[by_uniques] | None = None, main_table: str | None = None, sub_tables: list[str] | None = None, primary_key: str | None = None, local_functions_path: str | None = None, local_functions_name: str | None = None) None
Initialize RelatedTablesRowFilter object.
Parameters
- tables: TableCollection
The table collection to filter.
- filter_by: str
The column name or title to apply the filter on.
- filter_operation: str, optional
Predefined filter operation to use ('eq', 'ne', 'gt', 'lt', 'ge', 'le', etc.). Check FILTER_OPERATIONS for all supported operations.
- filter_value: Any, optional
The value to compare against when using filter_operation.
- min_items: int, default: 0
The minimum number of filters.
- max_items: int | None, default: None
The maximum number of filters. If None, no limit.
- filter_by_read_only: bool, default: False
Whether filter_by is read-only. If True, filter_by cannot be changed by the user.
- filter_operation_read_only: bool, default: False
Whether filter_operation is read-only. If True, filter_operation cannot be changed by the user.
- fiter_by_visible: bool, default: True
Whether filter_by is visible. If False, filter_by is not shown in the UI.
- filter_operation_visible: bool, default: True
Whether filter_operation is visible. If False, filter_operation is not shown in the UI.
- filter_value_type: Literal["str", "int", "float", "list", "bool"] | None, default: None
The type of the filter value. It is used in RangeModel. If None, the filter value will be a string.
- filter_value_choices: Literal["by_uniques"] | None, default: None
The choices of the filter value. It is used in RangeModel. If None, the filter value will be a string.
- main_table: str
The name of the main table to apply the filter on.
- sub_tables: list[str], optional
List of other tables to filter based on the filtered main table.
- primary_key: str, optional
The key column in the main table to use for joining with sub tables.
- local_functions_path: str, optional
The path to the local functions file. If not provided, the module tries to use the local functions file path in the pipeline.
- local_functions_name: str, optional
The name of the local function that defines how to get the filter value range model. The module uses the path of the local functions file to load it.
Notes
Only the main table and the specified sub tables are included in the output table collection; other tables are ignored.
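A minimal sketch of the main-to-sub cascade, assuming tables are lists of row dicts and that sub tables carry the same primary key column as the main table. filter_related is a hypothetical helper written for illustration only.

```python
import operator


def filter_related(tables, main_table, sub_tables, primary_key, filter_by, op, value):
    """tables: dict of table name -> list of row dicts (hypothetical model)."""
    # 1. filter the main table on the condition
    main_rows = [row for row in tables[main_table] if op(row[filter_by], value)]
    # 2. collect the primary keys that survived
    kept_keys = {row[primary_key] for row in main_rows}
    # 3. keep only sub table rows that point at a surviving key
    result = {main_table: main_rows}
    for name in sub_tables:
        result[name] = [row for row in tables[name] if row[primary_key] in kept_keys]
    return result  # tables that are neither main nor listed are left out


tables = {
    "holes": [{"id": "BH1", "depth": 10}, {"id": "BH2", "depth": 30}],
    "layers": [{"id": "BH1", "soil": "clay"}, {"id": "BH2", "soil": "sand"}],
    "notes": [{"id": "BH2", "text": "checked"}],
}
deep = filter_related(tables, "holes", ["layers"], "id", "depth", operator.gt, 20)
```

Here "notes" is dropped from the result because it is not listed in sub_tables.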
- class modules.filters.TablesRowFilter
Filter rows of each table from a table collection or a single table based on a condition.
- Inherits from:
_BaseTablesRowFilter
Methods:
- __init__(mname: str | None = 'TablesRowFilter', auto_run: bool = True, tables: PortTypeHint.TableCollection | PortTypeHint.TableData | None = None, filter_by: str | None = None, filter_operation: Literal[(eq, ne, gt, lt, ge, le, in, not_in, contains, starts_with, ends_with, is_null, is_not_null)] | None = None, filter_value: Any = None, min_items: int = 0, max_items: int | None = None, filter_by_read_only: bool = False, filter_operation_read_only: bool = False, fiter_by_visible: bool = True, filter_operation_visible: bool = True, filter_value_type: Literal[(str, int, float, list, bool)] | None = None, filter_value_choices: Literal[by_uniques] | None = None, module_attributes_map: dict[(str, str)] | dict[(str, tuple[(str, str)] | list[str])] | None = None, input_attributes_port_required: bool = False) None
Initialize TablesRowFilter object.
Parameters
- tables: TableCollection | TableData
The table collection or single table to filter.
- filter_by: str
The column name or title to apply the filter on.
- filter_operation: str, optional
Predefined filter operation to use ('eq', 'ne', 'gt', 'lt', 'ge', 'le', etc.). Check FILTER_OPERATIONS for all supported operations.
- filter_value: Any, optional
The value to compare against when using filter_operation.
- filter_value_type: Literal["str", "int", "float", "list", "bool"] | None, default: None
The type of the filter value. It is used in RangeModel. If None, the filter value will be a string.
- module_attributes_map: dict[str, str] | dict[str, tuple[str, str] | list[str]] | None
The map of the other module's attributes to the current module's attributes.
- input_attributes_port_required: bool
Whether the "InputAttributes" port data is required.
Notes
- When the input is a TableCollection: all tables with the field filter_by are included in the output table collection.
- When the input is a single TableData: the filtered table is returned as a single TableData object.
- Multiple filter conditions can be added using the add_filter method.
Properties:
- InputTables
- OutputTablesRowMask
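The filter_operation names in the signature can be read as a lookup from name to predicate. The mapping below is a hypothetical sketch; the module's own FILTER_OPERATIONS table is not reproduced in this reference and may differ, for example in how "contains" treats non-string cells.

```python
import operator

# Hypothetical mapping from operation name to predicate(cell, value).
OPS = {
    "eq": operator.eq, "ne": operator.ne,
    "gt": operator.gt, "lt": operator.lt,
    "ge": operator.ge, "le": operator.le,
    "in": lambda cell, value: cell in value,
    "not_in": lambda cell, value: cell not in value,
    "contains": lambda cell, value: value in str(cell),
    "starts_with": lambda cell, value: str(cell).startswith(value),
    "ends_with": lambda cell, value: str(cell).endswith(value),
    "is_null": lambda cell, value: cell is None,
    "is_not_null": lambda cell, value: cell is not None,
}


def filter_rows(rows, filter_by, filter_operation, filter_value=None):
    """rows: list of row dicts; returns the rows whose cell passes the predicate."""
    test = OPS[filter_operation]
    return [row for row in rows if test(row[filter_by], filter_value)]
```

For example, filter_rows(rows, "soil", "contains", "clay") keeps the rows whose soil cell contains "clay".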
- class modules.filters.DropDuplicateRows
Drop duplicate rows from a table.
- Inherits from:
PipeModule
Methods:
- __init__(mname: str = 'DropDuplicateRows', auto_run: bool = True, table: PortTypeHint.TableData | None = None, subset: list[str] | None = None, keep_empty_strings: bool = True, keep_null_values: bool = True, join_string_columns: list[str] | None = None, join_separator: str = '\n') None
Initialize DropDuplicateRows object.
Parameters
- table: TableData
The table to drop duplicate rows from.
- subset: list[str] | None
The columns to use to identify duplicate rows. If None, all columns are used.
- keep_empty_strings: bool, default: True
Whether to keep rows with empty strings in the output. If False, rows containing empty strings are removed.
- keep_null_values: bool, default: True
Whether to keep rows with None or NaN values in the output. If False, rows containing None or NaN values are removed.
- join_string_columns: list[str] | None, default: None
Columns with string type whose values are joined when duplicate rows are found (based on subset). The values are joined using join_separator. If None, no columns are joined.
- join_separator: str, default: "\n"
The separator string to use when joining values in join_string_columns.
Ports
- InputTable: PortTypeHint.TableData
The table to drop duplicate rows from.
- OutputTable: PortTypeHint.TableData
The table with duplicate rows dropped.
Attributes:
- InputTable: PortReference[PortTypeHint.TableData]
- OutputTable: PortReference[PortTypeHint.TableData]
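The interaction between subset and join_string_columns can be sketched on plain row dicts. drop_duplicates below is a hypothetical helper, and keeping the first row of each duplicate group is an assumption; the reference does not state which row is retained.

```python
def drop_duplicates(rows, subset=None, join_string_columns=None, join_separator="\n"):
    """rows: list of dicts. Keeps the first row per key and joins the listed columns."""
    join_cols = join_string_columns or []
    merged = {}  # key -> output row; dicts keep first-seen order
    for row in rows:
        cols = subset if subset is not None else list(row)
        key = tuple(row[c] for c in cols)
        if key not in merged:
            merged[key] = dict(row)  # first occurrence (assumed to be the one kept)
            continue
        for col in join_cols:
            # later duplicates contribute only their joined string columns
            merged[key][col] = merged[key][col] + join_separator + row[col]
    return list(merged.values())
```

With subset=["id"] and join_string_columns=["note"], two rows sharing an id collapse into one row whose note holds both texts separated by join_separator.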
- class modules.filters.MarkdownSectionFilter
Filter markdown content by sections with flexible selection rules.
This module parses markdown into a section tree based on headings, then filters sections according to user-defined rules. It’s designed for preparing markdown documents for LLM processing or RAG workflows by:
Selecting relevant sections while dropping noise (TOC, figure lists, etc.)
Preserving parent heading context for selected subsections
Handling tables with multiple strategies (keep/drop/compress)
Providing detailed metadata about what was included/excluded
- Inherits from:
PipeModule
Methods:
- __init__(mname: str = 'MarkdownSectionFilter', auto_run: bool = True, markdown: PortTypeHint.Text | PortTypeHint.FilePath | None = None, filtered_markdown_key: str = 'filtered_markdown', include_number_prefixes: list[str] | None = None, include_title_patterns: list[str] | None = None, exclude_number_prefixes: list[str] | None = None, exclude_title_patterns: list[str] | None = None, keep_parent_headings: bool = True, drop_preamble: bool = True, drop_toc: bool = True, tables_mode: Literal[keep, drop, caption_only, truncate_rows] = 'keep', table_truncate_rows: int = 10) None
Initialize MarkdownSectionFilter module.
Parameters
- markdown: PortTypeHint.Text | PortTypeHint.FilePath | None, default: None
Input markdown content (as a string) or file path. If the data of the InputMarkdown port is not None, self.markdown will be overwritten.
- filtered_markdown_key: str, default: "filtered_markdown"
Field name on OutputResultModel for the filtered markdown string. Useful when multiple MarkdownSectionFilter modules merge into downstream steps, to avoid attribute conflicts.
- include_number_prefixes: list[str] | None, default: None
List of section number prefixes to include (e.g., ["2", "4", "5"]). Includes all subsections under these prefixes.
Examples
>>> module = MarkdownSectionFilter(
...     markdown="report.md",
...     include_number_prefixes=["2", "4"],
...     tables_mode="truncate_rows",
...     table_truncate_rows=5
... )
Notes
Images: Markdown image syntax (e.g. ![alt](path)) and HTML <img> tags are ordinary body lines; they stay or go with their section. There is no extra pass that removes images from kept sections; use exclusion rules or a downstream step if you need that.
To persist markdown, chain gdi.modules.writers.TextWriter.
Selection rules are applied in order:
1. If both include rules are None, all sections pass.
2. Sections matching include_number_prefixes OR include_title_patterns pass.
3. Sections matching exclude rules are then removed (exclusion takes priority).
4. Parent headings are added if keep_parent_headings=True.
Attributes:
- InputMarkdown: PortReference[PortTypeHint.Text | PortTypeHint.FilePath]
- OutputMarkdown: PortReference[PortTypeHint.Text]
- OutputResultModel: PortReference[PortTypeHint.ResultModel]
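The first three selection rules from the Notes can be sketched as a single predicate per section. Two details are assumptions not confirmed by this reference: title patterns are treated as regular expressions, and a number prefix matches on whole dotted segments (so "2" matches "2.1" but not "20"). Re-adding parent headings (rule 4) is omitted.

```python
import re


def section_passes(number, title,
                   include_number_prefixes=None, include_title_patterns=None,
                   exclude_number_prefixes=None, exclude_title_patterns=None):
    """Return True if a section with this number and title survives the rules."""
    def prefix_hit(prefixes):
        return any(number == p or number.startswith(p + ".") for p in prefixes or [])

    def title_hit(patterns):
        return any(re.search(p, title) for p in patterns or [])

    if include_number_prefixes is None and include_title_patterns is None:
        included = True  # rule 1: no include rules, everything passes
    else:
        # rule 2: either include rule is enough
        included = prefix_hit(include_number_prefixes) or title_hit(include_title_patterns)
    # rule 3: exclusion is applied last and takes priority
    excluded = prefix_hit(exclude_number_prefixes) or title_hit(exclude_title_patterns)
    return included and not excluded
```

A section numbered "2.3" titled "List of figures" is included by the prefix "2" but still dropped if an exclude pattern matches "figures".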
- class modules.filters.SingleResultSelector
Select data from a SingleResult.
- Inherits from:
PipeModule
Methods:
- __init__(mname: str = 'GdimReportTextFilter', auto_run: bool = True, single_result: PortTypeHint.SingleResult | None = None, selected_items: list[str] | str | None = None, multiple_select: bool = True, merged_name: str | None = None, merged_title: str | None = None) None
Initialize SingleResultSelector object.
Parameters
- single_result: SingleResult
The single result to filter.
- selected_items: list[str] | str, default: None
The names or titles of the items to select. If multiple_select is False, the type of selected_items is str.
- multiple_select: bool, default: True
Whether to allow multiple selection.
- merged_name: str | None, default: None
If a string is given, all the selected items are merged into one UnitResult using the string as its name. The selected items are merged into a list, and the unit and description of the first selected item are used. If multiple_select is False, only the name is changed.
- merged_title: str | None, default: None
Only available when merged_name is not None. This string is used as the title of the merged UnitResult. If None, the title is set to merged_name.
Properties:
- InputSingleResult
- OutputSingleResult
- class modules.filters.GdimReportTemplateSelector
Select the data from a SingleResult produced by GdimAppReportTextReader with read_document=True. The output is the template_path, template_name, and output_name, which are the attribute values of DocPrinter.
- Inherits from:
PipeModule
Methods:
- __init__(mname: str = 'GdimReportTemplateSelector', auto_run: bool = True, single_result: PortTypeHint.SingleResult | None = None, selected_items: list[str] | str | None = None, multiple_select: bool = True) None
Initialize GdimReportTemplateSelector object.
Parameters
- single_result: SingleResult, default: None
The single result to select from.
- selected_items: list[str] | str | None, default: None
The items to select. If multiple_select is True, the type is list[str]. If multiple_select is False, the type is list[str] or str.
- multiple_select: bool, default: True
Whether to allow multiple selection.
Properties:
- InputSingleResult
- OutputDocPrinterAttributes
- OutputSingleResultDocs
- class modules.filters.GdimReportTextSelector
Select the data from a SingleResult produced by GdimAppReportTextReader with read_document=False.
- Inherits from:
PipeModule
Methods:
- __init__(mname: str = 'GdimReportTextSelector', auto_run: bool = True, single_result: PortTypeHint.SingleResult | None = None, selected_items: list[dict[str, list | str]] | None = None, min_items: int = 0, max_items: int | None = None, text_multiple_select: bool = True, group_title_read_only: bool = False) None
Initialize GdimReportTextSelector object.
Parameters
- single_result: SingleResult, default: None
The single result to select from.
- selected_items: list[dict[str, list | str]] | None, default: None
The items to select. Two keys are available:
"group_title": the selected group title. Always single select; the type is str.
"text_indexes": the selected text indexes. Single select or multiple select; the type is int or list[int].
- min_items: int, default: 0
The minimum number of selected items.
- max_items: int | None, default: None
The maximum number of selected items. If None, no limit.
- text_multiple_select: bool, default: True
Whether to allow multiple selection for text_indexes in each group.
- group_title_read_only: bool, default: False
Whether to make the group_title read-only.
Properties:
- InputSingleResult
- OutputSingleResult
- class modules.filters.GdimAppDataSelector
Select a field value from a gdi.dataclass.results.ResultModel.
Designed to work with the output of gdi.modules.readers.GdimAppDataReader. Extracts one field from the ResultModel by constructing the appropriate lookup key. Keys follow the naming strategy used when data was saved to the GDIM database:
"module_name@port_name": output port data
"pipeline@attr_name": pipeline attributes
"module_name#attr_name": module attributes
- Inherits from:
PipeModule
Methods:
- __init__(mname: str = 'GdimAppDataSelector', auto_run: bool = True, result_model: PortTypeHint.ResultModel | None = None, name: str | None = None, module_name: str | None = None) None
Initialize GdimAppDataSelector.
Ports
- InputResultModel: PortTypeHint.ResultModel
The ResultModel produced by GdimAppDataReader.
- OutputData: PortTypeHint.General
The selected field value extracted from the ResultModel.
- execute() Any
Attributes:
- InputResultModel: PortReference[PortTypeHint.ResultModel]
- OutputData: PortReference[PortTypeHint.General]
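The three key shapes listed in the class description can be told apart by their separator. The sketch below only classifies a key string; classify_key is a hypothetical helper, and how GdimAppDataSelector itself builds the key from name and module_name is not specified in this reference.

```python
def classify_key(key):
    """Split a stored key into (kind, owner, name) following the naming strategy."""
    if "@" in key:
        owner, name = key.split("@", 1)
        # "pipeline@..." is reserved for pipeline attributes; other owners are modules
        kind = "pipeline_attribute" if owner == "pipeline" else "port_data"
        return kind, owner, name
    if "#" in key:
        owner, name = key.split("#", 1)
        return "module_attribute", owner, name
    raise ValueError(f"unrecognized key format: {key!r}")
```

For instance, classify_key("TablesQuery@OutputTables") yields ("port_data", "TablesQuery", "OutputTables").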