modules.converters
Classes
- class modules.converters.ConvertToMaterialTable
Convert the input table to a MaterialTable object which can be used for geo calculation.
- Inherits from:
PipeModule
Methods:
- __init__(mname: str = 'ConvertToMaterialTable', auto_run: bool = True, table: PortTypeHint.TableData | PortTypeHint.TableCollection | None = None, column_name_map: dict[str, GProps] | None = None, sort_by_layer_number: bool = True, reset_material_id: bool = True, table_name: str | None = None) None
Initialize a ConvertToMaterialTable object.
Parameters
- table: PortTypeHint.TableData | PortTypeHint.TableCollection | None, default: None
The input table data.
- column_name_map: dict[str, GProps] | None, default: None
The mapping from table columns to material properties. The keys are the field names or field titles of the table (resolved automatically) and the values are the material properties. If no mapping is provided for a column, the module tries to find a column with the same name as the corresponding GeoMaterialProps value automatically.
- sort_by_layer_number: bool, default: True
If True, the data will be sorted by the layer number. The material_id starts from 0.
- reset_material_id: bool, default: True
If True, the material_id is reset to follow the ascending order of layer number, so the material_id starts from 0 at the first row after sorting. Only available when ‘sort_by_layer_number’ is True.
- table_name: str | None, default: None
The name or title of the material table in the table collection. Only used when table is a TableCollection. If None, the table named “standard_layer_table” is used. If the collection has no “standard_layer_table”, the table titled “标准地层表” is used. If neither exists, the first table in the collection is used.
Ports
- InputTable: PortTypeHint.TableData | PortTypeHint.TableCollection
The input table data.
- OutputMaterialTable: PortTypeHint.MaterialTable
The output material table.
Notes
GProps.LayerNumber is always converted to string type.
Attributes:
- InputTable: PortReference[PortTypeHint.TableData | PortTypeHint.TableCollection]
- OutputMaterialTable: PortReference[PortTypeHint.MaterialTable]
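The fallback order described for table_name can be sketched as below. This is only an illustration, not the module's implementation: the Table class and pick_material_table function are hypothetical stand-ins, and raising KeyError when an explicit key matches nothing is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Table:
    """Hypothetical stand-in for a table that has a name and a title."""
    name: str
    title: str


def pick_material_table(tables: List[Table], table_name: Optional[str] = None) -> Table:
    """Pick one table from a collection using the documented fallback order."""
    if not tables:
        raise ValueError("the collection is empty")
    if table_name is not None:
        # An explicit key may match either the table name or the table title.
        for t in tables:
            if table_name in (t.name, t.title):
                return t
        raise KeyError(table_name)  # assumption: unmatched keys are an error
    for t in tables:
        if t.name == "standard_layer_table":
            return t
    for t in tables:
        if t.title == "标准地层表":
            return t
    # Last resort: the first table in the collection.
    return tables[0]
```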
- class modules.converters.ConvertToMaterialTables
Convert the input table collection to a MaterialTable collection which can be used for geo calculation.
- Inherits from:
PipeModule
Methods:
- __init__(mname: str = 'ConvertToMaterialTables', auto_run: bool = True, tables: PortTypeHint.TableCollection | None = None, column_name_map: dict[str, GProps] | None = None, sort_by_layer_number: bool = True, format_dict: dict[GProps, str] | None = None) None
Initialize a ConvertToMaterialTables object.
Parameters
- tables: PortTypeHint.TableCollection | None, default: None
The input table collection.
- column_name_map: dict[str, GProps] | None, default: None
The mapping from table columns to material properties. The key is the column name or field title of the table and the value is the material property.
- sort_by_layer_number: bool, default: True
If True, the data will be sorted by the layer number. The material_id starts from 0.
- format_dict: dict[GProps, str] | None, default: None
The types used to convert columns. The key is the material property and the value is the type that the column is converted to.
Notes
MaterialTables are often used when each bore or section has its own material table, so the key of the MaterialTables is usually the bore or section number or name.
Properties:
- InputTables
- OutputMaterialTables
- class modules.converters.ConvertToMultiProfile1D
Convert the input table collection to a MultiProfile1D object which can be used for geo calculation.
- Inherits from:
PipeModule
Methods:
- __init__(mname: str = 'ConvertToMultiProfile1D', auto_run: bool = True, tables: PortTypeHint.TableCollection | None = None, material_table: PortTypeHint.MaterialTable | None = None, bore_table_name: str | None = None, bore_table_field_names: list[str] | None = None, layer_table_name: str | None = None, layer_table_field_names: list[str] | None = None, selected_bores: list[str] | None = None, sort_layers_table: bool = False) None
Initialize a ConvertToMultiProfile1D object.
Parameters
- tables: PortTypeHint.TableCollection | None, default: None
The input tables data.
- material_table: PortTypeHint.MaterialTable | None, default: None
The input material table used to resolve materials_id from layer numbers.
- bore_table_name: str | None, default: None
Bore list table key (table name or title). If None, the module looks for a table named bore_table, then a table titled 钻孔一览表.
- bore_table_field_names: list[str] | None, default: None
Five field identifiers (names or titles) for the bore table, in order: bore number, bore top elevation, X, Y, steady groundwater depth. If None, each column is matched by English name or Chinese title in order:
bore_number / 钻孔编号
bore_top / 孔口高程
x_coordinate / X坐标
y_coordinate / Y坐标
steady_ground_water_depth / 稳定水位埋深
- layer_table_name: str | None, default: None
Layer table key (name or title). If None, looks for layer_table, then title 地层表.
- layer_table_field_names: list[str] | None, default: None
Three field identifiers: bore number, layer bottom depth, layer number. If None, each column is matched by English name or Chinese title in order:
bore_number / 钻孔编号
layer_bottom_depth / 层底深度
layer_number / 地层编号
- selected_bores: list[str] | None, default: None
If set, only these bore identifiers (values in the bore-number column) are kept.
- sort_layers_table: bool, default: False
If False, keep the layer table row order as in the input. If True, sort rows by bore number ascending, then layer bottom depth ascending.
Attributes:
- InputTables: PortReference[PortTypeHint.TableCollection]
- InputMaterialTable: PortReference[PortTypeHint.MaterialTable]
- OutputMultiProfile1D: PortReference[PortTypeHint.MultiProfile1D]
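The effect of selected_bores and sort_layers_table on the layer table can be illustrated with plain dictionaries. This is a sketch under assumptions: prepare_layer_rows is a hypothetical helper, rows are modelled as dicts keyed by the default English field names, and bore numbers are compared as strings.

```python
from typing import Dict, List, Optional


def prepare_layer_rows(
    rows: List[Dict],
    selected_bores: Optional[List[str]] = None,
    sort_layers_table: bool = False,
) -> List[Dict]:
    """Filter layer rows by bore and optionally sort them."""
    if selected_bores is not None:
        keep = set(selected_bores)
        rows = [r for r in rows if r["bore_number"] in keep]
    if sort_layers_table:
        # Bore number ascending, then layer bottom depth ascending.
        rows = sorted(rows, key=lambda r: (r["bore_number"], r["layer_bottom_depth"]))
    return list(rows)


layer_rows = [
    {"bore_number": "ZK-02", "layer_bottom_depth": 3.0, "layer_number": "1"},
    {"bore_number": "ZK-01", "layer_bottom_depth": 5.5, "layer_number": "2"},
    {"bore_number": "ZK-01", "layer_bottom_depth": 2.0, "layer_number": "1"},
]
sorted_rows = prepare_layer_rows(layer_rows, sort_layers_table=True)
```

With sort_layers_table left at False the input order is preserved, which matters when the source table is already ordered top to bottom within each bore.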
- class modules.converters.PdfToImages
Convert PDF files to images.
This module converts single or multiple PDF files to images. Each PDF can have its own configuration (image prefix, page range, dimensions) specified via a table-based interface.
Features
Convert single or multiple PDF files
Per-file configuration for page ranges and output dimensions
Auto-generated or custom image prefixes
Table-based UI for easy configuration
Examples
Minimal configuration (just file paths, all defaults):
>>> converter = PdfToImages(
...     pdf_configs=[
...         {"file_path": "doc1.pdf"},
...         {"file_path": "doc2.pdf"}
...     ]
... )
With selective per-file configuration:
>>> converter = PdfToImages(
...     pdf_configs=[
...         {"file_path": "doc1.pdf", "image_prefix": "bore", "last_page": 5},
...         {"file_path": "doc2.pdf", "first_page": 3, "width": 1920}
...     ]
... )
- Inherits from:
PipeModule
Methods:
- __init__(mname: str = 'PdfToImages', auto_run: bool = True, output_dir: str | Path | None = None, dpi: int = 200, format: Literal[png, jpg, jpeg] = 'png', pdf_configs: list[_PdfConfigRow] | None = None) None
Initialize a PdfToImages object.
Parameters
- mname: str, default: “PdfToImages”
Module name.
- auto_run: bool, default: True
Whether to auto-run the module.
- output_dir: str | Path | None, default: None
The directory to save all output images. The ‘workspace’ of the pipeline takes priority over ‘output_dir’. If both ‘output_dir’ and ‘workspace’ are None, the directory of the first PDF is used.
- dpi: int, default: 200
The DPI of the images converted from the PDFs.
- format: Literal[“png”, “jpg”, “jpeg”], default: “png”
The format of the images converted from PDFs.
- pdf_configs: list[_PdfConfigRow] | None, default: None
List of per-file configurations. Each configuration is a dictionary. Only file_path is required; other keys are optional with defaults:
file_path: str - PDF file path (required)
image_prefix: str - Prefix for output images (default: auto-generate)
first_page: int - First page to convert, 1-based (default: 1)
last_page: int | None - Last page (default: None = all pages)
width: int | None - Image width (default: None = use DPI)
height: int | None - Image height (default: None = use DPI)
Notes
- If image_prefix is empty, prefixes are automatically generated from PDF filenames
- If multiple PDFs have the same filename, numeric suffixes (_1, _2, etc.) are added
- All images are saved to the same output directory
Attributes:
- OutputImages: PortReference[PortTypeHint.FilesPath]
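The prefix rules in the notes can be sketched as follows. The helper make_prefixes is hypothetical, and the exact suffix scheme is an assumption: here the first occurrence keeps the bare filename stem and later duplicates get _1, _2, and so on, which may differ from what the module actually does.

```python
from collections import Counter
from pathlib import PurePath
from typing import Dict, List


def make_prefixes(pdf_configs: List[Dict]) -> List[str]:
    """Derive one image prefix per PDF config, de-duplicating repeats."""
    seen: Counter = Counter()
    prefixes = []
    for cfg in pdf_configs:
        # Use the custom prefix when given, otherwise the PDF filename stem.
        prefix = cfg.get("image_prefix") or PurePath(cfg["file_path"]).stem
        count = seen[prefix]
        seen[prefix] += 1
        # Assumption: only repeats receive a numeric suffix.
        prefixes.append(prefix if count == 0 else f"{prefix}_{count}")
    return prefixes


example_prefixes = make_prefixes([
    {"file_path": "a/doc.pdf"},
    {"file_path": "b/doc.pdf"},
    {"file_path": "c.pdf", "image_prefix": "bore"},
])
```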
- class modules.converters.TableToDataFrame
Convert TableData to DataFrame.
This module converts a TableData object (which has enhanced metadata like field titles, units, descriptions) to a standard pandas DataFrame. This is useful when you need to use the data with libraries or functions that expect pure pandas DataFrames.
Examples
Basic usage:
>>> converter = TableToDataFrame()
>>> converter.InputTable = my_table_data
>>> result_df = converter.OutputDataFrame
Using field titles as column names:
>>> converter = TableToDataFrame(use_titles_as_columns=True)
>>> converter.InputTable = my_table_data
>>> result_df = converter.OutputDataFrame  # DataFrame with human-readable column names
Resetting index:
>>> converter = TableToDataFrame(preserve_index=False)
>>> converter.InputTable = my_table_data
>>> result_df = converter.OutputDataFrame  # DataFrame with default 0, 1, 2... index
- Inherits from:
PipeModule
Methods:
- __init__(mname: str = 'TableToDataFrame', auto_run: bool = True, table: PortTypeHint.TableData | None = None, use_titles_as_columns: bool = False, preserve_index: bool = True) None
Initialize a TableToDataFrame object.
Parameters
- table: PortTypeHint.TableData | None, default: None
The input TableData to convert.
- use_titles_as_columns: bool, default: False
If True, use field titles as column names in the resulting DataFrame. If False, use the original column names.
- preserve_index: bool, default: True
If True, preserve the original index of the TableData. If False, reset the index to the default range index.
Attributes:
- InputTable: PortReference[PortTypeHint.TableData]
- OutputGeneralTable: PortReference[PortTypeHint.GeneralTable]
- class modules.converters.TableToSingleResult
Convert a specified row or column of a TableData to a SingleResult.
This module extracts values from a specified row or column in a TableData and converts them to a SingleResult object. The output can be in two forms:
Single UnitResult with value as a list containing all row or column values
Multiple UnitResult objects, each containing one row or column value
Examples
Single list result:
>>> converter = TableToSingleResult(axis="row", row_index=0, result_mode="single")
>>> converter.InputTable = my_table_data
>>> result = converter.OutputSingleResult  # SingleResult with one UnitResult containing list of all values in current row
Multiple individual results:
>>> converter = TableToSingleResult(axis="column", column_index=0, result_mode="multiple")
>>> converter.InputTable = my_table_data
>>> result = converter.OutputSingleResult  # SingleResult with multiple UnitResults, one per row
- Inherits from:
PipeModule
Methods:
- __init__(mname: str = 'TableToSingleResult', auto_run: bool = True, table: PortTypeHint.TableData | None = None, axis: Literal[row, column] = 'row', row_index: str | int = 0, column_index: str | int = 0, result_mode: Literal[single, multiple] = 'multiple', single_result_name: str | None = None, single_result_title: str | None = None, multiple_result_prefix: str = '', use_row_index: bool = True) None
Initialize a TableToSingleResult object.
Parameters
- table: PortTypeHint.TableData | None, default: None
The input TableData to convert.
- axis: Literal[“row”, “column”], default: “row”
The axis to extract the data from.
“row”: Extract the data from one row.
“column”: Extract the data from one column.
- row_index: str | int, default: 0
The row to extract the data from. If str, it is matched against the index labels of the table. If int, it is the positional row number.
- column_index: str | int, default: 0
The column to extract the data from. If str, it is matched against the column names or titles of the table. If int, it is the positional column number.
- result_mode: Literal[“single”, “multiple”], default: “multiple”
Mode of conversion:
“single”: Convert all values of the specified row or column to a list saved in one UnitResult.
“multiple”: Create multiple UnitResult objects, one per column (axis="row") or one per row (axis="column").
- single_result_name: str | None, default: None
Name for the single UnitResult when result_mode="single". If None, the row index is used when axis="row" and the column name when axis="column".
- single_result_title: str | None, default: None
Title for the single UnitResult when result_mode="single". If None, the row index is used when axis="row" and the column title from metadata when axis="column".
- multiple_result_prefix: str | None, default: None
Prefix for naming multiple UnitResults when result_mode="multiple". When axis="row", results are named “{prefix}_{column_name}”; if None, the column name and title are used. When axis="column", results are named “{prefix}_{sequential_number}” or “{prefix}_{row_index}”; if None, the row index is used.
- use_row_index: bool, default: True
Only valid when result_mode="multiple" and axis="column". If True, uses the row index in result names. If False, uses sequential numbering starting from 0.
Attributes:
- InputTable: PortReference[PortTypeHint.TableData]
- OutputSingleResult: PortReference[PortTypeHint.SingleResult]
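The naming rules for result_mode="multiple" can be summarised in a small sketch. The function result_names is a hypothetical helper, not part of the module, and it treats an empty or None prefix as "no prefix", which is an assumption.

```python
from typing import List, Optional, Sequence


def result_names(
    axis: str,
    column_names: Sequence[str],
    row_labels: Sequence,
    prefix: Optional[str] = None,
    use_row_index: bool = True,
) -> List[str]:
    """Return the names the multiple UnitResults would get."""
    if axis == "row":
        # One result per column of the selected row.
        keys = list(column_names)
    elif axis == "column":
        # One result per row of the selected column.
        keys = list(row_labels) if use_row_index else list(range(len(row_labels)))
    else:
        raise ValueError("axis must be 'row' or 'column'")
    return [f"{prefix}_{k}" if prefix else str(k) for k in keys]
```

For example, result_names("row", ["depth", "soil"], ["r0", "r1"], prefix="zk") gives names built from the column names, while axis="column" with use_row_index=False numbers the results from 0.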
- class modules.converters.TableToResultModel
Convert a row of a TableData to a dynamically-created ResultModel.
Each column of the input table becomes a field in the generated ResultModel subclass (gdi.dataclass.results.ResultModel). The column's title, unit and description from FieldMetadata (gdi.dataclass.tables.FieldMetadata) are forwarded to the field, and the Python type is inferred from the column's pandas dtype.
Empty-table behaviour
- No columns: returns a bare ResultModel instance with no fields.
- Columns but no rows: returns a ResultModel instance whose fields match the table columns (with correct types and metadata) but whose values are all None.
Examples
>>> conv = TableToResultModel(row_index=0)
>>> conv.InputTable = my_table_data
>>> result = conv.OutputResultModel  # ResultModel subclass with one field per column, values from row 0
- Inherits from:
PipeModule
Methods:
- __init__(mname: str = 'TableToResultModel', auto_run: bool = True, table: PortTypeHint.TableData | None = None, row_index: str | int = 0, model_name: str = 'DynamicResultModel') None
Initialize a TableToResultModel object.
Parameters
- table: PortTypeHint.TableData | None, default: None
The input TableData to convert.
- row_index: str | int, default: 0
The row to convert. If str, the row whose index label matches this value is used. If int, the positional (zero-based) row number is used.
- model_name: str, default: “DynamicResultModel”
Class name assigned to the dynamically created ResultModel subclass (gdi.dataclass.results.ResultModel).
Attributes:
- InputTable: PortReference[PortTypeHint.TableData]
- OutputResultModel: PortReference[PortTypeHint.ResultModel]
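The idea of building a model class from table columns at run time can be illustrated with the standard library. This is only an analogue: it uses dataclasses.make_dataclass rather than the real ResultModel machinery, the row_to_model helper and its input shapes are hypothetical, and only a positional row_index is handled.

```python
from dataclasses import field, fields, make_dataclass
from typing import Dict, List, Optional, Tuple


def row_to_model(
    columns: Dict[str, Tuple[type, str]],
    rows: List[Dict],
    row_index: int = 0,
    model_name: str = "DynamicResultModel",
):
    """Create a class with one field per column and fill it from one row.

    columns maps a column name to (python_type, title).
    """
    spec = [
        (name, Optional[py_type], field(default=None, metadata={"title": title}))
        for name, (py_type, title) in columns.items()
    ]
    model_cls = make_dataclass(model_name, spec)
    if not rows:
        # Columns but no rows: fields exist, all values stay None.
        return model_cls()
    return model_cls(**rows[row_index])


cols = {"depth": (float, "Depth"), "soil": (str, "Soil")}
model = row_to_model(cols, [{"depth": 10.5, "soil": "clay"}])
empty = row_to_model(cols, [])
```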
- class modules.converters.SingleResultToText
Convert a SingleResult to a formatted text string.
This module takes a SingleResult object and converts it to a formatted text string using a template. Placeholders in the template are replaced with values from the SingleResult’s UnitResult objects.
The template supports:
UnitResult names as placeholders: "{result_name}"
UnitResult titles as placeholders: "{Result Title}"
Python format specifiers: "{value:.2f}" for 2 decimal places
Examples
Basic usage:
>>> converter = SingleResultToText(
...     text_template="The calculated value is {result_a} with precision {result_b:.3f}"
... )
>>> converter.InputSingleResult = my_single_result
>>> text = converter.OutputText
Result: "The calculated value is 42 with precision 3.142"
Using titles as placeholders:
>>> converter = SingleResultToText(
...     text_template="计算结果: {计算值} (单位: {单位})"
... )
Handling list values:
>>> converter = SingleResultToText(
...     text_template="Values: {list_result}",
...     joiner_for_list_value=", "
... )
If list_result contains [1, 2, 3], output: "Values: 1, 2, 3"
- Inherits from:
PipeModule
Methods:
- __init__(mname: str = 'SingleResultToText', auto_run: bool = True, single_result: PortTypeHint.SingleResult | None = None, text_template: str | None = None, joiner_for_list_value: str = ',') None
Initialize a SingleResultToText object.
Parameters
- single_result: PortTypeHint.SingleResult | None, default: None
The input SingleResult to convert.
- text_template: str | None, default: None
The template used to convert the SingleResult to text. Use {name} or {title} as placeholders for UnitResult values. Supports Python format specifiers like {value:.2f} for formatting.
- joiner_for_list_value: str, default: “,”
The string used to join the list values of a UnitResult into a single string.
Attributes:
- InputSingleResult: PortReference[PortTypeHint.SingleResult]
- OutputText: PortReference[PortTypeHint.Text]
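The placeholder behaviour matches Python's own format-string syntax, so it can be reproduced with str.format_map. The render helper below is a hypothetical sketch in which the SingleResult is modelled as a plain dict keyed by UnitResult name or title.

```python
from typing import Any, Dict


def render(template: str, values: Dict[str, Any], joiner: str = ",") -> str:
    """Fill a template from a mapping, joining list values first."""
    flat = {
        key: joiner.join(str(v) for v in val) if isinstance(val, (list, tuple)) else val
        for key, val in values.items()
    }
    # format_map accepts keys with spaces or non-ASCII characters,
    # so titles work as placeholders as well as names.
    return template.format_map(flat)


text = render(
    "The calculated value is {result_a} with precision {result_b:.3f}",
    {"result_a": 42, "result_b": 3.14159},
)
joined = render("Values: {list_result}", {"list_result": [1, 2, 3]}, joiner=", ")
```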
- class modules.converters.SingleResultToTable
Convert a SingleResult to a TableData.
This module takes a SingleResult object and converts it to a TableData object where each field in the SingleResult becomes a column in the table, and the values are stored in a single row.
Examples
>>> converter = SingleResultToTable()
>>> converter.InputSingleResult = my_single_result
>>> table_data = converter.OutputTable  # TableData with one row containing all values
- Inherits from:
PipeModule
Methods:
- __init__(mname: str = 'SingleResultToTable', auto_run: bool = True, single_result: PortTypeHint.SingleResult | None = None, table_name: str | None = None, table_title: str | None = None, joiner: str = ',') None
Initialize a SingleResultToTable object.
Parameters
- single_result: PortTypeHint.SingleResult | None, default: None
The input SingleResult to convert.
- table_name: str | None, default: None
The name of the table. If None, the default name “single_result” is used.
- table_title: str | None, default: None
The title of the table. If None, the default title “single_result” is used.
- joiner: str, default: “,”
The string used to join list values into a single string.
Properties:
- InputSingleResult
- OutputTable
- class modules.converters.ResultModelToTable
Convert a ResultModel (or subclass) to a TableData.
Each model field becomes a column; values are stored in a single row.
Examples
>>> converter = ResultModelToTable()
>>> converter.InputResultModel = my_result_model
>>> table_data = converter.OutputTable
- Inherits from:
PipeModule
Methods:
- __init__(mname: str = 'ResultModelToTable', auto_run: bool = True, result_model: PortTypeHint.ResultModel | None = None, table_name: str | None = None, table_title: str | None = None, joiner: str = ',') None
Initialize ResultModelToTable.
Parameters
- result_model: PortTypeHint.ResultModel | None, default: None
The input ResultModel instance (gdi.dataclass.results.ResultModel).
- table_name: str | None, default: None
Overrides TableData.name (gdi.dataclass.tables.TableData.name) when not None.
- table_title: str | None, default: None
Overrides TableData.title (gdi.dataclass.tables.TableData.title) when not None.
- joiner: str, default: “,”
String used to join list values into one cell string.
Attributes:
- InputResultModel: PortReference[PortTypeHint.ResultModel]
- OutputTable: PortReference[PortTypeHint.TableData]
- class modules.converters.TableToString
Convert TableData to descriptive strings using customizable templates.
This module allows you to describe table data row by row using string templates. You can define multiple templates to generate different descriptions from the same table, with each description saved as a separate UnitResult in a SingleResult output.
The templates support:
- Column names or field titles as placeholders: "{columnA} has {columnB}"
- Python format specifiers for numbers: "{value:.2f}", "{count:d}", etc.
- Multiple templates with different keys for varied descriptions
Examples
Single template:
>>> converter = TableToString(
...     templates={"description": "{columnA} has {columnB}"},
...     separator=", "
... )
>>> converter.InputTable = my_table_data
>>> result = converter.OutputDescriptions
# Result: SingleResult with one UnitResult named "description"
# Value: "a has 3, b has 4, c has 5"
Multiple templates with formatting:
>>> converter = TableToString(
...     templates={
...         "simple": "{item} has {count}",
...         "detailed": "Item {item} has a count of {count:.2f} units"
...     },
...     separator=" "
... )
>>> converter.InputTable = my_table_data
>>> result = converter.OutputDescriptions
# Result: SingleResult with two UnitResults:
# "simple": "a has 3 b has 4 c has 5"
# "detailed": "Item a has a count of 3.00 units Item b has a count of 4.00 units ..."
Using field titles:
>>> # If table has column "col1" with title "Item Name"
>>> converter = TableToString(
...     templates={"desc": "{Item Name} has value {col2}"}
... )
# Will work with both field names and titles
- Inherits from:
PipeModule
Methods:
- __init__(mname: str = 'TableToString', auto_run: bool = True, table: PortTypeHint.TableData | None = None, templates: dict[str, str] | None = None, result_titles: dict[str, str] | None = None, separator: str = ', ', prefix: str = '', suffix: str = '', include_empty_rows: bool = True) None
Initialize a TableToString object.
Parameters
- table: PortTypeHint.TableData | None, default: None
The input TableData to convert.
- templates: dict[str, str] | None, default: None
Dictionary of templates where the key is the result name (used as the UnitResult name) and the value is the template string. A template can use column names or field titles as placeholders. Supports Python format specifiers (e.g., "{value:.2f}" for 2 decimal places).
- include_empty_rows: bool, default: True
If False, rows containing any NaN or None values in the template columns are skipped in the output description. If True, all rows are included (NaN/None values appear as "nan" or "None").
Ports
- InputTable: PortTypeHint.TableData
The input TableData to convert to string descriptions.
- OutputResultModel: PortTypeHint.ResultModel
The output ResultModel containing string descriptions.
Attributes:
- InputTable: PortReference[PortTypeHint.TableData]
- OutputResultModel: PortReference[PortTypeHint.ResultModel]
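The row-by-row templating, including the include_empty_rows switch, can be sketched with the standard library alone. The describe_rows helper is hypothetical and models the table as a list of dicts; it only covers templates, separator and include_empty_rows.

```python
import math
from string import Formatter
from typing import Any, Dict, List


def _is_missing(value: Any) -> bool:
    return value is None or (isinstance(value, float) and math.isnan(value))


def describe_rows(
    rows: List[Dict[str, Any]],
    templates: Dict[str, str],
    separator: str = ", ",
    include_empty_rows: bool = True,
) -> Dict[str, str]:
    """Render every template once per row and join the pieces."""
    out = {}
    for key, template in templates.items():
        # Column names referenced by this template.
        used = [name for _, name, _, _ in Formatter().parse(template) if name]
        parts = []
        for row in rows:
            if not include_empty_rows and any(_is_missing(row[c]) for c in used):
                continue
            parts.append(template.format_map(row))
        out[key] = separator.join(parts)
    return out


rows = [
    {"columnA": "a", "columnB": 3},
    {"columnA": "b", "columnB": 4},
    {"columnA": "c", "columnB": 5},
]
described = describe_rows(rows, {"description": "{columnA} has {columnB}"})
```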
- class modules.converters.TableToJson
Convert TableData or TableSeries to a JSON object using its serialize method.
The output text (stored in the value of ResultModel) can be useful for an LLM to analyze the table data.
- Inherits from:
PipeModule
Methods:
- __init__(mname: str = 'TableToJson', auto_run: bool = True, table: PortTypeHint.TableData | PortTypeHint.TableSeries | None = None, columns: list[str] | str | None = None, result_key_name: str | None = None, result_key_title: str | None = None, indent: int | None = None) None
Initialize a TableToJson object.
Parameters
- table: PortTypeHint.TableData | PortTypeHint.TableSeries | None, default: None
The input table data or table series to convert to a json object.
- columns: list[str] | str | None, default: None
The columns to convert. If None, all columns will be converted. Column names can be either field names or field titles. Only applicable when the input is TableData.
- result_key_name: str | None, default: None
The name of the key in the SingleResult returned for the json object. If None, the name of the table or series will be used. If the table name is None, “json_object” will be used.
- result_key_title: str | None, default: None
The title of the key in the SingleResult returned for the json object. If None, the title of the table or series will be used. If the table title is None, result_key_name will be used.
- indent: int | None, default: None
The indent of the json object. If None, no indentation is used.
Ports
- InputTable: PortTypeHint.TableData | PortTypeHint.TableSeries
The input table data or table series to convert to a json object.
- OutputResultModel: PortTypeHint.ResultModel
The output ResultModel containing the json object as a string.
- OutputJsonObject: PortTypeHint.JsonObject
The output json object in dict type.
Attributes:
- InputTable: PortReference[PortTypeHint.TableData | PortTypeHint.TableSeries]
- OutputResultModel: PortReference[PortTypeHint.ResultModel]
- OutputJsonObject: PortReference[PortTypeHint.JsonObject]
- class modules.converters.JsonToTable
Convert a json object to a table data or table series.
- Inherits from:
PipeModule
Methods:
- __init__(mname: str = 'JsonToTable', auto_run: bool = True, json_object: PortTypeHint.JsonObject | PortTypeHint.Text | None = None, single_column_as: Literal[table_data, table_series, auto] = 'auto') None
Initialize a JsonToTable object.
Parameters
- json_object: PortTypeHint.JsonObject | PortTypeHint.Text | None, default: None
The input json object to convert to a table data or table series.
- single_column_as: Literal[“table_data”, “table_series”, “auto”], default: “auto”
How to handle single column data:
“auto”: Keep the deserialized result as-is without conversion. TableSeries JSON → TableSeries, TableData JSON → TableData.
“table_data”: Ensure single-column output is always TableData. If the deserialized result is a TableSeries, convert it to TableData. If the result is a single-column TableData, keep it as TableData.
“table_series”: Ensure single-column output is always TableSeries. If the deserialized result is a single-column TableData, convert it to TableSeries. If the result is a TableSeries, keep it as-is.
Ports
- InputJsonObject: PortTypeHint.JsonObject | PortTypeHint.Text
The input json object to convert to a table data or table series.
- OutputTable: PortTypeHint.TableData | PortTypeHint.TableSeries
The output table data or table series.
- execute() PortTypeHint.TableData | PortTypeHint.TableSeries | None
Execute the JSON to table conversion.
Attributes:
- InputJsonObject: PortReference[PortTypeHint.JsonObject | PortTypeHint.Text]
- OutputTable: PortReference[PortTypeHint.TableData | PortTypeHint.TableSeries]
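The three single_column_as modes reduce to a small decision table. The sketch below uses plain strings for the two kinds; resolve_output_kind is a hypothetical helper that only mirrors the rules as documented and does no real conversion.

```python
def resolve_output_kind(
    deserialized_kind: str,
    n_columns: int,
    single_column_as: str = "auto",
) -> str:
    """Return 'table_data' or 'table_series' for the final output."""
    if deserialized_kind not in ("table_data", "table_series"):
        raise ValueError(f"unknown kind: {deserialized_kind}")
    if single_column_as == "auto":
        # Keep whatever the JSON deserialized to.
        return deserialized_kind
    if single_column_as == "table_data":
        # A series is promoted to a table; tables are kept.
        return "table_data"
    if single_column_as == "table_series":
        # Only a single-column table can be turned into a series.
        if deserialized_kind == "table_data" and n_columns == 1:
            return "table_series"
        return deserialized_kind
    raise ValueError(f"unknown mode: {single_column_as}")
```

Note that multi-column TableData is unaffected by either non-auto mode in this reading of the rules.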
- class modules.converters.TableToMarkdown
Convert a ``TableData`` or ``TableSeries`` to a GitHub-Flavored Markdown table.
The output is optimised for LLM consumption and frontend rendering. Column headers are taken from field titles (or field names when ``use_titles=False``) and can optionally include the field unit in parentheses. An optional Markdown heading that shows the table title can be prepended, and ``center_title=True`` renders it as a centred HTML heading (``<h2 align="center">…</h2>``) for renderers that support inline HTML.
Examples
>>> m = TableToMarkdown(use_titles=True, include_units=True, show_index=False)
>>> m.InputTable = my_table
>>> print(m.OutputMarkdown)
## 勘探点数据
| 孔号 | 深度 (m) | 地层 |
| :--- |---:| :--- |
| ZK-01 | 10.5 | 粉质黏土 |
>>> m = TableToMarkdown(center_title=True)
>>> m.InputTable = my_table
>>> print(m.OutputMarkdown)
<h2 align="center">勘探点数据</h2>
| 孔号 | 深度 (m) | 地层 |
| :--- |---:| :--- |
| ZK-01 | 10.5 | 粉质黏土 |
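How headers, units and the optional heading combine into a GFM table can be sketched without the library. The to_markdown helper is hypothetical; it models fields as (name, title, unit) tuples, and right-aligning all-numeric columns is an assumption inferred from the example output above.

```python
from typing import Any, Dict, List, Optional, Tuple


def to_markdown(
    fields: List[Tuple[str, str, str]],
    rows: List[Dict[str, Any]],
    title: Optional[str] = None,
    include_units: bool = True,
    precision: Optional[int] = None,
) -> str:
    """Render rows as a GitHub-Flavored Markdown table."""

    def is_num(v: Any) -> bool:
        return isinstance(v, (int, float)) and not isinstance(v, bool)

    def cell(v: Any) -> str:
        if precision is not None and isinstance(v, float):
            return f"{v:.{precision}f}"
        return str(v)

    # Header: field title, plus the unit in parentheses when present.
    headers = [f"{t} ({u})" if include_units and u else t for _, t, u in fields]
    # Assumption: numeric columns are right-aligned, others left-aligned.
    aligns = [
        "---:" if rows and all(is_num(r[n]) for r in rows) else ":---"
        for n, _, _ in fields
    ]
    lines = []
    if title:
        lines += [f"## {title}", ""]
    lines.append("| " + " | ".join(headers) + " |")
    lines.append("| " + " | ".join(aligns) + " |")
    for r in rows:
        lines.append("| " + " | ".join(cell(r[n]) for n, _, _ in fields) + " |")
    return "\n".join(lines)


md = to_markdown(
    [("bore", "孔号", ""), ("depth", "深度", "m"), ("geo", "地层", "")],
    [{"bore": "ZK-01", "depth": 10.5, "geo": "粉质黏土"}],
    title="勘探点数据",
)
```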
- Inherits from:
PipeModule
Methods:
- __init__(mname: str = 'TableToMarkdown', auto_run: bool = True, table: PortTypeHint.TableData | PortTypeHint.TableSeries | None = None, columns: list[str] | str | None = None, use_titles: bool = True, include_units: bool = True, include_table_title: bool = True, center_title: bool = False, show_index: bool = False, precision: int | dict[str, int | None] | None = None, latex_math: bool | set[Literal[table_title, field_titles, cells]] = False, result_key_name: str | None = None, result_key_title: str | None = None) None
Initialise a TableToMarkdown module.
Parameters
- table: PortTypeHint.TableData | PortTypeHint.TableSeries | None
The table to convert. Can also be supplied via the ``InputTable`` port.
- columns: list[str] | str | None, default: None
Columns to include in the output, specified by field name or title. When a list is provided the columns appear in that exact order in the Markdown table, so this parameter doubles as a column-reorder control. ``None`` keeps all columns in their original order. Only applies when the input is a ``TableData``.
- use_titles: bool, default: True
Use field titles as Markdown column headers. Set to ``False`` to use raw field names instead.
- include_units: bool, default: True
Append the field unit in parentheses to the column header, e.g. ``"深度 (m)"``. Ignored when the unit is ``UNITLESS`` / empty.
- include_table_title: bool, default: True
Prepend a ``## <table title>`` Markdown heading above the table.
- center_title: bool, default: False
Render the table title as a centred HTML heading (``<h2 align="center">…</h2>``) instead of a plain ``## …`` Markdown heading. Only has an effect when ``include_table_title=True`` and the table has a non-empty title.
- show_index: bool, default: False
Include the DataFrame row index as the first column. Defaults to ``False`` because the index is usually a meaningless integer for LLMs.
- precision: int | dict[str, int | None] | None, default: None
Decimal places for numeric cells when rendering Markdown, same semantics as ``DocDataWriter`` / ``TableData.export_doc_context``: an ``int`` applies to all numeric columns; a ``dict`` maps column names or titles to a decimal count, or ``None`` to leave that column unformatted.
- latex_math: bool | set[{“table_title”, “field_titles”, “cells”}], default: False
Wrap LaTeX expressions in ``$...$`` for math-aware frontend renderers. Set to ``False`` (default) when the output goes to an LLM.
``False``: no wrapping.
``True``: wrap table title, column headers, and string-typed cells.
A ``set``: granular, e.g. ``{"cells"}`` or ``{"field_titles", "cells"}``.
Wrapping is detection-based: a string is only wrapped when it contains a LaTeX command (``\alpha``, ``\frac``, …) or sub-/superscript notation. Numeric and non-LaTeX strings are left unchanged.
- result_key_name: str | None, default: None
Field name of the single field in ``OutputResultModel``. Defaults to the table’s ``name`` attribute, or ``"markdown_table"`` if that is not set.
- result_key_title: str | None, default: None
Field title of the single field in ``OutputResultModel``. Defaults to the table’s ``title`` attribute, or ``result_key_name`` if that is not set.
Ports
- InputTable: PortTypeHint.TableData | PortTypeHint.TableSeries
The input table to convert.
- OutputMarkdown: PortTypeHint.Text
The Markdown table (and optional heading) as a plain string.
- OutputResultModel: PortTypeHint.ResultModel
A dynamically-created ``ResultModel`` with a single string field (named by ``result_key_name``) whose value is the Markdown text.
Attributes:
- InputTable: PortReference[PortTypeHint.TableData | PortTypeHint.TableSeries]
- OutputMarkdown: PortReference[PortTypeHint.Text]
- OutputResultModel: PortReference[PortTypeHint.ResultModel]
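The detection-based wrapping described for ``latex_math`` might look roughly like the sketch below. The regular expression is an assumption, not the module's actual rule: it treats a backslash command or a ``_`` / ``^`` followed by a word character or brace as LaTeX, which would also match ordinary identifiers such as ``bore_id``.

```python
import re
from typing import Any

# Assumed heuristic: a LaTeX command (\alpha, \frac, ...) or
# sub-/superscript notation (x_1, m^2, q_{u}).
_LATEX_HINT = re.compile(r"\\[A-Za-z]+|[_^][\w{]")


def wrap_latex(value: Any) -> Any:
    """Wrap a string in $...$ when it looks like LaTeX; leave the rest alone."""
    if not isinstance(value, str):
        return value  # numeric cells are never wrapped
    if value.startswith("$") and value.endswith("$") and len(value) >= 2:
        return value  # already wrapped
    return f"${value}$" if _LATEX_HINT.search(value) else value
```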
- class modules.converters.TablesToMarkdown
Convert a ``TableCollection`` to GitHub-Flavored Markdown tables.
Each ``TableData`` in the collection can be rendered as an independent Markdown table, or, when the collection defines a main/sub-table relationship and ``combine_related=True`` (default), each main table is joined with every one of its sub-tables on the configured primary key and rendered as a single combined block. Use ``related_tables_title`` to choose whether the block heading is the main title, the sub title, or both (``"{main} - {sub}"``, default).
Two rendering styles are available for the combined block (``merge_mode`` parameter):
- ``"markdown"`` (default): repeated main-table cell values are left blank on subsequent sub-rows; clean plain GFM that works well as LLM input.
- ``"html"``: an inline ``<table>`` with ``rowspan`` attributes is emitted, with merged cells horizontally centred; ideal for frontend display inside Markdown-aware renderers.
Examples
Combined Markdown mode (LLM-friendly):
>>> m = TablesToMarkdown()
>>> m.InputTables = my_collection
>>> print(m.OutputMarkdown.data)
勘探点汇总 - 地层分层
| 孔号 | 地面高程 (m) | 层顶深度 (m) | 地层 |
| ----- | ------------ | ------------ | -------- |
| ZK-01 | 12.5 | 0.0 | 填土 |
| | | 2.5 | 粉质黏土 |
| ZK-02 | 10.0 | 0.0 | 填土 |
Combined HTML mode (frontend-friendly):
>>> m = TablesToMarkdown(merge_mode="html")
>>> m.InputTables = my_collection
>>> print(m.OutputMarkdown.data)
勘探点汇总 - 地层分层
<table>
<thead><tr><th>孔号</th><th>地面高程 (m)</th>...</tr></thead>
<tbody>
<tr><td rowspan="2" style="vertical-align:middle">ZK-01</td>...</tr>
...
</tbody>
</table>
Column selection (only show selected columns, pk always first in combined):
>>> m = TablesToMarkdown(columns={"main": ["bore_id", "elevation"],
...                               "sub": ["top_depth", "geo"]})
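The blanking of repeated main-table cells in the combined Markdown block can be sketched as a simple join. The combine_rows helper is hypothetical, rows are modelled as dicts, and dropping main rows that have no sub-rows (an inner join) is an assumption.

```python
from typing import Any, Dict, List


def combine_rows(
    main_rows: List[Dict[str, Any]],
    sub_rows: List[Dict[str, Any]],
    pk: str,
    main_cols: List[str],
    sub_cols: List[str],
) -> List[List[str]]:
    """Join main and sub rows on pk, blanking repeated main cells."""
    out = []
    for m in main_rows:
        matches = [s for s in sub_rows if s[pk] == m[pk]]
        for i, s in enumerate(matches):
            # Main values appear only on the first sub-row of each group.
            main_part = [str(m[c]) if i == 0 else "" for c in main_cols]
            out.append(main_part + [str(s[c]) for c in sub_cols])
    return out


combined = combine_rows(
    [{"bore_id": "ZK-01", "elevation": 12.5}, {"bore_id": "ZK-02", "elevation": 10.0}],
    [
        {"bore_id": "ZK-01", "top_depth": 0.0, "geo": "填土"},
        {"bore_id": "ZK-01", "top_depth": 2.5, "geo": "粉质黏土"},
        {"bore_id": "ZK-02", "top_depth": 0.0, "geo": "填土"},
    ],
    pk="bore_id",
    main_cols=["bore_id", "elevation"],
    sub_cols=["top_depth", "geo"],
)
```

In the HTML mode the same grouping would instead drive the rowspan value of the first row in each group.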
- Inherits from:
PipeModule
Methods:
- __init__(mname: str = 'TablesToMarkdown', auto_run: bool = True, tables: PortTypeHint.TableCollection | None = None, combine_related: bool = True, related_tables_title: Literal[combined, main, sub] = 'sub', columns: dict[str, list[str] | str] | None = None, merge_mode: Literal[markdown, html] = 'markdown', use_titles: bool = True, include_units: bool = True, include_table_title: bool = True, center_title: bool = False, show_index: bool = False, separator: str = '\n\n---\n\n', precision: int | dict[str, int | None] | dict[str, dict[str, int | None]] | None = None, latex_math: bool | set[Literal[table_title, field_titles, cells]] = False, result_key_name: str | None = None, result_key_title: str | None = None) None
Initialise a TablesToMarkdown module.
Parameters
- tables: PortTypeHint.TableCollection | None
The collection to convert. Can also be supplied via the ``InputTables`` port.
- combine_related: bool, default: True
When the collection has a main/sub-table relationship and a primary key, join each main table with its sub-tables. Set to ``False`` to render every table independently.
- related_tables_title: Literal["combined", "main", "sub"], default: "sub"
For merged main/sub blocks only: ``"combined"`` → ``"{main_table_title} - {sub_table_title}"``; ``"main"`` / ``"sub"`` use that table's title (falling back to the other if missing).
- columns: dict[str, list[str] | str] | None, default: None
Per-table column selection. The key is the table name or title; the value is a column name / title, or a list of column names / titles, that determines which columns to include and in what order. ``None`` keeps all columns. When ``combine_related=True`` the primary key column is always placed first in the main-table column list regardless of the selection.
- merge_mode: {"markdown", "html"}, default: "markdown"
How repeated main-table values are rendered in a combined block.
``"markdown"``: subsequent cells are left blank (LLM-friendly plain GFM).
``"html"``: an inline ``<table>`` with ``rowspan`` attributes and ``vertical-align:middle`` on merged cells (frontend-friendly).
Only affects combined blocks; independent tables are always plain GFM.
- use_titles: bool, default: True
Use field titles as column headers.
- include_units: bool, default: True
Append the field unit in parentheses to each column header.
- include_table_title: bool, default: True
Prepend a ``## <title>`` heading above each table block.
- center_title: bool, default: False
Render headings as ``<h2 align="center">…</h2>`` instead of plain ``## …``.
- show_index: bool, default: False
Include the DataFrame row index as the first column. Only applies when tables are rendered independently (not combined).
- separator: str, default: ``"\n\n---\n\n"``
String used to join separate table blocks.
- precision: int | dict[str, int | None] | dict[str, dict[str, int | None]] | None, default: None
Decimal places for numeric cells. An ``int`` or a flat dict applies the same rules as ``DocDataWriter`` across all tables. A nested dict ``{ <table name or title>: { <column name or title>: n } }`` sets precision per table (table keys are matched as for ``columns``: name first, then title), so the same column name can use different rounding in different tables. Nested layout applies only when every top-level value is a ``dict`` (see ``convert_table_collection_to_markdown``).
- latex_math: bool | set[{"table_title", "field_titles", "cells"}], default: False
Wrap LaTeX expressions in ``$...$`` for math-aware frontend renderers. Set to ``False`` (default) when the output goes to an LLM.
``False``: no wrapping.
``True``: wrap table title, column headers, and string-typed cells.
A ``set``: granular, e.g. ``{"cells"}`` or ``{"field_titles", "cells"}``.
Applied uniformly to every table block in the collection.
- result_key_name: str | None, default: None
Field name of the single field in ``OutputResultModel``. Defaults to the collection's ``name``, or ``"markdown_tables"``.
- result_key_title: str | None, default: None
Field title of the single field in ``OutputResultModel``. Defaults to the collection's ``title``, or ``result_key_name``.
Ports
- InputTables: PortTypeHint.TableCollection
The input collection to convert.
- OutputMarkdown: PortTypeHint.Text
All table blocks joined by ``separator``.
- OutputResultModel: PortTypeHint.ResultModel
A dynamically-created ``ResultModel`` with a single string field whose value is the full output text.
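The flat versus nested ``precision`` layouts can be illustrated with a short lookup sketch. The helper ``resolve_precision`` is hypothetical, and treating a flat dict as a plain column-to-decimals mapping is an assumption; the actual ``DocDataWriter`` rules are not described in this reference. The only documented behaviour reproduced here is that the nested layout applies when every top-level value is a ``dict``, and that keys are tried by name first, then title.

```python
def resolve_precision(precision, table_keys, column_keys):
    """Return the decimal places for one column, or None when unspecified.

    `table_keys` / `column_keys` are (name, title) candidates tried in order.
    Hypothetical helper; flat-dict semantics here are an assumption.
    """
    if precision is None or isinstance(precision, int):
        return precision
    # Nested layout only when every top-level value is itself a dict.
    nested = bool(precision) and all(isinstance(v, dict) for v in precision.values())
    if nested:
        # Pick the per-table dict by name first, then by title.
        per_table = next((precision[k] for k in table_keys if k in precision), {})
    else:
        per_table = precision
    return next((per_table[k] for k in column_keys if k in per_table), None)


nested = {"layers": {"top_depth": 1}, "Bore summary": {"top_depth": 2}}
print(resolve_precision(nested, ("layers", "Layer table"), ("top_depth", "Top depth")))
print(resolve_precision({"top_depth": 3}, ("layers", "Layer table"), ("top_depth",)))
```

With the nested layout, the same column name (``top_depth`` here) can be rounded differently depending on which table it belongs to.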
Attributes:
- InputTables: PortReference[PortTypeHint.TableCollection]
- OutputMarkdown: PortReference[PortTypeHint.Text]
- OutputResultModel: PortReference[PortTypeHint.ResultModel]
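For the ``"html"`` merge mode of TablesToMarkdown, the following is a minimal sketch of how ``rowspan`` cells could be produced. The function ``render_rowspan_table`` and its tuple-based input are assumptions for illustration, not the module's code; rows that share the same main-table values are expected to be adjacent.

```python
from html import escape
from itertools import groupby


def render_rowspan_table(rows, header, n_main):
    """Render rows as an HTML table, merging the first `n_main` columns.

    Hypothetical helper for illustration only.
    """
    head = "".join(f"<th>{escape(h)}</th>" for h in header)
    out = ["<table>", f"<thead><tr>{head}</tr></thead>", "<tbody>"]
    for _, group in groupby(rows, key=lambda r: tuple(r[:n_main])):
        group = list(group)
        for i, row in enumerate(group):
            cells = []
            if i == 0:
                # Emit the main-table cells once, spanning the whole group.
                span = (
                    f' rowspan="{len(group)}" style="vertical-align:middle"'
                    if len(group) > 1
                    else ""
                )
                cells += [f"<td{span}>{escape(str(v))}</td>" for v in row[:n_main]]
            cells += [f"<td>{escape(str(v))}</td>" for v in row[n_main:]]
            out.append("<tr>" + "".join(cells) + "</tr>")
    out += ["</tbody>", "</table>"]
    return "\n".join(out)


html_table = render_rowspan_table(
    [("ZK-01", 12.5, 0.0, "fill"), ("ZK-01", 12.5, 2.5, "silty clay"), ("ZK-02", 10.0, 0.0, "fill")],
    ["bore_id", "elevation", "top_depth", "geo"],
    n_main=2,
)
print(html_table)
```

Compared with the blank-cell Markdown style, the merged cell appears exactly once in the markup, which is why this form suits browsers better than LLM input.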
- class modules.converters.DocxToMarkdown
Convert a document (.docx or .rtf) to markdown format.
This module converts Microsoft Word (.docx) or Rich Text Format (.rtf) documents to markdown format with comprehensive options for image handling and markdown variant selection. It’s optimized for LLM processing with sensible defaults.
- Inherits from:
PipeModule
Methods:
- __init__(mname: str = 'DocxToMarkdown', auto_run: bool = True, input_file: PortTypeHint.FilePath | dict | None = None, result_key_name: str | None = None, result_key_title: str | None = None, markdown_format: Literal[gfm, markdown, commonmark, markdown_strict, markdown_phpextra, markdown_mmd] = 'gfm', extract_images: bool = True, images_dir: str | Path | None = None, images_dir_relative_to_input: bool = True, relative_image_link: bool = False, embed_images: bool = False, wrap_text: bool = False, extra_args: list[str] | None = None) None
Initialize a DocxToMarkdown module.
Parameters
- input_file: PortTypeHint.FilePath | dict | None, default: None
The input docx or rtf file path. If the data of the ``InputDocxFile`` port is not ``None``, ``self.input_file`` will be overwritten. If a dict, it will be converted to a GdimMinIOFile object to fetch the file from the MinIO server.
- result_key_name: str | None, default: None
Field name of the single field in ``OutputResultModel``. Defaults to the stem of the input filename, or ``"markdown_content"`` if that is not available.
- result_key_title: str | None, default: None
Field title of the single field in ``OutputResultModel``. Defaults to ``result_key_name`` when not provided.
- markdown_format: str, default: "gfm"
The markdown variant to use for conversion. Options:
  - "gfm": GitHub Flavored Markdown (default, best for LLMs). LLMs are extensively trained on GitHub content, making GFM the most familiar and well-understood markdown variant for AI models. Supports: tables, task lists, strikethrough, fenced code blocks.
  - "markdown": Pandoc's extended markdown (most features). Supports: tables, footnotes, math, definition lists, metadata blocks.
  - "commonmark": CommonMark specification (strict standard). Basic markdown with strict spec compliance.
  - "markdown_strict": Original markdown.pl implementation. Only original markdown features (very limited).
  - "markdown_phpextra": PHP Markdown Extra. Tables, footnotes, definition lists, fenced code blocks.
  - "markdown_mmd": MultiMarkdown. Citations, cross-references, metadata, math support.
- extract_images: bool, default: True
Whether to extract images from the document. If True, images are saved to a directory and referenced with relative paths in the markdown. Recommended for multi-modal models as it provides both text and separate image files that can be processed independently.
- images_dir: str | Path | None, default: None
Directory for extracted images. If None, uses ``"{{stem}}_images"`` where ``stem`` is the input filename without suffix. See ``images_dir_relative_to_input`` for whether that folder is under the input's parent or under the process working directory.
- images_dir_relative_to_input: bool, default: True
When True, a missing or relative ``images_dir`` is rooted under the input file's parent directory. When False, it is rooted under the current working directory (including the default ``{{stem}}_images`` name).
- relative_image_link: bool, default: False
When ``images_dir`` is an absolute path, if True, image URLs in the markdown are rewritten to be relative to ``Path(images_dir).parent`` (save the ``.md`` beside that ``images`` folder). Ignored for non-absolute ``images_dir`` (those URLs are always rewritten for portability).
- embed_images: bool, default: False
If True, embeds images as base64-encoded data URIs directly in the markdown content, creating a self-contained file. This overrides extract_images and images_dir settings. Note: For multi-modal LLMs, extract_images=True is generally preferred as it allows the model to process images and text separately.
- wrap_text: bool, default: False
Whether to wrap text at a certain column width. Default is False, which means no text wrapping (``--wrap=none`` is passed to pandoc).
**Recommended: False for LLM processing** to preserve the original text flow and avoid artificial line breaks that could confuse language models. Set to True if you need wrapped text for human readability.
- extra_args: list[str] | None, default: None
Additional command-line arguments for advanced customization.
Examples
Basic usage with default settings (best for LLMs):
>>> module = DocxToMarkdown(input_file="report.docx")
>>> print(module.OutputMarkdown.data)
>>> result = module.OutputResultModel.data
>>> print(result.report)  # field name defaults to the input file stem
Self-contained document with embedded images:
>>> module = DocxToMarkdown(
...     input_file="report.docx",
...     embed_images=True
... )
For academic writing with MultiMarkdown format:
>>> module = DocxToMarkdown(
...     input_file="paper.docx",
...     markdown_format="markdown_mmd"
... )
Enable text wrapping for human readability:
>>> module = DocxToMarkdown(
...     input_file="document.docx",
...     wrap_text=True
... )
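Because the ``wrap_text`` description mentions that ``--wrap=none`` is passed to pandoc, it can help to see how such options might translate into a pandoc argument list. The sketch below is an assumption about the mapping, not the module's code: ``build_pandoc_args`` is a hypothetical helper, and mapping ``wrap_text=True`` to ``--wrap=auto`` is a guess. The flags themselves (``--from``, ``--to``, ``--wrap``, ``--extract-media``) are standard pandoc options.

```python
def build_pandoc_args(markdown_format="gfm", wrap_text=False, images_dir=None, extra_args=None):
    """Assemble a pandoc argument list for a docx -> markdown conversion.

    Hypothetical helper: how DocxToMarkdown invokes pandoc is not shown
    in this reference.
    """
    args = ["--from=docx", f"--to={markdown_format}"]
    # wrap_text=False corresponds to --wrap=none; --wrap=auto is assumed otherwise.
    args.append("--wrap=auto" if wrap_text else "--wrap=none")
    if images_dir is not None:
        # --extract-media writes images found in the document into this directory.
        args.append(f"--extract-media={images_dir}")
    # User-supplied arguments go last so they can override earlier ones.
    args.extend(extra_args or [])
    return args


print(build_pandoc_args())
print(build_pandoc_args("markdown_mmd", wrap_text=True, images_dir="report_images"))
```

Appending ``extra_args`` last is a design choice in this sketch: pandoc generally lets a later occurrence of an option take effect, so advanced users can override the defaults.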
Notes
**For Multi-Modal LLM Processing:**
The default settings (extract_images=True, markdown_format="gfm", wrap_text=False) are optimized for multi-modal LLMs. This approach:
#. Extracts images to a separate directory with relative paths in markdown
#. Allows the LLM to process both text and images independently
#. Uses GFM format which LLMs understand best
#. Disables text wrapping to preserve original text flow
**Image Handling Strategies:**
- **extract_images=True** (default): Best for multi-modal processing
  * Images saved as separate files
  * Markdown contains relative paths: ``![alt](images/image1.png)``
  * Multi-modal models can process each image independently
- **embed_images=True**: Creates self-contained document
  * Images embedded as base64 data URIs in markdown
  * Single file, but larger size
  * Less ideal for multi-modal processing
- **extract_images=False**: Images are skipped
  * Only text content is converted
  * Use when images are not needed
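To make the difference between an extracted image link and an embedded one concrete, here is a small sketch of what a base64 data URI looks like next to a relative path. ``to_data_uri`` is a hypothetical helper and the six bytes stand in for real image content; the module's own embedding logic is not shown in this reference.

```python
import base64
import mimetypes


def to_data_uri(image_bytes, filename):
    """Encode raw image bytes as a base64 data URI (hypothetical helper)."""
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    payload = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{payload}"


# A few bytes stand in for real image content.
uri = to_data_uri(b"\x89PNG\r\n", "image1.png")
embedded = f"![alt]({uri})"           # embed_images=True style
extracted = "![alt](images/image1.png)"  # extract_images=True style
print(embedded)
print(extracted)
```

Base64 encodes every 3 bytes as 4 characters, so embedded images add roughly a third to their size inside the markdown text, which is the "larger size" trade-off noted above.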
Attributes:
- InputDocxFile: PortReference[PortTypeHint.FilePath]
- OutputMarkdown: PortReference[PortTypeHint.Text]
- OutputResultModel: PortReference[PortTypeHint.ResultModel]
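The ``images_dir`` and ``images_dir_relative_to_input`` rules of DocxToMarkdown can be read as a simple path resolution. The sketch below is a literal reading of the parameter descriptions, with a hypothetical helper name and an extra ``cwd`` argument added only to keep the example deterministic; it is not the module's implementation.

```python
from pathlib import Path


def resolve_images_dir(input_file, images_dir=None, relative_to_input=True, cwd=None):
    """Resolve where extracted images would be written (hypothetical helper)."""
    input_path = Path(input_file)
    # Default folder name is "<stem>_images" when images_dir is not given.
    target = Path(images_dir) if images_dir is not None else Path(f"{input_path.stem}_images")
    if target.is_absolute():
        return target
    # Relative or missing images_dir: root it under the input's parent or the cwd.
    root = input_path.parent if relative_to_input else Path(cwd or Path.cwd())
    return root / target


print(resolve_images_dir("/data/report.docx"))
print(resolve_images_dir("/data/report.docx", relative_to_input=False, cwd="/work"))
print(resolve_images_dir("/data/report.docx", images_dir="figs"))
```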
- class modules.converters.ConvertToBoreForCadDraw
Convert the input table collection to a BoreForCadDraw object and save it to a .gsc file,
which can be used for CAD drawing of borehole logs (钻孔柱状图).
- Inherits from:
PipeModule
Methods:
- __init__(mname: str | None = 'ConvertToBoreForCadDraw', auto_run: bool = True, tables: PortTypeHint.TableCollection | None = None, geo_params_table: PortTypeHint.TableData | None = None, proj_info: PortTypeHint.ResultModel | None = None, name_maps: dict[str, dict[str, str]] | None = None, selected_bores: list[str] | None = None, drawing_scales: dict[str, int] | list[dict[str, str | int]] | int | None = None, proj_info_name_map: dict[str, str] | None = None, sample_types_map: dict[str, int] | None = None, output_dir: str | Path | None = None, gsc_file_name: str = 'bore_for_cad_draw.gsc', save_to_gdim: bool = False, token: str | None = None, proj_id: int | str | None = None, host: str | None = None) None
Initialize a ConvertToBoreForCadDraw object.
Parameters
- tables: PortTypeHint.TableCollection | None, default: None
The input table collection containing bore, layer, and test data, etc. The expected table names are:
bore_table : 钻孔一览表 (borehole summary table)
layer_table : 地层表 (stratum table)
materials_table : 标准地层表 (standard stratum table)
spt_table : 标贯表 (standard penetration test table)
cpt_table : 双桥静探表 (double-bridge cone penetration test table)
dpt_table : 动探表 (dynamic penetration test table)
wave_table : 波速表 (wave velocity table)
samples_table : 取样表 (sampling table)
soils_test_table : 常规试验表 (routine soil test table)
- geo_params_table: PortTypeHint.TableData | None, default: None
The input table data containing geo parameters from the Gdim App table '岩土参数建议值表' (recommended geotechnical parameters table).
geo_parameters_table : 岩土参数建议值表
- proj_info: PortTypeHint.ResultModel | None, default: None
The input ResultModel containing project info data.
- name_maps: dict[str, dict[str, str]] | None, default: None
Mapping of table names and field names. Structure:
{
    "table_names": {"bore_table": "actual_bore_table_name", …},
    "field_names": {"bore_table": {"bore_num": "actual_bore_num_field", …}, …}
}
- selected_bores: list[str] | None, default: None
List of bores (input bore numbers) to convert. If None, all bores will be converted.
- drawing_scales: dict[str, int] | list[dict[str, str | int]] | int | None, default: None
Drawing scales for bores. If an int, it is used for all bores. If a dict, the key is the bore number and the value is the drawing scale. If a list, use the following format:
[{"bore_num": "num_1", "drawing_scales": 200}, {"bore_num": "num_2", "drawing_scales": 300}]
- proj_info_name_map: dict[str, str] | None, default: None
Mapping for project info fields from proj_info to ProjectInfo.
- sample_types_map: dict[str, int] | None, default: None
Mapping from string sample type names to integer codes. For example: {"厚壁原状": 0, "薄壁原状": 0, "扰动样": 1, "岩石样": 2, "水样": 3} (thick-walled undisturbed, thin-walled undisturbed, disturbed, rock, and water samples).
- output_dir: str | Path | None, default: None
The directory to save the output gsc file.
The pipeline's 'workspace' takes priority over 'output_dir'.
If both 'output_dir' and 'workspace' are None, the current working directory will be used.
- gsc_file_name: str, default: "bore_for_cad_draw.gsc"
The name of the gsc file.
- save_to_gdim: bool, default: False
If True, the generated .gsc file will also be saved to the gdim file server.
- token: str | None
The token of the user. Must be provided when ``save_to_gdim`` is True.
- proj_id: int | str | None
The id of the gdim project. Must be provided when ``save_to_gdim`` is True.
- host: str | None
The host of the gdim platform.
- execute() BoreForCadDraw | None
Attributes:
- InputTables: PortReference[PortTypeHint.TableCollection]
- InputGeoParamsTable: PortReference[PortTypeHint.TableData]
- InputProjectInfo: PortReference[PortTypeHint.ResultModel]
- InputToken: PortReference[PortTypeHint.Token]
- OutputBoreForCadDraw: PortReference[PortTypeHint.BoreForCadDraw]
- OutputFile: PortReference[PortTypeHint.FilePath | PortTypeHint.GdimFile]
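Since ``drawing_scales`` of ConvertToBoreForCadDraw accepts three shapes (an int, a dict, or a list of dicts), a small normalisation sketch may clarify how they relate. ``normalize_drawing_scales`` is a hypothetical helper, and the ``default_scale`` fallback for bores without an entry is an assumption; the module's actual handling is not documented here.

```python
def normalize_drawing_scales(drawing_scales, bore_nums, default_scale=None):
    """Map every bore number to a drawing scale (hypothetical helper).

    Accepts the three documented shapes: an int, a {bore_num: scale} dict,
    or a list of {"bore_num": ..., "drawing_scales": ...} dicts.
    """
    if isinstance(drawing_scales, int):
        # A single int applies to all bores.
        return {bore: drawing_scales for bore in bore_nums}
    if drawing_scales is None:
        per_bore = {}
    elif isinstance(drawing_scales, dict):
        per_bore = dict(drawing_scales)
    else:
        per_bore = {item["bore_num"]: int(item["drawing_scales"]) for item in drawing_scales}
    # Bores without an explicit entry fall back to `default_scale` (an assumption).
    return {bore: per_bore.get(bore, default_scale) for bore in bore_nums}


bores = ["num_1", "num_2"]
print(normalize_drawing_scales(200, bores))
print(normalize_drawing_scales({"num_1": 100}, bores, default_scale=200))
print(normalize_drawing_scales(
    [{"bore_num": "num_1", "drawing_scales": 200}, {"bore_num": "num_2", "drawing_scales": 300}],
    bores,
))
```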
- class modules.converters.ConvertToBoreForPlanDraw
Convert the input table collection to a BoreForPlanDraw object and save it to a .gsc file,
which can be used for drawing the borehole location plan (钻孔平面布置图).
- Inherits from:
PipeModule
Methods:
- __init__(mname: str | None = 'ConvertToBorePlanForCad', auto_run: bool = True, tables: PortTypeHint.TableCollection | None = None, proj_info: PortTypeHint.ResultModel | None = None, coordinate_system: PortTypeHint.SingleResult | None = None, name_maps: dict[str, dict[str, str]] | None = None, bore_types_map: dict[str, BoreTypes] | None = None, proj_info_name_map: dict[str, str] | None = None, output_dir: str | Path | None = None, gsc_file_name: str = 'bore_plan_for_cad_draw.gsc', save_to_gdim: bool = False, token: str | None = None, proj_id: int | str | None = None, host: str | None = None) None
Initialize a ConvertToBoreForPlanDraw object.
Parameters
- tables: PortTypeHint.TableCollection | None, default: None
The input table collection containing bore and section line data. The expected table names are:
bore_table : 钻孔一览表 (borehole summary table)
section_line_table : 剖面线表 (section line table)
- proj_info: PortTypeHint.ResultModel | None, default: None
The input ResultModel containing project info data.
- name_maps: dict[str, dict[str, str]] | None, default: None
Mapping of table names and field names. Structure:
{
    "table_names": {"bore_table": "actual_bore_table_name", "section_line_table": "actual_section_line_table_name"},
    "field_names": {
        "bore_table": {"bore_num": "actual_bore_num_field", "x": "actual_x_field", "y": "actual_y_field", "bore_type": "actual_bore_type_field", "top": "actual_top_field"},
        "section_line_table": {"name": "actual_name_field", "bores": "actual_bores_field"}
    }
}
- bore_types_map: dict[str, BoreTypes] | None, default: None
Mapping from string bore type names to the BoreTypes enum. For example: {"鉴别孔": BoreTypes.IdentificationBore, "取土试样钻孔": BoreTypes.SoilSamplingBore}. If the map is not specified, the module tries to match against the names and titles of BoreTypes automatically.
- proj_info_name_map: dict[str, str] | None, default: None
Mapping for project info fields from proj_info to ProjectInfo.
- output_dir: str | Path | None, default: None
The directory to save the output gsc file.
The pipeline's 'workspace' takes priority over 'output_dir'.
If both 'output_dir' and 'workspace' are None, the current working directory will be used.
- gsc_file_name: str, default: "bore_plan_for_cad_draw.gsc"
The name of the gsc file. If None, the gsc file will not be saved.
- save_to_gdim: bool, default: False
If True, the generated .gsc file will also be saved to the gdim file server.
- token: str | None
The token of the user. Must be provided when ``save_to_gdim`` is True.
- proj_id: int | str | None
The id of the gdim project. Must be provided when ``save_to_gdim`` is True.
- host: str | None
The host of the gdim platform.
- execute() BoreForPlanDraw | None
Attributes:
- InputTables: PortReference[PortTypeHint.TableCollection]
- InputProjectInfo: PortReference[PortTypeHint.ResultModel]
- InputCoordinateSystem: PortReference[PortTypeHint.CoordinateSystem]
- InputToken: PortReference[PortTypeHint.Token]
- OutputBoreForPlanDraw: PortReference[PortTypeHint.BoreForPlanDraw]
- OutputFile: PortReference[PortTypeHint.FilePath | PortTypeHint.GdimFile]
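The ``bore_types_map`` fallback of ConvertToBoreForPlanDraw (explicit map first, then matching against enum names and titles) can be sketched as follows. ``DemoBoreTypes`` is a stand-in enum, not the library's ``BoreTypes``: its two members come from the example in the parameter description, and storing the title as the enum value is an assumption. ``resolve_bore_type`` is likewise a hypothetical helper.

```python
from enum import Enum


class DemoBoreTypes(Enum):
    """Stand-in enum; using the title as the member value is an assumption."""

    IdentificationBore = "鉴别孔"
    SoilSamplingBore = "取土试样钻孔"


def resolve_bore_type(label, bore_types_map=None):
    """Resolve a bore-type string via an explicit map, then by name or title."""
    if bore_types_map and label in bore_types_map:
        return bore_types_map[label]
    for member in DemoBoreTypes:
        # Automatic matching: compare against the member name and its title.
        if label in (member.name, member.value):
            return member
    return None


custom_map = {"sampling": DemoBoreTypes.SoilSamplingBore}
print(resolve_bore_type("鉴别孔"))
print(resolve_bore_type("SoilSamplingBore"))
print(resolve_bore_type("sampling", custom_map))
print(resolve_bore_type("unknown"))
```

Returning ``None`` for an unmatched label is a choice made for this sketch; the module may raise or skip instead.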