modules.converters

Classes

class modules.converters.ConvertToMaterialTable

Convert the input table to a MaterialTable object which can be used for geo calculation.

Inherits from:: PipeModule

Methods:

__init__(mname: str = 'ConvertToMaterialTable', auto_run: bool = True, table: PortTypeHint.TableData | PortTypeHint.TableCollection | None = None, column_name_map: dict[str, GProps] | None = None, sort_by_layer_number: bool = True, reset_material_id: bool = True, table_name: str | None = None) → None

Initialize a ConvertToMaterialTable object.

Parameters

table : PortTypeHint.TableData | PortTypeHint.TableCollection | None, default: None: The input table data.
column_name_map : dict[str, GProps] | None, default: None: The mapping from table column names to material properties. The keys are the field names or field titles of the table (resolved automatically) and the values are the material properties. If no mapping is provided for a column, the module tries to find a column with the same name as the GeoMaterialProps value automatically.
sort_by_layer_number : bool, default: True: If True, the data will be sorted by the layer number. The material_id starts from 0.
reset_material_id : bool, default: True: If True, the material_id is reset to follow the ascending order of layer number, so the material_id starts from 0 at the first row after sorting. Only takes effect when 'sort_by_layer_number' is True.
table_name : str | None, default: None: The name or title of the material table in the table collection. Only used when table is a TableCollection. If None, the table named "standard_layer_table" is used. If the collection has no "standard_layer_table", the table titled "标准地层表" is used. If neither exists, the first table in the collection is used.

Ports

InputTable : PortTypeHint.TableData | PortTypeHint.TableCollection: The input table data.
OutputMaterialTable : PortTypeHint.MaterialTable: The output material table.

Notes

GProps.LayerNumber is always converted to string type.

execute() → PortTypeHint.MaterialTable | None

Attributes:

InputTable: PortReference[PortTypeHint.TableData | PortTypeHint.TableCollection]

OutputMaterialTable: PortReference[PortTypeHint.MaterialTable]
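
The table_name fallback described for ConvertToMaterialTable can be sketched as below. This is only an illustration: the collection is modelled as a plain dict mapping table names to titles, pick_material_table is a hypothetical helper, and raising KeyError for an unknown explicit name is an assumption rather than documented behaviour.

```python
def pick_material_table(tables, table_name=None):
    """Return the name of the table to convert.

    `tables` stands in for a TableCollection: a dict mapping each
    table's name to its title (an assumption made for this sketch).
    """
    if not tables:
        raise ValueError("the collection is empty")
    if table_name is not None:
        # An explicit table_name may match either a name or a title.
        for name, title in tables.items():
            if table_name in (name, title):
                return name
        raise KeyError(table_name)  # assumed behaviour, not documented
    # 1) Preferred table name.
    if "standard_layer_table" in tables:
        return "standard_layer_table"
    # 2) Preferred table title.
    for name, title in tables.items():
        if title == "标准地层表":
            return name
    # 3) Last resort: the first table in the collection.
    return next(iter(tables))
```

Because dicts keep insertion order, the final fallback returns the first table that was added to the stand-in collection.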

class modules.converters.ConvertToMaterialTables

Convert the input table collection to a MaterialTable collection which can be used for geo calculation.

Inherits from:: PipeModule

Methods:

__init__(mname: str = 'ConvertToMaterialTables', auto_run: bool = True, tables: PortTypeHint.TableCollection | None = None, column_name_map: dict[str, GProps] | None = None, sort_by_layer_number: bool = True, format_dict: dict[GProps, str] | None = None) → None

Initialize a ConvertToMaterialTables object.

Parameters

tables : PortTypeHint.TableCollection | None, default: None: The input table collection.
column_name_map : dict[str, GProps] | None, default: None: The mapping from table column names to material properties. The keys are the column names or field titles of the table and the values are the material properties.
sort_by_layer_number : bool, default: True: If True, the data will be sorted by the layer number. The material_id starts from 0.
format_dict : dict[GProps, str] | None, default: None: The mapping from material property to target type, used to convert the matching column to that type. The keys are the material properties and the values are the types.

Notes

MaterialTables are often used when each bore or section has its own material table,
so the key of the MaterialTables is usually the bore or section number or name.

set_cal_params(reset: bool = True) → dict[str, RangeModel]

execute() → PortTypeHint.MaterialTableCollection | None

Properties:

InputTables

OutputMaterialTables

class modules.converters.ConvertToMultiProfile1D

Convert the input table collection to a MultiProfile1D object which can be used for geo calculation.

Inherits from:: PipeModule

Methods:

__init__(mname: str = 'ConvertToMultiProfile1D', auto_run: bool = True, tables: PortTypeHint.TableCollection | None = None, material_table: PortTypeHint.MaterialTable | None = None, bore_table_name: str | None = None, bore_table_field_names: list[str] | None = None, layer_table_name: str | None = None, layer_table_field_names: list[str] | None = None, selected_bores: list[str] | None = None, sort_layers_table: bool = False) → None

Initialize a ConvertToMultiProfile1D object.

Parameters

tables : PortTypeHint.TableCollection | None, default: None

The input table collection.

material_table : PortTypeHint.MaterialTable | None, default: None

The input material table used to resolve materials_id from layer numbers.

bore_table_name : str | None, default: None

Bore list table key (table name or title). If None, the module looks for a table named bore_table, then a table titled 钻孔一览表.

bore_table_field_names : list[str] | None, default: None

Five field identifiers (names or titles) for the bore table, in order: bore number, bore top elevation, X, Y, steady groundwater depth. If None, each column is matched by English name or Chinese title in order:

bore_number / 钻孔编号
bore_top / 孔口高程
x_coordinate / X坐标
y_coordinate / Y坐标
steady_ground_water_depth / 稳定水位埋深

layer_table_name : str | None, default: None

Layer table key (name or title). If None, the module looks for a table named layer_table, then a table titled 地层表.

layer_table_field_names : list[str] | None, default: None

Three field identifiers: bore number, layer bottom depth, layer number. If None, each column is matched by English name or Chinese title in order:

bore_number / 钻孔编号
layer_bottom_depth / 层底深度
layer_number / 地层编号

selected_bores : list[str] | None, default: None

If set, only these bore identifiers (values in the bore-number column) are kept.

sort_layers_table : bool, default: False

If False, keep the layer table row order as in the input. If True, sort rows by bore number ascending, then layer bottom depth ascending.

execute() → PortTypeHint.MultiProfile1D | None

Attributes:

InputTables: PortReference[PortTypeHint.TableCollection]

InputMaterialTable: PortReference[PortTypeHint.MaterialTable]

OutputMultiProfile1D: PortReference[PortTypeHint.MultiProfile1D]
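
The effect of selected_bores and sort_layers_table on the layer table can be illustrated with plain Python rows. This is a sketch under stated assumptions: the layer table is modelled as a list of dicts using the default English field names, and prepare_layer_rows is a hypothetical helper, not part of the module.

```python
def prepare_layer_rows(rows, selected_bores=None, sort_layers_table=False):
    """Filter and optionally sort layer rows (list of dicts)."""
    if selected_bores is not None:
        keep = set(selected_bores)
        # Keep only rows whose bore number was selected.
        rows = [row for row in rows if row["bore_number"] in keep]
    else:
        rows = list(rows)
    if sort_layers_table:
        # Bore number ascending, then layer bottom depth ascending.
        rows = sorted(
            rows, key=lambda row: (row["bore_number"], row["layer_bottom_depth"])
        )
    return rows


layers = [
    {"bore_number": "ZK-02", "layer_bottom_depth": 3.0, "layer_number": "1"},
    {"bore_number": "ZK-01", "layer_bottom_depth": 6.5, "layer_number": "2"},
    {"bore_number": "ZK-01", "layer_bottom_depth": 2.0, "layer_number": "1"},
]
sorted_layers = prepare_layer_rows(layers, sort_layers_table=True)
```

With sort_layers_table=False the input order is preserved, which matters when the source table already encodes the intended layer sequence.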

class modules.converters.PdfToImages

Convert PDF files to images.

This module converts single or multiple PDF files to images. Each PDF can have its own configuration (image prefix, page range, dimensions) specified via a table-based interface.

Features

Convert single or multiple PDF files
Per-file configuration for page ranges and output dimensions
Auto-generated or custom image prefixes
Table-based UI for easy configuration

Examples

Minimal configuration (just file paths, all defaults):

>>> converter = PdfToImages(
...     pdf_configs=[
...         {"file_path": "doc1.pdf"},
...         {"file_path": "doc2.pdf"}
...     ]
... )

With selective per-file configuration:

>>> converter = PdfToImages(
...     pdf_configs=[
...         {"file_path": "doc1.pdf", "image_prefix": "bore", "last_page": 5},
...         {"file_path": "doc2.pdf", "first_page": 3, "width": 1920}
...     ]
... )

Inherits from:: PipeModule

Methods:

__init__(mname: str = 'PdfToImages', auto_run: bool = True, output_dir: str | Path | None = None, dpi: int = 200, format: Literal['png', 'jpg', 'jpeg'] = 'png', pdf_configs: list[_PdfConfigRow] | None = None) → None

Initialize a PdfToImages object.

Parameters

mname : str, default: "PdfToImages"

Module name.

auto_run : bool, default: True

Whether to auto-run the module.

output_dir : str | Path | None, default: None

The directory to save all output images. The pipeline's 'workspace' takes priority over 'output_dir'. If both 'output_dir' and 'workspace' are None, the directory of the first PDF is used.

dpi : int, default: 200

The DPI of the images converted from the PDFs.

format : Literal["png", "jpg", "jpeg"], default: "png"

The format of the images converted from the PDFs.

pdf_configs : list[_PdfConfigRow] | None, default: None

List of per-file configurations. Each configuration is a dictionary. Only file_path is required; other keys are optional with defaults:

file_path: str - PDF file path (required)
image_prefix: str - Prefix for output images (default: auto-generate)
first_page: int - First page to convert, 1-based (default: 1)
last_page: int | None - Last page (default: None = all pages)
width: int | None - Image width (default: None = use DPI)
height: int | None - Image height (default: None = use DPI)
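
As an illustration of how the optional keys fall back to their defaults, the sketch below fills in a partial configuration. PDF_CONFIG_DEFAULTS and normalize_pdf_config are hypothetical names, and using an empty string to mean "auto-generate the prefix" is an assumption based on the Notes for this class.

```python
# Defaults taken from the key list above; "" stands for "auto-generate".
PDF_CONFIG_DEFAULTS = {
    "image_prefix": "",
    "first_page": 1,
    "last_page": None,
    "width": None,
    "height": None,
}


def normalize_pdf_config(config):
    """Return a full per-file configuration with defaults applied."""
    if "file_path" not in config:
        raise ValueError("file_path is required")
    # Keys given by the caller override the defaults.
    return {**PDF_CONFIG_DEFAULTS, **config}


full_config = normalize_pdf_config(
    {"file_path": "doc2.pdf", "first_page": 3, "width": 1920}
)
```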

Notes

- If image_prefix is empty, prefixes are automatically generated from PDF filenames
- If multiple PDFs have the same filename, numeric suffixes (_1, _2, etc.) are added
- All images are saved to the same output directory
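
One way the automatic prefixes could be derived is sketched below. The exact suffix rule is not specified here, so this example assumes that the first occurrence of a filename keeps its bare stem and later duplicates receive _1, _2, and so on; auto_prefixes is a hypothetical helper.

```python
from pathlib import Path


def auto_prefixes(file_paths):
    """Derive one image prefix per PDF path from its filename stem."""
    seen = {}
    prefixes = []
    for path in file_paths:
        stem = Path(path).stem
        count = seen.get(stem, 0)
        # Assumed rule: bare stem first, then numeric suffixes.
        prefixes.append(stem if count == 0 else f"{stem}_{count}")
        seen[stem] = count + 1
    return prefixes


prefixes = auto_prefixes(["a/report.pdf", "b/report.pdf", "c/other.pdf"])
```

Distinct prefixes matter because all images land in the same output directory, so two PDFs with the same filename would otherwise overwrite each other's pages.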

update_ui_schema(reset: bool = False) → dict[str, UIAttributeSchema]

execute() → PortTypeHint.FilesPath | None

Attributes:

OutputImages: PortReference[PortTypeHint.FilesPath]

class modules.converters.TableToDataFrame

Convert TableData to DataFrame.

This module converts a TableData object (which has enhanced metadata like field titles, units, descriptions) to a standard pandas DataFrame. This is useful when you need to use the data with libraries or functions that expect pure pandas DataFrames.

Examples

Basic usage:
>>> converter = TableToDataFrame()
>>> converter.InputTable = my_table_data
>>> result_df = converter.OutputGeneralTable

Using field titles as column names:
>>> converter = TableToDataFrame(use_titles_as_columns=True)
>>> converter.InputTable = my_table_data
>>> result_df = converter.OutputGeneralTable  # DataFrame with human-readable column names

Resetting index:
>>> converter = TableToDataFrame(preserve_index=False)
>>> converter.InputTable = my_table_data
>>> result_df = converter.OutputGeneralTable  # DataFrame with default 0, 1, 2... index

Inherits from:: PipeModule

Methods:

__init__(mname: str = 'TableToDataFrame', auto_run: bool = True, table: PortTypeHint.TableData | None = None, use_titles_as_columns: bool = False, preserve_index: bool = True) → None

Initialize a TableToDataFrame object.

Parameters

table : PortTypeHint.TableData | None, default: None: The input TableData to convert.
use_titles_as_columns : bool, default: False: If True, use field titles as column names in the resulting DataFrame. If False, use the original column names.
preserve_index : bool, default: True: If True, preserve the original index of the TableData. If False, reset the index to the default range index.

update_ui_schema(reset: bool = False) → dict[str, UIAttributeSchema]

execute() → PortTypeHint.GeneralTable | None

Attributes:

InputTable: PortReference[PortTypeHint.TableData]

OutputGeneralTable: PortReference[PortTypeHint.GeneralTable]

class modules.converters.TableToSingleResult

Convert a specified row or column of a TableData to a SingleResult.

This module extracts values from a specified row or column in a TableData and converts them to a SingleResult object. The output can be in two forms:

Single UnitResult with value as a list containing all row or column values
Multiple UnitResult objects, each containing one row or column value

Examples

Single list result:
>>> converter = TableToSingleResult(axis="row", row_index=0, result_mode="single")
>>> converter.InputTable = my_table_data
>>> result = converter.OutputSingleResult  # SingleResult with one UnitResult containing list of all values in current row

Multiple individual results:
>>> converter = TableToSingleResult(axis="column", column_index=0, result_mode="multiple")
>>> converter.InputTable = my_table_data
>>> result = converter.OutputSingleResult  # SingleResult with multiple UnitResults, one per row

Inherits from:: PipeModule

Methods:

__init__(mname: str = 'TableToSingleResult', auto_run: bool = True, table: PortTypeHint.TableData | None = None, axis: Literal['row', 'column'] = 'row', row_index: str | int = 0, column_index: str | int = 0, result_mode: Literal['single', 'multiple'] = 'multiple', single_result_name: str | None = None, single_result_title: str | None = None, multiple_result_prefix: str = '', use_row_index: bool = True) → None

Initialize a TableToSingleResult object.

Parameters

table : PortTypeHint.TableData | None, default: None

The input TableData to convert.

axis : Literal["row", "column"], default: "row"

The axis to extract the data from.

"row": Extract the data from one row.
"column": Extract the data from one column.

row_index : str | int, default: 0

The row to extract the data from. If str, it is matched against the index labels of the table. If int, it is used as the positional row number.

column_index : str | int, default: 0

The column to extract the data from. If str, it is matched against the column names or titles of the table. If int, it is used as the positional column number.

result_mode : Literal["single", "multiple"], default: "multiple"

Mode of conversion:

"single": Convert all values of the specified row or column to a list stored in one UnitResult.
"multiple": Create multiple UnitResult objects, one per column (axis="row") or one per row (axis="column").

single_result_name : str | None, default: None

Name for the single UnitResult when result_mode="single". If None: axis="row" uses the row index; axis="column" uses the column name.

single_result_title : str | None, default: None

Title for the single UnitResult when result_mode="single". If None: axis="row" uses the row index; axis="column" uses the column title from metadata.

multiple_result_prefix : str, default: ""

Prefix for naming multiple UnitResults when result_mode="multiple". When axis="row", results are named "{prefix}_{column_name}"; without a prefix, the column name and title are used. When axis="column", results are named "{prefix}_{sequential_number}" or "{prefix}_{row_index}"; without a prefix, the row index is used.

use_row_index : bool, default: True

Only valid when result_mode="multiple" and axis="column". If True, uses the row index in result names. If False, uses sequential numbering starting from 0.

execute() → PortTypeHint.SingleResult | None

Attributes:

InputTable: PortReference[PortTypeHint.TableData]

OutputSingleResult: PortReference[PortTypeHint.SingleResult]
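
The naming rules for result_mode="multiple" can be summarised in a few lines. The sketch below is illustrative only: multiple_result_names is a hypothetical helper, and it assumes that a missing prefix simply yields the bare column name or row label.

```python
def multiple_result_names(axis, labels, prefix="", use_row_index=True):
    """Build UnitResult names for result_mode="multiple".

    `labels` holds the column names (axis="row") or the row index
    labels (axis="column") of the extracted values.
    """
    if axis == "row":
        keys = [str(label) for label in labels]
    elif axis == "column":
        if use_row_index:
            keys = [str(label) for label in labels]
        else:
            # Sequential numbering starting from 0.
            keys = [str(i) for i in range(len(labels))]
    else:
        raise ValueError(f"unknown axis: {axis!r}")
    return [f"{prefix}_{key}" if prefix else key for key in keys]


row_names = multiple_result_names("row", ["depth", "width"], prefix="r")
col_names = multiple_result_names(
    "column", ["ZK-01", "ZK-02"], prefix="v", use_row_index=False
)
```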

class modules.converters.TableToResultModel

Convert a row of a TableData to a dynamically-created ResultModel.

Each column of the input table becomes a field in the generated
``gdi.dataclass.results.ResultModel`` subclass. The column's
title, unit and description from ``gdi.dataclass.tables.FieldMetadata``
are forwarded to the field, and the Python type is inferred from the
column's pandas dtype.

Empty-table behaviour

- No columns: returns a bare ``ResultModel`` instance with no fields.
- Columns but no rows: returns a ``ResultModel`` instance whose fields match
  the table columns (with correct types and metadata) but whose values are
  all ``None``.

Examples

>>> conv = TableToResultModel(row_index=0)
>>> conv.InputTable = my_table_data
>>> result = conv.OutputResultModel  # ResultModel subclass with one field per column, values from row 0

Inherits from:: PipeModule

Methods:

__init__(mname: str = 'TableToResultModel', auto_run: bool = True, table: PortTypeHint.TableData | None = None, row_index: str | int = 0, model_name: str = 'DynamicResultModel') → None

Initialize a TableToResultModel object.

Parameters

table : PortTypeHint.TableData | None, default: None: The input TableData to convert.
row_index : str | int, default: 0: The row to convert. If str, the row whose index label matches this value is used. If int, the positional (zero-based) row number is used.
model_name : str, default: "DynamicResultModel": Class name assigned to the dynamically created ``gdi.dataclass.results.ResultModel`` subclass.

execute() → PortTypeHint.ResultModel | None

Attributes:

InputTable: PortReference[PortTypeHint.TableData]

OutputResultModel: PortReference[PortTypeHint.ResultModel]
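
The idea of building a model class from table columns at run time can be illustrated with the standard library. This sketch uses dataclasses.make_dataclass as a stand-in; it is not the library's ResultModel, and build_model together with the (type, title, unit) column description is hypothetical.

```python
from dataclasses import field, fields, make_dataclass
from typing import Optional


def build_model(model_name, columns, row=None):
    """Create a dataclass with one field per column and instantiate it.

    `columns` maps a column name to (python_type, title, unit).
    `row` maps column names to values; None models an empty table.
    """
    specs = [
        (
            name,
            Optional[py_type],
            # Forward column metadata to the generated field.
            field(default=None, metadata={"title": title, "unit": unit}),
        )
        for name, (py_type, title, unit) in columns.items()
    ]
    model_cls = make_dataclass(model_name, specs)
    return model_cls(**(row or {}))


columns = {"depth": (float, "Depth", "m"), "soil": (str, "Soil", "")}
filled = build_model("DynamicResultModel", columns, {"depth": 10.5, "soil": "clay"})
empty = build_model("DynamicResultModel", columns)  # columns but no rows
```

The `empty` instance mirrors the "columns but no rows" case: the fields and their metadata exist, but every value is None.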

class modules.converters.SingleResultToText

Convert a SingleResult to a formatted text string.

This module takes a SingleResult object and converts it to a formatted text string using a template. Placeholders in the template are replaced with values from the SingleResult’s UnitResult objects.

The template supports:

UnitResult names as placeholders: "{result_name}"
UnitResult titles as placeholders: "{Result Title}"
Python format specifiers: "{value:.2f}" for 2 decimal places
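
The placeholder rules above behave much like Python's own string formatting. The following sketch assumes a plain dict that maps UnitResult names and titles to values; render_template is a hypothetical helper, not the module's implementation.

```python
def render_template(template, values, joiner=","):
    """Fill a template from a mapping of names or titles to values."""
    prepared = {
        # List values are joined into one string before formatting.
        key: joiner.join(str(item) for item in value)
        if isinstance(value, (list, tuple))
        else value
        for key, value in values.items()
    }
    # format_map accepts keys with spaces or non-ASCII characters.
    return template.format_map(prepared)


text = render_template(
    "The calculated value is {result_a} with precision {result_b:.3f}",
    {"result_a": 42, "result_b": 3.14159},
)
listed = render_template("Values: {list_result}", {"list_result": [1, 2, 3]}, ", ")
```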

Examples

Basic usage:
>>> converter = SingleResultToText(
...     text_template="The calculated value is {result_a} with precision {result_b:.3f}"
... )
>>> converter.InputSingleResult = my_single_result
>>> text = converter.OutputText
# Result: "The calculated value is 42 with precision 3.142"

Using titles as placeholders:
>>> converter = SingleResultToText(
...     text_template="计算结果: {计算值} (单位: {单位})"
... )

Handling list values:
>>> converter = SingleResultToText(
...     text_template="Values: {list_result}",
...     joiner_for_list_value=", "
... )
# If list_result contains [1, 2, 3], output: "Values: 1, 2, 3"

Inherits from:: PipeModule

Methods:

__init__(mname: str = 'SingleResultToText', auto_run: bool = True, single_result: PortTypeHint.SingleResult | None = None, text_template: str | None = None, joiner_for_list_value: str = ',') → None

Initialize a SingleResultToText object.

Parameters

single_result : PortTypeHint.SingleResult | None, default: None: The input SingleResult to convert.
text_template : str | None, default: None: The template used to convert the SingleResult to text. Use {name} or {title} as placeholders for UnitResult values. Supports Python format specifiers like {value:.2f} for formatting.
joiner_for_list_value : str, default: ",": The string used to join the list values of a UnitResult into a single string.

execute() → PortTypeHint.Text | None: Execute the SingleResult to text conversion.

Attributes:

InputSingleResult: PortReference[PortTypeHint.SingleResult]

OutputText: PortReference[PortTypeHint.Text]

class modules.converters.SingleResultToTable

Convert a SingleResult to a TableData.

This module takes a SingleResult object and converts it to a TableData object where each field in the SingleResult becomes a column in the table, and the values are stored in a single row.

Examples

>>> converter = SingleResultToTable()
>>> converter.InputSingleResult = my_single_result
>>> table_data = converter.OutputTable  # TableData with one row containing all values

Inherits from:: PipeModule

Methods:

__init__(mname: str = 'SingleResultToTable', auto_run: bool = True, single_result: PortTypeHint.SingleResult | None = None, table_name: str | None = None, table_title: str | None = None, joiner: str = ',') → None

Initialize a SingleResultToTable object.

Parameters

single_result : PortTypeHint.SingleResult | None, default: None: The input SingleResult to convert.
table_name : str | None, default: None: The name of the table. If None, the default name "single_result" is used.
table_title : str | None, default: None: The title of the table. If None, the default title "single_result" is used.
joiner : str, default: ",": The string used to join list values into a single string.

set_cal_params(reset: bool = True) → dict[str, RangeModel]

execute() → PortTypeHint.TableData | None

Properties:

InputSingleResult

OutputTable

class modules.converters.ResultModelToTable

Convert a ResultModel (or subclass) to a TableData.

Each model field becomes a column; values are stored in a single row.

Examples

>>> converter = ResultModelToTable()
>>> converter.InputResultModel = my_result_model
>>> table_data = converter.OutputTable

Inherits from:: PipeModule

Methods:

__init__(mname: str = 'ResultModelToTable', auto_run: bool = True, result_model: PortTypeHint.ResultModel | None = None, table_name: str | None = None, table_title: str | None = None, joiner: str = ',') → None

Initialize ResultModelToTable.

Parameters

result_model : PortTypeHint.ResultModel | None, default: None

The input ``gdi.dataclass.results.ResultModel`` instance.

table_name : str | None, default: None

Overrides ``gdi.dataclass.tables.TableData.name`` when not None.

table_title : str | None, default: None

Overrides ``gdi.dataclass.tables.TableData.title`` when not None.

joiner : str, default: ","

String used to join list values into one cell string.

execute() → PortTypeHint.TableData | None

Attributes:

InputResultModel: PortReference[PortTypeHint.ResultModel]

OutputTable: PortReference[PortTypeHint.TableData]

class modules.converters.TableToString

Convert TableData to descriptive strings using customizable templates.

This module allows you to describe table data row by row using string templates. You can define multiple templates to generate different descriptions from the same table, with each description saved as a separate field of the output ResultModel.

The templates support:

- Column names or field titles as placeholders: ``"{columnA} has {columnB}"``
- Python format specifiers for numbers: ``"{value:.2f}"``, ``"{count:d}"``, etc.
- Multiple templates with different keys for varied descriptions

Examples

Single template:
>>> converter = TableToString(
...     templates={"description": "{columnA} has {columnB}"},
...     separator=", "
... )
>>> converter.InputTable = my_table_data
>>> result = converter.OutputResultModel
# Result: ResultModel with one field named "description"
# Value: "a has 3, b has 4, c has 5"

Multiple templates with formatting:
>>> converter = TableToString(
...     templates={
...         "simple": "{item} has {count}",
...         "detailed": "Item {item} has a count of {count:.2f} units"
...     },
...     separator="\n"
... )
>>> converter.InputTable = my_table_data
>>> result = converter.OutputResultModel
# Result: ResultModel with two fields:
#   "simple": "a has 3\nb has 4\nc has 5"
#   "detailed": "Item a has a count of 3.00 units\nItem b has a count of 4.00 units\n..."

Using field titles:
>>> # If table has column "col1" with title "Item Name"
>>> converter = TableToString(
...     templates={"desc": "{Item Name} has value {col2}"}
... )
# Will work with both field names and titles

Inherits from:: PipeModule

Methods:

__init__(mname: str = 'TableToString', auto_run: bool = True, table: PortTypeHint.TableData | None = None, templates: dict[str, str] | None = None, result_titles: dict[str, str] | None = None, separator: str = ', ', prefix: str = '', suffix: str = '', include_empty_rows: bool = True) → None

Initialize a TableToString object.

Parameters

table : PortTypeHint.TableData | None, default: None: The input TableData to convert.
templates : dict[str, str] | None, default: None: Dictionary of templates where each key is the result name (it becomes the UnitResult name) and each value is the template string. Templates can use column names or field titles as placeholders and support Python format specifiers (e.g., "{value:.2f}" for 2 decimal places).
include_empty_rows : bool, default: True: If False, rows containing any NaN or None values in the template columns are skipped in the output description. If True, all rows are included (NaN/None values appear as "nan" or "None").

Ports

InputTable : PortTypeHint.TableData: The input TableData to convert to string descriptions.
OutputResultModel : PortTypeHint.ResultModel: The output ResultModel containing string descriptions.

execute() → PortTypeHint.ResultModel | None: Execute the table to string conversion.

Attributes:

InputTable: PortReference[PortTypeHint.TableData]

OutputResultModel: PortReference[PortTypeHint.ResultModel]
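
The row-by-row templating and the include_empty_rows switch can be sketched with the standard library alone. The example models the table as a list of dicts; describe_rows and _is_missing are hypothetical helpers, and the prefix, suffix and result_titles parameters are left out.

```python
import math
from string import Formatter


def _is_missing(value):
    """True for None and float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def describe_rows(rows, template, separator=", ", include_empty_rows=True):
    """Render every row with the template and join the pieces."""
    # Columns referenced by the template, e.g. {count:.2f} -> "count".
    used = [name for _, name, _, _ in Formatter().parse(template) if name]
    parts = []
    for row in rows:
        if not include_empty_rows and any(_is_missing(row[name]) for name in used):
            continue  # skip rows with NaN/None in a template column
        parts.append(template.format_map(row))
    return separator.join(parts)


rows = [
    {"columnA": "a", "columnB": 3},
    {"columnA": "b", "columnB": 4},
    {"columnA": "c", "columnB": 5},
]
description = describe_rows(rows, "{columnA} has {columnB}")
```

Only the columns that actually appear in the template are checked for missing values, which matches the "template columns" wording in the include_empty_rows description.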

class modules.converters.TableToJson

Convert TableData or TableSeries to a JSON object using its serialize method.

The output text (stored as the value in the ResultModel) can be useful for an LLM to analyze the table data.

Inherits from:: PipeModule

Methods:

Initialize a TableToJson object.

Parameters

table : PortTypeHint.TableData | PortTypeHint.TableSeries | None, default: None: The input table data or table series to convert to a json object.
columns : list[str] | str | None, default: None: The columns to convert. If None, all columns will be converted. Column names can be either field names or field titles. Only applicable when the input is TableData.
result_key_name : str | None, default: None: The name of the key in the SingleResult returned for the json object. If None, the name of the table or series will be used. If the table name is None, "json_object" will be used.
result_key_title : str | None, default: None: The title of the key in the SingleResult returned for the json object. If None, the title of the table or series will be used. If the table title is None, result_key_name will be used.
indent : int | None, default: None: The indent of the json object. If None, no indentation is used.

Ports

InputTable : PortTypeHint.TableData | PortTypeHint.TableSeries: The input table data or table series to convert to a json object.
OutputResultModel : PortTypeHint.ResultModel: The output ResultModel containing the json object as a string.
OutputJsonObject : PortTypeHint.JsonObject: The output json object in dict type.

execute() → PortTypeHint.JsonObject | None: Execute the table to JSON conversion.

Attributes:

InputTable: PortReference[PortTypeHint.TableData | PortTypeHint.TableSeries]

OutputResultModel: PortReference[PortTypeHint.ResultModel]

OutputJsonObject: PortReference[PortTypeHint.JsonObject]
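
The fallback chain for result_key_name and result_key_title, and the effect of indent, can be sketched as follows. The real module relies on the table's own serialize method; here json.dumps over a list of records is used as a stand-in, to_json_result is a hypothetical helper, and falling back to the resolved name for the title is an assumption.

```python
import json


def _first_set(*candidates):
    """Return the first candidate that is not None."""
    return next(item for item in candidates if item is not None)


def to_json_result(records, table_name=None, table_title=None,
                   result_key_name=None, result_key_title=None, indent=None):
    """Return (key name, key title, JSON text) for the given records."""
    name = _first_set(result_key_name, table_name, "json_object")
    title = _first_set(result_key_title, table_title, name)
    text = json.dumps(records, ensure_ascii=False, indent=indent)
    return name, title, text


name, title, text = to_json_result([{"a": 1}])
```

Passing ensure_ascii=False keeps non-ASCII field titles readable in the text, which is convenient when the string is handed to an LLM.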

class modules.converters.JsonToTable

Convert a json object to a table data or table series.

Inherits from:: PipeModule

Methods:

__init__(mname: str = 'JsonToTable', auto_run: bool = True, json_object: PortTypeHint.JsonObject | PortTypeHint.Text | None = None, single_column_as: Literal['table_data', 'table_series', 'auto'] = 'auto') → None

Initialize a JsonToTable object.

Parameters

json_object : PortTypeHint.JsonObject | PortTypeHint.Text | None, default: None

The input json object to convert to a table data or table series.

single_column_as : Literal["table_data", "table_series", "auto"], default: "auto"

How to handle single column data:

"auto": Keep the deserialized result as-is without conversion. TableSeries JSON → TableSeries, TableData JSON → TableData.
"table_data": Ensure single-column output is always TableData. If the deserialized result is a TableSeries, convert it to TableData. If it is a single-column TableData, keep it as TableData.
"table_series": Ensure single-column output is always TableSeries. If the deserialized result is a single-column TableData, convert it to TableSeries. If it is a TableSeries, keep it as-is.

Ports

InputJsonObject : PortTypeHint.JsonObject | PortTypeHint.Text

The input json object to convert to a table data or table series.

OutputTable : PortTypeHint.TableData | PortTypeHint.TableSeries

The output table data or table series.

execute() → PortTypeHint.TableData | PortTypeHint.TableSeries | None: Execute the JSON to table conversion.

Attributes:

InputJsonObject: PortReference[PortTypeHint.JsonObject | PortTypeHint.Text]

OutputTable: PortReference[PortTypeHint.TableData | PortTypeHint.TableSeries]
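
The single_column_as rules reduce to a small decision function. The sketch below works on labels only ("table_data" or "table_series" for the deserialized kind) rather than real objects; resolve_output_kind is a hypothetical helper, and leaving multi-column TableData untouched is an assumption drawn from the "single-column" wording.

```python
def resolve_output_kind(deserialized_kind, n_columns=1, single_column_as="auto"):
    """Return the kind of object JsonToTable would output."""
    if deserialized_kind not in ("table_data", "table_series"):
        raise ValueError(f"unknown kind: {deserialized_kind!r}")
    if single_column_as == "auto":
        return deserialized_kind  # keep the deserialized result as-is
    if single_column_as == "table_data":
        return "table_data"  # a series is promoted to a table
    if single_column_as == "table_series":
        if deserialized_kind == "table_data" and n_columns != 1:
            return "table_data"  # only single-column tables are converted
        return "table_series"
    raise ValueError(f"unknown option: {single_column_as!r}")


kind = resolve_output_kind("table_data", n_columns=1, single_column_as="table_series")
```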

class modules.converters.TableToMarkdown

Convert a ``TableData`` or ``TableSeries`` to a GitHub-Flavored Markdown table.

The output is optimised for LLM consumption and frontend rendering. Column headers are taken from field titles (or field names when ``use_titles=False``) and can optionally include the field unit in parentheses. An optional Markdown heading that shows the table title can be prepended, and ``center_title=True`` renders it as a centred HTML heading (``<h2 align="center">…</h2>``) for renderers that support inline HTML.

Examples

>>> m = TableToMarkdown(use_titles=True, include_units=True, show_index=False)
>>> m.InputTable = my_table
>>> print(m.OutputMarkdown)
## 勘探点数据
|  孔号  | 深度 (m) |  地层  |
| :--- |---:| :--- |
|  ZK-01  | 10.5 |  粉质黏土  |

>>> m = TableToMarkdown(center_title=True)
>>> m.InputTable = my_table
>>> print(m.OutputMarkdown)
<h2 align="center">勘探点数据</h2>
|  孔号  | 深度 (m) |  地层  |
| :--- |---:| :--- |
|  ZK-01  | 10.5 |  粉质黏土  |

Inherits from:: PipeModule

Methods:

__init__(mname: str = 'TableToMarkdown', auto_run: bool = True, table: PortTypeHint.TableData | PortTypeHint.TableSeries | None = None, columns: list[str] | str | None = None, use_titles: bool = True, include_units: bool = True, include_table_title: bool = True, center_title: bool = False, show_index: bool = False, precision: int | dict[str, int | None] | None = None, latex_math: bool | set[Literal['table_title', 'field_titles', 'cells']] = False, result_key_name: str | None = None, result_key_title: str | None = None) → None

Initialise a TableToMarkdown module.

Parameters

table : PortTypeHint.TableData | PortTypeHint.TableSeries | None

The table to convert. Can also be supplied via the ``InputTable`` port.

columns : list[str] | str | None, default: None

Columns to include in the output, specified by field name or title. When a list is provided the columns appear in that exact order in the Markdown table, so this parameter doubles as a column-reorder control. ``None`` keeps all columns in their original order. Only applies when the input is a ``TableData``.

use_titles : bool, default: True

Use field titles as Markdown column headers. Set to ``False`` to use raw field names instead.

include_units : bool, default: True

Append the field unit in parentheses to the column header, e.g. ``"深度 (m)"``. Ignored when the unit is ``UNITLESS`` / empty.

include_table_title : bool, default: True

Prepend a ``## <table title>`` Markdown heading above the table.

center_title : bool, default: False

Render the table title as a centred HTML heading (``<h2 align="center">…</h2>``) instead of a plain ``## …`` Markdown heading. Only has an effect when ``include_table_title=True`` and the table has a non-empty title.

show_index : bool, default: False

Include the DataFrame row index as the first column. Defaults to ``False`` because the index is usually a meaningless integer for LLMs.

precision : int | dict[str, int | None] | None, default: None

Decimal places for numeric cells when rendering Markdown, same semantics as ``DocDataWriter`` / ``TableData.export_doc_context``: an ``int`` applies to all numeric columns; a ``dict`` maps column names or titles to a decimal count, or ``None`` to leave that column unformatted.

latex_math : bool | set[Literal["table_title", "field_titles", "cells"]], default: False

Wrap LaTeX expressions in ``$...$`` for math-aware frontend renderers. Set to ``False`` (default) when the output goes to an LLM.

``False``: no wrapping.
``True``: wrap table title, column headers, and string-typed cells.
A ``set``: granular, e.g. ``{"cells"}`` or ``{"field_titles", "cells"}``.

Wrapping is detection-based: a string is only wrapped when it contains a LaTeX command (``\alpha``, ``\frac``, …) or sub-/superscript notation. Numeric and non-LaTeX strings are left unchanged.
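
A detection-based wrapper of this kind could look like the sketch below. The module's actual detection rules are not given here, so the regular expression is an assumption: it treats a backslash command or a braced sub-/superscript as LaTeX and deliberately ignores bare underscores so that identifiers such as bore_id stay untouched. wrap_latex is a hypothetical helper.

```python
import re

# Assumed heuristic: a LaTeX command (\alpha, \frac, ...) or a braced
# sub-/superscript such as x^{2} or q_{u}.
_LATEX_HINT = re.compile(r"\\[A-Za-z]+|[_^]\{")


def wrap_latex(value):
    """Wrap strings that look like LaTeX in $...$; leave the rest alone."""
    if not isinstance(value, str):
        return value  # numeric cells are never wrapped
    return f"${value}$" if _LATEX_HINT.search(value) else value


wrapped = wrap_latex(r"\alpha")
```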

result_key_name : str | None, default: None

Field name of the single field in ``OutputResultModel``. Defaults to the table's ``name`` attribute, or ``"markdown_table"`` if that is not set.

result_key_title : str | None, default: None

Field title of the single field in ``OutputResultModel``. Defaults to the table's ``title`` attribute, or ``result_key_name`` if that is not set.

Ports

InputTable : PortTypeHint.TableData | PortTypeHint.TableSeries

The input table to convert.

OutputMarkdown : PortTypeHint.Text

The Markdown table (and optional heading) as a plain string.

OutputResultModel : PortTypeHint.ResultModel

A dynamically-created ``ResultModel`` with a single string field (named by ``result_key_name``) whose value is the Markdown text.

execute() → str | None: Execute the table-to-Markdown conversion.

Attributes:

InputTable: PortReference[PortTypeHint.TableData | PortTypeHint.TableSeries]

OutputMarkdown: PortReference[PortTypeHint.Text]

OutputResultModel: PortReference[PortTypeHint.ResultModel]
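
To make the header and precision options concrete, here is a minimal sketch of rendering a GitHub-Flavored Markdown table in plain Python. It is not the module's implementation: to_markdown_table is a hypothetical helper, fields are given as (title, unit) pairs, only an integer precision is handled, and right-aligning numeric columns is an assumption.

```python
def to_markdown_table(fields, rows, include_units=True, precision=None):
    """Render a GFM table from (title, unit) pairs and row value lists."""
    headers = [
        f"{title} ({unit})" if include_units and unit else title
        for title, unit in fields
    ]
    # A column counts as numeric when every cell is an int or float.
    numeric = [
        all(isinstance(row[i], (int, float)) and not isinstance(row[i], bool)
            for row in rows)
        for i in range(len(fields))
    ]

    def cell(value):
        if precision is not None and isinstance(value, float):
            return f"{value:.{precision}f}"
        return str(value)

    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---:" if flag else ":---" for flag in numeric) + " |",
    ]
    lines += ["| " + " | ".join(cell(v) for v in row) + " |" for row in rows]
    return "\n".join(lines)


markdown = to_markdown_table([("Bore", ""), ("Depth", "m")], [["ZK-01", 10.5]], precision=2)
```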

class modules.converters.TablesToMarkdown

Convert a ``TableCollection`` to GitHub-Flavored Markdown tables.

Each ``TableData`` in the collection can be rendered as an independent
Markdown table. Alternatively, when the collection defines a main/sub-table
relationship and ``combine_related=True`` (default), each main table is
joined with every one of its sub-tables on the configured primary key and
rendered as a single combined block.  Use ``related_tables_title`` to choose
whether the block heading is the main title, the sub title (``"sub"``,
default), or both (``"combined"``, rendered as ``"{main} - {sub}"``).

Two rendering styles are available for the combined block
(``merge_mode`` parameter):

- ``"markdown"`` *(default)*: repeated main-table cell values are left
  blank on subsequent sub-rows, giving clean plain GFM that works well as
  LLM input.
- ``"html"``: an inline ``<table>`` with ``rowspan`` attributes is
  emitted, with merged cells vertically centred
  (``vertical-align:middle``), which suits frontend display inside
  Markdown-aware renderers.
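The ``"markdown"`` style can be pictured with a minimal sketch that blanks repeated main-table cells in already-joined rows. ``render_combined`` and its ``n_main`` argument are hypothetical names for illustration; how the module performs the join and rendering internally is not shown in this reference.

```python
def render_combined(headers, rows, n_main):
    """Render joined rows as a GFM table, blanking repeated main-table cells.

    The first ``n_main`` columns of each row are assumed to come from the
    main table; the remaining columns come from the sub-table.
    """
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    previous_main = None
    for row in rows:
        cells = [str(cell) for cell in row]
        main = tuple(cells[:n_main])
        if main == previous_main:
            # Same main-table record as the row above: leave its cells blank.
            cells[:n_main] = [""] * n_main
        previous_main = main
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)
```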

Examples

Combined Markdown mode (LLM-friendly):

>>> m = TablesToMarkdown(related_tables_title="combined")
>>> m.InputTables = my_collection
>>> print(m.OutputMarkdown.data)
勘探点汇总 - 地层分层
|  孔号   | 地面高程 (m) |  层顶深度 (m)  | 地层     |
|  -----  | ------------ |  ------------  | -------- |
|  ZK-01  | 12.5         |  0.0           | 填土     |
|         |              |  2.5           | 粉质黏土 |
|  ZK-02  | 10.0         |  0.0           | 填土     |

Combined HTML mode (frontend-friendly):

>>> m = TablesToMarkdown(merge_mode="html", related_tables_title="combined")
>>> m.InputTables = my_collection
>>> print(m.OutputMarkdown.data)
勘探点汇总 - 地层分层

<table>
  <thead><tr><th>孔号</th><th>地面高程 (m)</th>...</tr></thead>
  <tbody>
    <tr><td rowspan="2" style="vertical-align:middle">ZK-01</td>...</tr>
    ...
  </tbody>
</table>
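A rough sketch of how ``rowspan`` cells like those in the output above could be produced from joined rows is given below. ``render_rowspan_table`` is a hypothetical helper, it assumes rows belonging to the same main record are adjacent, and the exact markup the module emits may differ.

```python
from html import escape
from itertools import groupby


def render_rowspan_table(headers, rows, n_main):
    """Render joined rows as an HTML table, merging repeated main-table cells."""
    head = "".join(f"<th>{escape(str(h))}</th>" for h in headers)
    body = []
    # Consecutive rows sharing the same main-table values form one group.
    for _, group in groupby(rows, key=lambda r: tuple(r[:n_main])):
        group = list(group)
        for i, row in enumerate(group):
            cells = []
            if i == 0:
                # Emit the main cells once, spanning the whole group.
                for value in row[:n_main]:
                    cells.append(
                        f'<td rowspan="{len(group)}" style="vertical-align:middle">'
                        f"{escape(str(value))}</td>"
                    )
            cells.extend(f"<td>{escape(str(v))}</td>" for v in row[n_main:])
            body.append("<tr>" + "".join(cells) + "</tr>")
    return (
        "<table>\n  <thead><tr>" + head + "</tr></thead>\n  <tbody>\n    "
        + "\n    ".join(body)
        + "\n  </tbody>\n</table>"
    )
```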

Column selection (only the selected columns are shown; the primary key is always first in combined blocks):

>>> m = TablesToMarkdown(columns={"main": ["bore_id", "elevation"],
...                               "sub":  ["top_depth", "geo"]})

Inherits from:: PipeModule

Methods:

__init__(mname: str = 'TablesToMarkdown', auto_run: bool = True, tables: PortTypeHint.TableCollection | None = None, combine_related: bool = True, related_tables_title: Literal['combined', 'main', 'sub'] = 'sub', columns: dict[str, list[str] | str] | None = None, merge_mode: Literal['markdown', 'html'] = 'markdown', use_titles: bool = True, include_units: bool = True, include_table_title: bool = True, center_title: bool = False, show_index: bool = False, separator: str = '\n\n---\n\n', precision: int | dict[str, int | None] | dict[str, dict[str, int | None]] | None = None, latex_math: bool | set[Literal['table_title', 'field_titles', 'cells']] = False, result_key_name: str | None = None, result_key_title: str | None = None) → None

Initialize a TablesToMarkdown module.

Parameters

tables : PortTypeHint.TableCollection | None, default: None

The collection to convert. Can also be supplied via the ``InputTables`` port.

combine_related : bool, default: True

When the collection has a main/sub-table relationship and a primary key, join each main table with its sub-tables. Set to ``False`` to render every table independently.

related_tables_title : Literal["combined", "main", "sub"], default: "sub"

For merged main/sub blocks only: ``"combined"`` → ``"{main_table_title} - {sub_table_title}"``; ``"main"`` / ``"sub"`` use that table’s title (falling back to the other if missing).

columns : dict[str, list[str] | str] | None, default: None

Per-table column selection. The key is the table name or title; the value is a column name / title, or a list of column names / titles, that determines which columns to include and in what order. ``None`` keeps all columns. When ``combine_related=True`` the primary key column is always placed first in the main-table column list regardless of the selection.

merge_mode : {"markdown", "html"}, default: "markdown"

How repeated main-table values are rendered in a combined block.

- ``"markdown"``: subsequent cells are left blank (LLM-friendly plain GFM).
- ``"html"``: an inline ``<table>`` with ``rowspan`` attributes and ``vertical-align:middle`` on merged cells (frontend-friendly).

Only affects combined blocks; independent tables are always plain GFM.

use_titles : bool, default: True

Use field titles as column headers.

include_units : bool, default: True

Append the field unit in parentheses to each column header.

include_table_title : bool, default: True

Prepend a ``## <title>`` heading above each table block.

center_title : bool, default: False

Render headings as ``<h2 align="center">…</h2>`` instead of plain ``## …``.

show_index : bool, default: False

Include the DataFrame row index as the first column. Only applies when tables are rendered independently (not combined).

separator : str, default: ``"\n\n---\n\n"``

String used to join separate table blocks.

precision : int | dict[str, int | None] | dict[str, dict[str, int | None]] | None, default: None

Decimal places for numeric cells. An ``int`` or a flat dict applies the same rules as ``DocDataWriter`` across all tables. A nested dict ``{ <table name or title>: { <column name or title>: n } }`` sets precision per table (table keys match ``columns``: name first, then title), so the same column name can use different rounding in different tables. Nested layout applies only when every top-level value is a ``dict`` (see ``convert_table_collection_to_markdown``).

latex_math : bool | set[{"table_title", "field_titles", "cells"}], default: False

Wrap LaTeX expressions in ``$...$`` for math-aware frontend renderers. Set to ``False`` (default) when the output goes to an LLM.

- ``False``: no wrapping.
- ``True``: wrap table title, column headers, and string-typed cells.
- A ``set``: granular, e.g. ``{"cells"}`` or ``{"field_titles", "cells"}``.

Applied uniformly to every table block in the collection.
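The flat versus nested ``precision`` layouts described for this class can be made concrete with a small sketch. Both helpers, ``precision_for_table`` and ``format_cell``, are hypothetical. They follow the documented rules (nested only when every top-level value is a ``dict``, table name matched before title) and are not the library's own code.

```python
def precision_for_table(precision, table_keys):
    """Return the int or flat-dict precision that applies to one table.

    ``table_keys`` lists the table's name first, then its title.
    """
    if precision is None or isinstance(precision, int):
        return precision
    # Nested layout applies only when every top-level value is a dict.
    nested = bool(precision) and all(isinstance(v, dict) for v in precision.values())
    if not nested:
        return precision  # flat dict: same rules for every table
    for key in table_keys:
        if key in precision:
            return precision[key]
    return None  # no entry for this table: leave its cells unformatted


def format_cell(value, column, precision):
    """Format one cell; non-numeric values are returned as plain strings."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return str(value)
    digits = precision.get(column) if isinstance(precision, dict) else precision
    return str(value) if digits is None else f"{value:.{digits}f}"
```

For example, ``{"layers": {"top_depth": 1}, "bores": {"elevation": 2}}`` is treated as nested, while ``{"top_depth": 1, "elevation": None}`` is a flat mapping shared by all tables.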

result_key_name : str | None, default: None

Field name of the single field in ``OutputResultModel``. Defaults to the collection’s ``name``, or ``"markdown_tables"``.

result_key_title : str | None, default: None

Field title of the single field in ``OutputResultModel``. Defaults to the collection’s ``title``, or ``result_key_name``.

Ports

InputTables : PortTypeHint.TableCollection

The input collection to convert.

OutputMarkdown : PortTypeHint.Text

All table blocks joined by ``separator``.

OutputResultModel : PortTypeHint.ResultModel

A dynamically-created ``ResultModel`` with a single string field whose value is the full output text.

execute() → str | None: Execute the collection-to-Markdown conversion.

Attributes:

InputTables: PortReference[PortTypeHint.TableCollection]

OutputMarkdown: PortReference[PortTypeHint.Text]

OutputResultModel: PortReference[PortTypeHint.ResultModel]

class modules.converters.DocxToMarkdown

Convert a document (.docx or .rtf) to markdown format.

This module converts Microsoft Word (.docx) or Rich Text Format (.rtf) documents to markdown format with comprehensive options for image handling and markdown variant selection. It’s optimized for LLM processing with sensible defaults.

Inherits from:: PipeModule

Methods:

__init__(mname: str = 'DocxToMarkdown', auto_run: bool = True, input_file: PortTypeHint.FilePath | dict | None = None, result_key_name: str | None = None, result_key_title: str | None = None, markdown_format: Literal['gfm', 'markdown', 'commonmark', 'markdown_strict', 'markdown_phpextra', 'markdown_mmd'] = 'gfm', extract_images: bool = True, images_dir: str | Path | None = None, images_dir_relative_to_input: bool = True, relative_image_link: bool = False, embed_images: bool = False, wrap_text: bool = False, extra_args: list[str] | None = None) → None

Initialize a DocxToMarkdown module.

Parameters

input_file : PortTypeHint.FilePath | dict | None, default: None

The input .docx or .rtf file path. If the ``InputDocxFile`` port carries data, that data overrides ``input_file``. A ``dict`` is converted to a ``GdimMinIOFile`` object so that the file is fetched from the MinIO server.

result_key_name : str | None, default: None

Field name of the single field in ``OutputResultModel``. Defaults to the stem of the input filename, or ``"markdown_content"`` if that is not available.

result_key_title : str | None, default: None

Field title of the single field in ``OutputResultModel``. Defaults to ``result_key_name`` when not provided.

markdown_format : str, default: "gfm"

The markdown variant to use for conversion. Options:

- "gfm": GitHub Flavored Markdown (default, best for LLMs)
    * LLMs are extensively trained on GitHub content, making GFM the most familiar and well-understood markdown variant for AI models
    * Supports: tables, task lists, strikethrough, fenced code blocks
- "markdown": Pandoc’s extended markdown (most features)
    * Supports: tables, footnotes, math, definition lists, metadata blocks
- "commonmark": CommonMark specification (strict standard)
    * Basic markdown with strict spec compliance
- "markdown_strict": Original markdown.pl implementation
    * Only original markdown features (very limited)
- "markdown_phpextra": PHP Markdown Extra
    * Tables, footnotes, definition lists, fenced code blocks
- "markdown_mmd": MultiMarkdown
    * Citations, cross-references, metadata, math support

extract_images : bool, default: True

Whether to extract images from the document. If True, images are saved to a directory and referenced with relative paths in the markdown. Recommended for multi-modal models as it provides both text and separate image files that can be processed independently.

images_dir : str | Path | None, default: None

Directory for extracted images. If None, uses ``"{stem}_images"`` where ``stem`` is the input filename without suffix. See ``images_dir_relative_to_input`` for whether that folder is under the input’s parent or under the process working directory.

images_dir_relative_to_input : bool, default: True

When True, a missing or relative ``images_dir`` is rooted under the input file’s parent directory. When False, it is rooted under the current working directory (including the default ``{stem}_images`` name).
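The directory resolution described for ``images_dir`` and ``images_dir_relative_to_input`` can be sketched with ``pathlib``. ``resolve_images_dir`` and its ``cwd`` argument are hypothetical names used for illustration; the module's internal logic may differ in detail.

```python
from pathlib import Path


def resolve_images_dir(input_file, images_dir=None, relative_to_input=True, cwd=None):
    """Return the directory that extracted images would be written to."""
    input_path = Path(input_file)
    # Default folder name: "<stem>_images", e.g. "report_images" for report.docx.
    target = Path(images_dir) if images_dir is not None else Path(f"{input_path.stem}_images")
    if target.is_absolute():
        return target  # absolute paths are used as given
    # Relative or missing: root under the input's parent or the working directory.
    root = input_path.parent if relative_to_input else Path(cwd or Path.cwd())
    return root / target
```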

relative_image_link : bool, default: False

When ``images_dir`` is an absolute path and this is True, image URLs in the markdown are rewritten to be relative to ``Path(images_dir).parent`` (save the ``.md`` beside that ``images`` folder). Ignored for non-absolute ``images_dir`` (those URLs are always rewritten for portability).

embed_images : bool, default: False

If True, embeds images as base64-encoded data URIs directly in the markdown content, creating a self-contained file. This overrides extract_images and images_dir settings. Note: For multi-modal LLMs, extract_images=True is generally preferred as it allows the model to process images and text separately.

wrap_text : bool, default: False

Whether to wrap text at a certain column width. The default, False, disables wrapping (``--wrap=none`` is passed to pandoc).

**Recommended: False for LLM processing**, to preserve the original text flow and avoid artificial line breaks that could confuse language models. Set to True if you need wrapped text for human readability.

extra_args : list[str] | None, default: None

Additional command-line arguments for advanced customization.
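As a rough illustration of how these parameters might translate into a pandoc command line, the sketch below builds an argument list. ``build_pandoc_args`` is hypothetical. Only ``--wrap=none`` is confirmed by this reference; the use of ``--to``, ``--extract-media`` and ``--wrap=auto`` is an assumption based on pandoc's standard options, not on the module's source.

```python
def build_pandoc_args(input_file, markdown_format="gfm", wrap_text=False,
                      images_dir=None, extra_args=None):
    """Assemble a pandoc argument list from DocxToMarkdown-style options."""
    args = [
        "pandoc",
        str(input_file),
        f"--to={markdown_format}",
        # Assumption: wrapping enabled maps to pandoc's "auto" mode.
        "--wrap=auto" if wrap_text else "--wrap=none",
    ]
    if images_dir is not None:
        # Assumption: images are extracted with pandoc's --extract-media option.
        args.append(f"--extract-media={images_dir}")
    args.extend(extra_args or [])  # user-supplied arguments go last
    return args
```

The resulting list could be passed to ``subprocess.run`` if pandoc is installed; that step is left out here so the sketch stays free of side effects.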

Examples

Basic usage with default settings (best for LLMs):

>>> module = DocxToMarkdown(input_file="report.docx", result_key_name="markdown_content")
>>> print(module.OutputMarkdown.data)
>>> result = module.OutputResultModel.data
>>> print(result.markdown_content)

Self-contained document with embedded images:

>>> module = DocxToMarkdown(
...     input_file="report.docx",
...     embed_images=True
... )

For academic writing with MultiMarkdown format:

>>> module = DocxToMarkdown(
...     input_file="paper.docx",
...     markdown_format="markdown_mmd"
... )

Enable text wrapping for human readability:

>>> module = DocxToMarkdown(
...     input_file="document.docx",
...     wrap_text=True
... )

Notes

**For Multi-Modal LLM Processing:**

The default settings (extract_images=True, markdown_format="gfm", wrap_text=False)
are optimized for multi-modal LLMs. This approach:

1. Extracts images to a separate directory with relative paths in markdown
2. Allows the LLM to process both text and images independently
3. Uses GFM format which LLMs understand best
4. Disables text wrapping to preserve original text flow

**Image Handling Strategies:**

- **extract_images=True** (default): Best for multi-modal processing
    * Images saved as separate files
    * Markdown contains relative paths: ``![alt](images/image1.png)``
    * Multi-modal models can process each image independently

- **embed_images=True**: Creates self-contained document
    * Images embedded as base64 data URIs in markdown
    * Single file, but larger size
    * Less ideal for multi-modal processing

- **extract_images=False**: Images are skipped
    * Only text content is converted
    * Use when images are not needed
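To show what the ``embed_images=True`` strategy amounts to, here is a minimal sketch that turns image bytes into a base64 data URI and substitutes it for a relative link. Both helpers are hypothetical and only illustrate the general technique; the module may produce its embedded output differently.

```python
import base64
import mimetypes


def to_data_uri(image_bytes, filename):
    """Encode raw image bytes as a base64 data URI."""
    mime, _ = mimetypes.guess_type(filename)
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime or 'application/octet-stream'};base64,{encoded}"


def embed_image(markdown, link, image_bytes):
    """Replace a relative image link in Markdown text with a data URI."""
    return markdown.replace(f"]({link})", f"]({to_data_uri(image_bytes, link)})")
```

Base64 encoding grows the payload by roughly a third, which is one reason the embedded variant yields a larger single file.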

update_ui_schema(reset: bool = False) → dict[str, UIAttributeSchema]

execute() → PortTypeHint.Text | None: Execute the document to markdown conversion.

Attributes:

InputDocxFile: PortReference[PortTypeHint.FilePath]

OutputMarkdown: PortReference[PortTypeHint.Text]

OutputResultModel: PortReference[PortTypeHint.ResultModel]

class modules.converters.ConvertToBoreForCadDraw

Convert the input table collection to a BoreForCadDraw object and save it to a .gsc file, which can be used for CAD drawing of borehole logs (钻孔柱状图).

Inherits from:: PipeModule

Methods:

Initialize a ConvertToBoreForCadDraw object.

Parameters

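tables : PortTypeHint.TableCollection | None, default: None

The input table collection containing bore, layer, test data, etc. The expected table names are:

- bore_table : 钻孔一览表 (borehole summary table)
- layer_table : 地层表 (stratum table)
- materials_table : 标准地层表 (standard stratum table)
- spt_table : 标贯表 (standard penetration test table)
- cpt_table : 双桥静探表 (double-bridge cone penetration test table)
- dpt_table : 动探表 (dynamic penetration test table)
- wave_table : 波速表 (wave velocity table)
- samples_table : 取样表 (sampling table)
- soils_test_table : 常规试验表 (routine soil test table)

geo_params_table : PortTypeHint.TableData | None, default: None

The input table data containing geotechnical parameters from the Gdim App table ‘岩土参数建议值表’ (recommended geotechnical parameters table).

- geo_parameters_table : 岩土参数建议值表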

proj_info : PortTypeHint.ResultModel | None, default: None

The input ResultModel containing project info data.

name_maps : dict[str, dict[str, str]] | None, default: None

Mapping of table names and field names. Structure: {"table_names": {"bore_table": "actual_bore_table_name", …}, "field_names": {"bore_table": {"bore_num": "actual_bore_num_field", …}, …}}

selected_bores : list[str] | None, default: None

List of bores (input bore numbers) to convert. If None, all bores will be converted.

drawing_scales : dict[str, int] | int | None, default: None

Drawing scales for bores. If int, it will be used for all bores. If dict, the key is bore number, the value is drawing scale. If list, use the following format: [{"bore_num": "num_1", "drawing_scales": 200}, {"bore_num": "num_2", "drawing_scales": 300}]

proj_info_name_map : dict[str, str] | None, default: None

Mapping for project info fields from proj_info to ProjectInfo.

sample_types_map : dict[str, int] | None, default: None

Mapping from string sample type names to integer codes. For example: {"厚壁原状": 0, "薄壁原状": 0, "扰动样": 1, "岩石样": 2, "水样": 3}
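The three accepted shapes of ``drawing_scales`` (int, dict, list of records) can be normalized into one per-bore mapping, as in the sketch below. ``normalize_drawing_scales`` and its ``default`` argument are hypothetical; what the module does for bores without an explicit scale is not documented here, so the sketch simply returns ``default`` for them.

```python
def normalize_drawing_scales(drawing_scales, bore_nums, default=None):
    """Return a {bore number: drawing scale} dict for the given bores."""
    if drawing_scales is None:
        return {num: default for num in bore_nums}
    if isinstance(drawing_scales, int):
        # A single int applies to every bore.
        return {num: drawing_scales for num in bore_nums}
    if isinstance(drawing_scales, list):
        # List of records: [{"bore_num": ..., "drawing_scales": ...}, ...]
        drawing_scales = {
            item["bore_num"]: item["drawing_scales"] for item in drawing_scales
        }
    return {num: drawing_scales.get(num, default) for num in bore_nums}
```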

output_dir : str | Path | None, default: None

The directory to save the output gsc file.

‘workspace’ of pipeline has priority over the ‘output_dir’.
If both ‘output_dir’ and ‘workspace’ are None, the current working directory will be used.
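The priority between the pipeline's workspace, ``output_dir`` and the current working directory can be summarized in a few lines. ``resolve_output_dir`` is a hypothetical helper that restates the documented order, not the module's actual code.

```python
from pathlib import Path


def resolve_output_dir(workspace=None, output_dir=None):
    """Pick the directory the .gsc file is saved to."""
    if workspace is not None:
        return Path(workspace)  # the pipeline workspace takes priority
    if output_dir is not None:
        return Path(output_dir)
    return Path.cwd()  # neither is set: use the current working directory
```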

gsc_file_name : str, default: "bore_for_cad_draw.gsc"

The name of the gsc file.

save_to_gdim : bool, default: False

If True, the generated .gsc file will also be saved to the gdim file server.

token : str | None

The token of the user. Must be provided when save_to_gdim is True.

proj_id : int | str | None

The id of the gdim project. Must be provided when save_to_gdim is True.

host : str | None

The host of the gdim platform.

update_ui_schema(reset: bool = False) → dict[str, UIAttributeSchema]

set_cal_params(reset: bool = True) → dict[str, RangeModel]

execute() → BoreForCadDraw | None

Attributes:

InputTables: PortReference[PortTypeHint.TableCollection]

InputGeoParamsTable: PortReference[PortTypeHint.TableData]

InputProjectInfo: PortReference[PortTypeHint.ResultModel]

InputToken: PortReference[PortTypeHint.Token]

OutputBoreForCadDraw: PortReference[PortTypeHint.BoreForCadDraw]

OutputFile: PortReference[PortTypeHint.FilePath | PortTypeHint.GdimFile]

class modules.converters.ConvertToBoreForPlanDraw

Convert the input table collection to a BoreForPlanDraw object and save it to a .gsc file, which can be used for drawing the borehole layout plan (钻孔平面布置图).

Inherits from:: PipeModule

Methods:

__init__(mname: str | None = 'ConvertToBorePlanForCad', auto_run: bool = True, tables: PortTypeHint.TableCollection | None = None, proj_info: PortTypeHint.ResultModel | None = None, coordinate_system: PortTypeHint.SingleResult | None = None, name_maps: dict[str, dict[str, str]] | None = None, bore_types_map: dict[str, BoreTypes] | None = None, proj_info_name_map: dict[str, str] | None = None, output_dir: str | Path | None = None, gsc_file_name: str = 'bore_plan_for_cad_draw.gsc', save_to_gdim: bool = False, token: str | None = None, proj_id: int | str | None = None, host: str | None = None) → None

Initialize a ConvertToBoreForPlanDraw object.

Parameters

tables : PortTypeHint.TableCollection | None, default: None

The input table collection containing bore and section line data. The expected table names are:

- bore_table : 钻孔一览表 (borehole summary table)
- section_line_table : 剖面线表 (section line table)

proj_info : PortTypeHint.ResultModel | None, default: None

The input ResultModel containing project info data.

name_maps : dict[str, dict[str, str]] | None, default: None

Mapping of table names and field names. Structure: {"table_names": {"bore_table": "actual_bore_table_name", "section_line_table": "actual_section_line_table_name"}, "field_names": {"bore_table": {"bore_num": "actual_bore_num_field", "x": "actual_x_field", "y": "actual_y_field", "bore_type": "actual_bore_type_field", "top": "actual_top_field"}, "section_line_table": {"name": "actual_name_field", "bores": "actual_bores_field"}}}

bore_types_map : dict[str, BoreTypes] | None, default: None

Mapping from string bore type names to the BoreTypes enum. For example: {"鉴别孔": BoreTypes.IdentificationBore, "取土试样钻孔": BoreTypes.SoilSamplingBore}. If the map is not specified, the module tries to match the values against the names and titles of BoreTypes automatically.

proj_info_name_map : dict[str, str] | None, default: None

Mapping for project info fields from proj_info to ProjectInfo.
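The ``name_maps`` structure described above can be written out as a Python dict together with two lookup helpers. The placeholder values come from the structure shown in this reference; the helpers ``resolve_table_name`` and ``resolve_field_name``, and their fallback to the standard key when no mapping is given, are assumptions for illustration only.

```python
name_maps = {
    "table_names": {
        "bore_table": "actual_bore_table_name",
        "section_line_table": "actual_section_line_table_name",
    },
    "field_names": {
        "bore_table": {"bore_num": "actual_bore_num_field", "x": "actual_x_field"},
        "section_line_table": {"name": "actual_name_field"},
    },
}


def resolve_table_name(name_maps, table_key):
    """Map a standard table key to the actual table name (assumed fallback: the key)."""
    return (name_maps or {}).get("table_names", {}).get(table_key, table_key)


def resolve_field_name(name_maps, table_key, field_key):
    """Map a standard field key to the actual field name (assumed fallback: the key)."""
    fields = (name_maps or {}).get("field_names", {}).get(table_key, {})
    return fields.get(field_key, field_key)
```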

output_dir : str | Path | None, default: None

The directory to save the output gsc file.

‘workspace’ of pipeline has priority over the ‘output_dir’.
If both ‘output_dir’ and ‘workspace’ are None, the current working directory will be used.

gsc_file_name : str, default: "bore_plan_for_cad_draw.gsc"

The name of the gsc file. If None, the gsc file will not be saved.

save_to_gdim : bool, default: False

If True, the generated .gsc file will also be saved to the gdim file server.

token : str | None

The token of the user. Must be provided when save_to_gdim is True.

proj_id : int | str | None

The id of the gdim project. Must be provided when save_to_gdim is True.

host : str | None

The host of the gdim platform.

update_ui_schema(reset: bool = False) → dict[str, UIAttributeSchema]

execute() → BoreForPlanDraw | None

Attributes:

InputTables: PortReference[PortTypeHint.TableCollection]

InputProjectInfo: PortReference[PortTypeHint.ResultModel]

InputCoordinateSystem: PortReference[PortTypeHint.CoordinateSystem]

InputToken: PortReference[PortTypeHint.Token]

OutputBoreForPlanDraw: PortReference[PortTypeHint.BoreForPlanDraw]

OutputFile: PortReference[PortTypeHint.FilePath | PortTypeHint.GdimFile]