modules.readers
Classes
- class modules.readers.GetGdimToken
Get user token and project id from Gdim.
- Inherits from:
PipeModule
Methods:
- __init__(mname: str = 'GetGdimToken', auto_run: bool = True, user_name: str | None = None, password: str | None = None, login_by_token: bool = False, token: str | None = None, proj_id: str | None = None, host: str | None = None, gdim: bool = True, platform: str | None = None) None
Initialize GetGdimToken object.
Parameters
- user_name: str | None, default: None
Username for login. If None, the module tries to read it from the .env file (GDIM_USERNAME).
- password: str | None, default: None
Password for login. If None, the module tries to read it from the .env file (GDIM_PASSWORD).
- login_by_token: bool, default: False
Whether to log in by token. If True, user_name and password are ignored.
- token: str | None, default: None
The token of the user. If not None, user_name and password are ignored regardless of the value of login_by_token.
- proj_id: str | None, default: None
The project id.
- gdim: bool, default: True
Whether the project is a GDIM project.
- host: str | None, default: None
The host of the platform.
- platform: str | None, default: None
The platform name.
Attributes:
- OutputToken: PortReference[PortTypeHint.Token]
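The credential precedence described for GetGdimToken can be sketched in plain Python. This is only an illustration, not the library's implementation: `resolve_credentials` is a hypothetical helper, the `env` mapping stands in for the values loaded from the .env file, and raising an error when `login_by_token` is set without a token is an assumption.

```python
import os


def resolve_credentials(user_name=None, password=None, login_by_token=False,
                        token=None, env=None):
    """Pick the login method following the documented precedence.

    Hypothetical helper: `env` stands in for the values read from the
    .env file and defaults to the process environment.
    """
    env = os.environ if env is None else env
    # A supplied token wins regardless of login_by_token.
    if token is not None:
        return ("token", token)
    if login_by_token:
        # Assumption: without a token there is nothing to log in with.
        raise ValueError("login_by_token=True requires a token")
    # Fall back to GDIM_USERNAME / GDIM_PASSWORD when the arguments are None.
    user = user_name if user_name is not None else env.get("GDIM_USERNAME")
    pwd = password if password is not None else env.get("GDIM_PASSWORD")
    return ("password", (user, pwd))
```

Passing the environment as a mapping keeps the sketch easy to test without touching real credentials.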
- class modules.readers.GetGdimFile
Get a file object from Gdim.
- Inherits from:
PipeModule
Methods:
- __init__(mname: str | None = None, auto_run: bool = True, gdim_file: dict | None = None, host: str | None = None) None
Initialize GetGdimFile object.
Parameters
- gdim_file: dict | None, default: None
The dict data for the Gdim file object. See GdimMinIOFile for the details of the data structure.
- host: str | None, default: None
The host of the platform. If None, the host in the config file will be used.
Attributes:
- OutputGdimFile: PortReference[PortTypeHint.GdimFile]
- class modules.readers.GdimTemplateReader
Read the structure of a Gdim template by template id or project id.
- Inherits from:
PipeModule
Methods:
- __init__(mname: str = 'GdimTemplateReader', auto_run: bool = True, tpl_id: str | None = None, get_app_info: bool = False, template_tree: bool = True, token: str | None = None, proj_id: str | None = None, host: str | None = None) None
Initialize GdimTemplateReader object.
Parameters
- tpl_id: str | None, default: None
The id of the template. If not None, proj_id will be ignored.
- get_app_info: bool, default: False
Whether to get the application information.
- template_tree: bool, default: True
Whether to get the hierarchical table structure (parent-child relationships via sub_tables). If False, only flat table metadata is fetched.
- token: str | None, default: None
The token of the user.
- proj_id: str | None, default: None
The id of the project.
- host: str | None, default: None
The host of the platform. If None, the host in the config file will be used.
Ports
- InputToken: PortReference[PortTypeHint.Token]
The token of the user. If None, the token is taken from the pipeline.
- OutputTemplate: PortReference[PortTypeHint.GdimTemplate]
The structure of the template.
Notes
- The module first tries to get the gdim template from the pipeline.
Attributes:
- InputToken: PortReference[PortTypeHint.Token]
- OutputTemplate: PortReference[PortTypeHint.GdimTemplate]
- class modules.readers.GdimTableReader
Read the data of several tables in a Gdim project.
- Inherits from:
PipeModule
Methods:
- __init__(mname: str = 'GdimTableReader', auto_run: bool = True, table_fields: dict[str, list[str]] | list[str] | str | None = None, format_dict: dict[str, dict[str, str]] | None = None, auto_detect_related_tables: bool = True, main_table_name: str | None = None, sub_table_names: list[str] | None = None, primary_key: str | None = None, primary_key_values: dict[str, int | float | str] | None = None, keep_gdim_id: bool = False, table_collection_name: str | None = None, table_collection_title: str | None = None, table_collection_description: str | None = None, missing_error_type: Literal[error, warning, gdi_warning] = 'error', empty_error_type: Literal[warning, gdi_warning, create_empty_table] = 'warning', all_empty_output: Literal[empty_collection, none] = 'empty_collection', output_table_name: str | None = None, token: str | None = None, proj_id: str | None = None, host: str | None = None, gdim: bool = True) None
Initialize GdimTableReader object.
Examples
This means the data of the sub-table 'layer_table' will be filtered by the primary_key_value 'zk1'. Only valid when gdim is True.
- keep_gdim_id: bool, default: False
Whether to keep the gdim_id column in the output table. Only valid when gdim is True.
- table_collection_name: str | None, default: None
The name of the table collection.
- table_collection_title: str | None, default: None
The title of the table collection.
- table_collection_description: str | None, default: None
The description of the table collection.
- missing_error_type: Literal["error", "warning", "gdi_warning"], default: "error"
The type of the error when the table or field is not found. If ``error``, a ``KeyError`` will be raised. If ``warning``, a ``UserWarning`` will be printed in the console. If ``gdi_warning``, a ``GDIWarning`` will be printed in the console and a warning will be shown in GDIM.
- empty_error_type: Literal["warning", "gdi_warning", "create_empty_table"], default: "warning"
The type of the error when the table data is empty. If ``warning``, a ``UserWarning`` will be printed in the console. If ``gdi_warning``, a ``GDIWarning`` will be printed in the console and a warning will be shown in GDIM. If ``create_empty_table``, an empty TableData will be created according to the template structure.
- all_empty_output: Literal["empty_collection", "none"], default: "empty_collection"
The output when no tables end up in the collection (e.g. every read table was empty and skipped by ``empty_error_type``). If ``empty_collection``, an empty ``TableCollection`` is returned (backward-compatible default). If ``none``, ``OutputTables``, ``OutputTable``, and the ``execute`` return value are all ``None``.
- output_table_name: str | None, default: None
The name or title of the table whose data is sent to the OutputTable port. If None, the first table is used. If the table is not found, 'None' will be used.
- token: str | None, default: None
The token of the user.
- proj_id: str | None, default: None
The id of the project. If ``self.proj_id`` is not None, the ``proj_id`` in InputToken or the pipeline's ``gdim_proj_id`` will be ignored.
- host: str | None, default: None
If None, the default value will be used, for example: "https://gdim.kulunsoft.com"
- gdim: bool, default: True
Whether to get the data from GDIM. If False, the data is from GBIM.
Ports
- InputToken: PortReference[PortTypeHint.Token]
The token of the user. If None, the token is taken from the pipeline.
- OutputTables: PortReference[PortTypeHint.TableCollection]
The data of the tables.
- OutputTable: PortReference[PortTypeHint.TableData]
The data of the table specified by the 'output_table_name' parameter. If 'output_table_name' is None, the first table is used.
Notes
- The module first tries to get the gdim template from the pipeline.
- If you want to change the ``proj_id`` through InputToken, DO NOT set ``proj_id`` in the constructor or assign a value to ``proj_id``, because ``self.proj_id`` has the highest priority.
Attributes:
- InputToken: PortReference[PortTypeHint.Token]
- OutputTables: PortReference[PortTypeHint.TableCollection]
- OutputTable: PortReference[PortTypeHint.TableData]
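The note on ``proj_id`` priority for GdimTableReader can be illustrated with a small stand-alone function. It is a sketch under assumptions: `resolve_proj_id` is a hypothetical name, and the order between the InputToken value and the pipeline value is assumed, since the text only states that the module's own ``proj_id`` comes first.

```python
def resolve_proj_id(own_proj_id=None, token_proj_id=None, pipeline_proj_id=None):
    """Return the project id that would be used, given three possible sources."""
    # Documented: the module's own proj_id has the highest priority.
    if own_proj_id is not None:
        return own_proj_id
    # Assumption: the value carried by InputToken is tried before the pipeline's.
    if token_proj_id is not None:
        return token_proj_id
    return pipeline_proj_id
```

This is why setting ``proj_id`` in the constructor prevents a later change through InputToken: the first branch always returns before the token is consulted.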
- class modules.readers.GdimProjectTablesReader
Read the data of tables from multiple projects with the same template and in the same space.
- Inherits from:
PipeModule
Methods:
- __init__(mname: str = 'GdimProjectTablesReader', auto_run: bool = True, tpl_id: str | None = None, proj_ids: list[str] | None = None, table_fields: dict[str, list[str]] | list[str] | str | None = None, format_dict: dict[str, dict[str, str]] | None = None, auto_detect_related_tables: bool = True, main_table_name: str | None = None, sub_table_names: list[str] | None = None, primary_key: str | None = None, primary_key_values: dict[str, int | float | str] | None = None, keep_gdim_id: bool = False, table_collection_name: str | None = None, table_collection_title: str | None = None, table_collection_description: str | None = None, missing_error_type: Literal[error, warning, gdi_warning] = 'error', empty_error_type: Literal[warning, gdi_warning, create_empty_table] = 'warning', project_select_fields: str | list[str] = 'projectName', token: str | None = None, host: str | None = None) None
Initialize GdimProjectTablesReader object.
Parameters
- tpl_id: str | None, default: None
The id of the project template. The module first tries to get it from the pipeline. If None, no projects will be read.
- proj_ids: list[str] | None, default: None
The ids of the projects. If None, all the projects will be read.
- table_fields: dict[str, list[str]] | list[str] | str | None, default: None
If a dict, each key can be either a table name or title, and each value can be either field names or titles. The system automatically detects whether the provided keys are names or titles.
Examples
This means the data of the sub-table 'layer_table' will be filtered by the primary_key_value 'zk1'. Only valid when gdim is True.
- keep_gdim_id: bool, default: False
Whether to keep the gdim_id column in the output table. Only valid when gdim is True.
- table_collection_name: str | None, default: None
The name of the table collection.
- table_collection_title: str | None, default: None
The title of the table collection.
- table_collection_description: str | None, default: None
The description of the table collection.
- missing_error_type: Literal["error", "warning", "gdi_warning"], default: "error"
The type of the error when the table or field is not found. If ``error``, a ``KeyError`` will be raised. If ``warning``, a ``UserWarning`` will be printed in the console. If ``gdi_warning``, a ``GDIWarning`` will be printed in the console and a warning will be shown in GDIM.
- empty_error_type: Literal["warning", "gdi_warning", "create_empty_table"], default: "warning"
The type of the error when the table data is empty. If ``warning``, a ``UserWarning`` will be printed in the console. If ``gdi_warning``, a ``GDIWarning`` will be printed in the console and a warning will be shown in GDIM. If ``create_empty_table``, an empty TableData will be created according to the template structure.
- project_select_fields: str | list[str], default: "projectName"
Only used for UI design. This parameter decides the 'selections_name' shown to users in the UI. The value can be a field name or title of the project info. For example:
``project_select_fields = "projectName"``: the 'selections_name' will be the 'projectName'.
``project_select_fields = ["projectName", "projectManager"]``: the 'selections_name' will be 'projectName | projectManager'.
- token: str | None, default: None
The token of the user.
- host: str | None, default: None
If None, the default value will be used, for example: "https://gdim.kulunsoft.com"
Attributes:
- InputToken: PortReference[PortTypeHint.Token]
- OutputProjectTables: PortReference[PortTypeHint.TableCollectionDict]
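The behaviour described for ``missing_error_type`` maps onto Python's standard exception and warning mechanisms. The sketch below is illustrative only: `report_missing` is a hypothetical helper, and `GDIWarningStandIn` is a local stand-in for the library's ``GDIWarning``, whose real definition and GDIM-side reporting are not shown in this reference.

```python
import warnings


class GDIWarningStandIn(UserWarning):
    """Local stand-in for the library's GDIWarning (assumption)."""


def report_missing(name, missing_error_type="error"):
    """React to a missing table or field according to missing_error_type."""
    message = f"Table or field not found: {name!r}"
    if missing_error_type == "error":
        raise KeyError(message)
    if missing_error_type == "warning":
        warnings.warn(message, UserWarning)
        return
    if missing_error_type == "gdi_warning":
        # The real GDIWarning is also shown in GDIM; that part is omitted here.
        warnings.warn(message, GDIWarningStandIn)
        return
    raise ValueError(f"Unknown missing_error_type: {missing_error_type!r}")
```

Choosing ``"warning"`` or ``"gdi_warning"`` lets a pipeline continue past a missing table, while the default ``"error"`` stops it immediately.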
- class modules.readers.GdimBoresCoordinateReader
Read the coordinate information of bores from Gdim with ‘bore_table (钻孔一览表)’.
The output table includes the columns ProfileNumber, XCoordinate, YCoordinate, Longitude, and Latitude.
- Inherits from:
PipeModule
Methods:
- __init__(mname: str | None = None, auto_run: bool = True, host: str | None = None, proj_id: str | None = None, token: str | None = None, gdim: bool = True) None
Initialize GdimBoresCoordinateReader object.
Parameters
- host: str | None, default: None
If None, the default value will be used, for example: "https://gdim.kulunsoft.com"
- proj_id: str | None, default: None
The id of the project.
- token: str | None, default: None
The token of the user.
- gdim: bool, default: True
Whether to get the data from GDIM. If False, the data is from GBIM.
Properties:
- OutputTable
- class modules.readers.GdimAppDataReader
Read the data saved by a pipeline app from the GDIM platform database.
This module retrieves data that was previously saved using gdi.pipeline.pipeline.PipeLine.save_data_to_db, which can include:
- Output port data from modules (e.g. ``TableData``, ``TableCollection``, ``ResultModel``)
- Module attributes / parameters (``str``, ``int``, ``float``, ``bool``, …)
- Pipeline attributes (``workspace``, ``app_name``, …)
The data is deserialized and packed into a gdi.dataclass.results.ResultModel whose field names follow the key naming strategy:
- ``"module_name@port_name"``: output port data
- ``"pipeline@attr_name"``: pipeline attributes
- ``"module_name#attr_name"``: module attributes
- Inherits from:
PipeModule
Methods:
- __init__(mname: str = 'GdimAppDataReader', auto_run: bool = True, app_title: str | None = None, token: str | None = None, proj_id: str | None = None, host: str | None = None) None
Initialize GdimAppDataReader.
Parameters
- app_title: str | None, default: None
The title of the app in the GDIM template.
- token: str | None, default: None
The user authentication token.
- proj_id: str | None, default: None
The project id. When set, overrides the ``proj_id`` carried by ``InputToken`` or the pipeline’s ``gdim_state``.
- host: str | None, default: None
GDIM host URL. Defaults to ``"https://gdim.kulunsoft.com"`` when ``None``.
Ports
- InputToken: PortReference[PortTypeHint.Token]
The token of the user.
- OutputResultModel: PortReference[PortTypeHint.ResultModel]
The deserialized data of the pipeline app.
- execute() PortTypeHint.ResultModel | None
Fetch the app’s saved data from GDIM and return it as a ``ResultModel``.
Attributes:
- InputToken: PortReference[PortTypeHint.Token]
- OutputResultModel: PortReference[PortTypeHint.ResultModel]
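The key naming strategy used by GdimAppDataReader for the fields of the returned ``ResultModel`` can be decoded with a few string operations. The function below is a hypothetical helper written for illustration, not part of the library; it assumes that no module is itself named ``pipeline`` and that the separators ``@`` and ``#`` do not appear inside module names.

```python
def parse_result_key(key):
    """Split a saved-data key into (kind, owner, name)."""
    # "pipeline@attr_name" marks a pipeline attribute.
    if key.startswith("pipeline@"):
        return ("pipeline_attr", "pipeline", key[len("pipeline@"):])
    # "module_name@port_name" marks output port data.
    if "@" in key:
        module, port = key.split("@", 1)
        return ("output_port", module, port)
    # "module_name#attr_name" marks a module attribute.
    if "#" in key:
        module, attr = key.split("#", 1)
        return ("module_attr", module, attr)
    raise ValueError(f"Key does not follow the naming strategy: {key!r}")
```

For instance, a key such as ``"GdimTableReader@OutputTables"`` would be read as output port data of the module ``GdimTableReader``; the concrete keys present depend on what the app saved.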
- class modules.readers.FileDownloader
Download the file from the GDIM platform or a url.
- Inherits from:
PipeModule
Methods:
- __init__(mname: str | None = 'FileDownloader', auto_run: bool = True, file_url: PortTypeHint.HttpUrl | PortTypeHint.GdimFile | dict | None = None, output_dir: str | Path | None = None, output_name: str | None = None) None
Initialize FileDownloader object.
Parameters
- file_url: PortTypeHint.HttpUrl | PortTypeHint.GdimFile | dict | None, default: None
If it is a GdimFile object or a dict, the file is downloaded from the GDIM platform. If it is a URL, the file is downloaded from that URL.
- output_dir: str | Path | None, default: None
The directory to save the file.
The pipeline’s ‘workspace’ has priority over ‘output_dir’.
If both ‘output_dir’ and ‘workspace’ are None, the current working directory will be used.
- output_name: str | None, default: None
The name for the downloaded file. If None, the file name is extracted from the file URL.
Notes
If the data in InputFileUrl is not None, self.file_url will be overwritten by the data of the input port.
Attributes:
- InputFileUrl: PortReference[PortTypeHint.HttpUrl | PortTypeHint.GdimFile]
- OutputFile: PortReference[PortTypeHint.FilePath]
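The rules FileDownloader states for choosing the target directory and file name can be sketched with the standard library. Both function names are hypothetical, and taking the last path segment of the URL as the file name is an assumption about how the name is "extracted from the file url".

```python
from pathlib import Path, PurePosixPath
from urllib.parse import urlsplit


def resolve_output_dir(workspace=None, output_dir=None):
    """Return the directory a download would be saved to."""
    # Documented: the pipeline's workspace has priority over output_dir.
    if workspace is not None:
        return Path(workspace)
    if output_dir is not None:
        return Path(output_dir)
    # Documented: fall back to the current working directory.
    return Path.cwd()


def name_from_url(url):
    """Assumed behaviour: use the last path segment, ignoring the query string."""
    return PurePosixPath(urlsplit(url).path).name
```

A call such as `resolve_output_dir(None, "downloads") / name_from_url(url)` then gives the full target path.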
- class modules.readers.GdimAppProjectInfoReader
Read the data from Project Information APP.
- Inherits from:
PipeModule
Methods:
- __init__(mname: str = 'GdimAppProjectInfoReader', auto_run: bool = True, token: str | None = None, proj_id: str | None = None, host: str | None = None, gdim: bool = True) None
Initialize GdimAppProjectInfoReader object.
Parameters
- token: str | None, default: None
The token of the user.
- proj_id: str | None, default: None
The id of the project.
- host: str | None, default: None
If None, the default value will be used, for example: "https://gdim.kulunsoft.com"
- gdim: bool, default: True
Whether to get the data from GDIM. If False, the data is from GBIM.
Ports
- InputToken: PortTypeHint.Token
The token of the user.
- OutputProjectInfo: PortTypeHint.ResultModel
The data of the project information.
- OutputCoordinateSystem: PortTypeHint.CoordinateSystem
The coordinate system of the project.
Attributes:
- InputToken: PortReference[PortTypeHint.Token]
- OutputProjectInfo: PortReference[PortTypeHint.ResultModel]
- OutputCoordinateSystem: PortReference[PortTypeHint.CoordinateSystem]
- class modules.readers.GdimProjectListReader
Read the project information from multiple projects with the same template and in the same space.
- Inherits from:
PipeModule
Methods:
- __init__(mname: str = 'GdimProjectListReader', auto_run: bool = True, tpl_id: str | None = None, proj_ids: list[str] | None = None, project_select_fields: str | list[str] = 'projectName', token: str | None = None, host: str | None = None) None
Initialize GdimProjectListReader object.
Parameters
- tpl_id: str | None, default: None
The id of the template. The module first tries to get it from the pipeline. If None, no projects’ information will be read.
- proj_ids: list[str] | None, default: None
The ids of the projects. If None, all projects’ information will be read.
- project_select_fields: str | list[str], default: "projectName"
Only used for UI design. This parameter decides the ‘selections_name’ shown to users in the UI. The value can be a field name or title of the project info. For example:
project_select_fields = "projectName": the ‘selections_name’ will be the ‘projectName’.
project_select_fields = ["projectName", "projectManager"]: the ‘selections_name’ will be ‘projectName | projectManager’.
- token: str | None, default: None
The token of the user.
- host: str | None, default: None
If None, the default value will be used, for example: "https://gdim.kulunsoft.com"
Attributes:
- InputToken: PortReference[PortTypeHint.Token]
- OutputSingleResultDict: PortReference[PortTypeHint.SingleResultDict]
- OutputTableData: PortReference[PortTypeHint.TableData]
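How ``project_select_fields`` could turn into a 'selections_name' label is sketched below. This is an interpretation rather than the library's code: `build_selections_name` is a hypothetical helper, and it assumes the label is formed by joining the values of the selected fields with ``" | "``, as the 'projectName | projectManager' example suggests.

```python
def build_selections_name(project_info, project_select_fields="projectName"):
    """Build the label shown for one project from the selected fields."""
    # Accept either a single field or a list of fields.
    if isinstance(project_select_fields, str):
        fields = [project_select_fields]
    else:
        fields = list(project_select_fields)
    # Assumption: field values are joined with " | " in the given order.
    return " | ".join(str(project_info[field]) for field in fields)
```

With a single field the label is just that field's value, so the default shows the project name alone.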
- class modules.readers.GdimTerrainDataReader
Read terrain point data from the GDIM application Terrain Data Manager.
- Inherits from:
PipeModule
Methods:
- __init__(mname: str = 'GdimTerrainDataReader', auto_run: bool = True, group_name: str | None = None, query_mode: Literal[all, nearest_point, nearest_line, region] = 'all', ref_point: tuple[float, float, float] | None = None, line_points: list[tuple[float, float, float]] | None = None, line_name: str | int = 0, polygon: list[tuple[float, float, float]] | None = None, limit: int = 10, keep_gdim_id: bool = False, token: str | None = None, proj_id: str | None = None, host: str | None = None) None
Initialize GdimTerrainDataReader.
Parameters
- group_name: str | None, default: None
Terrain data group name shown in Terrain Data Manager. The module resolves the name to a group id automatically.
- query_mode: Literal["all", "nearest_point", "nearest_line", "region"], default: "all"
Query strategy: ``all``, ``nearest_point``, ``nearest_line``, or ``region``.
- ref_point: tuple[float, float, float] | None, default: None
Reference point (x, y, z) for ``nearest_point`` mode.
- line_points: list[tuple[float, float, float]] | None, default: None
Reference line vertices as (x, y, z) tuples for ``nearest_line`` mode. Overwritten when InputPolyLines is connected.
- line_name: str | int, default: 0
Line selector used with InputPolyLines. A string selects the line by name; an integer selects the line by index.
- polygon: list[tuple[float, float, float]] | None, default: None
Closed region vertices as (x, y, z) tuples for ``region`` mode. The API closes the polygon automatically.
- limit: int, default: 10
Maximum number of points returned by nearest-point and nearest-line queries. Ignored for ``all`` and ``region`` modes.
- keep_gdim_id: bool, default: False
Whether to keep the GDIM point id as a ``gdim_id`` column in the output table. If False, the id column is dropped.
- token: str | None, default: None
User authentication token.
- proj_id: str | None, default: None
Project id. If self.proj_id is not None, the proj_id in InputToken or the pipeline’s gdim_proj_id will be ignored.
- host: str | None, default: None
GDIM host URL. Defaults to the configured GDIM domain when None.
Ports
- InputPolyLines: PortTypeHint.TableData | None
Poly line vertices from CreatePolyLines. When connected, the selected line overwrites line_points.
Attributes:
- InputToken: PortReference[PortTypeHint.Token]
- InputPolyLines: PortReference[PortTypeHint.TableData]
- OutputTable: PortReference[PortTypeHint.TableData]
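The ``line_name`` selector of GdimTerrainDataReader accepts either a name or an index. A minimal sketch of that dispatch follows; the layout of the poly line data as a list of ``(name, vertices)`` pairs is an assumption made for the example, since the actual structure produced by CreatePolyLines is not described here, and `select_line` is a hypothetical name.

```python
def select_line(lines, line_name=0):
    """Pick one line's vertices from a list of (name, vertices) pairs."""
    # An integer selects the line by index.
    if isinstance(line_name, int):
        return lines[line_name][1]
    # A string selects the line by name.
    for name, vertices in lines:
        if name == line_name:
            return vertices
    raise KeyError(f"No line named {line_name!r}")
```

The default ``line_name=0`` therefore picks the first line when the input contains several.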
- class modules.readers.GdimAppGeoParamsReader
Read the data from the Gdim APP - Geo Parameters Table (岩土参数建议值表).
- Inherits from:
PipeModule
Methods:
- __init__(mname: str | None = None, auto_run: bool = True, host: str | None = None, proj_id: str | None = None, token: str | None = None, fields: list[str] | None = None, format_dict: dict[str, str] | None = None, names_type: Literal[name, title] = 'title', column_name: Literal[name, title] = 'name', gdim: bool = True) None
Initialize GdimAppGeoParamsReader object.
Parameters
- host: str | None, default: None
If None, the default value will be used, for example: "https://gdim.kulunsoft.com"
- proj_id: str | None, default: None
The id of the project.
- token: str | None, default: None
The token of the user.
- fields: list[str] | None, default: None
The field name list. If ``names_type`` is ``title``, the field titles are used. If ``names_type`` is ``name``, the field names are used.
- names_type: Literal["name", "title"], default: "title"
The type of the field name. If it's ``title``, the title will be used, which is user-friendly. If it's ``name``, the name will be used, which is the one used in the database.
- column_name: Literal["name", "title"], default: "name"
The type of the column name in the output table. If it's ``title``, the title will be used, which is user-friendly. If it's ``name``, the name will be used, which is the one used in the database.
- gdim: bool, default: True
Whether to get the data from GDIM. If False, the data is from GBIM.
- format_dict: dict[str, str] | None, default: None
The format dict. If ``names_type`` is ``title``, the key is the field title. If ``names_type`` is ``name``, the key is the field name.
Properties:
- InputToken
- InputFields
- OutputTable
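The interplay of ``names_type`` (how the requested fields are written) and ``column_name`` (how output columns are labelled) in GdimAppGeoParamsReader can be shown with a plain mapping from database names to titles. Everything in this sketch is hypothetical, including the helper names and the sample field names; it only illustrates the two lookups.

```python
def to_field_names(fields, names_type, name_to_title):
    """Translate requested fields into database field names."""
    if names_type == "name":
        return list(fields)
    # names_type == "title": look each title up in the reversed mapping.
    title_to_name = {title: name for name, title in name_to_title.items()}
    return [title_to_name[field] for field in fields]


def to_output_columns(names, column_name, name_to_title):
    """Label output columns by database name or by user-facing title."""
    if column_name == "name":
        return list(names)
    return [name_to_title[name] for name in names]
```

With the defaults, a caller would request fields by title (``names_type="title"``) and receive columns labelled by database name (``column_name="name"``).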
- class modules.readers.GdimAppSurveyStatReader
Read the data from the Gdim APP - Survey Statistics.
- Inherits from:
PipeModule
Methods:
- __init__(mname: str | None = None, auto_run: bool = True, host: str | None = None, proj_id: str | None = None, token: str | None = None, stat_type: Literal[地层表, 常规试验表, 标贯, 岩石试验, 动探, 双桥静探, 物理力学指标统计表] | None = None, format_dict: dict[str, str] | None = None, column_name: Literal[name, title] = 'name', gdim: bool = True) None
Initialize GdimAppSurveyStatReader object.
Parameters
- host: str | None, default: None
If None, the default value will be used, for example: "https://gdim.kulunsoft.com"
- proj_id: str | None, default: None
The id of the project.
- token: str | None, default: None
The token of the user.
- stat_type: Literal["地层表", "常规试验表", "标贯", "岩石试验", "动探", "双桥静探", "物理力学指标统计表"] | None, default: None
The type of the survey statistics. If it is "物理力学指标统计表" (physical and mechanical index statistics table), the name of each table is the layer number.
- format_dict: dict[str, str] | None, default: None
The format dict for column formatting. Key is the column name, value is the column format.
- column_name: Literal["name", "title"], default: "name"
The type of the column name in the output table. If it's ``title``, the title will be used, which is user-friendly. If it's ``name``, the name will be used, which is the one used in the database.
- gdim: bool, default: True
Whether to get the data from GDIM. If False, the data is from GBIM.
Properties:
- InputToken
- OutputTables
- class modules.readers.GdimAppReportTextReader
Read the data from a Gdim APP - Report Text.
- Inherits from:
PipeModule
Methods:
- __init__(mname: str = 'GdimAppReportTextReader', auto_run: bool = True, host: str | None = None, proj_id: str | None = None, token: str | None = None, read_document: bool = False, gdim: bool = True) None
Initialize GdimAppReportTextReader object.
Parameters
- host: str | None, default: None
If None, the default value will be used, for example: "https://gdim.kulunsoft.com"
- proj_id: str | None, default: None
The id of the project.
- token: str | None, default: None
The token of the user.
- read_document: bool, default: False
If False, only text content will be read. If True, only documents and templates (.docx files) will be read.
- gdim: bool, default: True
Whether to get the data from GDIM. If False, the data is from GBIM.
Properties:
- InputToken
- OutputSingleResult
- class modules.readers.GdimAppCoordinateSystemReader
Read the data from a Gdim APP - Coordinate System.
Only used for GBIM; this module is not available for the GDIM platform.
- Inherits from:
PipeModule
Methods:
- __init__(mname: str | None = None, auto_run: bool = True, host: str | None = None, proj_id: str | None = None, token: str | None = None, gdim: bool = True) None
Initialize GdimAppCoordinateSystemReader object.
Parameters
- host: str | None, default: None
If None, the default value will be used, for example: "https://gdim.kulunsoft.com"
- proj_id: str | None, default: None
The id of the project.
- token: str | None, default: None
The token of the user.
- gdim: bool, default: True
Whether to get the data from GDIM. If False, the data is from GBIM.
Properties:
- OutputCoordinateSystem
- InputToken
- class modules.readers.GdimAppMemberManagerReader
Read the data from the Gdim APP - MemberManager.
- Inherits from:
PipeModule
Methods:
- __init__(mname: str = 'GdimAppMemberManagerReader', auto_run: bool = True, host: str | None = None, proj_id: str | None = None, token: str | None = None, gdim: bool = True) None
Initialize GdimAppMemberManagerReader object.
Parameters
- host: str | None, default: None
If None, the default value will be used, for example: "https://gdim.kulunsoft.com"
- proj_id: str | None, default: None
The id of the project.
- token: str | None, default: None
The token of the user.
- gdim: bool, default: True
Whether to get the data from GDIM. If False, the data is from GBIM.
Properties:
- OutputSingleResult
- class modules.readers.CsvReader
Read a csv file.
- Inherits from:
PipeModule
Methods:
- __init__(mname: str = 'CsvReader', auto_run: bool = True, file: str | Path | dict | None = None, sep: str = ',', header: int | list[int] | str | None = 'infer', index_col: int | str | list[int] | list[str] | None = None, usecols: list[int] | list[str] | None = None, dtype: dict[str, str] | str | None = None, skiprows: int | list[int] | None = None, nrows: int | None = None, na_values: str | list[str] | dict[str, str | list[str]] | None = None, encoding: str | None = 'auto', output_mode: Literal[table, schema, both] = 'table', name_row: int = 0, description_row: int | None = 1, unit_row: int | None = 2, check_units: bool = True, schema_field_name: str = 'fields') None
Initialize CsvReader object.
Parameters
- file: str | Path | dict | None
The path to the .csv file, or GDIM file metadata. If ``str`` or ``Path``, the local file path is used. If ``dict``, it is validated as ``GdimMinIOFile`` and downloaded from the GDIM File server.
- sep: str, default ","
Delimiter to use for separating fields.
- header: int, list of int, str, default "infer"
Row number(s) to use as the column names, and the start of the data.
  - int: Row number to use as column names (0-indexed). Use 0 for the first row.
  - list[int]: Multiple rows to use for multi-level column names, e.g., [0, 1].
  - "infer": Automatically detect whether the first row contains column names.
  - None: No header row; columns will be named 0, 1, 2, etc.
- index_col: int, str, sequence of int/str, None, default None
Column(s) to use as the row labels of the DataFrame.
- usecols: list-like or callable, optional
- dtype: Type name or dict of column -> type, optional
Data type for data or columns. Available types:
  - Basic types: 'str', 'int', 'float', 'bool'
  - Pandas types: 'Int64', 'Float64', 'string', 'boolean'
  - NumPy types: 'int32', 'int64', 'float32', 'float64', 'object'
  - Category: 'category'
  - DateTime: 'datetime64[ns]'
  - Examples:
    - Single type: 'str' (apply to all columns)
    - Dict format: {'col1': 'int', 'col2': 'float', 'col3': 'str'}
- skiprows: list-like, int or callable, optional
Line numbers to skip (0-indexed) or number of lines to skip.
  - int: Number of lines to skip from the beginning of the file, e.g., 3 (skip first 3 lines)
  - list[int]: Specific line numbers to skip (0-indexed), e.g., [0, 2, 5] (skip lines 1, 3, and 6)
  - callable: Function that takes a line number and returns True if the line should be skipped
  - Examples:
    - Skip first 2 lines: 2
    - Skip specific lines: [0, 3, 7] (skip lines 1, 4, and 8)
    - Skip header and footer: lambda x: x in [0, 1] or x > 100
- nrows: int, optional
Number of rows of file to read.
- na_values: scalar, str, list-like, or dict, optional
Additional strings to recognize as NA/NaN (missing values).
  - str: Single value to treat as NaN, e.g., 'NULL', 'N/A', '无数据'
  - list[str]: Multiple values to treat as NaN, e.g., ['NULL', 'N/A', '', '无数据']
  - dict[str, str|list[str]]: Column-specific NA values
  - Examples:
    - Single NA value: 'NULL'
    - Multiple NA values: ['NULL', 'N/A', '', '无数据', '缺失']
    - Column-specific: {'age': ['unknown', '未知'], 'score': ['absent', '缺考']}
- encoding: str, optional, default "auto"
Encoding to use when reading the file.
  - 'auto': Automatically detect encoding (tries utf-8, gbk, gb2312, latin-1)
  - 'utf-8': UTF-8 encoding
  - 'gbk': GBK encoding (common for Chinese files)
  - 'gb2312': GB2312 encoding (simplified Chinese)
  - 'latin-1': Latin-1 encoding
  - Or any other standard encoding name
- output_mode: Literal["table", "schema", "both"], default "table"
Controls which output ports are populated:
  - 'table': populate OutputTable only (default).
  - 'schema': populate OutputSchema only; the CSV data itself is not loaded.
  - 'both': populate both OutputTable and OutputSchema.
- name_row: int, default 0
0-indexed row number in the raw CSV file that contains field names. Defaults to 0 (first row). Used only when OutputSchema is produced.
- description_row: int | None, default 1
0-indexed row number in the raw CSV file that contains field descriptions. Set to ``None`` if the CSV has no description row. Defaults to 1 (second row). Used only when OutputSchema is produced.
- unit_row: int | None, default 2
0-indexed row number in the raw CSV file that contains the physical unit for each column (e.g. ``"m"``, ``"kPa"``). Defaults to ``2`` (third row). Set to ``None`` if the CSV has no unit row. If the row index is beyond the end of the file, the parameter is silently ignored, which is the same behaviour as ``description_row``. Used only when OutputSchema is produced. Each cell is matched against gdi.dataclass.terminologies.Units by value first and then by enum member name. If a cell does not match any known unit, a gdi.dataclass.GDIDataQualityWarning is issued and that column’s unit is omitted from the schema text. Validated units appear in the schema as ``[m]`` annotations next to the field name, e.g. ``- depth: 钻孔深度 [m]``.
- check_units: bool, default True
When ``True`` (default), unit cells are validated against gdi.dataclass.terminologies.Units and a gdi.dataclass.GDIDataQualityWarning is issued for unknown units (those columns get no unit annotation). When ``False``, no validation is performed: non-empty unit cells are copied into the schema text as-is inside brackets, e.g. ``[my unit]``.
- schema_field_name: str, default "fields"
Name of the attribute in the output ResultModel that holds the plain-text schema summary string. Change this when downstream modules expect a different attribute name (e.g. "columns", "schema").
Ports
- OutputTable: PortTypeHint.TableData
The output TableData of the .csv file. Populated when ``output_mode`` is ``"table"`` or ``"both"``.
- OutputSchema: PortTypeHint.ResultModel
A single-field ResultModel (Pydantic BaseModel wrapped as ResultModel) whose attribute name is schema_field_name (default "fields") and whose value is a plain-text column schema derived from the CSV header rows, not from the table rows themselves.
The string has the form:
    Table: <filename>
    Fields:
    - <name>: <description> [<unit>]
    - …
Field names come from name_row; descriptions from description_row (omitted when None or missing); units from unit_row appear in brackets when present (validated against gdi.dataclass.terminologies.Units when check_units is True). Typical use is wiring this port to PromptTemplate.InputValues so a {fields} placeholder receives the schema text. Populated when output_mode is "schema" or "both".
Attributes:
- OutputTable: PortReference[PortTypeHint.TableData]
- OutputSchema: PortReference[PortTypeHint.ResultModel]
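The schema text described for ``OutputSchema`` can be pictured with a small standalone sketch. It uses only the standard library and is not the module's implementation: the function name ``build_schema_text`` is hypothetical, units are copied as-is (the ``check_units=False`` behaviour), and the layout of a field without a description is an assumption.

```python
import csv
import io


def build_schema_text(table_name, csv_text, name_row=0,
                      description_row=1, unit_row=2):
    """Render CSV header rows as 'Table / Fields' text (illustrative only)."""
    rows = list(csv.reader(io.StringIO(csv_text)))

    def pick(index):
        # A None index, or an index past the end of the file, is ignored.
        if index is None or index >= len(rows):
            return []
        return rows[index]

    names = pick(name_row)
    descriptions = pick(description_row)
    units = pick(unit_row)

    lines = [f"Table: {table_name}", "Fields:"]
    for i, name in enumerate(names):
        line = f"- {name}"
        description = descriptions[i].strip() if i < len(descriptions) else ""
        unit = units[i].strip() if i < len(units) else ""
        if description:
            line += f": {description}"
        if unit:
            # Copied without validation, i.e. the check_units=False case.
            line += f" [{unit}]"
        lines.append(line)
    return "\n".join(lines)


# Row 0: names, row 1: descriptions, row 2: units, row 3 onwards: data.
sample = "depth,n_value\nBorehole depth,SPT blow count\nm,\n1.5,12\n"
print(build_schema_text("boreholes.csv", sample))
```

The data row (``1.5,12``) never reaches the output, which matches the statement that the schema is derived from the header rows only.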
- class modules.readers.MarkdownReader
Read a plain-text or Markdown file and expose its content as a ResultModel.
The output ``OutputContent`` port carries a single-field ResultModel whose field name matches ``field_name`` (default ``"content"``). This lets a downstream :class:`~modules.llmAI.PromptTemplate` pick up the text via the matching ``{content}`` placeholder without any extra wiring.
Examples
Minimal usage: read ``report.md`` and inject it into a prompt:
>>> reader = MarkdownReader(file="report.md")
>>> tpl = PromptTemplate(template="Summarise the following report:\n{content}")
>>> pipe.connect(reader, "OutputContent", tpl, "InputValues")
Use a custom field name so the template placeholder matches:
>>> reader = MarkdownReader(file="context.txt", field_name="background")
>>> tpl = PromptTemplate(template="Given this background:\n{background}\nAnswer: ...")
- Inherits from:
PipeModule
Methods:
- __init__(mname: str = 'MarkdownReader', auto_run: bool = True, file: str | Path | None = None, encoding: str | None = 'auto', field_name: str = 'content', strip: bool = True) None
Initialize MarkdownReader.
Parameters
- file: str | Path | None, default: None
Path to the text or Markdown file to read. Accepts any plain-text format (``.md``, ``.txt``, ``.rst``, …).
- encoding: str | None, default: ``"auto"``
Character encoding used to open the file.
``"auto"``: probe the file with common encodings (utf-8, gbk, gb2312, utf-8-sig, latin-1, cp1252) and use the first that works.
``None`` or ``""``: fall back to utf-8.
Any other standard encoding name is forwarded directly to ``open()``.
- field_name: str, default: ``"content"``
Name of the field in the output ResultModel. Must match the placeholder used in the downstream ``PromptTemplate``.
If the template contains ``{content}``, keep the default.
If the template contains ``{report}``, set ``field_name="report"``.
- strip: bool, default: ``True``
When ``True``, strip leading and trailing whitespace (including blank lines) from the file content before storing it.
Ports
- OutputContent: PortTypeHint.ResultModel
A dynamically-created ResultModel instance with one ``str`` field named ``field_name`` whose value is the file content. Connect this to ``PromptTemplate.InputValues``.
Attributes:
- OutputContent: PortReference[PortTypeHint.ResultModel]
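The ``encoding="auto"`` probing described above can be sketched with the standard library alone. This is an illustration under assumptions, not the module's code: ``read_text_auto`` is a hypothetical helper, and it decodes the raw bytes instead of calling ``open()``.

```python
import tempfile
from pathlib import Path

# Probe order taken from the parameter description.
CANDIDATE_ENCODINGS = ("utf-8", "gbk", "gb2312", "utf-8-sig", "latin-1", "cp1252")


def read_text_auto(path, encoding="auto", strip=True):
    """Return (text, encoding_used) for a plain-text file."""
    raw = Path(path).read_bytes()
    if encoding == "auto":
        candidates = CANDIDATE_ENCODINGS
    elif not encoding:
        # None or "" falls back to utf-8.
        candidates = ("utf-8",)
    else:
        candidates = (encoding,)

    for name in candidates:
        try:
            text = raw.decode(name)
        except UnicodeDecodeError:
            continue  # try the next candidate
        return (text.strip() if strip else text), name
    raise ValueError(f"could not decode {path} with any of {candidates}")


with tempfile.TemporaryDirectory() as tmp:
    report = Path(tmp) / "report.md"
    # GBK bytes are not valid UTF-8, so the probe moves on to "gbk".
    report.write_bytes("\n钻孔深度\n\n".encode("gbk"))
    text, used = read_text_auto(report)
    print(repr(text), used)
```

Because ``latin-1`` maps every byte to a character, a probe list that contains it never reaches the final ``raise``; entries listed after it are effectively unreachable in this sketch.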
- class modules.readers.ExcelReader
Read an Excel (.xlsx) file.
- Inherits from:
PipeModule
Methods:
- __init__(mname: str = 'ExcelReader', auto_run: bool = True, file: str | Path | dict | None = None, sheet_name: str | int | list[str | int] | None = None, header: int | list[int] | str | None = 'infer', index_col: int | str | list[int] | list[str] | None = None, usecols: list[int] | list[str] | None = None, dtype: dict[str, str] | str | None = None, skiprows: int | list[int] | None = None, nrows: int | None = None, na_values: str | list[str] | dict[str, str | list[str]] | None = None, output_mode: Literal[table, schema, both] = 'table', name_row: int = 0, description_row: int | None = 1, unit_row: int | None = 2, check_units: bool = True, schema_field_name: str = 'fields', table_relationship_mode: Literal[none, manual] = 'none', main_table: str | list[str] | None = None, sub_tables: list[str] | dict[str, list[str]] | None = None, primary_key: str | dict[str, str] | None = None) None
Initialize ExcelReader object.
Parameters
- file: str | Path | dict | None
The path to the ``.xlsx`` file, or GDIM file metadata. If ``str`` or ``Path``, the local file path is used. If ``dict``, it is validated as ``GdimMinIOFile`` and downloaded from the GDIM File server.
- sheet_name: str | int | list[str | int] | None, default: None
Worksheet(s) to read.
``str`` or ``int``: read one sheet; populate ``OutputTable`` with that sheet and ``OutputTables`` with a single-table collection.
``list[str | int]``: read the listed sheets into ``OutputTables``; ``OutputTable`` is always the worksheet at workbook index ``0``.
``None``: read every worksheet into ``OutputTables``; ``OutputTable`` is the worksheet at workbook index ``0``.
- header: int, list of int, str, default: "infer"
Row number(s) to use as the column names, and the start of the data.
- index_col: int, str, sequence of int/str, None, default: None
Column(s) to use as the row labels of the DataFrame.
- usecols: list-like, optional
Subset of columns to read, given as positions or names.
- dtype: type name or dict of column -> type, optional
Data type for data or columns. Same conventions as :class:`CsvReader`.
- skiprows: list-like or int, optional
Row numbers to skip (0-indexed) or number of rows to skip.
- nrows: int, optional
Number of rows of the sheet to read.
- na_values: scalar, str, list-like, or dict, optional
Additional strings to recognize as NA/NaN (missing values).
- output_mode: Literal["table", "schema", "both"], default: "table"
Controls which output ports are populated:
``"table"``: populate table port(s) only (default).
``"schema"``: populate ``OutputSchema`` only.
``"both"``: populate schema and table port(s).
- name_row: int, default: 0
0-indexed row number in the raw worksheet that contains field names. Used only when ``OutputSchema`` is produced.
- description_row: int | None, default: 1
0-indexed row number that contains field descriptions. Set to ``None`` if the sheet has no description row.
- unit_row: int | None, default: 2
0-indexed row number that contains the physical unit for each column. Set to ``None`` if the sheet has no unit row.
- check_units: bool, default: True
When ``True``, unit cells are validated against :class:`~gdi.dataclass.terminologies.Units`.
- schema_field_name: str, default: "fields"
Name of the attribute in the output ResultModel that holds the plain-text schema summary string.
- table_relationship_mode: Literal["none", "manual"], default: "none"
Controls main/sub table relationships in ``OutputTables`` when multiple worksheets are read:
``"none"``: add tables independently with no hierarchy (default).
``"manual"``: use ``main_table``, ``sub_tables``, and optional ``primary_key`` to define the relationship.
Ignored when only one worksheet is loaded.
- main_table: str | list[str] | None, default: None
Worksheet or table identifier(s) for the main table(s). Required when ``table_relationship_mode="manual"``.
A single ``str`` for one main table (use with ``sub_tables`` as a ``list[str]``).
A ``list[str]`` for several main tables (use with ``sub_tables`` as a ``dict[str, list[str]]`` mapping each main table to its sub tables).
- sub_tables: list[str] | dict[str, list[str]] | None, default: None
Worksheet or table identifiers for sub tables. Required when ``table_relationship_mode="manual"``.
``list[str]`` when ``main_table`` is a single table name/title.
``dict[str, list[str]]`` when ``main_table`` lists several main tables. Each dict key is a main table identifier; the value is that main table's sub table identifiers.
- primary_key: str | dict[str, str] | None, default: None
Column name used to link each main table to its sub tables.
``str``: the same join column for a single-main configuration.
``dict[str, str]``: per-main join columns when several main tables are configured. Keys may be main table names or titles.
``None``: the first common column between each main table and its sub tables is used.
Ports
- OutputTable: PortTypeHint.TableData
The output ``TableData`` of an Excel worksheet. When ``sheet_name`` is a single sheet identifier (``str`` or ``int``), this port carries that sheet; otherwise it always carries the worksheet at workbook index ``0``. Populated when ``output_mode`` is ``table`` or ``both``.
- OutputTables: PortTypeHint.TableCollection
A ``TableCollection`` of worksheet data. When ``sheet_name`` is a single sheet identifier, the collection contains only that ``TableData``; otherwise it contains one ``TableData`` per sheet selected by ``sheet_name`` (all sheets when ``None``). Populated when ``output_mode`` is ``table`` or ``both``.
- OutputSchema: PortTypeHint.ResultModel
A single-field ``ResultModel`` whose attribute name is ``schema_field_name`` (default ``fields``) and whose value is a plain-text column schema derived from header rows, not the data rows themselves.
For one worksheet the string has the form:
Table: <sheet_name>
Fields:
- <name>: <description> [<unit>]
- …
When multiple worksheets are read, each sheet block is separated by a blank line. Field names come from ``name_row``; descriptions from ``description_row`` (omitted when ``None`` or missing); units from ``unit_row`` appear in brackets when present (validated against :class:`~gdi.dataclass.terminologies.Units` when ``check_units`` is ``True``). Typical use is wiring this port to ``PromptTemplate.InputValues`` so a ``{fields}`` placeholder receives the schema text. Populated when ``output_mode`` is ``schema`` or ``both``.
Attributes:
- OutputTable: PortReference[PortTypeHint.TableData]
- OutputTables: PortReference[PortTypeHint.TableCollection]
- OutputSchema: PortReference[PortTypeHint.ResultModel]
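The two accepted shapes of ``main_table`` / ``sub_tables`` and the ``primary_key=None`` fallback can be pictured with a standalone sketch. Both helper names are hypothetical, and the reading of "first common column" (the first main-table column that appears in every sub table) is an assumption rather than documented behaviour.

```python
def normalize_relationships(main_table, sub_tables):
    """Return {main: [subs]} for either accepted manual-mode shape."""
    if isinstance(main_table, str):
        if not isinstance(sub_tables, list):
            raise TypeError("a single main_table expects sub_tables as list[str]")
        return {main_table: list(sub_tables)}
    if not isinstance(sub_tables, dict):
        raise TypeError(
            "several main tables expect sub_tables as dict[str, list[str]]"
        )
    # Assumption: a main table without an entry simply has no sub tables.
    return {main: list(sub_tables.get(main, [])) for main in main_table}


def resolve_primary_key(main, main_columns, sub_columns, primary_key=None):
    """Pick the join column for one main table and its sub tables."""
    key = primary_key.get(main) if isinstance(primary_key, dict) else primary_key
    if key is not None:
        return key
    # Fallback: first main-table column shared by every sub table.
    for column in main_columns:
        if all(column in columns for columns in sub_columns):
            return column
    return None


links = normalize_relationships("holes", ["samples", "tests"])
join = resolve_primary_key(
    "holes",
    ["id", "hole_id", "depth"],
    [["hole_id", "value"], ["x", "hole_id"]],
)
print(links, join)
```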
- class modules.readers.MdbReader
Read data and schema from a Microsoft Access ``.mdb`` file.
- Inherits from:
PipeModule
Methods:
- __init__(mname: str = 'MdbReader', auto_run: bool = True, file: str | Path | dict | None = None, table_names: list[str] | None = None, password: str | None = None, usecols: list[str] | dict[str, list[str]] | None = None, nrows: int | None = None, skiprows: int | None = None, output_mode: Literal[table, schema, both] = 'table', schema_field_name: str = 'fields', include_sample_values: int = 0) None
Initialize MdbReader object.
Parameters
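- file: str | Path | dict | None, default: None
The path to the ``.mdb`` file, or GDIM file metadata. If ``str`` or ``Path``, the local file path is used. If ``dict``, it is validated as ``GdimMinIOFile`` and downloaded from the GDIM File server.
- table_names: list[str] | None, default: None
Read only the specified tables; when ``None``, read every table in the MDB. The schema export likewise includes only these tables.
- password: str | None, default: None
Access password for the MDB database. An encrypted database can be read only when the correct password is supplied.
- usecols: list[str] | dict[str, list[str]] | None, default: None
Read only a subset of columns.
``list[str]``: apply the same set of column names to every table that is read.
``dict[str, list[str]]``: specify column names separately for each table name.
The schema export likewise includes only these columns.
- nrows: int | None, default: None
Maximum number of data rows to read from each table.
- skiprows: int | None, default: None
Number of leading data rows to skip in each table (applied before ``nrows``).
- output_mode: Literal["table", "schema", "both"], default: "table"
Controls which output ports are populated:
``"table"``: populate the table data ports only (default).
``"schema"``: populate ``OutputSchema`` only.
``"both"``: populate both the schema and the table data ports.
- schema_field_name: str, default: "fields"
Name of the field in the ``OutputSchema`` ``ResultModel`` that holds the plain-text schema.
- include_sample_values: int, default: 0
Number of sample distinct values attached to each field in the schema export. ``0`` means no sample data is read; a value greater than ``0`` scans a small number of rows to extract sample values.
Ports
- OutputTable: PortTypeHint.TableData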
The output ``TableData`` of an MDB table. When ``table_names`` contains exactly one table name, this port carries that table; otherwise it always carries the first user table in MDB read order (the first table when ``table_names`` is ``None``). Populated when ``output_mode`` is ``table`` or ``both``.
- OutputTables: PortTypeHint.TableCollection
A ``TableCollection`` of MDB table data. When ``table_names`` specifies a single table, the collection contains only that ``TableData``; otherwise it contains one ``TableData`` per table selected by ``table_names`` (all user tables when ``None``). Parent/child relationships from the MDB schema are attached automatically on Windows. Populated when ``output_mode`` is ``table`` or ``both``.
- OutputSchema: PortTypeHint.ResultModel
A single-field ``ResultModel`` whose attribute name is ``schema_field_name`` (default ``fields``) and whose value is a plain-text schema summary of the selected tables: table names, optional table description, primary keys, field metadata (name, type, description, optional sample values), and relationship hints.
For one table the string has the form:
Table: <table_name>
Description: <table_description> # only when different from name
Primary key: <col1>, <col2> # when defined
Fields:
- <name>: <description> [<type>, not null] (samples: a, b)
- …
When multiple tables are read, each table block is separated by a blank line. Typical use is wiring this port to ``PromptTemplate.InputValues`` so a ``{fields}`` placeholder receives the schema text. Populated when ``output_mode`` is ``schema`` or ``both``.
Notes
If the data in InputFile is not None, the self.file will be overwritten by the data of the input port.
Attributes:
- InputFile: PortReference[PortTypeHint.FilePath]
- OutputTable: PortReference[PortTypeHint.TableData]
- OutputTables: PortReference[PortTypeHint.TableCollection]
- OutputSchema: PortReference[PortTypeHint.ResultModel]
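How ``usecols``, ``skiprows`` and ``nrows`` interact can be shown on plain Python lists. This is a sketch of the documented semantics only; the helper names are hypothetical, and two details are assumptions: a table missing from a ``dict`` keeps all of its columns, and requested columns that a table does not have are ignored.

```python
def columns_for(table_name, all_columns, usecols=None):
    """Resolve the columns to read for one table."""
    if usecols is None:
        return list(all_columns)
    if isinstance(usecols, dict):
        wanted = usecols.get(table_name)
        if wanted is None:
            # Assumption: tables absent from the dict keep every column.
            return list(all_columns)
    else:
        # A plain list applies to every table that is read.
        wanted = usecols
    # Keep the table's own column order; unknown names are ignored.
    return [column for column in all_columns if column in wanted]


def slice_rows(rows, skiprows=None, nrows=None):
    """Skip leading rows first, then cap the number of rows read."""
    start = skiprows or 0
    stop = None if nrows is None else start + nrows
    return rows[start:stop]


print(columns_for("holes", ["id", "depth", "note"], ["depth", "id"]))
print(slice_rows(list(range(10)), skiprows=2, nrows=3))
```

Applying ``skiprows`` before ``nrows`` means ``nrows`` counts the rows actually returned, not rows from the start of the table.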
- class modules.readers.ReadGtbFile
Read a ``.gtb`` or ``.xlsx`` file exported by ``ExportGdimTables`` and return a ``TableCollection`` ready for downstream processing (e.g. ``GdimTableWriter``).
The file format written by ``ExportGdimTables`` stores:
- A ``metadata`` entry with ``dataTemplateId``, a ``tables`` mapping (name → ``{name, title, sheetName}``), and an optional ``tableRelations`` mapping (child name → parent name).
- One CSV / Excel sheet per table whose **first row** contains field *titles*, whose **second row** contains field *names*, whose **third row** contains field *units* (only for ``number`` fields), and whose data starts at row 4.
The output ``TableCollection`` has each ``TableData`` column-named with field *names*, ``name_to_title`` populated from row 1, and ``TableData.name`` / ``TableData.title`` set from the metadata. When ``tableRelations`` is present, the collection's ``main_table`` / ``sub_tables`` hierarchy is built so that ``write_table_data`` can write in the correct topological order.
**Optional template-ID validation** (only when ``InputToken`` is connected and a valid ``proj_id`` is available): the module calls ``get_project_info`` and compares the file's ``dataTemplateId`` against the project. A mismatch emits a ``GDIDataQualityWarning`` and the output ports are set to ``None``. Set ``validate_template_id=False`` to skip this check explicitly.
- Inherits from:
PipeModule
Methods:
- __init__(mname: str = 'ReadGtbFile', auto_run: bool = True, file: str | Path | dict | None = None, validate_template_id: bool = True, token: str | None = None, proj_id: str | None = None, host: str | None = None) None
Initialize the ReadGtbFile object.
Parameters
- file: str | Path | dict | None, default: None
Path to the ``.gtb`` or ``.xlsx`` file, or a GDIM MinIO file dict (produced by ``ExportGdimTables(save_to_gdim=True)``).
- validate_template_id: bool, default: True
When ``True`` and a valid token + ``proj_id`` are available, the module calls the GDIM API to verify that the file's ``dataTemplateId`` matches the target project. A mismatch emits a ``GDIDataQualityWarning`` and the output ports are set to ``None``. Set to ``False`` to skip this check (e.g. when reading the file for inspection without targeting a specific project).
- token: str | None
The user token. Can also be supplied via the ``InputToken`` port or ``pipeline.gdim_state``.
- proj_id: str | None
The GDIM project id used for template-ID validation. When set, overrides the ``proj_id`` in ``InputToken`` / ``pipeline.gdim_state``.
- host: str | None
The GDIM platform host URL.
Ports
- InputToken: PortTypeHint.Token
User token. Optional; see the ``token`` parameter description above.
- OutputTables: PortTypeHint.TableCollection
The reconstructed ``TableCollection``.
Attributes:
- InputToken: PortReference[PortTypeHint.Token]
- OutputTables: PortReference[PortTypeHint.TableCollection]
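The sheet layout and the parent-before-child write order described for ``ReadGtbFile`` can be sketched without the library. The helpers below are hypothetical and work on plain lists and dicts, not on ``TableData`` objects; ``graphlib`` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter


def parse_gtb_sheet(rows):
    """Split raw sheet rows into (names, name_to_title, units, data)."""
    # Row 1: field titles, row 2: field names, row 3: units, row 4+: data.
    titles, names, unit_cells = rows[0], rows[1], rows[2]
    name_to_title = dict(zip(names, titles))
    # Only fields with a non-empty unit cell get an entry.
    units = {name: unit for name, unit in zip(names, unit_cells) if unit}
    data = [dict(zip(names, row)) for row in rows[3:]]
    return names, name_to_title, units, data


def write_order(table_names, table_relations):
    """Order tables so that every parent precedes its children."""
    sorter = TopologicalSorter()
    for name in table_names:
        sorter.add(name)
    # table_relations maps child name -> parent name.
    for child, parent in table_relations.items():
        sorter.add(child, parent)
    return list(sorter.static_order())


sheet = [
    ["Hole ID", "Depth"],
    ["hole_id", "depth"],
    ["", "m"],
    ["BH1", 10.5],
]
names, name_to_title, units, data = parse_gtb_sheet(sheet)
order = write_order(
    ["samples", "holes", "tests"],
    {"samples": "holes", "tests": "samples"},
)
print(name_to_title, units, order)
```

A topological order is the natural fit here because a child table can only be written once the rows it refers to in its parent table exist.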
- class modules.readers.SkglMonitorReader
Read the monitor data from the skgl platform.
- Inherits from:
PipeModule
Methods:
- __init__(mname: str | None = None, auto_run: bool = True, host: str | None = None, proj_id: int | None = None, token: str | None = None, group: Literal[雨水情监测, 安全监测] = '雨水情监测', monitor_type: str | None = None, type_name: Literal[name, title] = 'title', column_name: Literal[name, title] = 'name') None
Initialize SkglMonitorReader object.
Parameters
- host: str | None, default: None
If None, the default value is used, for example "https://gdim.kulunsoft.com".
- proj_id: int | None, default: None
The id of the project.
- token: str | None, default: None
The token of the user.
- group: Literal["雨水情监测", "安全监测"], default: "雨水情监测"
The group of the monitor data: "雨水情监测" (rain and water regime monitoring) or "安全监测" (safety monitoring).
- monitor_type: str | None, default: None
The type of the monitor data.
- type_name: Literal["name", "title"], default: "title"
How ``monitor_type`` is interpreted. If ``title``, the value given for the monitor type is its title. If ``name``, the value given for the monitor type is its name.
- column_name: Literal["name", "title"], default: "name"
Which identifier is used for the column names in the output table. If ``title``, the user-friendly title is used. If ``name``, the name used in the database is used.
Properties:
- InputToken
- OutputTable
- class modules.readers.SkglYesterdayReader
Read yesterday's monitor data and management data from the skgl platform.
- Inherits from:
PipeModule
Methods:
- __init__(mname: str | None = None, auto_run: bool = True, host: str | None = None, proj_id: int | None = None, token: str | None = None) None
Properties:
- InputToken
- OutputTableList
- OutputTables