modules.llmAI
Classes
- class modules.llmAI.PromptTemplate
Define a prompt template with dynamic placeholders.
Features:
Extract variables from template with {variable} syntax
Dynamic UI schema generation based on template variables
Multiple value sources: InputValues port (SingleResult or ResultModel), variable_values dict, or pipeline attributes
Use add_dict_attribute() to expose each variable as a pipeline attribute
Examples
Basic usage with SingleResult:
>>> template = PromptTemplate(
...     template="Analyze {data_type} with focus on {aspect}",
...     variable_titles={"data_type": "Type of Data", "aspect": "Analysis Focus"}
... )
>>> template.variable_values = {"data_type": "sales", "aspect": "trends"}
>>> template.execute()

Basic usage with ResultModel:
>>> class ReportContext(ResultModel):
...     data_type: str
...     aspect: str = Field(title="Analysis Focus")
>>> template = PromptTemplate(template="Analyze {data_type} with focus on {aspect}")
>>> # Connect an agent node's ResultModel output to InputValues

Pipeline integration:
>>> pipe.add_module(template)
>>> pipe.add_dict_attribute("data_type", "template", "variable_values", "data_type")
>>> pipe.add_dict_attribute("aspect", "template", "variable_values", "aspect")
>>> pipe.set_attributes(data_type="sales", aspect="trends")
- Inherits from:
PipeModule
Methods:
- __init__(mname: str = 'PromptTemplate', auto_run: bool = True, template: str = '', variable_values: dict[str, Any] | None = None, variable_titles: dict[str, str] | None = None, placeholder_pattern: str = '\\{(\\w+)\\}', missing_value_handling: Literal[empty, error, keep_placeholder] = 'empty') None
Initialize a PromptTemplate object.
Parameters
- template : str, default: ""
The prompt template with {variable} placeholders.
- variable_values : dict[str, Any] | None, default: None
Dictionary of variable values (can also be set via pipeline attributes).
- variable_titles : dict[str, str] | None, default: None
Custom titles for each variable in the UI (defaults to the capitalized variable name).
- placeholder_pattern : str, default: r"\{(\w+)\}"
Regex pattern for extracting variables (the default matches {variable_name}).
- missing_value_handling : Literal["empty", "error", "keep_placeholder"], default: "empty"
How to handle missing variable values:
"empty": replace with an empty string
"error": raise ValueError
"keep_placeholder": keep {variable} as-is
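The placeholder pattern and the three missing_value_handling modes described above can be illustrated with the standard `re` module. This is a minimal sketch, not the module's implementation: `extract_variables` and `fill_template` are hypothetical helper names, and treating only absent keys as "missing" is an assumption.

```python
import re

# Same default as the documented placeholder_pattern parameter.
PLACEHOLDER_PATTERN = r"\{(\w+)\}"


def extract_variables(template, pattern=PLACEHOLDER_PATTERN):
    """Return placeholder names in order of first appearance, without duplicates."""
    seen = {}
    for name in re.findall(pattern, template):
        seen.setdefault(name, None)
    return list(seen)


def fill_template(template, values, missing="empty", pattern=PLACEHOLDER_PATTERN):
    """Fill placeholders; `missing` mirrors the three documented modes."""
    if missing not in ("empty", "error", "keep_placeholder"):
        raise ValueError(f"Unknown missing_value_handling mode: {missing!r}")

    def substitute(match):
        name = match.group(1)
        if name in values:  # assumption: only absent keys count as missing
            return str(values[name])
        if missing == "error":
            raise ValueError(f"Missing value for template variable {name!r}")
        # "empty" drops the placeholder, "keep_placeholder" leaves it untouched.
        return "" if missing == "empty" else match.group(0)

    return re.sub(pattern, substitute, template)


template = "Analyze {data_type} with focus on {aspect}"
variables = extract_variables(template)
filled_empty = fill_template(template, {"data_type": "sales"})
filled_keep = fill_template(template, {"data_type": "sales"}, missing="keep_placeholder")
```

With only `data_type` supplied, the "empty" mode leaves a trailing gap where `{aspect}` was, while "keep_placeholder" leaves `{aspect}` visible in the output, which can make missing inputs easier to spot downstream.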
Ports
- InputValues : PortReference[PortTypeHint.SingleResult | PortTypeHint.ResultModel]
The input values for the template variables. Accepts either a SingleResult (matched by UnitResult.name, then UnitResult.title) or a ResultModel instance (matched by field name, then by Field(title=…) metadata).
- OutputPrompt : PortReference[PortTypeHint.Text]
The filled prompt text.
- update_ui_schema(reset: bool = False) dict[str, UIAttributeSchema]
Update UI schema based on current configuration.
- execute() PortTypeHint.Text | None
Fill the template with values from various sources.
Priority (highest to lowest):
1. InputValues port: a SingleResult is matched by UnitResult.name, then UnitResult.title; a ResultModel is matched by field name, then Field(title=...) metadata.
2. variable_values dict
3. Empty string, error, or kept placeholder (based on missing_value_handling)
If ``InputValues`` is connected upstream but its data is not yet available (e.g. a join module still waiting on another branch), this returns ``None`` and does not fill the template from ``variable_values`` alone. An unconnected ``InputValues`` may stay ``None``; execution then still uses ``variable_values`` and the placeholder handling.
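The priority order and the "connected but not ready" rule can be sketched with plain dictionaries. This is an illustration under stated assumptions, not the module's code: `resolve_values` is a hypothetical helper, and the port payload is modeled as an already-matched `dict` rather than a SingleResult or ResultModel.

```python
def resolve_values(variables, port_connected, port_values, variable_values):
    """Merge value sources by priority.

    Returns None when the port is connected but has no data yet, so the
    caller does not fill the template from variable_values alone.
    """
    if port_connected and port_values is None:
        return None  # upstream not ready: wait instead of falling back

    merged = dict(variable_values or {})  # lower priority
    merged.update(port_values or {})      # port values override
    # Variables absent from both sources are left out; they would then be
    # handled by missing_value_handling.
    return {name: merged[name] for name in variables if name in merged}


variables = ["data_type", "aspect"]
waiting = resolve_values(variables, True, None, {"data_type": "sales"})
unconnected = resolve_values(variables, False, None, {"data_type": "sales"})
overridden = resolve_values(
    variables, True, {"data_type": "costs"}, {"data_type": "sales", "aspect": "trends"}
)
```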
Properties:
- template
Attributes:
- InputValues: PortReference[PortTypeHint.SingleResult | PortTypeHint.ResultModel]
- OutputPrompt: PortReference[PortTypeHint.Text]
- class modules.llmAI.LLMNode
A simple module that executes an LLM model with a prompt.
This module provides a straightforward interface to LLM providers:
Takes a prompt as input (via InputPrompt port or prompt parameter)
Calls the LLM provider with the prompt
Returns the raw text response and structured metadata
LLMNode is a direct LLM interface suitable for general text generation tasks.
- Inherits from:
PipeModule
Methods:
- __init__(mname: str = 'LLMNode', auto_run: bool = True, prompt: PortTypeHint.Text | None = None, llm_provider: Literal[geekai, siliconflow, ollama, openrouter] = 'geekai', model: str = 'qwen-plus', temperature: float = 0.1, max_tokens: int = 4096, system_prompt: str | None = None, response_format: dict[str, str | dict] | None = None, reply_key_name: str = 'reply', reply_key_title: str | None = None, api_key: str | None = None)
Initialize the LLM Node.
Parameters
- mname : str, default: "LLMNode"
Module name.
- auto_run : bool, default: True
Whether to auto-run the module.
- prompt : PortTypeHint.Text | None, default: None
Prompt for the LLM (used if the InputPrompt port is not provided).
- llm_provider : Literal["geekai", "siliconflow", "ollama", "openrouter"], default: "geekai"
LLM provider to use for generation.
- model : str, default: "qwen-plus"
Specific model to use for generation.
- temperature : float, default: 0.1
Temperature for LLM generation (0.0-1.0).
- max_tokens : int, default: 4096
Maximum tokens for the LLM response.
- system_prompt : str | None, default: None
Optional system prompt to set context for the LLM.
- response_format : dict[str, str | dict] | None, default: None
Response format specification (e.g., {'type': 'json_object'}).
- reply_key_name : str, default: "reply"
The field name of the reply in the returned ResultModel. Use this name as the placeholder in a downstream ``PromptTemplate`` (e.g. ``template="...{reply}..."`` when left at the default).
- reply_key_title : str | None, default: None
The display title of the reply field in the returned ResultModel. If None, falls back to the value of ``reply_key_name``.
- api_key : str | None, default: None
API key for the selected LLM provider. If None, the pipeline's llm_key is used if available.
Ports
- InputPrompt : PortReference[PortTypeHint.Text]
The prompt to the LLM model.
- OutputReply : PortReference[PortTypeHint.ResultModel]
The reply wrapped in a ResultModel. The field name is ``reply_key_name`` (default ``"reply"``) so it can be consumed directly by a ``PromptTemplate`` InputValues port.
- OutputResponse : PortReference[PortTypeHint.AgentRunResult]
The raw AgentRunResult from pydantic_ai. Use .output for the text, .all_messages() for the full message history, and .usage() for token counts; pass .all_messages() to a subsequent agent.run_sync() for multi-turn.
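The link between ``reply_key_name`` and a downstream template placeholder can be shown without the library. In this sketch a plain `dict` stands in for the ResultModel on OutputReply, and `wrap_reply` is a hypothetical helper, not part of the module.

```python
def wrap_reply(text, reply_key_name="reply"):
    """Stand-in for the ResultModel: one field named after reply_key_name."""
    return {reply_key_name: text}


# Default key: the downstream template uses the {reply} placeholder.
reply = wrap_reply("Sales rose 12% in Q3.")
followup_prompt = "Summarize in one sentence: {reply}".format(**reply)

# Custom key: the placeholder has to be renamed to match.
analysis = wrap_reply("Sales rose 12% in Q3.", reply_key_name="analysis")
critique_prompt = "Critique this analysis: {analysis}".format(**analysis)
```

Choosing a distinct ``reply_key_name`` per node is one way to keep placeholders unambiguous when several LLM replies feed the same template.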
- update_ui_schema(reset: bool = False) dict[str, UIAttributeSchema]
Update UI schema based on current configuration.
Attributes:
- InputPrompt: PortReference[PortTypeHint.Text]
- OutputText: PortReference[PortTypeHint.Text]
- OutputReply: PortReference[PortTypeHint.ResultModel]
- OutputResponse: PortReference[PortTypeHint.AgentRunResult]
- class modules.llmAI.AgentNode
A generic PydanticAI agent runtime module for pipeline workflows.
- Inherits from:
PipeModule
Methods:
- __init__(mname: str = 'AgentNode', auto_run: bool = True, prompt: PortTypeHint.Text | None = None, llm_provider: Literal[geekai, siliconflow, ollama, openrouter] = 'geekai', model: str = 'qwen-plus', temperature: float = 0.1, max_tokens: int = 4096, retries: int = 1, system_prompt: str | None = None, deps_name: str | Callable[Ellipsis, Any] | None = None, output_type_name: str | type | None = None, response_format: dict[str, str | dict] | None = None, api_key: str | None = None, local_functions_path: str | Path | None = None, tool_function_names: list[str | Callable[Ellipsis, Any]] | None = None, output_validator_names: list[str | Callable[Ellipsis, Any]] | None = None, agent_spec: AgentSpec | None = None) None
Initialize AgentNode.
Parameters
- prompt : PortTypeHint.Text | None, default: None
Fallback user prompt when InputPrompt is not connected.
- llm_provider : Literal["geekai", "siliconflow", "ollama", "openrouter"], default: "geekai"
Provider key understood by ``ModelClientFactory``.
- model : str, default: "qwen-plus"
Provider model name.
- temperature : float, default: 0.1
Sampling temperature for run-time ``ModelSettings``.
- max_tokens : int, default: 4096
Maximum output token count for one run.
- retries : int, default: 1
Retry budget for model/tool/output validation cycles.
- system_prompt : str | None, default: None
Agent system instruction text.
- deps_name : str | Callable[..., Any] | None, default: None
Optional dependency builder for RunContext deps. Accepts a string name (resolved from ``local_functions_path``) or an inline callable.
- output_type_name : str | type | None, default: None
Structured output type definition. Supports:
class object (inline testing)
symbol name in ``local_functions_path``
fully qualified import path (``module.Symbol``)
- response_format : dict[str, str | dict] | None, default: None
Optional low-level response format (such as JSON mode hints).
- api_key : str | None, default: None
Explicit API key. Pipeline ``llm_key`` still has higher priority.
- local_functions_path : str | Path | None, default: None
Path used to resolve ``tool_function_names`` and ``output_validator_names``.
- tool_function_names : list[str | Callable[..., Any]] | None, default: None
Tool definitions. Each item can be:
function name (resolved from ``local_functions_path``)
inline callable (useful for local testing)
- output_validator_names : list[str | Callable[..., Any]] | None, default: None
Output validator definitions. Each item can be:
function name (resolved from ``local_functions_path``)
inline callable (useful for local testing)
- agent_spec : AgentSpec | None, default: None
Optional declarative runtime spec. If provided (or passed from port), it overrides module-level settings.
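The three accepted reference forms (inline object, symbol name in ``local_functions_path``, fully qualified import path) could be resolved along the following lines. This is a sketch using only `importlib`; `load_local_module` and `resolve_symbol` are hypothetical names, and the lookup order is an assumption rather than AgentNode's documented behavior.

```python
import importlib
import importlib.util
import tempfile
from pathlib import Path


def load_local_module(path):
    """Import a Python file by path without requiring it to be on sys.path."""
    path = Path(path)
    spec = importlib.util.spec_from_file_location(path.stem, str(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load module from {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def resolve_symbol(ref, local_functions_path=None):
    """Resolve an inline object, a local symbol name, or 'module.Symbol'."""
    if not isinstance(ref, str):
        return ref  # inline callable or class object: use as-is
    if local_functions_path is not None:
        module = load_local_module(local_functions_path)
        if hasattr(module, ref):
            return getattr(module, ref)
    if "." in ref:
        module_name, _, attr = ref.rpartition(".")
        return getattr(importlib.import_module(module_name), attr)
    raise LookupError(f"Cannot resolve {ref!r}")


# Demonstration with a throwaway functions file.
with tempfile.TemporaryDirectory() as tmp:
    tools_file = Path(tmp) / "my_tools.py"
    tools_file.write_text("def add(a, b):\n    return a + b\n")
    add_result = resolve_symbol("add", tools_file)(1, 2)

ordered_dict_cls = resolve_symbol("collections.OrderedDict")
inline = resolve_symbol(len)
```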
Attributes:
- InputPrompt: PortReference[PortTypeHint.Text]
- InputAgentSpec: PortReference[PortTypeHint.ResultModel]
- InputDeps: PortReference[PortTypeHint.ResultModel | PortTypeHint.General]
- OutputReply: PortReference[PortTypeHint.ResultModel]
- OutputResponse: PortReference[PortTypeHint.AgentRunResult]
- class modules.llmAI.TableDataExtractorV2
Table extraction wrapper powered by AgentNode and JSON parsing.
- Inherits from:
PipeModule
Methods:
- __init__(mname: str = 'TableDataExtractorV2', auto_run: bool = True, text: str | None = None, columns: list[str] | None = None, table_description: str = 'table', extraction_instructions: str | None = None, llm_provider: Literal[geekai, siliconflow, ollama, openrouter] = 'geekai', model: str = 'qwen-plus', api_key: str | None = None, temperature: float = 0.1, max_tokens: int = 4096, local_functions_path: str | Path | None = None, tool_function_names: list[str | Callable[Ellipsis, Any]] | None = None, output_validator_names: list[str | Callable[Ellipsis, Any]] | None = None) None
Attributes:
- InputText: PortReference[PortTypeHint.Text]
- OutputTable: PortReference[PortTypeHint.TableData]
- OutputResponse: PortReference[PortTypeHint.AgentRunResult]
- class modules.llmAI.BuildAgentSpec
Build a base AgentSpec object for advanced graph composition.
- Inherits from:
PipeModule
Methods:
- __init__(mname: str = 'BuildAgentSpec', auto_run: bool = True, llm_provider: Literal[geekai, siliconflow, ollama, openrouter] = 'geekai', model: str | None = None, mode: Literal[text, vision] = 'text', system_prompt: str | None = None, temperature: float = 0.1, max_tokens: int = 4096, retries: int = 1, api_key: str | None = None, deps_name: str | None = None, output_type_name: str | None = None, response_format: dict[str, str | dict] | None = None, local_functions_path: str | Path | None = None) None
- execute() PortTypeHint.ResultModel
Attributes:
- OutputAgentSpec: PortReference[PortTypeHint.ResultModel]
- class modules.llmAI.AddToolSpec
Append one ToolSpec item to an AgentSpec.
- Inherits from:
PipeModule
Methods:
- __init__(mname: str = 'AddToolSpec', auto_run: bool = True, function_name: str | None = None, enabled: bool = True, description: str | None = None) None
Attributes:
- InputAgentSpec: PortReference[PortTypeHint.ResultModel]
- OutputAgentSpec: PortReference[PortTypeHint.ResultModel]
- class modules.llmAI.AddOutputValidatorSpec
Append one OutputValidatorSpec item to an AgentSpec.
- Inherits from:
PipeModule
Methods:
- __init__(mname: str = 'AddOutputValidatorSpec', auto_run: bool = True, function_name: str | None = None, enabled: bool = True, description: str | None = None) None
Attributes:
- InputAgentSpec: PortReference[PortTypeHint.ResultModel]
- OutputAgentSpec: PortReference[PortTypeHint.ResultModel]
- class modules.llmAI.RunAgent
Run AgentSpec + prompt by delegating to AgentNode runtime.
- Inherits from:
PipeModule
Methods:
- __init__(mname: str = 'RunAgent', auto_run: bool = True, prompt: str | None = None, agent_spec: AgentSpec | None = None) None
Attributes:
- InputPrompt: PortReference[PortTypeHint.Text]
- InputAgentSpec: PortReference[PortTypeHint.ResultModel]
- OutputReply: PortReference[PortTypeHint.Text]
- OutputResponse: PortReference[PortTypeHint.AgentRunResult]
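The BuildAgentSpec, AddToolSpec, AddOutputValidatorSpec, and RunAgent modules form a build-then-run chain. The sketch below models that chain with dataclasses to show the data flow only. The `function_name`, `enabled`, and `description` fields come from the signatures above; the `tools` and `output_validators` container names and the copy-on-append behavior are assumptions, not the real AgentSpec definition.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ItemSpec:
    """Stand-in for a ToolSpec or OutputValidatorSpec item."""
    function_name: str
    enabled: bool = True
    description: Optional[str] = None


@dataclass(frozen=True)
class SketchAgentSpec:
    """Hypothetical stand-in for AgentSpec (field names are assumptions)."""
    llm_provider: str = "geekai"
    model: Optional[str] = None
    system_prompt: Optional[str] = None
    tools: tuple = ()
    output_validators: tuple = ()


def add_tool(spec, item):
    """Return a new spec with one more tool; the input spec is not modified."""
    return replace(spec, tools=spec.tools + (item,))


def add_output_validator(spec, item):
    """Return a new spec with one more output validator."""
    return replace(spec, output_validators=spec.output_validators + (item,))


base = SketchAgentSpec(model="qwen-plus", system_prompt="You are a data analyst.")
with_tool = add_tool(base, ItemSpec("lookup_price", description="Look up a price"))
final = add_output_validator(with_tool, ItemSpec("check_total"))
```

Returning a new spec at each step keeps every intermediate output reusable by other branches of a graph, which is one reason a pipeline might prefer this over in-place mutation.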
- class modules.llmAI.TableDataExtractor
Extract table data from images using LLM.
This module extracts structured table data from images using vision-capable LLM models. It supports extracting multiple tables with different configurations by specifying table_configs as a list of dictionaries (rendered as a table in the UI).
Features
Extract single or multiple tables from images
Filter images by prefix for each table configuration
Configurable column definitions and data types per table
Support for multiple LLM providers
Examples
Single table extraction:
>>> extractor = TableDataExtractor(
...     table_configs=[{
...         "images_prefix": "table1",
...         "columns": ["Name", "Age", "City"],
...         "table_description": "Person info",
...         "extraction_instructions": "",
...         "data_types": {"Age": "int"}
...     }]
... )

Multiple tables extraction:
>>> extractor = TableDataExtractor(
...     table_configs=[
...         {
...             "images_prefix": "employee",
...             "columns": ["Name", "Department", "Salary"],
...             "table_description": "Employee directory",
...             "extraction_instructions": "",
...             "data_types": {"Salary": "float"}
...         },
...         {
...             "images_prefix": "inventory",
...             "columns": ["Product", "Quantity", "Price"],
...             "table_description": "Inventory list",
...             "extraction_instructions": "注意识别特殊符号",  # "Pay attention to special symbols"
...             "data_types": {"Quantity": "int", "Price": "float"}
...         }
...     ]
... )
- Inherits from:
PipeModule
Methods:
- __init__(mname: str = 'TableDataExtractor', auto_run: bool = True, images_path: PortTypeHint.FilesPath | None = None, table_configs: list[_TableConfigRow] | None = None, llm_provider: Literal[geekai, siliconflow, ollama, openrouter] = 'geekai', api_key: str | None = None, vision_model: str = 'qwen3-vl-flash', temperature: float = 0.1, max_tokens: int = 4096)
Initialize the table data extractor.
Parameters
- mname : str, default: "TableDataExtractor"
Module name.
- auto_run : bool, default: True
Whether to auto-run the module.
- images_path : PortTypeHint.FilesPath | None, default: None
Path to the images to extract table data from. If images are supplied through the input port, self.images_path is overwritten by the input data.
- table_configs : list[_TableConfigRow] | None, default: None
List of table extraction configurations. Each configuration is a dictionary with the following keys:
images_prefix: str - Prefix to filter images for this table
columns: list[str] - List of column names to extract
table_description: str - Description of the table
extraction_instructions: str - Custom extraction instructions
data_types: dict[str, str] - Column name to data type mapping
- llm_provider : Literal["geekai", "siliconflow", "ollama", "openrouter"], default: "geekai"
LLM provider to use for extraction.
- api_key : str | None, default: None
API key for the selected LLM provider. If None, the pipeline's llm_key is used if available.
- vision_model : str, default: "qwen3-vl-flash"
Vision model to use for extraction.
- temperature : float, default: 0.1
Model temperature for generation (0.0-2.0).
- max_tokens : int, default: 4096
Maximum tokens for the LLM response.
- update_ui_schema(reset: bool = False) dict[str, UIAttributeSchema]
Update UI schema for table data extractor configuration.
Returns
- dict[str, UIAttributeSchema]
The UI schema, using ArrayAttributeSchema with render_as_table=True.
- execute() PortTypeHint.TableData | PortTypeHint.TableCollection | None
Execute table data extraction from images.
Returns
- TableData | TableCollection | None
TableData: when only one table configuration is provided
TableCollection: when multiple table configurations are provided
None: when no images or configurations are available
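The per-configuration image filtering and the return-type rule above can be sketched as follows. Both helpers are hypothetical, and matching ``images_prefix`` against the start of the file name (rather than the full path) is an assumption.

```python
from pathlib import Path


def group_images_by_config(image_paths, table_configs):
    """Map each configuration's images_prefix to the images whose file name starts with it."""
    return {
        config["images_prefix"]: [
            path for path in image_paths
            if Path(path).name.startswith(config["images_prefix"])
        ]
        for config in table_configs
    }


def result_kind(image_paths, table_configs):
    """Name the documented return type for a given set of inputs."""
    if not image_paths or not table_configs:
        return None
    return "TableData" if len(table_configs) == 1 else "TableCollection"


images = ["scans/employee_01.png", "scans/employee_02.png", "scans/inventory_01.png"]
configs = [{"images_prefix": "employee"}, {"images_prefix": "inventory"}]

groups = group_images_by_config(images, configs)
kind_multi = result_kind(images, configs)
kind_single = result_kind(images, configs[:1])
kind_none = result_kind([], configs)
```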
Attributes:
- InputImages: PortReference[PortTypeHint.FilesPath]
- OutputTables: PortReference[PortTypeHint.TableData | PortTypeHint.TableCollection]
- class modules.llmAI.DictDataExtractor
Extract structured dictionary/JSON data from images using LLM.
This module extracts flexible JSON/dictionary data from images using vision-capable LLM models. Unlike TableDataExtractor which outputs tabular data (rows and columns), this module outputs SingleResult for complex, nested, or non-tabular structures.
Use Cases:
Forms and certificates (ID cards, licenses, permits)
Labels and tags (product labels, shipping labels)
Invoices and receipts
Technical specifications or datasheets
Any structured non-tabular document
Multi-Image Extraction Strategy:
Extracts ALL fields from EACH image
For non-list fields: Last non-null value wins (overwrites previous)
For list fields: Append all non-null values across images
Nested objects (type="object") are stored as JSON strings in SingleResult
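The multi-image strategy listed above can be sketched with plain dictionaries, one per image. `merge_extractions` is a hypothetical helper, and flattening a list returned for a list field (instead of nesting it) is an assumption about how "append" is applied.

```python
import json


def merge_extractions(per_image, field_configs):
    """Merge per-image dicts: last non-null wins, list fields accumulate."""
    list_fields = {c["name"] for c in field_configs if c.get("type") == "list"}
    object_fields = {c["name"] for c in field_configs if c.get("type") == "object"}

    merged = {name: [] for name in list_fields}
    for extracted in per_image:
        for name, value in extracted.items():
            if value is None:
                continue  # null values never overwrite or extend anything
            if name in list_fields:
                items = value if isinstance(value, list) else [value]
                merged[name].extend(item for item in items if item is not None)
            elif name in object_fields:
                # Nested objects are stored as JSON strings.
                merged[name] = json.dumps(value, ensure_ascii=False)
            else:
                merged[name] = value  # a later image overwrites an earlier one
    return merged


field_configs = [
    {"name": "sample_id", "type": "string"},
    {"name": "pollutants", "type": "list"},
    {"name": "project_info", "type": "object"},
]
per_image = [
    {"sample_id": "S-001", "pollutants": ["NO2"], "project_info": {"name": "A"}},
    {"sample_id": None, "pollutants": ["SO2", None]},
    {"sample_id": "S-002", "pollutants": "PM2.5"},
]
merged = merge_extractions(per_image, field_configs)
```

Because the last non-null value wins for scalar fields, the order in which images are processed affects the result when two images disagree.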
Examples
Basic extraction:
>>> extractor = DictDataExtractor(
...     field_configs=[
...         {"name": "company_name", "type": "string", "title": "公司名称"},  # "Company name"
...         {"name": "invoice_number", "type": "string", "title": "发票号"},  # "Invoice number"
...         {"name": "amount", "type": "float", "title": "金额", "unit": "元"},  # "Amount", unit "yuan"
...     ],
...     extraction_description="发票"  # "Invoice"
... )

With list field (accumulates across images):
>>> extractor = DictDataExtractor(
...     field_configs=[
...         {"name": "sample_id", "type": "string", "title": "样品编号"},  # "Sample ID"
...         {"name": "pollutants", "type": "list", "title": "污染物列表"},  # "Pollutant list"
...     ],
...     extraction_description="环境监测报告"  # "Environmental monitoring report"
... )

With nested object (stored as JSON string):
>>> extractor = DictDataExtractor(
...     field_configs=[
...         {"name": "project_info", "type": "object", "title": "项目信息",  # "Project information"
...          "description": "包含项目名称、编号、负责人等"},  # "Includes project name, number, person in charge, etc."
...     ],
...     extraction_description="项目报告"  # "Project report"
... )
- Inherits from:
PipeModule
Methods:
- __init__(mname: str = 'DictDataExtractor', auto_run: bool = True, images_path: PortTypeHint.FilesPath | None = None, field_configs: list[_FieldConfigRow] | None = None, extraction_description: str = '文档', extraction_instructions: str | None = None, llm_provider: Literal[geekai, siliconflow, ollama, openrouter] = 'geekai', api_key: str | None = None, vision_model: str = 'qwen3-vl-flash', temperature: float = 0.1, max_tokens: int = 4096)
Initialize the dict data extractor.
Parameters
- mname : str, default: "DictDataExtractor"
Module name.
- auto_run : bool, default: True
Whether to auto-run the module.
- images_path : PortTypeHint.FilesPath | None, default: None
Path to the images to extract data from. If images are supplied through the input port, self.images_path is overwritten.
- field_configs : list[_FieldConfigRow] | None, default: None
List of field configurations to extract. Each configuration is a dictionary with the following keys:
name: str - Field key name (required)
type: str - Data type (string, number, float, int, boolean, date, list, object)
title: str - Human-readable title for UI
description: str - Description to help the LLM understand the field
unit: str - Unit for numeric values
- extraction_description : str, default: "文档" ("document")
Description of the document type, e.g. "发票" (invoice), "检测报告" (inspection report), "证书" (certificate).
- extraction_instructions : str | None, default: None
Additional extraction instructions for the LLM.
- llm_provider : Literal["geekai", "siliconflow", "ollama", "openrouter"], default: "geekai"
LLM provider to use for extraction.
- api_key : str | None, default: None
API key for the selected LLM provider. If None, the pipeline's llm_key is used if available.
- vision_model : str, default: "qwen3-vl-flash"
Vision model to use for extraction.
- temperature : float, default: 0.1
Model temperature for generation (0.0-2.0).
- max_tokens : int, default: 4096
Maximum tokens for the LLM response.
- update_ui_schema(reset: bool = False) dict[str, UIAttributeSchema]
Update UI schema for dict data extractor configuration.
Returns
- dict[str, UIAttributeSchema]
The UI schema, using ArrayAttributeSchema with render_as_table=True.
- execute() PortTypeHint.SingleResult | None
Execute dictionary data extraction from images.
Returns
- SingleResult | None
SingleResult: extracted key-value pairs
None: when no images or configurations are available
Attributes:
- InputImages: PortReference[PortTypeHint.FilesPath]
- OutputSingleResult: PortReference[PortTypeHint.SingleResult]
- class modules.llmAI.TableAnalyzer
Intelligent table data analyzer using LLM.
This module provides comprehensive data analysis capabilities including:
Descriptive statistics and data quality assessment
Correlation and trend analysis
Anomaly detection and data cleaning
Custom queries with natural language
Automated plot generation
Feature engineering
Supports TableData, TableCollection, and SingleResult inputs with rich metadata utilization for enhanced analysis quality.
- Inherits from:
PipeModule
Methods:
- __init__(mname: str = 'TableAnalyzer', auto_run: bool = True, input_data: PortTypeHint.TableData | PortTypeHint.TableCollection | PortTypeHint.SingleResult | None = None, prompt: PortTypeHint.Text | None = None, analysis_type: AnalysisType | str = AnalysisType.CUSTOM_QUERY, llm_provider: Literal[geekai, siliconflow, ollama, openrouter] = 'geekai', model: str = 'qwen3-coder-flash', temperature: float = 0.05, max_tokens: int = 4096, api_key: str | None = None, output_dir: str | Path | None = None, plot_save_mode: Literal[workspace, default] = 'workspace', auto_open_plots: bool = False, plot_format: Literal[png, jpg, pdf, svg, base64] = 'png', language: Literal[english, chinese] = 'chinese', system_prompt: str | None = None)
Initialize the LLM Table Analyzer.
Parameters
- input_data : PortTypeHint.TableData | PortTypeHint.TableCollection | PortTypeHint.SingleResult | None, default: None
The table data to analyze (TableData, TableCollection, or SingleResult).
- prompt : PortTypeHint.Text | None, default: None
Prompt for analysis (used when analysis_type is CUSTOM_QUERY). If not provided, the module's 'prompt' attribute is used.
- analysis_type : AnalysisType | str, default: AnalysisType.CUSTOM_QUERY
Type of analysis to perform on the table data.
- llm_provider : Literal["geekai", "siliconflow", "ollama", "openrouter"], default: "geekai"
LLM provider to use for analysis.
- api_key : str | None, default: None
API key for the selected LLM provider.
- model : str, default: "qwen3-coder-flash"
Specific model to use for analysis.
- temperature : float, default: 0.05
Temperature for LLM generation (0.0-2.0).
- max_tokens : int, default: 4096
Maximum tokens for the LLM response.
- output_dir : str | Path | None, default: None
Directory to save the generated plots. The pipeline's 'workspace' has priority over 'output_dir'. If both 'output_dir' and 'workspace' are None, the current working directory is used.
- plot_save_mode : Literal["workspace", "default"], default: "workspace"
"workspace": save the plot file to the specified workspace directory
"default": use the PandasAI default save path
- auto_open_plots : bool, default: False
Automatically open plots with the system viewer (only applies when plot_save_mode="workspace").
- plot_format : Literal["png", "jpg", "pdf", "svg", "base64"], default: "png"
Format for generated plots. If "base64", 'auto_open_plots' is ignored, the plot file is deleted, and the plot is returned as a base64 string.
- language : Literal["english", "chinese"], default: "chinese"
Output language preference for analysis insights.
- system_prompt : str | None, default: None
System prompt for domain-specific context.
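The directory priority described for output_dir (pipeline workspace first, then output_dir, then the current working directory) amounts to a short fallback chain. A minimal sketch, with `resolve_plot_dir` as a hypothetical helper name:

```python
from pathlib import Path


def resolve_plot_dir(workspace, output_dir):
    """Pick the plot directory: workspace, else output_dir, else the cwd."""
    if workspace is not None:
        return Path(workspace)
    if output_dir is not None:
        return Path(output_dir)
    return Path.cwd()


from_workspace = resolve_plot_dir("pipeline_ws", "plots")
from_output_dir = resolve_plot_dir(None, "plots")
from_cwd = resolve_plot_dir(None, None)
```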
- update_ui_schema(reset: bool = False) dict[str, UIAttributeSchema]
Update UI schema based on current configuration.
Attributes:
- InputData: PortReference[PortTypeHint.TableData | PortTypeHint.TableCollection | PortTypeHint.SingleResult]
- InputPrompt: PortReference[PortTypeHint.Text]
- OutputInsights: PortReference[PortTypeHint.Text]
- OutputProcessedData: PortReference[PortTypeHint.TableData | PortTypeHint.TableCollection]
- OutputPlot: PortReference[PortTypeHint.FilePath | PortTypeHint.Picture]
- OutputResponse: PortReference[PortTypeHint.SingleResult]