dataclass.logs

Data cleaning and pipeline execution logging structures.

This module provides Pydantic models for structured logging of data cleaning operations. The models are designed to be easy for web frontends to consume and display.

Classes

class dataclass.logs.LogLevel

Log level enumeration for consistent severity classification.

Inherits from: str, Enum

Attributes:

DEBUG = 'debug'

INFO = 'info'

WARNING = 'warning'

ERROR = 'error'

CRITICAL = 'critical'
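
Because LogLevel inherits from both str and Enum, its members behave like plain strings. The sketch below redefines the enum locally (so it runs without importing this module) and shows comparison, parsing, and JSON serialization; the variable names are illustrative only.

```python
import json
from enum import Enum


class LogLevel(str, Enum):
    """Local mirror of the documented enum, redefined so the sketch runs standalone."""
    DEBUG = "debug"
    INFO = "info"
    WARNING = "warning"
    ERROR = "error"
    CRITICAL = "critical"


# Members compare equal to their plain string values.
is_plain_string = LogLevel.ERROR == "error"

# A raw string (for example from a config file) converts back into a member.
parsed = LogLevel("warning")

# Members are str instances, so json.dumps emits the underlying value.
payload = json.dumps({"level": LogLevel.INFO})
```

This is why a frontend can read the severity as an ordinary string without any custom encoder.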

class dataclass.logs.ActionType

Types of actions that can be performed during data cleaning.

Inherits from: str, Enum

Attributes:

NO_ACTION = 'no_action'

REMOVE_ROWS = 'remove_rows'

REMOVE_COLUMNS = 'remove_columns'

FILL_VALUES = 'fill_values'

CONVERT_TYPES = 'convert_types'

VALIDATE_DATA = 'validate_data'

FILTER_DATA = 'filter_data'

TRANSFORM_DATA = 'transform_data'

class dataclass.logs.IssueType

Types of data quality issues that can be detected.

Inherits from: str, Enum

Attributes:

MISSING_VALUES = 'missing_values'

INVALID_TYPE = 'invalid_type'

OUT_OF_RANGE = 'out_of_range'

DUPLICATE_VALUES = 'duplicate_values'

INCONSISTENT_FORMAT = 'inconsistent_format'

CONSTRAINT_VIOLATION = 'constraint_violation'

DATA_ANOMALY = 'data_anomaly'
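
One way issue types might be tallied into the dict[IssueType, int] shape used by TableProcessingSummary.issues_by_type is sketched below. The enum is a local subset of the documented members, and the `detected` list is made-up sample data, not output from this module.

```python
from collections import Counter
from enum import Enum


class IssueType(str, Enum):
    """Subset of the documented members, redefined so the sketch runs standalone."""
    MISSING_VALUES = "missing_values"
    INVALID_TYPE = "invalid_type"
    DUPLICATE_VALUES = "duplicate_values"


# Hypothetical issue types detected while scanning one table.
detected = [
    IssueType.MISSING_VALUES,
    IssueType.DUPLICATE_VALUES,
    IssueType.MISSING_VALUES,
]

# Tally occurrences per issue type; types that never occurred are simply absent.
issues_by_type = dict(Counter(detected))
```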

class dataclass.logs.IssueDetail

Detailed information about a specific data quality issue.

Inherits from: BaseModel

Properties:

affected_percentage: Calculate the percentage of affected records.

severity_score: Convert severity to numeric score for sorting (higher = more severe).

Attributes:

id: UUID4

issue_type: IssueType

severity: LogLevel

table_name: str | None

column_name: str | None

row_indices: list[int] | None

description: str

detected_value: Any

expected_value: Any

affected_count: int

total_count: int

rule_name: str | None

context: dict[str, Any]
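
The two IssueDetail properties could plausibly be computed as in the standalone sketch below. The numeric ranking and the zero-total guard are assumptions (the source only states "higher = more severe"), and the functions are stand-ins written with the standard library rather than the actual Pydantic properties.

```python
from enum import Enum


class LogLevel(str, Enum):
    """Local mirror of the documented enum, redefined so the sketch runs standalone."""
    DEBUG = "debug"
    INFO = "info"
    WARNING = "warning"
    ERROR = "error"
    CRITICAL = "critical"


# Assumed ranking; the real scores used by severity_score are not documented.
SEVERITY_RANK = {
    LogLevel.DEBUG: 0,
    LogLevel.INFO: 1,
    LogLevel.WARNING: 2,
    LogLevel.ERROR: 3,
    LogLevel.CRITICAL: 4,
}


def affected_percentage(affected_count: int, total_count: int) -> float:
    """Share of affected records as a percentage; 0.0 for an empty total (assumed guard)."""
    if total_count <= 0:
        return 0.0
    return 100.0 * affected_count / total_count


def severity_score(severity: LogLevel) -> int:
    """Map a severity to its assumed numeric rank."""
    return SEVERITY_RANK[severity]


# Hypothetical (severity, affected_count, total_count) tuples, sorted most severe first.
issues = [
    (LogLevel.INFO, 5, 100),
    (LogLevel.CRITICAL, 1, 100),
    (LogLevel.WARNING, 25, 200),
]
ranked = sorted(issues, key=lambda issue: severity_score(issue[0]), reverse=True)
```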

class dataclass.logs.ActionDetail

Detailed information about an action taken during data cleaning.

Inherits from: BaseModel

Properties:

records_changed: Calculate the number of records changed.

columns_changed: Calculate the number of columns changed.

Attributes:

id: UUID4

action_type: ActionType

timestamp: datetime

description: str

table_name: str | None

column_names: list[str] | None

records_before: int

records_after: int

columns_before: int

columns_after: int

parameters: dict[str, Any]

success: bool

error_message: str | None
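
A rough sketch of how records_changed and columns_changed might be derived from the before/after counts follows. `ActionCounts` is a hypothetical standard-library stand-in for the count fields of ActionDetail, and returning the absolute difference is an assumption; the real properties may be signed.

```python
from dataclasses import dataclass


@dataclass
class ActionCounts:
    """Hypothetical stand-in holding only the count fields of ActionDetail."""
    records_before: int
    records_after: int
    columns_before: int
    columns_after: int

    @property
    def records_changed(self) -> int:
        # Assumption: report the magnitude of the change, not its direction.
        return abs(self.records_before - self.records_after)

    @property
    def columns_changed(self) -> int:
        # Same assumption as records_changed.
        return abs(self.columns_before - self.columns_after)


# A row-removal action that drops 40 of 1000 rows and leaves the columns alone.
counts = ActionCounts(
    records_before=1000,
    records_after=960,
    columns_before=12,
    columns_after=12,
)
```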

class dataclass.logs.RuleExecutionResult

Result of executing a single cleaning rule.

Inherits from: BaseModel

Properties:

total_issues: Total number of issues found by this rule.

total_actions: Total number of actions taken by this rule.

critical_issues: Get only critical and error-level issues.

Attributes:

rule_name: str

rule_description: str | None

execution_time: datetime

duration_ms: float | None

status: Literal['success', 'failed', 'skipped']

enabled: bool

error_message: str | None

issues_found: list[IssueDetail]

actions_taken: list[ActionDetail]

records_processed: int

tables_processed: list[str]
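
The critical_issues property is described as returning only critical and error-level issues. A minimal sketch of that filter is shown below, using a hypothetical `Issue` tuple in place of IssueDetail and made-up sample issues.

```python
from enum import Enum
from typing import NamedTuple


class LogLevel(str, Enum):
    """Local mirror of the documented enum, redefined so the sketch runs standalone."""
    DEBUG = "debug"
    INFO = "info"
    WARNING = "warning"
    ERROR = "error"
    CRITICAL = "critical"


class Issue(NamedTuple):
    """Hypothetical stand-in with just two of the IssueDetail fields."""
    severity: LogLevel
    description: str


def critical_issues(issues_found: list) -> list:
    """Keep only issues whose severity is ERROR or CRITICAL."""
    severe = {LogLevel.ERROR, LogLevel.CRITICAL}
    return [issue for issue in issues_found if issue.severity in severe]


# Made-up sample data for one rule execution.
issues_found = [
    Issue(LogLevel.WARNING, "12 rows have inconsistent date formats"),
    Issue(LogLevel.ERROR, "column 'age' contains non-numeric values"),
    Issue(LogLevel.CRITICAL, "primary key column has duplicates"),
]

total_issues = len(issues_found)
severe_only = critical_issues(issues_found)
```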

class dataclass.logs.TableProcessingSummary

Summary of processing for a single table.

Inherits from: BaseModel

Properties:

rows_changed: Number of rows changed.

columns_changed: Number of columns changed.

change_percentage: Percentage of data changed.

Attributes:

table_name: str

original_shape: tuple[int, int]

final_shape: tuple[int, int]

total_issues: int

issues_by_type: dict[IssueType, int]

actions_performed: list[ActionType]

data_quality_score: float | None

completeness_score: float | None
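
The three properties can be read as simple functions of original_shape and final_shape, each a (rows, columns) tuple. The sketch below is one possible interpretation: the exact formula behind change_percentage is not documented, so comparing total cell counts is an assumption, as are the function names.

```python
from typing import Tuple

Shape = Tuple[int, int]  # (rows, columns)


def rows_changed(original_shape: Shape, final_shape: Shape) -> int:
    """Absolute difference in row count (assumed unsigned)."""
    return abs(original_shape[0] - final_shape[0])


def columns_changed(original_shape: Shape, final_shape: Shape) -> int:
    """Absolute difference in column count (assumed unsigned)."""
    return abs(original_shape[1] - final_shape[1])


def change_percentage(original_shape: Shape, final_shape: Shape) -> float:
    """Assumed definition: change in total cells relative to the original cell count."""
    original_cells = original_shape[0] * original_shape[1]
    final_cells = final_shape[0] * final_shape[1]
    if original_cells == 0:
        return 0.0
    return 100.0 * abs(original_cells - final_cells) / original_cells


# A table that lost 100 of its 1000 rows and kept all 10 columns.
before, after = (1000, 10), (900, 10)
```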

class dataclass.logs.ModuleExecutionLog

Log entry for a single module execution.

Inherits from: BaseModel

Properties:

duration_seconds: Calculate execution duration in seconds.

success_rate: Calculate the success rate of rule executions.

critical_issues: Get all critical issues from all rules.

Attributes:

id: UUID4

module_name: str

module_class: str

start_time: datetime

end_time: datetime | None

status: Literal['running', 'completed', 'failed', 'skipped']

configuration: dict[str, Any]

input_data_info: dict[str, Any]

rule_results: list[RuleExecutionResult]

table_summaries: list[TableProcessingSummary]

total_records_processed: int

total_issues_found: int

total_actions_taken: int

error_message: str | None

stack_trace: str | None
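
As a hedged illustration of duration_seconds and success_rate, the sketch below works on bare values rather than the Pydantic model. Returning None while the module is still running, expressing the rate as a fraction between 0 and 1, and counting skipped rules in the denominator are all assumptions.

```python
from datetime import datetime, timedelta
from typing import Optional, Sequence


def duration_seconds(start_time: datetime, end_time: Optional[datetime]) -> Optional[float]:
    """Elapsed seconds, or None if the run has not finished (assumed behaviour)."""
    if end_time is None:
        return None
    return (end_time - start_time).total_seconds()


def success_rate(rule_statuses: Sequence[str]) -> float:
    """Fraction of rule results with status 'success'; 0.0 when no rules ran (assumed)."""
    if not rule_statuses:
        return 0.0
    successes = sum(status == "success" for status in rule_statuses)
    return successes / len(rule_statuses)


# Hypothetical run: started at noon, finished 90 seconds later, four rules executed.
started = datetime(2024, 1, 1, 12, 0, 0)
finished = started + timedelta(seconds=90)
statuses = ["success", "failed", "success", "skipped"]
```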

class dataclass.logs.PipelineExecutionReport

Complete report for a pipeline execution with data cleaning operations.

Inherits from: BaseModel

Methods:

get_module_log(module_name: str) → ModuleExecutionLog | None: Get log for a specific module.

get_issues_for_table(table_name: str) → list[IssueDetail]: Get all issues for a specific table.

generate_frontend_summary() → dict[str, Any]: Generate a summary optimized for frontend dashboard display.

Properties:

duration_seconds: Calculate total execution duration in seconds.

success_rate: Calculate overall module success rate.

all_critical_issues: Get all critical issues from all modules.

issues_by_severity: Count issues by severity level.

processing_speed: Calculate records processed per second.

Attributes:

id: UUID4

pipeline_name: str

pipeline_title: str | None

start_time: datetime

end_time: datetime | None

status: Literal['running', 'completed', 'failed', 'cancelled']

module_logs: list[ModuleExecutionLog]

total_modules: int

successful_modules: int

total_records_processed: int

total_issues_found: int

total_actions_taken: int

overall_quality_score: float | None

quality_improvement: float | None

pipeline_config: dict[str, Any]

environment_info: dict[str, Any]

summary: dict[str, Any]
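
The report nests issues three levels deep: module_logs, then rule_results, then issues_found. The sketch below shows how get_issues_for_table and processing_speed might walk and summarize that structure. It uses plain dictionaries with the documented field names instead of the Pydantic models, and the sample data and the None-on-zero-duration behaviour are assumptions.

```python
from typing import Any, Dict, List, Optional


def get_issues_for_table(report: Dict[str, Any], table_name: str) -> List[Dict[str, Any]]:
    """Collect every issue for one table across all modules and rules."""
    return [
        issue
        for module_log in report["module_logs"]
        for rule_result in module_log["rule_results"]
        for issue in rule_result["issues_found"]
        if issue["table_name"] == table_name
    ]


def processing_speed(total_records_processed: int, duration: Optional[float]) -> Optional[float]:
    """Records per second, or None when the duration is unknown or zero (assumed)."""
    if not duration:
        return None
    return total_records_processed / duration


# Made-up report data shaped like the documented attributes.
report = {
    "module_logs": [
        {"rule_results": [
            {"issues_found": [
                {"table_name": "orders", "description": "missing order_date"},
                {"table_name": "customers", "description": "duplicate email"},
            ]},
        ]},
        {"rule_results": [
            {"issues_found": [
                {"table_name": "orders", "description": "negative quantity"},
            ]},
        ]},
    ],
}

order_issues = get_issues_for_table(report, "orders")
```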

Functions

dataclass.logs.create_pipeline_execution_report(pipeline_name: str, pipeline_title: str | None = None, pipeline_config: dict[str, Any] | None = None) → PipelineExecutionReport: Create a new pipeline execution report.
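
The factory's body is not shown here, so the following is only a guess at what a function with this signature might do: fill in an id and start time, mark the report as running, and replace a None configuration with an empty dictionary. `ReportStandIn` and `create_report_stand_in` are hypothetical standard-library stand-ins, not the real model or function.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, Optional
from uuid import UUID, uuid4


@dataclass
class ReportStandIn:
    """Hypothetical stand-in with a few of the PipelineExecutionReport fields."""
    pipeline_name: str
    pipeline_title: Optional[str] = None
    pipeline_config: Dict[str, Any] = field(default_factory=dict)
    id: UUID = field(default_factory=uuid4)
    start_time: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    status: str = "running"  # assumed initial status


def create_report_stand_in(
    pipeline_name: str,
    pipeline_title: Optional[str] = None,
    pipeline_config: Optional[Dict[str, Any]] = None,
) -> ReportStandIn:
    """Build a fresh report; a None config becomes a new empty dict (assumed)."""
    return ReportStandIn(
        pipeline_name=pipeline_name,
        pipeline_title=pipeline_title,
        # Copy so the report never shares a mutable dict with the caller.
        pipeline_config=dict(pipeline_config or {}),
    )


first = create_report_stand_in("nightly_clean")
second = create_report_stand_in("nightly_clean", pipeline_config={"strict": True})
```

Using None as the default and building the dictionary inside the function avoids the usual Python pitfall of a mutable default argument shared between calls.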