dataclass.logs

Data cleaning and pipeline execution logging structures.

This module provides Pydantic models for structured logging of data cleaning operations. The models are designed to be consumed directly by web frontends that display cleaning results.

Classes

class dataclass.logs.LogLevel

Log level enumeration for consistent severity classification.

Inherits from:

str, Enum

Attributes:

DEBUG = 'debug'
INFO = 'info'
WARNING = 'warning'
ERROR = 'error'
CRITICAL = 'critical'
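
Because LogLevel inherits from both str and Enum, its members behave like plain strings in comparisons and JSON output. The snippet below is a minimal sketch of that behavior; it redefines a stand-in enum with the documented members so it runs without importing the module.

```python
import json
from enum import Enum


class LogLevel(str, Enum):
    """Stand-in that mirrors the documented members of dataclass.logs.LogLevel."""
    DEBUG = "debug"
    INFO = "info"
    WARNING = "warning"
    ERROR = "error"
    CRITICAL = "critical"


# The str mixin makes members compare equal to their string values.
is_equal = LogLevel.ERROR == "error"

# Lookup by value turns a string from a payload back into a member.
parsed = LogLevel("warning")

# json.dumps treats str-mixin members as ordinary strings.
payload = json.dumps({"severity": LogLevel.CRITICAL})
```

The same pattern should apply to ActionType and IssueType, which are documented with the same base classes.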

class dataclass.logs.ActionType

Types of actions that can be performed during data cleaning.

Inherits from:

str, Enum

Attributes:

NO_ACTION = 'no_action'
REMOVE_ROWS = 'remove_rows'
REMOVE_COLUMNS = 'remove_columns'
FILL_VALUES = 'fill_values'
CONVERT_TYPES = 'convert_types'
VALIDATE_DATA = 'validate_data'
FILTER_DATA = 'filter_data'
TRANSFORM_DATA = 'transform_data'

class dataclass.logs.IssueType

Types of data quality issues that can be detected.

Inherits from:

str, Enum

Attributes:

MISSING_VALUES = 'missing_values'
INVALID_TYPE = 'invalid_type'
OUT_OF_RANGE = 'out_of_range'
DUPLICATE_VALUES = 'duplicate_values'
INCONSISTENT_FORMAT = 'inconsistent_format'
CONSTRAINT_VIOLATION = 'constraint_violation'
DATA_ANOMALY = 'data_anomaly'

class dataclass.logs.IssueDetail

Detailed information about a specific data quality issue.

Inherits from:

BaseModel

Properties:

affected_percentage

Calculate the percentage of affected records.

severity_score

Convert severity to numeric score for sorting (higher = more severe).

Attributes:

id: UUID4
issue_type: IssueType
severity: LogLevel
table_name: str | None
column_name: str | None
row_indices: list[int] | None
description: str
detected_value: Any
expected_value: Any
affected_count: int
total_count: int
rule_name: str | None
context: dict[str, Any]
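
The two properties can be sketched as follows. This is not the module's implementation: IssueSketch is a hypothetical stdlib dataclass carrying a subset of the documented fields, the numeric ranking is assumed (the reference only states that higher means more severe), and returning 0.0 for an empty table is also an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class LogLevel(str, Enum):
    DEBUG = "debug"
    INFO = "info"
    WARNING = "warning"
    ERROR = "error"
    CRITICAL = "critical"


# Assumed ranking; only the ordering matters for sorting.
_SEVERITY_RANK = {
    LogLevel.DEBUG: 1,
    LogLevel.INFO: 2,
    LogLevel.WARNING: 3,
    LogLevel.ERROR: 4,
    LogLevel.CRITICAL: 5,
}


@dataclass
class IssueSketch:
    """Hypothetical stand-in with a subset of the IssueDetail fields."""
    severity: LogLevel
    description: str
    affected_count: int
    total_count: int

    @property
    def affected_percentage(self) -> float:
        # Assumption: report 0.0 instead of dividing by zero on empty data.
        if self.total_count == 0:
            return 0.0
        return 100.0 * self.affected_count / self.total_count

    @property
    def severity_score(self) -> int:
        return _SEVERITY_RANK[self.severity]


issues = [
    IssueSketch(LogLevel.WARNING, "missing ages", 12, 200),
    IssueSketch(LogLevel.CRITICAL, "duplicated primary key", 3, 200),
    IssueSketch(LogLevel.INFO, "mixed date formats", 40, 200),
]

# Sort so the most severe issue comes first, as a dashboard might.
most_severe_first = sorted(issues, key=lambda i: i.severity_score, reverse=True)
```

A numeric score keeps the sort independent of the alphabetical order of the level names, which would otherwise place 'critical' before 'debug'.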

class dataclass.logs.ActionDetail

Detailed information about an action taken during data cleaning.

Inherits from:

BaseModel

Properties:

records_changed

Calculate the number of records changed.

columns_changed

Calculate the number of columns changed.

Attributes:

id: UUID4
action_type: ActionType
timestamp: datetime
description: str
table_name: str | None
column_names: list[str] | None
records_before: int
records_after: int
columns_before: int
columns_after: int
parameters: dict[str, Any]
success: bool
error_message: str | None
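
One plausible reading of records_changed and columns_changed is the absolute difference between the before and after counts. The sketch below assumes that reading; ActionSketch is a hypothetical stand-in, and the real properties may be signed or computed differently.

```python
from dataclasses import dataclass


@dataclass
class ActionSketch:
    """Hypothetical stand-in with the count fields of ActionDetail."""
    description: str
    records_before: int
    records_after: int
    columns_before: int
    columns_after: int

    @property
    def records_changed(self) -> int:
        # Assumption: magnitude of the change, regardless of direction.
        return abs(self.records_before - self.records_after)

    @property
    def columns_changed(self) -> int:
        return abs(self.columns_before - self.columns_after)


# A row-removal action that leaves the column count untouched.
action = ActionSketch(
    description="drop rows with a missing id",
    records_before=1000,
    records_after=950,
    columns_before=12,
    columns_after=12,
)
```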

class dataclass.logs.RuleExecutionResult

Result of executing a single cleaning rule.

Inherits from:

BaseModel

Properties:

total_issues

Total number of issues found by this rule.

total_actions

Total number of actions taken by this rule.

critical_issues

Get only critical and error-level issues.

Attributes:

rule_name: str
rule_description: str | None
execution_time: datetime
duration_ms: float | None
status: Literal['success', 'failed', 'skipped']
enabled: bool
error_message: str | None
issues_found: list[IssueDetail]
actions_taken: list[ActionDetail]
records_processed: int
tables_processed: list[str]
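
The derived properties read naturally as counts and filters over issues_found and actions_taken. The following sketch assumes exactly that; RuleResultSketch and IssueSketch are hypothetical stand-ins, actions are simplified to strings, and the rule name is invented for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class LogLevel(str, Enum):
    DEBUG = "debug"
    INFO = "info"
    WARNING = "warning"
    ERROR = "error"
    CRITICAL = "critical"


@dataclass
class IssueSketch:
    severity: LogLevel
    description: str


@dataclass
class RuleResultSketch:
    """Hypothetical stand-in for RuleExecutionResult."""
    rule_name: str
    status: str = "success"  # documented values: success, failed, skipped
    issues_found: list = field(default_factory=list)
    actions_taken: list = field(default_factory=list)  # simplified to strings

    @property
    def total_issues(self) -> int:
        return len(self.issues_found)

    @property
    def total_actions(self) -> int:
        return len(self.actions_taken)

    @property
    def critical_issues(self) -> list:
        # Keeps error-level and critical-level issues, per the description.
        wanted = (LogLevel.ERROR, LogLevel.CRITICAL)
        return [i for i in self.issues_found if i.severity in wanted]


result = RuleResultSketch(
    rule_name="no_null_ids",  # hypothetical rule name
    issues_found=[
        IssueSketch(LogLevel.INFO, "3 trailing spaces trimmed"),
        IssueSketch(LogLevel.ERROR, "17 rows without an id"),
        IssueSketch(LogLevel.CRITICAL, "id column is not unique"),
    ],
    actions_taken=["remove_rows"],
)
```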

class dataclass.logs.TableProcessingSummary

Summary of processing for a single table.

Inherits from:

BaseModel

Properties:

rows_changed

Number of rows changed.

columns_changed

Number of columns changed.

change_percentage

Percentage of data changed.

Attributes:

table_name: str
original_shape: tuple[int, int]
final_shape: tuple[int, int]
total_issues: int
issues_by_type: dict[IssueType, int]
actions_performed: list[ActionType]
data_quality_score: float | None
completeness_score: float | None
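
The reference does not say how change_percentage is computed. The sketch below makes two assumptions that may not match the module: shapes are (rows, columns) tuples, and the percentage is the number of changed rows relative to the original row count. TableSummarySketch is a hypothetical stand-in.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class TableSummarySketch:
    """Hypothetical stand-in with the shape fields of TableProcessingSummary."""
    table_name: str
    original_shape: Tuple[int, int]  # assumed order: (rows, columns)
    final_shape: Tuple[int, int]

    @property
    def rows_changed(self) -> int:
        return abs(self.original_shape[0] - self.final_shape[0])

    @property
    def columns_changed(self) -> int:
        return abs(self.original_shape[1] - self.final_shape[1])

    @property
    def change_percentage(self) -> float:
        # Assumption: changed rows as a share of the original rows.
        original_rows = self.original_shape[0]
        if original_rows == 0:
            return 0.0
        return 100.0 * self.rows_changed / original_rows


# 50 rows and 1 column were removed from a 1000 x 12 table.
summary = TableSummarySketch("customers", (1000, 12), (950, 11))
```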

class dataclass.logs.ModuleExecutionLog

Log entry for a single module execution.

Inherits from:

BaseModel

Properties:

duration_seconds

Calculate execution duration in seconds.

success_rate

Calculate the success rate of rule executions.

critical_issues

Get all critical issues from all rules.

Attributes:

id: UUID4
module_name: str
module_class: str
start_time: datetime
end_time: datetime | None
status: Literal['running', 'completed', 'failed', 'skipped']
configuration: dict[str, Any]
input_data_info: dict[str, Any]
rule_results: list[RuleExecutionResult]
table_summaries: list[TableProcessingSummary]
total_records_processed: int
total_issues_found: int
total_actions_taken: int
error_message: str | None
stack_trace: str | None
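
Since end_time may be None while a module is still running, duration_seconds presumably has to handle that case. The sketch below assumes it returns None until the module finishes and that success_rate is the fraction (0 to 1) of rules whose status is 'success'; the real property may use a percentage or different handling. ModuleLogSketch is a hypothetical stand-in that stores rule statuses as plain strings.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class ModuleLogSketch:
    """Hypothetical stand-in for ModuleExecutionLog."""
    module_name: str
    start_time: datetime
    end_time: Optional[datetime] = None
    rule_statuses: List[str] = field(default_factory=list)

    @property
    def duration_seconds(self) -> Optional[float]:
        # Assumption: no duration is reported while the module is running.
        if self.end_time is None:
            return None
        return (self.end_time - self.start_time).total_seconds()

    @property
    def success_rate(self) -> float:
        # Assumption: a fraction in [0, 1]; 0.0 when no rules ran.
        if not self.rule_statuses:
            return 0.0
        succeeded = sum(1 for s in self.rule_statuses if s == "success")
        return succeeded / len(self.rule_statuses)


start = datetime(2024, 1, 1, 12, 0, 0)
finished = ModuleLogSketch(
    module_name="missing_values",  # hypothetical module name
    start_time=start,
    end_time=start + timedelta(seconds=2.5),
    rule_statuses=["success", "success", "failed", "skipped"],
)
still_running = ModuleLogSketch(module_name="type_checks", start_time=start)
```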

class dataclass.logs.PipelineExecutionReport

Complete report for a pipeline execution with data cleaning operations.

Inherits from:

BaseModel

Methods:

get_module_log(module_name: str) -> ModuleExecutionLog | None

Get log for a specific module.

get_issues_for_table(table_name: str) -> list[IssueDetail]

Get all issues for a specific table.

generate_frontend_summary() -> dict[str, Any]

Generate a summary optimized for frontend dashboard display.

Properties:

duration_seconds

Calculate total execution duration in seconds.

success_rate

Calculate overall module success rate.

all_critical_issues

Get all critical issues from all modules.

issues_by_severity

Count issues by severity level.

processing_speed

Calculate records processed per second.

Attributes:

id: UUID4
pipeline_name: str
pipeline_title: str | None
start_time: datetime
end_time: datetime | None
status: Literal['running', 'completed', 'failed', 'cancelled']
module_logs: list[ModuleExecutionLog]
total_modules: int
successful_modules: int
total_records_processed: int
total_issues_found: int
total_actions_taken: int
overall_quality_score: float | None
quality_improvement: float | None
pipeline_config: dict[str, Any]
environment_info: dict[str, Any]
summary: dict[str, Any]
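
The lookup methods and aggregate properties can be approximated with a simplified structure. In the sketch below, issues hang directly off each module log instead of being nested under rule results, duration_seconds is a stored field instead of a computed property, and processing_speed is assumed to return None when no duration is available. All class names are hypothetical stand-ins.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class IssueSketch:
    severity: str  # one of the LogLevel values
    table_name: str


@dataclass
class ModuleLogSketch:
    module_name: str
    issues: List[IssueSketch] = field(default_factory=list)


@dataclass
class ReportSketch:
    """Hypothetical, flattened stand-in for PipelineExecutionReport."""
    pipeline_name: str
    module_logs: List[ModuleLogSketch] = field(default_factory=list)
    total_records_processed: int = 0
    duration_seconds: Optional[float] = None

    def _all_issues(self) -> List[IssueSketch]:
        return [i for m in self.module_logs for i in m.issues]

    def get_module_log(self, module_name: str) -> Optional[ModuleLogSketch]:
        # First log with a matching name, or None if the module never ran.
        return next((m for m in self.module_logs if m.module_name == module_name), None)

    def get_issues_for_table(self, table_name: str) -> List[IssueSketch]:
        return [i for i in self._all_issues() if i.table_name == table_name]

    @property
    def issues_by_severity(self) -> Dict[str, int]:
        return dict(Counter(i.severity for i in self._all_issues()))

    @property
    def processing_speed(self) -> Optional[float]:
        # Assumption: undefined (None) for a missing or zero duration.
        if not self.duration_seconds:
            return None
        return self.total_records_processed / self.duration_seconds


report = ReportSketch(
    pipeline_name="nightly_clean",  # hypothetical pipeline name
    module_logs=[
        ModuleLogSketch("dedupe", [IssueSketch("error", "orders")]),
        ModuleLogSketch("types", [IssueSketch("error", "customers"),
                                  IssueSketch("warning", "orders")]),
    ],
    total_records_processed=1000,
    duration_seconds=4.0,
)
```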

Functions

dataclass.logs.create_pipeline_execution_report(pipeline_name: str, pipeline_title: str | None = None, pipeline_config: dict[str, Any] | None = None) -> PipelineExecutionReport

Create a new pipeline execution report.
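
A factory with this signature typically fills in the fields a caller should not have to supply. The sketch below shows one way that could look; it is an assumption, not the module's code. ReportSketch and create_report_sketch are hypothetical names, and the initial 'running' status, UTC start time, and empty-dict default for a missing config are all guesses.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, Optional
from uuid import UUID, uuid4


@dataclass
class ReportSketch:
    """Hypothetical stand-in with a few PipelineExecutionReport fields."""
    pipeline_name: str
    pipeline_title: Optional[str] = None
    pipeline_config: Dict[str, Any] = field(default_factory=dict)
    id: UUID = field(default_factory=uuid4)
    start_time: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    status: str = "running"  # assumed initial status


def create_report_sketch(
    pipeline_name: str,
    pipeline_title: Optional[str] = None,
    pipeline_config: Optional[Dict[str, Any]] = None,
) -> ReportSketch:
    # Copy the config so later changes by the caller do not alter the report.
    return ReportSketch(
        pipeline_name=pipeline_name,
        pipeline_title=pipeline_title,
        pipeline_config=dict(pipeline_config or {}),
    )


first = create_report_sketch("nightly_clean")  # hypothetical pipeline name
second = create_report_sketch("nightly_clean", pipeline_config={"strict": True})
```

Accepting None for pipeline_config and replacing it inside the function avoids sharing one mutable default dict across reports.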