pg-logstats Architecture¶
This document provides a comprehensive overview of the pg-logstats system architecture, module responsibilities, data flow, and extension points for future development.
Table of Contents¶
- System Overview
- Module Architecture
- Data Flow
- Core Components
- Extension Points
- Performance Considerations
- Error Handling Strategy
- Testing Architecture
System Overview¶
pg-logstats is designed as a modular, extensible PostgreSQL log analysis tool built in Rust. The architecture follows a pipeline pattern where data flows through distinct stages: discovery → parsing → analysis → output.
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ File Discovery│───▶│ Log Parsing │───▶│ Analytics │───▶│ Output │
│ │ │ │ │ │ │ │
│ • Directory scan│ │ • Format detect │ │ • Query classify│ │ • JSON format │
│ • File filtering│ │ • Line parsing │ │ • Performance │ │ • Text format │
│ • Validation │ │ • Error recovery│ │ • Aggregation │ │ • Progress │
└─────────────────┘ └─────────────────┘ └─────────────────┘ └─────────────────┘
Design Principles¶
- Modularity: Each component has a single responsibility and clear interfaces
- Extensibility: New parsers, analyzers, and output formats can be added easily
- Performance: Memory-efficient processing with sampling for large files
- Reliability: Comprehensive error handling and graceful degradation
- Testability: Each module is independently testable with comprehensive coverage
Module Architecture¶
Core Library Structure¶
src/
├── lib.rs # Public API and core types
├── main.rs # CLI interface and orchestration
├── parsers/ # Log format parsing
│ ├── mod.rs # Parser trait and registry
│ └── stderr.rs # PostgreSQL stderr format parser
├── analytics/ # Data analysis and metrics
│ ├── mod.rs # Analysis orchestration
│ ├── queries.rs # Query classification and analysis
│ └── timing.rs # Performance metrics calculation
└── output/ # Result formatting and display
    ├── mod.rs # Output trait and registry
    ├── json.rs # JSON output formatter
    └── text.rs # Human-readable text formatter
Module Responsibilities¶
src/lib.rs - Core Types and Public API¶
- Purpose: Defines core data structures and public API
- Key Types:
  - LogEntry: Represents a parsed log entry
  - AnalysisResult: Contains analysis results and metrics
  - PgLogstatsError: Unified error type for the entire system
  - LogLevel: PostgreSQL log levels (DEBUG, INFO, NOTICE, WARNING, ERROR, FATAL, PANIC)
- Responsibilities:
  - Public API surface
  - Core data structure definitions
  - Error type definitions
  - Module re-exports
src/main.rs - CLI Interface and Orchestration¶
- Purpose: Command-line interface and workflow orchestration
- Key Components:
  - Arguments: CLI argument parsing with clap
  - main(): Application entry point and workflow coordination
  - Progress indication and user feedback
- Responsibilities:
  - CLI argument parsing and validation
  - File discovery and validation
  - Workflow orchestration
  - Progress reporting
  - Error handling and user feedback
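As an illustration of the file discovery responsibility, the following is a minimal sketch that uses only the standard library. The function name `discover_log_files`, the `.log` extension filter, and the sort-by-path ordering are assumptions made for this example; the actual discovery and prioritization logic in `main.rs` may differ.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Collect regular files in `dir` whose extension is `log`, sorted by path.
/// Hypothetical helper: the extension filter and ordering are assumptions.
pub fn discover_log_files(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut files = Vec::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        let is_log = path.extension().and_then(|ext| ext.to_str()) == Some("log");
        if path.is_file() && is_log {
            files.push(path);
        }
    }
    // Sorting gives a stable processing order across runs.
    files.sort();
    Ok(files)
}
```

A missing or unreadable directory surfaces as an `io::Error`, which the caller can report before any parsing starts.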
src/parsers/ - Log Format Parsing¶
- Purpose: Parse various PostgreSQL log formats into structured data
- Architecture:
- Current Implementations:
  - TextLogParser: text log parser for the supported default prefix and Amazon RDS %t:%r:%u@%d:[%p]: logs
- Responsibilities:
  - Format detection and validation
  - Line-by-line parsing with error recovery
  - Timestamp parsing and normalization
  - Multi-line statement handling
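Multi-line statement handling can be sketched as a grouping pass over raw lines. This example assumes that a continuation line begins with a tab character, as PostgreSQL stderr logs typically write them; the function name `group_continuations` is hypothetical and not taken from the crate.

```rust
/// Merge continuation lines into the record they belong to.
/// Assumption: a line starting with a tab continues the previous record.
pub fn group_continuations<'a, I>(lines: I) -> Vec<String>
where
    I: IntoIterator<Item = &'a str>,
{
    let mut records: Vec<String> = Vec::new();
    for line in lines {
        match records.last_mut() {
            // Append the continuation, separated by a single space.
            Some(last) if line.starts_with('\t') => {
                last.push(' ');
                last.push_str(line.trim_start());
            }
            // Anything else starts a new record.
            _ => records.push(line.to_string()),
        }
    }
    records
}
```

Each returned string is one logical record, so a later stage can parse the prefix and the full statement together.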
src/analytics/ - Data Analysis and Metrics¶
- Purpose: Analyze parsed log data and generate insights
- Key Components:
  - queries.rs: Query classification and normalization
  - timing.rs: Performance metrics and statistical analysis
- Responsibilities:
  - Query type classification (SELECT, INSERT, UPDATE, DELETE, DDL, OTHER)
  - Query normalization for pattern analysis
  - Performance metrics calculation (percentiles, averages)
  - Slow query detection and analysis
  - Frequency analysis and aggregation
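To make classification and normalization concrete, here is a small sketch. The `QueryType` enum, the keyword list treated as DDL, and the literal-replacement rules are assumptions for illustration; `queries.rs` may use different rules (for example, regex-based ones).

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum QueryType { Select, Insert, Update, Delete, Ddl, Other }

/// Classify a statement by its first keyword (case-insensitive).
pub fn classify(sql: &str) -> QueryType {
    let first = sql.split_whitespace().next().unwrap_or("").to_ascii_uppercase();
    match first.as_str() {
        "SELECT" => QueryType::Select,
        "INSERT" => QueryType::Insert,
        "UPDATE" => QueryType::Update,
        "DELETE" => QueryType::Delete,
        "CREATE" | "ALTER" | "DROP" | "TRUNCATE" => QueryType::Ddl,
        _ => QueryType::Other,
    }
}

/// Replace string and numeric literals with `?` and collapse whitespace,
/// so that queries differing only in constants share one pattern.
pub fn normalize(sql: &str) -> String {
    let mut out = String::with_capacity(sql.len());
    let mut chars = sql.chars().peekable();
    let mut prev_ident = false; // previous char was part of an identifier
    while let Some(c) = chars.next() {
        if c == '\'' {
            // Skip to the closing quote; a doubled quote is an escaped quote.
            loop {
                match chars.next() {
                    Some('\'') if chars.peek() == Some(&'\'') => { chars.next(); }
                    Some('\'') | None => break,
                    Some(_) => {}
                }
            }
            out.push('?');
            prev_ident = false;
        } else if c.is_ascii_digit() && !prev_ident {
            // Consume the rest of the numeric literal.
            while matches!(chars.peek(), Some(d) if d.is_ascii_digit() || *d == '.') {
                chars.next();
            }
            out.push('?');
            prev_ident = false;
        } else if c.is_whitespace() {
            if !out.is_empty() && !out.ends_with(' ') {
                out.push(' ');
            }
            prev_ident = false;
        } else {
            out.push(c);
            prev_ident = c.is_alphanumeric() || c == '_' || c == '$';
        }
    }
    out.trim_end().to_string()
}
```

Normalized text can then serve as a `HashMap` key for frequency analysis, since all executions of the same query shape collapse to one string.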
src/output/ - Result Formatting¶
- Purpose: Format analysis results for different output targets
- Architecture:
- Current Implementations:
  - JsonFormatter: Machine-readable JSON output
  - TextFormatter: Human-readable text output
- Responsibilities:
  - Result serialization and formatting
  - Quick mode vs. detailed output
  - Progress indication during output
Data Flow¶
1. Initialization Phase¶
- Parse and validate command-line arguments
- Check log directory existence and permissions
- Initialize configuration and progress tracking
2. Discovery Phase¶
- Scan specified directory for log files
- Filter files by extension and readability
- Validate file formats and warn about issues
- Generate prioritized file processing list
3. Parsing Phase¶
- Detect log format for each file
- Fetch bounded CloudWatch Logs windows through the optional AWS SDK feature when requested
- Parse lines with error recovery
- Handle multi-line statements and continuations
- Generate stream of structured LogEntry objects
4. Analysis Phase¶
- Classify queries by type and complexity
- Calculate performance metrics and statistics
- Detect patterns and anomalies
- Generate comprehensive analysis results
5. Output Phase¶
- Format results according to specified output format
- Handle quick mode vs. detailed output
- Write to stdout or specified output file
Core Components¶
LogEntry Structure¶
use chrono::{DateTime, Utc};

/// A single parsed log entry.
pub struct LogEntry {
    pub timestamp: DateTime<Utc>,
    pub level: LogLevel,
    pub message: String,
    pub query: Option<String>,
    pub duration: Option<f64>,
    pub connection_id: Option<String>,
    pub database: Option<String>,
    pub user: Option<String>,
}
AnalysisResult Structure¶
use chrono::{DateTime, Utc};
use std::collections::HashMap;

/// Analysis results and metrics for a set of log entries.
pub struct AnalysisResult {
    pub total_entries: usize,
    pub query_types: HashMap<String, usize>,
    pub performance_metrics: PerformanceMetrics,
    pub slow_queries: Vec<SlowQuery>,
    pub error_summary: ErrorSummary,
    pub time_range: (DateTime<Utc>, DateTime<Utc>),
}
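The percentile figures that feed the performance metrics could be computed along the following lines. This is a sketch using the nearest-rank method, which is an assumption; the function name `percentile` is hypothetical and `timing.rs` may interpolate or use a different definition.

```rust
/// Nearest-rank percentile of `values` for `p` in 0..=100.
/// Returns `None` when there is no data.
pub fn percentile(values: &[f64], p: f64) -> Option<f64> {
    if values.is_empty() {
        return None;
    }
    let mut sorted = values.to_vec();
    // total_cmp gives a total order over f64, so sorting never panics.
    sorted.sort_by(|a, b| a.total_cmp(b));
    // Rank is 1-based: the smallest value whose rank covers p percent.
    let rank = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
    let index = rank.clamp(1, sorted.len()) - 1;
    Some(sorted[index])
}
```

For the durations `[10.0, 20.0, 30.0, 40.0, 50.0]`, this returns `30.0` for the 50th percentile and `50.0` for the 95th.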
Error Handling¶
use thiserror::Error;

/// Unified error type for the entire system.
#[derive(Error, Debug)]
pub enum PgLogstatsError {
    #[error("IO error: {0}")]
    Io(#[from] std::io::Error),
    #[error("Parse error: {0}")]
    Parse(String),
    #[error("Invalid argument: {0}")]
    InvalidArgument(String),
    #[error("Configuration error: {0}")]
    Config(String),
}
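The `#[from]` attribute on the `Io` variant is what lets `?` convert an `std::io::Error` automatically. The sketch below shows the same mechanism with hand-written `Display` and `From` implementations so that it compiles without the `thiserror` crate. It is a trimmed stand-in with two variants only, and `count_lines` is a hypothetical helper.

```rust
use std::fmt;
use std::fs;
use std::io;

/// Trimmed stand-in for the crate's error type (two variants only).
#[derive(Debug)]
pub enum PgLogstatsError {
    Io(io::Error),
    Parse(String),
}

impl fmt::Display for PgLogstatsError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            PgLogstatsError::Io(e) => write!(f, "IO error: {e}"),
            PgLogstatsError::Parse(msg) => write!(f, "Parse error: {msg}"),
        }
    }
}

impl std::error::Error for PgLogstatsError {}

// Equivalent of `#[from]`: enables `?` on io::Result values.
impl From<io::Error> for PgLogstatsError {
    fn from(e: io::Error) -> Self {
        PgLogstatsError::Io(e)
    }
}

/// Hypothetical helper: count the lines of a file, rejecting empty files.
pub fn count_lines(path: &str) -> Result<usize, PgLogstatsError> {
    let text = fs::read_to_string(path)?; // io::Error becomes PgLogstatsError::Io
    if text.is_empty() {
        return Err(PgLogstatsError::Parse(format!("{path} is empty")));
    }
    Ok(text.lines().count())
}
```

Callers deal with a single error type regardless of whether the failure came from the filesystem or from parsing.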
Extension Points¶
Adding New Log Parsers¶
- Implement the LogParser trait:
- Register in parser module:
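The two steps above might look like the following sketch. The trait signature shown here is an assumption, since the actual `LogParser` definition is not reproduced in this document; `ParsedLine`, `PipeParser`, `ParserRegistry`, and the `LEVEL|message` format are all made up for illustration.

```rust
/// Minimal stand-in for a parsed record; the real crate uses `LogEntry`.
#[derive(Debug, PartialEq)]
pub struct ParsedLine {
    pub level: String,
    pub message: String,
}

/// Hypothetical parser interface (not the crate's actual trait).
pub trait LogParser {
    /// Short identifier used for diagnostics.
    fn name(&self) -> &'static str;
    /// Cheap check on a sample line to decide whether this parser applies.
    fn can_parse(&self, sample: &str) -> bool;
    /// Parse one line, returning a description of the failure on error.
    fn parse_line(&self, line: &str) -> Result<ParsedLine, String>;
}

/// Illustrative parser for a made-up `LEVEL|message` format.
pub struct PipeParser;

impl LogParser for PipeParser {
    fn name(&self) -> &'static str {
        "pipe"
    }
    fn can_parse(&self, sample: &str) -> bool {
        sample.contains('|')
    }
    fn parse_line(&self, line: &str) -> Result<ParsedLine, String> {
        let (level, message) = line
            .split_once('|')
            .ok_or_else(|| format!("missing '|' separator: {line}"))?;
        Ok(ParsedLine {
            level: level.trim().to_string(),
            message: message.trim().to_string(),
        })
    }
}

/// Registry that picks the first parser accepting a sample line.
#[derive(Default)]
pub struct ParserRegistry {
    parsers: Vec<Box<dyn LogParser>>,
}

impl ParserRegistry {
    pub fn register(&mut self, parser: Box<dyn LogParser>) {
        self.parsers.push(parser);
    }
    pub fn detect(&self, sample: &str) -> Option<&dyn LogParser> {
        self.parsers.iter().find(|p| p.can_parse(sample)).map(|p| p.as_ref())
    }
}
```

Because the registry stores trait objects, adding a format means writing one new type and one `register` call, without touching existing parsers.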
Adding New Analytics¶
- Extend AnalysisResult:
- Implement analysis functions:
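As a sketch of these two steps, the example below adds a per-database entry count. `Entry` and `ExtendedResult` are trimmed, hypothetical stand-ins for `LogEntry` and `AnalysisResult`, and the `entries_per_database` field is an invented example of a new metric, not an existing one.

```rust
use std::collections::HashMap;

/// Trimmed stand-in for `LogEntry`, keeping only the field used here.
pub struct Entry {
    pub database: Option<String>,
}

/// Stand-in for `AnalysisResult` extended with one hypothetical field.
#[derive(Debug, Default)]
pub struct ExtendedResult {
    pub total_entries: usize,
    /// New metric: number of entries seen per database name.
    pub entries_per_database: HashMap<String, usize>,
}

/// Analysis function that fills the new field in a single pass.
pub fn analyze_databases(entries: &[Entry]) -> ExtendedResult {
    let mut result = ExtendedResult::default();
    for entry in entries {
        result.total_entries += 1;
        // Entries without a database name count toward the total only.
        if let Some(db) = &entry.database {
            *result.entries_per_database.entry(db.clone()).or_insert(0) += 1;
        }
    }
    result
}
```

Keeping the new metric in its own field and function leaves existing analysis code and output formatters unaffected until they opt in.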
Adding New Output Formats¶
- Implement OutputFormatter trait:
- Register in output module:
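A possible shape for these steps is sketched below. The `OutputFormatter` signature is an assumption because the actual trait is not reproduced in this document; `Summary`, `CsvFormatter`, and `formatter_for` are hypothetical names used only for this example.

```rust
/// Trimmed stand-in for `AnalysisResult` with two illustrative fields.
pub struct Summary {
    pub total_entries: usize,
    pub slow_queries: usize,
}

/// Hypothetical formatter interface (not the crate's actual trait).
pub trait OutputFormatter {
    fn format(&self, summary: &Summary) -> String;
}

/// Illustrative formatter that emits a header row and one data row.
pub struct CsvFormatter;

impl OutputFormatter for CsvFormatter {
    fn format(&self, summary: &Summary) -> String {
        format!(
            "total_entries,slow_queries\n{},{}\n",
            summary.total_entries, summary.slow_queries
        )
    }
}

/// Registration step: map a format name to a formatter instance.
pub fn formatter_for(name: &str) -> Option<Box<dyn OutputFormatter>> {
    match name {
        "csv" => Some(Box::new(CsvFormatter)),
        _ => None,
    }
}
```

Returning `None` for an unknown name lets the CLI layer turn the failure into an invalid-argument error with its own wording.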
Future Phase Extensions¶
Phase 2: Advanced Analytics¶
- Real-time monitoring: Stream processing capabilities
- Machine learning: Anomaly detection and pattern recognition
- Alerting: Threshold-based notifications
- Extension Point: src/analytics/realtime/ module
Phase 3: Multi-format Support¶
- CSV logs: Structured log parsing
- JSON logs: Native JSON log support
- Syslog: System log integration
- Extension Point: src/parsers/ additional implementations
Phase 4: Advanced Output¶
- Web dashboard: HTML/CSS/JS output
- Grafana integration: Metrics export
- Database export: Direct database insertion
- Extension Point: src/output/ additional formatters
Performance Considerations¶
Memory Management¶
- Streaming processing: Process files line-by-line to minimize memory usage
- Sampling: Use --sample-size for large files to limit memory consumption
- Lazy evaluation: Parse and analyze data on-demand
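Streaming and sampling can be combined in a single loop, as in this sketch. It assumes that sampling means "process at most the first N lines", which may not match the exact semantics of `--sample-size`; the function name `for_each_line` is hypothetical.

```rust
use std::io::{self, BufRead};

/// Feed lines to `handle` one at a time, stopping after `sample_size` lines
/// when a limit is given. Returns the number of lines processed.
pub fn for_each_line<R, F>(reader: R, sample_size: Option<usize>, mut handle: F) -> io::Result<usize>
where
    R: BufRead,
    F: FnMut(&str),
{
    let limit = sample_size.unwrap_or(usize::MAX);
    let mut seen = 0;
    // `lines()` reads lazily, so only one line is held in memory at a time.
    for line in reader.lines().take(limit) {
        handle(&line?);
        seen += 1;
    }
    Ok(seen)
}
```

In practice the reader would be a `BufReader` wrapped around a `File`; memory use then depends on the longest line rather than on the file size.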
Processing Optimization¶
- Regex compilation: Compile regex patterns once and reuse
- String interning: Reuse common strings to reduce allocations
- Parallel processing: Future enhancement for multi-file processing
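String interning, for instance, can be done with a small set of reference-counted strings. This is a minimal single-threaded sketch; the `Interner` type is hypothetical and the crate may intern strings differently or not at all.

```rust
use std::collections::HashSet;
use std::rc::Rc;

/// Hands out shared `Rc<str>` handles so that repeated values
/// (database names, user names) are stored only once.
#[derive(Default)]
pub struct Interner {
    seen: HashSet<Rc<str>>,
}

impl Interner {
    pub fn intern(&mut self, value: &str) -> Rc<str> {
        // Lookup by &str works because Rc<str> implements Borrow<str>.
        if let Some(existing) = self.seen.get(value) {
            return Rc::clone(existing);
        }
        let shared: Rc<str> = Rc::from(value);
        self.seen.insert(Rc::clone(&shared));
        shared
    }

    /// Number of distinct strings stored.
    pub fn len(&self) -> usize {
        self.seen.len()
    }
}
```

Cloning an `Rc<str>` only bumps a reference count, so a log with millions of entries from a few databases allocates each name once.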
Scalability Limits¶
- Single-threaded: Current implementation processes files sequentially
- Memory bounds: Large files are handled through sampling
- Disk I/O: Performance limited by disk read speed
Error Handling Strategy¶
Error Categories¶
- Recoverable Errors: Continue processing with warnings
  - Malformed log lines
  - Unparseable timestamps
  - Missing optional fields
- Fatal Errors: Stop processing with clear error messages
  - File not found
  - Permission denied
  - Invalid arguments
- Validation Errors: Prevent processing with helpful guidance
  - Invalid log directory
  - Unsupported file formats
  - Configuration conflicts
Error Recovery¶
- Line-level recovery: Skip malformed lines with warnings
- File-level recovery: Continue with next file on parse errors
- Graceful degradation: Provide partial results when possible
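Line-level recovery with partial results can be sketched as follows. The example extracts `duration: <n> ms` values and records a warning for every line that fails; the `ParseOutcome` type and both function names are hypothetical, and the real parser handles far more than durations.

```rust
/// Partial results plus the warnings collected along the way.
#[derive(Debug, Default)]
pub struct ParseOutcome {
    pub durations_ms: Vec<f64>,
    pub warnings: Vec<String>,
}

/// Extract the number following "duration: " from one line.
fn parse_duration_ms(line: &str) -> Result<f64, String> {
    let (_, rest) = line
        .split_once("duration: ")
        .ok_or_else(|| "no duration field".to_string())?;
    let number = rest
        .split_whitespace()
        .next()
        .ok_or_else(|| "empty duration".to_string())?;
    number
        .parse::<f64>()
        .map_err(|e| format!("bad duration {number:?}: {e}"))
}

/// Parse every line, skipping malformed ones instead of aborting.
pub fn parse_with_recovery(lines: &[&str]) -> ParseOutcome {
    let mut outcome = ParseOutcome::default();
    for (index, line) in lines.iter().enumerate() {
        match parse_duration_ms(line) {
            Ok(ms) => outcome.durations_ms.push(ms),
            // Record a 1-based line number so the user can find the line.
            Err(reason) => outcome.warnings.push(format!("line {}: {}", index + 1, reason)),
        }
    }
    outcome
}
```

The caller still receives every duration that parsed cleanly and can decide how many warnings to print.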
Testing Architecture¶
Test Organization¶
tests/
├── integration_tests.rs # End-to-end CLI testing
├── unit/ # Unit tests by module
│ ├── parser_tests.rs # Parser unit tests
│ ├── analytics_tests.rs # Analytics unit tests
│ └── output_tests.rs # Output formatter tests
├── test_data/ # Test data generation
│ └── mod.rs # Utilities for creating test data
└── README.md # Testing documentation
Testing Strategy¶
- Unit Tests: Test individual functions and modules in isolation
- Integration Tests: Test complete workflows and CLI interface
- Property-based Tests: Test invariants and edge cases
- Performance Tests: Validate performance characteristics
- Regression Tests: Prevent regressions in core functionality
Test Data Management¶
- Generated test data: Programmatically create various log scenarios
- Deterministic tests: Use fixed seeds for reproducible results
- Edge case coverage: Test boundary conditions and error cases
- Performance benchmarks: Measure and validate performance metrics
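Deterministic generated data can be produced with a small seeded generator, as sketched below. The linear congruential generator, the `generate_log_lines` helper, and the exact line layout are assumptions for illustration; the utilities in `tests/test_data/mod.rs` may be structured differently.

```rust
/// Tiny linear congruential generator: the same seed gives the same sequence.
pub struct Lcg(u64);

impl Lcg {
    pub fn new(seed: u64) -> Self {
        Lcg(seed)
    }

    pub fn next_u32(&mut self) -> u32 {
        // Constants from Knuth's MMIX generator; wrapping ops avoid overflow panics.
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        (self.0 >> 32) as u32
    }
}

/// Generate `count` synthetic log lines with pseudo-random durations.
pub fn generate_log_lines(seed: u64, count: usize) -> Vec<String> {
    let mut rng = Lcg::new(seed);
    (0..count)
        .map(|i| {
            // Duration between 0.00 and 999.99 ms.
            let duration = (rng.next_u32() % 100_000) as f64 / 100.0;
            format!(
                "2024-01-01 00:00:{:02} UTC [{}] LOG:  duration: {:.2} ms  statement: SELECT {}",
                i % 60,
                1000 + i,
                duration,
                i
            )
        })
        .collect()
}
```

Two calls with the same seed return identical lines, so a failing test can be reproduced from the seed alone.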
This architecture supports the current Phase 1 implementation while providing clear extension points for future phases. The modular design ensures that new features can be added without disrupting existing functionality, and the comprehensive testing strategy maintains reliability as the system evolves.