Discover Jobs

The Jobs feature in Discover provides comprehensive monitoring for background jobs, scheduled tasks, queue processing, and asynchronous operations across your application infrastructure. Track job execution performance, error rates, processing duration, and queue health to ensure reliable background processing and maintain optimal system performance.


This background job monitoring solution helps you identify processing bottlenecks, monitor queue backlogs, detect failed jobs, and optimize task execution across your entire job processing infrastructure.

Prerequisites

To monitor background jobs and scheduled tasks with Last9, configure the following integrations:

Required:

Traces: Distributed tracing data is mandatory for job discovery, execution tracking, and operation-level analysis. Configure OpenTelemetry or other tracing instrumentation for your job processing systems. See all traces integrations.

Optional but Recommended:

Logs: Application and job processing logs provide detailed execution context and error information. Configure log forwarding from your job runners and processing systems. See all logs integrations.
Infrastructure Metrics: Container and host metrics for job processing infrastructure. Set up Docker, Kubernetes, or cloud monitoring for resource tracking during job execution.

Without traces, Discover Jobs cannot discover jobs, track their execution, or show operation-level analysis. Logs and infrastructure metrics are optional; they add troubleshooting detail and operational context.

Understanding the Jobs Dashboard

Access the Jobs dashboard at Discover > Jobs in Last9.

Jobs Overview

The Jobs dashboard displays all monitored background jobs and scheduled tasks in your environment with key performance indicators. The dashboard provides two viewing modes controlled by the Group by Service toggle.

Default View (Ungrouped): The dashboard shows individual jobs with their performance metrics:
- Service: The service or application running the job
- Job Name: Specific job identifier or task name
- Queue Name: Job queue or processing system (when applicable)
- Throughput: Jobs processed per minute (RPM)
- Error Rate: Percentage of failed job executions
- Duration (P95): 95th percentile job execution time

Grouped View: When Group by Service is enabled, jobs are organized hierarchically by service, allowing you to:
- Expand/Collapse Services: Click the arrow icons to show or hide jobs within each service
- Job-Level Details: View individual job performance within the service context
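
To make the dashboard columns concrete, here is a minimal Python sketch of how throughput, error rate, and P95 duration could be derived per job, and how the same rows nest under services in the grouped view. It is not Last9's implementation. The record fields (`service`, `job`, `duration_ms`, `failed`), the two-minute window, and the nearest-rank percentile method are all assumptions made for illustration.

```python
import math
from collections import defaultdict

# Hypothetical execution records; field names are illustrative, not a Last9 schema.
EXECUTIONS = [
    {"service": "billing", "job": "SendInvoice", "duration_ms": 120, "failed": False},
    {"service": "billing", "job": "SendInvoice", "duration_ms": 340, "failed": True},
    {"service": "billing", "job": "SendInvoice", "duration_ms": 150, "failed": False},
    {"service": "billing", "job": "SendInvoice", "duration_ms": 110, "failed": False},
    {"service": "billing", "job": "RetryPayment", "duration_ms": 900, "failed": False},
    {"service": "search", "job": "Reindex", "duration_ms": 4000, "failed": False},
]
WINDOW_MINUTES = 2  # assumed length of the selected time range


def p95(values):
    """Nearest-rank P95: smallest value with at least 95% of samples at or below it."""
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]


def job_rows(executions, window_minutes):
    """Build one row per (service, job), mirroring the ungrouped view's columns."""
    by_job = defaultdict(list)
    for e in executions:
        by_job[(e["service"], e["job"])].append(e)
    rows = []
    for (service, job), runs in sorted(by_job.items()):
        failures = sum(1 for r in runs if r["failed"])
        rows.append({
            "service": service,
            "job": job,
            "throughput_per_min": len(runs) / window_minutes,
            "error_rate_pct": 100.0 * failures / len(runs),
            "p95_ms": p95([r["duration_ms"] for r in runs]),
        })
    return rows


def group_by_service(rows):
    """Nest job rows under their service, as the grouped view does."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["service"]].append(row)
    return dict(grouped)


rows = job_rows(EXECUTIONS, WINDOW_MINUTES)
grouped = group_by_service(rows)
for service, jobs in grouped.items():
    print(service)
    for j in jobs:
        print(f"  {j['job']}: {j['throughput_per_min']:.1f}/min, "
              f"{j['error_rate_pct']:.0f}% errors, P95 {j['p95_ms']} ms")
```

With only a handful of samples the P95 is simply the slowest run, which is why low-volume jobs can show volatile duration values.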

Use the sidebar filters to focus on specific job types or services:

Select filter categories from the left sidebar (process_runtime_name, process_runtime_version, telemetry_sdk_language)
Choose specific values to filter the jobs list
Click Apply Filters to update the view
Use Clear to reset all applied filters

Analyzing Individual Jobs

Click on any job name to access detailed performance monitoring and execution analysis.


Overview

The Overview tab provides comprehensive job performance dashboards with key execution metrics:

Performance Metrics:

Availability: Job execution success rate and reliability tracking
Response Time: Execution duration at the P50, P95, and P99 percentiles, plus the average (AVG)
Throughput & Error Rate: Job processing volume and failure rates over time
Error Distribution: Breakdown of error types and their frequency during job execution

Key Performance Analysis:

Top 10 Errors: Most frequent error types and their occurrence counts for prioritizing fixes
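
The Overview metrics can be illustrated with a short sketch. The Python below assumes a hypothetical list of `(duration_ms, error_type)` pairs for a single job and uses the standard library's `statistics.quantiles` with the inclusive method. The percentile method Last9 actually applies is not stated here, so treat the exact numbers as illustrative.

```python
import statistics
from collections import Counter

# Hypothetical runs of one job: (duration_ms, error_type); error_type is None on success.
RUNS = [
    (100, None), (120, None), (130, "TimeoutError"), (140, None), (150, None),
    (160, "TimeoutError"), (170, None), (180, "ConnectionError"), (190, None), (900, None),
]

durations = [d for d, _ in RUNS]
errors = [err for _, err in RUNS if err is not None]

# Availability: share of executions that finished without an error.
availability_pct = 100.0 * (len(RUNS) - len(errors)) / len(RUNS)

# quantiles(n=100) returns the 99 cut points P1..P99; index i-1 holds P(i).
cuts = statistics.quantiles(durations, n=100, method="inclusive")
p50, p95, p99 = cuts[49], cuts[94], cuts[98]
avg = statistics.fmean(durations)

# Top 10 errors by occurrence count.
top_errors = Counter(errors).most_common(10)

print(f"availability={availability_pct:.0f}% p50={p50} p95={p95} p99={p99} avg={avg}")
print(top_errors)
```

In this sample a single 900 ms run pulls the average well above the median, which is one reason to read AVG alongside the percentiles rather than on its own.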

Exceptions

Monitor job failures and execution errors:

Error Trend Visualization: Track error frequency over time with trend analysis for different error types
Exception Type Filtering: Filter by specific exception classes and error types that occur during job execution
Operation-Level Error Analysis: Identify which specific job operations are generating the most errors
Error Count Tracking: Monitor total error occurrences for each exception type to prioritize troubleshooting efforts
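
As a rough illustration of what the Exceptions views aggregate, the sketch below buckets hypothetical error events into one-minute intervals per exception type, then totals them by type and by operation. The event tuple layout, the operation names, and the 60-second bucket size are assumptions, not Last9's data model.

```python
from collections import Counter, defaultdict

# Hypothetical error events:
# (seconds since the start of the selected time range, exception type, operation).
ERROR_EVENTS = [
    (5, "TimeoutError", "POST /charge"),
    (30, "TimeoutError", "POST /charge"),
    (65, "KeyError", "parse_payload"),
    (70, "TimeoutError", "POST /charge"),
    (125, "TimeoutError", "SELECT orders"),
]
BUCKET_SECONDS = 60  # assumed chart resolution

# Error trend: per-bucket counts for each exception type.
trend = defaultdict(Counter)
for offset, exc_type, _operation in ERROR_EVENTS:
    bucket_start = offset - offset % BUCKET_SECONDS
    trend[bucket_start][exc_type] += 1

# Totals used to prioritize: by exception type and by operation.
errors_by_type = Counter(exc for _, exc, _ in ERROR_EVENTS)
errors_by_operation = Counter(op for _, _, op in ERROR_EVENTS)

for bucket_start in sorted(trend):
    print(bucket_start, dict(trend[bucket_start]))
print(errors_by_type.most_common())
print(errors_by_operation.most_common(1))
```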

Breakdown

Analyze job execution performance by individual operations. The Breakdown tab shows detailed operation-level metrics:

Response Time Visualization: Area chart showing P50, P95, P99, and AVG response times with color-coded percentile bands
Operation Performance Table: Detailed metrics for each job operation including:
- Operation Name: Specific job operation or database query
- Operation Type: Classification (Database, HTTP Client, Consumer, etc.)
- Avg. Calls/Transaction: Average number of operations per job execution
- Response Time (P95): 95th percentile execution time
- Total Time Spent: Cumulative time spent on the operation
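
The table columns above can be sketched from raw span data. The Python below is a simplified illustration under stated assumptions: each `trace_id` stands for one job execution, Avg. Calls/Transaction divides an operation's call count by the number of all executions in the window, and P95 uses the nearest-rank method. The span tuple layout is hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical spans: (trace_id, operation_name, operation_type, duration_ms).
# Each trace_id is assumed to represent one job execution.
SPANS = [
    ("t1", "SELECT orders", "Database", 20),
    ("t1", "SELECT orders", "Database", 30),
    ("t1", "POST /charge", "HTTP Client", 200),
    ("t2", "SELECT orders", "Database", 40),
    ("t2", "POST /charge", "HTTP Client", 400),
]


def p95(values):
    """Nearest-rank 95th percentile."""
    ordered = sorted(values)
    return ordered[math.ceil(0.95 * len(ordered)) - 1]


def breakdown(spans):
    """Aggregate spans into one row per operation, sorted by total time spent."""
    executions = {trace_id for trace_id, *_ in spans}
    per_operation = defaultdict(list)
    for _trace_id, name, op_type, duration_ms in spans:
        per_operation[(name, op_type)].append(duration_ms)
    table = [
        {
            "operation": name,
            "type": op_type,
            "avg_calls_per_transaction": len(durations) / len(executions),
            "p95_ms": p95(durations),
            "total_time_ms": sum(durations),
        }
        for (name, op_type), durations in per_operation.items()
    ]
    return sorted(table, key=lambda row: row["total_time_ms"], reverse=True)


table = breakdown(SPANS)
for row in table:
    print(row)
```

Sorting by total time spent surfaces operations that are individually fast but called many times per execution, which a P95 column alone would hide.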

Logs

Access job-specific logs for troubleshooting:

Pre-filtered Logs: Automatically filtered to the selected job with service context
Log Volume Indicator: Visual representation of log activity over time
Time Range Alignment: Logs correspond to the selected monitoring time window
Search and Filter: Use the search bar to find specific log entries or filter by attributes
Click on any log line to view more details

Traces

Examine distributed traces for job execution. The Traces tab provides detailed execution flow analysis:

Trace Filtering: Filter traces by service name, span name, and span kind (Consumer, Internal, etc.)
Execution Timeline: View job execution traces with start times, trace IDs, and duration
Operation Details: Examine specific operations and services involved in job execution
Click on any trace or span to view distributed tracing visualization and more details
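
The trace filters combine as a logical AND over span attributes. A minimal sketch of that idea, assuming a hypothetical `Span` record with the fields listed above (this is not an OpenTelemetry or Last9 type, and the sample values are made up):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical span record holding only the fields used for filtering."""
    trace_id: str
    service_name: str
    span_name: str
    span_kind: str
    duration_ms: int


SPANS = [
    Span("a1", "billing-worker", "SendInvoice", "Consumer", 340),
    Span("a1", "billing-worker", "render_pdf", "Internal", 90),
    Span("b2", "billing-worker", "SendInvoice", "Consumer", 120),
    Span("c3", "search-worker", "Reindex", "Consumer", 4000),
]


def filter_spans(spans, **criteria):
    """Keep spans whose attributes equal every given criterion."""
    return [
        span for span in spans
        if all(getattr(span, field) == value for field, value in criteria.items())
    ]


consumer_spans = filter_spans(SPANS, service_name="billing-worker", span_kind="Consumer")
slowest = max(consumer_spans, key=lambda span: span.duration_ms)
print([span.trace_id for span in consumer_spans], slowest.trace_id)
```

Narrowing to Consumer spans first and then sorting by duration is a quick way to find the slowest executions of a job worth opening in the trace view.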

Best Practices

Job Monitoring Strategy:

Focus on jobs with high error rates or extended execution times in the main dashboard
Use the grouped view to understand service-level job health and identify problematic services
Monitor both individual job performance and overall job processing throughput

Performance Optimization:

Use the Breakdown tab to identify slow operations within job execution
Monitor database queries and external API calls that may be bottlenecks
Track resource usage patterns to optimize job scheduling and concurrency
Analyze execution time trends to validate the impact of job optimizations

Troubleshooting Workflow:

Start with the Overview tab to identify performance anomalies and error spikes
Use the Exceptions tab to understand specific error patterns and their frequency
Examine the Breakdown tab for operation-level performance issues
Access Logs for detailed execution context and error messages
Use Traces to understand job execution flow and identify bottlenecks in distributed processing

Queue Management:

Monitor throughput trends to identify processing capacity issues
Track error rates to detect systemic problems with job processing
Use duration metrics to optimize job execution and resource allocation
Analyze job scheduling patterns to balance system load and processing efficiency

Troubleshooting

If you have any questions, please get in touch with us on Discord or by email.