Discover Services

The Services feature in Discover serves as Last9’s comprehensive Application Performance Monitoring (APM) solution, providing deep visibility into your application’s health, performance, and user experience. Monitor critical APM metrics including throughput, response times, error rates, and APDEX scores across all your services, with the ability to drill down into individual operations, traces, and dependencies.

This full-featured APM platform helps you proactively identify performance bottlenecks, track application health trends, and maintain optimal user experience across your entire service architecture.

Prerequisites

To fully utilize the Services APM capabilities, ensure you have the following integrations configured:

Required:

Traces: Distributed tracing data is mandatory for service discovery, dependency mapping, and operation-level analysis. Configure OpenTelemetry or other tracing instrumentation for your applications. See all traces integrations.
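
As a rough illustration of the tracing prerequisite, the sketch below builds the standard OpenTelemetry SDK environment variables for one service. The variable names come from the OpenTelemetry specification; the endpoint URL, the header name and value, and the `otel_env` helper are hypothetical placeholders, so take the real values from your traces integration page.

```python
from urllib.parse import quote


def otel_env(service_name, endpoint, headers, environment):
    """Build OpenTelemetry SDK environment variables for one service.

    `endpoint` and `headers` are placeholders (assumptions); use the
    values shown on your traces integration page.
    """
    # The OTLP headers variable is a comma-separated list of key=value
    # pairs; values are URL-encoded here so spaces survive parsing.
    header_str = ",".join(
        f"{key}={quote(value, safe='')}" for key, value in headers.items()
    )
    return {
        "OTEL_SERVICE_NAME": service_name,
        "OTEL_TRACES_EXPORTER": "otlp",
        "OTEL_EXPORTER_OTLP_ENDPOINT": endpoint,
        "OTEL_EXPORTER_OTLP_HEADERS": header_str,
        "OTEL_RESOURCE_ATTRIBUTES": f"deployment.environment={environment}",
    }


env = otel_env(
    "checkout",
    "https://otlp.example.com",            # hypothetical endpoint
    {"Authorization": "Basic abc"},        # hypothetical credentials
    "production",
)
for name, value in env.items():
    print(f"export {name}='{value}'")
```

With these variables exported, an application instrumented with an OpenTelemetry SDK or agent would typically pick them up at startup without code changes.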

Optional:

Logs: Application and infrastructure logs provide detailed troubleshooting context. Configure log forwarding from your applications and infrastructure. See all logs integrations.
Infrastructure Metrics: Container and host metrics enable infrastructure monitoring. Set up Docker or Kubernetes for container and host-level metric collection.
Process Metrics: Runtimes such as the JVM and Node.js expose process-level metrics that provide deep insight into application behavior. Configure runtime metrics via the OpenTelemetry SDK for Java and Node.js applications. See all Java integrations and all Node.js integrations.

Without traces, the Discover Services feature will have limited functionality. Logs and metrics enhance the experience but are not required for basic service monitoring and APM capabilities.

Understanding the Services Dashboard

Access the Services dashboard at Discover > Services in Last9.

The Services dashboard displays all monitored services in your environment with key performance indicators at a glance. Each service shows critical metrics including throughput (requests per minute), error rate percentage, availability, and response time.

The main dashboard includes:

Service Names: Organized by environment and runtime technology
Throughput: Requests per minute for each service
Error Rate: Percentage of failed requests
Availability: Service uptime percentage
Response Time: P95 response time in milliseconds
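
To make these indicators concrete, here is a minimal sketch of how throughput, error rate, and P95 response time could be derived from raw request records. The `Request` type, the nearest-rank percentile method, and the function names are assumptions for illustration; the source does not describe how Last9 computes these values internally.

```python
from dataclasses import dataclass


@dataclass
class Request:
    duration_ms: float
    failed: bool


def p95(durations):
    """Nearest-rank 95th percentile of a non-empty list."""
    ordered = sorted(durations)
    # ceil(0.95 * n) using integer arithmetic to avoid float rounding
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def summarize(requests, window_minutes):
    """Return dashboard-style indicators for one service and time window."""
    total = len(requests)
    if total == 0:
        return {"throughput_rpm": 0.0, "error_rate_pct": None, "p95_ms": None}
    errors = sum(1 for r in requests if r.failed)
    return {
        "throughput_rpm": total / window_minutes,
        "error_rate_pct": 100.0 * errors / total,
        "p95_ms": p95([r.duration_ms for r in requests]),
    }


# 20 requests over a 5-minute window, one of which failed
sample = [Request(duration_ms=10.0 * i, failed=(i == 7)) for i in range(1, 21)]
print(summarize(sample, window_minutes=5))
```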

Use the sidebar filters to narrow down your view:

Click on any filter category in the left sidebar
Select specific values to filter the services list
Click Apply Filters to update the view
Use Clear to reset all applied filters

Analyzing Individual Services

Click on any service name to access detailed performance analysis and monitoring capabilities.

Overview

The Overview tab provides a comprehensive performance dashboard with multiple visualization panels:

Performance Metrics Include:

APDEX Score: Application Performance Index showing user satisfaction
Response Time: P50, P95, and P99 percentiles plus the average (AVG), with alert threshold reference lines
Availability: Service uptime tracking
Throughput & Error Rate: Request volume and failure rates over time
Error Distribution: Breakdown of error types and their frequency
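
For readers unfamiliar with APDEX, the sketch below follows the standard Apdex definition: requests at or under a target threshold T count as satisfied, requests between T and 4T count as tolerating (half weight), and the rest count as frustrated. The 500 ms threshold is an assumption for the example; the source does not state which threshold Last9 uses or how failed requests are treated.

```python
def apdex(durations_ms, threshold_ms):
    """Standard Apdex: (satisfied + tolerating / 2) / total samples."""
    if not durations_ms:
        return None  # no samples, score is undefined
    satisfied = sum(1 for d in durations_ms if d <= threshold_ms)
    tolerating = sum(
        1 for d in durations_ms if threshold_ms < d <= 4 * threshold_ms
    )
    return (satisfied + tolerating / 2) / len(durations_ms)


# Two satisfied, two tolerating, one frustrated request at T = 500 ms
score = apdex([100, 200, 600, 1500, 3000], threshold_ms=500)
print(score)
```

A score of 1.0 means every request met the target, and lower scores indicate a growing share of slow requests.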

Key Performance Tables:

Top 10 Web Operations: Slowest operations by response time
Top 10 Operations with Errors: Operations with highest error counts
Top 10 Errors: Most frequent error types and their occurrence

Click any row in these tables to view more details.

Setting Up Alerts

Configure performance-based alerts directly from the service overview:

Alert Configuration

Click Enable Alert Rule in the Performance section
Configure the alert condition (e.g., APDEX Score falls below threshold)
Set the threshold value and time window
Preview the alert behavior with the visual timeline
Click Configure Alert Rule to finalize

The alert preview shows how the rule would have triggered based on historical data, helping you validate the threshold settings.
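
The idea behind such a preview can be sketched as replaying a rule over historical data points. The rule shape used here (value below a threshold for a number of consecutive points) and the `preview_alert` name are assumptions for illustration, not a description of Last9's alert engine.

```python
def preview_alert(series, threshold, window):
    """Return the indices at which the rule would have fired.

    The rule fires once when the value has stayed below `threshold`
    for `window` consecutive data points.
    """
    fired = []
    streak = 0
    for index, value in enumerate(series):
        streak = streak + 1 if value < threshold else 0
        if streak == window:  # fire once per breach episode
            fired.append(index)
    return fired


# Hypothetical APDEX history, one point per evaluation interval
history = [0.95, 0.78, 0.75, 0.79, 0.90, 0.70, 0.72, 0.85]
print(preview_alert(history, threshold=0.8, window=3))
```

Raising `window` makes the rule less sensitive to short dips, which is the kind of trade-off a historical preview helps evaluate.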

Notification Channels

Configure where alerts are delivered by setting up notification channels. Access this through the No Notification Channels button (when no channels are configured) or the Settings button.

Available notification channels include Slack, PagerDuty, Opsgenie, Webhook, and Email integrations. For detailed setup instructions, see the Alerting documentation.

Operation Analysis

All Operations

View detailed performance metrics for all operations within a service. The operations view includes filters for:

Operation Type: Filter by Endpoints, HTTP types, or Database operations
Operation Categories: Client-Internal, Client-External, Messaging, etc.

Each operation shows:

Throughput: Requests per minute
Error Rate: Failure percentage
Response Time: P95 latency metrics
Operation Type: Classification of the operation

Database

Monitor database-specific operations and queries across different database technologies:

Multi-Database Support: Automatically detects and monitors operations across MySQL, MongoDB, Redis, PostgreSQL, and other database technologies
Query Performance Tracking: Monitor throughput (RPM), error rates, and P95 response times for SELECT, INSERT, UPDATE, and other database operations
Time-Series Visualization: Identify slow queries, high-volume operations, and performance trends with detailed graphs
Operation-Level Details: View specific queries and statements with individual performance characteristics and metrics

Outgoing Calls

Monitor external dependencies and third-party service calls:

External API Monitoring: Track HTTP calls to third-party services, cloud APIs, and external endpoints with detailed performance metrics
Internal Service Communication: Monitor microservice-to-microservice communication and internal network calls
Client Type Classification: Distinguish between Client-Internal and Client-External operations for better dependency analysis
Dependency Performance Impact: Analyze how external service latency and availability affect your application’s overall performance
Service Reliability Tracking: Monitor throughput, error rates, and response times for all outgoing dependencies to identify unreliable external services

Exception Monitoring

Using the Exceptions tab, track and analyze application errors and exceptions:

Error Trend Visualization: Monitor error frequency over time with multiple trend analysis views (Error Type, Operation)
Exception Type Filtering: Filter by specific exception classes like ReadTimeout, UnknownHostException, SQLIntegrityConstraintViolation, and HTTP error codes (400, 404, 500, 502, 503)
Operation-Level Error Analysis: Identify which specific operations (Database queries, HTTP Client calls, Endpoints) are generating the most errors
Error Count Tracking: See total error occurrences for each exception type to prioritize troubleshooting efforts
Multi-Dimensional Analysis: Analyze errors by operation type (Database, HTTP Client - Internal/External, Endpoints) to understand error patterns across your application stack
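
The counting and grouping described above can be sketched in a few lines. The span dictionaries and their field names (`exception_type`, `operation_type`) are hypothetical; real span data will use whatever attributes your instrumentation emits.

```python
from collections import Counter


def summarize_errors(spans):
    """Count errors by exception type and by operation type."""
    by_exception = Counter()
    by_operation_type = Counter()
    for span in spans:
        exception = span.get("exception_type")
        if exception is None:
            continue  # span completed without an exception
        by_exception[exception] += 1
        by_operation_type[span["operation_type"]] += 1
    return by_exception, by_operation_type


spans = [
    {"operation_type": "Database", "exception_type": "SQLIntegrityConstraintViolation"},
    {"operation_type": "HTTP Client - External", "exception_type": "ReadTimeout"},
    {"operation_type": "HTTP Client - External", "exception_type": "ReadTimeout"},
    {"operation_type": "Endpoints", "exception_type": None},
]
by_exception, by_operation_type = summarize_errors(spans)
print(by_exception.most_common(1))
print(by_operation_type.most_common(1))
```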

Performance History

Analyze performance trends over time with historical comparisons:

Operation Type Filtering: Switch between Endpoints, Consumer, and Database operations to analyze specific operation categories
Period Comparison: Compare current performance against previous periods (Last 24 Hours, Previous 24 Hours, Last Monday, 7 Day Average)
Color-coded Performance: Green indicates improvements, red shows degradation in throughput and response times
Trend Analysis: Track throughput (RPM), response time (P95), error rates, and APDEX scores across different time periods
Operation-Level Insights: See performance changes for individual operations like API endpoints, database operations, and health checks
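
A period comparison of this kind reduces to a percentage change plus a direction. The sketch below assumes that higher throughput and lower response time count as improvements; the function name and the "neutral" label for no change are illustrative choices rather than documented behavior.

```python
def compare_periods(current, previous, higher_is_better):
    """Compare a metric across two periods and pick a display color."""
    if previous == 0:
        return {"change_pct": None, "color": "neutral"}  # no baseline
    change_pct = 100.0 * (current - previous) / previous
    if change_pct == 0:
        color = "neutral"
    elif (change_pct > 0) == higher_is_better:
        color = "green"  # moved in the desirable direction
    else:
        color = "red"
    return {"change_pct": change_pct, "color": color}


# P95 response time dropped from 200 ms to 180 ms: an improvement
print(compare_periods(180.0, 200.0, higher_is_better=False))
# Throughput dropped from 100 rpm to 90 rpm: a degradation
print(compare_periods(90.0, 100.0, higher_is_better=True))
```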

Service Dependencies

Use the Dependency tab to visualize service relationships and dependencies:

The dependency map shows:

Service Connections: How services communicate with each other
Infrastructure Dependencies: Database and external service connections
Performance Impact: Metrics for each dependency relationship
Dependency Health: Red nodes and arrows indicate services or relationships with errors; green indicates healthy ones
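
Because dependency maps are derived from traces, the underlying idea can be sketched as follows: whenever a span's parent belongs to a different service, that pair forms a caller-to-callee edge. The span fields (`span_id`, `parent_id`, `service`, `error`) and the coloring rule are assumptions for illustration, not Last9's actual data model.

```python
def dependency_edges(spans):
    """Derive caller -> callee edges with call and error counts."""
    by_id = {span["span_id"]: span for span in spans}
    edges = {}
    for span in spans:
        parent = by_id.get(span.get("parent_id"))
        if parent is None or parent["service"] == span["service"]:
            continue  # root span or in-service call: no cross-service edge
        key = (parent["service"], span["service"])
        stats = edges.setdefault(key, {"calls": 0, "errors": 0})
        stats["calls"] += 1
        if span["error"]:
            stats["errors"] += 1
    return edges


def edge_color(stats):
    """Red when any call on the edge failed, green otherwise."""
    return "red" if stats["errors"] > 0 else "green"


spans = [
    {"span_id": "a", "parent_id": None, "service": "web", "error": False},
    {"span_id": "b", "parent_id": "a", "service": "orders", "error": False},
    {"span_id": "c", "parent_id": "b", "service": "mysql", "error": True},
    {"span_id": "d", "parent_id": "b", "service": "orders", "error": False},
    {"span_id": "e", "parent_id": "a", "service": "orders", "error": False},
]
edges = dependency_edges(spans)
for (caller, callee), stats in edges.items():
    print(caller, "->", callee, stats, edge_color(stats))
```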

Navigate the dependency map using:

Zoom Controls: Use + and - buttons to adjust view
Pan: Click and drag to move around the map

Related Logs

Access service-specific logs for troubleshooting:

Pre-filtered Logs: Automatically filtered to the selected service
Time Range Alignment: Logs correspond to the selected time window
Volume Indicator: Visual representation of log volume over time
Click on any log line to view more details

Related Traces

Examine distributed traces for the selected service:

Operation Filtering: Filter traces by specific operations
Span Analysis: Examine individual spans within traces
Performance Correlation: Connect traces to performance metrics
Duration Tracking: Analyze request flow and timing
Click on any trace or span to view distributed tracing visualization and more details

Related Metrics

Services may have one or both of the following, depending on their monitoring configuration and deployment type:

Infrastructure metrics monitor container-level resources (CPU, memory, network, disk)
Process metrics focus on application runtime performance (JVM, memory management, garbage collection)

Infrastructure Metrics

Monitor underlying infrastructure performance. Infrastructure monitoring covers:

Container Overview: High-level dashboard showing total containers, average CPU and memory usage, and network traffic (incoming and outgoing)
CPU Monitoring: Processor utilization per container with breakdowns by user mode and kernel mode
Memory Monitoring: RAM usage percentage by container, memory limits, and cache utilization patterns
Network I/O Monitoring: Network traffic analysis including data transfer rates and packet statistics
Block I/O Monitoring: Disk I/O operations for storage performance analysis

Process Metrics

Monitor JVM and application-level performance metrics. Process monitoring provides deep insights into application runtime performance:

Memory Health Overview: Track memory usage after garbage collection and memory growth patterns
Heap Pool Analysis: Monitor different heap space regions (Eden, Survivor) for garbage collection optimization
GC Performance Metrics: Analyze garbage collection overhead, frequency, duration, and efficiency
CPU and System Analysis: Monitor JVM and system CPU utilization, load, and thread management
Buffer Pool Intelligence: Track I/O buffer usage, limits, and utilization for performance optimization
Class Loading Monitoring: Monitor dynamic class loading and unloading behavior
Advanced Analysis: Detect memory leaks, performance degradation, and thread issues with predictive monitoring

Best Practices

Service Monitoring Strategy:

Alert on critical services when the APDEX score falls below 0.8
Monitor both throughput trends and error rate spikes for early issue detection
Use dependency maps to understand service impact during incidents
Configure notification channels before enabling alerts so that incidents reach the right responders

Performance Optimization:

Focus on operations with high response times in the “Top 10 Web Operations” table
Monitor error distribution to identify systemic vs. isolated issues
Use the Performance History tab to validate the impact of deployments
Correlate infrastructure metrics with application performance during capacity planning

Troubleshooting Workflow:

Start with the Overview tab to identify performance anomalies
Use the Exceptions tab to understand error patterns
Examine specific operations in the All Operations tab
Check dependencies to identify upstream or downstream impact
Access logs and traces for detailed root cause analysis

Troubleshooting

Please get in touch with us on Discord or by email if you have any questions.
