This is the full developer documentation for Last9

# Introduction

> Learn how to get started with Last9

## [Getting Started](/docs/introduction/)

[Understand what Last9 is and how to quickly start sending data](/docs/introduction/)

## [Control Plane](/docs/control-plane/)

[Manage your telemetry data, its configurations, and its lifecycle](/docs/control-plane/)

## [Logs](/docs/logs/)

[Explore your logs data, its details, and related telemetry](/docs/logs/)

## [Traces](/docs/traces/)

[Explore your trace spans, its dependencies and timeline charts, and span details](/docs/traces/)

## [Real User Monitoring](/docs/rum/)

[Monitor your web application's performance from your users' perspective](/docs/rum/)

## [Alerting](/docs/alerting-overview/)

[Set up alerts, pattern matching, receive notifications, an IaC tool for alerting](/docs/alerting-overview/)

## [Instrumentation](/docs/integrations/)

[Send data via OpenTelemetry, Prometheus, AWS Cloudwatch, and more](/docs/integrations/)

## [Tutorials](/docs/howto/)

[Common how-tos for Prometheus, Kubernetes, VictoriaMetrics, etc.](/docs/howto/)

## [FAQs](/docs/faqs/)

[Frequently asked questions about Last9: what, why, how](/docs/faqs/)

## Other Resources

* [Changelog](https://last9.io/changelog/)
* [Blog](https://last9.io/blog/)
* [Community](https://discord.com/invite/Q3p2EEucx9/)
* [X / Twitter](https://x.com/last9io/)
* [Youtube](https://youtube.com/@last9/)

# Access Policies

> Leverage Last9's access policies to perform traffic shaping of time series data in real-time.

Last9 supports automatic data tiering of the metrics based on retention policies. These data tiers have different retention policies.
E.g., the Blaze Tier stores data for the last two hours, whereas the Hot Tier stores data for the last six months. Depending on the use case, metrics can be read from a fast or a slow tier.

It is crucial that traffic for real-time alerting is always prioritized and served from the fastest tier, Blaze. Grafana queries can be served from the Hot tier without conflicting with alerting. Access Policies let you create these guardrails so that metrics data is read from a specific tier based on its purpose.

> **Note**: Access Policies are specific to a Last9 cluster. Different clusters can have different Access Policies.

Each Access Policy is associated with a Token created for a Tier. Tokens provide access control (ACL) for time series data by granting access to a specific tier for `read`, `write`, or both operations.

## Setting up Tokens

First, create a read token from Settings -> Tokens.

[Creating a Read Token in Levitate](https://www.youtube.com/embed/qfdUYwAMZvw)

## Creating Access Policy

Once the Token is created, create an Access Policy from Settings -> Policies.

[Creating Access Policy in Levitate](https://www.youtube.com/embed/0j_N9CKyigY)

That's it, you don't have to change anything else. Use the token associated with the alerting policy to configure Alertmanager, and the token associated with the visualization policy to [configure Grafana](/docs/grafana-config/). Last9 will take care of performing traffic shaping in real-time.

***

## Troubleshooting

Please get in touch with us on [Discord](https://discord.com/invite/Q3p2EEucx9) or [Email](mailto:cs@last9.io) if you have any questions.

# Alert Groups

> Overview of Alert Groups

An Alert Group is a container for Indicators (i.e., PromQL queries) and the Alert Rules that evaluate them. Alerts generated by Alert Rules send notifications to the Channels configured within the Alert Group.
## Creating an Alert Group

> **Note**: The following steps create an Alert Group using the UI. To use a GitOps workflow, see [Declarative Alerting via IaC](/docs/alerting-via-iac/). Alert Groups created via IaC have the option to [disable any edits from the UI](/docs/alerting-via-iac/#entities) to avoid configuration conflicts.

1. Navigate to **Home** → **Alert Studio** → **Alert Groups** and click on **Add New**

   ![Creating An Alert Group](/_astro/alert-group-1.Ru2KGP9C_ZjPvIq.webp)

   ![Creating An Alert Group](/_astro/alert-group-2.B_62WYKp_ZLGtlE.webp)

2. Assign a descriptive name to the Alert Group, select the data source from which you would like to query metrics, and click **Create**.

   ![Creating An Alert Group](/_astro/alert-group-2.B_62WYKp_ZLGtlE.webp)

   Ensure that you select the correct Data Source (Last9 Cluster) to query the metrics from, or else the Alert Rules will not evaluate.

   Pro Tip: You can also use Last9's Health Cluster as a data source to set up alerting that watches your Cluster's health. After all, *quis custodiet ipsos custodes?*

3. Click on the **Alert Group** to navigate to your newly created Alert Group

   ![Creating An Alert Group](/_astro/alert-group-4.D6KHbGjC_QENhQ.webp)

If this is your first Alert Group, the next step is to create your first Indicator, followed by an Alert Rule.

## Deleting an Alert Group

Deleting an Alert Group deletes all of its Alert Rules, Indicators, and generated Alerts.

To delete an Alert Group:

1. Navigate to **Home** → **Alert Studio** → **Alert Groups**
2. Click the **…** button beside the Alert Group you wish to delete and select **Delete**

   ![Deleting An Alert Group](/_astro/alert-group-6.BiaDA1dH_eG91V.webp)

   ![Deleting An Alert Group](/_astro/alert-group-7.DbhZmNj7_1MKsxc.webp)

## Features

### Labels

Labels are named key:value pairs that add additional information and context to Alert Groups.

To add labels to an Alert Group:

1. Click **Edit** to update the Alert Group meta fields

   ![Alert Group Labels](/_astro/alert-group-8.Dz1qN-2f_Z1BRNzD.webp)

2. In the Labels card, click **Add Labels** to add a new label. Labels must have a unique key (i.e., a name). Label values can contain alphanumeric text.

   ![Alert Group Labels](/docs/gif-images/alert-group-9.gif)

3. Click **Done** to exit edit mode

To edit or delete labels from an Alert Group:

1. Click **Edit** to update the Alert Group meta fields
2. In the Labels card, hover on the label you wish to edit or delete. Click on the appropriate button to edit or delete the label

   ![Alert Group Labels](/docs/gif-images/alert-group-10.gif)

3. Click **Done** to exit edit mode

### Tags

Tags help you categorize multiple Alert Groups.

To add tags to an Alert Group:

1. Click **Edit** to update the Alert Group meta fields
2. In the Details card, click on Assign Tags
3. Search the existing Tags or add a new Tag to the Alert Group

   ![Alert Group Tags](/docs/gif-images/alert-group-11.gif)

4. Click **Done** to exit edit mode

To remove tags from an Alert Group:

1. Click **Edit** to update the Alert Group meta fields
2. In the Details card, hover on the tag you wish to remove. Click on the X button to remove the tag
3. Click **Done** to exit edit mode

### Links

Alert Group links let you add links to external resources used by your team. They help your team quickly navigate to resources like CloudWatch, runbooks, or repos.

Links are named URLs. You can use one of the suggested link names or give the link a custom name.

To add links to Alert Groups:

1. Click **Edit** to update the Alert Group meta fields
2. In the Links card, add a link to a suggested field or add your own custom name for the link

   ![Alert Group Tags](/docs/gif-images/alert-group-12.gif)

3. Click **Done** to exit edit mode

To edit or remove links from an Alert Group:

1. Click **Edit** to update the Alert Group meta fields
2. In the Links card, hover on the link you wish to edit or delete.
Click on the appropriate button to edit or delete the link
3. Click **Done** to exit edit mode

## Alert Group Settings

### Channels

Notifications from Last9 are sent on [Notifications Channels](/docs/notification-channels/). Ensure that you have at least one Notification Channel configured before trying to add Channels to an Alert Group.

To add a notification channel:

1. Navigate to `Home` → `Alert Studio` → `Alert Groups` → *Select an Alert Group*. Press the ⚙️ icon on the top right to view Alert Group settings.

   ![Adding a Notification Channel](/_astro/alert-notification-1.1goAbmeG_ZaE7gg.webp)

2. Under the Channels tab, you can assign channels per Alert severity level, i.e., you can set different (or the same) channels for Threat and Breach severity alerts

   ![Adding a Notification Channel](/_astro/alert-notification-2.DouIv_K4_Z1851Ed.webp)

   The Slack integration also allows you to append additional *@mentions* to tag a person or group

   ![Adding a Notification Channel](/_astro/alert-notification-3.DtvjI8XB_wdQai.webp)

3. The configured Alert Channel will now start receiving alerts

   ![Adding a Notification Channel](/_astro/alert-notification-4.CUQYifw__2rUmdY.webp)

***

## Troubleshooting

Please get in touch with us on [Discord](https://discord.com/invite/Q3p2EEucx9) or [Email](mailto:cs@last9.io) if you have any questions.

# Alert Rules

> Alert Rules Overview

## Alert Rules Overview

Alert Rules evaluate Indicators using algorithms and configured thresholds/sensitivity to generate Alerts. Alerts can be visualized using Health and sent to configured Notification Channels.

## Creating an Alert Rule

Before you can configure an Alert Rule, at least one Indicator must exist in the Alert Group.

To create an Alert Rule:

1. Navigate to the Alert Group in which you would like to create the Alert Rule: **Home** → **Alert Studio** → **Alert Groups** → *Select an Alert Group* → **Alert Rules** Tab

   ![Creating An Alert Rule 1](/_astro/alert-rule-1.Bej9KOSX_3KArK.webp)

2. The following details are required for an Alert Rule:

   1. **Rule Name**: Use a descriptive name that your team can easily identify. It is also sent as part of notifications
   2. **Indicator**: Select the indicator for which you would like to create the Alert Rule. If you have not created an Indicator yet, create one first
   3. **Edit Label Filter**: …
   4. **Algorithm**: Select from the available algorithms. For a detailed guide on how the various algorithms work, refer to this [guide](/docs/anomalous-pattern-detection-guide/). In this tutorial, we will set up an alert using Static Threshold
   5. **Threshold / Sensitivity**: Specify the Threshold (or Sensitivity in case of Anomaly Detection algorithms) and the Operator (example: Alert when the Indicator value is *greater than or equal* to 10)
   6. **Alert Sensitivity**: Alert Sensitivity defines how reactive the Alert Rule is. It requires two inputs:

      * Total Minutes: The total duration of the rolling time window during which the Indicator is evaluated. The maximum allowed duration is 60 minutes. (All Alert Rules are evaluated at one-minute intervals)
      * Bad Minutes: The number of minutes within the evaluation window that exhibit undesirable or unexpected behavior. These "bad" minutes need not be consecutive

      If the number of 'Bad Minutes' exceeds the predefined limit within the 'Total Minutes' rolling window, an alert is generated.

      A rolling evaluation window offers continuous analysis by constantly updating the period under evaluation. It allows for immediate reaction to issues as they develop, rather than waiting for a static hourly evaluation to complete. This mechanism ensures that users are notified only when there is a significant deviation in expected metric performance, which helps avoid unnecessary alerts for minor or inconsequential fluctuations.
   7. **Severity Level**: Adds context to the Alert Rule by categorizing alerts as either Threat or Breach. Threat alerts are shown in amber and Breach alerts in red
   8. **Notification Group**: For Indicators with multiple timeseries, you can choose to receive individual alerts for every single timeseries or to group them into a single alert. We recommend that you group alerts, as ungrouped alerts can generate noise
   9. **Annotations** (Optional): Annotations are optional information labels in `key:value` format that can be sent with every Alert notification. You can use them to specify additional descriptions or runbooks, or to trigger complex workflows in your incident management systems
   10. When the threshold is configured, we generate a preview to help you visualize which values of the Indicator are considered anomalous. The preview also indicates the number of timeseries that the alert rule will evaluate every minute

   Click **Save Rule** to enable alerting for this rule. To start receiving notifications for this Alert Rule, ensure that at least one Notification Channel has been configured for this Alert Group.

   ![Creating An Alert Rule 2](/_astro/alert-rule-2.BkYvZ-S__Z1XUB8G.webp)

***

## Troubleshooting

Please get in touch with us on [Discord](https://discord.com/invite/Q3p2EEucx9) or [Email](mailto:cs@last9.io) if you have any questions.

# Alerting Overview

> Overview of Last9's Alerting Capabilities

Last9 comes with complete monitoring support, including alerting and notification capabilities. Irrespective of your tool choice, a few problems plague today's alerting journey: coverage, fatigue, and cleanup. Unfortunately, there are no easy answers to these complex problems. However, with advanced features like Pattern-based Alerting and a redesigned Alert Manager designed with High Cardinality in mind, Last9 helps you stay ahead.
In addition to being fully PromQL compatible, it provides features like a real-time alert monitor and a historical health view. You can also perform advanced tasks, such as correlating alerts with events, while focusing on the desired outcome of keeping up with constantly evolving infrastructure and services.

Alerting with Last9 starts by creating an **Alert Group**, which contains one or more **Alert Rules**. These Alert Rules evaluate the PromQL queries that are defined as **Indicators** in the Alert Group. Using **Alert Monitor**, you can view a live-updating stream of all your Alert Rules across all Alert Groups. The following sections dive deeper into each of these components.

## Enabling Alerting Studio

All new orgs need to request access to Alert Studio. This is a one-time action and takes about 30 minutes to complete (it is usually done much faster).

To enable Alert Studio:

1. Navigate to **Home** → **Alert Studio** and click on the **Request To Enable** button

   ![Enabling Alert Studio](/_astro/alerting-overview-1.BiViaQmt_2huozx.webp)

2. Once the request has been sent, come back in some time to start using Alert Studio

***

## Troubleshooting

Please get in touch with us on [Discord](https://discord.com/invite/Q3p2EEucx9) or [Email](mailto:cs@last9.io) if you have any questions.

# Declarative Alerting via IaC

> Last9 supports configuring alerts and notifications automatically using a Python-based SDK tool which takes care of infrastructure changes

Configurations for alerting and notifications for observability at scale are hard to start, maintain and fix manually, just like provisioning infrastructure at scale.
When infrastructure changes, it is important that the observability stack catches up with it, to avoid issues caused by a lack of observability or by black swan events. Last9 has introduced the `l9iac` tool to solve exactly this problem.

## Installation

Last9's IaC (Infrastructure as Code) tool is available as a Docker image, providing a consistent and isolated environment for automating entity creation and alert configuration.

1. **Pull the Docker Image**

   ```bash
   docker pull last9system/iac:latest
   ```

   The image is available on [DockerHub](https://hub.docker.com/repository/docker/last9system/iac/general).

2. **Prepare Your Working Directory**

   Create a directory containing:

   * Your IaC YAML files
   * `config.json` with your refresh tokens ([see file structure](#configuration-file-structure))
   * Space for the state lock file

3. **Run the Docker Container**

   ```bash
   docker run --name l9iac -d -v : last9system/iac:
   ```

   Example:

   ```bash
   docker run -d -v /home/user/iac-files:/app/rules last9system/iac:2.4.2
   ```

   > 💡 **Note**: If using Docker Desktop, ensure file sharing is enabled for the volume mount.

4. **Execute IaC Commands**

   ```bash
   docker exec -it l9iac -mf -c
   ```

   Example:

   ```bash
   docker exec -it bcdea6660fd4 l9iac -mf /app/rules/alert-rules.yaml -c /app/rules/config.json plan
   ```

## Configuration File Structure

The IaC tool requires a `config.json` file with the following structure:

```json
{
  "api_config": {
    "read": {
      "refresh_token": "",
      "api_base_url": "https://app.last9.io/api/v4",
      "org": ""
    },
    "write": {
      "refresh_token": "",
      "api_base_url": "https://app.last9.io/api/v4",
      "org": ""
    },
    "delete": {
      "refresh_token": "",
      "api_base_url": "https://app.last9.io/api/v4",
      "org": ""
    }
  },
  "state_lock_file_path": "state.lock"
}
```

### Important Notes

* The `refresh_token` values can be obtained from the [API Access](https://app.last9.io/api-access) page in the Last9 dashboard ([know more](/docs/getting-started-with-api/))
* The `org` value can be obtained from the app's URL: `app.last9.io/v2/organizations/`
* For on-premise Last9 setups, contact Last9 to get the correct `api_base_url`
* The `state_lock_file_path` should be in the same directory as the model file and the config file, and accessible from the directory where you run the IaC commands

## Quick Start

1. Create a *YAML* file as per your alert rule configuration

   **Example**: notification\_service\_am.yaml

   ```yaml
   # notification_service_am.yaml
   entities:
     - name: Notification Backend Alert Manager
       type: service_alert_manager
       data_source: prod-cluster
       entity_class: alert-manager
       external_ref: unique-slug-identifier
       indicators:
         - name: availability
           query: count(sum by (job, taskid)(up{job !~ "ome.*"}) > 0) / count(sum by (job, taskid) (up{job=~".*vmagent.*", job !~ "ome.*"})) * 100
         - name: loss_of_signal
           query: 'absent(up{job !~ "ome.*"})'
       alert_rules:
         - name: Availability of notification service should not be less than 95%
           description: The error rate (5xx / total requests) is what defines the availability, lower value means more degradation
           indicator: availability
           less_than: 99.5
           severity: breach
           bad_minutes: 3
           total_minutes: 5
           group_timeseries_notifications: false
           annotations:
             team: payments
             description: Error Rate described as number of 5xx/throughput
             runbook: https://notion.com/runbooks/payments/error_rates_fixing_strategies
   ```

2. Prepare the configuration file for running the IaC tool

   The configuration file is a JSON file with the following structure.

   ```json
   {
     "api_config": {
       "read": {
         "refresh_token": "",
         "api_base_url": "https://app.last9.io/api/v4",
         "org": ""
       },
       "write": {
         "refresh_token": "",
         "api_base_url": "https://app.last9.io/api/v4",
         "org": ""
       },
       "delete": {
         "refresh_token": "",
         "api_base_url": "https://app.last9.io/api/v4",
         "org": ""
       }
     },
     "state_lock_file_path": "state.lock"
   }
   ```

   * The `refresh_token` can be obtained from the API Access page of the Last9 dashboard. You need a `refresh_token` for each of the 3 operations (read, write and delete), as the `l9iac` tool performs all 3 while applying the alert rules.
   * The `org` value is your organization's unique slug in Last9. It can be obtained from the API Access page of the Last9 dashboard.
   * The default `api_base_url` is `https://app.last9.io/api/v4`.
If you are on an on-premise setup of Last9, contact Last9 to get the `api_base_url`.
   * The `state_lock_file_path` is the name of the file where `l9iac` stores the state lock of the current alerting state (along the same lines as Terraform's state.lock).

3. Run the following command to do a dry run of the changes

   ```shell
   l9iac -mf notification_service_am.yaml -c config.json plan
   ```

4. Run the following command to apply the changes

   ```shell
   l9iac -mf notification_service_am.yaml -c config.json apply
   ```

> **Tip**: We will provision the GitOps flow that runs the `apply` command once changes are merged to the master branch in the GitHub repo. Contact us for more details.

## Schema

Here is the complete schema for generating the above `.yaml` file:

### Entities

| Field | Type | Unique | Required | Description |
| --- | --- | --- | --- | --- |
| name | string | false | required | Name of the entity (alert manager) |
| type | string | false | required | Type of the entity |
| external\_ref | string | true | required | External reference for the entity; a unique slug-format identifier for each alert manager |
| [adhoc\_filter](#common-rule-filters-adhoc-filters) | object | false | optional | List of common rule filters for the entity |
| [alert\_rules](#alert-rules) | array | false | optional | List of alert rules for the entity |
| data\_source | string | false | optional | Data source |
| data\_source\_id | string | false | optional | The ID of the data source |
| description | string | false | optional | Description of the entity |
| entity\_class | string | false | optional | Denotes the class of the entity. Supported values: `alert-manager` |
| [indicators](#indicators) | array | false | optional | List of indicators for the entity |
| labels | object | false | optional | List of key value pairs of group label names and values |
| [links](#links) | array | false | optional | List of links associated with the entity |
| namespace | string | false | optional | The namespace of the entity |
| [notification\_channels](#notification-channels) | string OR array | false | optional | List of notification channels applicable to the entity |
| tags | array | false | optional | List of tags for the entity |
| team | string | false | optional | The team that owns the entity |
| tier | string | false | optional | Tier of the entity |
| [ui\_readonly](/docs/alert-group#creating-an-alert-group) | boolean | false | optional | Disable any sort of edits to the alert group from the UI |
| workspace | string | false | optional | Workspace of the entity |

### Common Rule Filters (Adhoc Filters)

| Field | Type | Unique | Required | Description |
| --- | --- | --- | --- | --- |
| labels | object | false | required | List of key value pairs of label names and values |
| data\_source | string | false | required | Defaults to entity's data source |

### Alert Rules

| Field | Type | Unique | Required | Description |
| --- | --- | --- | --- | --- |
| name | string | true | required | Rule name that describes the alert |
| indicator | string | false | required | Name of the indicator |
| bad\_minutes | integer | false | required | Number of minutes the indicator must be in a bad state before alerting |
| total\_minutes | integer | false | required | Total number of minutes the indicator is sampled over |
| description | string | true | optional | Description for an alert rule that is included in the alert payload |
| expression | string | false | optional | Alert rule expression, to be used only for pattern-based alerts |
| greater\_than | number | false | optional | Alert triggers when the indicator value is greater than this |
| greater\_than\_eq | number | false | optional | Alert triggers when the indicator value is greater than or equal to this |
| less\_than | number | false | optional | Alert triggers when the indicator value is less than this |
| less\_than\_eq | number | false | optional | Alert triggers when the indicator value is less than or equal to this |
| equal\_to | number | false | optional | Alert triggers when the indicator value is equal to this |
| not\_equal | number | false | optional | Alert triggers when the indicator value is not equal to this |
| group\_timeseries\_notifications | boolean | false | optional | Whether multiple impacted time series in an alert are grouped into one notification |
| is\_disabled | boolean | false | optional | Whether the alert is disabled or not |
| label\_filter | map/object | false | optional | Mapping of the variables present in the indicator query and their pattern for the alert rule |
| mute | boolean | false | optional | Whether alert notifications are muted |
| [runbook](#runbook) | object | false | optional | Runbook link to be included in the alert payload |
| severity | string | false | optional | Can be `threat` or `breach` |

#### Runbook

| Field | Type | Unique | Required | Description |
| --- | --- | --- | --- | --- |
| link | string | false | required | Runbook link to be included in the alert payload |

### Indicators

| Field | Type | Unique | Required | Description |
| --- | --- | --- | --- | --- |
| name | string | true, uniqueness enforced at entity level | required | Name of the indicator |
| query | string | false | required | PromQL query for the indicator |
| data\_source | string | false | optional | Data Source of the indicator (Last9) |
| description | string | false | optional | Description of the indicator |
| unit | string | false | optional | Unit of the indicator |

### Links

| Field | Type | Unique | Required | Description |
| --- | --- | --- | --- | --- |
| name | string | false | required | Display name of the link |
| url | string | false | required | URL of the link |

### Notification Channels

| Field | Type | Unique | Required | Description |
| --- | --- | --- | --- | --- |
| name | string | false | required | Name of the notification channel |
| type | string | false | required | Type of notification channel. Allowed values: `slack`, `pagerduty`, `opsgenie`, `generic_webhook` |
| mention | string OR list (string) | false | optional | Only applicable to Slack. The user(s) to tag in the alert message |
| severity | string | false | optional | Severity of the alerts sent through this channel. Allowed values: `threat`, `breach` |

Before a notification channel can be used in IaC, it needs to be configured. Please see [Notification Channels](/docs/notification-channels/) for more details.

## Supported Macros by IaC

* `low_spike (tolerance, metric)`
* `high_spike (tolerance, metric)`
* `decreasing_changepoint (tolerance, metric)`
* `increasing_changepoint (tolerance, metric)`
* `increasing_trend (tolerance, metric)`
* `decreasing_trend (tolerance, metric)`

***

## Troubleshooting

Please get in touch with us on [Discord](https://discord.com/invite/Q3p2EEucx9) or [Email](mailto:cs@last9.io) if you have any questions.

# Anomalous Pattern Detection Guide

> An overview of Pattern Detection algorithms supported by Last9 and guidelines on when to use them.
## Supported algorithms

Last9's Alert Studio supports the following algorithms for anomalous pattern detection.

### High Spike

The high spike algorithm is designed to detect sudden increases in signal values, particularly when the increase occurs within a short time frame. Signals such as the number of 4xx responses, throughput, and edge hits are a good fit for it.

The high spike algorithm compares the current data point with the last 60 minutes' worth of data points to check whether the point has a considerably larger amplitude.

> **Tip**: The **high spike** algorithm is designed to detect sudden increases in signal values over a short period of time, while the **low spike** algorithm is useful for identifying sudden drops in signal values.

#### Eligible Signals for High Spike

Signals similar to the following can be used for high spike pattern detection.

![High Spike Pattern Detection Signal](/_astro/eligibile-signals-high-spike-1.Njp_wYxy_16mCzg.webp)

![High Spike Pattern Detection Signal](/_astro/eligibile-signals-low-spike-1.B8qjOS9J_13h2rK.webp)

### Low Spike

The low spike algorithm is particularly helpful in identifying sudden drops in signal values. Signals such as CPU utilization, cache hit rate, and availability are good fits for the low spike algorithm.

The algorithm compares the current data point with the previous 60 minutes of data to determine whether the point represents a significant drop.

> **Tip**: The **level change** algorithm is different from the high/low spike algorithms in that it detects when a data pattern has changed, rather than detecting a single or a few large jumps or drops.
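The guide does not spell out the statistic behind the spike algorithms, so the following is only a minimal sketch of the idea described above: compare the current point with the trailing 60 minutes and flag it when it sits far above or below them. The function name `spike_direction`, the z-score test, and the default `tolerance` of 3 standard deviations are assumptions made for illustration, not Last9's implementation.

```python
from statistics import mean, pstdev


def spike_direction(window, current, tolerance=3.0):
    """Classify `current` against a trailing window of per-minute values.

    Returns "high", "low", or None. `tolerance` is the number of standard
    deviations the point must sit away from the window mean (an assumption).
    """
    if len(window) < 2:
        return None  # not enough history to judge
    centre = mean(window)
    spread = pstdev(window)
    if spread == 0:
        return None  # flat history: this toy score is undefined
    score = (current - centre) / spread
    if score >= tolerance:
        return "high"
    if score <= -tolerance:
        return "low"
    return None


# Hypothetical 60-minute window hovering around 100
last_hour = [100, 102, 98, 101, 99] * 12
```

With this window, a value of 150 is classified as a high spike and 60 as a low spike, while 101 is treated as normal. A larger `tolerance` makes the check less reactive.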
#### Eligible Signals for Low Spike

![Low Spike Pattern Detection Signal](/_astro/eligibile-signals-low-spike-1.B8qjOS9J_13h2rK.webp)

![Low Spike Pattern Detection Signal](/_astro/eligibile-signals-low-spike-2.CR6OcQGH_Z15B7nD.webp)

![Low Spike Pattern Detection Signal](/_astro/eligibile-signals-low-spike-3.--2pIdwK_1M6Dp.webp)

### Level Change

The level change algorithm detects the point at which data begins to exhibit a new pattern that is different from the old one. The data will have different patterns before and after the level change. To determine whether an incoming point is a candidate level change, the algorithm checks whether it is different (too high or too low) from the data over the last hour.

#### How is this different from a high/low spike?

If the data shows a single or a few large jumps or drops, the level change algorithm will not detect them. A single different value, or even a few of them, does not necessarily indicate that the pattern has changed or that there is a new pattern.

#### Eligible Signals for Level Change

![Level Change Pattern Detection Signal](/_astro/eligibile-signals-level-change-1.Cdyn2nGK_ZUjyMV.webp)

![Level Change Pattern Detection Signal](/_astro/eligibile-signals-level-change-2.DTSZN2M3_2n2Lk8.webp)

![Level Change Pattern Detection Signal](/_astro/eligibile-signals-level-change-3.DIjxsJ9h_5kpov.webp)

### Trend Deviation

The trend deviation algorithm detects deviations of a signal from its expected pattern, based on its behaviour over a number of previous days.

* For each incoming data point, collect relevant data from the past (this is the reference period or seasonality). It is not necessary to collect all past data
* To determine if an incoming data point is an anomaly, compare it with the reference period from the past

> **Tip**: The **trend deviation** algorithm detects when unexpected data points or patterns have occurred compared to the previous 7 days of data.
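The two steps above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Last9's algorithm: the reference period is taken to be the value at the same time-of-day slot on each of the previous 7 days, and a point counts as anomalous when it is more than `tolerance` standard deviations away from the mean of those reference values. The name `is_trend_deviation` and the hourly slot layout are hypothetical.

```python
from statistics import mean, pstdev


def is_trend_deviation(history, slot, current, days=7, tolerance=3.0):
    """Compare `current` with the same slot on the previous `days` days.

    `history` is a list of per-day lists (oldest first); history[d][slot]
    is the value observed at that time-of-day slot on day d.
    """
    reference = [day[slot] for day in history[-days:]]
    if not reference:
        return False  # no reference period yet
    centre = mean(reference)
    # Floor the spread so a perfectly flat reference does not divide the
    # world into "identical" and "infinitely far" with a zero threshold.
    spread = max(pstdev(reference), 1e-9)
    return abs(current - centre) > tolerance * spread


# Hypothetical week of hourly data: baseline 10, with a daily peak near 50 at 10 a.m.
peaks = [50, 52, 48, 51, 49, 50, 50]
history = [[10.0] * 24 for _ in peaks]
for day, peak in zip(history, peaks):
    day[10] = float(peak)
```

In this toy data, a value of 51 at slot 10 is not flagged because the peak recurs every day, a value of 120 at slot 10 is flagged because its amplitude is far above the usual peak, and a value of 50 at slot 3 is flagged because no peak is expected there. Note that the flat-reference case is deliberately oversensitive in this sketch.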
#### Illustrations

In Figure 1 below, the trend algorithm will detect an anomaly at 10 a.m. (circled in red) because the peak is not expected when compared to the previous days (the reference period, shown as red rectangular boxes).

![Trend Detection Pattern Detection Signal](/_astro/eligibile-signals-trend-deviation-1.CQsxmWXQ_29TA9z.webp)

In Figure 2, the trend algorithm will not detect any anomaly at 10 a.m. (circled in red) because the peak is expected when compared to the previous days.

![Trend Detection Pattern Detection Signal](/_astro/eligibile-signals-trend-deviation-2.BgHTjtJ6_DY4TU.webp)

In Figure 3, the trend algorithm will detect an anomaly at 10 a.m. (circled in red). Although it is a repetitive peak, its amplitude is much higher than the peaks of the previous days.

![Trend Detection Pattern Detection Signal](/_astro/eligibile-signals-trend-deviation-3.DeUxy9fZ_1pSMz5.webp)

![Trend Detection Pattern Detection Signal](/_astro/eligibile-signals-trend-deviation-4.COIYkMch_2h0mcf.webp)

#### Eligible Signals for Trend (Increasing / Decreasing)

![Trend Detection Pattern Detection Signal](/_astro/eligibile-signals-trend-deviation-5.BxdaCD87_Z27r5ma.webp)

![Trend Detection Pattern Detection Signal](/_astro/eligibile-signals-trend-deviation-6.CxGZfV0T_Z2dpdLx.webp)

![Trend Detection Pattern Detection Signal](/_astro/eligibile-signals-trend-deviation-7.BTmBjaqW_SKr5M.webp)

## How to select the right algorithm?

Each algorithm matches a specific pattern and raises an alert when it is encountered. To use the algorithms effectively, follow the process below when choosing one.

1. **Define normal behaviour**. It is important to know what the acceptable behaviour of the signal is.
One simple way of doing this is to look at the signal over the relevant time span and point out the timestamps where it deviates from normal behaviour and where you would like to be alerted. Remember, an algorithm cannot detect a deviation from normal behaviour that a trained human cannot.

2. **Identify the anomalous pattern(s) in the signal**. Different signals exhibit different anomalous behaviour. Some might show spikes, some might show level changes. For example, for a signal like CPU usage, a sharp spike that returns to baseline may be perfectly normal behaviour, but for a business metric it may not. Knowledge of the underlying processes that generate the signal is essential to determine the correct pattern.

3. **Check if a PromQL expression captures the intended deviation better**. PromQL is a very powerful language with many functions. For deviations that can be defined in terms of relative values, percentages, or some rollup formulae on historical data, prefer defining PromQL expressions accordingly.

   For example, suppose a signal is considered normal if

   ```text
   it stays in a range of minimum and maximum of the 15 minute medians over the last 2 days, with a tolerance of 20%
   ```

   The PromQL to detect a deviation from this range would be

   ```text
   s < min_over_time(quantile_over_time(0.5, s[15m])[2d:15m]) * 0.8
   or
   s > max_over_time(quantile_over_time(0.5, s[15m])[2d:15m]) * 1.2
   ```

   where `s` is the original signal metric.

4. **Check the Algorithm**. If the pattern that you want to match cannot be expressed easily as demonstrated above, check if any built-in algorithm can satisfactorily match the pattern. Remember that each algorithm has its own limitations, and it is important to understand them when working with signals.

Signals that don’t meet the requirements of any of the algorithms should be handled differently.

By selecting the appropriate algorithm and adjusting the sensitivity to match your use case, you can improve the accuracy of these pattern detections.
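To make the range rule from step 3 concrete, here is a small Python sketch of the same check run on an in-memory list of samples. It assumes one sample per minute and uses non-overlapping 15-sample windows for simplicity. The function names and the sample data are hypothetical and only illustrate the arithmetic.

```python
from statistics import median


def normal_range(samples, window=15, tolerance=0.2):
    """Return (lower, upper) bounds built from per-window medians.

    `samples` is assumed to hold one value per minute, oldest first.
    Windows are non-overlapping here for simplicity.
    """
    medians = [
        median(samples[start:start + window])
        for start in range(0, len(samples) - window + 1, window)
    ]
    if not medians:
        raise ValueError("need at least one full window of samples")
    return min(medians) * (1 - tolerance), max(medians) * (1 + tolerance)


def is_outside_normal_range(value, samples, window=15, tolerance=0.2):
    """Return True if `value` falls below or above the tolerated range."""
    lower, upper = normal_range(samples, window, tolerance)
    return value < lower or value > upper


# Hypothetical history: three 15-minute windows with medians 100, 120 and 110.
history = [100] * 15 + [120] * 15 + [110] * 15

print(is_outside_normal_range(90, history))   # False: inside the range
print(is_outside_normal_range(150, history))  # True: above max median + 20%
print(is_outside_normal_range(70, history))   # True: below min median - 20%
```

The lower bound scales the smallest median down by the tolerance and the upper bound scales the largest median up, which is why the two sides of the comparison use different factors.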
## When not to choose a pattern matching algorithm? As a rule of thumb, a pattern matching algorithm should be chosen in situations where a human who is looking at the plot can define, with a high level of accuracy, where an alert should be generated and where it should not be generated. If, by looking at the plot, it is not possible for a human to determine the alert points, it is highly unlikely that any of the above algorithms can succeed. Below are a few signals which are not a good fit for any one of the above algorithms ![Ineligible signals for pattern detection](/_astro/ineligible-signal-1.OeR1Ck5x_Z1Y5EBE.webp) The above signal is mostly zero-valued. Applying high spikes, low spikes, or increasing trend to these types of signals will cause each and every peak to be alerted. It is better to use a static threshold instead of pattern matching functions on these types of signals. Note Alert Studio supports static threshold based alerting as well. *** ![Ineligible signals for pattern detection](/_astro/ineligible-signal-2.BCfoXVjD_Z2811Rf.webp) This signal is a discrete-time signal. At any given point in time, it can have one of three possible values (1000, 1500, 2000) or no value at all. For this type of signal, a static threshold may be a better choice. *** ![Ineligible signals for pattern detection](/_astro/ineligible-signal-4.BJQHik2Y_HVXRC.webp) ![Ineligible signals for pattern detection](/_astro/ineligible-signal-3.CFxo9RNk_Z296xyX.webp) ![Ineligible signals for pattern detection](/_astro/ineligible-signal-5.BvLglKVh_oLS2s.webp) ![Ineligible signals for pattern detection](/_astro/ineligible-signal-6.DfrIi6NR_7qqDa.webp) These signals should be handled differently as they do not follow a predictable pattern, making it difficult to detect patterns. ## Summary While deciding the pattern detection algorithm, it is important to understand the nature of the signal and the objective of the alert before choosing the algorithm. 
This guide describes a few guidelines that can be used when choosing pattern detection algorithms with Last9 Alert Studio.

***

## Troubleshooting

Please get in touch with us on [Discord](https://discord.com/invite/Q3p2EEucx9) or [Email](mailto:cs@last9.io) if you have any questions.

# Cardinality Explorer

> Identify metrics and labels impacted by cardinality.

Cardinality Explorer helps you understand how the cardinality for metrics in a Cluster is trending. This powerful feature enables you to diagnose cardinality-related challenges with your metrics.

## Using Cardinality Explorer

To view an individual metric’s cardinality contribution:

1. Navigate to **Control Plane** → **Cardinality Explorer** and select the Cluster you wish to explore

   ![Cardinality Explorer 1](/_astro/cardinality-1.CUa7wOC-_Z25CiBG.webp)

   A report with all your metrics for the selected date is generated. When the current day is selected, the data shown in the table will continue to update throughout the day.

   The report also highlights metrics if they have crossed or are nearing their cardinality quota limits:

   * Metrics in Red have crossed their daily cardinality quota
   * Metrics in Amber have crossed 80% of their daily cardinality quota

2. To view how an individual metric is contributing towards the Cluster’s cardinality, click on a metric from the table:

   ![Cardinality Explorer 2](/_astro/cardinality-2.nspUAQP-_Z1xIKcy.webp)

   You can use this detail view to diagnose issues with the selected metric using:

   * **Cardinality Trend:** Using this graph you can observe how the cardinality of the selected metric has trended over the last 7 days. A sudden spike or a dip may indicate unexpected changes to the cardinality of this metric
   * **Cardinality Details:** View all the metric’s label names and their top 5 occurring label values for a selected date. Using this, you can find which labels contribute to the metric’s cardinality growth.
When the current day is selected, the reported cardinality and labels shown will continue to update throughout the day *** ## Troubleshooting Please get in touch with us on [Discord](https://discord.com/invite/Q3p2EEucx9) or [Email](mailto:cs@last9.io) if you have any questions. # Change Events > With Last9, track change events such as configuration changes and deployment events with ease along with other metrics. ## Why Change Events Matter? Software systems and their observability are not just about the telemetry data emitted from these systems. The software systems also get affected by external change events. These events can be from domains such as deployment, configuration, or external 3rd party systems. Last9 allows tracking such change events along with other metrics, seamlessly providing more context to the system observability. This document will show how to start tracking change events with Last9. Last9 offers an HTTP API that can be used to track any domain change event. Each event has two states, start and stop, and both can be tracked with Last9. Once Last9 receives the event, it converts it into a metric that can be used with other metrics for querying using PromQL in Grafana dashboards or alerting. The data flow for change events is as follows. ![Change Events Data Flow](/_astro/change-events.DkIajm3P_2cc2Xy.webp) ## Last9 Change Events API Last9 offers a REST HTTP endpoint that can be used to send the change events. The API endpoint is as follows. 
```shell
# Send a "start" change event. Replace {org_slug} and the data source
# placeholder with your own values.
curl -XPUT https://app.last9.io/api/v4/organizations/{org_slug}/change_events \
  --header 'Content-Type: application/json' \
  --header 'X-LAST9-API-TOKEN: Bearer ' \
  --data-raw '
{
  "timestamp": "2024-01-15T17:57:22+05:30",
  "event_name": "new_deployment",
  "event_state": "start",
  "data_source_name": "{levitate_cluster_data_source_name}",
  "attributes": {
    "env": "production",
    "k8s_cluster": "prod-us-east-1",
    "app": "backend-api"
  }
}'
```

Tip Refer to the [Getting Started with API](/docs/getting-started-with-api/) guide to obtain the token required for the change events API.

* `timestamp` is the ISO 8601 formatted timestamp of the event. It is optional. If not passed, the current timestamp is used
* `event_name` can be any event name depending on the context. It will be added as a label to the resulting time series
* `event_state` can be `start` or `stop`
* `attributes` will be used as labels while converting the change event to a metric
* `data_source_name` is the name of the Last9 cluster where the events will be stored. It is an optional field and can be obtained as described in the [Change Events Storage](#change-events-storage) section.

Last9 will convert the events into a metric named `last9_change_events`.

## Change Events Storage

You might be using multiple Last9 clusters. In that case, you can choose to store the change events in a Last9 cluster of your choice. The optional `data_source_name` attribute specifies the cluster where the change event will be stored. If this attribute is not passed, Last9 stores the change event in a default cluster designated for change events.

The default cluster for change events is set as follows.

![Default Cluster for Change Events](/_astro/default-change-events-cluster.C5yBbpNC_1smjDq.webp)

You can override this by specifying the `data_source_name` in the request payload. Obtain the cluster name from the Data Sources section as follows.
![Data Sources](/_astro/levitate-data-sources.CsdDzD6W_G1o5P.webp) ![Copy the Data Source Name](/_astro/copy-data-source-name.BYrCexzM_22LKj6.webp) Note Store events and related metrics in same Last9 cluster for automatic correlation between them. ## Visualize Change Events in Grafana The change events can be visualized in Grafana just like any other metrics. ![Change Events in Grafana](/_astro/change_events_in_grafana.BVfFcQfz_ZOjvKj.webp) ## Native Integrations for Change Events * [Prodvana](/docs/integrations-prodvana/) * [LaunchDarkly](/docs/integrations-launchdarkly/) *** ## Troubleshooting Please get in touch with us on [Discord](https://discord.com/invite/Q3p2EEucx9) or [Email](mailto:cs@last9.io) if you have any questions. # Clusters > Overview of Clusters ## Cluster Overview To start using Last9 you need at least one Cluster, from which you read and write metric data. In this document, we dive deep into all things related to a cluster. To get up and running fast, see our [Quick Start Guide](/docs/onboard/). Think of a Cluster as a logically separated, Prometheus API-compatible data source for all your metric data. You can create as many Clusters as you want, the number of clusters has no impact on your billing. It is typically recommended that you create Clusters for each of your environments. Example: Production Cluster, Staging Cluster, etc. ## Creating a New Cluster To create a Cluster: 1. Navigate to **Home** → **Levitate** ![Creating a Cluster 1](/_astro/cluster-1.Dd0S_MAv_ZsUNnO.webp) 2. Click the *Launch Cluster* button to launch the setup wizard ![Creating a Cluster 2](/_astro/cluster-2.qDzCVIem_1dVf3N.webp) 3. Select the AWS region you would like to deploy the cluster in. This should ideally be the same region as your application 4. Give the Cluster a descriptive name 5. Optionally, add a description which will be displayed on the Cluster Overview screen ![Creating a Cluster 3](/_astro/cluster-3.D6cLSwhK_Z1Lh0Ur.webp) 6. 
Press the **Create** button to create your new Cluster

   As the Cluster gets created, you will be presented with an access token that is automatically created. This token is required to start writing data to and reading data from the Cluster. Tokens are only shown once, so please copy or download the credentials (you can always create another token from Cluster settings).

   ![Creating a Cluster 4](/_astro/cluster-4.BVJSDX1T_Z11IwG9.webp)

   Your new Cluster is now ready to receive metrics.

7. To start writing data to this new Cluster, follow the Write Data steps to start writing data from Kubernetes, Prometheus, or AWS/CloudStream, or quickly try it out by running a local demo environment

   ![Creating a Cluster 5](/_astro/cluster-5.R9lM6D_P_Z20OmFo.webp)

   Using the **Test Config** button you can verify if your Last9 cluster has started receiving data. Click the **Next** button to start reading data/querying metrics from your Cluster.

   See our guides on how you can send data from [Prometheus](#), [OpenTelemetry](#), [VMAgent](#), or the other supported [Integrations](#).

8. To start reading metrics from your new Cluster you can use Managed Grafana, which comes included with every plan. Alternatively, you can use the provided **Read URL** to read data using any Prometheus HTTP API compatible tool like AlertManager, your own Grafana, KEDA, etc. See the guide on [how to connect your own Grafana](/docs/grafana-config/) with a Last9 cluster.

   ![Creating a Cluster 6](/_astro/cluster-6.DHUZzhn5_1WMJtu.webp)

***

## Managing a Cluster

### Cluster Usage and Performance

Last9 provides the following tools to observe the Cluster’s performance:

* [Cluster Health Dashboard](#cluster-health-dashboard) - Performance & usage metrics report
* [Query Logs](#query-logs) - Identify slow-running queries

#### Cluster Usage

![Cluster Usage](/_astro/cluster-1.Dd0S_MAv_ZsUNnO.webp)

Usage for each cluster is reported in *Samples Ingested*. A **sample** refers to a single data point in a time series.
Usage for each Cluster can be viewed from the Cluster’s details page. For more granular and historical usage, see the Cluster Health dashboard’s Sample Ingested panel.

#### Cluster Quotas

There are no *per-cluster* limits in Last9. You are billed for usage across all Clusters combined. The ingestion rate, read query rate, and data retention quotas are applied for all the data across all clusters.

#### Default Cluster Quotas

Last9’s default cluster quotas are fairly generous. In certain cases, keeping in mind performance and cost impacts, we may be able to increase a quota after a discussion with your team.

#### Write Quotas

| Type | Base Quota | Reset Period | Note |
| -------------------------------------------- | ---------- | ------------ | ------------------------ |
| Per Time Series Cardinality | 1M | Per Hour | Can be raised on request |
| Per Time Series Cardinality | 20M | Per Day | Can be raised on request |
| Streaming Aggregation Cardinality | 3M | Per Hour | Can be raised on request |
| Ingestion Concurrency | 20K | Per Second | Can be raised on request |
| Number of Metrics Aggregated in one Pipeline | 1 Metric | Per Query | Cannot be changed |

#### Read Quotas

| Type | Base Quota | Note |
| ------------------------------------------ | ---------- | ------------------------ |
| Time Series Scanned Per Query — Blaze Tier | 5M | Cannot be changed |
| Time Series Scanned Per Query — Hot Tier | 10M | Cannot be changed |
| Samples Scanned Per Query | 100M | Cannot be changed |
| Query Time Range — Blaze Tier | 2 Hours | Can be raised on request |
| Query Time Range — Hot Tier | 35 Days | Can be raised on request |

If you wish to change your quotas, please raise a request by emailing us on:

### Cluster Health Dashboard

Every Last9 Cluster comes with its own Health dashboard. To view the Health dashboard, navigate to the Cluster details page and click on the **View Health** link in the performance card.
![Cluster Health - 1](/_astro/manage-cluster-2.I73qDeRU_1xnqyS.webp)

The following Cluster Performance Metrics are available in the health dashboard:

![Cluster Health - 2](/_astro/manage-cluster-3.CclarE7r_Z13qtGu.webp)

* **Write Success** - Total successful write requests
* **Write Error** - Total failed write requests
* **Samples Ingested** - Total number of samples ingested
* **Write Availability** - Percentage of write requests successful
* **Write Latency** - Write request latency
* **Lag** - Pending samples waiting to be indexed (in bytes)
* **Read Success** - Total successful read requests
* **Read Errors** - Total failed read requests
* **Cardinality Limited** - Metrics whose cardinality has been limited
* **Read Latency** - Query latency
* **Cardinality Limiter (Early Warning)** - Metrics whose cardinality is about to be limited
* **Bytes Dropped** - Samples that permanently failed to be indexed (in bytes)

***

## Query Logs

Query Logs helps identify slow-running queries so that you can debug and optimize your PromQL.

Query Logs displays slow queries from the last 24 hours that executed successfully but took more than 1000ms (i.e. one second) to execute.

![Query Logs](/_astro/query-logs-1.B5JN_Icv_Z1wDN7A.webp)

When a slow query is identified the following details are displayed:

* **Timestamp** - Time when the query was executed
* **Query** - PromQL along with the query’s time range and query resolution step width
* **Latency** - Approximate time taken for the query to execute
* **Token Name** - The name of the token used to query
* **Tier** - Storage tier that was used for this query

***

## Cluster Settings

### Tokens

Tokens provide a mechanism for access management for your clients. We generate a default token when the Cluster is created for the first time.

#### Creating a New Token

1. Navigate to the Cluster that you wish to create a token for: **Control Plane** → **Tokens**

   ![Create Token 1](/_astro/create-token-1.C_swa1Pm_2ipltk.webp)

2.
Click **New Token**

   ![Create Token 2](/_astro/create-token-2.CARPMHuN_QBVSl.webp)

3. Provide a descriptive **Token Name**, select the access **Scope** (Write Only, Read Only, Read & Write), and click **Create**
4. Copy the generated token since it will be visible only once. This token can now be used along with the Read or Write URL (depending on the Scope selected)

   ![Create Token 3](/_astro/create-token-3.CmM_miQt_2cGNzL.webp)

#### Delete a Token

To delete/revoke a token:

1. Navigate to the Cluster that you wish to revoke a token from: **Control Plane** → **Tokens**
2. Click the **…** button and select Delete

   ![Delete Token 1](/_astro/delete-token-1._7jdZRI9_24SIGH.webp)

Note:

* This action cannot be undone. Once deleted, tokens cannot be recovered
* Tokens can only be deleted by your organization’s admin

### Write & Read Data

Refer to the list of available [Integrations](/docs/integrations/) that can be used to start writing and reading data to a Last9 Cluster.

### Access Policies

Last9 has built-in data-tiering capabilities based on retention policies. Access policies let you define policies to control which token or client can query a specified data tier.

For an in-depth look at how you can leverage this powerful feature, see our guide on Access Policies.

#### To define a new access policy:

1. Navigate to **Control Plane** → **Access Policies**

   ![Access Token 1](/_astro/access-tokens-1.BTtwTfQM_ZujuMP.webp)

   Every cluster comes with a default access policy pre-configured.

2. To define a new policy click the Create button

   ![Access Token 2](/_astro/access-tokens-2.C57iLt4V_14C8nS.webp)

   Provide the following details:

   * Policy Name: Give a descriptive name for this access policy
   * Token: Select a specific Token for which this access policy is applied or select *Any*
   * Query Client: Select a known client whose traffic Last9 can identify, or select *Any* for the policy to apply to any client
   * Tier: Select the Tier from which the queries will be served for this policy

   And click **Create**

3.
Your new access policy will be applied instantly

   ![Access Token 3](/_astro/access-tokens-3.D4vgcaue_Z3f4GQ.webp)

#### To delete an Access Policy:

1. Select the **…** button beside the access policy you wish to delete

   ![Access Token 4](/_astro/access-tokens-4.BPMC_CV__Z2nVo4L.webp)

2. Select **Delete** from the menu

Do Note:

* Access policies can only be deleted by the admin user(s) of your org
* Deleting an access policy may limit or lock access for a client or token, so please be mindful before deleting

### Macros

Macros let you define PromQL queries as reusable functions and use them as abstracted metric names across Grafana, Alert Manager, or the CLI.

We cover how to define and use Macros in detail in our [guide on PromQL Macros](/docs/promql-macros/).

#### Enabling Macros:

1. Navigate to **Control Plane** → **Macros**

   ![Macros 1](/_astro/macros-1.MBp7cJYo_JHoUi.webp)

2. Write/Paste your Macro function and click Save

   ![Macros 2](/_astro/macros-2.BA9iCfc0_ZrzDYG.webp)

   We perform validation once you click Save

   ![Macros 3](/_astro/macros-3.BeLR0Lz5_1PTsKs.webp)

   Once validated, we will save your Macro function. Do note that it will take up to 5 minutes for new Macros to be available for querying

   ![Macros 4](/_astro/macros-4.D2puijFr_k9Ug9.webp)

#### Deleting Macros:

1. Navigate to **Control Plane** → **Macros**

   ![Macros 5](/_astro/macros-5.DAarWIio_ZVrX3V.webp)

2. Click the delete icon and click confirm

Note:

* Deleted Macros will impact any queries and dashboards where the macro functions were used
* Deleted Macros may be available for queries up to 5 minutes after they have been deleted

### Streaming Aggregation

Streaming aggregation is a powerful way to control metric cardinality that is built into Last9.
Refer to our [Guide on Streaming Aggregation](/docs/streaming-aggregations/) for an in-depth tutorial.

# Configuring an Alert

> A step-by-step guide to configuring an alert rule in an Alert Group

![Alert Rule Configuration](/_astro/alert-rule-config-form.Ba66EPQi_Z1udpcU.webp)

## Pre-requisites

To be useful, each Alert Group needs Alert Rules that monitor the health of the Alert Group.

If you’ve created an Alert Group by importing from a Managed Grafana dashboard, indicators will already exist based on the PromQLs used in the dashboard. Otherwise, you’ll need to create a new indicator. An indicator must be selected while creating an alert rule.

## Rule Configuration

### Rule Name

Short and simple is good for quick identification when a notification is triggered. Keep in mind that Alert Rule names are shown along with the Alert Group names in the notification. If you want to add more context, use the Rule Description field in the [Annotations](#annotations-optional) section.

### Select Indicator

Alert Rules are run against an Indicator. If you’ve imported a Grafana dashboard, Indicators are auto-generated based on the dashboard panel PromQLs. Otherwise, you’ll have to first add the relevant Indicator. Indicators inherit the Alert Group’s datasource, but can also have their own as an override.

### Edit Label Filter (optional)

Indicator queries support PromQL variables. If the query contains a variable, you can specify a label filter so that the Alert Rule is triggered only for matching label values.

### Select Alerting Algorithm

By default, only Static Threshold is enabled. If you would like to use our [Anomaly Detection](/docs/anomalous-pattern-detection-guide/) algorithms, please write to us at [support@last9.io](mailto:support@last9.io).

### Set Threshold

This section is only visible if you’ve selected Static Threshold. You can select an operator and set the value of the threshold.
The alert rule will only trigger when it matches the threshold’s criteria.

### Configure Alert Sensitivity

Depending on the algorithm selected, the options may vary here.

In case of Static Threshold, you can specify the number of bad minutes, out of the total number of minutes, for which the rule must breach before it appears as firing in the Alert Monitor or sends a notification.

In case of the Anomaly Detection algorithms, you can specify a value ranging from 0 to 10, and decimal values are accepted. The lower the value, the more sensitive the algorithm will be.

You can click on the backtest button in the preview panel to open the indicator and algorithm calculations in Grafana and see at what values the algorithm will trigger. Play around with the query in Grafana to find a balance that you’re comfortable with.

### Severity Level

Select if this rule, when firing, should be treated as a threat or a breach. This is helpful as additional metadata for integrations like PagerDuty and OpsGenie to determine severity levels and route accordingly.

### Notification Grouping

When the alert rule is firing for multiple labelsets, it may lead to noise. In such cases, you may group notifications into a single instance. Grouped notifications call out the number of labelsets and values the alert rule is firing for.

### Annotations (optional)

Annotations are used to include additional metadata in alert notifications for an alert rule. For example, to help your team members better understand the context of an alert notification, you may want to include a brief description outlining the behavior or circumstances under which the rule should trigger. Or, include a runbook link so that your team member can quickly reach the next steps.

#### Dynamic Annotations

Annotations can be supercharged by inserting dynamic values using template variables.
Currently, the following variables are supported:

* **Labels**, where it is the value of the respective label of the timeseries under alert, with the syntax `{{ $labels. }}` or `{{ .Labels. }}`
* **Value**, where it is the worst value of the timeseries under alert, with the syntax `{{ $value }}` or `{{ .Value }}`

Template variables can be used alongside plain text as well. For example, `Service name is {{ $labels.service }}`. Usage of multiple variables in a field is also supported. Spaces in the template variable syntax are optional.

Template variables can be used in any of the annotation fields: the rule description, runbook, or even custom annotations.

Considerations:

* Apart from the labels in the metric’s timeseries, the labels of the Alert Group can also be referenced in template variables. In case the labels match, preference is given to the metric’s timeseries
* In case a label value is not present, the template variable is shown as is
* In case the template variable syntax is incorrect, the UI will display an error. Please note the supported variables above and their respective syntaxes
* Notifications with Dynamic Annotations display these dynamic values. In case of [grouped notifications](#notification-grouping), *Labels* are shown as a count of all label values and *Values* are shown as a P99 of all the worst values

##### **Sample usage of Dynamic Annotations with Splunk**

A custom annotation named `splunk_debug_url` is added to an alert rule whose value is configured as `https://search.splunk.com/?service={{$labels.service}}&stack={{$labels.stack}}`.

When alerts are generated for one or more timeseries, the values of the variables in this custom annotation will be interpolated using the labels in the timeseries. For example, `service=billing` and `stack=my-org` will lead to the link `https://search.splunk.com/?service=billing&stack=my-org` and so on.
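The interpolation behaviour described above can be sketched in a few lines of Python. This is an illustrative approximation rather than Last9's implementation: the function name `render_annotation` and the regular expression are assumptions, and only the documented rules are modelled (both variable syntaxes, optional spaces, and leaving a variable as is when its label is missing).

```python
import re

# Matches {{ $labels.name }}, {{ .Labels.name }}, {{ $value }} and {{ .Value }},
# with optional spaces inside the braces.
_VARIABLE = re.compile(
    r"\{\{\s*(?:(?:\$labels|\.Labels)\.(\w+)|(\$value|\.Value))\s*\}\}"
)


def render_annotation(template, labels, value):
    """Replace template variables in `template` using `labels` and `value`."""

    def substitute(match):
        label_name, value_token = match.groups()
        if value_token:
            return str(value)
        # A missing label leaves the template variable as is.
        return labels.get(label_name, match.group(0))

    return _VARIABLE.sub(substitute, template)


url_template = (
    "https://search.splunk.com/?service={{$labels.service}}&stack={{$labels.stack}}"
)
print(render_annotation(url_template, {"service": "billing", "stack": "my-org"}, 0.97))
# https://search.splunk.com/?service=billing&stack=my-org
```

The sketch does not model grouped notifications, where labels are summarised as a count and values as a P99.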
***

## Troubleshooting

Please get in touch with us on [Discord](https://discord.com/invite/Q3p2EEucx9) or [Email](mailto:cs@last9.io) if you have any questions.

# Send data using AWS PrivateLink

> This guide walks you through configuring AWS PrivateLink for use with Last9, so that read and write requests are sent over the private network.

The process consists of configuring an internal endpoint in your VPC, which can talk to the Last9 endpoint without going via a public network.

## Setup

### Get Last9 PrivateLink Service name

Currently, Last9 provides PrivateLink services in `eu-west-1`, `us-east-1`, and `ap-south-1`, and, for the OTLP endpoint, also in `ap-southeast-1`. Choose the service name for your region and endpoint type from the following list.

Note This applies only to customers hosted on app.last9.io. If you use an on-premise hosted Last9 offering, the Last9 Customer Success team will provide the right endpoint details. Please contact us at .

| Region | Type | Service Name | DNS |
| ---------------- | ---------- | -------------------------------------------------------------- | ------------------------------------- |
| `eu-west-1` | Write | `com.amazonaws.vpce.eu-west-1.vpce-svc-0381436dec59e8895` | `https://app-tsdb-euw1.last9.io` |
| `eu-west-1` | Read | `com.amazonaws.vpce.eu-west-1.vpce-svc-0a101a671bd82e759` | `https://read-app-tsdb-euw1.last9.io` |
| `us-east-1` | Write | `com.amazonaws.vpce.us-east-1.vpce-svc-0eda8f4ecd25af01f` | `https://app-tsdb-use1.last9.io` |
| `us-east-1` | Read | `com.amazonaws.vpce.us-east-1.vpce-svc-0fb0502d1f58dc1bb` | `https://read-app-tsdb-use1.last9.io` |
| `ap-south-1` | Write | `com.amazonaws.vpce.ap-south-1.vpce-svc-0240de6d26b096123` | `https://app-tsdb.last9.io` |
| `ap-south-1` | Read | `com.amazonaws.vpce.ap-south-1.vpce-svc-01c9bdbb02e34fe2c` | `https://read-app-tsdb.last9.io` |
| `us-east-1` | Read/Write | `com.amazonaws.vpce.us-east-1.vpce-svc-06931cb6d013ad0fc` | `https://otlp.last9.io` |
| `ap-south-1` | Read/Write | `com.amazonaws.vpce.ap-south-1.vpce-svc-026c2a0d782b7ef32` | `https://otlp-aps1.last9.io` |
| `ap-southeast-1` | Read/Write | `com.amazonaws.vpce.ap-southeast-1.vpce-svc-0f8965c5096ad1f65` | `https://otlp-apse1.last9.io` |

### Create an endpoint in your VPC

Navigate to the VPC section and select `Endpoints` in the left sidebar.

![Create an endpoint in your VPC](/_astro/90780ad-image.Cf7PayMW_Z21IRI7.webp)

Click on Create Endpoint in the top right corner. Enter an appropriate `Name tag`, and select `Other endpoint services`.

![Add Endpoint Settings](/_astro/ae9c0c2-image.DW5UwyhZ_2ftNip.webp)

Enter the Service name provided by Last9 in the earlier step and click `Verify service`.

### Additional settings

* Select the VPC for the endpoint, and in additional settings, check the box for Enable DNS Name.

  ![Enable DNS Name](/_astro/4c0d90f-image.DUAlPWFz_1KAGDt.webp)

* Select the subnets where you are running the workloads that will read from or write to Last9.

  ![Select Subnets](/_astro/2c7182a-image.CTQEaoOU_Z2anEGg.webp)

* Attach a security group to the endpoint; the endpoint must allow traffic on port 443 from your origin VPC or the specific IP address where the requests originate.
* Note: For any on-premise Last9 setups, the endpoint must allow traffic for ports 80 and 443.

  ![Advanced Settings](/_astro/fbd6be1-image.BpwmRaan_Z15z88R.webp)

## Verification

After the Endpoint status becomes available, validate the DNS change from any machine inside a subnet in your VPC for which the Endpoint is enabled.

For example, `dig app-tsdb.last9.io` should not return the public IP address; instead, it should return the private IP address of the Endpoint we just created.

![Verify Privatelink setup](/_astro/7ac4516-image.mCum88CI_Z18rdbF.webp)

That’s all. Happy sending logs, metrics, and traces to Last9 using PrivateLink.

***

## Troubleshooting

Please get in touch with us on [Discord](https://discord.com/invite/Q3p2EEucx9) or [Email](mailto:cs@last9.io) if you have any questions.
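The DNS check from the Verification section can also be scripted. The following Python sketch uses only the standard library and assumes it runs on a machine inside a subnet for which the Endpoint is enabled. The function names are hypothetical, and the check is a rough equivalent of inspecting the `dig` output by hand.

```python
import ipaddress
import socket


def resolve_addresses(hostname, port=443):
    """Return the set of IPv4 addresses `hostname` resolves to from this machine."""
    results = socket.getaddrinfo(
        hostname, port, family=socket.AF_INET, proto=socket.IPPROTO_TCP
    )
    return {result[4][0] for result in results}


def all_private(addresses):
    """Return True only if there is at least one address and all are private."""
    addresses = list(addresses)
    return bool(addresses) and all(
        ipaddress.ip_address(address).is_private for address in addresses
    )


def uses_private_endpoint(hostname):
    """Check whether `hostname` resolves only to private IP addresses."""
    return all_private(resolve_addresses(hostname))
```

Once the Endpoint is available and private DNS is enabled, calling `uses_private_endpoint("app-tsdb.last9.io")` from such a machine would be expected to return `True`, while the same call from outside the VPC would be expected to return `False`.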
# Control Plane > Manage your data, its configurations, and its lifecycle. ## Introduction ![Control Plane](/_astro/control-plane.BSV36YD6_VcsYM.webp) Last9’s Control Plane offers a first-class citizen experience for developers to manage their data, its settings, and its lifecycle. This document provides an overview of the main features and functionalities available in the Control Plane user interface. ## Tools and Configurations [input](/docs/control-plane-ingestion/) ## [Ingestion](/docs/control-plane-ingestion/) [Configurations for how your data is ingested into Last9](/docs/control-plane-ingestion/) [folder\_zip](/docs/control-plane-storage/) ## [Storage](/docs/control-plane-storage/) [Defaults and controls for storing and using your telemetry data](/docs/control-plane-storage/) [query\_stats](/docs/control-plane-query/) ## [Query](/docs/control-plane-query/) [Configure query reusability, reads, and pattern match alerts.](/docs/control-plane-query/) [monitoring](/docs/control-plane-analytics/) ## [Analytics](/docs/control-plane-analytics/) [Understand and debug system usage and performance](/docs/control-plane-analytics/) *** ## Troubleshooting Please get in touch with us on [Discord](https://discord.com/invite/Q3p2EEucx9) or [Email](mailto:cs@last9.io) if you have any questions. # Analytics > Control Plane tools to understand and debug system usage and performance. ## Cardinality Explorer ![Control Plane — Cardinality Explorer](/_astro/control-plane-cardinality-explorer.BI43P4d1_25pkXz.webp) While Last9 offers [superior defaults](/docs/managing-high-cardinality/) on per-metric per-day cardinality, you may need to identify the metrics and its labels that are impacted. Cardinality Explorer helps you understand how the cardinality for metrics and its labels is trending. This enables you to diagnose cardinality-related challenges with your metrics. [Read more](/docs/cardinality-explorer/) on how to use the Cardinality Explorer interface. 
## Slow Query Logs ![Control Plane — Slow Query Logs](/_astro/control-plane-slow-query-logs.xhv7h3DP_Z2vVnVn.webp) Quickly identify which queries are taking the longest so you can debug and optimize them. These queries could originate from Last9’s alerting, managed Grafana explore/dashboards, or your own read workloads. You can change the latency value on the filter to see slower queries; the minimum threshold is queries taking longer than 1 second. By default, logs are displayed for the last 1 hour, but the window can be extended to a maximum of the last 24 hours. ## Health Dashboard ![Control Plane — Health Dashboard](/_astro/control-plane-health-dashboard.B43nJB5j_Z2sQGsw.webp) While Last9 provides an SLA of 99.9% for writes and 99.5% for reads, you can also view the health of Last9 by clicking on Health Dashboard. You are redirected to a system-generated Grafana dashboard with panels for availability, successes/errors, latencies, lags, bytes dropped, and more. ## Usage ![Control Plane — Usage](/_astro/control-plane-usage.DT30iXYA_Zu0sb3.webp) View the ingestion trend and usage breakdown for your telemetry data by total and by type (log, span, and metric events). By default, a summary of the last 30 days is displayed. You can select an area on the chart to zoom in, or click on the icon in each date row of the breakdown table to view an hourly breakdown. You can also click on “Download CSV” to get an hourly breakdown for the last 30 days. ### What is an Event? Usage numbers are shown as Total Events. Each log line, trace span, and metric sample that is ingested by Last9 is considered an event. The number of events is calculated at the ingestion layer, before the data is used by any of the ingestion pipelines like [Streaming Aggregation](/docs/control-plane-ingestion/#streaming-aggregations), [Sensitive Data](/docs/control-plane-ingestion/#sensitive-data), [Forward](/docs/control-plane-ingestion/#forward), and [Drop](/docs/control-plane-ingestion/#drop).
*** ## Troubleshooting Please get in touch with us on [Discord](https://discord.com/invite/Q3p2EEucx9) or [Email](mailto:cs@last9.io) if you have any questions. # Cold Storage > Learn how to configure AWS S3 cold storage for log archival and cost optimization with Last9 Automatically archive logs older than 14 days to S3 for cost-effective storage and on-demand rehydration. Note The default log retention period in Last9 is 14 days. To modify this retention period for your specific needs, please reach out to our support team. ![Control Plane](/_astro/configure-cold-storage.BcBPfo9X_Z1z1F6Y.webp) ## Setup 1. **Create IAM Role** with permissions to the S3 bucket: ```json { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "s3:PutObject", "s3:GetObject", "s3:ListBucket", "s3express:CreateSession" ], "Resource": [ "arn:aws:s3:::", "arn:aws:s3:::/*" ] } ] } ``` 2. **Add Trust Relationship**: ```json { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "Service": "s3.amazonaws.com", "AWS": "arn:aws:iam::" }, "Action": "sts:AssumeRole" } ] } ``` 3. Make sure that the role session expiry is set to a **minimum of 4 hours**. Note Contact Last9 support for the LAST9\_STORAGE\_USER ARN. 4. **Enable Cold Storage** Configure your bucket name and role ARN in [Cold Storage](https://app.last9.io/control-plane/cold-storage). 5. Once cold storage is enabled, you can rehydrate the logs on demand. Read the [Rehydration](/docs/control-plane-rehydration/) guide for more details. ## Service-Level Backup Configuration You can configure which services you want to back up to your cold storage. When configuring, you have three options: 1. **Default:** With service-level backup not enabled, data is backed up with index-level granularity only, meaning you cannot rehydrate individual services. 2. **All Services:** All services are backed up and you can rehydrate individual services. 3.
**Only Selected Services:** Specify which services you want to back up, giving you more granular control. Note While creating a Rehydrated Index, if service-level backup is available for your selected time range, you can choose which specific services to rehydrate. ### Benefits of Service-Level Backup * **Targeted cost optimization:** Save money where it makes sense without compromising on critical services * **Service-appropriate retention:** Match data lifecycle to each service’s actual needs * **Strategic resource allocation:** Invest observability resources based on service priority * **Simplified compliance:** Apply different retention rules only where legally necessary *** ## Troubleshooting Need help? Join our [Discord](https://discord.com/invite/Q3p2EEucx9) or email us. # Access Cold Storage Logs via AWS Athena > Learn how to query and analyze your cold storage logs in S3 using AWS Athena's SQL interface Last9 automatically backs up your logs to a configured S3 bucket via [Cold Storage](/docs/control-plane-cold-storage). This doc shows you how to access and query these archived logs using AWS Athena, so you can run SQL-based analysis on your historical data.
## Create a database on Athena ```sql CREATE DATABASE last9; ``` ## Create a table in the database ```sql CREATE EXTERNAL TABLE last9.logs ( `timestamp` bigint, `traceid` string, `spanid` string, `traceflags` int, `severitytext` string, `severitynumber` int, `servicename` string, `body` string, `resourceschemaurl` string, `resourceattributes` array>, `scopeschemaurl` string, `scopename` string, `scopeversion` string, `scopeattributes` array, `logattributes` array> ) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 's3://customer_s3_bucket/snappy-files/' TBLPROPERTIES ('parquet.compression'='SNAPPY'); ``` ## Export AWS Profile Before running the script, ensure your AWS profile is properly configured with appropriate permissions to access both your source and destination S3 buckets, as well as Athena. ## Move logs to Athena from backup S3 bucket Save the following Python script as `insert_data_into_athena.py`. 
```python
import argparse
import os
import shutil
import tempfile

import boto3
import lz4.frame
import pandas as pd
from botocore.exceptions import ClientError


class ParquetProcessor:
    def __init__(self):
        """Initialize the processor using AWS credentials from environment"""
        self.s3_client = boto3.client('s3')
        self.athena_client = boto3.client('athena')
        self.temp_dir = tempfile.mkdtemp()

    def download_from_s3(self, bucket_name, prefix):
        """Download all .parquet.lz4 files from the specified S3 path"""
        downloaded_files = []
        try:
            print(f"Searching in bucket: {bucket_name}")
            print(f"Using prefix: {prefix}")
            paginator = self.s3_client.get_paginator('list_objects_v2')
            for page in paginator.paginate(Bucket=bucket_name, Prefix=prefix):
                if 'Contents' in page:
                    print("\nObjects found:")
                    for obj in page['Contents']:
                        print(f"Key: {obj['Key']}")
                        if obj['Key'].endswith('.parquet.lz4'):
                            print(f"Found matching file: {obj['Key']}")
                            local_file = os.path.join(self.temp_dir, os.path.basename(obj['Key']))
                            self.s3_client.download_file(bucket_name, obj['Key'], local_file)
                            downloaded_files.append(local_file)
                else:
                    print("No 'Contents' in this page")
            if not downloaded_files:
                print("No .parquet.lz4 files were found")
            return downloaded_files
        except ClientError as e:
            print(f"Error downloading files: {e}")
            return []

    def decompress_lz4(self, file_path):
        """Decompress .parquet.lz4 file to .parquet"""
        try:
            output_file = file_path.replace('.lz4', '')
            print(f"Decompressing {file_path} to {output_file}")
            with open(file_path, 'rb') as compressed:
                compressed_data = compressed.read()
            decompressed_data = lz4.frame.decompress(compressed_data)
            with open(output_file, 'wb') as decompressed:
                decompressed.write(decompressed_data)
            os.remove(file_path)
            print(f"Successfully decompressed to {output_file}")
            return output_file
        except Exception as e:
            print(f"Error decompressing file {file_path}: {e}")
            return None

    def convert_to_snappy(self, file_path):
        """Convert decompressed parquet to Snappy compression"""
        try:
            # Rewrite the file in place with Snappy compression.
            df = pd.read_parquet(file_path)
            df.to_parquet(file_path, compression='snappy')
            return file_path
        except Exception as e:
            print(f"Error converting file {file_path}: {e}")
            return None

    def upload_to_s3(self, bucket, prefix, file_path):
        """Upload a file to S3"""
        try:
            file_name = os.path.basename(file_path)
            s3_key = os.path.join(prefix.rstrip('/'), file_name)
            print(f"Uploading {file_path} to s3://{bucket}/{s3_key}")
            self.s3_client.upload_file(file_path, bucket, s3_key)
            return True
        except Exception as e:
            print(f"Error uploading file: {e}")
            return False

    def cleanup_local_files(self, snappy_files):
        """Clean up temporary local files"""
        for file in snappy_files:
            try:
                os.remove(file)
            except Exception as e:
                print(f"Error removing file {file}: {e}")
        # rmtree also clears leftovers from failed steps, where os.rmdir
        # would raise because the directory is not empty.
        shutil.rmtree(self.temp_dir, ignore_errors=True)

    def process_files(self, source_bucket, source_prefix, snappy_destination, athena_results_location=None):
        """Main process to handle the complete workflow"""
        # Note: athena_results_location is accepted but not used by this script.

        # Download LZ4 files
        lz4_files = self.download_from_s3(source_bucket, source_prefix)
        if not lz4_files:
            print("No .parquet.lz4 files found")
            return

        # Decompress LZ4 files
        decompressed_files = []
        for file in lz4_files:
            decompressed_file = self.decompress_lz4(file)
            if decompressed_file:
                decompressed_files.append(decompressed_file)
        if not decompressed_files:
            print("No files were successfully decompressed")
            return

        # Convert to Snappy
        snappy_files = []
        for file in decompressed_files:
            snappy_file = self.convert_to_snappy(file)
            if snappy_file:
                snappy_files.append(snappy_file)
        if not snappy_files:
            print("No files were successfully converted to Snappy")
            return

        # Upload to snappy destination (s3://bucket/prefix)
        dest_bucket = snappy_destination.split('//')[1].split('/')[0]
        dest_prefix = '/'.join(snappy_destination.split('//')[1].split('/')[1:])
        for file in snappy_files:
            if not self.upload_to_s3(dest_bucket, dest_prefix, file):
                print(f"Failed to upload {file}")
                continue

        # Cleanup local files
        self.cleanup_local_files(snappy_files)
        print("Processing completed successfully")


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='Process .parquet.lz4 files and upload to S3')

    # S3 and Athena configuration
    parser.add_argument('--source-bucket', required=True,
                        help='Source S3 bucket name where parquet.lz4 (last9 saves archives)')
    parser.add_argument('--source-prefix', required=True,
                        help='Source S3 prefix path where parquet.lz4 files are stored')
    parser.add_argument('--snappy-destination', required=True,
                        help='S3 path for converted snappy files')
    parser.add_argument('--athena-results', required=True,
                        help='S3 path for Athena query results')
    args = parser.parse_args()

    processor = ParquetProcessor()
    processor.process_files(
        source_bucket=args.source_bucket,
        source_prefix=args.source_prefix,
        snappy_destination=args.snappy_destination,
        athena_results_location=args.athena_results
    )
```

The script `insert_data_into_athena.py` processes `.parquet.lz4` files from the backup bucket and uploads them to a separate S3 location for querying in Athena.

### Help Command

Run the following command to see all available options and parameters:

```bash
python insert_data_into_athena.py --help
```

### Usage

```plaintext
usage: insert_data_into_athena.py [-h] --source-bucket SOURCE_BUCKET
                                  --source-prefix SOURCE_PREFIX
                                  --snappy-destination SNAPPY_DESTINATION
                                  --athena-results ATHENA_RESULTS

Process .parquet.lz4 files and upload to S3

options:
  -h, --help            show this help message and exit
  --source-bucket SOURCE_BUCKET
                        Source S3 bucket name where parquet.lz4 (where Last9 saves backup files)
  --source-prefix SOURCE_PREFIX
                        Source S3 prefix path where parquet.lz4 files are stored
  --snappy-destination SNAPPY_DESTINATION
                        S3 path for converted snappy files
  --athena-results ATHENA_RESULTS
                        S3 path for Athena query results
```

### Example Command

Here’s a sample command that processes files from your backup bucket to prepare them for Athena queries:

```bash
python insert_data_into_athena.py \
  --source-bucket last9_backup_bucket \
  --source-prefix "path/to/file/" \
  --snappy-destination "s3://customer_s3_bucket/snappy-files" \
  --athena-results "s3://customer_s3_bucket/athena-results/"
```

In this example: * `last9_backup_bucket` is your source bucket containing the archived logs * `path/to/file/` is the directory path where your .parquet.lz4 files are located * `s3://customer_s3_bucket/snappy-files` is where the converted files will be stored * `s3://customer_s3_bucket/athena-results/` is where Athena will store query results ## Check result on Athena After the data has been uploaded, you can query it using Athena with the following SQL: ```sql SELECT * FROM last9.logs; ``` This retrieves all logs from the `last9.logs` table, allowing you to verify that your data has been uploaded and is accessible through Athena. ## Troubleshooting Please get in touch with us on [Discord](https://discord.com/invite/Q3p2EEucx9) or [Email](mailto:cs@last9.io) if you have any questions. # Drop > Drop unwanted telemetry at ingestion layer using Control Plane ![Control Plane — Pipeline Sequence: Drop](/_astro/control-plane-pipeline-sequence-drop.CjNUosRc_Z22f3xD.webp) Order of Last9's pipeline processing. [Drop](https://app.last9.io/control-plane/drop) lets you discard telemetry that you don’t want to store or query, at runtime, with no code changes, redeploys, or policy updates. For example, if you need debug logs during an incident, just remove the drop rule. It’s a faster, simpler alternative to code-level governance. ## Create new drop rule Head to the Control Plane and create a new Drop Rule. ![Control Plane — New Drop Rule](/_astro/control-plane-drop-rule-new.CLRLc1gX_Z27Egd6.webp) You can configure rules with `==` and `!=` matching filters to drop incoming data. Do note, this data is not ingested and cannot be recovered. To verify the filters before saving the rule, you can click on “View in Dashboard”.
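To illustrate how `==` and `!=` filters narrow down what a rule matches, here is a small sketch. It is not Last9's implementation: the `Condition` type and `matches` function are hypothetical, and it assumes that all conditions in a rule must hold together and that a missing attribute counts as "not equal".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Condition:
    key: str
    op: str  # either "==" or "!="
    value: str


def matches(attributes, conditions):
    """Return True if the record's attributes satisfy every condition."""
    for cond in conditions:
        actual = attributes.get(cond.key)
        if cond.op == "==":
            if actual != cond.value:
                return False
        elif cond.op == "!=":
            if actual == cond.value:
                return False
        else:
            raise ValueError(f"unsupported operator: {cond.op}")
    return True
```

Under these assumptions, a rule such as `severity == DEBUG` combined with `env != production` would match debug logs everywhere except production.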
![Control Plane — Drop Rule](/_astro/control-plane-drop-rule.DeJ40Pav_f7F4w.webp) ## Supported Telemetry Types Supported telemetry types are logs, metrics and traces. *** ## Troubleshooting Please get in touch with us on [Discord](https://discord.com/invite/Q3p2EEucx9) or [Email](mailto:cs@last9.io) if you have any questions. # Forward > Forward telemetry data to object storage backends such as AWS S3 ![Control Plane — Pipeline Sequence: Forward](/_astro/control-plane-pipeline-sequence-forward.B9gohCZ-_1c3LrK.webp) Order of Last9's pipeline processing. ## Forward telemetry data The Last9 control plane can forward telemetry data to object storage such as AWS S3 without storing it in Last9. Forwarding is useful for logs that are rarely queried but must be kept for compliance. Note The forwarded telemetry is stored in `.gz` format in the AWS S3 bucket. ## Configure AWS S3 backend Before creating the forward rule, configure an AWS S3 bucket where Last9 can forward telemetry data using [IAM AssumeRole](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-user.html). ![Configure Cold Storage](/_astro/control-plane-cold-storage.BBvOMXtr_Z2qRNX6.webp) ## Create a new forward rule Head to the Control Plane and create a new Forward Rule. ![Control Plane — New Forward Rule](/_astro/control-plane-forward-logs.ieZd1b4m_8Vnbp.webp) To forward incoming data, you can configure rules with `==` and `!=` matching filters. Do note that forwarded data is not stored in Last9. To verify the filters before saving the rule, you can click on “View in Dashboard.” ![Control Plane — Add New Forward Rule](/_astro/control-plane-create-new-forward-rule.DNzWVYCD_ZcrBNL.webp) ## Supported Telemetry data types Supported telemetry types are logs and traces. The supported storage backend is AWS S3.
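Because forwarded telemetry lands in the bucket as `.gz` objects, anything reading it back needs to decompress it first. The sketch below assumes each object is gzip-compressed UTF-8 text with one record per line; that layout is an assumption for illustration, since the internal format is not specified here. Fetching the object bytes from S3 is left out, and `read_forwarded_records` is a hypothetical helper name.

```python
import gzip


def read_forwarded_records(gz_bytes):
    """Decompress one forwarded object and return its non-empty lines.

    Assumes newline-delimited UTF-8 text inside the gzip stream; this
    layout is an illustrative assumption, not a documented format.
    """
    text = gzip.decompress(gz_bytes).decode("utf-8")
    return [line for line in text.splitlines() if line.strip()]
```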
*** ## Troubleshooting Please get in touch with us on [Discord](https://discord.com/invite/Q3p2EEucx9) or [Email](mailto:cs@last9.io) if you have any questions. # Ingestion > Control Plane tools to configure how your data is ingested into Last9. Ingestion is the second pillar of our telemetry data platform, Last9. Once you’ve [instrumented](/docs/integrations/) your system, controls over how your telemetry data flows into Last9 do a fair bit of heavy lifting. ## Ingestion Tokens ![Control Plane — Ingestion Tokens](/_astro/control-plane-ingestion-tokens.QRukipwK_1C7Ybv.webp) Ingestion Tokens authenticate your applications when sending telemetry data to Last9. These tokens control what data can be sent and from which origins, ensuring secure data collection. A system-generated ingestion token is created when you sign up; this token is used in the setup wizard to help you configure sending data to Last9. System-generated tokens cannot be deleted. ## Access Policies ![Control Plane — Access Policies](/_astro/control-plane-access-policies.BICsVlNR_1E8uT9.webp) Set up how various clients access your data. Depending on the client type and token used, you can control from which tier (blaze, hot, cold) your data is queried. We recommend that alerting workloads always use the Blaze Tier and that reporting workloads use the Cold Tier. [Read more](/docs/access-policies/) on how to configure these policies. ## Streaming Aggregations ![Control Plane — Streaming Aggregation](/_astro/control-plane-streaming-aggregation.D5g_Gcp6_16wumQ.webp) Streaming Aggregations allow you to transform data in real time at the ingestion layer before it is stored in Last9. They enable you to generate scoped metrics at runtime without any instrumentation changes and improve the performance of your read queries by controlling the cardinality of the new metrics. [Read more](/docs/streaming-aggregations/) on how to configure these aggregations.
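As a rough illustration of why aggregating at ingestion reduces cardinality, the sketch below sums samples while dropping a chosen label, so several input series collapse into one output series. This is a conceptual sketch only, not how Last9 implements Streaming Aggregations; `aggregate_without` and the sample shape (label dictionary plus value) are assumptions.

```python
from collections import defaultdict


def aggregate_without(samples, drop_labels):
    """Sum sample values after removing drop_labels from each label set.

    samples is an iterable of (labels_dict, value) pairs. The result maps
    each remaining label set (as a sorted tuple of pairs) to its total.
    """
    totals = defaultdict(float)
    for labels, value in samples:
        key = tuple(sorted(
            (k, v) for k, v in labels.items() if k not in drop_labels
        ))
        totals[key] += value
    return dict(totals)
```

Dropping a high-churn label such as `pod` means the stored metric has one series per service instead of one per pod.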
*** Note The following configurations are applied in the same sequence as presented. Your telemetry data is first scanned for sensitive data, then matching data is forwarded, and then any matching data is dropped. These configurations let you avoid instrumentation-level changes and give you more runtime control. ## Sensitive Data ![Control Plane — New Sensitive Data Rule](/_astro/control-plane-sensitive-data-modal.Bzy1azSp_prYSY.webp) Redact sensitive data from your telemetry data at the time of ingestion. Currently supported: * Telemetry Type: Logs * Actions: Redact (default), No Action Last9 provides built-in scan rules for PII like emails, phone numbers, and credit card numbers. While the default action is to redact, you can also choose to take no action. This is particularly helpful when you just want to attach additional labels to the telemetry. Configured rules are applied in sequential order. Once saved, you can drag-and-drop to reorder the rules. ## Forward ![Control Plane — New Forward Rule](/_astro/control-plane-forward-modal.p6KVChYh_Zcbyvx.webp) While data past its retention period can be moved to your configured S3 bucket for [Cold Storage](/docs/control-plane-storage/#cold-storage), you can also configure rules with `==` and `!=` matching filters to forward incoming data directly to your cold storage without it being ingested and stored. To verify the filters before saving the rule, you can click on “View in Dashboard”. Do note, forwarded data is not available for querying until it is [rehydrated](/docs/control-plane-storage/#rehydration). Supported telemetry types: Logs and Traces. ## Drop ![Control Plane — New Drop Rule](/_astro/control-plane-drop-modal.Bl1FdNj2_Z1N2xOc.webp) You can configure rules with `==` and `!=` matching filters to drop incoming data. Do note, this data is not ingested and cannot be recovered.
To verify the filters before saving the rule, you can click on “View in Dashboard”. Supported telemetry types: Logs, Metrics and Traces. *** ## Troubleshooting Please get in touch with us on [Discord](https://discord.com/invite/Q3p2EEucx9) or [Email](mailto:cs@last9.io) if you have any questions. # Query > Control Plane tools for query-level configurations. ## Macros ![Control Plane — Macros](/_astro/control-plane-macros.BvHN4N13_2u6q8k.webp) Macros work in a way that is similar to how SQL developers use stored procedures. They take full advantage of the time-tested best practices of functions, abstractions, and reusability to replace cumbersome and error-prone methods. Use them to simplify frequently reused PromQL queries, avoid repetition, and improve abstraction and readability. [Read more](/docs/promql-macros/) on how to configure these macros. ## Scrape Interval Set this to the typical scrape and evaluation interval configured in your agent’s config file. If you set this to a greater value than your agent’s config file interval, the embedded Grafana in Explore will evaluate the data according to this interval and you will see fewer data points. Notes: * Defaults to 1m. * This does not change your agent’s scrape interval. ## Read Data ![Control Plane — Read Data](/_astro/control-plane-read-data.DmxmJe-8_QRuCH.webp) If you’re looking to use your stored telemetry data outside of Last9’s [Alerting](https://app.last9.io/alert-studio) or [Managed Grafana](https://app.last9.io/explore), you can refer to the Read Data settings to configure your choice of visualization tool. For additional settings on how to configure your own Grafana to use Last9 as a datasource, [read this](/docs/grafana-config/). ## Scheduled Search ![Control Plane — Scheduled Search](/_astro/control-plane-scheduled-search.BzqpKP_p_1RqEv7.webp) Create periodic searches on telemetry data and set alerts when patterns are found or missing.
[Read more](/docs/scheduled-search/) on how to configure these alerts. Supported telemetry types: Logs. Support for Traces is coming soon. ## Query Tokens ![Control Plane — Query Tokens](/_astro/control-plane-query-tokens.DlaYeCpX_ZTj70z.webp) Query Tokens provide read-only access to your telemetry data for external visualization tools like Grafana, alerting systems, and custom applications. A system-generated query token is created when you sign up; this token is used to set up dashboard templates, including the Health Dashboard. System-generated tokens cannot be deleted. *** ## Troubleshooting Please get in touch with us on [Discord](https://discord.com/invite/Q3p2EEucx9) or [Email](mailto:cs@last9.io) if you have any questions. # Rehydration > Learn how to rehydrate logs from AWS S3 cold storage to Last9 Rehydrate logs from AWS S3 cold storage to Last9 on demand. Note 1. Make sure that [cold storage](/docs/control-plane-cold-storage/) is enabled before rehydrating the logs. 2. While creating a Rehydrated Index, if service-level backup is available for your selected time range, you can choose which specific services to rehydrate. ## Steps to Rehydrate Logs 1. Go to [Rehydration](https://app.last9.io/control-plane/rehydration) and click on “New Rehydrated Index”. 2. Select the time range for rehydration. ![Rehydrate Logs](/_astro/rehydrate-logs-on-demand.OpxXPApp_1nCyUK.webp) 3. (Optional) Add an email address to be notified on completion. ![Rehydrate Logs](/_astro/rehydrate-logs-on-demand.OpxXPApp_1nCyUK.webp) 4. Click **Rehydrate**. The job will appear as **pending** in the Rehydration tab. ![Rehydrate Logs Indexes](/_astro/rehydration-list.CiY3gH5Z_2jPYRv.webp) 5. Once complete, the status changes to **completed**. ![Rehydrated Logs Available](/_astro/rehydrated-logs-available.D4MPqL7g_Zxksdg.webp) 6. Click **Query** to access the logs.
![Access Rehydrated Logs](/_astro/access-rehydrated-logs.DYwZC12P_1UufaX.webp) *** ## Troubleshooting Please get in touch with us on [Discord](https://discord.com/invite/Q3p2EEucx9) or [Email](mailto:cs@last9.io) if you have any questions. # Remapping > Transform and standardize your logs data by extracting and mapping fields for better searchability and analysis. ![Control Plane — Pipeline Sequence: Remapping](/_astro/control-plane-pipeline-sequence-remapping.B2HhrwIf_Z1uuIak.webp) Order of Last9’s pipeline processing. [Remapping](https://app.last9.io/control-plane/remapping) allows you to standardize your logs data by extracting fields from log lines and mapping them to consistent formats. This powerful feature helps you normalize data across different services and sources, making your logs more searchable and easier to analyze. Tip Remapping configurations are real-time and do not need any instrumentation or code changes, nor any re-deployment. Remapping consists of two primary functions: 1. **Extract**: Pull specific fields or patterns from your log lines 2. **Map**: Transform extracted fields into standardized formats This capability is valuable for scenarios like: * Normalizing different service names across your infrastructure * Standardizing severity levels from various sources (ERROR, err, Fatal, 500) * Creating consistent environment labels (prod, production, prd) * Extracting structured data from JSON or pattern-based logs * Maintaining consistent field naming conventions ## Working with Remapping ### Extract ![Control Plane — New Drop Rule](/_astro/control-plane-remapping-extract.C9KjV6W7_o1cMk.webp) 1. Navigate to [Control Plane > Remapping](https://app.last9.io/control-plane/remapping) 2. Select the “Extract” tab 3. 
View existing extraction rules in the table showing: * Name: Descriptive name of the extraction rule * Method: JSON or Pattern Match extraction method * Scope: Which lines the extraction applies to * Fields/Pattern: Which fields or patterns to extract * Action: How the extracted data is handled (Upsert/Insert) * Active Since: When the rule was activated 4. Click ”+ NEW RULE” to create a new extraction rule #### Creating a New Extraction Rule 1. Select “Extraction Method”: * **JSON:** Extract fields from structured JSON logs * **Pattern Match:** Use regex patterns to extract fields from unstructured logs 2. Choose “Extraction Scope”: * **All Lines:** Apply extraction to every log line * **Lines that match:** Apply only to lines matching specific criteria 3. Field(s) to Extract: 1. For JSON method: * Select the field(s) to extract * Example fields: requestId, thread\_id, logger\_name, etc. 2. For Pattern Match method: * Enter the regex pattern in “Pattern to Extract” field * Example: `timeseries:\s*(?P\d+)` 4. Set “Action” to “Upsert” (update if exists, insert if not) or “Insert” 5. Choose “Extract Into” option: * **Log Attributes:** Adds fields to the log’s searchable attributes * **Resource Attributes:** Adds fields to the resource’s metadata 6. Optionally add a “Prefix” to extracted field names * Example: “ec2\_” would transform “id” to “ec2\_id” 7. Enter a descriptive “Rule Name” 8. Click “SAVE” to activate the rule ### Map ![Control Plane — New Drop Rule](/_astro/control-plane-remapping-map.CtiCrDBK_2bxmz0.webp) 1. Navigate to [Control Plane > Remapping](https://app.last9.io/control-plane/remapping) 2. Select the “Map” tab 3. View “Remap Fields” section with existing mappings 4. 
Map common fields to standardized formats: * **Service:** Map various service names to consistent values * Example: `attributes["service_name"]` * **Severity:** Map different log levels to standard severity * Example: `attributes["level"]` and `attributes["levelname"]` * **Deployment Environment:** Map environment indicators * Select from available attributes 5. Preview the mapping results in the “Preview (Last 2 mins)” section below * SERVICE: How service names appear after mapping * SEVERITY: Standardized severity levels * DEPLOYMENT ENV: Normalized environment names * LOG ATTRIBUTES: Other log details * RESOURCE ATTR: Resource-related information 6. After configuring mappings, click “SAVE” ## Example Use Cases 1. **Standardizing Service Names**: Map various service identifiers to consistent names * Raw values: “auth-svc”, “auth\_service”, “authentication” * Mapped to: “authentication-service” 2. **Normalizing Severity Levels**: Create consistent severity levels across sources * Raw values: “ERROR”, “err”, “Fatal”, “500” * Mapped to: “ERROR” 3. **Extracting Thread Information**: Pull thread details from logs for better filtering * Extract fields: thread\_id, thread\_name, thread\_priority * Makes thread-based troubleshooting more efficient 4. 
**Environment Consistency**: Standardize environment naming * Raw values: “dev”, “development”, “preprod”, “staging” * Mapped to consistent environment names ## Tips for Effective Remapping * **Start Simple:** Begin with the most common fields you search by * **Use Consistent Naming:** Follow a naming convention for all mapped fields * **Check Preview Results:** Use the preview section to verify your mappings work as expected * **Consider Extraction Order:** Remember that attributes will be looked up in the sequence they are entered * **Use JSON When Possible:** JSON extraction is more reliable for structured logs * **Test Pattern Matches:** Validate regex patterns before implementing them *** ## Troubleshooting If your remapping rules aren’t working as expected: 1. Check the extraction pattern syntax for errors 2. Verify field names match exactly what appears in your logs 3. Ensure your extraction scope is appropriate 4. Look at the preview to confirm data is flowing as expected 5. Try simplifying complex regex patterns Please get in touch with us on [Discord](https://discord.com/invite/Q3p2EEucx9) or [Email](mailto:cs@last9.io) if you have any questions. # Sensitive Data > Redact sensitive data from telemetry at ingestion layer using Control Plane ![Control Plane — Pipeline Sequence: Sensitive Data](/_astro/control-plane-pipeline-sequence-pii.CwEm9qV0_Z1vMCkA.webp) Order of Last9's pipeline processing. [Sensitive Data](https://app.last9.io/control-plane/sensitive-data) automatically detects and redacts personally identifiable information (PII) and other sensitive data from your telemetry at ingestion time — no code changes, no redeploys, no policy updates. For example, if customer phone numbers start appearing in your logs, just create a redaction rule to automatically replace them with asterisks. It’s a faster, simpler alternative to code-level data sanitization. ## Create new sensitive data rule Head to the Control Plane and create a new Sensitive Data Rule. 
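To show what redaction with asterisks looks like in practice, here is a minimal sketch. The regular expressions are deliberately simple and purely illustrative; they are not the detection rules Last9 uses, and the `PATTERNS` table and `redact` function are hypothetical names.

```python
import re

# Illustrative patterns only; real-world PII detection is much broader.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def redact(line):
    """Return (redacted_line, found_types) for a single log line.

    Each match is replaced by asterisks of the same length, and the
    names of the patterns that matched are returned for labelling.
    """
    found = []
    for pii_type, pattern in PATTERNS.items():
        line, count = pattern.subn(lambda m: "*" * len(m.group()), line)
        if count:
            found.append(pii_type)
    return line, found
```

The returned list of matched types could feed labels such as `pii_type: phone` on the affected log lines.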
![Control Plane — New Sensitive Data Rule](/_astro/control-plane-pii-new.BeCRzBCz_ZKwSAp.webp) You can configure rules to scan for different types of Personally Identifiable Information (PII), including email addresses, phone numbers, and credit card numbers. When sensitive data is detected, you can choose to redact it (replace with asterisks) or take no action. Additional labels can be attached to matching samples for filtering and alerts. ![Control Plane — Sensitive Data Rules List](/_astro/control-plane-pii.BOH1lvIR_2rD87p.webp) ## Configuration Options ### Telemetry Data The only telemetry type currently supported is **logs**. All samples for the selected telemetry data will be scanned using the configured rules. ### Scan Rules Choose which types of sensitive data to detect: * **Email** - Detects email addresses in your log data * **Phone Number** - Identifies phone numbers across various formats * **Credit Card Number** - Finds credit card numbers and payment card data ### Actions Available actions for detected sensitive data: * **Redact** - Replace matching sensitive data with asterisks (**\***) * **No Action** - Detect and label but don’t modify the data ### Additional Labels Add custom labels (key:value pairs) to samples containing sensitive data. These labels can be used for filtering, alerting, and downstream processing. Common examples: * `sensitive_data: true` * `redacted: true` * `pii_type: phone` ## Supported Telemetry Types The only telemetry type currently supported is **logs**. Support for metrics and traces will be added in future releases. *** ## Troubleshooting Please get in touch with us on [Discord](https://discord.com/invite/Q3p2EEucx9) or [Email](mailto:cs@last9.io) if you have any questions. # Storage > Control Plane tools for defaults and configuring how your telemetry data is stored and re-used. ## Sampling, Tiering, and Retention 1. **Sampling:** Last9 applies no sampling on your data to ensure an accurate representation of your system’s health. 2.
**Data Tiering:** Last9 offers automated data tiering by default for your metrics data, including metrics generated by the traces-to-metric and logs-to-metric pipelines. * *Blaze Tier:* Last 2 hours * *Hot Tier:* Last 28 days * *Cold Tier:* As per your Cold Storage 3. **Retention:** Metrics data is retained for 28 days by default with cold storage for backup. Logs and Traces data is retained for 14 days with cold storage for backup and on-demand rehydration. ## Cold Storage ![Control Plane](/_astro/control-plane-cold-storage.BBvOMXtr_Z2qRNX6.webp) For your logs and traces, Last9 currently offers an integration with your AWS S3 bucket to store data older than 15 days. This data will be available for on-demand rehydration to run queries and report on. Read the [Cold Storage](/docs/control-plane-cold-storage/) guide for more details. ## Rehydration Historical logs can be rehydrated for later consumption as needed. You can rehydrate based on a time range filter. Additionally, for live debugging use cases, Last9 performs automatic rehydration of up to 10M log lines when the requested time range is beyond the retention period. Read the [Rehydration](/docs/control-plane-rehydration/) guide for more details. *** ## Troubleshooting Please get in touch with us on [Discord](https://discord.com/invite/Q3p2EEucx9) or [Email](mailto:cs@last9.io) if you have any questions. # Create a GCP service account with read-only access > Step-by-step guide to create a GCP service account with read-only access for monitoring ## Objective A service account is required to access GCP environment resources for monitoring. This doc provides step-by-step information on creating a GCP service account with monitoring read-only access. Once you have created the account, share the configuration with the Last9 team so that the monitoring data can be sent to [Last9](https://last9.io/).
## Prerequisites * Go to the Google Cloud Console ([console.cloud.google.com](https://console.cloud.google.com/)) * Select the project in which you want to create the service account * Click on the “IAM & Admin” tab in the left navigation menu * Click on the “Service Accounts” tab ![Create Service Account](/_astro/gcp-account-create-service-account-1.Cy7-X-0W_1QfijE.webp) Note For a GCP Project, ensure that you have access to create credentials and grant permissions. ## Creating Service Account * Click on the “Create Service Account” button * Enter the following details 1. Service Account Name: `last9-monitor` 2. Service Account ID: `last9-monitor` 3. Service Account Description: *Allows Last9 API access to read resource metadata and monitoring data* * Click on the “Create and Continue” button ![Create Service Account Form](/_astro/gcp-service-account-create-form.BVZXqA16_Z1hp3SH.webp) ## Monitoring Viewer Role Grant permissions to this Service Account with the `Monitoring Viewer` role.
![Monitoring Viewer Role](/_astro/gcp-service-account-monitoring-viewer-role.CX4DjOpV_1rtG4q.webp) Grant other users internal to your organization access to this Service Account (Optional) ![Add other users optionally](/_astro/gcp-service-account-other-users.kXvAuLga_ZAEOr4.webp) ## Generate Credentials * Click on the newly created Service Account to view more details ![Click on the Service Account](/_astro/gcp-service-account-list.LR7c1sX2_Z2g2BA6.webp) * Create a new Service Account Key ![Create a new Service Account Key](/_astro/gcp-service-account-create-access-key.Bs-ifppC_257u0q.webp) ![Download the Service Account Key](/_astro/gcp-service-account-create-access-key.Bs-ifppC_257u0q.webp) * Share the downloaded key with your Last9 team # Creating Log Analytics Dashboards > Guide to creating Log Analytics Dashboards from aggregated queries in Logs Explorer ## Introduction Creating custom log analytics dashboards in Last9 allows you to visualize and monitor log data through aggregated metrics. This guide explains how to create and promote log queries into dashboard visualizations. ## Starting with Logs Explorer ### Using Editor Mode 1. Navigate to the [Logs Explorer](https://app.last9.io/logs) in Last9 2. Switch to Editor Mode, which enables writing advanced LogQL-compatible queries 3. Write a normal query to explore your data ```sql {service="adservice"} ``` 4. Convert it into an aggregation query by adding an aggregation function ```sql sum by (severity) (count_over_time({service="adservice"} [1m])) ``` 5. Promote the query to a dashboard by clicking the **Add to Dashboard** button ![Promote to Dashboard](/_astro/logs-add-aggregated-query-to-dashboard.HOKp2FXX_ZlEES4.webp) 6. Create a new dashboard or add it to an existing dashboard by adding a unique panel name ![Create Dashboard](/_astro/logs-create-dashboard._vgW065K_ZT7GM0.webp) 7.
You will be redirected to the new dashboard with your query added as a panel ![Dashboard with Panel](/_astro/logs-dashboard-with-panel.CRAqxXnl_Z1UzwDY.webp) 8. You can edit the panel by clicking the **⋮** button and then the **edit** button ![Edit Panel](/_astro/logs-edit-panel.CACQUdWF_8net1.webp) 9. Add multiple panels to the dashboard by following the same steps as above ### Supported Aggregation Functions Last9 supports several aggregation functions for creating meaningful visualizations: * Time-based aggregations: * **`count_over_time`**: Counts the number of logs over time * **`sum_over_time`**: Sums the values of a numeric field over time * **`avg_over_time`**: Averages the values of a numeric field over time * **`max_over_time`**: Finds the maximum value of a numeric field over time * **`min_over_time`**: Finds the minimum value of a numeric field over time * **`rate`**: Calculates the rate of change of a numeric field over time * Statistical aggregations: * **`sum`**: Sums the values of a numeric field * **`avg`**: Averages the values of a numeric field * **`count`**: Counts the number of logs * **`max`**: Finds the maximum value of a numeric field * **`min`**: Finds the minimum value of a numeric field * **`stddev`**: Calculates the standard deviation of a numeric field * **`median`**: Finds the median value of a numeric field * **`stdvar`**: Calculates the standard variance of a numeric field ## Query Construction Guidelines ### Time Windows Specify time windows using the following formats: * Minutes: **`[1m]`** * Hours: **`[1h]`** * Days: **`[1d]`** ### Query Examples Basic severity-based aggregation: ```sql sum by (severity) (count_over_time({service!="user-service"} [1m])) ``` Complex bucket-based aggregation: ```sql sum by (bucket) (count_over_time({ service="unknown", ingestor="s3", bucket=~"elb-logs", log.file.path=~".*api.*" } [3h])) ``` ## Best Practices ### Time Range Selection * Match window size to query requirements * For instant queries, 
set time range equal to window size * Consider data retention and query performance when selecting time ranges ### Query Performance * Leverage accelerated queries by including Service or Severity filters * Use specific filters to reduce data scanning * Test queries with smaller time ranges before expanding to larger windows ### Dashboard Organization * Group related visualizations together * Use clear, descriptive titles * Include context in dashboard descriptions * Set appropriate refresh intervals based on data update frequency *** ## Troubleshooting Please get in touch with us on [Discord](https://discord.com/invite/Q3p2EEucx9) or [Email](mailto:cs@last9.io) if you have any questions. # Delegate Subdomain between two AWS Accounts using Route 53 > Step-by-step guide on how to delegate Subdomain between two AWS Accounts using Route 53 Note This is specifically useful for Last9 BYOC deployments ## Assumptions 1. AWS Account A is the Primary Account 2. AWS Account B is the Sub Account 3. `example.com` is an arbitrary domain used purely for easy understanding 4. You have enough permissions granted by your AWS Admin to add/modify Route53 ## Premise To set up `subdomain.example.com` as a hosted zone in **AWS Account B** and extend it for internal usage (e.g., `internal.subdomain.example.com`), you need to delegate authority for the subdomain from **AWS Account A** to **AWS Account B**. This process involves creating a new hosted zone in **Account B** for the subdomain and then updating the parent hosted zone in **Account A** to delegate DNS resolution to the nameservers for the subdomain in **Account B**. ![AWS Route 53 Subdomain Delegation](/_astro/route53-subdomain-soa.My5vooC4_Ukx4L.webp) ## Step-by-Step Procedure ### Step 1: Create a Hosted Zone for the Subdomain in Account B 1. **Sign in to the AWS Management Console for Account B** 2. 
**Create a Hosted Zone**: * Navigate to the Route 53 console * Click on **Hosted zones** in the left navigation pane * Click the **Create hosted zone** button * Enter `subdomain.example.com` as the domain name * Choose the type as **Public hosted zone** (or **Private hosted zone for Amazon VPC** if it’s for internal usage) * Click **Create hosted zone** 3. **Note the Nameservers**: * After the hosted zone is created, note the nameservers (NS records) provided by Route 53 for the new hosted zone. You will need these nameservers to delegate the subdomain from Account A ### Step 2: Delegate the Subdomain from Account A to Account B 1. **Sign in to the AWS Management Console for Account A** 2. **Navigate to the Hosted Zone for `example.com`**: * Go to the Route 53 console * Click on **Hosted zones** in the left navigation pane * Click on the hosted zone for `example.com` 3. **Create NS Record for the Subdomain**: * Click **Create record** * Choose **Simple routing** and click **Next** * For **Record name**, enter `subdomain` (to delegate `subdomain.example.com`) * Choose **Record type** as **NS - Name Server** * In the **Value** field, enter the nameservers for `subdomain.example.com` provided by Account B * Click **Create records** ### Step 3: Create a Hosted Zone for Internal Usage in Account B 1. **Sign in to the AWS Management Console for Account B** 2.
**Create a Hosted Zone for `internal.subdomain.example.com`**: * Navigate to the Route 53 console * Click on **Hosted zones** in the left navigation pane * Click the **Create hosted zone** button * Enter `internal.subdomain.example.com` as the domain name * Choose the type as **Private hosted zone for Amazon VPC** * Select the appropriate VPCs * Click **Create hosted zone** ## Verification **Public Subdomain Delegation**: * You can verify that `subdomain.example.com` is correctly delegated by using the `dig` or `nslookup` commands: ```plaintext dig ns subdomain.example.com ``` **Internal Subdomain Resolution**: * For `internal.subdomain.example.com`, ensure that your VPC’s DNS settings are configured correctly and that Route 53 resolver endpoints are set up if necessary ## Summary 1. **Create a hosted zone for `subdomain.example.com` in Account B** and note the nameservers 2. **Delegate the subdomain** from Account A to Account B by creating an NS record in the `example.com` hosted zone in Account A pointing to the nameservers for `subdomain.example.com` in Account B 3. **(Optional) Create a hosted zone for `internal.subdomain.example.com`** in Account B for internal DNS resolution *** ## Troubleshooting Please get in touch with us on [Discord](https://discord.com/invite/Q3p2EEucx9) or [Email](mailto:cs@last9.io) if you have any questions. # FAQs > Frequently Asked Questions ## What are the challenges of running your own Prometheus? The key challenges faced by an enterprise: * Modern time series systems don’t grow along a single axis of Cardinality, Coverage, or Retention alone. Instead, the rate of ingestion and exploration warrants an expansion on all three axes. There is a constant need to scale the time series database vertically * Data is abundant, but it’s not being used.
**Growing costs of storing and querying this data** lead to time and effort in auditing what metrics are used, how much data retention is necessary, and trimming from your database * **Enterprises must dedicate full-time engineering resources to manage their time series database**. Automation to scale with rapid changes in data shape and recovery from downtime must be implemented and maintained. To support needs across the business, teams create multiple database instances to handle concurrency and implement query sharding to improve performance. Orgs will also implement a solution like Thanos or Cortex to enable long-term storage of metric data. These implementations will cost significant engineering time to create and maintain and make things operationally confusing for developers sending and querying metrics ## Is Last9 fully managed? You can set up a cluster and get going in under two minutes. How, you ask? Read our [How to onboard to Last9](/docs/onboard/) guide for more details. ## Should an enterprise run Prometheus internally, or can Last9 support all internal enterprise requirements related to Prometheus? An enterprise can run Prometheus internally or offload all metrics storage to [Last9](https://last9.io/). We provide flexibility in our offerings per the requirements most suited for our customers. Prometheus is open-source software that collects metrics from targets by “**scraping**” metrics HTTP endpoints and **stores** the metrics as time-series data. ## Is Last9 SOC2 compliant? Last9 cares deeply about its customers’ data and is SOC2 Type II certified. ## Can Last9 be deployed on infrastructure owned by the enterprise? Yes, Last9 can also run on any cloud provider of your choice. ## What features are available on SaaS vs. on-prem instances? All features, including retention, are available on SaaS and on-prem instances. ## How do enterprises ingest data from GCP, Azure & AWS? All cloud providers have a Prometheus-compatible exporter.
This can be run to scrape metrics from all infrastructure resources and ingest them into Last9. ## Does the data go over the open internet? Yes, by default, it does, and securely. Alternatively, you could do VPC Peering and remote write data to Last9 over HTTP. ## How to retrieve data for internal consumption? You can query the data just as you would from Grafana or any other visualization tool that queries TSDB data. Read our [How to configure Grafana for Last9](/docs/grafana-config/) guide for more details. # Getting started with API > Step-by-step walkthrough on how to obtain the API tokens for performing various operations with Last9 The API provides a programmatic method to access and operate Last9. It exposes a subset of the features and actions that can be performed on Last9 as REST APIs. For example, you can send [change events](/docs/change-events/) to Last9 using these APIs or you can [generate alert rules](/docs/alerting-via-iac/). Tip These APIs differ from the instrumentation and configurations required for data ingestion into Last9 clusters. They also differ from querying APIs provided by Last9 to read data from a Last9 cluster. ## Base URL The base API URL can be obtained from the [API Access](https://app.last9.io/apiaccess) page. It is in the following format: ```text https://{domain}/api/{version}/organizations/{org}/{endpoint} ``` The `{org}` parameter is your unique organization slug. ![API Access and Base URL](/_astro/api-access-base-url.iUTTFC5q_139T2Y.webp) ## Tokens Authentication is performed using Bearer access tokens. Tokens are generated for a logged-in user. Click “Generate Tokens” to create the tokens. Three pairs of access and refresh tokens are generated, with authentication-specific claims for read, write, and delete operations.
![Generate API Tokens](/_astro/api-access-base-url.iUTTFC5q_139T2Y.webp) ### Token Expiry Tokens expire in 24 hours. The user or application that uses these tokens should account for expiry and incorporate a refresh mechanism using the refresh token issued along with the access token. The following error will be raised when the access token expires. ```json { "error": "Authorization token is expired" } ``` In such scenarios, you can generate a new access token using the refresh token as follows. ```text POST /v4/oauth/access_token ``` Request Body: ```json { "refresh_token": "eyJhbGciOiXXXXXXXXXXXXX.eyJleHXXXXXXXXX.XXXXXXXXXOwuvUNA" } ``` If the refresh token in the request body is valid, the response of this endpoint will contain a new access token and refresh token pair. Response: ```json { "access_token": "eyJhbGciOiXXXXXXXXXXXXXX.eyJleHXXXXXXXXX.XXXXXXXXXOwuvUNA", "expires_at": 1587412870, "issued_at": 1587240070, "refresh_token": "eyJhbGciOiXXXXXXXXXXXXX.eyJleHXXXXXXXXX.XXXXXXXXXOwuvUNA", "type": "Bearer", "scopes": ["read", "write", "delete"] } ``` ## Usage The tokens are separated by scope, based on the impact their operations might have on the system’s overall behavior. * **Read Tokens**: Have a minimum impact on the performance of the Last9 application. These are to be specifically used for reading the current state of the data * **Write Tokens**: Use this token to create or modify data in any supported entity. This could change the behavior of your usage of Last9 * **Delete Tokens**: Use this token judiciously. This could break the processes and cause an irrevocable state through missing data ## Authentication & Authorization All public API endpoints require a Token to be supplied as an authorization header for all requests. The token is used to identify the user/application and authenticate the requests to the API. The header name must be **X-LAST9-API-TOKEN**.
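The expiry and header rules above can be sketched in Python. This is a minimal sketch under stated assumptions, not an official client: the function names, the 60-second leeway, and the full refresh URL passed in by the caller are hypothetical, since the docs only give the relative path `POST /v4/oauth/access_token`. The header name, the `Bearer` prefix, and the `refresh_token` / `expires_at` fields come from the examples in this guide.

```python
import json
import time
import urllib.request
from typing import Optional


def build_auth_headers(access_token: str) -> dict:
    """Headers for an API call; the header name and Bearer prefix follow this guide."""
    return {
        "X-LAST9-API-TOKEN": f"Bearer {access_token}",
        "Content-Type": "application/json",
    }


def needs_refresh(token_response: dict, now: Optional[float] = None,
                  leeway_seconds: int = 60) -> bool:
    """True when the access token is expired or about to expire.

    `expires_at` is the Unix timestamp from the token response.
    `leeway_seconds` is an illustrative safety margin, not a documented value.
    """
    current = time.time() if now is None else now
    return current >= token_response["expires_at"] - leeway_seconds


def build_refresh_request(refresh_url: str, refresh_token: str) -> urllib.request.Request:
    """Build the POST request that exchanges a refresh token for new tokens.

    `refresh_url` is the full URL of the /v4/oauth/access_token endpoint;
    how it is composed for your organization is an assumption left to the caller.
    """
    body = json.dumps({"refresh_token": refresh_token}).encode("utf-8")
    return urllib.request.Request(
        refresh_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def refresh_tokens(refresh_url: str, refresh_token: str) -> dict:
    """Send the refresh request and return the parsed token response."""
    request = build_refresh_request(refresh_url, refresh_token)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

A caller would typically check `needs_refresh(tokens)` before each request, call `refresh_tokens(...)` when it returns `True`, and then pass the new access token to `build_auth_headers`.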
## Making your first API request Please follow the steps below to make your first API request for a change event. ### Step 1: Generate Tokens Refer to the Tokens section above and generate the tokens from the API Access page. For your first request, copy the Write Access token. ### Step 2: Base URL The base URL of your instance can be obtained as specified in the Base URL section above. ### Step 3: Making the API request The endpoint for creating change events is ```text PUT /change_events ``` Request Body: ```json { "timestamp": "2024-01-15T17:57:22+05:30", "event_name": "new_deployment", "event_state": "start", "attributes": { "env": "production", "k8s_cluster": "prod-us-east-1", "app": "backend-api" } } ``` The cURL request looks as follows: ```shell curl --location --request PUT 'https://app.last9.io/api/v4/organizations/github-prathamesh-sonpatki/change_events' \ --header 'X-LAST9-API-TOKEN: Bearer ' \ --header 'Content-Type: application/json' \ --data '{ "timestamp": "2024-01-15T17:57:22+05:30", "event_name": "new_deployment", "event_state": "start", "attributes": { "env": "production", "k8s_cluster": "prod-us-east-1", "app": "backend-api" } }' ``` ### Step 4: Verify the response The API will return the following response in case of success with HTTP status code 200. ```json { "message": "success" } ``` *** ## Troubleshooting Please get in touch with us on [Discord](https://discord.com/invite/Q3p2EEucx9) or [Email](mailto:cs@last9.io) if you have any questions. # Getting Started with Events > This document explains how to send events to Last9 and different ways to consume them such as via streaming aggregation. Last9 time series data warehouse is a powerful tool for storing and analyzing time series data. Last9 also supports ingesting real-time events and converting them into metrics so that they can be consumed in myriad ways engineers are already familiar with. This document outlines the steps necessary to send events to Last9 and the different ways of consuming them.
## Events Generally, there are two types of events. 1. They happen over time, and their performance, like frequency, presence, or absence, is interesting * Example: Average hourly take-offs from the San Francisco Airport in the last week. 2. The event and its data are of interest * Example: When was the last time Arsenal won in EPL? The first example asks questions based on specific aggregations performed on raw events. The individual events may not be necessary, but their aggregations and the insights captured from them are relevant to the business. The second example is about the event itself and gives insights based on the event data. Last9 supports extracting both kinds of information from the events. ## Structure of Events Last9 accepts events in the following JSON format. Every event has a unique name defined by the `event` key and a list of `properties`. Any extra keys apart from `event` and `properties` are not retained by Last9. ```json { "event": "heartbeat", "properties": { "server": "ip_address", "environment": "staging" }, "extra_fields": "will be dropped" } ``` ## Sending Events to Last9 ### Prerequisites Grab the [Prometheus Remote Write](https://last9.io/blog/what-is-prometheus-remote-write/) URL, cluster ID, and the write token of the Last9 cluster, which you want to use as an Event store. Follow the instructions [here](/docs/onboard/) if you haven’t created the cluster and its write token. ### Sending data Grab the Prometheus Remote Write URL for your Last9 Cluster, and make the following changes to the URL. If your Prometheus URL is `https://username:token@app-tsdb.last9.io/v1/metrics/{uuid}/sender/acme/write`, the Event URL would be `https://username:token@app-events-tsdb.last9.io/v1/events/{uuid}/sender/acme/publish` Note Refer to the table below to get the hostname for the Events gateway depending on the region in which the Last9 cluster exists.
![Last9 Cluster Prometheus Remote Write URL details](/_astro/d8b8120-image.CMG1bCRo_ZGwWJA.webp) | Cluster Region | Host | | :----------------- | :---------------------------- | | Virginia us-east-1 | app-events-tsdb-use1.last9.io | | India ap-south-1 | app-events-tsdb.last9.io | Events must be sent with the `Content-Type: application/json` header. ```bash curl --location --request POST 'https://username:token@app-events-tsdb.last9.io/v1/events/{uuid}/sender/acme/publish' \ --header 'Content-Type: application/json' \ --data-raw '[{ "event": "sign_up", "properties": { "currentAppVersion": "4.0.1", "deviceType": "iPhone 14", "dataConnectionType": "wifi", "osType": "iOS", "platformType": "mobile", "mobileNetworkType": "wifi", "country": "US", "state": "CA" } }]' ``` The API endpoint accepts an array of events in the payload so one or more events can be sent in the same packet. The API is greedy and allows partial ingestion. If one or more events in the packet have a problem, they are returned in the response body; everything else is ingested into the system. ## Consuming events as metrics Events are converted to Metrics at 1 DPM (Data Point per minute) and combined to emit `gauges` per combination of `event name + properties` for the previous minute. We use **[Tumbling Windows](https://learn.microsoft.com/en-us/stream-analytics-query/tumbling-window-azure-stream-analytics)** to represent the Event stream’s consistent and disjoint time intervals. All published events are partitioned into buckets of 1 minute each and then grouped by event name and properties. For example, the elements with timestamp values `[0:00:00-0:01:00)` are in the first window. Elements with timestamp values `[0:01:00-0:02:00)` are in the second window. And so on. ![Events timing diagram](/_astro/35a7331-image.ChCMUJBT_ZPktPe.webp) The following events will produce the following metrics: ```bash event_name_count{properties...1} 5 event_name_count{properties...2} 2 ...
event_name_count{properties...n} 5 event_name_count{properties...1} 5 event_name_count{properties...2} 1 ... event_name_count{properties...n} 3 event_name_count{properties...1} 2 event_name_count{properties...2} 4 ... event_name_count{properties...n} 3 ``` ### Define Streaming Aggregations You need to define streaming aggregations to query the metrics converted from events. Last9 allows defining a Streaming Aggregation as a PromQL to emit an aggregated metric that alerts or dashboards can then consume. ```yaml - promql: "sum(device_health_total{version='1.0.1'})[5m] by (os)" as: total_devices_by_os_5m with_name: total - promql: "sum(device_health_total{os='ios'})[1m] by (version)" as: concurrency_by_ios_version with_name: concurrency ``` Please refer to the PromQL-powered [Streaming Aggregations](/docs/streaming-aggregations/) to understand the workflow of where and how to define the Streaming Aggregation Pipelines. This feature enables the folding of all metrics that would otherwise explode in cardinality and allows for the emission of meaningful aggregations and views. It is also available for Last9 Metrics, not just limited to Events. 
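To make the tumbling-window conversion from the "Consuming events as metrics" section concrete, here is a minimal Python sketch. It is an illustration of the windowing idea only, not Last9's implementation: the function names and the in-memory `Counter` are hypothetical, and it assumes each event carries the time at which it was received, in Unix seconds. Only the 1-minute window size and the `<event>_count` naming pattern come from the text above.

```python
from collections import Counter

WINDOW_SECONDS = 60  # 1-minute tumbling windows, as described above


def window_start(received_at: float) -> int:
    """Start of the half-open window [start, start + 60) containing the timestamp."""
    return int(received_at // WINDOW_SECONDS) * WINDOW_SECONDS


def series_key(event: dict) -> tuple:
    """Metric name plus sorted properties, so equal events map to the same series."""
    labels = tuple(sorted(event.get("properties", {}).items()))
    return (f'{event["event"]}_count', labels)


def count_per_window(received_events) -> Counter:
    """Count events per (window start, metric name, properties).

    `received_events` is an iterable of (received_at_seconds, event_dict) pairs.
    Windows are disjoint, so each event is counted in exactly one window.
    """
    counts = Counter()
    for received_at, event in received_events:
        counts[(window_start(received_at), *series_key(event))] += 1
    return counts
```

For example, two `sign_up` events with identical properties received at 0.0 s and 59.9 s land in the same window, while a third received at 60.0 s starts the count for the next window.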
## Events to Gauge Metrics Consider an event named `memoryUsageSpikeAlert` with the following properties: * `increaseInBytes` indicating an increase in memory usage by 1,610,612,736 bytes * `host` represents the host’s IP address associated with the event * `osType` specifying the operating system type as “linux” ```json { "event": "memoryUsageSpikeAlert", "properties": { "increaseInBytes": "1610612736", "host": "10.1.6.14", "osType": "linux" } } ``` Define the streaming aggregation configuration for `max` of `memoryUsageSpikeAlert` as follows: ```yaml - promql: "max by (host) (memoryUsageSpikeAlert_maximum)" as: "max_memory_usage_spike" with_value: "increaseInBytes" with_name: "maximum" ``` Let’s break down the example: * `max` is the aggregation function * `maximum` is the `with_name` value appended to the intermediate metric name to maintain uniqueness. It can be any string. * `with_value` is the event’s property name on which the gauge aggregation has to be applied. In this case, it is `increaseInBytes`. * `max_memory_usage_spike` will be the final output metric that you can query against. > You can also use the `min` and `sum` aggregations. ## Querying Events Once Events have been converted to Metrics, they can be queried like metrics. This could be a Grafana Dashboard or any other Prometheus Query API client. ![Example of querying events](/_astro/f2b9159-image.Fpez-lNu_Z5vRBV.webp) You may also set alerts on these events, converted to metrics, using Prometheus-compatible Alertmanager. ## Conclusion Here’s a link to the sample repository that brings this all together. It contains some example schemas and aggregation pipelines. ## FAQs **Q: Why is time not accepted as a first-class property?** A: Accepting a User-provided timestamp is extremely risky. A timestamp may not be formatted correctly, or instead of a UnixMilli, one may send Unix alone.
Or send January 1st, 1970 as *testing* data. Such precision is unfair to expect from developers who want easier integration that is not prone to fragility. Besides, since the system is optimized for time, a malformed timestamp will result in undesired results of dropped packets or the system having to backfill data in frozen shards. Hence, we accept timestamps as when the event is received. This also means that Last9 works well with real-time events! **Q: What happens to the timestamp if there are delays in arrival?** A: Last9 Gateways are designed to be extremely fast and lightweight. They write data as early as they receive it and are highly available across multiple availability zones. They need more processing to ensure messages are not lost or delayed upon arrival. *** ## Troubleshooting Please get in touch with us on [Discord](https://discord.com/invite/Q3p2EEucx9) or [Email](mailto:cs@last9.io) if you have any questions. # Configure Grafana with Last9 as Datasource > How to configure Grafana with Last9 cluster as datasource and visualize the metrics stored in Last9 ## How to configure Grafana for Last9? Create a new data source with the appropriate URL. ![Add New Data Source in Grafana](/_astro/ba92d7c-1.BsPoFgkL_sqm7L.webp) Each Last9 Cluster comes with a Read URL which needs to be used when creating a Data Source in Grafana. ![Last9 Read Data Settings](/_astro/levitate-cluster-read-data-settings-tab.BoWiHQqm_Z1QDGCt.webp) Here’s an example of creating a Data Source using the Read URL. You can grab the Read URL either: * While creating your cluster → Read Data → Bring Your Own Visualization * Or, by going to the Last9 Cluster → Settings → Read Data → Bring Your Own Visualization ![Add Last9 Cluster as Prometheus Compatible Data Source in Grafana](/_astro/cc33383-Screenshot_2022-09-25_at_1.23.11_PM.EBzXLx9P_Z1GxBz5.webp) Note Last9 Read URL mandates authenticated access. Create a Read Token for your cluster and use it as `password`.
Use the cluster id as `user`. Add the `user` and `password` in the `Basic Auth Details` section while creating data source in Grafana. Make sure that the status of the data source appears as all ✅. ![Last9 Cluster as Prometheus Compatible Data Source Status](/_astro/c20f2ba-3.CEjLNv4t_15PQzE.webp) After this, try exploring data or create a new dashboard in Grafana based on metrics in Last9. *** ## Troubleshooting Please get in touch with us on [Discord](https://discord.com/invite/Q3p2EEucx9) or [Email](mailto:cs@last9.io) if you have any questions. # Grafana Loki in Last9 > Use Last9's embedded Grafana Loki to view logs. ## Using Grafana Loki Last9 provides a Grafana Loki interface using LogQL to explore your logs data. ![Grafana Loki in Last9](/_astro/last9-logs-loki.Co_L4Hmz_ARsJJ.webp) * Access the Loki UI by visiting [Grafana Explore](https://app.last9.io/explore/query) and selecting Loki as the datasource. * You can perform [LogQL queries](https://grafana.com/docs/loki/latest/query/) to explore logs in this interface. This is useful for structured exploration of logs data for people who are familiar with Grafana and Loki. Note You can also use [Editor Mode](/docs/logs-explorer/#editor-mode) in Last9’s Logs Explorer to perform LogQL queries as well. ## LogQL Compatibility Following functions in LogQL are supported: * **`RATE`** * **`COUNT_OVER_TIME`** * **`SUM_OVER_TIME`** * **`AVG_OVER_TIME`** * **`MAX_OVER_TIME`** * **`MIN_OVER_TIME`** * **`SUM`** * **`AVG`** * **`COUNT`** * **`MAX`** * **`MIN`** * **`STDDEV`** * **`MEDIAN`** * **`STDVAR`** Following parsers in LogQL are supported: * **`json`** * **`regexp`** Read more about the documentation for each function [here](https://grafana.com/docs/loki/latest/query/). ## Creating Dashboards ### Accessing Grafana 1. Navigate to the Grafana section in Last9 2. Create a new dashboard by clicking **Create Dashboard** 3. 
Add a new panel to begin visualizing your data ### Selecting Loki Data Source The Loki data source comes pre-configured in Last9’s embedded Grafana, so you can start querying immediately. ### Query Construction Methods #### Using Builder Mode Builder mode provides a visual interface for constructing Loki queries without writing LogQL. Here’s how to use it: 1. Label Selection * Click **Add label** to start building your query * Select labels (e.g., service, severity) from the dropdown * Choose operators (`=`, `!=`, `=~`, `!~`) * Select or type values for the labels 2. Operations * Add operations using the **Operations** button * Common operations include: * Line contains * Line does not contain * Line contains regex * Line does not contain regex * JSON 3. Aggregations * Click **Add range function** * Select functions like: * Rate * Count over time * Sum over time * Avg over time * Set time windows (\[1m], \[5m], \[1h]) 4. Examples Using Builder Mode: Basic Query: * Label: **`service = "auth-service"`** * Operation: **`Line contains "error"`** * Range: **`count_over_time [5m]`** Advanced Query: * Label: **`service =~ "api.*"`** * Label: **`severity = "error"`** * Operation: **`JSON`** * Operation: **`Line contains "timeout"`** * Range: **`sum by (status_code)`** 5.
Builder to Code Mode * Switch between modes to see the LogQL equivalent * Learn LogQL syntax through the Builder interface * Fine-tune queries in Code mode #### Writing LogQL Queries For advanced users or complex queries, you can write LogQL directly: Basic Query Structure: ```sql {service="your-service"} ``` Common Aggregation Patterns: ```sql sum by (severity) (count_over_time({service="your-service"}[5m])) ``` ### Key Query Components * Label matchers: **`{label="value"}`** * Line filters: **`|= "error"`** * Aggregation functions: **`sum`**, **`avg`**, **`max`** * Time windows: **`[1m]`**, **`[1h]`**, **`[1d]`** ### Understanding Window Behavior Remember that Last9’s window behavior differs from standard Loki: * Last9 uses tumbling windows (window size = step size) * Both window and step size are defined by the **`[]`** parameter * For instant queries, match time range to window size ### Creating Visualizations #### Panel Types 1. Time Series * Best for tracking metrics over time * Suitable for rate and count queries 2. Bar Charts * Good for comparing values across categories * Works well with **`sum by`** aggregations 3. Tables * Useful for detailed log analysis * Can show multiple columns of log data #### Panel Configuration 1. Set appropriate panel title and description 2. Configure axes and legends 3. Set up thresholds and alerts if needed 4. 
Choose color scheme for better visibility ### Advanced Query Techniques #### Using Multiple Queries ```sql sum(rate({service="auth-service"} |= "error" [5m])) by (severity) sum(rate({service="auth-service"} |= "warning" [5m])) by (severity) ``` #### Pattern Matching ```sql {service=~"auth.*"} |= "error" != "timeout" ``` #### Metric Extraction ```sql sum by (status_code) (count_over_time({service="api"} | json | status_code != "" [5m])) ``` ### Dashboard Organization #### Best Practices * Group related panels logically * Use consistent time ranges across related panels * Add descriptive titles and documentation * Consider user permissions and sharing settings #### Layout Tips * Arrange panels in order of importance * Use rows to group related visualizations * Consider different screen sizes and resolutions ### Performance Optimization #### Query Efficiency 1. Use label filters before line filters 2. Start with Service and Severity filters for better performance 3. Avoid processing unnecessary data #### Time Range Considerations * Start with smaller time ranges during development * Consider data retention policies * Use appropriate aggregation intervals *** ## Troubleshooting Please get in touch with us on [Discord](https://discord.com/invite/Q3p2EEucx9) or [Email](mailto:cs@last9.io) if you have any questions. # Grafana Tempo in Last9 > Use Last9's embedded Grafana Tempo to view traces. ## Using Grafana Tempo Last9 provides a Grafana Tempo interface to explore your traces data. ![Grafana Tempo in Last9](/_astro/last9-traces-tempo.BDwr2mve_ZaC6yb.webp) * Access the Tempo UI by visiting [Explore](https://app.last9.io/explore/query) and selecting Tempo as the datasource. * You can perform [TraceQL queries](https://grafana.com/docs/tempo/latest/traceql/) to explore traces in this interface. This is useful for structured exploration of traces data for people who are familiar with Grafana and Tempo. 
## Troubleshooting

Please get in touch with us on [Discord](https://discord.com/invite/Q3p2EEucx9) or [Email](mailto:cs@last9.io) if you have any questions.

# Calculate usage patterns and data volume in New Relic

> Sample queries to understand data volume of spans, metrics and transactions in New Relic

This document lists queries that you can run in your New Relic account and share with Last9 as part of the migration from New Relic to Last9.

## Calculating Data in New Relic

The following NRQL queries calculate the total number of spans, transactions, metrics, events, and logs over the last week.

1. Total spans

   ```sql
   SELECT count(*) FROM Span SINCE 1 week ago
   ```

2. Total transactions

   ```sql
   SELECT count(*) FROM Transaction SINCE 1 week ago
   ```

   ```sql
   SELECT count(*) FROM Transaction FACET dateOf(timestamp) SINCE 1 week ago TIMESERIES 1 day
   ```

3. Total metrics

   ```sql
   SELECT count(*) FROM Metric SINCE 1 week ago
   ```

4. Total events

   ```sql
   SELECT count(*) FROM Event SINCE 1 week ago
   ```

5. Total logs

   ```sql
   SELECT count(*) FROM Log SINCE 1 week ago
   ```

Additionally, you can share how much data is being ingested into New Relic, as shown in the data management tab.

# Calculate usage patterns and data volume in Prometheus

> Sample PromQL queries to understand ingestion rate, read query rate and total time series

## Calculating Ingestion Rate

This query calculates the per-minute ingestion rate by averaging the per-second ingestion rate over the past minute, as measured by the `prometheus_tsdb_head_samples_appended_total` metric, and multiplying it by 60. The result is the average number of samples ingested per minute over the past minute.
```sql
rate(prometheus_tsdb_head_samples_appended_total[1m]) * 60
```

## Calculating Read Query Rate

This query calculates the per-minute query rate by averaging the per-second query rate over the past minute, as measured by the `prometheus_http_requests_total` metric for requests to the `/api/v1/query` endpoint. This endpoint is used for executing queries against the Prometheus database, so this metric represents the number of read queries executed by the server.

```sql
sum by (handler) (rate(prometheus_http_requests_total{handler="/api/v1/query"}[1m]) * 60)
```

## Calculating Total Time Series

This query counts the number of distinct time series in the database, regardless of the metric or label values. The regular expression `.+` matches all series names, so this query effectively counts all series.

```sql
count({__name__=~".+"})
```

Tip
Counting the total number of time series can be resource-intensive for large databases and may take some time to complete. The result may also be inaccurate if the database is actively ingesting or deleting time series while the query runs.

# Create an AWS STS Role

> This tutorial walks through setting up an AWS STS (Security Token Service) role for discovering resources via CloudWatch

## Creating trusted role without external id

1. Visit [AWS Console/Roles](https://console.aws.amazon.com/iam/home#/roles)
2. Click [Create Role](https://console.aws.amazon.com/iam/home#/roles$new?step=type)

   ![../../../../assets/docs/tutorials/how-to-create-aws-sts-role/Screenshot\_2021-02-25\_at\_1.22.54\_PM.png](/_astro/Screenshot_2021-02-25_at_1.22.54_PM.DoIJGDgl_Z1lsmAo.webp)
3.
Select **[Another AWS Account](https://console.aws.amazon.com/iam/home#/roles$new?step=type\&roleType=crossAccount)** tab * **Account ID**: `652845092827` * **Next Permissions** ![../../../../assets/docs/tutorials/how-to-create-aws-sts-role/Screenshot\_2021-02-25\_at\_1.24.53\_PM.png](/_astro/Screenshot_2021-02-25_at_1.24.53_PM.Cg3MpJaH_ZL1Yoi.webp) 4. Attach policies Tip Last9 requires Cloudwatch Read Only Access to fetch metrics for components that are being used. Last9 also requests Security Audit Access as it enables us to identify components by name and tags. The AWS-managed SecurityAudit policy grants read-only access to logs, events and configuration detail for current and future AWS services. This policy allows Last9 to perform list and describe api calls to AWS to fetch component details such as name, ARNs and tags etc. And as Last9 provides auto detection of infrastructure components, this policy helps to minimize the access request burden on customers as they introduce new services to their infrastructure and as Last9 introduces new services to our platform. a. **SecurityAudit** Policy ![../../../../assets/docs/tutorials/how-to-create-aws-sts-role/Screenshot\_2021-02-25\_at\_1.33.46\_PM.png](/_astro/Screenshot_2021-02-25_at_1.33.46_PM.BcnkNl4a_Z2uPlmU.webp) b. **CloudWatchReadOnlyAccess** Policy ![../../../../assets/docs/tutorials/how-to-create-aws-sts-role/Screenshot\_2021-02-25\_at\_2.42.11\_PM.png](/_astro/Screenshot_2021-02-25_at_2.42.11_PM.CHmR72gn_npgHB.webp) c. Proceed to Next Steps 5. Add tags if needed ![../../../../assets/docs/tutorials/how-to-create-aws-sts-role/Screenshot\_2021-02-25\_at\_1.37.36\_PM.png](/_astro/Screenshot_2021-02-25_at_1.37.36_PM.CXA09kuf_Z1TF2eQ.webp) 6. Review 1. **Role name:** `${business_name}_last9_role` 2. **Role description**: Security Audit Access to Last9 3. **Verify Last9 AWS Account Number** 4. **Verify Granted Policy** 5. 
**Create Role** ![../../../../assets/docs/tutorials/how-to-create-aws-sts-role/Screenshot\_2021-02-25\_at\_1.44.40\_PM.png](/_astro/Screenshot_2021-02-25_at_1.44.40_PM.CMvZPgze_iWIi7.webp) 7. After the role is created, Go to Role → Trust Relationships → Edit Trust Relationship ![../../../../assets/docs/tutorials/how-to-create-aws-sts-role/2021-06-08\_22-08.png](/_astro/2021-06-08_22-08.DXphoq4l_1ivQro.webp) 8. Update the JSON to the following and click “**Update Trust Policy**” ```json { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "AWS": "arn:aws:iam::652845092827:root" }, "Action": "sts:AssumeRole", "Condition": {} } ] } ``` 9. Edit the role and update “**Maximum session duration**” to 3 hours if your security policy permits it. Else leave it as 1 hour. ![../../../../assets/docs/tutorials/how-to-create-aws-sts-role/2021-02-25\_14-14.png](/_astro/2021-02-25_14-14.sXXIC7_L_Z1HPD4H.webp) 10. Share the created role ARN with your Last9 point of contact *** ## Creating trusted role with external id 1. Visit [AWS Console/Roles](https://console.aws.amazon.com/iam/home#/roles) 2. Click [Create Role](https://console.aws.amazon.com/iam/home#/roles$new?step=type) ![../../../../assets/docs/tutorials/how-to-create-aws-sts-role/Screenshot\_2021-02-25\_at\_1.22.54\_PM.png](/_astro/Screenshot_2021-02-25_at_1.22.54_PM.DoIJGDgl_Z1lsmAo.webp) 3. Select “**[Another AWS Account](https://console.aws.amazon.com/iam/home#/roles$new?step=type\&roleType=crossAccount)**” tab with external ID as a random string. It has to be something other than “somerandomstring” and share it with Last9 * **Account ID**: `652845092827` ![2021-09-08\_17-24.png](/_astro/2021-09-08_17-24.Dosyx1mb_Z1cXXkw.webp) 4. Attach policies Tip Last9 requires Cloudwatch Read Only Access to fetch metrics for components that are being used. Last9 also requests Security Audit Access as it enables us to identify components by name and tags. 
The AWS-managed SecurityAudit policy grants read-only access to logs, events and configuration detail for current and future AWS services. This policy allows Last9 to perform list and describe api calls to AWS to fetch component details such as name, ARNs and tags etc. And as Last9 provides auto detection of infrastructure components, this policy helps to minimize the access request burden on customers as they introduce new services to their infrastructure and as Last9 introduces new services to our platform. a. **SecurityAudit** Policy ![../../../../assets/docs/tutorials/how-to-create-aws-sts-role/Screenshot\_2021-02-25\_at\_1.33.46\_PM.png](/_astro/Screenshot_2021-02-25_at_1.33.46_PM.BcnkNl4a_Z2uPlmU.webp) b. **CloudWatchReadOnlyAccess** Policy ![../../../../assets/docs/tutorials/how-to-create-aws-sts-role/Screenshot\_2021-02-25\_at\_2.42.11\_PM.png](/_astro/Screenshot_2021-02-25_at_2.42.11_PM.CHmR72gn_npgHB.webp) c. Proceed to Next Steps 5. Add tags if needed ![../../../../assets/docs/tutorials/how-to-create-aws-sts-role/Screenshot\_2021-02-25\_at\_1.37.36\_PM.png](/_astro/Screenshot_2021-02-25_at_1.37.36_PM.CXA09kuf_Z1TF2eQ.webp) 6. Review 1. **Role name:** `${business_name}_last9_role` 2. **Role description**: Security Audit Access to Last9 3. **Verify Last9 AWS Account Number** 4. **Verify Granted Policy** 5. **Create Role** ![../../../../assets/docs/tutorials/how-to-create-aws-sts-role/Screenshot\_2021-02-25\_at\_1.44.40\_PM.png](/_astro/Screenshot_2021-02-25_at_1.44.40_PM.CMvZPgze_iWIi7.webp) 7. After the role is created, Go to Role → Trust Relationships → Edit Trust Relationship ![../../../../assets/docs/tutorials/how-to-create-aws-sts-role/2021-06-08\_22-08.png](/_astro/2021-06-08_22-08.DXphoq4l_1ivQro.webp) 8. Update the JSON to the following and click “**Update Trust Policy**”. 
Ensure that the value for `sts:ExternalId` matches the external ID you set earlier.

   ```json
   {
     "Version": "2012-10-17",
     "Statement": [
       {
         "Effect": "Allow",
         "Principal": {
           "AWS": "arn:aws:iam::652845092827:root"
         },
         "Action": "sts:AssumeRole",
         "Condition": {
           "StringEquals": {
             "sts:ExternalId": "somerandomstring"
           }
         }
       }
     ]
   }
   ```

9. Edit the role and update “**Maximum session duration**” to 3 hours if your security policy permits it. Otherwise, leave it at 1 hour.

   ![../../../../assets/docs/tutorials/how-to-create-aws-sts-role/2021-02-25\_14-14.png](/_astro/2021-02-25_14-14.sXXIC7_L_Z1HPD4H.webp)
10. Share the created role ARN and external ID string with your Last9 point of contact

# Enable EC2 Service Discovery with vmagent

> This tutorial walks through setting up service discovery for EC2 instances with vmagent.

## Service Discovery

In the context of monitoring, Service Discovery refers to automatically detecting devices, services, or systems in a network that need to be monitored. Service discovery is especially important in cloud environments that use auto-scaling and EC2 instances. These environments often have instances that change rapidly, making manual tracking infeasible from a monitoring point of view.

This document lists steps to enable service discovery of EC2 instances so that new instances are monitored as they are created and decommissioned instances are removed from monitoring, which avoids false alerts.

This document assumes that the EC2 instance service discovery will be set up for vmagent to send metrics to Last9 via Remote Write.

Note
Follow our guide for [setting up vmagent on an EC2 machine](/docs/how-to-setup-vmagent-on-ubuntu/).

Given that vmagent is successfully running on an EC2 instance, we need to make provisions for vmagent to discover other EC2 instances, that is, scrape targets based on [ec2\_sd\_config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#ec2_sd_config).
## Create `ec2-trustee` IAM role with assume role policy

Go to AWS Console → IAM → Roles → Create Role

* Select Trusted Entity

  ![Select Trusted Entity](/_astro/select-trusted-entity.td_mZcGH_wiKME.webp)
* Do NOT add any permissions and click next

  ![Add Permissions](/_astro/add-permissions.CxGtOvgG_1iEkOU.webp)
* Name, Review and Create

  ![Create IAM Role Step 1](/_astro/create-iam-role-step-1.yVMOnhyv_20Wqm8.webp) ![Create IAM Role Step 2](/_astro/create-iam-role-step-2.eFuXGCwV_Z7G5MS.webp)

## Attach `ec2-trustee` IAM role to vmagent EC2 Host

EC2 Instances > Select vmagent Instance > Actions > Instance Settings

* Modify IAM Role

  ![Steps to update IAM role](/_astro/select-vmagent-ec2-instance.DhZA54AY_Z2wB0VS.webp)
* Select `ec2-trustee` IAM role and Update

  ![Modify IAM Role](/_astro/modify-iam-role.2sPhH7q7_Z25VQN6.webp)

## Create `vmagent-sd-role` IAM role

Go to AWS Console → IAM → Roles → Create Role

* Select Trusted Entity > Custom Trust Policy with below trust policy

  ![Custom Trust Policy](/_astro/custom-trust-policy.CseaBt17_x2nv0.webp)

  ```json
  {
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Principal": {
          "AWS": "arn:aws:iam::AWS_ACCOUNT_ID:role/ec2-trustee"
        },
        "Action": "sts:AssumeRole"
      }
    ]
  }
  ```
* Add Permissions

  1. Create Custom Policy with the below policy

     ```json
     {
       "Statement": [
         {
           "Action": "ec2:Describe*",
           "Effect": "Allow",
           "Resource": "*"
         }
       ],
       "Version": "2012-10-17"
     }
     ```
  2. Select Policy and click next

     ![Select Policy Step 1](/_astro/select-policy-step-1.BbUzxJ2v_Z22o1cG.webp) ![Select Policy Step 2](/_astro/select-policy-step-2.Ck7r4d89_Z1gy3O9.webp)
  3.
Name, Review and Create
     * Add `vmagent-sd-role` as the name of the role, review permissions and trusted entities and create role

       ![Add vmagent role](/_astro/add-vmagent-sd-role.DvPfQLED_voqDX.webp)

## Use the `vmagent-sd-role` ARN in vmagent configuration

Update the `scrape_configs` stanza in your `vmagent.yaml` with the `ec2_sd_configs` stanza as follows and restart vmagent.

```yaml
# vmagent.yaml
# Check https://prometheus.io/docs/prometheus/latest/configuration/configuration for more details
scrape_configs:
  - job_name: "node-exporter-sd"
    ec2_sd_configs:
      - region: ap-south-1
        role_arn: "__role_arn_with_ec2_read_access__"
        filters:
          - name: tag:namespace
            values:
              - node-exporter
        port: 9100
```

This will discover new EC2 instances automatically using the Service Discovery mechanism, and their metrics will be sent to Last9 from vmagent.

# Scrape selective metrics in Prometheus

> Recipe to only scrape selective metrics in Prometheus to reduce cardinality

Prometheus provides the ability to filter specific metrics post-scraping, via the `metric_relabel_configs` stanza.

Tip
This is useful in reducing the number of metrics before they are consumed and sent to a remote write long term storage.

We will use the `node_exporter` Prometheus exporter as an example, and send only metrics matching the regex `node_cpu.*`.

## Prometheus scrape config without filtering

```yaml
global:
  scrape_interval: 1m
scrape_configs:
  - job_name: 'node-exporter-01'
    static_configs:
      - targets: [ 'localhost:9100' ]
```

This scrapes and stores all node exporter metrics.

## Prometheus scrape config with filtering

```yaml
global:
  scrape_interval: 1m
scrape_configs:
  - job_name: 'node-exporter-01'
    static_configs:
      - targets: [ 'localhost:9100' ]
    metric_relabel_configs:
      - source_labels: [__name__]
        action: keep
        # Relabel regexes are anchored at both ends, so the trailing .* is
        # needed to match names such as node_cpu_seconds_total.
        regex: 'node_cpu.*'
```

This will scrape all metrics, but drop every metric whose name does not fully match the `regex` entry.
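
The effect of the `keep` action can be sketched in a few lines of Python. This is only an illustration, not Prometheus code: it assumes Prometheus' documented behaviour of anchoring relabel regexes at both ends (imitated here with `re.fullmatch`), and the `keep` helper and the sample metric names are hypothetical. Python's `re` syntax differs from the RE2 syntax Prometheus uses, but not for simple patterns like these.

```python
import re


def keep(metric_names, regex):
    """Imitate `action: keep` on `__name__`: retain only names the regex matches in full."""
    pattern = re.compile(regex)
    return [name for name in metric_names if pattern.fullmatch(name)]


# Hypothetical sample of scraped metric names.
scraped = [
    "node_cpu_seconds_total",
    "node_cpu_guest_seconds_total",
    "node_memory_MemFree_bytes",
    "node_load1",
]

# With a trailing wildcard, every node_cpu* metric survives.
print(keep(scraped, "node_cpu.*"))

# Without the wildcard, only a metric named exactly "node_cpu" would survive.
print(keep(scraped, "node_cpu"))
```

The second call returns an empty list, which is why a pattern meant to keep a family of metrics needs a wildcard that covers the rest of the metric name.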
# Scrape selective kube state metrics > This document describes how to scrape selective kube state metrics ## Obtain the list of kube state metrics from your Last9 cluster ```json curl -XGET 'https://:@read-app-tsdb.last9.io/hot/v1/metrics//sender//api/v1/label/__name__/values' | jq | grep -e "kube_" "kube_apiserver_pod_logs_pods_logs_backend_tls_failure_total", "kube_apiserver_pod_logs_pods_logs_insecure_backend_total", "kube_certificatesigningrequest_annotations", "kube_certificatesigningrequest_cert_length", "kube_certificatesigningrequest_condition", "kube_certificatesigningrequest_created", "kube_certificatesigningrequest_labels", "kube_configmap_annotations", "kube_configmap_created", "kube_configmap_info", "kube_configmap_labels", "kube_configmap_metadata_resource_version", "kube_daemonset_annotations", "kube_daemonset_created", "kube_daemonset_labels", "kube_daemonset_metadata_generation", "kube_daemonset_status_current_number_scheduled", "kube_daemonset_status_desired_number_scheduled", "kube_daemonset_status_number_available", "kube_daemonset_status_number_misscheduled", "kube_daemonset_status_number_ready", "kube_daemonset_status_number_unavailable", "kube_daemonset_status_observed_generation", "kube_daemonset_status_updated_number_scheduled", "kube_deployment_annotations", "kube_deployment_created", "kube_deployment_labels", "kube_deployment_metadata_generation", "kube_deployment_spec_paused", "kube_deployment_spec_replicas", "kube_deployment_spec_strategy_rollingupdate_max_surge", "kube_deployment_spec_strategy_rollingupdate_max_unavailable", "kube_deployment_status_condition", "kube_deployment_status_observed_generation", "kube_deployment_status_replicas", "kube_deployment_status_replicas_available", "kube_deployment_status_replicas_ready", "kube_deployment_status_replicas_unavailable", "kube_deployment_status_replicas_updated", "kube_endpoint_address", "kube_endpoint_address_available", "kube_endpoint_address_not_ready", "kube_endpoint_annotations", 
"kube_endpoint_created", "kube_endpoint_info", "kube_endpoint_labels", "kube_endpoint_ports", "kube_ingress_annotations", "kube_ingress_created", "kube_ingress_info", "kube_ingress_labels", "kube_ingress_metadata_resource_version", "kube_ingress_path", "kube_job_annotations", "kube_job_complete", "kube_job_created", "kube_job_info", "kube_job_labels", "kube_job_owner", "kube_job_spec_completions", "kube_job_spec_parallelism", "kube_job_status_active", "kube_job_status_completion_time", "kube_job_status_failed", "kube_job_status_start_time", "kube_job_status_succeeded", "kube_lease_owner", "kube_lease_renew_time", "kube_mutatingwebhookconfiguration_created", "kube_mutatingwebhookconfiguration_info", "kube_mutatingwebhookconfiguration_metadata_resource_version", "kube_namespace_annotations", "kube_namespace_created", "kube_namespace_labels", "kube_namespace_status_phase", "kube_node_annotations", "kube_node_created", "kube_node_deletion_timestamp", "kube_node_info", "kube_node_labels", "kube_node_spec_taint", "kube_node_spec_unschedulable", "kube_node_status_allocatable", "kube_node_status_capacity", "kube_node_status_condition", "kube_persistentvolume_annotations", "kube_persistentvolume_capacity_bytes", "kube_persistentvolume_claim_ref", "kube_persistentvolume_created", "kube_persistentvolume_info", "kube_persistentvolume_labels", "kube_persistentvolume_status_phase", "kube_persistentvolumeclaim_access_mode", "kube_persistentvolumeclaim_annotations", "kube_persistentvolumeclaim_created", "kube_persistentvolumeclaim_info", "kube_persistentvolumeclaim_labels", "kube_persistentvolumeclaim_resource_requests_storage_bytes", "kube_persistentvolumeclaim_status_phase", "kube_pod_annotations", "kube_pod_completion_time", "kube_pod_container_info", "kube_pod_container_resource_limits", "kube_pod_container_resource_requests", "kube_pod_container_state_started", "kube_pod_container_status_last_terminated_exitcode", "kube_pod_container_status_last_terminated_reason", 
"kube_pod_container_status_ready", "kube_pod_container_status_restarts_total", "kube_pod_container_status_running", "kube_pod_container_status_terminated", "kube_pod_container_status_terminated_reason", "kube_pod_container_status_waiting", "kube_pod_container_status_waiting_reason", "kube_pod_created", "kube_pod_deletion_timestamp", "kube_pod_info", "kube_pod_init_container_info", "kube_pod_init_container_status_ready", "kube_pod_init_container_status_restarts_total", "kube_pod_init_container_status_running", "kube_pod_init_container_status_terminated", "kube_pod_init_container_status_terminated_reason", "kube_pod_init_container_status_waiting", "kube_pod_init_container_status_waiting_reason", "kube_pod_ips", "kube_pod_labels", "kube_pod_owner", "kube_pod_restart_policy", "kube_pod_spec_volumes_persistentvolumeclaims_info", "kube_pod_spec_volumes_persistentvolumeclaims_readonly", "kube_pod_start_time", "kube_pod_status_container_ready_time", "kube_pod_status_phase", "kube_pod_status_qos_class", "kube_pod_status_ready", "kube_pod_status_ready_time", "kube_pod_status_reason", "kube_pod_status_scheduled", "kube_pod_status_scheduled_time", "kube_pod_status_unschedulable", "kube_pod_tolerations", "kube_poddisruptionbudget_annotations", "kube_poddisruptionbudget_created", "kube_poddisruptionbudget_labels", "kube_poddisruptionbudget_status_current_healthy", "kube_poddisruptionbudget_status_desired_healthy", "kube_poddisruptionbudget_status_expected_pods", "kube_poddisruptionbudget_status_observed_generation", "kube_poddisruptionbudget_status_pod_disruptions_allowed", "kube_replicaset_annotations", "kube_replicaset_created", "kube_replicaset_labels", "kube_replicaset_metadata_generation", "kube_replicaset_owner", "kube_replicaset_spec_replicas", "kube_replicaset_status_fully_labeled_replicas", "kube_replicaset_status_observed_generation", "kube_replicaset_status_ready_replicas", "kube_replicaset_status_replicas", "kube_secret_annotations", "kube_secret_created", 
"kube_secret_info", "kube_secret_labels", "kube_secret_metadata_resource_version", "kube_secret_type", "kube_service_annotations", "kube_service_created", "kube_service_info", "kube_service_labels", "kube_service_spec_type", "kube_service_status_load_balancer_ingress", "kube_storageclass_annotations", "kube_storageclass_created", "kube_storageclass_info", "kube_storageclass_labels", "kube_validatingwebhookconfiguration_created", "kube_validatingwebhookconfiguration_info", "kube_validatingwebhookconfiguration_metadata_resource_version", ``` ## Let’s decide to omit `kube_certificatesigningrequest_*` metrics 1. Prepare a list of metrics you want to omit. This needs to be a comma separated array of strings. ```json [ "kube_certificatesigningrequest_annotations", "kube_certificatesigningrequest_cert_length", "kube_certificatesigningrequest_condition", "kube_certificatesigningrequest_created", "kube_certificatesigningrequest_labels" ] ``` 2. Find the `metricDenylist` configuration in your Kube State Metrics Helm chart and append this list to that config. ```json metricDenylist: ["kube_certificatesigningrequest_annotations", "kube_certificatesigningrequest_cert_length", "kube_certificatesigningrequest_condition", "kube_certificatesigningrequest_created", "kube_certificatesigningrequest_labels"] ``` 3. Now deploy your Kube State Metrics Helm chart as usual 4. Run the below command after the deployment to verify that it is in effect. You should not find any metrics with `kube_certificatesigningrequest` prefix being emitted anymore ```json curl -XGET 'https://:@read-app-tsdb.last9.io/hot/v1/metrics//sender//api/v1/label/__name__/values' | jq | grep -e "kube_certificatesigningrequest_" ``` # Install VictoriaMetrics VMAgent on Ubuntu > This tutorial walks through setting up VMAgent on Ubuntu as a standalone process and monitor itself. 
[VMAgent](https://docs.victoriametrics.com/vmagent.html#vmagent) is a tiny agent which helps you collect metrics from various sources, relabel and filter the collected metrics, and store them in Prometheus-compatible remote storage such as Last9 using the [Prometheus remote write](https://last9.io/blog/what-is-prometheus-remote-write/) protocol.

In this tutorial, we’ll cover how to install VMAgent on your Ubuntu server, manage the VMAgent process, and set up scrape configs to collect metrics from VMAgent itself.

## Prerequisites

Before you begin this guide, you should have a regular, non-root user with sudo privileges configured on your server and also basic dependencies such as `wget`.

Create a Last9 cluster by following the [Quick Start Guide](/docs/onboard/).

Keep the following information handy after creating the cluster:

* `$levitate_remote_write_url` - Last9 cluster’s Remote write endpoint
* `$levitate_remote_write_username` - Cluster ID
* `$levitate_remote_write_password` - Write token created for the cluster

## Download & extract VMAgent

VMAgent is available as part of the VictoriaMetrics repository’s [latest releases](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/latest).

First obtain the required binary.

```bash
# Map the machine architecture to the name used in the release artifacts
$ ARCH=$(uname -m)
case $ARCH in
  x86_64 | amd64) ARCH="amd64" ;;
  aarch64 | arm64) ARCH="arm64" ;;
  *)
    echo "Unsupported architecture: $ARCH"
    exit 1
    ;;
esac
$ wget -O /var/tmp/vmutils.tar.gz "https://github.com/VictoriaMetrics/VictoriaMetrics/releases/download/v1.96.0/vmutils-linux-${ARCH}-v1.96.0.tar.gz"
```

Extract VMAgent from the compressed file, i.e., `/var/tmp/vmutils.tar.gz`

```bash
$ sudo mkdir -p /opt/vmagent
# /opt/vmagent is owned by root, so extraction needs sudo as well
$ sudo tar -xzf /var/tmp/vmutils.tar.gz -C /opt/vmagent
```

## Setup VMAgent scrape config

Create a scrape config in the same directory as the VMAgent binary.
```bash $ sudo cat > /opt/vmagent/vmagent.yaml </etc/systemd/system/vmagent.service < Step by step guide on how to set up vmoperator with only vmagent and vmservicescrape to scrape your Kubernetes svcs and remote write metrics to Last9 The [vmoperator](https://github.com/VictoriaMetrics/operator) streamlines the deployment and management of vmagent on Kubernetes, optimizing for ease of use while retaining native configuration options inherent to Kubernetes environments. It achieves this by introducing various custom resource definitions (CRDs) into the Kubernetes ecosystem, so that metrics can be sent seamlessly to Last9. **Custom Resource Definitions (CRDs)** * vmagent * vmnodescrapes * vmservicescrapes * vmprobes * vmpodscrapes * vmstaticscrapes These CRDs let users create and manage vmagent instances along with scrape configurations like VMServiceScrape, VMPodScrape, etc. These configurations closely resemble the [Prometheus Operator’s](https://prometheus-operator.dev/) ServiceMonitor and PodMonitor. This eliminates the need for manual setup of vmagent deployment, image configuration, and other intricate details. ## Prerequisites Make sure you have the following prerequisites installed: * [kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl) - Kubernetes command-line tool * Clone this repository to your local machine: ```bash git clone https://github.com/last9/vmagent-operator-levitate.git cd vmagent-operator-levitate ``` ## Install Custom Resource Definitions (CRDs) 1. Navigate to the `crds/` directory: ```bash cd ./crds/ ``` 2. Install the CRDs using `kubectl`: ```bash $ kubectl apply -f ./crd.yaml ``` 3.
Verify that the CRDs are successfully installed: ```bash $ kubectl get crd --sort-by=.metadata.creationTimestamp # Output NAME CREATED AT vmagents.operator.victoriametrics.com 2023-12-26T12:02:48Z vmnodescrapes.operator.victoriametrics.com 2023-12-26T12:02:49Z vmservicescrapes.operator.victoriametrics.com 2023-12-26T12:02:50Z vmprobes.operator.victoriametrics.com 2023-12-26T12:02:50Z vmpodscrapes.operator.victoriametrics.com 2023-12-26T12:02:50Z vmstaticscrapes.operator.victoriametrics.com 2023-12-26T12:02:51Z ``` You should see the names of the installed CRDs in the output. ## Install vmoperator 1. Navigate to the `operator/` directory: ```bash cd ./operator/ ``` 2. Install the operator and rbac using `kubectl`: ```bash $ kubectl apply -f ./manager.yaml -f rbac.yaml ``` Note This creates a separate `last9-monitoring` namespace to not clash with your existing namespaces. 3. Verify the status of the operator: ```bash $ kubectl get pods -n monitoring-system # Output NAME READY STATUS RESTARTS AGE vm-operator-667dfbff55-cbvkf 1/1 Running 0 101s 2023-12-26T12:02:51Z ``` You should see the vmoperator pod in running status. ## Install vmagent 1. Navigate to the `vmagent/` directory: ```bash cd ./vmagent/ ``` 2. You will need to obtain your Last9 cluster’s Remote Write URL and its credentials. [Here](/docs/onboard/) is a quick way to create your cluster and obtain your credentials. Run the below command to list all the placeholder values in this file [vmagent.yaml](https://github.com/last9/vmagent-operator-levitate/blob/main/vmagent/vmagent.yaml) ```bash $ cat ./vmagent.yaml | grep -n "Todo" # Output 11: levitate_cluster_username: "" # Todo: append levitate cluster username 12: levitate_cluster_password: "" # Todo: append levitate cluster password 31: via_cluster: # Todo: add a relevant cluster name. e.g: k8s cluster name 33: - url: # Todo: append levitate remote write URL 55: storage: 20Gi # Todo: Default is 20Gi. 
Scale up after you have provisioned more if you need more
58:# Todo: Below configs need to be enabled depending upon your affinity towards nodegroups.
59:# Todo: Ensure that the below selector terms and tolerations are exactly same as the metadata of the nodegroups itself.
```

3. Proceed to installation once you have replaced the placeholder values with actual values
4. Install vmagent in the `last9-monitoring` namespace using `kubectl`:

   ```bash
   $ kubectl apply -f ./vmagent.yaml -n last9-monitoring
   ```
5. Verify the status of the vmagent and ensure that it’s running:

   ```bash
   $ kubectl get pods -n last9-monitoring -l "last9_monitoring_agent=vmagent"
   # Output
   NAME                            READY   STATUS    RESTARTS   AGE
   vmagent-demo-6785f7d7b9-zpbv6   2/2     Running   0          72s
   ```

## Install VMServiceScrape (i.e ServiceMonitor and PodMonitor)

Navigate to the `vmservicescrape/` directory:

```bash
cd ./vmservicescrape/
```

**Caveats**

VMServiceScrape works similarly to Prometheus Operator’s ServiceMonitor and PodMonitor, where you define scrape selectors to do service and pod discovery. In this file [vmservicescrape.yaml](https://github.com/last9/vmagent-operator-levitate/blob/main/vmservicescrape/vmservicescrape.yaml) you can override default scrape selectors to suit your requirements. By default, this assumes the default labels that are applied to the exporters as part of their installation. Below is the generic command to find the labels of your K8s Services.

```bash
$ kubectl get services -n -o jsonpath='{.metadata.labels}'
```

Once you have inspected the labels for the services that you chose to scrape, you can then proceed to modify this file [vmservicescrape.yaml](https://github.com/last9/vmagent-operator-levitate/blob/main/vmservicescrape/vmservicescrape.yaml) and match the scrape selector labels with the labels of your services.
Another caveat to note here is to ensure that the namespaces are also declared correctly for the scrape selectors to correctly perform service discovery. This file also includes scrape configs for Common Exporters such as Kafka, Redis, Node, RabbitMQ, Prometheus Pushgateway etc. Run this command to list all the `Todo` comments which will guide you to customize this file [vmservicescrape.yaml](https://github.com/last9/vmagent-operator-levitate/blob/main/vmservicescrape/vmservicescrape.yaml) as required. ````plaintext ```bash $ cat ./vmservicescrape.yaml | grep -n "Todo" # Output 48: matchNames: [ "kube-system" ] # Todo: append namespaces here 63: matchNames: [ "kube-system" ] # Todo: append namespaces here 86:# Todo: Uncomment this if you have custom application enabled svcs running and you want to scrape them 97:# matchNames: [ ] # Todo: Append more namespaces here 105:# app: "" # Todo: Append app name label here 107:# Todo: Uncomment this if you have node exporters svcs running and you want to scrape them 118:# matchNames: [ ] # Todo: Append more namespaces here 129:# Todo: Uncomment this if you have rabbitMQ exporters svcs running and you want to scrape them 140:# matchNames: [ ] # Todo: Append more namespaces here 151:# Todo: Uncomment this if you have kafka exporters svcs running and you want to scrape them 162:# matchNames: [ ] # Todo: Append more namespaces here 173:# Todo: Uncomment this if you have redis exporters svcs running and you want to scrape them 184:# matchNames: [ ] # Todo: Append more namespaces here 195:# Todo: Uncomment this if you have pushgateway svcs running and you want to scrape them 206:# matchNames: [ ] # Todo: Append more namespaces here ``` ```` Install VMServiceScrape using `kubectl`: ````plaintext ```bash $ kubectl apply -f ./vmservicescrape.yaml -n last9-monitoring # Output vmservicescrape.operator.victoriametrics.com/last9-vmservicescrape-vmagent-01 created vmservicescrape.operator.victoriametrics.com/last9-servicescrape-k8s-01 
created vmservicescrape.operator.victoriametrics.com/last9-servicescrape-metrics-server-01 created ``` ```` *** ## Troubleshooting Please get in touch with us on [Discord](https://discord.com/invite/Q3p2EEucx9) or [Email](mailto:cs@last9.io) if you have any questions. # Tutorials > Find best practices recipes and tutorials for monitoring and observability curated by the Last9 team. 1. [Calculate usage patterns and data volume in Prometheus](/docs/how-to-calculate-usage-patterns-and-data-volume-in-prometheus/) 2. [Scrape selective metrics in Prometheus](/docs/how-to-scrape-only-selective-metrics-in-prometheus/) 3. [Using OpenTelemetry Exporter for Prometheus Remote Write](/docs/using-open-telemetry-exporter-for-prometheus-remote-write/) 4. [Scrape selective kube state metrics](/docs/how-to-scrape-selective-kube-state-metrics/) 5. [Setting up Docker and Docker Compose](/docs/setting-up-docker-and-docker-compose-on-linux/) 6. [Install VictoriaMetrics VMAgent on Ubuntu](/docs/how-to-setup-vmagent-on-ubuntu/) 7. [Create a GCP service account with read-only access for monitoring](/docs/create-gcp-service-account-with-read-only-access/) 8. [Setup Kubernetes monitoring using kube-state-metrics(KSM) and Prometheus](/docs/ingest-kubernetes-metrics-via-prometheus/) 9. [Setup vmoperator with only vmagent and vmservicescrape](/docs/how-to-setup-vmoperator-with-vmagent-in-kubernetes/) 10. [Enable EC2 Service Discovery with vmagent](/docs/how-to-enable-ec2-service-discovery-with-vmagent/) 11. [Create AWS STS (Secure Token Service) Role](/docs/how-to-create-aws-sts-role/) 12. [Monitor RabbitMQ using Last9](/docs/integrations-rabbitmq/) 13. [Delegate Subdomain between two AWS Accounts using Route 53](/docs/delegate-subdomain-between-aws-accounts-using-route-53/) # Querying Last9 using HTTP API > How to query metrics from Last9 using HTTP API This step-by-step guide explains how to query Last9 using HTTP API. 
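
For readers who prefer to script these calls, the following is a minimal Python sketch of the two building blocks this guide covers: the Basic authorization header built from the cluster ID and read token, and a Prometheus-style `query_range` URL. The helper names, the placeholder host `read.example.invalid`, and the sample credentials are assumptions for illustration; substitute the Read URL and token of your own cluster.

```python
import base64
import time
from urllib.parse import urlencode


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of the Authorization header for HTTP Basic auth."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def range_query_url(read_url: str, query: str, start: int, end: int, step: int) -> str:
    """Compose a query_range URL with a URL-encoded PromQL expression."""
    params = urlencode({"query": query, "start": start, "end": end, "step": step})
    return f"{read_url.rstrip('/')}/api/v1/query_range?{params}"


# Placeholder values; replace them with your cluster's Read URL, ID, and read token.
now = int(time.time())
url = range_query_url(
    "https://read.example.invalid",
    "rate(node_cpu_seconds_total[5m])",
    start=now - 3600,
    end=now,
    step=60,
)
headers = {"Authorization": basic_auth_header("my-cluster-id", "my-read-token")}
print(url)
```

Encoding the query with `urlencode` avoids quoting problems with characters such as `{`, `}`, `[`, and `]`, which are common in PromQL.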
## Last9 Read URL Create a Last9 cluster by following [Getting Started](/docs/onboard/). Each Last9 cluster comes with a Read URL, which you must use when querying metrics data from Last9. ![Last9 Read Data Settings](/_astro/levitate-cluster-read-data-settings-tab.BoWiHQqm_Z1QDGCt.webp) You can find the Read URL under Last9 Cluster → Settings → Read Data → Bring Your Own Visualization. Note The Last9 Read URL requires authenticated access. Create a Read Token for your cluster and use it as `password`. Use the cluster ID as `username`. Keep the following information handy after creating the Last9 cluster: * `$levitate_read_url` - Last9’s Read endpoint * `$levitate_username` - Cluster ID * `$levitate_password` - Read token created for the cluster ## Authentication Generate the authorization header for authenticated access to the Last9 metrics API as follows. ```bash USERNAME="$levitate_username" PASSWORD="$levitate_password" BASIC_AUTH_HEADER=$(echo -n "$USERNAME:$PASSWORD" | base64 | tr -d '\n') AUTH_HEADER="Authorization: Basic $BASIC_AUTH_HEADER" ``` Note Last9 is a Prometheus-compatible TSDB. You can query metrics stored in Last9 using the Prometheus HTTP API. ## Instant Query The simplest way to query Last9 is to use `$levitate_read_url` along with `$AUTH_HEADER` to execute an instant query. ```bash curl -XPOST "$levitate_read_url/api/v1/query?query=" -H "$AUTH_HEADER" ``` **Example**: Query the latest value of the CPU time counter: ```bash curl -XPOST "$levitate_read_url/api/v1/query?query=node_cpu_seconds_total" -H "$AUTH_HEADER" ``` ## Range Query Use the `query_range` endpoint to retrieve data over a time range. You need to specify the start time, end time, and step duration.
```bash curl -XPOST "$levitate_read_url/api/v1/query_range?query=&start=&end=&step=" -H "$AUTH_HEADER" ``` **Example**: Query the CPU time counter over the last hour with a 1-minute step (the `date -d` syntax is specific to GNU `date`): ```bash curl -XPOST "$levitate_read_url/api/v1/query_range?query=node_cpu_seconds_total&start=$(date -d "1 hour ago" +%s)&end=$(date +%s)&step=60" -H "$AUTH_HEADER" ``` Tip For more details, refer to the [Prometheus HTTP API documentation](https://prometheus.io/docs/prometheus/latest/querying/api/#http-api). *** ## Troubleshooting Please get in touch with us on [Discord](https://discord.com/invite/Q3p2EEucx9) or [Email](mailto:cs@last9.io) if you have any questions. # Indicators > Overview of Indicators ## Indicator Overview Indicators are PromQL queries saved as part of an Alert Group. Alert Rules evaluate these Indicators against thresholds to generate alerts. ## Creating an Indicator Before you configure an Alert Rule, you need to create at least one Indicator in the Alert Group. To create an Indicator: 1. Navigate to the Alert Group in which you would like to create the Indicator: **Home** → **Alert Studio** → **Alert Groups** → *Select an Alert Group* → **Indicators** tab, and click **Create New Indicator** ![Creating An Indicator 1](/_astro/indicators-1.BQPXx2Dd_A0XMk.webp) ![Creating An Indicator 2](/_astro/indicators-2.h2Hoksl__zScxn.webp) 2. The following details are required for an Indicator: 1. **Indicator Name**: Use a descriptive name so that the Indicator is easily identified 2. **Indicator Description** *(Optional)*: Helps your team members identify the purpose of the Indicator 3. **Query**: The PromQL query for the Indicator. This can be a query that returns multiple time series (as seen in the example below) but cannot contain any variables (for example, `$instance`) 4. **Unit**: The unit you want to assign to the Indicator 5. **Data Source** *(Advanced)*: By default, Indicators inherit the data source from their Alert Group.
That is, if you change the data source for the Alert Group, the same data source will be used for the Indicator. You can also override this behavior and assign a different data source to the Indicator. Once you override the Indicator’s data source, the configured data source takes precedence over the data source configured at the Alert Group level. ![Creating An Indicator 3](/_astro/indicators-3.CYTYmEGI_18IeFB.webp) After entering the query, you need to validate it to ensure that it has no syntax errors. If the query is validated successfully, a preview is generated. ![Creating An Indicator 4](/_astro/indicators-4.CZGiMkVY_ZHs2Lv.webp) Click **Create Indicator** to save this Indicator. ![Creating An Indicator 5](/_astro/indicators-5.BeHzA9pr_Zc0njt.webp) 3. This Indicator is now ready to be used in Alert Rules. To edit, duplicate, or delete this Indicator, click the **…** button ![Creating An Indicator 6](/_astro/indicators-6.CsSFHl7o_Z1BKXlO.webp) *** ## Troubleshooting Please get in touch with us on [Discord](https://discord.com/invite/Q3p2EEucx9) or [Email](mailto:cs@last9.io) if you have any questions. # Setup Kubernetes monitoring using kube-state-metrics(KSM) and Prometheus Agent > Step by step guide to enable ingesting Kubernetes metrics via Prometheus Agent and send to Last9 via remote write. ## Pre-requisites 1. Ensure that your [kubectl configuration](https://kubernetes.io/docs/concepts/configuration/organize-cluster-access-kubeconfig/) is pointing to the right Kubernetes cluster 2. Create a Last9 cluster by following [Quick start guide](/docs/onboard/) ## What is kube-state-metrics(KSM) `kube-state-metrics` (KSM) is a simple service that listens to the Kubernetes API server and generates metrics about the state of the objects. It is not focused on the health of the individual Kubernetes components but on the health of the various objects inside, such as deployments, nodes, and pods.
The metrics are exposed on the HTTP endpoint `/metrics` on the listening port (8080 by default). They are served as plaintext. They are designed to be consumed either by Prometheus itself or by a scraper compatible with scraping a Prometheus client endpoint. You can also open `/metrics` in a browser to see the raw metrics. Note that the metrics exposed on the `/metrics` endpoint reflect the current state of the Kubernetes cluster. When Kubernetes objects are deleted, they are no longer visible on the `/metrics` endpoint. Tip The documentation for the metrics exposed by KSM can be found [here](https://github.com/kubernetes/kube-state-metrics/tree/main/docs/). ## Automated installation (Preferred) ### Step 1: Copy the installation command ### Step 2: Run the installation command Before running the command, update it to use the write token of the Last9 cluster. ```text ``` Running the command downloads the manifest YAML into the current working directory. It is strongly recommended that you check the manifest file into Git so that it can be extended later. You can follow the video to see the end-to-end setup. ## Manual Installation 1. Clone the GitHub repo ```shell git clone https://github.com/kubernetes/kube-state-metrics.git ``` 2. Deployment steps To deploy this project, run `kubectl apply -f examples/standard` from the root of the cloned repository; a Kubernetes service and deployment will be created. ```shell kubectl apply -f examples/standard ``` Read more details on deployment [here](https://github.com/kubernetes/kube-state-metrics?tab=readme-ov-file#kubernetes-deployment). 3. Validate the corresponding deployment ```shell kubectl get deployments kube-state-metrics -n kube-system ``` This is the sample output that you should see.
```shell NAME READY UP-TO-DATE AVAILABLE AGE kube-state-metrics 1/1 1 1 6d1h ``` ## Configure remote write to Last9 If you already have a running Prometheus setup, add the attached scrape configs, and remote write setup to your Prometheus config file to send data to Last9. ```yaml # prometheus.yaml scrape_configs: - job_name: "node-exporter" kubernetes_sd_configs: - role: endpoints relabel_configs: - source_labels: [__meta_kubernetes_endpoints_name] regex: "node-exporter" action: keep - job_name: "kubernetes-apiservers" kubernetes_sd_configs: - role: endpoints scheme: https tls_config: ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token relabel_configs: - source_labels: [ __meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name, ] action: keep regex: default;kubernetes;https - job_name: "kubernetes-nodes" scheme: https tls_config: ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token kubernetes_sd_configs: - role: node relabel_configs: - action: labelmap regex: __meta_kubernetes_node_label_(.+) - target_label: __address__ replacement: kubernetes.default.svc:443 - source_labels: [__meta_kubernetes_node_name] regex: (.+) target_label: __metrics_path__ replacement: /api/v1/nodes/${1}/proxy/metrics - job_name: "kubernetes-pods" kubernetes_sd_configs: - role: pod relabel_configs: - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape] action: keep regex: true - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path] action: replace target_label: __metrics_path__ regex: (.+) - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port] action: replace regex: ([^:]+)(?::\d+)?;(\d+) replacement: $1:$2 target_label: __address__ - action: labelmap regex: __meta_kubernetes_pod_label_(.+) - source_labels: 
[__meta_kubernetes_namespace] action: replace target_label: kubernetes_namespace - source_labels: [__meta_kubernetes_pod_name] action: replace target_label: kubernetes_pod_name - job_name: "kube-state-metrics" static_configs: - targets: ["kube-state-metrics.kube-system.svc.cluster.local:8080"] - job_name: "kubernetes-cadvisor" scheme: https tls_config: ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token kubernetes_sd_configs: - role: node relabel_configs: - action: labelmap regex: __meta_kubernetes_node_label_(.+) - target_label: __address__ replacement: kubernetes.default.svc:443 - source_labels: [__meta_kubernetes_node_name] regex: (.+) target_label: __metrics_path__ replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor - job_name: "kubernetes-service-endpoints" kubernetes_sd_configs: - role: endpoints relabel_configs: - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape] action: keep regex: true - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme] action: replace target_label: __scheme__ regex: (https?) 
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path] action: replace target_label: __metrics_path__ regex: (.+) - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port] action: replace target_label: __address__ regex: ([^:]+)(?::\d+)?;(\d+) replacement: $1:$2 - action: labelmap regex: __meta_kubernetes_service_label_(.+) - source_labels: [__meta_kubernetes_namespace] action: replace target_label: kubernetes_namespace - source_labels: [__meta_kubernetes_service_name] action: replace target_label: kubernetes_name remote_write: - url: remote_timeout: 60s queue_config: capacity: 10000 max_samples_per_send: 3000 batch_send_deadline: 20s min_shards: 4 max_shards: 200 min_backoff: 100ms max_backoff: 10s basic_auth: username: password: ``` * Replace the `cluster` variable in `external_labels` as per the description ```yaml external_labels: # TODO - replace xyz.acme.io with a logical name for the cluster being scraped. # by Prometheus e.g. prod1.xyz.com cluster: "xyz.acme.io" ``` Tip If you do not have a Prometheus setup, you can [setup vmagent](/docs/how-to-setup-vmagent-on-ubuntu/) as well. ## Steps to uninstall the KSM setup ### For automated setup To uninstall the Kubernetes resources that were created by the automated installation, you can use the `kubectl delete` command with the `-f` flag pointing to the same YAML file. This will delete all the resources defined in the file. ```bash kubectl delete -f kube-state-metrics.yml ``` This command will remove the namespaces, deployments, services, service accounts, and any other resources defined in the `kube-state-metrics.yml` file. ### For manual setup Delete the created kube-state-metrics objects. ```shell kubectl delete -f examples/standard ``` # Ingestion Tokens > Create and manage tokens for sending telemetry data to Last9, including RUM and Prometheus metrics. 
![Control Plane — Ingestion Tokens](/_astro/control-plane-ingestion-tokens.QRukipwK_1C7Ybv.webp) [Ingestion Tokens](https://app.last9.io/control-plane/ingestion-tokens) authenticate your applications and services when sending telemetry data to Last9. These tokens control what data can be sent and from which origins, ensuring secure data collection. ## Creating Ingestion Tokens ![Control Plane — New Ingestion Tokens](/_astro/control-plane-ingestion-tokens-create.Dc682ue6_Z2qTogv.webp) 1. **Select Token Type**: * **Client**: For client-based data collection (e.g., RUM) * **Prometheus Remote-Write**: For server-side Prometheus-compatible metrics 2. **Configure Client Type** (if Client selected): * **Web Browser**: Default and currently available option for web applications * **Mobile**: Coming soon for mobile app monitoring 3. **Set Origins** (for Client tokens): * Add the domains from which your application will send data * Example: `https://www.example.com` * Multiple origins can be added for multi-domain applications * Subdomain matching is not automatic; each subdomain needs a separate entry Caution Data sent from origins not listed here will be rejected. ```plaintext ✅ Correct Origins: https://app.example.com https://www.example.com http://localhost:3000 ❌ Incorrect Origins: example.com (missing protocol) *.example.com (wildcards not supported) app.example.com (missing protocol) ``` 4. **Name Your Token**: * Use a descriptive name to identify the token’s purpose * Example: “Production RUM Token” or “Staging Web Monitoring” 5. Click **CREATE TOKEN** to generate your token *** ## Troubleshooting Please get in touch with us on [Discord](https://discord.com/invite/Q3p2EEucx9) or [Email](mailto:cs@last9.io) if you have any questions.
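
The origin rules listed under **Set Origins** can be checked locally before you create a token. The following Python sketch encodes only the two rules shown in the examples above (a protocol is required and wildcards are not supported); the function name is hypothetical, and the validation Last9 performs may be stricter.

```python
from urllib.parse import urlsplit


def is_valid_origin(origin: str) -> bool:
    """Check an origin against the documented rules: explicit protocol, no wildcards."""
    if "*" in origin:
        # Wildcards such as *.example.com are not supported.
        return False
    parts = urlsplit(origin)
    # An origin needs an http/https scheme and a host name.
    return parts.scheme in ("http", "https") and bool(parts.hostname)


candidates = [
    "https://app.example.com",
    "http://localhost:3000",
    "example.com",
    "*.example.com",
]
for candidate in candidates:
    print(candidate, is_valid_origin(candidate))
```

Running a check like this in a deployment script can catch a missing protocol before data from that origin is rejected.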
# Instrumentation > Instrumentation Overview [![Prometheus](https://cdn.simpleicons.org/prometheus/fff) Prometheus ](/docs/integrations-prometheus/)[![vmagent](https://cdn.simpleicons.org/victoriametrics/fff) vmagent ](/docs/integrations-vmagent/)[![OpenTelemetry](https://cdn.simpleicons.org/opentelemetry/fff) OpenTelemetry ](/docs/integrations-opentelemetry-collector/)[![Express.js](https://cdn.simpleicons.org/express/fff) Express.js ](/docs/integrations-opentelemetry-expressjs/)[![NestJS](https://cdn.simpleicons.org/nestjs/fff) NestJS ](/docs/integrations-opentelemetry-nestjs/)[![Next.js](https://cdn.simpleicons.org/nextdotjs/fff) Next.js ](/docs/integrations-opentelemetry-nextjs/)[![Koa](https://cdn.simpleicons.org/koa/fff) Koa ](/docs/integrations-opentelemetry-koa/)[![Django](https://cdn.simpleicons.org/django/fff) Django ](/docs/integrations-opentelemetry-django/)[![Flask](https://cdn.simpleicons.org/flask/fff) Flask ](/docs/integrations-opentelemetry-flask/)[![Kong](https://cdn.simpleicons.org/kong/fff) Kong ](/docs/integrations-opentelemetry-kong-gateway/)[![Phoenix Framework](https://cdn.simpleicons.org/phoenixframework/fff) Phoenix Framework ](/docs/integrations-opentelemetry-phoenix/)[![Ruby on Rails](https://cdn.simpleicons.org/rubyonrails/fff) Ruby on Rails ](/docs/integrations-opentelemetry-ruby-on-rails/)[![Sinatra](https://cdn.simpleicons.org/rubysinatra/fff) Sinatra ](/docs/integrations-opentelemetry-ruby-on-rails/)[![Roda](https://last9.github.io/assets-docs/integration-roda.svg) Roda ](/docs/integrations-opentelemetry-ruby-on-rails/)[![Fluent Bit](https://cdn.simpleicons.org/fluentbit/fff) Fluent Bit ](/docs/integrations-opentelemetry-fluent-bit/)[![Ubuntu](https://cdn.simpleicons.org/ubuntu/fff) Ubuntu ](/docs/integrations-opentelemetry-ubuntu/)[![Datadog Agent](https://cdn.simpleicons.org/datadog/fff) Datadog Agent ](/docs/integrations-opentelemetry-datadog-agent/)[![Ubuntu Host Metrics](https://cdn.simpleicons.org/ubuntu/fff) Ubuntu Host
Metrics ](/docs/integrations-opentelemetry-ubuntu-host-metrics/)[![Gin](https://cdn.simpleicons.org/gin/fff) Gin ](/docs/integrations-opentelemetry-gin/)[![gRPC](https://last9.github.io/assets-docs/integration-grpc.svg) gRPC ](/docs/integrations-opentelemetry-grpc/)[![FastHTTP](https://cdn.simpleicons.org/go/fff) FastHTTP ](/docs/integrations-opentelemetry-fasthttp/)[![Iris](https://cdn.simpleicons.org/go/fff) Iris ](/docs/integrations-opentelemetry-iris/)[![Gorilla Mux](https://cdn.simpleicons.org/go/fff) Gorilla Mux ](/docs/integrations-opentelemetry-gorilla-mux/)[![Logs from Kubernetes Cluster](https://cdn.simpleicons.org/kubernetes/fff) Logs from Kubernetes Cluster ](/docs/integrations-opentelemetry-kubernetes-logs/)[![Kubernetes Audit Logs](https://cdn.simpleicons.org/kubernetes/fff) Kubernetes Audit Logs ](/docs/integrations-opentelemetry-kubernetes-audit-logs/)[![AWS EC2 Instance](https://cdn.simpleicons.org/amazonec2/fff) AWS EC2 Instance ](/docs/integrations-opentelemetry-aws-ec2/)[![Logs from AWS S3](https://cdn.simpleicons.org/amazons3/fff) Logs from AWS S3 ](/docs/integrations-opentelemetry-aws-s3/)[![MariaDB](https://cdn.simpleicons.org/mariadb/fff) MariaDB ](/docs/integrations-opentelemetry-mariadb/)[![statsd](https://last9.github.io/assets-docs/integration-statsd.png) statsd ](/docs/integrations-statsd/)[![AWS Cloudwatch Metric Stream](https://cdn.simpleicons.org/amazoncloudwatch/fff) AWS Cloudwatch Metric Stream ](/docs/integrations-aws-cloudstream/)[![Telegraf](https://cdn.simpleicons.org/influxdb/fff) Telegraf ](/docs/integrations-telegraf/)[![jmxtrans](https://last9.github.io/assets-docs/integration-jmxtrans.png) jmxtrans ](/docs/integrations-jmxtrans/)[![Confluent](https://last9.github.io/assets-docs/integration-confluent.svg) Confluent ](/docs/integrations-confluent-cloud/)[![Loki](https://cdn.simpleicons.org/grafana/fff) Loki ](/docs/integrations-grafana-loki-ruler/)[![Akamai](https://cdn.simpleicons.org/akamai/fff) Akamai 
](/docs/integrations-akamai/)[![Keda](https://last9.github.io/assets-docs/integration-keda.png) Keda ](/docs/integrations-keda/)[![Prodvana](https://last9.github.io/assets-docs/integration-prodvana.svg) Prodvana ](/docs/integrations-prodvana/)[![LaunchDarkly](https://last9.github.io/assets-docs/integration-launchdarkly.svg) LaunchDarkly ](/docs/integrations-launchdarkly/)[![Apache APISIX](https://cdn.simpleicons.org/apache/fff) Apache APISIX ](/docs/integrations-apache-apisix/)[![Fastly](https://cdn.simpleicons.org/fastly/fff) Fastly ](/docs/integrations-fastly/)[![Cloudflare Logs](https://cdn.simpleicons.org/cloudflare/fff) Cloudflare Logs ](/docs/integrations-cloudflare-logs/)[![Cloudflare Workers](https://cdn.simpleicons.org/cloudflare/fff) Cloudflare Workers](/docs/integrations-cloudflare-workers/) # Akamai > Send logs and metrics to Last9 from Akamai for CDN monitoring This document lists step-by-step instructions for Akamai CDN monitoring with Last9. ## Prerequisites Create a Last9 cluster by following [Getting Started](/docs/onboard/). Keep the following information handy after creating the cluster: * `$levitate_remote_write_url` - Last9’s Remote write endpoint * `$levitate_remote_write_username` - Cluster ID * `$levitate_remote_write_password` - Write token created for the cluster ## Setup Last9 supports ingesting logs and metrics from Akamai. ### Logs Last9 leverages the [Akamai Datastream V2](https://techdocs.akamai.com/datastream2/docs/welcome-datastream2) Custom HTTPS endpoint integration to push logs from Akamai to Last9. Datastream can gather performance and security data for your global Akamai edge platform properties and stream them to Last9. #### Configuration Add the Last9 ingestion endpoint for Akamai logs as a Custom HTTPS endpoint. 1. In Destination, select Custom HTTPS. 2. Enter a human-readable description for the destination as a name. 3.
In Endpoint URL, enter the ingestion endpoint from [Akamai integration](https://app.last9.io/integrations?category=all&integration=Akamai+Logs). 4. In Authentication, select None. Authentication is handled as **Basic Authorization** within the URL itself. 5. The Last9 Akamai ingestion endpoint supports JSON payloads. Enable `application/json` as the content type and set the log format to JSON instead of structured logs. 6. Click Validate & Save to validate the connection to Last9. #### Verification Visit [Logs Panel](https://app.last9.io/logs) to see the Akamai logs. > More details on the custom HTTPS endpoint can be found [here](https://techdocs.akamai.com/datastream2/docs/stream-custom-https). ### Metrics Last9 uses the [Akamai Reporting API](https://techdocs.akamai.com/reporting/reference/get-report-version-data) to fetch the relevant metrics and push them to Last9. [Create an Akamai API Client](https://techdocs.akamai.com/developer/docs/set-up-authentication-credentials) for the relevant reporting APIs and share the following credentials with the Last9 team. Tip The complete list of available reports can be found [here](https://techdocs.akamai.com/reporting/reference/available-reports). * `host` * `client_token` * `client_secret` * `access_token` The data flow is as follows. ![Akamai Metrics to Last9](/_astro/akamai-connector-for-levitate.ysbA5QqD_Z14q4Os.webp) #### CP Codes Content Provider codes (CP codes) identify your traffic on the Akamai network for reporting, billing, and monitoring purposes. The Last9 Akamai integration needs access to the CP Codes from your account to map the metrics underlying each CP Code correctly. Please share the CP Codes in the following CSV format.
```csv Network(ESSL/FF),CPCode,Expected peak hits/s for the CPCode,List of hostnames,Path ESSL,1395076,65000,playback-content.mystreamingservice.com,/stream/v1/users ``` #### Standard Akamai Metrics This integration collects the following standard metrics from Akamai for the given CP codes. * `edgeHits` * `hitsOffload` * `originHits` * `bytesOffload` * `edgeBytes` * `midgressBytes` * `originBytes` * `edgeHitsTotal` * `originHitsTotal` * `hitsOffload` * `bytesOffload` * `edgeBytesTotal` * `midgressBytesTotal` * `originBytesTotal` Tip The Akamai reporting API reports metrics at 5-minute intervals. ## Next steps Once the credentials are shared with the Last9 team, the Akamai connector will be enabled for your account, and metrics will start flowing into your Last9 cluster. *** ## Troubleshooting Please get in touch with us on [Discord](https://discord.com/invite/Q3p2EEucx9) or [Email](mailto:cs@last9.io) if you have any questions. # Apache APISIX > Send APISIX metrics and traces to Last9 using Prometheus Remote Write and OpenTelemetry This document shows how to send metrics and traces from [Apache APISIX](https://apisix.apache.org/) to Last9. ## Pre-requisites 1. You have created a Last9 cluster by following the [getting started guide](/docs/onboard/) 2. You have Apache APISIX set up and running. ## Setup Metrics ![Apache APISIX Metrics to Last9](/_astro/apisix-metrics-to-levitate.CJdP_Cu3_15ALfu.webp) ### Enable Prometheus Plugin in APISIX The APISIX Prometheus Plugin exports metrics in [Prometheus exposition format](https://prometheus.io/docs/instrumenting/exposition_formats/#exposition-formats). It can be enabled with the following configuration.
```yaml plugin_attr: prometheus: export_uri: /apisix/prometheus/metrics ``` The exposed metrics can now be fetched as follows: ```bash curl -i http://127.0.0.1:9091/apisix/prometheus/metrics ``` Follow the official documentation for more details and configuration options for the Prometheus Plugin [here](https://apisix.apache.org/docs/apisix/plugins/prometheus/). ### Setup Prometheus Agent Follow [this](/docs/integrations-prometheus/) guide to set up the Prometheus agent, which will remote write metrics to Last9. ### Configure Prometheus Agent to fetch metrics from APISIX Modify the `prometheus.yaml` as follows. ```yaml scrape_configs: - job_name: "apisix" scrape_interval: 15s # This value is related to the time range of the rate function in PromQL. The time range in the rate function should be at least twice this value. metrics_path: "/apisix/prometheus/metrics" static_configs: - targets: ["127.0.0.1:9091"] ``` Note The IP address in the documentation is `127.0.0.1`, which may not be the same in your case. This IP address should ideally be the internal IP address of the instance on which APISIX is running. ### Grafana Dashboard You can import the Grafana dashboard for APISIX from the repo [here](https://github.com/apache/apisix/blob/master/docs/assets/other/json/apisix-grafana-dashboard.json) ### Available Metrics The following metrics are exported by APISIX: * Status code: HTTP status code returned from Upstream services. They are available for a single service and across all services. * Bandwidth: Total amount of traffic (ingress and egress) flowing through APISIX. Total bandwidth of a service can also be obtained. * `etcd reachability`: A gauge type representing whether etcd can be reached by APISIX. A value of 1 represents reachable, and 0 represents unreachable. * Connections: Nginx connection metrics like active, reading, writing, and number of accepted connections.
* Batch process entries: A gauge type useful when Plugins like syslog, http-logger, tcp-logger, udp-logger, and zipkin use batch processing to send data. Entries that haven’t been sent by the batch process are counted in this metric. * Latency: Histogram of the request time per service in different dimensions. * Info: Information about the APISIX node. * Shared dict: The capacity and free space of all `nginx.shared.DICT` in APISIX. * `apisix_upstream_status`: Health check result status of upstream nodes. A value of 1 represents healthy and 0 represents unhealthy. Detailed information on the exposed metrics and their dimensions can be found [here](https://apisix.apache.org/docs/apisix/plugins/prometheus/#available-http-metrics) ## Setup Traces The APISIX OpenTelemetry plugin can be used to report tracing data according to the [OpenTelemetry specification](https://opentelemetry.io/docs/specs/otel/). ![Apache APISIX Traces to Last9](/_astro/apisix-traces-to-levitate.DQ2y5Tkq_28IuIv.webp) ### Enable OpenTelemetry Plugin in APISIX Enable the OpenTelemetry plugin as follows: ```yaml plugins: - opentelemetry plugin_attr: opentelemetry: resource: service.name: APISIX tenant.id: business_id collector: address: request_timeout: 3 request_headers: foo: bar batch_span_processor: drop_on_queue_full: false max_queue_size: 6 batch_timeout: 2 inactive_timeout: 1 max_export_batch_size: 2 ``` You can follow the official documentation on enabling the OpenTelemetry plugin [here](https://apisix.apache.org/docs/apisix/plugins/opentelemetry/#enable-plugin). This will send traces from APISIX to Last9 following the [OpenTelemetry Specification](https://opentelemetry.io/docs/specs/otel/). *** ## Troubleshooting Please get in touch with us on [Discord](https://discord.com/invite/Q3p2EEucx9) or [Email](mailto:cs@last9.io) if you have any questions. # AWS Cloudwatch Metric Stream > AWS Cloudwatch Metric Stream enables customers to send their Cloudwatch metrics to Last9.
## Pre-requisites Obtain the following and copy it to your clipboard from the [Home > Integrations > Cloudwatch](https://app.last9.io/integrations?category=cloudwatch) section. 1. `HTTP Endpoint URL` 2. `Username` 3. `Password` ![Cloudwatch Integration](/_astro/cloudwatch-integration-settings.BpjaK01q_1Wu2wA.webp) ## Setting up required IAM policy Note Ensuring your AWS Identity and Access Management (IAM) user account has access permissions is crucial. The following access policy is specifically crafted to enable actions related to creating Kinesis Data Streams and CloudWatch Metric Streams. ```json { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "cloudwatch:StartMetricStreams", "cloudwatch:PutMetricStream", "cloudwatch:GetMetricStream", "cloudwatch:GetMetricData", "cloudwatch:ListMetrics", "cloudwatch:ListMetricStreams" ], "Resource": ["*"] }, { "Effect": "Allow", "Action": [ "firehose:PutRecord", "firehose:CreateDeliveryStream", "firehose:DescribeDeliveryStream", "firehose:PutRecordBatch", "firehose:UpdateDestination", "firehose:ListDeliveryStreams" ], "Resource": ["*"] }, { "Effect": "Allow", "Action": [ "s3:ListAllMyBuckets", "s3:CreateBucket", "s3:GetBucketLocation", "s3:GetObject", "s3:ListBucket", "s3:ListBucketMultipartUploads", "s3:PutObject" ], "Resource": ["arn:aws:s3:::*"] }, { "Effect": "Allow", "Action": [ "iam:CreateRole", "iam:CreatePolicy", "iam:AttachRolePolicy", "iam:CreatePolicyVersion", "iam:DeletePolicyVersion", "iam:PassRole" ], "Resource": [ "arn:aws:iam:::policy/*", "arn:aws:iam:::role/*" ] }, { "Effect": "Allow", "Action": ["logs:CreateLogGroup", "logs:CreateLogStream"], "Resource": [ "arn:aws:logs:::log-group:*:log-stream:*" ] } ] } ``` ## Creating an AWS Kinesis Delivery Stream 1. Open the AWS Kinesis homepage ([console.aws.amazon.com/kinesis/home](https://console.aws.amazon.com/kinesis/home)) 2. Open the left sidebar (click on the ☰ icon, if it is not expanded already) 3. 
Click on `Delivery Streams` ![Delivery streams](/_astro/2de34a8-Screenshot_2021-12-15_at_9.02.08_PM.DLvTDqhL_Z1eC3sE.webp) 4. Click on `Create delivery stream` ![Create delivery stream](/_astro/13140e7-image.DyI_Kwgb_ePiCa.webp) 5. Choose `Direct PUT` ![Direct PUT](/_astro/0b5fc88-image.BGmCldPN_1o3KN.webp) 6. Delivery stream name = `last9-$your_organization_name` ![Delivery stream name](/_astro/f7d864f-image.CiX1bnp6_2c6pQB.webp) 7. Set the copied write HTTP Endpoint URL from the Last9 cluster as an HTTP endpoint. ![metrics endpoint](/_astro/73a658b-image.SMan_PO1_Z9AR0C.webp) 8. Add `username` and `password`. ![Add Username and Password](/_astro/9e99cfe-image.BMNOEsUc_ZbQ96.webp) 9. Choose or create an S3 bucket to save data the stream failed to deliver ![S3 Bucket for failed data](/_astro/e9a9478-image.DANy2gPQ_ZMU7C.webp) 10. Click on `Create delivery stream` ![Create delivery stream](/_astro/1fa401a-image.DQq5_y3G_Z1oW9Be.webp) ## Sending data from Cloudwatch to the delivery stream 1. Open the Cloudwatch console and click on `Metrics -> Streams` ![Cloudwatch console](/_astro/cf1a9fb-image.DN3AcRsA_of4sG.webp) 2. Click on `Create metric stream` ![Create metric stream](/_astro/8524a92-image.KLLl3Q65_1GyLKr.webp) 3. Choose `All metrics` to send all Cloudwatch metrics. Optionally, you can also select the metrics you want to stream. You can include or exclude specific namespaces and metrics you want to send by using `Select metrics` option. ![Select metrics](/_astro/cloudwatch-stream-settings-select-metrics.B3xLGWNc_AhMfa.webp) 4. Ensure that you use the delivery stream created in the earlier step and that the output format is `OpenTelemetry 0.7`. ![Delivery stream settings](/_astro/cloudwatch-stream-settings-format.Br4rfRRW_Z2hNTrQ.webp) 5. 
Enter the Custom Metric stream name as `last9-$your_organization_name` and then click on `Create metric stream` ![Create metric stream](/_astro/ad88fa1-image.PVYBxwzs_fz0Mi.webp) ## Verification Once the Cloudwatch metric stream is enabled, it sends metrics with the `amazonaws_com_AWS` prefix. They can be observed in the hosted Grafana under the Grafana tab in Last9. *** ## Troubleshooting Please get in touch with us on [Discord](https://discord.com/invite/Q3p2EEucx9) or [Email](mailto:cs@last9.io) if you have any questions. # Cloudflare Logs > Send logs to Last9 from Cloudflare for CDN monitoring and worker monitoring via Logpush This document lists step-by-step instructions for pushing logs from Cloudflare to Last9. ## Prerequisites 1. Create a Last9 account by following [Getting Started](/docs/onboard/). 2. Keep the following information handy from the [Integrations](https://app.last9.io/integrations?integration=OpenTelemetry) page: * `$last9_otlp_endpoint`: Last9’s OTLP endpoint, copied from the Cloudflare Integration section of the [Integrations](https://app.last9.io/integrations/) page. * `$last9_basic_auth_header`: OTLP Basic authorization header ## Setup You can use Cloudflare’s [Logpush](https://developers.cloudflare.com/logs/about/) Custom [HTTPS endpoint integration](https://developers.cloudflare.com/logs/get-started/enable-destinations/http/) to push logs from Cloudflare to Last9. Logpush delivers logs in batches as quickly as possible, with no minimum batch size, potentially delivering files more than once per minute. This capability enables Cloudflare to provide information almost in real time, in smaller file sizes. Logpush does not offer storage or search functionality for logs; its primary aim is to send logs as quickly as they arrive, which makes it a good fit for sending logs to [Last9 Log Management](https://last9.io/logs/). ### Via Cloudflare dashboard 5. In **Select a destination**, choose **HTTP destination**. 6. 
Enter the **HTTP endpoint** as the Last9 Endpoint, and select **Continue**. ```shell https://$last9_otlp_endpoint/cloudflare?header_Authorization= ``` 7. Select the dataset to push to the storage service. 8. In the next step, you need to configure your logpush job: * Enter the **Job name** as Last9. * Under **If logs match**, you can select the events to include and/or remove from your logs. Refer to [Filters](https://developers.cloudflare.com/logs/reference/filters/) for more information. Not all datasets have this option available. * In **Send the following fields**, you can choose to either push all logs to Last9 or selectively choose which logs you want to push. 9. In **Advanced Options**: * Choose the format of timestamp fields in your logs to be `UnixNano`. 10. Select **Submit** once you are done configuring your logpush job. ### Via API To create a Logpush job, make a `POST` request to the [Logpush job creation endpoint URL](https://developers.cloudflare.com/logs/get-started/api-configuration/) with the appropriate parameters. #### Example curl request ```bash curl https://api.cloudflare.com/client/v4/zones/{zone_id}/logpush/jobs \ --header "X-Auth-Email: " \ --header "X-Auth-Key: " \ --header "Content-Type: application/json" \ --data '{ "name": "last9", "output_options": { "field_names": ["EdgeStartTimestamp", "RayID"], "timestamp_format": "unixnano" }, "destination_conf": "https://$last9_otlp_endpoint/cloudflare?header_Authorization=", "dataset": "http_requests", "enabled": true }' ``` ## Verification Visit [Log Explorer](https://app.last9.io/logs) to see the Cloudflare logs in action. *** ## Troubleshooting Please get in touch with us on [Discord](https://discord.com/invite/Q3p2EEucx9) or [Email](mailto:cs@last9.io) if you have any questions. # Cloudflare Workers > Monitor Cloudflare Workers using OpenTelemetry and Last9 This document lists step-by-step instructions for setting up monitoring for Cloudflare Workers with Last9. ## Prerequisites 1. 
Create a Last9 account by following [Getting Started](/docs/onboard/). 2. Keep the following information handy from the [Integrations](https://app.last9.io/integrations?integration=OpenTelemetry) page: * `$last9_otlp_endpoint`: Last9’s OTLP endpoint, copied from the Cloudflare Integration section of the [Integrations](https://app.last9.io/integrations/) page. * `$last9_basic_auth_header`: OTLP Basic authorization header ## Setup Instrument your [Cloudflare Worker](https://developers.cloudflare.com/workers/) applications with [OpenTelemetry](https://opentelemetry.io/) using the [otel-cf-workers](https://github.com/evanderkoogh/otel-cf-workers) SDK. ### Step 1: Install the SDK Install `@microlabs/otel-cf-workers` in your project. ```bash npm i @microlabs/otel-cf-workers ``` ### Step 2: Add Node.js Compatibility Flags OpenTelemetry requires the Node.js compatibility flag to be enabled at the top level of your `wrangler.toml` file. ```toml compatibility_flags = [ "nodejs_compat" ] ``` ### Step 3: Configure the tracer In your Cloudflare Worker file, add the following code to configure OpenTelemetry. ```typescript import { instrument, ResolveConfigFn } from "@microlabs/otel-cf-workers"; export interface Env { LAST9_BASIC_AUTH: string; // Last9 Basic Auth Header SERVICE_NAME: string; // Your service name } const handler = { async fetch( request: Request, env: Env, ctx: ExecutionContext, ): Promise<Response> { // your cloudflare worker code }, }; const config: ResolveConfigFn = (env: Env, _trigger) => { return { exporter: { url: `$last9_otlp_endpoint/v1/traces`, headers: { Authorization: env.LAST9_BASIC_AUTH }, }, service: { name: env.SERVICE_NAME }, }; }; export default instrument(handler, config); ``` ### Step 4: Set the Last9 environment variables In your [Cloudflare Workers Secret Configuration](https://developers.cloudflare.com/workers/configuration/secrets/), add the `LAST9_BASIC_AUTH` secret. 
To enable tracing for local development, add `LAST9_BASIC_AUTH` to your `.dev.vars` file. ```bash LAST9_BASIC_AUTH="$last9_basic_auth_header" ``` In your `wrangler.toml` file, set the `SERVICE_NAME` variable. ```toml [vars] SERVICE_NAME = "my-service-name" ``` Once these steps are completed, distributed traces from your Cloudflare Workers application should be available in [Last9 Trace Explorer](https://app.last9.io/traces). ## Adding custom OpenTelemetry spans To add custom spans to your OpenTelemetry traces, install the `@opentelemetry/api` package. ```bash npm i @opentelemetry/api ``` Then manually add spans to your traces. ```typescript import { trace } from "@opentelemetry/api"; const tracer = trace.getTracer("custom-traces"); const handler = { async fetch( request: Request, env: Env, ctx: ExecutionContext, ): Promise<Response> { /* assumption: the search term is read from the query string */ const search = new URL(request.url).searchParams.get("search") ?? ""; trace.getActiveSpan()?.setAttribute("search", search); const result = await tracer.startActiveSpan( `transaction-started`, async (span) => { try { /* your business logic; transactionLogic is your own function */ const input = { search }; span.setAttributes(input); const result = await transactionLogic(input); span.setAttributes(result); return result; } finally { span.end(); } }, ); return Response.json(result); }, }; ``` ## Verification Visit [Trace Explorer](https://app.last9.io/traces) to see the Cloudflare traces in action. *** ## Troubleshooting Please get in touch with us on [Discord](https://discord.com/invite/Q3p2EEucx9) or [Email](mailto:cs@last9.io) if you have any questions. # Confluent Cloud > Send Kafka metrics from Confluent Cloud to Last9 using Prometheus Remote Write This document will showcase how to send Kafka metrics from Confluent Cloud to Last9 via the `vmagent` or Prometheus. ## Pre-requisites 1. You have created a Last9 cluster by following the [getting started guide](/docs/onboard/) 2. You have an active Confluent Cloud account ## Setup ![Confluent Cloud Kafka Metrics to Last9](/_astro/9a35942-image.B2RAEVHA_Z1HsIV1.webp) The setup primarily has three components. 1. 
Confluent Cloud, which has the required metrics that you want to send to Last9. Confluent Cloud exposes metrics on the `/export` GET endpoint. 2. `vmagent` or Prometheus will scrape the metrics from the `/export` endpoint and [remote write](https://last9.io/blog/what-is-prometheus-remote-write/) to Last9. The endpoint is as follows: This endpoint requires a [Cloud API key for authentication](https://api.telemetry.confluent.cloud/docs?&_ga=2.51091241.1213369535.1676543952-257140517.1676543951#section/Authentication). The Cloud API key can be created using the [Confluent Cloud CLI](https://docs.confluent.io/current/cloud/cli/). Caution This API endpoint does not export the [health metrics dataset](https://api.telemetry.confluent.cloud/docs/descriptors/datasets/health-plus). Once you have the key, use it in the `vmagent` configuration as follows: ```yaml scrape_configs: - job_name: Confluent Cloud scrape_interval: 1m scrape_timeout: 1m honor_timestamps: true static_configs: - targets: - api.telemetry.confluent.cloud scheme: https basic_auth: username: password: metrics_path: /v2/metrics/cloud/export params: "resource.kafka.id": - lkc-1 - lkc-2 ``` A sample docker-compose for `vmagent` can be as follows: ```yaml version: "3.5" services: vmagent: container_name: vmagent image: victoriametrics/vmagent ports: - 8429:8429 volumes: - vmagentdata:/vmagentdata - /var/tmp:/var/tmp - ./vmagent.yaml:/etc/vmagent/vmagent.yaml command: - "--promscrape.config=/etc/vmagent/vmagent.yaml" - "--remoteWrite.tmpDataPath=/var/tmp/vmagent/" - "--remoteWrite.maxDiskUsagePerURL=10737418240" - "--remoteWrite.url=$levitate_remote_write_url" - "--remoteWrite.basicAuth.username=$levitate_remote_write_username" - "--remoteWrite.basicAuth.password=$levitate_remote_write_password" restart: always network_mode: "host" volumes: vmagentdata: {} ``` This will start scraping metrics from Confluent Cloud and [remote write](https://last9.io/blog/what-is-prometheus-remote-write/) them to Last9. 
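When debugging the scrape job, it can help to see the request URL that the `scheme`, target, `metrics_path`, and `params` in the scrape config above combine into. The following is a minimal TypeScript sketch, not part of the Last9 or Confluent tooling: `buildExportUrl` is a hypothetical helper name, and the cluster IDs `lkc-1` and `lkc-2` are the sample values from the config.

```typescript
// Hypothetical helper: mirrors how the scrape config above is turned into a URL.
// Host and path are taken from the `targets` and `metrics_path` fields.
const EXPORT_BASE = "https://api.telemetry.confluent.cloud/v2/metrics/cloud/export";

function buildExportUrl(clusterIds: string[]): string {
  // Each list entry under `params` becomes a repeated query parameter.
  const query = clusterIds
    .map((id) => `${encodeURIComponent("resource.kafka.id")}=${encodeURIComponent(id)}`)
    .join("&");
  return query.length > 0 ? `${EXPORT_BASE}?${query}` : EXPORT_BASE;
}

// Sample cluster IDs from the scrape config above.
const exportUrl = buildExportUrl(["lkc-1", "lkc-2"]);
console.log(exportUrl);
```

Requesting the printed URL with the Cloud API key and secret as basic auth credentials is one way to confirm the key works before starting `vmagent`.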
Find more details on the export metrics API specification of Confluent Cloud [here](https://api.telemetry.confluent.cloud/docs?&_ga=2.51091241.1213369535.1676543952-257140517.1676543951#tag/Version-2/paths/~1v2~1metrics~1%7Bdataset%7D~1export/get). The reference for all metrics exposed by Confluent Cloud is [here](https://api.telemetry.confluent.cloud/docs/descriptors/datasets/cloud). ## Next steps Create a read token for your Last9 Cluster and follow our guide to [Configure Grafana](/docs/grafana-config/) to visualize the time series data getting sent to Last9. *** ## Troubleshooting Please get in touch with us on [Discord](https://discord.com/invite/Q3p2EEucx9) or [Email](mailto:cs@last9.io) if you have any questions. # Fastly > Send logs to Last9 from Fastly for CDN monitoring This document lists step-by-step instructions for pushing logs from Fastly CDN to Last9. ## Prerequisites 1. Create a Last9 cluster by following [Getting Started](/docs/onboard/). 2. Keep the following information handy after creating the cluster from the [Integrations](https://app.last9.io/integrations?integration=OpenTelemetry) page: * `$last9_otlp_endpoint`: Last9’s OTLP endpoint, copied from the Fastly Integration section of the [Integrations](https://app.last9.io/integrations) page. * `$last9_basic_auth_header`: OTLP Basic authorization header ## Setup Last9 leverages Fastly’s [real time log streaming](https://docs.fastly.com/en/guides/about-fastlys-realtime-log-streaming-features) Custom HTTPS endpoint integration to push logs from Fastly to Last9. Add the Last9 ingestion endpoint for Fastly logs as a Custom HTTPS endpoint by following these steps. 1. Select the Service for which you want to send logs to Last9 and click on **Logging**. ![Select the Fastly Service](/_astro/fastly-logs-enable-https-endpoint.BVwwLl5n_Z2ktqkH.webp) 2. Select HTTPS Endpoint. ![Click on the HTTPS endpoint](/_astro/fastly-logs-https-endpoint.Db1OiBVH_ZEqAw4.webp) 3. Add the Last9 endpoint in the **URL** field. 
Use the `$last9_otlp_endpoint` copied in the earlier step. Ensure that you add the following query parameters. These are mandatory parameters for the Last9 integration. ```bash ?source=fastly&service_id= ``` Tip You can also add any other query parameters to the URL. Make sure to URL encode the query parameters. ![Fastly Last9 ingestion endpoint](/_astro/fastly-logs-https-endpoint-details-1.Dn4E4tOD_1Hom9P.webp) 4. Add the Authorization header under Advanced Configuration. Use the `$last9_basic_auth_header` copied in the earlier step. ![Fastly Last9 ingestion advanced configuration](/_astro/fastly-logs-advanced-config.Bg0e5BKD_Z2lxS5A.webp) Click `Create` to finish setting up the Last9 logs ingestion endpoint. ## Verification Visit [Log Explorer](https://app.last9.io/logs) to see the Fastly logs in action. The service name for all Fastly logs is `fastly` unless overridden in [Last9 Control Plane](https://last9.io/control-plane/). *** ## Troubleshooting Please get in touch with us on [Discord](https://discord.com/invite/Q3p2EEucx9) or [Email](mailto:cs@last9.io) if you have any questions. # Grafana Loki > Send data to Last9 using Grafana Loki’s recording rules’ remote write feature We make use of the Loki Ruler’s [remote-write](https://grafana.com/docs/loki/latest/alert/#remote-write) feature to allow Loki Ruler to write to an external remote endpoint, i.e. Last9’s remote-write endpoint. ## Prerequisites Create a Last9 cluster by following [Getting Started](/docs/onboard/). Keep the following information handy after creating the cluster: * `$levitate_remote_write_url` - Last9’s Remote write endpoint * `$levitate_remote_write_username` - Cluster ID * `$levitate_remote_write_password` - Write token created for the cluster ## Setup ![Grafana Loki to Last9](/_astro/grafana-loki-levitate.2uuzhjag_1TiBvN.webp) The setup is straightforward. Grafana Loki has the required recording rules that are evaluated by the ruler component. 
This ruler component can remote write the declared recording rules as metrics. We need to configure the ruler component to remote write to Last9. This can be done as follows. Update the existing Loki configuration to include the remote write config under the ruler section. ```yaml ruler: ... remote_write: enabled: true client: url: "$levitate_remote_write_url" basic_auth: username: "$levitate_remote_write_username" password: "$levitate_remote_write_password" ``` After that, restart Loki, and data will be written to Last9. Find more details on the remote write configurations [here](https://grafana.com/docs/loki/latest/operations/recording-rules/#remote-write). ## Next steps Explore metric data using embedded Grafana by querying for the recording rule itself. *** ## Troubleshooting Please get in touch with us on [Discord](https://discord.com/invite/Q3p2EEucx9) or [Email](mailto:cs@last9.io) if you have any questions. # jmxtrans > How to send metrics from Java-based applications to Last9 using jmxtrans [jmxtrans](https://github.com/jmxtrans/jmxtrans) is one of the common tools used to extract metrics from the JVM via JMX and translate them into a variety of output metric formats, e.g. statsd, telegraf, etc. This document will showcase how to use your existing `jmxtrans` setup to emit metrics to Last9. ## Pre-requisites 1. You have created a Last9 cluster by following the [getting started guide](/docs/onboard/) 2. Clone the [last9-integrations](https://github.com/last9/last9-integrations/tree/master/levitate/remote-write/telegraf) GitHub repository, which contains the sample code for the approaches discussed in this article ## Setup ![jmxtrans using Graphite output plugin](/_astro/da27781-image.CQJRKgKq_ZgDksB.webp) The setup primarily has 3 components: 1. The Java Application is the application under observation for monitoring. It exposes a JMX interface 2. 
A `jmxtrans` container queries the `jmx` endpoint exposed by the Java application for metrics and converts them into graphite output format. These metrics are pushed to `vmagent` 3. `vmagent` reads the data in the graphite format, converts it into a Prometheus-compatible format, and remote writes it to Last9 A complete example using this approach can be found [here](https://github.com/last9/last9-integrations/tree/master/levitate/remote-write/jmxtrans). ## Next steps Create a read token for your Last9 Cluster and follow our guide to [Configure Grafana](/docs/grafana-config/) to visualize the time series data getting sent to Last9. *** ## Troubleshooting Please get in touch with us on [Discord](https://discord.com/invite/Q3p2EEucx9) or [Email](mailto:cs@last9.io) if you have any questions. # KEDA > Setup autoscaling with KEDA and Last9 This document lists step-by-step instructions for setting up autoscaling with [KEDA](https://KEDA.sh/) and Last9. ## What is KEDA KEDA is a Kubernetes-based Event Driven Autoscaler. With KEDA, you can drive the scaling of any container in Kubernetes based on the number of events needing to be processed. KEDA works by integrating with various event sources and metrics sources such as Last9 to dynamically adjust the number of replicas of Kubernetes deployments, StatefulSets, or any other scalable resources. ## Prerequisites Create a Last9 cluster by following [Getting Started](/docs/onboard/). Keep the following information handy after creating the cluster: * `$levitate_read_url` - Last9’s Read endpoint * `$levitate_username` - Cluster ID * `$levitate_password` - Read token created for the cluster ## KEDA Installation Refer to the KEDA [docs](https://KEDA.sh/docs/2.13/deploy/#helm) to install KEDA in your Kubernetes Cluster. Check the default `values.yaml` and modify as required. Below is a command to derive the default `values.yaml` ```bash helm show values kedacore/keda > values.yaml ``` Ensure that the following components are installed. 1. 
KEDA Operator - Responsible for activation and deactivation of Kubernetes [Deployments](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/) to scale to and from zero on no events 2. KEDA Metric Server - This is a [Kubernetes metrics server](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#support-for-custom-metrics) that exposes rich event data like queue length or stream lag to the Horizontal Pod Autoscaler to drive scale out It is up to the Deployment to consume the events directly from the source. This preserves rich event integration and enables gestures like completing or abandoning queue messages to work out of the box. Serving metrics is the primary role of the `keda-operator-metrics-apiserver` container that runs when you install KEDA. ## KEDA Scalers KEDA scalers are the components within KEDA that interact with different event sources or metrics services to determine the current demand for an application and scale it accordingly. Each scaler is responsible for a specific type of event source or metric. For example, there are scalers for Prometheus, Azure Service Bus, RabbitMQ, Kafka, HTTP requests, and many others. Scalers work by polling the event source at a specified interval to retrieve metrics that indicate the current load or demand. They then use this information to scale the application in or out. The scaling parameters and the thresholds for scaling can be customized through the scaler’s configuration. Keep the following parameters handy; they act as levers to define scale strategies in KEDA. * `serverAddress` - Read URL of Last9 Cluster * `metricName` - Name of the metric on which the scaling decision will be based * `threshold` - Value at which to start scaling. (This value can be a float) * `query` - PromQL Query to run #### ScaledObject `ScaledObject` is a CRD that KEDA uses as a rule set to define [scale strategies](https://keda.sh/docs/2.13/concepts/scaling-deployments/). 
Below is a sample configuration for a `ScaledObject`, which can be applied via `kubectl`. ```yaml # scaled-object.yaml apiVersion: keda.sh/v1alpha1 kind: ScaledObject metadata: name: template-deployment-traffic namespace: production spec: scaleTargetRef: name: rails-app pollingInterval: 30 # In seconds minReplicaCount: 2 maxReplicaCount: 3 triggers: - type: prometheus metadata: serverAddress: https://<$levitate_username>:<$levitate_password>@<$levitate_read_url> metricName: rails_requests_total threshold: "100000" query: sum(increase(rails_requests_total{kubernetes_namespace="production", app="web"}[4m])) ``` Let’s drill down into the trigger configuration. #### type Last9 is a Prometheus-compatible telemetry data platform, so you must use the trigger with type `prometheus` in the YAML configuration to trigger scaling. #### serverAddress This is the URL of the metrics source, which is the Last9 cluster’s READ URL where the metrics will be read from. #### metricName Name of the metric which will be used to evaluate the trigger condition. #### threshold The query value at which scaling is triggered, that is, at which a new pod is created. #### query This query is designed to check the number of requests served by the application. The threshold is 100K. When the request count reaches 100K, a new pod is created. Similarly, when the usage drops to 50K, a pod is deleted. To determine the value required to reduce the number of pods, KEDA waits till the result of the query becomes half of the threshold. ### Applying the ScaledObject Apply the configuration as follows. ```bash kubectl apply -f ./scaled-object.yaml -n production --kubeconfig=$KUBECONFIG ``` This will set up the trigger for autoscaling the pods depending on real-time request traffic from the Rails application. 
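The `serverAddress` in the `ScaledObject` above embeds the cluster ID and read token directly in the URL. A minimal TypeScript sketch of assembling that value follows. It assumes that credentials containing reserved characters such as `@` or `/` should be percent-encoded before being placed in the URL; `buildServerAddress` is a hypothetical helper name and the sample values are placeholders, not real Last9 endpoints.

```typescript
// Hypothetical helper: builds https://<username>:<password>@<read_url>
// in the shape used by the `serverAddress` field of the trigger metadata.
function buildServerAddress(readUrl: string, username: string, password: string): string {
  // Drop any scheme from the read URL so that it is not duplicated.
  const hostAndPath = readUrl.replace(/^https?:\/\//, "");
  // Assumption: percent-encode credentials so reserved characters do not break URL parsing.
  const user = encodeURIComponent(username);
  const pass = encodeURIComponent(password);
  return `https://${user}:${pass}@${hostAndPath}`;
}

// Placeholder values for illustration only.
const serverAddress = buildServerAddress(
  "https://read.example.invalid/prom",
  "cluster-id",
  "p@ss/word",
);
console.log(serverAddress);
```

Because the resulting URL contains a secret, it is worth treating the rendered manifest like any other credential and keeping it out of version control.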
Tip You can read more about scaling triggers that KEDA supports [here](https://KEDA.sh/docs/2.13/concepts/scaling-deployments/#triggers). *** ## Troubleshooting Please get in touch with us on [Discord](https://discord.com/invite/Q3p2EEucx9) or [Email](mailto:cs@last9.io) if you have any questions. # LaunchDarkly > Send LaunchDarkly feature flag change events to Last9 and visualize them as annotations in Last9's Alert Details and managed Grafana Last9’s integration with LaunchDarkly enables change intelligence with feature flag events. Visualize the feature flag events as Change Event annotations in Last9 to correlate system health. ## Prerequisites To start sending feature flag events to Last9 Change Events, you’ll need: * [Last9 account](https://app.last9.io), and create a Last9 cluster by following [Getting Started](/docs/onboard/) * [Last9 API write token](https://app.last9.io/apiaccess) / [(docs)](/docs/getting-started-with-api/) * A [default Change Events](https://app.last9.io/change-events/) cluster / [(docs)](/docs/change-events/#change-events-storage) ## Set up the Last9 integration in the LaunchDarkly dashboard 1. Navigate to the [LaunchDarkly Integrations](https://app.launchdarkly.com/default/integrations?q=last9) page and find “Last9” 2. Click on “Add Integration” to view the “Create Last9 configuration” side panel 3. (Optional) Provide a Name for the integration that’s human-readable and easy to identify, especially if this instance of the integration is scoped to certain policy filters 4. Enter the Last9 API’s base URL. Do not include a trailing `/`. For the `https://{host}/api/v4/organizations/{org}` format, use `app.last9.io` as the `{host}` and you can use your organization slug from the Last9 dashboard URL (the `demo` in app.last9.io/v2/organizations/demo/apps) as the `{org}` ![Screenshot 2024-05-29 at 1.01.38 AM.png](/_astro/launchdarkly-org-slug.D_uVoJMd_ZB2t5c.webp) 5. Enter the Last9 API write refresh token. 
If you have not saved the write token earlier while generating it, you can click on “Generate Tokens” on the [API Access](https://app.last9.io/apiaccess) page in the Last9 dashboard to get a new write refresh token ![Untitled](/_astro/launchdarkly-token.CJwyeJTv_Z250VRd.webp) 6. (Optional) Provide a Last9 tag. If provided, all feature flag events matching the policy filter (defined in the next step) will be only associated with Last9 entities with the same tag. This is particularly helpful if this instance of the integration is scoped to certain LaunchDarkly policy filters 7. (Optional) Configure a custom LaunchDarkly policy to control which events LaunchDarkly sends to Last9 8. Once you’ve read the [Integration Terms and Conditions](https://launchdarkly.com/policies/integrations/), click on the “I have read and agree to the Integration Terms and Conditions” checkbox 9. Click on “Save Configuration”. This new integration now appears on the Integrations page under Last9. It is switched on by default 10. Last9 will now start receiving relevant change events whenever a feature flag event is triggered 11. To verify if the integration is working as expected: 1. Click on the `...` icon next to the newly created integration and click on “Edit Integration Configuration” 2. Click on “Validate Connection”. This will trigger a test event from LaunchDarkly 3. To view the test event in Last9, visit [Grafana](https://app.last9.io/grafana/query) 4. Run a query against the `last9_change_events{}` metric to visualize it on the chart 5. You’ll see a data point on which you can hover to see the test event’s labels ## Visualize LaunchDarkly events as Annotations in Last9’s managed Grafana 1. Open any [Dashboard](https://app.last9.io/grafana/dashboards) → Settings → Annotations 2. Click on “Add Annotation Query” 3. Give it a name, eg: LaunchDarkly All Events 4. Select the “Data source” to be the same as the default cluster set in the Change Events settings 5. 
In “Query”, expand the “Metrics browser”, and select the `last9_change_events` metric 6. Optionally, you can select one or more labels and their values if you want to filter the visualized annotations. For example, selecting the `environment_key` and `event_state` labels followed by the `production` and `start` values respectively means only start annotations of the production environment will be visualized 7. Click on “Use query” to construct the final query 8. Give an appropriate name to be visualized while viewing the annotation. For example, “Event on {{environment\_key}} started” will result in “Event on production started” 9. Click on “Apply” to save the configuration. You can also click on “Preview in dashboard” to save and preview 10. Do remember to click on “Save dashboard” to persist the new annotation ## Visualize LaunchDarkly events as Annotations in Last9’s Alert Details modal 1. Click on an alert to view the Alert Details modal * You can receive the alert as a notification in Slack or another tool of choice * You can click on an alert while viewing Alert Monitor * You can click on an alert while viewing an Alert Group’s Health tab 2. While viewing the Alert Details modal, click on an “Impacted Label” 3. If there are any associated Change Events: * You’ll see “x events for selected label” * You’ll see a purple annotation on the chart with a timestamp * Hovering on the annotation will let you view details of the change event Note: * While Change Events can be sent to any data source while using Last9’s Change Events API, in case of the LaunchDarkly integration, all feature flag events are sent ***only*** to the default cluster selected in the Change Events settings in the Last9 dashboard * Change Events are only visualized when the Alert Group uses the same data source. 
In the case of the LaunchDarkly integration, the Alert Group’s data source should be the same as the default cluster selected in the Change Event settings *** ## Troubleshooting Please get in touch with us on [Discord](https://discord.com/invite/Q3p2EEucx9) or [Email](mailto:cs@last9.io) if you have any questions. # OpenTelemetry > Start sending metrics, logs, and trace data from your applications using OpenTelemetry for a correlated single pane. ## Send data using OpenTelemetry Last9 is fully [OpenTelemetry](https://opentelemetry.io/) compatible. OpenTelemetry is an open-source observability framework. It provides a standardized way to collect and export telemetry data — metrics, logs, and traces — from your applications and infrastructure. If you are instrumenting code for the first time, we recommend using OpenTelemetry. You can either run OpenTelemetry Collector or send the telemetry directly from applications using the SDKs. Last9 supports both gRPC and HTTP endpoints for OpenTelemetry. ## Using OTel Collector If your code is already instrumented using Prometheus, statsD, Influx, Jaeger, Zipkin, etc, please refer to [OpenTelemetry Collector documentation](/docs/integrations-opentelemetry-collector/). OTel Collector supports sending metrics, logs, and trace data irrespective of source. 
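When sending directly from an SDK over OTLP/HTTP, exporters are usually pointed at a per-signal URL. The sketch below assumes the conventional OTLP/HTTP paths (`/v1/traces`, `/v1/metrics`, `/v1/logs`) appended to the base endpoint copied from the Integrations page; `otlpHttpUrl` is a hypothetical helper name, and the exact URL for your account should be confirmed on that page.

```typescript
// Signals that OTLP/HTTP exposes under conventional default paths.
type Signal = "traces" | "metrics" | "logs";

// Hypothetical helper: derives a per-signal OTLP/HTTP URL from a base endpoint.
function otlpHttpUrl(baseEndpoint: string, signal: Signal): string {
  // Strip trailing slashes so the path is not doubled.
  const base = baseEndpoint.replace(/\/+$/, "");
  return `${base}/v1/${signal}`;
}

// The base endpoint here is a placeholder for the value copied from the Integrations page.
const tracesUrl = otlpHttpUrl("https://otlp.example.invalid/", "traces");
console.log(tracesUrl);
```

Many SDK exporters append these paths automatically when given only a base endpoint, so a helper like this is mainly useful when an exporter expects the full URL.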
## Using OTel SDKs [![Express.js](https://cdn.simpleicons.org/express/fff) Express.js ](/docs/integrations-opentelemetry-expressjs/)[![NestJS](https://cdn.simpleicons.org/nestjs/fff) NestJS ](/docs/integrations-opentelemetry-nestjs/)[![Next.js](https://cdn.simpleicons.org/nextdotjs/fff) Next.js ](/docs/integrations-opentelemetry-nextjs/)[![Koa](https://cdn.simpleicons.org/koa/fff) Koa ](/docs/integrations-opentelemetry-koa/)[![Django](https://cdn.simpleicons.org/django/fff) Django ](/docs/integrations-opentelemetry-django/)[![Flask](https://cdn.simpleicons.org/flask/fff) Flask ](/docs/integrations-opentelemetry-flask/)[![Kong](https://cdn.simpleicons.org/kong/fff) Kong ](/docs/integrations-opentelemetry-kong-gateway/)[![Phoenix Framework](https://cdn.simpleicons.org/phoenixframework/fff) Phoenix Framework ](/docs/integrations-opentelemetry-phoenix/)[![Ruby on Rails](https://cdn.simpleicons.org/rubyonrails/fff) Ruby on Rails ](/docs/integrations-opentelemetry-ruby-on-rails/)[![Sinatra](https://cdn.simpleicons.org/rubysinatra/fff) Sinatra ](/docs/integrations-opentelemetry-ruby-on-rails/)[![Roda](https://last9.github.io/assets-docs/integration-roda.svg) Roda ](/docs/integrations-opentelemetry-ruby-on-rails/)[![Fluent Bit](https://cdn.simpleicons.org/fluentbit/fff) Fluent Bit ](/docs/integrations-opentelemetry-fluent-bit/)[![Ubuntu](https://cdn.simpleicons.org/ubuntu/fff) Ubuntu ](/docs/integrations-opentelemetry-ubuntu/)[![Datadog Agent](https://cdn.simpleicons.org/datadog/fff) Datadog Agent ](/docs/integrations-opentelemetry-datadog-agent/)[![Ubuntu Host Metrics](https://cdn.simpleicons.org/ubuntu/fff) Ubuntu Host Metrics ](/docs/integrations-opentelemetry-ubuntu-host-metrics/)[![Gin](https://cdn.simpleicons.org/gin/fff) Gin ](/docs/integrations-opentelemetry-gin/)[![gRPC](https://last9.github.io/assets-docs/integration-grpc.svg) gRPC ](/docs/integrations-opentelemetry-grpc/)[![FastHTTP](https://cdn.simpleicons.org/go/fff) FastHTTP 
](/docs/integrations-opentelemetry-fasthttp/)[![Iris](https://cdn.simpleicons.org/go/fff) Iris ](/docs/integrations-opentelemetry-iris/)[![Gorilla Mux](https://cdn.simpleicons.org/go/fff) Gorilla Mux ](/docs/integrations-opentelemetry-gorilla-mux/)[![Logs from Kubernetes Cluster](https://cdn.simpleicons.org/kubernetes/fff) Logs from Kubernetes Cluster ](/docs/integrations-opentelemetry-kubernetes-logs/)[![Kubernetes Audit Logs](https://cdn.simpleicons.org/kubernetes/fff) Kubernetes Audit Logs ](/docs/integrations-opentelemetry-kubernetes-audit-logs/)[![AWS EC2 Instance](https://cdn.simpleicons.org/amazonec2/fff) AWS EC2 Instance ](/docs/integrations-opentelemetry-aws-ec2/)[![Logs from AWS S3](https://cdn.simpleicons.org/amazons3/fff) Logs from AWS S3 ](/docs/integrations-opentelemetry-aws-s3/)[![MariaDB](https://cdn.simpleicons.org/mariadb/fff) MariaDB](/docs/integrations-opentelemetry-mariadb/) ## Obtain OTLP Endpoint & Credentials 1. Visit the [Integrations](https://app.last9.io/integrations/) page 2. Select [OpenTelemetry](https://app.last9.io/integrations?integration=OpenTelemetry) 3. Copy the Endpoint URL and the Authorization Header ![Obtain OTLP Credentials](/_astro/otlp-creds-capture.CSSHOziM_Z1TnP0G.webp) Last9’s OTLP compatible endpoint formats are as follows: #### HTTP ```yaml https://otlp.last9.io #for US-EAST-1 region https://otlp-aps1.last9.io #for AP-SOUTH-1 region ``` #### gRPC ```yaml otlp.last9.io:443 #for US-EAST-1 region otlp-aps1.last9.io:443 #for AP-SOUTH-1 region ``` Note The endpoints are region specific. Please refer to the [Integration](https://app.last9.io/integrations?integration=OpenTelemetry) page for your specific region. *** ## Troubleshooting Please get in touch with us on [Discord](https://discord.com/invite/Q3p2EEucx9) or [Email](mailto:cs@last9.io) if you have any questions. 
# EC2 Instance

> Send logs and host metrics from an AWS EC2 instance using OpenTelemetry

This guide will help you instrument your AWS EC2 instance with OpenTelemetry and send its logs and host metrics to Last9.

## Pre-requisites

1. You have an AWS EC2 instance with a workload running on it.
2. You have signed up for [Last9](https://app.last9.io), created a cluster, and obtained the following OTLP credentials from the [Integrations](https://app.last9.io/integrations?integration=OpenTelemetry) page:
   * `endpoint`
   * `auth_header`
3. Optional: Attach an IAM policy with the `ec2:DescribeTags` permission to the EC2 instance. The resource detection processor needs this permission to fetch the tags associated with the EC2 instance, which can then be used as additional resource attributes.
4. Install the Otel Collector. There are multiple ways to install it; one way, using rpm, is shown here. Every Collector release includes APK, DEB and RPM packaging for Linux amd64/arm64/i386 systems.

   > Note: systemd is required for automatic service configuration.

   ```sh
   wget https://github.com/open-telemetry/opentelemetry-collector-releases/releases/download/v0.103.0/otelcol-contrib_0.103.0_linux_amd64.rpm
   sudo rpm -ivh otelcol-contrib_0.103.0_linux_amd64.rpm
   ```

   More installation options can be found [here](https://opentelemetry.io/docs/collector/installation/#linux).

   > Note: We recommend installing `otel-collector-contrib` version `0.103.0`.

## Sample Otel Collector Configuration

The default path for the Otel Collector config is `/etc/otelcol-contrib/config.yaml`. Edit this file and update it with the configuration below. The configuration is annotated with comments that should be addressed before applying it.

The `operators` configuration is especially important, because it extracts the `timestamp` and `severity` from each log line. For JSON logs, use `json_parser` and use its keys for log attributes. For unstructured logs, use `regex_parser`. The configuration provides samples of both the JSON and regex parsers.
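Before applying the configuration, it can help to confirm that your JSON log lines actually carry the fields the sample `json_parser` operator reads. The following is a minimal local sketch, not part of the Collector: it assumes your logs have top-level `time` and `level` keys (which the operator exposes as `attributes.time` and `attributes.level`) and that the timestamp uses the `%Y-%m-%d %H:%M:%S` layout from the sample. The function name and the sample log line are hypothetical.

```python
import json
from datetime import datetime
from typing import Tuple

# Same strptime-style layout as `timestamp.layout` in the sample json_parser operator.
LAYOUT = "%Y-%m-%d %H:%M:%S"


def check_json_log_line(line: str) -> Tuple[datetime, str]:
    """Return (timestamp, level) if the line has the fields the sample config reads.

    Raises KeyError if a field is missing and ValueError if the timestamp
    does not match LAYOUT.
    """
    record = json.loads(line)
    # The sample operator reads `attributes.time` and `attributes.level`,
    # which correspond to top-level `time` and `level` keys in the JSON body.
    timestamp = datetime.strptime(record["time"], LAYOUT)
    level = str(record["level"])
    return timestamp, level


# Hypothetical log line, for illustration only.
sample = '{"time": "2024-06-01 12:30:45", "level": "error", "message": "disk full"}'
timestamp, level = check_json_log_line(sample)
print(timestamp.isoformat(), level)
```

If your logs use different key names or a different timestamp format, adjust `parse_from` and `layout` in the configuration accordingly.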
```yaml
receivers:
  hostmetrics:
    collection_interval: 30s
    scrapers:
      cpu:
        metrics:
          system.cpu.logical.count:
            enabled: true
      memory:
        metrics:
          system.memory.utilization:
            enabled: true
          system.memory.limit:
            enabled: true
      load:
      disk:
      filesystem:
        metrics:
          system.filesystem.utilization:
            enabled: true
      network:
      paging:
      processes:
      process:
        mute_process_user_error: true
        metrics:
          process.cpu.utilization:
            enabled: true
          process.memory.utilization:
            enabled: true
          process.threads:
            enabled: true
          process.paging.faults:
            enabled: true
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  # Detailed configuration options can be found at https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/filelogreceiver
  filelog:
    # File path pattern to read logs from. Update this to the destination from where you want to read logs.
    include: [/tmp/*.log]
    exclude: [/home/ubuntu/exclude/*.log]
    include_file_path: true
    # attributes: # A map of key: value pairs to add to the entry's attributes.
    # resource: # A map of key: value pairs to add to the entry's resource.
    retry_on_failure:
      enabled: true
    operators:
      # For logs in JSON format
      - type: json_parser
        severity:
          parse_from: attributes.level
        timestamp:
          parse_from: attributes.time
          layout: "%Y-%m-%d %H:%M:%S"
      # For plain text logs
      - type: regex_parser
        regex: '(?P^[A-Za-z]+) (?P