Use the Health Hub


This document describes the Health Hub, which is the central location in Google Security Operations for you to monitor the status and health of all configured data sources. The Health Hub provides crucial information on data sources and log types, offering the context needed to diagnose and remediate data pipeline issues.

The Health Hub includes information about the following:

  • Ingestion volumes and ingestion health.
  • Parsing volumes from raw logs to Unified Data Model (UDM) events.
  • Context and links to interfaces with additional relevant information and functionality.
  • Irregular and failed sources and log types.

The Health Hub detects irregularities and failures on a per-customer basis. It uses statistical methods with a 15-day lookback period to analyze ingestion data. Items that are marked irregular identify surges or drops in data being ingested and processed by Google SecOps. An irregularity can indicate a parser problem, vendor schema change, or a change in your data pipeline.

Key benefits

You can use the Health Hub to do the following:

  • Monitor overall data health at a glance. View the core health status and associated metrics for each feed, data source, log type, and source (that is, the feed ID).
  • Monitor aggregated data-health metrics for:
    • Ingestion and parsing over time with highlighted events (not necessarily irregularities) that link to filtered dashboards.
    • Irregularities—current and over time.
  • Access related dashboards, filtered by timeframe, log type, or feed.
  • Access the feed configuration to edit and fix or remediate a problem.
  • Access the parser configuration to edit and fix or remediate a problem.
  • Click the Set Up Alerts link to open the Cloud Monitoring interface, and from there, configure custom API-based alerts using Status and log-volume metrics.

Key questions

This section refers to Health Hub components and parameters, which are described in the Interface section.

You can use the Health Hub to answer the following questions about your data pipeline:

  • What were the last runs of my feed or last errors from my parser?

    The Health Hub has a deep-dive dashboard that shows the last 200 feed runs and last 200 parser errors for the specific row you click into.

  • Are my logs reaching Google SecOps?

    You can verify whether logs are reaching Google SecOps by using the Last Ingested and Last Normalized metrics. These metrics confirm the last time data was successfully delivered. Additionally, the ingestion-volume metrics (per source and per log type) show you the amount of data being ingested.

  • Are my logs being parsed correctly?

    To confirm correct parsing, view the Last Normalized metric. This metric indicates when the last successful transformation from raw log into a UDM event occurred.

  • Why is ingestion or parsing not happening?

    The text in the Latest Issue Details column identifies specific problems, which helps you determine whether the issue is actionable (you can fix it) or non-actionable (it requires support). The text Forbidden 403: Permission denied is an example of an actionable error, in which the authentication account provided in the feed configuration lacks the required permissions. The text Internal_error is an example of a non-actionable error, for which the recommended action is to open a support case with Google SecOps.

  • Are there significant changes in the number of ingested logs and parsed logs?

    The Status field shows your data's health (Healthy, Irregular, or Failed), based in part on data volume. You can also identify sudden or sustained surges or drops by viewing the Total Ingested Logs graph.

  • How can I get alerted if my sources are failing?

    The Health Hub feeds the Status and log-volume metrics into Cloud Monitoring. In the relevant row of a Health Hub table, click the Set Up Alerts link to open the Cloud Monitoring interface. There, you can configure custom API-based alerts using Status and log-volume metrics.

  • How do I infer a delay in a log-type ingestion?

    A delay is indicated when the Last Event Time is significantly behind the Last Ingested timestamp. The Health Hub exposes the 95th percentile of the difference between Last Ingested and Last Event Time, per log type. A high value suggests a latency problem within the Google SecOps pipeline, whereas a normal value might indicate that the source is pushing old data.

  • Have any recent changes in my configuration caused feed failures?

    If the Config Last Updated timestamp is close to the Last Ingested timestamp, it suggests that a recent configuration update may be the cause of a failure. This correlation helps in root-cause analysis.

  • How has the health of ingestion and parsing been trending over time?

    The Data Source Health Overview, Parsing Health Overview, and Total Ingested Logs graphs show the historical trend of your data's health, letting you observe long-term patterns and irregularities.
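The delay question in the list above can be made concrete with a short calculation. The following is a minimal sketch, assuming you have per-log pairs of event time and ingestion time for a single log type; the function name p95_delay_seconds and its input format are hypothetical, and the sketch uses the nearest-rank percentile because the Health Hub's own percentile method isn't described here.

```python
from datetime import datetime, timedelta


def p95_delay_seconds(records):
    """Return the nearest-rank 95th-percentile ingestion delay, in seconds.

    `records` is a hypothetical list of (event_time, ingested_time) datetime
    pairs for a single log type. Each delay is ingested_time - event_time.
    """
    delays = sorted((ingested - event).total_seconds() for event, ingested in records)
    if not delays:
        raise ValueError("records must not be empty")
    # Nearest-rank percentile: rank = ceil(0.95 * n), computed with integer
    # arithmetic to avoid floating-point rounding.
    rank = -(-95 * len(delays) // 100)
    return delays[rank - 1]


# Usage: 20 logs, ingested 1 to 20 minutes after their event time.
ingested = datetime(2024, 1, 1, 12, 0)
records = [(ingested - timedelta(minutes=m), ingested) for m in range(1, 21)]
print(p95_delay_seconds(records))  # 1140.0
```

A percentile is used rather than the maximum so that a handful of very late logs doesn't dominate the result.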

Interface

To open the Health Hub, in the side navigation menu, click Health Hub.

The Health Hub is a read-only default dashboard and can't be modified directly. To customize, create a copy of the Health Hub, and then modify the duplicated dashboard for your specific use case.

The Health Hub displays the following widgets:

  • Big number widgets:

    • Healthy Sources: The number of data sources performing with no failures and no irregularities.
    • Failed Sources: The number of data sources that need immediate attention.
    • Irregular Data Sources: The number of irregular data sources and parsers.
    • Healthy Parsers: The number of parsers performing with no failures.
    • Failed Parsers: The number of parsers that need immediate attention.
    • Irregular Parsers: The number of parsers that are exhibiting irregular behavior.
  • Data Source Health Overview: A line graph showing the Healthy, Irregular, and Failed data-sources-per-day curves over time.

  • Parsing Health Overview: A line graph showing the Healthy, Irregular, and Failed parsers-per-day curves over time.

  • Total Ingested Logs: A line graph showing the number of ingested logs per day over time.

  • Failing Parser by Log Type: A line graph showing a curve for each parser with a failed health status, per day over time. In this context, the failed health status is due to a very low parsing-success rate.

  • Health Status by Data Source table—includes the following columns:

    • Status: The cumulative status of the feed (Healthy, Irregular, or Failed), derived from data volume, configuration errors, and API errors.
    • Source Type: The source type (ingestion mechanism)—for example, Ingestion API, Feeds, Native Workspace Ingestion, or Azure Event Hub Feeds.
    • Name: The feed name.
    • Log Type: The log type—for example, CS_EDR, UDM, GCP_CLOUDAUDIT, or WINEVTLOG.
    • Latest Issue Details: The details about the latest issue in the specified timeframe—for example, Failed parsing logs, Config credential issue, or Normalization issue. The stated issue can be actionable (for example, Incorrect Auth) or non-actionable (for example, Internal_error). If the issue is non-actionable, the recommended action is to open a support case with Google SecOps. When there has been no issue in the specified timeframe, the value is empty or displays OK.
    • Issue Duration: The number of days that the data source has been in an irregular or failed state. When the Status is Healthy, the value is empty or displays N/A.
    • Last Collected: The timestamp of the last data collection.
    • Last Ingested: The timestamp of the last successful ingestion. Use this metric to identify whether your logs are reaching Google SecOps.
    • Config Last Updated: The timestamp of the last change to the feed configuration. Use this value to correlate configuration updates with observed irregularities or failures, helping you determine the root cause of ingestion problems or parsing problems.
    • View Ingestion Details: A link that opens a new tab with another dashboard, which contains additional, historical information—for deeper analysis.
    • Edit Data Source: A link that opens a new tab with the corresponding feed configuration—where you can fix configuration-related irregularities or failures.
    • Set Up Alerts: A link that opens a new tab with the corresponding Cloud Monitoring interface.
  • Health Status by Parser table—includes the following columns:

    • Status: The cumulative status of the log type (Healthy, Irregular, or Failed).
    • Latest Issue Details: The details about the latest parsing problem in the specified timeframe—for example, Failed parsing logs, Config credential issue, or Normalization issue. The stated issue can be actionable (for example, Incorrect Auth) or non-actionable (for example, Internal_error). If the issue is non-actionable, the recommended action is to open a support case with Google SecOps. When there has been no issue in the specified timeframe, the value is empty or displays OK.
    • Last Ingested: The timestamp of the last successful ingestion. You can use this metric to determine whether logs are reaching Google SecOps.
    • Last Event Time: The event timestamp of the last normalized log.
    • Last Normalized: The timestamp of the last parsing and normalization action for the log type. You can use this metric to determine whether raw logs are successfully transformed into UDM events.
    • Config Last Updated: The timestamp of the last change to the parser configuration. Use this value to correlate configuration updates with observed irregularities or failures, helping you determine the root cause of ingestion problems or parsing problems.
    • View Parsing Details: A link that opens a new tab with another dashboard containing additional historical information for deeper analysis.
    • Edit Parser: A link that opens a new tab with the corresponding parser configuration, where you can fix configuration-related irregularities or failures.
    • Set Up Alert: A link that opens a new tab with the corresponding Cloud Monitoring interface.

Understanding Health Hub issues

The Health Hub surfaces various statuses and messages in the Latest Issue Details column to help you diagnose data ingestion and parsing issues. Errors generally fall into the following categories.

Source and ingestion errors

These errors occur when Google SecOps attempts to collect data from your configured sources (for example, Feeds, APIs). Refer to Troubleshoot ingestion for more details.

Example HTTP/canonical code | Latest issue details (examples) | Likely cause and source | Actions
400 Bad Request | Invalid request parameters | Feed configuration / Parser code | Verify feed configuration, API parameters, filters, and resource IDs for typos or incorrect values. Ensure parameters are within allowed limits. Check the relevant feed setup guide.
401 Unauthorized | LOGIN_FAILED, Authentication failed | Feed configuration / Parser code | Refresh or re-enter credentials for the data source. Ensure the service account or API key isn't expired and has the correct permissions.
403 Forbidden | ACCESS_DENIED, Permission denied | Feed configuration / Parser code | Ensure the service account or authentication method has the necessary IAM roles or permissions on the source system to read the data. Check firewalls.
404 Not Found | URL not found, FILE_NOT_FOUND | Feed configuration / Parser code | Verify URLs, bucket names, resource IDs, and file paths in the feed configuration. Ensure the target resource exists and is accessible.
429 Too Many Requests | ACCESS_TOO_FREQUENT, Quota limit | Feed configuration / Parser code / Source | The data source is rate-limiting requests. Reduce polling frequency if possible. This may be transient; if it persists, check source-side quotas.
5xx Server Errors | GATEWAY_ERROR, INTERNAL_ERROR | SecOps / Source system | Often transient; Google SecOps typically retries. If persistent, this may indicate a problem with the source system's ability to serve data or an internal issue in Google SecOps. Contact support if the error persists.
CONNECTION_FAILED | Can't connect to source | Feed configuration / Parser code / Network | Verify source availability. Check firewalls between Google SecOps and the source. Validate the hostname or IP address and the port in the configuration.
DNS_ERROR | DNS error | Feed configuration / Parser code / Network | Ensure the hostname of the data source is resolvable. Check for typos in hostnames.
INVALID_FEED_CONFIG | Invalid feed configuration | Feed configuration / Parser code | Review the feed setup for incorrect or missing parameters based on the feed's documentation.
INVALID_SSL_CERTIFICATE | Invalid SSL certificate | Feed configuration / Parser code / Source | Ensure the source system's SSL certificate is valid, not expired, and issued by a trusted CA.
INTERNAL_ERROR | Internal error | SecOps | Usually not actionable by the customer. Google SecOps will retry. If the issue persists, contact Google SecOps support.

Parser and normalization errors

These issues occur after the data is ingested, when Google SecOps attempts to map the raw logs into the Unified Data Model (UDM).

Latest issue details (examples) | Likely cause and source | Actions
Failed parsing logs | Feed configuration / Parser code / SecOps | If using a custom parser, review and debug the parser code. If using a default parser, the raw log format might have changed, or the data may be corrupt. Check the source system's log format. Contact support if it's a default parser issue.
Normalization issue | Feed configuration / Parser code / SecOps | Similar to Failed parsing logs. Data isn't mapping to UDM as expected.
LOG_PARSING_DROPPED_NO_EVENTS | Feed configuration / Parser code | The parser ran successfully but produced zero UDM events. This might occur if logs are purely informational, or it could indicate that the parser logic needs adjustment for the ingested log content.
LOG_PARSING_DROPPED_BY_FILTER | Feed configuration / Parser code | Logs were dropped due to an explicit filter in the parser configuration. This is often intentional to exclude noise. Review parser filters if expected logs are missing.
LOG_PARSING_NO_PARSER_FOUND | Feed configuration / Parser code | The Log Type specified for the ingested data doesn't have an active parser associated with it. Verify the Log Type in the feed configuration or forwarder setup.
Indexing event batch validation error | SecOps / Parser | The normalized UDM data failed schema validation. This often indicates an issue with the parser generating non-compliant UDM. Contact support if using a default parser.

Health hub irregularities

The Health Hub also flags Irregular statuses based on volume changes or parsing success rates.

Status | Likely cause and source | Actions
Irregular | Feed configuration / Parser code / Source / SecOps | Surge/Drop in volume: Investigate the source system. Was there a known outage, a change in logging configuration, or a real event causing more or fewer logs? Normalization ratio drop: Check for recent parser changes or changes in the incoming log format. Parsing error increase: Similar to Normalization ratio drop.
Failed | Feed configuration / Parser code / Source / SecOps | Indicates a more persistent problem than an irregularity. Use the Latest Issue Details to narrow down the cause, and follow the actions for ingestion or parser errors above.

General health check actions

  • Check timestamps: Use the Last Collected, Last Ingested, and Last Normalized fields in the Health Hub to pinpoint the failing process.
  • View details: Click View Ingestion Details or View Parsing Details to get historical context.
  • Edit configuration: Use the Edit Data Source or Edit Parser links to correct configuration issues.
  • Set up alerts: Configure Cloud Monitoring alerts for proactive notifications.
  • Contact support: If the issue is non-actionable (for example, persistent INTERNAL_ERROR) or troubleshooting doesn't resolve it, contact Google SecOps support. Provide details from the Health Hub, including specific messages from the Latest Issue Details column.
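The Check timestamps step in the list above can be sketched as a small triage helper. This is a minimal sketch, not a Health Hub feature: the function name first_stale_stage, the max_age staleness threshold, and the assumption that all three timestamps are populated (as for a pull-based feed) are hypothetical choices. Checking the stages in pipeline order returns the earliest point at which data appears to have stopped flowing.

```python
from datetime import datetime, timedelta

# Health Hub timestamps, listed in pipeline order.
STAGES = ("Last Collected", "Last Ingested", "Last Normalized")


def first_stale_stage(timestamps, now, max_age):
    """Return the name of the earliest stale timestamp, or None if all are fresh.

    `timestamps` holds datetimes for (Last Collected, Last Ingested,
    Last Normalized), in that order. A timestamp is stale when it is
    older than `max_age` relative to `now`.
    """
    for name, ts in zip(STAGES, timestamps):
        if now - ts > max_age:
            return name
    return None


# Usage: collection and ingestion are recent, but normalization stopped 4 hours ago.
now = datetime(2024, 1, 1, 12, 0)
stamps = (now - timedelta(minutes=10), now - timedelta(minutes=10), now - timedelta(hours=4))
print(first_stale_stage(stamps, now, timedelta(hours=1)))  # Last Normalized
```

In this example, the result points at parsing rather than at the feed, so Edit Parser is the more likely place to start.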

Irregularity-detection engine

The Health Hub uses the Google SecOps irregularity-detection engine to automatically identify significant changes in your data, letting you quickly detect and address potential problems.

The irregularity-detection engine uses a 15-day lookback window: it compares the last 24 hours of data against the 15-day period and recalculates every hour.

Data ingestion irregularity-detection

Google SecOps analyzes daily volume changes, while considering normal weekly patterns.

The irregularity-detection engine uses the following calculations to detect unusual surges or drops in your data ingestion:

  • Daily and weekly comparisons: Google SecOps calculates the difference in ingestion volume between the current day and the previous day, and also the difference between the current day and the average volume over the past week.
  • Standardization: To understand the significance of these changes, Google SecOps standardizes them using the following z-score formula:

    $$z = \frac{x_i - \bar{x}}{s}$$

    where

    • $z$ is the standardized score (or z-score) for an individual difference
    • $x_i$ is an individual difference value
    • $\bar{x}$ is the mean of the differences
    • $s$ is the standard deviation of the differences
  • Irregularity flagging: Google SecOps flags an irregularity if both the daily and weekly standardized changes are statistically significant. Specifically, Google SecOps searches for:

    • Drops: Both the daily and weekly standardized differences are less than -1.645.
    • Surges: Both the daily and weekly standardized differences are greater than 1.645.
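The flagging rule above can be sketched in Python. This is a minimal sketch under stated assumptions, not the engine's implementation: it takes one total per calendar day (the engine itself compares a rolling 24-hour window and recalculates hourly), it defines the daily difference as today's volume minus yesterday's and the weekly difference as today's volume minus the mean of the previous 7 days, and it standardizes the latest difference against the whole series of differences, including the latest one. The function names z_score and classify_volume are hypothetical.

```python
from statistics import mean, stdev

Z_THRESHOLD = 1.645  # one-sided cutoff stated in the Health Hub description


def z_score(values, x):
    """Standardize x against values: (x - mean) / stdev, or 0.0 if stdev is 0."""
    s = stdev(values)
    return 0.0 if s == 0 else (x - mean(values)) / s


def classify_volume(volumes):
    """Classify the latest day in `volumes` as 'drop', 'surge', or None.

    `volumes` holds daily ingestion counts, oldest first (for example, the
    15-day lookback). Assumed construction of the differences:
      daily[t]  = volumes[t] - volumes[t - 1]
      weekly[t] = volumes[t] - mean(volumes[t - 7:t])
    """
    if len(volumes) < 9:
        raise ValueError("need at least 9 days of data")
    daily = [volumes[t] - volumes[t - 1] for t in range(1, len(volumes))]
    weekly = [volumes[t] - mean(volumes[t - 7:t]) for t in range(7, len(volumes))]
    z_daily = z_score(daily, daily[-1])
    z_weekly = z_score(weekly, weekly[-1])
    # Flag only when both standardized changes agree in direction.
    if z_daily < -Z_THRESHOLD and z_weekly < -Z_THRESHOLD:
        return "drop"
    if z_daily > Z_THRESHOLD and z_weekly > Z_THRESHOLD:
        return "surge"
    return None


# Usage: 14 days near 1,000 logs, then a sharp drop on day 15.
history = [1000, 1010, 990, 1005, 995, 1000, 1010, 990, 1005, 995, 1000, 1010, 990, 1005]
print(classify_volume(history + [200]))  # drop
```

Requiring both the daily and the weekly comparison to cross the threshold filters out day-over-day swings that are normal for the weekly pattern.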

Normalization ratio

When evaluating the normalization ratio (the proportion of ingested events that are successfully normalized into UDM events), the irregularity-detection engine uses a combined approach to ensure that only significant drops in normalization rates are flagged. The irregularity-detection engine generates an alert only when the following two conditions are met:

  • There is a statistically significant drop in the normalization ratio compared to the previous day.
  • The drop is also significant in absolute terms, with a magnitude of 0.05 or greater.
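A sketch of the two-condition rule above, assuming (this isn't stated in the description) that statistical significance is tested the same way as for ingestion volume: the latest day-over-day change in the ratio is standardized against the series of day-over-day changes and compared with -1.645. The function name and its input format are hypothetical; the 0.05 absolute threshold comes from the description.

```python
from statistics import mean, stdev

Z_THRESHOLD = 1.645   # assumed: same one-sided cutoff as for ingestion volume
MIN_ABS_DROP = 0.05   # absolute drop threshold from the Health Hub description


def normalization_ratio_irregular(ratios):
    """Return True if the latest normalization ratio drop meets both conditions.

    `ratios` holds daily normalized/ingested ratios, oldest first; the last
    entry is the current day.
    """
    changes = [ratios[t] - ratios[t - 1] for t in range(1, len(ratios))]
    if len(changes) < 2:
        raise ValueError("need at least 3 days of ratios")
    s = stdev(changes)
    z = 0.0 if s == 0 else (changes[-1] - mean(changes)) / s
    statistically_significant = z < -Z_THRESHOLD
    large_enough = (ratios[-2] - ratios[-1]) >= MIN_ABS_DROP
    return statistically_significant and large_enough


steady = [0.98] * 14
print(normalization_ratio_irregular(steady + [0.80]))  # True
print(normalization_ratio_irregular(steady + [0.96]))  # False
```

The second call shows why the absolute condition matters: against a perfectly steady history, even a 0.02 drop is statistically significant, but it falls below the 0.05 magnitude and isn't flagged.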

Parsing error irregularity detection

For errors that occur during data parsing, the irregularity-detection engine uses a ratio-based method. The irregularity-detection engine triggers an alert if the proportion of parser errors relative to the total number of ingested events increases by 5 percentage points or more compared to the previous day.
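The parsing-error rule reduces to comparing two ratios. In this sketch, the function name and arguments are hypothetical, and returning False when either day has no ingested events is an assumption, since the ratio is undefined in that case. Fraction keeps the percentage-point comparison exact at the 5-point boundary.

```python
from fractions import Fraction

THRESHOLD = Fraction(5, 100)  # 5 percentage points


def parsing_error_irregular(errors_prev, ingested_prev, errors_today, ingested_today):
    """Return True if the parser-error ratio rose by at least 5 percentage points."""
    if ingested_prev == 0 or ingested_today == 0:
        # Assumption: the ratio is undefined without ingested events; don't flag.
        return False
    prev_ratio = Fraction(errors_prev, ingested_prev)
    today_ratio = Fraction(errors_today, ingested_today)
    return today_ratio - prev_ratio >= THRESHOLD


# Usage: the error ratio rises from 2% to 8% of ingested events.
print(parsing_error_irregular(20, 1000, 80, 1000))  # True
```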
