ObservAI Support
What is ObservAI?
ObservAI is an AI-powered observability and incident response platform that works across clouds and on-premises. It helps teams detect, diagnose, and resolve production issues faster.
The platform ingests logs and telemetry from your workloads via Connectors (e.g. GCP, AWS, Azure) or OpenTelemetry (OTLP), uses machine learning and AI to detect anomalies, performs root cause analysis, and suggests or executes remediation—with configurable human oversight.
Value: Reduce mean time to resolution (MTTR), improve reliability, and maintain a full audit trail of incidents and automated actions. Vendor-agnostic: use the connector that matches your environment.
How it works
Data ingestion — Connect your environment using the appropriate Connector (GCP, AWS, Azure, or OpenTelemetry). Logs flow via the connector's native pipeline (e.g. cloud logging → message queue → ObservAI) or via OTLP. The connector's primary log path runs full anomaly detection; OTLP logs are stored for querying and analysis.
Anomaly detection — Incoming logs are evaluated by an ML ensemble and optional AI validation. Only events that meet the anomaly threshold are escalated, reducing noise.
Incident management — Escalated events are deduplicated and grouped into incidents so that many similar alerts become a single incident. This prevents alert storms and keeps focus on root cause.
Traceability and correlation — The system correlates each anomaly with related logs and service topology (including OpenTelemetry data) to identify the root component and propagation path.
Remediation and execution — AI suggests concrete actions (e.g. restart, scale, rollback). Actions can be auto-executed, sent for approval, or shown as suggestions—depending on your policy and confidence settings.
Root cause analysis — For each incident, the platform produces a structured root cause summary with category, evidence, and impact.
Notifications — You receive one consolidated email per incident, with anomaly details, impact, remediation plan, RCA, and execution status. Approval requests are sent when human approval is required.
Human-in-the-loop — When configured, actions that need approval appear in the Approvals page and by email. You can approve, reject, or let the request expire. After execution, the system can verify outcomes and trigger rollback if needed.
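The pipeline steps above can be sketched as a small sequential flow. This is an illustrative stand-in only: all names, the "ERROR means anomalous" rule, and grouping by trace ID are assumptions for the example, not ObservAI's actual API or detection logic.

```python
# Illustrative sketch of the incident pipeline described above.
# All names and rules here are hypothetical, not ObservAI's API.
from dataclasses import dataclass, field

@dataclass
class Incident:
    trace_id: str
    events: list = field(default_factory=list)
    stages: list = field(default_factory=list)  # pipeline stages run

incidents = {}  # trace_id -> Incident

def handle_log(event):
    # 1. Anomaly detection: only high-severity events escalate (stand-in rule)
    if event.get("severity") != "ERROR":
        return None
    # 2. Grouping: events sharing a trace ID join one incident
    inc = incidents.setdefault(event["trace_id"], Incident(event["trace_id"]))
    inc.events.append(event)
    # 3. Only the primary (first) event triggers the full pipeline;
    #    secondary duplicates are attached but do not re-trigger it
    if len(inc.events) == 1:
        inc.stages += ["traceability", "remediation", "execution", "rca", "notify"]
    return inc
```

Note how a burst of similar errors produces one incident whose later events skip the heavy stages entirely, which is why you see a single consolidated email rather than an alert storm.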
For more detail on each part of the pipeline, see Capabilities and components.
Capabilities and components
Anomaly detection
Evaluates each log line using an ML ensemble and optional AI validation. Supports ingestion from Connectors (provider-specific log pipelines) and OpenTelemetry (OTLP). Full ML-based detection runs on the connector's primary log path; OTLP logs are stored for querying. Configure ingestion via the Connector for your environment; the component runs automatically once data is connected.
Orchestration
Central coordinator. Receives detected anomalies, deduplicates and suppresses bursts, and groups events into incidents. Runs traceability, remediation, execution, RCA, and notification in sequence. Processing is asynchronous so ingestion stays responsive.
Traceability
Correlates the anomaly with related logs (e.g. by trace ID, dependency graph, time). Uses OpenTelemetry-derived topology to identify the root component and failure propagation path. Runs first so remediation and RCA have full context. Results appear in the Audit view and in incident emails.
Remediation
Uses AI to produce an actionable remediation plan with structured suggestions (action type, target component, urgency, safety). These drive the Executor. Runs for the primary event of each incident. You see remediation steps in the Audit tab and in notification emails.
Executor
Maps remediation suggestions to your pre-approved actions (defined in Policies). Applies confidence-based behavior: high confidence can auto-execute, medium can require approval, low can be suggestion-only. Manages approval requests (email and Approvals page), execution, verification, and optional rollback. You interact via Policies (define and tune actions) and Approvals (approve or reject).
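The confidence-based dispatch described above can be sketched as a simple threshold function. The threshold values and tier names here are placeholders; the actual values are what you configure per action in Policies.

```python
# Sketch of confidence-tier dispatch as described above.
# Thresholds are illustrative defaults, not ObservAI's.
def execution_mode(confidence, auto_threshold=0.9, approval_threshold=0.6):
    if confidence >= auto_threshold:
        return "auto-execute"
    if confidence >= approval_threshold:
        return "require-approval"
    return "suggest-only"
```

Tuning the two thresholds per action lets you keep risky actions (e.g. rollback in production) in the approval or suggestion band while letting safe ones (e.g. restart in staging) auto-execute.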
Root cause analysis (RCA)
Produces a structured root cause for each incident: category, evidence, contributing factors, and impact. Runs for primary events. Results appear in the Audit view and in incident emails.
Notifications (Collaborator)
Sends one consolidated HTML email per incident with anomaly details, impact, remediation plan, RCA, and execution status. Also sends approval-request and execution-result emails. Throttling ensures you do not receive duplicate emails for the same incident.
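Per-incident throttling of this kind is commonly implemented as a last-sent timestamp per incident. The sketch below is an assumption about the mechanism, not the platform's implementation; the one-hour window is a placeholder.

```python
# Sketch of per-incident email throttling: at most one email per
# incident per window. Window length is illustrative.
import time

_last_sent = {}  # incident_id -> timestamp of last email

def should_send(incident_id, window_s=3600.0, now=None):
    now = time.time() if now is None else now
    last = _last_sent.get(incident_id)
    if last is not None and now - last < window_s:
        return False  # suppressed: already emailed within the window
    _last_sent[incident_id] = now
    return True
```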
Analyzer
Natural language query over your observability data. Use the Analyzer in the UI to ask questions in plain language and receive streaming results with optional AI follow-up. Independent of the incident pipeline; useful for ad-hoc investigation and reporting.
Recommender
Proactive code analysis for security, performance, reliability, and maintainability. Integrates with GitHub or GitLab. Use it for pre-production quality; separate from the real-time incident flow.
Connectors
Configure how data reaches ObservAI. Connectors are available for GCP, AWS, Azure, and other environments. Each connector provides ingestion pipelines (provider-native and/or OTLP), a dashboard for config and pipeline controls where applicable (e.g. start/stop, collector status, log samples), and optional metrics. See Connectors: data ingestion and configuration below.
Key concepts
Incident
One or more related anomaly events grouped by the platform (e.g. by trace ID or message similarity). Has status, severity, and one primary event that drives full analysis and remediation.
Anomaly
A single log or event flagged by the anomaly detection component. Many anomalies can map to one incident.
Primary vs secondary events
The primary event triggers the full pipeline (traceability, remediation, execution, RCA, notifications). Duplicates and bursts are grouped as secondary and do not re-trigger the pipeline.
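Grouping "by trace ID or message similarity" usually means computing a stable key per event. The helper below is a hypothetical illustration of one such key (strip volatile numbers from the message), not the platform's actual grouping algorithm.

```python
# Illustrative grouping key: events sharing a trace ID, or with
# structurally similar messages (digits/hex stripped), share an
# incident. Hypothetical helper, not ObservAI's algorithm.
import re

def incident_key(event):
    if event.get("trace_id"):
        return "trace:" + event["trace_id"]
    # Normalize volatile tokens so "timeout after 31ms" and
    # "timeout after 250ms" group into the same incident.
    msg = re.sub(r"0x[0-9a-fA-F]+|\d+", "<n>", event.get("message", ""))
    return "msg:" + msg
```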
Approval request
When the Executor's confidence falls in the approval band, it creates an approval request. The request appears on the Approvals page and in an email; you can approve it, reject it, or let it expire on timeout.
Confidence tiers
The Executor applies one of three tiers: auto-execute (high confidence), approval required (medium), or suggestion only (low). Thresholds are configurable per action.
Pre-approved actions
Defined in Policies. Each action specifies when it applies (error category, component pattern, severity) and its safety constraints. The Executor only runs actions that match the incident and pass safety checks.
Using the platform
Dashboard
Component health, workload/cluster metrics, and analytics. Use it to confirm platform health and that ingestion is flowing.
Analyzer
Type a question in natural language; the system queries your observability data and streams results. Use for ad-hoc exploration of logs and anomalies.
Audit
Incident-centric view: anomaly, RCA, remediation, executor, timeline. Use for post-incident review and compliance.
Approvals
Lists pending actions that need your decision, each with a countdown timer. Approve or reject (with an optional reason) before the timeout if you want to allow execution.
Policies
Define and edit pre-approved actions and matching rules. Required for any automated or approval-based execution.
Connectors / Settings
Configure ingestion per connector (GCP, AWS, Azure, OTLP, etc.) and platform connections.
Getting started and ingestion
Connector-based ingestion
Use the Connector that matches your environment (GCP, AWS, Azure, etc.). Each connector wires your provider's logging (and optionally metrics/traces) to ObservAI. The connector's primary log path runs full anomaly detection. Follow the in-app Connectors guide and the connector's setup steps (e.g. create sink/topic/queue, set push endpoint, configure auth).
OpenTelemetry (OTLP)
You can also send logs, metrics, and traces from any OTLP collector or instrumentation. Logs are stored for querying and analysis; metrics enrich incident context. Configure the OTLP endpoint and authentication (e.g. Bearer token or API key) in your deployment or connector.
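An OTLP/HTTP log export is an authenticated POST of a `resourceLogs` payload to the endpoint's `/v1/logs` path. The sketch below builds such a request without sending it; the endpoint, token, and service name are placeholders you must replace with your deployment's values.

```python
# Sketch of an OTLP/HTTP log export request (no network call here;
# endpoint, token, and attribute values are placeholders).
import json

def build_otlp_log_request(endpoint_base, token, body_text):
    payload = {
        "resourceLogs": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": "checkout"}}]},
            "scopeLogs": [{"logRecords": [
                {"severityText": "ERROR",
                 "body": {"stringValue": body_text}}]}],
        }]
    }
    return {
        # /v1/logs is the standard OTLP/HTTP logs path
        "url": endpoint_base.rstrip("/") + "/v1/logs",
        "headers": {"Authorization": "Bearer " + token,
                    "Content-Type": "application/json"},
        "data": json.dumps(payload),
    }
```

In practice you would let an OTel Collector or SDK exporter build these requests; the sketch only shows what the endpoint and auth configuration amount to on the wire.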
What to configure
Depends on the connector: typically provider project/account, region, topic/queue/endpoint, and ObservAI API base URL and API key. Use consistent component or service naming so traceability and action matching work as expected.
Connectors: data ingestion and configuration
ObservAI is vendor-agnostic. Connectors are available for GCP, AWS, Azure, and other environments. Each connector streams observability data into the platform and may provide a dashboard for config and pipeline controls.
Typical pipeline patterns
- Provider-native log pipeline — Workloads write to stdout or the provider's logging service. The connector (or your setup) routes those logs to ObservAI (e.g. via a message queue or push subscription). This path runs full anomaly detection. Authentication is provider-specific (e.g. OIDC, IAM, API keys).
- OpenTelemetry (OTLP) — An OTel Collector or instrumented apps send logs, metrics, and traces to ObservAI via OTLP HTTP. Logs are stored for querying (e.g. Analyzer); metrics enrich incident context. Authentication is typically Bearer token or API key.
Configuration
- Provider: Project/account ID, region, and any provider-specific resources (e.g. topic, subscription, queue).
- ObservAI: API base URL and API key (or equivalent) for the connector and/or OTel exporter.
- Optional: Cluster/namespace or resource labels for context; data store for log sampling in the connector dashboard (where supported).
Configure via the connector's dashboard (Setup Wizard and Configuration panel, where available) or via environment variables / manifests used by the connector and OTel Collector.
Pipeline controls (where supported)
Some connectors expose Start / Stop (enable or disable delivery to ObservAI), Restart (clear backlog and start), and Clear queue. Use the connector dashboard to check pipeline status and control delivery.
OTel Collector (where used)
If the connector or your setup uses an OTel Collector (e.g. DaemonSet), you can check status, Restart, and Enable/Stop from the connector UI or cluster. Collector config references the ObservAI endpoint and auth (env or secret).
Log sampling and metrics (where supported)
Connector dashboards may offer log samples (preview from your logs store), pipeline metrics (throughput, errors), and optional workload metrics (e.g. container CPU/memory) synced to ObservAI via OTLP.
Where to configure
- First-time: Use the connector's Setup Wizard (Configure → Test → Complete): provider credentials, region, topic/queue/endpoint, ObservAI endpoint, API key. Save and test connection.
- Current values: The connector's Configuration panel (or equivalent) shows current settings. Infrastructure (sinks, topics, IAM, collector manifests) is created via the connector's scripts or your provider's console, as documented per connector.
Summary for support
- Anomaly detection runs on the connector's primary log pipeline. OTLP logs are stored and queryable; they do not go through ML detection unless the connector sends them over that primary path.
- Authentication: Depends on the connector (e.g. OIDC, IAM, Bearer). Ensure audience/endpoint and API key are correct where the connector or OTel calls ObservAI.
- No data: Verify the connector's pipeline is started (push/stream enabled), provider resources (sink, topic, queue) match the connector config, and (if used) OTel Collector is running with correct endpoint and auth. See the connector's documentation.
Troubleshooting
No anomalies detected
Confirm logs are reaching the platform via your Connector or OTLP. Note that low-severity entries (e.g. INFO, DEBUG) are not evaluated for anomalies. Check the Dashboard to ensure the anomaly detection component is healthy. Anomaly detection runs on the connector's primary log path; OTLP-only logs are stored for the Analyzer.
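A quick way to reason about "no anomalies" is to check whether the entries you expect are even eligible for evaluation. The severity set below is an illustrative assumption; check your deployment's configuration for the actual cutoff.

```python
# Self-check: low-severity entries (INFO, DEBUG, etc.) are not
# evaluated for anomalies. Severity set is illustrative.
EVALUATED = {"WARNING", "ERROR", "CRITICAL", "ALERT", "EMERGENCY"}

def eligible_for_detection(severity):
    return severity.upper() in EVALUATED
```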
Many similar events, one incident
The platform groups related events into a single incident to reduce noise. Only the primary event triggers the full pipeline. This is expected; see Key concepts.
No remediation or execution
Remediation runs for the primary event of each incident. Automated or approval-based execution requires a matching pre-approved action in Policies. Check that your action rules (error category, component pattern, severity) align with the incident, and review the confidence tier (suggestion vs approval vs auto-execute).
Approval email not received
Check the Approvals page in the UI. Verify email (SMTP) configuration and notification settings. Approval requests expire after the configured timeout if not answered.
Action did not execute
Verify that a pre-approved action in Policies matches the incident (component, error category, severity). Check safety constraints such as environment, time windows, and rate limits, and confirm the execution target is correctly registered.
Analyzer query fails or times out
Ensure your observability data store and permissions are correct. Large or complex queries may hit the query timeout or result limit; try a narrower time range or more specific filters.
Component or service unhealthy
The Dashboard shows the status of platform components. Check connectivity to your data store, database, and AI services as applicable for your deployment.
Connector: no logs or pipeline not running
In the connector dashboard, confirm the pipeline is started (push/stream enabled). Verify provider resources (e.g. sink, topic, queue) match the connector config and that the connector's auth (e.g. OIDC, API key) is set correctly. For OTLP: check that the OTel Collector is running and the exporter endpoint and Bearer token are correct. See Connectors: data ingestion and configuration.
FAQ
Why did I receive one email for many similar errors?
The platform groups related events into a single incident. You receive one consolidated email per incident.
Why was no action executed?
Execution requires a matching pre-approved action in Policies. If the match confidence is in the suggestion band, the platform will only suggest; if safety checks (environment, rate limit, time window) fail, execution is blocked.
Can I change when actions auto-execute vs require approval?
Yes. In Policies, you can configure confidence thresholds per action (auto-execute, approval required, or suggestion only).
Do OpenTelemetry (OTLP) logs run through anomaly detection?
Not by default. Full ML-based anomaly detection runs on the connector's primary log pipeline (e.g. the provider logging path); OTLP logs are stored and can be queried via the Analyzer.
How do I add a new automated action?
In Policies, create a pre-approved action and define matching rules (error category, component pattern, severity, etc.) and the command template and safety constraints. The platform will use it when remediation suggestions match.
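A pre-approved action plus matching rules might look like the sketch below. The field names, the glob-style component pattern, and the command template are assumptions for illustration, not the platform's actual schema.

```python
# Hypothetical pre-approved action definition and matcher; field
# names and matching semantics are illustrative, not ObservAI's schema.
from fnmatch import fnmatch

ACTION = {
    "name": "restart-payment-pods",
    "match": {"error_category": "crash_loop",
              "component_pattern": "payment-*",
              "min_severity": "ERROR"},
    "command_template": "kubectl rollout restart deploy/{component}",
    "safety": {"max_per_hour": 3, "environments": ["staging", "prod"]},
}

SEVERITY_RANK = {"WARNING": 1, "ERROR": 2, "CRITICAL": 3}

def action_matches(action, incident):
    m = action["match"]
    return (incident["error_category"] == m["error_category"]
            and fnmatch(incident["component"], m["component_pattern"])
            and SEVERITY_RANK[incident["severity"]]
                >= SEVERITY_RANK[m["min_severity"]])
```

Safety constraints (environment allow-list, rate limits, time windows) are checked separately at execution time, which is why a matching action can still be blocked.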
How do I set up data ingestion?
Use the Connector for your environment (GCP, AWS, Azure, etc.): follow the connector's setup guide to create the required provider resources (e.g. sink, topic, queue) and auth; configure the connector (project/account, region, endpoint, ObservAI URL, API key) in the Setup Wizard; start the pipeline. Optionally deploy or enable the OTel Collector for metrics and OTLP logs. See Connectors: data ingestion and configuration for details.
Support and documentation
Documentation
For architecture details, component behavior, and design decisions, refer to the full technical documentation provided with the solution or in the deployment package.
Marketplace or deployment channel
For billing, subscription, and channel-specific issues, use the support channel associated with your listing (e.g. Google Cloud Marketplace, AWS Marketplace, or your deployment channel).
Vendor support
Still have questions? Email support@observai.ai for issues or topics not covered in this guide or the documentation.