
AI/LLM-Powered Pipeline Analysis #

AI/LLM-Powered Pipeline Analysis is a Technology Preview feature only. Technology Preview features are not currently supported and might not be functionally complete. We do not recommend using them in production. These features provide early access to upcoming Pipelines-as-Code features, enabling you to test functionality and provide feedback during the development process.

Pipelines as Code supports AI-powered analysis of your CI/CD pipeline runs using Large Language Models (LLMs). This feature can automatically analyze failures, provide insights, and suggest fixes directly in your pull requests.

Overview #

The LLM analysis feature enables you to:

  • Automatically analyze failed pipelines and provide root cause analysis
  • Generate actionable recommendations for fixing issues
  • Post insights as PR comments
  • Configure custom analysis scenarios using different prompts and triggers

Note: Additional output destinations (check-run and annotation) and structured JSON output are planned for future releases.

Supported Providers #

  • OpenAI - Default model: gpt-5-mini
  • Google Gemini - Default model: gemini-2.5-flash-lite

You can specify any model supported by your chosen provider. See Model Selection for guidance.

Configuration #

LLM analysis is configured in the Repository CRD under spec.settings.ai:

apiVersion: pipelinesascode.tekton.dev/v1alpha1
kind: Repository
metadata:
  name: my-repo
spec:
  url: "https://github.com/org/repo"
  settings:
    ai:
      enabled: true
      provider: "openai"
      timeout_seconds: 30
      max_tokens: 1000
      secret_ref:
        name: "openai-api-key"
        key: "token"
      roles:
        - name: "failure-analysis"
          model: "gpt-5-mini"  # Optional: specify model (uses provider default if omitted)
          prompt: |
            You are a DevOps expert. Analyze this failed pipeline and:
            1. Identify the root cause
            2. Suggest specific fixes
            3. Recommend preventive measures            
          on_cel: 'body.pipelineRun.status.conditions[0].reason == "Failed"'
          context_items:
            error_content: true
            container_logs:
              enabled: true
              max_lines: 100
          output: "pr-comment"

Configuration Fields #

Top-Level Settings #

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| enabled | boolean | Yes | Enable/disable LLM analysis |
| provider | string | Yes | LLM provider: openai or gemini |
| api_url | string | No | Custom API endpoint URL (overrides provider default) |
| timeout_seconds | integer | No | Request timeout (1-300, default: 30) |
| max_tokens | integer | No | Maximum response tokens (1-4000, default: 1000) |
| secret_ref | object | Yes | Reference to Kubernetes secret with API key |
| roles | array | Yes | List of analysis scenarios (minimum 1) |

Analysis Roles #

Each role defines a specific analysis scenario:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | Yes | Unique identifier for this role |
| prompt | string | Yes | Prompt template for the LLM |
| model | string | No | Model name (consult provider documentation for available models). Uses provider default if not specified. |
| on_cel | string | No | CEL expression for conditional triggering. If not specified, the role runs for all failed pipeline runs. |
| output | string | Yes | Output destination (currently only pr-comment is supported) |
| context_items | object | No | Configuration for context inclusion |

Context Items #

Control what information is sent to the LLM:

| Field | Type | Description |
| --- | --- | --- |
| commit_content | boolean | Include commit information (see Commit Fields below) |
| pr_content | boolean | Include PR title, description, metadata |
| error_content | boolean | Include error messages and failures |
| container_logs.enabled | boolean | Include container/task logs |
| container_logs.max_lines | integer | Limit log lines (1-1000, default: 50). ⚠️ High values may impact performance |
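
For example, a role that sends commit metadata and error details along with a small log excerpt might configure its context like this (a sketch; the values are illustrative):

context_items:
  commit_content: true     # commit SHA, message, author (email addresses are excluded)
  pr_content: false        # do not send PR title/description
  error_content: true      # include error messages and failures
  container_logs:
    enabled: true
    max_lines: 100         # keep the log excerpt small to limit context size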

Commit Fields #

When commit_content: true is enabled, the following fields are included in the LLM context:

| Field | Type | Description | Example |
| --- | --- | --- | --- |
| commit.sha | string | Commit SHA hash | "abc123def456..." |
| commit.message | string | Commit title (first line/paragraph) | "feat: add new feature" |
| commit.url | string | Web URL to view the commit | "https://github.com/org/repo/commit/abc123" |
| commit.full_message | string | Complete commit message (if different from title) | "feat: add new feature\n\nDetailed description..." |
| commit.author.name | string | Author's name | "John Doe" |
| commit.author.date | timestamp | When the commit was authored | "2024-01-15T10:30:00Z" |
| commit.committer.name | string | Committer's name (may differ from author) | "GitHub" |
| commit.committer.date | timestamp | When the commit was committed | "2024-01-15T10:31:00Z" |

Privacy & Security Notes:

  • Email addresses are intentionally excluded from the commit context to protect personally identifiable information (PII) when sending data to external LLM APIs
  • Fields are only included if available from the git provider
  • Some providers may have limited information (e.g., Bitbucket Cloud only provides author name)
  • Author and committer may be the same person or different (e.g., when using git commit --amend or rebasing)

Model Selection #

Each analysis role can specify a different model to optimize for your needs. If no model is specified, provider-specific defaults are used:

  • OpenAI: gpt-5-mini
  • Gemini: gemini-2.5-flash-lite

Specifying Models #

You can use any model name supported by your chosen provider. Consult your provider’s documentation for available models.

Example: Per-Role Models #

settings:
  ai:
    enabled: true
    provider: "openai"
    secret_ref:
      name: "openai-api-key"
      key: "token"
    roles:
      # Use the most capable model for complex analysis
      - name: "security-analysis"
        model: "gpt-5"
        prompt: "Analyze security failures..."

      # Use default model (gpt-5-mini) for general analysis
      - name: "general-failure"
        # No model specified - uses provider default
        prompt: "Analyze this failure..."

      # Use the most economical model for quick checks
      - name: "quick-check"
        model: "gpt-5-nano"
        prompt: "Quick diagnosis..."

CEL Expressions for Triggers #

By default, LLM analysis only runs for failed pipeline runs. Use CEL expressions in on_cel to further control when analysis runs or to enable it for successful runs.

If on_cel is not specified, the role will execute for all failed pipeline runs.

Overriding the Default Behavior #

To run LLM analysis for all pipeline runs (both successful and failed), use on_cel: 'true':

roles:
  - name: "pipeline-summary"
    prompt: "Generate a summary of this pipeline run..."
    on_cel: 'true'  # Runs for ALL pipeline runs, not just failures
    output: "pr-comment"

This is useful for:

  • Generating summaries for all pipeline runs
  • Tracking metrics for successful runs
  • Celebrating successes with automated messages
  • Reporting on build performance

Example CEL Expressions #

# Run on ALL pipeline runs (overrides default failed-only behavior)
on_cel: 'true'

# Only on successful runs (e.g., for generating success reports)
on_cel: 'body.pipelineRun.status.conditions[0].reason == "Succeeded"'

# Only on pull requests (in addition to default failed-only check)
on_cel: 'body.event.event_type == "pull_request"'

# Only on main branch
on_cel: 'body.event.base_branch == "main"'

# Only on default branch (works across repos with different default branches)
on_cel: 'body.event.base_branch == body.event.default_branch'

# Skip analysis for bot users
on_cel: 'body.event.sender != "dependabot[bot]"'

# Only for PRs with specific labels
on_cel: '"needs-review" in body.event.pull_request_labels'

# Only when triggered by comment
on_cel: 'body.event.trigger_comment.startsWith("/analyze")'

# Combine conditions
on_cel: 'body.pipelineRun.status.conditions[0].reason == "Failed" && body.event.event_type == "pull_request"'

Available CEL Context Fields #

Top-Level Context #

| Field | Type | Description |
| --- | --- | --- |
| body.pipelineRun | object | Full PipelineRun object with status and metadata |
| body.repository | object | Full Repository CRD object |
| body.event | object | Event information (see Event Fields below) |
| pac | map[string]string | PAC parameters map |

Event Fields (body.event.*) #

Event Type and Trigger:

| Field | Type | Description | Example |
| --- | --- | --- | --- |
| event_type | string | Event type from provider | "pull_request", "push", "Merge Request Hook" |
| trigger_target | string | Normalized trigger type across providers | "pull_request", "push" |

Branch and Commit Information:

| Field | Type | Description | Example |
| --- | --- | --- | --- |
| sha | string | Commit SHA | "abc123def456..." |
| sha_title | string | Commit title/message | "feat: add new feature" |
| base_branch | string | Target branch for PR (or branch for push) | "main" |
| head_branch | string | Source branch for PR (or branch for push) | "feature-branch" |
| default_branch | string | Default branch of the repository | "main" or "master" |

Repository Information:

| Field | Type | Description | Example |
| --- | --- | --- | --- |
| organization | string | Organization/owner name | "my-org" |
| repository | string | Repository name | "my-repo" |

URLs:

| Field | Type | Description | Example |
| --- | --- | --- | --- |
| url | string | Web URL to repository | "https://github.com/org/repo" |
| sha_url | string | Web URL to commit | "https://github.com/org/repo/commit/abc123" |
| base_url | string | Web URL to base branch | "https://github.com/org/repo/tree/main" |
| head_url | string | Web URL to head branch | "https://github.com/org/repo/tree/feature" |

User Information:

| Field | Type | Description | Example |
| --- | --- | --- | --- |
| sender | string | User who triggered the event | "user123", "dependabot[bot]" |

Pull Request Fields (only populated for PR events):

| Field | Type | Description | Example |
| --- | --- | --- | --- |
| pull_request_number | int | PR/MR number | 42 |
| pull_request_title | string | PR/MR title | "Add new feature" |
| pull_request_labels | []string | List of PR/MR labels | ["enhancement", "needs-review"] |

Comment Trigger Fields (only when triggered by comment):

| Field | Type | Description | Example |
| --- | --- | --- | --- |
| trigger_comment | string | Comment that triggered the run | "/test", "/retest" |

Webhook Fields:

| Field | Type | Description | Example |
| --- | --- | --- | --- |
| target_pipelinerun | string | Target PipelineRun for incoming webhooks | "my-pipeline-run" |
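
These fields can be combined in on_cel expressions. For example, to scope a role to a single repository (the organization and repository values are illustrative):

# Only analyze runs in one specific repository
on_cel: 'body.event.organization == "my-org" && body.event.repository == "my-repo"'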

Excluded Fields #

The following fields are intentionally excluded from CEL context for security and architectural reasons:

  • event.Provider - Contains sensitive API tokens and webhook secrets
  • event.Request - Contains raw HTTP headers and payload which may include secrets
  • event.InstallationID, AccountID, GHEURL, CloneURL - Provider-specific internal identifiers and URLs
  • event.SourceProjectID, TargetProjectID - GitLab-specific internal identifiers
  • event.State - Internal state management fields
  • event.Event - Raw provider event object (already represented in structured fields)

Output Destinations #

PR Comment #

Posts analysis as a comment on the pull request:

output: "pr-comment"

Benefits:

  • Visible to all developers
  • Can be updated with new analysis
  • Easy to discuss and follow up

Coming Soon: Additional output destinations including check-run (GitHub check runs) and annotation (PipelineRun annotations) will be available in future releases.

Setting Up API Keys #

Important: The Secret must be created in the same namespace as the Repository custom resource (CR).

OpenAI #

  1. Get an API key from OpenAI Platform

  2. Create a Kubernetes secret:

kubectl create secret generic openai-api-key \
  --from-literal=token="sk-your-openai-api-key" \
  -n <namespace>

Google Gemini #

  1. Get an API key from Google AI Studio

  2. Create a Kubernetes secret:

kubectl create secret generic gemini-api-key \
  --from-literal=token="your-gemini-api-key" \
  -n <namespace>
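
You can then reference this secret from the Repository settings; a minimal sketch (the role name and prompt are illustrative):

settings:
  ai:
    enabled: true
    provider: "gemini"
    secret_ref:
      name: "gemini-api-key"
      key: "token"
    roles:
      - name: "failure-analysis"
        prompt: "Analyze this pipeline failure..."
        output: "pr-comment"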

Using Custom API Endpoints #

The api_url field allows you to override the default API endpoint for LLM providers. This is useful for:

  • Self-hosted LLM services (e.g., LocalAI, vLLM, Ollama with OpenAI adapter)
  • Enterprise proxy services
  • Regional or custom endpoints (e.g., Azure OpenAI)
  • Alternative OpenAI-compatible APIs

Example Configuration #

settings:
  ai:
    enabled: true
    provider: "openai"
    api_url: "https://custom-llm.example.com/v1"  # Custom endpoint
    secret_ref:
      name: "custom-api-key"
      key: "token"
    roles:
      - name: "failure-analysis"
        prompt: "Analyze this pipeline failure..."
        output: "pr-comment"

Default API Endpoints #

If api_url is not specified, these defaults are used:

  • OpenAI: https://api.openai.com/v1
  • Gemini: https://generativelanguage.googleapis.com/v1beta

URL Format Requirements #

The api_url must:

  • Use http:// or https:// scheme
  • Include a valid hostname
  • Optionally include port and path components

Examples:

# Valid URLs
api_url: "https://api.openai.com/v1"
api_url: "http://localhost:8080/v1"
api_url: "https://custom-proxy.company.com:9000/openai/v1"

# Invalid URLs
api_url: "ftp://example.com"       # Wrong scheme
api_url: "//example.com"           # Missing scheme
api_url: "not-a-url"               # Invalid format

Example: Complete Configuration #

See the complete example for a full configuration with multiple roles.

Best Practices #

Prompt Engineering #

  1. Be specific: Tell the LLM exactly what you want
  2. Structure your prompts: Use numbered lists for clarity
  3. Set expectations: Define the output format
  4. Provide context: Explain what information will be provided

Example prompt:

prompt: |
  You are a DevOps expert analyzing a CI/CD pipeline failure.

  Based on the error logs and context provided:
  1. Identify the root cause of the failure
  2. Suggest 2-3 specific steps to fix the issue
  3. Recommend one preventive measure for the future

  Keep your response concise and actionable.  

Security Considerations #

  1. Protect API keys: Always store in Kubernetes secrets
  2. Review logs: Be aware of what logs are sent to external APIs (see the sketch after this list)
  3. Cost monitoring: Set up billing alerts with your LLM provider
  4. Rate limiting: Configure appropriate timeouts
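
One way to act on the log-review recommendation is to keep raw container logs out of requests to the external API for repositories with sensitive output; a minimal sketch:

context_items:
  error_content: true        # send error messages only
  container_logs:
    enabled: false           # do not send raw container logs to the LLM provider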

Cost Management #

  1. Select appropriate models: Use more economical models for simple tasks and reserve expensive models for complex analysis. Consult your provider’s pricing documentation.
  2. Limit max_tokens: Reduce costs by limiting response length
  3. Use selective triggers: Only analyze failures, not all runs
  4. Control log lines: Limit max_lines in container logs to reduce context size
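
Combined, these recommendations might look like the following sketch (the model name and values are illustrative):

settings:
  ai:
    enabled: true
    provider: "openai"
    max_tokens: 500                 # shorter responses cost less
    secret_ref:
      name: "openai-api-key"
      key: "token"
    roles:
      - name: "failure-analysis"
        model: "gpt-5-mini"         # economical model for routine analysis
        prompt: "Analyze this pipeline failure..."
        # Only analyze failed runs on pull requests
        on_cel: 'body.pipelineRun.status.conditions[0].reason == "Failed" && body.event.event_type == "pull_request"'
        context_items:
          error_content: true
          container_logs:
            enabled: true
            max_lines: 50           # smaller log excerpt, smaller context
        output: "pr-comment"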

Performance Tips #

  1. Set reasonable timeouts: Default 30s is usually sufficient
  2. Non-blocking design: Analysis runs in the background and does not block the pipeline
  3. Selective context: Only include relevant context items
  4. Limit log fetching: Setting container_logs.max_lines too high (>500) can impact performance when fetching logs from many containers. Start with lower values (50-100) and increase only if needed
  5. Monitor failures: Check logs if analysis consistently fails

Troubleshooting #

Analysis Not Running #

Check that:

  • enabled: true in configuration
  • CEL expression in on_cel matches your event
  • API key secret exists and is accessible
  • Namespace matches Repository location
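
To verify the last two points, check that the secret exists in the same namespace as the Repository CR (the secret and namespace names here are illustrative):

# Confirm the Repository CR and the API key secret share a namespace
kubectl get repositories.pipelinesascode.tekton.dev -n my-namespace
kubectl get secret openai-api-key -n my-namespace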

API Errors #

Common issues:

  • 401 Unauthorized: Check API key validity
  • 429 Rate Limited: Reduce analysis frequency or upgrade plan
  • Timeout: Increase timeout_seconds or reduce context size

Check controller logs:

kubectl logs -n pipelines-as-code deployment/pipelines-as-code-controller | grep "LLM"

High Costs #

To reduce costs:

  1. Use more restrictive on_cel expressions
  2. Lower max_tokens value
  3. Reduce container_logs.max_lines
  4. Consider switching to a cheaper model

Limitations #

  • Analysis is best-effort and non-blocking
  • API key costs are your responsibility
  • Subject to LLM provider rate limits
  • Context size limited by token constraints
  • Not suitable for sensitive/confidential logs
