# Data pipelines

> Log and NetFlow ingest paths: UniFi gear and network devices to Splunk indexers, via HAProxy and Cribl Edge.

Two pipelines move data from the homelab edge to Splunk. Both share the same shape by design: HAProxy load-balances across the Cribl Edge nodes, which route, transform, and reduce volume before forwarding to Splunk over HEC.

The goal: 30–50% ingest reduction without losing security signal.
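As a rough illustration of what that target means, the sketch below computes the reduction from bytes received at the edge versus bytes forwarded to Splunk. The function names and the sample volumes are hypothetical, not measurements from this environment.

```python
def ingest_reduction_pct(bytes_in: int, bytes_out: int) -> float:
    """Percentage of volume removed between the edge input and the Splunk output."""
    if bytes_in <= 0:
        raise ValueError("bytes_in must be positive")
    return (bytes_in - bytes_out) / bytes_in * 100.0


def meets_goal(bytes_in: int, bytes_out: int, low: float = 30.0) -> bool:
    """True when the reduction reaches at least the lower bound of the 30-50% goal."""
    return ingest_reduction_pct(bytes_in, bytes_out) >= low


if __name__ == "__main__":
    # Hypothetical day: 10 GB arrives at Cribl Edge, 6 GB is forwarded to Splunk.
    pct = ingest_reduction_pct(10_000_000_000, 6_000_000_000)
    print(f"{pct:.1f}% reduction, goal met: {meets_goal(10_000_000_000, 6_000_000_000)}")
```

Measuring both sides at Cribl Edge (input bytes versus output bytes per route) keeps the comparison independent of how Splunk later counts license usage.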

## Log pipeline

Logs from UniFi network gear and from applications land in Splunk via Cribl Edge. HAProxy fronts the Cribl Edge cluster for high availability.

{/* Boundary crossings: 0. Ranks: 4. Aspect: ~3:1 LR. Pass. */}

```mermaid
%%{init: {'theme':'base','look':'handDrawn','themeVariables':{'fontFamily':'Geist','fontSize':'14px','primaryColor':'#102937','primaryTextColor':'#F4EFE6','primaryBorderColor':'#4FB3A9','lineColor':'#4FB3A9','secondaryColor':'#0B1D2A','tertiaryColor':'#1A2A38','clusterBkg':'rgba(79,179,169,0.08)','clusterBorder':'#4FB3A9'}}}%%
flowchart LR
  Net([UniFi gear<br/>syslog tcp/1514])
  HA([HAProxy<br/>tcp/1514])
  EA([Cribl Edge A])
  EB([Cribl Edge B])
  SP[(Splunk Enterprise<br/>HEC :8088)]

  Net --> HA
  HA --> EA --> SP
  HA --> EB --> SP

  classDef edge fill:#102937,stroke:#4FB3A9,stroke-width:2px,color:#F4EFE6;
  classDef proxy fill:#102937,stroke:#E06B4A,stroke-width:2px,color:#F4EFE6;
  classDef store fill:#102937,stroke:#F4EFE6,stroke-width:2px,color:#F4EFE6;

  class Net,EA,EB edge
  class HA proxy
  class SP store

  linkStyle 0 stroke:#4FB3A9,stroke-width:2px;
  linkStyle 1,2,3,4 stroke:#E06B4A,stroke-width:2px,stroke-dasharray:4 3;
```

The solid green edge is the raw syslog hop from the UniFi gear into HAProxy; the coral dashed edges are the load-balanced path through Cribl Edge and on to Splunk HEC. Cribl Edge drops verbose fields, routes by `event_type`, enriches, and masks, so the indexer receives a smaller, cleaner payload.
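The shape of that processing can be sketched in plain Python. This is an illustration only, not Cribl's own pipeline functions: the list of verbose fields, the `event_type` to index mapping, the `unifi:syslog` sourcetype, and the `site` tag are all hypothetical. The envelope keys (`time`, `host`, `sourcetype`, `index`, `event`) are the standard fields of a Splunk HEC event.

```python
import json
import re
import time

# Hypothetical names: which fields count as "verbose" and which index each
# event_type lands in are assumptions for illustration only.
VERBOSE_FIELDS = {"debug", "raw_hex"}
INDEX_BY_EVENT_TYPE = {"firewall": "network", "auth": "security"}
DEFAULT_INDEX = "main"

# Keeps the vendor half of a MAC address and masks the device half.
MAC_RE = re.compile(
    r"\b((?:[0-9a-f]{2}:){3})(?:[0-9a-f]{2}:){2}[0-9a-f]{2}\b", re.IGNORECASE
)


def process(event: dict, site: str = "homelab") -> dict:
    """Drop verbose fields, mask MAC addresses, enrich, and wrap for HEC."""
    slim = {k: v for k, v in event.items() if k not in VERBOSE_FIELDS}
    if isinstance(slim.get("message"), str):
        slim["message"] = MAC_RE.sub(r"\1xx:xx:xx", slim["message"])
    slim["site"] = site  # enrichment: a static tag added at the edge
    return {
        "time": slim.pop("time", time.time()),
        "host": slim.pop("host", "unknown"),
        "sourcetype": "unifi:syslog",  # hypothetical sourcetype
        # Routing: the index is chosen from event_type, with a fallback.
        "index": INDEX_BY_EVENT_TYPE.get(slim.get("event_type"), DEFAULT_INDEX),
        "event": slim,
    }


if __name__ == "__main__":
    sample = {
        "time": 1700000000,
        "host": "udm",
        "event_type": "firewall",
        "message": "drop from aa:bb:cc:dd:ee:ff",
        "debug": "verbose trace that the indexer does not need",
    }
    print(json.dumps(process(sample), indent=2))
```

Each envelope produced this way is what would be POSTed to the HEC event endpoint (`/services/collector/event` on port 8088) with an `Authorization: Splunk <token>` header.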

## NetFlow pipeline

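NetFlow v9 / IPFIX from network devices follows the same shape on a different port. NetFlow export over UDP is connectionless and tolerates some loss, so HAProxy spreads traffic across both Cribl Edge nodes rather than holding one in reserve as a failover target.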

```mermaid
%%{init: {'theme':'base','look':'handDrawn','themeVariables':{'fontFamily':'Geist','fontSize':'14px','primaryColor':'#102937','primaryTextColor':'#F4EFE6','primaryBorderColor':'#4FB3A9','lineColor':'#4FB3A9','secondaryColor':'#0B1D2A','tertiaryColor':'#1A2A38','clusterBkg':'rgba(79,179,169,0.08)','clusterBorder':'#4FB3A9'}}}%%
flowchart LR
  Net([Network devices<br/>NetFlow udp/2055])
  HA([HAProxy<br/>udp/2055])
  EA([Cribl Edge A])
  EB([Cribl Edge B])
  SP[(Splunk Enterprise<br/>HEC :8088)]

  Net --> HA
  HA --> EA --> SP
  HA --> EB --> SP

  classDef edge fill:#102937,stroke:#4FB3A9,stroke-width:2px,color:#F4EFE6;
  classDef proxy fill:#102937,stroke:#E06B4A,stroke-width:2px,color:#F4EFE6;
  classDef store fill:#102937,stroke:#F4EFE6,stroke-width:2px,color:#F4EFE6;

  class Net,EA,EB edge
  class HA proxy
  class SP store

  linkStyle 0 stroke:#4FB3A9,stroke-width:2px;
  linkStyle 1,2,3,4 stroke:#E06B4A,stroke-width:2px,stroke-dasharray:4 3;
```

Cribl pipelines parse the flow records, de-duplicate them, and aggregate by flow tuple before forwarding.
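A minimal sketch of that de-duplicate and aggregate step, assuming already-parsed flow records and assuming "tuple" means the common 5-tuple (source and destination address, source and destination port, protocol). The record field names are hypothetical, and this is plain Python rather than Cribl's aggregation function.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class FlowKey(NamedTuple):
    """Assumed aggregation key: the classic 5-tuple."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int


def aggregate(records: Iterable[dict]) -> dict:
    """Skip exact duplicate records, then sum bytes and packets per flow key."""
    seen = set()
    totals = defaultdict(lambda: {"bytes": 0, "packets": 0, "flows": 0})
    for rec in records:
        # A record whose every field matches an earlier one is treated as a duplicate.
        fingerprint = tuple(sorted(rec.items()))
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        key = FlowKey(
            rec["src_ip"], rec["dst_ip"], rec["src_port"], rec["dst_port"], rec["protocol"]
        )
        totals[key]["bytes"] += rec["bytes"]
        totals[key]["packets"] += rec["packets"]
        totals[key]["flows"] += 1
    return dict(totals)


if __name__ == "__main__":
    base = {"src_ip": "10.0.0.5", "dst_ip": "1.1.1.1", "src_port": 51000,
            "dst_port": 443, "protocol": 6}
    records = [
        {**base, "bytes": 1000, "packets": 5, "start_ms": 1},
        {**base, "bytes": 1000, "packets": 5, "start_ms": 1},  # exact duplicate
        {**base, "bytes": 500, "packets": 2, "start_ms": 2},
    ]
    for key, total in aggregate(records).items():
        print(key, total)
```

Forwarding one summary event per key per time window, instead of one event per flow record, is where most of the volume reduction on this path would come from.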

## What lives where

| Layer                     | Provisioned by                                                   | Configured by                                                                  | Source repo |
| ------------------------- | ---------------------------------------------------------------- | ------------------------------------------------------------------------------ | ----------- |
| Proxmox host / VMs / LXCs | [tofu-proxmox](https://github.com/JacobPEvans/terraform-proxmox) | [ansible-proxmox](https://github.com/JacobPEvans/ansible-proxmox)              | both        |
| HAProxy                   | (Ansible role)                                                   | [ansible-proxmox-apps](https://github.com/JacobPEvans/ansible-proxmox-apps)    | apps repo   |
| Cribl Edge                | (Ansible role)                                                   | [ansible-proxmox-apps](https://github.com/JacobPEvans/ansible-proxmox-apps)    | apps repo   |
| Splunk Enterprise         | (manual / Ansible)                                               | [ansible-splunk](https://github.com/JacobPEvans/ansible-splunk)                | splunk repo |
| Cribl pipelines           | (manual / Cribl pack)                                            | [cc-edge-\* packs](https://github.com/JacobPEvans?tab=repositories\&q=cc-edge) | pack repos  |
| Splunk knowledge objects  | n/a                                                              | Splunk TA (AI observability)                                                   | TA repo     |

## DR posture

Splunk Cloud failover is provisioned via [tofu-aws](https://github.com/JacobPEvans/terraform-aws): AWS resources (EC2, S3, Route 53) that sit as a cold standby and, once started, accept the same HEC traffic if the home cluster is offline. Cribl Edge routes can be flipped to point at the AWS endpoint with a single config change.
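The idea of a single-setting flip can be sketched as below. This is not Cribl's configuration format; the hostnames, the `active_target` key, and the assumption that the AWS side also listens on port 8088 are hypothetical and stand in for whatever the real destination config uses.

```python
# Hypothetical endpoints; the real hostnames are not given on this page.
HEC_TARGETS = {
    "home": "https://splunk.home.example:8088/services/collector/event",
    "aws": "https://splunk.dr.example:8088/services/collector/event",
}


def hec_url(config: dict) -> str:
    """Resolve the HEC URL from one setting, defaulting to the home cluster."""
    target = config.get("active_target", "home")
    if target not in HEC_TARGETS:
        raise ValueError(f"unknown HEC target: {target!r}")
    return HEC_TARGETS[target]


if __name__ == "__main__":
    print(hec_url({}))                        # normal operation
    print(hec_url({"active_target": "aws"}))  # after the DR flip
```

Keeping the payload format identical on both sides is what makes the flip a one-setting change: only the destination URL differs, not the pipelines in front of it.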
