
Closed Loop Beats More Automation

A broken automation system is easy to recognize when it crashes. The dangerous one exits cleanly, posts a green check, and teaches you nothing. That is the failure mode I keep seeing in agent workflows, content pipelines, scheduled jobs, and developer tooling: the work runs, the logs look polite, and the system forgets to learn from the thing that just happened.

More automation does not fix that. A bigger queue, another agent, or one more scheduled job only multiplies the same blind spot. The useful question is smaller and harsher: when the metric is missing, the API is blocked, the post duplicates yesterday, or the build artifact is empty, does the workflow create a record that changes tomorrow’s behavior?

[Figure: Automation loop with artifact proof]

The automation trap is not failure; it is non-learning

Most teams already know how to detect a hard failure: a non-zero exit code, an uncaught exception, an HTTP 500, a red CI job, an empty deployment. Those are visible. They page somebody, or at least they break the dashboard loudly enough that a person notices.

The harder class is non-learning automation. A scheduled job runs every morning. It writes a report, posts to a channel, refreshes a dashboard, or asks an agent to prepare a draft. One day the external analytics API is unavailable. Another day the generated report has zero rows. Another day the publishing queue accepts a post but schedules it at the wrong time. The pipeline still has enough mechanical success to continue. It produces an output-shaped object, but it does not add knowledge.

This is why I prefer closed-loop workflows over automation-first workflows. A closed loop does five things: it captures the input signal, makes a hypothesis, produces an artifact, gates the artifact, and schedules a readback. If any step fails, the failure becomes an artifact too. The loop is not allowed to pretend silence is success.
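The five parts can be sketched as a tiny runner. This is a minimal sketch, not the author's implementation. The step names, the `run_loop` and `is_empty` helpers, and the rule that an empty result counts as a failure are all assumptions made for illustration. What it shows is that every exit path leaves a record, so a silent or empty step is never reported as success.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

# The five stages, in order. Step names are illustrative.
STEPS = ("signal", "hypothesis", "artifact", "gate", "readback")


@dataclass
class LoopRecord:
    """What one run leaves behind: completed stages, or the first failure."""
    completed: Dict[str, object] = field(default_factory=dict)
    failure: Optional[Dict[str, str]] = None


def is_empty(result: object) -> bool:
    # None, "", [] and {} all count as "no output", e.g. a report with zero rows.
    if result is None:
        return True
    try:
        return len(result) == 0
    except TypeError:
        return False


def run_loop(steps: Dict[str, Callable[[dict], object]]) -> LoopRecord:
    record = LoopRecord()
    for name in STEPS:
        step = steps.get(name)
        if step is None:
            record.failure = {"step": name, "reason": "step not defined"}
            return record
        try:
            result = step(record.completed)
        except Exception as exc:  # a crash becomes a recorded failure
            record.failure = {"step": name, "reason": repr(exc)}
            return record
        if is_empty(result):
            record.failure = {"step": name, "reason": "empty result"}
            return record
        record.completed[name] = result
    return record


# Usage: the artifact step "succeeds" but returns zero rows.
rec = run_loop({
    "signal": lambda ctx: {"source": "strategy brief"},
    "hypothesis": lambda ctx: {"metric": "replies"},
    "artifact": lambda ctx: [],
})
print(rec.failure)  # {'step': 'artifact', 'reason': 'empty result'}
```

The runner stops at the first failure because later stages should not build on an artifact that was never produced.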

DORA’s research has spent years pushing teams away from local vanity metrics and toward outcomes such as deployment frequency, lead time for changes, change failure rate, and recovery time for failed changes [Source: https://dora.dev/research/]. The SPACE framework makes the same point from another angle: developer productivity is not one number; it includes satisfaction and well-being, performance, activity, communication and collaboration, and efficiency and flow [Source: https://queue.acm.org/detail.cfm?id=3454124]. Both ideas matter here because automation metrics are tempting to fake. Counting runs is easy. Counting useful learning is harder.

A closed loop treats the run count as a weak signal. The stronger signal is whether the workflow produced something that can be inspected, reused, and compared against the next run.

A closed loop has five parts

The version I use is deliberately boring. It fits content pipelines, CI jobs, agent runs, data refreshes, and release workflows.

| Step | Question | Artifact |
|---|---|---|
| Signal | What triggered this run? | strategy brief, issue, alert, metric snapshot, user request |
| Hypothesis | What should change if this works? | experiment note, expected metric, risk to avoid |
| Artifact | What did the system produce? | URL, commit, report, JSON, image, post ID, build output |
| Gate | How did we prove it is usable? | tests, schema check, HTTP 200, word count, OCR, reviewer score |
| Readback | When do we learn from it? | metric readback date, owner, patch note, next strategy file |

The trap is treating the artifact as the end. It is not. The artifact is the thing the gate inspects. The readback is the thing that makes the next run different.

Here is the small YAML contract I write first, to fix the shape of a job before any job logic exists:

```yaml
name: daily-content-loop
signal:
  source: wiki/default/concepts/tomorrow-content-strategy.md
  freshness_hours: 36
hypothesis:
  statement: "A closed-loop workflow checklist earns more saves than another tool post."
  primary_metric: "X engagement rate >= 5% or one reply/bookmark"
  readback_date: "2026-07-05"
artifact:
  required:
    - path: /tmp/blog-draft.md
      min_words: 2500
    - expected_url: https://zemna.net/blog/closed-loop-beats-more-automation/
      expected_status_after_publish: 200
    - path: /tmp/factcheck.json
      min_score: 98
gate:
  banned_terms:
    - empty corporate hype
    - vague transformation claims
    - unsupported productivity promises
  checks:
    - hugo_build
    - fact_check
    - editor_score
    - cross_post_char_count
readback:
  append_to: wiki/default/concepts/social-experiment-log.md
  patch_strategy_file: true
```

That file is not magic. Its value is forcing the job to name the proof before it runs. If a step cannot name its artifact and gate, it is still a demo.
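A contract only helps if the job refuses to start without one. The sketch below makes that refusal explicit. It assumes the YAML above has already been parsed into a plain dict, for example with a YAML library, which is not shown here. `contract_problems` is a hypothetical helper, and the sections it requires mirror the five parts of the loop.

```python
from __future__ import annotations

# Assumption: the YAML contract has already been parsed into a dict;
# this sketch only checks its shape before any job logic runs.
REQUIRED_SECTIONS = ("signal", "hypothesis", "artifact", "gate", "readback")


def contract_problems(contract: dict) -> list[str]:
    """Return reasons the job should refuse to start; empty means runnable."""
    problems = [f"missing section: {s}" for s in REQUIRED_SECTIONS if s not in contract]
    if not (contract.get("artifact") or {}).get("required"):
        problems.append("artifact.required names no artifacts")
    if not (contract.get("gate") or {}).get("checks"):
        problems.append("gate.checks is empty")
    if not (contract.get("hypothesis") or {}).get("readback_date"):
        problems.append("hypothesis has no readback_date")
    return problems


demo = {
    "signal": {"source": "wiki/default/concepts/tomorrow-content-strategy.md"},
    "hypothesis": {"statement": "checklist beats tool post"},
    "artifact": {"required": []},
    "gate": {"checks": ["hugo_build", "fact_check"]},
}
for problem in contract_problems(demo):
    print(problem)  # a real job would exit non-zero here, before doing any work
```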

[Figure: Closed loop workflow]

The artifact is the real health check

A cron job that exits zero only proves that a process returned zero. It does not prove that a report exists, a page rendered, a queue item was accepted, or a customer-visible action happened. That is why I put the success ping behind the artifact check, not before it.

GitHub Actions exposes logs for workflow runs, but logs are still evidence to inspect, not the same thing as a valid deliverable [Source: https://docs.github.com/en/actions/how-tos/monitor-workflows/use-workflow-run-logs]. A release job should verify the release asset. A static-site job should verify the generated URL. A data job should verify the row count and schema. A social post job should verify that the platform accepted the post and returned a post ID.

The minimum artifact check is simple:

```python
from pathlib import Path
import json
import sys

# Each check: (artifact path, predicate over that file, human-readable label).
checks = [
    (Path("/tmp/blog-draft.md"), lambda p: len(p.read_text(encoding="utf-8").split()) >= 2500, "draft has >=2500 words"),
    (Path("/tmp/factcheck.json"), lambda p: json.loads(p.read_text(encoding="utf-8"))["score"] >= 98, "fact-check score >=98"),
    (Path("/tmp/editor-gate.json"), lambda p: json.loads(p.read_text(encoding="utf-8"))["score"] >= 90, "editor score >=90"),
]

failed = []
for path, predicate, label in checks:
    if not path.exists():
        failed.append(f"missing: {path}")
        continue
    try:
        if not predicate(path):
            failed.append(f"failed: {label}")
    except Exception as exc:
        # Invalid JSON or a missing "score" key is a gate failure, not a crash.
        failed.append(f"error: {path}: {exc}")

if failed:
    print("ARTIFACT_GATE_FAIL")
    for item in failed:
        print(f"- {item}")
    sys.exit(1)

# Only now is it safe to send the success ping.
print("ARTIFACT_GATE_PASS")
```

This is not a replacement for real tests. It is the layer above tests that checks whether the workflow produced the thing the schedule exists to produce.

The same pattern works for non-content systems. If a backup job exits zero, list the backup and check its size. If an ETL job exits zero, query the target table and validate the partition date. If an agent says it fixed a bug, require the commit hash, the test command, and the test output path. The artifact is the boundary between “the process ran” and “the work exists.”
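The ETL case can be sketched with the standard library's `sqlite3`. Everything here is illustrative: the `daily_metrics` table, the `partition_date` column, and the `check_partition` helper are hypothetical, and a real warehouse would use its own client. What it shows is checking the partition the job was supposed to write, rather than trusting the job's exit code.

```python
import sqlite3


def check_partition(conn, table, partition, expected_cols, min_rows=1):
    """Validate schema and the target partition after an ETL job exits zero.

    Assumes a `partition_date` column. `table` is interpolated into SQL,
    so it must be a trusted identifier.
    """
    problems = []
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    cols = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    missing = set(expected_cols) - cols
    if missing:
        problems.append(f"missing columns: {sorted(missing)}")
        return problems  # the row-count query below would fail anyway
    (count,) = conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE partition_date = ?", (partition,)
    ).fetchone()
    if count < min_rows:
        problems.append(f"partition {partition} has {count} rows")
    return problems


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_metrics (partition_date TEXT, metric TEXT, value REAL)")
conn.execute("INSERT INTO daily_metrics VALUES ('2026-07-04', 'saves', 3)")
# The job "succeeded", but it never wrote today's partition.
print(check_partition(conn, "daily_metrics", "2026-07-05", {"partition_date", "metric", "value"}))
# ['partition 2026-07-05 has 0 rows']
```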

Observability is a loop, not a warehouse

OpenTelemetry describes traces, metrics, and logs as supported signal types for understanding software behavior [Source: https://opentelemetry.io/docs/concepts/signals/]. That framing is useful, but a closed-loop workflow needs one more discipline: every signal must have a decision path.

A trace that no one reads is storage. A metric that never changes a threshold is decoration. A log line that cannot be tied back to an artifact is a receipt without an order number.

For agent and content workflows, I like adding a small event log that records the loop state in plain JSONL. It is easy to append, easy to diff, and easy to turn into a wiki note later.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

LOG = Path("/tmp/closed-loop-events.jsonl")

def record(event_type: str, **fields):
    row = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "event_type": event_type,
        **fields,
    }
    with LOG.open("a", encoding="utf-8") as f:
        f.write(json.dumps(row, sort_keys=True) + "\n")

record("signal_read", source="tomorrow-content-strategy.md", freshness_hours=10)
record("artifact_created", kind="blog_draft", path="/tmp/blog-draft.md", words=">=2500")
record("gate_passed", gate="fact_check", score=100)
record("publish_candidate", expected_url="https://zemna.net/blog/closed-loop-beats-more-automation/", status="pending")
record("readback_scheduled", metric="X ER >= 5% or one reply/bookmark", date="2026-07-05")
```

This is the difference between observability and memory. Observability lets you answer, “What happened?” Memory lets tomorrow’s run answer, “What did we learn last time?”
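The memory side is the part that reads the log back at the start of the next run. The sketch below assumes the JSONL fields written by `record()` above (`event_type`, `date`). `due_readbacks` is a hypothetical helper that returns the readbacks whose date has arrived, so the run can act on them before it starts anything new.

```python
import json
import tempfile
from datetime import date
from pathlib import Path


def due_readbacks(log_path: Path, today: date) -> list:
    """Return readback_scheduled events whose date has arrived."""
    if not log_path.exists():
        return []
    due = []
    for line in log_path.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        event = json.loads(line)
        if event.get("event_type") == "readback_scheduled" and date.fromisoformat(event["date"]) <= today:
            due.append(event)
    return due


with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "closed-loop-events.jsonl"
    rows = [
        {"event_type": "gate_passed", "gate": "fact_check", "score": 100},
        {"event_type": "readback_scheduled", "metric": "X ER >= 5% or one reply/bookmark", "date": "2026-07-05"},
    ]
    log.write_text("".join(json.dumps(r) + "\n" for r in rows), encoding="utf-8")
    early = due_readbacks(log, date(2026, 7, 1))
    on_time = due_readbacks(log, date(2026, 7, 5))

print(len(early), len(on_time))  # 0 1
```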

MLflow’s tracing documentation frames traces as a way to record the inputs, outputs, metadata, and intermediate steps of LLM applications [Source: https://mlflow.org/docs/latest/genai/tracing/]. That is the right instinct. The trace becomes more valuable when it can promote a failure into the next test case, prompt change, or runbook edit.
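Promoting a failure into the next test case can start very small. This is not MLflow's API. It is a plain-Python stand-in that works on JSONL-style events. The `gate_failed` event type, its fields, and the `promote_failures` helper are all hypothetical. Each new failing input becomes a stored regression case, and repeated failures on the same input collapse into one case.

```python
import json
import tempfile
from pathlib import Path


def promote_failures(events: list, cases_path: Path) -> int:
    """Append each gate failure as a regression case; skip inputs already covered."""
    cases = json.loads(cases_path.read_text(encoding="utf-8")) if cases_path.exists() else []
    seen = {json.dumps(c["input"], sort_keys=True) for c in cases}
    added = 0
    for ev in events:
        if ev.get("event_type") != "gate_failed":
            continue
        key = json.dumps(ev["input"], sort_keys=True)
        if key in seen:
            continue
        cases.append({"input": ev["input"], "gate": ev["gate"], "seen_failure": ev.get("reason", "")})
        seen.add(key)
        added += 1
    cases_path.write_text(json.dumps(cases, indent=2), encoding="utf-8")
    return added


events = [
    {"event_type": "gate_failed", "gate": "fact_check", "input": {"draft": "v1"}, "reason": "price mismatch"},
    {"event_type": "gate_failed", "gate": "fact_check", "input": {"draft": "v1"}, "reason": "price mismatch"},
    {"event_type": "gate_passed", "gate": "editor_score", "input": {"draft": "v2"}},
]
with tempfile.TemporaryDirectory() as tmp:
    cases_file = Path(tmp) / "regression-cases.json"
    first = promote_failures(events, cases_file)   # repeated failure collapses into one case
    second = promote_failures(events, cases_file)  # already a test case, nothing added
    stored = json.loads(cases_file.read_text(encoding="utf-8"))
```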

[Figure: Telemetry warehouse versus closed-loop memory]

The analytics outage test

The cleanest example is a social analytics outage. A shallow automation pipeline says, “Metricool unavailable, skip analysis.” A worse one invents a trend because the schedule demands a confident answer. A closed-loop pipeline does neither.

The rule is straightforward: if fresh analytics are unavailable, the job records the blocker, uses local history only as a fallback, lowers confidence, and keeps the next experiment small. It does not promote a new rule from stale data.

```bash
#!/usr/bin/env bash
set -euo pipefail

STRATEGY="$HOME/wiki/default/concepts/tomorrow-content-strategy.md"
HISTORY="$HOME/wiki/default/concepts/social-metrics-history.json"
LOG="$HOME/wiki/default/concepts/social-experiment-log.md"

# Classify the strategy brief: missing/empty, older than 36 hours, or fresh.
# Note: `stat -c %Y` is GNU coreutils; on macOS/BSD use `stat -f %m`.
if [ ! -s "$STRATEGY" ]; then
  status="strategy_missing"
elif [ $(( $(date +%s) - $(stat -c %Y "$STRATEGY") )) -gt $((36 * 3600)) ]; then
  status="strategy_stale"
else
  status="strategy_fresh"
fi

case "$status" in
  strategy_fresh)
    echo "Using fresh strategy brief"
    ;;
  *)
    # Record the blocker instead of skipping silently.
    echo "Metricool strategy brief missing/stale: $status" | tee -a "$LOG"
    # Fall back to local history only if it exists; otherwise fail loudly.
    test -s "$HISTORY" || { echo "No fallback history"; exit 1; }
    echo "Falling back to $HISTORY (confidence: low)" | tee -a "$LOG"
    ;;
esac
```

The important part is not the shell script. The important part is refusing to hide uncertainty. The final report should say exactly which data source was used and which was unavailable. If a platform API is blocked, that is not nothing; it is a product signal about the operating system around the workflow.

This is where closed loops beat dashboards. A dashboard can show the missing data. A loop decides what lower-risk action is allowed while the data is missing.
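The "lower-risk action" decision can be written as a small pure function, so it is testable apart from the shell plumbing. This is a sketch under assumptions. The `Plan` fields, the labels, and `plan_run` are hypothetical. The 36-hour limit is taken from `freshness_hours` in the YAML contract. The rules follow the text: fall back to local history only, lower confidence, keep the experiment small, and never promote a rule from stale data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Plan:
    data_source: str          # what the final report must say it used
    confidence: str
    experiment_size: str
    may_promote_rules: bool   # never True on stale or missing data
    blocker: Optional[str] = None


def plan_run(analytics_age_hours: Optional[float], history_available: bool,
             max_age_hours: float = 36) -> Plan:
    """Decide what the run may do; None means no fresh analytics at all."""
    if analytics_age_hours is not None and analytics_age_hours <= max_age_hours:
        return Plan("fresh_analytics", "normal", "normal", True)
    blocker = ("analytics missing" if analytics_age_hours is None
               else f"analytics stale ({analytics_age_hours:.0f}h)")
    if history_available:
        # Fallback: local history only, lower confidence, smaller experiment.
        return Plan("local_history", "low", "small", False, blocker)
    # Nothing to fall back on: the caller should fail, like the script's exit 1.
    return Plan("none", "none", "none", False, blocker)


print(plan_run(None, history_available=True))
```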

Gates are where taste becomes operational

Every workflow has taste, even if nobody writes it down. A developer decides that a test suite is enough. A content operator decides that a draft is too generic. A release engineer decides that a warning is acceptable. A manager decides that one metric is too stale to trust.

The closed-loop move is to turn those judgments into gates. Not all gates need to be code. Some can be human review. Some can be structured checklists. Some can be a score threshold. The point is that the judgment becomes visible and repeatable.

| Weak automation | Closed-loop version |
|---|---|
| “Post daily” | “Post only if strategy brief is fresh, duplicate guard passes, and editor score is >=90” |
| “Run tests” | “Run named tests and attach the output path” |
| “Generate image” | “Generate image, verify aspect ratio, verify text, then use the original file” |
| “Collect metrics” | “Collect metrics or log unavailable state with fallback confidence” |
| “Agent fixed it” | “Agent names changed files, tests, artifact, rollback” |
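The first row of the table can be turned directly into code. This is a minimal sketch. `PostCandidate` and `post_gate` are hypothetical names. The 36-hour brief limit comes from the YAML contract, and the editor threshold of 90 comes from the table. The gate returns its reasons instead of a bare boolean, so a blocked post leaves an explanation behind.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PostCandidate:
    brief_age_hours: float
    duplicate_of: Optional[str]  # ID of a recent post this repeats, if any
    editor_score: int


def post_gate(c: PostCandidate, max_brief_age_hours: float = 36,
              min_editor_score: int = 90) -> List[str]:
    """Return blocking reasons; an empty list means the post may go out."""
    reasons = []
    if c.brief_age_hours > max_brief_age_hours:
        reasons.append(f"strategy brief is {c.brief_age_hours:.0f}h old")
    if c.duplicate_of is not None:
        reasons.append(f"duplicates recent post {c.duplicate_of}")
    if c.editor_score < min_editor_score:
        reasons.append(f"editor score {c.editor_score} < {min_editor_score}")
    return reasons


print(post_gate(PostCandidate(brief_age_hours=10, duplicate_of=None, editor_score=93)))  # []
```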

I use this pattern because it survives tool changes. The analytics vendor can change. The LLM can change. The publishing queue can change. The gates still ask the same questions: what signal did you trust, what did you produce, how did you prove it, and when do you read it back?

The gate also stops false confidence from leaking into the next run. If a post was scheduled at the wrong time, log the mismatch. If an image generation API rate-limited the carousel, stage the outline and publish text-only where appropriate. If a fact-checker fixes a number, make the image spec read from the fact-checked draft instead of the original writer output.

That last detail matters. Closed loops break when later stages secretly trust earlier, unverified state.
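One way to stop later stages from trusting unverified state is to pin the verified file by content hash. This sketch uses `hashlib` and assumes a hypothetical manifest file written by the fact-check stage. A downstream stage, such as the image spec, loads the draft only through `load_verified`, which refuses the draft if it changed after the gate passed.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def record_verified(path: Path, manifest: Path) -> None:
    """Called by the fact-check stage only after its gate passes."""
    manifest.write_text(json.dumps({"path": str(path), "sha256": sha256_of(path)}), encoding="utf-8")


def load_verified(manifest: Path) -> str:
    """Called by a later stage; refuses input edited after verification."""
    entry = json.loads(manifest.read_text(encoding="utf-8"))
    path = Path(entry["path"])
    if sha256_of(path) != entry["sha256"]:
        raise RuntimeError(f"{path} changed after verification; rerun the gate")
    return path.read_text(encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    draft = Path(tmp) / "draft.md"
    manifest = Path(tmp) / "verified.json"
    draft.write_text("Price: $29", encoding="utf-8")
    record_verified(draft, manifest)
    ok_text = load_verified(manifest)
    draft.write_text("Price: $19", encoding="utf-8")  # an unverified edit sneaks in
    try:
        load_verified(manifest)
        stale_accepted = True
    except RuntimeError:
        stale_accepted = False
```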

I also separate blocking gates from advisory notes. A blocking gate prevents the publish or deploy: wrong price, broken link, failed build, missing artifact, unsupported claim. An advisory note records something worth improving without pretending the work is unusable: the queue picked a less ideal time, the image style is slightly off-brand, or a reference source is useful but not primary. The distinction keeps the loop honest. If everything blocks, the workflow becomes brittle. If nothing blocks, the workflow becomes theater.

The same distinction helps with human review. A reviewer should not have to decide from scratch whether a warning is fatal. The contract should say which failures stop the run, which failures create a follow-up task, and which failures only affect the next experiment. That is how taste becomes operational without becoming arbitrary.
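The three outcomes can live in a small lookup table, so a reviewer reads the classification instead of deciding it from scratch. This is a minimal sketch. The finding names and the `RULES` mapping are hypothetical, and each team would write its own. One deliberate design choice: an unknown finding blocks until someone classifies it, so a new failure type cannot slip through silently.

```python
from enum import Enum


class Severity(Enum):
    BLOCK = "stop the run"
    FOLLOW_UP = "create a follow-up task"
    NEXT_EXPERIMENT = "adjust the next experiment"


# Hypothetical finding names; each team writes its own table.
RULES = {
    "wrong_price": Severity.BLOCK,
    "broken_link": Severity.BLOCK,
    "failed_build": Severity.BLOCK,
    "missing_artifact": Severity.BLOCK,
    "unsupported_claim": Severity.BLOCK,
    "off_brand_image": Severity.FOLLOW_UP,
    "non_primary_source": Severity.FOLLOW_UP,
    "suboptimal_queue_time": Severity.NEXT_EXPERIMENT,
}


def triage(findings):
    """Group findings by severity; unknown findings block until classified."""
    buckets = {s: [] for s in Severity}
    for finding in findings:
        buckets[RULES.get(finding, Severity.BLOCK)].append(finding)
    return buckets


result = triage(["off_brand_image", "suboptimal_queue_time"])
can_publish = not result[Severity.BLOCK]
print(can_publish)  # True
```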

[Figure: Closed-loop gate checklist]

What you should do Monday morning

Do not rewrite your whole automation system. Pick one scheduled workflow that already matters and add a loop around it.

Start with a workflow that already has a painful failure story. The daily job that once produced an empty report is better than the shiny new agent prototype. The release checklist that once missed a broken asset is better than a generic observability initiative. Closed loops work best when they are attached to real scar tissue, because the gate can be written in concrete language.

For one week, keep the loop small enough that you can read every artifact by hand. That is not a step backward. It is calibration. After a few runs, the repeated checks become obvious candidates for code, and the rare judgment calls stay visible for human review.

  1. Name the artifact. Write down the exact file, URL, database row, post ID, commit SHA, or report path that proves the job produced value.
  2. Move the success ping behind the artifact check. The job is not green until the artifact exists, is fresh, is non-trivial (not empty or a stub), and passes at least one domain assertion.
  3. Add a fallback state. Missing API data, empty result sets, and stale briefs should be explicit states, not silent skips.
  4. Add a duplicate guard. If the system publishes or notifies, check recent history before repeating the same message.
  5. Write a readback date. Every experiment needs a time when the result is inspected and fed into the next strategy.
  6. Patch the rule, not just the output. If the gate catches a failure, update the checklist, script, or prompt that allowed it.
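A duplicate guard (step 4) can start as a normalized near-match against recent posts, using the standard library's `difflib`. This is a sketch under assumptions. Where `recent` comes from (a local history file, the platform's API) is left to the caller. The normalization rules and the 0.9 similarity threshold are illustrative choices, not tuned values.

```python
import re
from difflib import SequenceMatcher
from typing import List


def normalize(text: str) -> str:
    # Lowercase, drop URLs and punctuation, collapse whitespace so trivial edits still match.
    text = re.sub(r"https?://\S+", " ", text.lower())
    text = re.sub(r"[^\w\s]", "", text)
    return " ".join(text.split())


def is_duplicate(candidate: str, recent: List[str], threshold: float = 0.9) -> bool:
    """True if the candidate is near-identical to any recently published message."""
    c = normalize(candidate)
    return any(SequenceMatcher(None, c, normalize(r)).ratio() >= threshold for r in recent)


recent_posts = ["Closed loops beat more automation. https://example.com/a"]
print(is_duplicate("closed loops beat MORE automation! https://example.com/b", recent_posts))  # True
```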

A useful first commit is small. Add one JSON file, one shell check, and one log entry. Then make the next run read them.

Here is the minimal contract I would start with:

```json
{
  "workflow": "daily-blog",
  "artifact": {
    "path": "/home/linuxuser/projects/zemna.net/public/blog/closed-loop-beats-more-automation/index.html",
    "min_bytes": 10000,
    "public_url_after_publish": "https://zemna.net/blog/closed-loop-beats-more-automation/",
    "expected_status_after_publish": 200
  },
  "gates": {
    "fact_check_min": 98,
    "editor_min": 90,
    "word_count_min": 2500,
    "code_blocks_min": 3
  },
  "readback": {
    "date": "2026-07-05",
    "metric": "X ER >= 5% or one reply/bookmark",
    "log": "wiki/default/concepts/social-experiment-log.md"
  }
}
```

That contract is not expensive. It is a cheap way to prevent a daily automation system from becoming a daily superstition.
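Making the next run read the contract could look like the sketch below. It checks the built page against `artifact.path`, `artifact.min_bytes`, and `gates.code_blocks_min` from the JSON above. Three parts are assumptions. The `max_age_hours` freshness limit is not in the contract. The `enforce` helper is hypothetical. Counting `<pre` tags is a heuristic that assumes the site generator wraps each rendered code block in a `<pre>` element, as Hugo's default rendering does.

```python
import json
import tempfile
import time
from pathlib import Path


def enforce(contract_path: Path, max_age_hours: float = 24) -> list:
    """Check the built artifact against the JSON contract; empty list means pass."""
    contract = json.loads(contract_path.read_text(encoding="utf-8"))
    spec, gates = contract["artifact"], contract["gates"]
    artifact = Path(spec["path"])
    if not artifact.exists():
        return [f"missing artifact: {artifact}"]
    problems = []
    stat = artifact.stat()
    if stat.st_size < spec["min_bytes"]:
        problems.append(f"artifact is {stat.st_size} bytes, expected >= {spec['min_bytes']}")
    age_hours = (time.time() - stat.st_mtime) / 3600
    if age_hours > max_age_hours:
        problems.append(f"artifact is {age_hours:.1f}h old")
    # Heuristic: count <pre> tags as rendered code blocks.
    pre_count = artifact.read_text(encoding="utf-8", errors="replace").count("<pre")
    if pre_count < gates["code_blocks_min"]:
        problems.append(f"found {pre_count} code blocks, expected >= {gates['code_blocks_min']}")
    return problems


with tempfile.TemporaryDirectory() as tmp:
    page = Path(tmp) / "index.html"
    contract_file = Path(tmp) / "contract.json"
    contract_file.write_text(json.dumps({
        "artifact": {"path": str(page), "min_bytes": 10000},
        "gates": {"code_blocks_min": 3},
    }), encoding="utf-8")
    page.write_text("<pre><code>x</code></pre>" * 3 + "p" * 10000, encoding="utf-8")
    full_page = enforce(contract_file)
    page.write_text("<pre>x</pre>", encoding="utf-8")  # a thin, broken build
    thin_page = enforce(contract_file)
```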


Closed-loop work is slower on day one. You have to name the artifact, write the gate, and schedule the readback. By day ten, it is faster because the system stops pretending that every green run was useful. The point is not to automate more. The point is to make every automation leave behind evidence that improves the next one.