MODULE 03

CLI WORKFLOWS

Describe workflows in plain English, let your CLI LLM build the automation, and learn to test, debug, and refine pipelines for your reporting.

Describing workflows, building pipelines

Video available on the course platform

In this video, Joe demonstrates a real cron-based scheduler that checks for new content, processes it, and sends results on a recurring schedule. He also walks through Reroute NJ (reroutenj.org), a zero-build static site in 11 languages built with CLI tools — including page generation scripts, a translation pipeline, and scrapers that commit and push changes automatically.

Learning objectives

  • Describe a multi-step journalism workflow in plain English and have your CLI LLM translate it into a working, reusable script
  • Use plan mode (/plan) to review the approach before any code is written or any command is run
  • Test AI-built scripts on a small number of real examples before running them on full workloads
  • Diagnose pipeline failures by pasting errors back into your CLI session and iterating on fixes
  • Apply security practices: API keys in environment variables, scripts reviewed before deployment

Before you start

This module is about building automated pipelines — scripts that run a sequence of AI tasks without you manually intervening. To use them well, it helps to understand what's actually happening inside an AI session: how context is stored, why order matters, and what "caching" means for how quickly your pipeline runs.

Read the full explanation: how AI session memory works →

Key concepts

Describe the workflow, let the LLM build it

Instead of asking your CLI tool to help with one document, you describe an entire workflow — a sequence of steps from input to output — and it builds the script that automates it.

"Fetch this URL, strip out the ads, summarize it in three bullets, save it as a markdown file with today's date in the filename." That's a workflow description. Your CLI tool turns it into a reusable script. You review it, test it on a few real examples, and refine it until the output is what you want.
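Sketched as a script, that description might look like the following. This is a minimal sketch: the sample article, the ad-filter pattern, and the summarize stub are illustrative, and in practice the summarize stage would call your CLI LLM (for example, Claude Code's non-interactive mode, `claude -p "..."`). It's stubbed here so the structure runs as-is.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Stage stand-ins: fetch would be `curl -s "$URL"` in a real run, and
# summarize would call the LLM. Each stage is one small function.
fetch()     { cat sample-article.txt; }
strip_ads() { grep -viE 'sponsored|advertisement'; }
summarize() { head -n 3 | sed 's/^/- /'; }   # stub: real version calls the LLM

# Sample input so the sketch is self-contained.
printf 'SPONSORED: buy now\nThe council met Tuesday.\nThe budget passed 5-2.\nA hearing is set for May.\n' > sample-article.txt

outfile="summary-$(date +%F).md"   # today's date in the filename
fetch | strip_ads | summarize > "$outfile"
echo "wrote $outfile"
```

The shape is the point: each instruction in the plain-English description becomes one small, swappable stage.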

Plan before you build

Before asking your CLI tool to build anything, ask it to plan first. See the approach before any file is written or any command is run.

# Claude Code: enter plan mode
/plan

# Gemini CLI: be explicit
"Before doing anything, plan this out step by step
and wait for my approval before taking any action."

A misunderstood requirement caught at the planning stage costs you 30 seconds. The same misunderstanding caught after a failed 50-document batch run costs you time and money.

Pipeline stages

The right way to build a pipeline: separate stages, each with a clear input and output. Fetch. Clean. Analyze. Format. When something breaks, you know which stage failed — and you can fix one without touching the others.
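A skeleton of that structure, with placeholder stage bodies (the file names and the per-stage logic here are illustrative; in a real pipeline the analyze stage would typically call an LLM):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Each stage reads the previous stage's file and writes its own, so you
# can inspect intermediate results and rerun a single failed stage.
fetch()   { printf 'RAW line one\n\nRAW line two\n' > stage1-raw.txt; }
clean()   { grep -v '^$' stage1-raw.txt | sed 's/^RAW //' > stage2-clean.txt; }
analyze() { wc -w < stage2-clean.txt | tr -d ' ' > stage3-analysis.txt; }
format()  { printf '## Report\nWord count: %s\n' "$(cat stage3-analysis.txt)" > stage4-final.md; }

fetch
clean
analyze
format
cat stage4-final.md
```

Because every stage leaves a file behind, "which stage failed" is answered by looking at which file is wrong.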

Test small before running large

Real API calls cost money. Before you process 500 documents through a paid API, test on 5. Better yet, validate your inputs with a free local model before sending anything to a paid service. Use cheap tools to catch problems before running expensive operations.
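One way to enforce this in a batch script is a dry-run limit. In this sketch the sample files and the uppercase transform are placeholders for your real inputs and your real (paid) per-file processing:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Generate sample inputs so the sketch runs as-is.
mkdir -p inputs output
for i in 1 2 3 4 5 6 7; do echo "document $i" > "inputs/doc$i.txt"; done

limit=2   # test small first; raise or remove once the output looks right
count=0
for file in inputs/*.txt; do
  [ "$count" -ge "$limit" ] && break
  # Stand-in for the expensive API call:
  tr '[:lower:]' '[:upper:]' < "$file" > "output/$(basename "$file")"
  count=$((count + 1))
done
echo "processed $count of $(ls inputs | wc -l | tr -d ' ') files"
```

Inspect the first two outputs, then raise the limit only when they look right.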

The debugging loop

When something breaks, paste the exact error message into your CLI session and ask what it means. Don't paraphrase. Your tool already knows the code it built for you, so it can read the error in context and usually identify the problem immediately. This is one of the most useful things about working with a CLI LLM: your debugging collaborator is already there, already has context, and can read error messages directly.

API key security

When your CLI tool builds scripts that call external APIs, those scripts often need API keys. Never put keys directly in your source code. Use environment variables instead:

# Bad: key hardcoded in a script (anyone who sees the file has your key)
API_KEY="sk-abc123..."

# Good: key stored in an environment variable
export API_KEY="sk-abc123..."   # set once in your terminal
echo "$API_KEY"                  # scripts read it from the environment

If you ask your CLI tool to build a script that uses an API, tell it to read keys from environment variables. And never commit a file containing a real key to git.

Rate limits and checkpointing

APIs have rate limits for a reason. Building pauses into your scripts (sleep 2 between API calls) keeps your API access alive. For long batch jobs, build in checkpointing — a log of which files have been processed — so the job can restart from where it left off instead of starting over.
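A minimal sketch of both ideas together. The file names and the word-count stand-in for the API call are illustrative, and the pause is shortened from the suggested 2 seconds so the demo finishes quickly:

```shell
#!/usr/bin/env bash
set -euo pipefail

mkdir -p docs
for i in 1 2 3; do echo "text $i" > "docs/file$i.txt"; done

log="processed.log"
touch "$log"
echo "docs/file1.txt" >> "$log"   # pretend a previous run got this far

for file in docs/*.txt; do
  if grep -qxF "$file" "$log"; then      # checkpoint: skip finished work
    echo "skip $file (already processed)"
    continue
  fi
  wc -c < "$file" > /dev/null            # stand-in for the real API call
  sleep 1                                # rate-limit pause (sleep 2 for real APIs)
  echo "$file" >> "$log"                 # checkpoint only after success
  echo "done $file"
done
```

Logging the filename only after the call succeeds means a crash mid-batch never marks unfinished work as done.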

EXERCISE

GOAL: Ask Claude to build a multi-stage workflow script for a real journalism task — then explore what it built and why it works.

Choose one of these scenarios (or adapt one to fit your work):

A. City council agenda to reporter prep sheet. Feed a meeting agenda into a pipeline that extracts each item, summarizes the background, flags items with budget implications, and outputs a prep document a reporter can bring to the meeting.

B. Web scraping pipeline. Pull data from a public source (a government page, a court docket index, an open data portal), clean it into structured format, and output a summary of what changed since the last run.

C. Batch document processing. Process a folder of PDFs or text files (transcripts, press releases, public records) — extract key facts from each, then produce a combined summary with source references.

D. Weekly content roundup generator. Given a folder of articles or story links, generate a newsletter-style roundup: headline, 2-3 sentence summary, and a categorization tag for each piece.

01 Open Claude Code

Open your terminal and launch a session:

terminal
claude

02 Ask Claude to build the pipeline

Describe your chosen scenario inside the session. Here's an example for option D (weekly roundup), but adapt it to whichever scenario you picked:

claude code
Create a content pipeline in ~/content-pipeline. I need:

1. Three realistic sample input files (fictional local news articles as .txt files)
2. A shell script called run-pipeline.sh that processes each input file through multiple stages: first summarize it in 2-3 sentences, then add a category tag, then format the result as an HTML snippet. Save each stage's output so I can inspect intermediate results. Final outputs go in an output/ folder.
3. Make the script executable, run it on ONE file first, and show me the output before processing the rest.

Watch what Claude does: it creates the folder structure, writes sample inputs, writes the script, and runs a single test. You delegated the whole thing — and you tested small before running large.

03 Ask Claude to explain what it built

In the same session, ask Claude to walk you through the script:

claude code
Walk me through the script line by line. Explain each stage of the pipeline: what goes in, what comes out, and how the stages connect. If this script used an API key, where should it store it — and where should it never store it?

This is the key concept of the module: a multi-stage pipeline breaks a big task into smaller steps (fetch, clean, analyze, format), each producing output that feeds the next. Claude built it; your job is to understand the structure well enough to modify or extend it.

04 Run the full pipeline

Now that you've verified the output on one file, ask Claude to process all inputs and show you the results:

claude code
Run the pipeline on all the input files and show me each final output. Then tell me: what would need to change in the script if I wanted to use Gemini CLI instead of Claude? And what would I need to do if this script required an API key to call an external service?

The script Claude wrote is yours — save it, modify it, run it on real data whenever you need. That's what "reusable workflow" means.

Checkpoint

Self-check: Make sure you can answer these before moving on.

  • What are the stages in the pipeline you built, and how does output from one stage feed into the next?
  • Why should you run a pipeline on one file first before processing an entire folder?
  • Where should API keys go — and where should they never go?
  • What kinds of journalism workflows are well-suited to an automated pipeline like this one?

Resources

  • [DOCS] Bash Reference Manual - Complete bash documentation
  • [COURSE] The Missing Semester: Shell Tools - MIT's guide to shell scripting
  • [REF] Claude Code documentation - Non-interactive mode reference
  • [TOOL] ExplainShell - Paste any shell command to see what each part does

Troubleshooting

"Permission denied" when running the script

You need to make the script executable: chmod +x run-pipeline.sh

"command not found: claude"

Claude Code isn't in your PATH. If you used the native installer (curl -fsSL https://claude.ai/install.sh | sh), restart your terminal. See code.claude.com/docs for troubleshooting steps.

Script produces empty output

Check that the input file exists and isn't empty. Add set -e at the top of your script so it stops at the first failing command, which makes the broken stage easier to spot.

Special characters in filenames cause errors

Always quote variable references in scripts: use "$file" not $file. This handles spaces and special characters correctly.
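A quick way to see why the quoting matters (the directory and file names are just for the demo):

```shell
#!/usr/bin/env bash
set -euo pipefail

mkdir -p demo-files
echo "hello world" > "demo-files/meeting notes.txt"   # name contains a space

for file in demo-files/*.txt; do
  # Quoted: "$file" stays one argument even with a space in the name.
  # Unquoted $file would split into demo-files/meeting and notes.txt.
  words=$(wc -w < "$file" | tr -d ' ')
  echo "ok: $(basename "$file") ($words words)"
done
```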