Module 4: Agents, tools, and data access | Advanced prompt engineering

Learning objectives

> Explain the difference between chatbots and agents -- what tools, autonomy, and multi-step planning look like in practice, and where human oversight is required
> Use claude -p for non-interactive mode, enabling scripted and autonomous workflows that process documents without manual intervention
> Configure a basic MCP server to connect Claude Code to a local knowledge base, giving the AI access to your source documents
> Evaluate the tradeoffs between AI autonomy and editorial control, identifying where grounded, source-attributed responses matter and where things break down

Before you start

Agents and data access introduce a new failure mode: errors that compound across pipeline stages. When Claude delegates work to subagents or pulls data from external sources, mistakes in early steps become assumed facts in later ones. The concept below explains how this works and what to do about it.

Read: working with subagents — compounding errors and verification →

Agents, tools, and data access

Video coming soon

In this video, Joe demonstrates claude -p processing a folder of documents autonomously, walks through MCP configuration, and does a deep dive into Reroute NJ -- a real-world project that applies concepts from all four modules.

Key concepts

Chatbots vs. agents

Chatbot: Responds to prompts using only its training data. Can't take actions, access external information, or modify files. Like talking to someone who can only answer from memory.

Agent: Can use tools, access data sources, and take actions. Breaks complex tasks into steps, decides what tools to use, and executes multi-step plans. Claude Code is an agent -- it can read files, run commands, and make changes to your projects.

Non-interactive mode: claude -p

Running claude -p "your prompt here" sends a single prompt without entering an interactive session. Claude processes the prompt, prints the result to stdout, and exits. This is what makes scripted pipelines and autonomous workflows possible.

Practical uses: a shell script that runs claude -p on every file in a folder, a cron job that checks data daily, or piping output from one command into another. Once you can call Claude from a script, you can build workflows that run without you.
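The folder-processing script described above could be sketched in Python roughly as follows. This is a minimal sketch, not an official recipe: the function names build_command and summarize_folder are hypothetical, it assumes the claude command is on your PATH, and it assumes claude -p accepts piped text on standard input alongside the prompt.

```python
import subprocess
from pathlib import Path


def build_command(prompt: str) -> list[str]:
    """Return the argument list for one non-interactive Claude call."""
    return ["claude", "-p", prompt]


def summarize_folder(folder: Path, prompt: str, run=subprocess.run) -> dict[str, str]:
    """Run the same prompt once per .txt file and collect what each call printed.

    Each file's text is passed on stdin (an assumption about how you want to
    feed documents in). The `run` parameter defaults to subprocess.run and can
    be swapped for a stand-in when testing the loop without calling Claude.
    """
    results = {}
    for path in sorted(folder.glob("*.txt")):
        completed = run(
            build_command(prompt),
            input=path.read_text(encoding="utf-8"),
            capture_output=True,
            text=True,
            check=True,  # raise an error if the command exits with a failure code
        )
        results[path.name] = completed.stdout.strip()
    return results
```

Calling summarize_folder(Path.home() / "research-docs", "Summarize this document in three bullet points.") would return a dictionary keyed by filename. Because check=True raises on a failed call, the script stops at the first broken file rather than silently recording an empty summary.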

Agent delegation and multi-step workflows

Agents don't just answer questions -- they break work into steps and execute each one. Claude Code can read a file, decide what needs to change, make the edit, run a test, and fix problems it finds along the way. Each step informs the next.

This also means agents can delegate. Claude Code can spawn subagents to handle parts of a larger task in parallel. The risk: errors in early steps become assumed facts in later ones. The skill is knowing how to verify intermediate results, not just the final output.
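One way to picture "verify intermediate results" is a pipeline where every stage has its own check, and nothing is handed to the next stage until that check passes. The sketch below is illustrative only: run_pipeline and the Stage tuple are hypothetical names, and the steps are plain Python functions standing in for whatever an agent or subagent actually does.

```python
from typing import Callable

# A stage is (name, step function, check function). All three are
# placeholders for illustration, not part of any Claude Code API.
Stage = tuple[str, Callable[[str], str], Callable[[str], bool]]


def run_pipeline(initial: str, stages: list[Stage]) -> str:
    """Run stages in order, checking each output before passing it on."""
    current = initial
    for name, step, check in stages:
        current = step(current)
        if not check(current):
            # Stop here so a bad intermediate result never becomes
            # an "assumed fact" for the stages that follow.
            raise ValueError(f"stage {name!r} produced output that failed its check")
    return current
```

The design choice is to fail loudly at the first stage whose output looks wrong. A check can be as simple as "the output is not empty" or "every name in the summary also appears in the source text"; the point is that it runs between steps, not only at the end.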

Connecting AI to data sources with MCP

Language models only know what's in their training data. To work with your files, your databases, or live information from the web, the model needs a connection layer. Model Context Protocol (MCP) is an open standard for this -- it lets you plug data sources into Claude the same way you'd add plugins to a browser.

This approach is closely related to retrieval-augmented generation (RAG): the model retrieves relevant information from an external source and uses it to generate a grounded response instead of relying on memory alone. MCP is one practical way to give the model that retrieval step.

Every tool you connect costs context. Each MCP server you configure consumes tokens in your session before you've asked anything -- the server's schema, its available tools, its connection overhead. Before connecting a data source, ask whether the capability it adds justifies what it takes from your context budget. A database connection you rarely need still consumes context on every session where you don't use it. A lean, purposeful set of integrations outperforms a crowded one.

MCP server types

Common MCP servers include:

> Filesystem: Read and write files in specified directories
> Database: Query databases (SQLite, PostgreSQL, etc.)
> Web: Fetch and parse web pages
> Search: Search engines, vector databases for semantic search
> Custom: Any tool or API you want Claude to access

Where data connections break down

Setting up an MCP server is straightforward. Debugging it when things go wrong takes practice. Common failures:

> Permissions: The server can't read the files you pointed it at, or the database user lacks access to the table you need
> Auth failures: API keys expired, tokens revoked, OAuth scopes too narrow -- the connection looks configured but returns nothing useful
> Schema mismatches: The data source changed its column names, field types, or API response format since you set things up
> Silent failures: The server starts, Claude tries to use it, but gets empty results or malformed data -- and keeps going without flagging the problem

The pattern for journalists: when Claude gives you an answer based on external data, check the source. Ask it to show you the raw data it retrieved. If it can't, the connection may be broken and the answer may be fabricated from training data instead.
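Once you have asked for the raw data, a few mechanical checks can catch the silent failures listed above before you read a word of the analysis. The following is a rough sketch that assumes the raw data arrives as a JSON array of records; retrieval_problems and the field names passed to it are hypothetical and should be adapted to your own source.

```python
import json


def retrieval_problems(raw: str, required_fields: set[str]) -> list[str]:
    """Return reasons to distrust a raw JSON payload.

    An empty list means no obvious problem was found, not that the
    data is correct.
    """
    if not raw.strip():
        return ["empty response"]
    try:
        records = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"malformed JSON: {exc.msg}"]
    if not isinstance(records, list) or not records:
        return ["no records returned"]

    problems = []
    for index, record in enumerate(records):
        if not isinstance(record, dict):
            problems.append(f"record {index} is not an object")
            continue
        # A renamed or dropped column shows up here as a missing field.
        missing = required_fields - set(record)
        if missing:
            problems.append(f"record {index} missing fields: {sorted(missing)}")
    return problems
```

For example, retrieval_problems(raw_text, {"id", "status"}) would flag a payload whose records no longer carry a status field, which is the kind of schema mismatch that otherwise passes unnoticed.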

From the field

Reroute NJ

Reroute NJ is a public transit guide built during a real infrastructure disruption in New Jersey -- 11 languages, zero build step, serving a real community. The project applies concepts from all four modules: a CLAUDE.md for project context (Module 1), reusable scripts for page generation and translation (Module 2), a cron-scheduled scraper that commits and pushes automatically (Module 3), and autonomous data connections to external sources like news sites and NJ Transit alerts (Module 4).

Jay Rosen digital archive

The Jay Rosen digital archive is a collection of 765+ records from journalist Jay Rosen's career. The project applies entity extraction and grounded knowledge retrieval to primary source materials, turning raw documents into a searchable knowledge base.

EXERCISE

GOAL: Set up an MCP server that gives Claude controlled access to a data source, then query it to ground Claude's responses in real documents instead of training data.

01 Understand the default file access

Claude Code can already read and write files in your current working directory. MCP extends this by:

> Giving access to specific directories you configure
> Enabling more structured queries (search within files, list by criteria)
> Supporting additional data sources beyond the filesystem

02 Create a research documents folder

Open your terminal, launch Claude Code, and ask it to set up the folder. Choose one of the scenarios below (or make up your own):

terminal

claude

Option A: local government documents

claude code

Create a folder called research-docs in my home directory with three realistic sample documents I can use to test data queries: a city budget summary for fiscal year 2024 with department-level breakdowns, school board meeting minutes from January 2024, and quarterly crime statistics for Q4 2024. Use believable fictional numbers and names throughout.

Option B: scraped web data

claude code

Create a folder called research-docs in my home directory with three sets of realistic sample files simulating scraped web content: a local news site's recent coverage of a zoning dispute (5 articles as separate text files), a city council member's public social media posts from the last month, and an archived community forum thread about a proposed development. Use believable fictional names and details.

Option C: folder of news articles

claude code

Create a folder called research-docs in my home directory with six realistic sample news articles about a fictional county's opioid crisis response, spanning 2023-2024. Include a mix of hard news, a feature profile, an editorial, and a data-driven piece. Each should be a separate text file with a byline, date, and publication name. Use believable fictional details throughout.

Option D: public API dataset

claude code

Create a folder called research-docs in my home directory with realistic sample data files simulating public API output: a JSON file of 311 service requests for a city (50+ entries with categories, dates, locations, and resolution status), a CSV of restaurant health inspection scores, and a JSON file of building permit applications from the last quarter. Use believable fictional data throughout.

03 Configure MCP in Claude Code

Ask Claude to set up the MCP filesystem server configuration. Stay in the same session:

claude code

Set up MCP filesystem server access to my ~/research-docs folder. Register it in Claude Code's own MCP configuration (for example with the claude mcp add command or a project-level .mcp.json file), not in the Claude Desktop file claude_desktop_config.json. Use the actual path to my research-docs directory (not a placeholder). Tell me what you wrote and where so I can verify it.

VERIFY: Check that the path in the config file matches your actual username and home directory. Claude should detect this automatically, but confirm the path before moving on.
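To make the verification step concrete, the sketch below builds the kind of entry a filesystem server configuration typically contains and checks that it points at your real folder. Treat the details as assumptions: the mcpServers layout and the npx -y launch command are the commonly documented shape rather than something this module specifies, and the names filesystem_server_config, config_path_matches, and research-docs are hypothetical. Compare it against what Claude actually wrote rather than treating it as the required format.

```python
import json
from pathlib import Path


def filesystem_server_config(docs_dir: Path, name: str = "research-docs") -> dict:
    """Build a config entry that launches the filesystem server via npx.

    The directory is expanded to an absolute path so the config never
    contains "~" or a placeholder.
    """
    resolved = docs_dir.expanduser().resolve()
    return {
        "mcpServers": {
            name: {
                "command": "npx",
                "args": ["-y", "@modelcontextprotocol/server-filesystem", str(resolved)],
            }
        }
    }


def config_path_matches(config: dict, docs_dir: Path) -> bool:
    """Return True if any configured server lists docs_dir among its arguments."""
    expected = str(docs_dir.expanduser().resolve())
    servers = config.get("mcpServers", {})
    return any(expected in server.get("args", []) for server in servers.values())
```

Printing json.dumps(filesystem_server_config(Path("~/research-docs")), indent=2) shows the entry with your home directory filled in, which you can compare line by line with the file Claude reported writing.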

04 Install the MCP filesystem server

If your configuration launches the server with npx, it is downloaded automatically the first time it runs, but you can pre-install it:

terminal

npm install -g @modelcontextprotocol/server-filesystem

05 Restart Claude Code and verify

Start a new Claude Code session:

terminal

claude

Ask Claude to list what MCP tools are available:

claude code

What MCP servers and tools do you have access to?

You should see the filesystem server listed with access to your research-docs folder.

06 Query your documents

Now query your document collection. The key difference from a normal chat: Claude is pulling answers from your files, not from memory. Try these (adapt to whichever data option you chose):

claude code

Using only the files in my research-docs folder, summarize the key facts. For each fact, tell me which file it came from and quote the relevant passage.

claude code

Search my research documents for any names or organizations that appear in more than one file. What connections can you identify?

claude code

Look at my research documents and identify gaps -- what questions would a reporter still need answered? What data is missing that would make the story stronger?

07 Build a research workflow

Use your document collection to draft a story. This prompt asks Claude to act as a research assistant -- grounding its work in your files, not its training data:

claude code

Using only my research documents, help me outline a story. For each point in the outline, cite the specific file and passage it came from. Flag any claims that would need independent verification. Identify gaps where I'd need additional sources or data.
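Asking for a file and a quoted passage only helps if you check that the quote is really there. A small script can do the first pass for you. This is a sketch under stated assumptions: you have copied Claude's citations into a list of (filename, quote) pairs by hand, and the names normalize and unverified_citations are hypothetical.

```python
from pathlib import Path


def normalize(text: str) -> str:
    """Collapse whitespace and case so line wrapping does not hide a match."""
    return " ".join(text.split()).lower()


def unverified_citations(
    citations: list[tuple[str, str]], docs_dir: Path
) -> list[tuple[str, str]]:
    """Return the (filename, quote) pairs whose quote is not found in the named file."""
    failures = []
    for filename, quote in citations:
        path = docs_dir / filename
        if not quote.strip() or not path.is_file():
            # A blank quote or a file that does not exist cannot be verified.
            failures.append((filename, quote))
            continue
        if normalize(quote) not in normalize(path.read_text(encoding="utf-8")):
            failures.append((filename, quote))
    return failures
```

Anything this function returns deserves a manual look: the quote may have been paraphrased, attributed to the wrong file, or invented. A passing check only confirms the words exist in the file, so the claim built on them still needs editorial judgment.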

Checkpoint

Self-check: Make sure you can answer these before moving on.

? What is the main difference between a chatbot and an AI agent?
? When would you use claude -p instead of an interactive session?
? What does MCP stand for and what problem does it solve for journalists working with local data?
? Name two ways an MCP data connection can fail silently, and how you'd catch the problem.
? Why is it important to ask Claude to cite which file a fact came from when using external data?

Resources

[DOCS] Model Context Protocol - Official MCP documentation and specification
[REF] Claude Code documentation - MCP configuration reference
[REPO] MCP Servers repository - Collection of official and community MCP servers
[ARTICLE] Introducing the Model Context Protocol - Anthropic's announcement explaining MCP

Troubleshooting

MCP server doesn't appear in Claude Code

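Check that your config file is valid JSON (no trailing commas, proper quoting). Claude Code reads its MCP servers from its own configuration (for example a project-level .mcp.json), not from the Claude Desktop file claude_desktop_config.json, so confirm the server was registered in the right place: run claude mcp list to see what Claude Code has loaded. Restart Claude Code after making changes.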
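If you would rather not hunt for a stray comma by eye, Python's built-in JSON parser can report where the file stops being valid. A minimal sketch, with describe_json_error as a hypothetical helper name:

```python
import json
from typing import Optional


def describe_json_error(text: str) -> Optional[str]:
    """Return a short description of the first JSON syntax error, or None if valid."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        # lineno and colno point at where the parser gave up, which is
        # usually at or just after the actual mistake.
        return f"line {exc.lineno}, column {exc.colno}: {exc.msg}"
    return None
```

Pass it the contents of your config file, for example describe_json_error(Path(config_file).read_text()) with config_file set to wherever your configuration lives. A result of None means the file parses; it does not mean the server entry inside it is correct.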

"Permission denied" accessing files

The MCP server can only access directories you've explicitly configured. Check that the path in your config matches the actual directory, and that your user has read permission on those files.
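To rule out ordinary file permissions before digging into the server itself, you can check what your own user account is able to read. The sketch below is a rough first pass with a hypothetical function name, unreadable_paths; it checks the account running the script, which is assumed to be the same account that launches the MCP server.

```python
import os
from pathlib import Path


def unreadable_paths(docs_dir: Path) -> list[Path]:
    """Return the directory itself, or the files under it, that cannot be read."""
    if not docs_dir.is_dir():
        # A missing or mistyped path is the most common cause.
        return [docs_dir]
    if not os.access(docs_dir, os.R_OK | os.X_OK):
        # Listing a directory needs both read and execute permission.
        return [docs_dir]
    return [
        path
        for path in sorted(docs_dir.rglob("*"))
        if path.is_file() and not os.access(path, os.R_OK)
    ]
```

An empty list from unreadable_paths(Path.home() / "research-docs") suggests permissions are fine and the problem is more likely the configured path.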

Server fails to start

Make sure Node.js and npm are installed and in your PATH. Try running the server manually to see error messages: npx @modelcontextprotocol/server-filesystem ~/research-docs

Claude doesn't seem to use the documents

Be explicit in your prompts: "Search my research documents for..." or "Using the files in my research folder...". Claude may not automatically search documents unless asked.
