Living document. Last updated March 2026. I update this when the stack meaningfully changes.
PPC is evolving past ‘if-then’ scripts into engineered agentic systems: data analysis, AI classification, and execution logic operating in a continuous loop. These systems don’t just automate tasks; they handle work that wasn’t feasible before.
In practice, that looks like:
- Classifying thousands of search terms with an LLM, routing them to the right campaign and ad group, and pushing negatives and relevant new keywords directly through the Google Ads API after human approval
- Automating budget distribution and tROAS adjustments using predictive incrementality curves
- Measuring predictive LTV using Offline Conversion Import through Google Ads API
- Forecasting demand and conversion trends with open-source forecasting models like Prophet, statistical approaches, or more advanced ML methods depending on your data
- Rotating campaign and industry-specific assets based on performance signals
- Generating weekly and monthly performance reports at scale using AI agents
- Building interactive agents your team can talk to about data, performance, strategy, and projects
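To make the first bullet concrete, here is a miniature sketch of a classify-and-route loop. The classifier is a stub standing in for the LLM call, and every function name, label, and campaign name is hypothetical:

```python
# Hypothetical sketch: classify search terms, then route each one.
# classify_term() stubs the LLM call; production code would hit an LLM API
# and push approved changes through the Google Ads API.

def classify_term(term: str) -> str:
    """Stub classifier; a real pipeline would call an LLM here."""
    if "free" in term or "cheap" in term:
        return "low_intent"
    return "high_intent"

# Hypothetical routing table: intent label -> destination.
ROUTING = {
    "high_intent": "Brand - Exact",
    "low_intent": "NEGATIVE",  # queue as a negative keyword for human review
}

def route_terms(terms: list[str]) -> dict[str, list[str]]:
    """Group terms by their routed destination."""
    routed: dict[str, list[str]] = {}
    for term in terms:
        target = ROUTING[classify_term(term)]
        routed.setdefault(target, []).append(term)
    return routed
```

The real pipeline adds batching, retries, staging tables, and a human-approval step, but the control flow is the same shape.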
The Stack: BigQuery acts as your data warehouse, while SQL and Python handle the heavy lifting of analysis. LLM APIs provide the “brain” for classification and decision logic, and the Google Ads API serves as the hands for execution. Using Python and the API’s mutate operations, the system pushes changes at a scale no human can hit.
Claude Code is the force multiplier here. It doesn’t just “suggest” snippets; it understands how to architect and operate entire systems. Building systems of this complexity used to require a dedicated DevOps and engineering team. Now you can operate sophisticated agentic architectures solo.
Why Claude Code?
Claude Code is an agent, not a chatbot. It takes a goal, breaks it into steps, and executes them using real tools: your file system, terminal, git history, any CLI tool on your machine. It writes files, queries databases, runs evals, commits changes, and iterates based on actual results.
I run it from the terminal inside Cursor. My project files are visible in the editor while I talk to Claude in the integrated terminal. I can see file changes happening in real time as Claude works. You can also run Claude Code standalone from any terminal, or through the Claude Desktop app which provides a more visual interface.
Note on other Claude interfaces. Claude Desktop is useful for quick conversational work: planning, analysis, writing. You can give it access to a folder on your machine via a Filesystem MCP, which lets it read and write your project files during chat sessions. Claude Cowork is a newer interface designed for document-heavy tasks like reports, research synthesis, and file organization. I don’t use it much since my work is mostly code and data, but it’s relevant if you’re more document-oriented.
The rest of this article focuses on Claude Code, because that’s where the engineering happens.
The Persistent Context & Self-Improvement Loop
This is the core of the system. It’s what turns Claude Code from a generic coding tool into a personal assistant that understands your projects, your data, your code, your strategy, and gets better every session.
CLAUDE.md: The Operating Instructions
Every repository has a CLAUDE.md file at the root. Claude Code reads it automatically at the start of every session. It defines how Claude should work with you: how to plan, how to validate, what to read first, what rules to follow.
Here’s a simplified version of my actual CLAUDE.md:
# CLAUDE.md
## How to Assist
### Step 1: Understand Intent
Read the request. Identify whether this is planning-heavy (use Opus)
or execution-heavy (use Sonnet/Haiku) and suggest the right model.
### Step 2: Find Relevant Documentation
Read docs/index.md to find relevant docs. Only load what's needed.
- Always read a project's GOAL.md before working on it
- For BigQuery, start with bigquery/README.md, then load the schema
- Re-check docs/index.md when the topic shifts mid-conversation
### Step 3: Plan, Then Execute
Present the full plan with all steps. Include a Validation section:
"How will we know this worked?" Wait for confirmation.
Once confirmed, execute autonomously: write code, run queries,
iterate, fix errors. Only pause if something unexpected changes the plan.
### Step 4: Update Your Own Context
After completing work, review what changed and propose doc updates:
- Update anything now out of date
- Remove references to things that no longer exist
- Add documentation for anything new
## Critical Rules
### Self-Improvement Loop
After any correction or mistake, propose a concrete edit to CLAUDE.md
or the relevant doc that prevents the same mistake recurring.
### Verification Before Done
Never mark a step complete without demonstrating the result:
- SQL: expected output columns or sample rows
- Code: expected input/output or test commands
- Data pipeline: row counts or non-null checks
### Architecture
Every Python pipeline gets a Mermaid diagram in the README.
Write code for humans: clear names, comments on non-obvious logic.
Start simple, evolve intentionally.
This is deliberately short, around 70 lines. Every line loads into Claude’s context window on every session, so every line must earn its place. The detailed knowledge lives in the files CLAUDE.md points to.
The Navigation Layer: docs/index.md
The context window is finite. You can’t load every document every session. Instead, you build a navigation layer: an index file that tells Claude what exists and where to find it.
Claude reads this first, then opens only the specific file needed for the current task:
# Documentation Index
## Projects
| Project | Goal File | Overview |
|------------------------|-----------------------------------|-------------------------------------------------------|
| Search Term Pipeline | projects/search-terms/GOAL.md | AI classification, campaign routing, ad group routing |
| Bid Management | projects/bid-management/GOAL.md | tROAS optimization, bid strategy analysis |
| Budget Distribution | projects/budget/GOAL.md | Predictive budget allocation across campaigns |
| Ad Copy Automation | projects/ad-copy/GOAL.md | Ad copy creation, rotation, RSA testing at scale |
| Automated Reporting | projects/reporting/GOAL.md | MBR/WBR report generation from BigQuery |
| Forecasting | projects/forecasting/GOAL.md | Spend and conversion forecasting with backtesting |
## BigQuery & Data
| Document | Location | Use When |
|------------------------|-----------------------------------|-----------------------------------|
| BigQuery Decision Tree | bigquery/README.md | Figuring out which table to query |
| Google Ads Transfer | bigquery/schemas/gads.md | Raw Google Ads entity + stats |
| Performance Metrics | bigquery/schemas/performance.md | Aggregated business metrics |
| Calculated Metrics | bigquery/metrics/formulas.md | CAC, LTV:CAC, ROAS formulas |
| Channel Taxonomy | bigquery/metrics/channels.md | Channel/subchannel/stage filters |
## By Question Type
| "I want to..." | Start Here | Then Read |
|---------------------------------------------------|-------------------------------|------------------------------------|
| Optimize budget allocation using historical ROI | budget/GOAL.md | bigquery/schemas/performance.md |
| Analyze tROAS drift and adjust bid strategies | bid-management/GOAL.md | bigquery/schemas/gads.md |
| Build forecast models with backtesting | forecasting/GOAL.md | bigquery/schemas/performance.md |
| Review pLTV model accuracy and retrain | predictive-ltv/GOAL.md | bigquery/schemas/ltv_predictions.md |
| Run search term classification pipeline | search-terms/GOAL.md | search-terms/classification.md |
| Run eval suite and compare baseline accuracy | search-terms/eval.md | search-terms/gold-set.md |
| Review classification eval stats across runs | search-terms/eval-stats.md | search-terms/eval.md |
| Run prompt ablation testing on gold set | search-terms/prompts/ | search-terms/eval-stats.md |
| Generate MBR with actuals vs targets | reporting/runbook.md | reporting/script-docs.md |
| Evaluate landing page conversion patterns | bigquery/schemas/gads.md | LandingPageStats analysis patterns |
| Query BigQuery with multi-table joins | bigquery/README.md | Specific schema file |
The “By Question Type” section is important. It routes Claude to the right starting point based on what you’re asking, not which project folder it lives in. The more routing context you provide, the faster Claude finds what it needs.
Project Context: GOAL.md
Each project gets a GOAL.md file: what the project is, its current state, what’s been completed, what’s next. Claude reads it before doing any work on that project.
This prevents re-briefing. When you say “let’s continue the search term pipeline,” Claude reads the GOAL.md, sees exactly where you left off, and picks up from there.
The Self-Improvement Loop
This is what makes the system compound over time.
After every correction or mistake, Claude proposes an edit to CLAUDE.md or the relevant documentation file that would prevent the same mistake from happening again. You review the proposed edit, approve or modify it, and it becomes a standing rule.
A few examples:
- Claude uses the wrong BigQuery table for a stats query. New rule: “Always use `p_ads_*` tables with `segments_date` for date-range stats”
- Claude forgets to include rate limiting in a pipeline. Checklist added: “Every pipeline must include batching, rate limiting, retry logic, error handling, logging”
- Claude generates SQL with `SELECT *`. Rule: “Never use `SELECT *`; always specify columns to control costs”
Over months, this accumulates into a comprehensive set of instructions tailored exactly to your workflow. The assistant gets measurably better at your specific work because every friction point becomes a permanent fix.
This is what makes it a real assistant: it remembers what went wrong, fixes its own instructions, and doesn’t make the same mistake twice.
Skills: Reusable Routines
A skill is a Markdown file that describes a repeatable routine. When you say a trigger phrase, Claude loads the skill and follows its instructions. Store them in .claude/skills/, each in its own folder with a SKILL.md file.
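For orientation, here is a hypothetical minimal SKILL.md for a run-eval-style routine. The frontmatter fields, file paths, and steps are illustrative, not my actual file:

```markdown
---
name: run-eval
description: Run the classification eval suite against the gold set
---

# Run Eval

Trigger phrases: "run the eval", "check classification accuracy"

## Steps
1. Run `python pipelines/eval.py --gold-set data/gold_set.csv`
2. Compare overall accuracy against the last recorded baseline
3. Report per-category precision/recall; flag any category that regressed
4. If accuracy dropped, do NOT mark complete — investigate the prompt diff first
```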
Skills I use regularly:
| Skill | What it does |
|---|---|
| run-pipeline | Executes a specific data pipeline (classification, routing, reporting) with production flags. Validates output row counts and non-null checks before marking complete. |
| run-eval | Runs the evaluation framework against a gold set. Compares classification accuracy, reports per-category precision/recall, flags regressions. Used after every prompt change or model swap. |
| weekly-update | Generates a weekly progress summary from git history. Matches commits to Jira tickets by prefix, proposes status transitions. One command for the full sprint review. |
| send-slack-report | Compiles key metrics from BigQuery and sends a formatted summary to a Slack channel via webhook. Scheduled or on-demand. |
| sync-jira | Updates Jira ticket statuses based on recent git commits. Proposes transitions (In Progress, Done, In Review) based on branch names and commit messages. |
| sync-confluence | Pushes local Markdown docs to Confluence via REST API. Shows a dry-run diff first, waits for confirmation, then syncs. Docs stay in git as source of truth. |
| repo-health | Audits the repository: checks every project has a GOAL.md, verifies all index links are valid, flags orphaned docs, detects stale statuses. |
| visualize-flow | Generates Mermaid architecture diagrams for data pipelines. Replaces the visual canvas you’d get from tools like n8n. |
| learning-opportunity | Shifts into teaching mode with three depth levels (conceptual, practical, advanced). Uses your actual code as the example, not textbook abstractions. |
| dashboard-sync | Syncs task state between a visual HTML dashboard and Claude Code via a shared JSON file. Bidirectional: move a task on the dashboard, Claude sees it next session. |
The principle: if you’re re-explaining the same thing to Claude more than twice, write a skill for it. Skills are how you encode operational knowledge into the system.
Skills also cover production maintenance. When a pipeline is live, you need to run it, evaluate its output, compare accuracy across prompt versions, and catch regressions. These are all skills, not ad-hoc commands.
Tooling: Why Claude Code Is Different
The critical differentiator between Claude Code and a chat-based AI is that Claude Code can use tools. Not just generate code, but execute it, read the output, spot problems, fix them, and iterate. This is the difference between getting code suggestions and having an agent that actually operates your systems.
Tools I Use Daily
- **bq CLI**: Query BigQuery directly from the terminal. Claude writes SQL, runs it, reads the results, identifies issues, and iterates until the output is correct.
- **gcloud**: Google Cloud authentication, project switching, service account management.
- **git**: Commits, diffs, branch management, history. The weekly-update skill reads `git log` to generate progress reports.
- **python/pip**: Execute scripts, install packages, run full pipelines. Claude can run a Python classification pipeline end-to-end and read its logs.
- **gh**: GitHub CLI for pull requests, issues, code review.
The Execute, Validate, Iterate Loop
This is where the real power sits. Claude doesn’t just write a query. It runs it, validates the output, and fixes problems autonomously.
Example: “Which search terms are spending over $50 with zero conversions?”
- Claude checks the BigQuery schema docs for the right table and column names
- Writes the SQL query using the correct partitioned table and date filter
- Runs it via `bq query`
- Reads the output, notices 0 rows returned
- Investigates: queries `INFORMATION_SCHEMA` to check the actual column name for conversions
- Discovers the column is `metrics_conversions`, not `conversions`
- Fixes the query, re-runs
- Returns 847 search terms with clean analysis and a recommendation
No copy-pasting between tools. No switching to BigQuery console. No manual debugging of column names. The agent handles the iteration loop that a human would otherwise do manually across multiple browser tabs.
This extends to multi-step analysis. Claude can query one dataset to identify a problem, query a second dataset to find the cause, query a third to validate the hypothesis, all autonomously within a single conversation turn. That’s the kind of analysis workflow that takes a human 45 minutes of tab-switching and becomes a single natural language instruction.
BigQuery is central to everything Claude builds. Whether it’s a Python forecasting script, a budget distribution model, or an ad copy evaluation framework, the data almost always comes from BigQuery. Claude queries it to build the dataset, runs the analysis in Python, and writes results back. The bq CLI is the bridge that makes this seamless.
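That bridge is literally shell commands. As a sketch, an agent-side wrapper around the bq CLI might look like this — the flags shown are standard bq options, but the wrapper functions themselves are hypothetical:

```python
import json
import subprocess

def build_bq_command(sql: str, max_rows: int = 100) -> list[str]:
    """Assemble a bq CLI invocation (standard flags; hypothetical wrapper)."""
    return [
        "bq", "query",
        "--use_legacy_sql=false",  # standard SQL, not legacy
        "--format=json",           # machine-readable output for the agent
        f"--max_rows={max_rows}",
        sql,
    ]

def run_query(sql: str) -> list[dict]:
    """Execute and parse; on failure the agent reads stderr and iterates."""
    result = subprocess.run(
        build_bq_command(sql), capture_output=True, text=True
    )
    if result.returncode != 0:
        # e.g. "Unrecognized name: conversions" -> agent fixes column, retries
        raise RuntimeError(result.stderr)
    return json.loads(result.stdout)
```

The error path is the point: a failed query returns a readable message, and the agent uses it to correct the SQL and re-run.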
Plan Mode: Think Before Building
Type /plan or press Shift+Tab twice. Claude enters read-only research mode. It can explore your codebase, read docs, and query schemas, but cannot write files or execute commands. It’s forced to think before acting.
When to Use It
- Before any build that touches multiple files or systems
- When the architecture has multiple valid approaches
- When you need to understand existing code before modifying it
- Before building any pipeline that involves multiple APIs or data sources
Example: Building a Search Term Classification Pipeline
This is a non-trivial system: BigQuery SQL (input query to fetch unclassified terms), Python batching (split into chunks of 100), Gemini API (classify each batch with structured JSON output), a web search bridge for ambiguous terms, a second AI pass with enriched context, BigQuery writes (insert to staging, merge to production), and an evaluation framework (compare against a gold set for accuracy tracking).
That’s 6-7 systems interacting across three sequential passes. You don’t want Claude to start writing code immediately. You want it to think through the architecture first.
In plan mode:
- Describe the goal and constraints. “Build a pipeline that classifies unclassified search terms using Gemini. Input from BigQuery, output to BigQuery. Must handle 10K+ terms. Must include evaluation.”
- Claude explores. It reads your existing code, checks what utilities already exist (batch processing? Gemini client?), reads the BigQuery schema docs for input/output table structures.
- Claude presents the plan. Entry point script, data flow diagram, API call structure, error handling strategy, validation steps (“How will we know this worked? Run the eval pipeline and check accuracy against the gold set”).
- You review and adjust. “Use the existing Gemini client. Add a `--limit` flag for testing. Skip enrichment for v1.”
- Confirm and execute. Claude builds the full pipeline autonomously. It writes the code, runs it with `--limit 10` to test, checks the BigQuery output, fixes any issues, and presents results.
The discipline is: confirm the plan once, then let Claude run. Don’t micro-manage individual steps. Pause only if something unexpected changes the scope.
Model Routing Strategy
Claude Code supports multiple models that you can switch between mid-session. Not every task needs the most capable (and most expensive) model.
| Task type | Model | Reasoning |
|---|---|---|
| Architecture decisions, project scoping, complex debugging | Opus | Deep reasoning justifies the cost |
| Writing SQL from a clear spec, building scripts, documentation | Sonnet | Clear instructions, clean execution |
| Running pipelines, file edits, formatting, repetitive tasks | Haiku | Fast, reliable, follows instructions precisely |
The target distribution I aim for: ~5% Opus, ~15% Sonnet, ~80% Haiku. Once a plan is confirmed and the spec is clear, Haiku executes it cleanly at a fraction of the cost. Switch back to Opus only when something unexpected requires re-reasoning.
/model opus # Planning and architecture
/model sonnet # Execution requiring judgment
/model haiku # Execution against a clear spec
The general rule: if you’re asking Claude to think, use Opus. If you’re asking Claude to do, use Haiku.
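That rule is easy to encode as a heuristic. A toy router, just to make the mapping explicit — the keyword lists here are made up and would need tuning to your own workflow:

```python
# Illustrative model router: map a task description to a model tier.
# Keyword lists are hypothetical; adjust to your own task vocabulary.

PLANNING_HINTS = ("architecture", "design", "debug", "scope", "why")
JUDGMENT_HINTS = ("write sql", "build script", "document", "refactor")

def pick_model(task: str) -> str:
    t = task.lower()
    if any(hint in t for hint in PLANNING_HINTS):
        return "opus"    # thinking work: deep reasoning justifies the cost
    if any(hint in t for hint in JUDGMENT_HINTS):
        return "sonnet"  # execution requiring judgment
    return "haiku"       # execution against a clear spec
```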
BigQuery: The Data Foundation
This is the longest section in the article because BigQuery is the most powerful part of the stack. Everything starts here. Every pipeline Claude builds, whether it’s a forecasting model, a budget distribution optimizer, or an ad copy evaluator, pulls its data from BigQuery and typically writes results back to it.
Google Ads Data Transfer
The Google Ads Data Transfer gives you the most granular level of Google Ads data available in a queryable format. It transfers daily snapshots of every entity in your account to BigQuery. This is valuable because it contains your complete account structure: every campaign, ad group, keyword, negative, search term, historical ROAS targets, budget settings, cost and conversion stats. It’s everything you need for campaign optimization in a single queryable location, and Claude knows how to query all of it.
Entity tables (dimension data):
- Campaign: name, status, bidding strategy, tROAS/tCPA targets, budget, network settings
- AdGroup: name, status, CPC bid, targeting settings
- Keyword: text, match type, status, bid, quality score
- Ad: type, final URLs, RSA headlines/descriptions, ad strength
- BidGoal: portfolio bid strategy settings and targets
- Audience, Extension, Criterion: all targeting and extension entities
Stats tables (performance metrics):
- CampaignBasicStats: impressions, clicks, cost, CTR, CPC at campaign level
- CampaignConversionStats: conversions, conversion value, ROAS at campaign level
- AdGroupBasicStats / AdGroupConversionStats: same metrics at ad group level
- SearchQueryStats: search term performance (the raw search query report)
- KeywordBasicStats / KeywordConversionStats: keyword-level performance
- LandingPageStats: impressions, clicks, cost per landing page URL per campaign
- BidGoalStats: bid strategy performance metrics
What you get that the Google Ads interface doesn’t provide at this granularity:
- Historical tROAS/tCPA changes: daily snapshots of campaign settings let you track exactly when bid targets changed and correlate with performance shifts
- Landing page performance per campaign: the Ads UI aggregates landing pages; the Data Transfer gives you per-campaign breakdowns
- Search query stats at SQL scale: query across 500K+ terms with custom aggregations, joins, and filters that the UI’s reporting tab can’t do
- Asset-level performance: headline and description stats for RSA ads
- Bid strategy configuration history: portfolio bid strategy settings over time
Two table types exist for each entity:
- **`ads_*` views**: latest snapshot only. Use for dimension lookups. Filter: `WHERE _DATA_DATE = _LATEST_DATE`
- **`p_ads_*` partitioned tables**: full history. Use for date-range analysis. Filter: `WHERE segments_date BETWEEN ... AND ...`

The critical rule: always use `p_ads_*` tables with `segments_date` for any performance stats query over a date range. The `ads_*` views only contain the latest day.
Running this in Google Cloud means the data is already co-located with your compute. You can connect any other API to it: Google Ads API to push changes back, Gemini for AI classification, Python for custom pipelines and statistical modeling. You can build complete PPC management software on this data layer if you invest the time.
Beyond Google Ads: Your Full Data Warehouse
Google Ads Data Transfer is just the starting point. BigQuery is a data warehouse, which means you can connect any data source your business has. LTV and customer lifetime value data from your backend systems. Order data, revenue, margins, returns, subscription metrics. If your data team has it in BigQuery (or you can get it there), Claude can query it.
This is where PPC optimization gets genuinely powerful. You’re no longer optimizing campaigns based on what Google Ads tells you (clicks, conversions, ROAS). You’re optimizing based on what your business actually cares about: real margins, customer quality, predicted lifetime value, return rates. A campaign that looks great on last-click ROAS might be driving low-LTV customers or high-return-rate products. With backend data in BigQuery, you can see that and act on it.
The pattern is the same regardless of data source: document the table schema, add it to your BigQuery decision tree, and Claude starts using it in analysis. Whether that’s Google Ads performance data, backend revenue tables, or product catalog information, it’s all just SQL once it’s in BigQuery.
Schema Self-Discovery
Claude can query INFORMATION_SCHEMA directly to discover table structures before running any analysis. This means you don’t need pre-built documentation for every table. Claude discovers columns, data types, and date ranges on the fly:
-- Discover all tables in a dataset
SELECT table_name, table_type
FROM `your-project.your_dataset.INFORMATION_SCHEMA.TABLES`
ORDER BY table_name;
-- Inspect columns for a specific table
SELECT column_name, data_type
FROM `your-project.your_dataset.INFORMATION_SCHEMA.COLUMNS`
WHERE table_name = 'your_table'
ORDER BY ordinal_position;
-- Check available date ranges before any historical query
SELECT MIN(segments_date) AS earliest, MAX(segments_date) AS latest, COUNT(*) AS rows
FROM `your-project.your_dataset.p_ads_CampaignBasicStats_XXXXXXXXXX`
WHERE segments_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 365 DAY);
Claude runs these discovery queries automatically when it encounters a table it hasn’t seen before. It checks what columns exist, what date ranges are available, and then writes the correct query.
That said, it’s better to document key tables when you use them frequently. Pre-built schema docs with business context make Claude faster and more accurate:
- Schema docs eliminate discovery overhead. Claude knows the columns immediately.
- You can include business context: “this column is in micros, divide by 1,000,000” or “filter by `campaign_status = 'ENABLED'` by default”
- Calculated metrics get their own reference file with canonical SQL formulas
The ideal is both: documented schemas for core tables, self-discovery for everything else.
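The discovery queries above are simple enough to templatize so they can be fired at any dataset. A hypothetical helper (pure string formatting; project and dataset names are placeholders):

```python
def tables_query(project: str, dataset: str) -> str:
    """SQL to list all tables in a dataset via INFORMATION_SCHEMA."""
    return (
        f"SELECT table_name, table_type "
        f"FROM `{project}.{dataset}.INFORMATION_SCHEMA.TABLES` "
        f"ORDER BY table_name"
    )

def columns_query(project: str, dataset: str, table: str) -> str:
    """SQL to inspect columns for one table, in declared order."""
    return (
        f"SELECT column_name, data_type "
        f"FROM `{project}.{dataset}.INFORMATION_SCHEMA.COLUMNS` "
        f"WHERE table_name = '{table}' "
        f"ORDER BY ordinal_position"
    )
```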
The BigQuery Decision Tree (Example Structure)
Here’s an example of how you might structure BigQuery knowledge in your repo. Customize this to match your actual setup, tables, and naming conventions.
bigquery/
├── README.md <- Decision tree: which table for which question
├── schemas/
│ ├── google_ads_transfer.md <- Google Ads entity + stats tables, tROAS changes, budget tracking
│ ├── performance.md <- Aggregated business metrics (revenue, CAC, LTV, ROAS trends)
│ ├── search_term_tables/ <- AI classification pipeline outputs
│ │ ├── classification.md <- Search term classification results
│ │ ├── campaign_routing.md <- Campaign routing decisions
│ │ └── ad_group_routing.md <- Ad group assignment results
│ ├── predictive_models.md <- Forecast outputs, budget allocation recommendations
│ ├── ltv_predictions.md <- Predictive lifetime value models (advanced)
│ └── targets.md <- Monthly budget and revenue targets
└── metrics/
├── calculated-metrics.md <- Every business metric formula
└── channel-breakdown.md <- Channel/subchannel/stage taxonomy
The README.md is a decision tree that routes Claude to the right table. Here’s an example:
| Question type | Table | Schema doc |
|---|---|---|
| Ad platform performance (clicks, cost, conversions) | Google Ads Transfer tables | schemas/google_ads_transfer.md |
| tROAS changes over time | Campaign snapshots (Transfer) | schemas/google_ads_transfer.md |
| Campaign budget changes | Campaign dimension history | schemas/google_ads_transfer.md |
| Backend revenue, CAC, LTV:CAC | Performance metrics table | schemas/performance.md |
| Search term classification results | Pipeline output tables | schemas/search_term_tables/ |
| Forecasting and budget optimization | Predictive models table | schemas/predictive_models.md |
| Predicted customer lifetime value | LTV predictions table | schemas/ltv_predictions.md |
| Landing page evaluation | LandingPageStats (Transfer) | schemas/google_ads_transfer.md |
| Budget vs actual targets | Targets table | schemas/targets.md |
The more context you give your agent about which table to use for which question, the better its first query attempt will be. This is where documentation pays for itself every session.
Calculated Metrics & Channel Taxonomy
Two files handle the business logic layer:
Calculated metrics: a single file that defines every business metric formula with canonical SQL:
## CAC (Customer Acquisition Cost)
Formula: costs / new_paying_users
SQL: SAFE_DIVIDE(SUM(costs), SUM(new_paying_users))
Interpretation: < $100 excellent, $100-200 healthy, > $200 investigate
## LTV:CAC
Formula: lifetime_value / costs
SQL: SAFE_DIVIDE(SUM(lifetime_value), SUM(costs))
Interpretation: < 1.0 losing money, 1.0-3.0 growth mode, > 3.0 healthy
## Conversion Rate
Formula: conversions / clicks
SQL: SAFE_DIVIDE(SUM(conversions), SUM(clicks))
Note: Always aggregate first, then calculate the ratio
Every formula uses SAFE_DIVIDE() to prevent division-by-zero errors. Claude references this file whenever it writes metric calculations, ensuring consistent definitions across every query.
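When the same metrics get recomputed in Python pipelines, it helps to mirror SAFE_DIVIDE’s semantics (NULL instead of an error on a zero denominator) so local checks agree with BigQuery. A minimal sketch; the `cac` helper is illustrative:

```python
from typing import Optional

def safe_divide(numerator: float, denominator: float) -> Optional[float]:
    """Python mirror of BigQuery's SAFE_DIVIDE: None instead of an error on /0."""
    if denominator == 0:
        return None
    return numerator / denominator

def cac(costs: float, new_paying_users: int) -> Optional[float]:
    # Aggregate first, then divide -- same rule as the SQL formulas above.
    return safe_divide(costs, new_paying_users)
```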
Channel taxonomy: defines the hierarchy of channels (paid search non-brand, paid search brand, display, video, paid social), subchannels (by product or toolkit), funnel stages (attract, engage, convert), and teams. Includes default SQL filters and field naming conventions.
This is the “business logic” layer. Without it, Claude would write technically correct SQL that produces meaningless results. With it, Claude understands what “PSNB performance” means, which filters to apply by default, and what metric definitions to use.
bq CLI vs BigQuery MCP
In Claude Code, always use the bq CLI. It costs zero context tokens, works across any GCP project you have access to, and gives you the full CLI feature set.
The BigQuery MCP is designed for Claude Desktop chat sessions. It works, but it costs 15-20K context tokens just to load the tool definitions at session start, and it’s locked to a single GCP project.
| Factor | bq CLI (Claude Code) | BigQuery MCP (Desktop) |
|---|---|---|
| Context cost | 0 tokens | ~15-20K tokens |
| Cross-project | Any project with IAM | Locked to one |
| Flexibility | Full CLI | Limited tool definitions |
Real Query Patterns
Search term stats over a date range:
SELECT
search_term_view_search_term,
SUM(metrics_cost_micros) / 1000000 AS cost,
SUM(metrics_clicks) AS clicks,
SUM(metrics_impressions) AS impressions,
SAFE_DIVIDE(SUM(metrics_conversions), SUM(metrics_clicks)) AS cvr
FROM `your-project.your_dataset.p_ads_SearchQueryStats_XXXXXXXXXX`
WHERE segments_date BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY) AND CURRENT_DATE()
GROUP BY 1
ORDER BY cost DESC
LIMIT 500
Tracking tROAS changes over time (using window functions on daily campaign snapshots):
WITH daily_settings AS (
SELECT
campaign_id, campaign_name,
DATE(_PARTITIONTIME) AS snapshot_date,
campaign_maximize_conversion_value_target_roas AS target_roas
FROM `your-project.your_dataset.p_ads_Campaign_XXXXXXXXXX`
WHERE _PARTITIONTIME >= TIMESTAMP(DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY))
AND campaign_maximize_conversion_value_target_roas IS NOT NULL
AND campaign_maximize_conversion_value_target_roas > 0
),
with_previous AS (
SELECT *,
LAG(target_roas) OVER (PARTITION BY campaign_id ORDER BY snapshot_date) AS prev_roas
FROM daily_settings
)
SELECT
campaign_name, snapshot_date,
prev_roas AS old_troas,
target_roas AS new_troas,
ROUND(target_roas - prev_roas, 2) AS change
FROM with_previous
WHERE prev_roas IS NOT NULL AND target_roas != prev_roas
ORDER BY snapshot_date DESC
Multi-level join (Campaign, Ad Group, Stats):
SELECT
c.campaign_name,
ag.ad_group_name,
SUM(s.metrics_impressions) AS impressions,
SUM(s.metrics_clicks) AS clicks,
SUM(s.metrics_cost_micros) / 1000000 AS cost
FROM `your-project.your_dataset.p_ads_AdGroupBasicStats_XXXXXXXXXX` s
LEFT JOIN (
SELECT ad_group_id, campaign_id, customer_id, ad_group_name
FROM `your-project.your_dataset.ads_AdGroup_XXXXXXXXXX`
WHERE _DATA_DATE = _LATEST_DATE
) ag USING (ad_group_id, campaign_id, customer_id)
LEFT JOIN (
SELECT campaign_id, customer_id, campaign_name
FROM `your-project.your_dataset.ads_Campaign_XXXXXXXXXX`
WHERE _DATA_DATE = _LATEST_DATE
) c USING (campaign_id, customer_id)
WHERE s.segments_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
GROUP BY 1, 2
ORDER BY cost DESC
These patterns compose. Once Claude knows them, it applies them correctly across any analysis.
Agentic Workflows: Connecting LLM APIs to Your Data
This is where the system becomes truly agentic. An agentic workflow is a Python pipeline that connects an LLM API to your data, makes decisions based on that data, and executes actions, either on a schedule or on demand.
The pattern is consistent across every workflow:
- Query BigQuery for the data the workflow needs (search terms, performance stats, budget actuals)
- Send that data to an LLM API (Gemini, Claude, or any model) with a structured prompt that defines the classification or analysis task
- Process the LLM’s response, validate it, write results back to BigQuery
- Execute the output via whatever API is relevant: Google Ads API for account changes, Google Docs API for reports, Slack webhooks for notifications
Once the pipeline is built, you schedule it. A cron job, Cloud Scheduler, or a simple bash script that runs daily. The search term classification pipeline runs every morning, processes new terms that came in overnight, and has recommendations ready for team review by the time you open your laptop. Weekly business reviews generate themselves every Monday. Budget reallocation recommendations refresh daily based on the latest performance data.
The key insight: these are just Python scripts with API calls. There’s no special orchestration framework needed. Claude Code builds them, you test them, and they run. If something breaks, Claude reads the logs and fixes it. If the LLM’s output quality drifts, the eval pipeline catches it and you refine the prompt.
Some workflows I run on a schedule:
- Search term classification: daily, processes new unclassified terms
- Negative keyword automation: weekly, flags high-spend zero-conversion terms for review
- WBR/MBR generation: weekly/monthly, queries actuals vs targets, generates narrative reports
- Budget distribution recommendations: daily, recalculates optimal allocation based on recent performance
- tROAS monitoring: daily, flags campaigns where actual ROAS deviates significantly from target
The more workflows you build, the more your system runs itself. You shift from doing the work to reviewing the work and improving the system that does it.
What You Can Build
The architecture of an autonomous PPC system:
flowchart TB
subgraph Data["Data Layer"]
BQ[(BigQuery\nSearch Query Stats)]
end
subgraph Classification["Stage 1: Classification"]
FETCH[Fetch Unprocessed Terms]
PASS1[Pass 1: AI Classification\nInternal Knowledge Only]
TAVILY[Web Search Bridge\nResolve Ambiguous Terms]
PASS2[Pass 2: Re-classification\nWith Web Context]
STG1[(Staging Table\nAppend-Only Audit Trail)]
PROD1[(Production Table\nMERGE Upsert)]
end
subgraph Routing["Stage 2: Campaign & Ad Group Routing"]
ROUTE[Score Campaign Themes\nPer Product Line]
ASSIGN[Assign to Ad Groups\nWith Deduplication]
STG2[(Routing Tables)]
end
subgraph Review["Stage 3: Team Review Dashboard"]
DASH[Python Dashboard\nFilter by Campaign / Action Type]
APPROVE[Approve / Reject / Override]
end
subgraph Execution["Stage 4: Google Ads API"]
NEG[Add Negatives]
KW[Add Keywords]
BID[Adjust Bids & Budgets]
end
subgraph Feedback["Continuous Improvement"]
CORRECTIONS[Team Corrections]
GOLD[Gold Set Update]
EVAL[Eval Pipeline\nPrecision / Recall / Accuracy]
PROMPT[Prompt Refinement]
end
BQ --> FETCH --> PASS1
PASS1 -->|Known Terms| STG1
PASS1 -->|Unknown Terms| TAVILY --> PASS2 --> STG1
STG1 --> PROD1
PROD1 -->|ADD_KEYWORD| ROUTE --> ASSIGN --> STG2
PROD1 -->|ADD_NEGATIVE| DASH
STG2 --> DASH
DASH --> APPROVE
APPROVE -->|Approved| NEG & KW & BID
APPROVE -->|Rejected| CORRECTIONS
NEG & KW & BID -->|Data Transfer Sync| BQ
CORRECTIONS --> GOLD --> EVAL --> PROMPT -->|Updated Prompts| PASS1
Each layer is code and APIs. Claude Code builds each layer and connects them. Concrete systems I’ve built or am building:
Search term classification: A three-pass pipeline that processes thousands of search terms per run. Pass 1 sends terms to Gemini in batches of 100, classifying each term across multiple product dimensions with scores (1-10), flagging competitors and branded terms, and recommending an action: ADD_KEYWORD, ADD_NEGATIVE, or TO_REVIEW. Terms the AI can’t confidently classify get flagged and sent through a web search bridge (Tavily) to resolve ambiguity. Pass 2 re-classifies those enriched terms with the web context included. All results flow through a staging table (append-only audit trail) before merging to production. This three-pass architecture handles the long tail of ambiguous terms that a single-pass system would misclassify.
Campaign routing: Takes classified terms marked as ADD_KEYWORD and routes them to specific campaigns based on product theme scoring. For a SaaS company with multiple product lines, Gemini scores each term against every campaign theme (e.g., 11 product-specific campaigns). The highest-scoring theme becomes the routing destination. Same three-pass pattern: main classification, web search for ambiguous terms, re-classification with context. Deduplication uses a token-sorting function that treats “seo tools” and “tools seo” as identical, preventing duplicate keyword additions across campaigns.
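The token-sorting deduplication is simple enough to show in full. A sketch (function names are illustrative):

```python
def dedup_key(term: str) -> str:
    # Order-insensitive key: "seo tools" and "tools seo" collapse to the same key
    return " ".join(sorted(term.lower().split()))

def dedup_terms(terms: list[str]) -> list[str]:
    seen: set[str] = set()
    unique: list[str] = []
    for term in terms:
        key = dedup_key(term)
        if key not in seen:
            seen.add(key)
            unique.append(term)  # keep the first spelling encountered
    return unique
```

Running the key against existing keywords before an upload is what prevents the same intent landing in two campaigns under different word orders.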
Budget distribution optimization: Analyzes historical performance data across campaigns to identify where marginal spend produces the best returns. Combines with predictive modeling (Prophet forecasts or regression models) to recommend budget reallocation across the portfolio. For advanced setups: use predictive LTV models to optimize budget based on predicted customer profitability, not just immediate ROAS. BigQuery provides the historical data, Python runs the optimization, Google Ads API pushes new daily budgets.
tROAS monitoring and optimization: Tracks tROAS changes across all campaigns using the daily snapshot pattern shown above. Correlates target changes with performance shifts. Feeds into a model that predicts optimal tROAS targets based on historical cost/conversion curves. Advanced: combine with predictive LTV modeling to set tROAS targets based on predicted customer profitability instead of immediate ROAS.
Ad copy automation and rotation: Generates ad copy variations using AI, manages RSA headline/description testing at scale, evaluates asset-level performance from BigQuery Data Transfer tables to identify top performers and retire underperformers.
Landing page evaluation: Uses LandingPageStats from the Data Transfer to analyze per-campaign landing page performance. Identifies pages with high traffic and low conversion rates, surfaces A/B testing opportunities.
Automated reporting (MBR/WBR): Queries BigQuery for actuals vs targets, generates visualizations and charts, uses Gemini to write narrative analysis and storyline, auto-publishes to Google Docs or Slides with formatted charts and executive summary. One command regenerates the full report each month. No manual chart creation, no copy-pasting metrics, no writing summaries — the pipeline handles the entire storytelling layer.
Forecasting: 90-day spend and conversion forecasts using Prophet or statistical models on historical BigQuery data. Includes automated backtesting to validate model quality before publishing predictions.
Predictive lifetime value modeling (advanced): Build models that predict customer lifetime value based on first-touch or cohort attributes. Score incoming traffic, then upload predicted LTV values back to Google Ads via the Offline Conversion Import API. This enables bid optimization on predicted profitability, not just last-click ROAS. Claude Code handles the model building, validation, and API integration.
Interactive team agents: Build conversational agents your team can talk to about data, performance, strategy, and projects. Connect an LLM to your BigQuery warehouse and internal documentation, and your team gets a natural language interface to ask questions, pull metrics, and explore trends without writing SQL or waiting for an analyst. The agent handles the query generation, runs it against your data, and returns formatted answers with context.
Negative keyword automation: Identifies search terms with high spend and zero conversions, filters against existing negative lists, pushes new negatives via Google Ads API with campaign-level targeting. Match type selection matters: exact match for branded terms (protect brand campaigns), phrase match for irrelevant competitors (block all variations), exact match for high-risk terms that contain core business words.
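The match type rules above reduce to a small decision function. This is a hedged sketch of that logic — the word sets and the fallback to phrase match are assumptions, not the article's exact rules:

```python
def negative_match_type(
    term: str,
    brand_words: set[str],       # your own brand terms
    competitor_words: set[str],  # irrelevant competitor names
    core_words: set[str],        # high-risk core business words
) -> str:
    tokens = set(term.lower().split())
    if tokens & brand_words:
        return "EXACT"   # protect brand campaigns: block only this exact query
    if tokens & competitor_words:
        return "PHRASE"  # block every variation containing the competitor name
    if tokens & core_words:
        return "EXACT"   # term contains core business words: block surgically
    return "PHRASE"      # assumed default for clearly irrelevant terms
```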
The Team Review Dashboard
This is where the human stays in the loop. A Python dashboard (Streamlit, Retool, or custom HTML) pulls recommendations directly from BigQuery and presents them in a filterable interface:
- Keyword additions grouped by campaign theme, with AI confidence scores and justification text visible per term
- Negative keywords grouped by campaign, with match type recommendations and 90-day spend data for prioritization
- Approve / Reject / Override controls per recommendation, with bulk actions for efficiency
Approved recommendations push directly to the Google Ads API. No copy-pasting into Google Ads Editor, no spreadsheet intermediaries. The team reviews AI decisions in context, clicks approve, and the system executes.
This is the critical bridge between full automation and human judgment. The AI handles the analysis at scale. The team validates the decisions that matter. Over time, as accuracy improves and trust builds, the approval threshold can shift, moving more decisions to auto-approve while keeping edge cases in the review queue.
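A shifting approval threshold can be as simple as a per-action confidence map. A sketch under assumptions (the function and the threshold scale of 1–10 are illustrative, not from the source):

```python
def route_recommendation(rec: dict, auto_approve: dict[str, int]) -> str:
    """Auto-approve only when this action type has earned trust and the
    AI's confidence clears that action's threshold; everything else queues."""
    threshold = auto_approve.get(rec["action"])
    if threshold is not None and rec.get("confidence", 0) >= threshold:
        return "auto_approved"
    return "review_queue"
```

Raising or lowering a single number per action type is how "more decisions move to auto-approve" becomes an operational dial rather than a rewrite.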
The Self-Improving Feedback Loop
This is what makes the system genuinely autonomous over time, not just automated.
When a team member rejects a recommendation, that correction feeds back into the system:
- Rejected recommendations flow into the gold set. The gold set is a curated dataset of 1,000+ human-labeled terms used to benchmark AI accuracy. Every team correction makes the benchmark more representative of real-world edge cases.
- The eval pipeline re-runs against the updated gold set. It measures action accuracy, competitor detection accuracy, and per-category precision/recall. Any regression surfaces immediately.
- AI detects patterns in rejections. If the team consistently overrides a specific type of classification, like misrouting competitor terms or scoring a product category too high, the pattern becomes visible in the eval stats.
- Prompts get refined based on eval results. The classification prompt is the core lever. A single line change (“treat Google products as platforms, not competitors”) can shift accuracy by 2-3 percentage points across thousands of terms.
- New eval run validates the prompt change. No change ships to production without improving the benchmark. If accuracy drops, the change gets reverted.
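The eval step above — action accuracy plus per-category precision/recall against the gold set — can be sketched in a few lines (the function name and dict shapes are assumptions for illustration):

```python
def eval_actions(predictions: dict[str, str], gold: dict[str, str]) -> dict:
    """Score predicted actions (term -> action) against the human-labeled gold set."""
    actions = set(gold.values()) | set(predictions.values())
    correct = sum(1 for term, action in gold.items() if predictions.get(term) == action)
    report = {"accuracy": correct / len(gold), "per_action": {}}
    for action in actions:
        tp = sum(1 for t, a in gold.items() if a == action and predictions.get(t) == action)
        fp = sum(1 for t, a in predictions.items() if a == action and gold.get(t) != action)
        fn = sum(1 for t, a in gold.items() if a == action and predictions.get(t) != action)
        report["per_action"][action] = {
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            "recall": tp / (tp + fn) if tp + fn else 0.0,
        }
    return report
```

Run this after every prompt change and diff the report against the previous run; a drop in any per-action number is the regression signal.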
This loop runs continuously. The system doesn’t just process search terms, it learns from every human correction and gets measurably better at the specific decisions your team cares about. After 30+ eval runs, a well-tuned pipeline hits 94%+ accuracy on action recommendations and 96%+ on competitor detection.
This is the future of PPC management. Not “set it and forget it” automation, but a system where AI handles the volume, humans handle the judgment calls, and every interaction between the two makes the system smarter. The teams that build this infrastructure now will operate at a scale and precision that manual workflows cannot match.
Evaluations
In a production engineering environment, a single “Gold Set” isn’t enough. We treat AI evaluations as a multi-layered validation stack to ensure every automated decision is technically sound and economically viable:
- Static Evals (The Gold Set): Automated unit tests against human-labeled ground truth to track precision and recall over time.
- Semantic Similarity: Using embeddings to measure how closely an AI’s reasoning aligns with our target outputs, catching nuances that binary checks miss.
- Model-as-a-Judge: Leveraging high-reasoning models to programmatically audit the logic of our high-speed production models.
- Business Logic Guardrails: Hard-coded “circuit breakers” that validate AI decisions against strict financial constraints—like max CPA or net margin floors—before they hit the Google Ads API.
If a change doesn’t improve the benchmark or clear the guardrails, it doesn’t ship.
The Staging-to-Production Pattern
Every pipeline uses the same data flow pattern: raw AI results INSERT into a staging table (append-only, never deleted), then MERGE into a production table (deduplicated, progressively enriched).
Why this matters: a three-pass pipeline writes partial results after each pass. Pass 1 writes classification scores. The web search bridge adds definitions. Pass 2 updates the remaining fields. The MERGE uses COALESCE(source.field, target.field) so each pass adds data without overwriting what previous passes already filled in. Re-runs are safe because MERGE is idempotent. And the staging table keeps a permanent audit trail of every AI decision, which matters when you’re debugging why a term was classified a certain way three months later.
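One way to see the COALESCE pattern concretely is a statement builder. A sketch with assumed names — it also assumes the source has already been deduplicated to one row per key (e.g., the latest staging row per term), which a real MERGE requires:

```python
def build_enriching_merge(
    staging: str, production: str, keys: list[str], fields: list[str]
) -> str:
    """BigQuery MERGE where each pass fills fields it knows and
    COALESCE preserves values earlier passes already wrote."""
    on = " AND ".join(f"T.{k} = S.{k}" for k in keys)
    sets = ", ".join(f"T.{f} = COALESCE(S.{f}, T.{f})" for f in fields)
    cols = ", ".join(keys + fields)
    vals = ", ".join(f"S.{c}" for c in keys + fields)
    return (
        f"MERGE `{production}` T USING `{staging}` S ON {on} "
        f"WHEN MATCHED THEN UPDATE SET {sets} "
        f"WHEN NOT MATCHED THEN INSERT ({cols}) VALUES ({vals})"
    )
```

Because a NULL in the source never overwrites a non-NULL in the target, re-running any pass is a no-op for fields it doesn't know about — which is exactly what makes the pipeline idempotent.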
Production Pipeline Requirements
Every pipeline ships with these layers around the core logic:
- Batching: split large inputs into chunks (e.g., 50 terms per API call)
- Rate limiting: delays between API calls to stay within quotas
- Retry logic: exponential backoff on transient failures
- Error handling: catch and log individual failures without crashing the full run
- Logging: progress output so you can diagnose a run after the fact
These are the things visual workflow tools like n8n provide inside each node automatically. In Python, they’re explicit. Claude knows to include them because the production checklist is encoded in the architecture rules.
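The checklist above often collapses into one reusable wrapper. A sketch with illustrative names and defaults:

```python
import logging
import random
import time
from typing import Callable

def run_in_batches(
    items: list,
    process_batch: Callable[[list], list],
    batch_size: int = 50,
    delay_s: float = 1.0,
    max_retries: int = 3,
) -> list:
    """Batching + rate limiting + exponential backoff + per-batch error isolation."""
    results: list = []
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        for attempt in range(max_retries):
            try:
                results.extend(process_batch(batch))
                break
            except Exception as exc:  # one bad batch must not crash the full run
                wait = 2 ** attempt + random.random()
                logging.warning("batch %d failed (%s); retrying in %.1fs",
                                start // batch_size, exc, wait)
                time.sleep(wait)
        else:
            logging.error("batch %d skipped after %d retries", start // batch_size, max_retries)
        time.sleep(delay_s)  # stay inside API quotas
    return results
```

Every pipeline then calls `run_in_batches(terms, classify_batch)` instead of re-implementing the scaffolding, and the logging gives you the after-the-fact diagnosis trail.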
Google Ads API: Closing the Loop
The Google Ads API is the execution layer. Once your pipelines are producing decisions in BigQuery, the API is how those decisions become platform actions. Don’t use the API to read data. That’s what BigQuery Data Transfer is for. Use the API to write changes and manage accounts programmatically.
Credentials live in .env (never committed to git):
GOOGLE_ADS_DEVELOPER_TOKEN=your_token
GOOGLE_ADS_CLIENT_ID=your_client_id
GOOGLE_ADS_CLIENT_SECRET=your_secret
GOOGLE_ADS_REFRESH_TOKEN=your_refresh_token
GOOGLE_ADS_LOGIN_CUSTOMER_ID=your_mcc_id
What the API Enables (Autonomous Execution)
The API uses GAQL (Google Ads Query Language) for reads and mutate operations for writes. For true autonomous systems, the mutate operations are what close the loop:

Add negative keywords — Flag a search term as unprofitable in BigQuery, execute a mutation via the API to add it to a negative keyword list in the campaign automatically.
from google.ads.googleads.client import GoogleAdsClient

client = GoogleAdsClient.load_from_env()
service = client.get_service("SharedCriterionService")
customer_id = "your-customer-id"

# Add a negative keyword to a shared negative keyword list
# (the list is attached to campaigns separately, via CampaignSharedSet)
operation = client.get_type("SharedCriterionOperation")
shared_criterion = operation.create
shared_criterion.shared_set = client.get_service("SharedSetService").shared_set_path(
    customer_id, "your-shared-set-id"
)
shared_criterion.keyword.text = "unprofitable search term"
shared_criterion.keyword.match_type = client.enums.KeywordMatchTypeEnum.BROAD
response = service.mutate_shared_criteria(
    customer_id=customer_id,
    operations=[operation],
)
Add keywords — Classify a search term as high-intent, upload it as a keyword to the right campaign and ad group via the API.
Adjust bids — Update keyword bids, ad group bids, or campaign budgets based on performance data from BigQuery.
Modify ROAS targets — Update portfolio bid strategy target ROAS based on forecasted performance or profitability curves. Advanced: update targets based on predicted lifetime value models to optimize for customer profitability, not just immediate conversion value.
Pause/enable entities — Automatically pause underperforming ad groups or enable new campaigns based on threshold logic.
Update budgets — Reallocate daily budgets across campaigns based on predictive models and performance forecasts. Can also use the Offline Conversion Import API to upload predicted LTV values as conversion values, enabling tROAS bidding based on predicted customer profitability.
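For the budget reallocation itself, here is a deliberately naive sketch — proportional to recent ROAS with a per-campaign floor. The article's approach uses marginal-return curves and predictive models, so treat this as a placeholder for that logic, with invented names throughout:

```python
def reallocate_budgets(
    total_daily: float,
    roas: dict[str, float],       # campaign -> recent ROAS
    floor_share: float = 0.05,    # minimum share per campaign, to keep learning data flowing
) -> dict[str, float]:
    floor = total_daily * floor_share
    flexible = total_daily - floor * len(roas)
    total_roas = sum(roas.values())
    if total_roas == 0:  # no signal: split the flexible pool evenly
        return {c: round(floor + flexible / len(roas), 2) for c in roas}
    return {c: round(floor + flexible * (r / total_roas), 2) for c, r in roas.items()}
```

The output maps directly onto campaign budget mutations: compute in Python, convert to micros, push via the API.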
The Architecture
BigQuery (historical data + analysis) -> Python (decisions + mutations) -> Google Ads API (live account changes)
The flow:
- BigQuery queries identify a problem (high-spend, zero-conversion search terms)
- Python script prepares mutations (API calls to add negatives, update bids, etc.)
- Google Ads API executes those mutations on the live account
- Data Transfer syncs the updated account state back to BigQuery
- Next run, the analysis includes the results of the previous execution
Claude handles the API mechanics: GAQL syntax, mutation structure, error handling for partial failures. You define the business rules: what performance threshold triggers an action, what budget reallocation formula to use, when to pause entities. This is what makes it truly autonomous.
The Learning Layer
Building these systems compounds in two directions: you ship better automation, and you develop genuine engineering knowledge. These aren’t separate.
The mechanism: Claude builds something, it works, you ask it to explain what it just built. Claude uses your actual code and your actual data as the example, not a textbook exercise. That’s when the understanding sticks.
The learning-opportunity skill makes this intentional. When you want to understand what was built: “Explain this to me.” Claude gives three depth levels: conceptual overview, practical mechanics, advanced internals. You pick the one that’s useful right now.
The progression over time:
- Early: Run pipelines, ask “what did this just do?”
- Weeks in: Modify code Claude wrote. Adjust parameters, add filters, change logic.
- Months in: Describe new system requirements, Claude builds, you review and direct the architecture.
- Later: You design the system, Claude implements, you own the architecture decisions.
The coding knowledge accumulates as a side effect of solving real problems. You were never studying Python in isolation. You were building a search term classifier, and Python was the implementation language.
Principles
- Plan with gates, execute freely. Confirm the architecture once, then let it run.
- CLI over MCP in Claude Code. Zero context overhead.
- Route models by task type. Opus for reasoning, Haiku for execution.
- Docs are the memory. Schema docs + metric definitions + GOAL.md files = correct output on the first attempt.
- Self-improving loop. Every mistake becomes a standing rule.
- Evaluate everything. Gold sets, accuracy tracking, regression detection. You can’t improve what you don’t measure.
- The system is the asset. Not any individual script, but the connected pipeline that runs, evaluates, and improves.
Last updated March 2026.