Living document. Last updated March 2026. I update this when the stack meaningfully changes.
PPC is evolving past ‘if-then’ scripts into engineered agentic systems: data analysis, AI classification, and execution logic operating in a continuous loop. These systems don’t just automate tasks; they handle work that wasn’t feasible before.
In practice, that looks like:
- Classifying thousands of search terms with an LLM, routing them to the right campaign and ad group, and pushing negatives and relevant new keywords directly through the Google Ads API after human approval
- Automating budget distribution and tROAS adjustments using predictive incrementality curves
- Measuring predictive LTV using Offline Conversion Import through Google Ads API
- Forecasting demand and conversion trends with open-source forecasting models like Prophet, statistical approaches, or more advanced ML methods depending on your data
- Rotating campaign and industry-specific assets based on performance signals
- Generating weekly and monthly performance reports at scale using AI agents
- Building interactive agents your team can talk to about data, performance, strategy, and projects
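To make the first bullet concrete, here is a miniature sketch of a classify-and-route loop. The classifier is a stub standing in for the LLM call, and every function name, label, and campaign name is hypothetical:

```python
# Hypothetical sketch: classify search terms, then route each one.
# classify_term() stubs the LLM call; production code would hit an LLM API
# and push approved changes through the Google Ads API.

def classify_term(term: str) -> str:
    """Stub classifier; a real pipeline would call an LLM here."""
    if "free" in term or "cheap" in term:
        return "low_intent"
    return "high_intent"

# Hypothetical routing table: intent label -> destination.
ROUTING = {
    "high_intent": "Brand - Exact",
    "low_intent": "NEGATIVE",  # queue as a negative keyword for human review
}

def route_terms(terms: list[str]) -> dict[str, list[str]]:
    """Group terms by their routed destination."""
    routed: dict[str, list[str]] = {}
    for term in terms:
        target = ROUTING[classify_term(term)]
        routed.setdefault(target, []).append(term)
    return routed
```

The real pipeline adds batching, retries, staging tables, and a human-approval step, but the control flow is the same shape.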
The Stack: BigQuery acts as your data warehouse, while SQL and Python handle the heavy lifting of analysis. LLM APIs provide the “brain” for classification and decision logic, and the Google Ads API serves as the hands for execution. Using Python and the API’s mutate operations, the system pushes changes at a scale no human can hit.
Claude Code is the force multiplier here. It doesn’t just “suggest” snippets; it understands how to architect and operate entire systems. Building systems of this complexity used to require a dedicated DevOps and engineering team. Now you can operate sophisticated agentic architectures solo.
Why Claude Code?
Claude Code is an agent, not a chatbot. It takes a goal, breaks it into steps, and executes them using real tools: your file system, terminal, git history, any CLI tool on your machine. It writes files, queries databases, runs evals, commits changes, and iterates based on actual results.
I run it from the terminal inside Cursor. My project files are visible in the editor while I talk to Claude in the integrated terminal. I can see file changes happening in real time as Claude works. You can also run Claude Code standalone from any terminal, or through the Claude Desktop app which provides a more visual interface.
Note on other Claude interfaces. Claude Desktop is useful for quick conversational work: planning, analysis, writing. You can give it access to a folder on your machine via a Filesystem MCP, which lets it read and write your project files during chat sessions. Claude Cowork is a newer interface designed for document-heavy tasks like reports, research synthesis, and file organization. I don’t use it much since my work is mostly code and data, but it’s relevant if you’re more document-oriented.
The rest of this article focuses on Claude Code, because that’s where the engineering happens.
The Persistent Context & Self-Improvement Loop
This is the core of the system. It’s what turns Claude Code from a generic coding tool into a personal assistant that understands your projects, your data, your code, your strategy, and gets better every session.
CLAUDE.md: The Operating Instructions
Every repository has a CLAUDE.md file at the root. Claude Code reads it automatically at the start of every session. It defines how Claude should work with you: how to plan, how to validate, what to read first, what rules to follow.
Here’s a simplified version of my actual CLAUDE.md:
# CLAUDE.md
## How to Assist
### Step 1: Understand Intent
Read the request. Identify whether this is planning-heavy (use Opus)
or execution-heavy (use Sonnet/Haiku) and suggest the right model.
### Step 2: Find Relevant Documentation
Read docs/index.md to find relevant docs. Only load what's needed.
- Always read a project's GOAL.md before working on it
- For BigQuery, start with bigquery/README.md, then load the schema
- Re-check docs/index.md when the topic shifts mid-conversation
### Step 3: Plan, Then Execute
Present the full plan with all steps. Include a Validation section:
"How will we know this worked?" Wait for confirmation.
Once confirmed, execute autonomously: write code, run queries,
iterate, fix errors. Only pause if something unexpected changes the plan.
### Step 4: Update Your Own Context
After completing work, review what changed and propose doc updates:
- Update anything now out of date
- Remove references to things that no longer exist
- Add documentation for anything new
## Critical Rules
### Self-Improvement Loop
After any correction or mistake, propose a concrete edit to CLAUDE.md
or the relevant doc that prevents the same mistake recurring.
### Verification Before Done
Never mark a step complete without demonstrating the result:
- SQL: expected output columns or sample rows
- Code: expected input/output or test commands
- Data pipeline: row counts or non-null checks
### Architecture
Every Python pipeline gets a Mermaid diagram in the README.
Write code for humans: clear names, comments on non-obvious logic.
Start simple, evolve intentionally.
This is deliberately short, around 70 lines. Every line loads into Claude’s context window on every session, so every line must earn its place. The detailed knowledge lives in the files CLAUDE.md points to.
The Navigation Layer: docs/index.md
The context window is finite. You can’t load every document every session. Instead, you build a navigation layer: an index file that tells Claude what exists and where to find it.
Claude reads this first, then opens only the specific file needed for the current task:
# Documentation Index
## Projects
| Project | Goal File | Overview |
|------------------------|-----------------------------------|-------------------------------------------------------|
| Search Term Pipeline | projects/search-terms/GOAL.md | AI classification, campaign routing, ad group routing |
| Bid Management | projects/bid-management/GOAL.md | tROAS optimization, bid strategy analysis |
| Budget Distribution | projects/budget/GOAL.md | Predictive budget allocation across campaigns |
| Ad Copy Automation | projects/ad-copy/GOAL.md | Ad copy creation, rotation, RSA testing at scale |
| Automated Reporting | projects/reporting/GOAL.md | MBR/WBR report generation from BigQuery |
| Forecasting | projects/forecasting/GOAL.md | Spend and conversion forecasting with backtesting |
## BigQuery & Data
| Document | Location | Use When |
|------------------------|-----------------------------------|-----------------------------------|
| BigQuery Decision Tree | bigquery/README.md | Figuring out which table to query |
| Google Ads Transfer | bigquery/schemas/gads.md | Raw Google Ads entity + stats |
| Performance Metrics | bigquery/schemas/performance.md | Aggregated business metrics |
| Calculated Metrics | bigquery/metrics/formulas.md | CAC, LTV:CAC, ROAS formulas |
| Channel Taxonomy | bigquery/metrics/channels.md | Channel/subchannel/stage filters |
## By Question Type
| "I want to..." | Start Here | Then Read |
|---------------------------------------------------|-------------------------------|------------------------------------|
| Optimize budget allocation using historical ROI | budget/GOAL.md | bigquery/schemas/performance.md |
| Analyze tROAS drift and adjust bid strategies | bid-management/GOAL.md | bigquery/schemas/gads.md |
| Build forecast models with backtesting | forecasting/GOAL.md | bigquery/schemas/performance.md |
| Review pLTV model accuracy and retrain | predictive-ltv/GOAL.md | bigquery/schemas/ltv_predictions.md |
| Run search term classification pipeline | search-terms/GOAL.md | search-terms/classification.md |
| Run eval suite and compare baseline accuracy | search-terms/eval.md | search-terms/gold-set.md |
| Review classification eval stats across runs | search-terms/eval-stats.md | search-terms/eval.md |
| Run prompt ablation testing on gold set | search-terms/prompts/ | search-terms/eval-stats.md |
| Generate MBR with actuals vs targets | reporting/runbook.md | reporting/script-docs.md |
| Evaluate landing page conversion patterns | bigquery/schemas/gads.md | LandingPageStats analysis patterns |
| Query BigQuery with multi-table joins | bigquery/README.md | Specific schema file |
The “By Question Type” section is important. It routes Claude to the right starting point based on what you’re asking, not which project folder it lives in. The more routing context you provide, the faster Claude finds what it needs.
Project Context: GOAL.md
Each project gets a GOAL.md file: what the project is, its current state, what’s been completed, what’s next. Claude reads it before doing any work on that project.
This prevents re-briefing. When you say “let’s continue the search term pipeline,” Claude reads the GOAL.md, sees exactly where you left off, and picks up from there.
The Self-Improvement Loop
This is what makes the system compound over time.
After every correction or mistake, Claude proposes an edit to CLAUDE.md or the relevant documentation file that would prevent the same mistake from happening again. You review the proposed edit, approve or modify it, and it becomes a standing rule.
A few examples:
- Claude uses the wrong BigQuery table for a stats query. New rule: “Always use `p_ads_*` tables with `segments_date` for date-range stats”
- Claude forgets to include rate limiting in a pipeline. Checklist added: “Every pipeline must include batching, rate limiting, retry logic, error handling, logging”
- Claude generates SQL with `SELECT *`. Rule: “Never use `SELECT *`; always specify columns to control costs”
Over months, this accumulates into a comprehensive set of instructions tailored exactly to your workflow. The assistant gets measurably better at your specific work because every friction point becomes a permanent fix.
This is what makes it a real assistant: it remembers what went wrong, fixes its own instructions, and doesn’t make the same mistake twice.
Skills: Reusable Routines
A skill is a Markdown file that describes a repeatable routine. When you say a trigger phrase, Claude loads the skill and follows its instructions. Store them in .claude/skills/, each in its own folder with a SKILL.md file.
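For orientation, here is a hypothetical minimal SKILL.md for a run-eval-style routine. The frontmatter fields, file paths, and steps are illustrative, not my actual file:

```markdown
---
name: run-eval
description: Run the classification eval suite against the gold set
---

# Run Eval

Trigger phrases: "run the eval", "check classification accuracy"

## Steps
1. Run `python pipelines/eval.py --gold-set data/gold_set.csv`
2. Compare overall accuracy against the last recorded baseline
3. Report per-category precision/recall; flag any category that regressed
4. If accuracy dropped, do NOT mark complete — investigate the prompt diff first
```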
Skills I use regularly:
| Skill | What it does |
|---|---|
| run-pipeline | Executes a specific data pipeline (classification, routing, reporting) with production flags. Validates output row counts and non-null checks before marking complete. |
| run-eval | Runs the evaluation framework against a gold set. Compares classification accuracy, reports per-category precision/recall, flags regressions. Used after every prompt change or model swap. |
| weekly-update | Generates a weekly progress summary from git history. Matches commits to Jira tickets by prefix, proposes status transitions. One command for the full sprint review. |
| send-slack-report | Compiles key metrics from BigQuery and sends a formatted summary to a Slack channel via webhook. Scheduled or on-demand. |
| sync-jira | Updates Jira ticket statuses based on recent git commits. Proposes transitions (In Progress, Done, In Review) based on branch names and commit messages. |
| sync-confluence | Pushes local Markdown docs to Confluence via REST API. Shows a dry-run diff first, waits for confirmation, then syncs. Docs stay in git as source of truth. |
| repo-health | Audits the repository: checks every project has a GOAL.md, verifies all index links are valid, flags orphaned docs, detects stale statuses. |
| visualize-flow | Generates Mermaid architecture diagrams for data pipelines. Replaces the visual canvas you’d get from tools like n8n. |
| learning-opportunity | Shifts into teaching mode with three depth levels (conceptual, practical, advanced). Uses your actual code as the example, not textbook abstractions. |
| dashboard-sync | Syncs task state between a visual HTML dashboard and Claude Code via a shared JSON file. Bidirectional: move a task on the dashboard, Claude sees it next session. |
The principle: if you’re re-explaining the same thing to Claude more than twice, write a skill for it. Skills are how you encode operational knowledge into the system.
Skills also cover production maintenance. When a pipeline is live, you need to run it, evaluate its output, compare accuracy across prompt versions, and catch regressions. These are all skills, not ad-hoc commands.
Tooling: Why Claude Code Is Different
The critical differentiator between Claude Code and a chat-based AI is that Claude Code can use tools. Not just generate code, but execute it, read the output, spot problems, fix them, and iterate. This is the difference between getting code suggestions and having an agent that actually operates your systems.
Tools I Use Daily
- **bq CLI**: Query BigQuery directly from the terminal. Claude writes SQL, runs it, reads the results, identifies issues, and iterates until the output is correct.
- **gcloud**: Google Cloud authentication, project switching, service account management.
- **git**: Commits, diffs, branch management, history. The weekly-update skill reads `git log` to generate progress reports.
- **python/pip**: Execute scripts, install packages, run full pipelines. Claude can run a Python classification pipeline end-to-end and read its logs.
- **gh**: GitHub CLI for pull requests, issues, code review.
The Execute, Validate, Iterate Loop
This is where the real power sits. Claude doesn’t just write a query. It runs it, validates the output, and fixes problems autonomously.
Example: “Which search terms are spending over $50 with zero conversions?”
- Claude checks the BigQuery schema docs for the right table and column names
- Writes the SQL query using the correct partitioned table and date filter
- Runs it via `bq query`
- Reads the output, notices 0 rows returned
- Investigates: queries `INFORMATION_SCHEMA` to check the actual column name for conversions
- Discovers the column is `metrics_conversions`, not `conversions`
- Fixes the query, re-runs
- Returns 847 search terms with clean analysis and a recommendation
No copy-pasting between tools. No switching to BigQuery console. No manual debugging of column names. The agent handles the iteration loop that a human would otherwise do manually across multiple browser tabs.
This extends to multi-step analysis. Claude can query one dataset to identify a problem, query a second dataset to find the cause, query a third to validate the hypothesis, all autonomously within a single conversation turn. That’s the kind of analysis workflow that takes a human 45 minutes of tab-switching and becomes a single natural language instruction.
BigQuery is central to everything Claude builds. Whether it’s a Python forecasting script, a budget distribution model, or an ad copy evaluation framework, the data almost always comes from BigQuery. Claude queries it to build the dataset, runs the analysis in Python, and writes results back. The bq CLI is the bridge that makes this seamless.
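That bridge is literally shell commands. As a sketch, an agent-side wrapper around the bq CLI might look like this — the flags shown are standard bq options, but the wrapper functions themselves are hypothetical:

```python
import json
import subprocess

def build_bq_command(sql: str, max_rows: int = 100) -> list[str]:
    """Assemble a bq CLI invocation (standard flags; hypothetical wrapper)."""
    return [
        "bq", "query",
        "--use_legacy_sql=false",  # standard SQL, not legacy
        "--format=json",           # machine-readable output for the agent
        f"--max_rows={max_rows}",
        sql,
    ]

def run_query(sql: str) -> list[dict]:
    """Execute and parse; on failure the agent reads stderr and iterates."""
    result = subprocess.run(
        build_bq_command(sql), capture_output=True, text=True
    )
    if result.returncode != 0:
        # e.g. "Unrecognized name: conversions" -> agent fixes column, retries
        raise RuntimeError(result.stderr)
    return json.loads(result.stdout)
```

The error path is the point: a failed query returns a readable message, and the agent uses it to correct the SQL and re-run.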
Plan Mode: Think Before Building
Type /plan or press Shift+Tab twice. Claude enters read-only research mode. It can explore your codebase, read docs, and query schemas, but cannot write files or execute commands. It’s forced to think before acting.
When to Use It
- Before any build that touches multiple files or systems
- When the architecture has multiple valid approaches
- When you need to understand existing code before modifying it
- Before building any pipeline that involves multiple APIs or data sources
Example: Building a Search Term Classification Pipeline
This is a non-trivial system: BigQuery SQL (input query to fetch unclassified terms), Python batching (split into chunks of 100), Gemini API (classify each batch with structured JSON output), a web search bridge for ambiguous terms, a second AI pass with enriched context, BigQuery writes (insert to staging, merge to production), and an evaluation framework (compare against a gold set for accuracy tracking).
That’s 6-7 systems interacting across three sequential passes. You don’t want Claude to start writing code immediately. You want it to think through the architecture first.
In plan mode:
- Describe the goal and constraints. “Build a pipeline that classifies unclassified search terms using Gemini. Input from BigQuery, output to BigQuery. Must handle 10K+ terms. Must include evaluation.”
- Claude explores. It reads your existing code, checks what utilities already exist (batch processing? Gemini client?), reads the BigQuery schema docs for input/output table structures.
- Claude presents the plan. Entry point script, data flow diagram, API call structure, error handling strategy, validation steps (“How will we know this worked? Run the eval pipeline and check accuracy against the gold set”).
- You review and adjust. “Use the existing Gemini client. Add a `--limit` flag for testing. Skip enrichment for v1.”
- Confirm and execute. Claude builds the full pipeline autonomously. It writes the code, runs it with `--limit 10` to test, checks the BigQuery output, fixes any issues, and presents results.
The discipline is: confirm the plan once, then let Claude run. Don’t micro-manage individual steps. Pause only if something unexpected changes the scope.
Model Routing Strategy
Claude Code supports multiple models that you can switch between mid-session. Not every task needs the most capable (and most expensive) model.
| Task type | Model | Reasoning |
|---|---|---|
| Architecture decisions, project scoping, complex debugging | Opus | Deep reasoning justifies the cost |
| Writing SQL from a clear spec, building scripts, documentation | Sonnet | Clear instructions, clean execution |
| Running pipelines, file edits, formatting, repetitive tasks | Haiku | Fast, reliable, follows instructions precisely |
The target distribution I aim for: ~5% Opus, ~15% Sonnet, ~80% Haiku. Once a plan is confirmed and the spec is clear, Haiku executes it cleanly at a fraction of the cost. Switch back to Opus only when something unexpected requires re-reasoning.
/model opus # Planning and architecture
/model sonnet # Execution requiring judgment
/model haiku # Execution against a clear spec
The general rule: if you’re asking Claude to think, use Opus. If you’re asking Claude to do, use Haiku.
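That rule is easy to encode as a heuristic. A toy router, just to make the mapping explicit — the keyword lists here are made up and would need tuning to your own workflow:

```python
# Illustrative model router: map a task description to a model tier.
# Keyword lists are hypothetical; adjust to your own task vocabulary.

PLANNING_HINTS = ("architecture", "design", "debug", "scope", "why")
JUDGMENT_HINTS = ("write sql", "build script", "document", "refactor")

def pick_model(task: str) -> str:
    t = task.lower()
    if any(hint in t for hint in PLANNING_HINTS):
        return "opus"    # thinking work: deep reasoning justifies the cost
    if any(hint in t for hint in JUDGMENT_HINTS):
        return "sonnet"  # execution requiring judgment
    return "haiku"       # execution against a clear spec
```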
BigQuery: The Data Foundation
This is the longest section in the article because BigQuery is the most powerful part of the stack. Everything starts here. Every pipeline Claude builds, whether it’s a forecasting model, a budget distribution optimizer, or an ad copy evaluator, pulls its data from BigQuery and typically writes results back to it.
Google Ads Data Transfer
The Google Ads Data Transfer gives you the most granular level of Google Ads data available in a queryable format. It transfers daily snapshots of every entity in your account to BigQuery. This is valuable because it contains your complete account structure: every campaign, ad group, keyword, negative, search term, historical ROAS targets, budget settings, cost and conversion stats. It’s everything you need for campaign optimization in a single queryable location, and Claude knows how to query all of it.
Entity tables (dimension data):
- Campaign: name, status, bidding strategy, tROAS/tCPA targets, budget, network settings
- AdGroup: name, status, CPC bid, targeting settings
- Keyword: text, match type, status, bid, quality score
- Ad: type, final URLs, RSA headlines/descriptions, ad strength
- BidGoal: portfolio bid strategy settings and targets
- Audience, Extension, Criterion: all targeting and extension entities
Stats tables (performance metrics):
- CampaignBasicStats: impressions, clicks, cost, CTR, CPC at campaign level
- CampaignConversionStats: conversions, conversion value, ROAS at campaign level
- AdGroupBasicStats / AdGroupConversionStats: same metrics at ad group level
- SearchQueryStats: search term performance (the raw search query report)
- KeywordBasicStats / KeywordConversionStats: keyword-level performance
- LandingPageStats: impressions, clicks, cost per landing page URL per campaign
- BidGoalStats: bid strategy performance metrics
What you get that the Google Ads interface doesn’t provide at this granularity:
- Historical tROAS/tCPA changes: daily snapshots of campaign settings let you track exactly when bid targets changed and correlate with performance shifts
- Landing page performance per campaign: the Ads UI aggregates landing pages; the Data Transfer gives you per-campaign breakdowns
- Search query stats at SQL scale: query across 500K+ terms with custom aggregations, joins, and filters that the UI’s reporting tab can’t do
- Asset-level performance: headline and description stats for RSA ads
- Bid strategy configuration history: portfolio bid strategy settings over time
Two table types exist for each entity:
- **`ads_*` views**: latest snapshot only. Use for dimension lookups. Filter: `WHERE _DATA_DATE = _LATEST_DATE`
- **`p_ads_*` partitioned tables**: full history. Use for date-range analysis. Filter: `WHERE segments_date BETWEEN ... AND ...`

The critical rule: always use `p_ads_*` tables with `segments_date` for any performance stats query over a date range. The `ads_*` views only contain the latest day.
Running this in Google Cloud means the data is already co-located with your compute. You can connect any other API to it: Google Ads API to push changes back, Gemini for AI classification, Python for custom pipelines and statistical modeling. You can build complete PPC management software on this data layer if you invest the time.
Beyond Google Ads: Your Full Data Warehouse
Google Ads Data Transfer is just the starting point. BigQuery is a data warehouse, which means you can connect any data source your business has. LTV and customer lifetime value data from your backend systems. Order data, revenue, margins, returns, subscription metrics. If your data team has it in BigQuery (or you can get it there), Claude can query it.
This is where PPC optimization gets genuinely powerful. You’re no longer optimizing campaigns based on what Google Ads tells you (clicks, conversions, ROAS). You’re optimizing based on what your business actually cares about: real margins, customer quality, predicted lifetime value, return rates. A campaign that looks great on last-click ROAS might be driving low-LTV customers or high-return-rate products. With backend data in BigQuery, you can see that and act on it.
The pattern is the same regardless of data source: document the table schema, add it to your BigQuery decision tree, and Claude starts using it in analysis. Whether that’s Google Ads performance data, backend revenue tables, or product catalog information, it’s all just SQL once it’s in BigQuery.
Schema Self-Discovery
Claude can query INFORMATION_SCHEMA directly to discover table structures before running any analysis. This means you don’t need pre-built documentation for every table. Claude discovers columns, data types, and date ranges on the fly:
-- Discover all tables in a dataset
SELECT table_name, table_type
FROM `your-project.your_dataset.INFORMATION_SCHEMA.TABLES`
ORDER BY table_name;
-- Inspect columns for a specific table
SELECT column_name, data_type
FROM `your-project.your_dataset.INFORMATION_SCHEMA.COLUMNS`
WHERE table_name = 'your_table'
ORDER BY ordinal_position;
-- Check available date ranges before any historical query
SELECT MIN(segments_date) AS earliest, MAX(segments_date) AS latest, COUNT(*) AS rows
FROM `your-project.your_dataset.p_ads_CampaignBasicStats_XXXXXXXXXX`
WHERE segments_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 365 DAY);
Claude runs these discovery queries automatically when it encounters a table it hasn’t seen before. It checks what columns exist, what date ranges are available, and then writes the correct query.
That said, it’s better to document key tables when you use them frequently. Pre-built schema docs with business context make Claude faster and more accurate:
- Schema docs eliminate discovery overhead. Claude knows the columns immediately.
- You can include business context: “this column is in micros, divide by 1,000,000” or “filter by `campaign_status = 'ENABLED'` by default”
- Calculated metrics get their own reference file with canonical SQL formulas
The ideal is both: documented schemas for core tables, self-discovery for everything else.
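The discovery queries above are simple enough to templatize so they can be fired at any dataset. A hypothetical helper (pure string formatting; project and dataset names are placeholders):

```python
def tables_query(project: str, dataset: str) -> str:
    """SQL to list all tables in a dataset via INFORMATION_SCHEMA."""
    return (
        f"SELECT table_name, table_type "
        f"FROM `{project}.{dataset}.INFORMATION_SCHEMA.TABLES` "
        f"ORDER BY table_name"
    )

def columns_query(project: str, dataset: str, table: str) -> str:
    """SQL to inspect columns for one table, in declared order."""
    return (
        f"SELECT column_name, data_type "
        f"FROM `{project}.{dataset}.INFORMATION_SCHEMA.COLUMNS` "
        f"WHERE table_name = '{table}' "
        f"ORDER BY ordinal_position"
    )
```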
The BigQuery Decision Tree (Example Structure)
Here’s an example of how you might structure BigQuery knowledge in your repo. Customize this to match your actual setup, tables, and naming conventions.
bigquery/
├── README.md <- Decision tree: which table for which question
├── schemas/
│ ├── google_ads_transfer.md <- Google Ads entity + stats tables, tROAS changes, budget tracking
│ ├── performance.md <- Aggregated business metrics (revenue, CAC, LTV, ROAS trends)
│ ├── search_term_tables/ <- AI classification pipeline outputs
│ │ ├── classification.md <- Search term classification results
│ │ ├── campaign_routing.md <- Campaign routing decisions
│ │ └── ad_group_routing.md <- Ad group assignment results
│ ├── predictive_models.md <- Forecast outputs, budget allocation recommendations
│ ├── ltv_predictions.md <- Predictive lifetime value models (advanced)
│ └── targets.md <- Monthly budget and revenue targets
└── metrics/
├── calculated-metrics.md <- Every business metric formula
└── channel-breakdown.md <- Channel/subchannel/stage taxonomy
The README.md is a decision tree that routes Claude to the right table. Here’s an example:
| Question type | Table | Schema doc |
|---|---|---|
| Ad platform performance (clicks, cost, conversions) | Google Ads Transfer tables | schemas/google_ads_transfer.md |
| tROAS changes over time | Campaign snapshots (Transfer) | schemas/google_ads_transfer.md |
| Campaign budget changes | Campaign dimension history | schemas/google_ads_transfer.md |
| Backend revenue, CAC, LTV:CAC | Performance metrics table | schemas/performance.md |
| Search term classification results | Pipeline output tables | schemas/search_term_tables/ |
| Forecasting and budget optimization | Predictive models table | schemas/predictive_models.md |
| Predicted customer lifetime value | LTV predictions table | schemas/ltv_predictions.md |
| Landing page evaluation | LandingPageStats (Transfer) | schemas/google_ads_transfer.md |
| Budget vs actual targets | Targets table | schemas/targets.md |
The more context you give your agent about which table to use for which question, the better its first query attempt will be. This is where documentation pays for itself every session.
Calculated Metrics & Channel Taxonomy
Two files handle the business logic layer:
Calculated metrics: a single file that defines every business metric formula with canonical SQL:
## CAC (Customer Acquisition Cost)
Formula: costs / new_paying_users
SQL: SAFE_DIVIDE(SUM(costs), SUM(new_paying_users))
Interpretation: < $100 excellent, $100-200 healthy, > $200 investigate
## LTV:CAC
Formula: lifetime_value / costs
SQL: SAFE_DIVIDE(SUM(lifetime_value), SUM(costs))
Interpretation: < 1.0 losing money, 1.0-3.0 growth mode, > 3.0 healthy
## Conversion Rate
Formula: conversions / clicks
SQL: SAFE_DIVIDE(SUM(conversions), SUM(clicks))
Note: Always aggregate first, then calculate the ratio
Every formula uses SAFE_DIVIDE() to prevent division-by-zero errors. Claude references this file whenever it writes metric calculations, ensuring consistent definitions across every query.
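When the same metrics get recomputed in Python pipelines, it helps to mirror SAFE_DIVIDE’s semantics (NULL instead of an error on a zero denominator) so local checks agree with BigQuery. A minimal sketch; the `cac` helper is illustrative:

```python
from typing import Optional

def safe_divide(numerator: float, denominator: float) -> Optional[float]:
    """Python mirror of BigQuery's SAFE_DIVIDE: None instead of an error on /0."""
    if denominator == 0:
        return None
    return numerator / denominator

def cac(costs: float, new_paying_users: int) -> Optional[float]:
    # Aggregate first, then divide -- same rule as the SQL formulas above.
    return safe_divide(costs, new_paying_users)
```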
Channel taxonomy: defines the hierarchy of channels (paid search non-brand, paid search brand, display, video, paid social), subchannels (by product or toolkit), funnel stages (attract, engage, convert), and teams. Includes default SQL filters and field naming conventions.
This is the “business logic” layer. Without it, Claude would write technically correct SQL that produces meaningless results. With it, Claude understands what “PSNB performance” means, which filters to apply by default, and what metric definitions to use.
bq CLI vs BigQuery MCP
In Claude Code, always use the bq CLI. It costs zero context tokens, works across any GCP project you have access to, and gives you the full CLI feature set.
The BigQuery MCP is designed for Claude Desktop chat sessions. It works, but it costs 15-20K context tokens just to load the tool definitions at session start, and it’s locked to a single GCP project.
| Factor | bq CLI (Claude Code) | BigQuery MCP (Desktop) |
|---|---|---|
| Context cost | 0 tokens | ~15-20K tokens |
| Cross-project | Any project with IAM | Locked to one |
| Flexibility | Full CLI | Limited tool definitions |
Real Query Patterns
Search term stats over a date range:
SELECT
search_term_view_search_term,
SUM(metrics_cost_micros) / 1000000 AS cost,
SUM(metrics_clicks) AS clicks,
SUM(metrics_impressions) AS impressions,
SAFE_DIVIDE(SUM(metrics_conversions), SUM(metrics_clicks)) AS cvr
FROM `your-project.your_dataset.p_ads_SearchQueryStats_XXXXXXXXXX`
WHERE segments_date BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY) AND CURRENT_DATE()
GROUP BY 1
ORDER BY cost DESC
LIMIT 500
Tracking tROAS changes over time (using window functions on daily campaign snapshots):
WITH daily_settings AS (
SELECT
campaign_id, campaign_name,
DATE(_PARTITIONTIME) AS snapshot_date,
campaign_maximize_conversion_value_target_roas AS target_roas
FROM `your-project.your_dataset.p_ads_Campaign_XXXXXXXXXX`
WHERE _PARTITIONTIME >= TIMESTAMP(DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY))
AND campaign_maximize_conversion_value_target_roas IS NOT NULL
AND campaign_maximize_conversion_value_target_roas > 0
),
with_previous AS (
SELECT *,
LAG(target_roas) OVER (PARTITION BY campaign_id ORDER BY snapshot_date) AS prev_roas
FROM daily_settings
)
SELECT
campaign_name, snapshot_date,
prev_roas AS old_troas,
target_roas AS new_troas,
ROUND(target_roas - prev_roas, 2) AS change
FROM with_previous
WHERE prev_roas IS NOT NULL AND target_roas != prev_roas
ORDER BY snapshot_date DESC
Multi-level join (Campaign, Ad Group, Stats):
SELECT
c.campaign_name,
ag.ad_group_name,
SUM(s.metrics_impressions) AS impressions,
SUM(s.metrics_clicks) AS clicks,
SUM(s.metrics_cost_micros) / 1000000 AS cost
FROM `your-project.your_dataset.p_ads_AdGroupBasicStats_XXXXXXXXXX` s
LEFT JOIN (
SELECT ad_group_id, campaign_id, customer_id, ad_group_name
FROM `your-project.your_dataset.ads_AdGroup_XXXXXXXXXX`
WHERE _DATA_DATE = _LATEST_DATE
) ag USING (ad_group_id, campaign_id, customer_id)
LEFT JOIN (
SELECT campaign_id, customer_id, campaign_name
FROM `your-project.your_dataset.ads_Campaign_XXXXXXXXXX`
WHERE _DATA_DATE = _LATEST_DATE
) c USING (campaign_id, customer_id)
WHERE s.segments_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
GROUP BY 1, 2
ORDER BY cost DESC
These patterns compose. Once Claude knows them, it applies them correctly across any analysis.
Agentic Workflows: Connecting LLM APIs to Your Data
This is where the system becomes truly agentic. An agentic workflow is a Python pipeline that connects an LLM API to your data, makes decisions based on that data, and executes actions, either on a schedule or on demand.
The pattern is consistent across every workflow:
- Query BigQuery for the data the workflow needs (search terms, performance stats, budget actuals)
- Send that data to an LLM API (Gemini, Claude, or any model) with a structured prompt that defines the classification or analysis task
- Process the LLM’s response, validate it, write results back to BigQuery
- Execute the output via whatever API is relevant: Google Ads API for account changes, Google Docs API for reports, Slack webhooks for notifications
Once the pipeline is built, you schedule it. A cron job, Cloud Scheduler, or a simple bash script that runs daily. The search term classification pipeline runs every morning, processes new terms that came in overnight, and has recommendations ready for team review by the time you open your laptop. Weekly business reviews generate themselves every Monday. Budget reallocation recommendations refresh daily based on the latest performance data.
The key insight: these are just Python scripts with API calls. There’s no special orchestration framework needed. Claude Code builds them, you test them, and they run. If something breaks, Claude reads the logs and fixes it. If the LLM’s output quality drifts, the eval pipeline catches it and you refine the prompt.
Some workflows I run on a schedule:
- Search term classification: daily, processes new unclassified terms
- Negative keyword automation: weekly, flags high-spend zero-conversion terms for review
- WBR/MBR generation: weekly/monthly, queries actuals vs targets, generates narrative reports
- Budget distribution recommendations: daily, recalculates optimal allocation based on recent performance
- tROAS monitoring: daily, flags campaigns where actual ROAS deviates significantly from target
The more workflows you build, the more your system runs itself. You shift from doing the work to reviewing the work and improving the system that does it.
What You Can Build
The architecture of an autonomous PPC system:
flowchart TB
subgraph Data["Data Layer"]
BQ[(BigQuery\nSearch Query Stats)]
end
subgraph Classification["Stage 1: Classification"]
FETCH[Fetch Unprocessed Terms]
PASS1[Pass 1: AI Classification\nInternal Knowledge Only]
TAVILY[Web Search Bridge\nResolve Ambiguous Terms]
PASS2[Pass 2: Re-classification\nWith Web Context]
STG1[(Staging Table\nAppend-Only Audit Trail)]
PROD1[(Production Table\nMERGE Upsert)]
end
subgraph Routing["Stage 2: Campaign & Ad Group Routing"]
ROUTE[Score Campaign Themes\nPer Product Line]
ASSIGN[Assign to Ad Groups\nWith Deduplication]
STG2[(Routing Tables)]
end
subgraph Review["Stage 3: Team Review Dashboard"]
DASH[Python Dashboard\nFilter by Campaign / Action Type]
APPROVE[Approve / Reject / Override]
end
subgraph Execution["Stage 4: Google Ads API"]
NEG[Add Negatives]
KW[Add Keywords]
BID[Adjust Bids & Budgets]
end
subgraph Feedback["Continuous Improvement"]
CORRECTIONS[Team Corrections]
GOLD[Gold Set Update]
EVAL[Eval Pipeline\nPrecision / Recall / Accuracy]
PROMPT[Prompt Refinement]
end
BQ --> FETCH --> PASS1
PASS1 -->|Known Terms| STG1
PASS1 -->|Unknown Terms| TAVILY --> PASS2 --> STG1
STG1 --> PROD1
PROD1 -->|ADD_KEYWORD| ROUTE --> ASSIGN --> STG2
PROD1 -->|ADD_NEGATIVE| DASH
STG2 --> DASH
DASH --> APPROVE
APPROVE -->|Approved| NEG & KW & BID
APPROVE -->|Rejected| CORRECTIONS
NEG & KW & BID -->|Data Transfer Sync| BQ
CORRECTIONS --> GOLD --> EVAL --> PROMPT -->|Updated Prompts| PASS1
Each layer is code and APIs. Claude Code builds each layer and connects them. Concrete systems I’ve built or am building:
Search term classification: A three-pass pipeline that processes thousands of search terms per run. Pass 1 sends terms to Gemini in batches of 100, classifying each term across multiple product dimensions with scores (1-10), flagging competitors and branded terms, and recommending an action: ADD_KEYWORD, ADD_NEGATIVE, or TO_REVIEW. Terms the AI can’t confidently classify get flagged and sent through a web search bridge (Tavily) to resolve ambiguity. Pass 2 re-classifies those enriched terms with the web context included. All results flow through a staging table (append-only audit trail) before merging to production. This three-pass architecture handles the long tail of ambiguous terms that a single-pass system would misclassify.
Campaign routing: Takes classified terms marked as ADD_KEYWORD and routes them to specific campaigns based on product theme scoring. For a SaaS company with multiple product lines, Gemini scores each term against every campaign theme (e.g., 11 product-specific campaigns). The highest-scoring theme becomes the routing destination. Same three-pass pattern: main classification, web search for ambiguous terms, re-classification with context. Deduplication uses a token-sorting function that treats “seo tools” and “tools seo” as identical, preventing duplicate keyword additions across campaigns.
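The token-sorting deduplication is simple enough to show in full. A sketch (function names are illustrative):

```python
def dedup_key(term: str) -> str:
    # Order-insensitive key: "seo tools" and "tools seo" collapse to the same key
    return " ".join(sorted(term.lower().split()))

def dedup_terms(terms: list[str]) -> list[str]:
    seen: set[str] = set()
    unique: list[str] = []
    for term in terms:
        key = dedup_key(term)
        if key not in seen:
            seen.add(key)
            unique.append(term)  # keep the first spelling encountered
    return unique
```

Running the key against existing keywords before an upload is what prevents the same intent landing in two campaigns under different word orders.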
Budget distribution optimization: Analyzes historical performance data across campaigns to identify where marginal spend produces the best returns. Combines with predictive modeling (Prophet forecasts or regression models) to recommend budget reallocation across the portfolio. For advanced setups: use predictive LTV models to optimize budget based on predicted customer profitability, not just immediate ROAS. BigQuery provides the historical data, Python runs the optimization, Google Ads API pushes new daily budgets.
tROAS monitoring and optimization: Tracks tROAS changes across all campaigns using the daily snapshot pattern shown above. Correlates target changes with performance shifts. Feeds into a model that predicts optimal tROAS targets based on historical cost/conversion curves. Advanced: combine with predictive LTV modeling to set tROAS targets based on predicted customer profitability instead of immediate ROAS.
Ad copy automation and rotation: Generates ad copy variations using AI, manages RSA headline/description testing at scale, evaluates asset-level performance from BigQuery Data Transfer tables to identify top performers and retire underperformers.
Landing page evaluation: Uses LandingPageStats from the Data Transfer to analyze per-campaign landing page performance. Identifies pages with high traffic and low conversion rates, surfaces A/B testing opportunities.
Automated reporting (MBR/WBR): Queries BigQuery for actuals vs targets, generates visualizations and charts, uses Gemini to write narrative analysis and storyline, auto-publishes to Google Docs or Slides with formatted charts and executive summary. One command regenerates the full report each month. No manual chart creation, no copy-pasting metrics, no writing summaries — the pipeline handles the entire storytelling layer.
Forecasting: 90-day spend and conversion forecasts using Prophet or statistical models on historical BigQuery data. Includes automated backtesting to validate model quality before publishing predictions.
Predictive lifetime value modeling (advanced): Build models that predict customer lifetime value based on first-touch or cohort attributes. Score incoming traffic, then upload predicted LTV values back to Google Ads via the Offline Conversion Import API. This enables bid optimization on predicted profitability, not just last-click ROAS. Claude Code handles the model building, validation, and API integration.
Interactive team agents: Build conversational agents your team can talk to about data, performance, strategy, and projects. Connect an LLM to your BigQuery warehouse and internal documentation, and your team gets a natural language interface to ask questions, pull metrics, and explore trends without writing SQL or waiting for an analyst. The agent handles the query generation, runs it against your data, and returns formatted answers with context.
Negative keyword automation: Identifies search terms with high spend and zero conversions, filters against existing negative lists, pushes new negatives via Google Ads API with campaign-level targeting. Match type selection matters: exact match for branded terms (protect brand campaigns), phrase match for irrelevant competitors (block all variations), exact match for high-risk terms that contain core business words.
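The match type rules above reduce to a small decision function. This is a hedged sketch of that logic — the word sets and the fallback to phrase match are assumptions, not the article's exact rules:

```python
def negative_match_type(
    term: str,
    brand_words: set[str],       # your own brand terms
    competitor_words: set[str],  # irrelevant competitor names
    core_words: set[str],        # high-risk core business words
) -> str:
    tokens = set(term.lower().split())
    if tokens & brand_words:
        return "EXACT"   # protect brand campaigns: block only this exact query
    if tokens & competitor_words:
        return "PHRASE"  # block every variation containing the competitor name
    if tokens & core_words:
        return "EXACT"   # term contains core business words: block surgically
    return "PHRASE"      # assumed default for clearly irrelevant terms
```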
The Team Review Dashboard
This is where the human stays in the loop. A Python dashboard (Streamlit, Retool, or custom HTML) pulls recommendations directly from BigQuery and presents them in a filterable interface:
- Keyword additions grouped by campaign theme, with AI confidence scores and justification text visible per term
- Negative keywords grouped by campaign, with match type recommendations and 90-day spend data for prioritization
- Approve / Reject / Override controls per recommendation, with bulk actions for efficiency
Approved recommendations push directly to the Google Ads API. No copy-pasting into Google Ads Editor, no spreadsheet intermediaries. The team reviews AI decisions in context, clicks approve, and the system executes.
This is the critical bridge between full automation and human judgment. The AI handles the analysis at scale. The team validates the decisions that matter. Over time, as accuracy improves and trust builds, the approval threshold can shift, moving more decisions to auto-approve while keeping edge cases in the review queue.
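A shifting approval threshold can be as simple as a per-action confidence map. A sketch under assumptions (the function and the threshold scale of 1–10 are illustrative, not from the source):

```python
def route_recommendation(rec: dict, auto_approve: dict[str, int]) -> str:
    """Auto-approve only when this action type has earned trust and the
    AI's confidence clears that action's threshold; everything else queues."""
    threshold = auto_approve.get(rec["action"])
    if threshold is not None and rec.get("confidence", 0) >= threshold:
        return "auto_approved"
    return "review_queue"
```

Raising or lowering a single number per action type is how "more decisions move to auto-approve" becomes an operational dial rather than a rewrite.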
The Self-Improving Feedback Loop
This is what makes the system genuinely autonomous over time, not just automated.
When a team member rejects a recommendation, that correction feeds back into the system:
- Rejected recommendations flow into the gold set. The gold set is a curated dataset of 1,000+ human-labeled terms used to benchmark AI accuracy. Every team correction makes the benchmark more representative of real-world edge cases.
- The eval pipeline re-runs against the updated gold set. It measures action accuracy, competitor detection accuracy, and per-category precision/recall. Any regression surfaces immediately.
- AI detects patterns in rejections. If the team consistently overrides a specific type of classification, like misrouting competitor terms or scoring a product category too high, the pattern becomes visible in the eval stats.
- Prompts get refined based on eval results. The classification prompt is the core lever. A single line change (“treat Google products as platforms, not competitors”) can shift accuracy by 2-3 percentage points across thousands of terms.
- New eval run validates the prompt change. No change ships to production without improving the benchmark. If accuracy drops, the change gets reverted.
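The eval step above — action accuracy plus per-category precision/recall against the gold set — can be sketched in a few lines (the function name and dict shapes are assumptions for illustration):

```python
def eval_actions(predictions: dict[str, str], gold: dict[str, str]) -> dict:
    """Score predicted actions (term -> action) against the human-labeled gold set."""
    actions = set(gold.values()) | set(predictions.values())
    correct = sum(1 for term, action in gold.items() if predictions.get(term) == action)
    report = {"accuracy": correct / len(gold), "per_action": {}}
    for action in actions:
        tp = sum(1 for t, a in gold.items() if a == action and predictions.get(t) == action)
        fp = sum(1 for t, a in predictions.items() if a == action and gold.get(t) != action)
        fn = sum(1 for t, a in gold.items() if a == action and predictions.get(t) != action)
        report["per_action"][action] = {
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            "recall": tp / (tp + fn) if tp + fn else 0.0,
        }
    return report
```

Run this after every prompt change and diff the report against the previous run; a drop in any per-action number is the regression signal.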
This loop runs continuously. The system doesn’t just process search terms, it learns from every human correction and gets measurably better at the specific decisions your team cares about. After 30+ eval runs, a well-tuned pipeline hits 94%+ accuracy on action recommendations and 96%+ on competitor detection.
This is the future of PPC management. Not “set it and forget it” automation, but a system where AI handles the volume, humans handle the judgment calls, and every interaction between the two makes the system smarter. The teams that build this infrastructure now will operate at a scale and precision that manual workflows cannot match.
Evaluations
In a production engineering environment, a single “Gold Set” isn’t enough. We treat AI evaluations as a multi-layered validation stack to ensure every automated decision is technically sound and economically viable:
- Static Evals (The Gold Set): Automated unit tests against human-labeled ground truth to track precision and recall over time.
- Semantic Similarity: Using embeddings to measure how closely an AI’s reasoning aligns with our target outputs, catching nuances that binary checks miss.
- Model-as-a-Judge: Leveraging high-reasoning models to programmatically audit the logic of our high-speed production models.
- Business Logic Guardrails: Hard-coded “circuit breakers” that validate AI decisions against strict financial constraints—like max CPA or net margin floors—before they hit the Google Ads API.
If a change doesn’t improve the benchmark or clear the guardrails, it doesn’t ship.
The Staging-to-Production Pattern
Every pipeline uses the same data flow pattern: raw AI results INSERT into a staging table (append-only, never deleted), then MERGE into a production table (deduplicated, progressively enriched).
Why this matters: a three-pass pipeline writes partial results after each pass. Pass 1 writes classification scores. The web search bridge adds definitions. Pass 2 updates the remaining fields. The MERGE uses COALESCE(source.field, target.field) so each pass adds data without overwriting what previous passes already filled in. Re-runs are safe because MERGE is idempotent. And the staging table keeps a permanent audit trail of every AI decision, which matters when you’re debugging why a term was classified a certain way three months later.
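One way to see the COALESCE pattern concretely is a statement builder. A sketch with assumed names — it also assumes the source has already been deduplicated to one row per key (e.g., the latest staging row per term), which a real MERGE requires:

```python
def build_enriching_merge(
    staging: str, production: str, keys: list[str], fields: list[str]
) -> str:
    """BigQuery MERGE where each pass fills fields it knows and
    COALESCE preserves values earlier passes already wrote."""
    on = " AND ".join(f"T.{k} = S.{k}" for k in keys)
    sets = ", ".join(f"T.{f} = COALESCE(S.{f}, T.{f})" for f in fields)
    cols = ", ".join(keys + fields)
    vals = ", ".join(f"S.{c}" for c in keys + fields)
    return (
        f"MERGE `{production}` T USING `{staging}` S ON {on} "
        f"WHEN MATCHED THEN UPDATE SET {sets} "
        f"WHEN NOT MATCHED THEN INSERT ({cols}) VALUES ({vals})"
    )
```

Because a NULL in the source never overwrites a non-NULL in the target, re-running any pass is a no-op for fields it doesn't know about — which is exactly what makes the pipeline idempotent.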
Production Pipeline Requirements
Every pipeline ships with these layers around the core logic:
- Batching: split large inputs into chunks (e.g., 50 terms per API call)
- Rate limiting: delays between API calls to stay within quotas
- Retry logic: exponential backoff on transient failures
- Error handling: catch and log individual failures without crashing the full run
- Logging: progress output so you can diagnose a run after the fact
These are the things visual workflow tools like n8n provide inside each node automatically. In Python, they’re explicit. Claude knows to include them because the production checklist is encoded in the architecture rules.
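The checklist above often collapses into one reusable wrapper. A sketch with illustrative names and defaults:

```python
import logging
import random
import time
from typing import Callable

def run_in_batches(
    items: list,
    process_batch: Callable[[list], list],
    batch_size: int = 50,
    delay_s: float = 1.0,
    max_retries: int = 3,
) -> list:
    """Batching + rate limiting + exponential backoff + per-batch error isolation."""
    results: list = []
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        for attempt in range(max_retries):
            try:
                results.extend(process_batch(batch))
                break
            except Exception as exc:  # one bad batch must not crash the full run
                wait = 2 ** attempt + random.random()
                logging.warning("batch %d failed (%s); retrying in %.1fs",
                                start // batch_size, exc, wait)
                time.sleep(wait)
        else:
            logging.error("batch %d skipped after %d retries", start // batch_size, max_retries)
        time.sleep(delay_s)  # stay inside API quotas
    return results
```

Every pipeline then calls `run_in_batches(terms, classify_batch)` instead of re-implementing the scaffolding, and the logging gives you the after-the-fact diagnosis trail.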
Google Ads API: Closing the Loop
The Google Ads API is the execution layer. Once your pipelines are producing decisions in BigQuery, the API is how those decisions become platform actions. Don’t use the API to read data. That’s what BigQuery Data Transfer is for. Use the API to write changes and manage accounts programmatically.
Credentials live in .env (never committed to git):
GOOGLE_ADS_DEVELOPER_TOKEN=your_token
GOOGLE_ADS_CLIENT_ID=your_client_id
GOOGLE_ADS_CLIENT_SECRET=your_secret
GOOGLE_ADS_REFRESH_TOKEN=your_refresh_token
GOOGLE_ADS_LOGIN_CUSTOMER_ID=your_mcc_id
What the API Enables (Autonomous Execution)
The API uses GAQL (Google Ads Query Language) for reads and mutate operations for writes. For true autonomous systems, the mutate operations are what close the loop:

Add negative keywords — Flag a search term as unprofitable in BigQuery, execute a mutation via the API to add it to a negative keyword list in the campaign automatically.
from google.ads.googleads.client import GoogleAdsClient

client = GoogleAdsClient.load_from_env()
service = client.get_service("SharedCriterionService")
customer_id = "your-customer-id"

# Add a negative keyword to a shared negative keyword list
# (the list is attached to campaigns separately, via CampaignSharedSet)
operation = client.get_type("SharedCriterionOperation")
shared_criterion = operation.create
shared_criterion.shared_set = client.get_service("SharedSetService").shared_set_path(
    customer_id, "your-shared-set-id"
)
shared_criterion.keyword.text = "unprofitable search term"
shared_criterion.keyword.match_type = client.enums.KeywordMatchTypeEnum.BROAD
response = service.mutate_shared_criteria(
    customer_id=customer_id,
    operations=[operation],
)
Add keywords — Classify a search term as high-intent, upload it as a keyword to the right campaign and ad group via the API.
Adjust bids — Update keyword bids, ad group bids, or campaign budgets based on performance data from BigQuery.
Modify ROAS targets — Update portfolio bid strategy target ROAS based on forecasted performance or profitability curves. Advanced: update targets based on predicted lifetime value models to optimize for customer profitability, not just immediate conversion value.
Pause/enable entities — Automatically pause underperforming ad groups or enable new campaigns based on threshold logic.
Update budgets — Reallocate daily budgets across campaigns based on predictive models and performance forecasts. Can also use the Offline Conversion Import API to upload predicted LTV values as conversion values, enabling tROAS bidding based on predicted customer profitability.
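For the budget reallocation itself, here is a deliberately naive sketch — proportional to recent ROAS with a per-campaign floor. The article's approach uses marginal-return curves and predictive models, so treat this as a placeholder for that logic, with invented names throughout:

```python
def reallocate_budgets(
    total_daily: float,
    roas: dict[str, float],       # campaign -> recent ROAS
    floor_share: float = 0.05,    # minimum share per campaign, to keep learning data flowing
) -> dict[str, float]:
    floor = total_daily * floor_share
    flexible = total_daily - floor * len(roas)
    total_roas = sum(roas.values())
    if total_roas == 0:  # no signal: split the flexible pool evenly
        return {c: round(floor + flexible / len(roas), 2) for c in roas}
    return {c: round(floor + flexible * (r / total_roas), 2) for c, r in roas.items()}
```

The output maps directly onto campaign budget mutations: compute in Python, convert to micros, push via the API.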
The Architecture
BigQuery (historical data + analysis) -> Python (decisions + mutations) -> Google Ads API (live account changes)
The flow:
- BigQuery queries identify a problem (high-spend, zero-conversion search terms)
- Python script prepares mutations (API calls to add negatives, update bids, etc.)
- Google Ads API executes those mutations on the live account
- Data Transfer syncs the updated account state back to BigQuery
- Next run, the analysis includes the results of the previous execution
Claude handles the API mechanics: GAQL syntax, mutation structure, error handling for partial failures. You define the business rules: what performance threshold triggers an action, what budget reallocation formula to use, when to pause entities. This is what makes it truly autonomous.
The Learning Layer
Building these systems compounds in two directions: you ship better automation, and you develop genuine engineering knowledge. These aren’t separate.
The mechanism: Claude builds something, it works, you ask it to explain what it just built. Claude uses your actual code and your actual data as the example, not a textbook exercise. That’s when the understanding sticks.
The learning-opportunity skill makes this intentional. When you want to understand what was built: “Explain this to me.” Claude gives three depth levels: conceptual overview, practical mechanics, advanced internals. You pick the one that’s useful right now.
The progression over time:
- Early: Run pipelines, ask “what did this just do?”
- Weeks in: Modify code Claude wrote. Adjust parameters, add filters, change logic.
- Months in: Describe new system requirements, Claude builds, you review and direct the architecture.
- Later: You design the system, Claude implements, you own the architecture decisions.
The coding knowledge accumulates as a side effect of solving real problems. You were never studying Python in isolation. You were building a search term classifier, and Python was the implementation language.
Principles
- Plan with gates, execute freely. Confirm the architecture once, then let it run.
- CLI over MCP in Claude Code. Zero context overhead.
- Route models by task type. Opus for reasoning, Haiku for execution.
- Docs are the memory. Schema docs + metric definitions + GOAL.md files = correct output on the first attempt.
- Self-improving loop. Every mistake becomes a standing rule.
- Evaluate everything. Gold sets, accuracy tracking, regression detection. You can’t improve what you don’t measure.
- The system is the asset. Not any individual script, but the connected pipeline that runs, evaluates, and improves.
Last updated March 2026.