A CPA firm owner dreads January each year because they must manually classify every contractor as 1099-NEC or non-reportable, chase down missing W-9s via email, and prepare filings. With dozens of clients and hundreds of contractors, the process is error-prone and stressful. The owner needs an automated system that classifies contractors based on payment data, sends polite W-9 requests, and tracks completion to ensure timely filing.
This tutorial walks you through building a 1099-NEC Contractor Classification and W-9 Collection Agent — a Next.js + TypeScript backend that automates year-end contractor tax reporting. You’ll wire up an AI agent that classifies contractors as 1099-NEC, non-reportable, or unclassified based on payment data; sends W-9 requests via Gmail; parses received W-9 PDFs to verify TINs; and generates 1099-NEC forms as PDFs. The system uses OpenAI’s Responses API, Supabase for contractor storage, Langfuse for tracing, and six REAA agent-framework packages for handoff, memory, caching, budget control, runbooks, and evaluation.
Prerequisites
Node.js >= 22 and pnpm 10+ installed
An OpenAI API key with access to gpt-5.2-mini
A Supabase project (URL and anon key)
Gmail API credentials (client ID, client secret, refresh token, redirect URL) with the Gmail API enabled
A Langfuse account (public key, secret key, host URL)
Familiarity with TypeScript, Next.js App Router, and vitest
Step 1: Set up the project
The scaffold agent has already created the Next.js 16 App Router shell and installed dependencies. Here’s what package.json looks like:
export type { Contractor, Classification, W9Status } from './contractor.js';
export type { ClassificationResult } from './classification.js';
export type { W9Request, W9RequestStatus } from './w9.js';
export type { Form1099 } from './form1099.js';
Expected output: pnpm typecheck exits 0.
Step 3: Create the Supabase contractor store
The contractor store wraps @supabase/supabase-js and provides CRUD operations against a contractors table. Create src/services/contractor-store.ts:
ts
import { createClient } from '@supabase/supabase-js';
import type { Contractor, Classification, W9Status } from '../types/contractor.js';

const supabase = createClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_ANON_KEY!,
);

export function getStore() {
  return supabase;
}

function toContractor(row: Record<string, unknown>): Contractor {
  return {
    id: row.id as string,
    name: row.name as string,
The getStore() export gives direct access to the Supabase client for raw queries. upsertContractor uses the einOrSsn column as a conflict key, so re-importing the same contractor updates their record.
Expected output: TypeScript compiles without errors.
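The conflict-key behavior described above can be illustrated without a database. The following is a minimal in-memory sketch, not the store's actual code: `ContractorRow`, `table`, and `upsertByTin` are hypothetical names, and the Map stands in for a table with a unique constraint on einOrSsn. With supabase-js, the equivalent is typically expressed through the `onConflict` option of `upsert`.

```typescript
// Hypothetical row shape; the real contractors table has more columns.
interface ContractorRow {
  einOrSsn: string;
  name: string;
  totalPaidYtd: number;
}

// In-memory stand-in for a table with a unique constraint on einOrSsn.
const table = new Map<string, ContractorRow>();

// Insert the row, or overwrite the existing row that shares its conflict key.
function upsertByTin(row: ContractorRow): "inserted" | "updated" {
  const outcome = table.has(row.einOrSsn) ? "updated" : "inserted";
  table.set(row.einOrSsn, row);
  return outcome;
}

// Re-importing the same contractor updates the record instead of duplicating it.
upsertByTin({ einOrSsn: "12-3456789", name: "NEC Corp", totalPaidYtd: 500 }); // "inserted"
upsertByTin({ einOrSsn: "12-3456789", name: "NEC Corp", totalPaidYtd: 900 }); // "updated"
```

After both calls the table still holds a single row, carrying the newer payment total.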
Step 4: Build the OpenAI client
Create src/services/openai-client.ts to wrap the OpenAI Responses API. It provides two functions: one classifies a contractor and one drafts a W-9 request email.
ts
import OpenAI from 'openai';
import type { Contractor } from '../types/contractor.js';
import type { ClassificationResult } from '../types/classification.js';

let _client: OpenAI | null = null;

function getClient(): OpenAI {
  if (!_client) {
    _client = new OpenAI({
      apiKey: process.env.OPENAI_API_KEY!,
    });
  }
  return _client;
}

export { getClient };

export async function classifyContractor(contractor: Contractor): Promise<ClassificationResult> {
  const client = getClient();
  const input = JSON.stringify({
    name: contractor.name,
    einOrSsn: contractor.einOrSsn,
    totalPaidYtd: contractor.totalPaidYtd,
    paymentCount: contractor.paymentCount,
  });
  const response = await client.responses.create({
    model: 'gpt-5.2-mini',
    instructions:
      'You are a CPA classification expert. Classify this contractor as 1099-nec (paid >=$600 for services, not an employee) or non-reportable (paid <$600, corporation, or employee relationship). Return JSON: { classification, confidence (0-1), reasoning }.',
    input,
  });
  const parsed = JSON.parse(response.output_text) as {
    classification: '1099-nec' | 'non-reportable' | 'unclassified';
    confidence: number;
    reasoning: string;
  };
  return {
    contractorId: contractor.id,
    classification: parsed.classification,
    confidence: parsed.confidence,
    reasoning: parsed.reasoning,
    modelUsed: 'gpt-5.2-mini',
    classifiedAt: new Date(),
  };
}

export async function generateW9EmailDraft(
  contractor: Contractor,
): Promise<{ subject: string; bodyHtml: string }> {
  const client = getClient();
  const response = await client.responses.create({
    model: 'gpt-5.2-mini',
    instructions:
      'Draft a polite professional email requesting a completed W-9 form. Return JSON: { subject, bodyHtml }. The subject should be a clear, professional subject line for a W-9 request email. The bodyHtml should be the HTML body of the email.',
    input: `Contractor: ${contractor.name}, Email: ${contractor.email}`,
  });
  const text = response.output_text;
  try {
    const parsed = JSON.parse(text) as { subject: string; bodyHtml: string };
    return {
      subject: parsed.subject,
      bodyHtml: parsed.bodyHtml,
    };
  } catch {
    return {
      subject: 'W-9 Form Request',
      bodyHtml: text,
    };
  }
}
Both functions use the OpenAI Responses API (responses.create), not the older chat completions endpoint. getClient() is exported so other modules can reuse the same lazy-initialized instance.
Expected output: pnpm typecheck exits 0.
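classifyContractor casts the parsed model output straight to the expected shape, so a malformed reply would either throw or pass through unchecked. One possible hardening is sketched below. `parseClassification` is a hypothetical helper, not part of the recipe's codebase, and falling back to unclassified on bad output is an assumption about the desired behavior.

```typescript
type Classification = "1099-nec" | "non-reportable" | "unclassified";

interface ParsedClassification {
  classification: Classification;
  confidence: number;
  reasoning: string;
}

const ALLOWED: readonly Classification[] = ["1099-nec", "non-reportable", "unclassified"];

// Hypothetical helper: validate the model's JSON and fall back to
// "unclassified" instead of throwing when the reply is malformed.
function parseClassification(raw: string): ParsedClassification {
  const fallback: ParsedClassification = {
    classification: "unclassified",
    confidence: 0,
    reasoning: "Model output could not be parsed",
  };
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return fallback;
  }
  if (typeof value !== "object" || value === null) return fallback;
  const fields = value as Record<string, unknown>;
  const { classification, confidence, reasoning } = fields;
  if (typeof classification !== "string" || !ALLOWED.includes(classification as Classification)) {
    return fallback;
  }
  if (typeof confidence !== "number" || !Number.isFinite(confidence) || confidence < 0 || confidence > 1) {
    return fallback;
  }
  return {
    classification: classification as Classification,
    confidence,
    reasoning: typeof reasoning === "string" ? reasoning : "",
  };
}
```

Routing bad output to unclassified keeps the record visible for manual review rather than failing the whole call.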
Step 5: Add LLM caching
The @reaatech/llm-cache package provides a semantic caching layer. Create src/services/cache-service.ts:
The cache compares OpenAI text-embedding-3-small embeddings by cosine similarity. When an incoming prompt scores above the 0.8 similarity threshold against a cached query, the cache returns the stored result, saving both API cost and latency. invalidateByUseCase clears the cache for a specific use case (e.g. after a pipeline run).
Expected output: cachedClassify wraps any classification call with the semantic cache.
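The lookup that a semantic cache performs can be sketched in a few lines. This is an illustration of the technique only, not the @reaatech/llm-cache implementation: `CacheEntry`, `cosineSimilarity`, and `lookup` are hypothetical names, and the embeddings are assumed to be plain number arrays of equal length.

```typescript
interface CacheEntry<T> {
  embedding: number[];
  value: T;
}

// Cosine similarity: dot product divided by the product of the vector norms.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the best-matching cached value whose similarity exceeds the threshold,
// or undefined on a cache miss.
function lookup<T>(entries: CacheEntry<T>[], query: number[], threshold = 0.8): T | undefined {
  let best: CacheEntry<T> | undefined;
  let bestScore = threshold;
  for (const entry of entries) {
    const score = cosineSimilarity(entry.embedding, query);
    if (score > bestScore) {
      best = entry;
      bestScore = score;
    }
  }
  return best?.value;
}
```

A higher threshold trades cache hits for safety: two contractors with similar names but different payment totals should not share a cached classification.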
Step 6: Set up agent memory
Create src/services/memory-service.ts using @reaatech/agent-memory. This stores contractor classification context as structured memories for later retrieval:
ts
import { AgentMemory, OpenAILLMProvider, MemoryType } from '@reaatech/agent-memory';
import type { Memory } from '@reaatech/agent-memory';

let memory: AgentMemory | null = null;

export function getMemory(): AgentMemory {
  if (!memory) {
    memory = new AgentMemory({
      storage: { provider: 'memory' },
      embedding: {
        provider: 'openai',
        model: 'text-embedding-3-small',
        apiKey: process.env.OPENAI_API_KEY!,
      },
      extraction: {
        llmProvider: new OpenAILLMProvider({
          apiKey: process.env.OPENAI_API_KEY!,
          model: 'gpt-4o-mini',
        }),
        enabledTypes: [MemoryType.FACT, MemoryType.PREFERENCE],
        batchSize: 10,
        confidenceThreshold: 0.7,
      },
    });
  }
  return memory;
}

export async function storeContractorMemory(
  _contractorId: string,
  _context: Record<string, unknown>,
): Promise<Memory[]> {
  const mem = getMemory();
  return mem.extractAndStore([
    { speaker: 'agent' as const, content: JSON.stringify(_context), timestamp: new Date() },
  ]);
}

export async function retrieveRelevantMemories(query: string, limit = 5): Promise<Memory[]> {
  const mem = getMemory();
  return mem.retrieve(query, { limit });
}

export async function shutdownMemory(): Promise<void> {
  if (memory) {
    await memory.close();
    memory = null;
  }
}
The memory service extracts facts and preferences from classification conversations and stores them for later semantic retrieval. shutdownMemory cleans up the in-memory store.
Expected output: After classifying a contractor, you can call retrieveRelevantMemories to recall prior classification context.
Step 7: Add budget controls
Create src/services/budget-controller.ts using @reaatech/agent-budget-engine. It tracks per-contractor spend and enforces budget thresholds:
ts
import { BudgetController, PolicyEvaluator, DowngradeEngine, ToolFilter } from '@reaatech/agent-budget-engine';

class InMemorySpendStore {
  private store = new Map<string, number>();

  record(scopeType: string, scopeKey: string, cost: number) {
    const key = `${scopeType}:${scopeKey}`;
    this.store.set(key, (this.store.get(key) ?? 0) + cost);
  }

  getTotal(scopeType: string, scopeKey: string): number {
    return this.store.get(`${scopeType}:${scopeKey}`) ?? 0;
  }
}

const spendTracker = new InMemorySpendStore();

const controller = new BudgetController({
  spendTracker: spendTracker as never,
  defaultEstimateTokens: 1000,
});

export async function checkBudget(
  scopeType: string,
  scopeKey: string,
  estimatedCost: number,
  modelId: string,
) {
  const result = await controller.check({
    scopeType: scopeType as never,
    scopeKey,
    estimatedCost,
    modelId,
    tools: [],
  });
  return result;
}

export async function recordSpend(
  requestId: string,
  scopeType: string,
  scopeKey: string,
  cost: number,
): Promise<void> {
  await controller.record({
    requestId,
    scopeType: scopeType as never,
    scopeKey,
    cost,
    inputTokens: 0,
    outputTokens: 0,
    modelId: 'gpt-5.2-mini',
    provider: 'openai',
    timestamp: new Date(),
  });
}

export function getBudgetState(scopeType: string, scopeKey: string) {
  return controller.getState(scopeType as never, scopeKey);
}

// Subscribe to events
controller.on('threshold-breach', (event) => {
  console.warn(`Budget threshold breached for ${event.scopeType}:${event.scopeKey}`);
});

controller.on('hard-stop', (event) => {
  console.error(`Budget hard stop for ${event.scopeType}:${event.scopeKey}`);
});

export { controller, InMemorySpendStore, PolicyEvaluator, DowngradeEngine, ToolFilter };
Every classification call checks budget before proceeding and records spend afterward. Threshold breaches emit warnings; hard stops halt further spending.
Expected output: Running pnpm typecheck exits 0.
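The difference between a threshold breach and a hard stop can be made concrete with a small pure function. This is a sketch of the general idea, not the policy logic inside @reaatech/agent-budget-engine: `BudgetPolicy`, `evaluateBudget`, and the `warnRatio` field are hypothetical, and the dollar figures are illustrative.

```typescript
type BudgetDecision = "allow" | "warn" | "hard-stop";

// Hypothetical policy: a spend limit plus the fraction of it that triggers a warning.
interface BudgetPolicy {
  limitUsd: number;
  warnRatio: number;
}

// Decide whether a call may proceed, based on spend so far plus the estimated cost.
function evaluateBudget(
  spentUsd: number,
  estimatedCostUsd: number,
  policy: BudgetPolicy,
): BudgetDecision {
  const projected = spentUsd + estimatedCostUsd;
  if (projected > policy.limitUsd) return "hard-stop";
  if (projected >= policy.limitUsd * policy.warnRatio) return "warn";
  return "allow";
}

const policy: BudgetPolicy = { limitUsd: 1, warnRatio: 0.8 };
evaluateBudget(0.1, 0.002, policy);   // "allow"
evaluateBudget(0.8, 0.002, policy);   // "warn"
evaluateBudget(0.999, 0.002, policy); // "hard-stop"
```

Checking the projected spend rather than the current spend means the call that would cross the limit is the one that gets blocked.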
Step 8: Build the classifier agent
Create src/services/classifier-agent.ts. This is the core orchestrator that ties together OpenAI, caching, memory, and budget for a single classification call:
ts
import { getContractor, updateContractor } from '../services/contractor-store.js';
import { classifyContractor } from '../services/openai-client.js';
import { checkBudget, recordSpend } from '../services/budget-controller.js';
import { cachedClassify } from '../services/cache-service.js';
import { storeContractorMemory } from '../services/memory-service.js';
import type { Contractor } from '../types/contractor.js';
import type { ClassificationResult } from '../types/classification.js';

export async function classifySingle(
  contractorId: string,
): Promise<ClassificationResult> {
  const contractor = await getContractor(contractorId);
  if (!contractor) {
    throw new Error(`Contractor not found: ${contractorId}`);
  }

  await checkBudget('contractor', contractorId, 0.002, 'gpt-5.2-mini');

  const cachePrompt = JSON.stringify({
    name: contractor.name,
    totalPaidYtd: contractor.totalPaidYtd,
    paymentCount: contractor.paymentCount,
  });

  const result = await cachedClassify(cachePrompt, async () => {
    const res = await classifyContractor(contractor);
    if (res.confidence < 0.6) {
      return {
        ...res,
        classification: 'unclassified' as const,
      };
    }
    return res;
  });

  await updateContractor(contractorId, {
    classification: result.classification,
  } as Partial<Contractor>);

  await storeContractorMemory(contractorId, {
    contractor,
    classification: result,
  });

  await recordSpend(contractorId, 'contractor', contractorId, 0.002);

  return result;
}

export async function batchClassify(
  contractorIds: string[],
  concurrency = 5,
): Promise<ClassificationResult[]> {
  if (contractorIds.length === 0) return [];

  const results: ClassificationResult[] = [];
  const queue = [...contractorIds];

  async function worker() {
    while (queue.length > 0) {
      const id = queue.shift()!;
      const result = await classifySingle(id);
      results.push(result);
    }
  }

  const workers = Array.from({ length: Math.min(concurrency, contractorIds.length) }, () => worker());
  await Promise.all(workers);

  return results;
}
Expected output: A call to classifySingle('c1') fetches the contractor, checks budget, hits the cache (or calls OpenAI), updates the contractor record, stores memory, and logs spend. Low-confidence results (< 0.6) are marked unclassified.
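batchClassify pushes results as workers finish, so the output order follows completion order rather than input order. If callers need results aligned with the input IDs, a variant along these lines may help. `mapWithConcurrency` is a hypothetical generic helper, not part of the recipe's codebase.

```typescript
// Run fn over items with at most `concurrency` calls in flight,
// writing each result to the slot matching its input index.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  concurrency: number,
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;

  async function worker(): Promise<void> {
    // Claiming an index is synchronous, so two workers never take the same item.
    while (next < items.length) {
      const index = next++;
      results[index] = await fn(items[index]);
    }
  }

  const workerCount = Math.max(1, Math.min(concurrency, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

A call such as `mapWithConcurrency(contractorIds, 5, classifySingle)` would then return one result per ID in the original order. As with the original worker loop, a single rejected call rejects the whole batch.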
Step 9: Create the W-9 handler
Create src/services/w9-handler.ts. This manages the full W-9 lifecycle: sending requests, sending reminders, recording PDF receipt with TIN extraction, and expiring stale requests.
ts
import pdf from 'pdf-parse';
import type { W9Request } from '../types/w9.js';
import { getContractor, updateContractor } from './contractor-store.js';
import { sendEmail } from './email-service.js';
import { generateW9EmailDraft } from './openai-client.js';
import { newW9Id } from './runbook-service.js';

const w9Store = new Map<string, W9Request>();

function getStatusDate(w9Request: W9Request): Date {
  return w9Request.sentAt;
}

function daysSince(date: Date): number {
  const diff = Date.now() - date.getTime();
  return Math.floor(diff / (1000 * 60 * 60 * 24));
}

function createDataUrl(buffer: Buffer, mimeType: string): string {
  const base64 = buffer.toString('base64');
  return `data:${mimeType};base64,${base64}`;
}

export async function requestW9(contractorId: string): Promise<W9Request> {
  const contractor = await getContractor(contractorId);
  if (!contractor) {
    throw new Error(`Contractor not found: ${contractorId}`);
  }

  const now = new Date();
  const expiresAt = new Date(now.getTime() + 120 * 24 * 60 * 60 * 1000);
  const id = newW9Id();

  const emailDraft = await generateW9EmailDraft(contractor);
  await sendEmail(contractor.email, emailDraft.subject, emailDraft.bodyHtml);

  const w9Request: W9Request = {
    id,
    contractorId,
    sentAt: now,
    status: 'sent',
    expiresAt,
  };
  w9Store.set(id, w9Request);

  await updateContractor(contractorId, { w9Status: 'sent' });

  return w9Request;
}
The handler also includes:
sendReminder — sends a follow-up email with a REMINDER: prefix if at least 7 days have passed since the last request and the status is sent or reminded.
recordW9Receipt — takes a PDF buffer, parses it with pdf-parse, uses a regex (\d{2}-\d{7}) to extract the TIN, compares it to the stored contractor record, and flags discrepancies. The pattern matches the EIN layout (NN-NNNNNNN) only, so an SSN written as NNN-NN-NNNN is not extracted. Returns { extractedTin, extractedName, discrepancies }.
getW9Status — returns the most recent W-9 request for a contractor (or null if none exist).
expireStaleRequests — marks requests older than 120 days as expired and returns the count of expired items.
The w9Store is an in-memory Map (see “Next steps” for making it durable).
Expected output: A W-9 request generates an OpenAI-drafted email, sends it via Gmail, and stores the request. Receiving a W-9 extracts the TIN from the PDF and compares it to the stored EIN/SSN.
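The TIN check in recordW9Receipt can be sketched as two small pure functions operating on the text that pdf-parse returns. This is an illustration under assumptions, not the handler's actual code: `extractTin` and `findTinDiscrepancies` are hypothetical names, the SSN pattern is an addition beyond the EIN-only regex described above, and real W-9 PDFs may not yield hyphenated digits when their form fields are extracted.

```typescript
// EIN layout: NN-NNNNNNN. SSN layout: NNN-NN-NNNN.
const EIN_PATTERN = /\b\d{2}-\d{7}\b/;
const SSN_PATTERN = /\b\d{3}-\d{2}-\d{4}\b/;

// Return the first EIN- or SSN-shaped token in the text, or null if none is found.
function extractTin(text: string): string | null {
  const match = text.match(EIN_PATTERN) ?? text.match(SSN_PATTERN);
  return match ? match[0] : null;
}

// Strip formatting so "12-3456789" and "123456789" compare as equal.
function digitsOnly(tin: string): string {
  return tin.replace(/\D/g, "");
}

// Compare the TIN found in the W-9 text against the stored contractor record.
function findTinDiscrepancies(pdfText: string, storedTin: string): string[] {
  const extracted = extractTin(pdfText);
  if (extracted === null) return ["No TIN found in W-9 text"];
  if (digitsOnly(extracted) !== digitsOnly(storedTin)) {
    return [`TIN mismatch: W-9 shows ${extracted}, record shows ${storedTin}`];
  }
  return [];
}
```

Comparing digits only avoids false mismatches when the stored value and the form use different punctuation.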
Step 10: Build the 1099-NEC form generator
Create src/services/form1099-generator.ts. It uses pdf-lib to generate IRS-style 1099-NEC forms as PDFs and xlsx to generate summary reports:
ts
import { PDFDocument, StandardFonts, rgb } from 'pdf-lib';
import * as XLSX from 'xlsx/xlsx.mjs';
import type { Form1099 } from '../types/form1099.js';
import type { Contractor } from '../types/contractor.js';
import { getContractor, listContractors } from './contractor-store.js';
import { newFormId } from './runbook-service.js';

const formStore = new Map<string, Form1099>();

function generateFormId(): string {
  return newFormId();
}

export
The file also includes generateSummaryReport, which creates an XLSX workbook with one row per 1099-NEC contractor (name, EIN/SSN, total paid, W-9 status).
Expected output: A single generateForm1099 call produces a data-URL-encoded PDF with IRS-style labels. Batch generation processes every eligible contractor and returns separate arrays for generated vs skipped (below $600 threshold).
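The generated-versus-skipped split in batch generation comes down to a partition over the contractor list. The sketch below shows one way to express it. `PayeeTotals` and `partitionForFiling` are hypothetical names, and the threshold is a parameter with the $600 default used in this tutorial so it can be adjusted per tax year.

```typescript
type Classification = "1099-nec" | "non-reportable" | "unclassified";

// Hypothetical minimal view of a contractor for filing decisions.
interface PayeeTotals {
  id: string;
  classification: Classification;
  totalPaidYtd: number;
}

// Split 1099-NEC contractors into those that get a form and those below the threshold.
// Contractors with any other classification are ignored entirely.
function partitionForFiling(
  contractors: readonly PayeeTotals[],
  thresholdUsd = 600,
): { generated: string[]; skipped: string[] } {
  const generated: string[] = [];
  const skipped: string[] = [];
  for (const contractor of contractors) {
    if (contractor.classification !== "1099-nec") continue;
    if (contractor.totalPaidYtd >= thresholdUsd) {
      generated.push(contractor.id);
    } else {
      skipped.push(contractor.id);
    }
  }
  return { generated, skipped };
}
```

A contractor paid exactly $600 lands in generated, matching the >= comparison used elsewhere in the pipeline.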
Step 11: Wire up the year-end pipeline
Create src/agent/year-end-pipeline.ts. This is the high-level orchestrator that runs a full year-end processing cycle: classify unclassified contractors, request W-9s from 1099-NEC contractors who haven’t submitted one, and generate 1099-NEC forms for those who have:
ts
import { listContractors, getContractor } from '../services/contractor-store.js';
import { classifySingle } from '../services/classifier-agent.js';
import { requestW9 } from '../services/w9-handler.js';
import { generateForm1099, generateSummaryReport } from '../services/form1099-generator.js';
import { withTrace } from '../services/langfuse-service.js';
import type { Contractor } from '../types/contractor.js';

export async function runYearEndPipeline(taxYear: number): Promise<{
  formsGenerated: number;
  w9sRequested: number;
  skipped: number;
  errors: string[];
}> {
  return withTrace('year-end-pipeline', async () => {
    const errors: string[] = [];
    let formsGenerated = 0;
    let w9sRequested = 0;
    let skipped = 0;

    const allContractors = await listContractors();

    for (const contractor of allContractors) {
      try {
        if (contractor.classification === 'non-reportable') {
          skipped++;
          continue;
        }

        if (contractor.classification === 'unclassified') {
          await classifySingle(contractor.id);
          const updatedContractor = await getContractor(contractor.id);
          if (updatedContractor && updatedContractor.classification === 'unclassified') {
            skipped++;
            continue;
          }
          // Only a contractor newly classified as 1099-NEC should receive a W-9 request.
          if (
            updatedContractor &&
            updatedContractor.classification === '1099-nec' &&
            updatedContractor.w9Status === 'not_requested'
          ) {
            await requestW9(contractor.id);
            w9sRequested++;
          }
        }

        if (contractor.classification === '1099-nec' && contractor.w9Status !== 'received') {
          await requestW9(contractor.id);
          w9sRequested++;
          continue;
        }

        if (
          contractor.classification === '1099-nec' &&
          contractor.totalPaidYtd >= 600 &&
          contractor.w9Status === 'received'
        ) {
          await generateForm1099(contractor.id, taxYear);
          formsGenerated++;
        }
      } catch (err) {
        errors.push(`Error processing contractor ${contractor.id}: ${err}`);
      }
    }

    await generateSummaryReport(taxYear);

    return { formsGenerated, w9sRequested, skipped, errors };
  });
}
The pipeline wraps the entire run in a Langfuse trace via withTrace. Each contractor is processed individually so an error with one doesn’t block the others. The file also exports getPipelineStatus(), which returns a dashboard-style snapshot of all contractor states for monitoring.
Expected output: Running the pipeline with 4 sample contractors produces 1 form generated, 2 W-9s requested, and 1 skipped (non-reportable).
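The branching inside the pipeline loop is easier to reason about, and to unit test, when the first-pass decision for each contractor is isolated in a pure function. The sketch below is one way to do that. `nextAction`, `PipelineAction`, and `ContractorState` are hypothetical names, and the W9Status values listed are an assumption based on the statuses that appear in this tutorial.

```typescript
type Classification = "1099-nec" | "non-reportable" | "unclassified";
type W9Status = "not_requested" | "sent" | "received";
type PipelineAction = "skip" | "classify" | "request-w9" | "generate-form" | "none";

interface ContractorState {
  classification: Classification;
  w9Status: W9Status;
  totalPaidYtd: number;
}

// Decide what the pipeline should do first for a contractor in the given state.
function nextAction(contractor: ContractorState, thresholdUsd = 600): PipelineAction {
  if (contractor.classification === "non-reportable") return "skip";
  if (contractor.classification === "unclassified") return "classify";
  if (contractor.w9Status !== "received") return "request-w9";
  return contractor.totalPaidYtd >= thresholdUsd ? "generate-form" : "none";
}

nextAction({ classification: "1099-nec", w9Status: "received", totalPaidYtd: 15000 });   // "generate-form"
nextAction({ classification: "1099-nec", w9Status: "not_requested", totalPaidYtd: 800 }); // "request-w9"
nextAction({ classification: "non-reportable", w9Status: "not_requested", totalPaidYtd: 200 }); // "skip"
```

An unclassified contractor returns "classify" here; in the pipeline, the follow-up W-9 request happens after the classification result is read back.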
Step 12: Create API routes
Now wire the services into Next.js App Router route handlers. All routes use NextRequest/NextResponse from next/server and export named functions for each HTTP verb.
app/api/contractors/route.ts — list and create contractors:
app/api/contractors/[id]/route.ts — get, update, delete a single contractor. Uses params: Promise<{ id: string }> for Next 16 dynamic route compatibility.
app/api/classify/route.ts — classify a single contractor:
ts
import { NextRequest, NextResponse } from 'next/server.js';
import { z } from 'zod';
import { classifySingle } from '../../../src/services/classifier-agent.js';
import { validateWithSchema } from '../../../src/services/runbook-service.js';

const bodySchema = z.object({
  contractorId: z.string().min(1),
});

export async function POST(req: NextRequest) {
  const parsed = validateWithSchema(bodySchema, await req.json());
  if (!parsed.valid) {
    return NextResponse.json({ error: parsed.errors }, { status: 400 });
  }

  const result = await classifySingle(parsed.data!.contractorId);
  return NextResponse.json(result);
}
app/api/w9/request/route.ts — request a W-9 for a contractor:
ts
import { NextRequest, NextResponse } from 'next/server.js';
import { z } from 'zod';
import { requestW9 } from '../../../../src/services/w9-handler.js';
import { validateWithSchema } from '../../../../src/services/runbook-service.js';

const bodySchema = z.object({
  contractorId: z.string().min(1),
});

export async function POST(req: NextRequest) {
  const parsed = validateWithSchema(bodySchema, await req.json());
  if (!parsed.valid) {
    return NextResponse.json({ error: parsed.errors }, { status: 400 });
  }

  const w9Request = await requestW9(parsed.data!.contractorId);
  return NextResponse.json(w9Request);
}
app/api/w9/receive/route.ts — accept W-9 PDF upload and extract TIN:
ts
import { NextRequest, NextResponse } from 'next/server.js';
import { recordW9Receipt, getW9Status } from '../../../../src/services/w9-handler.js';

export async function POST(req: NextRequest) {
  const formData = await req.formData();
  const contractorId = formData.get('contractorId') as string | null;
  const file = formData.get('file') as Blob | null;

  if (!contractorId || !file) {
    return NextResponse.json({ error: 'contractorId and file fields are required' }, { status: 400 });
  }

  const currentStatus = await getW9Status(contractorId);
  if (currentStatus && currentStatus.status === 'received') {
    return NextResponse.json({ error: 'W-9 already received' }, { status: 409 });
  }

  const arrayBuffer = await file.arrayBuffer();
  const buffer = Buffer.from(arrayBuffer);
  const result = await recordW9Receipt(contractorId, buffer);
  return NextResponse.json(result);
}
app/api/forms/1099/generate/route.ts — generate a single 1099-NEC form. Validates contractorId and taxYear, checks eligibility (>= $600 threshold), and returns the generated form.
app/api/forms/1099/batch/route.ts — batch generate 1099-NEC forms for all eligible contractors. Returns { generated: count, skipped: count, urls: string[] }.
app/api/forms/report/route.ts — generate an XLSX summary report. Accepts optional taxYear query parameter (defaults to current year).
app/api/dashboard/route.ts — returns a status snapshot of all contractors. Expected output shape:
Expected output: All routes use NextRequest/NextResponse and NextResponse.json(). Dynamic route params use params: Promise<{ id: string }> for Next 16 compatibility.
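The Promise-typed params convention for dynamic routes looks roughly like this in practice. The sketch uses the standard Web Request and Response types, which route handlers also accept, so that it runs without Next.js installed. The `contractors` Map is a hypothetical stand-in for the getContractor call in the store, and `RouteContext` is an illustrative name.

```typescript
// Sketch of app/api/contractors/[id]/route.ts using Web-standard types.
interface RouteContext {
  params: Promise<{ id: string }>;
}

// Hypothetical in-memory lookup standing in for the Supabase-backed store.
const contractors = new Map([["c1", { id: "c1", name: "NEC Corp" }]]);

export async function GET(_req: Request, context: RouteContext): Promise<Response> {
  // params is a Promise, so it must be awaited before reading the id.
  const { id } = await context.params;
  const contractor = contractors.get(id);
  if (!contractor) {
    return Response.json({ error: `Contractor not found: ${id}` }, { status: 404 });
  }
  return Response.json(contractor);
}
```

Because the handler depends only on a Request and a context object, it can be called directly in a unit test by passing `{ params: Promise.resolve({ id: 'c1' }) }`.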
Step 13: Write and run the test suite
The test suite uses vitest with MSW (Mock Service Worker) to intercept HTTP calls to OpenAI, Supabase, and Gmail. Create tests/setup.ts:
Services are mocked with vi.mock and loaded dynamically with await import(...). Here’s the year-end pipeline test (tests/agent/year-end-pipeline.test.ts):
ts
import { describe, it, expect, vi } from 'vitest';

// Contractor fixtures
const necContractor = {
  id: 'c1',
  name: 'NEC Corp',
  email: 'nec@example.com',
  einOrSsn: '12-3456789',
  address: '123 Main St',
  totalPaidYtd: 15000,
  paymentCount: 12,
  classification: '1099-nec' as const,
  w9Status: 'received' as const,
  createdAt: new Date(),
  updatedAt: new Date(),
};

const unclassifiedContractor = {
  id: 'c2',
  name: 'Unclassified LLC',
  email: 'unclass@example.com',
  einOrSsn: '98-7654321'
Run the full suite:
terminal
pnpm test
Expected output: All tests pass, and coverage meets the 90%+ thresholds on lines, branches, functions, and statements for runtime code under src/**/*.ts and app/**/route.ts.
Step 14: Wire up the remaining services
Several supporting services complete the agent. These are already in the codebase and used by the main flow described above:
src/services/email-service.ts — wraps the Gmail API via googleapis. Creates an OAuth2 client with your Google credentials and sends HTML emails through gmail.users.messages.send.
src/services/langfuse-service.ts — initializes Langfuse for tracing. withTrace() wraps any async function in a trace, automatically flushing and handling unreachable Langfuse servers gracefully.
src/services/runbook-service.ts — provides newW9Id(), newFormId(), and validateWithSchema() using @reaatech/agent-runbook. The validation function wraps Zod schemas and returns { valid, data, errors } tuples.
src/services/handoff-service.ts — provides classifyWithRetry (exponential backoff retry logic wrapped in @reaatech/agent-handoff’s withRetry) and a typed event bus (TypedEventEmitter) for lifecycle events like classification_complete, w9_received, and form_generated.
src/services/spreadsheet-service.ts — exports exportContractorsToXlsx and exportFilingSummaryToXlsx for generating Excel exports from contractor data.
src/services/eval-service.ts — runs classification evaluation via @reaatech/agent-eval-harness-suite, measuring faithfulness, relevance, cost, and latency.
src/agent/contract-classification-agent.ts — higher-level agent that composes classification with retry, cache, trace, budget control, and event emission.
These modules are used by the core pipeline but don’t need their own steps in this walkthrough — they’re ready to use.
Expected output: The full agent compiles with pnpm typecheck and all tests pass with pnpm test.
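The exponential backoff that classifyWithRetry relies on can be sketched independently of any package. This is a generic illustration, not the withRetry implementation from @reaatech/agent-handoff: `retryWithBackoff` and `RetryOptions` are hypothetical names, and doubling the delay after each failure is an assumed policy.

```typescript
interface RetryOptions {
  maxAttempts: number;
  baseDelayMs: number;
}

// Call fn until it succeeds or maxAttempts is reached,
// doubling the wait after each failed attempt.
async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  { maxAttempts, baseDelayMs }: RetryOptions,
): Promise<T> {
  if (maxAttempts < 1) throw new Error("maxAttempts must be at least 1");
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt === maxAttempts) break;
      // Delays grow as base, 2x base, 4x base, and so on.
      const delayMs = baseDelayMs * 2 ** (attempt - 1);
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw lastError;
}
```

A call such as `retryWithBackoff(() => classifySingle(id), { maxAttempts: 3, baseDelayMs: 500 })` would wait 500 ms and then 1000 ms between attempts before giving up with the last error.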
Next steps
Persist W-9 requests and forms — Move from the in-memory Map stores (w9Store, formStore) to Supabase tables for durability across server restarts.
Add webhook notifications — Use the eventBus from handoff-service.ts to fire webhooks when a W-9 is received or a 1099 is generated, so downstream billing systems can react.
Schedule the pipeline — Wire runYearEndPipeline into a cron job (e.g., Vercel Cron Jobs) to run monthly during tax season.
Build a dashboard UI — Replace the placeholder app/page.tsx with real React Server Components fetching from /api/dashboard for a visual overview.
email: row.email as string,
einOrSsn: row.einOrSsn as string,
address: row.address as string,
totalPaidYtd: row.totalPaidYtd as number,
paymentCount: row.paymentCount as number,
classification: row.classification as Classification,