# Google Gemini Lead Intake for Salesforce SMB Sales
 
> Automatically extract and qualify leads from email attachments and forms, pushing structured lead records directly to Salesforce.
 
A tutorial-style reference solution from [reaatech.com](https://reaatech.com), demonstrating how to build production-grade AI systems with the `@reaatech/*` package family.
 
## Problem
 
Sales teams receive hundreds of leads buried in email attachments and web forms. Manually extracting contact details, qualifying intent, and entering data into Salesforce is slow, error-prone, and delays follow-up. This recipe automates the entire pipeline.
 
## Architecture
 
```
Inbound (email/form) → Document Parsing → Gemini Field Extraction → PII Scrubbing
  → Confidence Routing → Salesforce Upsert
```
 
1. **Inbound**: an HTTP POST receives files and form fields
2. **Document Parsing**: `unpdf` extracts PDF text; `tesseract.js` OCRs images; plain text passes through
3. **Field Extraction**: `@google/genai` (Gemini 2.5 Flash) extracts structured lead fields from raw text
4. **PII Scrubbing**: `@presidio-dev/hai-guardrails` detects prompt-injection attempts and redacts sensitive data
5. **LLM Caching**: `@reaatech/llm-cache` caches repeated extraction prompts to reduce cost and latency
6. **Budget Enforcement**: `@reaatech/agent-budget-engine` tracks spend per lead and rejects over-budget requests
7. **Confidence Routing**: `@reaatech/confidence-router` classifies leads as `route`/`clarify`/`fallback`
8. **Salesforce Upsert**: `jsforce` pushes qualified leads to Salesforce, keyed by email
9. **Observability**: `langfuse` traces every pipeline execution
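
The document-parsing step (step 2) amounts to a dispatch on the uploaded file's MIME type. The sketch below shows one way that dispatch could look. The `ParserKind` labels and the `pickParser` function are hypothetical names chosen for illustration, not taken from the project source, and the exact set of accepted MIME types is an assumption.

```typescript
// Hypothetical labels: the real project would wire these to unpdf and tesseract.js.
type ParserKind = "pdf" | "ocr" | "text";

// Returns the parser that should handle a MIME type, or null when unsupported.
function pickParser(mimeType: string): ParserKind | null {
  // Drop parameters such as "; charset=utf-8" and normalise case.
  const base = (mimeType.split(";")[0] ?? "").trim().toLowerCase();
  if (base === "application/pdf") return "pdf";   // PDF text extraction
  if (base.startsWith("image/")) return "ocr";    // OCR for images
  if (base === "text/plain") return "text";       // plain text passes through
  return null;                                    // caller would answer with HTTP 415
}
```

Returning `null` rather than throwing keeps the "unsupported file type" decision in the route handler, where it can be turned into the documented 415 response.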
 
## Dependencies
 
| Package | Role |
|---------|------|
| `@google/genai` | Google Gemini API client for field extraction |
| `@presidio-dev/hai-guardrails` | Prompt injection detection and PII redaction |
| `@reaatech/agent-budget-engine` | Per-lead budget tracking and enforcement |
| `@reaatech/classifier-evals` | Evaluation metrics for the classifier |
| `@reaatech/confidence-router` | Route/clarify/fallback decision engine |
| `@reaatech/hybrid-rag` | Document chunking and retrieval (type-only) |
| `@reaatech/llm-cache` | Semantic caching for Gemini calls |
| `jsforce` | Salesforce SOAP/REST API client |
| `langfuse` | LLM observability and tracing |
| `next` | Next.js App Router (API routes and UI) |
| `tesseract.js` | OCR for image-based lead documents |
| `unpdf` | PDF text extraction |
| `zod` | Runtime schema validation |
 
## Environment Variables
 
| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `GEMINI_API_KEY` | Yes | — | Google Gemini API key |
| `OPENAI_API_KEY` | Yes | — | OpenAI key for `@reaatech/llm-cache` embedder |
| `SALESFORCE_USERNAME` | Yes | — | Salesforce login username |
| `SALESFORCE_PASSWORD` | Yes | — | Salesforce login password |
| `SALESFORCE_SECURITY_TOKEN` | Yes | — | Salesforce security token |
| `LANGFUSE_PUBLIC_KEY` | Yes | — | Langfuse project public key |
| `LANGFUSE_SECRET_KEY` | Yes | — | Langfuse project secret key |
| `LANGFUSE_HOST` | No | `https://cloud.langfuse.com` | Langfuse API host |
| `LEAD_BUDGET_LIMIT` | No | `10.0` | Maximum spend in USD per lead budget scope |
| `CONFIDENCE_ROUTE_THRESHOLD` | No | `0.8` | Confidence required to auto-route a lead |
| `CONFIDENCE_FALLBACK_THRESHOLD` | No | `0.3` | Confidence below which leads fall back |
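
To make the optional variables concrete, here is a minimal sketch of reading them with their documented defaults and turning a confidence score into a routing decision. It does not use the `@reaatech/confidence-router` API. `loadRoutingConfig` and `decide` are hypothetical helpers, and the boundary handling (route threshold inclusive, fallback threshold exclusive) is an assumption the table does not settle.

```typescript
type RoutingDecision = "route" | "clarify" | "fallback";

interface RoutingConfig {
  routeThreshold: number;
  fallbackThreshold: number;
  leadBudgetLimit: number;
}

// Parse a numeric variable, using the documented default when unset or invalid.
function numberFromEnv(value: string | undefined, fallback: number): number {
  if (value === undefined || value.trim() === "") return fallback;
  const parsed = Number(value);
  return Number.isFinite(parsed) ? parsed : fallback;
}

// Defaults mirror the table above: 0.8, 0.3, and 10.0 USD.
function loadRoutingConfig(env: Record<string, string | undefined>): RoutingConfig {
  return {
    routeThreshold: numberFromEnv(env["CONFIDENCE_ROUTE_THRESHOLD"], 0.8),
    fallbackThreshold: numberFromEnv(env["CONFIDENCE_FALLBACK_THRESHOLD"], 0.3),
    leadBudgetLimit: numberFromEnv(env["LEAD_BUDGET_LIMIT"], 10.0),
  };
}

// Assumption: >= routeThreshold routes, < fallbackThreshold falls back,
// and everything in between asks for clarification.
function decide(confidence: number, config: RoutingConfig): RoutingDecision {
  if (confidence >= config.routeThreshold) return "route";
  if (confidence < config.fallbackThreshold) return "fallback";
  return "clarify";
}
```

In a Node process, `loadRoutingConfig(process.env)` would pick up values from `.env`. With the defaults, a lead scored 0.92 is routed, 0.5 is sent for clarification, and 0.1 falls back.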
 
## Getting Started
 
```bash
pnpm install
cp .env.example .env   # fill in real credentials
pnpm dev               # http://localhost:3000
pnpm test              # vitest run with coverage
pnpm typecheck         # TypeScript check
pnpm lint              # ESLint check
```
 
## API Usage
 
Submit a lead as a file attachment:
 
```bash
curl -X POST http://localhost:3000/api/inbound \
  -F "file=@lead.pdf" \
  -F "source=email_attachment"
```
 
Submit from a web form (plain text read from stdin; the `<<<` here-string requires bash or zsh):
 
```bash
curl -X POST http://localhost:3000/api/inbound \
  -F "file=@-;type=text/plain" \
  -F "source=web_form" \
  -F "name=John" \
  -F "company=Acme" \
  <<< "John from Acme Corp needs CRM software"
```
 
### Success Response (200)
 
```json
{
  "lead": {
    "name": "John Smith",
    "company": "Acme Corp",
    "email": "john@acme.com",
    "phone": "+1-555-0100",
    "needs": "CRM software",
    "source": "email_attachment",
    "confidence": 0.92,
    "routingDecision": "route",
    "salesforceId": "00Qxxxxxxxxxxxxx"
  }
}
```
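
A client consuming this response may want to check its shape before trusting it. The sketch below mirrors the sample payload with a plain type guard. The project lists `zod` for runtime validation, so the real schema likely lives there. `LeadResponse` and `isLeadResponse` are hypothetical names, and treating every field as required is an assumption based only on the example above.

```typescript
interface LeadResponse {
  lead: {
    name: string;
    company: string;
    email: string;
    phone: string;
    needs: string;
    source: string;
    confidence: number;
    routingDecision: "route" | "clarify" | "fallback";
    salesforceId: string;
  };
}

// Narrow an unknown JSON value to LeadResponse by checking each field's type.
function isLeadResponse(value: unknown): value is LeadResponse {
  if (typeof value !== "object" || value === null) return false;
  const lead = (value as { lead?: unknown }).lead;
  if (typeof lead !== "object" || lead === null) return false;
  const record = lead as Record<string, unknown>;
  const stringFields = ["name", "company", "email", "phone", "needs", "source", "salesforceId"];
  const stringsOk = stringFields.every((key) => typeof record[key] === "string");
  const decision = record["routingDecision"];
  return (
    stringsOk &&
    typeof record["confidence"] === "number" &&
    (decision === "route" || decision === "clarify" || decision === "fallback")
  );
}
```

If leads that are not routed come back without a `salesforceId`, the guard would need that field relaxed to optional.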
 
### Error Responses
 
| Code | Condition |
|------|-----------|
| 400 | No file provided |
| 415 | Unsupported file type |
| 422 | Injection detected in input |
| 429 | Lead budget exhausted |
| 500 | Internal server error |
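
On the client side, the table above can be carried as a small lookup so that callers get a readable message for each status. This is only a sketch. `ERROR_CONDITIONS` and `describeStatus` are hypothetical names, and the message strings are copied from the table rather than from the actual response bodies, which this README does not show.

```typescript
// Conditions copied from the error table above.
const ERROR_CONDITIONS: Record<number, string> = {
  400: "No file provided",
  415: "Unsupported file type",
  422: "Injection detected in input",
  429: "Lead budget exhausted",
  500: "Internal server error",
};

// Map an HTTP status from /api/inbound to a human-readable description.
function describeStatus(status: number): string {
  if (status === 200) return "OK";
  return ERROR_CONDITIONS[status] ?? `Unexpected status ${status}`;
}
```

A caller could log `describeStatus(response.status)` after a `fetch` to `/api/inbound` before deciding whether to resubmit the lead.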
 
## Testing
 
```bash
pnpm test              # full suite with coverage
pnpm typecheck         # TypeScript type check
pnpm lint              # ESLint lint
```
 
Coverage thresholds: 90% each for lines, branches, functions, and statements.
 
## License
 
MIT — see [LICENSE](./LICENSE).