# Google Gemini Lead Intake for Salesforce SMB Sales
> Automatically extract and qualify leads from email attachments and forms, pushing structured lead records directly to Salesforce.
A tutorialized reference solution from [reaatech.com](https://reaatech.com), demonstrating how to build production-grade AI systems with the `@reaatech/*` package family.
## Problem
Sales teams receive hundreds of leads buried in email attachments and web forms. Manually extracting contact details, qualifying intent, and entering data into Salesforce is slow, error-prone, and delays follow-up. This recipe automates the entire pipeline.
## Architecture
```
Inbound (email/form) → Document Parsing → Gemini Field Extraction → PII Scrubbing
→ Confidence Routing → Salesforce Upsert
```
The diagram shows the main data path only. The list below also covers the caching, budget, and observability steps (5, 6, and 9), which the diagram omits.
1. **Inbound** — HTTP POST receives files and form fields
2. **Document Parsing** — `unpdf` extracts PDF text; `tesseract.js` OCRs images; plain text passes through
3. **Field Extraction** — `@google/genai` (Gemini 2.5 Flash) extracts structured lead fields from raw text
4. **PII Scrubbing** — `@presidio-dev/hai-guardrails` detects injection attacks and redacts sensitive data
5. **LLM Caching** — `@reaatech/llm-cache` caches repeated extraction prompts to reduce cost and latency
6. **Budget Enforcement** — `@reaatech/agent-budget-engine` tracks spend per lead and rejects over-budget requests
7. **Confidence Routing** — `@reaatech/confidence-router` classifies leads as `route`/`clarify`/`fallback`
8. **Salesforce Upsert** — `jsforce` pushes qualified leads to Salesforce, keyed by email
9. **Observability** — `langfuse` traces every pipeline execution
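
As a rough illustration of step 7, confidence routing can be thought of as a comparison against two thresholds. The sketch below is not the `@reaatech/confidence-router` API. The names `decideRoute` and `Thresholds` are hypothetical, and the boundary behavior is an assumption: a score at or above the route threshold routes, a score strictly below the fallback threshold falls back, and anything in between asks for clarification. The defaults mirror the documented `0.8` and `0.3` values.

```typescript
// Hypothetical sketch; NOT the @reaatech/confidence-router API.
type RoutingDecision = "route" | "clarify" | "fallback";

interface Thresholds {
  route: number;    // assumed: confidence >= route is auto-routed
  fallback: number; // assumed: confidence < fallback falls back
}

// Defaults taken from CONFIDENCE_ROUTE_THRESHOLD and CONFIDENCE_FALLBACK_THRESHOLD.
const DEFAULT_THRESHOLDS: Thresholds = { route: 0.8, fallback: 0.3 };

function decideRoute(
  confidence: number,
  thresholds: Thresholds = DEFAULT_THRESHOLDS,
): RoutingDecision {
  // Reject scores outside [0, 1] (including NaN) instead of silently routing them.
  if (!Number.isFinite(confidence) || confidence < 0 || confidence > 1) {
    throw new RangeError(`confidence must be in [0, 1], got ${confidence}`);
  }
  if (confidence >= thresholds.route) return "route";
  if (confidence < thresholds.fallback) return "fallback";
  return "clarify";
}
```

Under these assumptions, a lead scored `0.92` would be routed, one scored `0.5` would be sent for clarification, and one scored `0.1` would fall back.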
## Dependencies
| Package | Role |
|---------|------|
| `@google/genai` | Google Gemini API client for field extraction |
| `@presidio-dev/hai-guardrails` | Prompt injection detection and PII redaction |
| `@reaatech/agent-budget-engine` | Per-lead budget tracking and enforcement |
| `@reaatech/classifier-evals` | Evaluation metrics for the classifier |
| `@reaatech/confidence-router` | Route/clarify/fallback decision engine |
| `@reaatech/hybrid-rag` | Document chunking and retrieval (type-only) |
| `@reaatech/llm-cache` | Semantic caching for Gemini calls |
| `jsforce` | Salesforce SOAP/REST API client |
| `langfuse` | LLM observability and tracing |
| `next` | Next.js App Router (API routes and UI) |
| `tesseract.js` | OCR for image-based lead documents |
| `unpdf` | PDF text extraction |
| `zod` | Runtime schema validation |
## Environment Variables
| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `GEMINI_API_KEY` | Yes | — | Google Gemini API key |
| `OPENAI_API_KEY` | Yes | — | OpenAI key for `@reaatech/llm-cache` embedder |
| `SALESFORCE_USERNAME` | Yes | — | Salesforce login username |
| `SALESFORCE_PASSWORD` | Yes | — | Salesforce login password |
| `SALESFORCE_SECURITY_TOKEN` | Yes | — | Salesforce security token |
| `LANGFUSE_PUBLIC_KEY` | Yes | — | Langfuse project public key |
| `LANGFUSE_SECRET_KEY` | Yes | — | Langfuse project secret key |
| `LANGFUSE_HOST` | No | `https://cloud.langfuse.com` | Langfuse API host |
| `LEAD_BUDGET_LIMIT` | No | `10.0` | Maximum spend in USD per lead budget scope |
| `CONFIDENCE_ROUTE_THRESHOLD` | No | `0.8` | Minimum confidence for a lead to be auto-routed |
| `CONFIDENCE_FALLBACK_THRESHOLD` | No | `0.3` | Confidence below which a lead falls back |
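
One way to read the table above is as a startup check: fail fast when a required variable is missing, and apply the documented defaults otherwise. The following is a minimal sketch under that assumption. `loadConfig`, `PipelineConfig`, and the field names are hypothetical and are not taken from the project source; only the variable names and defaults come from the table.

```typescript
// Hypothetical config loader; only the variable names and defaults come from the README.
type Env = Record<string, string | undefined>;

interface PipelineConfig {
  langfuseHost: string;
  leadBudgetLimitUsd: number;
  routeThreshold: number;
  fallbackThreshold: number;
}

const REQUIRED_VARS = [
  "GEMINI_API_KEY",
  "OPENAI_API_KEY",
  "SALESFORCE_USERNAME",
  "SALESFORCE_PASSWORD",
  "SALESFORCE_SECURITY_TOKEN",
  "LANGFUSE_PUBLIC_KEY",
  "LANGFUSE_SECRET_KEY",
] as const;

// Parse an optional numeric variable, falling back to the documented default.
function numberFrom(env: Env, key: string, fallback: number): number {
  const raw = env[key];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (!Number.isFinite(value)) {
    throw new Error(`${key} must be a number, got "${raw}"`);
  }
  return value;
}

function loadConfig(env: Env): PipelineConfig {
  const missing = REQUIRED_VARS.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  const config: PipelineConfig = {
    langfuseHost: env["LANGFUSE_HOST"] || "https://cloud.langfuse.com",
    leadBudgetLimitUsd: numberFrom(env, "LEAD_BUDGET_LIMIT", 10.0),
    routeThreshold: numberFrom(env, "CONFIDENCE_ROUTE_THRESHOLD", 0.8),
    fallbackThreshold: numberFrom(env, "CONFIDENCE_FALLBACK_THRESHOLD", 0.3),
  };
  // Assumed sanity check: the fallback threshold should not exceed the route threshold.
  if (config.fallbackThreshold > config.routeThreshold) {
    throw new Error("CONFIDENCE_FALLBACK_THRESHOLD must not exceed CONFIDENCE_ROUTE_THRESHOLD");
  }
  return config;
}
```

In a Node.js process this would typically be called once at startup as `loadConfig(process.env)`. Taking the environment as a parameter keeps the function easy to test.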
## Getting Started
```bash
pnpm install
cp .env.example .env # fill in real credentials
pnpm dev # http://localhost:3000
pnpm test # vitest run with coverage
pnpm typecheck # TypeScript check
pnpm lint # ESLint check
```
## API Usage
Submit a lead as a file attachment:
```bash
curl -X POST http://localhost:3000/api/inbound \
  -F "file=@lead.pdf" \
  -F "source=email_attachment"
```
Submit from a web form (plain text):
```bash
curl -X POST http://localhost:3000/api/inbound \
  -F "file=@-;type=text/plain" \
  -F "source=web_form" \
  -F "name=John" \
  -F "company=Acme" \
  <<< "John from Acme Corp needs CRM software"
```
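
The web form request can also be sent from TypeScript. The sketch below builds the same multipart body as the `curl` example, using the `fetch`, `FormData`, and `Blob` globals available in Node.js 18+ and in browsers. The helper names `buildWebFormLead` and `submitLead` and the `lead.txt` filename are hypothetical. The endpoint and field names are copied from the example above.

```typescript
// Hypothetical client helpers; endpoint and field names are copied from the curl example.
function buildWebFormLead(text: string, fields: Record<string, string>): FormData {
  const form = new FormData();
  // The filename "lead.txt" is an arbitrary choice for illustration.
  form.append("file", new Blob([text], { type: "text/plain" }), "lead.txt");
  form.append("source", "web_form");
  for (const [key, value] of Object.entries(fields)) {
    form.append(key, value);
  }
  return form;
}

async function submitLead(
  form: FormData,
  baseUrl = "http://localhost:3000",
): Promise<unknown> {
  // fetch sets the multipart Content-Type header (with boundary) automatically.
  const res = await fetch(`${baseUrl}/api/inbound`, { method: "POST", body: form });
  if (!res.ok) {
    throw new Error(`Lead intake failed with HTTP ${res.status}`);
  }
  return res.json();
}
```

With the dev server running, a call such as `submitLead(buildWebFormLead("John from Acme Corp needs CRM software", { name: "John", company: "Acme" }))` should be equivalent to the `curl` command.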
### Success Response (200)
```json
{
  "lead": {
    "name": "John Smith",
    "company": "Acme Corp",
    "email": "john@acme.com",
    "phone": "+1-555-0100",
    "needs": "CRM software",
    "source": "email_attachment",
    "confidence": 0.92,
    "routingDecision": "route",
    "salesforceId": "00Qxxxxxxxxxxxxx"
  }
}
```
```
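
Callers may want to validate this payload before using it. The sketch below derives a TypeScript type and a runtime guard from the example response. The names `LeadResponse` and `isLeadResponse` are hypothetical. Treating `salesforceId` as optional is also an assumption, made because a lead that is not routed may never reach Salesforce. The README does not say which fields are always present.

```typescript
// Hypothetical type and guard derived from the example response above.
type RoutingDecision = "route" | "clarify" | "fallback";

interface LeadResponse {
  lead: {
    name: string;
    company: string;
    email: string;
    phone: string;
    needs: string;
    source: string;
    confidence: number;
    routingDecision: RoutingDecision;
    salesforceId?: string; // assumed optional; see the note above
  };
}

const STRING_FIELDS = ["name", "company", "email", "phone", "needs", "source"] as const;
const DECISIONS: readonly string[] = ["route", "clarify", "fallback"];

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function isLeadResponse(value: unknown): value is LeadResponse {
  if (!isRecord(value)) return false;
  const lead = value["lead"];
  if (!isRecord(lead)) return false;
  // Every descriptive field in the example is a string.
  if (!STRING_FIELDS.every((field) => typeof lead[field] === "string")) return false;
  const confidence = lead["confidence"];
  if (typeof confidence !== "number" || confidence < 0 || confidence > 1) return false;
  const decision = lead["routingDecision"];
  if (typeof decision !== "string" || !DECISIONS.includes(decision)) return false;
  const id = lead["salesforceId"];
  return id === undefined || typeof id === "string";
}
```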
### Error Responses
| Code | Condition |
|------|-----------|
| 400 | No file provided |
| 415 | Unsupported file type |
| 422 | Prompt injection detected in input |
| 429 | Lead budget exhausted |
| 500 | Internal server error |
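
A handler could produce these codes by mapping typed pipeline failures to statuses in one place. The sketch below shows one possible shape for that mapping. `PipelineError`, its `code` values, and `statusFor` are hypothetical names and do not come from the project source. Only the status codes and their conditions come from the table above.

```typescript
// Hypothetical error-to-status mapping; only the status codes come from the table.
type PipelineErrorCode =
  | "NO_FILE"
  | "UNSUPPORTED_TYPE"
  | "INJECTION_DETECTED"
  | "BUDGET_EXHAUSTED";

class PipelineError extends Error {
  readonly code: PipelineErrorCode;

  constructor(code: PipelineErrorCode, message: string) {
    super(message);
    this.name = "PipelineError";
    this.code = code;
    // Keeps instanceof working when compiling to older JavaScript targets.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

const STATUS_BY_CODE: Record<PipelineErrorCode, number> = {
  NO_FILE: 400,
  UNSUPPORTED_TYPE: 415,
  INJECTION_DETECTED: 422,
  BUDGET_EXHAUSTED: 429,
};

// Anything that is not a known pipeline failure is reported as a 500.
function statusFor(error: unknown): number {
  return error instanceof PipelineError ? STATUS_BY_CODE[error.code] : 500;
}
```

Keeping the mapping in a single `Record` means the compiler flags any new error code that lacks a status.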
## Testing
```bash
pnpm test # full suite with coverage
pnpm typecheck # TypeScript check
pnpm lint # ESLint check
```
Coverage thresholds are 90% each for lines, branches, functions, and statements.
## License
MIT — see [LICENSE](./LICENSE).