Digital Case Worker — AI-Powered Government Form Assistance Platform
Compliance-First · Secure by Design · Built on Google's Infrastructure
Version: v1.0 (Phase 1 Scope)
Motto: "The expertise of a case worker, the speed of AI, the heart of a navigator."
Infrastructure: Google Cloud / Vertex AI
MVP Focus: SSA / VA · 1–3 Form Types
⚠ Confirmed Out of Scope:
The "Pulse" portal login / automated government status scraping feature (originally Suggestion #5) has been removed from all build phases following client discussion. No portal credentials will be stored, and no RPA/bot check-ins will occur. Status notifications are limited to internal case stage updates.
Before We Write a Single Line of Code
Four Conversations We Must Have First
A design and strategy discussion is required before anything is built. Below are the four mandatory pre-build conversations, what each must resolve, and why skipping any of them creates expensive rework downstream.
🎨
Design System Discussion
We need a dedicated design session before the first pixel is placed. This platform handles sensitive personal data for vulnerable users — every design decision carries trust implications.
Define brand voice and visual language (not just a logo)
Establish accessibility baseline: WCAG 2.1 AA minimum
Agree on mobile-first vs. responsive priorities
Bilingual UI behavior: language switcher placement, RTL fallback plans
Decide on component library (recommendation: Material Design + custom tokens)
Define "anxiety reduction" principles — this UI serves people under stress
Prototype intake tone: formal government-adjacent vs. warm conversational
Define what "trust" looks like visually — disclaimers, consent screens, progress indicators
Milestone 0 prerequisite
🗺
User Journey Mapping
The full journey from first visit to delivered form must be agreed on paper before architecture begins. It determines auth flows, data persistence points, notification triggers, and hand-off protocols.
Map every touchpoint: onboarding, intake, document upload, review, delivery
Define "save and resume" behavior and session timeout policies
Clarify how a reviewer requests more info from the client
Define what "form delivered" means: PDF download, secure link, email, or all three?
Define the language selection UX — when and how does the user choose?
Clarify multi-case scenarios: can one user have multiple active applications?
Informs data model design
💰
Cost Analysis with Client
The client needs a realistic cost conversation now — not after build. Platform running costs, AI usage, document storage, and cost per case all need to be modelled before committing to a pricing model.
Platform infrastructure and security services baseline (~$800–1,200/month at MVP)
Vertex AI token cost: ~$0.0005–0.002 per 1K tokens depending on model
Encrypted document storage
Document reading and extraction: ~$1.50 per 1,000 pages
Application hosting and scaling services
Estimate per-case cost (target: under $0.50/case at scale)
Development budget vs. infrastructure budget split
Staging vs. production environment cost separation
Client approval needed
🤖
Why Vertex AI, Not OpenAI/Anthropic API
The client must understand this decision — it's not a preference, it's a security and compliance imperative for a platform handling PII, disability records, and immigration data.
Data never leaves Google's infrastructure boundary
Customer Data Processing terms prevent training on PII inputs
AI processing stays within the secure private network — no external data transfer
SOC2/HIPAA-eligible infrastructure out of the box
Gemini fine-tuning capability for SSA/VA-specific language without data leaving GCP
Explainability logging: every AI decision is auditable
Cost at scale: committed use discounts, no OpenAI premium
Single vendor accountability: GCP, storage, AI, IAM all in one audit scope
Architecture decision locked
The Complete User Journey
From First Login to Delivered Form
This is the full journey a user takes — from arriving on the platform to receiving their completed government form. Every step is agreed before any code is written.
GovEase DCW — End-to-End Case Flow
Applicant (end user) perspective · SSA/VA Initial Claim · English or Spanish intake
1. Landing & Language Selection
User arrives at GovEase. The platform presents a language choice (English / Español) before any account creation. The selection is stored in the session and carried into the user profile once an account exists.
No PII collected yet | Language token stored
2. Account Creation & Identity Verification
User creates account with email + password (bcrypt hashed, salted). Two-step verification required via TOTP or SMS. Google Identity Platform handles auth tokens. SSO option for returning users.
Two-step verification required | All data encrypted from this point | Secure session tokens
All data encrypted during transmission. Attack protection active from the first request.
3. DCW Disclaimer & Consent
Full-screen "Digital Case Worker" disclaimer presented in the selected language. The user must actively check each box (none are pre-checked). The consent event is logged with timestamp, IP, user ID, and the version of the consent text. The user cannot proceed without completing it.
Consent logged immutably | Version-controlled disclaimer text
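A minimal sketch of what a logged consent event could contain, assuming the fields listed in this step (timestamp, IP, user ID, consent text version). The names `ConsentEvent` and `record_consent` are hypothetical, and a plain list stands in for whichever append-only store is eventually chosen.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class ConsentEvent:
    user_id: str
    ip_address: str
    consent_version: str  # version label of the disclaimer text shown
    text_sha256: str      # hash of the exact text the user saw
    language: str         # "en" or "es"
    accepted_at: str      # UTC timestamp, ISO 8601


def record_consent(log, user_id, ip_address, consent_version, consent_text,
                   language, boxes_checked):
    """Append a consent event; refuse unless every box was actively checked."""
    if not boxes_checked or not all(boxes_checked):
        raise ValueError("consent incomplete: all boxes must be checked")
    event = ConsentEvent(
        user_id=user_id,
        ip_address=ip_address,
        consent_version=consent_version,
        text_sha256=hashlib.sha256(consent_text.encode("utf-8")).hexdigest(),
        language=language,
        accepted_at=datetime.now(timezone.utc).isoformat(),
    )
    log.append(event)
    return event
```

Storing a hash of the displayed text alongside the version label makes it possible to show later exactly which wording was accepted, even if the version label is reused by mistake.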
4. Case Type Selection & Pre-Flight Eligibility Check
User selects benefit type (SSDI / SSI / VA Disability / VA Healthcare). DCW runs a 5-question "Pre-Flight" eligibility screen via Vertex AI. If an obvious disqualifier is detected (e.g., active work income above the Substantial Gainful Activity (SGA) limit for SSDI), the user is advised before spending time on a full application.
AI on Google's platform | Eligibility logic branching
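The SGA example in this step can be expressed as a deterministic rule that sits alongside the AI-driven screen. The sketch below is illustrative only: `PreFlightAnswers` and `preflight_warnings` are hypothetical names, and the SGA limit is passed in as a parameter because the figure changes yearly and must come from current SSA guidance.

```python
from dataclasses import dataclass


@dataclass
class PreFlightAnswers:
    benefit_type: str          # "SSDI", "SSI", "VA_DISABILITY", "VA_HEALTHCARE"
    currently_working: bool
    monthly_earnings_usd: float


def preflight_warnings(answers, sga_monthly_limit_usd):
    """Return advisory messages; an empty list means no obvious disqualifier."""
    warnings = []
    if (answers.benefit_type == "SSDI"
            and answers.currently_working
            and answers.monthly_earnings_usd > sga_monthly_limit_usd):
        warnings.append(
            "Current work income is above the SGA limit, which usually "
            "disqualifies an SSDI claim. Consider speaking with a case "
            "worker before starting a full application."
        )
    return warnings
```

The function advises rather than blocks, matching the step's intent: the user is warned early but keeps the choice to continue.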
5. Smart Intake: Conversational Q&A
DCW conducts a branching, conversational intake in the user's chosen language. Questions adapt in real time based on prior answers — only relevant questions are asked. For SSA cases, the system follows the five-step evaluation process. For VA cases, PACT Act presumptive conditions are cross-referenced. All answers stored encrypted.
Vertex AI · Branching logic | Encrypted session storage | Save & resume supported
Personal identifiers are removed before any AI processing — AI never sees raw Social Security numbers or names.
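One way to remove identifiers before a model call is to swap them for opaque tokens and keep the mapping server-side. This is a sketch under stated assumptions: `tokenize_pii` and `restore_pii` are hypothetical helpers, the regular expression only covers the common 9-digit SSN layouts, and names are assumed to come from the account profile and not from free-text detection.

```python
import re
import secrets

# Matches 123-45-6789 and 123456789. It will also match other 9-digit runs,
# which errs on the side of over-redaction.
SSN_PATTERN = re.compile(r"\b\d{3}-?\d{2}-?\d{4}\b")


def tokenize_pii(text, known_names=()):
    """Replace SSNs and known names with tokens; return (safe_text, vault)."""
    vault = {}

    def _swap(value, kind):
        token = f"[{kind}_{secrets.token_hex(4)}]"
        vault[token] = value
        return token

    safe = SSN_PATTERN.sub(lambda m: _swap(m.group(0), "SSN"), text)
    for name in known_names:
        if name:
            safe = re.sub(re.escape(name),
                          lambda m: _swap(m.group(0), "NAME"),
                          safe, flags=re.IGNORECASE)
    return safe, vault


def restore_pii(text, vault):
    """Put the original values back into text returned by the model."""
    for token, value in vault.items():
        text = text.replace(token, value)
    return text
```

Only `safe_text` would be sent to the model; the `vault` mapping never leaves the backend and is used to re-insert values when the form is populated.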
6. Document Upload & Validation
System presents a dynamic document checklist based on case type and intake answers. User uploads documents (PDF, JPG, PNG — up to 10MB each). Each file is automatically read, type-checked, quality-scored, and expiry-verified. Blurry or incomplete documents trigger plain-English re-upload guidance rather than error codes.
Document text extraction | Cloud Storage · Encrypted | Virus/malware scan on upload
Every uploaded file is scanned for malware before being stored. File names are cleaned to prevent attacks.
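The size limit, type check, and filename cleaning described in this step could look roughly like the following. The function names are hypothetical, and malware scanning and OCR quality scoring are deliberately left out because they depend on services not shown here.

```python
import re
from pathlib import PurePosixPath

MAX_BYTES = 10 * 1024 * 1024  # "up to 10MB each"

# Leading bytes ("magic numbers") of the three accepted formats.
SIGNATURES = {
    "pdf": b"%PDF-",
    "jpg": b"\xff\xd8\xff",
    "png": b"\x89PNG\r\n\x1a\n",
}


def sniff_type(data):
    """Identify the file type from its content, not its claimed extension."""
    for kind, magic in SIGNATURES.items():
        if data.startswith(magic):
            return kind
    return None


def clean_filename(name):
    """Drop path components and keep a conservative character set."""
    base = PurePosixPath(name.replace("\\", "/")).name
    base = re.sub(r"[^A-Za-z0-9._-]", "_", base).lstrip(".")
    return base[:100] or "upload"


def validate_upload(name, data):
    """Return (ok, message); the message is plain-language guidance."""
    if len(data) > MAX_BYTES:
        return False, "This file is larger than 10 MB. Please upload a smaller copy."
    kind = sniff_type(data)
    if kind is None:
        return False, "We could not read this file. Please upload a PDF, JPG, or PNG."
    return True, f"Accepted {kind.upper()} file saved as {clean_filename(name)}."
```

Checking the leading bytes means a renamed executable is rejected even if its extension says `.pdf`.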
7. AI Consistency Check & Data Validation
Before generating the form, the AI cross-checks all answers against each other and against the uploaded documents. Contradictions are caught early — for example, if a stated physical limit conflicts with a described daily activity — and the user is asked to clarify.
Vertex AI · Consistency engine | Contradiction detection | Pre-fill validation
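The document cross-check half of this step can be done without a model at all. The sketch below compares intake answers with values read from uploaded documents; the field names and `find_contradictions` are hypothetical. Free-text contradictions, such as a physical limit that conflicts with a described activity, are outside what this simple comparison covers.

```python
def _normalize(value):
    """Case- and whitespace-insensitive comparison form."""
    return " ".join(str(value).lower().split())


def find_contradictions(intake, extracted, fields):
    """Compare intake answers with values read from documents.

    Returns (field, stated, documented) for every mismatch. A field missing
    from either side is skipped, not treated as a contradiction.
    """
    issues = []
    for field in fields:
        stated, documented = intake.get(field), extracted.get(field)
        if stated is None or documented is None:
            continue
        if _normalize(stated) != _normalize(documented):
            issues.append((field, stated, documented))
    return issues
```

Each returned tuple gives the platform what it needs to ask the user a specific clarifying question instead of showing a generic error.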
8. Draft Form Generation
Vertex AI maps validated intake data to government form fields. Dual-language handling: intake in Spanish, form output in English (or as configured). The system generates a draft PDF/structured data package. User-provided narrative is translated by the Cloud Translation service, and every medical/legal statement is flagged for human review.
Vertex AI · Form mapping | Translation service | Draft PDF generated
9. Mandatory Human Review
Every case goes to a human case worker before anything is finalised — no automatic submission is possible. The reviewer sees a clear comparison of what the user stated versus what their documents show. They can approve, request more information, or escalate. Every action is permanently recorded.
Human in the loop: mandatory | Case worker access logged | Diff view
Case workers can only see their assigned cases. No access to other users' records.
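The "no automatic submission" rule is easiest to enforce as an explicit state machine in which approval is reachable only from the review stage and only by the assigned reviewer. The stage names and the `transition` helper below are assumptions for illustration, not the final data model.

```python
from enum import Enum


class Stage(Enum):
    DRAFT_GENERATED = "draft_generated"
    IN_REVIEW = "in_review"
    INFO_REQUESTED = "info_requested"
    ESCALATED = "escalated"
    APPROVED = "approved"


# The only route to APPROVED is from IN_REVIEW.
ALLOWED = {
    Stage.DRAFT_GENERATED: {Stage.IN_REVIEW},
    Stage.IN_REVIEW: {Stage.APPROVED, Stage.INFO_REQUESTED, Stage.ESCALATED},
    Stage.INFO_REQUESTED: {Stage.IN_REVIEW},
    Stage.ESCALATED: {Stage.IN_REVIEW},
    Stage.APPROVED: set(),
}


def transition(case, new_stage, actor_id, actor_role, audit_log):
    """Move a case to a new stage and record who did it."""
    current = case["stage"]
    if new_stage not in ALLOWED[current]:
        raise ValueError(f"{current.value} -> {new_stage.value} is not allowed")
    if current is Stage.IN_REVIEW:
        if actor_role != "reviewer" or actor_id != case["assigned_reviewer"]:
            raise PermissionError("only the assigned reviewer can act on this case")
    audit_log.append((case["id"], current.value, new_stage.value, actor_id))
    case["stage"] = new_stage
```

Because the transition table has no edge from `DRAFT_GENERATED` to `APPROVED`, an auto-approval path cannot be introduced without visibly changing this table.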
10. Client Review & Acknowledgment
The approved draft is returned to the applicant for final review, with a bilingual summary. The user must actively acknowledge the accuracy of all information before finalization. An e-signature or digital acknowledgment is captured (via DocuSign API or Google Workspace integration). The timestamp and the version of the form acknowledged are recorded.
Digital acknowledgment captured | Bilingual summary | Acknowledgment versioned
11. Finalized Form Delivery
Finalized form package delivered via: (a) secure download link (time-limited, signed URL from Cloud Storage), (b) email notification with instructions. Form package includes the completed government form, a document checklist summary, and filing instructions. The platform does NOT submit to government agencies — user/caseworker submits manually.
Private download link | Filing instructions included | Link valid for 72 hours
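To illustrate how a time-limited link works in principle, the sketch below signs a path and expiry with an HMAC using only the standard library. It is not the Cloud Storage signed URL format, which the platform would generate through Google's own tooling; `sign_link` and `verify_link` are hypothetical names used to show the idea.

```python
import hashlib
import hmac
import time

LINK_TTL_SECONDS = 72 * 60 * 60  # "Link valid for 72 hours"


def _signature(path, expires, secret):
    message = f"{path}\n{expires}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def sign_link(path, secret, now=None):
    """Return 'path?expires=...&sig=...' valid for LINK_TTL_SECONDS."""
    issued = int(now if now is not None else time.time())
    expires = issued + LINK_TTL_SECONDS
    return f"{path}?expires={expires}&sig={_signature(path, expires, secret)}"


def verify_link(path, expires, sig, secret, now=None):
    """True only if the signature matches and the link has not expired."""
    current = now if now is not None else time.time()
    expected = _signature(path, int(expires), secret)
    return hmac.compare_digest(expected, sig) and current <= int(expires)
```

The signature covers both the path and the expiry, so a link cannot be repointed at another case or have its lifetime extended without invalidating it.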
12. Data Retention & Case Archival
Case enters retention policy: active cases retained for agreed period (TBD with client), then moved to encrypted cold archive. Data deletion workflow triggered on user request (GDPR/CCPA compliance). Audit logs retained separately per compliance requirements. Deletion confirmed via email to user.
Data retention policy enforced | Right to deletion workflow | Audit log preserved
Technology Decision · For the Client
Why We Use Google's AI Platform
GovEase processes sensitive government data — disability records, Social Security numbers, and personal history. The choice of where AI runs is a data protection decision, not just a technology preference. Here is the plain-language case for our approach.
🏠
Your Data Stays in One Place
When AI processes your users' data, it never leaves Google's secure infrastructure. It does not travel to a third-party AI company's servers. Everything stays inside the same environment where it is stored — one boundary, one set of rules.
Data sovereignty
📋
One Compliance Umbrella
Google's infrastructure is certified for healthcare-sensitive and government-sensitive data. By keeping AI, storage, and the application all within Google, we operate under one compliance framework rather than managing separate agreements with multiple vendors.
Simplified compliance
🔍
Every AI Decision is Auditable
Every time AI is used in a case (to ask a question, validate a document, or fill a form field), that decision is logged. During a security audit or a dispute, you can trace exactly what the AI did, when, and why. That trail is much harder to guarantee when AI calls leave the platform for a separate commercial AI API outside the same audit scope.
Full audit trail
💸
Cost Scales With You
Using one provider for all services — hosting, storage, AI, security — means a single bill, volume discounts as the platform grows, and no surprise costs from a separate AI vendor relationship. The per-case AI cost is a fraction of a dollar and decreases as volume increases.
Predictable cost model
✅
What this means for the client: You are not paying for a ChatGPT integration. You are paying for an AI that runs inside a secure, audited, government-data-appropriate environment — with a full record of every decision it makes. That is what passes a security review.
Investment & Running Costs
Cost Analysis Framework
These are the cost dimensions that must be discussed with the client before commitment. Numbers are estimates based on GCP pricing as of 2025/2026 and should be modeled against the client's projected case volume.
Cloud Identity Platform (Auth) | MFA, session management, up to 50K MAU free tier | ~$0–50 / mo
Email / SMS Notifications | SendGrid or Google Workspace + Twilio | ~$50–100 / mo
Cost Per Case
Target: under $0.50 / case at 1,000+ cases/month
AI processing (intake + validation + mapping) | Vertex AI token cost, amortized | ~$0.15 – $0.35
OCR document processing | Cloud Vision, ~8 pages | ~$0.012
Storage (per case, per year) | Documents + audit logs in Cloud Storage | ~$0.024
Translation (if bilingual) | Translation service | ~$0.01
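As a quick check on the per-case figures above, the following sketch adds the low and high ends of each line and compares the total with the $0.50 target. The dictionary keys are labels chosen for this example; the amounts are the estimates listed in this section.

```python
# Per-case estimates from this section, in USD (low, high).
COST_ITEMS = {
    "ai_processing": (0.15, 0.35),
    "ocr_8_pages": (0.012, 0.012),      # 8 pages at ~$1.50 per 1,000 pages
    "storage_per_year": (0.024, 0.024),
    "translation": (0.01, 0.01),
}
TARGET_PER_CASE = 0.50


def per_case_range(items):
    """Sum the low and high ends of every cost line."""
    low = sum(lo for lo, _ in items.values())
    high = sum(hi for _, hi in items.values())
    return round(low, 3), round(high, 3)


low, high = per_case_range(COST_ITEMS)
print(f"Estimated variable cost per case: ${low:.3f} to ${high:.3f}")
print("Within target" if high < TARGET_PER_CASE else "Over target")
```

The range comes out at roughly $0.20 to $0.40 per case. These are variable costs only: the fixed platform baseline (~$800–1,200/month) is not included and would need to be spread across monthly case volume separately.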
💡
Client pricing model to discuss: Consider a per-case fee (e.g., $15–50/case depending on form complexity), a monthly subscription per reviewer seat, or a hybrid model. At 500 cases per month with a $25/case fee, revenue covers infrastructure and case worker time with margin. Cost per case falls as volume grows — the platform is designed to scale economically.
Project Timeline · Phase 1
Build Plan — Ten Milestones
Ten milestones from first conversation to controlled launch. Design comes first — nothing is built until screens are approved by the client. Security and compliance requirements are built into every milestone from day one.
00
Pre-Build
Discovery Lock & Client Alignment
2 weeks
Deliverables
Confirm 1–3 MVP form types (SSA-16, SSA-3368, VA 21-526EZ recommended)
Lock supported languages: English + Spanish for MVP
Define user roles: Applicant, Reviewer, Admin, Super Admin
Client sign-off on out-of-scope items (portal login confirmed removed)
User journey map approved by client and tech lead
Cost model presented and agreed
GCP project provisioned, IAM baseline configured
Signed Data Processing Agreement in place
01
UI First — Priority
Design System & Core UI
3–4 weeks
Deliverables
Design system: color tokens, typography scale, component library
Language selection + onboarding screen (English/Spanish)
External security testers attempt to break into the platform from the outside
Internal security test — testers attempt to move laterally through the system
Formal check against the industry standard list of web and API vulnerabilities
Automated code analysis (static) and live attack simulation (dynamic) on the full codebase
Every third-party library scanned for known security issues
Security platform findings reviewed and fixed before launch
All user and service permissions reviewed — each account has only what it needs
Compliance gap analysis against SOC2 Type II trust service criteria
09
Launch
UAT, Controlled Launch & Monitoring
2 weeks
Deliverables
Testing with 5–10 real users using sample cases — no real personal data used
Full end-to-end test: every step of the user journey validated
Load test: platform handles 10× expected peak volume for 30 minutes without issues
Disaster recovery test: full backup restored and verified
Written step-by-step guide for security incidents and data breach scenarios
Controlled launch: invite-only, maximum 50 cases in the first two weeks
Live monitoring dashboard: uptime, error rate, and cost tracked in real time
📅
Total estimated timeline: 25–35 weeks from Discovery Lock to controlled launch, depending on client feedback cycle times and pen test remediation complexity. Phase 4 (expansion to more forms and jurisdictions) begins after 60 days of stable controlled launch with no critical security findings.
How We Protect Your Users
How We Protect Your Users' Data
These are the technical controls we are building into the platform. They are designed to pass OWASP Top 10 and API Security Top 10 pen testing. We are not claiming certifications — we are implementing verifiable, testable controls. Where a control is a target rather than confirmed, it is marked as such.
🔐
Who Can Access What
Every user must verify identity with two-step login
Case workers can only see their assigned cases
Admins have minimum access needed — nothing more
Sessions automatically end after 15 minutes of inactivity
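A minimal sketch of two of the access rules above: the 15-minute idle timeout and the "assigned cases only" check. The dictionary shapes and function names are assumptions; roles other than applicant and reviewer are denied by default until an explicit rule is written for them.

```python
from datetime import datetime, timedelta, timezone

IDLE_LIMIT = timedelta(minutes=15)


def session_is_active(last_activity, now=None):
    """True while the session has been idle for 15 minutes or less."""
    now = now or datetime.now(timezone.utc)
    return now - last_activity <= IDLE_LIMIT


def can_view_case(user, case):
    """Applicants see their own cases; reviewers see only assigned ones."""
    if user["role"] == "applicant":
        return case["applicant_id"] == user["id"]
    if user["role"] == "reviewer":
        return case["assigned_reviewer"] == user["id"]
    return False  # deny by default
```

Running both checks on every request, instead of only at login, is what makes the rules testable from the outside during a pen test.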
🔒
Data Encryption
Every file and record is encrypted at rest and in transit
Personal identifiers are stripped before any AI processing
Backups are separately encrypted with their own keys
Encryption keys are automatically rotated every 90 days
📋
Full Audit Trail
Every action in the platform is permanently recorded
Every document viewed by a case worker is logged
Every AI decision is traceable and explainable
Records kept for a minimum of 7 years
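One common way to make a record "permanent" in a verifiable sense is to chain entries by hash, so that altering an earlier entry invalidates everything after it. This is a sketch of that technique with hypothetical function names, not a description of how Cloud Audit Logs works internally.

```python
import hashlib
import json

GENESIS = "0" * 64
FIELDS = ("actor", "action", "target", "prev")


def _digest(body):
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append_entry(chain, actor, action, target):
    """Append an entry whose hash covers its content and the previous hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    body = {"actor": actor, "action": action, "target": target, "prev": prev_hash}
    chain.append({**body, "hash": _digest(body)})


def verify_chain(chain):
    """Recompute every hash; an edited entry, or one removed from the
    middle, breaks the chain."""
    prev_hash = GENESIS
    for entry in chain:
        body = {k: entry[k] for k in FIELDS}
        if entry["prev"] != prev_hash or entry["hash"] != _digest(body):
            return False
        prev_hash = entry["hash"]
    return True
```

Truncating the end of the chain is not detected by this check alone, so the latest hash would need to be anchored somewhere separate, for example in a write-once bucket.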
🚨
If Something Goes Wrong
Written plan for data breach scenarios — ready before launch
72-hour breach notification procedure in place
Automated alerts if unusual access patterns are detected
Data deletion can be triggered immediately on request
Technology Overview
Google Cloud Infrastructure Stack
Every component runs within Google's infrastructure. No third-party AI vendors. No data leaving the GCP project boundary. Single audit scope, single compliance framework, single vendor accountability.
Service → Purpose Mapping
All services within single GCP project / VPC
Vertex AI (Gemini) · AI / ML Layer
Powers all conversational intake, eligibility logic, consistency checking, and form field mapping. Called exclusively via private VPC endpoint. PII tokenized before every call.
Cloud Vision API · Document OCR
Extracts text from uploaded documents (IDs, medical records, DD-214). Identifies document type, quality issues, and expiry dates. Results encrypted before storage.
Cloud Translation · Multilingual
Handles Spanish ↔ English translation for intake narratives and form field population. All translated medical/legal content flagged for human review.
Cloud Run · Application Layer
Serverless containerized backend (Node.js / Python). Auto-scales. Connected to VPC via VPC connector. No public IP. Accessed via load balancer only.
Cloud SQL · Relational Database
PostgreSQL for case data, user records, audit events, consent records. Private Service Connect — no public IP ever. Encrypted with CMEK. Daily automated backups.
Cloud Storage · Document Vault
Encrypted document storage (uploaded files + generated PDFs). CMEK encryption, versioning enabled, lifecycle policies for retention/deletion. Signed URL access only — no public bucket.
Cloud KMS · Key Management
Customer-managed encryption keys for all storage. Key rotation every 90 days. Separate key rings for documents, database, and audit logs. Access to keys logged via Cloud Audit Logs.
Cloud Identity Platform · Authentication
User authentication, MFA (TOTP + SMS), session management, JWT issuance. Replaces need for a custom auth system. Integrates directly with IAM for RBAC.
Cloud Armor · WAF / DDoS
Web Application Firewall with OWASP Top 10 managed rules. DDoS protection at network layer. Rate limiting enforced at CDN edge before requests reach application.
Secret Manager · Secrets
All API keys, database passwords, and credentials stored here. Zero hardcoded secrets in source code or environment variables. Rotation alerts configured.
Cloud Logging / Chronicle · SIEM
Centralized security logging. Data Access logs, Admin Activity logs, application logs. Chronicle SIEM for threat detection and incident investigation. 7-year log retention.
Cloud Build + Artifact Registry · CI/CD
Automated build, test, SAST, and deployment pipeline. Binary Authorization enforces that only signed, verified container images can be deployed to production. No manual deployments.
How a Case Moves Through the Platform
Structure & Process Flow
The complete journey a case takes — from the moment someone arrives on the platform to the moment their completed form is delivered. Data protection is active at every step, not just at the end.
[Flowchart: Complete Case Journey. From first visit to delivered form, what happens at each step. Legend: Start / End, Process, Decision, System Layer.]
Portal Status Checking
Removed from scope. No automated login to government portals. No credential storage.
Human Review — Mandatory
Every case passes through Staff Verification. No auto-approval path exists in the system.
Agency Submission
Platform generates a complete packet only. The applicant or caseworker submits to the agency manually.
What We Are Building With
Preferred Tech Stack
Plain-language overview of the technologies that power GovEase — what each does and why it was chosen.
Frontend
Client Layer
Next.js 14 Primary
React framework with App Router. SSR for SEO + security (no sensitive state in client JS). TypeScript enforced.
TypeScript
Strict type safety across frontend and API contracts. Reduces entire categories of runtime bugs.
Tailwind CSS
Utility-first, no CSS-in-JS runtime cost. Design tokens via CSS variables for brand consistency + dark mode.
i18next
Industry standard for EN/ES bilingual UI. Namespace-based — legal/medical strings managed separately with version control.
React Hook Form
Performant form validation. Uncontrolled inputs reduce re-renders on long intake forms.
Zod
Schema validation shared between frontend and backend. Single source of truth for input rules.
Backend
API Layer
Node.js / NestJS Primary
Structured, opinionated framework with built-in DI, guards, interceptors. Enforces security patterns at architecture level.
Python / FastAPI Alt (AI Services)
Python used for AI microservices (Vertex AI SDK, OCR processing). FastAPI for async, high-performance endpoints.
REST + OpenAPI 3.0
Fully documented API spec. Enables automated contract testing and security scanning against declared schema.
BullMQ
Async job queue for OCR processing, form generation, and notifications; keeps API response times fast.
Helmet.js + CORS
HTTP security headers enforced on all responses. CORS restricted to verified origin domains only.
Data & Storage
Persistence Layer
Cloud SQL (PostgreSQL) GCP
Primary relational database. Row-level security enforced. No public IP. CMEK encryption. Automated daily backups.
Cloud Storage GCP
Document vault. CMEK-encrypted buckets. Signed URLs for time-limited access. Versioning + lifecycle policies.
Firestore GCP
Real-time session state + case status updates pushed to frontend. Document-model fits case object structure.
Cloud Memorystore (Redis)
Session cache + BullMQ job queue backend. Private VPC only. No public endpoint.
Cloud KMS GCP
Customer-managed encryption keys for all data stores. Key ring separation by data sensitivity tier. 90-day rotation.
AI / ML Services
Intelligence Layer
Vertex AI — Gemini Pro GCP
Core LLM for intake logic, eligibility scoring, consistency checking, and form field narrative mapping. Private VPC endpoint. PII tokenized before every call.
Cloud Vision API GCP
OCR for all uploaded documents. Document type classification. Quality and completeness scoring. Expiry date extraction.
Translation service GCP
ES→EN translation for intake narratives and form outputs. Confidence scores logged. Low confidence outputs flagged for human review.
Vertex AI Explainability
Decision attribution for every AI output used in a case. Required for cybersecurity audit trail and regulatory review.
LangChain (Python)
Orchestration layer for multi-step AI workflows (intake → validation → mapping). Prompt versioning and A/B testing built-in.
📌
PDF Generation: pdf-lib (Node.js) for server-side fillable PDF generation from mapped form data, with Puppeteer as a fallback for complex layouts. Generated PDFs are watermarked "DRAFT" until human review approval is recorded. All generation happens server-side, so no PDF is constructed client-side where it could be tampered with.
Compliance & Data Protection
Compliance Framework
GovEase handles sensitive personal data — disability records, service history, and Social Security numbers. Below is an honest account of our compliance position: what we implement, what we inherit from our infrastructure provider, and what we are working towards. We do not claim certifications we do not yet hold.
Cyber Test Baseline
OWASP Top 10
Open Web Application Security Project — Web + API
This is what a pen tester will run against us — it is the primary test target
Injection (SQL, prompt): parameterized queries + AI input sanitization layer
Access control: users can only reach what they are authorised to see
Encryption: all data encrypted in storage and during transmission — industry standard
API security checklist applied to every backend entry point
DAST scan (OWASP ZAP) in staging before any production deployment
We make no certification claim — we implement the controls
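The parameterized-query control in the list above is simple to demonstrate. The example uses Python's built-in `sqlite3` so it runs anywhere; the production database is PostgreSQL, where the driver's placeholder syntax differs but the principle is identical. Table and column names are invented for the example.

```python
import sqlite3


def find_cases_for_user(conn, user_id):
    """Look up cases with a bound parameter; input is never spliced into SQL."""
    cur = conn.execute(
        "SELECT id, stage FROM cases WHERE applicant_id = ?", (user_id,)
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cases (id TEXT, applicant_id TEXT, stage TEXT)")
conn.executemany(
    "INSERT INTO cases VALUES (?, ?, ?)",
    [("c1", "a1", "in_review"), ("c2", "a2", "approved")],
)

# A classic injection payload is treated as a literal string and matches nothing.
print(find_cases_for_user(conn, "a1"))             # [('c1', 'in_review')]
print(find_cases_for_user(conn, "a1' OR '1'='1"))  # []
```

Had the query been built by string concatenation, the second call would have returned every case in the table.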
Controls Implemented
NIST CSF 2.0
NIST Cybersecurity Framework — used as a reference structure
Used as an internal design reference, not claimed as a certification
Identify: asset inventory, data classification, threat model (STRIDE)
No PHI sent to any service outside the GCP project boundary
If the client's use case requires formal HIPAA compliance, a BAA with Google must be signed and a formal risk assessment conducted — not included in MVP scope
GCP Infrastructure
SOC 2 — Aligned Controls
Service Organization Control 2
We do not hold a SOC 2 certificate at MVP — this takes 6–12 months to audit
Google's infrastructure is SOC 2 Type II certified — the layer we run on inherits this
Platform-level controls are designed to be SOC 2 aligned (Security + Availability criteria)
SOC 2 audit is a Year 2 target once the platform has operational history
Security controls documented now so evidence collection is ready when audit begins
By Design (MVP)
GDPR Principles
General Data Protection Regulation — design principles applied
Data minimisation: only fields required per form type are collected
Right to erasure: deletion workflow built and tested before go-live
We are not making a formal GDPR certification claim — we apply the principles
If EU users are in scope, a formal DPIA and legal basis review is required — this is a client decision, not included in build cost
By Design (MVP)
CCPA / CPRA Principles
California Consumer Privacy Act / Privacy Rights Act
Right to know and right to delete workflows built into case management
No user data sold or shared with third parties for advertising purposes
Privacy policy covers all collected data categories
Formal legal review of the privacy policy is the client's responsibility before launch
Design Target
WCAG 2.1 Level AA
Web Content Accessibility Guidelines
AA is the design target — not a certified claim at MVP launch
Screen reader compatibility (ARIA labels, semantic HTML structure)
Color contrast ratio ≥ 4.5:1 for body text enforced in design system
Keyboard navigation operable without mouse on all critical flows
Full independent WCAG audit recommended before any public-sector rollout
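The 4.5:1 contrast requirement can be checked automatically in the design system. The sketch below follows the relative luminance and contrast ratio definitions in WCAG 2.1; the function names are ours, and colors are assumed to be six-digit hex strings.

```python
def _linear(channel_8bit):
    """Convert an 8-bit sRGB channel to its linear value (WCAG 2.1 formula)."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color):
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(fg, bg):
    """(lighter + 0.05) / (darker + 0.05), ranging from 1 to 21."""
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa_body_text(fg, bg):
    return contrast_ratio(fg, bg) >= 4.5
```

A check like this could run against every text/background token pair in CI, so that a failing color combination is caught before it reaches a design review.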
GCP-Inherited Only
ISO 27001 / 27017
Information Security Management (Cloud-Specific)
GCP holds ISO 27001 and 27017 certifications — this covers the infrastructure we run on
These are Google's certifications, not ours — we do not inherit the certificate
We can reference GCP's ISO attestation in security questionnaires as infrastructure evidence
Platform-level ISO 27001 certification is not in MVP or near-term scope
Not In Scope
FedRAMP
Federal Risk and Authorization Management Program
Not applicable to the current product — GovEase is a private platform, not a federal system
GCP services used are FedRAMP-authorized, which is the relevant infrastructure fact
If future contracts require FedRAMP authorization for the platform itself, this is a multi-year, multi-hundred-thousand-dollar effort — not a Phase 1–4 commitment
Removed from commitments entirely
What We Actually Claim vs. What We Inherit
⚠
Key Principle: A cyber tester will test your controls, not your paperwork. Claiming a certification you don't hold is a documentation risk. We claim only what is technically verifiable in our deployed system. Pen test findings are addressed openly — not hidden or disputed.
Area | What We Claim (Testable) | What We Do NOT Claim
All data encrypted | All data encrypted in storage and in transit; verifiable in platform configuration | No encryption certification; it's a control, not a certificate
Authentication | Two-step verification required, every endpoint authenticated, lockout after failed attempts; all pen-testable | Not claiming zero auth vulnerabilities; that's what pen testing determines
HIPAA | GCP HIPAA-eligible infrastructure; PHI principles applied; no data outside GCP boundary | Not HIPAA certified; platform is not a covered entity at MVP. BAA with Google requires separate agreement
SOC 2 | Controls aligned to SOC 2 Security criteria; GCP infrastructure is SOC 2 Type II certified | No platform-level SOC 2 certificate; audit not conducted. Target Year 2
ISO 27001 | GCP holds ISO 27001/27017; relevant as infrastructure evidence in questionnaires | Not our certificate; it is Google's. Cannot be cited as a platform-level claim
GDPR / CCPA | Design principles applied: consent, deletion workflow, data minimisation built in | Not certified compliant; formal legal review is client's responsibility pre-launch
OWASP Top 10 | All controls implemented and DAST-tested; this is the primary pen test target | No blanket "OWASP compliant" claim; findings from testing are tracked and remediated
FedRAMP | GCP services are FedRAMP-authorized (Google's authorization for their infrastructure) | No FedRAMP authorization for the platform; removed from all commitments