Fraud analysts operate in a structured workflow that begins when an order or transaction is referred for review. Agent 1 performs basic identity-theft checks, then hands off to another agent for document verification, followed by checks that compare bank statements against payslips and by external data analysis such as credit bureau lookups. The main agent then decides whether to approve, decline, or escalate for deeper analysis, such as a review of internal customer history or transaction velocity. If further verification is needed, the customer may be called, often by an AI voice agent that verifies identity and determines the outcome.

The increasing volume and sophistication of fraudulent activity make automation necessary. As of March 2025, research indicates a 385% rise in check fraud since the pandemic, underscoring the urgency (Treasury Announces Enhanced Fraud Detection Process Using AI Recovers $375M in Fiscal Year 2023). AI agents, leveraging machine learning and advanced analytics, offer a scalable way to handle this workload efficiently.

AI Agents in Each Workflow Step

  1. Basic Identity Checks: AI compares ID data against a selfie description to confirm the applicant's identity.
  2. Document Verification: AI analyzes extracted document text for internal consistency and authenticity markers.
  3. Bank Statement and Payslip Checks: AI compares reported income across financial documents and flags discrepancies.
  4. External Data Checks:
    • AI queries credit bureaus for credit history and scores, using machine learning for risk assessment. H2O.ai’s platform automates credit decisions, outperforming traditional scorecards and saving $20M annually for a client (Use AI for Credit Scoring | H2O.ai). S&P Global Market Intelligence notes AI’s ability to refine credit risk assessment with alternative datasets (AI & Alternative Data: Redefining Credit Scoring).
  5. Decision-Making by the Main Agent: AI synthesizes all findings and applies rules to approve, decline, or escalate.
  6. Deeper Analysis by Second-Level Agents: AI examines customer history and transaction velocity for anomalous patterns.
  7. Customer Verification by AI Voice Agents: AI conducts an automated call, generating questions and interpreting responses to confirm identity.

High-Level Design: LLM-Powered Fraud Detection System

Objective

Automate the fraud review workflow using LLM inferencing to perform identity checks, document verification, financial analysis, external data checks, decision-making, deeper analysis, and customer verification, replacing traditional machine learning models with LLM-driven reasoning.

System Architecture

The system uses a centralized LLM inference engine (e.g., hosted via Hugging Face Transformers or xAI’s Grok API) with modular agents calling the LLM for task-specific reasoning. Data preprocessing is minimal, relying on the LLM’s natural language understanding and contextual analysis.

Output: Final decision and LLM reasoning trail.

Data Ingestion Layer

Purpose: Collect and preprocess transaction/order data into text prompts for LLM processing.

Components:

File upload service (e.g., Flask API).

Text extraction tools (e.g., pdfplumber for PDFs, OCR via pytesseract if needed).

Inputs: Transaction details, ID text, bank statements, payslips, customer history.

Output: Structured text prompts (e.g., “Customer ID text: John Doe, DOB: 01/01/1990; Selfie description: Male, 30s”).
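The prompt-assembly step can be sketched as a small formatting function. This is a minimal illustration assuming the extracted fields arrive as a dict; the field names are assumptions, not a fixed schema.

```python
# Sketch: turn extracted fields into the structured text prompt the agents
# consume. Field names ("name", "dob", "selfie") are illustrative.
def build_identity_prompt(fields: dict) -> str:
    id_part = f"Customer ID text: {fields['name']}, DOB: {fields['dob']}"
    selfie_part = f"Selfie description: {fields['selfie']}"
    return f"{id_part}; {selfie_part}"

prompt = build_identity_prompt(
    {"name": "John Doe", "dob": "01/01/1990", "selfie": "Male, 30s"}
)
# prompt == "Customer ID text: John Doe, DOB: 01/01/1990; Selfie description: Male, 30s"
```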

LLM Agent Workflow

Agent 1: Basic Identity Checks

Function: Verify identity by comparing ID data with selfie description.

LLM Task: Infer if text and description match.

Prompt Example: “Given ID text: ‘Name: John Doe, DOB: 01/01/1990’ and selfie description: ‘Male, 30s, brown hair,’ do these likely belong to the same person? Provide a confidence score.”

Output: Confidence score (e.g., 0.9) and reasoning (e.g., “Age aligns with DOB”).

Next Step: Pass to Agent 2 if verified, else flag.
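The routing step after Agent 1 can be sketched as follows, assuming the prompt asks the model to include a numeric confidence score in its reply; production parsing should be stricter (for example, requiring JSON output).

```python
import re

# Sketch: extract a confidence score from a free-text LLM reply and route.
# The 0.8 threshold mirrors the rule used by the Main Agent below.
def route_identity_check(llm_reply: str, threshold: float = 0.8) -> str:
    match = re.search(r"\d+(?:\.\d+)?", llm_reply)
    score = float(match.group()) if match else 0.0
    return "agent_2" if score >= threshold else "flag"
```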

Agent 2: Document Verification

Function: Validate document authenticity.

LLM Task: Analyze extracted text for consistency and authenticity markers.

Prompt Example: “Here’s ID text: ‘Name: John Doe, ID: 12345, Issue Date: 01/01/2020.’ Does this appear genuine based on format and logic? List any red flags.”

Output: Authenticity verdict (e.g., “Genuine, no red flags”) and extracted fields.

Next Step: Pass to Agent 3.
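A cheap deterministic pre-check can run before the LLM prompt so obviously malformed documents never reach the model. The sketch below assumes the field names and the MM/DD/YYYY date format for illustration.

```python
from datetime import datetime

# Sketch: flag documents with missing fields or an impossible issue date
# before spending an LLM call on them.
def precheck_document(fields: dict) -> list:
    flags = []
    for required in ("name", "id", "issue_date"):
        if not fields.get(required):
            flags.append(f"missing {required}")
    issue_date = fields.get("issue_date")
    if issue_date and datetime.strptime(issue_date, "%m/%d/%Y") > datetime.now():
        flags.append("issue date is in the future")
    return flags
```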

Agent 3: Bank Statement & Payslip Checks

Function: Ensure financial consistency.

LLM Task: Compare financial data for discrepancies.

Prompt Example: “Bank statement: ‘Income: $5000/month, Jan 2025’; Payslip: ‘Salary: $4800/month, Jan 2025.’ Are these consistent? Highlight issues.”

Output: Consistency report (e.g., “Minor variance, likely rounding”).

Next Step: Pass to Agent 4.
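A numeric consistency check can back up the LLM's verdict for cases like the rounding variance above. The 5% tolerance in this sketch is an illustrative choice, not a policy value.

```python
# Sketch: small variances (rounding, deductions) pass; larger gaps get flagged.
def income_consistent(statement_income: float, payslip_income: float,
                      tolerance: float = 0.05) -> bool:
    return abs(statement_income - payslip_income) <= tolerance * max(
        statement_income, payslip_income
    )
```

On the example figures, the $200 gap between $5000 and $4800 falls inside the 5% band and is treated as consistent.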

Agent 4: External Data Checks

Function: Assess credit risk.

LLM Task: Interpret credit data and infer risk.

Prompt Example: “Credit report: ‘Score: 720, Late Payments: 1 in 2024.’ Assess risk level for a $10,000 transaction.”

Output: Risk assessment (e.g., “Low risk, score supports approval”).

Next Step: Pass to Main Agent.
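A rule-based fallback for risk tiering can sit alongside the LLM's assessment to catch inconsistent answers. The thresholds below are illustrative, not a real credit policy.

```python
# Sketch: deterministic risk tiering from credit score and late payments.
def risk_level(credit_score: int, late_payments: int) -> str:
    if credit_score >= 700 and late_payments <= 1:
        return "Low"
    if credit_score >= 620:
        return "Medium"
    return "High"
```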

Main Agent: Decision-Making

Function: Approve, decline, or escalate based on all data.

LLM Task: Apply rules and synthesize findings.

Prompt Example: “Identity score: 0.9, Doc verdict: Genuine, Financial consistency: Yes, Risk: Low. Based on rules (score > 0.8 = approve, risk > Medium = decline), decide: approve, decline, or escalate.”

Output: Decision (e.g., “Approve”) with reasoning.

Next Step: Escalate to Agent 5 if needed.
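The rules quoted in the prompt can also be encoded deterministically, as a sketch of a hard rule layer that validates or overrides the LLM's answer.

```python
# Sketch: the Main Agent's rules (score > 0.8 = approve, risk above Medium
# = decline, anything ambiguous escalates) as plain code.
RISK_ORDER = {"Low": 0, "Medium": 1, "High": 2}

def decide(identity_score: float, doc_verdict: str,
           financially_consistent: bool, risk: str) -> str:
    if RISK_ORDER[risk] > RISK_ORDER["Medium"]:
        return "decline"
    if identity_score > 0.8 and doc_verdict == "Genuine" and financially_consistent:
        return "approve"
    return "escalate"
```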

Agent 5: Deeper Analysis

Function: Analyze customer history and transaction velocity.

LLM Task: Detect anomalies in patterns.

Prompt Example: “History: 5 transactions/month, $200 avg. Current: 10 transactions, $1000 avg. Is this suspicious? Explain.”

Output: Anomaly verdict (e.g., “Suspicious, velocity spike”).

Next Step: Pass to Agent 6 if verification needed.
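The velocity check in the example can be sketched numerically; the 2x factor here is an illustrative assumption, not a tuned threshold.

```python
# Sketch: flag when both transaction count and average amount jump past
# `factor` times the historical baseline.
def velocity_suspicious(hist_count: int, hist_avg: float,
                        cur_count: int, cur_avg: float,
                        factor: float = 2.0) -> bool:
    return cur_count >= factor * hist_count and cur_avg >= factor * hist_avg
```

On the example above (5 transactions/month at $200 average jumping to 10 at $1000), both measures at least double, so the pattern is flagged.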

Agent 6: Customer Verification

Function: Verify identity via voice call.

LLM Task: Generate questions, interpret responses.

Libraries: SpeechRecognition (speech-to-text), gTTS (text-to-speech), Twilio (call handling).

Process:

LLM generates question: “What’s your DOB?”

Twilio calls, gTTS speaks, SpeechRecognition transcribes response.

LLM prompt: “ID DOB: 01/01/1990. Response: ‘January 1, 1990.’ Match?”

Output: Verification status (e.g., “Verified”).

Next Step: Update Main Agent’s decision.
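The DOB match can also be double-checked deterministically once the answer is transcribed. This sketch assumes the transcription arrives as text such as "January 1, 1990"; the accepted formats are assumptions for illustration.

```python
from datetime import datetime

# Sketch: normalize both forms to a date before comparing, rather than
# relying on string equality.
def dob_matches(id_dob: str, spoken: str) -> bool:
    recorded = datetime.strptime(id_dob, "%m/%d/%Y").date()
    for fmt in ("%B %d, %Y", "%m/%d/%Y"):
        try:
            return datetime.strptime(spoken, fmt).date() == recorded
        except ValueError:
            continue
    return False
```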

Workflow Manager

Purpose: Coordinate agents and manage LLM prompts/responses.

Tool: Celery (task queue) with Redis as the broker, calling an LLM API (e.g., the xAI Grok API).

Process:

Queue tasks with tailored prompts.

Parse LLM outputs (JSON format) for next steps.
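The response-parsing step can be sketched as below, assuming agents are prompted to reply in JSON with a "next_step" field (an illustrative field name); anything unparseable routes to manual review rather than being guessed at.

```python
import json

# Sketch: read the routing field from an agent's JSON reply, with a safe
# fallback when the model returns invalid JSON.
def next_step(llm_output: str) -> str:
    try:
        payload = json.loads(llm_output)
        return payload.get("next_step", "manual_review")
    except json.JSONDecodeError:
        return "manual_review"
```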

Decision Output Layer

Purpose: Store and communicate results.

Components:

Database (e.g., MongoDB for JSON responses).

Notification service (e.g., smtplib for email).

Example Implementation Snippet

import requests
from twilio.rest import Client
import speech_recognition as sr
from gtts import gTTS

# LLM API call. The endpoint and payload below are placeholders; substitute
# your provider's actual API (e.g., the xAI API) and authentication scheme.
def llm_infer(prompt):
    response = requests.post(
        "https://api.xai.com/infer",  # placeholder URL
        json={"prompt": prompt, "model": "grok"},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["output"]

# Main Agent Decision Example
data = {
    "identity_score": 0.9,
    "doc_verdict": "Genuine",
    "financial_consistency": "Yes",
    "risk": "Low"
}
prompt = f"Data: {data}. Rules: score > 0.8 = approve, risk > Medium = decline. Decide: approve, decline, escalate."
decision = llm_infer(prompt)
print(f"Decision: {decision}")

# Agent 6: Voice Verification
def verify_customer(phone):
    question = llm_infer("Generate a verification question about DOB.")
    tts = gTTS(question)
    tts.save("question.mp3")
    client = Client("TWILIO_SID", "TWILIO_TOKEN")
    # Twilio's `url` must point to a TwiML document, not a raw mp3: the
    # endpoint should return a <Play> verb for question.mp3 and a <Record>
    # verb to capture the customer's answer.
    call = client.calls.create(to=phone, from_="YOUR_NUMBER", url="http://yourserver.com/twiml/question")
    recognizer = sr.Recognizer()
    # Assumes the recorded answer has been downloaded from Twilio as response.wav
    with sr.AudioFile("response.wav") as source:
        audio = recognizer.record(source)
    response = recognizer.recognize_google(audio)
    verification = llm_infer(f"ID DOB: 01/01/1990. Response: '{response}'. Match?")
    return verification

print(verify_customer("+1234567890"))

Deployment Considerations

  • LLM Hosting: Use a hosted LLM (e.g., Grok via xAI API) or deploy locally with transformers (e.g., LLaMA).
  • Infrastructure: Dockerized agents on Kubernetes, with an API gateway (e.g., FastAPI) for LLM calls.
  • Monitoring: Track latency and accuracy with Prometheus and Grafana.
  • Security: Encrypt prompts/responses with cryptography.
  • Scalability: Rate-limit LLM API calls and cache frequent prompts with Redis.
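The prompt-caching idea from the last bullet can be sketched in a few lines; a plain dict stands in for Redis here so the example stays self-contained, and the cache key is a hash of the prompt text.

```python
import hashlib

# Sketch: memoize LLM calls keyed by a hash of the prompt, so repeated
# identical prompts hit the cache instead of the API.
_cache = {}

def cached_infer(prompt: str, infer) -> str:
    key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    if key not in _cache:
        _cache[key] = infer(prompt)
    return _cache[key]
```

With Redis, the dict lookups would become GET/SET calls with a TTL, so cached answers expire rather than going stale.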

Conclusion

This LLM-powered design leverages inferencing for all agent tasks, from identity checks to voice verification, minimizing traditional ML complexity. As of March 2025, it offers flexibility and human-like reasoning, though it requires robust prompt engineering, high-quality data, and ethical oversight to ensure fairness and compliance.