Building Fraud Detection Systems That Work: The AI Agency Field Guide

Agency Script Editorial

Editorial Team

March 20, 2026 · 13 min read
Tags: fraud detection, financial AI, anomaly detection, production ML

A payments processor came to a seven-person AI agency in Miami after their rule-based fraud detection system flagged 12% of all transactions for manual review. Their fraud team of 15 analysts could only review 60% of flagged transactions within the required timeframe. The remaining 40% were auto-approved to avoid blocking legitimate customers. Among those auto-approved transactions, the fraud rate was 3.2%, costing $8.7 million annually. Meanwhile, 94% of the transactions the analysts did review were legitimate, wasting hundreds of hours on false positives.

The agency built a machine learning-based fraud detection system that replaced the rule-based approach. The new system flagged only 3.1% of transactions, a 74% reduction in review volume, while catching 96% of fraudulent transactions compared to 71% under the old system. The analysts' workload dropped in step with the review volume, the auto-approve problem disappeared, and annual fraud losses dropped to $2.1 million. The payments processor saved $6.6 million in the first year and signed a three-year platform contract with the agency worth $1.8 million.

Fraud detection is the highest-stakes, highest-value AI application most agencies will encounter. The technical challenges are real: extreme class imbalance, adversarial actors, real-time latency requirements. But the payoff is enormous. A system that reduces fraud losses by even 50% can save a client millions annually, making it easy to justify six-figure agency fees.

Why Fraud Detection Is Uniquely Challenging

Fraud detection combines several ML challenges that do not appear together in most other applications:

Extreme class imbalance. In most transaction datasets, fraud accounts for 0.1-1% of all transactions. A model that predicts "not fraud" for everything achieves at least 99% accuracy but catches zero fraud. Accuracy on its own is a meaningless metric here.
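To make the accuracy trap concrete, here is a minimal sketch on a made-up dataset (10,000 transactions, 0.5% fraud). The dataset and the function name are illustrative assumptions, not client data.

```python
def accuracy_and_recall(labels, predictions):
    """Return (accuracy, fraud recall) for binary labels where 1 = fraud."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    fraud_total = sum(labels)
    fraud_caught = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    recall = fraud_caught / fraud_total if fraud_total else 0.0
    return correct / len(labels), recall


# Hypothetical dataset: 10,000 transactions with a 0.5% fraud rate.
labels = [1] * 50 + [0] * 9950
always_legit = [0] * len(labels)  # a "model" that never predicts fraud

accuracy, recall = accuracy_and_recall(labels, always_legit)
print(f"accuracy={accuracy:.3f} recall={recall:.3f}")
```

The do-nothing model scores 99.5% accuracy while catching none of the 50 fraud cases, which is why the metrics later in this guide focus on precision and recall.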

Adversarial actors. Unlike churn prediction or demand forecasting, fraud involves intelligent adversaries who actively adapt to avoid detection. The fraud patterns your model learns today will evolve within months as fraudsters change tactics.

Real-time latency requirements. Transaction fraud must be detected before the transaction is completed, typically within 50-500 milliseconds. Batch processing is too slow.

Cost asymmetry. A false negative (missed fraud) costs the full transaction amount plus chargeback fees. A false positive (blocked legitimate transaction) costs customer friction and potentially lost lifetime value. These costs are very different and must be balanced.

Concept drift is constant. Fraud patterns shift continuously. New fraud schemes emerge, seasonal patterns change, and the population of legitimate transactions evolves. Models degrade faster in fraud detection than in almost any other domain.

Explainability requirements. When you block a legitimate customer's transaction, they want to know why. When regulators audit your fraud system, they want to see the decision logic. Black-box models create compliance and customer experience problems.

The Fraud Detection Architecture

Feature Engineering: Where Fraud Detection Is Won or Lost

The single most important factor in fraud detection performance is feature engineering. The raw transaction data (amount, merchant, timestamp) is insufficient. You need to compute contextual features that capture behavioral patterns.

Velocity features:

  • Number of transactions in the last 1 hour, 6 hours, 24 hours, 7 days
  • Total transaction amount in the last 1 hour, 6 hours, 24 hours
  • Number of distinct merchants in the last 24 hours
  • Number of distinct devices used in the last 7 days
  • Number of failed transactions in the last hour
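As a rough illustration of the velocity counts above, the sketch below counts one user's prior transactions in each look-back window using a sorted list of timestamps. The feature names and the in-memory list are assumptions for the example; in production these counts would normally come from a streaming layer or feature store.

```python
from bisect import bisect_left
from datetime import datetime, timedelta

# Look-back windows taken from the first bullet above.
WINDOWS = {
    "1h": timedelta(hours=1),
    "6h": timedelta(hours=6),
    "24h": timedelta(hours=24),
    "7d": timedelta(days=7),
}


def velocity_counts(history, now):
    """Count prior transactions inside each window.

    `history` must be a sorted list of datetimes for one user's
    earlier transactions.
    """
    end = bisect_left(history, now)  # only transactions strictly before `now`
    counts = {}
    for name, window in WINDOWS.items():
        start = bisect_left(history, now - window)
        counts[f"txn_count_{name}"] = end - start
    return counts


now = datetime(2026, 3, 20, 12, 0)
history = sorted([
    now - timedelta(days=8),
    now - timedelta(days=2),
    now - timedelta(hours=5),
    now - timedelta(minutes=30),
    now - timedelta(minutes=10),
])
features = velocity_counts(history, now)
print(features)
# {'txn_count_1h': 2, 'txn_count_6h': 3, 'txn_count_24h': 3, 'txn_count_7d': 4}
```

The same pattern extends to summed amounts or distinct merchants by storing those values alongside each timestamp.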

Behavioral deviation features:

  • How far is this transaction amount from the user's average?
  • How far is this merchant category from the user's typical categories?
  • Is this transaction at an unusual time of day for this user?
  • Is this device/IP/location new for this user?
  • How long since the user's last transaction (velocity gap)?
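One simple way to express "how far from normal" is a z-score against the user's own history, plus a flag for never-seen devices. This is a sketch under assumptions: the helper names are made up, and a z-score is only one of several reasonable deviation measures.

```python
from statistics import mean, stdev


def amount_zscore(past_amounts, amount):
    """Standard deviations between `amount` and the user's average.

    Returns 0.0 when there is too little history to estimate spread.
    """
    if len(past_amounts) < 2:
        return 0.0
    sigma = stdev(past_amounts)
    if sigma == 0:
        return 0.0
    return (amount - mean(past_amounts)) / sigma


def is_new_value(seen_values, value):
    """True if this device, IP, or location was never seen for the user."""
    return value not in seen_values


# Hypothetical user who normally spends about $25 per transaction.
past_amounts = [20.0, 25.0, 30.0, 25.0, 25.0]
features = {
    "amount_zscore": amount_zscore(past_amounts, 500.0),
    "is_new_device": is_new_value({"device-a", "device-b"}, "device-z"),
}
print(features)
```

A $500 purchase from this user lands more than 100 standard deviations above their mean, a far stronger signal than the raw amount alone.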

Network features:

  • Does the merchant have a higher-than-average fraud rate?
  • Is the user connected to known fraud accounts through shared devices, IPs, or addresses?
  • Has this card been used with a different name at the same merchant?
  • Is the shipping address associated with multiple cards?

Contextual features:

  • Is this a high-risk merchant category (digital goods, gambling, cryptocurrency)?
  • Is this an international transaction for a domestic user?
  • Is the transaction amount a round number (common in testing stolen cards)?
  • Was the card recently reported lost or compromised?

The delivery tip: Spend 40% of your model development time on feature engineering. In fraud detection, a simple model with excellent features usually outperforms a complex model with mediocre features.

Model Architecture

Do not start with a neural network. Gradient-boosted trees (XGBoost, LightGBM) typically match or outperform neural networks on tabular fraud data, and they are faster to train, easier to deploy, and more interpretable. Start with gradient boosting and only move to neural networks if you need to incorporate unstructured data (text, images) or graph-based features.

Handling class imbalance:

  • SMOTE (Synthetic Minority Over-sampling Technique): Generates synthetic fraud examples by interpolating between existing fraud cases. Use with caution: it can create unrealistic examples.
  • Cost-sensitive learning: Assign higher misclassification costs to fraud examples during training. This is the most practical approach and directly maps to the business cost asymmetry.
  • Undersampling the majority class: Train on a balanced subset. Simple and effective, though you lose information from the majority class.
  • Ensemble of balanced subsets: Train multiple models on different balanced subsamples and ensemble their predictions. This combines the benefits of undersampling with full data utilization.
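The last option is easier to picture with the sampling step written out. The sketch below only builds the balanced training subsets; which model you fit on each subset and how you average the predictions are left open. Row contents and the function name are placeholders.

```python
import random


def balanced_subsets(fraud_rows, legit_rows, n_subsets, seed=0):
    """Build training subsets that pair every fraud row with an equally
    sized random sample of legitimate rows.

    Assumes there are at least as many legitimate rows as fraud rows.
    """
    rng = random.Random(seed)
    subsets = []
    for _ in range(n_subsets):
        sampled_legit = rng.sample(legit_rows, k=len(fraud_rows))
        subsets.append(fraud_rows + sampled_legit)
    return subsets


# Placeholder rows: 3 fraud cases and 100 legitimate transactions.
fraud_rows = [("fraud", i) for i in range(3)]
legit_rows = [("legit", i) for i in range(100)]
subsets = balanced_subsets(fraud_rows, legit_rows, n_subsets=5)
print([len(s) for s in subsets])  # [6, 6, 6, 6, 6]
```

Each subset is small and balanced, but across enough subsets most of the legitimate data gets used at least once.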

The recommended approach: Use cost-sensitive gradient boosting (set scale_pos_weight in XGBoost to the ratio of legitimate to fraudulent transactions) as your primary model. If you need additional performance, ensemble it with an anomaly detection model (Isolation Forest) that catches novel fraud patterns.
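Computing the weight is a one-liner, shown here on a made-up label vector. The resulting dictionary uses real XGBoost parameter names (scale_pos_weight, objective, eval_metric), but the sketch stops short of training so that it runs without the library installed; the choice of "aucpr" as the evaluation metric is a suggestion, not something the recommendation above prescribes.

```python
def scale_pos_weight(labels):
    """Ratio of legitimate (0) to fraudulent (1) examples."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0:
        raise ValueError("no fraud examples in the training labels")
    return negatives / positives


# Hypothetical training labels: 50 fraud cases in 10,000 transactions.
labels = [1] * 50 + [0] * 9950

# Parameter dictionary in the shape XGBoost's training API accepts.
params = {
    "objective": "binary:logistic",
    "eval_metric": "aucpr",  # assumption: PR-AUC suits heavy imbalance
    "scale_pos_weight": scale_pos_weight(labels),
}
print(params["scale_pos_weight"])  # 199.0
```

With a 0.5% fraud rate, each fraud example carries roughly 199 times the weight of a legitimate one during training.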

Real-Time Scoring Pipeline

Fraud scoring must happen in real-time, which means your feature computation and model inference must complete within the latency budget.

Architecture:

  1. Transaction event arrives via the payment gateway
  2. Pre-computed features (user history, merchant profiles) are fetched from a feature store (Redis or similar). Target: 5 ms
  3. Real-time features (velocity counts, recent patterns) are computed from a streaming layer (Kafka Streams, Flink). Target: 10 ms
  4. Model inference runs on the combined feature vector. Target: 5 ms
  5. Business rules are applied on top of the model score (for example, always block if amount > $10,000 and account < 1 day old). Target: 2 ms
  6. Decision is returned to the payment gateway. Total: under 50 ms

Fallback strategy: If the ML scoring pipeline is unavailable (outage, latency spike), fall back to a simplified rule-based system. Never block all transactions because your ML system is down.
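A minimal sketch of the fallback path, assuming the ML scorer is a callable that raises an exception when it is unavailable or times out. The field names, the rule, and the function names are illustrative. Note that this wrapper only measures elapsed time; enforcing the timeout itself would happen in whatever client calls the scoring service.

```python
import time

# Stage targets from the architecture list above, in milliseconds.
STAGE_TARGETS_MS = {
    "feature_store": 5,
    "streaming_features": 10,
    "inference": 5,
    "business_rules": 2,
}
LATENCY_BUDGET_MS = 50


def rule_based_score(txn):
    """Simplified fallback rule (hypothetical field names)."""
    if txn["amount"] > 10_000 and txn["account_age_days"] < 1:
        return 1.0
    return 0.0


def score_with_fallback(txn, ml_scorer):
    """Use the ML scorer when it works, otherwise fall back to rules."""
    start = time.perf_counter()
    try:
        score, source = ml_scorer(txn), "ml"
    except Exception:
        score, source = rule_based_score(txn), "rules"
    elapsed_ms = (time.perf_counter() - start) * 1000
    return {
        "score": score,
        "source": source,
        "over_budget": elapsed_ms > LATENCY_BUDGET_MS,
    }


def unavailable_scorer(txn):
    """Stand-in for an ML service that is down."""
    raise TimeoutError("scoring service unavailable")


txn = {"amount": 12_500, "account_age_days": 0.5}
result = score_with_fallback(txn, unavailable_scorer)
headroom_ms = LATENCY_BUDGET_MS - sum(STAGE_TARGETS_MS.values())
print(result["source"], result["score"], headroom_ms)
```

The four stage targets add up to 22 ms, which leaves 28 ms of headroom inside the 50 ms budget for network hops and tail latency.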

Alert and Investigation System

Not all fraud predictions should result in automatic transaction blocking. Implement a tiered response:

  • High confidence fraud (score > 0.95): Auto-block the transaction. Notify the customer.
  • Medium confidence (0.7 - 0.95): Flag for analyst review within 10 minutes. Hold the transaction pending review.
  • Low confidence (0.4 - 0.7): Allow the transaction but add to the review queue for next-day analysis.
  • Below threshold (< 0.4): Allow and do not flag.
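The tiers above map directly onto a small routing function. The action names are placeholders, and how to treat scores that sit exactly on a boundary (0.95, 0.7, 0.4) is an assumption here, since the tiers do not specify it.

```python
def triage(score):
    """Map a fraud score in [0, 1] to a response tier."""
    if score > 0.95:
        return "auto_block"          # block and notify the customer
    if score >= 0.7:
        return "hold_for_review"     # analyst review within 10 minutes
    if score >= 0.4:
        return "allow_and_queue"     # allow, review next day
    return "allow"


for score in (0.97, 0.80, 0.50, 0.10):
    print(score, triage(score))
```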

The thresholds should be set based on the client's cost structure (the relative cost of false positives versus false negatives), not on arbitrary values.
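One way to derive a threshold from costs is to replay labeled historical transactions and pick the candidate threshold with the lowest total cost. In this sketch a missed fraud costs the transaction amount plus a chargeback fee, and a false positive costs a flat friction estimate. The transactions, fee, and friction cost are all made-up numbers.

```python
def total_cost(scored, threshold, fp_cost, chargeback_fee):
    """Cost of using `threshold` on labeled history.

    `scored` is a list of (score, is_fraud, amount) tuples.
    """
    cost = 0.0
    for score, is_fraud, amount in scored:
        flagged = score >= threshold
        if is_fraud and not flagged:
            cost += amount + chargeback_fee   # false negative
        elif flagged and not is_fraud:
            cost += fp_cost                   # false positive
    return cost


def best_threshold(scored, candidates, fp_cost, chargeback_fee):
    """Return the candidate threshold with the lowest total cost."""
    return min(candidates, key=lambda t: total_cost(scored, t, fp_cost, chargeback_fee))


# Hypothetical labeled history: (model score, was it fraud, amount in dollars).
scored = [
    (0.90, True, 500.0),
    (0.60, True, 200.0),
    (0.55, False, 40.0),
    (0.30, False, 60.0),
    (0.20, False, 25.0),
    (0.10, False, 80.0),
]
best = best_threshold(scored, [0.25, 0.5, 0.75], fp_cost=15.0, chargeback_fee=25.0)
print(best)  # 0.5
```

On this toy history, 0.25 costs $30 in friction, 0.5 costs $15, and 0.75 costs $225 because it misses the $200 fraud. The same replay can be run per tier to set each boundary.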

Evaluation Metrics That Matter

Precision at a fixed recall. "What percentage of flagged transactions are actually fraud, given that we catch 95% of all fraud?" This directly maps to analyst workload.

Recall at a fixed false positive rate. "What percentage of fraud do we catch if we accept a 2% false positive rate?" This directly maps to customer experience.
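Both metrics can be computed by walking down the transactions in descending score order. The sketch below is a plain-Python version on invented scores and labels; it ignores tied scores for simplicity, which a library implementation would handle.

```python
def precision_at_recall(scores, labels, target_recall):
    """Precision at the first cutoff that reaches `target_recall`."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_fraud = sum(labels)
    if total_fraud == 0:
        return 0.0
    true_positives = 0
    for flagged, (_, is_fraud) in enumerate(ranked, start=1):
        true_positives += is_fraud
        if true_positives / total_fraud >= target_recall:
            return true_positives / flagged
    return 0.0


def recall_at_fpr(scores, labels, max_fpr):
    """Best recall reachable without exceeding `max_fpr`."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_fraud = sum(labels)
    total_legit = len(labels) - total_fraud
    if total_fraud == 0 or total_legit == 0:
        return 0.0
    true_positives = false_positives = 0
    best_recall = 0.0
    for _, is_fraud in ranked:
        if is_fraud:
            true_positives += 1
        else:
            false_positives += 1
        if false_positives / total_legit > max_fpr:
            break
        best_recall = true_positives / total_fraud
    return best_recall


# Hypothetical model scores with labels (1 = fraud).
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

print(precision_at_recall(scores, labels, target_recall=0.75))  # 0.75
print(recall_at_fpr(scores, labels, max_fpr=0.2))               # 0.75
```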

Detection rate by fraud amount. Catching 95% of fraud by transaction count is good. But if the 5% you miss are the largest transactions, you might be catching 95% of fraud events but only 80% of fraud dollars. Weight your evaluation by transaction amount.

Time-to-detection. For fraud schemes that involve multiple transactions (account takeover, card testing), how quickly does the system identify the pattern? First-transaction detection is ideal but not always achievable.

Value detection rate. The dollar amount of fraud detected divided by the total dollar amount of fraud. This is the metric the CFO cares about.
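The gap between count-based and dollar-based detection is easy to reproduce. In this made-up example, the system catches 19 of 20 fraud cases, but the one it misses is the largest.

```python
def detection_rates(fraud_cases):
    """Return (count detection rate, value detection rate).

    `fraud_cases` is a list of (amount, caught) tuples covering
    confirmed fraud only.
    """
    caught_count = sum(1 for _, caught in fraud_cases if caught)
    caught_value = sum(amount for amount, caught in fraud_cases if caught)
    total_value = sum(amount for amount, _ in fraud_cases)
    return caught_count / len(fraud_cases), caught_value / total_value


# Hypothetical: 19 caught cases of $100 each, one missed case of $475.
fraud_cases = [(100.0, True)] * 19 + [(475.0, False)]
count_rate, value_rate = detection_rates(fraud_cases)
print(f"by count: {count_rate:.0%}, by value: {value_rate:.0%}")
```

This prints 95% by count and 80% by value, the same split described above.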

False positive rate by customer segment. Are certain customer segments (international travelers, high-value customers) disproportionately affected by false positives? Segment-level analysis prevents customer experience problems.

Dealing with Adversarial Adaptation

Fraud detection is an arms race. Here is how to build systems that stay ahead:

Continuous retraining. Retrain the model monthly (at minimum) on the latest labeled data. Fraud patterns evolve, and stale models degrade.

Ensemble diversity. Use multiple model types (gradient boosting + anomaly detection + rule-based) so that defeating one model type is not sufficient to evade the system.

Feature monitoring. Track feature distributions in real-time. When a fraud feature's distribution shifts (e.g., the velocity of new account creation spikes), it may indicate a new attack pattern even before the model's performance degrades.
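One common way to quantify a distribution shift is the Population Stability Index (PSI), which compares binned counts from a baseline period against current counts. PSI is a choice made for this sketch, not something the approach above requires, and the bin counts and the 0.25 alert level (a widely used rule of thumb) are assumptions.

```python
import math


def population_stability_index(expected_counts, actual_counts, eps=1e-6):
    """PSI between two histograms that share the same bins."""
    expected_total = sum(expected_counts)
    actual_total = sum(actual_counts)
    psi = 0.0
    for expected, actual in zip(expected_counts, actual_counts):
        # Floor each share at eps so empty bins do not break the log.
        e_share = max(expected / expected_total, eps)
        a_share = max(actual / actual_total, eps)
        psi += (a_share - e_share) * math.log(a_share / e_share)
    return psi


# Hypothetical bins for "transactions in the last hour": 0, 1-2, 3-5, 6+.
baseline = [700, 200, 80, 20]
today = [400, 250, 200, 150]

psi = population_stability_index(baseline, today)
print(f"PSI={psi:.3f}", "ALERT" if psi > 0.25 else "ok")
```

Here the share of users with six or more transactions per hour jumps from 2% to 15%, which pushes the PSI well past the alert level before any labels arrive.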

Anomaly detection layer. In addition to supervised models trained on known fraud patterns, deploy unsupervised anomaly detection that flags statistically unusual transactions. This catches novel fraud that supervised models have never seen.

Regular red-teaming. Periodically test the system by simulating fraud attacks. Hire an external penetration tester or use synthetic fraud scenarios to identify weaknesses.

Feedback loop speed. The faster you label outcomes (was this transaction ultimately fraudulent?), the faster you can retrain. Work with the client to accelerate the chargeback and dispute resolution process for model feedback.

Regulatory and Compliance Considerations

PCI DSS compliance. If you handle cardholder data, your systems must comply with PCI Data Security Standard. This affects how you store, process, and transmit transaction data. If possible, work with tokenized data where the actual card numbers are replaced with tokens.

Fair lending and non-discrimination. In the US, fraud systems that disproportionately block transactions for protected groups can raise fair lending and anti-discrimination concerns. Test your model for demographic disparities and mitigate them.

Right to explanation. Customers whose transactions are blocked have a right (regulatory or practical) to understand why. Implement SHAP or similar explainability for every blocked transaction.

Model documentation. Regulators expect documented model development processes, validation results, and ongoing monitoring reports. Build these into your delivery from day one.

Pricing Fraud Detection Projects

Fraud detection commands premium pricing due to its direct impact on financial losses:

  • Discovery and assessment: $20,000 - $40,000
  • Feature engineering and model development: $60,000 - $150,000
  • Real-time scoring pipeline: $40,000 - $100,000
  • Alert and investigation system: $30,000 - $60,000
  • Integration and deployment: $30,000 - $70,000
  • Total typical engagement: $180,000 - $420,000

Ongoing operations: $10,000 - $20,000 per month for monitoring, retraining, and rule updates.

Value-based pricing opportunity: If your system reduces fraud losses by $5 million annually, charging $300,000 for the build and $15,000 per month for operations is easily justified. Some agencies price fraud detection as a percentage of fraud savings โ€” typically 10-20% of the reduction in losses.
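The arithmetic behind that comparison, using the figures from the paragraph above (the function name is just for illustration):

```python
def first_year_fee(build_fee, monthly_ops_fee, months=12):
    """Build fee plus operations fees for the first year."""
    return build_fee + monthly_ops_fee * months


annual_savings = 5_000_000
fee = first_year_fee(build_fee=300_000, monthly_ops_fee=15_000)
share_of_savings = fee / annual_savings

# Percentage-of-savings pricing at the 10% and 20% ends of the range.
low, high = 0.10 * annual_savings, 0.20 * annual_savings

print(f"fixed pricing: ${fee:,} ({share_of_savings:.1%} of savings)")
print(f"percentage pricing: ${low:,.0f} to ${high:,.0f} per year")
```

Fixed pricing comes to $480,000 in year one, or 9.6% of the savings, which sits just below the 10-20% range that percentage-based pricing would yield ($500,000 to $1,000,000).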

Your Next Step

If you are pursuing fraud detection clients, build a proof of concept using the publicly available IEEE-CIS Fraud Detection dataset from Kaggle. Implement the feature engineering framework described above, train a gradient-boosted model with cost-sensitive learning, and document the precision-recall tradeoff at different thresholds. That proof of concept, adapted with the client's terminology and cost structure, becomes the centerpiece of your pitch. Show the prospect their potential savings at different detection rates, and let the numbers sell the engagement.

Beyond the dataset, prepare a one-page ROI model. On one side, show the client's current fraud losses and investigation costs. On the other side, show the projected performance of the ML system at different operating points: aggressive (high recall, more false positives), balanced (optimal tradeoff), and conservative (low false positives, some missed fraud). Let the client choose the operating point that matches their risk tolerance. This consultative approach demonstrates that you understand their business, not just the technology, and positions you as a strategic partner rather than a vendor.
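A sketch of the numbers behind such a one-pager, comparing the three operating points on missed fraud plus review cost. Every figure here (fraud exposure, detection rates, review volumes, cost per review) is a placeholder to be replaced with the client's own data.

```python
def annual_cost(fraud_exposure, value_detection_rate, reviews_per_year, cost_per_review):
    """Missed fraud dollars plus analyst review cost for one operating point."""
    missed_fraud = fraud_exposure * (1 - value_detection_rate)
    review_cost = reviews_per_year * cost_per_review
    return missed_fraud + review_cost


# Placeholder inputs: replace with the client's actual figures.
FRAUD_EXPOSURE = 10_000_000   # dollars of attempted fraud per year
COST_PER_REVIEW = 4.0         # analyst cost per flagged transaction

# name: (value detection rate, flagged transactions per year)
OPERATING_POINTS = {
    "aggressive": (0.97, 900_000),
    "balanced": (0.93, 300_000),
    "conservative": (0.85, 150_000),
}

table = {
    name: annual_cost(FRAUD_EXPOSURE, rate, reviews, COST_PER_REVIEW)
    for name, (rate, reviews) in OPERATING_POINTS.items()
}
for name, cost in table.items():
    print(f"{name:>12}: ${cost:,.0f} per year")
```

With these placeholder inputs the balanced point is cheapest, but the ranking depends entirely on the client's review costs and fraud exposure, which is the reason to let them choose.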
