A Truncated Decimal Cost a Firm $4.7M in Misreported Value

Agency Script Editorial, Editorial Team

March 21, 2026 · 13 min read

A financial services firm serving institutional investors discovered that a data feed from their custodian had been silently truncating decimal places on foreign exchange rates for three months. The error was subtle (rates that should have been 1.08543 were arriving as 1.08), but the cumulative impact on portfolio valuation was $4.7 million in misreported asset values. Client reports had been wrong for an entire quarter. The issue was only discovered during a manual reconciliation that happened by coincidence. Nobody was monitoring for this type of error because their data quality checks were limited to null detection and basic range validation.

We deployed an AI-powered data quality monitoring system that learns the statistical properties of every data field, detects deviations in real time, and alerts the data team before bad data propagates to downstream systems. The system would have caught the decimal truncation within 15 minutes of the first corrupted data arriving: the shift in the precision distribution from 5 decimal places to 2 would have triggered an immediate alert. In its first 6 months, the system caught 23 data quality incidents, including 4 that the data team classified as "would have caused material downstream impact if undetected."

Automated data quality monitoring is one of the most practical and immediately valuable AI services an agency can deliver. Every data-driven company has data quality problems; most do not know about them until something breaks. Here is the delivery playbook.

Why Data Quality Monitoring Is a High-Value Service

Data quality is the foundation that every analytics, ML, and business intelligence initiative depends on. When it fails, everything downstream fails.

The cost of bad data:

  • Companies lose an estimated 15-25 percent of revenue due to poor data quality
  • Data quality issues are the number one reason ML models fail in production
  • 60 percent of data scientists spend more time cleaning data than analyzing it
  • The average cost of a data quality incident in financial services is $500,000-5,000,000

Why traditional data quality approaches fail:

  • Rule-based checks are incomplete: You can only write rules for problems you anticipate. The most damaging data quality issues are the ones nobody thought to check for.
  • Manual monitoring does not scale: As data sources and pipelines multiply, manual quality review becomes impossible.
  • Checks are applied inconsistently: Different teams apply different quality standards, and checks are often skipped under time pressure.
  • Delayed detection: Most data quality issues are discovered days or weeks after they start, when the damage is already done.

What AI-powered monitoring adds:

  • Learns what "normal" looks like for every data field without manual rule configuration
  • Detects subtle anomalies that rule-based checks miss
  • Monitors continuously and alerts in real-time
  • Adapts to seasonal patterns and expected variations, which keeps false positives low
  • Scales to thousands of data fields across hundreds of sources

What clients will pay: Data quality monitoring projects range from $50,000 for focused monitoring of critical data assets to $250,000+ for comprehensive data observability platforms. Ongoing retainers run $8,000-20,000 per month.

Core Data Quality Dimensions

AI monitoring should cover all dimensions of data quality:

Freshness

Is data arriving on time?

  • Monitor arrival times for each data source
  • Detect delays before downstream consumers are affected
  • Distinguish between late data and missing data
  • Account for expected schedule variations (weekends, holidays)

Volume

Is the expected amount of data arriving?

  • Monitor record counts, file sizes, and byte volumes
  • Detect both unusual increases (duplicate data, wrong file) and decreases (truncated data, missing records)
  • Account for seasonal patterns (higher volume on business days, lower on weekends)
  • Track volume trends over time

Schema

Does the data structure match expectations?

  • Monitor for added, removed, or renamed fields
  • Detect data type changes
  • Track nullable field changes
  • Monitor nested structure changes
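
The schema checks above can be sketched as a comparison between two schema snapshots. This is a minimal illustration, not a specific tool's API: the `Column` type, the `diff_schema` function, and the example field names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    dtype: str
    nullable: bool


def diff_schema(expected: dict[str, Column], observed: dict[str, Column]) -> list[str]:
    """Return human-readable differences between two schema snapshots."""
    changes = []
    for name in sorted(expected.keys() - observed.keys()):
        changes.append(f"removed: {name}")
    for name in sorted(observed.keys() - expected.keys()):
        changes.append(f"added: {name}")
    for name in sorted(expected.keys() & observed.keys()):
        old, new = expected[name], observed[name]
        if old.dtype != new.dtype:
            changes.append(f"type changed: {name} {old.dtype} -> {new.dtype}")
        if old.nullable != new.nullable:
            changes.append(f"nullability changed: {name} {old.nullable} -> {new.nullable}")
    return changes


# Hypothetical snapshots of the same feed on two different days.
expected = {
    "trade_id": Column("string", False),
    "fx_rate": Column("decimal(18,5)", False),
    "desk": Column("string", True),
}
observed = {
    "trade_id": Column("string", False),
    "fx_rate": Column("decimal(18,2)", False),
    "desk_code": Column("string", True),
}
schema_changes = diff_schema(expected, observed)
for change in schema_changes:
    print(change)
```

A rename shows up here as one removal plus one addition. Recognizing it as a rename would need extra logic, such as comparing the value profiles of the removed and added fields.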

Distribution

Do data values follow expected patterns?

  • Statistical distribution of numeric fields (mean, standard deviation, percentiles)
  • Cardinality of categorical fields
  • Null rates and zero rates
  • Value range boundaries
  • Pattern distributions for string fields
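
One way to picture a pattern distribution check is the decimal truncation from the opening example. The sketch below profiles how many decimal places each value carries and measures how far the current profile has moved from the baseline. It assumes the values are available as raw strings, since a float would already have lost the precision information. The function names and sample rates are illustrative.

```python
from collections import Counter
from decimal import Decimal


def decimal_places(value: str) -> int:
    """Count digits after the decimal point in a raw string value.

    Assumes a finite decimal string; NaN or Infinity would need separate handling.
    """
    exponent = Decimal(value).as_tuple().exponent
    return max(0, -exponent)


def precision_profile(values: list[str]) -> dict[int, float]:
    """Share of values at each decimal precision."""
    counts = Counter(decimal_places(v) for v in values)
    total = len(values)
    return {places: n / total for places, n in counts.items()}


def precision_shift(baseline: dict[int, float], current: dict[int, float]) -> float:
    """Total variation distance between two profiles (0 = identical, 1 = disjoint)."""
    keys = baseline.keys() | current.keys()
    return 0.5 * sum(abs(baseline.get(k, 0.0) - current.get(k, 0.0)) for k in keys)


# Hypothetical FX rates before and after a feed starts truncating.
baseline = precision_profile(["1.08543", "1.27310", "0.91872", "1.10456"])
current = precision_profile(["1.08", "1.27", "0.91", "1.10"])
shift = precision_shift(baseline, current)
print(baseline, current, shift)
```

A range check would pass every one of the truncated values, because 1.08 is a plausible exchange rate. The precision profile is what changes.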

Consistency

Do related data fields maintain expected relationships?

  • Cross-field ratios and correlations
  • Referential integrity across tables
  • Consistency across related data sources
  • Business rule compliance

Accuracy

Does the data represent reality?

  • Comparison against known reference values
  • Cross-validation with independent data sources
  • Reasonableness checks based on domain knowledge
  • Reconciliation with external benchmarks
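
Comparison against reference values can be as simple as a tolerance check per key. The following sketch assumes the reference comes from an independent source and that a relative tolerance is an acceptable definition of "matches". The `reconcile` function, the tolerance, and the rates are hypothetical.

```python
import math


def reconcile(observed: dict[str, float], reference: dict[str, float],
              rel_tol: float = 1e-4) -> list[str]:
    """Return keys that are missing from observed or differ from the reference."""
    mismatches = []
    for key, ref_value in reference.items():
        value = observed.get(key)
        if value is None or not math.isclose(value, ref_value, rel_tol=rel_tol):
            mismatches.append(key)
    return mismatches


# Hypothetical feed values versus an independent benchmark.
observed = {"EURUSD": 1.08, "GBPUSD": 1.2731}
reference = {"EURUSD": 1.08543, "GBPUSD": 1.27310, "USDJPY": 151.42}
mismatches = reconcile(observed, reference)
print(mismatches)
```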

Technical Architecture

Data Profiling Engine

The profiling engine continuously analyzes incoming data and builds statistical profiles.

For each data field, the profiler tracks:

  • Data type and format distribution
  • Null and empty rate
  • Unique value count (cardinality)
  • For numeric fields: mean, median, standard deviation, min, max, percentiles (5th, 25th, 75th, 95th)
  • For string fields: length distribution, character set, pattern distribution
  • For temporal fields: range, gaps, frequency
  • For categorical fields: value frequency distribution
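
As a rough sketch of the numeric part of that list, the function below builds a profile for one field using only the Python standard library. The function name and the output keys are assumptions, and it expects at least two non-null values.

```python
import statistics


def profile_numeric(values: list) -> dict:
    """Profile one numeric field; None entries count as nulls."""
    present = [v for v in values if v is not None]
    # n=20 yields 19 cut points at 5%, 10%, ..., 95%.
    cuts = statistics.quantiles(present, n=20, method="inclusive")
    return {
        "null_rate": (len(values) - len(present)) / len(values),
        "cardinality": len(set(present)),
        "mean": statistics.fmean(present),
        "median": statistics.median(present),
        "stdev": statistics.stdev(present),
        "min": min(present),
        "max": max(present),
        "p05": cuts[0],
        "p25": cuts[4],
        "p75": cuts[14],
        "p95": cuts[18],
    }


profile = profile_numeric([1.0, 2.0, 3.0, 4.0, None])
print(profile)
```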

Profiling approaches:

  • Full scan for batch data (profile the entire dataset on arrival)
  • Sampling for streaming data (profile a statistically representative sample)
  • Incremental profiling for append-only data (update running statistics with new data)
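
For the incremental case, running statistics can be updated one value at a time without keeping the raw data. The sketch below uses Welford's algorithm, which is one common choice rather than anything the approach above requires. The class name is illustrative.

```python
import math


class RunningStats:
    """Running count, mean, and sample standard deviation (Welford's algorithm)."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, x: float) -> None:
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def stdev(self) -> float:
        if self.count < 2:
            return 0.0
        return math.sqrt(self._m2 / (self.count - 1))


stats = RunningStats()
for value in [2, 4, 4, 4, 5, 5, 7, 9]:
    stats.update(value)
print(stats.count, stats.mean, stats.stdev)
```

Because only three numbers are stored per field, this also fits the later advice to keep aggregate statistics rather than raw data.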

Anomaly Detection Models

Statistical models for each quality dimension:

Freshness monitoring:

  • Model expected arrival times as a distribution
  • Account for day-of-week and calendar effects
  • Alert when arrival time exceeds the expected range
  • Differentiate between "late" and "missing"
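
A minimal sketch of the late versus missing distinction, assuming arrival delays are roughly normally distributed and that the history has already been filtered to comparable days (same weekday, no holidays). The sigma multipliers and the function name are illustrative defaults, not recommendations.

```python
import statistics


def freshness_status(history: list[float], elapsed: float, arrived: bool,
                     late_sigmas: float = 3.0, missing_sigmas: float = 10.0) -> str:
    """Classify a delivery given past delays (minutes after the scheduled time).

    elapsed is the observed delay if the data arrived, otherwise the minutes
    that have passed since the scheduled time.
    """
    mean = statistics.fmean(history)
    spread = statistics.stdev(history)
    late_after = mean + late_sigmas * spread
    missing_after = mean + missing_sigmas * spread
    if arrived:
        return "late" if elapsed > late_after else "on_time"
    if elapsed > missing_after:
        return "missing"
    return "late" if elapsed > late_after else "pending"


# Hypothetical delays, in minutes, for the same weekday over eight weeks.
history = [10, 12, 9, 11, 13, 10, 12, 11]
print(freshness_status(history, 12, arrived=True))
print(freshness_status(history, 13, arrived=False))
print(freshness_status(history, 20, arrived=False))
print(freshness_status(history, 60, arrived=False))
```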

Volume monitoring:

  • Time-series model of expected volume (accounting for trends, seasonality, and day-of-week effects)
  • Alert when actual volume deviates significantly from expected
  • Use prediction intervals (not just point estimates) to set appropriate thresholds
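
The idea of an interval per seasonal slot can be illustrated with a deliberately simple baseline: one interval per weekday, built from the mean and standard deviation of past counts. This is a simplification that ignores trend, and a production system would more likely use a proper time-series model. The function names, the `z` multiplier, and the counts are assumptions.

```python
import statistics
from collections import defaultdict
from datetime import date


def weekday_intervals(history: list[tuple[date, int]],
                      z: float = 3.0) -> dict[int, tuple[float, float]]:
    """Build a (low, high) interval of expected record counts per weekday."""
    by_weekday = defaultdict(list)
    for day, count in history:
        by_weekday[day.weekday()].append(count)
    intervals = {}
    for weekday, counts in by_weekday.items():
        center = statistics.fmean(counts)
        spread = statistics.stdev(counts)  # needs at least two counts per weekday
        intervals[weekday] = (center - z * spread, center + z * spread)
    return intervals


def volume_anomaly(intervals: dict[int, tuple[float, float]],
                   day: date, count: int) -> bool:
    low, high = intervals[day.weekday()]
    return not (low <= count <= high)


# Hypothetical record counts: four Mondays and four Saturdays.
history = [
    (date(2026, 3, 2), 10000), (date(2026, 3, 9), 10200),
    (date(2026, 3, 16), 9900), (date(2026, 3, 23), 10100),
    (date(2026, 3, 7), 1000), (date(2026, 3, 14), 1100),
    (date(2026, 3, 21), 950), (date(2026, 3, 28), 1050),
]
intervals = weekday_intervals(history)
print(volume_anomaly(intervals, date(2026, 3, 30), 1000))  # a Monday
print(volume_anomaly(intervals, date(2026, 4, 4), 1000))   # a Saturday
```

The same count of 1,000 records is flagged on a Monday and accepted on a Saturday, which is the behavior a single global threshold cannot provide.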

Distribution monitoring:

  • For each statistic (mean, null rate, cardinality, etc.), model its expected value over time
  • Use multivariate anomaly detection to catch shifts that are only visible when considering multiple statistics together
  • Implement change-point detection for sudden distribution shifts
  • Use drift detection methods for gradual distribution changes
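
For sudden shifts in a tracked statistic, a two-sided CUSUM is one well-known change-point technique. It is used here as an example and is not necessarily the method any particular platform applies. The `target`, `slack`, and `threshold` values below are hypothetical and would normally come from the baseline profile.

```python
def cusum_change_point(series: list[float], target: float,
                       slack: float, threshold: float):
    """Return the index where an upward or downward CUSUM first exceeds
    threshold, or None if the series stays in control."""
    high = 0.0  # accumulates evidence of an upward shift
    low = 0.0   # accumulates evidence of a downward shift
    for i, x in enumerate(series):
        high = max(0.0, high + (x - target - slack))
        low = max(0.0, low + (target - x - slack))
        if high > threshold or low > threshold:
            return i
    return None


# Hypothetical daily null rate for one field; it jumps at index 5.
null_rates = [0.020, 0.021, 0.019, 0.020, 0.022, 0.050, 0.051, 0.049, 0.052]
detected_at = cusum_change_point(null_rates, target=0.02, slack=0.005, threshold=0.04)
print(detected_at)
```

With these settings the shift that starts at index 5 is flagged at index 6. The `slack` term absorbs ordinary noise, so small day-to-day wobble never accumulates into an alert.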

Cross-field monitoring:

  • Model expected relationships between fields (correlations, ratios, conditional distributions)
  • Alert when relationships change, even if individual fields appear normal
  • Use association rule mining to discover expected co-occurrence patterns
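
The case where each field looks normal but the relationship is broken can be shown with a learned ratio. The sketch assumes a stable ratio between two fields (revenue per order in this made-up example) and a nonzero denominator. The names and numbers are illustrative.

```python
import statistics


def ratio_baseline(pairs: list[tuple[float, float]]) -> tuple[float, float]:
    """Mean and standard deviation of a/b over historical (a, b) pairs."""
    ratios = [a / b for a, b in pairs]
    return statistics.fmean(ratios), statistics.stdev(ratios)


def ratio_is_anomalous(pair: tuple[float, float],
                       baseline: tuple[float, float], z: float = 3.0) -> bool:
    mean, spread = baseline
    a, b = pair
    return abs(a / b - mean) > z * spread


# Hypothetical daily (revenue, order_count) history.
history = [(50000, 1000), (61000, 1200), (44000, 900), (55500, 1100), (49000, 1000)]
baseline = ratio_baseline(history)

# Revenue 55,000 and 1,000 orders each sit inside their historical ranges,
# but together they imply 55 per order against a baseline near 50.
print(ratio_is_anomalous((55000, 1000), baseline))
print(ratio_is_anomalous((52000, 1050), baseline))
```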

Alert Management System

Raw anomaly detection generates too many alerts. The alert management layer prioritizes, groups, and enriches alerts for human consumption.

Alert processing:

  1. Severity classification: Critical (blocks downstream processes), High (material impact on analytics), Medium (noticeable but manageable), Low (cosmetic or minor)
  2. Grouping: Aggregate related alerts (if 15 fields in the same table are anomalous, that is one root cause, not 15 independent issues)
  3. Root cause analysis: Trace anomalies to their likely source (upstream pipeline failure, source system change, infrastructure issue)
  4. Impact assessment: Map the anomalous data to affected downstream consumers (dashboards, reports, models, applications)
  5. Context enrichment: Include relevant details (recent changes, historical frequency of this anomaly type, suggested investigation steps)
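
Steps 1 and 2 can be sketched as follows. Grouping by table is a crude stand-in for real root cause grouping, and the `Alert` fields, the severity labels, and the table names are assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass

SEVERITY_ORDER = ["low", "medium", "high", "critical"]


@dataclass
class Alert:
    table: str
    field: str
    severity: str


def group_alerts(alerts: list[Alert]) -> list[dict]:
    """Collapse field-level alerts into one incident per table, worst first."""
    grouped = defaultdict(list)
    for alert in alerts:
        grouped[alert.table].append(alert)
    incidents = []
    for table, members in grouped.items():
        worst = max(members, key=lambda a: SEVERITY_ORDER.index(a.severity))
        incidents.append({
            "table": table,
            "severity": worst.severity,
            "fields": sorted(a.field for a in members),
        })
    incidents.sort(key=lambda inc: SEVERITY_ORDER.index(inc["severity"]), reverse=True)
    return incidents


alerts = [
    Alert("positions", "qty", "low"),
    Alert("fx_rates", "bid", "high"),
    Alert("fx_rates", "ask", "critical"),
    Alert("fx_rates", "mid", "medium"),
]
incidents = group_alerts(alerts)
for incident in incidents:
    print(incident)
```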

Delivery Framework

Phase 1: Discovery and Data Landscape (Weeks 1-3)

Activities:

  • Inventory critical data assets (sources, pipelines, consumers)
  • Map data dependencies and critical paths
  • Interview data consumers about past quality incidents and current pain points
  • Assess existing quality monitoring (what checks exist today?)
  • Prioritize data assets for monitoring based on business impact
  • Analyze historical data to understand normal patterns

Deliverable: Data quality assessment report with prioritized monitoring plan.

Phase 2: Profiling and Baseline (Weeks 4-6)

Activities:

  • Deploy the data profiling engine on priority data assets
  • Build baseline profiles from 4-8 weeks of data, using historical data where it is available (enough to capture weekly and monthly patterns)
  • Configure monitoring models for each quality dimension
  • Implement alert thresholds based on baseline profiles
  • Build the alert management and routing system

Phase 3: Detection and Alerting (Weeks 7-9)

Activities:

  • Activate anomaly detection in monitoring mode (detect and log, but do not alert)
  • Tune detection thresholds to minimize false positives while catching real issues
  • Validate detections against known historical incidents (would the system have caught them?)
  • Activate alerting for high-confidence detections
  • Build the monitoring dashboard

Phase 4: Integration and Handoff (Weeks 10-12)

Activities:

  • Integrate with incident management systems (PagerDuty, OpsGenie, Slack)
  • Integrate with data pipeline orchestration (pause downstream pipelines when quality issues are detected)
  • Implement data quarantine workflows for automatically flagged bad data
  • Train the data team on monitoring tools and alert response
  • Document runbooks for common alert types
  • Transition to ongoing support

Common Delivery Challenges

False Positive Management

The biggest risk to adoption is alert fatigue from false positives. Data teams will ignore the system if it cries wolf too often.

Strategies:

  • Start conservative: miss some true anomalies rather than flooding the team with false positives
  • Tune thresholds over the first 4-6 weeks based on team feedback
  • Implement an "expected variation" calendar (known events that cause legitimate data changes)
  • Use feedback loops where team members can mark alerts as false positive, and the system learns from these
  • Target a false positive rate below 15 percent for high-severity alerts
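
Two of these strategies are easy to make concrete: an expected-variation calendar and a false positive rate computed from reviewer feedback. The sketch below assumes the calendar is a set of (date, source) pairs and that each reviewed alert carries a true or false label. Both structures and the example entries are hypothetical.

```python
from datetime import date


def is_expected(day: date, source: str, calendar: set[tuple[date, str]]) -> bool:
    """True if a known event explains unusual data from this source on this day."""
    return (day, source) in calendar


def false_positive_rate(labels: list[bool]) -> float:
    """labels holds one entry per reviewed alert: True means false positive."""
    return sum(labels) / len(labels) if labels else 0.0


# Hypothetical calendar entry and review results.
calendar = {(date(2026, 12, 25), "custodian_feed")}
reviewed = [False] * 18 + [True] * 2

suppressed = is_expected(date(2026, 12, 25), "custodian_feed", calendar)
rate = false_positive_rate(reviewed)
print(suppressed, rate, rate < 0.15)
```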

Handling Legitimate Data Changes

Not every data change is a quality issue. Businesses evolve: new products launch, markets expand, seasonality shifts. The monitoring system needs to distinguish between legitimate changes and quality issues.

Approaches:

  • Allow manual baseline resets when legitimate changes are confirmed
  • Implement automatic adaptation with a configurable learning rate
  • Provide context with every alert so the data team can quickly assess whether the change is expected
  • Integrate with change management systems to correlate data changes with known business changes
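
The first two approaches can be combined in one small object: a baseline that drifts toward new observations at a configurable learning rate and can also be reset by hand. Exponential smoothing is used here as one plausible reading of "automatic adaptation". The class and its parameters are illustrative.

```python
class AdaptiveBaseline:
    """Expected value that adapts via exponential smoothing, with manual reset."""

    def __init__(self, initial: float, learning_rate: float = 0.1) -> None:
        if not 0.0 < learning_rate <= 1.0:
            raise ValueError("learning_rate must be in (0, 1]")
        self.value = initial
        self.learning_rate = learning_rate

    def update(self, observed: float) -> float:
        """Move the baseline a fraction of the way toward the observation."""
        self.value += self.learning_rate * (observed - self.value)
        return self.value

    def reset(self, confirmed_value: float) -> None:
        """Jump straight to a new level once a legitimate change is confirmed."""
        self.value = confirmed_value


baseline = AdaptiveBaseline(initial=100.0, learning_rate=0.5)
first = baseline.update(200.0)
second = baseline.update(200.0)
baseline.reset(200.0)
print(first, second, baseline.value)
```

A low learning rate keeps the baseline stable but slow to accept real change, while a high one adapts quickly and risks absorbing a genuine quality problem into "normal". That trade-off is why the manual reset remains useful.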

Scale and Performance

Monitoring thousands of data fields across hundreds of pipelines generates significant computation and storage requirements.

Optimization:

  • Profile in batch during off-peak hours for non-latency-sensitive data
  • Use statistical sampling for very large datasets
  • Implement tiered monitoring (real-time for critical data, hourly for important data, daily for everything else)
  • Store only aggregate statistics, not raw data, for the monitoring system
  • Use efficient time-series storage for historical profile data
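
Tiered monitoring amounts to a schedule per tier. In the sketch below the tier names and the one-minute interval standing in for "real-time" are assumptions. The hourly and daily intervals follow the list above.

```python
from datetime import datetime, timedelta

# Hypothetical tier definitions; one minute is a stand-in for "real-time".
TIER_INTERVALS = {
    "critical": timedelta(minutes=1),
    "important": timedelta(hours=1),
    "standard": timedelta(days=1),
}


def is_due(tier: str, last_profiled: datetime, now: datetime) -> bool:
    """True if an asset in this tier should be profiled again."""
    return now - last_profiled >= TIER_INTERVALS[tier]


now = datetime(2026, 3, 21, 12, 0)
last_profiled = datetime(2026, 3, 21, 11, 30)
due = {tier: is_due(tier, last_profiled, now) for tier in TIER_INTERVALS}
print(due)
```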

Organizational Adoption

Data quality monitoring requires buy-in from data producers (who need to respond to alerts) and data consumers (who need to report quality issues).

Driving adoption:

  • Start with data assets that have caused pain recently (recent quality incidents)
  • Quantify the cost of quality issues that the system would have caught
  • Make the dashboard and alerts accessible and useful, not just technical
  • Include data quality metrics in team and organizational performance reporting
  • Celebrate catches: when the system prevents a quality incident, make sure stakeholders know

Pricing Data Quality Monitoring

Project-based pricing:

  • Focused monitoring (5-10 critical data assets): $50,000-100,000
  • Comprehensive data observability platform: $120,000-250,000
  • Enterprise data quality system (multi-domain, multi-team): $200,000-400,000

Ongoing retainer:

  • Monitoring system maintenance: $5,000-12,000 per month
  • New data source onboarding: $3,000-8,000 per source
  • Model tuning and false positive reduction: $3,000-5,000 per month

Value justification: A single undetected data quality incident can cost $500,000 or more in financial services. A $150,000 monitoring system that prevents even two major incidents per year pays for itself immediately. Add the time savings from automated monitoring (no more manual spot checks) and the case becomes even stronger.
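
The arithmetic behind that claim can be written out. The sketch uses the figures quoted above ($150,000 system, $500,000 per incident, two incidents prevented) and, as an optional extra, the top of the maintenance retainer range from the pricing list. The function name is illustrative.

```python
def first_year_net_benefit(system_cost: int, incident_cost: int,
                           incidents_prevented: int, monthly_retainer: int = 0) -> int:
    """Avoided incident cost minus project cost and twelve months of retainer."""
    avoided = incident_cost * incidents_prevented
    total_cost = system_cost + 12 * monthly_retainer
    return avoided - total_cost


project_only = first_year_net_benefit(150_000, 500_000, 2)
with_retainer = first_year_net_benefit(150_000, 500_000, 2, monthly_retainer=12_000)
print(project_only, with_retainer)
```

On these numbers the first-year net benefit is $850,000 for the project alone, or $706,000 with a $12,000 monthly retainer included.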

Your Next Step

Find a data-driven company that has experienced a painful data quality incident in the past year. Offer a paid data quality assessment where you profile their critical data assets, identify the quality dimensions currently unmonitored, and estimate the risk of undetected issues. When you show them the anomalies that exist in their current data (and there will always be anomalies), the case for automated monitoring becomes immediate and urgent.
