A $420K Engagement With No Data Agreement Signed

A boutique AI agency in Austin signed a $420,000 engagement to build a predictive maintenance system for a mid-size manufacturing company. The agency needed historical equipment sensor data, maintenance logs, and failure records — roughly 18 months of operational data across 340 machines. The client agreed to share it. No formal data sharing agreement was signed. The data arrived in a mix of CSV files, proprietary database exports, and handwritten maintenance logs that had been scanned into PDFs.

Six months into the project, the client's legal team discovered that the sensor data included readings from equipment leased from a third party whose contract prohibited sharing operational data with outside vendors. The entire model had been trained on data the client had no right to share. The agency had to scrap three months of work, retrain on a reduced dataset, and eat $140,000 in costs that a proper data sharing agreement would have flagged before the first byte of data changed hands.

Data sharing agreements are not paperwork for the sake of paperwork. They are the operational foundation of every AI project. They define what data moves, how it moves, who can use it, what happens when the project ends, and who bears liability when something goes wrong. If you are building AI products for clients and you do not have a rigorous data sharing framework, you are building on quicksand.

Why AI Projects Need Specialized Data Sharing Agreements

Traditional data sharing agreements — the kind used for business analytics, reporting, and standard software integrations — were not designed for AI. AI introduces unique data dynamics that standard agreements fail to address.

Data is fuel, not just input. In traditional software, data flows through the system and produces outputs. In AI, data shapes the system itself: training data becomes embedded in model weights and continues to influence the model after the raw files are gone. That is why the distinction between "using data" (processing it and passing it on) and "consuming data" (absorbing it into the model) matters so much for data sharing terms.

Data quality directly determines product quality. A data sharing agreement for an AI project needs data quality provisions that would be unnecessary in a traditional software context. If the client shares incomplete, biased, or inaccurate data, the AI system will produce incomplete, biased, or inaccurate outputs. The agreement needs to allocate this risk.

Derived data creates new IP questions. When your agency uses client data to train an AI model, the resulting model weights represent a new form of derived intellectual property. Is that derived IP owned by the client (whose data created it), the agency (whose expertise built it), or is it shared? The data sharing agreement needs to answer this.

Regulatory requirements are data-specific. Privacy regulations like GDPR and CCPA impose specific requirements on how personal data is shared, processed, and retained. AI-specific regulations add further requirements around training data documentation, bias assessment, and transparency. Your data sharing agreement needs to address the intersection of data privacy and AI regulation.

Anatomy of an AI Data Sharing Agreement

Section 1: Data Identification and Specification

Before any data moves, both parties need absolute clarity on what data is being shared. Vague descriptions like "customer data" or "operational records" are not sufficient.

What to specify:

Data categories — Enumerate every type of data being shared (transaction records, user behavior logs, equipment sensor readings, text documents, images)
Data format — Specify the technical format for each data category (CSV, JSON, Parquet, database exports, API access)
Data volume — Estimate the volume of data and the frequency of updates
Data timeframe — Define the historical period covered by the data
Data fields — List the specific fields or attributes within each data category
Sample data — Require sample data before the full sharing begins so both parties can verify the data meets specifications

Why this matters for AI: Model architecture decisions depend on data characteristics. If the agreement specifies structured tabular data but the client delivers unstructured text, your entire approach may need to change. Specifying data upfront prevents costly pivots later.
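
One way to make the specification concrete is to keep a machine-readable copy of it next to the signed agreement and run the client's sample delivery against it. The sketch below is only an illustration: the class name, the field names, the dates, and the row count are all hypothetical, and a real specification would carry whatever the parties actually agreed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCategorySpec:
    """One agreed data category. All values used below are illustrative."""
    name: str            # e.g. "equipment sensor readings"
    file_format: str     # e.g. "CSV" or "Parquet"
    fields: tuple        # agreed field names
    period_start: str    # ISO date, start of the historical period
    period_end: str      # ISO date, end of the historical period
    estimated_rows: int  # rough volume estimate


def check_sample(spec, sample_rows):
    """Return a list of human-readable problems found in a sample delivery."""
    if not sample_rows:
        return ["sample is empty"]
    expected = set(spec.fields)
    problems = []
    for i, row in enumerate(sample_rows):
        present = set(row)
        missing = expected - present
        extra = present - expected
        if missing:
            problems.append(f"row {i}: missing fields {sorted(missing)}")
        if extra:
            problems.append(f"row {i}: unexpected fields {sorted(extra)}")
    return problems


# Hypothetical specification and sample, for illustration only.
spec = DataCategorySpec(
    name="equipment sensor readings",
    file_format="CSV",
    fields=("machine_id", "timestamp", "temperature_c"),
    period_start="2023-01-01",
    period_end="2024-06-30",
    estimated_rows=5_000_000,
)
sample = [
    {"machine_id": "M-001", "timestamp": "2024-01-01T00:00:00", "temperature_c": 71.2},
    {"machine_id": "M-002", "timestamp": "2024-01-01T00:00:00"},
]
problems = check_sample(spec, sample)
for problem in problems:
    print(problem)
```

A check like this only confirms that the agreed fields are present. It says nothing about whether the client has the right to share them, which remains a contractual question.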

Section 2: Data Quality Requirements

This section is uniquely important for AI projects and often absent from standard data sharing agreements.

Quality dimensions to address:

Completeness — What percentage of missing values is acceptable? How should missing values be handled?
Accuracy — What validation has the client performed on the data? Are there known accuracy issues?
Consistency — Are data formats and values consistent across the dataset, or are there format changes over time?
Timeliness — How current is the data? What is the lag between data generation and data sharing?
Bias assessment — Has the client assessed the data for potential biases? Are certain populations, time periods, or conditions underrepresented?
Labeling quality — For supervised learning projects, what is the quality of data labels? Who created the labels, and what was the labeling methodology?

Remediation process: Define what happens when shared data does not meet quality requirements. Options include:

Client remediates and re-shares data within a specified timeframe
Agency performs data cleaning at an additional cost
Project scope adjusts to reflect data quality limitations
Either party can terminate if data quality issues are not resolvable
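
The completeness question becomes easier to negotiate when the threshold is a number that both sides can measure. The following sketch assumes the agreement records a maximum acceptable missing-value fraction per field. The field names and ceilings are hypothetical, and treating `None` and the empty string as "missing" is an assumption that should match whatever definition the agreement uses.

```python
def missing_rate(rows, field_name):
    """Fraction of rows where field_name is absent, None, or an empty string."""
    if not rows:
        raise ValueError("cannot compute a missing rate on an empty dataset")
    missing = sum(1 for row in rows if row.get(field_name) in (None, ""))
    return missing / len(rows)


def completeness_report(rows, max_missing):
    """Compare each field's observed missing rate with its agreed ceiling.

    max_missing maps field name -> highest acceptable missing fraction.
    Returns {field_name: (observed_rate, passes)}.
    """
    report = {}
    for field_name, ceiling in max_missing.items():
        rate = missing_rate(rows, field_name)
        report[field_name] = (rate, rate <= ceiling)
    return report


# Hypothetical maintenance-log rows and thresholds, for illustration only.
rows = [
    {"machine_id": "M-001", "failure_code": "F12"},
    {"machine_id": "M-002", "failure_code": ""},
    {"machine_id": "M-003", "failure_code": "F07"},
    {"machine_id": "M-004", "failure_code": "F12"},
]
report = completeness_report(rows, {"machine_id": 0.0, "failure_code": 0.10})
for field_name, (rate, passes) in report.items():
    print(f"{field_name}: {rate:.0%} missing, {'ok' if passes else 'needs remediation'}")
```

A failed check would then trigger whichever remediation option the agreement names, rather than an argument about whether the data is "good enough".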

Section 3: Data Transfer and Security

How data physically moves from client to agency is a critical governance concern. Insecure data transfer can expose both parties to regulatory penalties and reputational damage.

Transfer mechanisms to specify:

Secure transfer methods — Encrypted file transfer, secure API endpoints, direct database connections
Transfer scheduling — One-time bulk transfer, periodic batch transfers, real-time streaming
Transfer validation — Checksums, record counts, and other verification that transferred data is complete and uncorrupted
Transfer environments — Where data is transferred to (cloud region, on-premises, specific infrastructure)
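
Transfer validation can be sketched with nothing more than the Python standard library. The example assumes the client sends a SHA-256 checksum and an expected record count alongside each file. The function names are illustrative, and the record count is a plain line count that assumes newline-delimited records with no line breaks inside quoted fields.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large exports do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def count_records(path, has_header=True):
    """Count newline-delimited records, optionally skipping one header line."""
    with open(path, "rb") as fh:
        lines = sum(1 for _ in fh)
    return max(lines - 1, 0) if has_header else lines


def verify_transfer(path, expected_sha256, expected_records, has_header=True):
    """Compare a received file with the checksum and count the sender declared."""
    actual_sha256 = sha256_of_file(path)
    actual_records = count_records(path, has_header)
    return {
        "checksum_ok": actual_sha256 == expected_sha256.lower(),
        "records_ok": actual_records == expected_records,
        "actual_sha256": actual_sha256,
        "actual_records": actual_records,
    }
```

Calling `verify_transfer("sensors.csv", declared_checksum, declared_count)` on receipt (the file name and variables here are placeholders) gives both parties a record that the delivery arrived complete and uncorrupted, which is worth keeping in the audit trail.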

Security requirements:

Encryption standards for data at rest and in transit
Access controls and authentication requirements
Network security requirements
Logging and audit trail requirements for data access
Incident response procedures for data security events
Penetration testing or security assessment requirements

Section 4: Permitted Use and Restrictions

This is the heart of the data sharing agreement: what the agency can actually do with the data.

Permitted uses to define:

Model training — Can the agency use the data to train AI models? This seems obvious, but the specifics matter. Can the data be used for initial training only, or for ongoing retraining?
Model evaluation — Can the data be used for testing, validation, and benchmarking?
Derived model creation — Can the agency create derivative models or fine-tuned models using the data?
Aggregated insights — Can the agency use aggregated, anonymized insights from the data for other purposes (benchmarking, marketing, product development)?
Internal research — Can the agency use the data for internal research and development beyond the specific project?

Common restrictions:

Data cannot be shared with third parties without prior written consent
Data cannot be used for purposes outside the defined project scope
Data cannot be combined with data from other clients without anonymization
Personal data must be processed in compliance with specified privacy regulations
Data cannot be stored in jurisdictions not approved by the client
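
Permitted uses are easier to honor when the engineering team can check them in code before a job runs. The sketch below encodes the uses listed above as an enumeration and refuses anything not explicitly granted. The dataset identifier, the grant table, and the default-deny rule are assumptions made for illustration, not terms taken from any particular agreement.

```python
from enum import Enum


class Use(Enum):
    INITIAL_TRAINING = "initial_training"
    RETRAINING = "retraining"
    EVALUATION = "evaluation"
    DERIVED_MODELS = "derived_models"
    AGGREGATED_INSIGHTS = "aggregated_insights"
    INTERNAL_RESEARCH = "internal_research"


class UseNotPermitted(Exception):
    """Raised when a requested use is not granted for a dataset."""


def require_permitted(dataset_id, use, grants):
    """Default deny: any use not explicitly granted for the dataset is refused."""
    allowed = grants.get(dataset_id, frozenset())
    if use not in allowed:
        raise UseNotPermitted(f"{use.value} is not granted for {dataset_id}")


# Hypothetical grant table transcribed from a signed agreement.
grants = {
    "client_a_sensor_data": frozenset({Use.INITIAL_TRAINING, Use.EVALUATION}),
}

require_permitted("client_a_sensor_data", Use.EVALUATION, grants)  # passes silently

try:
    require_permitted("client_a_sensor_data", Use.INTERNAL_RESEARCH, grants)
except UseNotPermitted as exc:
    print(exc)
```

The default-deny choice mirrors how most restrictions in this section are written: silence in the agreement is treated as "no", not "yes".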

Section 5: Data Retention and Destruction

AI projects create unique data retention challenges. Training data may be embedded in model weights. Intermediate datasets, feature stores, and evaluation datasets accumulate throughout the project. The agreement needs clear rules for all of it.

Retention provisions:

Project duration retention — How long data is retained during active project work
Post-project retention — How long data is retained after project completion (for model retraining, debugging, support)
Training data in models — Address the fact that training data influences model weights even after the raw data is deleted
Derived datasets — Specify retention rules for intermediate datasets, feature engineering outputs, and evaluation datasets
Backup retention — Address data retention in backups and disaster recovery systems

Destruction provisions:

Destruction timeline — How quickly data must be destroyed after the retention period or contract termination
Destruction methods — Specify approved data destruction methods (cryptographic erasure, physical destruction, overwriting)
Destruction certification — Require written certification that data has been destroyed
Exceptions — Identify any data that is exempt from destruction (aggregated statistics, anonymized data, model weights)
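
The retention and destruction terms reduce to a date calculation plus an inventory check, which can be sketched as follows. The asset names, asset kinds, and day counts are hypothetical. The example also assumes a single deadline for all non-exempt data, whereas a real agreement may set different periods for raw data, derived datasets, and backups.

```python
from datetime import date, timedelta


def destruction_deadline(end_date, retention_days, destruction_window_days):
    """Latest date by which non-exempt data must be destroyed."""
    return end_date + timedelta(days=retention_days + destruction_window_days)


def overdue_assets(assets, deadline, today, exempt_kinds=frozenset()):
    """Names of assets still held after the deadline and not exempt."""
    if today <= deadline:
        return []
    return [
        asset["name"]
        for asset in assets
        if not asset["destroyed"] and asset["kind"] not in exempt_kinds
    ]


# Hypothetical project end date, periods, and inventory.
deadline = destruction_deadline(
    date(2024, 6, 30), retention_days=90, destruction_window_days=30
)
assets = [
    {"name": "raw_sensor_export", "kind": "raw", "destroyed": True},
    {"name": "feature_store_snapshot", "kind": "derived", "destroyed": False},
    {"name": "nightly_backup", "kind": "backup", "destroyed": False},
    {"name": "model_weights_v3", "kind": "model_weights", "destroyed": False},
]
overdue = overdue_assets(
    assets,
    deadline,
    today=date(2024, 11, 15),
    exempt_kinds=frozenset({"model_weights"}),
)
print(deadline.isoformat(), overdue)
```

Keeping derived datasets and backups in the same inventory as the raw data makes them harder to forget when the destruction certification is due.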

Section 6: Intellectual Property and Derived Works

Data sharing for AI creates layered IP questions that straightforward data sharing does not.

IP provisions to address:

Client data ownership — Confirm that the client retains ownership of all shared data
Model ownership — Define who owns AI models trained on the shared data
Training artifacts — Address ownership of training artifacts (hyperparameters, training configurations, feature engineering code)
Evaluation results — Define who owns evaluation results, benchmarks, and performance metrics
Improvements and innovations — If the agency develops new techniques or innovations while working with the data, who owns those innovations?
License grants — If the agency retains model ownership, define the license granted to the client

Section 7: Compliance and Regulatory Requirements

Data sharing for AI sits at the intersection of data privacy regulation and emerging AI regulation.

Privacy compliance:

Identify applicable privacy regulations (GDPR, CCPA, HIPAA, industry-specific regulations)
Define data controller and data processor roles
Reference or incorporate a Data Processing Agreement
Address cross-border data transfer requirements
Define privacy impact assessment obligations

AI-specific compliance:

Address training data documentation requirements under the EU AI Act
Define bias assessment and mitigation obligations
Address transparency requirements for AI systems trained on the shared data
Define compliance responsibilities when regulations change during the project

Section 8: Liability and Indemnification

Data sharing creates shared risk. The agreement needs to allocate that risk fairly.

Liability provisions:

Client liability for data accuracy — The client should represent and warrant that the shared data is accurate and that they have the right to share it
Agency liability for data security — The agency should be liable for data security failures within their control
Shared liability for regulatory compliance — Both parties should share compliance obligations based on their respective roles
Limitation of liability — Set reasonable caps on liability related to data sharing
Indemnification — Define mutual indemnification for breaches of data sharing obligations

Practical Frameworks for Different Engagement Models

Project-Based Engagements

For fixed-scope projects, the data sharing agreement should be tightly scoped to the project timeline and deliverables.

Data sharing is limited to the project duration plus a defined wind-down period
All data is destroyed or returned upon project completion
Model ownership transfers to the client with the project deliverables
The agency retains no right to use the data after project completion

Ongoing Service Engagements

For managed AI services where the agency operates the AI system on behalf of the client, data sharing is continuous, and the agreement should reflect this.

Data sharing is ongoing for the duration of the service agreement
The agency needs ongoing access to data for model retraining and maintenance
Data retention policies need to accommodate operational requirements
Termination provisions need to address data migration and transition

Platform and Product Engagements

When the agency builds an AI platform or product that serves multiple clients, data sharing agreements need to address multi-tenant considerations.

Data isolation requirements between clients
Restrictions on using one client's data to benefit another client's model
Aggregation and anonymization standards for cross-client data usage
Transparency about multi-tenant architecture
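
An aggregation standard for cross-client usage can be expressed as a release rule. The sketch below publishes a pooled statistic only when a minimum number of distinct clients contribute. Both the rule and the default threshold of five are assumptions chosen for illustration. They are not a legal standard, and a minimum-contributor rule does not by itself guarantee anonymity.

```python
from statistics import mean


def cross_client_mean(values_by_client, min_clients=5):
    """Return a pooled mean only when enough distinct clients contribute.

    values_by_client maps a client identifier to that client's list of values.
    Returns None when fewer than min_clients have any data.
    """
    contributing = {c: v for c, v in values_by_client.items() if v}
    if len(contributing) < min_clients:
        return None
    # Average the per-client means so one large client cannot dominate.
    return mean(mean(values) for values in contributing.values())


# Hypothetical per-client metrics, for illustration only.
too_few = cross_client_mean({"client_a": [2, 4], "client_b": [6]}, min_clients=3)
enough = cross_client_mean(
    {"client_a": [2, 4], "client_b": [6], "client_c": [9]}, min_clients=3
)
print(too_few, enough)
```

Whatever threshold is chosen, writing it into the agreement lets each client see exactly when its data may appear in a cross-client figure.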

Negotiation Strategies

Lead with risk allocation, not legal language. Clients care about who bears risk when things go wrong. Frame the data sharing agreement as a risk allocation tool, not a legal constraint.

Use the data quality section as a project management tool. The data quality provisions in your agreement also serve as a project planning checklist. Walk through them with the client during project kickoff to surface data issues early.

Offer tiered data access. Some clients are uncomfortable sharing all their data at once. Offer a phased approach — share a limited dataset first, demonstrate value, then expand access as trust builds.

Address the "what if we break up" question early. Clients want to know what happens to their data if the engagement ends. Address data portability and destruction proactively rather than waiting for the client to raise it.

Build in review triggers. Include provisions for reviewing and updating the data sharing agreement when project scope changes, regulations change, or data requirements evolve.

Your Next Step

Audit your current data sharing practices. For your last three AI engagements, answer these questions: Was there a formal data sharing agreement? Did it address AI-specific concerns (training data rights, model ownership, data quality requirements)? Were data destruction obligations defined and executed?

If the answers reveal gaps, draft a standardized AI data sharing agreement template using the eight sections outlined above. Have it reviewed by legal counsel with data privacy and AI experience. Then integrate it into your project kickoff process so that no AI engagement begins without a signed data sharing agreement.

The agency in Austin learned that data sharing without formal agreements is not just risky — it is expensive. A $15,000 investment in proper data sharing agreements would have saved $140,000 in wasted work. The math is not complicated.

Agency Script Editorial
