Open Source License Compliance for AI Models and Tools

An AI agency in Minneapolis built a document intelligence platform for an insurance company using a mix of open source components: an Apache-licensed embedding model, a GPL-licensed text extraction library, an MIT-licensed vector database client, and a custom fine-tuned model based on a model released under a restrictive community license. The agency delivered the platform, the insurance company deployed it, and everything seemed fine. Then the insurance company's legal team conducted a software audit before an acquisition. They discovered that the GPL-licensed text extraction library had been statically linked into the platform, potentially triggering the GPL's copyleft requirement—meaning the entire platform's source code might need to be released under the GPL. They also discovered that the base model's community license prohibited commercial use in insurance, a restriction nobody at the agency had noticed. The acquisition was delayed by four months while the agency replaced both components. The insurance company billed the agency $180,000 for the delay costs.

Open source software powers the AI ecosystem. From PyTorch and TensorFlow to Hugging Face models and LangChain, virtually every AI agency depends on open source components. But open source does not mean obligation-free. Every open source component comes with a license that imposes conditions on how you can use, modify, and distribute it. Violating those conditions creates legal liability for your agency and your clients.

This post covers the open source license landscape for AI, the specific compliance challenges that AI creates, and the governance framework your agency needs to manage open source risk.

Open Source License Categories

Permissive Licenses

Permissive licenses allow broad use with minimal restrictions. They are generally the least risky for commercial AI agencies.

MIT License: One of the most common licenses in the AI ecosystem. Allows use, modification, and distribution for any purpose, including commercial. Requires preserving the copyright notice and license text. Does not require sharing source code of derivative works.

Apache License 2.0: Similar to MIT but includes an express patent grant and a patent retaliation clause. Widely used by major AI frameworks (TensorFlow is Apache-licensed). Requires preserving notices and documenting changes to the original code.

BSD Licenses (2-clause, 3-clause): Similar to MIT. The 3-clause version adds a restriction against using the original author's name to endorse derivative products.

ISC License: Functionally equivalent to MIT. Common in some open source communities.

Practical impact for agencies: Permissive licenses generally allow you to use the software in commercial AI products without sharing your own source code. Your main obligations are preserving copyright notices and including the license text with your distributions.

Copyleft Licenses

Copyleft licenses require that derivative works be distributed under the same license. This is where compliance gets complicated.

GNU General Public License (GPL v2, v3): Requires that if you distribute software that incorporates GPL-licensed code, you must make the complete source code available under the GPL. This includes your own code that is combined with the GPL code.

GNU Lesser General Public License (LGPL): A weaker copyleft that allows linking to LGPL-licensed libraries without triggering the copyleft requirement for your own code, as long as the LGPL library can be replaced or updated by the user.

GNU Affero General Public License (AGPL): The strongest of the GNU copyleft licenses. AGPL triggers the copyleft requirement not only when you distribute the software but also when you run a modified version that users interact with over a network (SaaS, API). This is particularly relevant for AI agencies because many AI systems are deployed as services rather than distributed as software.

Mozilla Public License (MPL) 2.0: A file-level copyleft. Changes to MPL-licensed files must be shared under the MPL, but you can combine MPL files with proprietary files without triggering copyleft for the proprietary files.

Practical impact for agencies: Copyleft licenses can require you to release your proprietary code under an open source license. AGPL is particularly dangerous for AI agencies because it applies to network-accessible services. If you use an AGPL-licensed component in a client-facing AI service, you may be required to release the entire service's source code.
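The difference between the distribution trigger and the network trigger can be sketched as a small lookup. This is a deliberate simplification: the license identifiers are SPDX-style strings, the three deployment modes and the `COPYLEFT_TRIGGERS` table are assumptions made up for illustration, and real obligations depend on how the component is combined with your code.

```python
from enum import Enum


class Deployment(Enum):
    INTERNAL_ONLY = "internal"     # never leaves your organization
    DISTRIBUTED = "distributed"    # shipped to a client or end user
    NETWORK_SERVICE = "network"    # offered as SaaS or an API


# Hypothetical, simplified table: deployment modes that can trigger
# source-sharing obligations for a few copyleft licenses.
COPYLEFT_TRIGGERS = {
    "GPL-2.0-only": {Deployment.DISTRIBUTED},
    "GPL-3.0-only": {Deployment.DISTRIBUTED},
    "AGPL-3.0-only": {Deployment.DISTRIBUTED, Deployment.NETWORK_SERVICE},
}


def may_trigger_copyleft(license_id: str, deployment: Deployment) -> bool:
    """Return True if this license/deployment pair deserves a copyleft review."""
    # Licenses missing from the table (e.g. MIT) fall through to False.
    return deployment in COPYLEFT_TRIGGERS.get(license_id, set())


print(may_trigger_copyleft("AGPL-3.0-only", Deployment.NETWORK_SERVICE))
print(may_trigger_copyleft("GPL-3.0-only", Deployment.NETWORK_SERVICE))
```

A `True` result here is a prompt to look closer, not a legal conclusion.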

AI-Specific Licenses

The AI ecosystem has produced several licenses specifically designed for AI models and datasets. These are newer and often more restrictive than traditional open source licenses.

OpenRAIL Licenses (RAIL, BigScience RAIL, CreativeML OpenRAIL): These licenses allow broad use but include use-based restrictions—specific prohibited uses that the licensor defines. For example, a model released under OpenRAIL may prohibit use for surveillance, disinformation, or discrimination. Violation of the use restrictions terminates the license.

Llama Community License (Meta): Meta's license for Llama models allows broad use but includes restrictions based on monthly active users (above 700 million MAU requires a separate agreement) and prohibited uses.

Mistral License Variants: Mistral has used various license structures for different models, ranging from Apache to more restrictive commercial licenses.

Model-specific community licenses: Many model releases come with custom licenses that may restrict commercial use, specific industries, or specific applications. These licenses vary widely and must be read carefully.

Practical impact for agencies: AI-specific licenses often include restrictions that traditional open source licenses do not—use restrictions, commercial limitations, and application-specific prohibitions. You must read each model's license carefully and evaluate whether your intended use is permitted.

AI-Specific Compliance Challenges

What Is a "Derivative Work" for AI Models

Traditional open source licensing distinguishes between using a library (generally low risk) and creating a derivative work (may trigger copyleft). For AI, the derivative work question is complex and unsettled.

Is a fine-tuned model a derivative work of the base model? Most legal analysis suggests yes: fine-tuning starts from the base model's weights and produces a model derived from them. Licenses in the OpenRAIL family, for example, define derivatives of the model to include fine-tuned versions. If the base model has a copyleft or restrictive license, your fine-tuned model likely inherits those restrictions.

Is a model trained on a dataset a derivative work of the dataset? This is less clear. The model usually does not contain copies of the training data, but it is shaped by that data. The legal analysis depends on the specific license and jurisdiction.

Is the output of a model subject to the model's license? Some AI licenses explicitly address output ownership. Others do not. The emerging consensus is that model outputs are generally not derivative works of the model itself, but some licenses explicitly claim rights over outputs or impose conditions on output use.

Is an application that calls a model's API a derivative work of the model? Generally no—API calls are more analogous to using a service than to incorporating code. But if you download and embed the model in your application, the analysis changes.

License Stacking

AI systems typically combine many open source components with different licenses. License stacking—combining components with different license requirements—creates compliance complexity.

Common stacking scenarios:

A permissive model (Apache) combined with a copyleft library (GPL) for preprocessing
An OpenRAIL model combined with MIT-licensed tooling
A proprietary fine-tuned model built on an open source base model with use restrictions
A RAG system combining Apache-licensed retrieval, AGPL-licensed database, and permissive model

Compatibility issues:

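GPL v3 and Apache 2.0 are compatible in one direction (you can include Apache code in a GPL v3 project, but not the reverse); Apache 2.0 is not compatible with GPL v2
GPL v2 and GPL v3 cannot be combined when the v2 code is licensed as "version 2 only"; code licensed as "version 2 or later" can be used under v3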
AGPL and most proprietary licenses are incompatible
Some AI-specific licenses are incompatible with each other due to conflicting use restrictions
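One way to make stacking checks repeatable is a directional lookup: given the license you plan to release the combined work under, which component licenses can it absorb? The sketch below is illustrative only. The `CAN_INCLUDE` table is a small, incomplete assumption based on commonly cited compatibility guidance, "Proprietary" is a placeholder label rather than a real license identifier, and anything not listed is flagged for manual review instead of being guessed at.

```python
# Illustrative and incomplete: for a project released under the key license,
# the set of component licenses assumed safe to incorporate.
CAN_INCLUDE = {
    "GPL-3.0-only": {"MIT", "BSD-3-Clause", "Apache-2.0", "GPL-3.0-only"},
    "GPL-2.0-only": {"MIT", "BSD-3-Clause", "GPL-2.0-only"},
    "Apache-2.0": {"MIT", "BSD-3-Clause", "Apache-2.0"},
    "Proprietary": {"MIT", "BSD-3-Clause", "Apache-2.0"},
}


def components_needing_review(project_license: str, component_licenses: list[str]) -> list[str]:
    """Return component licenses that are not known to fit the project license."""
    if project_license not in CAN_INCLUDE:
        raise KeyError(f"No compatibility data for {project_license!r}")
    allowed = CAN_INCLUDE[project_license]
    # Unknown licenses are treated the same as incompatible ones: a human decides.
    return sorted({lic for lic in component_licenses if lic not in allowed})


print(components_needing_review("Proprietary", ["MIT", "AGPL-3.0-only", "Apache-2.0"]))
```

The table is keyed by the outbound license because compatibility is directional: Apache code can flow into a GPL v3 project, but GPL code cannot flow into an Apache project.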

Training Data Licensing

The training data used for AI models carries its own licensing.

Datasets with licenses: Many published datasets have explicit licenses (Creative Commons variants, custom licenses, research-only licenses). Using a research-only dataset to train a commercial model violates the dataset license.

Web-scraped data: Data scraped from the web may be subject to website terms of service, robots.txt restrictions, and copyright law. The legality of web scraping for AI training is actively litigated.

Client data: Data provided by clients for model training may be subject to the client's own licensing restrictions, privacy obligations, or contractual limitations.

Synthetic data: Data generated by AI models may be subject to the generating model's license. If an OpenRAIL model prohibits certain uses, data generated by that model for those uses may violate the license.
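A first-pass screen for dataset licenses can be as simple as looking for non-commercial markers in the license identifier. The sketch below is a rough heuristic, not a substitute for reading the license: the marker list and the dataset names are assumptions invented for the example, and a license with no marker can still carry restrictions.

```python
# Assumed markers that suggest a dataset cannot be used for commercial training.
NON_COMMERCIAL_MARKERS = ("-nc", "research-only", "non-commercial")


def blocks_commercial_training(license_id: str) -> bool:
    """Heuristic: True if the license identifier looks non-commercial."""
    lowered = license_id.lower()
    return any(marker in lowered for marker in NON_COMMERCIAL_MARKERS)


# Hypothetical datasets and license identifiers.
datasets = {
    "claims-corpus": "CC-BY-4.0",
    "policy-qa": "CC-BY-NC-4.0",
    "benchmark-set": "research-only",
}

flagged = sorted(name for name, lic in datasets.items() if blocks_commercial_training(lic))
print(flagged)
```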

Governance Framework

Software Bill of Materials (SBOM)

Maintain a comprehensive inventory of every open source component in your AI systems.

For each component, record:

Component name and version
License type
Source (where you obtained it)
How it is used in your system (linked, called via API, embedded, used for training)
Any modifications you have made
License obligations that apply to your use
Compatibility with other components in the system
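The fields above map naturally onto a small record type. The following is a minimal sketch, assuming you keep the inventory in your own tooling rather than in a standard SBOM format. The class name, field names, and example component are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SbomEntry:
    name: str
    version: str
    license_id: str                 # SPDX-style identifier where one exists
    source: str                     # where the component was obtained
    usage: str                      # "linked", "api", "embedded", or "training"
    modifications: str = ""         # empty string means unmodified
    obligations: list[str] = field(default_factory=list)
    compatibility_notes: str = ""


# Hypothetical component, for illustration only.
entry = SbomEntry(
    name="example-extractor",
    version="1.4.2",
    license_id="GPL-3.0-only",
    source="https://example.com/example-extractor",
    usage="linked",
    obligations=["provide source on distribution", "preserve notices"],
    compatibility_notes="conflicts with proprietary delivery if linked",
)

print(json.dumps(asdict(entry), indent=2))
```

Recording `usage` explicitly matters because the same component can carry different obligations depending on whether it is linked, called over an API, or used only for training.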

Update your SBOM:

When you add new components
When you update existing components (license terms can change between versions)
When you change how a component is used
At least quarterly for comprehensive review
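The quarterly review is easy to let slip, so it helps to compute it rather than remember it. A minimal sketch, assuming "quarterly" means 90 days and that you store the date of the last full review somewhere; both the interval and the function name are assumptions.

```python
from datetime import date, timedelta

REVIEW_INTERVAL = timedelta(days=90)  # assumed definition of "quarterly"


def review_overdue(last_reviewed: date, today: date) -> bool:
    """True when more than one review interval has passed since the last review."""
    return today - last_reviewed > REVIEW_INTERVAL


print(review_overdue(date(2024, 1, 1), date(2024, 4, 15)))
```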

License Review Process

Before incorporating any open source component into a client deliverable, review its license.

Review checklist:

What license is it released under?
Does the license permit commercial use?
Does the license have copyleft requirements?
Are there use restrictions that could affect your intended use or your client's use?
Are there attribution or notice requirements?
Is the license compatible with other components in the system?
Are there patent implications (grants or retaliation clauses)?

Who reviews: Ideally, your license reviews should involve someone with legal training or experience. If you cannot afford dedicated legal review for every component, establish a pre-approved list of licenses (MIT, Apache 2.0, BSD) that do not require individual review, and escalate any component with a different license for detailed analysis.
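The pre-approved list can be turned into a simple triage step that sorts components into "approved" and "escalate". This sketch assumes SPDX-style identifiers and uses made-up component names. The contents of `PRE_APPROVED` are an example policy, not a recommendation for your agency.

```python
# Example policy: licenses that skip individual review.
PRE_APPROVED = {"MIT", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause"}


def triage(components: dict[str, str]) -> dict[str, list[str]]:
    """Split components (name -> license id) into approved and escalated."""
    result: dict[str, list[str]] = {"approved": [], "escalate": []}
    for name, license_id in sorted(components.items()):
        bucket = "approved" if license_id in PRE_APPROVED else "escalate"
        result[bucket].append(name)
    return result


# Hypothetical components.
print(triage({
    "vector-client": "MIT",
    "embedding-model": "Apache-2.0",
    "text-extractor": "GPL-3.0-only",
    "base-model": "custom-community-license",
}))
```

Anything the policy does not recognize lands in "escalate" by default, which keeps unknown or custom model licenses from slipping through.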

Client Communication

Your clients need to understand the open source components in the AI systems you build for them.

What to communicate:

The open source components used and their licenses
Any obligations the client inherits (attribution requirements, copyleft implications)
Any use restrictions from AI-specific licenses
The implications for the client's IP ownership of the deliverable

When to communicate:

During the scoping phase, if you know which components you will use
Before delivery, with the complete SBOM
When you update components that change the license landscape

Compliance Monitoring

Open source license compliance is not a one-time activity. Licenses change, components are updated, and new legal interpretations emerge.

Monitor for:

License changes in components you use (some projects change licenses between major versions)
New legal developments affecting open source licensing (court decisions, regulatory guidance)
Community license disputes that could affect components you depend on
Changes in AI-specific license terms as the ecosystem matures

Respond to changes:

When a component changes its license in a way that affects your use, evaluate alternatives
When a legal development changes the risk profile of a license, reassess your exposure
When a dispute arises around a component you use, monitor the situation and have a contingency plan
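Detecting a license change is mostly a matter of comparing two inventory snapshots. A minimal sketch, assuming each snapshot is a mapping from component name to license identifier. The snapshot contents are invented for the example.

```python
def license_changes(previous: dict[str, str], current: dict[str, str]) -> list[tuple[str, str, str]]:
    """Return (component, old_license, new_license) for components whose license changed."""
    changes = []
    # Only components present in both snapshots can have "changed" licenses;
    # added or removed components are handled by the normal SBOM update.
    for name in sorted(previous.keys() & current.keys()):
        if previous[name] != current[name]:
            changes.append((name, previous[name], current[name]))
    return changes


# Hypothetical snapshots taken before and after a dependency upgrade.
before = {"search-db": "Apache-2.0", "vector-client": "MIT"}
after = {"search-db": "AGPL-3.0-only", "vector-client": "MIT", "new-tool": "MIT"}

print(license_changes(before, after))
```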

Contribution Policy

If your team contributes to open source AI projects, you need a contribution policy.

What projects can team members contribute to during work hours?
Who owns the intellectual property in contributions made during work hours?
Are there projects your team should not contribute to due to competitive or IP concerns?
Do contributions need to be reviewed for proprietary information before submission?
What contributor license agreements (CLAs) can team members sign on behalf of the agency?

Practical Recommendations

Default to permissive components. When you have a choice between an MIT-licensed and a GPL-licensed component with similar functionality, choose the MIT-licensed one. Permissive licenses create fewer compliance obligations and risks.

Avoid AGPL in client deliverables. AGPL's network copyleft trigger is particularly dangerous for AI services. If you need a component that is AGPL-licensed, evaluate whether the vendor offers a commercial license alternative.

Read AI model licenses in full. Do not assume that a model on Hugging Face or GitHub is free to use commercially. Many models have use restrictions, commercial limitations, or custom licenses that restrict specific industries or applications.

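Isolate copyleft components. If you must use a copyleft component, isolate it architecturally (separate process, API boundary) to reduce the risk that the copyleft extends to your proprietary code. Isolation does not remove your obligations for the copyleft component itself, and with AGPL a network boundary alone may not be enough.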
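In practice, process isolation often means invoking the copyleft tool as a separate program and exchanging data through arguments and standard output, instead of importing or linking it. The sketch below shows the pattern only. `extract_text` is a hypothetical wrapper, and the stand-in command simply echoes its argument in upper case so the example runs without any real extraction tool installed. Whether a given boundary is sufficient is a legal question this code does not answer.

```python
import subprocess
import sys


def extract_text(path: str, command: list[str]) -> str:
    """Run an external tool as a separate process and return its standard output."""
    completed = subprocess.run(
        [*command, path],      # the tool receives the file path as its last argument
        capture_output=True,
        text=True,
        check=True,            # raise CalledProcessError on a non-zero exit code
        timeout=60,
    )
    return completed.stdout


# Stand-in for a real command-line extraction tool (assumption for the demo).
STAND_IN_TOOL = [sys.executable, "-c", "import sys; print(sys.argv[1].upper())"]

demo_output = extract_text("claim-form.pdf", STAND_IN_TOOL)
print(demo_output.strip())
```

Keeping the command configurable also makes it easier to swap the component for a permissively licensed alternative later.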

Keep your SBOM current. A stale SBOM is worse than no SBOM because it creates a false sense of compliance.

Include license compliance in your contracts. Define who is responsible for open source compliance—your agency or the client—and what happens if a compliance issue arises.

Your Next Step

This week, create an SBOM for your most active client engagement. List every open source component—frameworks, libraries, models, datasets, tools—and their licenses. Identify any components with copyleft licenses, use restrictions, or commercial limitations. Assess whether your use is compliant with each license's terms.
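For the Python libraries in an engagement, a starting inventory can be pulled from the metadata of installed packages using the standard library's `importlib.metadata`. This is a rough first pass under several assumptions: package authors fill in license metadata inconsistently, the helper names are made up, and models, datasets, and non-Python tools will not appear at all and still have to be listed by hand.

```python
from importlib import metadata


def declared_license(meta) -> str:
    """Best-effort license string from a package's metadata."""
    expression = meta.get("License-Expression")
    if expression:
        return expression
    classifiers = [
        c for c in (meta.get_all("Classifier") or [])
        if c.startswith("License ::")
    ]
    if classifiers:
        # Keep only the last segment, e.g. "MIT License".
        return "; ".join(c.split(" :: ")[-1] for c in classifiers)
    return meta.get("License") or "UNKNOWN"


def installed_licenses() -> dict[str, str]:
    """Map each installed distribution name to its declared license."""
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name") or "UNKNOWN"
        found[name] = declared_license(dist.metadata)
    return found


for name, license_text in sorted(installed_licenses().items()):
    # Some packages put the full license text in the field; show one line only.
    first_line = license_text.splitlines()[0] if license_text else "UNKNOWN"
    print(f"{name}: {first_line}")
```

Treat every "UNKNOWN" as a component that needs a manual look at its repository or distribution files.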

If you find issues, address them now while it is cheap. Replacing a problematic component in development costs days. Replacing it after delivery and deployment costs weeks or months. Replacing it after a legal demand costs whatever the other side says it costs.

The agency that manages open source licensing proactively builds client trust, avoids costly surprises, and can confidently deliver AI systems that clients can use without legal worry. That confidence is worth more than any single component in your stack.

Agency Script Editorial
