Can open datasets be used commercially?
The short answer first
Whether a dataset can be used commercially comes down to its license. The most permissive, CC0 and PDDL, effectively place the data in the public domain, so there are almost no restrictions. CC BY, ODC-BY, CC BY-SA and ODbL allow commercial use but require attribution; CC BY-SA and ODbL additionally require that any new dataset you build on them and publish be released under the same or a compatible license. Anything with NC (non-commercial) is for non-commercial use only, and commercial use needs separate permission. Anything with ND (no derivatives) can be used as-is, but modified versions cannot be redistributed. Find the license note on the dataset page before downloading, and match your case to the table below.
First, tell the two license families apart
Two families of licenses appear most often on open datasets, and once you understand one, the other is easy to read:
- Creative Commons (CC): the most general family, originally designed for content such as articles and images, but also widely used for data. Common ones include CC0, CC BY, CC BY-SA and CC BY-NC. Version 4.0 is the current version of the CC BY licenses, while CC0 is at version 1.0.
- Open Data Commons (ODC): a family designed specifically for databases, mainly addressing database-related rights, comprising PDDL, ODC-BY and ODbL. Many database-style datasets use it.
Concepts such as "attribution" and "share-alike" carry across both families, so a license from either family can be read by checking the same few conditions. For the details, rely on the text of the specific license you encounter.
Which common licenses allow commercial use, in one table
| License | Commercial use | Attribution required | Derivatives must stay open | In one line |
|---|---|---|---|---|
| CC0 / PDDL | Allowed | No | No | Effectively public domain, almost no restrictions |
| CC BY / ODC-BY | Allowed | Yes | No | Commercial use is fine as long as you credit the source |
| CC BY-SA / ODbL | Allowed | Yes | Yes (same or compatible license) | Commercial use is fine, but your new data must stay open too |
| CC BY-NC (and combinations with NC) | Non-commercial only | Yes | Depends on the combination | Commercial use needs separate permission |
| CC BY-ND | Allowed, but no modified redistribution | Yes | Not applicable | Use it as-is; no modifying and re-releasing |
In the table, "derivatives must stay open" refers to share-alike requirements such as those in CC BY-SA and ODbL: when you build a new dataset on such data and then publish it, that dataset must use the same or a compatible license. Combinations such as CC BY-NC-SA and CC BY-NC-ND stack "non-commercial" together with other conditions, so read them by their strictest term.
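For readers who track licenses in a script, the table above can be sketched as a small Python lookup. The `LicenseTerms` class, the `TERMS` mapping, the `lookup` helper and the lowercase identifiers are illustrative assumptions, not an official registry; the flags simply restate the table, and an unknown license deliberately returns `None` rather than a guess.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LicenseTerms:
    commercial_use: bool  # may the data be used commercially?
    attribution: bool     # must the source be credited?
    share_alike: bool     # must published derivatives stay open?
    derivatives: bool     # may modified versions be redistributed?


# Hypothetical lookup that mirrors the table above (identifiers are assumptions).
TERMS = {
    "cc0":         LicenseTerms(True,  False, False, True),
    "pddl":        LicenseTerms(True,  False, False, True),
    "cc-by":       LicenseTerms(True,  True,  False, True),
    "odc-by":      LicenseTerms(True,  True,  False, True),
    "cc-by-sa":    LicenseTerms(True,  True,  True,  True),
    "odbl":        LicenseTerms(True,  True,  True,  True),
    "cc-by-nc":    LicenseTerms(False, True,  False, True),
    "cc-by-nd":    LicenseTerms(True,  True,  False, False),
    "cc-by-nc-sa": LicenseTerms(False, True,  True,  True),
    "cc-by-nc-nd": LicenseTerms(False, True,  False, False),
}


def lookup(license_id: str) -> Optional[LicenseTerms]:
    """Return the terms for a license identifier, or None when it is unknown."""
    key = license_id.strip().lower()
    # Drop a trailing version such as "-4.0" or "-1.0" before the lookup.
    key = re.sub(r"-\d+(\.\d+)*$", "", key)
    return TERMS.get(key)


terms = lookup("CC-BY-SA-4.0")
if terms is None:
    print("Unknown license: do not assume commercial use is allowed")
else:
    print("commercial use:", terms.commercial_use, "| share-alike:", terms.share_alike)
```

A lookup like this only speeds up a first pass; the license text itself remains the reference for anything the flags do not capture.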
How to find the license on a dataset page
- Kaggle: in the License field on the dataset page, chosen by the publisher; CC0 and CC BY-SA are common.
- Hugging Face: noted on the dataset card; some datasets also require you to accept terms of use or request access first.
- Zenodo, figshare, Harvard Dataverse: the license is noted on the DOI record page, which makes it easy to cite and verify.
- Government and institutional data: check the platform's terms of use, copyright notice or open-data agreement; the wording varies from one body to another.
When you cannot find an explicit license note, do not assume the data is free to use as you like, and certainly do not assume commercial use is allowed. You can contact the data provider to confirm, or switch to comparable data with a clear license.
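As one illustration of checking for a license note programmatically, the sketch below reads the `license:` key from the YAML metadata block at the top of a Hugging Face-style dataset card (a `README.md` file). The function name `license_from_card` and the sample card are hypothetical, and the parsing is deliberately naive: it handles only a single value written on the same line as the key, so a real project would more likely use a proper YAML parser. A missing value comes back as `None`, matching the rule above that no license note means no assumption of permission.

```python
from typing import Optional


def license_from_card(card_text: str) -> Optional[str]:
    """Return the `license:` value from a card's metadata block, or None."""
    lines = card_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None  # the card has no metadata block at all
    for line in lines[1:]:
        if line.strip() == "---":
            break  # reached the end of the metadata block
        if line.startswith("license:"):
            value = line.split(":", 1)[1].strip().strip("'\"")
            return value or None  # an empty value counts as "not stated"
    return None


# Hypothetical card text used only for illustration.
sample_card = """---
license: cc-by-4.0
task_categories:
- text-classification
---
# Example dataset
"""

found = license_from_card(sample_card)
print(found if found else "No license stated: confirm with the provider first")
```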
Common pitfalls
- Different datasets on the same platform can have completely different licenses: do not assume this one is CC0 just because the last one was. Check each individually.
- The boundary of "non-commercial" is easy to misjudge: NC permits only use that is not primarily intended for commercial advantage or monetary compensation; using the data in a paid product or a commercial project usually counts as commercial use and needs separate permission.
- Combining multiple sources means following the strictest: when a piece of work draws on several sources, it must satisfy every source's license at once, so the most restrictive condition on each point applies to the whole.
- The "viral" nature of ShareAlike and ODbL: if you use such data and then publish a new dataset, you may be required to release it under the same or a compatible license as well, so think it through before any closed commercial use.
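The "follow the strictest" rule for combined sources can be sketched in a few lines of Python. The `Terms` tuple and the `strictest` function are hypothetical names introduced for illustration: a permission survives only if every source grants it, and an obligation applies if any source imposes it. The sketch does not detect licenses that conflict with each other, which still has to be checked against the license texts.

```python
from typing import Iterable, NamedTuple


class Terms(NamedTuple):
    commercial_use: bool  # permission: commercial use allowed
    attribution: bool     # obligation: credit the source
    share_alike: bool     # obligation: published derivatives stay open
    derivatives: bool     # permission: modified versions may be redistributed


def strictest(sources: Iterable[Terms]) -> Terms:
    """Combine several sources' terms into the strictest overall constraint."""
    items = list(sources)
    if not items:
        raise ValueError("at least one source is required")
    return Terms(
        commercial_use=all(t.commercial_use for t in items),  # every source must allow it
        attribution=any(t.attribution for t in items),        # one requirement is enough
        share_alike=any(t.share_alike for t in items),
        derivatives=all(t.derivatives for t in items),
    )


# Flags restate the table earlier in this article.
CC0 = Terms(True, False, False, True)
CC_BY_SA = Terms(True, True, True, True)
CC_BY_NC = Terms(False, True, False, True)

combined = strictest([CC0, CC_BY_SA, CC_BY_NC])
print(combined)
```

In this example the single NC source is enough to rule out commercial use of the combined work, even though the other two sources allow it.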
The final interpretation of any license rests with the official full license text and the data provider's statements; for significant commercial use, it is wise to have a professional review it.
Not sure whether you can use it? Let us assess first
If you cannot tell whether a particular dataset can be used in your project, or you need several sources at once and worry about conflicting licenses, send us your research or business goal and the conditions it must meet. We start with a free data availability assessment: we run real searches on authoritative data platforms, judge matches and gaps against each of your required items, and record the license status of every candidate dataset accurately so you can decide whether commercial use is possible. Even when no perfectly matching dataset is found, we report the search directions, the closest sources and the item-by-item judgments as they are, for your reference.
