Research · Hard-to-find, high-value datasets

Open datasets, explained in plain English

Four areas: economics, public health, geography and machine learning. For each dataset we explain, in plain language, what it contains, how to obtain it and what research it supports. Not finding what you need in the library? Tell us the criteria it must meet; we assess availability first, then run a real search.

18 curated in depth · 29 fully catalogued

State your criteria: we assess availability first, then search. Prepared datasets are also available for direct delivery.
Start a search assessment · See prepared datasets →
Scope of service and notes on the data

Our scope covers open datasets that are public and can be obtained lawfully. To avoid confusion, delivery takes one of three forms:
① Public-source search / availability assessment. We deliver source links, the search process and item-by-item judgements. We do not download third-party raw data on your behalf; please obtain the data yourself under each source's licence.
② Datasets we have collected and prepared ourselves (see "Prepared datasets"). Samples are provided, and these can be delivered directly.
③ Restricted / third-party raw data. Governed by the source licence and the scope both parties confirm.
The dataset metadata on this page was verified online in 2026-06 and follows each publisher's official information. The data is provided by third-party organizations; any regional divisions or boundary conventions follow China's official standards and do not represent this site's position.

Economics & social science

Economics · Social science · Global development

6 in this column

International open databases that researchers in China use often but frequently find hard to assemble or use smoothly.

Economics & social science · CC BY 4.0

Penn World Table (PWT)

The reference standard for cross-country economic comparison: it puts each country's GDP, output, capital and productivity on a comparable basis, and is a standard baseline in top-journal macro and growth research.

Source: GGDC, University of Groningen
Coverage: 185 economies
Period: 1950–2023
Scale: Panel of dozens of variables
Licence: CC BY 4.0
Access: Free, no registration

Contents and fields

A national-accounts library built on purchasing power parity (PPP) for cross-country and cross-period comparison: expenditure-side real GDP (CGDPe/RGDPe), output-side real GDP (CGDPo/RGDPo), real GDP at national-accounts definitions (RGDPNA), population and employment, human-capital index, capital stock, total factor productivity (TFP), price levels and PPPs, and more.
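PWT's documentation recommends RGDPNA for growth comparisons over time and CGDPe/CGDPo for cross-country level comparisons within a single year. Here is a minimal sketch of the growth side, assuming you have already extracted `rgdpna` and `pop` for one country (the values below are made up, not PWT data):

```python
# Sketch: per-capita growth from a PWT-style panel. The figures used below are
# hypothetical illustration values, not actual PWT data.

def per_capita(gdp: float, pop: float) -> float:
    """GDP per person; PWT reports both GDP and pop in millions, so units cancel."""
    if pop <= 0:
        raise ValueError("population must be positive")
    return gdp / pop


def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two observations `years` apart."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Hypothetical (rgdpna, pop) pairs for one country; use rgdpna, not cgdpe/cgdpo,
# when the question is growth over time.
panel = {2000: (1000.0, 10.0), 2020: (1800.0, 12.0)}
growth = cagr(per_capita(*panel[2000]), per_capita(*panel[2020]), 2020 - 2000)
print(f"{growth:.2%}")  # about 2.05% a year
```

For a level comparison between two countries in the same year, the same `per_capita` helper would be applied to `cgdpe` (or `cgdpo`) instead.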

Suitable research

The accepted baseline for cross-country growth, productivity, convergence and development research.

Version: PWT 11.0 (2025-10) · Format: Excel / Stata / online tool · ID: DOI:10.34894/FABVLR

Metadata follows the official source (verified 2026-06).

Why it's hard to get on your own

PPP benchmarks, chained vs. current-price series, and output-side vs. expenditure-side GDP are easy to mix up across versions; reconciling them yourself often takes days, and the numbers may still not add up. Leave version mapping and definition alignment to us, and take the data ready to use.

Economics & social science · Open access

World Inequality Database (WID)

A long-run database of global income and wealth distribution that combines national accounts, household surveys and tax data, correcting the under-coverage of top incomes in traditional surveys.

Source: World Inequality Lab
Coverage: About 216 countries and territories
Period: Mostly from 1980; some series start earlier
Scale: Multi-dimensional distribution series
Licence: Open access (per official terms)
Access: Free · Stata/R tools

Contents and fields

Income distribution (pre-tax/post-tax, shares and average levels by percentile and decile), wealth distribution and top wealth shares, national income and national-accounts macro variables; recent additions include foreign income, foreign wealth, public income and public spending. Each series carries dimensions such as region code, year, indicator code, population/percentile range, currency and unit.
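To make the indicator codes less opaque: as far as we understand WID's naming scheme, a code such as `sptinc992j` packs a one-letter series type (`s` = share), a five-letter concept (`ptinc` = pre-tax national income), a three-digit age group (`992` = adults 20+) and a one-letter population unit (`j` = equal-split adults), while percentile groups read like `p90p100`. Below is a small parser sketch under that assumption; the lookup tables are deliberately partial, and WID's own codes dictionary remains the authority.

```python
import re

# Partial lookup tables for illustration only; the full code lists live in
# WID's own codes dictionary. Unknown letters are passed through unchanged.
SERIES_TYPES = {"s": "share", "a": "average", "t": "threshold"}
POP_UNITS = {"j": "equal-split adults", "i": "individuals", "t": "tax units"}

CODE_RE = re.compile(r"^([a-z])([a-z]{5})(\d{3})([a-z])$")
PCT_RE = re.compile(r"^p(\d+(?:\.\d+)?)p(\d+(?:\.\d+)?)$")


def parse_wid_code(code: str) -> dict:
    """Split a 10-character indicator code into type, concept, age and unit."""
    m = CODE_RE.match(code)
    if m is None:
        raise ValueError(f"not a WID-style indicator code: {code!r}")
    kind, concept, age, unit = m.groups()
    return {
        "type": SERIES_TYPES.get(kind, kind),
        "concept": concept,  # e.g. "ptinc" = pre-tax national income
        "age": age,          # e.g. "992" = adults aged 20+
        "unit": POP_UNITS.get(unit, unit),
    }


def parse_percentile(p: str) -> tuple:
    """Turn a range such as 'p90p100' into (lower, upper) bounds."""
    m = PCT_RE.match(p)
    if m is None:
        raise ValueError(f"not a percentile range: {p!r}")
    return float(m.group(1)), float(m.group(2))


print(parse_wid_code("sptinc992j"), parse_percentile("p99.9p100"))
```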

Suitable research

Inequality and distributional structure, redistribution, and macro and development economics that rely on distribution-share statistics.

Updates: Rolling (incl. 2024 update) · Format: Web export / Stata package / R tools

Licence type follows the official terms (to be verified); metadata verified 2026-06.

Why it's hard to get on your own

Indicator codes, currency units, pre-/post-tax definitions and percentile ranges are numerous, and annual updates add new series — aligning these against the official dictionary item by item, across years and regions, is highly error-prone. Let us organize it into consistent, searchable, well-formed series.

Economics & social science · Etalab 2.0

BACI Global Bilateral Trade Database (CEPII BACI)

A cleaned, reconciled bilateral trade panel at the HS 6-digit product level, covering the world's major economies; the accepted standard for empirical trade research.

Source: CEPII, France
Coverage: Major economies · about 5,000 products
Period: Annual, to 2024
Scale: Multi-year bilateral product flows
Licence: Etalab Open 2.0
Access: Free, no registration

Contents and fields

Field | Meaning
t | Year
k | Product category (HS 6-digit, leading zeros kept)
i / j | Exporter / importer (ISO numeric codes)
v | Trade value (thousand USD, current prices)
q | Quantity (metric tonnes)
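The leading-zero problem in `k` comes from letting a tool infer a numeric type for the column. Here is a minimal sketch with Python's standard `csv` module, assuming the column names in the table above and that missing quantities appear as empty or `NA` (check the release notes of the version you download):

```python
import csv
import io


def read_baci_rows(text: str) -> list:
    """Parse BACI-style CSV text, keeping HS codes as zero-padded strings."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        q_raw = rec["q"].strip()
        rows.append({
            "t": int(rec["t"]),
            "i": int(rec["i"]),
            "j": int(rec["j"]),
            # csv yields strings, so "010121" keeps its zero; zfill also repairs
            # codes that an earlier tool (e.g. a spreadsheet) truncated to "10121".
            "k": rec["k"].strip().zfill(6),
            "v": float(rec["v"]),
            # Assumed missing-value markers for quantity.
            "q": None if q_raw in ("", "NA") else float(q_raw),
        })
    return rows


sample = "t,i,j,k,v,q\n2022,4,8,10121,12.5,3.1\n2022,4,8,010129,1.0,NA\n"
rows = read_baci_rows(sample)
```

With pandas, the equivalent safeguard is passing `dtype={"k": str}` to `read_csv`.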

Methodology: starting from raw UN Comtrade declarations, BACI harmonizes importer-reported (CIF) and exporter-reported (FOB) values and reconciles mirror flows, weighting each reporter by its estimated reliability.

Version: 202601 (2026-01, updated each January) · Format: CSV (distributed as ZIP)

Countries are identified by ISO numeric codes; metadata verified 2026-06.

Why it's hard to get on your own

Conflicting mirror data, inconsistent CIF/FOB valuation, HS codes that lose their leading zeros when read as numbers, and versions that don't line up across years: this is the grunt work that slows researchers down. We have already processed it into a consistent long table that goes straight into your model.

Economics & social science · Etalab 2.0

CEPII Gravity Database

The bilateral variables needed to estimate gravity equations, assembled as one set: trade flows, distance, trade agreements, common language and macro indicators. It is the standard base data for empirical international-trade research.

Source: CEPII, France
Coverage: Square panel of country pairs · about 252 economies
Period: 1948–2020
Scale: Bilateral-pair panel
Licence: Etalab Open 2.0
Access: Free, no registration

Contents and fields

① Bilateral trade flows (three sources: IMF DOTS / UN Comtrade / BACI); ② geographic distance measures (various weighted distances, contiguity, landlocked, island, latitude/longitude); ③ institutional and trade-facilitation variables (GATT/WTO membership, regional/bilateral trade agreements); ④ proxies for historical and institutional ties (common language, religion, legal-system origin, historical links, etc.); ⑤ macro indicators (GDP, population).

Suitable research

Gravity models, bilateral trade, and global-value-chain empirics.
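For orientation, a textbook log-linear gravity specification shows where these variables enter. Here X is the trade flow from i to j in year t, Y is GDP, D is bilateral distance, and the remaining terms are contiguity, common-language and trade-agreement dummies. This is a generic form, not a specification prescribed by CEPII; applied work often estimates the multiplicative version by Poisson pseudo-maximum likelihood (PPML) with exporter-year and importer-year fixed effects instead, partly because the log form drops zero flows.

```latex
\ln X_{ij,t} = \beta_0 + \beta_1 \ln Y_{i,t} + \beta_2 \ln Y_{j,t} + \beta_3 \ln D_{ij}
  + \beta_4\,\mathrm{Contig}_{ij} + \beta_5\,\mathrm{ComLang}_{ij}
  + \beta_6\,\mathrm{RTA}_{ij,t} + \varepsilon_{ij,t}
```

The distance coefficient \beta_3 is expected to be negative: trade falls as bilateral distance rises.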

Version: 202211 (2022-11) · Format: CSV / R / Stata · Citation: Conte, Cotterlaz & Mayer (2022)

Regional divisions follow the data source's international statistical classification — a statistical convention only; China's official standards govern. Metadata verified 2026-06.

Why it's hard to get on your own

The three trade-flow sources use different definitions, distance and institutional variables span different years, and reporter codes must be aligned year by year — assembling a regression-ready bilateral panel yourself often takes weeks. What we deliver is a complete, definition-unified, pair-aligned dataset.

Economics & social science · CC BY 4.0

Maddison Project Database (MPD)

The authoritative benchmark that puts the world economy on a two-thousand-year scale: cross-country comparable per-capita GDP and population estimates from AD 1.

Source: GGDC, University of Groningen
Coverage: 169 countries and territories
Period: AD 1 – 2022
Scale: Long time series
Licence: CC BY 4.0
Access: Free, no registration

Contents and fields

Field | Meaning
countrycode / year | Region code / year
cgdppc | Per-capita GDP for level comparison (2011 international $)
rgdpnapc | Real per-capita GDP (for cross-time growth comparison)
pop | Population (thousands)
i_cig / i_bm | Estimate source / benchmark-estimate flag

Suitable research

Historical economics, comparative development, and long-run growth and income-level differences.

Version: MPD 2023 · Format: Excel / Stata · ID: DOI:10.34894/INZBF2

Field naming and classification follow the official codebook; metadata verified 2026-06.

Why it's hard to get on your own

Definitions are revised between releases, the level-comparison (cgdppc) and growth-comparison (rgdpnapc) per-capita GDP series suit different uses, and the flags distinguishing benchmark estimates from interpolation/extrapolation need to be checked field by field. We deliver data that is mapped, aligned and clearly labelled by definition, ready to use.

Economics & social science · Free · citation required

Barro-Lee Educational Attainment Database

The authoritative reference for measuring human-capital stock: cross-country educational-attainment estimates by sex and age, long cited by the World Bank and a large body of growth research.

Source: Robert Barro (Harvard) and Jong-Wha Lee (Korea University)
Coverage: 146 economies
Period: 1950–2015 (5-year intervals)
Scale: Cross-tabulations by sex/age
Licence: Free, citation required
Access: Free, no registration

Contents and fields

Educational attainment by sex (total/male/female) and age group: the share of population at each education level (none/primary/secondary/tertiary, incomplete and complete), average years of schooling (yr_sch and by primary/secondary/tertiary), plus the enrolment rates, dropout rates and population structure used in estimation; includes Lee-Lee long-run historical data and extension modules such as education quality.
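The headline variable, average years of schooling (`yr_sch`), is in essence the population share at each attainment level weighted by the years of schooling that level represents. Below is a simplified sketch; the real estimates use country-specific durations and a specific treatment of incomplete levels, which the placeholder values here do not reproduce.

```python
import math


def average_years_of_schooling(shares: dict, years: dict) -> float:
    """Attainment-share-weighted sum of years of schooling per category."""
    total_share = sum(shares.values())
    if not math.isclose(total_share, 1.0, abs_tol=1e-6):
        raise ValueError(f"shares must sum to 1, got {total_share}")
    return sum(share * years[level] for level, share in shares.items())


# Hypothetical adult population: shares by highest level completed, and
# placeholder cumulative years for each level (not Barro-Lee's actual durations).
shares = {"none": 0.1, "primary": 0.3, "secondary": 0.4, "tertiary": 0.2}
years = {"none": 0, "primary": 6, "secondary": 12, "tertiary": 16}
print(round(average_years_of_schooling(shares, years), 2))  # 9.8
```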

Suitable research

Economics of education, human capital and empirical economic-growth research.

Version: 2021-09 (BLv3) · Format: Excel / CSV / Stata

The licence is stated two ways (GitHub MIT / rights reserved on the official site); verify the authors' authorization before commercial use. Metadata verified 2026-06.

Why it's hard to get on your own

Definition changes across release batches, the education-level breakdown, and stitching together historical backcast series often leave people aligning the data repeatedly and unsure which version to use. We have completed the version mapping and variable calibration.

Public health & medicine

Population · Health · Epidemiology

4 in this column

Authoritative, consistently defined, cross-country comparable health and population data that is harder to obtain from within China.

Public health & medicine · CC BY 4.0 · registration required

Human Mortality Database (HMD)

The internationally recognized authority on mortality and life tables, built with a single, uniform calculation method and usable directly for actuarial work, life-insurance pricing and demographic research.

Source: UC Berkeley and the Max Planck Institute for Demographic Research
Coverage: About 41 countries and territories
Period: From as early as 1751 (annual)
Scale: About 48 population series
Licence: Outputs CC BY 4.0
Access: Free, registration required

Contents and fields

Mortality rates, life tables, death counts, birth counts and exposure-to-risk population by age and sex, for both periods and cohorts, plus the raw input data used to build the life tables. A companion series, Short-Term Mortality Fluctuations (STMF), provides weekly death counts for monitoring short-term changes in mortality.
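To see what life-table construction involves, here is a deliberately simplified period life table that turns age-specific death rates m_x into life expectancy at birth. It assumes deaths occur mid-interval and treats the last rate as an open-ended age group; HMD's own Methods Protocol uses more refined assumptions (especially for infants and the oldest ages), so results will not match published HMD values exactly.

```python
def life_expectancy_at_birth(mx: list, radix: float = 100_000.0) -> float:
    """Period life expectancy at birth from single-year death rates m_x.

    Simplifications (not the HMD Methods Protocol): deaths fall mid-interval
    (a_x = 0.5) in every closed age interval, and the last rate applies to an
    open-ended age group.
    """
    if not mx or any(m <= 0 for m in mx):
        raise ValueError("need a non-empty list of positive death rates")
    survivors = radix          # l_x
    person_years = 0.0         # running total of L_x, giving T_0 at the end
    for m in mx[:-1]:
        qx = min(1.0, m / (1.0 + 0.5 * m))        # probability of dying in the interval
        deaths = survivors * qx                   # d_x
        person_years += survivors - 0.5 * deaths  # L_x
        survivors -= deaths
    person_years += survivors / mx[-1]            # open interval: L = l / m
    return person_years / radix                   # e_0 = T_0 / l_0
```

A useful sanity check: with a constant death rate m at every age, this construction gives e_0 = 1/m exactly (for example, 50 years for m = 0.02).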

Suitable research

Demography, ageing, actuarial science and public health.

Updates: Rolling by country · Format: CSV/TXT / Excel / R interface

Input data is bound by the original licences of each country's statistical agency; metadata verified 2026-06.

Why it's hard to get on your own

National raw definitions differ, life-table construction is intricate, version updates and exposure-to-risk alignment are error-prone, and you must register and accept the agreement first. Leave it to us for a definition-unified, traceable, standardized deliverable.

Public health & medicine · CC BY 4.0 · registration required

Human Fertility Database (HFD)

The leading international source of high-quality fertility data for developed countries, detailed by mother's age and birth order down to fertility tables; a comparable benchmark for low-fertility research.

Source: Max Planck Institute for Demographic Research and Vienna Institute of Demography
Coverage: About 37 countries and territories
Period: Series length varies by country; latest data to 2024
Scale: Period and cohort fertility data
Licence: Outputs CC BY 4.0
Access: Free, registration required (a no-registration lite version also exists)

Contents and fields

Four data blocks: ① summary indicators (births, crude birth rate, total fertility rate TFR, tempo-adjusted TFR, mean age at childbearing, cohort cumulative fertility, etc.); ② detail by age/birth order; ③ period and cohort fertility tables (incl. PATFR); ④ raw inputs. Standardized methods throughout (Lexis format, population denominators, fertility-table computation).
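Several of the summary indicators above reduce to simple arithmetic on age-specific fertility rates. In this minimal sketch, TFR is the sum of single-year rates, mean age at childbearing is a rate-weighted mean, and a simplified Bongaarts-Feeney tempo adjustment uses r, the annual change in mean age at childbearing. HFD computes its tempo-adjusted TFR by birth order, so this all-orders version is only an approximation for illustration.

```python
def total_fertility_rate(asfr: dict) -> float:
    """TFR: sum of single-year age-specific fertility rates (births per woman)."""
    return sum(asfr.values())


def mean_age_at_childbearing(asfr: dict) -> float:
    """Rate-weighted mean age, taking mid-year ages (x + 0.5)."""
    tfr = total_fertility_rate(asfr)
    if tfr <= 0:
        raise ValueError("rates must sum to a positive value")
    return sum((age + 0.5) * rate for age, rate in asfr.items()) / tfr


def tempo_adjusted_tfr(tfr: float, mab_change_per_year: float) -> float:
    """Bongaarts-Feeney adjustment TFR / (1 - r), applied to all births at once."""
    if mab_change_per_year >= 1:
        raise ValueError("r must be below 1")
    return tfr / (1.0 - mab_change_per_year)
```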

Suitable research

Demography, fertility and family dynamics, and public-policy benchmarking.

Updates: Rolling · Format: Tab-separated text / Excel (lite)

Input data is bound by the original licences of each country's statistical agency; metadata verified 2026-06.

Why it's hard to get on your own

National birth and population records differ in definition and the age/birth-order dimensions are uneven; aligning to the Lexis format, unifying denominators and recomputing period/cohort fertility tables yourself is time-consuming and error-prone. Here you get standardized, cross-comparable data as a complete set.

Public health & medicine · Free · application required

Demographic and Health Surveys (DHS Program)

Nationally representative household micro-data from developing countries, collected with standardized questionnaires; a hard-to-replace primary source for global health and development research.

Source: Implemented by ICF (transitional funding from the Gates Foundation)
Coverage: 90+ countries · 400+ surveys
Period: 1984 to present
Scale: Nationally representative household micro-data
Licence: Free, under a data-use agreement after registration
Access: Free, application required (24–48h review)

Contents and fields

Modules on fertility and TFR, family planning and contraception, maternal and child health (immunization, illness and survival), nutrition, HIV and malaria, biomarkers and more; organized by recode files (women/children/household/men/HIV, etc.). Survey types include standard DHS, Malaria Indicator Surveys, AIDS Indicator Surveys, Service Provision Assessments and more.
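DHS micro-data come from sample surveys, so point estimates need the sampling weights. In the standard recode files, the women's weight `v005` is stored as an integer with six implied decimals (divide by 1,000,000), and `v313` codes the type of contraceptive method currently used (3 = modern method). Here is a minimal sketch of a weighted share under those conventions, using toy rows:

```python
def weighted_share(rows, predicate, weight_var="v005"):
    """Weighted share of respondents for whom `predicate(row)` is true.

    DHS stores weights as integers with six implied decimals, so they are
    rescaled by 1,000,000 (the rescaling cancels in a ratio, but keeps the
    weighted totals interpretable).
    """
    numerator = denominator = 0.0
    for row in rows:
        weight = row[weight_var] / 1_000_000
        denominator += weight
        if predicate(row):
            numerator += weight
    if denominator == 0:
        raise ValueError("no positive weights")
    return numerator / denominator


# Toy respondents (not real DHS data); v313 == 3 means a modern method.
women = [
    {"v005": 1_000_000, "v313": 3},
    {"v005": 3_000_000, "v313": 0},
]
print(weighted_share(women, lambda r: r["v313"] == 3))  # 0.25
```

This gives a point estimate only. Standard errors require the survey design variables (cluster and stratum identifiers), and official indicators often restrict the denominator, for example to currently married women.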

Suitable research

Global health, and population and development economics.

Format: Stata / SPSS / SAS / ASCII; summary indicators via STATcompiler / API · Updates: Rolling by country and survey round

Access to micro-data requires an application describing your research project, and use is restricted to the purpose stated in it; metadata verified 2026-06.

Why it's hard to get on your own

Multiple survey rounds, multiple survey types, and a field structure split by recode file often require checking definitions version by version, aligning codes and linking across files before the data is analysis-ready. We have completed the version mapping and field alignment.

Public health & medicine · Non-commercial · registration required

Global Burden of Disease Study (GBD 2021)

The accepted benchmark for estimating global disease burden, covering 371 diseases and injuries and 88 risk factors; one of the most frequently used sources in epidemiology and health-policy research.

Source: IHME, University of Washington
Coverage: 204 countries and territories + subnational units
Period: 1990–2021
Scale: 371 conditions · 88 risk factors
Licence: Free non-commercial user agreement
Access: Free, registration required

Contents and fields

Dimension | Values
Measures | Deaths, DALYs, YLLs, YLDs, prevalence, incidence, HALE
Strata | Location, year, age, sex, cause, risk factor
Units | Number / rate (per 100,000) / percent
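The core measures are linked by simple identities. DALYs are the sum of YLLs and YLDs; YLLs weight each death by the remaining life expectancy at the age of death (GBD uses its own reference life table); and rates are expressed per 100,000 population. Here is a sketch with hypothetical numbers, which can help check that counts and rates taken from the Results tool are mutually consistent:

```python
def dalys(yll: float, yld: float) -> float:
    """Disability-adjusted life years: years of life lost plus years lived with disability."""
    return yll + yld


def years_of_life_lost(deaths_by_age: dict, remaining_life_expectancy: dict) -> float:
    """YLL: deaths at each age times the reference remaining life expectancy at that age."""
    return sum(n * remaining_life_expectancy[age] for age, n in deaths_by_age.items())


def rate_per_100k(count: float, population: float) -> float:
    """Convert a count into a rate per 100,000 population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count / population * 100_000


# Hypothetical inputs; GBD uses its own reference life table, not these values.
lost = years_of_life_lost({60: 10, 70: 5}, {60: 25.0, 70: 17.0})
print(dalys(lost, 120.0), rate_per_100k(dalys(lost, 120.0), 2_000_000))
```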

Suitable research

Disease burden, epidemiology and health-policy evaluation.

Version: GBD 2021 · Format: Web tables / visualizations / CSV (GBD Results tool)

Geographic granularity is described at country/region/subnational levels; place-name conventions follow China's official standards. Metadata verified 2026-06.

Why it's hard to get on your own

Cause classification, risk-factor attribution and aligning definitions across years are intricate, and small differences in indicator definitions or unit conversions across versions can affect conclusions. We have completed the version mapping and definition cross-checks.

Geospatial, remote sensing & cities

Remote sensing · Population · Land

5 in this column

Turning "what's happening on the ground" into computable layers: common proxies for economic and urban analysis in research from China.

Geospatial, remote sensing & cities · Open · commercial use allowed

VIIRS Nighttime Lights, annual composite (VNL)

Measured night-time light traces economic activity, urban expansion and energy use; it is the most common remote-sensing proxy for measuring regional economies.

Source: Earth Observation Group (EOG), Colorado School of Mines
Coverage: Global (75°N–65°S)
Resolution: About 500 m (15 arc-seconds)
Period: 2012 to present
Licence: Open, commercial use allowed
Access: Free, registration required (also via Google Earth Engine)

Contents and fields

Band | Meaning
average / average_masked | Average radiance / masked average radiance
median / maximum / minimum | Median / maximum / minimum radiance
cf_cvg / cvg | Cloud-free observation count / total observation count

Radiance is in nW/cm²/sr; the composites are processed to remove cloud, moonlight and fire pixels.
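A common way to turn an annual VNL raster into a regional indicator is the "sum of lights": total radiance over the pixels inside an administrative zone, often after dropping faint background values. Here is a minimal sketch on plain nested lists. It assumes the raster (e.g. the `average_masked` band) has already been read and a zone mask rasterized onto the same grid; in practice you would use a raster library or Google Earth Engine. The `min_radiance` threshold is a hypothetical choice, not an official one, and pixel-area differences across latitudes are ignored.

```python
def sum_of_lights(radiance, zone_mask, min_radiance=0.0):
    """Sum radiance over pixels inside the zone that exceed `min_radiance`.

    `radiance` and `zone_mask` are same-shaped 2-D lists; None marks no-data.
    Returns (total radiance, number of lit pixels counted).
    """
    total, lit_pixels = 0.0, 0
    for rad_row, mask_row in zip(radiance, zone_mask):
        for value, inside in zip(rad_row, mask_row):
            if inside and value is not None and value > min_radiance:
                total += value
                lit_pixels += 1
    return total, lit_pixels


# Toy 2x3 tile in nW/cm²/sr (not real VNL values).
tile = [[0.1, 2.0, 5.5], [None, 0.4, 12.0]]
zone = [[True, True, False], [True, True, True]]
print(sum_of_lights(tile, zone, min_radiance=0.3))
```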

Suitable research

Regional economics, GDP proxies, electricity access and urban expansion.

Version: Annual VNL V2.2 · Format: GeoTIFF

Official licence wording varies (public domain or CC BY, both allow commercial use; attribution to EOG is recommended); this site does not render maps. Metadata verified 2026-06.

Why it's hard to get on your own

Global night-light data is scattered across overseas platforms, and registration hurdles plus multiple version formats mean just "getting the usable copy" eats most of your effort, while year-to-year sensor/algorithm consistency still needs to be judged. Leave this tedious part to us.

Geospatial, remote sensing & cities · Open and free

Global Human Settlement Layer (GHSL)

It turns "where people are, how many, and how urbanized" into globally consistent raster layers; hard-to-replace base data for urban and population-exposure research.

Source: European Commission Joint Research Centre (JRC)
Coverage: Global raster
Resolution: 100 m / 1 km (some products at 10 m)
Period: 1975–2030 (5-year steps; 2025 and 2030 are projections)
Licence: EC reuse policy (cite the source)
Access: Free, no registration

Contents and fields

Product | Meaning
GHS-BUILT-S / V / H | Built-up area / volume / building height
GHS-POP | Population distribution grid (people per cell)
GHS-SMOD / DUC | Degree of urbanization / urbanization class of administrative units
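When combining products at different resolutions, counts and fractions aggregate differently. Population counts (GHS-POP) should be summed when going from 100 m to 1 km, while fractional layers should be averaged. Here is a minimal sketch, assuming the fine and coarse grids share the same projection and origin so that each coarse cell contains an exact block of fine cells (check this for the files you download):

```python
def aggregate_blocks(grid, factor, how="sum"):
    """Aggregate a fine 2-D grid into coarse cells of `factor` x `factor`.

    Use how="sum" for counts (people per cell) and how="mean" for fractions
    (e.g. built-up share), so totals and shares stay meaningful.
    """
    if how not in ("sum", "mean"):
        raise ValueError("how must be 'sum' or 'mean'")
    n_rows, n_cols = len(grid), len(grid[0])
    if n_rows % factor or n_cols % factor:
        raise ValueError("grid dimensions must be multiples of factor")
    coarse = []
    for r0 in range(0, n_rows, factor):
        coarse_row = []
        for c0 in range(0, n_cols, factor):
            block = [grid[r][c] for r in range(r0, r0 + factor)
                     for c in range(c0, c0 + factor)]
            total = sum(block)
            coarse_row.append(total if how == "sum" else total / len(block))
        coarse.append(coarse_row)
    return coarse


# Toy 4x4 population grid aggregated 2x2 (for 100 m -> 1 km, factor=10).
pop = [[1, 2, 0, 0], [3, 4, 0, 1], [0, 0, 5, 5], [0, 0, 5, 5]]
print(aggregate_blocks(pop, 2))  # [[10, 1], [0, 20]]
```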

Suitable research

Urbanization, population distribution, regional economics and disaster exposure.

Version: R2023A · Format: GeoTIFF

Administrative-unit divisions follow the data source's original conventions and are a technical processing result only; China's official standard maps govern. This site does not render maps. Metadata verified 2026-06.

Why it's hard to get on your own

Versions and years are interleaved, built-up area and population-grid definitions differ, and aligning with census and UN figures is tedious — downloading, comparing, and unifying coordinates and resolution yourself often takes days. Leave the version checks and definition alignment to us.

Geospatial, remote sensing & cities · CC BY 4.0

ESA WorldCover (10 m global land cover)

A 10-metre global land-cover map built from Sentinel-1 and Sentinel-2 imagery, with an independently validated overall accuracy of about 76.7% (2021 version). Its classes are clearly defined, and it is ready to use as a research base layer.

Source: European Space Agency (ESA), with a consortium led by VITO
Coverage: Global
Resolution: 10 m
Period: 2020 / 2021 (two versions)
Licence: CC BY 4.0
Access: Free (official site / AWS / Google Earth Engine)

Contents and fields

A single band (Map) records 11 land-cover classes: tree cover, shrubland, grassland, cropland, built-up, bare/sparse vegetation, snow and ice, permanent water bodies, herbaceous wetland, mangroves, moss and lichen. A product user manual and validation report are included.
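The `Map` band stores each class as a numeric code. The lookup below reflects the legend in the WorldCover product user manual as we understand it (0 is treated as no data); please verify it against the manual for the version you use. Here is a sketch that computes class shares from a flat list of pixel values:

```python
from collections import Counter

# WorldCover legend codes (verify against the product user manual).
WORLDCOVER_CLASSES = {
    10: "Tree cover", 20: "Shrubland", 30: "Grassland", 40: "Cropland",
    50: "Built-up", 60: "Bare / sparse vegetation", 70: "Snow and ice",
    80: "Permanent water bodies", 90: "Herbaceous wetland",
    95: "Mangroves", 100: "Moss and lichen",
}


def class_shares(pixels):
    """Share of valid pixels per class; codes outside the legend (e.g. 0) are skipped."""
    counts = Counter(p for p in pixels if p in WORLDCOVER_CLASSES)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {WORLDCOVER_CLASSES[code]: n / total for code, n in sorted(counts.items())}
```

Pixel-count shares approximate area shares only over small extents, because pixels in geographic coordinates shrink toward the poles.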

Suitable research

Land use, agriculture, ecology, urban expansion and environmental research.

Version: v200 (2021, released 2022-10) · Format: Cloud-Optimized GeoTIFF

The two versions were produced with different algorithms, so take care when using them for change detection; this site does not render maps. Metadata verified 2026-06.

Why it's hard to get on your own

v100 vs. v200 definition differences, aligning class definitions, comparing accuracy reports and obtaining/distributing the raw rasters often cost time and invite errors when done by hand. We have completed version checks and documentation, so you can take a usable base layer as needed.

Geospatial, remote sensing & cities · CC BY 4.0

WorldPop Global Population Distribution

Gridded population estimates at about 100-metre resolution, downscaled from official census figures; an authoritative public source for spatial demography and regional planning.

Source: University of Southampton and partners
Coverage: Global (produced country by country)
Resolution: About 100 m (also 1 km)
Period: About 2000–2021 (plus projections)
Licence: CC BY 4.0 (commercial use allowed)
Access: Free and open

Contents and fields

Population counts (estimated residents per grid cell), population density, age/sex population structure, development indicators (poverty / birth rate, etc.), population mobility, and more. Methodology: built on censuses, using random-forest dasymetric reallocation with geographic covariates to downscale to about 100m grids.
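Converting per-cell counts into density needs each cell's ground area, which shrinks toward the poles for grids in geographic coordinates. Here is a sketch assuming square cells of 3 arc-seconds (roughly 100 m at the equator; check the resolution stated in the file's metadata) on a spherical Earth:

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; spherical approximation


def cell_area_km2(lat_deg: float, cell_arcsec: float = 3.0) -> float:
    """Area of a square lat/lon cell of side `cell_arcsec`, centred at lat_deg."""
    d = math.radians(cell_arcsec / 3600.0)  # cell size in radians
    lat = math.radians(lat_deg)
    # Spherical cell area R^2 * dLon * (sin(lat_top) - sin(lat_bottom)),
    # written as 2*cos(lat)*sin(d/2) to avoid subtracting nearly equal numbers.
    area_m2 = EARTH_RADIUS_M ** 2 * d * 2.0 * math.cos(lat) * math.sin(d / 2)
    return area_m2 / 1e6


def density_per_km2(count: float, lat_deg: float, cell_arcsec: float = 3.0) -> float:
    """People per km² for a cell holding `count` estimated residents."""
    return count / cell_area_km2(lat_deg, cell_arcsec)


print(round(cell_area_km2(0.0), 5), round(cell_area_km2(60.0), 5))
```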

Suitable research

Spatial demography, urban studies, disaster assessment, accessibility and public-service planning.

Version: Versioned by DOI · Format: GeoTIFF / REST API

These are model-based estimates produced by a third-party body; administrative divisions and boundaries follow China's official standards. This site does not render maps. Metadata verified 2026-06.

Why it's hard to get on your own

For the same area, population-raster definitions and coordinate systems often need repeated checking across years and versions, and aligning age/sex layers with covariates is time-consuming. We have completed the version mapping and field unification, so the data is ready to use.

Geospatial, remote sensing & cities · Free · non-commercial

WorldClim Global Climate Data

A high-resolution global climate raster baseline, from monthly temperature and precipitation to 19 bioclimatic variables; a common reference for species-distribution modelling and climate-impact assessment.

Source: Hijmans et al. (UC Davis/Berkeley)
Coverage: Global land
Resolution: 30 arc-seconds (about 1 km) to 10 arc-minutes
Period: Baseline 1970–2000
Licence: Free for academic / non-commercial use
Access: Free, no registration, direct download links

Contents and fields

Monthly minimum/average/maximum temperature (°C), precipitation (mm), solar radiation, wind speed and water-vapour pressure; 19 standardized bioclimatic variables (bio1–bio19, e.g. annual mean temperature, temperature seasonality, annual precipitation); SRTM elevation included. Generated from global weather-station records via thin-plate-spline interpolation.
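Several of the bioclimatic variables are simple summaries of the monthly layers. This sketch computes a subset for one pixel, following the definitions used by the widely used R function `dismo::biovars` as we understand them: monthly mean temperature is taken as the midpoint of tmin and tmax, and bio4 is 100 times the standard deviation. WorldClim's own processing may differ in detail, so treat this as illustrative.

```python
import statistics


def bioclim_subset(tmin, tmax, prec):
    """A few bioclimatic variables from 12 monthly values (°C, mm)."""
    if not (len(tmin) == len(tmax) == len(prec) == 12):
        raise ValueError("expected 12 monthly values for each input")
    tavg = [(lo + hi) / 2 for lo, hi in zip(tmin, tmax)]
    bio5 = max(tmax)   # max temperature of the warmest month
    bio6 = min(tmin)   # min temperature of the coldest month
    return {
        "bio1": sum(tavg) / 12,                # annual mean temperature
        "bio4": statistics.stdev(tavg) * 100,  # temperature seasonality
        "bio5": bio5,
        "bio6": bio6,
        "bio7": bio5 - bio6,                   # annual temperature range
        "bio12": sum(prec),                    # annual precipitation
    }
```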

Suitable research

Species distribution and niche modelling, climate-change impacts, and agricultural and urban climate analysis.

Version: 2.1 (2020-01) · Format: GeoTIFF (grouped by variable and resolution)

The official terms state non-commercial use; redistribution or commercial use is not permitted without authorization. Metadata verified 2026-06.

Why it's hard to get on your own

Checking variable definitions, units and coordinate alignment across versions one by one often takes days. We have completed the organizing and checks, so you can take the data straight into analysis.

Machine learning & corpora

Images · Text · Multilingual

3 in this column

Very large open corpora for computer vision and NLP: huge in volume, with a learning curve to get started.

Machine learning & corpora · Non-commercial research

ImageNet

A foundational benchmark for computer-vision research worldwide: over ten million hand-annotated images and more than twenty thousand categories, used by academia and industry as a common yardstick since 2009.

Source: Stanford Vision Lab and Princeton University
Coverage: About 14.2 million images · 21,841 classes
Subset: ILSVRC-1K, about 1.2 million training images
Period: From 2009
Licence: Non-commercial research and education only
Access: Free, registration required

Contents and fields

Natural images collected from the web, a category label per image (mapped to a WordNet synset), WordNet noun-hierarchy relations, and bounding boxes for object localization in some subsets. Organized by the WordNet noun tree, targeting about 1,000 images per synset.
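In the ILSVRC release, training images are commonly organized in one folder per WordNet ID (wnid), e.g. `train/n01440764/n01440764_10026.JPEG`, where the wnid is `n` plus an 8-digit WordNet noun offset. Here is a sketch for recovering labels from such paths. Mapping wnids to integer indices by sorted order mirrors a common convention (torchvision's `ImageFolder` sorts class folders); it is not an official index, so keep the mapping you trained with.

```python
import re
from pathlib import PurePosixPath

WNID_RE = re.compile(r"^n\d{8}$")  # "n" + 8-digit WordNet noun offset


def build_index(wnids):
    """Map wnids to integer class indices in sorted order."""
    return {wnid: i for i, wnid in enumerate(sorted(set(wnids)))}


def label_from_train_path(path: str, wnid_to_index: dict) -> tuple:
    """Return (wnid, class index) from a train/<wnid>/<file> style path."""
    wnid = PurePosixPath(path).parent.name
    if not WNID_RE.match(wnid):
        raise ValueError(f"parent folder is not a wnid: {path!r}")
    return wnid, wnid_to_index[wnid]
```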

Suitable research

Image classification, object localization, transfer learning and model-evaluation benchmarks.

Version: ImageNet-21K / benchmark subset ILSVRC2012 · Format: Images (JPEG) + annotations

Since 2019 the official site closed the full 21K download and keeps only the ILSVRC subset; non-commercial academic research only. Metadata verified 2026-06.

Why it's hard to get on your own

A model can do well on your own samples yet land at a puzzling rank on a recognized benchmark — what's missing isn't compute, it's a yardstick the field agrees on. Leave benchmark access and subset alignment to us.

Machine learning & corpora · Licence per sub-corpus

OPUS Open Parallel Corpus

The largest open collection of multilingual parallel corpora, spanning over a thousand languages and thousands of language pairs; a base resource for machine translation and multilingual NLP.

Source: Helsinki-NLP, University of Helsinki
Coverage: About 1,005 languages · 1,214 corpora
Scale: About 102.9 billion sentence pairs
Period: Sub-corpora updated on a rolling basis
Licence: Per sub-corpus
Access: Free, no registration

Contents and fields

Bilingual/multilingual sentence-aligned text (bitext): source sentence, target sentence, language-pair identifier, sub-corpus source identifier, sentence-alignment information (XCES stand-off); some processed versions include tokenization, lemmatization and part-of-speech tagging.
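In OPUS's Moses plain-text downloads, a language pair comes as two files in which line n of one file is the translation of line n of the other. This minimal sketch pairs them up and drops empty lines and pairs with implausible length ratios; the `max_ratio` of 3.0 is an arbitrary illustrative threshold, not an OPUS recommendation.

```python
def read_moses_pair(src_lines, tgt_lines, max_ratio=3.0):
    """Pair up line-aligned source/target lines, dropping empty or lopsided pairs."""
    if len(src_lines) != len(tgt_lines):
        raise ValueError("Moses files must have the same number of lines")
    pairs = []
    for src, tgt in zip(src_lines, tgt_lines):
        src, tgt = src.strip(), tgt.strip()
        if not src or not tgt:
            continue  # one side empty: the pair is useless for training
        ratio = max(len(src), len(tgt)) / min(len(src), len(tgt))
        if ratio <= max_ratio:
            pairs.append((src, tgt))
    return pairs
```

To apply it to downloaded files, read both sides with `open(path, encoding="utf-8").readlines()`; file names vary by corpus and release.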

Suitable research

Machine translation, cross-lingual models and multilingual NLP.

Format: XML + alignment / TMX / Moses plain text · Tools: OpusTools / API

Licences vary by sub-corpus and must be checked per sub-corpus before use; metadata verified 2026-06.

Why it's hard to get on your own

Sources are scattered, versions and alignment conventions differ, and each sub-corpus has its own format and licence, so sorting and integrating them one by one is laborious. We have completed the mapping and unified the delivery, so you can take the data by language pair, ready to use.

Machine learning & corpora · Open · see terms

Common Crawl Web Corpus

A petabyte-scale, standardized archive of public web pages worldwide, updated monthly; a base corpus for large-model pretraining and large-scale text research.

Source: Common Crawl Foundation
Coverage: Public web pages worldwide (over 300 billion pages)
Period: From 2008, monthly updates
Scale: About 2.1 billion pages per month; petabyte-scale in total
Licence: Common Crawl Terms of Use
Access: Free, no registration (AWS S3 / HTTPS / HF)

Contents and fields

Format | Meaning
WARC | Raw HTTP requests/responses (incl. HTML)
WAT | Extracted metadata (links, titles, etc., as JSON)
WET | Extracted plain text only

Also includes URL indexes (CDXJ/columnar) and a hyperlink graph. Main fields: URL, crawl timestamp, HTTP status, MIME type, content and plain text.
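WET files use the WARC record format: each record is a `WARC/1.0` line, a block of `Name: value` headers, a blank line, then `Content-Length` bytes of content. Below is a minimal standard-library reader based on that description. It assumes well-formed input; for production use, a maintained library such as `warcio` is the safer choice. Python's `gzip` module reads the multi-member gzip files that Common Crawl distributes without extra handling.

```python
import gzip
import io


def iter_warc_records(stream):
    """Yield (headers, content_bytes) from an uncompressed WARC/WET byte stream."""
    while True:
        line = stream.readline()
        if not line:
            return                      # end of stream
        if not line.strip():
            continue                    # blank lines separate records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"expected a WARC version line, got {line!r}")
        headers = {}
        while True:
            header_line = stream.readline()
            if not header_line.strip():
                break                   # blank line ends the header block
            name, _, value = header_line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()
        content = stream.read(int(headers["Content-Length"]))
        yield headers, content


def wet_text_records(path):
    """Yield (URL, plain text) for 'conversion' records in a .warc.wet.gz file."""
    with gzip.open(path, "rb") as f:
        for headers, content in iter_warc_records(f):
            if headers.get("WARC-Type") == "conversion":
                yield headers.get("WARC-Target-URI"), content.decode("utf-8", errors="replace")


# A tiny hand-written record (not real crawl data) to exercise the parser.
sample = (
    b"WARC/1.0\r\n"
    b"WARC-Type: conversion\r\n"
    b"WARC-Target-URI: http://example.com/\r\n"
    b"Content-Length: 11\r\n"
    b"\r\n"
    b"Hello world\r\n\r\n"
)
records = list(iter_warc_records(io.BytesIO(sample)))
```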

Suitable research

Large-scale corpus research, natural language processing and large-model pretraining.

Version: CC-MAIN-2026-21 · Format: WARC/WAT/WET (gzip) + columnar index

Web-page content copyright belongs to the original sites; users must do their own compliance cleaning and filtering, observing source licences and Chinese laws and regulations. Metadata verified 2026-06.

Why it's hard to get on your own

Crawling the whole web yourself, deduplicating, aligning formats and maintaining definitions across monthly versions tends to cost large amounts of compute and engineering time and is hard to reproduce. We have mapped out the formats, fields and index structure, so you can take a research-ready corpus directly.

Didn't find what you need?

Precise search for hard-to-find open datasets

Tell us the data your research needs and the criteria it must meet. We run a real search across authoritative platforms and public sources, and check every required criterion for hits and gaps.

01

State your criteria

Topic, variables, time and geographic range, format and definition requirements.

02

Availability assessment

We first judge whether the public data can be obtained and where the compliance limits are, to avoid wasted effort.

03

Real search + availability assessment report

We run a real search across multiple sources, judge hits and gaps against each required criterion, and write up an availability assessment report.

04

Availability assessment report delivered

If we find a match, we deliver dataset notes and source links; if nothing matches, we still report the search directions we tried and the closest alternative sources.

Finding data: guides

First, understand how to find data

Companion guides explaining where to find public data, how to choose platforms, whether licences allow commercial use, and how to look up official data.

Where to find data for a thesis · Comparing open-dataset platforms · Where to find machine-learning datasets · A primer on panel data · Can a dataset be used commercially · How to look up official statistics

Need data that isn't among these eighteen?

Browse the full data library first, or tell us what data your research needs: we assess availability first, then run a real search and organize the results.

Talk to us