Where to find thesis data
Start with the source that fits your discipline: for macro and socioeconomic data, look to the National Bureau of Statistics; for archived research data, look to ScienceDB, Zenodo, figshare and Harvard Dataverse; for international panel data, look to the World Bank and the Penn World Table; and on campus there are library-subscribed databases too. Below we set out what each source is good for, how to get in, and what to watch for.
The short answer
The fastest way to find thesis data is to go straight to the legitimate source that fits your discipline. For economics and social topics, use the National Data platform of China's National Bureau of Statistics and the World Bank's open data; for science, engineering and multidisciplinary research data, use DOI-bearing archival repositories such as ScienceDB, Zenodo, figshare and Harvard Dataverse; and on campus you can also use library-subscribed databases such as CnOpenData and EPS (which usually require a campus network connection or a library account). Confirm the license first, reconcile field definitions next, and only then start organizing the data; working in that order avoids most rework.
A list of legitimate sources: what each is good for and how to get in
1. Official and open data in China
- National Bureau of Statistics · National Data (data.stats.gov.cn): good for China's macro and socioeconomic indicators, organized by month, quarter, year and census, and also including the China Statistical Yearbook. You can query or export data directly on the official platform, free of charge.
- ScienceDB (scidb.cn): a general-purpose data repository built and operated by the Computer Network Information Center of the Chinese Academy of Sciences, good for archived research data across disciplines. Each dataset is assigned a DOI and a CSTR, and published data is free to access and download.
- Local government open-data platforms: many provinces and cities run open-data portals, good for local-level data on public services and urban operations. Coverage and update frequency vary by region, so check the data notes and time range before using.
2. Databases subscribed to by university libraries
- CnOpenData: a general data platform covering economics, law, healthcare, the humanities and more, with topic databases on patents, business registration, listed companies and other subjects — good for finding firm- and industry-level data in economics, management and social science.
- EPS data platform: a numeric data resource platform with research series on the macroeconomy, industries, trade and financial markets, good for economic and regional analysis.
- A reminder on access: these commercial databases are subscribed to and paid for by the institution, and usually require a campus network environment or a library account to log in. Which databases you can actually use depends on the resources your own university library has subscribed to.
3. International research data repositories
- Zenodo (zenodo.org): an open repository built and run jointly by CERN and OpenAIRE, good for finding citable research data, software and paper supplements with DOIs; uploading and access are free.
- figshare (figshare.com): an open-access repository of research outputs where each item is assigned a DOI; uploading and access are free, datasets are often released under Creative Commons licenses, and it is good for finding directly citable outputs such as figures, tables and datasets.
- Harvard Dataverse (dataverse.harvard.edu): maintained by bodies including Harvard's Institute for Quantitative Social Science, open and free to researchers across disciplines, assigning a DOI to each dataset; it is one of the larger open research data repositories worldwide, and especially rich in social science data.
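Because these repositories identify each item by a DOI and often attach a Creative Commons license, both can be checked mechanically before you download. The following is a minimal sketch, not any repository's own API: the function names `doi_url` and `describe_cc_license` are hypothetical, the DOI in the usage note is a placeholder, and the license reader assumes the identifier follows the usual `cc-by-nc-4.0` pattern. Always confirm against the license text on the dataset's own page.

```python
def doi_url(doi: str) -> str:
    """Turn a bare DOI (optionally prefixed with 'doi:') into a resolvable link."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    return f"https://doi.org/{doi}"


def describe_cc_license(license_id: str) -> dict:
    """Read usage conditions off a Creative Commons identifier.

    Assumes identifiers shaped like 'cc-by-4.0', 'cc-by-nc-sa-4.0' or 'cc0-1.0'.
    """
    parts = license_id.lower().replace("_", "-").split("-")
    if parts[0] not in ("cc", "cc0"):
        raise ValueError(f"not a Creative Commons identifier: {license_id!r}")
    if parts[0] == "cc0" or "zero" in parts:
        # CC0 is a public-domain dedication: no conditions attached.
        return {"attribution": False, "commercial_use": True,
                "derivatives": True, "share_alike": False}
    return {
        "attribution": "by" in parts,          # BY: credit the source
        "commercial_use": "nc" not in parts,   # NC: non-commercial only
        "derivatives": "nd" not in parts,      # ND: no modified versions
        "share_alike": "sa" in parts,          # SA: share under the same terms
    }
```

For example, `describe_cc_license("cc-by-nc-4.0")["commercial_use"]` evaluates to `False`, and `doi_url("doi:10.5281/zenodo.0000000")` (a placeholder DOI) gives a link you can paste into a reference list.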
4. International panel and statistical data
- World Bank Open Data (data.worldbank.org): the World Development Indicators (WDI) cover around 220 economies with well over a thousand time-series indicators, are licensed under CC BY 4.0 and free to download, and are good for cross-country economic and social comparison.
- Penn World Table (ggdc.net/pwt): national accounts data maintained by the University of Groningen, released under CC BY 4.0, good for long-run cross-country comparison of real GDP, productivity and price levels. We have a structured reference card for it in our curated datasets.
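World Bank indicators can also be pulled programmatically rather than downloaded by hand. The sketch below assumes the public World Bank API v2 at `api.worldbank.org/v2`, which the list above does not mention, and its usual JSON layout of a two-element list (paging metadata, then records). The function names are hypothetical, and `NY.GDP.MKTP.CD` (GDP in current US$) is used only as an example indicator code.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_ROOT = "https://api.worldbank.org/v2"  # assumed endpoint, see lead-in


def build_wdi_url(country: str, indicator: str, start: int, end: int) -> str:
    """Build a request URL for one country and one indicator over a year range."""
    query = urlencode(
        {"format": "json", "date": f"{start}:{end}", "per_page": 1000},
        safe=":",  # keep the year range readable as 2015:2020
    )
    return f"{API_ROOT}/country/{country}/indicator/{indicator}?{query}"


def parse_wdi_payload(payload: list) -> list:
    """Flatten a [metadata, records] response into rows, skipping missing values."""
    if len(payload) < 2 or payload[1] is None:
        return []  # error messages and empty results carry no record list
    rows = []
    for rec in payload[1]:
        if rec["value"] is None:  # years without an observation come back as null
            continue
        rows.append({
            "country": rec["countryiso3code"],
            "year": int(rec["date"]),
            "value": float(rec["value"]),
        })
    return sorted(rows, key=lambda r: r["year"])


def fetch_wdi(country: str, indicator: str, start: int, end: int) -> list:
    """Download and parse one series (requires network access)."""
    url = build_wdi_url(country, indicator, start, end)
    with urlopen(url, timeout=30) as resp:
        return parse_wdi_payload(json.load(resp))
```

A call such as `fetch_wdi("CHN", "NY.GDP.MKTP.CD", 2015, 2020)` would then return one row per year with a reported value. Keeping URL building and parsing separate from the download makes the parsing step easy to check against a saved response.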
5. Mirrors and domestic alternatives
- Direct access to some international platforms can be unstable from within China. When that happens, try the platform's domestic mirror site first, or switch to a comparable domestic public platform or data repository, so the data you need comes from a source that is reliably accessible.
| Source | Good for | Free? |
|---|---|---|
| NBS National Data | China's macro and socioeconomic indicators | Free |
| ScienceDB | Multidisciplinary archived research data (with DOIs) | Free |
| CnOpenData / EPS | Firm, industry and macroeconomic data | Campus subscription (account required) |
| Zenodo / figshare / Dataverse | Citable research data and paper supplements | Free |
| World Bank / Penn World Table | Cross-country panels and development indicators | Free |
The three most common pitfalls when looking for data
- Unclear licenses make you hesitate to use the data: datasets on the same platform may carry different licenses. Some allow commercial use with attribution; others are restricted to non-commercial academic research. Confirm the license type before downloading, and cite the source in your thesis as required.
- Field definitions don't line up: for the same indicator from different sources, the units, statistical scope and reporting years may differ, so combining them directly can produce wrong results. Reconcile the field definitions first, then decide how to merge.
- What you download is in another language: many international platforms provide dataset documentation and field names in English, which can be hard going. Read the data dictionary first to confirm what each field means, then start processing.
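The field-definition pitfall above can be made concrete with a small sketch: convert one source to the other's unit, then join on shared keys and list whatever fails to match. Everything here is illustrative. The field names (`region`, `year`, `value`), the helper names and the conversion factor are assumptions, and the example supposes one source reports in units of 100 million yuan and the other in units of 10,000 yuan.

```python
def to_common_unit(rows, factor):
    """Return copies of the rows with 'value' multiplied by a conversion factor."""
    return [{**r, "value": r["value"] * factor} for r in rows]


def merge_on_keys(left, right, keys=("region", "year")):
    """Inner-join two lists of dicts on the key fields.

    Returns (merged_rows, unmatched_left_keys) so gaps in coverage,
    such as differing reporting years, are visible instead of silently dropped.
    """
    index = {tuple(r[k] for k in keys): r for r in right}
    merged, unmatched = [], []
    for row in left:
        key = tuple(row[k] for k in keys)
        if key in index:
            merged.append({
                **dict(zip(keys, key)),
                "left_value": row["value"],
                "right_value": index[key]["value"],
            })
        else:
            unmatched.append(key)
    return merged, unmatched


# Hypothetical inputs: source A in 100 million yuan, source B in 10,000 yuan.
source_a = [{"region": "X", "year": 2020, "value": 1.5},
            {"region": "X", "year": 2021, "value": 2.0}]
source_b = [{"region": "X", "year": 2020, "value": 15000.0}]

# 100 million yuan = 10,000 x 10,000 yuan, so scale A by 10,000 before joining.
merged, unmatched = merge_on_keys(to_common_unit(source_a, 10_000), source_b)
```

Here `merged` holds the 2020 row with both values on the same scale, and `unmatched` lists `("X", 2021)`, the year that only source A reports. Returning the unmatched keys is a deliberate choice: it forces a decision about missing years rather than hiding them.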
If you really can't find it, or want to save the work of organizing
If you have worked through the sources above and still cannot find data that fits your thesis, or the data is scattered across several sources and too time-consuming to organize, set out your research question and the conditions it must meet, and hand them to us. We start with a free data availability assessment, run a real search across authoritative data platforms, and judge matches and gaps item by item against the requirements you list.
For a match, we provide a structured dataset reference card with a source link. Where no perfectly matching dataset is found, we still present the search directions, approximate sources and item-by-item assessment honestly for your reference, rather than just handing over a pile of links.
