
Prepared datasets

Datasets we collect, clean and prepare in-house, each with field definitions, source notes and a QC record. Samples and delivery are available on request. These are our own prepared products, ready to deliver. If you need a search of public sources instead, we deliver source links and prepared results rather than downloading third-party raw files for you. The two services differ in scope; see the Curated datasets page for details.

First batch: 5 core datasets. See a sample first, then receive delivery on request.

Each dataset has one spec card with field definitions, source notes and a QC record; exact scope, definitions and licensing are confirmed together with you.
Request a sample and quote
About these datasets

All of the datasets below are products we prepared from public sources we collected and processed. The sizes, fields and time coverage shown on each card come from our real, prepared delivery files; where a dataset has several delivery shapes, they are described together on one card. For figures not yet published on this page, contact us for a sample to check against. Exact scope, definitions and licensing are confirmed together with you.


First-batch prepared datasets · one card each

Each dataset has one spec card: start with what the dataset offers and its representative fields, then look at the field details and a sample. When a dataset comes in several delivery shapes, they all sit on the same card so you can pick the one that suits your purpose.

Public safety & cases · Prepared dataset

Structured metro accident case dataset

Scattered metro accident and incident cases organized into a structured table with consistent fields, including a searchable main case table and an analysis table aggregated by dimension. Ready to load for accident retrieval, risk attribution and pattern analysis.

Shape: Structured tables (main table + analysis table)
Size: 151 accident cases
Fields: Main table 35 columns / analysis table 7 columns
Time: Accident years 2016–2025
Source: Collected from public sources + field schema by agreement
Delivery: Tables + field dictionary + source notes

Includes 2 delivery tables: a main case table (151 × 35, for case search and attribution) and an analysis summary table (88 × 7, aggregated across 5 dimensions: year, type, city, construction method and grade).


Representative fields

| Field | Definition / meaning | Example |
|---|---|---|
| Accident type | Accident / incident classification | Collapse |
| Accident grade | Severity grade of the accident | Major accident |
| Fatalities / injuries | Casualties in the case | 3 / 5 |
| Direct economic loss | Direct loss amount | About 12 million RMB |
| Construction method | Construction method involved | Shield tunneling |
| Liability determination | Conclusion on liability | Contractor primarily liable |

Examples are anonymized illustrations, not real records; the full 35-column field dictionary and anonymized samples are provided with a quote.

Use cases

Metro and rail-transit safety research, building an accident case library, risk attribution and case retrieval.
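As a rough illustration of how a per-dimension analysis table can relate to the main case table, the sketch below counts cases per value of one dimension. The column names (`year`, `accident_type`, `construction_method`), the output layout and the rows are all hypothetical stand-ins, not the delivered schema or real records.

```python
from collections import Counter

# Hypothetical rows standing in for the main case table; the column names
# and values are illustrative, not the delivered schema or real records.
cases = [
    {"year": 2019, "accident_type": "Collapse", "construction_method": "Shield tunneling"},
    {"year": 2019, "accident_type": "Collapse", "construction_method": "Open cut"},
    {"year": 2021, "accident_type": "Fire", "construction_method": "Shield tunneling"},
]

def summarize(rows, dimension):
    """Count cases per distinct value of one dimension column."""
    counts = Counter(row[dimension] for row in rows)
    # One summary row per (dimension, value) pair, sorted for stable output.
    return [
        {"dimension": dimension, "value": value, "case_count": n}
        for value, n in sorted(counts.items(), key=lambda kv: str(kv[0]))
    ]

# Stack the per-dimension summaries into one analysis-style table.
summary = summarize(cases, "year") + summarize(cases, "accident_type")
```

Stacking one block of rows per dimension is only one possible layout for such a summary; the actual analysis table columns are defined in the field dictionary.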

Source index & collection · Prepared dataset

Tree-heritage weather-disaster source link set

A list of web links, topic-checked and all verified accessible, with domain distribution and audit fields. Ready to use as a source index for text collection, event expansion and knowledge extraction.

Shape: Link index table (with audit enrichment)
Size: 1,000 web links
Fields: Minimal 2 columns / audit-enriched 16 columns
Time: See the sample for details
Source: Collected from public web pages + topic check
Delivery: Table + field dictionary + source notes

Two deliveries: a minimal link set (ID + link, 2 columns) and an audit-enriched set (16 columns, about 816 domestic / 184 overseas, all verified accessible).


Representative fields (audit-enriched version)

| Field | Definition / meaning | Example |
|---|---|---|
| Final redirect URL | The URL the link lands on | https://… |
| Domain | Domain of the source site | Government / media / university |
| Domestic/overseas flag | Source region | Domestic |
| Topic tier | Topic relevance tier | Tier-1 relevant |
| Relevance score | Topic relevance | 0.92 |
| HTTP status code | Accessibility | 200 |

Examples are anonymized illustrations, not real records; the full 16-column field dictionary and anonymized samples are provided with a quote.

Use cases

A source index for text-corpus collection, event-library expansion and knowledge extraction, as well as strict source filtering and secondary extraction.
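For the strict source filtering mentioned above, a minimal sketch might look like the following. It assumes hypothetical field names (`final_url`, `region`, `relevance`, `http_status`) and a relevance cutoff of 0.8 chosen purely for illustration; the real names and sensible thresholds come from the 16-column field dictionary.

```python
# Hypothetical audit-enriched rows; field names and values are illustrative.
links = [
    {"id": 1, "final_url": "https://example.org/a", "region": "domestic", "relevance": 0.92, "http_status": 200},
    {"id": 2, "final_url": "https://example.com/b", "region": "overseas", "relevance": 0.41, "http_status": 200},
    {"id": 3, "final_url": "https://example.net/c", "region": "overseas", "relevance": 0.88, "http_status": 200},
]

def strict_filter(rows, min_relevance=0.8, region=None):
    """Keep links that returned HTTP 200, meet the relevance floor and,
    if a region is given, match that region."""
    kept = []
    for row in rows:
        if row["http_status"] != 200:
            continue  # only matters if links are re-checked after delivery
        if row["relevance"] < min_relevance:
            continue
        if region is not None and row["region"] != region:
            continue
        kept.append(row)
    return kept

kept = strict_filter(links)
# Reduce the kept rows to the minimal 2-column shape: ID + link.
minimal = [(row["id"], row["final_url"]) for row in kept]
```

Calling `strict_filter(links, region="domestic")` narrows the same rows to domestic sources only.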

Agriculture & yield prediction · Prepared dataset

Gansu crop yield · weather · disaster panel dataset

Yield, weather and disaster data for four crops in Gansu Province, aligned to a common basis and organized into an annual panel, delivered in long-table and wide-table shapes. Suited to time-series modeling, regression analysis and thesis appendices.

Shape: Annual panel (long table + wide table)
Size: Long table 184 rows / wide table 46 rows
Fields: Long table 16 columns / wide table 24 columns
Time: 1978–2023
Source: Prepared from multiple sources + common-basis alignment
Delivery: Tables + field dictionary + source notes

Two shapes: a long table (184×16, 4 crops × 46 years, modeling-friendly) and a wide table (46×24, one row per year, convenient for regression and thesis appendices). Covers winter wheat, spring wheat, maize and rapeseed.


Representative fields

| Field | Definition / meaning | Example |
|---|---|---|
| Crop | One of the four crops | Winter wheat |
| Sown area | Area sown that year | 825,000 mu |
| Total output / yield per unit | Output that year and output per unit area | 3.1 million t / 376 kg per mu |
| Annual mean temperature | Mean temperature that year | 9.8 ℃ |
| Annual cumulative precipitation | Precipitation that year | 320 mm |
| Total affected area | Disaster-affected area that year (by disaster type) | 450,000 mu |

Examples are anonymized illustrations, not real records; the full field dictionary (including drought / wind-hail / flood / frost by disaster type) and anonymized samples are provided with a quote.

Use cases

Yield time-series modeling, weather–yield relationship research, disaster-impact analysis, regression modeling and thesis appendices.
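The long and wide shapes on this card hold the same observations arranged differently: the long table has one row per crop per year (4 crops × 46 years = 184 rows), the wide table one row per year (46 rows). A minimal sketch of that pivot, using hypothetical column names and made-up numbers rather than the delivered schema:

```python
# Hypothetical long-table rows (one row per crop per year); column names
# and numbers are illustrative, not the delivered schema or real records.
long_rows = [
    {"year": 1978, "crop": "winter_wheat", "yield_kg_per_mu": 150.0},
    {"year": 1978, "crop": "maize", "yield_kg_per_mu": 210.0},
    {"year": 1979, "crop": "winter_wheat", "yield_kg_per_mu": 155.0},
    {"year": 1979, "crop": "maize", "yield_kg_per_mu": 220.0},
]

def to_wide(rows, value_col):
    """Pivot long rows into one dict per year with one column per crop."""
    by_year = {}
    for row in rows:
        wide = by_year.setdefault(row["year"], {"year": row["year"]})
        wide[f'{row["crop"]}_{value_col}'] = row[value_col]
    # Return the years in ascending order.
    return [by_year[year] for year in sorted(by_year)]

wide_rows = to_wide(long_rows, "yield_kg_per_mu")
```

The long shape tends to suit panel or mixed-effects models that treat crop as a factor, while the wide shape drops straight into a per-year regression or an appendix table.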

Price & traceability · Prepared dataset

Vegetable supply-chain price and traceability dataset

Whole-chain price, cost and traceability information from origin to retail, organized by vegetable category, data stage and date. Ready to use for price prediction, supply-and-demand analysis and anomaly monitoring.

Shape: Structured tables (multiple shapes)
Size: Up to 10,000 rows × 52 columns
Fields: Price / cost / traceability / scores
Time: 2022–2023
Source: Real underlying tables + standardized enrichment
Delivery: Tables + field dictionary + source notes

Covers 8 data stages × 8 vegetable categories; provided from a common source in 4 delivery shapes: a filtered base table (2000×25), a full enriched table (10000×52), a main sample table (2000×52) and a modeling feature table (2000×45, one-hot encoded).


Representative fields

| Field | Definition / meaning | Example |
|---|---|---|
| Vegetable category | One of 8 categories | Cucumber |
| Data stage | One of 8 stages (growing, pesticide, logistics, sales, etc.) | Logistics & transport |
| Unit price | Unit selling price at that stage | 4.20 RMB |
| Origin wholesale price | Origin wholesale price anchor | 2.15 RMB |
| Market demand index | Demand-strength score | 98.7 |
| Data quality score | Composite multi-dimension quality score | 4.6 / 5 |

Examples are anonymized illustrations, not real records; the full 52-column field dictionary and anonymized samples are provided with a quote.

Use cases

Vegetable price prediction and trend research, supply-and-demand and traceability analysis, anomaly monitoring, and training price-type models.
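The modeling feature table on this card is described as one-hot encoded. For readers unfamiliar with the term, the sketch below shows what one-hot encoding does to a categorical column. The column names, the `veg` prefix and the values are hypothetical; this is not the procedure used to build the delivered table.

```python
# Hypothetical rows; column names and values are illustrative only.
rows = [
    {"vegetable": "cucumber", "unit_price": 4.20},
    {"vegetable": "tomato", "unit_price": 3.10},
]

def one_hot(rows, column, prefix):
    """Replace one categorical column with a 0/1 indicator column per category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Copy every other column unchanged.
        out = {key: value for key, value in row.items() if key != column}
        for category in categories:
            out[f"{prefix}_{category}"] = 1 if row[column] == category else 0
        encoded.append(out)
    return encoded

features = one_hot(rows, "vegetable", "veg")
```

Each category becomes its own indicator column, which is why an encoded table can have a different column count from the table it was derived from.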

Weather & cities · Prepared dataset

20-city 2022 daily weather dataset

Daily weather data for 20 major Chinese cities across all of 2022, including temperature, humidity, wind speed and wind direction, with an accompanying city-level annual mean table. Ready to use for city climate comparison and time-series analysis.

Shape: Time-series tables (daily values + annual means)
Size: Daily values 7,300 records / annual means 20 rows
Fields: Daily values 10 columns / annual means 5 columns
Time: All of 2022
Source: Prepared from Open-Meteo (ECMWF ERA5)
Delivery: Tables + field dictionary + source notes

Two deliveries: a daily table (7300×10, 20 cities × 365 days) and an annual mean table (20×5, one row per city). Covers Beijing, Shanghai, Guangzhou, Chongqing, Urumqi and 15 other cities.


Representative fields

| Field | Definition / meaning | Example |
|---|---|---|
| City | City name (Chinese / English) | 北京 / Beijing |
| Date | Observation date | 2022-07-12 |
| Daily mean temperature | Daily mean temperature, ℃ | 28.6 |
| Relative humidity | Daily mean humidity, % | 64 |
| Wind speed | Daily mean wind speed, km/h | 12.3 |
| Prevailing wind direction | Main wind direction that day, ° | 135 |

Examples are anonymized illustrations, not real records; the full field dictionary and anonymized samples are provided with a quote.

Use cases

Cross-city climate comparison, weather time-series analysis, linkage research with external data such as yield or energy, and thesis and report appendices.
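The two tables on this card are related by a simple group-and-average step: 20 cities × 365 days gives the 7,300 daily records, and averaging per city gives the 20 annual rows. A minimal sketch of that step, with hypothetical column names and made-up values (not the delivered schema, and not necessarily how the annual table was produced):

```python
from collections import defaultdict
from statistics import mean

# Hypothetical daily rows; column names and values are illustrative only.
daily = [
    {"city": "Beijing", "date": "2022-07-12", "temp_mean_c": 28.0},
    {"city": "Beijing", "date": "2022-07-13", "temp_mean_c": 30.0},
    {"city": "Shanghai", "date": "2022-07-12", "temp_mean_c": 31.0},
]

def annual_means(rows, value_col):
    """Average one numeric column per city across all of its daily rows."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["city"]].append(row[value_col])
    return [
        {"city": city, f"{value_col}_annual_mean": mean(values)}
        for city, values in sorted(grouped.items())
    ]

result = annual_means(daily, "temp_mean_c")
```

A plain arithmetic mean suits temperature, humidity and wind speed. Wind direction is an angle, so averaging it this way would be misleading; it needs a circular mean or a most-frequent-sector summary instead.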


The dataset you need isn't in this batch?

Tell us the data shape, fields and definitions you need, and we will collect, clean and prepare the dataset on request, with samples and source notes.

Talk to us