Research data in Asia-related research

Research data is an important topic in academia. We – the Specialised Information Service Asia (Fachinformationsdienst [FID] Asien) – would like to support you with information about how to handle research data in the social sciences and humanities from and about Asia.

On this page, you will find some introductory information on research data as well as links with further information.

Please feel free to contact us with any questions:
x-asia(at)sbb.spk-berlin.de

What is research data in Asia-related studies?

Research data refers to all data used in a research project to address its research question. When we talk about research data here, we primarily mean digitally available data.

Due to the diversity of research fields and methods in Asia-related humanities and social sciences, we deal with very diverse research data, including text data, bibliographic data, geospatial data, audio, image and video data, numerical or statistical data, but also digital 3D models and program code, e.g. for relational databases, digital tools and analysis scripts. We generate data in the course of the research process: when studying sources, evaluating and annotating texts, working with objects from collections, and through digitisation, recordings and observations, experiments and simulations, and qualitative and quantitative surveys. As researchers, we create data ourselves, or we create new data when we further develop or evaluate existing data (our own or re-used).

The German Council for Scientific Information Infrastructures (RfII) and the German Research Foundation’s (DFG) Guidelines on the Handling of Research Data provide relevant definitions of "research data".

Why is it important to consider research data?

Good scientific practice: Research data are an important part of the research process. Securing and preserving research data is therefore essential, among other things to make the research process transparent and to ensure that the research results can be reproduced (see the DFG Guidelines for Safeguarding Good Scientific Practice, in German).

Reuse of research data: Research data often has a lasting value beyond the research context in which it was created, and can form the basis for other research questions and projects. This is particularly true for unique data that cannot be reproduced. Therefore, it is important to ensure that the data remains accessible and to curate it, i.e. to enrich it with descriptive and contextualising information that preserves its scientific significance. It is often helpful to use existing standards (data formats, metadata formats, etc.) when creating and describing the data so that others can understand it. In addition, it is important to consider legal aspects (see below).

New research methods: The provision of digital data sets increasingly makes it possible to process such sets using a constantly growing number of digital tools. This opens up completely new ways of research in the field of digital humanities (with respect to research questions, methods, etc.).

What do I have to consider during my research with respect to research data?

We recommend addressing research data management in the early planning stage of the research project and, if necessary, creating a data management plan (DMP). The aim of research data management is to make the data accessible in the long term, independent of the data producer, and to ensure that it is structured, reusable and verifiable as far as possible. The FAIR Data Principles can help: data should be Findable, Accessible, Interoperable, and Reusable. Accordingly, the project design should already specify which data will be collected, how they will be collected, processed, evaluated and described, and which steps are necessary for the transformation, selection and storage of research data.

It is helpful to consider the different stages of the research data life cycle as early as the project planning stage in order to preserve the scientific validity of the data and thus the possibility of subsequent use:

Planning research
Creating and collecting data
Processing and analysing data
Publishing and sharing data
Preserving data and making it accessible
Reusing data
(see: UK Data Service)

Forschungsdaten.info, for example, provides extensive information on research data management. Its topics include planning and structuring research data (including the creation of a data management plan, DMP), organising and working with research data, preparing and publishing research data (including information on metadata standards), preserving and re-using research data, and legal and ethical issues that should be taken into account when dealing with research data. However, this website is mostly in German. You can find other guides in English, e.g. the handbook Managing and Sharing Data of the UK Data Archive.

DARIAH provides recommendations for data and metadata formats specific to humanities research data, including those concerning cross-disciplinary objects: "Specific recommendations for data and metadata" (in German). FAIRsharing.org provides a curated list of subject-specific data and metadata standards (here the link for humanities and social sciences). In general, when describing the data and choosing the (meta-)data formats, the goal should be to provide the data in such a way that it is readable, intelligible and processable by both humans and machines, today and, ideally, in the future as well.
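To illustrate what "readable by both humans and machines" can mean in practice, here is a minimal Python sketch of a descriptive metadata record stored as JSON. The field names, the list of required fields and the example values are our own illustrative assumptions and do not follow any particular metadata standard; in a real project, use the standard required by your discipline or repository.

```python
import json

# Illustrative field names only (an assumption, not a real standard);
# follow the metadata schema required by your repository instead.
REQUIRED_FIELDS = ("title", "creators", "publication_year", "language", "licence")


def missing_fields(record):
    """Return the required fields that are absent or empty in the record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


def to_json(record):
    """Serialise the record as JSON, keeping non-Latin scripts human-readable."""
    return json.dumps(record, ensure_ascii=False, indent=2, sort_keys=True)


# Hypothetical example record with a title in Latin and Chinese script.
example_record = {
    "title": "Example corpus of annotated texts (示例语料库)",
    "creators": ["Example, Researcher"],
    "publication_year": 2024,
    "language": "zh",
    "licence": "CC BY 4.0",
}

if __name__ == "__main__":
    print(missing_fields(example_record))
    print(to_json(example_record))
```

Setting ensure_ascii=False keeps the Chinese title legible in the file itself instead of turning it into escape sequences, while the JSON structure remains machine-processable.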

Are there also specific recommendations for Asia-related studies?

In addition to general information on research data management, there are a number of subject-specific recommendations and handouts on how to handle research data, e.g. by the German Research Foundation’s (DFG) Review Boards and various academic societies. For Asian Studies, see, for example, the recommendation of the Review Board 106 "Social and Cultural Anthropology, Non-European Cultures, Jewish Studies and Religious Studies" (in German). Depending on the subject and focus of the research question, other recommendations might be relevant as well: those for sociology (in German) and economics, those of the German Data Forum (RatSWD; in German), those for scientific editions in literary studies (in German), and, in linguistics, those for collecting language corpora (in German) and for legal aspects of handling such corpora (in German).

When applying for funding, do I have to address research data?

More and more research funding organisations expect applicants to explain, already in the project proposal, how they will handle the research data they will create, evaluate and develop in the project. The German Research Foundation (DFG) provides Guidelines on the Handling of Research Data and the European Commission provides Guidelines on FAIR Data Management in Horizon 2020. Forschungsdaten.info has compiled an overview of important requirements and information on guidelines of research funding organisations. Such requirements of research funding organisations often include a data management plan.

What is a Data Management Plan (DMP)?

A data management plan helps to plan and structure the handling of research data in a research project. It describes how the data that is or will be collected, processed, evaluated, and described should be handled during and after the project. See also the article on data management plans at forschungsdaten.info (in German) or the UK Data Archive handbook Managing and Sharing Data.

Many research funders expect a DMP along with the application for a project. You will find sample DMPs for meeting institutional and funder requirements on the websites of the Humboldt-Universität zu Berlin. In addition, tools for the creation of DMPs are available, such as DMPonline with a template for European Commission’s H2020 project proposals or RDMO, which is designed to meet the requirements of funding organisations in the German-speaking area, i.e. DMPs for DFG and BMBF project proposals.

Where can I find Asia-related research data?

Research data are either stored in the institutional repositories of the research institutions where the respective researchers are located, or in discipline- or topic-specific repositories. Please check the respective terms of use and access information.

Directories of repositories:

OpenDOAR – directory of repositories with various content types, including research data
Re3Data – Registry of Research Data Repositories, a registry to search for repositories that provide research data. It is possible to filter the results by discipline: humanities and social sciences
Repository Finder – a service provided by DataCite to find repositories in Re3Data that meet the criteria recommended by the Enabling FAIR Data Project

Metasearch engines for research data:

Base – Search engine especially for academic web resources
CiNii Research – Cross-search (discovery search) for academic output, including research data, provided by the National Institute of Informatics (NII), Research Center for Open Science and Data Platform (RCOS)
DataCite – Search in metadata of participating data centres
dataOn – Korean National Research Data Platform Service
IRDB Institutional Repository Database 学術機関リポジトリデータベース – NII National Institute of Informatics
Linghub – Search for language resources in the repositories of CLARIN, LRE-Map, META-SHARE and DataHub
RatSWD – Search for research data provided by centres accredited by the German Data Forum (RatSWD)

Interdisciplinary (research data) repositories:

GitHub – Software development projects
OpenAIRE – freely accessible research results, i.e. publications and data sets, from EU-funded projects (keyword: Open Science).
Peking University Open Research Data 北京大学开放研究数据平台
Zenodo – research results from all disciplines, i.e. publications, data sets, presentations, etc.

Subject-specific repositories and data collections:

Humanities data:

Aozora Bunko 青空文庫 – Japanese literary works
CBETA Research Platform CBETA 數位研究平台 – Chinese Buddhist Electronic Text Association, Dharma Drum Institute of Liberal Arts (Online Reader, Data Center)
Chinese Text Project 中國哲學書電子化計劃 – Pre-modern Chinese Texts
CLARIN-D – Virtual Language Observatory (VLO) for scientific language data in the CLARIN context and Federated Context Search (FCS) for searching accessible resources
CrossAsia N-gram Service – Data sets developed from texts stored in the Integrated Text Repository CrossAsia for unrestricted download and analysis
CrossAsia Fulltextsearch – Search in the textual resources hosted in the Integrated Text Repository CrossAsia
Humanities Research Data Repository 人文学研究データリポジトリ – Center of Open data in the Humanities, NII, in cooperation with National Institute for Japanese Language and Linguistics (NINJAL) (Datasets, Projects)
Kanseki Repository 漢籍リポジトリ – pre-modern Chinese Texts including Buddhist and Daoist Texts
NINJAL Academic Repository of the National Institute for Japanese Language and Linguistics 国立国語研究所学術情報リポジトリ (Databases and datasets)
NPM Open Data – National Palace Museum 國立故宮博物院
OPEN CONTEXT – archaeological research data (includes a few projects on Asia)
SAT Daizōkyō Text Database – corpus of Buddhist Texts in Chinese with Japanese (and English) translation

Social research data:

Barometer on China's Development 中国发展数据库 – Universities Service Centre for China Studies, CUHK
Beijing City Lab 北京城市实验室
Chinese Social Quality Data Archive 中国社会质量基础数据库 – Chinese Academy of Social Sciences
CNSDA Chinese National Survey Data Archive 中国学术调查数据资料库 – National Survey Research Center (NSRC), Renmin University of China, and National Natural Science Foundation of China
DataHub – contains especially social science and economics data sets
Fudan University Social Science Data Repository 复旦大学社会科学数据平台
ICPSR – social science data of the Inter-University Consortium for Political and Social Research
KOSSDA Korea Social Science Data Archive 한국사회과학자료원
PORI Hong Kong Public Opinion Research Institute 香港民意研究所
SowiDataNet | datorium – search for social science research data in GESIS, the Leibniz Institute for the Social Sciences (so far contains very few data about Asia)
SRDA Survey Research Data Archive 學術調查研究資料庫 – Center for Survey Research, Research Center for Humanities and Social Sciences, Academia Sinica
SSJDA Social Science Japan Data Archive SSJデータアーカイブ – Center for Social Research and Data Archives, Institute of Social Science, The University of Tokyo
... and in official statistics and governmental data repositories of the respective countries and regions

In addition, we plan to use a joint search to display research data stored in our repository as well as data relevant to Asian Studies stored in repositories of other institutions.

What is necessary to consider when reusing data?

When planning and conceptualising your own research project, consider reusing existing (and published) data sources. This might be relevant, for example, for meta-analyses, for expanding your research focus or for optimising your study design. Research data often include material that was not analysed in the original research. Particularly relevant is data produced in projects that are similar in content and/or methodology, or data from earlier research on which your project follows up. Finally, reusing data is economical and saves resources.

When reusing data that originated in other research contexts, it is important to consider the following aspects:

Data quality and comprehensiveness: In what context was the data produced? Is information provided on the methodology used, on quality control and on data comprehensiveness? Has the data already been published as part of the original research, with a peer-review process?
Data source and reliability: Where is the data published? Which institution is behind the repository? Does the repository provide information on its policy for archiving and long-term availability? Does it provide persistent identifiers (DOI, etc.)?
Description and metadata: What data formats are used? Is the dataset adequately described with standardised metadata formats that are widely accepted in the respective discipline? Are descriptions of project design and methodology provided, with information on how the data was produced and further processed? Are data types, variables, etc. defined in data dictionaries?
Accessibility and terms of use: Are accessibility and conditions for reuse defined, for example by using open licenses?
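As an illustration of how a data dictionary makes a dataset easier to check before reuse, here is a minimal Python sketch. The variable names, types and descriptions are hypothetical; a real data dictionary is supplied by the data producer and is usually far more detailed (value ranges, codes for missing values, units, etc.).

```python
# Hypothetical data dictionary: variable name -> expected type and description.
DATA_DICTIONARY = {
    "respondent_id": {"type": int, "description": "Anonymised respondent number"},
    "province": {"type": str, "description": "Province of residence"},
    "year": {"type": int, "description": "Survey year"},
}


def check_row(row, dictionary=DATA_DICTIONARY):
    """Return a list of human-readable problems found in one data row."""
    problems = []
    # Variables present in the data but not documented in the dictionary.
    for name in row:
        if name not in dictionary:
            problems.append(f"undocumented variable: {name}")
    # Documented variables that are missing or have an unexpected type.
    for name, spec in dictionary.items():
        if name not in row:
            problems.append(f"missing variable: {name}")
        elif not isinstance(row[name], spec["type"]):
            problems.append(
                f"wrong type for {name}: expected {spec['type'].__name__}"
            )
    return problems


if __name__ == "__main__":
    print(check_row({"respondent_id": 1, "province": "四川", "year": 2020}))
    print(check_row({"respondent_id": "1", "province": "四川", "extra": 0}))
```

A dataset whose rows pass such a check against its published data dictionary is easier to trust and to combine with other sources.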

If you are reusing data: Please remember to cite the source of the data you are using.
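A data citation typically names the creators, the year, the title, the repository and a persistent identifier. The following minimal Python sketch assembles such a citation. The function name, the order of the elements and the placeholder DOI are illustrative assumptions; the citation style preferred by the repository or by your publisher takes precedence.

```python
def format_data_citation(creators, year, title, publisher, identifier):
    """Build a simple data citation string from basic descriptive metadata."""
    authors = "; ".join(creators)
    return f"{authors} ({year}). {title}. {publisher}. {identifier}"


if __name__ == "__main__":
    # All values below are placeholders, not a real dataset.
    print(
        format_data_citation(
            creators=["Example, A.", "Sample, B."],
            year=2024,
            title="Example survey dataset",
            publisher="Example Repository",
            identifier="https://doi.org/10.1234/example",
        )
    )
```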

Where can I publish Asia-related research data?

We, the FID Asia, will be happy to assist you if you would like to publish your data. If you would like to publish data in the CrossAsia repository, please contact us.

When selecting a repository, bear in mind that your research data, and in particular the metadata and descriptive materials, may contain non-Latin scripts, possibly including characters that are not Unicode-compatible. Please make sure that these will be displayed correctly and that they are searchable in the repository.

CLARIN-D – research data services and centres for linguistic data and language corpora
CrossAsia Open Access Repository – Open Access Repository for research concerning Asia
DARIAH-DE – research data repository for the humanities and cultural sciences
Subject-specific repository, search e.g. via Re3Data
GitHub – development platform
Institutional, university repository of your institution
TextGrid – digital preservation archive for humanities research data (so far the repository does not contain any texts in Asian languages)
Zenodo – interdisciplinary repository for scientific data sets; access restrictions can be set
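Regarding non-Latin scripts in metadata, one check you can run yourself before depositing is whether your text is valid Unicode and whether visually identical spellings compare as equal. The following Python sketch uses the standard unicodedata module; the function names are our own, and whether a given repository applies such normalisation in its search is something you would need to test or ask about.

```python
import unicodedata


def normalise_for_search(text):
    """Normalise to NFC and case-fold so equivalent spellings compare equal."""
    return unicodedata.normalize("NFC", text).casefold()


def survives_utf8_round_trip(text):
    """Return True if the text can be encoded to UTF-8 and decoded unchanged."""
    try:
        return text.encode("utf-8").decode("utf-8") == text
    except UnicodeEncodeError:
        # Raised, for example, by lone surrogate code points from broken conversions.
        return False


if __name__ == "__main__":
    precomposed = "Tōkyō"            # macron vowels as single code points
    decomposed = "To\u0304kyo\u0304"  # plain vowels plus combining macrons
    print(precomposed == decomposed)
    print(normalise_for_search(precomposed) == normalise_for_search(decomposed))
    print(survives_utf8_round_trip("東京 도쿄 Tōkyō"))
```

The two spellings of "Tōkyō" look identical on screen but differ as raw strings, which is a common reason why a search for a romanised term fails to find a matching record.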

What do I have to consider when publishing research data?

In order to publish data (openly accessible), it is important to consider some legal questions, in particular if copyright-protected, sensitive and/or personal data is concerned.

In the case of personal data, the consent of the persons concerned is always required. In addition, please clarify whether the data can be anonymised.

If the copyright of others is affected, e.g. if larger parts of a database or of a title are involved, the authors should be contacted in advance.

Forschungsdaten.info has created a document to support the decision making process for publishing research data that deals with the most important legal aspects (in German).

Can I make my published research data searchable via CrossAsia?

As part of the "FID Asia" project, we plan to develop a central access and search system for Asia-related research data. We will keep you up to date in our blog.

We are looking forward to your support:

Tell us about other important sources and repositories for Asia-related research data.
Tell us where you published your data.
Tell us what services and support you would like to receive with respect to research data.
Ask us if you need support, e.g. in creating a data management plan, planning and describing research data, or archiving them.

You are welcome to contact us with these and any other questions at any time:
x-asia(at)sbb.spk-berlin.de

Important links at a glance

FAIR Data Principles
Decision-making support: Publishing research data
Forschungsdaten.info
Forschungsdaten.org

DFG Guidelines on the Handling of Research Data
DFG Guidelines for Safeguarding Good Scientific Practice
DFG Review Board 106: Recommendations on how to handle research data

Contact FID Asia

Staatsbibliothek zu Berlin - PK
East Asia Department
Tel.: +49 30 266-436001
E-Mail: x-asia(at)sbb.spk-berlin.de