
Research data in Asia-related research

Research data is an important topic in academia. We – the Specialised Information Service Asia (Fachinformationsdienst [FID] Asien) – would like to support you with information about how to handle research data in the social sciences and humanities from and about Asia.

On this page, you will find some introductory information on research data as well as links with further information.

Please feel free to contact us with any questions:
x-asia(at)sbb.spk-berlin.de


Research data refers to all data used in a research project to address its research question. On this page, "research data" primarily means data available in digital form.

Due to the diversity of research fields and methods in Asia-related humanities and social sciences, we deal with very diverse research data, including text data, bibliographic data, geospatial data, audio, image and video data, numerical or statistical data, but also digital 3D models and program code, e.g. for relational databases, digital tools and analysis scripts. Data are generated throughout the research process: when studying sources, evaluating and annotating texts, working with objects from collections, through digitisation, recordings and observations, experiments and simulations, qualitative and quantitative surveys, etc. As researchers, we create data ourselves, or we create new data by further developing or evaluating existing data (our own or reused).

The German Council for Scientific Information Infrastructures (RfII) and the German Research Foundation (DFG), in its Guidelines on the Handling of Research Data, provide relevant definitions of "research data".

Good scientific practice: Research data are an important part of the research process. Securing and preserving them is therefore essential, among other reasons, to make the research process transparent and to ensure that research results can be reproduced (see the DFG Guidelines for Safeguarding Good Scientific Practice, in German).

Reuse of research data: Research data often have lasting value beyond the research context in which they were created and can form the basis for other research questions and projects. This is particularly true for unique data that cannot be reproduced. It is therefore important to ensure access to the data and to curate them, i.e. to enrich the data with descriptive and contextualising information that preserves their scientific significance. Using existing standards (data formats, metadata formats, etc.) when creating and describing the data often helps to make them comprehensible. In addition, legal aspects must be taken into account (please see below).

New research methods: The provision of digital data sets increasingly makes it possible to process such sets using a constantly growing number of digital tools. This opens up completely new ways of research in the field of digital humanities (with respect to research questions, methods, etc.).

It is recommended to address research data management already in the early planning stage of a research project and, if necessary, to create a data management plan (DMP). The aim of research data management is to make the data accessible in the long term, independently of the data producer, and to ensure that they are structured, reusable and verifiable as far as possible. The FAIR Data Principles can help: data should be Findable, Accessible, Interoperable, and Reusable. Accordingly, the project design should already specify which data will be collected, how they will be collected, processed, evaluated and described, and which steps are necessary to transform, select and store them.

It is helpful to consider the different stages of the research data life cycle already during the project planning in order to preserve the scientific validity of the data and thus the possibility for subsequent use:

  • Planning research
  • Creating and collecting data
  • Processing and analysing data
  • Publishing and sharing data
  • Preserving data and making it accessible
  • Reusing data
    (see: UK Data Service)

Forschungsdaten.info, for example, provides extensive information on research data management, covering topics such as planning and structuring research data (including the creation of a data management plan, DMP), organising and working with research data, preparing and publishing research data (including information on metadata standards), preserving and reusing research data, and the legal and ethical issues to be taken into account when dealing with research data. However, this website is mostly in German. Guides in English are also available, e.g. the UK Data Archive handbook Managing and Sharing Data.

DARIAH provides recommendations for data and metadata formats specific to humanities research data, including those concerning cross-disciplinary objects: "Specific recommendations for data and metadata" (in German). FAIRsharing.org provides a curated list of subject-specific data and metadata standards (here the link for humanities and social sciences). In general, when describing data and choosing (meta)data formats, the goal should be to provide the data in a form that is readable, intelligible and processable by both humans and machines, today and, ideally, in the future as well.
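To illustrate what "readable by both humans and machines" can mean in practice, here is a minimal Python sketch of a metadata record serialised as JSON. The field names are hypothetical and only loosely modelled on the mandatory properties of the DataCite Metadata Schema (identifier, creator, title, publisher, publication year, resource type); the DOI and all other values are placeholders, not a real dataset. Serialising with `ensure_ascii=False` keeps titles in non-Latin scripts legible instead of turning them into `\u` escape sequences.

```python
import json

# Hypothetical, simplified metadata record. Field names are loosely modelled on
# the mandatory DataCite properties; all values (including the DOI) are placeholders.
record = {
    "identifier": {"value": "10.xxxx/example", "type": "DOI"},
    "creators": [{"name": "Doe, Jane"}],
    "titles": [
        {"title": "Interview transcripts, Shanghai 2019", "lang": "en"},
        {"title": "上海访谈记录 2019", "lang": "zh"},
    ],
    "publisher": "Example Repository",
    "publicationYear": 2024,
    "resourceType": "Dataset",
}

REQUIRED = ("identifier", "creators", "titles",
            "publisher", "publicationYear", "resourceType")


def to_json(meta: dict) -> str:
    """Serialise metadata as indented JSON, keeping non-Latin scripts as-is."""
    return json.dumps(meta, ensure_ascii=False, indent=2)


def missing_fields(meta: dict, required=REQUIRED) -> list:
    """Return the names of required fields that are absent or empty."""
    return [field for field in required if not meta.get(field)]


if __name__ == "__main__":
    print(to_json(record))
    print("missing:", missing_fields(record))
```

A plain-text format like this can be read by a person and parsed by any programming language; a repository will, of course, expect its own metadata schema.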

In addition to general information on research data management, there are a number of subject-specific recommendations and handouts on handling research data, e.g. by the German Research Foundation’s (DFG) Review Boards and various academic societies. For Asian Studies, see, for example, the recommendation of Review Board 106 "Social and Cultural Anthropology, Non-European Cultures, Jewish Studies and Religious Studies" (in German). Depending on the subject and focus of the research question, other recommendations may be relevant as well: those for sociology (in German) and economics, those of the German Data Forum (RatSWD; in German), those for scientific editions in literary studies (in German), and, in linguistics, those for collecting language corpora (in German) and on the legal aspects of handling such corpora (in German).

More and more research funding organisations expect applicants to consider, already when applying for a project, how they will handle the research data they create, evaluate and develop in the project. The German Research Foundation (DFG) provides Guidelines on the Handling of Research Data, and the European Commission provides Guidelines on FAIR Data Management in Horizon 2020. Forschungsdaten.info has compiled an overview of important requirements and guidelines of research funding organisations. Such requirements often include a data management plan.

A data management plan helps to plan and structure the handling of research data in a research project. It describes how the data that are or will be collected, processed, evaluated and described should be handled during and after the project. See also the article on data management plans at forschungsdaten.info (in German) or the UK Data Archive handbook Managing and Sharing Data.

Many research funders expect a DMP to be submitted with the project application. Sample DMPs for meeting institutional and funder requirements are available on the website of the Humboldt-Universität zu Berlin. In addition, tools for creating DMPs are available, such as DMPonline, which offers a template for the European Commission’s H2020 project proposals, or RDMO, which is designed to meet the requirements of funding organisations in the German-speaking area, i.e. DMPs for DFG and BMBF project proposals.

Research data are either stored in the institutional repositories of the research institutions where the respective researchers are located, or in discipline- or topic-specific repositories. Please check the respective terms of use and access information.

Directories of repositories:

  • OpenDOAR – directory of repositories with various content types, including research data
  • Re3Data – Registry of Research Data Repositories, a registry to search for repositories that provide research data. The results can be filtered by discipline: humanities and social sciences
  • Repository Finder – a service provided by DataCite to find repositories in Re3Data that meet the criteria recommended by the Enabling FAIR Data Project

Metasearch engines for research data:

  • Base – Search engine especially for academic web resources
  • CiNii Research – cross-search (discovery search) for academic output, including research data, provided by the National Institute of Informatics (NII), Research Center for Open Science and Data Platform (RCOS)
  • DataCite – search in the metadata of participating data centres
  • dataOn – Korean National Research Data Platform Service
  • IRDB Institutional Repositories Database 学術機関リポジトリデータベース – NII National Institute of Informatics
  • Linghub – Search for language resources in the repositories of CLARIN, LRE-Map, META-SHARE and DataHub
  • RatSWD – Search for research data provided by centres accredited by the German Data Forum (RatSWD)

Interdisciplinary (research data) repositories:

  • GitHub – Software development projects
  • OpenAIRE – freely accessible research results, i.e. publications and data sets, from EU-funded projects (keyword: Open Science).
  • Peking University Open Research Data 北京大学开放研究数据平台
  • Zenodo – research results from all disciplines, i.e. publications, data sets, presentations, etc.

Subject-specific repositories and data collections:

Humanities data:

Social research data:

  • Barometer on China's Development 中国发展数据库 – Universities Service Centre for China Studies, CUHK
  • Beijing City Lab 北京城市实验室
  • Chinese Social Quality Data Archive 中国社会质量基础数据库 – Chinese Academy of Social Sciences
  • CNSDA Chinese National Survey Data Archive 中国学术调查数据资料库 – National Survey Research Center (NSRC), Renmin University of China, and National Natural Science Foundation of China
  • DataHub – contains especially social science and economics data sets
  • Fudan University Social Science Data Repository 复旦大学社会科学数据平台
  • ICPSR – social science data of the Inter-University Consortium for Political and Social Research
  • KOSSDA Korea Social Science Data Archive 한국사회과학자료원
  • PORI Hong Kong Public Opinion Research Institute 香港民意研究所
  • SowiDataNet | datorium – search for social science research data in GESIS, the Leibniz Institute for the Social Sciences (so far contains very few data about Asia)
  • SRDA Survey Research Data Archive 學術調查研究資料庫 – Center for Survey Research, Research Center for Humanities and Social Sciences, Academia Sinica
  • SSJDA Social Science Japan Data Archive SSJデータアーカイブ – Center for Social Research and Data Archives, Institute of Social Science, The University of Tokyo
  • ... and in official statistics and governmental data repositories of the respective countries and regions

In addition, we plan to use a joint search to display research data stored in our repository as well as data relevant to Asian Studies stored in repositories of other institutions.

When planning and conceptualising your own research project, consider reusing existing (and published) data sources. This might be relevant, e.g., for meta-analyses, for expanding your research focus or for optimising your study design. Research data often include material that was not analysed in the original research. Particularly relevant are data produced in projects that are similar in content and/or methodology, or data from earlier work when your project is follow-up research. Finally, reusing data is economical and saves resources.

When reusing data that originated in other research contexts, it is important to consider the following aspects:

  • Data quality and comprehensiveness: What is the context the data has been produced in? Is information provided on the methodology used, on quality control and data comprehensiveness? Has the data in the original research already been published, with a peer-review process?
  • Data source and reliability: Where is the data published? Which institution is behind the repository? Does the repository provide information on its policy for archiving and long-term availability? Does it assign persistent identifiers (DOI, etc.)?
  • Description and metadata: What data formats are used? Is the dataset adequately described with standardised metadata formats that are widely accepted in the respective discipline? Are descriptions of the project design and methodology provided, with information on how the data were produced and further processed? Are data types, variables, etc. defined in data dictionaries?
  • Accessibility and terms of use: Are the conditions for access and reuse clearly stated, for example through open licences?
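Part of the metadata check in the list above, namely whether variables are defined in a data dictionary, can be automated when such a dictionary is provided. The following Python sketch assumes a hypothetical, deliberately simple data dictionary (expected type, description and, for categorical variables, the permitted codes) and reports variables that are undocumented as well as values that do not match the documented type or codes. Real codebooks for survey data are usually much richer; this only illustrates the principle.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Variable:
    """One entry of a (hypothetical) data dictionary."""
    dtype: type
    description: str
    allowed: Optional[set] = None  # permitted codes, if the variable is categorical


# Hypothetical data dictionary for a small survey dataset.
DATA_DICTIONARY = {
    "resp_id": Variable(int, "Respondent identifier"),
    "age": Variable(int, "Age in completed years"),
    "region": Variable(str, "Region code", allowed={"N", "S", "E", "W"}),
}


def check_records(records: list, dictionary: dict) -> list:
    """Return readable problems found when comparing records to the dictionary."""
    problems = []
    for i, row in enumerate(records):
        for name, value in row.items():
            spec = dictionary.get(name)
            if spec is None:
                problems.append(f"row {i}: variable '{name}' is not documented")
            elif not isinstance(value, spec.dtype):
                problems.append(f"row {i}: '{name}' should be {spec.dtype.__name__}")
            elif spec.allowed is not None and value not in spec.allowed:
                problems.append(f"row {i}: '{name}' has undocumented code {value!r}")
    return problems


rows = [
    {"resp_id": 1, "age": 34, "region": "N"},
    {"resp_id": 2, "age": "forty", "region": "X", "income": 1200},
]

for problem in check_records(rows, DATA_DICTIONARY):
    print(problem)
```

The first row passes; the second is flagged for a wrongly typed age, an undocumented region code and an undocumented variable.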

If you are reusing data: Please remember to cite the source of the data you are using.
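A data citation typically combines the creators, the year of publication, the title, the publisher or repository, and a persistent identifier. The Python sketch below assembles such a string from a hypothetical record. The order and punctuation used here are only one common pattern and an assumption on our part; follow the citation style required by your publisher or the citation recommendation given by the repository. The DOI is a placeholder; `https://doi.org/` is the general DOI resolver.

```python
from dataclasses import dataclass


@dataclass
class DatasetRecord:
    """Minimal, hypothetical description of a dataset you are citing."""
    creators: list  # e.g. ["Doe, Jane", "Li, Wei"]
    year: int
    title: str
    publisher: str  # repository or data centre
    doi: str        # DOI without the resolver prefix


def cite(rec: DatasetRecord) -> str:
    """Assemble one common pattern: Creators (Year): Title. Publisher. DOI link."""
    authors = "; ".join(rec.creators)
    title = rec.title.rstrip(".")  # avoid a double full stop after the title
    return f"{authors} ({rec.year}): {title}. {rec.publisher}. https://doi.org/{rec.doi}"


example = DatasetRecord(
    creators=["Doe, Jane", "Li, Wei"],
    year=2021,
    title="Household survey, Yunnan province.",
    publisher="Example Data Repository",
    doi="10.xxxx/example",  # placeholder, not a real DOI
)
print(cite(example))
```

Citing the persistent identifier rather than a plain web address keeps the reference resolvable even if the repository changes its URLs.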

We, the FID Asia, will be happy to assist you if you would like to publish your data. If you would like to publish data in the CrossAsia repository, please contact us.

When selecting a repository, bear in mind that your research data, and in particular the metadata and descriptive materials, may contain non-Latin scripts, possibly including characters that are not (or not fully) supported by Unicode. Please make sure that these are displayed correctly and are searchable in the repository.

  • CLARIN-D – research data services and centres for linguistic data and language corpora
  • CrossAsia Open Access Repository – Open Access Repository for research concerning Asia
  • DARIAH-DE – research data repository for the humanities and cultural sciences
  • Subject-specific repositories, found e.g. via Re3Data
  • GitHub – development platform
  • Institutional, university repository of your institution
  • TextGrid – digital preservation archive for humanities research data (so far the repository does not contain any texts in Asian languages)
  • Zenodo – interdisciplinary repository for scientific data sets; access restrictions can be applied
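Regarding the display and searchability of non-Latin scripts mentioned above: one frequent reason why metadata in Asian scripts looks correct but cannot be found by a search is Unicode normalisation. The same text can be stored either as precomposed characters or as sequences of base and combining characters. The Python sketch below, using only the standard `unicodedata` module, shows that a naive substring search fails across the two forms for the Korean word 한국 and succeeds once both sides are normalised to NFC. It makes no claim about how any particular repository handles input, and it does not help with characters that are missing from Unicode altogether.

```python
import unicodedata


def nfc(text: str) -> str:
    """Return the canonically composed (NFC) form of a string."""
    return unicodedata.normalize("NFC", text)


def normalised_search(query: str, field: str) -> bool:
    """Substring search that normalises both query and field to NFC first."""
    return nfc(query) in nfc(field)


# "Korea" (한국) as two precomposed Hangul syllables ...
composed = "\uD55C\uAD6D"
# ... and the same word decomposed into six conjoining jamo (NFD).
decomposed = unicodedata.normalize("NFD", composed)

print(composed == decomposed)                   # False: different code points
print("\uAD6D" in decomposed)                   # False: naive search misses 국
print(normalised_search("\uAD6D", decomposed))  # True after NFC normalisation
```

When testing a repository, entering the same search term in both forms is a quick way to see whether it normalises text.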

Before publishing data openly, it is important to consider some legal questions, in particular if copyright-protected, sensitive and/or personal data are concerned.

In the case of personal data, the consent of the persons concerned must always be obtained. In addition, please clarify whether the data can be anonymised.

If the copyright of others is affected, e.g. if larger parts of a database or of a title are involved, the authors should be contacted in advance.

Forschungsdaten.info has created a document to support the decision-making process for publishing research data that deals with the most important legal aspects (in German).

As part of the "FID Asia" project, we plan to develop a central access and search system for Asia-related research data. We will keep you up to date in our blog.


We look forward to your support:

  • Tell us about other important sources and repositories for Asia-related research data.
  • Tell us where you published your data.
  • Tell us what services and support you would like to receive with respect to research data.
  • Ask us if you need support, e.g. in creating a data management plan, planning and describing research data, or archiving them.

You are welcome to contact us with these and any other questions at any time:
x-asia(at)sbb.spk-berlin.de