
Research data in Asia-related research

Research data is an important topic in academia. We – the Specialised Information Service Asia (Fachinformationsdienst [FID] Asien) – would like to support you with information about how to handle research data in the social sciences and humanities from and about Asia.

On this page, you will find some introductory information on research data as well as links with further information.

Please feel free to contact us with any questions:
x-asia(at)sbb.spk-berlin.de


Research data refers to all data used in a research project to address its research question. When we talk about research data here, we refer primarily to digitally available data.

Due to the diversity of research fields and methods in Asia-related humanities and social sciences, we deal with very diverse research data, including text data, bibliographic data, geospatial data, audio, image and video data, numerical or statistical data, but also digital 3D models and program code, e.g. for relational databases, digital tools and analysis scripts. We generate data in the course of the research process: when studying sources, evaluating and annotating texts, working with objects from collections, and through digitisation, recordings and observations, experiments and simulations, qualitative and quantitative surveys, etc. As researchers, we either create data ourselves or create new data when we further develop or evaluate existing data (our own or re-used).

The German Council for Scientific Information Infrastructures (RfII) and the German Research Foundation’s (DFG) Guidelines on the Handling of Research Data provide relevant definitions of "research data".

Good scientific practice: Research data are an important part of the research process. Securing and preserving research data is therefore essential, among other things to make the research process transparent and to ensure that research results can be reproduced (see the DFG Guidelines for Safeguarding Good Scientific Practice, in German).

Reuse of research data: Research data often has lasting value beyond the research context in which it was created and can form the basis for other research questions and projects. This is particularly true for unique data that cannot be reproduced. It is therefore important to keep the data accessible and to curate it, i.e. to enrich it with descriptive and contextualising information that preserves its scientific significance. It is often helpful to use existing standards (data formats, metadata formats, etc.) when creating and describing the data so that it remains comprehensible. In addition, legal aspects need to be considered (see below).

New research methods: The provision of digital data sets increasingly makes it possible to process such sets using a constantly growing number of digital tools. This opens up completely new ways of research in the field of digital humanities (with respect to research questions, methods, etc.).

We recommend addressing research data management in the early planning stage of the research project and, if necessary, creating a data management plan (DMP). The aim of research data management is to make the data accessible in the long term, independent of the data producer, and to ensure that it is structured, reusable and verifiable as far as possible. The FAIR Data Principles can help: data should be Findable, Accessible, Interoperable, and Reusable. Accordingly, the project design should already specify which data will be collected, processed, evaluated and described, how this will be done, and which steps are necessary for the transformation, selection and storage of research data.

It is helpful to consider the different stages of the research data life cycle as early as the project planning stage in order to preserve the scientific validity of the data and thus the possibility of subsequent use:

  • Planning research
  • Creating and collecting data
  • Processing and analysing data
  • Publishing and sharing data
  • Preserving data and making it accessible
  • Reusing data
    (see: UK Data Service)

Forschungsdaten.info, for example, provides extensive information on research data management. Its topics include planning and structuring research data (including the creation of a data management plan, DMP), organising and working with research data, preparing and publishing research data (including information on metadata standards), preserving and re-using research data, and the legal and ethical issues that should be taken into account when dealing with research data. However, this website is mostly in German. Guides in English are also available, e.g. the handbook Managing and Sharing Data of the UK Data Archive.

DARIAH provides recommendations for data and metadata formats specific to humanities research data, including those concerning cross-disciplinary objects: "Specific recommendations for data and metadata" (in German). FAIRsharing.org provides a curated list of subject-specific data and metadata standards (here the link for humanities and social sciences). In general, when describing the data and choosing the (meta)data formats, the goal should be to provide the data in such a way that it is readable, intelligible and processable by both humans and machines, today and, ideally, in the future as well.
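As an illustration of a description that is readable by both humans and machines, the following minimal Python sketch builds a small metadata record and serialises it as JSON. The field names loosely follow Dublin Core element names; this is our choice for illustration only and not a recommendation taken from the sources above. The function names and all values are hypothetical.

```python
import json


def build_metadata_record(title, creator, date, language, data_format, rights, description):
    """Return a metadata record as a plain dict.

    The keys loosely follow Dublin Core element names (an assumption made
    for this sketch; use the standard recommended for your discipline).
    """
    return {
        "title": title,
        "creator": creator,
        "date": date,            # ISO 8601 date string, e.g. "2020-01-31"
        "language": language,    # language code of the data, e.g. "zh"
        "format": data_format,   # media type of the data files
        "rights": rights,        # licence or access conditions
        "description": description,
    }


def to_json(record):
    """Serialise the record as UTF-8 friendly, human-readable JSON."""
    # ensure_ascii=False keeps non-Latin scripts legible instead of
    # replacing them with \uXXXX escape sequences.
    return json.dumps(record, ensure_ascii=False, indent=2, sort_keys=True)


# Hypothetical example values, for illustration only.
record = build_metadata_record(
    title="示例数据集: Example data set",
    creator="Example Researcher",
    date="2020-01-31",
    language="zh",
    data_format="text/csv",
    rights="CC BY 4.0",
    description="Hypothetical annotated text corpus.",
)
serialized = to_json(record)
print(serialized)
```

Storing such a record next to the data files keeps the description legible for people while allowing repositories and scripts to parse it.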

In addition to general information on research data management, there are a number of subject-specific recommendations and handouts on how to handle research data, e.g. by the German Research Foundation’s (DFG) Review Boards and various academic societies. For Asian Studies, see, for example, the recommendation of Review Board 106 "Social and Cultural Anthropology, Non-European Cultures, Jewish Studies and Religious Studies" (in German). Depending on the subject and focus of the research question, other recommendations might be relevant as well: for sociology (in German) and economics, from the German Data Forum (RatSWD; in German), for scientific editions in literary studies (in German), and, in linguistics, for collecting language corpora (in German) and for legal aspects of handling such corpora (in German).

More and more research funding organisations expect that, when applying for a project, researchers (or research groups) have already considered how to deal with the research data they will create, evaluate and develop in the project. The German Research Foundation (DFG) provides Guidelines on the Handling of Research Data and the European Commission provides Guidelines on FAIR Data Management in Horizon 2020. Forschungsdaten.info has compiled an overview of important requirements and information on guidelines of research funding organisations. Such requirements of research funding organisations often include a data management plan.

A data management plan helps to plan and structure the handling of research data in a research project. It describes how the data that is or will be collected, processed, evaluated, and described should be handled during and after the project. See also the article on data management plans at forschungsdaten.info (in German) or the UK Data Archive handbook Managing and Sharing Data.

Many research funders expect a DMP along with the application for a project. You will find sample DMPs for meeting institutional and funder requirements on the website of the Humboldt-Universität zu Berlin. In addition, tools for the creation of DMPs are available, such as DMPonline, with a template for the European Commission’s H2020 project proposals, or RDMO, which is designed to meet the requirements of funding organisations in the German-speaking area, i.e. DMPs for DFG and BMBF project proposals.

Research data are either stored in the institutional repositories of the research institutions where the respective researchers are located, or in discipline- or topic-specific repositories.

Directories of repositories:

  • Re3Data – Registry of Research Data Repositories, a registry to search for repositories that provide research data. It is possible to filter the results by discipline, e.g. humanities and social sciences
  • Repository Finder – a service provided by DataCite to find repositories in Re3Data that meet the criteria recommended by the Enabling FAIR Data Project.
  • OpenDOAR – directory of repositories with various content types, including research data

Metasearch engines for research data:

  • BASE – Search engine especially for academic web resources
  • DataCite – Search across the participating data centres
  • Linghub – Search for language resources in the repositories of CLARIN, LRE-Map, META-SHARE and DataHub
  • RatSWD – Search for research data provided by centres accredited by the German Data Forum (RatSWD)

Interdisciplinary (research data) repositories and metasearch engines:

  • OpenAIRE – freely accessible research results, i.e. publications and data sets, from EU-funded projects (keyword: Open Science).
  • Zenodo – research results from all disciplines, i.e. publications, data sets, presentations, etc.
  • GitHub – Software development projects

Subject-specific repositories and data collections:

  • CBETA – Chinese Buddhist Electronic Text Association
  • Chinese Text Project – Pre-modern Chinese Texts
  • CLARIN-D – Virtual Language Observatory (VLO) for scientific language data in the CLARIN context and Federated Content Search (FCS) for searching accessible resources
  • CrossAsia N-gram Service – Data sets developed from texts stored in the Integrated Text Repository CrossAsia for unrestricted download and analysis
  • CrossAsia Fulltext Search – Search in the textual resources hosted in the Integrated Text Repository CrossAsia
  • DataHub – contains especially social science and economics data sets
  • ICPSR – social science data of the Inter-University Consortium for Political and Social Research
  • Kanseki Repository – pre-modern Chinese Texts including Buddhist and Daoist Texts
  • National Bureau of Statistics of China – statistical data of the People's Republic of China
  • OPEN CONTEXT – archaeological research data (includes a few projects on Asia)
  • SAT Daizōkyō Text Database – corpus of Buddhist Texts in Chinese with Japanese (and English) translation
  • SowiDataNet | datorium – search for social science research data in GESIS, the Leibniz Institute for the Social Sciences (so far contains very little data about Asia)

In addition, we plan to use a joint search to display research data stored in our repository as well as data relevant to Asian Studies stored in repositories of other institutions.

We, the FID Asia, will be happy to assist you if you would like to publish your data. If you would like to publish data in the CrossAsia repository, please contact us.

When selecting a repository, you should bear in mind that your research data, and in particular its metadata and descriptive materials, might use non-Latin scripts, possibly including scripts that are not Unicode-compatible. Please make sure that these will be displayed correctly and that they are searchable in the repository.

  • CLARIN-D – research data services and centres for linguistic data and language corpora
  • DARIAH-DE – research data repository for the humanities and cultural sciences
  • Subject-specific repository, search e.g. via Re3Data
  • GitHub – development platform
  • Institutional, university repository of your institution
  • TextGrid – digital preservation archive for humanities research data (so far the repository does not contain any texts in Asian languages)
  • Zenodo – interdisciplinary repository for scientific data sets; access restrictions can be applied
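The note above on non-Latin scripts can be checked locally before depositing data. The following Python sketch uses only the standard library module unicodedata. It shows one possible approach, not a procedure prescribed by any of the repositories listed: text is normalised (NFKC) so that variant encodings of the same characters match in a search, and Private Use Area characters, which often indicate glyphs without a standard Unicode code point, are flagged. The function names are hypothetical.

```python
import unicodedata


def normalize_for_search(text):
    """Normalise text so that variant encodings of the same string compare equal.

    NFKC folds compatibility variants (e.g. halfwidth katakana, fullwidth
    Latin letters) into their standard forms; casefold() removes case
    differences in scripts that have case.
    """
    return unicodedata.normalize("NFKC", text).casefold()


def matches(query, stored):
    """Return True if the normalised query occurs in the normalised stored text."""
    return normalize_for_search(query) in normalize_for_search(stored)


def has_private_use_chars(text):
    """Return True if the text contains Private Use Area characters.

    Such characters (Unicode category "Co") have no standard meaning and
    may not display or be searchable on other systems.
    """
    return any(unicodedata.category(ch) == "Co" for ch in text)


# Halfwidth katakana in the stored metadata, standard katakana in the query.
print(matches("データ", "研究ﾃﾞｰﾀ管理"))
print(has_private_use_chars("研究データ"))
```

If a repository does not normalise text in this way, a search for one encoding variant may miss records stored in another, so it is worth testing with a few sample records.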

In order to publish data (openly accessible), it is important to consider some legal questions, in particular if copyright-protected, sensitive and/or personal data is concerned.

In the case of personal data, the consent of the persons concerned is always required. In addition, please clarify whether the data can be anonymised.
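As a purely technical illustration, the following Python sketch replaces names in a data set with stable pseudonyms. Please note the assumptions: this is pseudonymisation, not anonymisation, because anyone holding the secret key can reproduce the mapping and other fields may still identify a person. Whether such a step is sufficient is a legal question that the code cannot answer. The function names, field names and sample records are hypothetical.

```python
import hashlib
import hmac


def pseudonymise(value, secret_key):
    """Map a value to a stable pseudonym using a keyed hash (HMAC-SHA256).

    The same value and key always give the same pseudonym, so records
    belonging to one person stay linked without exposing the name.
    """
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "P-" + digest[:12]


def pseudonymise_records(records, fields, secret_key):
    """Return copies of the records with the given fields pseudonymised."""
    return [
        {
            key: (pseudonymise(value, secret_key) if key in fields else value)
            for key, value in record.items()
        }
        for record in records
    ]


# Hypothetical interview metadata, for illustration only.
interviews = [
    {"name": "Example Person A", "year": "2019", "topic": "migration"},
    {"name": "Example Person B", "year": "2019", "topic": "education"},
    {"name": "Example Person A", "year": "2020", "topic": "education"},
]
secret = b"keep-this-key-outside-the-published-data"  # hypothetical key
published = pseudonymise_records(interviews, {"name"}, secret)
print(published)
```

The key must be stored separately from the published data, and free-text fields should be reviewed by hand because they may contain identifying details that this sketch does not touch.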

If the copyright of others is affected, e.g. if larger parts of a database or of a title are involved, the authors should be contacted in advance.

Forschungsdaten.info has created a document that supports the decision-making process for publishing research data and covers the most important legal aspects (in German).

As part of the "FID Asia" project, we plan to develop a central access and search system for Asia-related research data. We will keep you up to date in our blog.


We look forward to your support:

  • Tell us about other important sources and repositories for Asia-related research data.
  • Tell us where you published your data.
  • Tell us what services and support you would like to receive with respect to research data.
  • Ask us if you need support, e.g. in creating a data management plan, planning and describing research data, or archiving them.

You are welcome to contact us with these and any other questions at any time:
x-asia(at)sbb.spk-berlin.de  

Contact East, Southeast and Central Asia

Staatsbibliothek zu Berlin - PK
East Asia Department
Tel.: +49 30 266-436001
E-Mail: x-asia(at)sbb.spk-berlin.de

Contact South Asia

CATS Library / South Asia
Voßstrasse 2, Building 4110
D-69115 Heidelberg
Tel.: +49 (0)6221 54 15047
E-Mail: merkel(at)ub.uni-heidelberg.de