Research Outline
Today, the results of many social activities are accumulated in the form of text data. For instance, the results of diverse research activities in academic fields are published every day as academic papers and patents. However, there is currently no method for objectively and quantitatively understanding and utilizing these achievements or their ripple effects.
Therefore, we will carry out research on a method that automatically recognizes meta-knowledge structures such as causal relationships, reasons, and purposes that are expressed explicitly or implicitly in the text and automatically extracts structured knowledge from natural language texts based on these structures. For the time being, the method will target texts that explicitly include meta-knowledge structures, such as academic articles and patent documents; studies will be conducted to extend this method to more general text data in the future.
(Project Director: Yusuke Miyao [National Institute of Informatics])
Objective of the project
In developing a method for automatically extracting structured knowledge from natural language texts, we shall focus on two aspects of a meta-knowledge structure. The first is the linguistic aspect: we will use explicit language expressions (cue expressions) that indicate a relationship (for example, a reason). The second is the statistical aspect: we will use the fact that the statistical/probabilistic distributions of concepts (for example, a concept that is likely to become a reason) and of conceptual relations (for example, two concepts that are likely to form a causal relationship) are common across texts. Since these two properties are complementary, it is ultimately necessary to develop an automatic extraction method that integrates the two, and the following research topics will be specifically pursued:
- Formulation of a meta-knowledge structure
- Construction of an annotated corpus to serve as training and test data
- Development of automatic extraction methods based on linguistic cues
- Development of automatic extraction methods based on statistical characteristics
- Development of an automatic extraction method that integrates linguistic and statistical methods
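The two views described above can be illustrated with a minimal sketch. It is not the project's method: the cue expressions ("because", "in order to"), the function names extract_by_cue and pmi, and the sample sentences are all assumptions made for illustration, and pointwise mutual information is used here only as one simple stand-in for "statistical characteristics" of concept pairs.

```python
import math
import re
from collections import Counter

# Hypothetical cue expressions, chosen only for illustration.
CUES = {
    "reason": re.compile(
        r"^(?P<left>.+?)\s+because\s+(?P<right>.+?)\.?$", re.IGNORECASE
    ),
    "purpose": re.compile(
        r"^(?P<left>.+?)\s+in order to\s+(?P<right>.+?)\.?$", re.IGNORECASE
    ),
}


def extract_by_cue(sentence):
    """Linguistic view: return (relation, left, right) for each matching cue."""
    found = []
    for relation, pattern in CUES.items():
        match = pattern.match(sentence.strip())
        if match:
            found.append((relation, match.group("left"), match.group("right")))
    return found


def pmi(pair_counts, left_counts, right_counts, left, right):
    """Statistical view: pointwise mutual information of a concept pair."""
    total = sum(pair_counts.values())
    joint = pair_counts[(left, right)]
    if total == 0 or joint == 0:
        return float("-inf")  # the pair was never observed together
    p_joint = joint / total
    p_left = left_counts[left] / total
    p_right = right_counts[right] / total
    return math.log2(p_joint / (p_left * p_right))


# Toy corpus (invented sentences, not project data).
sentences = [
    "The bridge collapsed because the steel corroded.",
    "The bridge collapsed because the steel corroded.",
    "The pipe leaked because the seal cracked.",
    "We coated the steel in order to prevent corrosion.",
]

# Step 1: cue-based extraction over every sentence.
relations = [rel for s in sentences for rel in extract_by_cue(s)]

# Step 2: corpus-level statistics over the extracted "reason" pairs.
reason_pairs = Counter((l, r) for name, l, r in relations if name == "reason")
left_counts = Counter(l for l, _ in reason_pairs.elements())
right_counts = Counter(r for _, r in reason_pairs.elements())

for (left, right), count in reason_pairs.items():
    score = pmi(reason_pairs, left_counts, right_counts, left, right)
    print(f"{right!r} -> {left!r}: count={count}, PMI={score:.3f}")
```

In this sketch the cue patterns find relations that are stated explicitly, while the pair statistics describe which concepts tend to co-occur as reason and result across the corpus. An integrated method would need to combine both signals, for example to score relations that are only implicit; how to do so is left open here.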
Project promotion structure
During the first year, the research will focus on a survey of relevant studies, building a framework for the formal representation of the meta-knowledge structure, and the creation of an annotated corpus based on it. These theories and data will form the foundation for the development of the meta-knowledge structure recognition method and will be an indispensable resource for subsequent studies.
The co-researchers' areas of expertise span a wide range of fields, including linguistics, natural language processing, corpus annotation, and statistics. Each researcher is therefore responsible for their own field during the research work, and the project is advanced through information exchange and discussions at regular meetings.
Research View 027
Statistical thinking reshapes natural language processing
[Analysis of meta-knowledge structures] Daichi Mochihashi (Associate Professor, The Institute of Statistical Mathematics)
Research View 019
People's Actions Have a Pattern
[Analysis of meta-knowledge structures] Yusuke Miyao (Associate Professor, National Institute of Informatics)