People's Actions Have a Pattern

Much of the data that humankind produces every day accumulates in the form of “text.” In contrast to the machine-readable “programming languages,” the languages we humans use are called “natural languages.” At present, however, there is no way to objectively and quantitatively grasp the ripple effects or outcomes of what is written in natural-language texts such as newspaper articles and academic papers. The project “Meta-knowledge structure analysis” starts from the claim that “our linguistic activities are but one step in exploring the ‘meta-knowledge structure’ formed by logic and causal relationships.” Here we introduce this natural language processing project, which makes interdisciplinary use of analytical techniques. It builds on the “deep analysis” method of Yusuke Miyao, Associate Professor at the National Institute of Informatics, which performs computations by identifying the logical structures and relationships that link sequences of words and sentences.

People unintentionally read causal relationships into sentences

People use a variety of words to convey the same meaning. For instance, “heavy rain” and “downpour” mean essentially the same thing, although the expressions differ. While I was doing research on getting machines to recognize whether words like these, which differ on the surface but are similar in meaning, convey the same meaning or different ones, I realized that the words of a sentence taken by themselves sometimes do not fully convey what the speaker is actually trying to say. For example, when we say, “I shall go out and eat,” “eating” is the purpose of “going out.” Yet the words themselves never state this purpose: the sentence does not say “in order to eat.” Similarly, the expression “it having rained, the ground is wet” merely joins two statements, yet people involuntarily read a causal relationship into it: “it rained → the ground is wet.” Thus, although natural-language sentences encode and convey many kinds of relationships, such as purpose, reason, cause, result, and method or means, these relationships have not been well researched until now. The objective of this project is to work out how to organize and analyze them so that the meaning becomes explicit.
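One way to make such implicit relationships explicit is to write each reading as a logical formula in which the unstated relation appears as a predicate of its own. The sketch below assumes a simple event-based notation chosen purely for illustration, not necessarily the formalism used in the project; the predicate names (rain, wet, go_out, eat, cause, purpose) are hypothetical.

```latex
\exists e_1\,\exists e_2\,\bigl(\mathit{rain}(e_1) \land \mathit{wet}(\mathit{ground}, e_2) \land \mathit{cause}(e_1, e_2)\bigr)

\exists e_1\,\exists e_2\,\bigl(\mathit{go\_out}(\mathit{speaker}, e_1) \land \mathit{eat}(\mathit{speaker}, e_2) \land \mathit{purpose}(e_2, e_1)\bigr)
```

The words of each sentence supply only the first two conjuncts. The last conjunct (cause, or purpose read as “eating is the purpose of going out”) is exactly what a human reader adds without noticing.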

Collaboration between different standpoints

Our research begins by looking at actual data and observing the phenomena it reveals. In linguistics, the data analyzed are generally only those selected to fit a particular theme. In natural language processing, by contrast, the basic practice is to begin with the phenomenon that occurs with the highest identifiable frequency and gradually extend the scope. Ryu Iida (National Institute of Information and Communications Technology), a Senior Researcher in the project, has examined anywhere from a few thousand to 20,000 newspaper sentences, analyzing them to find correlations between the expressions they contain and the relationships those expressions are likely to indicate. Associate Professor Daisuke Bekki (Ochanomizu University) and his group have been studying individual sentences to determine what kinds of logical expressions can be written to represent specific relationships, such as causal relationships. On the surface, such expressions differ from language to language, but if the intended meaning of a relationship itself can be expressed, it may become possible to represent it independently of any particular language. We therefore intend to carry out our analysis first in Japanese and then apply it to English. Meanwhile, Associate Professor Daichi Mochihashi (Institute of Statistical Mathematics) has been investigating the possibility that, by completely eliminating arbitrary elements, essentially everything can be analyzed statistically. Although this view does not agree with the “deep analysis” approach, we intend to meet at least once a month to discuss a variety of issues. Incidentally, Associate Professor Bekki and Associate Professor Mochihashi hold diametrically opposed views on the methodology of natural language processing, so we might call this a “miraculous” project in which two such people work together.
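The corpus work described above is reported only at a high level. As a hedged sketch of what finding correlations between expressions and the relationships they likely indicate can look like, the code below counts how often each surface connective co-occurs with a hand-annotated relation label and turns the counts into conditional probabilities. The connectives, the labels, the `ANNOTATED` data, and the function name `relation_distribution` are all hypothetical; real annotations would come from sentences like the newspaper sentences mentioned above.

```python
from collections import Counter, defaultdict

# Hypothetical annotations: (connective in the sentence, relation an annotator assigned).
ANNOTATED = [
    ("because", "reason"),
    ("because", "reason"),
    ("because", "cause"),
    ("so", "result"),
    ("so", "result"),
    ("so", "cause"),
    ("and", "cause"),
    ("and", "purpose"),
    ("and", "none"),
    ("and", "none"),
    ("in order to", "purpose"),
]


def relation_distribution(pairs):
    """Estimate P(relation | connective) by relative frequency."""
    counts = defaultdict(Counter)
    for connective, relation in pairs:
        counts[connective][relation] += 1
    dist = {}
    for connective, rel_counts in counts.items():
        total = sum(rel_counts.values())
        dist[connective] = {rel: n / total for rel, n in rel_counts.items()}
    return dist


if __name__ == "__main__":
    dist = relation_distribution(ANNOTATED)
    for connective, rels in sorted(dist.items()):
        # List relations from most to least probable for each connective.
        ranked = sorted(rels.items(), key=lambda kv: -kv[1])
        print(connective, ranked)
```

In this toy data an explicit marker such as “in order to” points to a single relation, while a plain “and” is spread over several, which mirrors why implicit relationships are the hard case.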

Statistical thinking reshapes natural language processing

From tree structure to graph structure

Building on these analytical approaches, our group is working to use “deep analysis” to understand the content of information-science papers and extract the information we are looking for. In information science, papers often propose new uses or roles for things, for example a new technology by which something can be projected from the palm of one’s hand. Here, too, it would be effective to formalize how relationships such as motives, means, and effects are described in these papers. These relationships are best represented by a graph structure rather than by a tree structure whose branches are verb phrases, noun phrases, and so on, like the one used in the syntactic analysis of sentences. In other words, we realized that the problem was to posit yet another structure lying behind the sentence-structure data and to derive an appropriate graph structure from it. Once this kind of representation has been defined through analysis of the data, the accuracy of automatic analysis can be improved by applying various techniques repeatedly. We have now reached the stage of trying to use this in applications such as smart search for papers.
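To make the tree-versus-graph contrast concrete, here is a minimal sketch, under assumptions of our own, of how motive, means, and effect relations extracted from a paper might be stored as a labeled directed graph. Unlike a parse tree, in which every node has exactly one parent, a node here can be reached from several others. The node texts, the relation labels, and the `RelationGraph` class are hypothetical illustrations, not the project’s actual representation.

```python
from collections import defaultdict


class RelationGraph:
    """A labeled directed graph whose edges are (source, relation, target)."""

    def __init__(self):
        self.out_edges = defaultdict(list)  # node -> [(relation, target)]
        self.in_edges = defaultdict(list)   # node -> [(relation, source)]

    def add(self, source, relation, target):
        self.out_edges[source].append((relation, target))
        self.in_edges[target].append((relation, source))

    def parents(self, node):
        """All nodes with an edge into `node`; a tree allows at most one."""
        return [src for _, src in self.in_edges[node]]

    def find(self, relation):
        """All (source, target) pairs joined by the given relation label."""
        return [(src, tgt)
                for src, edges in self.out_edges.items()
                for rel, tgt in edges if rel == relation]


# Hypothetical content, loosely modeled on the palm-projection example above.
g = RelationGraph()
g.add("project images from the palm", "means", "display information on the body")
g.add("display information on the body", "effect", "operate devices without a screen")
g.add("users lack a screen while walking", "motive", "display information on the body")
```

The node “display information on the body” has two incoming edges (a means and a motive), which a tree cannot express. A paper search could then, hypothetically, answer “what is this technique a means to?” by following `means` edges with `g.find("means")`.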

Are there two ways of “logical thinking”?

People often say things like “this sentence is logical,” but no one yet knows exactly what it is in a sentence that makes it “logical.” Although we are not usually aware of it, this “logic” of natural language is quite different from the logic of mathematics. Rather than formal logic, it may be connections involving reasons, means, causes, results, and the like, the very relationships this project focuses on, that make a sentence seem “logical.” For instance, the statement “If the wind blows, the cooper’s coffer grows” contains a logical gap. That gap cannot be filled from the relationships between the words; it can be filled only with common real-world knowledge, such as the fact that “when the wind blows, dust is kicked up.” To what extent do human beings routinely fill such gaps in logic? This may be called an untapped area of study.
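The gap in “If the wind blows, the cooper’s coffer grows” can be pictured as a missing path in a network of commonsense cause-and-effect links. In the traditional explanation of this Japanese proverb, the chain runs: dust is kicked up, people go blind, blind people take up the shamisen, cats are killed for shamisen skins, mice multiply, mice gnaw wooden tubs, and coopers sell more tubs. The sketch below assumes these links are available as a hand-written knowledge base (`COMMONSENSE` and `explain` are hypothetical names, and the “laundry gets dirty” branch is an invented distractor) and uses breadth-first search to recover the chain a human reader fills in silently. It illustrates the idea of gap filling, not a method used by the project.

```python
from collections import deque

# Hypothetical commonsense knowledge: cause -> list of direct effects.
COMMONSENSE = {
    "wind blows": ["dust is kicked up"],
    "dust is kicked up": ["people go blind", "laundry gets dirty"],
    "people go blind": ["blind people take up the shamisen"],
    "blind people take up the shamisen": ["cats are killed for shamisen skins"],
    "cats are killed for shamisen skins": ["mice multiply"],
    "mice multiply": ["mice gnaw wooden tubs"],
    "mice gnaw wooden tubs": ["demand for tubs rises"],
    "demand for tubs rises": ["the cooper profits"],
}


def explain(cause, effect, kb):
    """Return the shortest chain of links from cause to effect, or None."""
    previous = {cause: None}  # node -> node it was first reached from
    queue = deque([cause])
    while queue:
        node = queue.popleft()
        if node == effect:
            chain = []
            while node is not None:  # walk back to the starting cause
                chain.append(node)
                node = previous[node]
            return chain[::-1]
        for nxt in kb.get(node, []):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None


if __name__ == "__main__":
    for step in explain("wind blows", "the cooper profits", COMMONSENSE):
        print(step)
```

The two ends of the proverb share no words at all; only the knowledge base connects them, which is the point made above about gaps bridged by world knowledge rather than by wording.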

The use of language, and logical thinking in particular, has long been considered the highest form of intelligence, one that only human beings possess and that is not found in other animals. But when we look closely at linguistic data, we seem to find both relationships that are merely convenient for human beings and relationships that are not. In other words, relationships of means, reason, cause, and so on might have been considered important only because they are necessary for human survival. Perhaps somewhere else in the universe there is a world where this kind of information is not needed, with a completely different system of information exchange. Seen another way, language processing in humans may be something acquired only by chance in the course of evolution. We are now in the process of trying to understand it.

(Text in Japanese: Yusuke Miyao, Rue Ikeya. Photographs: Mitsuru Mizutani. Published: March 11, 2015)