Future of Humanities in the Era of Data Science
Data science is undoubtedly a major trend in modern science, and it continues to drive discoveries of all kinds. The rise of data science has greatly influenced the humanities as well, giving rise to new movements such as “digital humanities,” which uses classic materials in new ways by converting them into digital data. In this installment of the Science Report, we take you through some of the joint humanities research projects conducted by the National Institute of Japanese Literature (NIJL) and the Research Organization of Information and Systems’ Joint Support-Center for Data Science Research (ROIS-DS) by talking with three experts: NIJL’s Director-General Robert Campbell; ROIS-DS Director Asao Fujiyama; and Asanobu Kitamoto, Director of the Center for Open Data in the Humanities (CODH) at ROIS.
Ask the Expert: Robert Campbell (Director-General, the National Institute of Japanese Literature)
Dr. Campbell is a scholar of Japanese literature specializing in modern and contemporary works. Originally from New York, Dr. Campbell studied in the Department of East Asian Languages and Civilizations, Graduate School of Fine Arts, Harvard University, where he earned his PhD. He moved to Japan in 1985 to study early modern literature at Kyushu University, where he later began teaching. He served as a professor at the University of Tokyo before being appointed to his current post in 2017. Known for his wide-ranging interests and knowledge in fields related to his research, including art, philosophy, and mass communication, Dr. Campbell is a frequent guest on Japanese TV shows.
Ask the Expert: Asanobu Kitamoto (Director, the Center for Open Data in the Humanities, or CODH; Professor, the Research Organization of Information and Systems, or ROIS)
Dr. Kitamoto has been at the forefront of digital humanities, a relatively new research field that he has helped foster. He has made particular efforts to promote research that uses artificial intelligence (AI) to analyze the images found in a wide range of Japanese classical books. Dr. Kitamoto earned his PhD in engineering from the University of Tokyo.
Ask the Expert: Asao Fujiyama (Director, the Joint Support-Center for Data Science Research, the Research Organization of Information and Systems; Specially Appointed Professor, the National Institute of Genetics)
Dr. Fujiyama is a molecular biologist and genome scientist who participated in decoding the human genome in 2003. He later successfully decoded the entire chimpanzee genome. Since being appointed to his current position in 2016, he has worked to promote cross-field data science research and to foster a younger generation of data scientists. Dr. Fujiyama earned his PhD from Nagoya University.
Moving toward humanities of the data science era
“Data science” is a 21st-century buzzword. But according to Dr. Asao Fujiyama, Director of the Joint Support-Center for Data Science Research, it would be a mistake to think of data science as a single field of study.
“Data science is an overarching term for all kinds of research activities that work with numbers, documents, images, and so on, particularly in large volumes, to make them useful for researchers or society at large,” Dr. Fujiyama said. “So you can think of ‘data science’ as a research approach based on the principles of statistics and computer science.”
Dr. Fujiyama comes from a biology background and took part in, and helped promote, the historic international Human Genome Project, which began around 1990.
“The human genome amounts to about 3 billion characters of data. The project was completed in 2004, after all of the data had been run through computers, analyzed, and made ready for everyone to use. With this achievement, biology became one of the ‘data-driven’ sciences,” Dr. Fujiyama said.
“Humanities research is changing as well. In the past, prominent scholars would bury themselves in old books written in kuzushiji (Japanese cursive characters) to develop their theories about what was what. In the era of data science, computers play a central role in converting information into something useful for researchers and other people,” he said. “At the Joint Support-Center for Data Science Research, the Research Organization of Information and Systems (ROIS-DS), we work to advance data science in a wide range of research fields, and ‘digital humanities’ is one of those fields.”
Four features of Japanese classical books that pose challenges in digitization
Dr. Robert Campbell, the Director-General of the National Institute of Japanese Literature, is a widely recognized scholar of modern and contemporary Japanese literature. He recently sat down with the Science Report to talk about the Institute’s collection.
“Our institute operates a library specializing in literature and history. People can view and make copies of the original materials we hold, and many researchers visit to take advantage of this,” Dr. Campbell said. “Most of the books and publications in our collection were produced in the Japanese archipelago in the Edo Period or earlier. You can see that the common physical format of books changed over the course of many centuries, from maki-mono (scrolls) to kansu-bon (scrolled books) to fukuro-toji (double-leaved books), and so on. We refer to all of these as ‘kotenseki’ (classical books), and they really are the crown jewels of the 1,000-year history of the classics.”
Japanese classic books have some unique characteristics, according to Dr. Campbell.
“First of all, there are a lot of pictures in these books. In other words, visual and textual elements are inseparable,” Dr. Campbell said. “Second, we see many different writing styles and notations, including texts written in Chinese and texts mixing Chinese characters with katakana. It’s also very interesting to see how people in history accumulated knowledge and information. When you look at these classical books, you learn that people before the Edo Period used to add annotations to keep information together with the relevant text. They made notes about the various ideas of their times in this way, and that’s how they built knowledge,” he said.
“In addition, Japanese classical books are unique in that the content of a book is correlated with its physical appearance, such as its size, the material it was made from, and its cover design. The bigger a book is, the more classical and valuable its content tends to be; the smaller a book is, the more contemporary and readily adopted its ideas,” Dr. Campbell said. “These classical books offer deep insights that could not be captured simply by digitizing the characters in them. So how do you convert these books into data without losing such valuable information? How do you obtain, collect, and organize all the information the books can provide? That’s the big challenge we face.”
Classical books become a subject of informatics research
Dr. Campbell considers ROIS-DS an “invaluable partner” in the NIJL’s effort to convert classical materials into reliable, versatile digital data that will benefit society for the long term. The two institutes have worked together on a range of successful research projects, from the digitization of materials to the recognition of kuzushiji characters in classical books.
Dr. Asanobu Kitamoto, the Director of the Center for Open Data in the Humanities (CODH), who oversees activities concerning digital humanities at ROIS-DS, comes from a background in informatics research.
“To tell you the truth, I wasn’t that familiar with classical books until I began collaborating with the NIJL. Like many Japanese people, I had always thought of classical books as something that belonged to a distant world,” Dr. Kitamoto said. “The National Institute of Japanese Literature made a massive number of books available to the public as digital data. As a result, classical books became a subject of informatics research, making it possible for us to analyze the data from an informatics perspective. At the same time, AI technologies were becoming more widely available. The combination of these two factors led to the successful recognition of kuzushiji characters using AI.”
According to Dr. Kitamoto, joint research between the NIJL and ROIS-DS takes three different approaches. In the first, ROIS-DS takes data produced by the NIJL, converts it into a format that is easier for informatics researchers to use, and then releases it to the public.
“For example, CODH organized the NIJL’s dataset into a format that is easier for AI researchers to use and released it as the ‘Kuzushiji Dataset,’” Dr. Kitamoto said.
The second approach brings Japanese literature and informatics researchers together to discuss new ideas for using data collaboratively. For instance, the “Edo Cooking Recipes Dataset,” which has drawn the interest of the general public, grew out of an idea pitched during an NIJL-sponsored ideathon.
The third approach is technology-driven: informatics researchers propose new technologies that could be useful for humanities research.
“These are often research methods that humanities scholars wouldn’t think of. One example is using computer vision technology to compare different editions of a book automatically. We are testing this idea on ‘Bukan,’ compilations of the names and other information about the lords and shogunate officials of the Edo Period,” Dr. Kitamoto said.
Mining classical books for the DNA of Japanese culture
Kitamoto: Our biggest goal right now is to make the entire contents of every classical book digitally searchable. With Japanese classical books, you have to read page by page to find out what’s in them, so I think making the whole content searchable has enormous value. Further down the road, we also hope to explore new methods of humanities research that dig deeper into Japanese culture. Dr. Campbell pointed out earlier that text and pictures together make up the contents of Japanese classical books. I suspect that modern Japanese anime and manga (comic books) carry on this unique feature of Japanese classical books. We want to find ways to use information technology to disseminate information effectively, so that more people worldwide come to know about this unique aspect of Japanese culture.
Campbell: The popular manga series “Kimetsu-no Yaiba” (Demon Slayer) may actually be based on a yomihon novel, or perhaps a “kibyoshi” (picture book), from the 18th century. I bet interesting findings like these will quickly become the talk of the town through social media posts by informatics researchers (laughter). And one more thing: we would love to see our diverse, rich cultural materials used more for non-research purposes as well. To promote this, we launched the “NIJL Arts Initiative,” based in the gallery inside the NIJL’s building. We invite various partners from outside the research world, such as artists and translators, to work with us, and we use this gallery space to showcase the collaborative work. For example, Hiromi Kawakami, an Akutagawa Prize-winning author, came here often to learn more about “Ise Monogatari” (The Tale of Ise). She then used that knowledge to write a novel based on it, “Sandome no Koi” (The Third Love), offering readers a whole new fictional world to enjoy. A special exhibition, “Toki no Taba wo Hiraku” (Undoing the Bundle of Time), featuring the achievements of individual partners, will open on February 15.
Fujiyama: The genome is the complete set of an organism’s genetic information. The research field of “genomics” (genome science) focuses on methods and concepts for studying all of that genetic information as a whole. Listening to you two just now, I can see the potential for “humanomics,” or a similarly comprehensive research field, to develop into an all-encompassing, genomics-style version of the humanities that deals with research materials, analytic technologies, the output of information to society, and more.
Kitamoto: Actually, I’ve been taking quite a few hints from the history of genome analysis myself. I believe what happened in genome science will also happen in the humanities. I think that will be the research method of the new digital humanities era. (Ref.: Asanobu Kitamoto, “Rekishiteki tenseki-no kensaku kinou-no koudo-ka, soshite Sukuriputomu kaiseki-ni mukete [Sophistication of Search for Japanese Pre-Modern Books and a Vision Toward Scriptome Analysis].”)
Campbell: Through the digitization of Japanese classical books, I’m becoming deeply aware that the behavior of the people who have lived in the Japanese archipelago, their worldviews, their views on life and death, and the choices they made in their daily lives emerge throughout history as patterns of various sorts. The way we’ve lived over the past year amid the global novel coronavirus pandemic is one example. The Japanese government never enforced a lockdown but instead called on industries and citizens in different regions to refrain from certain activities or to follow certain guidelines. Globally speaking, this is quite a unique approach. As a scholar specializing in Edo literature, I see many similarities between this approach to controlling the pandemic and the way the Edo shogunate governed the country, handing down guidelines that different industries and regions used to manage their own people. You could also cite satoyama as an example: areas where local people do the work needed to maintain the integrity of the natural environment. You will find examples like these in many different periods of history. Classical books have the Japanese mind, Japanese matters, and the genome of the language buried in them. I believe collaborating with informatics researchers will enable us to discover things that none of us could unearth working alone. Today’s discussion has been very encouraging to me.
Fujiyama: I believe that collecting documents is the first step in studying culture, and history confirms that. Once again, I was inspired to learn how the latest technologies are pushing research forward and how actively information is being shared with society. We all hope that our conversation will develop into a larger discussion about Japanese (human) culture, and it was meaningful that we were able to share this common sentiment.
Interviewer: Rue Ikeya
Released on: Mar. 10, 2022 (The Japanese version released on Jan. 12, 2021)
* This interview was conducted online.