Moving Toward a Better Society Using Knowledge Gained from Data

Professor Satoshi Yamashita, director of the Risk Analysis Research Center at the Institute of Statistical Mathematics in Tokyo, promotes research involving both "social communication" and "human and social data" at the Research Commons Project. He has contributed widely to building highly reliable financial data sets collected from a wide range of sources, and to their analysis. His accomplishments include: a risk analysis method for predicting irrecoverable debt that draws on an integrated database of securities and guarantees, where conventional methods depended solely on a company's financial data; a method for imputing missing values in large financial data sets; and a ranking system for sovereign default risk. Here, Professor Yamashita introduces the project's activities.

Utilizing social data to benefit society

Credit financial data, which pertain to the reliability of individuals and corporations, must be maintained under high security. As researchers, we may first need to win trust before we can obtain data from certain individuals or groups. I have been engaged in studies that require winning such trust and building healthy relationships. How can we, as researchers, contribute to society when handling data obtained from financial institutions? Big banks today are already working on high-level statistics, for example, so our contributions must go far beyond theirs. It is also possible to build new, trust-based relationships by providing the tools needed to meet consulting requests. It is important that the Institute of Statistical Mathematics become recognized as a place that can provide certain types of data. For example, in addition to data from financial institutions and other data vendors, we plan to further strengthen the systems that maintain governmental data such as business statistics from the Ministry of Internal Affairs and Communications.

(Figure: Secondary use of public statistical data)

Predicting profit over 30 years for leased real estate

Among the loan products that banks offer is the "apartment loan," intended for landlords who wish to build multi-family housing. This type of loan makes up only 10% of all bank loans, but unlike other loans, it has no established risk evaluation model. This is because it is difficult to determine which properties have the potential to turn a profit, and so no reliable customer data exist. Given these conditions, in our project at the Research Commons, we started by gradually building a database of leased real estate properties in the Kinki region, and we are now working on predicting their profit margins. This development is underway thanks to the collaboration of five entities: the Institute of Statistical Mathematics, the National Institute of Informatics, data vendors, real estate appraisal offices, and banks.

Web data, survey data, and analysis methods

We first use a method developed by Project Researcher Yu Ichifuji (National Institute of Informatics) to collect and store website information on the vacant and occupied rooms of leased real estate properties once every ten days. This method yields a large amount of data, but the information posted on websites does not always reflect the actual situation. We therefore obtain much higher-quality data by a second method once every three months: a real estate appraiser visits each property in a bank's apartment loan portfolio and makes expert judgments. For each of the two data sets, we then build a statistical model that predicts the probability that a currently vacant room will be occupied three months later. These three processes (web data collection, on-site appraisal, and statistical modeling) are what set our project apart from others. We plan to develop this method further to predict vacancies over 20 and 30 years.
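To make the modeling step concrete, the following is a minimal sketch of one way such a probability model could look. The article does not name the model family, so logistic regression is an assumption here, as are the two features (`rent_premium`, `age_decades`), the coefficients used to simulate data, and the label-noise rate standing in for inaccurate website listings. All data are synthetic; nothing below reproduces the project's actual method or results.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def simulate(rng, n, flip_rate):
    """Generate synthetic rooms (hypothetical features, made-up coefficients).

    rent_premium: asking rent relative to the local average, in [-1, 1]
    age_decades:  building age in decades, in [0, 3]
    flip_rate:    share of labels reported wrongly (mimics stale web listings)
    """
    X, y = [], []
    for _ in range(n):
        rent_premium = rng.uniform(-1.0, 1.0)
        age_decades = rng.uniform(0.0, 3.0)
        p = sigmoid(1.5 - 3.0 * rent_premium - 0.5 * age_decades)
        occupied = 1 if rng.random() < p else 0
        if rng.random() < flip_rate:
            occupied = 1 - occupied  # noisy observation
        X.append([rent_premium, age_decades])
        y.append(occupied)
    return X, y


def fit_logistic(X, y, lr=0.5, epochs=300):
    """Fit weights (intercept first) by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def predict_proba(w, x):
    """Probability that a vacant room with features x is occupied in three months."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


rng = random.Random(0)
# Large but noisy data set, standing in for the ten-day web collection.
X_web, y_web = simulate(rng, n=600, flip_rate=0.15)
# Small but clean data set, standing in for the quarterly appraiser survey.
X_survey, y_survey = simulate(rng, n=200, flip_rate=0.0)

# One model per data set, as described in the text.
w_web = fit_logistic(X_web, y_web)
w_survey = fit_logistic(X_survey, y_survey)

# Compare a cheaper and a more expensive room in a ten-year-old building.
p_cheap = predict_proba(w_survey, [-0.5, 1.0])
p_dear = predict_proba(w_survey, [0.5, 1.0])
print("web model weights:   ", [round(v, 2) for v in w_web])
print("survey model weights:", [round(v, 2) for v in w_survey])
print("P(occupied) cheaper vs. dearer room:", round(p_cheap, 2), round(p_dear, 2))
```

Fitting the two data sets separately keeps their different error characteristics visible: in this toy setup, noisy labels pull the web model's coefficients toward zero, while the smaller survey sample gives less biased but more variable estimates.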

(Figure: Method of obtaining data from websites)

Modeling that connects data providers and users

The form of research I most wanted to realize is one that integrates the needs of data providers and data users. Research of this type can surely have an impact and benefit society, and that is the strong incentive driving my work, even though it will take a long time to complete. We also incorporate feedback from the users of our models to improve later versions. In engineering, for example, the product has to work. In medicine, it has to cure diseases. In our field, however, even if we claim that our model is 10% more precise than the conventional one, it will not be used unless we can dispel people's suspicion that we simply used more convenient data and called the result "new." This is why, at the end of the day, people are the key to success.

"My master's thesis was on calculating the numbers shown on highway signs that announce 'this many minutes until this place.' I modeled the roads as water channels and used hydrological models, which produced highly accurate predictions that even responded to traffic accidents," says Professor Yamashita. Since then, he has made it a rule "not to conduct research that is not used in society."

(Text in Japanese: Satoshi Yamashita, Rue Ikeya. Photographs: Toshiaki Kitaoka. Published: August 10, 2015)