Human and Social Data / Building a cyber-physical integrated society database is also helpful in policy making

NEED FOR THE DATABASE

Thanks to advanced information and communication technology, we live in a society where information devices and sensors of every type are connected to networks, and information is digitized, distributed, and accessed anytime by anyone from anywhere. As a result, cyberspace and the physical world are interlinked or integrated, forming a "Cyber-Physical Integrated Society." In such an integrated society, activities of the physical world and human and social activities are expected to be reflected in cyberspace, and the power of information can help devise measures and create new value to address the environmental and energy, medical and health, and food issues faced by the human race. This project will research and develop a technological and social mechanism that senses human and social behaviors, analyzes the data collected, composes the information and services that control people and things, and provides timely feedback.

Solving human and social problems is difficult because risks and profits are estimated, and subjective judgments and decisions are made, based on incomplete information and data. To assist in making rational decisions and judgments, a foundation for human and social data will be built to allow the collection, maintenance, sharing, analysis, and composition of numerous and diverse "Big Data" collected from the rapidly increasing number of smartphones, social networking services (SNS), and various sensors.

This database will be promoted in organic coordination with the project "Social Communication" (Communication Informatics) (Project Director: Noboru Sonehara [National Institute of Informatics]).

R&D OVERVIEW

<On the Infrastructure for Human and Social Data-Driven Service Science>

(Principal Investigator: Noboru Sonehara [National Institute of Informatics])

1. Foundation for the application of the protection of human and social data privacy

As more advanced mobile devices are used and social network services, such as Twitter and Facebook, emerge, various data, including a vast amount of personal digital data (life logs), continue to accumulate on the Internet. Images and videos uploaded in large quantities from fixed and mobile cameras can also be categorized as types of life logs. The Internet is thus coming to resemble a store of multimedia Big Data. Because life logs are composed of personal information, policies are needed for using them effectively while keeping them protected.

For this purpose, a foundation for the protected use of personal information will be built along a time axis (extraordinary events such as disasters) and a space axis (special locations in the physical world such as train stations, commercial facilities, and theme parks). Accessing and using the personal and demographic information in accumulated life logs, which is necessary in the event of disasters or emergencies, is currently difficult, and this was one of the hindrances to prompt evacuation and rescue activities during the Great East Japan Earthquake. Thus, an information system in which personal information can be accessed and used via communication is required in the case of disasters and emergencies.

Based on the protected use of personal information along the time axis, an information system will be studied that can combine and centrally control administrative, private-sector, and personal life logs, and determine and process access to life logs in a self-regulating and geographically distributed manner. This system is intended to overcome the barrier posed by personal information protection legislation and to realize, as a specific service, a method for integrating personal information, such as who lives where in an affected area and whether a victim is a child or an adult, requires assistance, is bedridden, or understands Japanese. By utilizing such personal and demographic information, an appropriate aid or rescue plan can be devised promptly.

On the other hand, with regard to the basis for the protected use of personal information on the space axis, a harmonic information field will be created in a special location where characteristic information (such as hobbies, preferences, and behavioral and purchase tendencies) among other personal information can be disclosed spontaneously, and where such disclosure matches the benefit that users can expect to gain. Integration of social media and sensor data, data cleansing for privacy protection, time and space database construction and mining, and a method for applying and recommending information will be employed as the underlying technologies.

In realizing the basis for human and social data, balancing the protection and disclosure of personal information is a critical issue to be solved. The academic objective and significance of this research lie in examining how this balance can be incorporated into the information system. In other words, this is an attempt to realize the natural idea "I need to provide my information in order to receive good and beneficial service" in a scientific framework that combines informatics, statistics, and the social sciences. Separating the time and space axes to clarify their correlation and differences is a new approach. The interface of informatics, statistics, service sciences, and social sciences will be greatly expanded by confronting the socio-psychological topic of privacy.

For example, an attempt is being made to build a foundation where users can control the disclosure of their personal information, in view of the interrelationship of the time and space axes when the personal information of victims is to be actively disclosed to specific areas in the event of a disaster. This can mean the birth of a new information flow, called the protected use of personal information on the space axis. Through this attempt, an "ID Data Commons" will be built as a mechanism for users to decide how their own personal and attribute information, such as life logs, is collected, managed, analyzed, and used. Personally, privately, and administratively owned data will then be linked, with the aim of creating a strong social base against serious incidents and social crises.

2. A collection base for human and social communication data

Whereas many theories of the social sciences are effective in explaining social phenomena, they offer little in the way of intervention methodology showing how society can be changed. Informatics, on the other hand, takes an engineering approach to society but fails to explain the theoretical meaning, and so the verification of its effectiveness is weak.

This project will aim at closing the gap between the social sciences and informatics, and academically create a new social value using an informatics approach based on the theories of the social sciences. A collection base for human and social communication data will be developed using smartphones to measure and research social intervention, which is not possible through the conventional methodology of social sciences. In particular, results have been achieved in proposing a measurement methodology using the communication log data from smartphones, and in the empirical study of foundations for social capital generation based on the social network theory.

The creation of a new social value will be attempted, taking advantage of the characteristics of the collection base for the human and social communication data, thus offering many potential applications. In particular, the communication styles of smartphone users will be estimated from phone logs, and specific action-prompting messages customized to their style will be sent as stimuli to verify effectiveness. This approach integrates the behavioral theory of the social sciences and the methodology of informatics, thus potentially pioneering a new academic field.

3. Foundation for human and social data analysis service synthesis 

In order to make a better "cyber-physical integrated society," a society in which cyberspace and the physical world are linked and integrated, a data-driven service synthesis science needs to be created. It will project and analyze information of the physical world in cyberspace, and feed the results back to people or things in order to create new value.

Today, many policies and decisions that lack strong scientific grounding in aspects such as serviceability and economy are a social problem because they do not achieve the expected effects and leave significant problems in sustainable operation. To solve this issue, policies and decisions must be made based on objective data. Accessing and using the history of social participation, living, and communication activities (life logs) from mobile communication devices, such as smartphones, is effective for this. However, in order to collect and use life logs, the purpose of use must be notified, and the discloser's consent must be obtained. The increasing costs of consent processing, of information security for personal information management, and of compliance to avoid possible legal violations prevent the rise of new information services and rational decision-making.

In reality, this has made it difficult to collect and use the personal and demographic information necessary in the event of disasters, and it became one of the hindrances to prompt evacuation and rescue activities during the Great East Japan Earthquake. The law allows obtaining personal information without consent only when it is difficult to obtain such consent and the life, body, or property of the individual is at risk. At the site of emergencies, such as the case of the great earthquake, communication and/or administrative functions may be lost or interrupted, and whether personal information may be accessed and used may not be determined quickly.

To address these issues, "ID Data Commons (Identity Data Commons)," a mechanism that lets individuals self-determine how their life logs are collected, managed, shared, analyzed, and combined, will be established. In ID Data Commons, individuals can conditionally (1) choose the period of disclosure after the occurrence of a disaster or incident, (2) limit the accessing party and the purpose of use, and (3) grant permission for direct use for aid and rescue purposes, and for indirect use such as collecting statistics. Further, obtaining advance consent that the personal information registered in ordinary information services and held by authorities will be accessed in the event of a disaster, and only for the purpose of responding to it, will eliminate the need to repeat the collection of personal information. It will also lower the psychological hurdle for consenting, because disclosure is limited to the event of a disaster. This presents sufficient benefit for the general public to consent, namely information services in normal times and improved disaster mitigation.
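The three conditions above can be pictured as a simple access check. The following is only a minimal sketch under stated assumptions, not the project's actual design: the names ConsentPolicy, AccessRequest, and is_access_permitted, as well as the party and purpose labels, are hypothetical and introduced purely for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class ConsentPolicy:
    """Hypothetical record of one individual's advance consent."""
    disclosure_period: timedelta      # (1) how long after a disaster disclosure is allowed
    allowed_parties: FrozenSet[str]   # (2) who may access the data
    allowed_purposes: FrozenSet[str]  # (2)/(3) e.g. "rescue" (direct), "statistics" (indirect)


@dataclass(frozen=True)
class AccessRequest:
    """Hypothetical request to read an individual's life-log data."""
    party: str
    purpose: str
    requested_at: datetime


def is_access_permitted(
    policy: ConsentPolicy,
    request: AccessRequest,
    disaster_declared_at: Optional[datetime],
) -> bool:
    """Return True only if every condition the individual set is satisfied."""
    if disaster_declared_at is None:
        # In normal times, the disaster-only consent does not apply.
        return False
    elapsed = request.requested_at - disaster_declared_at
    within_period = timedelta(0) <= elapsed <= policy.disclosure_period
    return (
        within_period
        and request.party in policy.allowed_parties
        and request.purpose in policy.allowed_purposes
    )


# Illustrative values only.
policy = ConsentPolicy(
    disclosure_period=timedelta(days=30),
    allowed_parties=frozenset({"municipal_rescue_team"}),
    allowed_purposes=frozenset({"rescue", "statistics"}),
)
disaster = datetime(2024, 1, 1, 9, 0)

rescue_request = AccessRequest("municipal_rescue_team", "rescue", datetime(2024, 1, 3, 12, 0))
marketing_request = AccessRequest("municipal_rescue_team", "marketing", datetime(2024, 1, 3, 12, 0))
late_request = AccessRequest("municipal_rescue_team", "rescue", datetime(2024, 3, 1, 12, 0))

print(is_access_permitted(policy, rescue_request, disaster))
print(is_access_permitted(policy, marketing_request, disaster))
print(is_access_permitted(policy, late_request, disaster))
print(is_access_permitted(policy, rescue_request, None))
```

In this sketch the check defaults to denial: a request is granted only when a disaster has been declared and the period, party, and purpose conditions all hold, which mirrors the idea that disclosure is limited to the event of a disaster.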

<Infrastructure for accelerating collection and access to human and social data>

(Principal Investigator: Hiroe Tsubaki [The Institute of Statistical Mathematics])

4. Preparation for the establishment of on-site location for official statistics, and nationwide deployment plan and expansion of on-site data to include Asian data

The objective is to prepare data infrastructure for high-quality governmental information in order to promote human and social data-centric science. To this end, cooperation with the Statistics Bureau, the National Statistics Center, the Statistical Information Institute for Consulting and Analysis, and the Section I Committee on Statistics from People's Perspective under the Science Council of Japan will be further strengthened, and a database that links anonymized locations throughout Japan, together with an environment that contributes to the promotion of researcher access to government information, will be organized.

As part of this effort, an anonymized official statistics data access location will be developed in collaboration with the Statistics Center opened in 2010. In addition, it will serve as the public data access location in Asia, as stated in the memorandum agreed between Chairman Ito of the Statistical Information Institute for Consulting, and the directors of ten Asian nations in January 2013, and agreed upon by the Ministry of Education, Culture, Sports, Science, and Technology. The project will also aim to make the On-site Analysis Room, opened and approved in 2011, the location where micro-data of the official statistics data of 12 Asian nations can be analyzed.

The long-term objective is to refine the plan to connect the Statistics Center and human and social sciences departments of universities throughout Japan with a dedicated network to link the official statistics information, conduct exploratory data analysis, and develop on-site locations with advanced modeling. This was jointly proposed with the Statistics Center in March 2013 for a large facility plan by the Science Council of Japan. In 2014, statistics related to suicide were developed through a contract with the National Center of Neurology and Psychiatry using the on-site location currently under construction.

5. Building a database accessible by industry and government, and planning for joint research premised on the use of this database

A foundation will be prepared for the future purpose of offering three databases with a high potential for contributing to joint research by industry, academia, and government.

The building of the first database, a "small and medium-sized enterprise financial database," will continue, based on the corporate financial data held by the CRD Association with the support of external funding through the CRD Association. To accumulate the results of joint research directly using the data, a joint study on the credit risk of small and medium-sized enterprises and their management will be performed through open-type joint research with the Institute of Statistical Mathematics.

The second is a database that has clustered control technologies in order to accelerate information circulation (not limited to statistical methodologies) as a standard structural body, such that the methodology can be searched by milestones of the standard R&D process (integrating not only ISO/TC69/SC8, but also various process models) used by industry and government. To develop this database, a network of R&D researchers from industry, government, and academia called "VCP-NET" was launched in May 2013 with the cooperation of the Japanese Standards Association.

The third database is an information base related to health science data such as receipts. Research proposals will be the object of collaboration on a regular basis with universities such as Shimane University Faculty of Medicine, using the receipt sampling data temporarily made available by the Ministry of Health, Labor, and Welfare in 2012 after the on-site location of the research institute is established, in order to create an environment where the sampling data are regularly stored and made available for joint research. In 2013, in addition to research on the safety of medicines, an initial examination of the feasibility of measuring the medical economic effects of functional food in the prevention of eyesight loss (advocated by the General Incorporated Association of International Food & Nutrition) was performed in collaboration with the Shimane University Faculty of Medicine, using the largest private-sector receipt database, owned by the research institute. The research that uses this database will lead to proposals through the above organization for science research funding by the Ministry of Health, Labor, and Welfare.

6. Effective implementation of collection and a feedback platform for industry environment information in Asia

With the cooperation of Osaka University, and using the eL-Platform developed based on the data provided by the Japan Environmental Management Association for Industry, industry environment information in Asia will be collected to start accumulating researchable data, in collaboration with partners in Asia, especially Badan Pengkajian dan Penerapan Teknologi (BPPT) in Indonesia. The analysis results of the regional information collected will then be fed back to local governments (such as Kota Bogor) and enterprises that are involved in its collection. Necessary training will also be provided to local researchers who help with information gathering.

PROMOTION OF DATABASE DEVELOPMENT

With the National Institute of Informatics and the Institute of Statistical Mathematics at the core, the collaborative network extends to 29 universities throughout Japan (including the University of Tokyo, Osaka University, Doshisha University, Hiroshima University, Kochi University, Tokyo Gakugei University, Wakayama Medical University, Keio University, the University of Electro-Communications, and Kyushu University). Stronger ties exist with four universities (including Tohoku University and Ishinomaki Senshu University) affected by the Great East Japan Earthquake, and nine local public bodies, in order to establish methods for analyzing, collecting, managing, and sharing important human and social data that provide lessons from the disaster. With the collaboration of industry and local governments (including Sendai City, Kyoto City, Hiroshima Prefecture and City, Yamanashi Prefecture, and Kochi Prefecture) that are the main governing bodies in local areas, data-centric policy making support services will be socially implemented in the fields of tourism, disaster prevention and mitigation, and environmental policy science.
