A paper by researchers from SAKURA internet Research Center, the in-house research organization of SAKURA internet Inc. (hereinafter “SAKURA internet”; Headquarters: Osaka City, Osaka; Founder, President & CEO: Kunihiro Tanaka), has been accepted to the AI for Accelerated Materials Discovery (AI4Mat) Workshop, held in conjunction with Neural Information Processing Systems (NeurIPS) 2025, a world-renowned international conference in the field of AI and machine learning.
The paper will be presented in San Diego, California, USA, on Saturday, December 6, 2025, local time.
Research Overview
SAKURA internet Research Center has released a new dataset called MatPROV, which contains material synthesis procedures extracted from scientific literature and captures the causal relationships among materials, operations, and conditions as directed graphs. The paper was accepted in recognition of the novelty and utility of this dataset.
Research Background and Objectives
SAKURA internet Research Center advances research and development in the field of laboratory informatics, making the management, analysis, and sharing of research data more efficient. Specifically, we are developing an electronic laboratory notebook platform to accelerate the process of scientific discovery. It is extremely important for electronic laboratory notebooks to record experiment procedures accurately and systematically.
The accepted paper addresses this need for accurate and systematic recording in the following two ways:
1. Researching the optimal capture format for recording experiment procedures as structured data
2. Utilizing large language models (LLMs) to automatically convert experiment procedures described in natural language into structured data
We selected materials science as our research subject because it is widely known that the procedures of material synthesis greatly affect the resulting material properties. This makes materials science a suitable subject for verifying the effectiveness of digitizing experiment procedures. However, existing studies have relied on limited schemas with predefined structures to capture synthesis procedures and have struggled to capture the complexity observed in real-world material synthesis procedures, such as branching and merging operations.
Research Features and Outcomes
To address these limitations, SAKURA internet Research Center adopted the Provenance Data Model (PROV-DM), an international standard for provenance information, which can flexibly model procedures and causal relationships. Furthermore, we used LLMs to extract synthesis procedures from scientific literature and constructed our own PROV-DM-compliant dataset, MatPROV.
MatPROV captures the relationships among materials, operations, and conditions through visually intuitive directed graphs, serving as a foundation broadly applicable to future research, such as AI-driven automated synthesis planning and process optimization.
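As a rough illustration of how a synthesis procedure can be expressed as a directed graph with PROV-DM vocabulary, the minimal sketch below treats materials as entities and operations as activities, and links them with the PROV-DM relations "used" and "wasGeneratedBy". The class `ProvGraph`, its methods, and the example materials, operations, and conditions are all hypothetical and chosen for illustration only. They are not taken from MatPROV and do not reflect the dataset's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class ProvGraph:
    """Tiny in-memory provenance graph (hypothetical helper, not the MatPROV schema)."""
    entities: dict = field(default_factory=dict)          # material id -> attributes
    activities: dict = field(default_factory=dict)        # operation id -> conditions
    used: list = field(default_factory=list)              # (activity, entity) edges
    was_generated_by: list = field(default_factory=list)  # (entity, activity) edges

    def add_entity(self, eid, **attrs):
        self.entities[eid] = attrs

    def add_activity(self, aid, inputs, outputs, **conditions):
        # An operation "used" its inputs; each output "wasGeneratedBy" the operation.
        self.activities[aid] = conditions
        self.used += [(aid, e) for e in inputs]
        self.was_generated_by += [(e, aid) for e in outputs]

    def upstream_materials(self, eid):
        """Return every material that the given material was derived from."""
        found, stack = set(), [eid]
        while stack:
            current = stack.pop()
            for entity, activity in self.was_generated_by:
                if entity != current:
                    continue
                for act, source in self.used:
                    if act == activity and source not in found:
                        found.add(source)
                        stack.append(source)
        return found


# Hypothetical procedure: two precursors are merged by mixing, then calcined.
g = ProvGraph()
for material in ("precursor_A", "precursor_B", "mixture", "powder"):
    g.add_entity(material)
g.add_activity("mixing", inputs=["precursor_A", "precursor_B"],
               outputs=["mixture"], duration_h=2)
g.add_activity("calcination", inputs=["mixture"],
               outputs=["powder"], temperature_C=900)

print(sorted(g.upstream_materials("powder")))
# prints ['mixture', 'precursor_A', 'precursor_B']
```

Because each operation can use several inputs and produce several outputs, merging and branching steps fit in the same structure, which a fixed linear list of steps cannot express.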
The results of this research are expected to open new possibilities for AI-driven material exploration and to contribute to the acceleration of sustainable material development. Additionally, the knowledge gained from this research can serve as foundational technology for enhancing the recording and structuring of experiment procedures in electronic laboratory notebooks, leading to new developments in data management and utilization within research environments.
SAKURA internet Research Center will continue its research and development endeavors to realize new and valuable internet infrastructures for society.
About the Accepted Paper
Title
MatPROV: A Provenance Graph Dataset of Material Synthesis Extracted from Scientific Literature
Authors
Hirofumi Tsuruta (SAKURA internet Inc.), Masaya Kumagai (SAKURA internet Inc., Kyoto University)
Paper
https://arxiv.org/abs/2509.01042
Open Dataset
https://huggingface.co/datasets/MatPROV-project/MatPROV
Abstract

Figure: Study Overview Diagram
In materials research, the synthesis procedure is a crucial factor that directly influences the properties of the material. With data-driven approaches increasingly accelerating materials discovery in recent years, there has been growing interest in extracting synthesis procedures from scientific literature as structured data.
However, many existing studies rely on schemas tailored to specific materials with predefined fields or assume that synthesis procedures are linear sequences of operations, inadequately capturing the structural complexity inherent in real-world synthesis procedures.
To address these limitations, SAKURA internet Research Center adopted PROV-DM, an international standard for provenance information, as a way of modeling synthesis procedures, enabling the flexible representation of procedures as directed graphs.
In this study, we present MatPROV, a dataset of PROV-DM-compliant synthesis procedures extracted from scientific literature using LLMs. MatPROV captures the structural complexity and causal relationships among materials, operations, and conditions through visually intuitive directed graphs. This representation enables the acquisition of synthesis-related knowledge in a machine-interpretable form, paving the way for future developments such as automated synthesis planning and synthesis process optimization.
About the Presentation at the NeurIPS 2025 AI4Mat Workshop
About NeurIPS
NeurIPS is an international conference in the field of artificial intelligence (AI) and machine learning, established in 1987. It is regarded as one of the world’s most prestigious conferences due to the large number of submissions and its rigorous peer review process, which results in low acceptance rates. Speakers at the conference present innovative research across a broad range of AI and machine learning topics, including deep learning, reinforcement learning, computer vision, natural language processing, and applied research in various fields.
The 39th NeurIPS conference will be held in the United States in December 2025.
About the AI4Mat Workshop
The AI4Mat Workshop first took place in 2022 as an opportunity for researchers in materials science and AI to discuss and share the challenges and achievements at the forefront of AI-driven materials discovery.
Date and Venue
Date: Saturday, December 6, 2025
Venue: San Diego Convention Center, San Diego, California, USA
Presenter
Hirofumi Tsuruta – SAKURA internet Research Center
Details
Please see the website below.
https://neurips.cc/Conferences/2025
SAKURA internet Inc.
Representative: Kunihiro Tanaka, Founder, President & CEO
Headquarters: Grand Green Osaka North Building, JAM BASE 3F, 6-38 Ofukacho, Kita-ku, Osaka City, Osaka
Founded: December 23, 1996
Incorporated: August 17, 1999
URL: https://www.sakura.ad.jp/corporate/en/
Contact Information for Media Inquiries Regarding This Matter
SAKURA internet Inc. PR Representative
Inquiry Form: https://sakura.f-form.com/sakurapr
*This information was accurate at the time of release and is subject to change without notice.
*Company and product names mentioned are trademarks or registered trademarks of their respective companies.