At the National Institute for Materials Science (NIMS) in Tsukuba, Japan, a team led by Yukari Katsura is changing how data in materials science is collected and curated. Their work, published in the journal *Science and Technology of Advanced Materials: Methods*, aims to accelerate the development of materials crucial for the energy sector, from more efficient solar cells to advanced battery technologies.
Katsura and her team have developed two tools that use large language models (LLMs) to streamline data curation from scientific publications. The first tool, Starrydata Auto-Suggestion for Sample Information, acts like a smart assistant. It takes text that the user supplies from a paper's abstract and experimental methods and generates concise English descriptions that fit the existing Starrydata database schema. “This tool is designed to make the curator’s job easier,” explains Katsura. “It’s like having a knowledgeable colleague who can quickly summarize complex information into a format that’s ready to be plugged into our database.”
The second tool is a more ambitious project: a schema-free dual-component system called Starrydata Auto-Summary GPT and Starrydata Auto-Summary Viewer. The Auto-Summary GPT processes entire PDF files of open-access papers, using advanced models like GPT-5 to generate comprehensive JSON outputs. This output captures and summarizes all figures, tables, and experimental samples as they appear in the original papers. The companion viewer then transforms this data into interactive tables, giving curators a clear, organized view of the paper’s structure. “This system allows curators to quickly identify relevant data collection targets and input information efficiently,” says Katsura. “It’s a significant step towards automating the construction of scientific databases.”
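To make the JSON-to-table step concrete, here is a minimal Python sketch of how a viewer might flatten such a summary into rows. The article does not give the actual output format, so the JSON keys (`figures`, `tables`, `id`, `summary`, `samples`), the placeholder values, and the function names `summary_to_rows` and `render_table` are all assumptions made for illustration, not the Starrydata implementation.

```python
import json

# Hypothetical output shape: the article does not specify the real JSON keys,
# and the values below are placeholders, not data from any paper.
EXAMPLE_JSON = """
{
  "figures": [
    {"id": "Fig. 1", "summary": "Property vs. temperature",
     "samples": ["Sample A", "Sample B"]}
  ],
  "tables": [
    {"id": "Table 1", "summary": "Nominal compositions",
     "samples": ["Sample A"]}
  ]
}
"""

COLUMNS = ("kind", "id", "summary", "samples")


def summary_to_rows(summary):
    """Flatten figure and table entries into uniform rows for display."""
    rows = []
    for key, kind in (("figures", "figure"), ("tables", "table")):
        for item in summary.get(key, []):
            rows.append({
                "kind": kind,
                "id": str(item.get("id", "")),
                "summary": str(item.get("summary", "")),
                "samples": ", ".join(item.get("samples", [])),
            })
    return rows


def render_table(rows, columns=COLUMNS):
    """Render rows as a fixed-width text table (a stand-in for the viewer)."""
    # Each column is as wide as its header or its longest cell.
    widths = {c: max([len(c)] + [len(r[c]) for r in rows]) for c in columns}
    lines = ["  ".join(c.ljust(widths[c]) for c in columns)]
    lines.append("  ".join("-" * widths[c] for c in columns))
    for r in rows:
        lines.append("  ".join(r[c].ljust(widths[c]) for c in columns))
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_table(summary_to_rows(json.loads(EXAMPLE_JSON))))
```

Listing every figure and table next to the samples it covers is what would let a curator see at a glance which items are worth collecting. A real viewer would presumably add sorting, filtering, and links back to the paper.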
The implications for the energy sector are substantial. Materials science is at the heart of developing new energy technologies, from more efficient photovoltaics to advanced energy storage solutions. The faster and more accurately we can curate and analyze data on materials, the quicker we can innovate and bring new technologies to market. “This research is not just about making data curation more efficient,” says Katsura. “It’s about accelerating the pace of discovery and innovation in materials science, which has direct benefits for the energy sector and beyond.”
The tools developed by Katsura’s team represent a significant leap forward in materials informatics. By automating and streamlining the data curation process, they are paving the way for more rapid and accurate knowledge extraction from scientific literature. This could lead to faster development cycles for new materials, reducing the time and cost associated with bringing new energy technologies to market.
As the energy sector continues to evolve, the need for advanced materials that can meet the demands of a sustainable future becomes ever more pressing. The work of Katsura and her team at NIMS is a testament to the power of interdisciplinary collaboration and the potential of AI to transform scientific research. Their tools are not just enhancing curation efficiency; they are shaping the future of materials science and, by extension, the energy technologies that will power our world.

