{"id":1968224368,"date":"2026-04-20T11:00:44","date_gmt":"2026-04-20T02:00:44","guid":{"rendered":"https:\/\/www.sakura.ad.jp\/corporate\/?post_type=en_newsreleases&#038;p=1968224368"},"modified":"2026-04-17T17:44:13","modified_gmt":"2026-04-17T08:44:13","slug":"sakura-internet-research-centers-paper-accepted-for-industry-track-at-mlsys-2026-international-conference-on-machine-learning-systems","status":"publish","type":"en_newsreleases","link":"https:\/\/www.sakura.ad.jp\/corporate\/en\/information\/2026\/04\/20\/1968224368\/","title":{"rendered":"SAKURA internet Research Center\u2019s Paper Accepted for Industry Track at MLSys 2026 International Conference on Machine Learning Systems"},"content":{"rendered":"<p>A paper by researchers at the SAKURA internet Research Center, the in-house research division of digital infrastructure service provider SAKURA internet Inc. (hereinafter, \u201cSAKURA internet,\u201d\u00a0 Headquarters: Osaka City, Osaka Prefecture; Founder, CEO, and President: Kunihiro Tanaka), has been accepted for Industry Track at Machine Learning and Systems (MLSys) 2026, an international conference for fields where machine learning and systems intersect.<br \/>\nThis paper is scheduled to be presented in Bellevue, Washington, on Thursday, May 21, 2026 local time.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"size-large wp-image-1968224372 aligncenter\" src=\"https:\/\/www.sakura.ad.jp\/corporate\/wp-content\/uploads\/2026\/04\/MLSys2026_EN-1024x538.png\" alt=\"\" width=\"1024\" height=\"538\" srcset=\"https:\/\/www.sakura.ad.jp\/corporate\/wp-content\/uploads\/2026\/04\/MLSys2026_EN-1024x538.png 1024w, https:\/\/www.sakura.ad.jp\/corporate\/wp-content\/uploads\/2026\/04\/MLSys2026_EN-300x158.png 300w, https:\/\/www.sakura.ad.jp\/corporate\/wp-content\/uploads\/2026\/04\/MLSys2026_EN-768x403.png 768w, https:\/\/www.sakura.ad.jp\/corporate\/wp-content\/uploads\/2026\/04\/MLSys2026_EN.png 1200w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/p>\n<p>&nbsp;<\/p>\n<h3>Research Overview<\/h3>\n<p>SAKURA internet Research Center develops and operates SAKURAONE, a High Performance Computing (HPC) cluster designed for training AI, including large language models (LLMs). This system employs open, vendor-neutral network technology, and ranked 49th worldwide in the HPL benchmark on the Top 500 list, an international supercomputer performance ranking announced at the ISC 2025 international conference on high-performance computing. It is also the only system among the top 100 to use an open network stack exclusively. Furthermore, despite a scarcity of published case studies, the valuable insights into workload characteristics obtained from a real-world operational environment involving a mid-sized GPU cluster (comprised of several hundred GPUs) were highly regarded, leading to the acceptance of the research paper.<\/p>\n<h3>Research Background and Objectives<\/h3>\n<p>At SAKURA internet Research Center, we conduct research and development on large-scale AI infrastructure to support the advancement of AI research in Japan and improve the country\u2019s industrial competitiveness. In particular, we are working to implement an open, distributed architecture that excels in cost efficiency, flexibility, and transparency, and to systematize operational insights gained from actual LLM development.<\/p>\n<p>With this objective in mind, our accepted paper focused on the following two areas: using an open Ethernet network based on the 800 Gigabit Ethernet (hereinafter \u201c800 GbE\u201d) network standard, and the Software for Open Networking in the Cloud (hereinafter \u201cSONiC\u201d) network operating system.<\/p>\n<p>\u3000<br \/>\n1. Architectural design, implementation, and performance evaluation of AI-oriented HPC systems based on a vendor-neutral open Ethernet network<br \/>\n2. Observation and analysis of workload characteristics based on production data from LLM development in a single-project, single-tenant environment<\/p>\n<p>In traditional HPC systems for AI, proprietary interconnect networks (InfiniBand, etc.) that relied on specific vendors were the mainstream choice for achieving high bandwidth and low latency. As a result, there was a lack of empirical evidence to demonstrate whether open, Ethernet-based networks could achieve equivalent performance and stability in production environments. Furthermore, publicly available case studies on GPU cluster usage patterns have tended to focus on hyperscale clusters comprising tens of thousands of GPUs, while there has been a scarcity of published operational data regarding mid-sized clusters\u2014typically consisting of several hundred GPUs\u2014that are actually in use at many AI development sites around the world, including in Japan.<\/p>\n<h3>Research Features and Outcomes<\/h3>\n<p>SAKURA internet Research Center has built and operates SAKURAONE, an AI-focused HPC cluster equipped with 800 NVIDIA H100 GPUs (eight GPUs per node across 100 nodes), utilizing an 800 GbE rail-optimized leaf-spine fabric based on SONiC, an open network OS. Given the scarcity of such case studies, we have widely shared its insights on the design, implementation, and operation of this system through this paper.<\/p>\n<p>SAKURAONE recorded 33.95 PFLOP\/s on the HPL (High Performance Linpack) benchmark\u2014a standard metric for supercomputer computational performance\u2014in the ISC 2025 Top 500, ranking 49th globally. It is also the only system among the top 100 that exclusively uses an open network stack. Furthermore, the system achieved 396.295 TFLOP\/s in the HPCG (High Performance Conjugate Gradients) benchmark\u2014which evaluates performance closer to real-world scientific and engineering computations\u2014and 339.86 PFLOP\/s in the FP8 mixed-precision HPL-MxP benchmark, demonstrating that open technologies can deliver HPC and AI performance rivaling that of dedicated networks.<\/p>\n<p>In addition, we gained the following insights from our analysis of real-world operational data from a Japanese medical LLM project conducted using this system.<\/p>\n<ul>\n<li>While the majority of jobs are small-scale jobs involving one to four nodes, the bulk of GPU usage is consumed by large-scale jobs involving 17 or more nodes, confirming that GPU resource consumption is concentrated in a small number of large-scale jobs.<\/li>\n<li>While the majority of jobs are completed quickly, we found that as jobs scale to 17 or more nodes, the distribution of execution times exhibits a long tail; approximately 13.6% of jobs running on 17 to 32 nodes continue to run for more than one week.<\/li>\n<li>As projects progress, we observed a transition typical of the LLM development lifecycle, in which resource utilization shifts from a large-scale pre-training phase to a medium-scale fine-tuning phase.<\/li>\n<\/ul>\n<p>These findings provide valuable insights for both industry and academia as design guidelines for AI infrastructure. What is particularly significant is that the workload trends reported for hyperscale clusters have been observed in the same way in mid-sized clusters such as SAKURAONE. This demonstrates that these characteristics are common properties properties of GPU clusters that do not depend on system scale, and provides insights that can serve as a reference for design decisions for operators worldwide running clusters of 100 to 1,000 GPUs. In addition to demonstrating the feasibility of an open, cost-effective HPC infrastructure, the workload analysis based on actual operations is expected to serve as design guidelines for next-generation industrial computing platforms.<\/p>\n<p>SAKURA internet Research Center will continue its research and development endeavors to realize new and valuable internet infrastructures for society.<\/p>\n<h2>About the Accepted Paper<\/h2>\n<h3>Title<\/h3>\n<p>SAKURAONE: An Open Ethernet-Based AI HPC System and Its Observed Workload Dynamics in a Single-Tenant LLM Development Environment<\/p>\n<h3>Authors<\/h3>\n<p>Fumikazu Konishi, Yuuki Tsubouchi, Hirofumi Tsuruta (SAKURA internet Inc.)<\/p>\n<h3>Paper<\/h3>\n<p><a href=\"https:\/\/arxiv.org\/abs\/2604.13600\" target=\"_blank\" rel=\"noopener\">https:\/\/arxiv.org\/abs\/2604.13600<\/a><\/p>\n<h3>Abstract<u><\/u><\/h3>\n<p>SAKURAONE is a managed high performance computing (HPC) cluster developed and operated by the SAKURA Internet Research Center. It builds on the KOKARYOKU PHY bare metal GPU platform and is optimized for advanced workloads, including large language model (LLM) training. In ISC 2025 TOP500, SAKURAONE is ranked 49th by HPL and is the only top 100 system that uses a fully open networking stack\u2014800 GbE with SONiC\u2014demonstrating the scalability of vendor-neutral technology. Measured performance is 33.95 PFLOP\/s (HPL Rmax), 396.295 TFLOP\/s (HPCG), and 339.86 PFLOP\/s on HPL-MxP with FP8. The system consists of 100 nodes, each with eight NVIDIA H100 GPUs and a 2 PB all-flash Lustre file system, interconnected via a rail-optimized 800 GbE leaf\u2013spine fabric with RoCEv2. Through exclusive use by a single research project, we observed the characteristics of development-related jobs. Consistent with previous HPC studies, small-scale jobs dominated in number, while a few large-scale jobs accounted for most GPU resource time. As the project progressed, resource use shifted from large-scale to mid-scale jobs, reflecting a transition from initial large-scale training to iterative refinement. These observations illustrate the real-world utilization dynamics of GPU clusters under unified project workloads.<\/p>\n<h2>Presentation at MLSys 2026 Industry Track<\/h2>\n<h3>About MLSys<\/h3>\n<p>MLSys is an international conference dedicated to presenting cutting-edge research in fields where machine learning and systems converge. The first conference was held in 2018 under the name \u201cSysML\u201d, and it has been held annually ever since, even after being changed to its current name. It was established with the aim of fostering knowledge-sharing and creating new points of connection between the two research communities of machine learning and systems. Covering a wide range of topics, from system technologies that support the efficiency and scalability of machine learning to practical initiatives aimed at integrating these technologies into the real world, it is one of the most prestigious conferences of its kind, attracting attention from both industry and academia.<\/p>\n<h3>About Industry Track<\/h3>\n<p>Industry Track is a new presentation track introduced at MLSys 2026, designed to share insights gained from the design and implementation of systems currently in operation and under development. Unlike the research track, papers in this track are evaluated not so much for their novelty, such as the proposal of new methods, but rather for their ability to provide design methodologies, performance evaluations, and implementation insights backed by large-scale deployment in production environments.<\/p>\n<h3>Date and Venue<\/h3>\n<p>Date: Thursday, May 21, 2026<br \/>\nVenue: Hyatt Regency Bellevue, Bellevue, Washington, USA<\/p>\n<h3>Presenter<\/h3>\n<p>Fumikazu Konishi, Yuuki Tsubouchi, Hirofumi Tsuruta (SAKURA internet Inc.)<\/p>\n<h3>Details<\/h3>\n<p>Please see the website below.<br \/>\n<a href=\"https:\/\/mlsys.org\/Conferences\/2026\" target=\"_blank\" rel=\"noopener\">https:\/\/mlsys.org\/Conferences\/2026<\/a><\/p>\n<h3>Acknowledgments<\/h3>\n<p>The data analyzed in this paper consists of activity data (such as system behavior during the learning process) observed during the training of a medical LLM, which is the result of research conducted with funding from the Strategic Innovation Promotion Program (SIP) project, \u201cEstablishment of an Integrated Healthcare System\u201d (JPJ012425).<\/p>\n<h3>For Reference<\/h3>\n<p>SAKURAONE, SAKURA internet Research Center\u2019s Cloud-Based Supercomputer System, Ranks 49th Worldwide in the Top 500 Processing Performance Ranking (released on June 11, 2025)<\/p>\n<p><a href=\"https:\/\/www.sakura.ad.jp\/corporate\/en\/information\/2025\/08\/15\/1968220604\/\" target=\"_blank\">https:\/\/www.sakura.ad.jp\/corporate\/en\/information\/2025\/08\/15\/1968220604\/<\/a><\/p>\n","protected":false},"featured_media":1968224372,"template":"","class_list":["post-1968224368","en_newsreleases","type-en_newsreleases","status-publish","has-post-thumbnail","hentry"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.sakura.ad.jp\/corporate\/wp-json\/wp\/v2\/en_newsreleases\/1968224368","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.sakura.ad.jp\/corporate\/wp-json\/wp\/v2\/en_newsreleases"}],"about":[{"href":"https:\/\/www.sakura.ad.jp\/corporate\/wp-json\/wp\/v2\/types\/en_newsreleases"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.sakura.ad.jp\/corporate\/wp-json\/wp\/v2\/media\/1968224372"}],"wp:attachment":[{"href":"https:\/\/www.sakura.ad.jp\/corporate\/wp-json\/wp\/v2\/media?parent=1968224368"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}