Publications resulting from research conducted using Delta AI appear here. Check back to see how the list of exciting discoveries made using Delta grows.
If you have a publication that should be listed here and isn’t, please share your success with us!
5854943
3NXZNVBX
1
nature
50
default
4596
https://delta.ncsa.illinois.edu/wp-content/plugins/zotpress/
%7B%22status%22%3A%22success%22%2C%22updateneeded%22%3Afalse%2C%22instance%22%3Afalse%2C%22meta%22%3A%7B%22request_last%22%3A50%2C%22request_next%22%3A50%2C%22used_cache%22%3Atrue%7D%2C%22data%22%3A%5B%7B%22key%22%3A%22ZAC4UYB5%22%2C%22library%22%3A%7B%22id%22%3A5854943%7D%2C%22meta%22%3A%7B%22creatorSummary%22%3A%22Pandey%20et%20al.%22%2C%22parsedDate%22%3A%222025%22%2C%22numChildren%22%3A0%7D%2C%22bib%22%3A%22%26lt%3Bdiv%20class%3D%26quot%3Bcsl-bib-body%26quot%3B%20style%3D%26quot%3Bline-height%3A%202%3B%20%26quot%3B%26gt%3B%5Cn%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-entry%26quot%3B%20style%3D%26quot%3Bclear%3A%20left%3B%20%26quot%3B%26gt%3B%5Cn%20%20%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-left-margin%26quot%3B%20style%3D%26quot%3Bfloat%3A%20left%3B%20padding-right%3A%200.5em%3B%20text-align%3A%20right%3B%20width%3A%201em%3B%26quot%3B%26gt%3B1.%26lt%3B%5C%2Fdiv%26gt%3B%26lt%3Bdiv%20class%3D%26quot%3Bcsl-right-inline%26quot%3B%20style%3D%26quot%3Bmargin%3A%200%20.4em%200%201.5em%3B%26quot%3B%26gt%3BPandey%2C%20S.%2C%20Lovell%2C%20C.%20C.%2C%20Modi%2C%20C.%20%26amp%3B%20Wandelt%2C%20B.%20D.%20Galactification%3A%20painting%20galaxies%20onto%20dark%20matter%20only%20simulations%20using%20a%20transformer-based%20model.%20Preprint%20at%20%26lt%3Ba%20class%3D%26%23039%3Bzp-DOIURL%26%23039%3B%20href%3D%26%23039%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2511.08438%26%23039%3B%26gt%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2511.08438%26lt%3B%5C%2Fa%26gt%3B%20%282025%29.%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%20%20%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%26lt%3B%5C%2Fdiv%26gt%3B%22%2C%22data%22%3A%7B%22itemType%22%3A%22preprint%22%2C%22title%22%3A%22Galactification%3A%20painting%20galaxies%20onto%20dark%20matter%20only%20simulations%20using%20a%20transformer-based%20model%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Shivam%22%2C%22lastName%22%3A%22Pandey%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Christopher%20C.%22%2C%22lastName%22%3A%22Lovell%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Chirag%22%2C%22lastName%22%3A%22Modi%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Benjamin%20D.%22%2C%22lastName%22%3A%22Wandelt%22%7D%5D%2C%22abstractNote%22%3A%22Connecting%20the%20formation%20and%20evolution%20of%20galaxies%20to%20the%20large-scale%20structure%20is%20crucial%20for%20interpreting%20cosmological%20observations.%20While%20hydrodynamical%20simulations%20accurately%20model%20the%20correlated%20properties%20of%20galaxies%2C%20they%20are%20computationally%20prohibitive%20to%20run%20over%20volumes%20that%20match%20modern%20surveys.%20We%20address%20this%20by%20developing%20a%20framework%20to%20rapidly%20generate%20mock%20galaxy%20catalogs%20conditioned%20on%20inexpensive%20dark-matter-only%20simulations.%20We%20present%20a%20multi-modal%2C%20transformer-based%20model%20that%20takes%203D%20dark%20matter%20density%20and%20velocity%20fields%20as%20input%2C%20and%20outputs%20a%20corresponding%20point%20cloud%20of%20galaxies%20with%20their%20physical%20properties.%20We%20demonstrate%20that%20our%20trained%20model%20faithfully%20reproduces%20a%20variety%20of%20galaxy%20summary%20statistics%20and%20correctly%20captures%20their%20variation%20with%20changes%20in%20the%20underlying%20cosmological%20and%20astrophysical%20parameters%2C%20making%20it%20the%20first%20accelerated%20forward%20model%20to%20capture%20all%20the%20relevant%20galaxy%20properties%2C%20their%20full%20spatial%20distribution%2C%20and%20their%20conditional%20dependencies%20in%20hydrosimulations.%22%2C%22genre%22%3A%22%22%2C%22repository%22%3A%22arXiv%22%2C%22archiveID%22%3A%22%22%2C%22date%22%3A%222025%22%2C%22DOI%22%3A%2210.48550%5C%2FARXIV.2511.08438%22%2C%22citationKey%22%3A%22%22%2C%22url%22%3A%22https%3A%5C%2F%5C%2Farxiv.org%5C%2Fabs%5C%2F2511.08438%22%2C%22language%22%3A%22%22%2C%22collections%22%3A%5B%223NXZNVBX%22%5D%2C%22dateModified%22%3A%222025-12-05T23%3A20%3A44Z%22%7D%7D%2C%7B%22key%22%3A%22QR8MRRBI%22%2C%22library%22%3A%7B%22id%22%3A5854943%7D%2C%22meta%22%3A%7B%22creatorSummary%22%3A%22Zhao%20et%20al.%22%2C%22parsedDate%22%3A%222025%22%2C%22numChildren%22%3A0%7D%2C%22bib%22%3A%22%26lt%3Bdiv%20class%3D%26quot%3Bcsl-bib-body%26quot%3B%20style%3D%26quot%3Bline-height%3A%202%3B%20%26quot%3B%26gt%3B%5Cn%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-entry%26quot%3B%20style%3D%26quot%3Bclear%3A%20left%3B%20%26quot%3B%26gt%3B%5Cn%20%20%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-left-margin%26quot%3B%20style%3D%26quot%3Bfloat%3A%20left%3B%20padding-right%3A%200.5em%3B%20text-align%3A%20right%3B%20width%3A%201em%3B%26quot%3B%26gt%3B1.%26lt%3B%5C%2Fdiv%26gt%3B%26lt%3Bdiv%20class%3D%26quot%3Bcsl-right-inline%26quot%3B%20style%3D%26quot%3Bmargin%3A%200%20.4em%200%201.5em%3B%26quot%3B%26gt%3BZhao%2C%20Y.%2C%20Wang%2C%20Z.%20%26amp%3B%20Zhang%2C%20M.%20PuzzleMoE%3A%20Efficient%20Compression%20of%20Large%20Mixture-of-Experts%20Models%20via%20Sparse%20Expert%20Merging%20and%20Bit-packed%20inference.%20Preprint%20at%20%26lt%3Ba%20class%3D%26%23039%3Bzp-DOIURL%26%23039%3B%20href%3D%26%23039%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2511.04805%26%23039%3B%26gt%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2511.04805%26lt%3B%5C%2Fa%26gt%3B%20%282025%29.%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%20%20%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%26lt%3B%5C%2Fdiv%26gt%3B%22%2C%22data%22%3A%7B%22itemType%22%3A%22preprint%22%2C%22title%22%3A%22PuzzleMoE%3A%20Efficient%20Compression%20of%20Large%20Mixture-of-Experts%20Models%20via%20Sparse%20Expert%20Merging%20and%20Bit-packed%20inference%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Yushu%22%2C%22lastName%22%3A%22Zhao%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Zheng%22%2C%22lastName%22%3A%22Wang%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Minjia%22%2C%22lastName%22%3A%22Zhang%22%7D%5D%2C%22abstractNote%22%3A%22Mixture-of-Experts%20%28MoE%29%20models%20have%20shown%20strong%20potential%20in%20scaling%20language%20models%20efficiently%20by%20activating%20only%20a%20small%20subset%20of%20experts%20per%20input.%20However%2C%20their%20widespread%20deployment%20remains%20limited%20due%20to%20the%20high%20memory%20overhead%20associated%20with%20storing%20all%20expert%20parameters%2C%20particularly%20as%20the%20number%20of%20experts%20increases.%20To%20address%20this%20challenge%2C%20prior%20works%20have%20explored%20expert%20dropping%20and%20merging%20strategies%2C%20yet%20they%20often%20suffer%20from%20performance%20drop%20at%20high%20compression%20ratios.%20In%20this%20paper%2C%20we%20introduce%20PuzzleMoE%2C%20a%20training-free%20MoE%20compression%20method%20that%20achieves%20both%20high%20accuracy%20and%20efficient%20inference%20through%20two%20key%20innovations%3A%20First%2C%20PuzzleMoE%20performs%20sparse%20expert%20merging%20by%20identifying%20element-wise%20weight%20redundancy%20and%20specialization.%20It%20uses%20a%20dual-mask%20to%20capture%20both%20shared%20and%20expert-specific%20parameters.%20Second%2C%20to%20avoid%20the%20overhead%20of%20storing%20binary%20masks%20and%20signs%2C%20PuzzleMoE%20introduces%20a%20bit-packed%20encoding%20scheme%20that%20reuses%20underutilized%20exponent%20bits%2C%20enabling%20efficient%20MoE%20inference%20on%20GPUs.%20Extensive%20experiments%20demonstrate%20that%20PuzzleMoE%20can%20compress%20MoE%20models%20by%20up%20to%2050%25%20while%20maintaining%20accuracy%20across%20various%20tasks.%20Specifically%2C%20it%20outperforms%20prior%20MoE%20compression%20methods%20by%20up%20to%2016.7%25%20on%20MMLU%20at%2050%25%20compression%20ratio%2C%20and%20achieves%20up%20to%201.28%5C%5Ctimes%20inference%20speedup.%22%2C%22genre%22%3A%22%22%2C%22repository%22%3A%22arXiv%22%2C%22archiveID%22%3A%22%22%2C%22date%22%3A%222025%22%2C%22DOI%22%3A%2210.48550%5C%2FARXIV.2511.04805%22%2C%22citationKey%22%3A%22%22%2C%22url%22%3A%22https%3A%5C%2F%5C%2Farxiv.org%5C%2Fabs%5C%2F2511.04805%22%2C%22language%22%3A%22%22%2C%22collections%22%3A%5B%223NXZNVBX%22%5D%2C%22dateModified%22%3A%222025-12-05T23%3A17%3A35Z%22%7D%7D%2C%7B%22key%22%3A%227WXZEM4G%22%2C%22library%22%3A%7B%22id%22%3A5854943%7D%2C%22meta%22%3A%7B%22creatorSummary%22%3A%22Yan%20et%20al.%22%2C%22numChildren%22%3A0%7D%2C%22bib%22%3A%22%26lt%3Bdiv%20class%3D%26quot%3Bcsl-bib-body%26quot%3B%20style%3D%26quot%3Bline-height%3A%202%3B%20%26quot%3B%26gt%3B%5Cn%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-entry%26quot%3B%20style%3D%26quot%3Bclear%3A%20left%3B%20%26quot%3B%26gt%3B%5Cn%20%20%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-left-margin%26quot%3B%20style%3D%26quot%3Bfloat%3A%20left%3B%20padding-right%3A%200.5em%3B%20text-align%3A%20right%3B%20width%3A%201em%3B%26quot%3B%26gt%3B1.%26lt%3B%5C%2Fdiv%26gt%3B%26lt%3Bdiv%20class%3D%26quot%3Bcsl-right-inline%26quot%3B%20style%3D%26quot%3Bmargin%3A%200%20.4em%200%201.5em%3B%26quot%3B%26gt%3BYan%2C%20X.%2C%20Firestone%2C%20M.%20A.%2C%20Keceli%2C%20M.%2C%20Chaudhuri%2C%20S.%20%26amp%3B%20Huerta%2C%20E.%20From%20Atomistic%20Models%20to%20Machine%20Learning%3A%20Predictive%20Design%20of%20Nanocarbons%20under%20Extreme%20Conditions.%20%26lt%3Bi%26gt%3BBiomedicine%26lt%3B%5C%2Fi%26gt%3B%20%26lt%3Bb%26gt%3B26%26lt%3B%5C%2Fb%26gt%3B%2C%2027.%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%20%20%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%26lt%3B%5C%2Fdiv%26gt%3B%22%2C%22data%22%3A%7B%22itemType%22%3A%22journalArticle%22%2C%22title%22%3A%22From%20Atomistic%20Models%20to%20Machine%20Learning%3A%20Predictive%20Design%20of%20Nanocarbons%20under%20Extreme%20Conditions%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Xiaoli%22%2C%22lastName%22%3A%22Yan%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Millicent%20A%22%2C%22lastName%22%3A%22Firestone%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Murat%22%2C%22lastName%22%3A%22Keceli%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Santanu%22%2C%22lastName%22%3A%22Chaudhuri%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Eliu%22%2C%22lastName%22%3A%22Huerta%22%7D%5D%2C%22abstractNote%22%3A%22%22%2C%22date%22%3A%22%22%2C%22language%22%3A%22%22%2C%22DOI%22%3A%22%22%2C%22ISSN%22%3A%22%22%2C%22url%22%3A%22https%3A%5C%2F%5C%2Fscholar.google.com%5C%2Fcitations%3Fview_op%3Dview_citation%26hl%3Den%26user%3DrnOPrqUAAAAJ%26citation_for_view%3DrnOPrqUAAAAJ%3AiH-uZ7U-co4C%22%2C%22collections%22%3A%5B%223NXZNVBX%22%5D%2C%22dateModified%22%3A%222025-12-05T23%3A11%3A40Z%22%7D%7D%2C%7B%22key%22%3A%22NXL3YH6V%22%2C%22library%22%3A%7B%22id%22%3A5854943%7D%2C%22meta%22%3A%7B%22creatorSummary%22%3A%22Zeng%20et%20al.%22%2C%22parsedDate%22%3A%222025%22%2C%22numChildren%22%3A0%7D%2C%22bib%22%3A%22%26lt%3Bdiv%20class%3D%26quot%3Bcsl-bib-body%26quot%3B%20style%3D%26quot%3Bline-height%3A%202%3B%20%26quot%3B%26gt%3B%5Cn%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-entry%26quot%3B%20style%3D%26quot%3Bclear%3A%20left%3B%20%26quot%3B%26gt%3B%5Cn%20%20%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-left-margin%26quot%3B%20style%3D%26quot%3Bfloat%3A%20left%3B%20padding-right%3A%200.5em%3B%20text-align%3A%20right%3B%20width%3A%201em%3B%26quot%3B%26gt%3B1.%26lt%3B%5C%2Fdiv%26gt%3B%26lt%3Bdiv%20class%3D%26quot%3Bcsl-right-inline%26quot%3B%20style%3D%26quot%3Bmargin%3A%200%20.4em%200%201.5em%3B%26quot%3B%26gt%3BZeng%2C%20G.%2C%20Zhou%2C%20Z.%2C%20Arora%2C%20D.%20%26amp%3B%20Zanette%2C%20A.%20Shrinking%20the%20Variance%3A%20Shrinkage%20Baselines%20for%20Reinforcement%20Learning%20with%20Verifiable%20Rewards.%20Preprint%20at%20%26lt%3Ba%20class%3D%26%23039%3Bzp-DOIURL%26%23039%3B%20href%3D%26%23039%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2511.03710%26%23039%3B%26gt%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2511.03710%26lt%3B%5C%2Fa%26gt%3B%20%282025%29.%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%20%20%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%26lt%3B%5C%2Fdiv%26gt%3B%22%2C%22data%22%3A%7B%22itemType%22%3A%22preprint%22%2C%22title%22%3A%22Shrinking%20the%20Variance%3A%20Shrinkage%20Baselines%20for%20Reinforcement%20Learning%20with%20Verifiable%20Rewards%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Guanning%22%2C%22lastName%22%3A%22Zeng%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Zhaoyi%22%2C%22lastName%22%3A%22Zhou%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Daman%22%2C%22lastName%22%3A%22Arora%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Andrea%22%2C%22lastName%22%3A%22Zanette%22%7D%5D%2C%22abstractNote%22%3A%22Reinforcement%20Learning%20with%20Verifiable%20Rewards%20%28RLVR%29%20has%20emerged%20as%20a%20powerful%20paradigm%20for%20post-training%20large%20reasoning%20models%20%28LRMs%29%20using%20policy-gradient%20methods%20such%20as%20GRPO.%20To%20stabilize%20training%2C%20these%20methods%20typically%20center%20trajectory%20rewards%20by%20subtracting%20the%20empirical%20mean%20for%20each%20prompt.%20Statistically%2C%20this%20centering%20acts%20as%20a%20control%20variate%20%28or%20baseline%29%2C%20reducing%20the%20variance%20of%20the%20policy-gradient%20estimator.%5Cn%20Typically%2C%20the%20mean%20reward%20is%20estimated%20using%20per-prompt%20empirical%20averages%20for%20each%20prompt%20in%20a%20batch.%20Drawing%20inspiration%20from%20Stein%26%23039%3Bs%20paradox%2C%20we%20propose%20using%20shrinkage%20estimators%20that%20combine%20per-prompt%20and%20across-prompt%20means%20to%20improve%20the%20overall%20per-prompt%20mean%20estimation%20accuracy%20--%20particularly%20in%20the%20low-generation%20regime%20typical%20of%20RLVR.%20Theoretically%2C%20we%20construct%20a%20shrinkage-based%20baseline%20that%20provably%20yields%20lower-variance%20policy-gradient%20estimators%20across%20algorithms.%20Our%20proposed%20baseline%20serves%20as%20a%20drop-in%20replacement%20for%20existing%20per-prompt%20mean%20baselines%2C%20requiring%20no%20additional%20hyper-parameters%20or%20computation.%20Empirically%2C%20shrinkage%20baselines%20consistently%20outperform%20standard%20empirical-mean%20baselines%2C%20leading%20to%20lower-variance%20gradient%20updates%20and%20improved%20training%20stability.%22%2C%22genre%22%3A%22%22%2C%22repository%22%3A%22arXiv%22%2C%22archiveID%22%3A%22%22%2C%22date%22%3A%222025%22%2C%22DOI%22%3A%2210.48550%5C%2FARXIV.2511.03710%22%2C%22citationKey%22%3A%22%22%2C%22url%22%3A%22https%3A%5C%2F%5C%2Farxiv.org%5C%2Fabs%5C%2F2511.03710%22%2C%22language%22%3A%22%22%2C%22collections%22%3A%5B%223NXZNVBX%22%5D%2C%22dateModified%22%3A%222025-12-05T22%3A48%3A15Z%22%7D%7D%2C%7B%22key%22%3A%22I5WRSH6Z%22%2C%22library%22%3A%7B%22id%22%3A5854943%7D%2C%22meta%22%3A%7B%22creatorSummary%22%3A%22Wen%20et%20al.%22%2C%22parsedDate%22%3A%222025%22%2C%22numChildren%22%3A0%7D%2C%22bib%22%3A%22%26lt%3Bdiv%20class%3D%26quot%3Bcsl-bib-body%26quot%3B%20style%3D%26quot%3Bline-height%3A%202%3B%20%26quot%3B%26gt%3B%5Cn%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-entry%26quot%3B%20style%3D%26quot%3Bclear%3A%20left%3B%20%26quot%3B%26gt%3B%5Cn%20%20%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-left-margin%26quot%3B%20style%3D%26quot%3Bfloat%3A%20left%3B%20padding-right%3A%200.5em%3B%20text-align%3A%20right%3B%20width%3A%201em%3B%26quot%3B%26gt%3B1.%26lt%3B%5C%2Fdiv%26gt%3B%26lt%3Bdiv%20class%3D%26quot%3Bcsl-right-inline%26quot%3B%20style%3D%26quot%3Bmargin%3A%200%20.4em%200%201.5em%3B%26quot%3B%26gt%3BWen%2C%20J.%2C%20Schwing%2C%20A.%20G.%20%26amp%3B%20Wang%2C%20S.%20NoPo-Avatar%3A%20Generalizable%20and%20Animatable%20Avatars%20from%20Sparse%20Inputs%20without%20Human%20Poses.%20Preprint%20at%20%26lt%3Ba%20class%3D%26%23039%3Bzp-DOIURL%26%23039%3B%20href%3D%26%23039%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2511.16673%26%23039%3B%26gt%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2511.16673%26lt%3B%5C%2Fa%26gt%3B%20%282025%29.%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%20%20%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%26lt%3B%5C%2Fdiv%26gt%3B%22%2C%22data%22%3A%7B%22itemType%22%3A%22preprint%22%2C%22title%22%3A%22NoPo-Avatar%3A%20Generalizable%20and%20Animatable%20Avatars%20from%20Sparse%20Inputs%20without%20Human%20Poses%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Jing%22%2C%22lastName%22%3A%22Wen%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Alexander%20G.%22%2C%22lastName%22%3A%22Schwing%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Shenlong%22%2C%22lastName%22%3A%22Wang%22%7D%5D%2C%22abstractNote%22%3A%22We%20tackle%20the%20task%20of%20recovering%20an%20animatable%203D%20human%20avatar%20from%20a%20single%20or%20a%20sparse%20set%20of%20images.%20For%20this%20task%2C%20beyond%20a%20set%20of%20images%2C%20many%20prior%20state-of-the-art%20methods%20use%20accurate%20%26quot%3Bground-truth%26quot%3B%20camera%20poses%20and%20human%20poses%20as%20input%20to%20guide%20reconstruction%20at%20test-time.%20We%20show%20that%20pose-dependent%20reconstruction%20degrades%20results%20significantly%20if%20pose%20estimates%20are%20noisy.%20To%20overcome%20this%2C%20we%20introduce%20NoPo-Avatar%2C%20which%20reconstructs%20avatars%20solely%20from%20images%2C%20without%20any%20pose%20input.%20By%20removing%20the%20dependence%20of%20test-time%20reconstruction%20on%20human%20poses%2C%20NoPo-Avatar%20is%20not%20affected%20by%20noisy%20human%20pose%20estimates%2C%20making%20it%20more%20widely%20applicable.%20Experiments%20on%20challenging%20THuman2.0%2C%20XHuman%2C%20and%20HuGe100K%20data%20show%20that%20NoPo-Avatar%20outperforms%20existing%20baselines%20in%20practical%20settings%20%28without%20ground-truth%20poses%29%20and%20delivers%20comparable%20results%20in%20lab%20settings%20%28with%20ground-truth%20poses%29.%22%2C%22genre%22%3A%22%22%2C%22repository%22%3A%22arXiv%22%2C%22archiveID%22%3A%22%22%2C%22date%22%3A%222025%22%2C%22DOI%22%3A%2210.48550%5C%2FARXIV.2511.16673%22%2C%22citationKey%22%3A%22%22%2C%22url%22%3A%22https%3A%5C%2F%5C%2Farxiv.org%5C%2Fabs%5C%2F2511.16673%22%2C%22language%22%3A%22%22%2C%22collections%22%3A%5B%223NXZNVBX%22%5D%2C%22dateModified%22%3A%222025-12-05T22%3A37%3A02Z%22%7D%7D%2C%7B%22key%22%3A%22GJFNYR5P%22%2C%22library%22%3A%7B%22id%22%3A5854943%7D%2C%22meta%22%3A%7B%22creatorSummary%22%3A%22Mohapatra%20et%20al.%22%2C%22parsedDate%22%3A%222025%22%2C%22numChildren%22%3A0%7D%2C%22bib%22%3A%22%26lt%3Bdiv%20class%3D%26quot%3Bcsl-bib-body%26quot%3B%20style%3D%26quot%3Bline-height%3A%202%3B%20%26quot%3B%26gt%3B%5Cn%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-entry%26quot%3B%20style%3D%26quot%3Bclear%3A%20left%3B%20%26quot%3B%26gt%3B%5Cn%20%20%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-left-margin%26quot%3B%20style%3D%26quot%3Bfloat%3A%20left%3B%20padding-right%3A%200.5em%3B%20text-align%3A%20right%3B%20width%3A%201em%3B%26quot%3B%26gt%3B1.%26lt%3B%5C%2Fdiv%26gt%3B%26lt%3Bdiv%20class%3D%26quot%3Bcsl-right-inline%26quot%3B%20style%3D%26quot%3Bmargin%3A%200%20.4em%200%201.5em%3B%26quot%3B%26gt%3BMohapatra%2C%20R.%2C%20Dutta%2C%20A.%20%26amp%3B%20Sharma%2C%20P.%20Tracing%20Multiphase%20Structure%20in%20the%20Circumgalactic%20Medium%3A%20Insights%20from%20Magnetohydrodynamic%20Turbulence%20Simulations.%20Preprint%20at%20%26lt%3Ba%20class%3D%26%23039%3Bzp-DOIURL%26%23039%3B%20href%3D%26%23039%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2511.00229%26%23039%3B%26gt%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2511.00229%26lt%3B%5C%2Fa%26gt%3B%20%282025%29.%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%20%20%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%26lt%3B%5C%2Fdiv%26gt%3B%22%2C%22data%22%3A%7B%22itemType%22%3A%22preprint%22%2C%22title%22%3A%22Tracing%20Multiphase%20Structure%20in%20the%20Circumgalactic%20Medium%3A%20Insights%20from%20Magnetohydrodynamic%20Turbulence%20Simulations%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Rajsekhar%22%2C%22lastName%22%3A%22Mohapatra%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Alankar%22%2C%22lastName%22%3A%22Dutta%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Prateek%22%2C%22lastName%22%3A%22Sharma%22%7D%5D%2C%22abstractNote%22%3A%22The%20circumgalactic%20medium%20%28CGM%29%20is%20the%20diffuse%20gas%20surrounding%20a%20galaxy%26%23039%3Bs%20halo%2C%20and%20it%20plays%20a%20vital%20role%20in%20the%20galactic%20baryon%20cycle.%20However%2C%20its%20mass%20distribution%20across%20the%20virial%20phase%20and%20the%20cooler%2C%20denser%20atomic%20phase%2C%20remains%20uncertain.%20To%20investigate%20this%2C%20we%20perform%20high-resolution%20magnetohydrodynamic%20simulations%20of%200.125--8%20kpc-scale%20representative%20patches%20of%20the%20CGM%2C%20with%20parameters%20informed%20by%20quasar%20absorption%20line%20observations.%20Our%20simulations%20resolve%20the%20cooling%20length%20%28the%20minimum%20across%20all%20temperatures%20of%20%24c_s%20t_%7B%5C%5Crm%20cool%7D%24%2C%20where%20%24c_s%24%20is%20the%20sound%20speed%20and%20%24t_%7B%5C%5Crm%20cool%7D%24%20is%20the%20cooling%20time%20in%20isobaric%20conditions%29%2C%20allowing%20us%20to%20track%20the%20evolution%20of%20cold%20gas%20more%20accurately.%20We%20find%20that%20low-density%20CGM%20gas%20%28%243%5C%5Ctimes10%5E%7B-4%7D%24%20cm%24%5E%7B-3%7D%24%29%20cannot%20sustain%20cold%20gas%20below%20%2410%5E4%24%20K%20for%20long%2C%20due%20to%20a%20large%20value%20of%20the%20ratio%20between%20the%20cooling%20to%20mixing%20time%20%28%24t_%7B%5C%5Crm%20cool%7D%5C%2Ft_%7B%5C%5Crm%20mix%7D%24%29.%20In%20contrast%2C%20higher-density%20environments%20%28%243%5C%5Ctimes10%5E%7B-3%7D~%7B%5C%5Crm%20cm%7D%5E%7B-3%7D%24%29%20reach%20a%20turbulent%20multiphase%20steady%20state%2C%20with%20up%20to%20%2450%5C%5C%25%24%20of%20the%20mass%20in%20the%20cold%20phase%2C%20occupying%20only%20about%20%241%5C%5C%25%24%20of%20the%20volume.%20To%20connect%20with%20large-volume%20cosmological%20simulations%20and%20small%20%24%7B%5C%5Crm%20pc%7D%24-scale%20idealized%20simulations%2C%20we%20explore%20different%20box%20sizes%20%280.125--8%20kpc%29%20and%20identify%20a%20key%20scaling%20relation%3A%20simulations%20with%20similar%20%24t_%7B%5C%5Crm%20cool%7D%5C%2Ft_%7B%5C%5Crm%20mix%7D%24%20exhibit%20comparable%20cold%20gas%20mass%20fractions%20and%20lifetimes.%20Importantly%2C%20we%20find%20that%20simply%20sub-sampling%20%28reducing%20box-size%29%20a%20small%20region%20from%20a%20large-volume%20simulation%20while%20maintaining%20a%20constant%20turbulent%20energy%20density%20injection%20rate%20from%20larger%20to%20smaller%20scales%20artificially%20shortens%20%24t_%5C%5Cmathrm%7Bmix%7D%24%2C%20leading%20to%20inaccurate%20predictions%20for%20cold%20gas%20survival.%20This%20means%20that%20cold%20gas%20at%20small%20%24%5C%5Clesssim%2010%24%20kpc%20scales%20arises%20in%20relatively%20dense%2C%20quiescent%20regions%20of%20the%20CGM%20rather%20than%20the%20turbulent%20ones%20undergoing%20cascade%20from%20large%20scales.%22%2C%22genre%22%3A%22%22%2C%22repository%22%3A%22arXiv%22%2C%22archiveID%22%3A%22%22%2C%22date%22%3A%222025%22%2C%22DOI%22%3A%2210.48550%5C%2FARXIV.2511.00229%22%2C%22citationKey%22%3A%22%22%2C%22url%22%3A%22https%3A%5C%2F%5C%2Farxiv.org%5C%2Fabs%5C%2F2511.00229%22%2C%22language%22%3A%22%22%2C%22collections%22%3A%5B%223NXZNVBX%22%5D%2C%22dateModified%22%3A%222025-12-05T22%3A33%3A18Z%22%7D%7D%2C%7B%22key%22%3A%22R8DVF66A%22%2C%22library%22%3A%7B%22id%22%3A5854943%7D%2C%22meta%22%3A%7B%22creatorSummary%22%3A%22Loehr%20and%20Clark%22%2C%22parsedDate%22%3A%222025%22%2C%22numChildren%22%3A0%7D%2C%22bib%22%3A%22%26lt%3Bdiv%20class%3D%26quot%3Bcsl-bib-body%26quot%3B%20style%3D%26quot%3Bline-height%3A%202%3B%20%26quot%3B%26gt%3B%5Cn%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-entry%26quot%3B%20style%3D%26quot%3Bclear%3A%20left%3B%20%26quot%3B%26gt%3B%5Cn%20%20%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-left-margin%26quot%3B%20style%3D%26quot%3Bfloat%3A%20left%3B%20padding-right%3A%200.5em%3B%20text-align%3A%20right%3B%20width%3A%201em%3B%26quot%3B%26gt%3B1.%26lt%3B%5C%2Fdiv%26gt%3B%26lt%3Bdiv%20class%3D%26quot%3Bcsl-right-inline%26quot%3B%20style%3D%26quot%3Bmargin%3A%200%20.4em%200%201.5em%3B%26quot%3B%26gt%3BLoehr%2C%20K.%20%26amp%3B%20Clark%2C%20B.%20K.%20Enhancing%20Neural%20Network%20Backflow.%20Preprint%20at%20%26lt%3Ba%20class%3D%26%23039%3Bzp-DOIURL%26%23039%3B%20href%3D%26%23039%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2510.26906%26%23039%3B%26gt%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2510.26906%26lt%3B%5C%2Fa%26gt%3B%20%282025%29.%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%20%20%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%26lt%3B%5C%2Fdiv%26gt%3B%22%2C%22data%22%3A%7B%22itemType%22%3A%22preprint%22%2C%22title%22%3A%22Enhancing%20Neural%20Network%20Backflow%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Kieran%22%2C%22lastName%22%3A%22Loehr%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Bryan%20K.%22%2C%22lastName%22%3A%22Clark%22%7D%5D%2C%22abstractNote%22%3A%22Accurately%20describing%20the%20ground%20state%20of%20strongly%20correlated%20systems%20is%20essential%20for%20understanding%20their%20emergent%20properties.%20Neural%20Network%20Backflow%20%28NNBF%29%20is%20a%20powerful%20variational%20ansatz%20that%20enhances%20mean-field%20wave%20functions%20by%20introducing%20configuration-dependent%20modifications%20to%20single-particle%20orbitals.%20Although%20NNBF%20is%20theoretically%20universal%20in%20the%20limit%20of%20large%20networks%2C%20we%20find%20that%20practical%20gains%20saturate%20with%20increasing%20network%20size.%20Instead%2C%20significant%20improvements%20can%20be%20achieved%20by%20using%20a%20multi-determinant%20ansatz.%20We%20explore%20efficient%20ways%20to%20generate%20these%20multi-determinant%20expansions%20without%20increasing%20the%20number%20of%20variational%20parameters.%20In%20particular%2C%20we%20study%20single-step%20Lanczos%20and%20symmetry%20projection%20techniques%2C%20benchmarking%20their%20performance%20against%20diffusion%20Monte%20Carlo%20and%20NNBF%20applied%20to%20alternative%20mean%20fields.%20Benchmarking%20on%20a%20doped%20periodic%20square%20Hubbard%20model%20near%20optimal%20doping%2C%20we%20find%20that%20a%20Lanczos%20step%2C%20diffusion%20Monte%20Carlo%2C%20and%20projection%20onto%20a%20symmetry%20sector%20all%20give%20similar%20improvements%20achieving%20state-of-the-art%20energies%20at%20minimal%20cost.%20By%20further%20optimizing%20the%20projected%20symmetrized%20states%20directly%2C%20we%20gain%20significantly%20in%20energy.%20Using%20this%20technique%20we%20report%20the%20lowest%20variational%20energies%20for%20this%20Hamiltonian%20on%20%244%5C%5Ctimes%2016%24%20and%20%244%20%5C%5Ctimes%208%24%20lattices%20as%20well%20as%20accurate%20variance%20extrapolated%20energies.%20We%20also%20show%20the%20evolution%20of%20spin%2C%20charge%2C%20and%20pair%20correlation%20functions%20as%20the%20quality%20of%20the%20variational%20ansatz%20improves.%22%2C%22genre%22%3A%22%22%2C%22repository%22%3A%22arXiv%22%2C%22archiveID%22%3A%22%22%2C%22date%22%3A%222025%22%2C%22DOI%22%3A%2210.48550%5C%2FARXIV.2510.26906%22%2C%22citationKey%22%3A%22%22%2C%22url%22%3A%22https%3A%5C%2F%5C%2Farxiv.org%5C%2Fabs%5C%2F2510.26906%22%2C%22language%22%3A%22%22%2C%22collections%22%3A%5B%223NXZNVBX%22%5D%2C%22dateModified%22%3A%222025-12-05T22%3A31%3A26Z%22%7D%7D%2C%7B%22key%22%3A%22RM5RNTCY%22%2C%22library%22%3A%7B%22id%22%3A5854943%7D%2C%22meta%22%3A%7B%22creatorSummary%22%3A%22Zhang%20et%20al.%22%2C%22parsedDate%22%3A%222025-10-29%22%2C%22numChildren%22%3A0%7D%2C%22bib%22%3A%22%26lt%3Bdiv%20class%3D%26quot%3Bcsl-bib-body%26quot%3B%20style%3D%26quot%3Bline-height%3A%202%3B%20%26quot%3B%26gt%3B%5Cn%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-entry%26quot%3B%20style%3D%26quot%3Bclear%3A%20left%3B%20%26quot%3B%26gt%3B%5Cn%20%20%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-left-margin%26quot%3B%20style%3D%26quot%3Bfloat%3A%20left%3B%20padding-right%3A%200.5em%3B%20text-align%3A%20right%3B%20width%3A%201em%3B%26quot%3B%26gt%3B1.%26lt%3B%5C%2Fdiv%26gt%3B%26lt%3Bdiv%20class%3D%26quot%3Bcsl-right-inline%26quot%3B%20style%3D%26quot%3Bmargin%3A%200%20.4em%200%201.5em%3B%26quot%3B%26gt%3BZhang%2C%20Z.%20A.%20%26lt%3Bi%26gt%3Bet%20al.%26lt%3B%5C%2Fi%26gt%3B%20One%20Token%20per%20Highly%20Selective%20Frame%3A%20Towards%20Extreme%20Compression%20for%20Long%20Video%20Understanding.%20in%20%282025%29.%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%20%20%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%26lt%3B%5C%2Fdiv%26gt%3B%22%2C%22data%22%3A%7B%22itemType%22%3A%22conferencePaper%22%2C%22title%22%3A%22One%20Token%20per%20Highly%20Selective%20Frame%3A%20Towards%20Extreme%20Compression%20for%20Long%20Video%20Understanding%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Zheyu%20Aqa%22%2C%22lastName%22%3A%22Zhang%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Ziqi%22%2C%22lastName%22%3A%22Pang%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Shixing%22%2C%22lastName%22%3A%22Chen%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Xiang%22%2C%22lastName%22%3A%22Hao%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Vimal%22%2C%22lastName%22%3A%22Bhat%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Yu-Xiong%22%2C%22lastName%22%3A%22Wang%22%7D%5D%2C%22abstractNote%22%3A%22Long%20video%20understanding%20is%20inherently%20challenging%20for%20vision-language%20models%20%28VLMs%29%20because%20of%20the%20extensive%20number%20of%20frames.%20With%20each%20video%20frame%20typically%20expanding%20into%20tens%20or%20hundreds%20of%20tokens%2C%20the%20limited%20context%20length%20of%20large%20language%20models%20%28LLMs%29%20forces%20the%20VLMs%20to%20perceive%20the%20frames%20sparsely%20and%20lose%20temporal%20information.%20To%20address%20this%2C%20we%20explore%20extreme%20video%20token%20compression%20towards%20%2Aone%20token%20per%20frame%2A%20at%20the%20final%20LLM%20layer.%20Our%20key%20insight%20is%20that%20heuristic-based%20compression%2C%20widely%20adopted%20by%20previous%20methods%2C%20is%20prone%20to%20information%20loss%2C%20and%20this%20necessitates%20supervising%20LLM%20layers%20into%20%2Alearnable%2A%20and%20%2Aprogressive%2A%20modules%20for%20%2Atoken-level%20compression%2A%20%28LP-Comp%29.%20Such%20compression%20enables%20our%20VLM%20to%20digest%202x-4x%20more%20frames%20with%20improved%20performance.%20To%20further%20increase%20the%20token%20efficiency%2C%20we%20investigate%20%5C%5Cemph%7Bframe-level%20compression%7D%2C%20which%20selects%20the%20frames%20most%20relevant%20to%20the%20queries%20via%20the%20internal%20attention%20scores%20of%20the%20LLM%20layers%2C%20named%20%2Aquestion-conditioned%20compression%2A%20%28QC-Comp%29.%20As%20a%20notable%20distinction%20from%20previous%20studies%2C%20we%20mitigate%20the%20position%20bias%20of%20LLM%20attention%20in%20long%20contexts%2C%20%2Ai.e.%2A%2C%20the%20over-concentration%20on%20the%20beginning%20and%20end%20of%20a%20sequence%2C%20by%20splitting%20long%20videos%20into%20short%20segments%20and%20employing%20local%20attention.%20Collectively%2C%20our%20combined%20%2Atoken-level%2A%20and%20%2Aframe-level%2A%20leads%20to%20an%20e%2A%2Ax%2A%2Atreme%20compression%20model%20for%20long%20video%20understanding%2C%20named%20%2A%2AXComp%2A%2A%2C%20achieving%20a%20significantly%20larger%20compression%20ratio%20and%20enabling%20denser%20frame%20sampling.%20Our%20XComp%20is%20finetuned%20from%20VideoChat-Flash%20with%20a%20data-efficient%20%2Asupervised%20compression%20tuning%2A%20stage%20that%20only%20requires%202.5%5C%5C%25%20of%20the%20supervised%20fine-tuning%20data%2C%20yet%20boosts%20the%20accuracy%20from%2042.9%5C%5C%25%20to%2046.2%5C%5C%25%20on%20LVBench%20and%20enhances%20multiple%20other%20long%20video%20benchmarks.%22%2C%22date%22%3A%222025%5C%2F10%5C%2F29%22%2C%22proceedingsTitle%22%3A%22%22%2C%22conferenceName%22%3A%22The%20Thirty-ninth%20Annual%20Conference%20on%20Neural%20Information%20Processing%20Systems%22%2C%22language%22%3A%22en%22%2C%22DOI%22%3A%22%22%2C%22ISBN%22%3A%22%22%2C%22url%22%3A%22https%3A%5C%2F%5C%2Fopenreview.net%5C%2Fforum%3Fid%3DbythzT0b81%22%2C%22collections%22%3A%5B%223NXZNVBX%22%5D%2C%22dateModified%22%3A%222025-12-05T22%3A13%3A13Z%22%7D%7D%2C%7B%22key%22%3A%22NUNL7DQV%22%2C%22library%22%3A%7B%22id%22%3A5854943%7D%2C%22meta%22%3A%7B%22creatorSummary%22%3A%22Vega%20et%20al.%22%2C%22parsedDate%22%3A%222025%22%2C%22numChildren%22%3A0%7D%2C%22bib%22%3A%22%26lt%3Bdiv%20class%3D%26quot%3Bcsl-bib-body%26quot%3B%20style%3D%26quot%3Bline-height%3A%202%3B%20%26quot%3B%26gt%3B%5Cn%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-entry%26quot%3B%20style%3D%26quot%3Bclear%3A%20left%3B%20%26quot%3B%26gt%3B%5Cn%20%20%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-left-margin%26quot%3B%20style%3D%26quot%3Bfloat%3A%20left%3B%20padding-right%3A%200.5em%3B%20text-align%3A%20right%3B%20width%3A%201em%3B%26quot%3B%26gt%3B1.%26lt%3B%5C%2Fdiv%26gt%3B%26lt%3Bdiv%20class%3D%26quot%3Bcsl-right-inline%26quot%3B%20style%3D%26quot%3Bmargin%3A%200%20.4em%200%201.5em%3B%26quot%3B%26gt%3BVega%2C%20O.%2C%20Komijani%2C%20J.%2C%20El-Khadra%2C%20A.%20%26amp%3B%20Marinkovic%2C%20M.%20Group-Equivariant%20Diffusion%20Models%20for%20Lattice%20Field%20Theory.%20Preprint%20at%20%26lt%3Ba%20class%3D%26%23039%3Bzp-DOIURL%26%23039%3B%20href%3D%26%23039%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2510.26081%26%23039%3B%26gt%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2510.26081%26lt%3B%5C%2Fa%26gt%3B%20%282025%29.%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%20%20%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%26lt%3B%5C%2Fdiv%26gt%3B%22%2C%22data%22%3A%7B%22itemType%22%3A%22preprint%22%2C%22title%22%3A%22Group-Equivariant%20Diffusion%20Models%20for%20Lattice%20Field%20Theory%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Octavio%22%2C%22lastName%22%3A%22Vega%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Javad%22%2C%22lastName%22%3A%22Komijani%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Aida%22%2C%22lastName%22%3A%22El-Khadra%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Marina%22%2C%22lastName%22%3A%22Marinkovic%22%7D%5D%2C%22abstractNote%22%3A%22Near%20the%20critical%20point%2C%20Markov%20Chain%20Monte%20Carlo%20%28MCMC%29%20simulations%20of%20lattice%20quantum%20field%20theories%20%28LQFT%29%20become%20increasingly%20inefficient%20due%20to%20critical%20slowing%20down.%20In%20this%20work%2C%20we%20investigate%20score-based%20symmetry-preserving%20diffusion%20models%20as%20an%20alternative%20strategy%20to%20sample%20two-dimensional%20%24%5Cu03d5%5E4%24%20and%20%24%7B%5C%5Crm%20U%7D%281%29%24%20lattice%20field%20theories.%20We%20develop%20score%20networks%20that%20are%20equivariant%20to%20a%20range%20of%20group%20transformations%2C%20including%20global%20%24%5C%5Cmathbb%7BZ%7D_2%24%20reflections%2C%20local%20%24%7B%5C%5Crm%20U%7D%281%29%24%20rotations%2C%20and%20periodic%20translations%20%24%5C%5Cmathbb%7BT%7D%24.%20The%20score%20networks%20are%20trained%20using%20an%20augmented%20training%20scheme%2C%20which%20significantly%20improves%20sample%20quality%20in%20the%20simulated%20field%20theories.%20We%20also%20demonstrate%20empirically%20that%20our%20symmetry-aware%20models%20outperform%20generic%20score%20networks%20in%20sample%20quality%2C%20expressivity%2C%20and%20effective%20sample%20size.%22%2C%22genre%22%3A%22%22%2C%22repository%22%3A%22arXiv%22%2C%22archiveID%22%3A%22%22%2C%22date%22%3A%222025%22%2C%22DOI%22%3A%2210.48550%5C%2FARXIV.2510.26081%22%2C%22citationKey%22%3A%22%22%2C%22url%22%3A%22https%3A%5C%2F%5C%2Farxiv.org%5C%2Fabs%5C%2F2510.26081%22%2C%22language%22%3A%22%22%2C%22collections%22%3A%5B%223NXZNVBX%22%5D%2C%22dateModified%22%3A%222025-12-05T21%3A52%3A20Z%22%7D%7D%2C%7B%22key%22%3A%22Q6W7S77N%22%2C%22library%22%3A%7B%22id%22%3A5854943%7D%2C%22meta%22%3A%7B%22creatorSummary%22%3A%22Cui%20et%20al.%22%2C%22parsedDate%22%3A%222025-11-16%22%2C%22numChildren%22%3A0%7D%2C%22bib%22%3A%22%26lt%3Bdiv%20class%3D%26quot%3Bcsl-bib-body%26quot%3B%20style%3D%26quot%3Bline-height%3A%202%3B%20%26quot%3B%26gt%3B%5Cn%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-entry%26quot%3B%20style%3D%26quot%3Bclear%3A%20left%3B%20%26quot%3B%26gt%3B%5Cn%20%20%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-left-margin%26quot%3B%20style%3D%26quot%3Bfloat%3A%20left%3B%20padding-right%3A%200.5em%3B%20text-align%3A%20right%3B%20width%3A%201em%3B%26quot%3B%26gt%3B1.%26lt%3B%5C%2Fdiv%26gt%3B%26lt%3Bdiv%20class%3D%26quot%3Bcsl-right-inline%26quot%3B%20style%3D%26quot%3Bmargin%3A%200%20.4em%200%201.5em%3B%26quot%3B%26gt%3BCui%2C%20S.%20%26lt%3Bi%26gt%3Bet%20al.%26lt%3B%5C%2Fi%26gt%3B%20Story%20of%20Two%20GPUs%3A%20Characterizing%20the%20Resilience%20of%20Hopper%20H100%20and%20Ampere%20A100%20GPUs.%20in%20%26lt%3Bi%26gt%3BProceedings%20of%20the%20International%20Conference%20for%20High%20Performance%20Computing%2C%20Networking%2C%20Storage%20and%20Analysis%26lt%3B%5C%2Fi%26gt%3B%201145%26%23x2013%3B1164%20%28ACM%2C%20St.%20Louis%20MO%20USA%2C%202025%29.%20%26lt%3Ba%20class%3D%26%23039%3Bzp-DOIURL%26%23039%3B%20href%3D%26%23039%3Bhttp%3A%5C%2F%5C%2Fdoi.org%5C%2F10.1145%5C%2F3712285.3759821%26%23039%3B%26gt%3Bhttp%3A%5C%2F%5C%2Fdoi.org%5C%2F10.1145%5C%2F3712285.3759821%26lt%3B%5C%2Fa%26gt%3B.%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%20%20%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%26lt%3B%5C%2Fdiv%26gt%3B%22%2C%22data%22%3A%7B%22itemType%22%3A%22conferencePaper%22%2C%22title%22%3A%22Story%20of%20Two%20GPUs%3A%20Characterizing%20the%20Resilience%20of%20Hopper%20H100%20and%20Ampere%20A100%20GPUs%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Shengkun%22%2C%22lastName%22%3A%22Cui%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Archit%22%2C%22lastName%22%3A%22Patke%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Hung%22%2C%22lastName%22%3A%22Nguyen%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Aditya%22%2C%22lastName%22%3A%22Ranjan%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Ziheng%22%2C%22lastName%22%3A%22Chen%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Phuong%22%2C%22lastName%22%3A%22Cao%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Gregory%22%2C%22lastName%22%3A%22Bauer%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Brett%22%2C%22lastName%22%3A%22Bode%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Catello%20Di%22%2C%22lastName%22%3A%22Martino%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Saurabh%22%2C%22lastName%22%3A%22Jha%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Chandra%22%2C%22lastName%22%3A%22Narayanaswami%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Daby%22%2C%22lastName%22%3A%22Sow%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Zbigniew%20T.%22%2C%22lastName%22%3A%22Kalbarczyk%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Ravishankar%20K.%22%2C%22lastName%22%3A%22Iyer%22%7D%5D%2C%22abstractNote%22%3A%22%22%2C%22date%22%3A%222025-11-16%22%2C%22proceedingsTitle%22%3A%22Proceedings%20of%20the%20International%20Conference%20for%20High%20Performance%20Computing%2C%20Networking%2C%20Storage%20and%20Analysis%22%2C%22conferenceName%22%3A%22SC%20%2725%3A%20The%20International%20Conference%20for%20High%20Performance%20Computing%2C%20Networking%2C%20Storage%20and%20Analysis%22%2C%22language%22%3A%22en%22%2C%22DOI%22%3A%2210.1145%5C%2F3712285.3759821%22%2C%22ISBN%22%3A%229798400714665%22%2C%22url%22%3A%22https%3A%5C%2F%5C%2Fdl.acm.org%5C%2Fdoi%5C%2F10.1145%5C%2F3712285.3759821%22%2C%22collections%22%3A%5B%223NXZNVBX%22%5D%2C%22dateModified%22%3A%222025-12-05T21%3A41%3A39Z%22%7D%7D%2C%7B%22key%22%3A%22XY8DPWSE%22%2C%22library%22%3A%7B%22id%22%3A5854943%7D%2C%22meta%22%3A%7B%22creatorSummary%22%3A%22Zhang%20et%20al.%22%2C%22parsedDate%22%3A%222025%22%2C%22numChildren%22%3A0%7D%2C%22bib%22%3A%22%26lt%3Bdiv%20class%3D%26quot%3Bcsl-bib-body%26quot%3B%20style%3D%26quot%3Bline-height%3A%202%3B%20%26quot%3B%26gt%3B%5Cn%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-entry%26quot%3B%20style%3D%26quot%3Bclear%3A%20left%3B%20%26quot%3B%26gt%3B%5Cn%20%20%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-left-margin%26quot%3B%20style%3D%26quot%3Bfloat%3A%20left%3B%20padding-right%3A%200.5em%3B%20text-align%3A%20right%3B%20width%3A%201em%3B%26quot%3B%26gt%3B1.%26lt%3B%5C%2Fdiv%26gt%3B%26lt%3Bdiv%20class%3D%26quot%3Bcsl-right-inline%26quot%3B%20style%3D%26quot%3Bmargin%3A%200%20.4em%200%201.5em%3B%26quot%3B%26gt%3BZhang%2C%20Y.%2C%20Schwing%2C%20A.%20%26amp%3B%20Zhao%2C%20Z.%20Variational%20Masked%20Diffusion%20Models.%20Preprint%20at%20%26lt%3Ba%20class%3D%26%23039%3Bzp-DOIURL%26%23039%3B%20href%3D%26%23039%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2510.23606%26%23039%3B%26gt%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2510.23606%26lt%3B%5C%2Fa%26gt%3B%20%282025%29.%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%20%20%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%26lt%3B%5C%2Fdiv%26gt%3B%22%2C%22data%22%3A%7B%22itemType%22%3A%22preprint%22%2C%22title%22%3A%22Variational%20Masked%20Diffusion%20Models%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Yichi%22%2C%22lastName%22%3A%22Zhang%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Alex%22%2C%22lastName%22%3A%22Schwing%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Zhizhen%22%2C%22lastName%22%3A%22Zhao%22%7D%5D%2C%22abstractNote%22%3A%22Masked%20diffusion%20models%20have%20recently%20emerged%20as%20a%20flexible%20framework%20for%20discrete%20generative%20modeling.%20However%2C%20a%20key%20limitation%20of%20standard%20masked%20diffusion%20is%20its%20inability%20to%20effectively%20capture%20dependencies%20among%20tokens%20that%20are%20predicted%20concurrently%2C%20leading%20to%20degraded%20generation%20quality%20when%20dependencies%20among%20tokens%20are%20important.%20To%20explicitly%20model%20dependencies%20among%20tokens%2C%20we%20propose%20Variational%20Masked%20Diffusion%20%28VMD%29%2C%20a%20framework%20that%20introduces%20latent%20variables%20into%20the%20masked%20diffusion%20process.%20Through%20controlled%20experiments%20on%20synthetic%20datasets%2C%20we%20demonstrate%20that%20VMD%20successfully%20learns%20dependencies%20that%20conventional%20masked%20diffusion%20fails%20to%20capture.%20We%20further%20validate%20the%20effectiveness%20of%20our%20approach%20on%20Sudoku%20puzzles%20and%20text%20datasets%2C%20where%20learning%20of%20dependencies%20among%20tokens%20improves%20global%20consistency.%20Across%20these%20domains%2C%20VMD%20enhances%20both%20generation%20quality%20and%20dependency%20awareness%2C%20highlighting%20the%20value%20of%20integrating%20variational%20inference%20into%20masked%20diffusion.%20Our%20code%20is%20available%20at%3A%20https%3A%5C%2F%5C%2Friccizz.github.io%5C%2FVMD.%22%2C%22genre%22%3A%22%22%2C%22repository%22%3A%22arXiv%22%2C%22archiveID%22%3A%22%22%2C%22date%22%3A%222025%22%2C%22DOI%22%3A%2210.48550%5C%2FARXIV.2510.23606%22%2C%22citationKey%22%3A%22%22%2C%22url%22%3A%22https%3A%5C%2F%5C%2Farxiv.org%5C%2Fabs%5C%2F2510.23606%22%2C%22language%22%3A%22%22%2C%22collections%22%3A%5B%223NXZNVBX%22%5D%2C%22dateModified%22%3A%222025-12-05T21%3A34%3A48Z%22%7D%7D%2C%7B%22key%22%3A%228ZCE8FUK%22%2C%22library%22%3A%7B%22id%22%3A5854943%7D%2C%22meta%22%3A%7B%22numChildren%22%3A0%7D%2C%22bib%22%3A%22%26lt%3Bdiv%20class%3D%26quot%3Bcsl-bib-body%26quot%3B%20style%3D%26quot%3Bline-height%3A%202%3B%20%26quot%3B%26gt%3B%5Cn%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-entry%26quot%3B%20style%3D%26quot%3Bclear%3A%20left%3B%20%26quot%3B%26gt%3B%5Cn%20%20%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-left-margin%26quot%3B%20style%3D%26quot%3Bfloat%3A%20left%3B%20padding-right%3A%200.5em%3B%20text-align%3A%20right%3B%20width%3A%201em%3B%26quot%3B%26gt%3B1.%26lt%3B%5C%2Fdiv%26gt%3B%26lt%3Bdiv%20class%3D%26quot%3Bcsl-right-inline%26quot%3B%20style%3D%26quot%3Bmargin%3A%200%20.4em%200%201.5em%3B%26quot%3B%26gt%3BCross-Domain%20Long-Term%20Forecasting%3A%20Radiation%20Dose%20from%20Sparse%20Neutron%20Sensor%20via%20Spatio-Temporal%20Operator%20Network.%20%26lt%3Ba%20class%3D%26%23039%3Bzp-ItemURL%26%23039%3B%20href%3D%26%23039%3Bhttps%3A%5C%2F%5C%2Farxiv.org%5C%2Fhtml%5C%2F2510.18041v1%26%23039%3B%26gt%3Bhttps%3A%5C%2F%5C%2Farxiv.org%5C%2Fhtml%5C%2F2510.18041v1%26lt%3B%5C%2Fa%26gt%3B.%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%20%20%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%26lt%3B%5C%2Fdiv%26gt%3B%22%2C%22data%22%3A%7B%22itemType%22%3A%22webpage%22%2C%22title%22%3A%22Cross-Domain%20Long-Term%20Forecasting%3A%20Radiation%20Dose%20from%20Sparse%20Neutron%20Sensor%20via%20Spatio-Temporal%20Operator%20Network%22%2C%22creators%22%3A%5B%5D%2C%22abstractNote%22%3A%22%22%2C%22date%22%3A%22%22%2C%22url%22%3A%22https%3A%5C%2F%5C%2Farxiv.org%5C%2Fhtml%5C%2F2510.18041v1%22%2C%22language%22%3A%22%22%2C%22collections%22%3A%5B%223NXZNVBX%22%5D%2C%22dateModified%22%3A%222025-12-05T21%3A30%3A18Z%22%7D%7D%2C%7B%22key%22%3A%22YRMPZDFP%22%2C%22library%22%3A%7B%22id%22%3A5854943%7D%2C%22meta%22%3A%7B%22creatorSummary%22%3A%22Chen%20et%20al.%22%2C%22parsedDate%22%3A%222025%22%2C%22numChildren%22%3A0%7D%2C%22bib%22%3A%22%26lt%3Bdiv%20class%3D%26quot%3Bcsl-bib-body%26quot%3B%20style%3D%26quot%3Bline-height%3A%202%3B%20%26quot%3B%26gt%3B%5Cn%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-entry%26quot%3B%20style%3D%26quot%3Bclear%3A%20left%3B%20%26quot%3B%26gt%3B%5Cn%20%20%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-left-margin%26quot%3B%20style%3D%26quot%3Bfloat%3A%20left%3B%20padding-right%3A%200.5em%3B%20text-align%3A%20right%3B%20width%3A%201em%3B%26quot%3B%26gt%3B1.%26lt%3B%5C%2Fdiv%26gt%3B%26lt%3Bdiv%20class%3D%26quot%3Bcsl-right-inline%26quot%3B%20style%3D%26quot%3Bmargin%3A%200%20.4em%200%201.5em%3B%26quot%3B%26gt%3BChen%2C%20H.%20%26lt%3Bi%26gt%3Bet%20al.%26lt%3B%5C%2Fi%26gt%3B%20ERA%3A%20Transforming%20VLMs%20into%20Embodied%20Agents%20via%20Embodied%20Prior%20Learning%20and%20Online%20Reinforcement%20Learning.%20Preprint%20at%20%26lt%3Ba%20class%3D%26%23039%3Bzp-DOIURL%26%23039%3B%20href%3D%26%23039%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2510.12693%26%23039%3B%26gt%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2510.12693%26lt%3B%5C%2Fa%26gt%3B%20%282025%29.%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%20%20%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%26lt%3B%5C%2Fdiv%26gt%3B%22%2C%22data%22%3A%7B%22itemType%22%3A%22preprint%22%2C%22title%22%3A%22ERA%3A%20Transforming%20VLMs%20into%20Embodied%20Agents%20via%20Embodied%20Prior%20Learning%20and%20Online%20Reinforcement%20Learning%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Hanyang%22%2C%22lastName%22%3A%22Chen%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Mark%22%2C%22lastName%22%3A%22Zhao%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Rui%22%2C%22lastName%22%3A%22Yang%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Qinwei%22%2C%22lastName%22%3A%22Ma%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Ke%22%2C%22lastName%22%3A%22Yang%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Jiarui%22%2C%22lastName%22%3A%22Yao%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Kangrui%22%2C%22lastName%22%3A%22Wang%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Hao%22%2C%22lastName%22%3A%22Bai%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Zhenhailong%22%2C%22lastName%22%3A%22Wang%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Rui%22%2C%22lastName%22%3A%22Pan%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Mengchao%22%2C%22lastName%22%3A%22Zhang%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Jose%22%2C%22lastName%22%3A%22Barreiros%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Aykut%22%2C%22lastName%22%3A%22Onol%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22ChengXiang%22%2C%22lastName%22%3A%22Zhai%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Heng%22%2C%22lastName%22%3A%22Ji%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Manling%22%2C%22lastName%22%3A%22Li%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Huan%22%2C%22lastName%22%3A%22Zhang%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Tong%22%2C%22lastName%22%3A%22Zhang%22%7D%5D%2C%22abstractNote%22%3A%22Recent%20advances%20in%20embodied%20AI%20highlight%20the%20potential%20of%20vision%20language%20models%20%28VLMs%29%20as%20agents%20capable%20of%20perception%2C%20reasoning%2C%20and%20interaction%20in%20complex%20environments.%20However%2C%20top-performing%20systems%20rely%20on%20large-scale%20models%20that%20are%20costly%20to%20deploy%2C%20while%20smaller%20VLMs%20lack%20the%20necessary%20knowledge%20and%20skills%20to%20succeed.%20To%20bridge%20this%20gap%2C%20we%20present%20%5C%5Ctextit%7BEmbodied%20Reasoning%20Agent%20%28ERA%29%7D%2C%20a%20two-stage%20framework%20that%20integrates%20prior%20knowledge%20learning%20and%20online%20reinforcement%20learning%20%28RL%29.%20The%20first%20stage%2C%20%5C%5Ctextit%7BEmbodied%20Prior%20Learning%7D%2C%20distills%20foundational%20knowledge%20from%20three%20types%20of%20data%3A%20%281%29%20Trajectory-Augmented%20Priors%2C%20which%20enrich%20existing%20trajectory%20data%20with%20structured%20reasoning%20generated%20by%20stronger%20models%3B%20%282%29%20Environment-Anchored%20Priors%2C%20which%20provide%20in-environment%20knowledge%20and%20grounding%20supervision%3B%20and%20%283%29%20External%20Knowledge%20Priors%2C%20which%20transfer%20general%20knowledge%20from%20out-of-environment%20datasets.%20In%20the%20second%20stage%2C%20we%20develop%20an%20online%20RL%20pipeline%20that%20builds%20on%20these%20priors%20to%20further%20enhance%20agent%20performance.%20To%20overcome%20the%20inherent%20challenges%20in%20agent%20RL%2C%20including%20long%20horizons%2C%20sparse%20rewards%2C%20and%20training%20instability%2C%20we%20introduce%20three%20key%20designs%3A%20self-summarization%20for%20context%20management%2C%20dense%20reward%20shaping%2C%20and%20turn-level%20policy%20optimization.%20Extensive%20experiments%20on%20both%20high-level%20planning%20%28EB-ALFRED%29%20and%20low-level%20control%20%28EB-Manipulation%29%20tasks%20demonstrate%20that%20ERA-3B%20surpasses%20both%20prompting-based%20large%20models%20and%20previous%20training-based%20baselines.%20Specifically%2C%20it%20achieves%20overall%20improvements%20of%208.4%5C%5C%25%20on%20EB-ALFRED%20and%2019.4%5C%5C%25%20on%20EB-Manipulation%20over%20GPT-4o%2C%20and%20exhibits%20strong%20generalization%20to%20unseen%20tasks.%20Overall%2C%20ERA%20offers%20a%20practical%20path%20toward%20scalable%20embodied%20intelligence%2C%20providing%20methodological%20insights%20for%20future%20embodied%20AI%20systems.%22%2C%22genre%22%3A%22%22%2C%22repository%22%3A%22arXiv%22%2C%22archiveID%22%3A%22%22%2C%22date%22%3A%222025%22%2C%22DOI%22%3A%2210.48550%5C%2FARXIV.2510.12693%22%2C%22citationKey%22%3A%22%22%2C%22url%22%3A%22https%3A%5C%2F%5C%2Farxiv.org%5C%2Fabs%5C%2F2510.12693%22%2C%22language%22%3A%22%22%2C%22collections%22%3A%5B%223NXZNVBX%22%5D%2C%22dateModified%22%3A%222025-12-05T21%3A14%3A04Z%22%7D%7D%2C%7B%22key%22%3A%228FA77MJ7%22%2C%22library%22%3A%7B%22id%22%3A5854943%7D%2C%22meta%22%3A%7B%22creatorSummary%22%3A%22Wu%20and%20Zhang%22%2C%22parsedDate%22%3A%222025%22%2C%22numChildren%22%3A0%7D%2C%22bib%22%3A%22%26lt%3Bdiv%20class%3D%26quot%3Bcsl-bib-body%26quot%3B%20style%3D%26quot%3Bline-height%3A%202%3B%20%26quot%3B%26gt%3B%5Cn%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-entry%26quot%3B%20style%3D%26quot%3Bclear%3A%20left%3B%20%26quot%3B%26gt%3B%5Cn%20%20%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-left-margin%26quot%3B%20style%3D%26quot%3Bfloat%3A%20left%3B%20padding-right%3A%200.5em%3B%20text-align%3A%20right%3B%20width%3A%201em%3B%26quot%3B%26gt%3B1.%26lt%3B%5C%2Fdiv%26gt%3B%26lt%3Bdiv%20class%3D%26quot%3Bcsl-right-inline%26quot%3B%20style%3D%26quot%3Bmargin%3A%200%20.4em%200%201.5em%3B%26quot%3B%26gt%3BWu%2C%20M.%20%26amp%3B%20Zhang%2C%20Z.%20Maple%3A%20A%20Multi-agent%20System%20for%20Portable%20Deep%20Learning%20across%20Clusters.%20Preprint%20at%20%26lt%3Ba%20class%3D%26%23039%3Bzp-DOIURL%26%23039%3B%20href%3D%26%23039%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2510.08842%26%23039%3B%26gt%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2510.08842%26lt%3B%5C%2Fa%26gt%3B%20%282025%29.%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%20%20%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%26lt%3B%5C%2Fdiv%26gt%3B%22%2C%22data%22%3A%7B%22itemType%22%3A%22preprint%22%2C%22title%22%3A%22Maple%3A%20A%20Multi-agent%20System%20for%20Portable%20Deep%20Learning%20across%20Clusters%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Molang%22%2C%22lastName%22%3A%22Wu%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Zhao%22%2C%22lastName%22%3A%22Zhang%22%7D%5D%2C%22abstractNote%22%3A%22Training%20deep%20learning%20%28DL%29%20models%20across%20Graphics%20Processing%20Unit%20%28GPU%29%20clusters%20is%20technically%20challenging.%20One%20aspect%20is%20that%20users%20have%20to%20compose%20command%20lines%20to%20adapt%20to%20the%20heterogeneous%20launchers%2C%20schedulers%2C%20affinity%20options%2C%20DL%20framework%20arguments%2C%20and%20environment%20variables.%20Composing%20correct%20command%20lines%20is%20error-prone%20and%20can%20easily%20frustrate%20users%2C%20impeding%20research%20or%20wasting%20resources.%20In%20this%20work%2C%20we%20present%20Maple%2C%20a%20multi-agent%20system%20that%20generates%20correct%20DL%20command%20lines%20with%20users%26%23039%3B%20natural%20language%20input.%20Maple%20consists%20of%20four%20agents%20with%20the%20functionalities%20of%20information%20extraction%2C%20template%20retrieval%2C%20command%20line%20verification%2C%20and%20error%20correction.%20We%20evaluate%20Maple%20on%20nine%20GPU%20clusters%20across%20national%20computing%20centers%20in%20the%20U.S.%2C%20five%20representative%20deep%20learning%20model%20families%2C%20and%20four%20commonly%20used%20parallel%20DL%20training%20paradigms.%20Our%20experiments%20also%20cover%20schedulers%20of%20SLURM%20and%20PBS%20and%20heterogeneous%20architectures%2C%20such%20as%20NVIDIA%20A100%5C%2FH200%20GPUs%20and%20Intel%20Max%20series%20GPUs.%20Maple%20achieves%2092.0%25%20accuracy%20in%20generating%20command%20lines%20across%20the%20567%20test%20cases.%20Leverage%20multiple%20language%20models%20with%20an%20aggregated%20size%20of%2010B%20parameters%2C%20Maple%20delivers%20comparable%20performance%20to%20the%20state-of-the-art%20models%20of%20GPT-5%2C%20Claude%2C%20and%20Gemini.%20Together%2C%20these%20results%20highlight%20Maple%26%23039%3Bs%20practical%20value%20in%20enabling%20portable%20and%20scalable%20distributed%20DL%20across%20heterogeneous%20HPC%20environments.%22%2C%22genre%22%3A%22%22%2C%22repository%22%3A%22arXiv%22%2C%22archiveID%22%3A%22%22%2C%22date%22%3A%222025%22%2C%22DOI%22%3A%2210.48550%5C%2FARXIV.2510.08842%22%2C%22citationKey%22%3A%22%22%2C%22url%22%3A%22https%3A%5C%2F%5C%2Farxiv.org%5C%2Fabs%5C%2F2510.08842%22%2C%22language%22%3A%22%22%2C%22collections%22%3A%5B%223NXZNVBX%22%5D%2C%22dateModified%22%3A%222025-12-05T21%3A00%3A23Z%22%7D%7D%2C%7B%22key%22%3A%22UQTK8JUZ%22%2C%22library%22%3A%7B%22id%22%3A5854943%7D%2C%22meta%22%3A%7B%22creatorSummary%22%3A%22Xie%20et%20al.%22%2C%22parsedDate%22%3A%222025-09-15%22%2C%22numChildren%22%3A0%7D%2C%22bib%22%3A%22%26lt%3Bdiv%20class%3D%26quot%3Bcsl-bib-body%26quot%3B%20style%3D%26quot%3Bline-height%3A%202%3B%20%26quot%3B%26gt%3B%5Cn%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-entry%26quot%3B%20style%3D%26quot%3Bclear%3A%20left%3B%20%26quot%3B%26gt%3B%5Cn%20%20%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-left-margin%26quot%3B%20style%3D%26quot%3Bfloat%3A%20left%3B%20padding-right%3A%200.5em%3B%20text-align%3A%20right%3B%20width%3A%201em%3B%26quot%3B%26gt%3B1.%26lt%3B%5C%2Fdiv%26gt%3B%26lt%3Bdiv%20class%3D%26quot%3Bcsl-right-inline%26quot%3B%20style%3D%26quot%3Bmargin%3A%200%20.4em%200%201.5em%3B%26quot%3B%26gt%3BXie%2C%20H.%20%26lt%3Bi%26gt%3Bet%20al.%26lt%3B%5C%2Fi%26gt%3B%20Diamond%3A%20Harnessing%20GPU%20Resources%20for%20Scientific%20Deep%20Learning.%20in%20%26lt%3Bi%26gt%3B2025%20IEEE%20International%20Conference%20on%20eScience%20%28eScience%29%26lt%3B%5C%2Fi%26gt%3B%20196%26%23x2013%3B204%20%28IEEE%2C%20Chicago%2C%20IL%2C%20USA%2C%202025%29.%20%26lt%3Ba%20class%3D%26%23039%3Bzp-DOIURL%26%23039%3B%20href%3D%26%23039%3Bhttp%3A%5C%2F%5C%2Fdoi.org%5C%2F10.1109%5C%2FeScience65000.2025.00031%26%23039%3B%26gt%3Bhttp%3A%5C%2F%5C%2Fdoi.org%5C%2F10.1109%5C%2FeScience65000.2025.00031%26lt%3B%5C%2Fa%26gt%3B.%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%20%20%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%26lt%3B%5C%2Fdiv%26gt%3B%22%2C%22data%22%3A%7B%22itemType%22%3A%22conferencePaper%22%2C%22title%22%3A%22Diamond%3A%20Harnessing%20GPU%20Resources%20for%20Scientific%20Deep%20Learning%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Haotian%22%2C%22lastName%22%3A%22Xie%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Rohan%22%2C%22lastName%22%3A%22Marwaha%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Minu%22%2C%22lastName%22%3A%22Mathew%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Song%22%2C%22lastName%22%3A%22Bian%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Gengcong%22%2C%22lastName%22%3A%22Yang%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Minghao%22%2C%22lastName%22%3A%22Yan%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Yadu%22%2C%22lastName%22%3A%22Babuji%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Owen%22%2C%22lastName%22%3A%22Price%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Yinzhi%22%2C%22lastName%22%3A%22Wang%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Volodymyr%22%2C%22lastName%22%3A%22Kindratenko%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Shivaram%22%2C%22lastName%22%3A%22Venkataraman%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Kyle%22%2C%22lastName%22%3A%22Chard%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Ian%20T.%22%2C%22lastName%22%3A%22Foster%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Zhao%22%2C%22lastName%22%3A%22Zhang%22%7D%5D%2C%22abstractNote%22%3A%22%22%2C%22date%22%3A%222025-9-15%22%2C%22proceedingsTitle%22%3A%222025%20IEEE%20International%20Conference%20on%20eScience%20%28eScience%29%22%2C%22conferenceName%22%3A%222025%20IEEE%20International%20Conference%20on%20eScience%20%28eScience%29%22%2C%22language%22%3A%22%22%2C%22DOI%22%3A%2210.1109%5C%2FeScience65000.2025.00031%22%2C%22ISBN%22%3A%229798331591458%22%2C%22url%22%3A%22https%3A%5C%2F%5C%2Fieeexplore.ieee.org%5C%2Fdocument%5C%2F11181545%5C%2F%22%2C%22collections%22%3A%5B%223NXZNVBX%22%5D%2C%22dateModified%22%3A%222025-12-05T20%3A47%3A40Z%22%7D%7D%2C%7B%22key%22%3A%22ALFAXZ2P%22%2C%22library%22%3A%7B%22id%22%3A5854943%7D%2C%22meta%22%3A%7B%22creatorSummary%22%3A%22Patel%20et%20al.%22%2C%22parsedDate%22%3A%222025%22%2C%22numChildren%22%3A0%7D%2C%22bib%22%3A%22%26lt%3Bdiv%20class%3D%26quot%3Bcsl-bib-body%26quot%3B%20style%3D%26quot%3Bline-height%3A%202%3B%20%26quot%3B%26gt%3B%5Cn%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-entry%26quot%3B%20style%3D%26quot%3Bclear%3A%20left%3B%20%26quot%3B%26gt%3B%5Cn%20%20%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-left-margin%26quot%3B%20style%3D%26quot%3Bfloat%3A%20left%3B%20padding-right%3A%200.5em%3B%20text-align%3A%20right%3B%20width%3A%201em%3B%26quot%3B%26gt%3B1.%26lt%3B%5C%2Fdiv%26gt%3B%26lt%3Bdiv%20class%3D%26quot%3Bcsl-right-inline%26quot%3B%20style%3D%26quot%3Bmargin%3A%200%20.4em%200%201.5em%3B%26quot%3B%26gt%3BPatel%2C%20P.%20%26lt%3Bi%26gt%3Bet%20al.%26lt%3B%5C%2Fi%26gt%3B%20RADAR-Radio%20Afterglow%20Detection%20and%20AI-driven%20Response%3A%20A%20Federated%20Framework%20for%20Gravitational%20Wave%20Event%20Follow-Up.%20Preprint%20at%20%26lt%3Ba%20class%3D%26%23039%3Bzp-DOIURL%26%23039%3B%20href%3D%26%23039%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2507.14827%26%23039%3B%26gt%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2507.14827%26lt%3B%5C%2Fa%26gt%3B%20%282025%29.%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%20%20%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%26lt%3B%5C%2Fdiv%26gt%3B%22%2C%22data%22%3A%7B%22itemType%22%3A%22preprint%22%2C%22title%22%3A%22RADAR-Radio%20Afterglow%20Detection%20and%20AI-driven%20Response%3A%20A%20Federated%20Framework%20for%20Gravitational%20Wave%20Event%20Follow-Up%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Parth%22%2C%22lastName%22%3A%22Patel%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Alessandra%22%2C%22lastName%22%3A%22Corsi%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22E.%20A.%22%2C%22lastName%22%3A%22Huerta%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Kara%22%2C%22lastName%22%3A%22Merfeld%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Victoria%22%2C%22lastName%22%3A%22Tiki%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Zilinghan%22%2C%22lastName%22%3A%22Li%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Tekin%22%2C%22lastName%22%3A%22Bicer%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Kyle%22%2C%22lastName%22%3A%22Chard%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Ryan%22%2C%22lastName%22%3A%22Chard%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Ian%20T.%22%2C%22lastName%22%3A%22Foster%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Maxime%22%2C%22lastName%22%3A%22Gonthier%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Valerie%22%2C%22lastName%22%3A%22Hayot-Sasson%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Hai%20Duc%22%2C%22lastName%22%3A%22Nguyen%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Haochen%22%2C%22lastName%22%3A%22Pan%22%7D%5D%2C%22abstractNote%22%3A%22The%20landmark%20detection%20of%20both%20gravitational%20waves%20%28GWs%29%20and%20electromagnetic%20%28EM%29%20radiation%20from%20the%20binary%20neutron%20star%20merger%20GW170817%20has%20spurred%20efforts%20to%20streamline%20the%20follow-up%20of%20GW%20alerts%20in%20current%20and%20future%20observing%20runs%20of%20ground-based%20GW%20detectors.%20Within%20this%20context%2C%20the%20radio%20band%20of%20the%20EM%20spectrum%20presents%20unique%20challenges.%20Sensitive%20radio%20facilities%20capable%20of%20detecting%20the%20faint%20radio%20afterglow%20seen%20in%20GW170817%2C%20and%20with%20sufficient%20angular%20resolution%2C%20have%20small%20fields%20of%20view%20compared%20to%20typical%20GW%20localization%20areas.%20Additionally%2C%20theoretical%20models%20predict%20that%20the%20radio%20emission%20from%20binary%20neutron%20star%20mergers%20can%20evolve%20over%20weeks%20to%20years%2C%20necessitating%20long-term%20monitoring%20to%20probe%20the%20physics%20of%20the%20various%20post-merger%20ejecta%20components.%20These%20constraints%2C%20combined%20with%20limited%20radio%20observing%20resources%2C%20make%20the%20development%20of%20more%20coordinated%20follow-up%20strategies%20essential%20--%20especially%20as%20the%20next%20generation%20of%20GW%20detectors%20promise%20a%20dramatic%20increase%20in%20detection%20rates.%20Here%2C%20we%20present%20RADAR%2C%20a%20framework%20designed%20to%20address%20these%20challenges%20by%20promoting%20community-driven%20information%20sharing%2C%20federated%20data%20analysis%2C%20and%20system%20resilience%2C%20while%20integrating%20AI%20methods%20for%20both%20GW%20signal%20identification%20and%20radio%20data%20aggregation.%20We%20show%20that%20it%20is%20possible%20to%20preserve%20data%20rights%20while%20sharing%20models%20that%20can%20help%20design%20and%5C%2For%20update%20follow-up%20strategies.%20We%20demonstrate%20our%20approach%20through%20a%20case%20study%20of%20GW170817%2C%20and%20discuss%20future%20directions%20for%20refinement%20and%20broader%20application.%22%2C%22genre%22%3A%22%22%2C%22repository%22%3A%22arXiv%22%2C%22archiveID%22%3A%22%22%2C%22date%22%3A%222025%22%2C%22DOI%22%3A%2210.48550%5C%2FARXIV.2507.14827%22%2C%22citationKey%22%3A%22%22%2C%22url%22%3A%22https%3A%5C%2F%5C%2Farxiv.org%5C%2Fabs%5C%2F2507.14827%22%2C%22language%22%3A%22%22%2C%22collections%22%3A%5B%223NXZNVBX%22%5D%2C%22dateModified%22%3A%222025-12-05T20%3A25%3A46Z%22%7D%7D%2C%7B%22key%22%3A%22MDY88FU3%22%2C%22library%22%3A%7B%22id%22%3A5854943%7D%2C%22meta%22%3A%7B%22creatorSummary%22%3A%22Kacmaz%20et%20al.%22%2C%22parsedDate%22%3A%222025-08-29%22%2C%22numChildren%22%3A0%7D%2C%22bib%22%3A%22%26lt%3Bdiv%20class%3D%26quot%3Bcsl-bib-body%26quot%3B%20style%3D%26quot%3Bline-height%3A%202%3B%20%26quot%3B%26gt%3B%5Cn%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-entry%26quot%3B%20style%3D%26quot%3Bclear%3A%20left%3B%20%26quot%3B%26gt%3B%5Cn%20%20%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-left-margin%26quot%3B%20style%3D%26quot%3Bfloat%3A%20left%3B%20padding-right%3A%200.5em%3B%20text-align%3A%20right%3B%20width%3A%201em%3B%26quot%3B%26gt%3B1.%26lt%3B%5C%2Fdiv%26gt%3B%26lt%3Bdiv%20class%3D%26quot%3Bcsl-right-inline%26quot%3B%20style%3D%26quot%3Bmargin%3A%200%20.4em%200%201.5em%3B%26quot%3B%26gt%3BKacmaz%2C%20S.%2C%20Haas%2C%20R.%20%26amp%3B%20Huerta%2C%20E.%20A.%20Machine%20Learning-Driven%20Conservative-to-Primitive%20Conversion%20in%20Hybrid%20Piecewise%20Polytropic%20and%20Tabulated%20Equations%20of%20State.%20%26lt%3Bi%26gt%3BSymmetry%26lt%3B%5C%2Fi%26gt%3B%20%26lt%3Bb%26gt%3B17%26lt%3B%5C%2Fb%26gt%3B%2C%201409%20%282025%29.%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%20%20%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%26lt%3B%5C%2Fdiv%26gt%3B%22%2C%22data%22%3A%7B%22itemType%22%3A%22journalArticle%22%2C%22title%22%3A%22Machine%20Learning-Driven%20Conservative-to-Primitive%20Conversion%20in%20Hybrid%20Piecewise%20Polytropic%20and%20Tabulated%20Equations%20of%20State%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Semih%22%2C%22lastName%22%3A%22Kacmaz%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Roland%22%2C%22lastName%22%3A%22Haas%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22E.%20A.%22%2C%22lastName%22%3A%22Huerta%22%7D%5D%2C%22abstractNote%22%3A%22We%20present%20a%20novel%20machine%20learning%20%28ML%29-based%20method%20to%20accelerate%20conservative-to-primitive%20inversion%2C%20focusing%20on%20hybrid%20piecewise%20polytropic%20and%20tabulated%20equations%20of%20state.%20Traditional%20root-finding%20techniques%20are%20computationally%20expensive%2C%20particularly%20for%20large-scale%20relativistic%20hydrodynamics%20simulations.%20To%20address%20this%2C%20we%20employ%20feedforward%20neural%20networks%20%28NNC2PS%20and%20NNC2PL%29%2C%20trained%20in%20PyTorch%20%282.0%2B%29%20and%20optimized%20for%20GPU%20inference%20using%20NVIDIA%20TensorRT%20%288.4.1%29%2C%20achieving%20significant%20speedups%20with%20minimal%20accuracy%20loss.%20The%20NNC2PS%20model%20achieves%20L1%20and%20L%5Cu221e%20errors%20of%204.54%5Cu00d710%5Cu22127%20and%203.44%5Cu00d710%5Cu22126%2C%20respectively%2C%20while%20the%20NNC2PL%20model%20exhibits%20even%20lower%20error%20values.%20TensorRT%20optimization%20with%20mixed-precision%20deployment%20substantially%20accelerates%20performance%20compared%20to%20traditional%20root-finding%20methods.%20Specifically%2C%20the%20mixed-precision%20TensorRT%20engine%20for%20NNC2PS%20achieves%20inference%20speeds%20approximately%20400%20times%20faster%20than%20a%20traditional%20single-threaded%20CPU%20implementation%20for%20a%20dataset%20size%20of%201%2C000%2C000%20points.%20Ideal%20parallelization%20across%20an%20entire%20compute%20node%20in%20the%20Delta%20supercomputer%20%28dual%20AMD%2064-core%202.45%20GHz%20Milan%20processors%20and%208%20NVIDIA%20A100%20GPUs%20with%2040%20GB%20HBM2%20RAM%20and%20NVLink%29%20predicts%20a%2025-fold%20speedup%20for%20TensorRT%20over%20an%20optimally%20parallelized%20numerical%20method%20when%20processing%208%20million%20data%20points.%20Moreover%2C%20the%20ML%20method%20exhibits%20sub-linear%20scaling%20with%20increasing%20dataset%20sizes.%20We%20release%20the%20scientific%20software%20developed%2C%20enabling%20further%20validation%20and%20extension%20of%20our%20findings.%20By%20exploiting%20the%20underlying%20symmetries%20within%20the%20equation%20of%20state%2C%20these%20findings%20highlight%20the%20potential%20of%20ML%2C%20combined%20with%20GPU%20optimization%20and%20model%20quantization%2C%20to%20accelerate%20conservative-to-primitive%20inversion%20in%20relativistic%20hydrodynamics%20simulations.%22%2C%22date%22%3A%222025-08-29%22%2C%22language%22%3A%22en%22%2C%22DOI%22%3A%2210.3390%5C%2Fsym17091409%22%2C%22ISSN%22%3A%222073-8994%22%2C%22url%22%3A%22https%3A%5C%2F%5C%2Fwww.mdpi.com%5C%2F2073-8994%5C%2F17%5C%2F9%5C%2F1409%22%2C%22collections%22%3A%5B%223NXZNVBX%22%5D%2C%22dateModified%22%3A%222025-12-05T20%3A01%3A44Z%22%7D%7D%2C%7B%22key%22%3A%22ZLVSU2SJ%22%2C%22library%22%3A%7B%22id%22%3A5854943%7D%2C%22meta%22%3A%7B%22creatorSummary%22%3A%22Srivastava%20et%20al.%22%2C%22parsedDate%22%3A%222025%22%2C%22numChildren%22%3A0%7D%2C%22bib%22%3A%22%26lt%3Bdiv%20class%3D%26quot%3Bcsl-bib-body%26quot%3B%20style%3D%26quot%3Bline-height%3A%202%3B%20%26quot%3B%26gt%3B%5Cn%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-entry%26quot%3B%20style%3D%26quot%3Bclear%3A%20left%3B%20%26quot%3B%26gt%3B%5Cn%20%20%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-left-margin%26quot%3B%20style%3D%26quot%3Bfloat%3A%20left%3B%20padding-right%3A%200.5em%3B%20text-align%3A%20right%3B%20width%3A%201em%3B%26quot%3B%26gt%3B1.%26lt%3B%5C%2Fdiv%26gt%3B%26lt%3Bdiv%20class%3D%26quot%3Bcsl-right-inline%26quot%3B%20style%3D%26quot%3Bmargin%3A%200%20.4em%200%201.5em%3B%26quot%3B%26gt%3BSrivastava%2C%20A.%2C%20Basiri%2C%20S.%20%26amp%3B%20Salapaka%2C%20S.%20Autonomy-Aware%20Clustering%3A%20When%20Local%20Decisions%20Supersede%20Global%20Prescriptions.%20Preprint%20at%20%26lt%3Ba%20class%3D%26%23039%3Bzp-DOIURL%26%23039%3B%20href%3D%26%23039%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2509.25775%26%23039%3B%26gt%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2509.25775%26lt%3B%5C%2Fa%26gt%3B%20%282025%29.%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%20%20%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%26lt%3B%5C%2Fdiv%26gt%3B%22%2C%22data%22%3A%7B%22itemType%22%3A%22preprint%22%2C%22title%22%3A%22Autonomy-Aware%20Clustering%3A%20When%20Local%20Decisions%20Supersede%20Global%20Prescriptions%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Amber%22%2C%22lastName%22%3A%22Srivastava%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Salar%22%2C%22lastName%22%3A%22Basiri%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Srinivasa%22%2C%22lastName%22%3A%22Salapaka%22%7D%5D%2C%22abstractNote%22%3A%22Clustering%20arises%20in%20a%20wide%20range%20of%20problem%20formulations%2C%20yet%20most%20existing%20approaches%20assume%20that%20the%20entities%20under%20clustering%20are%20passive%20and%20strictly%20conform%20to%20their%20assigned%20groups.%20In%20reality%2C%20entities%20often%20exhibit%20local%20autonomy%2C%20overriding%20prescribed%20associations%20in%20ways%20not%20fully%20captured%20by%20feature%20representations.%20Such%20autonomy%20can%20substantially%20reshape%20clustering%20outcomes%20--%20altering%20cluster%20compositions%2C%20geometry%2C%20and%20cardinality%20--%20with%20significant%20downstream%20effects%20on%20inference%20and%20decision-making.%20We%20introduce%20autonomy-aware%20clustering%2C%20a%20reinforcement%20learning%20%28RL%29%20framework%20that%20learns%20and%20accounts%20for%20the%20influence%20of%20local%20autonomy%20without%20requiring%20prior%20knowledge%20of%20its%20form.%20Our%20approach%20integrates%20RL%20with%20a%20Deterministic%20Annealing%20%28DA%29%20procedure%2C%20where%2C%20to%20determine%20underlying%20clusters%2C%20DA%20naturally%20promotes%20exploration%20in%20early%20stages%20of%20annealing%20and%20transitions%20to%20exploitation%20later.%20We%20also%20show%20that%20the%20annealing%20procedure%20exhibits%20phase%20transitions%20that%20enable%20design%20of%20efficient%20annealing%20schedules.%20To%20further%20enhance%20adaptability%2C%20we%20propose%20the%20Adaptive%20Distance%20Estimation%20Network%20%28ADEN%29%2C%20a%20transformer-based%20attention%20model%20that%20learns%20dependencies%20between%20entities%20and%20cluster%20representatives%20within%20the%20RL%20loop%2C%20accommodates%20variable-sized%20inputs%20and%20outputs%2C%20and%20enables%20knowledge%20transfer%20across%20diverse%20problem%20instances.%20Empirical%20results%20show%20that%20our%20framework%20closely%20aligns%20with%20underlying%20data%20dynamics%3A%20even%20without%20explicit%20autonomy%20models%2C%20it%20achieves%20solutions%20close%20to%20the%20ground%20truth%20%28gap%20~3-4%25%29%2C%20whereas%20ignoring%20autonomy%20leads%20to%20substantially%20larger%20gaps%20%28~35-40%25%29.%20The%20code%20and%20data%20are%20publicly%20available%20at%20https%3A%5C%2F%5C%2Fgithub.com%5C%2Fsalar96%5C%2FAutonomyAwareClustering.%22%2C%22genre%22%3A%22%22%2C%22repository%22%3A%22arXiv%22%2C%22archiveID%22%3A%22%22%2C%22date%22%3A%222025%22%2C%22DOI%22%3A%2210.48550%5C%2FARXIV.2509.25775%22%2C%22citationKey%22%3A%22%22%2C%22url%22%3A%22https%3A%5C%2F%5C%2Farxiv.org%5C%2Fabs%5C%2F2509.25775%22%2C%22language%22%3A%22%22%2C%22collections%22%3A%5B%223NXZNVBX%22%5D%2C%22dateModified%22%3A%222025-12-05T18%3A50%3A31Z%22%7D%7D%2C%7B%22key%22%3A%22K6TWQYFV%22%2C%22library%22%3A%7B%22id%22%3A5854943%7D%2C%22meta%22%3A%7B%22creatorSummary%22%3A%22Zhu%20et%20al.%22%2C%22parsedDate%22%3A%222025%22%2C%22numChildren%22%3A0%7D%2C%22bib%22%3A%22%26lt%3Bdiv%20class%3D%26quot%3Bcsl-bib-body%26quot%3B%20style%3D%26quot%3Bline-height%3A%202%3B%20%26quot%3B%26gt%3B%5Cn%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-entry%26quot%3B%20style%3D%26quot%3Bclear%3A%20left%3B%20%26quot%3B%26gt%3B%5Cn%20%20%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-left-margin%26quot%3B%20style%3D%26quot%3Bfloat%3A%20left%3B%20padding-right%3A%200.5em%3B%20text-align%3A%20right%3B%20width%3A%201em%3B%26quot%3B%26gt%3B1.%26lt%3B%5C%2Fdiv%26gt%3B%26lt%3Bdiv%20class%3D%26quot%3Bcsl-right-inline%26quot%3B%20style%3D%26quot%3Bmargin%3A%200%20.4em%200%201.5em%3B%26quot%3B%26gt%3BZhu%2C%20M.%20%26lt%3Bi%26gt%3Bet%20al.%26lt%3B%5C%2Fi%26gt%3B%20Probing%20the%20Critical%20Point%20%28CritPt%29%20of%20AI%20Reasoning%3A%20a%20Frontier%20Physics%20Research%20Benchmark.%20Preprint%20at%20%26lt%3Ba%20class%3D%26%23039%3Bzp-DOIURL%26%23039%3B%20href%3D%26%23039%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2509.26574%26%23039%3B%26gt%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2509.26574%26lt%3B%5C%2Fa%26gt%3B%20%282025%29.%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%20%20%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%26lt%3B%5C%2Fdiv%26gt%3B%22%2C%22data%22%3A%7B%22itemType%22%3A%22preprint%22%2C%22title%22%3A%22Probing%20the%20Critical%20Point%20%28CritPt%29%20of%20AI%20Reasoning%3A%20a%20Frontier%20Physics%20Research%20Benchmark%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Minhui%22%2C%22lastName%22%3A%22Zhu%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Minyang%22%2C%22lastName%22%3A%22Tian%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Xiaocheng%22%2C%22lastName%22%3A%22Yang%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Tianci%22%2C%22lastName%22%3A%22Zhou%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Lifan%22%2C%22lastName%22%3A%22Yuan%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Penghao%22%2C%22lastName%22%3A%22Zhu%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Eli%22%2C%22lastName%22%3A%22Chertkov%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Shengyan%22%2C%22lastName%22%3A%22Liu%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Yufeng%22%2C%22lastName%22%3A%22Du%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Ziming%22%2C%22lastName%22%3A%22Ji%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Indranil%22%2C%22lastName%22%3A%22Das%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Junyi%22%2C%22lastName%22%3A%22Cao%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Jiabin%22%2C%22lastName%22%3A%22Yu%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Peixue%22%2C%22lastName%22%3A%22Wu%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Jinchen%22%2C%22lastName%22%3A%22He%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Yifan%22%2C%22lastName%22%3A%22Su%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Yikun%22%2C%22lastName%22%3A%22Jiang%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Yujie%22%2C%22lastName%22%3A%22Zhang%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Chang%22%2C%22lastName%22%3A%22Liu%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Ze-Min%22%2C%22lastName%22%3A%22Huang%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Weizhen%22%2C%22lastName%22%3A%22Jia%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Yunkai%22%2C%22lastName%22%3A%22Wang%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Farshid%22%2C%22lastName%22%3A%22Jafarpour%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Yong%22%2C%22lastName%22%3A%22Zhao%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Xinan%22%2C%22lastName%22%3A%22Chen%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Jessie%22%2C%22lastName%22%3A%22Shelton%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Aaron%20W.%22%2C%22lastName%22%3A%22Young%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22John%22%2C%22lastName%22%3A%22Bartolotta%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Wenchao%22%2C%22lastName%22%3A%22Xu%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Yue%22%2C%22lastName%22%3A%22Sun%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Anjun%22%2C%22lastName%22%3A%22Chu%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Victor%22%2C%22lastName%22%3A%22Colussi%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Chris%22%2C%22lastName%22%3A%22Akers%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Nathan%22%2C%22lastName%22%3A%22Brooks%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Wenbo%22%2C%22lastName%22%3A%22Fu%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Jinchao%22%2C%22lastName%22%3A%22Zhao%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Marvin%22%2C%22lastName%22%3A%22Qi%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Anqi%22%2C%22lastName%22%3A%22Mu%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Yubo%22%2C%22lastName%22%3A%22Yang%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Allen%22%2C%22lastName%22%3A%22Zang%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Yang%22%2C%22lastName%22%3A%22Lyu%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Peizhi%22%2C%22lastName%22%3A%22Mai%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Christopher%22%2C%22lastName%22%3A%22Wilson%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Xuefei%22%2C%22lastName%22%3A%22Guo%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Juntai%22%2C%22lastName%22%3A%22Zhou%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Daniel%22%2C%22lastName%22%3A%22Inafuku%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Chi%22%2C%22lastName%22%3A%22Xue%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Luyu%22%2C%22lastName%22%3A%22Gao%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Ze%22%2C%22lastName%22%3A%22Yang%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Ya%5Cu00efr%22%2C%22lastName%22%3A%22Hein%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Yonatan%22%2C%22lastName%22%3A%22Kahn%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Kevin%22%2C%22lastName%22%3A%22Zhou%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Di%22%2C%22lastName%22%3A%22Luo%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22John%20Drew%22%2C%22lastName%22%3A%22Wilson%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Jarrod%20T.%22%2C%22lastName%22%3A%22Reilly%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Dmytro%22%2C%22lastName%22%3A%22Bandak%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Ofir%22%2C%22lastName%22%3A%22Press%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Liang%22%2C%22lastName%22%3A%22Yang%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Xueying%22%2C%22lastName%22%3A%22Wang%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Hao%22%2C%22lastName%22%3A%22Tong%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Nicolas%22%2C%22lastName%22%3A%22Chia%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Eliu%22%2C%22lastName%22%3A%22Huerta%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Hao%22%2C%22lastName%22%3A%22Peng%22%7D%5D%2C%22abstractNote%22%3A%22While%20large%20language%20models%20%28LLMs%29%20with%20reasoning%20capabilities%20are%20progressing%20rapidly%20on%20high-school%20math%20competitions%20and%20coding%2C%20can%20they%20reason%20effectively%20through%20complex%2C%20open-ended%20challenges%20found%20in%20frontier%20physics%20research%3F%20And%20crucially%2C%20what%20kinds%20of%20reasoning%20tasks%20do%20physicists%20want%20LLMs%20to%20assist%20with%3F%20To%20address%20these%20questions%2C%20we%20present%20the%20CritPt%20%28Complex%20Research%20using%20Integrated%20Thinking%20-%20Physics%20Test%2C%20pronounced%20%26quot%3Bcritical%20point%26quot%3B%29%2C%20the%20first%20benchmark%20designed%20to%20test%20LLMs%20on%20unpublished%2C%20research-level%20reasoning%20tasks%20that%20broadly%20covers%20modern%20physics%20research%20areas%2C%20including%20condensed%20matter%2C%20quantum%20physics%2C%20atomic%2C%20molecular%20%26amp%3B%20optical%20physics%2C%20astrophysics%2C%20high%20energy%20physics%2C%20mathematical%20physics%2C%20statistical%20physics%2C%20nuclear%20physics%2C%20nonlinear%20dynamics%2C%20fluid%20dynamics%20and%20biophysics.%20CritPt%20consists%20of%2071%20composite%20research%20challenges%20designed%20to%20simulate%20full-scale%20research%20projects%20at%20the%20entry%20level%2C%20which%20are%20also%20decomposed%20to%20190%20simpler%20checkpoint%20tasks%20for%20more%20fine-grained%20insights.%20All%20problems%20are%20newly%20created%20by%2050%2B%20active%20physics%20researchers%20based%20on%20their%20own%20research.%20Every%20problem%20is%20hand-curated%20to%20admit%20a%20guess-resistant%20and%20machine-verifiable%20answer%20and%20is%20evaluated%20by%20an%20automated%20grading%20pipeline%20heavily%20customized%20for%20advanced%20physics-specific%20output%20formats.%20We%20find%20that%20while%20current%20state-of-the-art%20LLMs%20show%20early%20promise%20on%20isolated%20checkpoints%2C%20they%20remain%20far%20from%20being%20able%20to%20reliably%20solve%20full%20research-scale%20challenges%3A%20the%20best%20average%20accuracy%20among%20base%20models%20is%20only%205.7%25%2C%20achieved%20by%20GPT-5%20%28high%29%2C%20moderately%20rising%20to%20around%2010%25%20when%20equipped%20with%20coding%20tools.%20Through%20the%20realistic%20yet%20standardized%20evaluation%20offered%20by%20CritPt%2C%20we%20highlight%20a%20large%20disconnect%20between%20current%20model%20capabilities%20and%20realistic%20physics%20research%20demands%2C%20offering%20a%20foundation%20to%20guide%20the%20development%20of%20scientifically%20grounded%20AI%20tools.%22%2C%22genre%22%3A%22%22%2C%22repository%22%3A%22arXiv%22%2C%22archiveID%22%3A%22%22%2C%22date%22%3A%222025%22%2C%22DOI%22%3A%2210.48550%5C%2FARXIV.2509.26574%22%2C%22citationKey%22%3A%22%22%2C%22url%22%3A%22https%3A%5C%2F%5C%2Farxiv.org%5C%2Fabs%5C%2F2509.26574%22%2C%22language%22%3A%22%22%2C%22collections%22%3A%5B%223NXZNVBX%22%5D%2C%22dateModified%22%3A%222025-12-05T18%3A41%3A25Z%22%7D%7D%2C%7B%22key%22%3A%22IQL8TVNS%22%2C%22library%22%3A%7B%22id%22%3A5854943%7D%2C%22meta%22%3A%7B%22creatorSummary%22%3A%22Lian%20et%20al.%22%2C%22parsedDate%22%3A%222025%22%2C%22numChildren%22%3A0%7D%2C%22bib%22%3A%22%26lt%3Bdiv%20class%3D%26quot%3Bcsl-bib-body%26quot%3B%20style%3D%26quot%3Bline-height%3A%202%3B%20%26quot%3B%26gt%3B%5Cn%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-entry%26quot%3B%20style%3D%26quot%3Bclear%3A%20left%3B%20%26quot%3B%26gt%3B%5Cn%20%20%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-left-margin%26quot%3B%20style%3D%26quot%3Bfloat%3A%20left%3B%20padding-right%3A%200.5em%3B%20text-align%3A%20right%3B%20width%3A%201em%3B%26quot%3B%26gt%3B1.%26lt%3B%5C%2Fdiv%26gt%3B%26lt%3Bdiv%20class%3D%26quot%3Bcsl-right-inline%26quot%3B%20style%3D%26quot%3Bmargin%3A%200%20.4em%200%201.5em%3B%26quot%3B%26gt%3BLian%2C%20X.%2C%20Tanaka%2C%20M.%2C%20Ruwase%2C%20O.%20%26amp%3B%20Zhang%2C%20M.%20SuperOffload%3A%20Unleashing%20the%20Power%20of%20Large-Scale%20LLM%20Training%20on%20Superchips.%20Preprint%20at%20%26lt%3Ba%20class%3D%26%23039%3Bzp-DOIURL%26%23039%3B%20href%3D%26%23039%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2509.21271%26%23039%3B%26gt%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2509.21271%26lt%3B%5C%2Fa%26gt%3B%20%282025%29.%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%20%20%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%26lt%3B%5C%2Fdiv%26gt%3B%22%2C%22data%22%3A%7B%22itemType%22%3A%22preprint%22%2C%22title%22%3A%22SuperOffload%3A%20Unleashing%20the%20Power%20of%20Large-Scale%20LLM%20Training%20on%20Superchips%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Xinyu%22%2C%22lastName%22%3A%22Lian%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Masahiro%22%2C%22lastName%22%3A%22Tanaka%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Olatunji%22%2C%22lastName%22%3A%22Ruwase%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Minjia%22%2C%22lastName%22%3A%22Zhang%22%7D%5D%2C%22abstractNote%22%3A%22The%20emergence%20of%20Superchips%20represents%20a%20significant%20advancement%20in%20next-generation%20AI%20hardware.%20These%20Superchips%20employ%20a%20tightly%20coupled%20heterogeneous%20architecture%20that%20integrates%20GPU%20and%20CPU%20on%20the%20same%20package%2C%20which%20offers%20unprecedented%20computational%20power.%20However%2C%20there%20has%20been%20scant%20research%20investigating%20how%20LLM%20training%20benefits%20from%20this%20new%20architecture.%20In%20this%20work%2C%20for%20the%20first%20time%2C%20we%20study%20LLM%20training%20solutions%20based%20on%20offloading%20for%20Superchips.%20We%20observe%20important%20differences%20between%20Superchips%20and%20traditional%20loosely-coupled%20GPU-CPU%20architecture%2C%20which%20necessitate%20revisiting%20prevailing%20assumptions%20about%20offloading.%20Based%20on%20that%2C%20we%20present%20SuperOffload%2C%20a%20Superchip-centric%20offloading%20system%20that%20simultaneously%20uses%20Hopper%20GPU%2C%20Grace%20CPU%2C%20and%20NVLink-C2C%20interconnect%20more%20efficiently.%20SuperOffload%20accomplishes%20this%20via%20a%20combination%20of%20techniques%2C%20such%20as%20adaptive%20weight%20offloading%2C%20bucketization%20repartitioning%2C%20Superchip-aware%20casting%2C%20speculative%20execution%2C%20and%20a%20highly%20optimized%20Adam%20optimizer%20for%20Grace%20CPUs.%20Our%20evaluation%20of%20SuperOffload%20on%20NVIDIA%20GH200%20demonstrates%20up%20to%202.5x%20throughput%20improvement%20compared%20to%20state-of-the-art%20offloading-based%20systems%2C%20enabling%20training%20of%20up%20to%2025B%20model%20on%20a%20single%20Superchip%20while%20achieving%20high%20training%20throughput.%20We%20also%20extend%20SuperOffload%20with%20ZeRO-style%20data%20parallelism%20and%20DeepSpeed-Ulysses%20sequence%20parallelism%2C%20enabling%20training%20of%2013B%20model%20with%20sequence%20lengths%20up%20to%201%20million%20tokens%20on%208%20GH200%20while%20achieving%2055%25%20MFU.%22%2C%22genre%22%3A%22%22%2C%22repository%22%3A%22arXiv%22%2C%22archiveID%22%3A%22%22%2C%22date%22%3A%222025%22%2C%22DOI%22%3A%2210.48550%5C%2FARXIV.2509.21271%22%2C%22citationKey%22%3A%22%22%2C%22url%22%3A%22https%3A%5C%2F%5C%2Farxiv.org%5C%2Fabs%5C%2F2509.21271%22%2C%22language%22%3A%22%22%2C%22collections%22%3A%5B%223NXZNVBX%22%5D%2C%22dateModified%22%3A%222025-12-05T18%3A28%3A26Z%22%7D%7D%2C%7B%22key%22%3A%2248SB6SER%22%2C%22library%22%3A%7B%22id%22%3A5854943%7D%2C%22meta%22%3A%7B%22creatorSummary%22%3A%22D%5Cu00edaz-Ibarra%20et%20al.%22%2C%22parsedDate%22%3A%222025-09-29%22%2C%22numChildren%22%3A0%7D%2C%22bib%22%3A%22%26lt%3Bdiv%20class%3D%26quot%3Bcsl-bib-body%26quot%3B%20style%3D%26quot%3Bline-height%3A%202%3B%20%26quot%3B%26gt%3B%5Cn%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-entry%26quot%3B%20style%3D%26quot%3Bclear%3A%20left%3B%20%26quot%3B%26gt%3B%5Cn%20%20%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-left-margin%26quot%3B%20style%3D%26quot%3Bfloat%3A%20left%3B%20padding-right%3A%200.5em%3B%20text-align%3A%20right%3B%20width%3A%201em%3B%26quot%3B%26gt%3B1.%26lt%3B%5C%2Fdiv%26gt%3B%26lt%3Bdiv%20class%3D%26quot%3Bcsl-right-inline%26quot%3B%20style%3D%26quot%3Bmargin%3A%200%20.4em%200%201.5em%3B%26quot%3B%26gt%3BD%26%23xED%3Baz-Ibarra%2C%20O.%20H.%20%26lt%3Bi%26gt%3Bet%20al.%26lt%3B%5C%2Fi%26gt%3B%20TChem-atm%20%28v2.0.0%29%3A%20Scalable%20Performance-Portable%20Multiphase%20Atmospheric%20Chemistry.%20Preprint%20at%20%26lt%3Ba%20class%3D%26%23039%3Bzp-DOIURL%26%23039%3B%20href%3D%26%23039%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.5194%5C%2Fegusphere-2025-4376%26%23039%3B%26gt%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.5194%5C%2Fegusphere-2025-4376%26lt%3B%5C%2Fa%26gt%3B%20%282025%29.%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%20%20%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%26lt%3B%5C%2Fdiv%26gt%3B%22%2C%22data%22%3A%7B%22itemType%22%3A%22preprint%22%2C%22title%22%3A%22TChem-atm%20%28v2.0.0%29%3A%20Scalable%20Performance-Portable%20Multiphase%20Atmospheric%20Chemistry%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Oscar%20H.%22%2C%22lastName%22%3A%22D%5Cu00edaz-Ibarra%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Samuel%20G.%22%2C%22lastName%22%3A%22Frederick%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Jeffrey%20H.%22%2C%22lastName%22%3A%22Curtis%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Zachary%22%2C%22lastName%22%3A%22D%27Aquino%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Peter%20A.%22%2C%22lastName%22%3A%22Bosler%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Lekha%22%2C%22lastName%22%3A%22Patel%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Cosmin%22%2C%22lastName%22%3A%22Safta%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Matthew%22%2C%22lastName%22%3A%22West%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Nicole%22%2C%22lastName%22%3A%22Riemer%22%7D%5D%2C%22abstractNote%22%3A%22Abstract.%20We%20present%20TChem-atm%2C%20a%20performance-portable%20approach%20that%20enables%20efficient%20simulation%20of%20chemically%20detailed%20and%20multiphase%20atmospheric%20chemistry%20on%20modern%20heterogeneous%20computing%20architectures.%20Unlike%20previous%20efforts%20that%20rely%20on%20architecture-specific%20code%20or%20focus%20exclusively%20on%20gas-phase%20chemistry%2C%20TChem-atm%20supports%20fully%20coupled%20gas%5Cu2013aerosol%20systems%20with%20execution%20across%20CPUs%2C%20NVIDIA%20GPUs%2C%20and%20AMD%20GPUs%20through%20the%20Kokkos%20programming%20model.%20It%20integrates%20the%20flexible%20multiphase%20capabilities%20of%20the%20Community%20Atmospheric%20Model%20Chemistry%20Package%20%28CAMP%29%20with%20the%20high%20performance%20kinetic%20routines%20of%20TChem%2C%20and%20includes%20automatic%20Jacobian%20construction%20with%20support%20for%20a%20range%20of%20stiff%20ODE%20solvers.%20We%20demonstrate%20TChem-atm%26%23039%3Bs%20integration%20into%20the%20particle-resolved%20aerosol%20model%20PartMC%20and%20validate%20its%20accuracy%20against%20the%20existing%20PartMC%5Cu2013CAMP%20implementation%2C%20showing%20agreement%20within%20solver%20tolerances.%20Performance%20benchmarks%20reveal%20substantial%20speedups%20on%20GPU%20platforms%2C%20particularly%20for%20large%20particle%20populations%2C%20with%20consistent%20results%20across%20hardware%20backends.%20By%20enabling%20chemically%20detailed%2C%20multiphase%20simulations%20with%20true%20performance%20portability%20and%20host-model%20flexibility%2C%20TChem-atm%20provides%20a%20new%20foundation%20for%20next-generation%20atmospheric%20models.%22%2C%22genre%22%3A%22%22%2C%22repository%22%3A%22%22%2C%22archiveID%22%3A%22%22%2C%22date%22%3A%222025-9-29%22%2C%22DOI%22%3A%2210.5194%5C%2Fegusphere-2025-4376%22%2C%22citationKey%22%3A%22%22%2C%22url%22%3A%22https%3A%5C%2F%5C%2Fegusphere.copernicus.org%5C%2Fpreprints%5C%2F2025%5C%2Fegusphere-2025-4376%5C%2F%22%2C%22language%22%3A%22%22%2C%22collections%22%3A%5B%223NXZNVBX%22%5D%2C%22dateModified%22%3A%222025-11-20T15%3A00%3A26Z%22%7D%7D%2C%7B%22key%22%3A%22SFHB3AKG%22%2C%22library%22%3A%7B%22id%22%3A5854943%7D%2C%22meta%22%3A%7B%22creatorSummary%22%3A%22Zhao%20et%20al.%22%2C%22parsedDate%22%3A%222025%22%2C%22numChildren%22%3A0%7D%2C%22bib%22%3A%22%26lt%3Bdiv%20class%3D%26quot%3Bcsl-bib-body%26quot%3B%20style%3D%26quot%3Bline-height%3A%202%3B%20%26quot%3B%26gt%3B%5Cn%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-entry%26quot%3B%20style%3D%26quot%3Bclear%3A%20left%3B%20%26quot%3B%26gt%3B%5Cn%20%20%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-left-margin%26quot%3B%20style%3D%26quot%3Bfloat%3A%20left%3B%20padding-right%3A%200.5em%3B%20text-align%3A%20right%3B%20width%3A%201em%3B%26quot%3B%26gt%3B1.%26lt%3B%5C%2Fdiv%26gt%3B%26lt%3Bdiv%20class%3D%26quot%3Bcsl-right-inline%26quot%3B%20style%3D%26quot%3Bmargin%3A%200%20.4em%200%201.5em%3B%26quot%3B%26gt%3BZhao%2C%20Y.%2C%20LV%2C%20J.%2C%20Wu%2C%20D.%2C%20Wang%2C%20J.%20%26amp%3B%20Gooley%2C%20C.%20Are%20We%20Scaling%20the%20Right%20Thing%3F%20A%20System%20Perspective%20on%20Test-Time%20Scaling.%20Preprint%20at%20%26lt%3Ba%20class%3D%26%23039%3Bzp-DOIURL%26%23039%3B%20href%3D%26%23039%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2509.19645%26%23039%3B%26gt%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2509.19645%26lt%3B%5C%2Fa%26gt%3B%20%282025%29.%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%20%20%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%26lt%3B%5C%2Fdiv%26gt%3B%22%2C%22data%22%3A%7B%22itemType%22%3A%22preprint%22%2C%22title%22%3A%22Are%20We%20Scaling%20the%20Right%20Thing%3F%20A%20System%20Perspective%20on%20Test-Time%20Scaling%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Youpeng%22%2C%22lastName%22%3A%22Zhao%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Jinpeng%22%2C%22lastName%22%3A%22LV%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Di%22%2C%22lastName%22%3A%22Wu%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Jun%22%2C%22lastName%22%3A%22Wang%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Christopher%22%2C%22lastName%22%3A%22Gooley%22%7D%5D%2C%22abstractNote%22%3A%22Test-time%20scaling%20%28TTS%29%20has%20recently%20emerged%20as%20a%20promising%20direction%20to%20exploit%20the%20hidden%20reasoning%20capabilities%20of%20pre-trained%20large%20language%20models%20%28LLMs%29.%20However%2C%20existing%20scaling%20methods%20narrowly%20focus%20on%20the%20compute-optimal%20Pareto-frontier%2C%20ignoring%20the%20simple%20fact%20that%20compute-optimal%20is%20not%20always%20system-optimal.%20In%20this%20work%2C%20we%20propose%20a%20system-driven%20perspective%20on%20TTS%2C%20analyzing%20how%20reasoning%20models%20scale%20against%20practical%20metrics%2C%20such%20as%20latency%20and%20cost-per-token.%20By%20evaluating%20the%20impact%20of%20popular%20optimizations%20such%20as%20tensor%20parallelism%20and%20speculative%20decoding%2C%20our%20preliminary%20analysis%20reveals%20the%20limitations%20of%20current%20methods%20and%20calls%20for%20a%20paradigm%20shift%20toward%20holistic%2C%20system-aware%20evaluations%20that%20capture%20the%20true%20essence%20of%20scaling%20laws%20at%20inference%20time.%22%2C%22genre%22%3A%22%22%2C%22repository%22%3A%22arXiv%22%2C%22archiveID%22%3A%22%22%2C%22date%22%3A%222025%22%2C%22DOI%22%3A%2210.48550%5C%2FARXIV.2509.19645%22%2C%22citationKey%22%3A%22%22%2C%22url%22%3A%22https%3A%5C%2F%5C%2Farxiv.org%5C%2Fabs%5C%2F2509.19645%22%2C%22language%22%3A%22%22%2C%22collections%22%3A%5B%223NXZNVBX%22%5D%2C%22dateModified%22%3A%222025-11-20T14%3A46%3A31Z%22%7D%7D%2C%7B%22key%22%3A%22NGCZJ3M9%22%2C%22library%22%3A%7B%22id%22%3A5854943%7D%2C%22meta%22%3A%7B%22creatorSummary%22%3A%22Wilfong%20et%20al.%22%2C%22parsedDate%22%3A%222025%22%2C%22numChildren%22%3A0%7D%2C%22bib%22%3A%22%26lt%3Bdiv%20class%3D%26quot%3Bcsl-bib-body%26quot%3B%20style%3D%26quot%3Bline-height%3A%202%3B%20%26quot%3B%26gt%3B%5Cn%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-entry%26quot%3B%20style%3D%26quot%3Bclear%3A%20left%3B%20%26quot%3B%26gt%3B%5Cn%20%20%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-left-margin%26quot%3B%20style%3D%26quot%3Bfloat%3A%20left%3B%20padding-right%3A%200.5em%3B%20text-align%3A%20right%3B%20width%3A%201em%3B%26quot%3B%26gt%3B1.%26lt%3B%5C%2Fdiv%26gt%3B%26lt%3Bdiv%20class%3D%26quot%3Bcsl-right-inline%26quot%3B%20style%3D%26quot%3Bmargin%3A%200%20.4em%200%201.5em%3B%26quot%3B%26gt%3BWilfong%2C%20B.%20%26lt%3Bi%26gt%3Bet%20al.%26lt%3B%5C%2Fi%26gt%3B%20Testing%20and%20benchmarking%20emerging%20supercomputers%20via%20the%20MFC%20flow%20solver.%20Preprint%20at%20%26lt%3Ba%20class%3D%26%23039%3Bzp-DOIURL%26%23039%3B%20href%3D%26%23039%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2509.13575%26%23039%3B%26gt%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2509.13575%26lt%3B%5C%2Fa%26gt%3B%20%282025%29.%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%20%20%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%26lt%3B%5C%2Fdiv%26gt%3B%22%2C%22data%22%3A%7B%22itemType%22%3A%22preprint%22%2C%22title%22%3A%22Testing%20and%20benchmarking%20emerging%20supercomputers%20via%20the%20MFC%20flow%20solver%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Benjamin%22%2C%22lastName%22%3A%22Wilfong%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Anand%22%2C%22lastName%22%3A%22Radhakrishnan%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Henry%20A.%20Le%22%2C%22lastName%22%3A%22Berre%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Tanush%22%2C%22lastName%22%3A%22Prathi%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Stephen%22%2C%22lastName%22%3A%22Abbott%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Spencer%20H.%22%2C%22lastName%22%3A%22Bryngelson%22%7D%5D%2C%22abstractNote%22%3A%22Deploying%20new%20supercomputers%20requires%20testing%20and%20evaluation%20via%20application%20codes.%20Portable%2C%20user-friendly%20tools%20enable%20evaluation%2C%20and%20the%20Multicomponent%20Flow%20Code%20%28MFC%29%2C%20a%20computational%20fluid%20dynamics%20%28CFD%29%20code%2C%20addresses%20this%20need.%20MFC%20is%20adorned%20with%20a%20toolchain%20that%20automates%20input%20generation%2C%20compilation%2C%20batch%20job%20submission%2C%20regression%20testing%2C%20and%20benchmarking.%20The%20toolchain%20design%20enables%20users%20to%20evaluate%20compiler-hardware%20combinations%20for%20correctness%20and%20performance%20with%20limited%20software%20engineering%20experience.%20As%20with%20other%20PDE%20solvers%2C%20wall%20time%20per%20spatially%20discretized%20grid%20point%20serves%20as%20a%20figure%20of%20merit.%20We%20present%20MFC%20benchmarking%20results%20for%20five%20generations%20of%20NVIDIA%20GPUs%2C%20three%20generations%20of%20AMD%20GPUs%2C%20and%20various%20CPU%20architectures%2C%20utilizing%20Intel%2C%20Cray%2C%20NVIDIA%2C%20AMD%2C%20and%20GNU%20compilers.%20These%20tests%20have%20revealed%20compiler%20bugs%20and%20regressions%20on%20recent%20machines%20such%20as%20Frontier%20and%20El%20Capitan.%20MFC%20has%20benchmarked%20approximately%2050%20compute%20devices%20and%205%20flagship%20supercomputers.%22%2C%22genre%22%3A%22%22%2C%22repository%22%3A%22arXiv%22%2C%22archiveID%22%3A%22%22%2C%22date%22%3A%222025%22%2C%22DOI%22%3A%2210.48550%5C%2FARXIV.2509.13575%22%2C%22citationKey%22%3A%22%22%2C%22url%22%3A%22https%3A%5C%2F%5C%2Farxiv.org%5C%2Fabs%5C%2F2509.13575%22%2C%22language%22%3A%22%22%2C%22collections%22%3A%5B%223NXZNVBX%22%5D%2C%22dateModified%22%3A%222025-09-22T22%3A36%3A06Z%22%7D%7D%2C%7B%22key%22%3A%229NXQI4NG%22%2C%22library%22%3A%7B%22id%22%3A5854943%7D%2C%22meta%22%3A%7B%22creatorSummary%22%3A%22Bazavov%20et%20al.%22%2C%22parsedDate%22%3A%222025%22%2C%22numChildren%22%3A0%7D%2C%22bib%22%3A%22%26lt%3Bdiv%20class%3D%26quot%3Bcsl-bib-body%26quot%3B%20style%3D%26quot%3Bline-height%3A%202%3B%20%26quot%3B%26gt%3B%5Cn%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-entry%26quot%3B%20style%3D%26quot%3Bclear%3A%20left%3B%20%26quot%3B%26gt%3B%5Cn%20%20%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-left-margin%26quot%3B%20style%3D%26quot%3Bfloat%3A%20left%3B%20padding-right%3A%200.5em%3B%20text-align%3A%20right%3B%20width%3A%201em%3B%26quot%3B%26gt%3B1.%26lt%3B%5C%2Fdiv%26gt%3B%26lt%3Bdiv%20class%3D%26quot%3Bcsl-right-inline%26quot%3B%20style%3D%26quot%3Bmargin%3A%200%20.4em%200%201.5em%3B%26quot%3B%26gt%3BBazavov%2C%20A.%20%26lt%3Bi%26gt%3Bet%20al.%26lt%3B%5C%2Fi%26gt%3B%20High-Precision%20Scale%20Setting%20with%20the%20Omega-Baryon%20Mass%20and%20Gradient%20Flow.%20Preprint%20at%20%26lt%3Ba%20class%3D%26%23039%3Bzp-DOIURL%26%23039%3B%20href%3D%26%23039%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2509.14367%26%23039%3B%26gt%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2509.14367%26lt%3B%5C%2Fa%26gt%3B%20%282025%29.%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%20%20%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%26lt%3B%5C%2Fdiv%26gt%3B%22%2C%22data%22%3A%7B%22itemType%22%3A%22preprint%22%2C%22title%22%3A%22High-Precision%20Scale%20Setting%20with%20the%20Omega-Baryon%20Mass%20and%20Gradient%20Flow%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Alexei%22%2C%22lastName%22%3A%22Bazavov%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Claude%20W.%22%2C%22lastName%22%3A%22Bernard%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22David%20A.%22%2C%22lastName%22%3A%22Clarke%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Carleton%22%2C%22lastName%22%3A%22DeTar%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Aida%20X.%22%2C%22lastName%22%3A%22El-Khadra%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Elvira%22%2C%22lastName%22%3A%22G%5Cu00e1miz%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Steven%22%2C%22lastName%22%3A%22Gottlieb%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Anthony%20V.%22%2C%22lastName%22%3A%22Grebe%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Urs%20M.%22%2C%22lastName%22%3A%22Heller%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Leon%22%2C%22lastName%22%3A%22Hostetler%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22William%20I.%22%2C%22lastName%22%3A%22Jay%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Hwancheol%22%2C%22lastName%22%3A%22Jeong%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Andreas%20S.%22%2C%22lastName%22%3A%22Kronfeld%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Yin%22%2C%22lastName%22%3A%22Lin%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Shaun%22%2C%22lastName%22%3A%22Lahert%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Jack%22%2C%22lastName%22%3A%22Laiho%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Michael%22%2C%22lastName%22%3A%22Lynch%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Andrew%20T.%22%2C%22lastName%22%3A%22Lytle%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Aaron%20S.%22%2C%22lastName%22%3A%22Meyer%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Ethan%20T.%22%2C%22lastName%22%3A%22Neil%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Curtis%20T.%22%2C%22lastName%22%3A%22Peterson%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22James%20N.%22%2C%22lastName%22%3A%22Simone%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Jacob%20W.%22%2C%22lastName%22%3A%22Sitison%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Ruth%20S.%22%2C%22lastName%22%3A%22Van%20de%20Water%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Alejandro%22%2C%22lastName%22%3A%22Vaquero%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Michael%20L.%22%2C%22lastName%22%3A%22Wagman%22%7D%5D%2C%22abstractNote%22%3A%22The%20gradient-flow%20scale%20%24w_0%24%20in%20lattice%20QCD%20is%20determined%20using%20the%20mass%20of%20the%20%24%5Cu03a9%5E-%24%20baryon%20to%20set%20the%20physical%20scale.%20Nine%20ensembles%20using%20the%20highly%20improved%20staggered%20quark%20%28HISQ%29%20action%20with%20lattice%20spacings%20of%200.15%20fm%20down%20to%200.04%20fm%20are%20used%2C%20seven%20of%20which%20have%20nearly%20physical%20light-quark%20masses.%20Electromagnetic%20corrections%20to%20the%20%24%5Cu03a9%5E-%24%20mass%20are%20defined%20in%20order%20to%20compute%20a%20pure-QCD%20%24%5Cu03a9%24%20mass.%20The%20final%20result%20is%20%24w_0%20%3D%200.17187%2868%29%24%20fm%2C%20corresponding%20to%20a%20relative%20uncertainty%20of%200.40%25%20and%20a%20central%20value%20in%20good%20agreement%20with%20previous%20calculations%20in%20the%20literature.%22%2C%22genre%22%3A%22%22%2C%22repository%22%3A%22arXiv%22%2C%22archiveID%22%3A%22%22%2C%22date%22%3A%222025%22%2C%22DOI%22%3A%2210.48550%5C%2FARXIV.2509.14367%22%2C%22citationKey%22%3A%22%22%2C%22url%22%3A%22https%3A%5C%2F%5C%2Farxiv.org%5C%2Fabs%5C%2F2509.14367%22%2C%22language%22%3A%22%22%2C%22collections%22%3A%5B%223NXZNVBX%22%5D%2C%22dateModified%22%3A%222025-09-22T22%3A33%3A27Z%22%7D%7D%2C%7B%22key%22%3A%22F88ENFI7%22%2C%22library%22%3A%7B%22id%22%3A5854943%7D%2C%22meta%22%3A%7B%22creatorSummary%22%3A%22Yazdani-Jahromi%20et%20al.%22%2C%22parsedDate%22%3A%222025%22%2C%22numChildren%22%3A0%7D%2C%22bib%22%3A%22%26lt%3Bdiv%20class%3D%26quot%3Bcsl-bib-body%26quot%3B%20style%3D%26quot%3Bline-height%3A%202%3B%20%26quot%3B%26gt%3B%5Cn%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-entry%26quot%3B%20style%3D%26quot%3Bclear%3A%20left%3B%20%26quot%3B%26gt%3B%5Cn%20%20%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-left-margin%26quot%3B%20style%3D%26quot%3Bfloat%3A%20left%3B%20padding-right%3A%200.5em%3B%20text-align%3A%20right%3B%20width%3A%201em%3B%26quot%3B%26gt%3B1.%26lt%3B%5C%2Fdiv%26gt%3B%26lt%3Bdiv%20class%3D%26quot%3Bcsl-right-inline%26quot%3B%20style%3D%26quot%3Bmargin%3A%200%20.4em%200%201.5em%3B%26quot%3B%26gt%3BYazdani-Jahromi%2C%20M.%2C%20Yalabadi%2C%20A.%20K.%20%26amp%3B%20Garibay%2C%20O.%20O.%20Equi-mRNA%3A%20Protein%20Translation%20Equivariant%20Encoding%20for%20mRNA%20Language%20Models.%20Preprint%20at%20%26lt%3Ba%20class%3D%26%23039%3Bzp-DOIURL%26%23039%3B%20href%3D%26%23039%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2508.15103%26%23039%3B%26gt%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2508.15103%26lt%3B%5C%2Fa%26gt%3B%20%282025%29.%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%20%20%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%26lt%3B%5C%2Fdiv%26gt%3B%22%2C%22data%22%3A%7B%22itemType%22%3A%22preprint%22%2C%22title%22%3A%22Equi-mRNA%3A%20Protein%20Translation%20Equivariant%20Encoding%20for%20mRNA%20Language%20Models%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Mehdi%22%2C%22lastName%22%3A%22Yazdani-Jahromi%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Ali%20Khodabandeh%22%2C%22lastName%22%3A%22Yalabadi%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Ozlem%20Ozmen%22%2C%22lastName%22%3A%22Garibay%22%7D%5D%2C%22abstractNote%22%3A%22The%20growing%20importance%20of%20mRNA%20therapeutics%20and%20synthetic%20biology%20highlights%20the%20need%20for%20models%20that%20capture%20the%20latent%20structure%20of%20synonymous%20codon%20%28different%20triplets%20encoding%20the%20same%20amino%20acid%29%20usage%2C%20which%20subtly%20modulates%20translation%20efficiency%20and%20gene%20expression.%20While%20recent%20efforts%20incorporate%20codon-level%20inductive%20biases%20through%20auxiliary%20objectives%2C%20they%20often%20fall%20short%20of%20explicitly%20modeling%20the%20structured%20relationships%20that%20arise%20from%20the%20genetic%20code%26%23039%3Bs%20inherent%20symmetries.%20We%20introduce%20Equi-mRNA%2C%20the%20first%20codon-level%20equivariant%20mRNA%20language%20model%20that%20explicitly%20encodes%20synonymous%20codon%20symmetries%20as%20cyclic%20subgroups%20of%202D%20Special%20Orthogonal%20matrix%20%28SO%282%29%29.%20By%20combining%20group-theoretic%20priors%20with%20an%20auxiliary%20equivariance%20loss%20and%20symmetry-aware%20pooling%2C%20Equi-mRNA%20learns%20biologically%20grounded%20representations%20that%20outperform%20vanilla%20baselines%20across%20multiple%20axes.%20On%20downstream%20property-prediction%20tasks%20including%20expression%2C%20stability%2C%20and%20riboswitch%20switching%20Equi-mRNA%20delivers%20up%20to%20approximately%2010%25%20improvements%20in%20accuracy.%20In%20sequence%20generation%2C%20it%20produces%20mRNA%20constructs%20that%20are%20up%20to%20approximately%204x%20more%20realistic%20under%20Frechet%20BioDistance%20metrics%20and%20approximately%2028%25%20better%20preserve%20functional%20properties%20compared%20to%20vanilla%20baseline.%20Interpretability%20analyses%20further%20reveal%20that%20learned%20codon-rotation%20distributions%20recapitulate%20known%20GC-content%20biases%20and%20tRNA%20abundance%20patterns%2C%20offering%20novel%20insights%20into%20codon%20usage.%20Equi-mRNA%20establishes%20a%20new%20biologically%20principled%20paradigm%20for%20mRNA%20modeling%2C%20with%20significant%20implications%20for%20the%20design%20of%20next-generation%20therapeutics.%22%2C%22genre%22%3A%22%22%2C%22repository%22%3A%22arXiv%22%2C%22archiveID%22%3A%22%22%2C%22date%22%3A%222025%22%2C%22DOI%22%3A%2210.48550%5C%2FARXIV.2508.15103%22%2C%22citationKey%22%3A%22%22%2C%22url%22%3A%22https%3A%5C%2F%5C%2Farxiv.org%5C%2Fabs%5C%2F2508.15103%22%2C%22language%22%3A%22%22%2C%22collections%22%3A%5B%223NXZNVBX%22%5D%2C%22dateModified%22%3A%222025-09-22T22%3A21%3A56Z%22%7D%7D%2C%7B%22key%22%3A%22BZTCE6NW%22%2C%22library%22%3A%7B%22id%22%3A5854943%7D%2C%22meta%22%3A%7B%22creatorSummary%22%3A%22Yu%20et%20al.%22%2C%22parsedDate%22%3A%222025%22%2C%22numChildren%22%3A0%7D%2C%22bib%22%3A%22%26lt%3Bdiv%20class%3D%26quot%3Bcsl-bib-body%26quot%3B%20style%3D%26quot%3Bline-height%3A%202%3B%20%26quot%3B%26gt%3B%5Cn%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-entry%26quot%3B%20style%3D%26quot%3Bclear%3A%20left%3B%20%26quot%3B%26gt%3B%5Cn%20%20%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-left-margin%26quot%3B%20style%3D%26quot%3Bfloat%3A%20left%3B%20padding-right%3A%200.5em%3B%20text-align%3A%20right%3B%20width%3A%201em%3B%26quot%3B%26gt%3B1.%26lt%3B%5C%2Fdiv%26gt%3B%26lt%3Bdiv%20class%3D%26quot%3Bcsl-right-inline%26quot%3B%20style%3D%26quot%3Bmargin%3A%200%20.4em%200%201.5em%3B%26quot%3B%26gt%3BYu%2C%20J.%2C%20Taneja%2C%20A.%2C%20Lin%2C%20J.%20%26amp%3B%20Zhang%2C%20M.%20VoltanaLLM%3A%20Feedback-Driven%20Frequency%20Control%20and%20State-Space%20Routing%20for%20Energy-Efficient%20LLM%20Serving.%20Preprint%20at%20%26lt%3Ba%20class%3D%26%23039%3Bzp-DOIURL%26%23039%3B%20href%3D%26%23039%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2509.04827%26%23039%3B%26gt%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2509.04827%26lt%3B%5C%2Fa%26gt%3B%20%282025%29.%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%20%20%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%26lt%3B%5C%2Fdiv%26gt%3B%22%2C%22data%22%3A%7B%22itemType%22%3A%22preprint%22%2C%22title%22%3A%22VoltanaLLM%3A%20Feedback-Driven%20Frequency%20Control%20and%20State-Space%20Routing%20for%20Energy-Efficient%20LLM%20Serving%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Jiahuan%22%2C%22lastName%22%3A%22Yu%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Aryan%22%2C%22lastName%22%3A%22Taneja%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Junfeng%22%2C%22lastName%22%3A%22Lin%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Minjia%22%2C%22lastName%22%3A%22Zhang%22%7D%5D%2C%22abstractNote%22%3A%22Modern%20Large%20Language%20Model%20%28LLM%29%20serving%20systems%20increasingly%20support%20interactive%20applications%2C%20like%20real-time%20chat%20assistants%2C%20code%20generation%20tools%2C%20and%20agentic%20workflows.%20However%2C%20the%20soaring%20energy%20cost%20of%20LLM%20inference%20presents%20a%20growing%20challenge%20for%20sustainable%20and%20cost-effective%20deployment.%20This%20paper%20introduces%20VoltanaLLM%2C%20a%20system%20for%20SLO-aware%2C%20energy-efficient%20LLM%20serving%2C%20built%20from%20a%20control%20theory%20perspective.%20VoltanaLLM%20co-designs%20frequency%20scaling%20and%20request%20routing%20in%20emerging%20prefill%5C%2Fdecode%20disaggregated%20architectures%2C%20leveraging%20their%20decoupled%20execution%20to%20enable%20fine-grained%20phase-specific%20control.%20It%20consists%20of%20a%20feedback-driven%20frequency%20controller%20that%20dynamically%20adapts%20GPU%20frequency%20for%20prefill%20and%20decode%20phases%2C%20and%20a%20state-space%20router%20that%20explores%20routing%20decisions%20across%20frequency-scaled%20instances%20to%20minimize%20energy%20under%20latency%20constraints.%20We%20implement%20VoltanaLLM%20in%20SGLang%20and%20evaluate%20its%20performance%20over%20multiple%20state-of-the-art%20LLMs%20and%20real-world%20datasets.%20The%20results%20demonstrate%20that%20VoltanaLLM%20achieves%20up%20to%2036.3%25%20energy%20savings%20while%20maintaining%20near-perfect%20SLO%20attainment%20rate%2C%20paving%20the%20way%20for%20sustainable%20and%20intelligent%20LLM%20serving.%20Code%20of%20VoltanaLLM%20is%20open-sourced%20on%20GitHub%3A%20https%3A%5C%2F%5C%2Fgithub.com%5C%2FSupercomputing-System-AI-Lab%5C%2FVoltanaLLM.%22%2C%22genre%22%3A%22%22%2C%22repository%22%3A%22arXiv%22%2C%22archiveID%22%3A%22%22%2C%22date%22%3A%222025%22%2C%22DOI%22%3A%2210.48550%5C%2FARXIV.2509.04827%22%2C%22citationKey%22%3A%22%22%2C%22url%22%3A%22https%3A%5C%2F%5C%2Farxiv.org%5C%2Fabs%5C%2F2509.04827%22%2C%22language%22%3A%22%22%2C%22collections%22%3A%5B%223NXZNVBX%22%5D%2C%22dateModified%22%3A%222025-09-22T22%3A20%3A18Z%22%7D%7D%2C%7B%22key%22%3A%222VWZPGXI%22%2C%22library%22%3A%7B%22id%22%3A5854943%7D%2C%22meta%22%3A%7B%22creatorSummary%22%3A%22Ba%5Cu00f1o-Medina%20et%20al.%22%2C%22parsedDate%22%3A%222025-07-17%22%2C%22numChildren%22%3A0%7D%2C%22bib%22%3A%22%26lt%3Bdiv%20class%3D%26quot%3Bcsl-bib-body%26quot%3B%20style%3D%26quot%3Bline-height%3A%202%3B%20%26quot%3B%26gt%3B%5Cn%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-entry%26quot%3B%20style%3D%26quot%3Bclear%3A%20left%3B%20%26quot%3B%26gt%3B%5Cn%20%20%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-left-margin%26quot%3B%20style%3D%26quot%3Bfloat%3A%20left%3B%20padding-right%3A%200.5em%3B%20text-align%3A%20right%3B%20width%3A%201em%3B%26quot%3B%26gt%3B1.%26lt%3B%5C%2Fdiv%26gt%3B%26lt%3Bdiv%20class%3D%26quot%3Bcsl-right-inline%26quot%3B%20style%3D%26quot%3Bmargin%3A%200%20.4em%200%201.5em%3B%26quot%3B%26gt%3BBa%26%23xF1%3Bo-Medina%2C%20J.%20%26lt%3Bi%26gt%3Bet%20al.%26lt%3B%5C%2Fi%26gt%3B%20A%20Regional%20High%20Resolution%20AI%20Weather%20Model%20for%20the%20Prediction%20of%20Atmospheric%20Rivers%20and%20Extreme%20Precipitation.%20Preprint%20at%20%26lt%3Ba%20class%3D%26%23039%3Bzp-DOIURL%26%23039%3B%20href%3D%26%23039%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.21203%5C%2Frs.3.rs-7087242%5C%2Fv1%26%23039%3B%26gt%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.21203%5C%2Frs.3.rs-7087242%5C%2Fv1%26lt%3B%5C%2Fa%26gt%3B%20%282025%29.%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%20%20%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%26lt%3B%5C%2Fdiv%26gt%3B%22%2C%22data%22%3A%7B%22itemType%22%3A%22preprint%22%2C%22title%22%3A%22A%20Regional%20High%20Resolution%20AI%20Weather%20Model%20for%20the%20Prediction%20of%20Atmospheric%20Rivers%20and%20Extreme%20Precipitation%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Jorge%22%2C%22lastName%22%3A%22Ba%5Cu00f1o-Medina%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Agniv%22%2C%22lastName%22%3A%22Sengupta%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Daniel%22%2C%22lastName%22%3A%22Steinhoff%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Patrick%22%2C%22lastName%22%3A%22Mulrooney%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Thomas%22%2C%22lastName%22%3A%22Nipen%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Mario%22%2C%22lastName%22%3A%22Santa-Cruz%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Yanbo%22%2C%22lastName%22%3A%22Nie%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Luca%20Delle%22%2C%22lastName%22%3A%22Monache%22%7D%5D%2C%22abstractNote%22%3A%22Abstract%20%5Cn%20%20%20%20%20%20%20%20%20%20Accurate%20precipitation%20forecasting%20often%20relies%20on%20high-resolution%20numerical%20weather%20prediction%20%28NWP%29%20models%2C%20which%20are%20essential%20for%20capturing%20fine-scale%20and%20nonlinear%20atmospheric%20dynamics.%20However%2C%20the%20computational%20demands%20of%20these%20models%20can%20be%20substantial.%20Leveraging%20recent%20advancements%20in%20artificial%20intelligence%20%28AI%29%2C%20we%20present%20a%20stretched-grid%20AI-driven%20weather%20model%20with%206-km%20horizontal%20grid%20increments%20over%20the%20Western%20United%20States%20and%20approximately%2031-km%20in%20other%20regions%20globally.%20The%20model%20employs%20an%20autoregressive%20framework%20to%20generate%20forecasts%20in%20minutes%20and%20is%20evaluated%20against%20global%20and%20regional%20NWP%20systems%2C%20as%20well%20as%20a%20lower-resolution%20AI%20model.%20Our%20results%20show%20that%20the%20regional%20AI%20model%20reduces%2024-hour%20accumulated%20precipitation%20errors%2C%20performs%20competitively%20with%20the%20regional%20NWP%20model%2C%20and%20effectively%20captures%20extreme%20precipitation%20events%2C%20particularly%20those%20linked%20to%20atmospheric%20rivers%2C%20which%20global%20coarser%20models%20often%20underestimate.%20This%20work%20underscores%20the%20potential%20of%20regional%2C%20high-resolution%20AI%20models%20for%20precipitation%20forecasting%20at%20km-scales%2C%20and%20discusses%20some%20of%20the%20challenges%20for%20future%20development.%22%2C%22genre%22%3A%22%22%2C%22repository%22%3A%22%22%2C%22archiveID%22%3A%22%22%2C%22date%22%3A%222025-07-17%22%2C%22DOI%22%3A%2210.21203%5C%2Frs.3.rs-7087242%5C%2Fv1%22%2C%22citationKey%22%3A%22%22%2C%22url%22%3A%22https%3A%5C%2F%5C%2Fwww.researchsquare.com%5C%2Farticle%5C%2Frs-7087242%5C%2Fv1%22%2C%22language%22%3A%22%22%2C%22collections%22%3A%5B%223NXZNVBX%22%5D%2C%22dateModified%22%3A%222025-09-18T21%3A57%3A22Z%22%7D%7D%2C%7B%22key%22%3A%22NXRE2SNR%22%2C%22library%22%3A%7B%22id%22%3A5854943%7D%2C%22meta%22%3A%7B%22creatorSummary%22%3A%22Yuan%20et%20al.%22%2C%22parsedDate%22%3A%222025%22%2C%22numChildren%22%3A0%7D%2C%22bib%22%3A%22%26lt%3Bdiv%20class%3D%26quot%3Bcsl-bib-body%26quot%3B%20style%3D%26quot%3Bline-height%3A%202%3B%20%26quot%3B%26gt%3B%5Cn%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-entry%26quot%3B%20style%3D%26quot%3Bclear%3A%20left%3B%20%26quot%3B%26gt%3B%5Cn%20%20%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-left-margin%26quot%3B%20style%3D%26quot%3Bfloat%3A%20left%3B%20padding-right%3A%200.5em%3B%20text-align%3A%20right%3B%20width%3A%201em%3B%26quot%3B%26gt%3B1.%26lt%3B%5C%2Fdiv%26gt%3B%26lt%3Bdiv%20class%3D%26quot%3Bcsl-right-inline%26quot%3B%20style%3D%26quot%3Bmargin%3A%200%20.4em%200%201.5em%3B%26quot%3B%26gt%3BYuan%2C%20Y.%20%26lt%3Bi%26gt%3Bet%20al.%26lt%3B%5C%2Fi%26gt%3B%20X-MoE%3A%20Enabling%20Scalable%20Training%20for%20Emerging%20Mixture-of-Experts%20Architectures%20on%20HPC%20Platforms.%20Preprint%20at%20%26lt%3Ba%20class%3D%26%23039%3Bzp-DOIURL%26%23039%3B%20href%3D%26%23039%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2508.13337%26%23039%3B%26gt%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2508.13337%26lt%3B%5C%2Fa%26gt%3B%20%282025%29.%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%20%20%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%26lt%3B%5C%2Fdiv%26gt%3B%22%2C%22data%22%3A%7B%22itemType%22%3A%22preprint%22%2C%22title%22%3A%22X-MoE%3A%20Enabling%20Scalable%20Training%20for%20Emerging%20Mixture-of-Experts%20Architectures%20on%20HPC%20Platforms%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Yueming%22%2C%22lastName%22%3A%22Yuan%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Ahan%22%2C%22lastName%22%3A%22Gupta%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Jianping%22%2C%22lastName%22%3A%22Li%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Sajal%22%2C%22lastName%22%3A%22Dash%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Feiyi%22%2C%22lastName%22%3A%22Wang%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Minjia%22%2C%22lastName%22%3A%22Zhang%22%7D%5D%2C%22abstractNote%22%3A%22Emerging%20expert-specialized%20Mixture-of-Experts%20%28MoE%29%20architectures%2C%20such%20as%20DeepSeek-MoE%2C%20deliver%20strong%20model%20quality%20through%20fine-grained%20expert%20segmentation%20and%20large%20top-k%20routing.%20However%2C%20their%20scalability%20is%20limited%20by%20substantial%20activation%20memory%20overhead%20and%20costly%20all-to-all%20communication.%20Furthermore%2C%20current%20MoE%20training%20systems%20-%20primarily%20optimized%20for%20NVIDIA%20GPUs%20-%20perform%20suboptimally%20on%20non-NVIDIA%20platforms%2C%20leaving%20significant%20computational%20potential%20untapped.%20In%20this%20work%2C%20we%20present%20X-MoE%2C%20a%20novel%20MoE%20training%20system%20designed%20to%20deliver%20scalable%20training%20performance%20for%20next-generation%20MoE%20architectures.%20X-MoE%20achieves%20this%20via%20several%20novel%20techniques%2C%20including%20efficient%20padding-free%20MoE%20training%20with%20cross-platform%20kernels%2C%20redundancy-bypassing%20dispatch%2C%20and%20hybrid%20parallelism%20with%20sequence-sharded%20MoE%20blocks.%20Our%20evaluation%20on%20the%20Frontier%20supercomputer%2C%20powered%20by%20AMD%20MI250X%20GPUs%2C%20shows%20that%20X-MoE%20scales%20DeepSeek-style%20MoEs%20up%20to%20545%20billion%20parameters%20across%201024%20GPUs%20-%2010x%20larger%20than%20the%20largest%20trainable%20model%20with%20existing%20methods%20under%20the%20same%20hardware%20budget%2C%20while%20maintaining%20high%20training%20throughput.%20The%20source%20code%20of%20X-MoE%20is%20available%20at%20https%3A%5C%2F%5C%2Fgithub.com%5C%2FSupercomputing-System-AI-Lab%5C%2FX-MoE.%22%2C%22genre%22%3A%22%22%2C%22repository%22%3A%22arXiv%22%2C%22archiveID%22%3A%22%22%2C%22date%22%3A%222025%22%2C%22DOI%22%3A%2210.48550%5C%2FARXIV.2508.13337%22%2C%22citationKey%22%3A%22%22%2C%22url%22%3A%22https%3A%5C%2F%5C%2Farxiv.org%5C%2Fabs%5C%2F2508.13337%22%2C%22language%22%3A%22%22%2C%22collections%22%3A%5B%223NXZNVBX%22%5D%2C%22dateModified%22%3A%222025-09-17T20%3A33%3A04Z%22%7D%7D%2C%7B%22key%22%3A%22R8WTMRZR%22%2C%22library%22%3A%7B%22id%22%3A5854943%7D%2C%22meta%22%3A%7B%22creatorSummary%22%3A%22Adams%20and%20Bienz%22%2C%22parsedDate%22%3A%222025%22%2C%22numChildren%22%3A0%7D%2C%22bib%22%3A%22%26lt%3Bdiv%20class%3D%26quot%3Bcsl-bib-body%26quot%3B%20style%3D%26quot%3Bline-height%3A%202%3B%20%26quot%3B%26gt%3B%5Cn%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-entry%26quot%3B%20style%3D%26quot%3Bclear%3A%20left%3B%20%26quot%3B%26gt%3B%5Cn%20%20%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-left-margin%26quot%3B%20style%3D%26quot%3Bfloat%3A%20left%3B%20padding-right%3A%200.5em%3B%20text-align%3A%20right%3B%20width%3A%201em%3B%26quot%3B%26gt%3B1.%26lt%3B%5C%2Fdiv%26gt%3B%26lt%3Bdiv%20class%3D%26quot%3Bcsl-right-inline%26quot%3B%20style%3D%26quot%3Bmargin%3A%200%20.4em%200%201.5em%3B%26quot%3B%26gt%3BAdams%2C%20M.%20%26amp%3B%20Bienz%2C%20A.%20Optimizing%20Allreduce%20Operations%20for%20Heterogeneous%20Architectures%20with%20Multiple%20Processes%20per%20GPU.%20Preprint%20at%20%26lt%3Ba%20class%3D%26%23039%3Bzp-DOIURL%26%23039%3B%20href%3D%26%23039%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2508.13397%26%23039%3B%26gt%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2508.13397%26lt%3B%5C%2Fa%26gt%3B%20%282025%29.%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%20%20%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%26lt%3B%5C%2Fdiv%26gt%3B%22%2C%22data%22%3A%7B%22itemType%22%3A%22preprint%22%2C%22title%22%3A%22Optimizing%20Allreduce%20Operations%20for%20Heterogeneous%20Architectures%20with%20Multiple%20Processes%20per%20GPU%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Michael%22%2C%22lastName%22%3A%22Adams%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Amanda%22%2C%22lastName%22%3A%22Bienz%22%7D%5D%2C%22abstractNote%22%3A%22Large%20inter-GPU%20all-reduce%20operations%2C%20prevalent%20throughout%20deep%20learning%2C%20are%20bottlenecked%20by%20communication%20costs.%20Emerging%20heterogeneous%20architectures%20are%20comprised%20of%20complex%20nodes%2C%20often%20containing%20%244%24%20GPUs%20and%20dozens%20to%20hundreds%20of%20CPU%20cores%20per%20node.%20Parallel%20applications%20are%20typically%20accelerated%20on%20the%20available%20GPUs%2C%20using%20only%20a%20single%20CPU%20core%20per%20GPU%20while%20the%20remaining%20cores%20sit%20idle.%20This%20paper%20presents%20novel%20optimizations%20to%20large%20GPU-aware%20all-reduce%20operations%2C%20extending%20lane-aware%20reductions%20to%20the%20GPUs%2C%20and%20notably%20using%20multiple%20CPU%20cores%20per%20GPU%20to%20accelerate%20these%20operations.%20These%20multi-CPU-accelerated%20GPU-aware%20lane%20all-reduces%20yield%20speedup%20of%20up%20to%20%242.45%24x%20for%20large%20MPI%20all-reduces%20across%20the%20NVIDIA%20A100%20GPUs%20of%20NCSA%26%23039%3Bs%20Delta%20supercomputer.%20Finally%2C%20the%20approach%20is%20extended%20to%20NVIDIA%26%23039%3Bs%20and%20AMD%26%23039%3Bs%20collective%20communication%20libraries%2C%20achieving%20speedup%20of%20up%20to%20%241.77%24x%20and%20%241.71%24x%2C%20respectively%2C%20across%20%242%24%20state-of-the-art%20supercomputers.%22%2C%22genre%22%3A%22%22%2C%22repository%22%3A%22arXiv%22%2C%22archiveID%22%3A%22%22%2C%22date%22%3A%222025%22%2C%22DOI%22%3A%2210.48550%5C%2FARXIV.2508.13397%22%2C%22citationKey%22%3A%22%22%2C%22url%22%3A%22https%3A%5C%2F%5C%2Farxiv.org%5C%2Fabs%5C%2F2508.13397%22%2C%22language%22%3A%22%22%2C%22collections%22%3A%5B%223NXZNVBX%22%5D%2C%22dateModified%22%3A%222025-09-17T20%3A26%3A21Z%22%7D%7D%2C%7B%22key%22%3A%22FNWGFPGL%22%2C%22library%22%3A%7B%22id%22%3A5854943%7D%2C%22meta%22%3A%7B%22creatorSummary%22%3A%22Gong%20et%20al.%22%2C%22parsedDate%22%3A%222025%22%2C%22numChildren%22%3A0%7D%2C%22bib%22%3A%22%26lt%3Bdiv%20class%3D%26quot%3Bcsl-bib-body%26quot%3B%20style%3D%26quot%3Bline-height%3A%202%3B%20%26quot%3B%26gt%3B%5Cn%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-entry%26quot%3B%20style%3D%26quot%3Bclear%3A%20left%3B%20%26quot%3B%26gt%3B%5Cn%20%20%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-left-margin%26quot%3B%20style%3D%26quot%3Bfloat%3A%20left%3B%20padding-right%3A%200.5em%3B%20text-align%3A%20right%3B%20width%3A%201em%3B%26quot%3B%26gt%3B1.%26lt%3B%5C%2Fdiv%26gt%3B%26lt%3Bdiv%20class%3D%26quot%3Bcsl-right-inline%26quot%3B%20style%3D%26quot%3Bmargin%3A%200%20.4em%200%201.5em%3B%26quot%3B%26gt%3BGong%2C%20Y.%2C%20Zhu%2C%20Z.%20%26amp%3B%20Zhang%2C%20M.%20InstantEdit%3A%20Text-Guided%20Few-Step%20Image%20Editing%20with%20Piecewise%20Rectified%20Flow.%20Preprint%20at%20%26lt%3Ba%20class%3D%26%23039%3Bzp-DOIURL%26%23039%3B%20href%3D%26%23039%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2508.06033%26%23039%3B%26gt%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2508.06033%26lt%3B%5C%2Fa%26gt%3B%20%282025%29.%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%20%20%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%26lt%3B%5C%2Fdiv%26gt%3B%22%2C%22data%22%3A%7B%22itemType%22%3A%22preprint%22%2C%22title%22%3A%22InstantEdit%3A%20Text-Guided%20Few-Step%20Image%20Editing%20with%20Piecewise%20Rectified%20Flow%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Yiming%22%2C%22lastName%22%3A%22Gong%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Zhen%22%2C%22lastName%22%3A%22Zhu%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Minjia%22%2C%22lastName%22%3A%22Zhang%22%7D%5D%2C%22abstractNote%22%3A%22We%20propose%20a%20fast%20text-guided%20image%20editing%20method%20called%20InstantEdit%20based%20on%20the%20RectifiedFlow%20framework%2C%20which%20is%20structured%20as%20a%20few-step%20editing%20process%20that%20preserves%20critical%20content%20while%20following%20closely%20to%20textual%20instructions.%20Our%20approach%20leverages%20the%20straight%20sampling%20trajectories%20of%20RectifiedFlow%20by%20introducing%20a%20specialized%20inversion%20strategy%20called%20PerRFI.%20To%20maintain%20consistent%20while%20editable%20results%20for%20RectifiedFlow%20model%2C%20we%20further%20propose%20a%20novel%20regeneration%20method%2C%20Inversion%20Latent%20Injection%2C%20which%20effectively%20reuses%20latent%20information%20obtained%20during%20inversion%20to%20facilitate%20more%20coherent%20and%20detailed%20regeneration.%20Additionally%2C%20we%20propose%20a%20Disentangled%20Prompt%20Guidance%20technique%20to%20balance%20editability%20with%20detail%20preservation%2C%20and%20integrate%20a%20Canny-conditioned%20ControlNet%20to%20incorporate%20structural%20cues%20and%20suppress%20artifacts.%20Evaluation%20on%20the%20PIE%20image%20editing%20dataset%20demonstrates%20that%20InstantEdit%20is%20not%20only%20fast%20but%20also%20achieves%20better%20qualitative%20and%20quantitative%20results%20compared%20to%20state-of-the-art%20few-step%20editing%20methods.%22%2C%22genre%22%3A%22%22%2C%22repository%22%3A%22arXiv%22%2C%22archiveID%22%3A%22%22%2C%22date%22%3A%222025%22%2C%22DOI%22%3A%2210.48550%5C%2FARXIV.2508.06033%22%2C%22citationKey%22%3A%22%22%2C%22url%22%3A%22https%3A%5C%2F%5C%2Farxiv.org%5C%2Fabs%5C%2F2508.06033%22%2C%22language%22%3A%22%22%2C%22collections%22%3A%5B%223NXZNVBX%22%5D%2C%22dateModified%22%3A%222025-09-15T22%3A50%3A28Z%22%7D%7D%2C%7B%22key%22%3A%22MUIDAQQ7%22%2C%22library%22%3A%7B%22id%22%3A5854943%7D%2C%22meta%22%3A%7B%22creatorSummary%22%3A%22Wu%20et%20al.%22%2C%22parsedDate%22%3A%222025-07-28%22%2C%22numChildren%22%3A0%7D%2C%22bib%22%3A%22%26lt%3Bdiv%20class%3D%26quot%3Bcsl-bib-body%26quot%3B%20style%3D%26quot%3Bline-height%3A%202%3B%20%26quot%3B%26gt%3B%5Cn%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-entry%26quot%3B%20style%3D%26quot%3Bclear%3A%20left%3B%20%26quot%3B%26gt%3B%5Cn%20%20%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-left-margin%26quot%3B%20style%3D%26quot%3Bfloat%3A%20left%3B%20padding-right%3A%200.5em%3B%20text-align%3A%20right%3B%20width%3A%201em%3B%26quot%3B%26gt%3B1.%26lt%3B%5C%2Fdiv%26gt%3B%26lt%3Bdiv%20class%3D%26quot%3Bcsl-right-inline%26quot%3B%20style%3D%26quot%3Bmargin%3A%200%20.4em%200%201.5em%3B%26quot%3B%26gt%3BWu%2C%20T.%20%26lt%3Bi%26gt%3Bet%20al.%26lt%3B%5C%2Fi%26gt%3B%20Spatial%20Heterogeneity%20Alters%20the%20Dynamics%20of%20the%20Yeast%20Galactose%20Switch%3A%20Insights%20from%204D%20RDME%26%23x2013%3BODE%20Hybrid%20Simulations.%20Preprint%20at%20%26lt%3Ba%20class%3D%26%23039%3Bzp-DOIURL%26%23039%3B%20href%3D%26%23039%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.1101%5C%2F2025.07.23.666409%26%23039%3B%26gt%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.1101%5C%2F2025.07.23.666409%26lt%3B%5C%2Fa%26gt%3B%20%282025%29.%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%20%20%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%26lt%3B%5C%2Fdiv%26gt%3B%22%2C%22data%22%3A%7B%22itemType%22%3A%22preprint%22%2C%22title%22%3A%22Spatial%20Heterogeneity%20Alters%20the%20Dynamics%20of%20the%20Yeast%20Galactose%20Switch%3A%20Insights%20from%204D%20RDME%5Cu2013ODE%20Hybrid%20Simulations%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Tianyu%22%2C%22lastName%22%3A%22Wu%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Marie-Christin%22%2C%22lastName%22%3A%22Spindler%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Emmy%22%2C%22lastName%22%3A%22Earnest%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Henry%22%2C%22lastName%22%3A%22Li%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Zane%20R.%22%2C%22lastName%22%3A%22Thornburg%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Julia%22%2C%22lastName%22%3A%22Mahamid%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Zaida%22%2C%22lastName%22%3A%22Luthey-Schulten%22%7D%5D%2C%22abstractNote%22%3A%22Abstract%20%5Cn%20%20%20%20%20%20%20%20%20%20%20%5Cn%20%20%20%20%20%20%20%20%20%20%20%20We%20present%20the%20first%204D%20simulations%20of%20the%20galactose%20switch%20in%20%5Cn%20%20%20%20%20%20%20%20%20%20%20%20Saccharomyces%20cerevisiae%20%5Cn%20%20%20%20%20%20%20%20%20%20%20%20using%20a%20hybrid%20framework%20that%20integrates%20reaction%20diffusion%20master%20equations%20%28RDMEs%29%20and%20ordinary%20differential%20equations%20%28ODEs%29.%20Using%20the%20GPU-based%20Lattice%20Microbes%20program%2C%20genetic%20information%20processes%20were%20simulated%20stochastically%20while%20a%20simplified%20metabolism%20was%20modeled%20deterministically.%20Cell%20geometry%20was%20constructed%20based%20on%20recently%20acquired%20cryo-electron%20tomograms%28cryo-ET%29%2C%20which%20allows%20us%20to%20quantify%20and%20differentiate%20between%20cytosolic%20ribosomes%20and%20endoplasmic%20reticulum%28ER%29%20associated%20ribosomes.%20This%20allows%20us%20to%20simulate%20realistic%20numbers%20of%20available%20ribosomes%20for%20ER-associated%20translation%20of%20proteins%20destined%20for%20the%20cell%20membrane%2C%20like%20the%20galactose%20transporter%20G2.%20Our%20simulations%20show%20that%20an%20extracellular%2011%20mM%20galactose%20triggers%20expression%20of%2010k-15k%20galactose%20transporters%20within%2060%20minutes.%20We%20also%20benchmarked%20the%20multi-GPU%20solver%5Cu2019s%20performance%20under%20various%20spatial%20decompositions.%20Our%20work%20underscores%20the%20challenges%20of%20whole%20cell%20modeling%20of%20eukaryotic%20cells%20and%20the%20effects%20of%20their%20inherent%20spatial%20heterogeneity.%20%5Cn%20%20%20%20%20%20%20%20%20%20%20%5Cn%20%20%20%20%20%20%20%20%20%20%20%5Cn%20%20%20%20%20%20%20%20%20%20%20%20Author%20summary%20%5Cn%20%20%20%20%20%20%20%20%20%20%20%20Cells%20must%20quickly%20adapt%20when%20their%20food%20source%20changes.%20In%20baker%5Cu2019s%20yeast%2C%20a%20genetic%20switch%20turns%20on%20dozens%20of%20genes%20so%20the%20cell%20can%20use%20the%20sugar%20galactose%20instead%20of%20glucose.%20We%20built%20the%20first%20four-dimensional%20computer%20model%20that%20follows%20every%20key%20molecule%20in%20a%20realistic%2C%20tomogram-based%20yeast%20cell%20as%20this%20switch%20is%20involved.%20The%20model%20combines%20random%2C%20molecule-by-molecule%20chemistry%20for%20rare%20events%20with%20faster%2C%20deterministic%20equations%20for%20abundant%20reactions%2C%20and%20runs%20on%20graphics-processing%20units%20powerful%20enough%20to%20simulate%20an%20entire%20cell%20in%20an%20hour.%20By%20explicitly%20distinguishing%20ribosomes%20that%20float%20freely%20from%20those%20anchored%20on%20the%20endoplasmic%20reticulum%2C%20we%20tracked%20where%20the%20galactose%20transporter%20Gal2%20is%20made%20and%20how%20it%20travels%20to%20the%20cell%20membrane.%20The%20simulations%20predict%20that%20about%2010%2C000%5Cu201315%2C000%20transporters%20appear%20within%20an%20hour%20after%20the%20cell%20senses%2011%20mM%20galactose%2C%20and%20reveal%20that%20the%20maze-like%20endoplasmic%20reticulum%20slows%20their%20delivery.%20Our%20framework%20shows%20how%20cellular%20geography%20alters%20genetic%20information%20processing%20production%20and%20paves%20the%20way%20for%20whole-cell%20simulations%20of%20more%20complex%20organisms.%22%2C%22genre%22%3A%22%22%2C%22repository%22%3A%22%22%2C%22archiveID%22%3A%22%22%2C%22date%22%3A%222025-07-28%22%2C%22DOI%22%3A%2210.1101%5C%2F2025.07.23.666409%22%2C%22citationKey%22%3A%22%22%2C%22url%22%3A%22http%3A%5C%2F%5C%2Fbiorxiv.org%5C%2Flookup%5C%2Fdoi%5C%2F10.1101%5C%2F2025.07.23.666409%22%2C%22language%22%3A%22en%22%2C%22collections%22%3A%5B%223NXZNVBX%22%5D%2C%22dateModified%22%3A%222025-08-20T17%3A51%3A05Z%22%7D%7D%2C%7B%22key%22%3A%22XCF3YYAI%22%2C%22library%22%3A%7B%22id%22%3A5854943%7D%2C%22meta%22%3A%7B%22creatorSummary%22%3A%22Zhu%20et%20al.%22%2C%22parsedDate%22%3A%222025-08-05%22%2C%22numChildren%22%3A0%7D%2C%22bib%22%3A%22%26lt%3Bdiv%20class%3D%26quot%3Bcsl-bib-body%26quot%3B%20style%3D%26quot%3Bline-height%3A%202%3B%20%26quot%3B%26gt%3B%5Cn%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-entry%26quot%3B%20style%3D%26quot%3Bclear%3A%20left%3B%20%26quot%3B%26gt%3B%5Cn%20%20%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-left-margin%26quot%3B%20style%3D%26quot%3Bfloat%3A%20left%3B%20padding-right%3A%200.5em%3B%20text-align%3A%20right%3B%20width%3A%201em%3B%26quot%3B%26gt%3B1.%26lt%3B%5C%2Fdiv%26gt%3B%26lt%3Bdiv%20class%3D%26quot%3Bcsl-right-inline%26quot%3B%20style%3D%26quot%3Bmargin%3A%200%20.4em%200%201.5em%3B%26quot%3B%26gt%3BZhu%2C%20Z.%20%26lt%3Bi%26gt%3Bet%20al.%26lt%3B%5C%2Fi%26gt%3B%20Understanding%20the%20Landscape%20of%20Ampere%20GPU%20Memory%20Errors.%20Preprint%20at%20%26lt%3Ba%20class%3D%26%23039%3Bzp-DOIURL%26%23039%3B%20href%3D%26%23039%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FarXiv.2508.03513%26%23039%3B%26gt%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FarXiv.2508.03513%26lt%3B%5C%2Fa%26gt%3B%20%282025%29.%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%20%20%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%26lt%3B%5C%2Fdiv%26gt%3B%22%2C%22data%22%3A%7B%22itemType%22%3A%22preprint%22%2C%22title%22%3A%22Understanding%20the%20Landscape%20of%20Ampere%20GPU%20Memory%20Errors%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Zhu%22%2C%22lastName%22%3A%22Zhu%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Yu%22%2C%22lastName%22%3A%22Sun%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Dhatri%22%2C%22lastName%22%3A%22Parakal%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Bo%22%2C%22lastName%22%3A%22Fang%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Steven%22%2C%22lastName%22%3A%22Farrell%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Gregory%20H.%22%2C%22lastName%22%3A%22Bauer%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Brett%22%2C%22lastName%22%3A%22Bode%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Ian%20T.%22%2C%22lastName%22%3A%22Foster%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Michael%20E.%22%2C%22lastName%22%3A%22Papka%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22William%22%2C%22lastName%22%3A%22Gropp%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Zhao%22%2C%22lastName%22%3A%22Zhang%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Lishan%22%2C%22lastName%22%3A%22Yang%22%7D%5D%2C%22abstractNote%22%3A%22Graphics%20Processing%20Units%20%28GPUs%29%20have%20become%20a%20de%20facto%20solution%20for%20accelerating%20high-performance%20computing%20%28HPC%29%20applications.%20Understanding%20their%20memory%20error%20behavior%20is%20an%20essential%20step%20toward%20achieving%20efficient%20and%20reliable%20HPC%20systems.%20In%20this%20work%2C%20we%20present%20a%20large-scale%20cross-supercomputer%20study%20to%20characterize%20GPU%20memory%20reliability%2C%20covering%20three%20supercomputers%20-%20Delta%2C%20Polaris%2C%20and%20Perlmutter%20-%20all%20equipped%20with%20NVIDIA%20A100%20GPUs.%20We%20examine%20error%20logs%20spanning%2067.77%20million%20GPU%20device-hours%20across%2010%2C693%20GPUs.%20We%20compare%20error%20rates%20and%20mean-time-between-errors%20%28MTBE%29%20and%20highlight%20both%20shared%20and%20distinct%20error%20characteristics%20among%20these%20three%20systems.%20Based%20on%20these%20observations%20and%20analyses%2C%20we%20discuss%20the%20implications%20and%20lessons%20learned%2C%20focusing%20on%20the%20reliable%20operation%20of%20supercomputers%2C%20the%20choice%20of%20checkpointing%20interval%2C%20and%20the%20comparison%20of%20reliability%20characteristics%20with%20those%20of%20previous-generation%20GPUs.%20Our%20characterization%20study%20provides%20valuable%20insights%20into%20fault-tolerant%20HPC%20system%20design%20and%20operation%2C%20enabling%20more%20efficient%20execution%20of%20HPC%20applications.%22%2C%22genre%22%3A%22%22%2C%22repository%22%3A%22arXiv%22%2C%22archiveID%22%3A%22arXiv%3A2508.03513%22%2C%22date%22%3A%222025-08-05%22%2C%22DOI%22%3A%2210.48550%5C%2FarXiv.2508.03513%22%2C%22citationKey%22%3A%22%22%2C%22url%22%3A%22http%3A%5C%2F%5C%2Farxiv.org%5C%2Fabs%5C%2F2508.03513%22%2C%22language%22%3A%22%22%2C%22collections%22%3A%5B%223NXZNVBX%22%5D%2C%22dateModified%22%3A%222025-08-20T17%3A41%3A24Z%22%7D%7D%2C%7B%22key%22%3A%223XGPZ8T9%22%2C%22library%22%3A%7B%22id%22%3A5854943%7D%2C%22meta%22%3A%7B%22creatorSummary%22%3A%22Bharadwaj%20et%20al.%22%2C%22parsedDate%22%3A%222025%22%2C%22numChildren%22%3A0%7D%2C%22bib%22%3A%22%26lt%3Bdiv%20class%3D%26quot%3Bcsl-bib-body%26quot%3B%20style%3D%26quot%3Bline-height%3A%202%3B%20%26quot%3B%26gt%3B%5Cn%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-entry%26quot%3B%20style%3D%26quot%3Bclear%3A%20left%3B%20%26quot%3B%26gt%3B%5Cn%20%20%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-left-margin%26quot%3B%20style%3D%26quot%3Bfloat%3A%20left%3B%20padding-right%3A%200.5em%3B%20text-align%3A%20right%3B%20width%3A%201em%3B%26quot%3B%26gt%3B1.%26lt%3B%5C%2Fdiv%26gt%3B%26lt%3Bdiv%20class%3D%26quot%3Bcsl-right-inline%26quot%3B%20style%3D%26quot%3Bmargin%3A%200%20.4em%200%201.5em%3B%26quot%3B%26gt%3BBharadwaj%2C%20S.%20%26lt%3Bi%26gt%3Bet%20al.%26lt%3B%5C%2Fi%26gt%3B%20OpenBEATs%3A%20A%20Fully%20Open-Source%20General-Purpose%20Audio%20Encoder.%20Preprint%20at%20%26lt%3Ba%20class%3D%26%23039%3Bzp-DOIURL%26%23039%3B%20href%3D%26%23039%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2507.14129%26%23039%3B%26gt%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2507.14129%26lt%3B%5C%2Fa%26gt%3B%20%282025%29.%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%20%20%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%26lt%3B%5C%2Fdiv%26gt%3B%22%2C%22data%22%3A%7B%22itemType%22%3A%22preprint%22%2C%22title%22%3A%22OpenBEATs%3A%20A%20Fully%20Open-Source%20General-Purpose%20Audio%20Encoder%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Shikhar%22%2C%22lastName%22%3A%22Bharadwaj%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Samuele%22%2C%22lastName%22%3A%22Cornell%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Kwanghee%22%2C%22lastName%22%3A%22Choi%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Satoru%22%2C%22lastName%22%3A%22Fukayama%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Hye-jin%22%2C%22lastName%22%3A%22Shim%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Soham%22%2C%22lastName%22%3A%22Deshmukh%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Shinji%22%2C%22lastName%22%3A%22Watanabe%22%7D%5D%2C%22abstractNote%22%3A%22Masked%20token%20prediction%20has%20emerged%20as%20a%20powerful%20pre-training%20objective%20across%20language%2C%20vision%2C%20and%20speech%2C%20offering%20the%20potential%20to%20unify%20these%20diverse%20modalities%20through%20a%20single%20pre-training%20task.%20However%2C%20its%20application%20for%20general%20audio%20understanding%20remains%20underexplored%2C%20with%20BEATs%20being%20the%20only%20notable%20example.%20BEATs%20has%20seen%20limited%20modifications%20due%20to%20the%20absence%20of%20open-source%20pre-training%20code.%20Furthermore%2C%20BEATs%20was%20trained%20only%20on%20AudioSet%2C%20restricting%20its%20broader%20downstream%20applicability.%20To%20address%20these%20gaps%2C%20we%20present%20OpenBEATs%2C%20an%20open-source%20framework%20that%20extends%20BEATs%20via%20multi-domain%20audio%20pre-training.%20We%20conduct%20comprehensive%20evaluations%20across%20six%20types%20of%20tasks%2C%20twenty%20five%20datasets%2C%20and%20three%20audio%20domains%2C%20including%20audio%20reasoning%20tasks%20such%20as%20audio%20question%20answering%2C%20entailment%2C%20and%20captioning.%20OpenBEATs%20achieves%20state-of-the-art%20performance%20on%20six%20bioacoustics%20datasets%2C%20two%20environmental%20sound%20datasets%20and%20five%20reasoning%20datasets%2C%20performing%20better%20than%20models%20exceeding%20a%20billion%20parameters%20at%20one-fourth%20their%20parameter%20size.%20These%20results%20demonstrate%20the%20effectiveness%20of%20multi-domain%20datasets%20and%20masked%20token%20prediction%20task%20to%20learn%20general-purpose%20audio%20representations.%20To%20promote%20further%20research%20and%20reproducibility%2C%20we%20release%20all%20pre-training%20and%20evaluation%20code%2C%20pretrained%20and%20fine-tuned%20checkpoints%2C%20and%20training%20logs%20at%20https%3A%5C%2F%5C%2Fshikhar-s.github.io%5C%2FOpenBEATs%22%2C%22genre%22%3A%22%22%2C%22repository%22%3A%22arXiv%22%2C%22archiveID%22%3A%22%22%2C%22date%22%3A%222025%22%2C%22DOI%22%3A%2210.48550%5C%2FARXIV.2507.14129%22%2C%22citationKey%22%3A%22%22%2C%22url%22%3A%22https%3A%5C%2F%5C%2Farxiv.org%5C%2Fabs%5C%2F2507.14129%22%2C%22language%22%3A%22%22%2C%22collections%22%3A%5B%223NXZNVBX%22%5D%2C%22dateModified%22%3A%222025-08-05T18%3A56%3A22Z%22%7D%7D%2C%7B%22key%22%3A%224XKZTMPE%22%2C%22library%22%3A%7B%22id%22%3A5854943%7D%2C%22meta%22%3A%7B%22creatorSummary%22%3A%22Bode%20et%20al.%22%2C%22parsedDate%22%3A%222025-07-20%22%2C%22numChildren%22%3A0%7D%2C%22bib%22%3A%22%26lt%3Bdiv%20class%3D%26quot%3Bcsl-bib-body%26quot%3B%20style%3D%26quot%3Bline-height%3A%202%3B%20%26quot%3B%26gt%3B%5Cn%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-entry%26quot%3B%20style%3D%26quot%3Bclear%3A%20left%3B%20%26quot%3B%26gt%3B%5Cn%20%20%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-left-margin%26quot%3B%20style%3D%26quot%3Bfloat%3A%20left%3B%20padding-right%3A%200.5em%3B%20text-align%3A%20right%3B%20width%3A%201em%3B%26quot%3B%26gt%3B1.%26lt%3B%5C%2Fdiv%26gt%3B%26lt%3Bdiv%20class%3D%26quot%3Bcsl-right-inline%26quot%3B%20style%3D%26quot%3Bmargin%3A%200%20.4em%200%201.5em%3B%26quot%3B%26gt%3BBode%2C%20B.%2C%20Bauer%2C%20G.%2C%20Herriott%2C%20L.%2C%20Kindratenko%2C%20V.%20%26amp%3B%20Gropp%2C%20W.%20DeltaAI%3A%20A%20National%20Resource%20for%20AI%5C%2FML%20Research.%20in%20%26lt%3Bi%26gt%3BPractice%20and%20Experience%20in%20Advanced%20Research%20Computing%202025%3A%20The%20Power%20of%20Collaboration%26lt%3B%5C%2Fi%26gt%3B%201%26%23x2013%3B4%20%28ACM%2C%20Columbus%20Ohio%20USA%2C%202025%29.%20%26lt%3Ba%20class%3D%26%23039%3Bzp-DOIURL%26%23039%3B%20href%3D%26%23039%3Bhttp%3A%5C%2F%5C%2Fdoi.org%5C%2F10.1145%5C%2F3708035.3736062%26%23039%3B%26gt%3Bhttp%3A%5C%2F%5C%2Fdoi.org%5C%2F10.1145%5C%2F3708035.3736062%26lt%3B%5C%2Fa%26gt%3B.%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%20%20%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%26lt%3B%5C%2Fdiv%26gt%3B%22%2C%22data%22%3A%7B%22itemType%22%3A%22conferencePaper%22%2C%22title%22%3A%22DeltaAI%3A%20A%20National%20Resource%20for%20AI%5C%2FML%20Research%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Brett%22%2C%22lastName%22%3A%22Bode%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Gregory%22%2C%22lastName%22%3A%22Bauer%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Laura%22%2C%22lastName%22%3A%22Herriott%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Volodymyr%22%2C%22lastName%22%3A%22Kindratenko%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22William%22%2C%22lastName%22%3A%22Gropp%22%7D%5D%2C%22abstractNote%22%3A%22%22%2C%22date%22%3A%222025-07-20%22%2C%22proceedingsTitle%22%3A%22Practice%20and%20Experience%20in%20Advanced%20Research%20Computing%202025%3A%20The%20Power%20of%20Collaboration%22%2C%22conferenceName%22%3A%22PEARC%20%2725%3A%20Practice%20and%20Experience%20in%20Advanced%20Research%20Computing%22%2C%22language%22%3A%22en%22%2C%22DOI%22%3A%2210.1145%5C%2F3708035.3736062%22%2C%22ISBN%22%3A%229798400713989%22%2C%22url%22%3A%22https%3A%5C%2F%5C%2Fdl.acm.org%5C%2Fdoi%5C%2F10.1145%5C%2F3708035.3736062%22%2C%22collections%22%3A%5B%223NXZNVBX%22%5D%2C%22dateModified%22%3A%222025-08-05T18%3A47%3A46Z%22%7D%7D%2C%7B%22key%22%3A%22WQJXN8ZG%22%2C%22library%22%3A%7B%22id%22%3A5854943%7D%2C%22meta%22%3A%7B%22creatorSummary%22%3A%22Wang%20et%20al.%22%2C%22parsedDate%22%3A%222025%22%2C%22numChildren%22%3A0%7D%2C%22bib%22%3A%22%26lt%3Bdiv%20class%3D%26quot%3Bcsl-bib-body%26quot%3B%20style%3D%26quot%3Bline-height%3A%202%3B%20%26quot%3B%26gt%3B%5Cn%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-entry%26quot%3B%20style%3D%26quot%3Bclear%3A%20left%3B%20%26quot%3B%26gt%3B%5Cn%20%20%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-left-margin%26quot%3B%20style%3D%26quot%3Bfloat%3A%20left%3B%20padding-right%3A%200.5em%3B%20text-align%3A%20right%3B%20width%3A%201em%3B%26quot%3B%26gt%3B1.%26lt%3B%5C%2Fdiv%26gt%3B%26lt%3Bdiv%20class%3D%26quot%3Bcsl-right-inline%26quot%3B%20style%3D%26quot%3Bmargin%3A%200%20.4em%200%201.5em%3B%26quot%3B%26gt%3BWang%2C%20R.%2C%20Li%2C%20Y.%2C%20Fung%2C%20Y.%20R.%20%26amp%3B%20Zhang%2C%20T.%20Let%26%23x2019%3Bs%20Reason%20Formally%3A%20Natural-Formal%20Hybrid%20Reasoning%20Enhances%20LLM%26%23x2019%3Bs%20Math%20Capability.%20Preprint%20at%20%26lt%3Ba%20class%3D%26%23039%3Bzp-DOIURL%26%23039%3B%20href%3D%26%23039%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2505.23703%26%23039%3B%26gt%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2505.23703%26lt%3B%5C%2Fa%26gt%3B%20%282025%29.%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%20%20%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%26lt%3B%5C%2Fdiv%26gt%3B%22%2C%22data%22%3A%7B%22itemType%22%3A%22preprint%22%2C%22title%22%3A%22Let%27s%20Reason%20Formally%3A%20Natural-Formal%20Hybrid%20Reasoning%20Enhances%20LLM%27s%20Math%20Capability%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Ruida%22%2C%22lastName%22%3A%22Wang%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Yuxin%22%2C%22lastName%22%3A%22Li%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Yi%20R.%22%2C%22lastName%22%3A%22Fung%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Tong%22%2C%22lastName%22%3A%22Zhang%22%7D%5D%2C%22abstractNote%22%3A%22Enhancing%20the%20mathematical%20reasoning%20capabilities%20of%20LLMs%20has%20garnered%20significant%20attention%20in%20both%20the%20mathematical%20and%20computer%20science%20communities.%20Recent%20works%20have%20made%20substantial%20progress%20in%20both%20Natural%20Language%20%28NL%29%20reasoning%20and%20Formal%20Language%20%28FL%29%20reasoning%20by%20leveraging%20the%20potential%20of%20pure%20Reinforcement%20Learning%20%28RL%29%20methods%20on%20base%20models.%20However%2C%20RL%20approaches%20struggle%20to%20impart%20new%20capabilities%20not%20presented%20in%20the%20base%20model%2C%20highlighting%20the%20need%20to%20integrate%20more%20knowledge%20like%20FL%20into%20NL%20math%20reasoning%20effectively.%20Yet%2C%20this%20integration%20is%20challenging%20due%20to%20inherent%20disparities%20in%20problem%20structure%20and%20reasoning%20format%20between%20NL%20and%20FL.%20To%20address%20these%20challenges%2C%20we%20introduce%20%2A%2ANL-FL%20HybridReasoning%2A%2A%2C%20an%20end-to-end%20framework%20designed%20to%20incorporate%20the%20FL%20expert%20into%20NL%20math%20problem-solving.%20To%20bridge%20the%20NL%20and%20FL%20input%20format%20gap%2C%20we%20propose%20the%20%2ANL-FL%20Problem%20Alignment%2A%20method%2C%20which%20reformulates%20the%20Question-Answering%20%28QA%29%20problems%20in%20NL%20as%20existence%20theorems%20in%20FL.%20Subsequently%2C%20the%20%2AMixed%20Problem%20Input%2A%20technique%20we%20provide%20enables%20the%20FL%20reasoner%20to%20handle%20both%20QA%20and%20existence%20problems%20concurrently.%20Lastly%2C%20we%20mitigate%20the%20NL%20and%20FL%20output%20format%20gap%20in%20reasoning%20through%20an%20LLM-based%20%2AAnswer%20Extraction%2A%20mechanism.%20Comprehensive%20experiments%20demonstrate%20that%20the%20%2A%2AHybridReasoning%2A%2A%20framework%20achieves%20%2A%2A89.80%25%2A%2A%20and%20%2A%2A84.34%25%2A%2A%20accuracy%20rates%20on%20the%20MATH-500%20and%20the%20AMC%20benchmarks%2C%20surpassing%20the%20NL%20baseline%20by%204.60%25%20and%204.82%25%2C%20respectively.%20Notably%2C%20some%20problems%20resolved%20by%20our%20framework%20remain%20unsolved%20by%20the%20NL%20baseline%20model%20even%20under%20a%20larger%20number%20of%20trials.%22%2C%22genre%22%3A%22%22%2C%22repository%22%3A%22arXiv%22%2C%22archiveID%22%3A%22%22%2C%22date%22%3A%222025%22%2C%22DOI%22%3A%2210.48550%5C%2FARXIV.2505.23703%22%2C%22citationKey%22%3A%22%22%2C%22url%22%3A%22https%3A%5C%2F%5C%2Farxiv.org%5C%2Fabs%5C%2F2505.23703%22%2C%22language%22%3A%22%22%2C%22collections%22%3A%5B%223NXZNVBX%22%5D%2C%22dateModified%22%3A%222025-08-05T16%3A32%3A15Z%22%7D%7D%2C%7B%22key%22%3A%22UVDA3PC2%22%2C%22library%22%3A%7B%22id%22%3A5854943%7D%2C%22meta%22%3A%7B%22creatorSummary%22%3A%22Yao%20et%20al.%22%2C%22parsedDate%22%3A%222025%22%2C%22numChildren%22%3A0%7D%2C%22bib%22%3A%22%26lt%3Bdiv%20class%3D%26quot%3Bcsl-bib-body%26quot%3B%20style%3D%26quot%3Bline-height%3A%202%3B%20%26quot%3B%26gt%3B%5Cn%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-entry%26quot%3B%20style%3D%26quot%3Bclear%3A%20left%3B%20%26quot%3B%26gt%3B%5Cn%20%20%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-left-margin%26quot%3B%20style%3D%26quot%3Bfloat%3A%20left%3B%20padding-right%3A%200.5em%3B%20text-align%3A%20right%3B%20width%3A%201em%3B%26quot%3B%26gt%3B1.%26lt%3B%5C%2Fdiv%26gt%3B%26lt%3Bdiv%20class%3D%26quot%3Bcsl-right-inline%26quot%3B%20style%3D%26quot%3Bmargin%3A%200%20.4em%200%201.5em%3B%26quot%3B%26gt%3BYao%2C%20J.%2C%20Wang%2C%20R.%20%26amp%3B%20Zhang%2C%20T.%20FANS%20--%20Formal%20Answer%20Selection%20for%20Natural%20Language%20Math%20Reasoning%20Using%20Lean4.%20Preprint%20at%20%26lt%3Ba%20class%3D%26%23039%3Bzp-DOIURL%26%23039%3B%20href%3D%26%23039%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2503.03238%26%23039%3B%26gt%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2503.03238%26lt%3B%5C%2Fa%26gt%3B%20%282025%29.%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%20%20%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%26lt%3B%5C%2Fdiv%26gt%3B%22%2C%22data%22%3A%7B%22itemType%22%3A%22preprint%22%2C%22title%22%3A%22FANS%20--%20Formal%20Answer%20Selection%20for%20Natural%20Language%20Math%20Reasoning%20Using%20Lean4%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Jiarui%22%2C%22lastName%22%3A%22Yao%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Ruida%22%2C%22lastName%22%3A%22Wang%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Tong%22%2C%22lastName%22%3A%22Zhang%22%7D%5D%2C%22abstractNote%22%3A%22Large%20Language%20Models%20%28LLMs%29%20have%20displayed%20astonishing%20abilities%20in%20various%20tasks%2C%20especially%20in%20text%20generation%2C%20classification%2C%20question%20answering%2C%20etc.%20However%2C%20the%20reasoning%20ability%20of%20LLMs%20still%20faces%20many%20debates.%20The%20inherent%20ambiguity%20of%20Natural%20Language%20%28NL%29%20limits%20LLMs%26%23039%3B%20ability%20to%20perform%20verifiable%20reasoning%2C%20making%20its%20answers%20lack%20coherence%20and%20trustworthy%20support.%20To%20tackle%20the%20above%20problems%2C%20we%20propose%20a%20novel%20framework%20named%20FANS%3A%20Formal%20ANswer%20Selection%20for%20Natural%20Language%20Math%20Reasoning%20Using%20Lean4.%20To%20the%20best%20of%20our%20knowledge%2C%20it%20is%20the%20first%20framework%20that%20utilizes%20Lean4%20to%20enhance%20LLMs%26%23039%3B%20NL%20math%20reasoning%20ability.%20In%20particular%2C%20given%20an%20NL%20math%20question%20and%20LLM-generated%20answers%2C%20FANS%20first%20translates%20it%20into%20Lean4%20theorem%20statements.%20Then%20it%20tries%20to%20prove%20it%20using%20a%20Lean4%20prover%20and%20verify%20it%20by%20Lean4.%20Finally%2C%20it%20uses%20the%20FL%20result%20to%20assist%20in%20answer%20selection.%20It%20enhances%20LLMs%26%23039%3B%20NL%20math%20ability%20in%20providing%20a%20computer-verifiable%20solution%20for%20its%20correct%20answer%20and%20proposes%20an%20alternative%20method%20for%20answer%20selection%20beyond%20the%20reward%20model.%20Extensive%20experiments%20indicate%20the%20effectiveness%20of%20our%20framework.%20It%20can%20improve%20the%20accuracy%20rate%20of%20reward%20model%20enhanced%20LLMs%20in%20the%20MATH-500%20dataset%20by%20at%20most%201.91%25%20and%20AMC-23%20by%20at%20most%208.33%25%20on%20strong%20reward-model%20baselines.%20In%20some%20particular%20fields%20like%20number%20theory%20that%20Lean4%20experts%20in%2C%20we%20can%20even%20select%20all%20correct%20solutions.%20The%20qualitative%20analysis%20also%20shows%20our%20framework%20can%20make%20NL%20results%20formally%20backed%20by%20Lean4%20proofs.%20As%20a%20pioneering%20work%20in%20the%20corresponding%20field%2C%20we%20will%20open-source%20all%20our%20models%20and%20datasets%20to%20further%20boost%20the%20development%20of%20the%20field.%22%2C%22genre%22%3A%22%22%2C%22repository%22%3A%22arXiv%22%2C%22archiveID%22%3A%22%22%2C%22date%22%3A%222025%22%2C%22DOI%22%3A%2210.48550%5C%2FARXIV.2503.03238%22%2C%22citationKey%22%3A%22%22%2C%22url%22%3A%22https%3A%5C%2F%5C%2Farxiv.org%5C%2Fabs%5C%2F2503.03238%22%2C%22language%22%3A%22%22%2C%22collections%22%3A%5B%223NXZNVBX%22%5D%2C%22dateModified%22%3A%222025-08-05T16%3A31%3A56Z%22%7D%7D%2C%7B%22key%22%3A%22E5MYFLEQ%22%2C%22library%22%3A%7B%22id%22%3A5854943%7D%2C%22meta%22%3A%7B%22creatorSummary%22%3A%22Gladstone%20et%20al.%22%2C%22parsedDate%22%3A%222025%22%2C%22numChildren%22%3A0%7D%2C%22bib%22%3A%22%26lt%3Bdiv%20class%3D%26quot%3Bcsl-bib-body%26quot%3B%20style%3D%26quot%3Bline-height%3A%202%3B%20%26quot%3B%26gt%3B%5Cn%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-entry%26quot%3B%20style%3D%26quot%3Bclear%3A%20left%3B%20%26quot%3B%26gt%3B%5Cn%20%20%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-left-margin%26quot%3B%20style%3D%26quot%3Bfloat%3A%20left%3B%20padding-right%3A%200.5em%3B%20text-align%3A%20right%3B%20width%3A%201em%3B%26quot%3B%26gt%3B1.%26lt%3B%5C%2Fdiv%26gt%3B%26lt%3Bdiv%20class%3D%26quot%3Bcsl-right-inline%26quot%3B%20style%3D%26quot%3Bmargin%3A%200%20.4em%200%201.5em%3B%26quot%3B%26gt%3BGladstone%2C%20A.%20%26lt%3Bi%26gt%3Bet%20al.%26lt%3B%5C%2Fi%26gt%3B%20Energy-Based%20Transformers%20are%20Scalable%20Learners%20and%20Thinkers.%20Preprint%20at%20%26lt%3Ba%20class%3D%26%23039%3Bzp-DOIURL%26%23039%3B%20href%3D%26%23039%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2507.02092%26%23039%3B%26gt%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2507.02092%26lt%3B%5C%2Fa%26gt%3B%20%282025%29.%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%20%20%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%26lt%3B%5C%2Fdiv%26gt%3B%22%2C%22data%22%3A%7B%22itemType%22%3A%22preprint%22%2C%22title%22%3A%22Energy-Based%20Transformers%20are%20Scalable%20Learners%20and%20Thinkers%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Alexi%22%2C%22lastName%22%3A%22Gladstone%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Ganesh%22%2C%22lastName%22%3A%22Nanduru%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Md%20Mofijul%22%2C%22lastName%22%3A%22Islam%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Peixuan%22%2C%22lastName%22%3A%22Han%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Hyeonjeong%22%2C%22lastName%22%3A%22Ha%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Aman%22%2C%22lastName%22%3A%22Chadha%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Yilun%22%2C%22lastName%22%3A%22Du%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Heng%22%2C%22lastName%22%3A%22Ji%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Jundong%22%2C%22lastName%22%3A%22Li%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Tariq%22%2C%22lastName%22%3A%22Iqbal%22%7D%5D%2C%22abstractNote%22%3A%22Inference-time%20computation%20techniques%2C%20analogous%20to%20human%20System%202%20Thinking%2C%20have%20recently%20become%20popular%20for%20improving%20model%20performances.%20However%2C%20most%20existing%20approaches%20suffer%20from%20several%20limitations%3A%20they%20are%20modality-specific%20%28e.g.%2C%20working%20only%20in%20text%29%2C%20problem-specific%20%28e.g.%2C%20verifiable%20domains%20like%20math%20and%20coding%29%2C%20or%20require%20additional%20supervision%5C%2Ftraining%20on%20top%20of%20unsupervised%20pretraining%20%28e.g.%2C%20verifiers%20or%20verifiable%20rewards%29.%20In%20this%20paper%2C%20we%20ask%20the%20question%20%26quot%3BIs%20it%20possible%20to%20generalize%20these%20System%202%20Thinking%20approaches%2C%20and%20develop%20models%20that%20learn%20to%20think%20solely%20from%20unsupervised%20learning%3F%26quot%3B%20Interestingly%2C%20we%20find%20the%20answer%20is%20yes%2C%20by%20learning%20to%20explicitly%20verify%20the%20compatibility%20between%20inputs%20and%20candidate-predictions%2C%20and%20then%20re-framing%20prediction%20problems%20as%20optimization%20with%20respect%20to%20this%20verifier.%20Specifically%2C%20we%20train%20Energy-Based%20Transformers%20%28EBTs%29%20--%20a%20new%20class%20of%20Energy-Based%20Models%20%28EBMs%29%20--%20to%20assign%20an%20energy%20value%20to%20every%20input%20and%20candidate-prediction%20pair%2C%20enabling%20predictions%20through%20gradient%20descent-based%20energy%20minimization%20until%20convergence.%20Across%20both%20discrete%20%28text%29%20and%20continuous%20%28visual%29%20modalities%2C%20we%20find%20EBTs%20scale%20faster%20than%20the%20dominant%20Transformer%2B%2B%20approach%20during%20training%2C%20achieving%20an%20up%20to%2035%25%20higher%20scaling%20rate%20with%20respect%20to%20data%2C%20batch%20size%2C%20parameters%2C%20FLOPs%2C%20and%20depth.%20During%20inference%2C%20EBTs%20improve%20performance%20with%20System%202%20Thinking%20by%2029%25%20more%20than%20the%20Transformer%2B%2B%20on%20language%20tasks%2C%20and%20EBTs%20outperform%20Diffusion%20Transformers%20on%20image%20denoising%20while%20using%20fewer%20forward%20passes.%20Further%2C%20we%20find%20that%20EBTs%20achieve%20better%20results%20than%20existing%20models%20on%20most%20downstream%20tasks%20given%20the%20same%20or%20worse%20pretraining%20performance%2C%20suggesting%20that%20EBTs%20generalize%20better%20than%20existing%20approaches.%20Consequently%2C%20EBTs%20are%20a%20promising%20new%20paradigm%20for%20scaling%20both%20the%20learning%20and%20thinking%20capabilities%20of%20models.%22%2C%22genre%22%3A%22%22%2C%22repository%22%3A%22arXiv%22%2C%22archiveID%22%3A%22%22%2C%22date%22%3A%222025%22%2C%22DOI%22%3A%2210.48550%5C%2FARXIV.2507.02092%22%2C%22citationKey%22%3A%22%22%2C%22url%22%3A%22https%3A%5C%2F%5C%2Farxiv.org%5C%2Fabs%5C%2F2507.02092%22%2C%22language%22%3A%22%22%2C%22collections%22%3A%5B%223NXZNVBX%22%5D%2C%22dateModified%22%3A%222025-08-05T16%3A15%3A03Z%22%7D%7D%2C%7B%22key%22%3A%22LDXKTRMU%22%2C%22library%22%3A%7B%22id%22%3A5854943%7D%2C%22meta%22%3A%7B%22creatorSummary%22%3A%22Pan%20et%20al.%22%2C%22parsedDate%22%3A%222025%22%2C%22numChildren%22%3A0%7D%2C%22bib%22%3A%22%26lt%3Bdiv%20class%3D%26quot%3Bcsl-bib-body%26quot%3B%20style%3D%26quot%3Bline-height%3A%202%3B%20%26quot%3B%26gt%3B%5Cn%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-entry%26quot%3B%20style%3D%26quot%3Bclear%3A%20left%3B%20%26quot%3B%26gt%3B%5Cn%20%20%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-left-margin%26quot%3B%20style%3D%26quot%3Bfloat%3A%20left%3B%20padding-right%3A%200.5em%3B%20text-align%3A%20right%3B%20width%3A%201em%3B%26quot%3B%26gt%3B1.%26lt%3B%5C%2Fdiv%26gt%3B%26lt%3Bdiv%20class%3D%26quot%3Bcsl-right-inline%26quot%3B%20style%3D%26quot%3Bmargin%3A%200%20.4em%200%201.5em%3B%26quot%3B%26gt%3BPan%2C%20R.%20%26lt%3Bi%26gt%3Bet%20al.%26lt%3B%5C%2Fi%26gt%3B%20Adapt-Pruner%3A%20Adaptive%20Structural%20Pruning%20for%20Efficient%20Small%20Language%20Model%20Training.%20Preprint%20at%20%26lt%3Ba%20class%3D%26%23039%3Bzp-DOIURL%26%23039%3B%20href%3D%26%23039%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2502.03460%26%23039%3B%26gt%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2502.03460%26lt%3B%5C%2Fa%26gt%3B%20%282025%29.%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%20%20%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%26lt%3B%5C%2Fdiv%26gt%3B%22%2C%22data%22%3A%7B%22itemType%22%3A%22preprint%22%2C%22title%22%3A%22Adapt-Pruner%3A%20Adaptive%20Structural%20Pruning%20for%20Efficient%20Small%20Language%20Model%20Training%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Rui%22%2C%22lastName%22%3A%22Pan%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Boyao%22%2C%22lastName%22%3A%22Wang%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Shizhe%22%2C%22lastName%22%3A%22Diao%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Xingyuan%22%2C%22lastName%22%3A%22Pan%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Jipeng%22%2C%22lastName%22%3A%22Zhang%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Renjie%22%2C%22lastName%22%3A%22Pi%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Tong%22%2C%22lastName%22%3A%22Zhang%22%7D%5D%2C%22abstractNote%22%3A%22Small%20language%20models%20%28SLMs%29%20have%20attracted%20considerable%20attention%20from%20both%20academia%20and%20industry%20due%20to%20their%20broad%20range%20of%20applications%20in%20edge%20devices.%20To%20obtain%20SLMs%20with%20strong%20performance%2C%20conventional%20approaches%20either%20pre-train%20the%20models%20from%20scratch%2C%20which%20incurs%20substantial%20computational%20costs%2C%20or%20compress%5C%2Fprune%20existing%20large%20language%20models%20%28LLMs%29%2C%20which%20results%20in%20performance%20drops%20and%20falls%20short%20in%20comparison%20to%20pre-training.%20In%20this%20paper%2C%20we%20investigate%20the%20family%20of%20acceleration%20methods%20that%20involve%20both%20structured%20pruning%20and%20model%20training.%20We%20found%201%29%20layer-wise%20adaptive%20pruning%20%28Adapt-Pruner%29%20is%20extremely%20effective%20in%20LLMs%20and%20yields%20significant%20improvements%20over%20existing%20pruning%20techniques%2C%202%29%20adaptive%20pruning%20equipped%20with%20further%20training%20leads%20to%20models%20comparable%20to%20those%20pre-training%20from%20scratch%2C%203%29%20incremental%20pruning%20brings%20non-trivial%20performance%20gain%20by%20interleaving%20pruning%20with%20training%20and%20only%20removing%20a%20small%20portion%20of%20neurons%20%28%24%5C%5Csim%245%25%29%20at%20a%20time.%20Experimental%20results%20on%20LLaMA-3.1-8B%20demonstrate%20that%20Adapt-Pruner%20outperforms%20conventional%20pruning%20methods%2C%20such%20as%20LLM-Pruner%2C%20FLAP%2C%20and%20SliceGPT%2C%20by%20an%20average%20of%201%25-7%25%20in%20accuracy%20on%20commonsense%20benchmarks.%20Additionally%2C%20Adapt-Pruner%20restores%20the%20performance%20of%20MobileLLM-125M%20to%20600M%20on%20the%20MMLU%20benchmark%20with%20200%24%5C%5Ctimes%24%20fewer%20tokens%20via%20pruning%20from%20its%20larger%20counterparts%2C%20and%20discovers%20a%20new%201B%20model%20that%20surpasses%20LLaMA-3.2-1B%20in%20multiple%20benchmarks.%20The%20official%20code%20is%20released%20at%20https%3A%5C%2F%5C%2Fgithub.com%5C%2Fresearch4pan%5C%2FAdaptPruner.%22%2C%22genre%22%3A%22%22%2C%22repository%22%3A%22arXiv%22%2C%22archiveID%22%3A%22%22%2C%22date%22%3A%222025%22%2C%22DOI%22%3A%2210.48550%5C%2FARXIV.2502.03460%22%2C%22citationKey%22%3A%22%22%2C%22url%22%3A%22https%3A%5C%2F%5C%2Farxiv.org%5C%2Fabs%5C%2F2502.03460%22%2C%22language%22%3A%22%22%2C%22collections%22%3A%5B%223NXZNVBX%22%5D%2C%22dateModified%22%3A%222025-08-05T14%3A32%3A21Z%22%7D%7D%2C%7B%22key%22%3A%22NJVCX88J%22%2C%22library%22%3A%7B%22id%22%3A5854943%7D%2C%22meta%22%3A%7B%22creatorSummary%22%3A%22Wang%20et%20al.%22%2C%22parsedDate%22%3A%222025%22%2C%22numChildren%22%3A0%7D%2C%22bib%22%3A%22%26lt%3Bdiv%20class%3D%26quot%3Bcsl-bib-body%26quot%3B%20style%3D%26quot%3Bline-height%3A%202%3B%20%26quot%3B%26gt%3B%5Cn%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-entry%26quot%3B%20style%3D%26quot%3Bclear%3A%20left%3B%20%26quot%3B%26gt%3B%5Cn%20%20%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-left-margin%26quot%3B%20style%3D%26quot%3Bfloat%3A%20left%3B%20padding-right%3A%200.5em%3B%20text-align%3A%20right%3B%20width%3A%201em%3B%26quot%3B%26gt%3B1.%26lt%3B%5C%2Fdiv%26gt%3B%26lt%3Bdiv%20class%3D%26quot%3Bcsl-right-inline%26quot%3B%20style%3D%26quot%3Bmargin%3A%200%20.4em%200%201.5em%3B%26quot%3B%26gt%3BWang%2C%20R.%20%26lt%3Bi%26gt%3Bet%20al.%26lt%3B%5C%2Fi%26gt%3B%20MA-LoT%3A%20Model-Collaboration%20Lean-based%20Long%20Chain-of-Thought%20Reasoning%20enhances%20Formal%20Theorem%20Proving.%20Preprint%20at%20%26lt%3Ba%20class%3D%26%23039%3Bzp-DOIURL%26%23039%3B%20href%3D%26%23039%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2503.03205%26%23039%3B%26gt%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2503.03205%26lt%3B%5C%2Fa%26gt%3B%20%282025%29.%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%20%20%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%26lt%3B%5C%2Fdiv%26gt%3B%22%2C%22data%22%3A%7B%22itemType%22%3A%22preprint%22%2C%22title%22%3A%22MA-LoT%3A%20Model-Collaboration%20Lean-based%20Long%20Chain-of-Thought%20Reasoning%20enhances%20Formal%20Theorem%20Proving%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Ruida%22%2C%22lastName%22%3A%22Wang%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Rui%22%2C%22lastName%22%3A%22Pan%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Yuxin%22%2C%22lastName%22%3A%22Li%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Jipeng%22%2C%22lastName%22%3A%22Zhang%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Yizhen%22%2C%22lastName%22%3A%22Jia%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Shizhe%22%2C%22lastName%22%3A%22Diao%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Renjie%22%2C%22lastName%22%3A%22Pi%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Junjie%22%2C%22lastName%22%3A%22Hu%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Tong%22%2C%22lastName%22%3A%22Zhang%22%7D%5D%2C%22abstractNote%22%3A%22Solving%20mathematical%20problems%20using%20computer-verifiable%20languages%20like%20Lean%20has%20significantly%20impacted%20the%20mathematical%20and%20computer%20science%20communities.%20State-of-the-art%20methods%20utilize%20a%20single%20Large%20Language%20Model%20%28LLM%29%20to%20generate%20complete%20proof%20or%20perform%20tree%20search%2C%20but%20they%20fail%20to%20balance%20these%20tasks.%20We%20propose%20%2A%2AMA-LoT%2A%2A%3A%20%2AModel-CollAboration%20Lean-based%20Long%20Chain-of-Thought%2A%2C%20a%20comprehensive%20framework%20for%20Lean4%20theorem%20proving%20to%20solve%20this%20issue.%20It%20separates%20the%20cognition%20tasks%20of%20general%20NL%20for%20whole-proof%20generation%20and%20error%20analysis%20for%20proof%20correction%20using%20the%20model-collaboration%20method.%20We%20achieve%20this%20by%20structured%20interaction%20of%20the%20LLM%20and%20Lean4%20verifier%20in%20Long%20CoT.%20To%20implement%20the%20framework%2C%20we%20propose%20the%20novel%20%2ALoT-Transfer%20Learning%2A%20training-inference%20pipeline%2C%20which%20enables%20the%20Long%20CoT%20thinking%20capability%20to%20LLMs%20without%20special%20data%20annotation.%20Extensive%20experiment%20shows%20that%20our%20framework%20achieves%20a%20%2A%2A61.07%25%2A%2A%20accuracy%20rate%20on%20the%20Lean4%20version%20of%20the%20MiniF2F-Test%20dataset%2C%20largely%20outperforming%20DeepSeek-V3%20%2833.61%25%29%2C%20single-model%20tree%20search%20%28InternLM-Step-Prover%2C%2050.70%25%29%2C%20and%20whole-proof%20generation%20%28Godel-Prover%2C%2055.33%25%29%20baselines.%20Furthermore%2C%20our%20findings%20highlight%20the%20potential%20of%20combining%20Long%20CoT%20with%20formal%20verification%20for%20a%20more%20insightful%20generation%20in%20a%20broader%20perspective.%22%2C%22genre%22%3A%22%22%2C%22repository%22%3A%22arXiv%22%2C%22archiveID%22%3A%22%22%2C%22date%22%3A%222025%22%2C%22DOI%22%3A%2210.48550%5C%2FARXIV.2503.03205%22%2C%22citationKey%22%3A%22%22%2C%22url%22%3A%22https%3A%5C%2F%5C%2Farxiv.org%5C%2Fabs%5C%2F2503.03205%22%2C%22language%22%3A%22%22%2C%22collections%22%3A%5B%223NXZNVBX%22%5D%2C%22dateModified%22%3A%222025-08-05T14%3A30%3A11Z%22%7D%7D%2C%7B%22key%22%3A%22JVW9YLQY%22%2C%22library%22%3A%7B%22id%22%3A5854943%7D%2C%22meta%22%3A%7B%22creatorSummary%22%3A%22Lian%20et%20al.%22%2C%22parsedDate%22%3A%222024%22%2C%22numChildren%22%3A0%7D%2C%22bib%22%3A%22%26lt%3Bdiv%20class%3D%26quot%3Bcsl-bib-body%26quot%3B%20style%3D%26quot%3Bline-height%3A%202%3B%20%26quot%3B%26gt%3B%5Cn%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-entry%26quot%3B%20style%3D%26quot%3Bclear%3A%20left%3B%20%26quot%3B%26gt%3B%5Cn%20%20%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-left-margin%26quot%3B%20style%3D%26quot%3Bfloat%3A%20left%3B%20padding-right%3A%200.5em%3B%20text-align%3A%20right%3B%20width%3A%201em%3B%26quot%3B%26gt%3B1.%26lt%3B%5C%2Fdiv%26gt%3B%26lt%3Bdiv%20class%3D%26quot%3Bcsl-right-inline%26quot%3B%20style%3D%26quot%3Bmargin%3A%200%20.4em%200%201.5em%3B%26quot%3B%26gt%3BLian%2C%20X.%20%26lt%3Bi%26gt%3Bet%20al.%26lt%3B%5C%2Fi%26gt%3B%20Universal%20Checkpointing%3A%20A%20Flexible%20and%20Efficient%20Distributed%20Checkpointing%20System%20for%20Large-Scale%20DNN%20Training%20with%20Reconfigurable%20Parallelis.%20Preprint%20at%20%26lt%3Ba%20class%3D%26%23039%3Bzp-DOIURL%26%23039%3B%20href%3D%26%23039%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2406.18820%26%23039%3B%26gt%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2406.18820%26lt%3B%5C%2Fa%26gt%3B%20%282024%29.%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%20%20%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%26lt%3B%5C%2Fdiv%26gt%3B%22%2C%22data%22%3A%7B%22itemType%22%3A%22preprint%22%2C%22title%22%3A%22Universal%20Checkpointing%3A%20A%20Flexible%20and%20Efficient%20Distributed%20Checkpointing%20System%20for%20Large-Scale%20DNN%20Training%20with%20Reconfigurable%20Parallelis%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Xinyu%22%2C%22lastName%22%3A%22Lian%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Sam%20Ade%22%2C%22lastName%22%3A%22Jacobs%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Lev%22%2C%22lastName%22%3A%22Kurilenko%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Masahiro%22%2C%22lastName%22%3A%22Tanaka%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Stas%22%2C%22lastName%22%3A%22Bekman%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Olatunji%22%2C%22lastName%22%3A%22Ruwase%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Minjia%22%2C%22lastName%22%3A%22Zhang%22%7D%5D%2C%22abstractNote%22%3A%22Deep%20neural%20network%20%28DNN%29%20training%20continues%20to%20scale%20rapidly%20in%20terms%20of%20model%20size%2C%20data%20volume%2C%20and%20sequence%20length%2C%20to%20the%20point%20where%20multiple%20machines%20are%20required%20to%20fit%20large%20models%20for%20training.%20Different%20distributed%20and%20parallel%20training%20strategies%20have%20been%20developed%20to%20support%20large-scale%20DNN%20training%20by%20partitioning%20the%20training%20state%20across%20GPUs.%20However%2C%20existing%20DNN%20training%20systems%20provide%20very%20limited%20support%20for%20reconfiguring%20parallelism%20strategies%20in%20the%20middle%20of%20the%20training%20via%20checkpointing.%20This%20limitation%20arises%20because%20distributed%20checkpoints%20are%20tightly%20coupled%20to%20specific%20model%20parallelism%20and%20hardware%20configurations%2C%20preventing%20large-scale%20training%20jobs%20from%20efficiently%20adapting%20to%20hardware%20failures%20or%20resource%20elasticity.%5Cn%20This%20paper%20presents%20Universal%20Checkpointing%20%28UCP%29%2C%20a%20novel%20checkpointing%20system%20that%20enables%20flexible%20and%20efficient%20DNN%20training%20with%20reconfigurable%20parallelism.%20UCP%20overcomes%20challenges%20in%20existing%20systems%20by%20decoupling%20checkpoint%20structure%20from%20parallel%20training%20strategies%20and%20hardware%20configurations.%20In%20addition%2C%20we%20present%20a%20pattern-based%20reconfiguration%20pipeline%20that%20enables%20automatic%2C%20flexible%2C%20and%20efficient%20mapping%20of%20checkpoint%20state%20to%20various%20parallelism%20strategies.%20Evaluation%20on%20a%20range%20of%20DNN%20models%2C%20including%20state-of-the-art%20dense%20and%20sparse%20LLMs%2C%20shows%20that%20UCP%20enables%20reconfiguration%20for%20a%20broader%20set%20of%20widely%20used%20parallelism%20strategies%20than%20existing%20solutions%20while%20adding%20negligible%20reconfiguration%20cost.%20UCP%20has%20been%20successfully%20employed%20in%20real%20LLM%20training%20workloads%2C%20greatly%20enhancing%20their%20flexibility%20and%20resilience%20to%20dynamic%20hardware%20environments.%22%2C%22genre%22%3A%22%22%2C%22repository%22%3A%22arXiv%22%2C%22archiveID%22%3A%22%22%2C%22date%22%3A%222024%22%2C%22DOI%22%3A%2210.48550%5C%2FARXIV.2406.18820%22%2C%22citationKey%22%3A%22%22%2C%22url%22%3A%22https%3A%5C%2F%5C%2Farxiv.org%5C%2Fabs%5C%2F2406.18820%22%2C%22language%22%3A%22%22%2C%22collections%22%3A%5B%223NXZNVBX%22%5D%2C%22dateModified%22%3A%222025-08-04T16%3A14%3A36Z%22%7D%7D%2C%7B%22key%22%3A%225KI6HNES%22%2C%22library%22%3A%7B%22id%22%3A5854943%7D%2C%22meta%22%3A%7B%22creatorSummary%22%3A%22Wang%20et%20al.%22%2C%22parsedDate%22%3A%222025%22%2C%22numChildren%22%3A0%7D%2C%22bib%22%3A%22%26lt%3Bdiv%20class%3D%26quot%3Bcsl-bib-body%26quot%3B%20style%3D%26quot%3Bline-height%3A%202%3B%20%26quot%3B%26gt%3B%5Cn%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-entry%26quot%3B%20style%3D%26quot%3Bclear%3A%20left%3B%20%26quot%3B%26gt%3B%5Cn%20%20%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-left-margin%26quot%3B%20style%3D%26quot%3Bfloat%3A%20left%3B%20padding-right%3A%200.5em%3B%20text-align%3A%20right%3B%20width%3A%201em%3B%26quot%3B%26gt%3B1.%26lt%3B%5C%2Fdiv%26gt%3B%26lt%3Bdiv%20class%3D%26quot%3Bcsl-right-inline%26quot%3B%20style%3D%26quot%3Bmargin%3A%200%20.4em%200%201.5em%3B%26quot%3B%26gt%3BWang%2C%20X.%20%26lt%3Bi%26gt%3Bet%20al.%26lt%3B%5C%2Fi%26gt%3B%20MedCite%3A%20Can%20Language%20Models%20Generate%20Verifiable%20Text%20for%20Medicine%3F%20Preprint%20at%20%26lt%3Ba%20class%3D%26%23039%3Bzp-DOIURL%26%23039%3B%20href%3D%26%23039%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2506.06605%26%23039%3B%26gt%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2506.06605%26lt%3B%5C%2Fa%26gt%3B%20%282025%29.%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%20%20%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%26lt%3B%5C%2Fdiv%26gt%3B%22%2C%22data%22%3A%7B%22itemType%22%3A%22preprint%22%2C%22title%22%3A%22MedCite%3A%20Can%20Language%20Models%20Generate%20Verifiable%20Text%20for%20Medicine%3F%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Xiao%22%2C%22lastName%22%3A%22Wang%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Mengjue%22%2C%22lastName%22%3A%22Tan%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Qiao%22%2C%22lastName%22%3A%22Jin%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Guangzhi%22%2C%22lastName%22%3A%22Xiong%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Yu%22%2C%22lastName%22%3A%22Hu%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Aidong%22%2C%22lastName%22%3A%22Zhang%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Zhiyong%22%2C%22lastName%22%3A%22Lu%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Minjia%22%2C%22lastName%22%3A%22Zhang%22%7D%5D%2C%22abstractNote%22%3A%22Existing%20LLM-based%20medical%20question-answering%20systems%20lack%20citation%20generation%20and%20evaluation%20capabilities%2C%20raising%20concerns%20about%20their%20adoption%20in%20practice.%20In%20this%20work%2C%20we%20introduce%20%5C%5Cname%2C%20the%20first%20end-to-end%20framework%20that%20facilitates%20the%20design%20and%20evaluation%20of%20citation%20generation%20with%20LLMs%20for%20medical%20tasks.%20Meanwhile%2C%20we%20introduce%20a%20novel%20multi-pass%20retrieval-citation%20method%20that%20generates%20high-quality%20citations.%20Our%20evaluation%20highlights%20the%20challenges%20and%20opportunities%20of%20citation%20generation%20for%20medical%20tasks%2C%20while%20identifying%20important%20design%20choices%20that%20have%20a%20significant%20impact%20on%20the%20final%20citation%20quality.%20Our%20proposed%20method%20achieves%20superior%20citation%20precision%20and%20recall%20improvements%20compared%20to%20strong%20baseline%20methods%2C%20and%20we%20show%20that%20evaluation%20results%20correlate%20well%20with%20annotation%20results%20from%20professional%20experts.%22%2C%22genre%22%3A%22%22%2C%22repository%22%3A%22arXiv%22%2C%22archiveID%22%3A%22%22%2C%22date%22%3A%222025%22%2C%22DOI%22%3A%2210.48550%5C%2FARXIV.2506.06605%22%2C%22citationKey%22%3A%22%22%2C%22url%22%3A%22https%3A%5C%2F%5C%2Farxiv.org%5C%2Fabs%5C%2F2506.06605%22%2C%22language%22%3A%22%22%2C%22collections%22%3A%5B%223NXZNVBX%22%5D%2C%22dateModified%22%3A%222025-08-04T15%3A47%3A39Z%22%7D%7D%2C%7B%22key%22%3A%22HA7Q9MZS%22%2C%22library%22%3A%7B%22id%22%3A5854943%7D%2C%22meta%22%3A%7B%22creatorSummary%22%3A%22Basiri%20et%20al.%22%2C%22parsedDate%22%3A%222025%22%2C%22numChildren%22%3A0%7D%2C%22bib%22%3A%22%26lt%3Bdiv%20class%3D%26quot%3Bcsl-bib-body%26quot%3B%20style%3D%26quot%3Bline-height%3A%202%3B%20%26quot%3B%26gt%3B%5Cn%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-entry%26quot%3B%20style%3D%26quot%3Bclear%3A%20left%3B%20%26quot%3B%26gt%3B%5Cn%20%20%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-left-margin%26quot%3B%20style%3D%26quot%3Bfloat%3A%20left%3B%20padding-right%3A%200.5em%3B%20text-align%3A%20right%3B%20width%3A%201em%3B%26quot%3B%26gt%3B1.%26lt%3B%5C%2Fdiv%26gt%3B%26lt%3Bdiv%20class%3D%26quot%3Bcsl-right-inline%26quot%3B%20style%3D%26quot%3Bmargin%3A%200%20.4em%200%201.5em%3B%26quot%3B%26gt%3BBasiri%2C%20S.%2C%20Tiwari%2C%20D.%20%26amp%3B%20Salapaka%2C%20S.%20M.%20Parametrized%20Multi-Agent%20Routing%20via%20Deep%20Attention%20Models.%20Preprint%20at%20%26lt%3Ba%20class%3D%26%23039%3Bzp-DOIURL%26%23039%3B%20href%3D%26%23039%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2507.22338%26%23039%3B%26gt%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2507.22338%26lt%3B%5C%2Fa%26gt%3B%20%282025%29.%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%20%20%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%26lt%3B%5C%2Fdiv%26gt%3B%22%2C%22data%22%3A%7B%22itemType%22%3A%22preprint%22%2C%22title%22%3A%22Parametrized%20Multi-Agent%20Routing%20via%20Deep%20Attention%20Models%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Salar%22%2C%22lastName%22%3A%22Basiri%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Dhananjay%22%2C%22lastName%22%3A%22Tiwari%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Srinivasa%20M.%22%2C%22lastName%22%3A%22Salapaka%22%7D%5D%2C%22abstractNote%22%3A%22We%20propose%20a%20scalable%20deep%20learning%20framework%20for%20parametrized%20sequential%20decision-making%20%28ParaSDM%29%2C%20where%20multiple%20agents%20jointly%20optimize%20discrete%20action%20policies%20and%20shared%20continuous%20parameters.%20A%20key%20subclass%20of%20this%20setting%20arises%20in%20Facility-Location%20and%20Path%20Optimization%20%28FLPO%29%2C%20where%20multi-agent%20systems%20must%20simultaneously%20determine%20optimal%20routes%20and%20facility%20locations%2C%20aiming%20to%20minimize%20the%20cumulative%20transportation%20cost%20within%20the%20network.%20FLPO%20problems%20are%20NP-hard%20due%20to%20their%20mixed%20discrete-continuous%20structure%20and%20highly%20non-convex%20objective.%20To%20address%20this%2C%20we%20integrate%20the%20Maximum%20Entropy%20Principle%20%28MEP%29%20with%20a%20neural%20policy%20model%20called%20the%20Shortest%20Path%20Network%20%28SPN%29-a%20permutation-invariant%20encoder-decoder%20that%20approximates%20the%20MEP%20solution%20while%20enabling%20efficient%20gradient-based%20optimization%20over%20shared%20parameters.%20The%20SPN%20achieves%20up%20to%20100%24%5C%5Ctimes%24%20speedup%20in%20policy%20inference%20and%20gradient%20computation%20compared%20to%20MEP%20baselines%2C%20with%20an%20average%20optimality%20gap%20of%20approximately%206%25%20across%20a%20wide%20range%20of%20problem%20sizes.%20Our%20FLPO%20approach%20yields%20over%2010%24%5C%5Ctimes%24%20lower%20cost%20than%20metaheuristic%20baselines%20while%20running%20significantly%20faster%2C%20and%20matches%20Gurobi%26%23039%3Bs%20optimal%20cost%20with%20annealing%20at%20a%201500%24%5C%5Ctimes%24%20speedup-establishing%20a%20new%20state%20of%20the%20art%20for%20ParaSDM%20problems.%20These%20results%20highlight%20the%20power%20of%20structured%20deep%20models%20for%20solving%20large-scale%20mixed-integer%20optimization%20tasks.%22%2C%22genre%22%3A%22%22%2C%22repository%22%3A%22arXiv%22%2C%22archiveID%22%3A%22%22%2C%22date%22%3A%222025%22%2C%22DOI%22%3A%2210.48550%5C%2FARXIV.2507.22338%22%2C%22citationKey%22%3A%22%22%2C%22url%22%3A%22https%3A%5C%2F%5C%2Farxiv.org%5C%2Fabs%5C%2F2507.22338%22%2C%22language%22%3A%22%22%2C%22collections%22%3A%5B%223NXZNVBX%22%5D%2C%22dateModified%22%3A%222025-08-04T15%3A28%3A41Z%22%7D%7D%2C%7B%22key%22%3A%2272BXCHP8%22%2C%22library%22%3A%7B%22id%22%3A5854943%7D%2C%22meta%22%3A%7B%22creatorSummary%22%3A%22Wang%20and%20Buitrago%22%2C%22parsedDate%22%3A%222025-07-20%22%2C%22numChildren%22%3A0%7D%2C%22bib%22%3A%22%26lt%3Bdiv%20class%3D%26quot%3Bcsl-bib-body%26quot%3B%20style%3D%26quot%3Bline-height%3A%202%3B%20%26quot%3B%26gt%3B%5Cn%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-entry%26quot%3B%20style%3D%26quot%3Bclear%3A%20left%3B%20%26quot%3B%26gt%3B%5Cn%20%20%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-left-margin%26quot%3B%20style%3D%26quot%3Bfloat%3A%20left%3B%20padding-right%3A%200.5em%3B%20text-align%3A%20right%3B%20width%3A%201em%3B%26quot%3B%26gt%3B1.%26lt%3B%5C%2Fdiv%26gt%3B%26lt%3Bdiv%20class%3D%26quot%3Bcsl-right-inline%26quot%3B%20style%3D%26quot%3Bmargin%3A%200%20.4em%200%201.5em%3B%26quot%3B%26gt%3BWang%2C%20M.-Y.%20%26amp%3B%20Buitrago%2C%20P.%20Evaluating%20Pretraining%20Efficiency%20of%20Language%20Models%20on%20AI%20Accelerators.%20in%20%26lt%3Bi%26gt%3BPractice%20and%20Experience%20in%20Advanced%20Research%20Computing%202025%3A%20The%20Power%20of%20Collaboration%26lt%3B%5C%2Fi%26gt%3B%201%26%23x2013%3B5%20%28ACM%2C%20Columbus%20Ohio%20USA%2C%202025%29.%20%26lt%3Ba%20class%3D%26%23039%3Bzp-DOIURL%26%23039%3B%20href%3D%26%23039%3Bhttp%3A%5C%2F%5C%2Fdoi.org%5C%2F10.1145%5C%2F3708035.3736101%26%23039%3B%26gt%3Bhttp%3A%5C%2F%5C%2Fdoi.org%5C%2F10.1145%5C%2F3708035.3736101%26lt%3B%5C%2Fa%26gt%3B.%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%20%20%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%26lt%3B%5C%2Fdiv%26gt%3B%22%2C%22data%22%3A%7B%22itemType%22%3A%22conferencePaper%22%2C%22title%22%3A%22Evaluating%20Pretraining%20Efficiency%20of%20Language%20Models%20on%20AI%20Accelerators%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Mei-Yu%22%2C%22lastName%22%3A%22Wang%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Paola%22%2C%22lastName%22%3A%22Buitrago%22%7D%5D%2C%22abstractNote%22%3A%22%22%2C%22date%22%3A%222025-07-20%22%2C%22proceedingsTitle%22%3A%22Practice%20and%20Experience%20in%20Advanced%20Research%20Computing%202025%3A%20The%20Power%20of%20Collaboration%22%2C%22conferenceName%22%3A%22PEARC%20%2725%3A%20Practice%20and%20Experience%20in%20Advanced%20Research%20Computing%22%2C%22language%22%3A%22en%22%2C%22DOI%22%3A%2210.1145%5C%2F3708035.3736101%22%2C%22ISBN%22%3A%229798400713989%22%2C%22url%22%3A%22https%3A%5C%2F%5C%2Fdl.acm.org%5C%2Fdoi%5C%2F10.1145%5C%2F3708035.3736101%22%2C%22collections%22%3A%5B%223NXZNVBX%22%5D%2C%22dateModified%22%3A%222025-08-04T15%3A19%3A08Z%22%7D%7D%2C%7B%22key%22%3A%224N9GU4SZ%22%2C%22library%22%3A%7B%22id%22%3A5854943%7D%2C%22meta%22%3A%7B%22creatorSummary%22%3A%22Sharma%20et%20al.%22%2C%22parsedDate%22%3A%222024%22%2C%22numChildren%22%3A0%7D%2C%22bib%22%3A%22%26lt%3Bdiv%20class%3D%26quot%3Bcsl-bib-body%26quot%3B%20style%3D%26quot%3Bline-height%3A%202%3B%20%26quot%3B%26gt%3B%5Cn%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-entry%26quot%3B%20style%3D%26quot%3Bclear%3A%20left%3B%20%26quot%3B%26gt%3B%5Cn%20%20%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-left-margin%26quot%3B%20style%3D%26quot%3Bfloat%3A%20left%3B%20padding-right%3A%200.5em%3B%20text-align%3A%20right%3B%20width%3A%201em%3B%26quot%3B%26gt%3B1.%26lt%3B%5C%2Fdiv%26gt%3B%26lt%3Bdiv%20class%3D%26quot%3Bcsl-right-inline%26quot%3B%20style%3D%26quot%3Bmargin%3A%200%20.4em%200%201.5em%3B%26quot%3B%26gt%3BSharma%2C%20A.%2C%20Ding%2C%20H.%2C%20Li%2C%20J.%2C%20Dani%2C%20N.%20%26amp%3B%20Zhang%2C%20M.%20MiniKV%3A%20Pushing%20the%20Limits%20of%20LLM%20Inference%20via%202-Bit%20Layer-Discriminative%20KV%20Cache.%20Preprint%20at%20%26lt%3Ba%20class%3D%26%23039%3Bzp-DOIURL%26%23039%3B%20href%3D%26%23039%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2411.18077%26%23039%3B%26gt%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2411.18077%26lt%3B%5C%2Fa%26gt%3B%20%282024%29.%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%20%20%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%26lt%3B%5C%2Fdiv%26gt%3B%22%2C%22data%22%3A%7B%22itemType%22%3A%22preprint%22%2C%22title%22%3A%22MiniKV%3A%20Pushing%20the%20Limits%20of%20LLM%20Inference%20via%202-Bit%20Layer-Discriminative%20KV%20Cache%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Akshat%22%2C%22lastName%22%3A%22Sharma%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Hangliang%22%2C%22lastName%22%3A%22Ding%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Jianping%22%2C%22lastName%22%3A%22Li%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Neel%22%2C%22lastName%22%3A%22Dani%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Minjia%22%2C%22lastName%22%3A%22Zhang%22%7D%5D%2C%22abstractNote%22%3A%22How%20to%20efficiently%20serve%20LLMs%20in%20practice%20has%20become%20exceptionally%20challenging%20due%20to%20their%20prohibitive%20memory%20and%20computation%20requirements.%20In%20this%20study%2C%20we%20investigate%20optimizing%20the%20KV%20cache%2C%20whose%20memory%20footprint%20poses%20a%20critical%20bottleneck%20in%20LLM%20inference%2C%20especially%20when%20dealing%20with%20long%20context%20tasks.%20To%20tackle%20the%20challenge%2C%20we%20introduce%20MiniKV%2C%20a%20KV%20cache%20optimization%20method%20that%20simultaneously%20preserves%20long%20context%20task%20accuracy%20while%20significantly%20reducing%20KV%20cache%20size%20via%20a%20novel%202-bit%20layer-discriminative%20KV%20cache.%20More%20importantly%2C%20we%20develop%20specialized%20CUDA%20kernels%20to%20make%20MiniKV%20compatible%20with%20FlashAttention.%20Experiments%20on%20a%20wide%20range%20of%20long%20context%20tasks%20show%20that%20MiniKV%20effectively%20achieves%2086%25%20KV%20cache%20compression%20ratio%20while%20recovering%20over%2098.5%25%20of%20accuracy%2C%20outperforming%20state-of-the-art%20methods%20while%20achieving%20excellent%20measured%20system%20performance%20improvements.%22%2C%22genre%22%3A%22%22%2C%22repository%22%3A%22arXiv%22%2C%22archiveID%22%3A%22%22%2C%22date%22%3A%222024%22%2C%22DOI%22%3A%2210.48550%5C%2FARXIV.2411.18077%22%2C%22citationKey%22%3A%22%22%2C%22url%22%3A%22https%3A%5C%2F%5C%2Farxiv.org%5C%2Fabs%5C%2F2411.18077%22%2C%22language%22%3A%22%22%2C%22collections%22%3A%5B%223NXZNVBX%22%5D%2C%22dateModified%22%3A%222025-07-29T14%3A19%3A53Z%22%7D%7D%2C%7B%22key%22%3A%22ARYPIVHK%22%2C%22library%22%3A%7B%22id%22%3A5854943%7D%2C%22meta%22%3A%7B%22creatorSummary%22%3A%22Kargupta%20et%20al.%22%2C%22parsedDate%22%3A%222024%22%2C%22numChildren%22%3A0%7D%2C%22bib%22%3A%22%26lt%3Bdiv%20class%3D%26quot%3Bcsl-bib-body%26quot%3B%20style%3D%26quot%3Bline-height%3A%202%3B%20%26quot%3B%26gt%3B%5Cn%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-entry%26quot%3B%20style%3D%26quot%3Bclear%3A%20left%3B%20%26quot%3B%26gt%3B%5Cn%20%20%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-left-margin%26quot%3B%20style%3D%26quot%3Bfloat%3A%20left%3B%20padding-right%3A%200.5em%3B%20text-align%3A%20right%3B%20width%3A%201em%3B%26quot%3B%26gt%3B1.%26lt%3B%5C%2Fdiv%26gt%3B%26lt%3Bdiv%20class%3D%26quot%3Bcsl-right-inline%26quot%3B%20style%3D%26quot%3Bmargin%3A%200%20.4em%200%201.5em%3B%26quot%3B%26gt%3BKargupta%2C%20P.%2C%20Zhang%2C%20Y.%2C%20Jiao%2C%20Y.%2C%20Ouyang%2C%20S.%20%26amp%3B%20Han%2C%20J.%20Synergizing%20Unsupervised%20Episode%20Detection%20with%20LLMs%20for%20Large-Scale%20News%20Events.%20Preprint%20at%20%26lt%3Ba%20class%3D%26%23039%3Bzp-DOIURL%26%23039%3B%20href%3D%26%23039%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2408.04873%26%23039%3B%26gt%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2408.04873%26lt%3B%5C%2Fa%26gt%3B%20%282024%29.%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%20%20%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%26lt%3B%5C%2Fdiv%26gt%3B%22%2C%22data%22%3A%7B%22itemType%22%3A%22preprint%22%2C%22title%22%3A%22Synergizing%20Unsupervised%20Episode%20Detection%20with%20LLMs%20for%20Large-Scale%20News%20Events%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Priyanka%22%2C%22lastName%22%3A%22Kargupta%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Yunyi%22%2C%22lastName%22%3A%22Zhang%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Yizhu%22%2C%22lastName%22%3A%22Jiao%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Siru%22%2C%22lastName%22%3A%22Ouyang%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Jiawei%22%2C%22lastName%22%3A%22Han%22%7D%5D%2C%22abstractNote%22%3A%22State-of-the-art%20automatic%20event%20detection%20struggles%20with%20interpretability%20and%20adaptability%20to%20evolving%20large-scale%20key%20events%20--%20unlike%20episodic%20structures%2C%20which%20excel%20in%20these%20areas.%20Often%20overlooked%2C%20episodes%20represent%20cohesive%20clusters%20of%20core%20entities%20performing%20actions%20at%20a%20specific%20time%20and%20location%3B%20a%20partially%20ordered%20sequence%20of%20episodes%20can%20represent%20a%20key%20event.%20This%20paper%20introduces%20a%20novel%20task%2C%20episode%20detection%2C%20which%20identifies%20episodes%20within%20a%20news%20corpus%20of%20key%20event%20articles.%20Detecting%20episodes%20poses%20unique%20challenges%2C%20as%20they%20lack%20explicit%20temporal%20or%20locational%20markers%20and%20cannot%20be%20merged%20using%20semantic%20similarity%20alone.%20While%20large%20language%20models%20%28LLMs%29%20can%20aid%20with%20these%20reasoning%20difficulties%2C%20they%20suffer%20with%20long%20contexts%20typical%20of%20news%20corpora.%20To%20address%20these%20challenges%2C%20we%20introduce%20EpiMine%2C%20an%20unsupervised%20framework%20that%20identifies%20a%20key%20event%26%23039%3Bs%20candidate%20episodes%20by%20leveraging%20natural%20episodic%20partitions%20in%20articles%2C%20estimated%20through%20shifts%20in%20discriminative%20term%20combinations.%20These%20candidate%20episodes%20are%20more%20cohesive%20and%20representative%20of%20true%20episodes%2C%20synergizing%20with%20LLMs%20to%20better%20interpret%20and%20refine%20them%20into%20final%20episodes.%20We%20apply%20EpiMine%20to%20our%20three%20diverse%2C%20real-world%20event%20datasets%20annotated%20at%20the%20episode%20level%2C%20where%20it%20achieves%20a%2059.2%25%20average%20gain%20across%20all%20metrics%20compared%20to%20baselines.%22%2C%22genre%22%3A%22%22%2C%22repository%22%3A%22arXiv%22%2C%22archiveID%22%3A%22%22%2C%22date%22%3A%222024%22%2C%22DOI%22%3A%2210.48550%5C%2FARXIV.2408.04873%22%2C%22citationKey%22%3A%22%22%2C%22url%22%3A%22https%3A%5C%2F%5C%2Farxiv.org%5C%2Fabs%5C%2F2408.04873%22%2C%22language%22%3A%22%22%2C%22collections%22%3A%5B%223NXZNVBX%22%5D%2C%22dateModified%22%3A%222025-07-29T14%3A07%3A57Z%22%7D%7D%2C%7B%22key%22%3A%229UXV2M4C%22%2C%22library%22%3A%7B%22id%22%3A5854943%7D%2C%22meta%22%3A%7B%22creatorSummary%22%3A%22Tian%20et%20al.%22%2C%22parsedDate%22%3A%222025%22%2C%22numChildren%22%3A0%7D%2C%22bib%22%3A%22%26lt%3Bdiv%20class%3D%26quot%3Bcsl-bib-body%26quot%3B%20style%3D%26quot%3Bline-height%3A%202%3B%20%26quot%3B%26gt%3B%5Cn%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-entry%26quot%3B%20style%3D%26quot%3Bclear%3A%20left%3B%20%26quot%3B%26gt%3B%5Cn%20%20%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-left-margin%26quot%3B%20style%3D%26quot%3Bfloat%3A%20left%3B%20padding-right%3A%200.5em%3B%20text-align%3A%20right%3B%20width%3A%201em%3B%26quot%3B%26gt%3B1.%26lt%3B%5C%2Fdiv%26gt%3B%26lt%3Bdiv%20class%3D%26quot%3Bcsl-right-inline%26quot%3B%20style%3D%26quot%3Bmargin%3A%200%20.4em%200%201.5em%3B%26quot%3B%26gt%3BTian%2C%20B.%2C%20Gao%2C%20Q.%2C%20Xianyu%2C%20S.%2C%20Cui%2C%20X.%20%26amp%3B%20Zhang%2C%20M.%20FlexGaussian%3A%20Flexible%20and%20Cost-Effective%20Training-Free%20Compression%20for%203D%20Gaussian%20Splatting.%20Preprint%20at%20%26lt%3Ba%20class%3D%26%23039%3Bzp-DOIURL%26%23039%3B%20href%3D%26%23039%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2507.06671%26%23039%3B%26gt%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2507.06671%26lt%3B%5C%2Fa%26gt%3B%20%282025%29.%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%20%20%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%26lt%3B%5C%2Fdiv%26gt%3B%22%2C%22data%22%3A%7B%22itemType%22%3A%22preprint%22%2C%22title%22%3A%22FlexGaussian%3A%20Flexible%20and%20Cost-Effective%20Training-Free%20Compression%20for%203D%20Gaussian%20Splatting%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Boyuan%22%2C%22lastName%22%3A%22Tian%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Qizhe%22%2C%22lastName%22%3A%22Gao%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Siran%22%2C%22lastName%22%3A%22Xianyu%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Xiaotong%22%2C%22lastName%22%3A%22Cui%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Minjia%22%2C%22lastName%22%3A%22Zhang%22%7D%5D%2C%22abstractNote%22%3A%223D%20Gaussian%20splatting%20has%20become%20a%20prominent%20technique%20for%20representing%20and%20rendering%20complex%203D%20scenes%2C%20due%20to%20its%20high%20fidelity%20and%20speed%20advantages.%20However%2C%20the%20growing%20demand%20for%20large-scale%20models%20calls%20for%20effective%20compression%20to%20reduce%20memory%20and%20computation%20costs%2C%20especially%20on%20mobile%20and%20edge%20devices%20with%20limited%20resources.%20Existing%20compression%20methods%20effectively%20reduce%203D%20Gaussian%20parameters%20but%20often%20require%20extensive%20retraining%20or%20fine-tuning%2C%20lacking%20flexibility%20under%20varying%20compression%20constraints.%5Cn%20In%20this%20paper%2C%20we%20introduce%20FlexGaussian%2C%20a%20flexible%20and%20cost-effective%20method%20that%20combines%20mixed-precision%20quantization%20with%20attribute-discriminative%20pruning%20for%20training-free%203D%20Gaussian%20compression.%20FlexGaussian%20eliminates%20the%20need%20for%20retraining%20and%20adapts%20easily%20to%20diverse%20compression%20targets.%20Evaluation%20results%20show%20that%20FlexGaussian%20achieves%20up%20to%2096.4%25%20compression%20while%20maintaining%20high%20rendering%20quality%20%28%26lt%3B1%20dB%20drop%20in%20PSNR%29%2C%20and%20is%20deployable%20on%20mobile%20devices.%20FlexGaussian%20delivers%20high%20compression%20ratios%20within%20seconds%2C%20being%201.7-2.1x%20faster%20than%20state-of-the-art%20training-free%20methods%20and%2010-100x%20faster%20than%20training-involved%20approaches.%20The%20code%20is%20being%20prepared%20and%20will%20be%20released%20soon%20at%3A%20https%3A%5C%2F%5C%2Fgithub.com%5C%2FSupercomputing-System-AI-Lab%5C%2FFlexGaussian%22%2C%22genre%22%3A%22%22%2C%22repository%22%3A%22arXiv%22%2C%22archiveID%22%3A%22%22%2C%22date%22%3A%222025%22%2C%22DOI%22%3A%2210.48550%5C%2FARXIV.2507.06671%22%2C%22citationKey%22%3A%22%22%2C%22url%22%3A%22https%3A%5C%2F%5C%2Farxiv.org%5C%2Fabs%5C%2F2507.06671%22%2C%22language%22%3A%22%22%2C%22collections%22%3A%5B%223NXZNVBX%22%5D%2C%22dateModified%22%3A%222025-07-24T21%3A17%3A27Z%22%7D%7D%2C%7B%22key%22%3A%22SFMCCAK7%22%2C%22library%22%3A%7B%22id%22%3A5854943%7D%2C%22meta%22%3A%7B%22creatorSummary%22%3A%22Park%20et%20al.%22%2C%22parsedDate%22%3A%222025%22%2C%22numChildren%22%3A0%7D%2C%22bib%22%3A%22%26lt%3Bdiv%20class%3D%26quot%3Bcsl-bib-body%26quot%3B%20style%3D%26quot%3Bline-height%3A%202%3B%20%26quot%3B%26gt%3B%5Cn%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-entry%26quot%3B%20style%3D%26quot%3Bclear%3A%20left%3B%20%26quot%3B%26gt%3B%5Cn%20%20%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-left-margin%26quot%3B%20style%3D%26quot%3Bfloat%3A%20left%3B%20padding-right%3A%200.5em%3B%20text-align%3A%20right%3B%20width%3A%201em%3B%26quot%3B%26gt%3B1.%26lt%3B%5C%2Fdiv%26gt%3B%26lt%3Bdiv%20class%3D%26quot%3Bcsl-right-inline%26quot%3B%20style%3D%26quot%3Bmargin%3A%200%20.4em%200%201.5em%3B%26quot%3B%26gt%3BPark%2C%20J.%20%26lt%3Bi%26gt%3Bet%20al.%26lt%3B%5C%2Fi%26gt%3B%20Bridging%20Sequential%20Deep%20Operator%20Network%20and%20Video%20Diffusion%3A%20Residual%20Refinement%20of%20Spatio-Temporal%20PDE%20Solutions.%20Preprint%20at%20%26lt%3Ba%20class%3D%26%23039%3Bzp-DOIURL%26%23039%3B%20href%3D%26%23039%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2507.06133%26%23039%3B%26gt%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2507.06133%26lt%3B%5C%2Fa%26gt%3B%20%282025%29.%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%20%20%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%26lt%3B%5C%2Fdiv%26gt%3B%22%2C%22data%22%3A%7B%22itemType%22%3A%22preprint%22%2C%22title%22%3A%22Bridging%20Sequential%20Deep%20Operator%20Network%20and%20Video%20Diffusion%3A%20Residual%20Refinement%20of%20Spatio-Temporal%20PDE%20Solutions%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Jaewan%22%2C%22lastName%22%3A%22Park%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Farid%22%2C%22lastName%22%3A%22Ahmed%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Kazuma%22%2C%22lastName%22%3A%22Kobayashi%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Seid%22%2C%22lastName%22%3A%22Koric%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Syed%20Bahauddin%22%2C%22lastName%22%3A%22Alam%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Iwona%22%2C%22lastName%22%3A%22Jasiuk%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Diab%22%2C%22lastName%22%3A%22Abueidda%22%7D%5D%2C%22abstractNote%22%3A%22Video-diffusion%20models%20have%20recently%20set%20the%20standard%20in%20video%20generation%2C%20inpainting%2C%20and%20domain%20translation%20thanks%20to%20their%20training%20stability%20and%20high%20perceptual%20fidelity.%20Building%20on%20these%20strengths%2C%20we%20repurpose%20conditional%20video%20diffusion%20as%20a%20physics%20surrogate%20for%20spatio-temporal%20fields%20governed%20by%20partial%20differential%20equations%20%28PDEs%29.%20Our%20two-stage%20surrogate%20first%20applies%20a%20Sequential%20Deep%20Operator%20Network%20%28S-DeepONet%29%20to%20produce%20a%20coarse%2C%20physics-consistent%20prior%20from%20the%20prescribed%20boundary%20or%20loading%20conditions.%20The%20prior%20is%20then%20passed%20to%20a%20conditional%20video%20diffusion%20model%20that%20learns%20only%20the%20residual%3A%20the%20point-wise%20difference%20between%20the%20ground%20truth%20and%20the%20S-DeepONet%20prediction.%20By%20shifting%20the%20learning%20burden%20from%20the%20full%20solution%20to%20its%20much%20smaller%20residual%20space%2C%20diffusion%20can%20focus%20on%20sharpening%20high-frequency%20structures%20without%20sacrificing%20global%20coherence.%20The%20framework%20is%20assessed%20on%20two%20disparate%20benchmarks%3A%20%28i%29%20vortex-dominated%20lid-driven%20cavity%20flow%20and%20%28ii%29%20tensile%20plastic%20deformation%20of%20dogbone%20specimens.%20Across%20these%20data%20sets%20the%20hybrid%20surrogate%20consistently%20outperforms%20its%20single-stage%20counterpart%2C%20cutting%20the%20mean%20relative%20L2%20error%20from%204.57%25%20to%200.83%25%20for%20the%20flow%20problem%20and%20from%204.42%25%20to%202.94%25%20for%20plasticity%2C%20a%20relative%20improvements%20of%2081.8%25%20and%2033.5%25%20respectively.%20The%20hybrid%20approach%20not%20only%20lowers%20quantitative%20errors%20but%20also%20improves%20visual%20quality%2C%20visibly%20recovering%20fine%20spatial%20details.%20These%20results%20show%20that%20%28i%29%20conditioning%20diffusion%20on%20a%20physics-aware%20prior%20enables%20faithful%20reconstruction%20of%20localized%20features%2C%20%28ii%29%20residual%20learning%20reduces%20the%20problem%2C%20accelerating%20convergence%20and%20enhancing%20accuracy%2C%20and%20%28iii%29%20the%20same%20architecture%20transfers%20seamlessly%20from%20incompressible%20flow%20to%20nonlinear%20elasto-plasticity%20without%20problem-specific%20architectural%20modifications%2C%20highlighting%20its%20broad%20applicability%20to%20nonlinear%2C%20time-dependent%20continua.%22%2C%22genre%22%3A%22%22%2C%22repository%22%3A%22arXiv%22%2C%22archiveID%22%3A%22%22%2C%22date%22%3A%222025%22%2C%22DOI%22%3A%2210.48550%5C%2FARXIV.2507.06133%22%2C%22citationKey%22%3A%22%22%2C%22url%22%3A%22https%3A%5C%2F%5C%2Farxiv.org%5C%2Fabs%5C%2F2507.06133%22%2C%22language%22%3A%22%22%2C%22collections%22%3A%5B%223NXZNVBX%22%5D%2C%22dateModified%22%3A%222025-07-24T19%3A37%3A13Z%22%7D%7D%2C%7B%22key%22%3A%22LBYIA4HX%22%2C%22library%22%3A%7B%22id%22%3A5854943%7D%2C%22meta%22%3A%7B%22creatorSummary%22%3A%22Xi%20et%20al.%22%2C%22parsedDate%22%3A%222025%22%2C%22numChildren%22%3A0%7D%2C%22bib%22%3A%22%26lt%3Bdiv%20class%3D%26quot%3Bcsl-bib-body%26quot%3B%20style%3D%26quot%3Bline-height%3A%202%3B%20%26quot%3B%26gt%3B%5Cn%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-entry%26quot%3B%20style%3D%26quot%3Bclear%3A%20left%3B%20%26quot%3B%26gt%3B%5Cn%20%20%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-left-margin%26quot%3B%20style%3D%26quot%3Bfloat%3A%20left%3B%20padding-right%3A%200.5em%3B%20text-align%3A%20right%3B%20width%3A%201em%3B%26quot%3B%26gt%3B1.%26lt%3B%5C%2Fdiv%26gt%3B%26lt%3Bdiv%20class%3D%26quot%3Bcsl-right-inline%26quot%3B%20style%3D%26quot%3Bmargin%3A%200%20.4em%200%201.5em%3B%26quot%3B%26gt%3BXi%2C%20J.%20%26lt%3Bi%26gt%3Bet%20al.%26lt%3B%5C%2Fi%26gt%3B%20VecFlow%3A%20A%20High-Performance%20Vector%20Data%20Management%20System%20for%20Filtered-Search%20on%20GPUs.%20Preprint%20at%20%26lt%3Ba%20class%3D%26%23039%3Bzp-DOIURL%26%23039%3B%20href%3D%26%23039%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2506.00812%26%23039%3B%26gt%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2506.00812%26lt%3B%5C%2Fa%26gt%3B%20%282025%29.%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%20%20%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%26lt%3B%5C%2Fdiv%26gt%3B%22%2C%22data%22%3A%7B%22itemType%22%3A%22preprint%22%2C%22title%22%3A%22VecFlow%3A%20A%20High-Performance%20Vector%20Data%20Management%20System%20for%20Filtered-Search%20on%20GPUs%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Jingyi%22%2C%22lastName%22%3A%22Xi%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Chenghao%22%2C%22lastName%22%3A%22Mo%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Benjamin%22%2C%22lastName%22%3A%22Karsin%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Artem%22%2C%22lastName%22%3A%22Chirkin%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Mingqin%22%2C%22lastName%22%3A%22Li%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Minjia%22%2C%22lastName%22%3A%22Zhang%22%7D%5D%2C%22abstractNote%22%3A%22Vector%20search%20and%20database%20systems%20have%20become%20a%20keystone%20component%20in%20many%20AI%20applications.%20While%20many%20prior%20research%20has%20investigated%20how%20to%20accelerate%20the%20performance%20of%20generic%20vector%20search%2C%20emerging%20AI%20applications%20require%20running%20more%20sophisticated%20vector%20queries%20efficiently%2C%20such%20as%20vector%20search%20with%20attribute%20filters.%20Unfortunately%2C%20recent%20filtered-ANNS%20solutions%20are%20primarily%20designed%20for%20CPUs%2C%20with%20few%20exploration%20and%20limited%20performance%20of%20filtered-ANNS%20that%20take%20advantage%20of%20the%20massive%20parallelism%20offered%20by%20GPUs.%20In%20this%20paper%2C%20we%20present%20VecFlow%2C%20a%20novel%20high-performance%20vector%20filtered%20search%20system%20that%20achieves%20unprecedented%20high%20throughput%20and%20recall%20while%20obtaining%20low%20latency%20for%20filtered-ANNS%20on%20GPUs.%20We%20propose%20a%20novel%20label-centric%20indexing%20and%20search%20algorithm%20that%20significantly%20improves%20the%20selectivity%20of%20ANNS%20with%20filters.%20In%20addition%20to%20algorithmic%20level%20optimization%2C%20we%20provide%20architectural-aware%20optimization%20for%20VecFlow%26%23039%3Bs%20functional%20modules%2C%20effectively%20supporting%20both%20small%20batch%20and%20large%20batch%20queries%2C%20and%20single-label%20and%20multi-label%20query%20processing.%20Experimental%20results%20on%20NVIDIA%20A100%20GPU%20over%20several%20public%20available%20datasets%20validate%20that%20VecFlow%20achieves%205%20million%20QPS%20for%20recall%2090%25%2C%20outperforming%20state-of-the-art%20CPU-based%20solutions%20such%20as%20Filtered-DiskANN%20by%20up%20to%20135%20times.%20Alternatively%2C%20VecFlow%20can%20easily%20extend%20its%20support%20to%20high%20recall%2099%25%20regime%2C%20whereas%20strong%20GPU-based%20baselines%20plateau%20at%20around%2080%25%20recall.%20The%20source%20code%20is%20available%20at%20https%3A%5C%2F%5C%2Fgithub.com%5C%2FSupercomputing-System-AI-Lab%5C%2FVecFlow.%22%2C%22genre%22%3A%22%22%2C%22repository%22%3A%22arXiv%22%2C%22archiveID%22%3A%22%22%2C%22date%22%3A%222025%22%2C%22DOI%22%3A%2210.48550%5C%2FARXIV.2506.00812%22%2C%22citationKey%22%3A%22%22%2C%22url%22%3A%22https%3A%5C%2F%5C%2Farxiv.org%5C%2Fabs%5C%2F2506.00812%22%2C%22language%22%3A%22%22%2C%22collections%22%3A%5B%223NXZNVBX%22%5D%2C%22dateModified%22%3A%222025-07-24T19%3A22%3A15Z%22%7D%7D%2C%7B%22key%22%3A%22QF25TFI4%22%2C%22library%22%3A%7B%22id%22%3A5854943%7D%2C%22meta%22%3A%7B%22creatorSummary%22%3A%22Liu%20and%20Koric%22%2C%22parsedDate%22%3A%222025%22%2C%22numChildren%22%3A0%7D%2C%22bib%22%3A%22%26lt%3Bdiv%20class%3D%26quot%3Bcsl-bib-body%26quot%3B%20style%3D%26quot%3Bline-height%3A%202%3B%20%26quot%3B%26gt%3B%5Cn%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-entry%26quot%3B%20style%3D%26quot%3Bclear%3A%20left%3B%20%26quot%3B%26gt%3B%5Cn%20%20%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-left-margin%26quot%3B%20style%3D%26quot%3Bfloat%3A%20left%3B%20padding-right%3A%200.5em%3B%20text-align%3A%20right%3B%20width%3A%201em%3B%26quot%3B%26gt%3B1.%26lt%3B%5C%2Fdiv%26gt%3B%26lt%3Bdiv%20class%3D%26quot%3Bcsl-right-inline%26quot%3B%20style%3D%26quot%3Bmargin%3A%200%20.4em%200%201.5em%3B%26quot%3B%26gt%3BLiu%2C%20Q.%20%26amp%3B%20Koric%2C%20S.%20Sequential%20Neural%20Operator%20Transformer%20for%20High-Fidelity%20Surrogates%20of%20Time-Dependent%20Non-linear%20Partial%20Differential%20Equations.%20Preprint%20at%20%26lt%3Ba%20class%3D%26%23039%3Bzp-DOIURL%26%23039%3B%20href%3D%26%23039%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2507.03272%26%23039%3B%26gt%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2507.03272%26lt%3B%5C%2Fa%26gt%3B%20%282025%29.%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%20%20%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%26lt%3B%5C%2Fdiv%26gt%3B%22%2C%22data%22%3A%7B%22itemType%22%3A%22preprint%22%2C%22title%22%3A%22Sequential%20Neural%20Operator%20Transformer%20for%20High-Fidelity%20Surrogates%20of%20Time-Dependent%20Non-linear%20Partial%20Differential%20Equations%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Qibang%22%2C%22lastName%22%3A%22Liu%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Seid%22%2C%22lastName%22%3A%22Koric%22%7D%5D%2C%22abstractNote%22%3A%22Partial%20differential%20equations%20%28PDEs%29%20are%20fundamental%20to%20modeling%20complex%20and%20nonlinear%20physical%20phenomena%2C%20but%20their%20numerical%20solution%20often%20requires%20significant%20computational%20resources%2C%20particularly%20when%20a%20large%20number%20of%20forward%20full%20solution%20evaluations%20are%20necessary%2C%20such%20as%20in%20design%2C%20optimization%2C%20sensitivity%20analysis%2C%20and%20uncertainty%20quantification.%20Recent%20progress%20in%20operator%20learning%20has%20enabled%20surrogate%20models%20that%20efficiently%20predict%20full%20PDE%20solution%20fields%3B%20however%2C%20these%20models%20often%20struggle%20with%20accuracy%20and%20robustness%20when%20faced%20with%20highly%20nonlinear%20responses%20driven%20by%20sequential%20input%20functions.%20To%20address%20these%20challenges%2C%20we%20propose%20the%20Sequential%20Neural%20Operator%20Transformer%20%28S-NOT%29%2C%20a%20architecture%20that%20combines%20gated%20recurrent%20units%20%28GRUs%29%20with%20the%20self-attention%20mechanism%20of%20transformers%20to%20address%20time-dependent%2Cnonlinear%20PDEs.%20Unlike%20S-DeepONet%20%28S-DON%29%2C%20which%20uses%20a%20dot%20product%20to%20merge%20encoded%20outputs%20from%20the%20branch%20and%20trunk%20sub-networks%2C%20S-NOT%20leverages%20attention%20to%20better%20capture%20intricate%20dependencies%20between%20sequential%20inputs%20and%20spatial%20query%20points.%20We%20benchmark%20S-NOT%20on%20three%20challenging%20datasets%20from%20real-world%20applications%20with%20plastic%20and%20thermo-viscoplastic%20highly%20nonlinear%20material%20responses%3A%20multiphysics%20steel%20solidification%2C%20a%203D%20lug%20specimen%2C%20and%20a%20dogbone%20specimen%20under%20temporal%20and%20path-dependent%20loadings.%20The%20results%20show%20that%20S-NOT%20consistently%20achieves%20a%20higher%20prediction%20accuracy%20than%20S-DON%20even%20for%20data%20outliers%2C%20demonstrating%20its%20accuracy%20and%20robustness%20for%20drastically%20accelerating%20computational%20frameworks%20in%20scientific%20and%20engineering%20applications.%22%2C%22genre%22%3A%22%22%2C%22repository%22%3A%22arXiv%22%2C%22archiveID%22%3A%22%22%2C%22date%22%3A%222025%22%2C%22DOI%22%3A%2210.48550%5C%2FARXIV.2507.03272%22%2C%22citationKey%22%3A%22%22%2C%22url%22%3A%22https%3A%5C%2F%5C%2Farxiv.org%5C%2Fabs%5C%2F2507.03272%22%2C%22language%22%3A%22%22%2C%22collections%22%3A%5B%223NXZNVBX%22%5D%2C%22dateModified%22%3A%222025-07-24T19%3A21%3A11Z%22%7D%7D%2C%7B%22key%22%3A%22XULGGRY6%22%2C%22library%22%3A%7B%22id%22%3A5854943%7D%2C%22meta%22%3A%7B%22creatorSummary%22%3A%22Kacmaz%20et%20al.%22%2C%22parsedDate%22%3A%222025%22%2C%22numChildren%22%3A0%7D%2C%22bib%22%3A%22%26lt%3Bdiv%20class%3D%26quot%3Bcsl-bib-body%26quot%3B%20style%3D%26quot%3Bline-height%3A%202%3B%20%26quot%3B%26gt%3B%5Cn%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-entry%26quot%3B%20style%3D%26quot%3Bclear%3A%20left%3B%20%26quot%3B%26gt%3B%5Cn%20%20%20%20%26lt%3Bdiv%20class%3D%26quot%3Bcsl-left-margin%26quot%3B%20style%3D%26quot%3Bfloat%3A%20left%3B%20padding-right%3A%200.5em%3B%20text-align%3A%20right%3B%20width%3A%201em%3B%26quot%3B%26gt%3B1.%26lt%3B%5C%2Fdiv%26gt%3B%26lt%3Bdiv%20class%3D%26quot%3Bcsl-right-inline%26quot%3B%20style%3D%26quot%3Bmargin%3A%200%20.4em%200%201.5em%3B%26quot%3B%26gt%3BKacmaz%2C%20S.%2C%20Huerta%2C%20E.%20A.%20%26amp%3B%20Haas%2C%20R.%20Resolving%20Turbulent%20Magnetohydrodynamics%3A%20A%20Hybrid%20Operator-Diffusion%20Framework.%20Preprint%20at%20%26lt%3Ba%20class%3D%26%23039%3Bzp-DOIURL%26%23039%3B%20href%3D%26%23039%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2507.02106%26%23039%3B%26gt%3Bhttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FARXIV.2507.02106%26lt%3B%5C%2Fa%26gt%3B%20%282025%29.%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%20%20%26lt%3B%5C%2Fdiv%26gt%3B%5Cn%26lt%3B%5C%2Fdiv%26gt%3B%22%2C%22data%22%3A%7B%22itemType%22%3A%22preprint%22%2C%22title%22%3A%22Resolving%20Turbulent%20Magnetohydrodynamics%3A%20A%20Hybrid%20Operator-Diffusion%20Framework%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Semih%22%2C%22lastName%22%3A%22Kacmaz%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22E.%20A.%22%2C%22lastName%22%3A%22Huerta%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Roland%22%2C%22lastName%22%3A%22Haas%22%7D%5D%2C%22abstractNote%22%3A%22We%20present%20a%20hybrid%20machine%20learning%20framework%20that%20combines%20Physics-Informed%20Neural%20Operators%20%28PINOs%29%20with%20score-based%20generative%20diffusion%20models%20to%20simulate%20the%20full%20spatio-temporal%20evolution%20of%20two-dimensional%2C%20incompressible%2C%20resistive%20magnetohydrodynamic%20%28MHD%29%20turbulence%20across%20a%20broad%20range%20of%20Reynolds%20numbers%20%28%24%5C%5Cmathrm%7BRe%7D%24%29.%20The%20framework%20leverages%20the%20equation-constrained%20generalization%20capabilities%20of%20PINOs%20to%20predict%20coherent%2C%20low-frequency%20dynamics%2C%20while%20a%20conditional%20diffusion%20model%20stochastically%20corrects%20high-frequency%20residuals%2C%20enabling%20accurate%20modeling%20of%20fully%20developed%20turbulence.%20Trained%20on%20a%20comprehensive%20ensemble%20of%20high-fidelity%20simulations%20with%20%24%5C%5Cmathrm%7BRe%7D%20%5C%5Cin%20%5C%5C%7B100%2C%20250%2C%20500%2C%20750%2C%201000%2C%203000%2C%2010000%5C%5C%7D%24%2C%20the%20approach%20achieves%20state-of-the-art%20accuracy%20in%20regimes%20previously%20inaccessible%20to%20deterministic%20surrogates.%20At%20%24%5C%5Cmathrm%7BRe%7D%3D1000%24%20and%20%243000%24%2C%20the%20model%20faithfully%20reconstructs%20the%20full%20spectral%20energy%20distributions%20of%20both%20velocity%20and%20magnetic%20fields%20late%20into%20the%20simulation%2C%20capturing%20non-Gaussian%20statistics%2C%20intermittent%20structures%2C%20and%20cross-field%20correlations%20with%20high%20fidelity.%20At%20extreme%20turbulence%20levels%20%28%24%5C%5Cmathrm%7BRe%7D%3D10000%24%29%2C%20it%20remains%20the%20first%20surrogate%20capable%20of%20recovering%20the%20high-wavenumber%20evolution%20of%20the%20magnetic%20field%2C%20preserving%20large-scale%20morphology%20and%20enabling%20statistically%20meaningful%20predictions.%22%2C%22genre%22%3A%22%22%2C%22repository%22%3A%22arXiv%22%2C%22archiveID%22%3A%22%22%2C%22date%22%3A%222025%22%2C%22DOI%22%3A%2210.48550%5C%2FARXIV.2507.02106%22%2C%22citationKey%22%3A%22%22%2C%22url%22%3A%22https%3A%5C%2F%5C%2Farxiv.org%5C%2Fabs%5C%2F2507.02106%22%2C%22language%22%3A%22%22%2C%22collections%22%3A%5B%223NXZNVBX%22%5D%2C%22dateModified%22%3A%222025-07-24T19%3A07%3A40Z%22%7D%7D%5D%7D
1.
Pandey, S., Lovell, C. C., Modi, C. & Wandelt, B. D. Galactification: painting galaxies onto dark matter only simulations using a transformer-based model. Preprint at https://doi.org/10.48550/ARXIV.2511.08438 (2025).
1.
Zhao, Y., Wang, Z. & Zhang, M. PuzzleMoE: Efficient Compression of Large Mixture-of-Experts Models via Sparse Expert Merging and Bit-packed inference. Preprint at https://doi.org/10.48550/ARXIV.2511.04805 (2025).
1.
Yan, X., Firestone, M. A., Keceli, M., Chaudhuri, S. & Huerta, E. From Atomistic Models to Machine Learning: Predictive Design of Nanocarbons under Extreme Conditions. Biomedicine 26, 27.
1.
Zeng, G., Zhou, Z., Arora, D. & Zanette, A. Shrinking the Variance: Shrinkage Baselines for Reinforcement Learning with Verifiable Rewards. Preprint at https://doi.org/10.48550/ARXIV.2511.03710 (2025).
1.
Wen, J., Schwing, A. G. & Wang, S. NoPo-Avatar: Generalizable and Animatable Avatars from Sparse Inputs without Human Poses. Preprint at https://doi.org/10.48550/ARXIV.2511.16673 (2025).
1.
Mohapatra, R., Dutta, A. & Sharma, P. Tracing Multiphase Structure in the Circumgalactic Medium: Insights from Magnetohydrodynamic Turbulence Simulations. Preprint at https://doi.org/10.48550/ARXIV.2511.00229 (2025).
1.
Loehr, K. & Clark, B. K. Enhancing Neural Network Backflow. Preprint at https://doi.org/10.48550/ARXIV.2510.26906 (2025).
1.
Zhang, Z. A. et al. One Token per Highly Selective Frame: Towards Extreme Compression for Long Video Understanding. in (2025).
1.
Vega, O., Komijani, J., El-Khadra, A. & Marinkovic, M. Group-Equivariant Diffusion Models for Lattice Field Theory. Preprint at https://doi.org/10.48550/ARXIV.2510.26081 (2025).
1.
Cui, S. et al. Story of Two GPUs: Characterizing the Resilience of Hopper H100 and Ampere A100 GPUs. in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis 1145–1164 (ACM, St. Louis MO USA, 2025). http://doi.org/10.1145/3712285.3759821.
1.
Zhang, Y., Schwing, A. & Zhao, Z. Variational Masked Diffusion Models. Preprint at https://doi.org/10.48550/ARXIV.2510.23606 (2025).
1.
Cross-Domain Long-Term Forecasting: Radiation Dose from Sparse Neutron Sensor via Spatio-Temporal Operator Network. https://arxiv.org/html/2510.18041v1.
1.
Chen, H. et al. ERA: Transforming VLMs into Embodied Agents via Embodied Prior Learning and Online Reinforcement Learning. Preprint at https://doi.org/10.48550/ARXIV.2510.12693 (2025).
1.
Wu, M. & Zhang, Z. Maple: A Multi-agent System for Portable Deep Learning across Clusters. Preprint at https://doi.org/10.48550/ARXIV.2510.08842 (2025).
1.
Xie, H. et al. Diamond: Harnessing GPU Resources for Scientific Deep Learning. in 2025 IEEE International Conference on eScience (eScience) 196–204 (IEEE, Chicago, IL, USA, 2025). http://doi.org/10.1109/eScience65000.2025.00031.
1.
Patel, P. et al. RADAR-Radio Afterglow Detection and AI-driven Response: A Federated Framework for Gravitational Wave Event Follow-Up. Preprint at https://doi.org/10.48550/ARXIV.2507.14827 (2025).
1.
Kacmaz, S., Haas, R. & Huerta, E. A. Machine Learning-Driven Conservative-to-Primitive Conversion in Hybrid Piecewise Polytropic and Tabulated Equations of State. Symmetry 17, 1409 (2025).
1.
Srivastava, A., Basiri, S. & Salapaka, S. Autonomy-Aware Clustering: When Local Decisions Supersede Global Prescriptions. Preprint at https://doi.org/10.48550/ARXIV.2509.25775 (2025).
1.
Zhu, M. et al. Probing the Critical Point (CritPt) of AI Reasoning: a Frontier Physics Research Benchmark. Preprint at https://doi.org/10.48550/ARXIV.2509.26574 (2025).
1.
Lian, X., Tanaka, M., Ruwase, O. & Zhang, M. SuperOffload: Unleashing the Power of Large-Scale LLM Training on Superchips. Preprint at https://doi.org/10.48550/ARXIV.2509.21271 (2025).
1.
Díaz-Ibarra, O. H. et al. TChem-atm (v2.0.0): Scalable Performance-Portable Multiphase Atmospheric Chemistry. Preprint at https://doi.org/10.5194/egusphere-2025-4376 (2025).
1.
Zhao, Y., LV, J., Wu, D., Wang, J. & Gooley, C. Are We Scaling the Right Thing? A System Perspective on Test-Time Scaling. Preprint at https://doi.org/10.48550/ARXIV.2509.19645 (2025).
1.
Wilfong, B. et al. Testing and benchmarking emerging supercomputers via the MFC flow solver. Preprint at https://doi.org/10.48550/ARXIV.2509.13575 (2025).
1.
Bazavov, A. et al. High-Precision Scale Setting with the Omega-Baryon Mass and Gradient Flow. Preprint at https://doi.org/10.48550/ARXIV.2509.14367 (2025).
1.
Yazdani-Jahromi, M., Yalabadi, A. K. & Garibay, O. O. Equi-mRNA: Protein Translation Equivariant Encoding for mRNA Language Models. Preprint at https://doi.org/10.48550/ARXIV.2508.15103 (2025).
1.
Yu, J., Taneja, A., Lin, J. & Zhang, M. VoltanaLLM: Feedback-Driven Frequency Control and State-Space Routing for Energy-Efficient LLM Serving. Preprint at https://doi.org/10.48550/ARXIV.2509.04827 (2025).
1.
Baño-Medina, J. et al. A Regional High Resolution AI Weather Model for the Prediction of Atmospheric Rivers and Extreme Precipitation. Preprint at https://doi.org/10.21203/rs.3.rs-7087242/v1 (2025).
1.
Yuan, Y. et al. X-MoE: Enabling Scalable Training for Emerging Mixture-of-Experts Architectures on HPC Platforms. Preprint at https://doi.org/10.48550/ARXIV.2508.13337 (2025).
1.
Adams, M. & Bienz, A. Optimizing Allreduce Operations for Heterogeneous Architectures with Multiple Processes per GPU. Preprint at https://doi.org/10.48550/ARXIV.2508.13397 (2025).
1.
Gong, Y., Zhu, Z. & Zhang, M. InstantEdit: Text-Guided Few-Step Image Editing with Piecewise Rectified Flow. Preprint at https://doi.org/10.48550/ARXIV.2508.06033 (2025).
1.
Wu, T. et al. Spatial Heterogeneity Alters the Dynamics of the Yeast Galactose Switch: Insights from 4D RDME–ODE Hybrid Simulations. Preprint at https://doi.org/10.1101/2025.07.23.666409 (2025).
1.
Zhu, Z. et al. Understanding the Landscape of Ampere GPU Memory Errors. Preprint at https://doi.org/10.48550/arXiv.2508.03513 (2025).
1.
Bharadwaj, S. et al. OpenBEATs: A Fully Open-Source General-Purpose Audio Encoder. Preprint at https://doi.org/10.48550/ARXIV.2507.14129 (2025).
1.
Bode, B., Bauer, G., Herriott, L., Kindratenko, V. & Gropp, W. DeltaAI: A National Resource for AI/ML Research. in Practice and Experience in Advanced Research Computing 2025: The Power of Collaboration 1–4 (ACM, Columbus Ohio USA, 2025). http://doi.org/10.1145/3708035.3736062.
1.
Wang, R., Li, Y., Fung, Y. R. & Zhang, T. Let’s Reason Formally: Natural-Formal Hybrid Reasoning Enhances LLM’s Math Capability. Preprint at https://doi.org/10.48550/ARXIV.2505.23703 (2025).
1.
Yao, J., Wang, R. & Zhang, T. FANS -- Formal Answer Selection for Natural Language Math Reasoning Using Lean4. Preprint at https://doi.org/10.48550/ARXIV.2503.03238 (2025).
1.
Gladstone, A. et al. Energy-Based Transformers are Scalable Learners and Thinkers. Preprint at https://doi.org/10.48550/ARXIV.2507.02092 (2025).
1.
Pan, R. et al. Adapt-Pruner: Adaptive Structural Pruning for Efficient Small Language Model Training. Preprint at https://doi.org/10.48550/ARXIV.2502.03460 (2025).
1.
Wang, R. et al. MA-LoT: Model-Collaboration Lean-based Long Chain-of-Thought Reasoning enhances Formal Theorem Proving. Preprint at https://doi.org/10.48550/ARXIV.2503.03205 (2025).
1.
Lian, X. et al. Universal Checkpointing: A Flexible and Efficient Distributed Checkpointing System for Large-Scale DNN Training with Reconfigurable Parallelis. Preprint at https://doi.org/10.48550/ARXIV.2406.18820 (2024).
1.
Wang, X. et al. MedCite: Can Language Models Generate Verifiable Text for Medicine? Preprint at https://doi.org/10.48550/ARXIV.2506.06605 (2025).
1.
Basiri, S., Tiwari, D. & Salapaka, S. M. Parametrized Multi-Agent Routing via Deep Attention Models. Preprint at https://doi.org/10.48550/ARXIV.2507.22338 (2025).
1.
Wang, M.-Y. & Buitrago, P. Evaluating Pretraining Efficiency of Language Models on AI Accelerators. in Practice and Experience in Advanced Research Computing 2025: The Power of Collaboration 1–5 (ACM, Columbus Ohio USA, 2025). http://doi.org/10.1145/3708035.3736101.
1.
Sharma, A., Ding, H., Li, J., Dani, N. & Zhang, M. MiniKV: Pushing the Limits of LLM Inference via 2-Bit Layer-Discriminative KV Cache. Preprint at https://doi.org/10.48550/ARXIV.2411.18077 (2024).
1.
Kargupta, P., Zhang, Y., Jiao, Y., Ouyang, S. & Han, J. Synergizing Unsupervised Episode Detection with LLMs for Large-Scale News Events. Preprint at https://doi.org/10.48550/ARXIV.2408.04873 (2024).
1.
Tian, B., Gao, Q., Xianyu, S., Cui, X. & Zhang, M. FlexGaussian: Flexible and Cost-Effective Training-Free Compression for 3D Gaussian Splatting. Preprint at https://doi.org/10.48550/ARXIV.2507.06671 (2025).
1.
Park, J. et al. Bridging Sequential Deep Operator Network and Video Diffusion: Residual Refinement of Spatio-Temporal PDE Solutions. Preprint at https://doi.org/10.48550/ARXIV.2507.06133 (2025).
1.
Xi, J. et al. VecFlow: A High-Performance Vector Data Management System for Filtered-Search on GPUs. Preprint at https://doi.org/10.48550/ARXIV.2506.00812 (2025).
1.
Liu, Q. & Koric, S. Sequential Neural Operator Transformer for High-Fidelity Surrogates of Time-Dependent Non-linear Partial Differential Equations. Preprint at https://doi.org/10.48550/ARXIV.2507.03272 (2025).
1.
Kacmaz, S., Huerta, E. A. & Haas, R. Resolving Turbulent Magnetohydrodynamics: A Hybrid Operator-Diffusion Framework. Preprint at https://doi.org/10.48550/ARXIV.2507.02106 (2025).