| Navigation World Models |
Amir Bar |
PDF |
N/A |
Navigation World Models |
| Style3D: Attention-guided Multi-view Style Transfer for 3D Object Generation |
Bingjie Song |
PDF |
N/A |
Style3D: Attention-guided Multi-view Style Transfer for 3D Object Generation |
| Sparse-view Pose Estimation and Reconstruction via Analysis by Generative Synthesis |
Qitao Zhao |
PDF |
N/A |
Sparse-view Pose Estimation and Reconstruction via Analysis by Generative Synthesis |
| The Matrix: Infinite-Horizon World Generation with Real-Time Moving Control |
Ruili Feng |
PDF |
N/A |
The Matrix: Infinite-Horizon World Generation with Real-Time Moving Control |
| Streaming Detection of Queried Event Start |
Cristobal Eyzaguirre |
PDF |
N/A |
Streaming Detection of Queried Event Start |
| FreeSim: Toward Free-viewpoint Camera Simulation in Driving Scenes |
Lue Fan |
PDF |
N/A |
FreeSim: Toward Free-viewpoint Camera Simulation in Driving Scenes |
| Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning |
Wujian Peng |
PDF |
N/A |
Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning |
| From Individual to Society: A Survey on Social Simulation Driven by Large Language Model-based Agents |
Xinyi Mou |
PDF |
N/A |
From Individual to Society: A Survey on Social Simulation Driven by Large Language Model-based Agents |
| FLAIR: VLM with Fine-grained Language-informed Image Representations |
Rui Xiao |
PDF |
N/A |
FLAIR: VLM with Fine-grained Language-informed Image Representations |
| MIDI: Multi-Instance Diffusion for Single Image to 3D Scene Generation |
Zehuan Huang |
PDF |
N/A |
MIDI: Multi-Instance Diffusion for Single Image to 3D Scene Generation |
| Best-of-N Jailbreaking |
John Hughes |
PDF |
N/A |
Best-of-N Jailbreaking |
| PaliGemma 2: A Family of Versatile VLMs for Transfer |
Andreas Steiner |
PDF |
N/A |
PaliGemma 2: A Family of Versatile VLMs for Transfer |
| Imagine360: Immersive 360 Video Generation from Perspective Anchor |
Jing Tan |
PDF |
N/A |
Imagine360: Immersive 360 Video Generation from Perspective Anchor |
| Perception Tokens Enhance Visual Reasoning in Multimodal Language Models |
Mahtab Bigverdi |
PDF |
N/A |
Perception Tokens Enhance Visual Reasoning in Multimodal Language Models |
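| NODE-AdvGAN: Improving the transferability and perceptual similarity of adversarial examples by dynamic-system-driven adversarial generative model |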
Xinheng Xie |
PDF |
N/A |
NODE-AdvGAN: Improving the transferability and perceptual similarity of adversarial examples by dynamic-system-driven adversarial generative model |
| Evaluating Gender Bias Transfer between Pre-trained and Prompt-Adapted Language Models |
Natalie Mackraz |
PDF |
N/A |
Evaluating Gender Bias Transfer between Pre-trained and Prompt-Adapted Language Models |
| A Review on Scientific Knowledge Extraction using Large Language Models in Biomedical Sciences |
Gabriel Lino Garcia |
PDF |
N/A |
A Review on Scientific Knowledge Extraction using Large Language Models in Biomedical Sciences |
| FANAL -- Financial Activity News Alerting Language Modeling Framework |
Urjitkumar Patel |
PDF |
N/A |
FANAL -- Financial Activity News Alerting Language Modeling Framework |
| Feed-Forward Bullet-Time Reconstruction of Dynamic Scenes from Monocular Videos |
Hanxue Liang |
PDF |
N/A |
Feed-Forward Bullet-Time Reconstruction of Dynamic Scenes from Monocular Videos |
| Seeing Beyond Views: Multi-View Driving Scene Video Generation with Holistic Attention |
Hannan Lu |
PDF |
N/A |
Seeing Beyond Views: Multi-View Driving Scene Video Generation with Holistic Attention |
| Dense Scene Reconstruction from Light-Field Images Affected by Rolling Shutter |
Hermes McGriff |
PDF |
N/A |
Dense Scene Reconstruction from Light-Field Images Affected by Rolling Shutter |
| NVComposer: Boosting Generative Novel View Synthesis with Multiple Sparse and Unposed Images |
Lingen Li |
PDF |
N/A |
NVComposer: Boosting Generative Novel View Synthesis with Multiple Sparse and Unposed Images |
| You're (Not) My Type -- Can LLMs Generate Feedback of Specific Types for Introductory Programming Tasks? |
Dominic Lohr |
PDF |
N/A |
You're (Not) My Type -- Can LLMs Generate Feedback of Specific Types for Introductory Programming Tasks? |
| Distilling Diffusion Models to Efficient 3D LiDAR Scene Completion |
Shengyuan Zhang |
PDF |
N/A |
Distilling Diffusion Models to Efficient 3D LiDAR Scene Completion |
| KKLIP: Knowledge Distillation Exploiting K-means Clustering for Language-Image Pre-Training |
Kuei-Chun Kao |
PDF |
N/A |
KKLIP: Knowledge Distillation Exploiting K-means Clustering for Language-Image Pre-Training |
| Distillation of Diffusion Features for Semantic Correspondence |
Frank Fundel |
PDF |
N/A |
Distillation of Diffusion Features for Semantic Correspondence |
| Self-test loss functions for learning weak-form operators and gradient flows |
Yuan Gao |
PDF |
N/A |
Self-test loss functions for learning weak-form operators and gradient flows |
| A Bidirectional Siamese Recurrent Neural Network for Accurate Gait Recognition Using Body Landmarks |
Proma Hossain Progga |
PDF |
N/A |
A Bidirectional Siamese Recurrent Neural Network for Accurate Gait Recognition Using Body Landmarks |
| Soft Checksums to Flag Untrustworthy Machine Learning Surrogate Predictions and Application to Atomic Physics Simulations |
Casey Lauer |
PDF |
N/A |
Soft Checksums to Flag Untrustworthy Machine Learning Surrogate Predictions and Application to Atomic Physics Simulations |
| TRENDy: Temporal Regression of Effective Non-linear Dynamics |
Matthew Ricci |
PDF |
N/A |
TRENDy: Temporal Regression of Effective Non-linear Dynamics |
| Beyond algorithm hyperparameters: on preprocessing hyperparameters and associated pitfalls in machine learning applications |
Christina Sauer |
PDF |
N/A |
Beyond algorithm hyperparameters: on preprocessing hyperparameters and associated pitfalls in machine learning applications |
| Data Fusion of Semantic and Depth Information in the Context of Object Detection |
Md Abu Yusuf |
PDF |
N/A |
Data Fusion of Semantic and Depth Information in the Context of Object Detection |
| Flow Matching with General Discrete Paths: A Kinetic-Optimal Perspective |
Neta Shaul |
PDF |
N/A |
Flow Matching with General Discrete Paths: A Kinetic-Optimal Perspective |
| Tight PAC-Bayesian Risk Certificates for Contrastive Learning |
Anna van Elst |
PDF |
N/A |
Tight PAC-Bayesian Risk Certificates for Contrastive Learning |
| Convolutional Neural Networks and Mixture of Experts for Intrusion Detection in 5G Networks and beyond |
Loukas Ilias |
PDF |
N/A |
Convolutional Neural Networks and Mixture of Experts for Intrusion Detection in 5G Networks and beyond |
| Urban4D: Semantic-Guided 4D Gaussian Splatting for Urban Scene Reconstruction |
Ziwen Li |
PDF |
N/A |
Urban4D: Semantic-Guided 4D Gaussian Splatting for Urban Scene Reconstruction |
| Measure Anything: Real-time, Multi-stage Vision-based Dimensional Measurement using Segment Anything |
Yongkyu Lee |
PDF |
N/A |
Measure Anything: Real-time, Multi-stage Vision-based Dimensional Measurement using Segment Anything |
| Cluster Specific Representation Learning |
Mahalakshmi Sabanayagam |
PDF |
N/A |
Cluster Specific Representation Learning |
| Training-Free Mitigation of Language Reasoning Degradation After Multimodal Instruction Tuning |
Neale Ratzlaff |
PDF |
N/A |
Training-Free Mitigation of Language Reasoning Degradation After Multimodal Instruction Tuning |
| YT-30M: A multi-lingual multi-category dataset of YouTube comments |
Hridoy Sankar Dutta |
PDF |
N/A |
YT-30M: A multi-lingual multi-category dataset of YouTube comments |
| Validity and efficiency of the conformal CUSUM procedure |
Vladimir Vovk |
PDF |
N/A |
Validity and efficiency of the conformal CUSUM procedure |
| Gesture Classification in Artworks Using Contextual Image Features |
Azhar Hussian |
PDF |
N/A |
Gesture Classification in Artworks Using Contextual Image Features |
| Pre-trained Multiple Latent Variable Generative Models are good defenders against Adversarial Attacks |
Dario Serez |
PDF |
N/A |
Pre-trained Multiple Latent Variable Generative Models are good defenders against Adversarial Attacks |
| PlanarSplatting: Accurate Planar Surface Reconstruction in 3 Minutes |
Bin Tan |
PDF |
N/A |
PlanarSplatting: Accurate Planar Surface Reconstruction in 3 Minutes |
| From Words to Workflows: Automating Business Processes |
Laura Minkova |
PDF |
N/A |
From Words to Workflows: Automating Business Processes |
| State Frequency Estimation for Anomaly Detection |
Clinton Cao |
PDF |
N/A |
State Frequency Estimation for Anomaly Detection |
| PBP: Post-training Backdoor Purification for Malware Classifiers |
Dung Thuy Nguyen |
PDF |
N/A |
PBP: Post-training Backdoor Purification for Malware Classifiers |
| CleanDIFT: Diffusion Features without Noise |
Nick Stracke |
PDF |
N/A |
CleanDIFT: Diffusion Features without Noise |
| BIMCaP: BIM-based AI-supported LiDAR-Camera Pose Refinement |
Miguel Arturo Vega Torres |
PDF |
N/A |
BIMCaP: BIM-based AI-supported LiDAR-Camera Pose Refinement |
| Genetic Algorithm Based System for Path Planning with Unmanned Aerial Vehicles Swarms in Cell-Grid Environments |
Alejandro Puente-Castro |
PDF |
N/A |
Genetic Algorithm Based System for Path Planning with Unmanned Aerial Vehicles Swarms in Cell-Grid Environments |
| SINGER: Vivid Audio-driven Singing Video Generation with Multi-scale Spectral Diffusion Model |
Yan Li |
PDF |
N/A |
SINGER: Vivid Audio-driven Singing Video Generation with Multi-scale Spectral Diffusion Model |
| 2DGS-Room: Seed-Guided 2D Gaussian Splatting with Geometric Constraints for High-Fidelity Indoor Scene Reconstruction |
Wanting Zhang |
PDF |
N/A |
2DGS-Room: Seed-Guided 2D Gaussian Splatting with Geometric Constrains for High-Fidelity Indoor Scene Reconstruction |
| Assessing Foundation Models' Transferability to Physiological Signals in Precision Medicine |
Matthias Christenson |
PDF |
N/A |
Assessing Foundation Models' Transferability to Physiological Signals in Precision Medicine |
| Tango*: Constrained synthesis planning using chemically informed value functions |
Daniel Armstrong |
PDF |
N/A |
Tango*: Constrained synthesis planning using chemically informed value functions |
| Automated Test-Case Generation for REST APIs Using Model Inference Search Heuristic |
Clinton Cao |
PDF |
N/A |
Automated Test-Case Generation for REST APIs Using Model Inference Search Heuristic |
| Learning Semantic Association Rules from Internet of Things Data |
Erkan Karabulut |
PDF |
N/A |
Learning Semantic Association Rules from Internet of Things Data |
| Deep Learning for Sea Surface Temperature Reconstruction under Cloud Occlusion |
Andrea Asperti |
PDF |
N/A |
Deep Learning for Sea Surface Temperature Reconstruction under Cloud Occlusion |
| PrefixKV: Adaptive Prefix KV Cache is What Vision Instruction-Following Models Need for Efficient Generation |
Ao Wang |
PDF |
N/A |
PrefixKV: Adaptive Prefix KV Cache is What Vision Instruction-Following Models Need for Efficient Generation |
| Skel3D: Skeleton Guided Novel View Synthesis |
Aron Fóthi |
PDF |
N/A |
Skel3D: Skeleton Guided Novel View Synthesis |
| Deep Operator BSDE: a Numerical Scheme to Approximate the Solution Operators |
Giulia Di Nunno |
PDF |
N/A |
Deep Operator BSDE: a Numerical Scheme to Approximate the Solution Operators |
| Benchmarking Pretrained Attention-based Models for Real-Time Recognition in Robot-Assisted Esophagectomy |
Ronald L. P. D. de Jong |
PDF |
N/A |
Benchmarking Pretrained Attention-based Models for Real-Time Recognition in Robot-Assisted Esophagectomy |
| Implicit Priors Editing in Stable Diffusion via Targeted Token Adjustment |
Feng He |
PDF |
N/A |
Implicit Priors Editing in Stable Diffusion via Targeted Token Adjustment |
| RedStone: Curating General, Code, Math, and QA Data for Large Language Models |
Yaoyao Chang |
PDF |
N/A |
RedStone: Curating General, Code, Math, and QA Data for Large Language Models |
| Can neural operators always be continuously discretized? |
Takashi Furuya |
PDF |
N/A |
Can neural operators always be continuously discretized? |
| Risk-aware Classification via Uncertainty Quantification |
Murat Sensoy |
PDF |
N/A |
Risk-aware Classification via Uncertainty Quantification |
| Enhancing Supply Chain Visibility with Generative AI: An Exploratory Case Study on Relationship Prediction in Knowledge Graphs |
Ge Zheng |
PDF |
N/A |
Enhancing Supply Chain Visibility with Generative AI: An Exploratory Case Study on Relationship Prediction in Knowledge Graphs |
| DiffStyleTTS: Diffusion-based Hierarchical Prosody Modeling for Text-to-Speech with Diverse and Controllable Styles |
Jiaxuan Liu |
PDF |
N/A |
DiffStyleTTS: Diffusion-based Hierarchical Prosody Modeling for Text-to-Speech with Diverse and Controllable Styles |
| Reactive Orchestration for Hierarchical Federated Learning Under a Communication Cost Budget |
Ivan Čilić |
PDF |
N/A |
Reactive Orchestration for Hierarchical Federated Learning Under a Communication Cost Budget |
| Classical Shadows with Improved Median-of-Means Estimation |
Winston Fu |
PDF |
N/A |
Classical Shadows with Improved Median-of-Means Estimation |
| Mapping using Transformers for Volumes -- Network for Super-Resolution with Long-Range Interactions |
August Leander Høeg |
PDF |
N/A |
Mapping using Transformers for Volumes -- Network for Super-Resolution with Long-Range Interactions |
| Volumetrically Consistent 3D Gaussian Rasterization |
Chinmay Talegaonkar |
PDF |
N/A |
Volumetrically Consistent 3D Gaussian Rasterization |
| Granular Ball Twin Support Vector Machine with Universum Data |
M. A. Ganaie |
PDF |
N/A |
Granular Ball Twin Support Vector Machine with Universum Data |
| SGSST: Scaling Gaussian Splatting StyleTransfer |
Bruno Galerne |
PDF |
N/A |
SGSST: Scaling Gaussian Splatting StyleTransfer |
| WiS Platform: Enhancing Evaluation of LLM-Based Multi-Agent Systems Through Game-Based Analysis |
Chengwei Hu |
PDF |
N/A |
WiS Platform: Enhancing Evaluation of LLM-Based Multi-Agent Systems Through Game-Based Analysis |
| TASR: Timestep-Aware Diffusion Model for Image Super-Resolution |
Qinwei Lin |
PDF |
N/A |
TASR: Timestep-Aware Diffusion Model for Image Super-Resolution |
| Intuitive Axial Augmentation Using Polar-Sine-Based Piecewise Distortion for Medical Slice-Wise Segmentation |
Yiqin Zhang |
PDF |
N/A |
Intuitive Axial Augmentation Using Polar-Sine-Based Piecewise Distortion for Medical Slice-Wise Segmentation |
| Fairer Analysis and Demographically Balanced Face Generation for Fairer Face Verification |
Alexandre Fournier-Montgieux |
PDF |
N/A |
Fairer Analysis and Demographically Balanced Face Generation for Fairer Face Verification |
| DIVE: Taming DINO for Subject-Driven Video Editing |
Yi Huang |
PDF |
N/A |
DIVE: Taming DINO for Subject-Driven Video Editing |
| Improving Linguistic Diversity of Large Language Models with Possibility Exploration Fine-Tuning |
Long Mai |
PDF |
N/A |
Improving Linguistic Diversity of Large Language Models with Possibility Exploration Fine-Tuning |
| UniVAD: A Training-free Unified Model for Few-shot Visual Anomaly Detection |
Zhaopeng Gu |
PDF |
N/A |
UniVAD: A Training-free Unified Model for Few-shot Visual Anomaly Detection |
| AI-Driven Day-to-Day Route Choice |
Leizhen Wang |
PDF |
N/A |
AI-Driven Day-to-Day Route Choice |
| Yankari: A Monolingual Yoruba Dataset |
Maro Akpobi |
PDF |
N/A |
Yankari: A Monolingual Yoruba Dataset |
| On Approximability of $\ell_2^2$ Min-Sum Clustering |
Karthik C. S. |
PDF |
N/A |
On Approximability of $\ell_2^2$ Min-Sum Clustering |
| LuxEmbedder: A Cross-Lingual Approach to Enhanced Luxembourgish Sentence Embeddings |
Fred Philippy |
PDF |
N/A |
LuxEmbedder: A Cross-Lingual Approach to Enhanced Luxembourgish Sentence Embeddings |
| Multi-Action Restless Bandits with Weakly Coupled Constraints: Simultaneous Learning and Control |
Jing Fu |
PDF |
N/A |
Multi-Action Restless Bandits with Weakly Coupled Constraints: Simultaneous Learning and Control |
| A Stitch in Time Saves Nine: Small VLM is a Precise Guidance for accelerating Large VLMs |
Wangbo Zhao |
PDF |
N/A |
A Stitch in Time Saves Nine: Small VLM is a Precise Guidance for accelerating Large VLMs |
| Scalable Bayesian Tensor Ring Factorization for Multiway Data Analysis |
Zerui Tao |
PDF |
N/A |
Scalable Bayesian Tensor Ring Factorization for Multiway Data Analysis |
| Domain-Agnostic Stroke Lesion Segmentation Using Physics-Constrained Synthetic Data |
Liam Chalcroft |
PDF |
N/A |
Domain-Agnostic Stroke Lesion Segmentation Using Physics-Constrained Synthetic Data |
| FlashAttention on a Napkin: A Diagrammatic Approach to Deep Learning IO-Awareness |
Vincent Abbott |
PDF |
N/A |
FlashAttention on a Napkin: A Diagrammatic Approach to Deep Learning IO-Awareness |
| Geometry-guided Cross-view Diffusion for One-to-many Cross-view Image Synthesis |
Tao Jun Lin |
PDF |
N/A |
Geometry-guided Cross-view Diffusion for One-to-many Cross-view Image Synthesis |
| Equivariant Representation Learning for Augmentation-based Self-Supervised Learning via Image Reconstruction |
Qin Wang |
PDF |
N/A |
Equivariant Representation Learning for Augmentation-based Self-Supervised Learning via Image Reconstruction |
| Path-Guided Particle-based Sampling |
Mingzhou Fan |
PDF |
N/A |
Path-Guided Particle-based Sampling |
| Grounded Language Design for Lightweight Diagramming for Formal Methods |
Siddhartha Prasad |
PDF |
N/A |
Grounded Language Design for Lightweight Diagramming for Formal Methods |
| Typology of User Behaviors: An Exploratory Study of Complex Search Sessions on the Web |
Claire Ibarboure |
PDF |
N/A |
Typologie des comportements utilisateurs : étude exploratoire des sessions de recherche complexe sur le Web |
| Contextual Data Integration for Bike-sharing Demand Prediction with Graph Neural Networks in Degraded Weather Conditions |
Romain Rochas |
PDF |
N/A |
Contextual Data Integration for Bike-sharing Demand Prediction with Graph Neural Networks in Degraded Weather Conditions |
| Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation |
Shivalika Singh |
PDF |
N/A |
Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation |
| Conveying Emotions to Robots through Touch and Sound |
Qiaoqiao Ren |
PDF |
N/A |
Conveying Emotions to Robots through Touch and Sound |
| Gaussian Processes for Probabilistic Estimates of Earthquake Ground Shaking: A 1-D Proof-of-Concept |
Sam A. Scivier |
PDF |
N/A |
Gaussian Processes for Probabilistic Estimates of Earthquake Ground Shaking: A 1-D Proof-of-Concept |
| Composed Image Retrieval for Training-Free Domain Conversion |
Nikos Efthymiadis |
PDF |
N/A |
Composed Image Retrieval for Training-Free Domain Conversion |
| Diffusion-VLA: Scaling Robot Foundation Models via Unified Diffusion and Autoregression |
Junjie Wen |
PDF |
N/A |
Diffusion-VLA: Scaling Robot Foundation Models via Unified Diffusion and Autoregression |
| Integrating Generative AI into Art Therapy: A Technical Showcase |
Yannis Valentin Schmutz |
PDF |
N/A |
Integrating Generative AI into Art Therapy: A Technical Showcase |
| Black-Box Forgery Attacks on Semantic Watermarks for Diffusion Models |
Andreas Müller |
PDF |
N/A |
Black-Box Forgery Attacks on Semantic Watermarks for Diffusion Models |
| AntLM: Bridging Causal and Masked Language Models |
Xinru Yu |
PDF |
N/A |
AntLM: Bridging Causal and Masked Language Models |
| Nonparametric Filtering, Estimation and Classification using Neural Jump ODEs |
Jakob Heiss |
PDF |
N/A |
Nonparametric Filtering, Estimation and Classification using Neural Jump ODEs |
| Intent-driven In-context Learning for Few-shot Dialogue State Tracking |
Zihao Yi |
PDF |
N/A |
Intent-driven In-context Learning for Few-shot Dialogue State Tracking |
| RFSR: Improving ISR Diffusion Models via Reward Feedback Learning |
Xiaopeng Sun |
PDF |
N/A |
RFSR: Improving ISR Diffusion Models via Reward Feedback Learning |
| Detecting abnormal heart sound using mobile phones and on-device IConNet |
Linh Vu |
PDF |
N/A |
Detecting abnormal heart sound using mobile phones and on-device IConNet |
| NeRF and Gaussian Splatting SLAM in the Wild |
Fabian Schmidt |
PDF |
N/A |
NeRF and Gaussian Splatting SLAM in the Wild |
| Is JPEG AI going to change image forensics? |
Edoardo Daniele Cannas |
PDF |
N/A |
Is JPEG AI going to change image forensics? |
| GERD: Geometric event response data generation |
Jens Egholm Pedersen |
PDF |
N/A |
GERD: Geometric event response data generation |
| Learning on One Mode: Addressing Multi-Modality in Offline Reinforcement Learning |
Mianchu Wang |
PDF |
N/A |
Learning on One Mode: Addressing Multi-Modality in Offline Reinforcement Learning |
| DynamicControl: Adaptive Condition Selection for Improved Text-to-Image Generation |
Qingdong He |
PDF |
N/A |
DynamicControl: Adaptive Condition Selection for Improved Text-to-Image Generation |
| Alignment at Pre-training! Towards Native Alignment for Arabic LLMs |
Juhao Liang |
PDF |
N/A |
Alignment at Pre-training! Towards Native Alignment for Arabic LLMs |
| Variable-Speed Teaching-Playback as Real-World Data Augmentation for Imitation Learning |
Nozomu Masuya |
PDF |
N/A |
Variable-Speed Teaching-Playback as Real-World Data Augmentation for Imitation Learning |
| Controlling the Mutation in Large Language Models for the Efficient Evolution of Algorithms |
Haoran Yin |
PDF |
N/A |
Controlling the Mutation in Large Language Models for the Efficient Evolution of Algorithms |
| AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning |
Yiwu Zhong |
PDF |
N/A |
AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning |
| Benchmarking terminology building capabilities of ChatGPT on an English-Russian Fashion Corpus |
Anastasiia Bezobrazova |
PDF |
N/A |
Benchmarking terminology building capabilities of ChatGPT on an English-Russian Fashion Corpus |
| Task-driven Image Fusion with Learnable Fusion Loss |
Haowen Bai |
PDF |
N/A |
Task-driven Image Fusion with Learnable Fusion Loss |
| Dynamic Consistent $k$-Center Clustering with Optimal Recourse |
Sebastian Forster |
PDF |
N/A |
Dynamic Consistent $k$-Center Clustering with Optimal Recourse |
| Does Safety Training of LLMs Generalize to Semantically Related Natural Prompts? |
Sravanti Addepalli |
PDF |
N/A |
Does Safety Training of LLMs Generalize to Semantically Related Natural Prompts? |
| PERL: Pinyin Enhanced Rephrasing Language Model for Chinese ASR N-best Error Correction |
Junhong Liang |
PDF |
N/A |
PERL: Pinyin Enhanced Rephrasing Language Model for Chinese ASR N-best Error Correction |
| MaterialPicker: Multi-Modal Material Generation with Diffusion Transformers |
Xiaohe Ma |
PDF |
N/A |
MaterialPicker: Multi-Modal Material Generation with Diffusion Transformers |
| Channel Reflection: Knowledge-Driven Data Augmentation for EEG-Based Brain-Computer Interfaces |
Ziwei Wang |
PDF |
N/A |
Channel Reflection: Knowledge-Driven Data Augmentation for EEG-Based Brain-Computer Interfaces |
| Linq-Embed-Mistral Technical Report |
Chanyeol Choi |
PDF |
N/A |
Linq-Embed-Mistral Technical Report |
| Survey of different Large Language Model Architectures: Trends, Benchmarks, and Challenges |
Minghao Shao |
PDF |
N/A |
Survey of different Large Language Model Architectures: Trends, Benchmarks, and Challenges |
| Beyond [cls]: Exploring the true potential of Masked Image Modeling representations |
Marcin Przewięźlikowski |
PDF |
N/A |
Beyond [cls]: Exploring the true potential of Masked Image Modeling representations |
| Continual Low-Rank Scaled Dot-product Attention |
Ginés Carreto Picón |
PDF |
N/A |
Continual Low-Rank Scaled Dot-product Attention |
| ClusterKV: Manipulating LLM KV Cache in Semantic Space for Recallable Compression |
Guangda Liu |
PDF |
N/A |
ClusterKV: Manipulating LLM KV Cache in Semantic Space for Recallable Compression |
| Semi-Supervised Transfer Boosting (SS-TrBoosting) |
Lingfei Deng |
PDF |
N/A |
Semi-Supervised Transfer Boosting (SS-TrBoosting) |
| Parametric Enhancement of PerceptNet: A Human-Inspired Approach for Image Quality Assessment |
Jorge Vila-Tomás |
PDF |
N/A |
Parametric Enhancement of PerceptNet: A Human-Inspired Approach for Image Quality Assessment |
| U-MATH: A University-Level Benchmark for Evaluating Mathematical Skills in LLMs |
Konstantin Chernyshev |
PDF |
N/A |
U-MATH: A University-Level Benchmark for Evaluating Mathematical Skills in LLMs |
| Fab-ME: A Vision State-Space and Attention-Enhanced Framework for Fabric Defect Detection |
Shuai Wang |
PDF |
N/A |
Fab-ME: A Vision State-Space and Attention-Enhanced Framework for Fabric Defect Detection |
| Biologically-inspired Semi-supervised Semantic Segmentation for Biomedical Imaging |
Luca Ciampi |
PDF |
N/A |
Biologically-inspired Semi-supervised Semantic Segmentation for Biomedical Imaging |
| Node Classification With Integrated Reject Option |
Uday Bhaskar |
PDF |
N/A |
Node Classification With Integrated Reject Option |
| Semi-decentralized Training of Spatio-Temporal Graph Neural Networks for Traffic Prediction |
Ivan Kralj |
PDF |
N/A |
Semi-decentralized Training of Spatio-Temporal Graph Neural Networks for Traffic Prediction |
| Weighted-Reward Preference Optimization for Implicit Model Fusion |
Ziyi Yang |
PDF |
N/A |
Weighted-Reward Preference Optimization for Implicit Model Fusion |
| Optimizing Dense Visual Predictions Through Multi-Task Coherence and Prioritization |
Maxime Fontana |
PDF |
N/A |
Optimizing Dense Visual Predictions Through Multi-Task Coherence and Prioritization |
| Towards Understanding and Quantifying Uncertainty for Text-to-Image Generation |
Gianni Franchi |
PDF |
N/A |
Towards Understanding and Quantifying Uncertainty for Text-to-Image Generation |
| PatchDPO: Patch-level DPO for Finetuning-free Personalized Image Generation |
Qihan Huang |
PDF |
N/A |
PatchDPO: Patch-level DPO for Finetuning-free Personalized Image Generation |
| Automatic detection of diseases in Spanish clinical notes combining medical language models and ontologies |
Leon-Paul Schaub Torre |
PDF |
N/A |
Automatic detection of diseases in Spanish clinical notes combining medical language models and ontologies |
| IRisPath: Enhancing Off-Road Navigation with Robust IR-RGB Fusion for Improved Day and Night Traversability |
Saksham Sharma |
PDF |
N/A |
IRisPath: Enhancing Off-Road Navigation with Robust IR-RGB Fusion for Improved Day and Night Traversability |
| Are Explanations Helpful? A Comparative Analysis of Explainability Methods in Skin Lesion Classifiers |
Rosa Y. G. Paccotacya-Yanque |
PDF |
N/A |
Are Explanations Helpful? A Comparative Analysis of Explainability Methods in Skin Lesion Classifiers |
| Physics-Informed Deep Inverse Operator Networks for Solving PDE Inverse Problems |
Sung Woong Cho |
PDF |
N/A |
Physics-Informed Deep Inverse Operator Networks for Solving PDE Inverse Problems |
| Byte BPE Tokenization as an Inverse string Homomorphism |
Saibo Geng |
PDF |
N/A |
Byte BPE Tokenization as an Inverse string Homomorphism |
| Multi-Level Correlation Network For Few-Shot Image Classification |
Yunkai Dang |
PDF |
N/A |
Multi-Level Correlation Network For Few-Shot Image Classification |
| LEP-QNN: Loan Eligibility Prediction Using Quantum Neural Networks |
Nouhaila Innan |
PDF |
N/A |
LEP-QNN: Loan Eligibility Prediction Using Quantum Neural Networks |
| Testing Neural Network Verifiers: A Soundness Benchmark with Hidden Counterexamples |
Xingjian Zhou |
PDF |
N/A |
Testing Neural Network Verifiers: A Soundness Benchmark with Hidden Counterexamples |
| A Measure of the System Dependence of Automated Metrics |
Pius von Däniken |
PDF |
N/A |
A Measure of the System Dependence of Automated Metrics |
| Large Language Models show both individual and collective creativity comparable to humans |
Luning Sun |
PDF |
N/A |
Large Language Models show both individual and collective creativity comparable to humans |
| Appearance Matching Adapter for Exemplar-based Semantic Image Synthesis |
Siyoon Jin |
PDF |
N/A |
Appearance Matching Adapter for Exemplar-based Semantic Image Synthesis |
| Fine-Grained Behavior Simulation with Role-Playing Large Language Model on Social Media |
Kun Li |
PDF |
N/A |
Fine-Grained Behavior Simulation with Role-Playing Large Language Model on Social Media |
| Topological Trajectory Classification and Landmark Inference on Simplicial Complexes |
Vincent P. Grande |
PDF |
N/A |
Topological Trajectory Classification and Landmark Inference on Simplicial Complexes |
| Generalized Diffusion Model with Adjusted Offset Noise |
Takuro Kutsuna |
PDF |
N/A |
Generalized Diffusion Model with Adjusted Offset Noise |
| Unifying KV Cache Compression for Large Language Models with LeanKV |
Yanqi Zhang |
PDF |
N/A |
Unifying KV Cache Compression for Large Language Models with LeanKV |
| Short-reach Optical Communications: A Real-world Task for Neuromorphic Hardware |
Elias Arnold |
PDF |
N/A |
Short-reach Optical Communications: A Real-world Task for Neuromorphic Hardware |
| Integrating programmable plasticity in experiment descriptions for analog neuromorphic hardware |
Philipp Spilger |
PDF |
N/A |
Integrating programmable plasticity in experiment descriptions for analog neuromorphic hardware |
| Robust Multi-bit Text Watermark with LLM-based Paraphrasers |
Xiaojun Xu |
PDF |
N/A |
Robust Multi-bit Text Watermark with LLM-based Paraphrasers |
| Splats in Splats: Embedding Invisible 3D Watermark within Gaussian Splatting |
Yijia Guo |
PDF |
N/A |
Splats in Splats: Embedding Invisible 3D Watermark within Gaussian Splatting |
| Sinkhorn Algorithm for Sequentially Composed Optimal Transports |
Kazuki Watanabe |
PDF |
N/A |
Sinkhorn Algorithm for Sequentially Composed Optimal Transports |
| ObjectFinder: Open-Vocabulary Assistive System for Interactive Object Search by Blind People |
Ruiping Liu |
PDF |
N/A |
ObjectFinder: Open-Vocabulary Assistive System for Interactive Object Search by Blind People |
| Experience-driven discovery of planning strategies |
Ruiqi He |
PDF |
N/A |
Experience-driven discovery of planning strategies |
| CredID: Credible Multi-Bit Watermark for Large Language Models Identification |
Haoyu Jiang |
PDF |
N/A |
CredID: Credible Multi-Bit Watermark for Large Language Models Identification |
| Few-Shot Learning with Adaptive Weight Masking in Conditional GANs |
Jiacheng Hu |
PDF |
N/A |
Few-Shot Learning with Adaptive Weight Masking in Conditional GANs |
| ChatTS: Aligning Time Series with LLMs via Synthetic Data for Enhanced Understanding and Reasoning |
Zhe Xie |
PDF |
N/A |
ChatTS: Aligning Time Series with LLMs via Synthetic Data for Enhanced Understanding and Reasoning |
| MultiGO: Towards Multi-level Geometry Learning for Monocular 3D Textured Human Reconstruction |
Gangjian Zhang |
PDF |
N/A |
MultiGO: Towards Multi-level Geometry Learning for Monocular 3D Textured Human Reconstruction |
| Lightweight Multiplane Images Network for Real-Time Stereoscopic Conversion from Planar Video |
Shanding Diao |
PDF |
N/A |
Lightweight Multiplane Images Network for Real-Time Stereoscopic Conversion from Planar Video |
| A surprisal oracle for when every layer counts |
Xudong Hong |
PDF |
N/A |
A surprisal oracle for when every layer counts |
| Enhancing Recommendation Systems with GNNs and Addressing Over-Smoothing |
Wenyi Liu |
PDF |
N/A |
Enhancing Recommendation Systems with GNNs and Addressing Over-Smoothing |
| TOOL-ED: Enhancing Empathetic Response Generation with the Tool Calling Capability of LLM |
Huiying Cao |
PDF |
N/A |
TOOL-ED: Enhancing Empathetic Response Generation with the Tool Calling Capability of LLM |
| Decentralized Mobile Target Tracking Using Consensus-Based Estimation with Nearly-Constant-Velocity Modeling |
Amir Ahmad Ghods |
PDF |
N/A |
Decentralized Mobile Target Tracking Using Consensus-Based Estimation with Nearly-Constant-Velocity Modeling |
| Expanding Event Modality Applications through a Robust CLIP-Based Encoder |
Sungheon Jeong |
PDF |
N/A |
Expanding Event Modality Applications through a Robust CLIP-Based Encoder |
| Revolve: Optimizing AI Systems by Tracking Response Evolution in Textual Optimization |
Peiyan Zhang |
PDF |
N/A |
Revolve: Optimizing AI Systems by Tracking Response Evolution in Textual Optimization |
| Mimir: Improving Video Diffusion Models for Precise Text Understanding |
Shuai Tan |
PDF |
N/A |
Mimir: Improving Video Diffusion Models for Precise Text Understanding |
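| Hybrid deep learning-based strategy for the hepatocellular carcinoma cancer grade classification of H&E stained liver histopathology images |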
Ajinkya Deshpande |
PDF |
N/A |
Hybrid deep learning-based strategy for the hepatocellular carcinoma cancer grade classification of H&E stained liver histopathology images |
| A Scalable Quantum Neural Network for Approximate SRBB-Based Unitary Synthesis |
Giacomo Belli |
PDF |
N/A |
A Scalable Quantum Neural Network for Approximate SRBB-Based Unitary Synthesis |
| Align3R: Aligned Monocular Depth Estimation for Dynamic Videos |
Jiahao Lu |
PDF |
N/A |
Align3R: Aligned Monocular Depth Estimation for Dynamic Videos |
| RoDyGS: Robust Dynamic Gaussian Splatting for Casual Videos |
Yoonwoo Jeong |
PDF |
N/A |
RoDyGS: Robust Dynamic Gaussian Splatting for Casual Videos |
| Coordinated Multi-Armed Bandits for Improved Spatial Reuse in Wi-Fi |
Francesc Wilhelmi |
PDF |
N/A |
Coordinated Multi-Armed Bandits for Improved Spatial Reuse in Wi-Fi |
| ASR-EC Benchmark: Evaluating Large Language Models on Chinese ASR Error Correction |
Victor Junqiu Wei |
PDF |
N/A |
ASR-EC Benchmark: Evaluating Large Language Models on Chinese ASR Error Correction |
| Analytic Study of Text-Free Speech Synthesis for Raw Audio using a Self-Supervised Learning Model |
Joonyong Park |
PDF |
N/A |
Analytic Study of Text-Free Speech Synthesis for Raw Audio using a Self-Supervised Learning Model |
| Preference-based opponent shaping in differentiable games |
Xinyu Qiao |
PDF |
N/A |
Preference-based opponent shaping in differentiable games |
| TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation |
Liao Qu |
PDF |
N/A |
TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation |
| UTSD: Unified Time Series Diffusion Model |
Xiangkai Ma |
PDF |
N/A |
UTSD: Unified Time Series Diffusion Model |
| Lightweight Stochastic Video Prediction via Hybrid Warping |
Kazuki Kotoyori |
PDF |
N/A |
Lightweight Stochastic Video Prediction via Hybrid Warping |
| CLAP: Unsupervised 3D Representation Learning for Fusion 3D Perception via Curvature Sampling and Prototype Learning |
Runjian Chen |
PDF |
N/A |
CLAP: Unsupervised 3D Representation Learning for Fusion 3D Perception via Curvature Sampling and Prototype Learning |
| Revisiting Energy-Based Model for Out-of-Distribution Detection |
Yifan Wu |
PDF |
N/A |
Revisiting Energy-Based Model for Out-of-Distribution Detection |
| Point-GN: A Non-Parametric Network Using Gaussian Positional Encoding for Point Cloud Classification |
Marzieh Mohammadi |
PDF |
N/A |
Point-GN: A Non-Parametric Network Using Gaussian Positional Encoding for Point Cloud Classification |
| Real-Time AIoT for UAV Antenna Interference Detection via Edge-Cloud Collaboration |
Jun Dong |
PDF |
N/A |
Real-Time AIoT for UAV Antenna Interference Detection via Edge-Cloud Collaboration |
| TREND: Unsupervised 3D Representation Learning via Temporal Forecasting for LiDAR Perception |
Runjian Chen |
PDF |
N/A |
TREND: Unsupervised 3D Representation Learning via Temporal Forecasting for LiDAR Perception |
| Point-GR: Graph Residual Point Cloud Network for 3D Object Classification and Segmentation |
Md Meraz |
PDF |
N/A |
Point-GR: Graph Residual Point Cloud Network for 3D Object Classification and Segmentation |
| Less is More: A Stealthy and Efficient Adversarial Attack Method for DRL-based Autonomous Driving Policies |
Junchao Fan |
PDF |
N/A |
Less is More: A Stealthy and Efficient Adversarial Attack Method for DRL-based Autonomous Driving Policies |
| Frequency-Guided Diffusion Model with Perturbation Training for Skeleton-Based Video Anomaly Detection |
Xiaofeng Tan |
PDF |
N/A |
Frequency-Guided Diffusion Model with Perturbation Training for Skeleton-Based Video Anomaly Detection |
| MRNet: Multifaceted Resilient Networks for Medical Image-to-Image Translation |
Hyojeong Lee |
PDF |
N/A |
MRNet: Multifaceted Resilient Networks for Medical Image-to-Image Translation |
| MILLION: A General Multi-Objective Framework with Controllable Risk for Portfolio Management |
Liwei Deng |
PDF |
N/A |
MILLION: A General Multi-Objective Framework with Controllable Risk for Portfolio Management |
| Fan-Beam CT Reconstruction for Unaligned Sparse-View X-ray Baggage Dataset |
Shin Kim |
PDF |
N/A |
Fan-Beam CT Reconstruction for Unaligned Sparse-View X-ray Baggage Dataset |
| A Granger-Causal Perspective on Gradient Descent with Application to Pruning |
Aditya Shah |
PDF |
N/A |
A Granger-Causal Perspective on Gradient Descent with Application to Pruning |
| Specification Generation for Neural Networks in Systems |
Isha Chaudhary |
PDF |
N/A |
Specification Generation for Neural Networks in Systems |
| Timestamp calibration for time-series single cell RNA-seq expression data |
Xiran Chen |
PDF |
N/A |
Timestamp calibration for time-series single cell RNA-seq expression data |
| ASIGN: An Anatomy-aware Spatial Imputation Graphic Network for 3D Spatial Transcriptomics |
Junchao Zhu |
PDF |
N/A |
ASIGN: An Anatomy-aware Spatial Imputation Graphic Network for 3D Spatial Transcriptomics |
| Human Variability vs. Machine Consistency: A Linguistic Analysis of Texts Generated by Humans and Large Language Models |
Sergio E. Zanotto |
PDF |
N/A |
Human Variability vs. Machine Consistency: A Linguistic Analysis of Texts Generated by Humans and Large Language Models |