arXiv 2024-12-04 Papers

Title (Chinese)	Author	PDF Link	Code Repository	Title (English)
导航世界模型	Amir Bar	PDF	N/A	Navigation World Models
Style3D：面向3D物体生成的注意力引导多视角风格迁移	Bingjie Song	PDF	N/A	Style3D: Attention-guided Multi-view Style Transfer for 3D Object Generation
通过生成合成分析实现稀疏视图姿态估计与重建	Qitao Zhao	PDF	N/A	Sparse-view Pose Estimation and Reconstruction via Analysis by Generative Synthesis
The Matrix：无限时域世界生成与实时移动控制	Ruili Feng	PDF	N/A	The Matrix: Infinite-Horizon World Generation with Real-Time Moving Control
查询事件开始的流式检测	Cristobal Eyzaguirre	PDF	N/A	Streaming Detection of Queried Event Start
FreeSim：在驾驶场景中实现自由视角相机模拟	Lue Fan	PDF	N/A	FreeSim: Toward Free-viewpoint Camera Simulation in Driving Scenes
Inst-IT：通过显式视觉提示指令调优提升多模态实例理解	Wujian Peng	PDF	N/A	Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning
从个体到社会：基于大型语言模型智能体的社会模拟综述	Xinyi Mou	PDF	N/A	From Individual to Society: A Survey on Social Simulation Driven by Large Language Model-based Agents
FLAIR：具有细粒度语言引导图像表示的视觉语言模型	Rui Xiao	PDF	N/A	FLAIR: VLM with Fine-grained Language-informed Image Representations
MIDI：用于单张图像生成3D场景的多实例扩散	Zehuan Huang	PDF	N/A	MIDI: Multi-Instance Diffusion for Single Image to 3D Scene Generation
最佳N次越狱	John Hughes	PDF	N/A	Best-of-N Jailbreaking
PaliGemma 2：多功能 VLM 家族，助力迁移	Andreas Steiner	PDF	N/A	PaliGemma 2: A Family of Versatile VLMs for Transfer
Imagine360：从视角锚点生成沉浸式360度视频	Jing Tan	PDF	N/A	Imagine360: Immersive 360 Video Generation from Perspective Anchor
感知令牌增强多模态语言模型中的视觉推理能力	Mahtab Bigverdi	PDF	N/A	Perception Tokens Enhance Visual Reasoning in Multimodal Language Models
NODE-AdvGAN：通过动态系统驱动的对抗生成模型提升对抗样本的迁移性和感知相似性	Xinheng Xie	PDF	N/A	NODE-AdvGAN: Improving the transferability and perceptual similarity of adversarial examples by dynamic-system-driven adversarial generative model
评估预训练语言模型与提示适应模型之间的性别偏见传递	Natalie Mackraz	PDF	N/A	Evaluating Gender Bias Transfer between Pre-trained and Prompt-Adapted Language Models
关于利用大型语言模型在生物医学科学中进行科学知识提取的综述	Gabriel Lino Garcia	PDF	N/A	A Review on Scientific Knowledge Extraction using Large Language Models in Biomedical Sciences
FANAL -- 金融活动新闻警报语言建模框架	Urjitkumar Patel	PDF	N/A	FANAL -- Financial Activity News Alerting Language Modeling Framework
单目视频动态场景的前馈子弹时间重建	Hanxue Liang	PDF	N/A	Feed-Forward Bullet-Time Reconstruction of Dynamic Scenes from Monocular Videos
超越视角：基于全局注意力的多视角驾驶场景视频生成	Hannan Lu	PDF	N/A	Seeing Beyond Views: Multi-View Driving Scene Video Generation with Holistic Attention
受卷帘快门影响的光场图像密集场景重建	Hermes McGriff	PDF	N/A	Dense Scene Reconstruction from Light-Field Images Affected by Rolling Shutter
NVComposer：利用多张稀疏且未对齐的图像提升生成新视角合成效果	Lingen Li	PDF	N/A	NVComposer: Boosting Generative Novel View Synthesis with Multiple Sparse and Unposed Images
你（不）是我的菜——大型语言模型能否为初级编程任务生成特定类型的反馈？	Dominic Lohr	PDF	N/A	You're (Not) My Type -- Can LLMs Generate Feedback of Specific Types for Introductory Programming Tasks?
将扩散模型蒸馏为高效的3D LiDAR场景补全	Shengyuan Zhang	PDF	N/A	Distilling Diffusion Models to Efficient 3D LiDAR Scene Completion
KKLIP：利用K均值聚类的知识蒸馏技术进行语言-图像预训练	Kuei-Chun Kao	PDF	N/A	KKLIP: Knowledge Distillation Exploiting K-means Clustering for Language-Image Pre-Training
扩散特征的蒸馏用于语义对应	Frank Fundel	PDF	N/A	Distillation of Diffusion Features for Semantic Correspondence
用于学习弱形式算子和梯度流的自我测试损失函数	Yuan Gao	PDF	N/A	Self-test loss functions for learning weak-form operators and gradient flows
使用身体标志进行精确步态识别的双向孪生循环神经网络	Proma Hossain Progga	PDF	N/A	A Bidirectional Siamese Recurrent Neural Network for Accurate Gait Recognition Using Body Landmarks
软校验和标记不可信的机器学习代理预测及其在原子物理模拟中的应用	Casey Lauer	PDF	N/A	Soft Checksums to Flag Untrustworthy Machine Learning Surrogate Predictions and Application to Atomic Physics Simulations
TRENDy：有效非线性动力学的时间回归	Matthew Ricci	PDF	N/A	TRENDy: Temporal Regression of Effective Non-linear Dynamics
超越算法超参数：关于机器学习应用中的预处理超参数及其相关陷阱	Christina Sauer	PDF	N/A	Beyond algorithm hyperparameters: on preprocessing hyperparameters and associated pitfalls in machine learning applications
在目标检测的背景下，语义信息与深度信息的融合	Md Abu Yusuf	PDF	N/A	Data Fusion of Semantic and Depth Information in the Context of Object Detection
流匹配与一般离散路径：一种动力学最优视角	Neta Shaul	PDF	N/A	Flow Matching with General Discrete Paths: A Kinetic-Optimal Perspective
紧密的PAC-贝叶斯风险证书用于对比学习	Anna van Elst	PDF	N/A	Tight PAC-Bayesian Risk Certificates for Contrastive Learning
卷积神经网络与专家混合模型在5G网络及未来网络入侵检测中的应用	Loukas Ilias	PDF	N/A	Convolutional Neural Networks and Mixture of Experts for Intrusion Detection in 5G Networks and beyond
Urban4D：城市场景重建的语义引导4D高斯喷洒技术	Ziwen Li	PDF	N/A	Urban4D: Semantic-Guided 4D Gaussian Splatting for Urban Scene Reconstruction
测量一切：基于视觉的实时多阶段尺寸测量，利用分割一切技术	Yongkyu Lee	PDF	N/A	Measure Anything: Real-time, Multi-stage Vision-based Dimensional Measurement using Segment Anything
聚类特定表示学习	Mahalakshmi Sabanayagam	PDF	N/A	Cluster Specific Representation Learning
多模态指令调优后语言推理能力退化的免训练缓解	Neale Ratzlaff	PDF	N/A	Training-Free Mitigation of Language Reasoning Degradation After Multimodal Instruction Tuning
YT-30M：一个多语言多类别的YouTube评论数据集	Hridoy Sankar Dutta	PDF	N/A	YT-30M: A multi-lingual multi-category dataset of YouTube comments
一致性CUSUM程序的有效性与效率	Vladimir Vovk	PDF	N/A	Validity and efficiency of the conformal CUSUM procedure
艺术品中的手势分类利用上下文图像特征	Azhar Hussian	PDF	N/A	Gesture Classification in Artworks Using Contextual Image Features
预训练的多潜在变量生成模型是抵御对抗攻击的良好防御者	Dario Serez	PDF	N/A	Pre-trained Multiple Latent Variable Generative Models are good defenders against Adversarial Attacks
PlanarSplatting：3分钟内实现精确的平面表面重建	Bin Tan	PDF	N/A	PlanarSplatting: Accurate Planar Surface Reconstruction in 3 Minutes
从文字到流程：自动化业务流程	Laura Minkova	PDF	N/A	From Words to Workflows: Automating Business Processes
状态频率估计用于异常检测	Clinton Cao	PDF	N/A	State Frequency Estimation for Anomaly Detection
PBP：恶意软件分类器的后训练后门净化	Dung Thuy Nguyen	PDF	N/A	PBP: Post-training Backdoor Purification for Malware Classifiers
CleanDIFT：无噪声的扩散特征	Nick Stracke	PDF	N/A	CleanDIFT: Diffusion Features without Noise
BIMCaP：基于BIM的AI辅助激光雷达-相机姿态优化	Miguel Arturo Vega Torres	PDF	N/A	BIMCaP: BIM-based AI-supported LiDAR-Camera Pose Refinement
基于遗传算法的系统用于在单元网格环境中进行无人机群的路径规划	Alejandro Puente-Castro	PDF	N/A	Genetic Algorithm Based System for Path Planning with Unmanned Aerial Vehicles Swarms in Cell-Grid Environments
SINGER：基于多尺度谱扩散模型的生动音频驱动歌唱视频生成	Yan Li	PDF	N/A	SINGER: Vivid Audio-driven Singing Video Generation with Multi-scale Spectral Diffusion Model
2DGS-Room：基于种子引导的二维高斯喷洒与几何约束的高保真室内场景重建	Wanting Zhang	PDF	N/A	2DGS-Room: Seed-Guided 2D Gaussian Splatting with Geometric Constrains for High-Fidelity Indoor Scene Reconstruction
评估基础模型在精准医学中对生理信号的迁移能力	Matthias Christenson	PDF	N/A	Assessing Foundation Models' Transferability to Physiological Signals in Precision Medicine
Tango*：利用化学信息价值函数的约束合成规划	Daniel Armstrong	PDF	N/A	Tango*: Constrained synthesis planning using chemically informed value functions
使用模型推理搜索启发式方法自动生成REST API的测试用例	Clinton Cao	PDF	N/A	Automated Test-Case Generation for REST APIs Using Model Inference Search Heuristic
从物联网数据中学习语义关联规则	Erkan Karabulut	PDF	N/A	Learning Semantic Association Rules from Internet of Things Data
云遮挡下海表温度重建的深度学习方法	Andrea Asperti	PDF	N/A	Deep Learning for Sea Surface Temperature Reconstruction under Cloud Occlusion
PrefixKV：自适应前缀KV缓存是视觉指令跟随模型高效生成所需的关键	Ao Wang	PDF	N/A	PrefixKV: Adaptive Prefix KV Cache is What Vision Instruction-Following Models Need for Efficient Generation
Skel3D：骨骼引导的新视角合成	Aron Fóthi	PDF	N/A	Skel3D: Skeleton Guided Novel View Synthesis
深度算子BSDE：一种近似解算子的数值方案	Giulia Di Nunno	PDF	N/A	Deep Operator BSDE: a Numerical Scheme to Approximate the Solution Operators
基准测试用于机器人辅助食管切除术实时识别的预训练注意力模型	Ronald L. P. D. de Jong	PDF	N/A	Benchmarking Pretrained Attention-based Models for Real-Time Recognition in Robot-Assisted Esophagectomy
通过目标标记调整在稳定扩散中进行隐式先验编辑	Feng He	PDF	N/A	Implicit Priors Editing in Stable Diffusion via Targeted Token Adjustment
RedStone：为大型语言模型策划通用、代码、数学和问答数据	Yaoyao Chang	PDF	N/A	RedStone: Curating General, Code, Math, and QA Data for Large Language Models
神经算子是否总能被连续离散化？	Takashi Furuya	PDF	N/A	Can neural operators always be continuously discretized?
通过不确定性量化实现风险感知分类	Murat Sensoy	PDF	N/A	Risk-aware Classification via Uncertainty Quantification
利用生成式人工智能增强供应链可见性：知识图谱中关系预测的探索性案例研究	Ge Zheng	PDF	N/A	Enhancing Supply Chain Visibility with Generative AI: An Exploratory Case Study on Relationship Prediction in Knowledge Graphs
DiffStyleTTS：基于扩散的多层次韵律建模，用于多样化且可控风格的文本转语音	Jiaxuan Liu	PDF	N/A	DiffStyleTTS: Diffusion-based Hierarchical Prosody Modeling for Text-to-Speech with Diverse and Controllable Styles
通信成本预算下的分层联邦学习的响应式编排	Ivan Čilić	PDF	N/A	Reactive Orchestration for Hierarchical Federated Learning Under a Communication Cost Budget
采用改进的均值中位数估计的经典阴影	Winston Fu	PDF	N/A	Classical Shadows with Improved Median-of-Means Estimation
使用Transformer进行体积映射 -- 具有长程交互的超分辨率网络	August Leander Høeg	PDF	N/A	Mapping using Transformers for Volumes -- Network for Super-Resolution with Long-Range Interactions
体积一致的三维高斯光栅化	Chinmay Talegaonkar	PDF	N/A	Volumetrically Consistent 3D Gaussian Rasterization
具有Universum数据的粒球双支持向量机	M. A. Ganaie	PDF	N/A	Granular Ball Twin Support Vector Machine with Universum Data
SGSST：缩放高斯喷溅风格转移	Bruno Galerne	PDF	N/A	SGSST: Scaling Gaussian Splatting StyleTransfer
WiS平台：通过基于游戏的分析增强基于大语言模型的多智能体系统的评估	Chengwei Hu	PDF	N/A	WiS Platform: Enhancing Evaluation of LLM-Based Multi-Agent Systems Through Game-Based Analysis
TASR：用于图像超分辨率的时步感知扩散模型	Qinwei Lin	PDF	N/A	TASR: Timestep-Aware Diffusion Model for Image Super-Resolution
使用基于极正弦的分段畸变进行直观轴向增强以用于医学逐层分割	Yiqin Zhang	PDF	N/A	Intuitive Axial Augmentation Using Polar-Sine-Based Piecewise Distortion for Medical Slice-Wise Segmentation
更公平的分析和人口统计平衡的人脸生成，以实现更公平的人脸验证	Alexandre Fournier-Montgieux	PDF	N/A	Fairer Analysis and Demographically Balanced Face Generation for Fairer Face Verification
DIVE：驯服DINO以实现主题驱动的视频编辑	Yi Huang	PDF	N/A	DIVE: Taming DINO for Subject-Driven Video Editing
通过可能性探索微调提升大型语言模型的语言多样性	Long Mai	PDF	N/A	Improving Linguistic Diversity of Large Language Models with Possibility Exploration Fine-Tuning
UniVAD：一种无需训练的少样本视觉异常检测统一模型	Zhaopeng Gu	PDF	N/A	UniVAD: A Training-free Unified Model for Few-shot Visual Anomaly Detection
AI驱动的日常路线选择	Leizhen Wang	PDF	N/A	AI-Driven Day-to-Day Route Choice
扬卡里：一个单语约鲁巴语数据集	Maro Akpobi	PDF	N/A	Yankari: A Monolingual Yoruba Dataset
关于 $\ell_2^2$ 最小和聚类的近似性	Karthik C. S.	PDF	N/A	On Approximability of $\ell_2^2$ Min-Sum Clustering
LuxEmbedder：一种增强卢森堡语句子嵌入的跨语言方法	Fred Philippy	PDF	N/A	LuxEmbedder: A Cross-Lingual Approach to Enhanced Luxembourgish Sentence Embeddings
具有弱耦合约束的多动作无休止强盗：同时学习和控制	Jing Fu	PDF	N/A	Multi-Action Restless Bandits with Weakly Coupled Constraints: Simultaneous Learning and Control
及时行动，事半功倍：小型视觉语言模型是加速大型视觉语言模型的精准指南	Wangbo Zhao	PDF	N/A	A Stitch in Time Saves Nine: Small VLM is a Precise Guidance for accelerating Large VLMs
可扩展的贝叶斯张量环分解用于多路数据分析	Zerui Tao	PDF	N/A	Scalable Bayesian Tensor Ring Factorization for Multiway Data Analysis
使用物理约束合成数据进行与域无关的脑卒中病变分割	Liam Chalcroft	PDF	N/A	Domain-Agnostic Stroke Lesion Segmentation Using Physics-Constrained Synthetic Data
餐巾纸上的FlashAttention：深度学习IO感知图解法	Vincent Abbott	PDF	N/A	FlashAttention on a Napkin: A Diagrammatic Approach to Deep Learning IO-Awareness
几何引导的跨视角扩散用于一对多跨视角图像合成	Tao Jun Lin	PDF	N/A	Geometry-guided Cross-view Diffusion for One-to-many Cross-view Image Synthesis
基于图像重建的等变表示学习用于增强型自监督学习	Qin Wang	PDF	N/A	Equivariant Representation Learning for Augmentation-based Self-Supervised Learning via Image Reconstruction
路径引导的基于粒子的采样	Mingzhou Fan	PDF	N/A	Path-Guided Particle-based Sampling
面向形式化方法轻量级图示的有据语言设计	Siddhartha Prasad	PDF	N/A	Grounded Language Design for Lightweight Diagramming for Formal Methods
用户行为类型学：网络复杂搜索会话的探索性研究	Claire Ibarboure	PDF	N/A	Typologie des comportements utilisateurs : étude exploratoire des sessions de recherche complexe sur le Web
在恶劣天气条件下，利用图神经网络进行共享单车需求预测的上下文数据集成	Romain Rochas	PDF	N/A	Contextual Data Integration for Bike-sharing Demand Prediction with Graph Neural Networks in Degraded Weather Conditions
全球MMLU：理解和解决多语言评估中的文化和语言偏见	Shivalika Singh	PDF	N/A	Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation
通过触觉和声音向机器人传达情感	Qiaoqiao Ren	PDF	N/A	Conveying Emotions to Robots through Touch and Sound
高斯过程用于地震地面震动概率估计：一维概念验证	Sam A. Scivier	PDF	N/A	Gaussian Processes for Probabilistic Estimates of Earthquake Ground Shaking: A 1-D Proof-of-Concept
无训练域转换的组合图像检索	Nikos Efthymiadis	PDF	N/A	Composed Image Retrieval for Training-Free Domain Conversion
Diffusion-VLA：通过统一的扩散和自回归扩展机器人基础模型	Junjie Wen	PDF	N/A	Diffusion-VLA: Scaling Robot Foundation Models via Unified Diffusion and Autoregression
将生成式人工智能融入艺术治疗：技术展示	Yannis Valentin Schmutz	PDF	N/A	Integrating Generative AI into Art Therapy: A Technical Showcase
针对扩散模型语义水印的黑盒伪造攻击	Andreas Müller	PDF	N/A	Black-Box Forgery Attacks on Semantic Watermarks for Diffusion Models
AntLM：连接因果语言模型与掩码语言模型	Xinru Yu	PDF	N/A	AntLM: Bridging Causal and Masked Language Models
使用神经跳跃常微分方程的非参数滤波、估计与分类	Jakob Heiss	PDF	N/A	Nonparametric Filtering, Estimation and Classification using Neural Jump ODEs
基于意图的上下文学习在少样本对话状态跟踪中的应用	Zihao Yi	PDF	N/A	Intent-driven In-context Learning for Few-shot Dialogue State Tracking
RFSR：通过奖励反馈学习改进图像超分辨率扩散模型	Xiaopeng Sun	PDF	N/A	RFSR: Improving ISR Diffusion Models via Reward Feedback Learning
使用手机和设备上的IConNet检测异常心音	Linh Vu	PDF	N/A	Detecting abnormal heart sound using mobile phones and on-device IConNet
在野外环境下的NeRF和Gaussian Splatting SLAM	Fabian Schmidt	PDF	N/A	NeRF and Gaussian Splatting SLAM in the Wild
JPEG AI会改变图像取证吗？	Edoardo Daniele Cannas	PDF	N/A	Is JPEG AI going to change image forensics?
GERD：几何事件响应数据生成	Jens Egholm Pedersen	PDF	N/A	GERD: Geometric event response data generation
单模态学习：解决离线强化学习中的多模态问题	Mianchu Wang	PDF	N/A	Learning on One Mode: Addressing Multi-Modality in Offline Reinforcement Learning
DynamicControl：改进文本到图像生成的自适应条件选择	Qingdong He	PDF	N/A	DynamicControl: Adaptive Condition Selection for Improved Text-to-Image Generation
预训练阶段的对齐！迈向阿拉伯语大型语言模型的原生对齐	Juhao Liang	PDF	N/A	Alignment at Pre-training! Towards Native Alignment for Arabic LLMs
变速度教学回放作为模仿学习的现实世界数据增强	Nozomu Masuya	PDF	N/A	Variable-Speed Teaching-Playback as Real-World Data Augmentation for Imitation Learning
控制大型语言模型中的变异以实现算法的有效进化	Haoran Yin	PDF	N/A	Controlling the Mutation in Large Language Models for the Efficient Evolution of Algorithms
AIM：通过令牌合并和剪枝实现多模态大型语言模型的自适应推理	Yiwu Zhong	PDF	N/A	AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning
在英语-俄语时尚语料库上对ChatGPT的术语构建能力进行基准测试	Anastasiia Bezobrazova	PDF	N/A	Benchmarking terminology building capabilities of ChatGPT on an English-Russian Fashion Corpus
任务驱动的图像融合与可学习的融合损失	Haowen Bai	PDF	N/A	Task-driven Image Fusion with Learnable Fusion Loss
动态一致的 $k$ 中心聚类与最优调整	Sebastian Forster	PDF	N/A	Dynamic Consistent $k$-Center Clustering with Optimal Recourse
大型语言模型的安全训练能否泛化到语义相关的自然提示？	Sravanti Addepalli	PDF	N/A	Does Safety Training of LLMs Generalize to Semantically Related Natural Prompts?
PERL：拼音增强的中文ASR N-best错误修正语言模型	Junhong Liang	PDF	N/A	PERL: Pinyin Enhanced Rephrasing Language Model for Chinese ASR N-best Error Correction
MaterialPicker：基于扩散Transformer的多模态材质生成	Xiaohe Ma	PDF	N/A	MaterialPicker: Multi-Modal Material Generation with Diffusion Transformers
通道反射：基于知识的脑电图数据增强技术用于脑机接口	Ziwei Wang	PDF	N/A	Channel Reflection: Knowledge-Driven Data Augmentation for EEG-Based Brain-Computer Interfaces
Linq-Embed-Mistral 技术报告	Chanyeol Choi	PDF	N/A	Linq-Embed-Mistral Technical Report
不同大型语言模型架构的综述：趋势、基准测试与挑战	Minghao Shao	PDF	N/A	Survey of different Large Language Model Architectures: Trends, Benchmarks, and Challenges
超越[cls]：探索掩码图像建模表示的真正潜力	Marcin Przewięźlikowski	PDF	N/A	Beyond [cls]: Exploring the true potential of Masked Image Modeling representations
连续低秩缩放点积注意力	Ginés Carreto Picón	PDF	N/A	Continual Low-Rank Scaled Dot-product Attention
ClusterKV：在语义空间中操作LLM KV缓存以实现可召回的压缩	Guangda Liu	PDF	N/A	ClusterKV: Manipulating LLM KV Cache in Semantic Space for Recallable Compression
半监督迁移提升（SS-TrBoosting）	Lingfei Deng	PDF	N/A	Semi-Supervised Transfer Boosting (SS-TrBoosting)
感知网络的参数增强：一种人类启发的方法用于图像质量评估	Jorge Vila-Tomás	PDF	N/A	Parametric Enhancement of PerceptNet: A Human-Inspired Approach for Image Quality Assessment
U-MATH：一个用于评估大型语言模型中数学技能的大学水平基准	Konstantin Chernyshev	PDF	N/A	U-MATH: A University-Level Benchmark for Evaluating Mathematical Skills in LLMs
Fab-ME：一种用于织物缺陷检测的视觉状态空间和注意力增强框架	Shuai Wang	PDF	N/A	Fab-ME: A Vision State-Space and Attention-Enhanced Framework for Fabric Defect Detection
生物启发式半监督语义分割在生物医学成像中的应用	Luca Ciampi	PDF	N/A	Biologically-inspired Semi-supervised Semantic Segmentation for Biomedical Imaging
具有集成拒绝选项的节点分类	Uday Bhaskar	PDF	N/A	Node Classification With Integrated Reject Option
时空图神经网络的半去中心化训练用于交通预测	Ivan Kralj	PDF	N/A	Semi-decentralized Training of Spatio-Temporal Graph Neural Networks for Traffic Prediction
加权奖励偏好优化用于隐式模型融合	Ziyi Yang	PDF	N/A	Weighted-Reward Preference Optimization for Implicit Model Fusion
通过多任务一致性和优先级优化密集视觉预测	Maxime Fontana	PDF	N/A	Optimizing Dense Visual Predictions Through Multi-Task Coherence and Prioritization
迈向理解和量化文本到图像生成的不确定性	Gianni Franchi	PDF	N/A	Towards Understanding and Quantifying Uncertainty for Text-to-Image Generation
PatchDPO：用于无微调个性化图像生成的补丁级DPO	Qihan Huang	PDF	N/A	PatchDPO: Patch-level DPO for Finetuning-free Personalized Image Generation
结合医学语言模型和本体论的西班牙语临床笔记疾病自动检测	Leon-Paul Schaub Torre	PDF	N/A	Automatic detection of diseases in Spanish clinical notes combining medical language models and ontologies
IRisPath：通过鲁棒的IR-RGB融合增强越野导航，提升昼夜通行能力	Saksham Sharma	PDF	N/A	IRisPath: Enhancing Off-Road Navigation with Robust IR-RGB Fusion for Improved Day and Night Traversability
解释有用吗？皮肤病变分类器中可解释性方法的比较分析	Rosa Y. G. Paccotacya-Yanque	PDF	N/A	Are Explanations Helpful? A Comparative Analysis of Explainability Methods in Skin Lesion Classifiers
用于求解偏微分方程逆问题的物理信息深度逆算子网络	Sung Woong Cho	PDF	N/A	Physics-Informed Deep Inverse Operator Networks for Solving PDE Inverse Problems
字节BPE分词作为逆字符串同态映射	Saibo Geng	PDF	N/A	Byte BPE Tokenization as an Inverse string Homomorphism
多层次关联网络用于少样本图像分类	Yunkai Dang	PDF	N/A	Multi-Level Correlation Network For Few-Shot Image Classification
LEP-QNN：使用量子神经网络进行贷款资格预测	Nouhaila Innan	PDF	N/A	LEP-QNN: Loan Eligibility Prediction Using Quantum Neural Networks
测试神经网络验证器：一个带有隐藏反例的健全性基准	Xingjian Zhou	PDF	N/A	Testing Neural Network Verifiers: A Soundness Benchmark with Hidden Counterexamples
自动化指标系统依赖性度量	Pius von Däniken	PDF	N/A	A Measure of the System Dependence of Automated Metrics
大型语言模型展现出与人类相媲美的个体和集体创造力	Luning Sun	PDF	N/A	Large Language Models show both individual and collective creativity comparable to humans
基于示例的语义图像合成中的外观匹配适配器	Siyoon Jin	PDF	N/A	Appearance Matching Adapter for Exemplar-based Semantic Image Synthesis
社交媒体上的细粒度行为模拟与角色扮演大型语言模型	Kun Li	PDF	N/A	Fine-Grained Behavior Simulation with Role-Playing Large Language Model on Social Media
单纯复形上的拓扑轨迹分类与地标推断	Vincent P. Grande	PDF	N/A	Topological Trajectory Classification and Landmark Inference on Simplicial Complexes
具有调整偏移量噪声的广义扩散模型	Takuro Kutsuna	PDF	N/A	Generalized Diffusion Model with Adjusted Offset Noise
统一大型语言模型的KV缓存压缩与LeanKV	Yanqi Zhang	PDF	N/A	Unifying KV Cache Compression for Large Language Models with LeanKV
短距离光通信：神经形态硬件的现实应用任务	Elias Arnold	PDF	N/A	Short-reach Optical Communications: A Real-world Task for Neuromorphic Hardware
将可编程可塑性整合到模拟神经形态硬件的实验描述中	Philipp Spilger	PDF	N/A	Integrating programmable plasticity in experiment descriptions for analog neuromorphic hardware
基于大语言模型改写器的鲁棒多比特文本水印	Xiaojun Xu	PDF	N/A	Robust Multi-bit Text Watermark with LLM-based Paraphrasers
Splats中的Splats：在高斯喷溅中嵌入隐形3D水印	Yijia Guo	PDF	N/A	Splats in Splats: Embedding Invisible 3D Watermark within Gaussian Splatting
用于顺序组合最优传输的Sinkhorn算法	Kazuki Watanabe	PDF	N/A	Sinkhorn Algorithm for Sequentially Composed Optimal Transports
ObjectFinder：面向盲人互动物体搜索的开放词汇辅助系统	Ruiping Liu	PDF	N/A	ObjectFinder: Open-Vocabulary Assistive System for Interactive Object Search by Blind People
基于经验的规划策略发现	Ruiqi He	PDF	N/A	Experience-driven discovery of planning strategies
CredID：可信的多比特水印用于大型语言模型识别	Haoyu Jiang	PDF	N/A	CredID: Credible Multi-Bit Watermark for Large Language Models Identification
在条件生成对抗网络中使用自适应权重掩码进行少样本学习	Jiacheng Hu	PDF	N/A	Few-Shot Learning with Adaptive Weight Masking in Conditional GANs
ChatTS：通过合成数据将时间序列与LLMs对齐，以增强理解和推理能力	Zhe Xie	PDF	N/A	ChatTS: Aligning Time Series with LLMs via Synthetic Data for Enhanced Understanding and Reasoning
MultiGO：面向单目三维纹理人体重建的多层次几何学习	Gangjian Zhang	PDF	N/A	MultiGO: Towards Multi-level Geometry Learning for Monocular 3D Textured Human Reconstruction
用于平面视频实时立体转换的轻量级多平面图像网络	Shanding Diao	PDF	N/A	Lightweight Multiplane Images Network for Real-Time Stereoscopic Conversion from Planar Video
每一层都至关重要时的惊异度预言机	Xudong Hong	PDF	N/A	A surprisal oracle for when every layer counts
利用图神经网络（GNNs）增强推荐系统并解决过平滑问题	Wenyi Liu	PDF	N/A	Enhancing Recommendation Systems with GNNs and Addressing Over-Smoothing
TOOL-ED：利用LLM的工具调用能力增强共情响应生成	Huiying Cao	PDF	N/A	TOOL-ED: Enhancing Empathetic Response Generation with the Tool Calling Capability of LLM
使用基于共识的估计和近似恒定速度建模进行分散式移动目标跟踪	Amir Ahmad Ghods	PDF	N/A	Decentralized Mobile Target Tracking Using Consensus-Based Estimation with Nearly-Constant-Velocity Modeling
通过一个强大的基于CLIP的编码器扩展事件模态应用	Sungheon Jeong	PDF	N/A	Expanding Event Modality Applications through a Robust CLIP-Based Encoder
Revolve：通过跟踪文本优化中的响应演变来优化AI系统	Peiyan Zhang	PDF	N/A	Revolve: Optimizing AI Systems by Tracking Response Evolution in Textual Optimization
Mimir：提升视频扩散模型以实现精确的文本理解	Shuai Tan	PDF	N/A	Mimir: Improving Video Diffusion Models for Precise Text Understanding
基于混合深度学习的肝细胞癌癌变分级策略，用于H&E染色肝脏组织病理学图像的分类	Ajinkya Deshpande	PDF	N/A	Hybrid deep learning-based strategy for the hepatocellular carcinoma cancer grade classification of H&E stained liver histopathology images
一种基于近似SRBB的酉合成可扩展量子神经网络	Giacomo Belli	PDF	N/A	A Scalable Quantum Neural Network for Approximate SRBB-Based Unitary Synthesis
Align3R：动态视频的对齐单目深度估计	Jiahao Lu	PDF	N/A	Align3R: Aligned Monocular Depth Estimation for Dynamic Videos
RoDyGS：用于随意视频的鲁棒动态高斯喷溅技术	Yoonwoo Jeong	PDF	N/A	RoDyGS: Robust Dynamic Gaussian Splatting for Casual Videos
协调多臂老虎机以提升Wi-Fi中的空间重用	Francesc Wilhelmi	PDF	N/A	Coordinated Multi-Armed Bandits for Improved Spatial Reuse in Wi-Fi
ASR-EC基准测试：评估大型语言模型在中文语音识别错误纠正上的表现	Victor Junqiu Wei	PDF	N/A	ASR-EC Benchmark: Evaluating Large Language Models on Chinese ASR Error Correction
使用自监督学习模型对无文本语音合成原始音频的分析研究	Joonyong Park	PDF	N/A	Analytic Study of Text-Free Speech Synthesis for Raw Audio using a Self-Supervised Learning Model
基于偏好的可微分游戏对手塑造	Xinyu Qiao	PDF	N/A	Preference-based opponent shaping in differentiable games
TokenFlow：用于多模态理解与生成的统一图像分词器	Liao Qu	PDF	N/A	TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation
UTSD：统一时间序列扩散模型	Xiangkai Ma	PDF	N/A	UTSD: Unified Time Series Diffusion Model
通过混合变形实现轻量级随机视频预测	Kazuki Kotoyori	PDF	N/A	Lightweight Stochastic Video Prediction via Hybrid Warping
CLAP：通过曲率采样和原型学习实现融合3D感知的无监督3D表示学习	Runjian Chen	PDF	N/A	CLAP: Unsupervised 3D Representation Learning for Fusion 3D Perception via Curvature Sampling and Prototype Learning
重新审视基于能量的模型用于分布外检测	Yifan Wu	PDF	N/A	Revisiting Energy-Based Model for Out-of-Distribution Detection
Point-GN：一种使用高斯位置编码的非参数网络，用于点云分类	Marzieh Mohammadi	PDF	N/A	Point-GN: A Non-Parametric Network Using Gaussian Positional Encoding for Point Cloud Classification
通过边缘-云协作实现无人机天线干扰检测的实时AIoT	Jun Dong	PDF	N/A	Real-Time AIoT for UAV Antenna Interference Detection via Edge-Cloud Collaboration
TREND：通过时间预测实现面向激光雷达感知的无监督三维表示学习	Runjian Chen	PDF	N/A	TREND: Unsupervised 3D Representation Learning via Temporal Forecasting for LiDAR Perception
Point-GR：用于三维物体分类和分割的图残差点云网络	Md Meraz	PDF	N/A	Point-GR: Graph Residual Point Cloud Network for 3D Object Classification and Segmentation
少即是多：一种针对基于深度强化学习的自动驾驶策略的隐秘且高效的对抗攻击方法	Junchao Fan	PDF	N/A	Less is More: A Stealthy and Efficient Adversarial Attack Method for DRL-based Autonomous Driving Policies
基于骨架的视频异常检测的扰动训练频率引导扩散模型	Xiaofeng Tan	PDF	N/A	Frequency-Guided Diffusion Model with Perturbation Training for Skeleton-Based Video Anomaly Detection
MRNet：用于医学图像到图像翻译的多方面弹性网络	Hyojeong Lee	PDF	N/A	MRNet: Multifaceted Resilient Networks for Medical Image-to-Image Translation
MILLION：一种具有可控风险的多目标通用框架，用于投资组合管理	Liwei Deng	PDF	N/A	MILLION: A General Multi-Objective Framework with Controllable Risk for Portfolio Management
扇形束CT重建用于未对齐的稀疏视图X射线行李数据集	Shin Kim	PDF	N/A	Fan-Beam CT Reconstruction for Unaligned Sparse-View X-ray Baggage Dataset
从格兰杰因果关系的角度看梯度下降及其在剪枝中的应用	Aditya Shah	PDF	N/A	A Granger-Causal Perspective on Gradient Descent with Application to Pruning
系统中神经网络的规范生成	Isha Chaudhary	PDF	N/A	Specification Generation for Neural Networks in Systems
时间序列单细胞RNA-seq表达数据的时间戳校准	Xiran Chen	PDF	N/A	Timestamp calibration for time-series single cell RNA-seq expression data
ASIGN：一种用于三维空间转录组学的解剖学感知空间插补图形网络	Junchao Zhu	PDF	N/A	ASIGN: An Anatomy-aware Spatial Imputation Graphic Network for 3D Spatial Transcriptomics
人类变异性与机器一致性：人类和大型语言模型生成文本的语言学分析	Sergio E. Zanotto	PDF	N/A	Human Variability vs. Machine Consistency: A Linguistic Analysis of Texts Generated by Humans and Large Language Models