| [MASK] is All You Need | Vincent Tao Hu | PDF | N/A | [MASK] is All You Need |
| Retrieving Semantics from the Deep: an RAG Solution for Gesture Synthesis | M. Hamza Mughal | PDF | N/A | Retrieving Semantics from the Deep: an RAG Solution for Gesture Synthesis |
| Tactile DreamFusion: Exploiting Tactile Sensing for 3D Generation | Ruihan Gao | PDF | N/A | Tactile DreamFusion: Exploiting Tactile Sensing for 3D Generation |
| P3-PO: Prescriptive Point Priors for Visuo-Spatial Generalization of Robot Policies | Mara Levy | PDF | N/A | P3-PO: Prescriptive Point Priors for Visuo-Spatial Generalization of Robot Policies |
| CARP: Visuomotor Policy Learning via Coarse-to-Fine Autoregressive Prediction | Zhefei Gong | PDF | N/A | CARP: Visuomotor Policy Learning via Coarse-to-Fine Autoregressive Prediction |
| Around the World in 80 Timesteps: A Generative Approach to Global Visual Geolocation | Nicolas Dufour | PDF | N/A | Around the World in 80 Timesteps: A Generative Approach to Global Visual Geolocation |
| Diverse Score Distillation | Yanbo Xu | PDF | N/A | Diverse Score Distillation |
| AnyBimanual: Transferring Unimanual Policy for General Bimanual Manipulation | Guanxing Lu | PDF | N/A | AnyBimanual: Transferring Unimanual Policy for General Bimanual Manipulation |
| Driv3R: Learning Dense 4D Reconstruction for Autonomous Driving | Xin Fei | PDF | N/A | Driv3R: Learning Dense 4D Reconstruction for Autonomous Driving |
| Delve into Visual Contrastive Decoding for Hallucination Mitigation of Large Vision-Language Models | Yi-Lun Lee | PDF | N/A | Delve into Visual Contrastive Decoding for Hallucination Mitigation of Large Vision-Language Models |
| Visual Lexicon: Rich Image Features in Language Space | XuDong Wang | PDF | N/A | Visual Lexicon: Rich Image Features in Language Space |
| Proactive Agents for Multi-Turn Text-to-Image Generation Under Uncertainty | Meera Hahn | PDF | N/A | Proactive Agents for Multi-Turn Text-to-Image Generation Under Uncertainty |
| Dynamic EventNeRF: Reconstructing General Dynamic Scenes from Multi-view Event Cameras | Viktor Rudnev | PDF | N/A | Dynamic EventNeRF: Reconstructing General Dynamic Scenes from Multi-view Event Cameras |
| Training Large Language Models to Reason in a Continuous Latent Space | Shibo Hao | PDF | N/A | Training Large Language Models to Reason in a Continuous Latent Space |
| MAtCha Gaussians: Atlas of Charts for High-Quality Geometry and Photorealism From Sparse Views | Antoine Guédon | PDF | N/A | MAtCha Gaussians: Atlas of Charts for High-Quality Geometry and Photorealism From Sparse Views |
| Ranking-aware adapter for text-driven image ordering with CLIP | Wei-Hsiang Yu | PDF | N/A | Ranking-aware adapter for text-driven image ordering with CLIP |
| XRZoo: A Large-Scale and Versatile Dataset of Extended Reality (XR) Applications | Shuqing Li | PDF | N/A | XRZoo: A Large-Scale and Versatile Dataset of Extended Reality (XR) Applications |
| InstantRestore: Single-Step Personalized Face Restoration with Shared-Image Attention | Howard Zhang | PDF | N/A | InstantRestore: Single-Step Personalized Face Restoration with Shared-Image Attention |
| Refusal Tokens: A Simple Way to Calibrate Refusals in Large Language Models | Neel Jain | PDF | N/A | Refusal Tokens: A Simple Way to Calibrate Refusals in Large Language Models |
| ONEBench to Test Them All: Sample-Level Benchmarking Over Open-Ended Capabilities | Adhiraj Ghosh | PDF | N/A | ONEBench to Test Them All: Sample-Level Benchmarking Over Open-Ended Capabilities |
| 3D Graph Attention Networks for High Fidelity Pediatric Glioma Segmentation | Harish Thangaraj | PDF | N/A | 3D Graph Attention Networks for High Fidelity Pediatric Glioma Segmentation |
| ContRail: A Framework for Realistic Railway Image Synthesis using ControlNet | Andrei-Robert Alexandrescu | PDF | N/A | ContRail: A Framework for Realistic Railway Image Synthesis using ControlNet |
| Convolution goes higher-order: a biologically inspired mechanism empowers image classification | Simone Azeglio | PDF | N/A | Convolution goes higher-order: a biologically inspired mechanism empowers image classification |
| JAPAGEN: Efficient Few/Zero-shot Learning via Japanese Training Dataset Generation with LLM | Takuro Fujii | PDF | N/A | JAPAGEN: Efficient Few/Zero-shot Learning via Japanese Training Dataset Generation with LLM |
| Take Fake as Real: Realistic-like Robust Black-box Adversarial Attack to Evade AIGC Detection | Caiyun Xie | PDF | N/A | Take Fake as Real: Realistic-like Robust Black-box Adversarial Attack to Evade AIGC Detection |
| AutoDCWorkflow: LLM-based Data Cleaning Workflow Auto-Generation and Benchmark | Lan Li | PDF | N/A | AutoDCWorkflow: LLM-based Data Cleaning Workflow Auto-Generation and Benchmark |
| VP-MEL: Visual Prompts Guided Multimodal Entity Linking | Hongze Mi | PDF | N/A | VP-MEL: Visual Prompts Guided Multimodal Entity Linking |
| Toward Non-Invasive Diagnosis of Bankart Lesions with Deep Learning | Sahil Sethi | PDF | N/A | Toward Non-Invasive Diagnosis of Bankart Lesions with Deep Learning |
| How to Merge Your Multimodal Models Over Time? | Sebastian Dziadzio | PDF | N/A | How to Merge Your Multimodal Models Over Time? |
| MISFEAT: Feature Selection for Subgroups with Systematic Missing Data | Bar Genossar | PDF | N/A | MISFEAT: Feature Selection for Subgroups with Systematic Missing Data |
| Parkinson's Disease Diagnosis Through Deep Learning: A Novel LSTM-Based Approach for Freezing of Gait Detection | Aqib Nazir Mir | PDF | N/A | Parkinson's Disease Diagnosis Through Deep Learning: A Novel LSTM-Based Approach for Freezing of Gait Detection |
| FlexEvent: Event Camera Object Detection at Arbitrary Frequencies | Dongyue Lu | PDF | N/A | FlexEvent: Event Camera Object Detection at Arbitrary Frequencies |
| Asynchronous Agents with Perfect Recall: Model Reductions, Knowledge-Based Construction, and Model Checking for Coalitional Strategies | Dilian Gurov | PDF | N/A | Asynchronous Agents with Perfect Recall: Model Reductions, Knowledge-Based Construction, and Model Checking for Coalitional Strategies |
| Source Separation & Automatic Transcription for Music | Bradford Derby | PDF | N/A | Source Separation & Automatic Transcription for Music |
| You See it, You Got it: Learning 3D Creation on Pose-Free Videos at Scale | Baorui Ma | PDF | N/A | You See it, You Got it: Learning 3D Creation on Pose-Free Videos at Scale |
| Gen-3Diffusion: Realistic Image-to-3D Generation via 2D & 3D Diffusion Synergy | Yuxuan Xue | PDF | N/A | Gen-3Diffusion: Realistic Image-to-3D Generation via 2D & 3D Diffusion Synergy |
| Digital Transformation in the Water Distribution System based on the Digital Twins Concept | MohammadHossein Homaei | PDF | N/A | Digital Transformation in the Water Distribution System based on the Digital Twins Concept |
| OmniEvalKit: A Modular, Lightweight Toolbox for Evaluating Large Language Model and its Omni-Extensions | Yi-Kai Zhang | PDF | N/A | OmniEvalKit: A Modular, Lightweight Toolbox for Evaluating Large Language Model and its Omni-Extensions |
| FedSynthCT-Brain: A Federated Learning Framework for Multi-Institutional Brain MRI-to-CT Synthesis | Ciro Benito Raggio | PDF | N/A | FedSynthCT-Brain: A Federated Learning Framework for Multi-Institutional Brain MRI-to-CT Synthesis |
| Impact of Privacy Parameters on Deep Learning Models for Image Classification | Basanta Chaulagain | PDF | N/A | Impact of Privacy Parameters on Deep Learning Models for Image Classification |
| Some Best Practices in Operator Learning | Dustin Enyeart | PDF | N/A | Some Best Practices in Operator Learning |
| Policy Agnostic RL: Offline RL and Online RL Fine-Tuning of Any Class and Backbone | Max Sobol Mark | PDF | N/A | Policy Agnostic RL: Offline RL and Online RL Fine-Tuning of Any Class and Backbone |
| Exploring Critical Testing Scenarios for Decision-Making Policies: An LLM Approach | Weichao Xu | PDF | N/A | Exploring Critical Testing Scenarios for Decision-Making Policies: An LLM Approach |
| Toward LLM-Agent-Based Modeling of Transportation Systems: A Conceptual Framework | Tianming Liu | PDF | N/A | Toward LLM-Agent-Based Modeling of Transportation Systems: A Conceptual Framework |
| I Don't Know: Explicit Modeling of Uncertainty with an [IDK] Token | Roi Cohen | PDF | N/A | I Don't Know: Explicit Modeling of Uncertainty with an [IDK] Token |
| EMOv2: Pushing 5M Vision Model Frontier | Jiangning Zhang | PDF | N/A | EMOv2: Pushing 5M Vision Model Frontier |
| ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance | Chunwei Wang | PDF | N/A | ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance |
| Diff5T: Benchmarking Human Brain Diffusion MRI with an Extensive 5.0 Tesla K-Space and Spatial Dataset | Shanshan Wang | PDF | N/A | Diff5T: Benchmarking Human Brain Diffusion MRI with an Extensive 5.0 Tesla K-Space and Spatial Dataset |
| Knowledge Transfer and Domain Adaptation for Fine-Grained Remote Sensing Image Segmentation | Shun Zhang | PDF | N/A | Knowledge Transfer and Domain Adaptation for Fine-Grained Remote Sensing Image Segmentation |
| Efficiency Meets Fidelity: A Novel Quantization Framework for Stable Diffusion | Shuaiting Li | PDF | N/A | Efficiency Meets Fidelity: A Novel Quantization Framework for Stable Diffusion |
| Off-Policy Maximum Entropy RL with Future State and Action Visitation Measures | Adrien Bolland | PDF | N/A | Off-Policy Maximum Entropy RL with Future State and Action Visitation Measures |
| GEAR: A Simple GENERATE, EMBED, AVERAGE AND RANK Approach for Unsupervised Reverse Dictionary | Fatemah Almeman | PDF | N/A | GEAR: A Simple GENERATE, EMBED, AVERAGE AND RANK Approach for Unsupervised Reverse Dictionary |
| Semantic Search and Recommendation Algorithm | Aryan Duhan | PDF | N/A | Semantic Search and Recommendation Algorithm |
| Object Detection using Event Camera: A MoE Heat Conduction based Detector and A New Benchmark Dataset | Xiao Wang | PDF | N/A | Object Detection using Event Camera: A MoE Heat Conduction based Detector and A New Benchmark Dataset |
| The Narrow Gate: Localized Image-Text Communication in Vision-Language Models | Alessandro Serra | PDF | N/A | The Narrow Gate: Localized Image-Text Communication in Vision-Language Models |
| Class Balance Matters to Active Class-Incremental Learning | Zitong Huang | PDF | N/A | Class Balance Matters to Active Class-Incremental Learning |
| Detecting Facial Image Manipulations with Multi-Layer CNN Models | Alejandro Marco Montejano | PDF | N/A | Detecting Facial Image Manipulations with Multi-Layer CNN Models |
| Beyond Scalars: Concept-Based Alignment Analysis in Vision Transformers | Johanna Vielhaben | PDF | N/A | Beyond Scalars: Concept-Based Alignment Analysis in Vision Transformers |
| MAVias: Mitigate any Visual Bias | Ioannis Sarridis | PDF | N/A | MAVias: Mitigate any Visual Bias |
| PolytopeWalk: Sparse MCMC Sampling over Polytopes | Benny Sun | PDF | N/A | PolytopeWalk: Sparse MCMC Sampling over Polytopes |
| Fundus Image-based Visual Acuity Assessment with PAC-Guarantees | Sooyong Jang | PDF | N/A | Fundus Image-based Visual Acuity Assessment with PAC-Guarantees |
| Copyright-Protected Language Generation via Adaptive Model Fusion | Javier Abad | PDF | N/A | Copyright-Protected Language Generation via Adaptive Model Fusion |
| AI TrackMate: Finally, Someone Who Will Give Your Music More Than Just "Sounds Great!" | Yi-Lin Jiang | PDF | N/A | AI TrackMate: Finally, Someone Who Will Give Your Music More Than Just "Sounds Great!" |
| MVReward: Better Aligning and Evaluating Multi-View Diffusion Models with Human Preferences | Weitao Wang | PDF | N/A | MVReward: Better Aligning and Evaluating Multi-View Diffusion Models with Human Preferences |
| 3D Spatial Understanding in MLLMs: Disambiguation and Evaluation | Chun-Peng Chang | PDF | N/A | 3D Spatial Understanding in MLLMs: Disambiguation and Evaluation |
| Vulnerability of Text-Matching in ML/AI Conference Reviewer Assignments to Collusions | Jhih-Yi | PDF | N/A | Vulnerability of Text-Matching in ML/AI Conference Reviewer Assignments to Collusions |
| VOPy: A Framework for Black-box Vector Optimization | Yaşar Cahit Yıldırım | PDF | N/A | VOPy: A Framework for Black-box Vector Optimization |
| Towards Controllable Speech Synthesis in the Era of Large Language Models: A Survey | Tianxin Xie | PDF | N/A | Towards Controllable Speech Synthesis in the Era of Large Language Models: A Survey |
| Advancing Music Therapy: Integrating Eastern Five-Element Music Theory and Western Techniques with AI in the Novel Five-Element Harmony System | Yubo Zhou | PDF | N/A | Advancing Music Therapy: Integrating Eastern Five-Element Music Theory and Western Techniques with AI in the Novel Five-Element Harmony System |
| A No-Reference Medical Image Quality Assessment Method Based on Automated Distortion Recognition Technology: Application to Preprocessing in MRI-guided Radiotherapy | Zilin Wang | PDF | N/A | A No-Reference Medical Image Quality Assessment Method Based on Automated Distortion Recognition Technology: Application to Preprocessing in MRI-guided Radiotherapy |
| Self-Interested Agents in Collaborative Learning: An Incentivized Adaptive Data-Centric Framework | Nithia Vijayan | PDF | N/A | Self-Interested Agents in Collaborative Learning: An Incentivized Adaptive Data-Centric Framework |
| Anchoring Bias in Large Language Models: An Experimental Study | Jiaxu Lou | PDF | N/A | Anchoring Bias in Large Language Models: An Experimental Study |
| PrEditor3D: Fast and Precise 3D Shape Editing | Ziya Erkoç | PDF | N/A | PrEditor3D: Fast and Precise 3D Shape Editing |
| Bridging the Divide: Reconsidering Softmax and Linear Attention | Dongchen Han | PDF | N/A | Bridging the Divide: Reconsidering Softmax and Linear Attention |
| EmoSpeech: A Corpus of Emotionally Rich and Contextually Detailed Speech Annotations | Weizhen Bian | PDF | N/A | EmoSpeech: A Corpus of Emotionally Rich and Contextually Detailed Speech Annotations |
| MoViE: Mobile Diffusion for Video Editing | Adil Karjauv | PDF | N/A | MoViE: Mobile Diffusion for Video Editing |
| Data Quality Enhancement on the Basis of Diversity with Large Language Models for Text Classification: Uncovered, Difficult, and Noisy | Min Zeng | PDF | N/A | Data Quality Enhancement on the Basis of Diversity with Large Language Models for Text Classification: Uncovered, Difficult, and Noisy |
| CONDEN-FI: Consistency and Diversity Learning-based Multi-View Unsupervised Feature and Instance Co-Selection | Yanyong Huang | PDF | N/A | CONDEN-FI: Consistency and Diversity Learning-based Multi-View Unsupervised Feature and Instance Co-Selection |
| DEX: Data Channel Extension for Efficient CNN Inference on Tiny AI Accelerators | Taesik Gong | PDF | N/A | DEX: Data Channel Extension for Efficient CNN Inference on Tiny AI Accelerators |
| ProcessBench: Identifying Process Errors in Mathematical Reasoning | Chujie Zheng | PDF | N/A | ProcessBench: Identifying Process Errors in Mathematical Reasoning |
| When Dimensionality Reduction Meets Graph (Drawing) Theory: Introducing a Common Framework, Challenges and Opportunities | Fernando Paulovich | PDF | N/A | When Dimensionality Reduction Meets Graph (Drawing) Theory: Introducing a Common Framework, Challenges and Opportunities |
| DNA Fragments in Crude Oil Reveals Earth's Hidden History | Wan-Qian Zhao | PDF | N/A | DNA Fragments in Crude Oil Reveals Earth's Hidden History |
| Prediction of Occluded Pedestrians in Road Scenes using Human-like Reasoning: Insights from the OccluRoads Dataset | Melo Castillo Angie Nataly | PDF | N/A | Prediction of Occluded Pedestrians in Road Scenes using Human-like Reasoning: Insights from the OccluRoads Dataset |
| On How Iterative Magnitude Pruning Discovers Local Receptive Fields in Fully Connected Neural Networks | William T. Redman | PDF | N/A | On How Iterative Magnitude Pruning Discovers Local Receptive Fields in Fully Connected Neural Networks |
| Sloth: scaling laws for LLM skills to predict multi-benchmark performance across families | Felipe Maia Polo | PDF | N/A | Sloth: scaling laws for LLM skills to predict multi-benchmark performance across families |
| Understanding Factual Recall in Transformers via Associative Memories | Eshaan Nichani | PDF | N/A | Understanding Factual Recall in Transformers via Associative Memories |
| Inverting Visual Representations with Detection Transformers | Jan Rathjens | PDF | N/A | Inverting Visual Representations with Detection Transformers |
| Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation | Egor Cherepanov | PDF | N/A | Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation |
| HES-UNet: A U-Net for Hepatic Echinococcosis Lesion Segmentation | Jiayan Chen | PDF | N/A | HES-UNet: A U-Net for Hepatic Echinococcosis Lesion Segmentation |
| Ancient DNA from 120-Million-Year-Old Lycoptera Fossils Reveals Evolutionary Insights | Wan-Qian Zhao | PDF | N/A | Ancient DNA from 120-Million-Year-Old Lycoptera Fossils Reveals Evolutionary Insights |
| The Fusion of Large Language Models and Formal Methods for Trustworthy AI Agents: A Roadmap | Yedi Zhang | PDF | N/A | The Fusion of Large Language Models and Formal Methods for Trustworthy AI Agents: A Roadmap |
| Fitting Spherical Gaussians to Dynamic HDRI Sequences | Pascal Clausen | PDF | N/A | Fitting Spherical Gaussians to Dynamic HDRI Sequences |
| AnomalyControl: Learning Cross-modal Semantic Features for Controllable Anomaly Synthesis | Shidan He | PDF | N/A | AnomalyControl: Learning Cross-modal Semantic Features for Controllable Anomaly Synthesis |
| BATseg: Boundary-aware Multiclass Spinal Cord Tumor Segmentation on 3D MRI Scans | Hongkang Song | PDF | N/A | BATseg: Boundary-aware Multiclass Spinal Cord Tumor Segmentation on 3D MRI Scans |
| Hybrid Attention Network: An efficient approach for anatomy-free landmark detection | Xiaoqian Zhou | PDF | N/A | Hybrid Attention Network: An efficient approach for anatomy-free landmark detection |
| A cautionary tale on the cost-effectiveness of collaborative AI in real-world medical applications | Francesco Cremonesi | PDF | N/A | A cautionary tale on the cost-effectiveness of collaborative AI in real-world medical applications |
| PPT: Pre-Training with Pseudo-Labeled Trajectories for Motion Forecasting | Yihong Xu | PDF | N/A | PPT: Pre-Training with Pseudo-Labeled Trajectories for Motion Forecasting |
| An Efficient Scene Coordinate Encoding and Relocalization Method | Kuan Xu | PDF | N/A | An Efficient Scene Coordinate Encoding and Relocalization Method |
| Improving text-conditioned latent diffusion for cancer pathology | Aakash Madhav Rao | PDF | N/A | Improving text-conditioned latent diffusion for cancer pathology |
| SimuDICE: Offline Policy Optimization Through World Model Updates and DICE Estimation | Catalin E. Brita | PDF | N/A | SimuDICE: Offline Policy Optimization Through World Model Updates and DICE Estimation |
| Small Languages, Big Models: A Study of Continual Training on Languages of Norway | David Samuel | PDF | N/A | Small Languages, Big Models: A Study of Continual Training on Languages of Norway |
| SafeWorld: Geo-Diverse Safety Alignment | Da Yin | PDF | N/A | SafeWorld: Geo-Diverse Safety Alignment |
| An inferential measure of dependence between two systems using Bayesian model comparison | Guillaume Marrelec | PDF | N/A | An inferential measure of dependence between two systems using Bayesian model comparison |
| From Uncertainty to Trust: Enhancing Reliability in Vision-Language Models with Uncertainty-Guided Dropout Decoding | Yixiong Fang | PDF | N/A | From Uncertainty to Trust: Enhancing Reliability in Vision-Language Models with Uncertainty-Guided Dropout Decoding |
| Food for thought: How can machine learning help better predict and understand changes in food prices? | Kristina L. Kupferschmidt | PDF | N/A | Food for thought: How can machine learning help better predict and understand changes in food prices? |
| Active Learning with Context Sampling and One-vs-Rest Entropy for Semantic Segmentation | Fei Wu | PDF | N/A | Active Learning with Context Sampling and One-vs-Rest Entropy for Semantic Segmentation |
| Agent Journey Beyond RGB: Unveiling Hybrid Semantic-Spatial Environmental Representations for Vision-and-Language Navigation | Xuesong Zhang | PDF | N/A | Agent Journey Beyond RGB: Unveiling Hybrid Semantic-Spatial Environmental Representations for Vision-and-Language Navigation |
| Gated Delta Networks: Improving Mamba2 with Delta Rule | Songlin Yang | PDF | N/A | Gated Delta Networks: Improving Mamba2 with Delta Rule |
| Ranked from Within: Ranking Large Multimodal Models for Visual Question Answering Without Labels | Weijie Tu | PDF | N/A | Ranked from Within: Ranking Large Multimodal Models for Visual Question Answering Without Labels |
| Pruning All-Rounder: Rethinking and Improving Inference Efficiency for Large Vision Language Models | Wei Suo | PDF | N/A | Pruning All-Rounder: Rethinking and Improving Inference Efficiency for Large Vision Language Models |
| UAV Virtual Antenna Array Deployment for Uplink Interference Mitigation in Data Collection Networks | Hongjuan Li | PDF | N/A | UAV Virtual Antenna Array Deployment for Uplink Interference Mitigation in Data Collection Networks |
| Adaptive Graph Learning from Spatial Information for Surgical Workflow Anticipation | Francis Xiatian Zhang | PDF | N/A | Adaptive Graph Learning from Spatial Information for Surgical Workflow Anticipation |
| How Certain are Uncertainty Estimates? Three Novel Earth Observation Datasets for Benchmarking Uncertainty Quantification in Machine Learning | Yuanyuan Wang | PDF | N/A | How Certain are Uncertainty Estimates? Three Novel Earth Observation Datasets for Benchmarking Uncertainty Quantification in Machine Learning |
| Echocardiography to Cardiac MRI View Transformation for Real-Time Blind Restoration | Ilke Adalioglu | PDF | N/A | Echocardiography to Cardiac MRI View Transformation for Real-Time Blind Restoration |
| BoRA: Bi-dimensional Weight-Decomposed Low-Rank Adaptation | Qiushi Wang | PDF | N/A | BoRA: Bi-dimensional Weight-Decomposed Low-Rank Adaptation |
| Local Attention Transformers for High-Detail Optical Flow Upsampling | Alexander Gielisse | PDF | N/A | Local Attention Transformers for High-Detail Optical Flow Upsampling |
| Can foundation models actively gather information in interactive environments to test hypotheses? | Nan Rosemary Ke | PDF | N/A | Can foundation models actively gather information in interactive environments to test hypotheses? |
| An Adaptively Inexact Method for Bilevel Learning Using Primal-Dual Style Differentiation | Lea Bogensperger | PDF | N/A | An Adaptively Inexact Method for Bilevel Learning Using Primal-Dual Style Differentiation |
| Simulating Human-like Daily Activities with Desire-driven Autonomy | Yiding Wang | PDF | N/A | Simulating Human-like Daily Activities with Desire-driven Autonomy |
| Integrating Expert Labels into LLM-based Emission Goal Detection: Example Selection vs Automatic Prompt Design | Marco Wrzalik | PDF | N/A | Integrating Expert Labels into LLM-based Emission Goal Detection: Example Selection vs Automatic Prompt Design |
| Foresee and Act Ahead: Task Prediction and Pre-Scheduling Enabled Efficient Robotic Warehousing | B. Cao | PDF | N/A | Foresee and Act Ahead: Task Prediction and Pre-Scheduling Enabled Efficient Robotic Warehousing |
| Deblur4DGS: 4D Gaussian Splatting from Blurry Monocular Video | Renlong Wu | PDF | N/A | Deblur4DGS: 4D Gaussian Splatting from Blurry Monocular Video |
| LLM-BIP: Structured Pruning for Large Language Models with Block-Wise Forward Importance Propagation | Haihang Wu | PDF | N/A | LLM-BIP: Structured Pruning for Large Language Models with Block-Wise Forward Importance Propagation |
| Continual Learning for Segment Anything Model Adaptation | Jinglong Yang | PDF | N/A | Continual Learning for Segment Anything Model Adaptation |
| Federated Split Learning with Model Pruning and Gradient Quantization in Wireless Networks | Junhe Zhang | PDF | N/A | Federated Split Learning with Model Pruning and Gradient Quantization in Wireless Networks |
| World-Consistent Data Generation for Vision-and-Language Navigation | Yu Zhong | PDF | N/A | World-Consistent Data Generation for Vision-and-Language Navigation |
| StarWhisper Telescope: Agent-Based Observation Assistant System to Approach AI Astrophysicist | Cunshi Wang | PDF | N/A | StarWhisper Telescope: Agent-Based Observation Assistant System to Approach AI Astrophysicist |
| BatchTopK Sparse Autoencoders | Bart Bussmann | PDF | N/A | BatchTopK Sparse Autoencoders |
| Generative Lines Matching Models | Ori Matityahu | PDF | N/A | Generative Lines Matching Models |
| GameArena: Evaluating LLM Reasoning through Live Computer Games | Lanxiang Hu | PDF | N/A | GameArena: Evaluating LLM Reasoning through Live Computer Games |
| Edge Delayed Deep Deterministic Policy Gradient: efficient continuous control for edge scenarios | Alberto Sinigaglia | PDF | N/A | Edge Delayed Deep Deterministic Policy Gradient: efficient continuous control for edge scenarios |
| Exploring the Impact of Synthetic Data on Human Gesture Recognition Tasks Using GANs | George Kontogiannis | PDF | N/A | Exploring the Impact of Synthetic Data on Human Gesture Recognition Tasks Using GANs |
| PyPulse: A Python Library for Biosignal Imputation | Kevin Gao | PDF | N/A | PyPulse: A Python Library for Biosignal Imputation |
| Gentle robustness implies Generalization | Khoat Than | PDF | N/A | Gentle robustness implies Generalization |
| Low-Rank Matrix Factorizations with Volume-based Constraints and Regularizations | Olivier Vu Thanh | PDF | N/A | Low-Rank Matrix Factorizations with Volume-based Constraints and Regularizations |
| Emerging Challenges in Molecular Paleontology: Misapplication of Environmental DNA Fragments and Misconception of Deamination as a Key Criterion for In Situ DNA Identification | Wan-Qian Zhao | PDF | N/A | Emerging Challenges in Molecular Paleontology: Misapplication of Environmental DNA Fragments and Misconception of Deamination as a Key Criterion for In Situ DNA Identification |
| Exploring Memorization and Copyright Violation in Frontier LLMs: A Study of the New York Times v. OpenAI 2023 Lawsuit | Joshua Freeman | PDF | N/A | Exploring Memorization and Copyright Violation in Frontier LLMs: A Study of the New York Times v. OpenAI 2023 Lawsuit |
| Measuring Pre-training Data Quality without Labels for Time Series Foundation Models | Songkang Wen | PDF | N/A | Measuring Pre-training Data Quality without Labels for Time Series Foundation Models |
| Is Self-Supervision Enough? Benchmarking Foundation Models Against End-to-End Training for Mitotic Figure Classification | Jonathan Ganz | PDF | N/A | Is Self-Supervision Enough? Benchmarking Foundation Models Against End-to-End Training for Mitotic Figure Classification |
| On-Device Self-Supervised Learning of Low-Latency Monocular Depth from Only Events | Jesse Hagenaars | PDF | N/A | On-Device Self-Supervised Learning of Low-Latency Monocular Depth from Only Events |
| Flexible and Scalable Deep Dendritic Spiking Neural Networks with Multiple Nonlinear Branching | Yifan Huang | PDF | N/A | Flexible and Scalable Deep Dendritic Spiking Neural Networks with Multiple Nonlinear Branching |
| GraphNeuralNetworks.jl: Deep Learning on Graphs with Julia | Carlo Lucibello | PDF | N/A | GraphNeuralNetworks.jl: Deep Learning on Graphs with Julia |
| SeFENet: Robust Deep Homography Estimation via Semantic-Driven Feature Enhancement | Zeru Shi | PDF | N/A | SeFENet: Robust Deep Homography Estimation via Semantic-Driven Feature Enhancement |
| Tracking control of latent dynamic systems with application to spacecraft attitude control | Congxi Zhang | PDF | N/A | Tracking control of latent dynamic systems with application to spacecraft attitude control |
| Elastic-DETR: Making Image Resolution Learnable with Content-Specific Network Prediction | Daeun Seo | PDF | N/A | Elastic-DETR: Making Image Resolution Learnable with Content-Specific Network Prediction |
| UniPaint: Unified Space-time Video Inpainting via Mixture-of-Experts | Zhen Wan | PDF | N/A | UniPaint: Unified Space-time Video Inpainting via Mixture-of-Experts |
| TriDi: Trilateral Diffusion of 3D Humans, Objects, and Interactions | Ilya A. Petrov | PDF | N/A | TriDi: Trilateral Diffusion of 3D Humans, Objects, and Interactions |
| Augmenting the action space with conventions to improve multi-agent cooperation in Hanabi | F. Bredell | PDF | N/A | Augmenting the action space with conventions to improve multi-agent cooperation in Hanabi |
| Not All Errors Are Equal: Investigation of Speech Recognition Errors in Alzheimer's Disease Detection | Jiawen Kang | PDF | N/A | Not All Errors Are Equal: Investigation of Speech Recognition Errors in Alzheimer's Disease Detection |
| Normalizing Flows are Capable Generative Models | Shuangfei Zhai | PDF | N/A | Normalizing Flows are Capable Generative Models |
| World knowledge-enhanced Reasoning Using Instruction-guided Interactor in Autonomous Driving | Mingliang Zhai | PDF | N/A | World knowledge-enhanced Reasoning Using Instruction-guided Interactor in Autonomous Driving |
| HAIFAI: Human-AI Collaboration for Mental Face Reconstruction | Florian Strohm | PDF | N/A | HAIFAI: Human-AI Collaboration for Mental Face Reconstruction |
| LLaVA-SpaceSGG: Visual Instruct Tuning for Open-vocabulary Scene Graph Generation with Enhanced Spatial Relations | Mingjie Xu | PDF | N/A | LLaVA-SpaceSGG: Visual Instruct Tuning for Open-vocabulary Scene Graph Generation with Enhanced Spatial Relations |
| CAD-Unet: A Capsule Network-Enhanced Unet Architecture for Accurate Segmentation of COVID-19 Lung Infections from CT Images | Yijie Dang | PDF | N/A | CAD-Unet: A Capsule Network-Enhanced Unet Architecture for Accurate Segmentation of COVID-19 Lung Infections from CT Images |
| Vision-Based Deep Reinforcement Learning of UAV Autonomous Navigation Using Privileged Information | Junqiao Wang | PDF | N/A | Vision-Based Deep Reinforcement Learning of UAV Autonomous Navigation Using Privileged Information |
| Towards High-Level Modelling in Automated Planning | Carla Davesa Sureda | PDF | N/A | Towards High-Level Modelling in Automated Planning |
| PRECISE: Pre-training Sequential Recommenders with Collaborative and Semantic Information | Chonggang Song | PDF | N/A | PRECISE: Pre-training Sequential Recommenders with Collaborative and Semantic Information |
| Self-Paced Learning Strategy with Easy Sample Prior Based on Confidence for the Flying Bird Object Detection Model Training | Zi-Wei Sun | PDF | N/A | Self-Paced Learning Strategy with Easy Sample Prior Based on Confidence for the Flying Bird Object Detection Model Training |
| DSAI: Unbiased and Interpretable Latent Feature Extraction for Data-Centric AI | Hyowon Cho | PDF | N/A | DSAI: Unbiased and Interpretable Latent Feature Extraction for Data-Centric AI |
| 4D Gaussian Splatting with Scale-aware Residual Field and Adaptive Optimization for Real-time Rendering of Temporally Complex Dynamic Scenes | Jinbo Yan | PDF | N/A | 4D Gaussian Splatting with Scale-aware Residual Field and Adaptive Optimization for Real-time Rendering of Temporally Complex Dynamic Scenes |
| See Further When Clear: Curriculum Consistency Model | Yunpeng Liu | PDF | N/A | See Further When Clear: Curriculum Consistency Model |
| Mastering Collaborative Multi-modal Data Selection: A Focus on Informativeness, Uniqueness, and Representativeness | Qifan Yu | PDF | N/A | Mastering Collaborative Multi-modal Data Selection: A Focus on Informativeness, Uniqueness, and Representativeness |
| ZeroKey: Point-Level Reasoning and Zero-Shot 3D Keypoint Detection from Large Language Models | Bingchen Gong | PDF | N/A | ZeroKey: Point-Level Reasoning and Zero-Shot 3D Keypoint Detection from Large Language Models |
| S$^{2}$FT: Efficient, Scalable and Generalizable LLM Fine-tuning by Structured Sparsity | Xinyu Yang | PDF | N/A | S$^{2}$FT: Efficient, Scalable and Generalizable LLM Fine-tuning by Structured Sparsity |
| PediaBench: A Comprehensive Chinese Pediatric Dataset for Benchmarking Large Language Models | Qian Zhang | PDF | N/A | PediaBench: A Comprehensive Chinese Pediatric Dataset for Benchmarking Large Language Models |
| No Annotations for Object Detection in Art through Stable Diffusion | Patrick Ramos | PDF | N/A | No Annotations for Object Detection in Art through Stable Diffusion |
| Neural Garment Dynamic Super-Resolution | Meng Zhang | PDF | N/A | Neural Garment Dynamic Super-Resolution |
| Your Data Is Not Perfect: Towards Cross-Domain Out-of-Distribution Detection in Class-Imbalanced Data | Xiang Fang | PDF | N/A | Your Data Is Not Perfect: Towards Cross-Domain Out-of-Distribution Detection in Class-Imbalanced Data |
| Omni-Scene: Omni-Gaussian Representation for Ego-Centric Sparse-View Scene Reconstruction | Dongxu Wei | PDF | N/A | Omni-Scene: Omni-Gaussian Representation for Ego-Centric Sparse-View Scene Reconstruction |
| Methods for Legal Citation Prediction in the Age of LLMs: An Australian Law Case Study | Ehsan Shareghi | PDF | N/A | Methods for Legal Citation Prediction in the Age of LLMs: An Australian Law Case Study |
| Open-Vocabulary High-Resolution 3D (OVHR3D) Data Segmentation and Annotation Framework | Jiuyi Xu | PDF | N/A | Open-Vocabulary High-Resolution 3D (OVHR3D) Data Segmentation and Annotation Framework |
| Table2Image: Interpretable Tabular data Classification with Realistic Image Transformations | Seungeun Lee | PDF | N/A | Table2Image: Interpretable Tabular data Classification with Realistic Image Transformations |
| Flow Matching Guide and Code | Yaron Lipman | PDF | N/A | Flow Matching Guide and Code |
| iLLaVA: An Image is Worth Fewer Than 1/3 Input Tokens in Large Multimodal Models | Lianyu Hu | PDF | N/A | iLLaVA: An Image is Worth Fewer Than 1/3 Input Tokens in Large Multimodal Models |
| A Lightweight U-like Network Utilizing Neural Memory Ordinary Differential Equations for Slimming the Decoder | Quansong He | PDF | N/A | A Lightweight U-like Network Utilizing Neural Memory Ordinary Differential Equations for Slimming the Decoder |
| Enhanced Multi-Object Tracking Using Pose-based Virtual Markers in 3x3 Basketball | Li Yin | PDF | N/A | Enhanced Multi-Object Tracking Using Pose-based Virtual Markers in 3x3 Basketball |
| Advancing Extended Reality with 3D Gaussian Splatting: Innovations and Prospects | Shi Qiu | PDF | N/A | Advancing Extended Reality with 3D Gaussian Splatting: Innovations and Prospects |
| Splatter-360: Generalizable 360$^{\circ}$ Gaussian Splatting for Wide-baseline Panoramic Images | Zheng Chen | PDF | N/A | Splatter-360: Generalizable 360$^{\circ}$ Gaussian Splatting for Wide-baseline Panoramic Images |
| Optimizing Multi-Task Learning for Enhanced Performance in Large Language Models | Zhen Qi | PDF | N/A | Optimizing Multi-Task Learning for Enhanced Performance in Large Language Models |
| Rendering-Refined Stable Diffusion for Privacy Compliant Synthetic Data | Kartik Patwari | PDF | N/A | Rendering-Refined Stable Diffusion for Privacy Compliant Synthetic Data |
| A Comparative Study of Learning Paradigms in Large Language Models via Intrinsic Dimension | Saahith Janapati | PDF | N/A | A Comparative Study of Learning Paradigms in Large Language Models via Intrinsic Dimension |
| DenseVLM: A Retrieval and Decoupled Alignment Framework for Open-Vocabulary Dense Prediction | Yunheng Li | PDF | N/A | DenseVLM: A Retrieval and Decoupled Alignment Framework for Open-Vocabulary Dense Prediction |
| U-Know-DiffPAN: An Uncertainty-aware Knowledge Distillation Diffusion Framework with Details Enhancement for PAN-Sharpening | Sungpyo Kim | PDF | N/A | U-Know-DiffPAN: An Uncertainty-aware Knowledge Distillation Diffusion Framework with Details Enhancement for PAN-Sharpening |
| Unseen Attack Detection in Software-Defined Networking Using a BERT-Based Large Language Model | Mohammed N. Swileh | PDF | N/A | Unseen Attack Detection in Software-Defined Networking Using a BERT-Based Large Language Model |
| In Silico Pharmacokinetic and Molecular Docking Studies of Natural Plants against Essential Protein KRAS for Treatment of Pancreatic Cancer | Marsha Mariya Kappan | PDF | N/A | In Silico Pharmacokinetic and Molecular Docking Studies of Natural Plants against Essential Protein KRAS for Treatment of Pancreatic Cancer |
| VariFace: Fair and Diverse Synthetic Dataset Generation for Face Recognition | Michael Yeung | PDF | N/A | VariFace: Fair and Diverse Synthetic Dataset Generation for Face Recognition |
| Generative Densification: Learning to Densify Gaussians for High-Fidelity Generalizable 3D Reconstruction | Seungtae Nam | PDF | N/A | Generative Densification: Learning to Densify Gaussians for High-Fidelity Generalizable 3D Reconstruction |
| Representational Transfer Learning for Matrix Completion | Yong He | PDF | N/A | Representational Transfer Learning for Matrix Completion |
| A Scalable Decentralized Reinforcement Learning Framework for UAV Target Localization Using Recurrent PPO | Leon Fernando | PDF | N/A | A Scalable Decentralized Reinforcement Learning Framework for UAV Target Localization Using Recurrent PPO |
| LLMs as Debate Partners: Utilizing Genetic Algorithms and Adversarial Search for Adaptive Arguments | Prakash Aryan | PDF | N/A | LLMs as Debate Partners: Utilizing Genetic Algorithms and Adversarial Search for Adaptive Arguments |
| Attention-Enhanced Lightweight Hourglass Network for Human Pose Estimation | Marsha Mariya Kappan | PDF | N/A | Attention-Enhanced Lightweight Hourglass Network for Human Pose Estimation |
| Uni-NaVid: A Video-based Vision-Language-Action Model for Unifying Embodied Navigation Tasks | Jiazhao Zhang | PDF | N/A | Uni-NaVid: A Video-based Vision-Language-Action Model for Unifying Embodied Navigation Tasks |
| Data Free Backdoor Attacks | Bochuan Cao | PDF | N/A | Data Free Backdoor Attacks |
| A Real-Time Defense Against Object Vanishing Adversarial Patch Attacks for Object Detection in Autonomous Vehicles | Jaden Mu | PDF | N/A | A Real-Time Defense Against Object Vanishing Adversarial Patch Attacks for Object Detection in Autonomous Vehicles |
| A Self-guided Multimodal Approach to Enhancing Graph Representation Learning for Alzheimer's Diseases | Zhepeng Wang | PDF | N/A | A Self-guided Multimodal Approach to Enhancing Graph Representation Learning for Alzheimer's Diseases |
| MSCrackMamba: Leveraging Vision Mamba for Crack Detection in Fused Multispectral Imagery | Qinfeng Zhu | PDF | N/A | MSCrackMamba: Leveraging Vision Mamba for Crack Detection in Fused Multispectral Imagery |
| H-FedSN: Personalized Sparse Networks for Efficient and Accurate Hierarchical Federated Learning for IoT Applications | Jiechao Gao | PDF | N/A | H-FedSN: Personalized Sparse Networks for Efficient and Accurate Hierarchical Federated Learning for IoT Applications |
| Sound2Vision: Generating Diverse Visuals from Audio through Cross-Modal Latent Alignment | Kim Sung-Bin | PDF | N/A | Sound2Vision: Generating Diverse Visuals from Audio through Cross-Modal Latent Alignment |
| Pilot-guided Multimodal Semantic Communication for Audio-Visual Event Localization | Fei Yu | PDF | N/A | Pilot-guided Multimodal Semantic Communication for Audio-Visual Event Localization |
| Skill-Enhanced Reinforcement Learning Acceleration from Demonstrations | Hanping Zhang | PDF | N/A | Skill-Enhanced Reinforcement Learning Acceleration from Demonstrations |