
arXiv 2025-01-14 Papers

| Title (Chinese) | Author | PDF link | Code repository | Title |
| --- | --- | --- | --- | --- |
| DAViD: 使用预训练的视频扩散模型对3D物体的动态可供性进行建模 | Hyeonwoo Kim | PDF | N/A | DAViD: Modeling Dynamic Affordance of 3D Objects using Pre-trained Video Diffusion Models |
| MangaNinja:精确参考跟随的线稿上色技术 | Zhiheng Liu | PDF | N/A | MangaNinja: Line Art Colorization with Precise Reference Following |
| 随波逐流:使用实时扭曲噪声的运动可控视频扩散模型 | Ryan Burgert | PDF | N/A | Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise |
| 在线学习中的梯度均衡:理论与应用 | Anastasios N. Angelopoulos | PDF | N/A | Gradient Equilibrium in Online Learning: Theory and Applications |
| 从单目视频预测4D手部轨迹 | Yufei Ye | PDF | N/A | Predicting 4D Hand Trajectory from Monocular Videos |
| PokerBench:训练大型语言模型成为专业扑克玩家 | Richard Zhuang | PDF | N/A | PokerBench: Training Large Language Models to become Professional Poker Players |
| Omni-RGPT:通过标记符号统一图像和视频的区域级理解 | Miran Heo | PDF | N/A | Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks |
| GameFactory: 使用生成式互动视频创建新游戏 | Jiwen Yu | PDF | N/A | GameFactory: Creating New Games with Generative Interactive Videos |
| ADAM-1:人工智能与生物信息学在阿尔茨海默病检测及微生物组-临床数据整合中的应用 | Ziyuan Huang | PDF | N/A | ADAM-1: AI and Bioinformatics for Alzheimer's Detection and Microbiome-Clinical Data Integrations |
| 探索多语言大语言模型在现实世界噪声数据上的鲁棒性 | Amirhossein Aliakbarzadeh | PDF | N/A | Exploring Robustness of Multilingual LLMs on Real-World Noisy Data |
| 增强自动可解释性:以输出为中心的特征描述 | Yoav Gur-Arieh | PDF | N/A | Enhancing Automated Interpretability with Output-Centric Feature Descriptions |
| 函数相似性度量及其在统计学习与优化中的应用 | Chengpiao Huang | PDF | N/A | A Similarity Measure Between Functions with Applications to Statistical Learning and Optimization |
| 扩散对抗性后训练用于一步视频生成 | Shanchuan Lin | PDF | N/A | Diffusion Adversarial Post-Training for One-Step Video Generation |
| MiniMax-01:使用闪电注意力扩展基础模型 | MiniMax | PDF | N/A | MiniMax-01: Scaling Foundation Models with Lightning Attention |
| 每个人都喜欢睡觉:一项基于计算机辅助的30种语言物体命名数据比较 | Alžběta Kučerová | PDF | N/A | Everybody Likes to Sleep: A Computer-Assisted Comparison of Object Naming Data from 30 Languages |
| 使用机器学习与扩展特征的路径损耗预测 | Jonathan Ethier | PDF | N/A | Path Loss Prediction Using Machine Learning with Extended Features |
| 基准测试图表示和图神经网络在多变量时间序列分类中的应用 | Wennuo Yang | PDF | N/A | Benchmarking Graph Representations and Graph Neural Networks for Multivariate Time Series Classification |
| 通过多模态视觉序列变压器推进语义未来预测 | Efstathios Karypidis | PDF | N/A | Advancing Semantic Future Prediction through Multimodal Visual Sequence Transformers |
| 有界树宽的多项式阈值函数:一些可解释性与复杂性方面的探讨 | Karine Chubarian | PDF | N/A | Polynomial Threshold Functions of Bounded Tree-Width: Some Explainability and Complexity Aspects |
| 在线平台恋童癖者属性识别技术调查 | Hiba Fallatah | PDF | N/A | A Survey on Pedophile Attribution Techniques for Online Platforms |
| LayerAnimate: 动画的图层特定控制 | Yuxue Yang | PDF | N/A | LayerAnimate: Layer-specific Control for Animation |
| HALoGEN:神奇的LLM幻觉及其发现之处 | Abhilasha Ravichander | PDF | N/A | HALoGEN: Fantastic LLM Hallucinations and Where to Find Them |
| 使用归一化流避免随机信号的减法和除法:NFdeconvolve | Pedro Pessoa | PDF | N/A | Avoiding subtraction and division of stochastic signals using normalizing flows: NFdeconvolve |
| VINGS-Mono:大场景中基于视觉-惯性高斯点云的单目SLAM系统 | Ke Wu | PDF | N/A | VINGS-Mono: Visual-Inertial Gaussian Splatting Monocular SLAM in Large Scenes |
| 贝叶斯神经网络能否显式地建模输入不确定性? | Matias Valdenegro-Toro | PDF | N/A | Can Bayesian Neural Networks Explicitly Model Input Uncertainty? |
| AfriHate:一个针对非洲语言的多语言仇恨言论和侮辱性语言数据集集合 | Shamsuddeen Hassan Muhammad | PDF | N/A | AfriHate: A Multilingual Collection of Hate Speech and Abusive Language Datasets for African Languages |
| LLaVA-ST:一种用于细粒度时空理解的多模态大语言模型 | Hongyu Li | PDF | N/A | LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding |
| 从神经网络中解码可解释的逻辑规则 | Chuqin Geng | PDF | N/A | Decoding Interpretable Logic Rules from Neural Networks |
| SmartEraser:使用遮罩区域引导从图像中移除任何内容 | Longtao Jiang | PDF | N/A | SmartEraser: Remove Anything from Images using Masked-Region Guidance |
| 探索大型语言模型(LLMs)对社会人口统计学条件化改写的鲁棒性 | Pulkit Arora | PDF | N/A | Exploring Robustness of LLMs to Sociodemographically-Conditioned Paraphrasing |
| 高效适配器微调在顶尖Transformer模型中的比较分析 | Saad Mashkoor Siddiqui | PDF | N/A | Comparative Analysis of Efficient Adapter-Based Fine-Tuning of State-of-the-Art Transformer Models |
| 基于深度学习模型的AI驱动水域分割技术用于增强洪水监测 | Sanjida Afrin Mou | PDF | N/A | AI Driven Water Segmentation with deep learning models for Enhanced Flood Monitoring |
| 多人联合学习:以更少的通信达到均衡 | TaeHo Yoon | PDF | N/A | Multiplayer Federated Learning: Reaching Equilibrium with Less Communication |
| FDPP:基于人类偏好的扩散策略微调 | Yuxin Chen | PDF | N/A | FDPP: Fine-tune Diffusion Policy with Human Preference |
| 迈向端到端(E2E)对抗学习及其在物理世界中的应用 | Dudi Biton | PDF | N/A | Towards an End-to-End (E2E) Adversarial Learning and Application in the Physical World |
| 激发长上下文大型语言模型的上下文检索与推理能力 | Yifu Qiu | PDF | N/A | Eliciting In-context Retrieval and Reasoning for Long-context Large Language Models |
| 文本扩散红队测试大型语言模型:通过邻近约束揭示有害行为 | Jonathan Nöther | PDF | N/A | Text-Diffusion Red-Teaming of Large Language Models: Unveiling Harmful Behaviors with Proximity Constraints |
| 持续深度主动学习在医学影像中的应用:基于回放的上下文适应架构 | Rui Daniel | PDF | N/A | Continual Deep Active Learning for Medical Imaging: Replay-Base Architecture for Context Adaptation |
| 为自主云操作(CloudOps)设计的多代理框架的工程化大型语言模型(LLM) | Kannan Parthasarathy | PDF | N/A | Engineering LLM Powered Multi-agent Framework for Autonomous CloudOps |
| 一种基于Choquet积分和差分进化优化的特征级集成模型,用于CXR图像中的COVID-19识别 | Amir Reza Takhsha | PDF | N/A | A Feature-Level Ensemble Model for COVID-19 Identification in CXR Images using Choquet Integral and Differential Evolution Optimization |
| 机器学习中的隐私保护模型与预处理验证 | Wenbiao Li | PDF | N/A | Privacy-Preserving Model and Preprocessing Verification for Machine Learning |
| 使用多智能体强化学习的高速铁路动态定价 | Enrique Adrian Villarrubia-Martin | PDF | N/A | Dynamic Pricing in High-Speed Railways Using Multi-Agent Reinforcement Learning |
| 基于深度学习的高效脑肿瘤生长模型正向求解器 | Zeineb Haouari | PDF | N/A | Efficient Deep Learning-based Forward Solvers for Brain Tumor Growth Models |
| FramePainter:为交互式图像编辑赋予视频扩散先验 | Yabo Zhang | PDF | N/A | FramePainter: Endowing Interactive Image Editing with Video Diffusion Priors |
| 大规模批处理贝叶斯主动学习通过考虑预测概率 | Sebastian W. Ober | PDF | N/A | Big Batch Bayesian Active Learning by Considering Predictive Probabilities |
| 使用强化学习优化卫星通信的链路配置 | Tobias Rohe | PDF | N/A | Optimization of Link Configuration for Satellite Communication Using Reinforcement Learning |
| 研究在不同任务和动态电压频率调节(DVFS)设置下大型语言模型(LLM)推理的能效与性能权衡 | Paul Joe Maliakel | PDF | N/A | Investigating Energy Efficiency and Performance Trade-offs in LLM Inference Across Tasks and DVFS Settings |
| ASTRID:一个自动化且可扩展的TRIaD,用于评估基于RAG的临床问答系统 | Mohita Chowdhury | PDF | N/A | ASTRID -- An Automated and Scalable TRIaD for the Evaluation of RAG-based Clinical Question Answering Systems |
| 为量子机器学习建模特征图 | Navneet Singh | PDF | N/A | Modeling Feature Maps for Quantum Machine Learning |
| ArithmAttack: 评估大型语言模型在数学问题解决中对噪声上下文的鲁棒性 | Zain Ul Abedin | PDF | N/A | ArithmAttack: Evaluating Robustness of LLMs to Noisy Context in Math Problem Solving |
| 使用非线性动力学的二次嵌入进行数据驱动的系统辨识 | Stefan Klus | PDF | N/A | Data-driven system identification using quadratic embeddings of nonlinear dynamics |
| 全局收敛的变分推断 | Declan McNamara | PDF | N/A | Globally Convergent Variational Inference |
| CWEval:基于结果的LLM代码生成功能与安全性评估 | Jinjun Peng | PDF | N/A | CWEval: Outcome-driven Evaluation on Functionality and Security of LLM Code Generation |
| EmoNeXt:一种适用于面部表情识别的改进版ConvNeXt | Yassine El Boudouri | PDF | N/A | EmoNeXt: an Adapted ConvNeXt for Facial Emotion Recognition |
| OpenCSG中文语料库:一系列用于大语言模型训练的高质量中文数据集 | Yijiong Yu | PDF | N/A | OpenCSG Chinese Corpus: A Series of High-quality Chinese Datasets for LLM Training |
| 自监督深度高光谱修复与即插即用和深度图像先验模型 | Shuo Li | PDF | N/A | Self-supervised Deep Hyperspectral Inpainting with the Plug and Play and Deep Image Prior Models |
| 为基因组数据分析建模量子机器学习 | Navneet Singh | PDF | N/A | Modeling Quantum Machine Learning for Genomic Data Analysis |
| PRESERVE:分布式大语言模型服务中的权重预取与KV缓存机制 | Ahmet Caner Yüzügüler | PDF | N/A | PRESERVE: Prefetching Model Weights and KV-Cache in Distributed LLM Serving |
| 单目深度估计中的不确定性量化与基础模型的关键综合 | Steven Landgraf | PDF | N/A | A Critical Synthesis of Uncertainty Quantification and Foundation Models in Monocular Depth Estimation |
| 单细胞分析的多模态人工智能副驾驶,具备指令跟随功能 | Yin Fang | PDF | N/A | A Multi-Modal AI Copilot for Single-Cell Analysis with Instruction Following |
| 评估中小企业中的人工智能应用与数字化:实施框架 | Serena Proietti | PDF | N/A | Assessing AI Adoption and Digitalization in SMEs: A Framework for Implementation |
| CG-MER:一个基于卡牌游戏的多模态情感识别数据集 | Nessrine Farhat | PDF | N/A | CG-MER: A Card Game-based Multimodal dataset for Emotion Recognition |
| D$^2$-DPM:量化扩散概率模型的双重去噪 | Qian Zeng | PDF | N/A | D$^2$-DPM: Dual Denoising for Quantized Diffusion Probabilistic Models |
| 对象中心的二维高斯泼溅:背景去除与遮挡感知修剪以实现紧凑的对象模型 | Marcel Rogge | PDF | N/A | Object-Centric 2D Gaussian Splatting: Background Removal and Occlusion-Aware Pruning for Compact Object Models |
| 基准测试多模态模型在细粒度图像分析中的应用:跨多样化视觉特征的比较研究 | Evgenii Evstafev | PDF | N/A | Benchmarking Multimodal Models for Fine-Grained Image Analysis: A Comparative Study Across Diverse Visual Features |
| 利用深度学习和可解释人工智能(XAI)革新通信,提升阿拉伯手语识别能力 | Mazen Balat | PDF | N/A | Revolutionizing Communication with Deep Learning and XAI for Enhanced Arabic Sign Language Recognition |
| LeapVAD:通过认知感知与双过程思维实现自动驾驶的飞跃 | Yukai Ma | PDF | N/A | LeapVAD: A Leap in Autonomous Driving via Cognitive Perception and Dual-Process Thinking |
| 大型语言模型作为非结构化文本数据评判者的潜力与风险 | Rewina Bedemariam | PDF | N/A | Potential and Perils of Large Language Models as Judges of Unstructured Textual Data |
| 我可以在几秒钟内找到你!利用大型语言模型进行代码作者归属 | Soohyeon Choi | PDF | N/A | I Can Find You in Seconds! Leveraging Large Language Models for Code Authorship Attribution |
| DM-Mamba: 用于MRI重建的双域多尺度Mamba | Yucong Meng | PDF | N/A | DM-Mamba: Dual-domain Multi-scale Mamba for MRI reconstruction |
| 推理时计算:更真实吗?研究笔记 | James Chua | PDF | N/A | Inference-Time-Compute: More Faithful? A Research Note |
| FairTTTS:一种面向公平性分类的树测试时间模拟方法 | Nurit Cohen-Inger | PDF | N/A | FairTTTS: A Tree Test Time Simulation Method for Fairness-Aware Classification |
| 针对深度神经网络的能量后门攻击 | Hanene F. Z. Brachemi Meftah | PDF | N/A | Energy Backdoor Attack to Deep Neural Networks |
| 多输入变分自编码器在异构数据中的异常检测 | Phai Vu Dinh | PDF | N/A | Multiple-Input Variational Auto-Encoder for Anomaly Detection in Heterogeneous Data |
| 大型语言模型中的拒绝行为:非线性视角 | Fabian Hildebrandt | PDF | N/A | Refusal Behavior in Large Language Models: A Nonlinear Perspective |
| 引导关键场景:高分辨率图像修复在自动飞行安全关键检测与规避中的应用 | Jonathan Lyhs | PDF | N/A | Bootstrapping Corner Cases: High-Resolution Inpainting for Safety Critical Detect and Avoid for Automated Flying |
| EEG-ReMinD:通过自监督状态重建引导的黎曼动力学增强神经退行性EEG解码 | Zirui Wang | PDF | N/A | EEG-ReMinD: Enhancing Neurodegenerative EEG Decoding through Self-Supervised State Reconstruction-Primed Riemannian Dynamics |
| 视听深度伪造检测与局部时间不一致性 | Marcella Astrid | PDF | N/A | Audio-visual Deepfake Detection With Local Temporal Inconsistencies |
| 基于符号回归的航空声学预测的壁面压力谱经验模型 | Laura Botero Bolívar | PDF | N/A | An Empirical Wall-Pressure Spectrum Model for Aeroacoustic Predictions Based on Symbolic Regression |
| SAR反击战:RSVQA的新希望 | Lucrezia Tosato | PDF | N/A | SAR Strikes Back: A New Hope for RSVQA |
| 使用Graph-PReFLexOR进行原位图推理和知识扩展 | Markus J. Buehler | PDF | N/A | In-situ graph reasoning and knowledge expansion using Graph-PReFLexOR |
| 回顾鸟瞰图感知模型与冻结基础模型的结合:DINOv2与Metric3Dv2 | Seamie Hayes | PDF | N/A | Revisiting Birds Eye View Perception Models with Frozen Foundation Models: DINOv2 and Metric3Dv2 |
| RoHan:手术室中的鲁棒手部检测 | Roi Papo | PDF | N/A | RoHan: Robust Hand Detection in Operation Room |
| 遥感图像描述生成技术的演进:迈向SAT-Cap,一种单阶段Transformer方法 | Yuduo Wang | PDF | N/A | Change Captioning in Remote Sensing: Evolution to SAT-Cap -- A Single-Stage Transformer Approach |
| EarthView: 一个用于自监督学习的大规模遥感数据集 | Diego Velazquez | PDF | N/A | EarthView: A Large Scale Remote Sensing Dataset for Self-Supervision |
| 数据驱动的新产品库存管理:一种预热启动与调整的Dyna-$Q$方法 | Xinyu Qu | PDF | N/A | Data-driven inventory management for new products: A warm-start and adjusted Dyna-$Q$ approach |
| 大型语言模型在社交媒体上生成的回复和续写的一致性 | Wenlu Fan | PDF | N/A | Consistency of Responses and Continuations Generated by Large Language Models on Social Media |
| 通过平滑在线学习实现顺畅交接 | Michail Kalntis | PDF | N/A | Smooth Handovers via Smoothed Online Learning |
| 指导使用深度学习和手工设计的放射学特征对3D CT扫描中的肝细胞癌进行分类 | E. Sarfati | PDF | N/A | Guiding the classification of hepatocellular carcinoma on 3D CT-scans using deep and handcrafted radiological features |
| 基于混合动作的多目标兼容自动驾驶强化学习 | Guizhe Jin | PDF | N/A | Hybrid Action Based Reinforcement Learning for Multi-Objective Compatible Autonomous Driving |
| CellOMaps:一种用于稳健分类肺腺癌生长模式的紧凑表示方法 | Arwa Al-Rubaian | PDF | N/A | CellOMaps: A Compact Representation for Robust Classification of Lung Adenocarcinoma Growth Patterns |
| 基于Chiron的大型语言模型服务分层自动扩展 | Archit Patke | PDF | N/A | Hierarchical Autoscaling for Large Language Model Serving with Chiron |
| AgentPose: 通过特征代理进行渐进式分布对齐的人体姿态蒸馏 | Feng Zhang | PDF | N/A | AgentPose: Progressive Distribution Alignment via Feature Agent for Human Pose Distillation |
| NOMTO: 基于神经算子的符号模型近似与发现 | Sergei Garmaev | PDF | N/A | NOMTO: Neural Operator-based symbolic Model approximaTion and discOvery |
| 动态多模态情感分析:利用跨模态注意力实现分类 | Hui Lee | PDF | N/A | Dynamic Multimodal Sentiment Analysis: Leveraging Cross-Modal Attention for Enabled Classification |
| 基准测试视觉基础模型在自动驾驶输入监控中的应用 | Nert Keser | PDF | N/A | Benchmarking Vision Foundation Models for Input Monitoring in Autonomous Driving |
| 人工肝分类器:传统机器学习模型的新替代方案 | Mahmood A. Jumaah | PDF | N/A | Artificial Liver Classifier: A New Alternative to Conventional Machine Learning Models |
| CuAsmRL:通过深度强化学习优化GPU SASS调度 | Guoliang He | PDF | N/A | CuAsmRL: Optimizing GPU SASS Schedules via Deep Reinforcement Learning |
| 指导大语言模型在分层规划中集成的路线图 | Israel Puerta-Merino | PDF | N/A | A Roadmap to Guide the Integration of LLMs in Hierarchical Planning |
| 在协变量偏移下的最优策略适应 | Xueqing Liu | PDF | N/A | Optimal Policy Adaptation under Covariate Shift |
| 零样本中文字符生成的骨架与字体生成网络 | Mobai Xue | PDF | N/A | Skeleton and Font Generation Network for Zero-shot Chinese Character Generation |
| 通过条件计算优化语音多视图特征融合 | Weiqiao Shan | PDF | N/A | Optimizing Speech Multi-View Feature Fusion through Conditional Computation |
| 探索大型语言模型中的叙事聚类:BERT的分层分析 | Awritrojit Banerjee | PDF | N/A | Exploring Narrative Clustering in Large Language Models: A Layerwise Analysis of BERT |
| 关于在结构健康监测中使用统计学习理论进行模型选择 | C. A. Lindley | PDF | N/A | On the use of Statistical Learning Theory for model selection in Structural Health Monitoring |
| 自注意力时空校准用于精确的中间层匹配在ANN到SNN的蒸馏中 | Di Hong | PDF | N/A | Self-Attentive Spatio-Temporal Calibration for Precise Intermediate Layer Matching in ANN-to-SNN Distillation |
| Gen-A:将Ambisonics神经编码推广至未知麦克风阵列 | Mikko Heikkinen | PDF | N/A | Gen-A: Generalizing Ambisonics Neural Encoding to Unseen Microphone Arrays |
| 构建共生人工智能:审视《人工智能法案》以建立以人为本、基于原则的框架 | Miriana Calvano | PDF | N/A | Building Symbiotic AI: Reviewing the AI Act for a Human-Centred, Principle-Based Framework |
| UFGraphFR:一种基于用户文本特征的联邦推荐系统的尝试 | Xudong Wang | PDF | N/A | UFGraphFR: An attempt at a federated recommendation system based on user text characteristics |
| PolyLUT:基于硬件感知结构化剪枝的超低延迟多项式推理 | Marta Andronic | PDF | N/A | PolyLUT: Ultra-low Latency Polynomial Inference with Hardware-Aware Structured Pruning |
| 探索视觉语言模型作为尤文肉瘤诊断中的强大工具 | Alvaro Pastor-Naranjo | PDF | N/A | Exploring visual language models as a powerful tool in the diagnosis of Ewing Sarcoma |
| 一类递归神经网络实时递归学习(RTRL)的收敛性分析 | Samuel Chun-Hei Lam | PDF | N/A | Convergence Analysis of Real-time Recurrent Learning (RTRL) for a class of Recurrent Neural Networks |
| 通过光照-纹理调制实现鲁棒的低光人体姿态估计 | Feng Zhang | PDF | N/A | Robust Low-Light Human Pose Estimation through Illumination-Texture Modulation |
| 增强型SPS(半持续调度)速度自适应方案:5G NR V2I网络中的接入公平性 | Xiao Xu | PDF | N/A | Enhanced SPS Velocity-adaptive Scheme: Access Fariness in 5G NR V2I Networks |
| READ:基于强化的对抗学习在有限标注数据下的文本分类应用 | Rohit Sharma | PDF | N/A | READ: Reinforcement-based Adversarial Learning for Text Classification with Limited Labeled Data |
| 协作巡逻路线规划:通过多智能体强化学习优化城市犯罪监控 | Juan Palma-Borda | PDF | N/A | Cooperative Patrol Routing: Optimizing Urban Crime Surveillance through Multi-Agent Reinforcement Learning |
| 一个基于人工智能的框架,用于快速和本地化优化城市开放空间 | Pegah Eshraghi | PDF | N/A | An AI-driven framework for rapid and localized optimizations of urban open spaces |
| 教程:变分自编码器(VAE)作为神经影像学的推理范式 | C. Vázquez-García | PDF | N/A | Tutorial: VAE as an inference paradigm for neuroimaging |
| TriAdaptLoRA:基于大脑启发的三角自适应低秩适应,用于参数高效微调 | Yao Liang | PDF | N/A | TriAdaptLoRA: Brain-Inspired Triangular Adaptive Low-Rank Adaptation for Parameter-Efficient Fine-Tuning |
| DisCoPatch:批次统计量是进行OOD检测所需的全部,但前提是您能够信任它们 | Francisco Caetano | PDF | N/A | DisCoPatch: Batch Statistics Are All You Need For OOD Detection, But Only If You Can Trust Them |
| 为法语数据采样形式化词汇和句法多样性 | Louis Estève | PDF | N/A | Formalising lexical and syntactic diversity for data sampling in French |
| 通过基于贝叶斯优化的模型投毒最大化联邦学习的不确定性 | Marios Aristodemou | PDF | N/A | Maximizing Uncertainty for Federated learning via Bayesian Optimisation-based Model Poisoning |
| GDiffRetro:基于双图增强分子表示与扩散生成的逆合成预测 | Shengyin Sun | PDF | N/A | GDiffRetro: Retrosynthesis Prediction with Dual Graph Enhanced Molecular Representation and Diffusion Generation |
| 无监督特征构建在时间序列异常检测中的应用:一项评估 | Marine Hamon | PDF | N/A | Unsupervised Feature Construction for Anomaly Detection in Time Series -- An Evaluation |
| 奖励兼容性:一个逆向强化学习的框架 | Filippo Lazzati | PDF | N/A | Reward Compatibility: A Framework for Inverse RL |
| 结合成像和形状特征进行阿尔茨海默病分类和脑龄回归的预测任务 | Nairouz Shehata | PDF | N/A | Combining imaging and shape features for prediction tasks of Alzheimer's disease classification and brain age regression |
| LLM增强的整体架构用于临时可扩展的系统之系统(SoS) | Muhammad Ashfaq | PDF | N/A | LLM-Ehnanced Holonic Architecture for Ad-Hoc Scalable SoS |
| 使用数字孪生技术训练具有多模光学非线性特性的混合神经网络 | Ilker Oguz | PDF | N/A | Training Hybrid Neural Networks with Multimode Optical Nonlinearities Using Digital Twins |
| GAC-Net:基于几何和注意力机制的深度补全网络 | Kuang Zhu | PDF | N/A | GAC-Net_Geometric and attention-based Network for Depth Completion |
| 检查框:安全可变阻抗学习用于机器人抛光 | Emma Cramer | PDF | N/A | CHEQ-ing the Box: Safe Variable Impedance Learning for Robotic Polishing |
| 阈值注意力网络用于遥感图像的语义分割 | Wei Long | PDF | N/A | Threshold Attention Network for Semantic Segmentation of Remote Sensing Images |
| V-Trans4Style:视频制作风格适应的视觉转场推荐 | Pooja Guhan | PDF | N/A | V-Trans4Style: Visual Transition Recommendation for Video Production Style Adaptation |
| 视频中的面部动态:通过指令调优提升面部表情感知与上下文理解能力 | Jiaxing Zhao | PDF | N/A | Facial Dynamics in Video: Instruction Tuning for Improved Facial Expression Perception and Contextual Awareness |
| 零样本视频时刻检索通过现成的多模态大型语言模型实现 | Yifang Xu | PDF | N/A | Zero-shot Video Moment Retrieval via Off-the-shelf Multimodal Large Language Models |
| 基于综合元路径的异质图变换器用于基因-疾病关联预测 | Wentao Cui | PDF | N/A | Comprehensive Metapath-based Heterogeneous Graph Transformer for Gene-Disease Association Prediction |
| 多输出(又称多任务)高斯过程输出相关性推断的推导 | Shuhei Watanabe | PDF | N/A | Derivation of Output Correlation Inferences for Multi-Output (aka Multi-Task) Gaussian Process |
| SkipClick:结合快速响应和低级特征实现冬季运动场景中的交互式分割 | Robin Schön | PDF | N/A | SkipClick: Combining Quick Responses and Low-Level Features for Interactive Segmentation in Winter Sports Contexts |
| 自指导少样本越狱攻击:将攻击分解为模式学习和行为学习 | Jiaqi Hua | PDF | N/A | Self-Instruct Few-Shot Jailbreaking: Decompose the Attack into Pattern and Behavior Learning |
| AI导盲犬:基于智能手机的自我中心路径预测 | Aishwarya Jadhav | PDF | N/A | AI Guide Dog: Egocentric Path Prediction on Smartphone |
| 多目标神经进化在游戏测试中的应用 | Patric Feldmeier | PDF | N/A | Many-Objective Neuroevolution for Testing Games |
| 稳健的高光谱图像全色锐化通过稀疏空间-光谱表示 | Chia-Ming Lee | PDF | N/A | Robust Hyperspectral Image Panshapring via Sparse Spatial-Spectral Representation |
| 使用学习编码的差分时间表示的脉冲神经网络加速器架构 | Daniel Windhager | PDF | N/A | Spiking Neural Network Accelerator Architecture for Differential-Time Representation using Learned Encoding |
| “等等,你是指医生吗?”:收集用于主题分析的对话语料库 | Amandine Decker | PDF | N/A | "Wait, did you mean the doctor?": Collecting a Dialogue Corpus for Topical Analysis |
| 早期通过视频显微镜预测牛胚胎的移植能力 | Yasmine Hachani | PDF | N/A | Early prediction of the transferability of bovine embryos from videomicroscopy |
| ChatGPT模型在糖尿病自我管理中的建议:挑战与推荐 | Waqar Hussain | PDF | N/A | Advice for Diabetes Self-Management by ChatGPT Models: Challenges and Recommendations |
| 一种用于高效灵活CNN架构的自适应正交卷积方案 | Thibaut Boissin | PDF | N/A | An Adaptive Orthogonal Convolution Scheme for Efficient and Flexible CNN Architectures |
| 甘道夫之红:大型语言模型的自适应安全 | Niklas Pfister | PDF | N/A | Gandalf the Red: Adaptive Security for LLMs |
| 使用LSTM、GRU和BiLSTM进行航空安全中的飞行阶段分类:基于ASN数据集的案例研究 | Aziida Nanyonga | PDF | N/A | Phase of Flight Classification in Aviation Safety using LSTM, GRU, and BiLSTM: A Case Study with ASN Dataset |
| 使用主题建模和聚类技术探索航空事故叙述 | Aziida Nanyonga | PDF | N/A | Exploring Aviation Incident Narratives Using Topic Modeling and Clustering Techniques |
| 通过自然语言处理与深度学习增强航空安全:在ATSB安全报告中分类飞行阶段 | Aziida Nanyonga | PDF | N/A | Aviation Safety Enhancement via NLP & Deep Learning: Classifying Flight Phases in ATSB Safety Reports |
| VENOM:基于扩散模型的文本驱动无限制对抗样本生成 | Hui Kuurila-Zhang | PDF | N/A | VENOM: Text-driven Unrestricted Adversarial Example Generation with Diffusion Models |
| 家庭能源管理系统的大型语言模型接口 | François Michelon | PDF | N/A | Large Language Model Interface for Home Energy Management Systems |
| 管理AI代理 | Noam Kolt | PDF | N/A | Governing AI Agents |
| 深度学习与自然语言处理在建筑领域的应用 | Rémy Kessler | PDF | N/A | Deep Learning and Natural Language Processing in the Field of Construction |
| 对数记忆网络(Logarithmic Memory Networks,简称LMNs):面向资源受限环境的高效长程序列建模 | Mohamed A. Taha | PDF | N/A | Logarithmic Memory Networks (LMNs): Efficient Long-Range Sequence Modeling for Resource-Constrained Environments |
| 使用动态规划和分支定界法对连续特征数据进行最优分类树构建 | Catalin E. Brita | PDF | N/A | Optimal Classification Trees for Continuous Feature Data Using Dynamic Programming with Branch-and-Bound |
| 基于双流残差网络的极化合成孔径雷达与光学数据融合去云方法 | Yuxi Wang | PDF | N/A | Cloud Removal With PolSAR-Optical Data Fusion Using A Two-Flow Residual Network |
| 人脸图像质量度量中的人口统计学变异性 | Wassim Kabbani | PDF | N/A | Demographic Variability in Face Image Quality Measures |
| 随时协作式隐式命中集求解 | Emma Rollón | PDF | N/A | Anytime Cooperative Implicit Hitting Set Solving |
| 利用元记忆机制增强大型语言模型的无数据代码生成能力 | Shuai Wang | PDF | N/A | Leveraging Metamemory Mechanisms for Enhanced Data-Free Code Generation in LLMs |
| GRAPHMOE:通过引入自我反思机制增强专家混合网络的认知深度 | Chen Tang | PDF | N/A | GRAPHMOE: Amplifying Cognitive Depth of Mixture-of-Experts Network via Introducing Self-Rethinking Mechanism |
| Tarsier2:从详细的视频描述到全面的视频理解,推动大型视觉语言模型的发展 | Liping Yuan | PDF | N/A | Tarsier2: Advancing Large Vision-Language Models from Detailed Video Description to Comprehensive Video Understanding |
| 在弱监督条件下,迭代标签优化比偏好优化更为重要 | Yaowen Ye | PDF | N/A | Iterative Label Refinement Matters More than Preference Optimization under Weak Supervision |
| 使用因果建模减轻多类CNN分类中的算法偏差 | Min Sik Byun | PDF | N/A | Mitigating Algorithmic Bias in Multiclass CNN Classifications Using Causal Modeling |
| MD-Syn:基于多维特征融合方法和注意力机制的协同药物组合预测 | XinXin Ge | PDF | N/A | MD-Syn: Synergistic drug combination prediction based on the multidimensional feature fusion method and attention mechanisms |
| 分布式非参数估计:从稀疏到密集的终端样本 | Deheng Yuan | PDF | N/A | Distributed Nonparametric Estimation: from Sparse to Dense Samples per Terminal |
| 使用Whisper进行嵌入层手术和任务级束搜索的持续学习 | Chin Yuen Kwok | PDF | N/A | Continual Learning with Embedding Layer Surgery and Task-wise Beam Search using Whisper |
| Make-A-Character 2:从单张图像生成可动画的3D角色 | Lin Liu | PDF | N/A | Make-A-Character 2: Animatable 3D Character Generation From a Single Image |
| ReARTeR:基于可信过程奖励的检索增强推理 | Zhongxiang Sun | PDF | N/A | ReARTeR: Retrieval-Augmented Reasoning with Trustworthy Process Rewarding |
| deepTerra:让AI土地分类变得简单 | Andrew Keith Wilkinson | PDF | N/A | deepTerra -- AI Land Classification Made Easy |
| 使用本地大型语言模型进行业务应用的分层存储库级代码摘要 | Nilesh Dhulshette | PDF | N/A | Hierarchical Repository-Level Code Summarization for Business Applications Using Local LLMs |
| 图像超分辨率的最先进Transformer模型:技术、挑战与应用 | Debasish Dutta | PDF | N/A | State-of-the-Art Transformer Models for Image Super-Resolution: Techniques, Challenges, and Applications |
| 优化语言模型以提升语法可接受性:微调技术的比较研究 | Shobhit Ratan | PDF | N/A | Optimizing Language Models for Grammatical Acceptability: A Comparative Study of Fine-Tuning Techniques |
| 一种用于半监督动脉粥样硬化冠状动脉斑块分割的帧内和帧间拓扑一致性方案 | Ziheng Zhang | PDF | N/A | An Intra- and Cross-frame Topological Consistency Scheme for Semi-supervised Atherosclerotic Coronary Plaque Segmentation |
| 揭示大型语言模型在代码生成中的提供者偏见 | Xiaoyu Zhang | PDF | N/A | Unveiling Provider Bias in Large Language Models for Code Generation |
| 基于图结构的推理:构建隐性知识以增强大型语言模型的推理能力 | Haoyu Han | PDF | N/A | Reasoning with Graphs: Structuring Implicit Knowledge to Enhance LLMs Reasoning |
| 基于大语言模型的高速列车驾驶员咨询系统 | Y. C. Luo | PDF | N/A | A Driver Advisory System Based on Large Language Model for High-speed Train |
| Flow:一种模块化的自动化代理工作流生成方法 | Boye Niu | PDF | N/A | Flow: A Modular Approach to Automated Agentic Workflow Generation |
| 电价预测区间构建方法 | Xin Lu | PDF | N/A | Prediction Interval Construction Method for Electricity Prices |
| 实时验证与优化语言模型文本生成 | Joonho Ko | PDF | N/A | Real-time Verification and Refinement of Language Model Text Generation |
| 3UR-LLM:一种用于3D场景理解的端到端多模态大语言模型 | Haomiao Xiong | PDF | N/A | 3UR-LLM: An End-to-End Multimodal Large Language Model for 3D Scene Understanding |
| 一种用于微调大型语言模型的多编码器冻结解码器方法 | Kaustubh D. Dhole | PDF | N/A | A Multi-Encoder Frozen-Decoder Approach for Fine-Tuning Large Language Models |
| 以代理为中心的任务提示技术及其对大型语言模型合成训练数据的影响 | Dhruv Dhamani | PDF | N/A | Agent-Centric Projection of Prompting Techniques and Implications for Synthetic Training Data for Large Language Models |
| STTS-EAD: 通过改进基于时空学习的时间序列预测 | Yuanyuan Liang | PDF | N/A | STTS-EAD: Improving Spatio-Temporal Learning Based Time Series Prediction via |
| 与合适的专家交流:多智能体系统中的问答路由与规划 | Feijie Wu | PDF | N/A | Talk to Right Specialists: Routing and Planning in Multi-agent System for Question Answering |
| AVS-Mamba:探索用于音视频分割的时间与多模态Mamba | Sitong Gong | PDF | N/A | AVS-Mamba: Exploring Temporal and Multi-modal Mamba for Audio-Visual Segmentation |
| 共形映射坐标物理信息神经网络(CoCo-PINNs):用于设计中立包含物的神经网络学习方法 | Daehee Cho | PDF | N/A | Conformal mapping Coordinates Physics-Informed Neural Networks (CoCo-PINNs): learning neural networks for designing neutral inclusions |
| 一种低成本且超轻量级的二进制神经网络用于交通信号识别 | Mingke Xiao | PDF | N/A | A Low-cost and Ultra-lightweight Binary Neural Network for Traffic Signal Recognition |
| 学习运动和时间线索以进行无监督视频对象分割 | Yunzhi Zhuge | PDF | N/A | Learning Motion and Temporal Cues for Unsupervised Video Object Segmentation |
| 知识蒸馏中的平衡差异 | Yafei Qi | PDF | N/A | Balance Divergence for Knowledge Distillation |
| 视觉语言模型作为空间领域中的操作代理 | Alejandro Carrasco | PDF | N/A | Visual Language Models as Operator Agents in the Space Domain |
| 网络安全中基于DNN的白盒可解释AI方法的比较分析 | Osvaldo Arreche | PDF | N/A | A Comparative Analysis of DNN-based White-Box Explainable AI Methods in Network Security |
| BioPose:基于单目视频的生物力学精确三维姿态估计 | Farnoosh Koleini | PDF | N/A | BioPose: Biomechanically-accurate 3D Pose Estimation from Monocular Videos |
| 线性收敛的Mixup学习 | Gakuto Obi | PDF | N/A | Linearly Convergent Mixup Learning |
| 参数倒置图像金字塔网络用于视觉感知与多模态理解 | Zhaokai Wang | PDF | N/A | Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding |
| 变革室内定位:针对分布式传感器主导的非视距无线环境的先进Transformer架构 | Saad Masrur | PDF | N/A | Transforming Indoor Localization: Advanced Transformer Architecture for NLOS Dominated Wireless Environments with Distributed Sensors |
| 对称性感知生成建模通过学习规范化实现 | Kusha Sareen | PDF | N/A | Symmetry-Aware Generative Modeling through Learned Canonicalization |
| BMIP: 面向视觉语言模型的双向模态交互提示学习 | Song-Lin Lv | PDF | N/A | BMIP: Bi-directional Modality Interaction Prompt Learning for VLM |
| 大型语言模型在知识图谱嵌入技术、方法和挑战中的应用:综述 | Bingchen Liu | PDF | N/A | Large Language Models for Knowledge Graph Embedding Techniques, Methods, and Challenges: A Survey |
| PINN-FEM:一种在物理信息神经网络中强制执行狄利克雷边界条件的混合方法 | Nahil Sobh | PDF | N/A | PINN-FEM: A Hybrid Approach for Enforcing Dirichlet Boundary Conditions in Physics-Informed Neural Networks |
| 深度学习在疾病暴发预测中的应用:跨临界分岔的稳健早期预警信号 | Reza Miry | PDF | N/A | Deep Learning for Disease Outbreak Prediction: A Robust Early Warning Signal for Transcritical Bifurcations |