2024/12/20
- DI-PCG: Diffusion-based Efficient Inverse Procedural Content Generation for High-quality 3D Asset Creation
- AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling
- UIP2P: Unsupervised Instruction-based Image Editing via Cycle Edit Consistency
- Move-in-2D: 2D-Conditioned Human Motion Generation
2024/12/19
2024/12/18
2024/12/17
- Byte Latent Transformer: Patches Scale Better Than Tokens
- ColorFlow: Retrieval-Augmented Image Sequence Colorization
- Causal Diffusion Transformers for Generative Modeling
- Smaller Language Models Are Better Instruction Evolvers
- IDArb: Intrinsic Decomposition for Arbitrary Number of Input Views and Illuminations
- SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models
- Reliable, Reproducible, and Really Fast Leaderboards with Evalica
2024/12/12
2024/12/11
2024/12/10
2024/12/09
2024/12/06
- Evaluating Language Models as Synthetic Data Generators
- Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion
- Negative Token Merging: Image-based Adversarial Feature Guidance
- MV-Adapter: Multi-view Consistent Image Generation Made Easy
- Densing Law of LLMs
- OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows
2024/12/05
2024/12/04
2024/12/03
2024/11/29
2024/11/28
- Omegance: A Single Parameter for Various Granularities in Diffusion-Based Synthesis
- UniPose: A Unified Multimodal Framework for Human Pose Comprehension, Generation and Editing
- Optimizing Brain Tumor Segmentation with MedNeXt: BraTS 2024 SSA and Pediatrics
- VideoLLM Knows When to Speak: Enhancing Time-Sensitive Video Comprehension with Video-Text Duet Interaction Format
- Adaptive Blind All-in-One Image Restoration
2024/11/27
- TEXGen: a Generative Diffusion Model for Mesh Textures
- SAR3D: Autoregressive 3D Object Generation and Understanding via Multi-scale 3D VQVAE
- Low-Bit Quantization Favors Undertrained LLMs: Scaling Laws for Quantized LLMs with 100T Training Tokens
- FineCaption: Compositional Image Captioning Focusing on Wherever You Want at Any Granularity
2024/11/26
2024/11/25
- Style-Friendly SNR Sampler for Style-Driven Generation
- Tülu 3: Pushing Frontiers in Open Language Model Post-Training
- A Flexible Large Language Models Guardrail Development Methodology Applied to Off-Topic Prompt Detection
- OminiControl: Minimal and Universal Control for Diffusion Transformer
- VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection
- BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games
2024/11/22
- Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization
- OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs
- UnifiedCrawl: Aggregated Common Crawl for Affordable Adaptation of LLMs on Low-Resource Languages
- MagicDriveDiT: High-Resolution Long Video Generation for Autonomous Driving with Adaptive Control
2024/11/21
- SageAttention2 Technical Report: Accurate 4 Bit Attention for Plug-and-play Inference Acceleration
- Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents
- When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training
- 𝕍𝕚𝔹𝕖: A Text-to-Video Benchmark for Evaluating Hallucination in Large Multimodal Models
- ORID: Organ-Regional Information Driven Framework for Radiology Report Generation
2024/11/20
2024/11/19
- Generative World Explorer
- Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering
- Comprehensive and Practical Evaluation of Retrieval-Augmented Generation Systems for Medical Question Answering
- SmoothCache: A Universal Inference Acceleration Technique for Diffusion Transformers
- VeGaS: Video Gaussian Splatting
This site is experimental in nature and uses large language models, so we cannot guarantee the accuracy of its content.