Image Processing New Papers / Books / Telegram Index

248 @image_processing_new_papers

Description

Fully automatic :)

The channel's goal is mainly to introduce new work, not to replace reading the papers themselves.

Search keywords: [image processing, vision]

Similar channel in the NLP field:
https://t.me/NLP_New_Pape

Admin:
@dangerous_seif

We recommend to visit

Z-Library Official

617,627 @zlibrary_official

News and announcements of the library. No books here.
Official Chinese channel: t.me/zlib_china_official
https://z-library.sk
https://en.wikipedia.org/wiki/Z-Library
https://twitter.com/Z_Lib_official
https://mastodon.social/@Z_Lib_official

Last updated 1 year ago

Intel Slava Z

421,876 @intelslava

Intel slava is a Russian News aggregator who covers Conflicts/Geopolitics and urgent news from around the world.

For paid promotions and feedback contact us at: @CEOofBelarus

Last updated 6 months, 2 weeks ago

Books Hub: Ebook & Audiobook

303,870 @bookshub25

💫Welcome to the best book channel of Telegram.

✨Buy ads: https://telega.io/c/BooksHub25

✨Contact admin ➠ @Bookshub_contact_bot

✨ Copyright Disclaimer➠ https://telegra.ph/LEGAL-COPYRIGHT-DISCLAIMER-09-18

1 year, 10 months ago

Hello and greetings to our dear followers.
Friends, something bad has happened: the arXiv site has blocked our access to the download links for paper files.
If anyone knows a workaround or a VPN that works, I would be very grateful if you sent me a message.
@dangerous_seif

577 #

1 year, 10 months ago

ChatGPT summarized:
This paper describes the development of a new framework for multi-modal retrieval using machine learning and artificial intelligence. It uses state-of-the-art machine learning to develop a model capable of performing retrieval on multiple languages including English, French, German, and Spanish. The approach outperforms previous models based on monolingual and multilingual data. The authors discuss the various approaches they used to develop their model. MuMUR outperforms all other methods in both image retrieval and pattern recognition. They demonstrate that their model surpasses previous models trained on single-language or multi-language datasets. However, they admit that their approach suffers from some design hiccups due to its reliance on machine learning with mixed results.

Abstract:
Multi-modal retrieval has seen tremendous progress with the development of vision-language models. However, further improving these models requires additional labelled data, which is a huge manual effort. In this paper, we propose a framework, MuMUR, that utilizes knowledge transfer from a multilingual model to boost the performance of multi-modal (image and video) retrieval. We first use state-of-the-art machine translation models to construct pseudo ground-truth multilingual visual-text pairs. We then use this data to learn a joint vision-text representation where English and non-English text queries are represented in a common embedding space based on pretrained multilingual models. We evaluate our proposed approach on a diverse set of retrieval datasets: five video retrieval datasets (MSRVTT, MSVD, DiDeMo, Charades and MSRVTT multilingual) and two image retrieval datasets (Flickr30k and Multi30k). Experimental results demonstrate that our approach achieves state-of-the-art results on all video retrieval datasets, outperforming previous models. Additionally, our framework MuMUR significantly outperforms other approaches on multilingual video retrieval. We also observe that MuMUR exhibits strong performance on image retrieval. This demonstrates the universal ability of MuMUR to perform retrieval across all visual inputs (image and video) and text inputs (monolingual and multilingual).

Authors:
Avinash Madasu, Estelle Aflalo, Gabriela Ben Melech Stan, Shachar Rosenman, Shao-Yen Tseng, Gedas Bertasius, Vasudev Lal

Publication date:
18 September, 2023
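
As a rough illustration of what "a common embedding space" for English and non-English queries means in practice, here is a minimal sketch of retrieval by cosine similarity. It is not the MuMUR implementation: the embeddings are hand-made toy vectors, and the names `cosine`, `retrieve`, `gallery`, `query_en` and `query_de` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve(query_vec, gallery, top_k=1):
    """Rank gallery items (name -> embedding) by similarity to the query."""
    ranked = sorted(
        gallery,
        key=lambda name: cosine(query_vec, gallery[name]),
        reverse=True,
    )
    return ranked[:top_k]


# Toy embeddings (assumed values, not outputs of any real model).
gallery = {
    "dog_video": [0.9, 0.1, 0.0],
    "cooking_video": [0.0, 0.2, 0.9],
}
query_en = [1.0, 0.0, 0.1]  # stands in for an English query about a dog
query_de = [0.8, 0.2, 0.0]  # stands in for the same query in German

print(retrieve(query_en, gallery))
print(retrieve(query_de, gallery))
```

If both queries are embedded close to each other, they rank the same visual item first, which is the property a shared multilingual space is meant to provide.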

623 #

1 year, 10 months ago

Paper title:
[MuMUR : Multilingual Multimodal Universal Retrieval](https://arxiv.org/abs/2208.11553)

421 #

1 year, 10 months ago

ChatGPT summarized:
In this paper, the authors describe a new approach to machine learning that enables "zero-shot scene transfer" and real-world deployment of a robot. They examine how their approach outperforms previous approaches based on reinforcement learning and contrastive learning techniques. Their goal is to learn a sensorimotor policy capable of directly mapping raw onboard images and proprioceptive measurements into quadcopter control commands. In addition, they demonstrate that their approach can be applied to the task of autonomous, vision-based quadrotor flight. To demonstrate the performance of their approach, they train a vision encoder using their dataset in four simulation environments and then use a multi-pronged approach to train the action network. Finally, they perform several experiments to test the effectiveness of their system in both the training environment and the real world. The study is supported by the European Union's Horizon 2020 Research and Innovation Programme under grant agreement No.871479 and the European Research Council (ERC)

Abstract:
Scene transfer for vision-based mobile robotics applications is a highly relevant and challenging problem. The utility of a robot greatly depends on its ability to perform a task in the real world, outside of a well-controlled lab environment. Existing scene transfer end-to-end policy learning approaches often suffer from poor sample efficiency or limited generalization capabilities, making them unsuitable for mobile robotics applications. This work proposes an adaptive multi-pair contrastive learning strategy for visual representation learning that enables zero-shot scene transfer and real-world deployment. Control policies relying on the embedding are able to operate in unseen environments without the need for finetuning in the deployment environment. We demonstrate the performance of our approach on the task of agile, vision-based quadrotor flight. Extensive simulation and real-world experiments demonstrate that our approach successfully generalizes beyond the training domain and outperforms all baselines.

Authors:
Jiaxu Xing, Leonard Bauersfeld, Yunlong Song, Chunwei Xing, Davide Scaramuzza

Publication date:
18 September, 2023
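
For readers unfamiliar with contrastive learning over several positive pairs, the following is a minimal sketch of a generic multi-positive InfoNCE-style loss. It is a textbook-style formulation, not the adaptive multi-pair strategy of the paper; the function name `multi_positive_info_nce`, the temperature value and the toy vectors are assumptions made for illustration.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def multi_positive_info_nce(anchor, positives, negatives, temperature=0.1):
    """Average negative log-probability of each positive against all candidates."""
    pos_logits = [dot(anchor, p) / temperature for p in positives]
    neg_logits = [dot(anchor, n) / temperature for n in negatives]
    all_logits = pos_logits + neg_logits
    # Numerically stable log-sum-exp over every candidate.
    m = max(all_logits)
    log_denominator = m + math.log(sum(math.exp(l - m) for l in all_logits))
    return sum(log_denominator - l for l in pos_logits) / len(pos_logits)


anchor = [1.0, 0.0]
# Positives aligned with the anchor give a low loss.
good = multi_positive_info_nce(anchor, [[1.0, 0.0]], [[0.0, 1.0]])
# Swapping the roles of positive and negative gives a high loss.
bad = multi_positive_info_nce(anchor, [[0.0, 1.0]], [[1.0, 0.0]])
print(good < bad)
```

Minimizing such a loss pulls embeddings of the same scene content together and pushes unrelated ones apart, which is the general idea behind learning an embedding that a control policy can rely on in unseen environments.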

298 #

1 year, 10 months ago

Paper title:
[Contrastive Learning for Enhancing Robust Scene Transfer in Vision-based Agile Flight](https://arxiv.org/abs/2309.09865)

259 #

1 year, 10 months ago

Paper title:
[Unsupervised Open-Vocabulary Object Localization in Videos](https://arxiv.org/abs/2309.09858)

ChatGPT summarized:
The authors discuss several approaches to object detection and classification in the video game. DeepMind uses a combination of machine learning and natural language analysis to develop a method for detecting objects in live-streamed video. In this paper, they demonstrate that their approach is more accurate and reliable than previous approaches that rely on trained individuals to learn by rote. They also demonstrate how they classify objects using a mixture of standard image recognition and image-learning techniques. Their approach is complemented by several refinements that increase both the accuracy and the sensitivity of their model. For example, they use a clustering technique to classify objects based on their position within a scene rather than on their relative positions in time or space. They perform tasks such as matchmaking and convolutional neural nets in order to predict which objects will perform well in a given environment. They do not employ artificial intelligence to predict sentiment data since they believe that sentiment is usually malleable in terms of its own type. The authors

Abstract:
In this paper, we show that recent advances in video representation learning and pre-trained vision-language models allow for substantial improvements in self-supervised video object localization. We propose a method that first localizes objects in videos via a slot attention approach and then assigns text to the obtained slots. The latter is achieved by an unsupervised way to read localized semantic information from the pre-trained CLIP model. The resulting video object localization is entirely unsupervised apart from the implicit annotation contained in CLIP, and it is effectively the first unsupervised approach that yields good results on regular video benchmarks.

Authors:
Ke Fan, Zechen Bai, Tianjun Xiao, Dominik Zietlow, Max Horn, Zixu Zhao, Carl-Johann Simon-Gabriel, Mike Zheng Shou, Francesco Locatello, Bernt Schiele, Thomas Brox, Zheng Zhang, Yanwei Fu, Tong He

Publication date:
18 September, 2023

231 #

1 year, 10 months ago

ChatGPT summarized:
In this paper, the authors present a new large-scale grasp detection dataset developed from foundation models using Grasp-Anything. They demonstrate that their model can outperform previous methods by detecting "grasp" and "noise" in real-world scenarios. They use several different approaches to improve the quality of the data to achieve more robust generalization in grasp detection. In particular, they focus on improving the neural network with a model-centric approach rather than relying on machine learning. The authors believe that there are many different kinds of applications where it is important to have accurate grasping data. They discuss the role of the human eye in industrial applications such as manufacturing, logistics, and warehouse automation and report that deep learning has not been very effective in this field. Their paper also discusses the advantages of using natural language processing over artificial intelligence to predict future threats.

Abstract:
Foundation models such as ChatGPT have made significant strides in robotic tasks due to their universal representation of real-world domains. In this paper, we leverage foundation models to tackle grasp detection, a persistent challenge in robotics with broad industrial applications. Despite numerous grasp datasets, their object diversity remains limited compared to real-world figures. Fortunately, foundation models possess an extensive repository of real-world knowledge, including objects we encounter in our daily lives. As a consequence, a promising solution to the limited representation in previous grasp datasets is to harness the universal knowledge embedded in these foundation models. We present Grasp-Anything, a new large-scale grasp dataset synthesized from foundation models to implement this solution. Grasp-Anything excels in diversity and magnitude, boasting 1M samples with text descriptions and more than 3M objects, surpassing prior datasets. Empirically, we show that Grasp-Anything successfully facilitates zero-shot grasp detection on vision-based tasks and real-world robotic experiments. Our dataset and code are available at https://grasp-anything-2023.github.io.

Authors:
An Dinh Vuong, Minh Nhat Vu, Hieu Le, Baoru Huang, Binh Huynh, Thieu Vo, Andreas Kugi, Anh Nguyen

Publication date:
18 September, 2023

grasp-anything-2023.github.io

Grasp-Anything: Large-scale Grasp Dataset from Foundation Models

210 #

1 year, 10 months ago

Paper title:
[Learning Inertial Parameter Identification of Unknown Object with Humanoid Robot using Sim-to-Real Adaptation](https://arxiv.org/abs/2309.09810)

ChatGPT summarized:
This paper describes a new method for learning to identify the inertial parameters of unknown objects using machine learning. It uses a four-degrees-of-freedom robot named SATYRR as its basis. The authors acknowledge that their method has several limitations: it requires a lot of training data to be trained and validated, and they have to employ a number of different approaches to estimate the "inertial parameter" of an unknown object. However, they claim that their approach is more reliable and easier to use than previous methods. Their method can also be used to predict the behavior of real-world objects, such as moving walkers and wheeled humanoid robots.

Abstract:
Understanding the dynamics of unknown objects is crucial for collaborative robots, including humanoids, to interact with humans more safely and accurately. Most relevant literature leverages a force/torque sensor, prior knowledge of the object, a vision system, and a long-horizon trajectory, which are often impractical. Moreover, these methods often entail solving a non-linear optimization problem, sometimes yielding physically inconsistent results. In this work, we propose fast learning-based inertial parameter estimation as a more practical approach. We acquire a reliable dataset in a high-fidelity simulation and train a time-series data-driven regression model (e.g., LSTM) to estimate the inertial parameters of unknown objects. We also introduce a novel sim-to-real adaptation method combining Robot System Identification and Gaussian Processes to directly transfer the trained model to real-world application. We demonstrate our method with a 4-DOF single manipulator of the physical wheeled humanoid robot, SATYRR. Results show that our method can identify the inertial parameters of various unknown objects faster and more accurately than conventional methods.

Authors:
Donghoon Baek, Bo Peng, Saurabh Gupta, Joao Ramos

Publication date:
18 September, 2023

arXiv.org

Online Learning-Based Inertial Parameter Identification of Unknown...

Identifying the dynamic properties of manipulated objects is essential for safe and accurate robot control. Most methods rely on low noise force torque sensors, long exciting signals, and solving...

138 #

1 year, 10 months ago

ChatGPT summarized:
This chapter deals primarily with a re-evaluation of the capabilities of the analytical tools used to predict future performance. The goal is to find out what will be useful for predicting future performance in a variety of situations. Machine learning is based on solving problems that are real-world problems. For example, if a task is too difficult to solve, the solution may be to use an existing problem as an opportunity to learn something new. This paper describes a method for application-driven validation of posterior-based methods in inverse problems. It uses machine learning to predict and measure various aspects of an image. It caters to a wide range of applications from medicine to surgery to industrial espionage. This chapter discusses several different types of measurements including reaction times, error time, sensitivity to light, and reaction times using statistical methods. They admit that their work is experimental, but they have tried to accurately estimate the effect of each measure on the overall health of the entire system

Abstract:
Current deep learning-based solutions for image analysis tasks are commonly incapable of handling problems to which multiple different plausible solutions exist. In response, posterior-based methods such as conditional Diffusion Models and Invertible Neural Networks have emerged; however, their translation is hampered by a lack of research on adequate validation. In other words, the way progress is measured often does not reflect the needs of the driving practical application. Closing this gap in the literature, we present the first systematic framework for the application-driven validation of posterior-based methods in inverse problems. As a methodological novelty, it adopts key principles from the field of object detection validation, which has a long history of addressing the question of how to locate and match multiple object instances in an image. Treating modes as instances enables us to perform mode-centric validation, using well-interpretable metrics from the application perspective. We demonstrate the value of our framework through instantiations for a synthetic toy example and two medical vision use cases: pose estimation in surgery and imaging-based quantification of functional tissue parameters for diagnostics. Our framework offers key advantages over common approaches to posterior validation in all three examples and could thus revolutionize performance assessment in inverse problems.

Authors:
Tim J. Adler, Jan-Hinrich Nölke, Annika Reinke, Minu Dietlinde Tizabi, Sebastian Gruber, Dasha Trofimova, Lynton Ardizzone, Paul F. Jaeger, Florian Buettner, Ullrich Köthe, Lena Maier-Hein

Publication date:
18 September, 2023
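
The idea of "treating modes as instances" can be pictured with a small sketch that matches predicted posterior modes to reference modes, in the way detections are matched to ground-truth objects, and then reports precision and recall. This is only an assumed, simplified reading of the abstract and not the authors' framework: modes are plain 1-D values, matching is greedy, and the names `match_modes`, `precision_recall` and `max_dist` are hypothetical.

```python
def match_modes(predicted, reference, max_dist):
    """Greedy one-to-one matching of predicted to reference modes (1-D values)."""
    # Consider all candidate pairs, closest first.
    pairs = sorted(
        (abs(p - r), i, j)
        for i, p in enumerate(predicted)
        for j, r in enumerate(reference)
    )
    used_pred, used_ref, matches = set(), set(), []
    for dist, i, j in pairs:
        if dist <= max_dist and i not in used_pred and j not in used_ref:
            used_pred.add(i)
            used_ref.add(j)
            matches.append((i, j))
    return matches


def precision_recall(predicted, reference, max_dist):
    """Fraction of predicted modes that are matched, and of reference modes found."""
    true_positives = len(match_modes(predicted, reference, max_dist))
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Three predicted modes, two reference modes; the third prediction is spurious.
precision, recall = precision_recall([1.0, 5.2, 9.0], [1.1, 5.0], max_dist=0.5)
print(precision, recall)
```

Metrics of this kind are easy to interpret from the application side: a spurious mode lowers precision, and a missed plausible solution lowers recall.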

107 #
