Video Understanding

300 papers with code • 0 benchmarks • 42 datasets

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Most implemented papers

Video Swin Transformer

The vision community is witnessing a modeling shift from CNNs to Transformers, where pure Transformer architectures have attained top accuracy on the major video recognition benchmarks.

TSM: Temporal Shift Module for Efficient Video Understanding

The explosive growth in video streaming gives rise to challenges on performing video understanding at high accuracy and low computation cost.
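
The shift itself is simple enough to sketch: move a fraction of the channels one step forward and one step backward along the time axis, so each frame's features mix with its neighbours' at zero additional FLOPs. A minimal illustration of that operation (a sketch of the paper's core idea, not the authors' exact code):

```python
import torch

def temporal_shift(x: torch.Tensor, shift_div: int = 8) -> torch.Tensor:
    """Shift 1/shift_div of channels forward in time, 1/shift_div backward,
    and leave the rest in place. x has shape (N, T, C, H, W)."""
    n, t, c, h, w = x.shape
    fold = c // shift_div
    out = torch.zeros_like(x)
    out[:, 1:, :fold] = x[:, :-1, :fold]                   # shift forward in time
    out[:, :-1, fold:2 * fold] = x[:, 1:, fold:2 * fold]   # shift backward in time
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]              # untouched channels
    return out

clip = torch.randn(2, 8, 64, 56, 56)  # two 8-frame clips of 64-channel features
assert temporal_shift(clip).shape == clip.shape
```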

Is Space-Time Attention All You Need for Video Understanding?

We present a convolution-free approach to video classification built exclusively on self-attention over space and time.
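
The "divided" variant of this idea factorizes attention: each patch token first attends over time at its own spatial location, then over space within its own frame. A toy sketch of that factorization for pre-computed patch tokens (illustrative only, not the paper's implementation):

```python
import torch
import torch.nn as nn

class DividedSpaceTimeAttention(nn.Module):
    """Temporal attention per patch location, then spatial attention per frame."""
    def __init__(self, dim: int = 192, heads: int = 3):
        super().__init__()
        self.time_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.space_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, p, d = x.shape
        # Time attention: fold the patch axis into the batch axis.
        xt = x.permute(0, 2, 1, 3).reshape(b * p, t, d)
        xt = xt + self.time_attn(xt, xt, xt)[0]
        # Space attention: fold the frame axis into the batch axis.
        xs = xt.reshape(b, p, t, d).permute(0, 2, 1, 3).reshape(b * t, p, d)
        xs = xs + self.space_attn(xs, xs, xs)[0]
        return xs.reshape(b, t, p, d)

tokens = torch.randn(2, 8, 196, 192)  # 8 frames of 14x14 patch tokens
assert DividedSpaceTimeAttention()(tokens).shape == tokens.shape
```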

AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions

The AVA dataset densely annotates 80 atomic visual actions in 430 15-minute video clips, where actions are localized in space and time, resulting in 1.58M action labels with multiple labels per person occurring frequently.

SoccerNet 2022 Challenges Results

The SoccerNet 2022 challenges were the second annual video understanding challenges organized by the SoccerNet team.

ECO: Efficient Convolutional Network for Online Video Understanding

In this paper, we introduce a network architecture that takes long-term content into account and enables fast per-video processing at the same time.

Learnable pooling with Context Gating for video classification

In particular, we evaluate our method on the large-scale multi-modal Youtube-8M v2 dataset and outperform all other methods in the Youtube 8M Large-Scale Video Understanding challenge.

Representation Flow for Action Recognition

Our representation flow layer is a fully-differentiable layer designed to capture the `flow' of any representation channel within a convolutional neural network for action recognition.

Video Instance Segmentation

The goal of this new task is simultaneous detection, segmentation and tracking of instances in videos.

CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval

In this paper, we propose a CLIP4Clip model to transfer the knowledge of the CLIP model to video-language retrieval in an end-to-end manner.

An Overview of Traditional and Recent Trends in Video Processing

EURASIP Journal on Image and Video Processing

Semi-automated computer vision-based tracking of multiple industrial entities: a framework and dataset creation approach

This contribution presents the TOMIE framework (Tracking Of Multiple Industrial Entities), a framework for the continuous tracking of industrial entities (e.g., pallets, crates, barrels) over a network of, in ...

Fast CU size decision and intra-prediction mode decision method for H.266/VVC

H.266/Versatile Video Coding (VVC) is the most recent video coding standard developed by the Joint Video Experts Team (JVET). The quad-tree with nested multi-type tree (QTMT) architecture that improves the com...

Assessment framework for deepfake detection in real-world situations

Detecting digital face manipulation in images and video has attracted extensive attention due to the potential risk to public trust. To counteract the malicious usage of such techniques, deep learning-based de...

Edge-aware nonlinear diffusion-driven regularization model for despeckling synthetic aperture radar images

Speckle noise corrupts synthetic aperture radar (SAR) images and limits their applications in sensitive scientific and engineering fields. This challenge has attracted several scholars because of the wide dema...

Multimodal few-shot classification without attribute embedding

Multimodal few-shot learning aims to exploit complementary information inherent in multiple modalities for vision tasks in low data scenarios. Most of the current research focuses on a suitable embedding space...

Secure image transmission through LTE wireless communications systems

Secure transmission of images over wireless communications systems can be done using RSA, the most known and efficient cryptographic algorithm, and OFDMA, the most preferred signal processing choice in wireles...

An optimized capsule neural networks for tomato leaf disease classification

Plant diseases have a significant impact on leaves, with each disease exhibiting specific spots characterized by unique colors and locations. Therefore, it is crucial to develop a method for detecting these di...

Multi-layer features template update object tracking algorithm based on SiamFC++

SiamFC++ only extracts the object feature of the first frame as a tracking template, and only uses the highest level feature maps in both the classification branch and the regression branch, so that the respec...

Robust steganography in practical communication: a comparative study

To realize the act of covert communication in a public channel, steganography is proposed. In the current study, modern adaptive steganography plays a dominant role due to its high undetectability. However, th...

Multi-attention-based approach for deepfake face and expression swap detection and localization

Advancements in facial manipulation technology have resulted in highly realistic and indistinguishable face and expression swap videos. However, this has also raised concerns regarding the security risks assoc...

Semantic segmentation of textured mosaics

This paper investigates deep learning (DL)-based semantic segmentation of textured mosaics. Existing popular datasets for mosaic texture segmentation, designed prior to the DL era, have several limitations: (1...

Comparison of synthetic dataset generation methods for medical intervention rooms using medical clothing detection as an example

The availability of real data from areas with high privacy requirements, such as the medical intervention space is low and the acquisition complex in terms of data protection. To enable research for assistance...

Phase congruency based on derivatives of circular symmetric Gaussian function: an efficient feature map for image quality assessment

Image quality assessment (IQA) has become a hot issue in the area of image processing, which aims to evaluate image quality automatically by a metric being consistent with subjective evaluation. The first stag...

Correction: Printing and scanning investigation for image counter forensics

The original article was published in EURASIP Journal on Image and Video Processing 2022 2022:2

An early CU partition mode decision algorithm in VVC based on variogram for virtual reality 360 degree videos

360-degree videos have become increasingly popular with the application of virtual reality (VR) technology. To encode such kind of videos with ultra-high resolution, an efficient and real-time video encoder be...

Learning a crowd-powered perceptual distance metric for facial blendshapes

It is known that purely geometric distance metrics cannot reflect the human perception of facial expressions. A novel perceptually based distance metric designed for 3D facial blendshape models is proposed in ...

Studies in differentiating psoriasis from other dermatoses using small data set and transfer learning

Psoriasis is a common skin disorder that should be differentiated from other dermatoses if an effective treatment has to be applied. Regions of Interests, or scans for short, of diseased skin are processed by ...

Heterogeneous scene matching based on the gradient direction distribution field

Heterogeneous scene matching is a key technology in the field of computer vision. The image rotation problem is popular and difficult in the field of heterogeneous scene matching. In this paper, a heterogeneou...

FitDepth: fast and lite 16-bit depth image compression algorithm

This article presents a fast parallel lossless technique and a lossy image compression technique for 16-bit single-channel images. Nowadays, such techniques are “a must” in robotics and other areas where sever...

Vehicle logo detection using an IoAverage loss on dataset VLD100K-61

Vehicle Logo Detection (VLD) is of great significance to Intelligent Transportation Systems (ITS). Although many methods have been proposed for VLD, it remains a challenging problem. To improve the VLD accurac...

Correction: Research on application of multimedia image processing technology based on wavelet transform

The original article was published in EURASIP Journal on Image and Video Processing 2019 2019:24

Correction: Geolocation of covert communication entity on the Internet for post-steganalysis

The original article was published in EURASIP Journal on Image and Video Processing 2020 2020:15

Reversible designs for extreme memory cost reduction of CNN training

Training Convolutional Neural Networks (CNN) is a resource-intensive task that requires specialized hardware for efficient computation. One of the most limiting bottlenecks of CNN training is the memory cost a...

Data and image storage on synthetic DNA: existing solutions and challenges

Storage of digital data is becoming challenging for humanity due to the relatively short life-span of storage devices. Furthermore, the exponential increase in the generation of digital data is creating the ne...

Retraction Note: Research on path guidance of logistics transport vehicle based on image recognition and image processing in port area

A novel secured Euclidean space points algorithm for blind spatial image watermarking

Digital raw images obtained from the data set of various organizations require authentication, copyright protection, and security with simple processing. New Euclidean space point’s algorithm is proposed to au...

Retraction Note: Research on professional talent training technology based on multimedia remote image analysis

Retraction Note: Analysis of sports image detection technology based on machine learning

Retraction Note: Research on image correction method of network education assignment based on wavelet transform

Retraction Note: Performance analysis of ethylene-propylene diene monomer sound-absorbing materials based on image processing recognition

Retraction Note to: Translation analysis of English address image recognition based on image recognition

Retraction Note: Image processing algorithm of Hartmann method aberration automatic measurement system with tensor product model

Retraction Note to: Research on English translation distortion detection based on image evolution

Retraction Note: A method for spectral image registration based on feature maximum submatrix

Fine-grained precise-bone age assessment by integrating prior knowledge and recursive feature pyramid network

Bone age assessment (BAA) evaluates individual skeletal maturity by comparing the characteristics of skeletal development to the standard in a specific population. The X-ray image examination for bone age is t...

Palpation localization of radial artery based on 3-dimensional convolutional neural networks

Palpation localization is essential for detecting physiological parameters of the radial artery for pulse diagnosis of Traditional Chinese Medicine (TCM). Detecting signal or applying pressure at the wrong loc...

Weakly supervised spatial–temporal attention network driven by tracking and consistency loss for action detection

This study proposes a novel network model for video action tube detection. This model is based on a location-interactive weakly supervised spatial–temporal attention mechanism driven by multiple loss functions...

Performance analysis of different DCNN models in remote sensing image object detection

In recent years, deep learning, especially deep convolutional neural networks (DCNN), has made great progress. Many researchers use different DCNN models to detect remote sensing targets. Different DCNN models...

Multi-orientation local ternary pattern-based feature extraction for forensic dentistry

Accurate and automated identification of the deceased victims with dental radiographs plays a significant role in forensic dentistry. The image processing techniques such as segmentation and feature extraction...

Face image synthesis from facial parts

Recently, inspired by the growing power of deep convolutional neural networks (CNNs) and generative adversarial networks (GANs), facial image editing has received increasing attention and has produced a series...

An image-guided network for depth edge enhancement

With the rapid development of 3D coding and display technologies, numerous applications are emerging to target human immersive entertainments. To achieve a prime 3D visual experience, high accuracy depth maps ...

Automatic kidney segmentation using 2.5D ResUNet and 2.5D DenseUNet for malignant potential analysis in complex renal cyst based on CT images

Bosniak renal cyst classification has been widely used in determining the complexity of a renal cyst. However, it turns out that about half of patients undergoing surgery for Bosniak category III, take surgica...

Adaptive response maps fusion of correlation filters with anti-occlusion mechanism for visual object tracking

Despite the impressive performance of correlation filter-based trackers in terms of robustness and accuracy, the trackers have room for improvement. The majority of existing trackers use a single feature or fi...

Random CNN structure: tool to increase generalization ability in deep learning

The paper presents a novel approach for designing the CNN structure of improved generalization capability in the presence of a small population of learning data. Unlike the classical methods for building CNN, ...

Printing and scanning investigation for image counter forensics

Examining the authenticity of images has become increasingly important as manipulation tools become more accessible and advanced. Recent work has shown that while CNN-based image manipulation detectors can suc...

The Correction to this article has been published in EURASIP Journal on Image and Video Processing 2023 2023:10

Reduced reference image and video quality assessments: review of methods

With the growing demand for image and video-based applications, the requirements of consistent quality assessment metrics of image and video have increased. Different approaches have been proposed in the liter...

Perceptual hashing method for video content authentication with maximized robustness

Perceptual video hashing represents video perceptual content by compact hash. The binary hash is sensitive to content distortion manipulations, but robust to perceptual content preserving operations. Currently...

A study on implementation of real-time intelligent video surveillance system based on embedded module

Conventional surveillance systems for preventing accidents and incidents fail to identify 95% of them after 22 minutes when a single operator monitors multiple closed circuit televisions (CCTV). To address this issu...

HR-MPF: high-resolution representation network with multi-scale progressive fusion for pulmonary nodule segmentation and classification

Accurate segmentation and classification of pulmonary nodules are of great significance to early detection and diagnosis of lung diseases, which can reduce the risk of developing lung cancer and improve patien...

Fatigue driving detection based on electrooculography: a review

To accurately identify fatigued driving, establishing a monitoring system is one of the important guarantees of improving traffic safety and reducing traffic accidents. Among many research methods, electroocul...

Video Processing

Street-Based Parking Lot Detection With Image Processing And Deep Learning

  • Ahmet Sayar
  • Ahmet Fatih Mustacoglu

Deep hybrid architectures and DenseNet35 in speaker-dependent visual speech recognition

  • Preethi Jayappa Seegehalli
  • B. Niranjana Krupa

Sub-RENet: a wavelet-based network for super resolution of diagnostic ultrasound

  • Mayank Kumar Singh

CNN models for Maghrebian accent recognition with SVM silence elimination

  • Kamel Mebarkia
  • Aicha Reffad

Real-time masked face recognition using deep learning-based double generator network

  • P. Jayapriya

NOMA using reconfigurable intelligent surfaces (RIS) with power adaptation and solar energy

  • Raed Alhamad
  • Hatem Boujemâa

Conv-ViT fusion for improved handwritten Arabic character classification

  • Sarra Rouabhi
  • Abdennour Azerine
  • Lhassane Idoumghar

An effective masked transformer network for image denoising

  • Shaoping Xu
  • Minghai Xiong

Wall-Cor Net: wall color replacement via Clifford chance-based deep generative adversarial network

  • M. Sabitha Preethi
  • M. R. Geetha

A deep face spoof detection framework using multi-level ELBPs and stacked LSTMs

  • Chhavi Dhiman
  • Aashania Antil
  • Soham Gakhar

FS-YOLO: a multi-scale SAR ship detection network in complex scenes

  • Shouwen Cai

A novel approach to geometric algebra-based variable step-size LMS adaptive filtering algorithm

  • Khurram Shahzad
  • Junaid Jamshid

Multi-class differentiation feature representation guided joint dictionary learning for facial expression recognition

  • Jiatong Bai
  • Hehao Zhang

Recognize and classify illnesses on tomato leaves using EfficientNet's transfer learning approach with different size dataset

  • Pratik Buchke
  • A. V. R. Mayuri

Integrating a cosmetic detection scheme into face–iris multimodal biometric systems

  • Maryam Eskandari

Dynamic 8-bit XOR algorithm with AES crypto algorithm for image steganography

  • A. Samydurai

A deep neural network-based end-to-end 3D medical abdominal segmentation and reconstruction model

  • Yuhan Jiang

A lightweight road crack detection algorithm based on improved YOLOv7 model

  • Yanchao Wang
  • Zhonglong Zheng

Convolutional neural network-based fracture detection in spectrogram of acoustic emission

  • S. Deivalakshmi

A shrinkage adaptive filtering algorithm with graph filter models

  • Wenyuan Wang

Improving real-time small objects detection by fusion features of spatial coordinates

  • Qianjiang Yu
  • Tongyuan Huang

ERCU-Net: segmentation of road potholes using enhanced residual convolutional block based on U-Net for ADAS

  • Ruchi Tripathi
  • Rohit Kumar

Research on variational optical flow method for rockfall monitoring

Boundary-sensitive denoised temporal reasoning network for video action segmentation

Enhancing DC microgrid performance with fuzzy logic control for hybrid energy storage system

  • Vinay Kumar SadolaluBoregowda
  • Saurabh Kumar

Novel fractional scaled Wigner distribution using fractional instantaneous autocorrelation

  • Aamir H. Dar
  • Huda M. Alshanbari
  • Sundus N. Alaziz

Fusion of UNet and ResNet decisions for change detection using low and high spectral resolution images

  • Emna Brahim
  • Sonia Bouzidi

YOLO-based anomaly activity detection system for human behavior analysis and crime mitigation

  • K. Ganagavalli

Attention-driven residual-dense network for no-reference image quality assessment

  • Changzhong Wang
  • Yingnan Song

Emotion recognition with attention mechanism-guided dual-feature multi-path interaction network

  • Yanjiang Wang

Research on cuttings image segmentation method based on improved MultiRes-Unet++ with attention mechanism

  • Fengcai Huo
  • Kaiming Liu

A deep learning based multi-image compression technique

  • Dibyendu Barman
  • Abul Hasnat
  • Bandana Barman

A two-stage fusion remote sensing image dehazing network based on multi-scale feature and hybrid attention

  • Mengjun Miao
  • Heming Huang

Analysis of CNN models in classifying Alzheimer's stages: comparison and explainability examination of the proposed separable convolution-based neural network and transfer learning models

  • Naciye Nur Arslan
  • Durmus Ozdemir

Enhancing ASD classification through hybrid attention-based learning of facial features

  • Inzamam Shahzad
  • Saif Ur Rehman Khan

A positional-aware attention PCa detection network on multi-parametric MRI

  • Weiming Ren
  • Yongyi Chen

Source bias reduction for source-free domain adaptation

  • Zhenbin Wang

Lyapunov-based nonlinear model predictive control for the path following of bevel-tip flexible needles in 3D environment

Towards stronger illumination robustness of local feature detection and description based on auxiliary learning

  • Houqin Bian

Multi-head self-attention-based recurrent neural network with dwarf mongoose optimization algorithm-espoused QRS detector design

  • S. R. Malathi
  • P. Vijay Kumar

3D face recognition using image decomposition and POEM descriptor

  • Abdelghafour Abbad
  • Soukaina El Idrissi El Kaitouni
  • Hamid Tairi

PCB defect detection based on PSO-optimized threshold segmentation and SURF features

  • Yuanpei Chang
  • Jiancun Zuo

Deep palmprint recognition algorithm based on self-supervised learning and uncertainty loss

  • Xiaohong Han

Color image sparse adversarial samples generation via color channel Volterra expansion

  • Lingfeng Cheng
  • Heying Zhang
  • Jiashu Zhang

MSAP: multi-scale attention probabilistic network for underwater image enhancement

  • Baocai Chang
  • Jinjiang Li

Joint throughput maximization, interference cancellation, and power efficiency for multi-IRS-empowered UAV communications

  • Amin Mohajer

AI-based smart agriculture 4.0 system for plant diseases detection in Tunisia

  • Soulef Bouaafia
  • Abdellatif Mtibaa

Triplet attention-based deep learning model for hierarchical image classification of household items for robotic applications

  • Divya Arora Bhayana
  • Om Prakash Verma

Small target detection in drone aerial images based on feature fusion

  • Huajun Wang
  • Yufeng Chen

Image processing-based protection of privacy data in cloud using NTRU algorithm

  • K. Karthika
  • R. Devi Priya

research paper video processing

NeurIPS 2024

Conference Dates: (In person) 9 December - 15 December, 2024

Homepage: https://neurips.cc/Conferences/2024/

Call For Papers 

Author notification: Sep 25, 2024

Camera-ready, poster, and video submission: Oct 30, 2024 AOE

Submit at: https://openreview.net/group?id=NeurIPS.cc/2024/Conference  

The site will start accepting submissions on Apr 22, 2024 

Subscribe to these and other dates on the 2024 dates page.

The Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS 2024) is an interdisciplinary conference that brings together researchers in machine learning, neuroscience, statistics, optimization, computer vision, natural language processing, life sciences, natural sciences, social sciences, and other adjacent fields. We invite submissions presenting new and original research on topics including but not limited to the following:

  • Applications (e.g., vision, language, speech and audio, Creative AI)
  • Deep learning (e.g., architectures, generative models, optimization for deep networks, foundation models, LLMs)
  • Evaluation (e.g., methodology, meta studies, replicability and validity, human-in-the-loop)
  • General machine learning (supervised, unsupervised, online, active, etc.)
  • Infrastructure (e.g., libraries, improved implementation and scalability, distributed solutions)
  • Machine learning for sciences (e.g. climate, health, life sciences, physics, social sciences)
  • Neuroscience and cognitive science (e.g., neural coding, brain-computer interfaces)
  • Optimization (e.g., convex and non-convex, stochastic, robust)
  • Probabilistic methods (e.g., variational inference, causal inference, Gaussian processes)
  • Reinforcement learning (e.g., decision and control, planning, hierarchical RL, robotics)
  • Social and economic aspects of machine learning (e.g., fairness, interpretability, human-AI interaction, privacy, safety, strategic behavior)
  • Theory (e.g., control theory, learning theory, algorithmic game theory)

Machine learning is a rapidly evolving field, and so we welcome interdisciplinary submissions that do not fit neatly into existing categories.

Authors are asked to confirm that their submissions accord with the NeurIPS code of conduct.

Formatting instructions: All submissions must be in PDF format, and in a single PDF file include, in this order:

  • The submitted paper
  • Technical appendices that support the paper with additional proofs, derivations, or results 
  • The NeurIPS paper checklist  

Other supplementary materials, such as data and code, can be uploaded as a ZIP file.

The main text of a submitted paper is limited to nine content pages, including all figures and tables. Additional pages containing references don’t count as content pages. If your submission is accepted, you will be allowed an additional content page for the camera-ready version.

The main text and references may be followed by technical appendices, for which there is no page limit.

The maximum file size for a full submission, which includes technical appendices, is 50MB.

Authors are encouraged to submit a separate ZIP file that contains further supplementary material like data or source code, when applicable.

You must format your submission using the NeurIPS 2024 LaTeX style file, which includes a “preprint” option for non-anonymous preprints posted online. Submissions that violate the NeurIPS style (e.g., by decreasing margins or font sizes) or page limits may be rejected without further review. Papers may be rejected without consideration of their merits if they fail to meet the submission requirements, as described in this document.
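
For reference, a minimal preamble using the style file looks like the following; the default (no option) produces the anonymized submission format, while the final and preprint options are for the camera-ready and non-anonymous preprint versions respectively:

```latex
\documentclass{article}

% Submission mode (anonymized, with line numbers) is the default:
\usepackage{neurips_2024}
% Camera-ready version:   \usepackage[final]{neurips_2024}
% Non-anonymous preprint: \usepackage[preprint]{neurips_2024}

\title{An Example Submission}

\begin{document}
\maketitle
Body text goes here.
\end{document}
```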

Paper checklist: In order to improve the rigor and transparency of research submitted to and published at NeurIPS, authors are required to complete a paper checklist. The paper checklist is intended to help authors reflect on a wide variety of issues relating to responsible machine learning research, including reproducibility, transparency, research ethics, and societal impact. The checklist forms part of the paper submission, but does not count towards the page limit.

Supplementary material: While all technical appendices should be included as part of the main paper submission PDF, authors may submit up to 100MB of supplementary material, such as data or source code, in ZIP format. Supplementary material should be material created by the authors that directly supports the submission content. Like submissions, supplementary material must be anonymized. Looking at supplementary material is at the discretion of the reviewers.

We encourage authors to upload their code and data as part of their supplementary material in order to help reviewers assess the quality of the work. Check the policy as well as code submission guidelines and templates for further details.

Use of Large Language Models (LLMs): We welcome authors to use any tool that is suitable for preparing high-quality papers and research. However, we ask authors to keep in mind two important criteria. First, we expect papers to fully describe their methodology, and any tool that is important to that methodology, including the use of LLMs, should be described also. For example, authors should mention tools (including LLMs) that were used for data processing or filtering, visualization, facilitating or running experiments, and proving theorems. It may also be advisable to describe the use of LLMs in implementing the method (if this corresponds to an important, original, or non-standard component of the approach). Second, authors are responsible for the entire content of the paper, including all text and figures, so while authors are welcome to use any tool they wish for writing the paper, they must ensure that all text is correct and original.

Double-blind reviewing: All submissions must be anonymized and may not contain any identifying information that may violate the double-blind reviewing policy. This policy applies to any supplementary or linked material as well, including code. If you are including links to any external material, it is your responsibility to guarantee anonymous browsing. Please do not include acknowledgements at submission time. If you need to cite one of your own papers, you should do so with adequate anonymization to preserve double-blind reviewing. For instance, write “In the previous work of Smith et al. [1]…” rather than “In our previous work [1]…”. If you need to cite one of your own papers that is in submission to NeurIPS and not available as a non-anonymous preprint, then include a copy of the cited anonymized submission in the supplementary material and write “Anonymous et al. [1] concurrently show…”. Any papers found to be violating this policy will be rejected.

OpenReview: We are using OpenReview to manage submissions. The reviews and author responses will not be public initially (but may be made public later, see below). As in previous years, submissions under review will be visible only to their assigned program committee. We will not be soliciting comments from the general public during the reviewing process. Anyone who plans to submit a paper as an author or a co-author will need to create (or update) their OpenReview profile by the full paper submission deadline. Your OpenReview profile can be edited by logging in and clicking on your name in https://openreview.net/. This takes you to a URL "https://openreview.net/profile?id=~[Firstname]_[Lastname][n]" where the last part is your profile name, e.g., ~Wei_Zhang1. The OpenReview profiles must be up to date, with all publications by the authors, and their current affiliations. The easiest way to import publications is through DBLP, but it is not required; see the FAQ. Submissions without updated OpenReview profiles will be desk rejected. The information entered in the profile is critical for ensuring that conflicts of interest and reviewer matching are handled properly. Because of the rapid growth of NeurIPS, we request that all authors help with reviewing papers, if asked to do so. We need everyone’s help in maintaining the high scientific quality of NeurIPS.

Please be aware that OpenReview has a moderation policy for newly created profiles: New profiles created without an institutional email will go through a moderation process that can take up to two weeks. New profiles created with an institutional email will be activated automatically.

Venue home page: https://openreview.net/group?id=NeurIPS.cc/2024/Conference

If you have any questions, please refer to the FAQ: https://openreview.net/faq

Ethics review: Reviewers and ACs may flag submissions for ethics review. Flagged submissions will be sent to an ethics review committee for comments. Comments from ethics reviewers will be considered by the primary reviewers and AC as part of their deliberation. They will also be visible to authors, who will have an opportunity to respond. Ethics reviewers do not have the authority to reject papers, but in extreme cases papers may be rejected by the program chairs on ethical grounds, regardless of scientific quality or contribution.

Preprints: The existence of non-anonymous preprints (on arXiv or other online repositories, personal websites, social media) will not result in rejection. If you choose to use the NeurIPS style for the preprint version, you must use the “preprint” option rather than the “final” option. Reviewers will be instructed not to actively look for such preprints, but encountering them will not constitute a conflict of interest. Authors may submit anonymized work to NeurIPS that is already available as a preprint (e.g., on arXiv) without citing it. Note that public versions of the submission should not say "Under review at NeurIPS" or similar.

Dual submissions: Submissions that are substantially similar to papers that the authors have previously published or submitted in parallel to other peer-reviewed venues with proceedings or journals may not be submitted to NeurIPS. Papers previously presented at workshops are permitted, so long as they did not appear in a conference proceedings (e.g., CVPRW proceedings), a journal or a book.  NeurIPS coordinates with other conferences to identify dual submissions.  The NeurIPS policy on dual submissions applies for the entire duration of the reviewing process.  Slicing contributions too thinly is discouraged.  The reviewing process will treat any other submission by an overlapping set of authors as prior work. If publishing one would render the other too incremental, both may be rejected.

Anti-collusion: NeurIPS does not tolerate any collusion whereby authors secretly cooperate with reviewers, ACs or SACs to obtain favorable reviews. 

Author responses: Authors will have one week to view and respond to initial reviews. Author responses may not contain any identifying information that may violate the double-blind reviewing policy. Authors may not submit revisions of their paper or supplemental material, but may post their responses as a discussion in OpenReview. This is to reduce the burden on authors to have to revise their paper in a rush during the short rebuttal period.

After the initial response period, authors will be able to respond to any further reviewer/AC questions and comments by posting on the submission’s forum page. The program chairs reserve the right to solicit additional reviews after the initial author response period.  These reviews will become visible to the authors as they are added to OpenReview, and authors will have a chance to respond to them.

After the notification deadline, accepted and opted-in rejected papers will be made public and open for non-anonymous public commenting. Their anonymous reviews, meta-reviews, author responses and reviewer responses will also be made public. Authors of rejected papers will have two weeks after the notification deadline to opt in to make their deanonymized rejected papers public in OpenReview.  These papers are not counted as NeurIPS publications and will be shown as rejected in OpenReview.

Publication of accepted submissions: Reviews, meta-reviews, and any discussion with the authors will be made public for accepted papers (but reviewer, area chair, and senior area chair identities will remain anonymous). Camera-ready papers will be due in advance of the conference. All camera-ready papers must include a funding disclosure. We strongly encourage accompanying code and data to be submitted with accepted papers when appropriate, as per the code submission policy. Authors will be allowed to make minor changes for a short period of time after the conference.

Contemporaneous Work: For the purpose of the reviewing process, papers that appeared online within two months of a submission will generally be considered "contemporaneous" in the sense that the submission will not be rejected on the basis of the comparison to contemporaneous work. Authors are still expected to cite and discuss contemporaneous work and perform empirical comparisons to the degree feasible. Any paper that influenced the submission is considered prior work and must be cited and discussed as such. Submissions that are very similar to contemporaneous work will undergo additional scrutiny to prevent cases of plagiarism and missing credit to prior work.

Plagiarism is prohibited by the NeurIPS Code of Conduct.

Other Tracks: Similarly to earlier years, we will host multiple tracks, such as datasets, competitions, tutorials as well as workshops, in addition to the main track for which this call for papers is intended. See the conference homepage for updates and calls for participation in these tracks. 

Experiments: As in past years, the program chairs will be measuring the quality and effectiveness of the review process via randomized controlled experiments. All experiments are independently reviewed and approved by an Institutional Review Board (IRB).

Financial Aid: Each paper may designate up to one (1) NeurIPS.cc account email address of a corresponding student author who confirms that they would need the support to attend the conference, and agrees to volunteer if they get selected. To be considered for Financial Aid, the student will also need to fill out the Financial Aid application when it becomes available.

Computer Science > Computer Vision and Pattern Recognition

Title: Event-based Video Frame Interpolation with Edge Guided Motion Refinement

Abstract: Video frame interpolation, the process of synthesizing intermediate frames between sequential video frames, has made remarkable progress with the use of event cameras. These sensors, with microsecond-level temporal resolution, fill information gaps between frames by providing precise motion cues. However, contemporary Event-Based Video Frame Interpolation (E-VFI) techniques often neglect the fact that event data primarily supply high-confidence features at scene edges during multi-modal feature fusion, thereby diminishing the role of event signals in optical flow (OF) estimation and warping refinement. To address this overlooked aspect, we introduce an end-to-end E-VFI learning method (referred to as EGMR) to efficiently utilize edge features from event signals for motion flow and warping enhancement. Our method incorporates an Edge Guided Attentive (EGA) module, which rectifies estimated video motion through attentive aggregation based on the local correlation of multi-modal features in a coarse-to-fine strategy. Moreover, given that event data can provide accurate visual references at scene edges between consecutive frames, we introduce a learned visibility map derived from event data to adaptively mitigate the occlusion problem in the warping refinement process. Extensive experiments on both synthetic and real datasets show the effectiveness of the proposed approach, demonstrating its potential for higher quality video frame interpolation.
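
The occlusion handling described in the abstract reduces, at its simplest, to gating between a flow-warped frame and a synthesized fallback using the learned visibility map. A rough, generic sketch of such a blend (hypothetical tensor names; not the authors' architecture):

```python
import torch

def visibility_blend(warped: torch.Tensor,
                     synthesized: torch.Tensor,
                     visibility_logits: torch.Tensor) -> torch.Tensor:
    """Occlusion-aware fusion: trust the flow-warped frame where predicted
    visibility is high, fall back to the synthesized frame elsewhere.
    warped/synthesized: (N, C, H, W); visibility_logits: (N, 1, H, W)."""
    v = torch.sigmoid(visibility_logits)  # per-pixel visibility in [0, 1]
    return v * warped + (1.0 - v) * synthesized
```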

CASP Lab goes to SIGMOD’24 with a demo and a DaMoN paper

Members of the CASP Lab will present two recent works at the upcoming ACM SIGMOD’24 conference, in Santiago, Chile:

  • QueryShield: Cryptographically Secure Analytics in the Cloud was accepted at the SIGMOD’24 Demos track.
  • In situ neighborhood sampling for large-scale GNN training was accepted at the Data Management on New Hardware (DaMoN’24) workshop.

Introducing Meta Llama 3: The most capable openly available LLM to date

Takeaways

  • Today, we’re introducing Meta Llama 3, the next generation of our state-of-the-art open source large language model.
  • Llama 3 models will soon be available on AWS, Databricks, Google Cloud, Hugging Face, Kaggle, IBM WatsonX, Microsoft Azure, NVIDIA NIM, and Snowflake, and with support from hardware platforms offered by AMD, AWS, Dell, Intel, NVIDIA, and Qualcomm.
  • We’re dedicated to developing Llama 3 in a responsible way, and we’re offering various resources to help others use it responsibly as well. This includes introducing new trust and safety tools with Llama Guard 2, Code Shield, and CyberSec Eval 2.
  • In the coming months, we expect to introduce new capabilities, longer context windows, additional model sizes, and enhanced performance, and we’ll share the Llama 3 research paper.
  • Meta AI, built with Llama 3 technology, is now one of the world’s leading AI assistants that can boost your intelligence and lighten your load—helping you learn, get things done, create content, and connect to make the most out of every moment. You can try Meta AI here.

Today, we’re excited to share the first two models of the next generation of Llama, Meta Llama 3, available for broad use. This release features pretrained and instruction-fine-tuned language models with 8B and 70B parameters that can support a broad range of use cases. This next generation of Llama demonstrates state-of-the-art performance on a wide range of industry benchmarks and offers new capabilities, including improved reasoning. We believe these are the best open source models of their class, period. In support of our longstanding open approach, we’re putting Llama 3 in the hands of the community. We want to kickstart the next wave of innovation in AI across the stack—from applications to developer tools to evals to inference optimizations and more. We can’t wait to see what you build and look forward to your feedback.

Our goals for Llama 3

With Llama 3, we set out to build the best open models that are on par with the best proprietary models available today. We wanted to address developer feedback to increase the overall helpfulness of Llama 3 and are doing so while continuing to play a leading role on responsible use and deployment of LLMs. We are embracing the open source ethos of releasing early and often to enable the community to get access to these models while they are still in development. The text-based models we are releasing today are the first in the Llama 3 collection of models. Our goal in the near future is to make Llama 3 multilingual and multimodal, have longer context, and continue to improve overall performance across core LLM capabilities such as reasoning and coding.

State-of-the-art performance

Our new 8B and 70B parameter Llama 3 models are a major leap over Llama 2 and establish a new state-of-the-art for LLM models at those scales. Thanks to improvements in pretraining and post-training, our pretrained and instruction-fine-tuned models are the best models existing today at the 8B and 70B parameter scale. Improvements in our post-training procedures substantially reduced false refusal rates, improved alignment, and increased diversity in model responses. We also saw greatly improved capabilities like reasoning, code generation, and instruction following, making Llama 3 more steerable.

[Table: Llama 3 instruct model performance on standard benchmarks]

*Please see evaluation details for setting and parameters with which these evaluations are calculated.

In the development of Llama 3, we looked at model performance on standard benchmarks and also sought to optimize for performance for real-world scenarios. To this end, we developed a new high-quality human evaluation set. This evaluation set contains 1,800 prompts that cover 12 key use cases: asking for advice, brainstorming, classification, closed question answering, coding, creative writing, extraction, inhabiting a character/persona, open question answering, reasoning, rewriting, and summarization. To prevent accidental overfitting of our models on this evaluation set, even our own modeling teams do not have access to it. The chart below shows aggregated results of our human evaluations across these categories and prompts against Claude Sonnet, Mistral Medium, and GPT-3.5.

[Chart: aggregated human evaluation results against Claude Sonnet, Mistral Medium, and GPT-3.5]

Preference rankings by human annotators based on this evaluation set highlight the strong performance of our 70B instruction-following model compared to competing models of comparable size in real-world scenarios.
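
The underlying metric is a simple pairwise win rate over per-prompt human judgments; computing it is a one-liner (toy labels below, not Meta's data):

```python
from collections import Counter

def win_rate(judgments: list[str]) -> dict[str, float]:
    """judgments: one of 'win', 'tie', 'loss' per prompt comparison."""
    counts = Counter(judgments)
    total = sum(counts.values())
    return {k: counts[k] / total for k in ("win", "tie", "loss")}

print(win_rate(["win", "win", "tie", "loss", "win"]))
# {'win': 0.6, 'tie': 0.2, 'loss': 0.2}
```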

Our pretrained model also establishes a new state-of-the-art for LLM models at those scales.

[Table: Llama 3 pretrained model performance on standard benchmarks]

To develop a great language model, we believe it’s important to innovate, scale, and optimize for simplicity. We adopted this design philosophy throughout the Llama 3 project with a focus on four key ingredients: the model architecture, the pretraining data, scaling up pretraining, and instruction fine-tuning.

Model architecture

In line with our design philosophy, we opted for a relatively standard decoder-only transformer architecture in Llama 3. Compared to Llama 2, we made several key improvements. Llama 3 uses a tokenizer with a vocabulary of 128K tokens that encodes language much more efficiently, which leads to substantially improved model performance. To improve the inference efficiency of Llama 3 models, we’ve adopted grouped query attention (GQA) across both the 8B and 70B sizes. We trained the models on sequences of 8,192 tokens, using a mask to ensure self-attention does not cross document boundaries.
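
The last detail, keeping self-attention from crossing document boundaries inside a packed 8,192-token sequence, can be expressed as an ordinary attention mask. A generic sketch (not Meta's training code):

```python
import torch

def doc_boundary_mask(doc_ids: torch.Tensor) -> torch.Tensor:
    """Causal attention mask that also blocks attention across packed
    documents. doc_ids: (seq_len,) document index per token.
    Returns a (seq_len, seq_len) bool mask; True means 'may attend'."""
    seq_len = doc_ids.shape[0]
    causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    same_doc = doc_ids.unsqueeze(0) == doc_ids.unsqueeze(1)
    return causal & same_doc

# Two packed documents of lengths 3 and 2:
mask = doc_boundary_mask(torch.tensor([0, 0, 0, 1, 1]))
assert not mask[3, 2]  # a token in doc 1 cannot attend to doc 0
assert mask[4, 3]      # but can attend to earlier tokens of its own doc
```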

Training data

To train the best language model, the curation of a large, high-quality training dataset is paramount. In line with our design principles, we invested heavily in pretraining data. Llama 3 is pretrained on over 15T tokens that were all collected from publicly available sources. Our training dataset is seven times larger than that used for Llama 2, and it includes four times more code. To prepare for upcoming multilingual use cases, over 5% of the Llama 3 pretraining dataset consists of high-quality non-English data that covers over 30 languages. However, we do not expect the same level of performance in these languages as in English.

To ensure Llama 3 is trained on data of the highest quality, we developed a series of data-filtering pipelines. These pipelines include using heuristic filters, NSFW filters, semantic deduplication approaches, and text classifiers to predict data quality. We found that previous generations of Llama are surprisingly good at identifying high-quality data, hence we used Llama 2 to generate the training data for the text-quality classifiers that are powering Llama 3.
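
Schematically, such a pipeline chains cheap checks before expensive model-based ones; everything below is a placeholder sketch (hypothetical thresholds and stand-in filters, not Meta's actual pipeline):

```python
def passes_heuristics(doc: str) -> bool:
    # Stand-in for heuristic filters (length, symbol ratios, etc.).
    return len(doc.split()) > 50

def is_safe(doc: str) -> bool:
    return True  # stand-in for an NSFW filter

def quality_score(doc: str) -> float:
    return 1.0  # stand-in for a model-based text-quality classifier

def filter_corpus(docs, min_quality: float = 0.5):
    # Semantic deduplication would sit between these stages.
    for doc in docs:
        if passes_heuristics(doc) and is_safe(doc) and quality_score(doc) >= min_quality:
            yield doc
```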

We also performed extensive experiments to evaluate the best ways of mixing data from different sources in our final pretraining dataset. These experiments enabled us to select a data mix that ensures that Llama 3 performs well across use cases including trivia questions, STEM, coding, historical knowledge, etc.

Scaling up pretraining

To effectively leverage our pretraining data in Llama 3 models, we put substantial effort into scaling up pretraining. Specifically, we have developed a series of detailed scaling laws for downstream benchmark evaluations. These scaling laws enable us to select an optimal data mix and to make informed decisions on how to best use our training compute. Importantly, scaling laws allow us to predict the performance of our largest models on key tasks (for example, code generation as evaluated on the HumanEval benchmark—see above) before we actually train the models. This helps us ensure strong performance of our final models across a variety of use cases and capabilities.

We made several new observations on scaling behavior during the development of Llama 3. For example, while the Chinchilla-optimal amount of training compute for an 8B parameter model corresponds to ~200B tokens, we found that model performance continues to improve even after the model is trained on two orders of magnitude more data. Both our 8B and 70B parameter models continued to improve log-linearly after we trained them on up to 15T tokens. Larger models can match the performance of these smaller models with less training compute, but smaller models are generally preferred because they are much more efficient during inference.
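
"Improve log-linearly" means the held-out loss falls roughly linearly in the logarithm of the token count, so a fitted line in log space can extrapolate performance before a run finishes. A toy illustration (made-up numbers, not Meta's measurements):

```python
import numpy as np

tokens = np.array([2e11, 1e12, 5e12, 1.5e13])  # training tokens (toy values)
loss = np.array([2.10, 1.95, 1.82, 1.74])      # held-out loss (toy values)

slope, intercept = np.polyfit(np.log(tokens), loss, deg=1)
predict = lambda n: slope * np.log(n) + intercept
print(f"extrapolated loss at 3e13 tokens: {predict(3e13):.2f}")
```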

To train our largest Llama 3 models, we combined three types of parallelization: data parallelization, model parallelization, and pipeline parallelization. Our most efficient implementation achieves a compute utilization of over 400 TFLOPS per GPU when trained on 16K GPUs simultaneously. We performed training runs on two custom-built 24K GPU clusters. To maximize GPU uptime, we developed an advanced new training stack that automates error detection, handling, and maintenance. We also greatly improved our hardware reliability and detection mechanisms for silent data corruption, and we developed new scalable storage systems that reduce overheads of checkpointing and rollback. Those improvements resulted in an overall effective training time of more than 95%. Combined, these improvements increased the efficiency of Llama 3 training by ~three times compared to Llama 2.
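
Combining the three forms of parallelism amounts to factoring the GPU count into a 3-D grid, with every rank assigned a (data, pipeline, tensor) coordinate. A sketch of that bookkeeping (the 128 x 16 x 8 factorization is an invented example, not Meta's configuration):

```python
def grid_coords(rank: int, dp: int, pp: int, tp: int) -> tuple[int, int, int]:
    """Map a flat rank to (data, pipeline, tensor) coordinates
    in a world of dp * pp * tp GPUs."""
    assert 0 <= rank < dp * pp * tp
    tp_idx = rank % tp
    pp_idx = (rank // tp) % pp
    dp_idx = rank // (tp * pp)
    return dp_idx, pp_idx, tp_idx

assert 128 * 16 * 8 == 16_384  # one way to fill a 16K-GPU job
print(grid_coords(5000, dp=128, pp=16, tp=8))  # (39, 1, 0)
```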

Instruction fine-tuning

To fully unlock the potential of our pretrained models in chat use cases, we innovated on our approach to instruction-tuning as well. Our approach to post-training is a combination of supervised fine-tuning (SFT), rejection sampling, proximal policy optimization (PPO), and direct preference optimization (DPO). The quality of the prompts that are used in SFT and the preference rankings that are used in PPO and DPO has an outsized influence on the performance of aligned models. Some of our biggest improvements in model quality came from carefully curating this data and performing multiple rounds of quality assurance on annotations provided by human annotators.

Learning from preference rankings via PPO and DPO also greatly improved the performance of Llama 3 on reasoning and coding tasks. We found that if you ask a model a reasoning question that it struggles to answer, the model will sometimes produce the right reasoning trace: The model knows how to produce the right answer, but it does not know how to select it. Training on preference rankings enables the model to learn how to select it.
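
Of the objectives named above, DPO is the simplest to write down: it increases the log-probability margin of the preferred response over the rejected one, measured relative to a frozen reference model. A minimal sketch of the standard DPO loss (the textbook formulation, not necessarily Meta's exact recipe):

```python
import torch
import torch.nn.functional as F

def dpo_loss(logp_chosen: torch.Tensor, logp_rejected: torch.Tensor,
             ref_logp_chosen: torch.Tensor, ref_logp_rejected: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Inputs are summed log-probs of whole responses, shape (batch,)."""
    chosen_margin = logp_chosen - ref_logp_chosen
    rejected_margin = logp_rejected - ref_logp_rejected
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()
```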

Building with Llama 3

Our vision is to enable developers to customize Llama 3 to support relevant use cases and to make it easier to adopt best practices and improve the open ecosystem. With this release, we’re providing new trust and safety tools including updated components with both Llama Guard 2 and Cybersec Eval 2, and the introduction of Code Shield—an inference time guardrail for filtering insecure code produced by LLMs.

We’ve also co-developed Llama 3 with torchtune, the new PyTorch-native library for easily authoring, fine-tuning, and experimenting with LLMs. torchtune provides memory efficient and hackable training recipes written entirely in PyTorch. The library is integrated with popular platforms such as Hugging Face, Weights & Biases, and EleutherAI and even supports Executorch for enabling efficient inference to be run on a wide variety of mobile and edge devices. For everything from prompt engineering to using Llama 3 with LangChain, we have a comprehensive getting started guide that takes you from downloading Llama 3 all the way to deployment at scale within your generative AI application.

A system-level approach to responsibility

We have designed Llama 3 models to be maximally helpful while ensuring an industry leading approach to responsibly deploying them. To achieve this, we have adopted a new, system-level approach to the responsible development and deployment of Llama. We envision Llama models as part of a broader system that puts the developer in the driver’s seat. Llama models will serve as a foundational piece of a system that developers design with their unique end goals in mind.

[Diagram: Llama 3 as part of a broader system, with developer-controlled safeguards]

Instruction fine-tuning also plays a major role in ensuring the safety of our models. Our instruction-fine-tuned models have been red-teamed (tested) for safety through internal and external efforts. Our red teaming approach leverages human experts and automation methods to generate adversarial prompts that try to elicit problematic responses. For instance, we apply comprehensive testing to assess risks of misuse related to Chemical, Biological, Cyber Security, and other risk areas. All of these efforts are iterative and used to inform safety fine-tuning of the models being released. You can read more about our efforts in the model card.

Llama Guard models are meant to be a foundation for prompt and response safety and can easily be fine-tuned to create a new taxonomy depending on application needs. As a starting point, the new Llama Guard 2 uses the recently announced MLCommons taxonomy, in an effort to support the emergence of industry standards in this important area. Additionally, CyberSecEval 2 expands on its predecessor by adding measures of an LLM’s propensity to allow for abuse of its code interpreter, offensive cybersecurity capabilities, and susceptibility to prompt injection attacks (learn more in our technical paper). Finally, we’re introducing Code Shield, which adds support for inference-time filtering of insecure code produced by LLMs. This offers mitigation of risks around insecure code suggestions, code interpreter abuse prevention, and secure command execution.

With the speed at which the generative AI space is moving, we believe an open approach is an important way to bring the ecosystem together and mitigate these potential harms. As part of that, we’re updating our Responsible Use Guide (RUG) that provides a comprehensive guide to responsible development with LLMs. As we outlined in the RUG, we recommend that all inputs and outputs be checked and filtered in accordance with content guidelines appropriate to the application. Additionally, many cloud service providers offer content moderation APIs and other tools for responsible deployment, and we encourage developers to also consider using these options.

Deploying Llama 3 at scale

Llama 3 will soon be available on all major platforms, including cloud providers, model API providers, and much more. Llama 3 will be everywhere.

Our benchmarks show the tokenizer offers improved token efficiency, yielding up to 15% fewer tokens compared to Llama 2. Group Query Attention (GQA) has also been added to Llama 3 8B. As a result, we observed that despite the model having 1B more parameters compared to Llama 2 7B, the improved tokenizer efficiency and GQA keep inference efficiency on par with Llama 2 7B.
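For readers unfamiliar with GQA, the sketch below shows the core idea: query heads are split into groups that share a smaller set of key/value heads, which shrinks the KV cache at inference time. The head counts match Llama 3 8B's published configuration (32 query heads, 8 KV heads), but the code is an illustrative PyTorch sketch, not the model's implementation.

```python
# Grouped-query attention in miniature: 4 query heads share each KV head.
import torch
import torch.nn.functional as F

batch, seq, head_dim = 2, 16, 64
n_q_heads, n_kv_heads = 32, 8
group = n_q_heads // n_kv_heads  # query heads per KV head

q = torch.randn(batch, n_q_heads, seq, head_dim)
k = torch.randn(batch, n_kv_heads, seq, head_dim)  # only 8 KV heads cached
v = torch.randn(batch, n_kv_heads, seq, head_dim)

# Expand each KV head so its group of query heads can attend to it.
k = k.repeat_interleave(group, dim=1)  # -> (batch, 32, seq, head_dim)
v = v.repeat_interleave(group, dim=1)

out = F.scaled_dot_product_attention(q, k, v)  # standard attention from here
print(out.shape)  # torch.Size([2, 32, 16, 64])
```

Because only the 8 KV heads are stored in the cache, the memory footprint during decoding is a quarter of what full multi-head attention would require at the same query-head count.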

For examples of how to use all of these capabilities, check out Llama Recipes, which contains all of our open source code for everything from fine-tuning to deployment to model evaluation.

What’s next for Llama 3?

The Llama 3 8B and 70B models mark the beginning of what we plan to release for Llama 3. And there’s a lot more to come.

Our largest models are over 400B parameters and, while these models are still training, our team is excited about how they’re trending. Over the coming months, we’ll release multiple models with new capabilities including multimodality, the ability to converse in multiple languages, a much longer context window, and stronger overall capabilities. We will also publish a detailed research paper once we are done training Llama 3.

To give you a sneak preview for where these models are today as they continue training, we thought we could share some snapshots of how our largest LLM model is trending. Please note that this data is based on an early checkpoint of Llama 3 that is still training and these capabilities are not supported as part of the models released today.

We’re committed to the continued growth and development of an open AI ecosystem for releasing our models responsibly. We have long believed that openness leads to better, safer products, faster innovation, and a healthier overall market. This is good for Meta, and it is good for society. We’re taking a community-first approach with Llama 3, and starting today, these models are available on the leading cloud, hosting, and hardware platforms with many more to come.

Try Meta Llama 3 today

We’ve integrated our latest models into Meta AI, which we believe is the world’s leading AI assistant. It’s now built with Llama 3 technology and it’s available in more countries across our apps.

You can use Meta AI on Facebook, Instagram, WhatsApp, Messenger, and the web to get things done, learn, create, and connect with the things that matter to you. You can read more about the Meta AI experience here.

Visit the Llama 3 website to download the models and reference the Getting Started Guide for the latest list of all available platforms.
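If you prefer to experiment locally, a minimal sketch of running the instruct model through Hugging Face transformers looks roughly like this; the hub id is the gated release id, and the prompt and generation parameters are illustrative.

```python
# Hedged sketch: chat with Llama 3 8B Instruct via transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # gated; request access first
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Explain grouped-query attention in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```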

You’ll also soon be able to test multimodal Meta AI on our Ray-Ban Meta smart glasses.

As always, we look forward to seeing all the amazing products and experiences you will build with Meta Llama 3.


Consolidated Reporting Guidelines for Prognostic and Diagnostic Machine Learning Models (CREMLS)

Published on 2.5.2024 in Vol 26 (2024)

IMAGES

  1. (PDF) A Review on Image & Video Processing

  2. How To Write A Research Paper Step By Step Ppt? Update New

  3. INTRODUCTION TO RESEARCH PAPER

  4. (PDF) A Review on Image Processing

  5. Brain Tumor Detection Using Image Processing Ieee Papers

  6. Medical image processing research papers. Most Downloaded Medical

VIDEO

  1. The Art of VIDEO EDITING: Conveying Storytelling through Visuals

  2. How to write a research paper

  3. Research Paper Presentation

  4. Processing Information on Paper

  5. WORD PROCESSING OF RESEARCH PAPER by Olayres & Rabasio

  6. Lecture 06: Research Process-I

COMMENTS

  1. Video Processing Using Deep Learning Techniques: A Systematic

    Studies show lots of advanced research on various data types such as image, speech, and text using deep learning techniques, but nowadays, research on video processing is also an emerging field of computer vision. Several surveys are present on video processing using computer vision deep learning techniques, targeting specific functionality such as anomaly detection, crowd analysis, activity ...

  2. Video Processing Using Deep Learning Techniques: A Systematic

Review (SLR) on video processing using deep learning to investigate the applications, functionalities, techniques, datasets, issues, and challenges by formulating the relevant research questions ...

  3. Video Processing Using Deep Learning Techniques: A Systematic

    This paper aims to present a Systematic Literature Review (SLR) on video processing using deep learning to investigate the applications, functionalities, techniques, datasets, issues, and challenges by formulating the relevant research questions (RQs). This systematic mapping includes 93 research articles from reputed databases published ...

  4. Video Understanding

    Stay informed on the latest trending ML papers with code, research developments, libraries, methods, and datasets. ... In this paper, we introduce a network architecture that takes long-term content into account and enables fast per-video processing at the same time. 6.

  5. Home

    Overview. Signal, Image and Video Processing is an interdisciplinary journal focusing on theory and practice of signal, image and video processing. Sets forth practical solutions for current signal, image and video processing problems in engineering and science. Features reviews, tutorials, and accounts of practical developments.

  6. Applications of Video Processing and Computer Vision Sensor

    A Feature Paper should be a substantial original Article that involves several techniques or approaches, provides an outlook for future research directions and describes possible research applications. Feature papers are submitted upon individual invitation or recommendation by the scientific editors and must receive positive feedback from the ...

  7. An Overview of Traditional and Recent Trends in Video Processing

Video processing has been a significant field of research interest in recent years. Before going into recent advancements in video processing, an overview of traditional video processing is of interest. Knowing this, along with its advantages and limitations, helps to give a strong base and an insight into the further development of this research area. This paper introduces the ...

  8. A Systematic Review of Video Analytics Using Machine ...

    Video processing research trends illustrate a survey on practices like object detection, object recognition, object tracking, traffic control and monitoring, action and behaviour recognition, disaster management and so on. ... In this paper, we present state-of-the-art computational techniques available for the modules mentioned above. This ...

  9. Video coding and processing: A survey

So the combination of AI algorithms and the video coding procedure will be a hot research area in the future. 5. Conclusion. In this paper, a survey of video technologies has been presented. First, the architecture of the hybrid coding framework has been shown, which includes prediction, transform, quantization, scanning, and entropy coding modules.

  10. Video description: A comprehensive survey of deep learning ...

    Video description refers to understanding visual content and transforming that acquired understanding into automatic textual narration. It bridges the key AI fields of computer vision and natural language processing in conjunction with real-time and practical applications. Deep learning-based approaches employed for video description have demonstrated enhanced results compared to conventional ...

  11. 3314 PDFs

    Explore the latest full-text research PDFs, articles, conference papers, preprints and more on IMAGE AND VIDEO PROCESSING. Find methods information, sources, references or conduct a literature ...

  12. Articles

    Hailey Joren, Otkrist Gupta and Dan Raviv. EURASIP Journal on Image and Video Processing 2022 2022 :2. Research Published on: 7 February 2022. The Correction to this article has been published in EURASIP Journal on Image and Video Processing 2023 2023 :10. Full Text.

  13. Intelligent video surveillance: a review through deep learning

Big data applications consume much of the space in industry and research. Among the widespread examples of big data, the role of video streams from CCTV cameras is as important as other sources like social media data, sensor data, agriculture data, medical data, and data evolved from space research. Surveillance videos make a major contribution to unstructured big data. CCTV ...

  14. Machine Learning in Image and Video Processing

    Therefore, the aim of this Special Issue is to apply advanced machine learning approaches in image and video processing. The Issue will provide novel guidance for machine learning researchers and broaden the perspectives of machine learning and computer vision researchers. Original research and review articles are welcomed.

  15. Image and Video Processing

    Feature papers represent the most advanced research with significant potential for high impact in the field. A Feature Paper should be a substantial original Article that involves several techniques or approaches, provides an outlook for future research directions and describes possible research applications. ... Image and Video Processing for ...

  16. (PDF) VIDEO PROCESSING AND ITS APPLICATION

Introduction. Video processing is a specific instance of signal processing, which frequently utilizes video filters and where the input and output signals are video files or video streams ...

  17. Image and Video Processing authors/titles Jun 2023

    Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV) [10] arXiv:2306.00548 [ pdf ] Title: Label- and slide-free tissue histology using 3D epi-mode quantitative phase imaging and virtual H&E staining

  18. Video Processing Using Deep Learning Techniques: A Systematic

Year-wise distribution of the publications. The list of publications we considered to answer the RQs falls within the 2011-2020 time range, and a few papers beyond this range are used only for background study. Results: a total of 93 peer-reviewed research papers on video processing using deep learning techniques were studied.

  19. A review of image and video colorization: From analogies to deep

It is worth noting that since current research on semantic understanding in natural language processing is still at a preliminary stage, the input text of text-based colorization methods is more similar to a text control instruction, and more accurate text understanding and color matching still need further research.

  20. Video Processing Research Papers

Models and methods of image processing for the analysis of sign language ("Modèles et méthodes de traitement d'images pour l'analyse de la langue des signes"). This paper focuses on methods applied for sign language video processing. In the first part, we present a robust tracking method which detects and tracks the hands and face of a person performing sign language communication.

  21. Advances in image processing using machine learning techniques

    Encoding low-rank matched acquired images using High Efficiency Video Coding eliminates redundancies in the light field. ... mainly IEEE, conference papers. Her research interests include digital signal processing and digital communications. Dr. ... His research interests include image processing, adaptive filtering, digital filter design, and ...

  22. IET Image Processing: Vol 18, No 6

    IET Image Processing journal publishes the latest research in image and video processing, covering the generation, processing and communication of visual ... thus they perform poorly on image classification tasks. Aiming at the above problems, this paper proposes image classification based on cross modal knowledge learning of scene text (CKLST ...

  23. Articles

Xin Gao. Longgang Zhang. Jie Zheng. Original Paper, 13 April 2024. Signal, Image and Video Processing is an interdisciplinary journal focusing on theory and practice of signal, image and video processing.

  24. NeurIPS 2024 Call for Papers

    Camera-ready, poster, and video submission: Oct 30, 2024 AOE ... Use of Large Language Models (LLMs): We welcome authors to use any tool that is suitable for preparing high-quality papers and research. However, we ask authors to keep in mind two important criteria. ... For the purpose of the reviewing process, papers that appeared online within ...

  25. [2404.18156] Event-based Video Frame Interpolation with Edge Guided

    View a PDF of the paper titled Event-based Video Frame Interpolation with Edge Guided Motion Refinement, by Yuhan Liu and 5 other authors. View PDF HTML (experimental) Abstract: Video frame interpolation, the process of synthesizing intermediate frames between sequential video frames, has made remarkable progress with the use of event cameras ...

  26. CASP Lab goes to SIGMOD'24 with a demo and a DaMoN paper

CASP Lab goes to SIGMOD'24 with a demo and a DaMoN paper. Members of the CASP Lab will present two recent works at the upcoming ACM SIGMOD'24 conference in Santiago, Chile: QueryShield: Cryptographically Secure Analytics in the Cloud was accepted at the SIGMOD'24 Demos track; In situ neighborhood sampling for large-scale GNN training was accepted at the Data Management on New ...

  27. Electrical control of glass-like dynamics in vanadium dioxide for data

    Y.P. and J.S. acknowledge support for the PLD growth from the Basic Science Research Program (2020R1A4A1018935) through the National Research Foundation of Korea (NRF) funded by the Ministry of ...

  28. (PDF) A Review on Image & Video Processing

    [email protected]. Abstract. Image and Video Processing are hot topics in the field of research and development. Image processing is any form of signal processing for which the input is an image ...

  29. Introducing Meta Llama 3: The most capable openly available LLM to date

    Today, we're introducing Meta Llama 3, the next generation of our state-of-the-art open source large language model. Llama 3 models will soon be available on AWS, Databricks, Google Cloud, Hugging Face, Kaggle, IBM WatsonX, Microsoft Azure, NVIDIA NIM, and Snowflake, and with support from hardware platforms offered by AMD, AWS, Dell, Intel, NVIDIA, and Qualcomm.

  30. Journal of Medical Internet Research

    The number of papers presenting machine learning (ML) models that are being submitted to and published in the Journal of Medical Internet Research and other JMIR Publications journals has steadily increased. Editors and peer reviewers involved in the review process for such manuscripts often go through multiple review cycles to enhance the quality and completeness of reporting.