
Neural Style Transfer: A Review

The seminal work of Gatys et al. demonstrated the power of Convolutional Neural Networks (CNNs) in creating artistic imagery by separating and recombining image content and style. This process of using CNNs to render a content image in different styles is referred to as Neural Style Transfer (NST). Since then, NST has become a trending topic in both academic literature and industrial applications. It is receiving increasing attention, and a variety of approaches have been proposed to either improve or extend the original NST algorithm. In this paper, we aim to provide a comprehensive overview of the current progress in NST. We first propose a taxonomy of current algorithms in the field of NST. Then, we present several evaluation methods and compare different NST algorithms both qualitatively and quantitatively. The review concludes with a discussion of various applications of NST and open problems for future research. A list of papers discussed in this review, corresponding code, pre-trained models and more comparison results are publicly available at: https://github.com/ycjing/Neural-Style-Transfer-Papers .

1 Introduction

Painting is a popular form of art. For thousands of years, people have been attracted to the art of painting, which has given rise to many appealing artworks, e.g., van Gogh’s “The Starry Night”. In the past, re-drawing an image in a particular style required a well-trained artist and a lot of time.

Since the mid-1990s, the art theories behind the appealing artworks have been attracting the attention of not only the artists but many computer science researchers. There are plenty of studies and techniques exploring how to automatically turn images into synthetic artworks. Among these studies, the advances in non-photorealistic rendering (NPR) [ 1 , 2 , 3 ] are inspiring, and nowadays, it is a firmly established field in the community of computer graphics. However, most of these NPR stylisation algorithms are designed for particular artistic styles [ 3 , 4 ] and cannot be easily extended to other styles. In the community of computer vision, style transfer is usually studied as a generalised problem of texture synthesis, which is to extract and transfer the texture from the source to target [ 5 , 6 , 7 , 8 ] . Hertzmann et al. [ 9 ] further propose a framework named image analogies to perform a generalised style transfer by learning the analogous transformation from the provided example pairs of unstylised and stylised images. However, the common limitation of these methods is that they only use low-level image features and often fail to capture image structures effectively.

[Figure 1: Stylisation example transferring the style of the Chinese painting “Dwelling in the Fuchun Mountains” onto a photograph of the Great Wall.]

Recently, inspired by the power of Convolutional Neural Networks (CNNs), Gatys et al. [ 10 ] first studied how to use a CNN to reproduce famous painting styles on natural images. They proposed to model the content of a photo as the feature responses from a pre-trained CNN, and to model the style of an artwork as its summary feature statistics. Their experimental results demonstrated that a CNN is capable of extracting content information from an arbitrary photograph and style information from a well-known artwork. Based on this finding, Gatys et al. [ 10 ] proposed to exploit CNN feature activations to recombine the content of a given photo with the style of famous artworks. The key idea behind their algorithm is to iteratively optimise an image with the objective of matching desired CNN feature distributions, which involve both the photo’s content information and the artwork’s style information. Their proposed algorithm successfully produces stylised images with the appearance of a given artwork. Figure 1 shows an example of transferring the style of a Chinese painting “Dwelling in the Fuchun Mountains” onto a photo of the Great Wall. Since the algorithm of Gatys et al. places no explicit restrictions on the type of style image and does not need ground truth results for training, it breaks the constraints of previous approaches. The work of Gatys et al. opened up a new field called Neural Style Transfer (NST), which is the process of using Convolutional Neural Networks to render a content image in different styles.

The seminal work of Gatys et al. has attracted wide attention from both academia and industry. In academia, many follow-up studies have been conducted to either improve or extend this NST algorithm. Research on NST has also led to many successful industrial applications (e.g., Prisma [ 11 ] , Ostagram [ 12 ] , Deep Forger [ 13 ] ). However, there is no comprehensive survey summarising and discussing recent advances as well as challenges within this new field of Neural Style Transfer.

In this paper, we aim to provide an overview of current advances (up to March 2018) in Neural Style Transfer (NST). Our contributions are threefold. First, we investigate, classify and summarise recent advances in the field of NST. Second, we present several evaluation methods and experimentally compare different NST algorithms. Third, we summarise current challenges in this field and propose possible directions on how to deal with them in future works.

The organisation of this paper is as follows. We start our discussion with a brief review of previous artistic rendering methods without CNNs in Section  2 . Then Section  3 explores the derivations and foundations of NST. Based on the discussions in Section  3 , we categorise and explain existing NST algorithms in Section  4 . Some improvement strategies for these methods and their extensions will be given in Section  5 . Section  6 presents several methodologies for evaluating NST algorithms and aims to build a standardised benchmark for follow-up studies. Then we demonstrate the commercial applications of NST in Section  7 , including both current successful usages and its potential applications. In Section  8 , we summarise current challenges in the field of NST, as well as propose possible directions on how to deal with them in future works. Finally, Section  9 concludes the paper and delineates several promising directions for future research.

[Figure 2: Taxonomy of NST techniques, extending the IB-AR taxonomy of Kyprianidis et al. [14].]

2 Style Transfer Without Neural Networks

Artistic stylisation is a long-standing research topic. Due to its wide variety of applications, it has been an important research area for more than two decades. Before the appearance of NST, research in this direction had grown into a field called non-photorealistic rendering (NPR). In this section, we briefly review some of these artistic rendering (AR) algorithms without CNNs. Specifically, we focus on artistic stylisation of 2D images, which is called image-based artistic rendering (IB-AR) in [ 14 ] . For a more comprehensive overview of IB-AR techniques, we recommend [ 3 , 14 , 15 ] . Following the IB-AR taxonomy defined by Kyprianidis et al. [ 14 ] , we first introduce each category of IB-AR techniques without CNNs and then discuss their strengths and weaknesses.

Stroke-Based Rendering. Stroke-based rendering (SBR) refers to the process of placing virtual strokes (e.g., brush strokes, tiles, stipples) upon a digital canvas to render a photograph with a particular style [ 16 ] . An SBR algorithm generally starts from a source photo, incrementally composites strokes to match the photo, and finally produces non-photorealistic imagery that looks like the photo but has an artistic style. During this process, an objective function is designed to guide the greedy or iterative placement of strokes.

The goal of SBR algorithms is to faithfully depict a prescribed style. Therefore, they are generally effective at simulating certain types of styles (e.g., oil paintings, watercolours, sketches). However, each SBR algorithm is carefully designed for one particular style and is not capable of simulating an arbitrary style, which limits its flexibility.

Region-Based Techniques. Region-based rendering incorporates region segmentation to adapt the rendering to the content of each region. Early region-based IB-AR algorithms exploit the shape of regions to guide stroke placement [ 17 , 18 ] . In this way, different stroke patterns can be produced in different semantic regions of an image. Song et al. [ 19 ] further propose a region-based IB-AR algorithm that manipulates geometry for artistic styles. Their algorithm creates simplified shape rendering effects by replacing regions with several canonical shapes.

Considering regions in rendering allows local control over the level of detail. However, the problem in SBR persists: a region-based rendering algorithm is not capable of simulating an arbitrary style.

Example-Based Rendering. The goal of example-based rendering is to learn the mapping between an exemplar pair. This category of IB-AR techniques is pioneered by Hertzmann et al., who propose a framework named image analogies [ 9 ] . Image analogies aim to learn a mapping between a pair of source images and target stylised images in a supervised manner. The training set comprises pairs of unstylised source images and the corresponding stylised images in a particular style. The image analogy algorithm then learns the analogous transformation from the example training pairs and creates analogous stylised results when given a test input photograph. Image analogies can also be extended in various ways, e.g., to learn stroke placements for portrait painting rendering [ 20 ] .

In general, image analogies are effective for a variety of artistic styles. However, pairs of training data are usually unavailable in practice. Another limitation is that image analogies only exploit low-level image features. Therefore, image analogies typically fail to effectively capture content and style, which limits the performance.

Image Processing and Filtering. Creating an artistic image is a process that aims for image simplification and abstraction. Therefore, it is natural to consider adopting and combining some related image processing filters to render a given photo. For example, in [ 21 ] , Winnemöller et al. for the first time exploit bilateral [ 22 ] and difference of Gaussians filters [ 23 ] to automatically produce cartoon-like effects.

Compared with other categories of IB-AR techniques, image-filtering-based rendering algorithms are generally straightforward to implement and efficient in practice. The trade-off is that they are very limited in style diversity.

Summary. Based on the above discussion, although some IB-AR algorithms without CNNs are capable of faithfully depicting certain prescribed styles, they typically have limitations in flexibility, style diversity, and effective image structure extraction. Therefore, there is a demand for novel algorithms to address these limitations, which gives birth to the field of NST.

3 Derivations of Neural Style Transfer

For a better understanding of the development of NST, we start by introducing its derivations. To automatically transfer an artistic style, the first and most important issue is how to model and extract style from an image. Since style is closely related to texture (though not limited to it: style also involves a large degree of simplification and shape abstraction, which falls back on the composition or alignment of texture features), a straightforward way is to relate Visual Style Modelling back to previously well-studied Visual Texture Modelling methods. After obtaining the style representation, the next issue is how to reconstruct an image with the desired style information while preserving its content, which is addressed by Image Reconstruction techniques.

3.1 Visual Texture Modelling

Visual texture modelling [ 24 ] has long been studied as the heart of texture synthesis [ 25 , 26 ] . Throughout its history, there have been two distinct approaches to modelling visual textures: Parametric Texture Modelling with Summary Statistics and Non-parametric Texture Modelling with Markov Random Fields (MRFs).

1) Parametric Texture Modelling with Summary Statistics. One path towards texture modelling is to capture image statistics from a sample texture and exploit summary statistical properties to model the texture. The idea is first proposed by Julesz [ 27 ] , who models textures as pixel-based $N$-th order statistics. Later, the work in [ 28 ] exploits filter responses to analyse textures, instead of direct pixel-based measurements. After that, Portilla and Simoncelli [ 29 ] further introduce a texture model based on multi-scale oriented filter responses and use gradient descent to improve synthesised results. A more recent parametric texture modelling approach proposed by Gatys et al. [ 30 ] is the first to measure summary statistics in the domain of a CNN. They design a Gram-based representation to model textures, which is the set of correlations between filter responses in different layers of a pre-trained classification network (the VGG network) [ 31 ] . More specifically, the Gram-based representation encodes the second-order statistics of the set of CNN filter responses. Next, we explain this representation in detail, as it is used throughout the following sections.

Assume that the feature map of a sample texture image $I_s$ at layer $l$ of a pre-trained deep classification network is $\mathcal{F}^{l}(I_s) \in \mathbb{R}^{C \times H \times W}$, where $C$ is the number of channels, and $H$ and $W$ represent the height and width of the feature map $\mathcal{F}^{l}(I_s)$. Then the Gram-based representation can be obtained by computing the Gram matrix $\mathcal{G}(\mathcal{F}^{l}(I_s)') \in \mathbb{R}^{C \times C}$ over the feature map $\mathcal{F}^{l}(I_s)' \in \mathbb{R}^{C \times (HW)}$ (a reshaped version of $\mathcal{F}^{l}(I_s)$):

$$\mathcal{G}\big(\mathcal{F}^{l}(I_s)'\big) = \big[\mathcal{F}^{l}(I_s)'\big]\big[\mathcal{F}^{l}(I_s)'\big]^{\top} \qquad (1)$$

A later extension of the Gram-based representation [ 32 ] additionally computes correlations between a feature activation at position $(i,j)$ and the activation at a spatially shifted position $(i, j+\delta)$. In this way, the representation incorporates spatial arrangement information and is therefore more effective at modelling textures with symmetric properties.
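
As a concrete illustration of Equation (1), the following is a minimal PyTorch-style sketch of the Gram-based representation; the function name and the normalisation by $CHW$ are our own illustrative choices, not taken from any referenced implementation:

```python
import torch

def gram_matrix(feature_map: torch.Tensor) -> torch.Tensor:
    """Compute the Gram-based style representation of one CNN feature map.

    feature_map: tensor of shape (C, H, W), e.g. one image's VGG activation.
    Returns a (C, C) matrix of channel-wise correlations.
    """
    C, H, W = feature_map.shape
    F = feature_map.reshape(C, H * W)   # reshape to C x (HW)
    G = F @ F.t()                       # Gram matrix, C x C
    return G / (C * H * W)              # optional normalisation for scale invariance
```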

2) Non-parametric Texture Modelling with MRFs. Another notable texture modelling methodology is to use non-parametric resampling. A variety of non-parametric methods are based on the MRF model, which assumes that in a texture image, each pixel is entirely characterised by its spatial neighbourhood. Under this assumption, Efros and Leung [ 25 ] propose to synthesise each pixel one by one by searching for similar neighbourhoods in the source texture image and assigning the corresponding pixel. Their work is one of the earliest non-parametric algorithms with MRFs. Following their work, Wei and Levoy [ 26 ] further speed up the neighbourhood matching process by always using a fixed neighbourhood.

3.2 Image Reconstruction

In general, an essential step for many vision tasks is to extract an abstract representation from the input image. Image reconstruction is the reverse process, which is to reconstruct the whole input image from the extracted image representation. It has previously been studied as a way to analyse a particular image representation and discover what information is contained in the abstract representation. Here our major focus is on CNN-representation-based image reconstruction algorithms, which can be categorised into Image-Optimisation-Based Online Image Reconstruction (IOB-IR) and Model-Optimisation-Based Offline Image Reconstruction (MOB-IR).

1) Image-Optimisation-Based Online Image Reconstruction. The first algorithm to reverse CNN representations is proposed by Mahendran and Vedaldi [ 33 , 34 ] . Given a CNN representation to be reversed, their algorithm iteratively optimises an image (generally starting from random noise) until it has a similar desired CNN representation. The iterative optimisation process is based on gradient descent in image space. Therefore, the process is time-consuming especially when the desired reconstructed image is large.

2) Model-Optimisation-Based Offline Image Reconstruction. To address the efficiency issue of [ 33 , 34 ] , Dosovitskiy and Brox [ 35 ] propose to train a feed-forward network in advance and shift the computational burden to the training stage. At the testing stage, the reverse process can be done simply with a forward pass of the network. Their algorithm significantly speeds up the image reconstruction process. In their later work [ 36 ] , they further combine it with a Generative Adversarial Network (GAN) [ 37 ] to improve the results.

4 A Taxonomy of Neural Style Transfer Algorithms

NST is a subset of the aforementioned example-based IB-AR techniques. In this section, we first provide a categorisation of NST algorithms and then explain the major 2D-image-based non-photorealistic NST algorithms (Figure 2, purple boxes) in detail. More specifically, for each algorithm, we start by introducing the main idea and then discuss its strengths and weaknesses. Since it is complex to define the notion of style [ 3 , 38 ] and therefore very subjective to define what criteria make a successful style transfer algorithm [ 39 ] , here we try to evaluate these algorithms in a more structured way by focusing only on details, semantics, depth and variations in brush strokes (the visual criteria for a successful style transfer are certainly not limited to these factors). We will discuss the problem of aesthetic evaluation criteria further in Section 8 and also present more evaluation results in Section 6.

Our proposed taxonomy of NST techniques is shown in Figure 2. We keep the taxonomy of IB-AR techniques proposed by Kyprianidis et al. [ 14 ] unchanged and extend it with NST algorithms. Current NST methods fit into one of two categories: Image-Optimisation-Based Online Neural Methods (IOB-NST) and Model-Optimisation-Based Offline Neural Methods (MOB-NST). The first category transfers the style by iteratively optimising an image, i.e., algorithms belonging to this category are built upon IOB-IR techniques. The second category optimises a generative model offline and produces the stylised image with a single forward pass, which exploits the idea of MOB-IR techniques.

4.1 Image-Optimisation-Based Online Neural Methods

DeepDream [ 40 ] is the first attempt to produce artistic images by reversing CNN representations with IOB-IR techniques. By further combining Visual Texture Modelling techniques to model style, IOB-NST algorithms are subsequently proposed, which build the early foundations for the field of NST. Their basic idea is to first model and extract style and content information from the corresponding style and content images, recombine them as the target representation, and then iteratively reconstruct a stylised result that matches the target representation. In general, different IOB-NST algorithms share the same IOB-IR technique, but differ in the way they model the visual style, which is built on the aforementioned two categories of Visual Texture Modelling techniques. The common limitation of IOB-NST algorithms is that they are computationally expensive, due to the iterative image optimisation procedure.

4.1.1 Parametric Neural Methods with Summary Statistics

The first subset of IOB-NST methods is based on Parametric Texture Modelling with Summary Statistics . The style is characterised as a set of spatial summary statistics.

We start by introducing the first NST algorithm, proposed by Gatys et al. [ 10 , 4 ] . By reconstructing representations from intermediate layers of the VGG-19 network, Gatys et al. observe that a deep convolutional neural network is capable of extracting image content from an arbitrary photograph and appearance information from a well-known artwork. According to this observation, they build the content component of the newly stylised image by penalising the difference of high-level representations derived from the content and stylised images, and further build the style component by matching Gram-based summary statistics of the style and stylised images, which are derived from their proposed texture modelling technique [ 30 ] (Section 3.1). The details of their algorithm are as follows.

Given a content image $I_c$ and a style image $I_s$, the algorithm in [ 4 ] seeks a stylised image $I$ that minimises the following objective:

$$I^{*} = \arg\min_{I} \mathcal{L}_{total}(I_c, I_s, I) = \arg\min_{I} \; \alpha\,\mathcal{L}_{c}(I_c, I) + \beta\,\mathcal{L}_{s}(I_s, I) \qquad (2)$$

where $\mathcal{L}_c$ compares the content representation of a given content image to that of the stylised image, and $\mathcal{L}_s$ compares the Gram-based style representation derived from a style image to that of the stylised image. $\alpha$ and $\beta$ are used to balance the content component and the style component in the stylised result.

The content loss $\mathcal{L}_c$ is defined by the squared Euclidean distance between the feature representations $\mathcal{F}^{l}$ of the content image $I_c$ in layer $l$ and that of the stylised image $I$, which is initialised with a noise image:

$$\mathcal{L}_{c} = \sum_{l \in \{l_c\}} \big\| \mathcal{F}^{l}(I_c) - \mathcal{F}^{l}(I) \big\|^{2} \qquad (3)$$

where $\{l_c\}$ denotes the set of VGG layers used for computing the content loss. For the style loss $\mathcal{L}_s$, [ 4 ] exploits the Gram-based visual texture modelling technique to model the style, which has already been explained in Section 3.1. Therefore, the style loss is defined by the squared Euclidean distance between the Gram-based style representations of $I_s$ and $I$:

$$\mathcal{L}_{s} = \sum_{l \in \{l_s\}} \big\| \mathcal{G}\big(\mathcal{F}^{l}(I_s)'\big) - \mathcal{G}\big(\mathcal{F}^{l}(I)'\big) \big\|^{2} \qquad (4)$$

where $\mathcal{G}$ is the aforementioned Gram matrix, which encodes the second-order statistics of the set of filter responses, and $\{l_s\}$ represents the set of VGG layers used for calculating the style loss.

The choice of content and style layers is an important factor in the process of style transfer. Different positions and numbers of layers can result in very different visual experiences. Given the pre-trained VGG-19 [ 31 ] as the loss network, Gatys et al.'s choice of $\{l_s\}$ and $\{l_c\}$ in [ 4 ] is $\{l_s\} = \{relu1\_1, relu2\_1, relu3\_1, relu4\_1, relu5\_1\}$ and $\{l_c\} = \{relu4\_2\}$. For $\{l_s\}$, the idea of combining multiple layers (up to higher layers) is critical for the success of Gatys et al.'s NST algorithm. Matching the multi-scale style representations leads to a smoother and more continuous stylisation, which gives the visually most appealing results [ 4 ] . For the content layer $\{l_c\}$, matching the content representations on a lower layer preserves the undesired fine structures (e.g., edges and colour map) of the original content image during stylisation. In contrast, by matching the content on a higher layer of the network, the fine structures can be altered to agree with the desired style while preserving the content information of the content image. Also, using VGG-based loss networks for style transfer is not the only option. Similar performance can be achieved by selecting other pre-trained classification networks, e.g., ResNet [ 41 ] .

In Equation (2), both $\mathcal{L}_c$ and $\mathcal{L}_s$ are differentiable. Thus, with random noise as the initial $I$, Equation (2) can be minimised by using gradient descent in image space with backpropagation. In addition, a total variation denoising term is usually added in practice to encourage smoothness in the stylised result.
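
To make this iterative procedure concrete, below is a minimal sketch of image-optimisation-based NST in PyTorch. It assumes a hypothetical helper `vgg_features(img, layers)` that returns a dictionary of activations for the requested layers, and reuses the `gram_matrix` helper sketched in Section 3.1; the layer weights, learning rate and iteration count are illustrative and not the exact settings of [ 4 ]:

```python
import torch
import torch.nn.functional as F

def stylise(content_img, style_img, vgg_features, content_layers, style_layers,
            alpha=1.0, beta=1e3, tv_weight=1e-5, steps=500):
    """Iteratively optimise an image to match content and Gram-based style targets."""
    with torch.no_grad():
        content_targets = vgg_features(content_img, content_layers)
        style_targets = {l: gram_matrix(f.squeeze(0))
                         for l, f in vgg_features(style_img, style_layers).items()}

    stylised = torch.randn_like(content_img, requires_grad=True)  # start from noise
    optimiser = torch.optim.Adam([stylised], lr=0.05)

    for _ in range(steps):
        optimiser.zero_grad()
        c_feats = vgg_features(stylised, content_layers)
        s_feats = vgg_features(stylised, style_layers)
        content_loss = sum(F.mse_loss(c_feats[l], content_targets[l]) for l in content_layers)
        style_loss = sum(F.mse_loss(gram_matrix(s_feats[l].squeeze(0)), style_targets[l])
                         for l in style_layers)
        # Total variation term encourages smoothness in the stylised result.
        tv_loss = (stylised[..., 1:, :] - stylised[..., :-1, :]).abs().mean() + \
                  (stylised[..., :, 1:] - stylised[..., :, :-1]).abs().mean()
        loss = alpha * content_loss + beta * style_loss + tv_weight * tv_loss
        loss.backward()
        optimiser.step()
    return stylised.detach()
```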

The algorithm of Gatys et al. does not need ground truth data for training and also does not have explicit restrictions on the type of style images, which addresses the limitations of previous IB-AR algorithms without CNNs (Section  2 ). However, the algorithm of Gatys et al. does not perform well in preserving the coherence of fine structures and details during stylisation since CNN features inevitably lose some low-level information. Also, it generally fails for photorealistic synthesis, due to the limitations of Gram-based style representation. Moreover, it does not consider the variations of brush strokes and the semantics and depth information contained in the content image, which are important factors in evaluating the visual quality.

In addition, a Gram-based style representation is not the only choice to statistically encode style information. There are also some other effective statistical style representations, which are derived from a Gram-based representation. Li et al. [ 42 ] derive some different style representations by considering style transfer in the domain of transfer learning, or more specifically, domain adaption [ 43 ] . Given that training and testing data are drawn from different distributions, the goal of domain adaption is to adapt a model trained on labelled training data from a source domain to predict labels of unlabelled testing data from a target domain. One way for domain adaption is to match a sample in the source domain to that in the target domain by minimising their distribution discrepancy, in which Maximum Mean Discrepancy (MMD) is a popular choice to measure the discrepancy between two distributions. Li et al. prove that matching Gram-based style representations between a pair of style and stylised images is intrinsically minimising MMD with a quadratic polynomial kernel. Therefore, it is expected that other kernel functions for MMD can be equally applied in NST, e.g., the linear kernel, polynomial kernel and Gaussian kernel. Another related representation is the batch normalisation (BN) statistic representation, which is to use mean and variance of the feature maps in VGG layers to model style:

$$\mathcal{L}_{s} = \sum_{l \in \{l_s\}} \frac{1}{C^{l}} \sum_{c=1}^{C^{l}} \Big( \big(\mu(\mathcal{F}^{l}_{c}(I_s)) - \mu(\mathcal{F}^{l}_{c}(I))\big)^{2} + \big(\sigma(\mathcal{F}^{l}_{c}(I_s)) - \sigma(\mathcal{F}^{l}_{c}(I))\big)^{2} \Big) \qquad (5)$$

where $\mathcal{F}^{l}_{c} \in \mathbb{R}^{H \times W}$ is the $c$-th feature map channel at layer $l$ of the VGG network, and $C^{l}$ is the number of channels.
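
A minimal sketch of this mean/variance (BN-statistic) style representation for a single layer, under the same feature-map conventions as above (function name is ours, and the exact weighting is illustrative):

```python
import torch

def bn_statistics_loss(feat_style: torch.Tensor, feat_stylised: torch.Tensor) -> torch.Tensor:
    """Match per-channel mean and standard deviation of two (C, H, W) feature maps."""
    mu_s, std_s = feat_style.mean(dim=(1, 2)), feat_style.std(dim=(1, 2))
    mu_o, std_o = feat_stylised.mean(dim=(1, 2)), feat_stylised.std(dim=(1, 2))
    C = feat_style.shape[0]
    return ((mu_s - mu_o).pow(2) + (std_s - std_o).pow(2)).sum() / C
```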

The main contribution of Li et al.’s algorithm is to theoretically demonstrate that the Gram matrices matching process in NST is equivalent to minimising MMD with the second order polynomial kernel, thus proposing a timely interpretation of NST and making the principle of NST clearer. However, the algorithm of Li et al. does not resolve the aforementioned limitations of Gatys et al.’s algorithm.

One limitation of the Gram-based algorithm is its instability during optimisation. It also requires manually tuning the parameters, which is very tedious. Risser et al. [ 44 ] find that feature activations with quite different means and variances can still have the same Gram matrix, which is the main reason for the instabilities. Inspired by this observation, Risser et al. introduce an extra histogram loss, which guides the optimisation to match the entire histogram of feature activations. They also present a preliminary solution to automatic parameter tuning, which is to explicitly prevent gradients with extreme values through extreme gradient normalisation.

By additionally matching the histogram of feature activations, the algorithm of Risser et al. achieves a more stable style transfer with fewer iterations and less parameter tuning effort. However, this benefit comes at the expense of higher computational complexity. Also, the aforementioned weaknesses of Gatys et al.'s algorithm still exist, e.g., a lack of consideration of depth and the coherence of details.

All the aforementioned neural methods only compare content and stylised images in the CNN feature space to make the stylised image semantically similar to the content image. But since CNN features inevitably lose some low-level information contained in the image, there are usually some unappealing distorted structures and irregular artefacts in the stylised results. To preserve the coherence of fine structures during stylisation, Li et al. [ 45 ] propose to incorporate additional constraints upon low-level features in pixel space. They introduce an additional Laplacian loss, which is defined as the squared Euclidean distance between the Laplacian filter responses of a content image and the stylised result. The Laplacian filter computes the second-order derivatives of the pixels in an image and is widely used for edge detection.

The algorithm of Li et al. has a good performance in preserving the fine structures and details during stylisation. But it still lacks considerations in semantics, depth, variations in brush strokes, etc.

4.1.2 Non-parametric Neural Methods with MRFs

Non-parametric IOB-NST is built on the basis of Non-parametric Texture Modelling with MRFs . This category considers NST at a local level, i.e., operating on patches to match the style.

Li and Wand [ 46 ] are the first to propose an MRF-based NST algorithm. They find that the parametric NST method with summary statistics only captures the per-pixel feature correlations and does not constrain the spatial layout, which leads to a less visually plausible result for photorealistic styles. Their solution is to model the style in a non-parametric way and introduce a new style loss function which includes a patch-based MRF prior:

$$\mathcal{L}_{s} = \sum_{i=1}^{m} \big\| \Psi_{i}\big(\mathcal{F}^{l}(I)\big) - \Psi_{NN(i)}\big(\mathcal{F}^{l}(I_s)\big) \big\|^{2} \qquad (6)$$

where $\Psi(\mathcal{F}^{l}(I))$ is the set of all local patches from the feature map $\mathcal{F}^{l}(I)$. $\Psi_{i}$ denotes the $i$-th local patch and $\Psi_{NN(i)}$ is the style patch most similar to the $i$-th local patch of the stylised image $I$. The best matching $\Psi_{NN(i)}$ is obtained by calculating normalised cross-correlation over all style patches in the style image $I_s$. $m$ is the total number of local patches. Since their algorithm matches styles at the patch level, fine structure and arrangement can be preserved much better.
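
For intuition, here is a naive, unoptimised sketch of the patch-based matching behind Equation (6): extract $k \times k$ patches from both feature maps, match each stylised patch to its nearest style patch by normalised cross-correlation (cosine similarity), and penalise their squared distance. This is an illustration under our own simplifications, not the implementation of [ 46 ]:

```python
import torch
import torch.nn.functional as F

def mrf_style_loss(feat_stylised: torch.Tensor, feat_style: torch.Tensor, patch_size: int = 3):
    """Patch-based MRF style loss on two feature maps of shape (1, C, H, W)."""
    # Extract all k x k patches: result is (num_patches, C * k * k).
    patches_o = F.unfold(feat_stylised, patch_size).squeeze(0).t()
    patches_s = F.unfold(feat_style, patch_size).squeeze(0).t()

    # Normalised cross-correlation = cosine similarity between patch vectors.
    sim = F.normalize(patches_o, dim=1) @ F.normalize(patches_s, dim=1).t()
    nn_idx = sim.argmax(dim=1)  # index of the best matching style patch per stylised patch

    # Squared Euclidean distance to the matched style patches, summed over patches.
    return (patches_o - patches_s[nn_idx]).pow(2).sum()
```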

The advantage of the algorithm of Li and Wand is that it performs especially well for photorealistic styles, or more specifically, when the content photo and the style are similar in shape and perspective, due to the patch-based MRF loss. However, it generally fails when the content and style images have strong differences in perspective and structure since the image patches could not be correctly matched. It is also limited in preserving sharp details and depth information.

4.2 Model-Optimisation-Based Offline Neural Methods

Although IOB-NST is able to yield impressive stylised images, there are still some limitations, the most prominent of which is efficiency. The second category, MOB-NST, addresses the speed and computational cost issue by exploiting MOB-IR to reconstruct the stylised result, i.e., a feed-forward network $g$ is optimised over a large set of content images $I_c$ for one or more style images $I_s$:

$$\theta^{*} = \arg\min_{\theta} \mathcal{L}_{total}\big(I_c, I_s, g_{\theta}(I_c)\big), \qquad I^{*} = g_{\theta^{*}}(I_c) \qquad (7)$$

Depending on the number of artistic styles a single $g$ can produce, MOB-NST algorithms are further divided into Per-Style-Per-Model (PSPM) MOB-NST methods, Multiple-Style-Per-Model (MSPM) MOB-NST methods, and Arbitrary-Style-Per-Model (ASPM) MOB-NST methods.

4.2.1 Per-Style-Per-Model Neural Methods

1) Parametric PSPM with Summary Statistics. The first two MOB-NST algorithms are proposed by Johnson et al. [ 47 ] and Ulyanov et al. [ 48 ] , respectively. These two methods share a similar idea, which is to pre-train a feed-forward style-specific network and produce a stylised result with a single forward pass at the testing stage. They differ only in network architecture: Johnson et al.'s design roughly follows the network proposed by Radford et al. [ 49 ] but with residual blocks and fractionally strided convolutions, while Ulyanov et al. use a multi-scale architecture as the generator network. The objective function is similar to that of Gatys et al. [ 4 ] , which indicates that they are also Parametric Methods with Summary Statistics.
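
A schematic of this per-style-per-model training setup, reusing the loss terms sketched in Section 4.1.1 and assuming `TransformerNet`-style architectures and a hypothetical `vgg_features` helper; this is a hedged outline under our own simplifications (batch size 1), not either paper's exact recipe:

```python
import torch

def train_pspm(transformer_net, vgg_features, content_loader, style_img,
               content_layers, style_layers, alpha=1.0, beta=1e3, epochs=2):
    """Pre-train one feed-forward network for one fixed style image."""
    optimiser = torch.optim.Adam(transformer_net.parameters(), lr=1e-3)
    with torch.no_grad():
        # Style targets are fixed, so compute them once.
        style_targets = {l: gram_matrix(f.squeeze(0))
                         for l, f in vgg_features(style_img, style_layers).items()}

    for _ in range(epochs):
        for content_img in content_loader:          # assume batch size 1 for simplicity
            stylised = transformer_net(content_img)  # single forward pass at test time
            c_out = vgg_features(stylised, content_layers)
            c_ref = vgg_features(content_img, content_layers)
            content_loss = sum((c_out[l] - c_ref[l]).pow(2).mean() for l in content_layers)
            s_out = vgg_features(stylised, style_layers)
            style_loss = sum((gram_matrix(s_out[l].squeeze(0)) - style_targets[l]).pow(2).sum()
                             for l in style_layers)
            loss = alpha * content_loss + beta * style_loss
            optimiser.zero_grad()
            loss.backward()
            optimiser.step()
    return transformer_net
```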

The algorithms of Johnson et al. and Ulyanov et al. achieve a real-time style transfer. However, their algorithm design basically follows the algorithm of Gatys et al., which makes them suffer from the same aforementioned issues as Gatys et al.’s algorithm (e.g., a lack of consideration in the coherence of details and depth information).

Shortly after [ 47 , 48 ] , Ulyanov et al. [ 50 ] further find that simply applying normalisation to every single image rather than to a batch of images (as in batch normalisation (BN)) leads to a significant improvement in stylisation quality. This single-image normalisation is called instance normalisation (IN), which is equivalent to batch normalisation when the batch size is set to 1. The style transfer network with IN is shown to converge faster than with BN and also achieves visually better results. One interpretation is that IN is a form of style normalisation and can directly normalise the style of each content image to the desired style [ 51 ] . Therefore, the objective is easier to learn, as the rest of the network only needs to take care of the content loss.
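
In PyTorch terms, swapping BN for IN in the generator is a one-line change; the snippet below simply illustrates the equivalence with batch normalisation at batch size 1 (the tensor shapes are illustrative):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 32, 32)  # a single example (batch size 1)
bn = nn.BatchNorm2d(64, affine=False, track_running_stats=False)
inorm = nn.InstanceNorm2d(64, affine=False)

# With batch size 1, both normalise each channel over its spatial dimensions,
# so their outputs coincide up to numerical precision.
print(torch.allclose(bn(x), inorm(x), atol=1e-5))  # True
```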

2) Non-parametric PSPM with MRFs. Another work by Li and Wand [ 52 ] is inspired by the MRF-based NST [ 46 ] algorithm in Section  4.1.2 . They address the efficiency issue by training a Markovian feed-forward network using adversarial training. Similar to [ 46 ] , their algorithm is a Patch-based Non-parametric Method with MRFs . Their method is shown to outperform the algorithms of Johnson et al. and Ulyanov et al. in the preservation of coherent textures in complex images, thanks to their patch-based design. However, their algorithm has a less satisfying performance with non-texture styles (e.g., face images), since their algorithm lacks a consideration in semantics. Other weaknesses of their algorithm include a lack of consideration in depth information and variations of brush strokes, which are important visual factors.

4.2.2 Multiple-Style-Per-Model Neural Methods

Although the above PSPM approaches can produce stylised images two orders of magnitude faster than previous IOB-NST methods, separate generative networks have to be trained for each particular style image, which is quite time-consuming and inflexible. But many paintings (e.g., impressionist paintings) share similar paint strokes and only differ in their colour palettes. Intuitively, it is redundant to train a separate network for each of them. MSPM is therefore proposed, which improves the flexibility of PSPM by further incorporating multiple styles into one single model. There are generally two paths towards handling this problem: 1) tying only a small number of parameters in a network to each style ( [ 53 , 54 ] ) and 2) still exploiting only a single network like PSPM but combining both style and content as inputs ( [ 55 , 56 ] ).

1) Tying only a small number of parameters to each style. An early work by Dumoulin et al. [ 53 ] is built on the basis of the IN layer proposed in the PSPM algorithm [ 50 ] (Section 4.2.1). They surprisingly find that sharing the same convolutional parameters and varying only the scaling and shifting parameters in the IN layers is sufficient to model different styles. Therefore, they propose an algorithm to train a conditional multi-style transfer network based on conditional instance normalisation (CIN), which is defined as:

$$\textrm{CIN}\big(\mathcal{F}(I_c); s\big) = \gamma^{s}\left(\frac{\mathcal{F}(I_c) - \mu(\mathcal{F}(I_c))}{\sigma(\mathcal{F}(I_c))}\right) + \beta^{s} \qquad (8)$$

where $\mathcal{F}$ is the input feature activation and $s$ is the index of the desired style from a set of style images. As shown in Equation (8), the conditioning for each style $I_s$ is done by scaling and shifting the parameters $\gamma^{s}$ and $\beta^{s}$ after normalising the feature activation $\mathcal{F}(I_c)$, i.e., each style $I_s$ can be achieved by tuning the parameters of an affine transformation. The interpretation is similar to that for [ 50 ] in Section 4.2.1, i.e., the normalisation of feature statistics with different affine parameters can normalise the input content image to different styles. Furthermore, the algorithm of Dumoulin et al. can also be extended to combine multiple styles in a single stylised result by combining the affine parameters of different styles.
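
A minimal sketch of a conditional instance normalisation layer in the spirit of Equation (8): one (γ, β) pair is learned per style and selected at run time by the style index. Module and parameter names are our own illustrative choices:

```python
import torch
import torch.nn as nn

class ConditionalInstanceNorm2d(nn.Module):
    """Instance normalisation with one learned affine (gamma, beta) pair per style."""
    def __init__(self, num_channels: int, num_styles: int):
        super().__init__()
        self.norm = nn.InstanceNorm2d(num_channels, affine=False)
        self.gamma = nn.Parameter(torch.ones(num_styles, num_channels))
        self.beta = nn.Parameter(torch.zeros(num_styles, num_channels))

    def forward(self, x: torch.Tensor, style_idx: int) -> torch.Tensor:
        # Normalise the content features, then apply the affine parameters of style s.
        out = self.norm(x)
        gamma = self.gamma[style_idx].view(1, -1, 1, 1)
        beta = self.beta[style_idx].view(1, -1, 1, 1)
        return gamma * out + beta
```

Combining styles, as noted above, then amounts to blending the (γ, β) pairs of several styles before applying them.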

Another algorithm which follows the first path of MSPM is proposed by Chen et al. [ 54 ] . Their idea is to explicitly decouple style and content, i.e., to use separate network components to learn the corresponding content and style information. More specifically, they use mid-level convolutional filters (called the “StyleBank” layer) to individually learn different styles. Each style is tied to a set of parameters in the “StyleBank” layer. The remaining components of the network are used to learn content information, which is shared by different styles. Their algorithm also supports flexible incremental training, which is to fix the content components in the network and only train a “StyleBank” layer for a new style.

In summary, both the algorithms of Dumoulin et al. and Chen et al. have the benefit that little effort is needed to learn a new style, and both allow flexible control over style fusion. However, they do not address the common limitations of NST algorithms, e.g., a lack of details, semantics, depth and variations in brush strokes.

2) Combining both style and content as inputs. One disadvantage of the first category is that the model size generally grows with the number of learned styles. The second path of MSPM addresses this limitation by fully exploring the capability of a single network and feeding both content and style into the network for style identification. Different MSPM algorithms differ in the way they incorporate style into the network.

In [ 55 ] , given $N$ target styles, Li et al. design a selection unit for style selection, which is an $N$-dimensional one-hot vector. Each bit in the selection unit represents a specific style $I_s$ in the set of target styles. For each bit in the selection unit, Li et al. first sample a corresponding noise map $f(I_s)$ from a uniform distribution and then feed $f(I_s)$ into the style sub-network to obtain the corresponding style encoded features $\mathcal{F}(f(I_s))$. By feeding the concatenation of the style encoded features $\mathcal{F}(f(I_s))$ and the content encoded features $Enc(I_c)$ into the decoder part $Dec$ of the style transfer network, the desired stylised result can be produced: $I = Dec(\mathcal{F}(f(I_s)) \oplus Enc(I_c))$.

Another work by Zhang and Dana [ 56 ] first forwards each style image in the style set through the pre-trained VGG network to obtain multi-scale feature activations $\mathcal{F}(I_s)$ in different VGG layers. Then the multi-scale $\mathcal{F}(I_s)$ are combined with multi-scale encoded features $Enc(I_c)$ from different layers in the encoder through their proposed inspiration layers. The inspiration layers are designed to reshape $\mathcal{F}(I_s)$ to match the desired dimension, and also have a learnable weight matrix to tune the feature maps to help minimise the objective function.

The second type of MSPM addresses the limitation of increased model size in the first type of MSPM. The trade-off is that the style scalability of the second type of MSPM is much smaller, since only a single network is used for multiple styles. We will quantitatively compare the style scalability of different MSPM algorithms in Section 6. In addition, some aforementioned limitations of the first type of MSPM still exist, i.e., the second type of MSPM algorithms are still limited in preserving the coherence of fine structures and depth information.

4.2.3 Arbitrary-Style-Per-Model Neural Methods

The third category, ASPM-MOB-NST, aims at one-model-for-all, i.e., one single trainable model to transfer arbitrary artistic styles. There are also two types of ASPM, one built upon Non-parametric Texture Modelling with MRFs and the other one built upon Parametric Texture Modelling with Summary Statistics .

1) Non-parametric ASPM with MRFs. The first ASPM algorithm is proposed by Chen and Schmidt [ 57 ] . They first extract a set of activation patches from content and style feature activations computed in pre-trained VGG network. Then they match each content patch to the most similar style patch and swap them (called “Style Swap” in [ 57 ] ). The stylised result can be produced by reconstructing the resulting activation map after “Style Swap”, with either IOB-IR or MOB-IR techniques. The algorithm of Chen and Schmidt is more flexible than the previous approaches due to its characteristic of one-model-for-all-style. But the stylised results of [ 57 ] are less appealing since the content patches are typically swapped with the style patches which are not representative of the desired style. As a result, the content is well preserved while the style is generally not well reflected.

2) Parametric ASPM with Summary Statistics. Considering [ 53 ] in Section 4.2.2, the simplest approach to arbitrary style transfer is to train a separate parameter prediction network $P$ to predict $\gamma^{s}$ and $\beta^{s}$ in Equation (8) with a number of training styles [ 58 ] . Given a test style image $I_s$, the CIN layers in the style transfer network take the affine parameters $\gamma^{s}$ and $\beta^{s}$ from $P(I_s)$ and normalise the input content image to the desired style with a single forward pass.

Another similar approach based on [ 53 ] is proposed by Huang and Belongie [ 51 ] . Instead of training a parameter prediction network, Huang and Belongie propose to modify conditional instance normalisation (CIN) in Equation ( 8 ) to adaptive instance normalisation (AdaIN):

$$\textrm{AdaIN}\big(\mathcal{F}(I_c), \mathcal{F}(I_s)\big) = \sigma\big(\mathcal{F}(I_s)\big)\left(\frac{\mathcal{F}(I_c) - \mu(\mathcal{F}(I_c))}{\sigma(\mathcal{F}(I_c))}\right) + \mu\big(\mathcal{F}(I_s)\big) \qquad (9)$$

AdaIN transfers the channel-wise mean and variance feature statistics between content and style feature activations, which also shares a similar idea with [ 57 ] . Different from [ 53 ] , the encoder in the style transfer network of [ 51 ] is fixed and comprises the first few layers of a pre-trained VGG network. Therefore, $\mathcal{F}$ in [ 51 ] is the feature activation of a pre-trained VGG network. The decoder needs to be trained with a large set of style and content images to decode the resulting feature activations after AdaIN into the stylised result: $I = Dec(\textrm{AdaIN}(\mathcal{F}(I_c), \mathcal{F}(I_s)))$.
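
A minimal sketch of the AdaIN operation in Equation (9) on batched feature maps (the function name is ours, and the decoder is assumed to be trained separately as described above):

```python
import torch

def adain(content_feat: torch.Tensor, style_feat: torch.Tensor, eps: float = 1e-5):
    """Align channel-wise mean/std of content features to those of style features.

    Both inputs have shape (N, C, H, W).
    """
    c_mean = content_feat.mean(dim=(2, 3), keepdim=True)
    c_std = content_feat.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style_feat.mean(dim=(2, 3), keepdim=True)
    s_std = style_feat.std(dim=(2, 3), keepdim=True)
    return s_std * (content_feat - c_mean) / c_std + s_mean
```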

The algorithm of Huang and Belongie [ 51 ] is the first ASPM algorithm that achieves a real-time stylisation. However, the algorithm of Huang and Belongie [ 51 ] is data-driven and limited in generalising on unseen styles. Also, simply adjusting the mean and variance of feature statistics makes it hard to synthesise complicated style patterns with rich details and local structures.

A more recent work by Li et al. [ 59 ] attempts to exploit a series of feature transformations to transfer arbitrary artistic styles in a style-learning-free manner. Similar to [ 51 ] , Li et al. use the first few layers of pre-trained VGG as the encoder and train the corresponding decoder. But they replace the AdaIN layer [ 51 ] between the encoder and decoder with a pair of whitening and colouring transformations (WCT): $I = Dec(\textrm{WCT}(\mathcal{F}(I_c), \mathcal{F}(I_s)))$. Their algorithm is built on the observation that the whitening transformation can remove the style-related information while preserving the structure of the content. Therefore, receiving content activations $\mathcal{F}(I_c)$ from the encoder, the whitening transformation filters the original style out of the input content image and returns a filtered representation containing only content information. Then, by applying the colouring transformation, the style patterns contained in $\mathcal{F}(I_s)$ are incorporated into the filtered content representation, and the stylised result $I$ can be obtained by decoding the transformed features. They also extend this single-level stylisation to multi-level stylisation to further improve visual quality.
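
For reference, a simplified single-level sketch of a whitening and colouring transform using an eigendecomposition of the channel covariance; this omits the blending coefficient and the multi-level scheme of [ 59 ] and is only an illustration of the general technique:

```python
import torch

def wct(content_feat: torch.Tensor, style_feat: torch.Tensor, eps: float = 1e-5):
    """Whitening-colouring transform on feature maps of shape (C, H, W)."""
    C, H, W = content_feat.shape
    fc = content_feat.reshape(C, -1)
    fs = style_feat.reshape(C, -1)

    fc = fc - fc.mean(dim=1, keepdim=True)
    fs_mean = fs.mean(dim=1, keepdim=True)
    fs = fs - fs_mean

    # Whitening: remove the (style-related) second-order statistics of the content.
    cov_c = fc @ fc.t() / (fc.shape[1] - 1) + eps * torch.eye(C)
    evals_c, evecs_c = torch.linalg.eigh(cov_c)
    whitened = evecs_c @ torch.diag(evals_c.clamp(min=eps).rsqrt()) @ evecs_c.t() @ fc

    # Colouring: impose the second-order statistics of the style features.
    cov_s = fs @ fs.t() / (fs.shape[1] - 1) + eps * torch.eye(C)
    evals_s, evecs_s = torch.linalg.eigh(cov_s)
    coloured = evecs_s @ torch.diag(evals_s.clamp(min=eps).sqrt()) @ evecs_s.t() @ whitened

    return (coloured + fs_mean).reshape(C, H, W)
```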

The algorithm of Li et al. is the first ASPM algorithm to transfer artistic styles in a learning-free manner. Therefore, compared with [ 51 ] , it does not have the limitation in generalisation capabilities. But the algorithm of Li et al. is still not effective at producing sharp details and fine strokes. The stylisation results will be shown in Section  6 . Also, it lacks a consideration in preserving depth information and variations in brush strokes.

[Figure 3: Sample results of stroke size control.]

5 Improvements and Extensions

Since the emergence of NST algorithms, some research has also been devoted to improving current NST algorithms by controlling perceptual factors (e.g., stroke size control, spatial style control, and colour control) (Figure 2, green boxes). Also, all of the aforementioned NST methods are designed for general still images. They may not be appropriate for specialised types of images and videos (e.g., doodles, head portraits, and video frames). Thus, a variety of follow-up studies (Figure 2, pink boxes) aim to extend general NST algorithms to these particular types of images and even beyond artistic image style (e.g., audio style).

Controlling Perceptual Factors in Neural Style Transfer. Gatys et al. themselves [ 60 ] propose several slight modifications to improve their previous algorithm [ 4 ] . They demonstrate a spatial style control strategy to control the style in each region of the content image. Their idea is to define guidance channels for the feature activations of both the content and style images. The guidance channels take values in $[0, 1]$ and specify which style should be transferred to which content region, i.e., the content regions where the content guidance channel is 1 should be rendered with the style whose style guidance channel is equal to 1. For colour control, the original NST algorithm produces stylised images with the colour distribution of the style image. However, sometimes people prefer a colour-preserving style transfer, i.e., preserving the colour of the content image during style transfer. The corresponding solution is to first transform the style image’s colours to match the content image’s colours before style transfer, or alternatively to perform style transfer only in the luminance channel.

For stroke size control, the problem is much more complex. We show sample results of stroke size control in Figure  3 . The discussions of stroke size control strategy need to be split into several cases [ 61 ] :

1) IOB-NST with non-high-resolution images: Since current style statistics (e.g., Gram-based and BN-based statistics) are scale-sensitive [ 61 ] , to achieve different stroke sizes, the solution is simply resizing a given style image to different scales.

2) MOB-NST with non-high-resolution images: One possible solution is to resize the input image to different scales before the forward pass, which inevitably hurts stylisation quality. Another possible solution is to train multiple models with different scales of a style image, which is space- and time-consuming. Moreover, this solution fails to preserve stroke consistency among results with different stroke sizes, i.e., the results vary in stroke orientations, stroke configurations, etc., whereas users generally desire to change only the stroke size and nothing else. To address this problem, Jing et al. [ 61 ] propose a stroke-controllable PSPM algorithm. The core component of their algorithm is a StrokePyramid module, which learns different stroke sizes with adaptive receptive fields. Without trading off quality and speed, their algorithm is the first to exploit a single model to achieve flexible, continuous stroke size control while preserving stroke consistency, and it further achieves spatial stroke size control to produce new artistic effects. Although one can also use an ASPM algorithm to control stroke size, ASPM trades off quality and speed. As a result, ASPM is not effective at producing fine strokes and details compared with [ 61 ] .

3) IOB-NST with high-resolution images: For high-resolution images (e.g., $3000 \times 3000$ pixels in [ 60 ] ), a large stroke size cannot be achieved by simply resizing the style image to a large scale. Since only the region in the content image within the receptive field of the VGG loss network can be affected by a single neuron, there is almost no visual difference between large and larger brush strokes within an image region of receptive-field size. Gatys et al. [ 60 ] tackle this problem by proposing a coarse-to-fine IOB-NST procedure with several steps of downsampling, stylising, upsampling and final stylising.

4) MOB-NST with high-resolution images: Similar to 3), stroke size in stylised result does not vary with style image scale for high-resolution images. The solution is also similar to Gatys et al. ’s algorithm in [ 60 ] , which is a coarse-to-fine stylisation procedure [ 62 ] . The idea is to exploit a multimodel, which comprises multiple subnetworks. Each subnetwork receives the upsampled stylised result of the previous subnetwork as the input, and stylises it again with finer strokes.

Another limitation of current NST algorithms is that they do not consider the depth information contained in the image. To address this limitation, the depth preserving NST algorithm [ 63 ] is proposed. Their approach is to add a depth loss function based on [ 47 ] to measure the depth difference between the content image and the stylised image. The image depth is acquired by applying a single-image depth estimation algorithm (e.g., Chen et al.’s work in [ 64 ] ).
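
A hedged sketch of how such a depth term can be attached to an existing pipeline, assuming `depth_net` is any differentiable single-image depth estimator with frozen parameters (e.g. in the spirit of [ 64 ]); the names and weighting are illustrative, not the exact formulation of [ 63 ]:

```python
import torch
import torch.nn.functional as F

def depth_preserving_loss(content_img, stylised_img, depth_net, weight=1.0):
    """Penalise the difference between the estimated depth maps of the two images."""
    with torch.no_grad():
        depth_content = depth_net(content_img)   # target depth, no gradient needed
    depth_stylised = depth_net(stylised_img)     # gradients flow back into the stylised image
    return weight * F.mse_loss(depth_stylised, depth_content)
```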

Semantic Style Transfer. Given a pair of style and content images which are similar in content, the goal of semantic style transfer is to build a semantic correspondence between the style and content, which maps each style region to a corresponding semantically similar content region. Then the style in each style region is transferred to the semantically similar content region.

1) Image-Optimisation-Based Semantic Style Transfer. Since the patch matching scheme naturally meets the requirements of region-based correspondence, Champandard [ 65 ] proposes to build a semantic style transfer algorithm based on the aforementioned patch-based algorithm [ 46 ] (Section 4.1.2). Although the result produced by the algorithm of Li and Wand [ 46 ] is close to the target of semantic style transfer, [ 46 ] does not incorporate an accurate segmentation mask, which sometimes leads to wrong semantic matches. Therefore, Champandard augments [ 46 ] with an additional semantic channel, which is a downsampled semantic segmentation map. The segmentation map can be either manually annotated or produced by a semantic segmentation algorithm [ 66 , 67 ] . Despite the effectiveness of [ 65 ] , an MRF-based design is not the only choice. Instead of combining an MRF prior, Chen and Hsu [ 68 ] provide an alternative way to perform semantic style transfer, which is to exploit a masking-out process to constrain the spatial correspondence, together with a higher-order style feature statistic to further improve the result. More recently, Mechrez et al. [ 69 ] propose an alternative contextual loss to realise semantic style transfer in a segmentation-free manner.

2) Model-Optimisation-Based Semantic Style Transfer. As before, efficiency remains a major issue. Both [ 65 ] and [ 68 ] are based on IOB-NST algorithms and therefore leave much room for improvement. Lu et al. [ 70 ] speed up the process by optimising the objective function in feature space instead of in pixel space. More specifically, they propose to perform feature reconstruction, instead of image reconstruction as previous algorithms do. This optimisation strategy reduces the computational burden, since the loss does not need to propagate through a deep network. The resulting reconstructed feature is decoded into the final result with a trained decoder. Since the speed of [ 70 ] does not reach real time, there is still considerable room for further research.

Instance Style Transfer. Instance style transfer is built on instance segmentation and aims to stylise only a single user-specified object within an image. The challenge mainly lies in the transition between a stylised object and non-stylised background. Castillo et al. [ 71 ] tackle this problem by adding an extra MRF-based loss to smooth and anti-alias boundary pixels.

Doodle Style Transfer. An interesting extension can be found in [ 65 ] , which exploits NST to transform rough sketches into fine artworks. The method simply discards the content loss term and uses doodles as the segmentation map to perform semantic style transfer.

Stereoscopic Style Transfer. Driven by the demand of AR/VR, Chen et al. [ 72 ] propose a stereoscopic NST algorithm for stereoscopic images. They propose a disparity loss to penalise the bidirectional disparity. Their algorithm is shown to produce more consistent strokes for different views.

Portrait Style Transfer. Current style transfer algorithms are usually not optimised for head portraits. As they do not impose spatial constraints, directly applying these existing algorithms to head portraits will deform facial structures, which is unacceptable for the human visual system. Selim et al. [ 73 ] address this problem and extend [ 4 ] to head portrait painting transfer. They propose to use the notion of gain maps to constrain spatial configurations, which can preserve the facial structures while transferring the texture of the style image.

Video Style Transfer. NST algorithms for video sequences were proposed shortly after Gatys et al.'s first NST algorithm for still images [ 4 ] . Different from still image style transfer, the design of a video style transfer algorithm needs to consider the smooth transition between adjacent video frames. As before, we divide the related algorithms into Image-Optimisation-Based and Model-Optimisation-Based Video Style Transfer.

1) Image-Optimisation-Based Online Video Style Transfer. The first video style transfer algorithm is proposed by Ruder et al. [ 74 , 75 ] . They introduce a temporal consistency loss based on optical flow to penalise the deviations along point trajectories. The optical flow is calculated by using novel optical flow estimation algorithms [ 76 , 77 ] . As a result, their algorithm eliminates temporal artefacts and produces smooth stylised videos. However, they build their algorithm upon [ 4 ] and need several minutes to process a single frame.

2) Model-Optimisation-Based Offline Video Style Transfer. Several follow-up studies are devoted to stylising a given video in real time. Huang et al. [ 78 ] propose to augment the current PSPM algorithm with Ruder et al.'s temporal consistency loss [ 74 ] . Given two consecutive frames, the temporal consistency loss is computed directly on the two corresponding outputs of the style transfer network to encourage pixel-wise consistency, and a corresponding two-frame synergic training strategy is introduced for the computation of the temporal consistency loss. Another concurrent work, which shares a similar idea with [ 78 ] but additionally explores the style instability problem, can be found in [ 79 ] . Different from [ 78 , 79 ] , Chen et al. [ 80 ] propose a flow subnetwork to produce feature flow and incorporate optical flow information in feature space. Their algorithm is built on a pre-trained style transfer network (an encoder-decoder pair) and warps feature activations from the pre-trained stylisation encoder using the obtained feature flow.

Character Style Transfer. Given a style image containing multiple characters, the goal of Character Style Transfer is to apply the idea of NST to generate new fonts and text effects. In [ 81 ] , Atarsaikhan et al. directly apply the algorithm in [ 4 ] to font style transfer and achieve visually plausible results, while Yang et al. [ 82 ] propose to first characterise style elements and then exploit the extracted characteristics to guide the generation of text effects. A more recent work [ 83 ] designs a conditional GAN model for glyph shape prediction together with an ornamentation network for colour and texture prediction. By training these two networks jointly, font style transfer can be realised in an end-to-end manner.

Photorealistic Style Transfer. Photorealistic style transfer (also known as colour style transfer) aims to transfer the style of colour distributions. The general idea is to build upon current semantic style transfer but to eliminate distortions and preserve the original structure of the content image.

1) Image-Optimisation-Based Photorealistic Style Transfer. The earliest photorealistic style transfer approach is proposed by Luan et al. [ 84 ] . They propose a two-stage optimisation procedure, which initialises the optimisation by stylising a given photo with a non-photorealistic style transfer algorithm [ 65 ] and then penalises image distortions by adding a photorealism regularisation. However, since Luan et al.'s algorithm is built on the Image-Optimisation-Based Semantic Style Transfer method [ 65 ] , it is computationally expensive. Similar to [ 84 ] , another algorithm proposed by Mechrez et al. [ 85 ] also adopts a two-stage optimisation procedure. They propose to refine the non-photorealistic stylised result by matching the gradients in the output image to those in the content photo. Compared to [ 84 ] , the algorithm of Mechrez et al. achieves a faster photorealistic stylisation speed.
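As an illustration of this gradient-matching idea, the sketch below (our simplified stand-in, not Mechrez et al.'s exact screened-Poisson objective) refines a stylised result by staying close to its colours while matching the spatial gradients of the content photo.

```python
import torch
import torch.nn.functional as F

def image_gradients(img):                        # img: (N, C, H, W)
    dx = img[..., :, 1:] - img[..., :, :-1]      # horizontal finite differences
    dy = img[..., 1:, :] - img[..., :-1, :]      # vertical finite differences
    return dx, dy

def refine_to_photorealistic(stylised, content, steps=300, grad_weight=10.0, lr=0.01):
    """Keep the stylised colours but pull the output's gradients towards the content's."""
    out = stylised.clone().requires_grad_(True)
    opt = torch.optim.Adam([out], lr=lr)
    cdx, cdy = image_gradients(content)
    for _ in range(steps):
        opt.zero_grad()
        odx, ody = image_gradients(out)
        loss = F.mse_loss(out, stylised) \
             + grad_weight * (F.mse_loss(odx, cdx) + F.mse_loss(ody, cdy))
        loss.backward()
        opt.step()
    return out.detach()
```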

2) Model-Optimisation-Based Photorealistic Style Transfer. Li et al. [ 86 ] address the efficiency issue of [ 84 ] by decomposing the problem into two steps: a stylisation step and a smoothing step. The stylisation step applies the NST algorithm in [ 59 ] but replaces upsampling layers with unpooling layers to produce a stylised result with fewer distortions. The smoothing step then further eliminates structural artefacts. The two aforementioned algorithms [ 84 , 86 ] are mainly designed for natural images. Another work [ 87 ] proposes to exploit a GAN to transfer colour from human-designed anime images to sketches. Their algorithm demonstrates a promising application of photorealistic style transfer, namely automatic image colourisation.

Attribute Style Transfer. Image attributes generally refer to image colours, textures, etc. Previously, image attribute transfer was accomplished through image analogy [ 9 ] in a supervised manner (Section  2 ). Derived from the idea of patch-based NST [ 46 ] , Liao et al. [ 88 ] propose deep image analogy to study image analogies in the domain of CNN features. Their algorithm is based on a patch matching technique and realises a weakly supervised image analogy, i.e., it only needs a single pair of source and target images instead of a large training set.

Fashion Style Transfer. Fashion style transfer takes a fashion style image as the target and generates clothing images with the desired fashion style. The challenge lies in preserving the basic design of the input clothing while blending in the desired style patterns. This idea is first explored by Jiang and Fu [ 89 ] , who tackle the problem by proposing a pair of fashion style generator and discriminator.

Audio Style Transfer. In addition to transferring image styles, [ 90 , 91 ] extend the domain from image style to audio style and synthesise new sounds by transferring the desired style from a target audio clip. The study of audio style transfer follows the same route as image style transfer, i.e., Audio-Optimisation-Based Online Audio Style Transfer and then Model-Optimisation-Based Offline Audio Style Transfer . Inspired by image-based IOB-NST, Verma and Smith [ 90 ] propose an Audio-Optimisation-Based Online Audio Style Transfer algorithm based on online audio optimisation. They start from a noise signal and optimise it iteratively using backpropagation. [ 91 ] improves the efficiency by transferring the audio in a feed-forward manner and can produce results in real time.

No. | Author Name & Year
1 | Claude Monet (1886)
2 | Georges Rouault (1907)
3 | Henri de Toulouse-Lautrec (1893)
4 | Wassily Kandinsky (1922)
5 | John Ruskin (1847)
6 | Severini Gino (1913)
7 | Juan Gris (1912)
8 | Vincent van Gogh (1889)
9 | Pieter Bruegel the Elder (1563)
10 | Egon Schiele (1915)

Note: All our style images are in the public domain.

6 Evaluation Methodology

The evaluation of NST algorithms remains an open and important problem in this field. In general, two major types of evaluation methodology can be employed in NST: qualitative evaluation and quantitative evaluation. Qualitative evaluation relies on the aesthetic judgements of observers, and the results depend on many factors (e.g., the age and occupation of the participants). Quantitative evaluation, in contrast, focuses on precise evaluation metrics, such as time complexity and loss variation. In this section, we experimentally compare different NST algorithms both qualitatively and quantitatively.

[Figure 5: Stylised results of Gatys et al., Johnson et al., Ulyanov et al., and Li and Wand on six groups of content and style images.]
[Figure 6: Saliency detection results of the content images and the corresponding stylised results of Gatys et al., Johnson et al., Ulyanov et al., and Li and Wand.]
[Figure 7: Stylised results of Dumoulin et al., Chen et al., Li et al., and Zhang and Dana on six groups of content and style images.]
[Figure 8: Saliency detection results of the content images and the corresponding stylised results of Dumoulin et al., Chen et al., Li et al., and Zhang and Dana.]
[Figure 9: Stylised results of Chen and Schmidt, Ghiasi et al., Huang and Belongie, and Li et al. on six groups of content and style images.]
[Figure 10: Saliency detection results of the content images and the corresponding stylised results of Chen and Schmidt, Ghiasi et al., Huang and Belongie, and Li et al.]

6.1 Experimental Setup

Evaluation datasets. In total, ten style images and twenty content images are used in our experiment.

For style images, we select artworks in diverse styles, as shown in Figure  4 , including impressionism, cubism, abstract, contemporary, futurism, surrealist and expressionist art. Regarding the medium, some of these artworks are painted on canvas, while others are painted on cardboard, wool, cotton, polyester, etc. In addition, we also try to cover a range of image characteristics (such as detail, contrast, complexity and colour distribution), inspired by the works in [ 92 , 93 , 95 ] . More detailed information on our style images is given in Table  I .

For content images, there are already carefully selected and well-described benchmark datasets for evaluating stylisation, proposed by Mould and Rosin [ 92 , 93 , 95 ] . Their NPR benchmark, NPRgeneral, consists of images that cover a wide range of characteristics (e.g., contrast, texture, edges and meaningful structures) and satisfy a variety of criteria. We therefore directly use the twenty images of the NPRgeneral benchmark as our content images.

For the algorithms based on offline model optimisation, the MS-COCO dataset [ 96 ] is used for training. None of the content images are used during training.

Principles. To maximise the fairness of the comparisons, we also obey the following principles during our experiment:

1) In order to cover every detail of each algorithm, we try to use the implementations provided in the published literature. To maximise the fairness of comparison, especially for the speed comparison, for [ 10 ] we use a popular Torch-based open-source implementation [ 97 ] , which is also acknowledged by the authors. In our experiment, except for [ 53 , 32 ] , which are based on TensorFlow, all the other codes are implemented in Torch 7.

2) Since the visual effect is influenced by the content and style weights, it is difficult to compare results with different degrees of stylisation. Simply using the same content and style weights is not an optimal solution, because each algorithm computes its losses differently (e.g., different choices of content and style layers, different loss functions). Therefore, in our experiment, we try our best to balance the content and style weights across the different algorithms.

3) We use the default parameters (e.g., choice of layers, learning rate) suggested by the authors, except for the aforementioned content and style weights. Although the results of some algorithms might be further improved by more careful hyperparameter tuning, we keep the authors' default parameters because sensitivity to hyperparameters is itself an important implicit criterion for comparison. For example, an algorithm can hardly be called effective if it requires heavy parameter tuning for each style.

There are also some other implementation details to be noted. For [ 47 ] and [ 48 ] , we use the instance normalisation strategy proposed in [ 50 ] , which is not covered in the published papers. Also, we do not consider the diversity loss term (proposed in [ 50 , 55 ] ) for all algorithms, i.e., one pair of content and style images corresponds to one stylised result in our experiment. For Chen and Schmidt’s algorithm [ 57 ] , we use the feed-forward reconstruction to reconstruct the stylised results.

6.2 Qualitative Evaluation

Example stylised results are shown in Figure  5 , Figure  7 and Figure  9 . More results can be found in the supplementary material (https://www.dropbox.com/s/5xd8iizoigvjcxz/SupplementaryMaterial_neuralStyleReview.pdf?dl=0).

1) Results of IOB-NST. Following the content and style images, Figure  5 contains the results of Gatys et al.'s IOB-NST algorithm based on online image optimisation [ 4 ] . The style transfer process is computationally expensive, but the results are visually appealing. Therefore, the algorithm of Gatys et al. is usually regarded as the gold-standard method in the NST community.

2) Results of PSPM-MOB-NST. Figure  5 also shows the results of Per-Style-Per-Model MOB-NST algorithms (Section 4.2 ). Each model fits only one style. It can be noticed that the stylised results of Ulyanov et al. [ 48 ] and Johnson et al. [ 47 ] are somewhat similar. This is not surprising since they share a similar idea and only differ in their detailed network architectures. The results of Li and Wand [ 52 ] are slightly less impressive. Since [ 52 ] is based on a Generative Adversarial Network (GAN), its training process is, to some extent, less stable. Nevertheless, we believe GAN-based style transfer is a very promising direction, and there are already several other GAN-based works [ 83 , 87 , 98 ] (Section  5 ) in the field of NST.

3) Results of MSPM-MOB-NST. Figure  7 demonstrates the results of Multiple-Style-Per-Model MOB-NST algorithms, which incorporate multiple styles into a single model. The idea of both Dumoulin et al.'s algorithm [ 53 ] and Chen et al.'s algorithm [ 54 ] is to tie a small number of parameters to each style, and both build upon the architecture of [ 47 ] . Therefore, it is not surprising that their results are visually similar. Although the results of [ 53 , 54 ] are appealing, their model size grows with the number of learned styles. In contrast, Zhang and Dana's algorithm [ 56 ] and Li et al.'s algorithm [ 55 ] use a single network with the same trainable weights for multiple styles. The model size issue is thus addressed, but there seems to be some interference among different styles, which slightly degrades the stylisation quality.

4) Results of ASPM-MOB-NST. Figure  9 presents the last category of MOB-NST algorithms, namely Arbitrary-Style-Per-Model MOB-NST algorithms, whose idea is one model for all styles. Globally, the results of ASPM are slightly less impressive than those of the other categories. This is acceptable, in that a three-way trade-off between speed, flexibility and quality is common in research. Chen and Schmidt's patch-based algorithm [ 57 ] does not seem to combine enough style elements into the content image. Their algorithm is based on a similarity-based patch swap; when many content patches are swapped with style patches that do not contain enough style elements, the target style is not reflected well. Ghiasi et al.'s algorithm [ 58 ] is data-driven, and its stylisation quality is highly dependent on the variety of training styles. The algorithm of Huang and Belongie [ 51 ] matches global summary feature statistics and successfully improves the visual quality compared with [ 57 ] . However, it does not seem to handle complex style patterns well, and its stylisation quality is still related to the variety of styles seen during training. The algorithm of Li et al. [ 59 ] replaces the training process with a series of feature transformations, but [ 59 ] is not effective at producing sharp details and fine strokes.

Saliency Comparison. NST is an art creation process. As indicated in [ 3 , 38 , 39 ] , the definition of style is subjective and very complex, involving personal preferences, texture compositions, as well as the tools and medium used. As a result, it is difficult to define an aesthetic criterion for a stylised artwork; for the same stylised result, different people may hold different or even opposite views. Nevertheless, our goal is to compare the results of different NST techniques (shown in Figure  5 , Figure  7 and Figure  9 ) as objectively as possible. Here, we compare saliency maps, as proposed in [ 63 ] ; the corresponding results are shown in Figure  6 , Figure  8 and Figure  10 . Saliency maps indicate the visually dominant locations in images. Intuitively, a successful style transfer may weaken or enhance the saliency map of the content image, but should not destroy its integrity and coherence. From Figure  6 (saliency detection results of IOB-NST and PSPM-MOB-NST), it can be noticed that the stylised results of [ 4 , 48 , 47 ] preserve the structures of the content images well; however, for [ 52 ] , it might be harder for an observer to recognise the objects after stylisation. Using a similar analysis, from Figure  8 (saliency detection results of MSPM-MOB-NST), [ 53 ] and [ 54 ] preserve similar saliency to the original content images since they both tie a small number of parameters to each style. [ 56 ] and [ 55 ] are also similar in their ability to retain the integrity of the original saliency maps, because they both use a single network for all styles. As shown in Figure  10 , for the saliency detection results of ASPM-MOB-NST, [ 58 ] and [ 51 ] perform better than [ 57 ] and [ 59 ] ; however, both [ 58 ] and [ 51 ] are data-driven methods whose quality depends on the diversity of training styles. In general, the results of MSPM-MOB-NST seem to preserve saliency coherence better than ASPM-MOB-NST, but slightly worse than IOB-NST and PSPM-MOB-NST.
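For readers who want to reproduce a rough version of this comparison, the sketch below computes a simple spectral-residual saliency map and a crude coherence score between a content image and its stylised result (our own stand-in, not the saliency detector used in [ 63 ] ).

```python
import numpy as np
from scipy.ndimage import uniform_filter, gaussian_filter

def spectral_residual_saliency(gray):
    """gray: 2-D float array in [0, 1]. Returns a normalised saliency map."""
    spectrum = np.fft.fft2(gray)
    log_amplitude = np.log(np.abs(spectrum) + 1e-8)
    phase = np.angle(spectrum)
    residual = log_amplitude - uniform_filter(log_amplitude, size=3)
    saliency = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    saliency = gaussian_filter(saliency, sigma=3)       # smooth the raw map
    return (saliency - saliency.min()) / (saliency.max() - saliency.min() + 1e-8)

def saliency_coherence(content_gray, stylised_gray):
    """Correlation between the saliency maps of a content image and its stylised result."""
    a = spectral_residual_saliency(content_gray).ravel()
    b = spectral_residual_saliency(stylised_gray).ravel()
    return float(np.corrcoef(a, b)[0, 1])
```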

Methods | Time (s), 256×256 | Time (s), 512×512 | Time (s), 1024×1024 | Styles/Model
Gatys et al. [4] | 14.32 | 51.19 | 200.3 | –
Johnson et al. [47] | 0.014 | 0.045 | 0.166 | 1
Ulyanov et al. [48] | 0.022 | 0.047 | 0.145 | 1
Li and Wand [52] | 0.015 | 0.055 | 0.229 | 1
Zhang and Dana [56] | 0.019 (0.039) | 0.059 (0.133) | 0.230 (0.533) | k
Li et al. [55] | 0.017 | 0.064 | 0.254 | k
Chen and Schmidt [57] | 0.123 (0.130) | 1.495 (1.520) | – | ∞
Huang and Belongie [51] | 0.026 (0.037) | 0.095 (0.137) | 0.382 (0.552) | ∞
Li et al. [59] | 0.620 | 1.139 | 2.947 | ∞

Note: The fifth column shows the number of styles that a single model can produce. Times excluding (outside parentheses) and including (inside parentheses) the style encoding process are both shown, since [ 56 ] , [ 57 ] and [ 51 ] support storing encoded style statistics in advance to further speed up stylisation for the same style but different content images. The time of [ 57 ] for producing 1024×1024 images is not shown due to memory limitations. The speeds of [ 53 , 58 ] are similar to [ 47 ] since they share a similar architecture; we do not redundantly list them in this table.

Types | Methods | Pros & Cons
IOB-NST | Gatys et al. [4] | Good results; usually regarded as a gold standard.
PSPM-MOB-NST | Ulyanov et al. [48]; Johnson et al. [47]; Li and Wand [52] | The results of [48, 47] are close to [4]; [52] is generally less appealing than [48, 47].
MSPM-MOB-NST | Dumoulin et al. [53]; Chen et al. [54]; Li et al. [55]; Zhang and Dana [56] | The results of [53] and [54] are close to [47], but the model size generally grows with the number of learned styles; [55, 56] have a fixed model size, but there seems to be some interference among different styles.
ASPM-MOB-NST | Chen and Schmidt [57]; Ghiasi et al. [58]; Huang and Belongie [51]; Li et al. [59] | In general, the results of ASPM are less impressive than those of other types of NST algorithms; [57] does not combine enough style elements; [58, 51] are generally not effective at producing complex style patterns; [59] is not good at producing sharp details and fine strokes.

Note: IOB-NST denotes the category Image-Optimisation-Based Neural Style Transfer and MOB-NST denotes Model-Optimisation-Based Neural Style Transfer .

6.3 Quantitative Evaluation

Regarding quantitative evaluation, we mainly focus on five evaluation metrics: the generation time for a single content image at different resolutions; the training time for a single model; the average loss over content images, to measure how well the loss function is minimised; the loss variation during training, to measure how fast the model converges; and the style scalability, to measure how large the learned style set can be.

1) Stylisation speed. The speed comparison is reported in Table  II . k (k ∈ Z+) denotes that a single model can produce multiple styles, which corresponds to MSPM algorithms; ∞ means that a single model works for any style, which corresponds to ASPM algorithms. The numbers reported in Table  II are obtained by averaging the generation time over 100 images. Note that we do not include the speed of [ 53 , 58 ] in Table  II , as their algorithms scale and shift parameters on top of the algorithm of Johnson et al. [ 47 ] ; the time required to stylise one image with [ 53 , 58 ] is very close to [ 47 ] under the same setting. For Chen et al.'s algorithm [ 54 ] , since it is protected by a patent and the detailed architecture design is not public, we simply report the speed information provided by the authors for reference: on a Pascal Titan X GPU, 256×256: 0.007 s; 512×512: 0.024 s; 1024×1024: 0.089 s. For Chen and Schmidt's algorithm [ 57 ] , the time for processing a 1024×1024 image is not reported due to the limit of video memory: swapping patches for two 1024×1024 images needs more than 24 GB of video memory, so the stylisation process is not practical. We can observe that, except for [ 57 , 59 ] , all the other MOB-NST algorithms are capable of stylising even high-resolution content images in real time. ASPM algorithms are generally slower than PSPM and MSPM, which again demonstrates the aforementioned three-way trade-off.
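The timing protocol can be sketched roughly as follows (an outline under our assumptions, not the exact scripts used here): the per-image generation time is averaged over the content images after a short warm-up, and for algorithms that support caching encoded style statistics, the style-encoding step can be placed inside or outside the timed callable.

```python
import time
import torch

def average_generation_time(stylise, content_images, n_warmup=5):
    """stylise: callable mapping a content tensor to a stylised tensor.
    Returns the mean per-image generation time in seconds."""
    for img in content_images[:n_warmup]:      # warm up (kernel compilation, caches)
        _ = stylise(img)
    if torch.cuda.is_available():
        torch.cuda.synchronize()               # make sure pending GPU work is done
    start = time.time()
    for img in content_images:
        _ = stylise(img)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return (time.time() - start) / len(content_images)
```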

2) Training time. Another concern is the training time for a single model. The training time of different algorithms is hard to compare, as sometimes a model trained with only a few iterations is already capable of producing visually appealing results. So we simply report our training times for the different algorithms (under the same setting) as a reference for follow-up studies. On an NVIDIA Quadro M6000, the training time for a single model is about 3.5 hours for the algorithm of Johnson et al. [ 47 ] , 3 hours for Ulyanov et al. [ 48 ] , 2 hours for Li and Wand [ 52 ] , 4 hours for Zhang and Dana [ 56 ] , and 8 hours for Li et al. [ 55 ] . Chen and Schmidt's algorithm [ 57 ] and Huang and Belongie's algorithm [ 51 ] take much longer (e.g., a couple of days), which is acceptable since a pre-trained model can work for any style. The training time of [ 58 ] depends on the size of the training style set. For MSPM algorithms, the training time can be further reduced through incremental learning over a pre-trained model; for example, the algorithm of Chen et al. needs only 8 minutes to incrementally learn a new style, as reported in [ 54 ] .

[Figure 11: Training curves of Johnson et al. [47] and Ulyanov et al. [48] for four randomly selected styles.]

3) Loss comparison. One way to evaluate MOB-NST algorithms that share the same loss function is to compare their loss variation during training, i.e., to compare their training curves. This helps researchers justify architecture design choices by measuring how fast a model converges and how well the same loss function can be minimised. Here we compare the training curves of two popular MOB-NST algorithms [ 47 , 48 ] in Figure  11 , since most follow-up works are based on their architecture designs. We remove the total variation term and keep the same objective for both algorithms; other settings (e.g., the loss network and the chosen layers) are also kept the same. For the style images, we randomly select four styles from our style set and represent them with different colours in Figure  11 . It can be observed that the two algorithms are similar in terms of convergence speed. Both minimise the content loss well during training and mainly differ in the speed of learning the style objective; the algorithm in [ 47 ] minimises the style loss better.

Another related criterion is to compare the final loss values of different algorithms over a set of test images. This metric demonstrates how well the same loss function can be minimised by different algorithms. For a fair comparison, the loss function and other settings are again kept the same. We show the results of one IOB-NST algorithm [ 4 ] and two MOB-NST algorithms [ 47 , 48 ] in Figure  12 . The result is consistent with the aforementioned trade-off between speed and quality: although MOB-NST algorithms are capable of stylising images in real time, they are not as good as IOB-NST algorithms at minimising the same loss function.
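Such a comparison boils down to evaluating one fixed Gram-based objective on the outputs of every algorithm. A minimal sketch is shown below (the layer indices and the style weight are our assumptions, not the exact settings used in this experiment).

```python
import torch
import torch.nn.functional as F
import torchvision.models as models

# Fixed loss network: VGG-19 features, frozen; inputs are assumed to be
# ImageNet-normalised (N, 3, H, W) tensors.
vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)

CONTENT_LAYER = 22                   # assumed choice: relu4_2
STYLE_LAYERS = (1, 6, 11, 20, 29)    # assumed choice: relu1_1 ... relu5_1

def extract(x, layers):
    feats, out = {}, x
    for i, layer in enumerate(vgg):
        out = layer(out)
        if i in layers:
            feats[i] = out
    return feats

def gram(f):
    n, c, h, w = f.shape
    f = f.view(n, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def total_loss(output, content, style, style_weight=1e5):
    fo = extract(output, set(STYLE_LAYERS) | {CONTENT_LAYER})
    fc = extract(content, {CONTENT_LAYER})
    fs = extract(style, set(STYLE_LAYERS))
    c_loss = F.mse_loss(fo[CONTENT_LAYER], fc[CONTENT_LAYER])
    s_loss = sum(F.mse_loss(gram(fo[i]), gram(fs[i])) for i in STYLE_LAYERS)
    return c_loss + style_weight * s_loss
```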

4) Style scalability. Scalability is a very important criterion for MSPM algorithms. However, it is very hard to measure, since the maximum capacity of a single model is highly dependent on the particular set of styles. If most styles share somewhat similar patterns, a single model can produce thousands of styles or even more, since similar styles share a similar distribution of style feature statistics. In contrast, if the style patterns vary greatly among different style images, the capacity of a single model will be much smaller. But it is hard to measure how much styles differ from each other in their patterns. Therefore, to provide the reader with a reference, we simply summarise the style set sizes reported by the authors: 32 for [ 53 ] , 1000 for both [ 54 ] and [ 55 ] , and 100 for [ 56 ] .

A summary of the advantages and disadvantages of the algorithms compared in this section can be found in Table  III .

7 Applications

Due to the visually plausible stylised results, the research of NST has led to many successful industrial applications and begun to deliver commercial benefits. In this section, we summarise these applications and present some potential usages.

7.1 Social Communication

One reason why NST attracts so much attention in both academia and industry is its popularity on social networking sites, e.g., Facebook and Twitter. A recently emerged mobile application named Prisma [ 11 ] was one of the first industrial applications to provide NST as a service. Due to its high stylisation quality, Prisma achieved great success and has become popular around the world. Other applications providing the same service appeared one after another and began to deliver commercial benefits, e.g., the web application Ostagram [ 12 ] requires users to pay for faster stylisation. With the help of these industrial applications [ 13 , 99 , 100 ] , people can create their own art paintings and share their artwork with others on Twitter and Facebook, which is a new form of social communication. There are also some related application papers: [ 101 ] introduces an iOS app, Pictory , which combines style transfer techniques with image filtering; [ 102 ] further presents the technical implementation details of Pictory ; and [ 103 ] demonstrates the design of another GPU-based mobile app, ProsumerFX .

The application of NST to social communication reinforces connections between people and has positive effects on both academia and industry. For academia, when people share their own masterpieces, their comments can help researchers further improve the algorithms. Moreover, the application of NST in social communication also drives the advance of other new techniques. For instance, inspired by the real-time requirements of NST for videos, Facebook AI Research (FAIR) first developed a new mobile-embedded deep learning system, Caffe2Go, and then Caffe2 (now merged with PyTorch), which can run deep neural networks on mobile phones [ 104 ] . For industry, the application brings commercial benefits and promotes economic development.

7.2 User-assisted Creation Tools

Another use of NST is as a user-assisted creation tool. Although there are not yet popular applications that apply NST in creation tools, we believe it will be a promising usage in the future.

As a creation tool for painters and designers, NST can make it more convenient for a painter to create an artwork in a particular style, especially for computer-made artworks. Moreover, with NST algorithms, it is trivial to produce stylised fashion elements for fashion designers and stylised CAD drawings for architects in a variety of styles, which would be costly to create by hand.

7.3 Production Tools for Entertainment Applications

Entertainment applications such as movies, animations and games are probably among the most promising application areas of NST. For example, creating an animation usually requires 8 to 24 painted frames per second. Production costs could be largely reduced if NST could be applied to automatically stylise a live-action video into an animation style. Similarly, NST can significantly save time and costs when applied to the creation of certain movies and computer games.

There are already some application papers introducing how to apply NST in production, e.g., Joshi et al. explore the use of NST to redraw some scenes of the movie Come Swim [ 105 ] , which indicates the promising potential of NST in this field. In [ 106 ] , Fišer et al. study an illumination-guided style transfer algorithm for the stylisation of 3D renderings. They demonstrate how to exploit their algorithm for rendering previews on various geometries, autocompleting shading, and transferring style without a reference 3D model.

8 Future Challenges

The advances in the field of NST are inspiring, and some algorithms have already found use in industrial applications. Although current algorithms achieve good performance, there are still several challenges and open issues. In this section, we summarise the key challenges within NST and discuss possible strategies for dealing with them in future work. Since NST is closely related to NPR, some critical problems in NPR (summarised in [ 3 , 107 , 108 , 109 , 110 , 14 ] ) also remain future challenges for NST research. Therefore, we first review some of the major challenges shared by NPR and NST and then discuss research questions specific to NST.

8.1 Evaluation Methodology

Aesthetic evaluation is a critical issue in both NPR and NST. In the field of NPR, the necessity of aesthetic evaluation has been discussed by many researchers [ 3 , 107 , 108 , 109 , 110 , 14 ] ; e.g., in [ 3 ] , Rosin and Collomosse devote two chapters to this issue. The problem becomes increasingly critical as the fields of NPR and NST mature. As pointed out in [ 3 ] , researchers need reliable criteria to assess the benefits of their proposed approach over the prior art, as well as a way to evaluate the suitability of a particular approach to a particular scenario. However, most NPR and NST papers evaluate their approach with side-by-side subjective visual comparisons, or through measurements derived from user studies [ 111 , 112 , 59 ] . For example, to evaluate their universal style transfer algorithm, Li et al. [ 59 ] conduct a user study in which participants are asked to vote for their favourite stylised results. We argue that this is not an optimal solution, since the results vary considerably across observers. Inspired by [ 113 ] , we conduct a simple user study with the stylised results of different NST algorithms, in which each stylised image is rated by 8 raters (4 males and 4 females) of the same occupation and age. As depicted in Figure  13 , given the same stylised result, different observers of the same occupation and age still give quite different ratings. There is currently no gold-standard evaluation method for assessing NPR and NST algorithms. The challenge of aesthetic evaluation will continue to be an open question in both the NPR and NST communities, and its solution may require collaboration with professional artists as well as efforts to identify the underlying aesthetic principles.

In the field of NST, there is another important issue related to aesthetic evaluation: there is currently no standard benchmark image set for evaluating NST algorithms, and different authors typically use their own images for evaluation. In our experiment, we use the carefully selected NPR benchmark image set NPRgeneral [ 92 , 93 ] as our content images to compare different techniques, which is backed by the comprehensive studies in [ 92 , 93 ] ; however, we have to admit that our selection of style images is far from a standard NST benchmark style set. Different from NPR, NST algorithms do not place explicit restrictions on the type of style images. Therefore, to compare the style scalability of different NST methods, it is critical to seek a benchmark style set which collectively exhibits a broad range of possible properties, accompanied by a detailed description of the adopted principles, numerical measurements of image characteristics, and a discussion of limitations, like the works in [ 92 , 93 , 95 ] . Based on the above discussion, seeking an NST benchmark image set is quite a separate and important research direction, which provides not only a way for researchers to demonstrate the improvement of their approach over the prior art, but also a tool to measure the suitability of a particular NST algorithm for a particular requirement. In addition, with the emergence of several NST extensions (Section  5 ), it remains another open problem to establish specialised benchmark datasets and corresponding evaluation criteria for assessing those extended works (e.g., video style transfer, audio style transfer, stereoscopic style transfer, character style transfer and fashion style transfer).

8.2 Interpretable Neural Style Transfer

Another challenging problem is the interpretability of NST algorithms. Like many other CNN-based vision tasks, the process of NST is like a black box, which makes it quite uncontrollable. In this part, we focus on three critical issues related to the interpretability of NST, i.e., interpretable and controllable NST via disentangled representations, normalisation methods associated with NST, and adversarial examples in NST.

Representation disentangling. The goal of representation disentangling is to learn dimension-wise interpretable representations, where changes in one or more specific dimensions correspond precisely to changes in a single factor of variation while being invariant to other factors [ 114 , 115 , 116 , 117 ] . Such representations are useful for a variety of machine learning tasks, e.g., visual concept learning [ 118 ] and transfer learning [ 119 ] . In style transfer, for example, if one could learn a representation where the factors of variation (e.g., colour, shape, stroke size, stroke orientation and stroke composition) are precisely disentangled, these factors could then be freely controlled during stylisation. For instance, one could change the stroke orientations in a stylised image by simply changing the corresponding dimension in the learned disentangled representation. Towards the goal of disentangled representation, current methods fall into two categories: supervised and unsupervised approaches. The basic idea of supervised disentangling methods is to exploit annotated data to supervise the mapping between inputs and attributes [ 120 , 121 ] . Despite their effectiveness, supervised disentangling approaches typically require large numbers of training samples. However, in the case of NST, it is quite complicated to model and capture some of the aforementioned factors of variation. For example, it is hard to collect a set of images which have different stroke orientations but exactly the same colour distribution, stroke size and stroke composition. By contrast, unsupervised disentangling methods do not require annotations; however, they usually yield disentangled representations which are dimension-wise uncontrollable and uninterpretable [ 122 ] , i.e., one cannot control what is encoded in each specific dimension. Based on the above discussion, to acquire disentangled representations in NST, the first issue to be addressed is how to define, model and capture the complicated factors of variation in NST.

Paper | Author | Normalisation Method
[50] | Ulyanov et al. | Instance normalisation
[53] | Dumoulin et al. | Conditional instance normalisation
[51] | Huang and Belongie | Adaptive instance normalisation

Normalisation methods. The advances in the field of NST are closely related to the emergence of novel normalisation methods, as shown in Table  IV . Some of these normalisation methods also have an influence on the larger vision community beyond style transfer (e.g., image recolourisation [ 123 ] and video colour propagation [ 124 ] ). In this part, we first briefly review these normalisation methods in NST and then discuss the corresponding open problem. The first normalisation method to emerge in NST is instance normalisation (or contrast normalisation ), proposed by Ulyanov et al. [ 50 ] . Instance normalisation is equivalent to batch normalisation when the batch size is one. It is shown that a style transfer network with instance normalisation layers converges faster and produces visually better results than one with batch normalisation layers. Ulyanov et al. believe that the superior performance of instance normalisation results from the fact that it enables the network to discard contrast information in content images and therefore makes learning simpler. Another explanation proposed by Huang and Belongie [ 51 ] is that instance normalisation performs a kind of style normalisation by normalising feature statistics (i.e., the mean and variance). With instance normalisation , the style of each individual image can be directly normalised to the target style. As a result, the rest of the network only needs to take care of the content loss, making the objective easier to learn. Based on instance normalisation , Dumoulin et al. [ 53 ] further propose conditional instance normalisation , which scales and shifts parameters in instance normalisation layers (shown in Equation ( 8 )). Following the interpretation of Huang and Belongie, by using different affine parameters, the feature statistics can be normalised to different values; correspondingly, the style of each individual sample can be normalised to different styles. Furthermore, in [ 51 ] , Huang and Belongie propose adaptive instance normalisation , which adaptively instance-normalises the content feature with the style feature statistics (shown in Equation ( 9 )). In this way, they believe that the style of an individual image can be normalised to arbitrary styles. Despite the superior performance achieved by instance normalisation , conditional instance normalisation and adaptive instance normalisation , the reason behind their success still remains unclear. Although Ulyanov et al. [ 50 ] and Huang and Belongie [ 51 ] propose their own hypotheses based on pixel space and feature space respectively, there is a lack of theoretical proof for their proposed theories. In addition, their proposed theories are also built on other hypotheses, e.g., Huang and Belongie base their interpretation on the observation by Li et al. [ 42 ] that channel-wise feature statistics, namely mean and variance, can represent styles. However, it remains uncertain why feature statistics can represent style, or even whether they can represent all styles, which relates back to the interpretability of style representations.
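The relationship between these three normalisation variants can be summarised in a few lines of code (a compact sketch of the general formulations in Equations ( 8 ) and ( 9 ); parameter shapes are assumptions).

```python
import torch

def channel_stats(x, eps=1e-5):
    # per-sample, per-channel mean and std over the spatial dimensions
    mean = x.mean(dim=(2, 3), keepdim=True)
    std = (x.var(dim=(2, 3), keepdim=True) + eps).sqrt()
    return mean, std

def instance_norm(x, gamma, beta):
    """Instance normalisation with one learned affine pair (gamma, beta),
    each of shape (1, C, 1, 1)."""
    mean, std = channel_stats(x)
    return gamma * (x - mean) / std + beta

def conditional_instance_norm(x, gammas, betas, style_id):
    """Conditional IN: a bank of affine pairs, one per learned style."""
    return instance_norm(x, gammas[style_id], betas[style_id])

def adaptive_instance_norm(content_feat, style_feat):
    """AdaIN: the affine parameters are the style feature statistics themselves."""
    c_mean, c_std = channel_stats(content_feat)
    s_mean, s_std = channel_stats(style_feat)
    return s_std * (content_feat - c_mean) / c_std + s_mean
```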

Adversarial examples. Several studies have shown that deep classification networks are easily fooled by adversarial examples [ 125 , 126 ] , which are generated by applying perturbations to input images (e.g., Figure  14 (c)). Previous studies on adversarial examples mainly focus on deep classification networks. However, as shown in Figure  14 , we find that adversarial examples also exist for generative style transfer networks: in Figure  14 (d), one can hardly recognise the content that is still present in Figure  14 (c). This reveals a difference between generative networks and the human vision system: the perturbed image remains recognisable to humans but leads to a very different result from the generative style transfer network. However, it remains unclear why some perturbations can make such a difference, and whether similarly noised images uploaded by users could still be stylised into the desired style. Interpreting and understanding adversarial examples in NST could help to avoid some failure cases in stylisation.
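One simple way to probe this behaviour (not necessarily how the examples in Figure 14 were produced) is a projected-gradient search that perturbs the content image so that the stylised output drifts as far as possible from the clean output.

```python
import torch
import torch.nn.functional as F

def perturb_against_stylizer(stylise_net, content, epsilon=0.03, steps=10, alpha=0.01):
    """Search, inside an L_inf ball of radius epsilon around `content`, for a
    perturbation that pushes the stylised output far away from the clean output."""
    with torch.no_grad():
        clean_out = stylise_net(content)
    # random start so that the first gradient is non-zero
    delta = (torch.rand_like(content) * 2 - 1) * epsilon
    for _ in range(steps):
        delta.requires_grad_(True)
        out = stylise_net((content + delta).clamp(0.0, 1.0))
        loss = F.mse_loss(out, clean_out)       # objective to be maximised
        loss.backward()
        with torch.no_grad():
            delta = (delta + alpha * delta.grad.sign()).clamp(-epsilon, epsilon)
    return (content + delta).clamp(0.0, 1.0).detach()
```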

8.3 Three-way Trade-off in Neural Style Transfer

In the field of NST, there is a three-way trade-off between speed, flexibility and quality. IOB-NST achieves superior quality but is computationally expensive. PSPM-MOB-NST achieves real-time stylisation; however, it needs to train a separate network for each style, which is not flexible. MSPM-MOB-NST improves flexibility by incorporating multiple styles into a single model, but it still needs to pre-train a network for a set of target styles. Although ASPM-MOB-NST algorithms successfully transfer arbitrary styles, they are less satisfying in perceptual quality and speed: the quality of data-driven ASPM relies heavily on the diversity of training styles, yet one can hardly cover every style given the great diversity of artworks, while the image-transformation-based ASPM algorithm transfers arbitrary styles in a learning-free manner but lags behind the others in speed. Another related issue is hyperparameter tuning. To produce the most visually appealing results, it remains unclear how to set the content and style weights, how to choose the layers for computing the content and style losses, which optimiser to use, and how to set the learning rate. Currently, researchers set these hyperparameters empirically; however, one set of hyperparameters does not necessarily work for every style, and it is tedious to manually tune them for each combination of content and style images. One key to this problem is a better understanding of the optimisation procedure in NST, which would help to explain how to find the local minima that lead to high quality.


9 Discussions and Conclusions

Over the past several years, NST has continued to be an inspiring research area, motivated by both scientific challenges and industrial demands. A considerable amount of research has been conducted in the field of NST. Key advances in this field are summarised in Figure  2 , and a summary of the corresponding style transfer loss functions can be found in Table  V . NST remains a fast-paced area, and we look forward to more exciting works devoted to advancing the development of this field.

Paper | Description
Gatys et al. | The first proposed style loss, based on Gram-based style representations.
Johnson et al. | Widely adopted content loss based on perceptual similarity.
Berger and Memisevic | Computes style statistics over horizontally and vertically translated feature representations. More effective at modelling styles with symmetric properties than the standard Gram-based style loss.
Li et al. | Subtracts the mean of the feature representations before computing the Gram-based style statistics. Eliminates large discrepancies in scale. Effective for multi-style transfer with a single network.
Zhang and Dana | Computes style statistics over multi-scale feature representations. Eliminates a few artefacts.
Li et al. | Shows an equivalence with the Gram-based style loss and achieves comparable quality with lower computational complexity.
Li et al. | Achieves comparable quality but is conceptually clearer in theory.
Risser et al. | Matches the entire histogram of feature representations. Eliminates instability artefacts, compared with the Gram-based style loss alone.
Li et al. | Eliminates distorted structures and irregular artefacts.
Li and Wand | More effective when the content and style are similar in shape and perspective, compared with the Gram-based style loss.
Champandard | Incorporates a segmentation mask over the MRF-based style loss. Enables a more accurate semantic match.
Li and Wand | Computed based on PatchGAN. Utilises contextual correspondence between patches. More effective at preserving coherent textures in complex images.
Jing et al. | Achieves continuous stroke size control while preserving stroke consistency.
Wang et al. | Enables a coarse-to-fine stylisation procedure. Capable of producing both large and subtle strokes for high-resolution content images.
Liu et al. | Preserves the depth maps of content images. Effective at retaining the spatial layout and structure of content images.
Ruder et al. | Designed for video style transfer. Penalises deviations along point trajectories based on optical flow. Capable of maintaining temporal consistency among stylised video frames.
Chen et al. | Designed for stereoscopic style transfer. Penalises bidirectional disparity. Capable of producing consistent strokes for different views.

During the preparation of this review, we were also delighted to find that research on NST has brought new inspiration to other areas [ 127 , 128 , 129 , 130 , 131 ] and accelerated the development of the wider vision community. For the area of Image Reconstruction , inspired by NST, Ulyanov et al. [ 127 ] propose a novel deep image prior, which replaces the manually designed total variation regulariser in [ 33 ] with a randomly initialised deep neural network. Given a task-dependent loss function ℒ, an image I_o and a fixed uniform noise z as inputs, their algorithm can be formulated as:

θ* = arg min_θ ℒ(g(z; θ), I_o).    (10)

One can easily notice that Equation ( 10 ) is very similar to Equation ( 7 ). The process in [ 127 ] is equivalent to the training process of MOB-NST with only one image in the training set, but with I_c replaced by z and ℒ_total replaced by ℒ. In other words, g in [ 127 ] is trained to overfit a single sample. Inspired by NST, Upchurch et al. [ 128 ] propose a deep feature interpolation technique and provide a new baseline for the area of Image Transformation (e.g., face ageing and smiling). On top of the procedure of the IOB-NST algorithm [ 4 ] , they add an extra step of interpolating in the VGG feature space. In this way, their algorithm successfully changes image contents in a learning-free manner. Another field closely related to NST is Face Photo-sketch Synthesis ; for example, [ 132 ] exploits style transfer to generate shadings and textures for final face sketches. Similarly, for the area of Face Swapping , the idea of the MOB-NST algorithm [ 48 ] can be directly applied to build a feed-forward face-swap algorithm [ 133 ] . NST also provides a new approach to Domain Adaptation , as validated in the work of Atapour-Abarghouei and Breckon [ 131 ] , who apply style transfer to translate images between domains so as to improve the generalisation capability of their Monocular Depth Estimation model.
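As an illustration of Equation (10), the sketch below overfits a toy randomly initialised generator g to a single image from a fixed noise input (a minimal reading of the formulation; Ulyanov et al.'s actual architecture and training details differ).

```python
import torch
import torch.nn as nn

def deep_image_prior(task_loss, target_img, steps=2000, lr=0.01):
    """Minimise a task-dependent loss L(g(z; theta), I_o) over theta only;
    the fixed noise z plays the role that the content image plays in MOB-NST."""
    channels = target_img.shape[1]
    g = nn.Sequential(                        # toy randomly initialised generator
        nn.Conv2d(channels, 64, 3, padding=1), nn.ReLU(),
        nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        nn.Conv2d(64, channels, 3, padding=1), nn.Sigmoid(),
    )
    z = torch.rand_like(target_img)           # fixed uniform noise input
    opt = torch.optim.Adam(g.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = task_loss(g(z), target_img)    # e.g. a masked MSE for inpainting
        loss.backward()
        opt.step()
    with torch.no_grad():
        return g(z)
```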

Despite the great progress in recent years, the field of NST is far from mature. Currently, the first stage of NST research is to refine and optimise recent NST algorithms, aiming to faithfully imitate a variety of styles. This stage involves two technical directions. The first is to reduce failure cases and improve the stylised quality for a wider variety of style and content images. Although there is no explicit restriction on the type of styles, NST does have styles it is particularly good at and others it handles poorly. For example, NST typically performs well when producing irregular style elements (e.g., paintings), as demonstrated in many NST papers [ 4 , 47 , 53 , 59 ] ; however, for styles with regular elements, such as low-poly styles [ 134 , 135 ] and pixelator styles [ 136 ] , NST generally produces distorted and irregular results due to the nature of CNN-based image reconstruction. For content images, previous NST papers usually use natural images to demonstrate their algorithms; however, given abstract images (e.g., sketches and cartoons) as content, NST typically does not combine enough style elements to match the content [ 137 ] , since a pre-trained classification network cannot extract proper image content from such abstract inputs. The other technical direction of the first stage lies in deriving more extensions of general NST algorithms. For example, with the emergence of 3D vision techniques, it is promising to study 3D surface stylisation, i.e., to directly optimise and produce 3D objects for both photorealistic and non-photorealistic stylisation. Beyond this first stage, a further trend of NST is not merely to imitate human-created art, but to create a new form of AI-created art under the guidance of underlying aesthetic principles. A first step in this direction has already been taken, i.e., using current NST methods [ 54 , 53 , 62 ] to combine different styles. For example, in [ 62 ] , Wang et al. successfully use their algorithm to produce a new style which fuses the coarse texture distortions of one style with the fine brush strokes of another.

  • [1] B. Gooch and A. Gooch, Non-photorealistic rendering .   Natick, MA, USA: A. K. Peters, Ltd., 2001.
  • [2] T. Strothotte and S. Schlechtweg, Non-photorealistic computer graphics: modeling, rendering, and animation .   Morgan Kaufmann, 2002.
  • [3] P. Rosin and J. Collomosse, Image and video-based artistic stylisation .   Springer Science & Business Media, 2012, vol. 42.
  • [4] L. A. Gatys, A. S. Ecker, and M. Bethge, “Image style transfer using convolutional neural networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , 2016, pp. 2414–2423.
  • [5] A. A. Efros and W. T. Freeman, “Image quilting for texture synthesis and transfer,” in Proceedings of the 28th annual conference on Computer graphics and interactive techniques .   ACM, 2001, pp. 341–346.
  • [6] I. Drori, D. Cohen-Or, and H. Yeshurun, “Example-based style synthesis,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , vol. 2.   IEEE, 2003, pp. II–143.
  • [7] O. Frigo, N. Sabater, J. Delon, and P. Hellier, “Split and match: Example-based adaptive patch sampling for unsupervised style transfer,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , 2016, pp. 553–561.
  • [8] M. Elad and P. Milanfar, “Style transfer via texture synthesis,” IEEE Transactions on Image Processing , vol. 26, no. 5, pp. 2338–2351, 2017.
  • [9] A. Hertzmann, C. E. Jacobs, N. Oliver, B. Curless, and D. H. Salesin, “Image analogies,” in Proceedings of the 28th annual conference on Computer graphics and interactive techniques .   ACM, 2001, pp. 327–340.
  • [10] L. A. Gatys, A. S. Ecker, and M. Bethge, “A neural algorithm of artistic style,” ArXiv e-prints , Aug. 2015.
  • [11] I. Prisma Labs, “Prisma: Turn memories into art using artificial intelligence,” 2016. [Online]. Available: http://prisma-ai.com
  • [12] “Ostagram,” 2016. [Online]. Available: http://ostagram.ru
  • [13] A. J. Champandard, “Deep forger: Paint photos in the style of famous artists,” 2015. [Online]. Available: http://deepforger.com
  • [14] J. E. Kyprianidis, J. Collomosse, T. Wang, and T. Isenberg, “State of the ‘art’: A taxonomy of artistic stylization techniques for images and video,” IEEE transactions on visualization and computer graphics , vol. 19, no. 5, pp. 866–885, 2013.
  • [15] A. Semmo, T. Isenberg, and J. Döllner, “Neural style transfer: A paradigm shift for image-based artistic rendering?” in Proceedings of the Symposium on Non-Photorealistic Animation and Rendering .   ACM, 2017, pp. 5:1–5:13.
  • [16] A. Hertzmann, “Painterly rendering with curved brush strokes of multiple sizes,” in Proceedings of the 25th annual conference on Computer graphics and interactive techniques .   ACM, 1998, pp. 453–460.
  • [17] A. Kolliopoulos, “Image segmentation for stylized non-photorealistic rendering and animation,” Ph.D. dissertation, University of Toronto, 2005.
  • [18] B. Gooch, G. Coombe, and P. Shirley, “Artistic vision: painterly rendering using computer vision techniques,” in Proceedings of the 2nd international symposium on Non-photorealistic animation and rendering .   ACM, 2002, pp. 83–ff.
  • [19] Y.-Z. Song, P. L. Rosin, P. M. Hall, and J. Collomosse, “Arty shapes,” in Proceedings of the Fourth Eurographics conference on Computational Aesthetics in Graphics, Visualization and Imaging .   Eurographics Association, 2008, pp. 65–72.
  • [20] M. Zhao and S.-C. Zhu, “Portrait painting using active templates,” in Proceedings of the ACM SIGGRAPH/Eurographics Symposium on Non-Photorealistic Animation and Rendering .   ACM, 2011, pp. 117–124.
  • [21] H. Winnemöller, S. C. Olsen, and B. Gooch, “Real-time video abstraction,” in ACM Transactions On Graphics (TOG) , vol. 25, no. 3.   ACM, 2006, pp. 1221–1226.
  • [22] C. Tomasi and R. Manduchi, “Bilateral filtering for gray and color images,” in Proceedings of the IEEE International Conference on Computer Vision .   IEEE, 1998, pp. 839–846.
  • [23] B. Gooch, E. Reinhard, and A. Gooch, “Human facial illustrations: Creation and psychophysical evaluation,” ACM Transactions on Graphics , vol. 23, no. 1, pp. 27–44, 2004.
  • [24] L.-Y. Wei, S. Lefebvre, V. Kwatra, and G. Turk, “State of the art in example-based texture synthesis,” in Eurographics 2009, State of the Art Report, EG-STAR .   Eurographics Association, 2009, pp. 93–117.
  • [25] A. A. Efros and T. K. Leung, “Texture synthesis by non-parametric sampling,” in Proceedings of the IEEE International Conference on Computer Vision , vol. 2.   IEEE, 1999, pp. 1033–1038.
  • [26] L.-Y. Wei and M. Levoy, “Fast texture synthesis using tree-structured vector quantization,” in Proceedings of the 27th annual conference on Computer graphics and interactive techniques .   ACM Press/Addison-Wesley Publishing Co., 2000, pp. 479–488.
  • [27] B. Julesz, “Visual pattern discrimination,” IRE transactions on Information Theory , vol. 8, no. 2, pp. 84–92, 1962.
  • [28] D. J. Heeger and J. R. Bergen, “Pyramid-based texture analysis/synthesis,” in Proceedings of the 22nd annual conference on Computer graphics and interactive techniques .   ACM, 1995, pp. 229–238.
  • [29] J. Portilla and E. P. Simoncelli, “A parametric texture model based on joint statistics of complex wavelet coefficients,” International journal of computer vision , vol. 40, no. 1, pp. 49–70, 2000.
  • [30] L. A. Gatys, A. S. Ecker, and M. Bethge, “Texture synthesis using convolutional neural networks,” in Advances in Neural Information Processing Systems , 2015, pp. 262–270.
  • [31] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556 , 2014.
  • [32] G. Berger and R. Memisevic, “Incorporating long-range consistency in cnn-based texture generation,” in International Conference on Learning Representations , 2017.







✏️ Neural Style Transfer: A Review

ycjing/Neural-Style-Transfer-Papers


Neural-Style-Transfer-Papers

Selected papers, corresponding codes and pre-trained models in our review paper "Neural Style Transfer: A Review" [arXiv Version] [IEEE Version]

The corresponding OSF repository can be found at: https://osf.io/f8tu4/ .

If I missed your paper in this review, please email me or just open a pull request here. I am more than happy to add it. Thanks!

If you find this repository useful for your research, please consider citing

Please also consider citing our ECCV paper and AAAI (Oral) paper:

There is a recent nice NST framework called pystiche , developed by Philip Meier . If you are interested, please refer to https://github.com/pmeier/pystiche . A package that comprises reference implementations of NST papers with pystiche can be found at pystiche_papers (work in progress).

[June, 2019] Update the Images (TVCG) (.png) and Supplementary Material (TVCG) in the Materials. You are warmly welcome to use the Images (TVCG) for the comparison results in your paper!

[May, 2019] Our paper Neural Style Transfer: A Review has been accepted by TVCG as a regular paper. This repository will be updated soon.

[July, 2018] Our paper Stroke Controllable Fast Style Transfer with Adaptive Receptive Fields has been accepted by ECCV 2018. Our review will be updated correspondingly.

[June, 2018] Upload a new version of our paper on arXiv which adds several missing papers (e.g., the work of Wang et al. ZM-Net: Real-time Zero-shot Image Manipulation Network ).

[Apr, 2018] We have released a new version of the paper with significant changes at https://arxiv.org/pdf/1705.04058.pdf. We appreciate the feedback!

[Feb, 2018] Update the Images (Images_neuralStyleTransferReview_v2) in the Materials . Add the results of Li et al.'s NIPS 2017 paper.

[Jan, 2018] Pre-trained models and all the content images , the style images , and the stylized results in the paper have been released.


Materials corresponding to Our Paper

✅ Supplementary Material (TVCG)

✅ Pre-trained Models

✅ Images (TVCG)(.png)

A Taxonomy of Current Methods

1. Image-Optimisation-Based Online Neural Methods

1.1. Parametric Neural Methods with Summary Statistics

✅ [ A Neural Algorithm of Artistic Style ] [Paper] (First Neural Style Transfer Paper)

  • Torch-based
  • TensorFlow-based
  • TensorFlow-based with L-BFGS optimizer support
  • Caffe-based
  • Keras-based
  • MXNet-based
  • MatConvNet-based

✅ [ Image Style Transfer Using Convolutional Neural Networks ] [Paper] (CVPR 2016)

✅ [ Incorporating Long-range Consistency in CNN-based Texture Generation ] [Paper] (ICLR 2017)

  •   Theano-based

✅ [ Laplacian-Steered Neural Style Transfer ] [Paper] (ACM MM 2017)

  • Torch-based & TensorFlow-based

✅ [ Demystifying Neural Style Transfer ] [Paper] (Theoretical Explanation) (IJCAI 2017)

  •   MXNet-based

✅ [ Stable and Controllable Neural Texture Synthesis and Style Transfer Using Histogram Losses ] [Paper]

1.2. Non-parametric Neural Methods with MRFs

✅ [ Combining Markov Random Fields and Convolutional Neural Networks for Image Synthesis ] [Paper] (CVPR 2016)

✅ [ Arbitrary Style Transfer with Deep Feature Reshuffle ] [Paper] (CVPR 2018)

2. Model-Optimisation-Based Offline Neural Methods

2.1. Per-Style-Per-Model Neural Methods

✅ [ Perceptual Losses for Real-Time Style Transfer and Super-Resolution ] [Paper] (ECCV 2016)

  • Chainer-based

❇️ Pre-trained Models:

  • Torch-models
  • Chainer-models

✅ [ Texture Networks: Feed-forward Synthesis of Textures and Stylized Images ] [Paper] (ICML 2016)

✅ [ Precomputed Real-Time Texture Synthesis with Markovian Generative Adversarial Networks ] [Paper] (ECCV 2016)

2.2. Multiple-Style-Per-Model Neural Methods

✅ [ A Learned Representation for Artistic Style ] [Paper] (ICLR 2017)

✅ [ Multi-style Generative Network for Real-time Transfer ] [Paper]   (arXiv, 03/2017)

  • PyTorch-based

✅ [ Diversified Texture Synthesis With Feed-Forward Networks ] [Paper] (CVPR 2017)

  •   Torch-based

✅ [ StyleBank: An Explicit Representation for Neural Image Style Transfer ] [Paper] (CVPR 2017)

2.3. Arbitrary-Style-Per-Model Neural Methods

✅ [ Fast Patch-based Style Transfer of Arbitrary Style ] [Paper]

✅ [ Exploring the Structure of a Real-time, Arbitrary Neural Artistic Stylization Network ] [Paper] (BMVC 2017)

✅ [ Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization ] [Paper] (ICCV 2017)

  • TensorFlow-based with Keras
  • TensorFlow-based without Keras

✅ [ Dynamic Instance Normalization for Arbitrary Style Transfer ] [Paper] (AAAI 2020)

✅ [ Universal Style Transfer via Feature Transforms ] [Paper] (NIPS 2017)

  • PyTorch-based #1
  • PyTorch-based #2

✅ [ Meta Networks for Neural Style Transfer ] [Paper] (CVPR 2018)

✅ [ ZM-Net: Real-time Zero-shot Image Manipulation Network ] [Paper]

✅ [ Avatar-Net: Multi-Scale Zero-Shot Style Transfer by Feature Decoration ] [Paper] (CVPR 2018)

✅ [ Learning Linear Transformations for Fast Arbitrary Style Transfer ] [Paper]

Improvements and Extensions

✅ [ Preserving Color in Neural Artistic Style Transfer ] [Paper]

✅ [ Controlling Perceptual Factors in Neural Style Transfer ] [Paper] (CVPR 2017)

✅ [ Content-Aware Neural Style Transfer ] [Paper]

✅ [ Towards Deep Style Transfer: A Content-Aware Perspective ] [Paper] (BMVC 2016)

✅ [ Neural Doodle_Semantic Style Transfer and Turning Two-Bit Doodles into Fine Artwork ] [Paper]

✅ [ Semantic Style Transfer and Turning Two-Bit Doodles into Fine Artwork ] [Paper]

✅ [ The Contextual Loss for Image Transformation with Non-Aligned Data ] [Paper] (ECCV 2018)

✅ [ Improved Texture Networks: Maximizing Quality and Diversity in Feed-forward Stylization and Texture Synthesis ] [Paper] (CVPR 2017)

✅ [ Instance Normalization: The Missing Ingredient for Fast Stylization ] [Paper]

✅ [ A Style-Aware Content Loss for Real-time HD Style Transfer ] [Paper] (ECCV 2018)

✅ [ Multimodal Transfer: A Hierarchical Deep Convolutional Neural Network for Fast Artistic Style Transfer ] [Paper] (CVPR 2017)

✅ [ Stroke Controllable Fast Style Transfer with Adaptive Receptive Fields ] [Paper] (ECCV 2018)

✅ [ Depth-Preserving Style Transfer ] [Paper]

✅ [ Depth-Aware Neural Style Transfer ] [Paper] (NPAR 2017)

✅ [ Neural Style Transfer: A Paradigm Shift for Image-based Artistic Rendering? ] [Paper] (NPAR 2017)

✅ [ Pictory: Combining Neural Style Transfer and Image Filtering ] [Paper] (ACM SIGGRAPH 2017 Appy Hour)

✅ [ Painting Style Transfer for Head Portraits Using Convolutional Neural Networks ] [Paper] (SIGGRAPH 2016)

✅ [ Son of Zorn's Lemma Targeted Style Transfer Using Instance-aware Semantic Segmentation ] [Paper] (ICASSP 2017)

✅ [ Style Transfer for Anime Sketches with Enhanced Residual U-net and Auxiliary Classifier GAN ] [Paper] (ACPR 2017)

✅ [ Artistic Style Transfer for Videos ] [Paper] (GCPR 2016)

✅ [ DeepMovie: Using Optical Flow and Deep Neural Networks to Stylize Movies ] [Paper]

✅ [ Characterizing and Improving Stability in Neural Style Transfer ] [Paper] (ICCV 2017)

✅ [ Coherent Online Video Style Transfer ] [Paper] (ICCV 2017)

✅ [ Real-Time Neural Style Transfer for Videos ] [Paper] (CVPR 2017)

✅ [ A Common Framework for Interactive Texture Transfer ] [Paper] (CVPR 2018)

✅ [ Deep Photo Style Transfer ] [Paper] (CVPR 2017)

✅ [ A Closed-form Solution to Photorealistic Image Stylization ] [Paper] (ECCV 2018)

✅ [ Photorealistic Style Transfer via Wavelet Transforms ] [Paper]

✅ [ Decoder Network Over Lightweight Reconstructed Feature for Fast Semantic Style Transfer ] [Paper] (ICCV 2017)

✅ [ Stereoscopic Neural Style Transfer ] [Paper] (CVPR 2018)

✅ [ Awesome Typography: Statistics-based Text Effects Transfer ] [Paper] (CVPR 2017)

  • Matlab-based

✅ [ Neural Font Style Transfer ] [Paper] (ICDAR 2017)

✅ [ Rewrite: Neural Style Transfer For Chinese Fonts ] [Project]

✅ [ Separating Style and Content for Generalized Style Transfer ] [Paper] (CVPR 2018)

✅ [ Visual Attribute Transfer through Deep Image Analogy ] [Paper] (SIGGRAPH 2017)

✅ [ Fashion Style Generator ] [Paper] (IJCAI 2017)

✅ [ Deep Painterly Harmonization ] [Paper]

✅ [ Fast Face-Swap Using Convolutional Neural Networks ] [Paper] (ICCV 2017)

✅ [ Learning Selfie-Friendly Abstraction from Artistic Style Images ] [Paper] (ACML 2018)

✅ [ Style Transfer with Adaptation to the Central Objects of the Scene ] [Paper] (NEUROINFORMATICS 2019)

Application

✅ AlterDraw

  • Website code

✅ Deep Forger

✅ NeuralStyler

✅ Style2Paints

Application Papers

✅ [ Bringing Impressionism to Life with Neural Style Transfer in Come Swim ] [Paper]

✅ [ Imaging Novecento. A Mobile App for Automatic Recognition of Artworks and Transfer of Artistic Styles ] [Paper]

✅ [ ProsumerFX: Mobile Design of Image Stylization Components ] [Paper]

✅ [ Pictory - Neural Style Transfer and Editing with coreML ] [Paper]

✅ [ Tiny Transform Net for Mobile Image Stylization ] [Paper] (ICMR 2017)

✅ [ Caffe2Go ][ https://code.facebook.com/posts/196146247499076/delivering-real-time-ai-in-the-palm-of-your-hand/ ]

✅ [ Supercharging Style Transfer ][ https://research.googleblog.com/2016/10/supercharging-style-transfer.html ]

✅ [ Issue of Layer Chosen Strategy ][ http://yongchengjing.com/pdf/Issue_layerChosenStrategy_neuralStyleTransfer.pdf ]

✅ [ Picking an optimizer for Style Transfer ][ https://blog.slavv.com/picking-an-optimizer-for-style-transfer-86e7b8cba84b ]

✅ [ Enhanced Color Style Transfer (Photo-surrealism Style Transfer) ] [Project]

✅ [ Conditional Fast Style Transfer Network ] [Paper]

✅ [ Unseen Style Transfer Based on a Conditional Fast Style Transfer Network ] [Paper]

✅ [ DeepStyleCam: A Real-time Style Transfer App on iOS ] [Paper]

✅ [ Deep Feature Rotation for Multimodal Image Style Transfer ] [Paper]   (NICS 2021)


Neural style transfer

This tutorial uses deep learning to compose one image in the style of another image (ever wish you could paint like Picasso or Van Gogh?). This is known as neural style transfer and the technique is outlined in A Neural Algorithm of Artistic Style (Gatys et al.).

For a simple application of style transfer with a pretrained model from TensorFlow Hub , check out the Fast style transfer for arbitrary styles tutorial that uses an arbitrary image stylization model . For an example of style transfer with TensorFlow Lite , refer to Artistic style transfer with TensorFlow Lite .

Neural style transfer is an optimization technique used to take two images—a content image and a style reference image (such as an artwork by a famous painter)—and blend them together so the output image looks like the content image, but “painted” in the style of the style reference image.

This is implemented by optimizing the output image to match the content statistics of the content image and the style statistics of the style reference image. These statistics are extracted from the images using a convolutional network.

For example, let’s take an image of this dog and Wassily Kandinsky's Composition 7:


Yellow Labrador Looking , from Wikimedia Commons by Elf . License CC BY-SA 3.0


Now, what would it look like if Kandinsky decided to paint the picture of this Dog exclusively with this style? Something like this?


Import and configure modules

Download images and choose a style image and a content image:

Visualize the input

Define a function to load an image and limit its maximum dimension to 512 pixels.

Create a simple function to display an image:
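A minimal sketch of these two helpers, following the descriptions above (TensorFlow and matplotlib are assumed to be available; content.jpg and style.jpg are hypothetical local paths standing in for the downloaded photos, so the published notebook's exact code may differ):

```python
import tensorflow as tf
import matplotlib.pyplot as plt

def load_img(path_to_img, max_dim=512):
    """Read an image, scale its longest side to at most max_dim pixels,
    and add a leading batch dimension."""
    img = tf.io.read_file(path_to_img)
    img = tf.image.decode_image(img, channels=3)
    img = tf.image.convert_image_dtype(img, tf.float32)

    shape = tf.cast(tf.shape(img)[:-1], tf.float32)
    scale = max_dim / tf.reduce_max(shape)
    new_shape = tf.cast(shape * scale, tf.int32)

    img = tf.image.resize(img, new_shape)
    return img[tf.newaxis, :]

def imshow(image, title=None):
    """Display a float image in [0, 1], squeezing away any batch axis."""
    if len(image.shape) > 3:
        image = tf.squeeze(image, axis=0)
    plt.imshow(image)
    if title:
        plt.title(title)

# Hypothetical file names for the downloaded content and style photos.
content_image = load_img('content.jpg')
style_image = load_img('style.jpg')
```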


Fast Style Transfer using TF-Hub

This tutorial demonstrates the original style-transfer algorithm, which optimizes the image content to a particular style. Before getting into the details, let's see how the TensorFlow Hub model does this:


Define content and style representations

Use the intermediate layers of the model to get the content and style representations of the image. Starting from the network's input layer, the first few layer activations represent low-level features like edges and textures. As you step through the network, the final few layers represent higher-level features—object parts like wheels or eyes . In this case, you are using the VGG19 network architecture, a pretrained image classification network. These intermediate layers are necessary to define the representation of content and style from the images. For an input image, try to match the corresponding style and content target representations at these intermediate layers.

Load a VGG19 and test run it on our image to ensure it's used correctly:

Now load a VGG19 without the classification head, and list the layer names

Choose intermediate layers from the network to represent the style and content of the image:
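Continuing the sketch above, one way to load the headless VGG19, inspect its layer names, and pick layers for content and style (this particular selection follows the usual Gatys-style choice and should be treated as an assumption):

```python
# VGG19 pretrained on ImageNet, without its classification head.
vgg = tf.keras.applications.VGG19(include_top=False, weights='imagenet')
for layer in vgg.layers:
    print(layer.name)

# One deeper layer for content, one early layer per block for style.
content_layers = ['block5_conv2']
style_layers = ['block1_conv1', 'block2_conv1', 'block3_conv1',
                'block4_conv1', 'block5_conv1']

num_content_layers = len(content_layers)
num_style_layers = len(style_layers)
```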

Intermediate layers for style and content

So why do these intermediate outputs within our pretrained image classification network allow us to define style and content representations?

At a high level, in order for a network to perform image classification (which this network has been trained to do), it must understand the image. This requires taking the raw image as input pixels and building an internal representation that converts the raw image pixels into a complex understanding of the features present within the image.

This is also a reason why convolutional neural networks are able to generalize well: they’re able to capture the invariances and defining features within classes (e.g. cats vs. dogs) that are agnostic to background noise and other nuisances. Thus, somewhere between where the raw image is fed into the model and the output classification label, the model serves as a complex feature extractor. By accessing intermediate layers of the model, you're able to describe the content and style of input images.

Build the model

The networks in tf.keras.applications are designed so you can easily extract the intermediate layer values using the Keras functional API.

To define a model using the functional API, specify the inputs and outputs:

model = Model(inputs, outputs)

The following function builds a VGG19 model that returns a list of intermediate layer outputs:

And to create the model:
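A sketch of such an extractor (the function name vgg_layers is illustrative; it reuses the style_layers list chosen above):

```python
def vgg_layers(layer_names):
    """Build a frozen VGG19 that returns the activations of the named layers."""
    vgg = tf.keras.applications.VGG19(include_top=False, weights='imagenet')
    vgg.trainable = False
    outputs = [vgg.get_layer(name).output for name in layer_names]
    return tf.keras.Model([vgg.input], outputs)

style_extractor = vgg_layers(style_layers)
```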

Calculate style

The content of an image is represented by the values of the intermediate feature maps.

It turns out, the style of an image can be described by the means and correlations across the different feature maps. Calculate a Gram matrix that includes this information by taking the outer product of the feature vector with itself at each location, and averaging that outer product over all locations. This Gram matrix can be calculated for a particular layer as:

\[G^l_{cd} = \frac{\sum_{ij} F^l_{ijc}(x)F^l_{ijd}(x)}{IJ}\]

This can be implemented concisely using the tf.linalg.einsum function:
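For example, a gram_matrix helper matching the formula above (a sketch; the published notebook may differ in details):

```python
def gram_matrix(input_tensor):
    # Outer product of the feature vectors at every spatial location (i, j),
    # summed over locations and divided by the number of locations I * J.
    result = tf.linalg.einsum('bijc,bijd->bcd', input_tensor, input_tensor)
    input_shape = tf.shape(input_tensor)
    num_locations = tf.cast(input_shape[1] * input_shape[2], tf.float32)
    return result / num_locations
```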

Extract style and content

Build a model that returns the style and content tensors.

When called on an image, this model returns the gram matrix (style) of the style_layers and content of the content_layers :
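One possible implementation of such a combined extractor, reusing the vgg_layers and gram_matrix helpers sketched earlier (the class name StyleContentModel is illustrative):

```python
class StyleContentModel(tf.keras.models.Model):
    """Returns {'style': {layer: gram matrix}, 'content': {layer: features}}."""

    def __init__(self, style_layers, content_layers):
        super().__init__()
        self.vgg = vgg_layers(style_layers + content_layers)
        self.style_layers = style_layers
        self.content_layers = content_layers
        self.num_style_layers = len(style_layers)
        self.vgg.trainable = False

    def call(self, inputs):
        # Expects float input in [0, 1]; VGG19 needs its own preprocessing.
        inputs = inputs * 255.0
        preprocessed = tf.keras.applications.vgg19.preprocess_input(inputs)
        outputs = self.vgg(preprocessed)
        style_outputs = outputs[:self.num_style_layers]
        content_outputs = outputs[self.num_style_layers:]

        style_outputs = [gram_matrix(out) for out in style_outputs]

        content_dict = dict(zip(self.content_layers, content_outputs))
        style_dict = dict(zip(self.style_layers, style_outputs))
        return {'content': content_dict, 'style': style_dict}

extractor = StyleContentModel(style_layers, content_layers)
```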

Run gradient descent

With this style and content extractor, you can now implement the style transfer algorithm. Do this by calculating the mean square error for your image's output relative to each target, then taking the weighted sum of these losses.

Set your style and content target values:

Define a tf.Variable to contain the image to optimize. To make this quick, initialize it with the content image (the tf.Variable must be the same shape as the content image):

Since this is a float image, define a function to keep the pixel values between 0 and 1:

Create an optimizer. The paper recommends LBFGS, but Adam works okay, too:
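Sketching these few steps together (the Adam hyperparameters are typical for this setup but should be treated as assumptions):

```python
# Fixed targets, computed once from the style and content photos.
style_targets = extractor(style_image)['style']
content_targets = extractor(content_image)['content']

# The image being optimized, initialized with the content photo.
image = tf.Variable(content_image)

def clip_0_1(image):
    """Keep the optimized image inside the valid [0, 1] range."""
    return tf.clip_by_value(image, clip_value_min=0.0, clip_value_max=1.0)

opt = tf.keras.optimizers.Adam(learning_rate=0.02, beta_1=0.99, epsilon=1e-1)
```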

To optimize this, use a weighted combination of the two losses to get the total loss:

Use tf.GradientTape to update the image.
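A sketch covering both steps, the weighted loss and the gradient update (the style_weight and content_weight values are illustrative and can be tuned):

```python
style_weight = 1e-2
content_weight = 1e4

def style_content_loss(outputs):
    style_outputs = outputs['style']
    content_outputs = outputs['content']

    style_loss = tf.add_n([tf.reduce_mean((style_outputs[name] - style_targets[name]) ** 2)
                           for name in style_outputs])
    style_loss *= style_weight / num_style_layers

    content_loss = tf.add_n([tf.reduce_mean((content_outputs[name] - content_targets[name]) ** 2)
                             for name in content_outputs])
    content_loss *= content_weight / num_content_layers
    return style_loss + content_loss

@tf.function
def train_step(image):
    with tf.GradientTape() as tape:
        outputs = extractor(image)
        loss = style_content_loss(outputs)

    grad = tape.gradient(loss, image)
    opt.apply_gradients([(grad, image)])
    image.assign(clip_0_1(image))
```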

Now run a few steps to test:


Since it's working, perform a longer optimization:


Total variation loss

One downside to this basic implementation is that it produces a lot of high frequency artifacts. Decrease these using an explicit regularization term on the high frequency components of the image. In style transfer, this is often called the total variation loss :


This shows how the high frequency components have increased.

Also, this high frequency component is basically an edge-detector. You can get similar output from the Sobel edge detector, for example:


The regularization loss associated with this is the sum of the squares of the values:

That demonstrated what it does. But there's no need to implement it yourself; TensorFlow includes a standard implementation:
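For instance (tf.image.total_variation returns one scalar per image in the batch, computed from the absolute differences between neighbouring pixel values):

```python
tv = tf.image.total_variation(image)
print(tv.numpy())
```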

Re-run the optimization

Choose a weight for the total_variation_loss :

Now include it in the train_step function:
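A sketch of the regularized training step; the weight of 30 is a commonly used starting point and should be treated as a tunable assumption:

```python
total_variation_weight = 30

@tf.function
def train_step(image):
    with tf.GradientTape() as tape:
        outputs = extractor(image)
        loss = style_content_loss(outputs)
        loss += total_variation_weight * tf.image.total_variation(image)

    grad = tape.gradient(loss, image)
    opt.apply_gradients([(grad, image)])
    image.assign(clip_0_1(image))
```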

Reinitialize the image-variable and the optimizer:

And run the optimization:
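Continuing the sketch (the epoch and step counts below are illustrative):

```python
# Reset the optimizer state and start again from the content photo.
opt = tf.keras.optimizers.Adam(learning_rate=0.02, beta_1=0.99, epsilon=1e-1)
image = tf.Variable(content_image)

epochs = 10
steps_per_epoch = 100
for epoch in range(epochs):
    for _ in range(steps_per_epoch):
        train_step(image)
    print('Finished epoch', epoch + 1)
```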


Finally, save the result:
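For example, converting the tensor back to an 8-bit image with PIL (the helper name tensor_to_image and the output file name are illustrative):

```python
import numpy as np
from PIL import Image

def tensor_to_image(tensor):
    """Convert a [0, 1] float tensor with a leading batch axis to a PIL image."""
    array = np.array(tensor * 255, dtype=np.uint8)
    if array.ndim > 3:
        array = array[0]
    return Image.fromarray(array)

tensor_to_image(image).save('stylized-image.png')
```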

This tutorial demonstrates the original style-transfer algorithm. For a simple application of style transfer check out this tutorial to learn more about how to use the arbitrary image style transfer model from TensorFlow Hub .




Enhancing Offline Signature Verification via Transfer Learning and Deep Neural Networks

  • Original Paper
  • Published: 19 August 2024
  • Volume 9, article number 4 (2024)


  • S. Singh 1 ,
  • S. Chandra 1 &
  • Agya Ram Verma   ORCID: orcid.org/0000-0003-3139-7103 1 , 2  


This paper presents a brief overview of signature identification and verification systems based on transfer learning. Different databases, namely CEDAR, ICDAR-2011, and BHSig260, are utilized for this study. In the field of biometrics and forensics, automated signature verification plays a crucial role in validating a person's authenticity. The signature can be offline (handwritten) or online (digital). This study mainly focuses on offline signatures forged by skilled forgers, because offline systems lack the dynamic information, such as pressure and velocity, that is available in online systems. The offline signatures are analyzed with pretrained models, and their efficiency is evaluated on two critical metrics in the field of biometrics and security systems, namely the false acceptance rate (FAR) and the false rejection rate (FRR). The InceptionV3 model gives the highest accuracy of 99.10% and the lowest FRR and FAR of 1.03% and 0.74%, respectively.
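The paper's own pipeline, datasets, and hyperparameters are not reproduced here, but the general recipe the abstract describes, a frozen pretrained backbone with a small verification head plus FAR/FRR evaluation, can be sketched in Keras roughly as follows (all layer sizes, the 0.5 threshold, and the binary genuine-vs-forged framing are assumptions made only for illustration):

```python
import numpy as np
import tensorflow as tf

# Frozen pretrained InceptionV3 backbone with a small binary head
# (label 1 = genuine signature, label 0 = forgery). Data loading,
# pairing, and preprocessing are application specific and omitted.
base = tf.keras.applications.InceptionV3(include_top=False, weights='imagenet',
                                         input_shape=(299, 299, 3), pooling='avg')
base.trainable = False  # transfer learning: reuse the ImageNet features

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

def far_frr(y_true, y_score, threshold=0.5):
    """False acceptance rate (forgeries accepted as genuine) and
    false rejection rate (genuine signatures rejected)."""
    y_true = np.asarray(y_true).astype(bool)
    accept = np.asarray(y_score) >= threshold
    far = np.sum(accept & ~y_true) / max(np.sum(~y_true), 1)
    frr = np.sum(~accept & y_true) / max(np.sum(y_true), 1)
    return far, frr
```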


Data Availability

The study relied on the data provided in [ 16 , 17 ].

References

Naz S, Bibi K, Ahmad R (2022) DeepSignature: fine-tuned transfer learning based signature verification system. Multimed Tools Appl 81(26):38113–38122


Poddar J, Parikh V, Bharti SK (2020) Offline signature recognition and forgery detection using deep learning. Proced Comput Sci 170:610–617

Vohra K (2021) Signature verification using support vector machine and convolution neural network. Turk J Comput Math Educ (TURCOMAT) 12(1S):80–89

Agarwal R, Verma OP (2020) An efficient copy move forgery detection using deep learning feature extraction and matching algorithm. Multimed Tools Appl 79:7355–7376. https://doi.org/10.1007/s11042-019-08495-z

Hafemann LG, Sabourin R, Oliveira LS (2020) Meta-learning for fast classifier adaptation to new users of signature verification systems. IEEE Trans Inf Forensics Secur 15:1735–1745. https://doi.org/10.1109/TIFS.2019.2949425

Ghosh R (2021) A recurrent neural network based deep learning model for offline signature verification and recognition system. Expert Syst Appl 168:114249

Ghosh S, Ghosh S, Kumar P, Scheme E, Roy PP (2021) A novel spatio-temporal Siamese network for 3D signature recognition. Pattern Recogn Lett 144:13–20

Hameed MM, Ahmad R, Kiah MLM, Murtaza G (2021) Machine learning-based offline signature verification systems: a systematic review. Signal Process: Image Commun 93:116139


Foroozandeh A, Hemmat AA, Rabbani H (2020). Offline handwritten signature verification and recognition based on deep transfer learning. In 2020 International conference on machine vision and image processing (MVIP) (pp. 1–7). IEEE

Alsuhimat FM, Mohamad FS (2023) A hybrid method of feature extraction for signatures verification using CNN and HOG a multi-classification approach. IEEE Access 11:21873–21882

Prajapati PR, Poudel S, Baduwal M, Burlakoti S, Panday SP (2021) Signature verification using convolutional neural network and autoencoder. J Inst Eng 16(1):33

Xia Z, Shi T, Xiong NN, Sun X, Jeon B (2018) A privacy-preserving handwritten signature verification method using combinational features and secure kNN. IEEE Access 6:46695–46705

Mshir S, Kaya M (2020). Signature recognition using machine learning. In 2020 8th International symposium on digital forensics and security (ISDFS) (pp. 1–4). IEEE

Okawa M (2019) Template matching using time-series averaging and DTW with dependent warping for online signature verification. IEEE Access 7:81010–81019

Lai S, Jin L, Yang W (2017). Online signature verification using recurrent neural network and length-normalized path signature descriptor. In 2017 14th IAPR international conference on document analysis and recognition (ICDAR) (Vol. 1, pp. 400–405). IEEE

CEDAR Signature Dataset | Papers With Code

Handwritten Signature Datasets (kaggle.com)

Verma AR, Chandra S, Singh GK et al (2023) ECG data compression using of empirical wavelet transform for telemedicine and e-healthcare systems. Augment Hum Res 8:2. https://doi.org/10.1007/s41133-023-00063-3


Author information

Authors and affiliations.

Department of Electronics and Communication Engineering, IIIT Allahabad, Prayagraj, India

S. Singh, S. Chandra & Agya Ram Verma

Department of Electronics and Communication Engineering, Govind Ballabh Pant Institute of Engineering and Technology, Pauri, India

Agya Ram Verma


Corresponding author

Correspondence to Agya Ram Verma .

Ethics declarations

Conflict of interest.

No conflict of interest.


About this article

Singh, S., Chandra, S. & Verma, A.R. Enhancing Offline Signature Verification via Transfer Learning and Deep Neural Networks. Augment Hum Res 9 , 4 (2024). https://doi.org/10.1007/s41133-024-00069-5

Download citation

Received : 05 April 2024

Revised : 05 August 2024

Accepted : 08 August 2024

Published : 19 August 2024

DOI : https://doi.org/10.1007/s41133-024-00069-5


  • Handwritten signature
  • Transfer learning
  • Signature verification


electronics-logo

Article Menu

  • Subscribe SciFeed
  • Recommended Articles
  • Google Scholar
  • on Google Scholar
  • Table of Contents

Find support for a specific problem in the support section of our website.

Please let us know what you think of our products and services.

Visit our dedicated information section to learn more about MDPI.

JSmol Viewer

Bi-Level Orthogonal Multi-Teacher Distillation


1. Introduction

  • Our work introduces a novel BOMD approach that combines orthogonal projections and bi-level optimization for effective knowledge transfer from an ensemble of diverse teacher models.
  • A key component of our BOMD method is the use of bi-level optimization to learn optimal weighting factors for combining knowledge from multiple teachers. Unlike heuristic weighting strategies, our approach treats the weighting factors as upper-level variables and the student's parameters as lower-level variables in a nested optimization problem (a toy sketch of such a nested update appears after this list).
  • Through extensive experiments on benchmark datasets, we validate the effectiveness and flexibility of our BOMD approach. Our method achieves state-of-the-art performance on the CIFAR-100 benchmark for multi-teacher knowledge distillation, consistently outperforming existing approaches across diverse teacher–student scenarios, including homogeneous and heterogeneous teacher ensembles.
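
Because the bullet above only states that the weighting factors sit at the upper level of a nested problem, a small PyTorch sketch may help make the mechanics concrete. It is an assumption-laden illustration rather than the paper's code: `student`, `teachers`, `alpha` (a learnable vector of per-teacher logits tracked by `opt_alpha`), the single-step unrolling, and the plain-SGD lower level are placeholders chosen for brevity.

```python
# A minimal, illustrative sketch -- not the authors' released implementation -- of a
# one-step unrolled bi-level update: per-teacher weighting factors (upper level) are
# tuned on a held-out batch by differentiating through a single SGD step of the
# student (lower level). Requires PyTorch >= 2.0 for torch.func.functional_call.
import torch
import torch.nn.functional as F
from torch.func import functional_call


def kd_loss(student_logits, teacher_logits, T=4.0):
    # Temperature-scaled KL divergence used as the per-teacher distillation term.
    return F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)


def train_loss(student, params, teachers, alpha, x, y):
    # Lower-level objective: cross-entropy plus a weighted sum of KD terms,
    # with the weighting factors kept on the simplex via a softmax.
    w = torch.softmax(alpha, dim=0)
    s_logits = functional_call(student, params, (x,))
    loss = F.cross_entropy(s_logits, y)
    for w_k, teacher in zip(w, teachers):
        with torch.no_grad():
            t_logits = teacher(x)
        loss = loss + w_k * kd_loss(s_logits, t_logits)
    return loss


def bilevel_step(student, teachers, alpha, opt_alpha, lr_student,
                 train_batch, val_batch):
    (x_tr, y_tr), (x_val, y_val) = train_batch, val_batch
    params = dict(student.named_parameters())

    # Lower level: one differentiable SGD step of the student on the training loss.
    loss_tr = train_loss(student, params, teachers, alpha, x_tr, y_tr)
    grads = torch.autograd.grad(loss_tr, list(params.values()), create_graph=True)
    unrolled = {n: p - lr_student * g for (n, p), g in zip(params.items(), grads)}

    # Upper level: evaluate the unrolled student on held-out data; the gradient
    # reaches alpha through the unrolled update above.
    loss_val = F.cross_entropy(functional_call(student, unrolled, (x_val,)), y_val)
    opt_alpha.zero_grad()
    loss_val.backward()
    opt_alpha.step()

    # Commit an ordinary (non-differentiable) student update with the current weights.
    with torch.no_grad():
        for p, g in zip(params.values(), grads):
            p -= lr_student * g
```

The one-step unrolling is only an approximation of the full nested problem; the actual optimization schedule used in the paper may differ.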

2. Related Work

2.1. Knowledge Distillation
2.2. Multi-Teacher Knowledge Distillation
2.3. Difference of Our Method vs. Existing Methods
3. Bi-Level Orthogonal Multi-Teacher Distillation
3.1. Multi-Teacher Feature-Based Distillation
3.2. Multi-Teacher Logit-Based Distillation
3.3. Multiple Orthogonal Projections
3.4. Benefits and Limitations
3.5. Bi-Level Optimization for Weighting Factors
4. Experiments
4.1. Datasets and Implementation Details
4.2. Settings and Hyperparameters
4.3. Experimental Framework and Devices
5. Experiment Results
5.1. Distillation Performance of Multi-Teacher KD Methods on CIFAR-100
5.2. Compared to Single-Teacher Methods
5.3. Distillation Performance on Large-Scale Datasets
5.4. Results on CIFAR-100 with Three Teachers
5.5. Results on CIFAR-100 with Five Teachers
5.6. Advantages over Other Methods
5.7. Analysis of Our Method
5.8. Limitations of Our Method
6. Conclusions

  • Align teacher features with the student's feature space through orthogonal projections, preserving structural properties during knowledge transfer (see the projection sketch after this list).
  • Optimize weighting factors for combining teacher knowledge using a principled bi-level optimization approach.
  • Achieve significant performance improvements even when distilling to very compact student models.
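
The first bullet above mentions orthogonal projections that map teacher features into the student's feature space while preserving structural relations. The sketch below is one plausible way to realise that idea and is not taken from the paper: the projector module, its QR-based re-orthogonalisation, the MSE alignment loss, and all dimensions are illustrative assumptions (it also assumes the teacher feature dimension is at least the student's).

```python
# Minimal sketch (an assumption, not the paper's exact formulation) of aligning
# teacher features to the student's feature space with a semi-orthogonal projection:
# the projection has orthonormal columns, so relations within the projected subspace
# are preserved rather than arbitrarily rescaled.
import torch
import torch.nn as nn
import torch.nn.functional as F


class OrthogonalProjector(nn.Module):
    def __init__(self, teacher_dim: int, student_dim: int):
        super().__init__()
        w = torch.empty(student_dim, teacher_dim)
        nn.init.orthogonal_(w)            # start from a semi-orthogonal matrix
        self.weight = nn.Parameter(w)

    def forward(self, teacher_feat: torch.Tensor) -> torch.Tensor:
        # Re-orthogonalise via QR so the constraint survives gradient updates.
        q, _ = torch.linalg.qr(self.weight.t())   # (teacher_dim, student_dim), orthonormal columns
        return teacher_feat @ q                   # project into the student's feature space


def feature_alignment_loss(student_feat, teacher_feats, projectors):
    # Mean-squared error between the student's features and each projected teacher.
    losses = [F.mse_loss(student_feat, proj(t_feat))
              for proj, t_feat in zip(projectors, teacher_feats)]
    return torch.stack(losses).mean()


# Usage with dummy tensors (batch of 8, teacher dims 512/640, student dim 256):
projectors = nn.ModuleList([OrthogonalProjector(512, 256),
                            OrthogonalProjector(640, 256)])
s_feat = torch.randn(8, 256)
t_feats = [torch.randn(8, 512), torch.randn(8, 640)]
print(feature_alignment_loss(s_feat, t_feats, projectors))
```

Per-teacher alignment losses like this would then be combined with the learned weighting factors from the bi-level step sketched earlier.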

Limitations and Future Work

Author Contributions

Data Availability Statement

Conflicts of Interest

  • Dong, P.; Niu, X.; Li, L.; Xie, L.; Zou, W.; Ye, T.; Wei, Z.; Pan, H. Prior-Guided One-shot Neural Architecture Search. arXiv 2022, arXiv:2206.13329.
  • Dong, P.; Li, L.; Wei, Z.; Niu, X.; Tian, Z.; Pan, H. EMQ: Evolving Training-free Proxies for Automated Mixed Precision Quantization. In Proceedings of the International Conference on Computer Vision (ICCV), Paris, France, 4–6 October 2023.
  • Zhu, C.; Li, L.; Wu, Y.; Sun, Z. Saswot: Real-time semantic segmentation architecture search without training. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 20–27 February 2024; Volume 38, pp. 7722–7730.
  • Wei, Z.; Dong, P.; Hui, Z.; Li, A.; Li, L.; Lu, M.; Pan, H.; Li, D. Auto-prox: Training-free vision transformer architecture search via automatic proxy discovery. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 20–27 February 2024; Volume 38, pp. 15814–15822.
  • Wei, Z.; Pan, H.; Li, L.; Dong, P.; Tian, Z.; Niu, X.; Li, D. TVT: Training-Free Vision Transformer Search on Tiny Datasets. arXiv 2023, arXiv:2311.14337.
  • Lu, L.; Chen, Z.; Lu, X.; Rao, Y.; Li, L.; Pang, S. UniADS: Universal Architecture-Distiller Search for Distillation Gap. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 20–27 February 2024.
  • Dong, P.; Li, L.; Pan, X.; Wei, Z.; Liu, X.; Wang, Q.; Chu, X. ParZC: Parametric Zero-Cost Proxies for Efficient NAS. arXiv 2024, arXiv:2402.02105.
  • Hinton, G.; Vinyals, O.; Dean, J. Distilling the Knowledge in a Neural Network. arXiv 2015, arXiv:1503.02531.
  • Fukuda, T.; Suzuki, M.; Kurata, G.; Thomas, S.; Cui, J.; Ramabhadran, B. Efficient Knowledge Distillation from an Ensemble of Teachers. In Proceedings of the Interspeech, Stockholm, Sweden, 20–24 August 2017; pp. 3697–3701.
  • Chen, D.; Mei, J.P.; Wang, C.; Feng, Y.; Chen, C. Online knowledge distillation with diverse peers. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 3430–3437.
  • Zhang, H.; Chen, D.; Wang, C. Confidence-aware multi-teacher knowledge distillation. In Proceedings of the ICASSP 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, 23–27 May 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 4498–4502.
  • Kwon, K.; Na, H.; Lee, H.; Kim, N.S. Adaptive knowledge distillation based on entropy. In Proceedings of the ICASSP 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 7409–7413.
  • Du, S.; You, S.; Li, X.; Wu, J.; Wang, F.; Qian, C.; Zhang, C. Agree to disagree: Adaptive ensemble knowledge distillation in gradient space. Adv. Neural Inf. Process. Syst. 2020, 33, 12345–12355.
  • Li, L. Self-regulated feature learning via teacher-free feature distillation. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; Springer: Cham, Switzerland, 2022; pp. 347–363.
  • Dong, P.; Li, L.; Wei, Z. Diswot: Student architecture search for distillation without training. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 11898–11908.
  • Liu, X.; Li, L.; Li, C.; Yao, A. Norm: Knowledge distillation via n-to-one representation matching. arXiv 2023, arXiv:2305.13803.
  • Li, L.; Liang, S.N.; Yang, Y.; Jin, Z. Teacher-free distillation via regularizing intermediate representation. In Proceedings of the 2022 International Joint Conference on Neural Networks (IJCNN), Padua, Italy, 18–23 July 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1–6.
  • Li, L.; Dong, P.; Wei, Z.; Yang, Y. Automated knowledge distillation via monte carlo tree search. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–3 October 2023; pp. 17413–17424.
  • Li, L.; Dong, P.; Li, A.; Wei, Z.; Yang, Y. Kd-zero: Evolving knowledge distiller for any teacher–student pairs. Adv. Neural Inf. Process. Syst. 2023, 36, 69490–69504.
  • Buciluǎ, C.; Caruana, R.; Niculescu-Mizil, A. Model compression. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Philadelphia, PA, USA, 20–23 August 2006; pp. 535–541.
  • Romero, A.; Ballas, N.; Kahou, S.E.; Chassang, A.; Gatta, C.; Bengio, Y. Fitnets: Hints for thin deep nets. arXiv 2014, arXiv:1412.6550.
  • Yim, J.; Joo, D.; Bae, J.; Kim, J. A gift from knowledge distillation: Fast optimization, network minimization and transfer learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4133–4141.
  • Zagoruyko, S.; Komodakis, N. Paying more attention to attention: Improving the performance of convolutional neural networks via attention transfer. arXiv 2016, arXiv:1612.03928.
  • Tian, Y.; Krishnan, D.; Isola, P. Contrastive Representation Distillation. In Proceedings of the International Conference on Learning Representations, Addis Ababa, Ethiopia, 30 April 2020.
  • Yuan, F.; Shou, L.; Pei, J.; Lin, W.; Gong, M.; Fu, Y.; Jiang, D. Reinforced multi-teacher selection for knowledge distillation. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021; Volume 35, pp. 14284–14291.
  • Ahn, S.; Hu, S.X.; Damianou, A.; Lawrence, N.D.; Dai, Z. Variational information distillation for knowledge transfer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 9163–9171.
  • Yang, J.; Martinez, B.; Bulat, A.; Tzimiropoulos, G. Knowledge distillation via softmax regression representation learning. In Proceedings of the International Conference on Learning Representations (ICLR), Virtual Event, Austria, 3–7 May 2021.
  • Chen, D.; Mei, J.; Zhang, Y.; Wang, C.; Wang, Z.; Feng, Y.; Chen, C. Cross-Layer Distillation with Semantic Calibration. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021; pp. 7028–7036.


Table: Distillation performance of multi-teacher KD methods on CIFAR-100 (accuracy, %).

| Teacher | VGG13 | ResNet32x4 | ResNet32x4 | WRN-40-2 | WRN-40-2 | ResNet20x4 | ARI (%) |
|---|---|---|---|---|---|---|---|
| Teacher acc. | 75.17 ± 0.18 | 79.31 ± 0.14 | 79.31 ± 0.14 | 76.62 ± 0.26 | 76.62 ± 0.26 | 78.632 ± 0.24 | |
| Ensemble | 77.07 | 81.16 | 81.16 | 79.62 | 79.62 | 80.81 | |
| Student | VGG8 | MobileNetV2 | VGG8 | MobileNetV2 | WRN-40-1 | ShuffleNetV1 | / |
| Student acc. | 70.74 ± 0.40 | 65.64 ± 0.19 | 70.74 ± 0.40 | 65.64 ± 0.19 | 71.93 ± 0.22 | 71.70 ± 0.43 | |
| AVER [ ] | 73.98 ± 0.13 | 68.42 ± 0.06 | 73.23 ± 0.35 | 69.67 ± 0.01 | 74.56 ± 0.13 | 75.73 ± 0.02 | 49.97 |
| AEKD-logits [ ] | 73.82 ± 0.09 | 68.39 ± 0.13 | 73.22 ± 0.29 | 69.56 ± 0.34 | 74.18 ± 0.25 | 75.93 ± 0.32 | 54.87 |
| FitNet-MKD [ ] | 74.05 ± 0.07 | 68.46 ± 0.49 | 73.24 ± 0.24 | 69.29 ± 0.42 | 74.95 ± 0.30 | 75.98 ± 0.06 | 46.97 |
| AEKD-feature [ ] | 73.99 ± 0.15 | 68.18 ± 0.06 | 73.38 ± 0.16 | 69.44 ± 0.25 | 74.96 ± 0.18 | 76.86 ± 0.03 | 43.16 |
| CA-MKD [ ] | 74.27 ± 0.16 | 69.19 ± 0.04 | 75.08 ± 0.07 | 70.87 ± 0.14 | 75.27 ± 0.21 | 77.19 ± 0.49 | 11.98 |
| BOMD | 74.90 ± 0.07 | 69.88 ± 0.04 | 75.86 ± 0.18 | 71.56 ± 0.03 | 75.78 ± 0.11 | 77.98 ± 0.35 | / |
Table: Comparison with single-teacher distillation methods on CIFAR-100 (accuracy, %).

| Teacher | ResNet32x4 | WRN-40-2 | WRN-40-2 |
|---|---|---|---|
| Teacher acc. | 79.31 ± 0.14 | 76.62 ± 0.26 | 76.62 ± 0.26 |
| Student | MobileNetV2 | MobileNetV2 | WRN-40-1 |
| Student acc. | 65.64 ± 0.19 | 65.64 ± 0.19 | 71.93 ± 0.22 |
| KD [ ] | 67.57 ± 0.10 | 69.31 ± 0.20 | 74.22 ± 0.09 |
| AT [ ] | 67.38 ± 0.21 | 69.18 ± 0.37 | 74.83 ± 0.15 |
| VID [ ] | 67.78 ± 0.13 | 68.57 ± 0.11 | 74.37 ± 0.22 |
| CRD [ ] | 69.04 ± 0.16 | 70.14 ± 0.06 | 74.82 ± 0.06 |
| SRRL [ ] | 68.77 ± 0.06 | 69.44 ± 0.13 | 74.60 ± 0.04 |
| SemCKD [ ] | 68.86 ± 0.26 | 69.61 ± 0.05 | 74.41 ± 0.16 |
| BOMD | 69.89 ± 0.12 | 71.45 ± 0.12 | 75.76 ± 0.15 |
Table: Distillation performance on large-scale datasets (Stanford Dogs and Tiny-ImageNet; accuracy, %).

| Dataset | Stanford Dogs | Stanford Dogs | Tiny-ImageNet | Tiny-ImageNet |
|---|---|---|---|---|
| Teacher | ResNet101 | ResNet34x4 | ResNet32x4 | VGG13 |
| Teacher acc. | 68.39 ± 1.44 | 66.07 ± 0.51 | 53.38 ± 0.11 | 49.17 ± 0.33 |
| Student | ShuffleNetV2x0.5 | ShuffleNetV2x0.5 | MobileNetV2 | MobileNetV2 |
| Student acc. | 59.36 ± 0.73 | 59.36 ± 0.73 | 39.46 ± 0.38 | 39.46 ± 0.38 |
| AVER [ ] | 65.13 ± 0.13 | 63.46 ± 0.21 | 41.78 ± 0.15 | 41.87 ± 0.11 |
| EBKD [ ] | 64.28 ± 0.13 | 64.19 ± 0.11 | 41.24 ± 0.11 | 41.46 ± 0.24 |
| CA-MKD [ ] | 64.09 ± 0.35 | 64.28 ± 0.20 | 43.90 ± 0.09 | 42.65 ± 0.05 |
| AEKD-feature [ ] | 64.91 ± 0.21 | 62.13 ± 0.29 | 42.03 ± 0.12 | 41.56 ± 0.14 |
| AEKD-logits [ ] | 65.18 ± 0.24 | 63.97 ± 0.14 | 41.46 ± 0.28 | 41.19 ± 0.23 |
| BOMD | 65.54 ± 0.12 | 64.67 ± 0.18 | 44.21 ± 0.04 | 44.35 ± 0.12 |
Table: Results on CIFAR-100 with three teachers (accuracy, %); teacher accuracies in parentheses.

| | Ensemble 1 | Ensemble 2 | Ensemble 3 |
|---|---|---|---|
| Teacher 1 | ResNet56 (73.47) | ResNet8 (59.32) | VGG11 (71.52) |
| Teacher 2 | ResNet20x4 (78.39) | WRN-40-2 (76.51) | VGG13 (75.19) |
| Teacher 3 | VGG13 (75.19) | ResNet20x4 (78.39) | ResNet32x4 (79.31) |
| Student | VGG8 (70.74 ± 0.40) | ResNet8x4 (72.79 ± 0.14) | VGG8 (70.74 ± 0.40) |
| FitNet-MKD [ ] | 75.06 ± 0.13 | 75.21 ± 0.12 | 73.43 ± 0.08 |
| AVER [ ] | 75.11 ± 0.57 | 75.16 ± 0.11 | 73.59 ± 0.06 |
| EBKD [ ] | 74.18 ± 0.22 | 75.44 ± 0.29 | 73.45 ± 0.08 |
| AEKD-feature [ ] | 74.69 ± 0.57 | 73.98 ± 0.18 | 73.40 ± 0.06 |
| AEKD-logits [ ] | 75.17 ± 0.30 | 73.93 ± 0.17 | 74.15 ± 0.08 |
| CA-MKD [ ] | 75.53 ± 0.14 | 75.27 ± 0.18 | 74.63 ± 0.17 |
| BOMD | 76.42 ± 0.15 | 76.49 ± 0.14 | 75.98 ± 0.14 |
Table: Results on CIFAR-100 with five teachers (accuracy, %); teacher accuracies in parentheses.

| | Ensemble 1 | Ensemble 2 | Ensemble 3 |
|---|---|---|---|
| Teacher 1 | ResNet8 (59.32) | VGG11 (71.52) | ResNet8 (59.32) |
| Teacher 2 | VGG11 (71.52) | ResNet56 (73.47) | VGG11 (71.52) |
| Teacher 3 | ResNet56 (73.47) | VGG13 (75.19) | VGG13 (75.19) |
| Teacher 4 | VGG13 (75.19) | ResNet20x4 (78.39) | WRN-40-2 (76.51) |
| Teacher 5 | ResNet32x4 (79.31) | ResNet32x4 (79.31) | ResNet20x4 (78.39) |
| Student | VGG8 (70.74 ± 0.40) | VGG8 (70.74 ± 0.40) | MobileNetV2 (65.64 ± 0.19) |
| AEKD-feature [ ] | 74.02 ± 0.08 | 75.06 ± 0.03 | 69.41 ± 0.21 |
| AVER [ ] | 74.47 ± 0.47 | 74.48 ± 0.12 | 69.41 ± 0.04 |
| AEKD-logits [ ] | 73.53 ± 0.10 | 74.90 ± 0.17 | 69.28 ± 0.21 |
| EBKD [ ] | 74.37 ± 0.07 | 73.94 ± 0.29 | 69.26 ± 0.64 |
| CA-MKD [ ] | 74.64 ± 0.23 | 75.02 ± 0.21 | 70.30 ± 0.51 |
| BOMD | 75.56 ± 0.34 | 75.32 ± 0.13 | 71.46 ± 0.26 |

Share and Cite

Gong, S.; Wen, W. Bi-Level Orthogonal Multi-Teacher Distillation. Electronics 2024, 13, 3345. https://doi.org/10.3390/electronics13163345


