Open Access

Peer-reviewed

Research Article

Measuring novelty in science with word embedding

Sotaro Shibayama, Deyun Yin, Kuniko Matsumoto

Sotaro Shibayama (corresponding author; E-mail: [email protected]). Affiliations: School of Economics and Management, Lund University, Lund, Sweden; Institute for Future Initiative, The University of Tokyo, Tokyo, Japan; National Institute of Science and Technology Policy, Tokyo, Japan. Roles: Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Validation, Writing – original draft, Writing – review & editing.

Deyun Yin. Affiliations: School of Economics and Management, Harbin Institute of Technology, Shenzhen, China; World Intellectual Property Organization, Geneva, Switzerland. Roles: Data curation, Writing – original draft, Writing – review & editing.

Kuniko Matsumoto. Affiliation: National Institute of Science and Technology Policy, Tokyo, Japan. Roles: Data curation, Formal analysis, Writing – review & editing.

  • Published: July 2, 2021
  • https://doi.org/10.1371/journal.pone.0254034

Abstract

Novelty is a core value in science, and a reliable measurement of novelty is crucial. This study proposes a new approach to measuring the novelty of scientific articles based on both citation data and text data. The proposed approach considers an article to be novel if it cites a combination of semantically distant references. To this end, we first assign a word embedding (a vector representation built from the embeddings of individual words) to each cited reference on the basis of the text information included in the reference. With these vectors, the distance between every pair of references is computed. Finally, the novelty of a focal document is evaluated by summarizing the distances between all references. The approach draws on limited text information (the titles of references) and a publicly shared library of word embeddings, which minimizes the requirements for data access and computational cost. We share the code, with which one can compute the novelty score of a document of interest from the focal document’s reference list alone. We validate the proposed measure through three exercises. First, we confirm that word embeddings can be used to quantify semantic distances between documents by comparing them with an established bibliometric distance measure. Second, we confirm the criterion-related validity of the proposed novelty measure against self-reported novelty scores collected from a questionnaire survey. Finally, as novelty is known to be correlated with future citation impact, we confirm that the proposed measure can predict future citations.

Citation: Shibayama S, Yin D, Matsumoto K (2021) Measuring novelty in science with word embedding. PLoS ONE 16(7): e0254034. https://doi.org/10.1371/journal.pone.0254034

Editor: Alessandro Muscio, Università degli Studi di Foggia, ITALY

Received: February 15, 2021; Accepted: June 17, 2021; Published: July 2, 2021

Copyright: © 2021 Shibayama et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All relevant data are within the paper and its Supporting Information files.

Funding: S.S. received research grants from the Lars Erik Lundberg Foundation (https://www.lundbergsstiftelserna.se) and the Japan Society for the Promotion of Science (19K01830, https://www.jsps.go.jp/english/index.html). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Novelty constitutes a core value in science, as new discoveries shape the basis of scientific advancement [ 1 , 2 ] and have a broader impact on technological innovation [ 3 ]. Accordingly, novelty serves as a key criterion in the evaluation of scientific output as well as in decision-making on funding allocation, employment, and scientific awards [ 1 , 4 – 6 ]. It is therefore critical that scientific novelty can be reliably measured. In practice, novelty is usually assessed through peer review on a small scale [ 7 ], while evaluating novelty on a larger scale remains a challenge. Though recent bibliometric techniques have enabled us to measure various qualities of scientific discoveries, including novelty [ 8 – 11 ], the validity and practical utility of the extant measures are debatable [ 12 , 13 ].

Previous bibliometric measures of the novelty of scientific documents draw on roughly two data sources: citation data and text data. Text data are of obvious use, in that once a scientific discovery is documented, its novelty should be revealed in the text. Nonetheless, due to the ambiguity and complexity of natural language, previous measures use text data rather superficially, without sufficiently exploiting its semantic information [e.g., 14 ]. Only relatively recently has such semantic information been extracted from text data and translated into bibliometric indices [e.g., 15 ]. To circumvent the technical challenges of extracting semantic information from text data, citation data have been extensively utilized in previous novelty measures. As a citation represents information flow from a cited document to a citing document, it can be used to infer certain qualities of a document, including novelty, without scrutinizing the content [ 10 , 16 ]. However, the validity of this approach has occasionally been questioned [ 12 ]. In fact, insufficient validation is a limitation common to most novelty measures [ 17 ]. Furthermore, a practical limitation common to previous measures is that they require access to a large-scale bibliometric database (often the whole universe of scientific documents), which is usually proprietary and expensive, as well as high computational power, which potential users of the measures do not always have.

To address these limitations, we propose a new approach to computing the novelty of scientific documents that combines citation and text data (see Fig 1 ). Our approach features recombinant novelty [ 18 – 21 ], considering a document to be novel if it cites a combination of semantically distant documents. This is in line with previous measures based on citation data [e.g., 8 ]. Unlike previous measures, however, we use text data to quantify the distances between cited documents. Specifically, based on the text information included in the cited documents, we map each document to a word embedding (a high-dimensional vector assigned to each vocabulary item [ 22 ]), with which we compute distances between cited documents. To the best of our knowledge, this is the first study to use the word-embedding technique to measure the novelty of scientific documents.

https://doi.org/10.1371/journal.pone.0254034.g001

For the text information, we test three sources (the abstract, keywords, and the title of cited documents) and find that all perform satisfactorily. Of the three sources, the titles of cited documents are often included in the focal document itself, which minimizes the burden of data access. As a library of word embeddings, we draw on scispaCy [ 23 ], which is publicly available and thus significantly reduces the computational cost. We publicly share the code [ 24 ], with which one can compute the novelty score of a document using only the focal document’s reference list.

We validate the proposed measure in three exercises. First, we confirm that word embeddings from the selected library can be used to quantify semantic distances between documents by comparing them with an established bibliometric distance measure. Second, we test the criterion-related validity of the proposed novelty measure against self-reported novelty scores collected from a questionnaire survey. Third, as novelty is known to be a predictor of future citation impact [ 8 , 11 ], we test whether the proposed measure is correlated with future citations.

This paper is structured as follows. In the next section, we categorize previous novelty measures and discuss their characteristics and limitations. The following section describes our proposed measure and outlines its operationalization. Then, we present the methods and data for the validation exercises. Finally, we present the results and conclude.

Literature review

Previous bibliometric measures for novelty can be categorized by their conceptualization and operationalization ( Table 1 ). Conceptually, some measures aim to represent the uniqueness of a certain knowledge element (Groups 1 and 4), for example, the discovery of a new molecule or the development of a new material. In contrast, other measures aim to capture a recombination of knowledge elements (Groups 2 and 3), in which a new or rare combination of knowledge is considered a sign of novelty. The notion of recombination as a source of novelty has been widely discussed in the literature. The creativity literature argues that associating remote elements is a path to creative solutions in general as well as in science [ 18 , 19 ], and the management literature suggests that combining components is a major route to technological innovation [ 20 , 21 ].


https://doi.org/10.1371/journal.pone.0254034.t001

For operationalization, one group of measures exploits citation information to assess novelty indirectly (Group 3), while the others draw on text analysis to assess the content of documents (Groups 1, 2, and 4). Among the latter, the majority uses text information only superficially, without exploiting the semantic information of the text (Groups 1 and 2), but recent measures attempt to extract semantic information (Group 4). Studies on novelty measures are relatively advanced in technology management, where patents are the unit of analysis [e.g., 16 , 25 ]. We also refer to these measures because the key ideas behind them are applicable to scientific documents. In what follows, we discuss the four groups of previous measures.

(1) A new word

The first group of novelty measures is based on the first appearance of a word or phrase in a document [ 14 , 25 ]. If a document includes or is associated with a word or a sequence of words that is new to the world, it can be inferred that the document delivers novel information. For example, if a document describes a previously unknown chemical compound, this suggests that the document is novel. In this category, Azoulay et al . [ 14 ] drew on Medical Subject Headings (MeSH), a controlled keyword dictionary, and operationalized the novelty of a journal article based on the average age of its keywords (the number of years since each keyword’s first appearance). Balsmeier et al . [ 26 ] and Arts et al . [ 25 ] also identified novel inventions based on the first occurrence of a word or a sequence of words (bigrams and trigrams) in patent documents.

(2) Recombination of words

The second group is technically similar to the first group but conceptually different, as it measures "recombinant" novelty [ 19 , 20 ]. When a document includes a rare combination of knowledge elements, even if each element is already known, the document can be considered novel. In this category, Boudreau et al . [ 9 ] measured the novelty of research grant proposals based on new combinations of MeSH keywords. Similarly, drawing on a controlled dictionary of patent classifications, Verhoeven et al . [ 27 ] measured recombinant novelty by new combinations of the IPC codes assigned to a patent. Arts et al . [ 25 ] also measured the novelty of a patent based on new combinations of two words appearing in the patent.

The first and second groups are intuitively straightforward but have some limitations. Above all, these measures largely disregard the semantic information included in text data. For example, the first group may consider a new synonym of an existing concept to be novel, unless controlled dictionaries are available. Similarly, the second group may consider any recombination equally novel, regardless of the semantic distance between the combined elements.

(3) Recombination of cited documents

The third group also measures recombinant novelty, but instead of using text information, it draws on citation information. A document citing another document implies that knowledge in the latter is used by the former [ 28 ]. Thus, a document can be characterized by its cited documents, with each cited document considered a knowledge element that is incorporated into the citing document. Based on the recombinant novelty concept [ 18 , 19 ], a document citing a set of documents that have rarely been cited together can be taken as a sign of novelty. In contrast to the first and second groups, in which a single word is the representation of knowledge, considering a cited document as a knowledge element adds semantic richness, which has made this approach popular in previous studies.

In this group, Dahlin and Behrens [ 16 ] proposed a novelty measure for patents based on rare combinations of cited references. Trapido [ 10 ] applied the same approach to journal articles, specifically in the field of electrical engineering. Matsumoto et al . [ 17 ] extended this approach so that it is applicable to any scientific field. A variation of this approach draws on the journals in which the cited documents are published [ 8 , 11 ]: if a focal document cites documents in two journals that have rarely been cited together, this is considered a sign of novelty. This approach thus consolidates the unit of knowledge further, at the journal level. Though treating a document or a journal as a unit of knowledge is convenient, as it avoids scrutinizing the content of documents, its validity is under dispute [ 12 , 13 ].

(4) A distant text

The last group quantifies the uniqueness of a document based on text analysis and relies on more recent developments in natural language processing (NLP) to extract semantic information. In particular, drawing on the word embedding technique, Hain et al . [ 15 ] proposed a measure of patent novelty. Word embeddings map each word to a high-dimensional vector (i.e., a list of numbers). This allows us to quantify the semantic relationship between a pair of words by calculating the distance between their vectors: similar words have close vectors, while dissimilar words have remote vectors. Hain et al . [ 15 ] assigned a vector to each patent by aggregating the vectors of the words that appear in the patent. They then calculated the distance between every pair of patents, whereby a patent remote from all other patents is considered novel.

Proposed measure of novelty

Measuring novelty with word embedding.

As a new approach, we propose to measure the recombinant novelty of scientific documents by combining the word embedding technique with citation analysis. We consider a cited document an appropriate unit of knowledge input, as in Group 3. Unlike the previous measures, which disregard the content of cited documents, we draw on the word embedding technique to extract semantic information from the cited documents.

The word embedding technique typically draws on machine learning algorithms (e.g., word2vec) to calculate a vector representation for each word based on the co-occurrences of words in a text corpus [ 22 ]. The approach has gained credibility as the performance of machine learning has improved, and it has recently been applied to scientific documents for various purposes. For example, Tshitoyan et al . [ 29 ] captured the knowledge structure of the materials science literature and used it to predict future scientific discoveries in the field. Still, to the best of our knowledge, the technique has not been used to measure the novelty of scientific documents.

Although computing word embeddings is demanding, some algorithms are publicly available, and some well-trained word embedding models (lists of vectors for a set of vocabulary items) are also publicly accessible [ 30 ]. In this study, we use scispaCy as an established and publicly available library of word embeddings. ScispaCy builds on the popular spaCy model [ 30 ] and offers vector representations in a 200-dimensional vector space for 600,000 vocabulary items specialized in biomedical texts [ 23 , 31 ].
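As a brief illustration of how such a library can be used (this is a sketch of our own, not part of the authors' shared code [ 24 ]), the snippet below loads scispaCy's en_core_sci_lg model and vectorizes two invented reference titles; in spaCy, Doc.vector is the mean of the token vectors, and the cosine distance between the two document vectors quantifies their semantic distance.

```python
# Illustrative sketch only: assumes scispaCy and its "en_core_sci_lg" model
# are installed; the two titles are invented examples.
import numpy as np
import spacy

nlp = spacy.load("en_core_sci_lg")  # 200-dimensional biomedical word vectors

title_a = nlp("Statin therapy and cardiovascular outcomes in elderly patients")
title_b = nlp("Deep learning for automated segmentation of brain tumors")

v_a, v_b = title_a.vector, title_b.vector            # mean of token vectors
cos = np.dot(v_a, v_b) / (np.linalg.norm(v_a) * np.linalg.norm(v_b))
print(v_a.shape)     # (200,)
print(1.0 - cos)     # cosine distance between the two titles
```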

Operationalization

With the selected word embedding library and citation information, the novelty of a document is computed through the following steps ( Fig 1 ). Suppose that a focal document cites N references and that each cited reference has some text information. Various sources of text information can be used, such as the full text or the abstract. In the following analysis, we construct separate measures from three text sources: the abstract, keywords, and the title of cited documents. Of the three sources, we primarily propose using the title, which minimizes the data requirement and maximizes the utility of the measure.

Step 1. First, we vectorize the text information of the i-th reference as v_i ∈ ℝ^200 (i ∈ {1,…, N}). Since the text information includes multiple words, v_i is calculated as the mean of the word embeddings of all included words.

Step 2. Next, we compute the semantic distance between every pair of cited references i and j as the cosine distance between their vectors (Eq 1): d_ij = 1 − (v_i · v_j) / (‖v_i‖ ‖v_j‖). The cosine distance ranges from 0 to 2, where a larger value indicates a larger distance.

Step 3. Finally, the N(N−1)/2 pairwise distances are summarized at the level of the focal document (Eq 2): the novelty score Novel_q is defined as the q-th percentile of the distance distribution, so that, for example, Novel_100 is the maximum distance between any pair of cited references. A higher score indicates that the focal document combines more semantically distant references.
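To make the three steps concrete, the sketch below computes Novel_q for a focal document given the titles of its cited references. It reflects our reading of Eqs 1 and 2 above and is not the authors' released code [ 24 ]; the reference titles are invented.

```python
# Sketch of the proposed novelty score (our reading of Eqs 1 and 2),
# not the authors' released implementation [24].
from itertools import combinations

import numpy as np
import spacy

nlp = spacy.load("en_core_sci_lg")  # scispaCy biomedical embeddings (200-dim)

def novelty_score(reference_titles, q=100):
    """Return Novel_q: the q-th percentile of pairwise cosine distances
    between the vectorized titles of a focal document's cited references."""
    vectors = [nlp(title).vector for title in reference_titles]   # Step 1
    distances = []
    for v_i, v_j in combinations(vectors, 2):                     # Step 2
        cos = np.dot(v_i, v_j) / (np.linalg.norm(v_i) * np.linalg.norm(v_j))
        distances.append(1.0 - cos)                               # d_ij in [0, 2]
    return np.percentile(distances, q)                            # Step 3: Novel_q

# Invented reference list of a hypothetical focal document
titles = [
    "Word embeddings for biomedical text mining",
    "A randomized trial of statins in elderly patients",
    "Deep learning for protein structure prediction",
    "Citation networks and the diffusion of scientific knowledge",
]
print(novelty_score(titles, q=100))  # maximum pairwise distance
```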

Computational cost

The previous measures of novelty described above require extensive data access and processing. Text-based approaches ( Table 1 , Groups 1, 2, and 4) require the entire history of word use, and citation-based approaches ( Table 1 , Group 3) need comprehensive citation network data. This poses two practical challenges for potential users of the novelty measures. First, the required data are usually proprietary and thus expensive. Second, processing such massive data requires high computational power. Not all users have such rich resources, which compromises the utility of the measures.

Our proposed approach addresses these issues and aims to allow anyone to compute and use the novelty measure. The measure requires only limited data access and little proprietary data: it can be computed with only the titles of a focal document’s cited references, which are often included in the focal document itself, and a publicly available library of word embeddings. The approach also requires little data processing: unlike the Group 3 measures, it does not need extensive citation network analysis, and unlike the Group 4 measures, it does not need a comparison with the whole universe of documents. With the publicly shared code, anyone can compute the measure.

Methods and data

Previous novelty measures have rarely been validated, with a few exceptions [ 17 ]. To confirm the validity of our proposed measure, we carry out three exercises. The primary analysis tests the criterion-related validity based on self-reported novelty scores for selected documents. As a preparatory step to this main analysis, we test whether scispaCy word embeddings can indeed be used to measure distances between documents (corresponding to Step 2). Finally, since novelty is known to be a predictor of future citation impact [ 8 , 11 ], we run regression analyses to test whether our proposed measure is positively associated with future citations.

To compute the proposed measures, we downloaded bibliometric information from Web of Science (WoS). Since scispaCy specializes in the vocabulary of biomedicine, we focus on documents within the relevant Subject Categories [ 32 ]. We further restrict the sample to the "article" document type and to documents written in English [ 33 ]. We employ different sets of random samples for each analysis, as detailed below.

Validation of distance

Before validating the novelty measure itself, we test whether scispaCy word embeddings convey the semantic information of a text and can be used to assess the distance between a pair of documents. To this end, we compute distances between pairs of documents in two ways, one based on scispaCy word embeddings and the other based on a previously established approach, and confirm that the two are sufficiently correlated.


For this analysis, we employed the following sampling strategy. First, we randomly sampled 100 authors in the field of biomedicine. Then, we collected all documents authored by these authors [ 34 ]. Finally, we filtered out documents outside the biomedical field as well as documents missing reference information, resulting in 1,600 documents (16 documents per author on average). We compute the distance measures between documents written by the same author (i.e., we do not compare documents written by different authors), because co-citation is rare between a randomly chosen pair of documents written by different authors, which would spuriously inflate the correlation.
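As an illustration of this validation step, the sketch below correlates the embedding-based distances with a citation-based distance for the same document pairs. The citation-based measure shown here (one minus the Jaccard overlap of the two documents' reference lists) is only a stand-in for the established measure used in the paper, and the input data structure is invented for illustration.

```python
# Sketch: checking agreement between two document-distance measures.
# The reference-overlap distance below is a simple stand-in, not the
# established citation-based measure actually used in the paper.
import numpy as np
from scipy.stats import pearsonr

def embedding_distance(v_a, v_b):
    cos = np.dot(v_a, v_b) / (np.linalg.norm(v_a) * np.linalg.norm(v_b))
    return 1.0 - cos

def reference_overlap_distance(refs_a, refs_b):
    """1 - Jaccard overlap of two reference sets (illustrative placeholder)."""
    refs_a, refs_b = set(refs_a), set(refs_b)
    return 1.0 - len(refs_a & refs_b) / len(refs_a | refs_b)

def correlate(doc_pairs):
    """doc_pairs: iterable of ((vec_a, refs_a), (vec_b, refs_b)) tuples,
    built elsewhere for same-author document pairs."""
    emb = [embedding_distance(a[0], b[0]) for a, b in doc_pairs]
    cit = [reference_overlap_distance(a[1], b[1]) for a, b in doc_pairs]
    return pearsonr(emb, cit)  # (correlation coefficient, p-value)
```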

Validation of novelty

After confirming that the scispaCy word embeddings carry the semantic information of text, we test the criterion-related validity of the proposed novelty measure ( Eq 2 ). To this end, we draw on self-reported novelty scores obtained from a questionnaire survey we conducted in 2009–2010 [ 35 , 36 ]. The survey was answered by 2,081 scientists from various scientific fields, of whom this study draws on a subset of 321 respondents in biomedical fields.

The survey included a wide range of questionnaire items, one section of which asked the respondents to assess a randomly selected journal article that they had published in 2001–2006. This section includes eight items characterizing the finding reported in the article ( Table 2 ). As novelty is a multifaceted concept [ 37 ], the survey incorporated four aspects (theory, phenomenon, method, and material) in which the article may make a scientific contribution. For each aspect, the survey included two items, one indicating newness and the other indicating improvement over the existing literature. We expect the proposed measure to be correlated more strongly with the newness items than with the improvement items. Each item was rated on a 5-point scale (1: not relevant at all; 5: highly relevant).


https://doi.org/10.1371/journal.pone.0254034.t002


Prediction of future citation


For this analysis, we randomly sampled 2,000 articles published in biomedical fields in 2010 and evaluated their citation impact as of 2020 (10 years after publication). We oversampled top-1% cited articles, so that the final sample consists of approximately 1,000 top-1% cited articles and 1,000 non-top-1% cited articles.

Description of the measure

Fig 2.

The same sample as in the third validation study (prediction of future citation) is used, except that the oversampled highly cited documents are excluded. The 947 selected documents include in total approximately 230,000 combinations of cited references, for which the distance ( Eq 1 ) is computed (A). The distances are summarized at the focal document level ( Eq 2 ), and Novel_100 is displayed as an example (B). Novelty measures with different q values are illustrated in S1 Appendix . Since abstracts and keywords are not available for all documents, the sample sizes for those measures are smaller.

https://doi.org/10.1371/journal.pone.0254034.g002


https://doi.org/10.1371/journal.pone.0254034.t003


Table 4 reports the correlations between the series of proposed bibliometric measures (rows) and the self-reported questionnaire scores (columns). In addition to the eight scores from the questionnaire, we added two summary scores: the mean of the four newness scores (Column 9) and the mean of the four improvement scores (Column 10). We expect our proposed measure to be correlated with the newness scores (Columns 1, 3, 5, 7, and 9) rather than the improvement scores (Columns 2, 4, 6, 8, and 10). Focusing on the newness summary score (Column 9), Fig 3 illustrates the correlation coefficients for novelty measures built from the three text sources and with different q values.
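A minimal sketch of this kind of correlation analysis is shown below; the file name and column names are invented, and the merged dataset of novelty scores and survey responses is assumed to have been built elsewhere.

```python
# Sketch: correlating Novel_q with self-reported scores.
# File and column names are hypothetical placeholders.
import pandas as pd

df = pd.read_csv("survey_with_novelty.csv")  # one row per surveyed article

newness = ["new_theory", "new_phenomenon", "new_method", "new_material"]
improvement = ["imp_theory", "imp_phenomenon", "imp_method", "imp_material"]
df["newness_mean"] = df[newness].mean(axis=1)          # cf. Column 9
df["improvement_mean"] = df[improvement].mean(axis=1)  # cf. Column 10

for q in [100, 99, 95, 90, 80, 50]:
    r_new = df[f"novel_{q}"].corr(df["newness_mean"])       # Pearson correlation
    r_imp = df[f"novel_{q}"].corr(df["improvement_mean"])
    print(q, round(r_new, 3), round(r_imp, 3))
```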

Fig 3. Pearson’s correlation coefficients. Novel_q (q ∈ {100, 99, 95, 90, 80, 50}) is correlated with the mean of the four self-reported newness scores (Column 9 in Table 4 ). † p<0.1, *p<0.05, **p<0.01, ***p<0.001.

https://doi.org/10.1371/journal.pone.0254034.g003


https://doi.org/10.1371/journal.pone.0254034.t004


https://doi.org/10.1371/journal.pone.0254034.t005

Fig 4.

The predicted probability of a focal document falling within the top 1% of cited documents. For easier interpretation and comparison, the horizontal axis shows the percentile of the novelty measures. (A) is based on Row 1 in Table 5 ; (B) and (C) are based on curvilinear models incorporating the quadratic term of the novelty measures ( S1 Appendix ).

https://doi.org/10.1371/journal.pone.0254034.g004


We find that adding the quadratic term improves the model fit for the novelty measures with smaller q’s. Fig 4B and 4C illustrate the curvilinear associations for Novel_90 and Novel_50, showing that the optimal level of the novelty score decreases for lower q’s. This also suggests that a document with too many recombinations does not attract citations.
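For illustration, a curvilinear specification of this kind can be fitted as sketched below; variable names are invented, the control variables used in the paper are omitted, and the oversampling of top-1% articles is ignored for simplicity.

```python
# Sketch: linear vs. quadratic logit models of being a top-1% cited article.
# Variable names are hypothetical; controls and sampling weights are omitted.
import numpy as np
import statsmodels.api as sm

def fit_citation_models(novelty, is_top1pct):
    novelty = np.asarray(novelty, dtype=float)
    X_linear = sm.add_constant(novelty)
    X_quadratic = sm.add_constant(np.column_stack([novelty, novelty ** 2]))
    m_linear = sm.Logit(is_top1pct, X_linear).fit(disp=False)
    m_quadratic = sm.Logit(is_top1pct, X_quadratic).fit(disp=False)
    # A lower AIC for the quadratic model points to an inverted-U relation,
    # i.e., an optimal intermediate level of novelty.
    return m_linear.aic, m_quadratic.aic
```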

Alternative measure of recombination within a document

Although the proposed measure exploits recombination between cited documents, it is also plausible to look for recombination within the focal document itself. By decomposing the text information (the title, the abstract, or keywords) of a focal document into words, assigning word embeddings to them, and measuring the distance between every pair of words, we constructed an additional, similar set of novelty measures. This is in line with a category of previous measures [ 25 ], except that we use word embeddings to compute the word distances.
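A sketch of this within-document variant is shown below; it is our reading of the description above, the title is invented, and the choice to drop stop words and punctuation is our own simplification.

```python
# Sketch of the within-document variant: pairwise distances between the
# word embeddings of the focal document's own text.
from itertools import combinations

import numpy as np
import spacy

nlp = spacy.load("en_core_sci_lg")

def within_document_novelty(text, q=100):
    tokens = [t for t in nlp(text)
              if t.has_vector and not t.is_stop and not t.is_punct]
    distances = []
    for a, b in combinations(tokens, 2):
        cos = np.dot(a.vector, b.vector) / (np.linalg.norm(a.vector) * np.linalg.norm(b.vector))
        distances.append(1.0 - cos)
    return np.percentile(distances, q) if distances else float("nan")

print(within_document_novelty("Machine learning for drug repurposing in rare cancers"))
```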

We tested the validity of this additional set of measures both for the correlation with self-reported novelty and for the prediction of future citations ( S1 Appendix ). The results are overall unsatisfactory. Correlations with the self-reported scores are mostly insignificant and sometimes significantly negative. Similarly, correlations with future citation impact are insignificant or significantly negative. Thus, the proposed approach to quantifying recombinant novelty does not work when applied to the text of the focal document itself. This contrasts with the previous measures of recombination within a document [ 9 , 25 ], which may be attributable to a difference in operationalization: the previous measures are based on the first co-occurrence of two words rather than on their semantic distance.

Discussion and conclusion

Novelty is a core value in science [ 1 , 2 ], and thus a reliable approach to measuring the novelty of scientific documents on a large scale is crucial. This study is the first to propose measuring the recombinant novelty of scientific documents based on the word-embedding technique. Most previous measures of recombinant novelty in science have been based solely on citation data [ 8 , 10 , 11 , 16 , 17 ]. Although citation network data are an effective tool for indirectly retrieving semantic information, recent advances in text analysis allow us to extract it more directly and possibly more accurately [ 39 , 40 ]. Combining citation data and text data, we provide a well-validated and user-friendly measure of scientific novelty.

One limitation common to most previous measures is insufficient validation [ 17 ]. To address this issue, we investigated our proposed measure from multiple angles. First, we show that the word embeddings with which the novelty measure is computed can be used to gauge the distance between scientific documents. Second, the novelty measures are significantly positively correlated with self-reported scores for various dimensions of newness but not with those for improvement, suggesting that the proposed measure can distinguish novel discoveries from mere improvements. Third, the novelty measure is found to be a significant predictor of citation impact 10 years after publication. Overall, these results confirm the validity of the proposed measure.

We examined several variations of the novelty measure. First, we tested different percentile values ( q ) for aggregating the distance scores across all pairs of cited references. The results show better performance with higher q’s, both in the correlation with self-reported novelty scores and in the prediction of future citations. Thus, the novelty of scientific documents is determined by a small number of distant recombinations. This contrasts with previous recombinant novelty measures based on more average distances [ 9 ].


Another limitation common to previous measures is their cost, in terms of both expensive data access and the processing of massive data. Many potential users of novelty measures cannot afford this, which has substantially compromised the utility of the measures and delayed the progress of studies on scientific novelty. Our proposed approach overcomes these challenges. Drawing on limited text information (the titles of cited references) and a publicly shared library of word embeddings (scispaCy), our approach minimizes data-access requirements as well as computational cost. Using the shared code, one can compute the novelty score of a document of interest with only the reference list of that document. We therefore encourage the application of the approach for various purposes.

The approach has two limitations that future work needs to address. First, it depends on publicly available word-embedding libraries. ScispaCy specializes in biomedicine. Similar libraries are available in some fields but not in others, in which case one needs to start by computing word embeddings. When a different library is used, the external validity of our approach needs to be tested. Second, we disregard the time dependency of word embeddings. The semantic distances between words change over time, so repeated computation of word embeddings may be required, for example, when novelty scores are compared across different time points.

Supporting information

S1 Appendix. Supplementary analysis.

https://doi.org/10.1371/journal.pone.0254034.s001

S1 Dataset.

https://doi.org/10.1371/journal.pone.0254034.s002

S2 Dataset.

https://doi.org/10.1371/journal.pone.0254034.s003

S3 Dataset.

https://doi.org/10.1371/journal.pone.0254034.s004

S4 Dataset.

https://doi.org/10.1371/journal.pone.0254034.s005

  • 1. Merton RK. Sociology of science. Chicago: University of Chicago Press; 1973.
  • 4. Storer N. The social system of science. New York, NY: Holt, Rinehart and Winston; 1966. https://doi.org/10.1126/science.153.3740.1080 pmid:17737583
  • 7. Chubin DE, Hackett EJ. Peerless science: peer review and U.S. science policy. Albany, N.Y.: State University of New York Press; 1990.
  • 17. Matsumoto K, Shibayama S, Kang B, Igami M. A validation study of knowledge combinatorial novelty. Tokyo: NISTEP; 2020.
  • 23. Neumann M, King D, Beltagy I, Ammar W, editors. ScispaCy: Fast and Robust Models for Biomedical Natural Language Processing. Proceedings of the 18th BioNLP Workshop and Shared Task; 2019 aug; Florence, Italy: Association for Computational Linguistics.
  • 24. A python code is found online [ https://github.com/DeyunYinWIPO/Novelty/ ].
  • 30. Honnibal M, Montani I. spaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing. 2017 [Available from: https://github.com/explosion/spaCy ].
  • 31. We draw on the "en_core_sci_lg" model.
  • 32. Allergy; Anatomy & Morphology; Anesthesiology; Cardiovascular System & Cardiology; Dentistry, Oral Surgery & Medicine; Dermatology; Emergency Medicine; Gastroenterology & Hepatology; General & Internal Medicine; Geriatrics & Gerontology; Hematology; Infectious Diseases; Integrative & Complementary Medicine; Neurosciences & Neurology; Obstetrics & Gynecology; Oncology; Ophthalmology; Orthopedics; Otorhinolaryngology; Pathology; Pediatrics; Pharmacology & Pharmacy; Psychiatry; Radiology, Nuclear Medicine & Medical Imaging; Rehabilitation; Respiratory System; Rheumatology; Surgery; Toxicology; Transplantation; Tropical Medicine; Urology & Nephrology.
  • 33. ScispaCy is available in other languages, so the proposed approach is applicable to non-English articles.
  • 34. To disambiguate the author identity, we rely on the author IDs that the WoS algorithm estimated. The precision of the algorithm is debatable, but it is of a limited concern for the purpose of this analysis.
  • 35. Nagaoka S, Igami M, Eto M, Ijichi T. Knowledge creation process in science: Basic findings from a large-scale survey of researchers in Japan. IIR Working Paper2010.
  • 36. 7,562 published articles were randomly sampled for this survey. The response rate was 27%.
  • 39. Gentzkow M, Kelly BT, Taddy M. Text as Data. National Bureau of Economic Research, Inc; 2017. https://doi.org/10.3390/data2040038 pmid:30498741

Novelty effect: How to ensure your research ideas are original and new

  • Charlesworth Author Services
  • 12 January, 2022
What has been is what will be, and what has been done is what will be done, and there is nothing new under the sun. — Ecclesiastes

Novelty can be described as the quality of being new, original or unusual. Novelty in scientific publishing is crucial, because journal editors and peer reviewers greatly prize novel research over and above confirmatory papers or research with negative results. After all, why give precious and limited journal space to something previously reported when authors submit novel, unreported discoveries?

How do you know what counts as novel? How can you, as an academic author, enhance the novelty effect of your research submissions? Below we explore ideas that will help you maximise the novelty effect in your submissions.

a. New discovery

This comprises research on and reports of completely new discoveries. These can be new chemical elements, planets or other astronomical phenomena, new species of flora or fauna, previously undescribed diseases, viruses, etc. These are things never seen or reported before. Often such new discoveries serve as a seedbed for multiple reports or even completely new avenues of research. Journals prize submissions on new discoveries and often tout them in media reports.

b. The exceptionally rare

Not quite as exciting as new discoveries are reports on things that are not new, but that are seen or encountered exceptionally rarely, or not for a long time. An example is the sighting of the rare pink handfish, recently spotted in Australia for the first time in decades. In biomedical publishing, rare case reports of a near-unique condition (such as the separation of conjoined twins) are occasionally published and make the nightly news.

c. New theories

Typically, these papers provide substantial data that support the novel thesis. Reports of new theories must have rigorous logic and need to stand on clear and well-documented foundations. They can’t be simple flights of theoretical fancy. As with new discoveries, new theories can spawn whole new branches of scientific inquiry.

d. New or significantly improved diagnostic/laboratory techniques

Reports on novel techniques don’t usually receive coverage from the mass media, but they can garner huge numbers of references if the new technique is adopted by the scientific community. Publication-worthy techniques include those that are more efficient, less time-consuming or more reliable than currently existing techniques or diagnostic procedures. Anything that is truly new or improves significantly on an established technique is potentially worthy of publication. In medicine, new surgical techniques are very important, but here’s a tip: try to provide a large prospective case series with long-term follow-up instead of just a single case report.

e. Existing data combined into new knowledge

There is a profound novelty effect when researchers combine existing data/knowledge into something new. Ideas from disparate, previously unrelated fields of research can lead to completely novel discoveries with untold potential applications. Translational or applied research (particularly in the biomedical sciences) has borne abundant fruit over the last many decades. Translational applications of chemistry and physics to medicine have seen enormous advances in the diagnosis and treatment of numerous diseases.

f. Incremental additions to the literature

Not all research or publications will report on truly novel discoveries; in fact, very few will. But that doesn’t necessarily diminish the novelty effect of your work. The vast majority of published research adds incrementally to what is already known, nudging scientific knowledge forward. The accumulation of incremental discovery leads, over time, to large gains in understanding and knowledge.

How to ensure and verify the novelty effect

Whether your research reports something completely new or furthers an existing field in a new way, you need to make sure the contribution is indeed new.

  • Do your homework: Pore over the literature (in as many languages as possible) to make sure your idea is indeed new, or significantly different enough to be considered new.
  • To the degree possible, provide the ‘idea genealogy’ for your concept: Reference the major sources of those who have come before you. Through references and by describing your thought processes, describe clearly how you came up with the new idea or combination of ideas.
  • Disclose your sources of inspiration and new application: Doing so constitutes academic honesty, gives credit to those upon whose shoulders your research rests and provides intellectual fertiliser for other scientists who may, in turn, be able to build upon your own ideas.

All the best for your (novel) submission!


  • Journal List
  • Indian J Anaesth
  • v.67(3); 2023 Mar
  • PMC10220166

Novelty in research: A common reason for manuscript rejection!

Nishant Kumar

Department of Anaesthesia, Lady Hardinge Medical College and Associated Hospitals, New Delhi, India

Zulfiqar Ali

1 Department of Anesthesiology, Sher-i-Kashmir Institute of Medical Sciences, Srinagar, Jammu and Kashmir, India

Rudrashish Haldar

2 Department of Anesthesiology, Sanjay Gandhi Post Graduate Institute of Medical Science, Lucknow, Uttar Pradesh, India

We often hear back from reviewers and editors of scientific journals that a particular manuscript (original research, case report, series or letter to the editor) has not been accepted because it lacks novelty. Though disheartening, such a response needs proper elucidation, as a moral obligation of the editorial board towards the authors of the manuscripts.

Research, as defined by the Cambridge Dictionary, is ‘a detailed study of a subject, especially in order to discover (new) information or reach a (new) understanding’. [ 1 ] Novelty, on the other hand, is defined as ‘the quality of being new, original, or unusual’ or a ‘new or unfamiliar thing or experience’. Therefore, adding the adjective ‘novel’ to ‘research’ is actually one of the most common redundancies, similar to ‘return back’ or ‘revert back’: the two words denote one and the same thing! [ 1 ]

Without delving into the nitty-gritty of the English language, novel research can best be described as research in which one or more elements are unique, such as a new methodology or a new observation that leads to the acquisition of new knowledge. It is this novelty that contributes to scientific progress. Since the main aim of research is to unravel what is unknown or to challenge views or ideas that may or may not be based on sound scientific principles, this exclusivity of novel research allows us to expand our horizons beyond the realms of known domains. [ 2 ]

Having defined novelty in research, one of the most common mistakes that researchers commit is confusing novelty with originality. These terms are often used interchangeably. Originality implies the genuineness of the work, signifying that the work has not been copied from any other source. Originality can always be examined by plagiarism checkers, and data are often analysed for duplication or fabrication only if there is doubt regarding their factuality. The two qualities can therefore occur independently: a study can be novel but not original, or original but not novel. It is the latter that reviewers and editors encounter most often.

The most common scenario encountered in anaesthesia-related manuscripts that lack novelty is the application of the same anaesthetic technique to different surgical procedures or patient populations (based on gender or age), with no expected change in the result. Here, the hypothesis and study designs are almost identical; only the agents are replaced with different ones. A classic example is the comparison of the duration of analgesia of a longer-acting analgesic or local anaesthetic with that of a shorter-acting one. The intrinsic properties of a drug are already well known, and, irrespective of whether it is an abdominal surgery or a limb surgery, the drugs are going to behave according to their pharmacological properties. Similarly, modern airway devices, such as video laryngoscopes, have conclusively been proven to be better aids than conventional ones. A comparison of any new laryngoscope with an existing device, in terms of whether it outperforms that device, would certainly be a novel idea. However, if a certain number of studies, systematic reviews, or meta-analyses have already been published on a particular device or drug, the study undertaken cannot be considered novel unless its results, built on sound scientific principles, actually challenge or contradict the existing ideas.

Another common scenario faced by reviewers and editors is the anaesthetic management of common or uncommon syndromes or diseases. These are often well described in the literature, and when they are managed according to existing guidelines and with expected challenges, the report does not constitute novelty. A case report is novel and worth publishing if an unforeseen or unanticipated event has occurred, if the case has been managed in a unique or unconventional manner, or if significant innovative skills or equipment have been employed. However, due caution has to be exercised, as this should not lead the researcher to be overly adventurous or show undue bravado by going against the principles of patient safety.

Now here lies the contradiction. We have been harping on novelty, introducing new ideas, and challenging old, fixed ideas when conducting research and reporting cases. At the same time, due caution must be exercised, and one must not be adventurous, unconventional, or bold. There is a fine line of distinction between the two. Herein comes the role of ethics, a separate topic of discussion altogether.

Research or advancement may not always be novel by virtue of intervention or experimentation alone. Theoretical work or hypothesis testing may also contribute paradigm-changing findings. Examples include thought experiments, the rectification or logical rearrangement of existing knowledge, re-evaluating space and time, applying principles of philosophy, and analysing already existing data from a new and different perspective. [ 3 ] A thorough literature search is pivotal for designing a novel research project, as it helps to understand known facts and gaps. An attempt at bridging identified research gaps adds to the novelty of the study. [ 2 ]

Another aspect of novel research is technological advancement. Most research starts from an idea, a thought, or an observation that further leads to hypothesis building, experimentation, data collection, analysis, and, finally, principle building. Technological advancement may stem from any of these phases. Novelty in research propels the industry to excel and outdo itself. [ 4 ]

Can novelty in research be measured? The answer is a resounding yes. Traditionally, it has been measured through peer review and by applying bibliometric measures based on citation or text data, keeping in mind their inherent limitations. Word embedding is a newer technique that can reliably measure novelty and even predict future citations, although it is currently limited by the availability of public word-embedding libraries and by its costs. [ 5 ]

To the average author and reader, novelty adds to their knowledge and makes them aware of complications that they may encounter. It offers a way out, by conventional or different measures, should they get stuck, within the realm of scientific and ethical principles and those of social justice, keeping in mind the quote attributed to Hippocrates: ‘ Primum non nocere’ ( First, do no harm ).

Introducing a novelty indicator for scientific research: validating the knowledge-based combinatorial approach

  • Published: 23 June 2021
  • Volume 126, pages 6891–6915 (2021)


  • Kuniko Matsumoto 1 ,
  • Sotaro Shibayama 2 ,
  • Byeongwoo Kang 3 &
  • Masatsura Igami 1  


Citation counts have long been considered as the primary bibliographic indicator for evaluating the quality of research—a practice premised on the assumption that citation count is reflective of the impact of a scientific publication. However, identifying several limitations in the use of citation counts alone, scholars have advanced the need for multifaceted quality evaluation methods. In this study, we apply a novelty indicator to quantify the degree of citation similarity between a focal paper and a pre-existing same-domain paper from various fields in the natural sciences by proposing a new way of identifying papers that fall into the same domain of focal papers using bibliometric data only. We also conduct a validation analysis, using Japanese survey data, to confirm its usefulness. Employing ordered logit and ordinary least squares regression models, this study tests the consistency between the novelty scores of 1871 Japanese papers published in the natural sciences between 2001 and 2006 and researchers’ subjective judgments of their novelty. The results show statistically positive correlations between novelty scores and researchers’ assessment of research types reflecting aspects of novelty in various natural science fields. As such, this study demonstrates that the proposed novelty indicator is a suitable means of identifying the novelty of various types of natural scientific research.


Notes

The journal field refers to the 22 scientific fields in the Essential Science Indicators (ESI) of Thomson Reuters.

The reclassification procedures of multidisciplinary field papers were as follows: (i) collecting the references of a focal paper in the multidisciplinary field; (ii) identifying the scientific field of each reference, where a field was identified based on the scientific fields of a journal; (iii) finding the most frequent scientific field in the references of the focal paper, except for multidisciplinary fields; and (iv) using the most frequent scientific field as the scientific field of the focal paper.

These correspond to focal papers without reference papers or having no same-domain papers. For these focal papers, the novelty scores are not calculable or become zero (the latter case is rare in our study; there are only two observations).

As shown in Tables 2 and 3 , our novelty scores are close to 1 and their variances are small. Previous research indicators (i.e., those used by Dahlin and Behrens ( 2005 ) and Trapido ( 2015 )), which are the basis of our indicators, also have similar features. The small variation in the scores may make it difficult to interpret whether novelty is high or low, especially for the practical use of the indicators. On this point, applying methods such as standardization would help interpret the indicators. Figure  2 is one such example where we adopted percentile representation for the horizontal axis.

This tendency is also confirmed in the other citation windows.

The ordered logit and OLS regression models use the same dependent and independent variables with robust standard errors.

Ahmed, T., Johnson, B., Oppenheim, C., & Peck, C. (2004). Highly cited old papers and the reasons why they continue to be cited: Part II. The 1953 Watson and Crick article on the structure of DNA. Scientometrics, 61 , 147–156.


Baird, L. M., & Oppenheim, C. (1994). Do citations matter? Journal of Information Science, 20 (1), 2–15.

Bornmann, L., Schier, H., Marx, W., & Daniel, H. D. (2012). What factors determine citation counts of publications in chemistry besides their quality? Journal of Informetrics, 6 (1), 11–18.

Bornmann, L., Tekles, A., Zhang, H. H., & Fred, Y. Y. (2019). Do we measure novelty when we analyze unusual combinations of cited references? A validation study of bibliometric novelty indicators based on F1000Prime data. Journal of Informetrics, 13 (4), 100979.

Clarivate Analytics. (2020). Web of science core collection help. https://images.webofknowledge.com/images/help/WOS/hp_subject_category_terms_tasca.html . Accessed 16 October 2020.

Dahlin, K. B., & Behrens, D. M. (2005). When is an invention really radical? Defining and measuring technological radicalness. Research Policy, 34 (5), 717–737.

Fleming, L. (2001). Recombinant uncertainty in technological search. Management Science, 47 (1), 117–132.

Hicks, D., Wouters, P., Waltman, L., De Rijcke, S., & Rafols, I. (2015). Bibliometrics: The Leiden Manifesto for research metrics. Nature, 520 (7548), 429–431.

Igami, M., Nagaoka, S., & Walsh, J. P. (2015). Contribution of postdoctoral fellows to fast-moving and competitive scientific research. The Journal of Technology Transfer, 40 (4), 723–741.

Kaplan, S., & Vakili, K. (2015). The double-edged sword of recombination in breakthrough innovation. Strategic Management Journal, 36 (10), 1435–1457.

Lee, Y.-N., Walsh, J. P., & Wang, J. (2015). Creativity in scientific teams: Unpacking novelty and impact. Research Policy, 44 (3), 684–697.

MacRoberts, M., & MacRoberts, B. (1996). Problems of citation analysis. Scientometrics, 36 (3), 435–444.

Mednick, S. (1962). The associative basis of the creative process. Psychological Review, 69 (3), 220–232.

Murayama, K., Nirei, M., & Shimizu, H. (2015). Management of science, serendipity, and research performance: Evidence from a survey of scientists in Japan and the US. Research Policy, 44 (4), 862–873.

Nagaoka, S., Igami, M., Eto, M., & Ijichi, T. (2010). Knowledge creation process in science: Basic findings from a large-scale survey of researchers in Japan . IIR Working Paper, WP#10–08. Japan: Institute of Innovation Research, Hitotsubashi University.

Nelson, R. R., & Winter, S. G. (1982). An evolutionary theory of economic change . Belknap Press of Harvard University Press.

Nieminen, P., Carpenter, J., Rucker, G., & Schumacher, M. (2006). The relationship between quality of research and citation frequency. BMC Medical Research Methodology . https://doi.org/10.1186/1471-2288-6-42

Oppenheim, C., & Renn, S. P. (1978). Highly cited old papers and reasons why they continue to be cited. Journal of the American Society for Information Science, 29 , 225–231.

Romer, P. M. (1994). The origins of endogenous growth. Journal of Economic Perspectives, 8 (1), 3–22.

Schumpeter, J. A. (1939). Business cycles: A theoretical, historical and statistical analysis of the capitalist process . McGraw-Hill Book Company.

Simonton, D. K. (2003). Scientific creativity as constrained stochastic behavior: The integration of product, person and process perspectives. Psychological Bulletin, 129 (4), 475–494.

Tahamtan, I., & Bornmann, L. (2018). Creativity in science and the link to cited references: Is the creative potential of papers reflected in their cited references? Journal of Informetrics, 12 (3), 906–930.

Thelwall, M. (2017). Web indicators for research evaluation: A practical guide. Synthesis Lectures on Information Concepts, Retrieval and Services, 8 (4), i1–i155.


Trapido, D. (2015). How novelty in knowledge earns recognition: The role of consistent identities. Research Policy, 44 (8), 1488–1500.

Uddin, S., Khan, A., & Baur, L. A. (2015). A framework to explore the knowledge structure of multidisciplinary research fields. PLoS ONE, 10 (4), e0123537. https://doi.org/10.1371/journal.pone.0123537

Uzzi, B., Mukherjee, S., Stringer, M., & Jones, B. (2013). Atypical combinations and scientific impact. Science, 342 (6157), 468–472.

Verhoeven, D., Bakker, J., & Veugelers, R. (2016). Measuring technological novelty with patent-based indicators. Research Policy, 45 (3), 707–723.

Walsh, J. P., & Lee, Y. N. (2015). The bureaucratization of science. Research Policy, 44 (8), 1584–1600.

Wang, J., Lee, Y.-N., & Walsh, J. P. (2018). Funding model and creativity in science: Competitive versus block funding and status contingency effects. Research Policy, 47 (6), 1070–1083.

Wang, J., Veugelers, R., & Stephan, P. (2017). Bias against novelty in science: A cautionary tale for users of bibliometric indicators. Research Policy, 46 (8), 1416–1436.

Zdaniuk, B. (2014). Ordinary least-squares (OLS) model. In A. C. Michalos (Ed.), Encyclopedia of quality of life and well-being research. Springer.

Download references

Acknowledgements

We wish to thank Natsuo Onodera for his invaluable insights regarding the measurement of the novelty score.

Author information

Authors and Affiliations

Research Unit for Science and Technology Analysis and Indicators, National Institute of Science and Technology Policy (NISTEP), Tokyo, Japan

Kuniko Matsumoto & Masatsura Igami

School of Economics and Management, Lund University, Lund, Sweden

Sotaro Shibayama

Institute of Innovation Research, Hitotsubashi University, Tokyo, Japan

Byeongwoo Kang


Contributions

All authors contributed to the study conception and design. Material preparation, data collection, and analysis were performed by KM. The first draft of the manuscript was written by KM, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Kuniko Matsumoto.

Ethics declarations

Conflict of Interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.


About this article

Matsumoto, K., Shibayama, S., Kang, B. et al. Introducing a novelty indicator for scientific research: validating the knowledge-based combinatorial approach. Scientometrics 126, 6891–6915 (2021). https://doi.org/10.1007/s11192-021-04049-z

Received: 18 November 2020
Accepted: 17 May 2021
Published: 23 June 2021
Issue Date: August 2021


Keywords: Bibliometrics, Reference combination

Novelty Detection: A Perspective from Natural Language Processing

The author carried out this work during his doctoral studies at the Indian Institute of Technology Patna, India.


Tirthankar Ghosal, Tanik Saikh, Tameesh Biswas, Asif Ekbal, and Pushpak Bhattacharyya. Novelty Detection: A Perspective from Natural Language Processing. Computational Linguistics 2022; 48(1): 77–117. https://doi.org/10.1162/coli_a_00429


The quest for new information is an inborn human trait and has always been quintessential for human survival and progress. Novelty drives curiosity, which in turn drives innovation. In Natural Language Processing (NLP), Novelty Detection refers to finding text that has some new information to offer with respect to whatever is earlier seen or known. With the exponential growth of information all across the Web, there is an accompanying menace of redundancy. A considerable portion of the Web contents are duplicates, and we need efficient mechanisms to retain new information and filter out redundant information. However, detecting redundancy at the semantic level and identifying novel text is not straightforward because the text may have less lexical overlap yet convey the same information. On top of that, non-novel/redundant information in a document may have assimilated from multiple source documents, not just one. The problem surmounts when the subject of the discourse is documents, and numerous prior documents need to be processed to ascertain the novelty/non-novelty of the current one in concern. In this work, we build upon our earlier investigations for document-level novelty detection and present a comprehensive account of our efforts toward the problem. We explore the role of pre-trained Textual Entailment (TE) models to deal with multiple source contexts and present the outcome of our current investigations. We argue that a multipremise entailment task is one close approximation toward identifying semantic-level non-novelty. Our recent approach either performs comparably or achieves significant improvement over the latest reported results on several datasets and across several related tasks (paraphrasing, plagiarism, rewrite). We critically analyze our performance with respect to the existing state of the art and show the superiority and promise of our approach for future investigations. We also present our enhanced dataset TAP-DLND 2.0 and several baselines to the community for further research on document-level novelty detection.

1. Introduction

Of all the passions of mankind, the love of novelty most rules the mind. –Shelby Foote

This quote by Shelby Foote 1 sums up the importance of novelty in our existence. Most of the breakthrough discoveries and remarkable inventions throughout history, from flint for starting a fire to self-driving cars, have something in common: They result from curiosity. A basic human attribute is the impulse to seek new information and experiences and explore novel possibilities. Humans elicit novel signals from various channels: text, sound, scene, via basic senses, and so forth. Novelty is important in our lives to drive progress, to quench our curiosity needs. Arguably the largest source of information elicitation in this digitization age is texts: be it books, the Web, papers, social media, and so forth. However, with the abundance of information comes the problem of duplicates, near-duplicates, and redundancies. Although document duplication is encouraged in certain use-cases (e.g., Content Syndication in Search Engine Optimization [SEO]), it impedes the search for new information. Hence identifying redundancies is important to seek novelties. We humans are already equipped with an implicit mechanism ( Two Stage Theory of Human Recall : recall-recognition [Tarnow 2015 ]) through which we can segregate new information from old information. In our work, we are interested in exploring how machines would identify semantic-level non-novel information and hence pave the way to identify documents having significant content of new information. Specifically, here in this work, we investigate how we can automatically discover novel knowledge from the dimension of text or identify that a given text has new information. We rely on certain principles of Machine Learning and NLP to design efficient neural architectures for textual novelty detection at the document level.

Textual novelty detection has been known for a long time as an information retrieval problem (Soboroff and Harman 2005) where the goal is to retrieve relevant pieces of text that carry new information with respect to whatever is previously seen or known to the reader. With the exponential rise of information across the Web, the problem becomes more relevant now as information duplication (prevalence of non-novel information) is more prominent. The deluge of redundant information impedes the delivery of critical, time-sensitive, and quality information to end-users. Duplicates or superfluous texts hinder reaching new information that may prove crucial to a given search. According to a particular SEO study 2 by Google in 2016, 25%–30% of documents on the Web exist as duplicates (which is quite a number!). With the emergence of humongous language models like GPT-3 (Brown et al. 2020), machines are now capable of generating artificial and semantically redundant information. Information duplication is not just restricted to lexical surface forms (mere copy); there is also duplication at the level of semantics (Bernstein and Zobel 2005). Hence, identifying whether a document contains new information in the reader's interest is a significant problem to explore to save space and time and retain the reader's attention. Novelty Detection in NLP finds application in several tasks, including text summarization (Bysani 2010), plagiarism detection (Gipp, Meuschke, and Breitinger 2014), modeling interestingness (Bhatnagar, Al-Hegami, and Kumar 2006), tracking the development of news over time (Ghosal et al. 2018b), identifying fake news and misinformation (Qin et al. 2016), and so on.

As we mentioned, novelty detection as an information retrieval problem signifies retrieving relevant sentences that contain new information in discourse. Sentence-level novelty detection (Allan, Wade, and Bolivar 2003a ), although important, would not suffice in the present-day deluge of Web information in the form of documents. Hence, we emphasize the problem’s document-level variant, which categorizes a document (as novel, non-novel, or partially novel) based on the amount of new information in the concerned document. Sentence-level novelty detection is a well-investigated problem in information retrieval (Li and Croft 2005 ; Clarke et al. 2008 ; Soboroff and Harman 2003 ; Harman 2002a ); however, we found that document-novelty detection attracted relatively less attention in the literature. Moreover, the research on the concerned problem encompassing semantic-level comprehension of documents is scarce, perhaps because of the argument that every document contains something new (Soboroff and Harman 2005 ). Comprehending the novelty of an entire document with confidence is a complex task even for humans. Robust semantic representation of documents is still an active area of research, which somewhat limits the investigation of novelty mining at the document level. Hence, categorizing a document as novel or non-novel is not straightforward and involves complex semantic phenomena of inference, relevance, diversity, relativity, and temporality, as we show in our earlier work (Ghosal et al. 2018b ).

This article presents a comprehensive account of the document-level novelty detection investigations that we have conducted so far (Ghosal et al., 2018b , 2019 , 2021 ). The major contribution here is that we present our recent exploration of re-modeling multi-premise entailment for the problem and explain why it is a close approximation to identify semantic-level redundancy. We argue that to ascertain a given text’s novelty, we would need multi-hop reasoning on the source texts for which we draw reference from the Question Answering (QA) literature (Yang et al. 2018 ). We show that our new approach achieves comparable performance to our earlier explorations, sometimes better.

We organize the rest of this article as follows: In the remainder of the current section, we motivate our current approach in light of TE. In Section 2 , we discuss the related work on textual novelty detection so far, along with our earlier approaches toward the problem. Section 3 describes the current methods that utilized multiple premises for document-level novelty detection. Section 4 focuses on the dataset description. We report our evaluations in Section 5 . We conclude with plans for future works in Section 6 .

1.1 Textual Novelty Detection: An Entailment Perspective

T entails H if, typically, a human reading T would infer that H is most likely true. (Dagan, Glickman, and Magnini 2005 ).
Example 1
Text 1: I left the restaurant satisfied. (Premise P)
Text 2: I had good food. (Hypothesis H)

Other hypotheses that are also plausible given the same premise:
  • The ambiance was good (H1)
  • The price was low (H2)
  • I got some extra fries at no cost (H3)
  • I received my birthday discount at the restaurant (H4)

Novelty detection tasks in both the TREC (Soboroff and Harman 2005) and RTE-TAC (Bentivogli et al. 2011) were designed from an information retrieval perspective, where the main goal was to retrieve relevant sentences to decide on the novelty of a statement. We focus on the automatic classification and scoring of a document based on its new information content from a machine learning perspective.

As is evident from the examples, the premise-hypothesis pair shows significantly less lexical overlap, making the entailment decisions more challenging while working at the semantic level. Our methods encompass such semantic phenomena, which were less prominent in the TREC and RTE-TAC datasets.

For ascertaining the novelty of a statement, we opine that a single premise is not enough. We would need the context, world knowledge, and reasoning over multiple facts. We discuss the same in the subsequent section.

1.2 Multiple Premise Entailment (MPE) for Novelty Detection

We deem the NLP task MPE as one close approximation to simulate the phenomenon of textual non-novelty. MPE (Lai, Bisk, and Hockenmaier 2017 ) is a variant of the standard TE task in which the premise text consists of multiple independently written sentences (source), all related to the same topic. The task is to decide whether the hypothesis sentence (target) can be used to describe the same topic (entailment) or cannot be used to describe the same topic (contradiction), or may or may not describe the same topic (neutral). The main challenge is to infer what happened in the topic from the multiple premise statements, in some cases aggregating information across multiple sentences into a coherent whole. The MPE task is more pragmatic than the usual TE task as it aims to assimilate information from multiple sources to decide the entailment status of the hypothesis.

Similarly, the novelty detection problem becomes more practical and hence intense when we need to consider multiple sources of knowledge (premises) to decide whether a given text (hypothesis) contains new information or not. In the real world, it is highly unlikely that a certain text would assimilate information from just another text (unlike the Premise-Hypothesis pair instances in most NLI datasets). To decide on the novelty of a text, we need to consider the context and reason over multiple facts. Let us consider the following example. Here, source would signify information that is already seen or known (Premise) to the reader, and target would signify the text for which novelty/redundancy is to be ascertained (Hypothesis).

Example 2
Source: Survey says Facebook is still the most popular social networking site (s1). It was created by Mark Zuckerberg and his colleagues when they were students at Harvard back in 2004 (s2). Harvard University is located in Cambridge, Massachusetts, which is just a few miles from Boston (s3). Zuckerberg now lives in Palo Alto, California (s4).
Target: Facebook was launched in Cambridge (t1). The founder resides in California (t2).

Clearly, the target text would appear non-novel to a reader with respect to the source/premise. However, to decide on each sentence's novelty in the target text, we would need to consider multiple sentences in the source text, not just one. Here, to decide on the novelty of t1, we would need the premises s1, s2, s3, and similarly s1, s2, s4 to decide for t2. s4 is not of interest to t1, nor is s3 to t2. Thus, to assess the novelty of a certain text, it is quite likely that we may need to reason over multiple relevant sentences. Hence a multi-premise inference scenario appears to be appropriate here. In our earlier work (Ghosal et al. 2018b), we already consider Relevance to be one important criterion for Novelty Detection. So, selecting relevant premises for a statement is an important step toward detecting the novelty of the statement.

The main contributions of the current work are as follows:
  • Leveraging the multi-premise TE concept for document-level novelty detection with pre-trained entailment models.
  • Presenting the TAP-DLND 2.0 dataset, extending TAP-DLND 1.0 (Ghosal et al. 2018b) and including sentence-level annotations to generate a document-level novelty score.

2. Related Work

In this section, we present a comprehensive discussion of the existing literature and explorations on textual novelty detection. We have been working on the document-level variant of the problem for some time. We briefly discuss our earlier approaches and what we have learned so far before turning to our current hypothesis and approach.

2.1 Existing Literature

We survey the existing literature and advances on textual novelty detection and closely related sub-problems.

2.1.1  Early Days .

Textual novelty detection has a history of earlier research (mostly from IR) with a gradual evolution via different shared tasks. We trace the first significant concern on novelty detection back to the new event/first story detection task of the Topic Detection and Tracking (TDT) campaigns (Wayne 1997). Techniques in TDT mostly involved grouping news stories into clusters and then measuring the belongingness of an incoming story to any of the clusters based on some preset similarity threshold. If a story does not belong to any of the existing clusters, it is treated as the first story of a new event, and a new cluster is started. Vector space models, language models, lexical chains, and so forth were used to represent each incoming news story/document. Some notable contributions in TDT are from Allan, Papka, and Lavrenko (1998); Yang et al. (2002); Stokes and Carthy (2001); Franz et al. (2001); Allan et al. (2000); Yang, Pierce, and Carbonell (1998); and Brants, Chen, and Farahat (2003). A close approximation of event-level document clustering via cross-document event tracking can be found in Bagga and Baldwin (1999).

2.1.2  Sentence-level Novelty Detection .

Research on sentence-level novelty detection gained prominence in the novelty tracks of Text Retrieval Conferences (TREC) from 2002 to 2004 (Harman 2002b ; Soboroff and Harman 2003 ; Soboroff 2004 ; Soboroff and Harman 2005 ). Given a topic and an ordered list of relevant documents, the goal of these tracks was to highlight relevant sentences that contain new information. Significant work on sentence-level novelty detection on TREC data came from Allan, Wade, and Bolivar ( 2003b ); Kwee, Tsai, and Tang ( 2009 ); and Li and Croft ( 2005 ). Language model measures, vector space models with cosine similarity, and word count measures were the dominant approaches. Some other notable work on finding effective features to represent natural language sentences for novelty computation was based on the sets of terms (Zhang et al. 2003 ), term translations (Collins-Thompson et al. 2002 ), Named Entities (NEs) or NE patterns (Gabrilovich, Dumais, and Horvitz 2004 ; Zhang and Tsai 2009 ), Principal Component Analysis Vectors (Ru et al. 2004 ), Contexts (Schiffman and McKeown 2005 ), and Graphs (Gamon 2006 ). Tsai, Tang, and Chan ( 2010 ) and Tsai and Luk Chan ( 2010 ) presented an evaluation of metrics for sentence-level novelty mining.

Next came the novelty subtracks in the Recognizing Textual Entailment-Text Analytics Conferences (RTE-TAC) 6 and 7 (Bentivogli et al. 2010 , 2011 ) where TE (Dagan et al. 2013 ) was viewed as one close neighbor to sentence-level novelty detection. The findings confirmed that summarization systems could exploit the TE techniques for novelty detection when deciding which sentences should be included in the update summaries.

2.1.3  Document-level Novelty Detection .

At the document level, pioneering work was conducted by Yang et al. ( 2002 ) via topical classification of online document streams and then detecting novelty of documents in each topic exploiting the NEs. Zhang, Callan, and Minka ( 2002b ) viewed novelty as an opposite characteristic to redundancy and proposed a set of five redundancy measures ranging from the set difference, geometric mean, and distributional similarity to calculate the novelty of an incoming document with respect to a set of documents in the memory. They also presented the first publicly available Associated Press-Wall Street Journal (APWSJ) news dataset for document-level novelty detection. Tsai and Zhang ( 2011 ) applied a document to sentence-level (d2s) framework to calculate the novelty of each sentence in a document that aggregates to detect novelty of the entire document. Karkali et al. ( 2013 ) computed a novelty score based on the inverse-document-frequency scoring function. Verheij et al. ( 2012 ) presented a comparative study of different novelty detection methods and evaluated them on news articles where language model-based methods performed better than the cosine similarity-based ones. More recently, Dasgupta and Dey ( 2016 ) conducted experiments with an information entropy measure to calculate the innovativeness of a document. Zhao and Lee ( 2016 ) proposed an intriguing idea of assessing the novelty appetite of a user based on a curiosity distribution function derived from curiosity arousal theory and the Wundt curve in psychology research.

2.1.4  Diversity and Novelty .

Novelty detection is also studied in information retrieval literature for content diversity detection. The idea is to retrieve relevant yet diverse documents in response to a user query to yield better search results. Carbonell and Goldstein ( 1998 ) were the first to explore diversity and relevance for novelty with their Maximal Marginal Relevance measure. Some other notable work along this line are from Chandar and Carterette ( 2013 ) and Clarke et al. ( 2008 , 2011 ). Our proposed work significantly differs from the existing literature regarding the methodology adopted and how we address the problem.

2.1.5  Retrieving Relevant Information for Novelty Detection .

Selecting and retrieving relevant sentences is one core component of our current work. In recent years, there has been much research on similar sentence retrieval, especially in QA. Ahmad et al. ( 2019 ) introduced Retrieval Question Answering (ReQA), a benchmark for evaluating large-scale sentence level answer retrieval models, where they established a baseline for both traditional information retrieval (sparse term based) and neural (dense) encoding models on the Stanford Question Answering Dataset (SQuAD) (Rajpurkar et al. 2016 ). Huang et al. ( 2019 ) explored a multitask sentence encoding model for semantic retrieval in QA systems. Du et al. ( 2021 ) introduced SentAugment, a data augmentation method that computes task-specific query embeddings from labeled data to retrieve sentences from a bank of billions of unlabeled sentences crawled from the Web. Yang et al. ( 2020 ) uses the Universal Sentence Encoder (USE) for semantic similarity and semantic retrieval in a multilingual setting. However, in our current work, we apply a simple TE probability-based ranking method to rank the relevant source sentences with respect to a given target query sentence.

2.2 Our Explorations So Far

As is evident from our discussion so far, textual novelty detection was primarily investigated in the Information Retrieval community, and the focus was on novel sentence retrieval. We began our exploration on textual novelty detection with the motivation to cast the problem as a document classification task in machine learning. The first hurdle we came across was the non-availability of a proper document-level novelty detection dataset that could cater to our machine learning experimental needs. We could refer to the only available dataset, the APWSJ (Zhang, Callan, and Minka 2002a ). However, APWSJ too was not developed from a machine learning perspective as the dataset is skewed toward novel documents (only 8.9% instances are non-novel ). Hence, we decided to develop a dataset (Ghosal et al. 2018b ) from newspaper articles. We discuss our dataset in detail in Section 4.1 . Initially, we performed some pilot experiments to understand the role of TE in textual novelty detection (Saikh et al. 2017 ). We extracted features from source-target documents and experimented with several machine learning methods, including Support Vector Machine (SVM), Multi-Layer Perceptron (MLP), Random Forest (RF), and so on. We also investigated our idea of TE-based novelty detection on the sentence-level entailment-based benchmark datasets from the Recognizing Textual Entailment (RTE) tasks (Bentivogli et al. 2010 , 2011 ).

We discuss the approaches we developed so far in the subsequent section.

2.2.1  Feature-based Method for Document Novelty Detection .

In our first investigation (Ghosal et al. 2018b), we view novelty as a characteristic opposite to Semantic Textual Similarity (STS) and cast document-level novelty detection as a classification problem. We curate several features from a target document (with respect to a predefined set of source documents), such as paragraph vector (doc2vec) similarity, KL divergence, summarization similarity (concept centrality using TextRank [Mihalcea and Tarau 2004]), lexical n-gram similarity, new word count, and NE and keyword similarity, and build our classifier based on RF. The dominant feature for the classification was new word count, followed by document-level semantic similarity, keyword similarity, and named-entity similarity.
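
To make this concrete, the following is a minimal sketch of a feature-based classifier in this spirit. The reduced feature set (TF-IDF cosine similarity, new-word ratio, vocabulary overlap) and the toy corpus are stand-ins for the full feature set and data of Ghosal et al. (2018b); this is illustrative only, not the original implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def extract_features(source_docs, target_doc, vectorizer):
    """Reduced feature set: TF-IDF cosine similarity, new-word ratio, vocabulary overlap."""
    source_text = " ".join(source_docs)                 # merge the (three) source documents
    tfidf = vectorizer.transform([source_text, target_doc])
    sim = cosine_similarity(tfidf[0], tfidf[1])[0, 0]   # document-level lexical similarity
    src_vocab = set(source_text.lower().split())
    tgt_tokens = target_doc.lower().split()
    new_word_ratio = sum(t not in src_vocab for t in tgt_tokens) / max(len(tgt_tokens), 1)
    overlap = len(src_vocab & set(tgt_tokens)) / max(len(set(tgt_tokens)), 1)
    return [sim, new_word_ratio, overlap]

# toy corpus: (source documents, target document, label), 0 = non-novel, 1 = novel
corpus = [
    (["Facebook was founded in 2004 at Harvard."], "Facebook was created at Harvard in 2004.", 0),
    (["Facebook was founded in 2004 at Harvard."], "The founder now lives in California.", 1),
]
vectorizer = TfidfVectorizer().fit([" ".join(s) + " " + t for s, t, _ in corpus])
X = np.array([extract_features(s, t, vectorizer) for s, t, _ in corpus])
y = np.array([label for _, _, label in corpus])
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(clf.predict(X))
```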

2.2.2  RDV-CNN Method for Document Novelty .

2.2.3 Detecting Document Novelty via Decomposable Attention.

In our subsequent investigation (Ghosal et al. 2021), we experiment with a decomposable attention-based deep neural approach inspired by Bowman et al. (2015) and Parikh et al. (2016). For a semantically redundant (non-novel) document, we contend that the neural attention mechanism is able to identify the sentences in the source document that carry identical information and are responsible for the non-novelty of the target document (we call this Premise Selection). We then jointly encode the source-target alignment and pass it through an MLP for classification. This approach is simple, with an order of magnitude fewer parameters compared to other complex deep neural architectures. Inspired by work on attention in the Machine Translation literature (Bahdanau, Cho, and Bengio), it relies only on learning sentence-level alignments to generate document-level novelty judgments.

Our current work differs from the existing literature on novelty detection, even from our earlier attempts in many aspects. The majority of earlier prominent work on novelty detection focused on novel sentence retrieval. In our earlier attempts, we did not consider multiple premises for ascertaining the novelty of an information unit (sentence in our case). Here, we attempt a multi-hop multi-premise entailment to address the scenario we discussed in Section 1.2 . Assimilating information from multiple sources and enhancing the retrieved source information with their relevant weights are some crucial contributions for document-level novelty detection in this work. Finally, we introduce a novel dataset to quantify document novelty.

A somewhat similar work on information assimilation from multiple premises is Augenstein et al. (2019), where the authors perform automatic claim verification from multiple information sources. In that work, the authors collect claims from 26 fact-checking Web sites in English, pair them with textual sources and rich metadata, and label them for veracity by human expert journalists. Although our work encompasses information assimilation from multiple sources, we differ from Augenstein et al. (2019) in the motivation and the task definitions. Still, we can draw a parallel: a novel fact would be hard to verify because there would not be enough prior evidence to corroborate its claim, whereas a fact that is entailed from authentic information sources can be verified and, by the same token, would not say something drastically new.

3. Current Methodology: Encompassing Multiple Premises for Document-level Novelty Detection

As discussed in Section 1.2, reasoning over multiple facts is essential for textual novelty detection. We may need to assimilate information from multiple source texts to ascertain the state of the novelty of a given statement or fact. If a text is redundant against a given prior, it is redundant against the set of all the relevant priors. However, for a text to be novel, it has to be novel against all the relevant priors. Here, a prior signifies the relevant information exposed to the reader that s/he should refer to in order to determine the newness of the target text. If no such priors are available, the target text possibly has new information. Organizers of the TREC information retrieval exercises (Soboroff 2004) formulated the tasks along this line: if, for a given query (target), no relevant source is found in a test collection, the query is possibly new. In Example 2, s1, s2, s3, s4 are the relevant priors for t1, t2.

Our proposed approach consists of two components:
  • a relevance detection module, followed by
  • a novelty detection module

Figure 1: Overview of the multi-premise entailment-based document-level novelty detection architecture. It has two components: the Relevance Detection module, which computes relevance scores, and the Novelty Detection module, which aggregates multiple premises, computes entailment, and classifies the target document. The entailment model in the relevance module uses the full entailment stack (ESIM in this case), whereas the novelty module uses multiple partial entailment stacks (excluding the last projection layer) to aggregate the premises via a join operation.

3.1 Relevance Detection

The goal of this module is to find relevant premises (source sentences) for each sentence in the target document. We treat the sentences in the target document as our multiple hypotheses, that is, we understand a target document to comprise multiple hypothesis statements. The objective is to find to what extent each of these hypotheses is entailed from the premises in the source documents and use that knowledge to decide the target document’s novelty. Ideally, a non-novel document would find the majority of its sentences highly entailed from the various sentences in the source documents . A source sentence is considered relevant if it contains information related to the target sentence and may serve as the premise to determine the newness of the target sentence. We model this relevance in terms of entailment probabilities, that is, how well the information in the source and the target correlate. We use a pre-trained inference model to give us the entailment probabilities between all possible pairs of target and source sentences. Not all sentences in the source documents would be relevant for a given target sentence (as per the example in Section 1.2 , s 4 is not relevant for t 1 and s 3 is not relevant to t 2 ). For each target sentence ( t k ), we select the top f source sentences with the highest entailment probabilities ( α kf ) as the relevant priors. After softmax, the final layer of a pre-trained entailment model would give us the entailment probability between a given premise-hypothesis pair.
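
The following sketch illustrates this premise-selection step under the assumption that an off-the-shelf NLI model (here roberta-large-mnli from the HuggingFace hub) is substituted for the pre-trained ESIM model used in the paper; the sentences are taken from Example 2, and the top-f scores play the role of the relevance scores.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL = "roberta-large-mnli"   # off-the-shelf NLI model standing in for the pre-trained ESIM
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL).eval()

def entailment_prob(premise, hypothesis):
    """P(entailment) for a single premise-hypothesis pair."""
    inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    # label order for roberta-large-mnli: 0 = contradiction, 1 = neutral, 2 = entailment
    return torch.softmax(logits, dim=-1)[0, 2].item()

def select_relevant_premises(source_sentences, target_sentence, f=3):
    """Return the top-f source sentences and their relevance scores (the alpha_kf of Figure 1)."""
    scored = [(s, entailment_prob(s, target_sentence)) for s in source_sentences]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:f]

source = [
    "Facebook is still the most popular social networking site.",
    "It was created by Mark Zuckerberg and his colleagues at Harvard back in 2004.",
    "Harvard University is located in Cambridge, Massachusetts.",
    "Zuckerberg now lives in Palo Alto, California.",
]
print(select_relevant_premises(source, "Facebook was launched in Cambridge.", f=3))
```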

3.1.1  Input .

Let S_1, S_2, …, S_n be the source documents retrieved from a document collection for a target document T. In our experiments, the source documents were already designated for a given target document. We split the source and target documents into their constituent sentences. Here, s_ij denotes the i-th sentence of source document j, and t_k denotes the k-th sentence of the target document T. The final objective is to determine whether T is novel or non-novel with respect to S_1, S_2, …, S_n.

3.1.2  Inference Model .

3.2 Selection Module and Relevance Scores

Not all the source sentences would contribute toward the target sentence. Hence, we retain the topmost f relevant source sentences for the target sentence t k based on the entailment probabilities or what we term as the relevance scores . In Figure 1 , α kf denotes the relevance scores for the top f selected source sentences for a target sentence t k . We would further use these relevance scores while arriving at a Source-Aware Target (SAT) representation in the Novelty Detection module. Thus, the relevance module’s outputs are multiple relevant source sentences s kf for a given target sentence t k and their pairwise relevance scores.

3.3 Novelty Detection Module

The goal of the Novelty Detection module is to assimilate information from the multiple relevant source sentences (from source documents) to ascertain the novelty of the target document. The novelty detection module would take as input the target document sentences paired with their corresponding f relevant source sentences. This module would again make use of a pre-trained entailment model (i.e., ESIM here) along with the relevance scores between each source-target sentence pair from the earlier module to independently arrive at a SAT representation for each target sentence t k . We use the earlier module’s relevance scores to incentivize the contributing source sentences and penalize the less-relevant ones for the concerned target sentence. Finally, we concatenate the k SAT representations, passing it through a final feed-forward and linear layer, to decide on the novelty of T . We discuss the assimilation of multiple premises weighted by their relevance scores in the following section. The number of entailment functions in this layer depends on the number of target sentences ( k ) and the number of relevant source sentences you want to retain for each target sentence (i.e., f ).

3.3.1  Relevance-weighted Inference Model to Support Multi-premise Entailment .

A typical neural entailment model consists of an input encoding layer, local inference layer, and inference composition layer (see Figure 1b ). The input layer encodes the premise (source) and hypothesis (target) texts; the local inference layer makes use of cross-attention between the premise and hypothesis representations to yield entailment relations, followed by additional layers that use this cross-attention to generate premise attended representations of the hypothesis and vice versa. The final layers are classification layers, which determine entailment based on the representations from the previous layer. In order to assimilate information from multiple source sentences, we use the relevance scores from the previous module to scale up the representations from the various layers of the pre-trained entailment model (E) and apply a suitable join operation (Trivedi et al. 2019 ). In this join operation, we use a part of the entailment stack to give us a representation for each sentence pair that represents important features of the sentence pair and hence gives us a meaningful document level representation when combined with weights. We denote this part of the stack as f e 1 . The rest of the entailment stack that we left out in the previous step is used to obtain the final representation from the combined intermediate representations and is denoted by f e 2 . This way, we aim to emphasize the top relevant source-target pairs and attach lesser relevance scores to the bottom ones for a given target sentence t k . The join operation would facilitate the assimilation of multiple source information to infer on the target.
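
A minimal sketch of this join operation follows. The concrete layer sizes, the use of summation as the aggregation, and the dummy f_e1/f_e2 modules are assumptions made for illustration; in the actual model they correspond to the lower and upper parts of the pre-trained ESIM stack.

```python
import torch
import torch.nn as nn

class RelevanceWeightedJoin(nn.Module):
    """Sketch of the join: f_e1 encodes each (premise, hypothesis) pair, the pair
    representations are scaled by their relevance scores alpha_kf and aggregated,
    and f_e2 maps the aggregate to the final (here 2-class) representation."""
    def __init__(self, pair_dim=600, hidden=300, n_classes=2):
        super().__init__()
        # f_e1: stand-in for the lower part of the entailment stack
        self.f_e1 = nn.Sequential(nn.Linear(pair_dim, hidden), nn.ReLU())
        # f_e2: stand-in for the projection layers left out of f_e1
        self.f_e2 = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, n_classes))

    def forward(self, premises, hypothesis, alphas):
        # premises: (f, d), hypothesis: (d,), alphas: (f,) relevance scores from Section 3.2
        pairs = torch.cat([premises, hypothesis.expand_as(premises)], dim=-1)   # (f, 2d)
        pair_reprs = self.f_e1(pairs)                                           # (f, hidden)
        joined = (alphas.unsqueeze(-1) * pair_reprs).sum(dim=0)                 # weighted aggregation
        return self.f_e2(joined)

f, d = 3, 300
join = RelevanceWeightedJoin()
out = join(torch.randn(f, d), torch.randn(d), torch.softmax(torch.randn(f), dim=0))
print(out.shape)   # torch.Size([2])
```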

We now discuss how we incorporate the relevance scores to various layers of the pre-trained entailment model (E) and assimilate the multiple source information for a given target sentence t k .

3.3.2  Input Layer to Entailment Model .

3.3.3 Cross-Attention Layer.

Next is the cross attention between the source and target sentences to yield the entailment relationships. In order to put emphasis on the most relevant source-target pairs, we scale the cross-attention matrices with the relevance scores from the previous module and then re-normalize the final matrix.

So, for a source sentence s against a given target sentence t, we obtain a source-to-target cross-attention matrix s̃_i and a target-to-source cross-attention matrix t̃_j, with dimensions (i × j) and (j × i), respectively.

Now, for our current multi-source and multi-target scenario, for the given target sentence t_k we found f relevant source sentences s_k1, s_k2, …, s_kf. The assimilation mechanism scales the corresponding attention matrices by a factor α_kf for each source (s_kf)-target (t_k) pair to generate the SAT for t_k against s_k1, s_k2, …, s_kf.
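
The sketch below shows this scaling-and-renormalization step in isolation, with random score matrices standing in for the actual ESIM cross-attention outputs.

```python
import torch

def scaled_cross_attention(score_matrix, alpha):
    """Scale a source-to-target cross-attention score matrix by the relevance score
    alpha of this (source, target) pair, then re-normalize row-wise."""
    return torch.softmax(alpha * score_matrix, dim=-1)

# toy example: f = 2 relevant source sentences for one target sentence t_k
score_matrices = [torch.randn(4, 5), torch.randn(6, 5)]   # (source length x target length)
alphas = [0.9, 0.4]                                        # relevance scores alpha_k1, alpha_k2
attended = [scaled_cross_attention(m, a) for m, a in zip(score_matrices, alphas)]
print([a.shape for a in attended])
```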

3.3.4  Source-Aware Target Representations .

Selected source premises (s_kf) from the selection module are scaled with the relevance attention weights (α_kf) to attach importance to the selected premises. The transformation from s_kf to h_kf is achieved by cross-attention between source and target sentences, followed by a concatenation of the attention-weighted premise and by the higher-order entailment layers in the ESIM stack (pooling, concatenation, feed-forward, linear) (Chen et al. 2017). h_kf is the output of the entailment stack, which is further scaled with the attention weight (α_kf). For further details on how the ESIM stack for inference works (e.g., the transformation of source representations to the entailment hidden-state representations), please consult Chen et al. (2017).

3.3.5  Novelty Classification .

We stack the SAT representations ( SAT k ) for all the sentences in the target document and pass the fused representation through an MLP to discover important features and finally classify with a layer having softmax activation function. The output is whether the target document is Novel or Non-Novel with respect to the source documents.
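
A minimal sketch of this classification head follows. Averaging the SAT vectors (rather than concatenating them, as in the paper) is a simplification made here so that the input size stays fixed for a variable number of target sentences k.

```python
import torch
import torch.nn as nn

class NoveltyClassifier(nn.Module):
    """Sketch of the classification head over the stacked SAT representations."""
    def __init__(self, sat_dim=300, hidden=300, n_classes=2):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(sat_dim, hidden), nn.ReLU(), nn.Linear(hidden, n_classes))

    def forward(self, sat_reprs):
        # sat_reprs: (k, sat_dim), one SAT vector per target sentence; averaging keeps
        # the input size fixed for variable k (the paper concatenates instead)
        fused = sat_reprs.mean(dim=0)
        return torch.softmax(self.mlp(fused), dim=-1)

print(NoveltyClassifier()(torch.randn(7, 300)))   # class probabilities: Novel vs. Non-Novel
```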

4. Dataset Description

The most popular datasets for textual novelty detection are the ones released in TREC 2002–2004 (Harman 2002a; Soboroff and Harman 2003) and RTE-TAC 2010–2011 (Bentivogli et al. 2010, 2011). However, these datasets are for sentence-level novelty mining and hence do not cater to our document-level investigation needs. Therefore, for the current problem of document-level novelty classification, we experiment with two document-level datasets: the APWSJ (Zhang, Callan, and Minka 2002b) and the one we developed, TAP-DLND 1.0 (Ghosal et al. 2018b). We also extend our TAP-DLND 1.0 dataset, include sentence-level annotations to arrive at a document-level novelty score, and coin it TAP-DLND 2.0, which we present in this article. All these datasets are in the newswire domain.

4.1 TAP-DLND 1.0 Corpus

We experiment with our benchmark resource for document-level novelty detection (Ghosal et al. 2018b ). The dataset is balanced and consists of 2,736 novel and 2,704 non-novel documents. There are several categories of events; ten to be precise (Business, Politics, Sports, Arts and Entertainment, Accidents, Society, Crime, Nature, Terror, Society). For each novel/non-novel document, there are three source documents against which the target documents are annotated. While developing this dataset, we ensured that Relevance , Relativity , Diversity , and Temporality (Ghosal et al. 2018b ) characteristics were preserved.

The annotation guidelines were as follows:
  • Annotate a document as non-novel if its semantic content significantly overlaps with the source document(s) (maximum redundant information).
  • Annotate a document as novel if its semantic content, as well as its intent (direction of reporting), significantly differs from the source document(s) (minimum or no information overlap); it could be an update on the same event or describe a post-event situation.

We left out the ambiguous cases (for which the human annotators were unsure about the label).

The TAP-DLND 1.0 corpus structure. We retain the structure in the extended dataset (TAP-DLND 2.0) we use in the current work.

Apart from the inter-rater agreement, we use Jaccard Similarity (Jaccard 1901), BLEU (Papineni et al. 2002), and ROUGE (Lin 2004) to judge the quality of the data. We compute the average scores between source and target documents and show them in Table 1. It is clear that non-novel documents' similarity with the corresponding source documents is higher than that of their novel counterparts, which is justified.

Table 1: On measuring the quality of annotations via automatic metrics (weak indicators).
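
A minimal sketch of such overlap checks, assuming the nltk and rouge-score packages are installed; the example sentences are illustrative.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from rouge_score import rouge_scorer

def jaccard(a, b):
    a, b = set(a.lower().split()), set(b.lower().split())
    return len(a & b) / len(a | b)

def overlap_scores(source_doc, target_doc):
    smooth = SmoothingFunction().method1
    bleu = sentence_bleu([source_doc.lower().split()], target_doc.lower().split(),
                         smoothing_function=smooth)
    rouge_l = rouge_scorer.RougeScorer(["rougeL"]).score(source_doc, target_doc)["rougeL"].fmeasure
    return {"jaccard": jaccard(source_doc, target_doc), "bleu": bleu, "rougeL": rouge_l}

source = "Facebook was created by Mark Zuckerberg at Harvard in 2004."
non_novel_target = "Mark Zuckerberg created Facebook at Harvard in 2004."
novel_target = "The founder of the company now lives in Palo Alto, California."
print(overlap_scores(source, non_novel_target))   # higher overlap, as expected for non-novel
print(overlap_scores(source, novel_target))       # lower overlap, as expected for novel
```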

4.2 APWSJ Dataset

The APWSJ dataset consists of news articles from the Associated Press (AP) and Wall Street Journal (WSJ) covering the same period (1988–1990) with many on the same topics, guaranteeing some redundancy in the document stream. There are 11,896 documents on 50 topics (Q101–Q150 TREC topics). After sentence segmentation, these documents have 319,616 sentences in all. The APWSJ data contain a total of 10,839 (91.1%) novel documents and 1,057 (8.9%) non-novel documents. However, similar to Zhang, Callan, and Minka ( 2002b ), we use the documents within the designated 33 topics with redundancy judgments by the assessors. The dataset was meant to filter superfluous documents in a retrieval scenario to deliver only the documents having a redundancy score below a calculated threshold. Documents for each topic were delivered chronologically, and the assessors provided two degrees of judgments on the non-novel documents: absolute redundant or somewhat redundant , based on the preceding documents. The unmarked documents are treated as novel . However, because there is a huge class imbalance, we follow Zhang, Callan, and Minka ( 2002b ), and include the somewhat redundant documents also as non-novel and finally arrive at ∼37% non-novel instances. Finally, there are 5,789 total instances, with 3,656 novel and 2,133 non-novel. The proportion of novel instances for the novelty classification experiments is 63.15%.
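
A quick arithmetic check of the class proportions quoted above:

```python
# Quick check of the APWSJ class proportions after folding "somewhat redundant" into non-novel.
novel, non_novel = 3656, 2133
total = novel + non_novel
print(total)                               # 5789 instances
print(round(100 * non_novel / total, 1))   # ~36.8% non-novel (roughly 37%)
print(round(100 * novel / total, 2))       # 63.15% novel
```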

4.3 TAP-DLND 2.0 Corpus

We present the extended version of our TAP-DLND 1.0 corpus with this work. The new TAP-DLND 2.0 dataset is available at https://github.com/Tirthankar-Ghosal/multipremise-novelty-detection . Whereas TAP-DLND 1.0 is for document-level novelty classification, the TAP-DLND 2.0 dataset is catered toward deducing the novelty score of a document (quantifying novelty) based on the information contained in the preceding/source documents. Also, we annotate the new dataset at the sentence level (more fine-grained) in an attempt to weed out inconsistencies that may have persisted with document-level annotations.

We re-annotate TAP-DLND 1.0 from scratch, now at the sentence level, extend it to more than 7,500 documents, and finally deduce a document-level novelty score for each target document. The judgment of novelty at the document level is not always unanimous and is subjective. Novelty comprehension also depends on the appetite of the observer/reader (in our case, the annotator or labeler) (Zhao and Lee 2016). It is also quite likely that every document contains something new with respect to previously seen information (Soboroff and Harman 2003). However, this relative amount of new information does not always justify labeling the entire document as novel. Also, the significance of the new information with respect to the context plays a part. It may happen that a single information update is so crucial and central to the context that it affects the novelty comprehension of the entire document for a labeler. Hence, to reduce inconsistencies, we take an objective view and deem that, instead of looking at the target document in its entirety, examining the sentential information content gives a more fine-grained view of the new information in the target document discourse. Thus, with this motivation, we formulate a new set of annotation guidelines for annotations at the sentence level. We associate scores with each annotation judgment, which finally cumulate into a document-level novelty score.

We design an easy-to-use interface (Figure 4) to facilitate the annotations and perform the annotation event-wise. For a particular event, an annotator reads the predetermined three seed source documents, gathers information regarding that particular event, and then proceeds to annotate the target documents, one at a time. Upon selecting the desired target document, the interface splits the document into constituent sentences and allows six different annotation options for each target sentence (cf. Table 2). We finally take the cumulative average as the document-level novelty score for the target document. We exclude the sentences marked as irrelevant (IRR) from the calculation. The current data statistics for TAP-DLND 2.0 are in Table 3. We also plot the correspondence between the classes of TAP-DLND 1.0 and the novelty scores of TAP-DLND 2.0 to see how the perception of novelty varied across sentence- and document-level annotations. The plot is in Figure 3. We divide the whole range of novelty scores (from the TAP-DLND 2.0 annotations) into a set of five intervals, which are placed on the x-axis. The numbers of novel/non-novel annotated documents (from TAP-DLND 1.0) are shown as vertical bars. We can see that the number of novel documents steadily increases as the novelty score range increases, while the reverse is true for non-novel documents. This behavior signifies that the perception did not change drastically when we moved from document-level to sentence-level annotations, and also that our assigned scores (in Table 2) reflect this phenomenon to some extent.

Table 2: Sentence-level annotations. The target document sentences are annotated with respect to the information contained in the source documents for each event. The annotations are qualitatively defined; we assign scores to quantify them.

Table 3: TAP-DLND 2.0 dataset statistics. Inter-rater agreement (Fleiss 1971) is measured on 100 documents for sentence-level annotations by two raters.

Figure 3: The novelty class and novelty score correspondence between the TAP-DLND 1.0 and TAP-DLND 2.0 datasets. The blue and orange bars represent the number of novel and non-novel documents (y-axis) in the given score range (x-axis), respectively.

Figure 4: The sentence-level annotation interface used to generate the document-level novelty score (gold standard).

4.3.1  About the Annotators .

We had the same annotators from TAP-DLND 1.0 working on the TAP-DLND 2.0 dataset. One of the two full-time annotators holds a master’s degree in Linguistics, and the other annotator holds a master’s degree in English. They were hired full-time and paid the usual research fellow stipend in India. The third annotator to resolve the differences in the annotations is the first author of this article. The annotation period lasted more than six months. On average, it took ∼30 minutes to annotate one document of average length, but the time decreased and the consensus increased as we progressed in the project. A good amount of time went into reading the source documents carefully and then proceeding toward annotating the target document based on the acquired knowledge from the source documents for a given event. Because the annotators were already familiar with the events and documents (as they also did the document-level annotations for TAP-DLND 1.0), it was an advantage for them to do the sentence-level annotations.

4.3.2  Annotation Example .

We define the problem as associating a qualitative novelty score to a document based on the amount of new information contained in it. Let us consider the following example:

Source Text: Singapore, an island city-state off southern Malaysia, is a global financial center with a tropical climate and multicultural population. Its colonial core centers on the Padang, a cricket field since the 1830s and now flanked by grand buildings such as City Hall, with its 18 Corinthian columns. In Singapore’s circa-1820 Chinatown stands the red-and-gold Buddha Tooth Relic Temple, said to house one of Buddha’s teeth.

Target Text: Singapore is a city-state in Southeast Asia. Founded as a British trading colony in 1819, since independence, it has become one of the world’s most prosperous, tax-friendly countries and boasts the world’s busiest port. With a population size of over 5.5 million people, it is a very crowded city, second only to Monaco as the world’s most densely populated country.

The task is to find the novelty score of the target text with respect to the source text. It is quite clear that the target text has new information with respect to the source, except that the first sentence in the target contains some redundant content (Singapore is a city-state). Analyzing the first sentence in the target text, we obtain two pieces of information: that Singapore is a city-state and that Singapore lies in Southeast Asia. Keeping the source text in mind, we understand that the first part is redundant whereas the second part has new information; that is, we can infer that 50% of the information in the first target sentence is novel. Here, we consider only the surface-level information in the text and do not take into account any pragmatic knowledge of the reader regarding the geographical location of Singapore and Malaysia in Asia. Here, our new information appetite is more fine-grained and objective.

This scoring mechanism, although straightforward, intuitively resembles the human-level perception of the amount of new information. However, we do agree that this approach attaches equal weights to long and short sentences. Long sentences would naturally contain more information, whereas short sentences would convey less information. Also, we do not consider the relative importance of sentences within the documents. However, for the sake of initial investigation and ease of annotation, we proceed with this simple quantitative view of novelty and create a dataset that would be a suitable testbed for our experiments to predict the document-level novelty score. Identifying and annotating an information unit would be complex. However, we plan for further research with annotation at the phrase-level and with relative importance scores.
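
A minimal sketch of this aggregation follows, with placeholder per-sentence scores: the actual numeric values attached to the six annotation options follow Table 2 of the paper and are not reproduced here.

```python
def document_novelty_score(sentence_scores):
    """Cumulative average of per-sentence novelty scores, excluding sentences marked
    IRR (irrelevant). The values used below are placeholders in [0, 1], not the
    official Table 2 scores."""
    kept = [s for s in sentence_scores if s != "IRR"]
    return sum(kept) / len(kept) if kept else 0.0

# hypothetical target document: first sentence ~50% new (as in the Singapore example),
# next two sentences fully new, last sentence irrelevant to the event
print(document_novelty_score([0.5, 1.0, 1.0, "IRR"]))   # -> 0.833...
```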

4.4 Datasets for Allied Tasks

Finding semantic-level redundancy is more challenging than finding novelty in texts (Ghosal et al. 2018a). The challenge scales up when it is at the level of documents. Semantic-level redundancy is a good approximation of non-novelty. Novel texts usually consist of new terms and generally are lexically different from the source texts. Hence, in our experiments, we stress detecting non-novelty, which would eventually lead us to identify novelties in text. Certain tasks could simulate the detection of non-novelty. Paraphrasing is one such linguistic task, where paraphrases convey the same information as the source texts yet have significantly less lexical similarity. Another task that comes close to identifying novelties in text is plagiarism detection, which is a common problem in academia. We train our model with the document-level novelty datasets and test its efficacy in detecting paraphrases and plagiarized texts. We use the following well-known datasets for our investigation.

4.4.1  Webis Crowd Paraphrase Corpus .

The Webis Crowd Paraphrase Corpus 2011 (Webis-CPC-11) (Burrows, Potthast, and Stein 2013 ) consists of 7,859 candidate paraphrases obtained from the Amazon Mechanical Turk crowdsourcing. The corpus 3 is made up of 4,067 accepted paraphrases, 3,792 rejected non-paraphrases, and the original texts. For our experiment, we assume the original text as the source document and the corresponding candidate paraphrase/non-paraphrase as the target document. We hypothesize that a paraphrased document would not contain any new information, and we treat them as non-novel instances. Table 4 shows an example of our interpretation of non-novelty in the dataset.

Table 4: Sample text from Webis-CPC-11 to simulate the high-level semantic paraphrasing in the dataset.

4.4.2  P4PIN Plagiarism Corpus .

We use the P4PIN corpus (Sánchez-Vega 2016), a corpus especially built for evaluating the identification of paraphrase plagiarism. This corpus is an extension of the P4P corpus (Barrón-Cedeño et al. 2013), which contains pairs of text fragments where one fragment represents the original source text and the other represents a paraphrased version of the original. In addition, the P4PIN corpus also includes cases that are not paraphrase plagiarism, that is, negative examples formed by pairs of unrelated text samples with likely thematic or stylistic similarity. The P4PIN dataset consists of 3,354 instances: 847 positives and 2,507 negatives. We are interested in detecting plagiarism cases and also in seeing the novelty scores predicted by our model for each category of instances. Table 5 presents a plagiarism (non-novel) example from P4PIN.

Table 5: Sample from P4PIN showing a plagiarism (non-novel) instance.

4.4.3  Wikipedia Rewrite Corpus .

The dataset (Clough and Stevenson 2011) contains 100 pairs of short texts (193 words on average). For each of 5 questions about topics of computer science (e.g., "What is dynamic programming?"), a reference answer (source text, hereafter) has been manually created by copying portions of text from a relevant Wikipedia article. According to the degree of the rewrite, the dataset is 4-way classified as cut & paste (38 texts; a simple copy of text portions from the Wikipedia article), light revision (19; synonym substitutions and changes of grammatical structure allowed), heavy revision (19; rephrasing of Wikipedia excerpts using different words and structure), and no plagiarism (19; answer written independently from the Wikipedia article). We test our model on this corpus to examine the novelty scores predicted by our proposed approach for each category of answers. Please note that the information content for each of these answer categories is more or less the same, as they cater to the same question. A sample from the dataset is shown in Table 6. For easier comprehension and fairer comparison, we accumulate some relevant dataset statistics in Table 7.

Table 6: Sample from the Wikipedia Rewrite dataset showing a plagiarism (non-novel) instance.

Table 7: Statistics of all the datasets. L → average length of documents (in sentences); Size → size of the dataset in number of documents. The emphasis is on detecting semantic-level non-novelty, which is supposedly more challenging than detecting novel texts.

5. Evaluation

In this section, we evaluate the performance of our proposed approach, comparing it with baselines and also with our earlier approaches. We further show how our model performs in allied tasks like paraphrase detection, plagiarism detection, and identifying rewrites.

5.1 Baselines and Ablation Study

We carefully choose our baselines so that those also help in our ablation study. Baseline 1 emphasizes the role of textual entailment (i.e., what happens if we do not use the entailment principle in our model). With the Baseline 2 system, we investigate what happens if we do not include the relevance detection module in our architecture. Baseline 3 is similar to our earlier forays ( Section 2.2 ) in the sense that we examine what happens if we do not assimilate information from multiple relevant premises and just fixate our attention to one single most relevant source premise. So, in essence, our Baseline systems 1, 2, 3 also signify our ablations on the proposed approach.

5.1.1  Baseline 1: Joint Encoding of Source and Target Documents .

With this baseline, we want to see the importance of TE for our task of textual novelty detection. We use the Transformer variant of the Universal Sentence Encoder (Cer et al. 2018) to encode the sentences in the documents into fixed-size sentence embeddings (512 dimensions) and then stack them up to form the document embedding. We pass the source and target document representations to an MLP for feature extraction and final classification via softmax.
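To make the setup concrete, here is a minimal sketch of this baseline in Python, assuming TensorFlow Hub's Universal Sentence Encoder (Transformer variant); the mean-pooling of sentence vectors and the MLP sizes are illustrative simplifications rather than our exact configuration.

```python
import tensorflow as tf
import tensorflow_hub as hub

# Transformer variant of the Universal Sentence Encoder (512-d sentence vectors).
use = hub.load("https://tfhub.dev/google/universal-sentence-encoder-large/5")

def encode_document(sentences):
    """Embed each sentence and mean-pool into one fixed-size document vector.
    (The baseline stacks sentence vectors; mean-pooling is a simplification for this sketch.)"""
    emb = use(sentences)                  # (num_sentences, 512) tensor
    return tf.reduce_mean(emb, axis=0)    # (512,)

# MLP over the concatenated [source ; target] document representations.
mlp = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu", input_shape=(1024,)),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(2, activation="softmax"),   # novel vs. non-novel
])
mlp.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
```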

5.1.2  Baseline 2: Importance of Relevance Detection .

With this baseline, we investigate the significance of relevance detection as a prior task to novelty detection. We turn off the relevance detection module and use the individual entailment decisions from the pre-trained ESIM model to arrive at the document-level aggregated decision.
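The aggregation itself can be sketched as follows; the max-then-mean pooling and the 0.5 threshold are illustrative assumptions, the point being only that sentence-level entailment probabilities from the pre-trained ESIM model are pooled into a document-level decision without any relevance weighting.

```python
import numpy as np

def aggregate_entailment(entail_probs_per_target, threshold=0.5):
    """entail_probs_per_target[k][j] = P(entailment) for target sentence k given source
    sentence j, as produced by the pre-trained ESIM model."""
    # best supporting source sentence for each target sentence
    per_sentence = [max(probs) for probs in entail_probs_per_target]
    doc_score = float(np.mean(per_sentence))   # pooled document-level entailment
    return "non-novel" if doc_score > threshold else "novel"
```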

5.1.3  Baseline 3: Single Premise .

We keep all other parameters of our proposed model intact, but instead of having multiple premises, we take only the closest (top) premise (from the source sentences) for each target sentence. This way, we want to establish the importance of aggregating multiple premise entailment decisions for document-level novelty detection.

5.1.4  Baseline 4: Using BERT with MLP .

We want to see how recent state-of-the-art pre-trained large language models perform on our task. Essentially, we use a BERT-base model (bert-base-uncased) with 12 layers, 12 attention heads, and an embedding size of 768, for a total of 110M parameters, and fine-tune it on the novelty datasets under consideration. We feed the concatenation of source and target, separated by the [SEP] token, into a pre-trained BERT (Bidirectional Encoder Representations from Transformers) (Devlin et al. 2019) model, then take the pooled output from the [CLS] token of the encoder and pass the resulting representation to an MLP followed by classification via softmax. We take the implementation available in the HuggingFace library. 4 The original BERT model is pre-trained on the Toronto Book Corpus and Wikipedia. We keep the following hyperparameters during the task-specific (novelty detection) fine-tuning step: learning rate 2e-5, number of training epochs 10, dropout rate 0.1.
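A minimal sketch of this baseline with the HuggingFace transformers API is shown below; the hidden width of the MLP (256) is an illustrative assumption, while the checkpoint, the [SEP]-joined input, the pooled [CLS] output, and the fine-tuning learning rate follow the description above.

```python
import torch
import torch.nn as nn
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

class BertNoveltyClassifier(nn.Module):
    """Source and target are passed as a text pair (the tokenizer inserts [SEP]);
    the pooled [CLS] output feeds a small MLP with a softmax over {novel, non-novel}."""
    def __init__(self, num_classes=2, dropout=0.1):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")  # 12 layers, 768-d, ~110M params
        self.mlp = nn.Sequential(
            nn.Dropout(dropout),
            nn.Linear(768, 256), nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, source_text, target_text):
        enc = tokenizer(source_text, target_text, truncation=True,
                        padding=True, return_tensors="pt")
        pooled = self.bert(**enc).pooler_output                     # [CLS] representation
        return torch.softmax(self.mlp(pooled), dim=-1)

model = BertNoveltyClassifier()
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)          # 10 fine-tuning epochs
```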

5.1.5  Baseline 5: Using a Simple Passage-level Aggregation Strategy .

We follow a simple passage-level aggregation strategy as in Wang et al. ( 2018 ). We concatenate the selected source premises (top f ) after the selection module to form the union passage of the premises (i.e., we do not scale with the relevance weights as in the original model) and then proceed next as per our proposed approach.
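A minimal sketch of the aggregation step, assuming premises is a list of (sentence, relevance score) pairs produced by the selection module for one target sentence:

```python
def build_union_passage(premises, f=10):
    """Keep the top-f premises by relevance and concatenate them into a union passage,
    WITHOUT scaling by the relevance weights (the difference from the proposed approach)."""
    top = sorted(premises, key=lambda p: p[1], reverse=True)[:f]
    return " ".join(sentence for sentence, _ in top)
```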

5.2 Comparing Systems

We compare with our earlier works on the same datasets, keeping all experimental configurations the same. A brief description of the prior work is in Section 2.2 . Kindly refer to the papers for a detailed overview of the techniques.

5.2.1  Comparing System-1 .

In our first exploration of document-level novelty detection (Ghosal et al. 2018b), we use several handcrafted features, including lexical similarity, semantic similarity, divergence, keyword/NE overlap, and new word count. The best-performing classifier was a Random Forest (RF) (Ho 1995). The idea was to exploit similarity- and divergence-based handcrafted features for the problem. For more details on this comparing system, kindly refer to Section 2.2.1. This is the paper in which we introduced the TAP-DLND 1.0 dataset for document-level novelty detection.

5.2.2  Comparing System-2 .

In our next exploration, we introduce the concept of an RDV as a fused representation of the source and target documents (Ghosal et al. 2018a). We use a CNN to extract useful features for classifying the target document into novelty classes. For more details on this comparing system, kindly refer to Section 2.2.2.

5.2.3  Comparing System-3 .

To determine the amount of new information (the novelty score) in a document, we generate a Source-Encapsulated Target Document Vector (SETDV) and train a CNN to predict the novelty score of the document (Ghosal et al. 2019). The novelty score of a document ranges between 0 and 100 on the basis of its new information content, as annotated by our annotators (see Section 4.3). The architecture is quite similar to our RDV-CNN (Ghosal et al. 2018a), except that here, instead of classification, we predict the novelty score of the target document. The motivation is that it is not always straightforward to ascertain how much newness makes a document appear novel to a reader; this is subjective and depends on the reader's novelty appetite (Zhao and Lee 2016). Hence, we attempted to quantify newness for documents. The SETDV-CNN architecture also manifests the two-stage theory of human recall (Tulving and Kroll 1995), search and retrieval followed by recognition, to select the most probable premise documents for a given target document.

5.2.4  Comparing System-4 .

With this work, we went on to explore the role of textual alignment (via decomposable attention mechanism) between target and source documents to produce a joint representation (Ghosal et al. 2021 ). We use a feed-forward network to extract features and classify the target document on the basis of new information content. For more details on this comparing system, kindly refer to Section 2.2.3 .

5.3 BERT-NLI Variant of the Proposed Architecture

Because contextual language models supposedly capture semantics better than static language models, we experiment with a closely related variant of our proposed architecture. We make use of the BERT-based NLI model (Gao, Colombo, and Wang 2021) to examine the performance of BERT as the underlying language model in place of GloVe. This model is an NLI model obtained by fine-tuning Transformers on the SNLI and MultiNLI datasets (similar to ESIM). We use the same BERT-base variant as in Baseline 4. The rest of the architecture is the same as in our proposed approach: we use the same BERT-based NLI model in the relevance module (to derive the relevance scores) and in the novelty detection module (for the final classification). We use the same configuration as Gao, Colombo, and Wang (2021) for fine-tuning BERT-base on the NLI datasets. 5
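The swap can be sketched roughly as follows; the checkpoint name is only a placeholder (in practice it should be a BERT-base model fine-tuned on SNLI/MultiNLI, as in the bert_nli repository), and the three-way label order depends on the checkpoint, so treat both as assumptions.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Placeholder checkpoint; replace with a BERT-base model fine-tuned on SNLI/MultiNLI.
NLI_CHECKPOINT = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(NLI_CHECKPOINT)
nli_model = AutoModelForSequenceClassification.from_pretrained(NLI_CHECKPOINT, num_labels=3)

def entailment_probs(premise: str, hypothesis: str) -> torch.Tensor:
    """Return a probability distribution over the three NLI labels for one premise-hypothesis pair."""
    enc = tokenizer(premise, hypothesis, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = nli_model(**enc).logits
    return torch.softmax(logits, dim=-1).squeeze(0)
```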

5.4 Hyperparameter Details

Our current architecture uses the ESIM stack, pre-trained on SNLI and MultiNLI as the entailment model, for both the relevance and novelty detection modules. Binary cross entropy is the loss function, and the default dropout is 0.5. We train for 10 epochs with the Adam optimizer and a learning rate of 0.0004. The final feed-forward network has ReLU activation with a dropout of 0.2. The input size for the Bi-LSTM context encoder is 300 dimensions, and we use the GloVe 840B embeddings for the input tokens. For all uses of ESIM in our architecture, we initialize with the same pre-trained entailment model weights available with AllenNLP (Gardner et al. 2018).
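For reference, this training configuration can be restated in code; model here stands for the full relevance plus novelty-detection network built on the pre-trained ESIM stack and is a hypothetical placeholder.

```python
import torch

TRAIN_CONFIG = {
    "epochs": 10,
    "learning_rate": 4e-4,
    "default_dropout": 0.5,
    "ffn_dropout": 0.2,          # final feed-forward network (ReLU activation)
    "bilstm_input_dim": 300,     # GloVe 300-d embeddings for the input tokens
}

def make_training_objects(model: torch.nn.Module):
    criterion = torch.nn.BCELoss()                                   # binary cross entropy
    optimizer = torch.optim.Adam(model.parameters(),
                                 lr=TRAIN_CONFIG["learning_rate"])
    return criterion, optimizer
```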

5.5 Results

We discuss the results of our current approach in this section. We use the TAP-DLND 1.0 and APWSJ datasets for our novelty classification experiments and the proposed TAP-DLND 2.0 dataset for the experiments on quantifying new information. We also report our experimental results on the Webis-CPC dataset, where we assume paraphrases to simulate semantic-level non-novelty. In addition, we show use cases of our approach for semantic-level plagiarism detection (another form of non-novelty in academia) on the P4PIN and Wikipedia Rewrite datasets.

5.5.1  Evaluation Metrics .

We use the usual classification metrics for the novelty classification task: Precision, Recall, F1 score, and Accuracy. For the APWSJ dataset, instead of accuracy, we report Mistake (100 − Accuracy) to compare with earlier works. For the novelty scoring experiments on TAP-DLND 2.0, we evaluate our baselines and proposed model against the ground-truth scores using the Pearson correlation coefficient, the mean absolute error (the lower, the better), the root mean squared error (the lower, the better), and the cosine similarity between the actual and predicted scores.
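A minimal sketch of the scoring metrics, assuming the ground-truth and predicted novelty scores are given as NumPy arrays:

```python
import numpy as np
from scipy.stats import pearsonr

def score_metrics(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    pc, _ = pearsonr(y_true, y_pred)                    # Pearson correlation coefficient
    mae = np.mean(np.abs(y_true - y_pred))              # mean absolute error (lower is better)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))     # root mean squared error (lower is better)
    cosine = y_true @ y_pred / (np.linalg.norm(y_true) * np.linalg.norm(y_pred))
    return {"PC": pc, "MAE": mae, "RMSE": rmse, "Cosine": cosine}
```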

5.5.2  On TAP-DLND 1.0 Dataset .

Table 8 shows our results on the TAP-DLND 1.0 dataset for the novelty classification task. As discussed in Section 3.2, here we keep f = 10, that is, we take the ten most relevant source sentences (based on the α_kf scores) as the relevant premises for each target sentence t_k in the target document. Our current approach performs comparably with our preceding approach (Comparing System 4). Given the high recall for the non-novel class, our approach has an affinity for discovering document-level non-novelty, which is comparatively more challenging at the semantic level. The results in Table 8 are from 10-fold cross-validation experiments.

Results on TAP-DLND 1.0. P → Precision, R → Recall, A → Accuracy, N → Novel, NN → Non-Novel; 10-fold cross-validation output. PLA → Passage-level Aggregation, as in Wang et al. (2018).

Results for redundant class on APWSJ. Mistake → 100-Accuracy. Except for Zhang, Callan, and Minka ( 2002b ), all other results correspond to a 10-fold cross-validation output.

5.5.3  On APWSJ Dataset .

The APWSJ dataset is more challenging than TAP-DLND 1.0 because of the sheer number of preceding documents that must be processed to decide the novelty of the current one. The first document in the chronologically ordered set of documents for a given topic is always novel, as it starts the story; the novelty of every other document is judged against the chronologically preceding ones. Thus, for the final document in a given topic (see Section 4.2 for the TREC topics), the network needs to process all the preceding documents in that topic. Although APWSJ was developed from an information retrieval perspective, we take a classification perspective (i.e., we classify the current document into the novel or non-novel category based on its chronological priors), as sketched below. Table 9 reports our result and compares it with earlier systems. Kindly note that we use the same experimental conditions as the original paper (Zhang, Callan, and Minka 2002b) and count partially redundant documents as part of the redundant class. Our current approach performs much better than the earlier reported results with f = 10, signifying the importance of multi-premise entailment for the task at hand. We report our results on the redundant class, as in earlier systems. Finding semantic-level non-novelty in documents is much more challenging than identifying whether a document has enough new things to say to classify it as novel.
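The classification protocol we follow can be sketched as below, where docs_by_topic maps each TREC topic to its chronologically ordered documents and predict_novelty is the trained model's decision function (both names are hypothetical):

```python
def classify_topic_stream(docs_by_topic, predict_novelty):
    """Classify each document as novel/non-novel against all of its chronological priors."""
    labels = {}
    for topic, docs in docs_by_topic.items():
        for i, target in enumerate(docs):
            if i == 0:
                labels[(topic, target["id"])] = "novel"      # the first document starts the story
            else:
                sources = docs[:i]                           # all chronologically preceding documents
                labels[(topic, target["id"])] = predict_novelty(sources, target)
    return labels
```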

5.5.4  On TAP-DLND 2.0 Dataset .

On our newly created dataset for predicting novelty scores, we squash the output into a numerical score instead of performing classification. We use the same architecture as in Figure 1 but apply a sigmoid activation at the last layer, scaled to restrict the score to the range 0–100, as sketched below. Table 10 shows our performance. This experiment is particularly important for quantifying the amount of newness in the target document with respect to the source documents. Kindly note that we allow a ±5 margin around the human-annotated score for our predicted scores. We see that our current approach performs comparably with the earlier reported results.
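A minimal sketch of the scoring head and the ±5 tolerance check; the input dimension is an assumption and should match the document representation produced by the rest of the network.

```python
import torch
import torch.nn as nn

class NoveltyScoringHead(nn.Module):
    """Final layer for the scoring variant: sigmoid output scaled to the 0-100 range."""
    def __init__(self, in_dim=300):
        super().__init__()
        self.fc = nn.Linear(in_dim, 1)

    def forward(self, doc_repr):
        return torch.sigmoid(self.fc(doc_repr)).squeeze(-1) * 100.0

def within_tolerance(pred, gold, tol=5.0):
    """Count a predicted score as correct if it lies within ±tol of the annotated score."""
    return (pred - gold).abs() <= tol
```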

Performance of the proposed approach against the baselines and comparing systems on TAP-DLND 2.0. PC → Pearson Correlation Coefficient, MAE → Mean Absolute Error, RMSE → Root Mean-Squared Error, Cosine → Cosine similarity between predicted and actual score vectors. Comparing Systems 2 and 3 are thematically the same.

5.5.5  Ablation Studies .

As mentioned, our baselines also serve as ablation studies. Baseline 1 is the simplest: we let the network discover useful features from the universal representations of the source-target pairs without any more sophisticated machinery, and it performs the worst, which establishes the importance of our TE pipeline for the task. In Baseline 2, we drop the relevance detection module and hence do not include the relevance weights in the architecture; it performs much better than Baseline 1 (a relative improvement of 8.3% on the TAP-DLND 1.0 dataset and a reduction in Mistake of about 10% on APWSJ). For Baseline 3, we take only the single most relevant premise (the one with the highest relevance score) instead of multiple premises; it improves over Baseline 2 by a margin of 3.9% for TAP-DLND 1.0 and 5.2% for APWSJ. We observe similar behavior for novelty scoring on TAP-DLND 2.0. With our proposed approach, however, we attain a significant performance gain over these ablation baselines, as is evident in Tables 8, 9, and 10. Our analysis thus indicates the importance of relevance scores in a multi-premise scenario for the task at hand.

5.6 Results on Related Tasks

To evaluate the efficacy of our approach, we also test our model on tasks related to textual novelty (Section 4.4).

5.6.1  Paraphrase Detection .

As already mentioned, paraphrase detection is one task that simulates the notion of non-novelty at the semantic level. Detecting semantic-level redundancies is not straightforward: we are interested in identifying documents that are lexically distant from the source yet convey the same meaning (and are thus semantically non-novel). For this purpose, we experiment with the Webis-CPC-11 corpus, which consists of paraphrases of high-level literary texts (see Table 4 for an example simulating non-novelty). We report our results on the paraphrase class, as the non-paraphrase instances in this dataset do not correspond to novel documents. We perform comparably with our earlier results (Table 11). This is particularly encouraging because detecting semantic-level non-novelty is challenging, and the texts in this dataset are richer than the more straightforward newspaper texts (Table 4).

Results for paraphrase class on Webis-CPC, 10-fold cross-validation output.

5.6.2  Plagiarism Detection .

We envisage plagiarism as one form of semantic-level non-novelty. We discuss our performance on plagiarism detection below.

P4PIN Dataset

Semantic-level plagiarism is another task that closely simulates non-novelty. The P4PIN dataset is not large (only 847 plagiarism instances) and is not suitable for a deep learning experimental setup. We therefore adopt a transfer learning scheme: we train our model on TAP-DLND 1.0 (the novelty detection task) and test whether it can identify the plagiarism cases in P4PIN. We are not interested in the non-plagiarism instances, as those do not conform to our idea of novelty; non-plagiarism instances in P4PIN exhibit thematic and stylistic similarity to the content of the original text. We correctly classify 832 out of 847 plagiarized instances, yielding a sensitivity of 0.98 for identifying semantic-level plagiarism. Figure 5a shows the predicted novelty scores for the documents in P4PIN (from the scoring model trained on TAP-DLND 2.0). The novelty scores for the plagiarism class are clearly concentrated in the lower half, indicating low novelty, while those for the non-plagiarism class are in the upper half, signifying higher novelty.

Predicted novelty scores for documents in P4PIN and WikiRewrite by our model trained on TAP-DLND 1.0.

Wikipedia Rewrite

We also check how well our model can identify the various degrees of rewriting (plagiarism) using the Wikipedia Rewrite dataset. Here again, we train on TAP-DLND 2.0. We take the negative log of the predicted scores (the higher the result, the lower the novelty) and plot it along the y-axis in Figure 5b. According to our definition, all four classes of documents (near copy, light revision, heavy revision, non-plagiarism) are not novel, but the degree of non-novelty should be highest for near copy, followed by light revision, and then heavy revision. Near copy simulates a case of lexical-level plagiarism, whereas light revision and heavy revision can be thought of as plagiarism at the semantic level. The novelty scores predicted by our model cluster by category: where there is no plagiarism, the novelty score is comparatively higher (non-plagiarism instances sit at the bottom of the plot, signifying higher novelty scores). This performance on predicting non-novel instances indicates that finding multiple sources and assimilating the corresponding information is essential for arriving at the novelty/non-novelty judgement.
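The plot in Figure 5b can be reproduced roughly as follows, assuming scores maps each rewrite category to the novelty scores (0-100) predicted by the TAP-DLND 2.0-trained model (the variable name is hypothetical):

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_negative_log_scores(scores):
    """Plot -log(predicted novelty score) per rewrite category; higher y means lower novelty."""
    for i, (category, vals) in enumerate(scores.items()):
        y = -np.log(np.asarray(vals, dtype=float) + 1e-6)
        plt.scatter([i] * len(y), y, label=category, alpha=0.6)
    plt.xticks(range(len(scores)), list(scores.keys()), rotation=20)
    plt.ylabel("-log(predicted novelty score)")
    plt.legend()
    plt.show()
```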

5.7 On Using Contextual Language Models

It is quite evident from our experiments that recent pre-trained contextual language models (BERT in particular) perform well on the task even with a simple architecture (Baseline 4). The BERT-NLI version of the GloVe-based ESIM stack, modeled as per our proposed approach, performs comparably and sometimes even better; in particular, it is better at identifying semantic-level redundancies (non-novelty, paraphrases). We believe it would be an interesting direction to use very large billion-parameter language models (such as T5 [Raffel et al. 2020], GPT-3 [Brown et al. 2020], Megatron-Turing Natural Language Generation, 6 and so on) to learn the notion of newness automatically from the source-target pairs themselves.

The passage-level aggregation baseline (Wang et al. 2018) performed better than the other baselines; however, the proposed approach edged it out, probably because we scale the selected premise representations by their corresponding relevance scores.

5.8 Analysis

The actual documents in all of our datasets are long and would not fit within the scope of this article. Hence, we take the same example as in Section 1 (Example 2) to analyze the performance of our approach.

Figure 6 depicts the heatmap of the attention scores between the target and source document sentences. We can clearly see that for target sentence t1 the most relevant source sentences predicted by our model are s1, s2, and s3. When we read t1 (Facebook was launched in Cambridge) against the source document, we understand that t1 offers no new information, but to establish that we need multi-hop reasoning over s1 (Facebook) → s2 (created in Harvard) → s3 (Harvard is in Cambridge). The other information in s4 (Zuckerberg lives in California) does not contribute to ascertaining t1 and is hence distracting information. Our model pays low attention to s4.

Heatmap depicting the attention scores between the source and target document (Example 2 in Section 1). t1, t2 are the target document sentences (vertical axes), and s1, s2, s3, s4 are source document sentences (horizontal axes). The brighter the shade, the stronger the alignment, signifying an affinity toward non-novelty.

Similarly, when we consider the next target sentence t2 (The founder resides in California), we understand that s4 (Zuckerberg lives in California), s2 (Zuckerberg created Facebook), and s1 (Facebook) are the source sentences that establish that t2 carries no new information. s3 (Harvard is in Cambridge) has no relevance to the sentence in question; hence, our model assigns s3 the lowest attention score for t2, signifying that s3 is a distracting premise.

Finally, our model predicts that the target document in question is non-novel with respect to the source document; the predicted novelty score is 20.59 on a scale of 100. Let us now take a more complicated example.

Source Document 1 (S1): Coronavirus disease (COVID-19) is an infectious disease caused by a newly discovered coronavirus. Most people who fall sick with COVID-19 will experience mild to moderate symptoms and recover without special treatment.

Source Document 2 (S2): The virus that causes COVID-19 is mainly transmitted through droplets generated when an infected person coughs, sneezes, or exhales. These droplets are too heavy to hang in the air and quickly fall on floors or surfaces. You can be infected by breathing in the virus if you are within close proximity of someone who has COVID-19 or by touching a contaminated surface and then your eyes, nose, or mouth.

Source Document 3 (S3): You can reduce your chances of being infected or spreading COVID-19 by regularly and thoroughly cleaning your hands with an alcohol-based hand rub or washing them with soap and water. Washing your hands with soap and water or using alcohol-based hand rub kills viruses that may be on your hands.

Target T1 (Non-Novel): Coronavirus is a respiratory illness, meaning it is mainly spread through virus-laden droplets from coughs and sneezes. The government’s advice on Coronavirus asks the public to wash their hands more often and avoid touching their eyes, nose, and mouth. Hands touch many surfaces and can pick up viruses. Once contaminated, hands can transfer the virus to your eyes, nose, or mouth. From there, the virus can enter your body and infect you. You can also catch it directly from the coughs or sneezes of an infected person.

Target T2: COVID-19 symptoms are usually mild and begin gradually. Some people become infected but don’t develop any symptoms and don’t feel unwell. Most people (about 80%) recover from the disease without needing special treatment. Older people, and those with underlying medical problems like high blood pressure, heart problems or diabetes, are more likely to develop serious illnesses.

The heatmap for the above examples after prediction is shown in Figure 7. Keeping the source documents (S1, S2, S3) the same, we analyze our model's predictions for the two target documents (T1 and T2). The source document sentences are along the horizontal axes, and the target document sentences are along the vertical axes. Reading T1 and T2 against S1, S2, and S3, we see that T1 offers very little new information, whereas T2 contains some new information (older people are more susceptible to the disease). Our model predicts novelty scores of 22.73 and 40.30 for T1 and T2, respectively, which is intuitive: both target documents appear largely non-novel with respect to S1, S2, and S3, but T2 carries somewhat more new information.

Heatmap depicting the attention scores between the source documents (S1, S2, S3) and target documents (T1, T2). The brighter the shade, the stronger the alignment, signifying an affinity toward non-novelty.

The third sentence in T2 (Most people (about 80%) recover from the disease without needing special treatment) attends strongly to the second sentence in S1 (Most people who fall sick with COVID-19 will experience mild to moderate symptoms and recover without special treatment). Similarly, the fourth sentence in T1 aligns strongly with the third sentence in S2, signifying that the target sentence carries little or no new information with respect to the source candidates.

These heatmaps show how multiple premises in the source documents attend to the target sentences, which our approach captures correctly, supporting our hypothesis. We also experiment with our earlier best-performing model, Comparing System 4 (decomposable attention-based novelty detection). The decomposable attention-based model, however, predicts the class incorrectly, as we can see in Figure 8: it assigns low attention values between the source-target sentence pairs and hence predicts the target document as novel. Our current approach correctly predicts the class label of the target document.

Heatmap of attention values from the decomposable attention-based model for novelty detection (Comparing System 4) for the target T2 against the source documents S1, S2, S3. Due to the low attention values, the model predicts the document pair as 'Novel', which is not correct.

5.9 Error Analysis

Long Documents: The misclassified instances in the APWSJ and TAP-DLND 1.0 datasets tend to be very long, and the corresponding source documents contain a large amount of information. Although our architecture works at the sentence level and then composes decisions at the document level, finding the relevant premises in large documents is challenging.

Non-coherence of Premises: Another challenge is aggregating the premises, since they are not in a coherent order after selection in the Selection Module.

Named Entities: Let us consider a misclassified instance (see the heatmap in Figure 9 ) with respect to the COVID-19 source documents in the earlier example.

Heatmap of the misclassified instance.

Target T3 (Novel): The world has seen the emergence of a Novel Corona Virus on 31 December 2019, officially referred to as COVID-19. The virus was first isolated from persons with pneumonia in Wuhan city, China. The virus can cause a range of symptoms, ranging from mild illness to pneumonia. Symptoms of the disease are fever, cough, sore throat, and headaches. In severe cases, difficulty in breathing and deaths can occur. There is no specific treatment for people who are sick with Coronavirus and no vaccine to prevent the disease.

We can clearly see that T3 contains new information with respect to the source documents. However, due to the high correspondence of named entities and certain content words (e.g., virus) between the source-target pairs, our classifier may have been confused and predicted T3 as non-novel. Kindly note that the documents in the actual dataset are much longer than the examples we show here, which adds further complexity to the task.

Textual novelty detection has an array of use cases, ranging from search and retrieval on the Web to NLP tasks such as plagiarism detection, paraphrase detection, summarization, modeling interestingness, and fake news detection. However, the document-level variant of the problem has received less attention than sentence-level novelty detection. In this work, we present a comprehensive account of our experiments so far on document-level novelty detection. We review the existing literature on textual novelty detection as well as our earlier explorations of the topic, and we argue that information must be assimilated from multiple premises to identify the novelty of a given text. Our current approach performs better than our earlier approaches, and we show that our method can be applied to allied tasks such as plagiarism detection and paraphrase detection. We also point out some limitations of our approach, which we aim to address next.

In the future, we aim to explore novelty detection in scientific texts, which will be much more challenging than newspaper texts. We would also like to investigate how to handle situations in which the number of source documents grows very large. Another interesting direction is to study the subjectivity of the task across multiple human raters, to better understand how newness is perceived under different conditions; this would also help identify and possibly eliminate any human biases that may have accidentally crept into the novelty labeling. Our data and code are available at https://github.com/Tirthankar-Ghosal/multipremise-novelty-detection .

This work sums up one chapter of the first author's dissertation. The current work draws inspiration from our earlier works published in LREC 2018, COLING 2018, IJCNN 2019, and NLE 2020. We acknowledge the contributions of the several anonymous reviewers and thank them for their suggestions, which encouraged us to take up this critical challenge and improve our investigations. We thank our annotators, Ms. Amitra Salam and Ms. Swati Tiwari, for their commendable efforts in developing the dataset. We also thank the Visvesvaraya Ph.D. Scheme of Digital India Corporation under the Ministry of Electronics and Information Technology, Government of India, for providing the Ph.D. fellowship to the first author and the faculty award to the fourth author for our investigations on textual novelty. Dr. Asif Ekbal acknowledges the Visvesvaraya Young Faculty Research Fellowship (YFRF) Award, supported by the Ministry of Electronics and Information Technology (MeitY), Government of India, and implemented by Digital India Corporation (formerly Media Lab Asia), for this research.

Notes

1. https://en.wikipedia.org/wiki/Shelby_Foote
2. https://searchengineland.com/googles-matt-cutts-25-30-of-the-webs-content-is-duplicate-content-thats-okay-180063
3. https://www.uni-weimar.de/en/media/chairs/computer-science-department/webis/data/corpus
4. https://huggingface.co/transformers/model_doc/bert.html#bertmodel
5. https://github.com/yg211/bert_nli
6. https://tinyurl.com/megatron-nvidia
