Graph Enhanced Representation Learning for News Recommendation


Published in

Academia Sinica, Taiwan

The Chinese University of Hong Kong, Hong Kong

Microsoft Research Asia, China

University of Twente, Netherlands

  • SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Association for Computing Machinery

New York, NY, United States

Author Tags

  • Graph Attention Network
  • News Recommendation
  • Transformer
  • Research-article
  • Refereed limited


Abstract

With the explosion of online news, personalized news recommendation becomes increasingly important for online news platforms to help their users find interesting information. Existing news recommendation methods achieve personalization by building accurate news representations from news content and user representations from their direct interactions with news (e.g., clicks), while ignoring the high-order relatedness between users and news. Here we propose a news recommendation method which enhances the representation learning of users and news by modeling their relatedness in a graph setting. In our method, users and news are both viewed as nodes in a bipartite graph constructed from historical user click behaviors. For news representations, a transformer architecture is first exploited to build news semantic representations, which are then combined with information from neighbor news in the graph via a graph attention network. For user representations, we not only represent users from their historically clicked news, but also attentively incorporate the representations of their neighbor users in the graph. Improved performance on a large-scale real-world dataset validates the effectiveness of the proposed method.

1. Introduction

Both the overwhelming number of newly-sprung news and the huge volume of online news consumption pose challenges to online news aggregation platforms. Thus, how to target different users' news reading interests and avoid showcasing excessive irrelevant news becomes an important problem for these platforms (Phelan et al., 2011; Liu et al., 2010). A possible solution is personalized news recommendation, which depicts user interests from previous user-news interactions (Li et al., 2011; Bansal et al., 2015). However, news recommendation differs from general personalized recommendation in certain aspects. The fast iteration of online news makes traditional ID-based recommendation methods such as collaborative filtering (CF) suffer from the data sparsity problem (Guo et al., 2014). Meanwhile, the rich semantic information in news texts distinguishes news recommendation from recommendation in other domains (e.g., music, fashion and food). Therefore, a precise understanding of textual content is also vital for news recommendation.

[Figure 1: An example of the user-news bipartite graph built from click behaviors.]

Existing news recommendation methods achieve personalized news ranking by building accurate news and user representations. They usually build news representations from news content (Bansal et al., 2015; Lian et al., 2018; Zhu et al., 2019; Wu et al., 2019b), and on top of that construct user representations from click behaviors, e.g., by aggregating clicked news representations. For example, Wang et al. proposed DKN (Wang et al., 2018), which formed news representations from titles via a convolutional neural network (CNN), then utilized an attention network to select important clicked news for user representations. Wu et al. (Wu et al., 2019d) further enhanced personalized news representations by incorporating user IDs as attention queries to select important words in news titles; the same attention query was used to select important clicked news when forming user representations. Compared with traditional collaborative filtering methods (Konstan et al., 1997; Ren et al., 2017; Ling et al., 2014), which suffer from heavy cold-start problems (Lika et al., 2014), these methods gained a competitive edge by learning semantic news representations directly from news content. However, most of them build news representations only from news content and user representations only from historically clicked news. When news content such as titles is short and vague, and a user's historical behaviors are sparse, it is difficult for them to learn accurate news and user representations.

Our work is motivated by several observations. First, a bipartite graph can be established from user-news interactions: both users and news are viewed as nodes, and the interactions between them as edges. News items clicked by the same user are defined as neighbor news; likewise, users who share common clicked news are denoted as neighbor users. For example, in Figure 1, news $n_1$ and $n_5$ are neighbors because they are both clicked by user $u_2$, and $u_1$ and $u_2$ are neighbor users. Second, news representations may be enhanced by considering neighbor news in the graph. Neighbor news $n_1$ and $n_5$ both relate to politics, but the expression "The King" in $n_5$ is vague without any external information. By linking it to news $n_1$, which is more detailed and explicit, we may infer that $n_5$ talks about President Trump. Thus, when forming the representation of $n_5$, $n_1$ can be modeled simultaneously as a form of complementary information. Third, neighbor users in the graph may share similar news preferences, and incorporating such similarities may further enrich target user representations. As illustrated, $u_1$ and $u_2$ share the common clicked news $n_1$, indicating that both may be interested in political news. Nevertheless, it is challenging to form an accurate representation for $u_1$ since her click history is very sparse. Thus, explicitly introducing information from $u_2$ may enrich the representation of $u_1$ and lead to better recommendation performance.
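The neighbor definitions above can be sketched as a small Python routine (illustrative only; function and variable names are ours, not from the paper):

```python
from collections import defaultdict

def build_neighbors(click_logs):
    """Derive neighbor users and neighbor news from (user, news) click pairs.

    Two news items are neighbors if some user clicked both; two users are
    neighbors if they share a clicked news item.
    """
    user2news = defaultdict(set)
    news2user = defaultdict(set)
    for user, news in click_logs:
        user2news[user].add(news)
        news2user[news].add(user)

    neighbor_news = defaultdict(set)
    for clicked in user2news.values():
        for n in clicked:
            neighbor_news[n] |= clicked - {n}

    neighbor_users = defaultdict(set)
    for clickers in news2user.values():
        for u in clickers:
            neighbor_users[u] |= clickers - {u}
    return neighbor_users, neighbor_news

# Mirrors Figure 1: u2 clicked both n1 and n5, while u1 and u2 share n1.
logs = [("u1", "n1"), ("u2", "n1"), ("u2", "n5")]
users, news = build_neighbors(logs)
```

Here `n1` and `n5` come out as neighbor news, and `u1` and `u2` as neighbor users, matching the example in Figure 1.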

In this paper, we propose to incorporate the graph relatedness of users and news to enhance their representation learning for news recommendation. First, we utilize the transformer (Vaswani et al., 2017) to build news semantic representations from textual content; its multi-head self-attention network encodes both short- and long-distance word dependency in titles. We also add topic embeddings of news, since they may contain important information. We then further enhance news representations by aggregating neighbor news via a graph attention network, utilizing both their semantic representations and ID embeddings to enrich the neighbor news representations. For user representations, besides attentively building them from user ID embeddings and historically clicked news, our approach also leverages graph information: we use the attention mechanism to aggregate the ID embeddings of neighbor users. Finally, recommendation is made by taking the dot product between user and news representations. Extensive experiments on a large real-world dataset show improved performance over a set of well-known baselines, validating the effectiveness of our approach.

2. Related Work

Neural news recommendation receives attention from both the data mining and natural language processing fields (Zheng et al., 2018; Wang et al., 2017; Hamilton et al., 2017). Many previous works handle this problem by learning news and user representations from textual content (Wu et al., 2019d; An et al., 2019; Zhu et al., 2019; Wu et al., 2019c). From this viewpoint, user representations are built upon clicked news representations using certain summarization techniques (e.g., attentive aggregation or sequential encoding). For instance, Okura et al. (Okura et al., 2017) incorporated a denoising autoencoder to form news representations, then explored various types of recurrent networks to encode users. An et al. (An et al., 2019) attentively encoded news by combining title and topic information; they learned news representations via CNN and formed user representations from clicked news via a gated recurrent unit (GRU) network. Zhu et al. (Zhu et al., 2019) exploited a long short-term memory network (LSTM) to encode clicked news, then applied a single-directional attention network to select important click history for user representations. Though effective in extracting information from textual content, the works above neglect the relatedness between neighbor users (or items) in the interaction graph. Different from these methods, our approach exploits both contextual meaning and neighbor relatedness in the graph.

Recently, graph neural networks (GNNs) have received wide attention, and a surge of attempts have been made to develop GNN architectures for recommender systems (Ying et al., 2018; Wu et al., 2019a; Hamilton et al., 2017). These models leverage both node attributes and graph structure by representing users and items as combinations of neighbor node embeddings (Song et al., 2019). For instance, Wang et al. (Wang et al., 2019a) combined a knowledge graph (KG) with collaborative signals via a graph attention network, thus enhancing user and item representations with entity information in the KG. Ying et al. (Ying et al., 2018) introduced graph convolution to web-scale recommendation, forming node representations of users and items from visual and annotation features. In most works, representations are initially formed via node embedding, then optimized by receiving propagation signals from the graph (Wang et al., 2019b; Wu et al., 2019a). Although node embeddings have been enhanced by adding item relations (Xin et al., 2019), visual features (Ying et al., 2018) or knowledge graphs (Wang et al., 2019c), the rich semantic information in textual content may not be fully exploited. Different from these works, our approach learns the node embeddings of news directly from their textual content, utilizing the transformer architecture to model context dependency in news titles. Thus, our approach improves node embeddings by forming context-aware news representations.

3. Our Approach

[Figure 2: The overall architecture of our GERL approach.]

In this section, we introduce our Graph Enhanced Representation Learning (GERL) approach, illustrated in Figure 2, which consists of a one-hop interaction learning module and a two-hop graph learning module. The one-hop interaction learning module represents the target user from her historically clicked news and represents the candidate news based on its textual content. The two-hop graph learning module learns neighbor embeddings of news and users using a graph attention network.

3.1. Transformer for Context Understanding

Motivated by Vaswani et al. (Vaswani et al., 2017), we utilize the transformer to form accurate context representations from news titles and topics. News titles are usually clear and concise; hence, to avoid performance degradation caused by excessive parameters, we simplify the transformer to a single layer of multi-head self-attention (we also tried the original transformer architecture, but its performance was sub-optimal).

The following layer is a word-level multi-head self-attention network. Interactions between words are important for learning news representations. For instance, in the title "Sparks gives Penny Toler a fire from the organization", the interaction between "Sparks" and "organization" helps to understand the title. Moreover, a word may relate to more than one word in the title; the word "Sparks" interacts with both "fire" and "organization". Thus, we employ multi-head self-attention to form contextual word representations. The representation $\mathbf{h}_i^k$ of the $i$-th word learned by the $k$-th attention head is computed as:

(1)  $\alpha_{i,j}^{k}=\frac{\exp(\mathbf{e}_i^{\top}\mathbf{Q}_k\mathbf{e}_j)}{\sum_{m=1}^{M}\exp(\mathbf{e}_i^{\top}\mathbf{Q}_k\mathbf{e}_m)},\qquad \mathbf{h}_i^{k}=\mathbf{V}_k\sum_{j=1}^{M}\alpha_{i,j}^{k}\,\mathbf{e}_j,$

where $\mathbf{e}_j$ is the embedding of the $j$-th word, $M$ is the title length, and $\mathbf{Q}_k$ and $\mathbf{V}_k$ are the trainable projection matrices of the $k$-th head. The contextual representation $\mathbf{h}_i$ of the $i$-th word is the concatenation of the outputs of all attention heads.

Next, we utilize an additive word attention network to model the relative importance of different words and aggregate them into a title representation. For instance, the word "fire" is more important than the other words in the above example. The attention weight $\beta_i^{w}$ of the $i$-th word is computed as:

(2)  $\beta_i^{w}=\frac{\exp(\mathbf{q}_w^{\top}\tanh(\mathbf{U}_w\mathbf{h}_i+\mathbf{u}_w))}{\sum_{j=1}^{M}\exp(\mathbf{q}_w^{\top}\tanh(\mathbf{U}_w\mathbf{h}_j+\mathbf{u}_w))},$

where $\mathbf{q}_w$, $\mathbf{U}_w$ and $\mathbf{u}_w$ are trainable parameters in the word attention network. The news title representation $\mathbf{v}_t$ is then calculated as $\mathbf{v}_t=\sum_{i=1}^{M}\beta_i^{w}\mathbf{h}_i$.
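A minimal NumPy sketch of this title encoder, single-layer multi-head self-attention (Eq. 1) followed by additive word attention (Eq. 2), might look as follows. All weight arrays stand in for the trainable parameters; their shapes and names are illustrative assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def title_encoder(E, Q_heads, V_heads, q_w, U_w, u_w):
    """E is the (M, d) word embedding matrix of a title.
    Each head k computes word-word weights alpha (Eq. 1), then the
    additive attention (Eq. 2) pools the contextual words into v_t."""
    heads = []
    for Qk, Vk in zip(Q_heads, V_heads):
        alpha = softmax(E @ Qk @ E.T, axis=-1)   # (M, M) word-word weights
        heads.append(alpha @ E @ Vk.T)           # (M, d_head) per-head output
    H = np.concatenate(heads, axis=-1)           # contextual word reps h_i
    beta = softmax(q_w @ np.tanh(U_w @ H.T + u_w[:, None]))  # word weights
    return beta @ H                              # title representation v_t

rng = np.random.default_rng(0)
M, d, h, dh = 4, 8, 2, 3                         # toy sizes
E = rng.normal(size=(M, d))
v_t = title_encoder(
    E,
    [rng.normal(size=(d, d)) for _ in range(h)],
    [rng.normal(size=(dh, d)) for _ in range(h)],
    rng.normal(size=16), rng.normal(size=(16, h * dh)), rng.normal(size=16),
)
```

The resulting `v_t` has dimension `h * dh`, i.e., the concatenation of all head outputs pooled over words.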

3.2. One-hop Interaction Learning

The one-hop interaction learning module learns representations of the candidate news and of the click behaviors of the target user. More specifically, it can be decomposed into three parts: (1) candidate news semantic representations; (2) target user semantic representations; (3) target user ID representations.

Candidate News Semantic Representations. Since understanding the content of the candidate news is crucial for recommendation, we utilize the transformer to form an accurate representation of it. Given the candidate news $n$, the output of the transformer module (denoted by the subscript $t$) at the one-hop level (denoted by the superscript $O$) is $\mathbf{n}_t^{O}$.

Target User Semantic Representations. The target user is represented by attentively aggregating the representations $\mathbf{v}_i$ of her $K$ historically clicked news, which are also produced by the transformer module. The attention weight $\beta_i^{n}$ of the $i$-th clicked news is computed as:

(3)  $\beta_i^{n}=\frac{\exp(\mathbf{q}_n^{\top}\tanh(\mathbf{U}_n\mathbf{v}_i+\mathbf{u}_n))}{\sum_{j=1}^{K}\exp(\mathbf{q}_n^{\top}\tanh(\mathbf{U}_n\mathbf{v}_j+\mathbf{u}_n))},$

where $\mathbf{q}_n$, $\mathbf{U}_n$ and $\mathbf{u}_n$ are the trainable parameters of the news attention network. The one-hop user semantic representation is then calculated as $\mathbf{u}_t^{O}=\sum_{i=1}^{K}\beta_i^{n}\mathbf{v}_i$.

Target User ID Representations. Since user IDs represent each user uniquely, we incorporate them as latent representations of user interests (Lv et al., 2011; Marlin and Zemel, 2004). We use a trainable ID embedding matrix $\mathcal{M}_u\in\mathcal{R}^{N_u\times Q}$ to represent each user ID as a low-dimensional vector, where $N_u$ is the number of users and $Q$ is the dimension of the ID embedding. For the user $u$, the one-hop ID embedding vector is denoted as $\mathbf{u}_e^{O}$.
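The one-hop user representations described above can be sketched as follows (a hypothetical NumPy rendering of the news-level attention in Eq. 3 plus the ID-embedding lookup; names and dimensions are ours):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def one_hop_user(V_clicked, q_n, U_n, u_n):
    """Attentive aggregation of clicked-news representations into the
    one-hop user semantic representation u_t^O (Eq. 3)."""
    beta = softmax(q_n @ np.tanh(U_n @ V_clicked.T + u_n[:, None]))
    return beta @ V_clicked

rng = np.random.default_rng(1)
K, d = 5, 6                       # K clicked news, d-dim news vectors
u_sem = one_hop_user(rng.normal(size=(K, d)),
                     rng.normal(size=8), rng.normal(size=(8, d)),
                     rng.normal(size=8))

# The one-hop user ID representation u_e^O is a plain embedding lookup.
M_u = rng.normal(size=(100, d))   # ID embedding matrix, N_u x Q
u_id = M_u[42]
```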

3.3. Two-hop Graph Learning

The two-hop graph learning module mines the relatedness between neighbor users and news from the interaction graph. For a given target user, neighbor users usually have different levels of similarity with him or her, and the same holds for neighbor news. To exploit such similarity, we aggregate neighbor news and user information with a graph attention network (Song et al., 2019). The graph information utilized here is heterogeneous, including both semantic representations and ID embeddings. This module also has three parts: (1) neighbor user ID representations; (2) neighbor news ID representations; (3) neighbor news semantic representations.

Neighbor User ID Representations. For the $D$ neighbor users of the target user, we look up their ID embeddings $\mathbf{m}_{u_i}$ and aggregate them attentively. The attention weight $\beta_i^{u}$ of the $i$-th neighbor user is computed as:

(4)  $\beta_i^{u}=\frac{\exp(\mathbf{q}_u^{\top}\tanh(\mathbf{U}_u\mathbf{m}_{u_i}+\mathbf{u}_u))}{\sum_{j=1}^{D}\exp(\mathbf{q}_u^{\top}\tanh(\mathbf{U}_u\mathbf{m}_{u_j}+\mathbf{u}_u))},$

where $\mathbf{q}_u$, $\mathbf{U}_u$ and $\mathbf{u}_u$ are trainable parameters in the neighbor user attention network. The two-hop neighbor user ID representation is then calculated as $\mathbf{u}_e^{T}=\sum_{i=1}^{D}\beta_i^{u}\mathbf{m}_{u_i}$. The neighbor news ID representations and neighbor news semantic representations are aggregated in the same attentive manner.
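A sketch of this neighbor aggregation (Eq. 4), with the zero padding used for cold-start users (see Section 4.1) handled by masking. All names and shapes are illustrative assumptions:

```python
import numpy as np

def neighbor_attention(M_neighbors, mask, q_u, U_u, u_u):
    """Graph-attention aggregation of neighbor-user ID embeddings into the
    two-hop representation u_e^T (Eq. 4), ignoring zero-padded slots."""
    scores = q_u @ np.tanh(U_u @ M_neighbors.T + u_u[:, None])
    scores = np.where(mask, scores, -1e9)        # mask out padded neighbors
    beta = np.exp(scores - scores.max())
    beta /= beta.sum()
    return beta @ M_neighbors                    # two-hop rep u_e^T

rng = np.random.default_rng(2)
D, Q = 15, 6                                     # D neighbors, Q-dim IDs
M_nb = rng.normal(size=(D, Q))
mask = np.arange(D) < 10                         # only 10 real neighbors
M_nb[~mask] = 0.0                                # zero padding
u_e_T = neighbor_attention(M_nb, mask, rng.normal(size=8),
                           rng.normal(size=(8, Q)), rng.normal(size=8))
```

Masking before the softmax gives padded slots effectively zero attention weight, so cold-start users with fewer than $D$ neighbors are handled gracefully.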

3.4. Recommendation and Model Training

We formulate click prediction as a $(\lambda+1)$-way classification task. The clicked news is regarded as the positive sample and the $\lambda$ unclicked news as negative samples. We apply the maximum likelihood method, minimizing the negative log-likelihood of the positive class:

(5)  $\mathcal{L}=-\sum_{i\in\mathcal{S}}\log\frac{\exp(\hat{y}_i^{+})}{\exp(\hat{y}_i^{+})+\sum_{j=1}^{\lambda}\exp(\hat{y}_{i,j}^{-})},$

where $\mathcal{S}$ is the set of positive training samples, and $\hat{y}_i^{+}$ and $\hat{y}_{i,j}^{-}$ are the click scores (dot products between user and news representations) of the $i$-th clicked news and its $j$-th associated unclicked news.
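The training objective can be sketched as follows, assuming the scores are dot products between user and news representations as described above (the batching and sampling details are our simplification):

```python
import numpy as np

def nce_loss(pos_scores, neg_scores):
    """(lambda+1)-way classification loss with negative sampling (Eq. 5):
    negative log-likelihood of each clicked news against its lambda
    unclicked news. pos_scores: (B,), neg_scores: (B, lambda)."""
    logits = np.concatenate([pos_scores[:, None], neg_scores], axis=1)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob_pos = logits[:, 0] - np.log(np.exp(logits).sum(axis=1))
    return -log_prob_pos.mean()

rng = np.random.default_rng(3)
B, lam = 4, 4                                    # batch size, lambda = 4
pos = rng.normal(size=B)
neg = rng.normal(size=(B, lam))
loss = nce_loss(pos, neg)
```

When the positive score dominates all negative scores, the loss approaches zero, which is the behavior Eq. (5) rewards.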

4. Experiments

4.1. Datasets and Experimental Settings

We constructed a large-scale real-world dataset by randomly sampling user logs from MSN News (https://www.msn.com/en-us/news); its statistics are shown in Table 1. The logs were collected from Dec. 13, 2018 to Jan. 12, 2019 and split by time: logs in the last week were used for testing, 10% of the rest for validation, and the remainder for training.

In our experiments, we construct the $D$ neighbors of the candidate news by randomly sampling from the click logs of its previous users. For the target user, since there may exist massive numbers of neighbor users, we rank them by the number of clicked news they share with the target user, retain the top $D$ users, and use them as graph inputs. We set $D$ to 15 (a moderate value, due to the limit of computational resources) and use zero padding for cold-start users and newly-sprung news. The dimensions of word embedding, topic embedding and ID embedding are set to 300, 128 and 128, respectively. We use pretrained GloVe embeddings (Pennington et al., 2014) to initialize the word embedding matrix. There are 8 heads in the multi-head self-attention network, and the output dimension of each head is 16. The negative sampling ratio $\lambda$ is set to 4. The maximum number of clicked news per user is set to 50, and the maximum news title length is set to 30. To mitigate overfitting, we apply dropout (Srivastava et al., 2014) with a rate of 0.2 after the outputs of the transformer and ID embedding layers. Adam (Kingma and Ba, 2014) is used as the optimizer, and the batch size is set to 128. These hyperparameters were selected according to performance on the validation set.
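The top-$D$ neighbor-user selection described above can be sketched as follows (tie-breaking by user id is our addition for determinism):

```python
def top_d_neighbor_users(target_clicks, all_user_clicks, D=15):
    """Rank candidate neighbor users by the number of clicked news they
    share with the target user and keep the top D."""
    overlaps = []
    for user, clicks in all_user_clicks.items():
        common = len(target_clicks & clicks)
        if common > 0:                     # users with no overlap are skipped
            overlaps.append((common, user))
    overlaps.sort(key=lambda t: (-t[0], t[1]))
    return [user for _, user in overlaps[:D]]

target = {"n1", "n2", "n3"}
others = {"u2": {"n1", "n2"}, "u3": {"n3"}, "u4": {"n9"}}
ranked = top_d_neighbor_users(target, others, D=2)
```

If fewer than $D$ neighbors exist, the shorter list would be zero-padded before being fed to the graph attention network.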

For evaluation, we use the average AUC, MRR, nDCG@5 and nDCG@10 scores over all impressions. We independently repeat each experiment 5 times and report the average results.
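For reference, the per-impression MRR and nDCG@k metrics can be computed as in the following sketch (assuming binary click labels; AUC is omitted):

```python
import numpy as np

def mrr(labels, scores):
    """Reciprocal rank of the clicked news within one impression."""
    order = np.argsort(-scores)
    rank = np.where(labels[order] == 1)[0][0] + 1
    return 1.0 / rank

def ndcg(labels, scores, k):
    """Normalized discounted cumulative gain at k for binary labels."""
    order = np.argsort(-scores)[:k]
    dcg = (labels[order] / np.log2(np.arange(2, len(order) + 2))).sum()
    ideal = np.sort(labels)[::-1][:k]
    idcg = (ideal / np.log2(np.arange(2, len(ideal) + 2))).sum()
    return dcg / idcg

labels = np.array([0, 1, 0, 0])          # one clicked news per impression
scores = np.array([0.9, 0.8, 0.1, 0.2])  # model ranks the click second
```

Averaging these per-impression values over all test impressions yields the reported scores.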

Table 1. Statistics of our dataset.

# users                  242,175    # samples             32,563,990
# news                   249,038    # positive samples       805,411
# sessions               377,953    # negative samples    31,758,579
# avg. words per title     10.99    # topics                     285

4.2. Performance Evaluation

In this section, we evaluate the performance of our approach by comparing it with several baseline methods and a variant of our own method, listed as follows:

NGCF (Wang et al., 2019b): a graph neural network based collaborative filtering method for general recommendation, which uses ID embeddings as node representations.

LibFM   (Rendle, 2012 ) : a feature based model for general recommendation using matrix factorization.

Wide&Deep   (Cheng et al . , 2016 ) : a general recommendation model which has both a linear wide channel and a deep dense-layer channel.

DFM   (Lian et al . , 2018 ) : a neural news model utilizing an inception module to learn user features and a dense layer to merge them with item features.

DSSM   (Huang et al . , 2013 ) : a sparse textual feature based model which learns news representation via multiple dense layers.

DAN   (Zhu et al . , 2019 ) : a CNN based news model which learns news representations from news titles. An attentional LSTM is used to learn user representations.

GRU   (Okura et al . , 2017 ) : a deep news model using an auto-encoder to learn news representations and a GRU network to learn user representations.

DKN   (Wang et al . , 2018 ) : a CNN based news model enhanced by the knowledge graph. They utilize news-level attention to form user representations.

GERL-Graph: our model without the two-hop graph learning module.

Table 2. Recommendation performance of different methods (mean ± std over 5 runs).

Methods      AUC          MRR          nDCG@5       nDCG@10
NGCF         55.45±0.16   17.19±0.05   17.23±0.10   22.08±0.09
LibFM        61.83±0.10   19.31±0.06   20.45±0.08   25.69±0.08
Wide&Deep    64.62±0.14   20.71±0.12   22.43±0.15   27.99±0.15
DFM          64.72±0.19   20.75±0.14   22.60±0.20   28.22±0.19
DSSM         65.49±0.18   20.93±0.13   22.93±0.22   28.65±0.27
DAN          65.52±0.13   21.25±0.18   23.14±0.21   28.73±0.15
GRU          65.69±0.19   21.29±0.10   23.16±0.11   28.75±0.11
DKN          65.88±0.13   21.46±0.21   23.23±0.25   28.84±0.21
GERL-Graph   67.74±0.13   22.71±0.15   25.03±0.13   30.65±0.15
GERL         68.55±0.12   23.33±0.10   25.82±0.14   31.44±0.12

For a fair comparison, we extract TF-IDF (Jones, 2004) features from the concatenation of the clicked or candidate news titles and topics as sparse feature inputs for LibFM, Wide&Deep, DFM and DSSM. For DSSM, the negative sampling ratio is also set to 4. We tune all baselines to their best performance. The experimental results are summarized in Table 2, from which we make several observations:

First, methods which represent news directly from news texts (e.g., DAN, GRU, DKN, GERL-Graph, GERL) usually outperform feature-based methods (e.g., LibFM, Wide&Deep, DFM, DSSM). A possible reason is that handcrafted sparse features exploit only limited information from news texts, which may lead to sub-optimal news recommendation results.

Second, compared with NGCF, which also exploits neighbor information in the graph, our method achieves better results. This is because NGCF is an ID-based collaborative filtering method, which suffers significantly from the cold-start problem. This result further demonstrates the effectiveness of introducing textual understanding into graph neural networks for news recommendation.

Third, compared with other methods that involve the textual content of news (e.g., DAN, GRU, DKN), our GERL-Graph consistently outperforms the baselines. This may be because the multi-head self-attention in the transformer module learns contextual dependency accurately. Moreover, our approach utilizes attention mechanisms to select important words and news.

Fourth, our GERL approach which combines both textual understanding and graph relatedness learning outperforms all other methods. This is because GERL encodes neighbor user and news information by attentively exploiting the interaction graph. The result validates the effectiveness of our approach.

4.3. Effectiveness of Graph Learning

[Figure 3: Performance of GERL with different graph-learning components removed.]

To validate the effectiveness of the two-hop graph learning module, we remove each component of the representations in the module to examine its relative importance; the results are shown in Figure 3 (to keep dimensions uniform, we use a trainable dense layer to transform the resulting user or news vectors). Several observations can be made. First, adding neighbor user information improves performance more significantly than adding neighbor news information. In GERL-Graph, candidate news can be modeled directly through titles and topics, while target users are represented only by their clicked news; when a user's history is sparse, she may not be well represented. Hence, adding the IDs of neighbor users helps our model learn better user representations. Second, the improvement brought by neighbor news semantic representations outweighs that brought by neighbor news IDs. This is intuitive, since news titles carry more explicit and concrete meaning than IDs. Third, combining every part of the graph learning leads to the best performance: by adding graph information from both neighbor users and neighbor news, our model forms better representations for recommendation.

4.4. Ablation Study on Attention Mechanism

Next, we explore the effectiveness of two categories of attention by removing parts of them; to keep vector dimensions unchanged, we use average pooling to aggregate information instead. First, we examine the two types of attention inside the transformer in Figure 4(a). Both the additive attention and the self-attention are beneficial for news context understanding: self-attention encodes interactions between words, while additive attention helps select important words. Of the two, self-attention contributes more to model performance, as it models both short-distance and long-distance word dependency and forms diverse word representations with multiple attention heads. We also examine the module-level attention, i.e., the attention inside the one-hop interaction learning module and that in the two-hop graph learning module. From Figure 4(b), we observe that the attention in the one-hop module is more important: one-hop attention selects important clicked news, thus modeling user preferences directly, while two-hop attention models the relative importance of neighbors, which only represent interests implicitly. Using both attentions simultaneously yields the best performance.

4.5. Hyperparameter Analysis

Here we explore the influences of two hyperparameters. One is the number of attention heads in the transformer module. Another one is the degree of graph nodes in the graph learning module.

Number of Attention Heads. In the transformer module, the number of self-attention heads is crucial for learning context representations. We illustrate its influence in Figure 5(a). An evident improvement can be observed as the number of heads increases from 2 to 8, as rich textual meaning may not be fully exploited with few heads. However, performance drops slightly beyond 8 heads. This may be because news titles are concise and brief, so too many parameters become sub-optimal. Based on the above discussion, we set the number of heads to 8.

Degree of Graph Nodes. In the graph learning module, the degree of user and news nodes decides how many similar neighbors our model learns from. We increase the node degree from 5 to 25 and show its influence in Figure 5(b). Performance improves as more neighbors are taken as model inputs, which is intuitive because more relatedness information from the graph is incorporated. Meanwhile, the increasing trend flattens when the degree exceeds 15. We therefore choose the moderate value of 15 as the node degree.


5. Conclusion

In this paper, we propose a graph enhanced representation learning architecture for news recommendation. Our approach consists of a one-hop interaction learning module and a two-hop graph learning module. The one-hop interaction learning module forms news representations via the transformer architecture and learns user representations by attentively aggregating their clicked news. The two-hop graph learning module enhances the representations of users and news by aggregating their neighbor embeddings via a graph attention network, using both the IDs and the textual contents of news to enrich the neighbor embeddings. Experiments conducted on a real-world dataset show improved recommendation performance, validating the effectiveness of our approach.



Graph Enhanced Representation Learning for News Recommendation

  • Wu, Fangzhao
  • Huang, Yongfeng

With the explosion of online news, personalized news recommendation becomes increasingly important for online news platforms to help their users find interesting information. Existing news recommendation methods achieve personalization by building accurate news representations from news content and user representations from their direct interactions with news (e.g., click), while ignoring the high-order relatedness between users and news. Here we propose a news recommendation method which can enhance the representation learning of users and news by modeling their relatedness in a graph setting. In our method, users and news are both viewed as nodes in a bipartite graph constructed from historical user click behaviors. For news representations, a transformer architecture is first exploited to build news semantic representations. Then we combine it with the information from neighbor news in the graph via a graph attention network. For user representations, we not only represent users from their historically clicked news, but also attentively incorporate the representations of their neighbor users in the graph. Improved performances on a large-scale real-world dataset validate the effectiveness of our proposed method.

  • Computer Science - Information Retrieval;
  • Computer Science - Computation and Language


Design of news recommendation model based on sub-attention news encoder

Associated data.

The following information was supplied regarding data availability:

The code is available in the Supplemental Files . The datasets are available at:

- Hugging Face: CCX Dataset: https://www.cisco.com/c/en/us/td/docs/voice_ip_comm/cucm/srnd/collab11/collab11/contactc.html .

- GitHub: Microsoft News Dataset: https://github.com/yzh1994414/MIND.git .

- MIND: https://msnews.github.io/ .

- The data used from those datasets are available at Zenodo: Wenting Zhang. (2023). Design of news recommendation model based on sub-attention news encoder [Data set]. Zenodo. https://doi.org/10.5281/zenodo.7536816 .

To extract finer-grained segment features from news and represent users accurately and exhaustively, this article develops a news recommendation (NR) model based on a sub-attention news encoder. First, using a convolutional neural network (CNN) and a sub-attention mechanism, the model extracts a rich feature matrix from the news text. Then, granular image data is retrieved from the perspectives of image position and channel. Next, the user's news browsing history is processed with a multi-head self-attention mechanism, and time-series prediction is applied to the user's interests. Finally, the experimental results show that the proposed model performs well on the indicators mean reciprocal rank (MRR), normalized discounted cumulative gain (NDCG), and area under the curve (AUC), with average increases of 4.18%, 5.63%, and 6.55%, respectively. The comparative results demonstrate that the model performs best on a variety of datasets and has the fastest convergence speed in all cases. The proposed model may provide guidance for the design of future news recommendation systems.

Introduction

Due to the sheer quantity of content available, there is a risk of information overload when using online news platforms (such as Google News). Recommending relevant news based on a user's interests is an excellent way to reduce information overload. In contrast to other types of items, such as movies, books, or educational resources, news typically has a short shelf life and is frequently replaced by newer news. Some users' interests may be long-lasting, whereas others may be triggered by particular contexts or temporary needs, and the rate at which these interests change is typically less stable than in other fields ( Ge et al., 2020 ). However, conventional collaborative filtering techniques have difficulty keeping up with the ever-changing interests of users because they do not account for users' sequential browsing data ( Wu et al., 2020 ).

Consequently, a number of researchers have proposed serialized news recommendation (NR) methods, which typically use a recurrent neural network (RNN) and an attention mechanism (AM) to model users' historical interaction behaviors and capture their sequential reading patterns. The RNN takes the session information from the user's browsing history as the input sequence; the gated recurrent unit (GRU) captures sequence information from the user's behavior sequence to learn the user's long-term interest, and the final hidden state of the GRU network is used as the output to create a user representation ( Santosh, Saha & Ganguly, 2020 ). Recommendations made to users now take both their immediate and long-term objectives into account. Some researchers have turned to AM in order to forecast future preferences based on past actions ( Lee et al., 2020 ).

Various methods for recommending news articles based on their subject matter have been proposed. Some researchers have suggested using autoencoders to encode articles and RNNs to encode end users ( Zhang, Jia & Wang, 2021 ). A comparative analysis of these models shows that the following issues frequently arise in current NR systems. The first challenge is how to efficiently extract news information from news content, which is essential for NR, especially when processing lengthy news stories and capturing key words in the text to reduce the effect of irrelevant words on semantics ( Tran, Hamad & Zaib, 2021 ). The second issue is how to model user preferences accurately: current recommender systems frequently omit user-related item features from their representations when building user profiles.

To address the aforementioned issues, this article proposes an NR model based on a sub-attention news encoder, which helps the neural network extract news features more efficiently and accurately by emphasizing relevant information and suppressing irrelevant information. The proposed system includes both a news encoding module (NE) and a user preference encoding module (UE). The following is a summary of the article's major contributions:

(1) This article first incorporates a sub-attention mechanism (SAM), then learns key texts with convolutional neural network (CNN), and finally fuses the contextual features of news texts to extract more effective news features.

(2) This article first models historical news records using a GRU network to extract sequence characteristics and then models the characteristics of the browsed content. In addition, the article employs a fused multi-head AM to model the characteristics of the user's browsing history. Finally, an attention mechanism is used to learn user preference representations from previous news features.

Related Work

The classification of NR methods is shown in Fig. 1 .



Among the classic techniques are collaborative filtering, content-based, and hybrid approaches. By analyzing the ratings that users have given for similar items, or the ratings given by other users, collaborative filtering-based approaches attempt to predict users' preferences. For instance, in order to inform their recommendations, scientists consider both explicit and implicit ratings provided by users. However, when these strategies are first implemented, they frequently encounter difficulties due to a "cold start", which occurs because news articles are frequently replaced ( Ge et al., 2020 ; Wu et al., 2020 ; Lee et al., 2020 ; Zhang, Jia & Wang, 2021 ).

By analyzing the content of a user's news-browsing history, content-based methods can mitigate the cold-start problem and provide relevant article recommendations. Some academics have proposed a content-based deep neural network model that employs cosine similarity to determine the degree to which two documents are associated ( Tran, Hamad & Zaib, 2021 ; Khattar, Kumar & Varma, 2018 ; Wang, Wang & Lu, 2022 ). Content-based recommendation built on deep learning explicitly models news features and user interest features, possesses strong interpretability, and has been vigorously and successfully developed. In 2017, researchers proposed combining a denoising autoencoder for news representation learning with a GRU network for news-based user representation learning. In 2019, researchers proposed a number of multi-head AM-based NR models ( Zhang, Wang & Ren, 2022 ), whose primary purpose is to extract the relationship between context and news by employing multi-head AM at both the word and news levels ( Tran et al., 2022 ). Some researchers also combined CNNs for news feature extraction, where features were extracted from the source and subjected to subject-based classification, yielding extremely satisfying results ( Qian, Wang & Hu, 2021 ). Other researchers employ a recurrent neural network and an id embedding to determine the user's long-term and short-term interests. There are also problems with content-based suggestions: because the majority of approaches rely on article-level matching, semantic and interest features buried in finer-grained news segments may be concealed, leaving a gap in the construction of the user portrait ( Ma & Zhu, 2019 ; Shi, Ma & Wang, 2021 ).

In most instances, a hybrid approach consolidates distinct recommendation procedures into a single overarching strategy. A few researchers have proposed a two-stage recommendation framework that employs both collaborative filtering and content-based approaches. Because these methods disregard the sequential information in a user’s browsing history, it is challenging to determine how users’ interests evolve over time ( Wu, Wu & Qi, 2021 ; Qi et al., 2021 ).

In NR, modeling the news accurately is a crucial task. Some works model and obtain semantic representations of news utilizing only a single DL technique. CNNs are widely employed to extract news text features. Several academics, for instance, have proposed a DL-based NR model ( Shi, Xu & Hu, 2021 ). The content representation module of this system performs convolution calculations, beginning at the word level, on the text content of news stories in order to generate the embedded representation of news content ( Ke, 2020 ). In the DAINN model, a CNN is used to represent the word-level content of news text ( Wang et al., 2021 ). Some works select multiple types of news data to model the news in order to enrich the information with semantically relevant content ( Qiu, Hu & Wu, 2022 ). For news editors, researchers have developed a professional news screening and recommendation system. This system aims to solve the issue of ambiguous news screening standards, which arise when news editors place more emphasis on the quality of the news text and less on metadata such as keywords and topics. The researchers propose modeling news and predicting its screening criteria using two distinct data types: the text and the category of news articles ( Sun et al., 2021 ; Halder, Poddar & Kan, 2018 ). In this framework, a CNN model with one convolution layer and a total of 1,050 convolution kernels represents textual content by capturing latent semantic patterns in word sequences, while one-hot vectors represent elements such as news categories ( Wu, Wu & Wang, 2021 ). After collecting both kinds of information, the final step is to combine the two to forecast the likelihood of news screening. In addition, the model constructs the CNN at the character level, which increases its generalizability across multiple languages. However, since characters carry insufficient semantic information, the news semantic features extracted by a character-level CNN are not sufficiently rich, and the expansion of input sequences may increase computational costs ( Vinh Tran, Nguyen Pham & Tay, 2019 ).

Some scholars propose the DAN model, which, in addition to news headlines, integrates news summaries, which are more informative, into the news data ( Zan, Zhang & Meng, 2021 ). DAN uses two parallel CNNs to learn the news feature representation. The pulse coupled neural network (PCNN) receives headlines and summaries at the word level as inputs, learns representations of news at the headline and summary levels, and combines these representations to form the final news feature representation ( Wang, Dai & Cao, 2023 ). PCNN-based models are more competitive than models that rely solely on news headlines because they exploit a greater number of data features. Previous research has demonstrated that CNNs are widely used for modeling news; however, due to the CNN's fixed receptive field, it is not well suited to modeling longer news word sequences ( Wang, Zhang & Rao, 2020 ; Deng, Li & Liu, 2021 ).

Using matrix factorization as a foundation, social information has been incorporated into the recommendation model in order to capture more expressive user preference vectors, characterized by a high degree of adaptability and an enhanced capacity for recommendation. Matrix factorization models that use social information fall into two subcategories: collaborative factorization methods and social regularization methods ( Xia, Huang & Xu, 2020 ). Because the user feature vector appears in both the user-item rating matrix and the social matrix, the collaborative factorization method can decompose both matrices simultaneously. The SOREC-based model is an example of such a technique ( Wang, Xi & Huang, 2020 ): it decomposes the user-item rating matrix and the social relationship matrix into a shared vector of latent user features. The crux of the matrix factorization-based social regularization method is decomposing the user-item rating matrix while constraining the user feature vector with the social matrix. Several researchers have proposed a matrix factorization model with social regularization that uses the social matrix to regularize the user eigenvectors.

Due to the limitations of matrix factorization-based methods, specialists have proposed DNN-based nonlinear recommendation methods. In DeepSOR ( Jiang, Wang & Wei, 2020 ), social relations are first input into a graph embedding model to pre-train user node embeddings; a DNN is then used to extract nonlinear user feature vectors from the social relations matrix and incorporate them into probability matrix factorization for score prediction. DSCF, an additional DNN-based social recommendation model, can utilize high-order social neighbor information to increase the amount of social information and user-item interaction data it can extract ( Chen, Cao & Chen, 2021 ).

Sessions are frequently mentioned throughout serialization recommendation, and sequences are constructed from sessions. Recent research relies heavily on RNN modeling. Using tensor-based CNNs and RNNs to record the click order of users' selected stories, researchers have successfully extracted session-level representations of news features ( Wu, Wu & Qi, 2021 ; Ke, 2020 ). The most recent hidden state of the GRU network can be used to infer the user's short-term interests. As attention mechanisms became widespread in other machine-learning applications, their use for recommendation tasks has risen accordingly, and some academics have recently begun exploring the viability of using AM to model user preferences. According to Reference Deng, Li & Liu (2021) , an AM-based RNN was constructed to record the order in which users clicked on various news articles. SAM was initially proposed by Google, which was the first to apply it to machine translation. The SAM representation can be computed from the sequence alone, making it a subset of the more general AM. SAM is able to identify long-distance dependencies between words because, unlike conventional AM, it places greater emphasis on the interactive learning among multiple words within a sentence ( Qiu, Hu & Wu, 2022 ; Vinh Tran, Nguyen Pham & Tay, 2019 ; Wang, Xi & Huang, 2020 ). Academics have proposed a time-sensitive SAM that learns users' short-term interests from their recent browsing sequences and combines them with users' long-term interests to provide more accurate news recommendations. Another researcher devised a sequential knowledge-aware recommendation model that uses SAM to identify sequential patterns in user interaction logs in order to generate recommendations ( Guo, Yin & Chen, 2021 ).


In this article, we build a news recommendation system based on the concept of sub-attention, which allows it to learn user features independently (as shown in Fig. 2 ). A SAM-based CNN text feature extraction model discovers the characteristics of users who read the news and of candidate news stories. The user feature model collects information about the user by combining the GRU's time-series feature extraction with the SAM and the multi-head attention mechanism. The resulting candidate-news prediction ranking model uses both the user's profile and their browsing history to assign scores to individual articles.



Initially, a CNN- and SAM-based news feature extraction module is introduced. By utilizing convolutional neural networks (CNNs) and word-level sub-attention (SAM) modeling, this article is able to process lengthy news articles by giving more weight to specific words. These techniques enable the efficient extraction of the text's context, local characteristics, and global characteristics.

The news text and its vector representation are defined as follows.

Suppose the number of convolution kernels is p; the CNN layer outputs the context-related word vector matrix.

When a news article is very long, the semantic vector may not accurately represent all of the sequence's information, because the significance of the information carried by earlier content is easily diminished by subsequent content. We therefore introduce SAM into the news text, which helps the neural network extract news features more efficiently and accurately by emphasizing relevant information and suppressing irrelevant information.

Suppose the i-th word weight is w_i,

where W_i is the training weight matrix and b_i is the random error. When the activation function changes, w_i is calculated accordingly.

Finally, the module outputs the news feature vector w_news.
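The news encoder described above, a CNN over word embeddings followed by sub-attention pooling into w_news, can be sketched in numpy; the kernel shapes, dimensions, and the ReLU/softmax choices are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np

def conv1d_same(E, K, b):
    """E: (n, d) word embeddings; K: (p, k, d) kernels -> (n, p) context features."""
    n, d = E.shape
    p, k, _ = K.shape
    pad = k // 2
    Ep = np.vstack([np.zeros((pad, d)), E, np.zeros((pad, d))])  # zero padding
    out = np.empty((n, p))
    for i in range(n):
        window = Ep[i:i + k]                  # (k, d) local context window
        out[i] = np.tensordot(K, window, axes=([1, 2], [0, 1])) + b
    return np.maximum(out, 0)                 # ReLU activation

def sub_attention_pool(C, v):
    """Weight each word's context vector and sum them into one news vector."""
    scores = C @ v
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                      # softmax word weights
    return alpha @ C                          # w_news, shape (p,)

rng = np.random.default_rng(1)
n, d, p, k = 10, 16, 8, 3                     # words, embed dim, kernels, window
E = rng.normal(size=(n, d))
C = conv1d_same(E, rng.normal(size=(p, k, d)) * 0.1, np.zeros(p))
w_news = sub_attention_pool(C, rng.normal(size=p))
```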

This article develops a user feature extraction model to handle the time-series features introduced by the user's browsing times, and to extract the correlation between historical news and the key news that affects the user. The first component is a neural-network-based GRU time-series prediction module, which takes the browsing record list as input and outputs a feature matrix, thereby accounting for the temporal extent of a browsing session.

The browsing history is first encoded into a historical news matrix.

The intermediate calculations of the GRU are as follows:

r_t = σ(W_r · [h_{t−1}, x_t])
z_t = σ(W_z · [h_{t−1}, x_t])
h̃_t = tanh(W_h · [r_t ⊙ h_{t−1}, x_t])
h_t = (1 − z_t) ⊙ h_{t−1} + z_t ⊙ h̃_t

where W_r, W_z, and W_h are the parameter matrices of the corresponding layers, σ is the sigmoid function, ⊙ denotes element-wise multiplication, r_t is the reset gate, z_t is the update gate, h̃_t is the candidate hidden state at step t, and h_t is the hidden state at step t.

The structure of the GRU is shown in Fig. 3 . The reset gate determines how the new input information is combined with the previous memory, i.e., the degree to which state information from the previous moment is ignored, while the update gate defines how much of the previous memory is saved to the current time step.


In the structure of GRU, the reset gate determines how the new input information is combined with the previous memory, and the update gate defines the amount of previous memory saved to the current time step; reset gates are used to control the degree to which state information from the previous moment is ignored.
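As a concrete illustration of the gating just described, here is a single GRU step in NumPy under the standard formulation; dimensions and initialization are arbitrary, and biases are omitted for brevity:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, W_r, W_z, W_h):
    """One GRU time step; each W acts on a concatenation of state and input."""
    hx = np.concatenate([h_prev, x_t])
    r = sigmoid(W_r @ hx)                                       # reset gate
    z = sigmoid(W_z @ hx)                                       # update gate
    h_cand = np.tanh(W_h @ np.concatenate([r * h_prev, x_t]))   # candidate state
    return (1 - z) * h_prev + z * h_cand                        # new hidden state

rng = np.random.default_rng(1)
d_in, d_h = 6, 4
W_r, W_z, W_h = (rng.normal(scale=0.1, size=(d_h, d_h + d_in)) for _ in range(3))
h = np.zeros(d_h)
for x_t in rng.normal(size=(5, d_in)):   # five browsing-record steps
    h = gru_step(x_t, h, W_r, W_z, W_h)
print(h.shape)                           # (4,)
```

Running the step over a user's browsing records, as above, yields the hidden-state sequence that forms the module's feature matrix.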

By observing how frequently the same user reads similar news articles, the multi-head self-attention (MSA) mechanism can glean more nuanced characteristics about that user. The MSA is built by stacking multiple scaled dot-product self-attention units. Each news item in the input group performs an attention calculation with every other news item, so that the model can learn the connections between stories viewed by the same user.

Assuming that the feature matrix of news is F , the MSA is calculated as follows:

When performing MSA, Q, K, and V are linearly transformed before being input to the scaled dot-product attention, and the parameter matrices W of these linear transformations differ across the h attention heads, allowing each head to capture a different aspect of the input. The final output of the MSA is obtained by concatenating the outputs of the h scaled dot-product attention heads and applying a linear transformation.
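A compact sketch of this computation, assuming the standard scaled dot-product formulation (Vaswani et al., 2017); the shapes and parameter names are illustrative:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(F, W_q, W_k, W_v, W_o, h):
    """F: (n_news, d) feature matrix; W_q/W_k/W_v/W_o: (d, d); h heads."""
    n, d = F.shape
    d_k = d // h
    # linear transformations of Q, K, V, then split into h heads: (h, n, d_k)
    split = lambda M: M.reshape(n, h, d_k).transpose(1, 0, 2)
    Q, K, V = split(F @ W_q), split(F @ W_k), split(F @ W_v)
    # every news item attends to every other news item
    A = softmax(Q @ K.transpose(0, 2, 1) / np.sqrt(d_k))   # (h, n, n)
    heads = A @ V                                          # (h, n, d_k)
    concat = heads.transpose(1, 0, 2).reshape(n, d)        # concatenate heads
    return concat @ W_o                                    # final linear layer

rng = np.random.default_rng(2)
n, d, h = 5, 16, 4                      # 5 browsed news items, 4 heads
F = rng.normal(size=(n, d))
W_q, W_k, W_v, W_o = (rng.normal(scale=0.1, size=(d, d)) for _ in range(4))
out = multi_head_self_attention(F, W_q, W_k, W_v, W_o, h)
print(out.shape)                        # (5, 16)
```

Each row of the attention matrix A sums to 1, so every news item's output is a convex mixture of the other items it attends to.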

The framework of MSA is shown in Fig. 4, where FC is a fully connected layer.

Figure 4: The framework of the MSA, where FC is a fully connected layer.

The rest of the parameter settings and loss functions in this article are the same as in Reference 14.

Four real-world datasets are chosen to test the proposed model. The first is a news network dataset called CCX, and the second is the Microsoft News Dataset (MIND). The third and fourth both come from Adressa: Adressa1 contains weekly data, while Adressa2 is a monthly dataset. To provide adequate historical session information, non-anonymous users with subscription services were chosen; sessions with fewer than three interactions were discarded, and sessions with more than four interactions were kept. Each dataset is split into training, validation, and test sets in the ratio 8:1:1. The details of the four datasets are shown in Table 1.


Table 1: Details of the four datasets.

                  CCX      MIND     Adressa1  Adressa2
News number       120375   176550   173525    695135
User number       9549     10000    5963      22986
Title length      10.35    11.32    12.17     11.35
Content length    672.33   854.09   598.66    601.35
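The 8:1:1 split described above can be sketched as follows; `split_sessions` and the toy session list are illustrative stand-ins, not the paper's actual preprocessing pipeline:

```python
import random

def split_sessions(sessions, seed=42):
    """Shuffle and split sessions into train/val/test with ratio 8:1:1."""
    sessions = list(sessions)
    random.Random(seed).shuffle(sessions)
    n = len(sessions)
    n_train, n_val = int(0.8 * n), int(0.1 * n)
    train = sessions[:n_train]
    val = sessions[n_train:n_train + n_val]
    test = sessions[n_train + n_val:]
    return train, val, test

sessions = [f"session_{i}" for i in range(20)]
train, val, test = split_sessions(sessions)
print(len(train), len(val), len(test))   # 16 2 2
```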

The evaluation indicators used in this article are mean reciprocal rank (MRR), normalized discounted cumulative gain (nDCG), and area under the curve (AUC).

The mean reciprocal rank (MRR) is the average of the reciprocal ranks of the correctly recommended items; when the rank is greater than K, the reciprocal rank is set to 0. MRR takes the ranking order of recommendations into account: a higher MRR value means correct items appear closer to the top of the list. nDCG is the normalized discounted cumulative gain, which discounts the gain of relevant items by their position in the ranking. AUC indicates the likelihood that a positive sample's predicted value is higher than a negative sample's. The calculation formulas of the three indicators are as follows:

where U is the set of ranked samples, and P and Q are the numbers of positive and negative samples, respectively.
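Since the formulas themselves are omitted here, the following is a sketch of the standard definitions: MRR averages reciprocal ranks (zero when the rank exceeds K), and AUC can be computed from the ranks of the positive samples among all samples:

```python
import numpy as np

def mrr_at_k(ranks, k):
    """ranks: 1-based rank of the correct item for each sample."""
    return float(np.mean([1.0 / r if r <= k else 0.0 for r in ranks]))

def auc(scores_pos, scores_neg):
    """Probability that a positive sample outscores a negative one."""
    scores = np.concatenate([scores_pos, scores_neg])
    order = scores.argsort()
    ranks = np.empty_like(order)
    ranks[order] = np.arange(1, len(scores) + 1)   # 1-based ascending ranks
    p, q = len(scores_pos), len(scores_neg)
    pos_rank_sum = ranks[:p].sum()
    # rank-sum form of AUC: (sum of positive ranks - P(P+1)/2) / (P * Q)
    return (pos_rank_sum - p * (p + 1) / 2) / (p * q)

print(mrr_at_k([1, 3, 20], k=10))   # (1 + 1/3 + 0) / 3
print(auc(np.array([0.9, 0.8]), np.array([0.2, 0.7, 0.1])))   # 1.0
```

In the second call, every positive score exceeds every negative score, so AUC is exactly 1.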

The specific parameter settings are shown in Table 2 .


Table 2: Parameter settings.

Name                  Value
Learning rate         0.0001
Batch size            128
Attention size        64
Query dimension       64
Number of SAM         8
Dropout               0.5
Hidden dimension      128
Kernel                128
Embedding dimension   256
Optimizer             Adam

Comparison algorithms. This article adopts DAN, LSTUR, TANR, and TANN from Khattar, Kumar & Varma (2018); Ma & Zhu (2019); Wu, Wu & Qi (2021); Qi et al. (2021). First, we compare MRR@5 and MRR@15 performance on the four datasets, as shown in Figs. 5 and 6.

Figure 5: MRR@5 comparison on the four datasets.

Figure 6: MRR@15 comparison on the four datasets.

Figures 5 and 6 show that our model achieves the best results on all four datasets. Compared with the baseline models, MRR@5 improves by between 0.39% and 5.56%, with an average improvement of 3.85%; MRR@15 improves by between 0.95% and 5.78%, with an average improvement of 4.18%.

Further, we compare the remaining two indicators on the four datasets; the results are shown in Figs. 7 and 8.

Figure 7: nDCG@15 comparison on the four datasets.

Figure 8: AUC comparison on the four datasets.

Figures 7 and 8 reveal that our method achieves the highest overall performance of all the benchmark methods, with average improvements of 5.63% in nDCG@15 and 6.55% in AUC, confirming the model's accuracy. This is because our model not only uses the GRU to capture users' sequential information but also incorporates SAM to better learn users' primary interests during a session. Moreover, unlike the benchmark methods, the proposed model considers not only user interaction behavior within a session but also the correlation of user interests across sessions; as a result, recommendation performance is further enhanced.

Three variants of the model are evaluated: (1) Variation 1 omits the SAM mechanism, so it does not take users' primary interests during each session into account. (2) Variation 2 omits the GRU, so it does not determine the correlation between user interests in the current session and those from previous sessions. (3) Variation 3 considers both the primary interests of users in each session and the relationship between those interests across sessions. Figure 9 displays the outcomes of experiments conducted on the CCX dataset.

Figure 9: Results of the ablation experiments on the CCX dataset.

SAM can determine the proportion of each news item clicked during a session, which validates the effectiveness of the interest-aware attention layer design. Figure 9 demonstrates that model performance decreases significantly when SAM is not used to discover user interests during a session: the combined attention weights capture the primary interests of users within a single session. In addition, removing the session-aware attention layer slightly degrades the experimental results, indicating that modeling the correlation of user interests across multiple sessions to derive the final user representation is reasonable.

When mining news and user features, existing news recommendation (NR) models frequently ignore the relationships between browsed news items, time-series changes, and the varying importance of different news to users, which makes their predictions of user attention incomplete. Meanwhile, current models achieve broad media coverage but do not perform sufficiently fine-grained feature mining on the content itself. To represent users accurately and exhaustively while extracting finer-grained segment features from news, an NR model based on a sub-attention news encoder is developed. The model first extracts a rich feature matrix from the news text using a CNN and a sub-attention mechanism. Second, the network is built with two student branches, and Conv64 is used to extract the primary features; granular image data is retrieved from the perspective of image position and channel. Next, a multi-head self-attention mechanism is applied to the user's news browsing history, and time-series prediction is applied to the user's interests. Finally, we evaluate the model against other models on both a real-world Chinese dataset and English datasets, using metrics such as convergence time, mean squared error (MSE), and root mean squared error (RMSE). The experimental results show that the proposed model performs better on a wide variety of datasets and converges more quickly in every case. In future work, we will focus on applying this model to user interaction performance and verifying its validity through examples.

Supplemental Information

Supplemental information 1.

This code contains some of the formulas in the article, references, etc.

Funding Statement

The author received no funding for this work.

Additional Information and Declarations

The author declares that they have no competing interests.

Wenting Zhang conceived and designed the experiments, performed the experiments, analyzed the data, performed the computation work, prepared figures and/or tables, authored or reviewed drafts of the article, and approved the final draft.

  • DOI: 10.1145/3366423.3380050
  • Corpus ID: 214728019

Graph Enhanced Representation Learning for News Recommendation

  • Suyu Ge , Chuhan Wu , +2 authors Yongfeng Huang
  • Published in The Web Conference 31 March 2020
  • Computer Science

Figures and Tables from this paper

figure 1

107 Citations

Hierarchical preference hash network for news recommendation, digat: modeling news recommendation with dual-graph interaction, recognize news transition from collective behavior for news recommendation, user modeling with click preference and reading satisfaction for news recommendation, personalized news recommendation with knowledge-aware interactive matching, ✨ going beyond local: global graph-enhanced personalized news recommendations.

  • Highly Influenced

NRMG: News Recommendation With Multiview Graph Convolutional Networks

Mm-rec: multimodal news recommendation, a survey of personalized news recommendation, d-han: dynamic news recommendation with hierarchical attention network, 38 references, neural news recommendation with attentive multi-view learning, neural news recommendation with topic-aware news representation, personalized news recommendation based on click behavior, neural news recommendation with long- and short-term user representations, npa: neural news recommendation with personalized attention, learning to model relatedness for news recommendation, dkn: deep knowledge-aware network for news recommendation.

  • Highly Influential

SCENE: a scalable two-stage personalized news recommendation system

Embedding-based news recommendation for millions of users, dan: deep attention neural network for news recommendation, related papers.

Showing 1 through 3 of 0 Related Papers

graph enhanced representation learning for news recommendation

Published in The Web Conference 2020

Suyu Ge Chuhan Wu Fangzhao Wu Tao Qi Yongfeng Huang

Boosting Patient Representation Learning via Graph Contrastive Learning

  • Conference paper
  • First Online: 22 August 2024
  • Cite this conference paper

graph enhanced representation learning for news recommendation

  • Zhenhao Zhang 11 ,
  • Yuxi Liu 12 ,
  • Jiang Bian 12 ,
  • Antonio Jimeno Yepes 13 ,
  • Jun Shen 14 ,
  • Fuyi Li 15 ,
  • Guodong Long 16 &
  • Flora D. Salim 17  

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 14949))

Included in the following conference series:

  • Joint European Conference on Machine Learning and Knowledge Discovery in Databases

Building deep neural network models for clinical prediction tasks is an increasingly active area of research. While existing approaches show promising performance, the learned patient representations from deep neural networks are often task-specific and not generalizable across multiple clinical prediction tasks. In this paper, we propose a novel neural network architecture leveraging the graph contrastive learning paradigm to learn patient representations that are applicable to a wide range of clinical prediction tasks. In particular, our approach consists of three well-designed modules for learning graph-based patient representations, alongside a pretraining mechanism that exploits self-supervised information in generated patient graphs. These modules collaboratively integrate patient graph structure learning, refinement, and contrastive learning, enhanced by masked graph modeling as a pretraining mechanism to optimize learning outcomes. Empirical results show that the proposed approach outperforms baselines in both self-supervised and supervised learning scenarios, offering robust, effective, and more generalizable patient representations in healthcare applications.

Z. Zhang and Y. Liu—Contributed equally.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save.

  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
  • Available as EPUB and PDF
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

https://github.com/LZlab01/GCL-EHR .

Cai, D., Sun, C., Song, M., Zhang, B., Hong, S., Li, H.: Hypergraph contrastive learning for electronic health records. In: Proceedings of the 2022 SIAM International Conference on Data Mining (SDM), pp. 127–135. SIAM (2022)

Google Scholar  

Che, Z., Purushotham, S., Cho, K., Sontag, D., Liu, Y.: Recurrent neural networks for multivariate time series with missing values. Sci. Rep. 8 (1), 6085 (2018)

Article   Google Scholar  

Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A simple framework for contrastive learning of visual representations. In: International Conference on Machine Learning, pp. 1597–1607. PMLR (2020)

Cho, K., et al.: Learning phrase representations using rnn encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078 (2014)

Choi, E., et al.: Learning the graphical structure of electronic health records with graph convolutional transformer. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 606–613 (2020)

Ericsson, L., Gouk, H., Loy, C.C., Hospedales, T.M.: Self-supervised representation learning: Introduction, advances, and challenges. IEEE Signal Process. Mag. 39 (3), 42–62 (2022)

Harutyunyan, H., Khachatrian, H., Kale, D.C., Ver Steeg, G., Galstyan, A.: Multitask learning and benchmarking with clinical time series data. Sci. Data 6 (1), 96 (2019)

Hilton, C.B., et al.: Personalized predictions of patient outcomes during and after hospitalization using artificial intelligence. NPJ Digital Med. 3 (1), 51 (2020)

Huang, G., Ma, F.: Concad: contrastive learning-based cross attention for sleep apnea detection. In: Machine Learning and Knowledge Discovery in Databases. Applied Data Science Track: European Conference, ECML PKDD 2021, Bilbao, Spain, September 13–17, 2021, Proceedings, Part V 21, pp. 68–84. Springer (2021)

Huang, X., Lai, W.: Clustering graphs for visualization via node similarities. J. Visual Lang. Comput. 17 (3), 225–253 (2006)

Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 (2016)

Liu, Y., Qin, S., Yepes, A.J., Shao, W., Zhang, Z., Salim, F.D.: Integrated convolutional and recurrent neural networks for health risk prediction using patient journey data with many missing values. In: 2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 1658–1663. IEEE (2022)

Liu, Y., Qin, S., Zhang, Z., Shao, W.: Compound density networks for risk prediction using electronic health records. In: 2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 1078–1085. IEEE (2022)

Liu, Y., Zhang, Z., Yepes, A.J., Salim, F.D.: Modeling long-term dependencies and short-term correlations in patient journey data with temporal attention networks for health prediction. In: Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, pp. 1–10 (2022)

Liu, Z., Li, X., Peng, H., He, L., Philip, S.Y.: Heterogeneous similarity graph neural network on electronic health records. In: 2020 IEEE International Conference on Big Data (Big Data). pp. 1196–1205. IEEE (2020)

Luo, J., Ye, M., Xiao, C., Ma, F.: Hitanet: hierarchical time-aware attention networks for risk prediction on electronic health records. In: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 647–656 (2020)

Ochoa, J.G.D., Mustafa, F.E.: Graph neural network modelling as a potentially effective method for predicting and analyzing procedures based on patients’ diagnoses. Artif. Intell. Med. 131 , 102359 (2022)

Sheikhalishahi, S., Balaraman, V., Osmani, V.: Benchmarking machine learning models on multi-centre eicu critical care dataset. PLoS ONE 15 (7), e0235424 (2020)

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. Advances in neural information processing systems 30 (2017)

Veličković, P., Cucurull, G., Casanova, A., Romero, A., Lio, P., Bengio, Y.: Graph attention networks. arXiv preprint arXiv:1710.10903 (2017)

Wang, T., Jin, D., Wang, R., He, D., Huang, Y.: Powerful graph convolutional networks with adaptive propagation mechanism for homophily and heterophily. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, pp. 4210–4218 (2022)

Xie, J., Girshick, R., Farhadi, A.: Unsupervised deep embedding for clustering analysis. In: International Conference on Machine Learning, pp. 478–487. PMLR (2016)

You, Y., Chen, T., Sui, Y., Chen, T., Wang, Z., Shen, Y.: Graph contrastive learning with augmentations. Adv. Neural. Inf. Process. Syst. 33 , 5812–5823 (2020)

Yu, J., Xia, X., Chen, T., Cui, L., Hung, N.Q.V., Yin, H.: Xsimgcl: towards extremely simple graph contrastive learning for recommendation. IEEE Trans. Knowl. Data Eng. (2023)

Zhang, Y.: Attain: attention-based time-aware lstm networks for disease progression modeling. In: In Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI-2019), pp. 4369-4375, Macao, China. (2019)

Zheng, Z., Tan, Y., Wang, H., Yu, S., Liu, T., Liang, C.: Casangcl: pre-training and fine-tuning model based on cascaded attention network and graph contrastive learning for molecular property prediction. Briefings Bioinform. 24 (1), bbac566 (2023)

Zhu, W., Razavian, N.: Variationally regularized graph-based representation learning for electronic health records. In: Proceedings of the Conference on Health, Inference, and Learning, pp. 1–13 (2021)

Zhu, Y., Xu, Y., Liu, Q., Wu, S.: An empirical study of graph contrastive learning. arXiv preprint arXiv:2109.01116 (2021)

Zhu, Y., Xu, Y., Yu, F., Liu, Q., Wu, S., Wang, L.: Deep graph contrastive representation learning. arXiv preprint arXiv:2006.04131 (2020)

Zhu, Y., Xu, Y., Yu, F., Liu, Q., Wu, S., Wang, L.: Graph contrastive learning with adaptive augmentation. In: Proceedings of the Web Conference 2021, pp. 2069–2080 (2021)

Download references

Author information

Authors and affiliations.

College of Life Sciences, Northwest A&F University, Yangling, Shaanxi, 712100, China

Zhenhao Zhang

College of Medicine, University of Florida, Gainesville, FL, 32610, USA

Yuxi Liu & Jiang Bian

School of Computing Technologies, RMIT University, Melbourne, VIC, 3001, Australia

Antonio Jimeno Yepes

School of Computing and Information Technology, UOW, Wollongong, NSW, 2522, Australia

South Australian immunoGENomics Cancer Institute, Faculty of Health and Medical Sciences, The University of Adelaide, Adelaide, SA, 5005, Australia

Australian AI Institute, FEIT, UTS, Sydney, NSW, 2007, Australia

Guodong Long

School of Computer Science and Engineering, UNSW, Sydney, NSW, 2052, Australia

Flora D. Salim

You can also search for this author in PubMed   Google Scholar

Corresponding author

Correspondence to Yuxi Liu .

Editor information

Editors and affiliations.

LTCI, Télécom Paris, Palaiseau Cedex, France

Albert Bifet

Faculty of Informatics, Vytautas Magnus University, Akademija, Lithuania

Tomas Krilavičius

Stockholm University, Kista, Sweden

Ioanna Miliou

School of Information Technology, Halmstad University, Halmstad, Sweden

Slawomir Nowaczyk

Rights and permissions

Reprints and permissions

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper.

Zhang, Z. et al. (2024). Boosting Patient Representation Learning via Graph Contrastive Learning. In: Bifet, A., Krilavičius, T., Miliou, I., Nowaczyk, S. (eds) Machine Learning and Knowledge Discovery in Databases. Applied Data Science Track. ECML PKDD 2024. Lecture Notes in Computer Science(), vol 14949. Springer, Cham. https://doi.org/10.1007/978-3-031-70378-2_21

Download citation

DOI : https://doi.org/10.1007/978-3-031-70378-2_21

Published : 22 August 2024

Publisher Name : Springer, Cham

Print ISBN : 978-3-031-70377-5

Online ISBN : 978-3-031-70378-2

eBook Packages : Computer Science Computer Science (R0)

Share this paper

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

  • Publish with us

Policies and ethics

  • Find a journal
  • Track your research

Effective Edge-wise Representation Learning in Edge-Attributed Bipartite Graphs

New citation alert added.

This alert has been successfully added and will be sent to:

You will be notified whenever a record that you have chosen has been cited.

To manage your alert preferences, click on the button below.

New Citation Alert!

Please log in to your account

Information & Contributors

Bibliometrics & citations, view options.

graph enhanced representation learning for news recommendation

Supplemental Material

Index terms.

Computing methodologies

Machine learning

Learning paradigms

Supervised learning

Supervised learning by classification

Machine learning approaches

Factorization methods

Mathematics of computing

Discrete mathematics

Graph theory

Graph algorithms

Recommendations

Interval non-edge-colorable bipartite graphs and multigraphs.

An edge-coloring of a graph G with colors 1,...,t is called an interval t-coloring if all colors are used, and the colors of edges incident to any vertex of G are distinct and form an interval of integers. In 1991, Erdï s constructed a bipartite graph ...

On sum edge-coloring of regular, bipartite and split graphs

An edge-coloring of a graph G with natural numbers is called a sum edge-coloring if the colors of edges incident to any vertex of G are distinct and the sum of the colors of the edges of G is minimum. The edge-chromatic sum of a graph G is the sum of ...

Equistarable bipartite graphs

Recently, Milanič and Trotignon introduced the class of equistarable graphs as graphs without isolated vertices admitting positive weights on the edges such that a subset of edges is of total weight 1 if and only if it forms a maximal star. Based on ...

Information

Published in.

cover image ACM Conferences

  • General Chairs:

Author Picture

Northeastern University, USA

Author Picture

CENTAI / Eurecat, Italy

  • SIGMOD: ACM Special Interest Group on Management of Data
  • SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data

Association for Computing Machinery

New York, NY, United States

Publication History

Check for updates, author tags.

  • attributed graph
  • bipartite graph
  • edge classification
  • graph representation learning
  • Research-article

Acceptance Rates

Contributors, other metrics, bibliometrics, article metrics.

  • 0 Total Citations
  • 39 Total Downloads
  • Downloads (Last 12 months) 39
  • Downloads (Last 6 weeks) 39

View options

View or Download as a PDF file.

View online with eReader .

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Share this publication link.

Copying failed.

Share on social media

Affiliations, export citations.

  • Please download or close your previous search result export first before starting a new bulk export. Preview is not available. By clicking download, a status dialog will open to start the export process. The process may take a few minutes but once it finishes a file will be downloadable from your browser. You may continue to browse the DL while the export process is in progress. Download
  • Download citation
  • Copy citation

We are preparing your search results for download ...

We will inform you here when the file is ready.

Your file of search results citations is now ready.

Your search export query has expired. Please try again.

IMAGES

  1. Figure 1 from Graph Enhanced Representation Learning for News

    graph enhanced representation learning for news recommendation

  2. Figure 1 from Graph Enhanced Representation Learning for News

    graph enhanced representation learning for news recommendation

  3. Graph Enhanced Representation Learning for News Recommendation

    graph enhanced representation learning for news recommendation

  4. Figure 2 from Graph Enhanced Representation Learning for News

    graph enhanced representation learning for news recommendation

  5. Table 1 from Graph Enhanced Representation Learning for News

    graph enhanced representation learning for news recommendation

  6. [PDF] Graph Enhanced Representation Learning for News Recommendation

    graph enhanced representation learning for news recommendation

VIDEO

  1. Graph Representation in Data Structure |Adjacency Matrix and Adjacecy List

  2. Fair Graph Representation Learning via Sensitive Attribute Disentanglement

  3. RAL-ICRA'22: SegContrast: 3D Point Cloud Feature Representation Learning ... by Nunes et al

  4. KDD 2023

  5. [rfp1015] Semantic Evolvement Enhanced Graph Autoencoder for Rumor Detection

  6. KDD 2023

COMMENTS

  1. Graph Enhanced Representation Learning for News Recommendation

    Here we propose a news recommendation method which can enhance the representation learning of users and news by modeling their relatedness in a graph setting. In our method, users and news are both viewed as nodes in a bipartite graph constructed from historical user click behaviors. For news representations, a transformer architecture is first ...

  2. Graph Enhanced Representation Learning for News Recommendation

    In this paper, we propose a graph enhanced representation learning architecture for news recommendation. Our approach consists of. a one-hop interaction learning module and a two-hop graph learn-ing module. The one-hop interaction learning module forms news representations via the transformer architecture.

  3. Graph Enhanced Representation Learning for News Recommendation

    A news recommendation method which can enhance the representation learning of users and news by modeling their relatedness in a graph setting and improved performances on a large-scale real-world dataset validate the effectiveness of this proposed method. With the explosion of online news, personalized news recommendation becomes increasingly important for online news platforms to help their ...

  4. Graph neural news recommendation based on multi-view representation

    Accurate news representation is of crucial importance in personalized news recommendation. Most of existing news recommendation model lack comprehensiveness because they do not consider the higher-order structure between user-news interactions, relevance between user clicks on news. In this paper, we propose graph neural news recommendation based on multi-view representation learning which ...

  5. Graph Enhanced Representation Learning for News Recommendation

    Some works [6, 10,18,30] have attempted to leverage the graph information to enhance the representations learning for news recommendation with GNNs. For example, DKN [30] uses the one-hop ...

  6. Graph Enhanced Representation Learning for News Recommendation

    Graph Enhanced Representation Learning for News Recommendation; research-article . Share on. Graph Enhanced Representation Learning for News Recommendation. Authors: Suyu Ge.

  7. Graph Enhanced Representation Learning for News Recommendation

    DOI: 10.1145/3366423.3380050 Corpus ID: 214728019; Graph Enhanced Representation Learning for News Recommendation @article{Ge2020GraphER, title={Graph Enhanced Representation Learning for News Recommendation}, author={Suyu Ge and Chuhan Wu and Fangzhao Wu and Tao Qi and Yongfeng Huang}, journal={Proceedings of The Web Conference 2020}, year={2020} }

  8. Candidate-Aware Attention Enhanced Graph Neural Network for News

    We propose a graph neural news recommendation model GNNR with high-order information encoded by propagating embedding over the graph. ... Ge, S., Wu, C., Wu, F., et al.: Graph enhanced representation learning for news recommendation. In: Proceedings of The Web Conference 2020, pp. 2863-2869 (2020)

  9. Graph Enhanced Representation Learning for News Recommendation

    Graph Enhanced Representation Learning for News Recommendation. Suyu Ge (Tsinghua National Laboratory for Information Science and Technology, Tsinghua University, [email protected]); Chuhan Wu (Tsinghua National Laboratory for Information Science and Technology, Tsinghua University, [email protected]); Fangzhao Wu (Microsoft Research Asia) ...

  10. PDF Candidate-Aware Attention Enhanced Graph Neural Network for News

    In this paper, we propose the Candidate-aware Attention Enhanced Graph Neural Network for News Recommendation (GNNR), which encodes high-order connections into the representation of news through information propagation along the graph, and then combines the obtained representations with representations from news content.
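    The propagation step this snippet describes can be illustrated with a minimal average-neighbor message-passing sketch; the graph layout, embedding dimension, and mixing weight `alpha` below are illustrative assumptions, not the GNNR implementation.

```python
# Minimal sketch (assumed graph and dimensions) of encoding high-order
# connections by propagating embeddings over a user-news click graph,
# then mixing the result with a content-based news embedding.
import numpy as np

def propagate(embeddings, neighbors, hops=2):
    """Average-neighbor message passing for `hops` rounds."""
    emb = {n: v.astype(float) for n, v in embeddings.items()}
    for _ in range(hops):
        new_emb = {}
        for node, vec in emb.items():
            nbrs = neighbors.get(node, [])
            if nbrs:
                agg = np.mean([emb[n] for n in nbrs], axis=0)
                new_emb[node] = 0.5 * vec + 0.5 * agg  # keep self + neighborhood
            else:
                new_emb[node] = vec
        emb = new_emb
    return emb

# Bipartite click graph: users u1, u2 and news n1..n3.
neighbors = {
    "u1": ["n1", "n2"], "u2": ["n2", "n3"],
    "n1": ["u1"], "n2": ["u1", "u2"], "n3": ["u2"],
}
rng = np.random.default_rng(0)
embeddings = {k: rng.standard_normal(4) for k in neighbors}

graph_emb = propagate(embeddings, neighbors, hops=2)

# Combine graph-propagated and content-based news representations.
content_emb = rng.standard_normal(4)  # stand-in for a text-encoder output
alpha = 0.5                           # assumed mixing weight
final_n2 = alpha * graph_emb["n2"] + (1 - alpha) * content_emb
```

    After two hops, the representation of n2 already carries information from u1's and u2's other clicks (n1, n3), i.e. its high-order neighbors.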

  11. Graph neural news recommendation based on multi-view representation

    In this paper, we propose graph neural news recommendation based on multi-view representation learning, which encodes high-order connections into the representation of news through information propagation along the graph. For news representations, we learn content embeddings for clicked news and candidate news from various news attributes.

  12. Graph Enhanced Representation Learning for News Recommendation

    Here we propose a news recommendation method which can enhance the representation learning of users and news by modeling their relatedness in a graph setting. In our method, users and news are both viewed as nodes in a bipartite graph constructed from historical user click behaviors.

  13. Going Beyond Local: Global Graph-Enhanced Personalized News Recommendations

    enriching the news representations beyond merely local news information. 3 METHOD. Our proposed approach, GLORY, focuses on enhancing historical news representation by utilizing a global news graph, and improving candidate news representation through a global entity graph, as depicted in Fig. 2. First, we learn the representation of news text and

  14. UNEG: A Description-Enhanced Graph-based News Recommendation Method

    Modeling news and users accurately has become increasingly crucial in personalized news recommendation. Exploring rich relational information with graph-based representation learning methods is challenging. Existing graph-based methods learn the representations of news items and user interests with external knowledge graphs or user-news bipartite graphs. However, they rarely link two news articles through ...

  15. Going Beyond Local: Global Graph-Enhanced Personalized News Recommendations

    Precisely recommending candidate news articles to users has always been a core challenge for personalized news recommendation systems. Most recent works primarily focus on using advanced natural language processing techniques to extract semantic information from rich textual data, employing content-based methods derived from local historical news.

  16. Dual-view hypergraph attention network for news recommendation

    News Recommendation (NR) helps users quickly find the information they are most interested in. Recently, some NR systems based on graph neural networks have achieved significant performance improvements because they use a graph structure to link pairs of nodes (e.g., user, news, and topic nodes) for representation learning.

  17. Attention-Based Graph Neural Network for News Recommendation

    Therefore, we propose an attention-based graph neural network news recommendation model. In our model, a multi-channel convolutional neural network is used to generate news representations, and a recurrent neural network is used to extract the sequence information of the news that users clicked on. Users, news, and topics are modeled as three types of ...

  18. Graph Enhanced Representation Learning for News Recommendation

    Here we propose a news recommendation method which can enhance the representation learning of users and news by modeling their relatedness in a graph setting. In our method, users and news are both viewed as nodes in a bipartite graph constructed from historical user click behaviors. For news representations, a transformer architecture is first ...
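    The bipartite construction described above can be sketched as follows; the click-log format and node IDs are illustrative assumptions, and the transformer news encoder is not included.

```python
# Minimal sketch of building a user-news bipartite graph from
# historical click behaviors (each click adds an undirected edge).
from collections import defaultdict

def build_bipartite_graph(click_log):
    """click_log: iterable of (user_id, news_id) pairs."""
    user_to_news = defaultdict(set)
    news_to_users = defaultdict(set)
    for user, news in click_log:
        user_to_news[user].add(news)
        news_to_users[news].add(user)
    return user_to_news, news_to_users

clicks = [("u1", "n1"), ("u1", "n2"), ("u2", "n2"), ("u2", "n3")]
u2n, n2u = build_bipartite_graph(clicks)

# Two-hop neighbors of u1, i.e. users sharing a clicked article with u1;
# this is the relatedness signal a GNN can propagate over.
two_hop = {u for n in u2n["u1"] for u in n2u[n]} - {"u1"}
```

    On this toy log, u2 is a two-hop neighbor of u1 because both clicked n2.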

  19. Graph Neural News Recommendation with User Existing and Potential

    Suyu Ge, Chuhan Wu, Fangzhao Wu, Tao Qi, and Yongfeng Huang. 2020. Graph enhanced representation learning for news recommendation. In Proceedings of the WWW. ACM / IW3C2, 2863-2869. ... Chuan Shi, Cheng Yang, and Chao Shao. 2020. Graph neural news recommendation with long-term and short-term interest modeling. Information Processing and ...

  20. Design of news recommendation model based on sub-attention news encoder

    Abstract. To extract finer-grained segment features from news and represent users accurately and exhaustively, this article develops a news recommendation (NR) model based on a sub-attention news encoder. First, by using convolutional neural network (CNN) and sub-attention mechanism, this model extracts a rich feature matrix from the news text.
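    The attention pooling step over segment features described above can be sketched as below; the dimensions, the random feature matrix, and the query vector are illustrative assumptions, not the article's actual sub-attention encoder.

```python
# Hedged sketch: given a feature matrix extracted from news text
# (e.g. by a CNN), attention weights decide which segment features
# dominate the final news vector.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pool(features, query):
    """features: (num_words, dim); query: (dim,). Returns a (dim,) vector."""
    scores = features @ query   # relevance of each segment feature
    weights = softmax(scores)   # normalized attention weights
    return weights @ features   # weighted sum of segment features

rng = np.random.default_rng(1)
feature_matrix = rng.standard_normal((6, 8))  # 6 segments, 8-dim features
query = rng.standard_normal(8)                # assumed learned query vector
news_vec = attention_pool(feature_matrix, query)
```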

  21. Deep graph representation learning influence maximization with

    The parameterized aggregation stage of GNNs, on the other hand, produces representations by learning the crucial graph topologies and scales linearly with the number of edges and parameters. Because of its invariance to permutations and awareness of input sparsity, the inductive bias of GNNs effectively encodes combinatorial and relational data.

  22. Aspect-driven User Preference and News Representation Learning for News

    ... recommendation by modeling fine-grained aspect-level user preferences and news features. As far as we know, this is the first work for news recommendation driven by aspect-level information. We devise a novel Aspect-driven News Recommender System (ANRS), which is built on aspect-level user preference and news representation learning. ANRS ...

  23. News Graph: An Enhanced Knowledge Graph for News Recommendation

    An enhanced knowledge graph called the news graph is proposed; this is the first time a domain-specific graph has been constructed for news recommendation, and it can greatly benefit a wide range of news recommendation tasks, including personalized article recommendation, article category classification, article popularity prediction, and local news detection. Knowledge graphs, which contain rich ...

  24. Search-based Time-aware Graph-enhanced Recommendation with Sequential

    We call this extended version Search-based Time-Aware Graph-Enhanced Recommendation (STAGE). We conduct extensive experiments on three real-world datasets, and STARec achieves consistent superiority. ... Learning graph representation with generative adversarial nets. IEEE Trans. Knowl. Data Eng. 33, 8 (2019), 3090-3103.

  25. Graph Enhanced Representation Learning for News Recommendation

    Table 1: Statistics of our dataset. - "Graph Enhanced Representation Learning for News Recommendation" ... @article{Ge2020GraphER, title={Graph Enhanced Representation Learning for News Recommendation}, author={Suyu Ge and Chuhan Wu and Fangzhao Wu and Tao Qi and Yongfeng Huang}, journal ...

  26. Boosting Patient Representation Learning via Graph Contrastive Learning

    The intuition behind our approach is to incorporate the graph contrastive learning paradigm into patient representation learning using EHR data. Our approach consists of three well-designed modules for learning graph-based patient representations, alongside a pretraining mechanism that exploits self-supervised information in generated patient graphs.

  27. Effective Edge-wise Representation Learning in Edge-Attributed

    Graph representation learning (GRL) encodes graph elements into informative vector representations, which can be used in downstream tasks for analyzing graph-structured data, and it has seen extensive applications in various domains. ... Liying Wang, and Jessica Lam. 2023. EEGNN: Edge Enhanced Graph Neural Network with a Bayesian Nonparametric ...