Ensuring timely and reliable access to and use of information.
As the total potential impact on the university increases from low to high, data classification should become more restrictive, moving from public to restricted. If an appropriate classification is still unclear after considering these points, contact the Information Security Office for assistance.
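The low-to-high progression above can be illustrated with a small lookup. This is only a sketch: the intermediate "moderate"/"private" tier is an assumption for illustration, since this excerpt names only the public and restricted endpoints.

```python
# Sketch: map an assessed impact level to a classification tier.
# The "moderate" -> "private" mapping is an assumed intermediate tier;
# when the level is unclear, the guideline says to contact the
# Information Security Office.
IMPACT_TO_CLASSIFICATION = {
    "low": "public",
    "moderate": "private",   # assumed intermediate tier, for illustration
    "high": "restricted",
}

def classify(impact_level):
    """Return the classification tier for an assessed impact level."""
    try:
        return IMPACT_TO_CLASSIFICATION[impact_level.lower()]
    except KeyError:
        return "unclear: contact the Information Security Office"
```

As impact rises, the returned tier becomes more restrictive; anything outside the defined levels is deliberately routed to a human decision rather than guessed.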
The Information Security Office and the Office of General Counsel have defined several types of Restricted data based on state and federal regulatory requirements. This list does not encompass all types of restricted data. The predefined types of restricted information are as follows:
An Authentication Verifier is a piece of information that is held in confidence by an individual and used to prove that the person is who they say they are. In some instances, an Authentication Verifier may be shared amongst a small group of individuals. An Authentication Verifier may also be used to prove the identity of a system or service. Examples include, but are not limited to:
See the University's .
EPHI is defined as any Protected Health Information (PHI) that is stored in or transmitted by electronic media. For the purpose of this definition, electronic media includes:
Export Controlled Materials are defined as any information or materials that are subject to the United States export control regulations, including, but not limited to, the Export Administration Regulations (EAR) published by the US Department of Commerce and the International Traffic in Arms Regulations (ITAR) published by the US Department of State. See the for more information.
FTI is defined as any return, return information, or taxpayer return information that is entrusted to the University by the Internal Revenue Service. See for more information.
Payment card information is defined as a credit card number (also referred to as a primary account number or PAN) in combination with one or more of the following data elements:

Payment Card Information is also governed by the University's (login required).
Personally Identifiable Education Records are defined as any Education Records that contain one or more of the following personal identifiers:

See Carnegie Mellon's for more information on what constitutes an Education Record.
For the purpose of meeting security breach notification requirements, PII is defined as a person's first name or first initial and last name in combination with one or more of the following data elements:
PHI is defined as individually identifiable health information transmitted by electronic media, maintained in electronic media, or transmitted or maintained in any other form or medium by a Covered Component, as defined in Carnegie Mellon's . PHI is considered individually identifiable if it contains one or more of the following identifiers:

Per Carnegie Mellon's , PHI does not include education records or treatment records covered by the Family Educational Rights and Privacy Act or employment records held by the University in its role as an employer.
Controlled Technical Information means technical information with military or space applications that is subject to controls on the access, use, reproduction, modification, performance, display, release, disclosure, or dissemination per .
Documents and data labeled or marked For Official Use Only are a precursor of Controlled Unclassified Information (CUI) as defined by the .
The EU's General Data Protection Regulation (GDPR) defines personal data as any information that can identify a natural person, directly or indirectly, by reference to an identifier, including:

Any personal data that is collected from individuals in European Economic Area (EEA) countries is subject to GDPR. For questions, send an email to .
Controlled Unclassified Information (CUI), as defined by , is a designation from the US government for information that must be protected according to specific requirements (see ). CUI is an umbrella term for multiple other data types, such as Export Controlled Materials, For Official Use Only documents, and Federal Tax Information. Personally Identifiable Information can also be CUI when given to the University as part of a Federal government contract or sub-contract.
1.0 | 11/16/22 | Guideline moved from the ISO site. |
2.0 | 4/14/23 | Guideline was updated and approved by the Data Stewardship Council. |
Title: Modeling Text-Label Alignment for Hierarchical Text Classification
Abstract: Hierarchical Text Classification (HTC) aims to categorize text data based on a structured label hierarchy, resulting in predicted labels forming a sub-hierarchy tree. The semantics of the text should align with the semantics of the labels in this sub-hierarchy. With the sub-hierarchy changing for each sample, the dynamic nature of text-label alignment poses challenges for existing methods, which typically process text and labels independently. To overcome this limitation, we propose a Text-Label Alignment (TLA) loss specifically designed to model the alignment between text and labels. We obtain a set of negative labels for a given text and its positive label set. By leveraging contrastive learning, the TLA loss pulls the text closer to its positive label and pushes it away from its negative label in the embedding space. This process aligns text representations with related labels while distancing them from unrelated ones. Building upon this framework, we introduce the Hierarchical Text-Label Alignment (HTLA) model, which leverages BERT as the text encoder and GPTrans as the graph encoder and integrates text-label embeddings to generate hierarchy-aware representations. Experimental results on benchmark datasets and comparison with existing baselines demonstrate the effectiveness of HTLA for HTC.
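The TLA loss described in the abstract is a contrastive objective over text and label embeddings. The following is a minimal, dependency-free sketch of an InfoNCE-style version of that idea, not the authors' implementation; the function names, the temperature value, and the use of cosine similarity are assumptions for illustration.

```python
import math

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _normalize(v):
    n = math.sqrt(_dot(v, v))
    return [x / n for x in v]

def tla_loss(text_emb, pos_label_emb, neg_label_embs, temperature=0.1):
    """Contrastive text-label alignment loss for one (text, positive label) pair.

    Computes -log softmax probability of the positive label among the
    candidates: minimizing it pulls the text embedding toward its positive
    label and pushes it away from the negative labels.
    """
    t = _normalize(text_emb)
    # Temperature-scaled cosine similarities: positive first, then negatives.
    sims = [_dot(t, _normalize(pos_label_emb)) / temperature]
    sims += [_dot(t, _normalize(n)) / temperature for n in neg_label_embs]
    # Numerically stable log-sum-exp for the softmax normalizer.
    m = max(sims)
    log_z = m + math.log(sum(math.exp(s - m) for s in sims))
    return log_z - sims[0]
```

With a text embedding that matches its positive label, the loss is near zero; swapping the positive and negative labels drives it up, which is exactly the pull/push behavior the abstract describes. In the full HTLA model this loss would be computed over encoder outputs (BERT for text, GPTrans for the label graph) rather than raw vectors.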
Comments: Accepted in ECML-PKDD 2024 Research Track
Subjects: Computation and Language (cs.CL)