
Using Text Mining Techniques for Extracting Information from Research Articles

  • January 2018
  • Studies in Computational Intelligence
  • In book: Intelligent Natural Language Processing: Trends and Applications (pp.373-397)

  • Said A. Salloum, University of Sharjah
  • Mostafa Al-Emran, British University in Dubai
  • Khaled Shaalan, British University in Dubai

Figure: Text mining processing framework

  • Corpus ID: 63254021

Text Mining and Analysis: Practical Methods, Examples, and Case Studies Using SAS

  • Goutam Chakraborty, Murali Pagolu, Satish Garla
  • Published 25 October 2013
  • Computer Science

128 Citations

  • Analysis of unstructured data: applications of text analytics and sentiment mining
  • A practical guide to text mining with topic extraction
  • Using text analysis to improve the quality of scoring models with SAS® Enterprise
  • Text mining using latent semantic analysis: an illustration through examination of 30 years of research at JIS
  • When big data made the headlines: mining the text of big data coverage in the news media
  • Exploring the art and science of SAS® text analytics: best practices in developing rule-based models
  • Breaking through the barriers: innovative sampling techniques for unstructured data analysis
  • Exploring online drug reviews using text analytics, sentiment analysis and data mining models
  • A comprehensive study of text mining approach
  • From unstructured text to the data warehouse: customer support at the University of North Texas

31 References

  • Text mining: predictive methods for analyzing unstructured information
  • Text mining: finding nuggets in mountains of textual data
  • SAS® since 1976: an application of text mining to reveal trends
  • Discovering evolutionary theme patterns from text: an exploration of temporal text mining
  • Text mining: applications and theory
  • Automatic analysis, theme generation, and summarization of machine-readable texts
  • Data and text mining: a business applications approach
  • Foundations of statistical natural language processing
  • The automatic creation of literature abstracts
  • Handbook of natural language processing


Text-mining in macroeconomics: the wealth of words

Azqueta Gavaldon, Andres (2020) Text-mining in macroeconomics: the wealth of words. PhD thesis, University of Glasgow.


The founding of the Royal Society in 1660 surely represented an important milestone in the history of science, not least in Economics. Yet its founding motto, "Nullius in verba", could be somewhat misleading. Words may in fact play an important role in Economics. To extract the relevant information that words provide, this thesis relies on state-of-the-art methods from the information retrieval and computer science communities.

Chapter 1 shows how policy uncertainty indices can be constructed via unsupervised machine learning models. Unsupervised algorithms reduce the time and resources needed to compute these indices. The unsupervised algorithm used, Latent Dirichlet Allocation (LDA), recovers the different themes in a set of documents without any prior information about their context. Given that this algorithm is widely used throughout this thesis, this chapter offers a detailed yet intuitive description of its underlying mechanics.
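To make the mechanics of LDA more concrete, the following is a minimal sketch of topic inference by collapsed Gibbs sampling, written with the Python standard library only. It is not the thesis's implementation: the function name `fit_lda`, the hyperparameters `alpha` and `beta`, and the four toy documents are all assumptions chosen for illustration.

```python
import random

def fit_lda(docs, n_topics=2, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Illustrative collapsed Gibbs sampler for LDA on tokenised documents."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]             # tokens of doc d in topic k
    topic_word = [[0] * n_words for _ in range(n_topics)]  # count of word w in topic k
    topic_total = [0] * n_topics                           # tokens assigned to topic k
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][wid] -= 1
                topic_total[k] -= 1
                # Unnormalised conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wid] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wid] += 1
                topic_total[k] += 1

    # Smoothed document-topic proportions and the most frequent words per topic.
    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic)
    ]
    top_words = [
        [vocab[i] for i in sorted(range(n_words), key=lambda i: -row[i])[:3]]
        for row in topic_word
    ]
    return theta, top_words

# Hypothetical toy corpus (not data from the thesis).
docs = [
    "brexit vote referendum brexit".split(),
    "referendum vote independence referendum".split(),
    "investment firms capital investment".split(),
    "firms capital investment machinery".split(),
]
theta, top_words = fit_lda(docs)
for k, words in enumerate(top_words):
    print(f"topic {k}: {words}")
```

Each row of `theta` gives one document's estimated mix of topics. On a real news corpus, a topic's share over time is the kind of quantity an uncertainty index could be built from, though the exact construction used in the thesis is not described here.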

Chapter 2 uses the LDA algorithm to categorize the political uncertainty embedded in the Scottish media. In particular, it models the uncertainty regarding Brexit and the Scottish referendum for independence. These referendum-related indices are compared with the Google search queries "Scottish independence" and "Brexit", showing strong similarities. The second part of the chapter examines the relationship between these indices and investment in a longitudinal panel dataset of 2,589 Scottish firms over the period 2008-2017. It presents evidence of greater sensitivity for firms that are financially constrained or whose investment is to a greater degree irreversible. Additionally, it finds that Scottish companies located on the border with England have a stronger negative correlation with Scottish political uncertainty than those operating in the rest of the country. Contrary to expectations, investment by manufacturing companies appears less sensitive to political uncertainty.

Chapter 3 builds eight different policy-related uncertainty indicators for the four largest euro area countries using press media in German, French, Italian and Spanish from January 2000 until May 2019. This is done in two steps. First, a continuous bag-of-words model is used to obtain words semantically similar to "economy" and "uncertainty" across the four languages and contexts. This allows for the retrieval of all news articles relevant to economic uncertainty. Second, LDA is again employed to model the different sources of uncertainty for each country, highlighting how easily LDA can adapt to different languages and contexts. Using a Bayesian Structural Vector Autoregressive (BSVAR) setup, a strong heterogeneity in the relationship between uncertainty and investment in machinery and equipment is then documented. For example, while investment in France, Italy and Spain reacts heavily to political uncertainty shocks, in Germany it is more sensitive to trade uncertainty shocks.
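The first step, expanding seed words into semantically similar terms and then retrieving matching articles, can be sketched as follows. This is only an illustration under stated assumptions: the three-dimensional vectors are hand-made stand-ins for embeddings that a trained continuous bag-of-words model would supply, and the helper names `most_similar` and `is_relevant` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def most_similar(word, vectors, top_n=2):
    """Return the top_n words whose vectors are closest to `word`."""
    target = vectors[word]
    scored = [
        (other, cosine(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scored[:top_n]]

# Hypothetical toy embeddings; a real pipeline would learn these from the corpus.
vectors = {
    "economy": [0.9, 0.1, 0.0],
    "economic": [0.85, 0.15, 0.05],
    "growth": [0.8, 0.2, 0.1],
    "uncertainty": [0.1, 0.9, 0.0],
    "uncertain": [0.15, 0.85, 0.05],
    "risk": [0.2, 0.8, 0.1],
    "football": [0.0, 0.1, 0.9],
}

economy_terms = {"economy", *most_similar("economy", vectors)}
uncertainty_terms = {"uncertainty", *most_similar("uncertainty", vectors)}

def is_relevant(article, group_a, group_b):
    """Keep an article only if it mentions at least one term from each group."""
    tokens = set(article.lower().split())
    return bool(tokens & group_a) and bool(tokens & group_b)

articles = [
    "The economy faces growing uncertainty",
    "Football season opens amid risk of rain",
    "Economic growth slows as risk rises",
]
relevant = [a for a in articles if is_relevant(a, economy_terms, uncertainty_terms)]
print(relevant)
```

Requiring a hit from both term groups is an assumption about how "relevant to economic uncertainty" might be operationalised. The retained articles would then be passed to the topic model in the second step.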

Finally, Chapter 4 analyses English-language media from Europe, India and the United States, augmented by a sentiment analysis, to study how different narratives concerning cryptocurrencies influence their prices. The time span ranges from April 2013 to December 2018, a period in which cryptocurrency prices experienced a parabolic behaviour. In addition, this case study is motivated by Shiller's belief that narratives around cryptocurrencies might have led to this price behaviour. Nonetheless, the relationship between narratives and prices is likely to be driven by complex interactions. For example, articles written in the media about a specific phenomenon will attract or deter new investors depending on their content and tone (sentiment). Moreover, the press might also react to price changes by increasing the coverage of a given topic. For this reason, a recent causal model, Convergent Cross Mapping (CCM), suited to discovering causal relationships in complex dynamical ecosystems, is used. I find bidirectional causal relationships between narratives concerning investment and regulation, while a mild unidirectional causal association exists in narratives that relate technology and security to prices.

Item Type: Thesis (PhD)
Qualification Level: Doctoral
Keywords: macroeconomics, text-mining, uncertainty, machine learning.
Supervisor's Name: Nolan, Professor Charles and Leith, Professor Campbell
Date of Award: 2020
Unique ID: glathesis:2020-81641
Copyright: Copyright of this thesis is held by the author.
Date Deposited: 08 Sep 2020 11:49
Last Modified: 31 Aug 2022 10:10


An Introduction to Text Mining

  • First Online: 01 January 2012

  • Charu C. Aggarwal &
  • ChengXiang Zhai

20k Accesses

41 Citations

The problem of text mining has gained increasing attention in recent years because of the large amounts of text data, which are created in a variety of social network, web, and other information-centric applications. Unstructured data is the easiest form of data which can be created in any application scenario. As a result, there has been a tremendous need to design methods and algorithms which can effectively process a wide variety of text applications. This book will provide an overview of the different methods and algorithms which are common in the text domain, with a particular focus on mining methods.


Similar content being viewed by others

  • Text Mining
  • Text Mining: The New Data Mining Frontier
  • Techniques, Applications, and Issues in Mining Large-Scale Text Databases

C. Aggarwal. Data Streams: Models and Algorithms, Springer, 2007.

C. Aggarwal. Social Network Data Analytics, Springer, 2011.

R. A. Baeza-Yates, B. A. Ribeiro-Neto. Modern Information Retrieval - the concepts and technology behind search, Second edition, Pearson Education Ltd., Harlow, England, 2011.

S. Chakrabarti, B. Dom, P. Indyk. Enhanced Hypertext Categorization using Hyperlinks, ACM SIGMOD Conference, 1998.

W. B. Croft, D. Metzler, T. Strohman. Search Engines - Information Retrieval in Practice, Pearson Education, 2009.

S. Deerwester, S. Dumais, T. Landauer, G. Furnas, R. Harshman. Indexing by Latent Semantic Analysis. JASIS, 41(6), pp. 391–407, 1990.

D. A. Grossman, O. Frieder. Information Retrieval: Algorithms and Heuristics (The Kluwer International Series on Information Retrieval), Springer-Verlag New York, Inc, 2004.

J. Han, M. Kamber. Data Mining: Concepts and Techniques, 2nd Edition, Morgan Kaufmann, 2005.

C. Manning, P. Raghavan, H. Schutze. Introduction to Information Retrieval, Cambridge University Press, 2008.

I. T. Jolliffe. Principal Component Analysis, Springer, 2002.

S. J. Pan, Q. Yang. A Survey on Transfer Learning, IEEE Transactions on Knowledge and Data Engineering, 22(10): pp. 1345–1359, Oct. 2010.

G. Salton. An Introduction to Modern Information Retrieval, McGraw-Hill, 1983.

K. Sparck Jones, P. Willett (eds.). Readings in Information Retrieval, Morgan Kaufmann Publishers Inc, 1997.


Author information

Authors and affiliations.

IBM T. J. Watson Research Center, Yorktown Heights, NY, USA

Charu C. Aggarwal

University of Illinois at Urbana-Champaign, Urbana, IL, USA

ChengXiang Zhai


Corresponding author

Correspondence to Charu C. Aggarwal .

Editor information

Editors and affiliations.

Thomas J. Watson Research Center, IBM, Skyline Drive 19, Hawthorne, 10532, New York, USA

University of Illinois at Urbana-Champaign, Urbana, 61801, Illinois, USA


Copyright information

© 2012 Springer Science+Business Media, LLC

About this chapter

Aggarwal, C.C., Zhai, C. (2012). An Introduction to Text Mining. In: Aggarwal, C., Zhai, C. (eds) Mining Text Data. Springer, Boston, MA. https://doi.org/10.1007/978-1-4614-3223-4_1


DOI : https://doi.org/10.1007/978-1-4614-3223-4_1

Published : 07 January 2012

Publisher Name : Springer, Boston, MA

Print ISBN : 978-1-4614-3222-7

Online ISBN : 978-1-4614-3223-4

eBook Packages : Computer Science Computer Science (R0)

Text Mining

What support is available for text mining?

We can help you with:

  • Starting text mining projects
  • Web Scraping, Information Retrieval, Text Collection Methods (API)
  • Machine Learning for classification & Clustering
  • Natural Language Processing
  • Python, R, SQL

If you need help with any of the topics mentioned above, please reach out to us at: [email protected] .

  • Text Mining Guide
  • Web Scraping:
      • Programming-based – Beautiful Soup, Scrapy, Selenium
      • Commercial software (free/paid) – ParseHub, Dexi.io, Scraping-bot.io
  • Text Cleaning:
      • TextClean – Collection of open-source tools for cleaning & normalizing text documents in R
      • OpenRefine – Open-source data cleansing tool (formerly Google Refine)
      • Trifacta Wrangler – Free tool for data preparation
  • Text Analytics & Visualization:
      • Gale Digital Scholar Lab – Apply natural language processing tools to raw text data (OCR) from Gale Primary Sources in a single research platform
      • ProQuest TDM – Text mine large sets of news, scholarly, and other publications that UW Libraries licenses with ProQuest
      • Rosette Text Analytics – Suite of interoperable components for text analytics
      • WordStat – Advanced content analysis
      • Apache OpenNLP – Document categorizer and more
      • Natural Language Toolkit – Industrial-strength NLP libraries in Python
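The collection and cleaning stages listed in this guide can be sketched in miniature with the Python standard library alone. The example below is a rough stand-in for what tools such as Beautiful Soup or TextClean do far more thoroughly. The class `TextExtractor`, the tiny `STOPWORDS` set, and the sample HTML page are all assumptions made for illustration.

```python
import re
from collections import Counter
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)

def html_to_text(html):
    """Strip markup and collapse runs of whitespace into single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())

# A deliberately tiny stop-word list; real projects use much longer ones.
STOPWORDS = {"the", "a", "an", "of", "and", "is", "in", "to"}

def clean_tokens(text):
    """Lowercase, keep alphabetic tokens only, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

# Hypothetical page content standing in for a scraped document.
page = (
    "<html><head><style>p {color: red}</style></head><body>"
    "<h1>Text Mining</h1><p>Text mining is the analysis of text.</p>"
    "<script>var x = 1;</script></body></html>"
)
text = html_to_text(page)
counts = Counter(clean_tokens(text))
print(text)
print(counts.most_common(3))
```

For real scraping work, the dedicated libraries above handle malformed markup, encodings, and crawling politeness, none of which this sketch attempts.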

Software Carpentry Workshops

Watch for quarterly workshops to build skills in R or Python through the eScience Institute

Available to: Current students, faculty, and staff

Offered: Quarterly

More information: https://uwescience.github.io/carpentries/


COMMENTS

  1. PDF MASTER'S THESIS Deep Learning for text data mining: Solving ...

    Avito Loops has significant experience in text data mining and has already developed two text classifiers: one based on entity recognition, pattern matching and voting, the other based on machine learning and decision trees. This project's challenge was to develop a new classifier based on Deep Learning.

  2. (PDF) Text Mining : Techniques and its Application

    PDF | Text mining has become an exciting research field as it tries to discover valuable information from unstructured texts. The unstructured texts... | Find, read and cite all the research you ...

  3. PDF Applying Text Mining and Machine Learning to Build Methods for

    question-based examination processes. This thesis responds to the trend to employ machine learning and text mining techniques in evaluation of students' responses to open questions. The present research is focused on the identification of the best approach to automate the grading of students' answers in open-

  4. PDF Text-mining and machine-learning solid-state synthesis from the

    Therefore, this thesis aims to achieve two objectives: 1) constructing a text-mining pipeline that extracts solid-state synthesis datasets from scientific papers, and 2) implementing an interpretable machine-learning method to predict solid-state synthesis condi-

  5. (PDF) Using Text Mining Techniques for Extracting Information from

    The primary goals of this research are (1) using text mining techniques for identifying the topics of a scientific text related to ML research and developing a hierarchical and evolutionary ...

  6. PDF Developing text-mining methods to review the published literature

    Abstract Text mining is considered an effective approach for the identification of relevant phenomena in systematic reviews. Topic models have shown to be a promising unsupervised technique to reveal common topics in text data. This research used three topic modeling text mining algorithms, LDA, Top2Vec, and BERTopic, to identify the relevant phe-

  7. A Brief Survey of Text Mining: Classification, Clustering and

    Text mining is the task of extracting meaningful information from text, which has gained significant attentions in recent years. In this paper, we describe several of the most fundamental text mining tasks and techniques including text pre-processing, classification and clustering.

  8. PDF Microsoft Word

    ABSTRACT. The objective of this thesis is to develop efficient text classification models to classify text documents. In usual text mining algorithms, a document is represented as a vector whose dimension is the number of distinct keywords in it, which can be very large.

  9. PDF Text mining: An introduction to theory and some applications

    This article focusses on Text Mining (TM henceforth), that is a set of statistical and computer science techniques specifically developed to analyse text data, and aims to give a theoretical introduction to TM and to provide some examples of its applications. Text has always been an informative source of insight into a specific field or individuals. However, with the advent of new technologies ...

  10. [PDF] Text Mining and Analysis: Practical Methods, Examples, and Case

    The results of the study support the thesis that models with a properly conducted text-mining process have better classification quality than models without text variables and the use of this data mining approach is recommended when input data includes text variables.

  11. PDF The Text Mining Handbook

    Text mining is a new and exciting area of computer science research that tries to solve the crisis of information overload by combining techniques from data mining, machine learning, natural language processing, information retrieval, and knowledge management. Similarly, link detection - a rapidly evolving approach to the analysis of

  12. PDF FROM TEXT MINING TO KNOWLEDGE MINING

    FROM TEXT MINING TO KNOWLEDGE MINING: AN INTEGRATED FRAMEWORK OF CONCEPT EXTRACTION AND CATEGORIZATION FOR DOMAIN ONTOLOGY . Ph.D Dissertation . ... The research reported in this thesis was supported by Prokex project (EUREKA_HU_12-1-2012-0039), in cooperation with the Corvinno Technology Transfer Center, so I praise the enormous ...

  13. PDF Text mining methodologies with R: An application to central bank texts

    text, such as an interest rate announcement, befor heoretical background behind text analysis and interpretation of text. Section 3 describes text extraction and Section 4 presents methodologies for cleaning and storing text data for text mining. Section 5 presents several common approaches to text data structures used in Section 6, which de

  14. Text-mining in macroeconomics: the wealth of words

    In order to extract relevant information that words provide, this thesis relies on state-of-the-art methods from the information retrieval and computer science communities. Chapter 1 shows how policy uncertainty indices can be constructed via unsupervised machine learning models. Using unsupervised algorithms proves useful in terms of the time ...

  15. PDF A Study of Text Mining Framework

    provided by ASU Digital Repository. A Study of Text Mining Framework for Automated Classification of Software Requirements in Enterprise Systems, by Japa Swadia. A Thesis Presented in Partial Fulfillment of the Requirements for the Degree Master of Science. Approved April 2016 by the Graduate Supervisory Committee:

  16. PDF Analysis of Text Mining from Full-text Articles and Abstracts by

    The texts are mined mostly from PDF format, followed by Microsoft Word format and HTML format (web pages). Postgraduate students prefer mining texts from full-text articles rather than from abstracts, and they mostly mine text through the World Wide Web, followed by library databases.

  17. The application of text mining methods in innovation research: current

    Abstract Unstructured data in the form of digitized text is rapidly increasing in volume, accessibility, and relevance for research on innovation and beyond. While traditional attempts to analyze text (i.e., qualitative analysis) are limited in processing large amounts of data, text mining presents a set of approaches that allow researchers to explore large-scale collections of texts in an ...

  18. PDF A survey of the literature: how scholars use text mining in Educational

    We find that text mining is becoming more popular and essential in educational research. The conclusion is that we can employ three steps (text source selection, text mining techniques application, and educational information discovery) to use text mining in educational studies. We also summarize different options in each step in this paper.

  19. PDF Chapter 1 AN INTRODUCTION TO TEXT MINING

    1. Introduction Data mining is a field which has seen rapid advances in recent years [8] because of the immense advances in hardware and software technology which have led to the availability of different kinds of data. This is particularly true for the case of text data, where the development of hardware and software platforms for the web and social networks has enabled the rapid creation of ...

  20. PDF Text Mining Methods for Mapping Opinions from Georeferenced ...

    ng application areas are opinion mining and geographical text mining. My thesis relates to exploring automated techniques to identify the geographical location that best describes the content of textual documents, with the objective of building a system that discovers and maps opinions towards certain themes, expressed in the context of particular

  21. Text Mining Thesis PDF

    The document discusses the challenges of writing a thesis on text mining and how an expert service can help alleviate these challenges. It notes that text mining requires deep understanding of computational techniques and linguistic principles while navigating intricacies. Seeking professional assistance can be invaluable for students grappling with the complex subject matter and demands of ...

  22. Text Mining

    Text Mining Are you looking for patterns in large sets of text or researching ways to make sense of textual data using sentiment analysis, topic modeling, or more? Whether you're new to text mining or stuck with text mining questions, we're here to help!
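
Several of the abstracts listed above describe representing a document as a vector with one dimension per distinct keyword, then using those vectors for classification or clustering. A minimal sketch of that representation, using TF-IDF weights and cosine similarity, might look like the following. The weighting formula `log(n / df) + 1` is one common variant rather than the method of any listed paper, and the function names and three toy documents are assumptions.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Turn tokenised documents into TF-IDF vectors over a shared vocabulary."""
    vocab = sorted({token for doc in docs for token in doc})
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(token for doc in docs for token in set(doc))
    idf = {token: math.log(n_docs / df[token]) + 1.0 for token in vocab}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append([tf[token] * idf[token] for token in vocab])
    return vocab, vectors

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical toy documents, already tokenised.
docs = [
    ["text", "mining", "methods"],
    ["text", "classification", "models"],
    ["topic", "models", "text"],
]
vocab, vectors = tfidf_vectors(docs)
print(vocab)
print(round(cosine(vectors[1], vectors[2]), 3))
```

The vector length equals the vocabulary size, which is why this representation "can be very large" on real corpora and why dimensionality reduction or sparse storage is usually needed in practice.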