The founding of the Royal Society in 1660 surely represented an important milestone in the history of science, not least for economics. Yet its founding motto, "Nullius in verba", could be somewhat misleading: words may in fact play an important role in economics. To extract the relevant information that words provide, this thesis relies on state-of-the-art methods from the information retrieval and computer science communities.
Chapter 1 shows how policy uncertainty indices can be constructed via unsupervised machine learning models. Unsupervised algorithms reduce the time and resources needed to compute these indices. The unsupervised machine learning algorithm used, Latent Dirichlet Allocation (LDA), recovers the different themes in a set of documents without any prior information about their context. Given that this algorithm is widely used throughout this thesis, the chapter offers a detailed yet intuitive description of its underlying mechanics.
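As a rough illustration of the mechanics mentioned above, the following is a minimal sketch of LDA fitted by collapsed Gibbs sampling. The abstract does not say which inference method or library the thesis uses, so the sampler, the function name `lda_gibbs`, the hyperparameter values, and the toy documents are all assumptions made for illustration.

```python
import random

def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling (illustrative sketch).

    docs is a list of token lists. Returns (theta, phi, vocab), where
    theta[d][k] is the share of topic k in document d and phi[k][v] is
    the probability of word vocab[v] under topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    # Count tables: document-topic, topic-word, and per-topic totals.
    n_dk = [[0] * n_topics for _ in docs]
    n_kw = [[0] * V for _ in range(n_topics)]
    n_k = [0] * n_topics

    # Random initial topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Unnormalised conditional probability of each topic.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                # Record the newly sampled assignment.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    theta = [
        [(n_dk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in range(n_topics)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(n_kw[k][v] + beta) / (n_k[k] + V * beta) for v in range(V)]
        for k in range(n_topics)
    ]
    return theta, phi, vocab

# Hypothetical toy corpus, not data from the thesis.
docs = [
    "brexit vote referendum uncertainty".split(),
    "referendum independence vote scotland".split(),
    "bitcoin price market investors".split(),
    "bitcoin market regulation price".split(),
]
theta, phi, vocab = lda_gibbs(docs, n_topics=2)
```

Each row of `theta` gives the topic mix of one document. Under this sketch, an uncertainty index for a theme could plausibly be built by aggregating those shares over the articles published in each period, though the abstract does not spell out the exact aggregation.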
Chapter 2 uses the LDA algorithm to categorize the political uncertainty embedded in the Scottish media. In particular, it models the uncertainty regarding Brexit and the Scottish independence referendum. These referendum-related indices are compared with the Google search queries "Scottish independence" and "Brexit", showing strong similarities. The second part of the chapter examines the relationship between these indices and investment in a longitudinal panel dataset of 2,589 Scottish firms over the period 2008 to 2017. It presents evidence of greater sensitivity for firms that are financially constrained or whose investment is to a greater degree irreversible. Additionally, investment by Scottish companies located on the border with England is more strongly negatively correlated with Scottish political uncertainty than investment by companies operating in the rest of the country. Contrary to expectations, investment by manufacturing companies appears less sensitive to political uncertainty.
Chapter 3 builds eight different policy-related uncertainty indicators for the four largest euro area countries using press media in German, French, Italian and Spanish from January 2000 until May 2019. This is done in two steps. First, a continuous bag-of-words model is used to obtain words semantically similar to "economy" and "uncertainty" across the four languages and contexts. This allows for the retrieval of all news articles relevant to economic uncertainty. Second, LDA is again employed to model the different sources of uncertainty for each country, highlighting how easily LDA can adapt to different languages and contexts. Using a Bayesian Structural Vector Autoregressive (BSVAR) setup, strong heterogeneity in the relationship between uncertainty and investment in machinery and equipment is then documented. For example, while investment in France, Italy and Spain reacts heavily to political uncertainty shocks, in Germany it is more sensitive to trade uncertainty shocks.
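The first of the two steps above, expanding a seed word to its nearest neighbours in an embedding space and then filtering articles, can be sketched as follows. The vectors are made-up three-dimensional stand-ins for trained continuous bag-of-words embeddings, and the names `most_similar`, `is_relevant` and `toy_vectors` are hypothetical. Training the embeddings themselves is not shown.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def most_similar(term, vectors, top_n=2):
    """Return the top_n words whose vectors are closest to that of term."""
    target = vectors[term]
    scored = [
        (other, cosine(target, vec))
        for other, vec in vectors.items()
        if other != term
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in scored[:top_n]]

def is_relevant(article_tokens, terms):
    """Flag an article that mentions at least one of the given terms."""
    return any(token in terms for token in article_tokens)

# Made-up vectors standing in for trained CBOW embeddings (assumption).
toy_vectors = {
    "uncertainty": [0.9, 0.1, 0.0],
    "unpredictability": [0.85, 0.15, 0.05],
    "doubt": [0.8, 0.2, 0.1],
    "economy": [0.1, 0.9, 0.0],
    "football": [0.0, 0.1, 0.9],
}

# Expand the seed word, then use the expanded set as a retrieval filter.
uncertainty_terms = {"uncertainty", *most_similar("uncertainty", toy_vectors)}
article = "markets face growing doubt over the budget".split()
keep_article = is_relevant(article, uncertainty_terms)
```

In practice the same expansion would be run separately for each language, with the resulting word lists reviewed by hand before being used to select articles.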
Finally, Chapter 4 analyses English-language media from Europe, India and the United States, augmented by a sentiment analysis, to study how different narratives concerning cryptocurrencies influence their prices. The time span ranges from April 2013 to December 2018, a period in which cryptocurrency prices followed a parabolic path. In addition, this case study is motivated by Shiller's belief that narratives around cryptocurrencies might have led to this price behaviour. Nonetheless, the relationship between narratives and prices is likely to be driven by complex interactions. For example, articles written in the media about a specific phenomenon will attract or deter new investors depending on their content and tone (sentiment). Moreover, the press might also react to price changes by increasing the coverage of a given topic. For this reason, a recent causal model, Convergent Cross Mapping (CCM), suited to discovering causal relationships in complex dynamical ecosystems, is used. I find bidirectional causal relationships between prices and narratives concerning investment and regulation, while a mild unidirectional causal association exists between prices and narratives that relate to technology and security.
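To make the idea behind Convergent Cross Mapping more concrete, here is a minimal sketch of its core step: estimating one series from the delay-embedded "shadow manifold" of another and scoring the estimate by correlation. The embedding dimension, the exponential neighbour weighting, the function names, and the coupled logistic maps used as test data are all assumptions for illustration, not the thesis's actual settings or data.

```python
import math

def delay_embed(series, E, tau=1):
    """Map each valid time t to the lagged vector (s[t], s[t-tau], ...)."""
    start = (E - 1) * tau
    return {
        t: tuple(series[t - j * tau] for j in range(E))
        for t in range(start, len(series))
    }

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def cross_map_skill(source, target, E=2, tau=1):
    """Correlation between target and its estimate from source's manifold.

    High skill is read as evidence that target causally influences source.
    """
    manifold = delay_embed(source, E, tau)
    times = list(manifold)
    actual, estimated = [], []
    for t in times:
        # E + 1 nearest neighbours of the point at time t, excluding itself.
        neighbours = sorted(
            (math.dist(manifold[t], manifold[s]), s) for s in times if s != t
        )[: E + 1]
        d_min = max(neighbours[0][0], 1e-12)  # guard against zero distance
        weights = [math.exp(-d / d_min) for d, _ in neighbours]
        total = sum(weights)
        estimate = sum(w * target[s] for w, (_, s) in zip(weights, neighbours)) / total
        estimated.append(estimate)
        actual.append(target[t])
    return pearson(actual, estimated)

def coupled_logistic(n, beta_xy=0.02, beta_yx=0.1):
    """Two coupled logistic maps used as hypothetical test data."""
    x, y = [0.4], [0.2]
    for _ in range(n - 1):
        x_next = x[-1] * (3.8 - 3.8 * x[-1] - beta_xy * y[-1])
        y_next = y[-1] * (3.5 - 3.5 * y[-1] - beta_yx * x[-1])
        x.append(x_next)
        y.append(y_next)
    return x, y

x, y = coupled_logistic(300)
skill_x_from_y = cross_map_skill(source=y, target=x)  # does x influence y?
skill_y_from_x = cross_map_skill(source=x, target=y)  # does y influence x?
```

A full CCM analysis would also recompute the skill for increasing library lengths and check that it converges, which is what distinguishes causation from mere correlation in this framework. That convergence check is omitted here for brevity.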
Item Type: | Thesis (PhD) |
---|---|
Qualification Level: | Doctoral |
Keywords: | macroeconomics, text-mining, uncertainty, machine learning. |
Supervisor's Name: | Nolan, Professor Charles and Leith, Professor Campbell |
Date of Award: | 2020 |
Unique ID: | glathesis:2020-81641 |
Copyright: | Copyright of this thesis is held by the author. |
Date Deposited: | 08 Sep 2020 11:49 |
Last Modified: | 31 Aug 2022 10:10 |
The University of Glasgow is a registered Scottish charity: Registration Number SC004401
The problem of text mining has gained increasing attention in recent years because of the large amounts of text data created in a variety of social network, web, and other information-centric applications. Unstructured data is the easiest form of data to create in any application scenario. As a result, there is a pressing need for methods and algorithms that can effectively process a wide variety of text applications. This book provides an overview of the different methods and algorithms that are common in the text domain, with a particular focus on mining methods.
[1] C. Aggarwal. Data Streams: Models and Algorithms, Springer, 2007.
[2] C. Aggarwal. Social Network Data Analytics, Springer, 2011.
[3] R. A. Baeza-Yates, B. A. Ribeiro-Neto. Modern Information Retrieval - the concepts and technology behind search, Second edition, Pearson Education Ltd., Harlow, England, 2011.
[4] S. Chakrabarti, B. Dom, P. Indyk. Enhanced Hypertext Categorization using Hyperlinks, ACM SIGMOD Conference, 1998.
[5] W. B. Croft, D. Metzler, T. Strohman. Search Engines - Information Retrieval in Practice, Pearson Education, 2009.
[6] S. Deerwester, S. Dumais, T. Landauer, G. Furnas, R. Harshman. Indexing by Latent Semantic Analysis, JASIS, 41(6), pp. 391–407, 1990.
[7] D. A. Grossman, O. Frieder. Information Retrieval: Algorithms and Heuristics (The Kluwer International Series on Information Retrieval), Springer-Verlag New York, Inc, 2004.
[8] J. Han, M. Kamber. Data Mining: Concepts and Techniques, 2nd Edition, Morgan Kaufmann, 2005.
[9] C. Manning, P. Raghavan, H. Schütze. Introduction to Information Retrieval, Cambridge University Press, 2008.
[10] I. T. Jolliffe. Principal Component Analysis, Springer, 2002.
[11] S. J. Pan, Q. Yang. A Survey on Transfer Learning, IEEE Transactions on Knowledge and Data Engineering, 22(10), pp. 1345–1359, Oct. 2010.
[12] G. Salton. An Introduction to Modern Information Retrieval, McGraw-Hill, 1983.
[13] K. Sparck Jones, P. Willett (eds.). Readings in Information Retrieval, Morgan Kaufmann Publishers Inc, 1997.
Authors and affiliations.
IBM T. J. Watson Research Center, Yorktown Heights, NY, USA
Charu C. Aggarwal
University of Illinois at Urbana-Champaign, Urbana, IL, USA
ChengXiang Zhai
Correspondence to Charu C. Aggarwal .
Editors and affiliations.
Thomas J. Watson Research Center, IBM, Skyline Drive 19, Hawthorne, 10532, New York, USA
University of Illinois at Urbana-Champaign, Urbana, 61801, Illinois, USA
© 2012 Springer Science+Business Media, LLC
Aggarwal, C.C., Zhai, C. (2012). An Introduction to Text Mining. In: Aggarwal, C., Zhai, C. (eds) Mining Text Data. Springer, Boston, MA. https://doi.org/10.1007/978-1-4614-3223-4_1
DOI : https://doi.org/10.1007/978-1-4614-3223-4_1
Published : 07 January 2012
Publisher Name : Springer, Boston, MA
Print ISBN : 978-1-4614-3222-7
Online ISBN : 978-1-4614-3223-4
eBook Packages : Computer Science Computer Science (R0)
What support is available for text mining?
We can help you with:
If you need help with any of the topics mentioned above, please reach out to us at: [email protected] .
Software Carpentry Workshops
Watch for quarterly workshops to build skills in R or Python through the eScience Institute
Available to: Current students, faculty, and staff
Offered: Quarterly
More information: https://uwescience.github.io/carpentries/
Avito Loops has significant experience in text data mining and has already developed two text classifiers: one based on entity recognition, pattern matching and voting, the other based on machine learning and decision trees. This project's challenge was to develop a new classifier based on Deep Learning.
Text mining has become an exciting research field as it tries to discover valuable information from unstructured texts. The unstructured texts...
question-based examination processes. This thesis responds to the trend to employ machine learning and text mining techniques in evaluation of students' responses to open questions. The present research is focused on the identification of the best approach to automate the grading of students' answers in open-
Therefore, this thesis aims to achieve two objectives: 1) constructing a text-mining pipeline that extracts solid-state synthesis datasets from scientific papers, and 2) implementing an interpretable machine-learning method to predict solid-state synthesis condi-
The primary goals of this research are (1) Using text mining techniques for identifying the topics of a scientific text related to ML research and developing a hierarchical and evolutionary ...
Abstract Text mining is considered an effective approach for the identification of relevant phenomena in systematic reviews. Topic models have been shown to be a promising unsupervised technique to reveal common topics in text data. This research used three topic modeling text mining algorithms, LDA, Top2Vec, and BERTopic, to identify the relevant phe-
Text mining is the task of extracting meaningful information from text, which has gained significant attention in recent years. In this paper, we describe several of the most fundamental text mining tasks and techniques including text pre-processing, classification and clustering.
ABSTRACT. The objective of this thesis is to develop efficient text classification models to classify text documents. In usual text mining algorithms, a document is represented as a vector whose dimension is the number of distinct keywords in it, which can be very large.
This article focusses on Text Mining (TM henceforth), that is a set of statistical and computer science techniques specifically developed to analyse text data, and aims to give a theoretical introduction to TM and to provide some examples of its applications. Text has always been an informative source of insight into a specific field or individuals. However, with the advent of new technologies ...
The results of the study support the thesis that models with a properly conducted text-mining process have better classification quality than models without text variables and the use of this data mining approach is recommended when input data includes text variables.
Text mining is a new and exciting area of computer science research that tries to solve the crisis of information overload by combining techniques from data mining, machine learning, natural language processing, information retrieval, and knowledge management. Similarly, link detection - a rapidly evolving approach to the analysis of
FROM TEXT MINING TO KNOWLEDGE MINING: AN INTEGRATED FRAMEWORK OF CONCEPT EXTRACTION AND CATEGORIZATION FOR DOMAIN ONTOLOGY . Ph.D Dissertation . ... The research reported in this thesis was supported by Prokex project (EUREKA_HU_12-1-2012-0039), in cooperation with the Corvinno Technology Transfer Center, so I praise the enormous ...
text, such as an interest rate announcement, before ... theoretical background behind text analysis and interpretation of text. Section 3 describes text extraction and Section 4 presents methodologies for cleaning and storing text data for text mining. Section 5 presents several common approaches to text data structures used in Section 6, which de
Provided by ASU Digital Repository. A Study of Text Mining Framework for Automated Classification of Software Requirements in Enterprise Systems, by Japa Swadia. A Thesis Presented in Partial Fulfillment of the Requirements for the Degree Master of Science. Approved April 2016 by the Graduate Supervisory Committee:
The texts are mined mostly from PDF format, followed by Microsoft Word format and HTML format (Web pages). Postgraduate students prefer mining texts from full-text articles rather than from abstracts, and the source from which postgraduate students mostly mine text is the World Wide Web, followed by library databases.
Abstract Unstructured data in the form of digitized text is rapidly increasing in volume, accessibility, and relevance for research on innovation and beyond. While traditional attempts to analyze text (i.e., qualitative analysis) are limited in processing large amounts of data, text mining presents a set of approaches that allow researchers to explore large-scale collections of texts in an ...
We find that text mining is becoming more popular and essential in educational research. The conclusion is that we can employ three steps (text source selection, text mining techniques application, and educational information discovery) to use text mining in educational studies. We also summarize different options in each step in this paper.
1. Introduction Data mining is a field which has seen rapid advances in recent years [8] because of the immense advances in hardware and software technology which have led to the availability of different kinds of data. This is particularly true for the case of text data, where the development of hardware and software platforms for the web and social networks has enabled the rapid creation of ...
ng application areas are opinion mining and geographical text mining. My thesis relates to exploring automated techniques to identify the geographical location that best describes the content of textual documents, with the objective of building a system that discovers and maps opinions towards certain themes, expressed in the context of particular
The document discusses the challenges of writing a thesis on text mining and how an expert service can help alleviate these challenges. It notes that text mining requires deep understanding of computational techniques and linguistic principles while navigating intricacies. Seeking professional assistance can be invaluable for students grappling with the complex subject matter and demands of ...
Text Mining Are you looking for patterns in large sets of text or researching ways to make sense of textual data using sentiment analysis, topic modeling, or more? Whether you're new to text mining or stuck with text mining questions, we're here to help!