Information

  • Author Services

Initiatives

You are accessing a machine-readable page. In order to be human-readable, please install an RSS reader.

All articles published by MDPI are made immediately available worldwide under an open access license. No special permission is required to reuse all or part of the article published by MDPI, including figures and tables. For articles published under an open access Creative Common CC BY license, any part of the article may be reused without permission provided that the original article is clearly cited. For more information, please refer to https://www.mdpi.com/openaccess .

Feature papers represent the most advanced research with significant potential for high impact in the field. A Feature Paper should be a substantial original Article that involves several techniques or approaches, provides an outlook for future research directions and describes possible research applications.

Feature papers are submitted upon individual invitation or recommendation by the scientific editors and must receive positive feedback from the reviewers.

Editor’s Choice articles are based on recommendations by the scientific editors of MDPI journals from around the world. Editors select a small number of articles recently published in the journal that they believe will be particularly interesting to readers, or important in the respective research area. The aim is to provide a snapshot of some of the most exciting work published in the various research areas of the journal.

Original Submission Date Received: .

  • Active Journals
  • Find a Journal
  • Proceedings Series
  • For Authors
  • For Reviewers
  • For Editors
  • For Librarians
  • For Publishers
  • For Societies
  • For Conference Organizers
  • Open Access Policy
  • Institutional Open Access Program
  • Special Issues Guidelines
  • Editorial Process
  • Research and Publication Ethics
  • Article Processing Charges
  • Testimonials
  • Preprints.org
  • SciProfiles
  • Encyclopedia

forecasting-logo

Article Menu

big data analytics research papers 2021

  • Subscribe SciFeed
  • Recommended Articles
  • Google Scholar
  • on Google Scholar
  • Table of Contents

Find support for a specific problem in the support section of our website.

Please let us know what you think of our products and services.

Visit our dedicated information section to learn more about MDPI.

JSmol Viewer

Big data and predictive analytics for business intelligence: a bibliographic study (2000–2021).

big data analytics research papers 2021

Share and Cite

Chen, Y.; Li, C.; Wang, H. Big Data and Predictive Analytics for Business Intelligence: A Bibliographic Study (2000–2021). Forecasting 2022 , 4 , 767-786. https://doi.org/10.3390/forecast4040042

Chen Y, Li C, Wang H. Big Data and Predictive Analytics for Business Intelligence: A Bibliographic Study (2000–2021). Forecasting . 2022; 4(4):767-786. https://doi.org/10.3390/forecast4040042

Chen, Yili, Congdong Li, and Han Wang. 2022. "Big Data and Predictive Analytics for Business Intelligence: A Bibliographic Study (2000–2021)" Forecasting 4, no. 4: 767-786. https://doi.org/10.3390/forecast4040042

Article Metrics

Article access statistics, further information, mdpi initiatives, follow mdpi.

MDPI

Subscribe to receive issue release notifications and newsletters from MDPI journals

Advances, Systems and Applications

  • Open access
  • Published: 06 August 2022

Big data analytics in Cloud computing: an overview

  • Blend Berisha 1 ,
  • Endrit Mëziu 1 &
  • Isak Shabani 1  

Journal of Cloud Computing volume  11 , Article number:  24 ( 2022 ) Cite this article

37k Accesses

44 Citations

10 Altmetric

Metrics details

Big Data and Cloud Computing as two mainstream technologies, are at the center of concern in the IT field. Every day a huge amount of data is produced from different sources. This data is so big in size that traditional processing tools are unable to deal with them. Besides being big, this data moves fast and has a lot of variety. Big Data is a concept that deals with storing, processing and analyzing large amounts of data. Cloud computing on the other hand is about offering the infrastructure to enable such processes in a cost-effective and efficient manner. Many sectors, including among others businesses (small or large), healthcare, education, etc. are trying to leverage the power of Big Data. In healthcare, for example, Big Data is being used to reduce costs of treatment, predict outbreaks of pandemics, prevent diseases etc. This paper, presents an overview of Big Data Analytics as a crucial process in many fields and sectors. We start by a brief introduction to the concept of Big Data, the amount of data that is generated on a daily bases, features and characteristics of Big Data. We then delve into Big Data Analytics were we discuss issues such as analytics cycle, analytics benefits and the movement from ETL to ELT paradigm as a result of Big Data analytics in Cloud. As a case study we analyze Google’s BigQuery which is a fully-managed, serverless data warehouse that enables scalable analysis over petabytes of data. As a Platform as a Service (PaaS) supports querying using ANSI SQL. We use the tool to perform different experiments such as average read, average compute, average write, on different sizes of datasets.

Introduction

We live in the data age. We see them everywhere and this is due to the great technological developments that have taken place in recent years. The rate of digitalization has increased significantly and now we are rightly talking about” digital information societies”. If 20 or 30 years ago only 1% of the information produced was digital, now over 94% of this information is digital and it comes from various sources such as our mobile phones, servers, sensor devices on the Internet of Things, social networks, etc. [ 1 ]. The year 2002 is considered the” beginning of the digital age” where an explosion of digitally produced equipment and information was seen.

The number and amount of information collected has increased significantly due to the increase of devices that collect this information such as mobile devices, cheap and numerous sensor devices on the Internet of Things (IoT), remote sensing, software logs, cameras, microphones, RFID readers, wireless sensor networks, etc. [ 2 ]. According to statistics, the amount of data generated / day is about 44 zettabytes (44 × 10 21 bytes). Every second, 1.7 MB of data is generated per person [ 3 ]. Based on International Data Group forecasts, the global amount of data will increase exponentially from 2020 to 2025, with a move from 44 to 163 zettabytes [ 4 ]. Figure  1 shows the amount of global data generated, copied and consumed. As can be seen, in the years 2010–2015, the rate of increase from year to year has been smaller, while since 2018, this rate has increased significantly thus making the trend exponential in nature [ 3 ].

figure 1

Volume of data/information created, captured, copied, and consumed worldwide from 2010 to 2024 (estimated) [ 3 ]

To get a glimpse of the amount of data that is generated on a daily basis, let’s see a portion of data that different platforms produce. On the Internet, there is so much information at our fingertips. We add to the stockpile everytime we look for answers from our search engines. As a results Google now produces more than 500,000 searches every second (approximately 3.5 billion search per day) [ 5 ]. By the time of writing this article, this number must have changed! Social media on the other hand is a massive data producer. 

People’s ‘love affair’ with social media certainly fuels data creation. Every minute, Snapchat users share 527,760 photos, more than 120 professionals join LinkedIn, users watch 4,146,6000 Youtube videos, 456,000 are sent to Twitter and Instagram users post 46,740 photos [ 5 ]. Facebook remains the largest social media platform, with over 300 million photos uploaded every day with more than 510,000 comments posted and 293,000 statuses updated every minute.

With the increase in the number and quantity of data, there have been advantages but also challenges as systems for managing relational databases and other traditional systems have difficulties in processing and analyzing this quantity. For this reason, the term ‘big data’ arose not only to describe the amount of data but also the need for new technologies and ways of processing and analyzing this data. Cloud Computing has facilitated data storage, processing and analysis. Using Cloud we have access to almost limitless storage and computer power offered by different vendors. Cloud delivery models such as: IAAS (Infrastructure as a Service), PAAS (Platform as a Service) can help organisations across different sectors handle Big Data easier and faster. The aim of this paper is to provide an overview of how analytics of Big Data in Cloud Computing can be done. For this we use Google’s platform BigQuery which is a serverless data warehouse with built-in machine learning capabilities. It’s very robust and has plenty of features to help with the analytics of different size and type of data.

What is big data?

Many authors and organizations have tried to provide a definition of ‘Big Data’. According to [ 6 ] “Big Data refers to data volumes in the range of exabytes and beyond”. In Wikipedia [ 7 ] big data is defined as an accumulation of datasets so huge and complex that it becomes hard to process using database management tools or traditional data processing applications, while the challenges include capture, storage, search, sharing, transfer, analysis, and visualization.

Sam Madden from Massachusetts Institute of Technology (MIT) considers” Big Data” to be data that is too big, too fast, or too hard for existing tools to process [ 8 ]. By too big, it means data that is at the petabyte level and that comes from various sources. By ‘too fast’ it means data growth which is fast and should also be processed quickly. By too hard it means the difficulty that arises as a result the data not adapting to the existing processing tools [ 9 ]. In PCMag (one of the most popular journals on technological trends), Big data refers to the massive amounts of data that is collected over time that are difficult to analyze and handle using common database management tools [ 10 ]. There are many other definitions for Big Data, but we consider that these are enough to gain an impression on this concept.

Features and characteristics of big data

One question that researchers have struggled to answer is what might qualify as ‘big data’? For this reason, in 2001 industry analyst Doug Laney from Gartner introduced the 3 V model which are three features that must complement the data to be considered” big data”: volume, velocity, variety . Volume is a property or characteristic that determines the size of data, usually reported in Terabyte or Petabyte. For example, social networks like Facebook store among others photos of users. Due to the large number of users, it is estimated that Facebook stores about 250 billion photos and over 2.5 trillion posts of its users. This is an extremely large amount of data that needs to be stored and processed. Volume is the most representative feature of ‘big data’ [ 8 ]. In terms of volume, tera or peta level data is usually considered ‘big’ although this depends on the capacity of those analyzing this data and the tools available to them [ 8 ]. Figure  2 shows what each of the three V's represent.

figure 2

3 V’s of Big Data [ 6 ]

The second property or characteristic is velocity . This refers to the degree to which data is generated or the speed at which this data must be processed and analyzed [ 8 ]. For example, Facebook users upload more than 900 million photos a day, which is approximately 104 uploaded photos per second. In this way, Facebook needs to process, store and retrieve this information to its users in real time. Figure  3 shows some statistics obtained from [ 11 ] which show the speed of data generation from different sources. As can be seen, social media and the Internet of Things (IoT) are the largest data generators, with a growing trend.

figure 3

Examples of the velocity of Big Data [ 9 ]

There are two main types of data processing: batch and stream. In batch, processing happens in blocks of data that have been stored over a period of time. Usually data processed in batch are big, so they will take longer to process. Hadoop MapReduce is considered to be the best framework for processing data in batches [ 11 ]. This approach works well in situations where there is no need for real-time analytics and where it is important to process large volumes of data to get more detailed insights.

Stream processing, on the other hand, is a key to the processing and analysis of data in real time. Stream processing allows for data processing as they arrive. This data is immediately fed into analytics tools so the results are generated instantly. There are many scenarios where such an approach can be useful such as fraud detection, where anomalies that signal fraud are detected in real time. Another use case would be online retailers, where real-time processing would enable them to compile large histories of costumer interactions so that additional purchases could be recommended for the costumers in real time [ 11 ].

The third property is variety , which refers to different types of data which are generated from different sources. “Big Data” is usually classified into three major categories: structured data (transactional data, spreadsheets, relational databases etc.), semi-structured (Extensible Markup Language - XML, web server logs etc) and unstructured (social media posts, audio, images, video etc.). In the literature, as a fourth category is also mentioned ‘meta-data’ which represents data about data. This is also shown in Fig.  4 . Most of the data today belong to the category of unstructured data (80%) [ 11 ].

figure 4

Main categories of data variety in Big Data [ 9 ]

Over time, the tree features of big data have been complemented by two additional ones: veracity and value . Veracity is equivalent to quality, which means data that are clean and accurate and that have something to offer [ 12 ]. The concept is also related to the reliability of data that is extracted (e.g., costumer sentiments in social media are not highly reliable data). Value of the data is related to the social or economic value data can generate. The degree of value data can produce depends also on the knowledge of those that make use of it.

Big data analytics in cloud computing

Cloud Computing is the delivery of computing services such as servers, storage, databases, networking, software, analytics etc., over the Internet (“the cloud”) with the aim of providing flexible resources, faster innovation and economies of scale [ 13 ]. Cloud computing has revolutionized the way computing infrastructure is abstracted and used. Cloud paradigms have been extended to include anything that can be considered as a service (hence x a service). The many benefits of cloud computing such as elasticity, pay-as-you-go or pay-per-use model, low upfront investment etc., have made it a viable and desirable choice for big data storage, management and analytics [ 13 ]. Because big data is now considered vital for many organizations and fields, service providers such as Amazon, Google and Microsoft are offering their own big data systems in a cost-efficient manner. These systems offer scalability for business of all sizes. This had led to the prominence of the term Analytics as a Service (AaaS) as a faster and efficient way to integrate, transform and visualize different types of data. Data Analytics.

Big data analytics cycle

According to [ 14 ] processing big data for analytics differs from processing traditional transactional data. In traditional environments, data is first explored then a model design as well as a database structure is created. Figure  5 . depicts the flow of big data analysis. As can be seen, it starts by gathering data from multiple sources, such as multiple files, systems, sensors and the Web. This data is then stored in the so called” landing zone” which is a medium capable of handling the volume, variety and velocity of data. This is usually a distributed file system. After data is stored, different transformations occur in this data to preserve its efficiency and scalability. Afer that, they are integrated into particular analytical tasks, operational reporting, databases or raw data extracts [ 14 ].

figure 5

Flow in the processing of Big Data [ 11 ]

Moving from ETL to ELT paradigm

ETL (Extract, Transform, Load) is about taking data from a data source, applying the transformations that might be required and then load it into a data warehouse to run reports and queries against them. The downside of this approach or paradigm is that is characterized by a lot of I/O activity, a lot of string processing, variable transformation and a lot of data parsing [ 15 ].

ELT (Extract, Load, Transform) is about taking the most compute-intensive activity (transformation) and doing it not in an on-premise service which is already under pressure with regular transaction-handling but instead taking it to the cloud [ 15 ]. This means that there is no need for data staging because data warehousing solution is used for different types.

of data including those that are structured, semi-structured, unstructured and raw. This approach employs the concept of” data lakes” that are different from OLAP (Online Analytical Processing) data warehouses because they do not require the transformation of data before loading them [ 15 ]. Figure 6 illustrates the differences between the two paradigms. As seen, the main difference is where transformation process takes place.

figure 6

Differences between ETL and ELT [ 15 ]

ELT has many benefits over traditional ETL paradigm. The most crucial, as mentioned, is the fact that data of any format can be ingested as soon as it becomes available. Another one is the fact that only the data required for particular analysis can be transformed. In ETL, the entire pipeline and structure of the data in the OLAP may require modification if the previous structure does not allow for new types of analysis [ 16 ].

Some advantages of big data analytics

As mentioned, companies across various sectors in the industry are leveraging Big Data in order to promote decision making that is data-driven. Besides tech industry, the usage and popularity of Big Data has expanded to include healthcare, governance, retail, supply chain management, education etc. Some of the benefits of Big Data Analytics mentioned in [ 17 ] include:

Data accumulation from different sources including the Internet, online shopping sites, social media, databases, external third-party sources etc.

Identification of crucial points that are hidden within large datasets in order to influence business decisions.

Identification of the issues regarding systems and business processes in real time.

Facilitation of service/product delivery to meet or exceed client expecations.

Responding to customer requests, queries and grievances in real time.

Some other benefits according to [ 16 ] are related to:

Cost optimization - One of the biggest advantages of Big Data tools such as Hadoop or Spark is that they offer cost advantages to businesses regarding the storage, processing and analysis of large amounts of data. Authors mention the logistics industry as an example to highlight the cost-reduction benefits of Big Data. In this industry, the cost of product returns is 1.5 times higher than that of actual shipping costs. With Big Data Analytics, companies can minimize product return costs by predicting the likelihood of product returns. By doing so, they can then estimate which products are most likely to be returned and thus enable the companies to take suitable measures to reduce losses on returns.

Efficiency improvements - Big Data can improve operational efficiency by a margin. Big Data tools can amass large amounts of useful costumer data by interacting and gaining their feedback. This data can then be analyzed and interpreted to extract some meaningful patterns hidden within such as customer taste and preferences, buying behaviors etc. This in turn allows companies to create personalized or tailored products/services.

Innovation - Insights from Big Data can be used to tweak business strategies, develop new products/services, optimize service delivery, improve productivity etc. These can all lead to more innovation.

As seen, Big Data Analytics has been mostly leveraged by businesses, but other sectors have also benefited. For example, in healthcare many states are now utilizing the power of Big Data to predict and also prevent epidemics, cure diseases, cut down costs etc. This data has also been used to establish many efficient treatment models. With Big Data more comprehensive reports were generated and these were then converted into relevant critical insights to provide better care [ 17 ].

In education, Big Data has also been used extensively. They have enabled teachers to measure, monitor and respond in real-time to student’s understanding of the material. Professors have created tailor-made materials for students with different knowledge levels to increase their interest [ 18 ].

Case study: GOOGLE’S big query for data processing and analytics

Google Cloud Platform contains a number of services designed to analyze and process big data. Throughout this paper we have described and discussed the architecture and main components of Biguery as one of the most used big data processing tools in GCP. BigQuery is a fully-managed, serverless data warehouse that enables scalable analysis over petabytes of data. It is a Platform as a Service (PaaS) that supports querying using ANSI SQL. It also has built-in machine learning capabilities. Since its launch in 2011 it has gained a lot of popularity and many big companies have utilized it for their data analytics [ 19 ].

From a user perspective, BigQuery has an intuitive user interface which can be accessed in a number of ways depending on user needs. The simplest way to interact with this tool is to use its graphical web interface as shown in Fig.  7 . Slightly more complicated but faster approaches include using cloud console or Bigquery APIs. From Fig. 7 Bigquery web interface offers you the options to add or select existing datasets, schedule and construct queries or transfer data and display results.

figure 7

BigQuery Interface

Data processing and query construction occurs under the sql workspace section, Bigquery offers a rich sql-like syntax to compute and process large sets of data, it operates on relational datasets with well-defined structure including tables with specified columns and types. Figure  8 shows a simple query construction syntax and highlights its execution details. Data displayed under query results shows main performance components of the executed query starting from elapsed time, consumed slot time, size of data processed, average and maximum wait, write and compute times. Query defined in Fig.  8 combines three datasets which contain information regarding Covid-19 reported cases, deaths and recoveries from more than 190 countries through year 2020 till January 2021. Google BigQuery is flexible in a way that allows you to use and combine various datasets suitable for your task easily and with small delays. It contains an ever growing list of public datasets at your disposal and also offers the options to create, edit and import your own. Figure  9 shows the process of adding a table to the newly created dataset. From the Fig.  9 , we see that for table creation as a source we have used a local csv file, this file will be used to create table schema and populate it with data, aside from local upload option as a source to create the table we can use Google BigTable, Google Cloud Storage or Google Drive. The newly created table with its respective data then is ready to be used to construct queries and obtain new insights as shown in Fig. 8 .

figure 8

BigQuery execution details

figure 9

Adding table to the created dataset

One advantage of using imported data in the cloud is the option to manage its access and visibility in the cloud project and cloud members scope. Depending from the way of use, queried data can be saved directly to the local computer through the use of “save results” option from Fig. 8 which offers a variety of formats and data extensions settings to choose from but can also be explored in different configurations using “explore data” option. You can also save constructed queries for later use or schedule query execution interval for more accurate data transmutation through API endpoints. Figure 10 shows how much the average compute time will change/increase with the increase in the size of the dataset used.

figure 10

Average compute time dependence in dataset size

Experiments with different dataset sizes

Before moving to data exploration lets analyze performance results of BigQuery in simple queries with variable dataset sizes. In Table  1 we have shown the query execution details of five simple select queries done on five different datasets. The results are displayed against six different performance categories, from the data we see a correlation between size of the dataset and its average read, write and compute.

From the graph we see that the dependence between dataset size and average compute size is exponential, meaning that with the increase in data size, average compute time is exponentially increased.

Data returned from constructed queries aside from being displayed in a simple tabular form or as a JSON object can also be transferred to data studio which is an integrated tool to better display and visualize gathered information. One way of displaying queried data from Fig. 8 with data studio tool is shown in Fig.  11 . In this case a bar table chart visualization option is chosen.

figure 11

Using data studio for data visualization

Big Data is not a new term but has gained its spotlight due to the huge amounts of data that are produced daily from different sources. From our analysis we saw that big data is increasing in a fast pace, leading to benefits but also challenges. Cloud Computing is considered to be the best solution for storing, processing and analyzing Big Data. Companies like Amazon, Google and Microsoft offer their public services to facilitate the process of dealing with Big Data. From the analysis we saw that there are multiple benefits that Big Data analytics provides for many different fields and sectors such as healthcare, education and business. We also saw that because of the interaction of Big Data with Cloud Computing there is a shift in the way data is processed and analyzed. In traditional settings, ETL is used whereas in Big Data, ELT is used. We saw that the latter has clear advantages when compared to the former.

From our case study we saw that BigQuery is very good for running complex analytical queries, which means there is no point in running queries that are doing simple aggregation or filtering. BigQuery is suitable for heavy queries, those that operate using a big set of data. The bigger the dataset, the more it is likely to gain in performance. This is when compared to the traditional relational databases,as BigQuery implements different parallel schemas to speed up the execution time.

BigQuery doesn’t like joins and merging data into one table gets a better execution time. It is good for scenarios where data does not change often as it has built-in cache. BigQuery can also be used when one wants to reduce the load on the relational database as it offers different options and configurations to improve query performance. Also pay as you go service can be used where charges are made based on usage or flat rate service which offers a specific slot rate and charges in daily, monthly or yearly plan.

Availability of data and materials

The datasets used during the current study are available from the corresponding author on reasonable request. The authors declare that they have no funder.

Hillbert M, Lopez P (2011) The world’s technological capacity to store, communicate and compute information. Science III:62–65

Google Scholar  

J. Hellerstein,“ Gigaom Blog,”2019. Available: https://gigaom.com/2008/11/09/mapreduce-leads-the-way-for-parallelprogramming/ . Accessed 20 Jan 2021

Statista,“Statista,“2020. Available: https://www.statista.com/statistics/871513/worldwide-data-created/ . Accessed 21 Jan 2021

Reinsel D, Gantz J, Rydning J (2017) Data age 2025: the evolution of data to-life critical. International Data Corporation, Framingham

Forbes, “Forbes”, 2020. Available: https://www.forbes.com/sites/bernardmarr/2018/05/21/how-muchdata-do-we-create-every-day-the-mind-blowing-stats-everyone-shouldread/?sh=5936b00460ba

Kaisler S, Armour F, Espinosa J (2013) Big data: issues and challenges moving forward, Wailea, Maui, HI, s.n, pp 995–1004

Wikipedia,“ Wikipedia,” 2018. Available: https://www.en.wikipedia.org/wiki/Bigdata/ . Accessed 4 Jan 2021

D. Gewirtz,“ ZDNet,” 2018. Available: https://www.zdnet.com/article/volume-velocity-and-varietyunderstanding-the-three-vs-of-big-data/ . Accessed 1 Jan 2021

Weathington J (2012) Big Data Defined. Tech Republic.  https://www.techrepublic.com/article/big-data-defined/

PCMagazine,“ PC Magazine,” 2018. Available: http://www.pcmag.com/encyclopedia/term/62849/big-data . Accessed 9 Jan 2021

Akhtar SMF (2018) Big Data Architect’s Handbook, Packt

WhishWorks, “WhishWorks”, 2019. Available: https://www.whishworks.com/blog/data-analytics/understanding-the3-vs-of-big-data-volume-velocity-and-variety/ . Accessed 23 Jan 2021

Yadav S, Sohal A (2017) Review paper on big data analytics in Cloud computing. Int J Comp Trends Technol (IJCTT) IX. 49(3);156-160

Kimball R, Ross M (2013) The data warehouse toolkit: the definitive guide to dimensional modeling, 3rd edn. John Wiley & Sons

LaprinthX, “LaprinthX,”2018. Available: https://laptrinhx.com/better-faster-smarter-elt-vs-etl-2084402419/ . Accessed 22 Jan 2021

Xplenty, “XPlenty, ”, 2019. Available: https://www.xplenty.com/blog/etl-vs-elt/# . Accessed 20 Jan 2021

Forbes,“Forbes,”,2018. Available: https://www.forbes.com/sites/forbestechcouncil/2019/11/06/fivebenefits-of-big-data-analytics-and-how-companies-can-getstarted/?sh=7e1b901417e4 . Accessed 13 Jan 202

EDHEC, “EDHEC, ”, 2019. Available: https://master.edhec.edu/news/three-ways-educators-are-using-bigdata-analytics-improve-learning-process# . Accessed 6 Jan 2021

Google Cloud, “BigQuery, ”, 2020. Available: https://cloud.google.com/bigquery . Accessed 5 Jan 2021

Download references

Acknowledgements

The authors would like to thank the colleageous and professors from the University of Prishtina for their insightful comments and suggestions that helped in improving the quality of the paper.

The authors declare that they have no funder.

Author information

Authors and affiliations.

Faculty of Electrical and Computer Engineering, Department of Computer Engineering, University of Prishtina, 10000, Prishtina, Kosovo

Blend Berisha, Endrit Mëziu & Isak Shabani

You can also search for this author in PubMed   Google Scholar

Contributions

Blend Berisha wrote the Introduction, Features and characteristics of Big Data and Conclusions. Endrit Meziu wrote Big Data¨ Analytics in Cloud Computing and part of the case study. Isak Shabani has contributed in the methodology, resources and in supervising the work process. All authors prepared the figures and also reviewed the manuscript. The author(s) read and approved the final manuscript.

Corresponding author

Correspondence to Isak Shabani .

Ethics declarations

Ethics approval and consent to participate.

Not applicable.

Consent for publication

Competing interests.

The authors declare that they have no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

Reprints and permissions

About this article

Cite this article.

Berisha, B., Mëziu, E. & Shabani, I. Big data analytics in Cloud computing: an overview. J Cloud Comp 11 , 24 (2022). https://doi.org/10.1186/s13677-022-00301-w

Download citation

Received : 08 April 2022

Accepted : 24 July 2022

Published : 06 August 2022

DOI : https://doi.org/10.1186/s13677-022-00301-w

Share this article

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

  • Cloud computing

big data analytics research papers 2021

To read this content please select one of the options below:

Please note you do not have access to teaching notes, big data analytics and hotel guest experience: a critical analysis of the literature.

International Journal of Contemporary Hospitality Management

ISSN : 0959-6119

Article publication date: 4 April 2022

Issue publication date: 19 May 2022

Guest experience and satisfaction have been central constructs in the hospitality management literature for decades. In recent years, the use of big data as an increasing trending practice in hospitality research has been characterised as a modern approach that offers valuable insights into understanding and enhancing guest experience and satisfaction. Recognising such potential, both researchers and practitioners need to better understand big data’s application and contribution in the hospitality landscape. The purpose of this paper is to critically review and synthesise the literature to shed light on trends and extant patterns in the application of big data in hospitality, particularly in research focusing on hotel guest experience and satisfaction.

Design/methodology/approach

This research is based on a Preferred Reporting Items for Systematic Reviews and Meta-analysis literature review of academic journal articles in Google Scholar published up to the end of 2020.

By data types, user-generated content, especially online reviews and ratings, was at the centre of attention for hospitality-related big data research. By variables, the hospitality-related big data fell into two crucial factor categories: physical environment and guest-to-staff interactions.

Originality/value

This paper shows that big data research can create new insights into attributes that have been extensively researched in the hospitality field. It facilitates a thorough understanding of big data studies and provides valuable insights into future prospects for both researchers and practitioners.

  • Guest experience
  • Guest satisfaction
  • Hospitality

Zarezadeh, Z.Z. , Rastegar, R. and Xiang, Z. (2022), "Big data analytics and hotel guest experience: a critical analysis of the literature", International Journal of Contemporary Hospitality Management , Vol. 34 No. 6, pp. 2320-2336. https://doi.org/10.1108/IJCHM-10-2021-1293

Emerald Publishing Limited

Copyright © 2020, Emerald Publishing Limited

Related articles

All feedback is valuable.

Please share your general feedback

Report an issue or find answers to frequently asked questions

Contact Customer Support

Advertisement

Advertisement

Big data in cybersecurity: a survey of applications and future trends

  • Original Article
  • Published: 06 January 2021
  • Volume 7 , pages 85–114, ( 2021 )

Cite this article

big data analytics research papers 2021

  • Mohammed M. Alani   ORCID: orcid.org/0000-0002-4324-1774 1  

2409 Accesses

23 Citations

Explore all metrics

With over 4.57 billion people using the Internet in 2020, the amount of data being generated has exceeded 2.5 quintillion bytes per day. This rapid increase in the generation of data has pushed the applications of big data to new heights; one of which is cybersecurity. The paper aims to introduce a thorough survey on the use of big data analytics in building, improving, or defying cybersecurity systems. This paper surveys state-of-the-art research in different areas of applications of big data in cybersecurity. The paper categorizes applications into areas of intrusion and anomaly detection, spamming and spoofing detection, malware and ransomware detection, code security, cloud security, along with another category surveying other directions of research in big data and cybersecurity. The paper concludes with pointing to possible future directions in research on big data applications in cybersecurity.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save.

  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime

Price includes VAT (Russian Federation)

Instant access to the full article PDF.

Rent this article via DeepDyve

Institutional subscriptions

big data analytics research papers 2021

Similar content being viewed by others

big data analytics research papers 2021

Big Data and Its Role in Cybersecurity

big data analytics research papers 2021

Big Data Analytics and Cybersecurity: Emerging Trends

big data analytics research papers 2021

Cyber Security Challenges of Big Data Applications in Cloud Computing: A State of the Art

Explore related subjects.

  • Artificial Intelligence

20 ransomware statistics youre powerless to resist reading - hashed out by the ssl store. https://www.thesslstore.com/blog/ransomware-statistics/ . Accessed on 01 Aug 2020

2019 cyber security statistics trends & data: The ultimate list of cyber security stats—purplesec. https://purplesec.us/resources/cyber-security-statistics/ . Accessed on 30 Jul 2020

2020 trustwave global security report—trustwave. https://www.trustwave.com/en-us/resources/library/documents/2020-trustwave-global-security-report/ . Accessed on 01 Aug 2020

5 cybersecurity threats to be aware of in 2020—ieee computer society. https://www.computer.org/publications/tech-news/trends/5-cybersecurity-threats-to-be-aware-of-in-2020/ . Accessed on 30 Jul 2020

Apple reveals windows 10 is four times more popular than the mac. howpublished, https://www.theverge.com/2017/4/4/15176766/apple-microsoft-windows-10-vs-mac-users-figures-stats . Accessed on 3 Dec 2018

Computer science. https://arxiv.org/archive/cs . Accessed on 30 Jul 2020

Cyberthreat trends: 15 cybersecurity threats for 2020—nortonlifelock. https://us.norton.com/internetsecurity-emerging-threats-cyberthreat-trends-cybersecurity-threat-review.html . Accessed on 30 Jul 2020

Github—mozilla/openwpm: A web privacy measurement framework. https://github.com/mozilla/OpenWPM . Accessed on 23 Mar 2019

Global digital population as of april 2020. https://www.statista.com/statistics/617136/digital-population-worldwide/ . Accessed: 13 May 2020

Global\_2020\_forecast\_highlights. https://www.cisco.com/c/dam/m/en_us/solutions/service-provider/vni-forecast-highlights/pdf/Global_2020_Forecast_Highlights.pdf . Accessed on 30 Jul 2020

Half of the malware detected in 2019 was classified as zero-day threats, making it the most common malware to date - cynet. https://www.cynet.com/blog/half-of-the-malware-detected-in-2019-was-classified-as-zero-day-threats-making-it-the-most-common-malware-to-date/ . Accessed on 30 Jul 2020

How much data do we create every day? https://www.forbes.com/sites/bernardmarr/2018/05/21/how-much-data-do-we-create-every-day-the-mind-blowing-stats-everyone-should-read/ . Accessed 22 Oct 2018

The iot rundown for 2020: Stats, risks, and solutions—security today. https://securitytoday.com/Articles/2020/01/13/The-IoT-Rundown-for-2020.aspx . Accessed on 30 Jul 2020

Malware statistics youd better get your computer vaccinated. https://dataprot.net/statistics/malware-statistics/ . Accessed 29 May 2020

Microsoft vulnerabilities more than doubled in 2017. howpublished, https://www.securitynow.com/author.asp?section_id=649&doc_id=740671 . Accessed 3 Dec 2018

Top cybersecurity threats in 2020. https://onlinedegrees.sandiego.edu/top-cyber-security-threats/ . Accessed on 30 Jul 2020

Twitter hack: Us and uk teens arrested over breach of celebrity accounts—twitter—the guardian. https://www.theguardian.com/technology/2020/jul/31/twitter-hack-arrests-florida-uk-teenagers . Accessed on 01 Aug 2020

What is big data analytics, howpublished. https://searchbusinessanalytics.techtarget.com/definition/big-data-analytics . Accessed 24 Nov 2018

Ransomware cyber attacks: Which industries are being hit the hardest? https://www.bitsighttech.com/blog/ransomware-cyber-attacks . Accessed on 08 Dec 2018

Us hospital pays $55,000 to hackers after ransomware attack—zdnet. https://www.zdnet.com/article/us-hospital-pays-55000-to-ransomware-operators/ . Accessed on 08 Dec 2018

Abdlhamed M, Kifayat K, Shi Q, Hurst W (2017) Intrusion prediction systems. Information fusion for cyber-security analytics. Springer, New York, pp 155–174

Chapter   Google Scholar  

Abraham S, Nair S (2015) Predictive cyber-security analytics framework: a non-homogenous markov model for security quantification. arXiv:1501.01901

Aditham S, Ranganathan N (2018) A system architecture for the detection of insider attacks in big data systems. IEEE Trans Depend Secure Comput 15(6):974–987

Article   Google Scholar  

Alani MM (2016) What is the cloud? Elements of cloud computing security. Springer, New York, pp 1–14

Google Scholar  

AlEroud A, Karabatis G (2017) Using contextual information to identify cyber-attacks. Information fusion for cyber-security analytics. Springer, New York, pp 1–16

Aleroud A, Zhou L (2017) Phishing environments, techniques, and countermeasures: a survey. Comput Secur 68:160–196

Alguliyev, R., Imamverdiyev, Y.: Big data: big promises for information security. In: Application of Information and Communication Technologies (AICT), 2014 IEEE 8th international conference on. IEEE, pp 1–4

Alhuzali A, Gjomemo R, Eshete B, Venkatakrishnan V (2018) \(\{\) NAVEX \(\}\) : Precise and scalable exploit generation for dynamic web applications. In: 27th \(\{\) USENIX \(\}\) Security Symposium ( \(\{\) USENIX \(\}\) Security 18), pp 377–392

Alrabaee S, Shirani P, Wang L, Debbabi M (2018) Fossil: a resilient and efficient system for identifying foss functions in malware binaries. ACM Trans Priv Secur 21(2):8

Alsadhan AA, Hussain A, Alani MM (2018) Detecting ndp distributed denial of service attacks using machine learning algorithm based on flow-based representation. In: 2018 11th International Conference on Developments in eSystems Engineering (DeSE). IEEE, pp 134–140

Amini L, Christodorescu M, Cohen MA, Parthasarathy S, Rao J, Sailer R, Schales DL, Venema WZ, Verscheure O (2015) Adaptive cyber-security analytics. US Patent 9,032,521

Baikalov IA, Froelich C, McConnell T, McGloughlin JP et al (2016) Cyber security analytics architecture. US Patent 9,516,041

Balaban D (2020) 11 types of spoofing attacks every security professional should know about—2020-03-24—security magazine. https://www.securitymagazine.com/articles/91980-types-of-spoofing-attacks-every-security-professional-should-know-about . Accessed on 01 Aug 2020

Banescu S, Collberg C, Pretschner A (2017) Predicting the resilience of obfuscated code against symbolic execution attacks via machine learning. In: 26th \(\{\) USENIX \(\}\) Security Symposium ( \(\{\) USENIX \(\}\) Security 17), pp 661–678

Barradas D, Santos N, Rodrigues L (2018) Effective detection of multimedia protocol tunneling using machine learning. In: 27th \(\{\) USENIX \(\}\) Security Symposium ( \(\{\) USENIX \(\}\) Security 18), pp 169–185

Biham E, Shamir A (1991) Differential cryptanalysis of des-like cryptosystems. J Cryptol 4(1):3–72

Article   MathSciNet   MATH   Google Scholar  

Bilge L, Han Y, Dell’Amico M (2017) Riskteller: Predicting the risk of cyber incidents. In: Proceedings of the 2017 ACM SIGSAC conference on computer and communications security. ACM, pp 1299–1311

Cao P, Badger EC, Kalbarczyk ZT, Iyer RK, Withers A, Slagell AJ (2015) Towards an unified security testbed and security analytics framework. In: Proceedings of the 2015 symposium and bootcamp on the science of security. ACM

Cao Y, Yang J (2015) Towards making systems forget with machine unlearning. In: 2015 IEEE symposium on security and privacy. IEEE, pp 463–480

Chakraborty R, Vishik C, Rao HR (2013) Privacy preserving actions of older adults on social media: Exploring the behavior of opting out of information sharing. Decis Support Syst 55(4):948–956

Chiew KL, Yong KSC, Tan CL (2018) A survey of phishing attacks: Their types, vectors and technical approaches. Expert Syst Appl 106:1–20

Cinque M, Della Corte R, Pecchia A (2019) Microservices monitoring with event logs and black box execution tracing. IEEE Trans Serv Comput

Cinque M, Della Corte R, Pecchia A (2020) Contextual filtering and prioritization of computer application logs for security situational awareness. Future Gener Comput Syst 111:668–680

Curtin M, Dolske J (1998) A brute force search of des keyspace. In: 8th Usenix Symposium, January. Citeseer, pp 26–29

Cuzzocrea A, Martinelli F, Mercaldo F, Grasso GM (2018) Experimenting and assessing machine learning tools for detecting and analyzing malicious behaviors in complex environments. J Reliab Intell Environ 4(4):225–245

DATA G (2018) Malware in 2018: the danger is on the web—g data blog. https://www.gdatasoftware.com/blog/2018/09/31037-malware-figures-first-half-2018-danger-web . Accessed on 31 Mar 2019

Dias LF, Correia M (2020) Big data analytics for intrusion detection: an overview. In: Handbook of research on machine and deep learning applications for cyber security. IGI Global, pp 292–316

Du M, Li F, Zheng G, Srikumar V (2017) Deeplog: Anomaly detection and diagnosis from system logs through deep learning. In: Proceedings of the 2017 ACM SIGSAC conference on computer and communications security. ACM, New York, pp 1285–1298

Englehardt S, Narayanan A (2016) Online tracking: A 1-million-site measurement and analysis. In: Proceedings of the 2016 ACM SIGSAC conference on computer and communications security. ACM, pp 1388–1401

Fang W, Wen XZ, Zheng Y, Zhou M (2017) A survey of big data security and privacy preserving. IETE Tech Rev 34(5):544–560

Farris KA, Shah A, Cybenko G, Ganesan R, Jajodia S (2018) Vulcon: A system for vulnerability prioritization, mitigation, and management. ACM Trans Priv Secur 21(4):16:1–16:28. https://doi.org/10.1145/3196884

Feng Q, Zhou R, Xu C, Cheng Y, Testa B, Yin H (2016) Scalable graph-based bug search for firmware images. In: Proceedings of the 2016 ACM SIGSAC conference on computer and communications security. ACM, New York, pp 480–491. https://doi.org/10.1145/2976749.2978370

Funk C, Garnaeva M (2013) Kaspersky security bulletin 2013. overall statistics for 2013, vol 10. Kaspersky Lab

Gai K, Qiu M, Elnagdy SA (2016) A novel secure big data cyber incident analytics framework for cloud-based cybersecurity insurance. In: 2016 IEEE 2nd International Conference on Big Data Security on Cloud (BigDataSecurity), IEEE International Conference on High Performance and Smart Computing (HPSC), and IEEE international conference on intelligent data and security (IDS). IEEE, pp 171–176

Gandomi A, Haider M (2015) Beyond the hype: Big data concepts, methods, and analytics. Int J Inf Manag 35(2):137–144

García S, Ramírez-Gallego S, Luengo J, Benítez JM, Herrera F (2016) Big data preprocessing: methods and prospects. Big Data Anal 1(1):9

Gong NZ, Liu B (2018) Attribute inference attacks in online social networks. ACM Trans Priv Secur 21(1):3

Gou L, Zhou MX, Yang H (2014) Knowme and shareme: understanding automatically discovered personality traits from social media and user sharing preferences. In: Proceedings of the SIGCHI conference on human factors in computing systems. ACM, New York, pp 955–964

Grahn K, Westerlund M, Pulkkis G (2017) Analytics for network security: a survey and taxonomy. Information fusion for cyber-security analytics. Springer, New York, pp 175–193

Guo W, Mu D, Xu J, Su P, Wang G, Xing X (2018) Lemna: explaining deep learning based security applications. In: Proceedings of the 2018 ACM SIGSAC conference on computer and communications security, CCS ’18. ACM, New York, pp 364–379. https://doi.org/10.1145/3243734.3243792

Gutierrez CN, Kim T, Della Corte R, Avery J, Goldwasser D, Cinque M, Bagchi S (2018) Learning from the ones that got away: Detecting new forms of phishing attacks. IEEE Trans Depend Secure Comput 15(6):988–1001

Gyöngyi Z, Garcia-Molina H, Pedersen J (2004) Combating web spam with trustrank. In: Proceedings of the Thirtieth international conference on Very large data bases, vol 30. VLDB Endowment, pp 576–587

Hale B (2016) Estimating log generation for security information event and log management. Retrieved Sep 15

He P, Zhu J, He S, Li J, Lyu MR (2018) Towards automated log parsing for large-scale log data analysis. IEEE Trans Depend Secure Comput 15(6):931–944

Hong JB, Nhlabatsi A, Kim DS, Hussein A, Fetais N, Khan KM (2019) Systematic identification of threats in the cloud: a survey. Comput Netw 150:46–69

Hossain MN, Wang J, Weisse O, Sekar R, Genkin D, He B, Stoller SD, Fang G, Piessens F, Downing E, et al (2018) Dependence-preserving data compaction for scalable forensic analysis. In: 27th \(\{\) USENIX \(\}\) Security Symposium ( \(\{\) USENIX \(\}\) Security 18), pp 1723–1740

Huang DY, Aliapoulios MM, Li VG, Invernizzi L, Bursztein E, McRoberts K, Levin J, Levchenko K, Snoeren AC, McCoy D (2018) Tracking ransomware end-to-end. In: 2018 IEEE symposium on security and privacy (SP). IEEE, pp 618–631

Ikram M, Onwuzurike L, Farooqi S, Cristofaro ED, Friedman A, Jourjon G, Kaafar MA, Shafiq MZ (2017) Measuring, characterizing, and detecting facebook like farms. ACM Trans Priv Secur 20(4):13

Jansen K, Schäfer M, Moser D, Lenders V, Pöpper C, Schmitt J (2018) Crowd-gps-sec: Leveraging crowdsourcing to detect and localize gps spoofing attacks. In: 2018 IEEE symposium on security and privacy (SP). IEEE, pp 1018–1031

John NA (2013) The social logics of sharing. Commun Rev 16(3):113–131

Johnstone M, Peacock M (2020) Seven pitfalls of using data science in cybersecurity. In: Data Science in Cybersecurity and Cyberthreat Intelligence. Springer, Nwe York

Jurgens D (2013) That’s what friends are for: Inferring location in online social media platforms based on social relationships. In: Seventh International AAAI conference on weblogs and social media

Kelsey J, Schneier B, Wagner D (1996) Key-schedule cryptanalysis of idea, g-des, gost, safer, and triple-des. Annual International Cryptology Conference. Springer, New York, pp 237–251

Khan MUK, Park HS, Kyung CM (2019) Rejecting motion outliers for efficient crowd anomaly detection. IEEE Trans Inf Forensics Secur 14(2):541–556

Kim D, Kwon BJ, Kozák K, Gates C, Dumitras T (2018) The broken shield: Measuring revocation effectiveness in the windows code-signing \(\{\) PKI \(\}\) . In: 27th \(\{\) USENIX \(\}\) Security Symposium ( \(\{\) USENIX \(\}\) Security 18), pp 851–868

Kim S, Woo S, Lee H, Oh H (2017) Vuddy: A scalable approach for vulnerable code clone discovery. In: 2017 IEEE symposium on security and privacy (SP). IEEE, pp 595–614

Koli J (2018) Randroid: Android malware detection using random machine learning classifiers. In: 2018 Technologies for smart-city energy security and power (ICSESP). IEEE, pp 1–6

Kotenko I, Saenko I, Branitskiy A (2020) Machine learning and big data processing for cybersecurity data analysis. Data science in cybersecurity and cyberthreat intelligence. Springer, New York, pp 61–85

Kumar R, Goyal R (2019) On cloud security requirements, threats, vulnerabilities and countermeasures: a survey. Comput Sci Rev 33:1–48

Article   MathSciNet   Google Scholar  

Kwon BJ, Mondal J, Jang J, Bilge L, Dumitraş T (2015) The dropper effect: insights into malware distribution with downloader graph analytics. In: Proceedings of the 22nd ACM SIGSAC conference on computer and communications security. ACM, New York, pp 1118–1129

Laney D (2001) 3d data management: controlling data volume, velocity and variety. META Group Res Note 6(70):1

Li H, Xu X, Liu C, Ren T, Wu K, Cao X, Zhang W, Yu Y, Song D (2018) A machine learning approach to prevent malicious calls over telephony networks. In: 2018 IEEE symposium on security and privacy (SP). IEEE, pp 53–69

Liao X, Yuan K, Wang X, Li Z, Xing L, Beyah R (2016) Acing the ioc game: Toward automatic discovery and analysis of open-source cyber threat intelligence. In: Proceedings of the 2016 ACM SIGSAC conference on computer and communications security. ACM, New York, pp 755–766

Liao X, Yuan K, Wang X, Pei Z, Yang H, Chen J, Duan H, Du K, Alowaisheq E, Alrwais S, et al (2016) Seeking nonsense, looking for trouble: Efficient promotional-infection detection through semantic inconsistency search. In: 2016 IEEE symposium on security and privacy (SP). IEEE, pp 707–723

MacDonald N (2012) Information security is becoming a big data analytics problem. https://www.gartner.com/en/documents/1960615 . Accessed on 13 May 2020

Madi T, Jarraya Y, Alimohammadifar A, Majumdar S, Wang Y, Pourzandi M, Wang L, Debbabi M (2018) Isotop: auditing virtual networks isolation across cloud layers in openstack. ACM Trans Priv Secur 22(1):1

Mahmood T, Afzal U (2013) Security analytics: big data analytics for cybersecurity: a review of trends, techniques and tools. In: Information assurance (ncia), 2013 2nd national conference on. IEEE, pp 129–134

Majumdar S, Tabiban A, Jarraya Y, Oqaily M, Alimohammadifar A, Pourzandi M, Wang L, Debbabi M (2018) Learning probabilistic dependencies among events for proactive security auditing in clouds. J Comput Secur:1–38 ( preprint )

Maltby D (2011) Big data analytics. In: 74th Annual Meeting of the Association for Information Science and Technology (ASIST), pp 1–6

Martha V (2015) Big data processing algorithms. Big data. Springer, New York, pp 61–91

Matsui M (1993) Linear cryptanalysis method for des cipher. Workshop on the Theory and Application of of Cryptographic Techniques. Springer, New York, pp 386–397

Nadgowda S, Isci C, Bal M (2018) Déjàvu: bringing black-box security analytics to cloud. In: Proceedings of the 19th International middleware conference industry. ACM, New York, pp 17–24

Nilizadeh S, Labrèche F, Sedighian A, Zand A, Fernandez J, Kruegel C, Stringhini G, Vigna G (2017) Poised: Spotting twitter spam off the beaten paths. In: Proceedings of the 2017 ACM SIGSAC conference on computer and communications security. ACM, New York, pp 1159–1174

Oltisk J (2013) The big data security analytics era is here. Tech. rep, Enterprise Strategy Group

Pearce P, Ensafi R, Li F, Feamster N, Paxson V (2017) Augur: Internet-wide detection of connectivity disruptions. In: 2017 IEEE symposium on security and privacy (SP). IEEE, pp 427–443

Pierazzi F, Casolari S, Colajanni M, Marchetti M (2016) Exploratory security analytics for anomaly detection. Comput Secur 56:28–49

Reaves B, Vargas L, Scaife N, Tian D, Blue L, Traynor P, Butler KR (2018) Characterizing the security of the sms ecosystem with public gateways. ACM Trans Priv Secur 22(1):2

Richardson R, North MM (2017) Ransomware: evolution, mitigation and prevention. Int Manag Rev 13(1):10

Rieck K, Holz T, Willems C, Düssel P, Laskov P (2008) Learning and classification of malware behavior. International conference on detection of intrusions and malware, and vulnerability assessment. Springer, New York, pp 108–125

Rijmen V, Daemen J (2001) Advanced encryption standard. In: Proceedings of federal information processing standards publications. National Institute of Standards and Technology, pp 19–22

Rose C (2011) The security implications of ubiquitous social media

Salva S, Regainia L (2019) A catalogue associating security patterns and attack steps to design secure applications. J Comput Secur:1–26 ( Preprint )

Shen Y, Mariconti E, Vervier PA, Stringhini G (2018) Tiresias: predicting security events through deep learning. In: Proceedings of the 2018 ACM SIGSAC conference on computer and communications security. ACM, New York, pp 592–605

Shu X, Araujo F, Schales DL, Stoecklin MP, Jang J, Huang H, Rao JR (2018) Threat intelligence computing. In: Proceedings of the 2018 ACM SIGSAC conference on computer and communications security. ACM, New York, pp 1883–1898

Shu X, Yao DD, Ramakrishnan N, Jaeger T (2017) Long-span program behavior modeling and attack detection. ACM Trans Priv Secur 20(4):12

Siadati H, Memon N (2017) Detecting structurally anomalous logins within enterprise networks. In: Proceedings of the 2017 ACM SIGSAC conference on computer and communications security. ACM, New York, pp 1273–1284

Singer PW, Friedman A (2014) Cybersecurity: what everyone needs to know. Oxford University Press, Oxford

Book   Google Scholar  

Sipola T (2015) Knowledge discovery from network logs. Cyber security: analytics, technology and automation. Springer, New York, pp 195–203

Siwicki B (2016) Ransomware attackers collect ransom from kansas hospital, dont unlock all the data, then demand more money. Healthcare IT News

Standard DE et al (1977) Federal information processing standards publication 46. National Bureau of Standards, US Department of Commerce, vol 23

Suciu O, Marginean R, Kaya Y, Daume III H, Dumitras T (2018) When does machine learning \(\{\) FAIL \(\}\) ? generalized transferability for evasion and poisoning attacks. In: 27th \(\{\) USENIX \(\}\) Security Symposium ( \(\{\) USENIX \(\}\) Security 18), pp 1299–1316

Sun B, Takahashi T, Zhu L, Mori T (2020) Discovering malicious urls using machine learning techniques. Data science in cybersecurity and cyberthreat intelligence. Springer, New York, pp 33–60

Talabis M, McPherson R, Miyamoto I, Martin J (2014) Information security analytics: finding security insights, patterns, and anomalies in big data. Syngress

Tan Z, Nagar UT, He X, Nanda P, Liu RP, Wang S, Hu J (2014) Enhancing big data security with collaborative intrusion detection. IEEE Cloud Comput 1(3):27–33

Tankard C (2012) Big data security. Netw Secur 2012(7):5–8

Terzi DS, Terzi R, Sagiroglu S (2015) A survey on security and privacy issues in big data. In: 2015 10th international conference for internet technology and secured transactions (ICITST). IEEE, pp 202–207

Thirumaran J et al (2018) Applications of big data analytics-network security. Int J Res Sci Eng Technol 5(1):55–59

Tipton H (2019) Information security management handbook, vol IV. CRC Press, Boca Raton

Book   MATH   Google Scholar  

Ugarte-Pedrero X, Graziano M, Balzarotti D (2019) A close look at a daily dataset of malware samples. ACM Trans Priv Secur 22(1):6

Ullah F, Babar MA (2019) Architectural tactics for big data cybersecurity analytics systems: a review. J Syst Softw 151:81–118

Von Solms R, Van Niekerk J (2013) From information security to cyber security. Comput Secur 38:97–102

Wohlin C (2014) Guidelines for snowballing in systematic literature studies and a replication in software engineering. In: Proceedings of the 18th international conference on evaluation and assessment in software engineering, pp 1–10

Xu Z, Wu Z, Li Z, Jee K, Rhee J, Xiao X, Xu F, Wang H, Jiang G (2016) High fidelity data reduction for big data security dependency analyses. In: Proceedings of the 2016 ACM SIGSAC conference on computer and communications security. ACM, New York, pp 504–516

Yang X, Ma T, Shi Y (2007) Typical dos/ddos threats under ipv6. In: 2007 International multi-conference on computing in the global information technology (ICCGI’07). IEEE

Yao Y, Viswanath B, Cryan J, Zheng H, Zhao BY (2017) Automated crowdturfing attacks and defenses in online review systems. In: Proceedings of the 2017 ACM SIGSAC conference on computer and communications security. ACM, New York, pp 1143–1158

You I, Yim K (2010) Malware obfuscation techniques: a brief survey. In: 2010 International conference on broadband, wireless computing, communication and applications. IEEE, pp 297–300

Yuan Y, Adhatarao SS, Lin M, Yuan Y, Liu Z, Fu X (2020) Ada: Adaptive deep log anomaly detector. In: IEEE INFOCOM 2020-IEEE conference on computer communications. IEEE, pp 2449–2458

Zhang D (2018) Big data security and privacy protection. In: 8th International conference on management and computer science (ICMCS 2018). Atlantis Press

Zhang J, Zhang R, Zhang Y, Yan G (2016) The rise of social botnets: Attacks and countermeasures. IEEE Trans Depend Secure Comput

Zhao JY, Kessler EG, Yu J, Jalal K, Cooper CA, Brewer JJ, Schwaitzberg SD, Guo WA (2018) Impact of trauma hospital ransomware attack on surgical residency training. J Surg Res 232:389–397

Zhu Z, Dumitraş T (2016) Featuresmith: automatically engineering features for malware detection by mining the security literature. In: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp 767–778. ACM, New York

Zoldi S, Athwal J, Li H, Kennel M, Xue X (2015) Cyber security adaptive analytics threat monitoring system and method. US Patent 9,191,403

Zuech R, Khoshgoftaar TM, Wald R (2015) Intrusion detection and big heterogeneous data: a survey. J Big Data 2(1):3

Zuo Y, Wu Y, Min G, Huang C, Pei K (2020) An intelligent anomaly detection scheme for micro-services architectures with temporal and spatial data analysis. IEEE Trans Cogn Commun Netw

Download references

Author information

Authors and affiliations.

Senior Member of the ACM, Toronto, Canada

Mohammed M. Alani

You can also search for this author in PubMed   Google Scholar

Corresponding author

Correspondence to Mohammed M. Alani .

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A: Literature summary tables

See Tables 1 , 2 , 3 , 4 , 5 and 6 .

Rights and permissions

Reprints and permissions

About this article

Alani, M.M. Big data in cybersecurity: a survey of applications and future trends. J Reliable Intell Environ 7 , 85–114 (2021). https://doi.org/10.1007/s40860-020-00120-3

Download citation

Received : 13 May 2020

Accepted : 05 November 2020

Published : 06 January 2021

Issue Date : June 2021

DOI : https://doi.org/10.1007/s40860-020-00120-3

Share this article

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

  • Cybersecurity
  • Big data analytics
  • Security analytics
  • Find a journal
  • Publish with us
  • Track your research

Role of Big Data Analytics in Audit Quality using IS Success Model as a Basis

New citation alert added.

This alert has been successfully added and will be sent to:

You will be notified whenever a record that you have chosen has been cited.

To manage your alert preferences, click on the button below.

New Citation Alert!

Please log in to your account

Information & Contributors

Bibliometrics & citations, index terms.

Applied computing

Enterprise computing

Business-IT alignment

Recommendations

Impact of data mining, big data analytics and data visualization on audit quality.

The existence of various financial statement scandals has made users of financial statements doubt the quality of the audit and demand auditor to provide an audit with better audit quality. Currently, the level of difficulty to meet these demands is ...

The Impact of Auditor's Competency, Audit Risk, and Professional Skepticism to Audit Quality in Fraud Detection Using Remote Audit

The purpose of this study was to determine the effect of auditors' professional skepticism, audit risk, and auditor competence on audit quality in conducting remote audits in the midst of the COVID-19 pandemic. To determine this effect, the researcher ...

The role of big data analytics and decision-making in achieving project success

Big data analytics and decision-making are critical to project success. Therefore, this study aims to investigate the impact of big data analytics on project success and the moderating effect of decision-making from the resource-based ...

  • Investigated the impact of big data analytics (BDA) on project success.

Information

Published in.

cover image ACM Other conferences

Association for Computing Machinery

New York, NY, United States

Publication History

Permissions, check for updates, author tags.

  • Audit Quality
  • Big Data Analytics
  • IS Success Model
  • Research-article
  • Refereed limited

Contributors

Other metrics, bibliometrics, article metrics.

  • 0 Total Citations
  • 0 Total Downloads
  • Downloads (Last 12 months) 0
  • Downloads (Last 6 weeks) 0

View Options

Login options.

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

View options.

View or Download as a PDF file.

View online with eReader .

HTML Format

View this article in HTML Format.

Share this Publication link

Copying failed.

Share on social media

Affiliations, export citations.

  • Please download or close your previous search result export first before starting a new bulk export. Preview is not available. By clicking download, a status dialog will open to start the export process. The process may take a few minutes but once it finishes a file will be downloadable from your browser. You may continue to browse the DL while the export process is in progress. Download
  • Download citation
  • Copy citation

We are preparing your search results for download ...

We will inform you here when the file is ready.

Your file of search results citations is now ready.

Your search export query has expired. Please try again.

  • Open access
  • Published: 06 January 2022

The use of Big Data Analytics in healthcare

  • Kornelia Batko   ORCID: orcid.org/0000-0001-6561-3826 1 &
  • Andrzej Ślęzak 2  

Journal of Big Data volume  9 , Article number:  3 ( 2022 ) Cite this article

80k Accesses

154 Citations

33 Altmetric

Metrics details

The introduction of Big Data Analytics (BDA) in healthcare will allow to use new technologies both in treatment of patients and health management. The paper aims at analyzing the possibilities of using Big Data Analytics in healthcare. The research is based on a critical analysis of the literature, as well as the presentation of selected results of direct research on the use of Big Data Analytics in medical facilities. The direct research was carried out based on research questionnaire and conducted on a sample of 217 medical facilities in Poland. Literature studies have shown that the use of Big Data Analytics can bring many benefits to medical facilities, while direct research has shown that medical facilities in Poland are moving towards data-based healthcare because they use structured and unstructured data, reach for analytics in the administrative, business and clinical area. The research positively confirmed that medical facilities are working on both structural data and unstructured data. The following kinds and sources of data can be distinguished: from databases, transaction data, unstructured content of emails and documents, data from devices and sensors. However, the use of data from social media is lower as in their activity they reach for analytics, not only in the administrative and business but also in the clinical area. It clearly shows that the decisions made in medical facilities are highly data-driven. The results of the study confirm what has been analyzed in the literature that medical facilities are moving towards data-based healthcare, together with its benefits.

Introduction

The main contribution of this paper is to present an analytical overview of using structured and unstructured data (Big Data) analytics in medical facilities in Poland. Medical facilities use both structured and unstructured data in their practice. Structured data has a predetermined schema, it is extensive, freeform, and comes in variety of forms [ 27 ]. In contrast, unstructured data, referred to as Big Data (BD), does not fit into the typical data processing format. Big Data is a massive amount of data sets that cannot be stored, processed, or analyzed using traditional tools. It remains stored but not analyzed. Due to the lack of a well-defined schema, it is difficult to search and analyze such data and, therefore, it requires a specific technology and method to transform it into value [ 20 , 68 ]. Integrating data stored in both structured and unstructured formats can add significant value to an organization [ 27 ]. Organizations must approach unstructured data in a different way. Therefore, the potential is seen in Big Data Analytics (BDA). Big Data Analytics are techniques and tools used to analyze and extract information from Big Data. The results of Big Data analysis can be used to predict the future. They also help in creating trends about the past. When it comes to healthcare, it allows to analyze large datasets from thousands of patients, identifying clusters and correlation between datasets, as well as developing predictive models using data mining techniques [ 60 ].

This paper is the first study to consolidate and characterize the use of Big Data from different perspectives. The first part consists of a brief literature review of studies on Big Data (BD) and Big Data Analytics (BDA), while the second part presents results of direct research aimed at diagnosing the use of big data analyses in medical facilities in Poland.

Healthcare is a complex system with varied stakeholders: patients, doctors, hospitals, pharmaceutical companies and healthcare decision-makers. This sector is also limited by strict rules and regulations. However, worldwide one may observe a departure from the traditional doctor-patient approach. The doctor becomes a partner and the patient is involved in the therapeutic process [ 14 ]. Healthcare is no longer focused solely on the treatment of patients. The priority for decision-makers should be to promote proper health attitudes and prevent diseases that can be avoided [ 81 ]. This became visible and important especially during the Covid-19 pandemic [ 44 ].

The next challenges that healthcare will have to face is the growing number of elderly people and a decline in fertility. Fertility rates in the country are found below the reproductive minimum necessary to keep the population stable [ 10 ]. The reflection of both effects, namely the increase in age and lower fertility rates, are demographic load indicators, which is constantly growing. Forecasts show that providing healthcare in the form it is provided today will become impossible in the next 20 years [ 70 ]. It is especially visible now during the Covid-19 pandemic when healthcare faced quite a challenge related to the analysis of huge data amounts and the need to identify trends and predict the spread of the coronavirus. The pandemic showed it even more that patients should have access to information about their health condition, the possibility of digital analysis of this data and access to reliable medical support online. Health monitoring and cooperation with doctors in order to prevent diseases can actually revolutionize the healthcare system. One of the most important aspects of the change necessary in healthcare is putting the patient in the center of the system.

Technology is not enough to achieve these goals. Therefore, changes should be made not only at the technological level but also in the management and design of complete healthcare processes and what is more, they should affect the business models of service providers. The use of Big Data Analytics is becoming more and more common in enterprises [ 17 , 54 ]. However, medical enterprises still cannot keep up with the information needs of patients, clinicians, administrators and the creator’s policy. The adoption of a Big Data approach would allow the implementation of personalized and precise medicine based on personalized information, delivered in real time and tailored to individual patients.

To achieve this goal, it is necessary to implement systems that will be able to learn quickly about the data generated by people within clinical care and everyday life. This will enable data-driven decision making, receiving better personalized predictions about prognosis and responses to treatments; a deeper understanding of the complex factors and their interactions that influence health at the patient level, the health system and society, enhanced approaches to detecting safety problems with drugs and devices, as well as more effective methods of comparing prevention, diagnostic, and treatment options [ 40 ].

In the literature, there is a lot of research showing what opportunities can be offered to companies by big data analysis and what data can be analyzed. However, there are few studies showing how data analysis in the area of healthcare is performed, what data is used by medical facilities and what analyses and in which areas they carry out. This paper aims to fill this gap by presenting the results of research carried out in medical facilities in Poland. The goal is to analyze the possibilities of using Big Data Analytics in healthcare, especially in Polish conditions. In particular, the paper is aimed at determining what data is processed by medical facilities in Poland, what analyses they perform and in what areas, and how they assess their analytical maturity. In order to achieve this goal, a critical analysis of the literature was performed, and the direct research was based on a research questionnaire conducted on a sample of 217 medical facilities in Poland. It was hypothesized that medical facilities in Poland are working on both structured and unstructured data and moving towards data-based healthcare and its benefits. Examining the maturity of healthcare facilities in the use of Big Data and Big Data Analytics is crucial in determining the potential future benefits that the healthcare sector can gain from Big Data Analytics. There is also a pressing need to predicate whether, in the coming years, healthcare will be able to cope with the threats and challenges it faces.

This paper is divided into eight parts. The first is the introduction which provides background and the general problem statement of this research. In the second part, this paper discusses considerations on use of Big Data and Big Data Analytics in Healthcare, and then, in the third part, it moves on to challenges and potential benefits of using Big Data Analytics in healthcare. The next part involves the explanation of the proposed method. The result of direct research and discussion are presented in the fifth part, while the following part of the paper is the conclusion. The seventh part of the paper presents practical implications. The final section of the paper provides limitations and directions for future research.

Considerations on use Big Data and Big Data Analytics in the healthcare

In recent years one can observe a constantly increasing demand for solutions offering effective analytical tools. This trend is also noticeable in the analysis of large volumes of data (Big Data, BD). Organizations are looking for ways to use the power of Big Data to improve their decision making, competitive advantage or business performance [ 7 , 54 ]. Big Data is considered to offer potential solutions to public and private organizations, however, still not much is known about the outcome of the practical use of Big Data in different types of organizations [ 24 ].

As already mentioned, in recent years, healthcare management worldwide has been changed from a disease-centered model to a patient-centered model, even in value-based healthcare delivery model [ 68 ]. In order to meet the requirements of this model and provide effective patient-centered care, it is necessary to manage and analyze healthcare Big Data.

The issue often raised when it comes to the use of data in healthcare is the appropriate use of Big Data. Healthcare has always generated huge amounts of data and nowadays, the introduction of electronic medical records, as well as the huge amount of data sent by various types of sensors or generated by patients in social media causes data streams to constantly grow. Also, the medical industry generates significant amounts of data, including clinical records, medical images, genomic data and health behaviors. Proper use of the data will allow healthcare organizations to support clinical decision-making, disease surveillance, and public health management. The challenge posed by clinical data processing involves not only the quantity of data but also the difficulty in processing it.

In the literature one can find many different definitions of Big Data. This concept has evolved in recent years, however, it is still not clearly understood. Nevertheless, despite the range and differences in definitions, Big Data can be treated as a: large amount of digital data, large data sets, tool, technology or phenomenon (cultural or technological.

Big Data can be considered as massive and continually generated digital datasets that are produced via interactions with online technologies [ 53 ]. Big Data can be defined as datasets that are of such large sizes that they pose challenges in traditional storage and analysis techniques [ 28 ]. A similar opinion about Big Data was presented by Ohlhorst who sees Big Data as extremely large data sets, possible neither to manage nor to analyze with traditional data processing tools [ 57 ]. In his opinion, the bigger the data set, the more difficult it is to gain any value from it.

In turn, Knapp perceived Big Data as tools, processes and procedures that allow an organization to create, manipulate and manage very large data sets and storage facilities [ 38 ]. From this point of view, Big Data is identified as a tool to gather information from different databases and processes, allowing users to manage large amounts of data.

Similar perception of the term ‘Big Data’ is shown by Carter. According to him, Big Data technologies refer to a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data by enabling high velocity capture, discovery and/or analysis [ 13 ].

Jordan combines these two approaches by identifying Big Data as a complex system, as it needs data bases for data to be stored in, programs and tools to be managed, as well as expertise and personnel able to retrieve useful information and visualization to be understood [ 37 ].

Following the definition of Laney for Big Data, it can be state that: it is large amount of data generated in very fast motion and it contains a lot of content [ 43 ]. Such data comes from unstructured sources, such as stream of clicks on the web, social networks (Twitter, blogs, Facebook), video recordings from the shops, recording of calls in a call center, real time information from various kinds of sensors, RFID, GPS devices, mobile phones and other devices that identify and monitor something [ 8 ]. Big Data is a powerful digital data silo, raw, collected with all sorts of sources, unstructured and difficult, or even impossible, to analyze using conventional techniques used so far to relational databases.

While describing Big Data, it cannot be overlooked that the term refers more to a phenomenon than to specific technology. Therefore, instead of defining this phenomenon, trying to describe them, more authors are describing Big Data by giving them characteristics included a collection of V’s related to its nature [ 2 , 3 , 23 , 25 , 58 ]:

Volume (refers to the amount of data and is one of the biggest challenges in Big Data Analytics),

Velocity (speed with which new data is generated, the challenge is to be able to manage data effectively and in real time),

Variety (heterogeneity of data, many different types of healthcare data, the challenge is to derive insights by looking at all available heterogenous data in a holistic manner),

Variability (inconsistency of data, the challenge is to correct the interpretation of data that can vary significantly depending on the context),

Veracity (how trustworthy the data is, quality of the data),

Visualization (ability to interpret data and resulting insights, challenging for Big Data due to its other features as described above).

Value (the goal of Big Data Analytics is to discover the hidden knowledge from huge amounts of data).

Big Data is defined as an information asset with high volume, velocity, and variety, which requires specific technology and method for its transformation into value [ 21 , 77 ]. Big Data is also a collection of information about high-volume, high volatility or high diversity, requiring new forms of processing in order to support decision-making, discovering new phenomena and process optimization [ 5 , 7 ]. Big Data is too large for traditional data-processing systems and software tools to capture, store, manage and analyze, therefore it requires new technologies [ 28 , 50 , 61 ] to manage (capture, aggregate, process) its volume, velocity and variety [ 9 ].

Undoubtedly, Big Data differs from the data sources used so far by organizations. Therefore, organizations must approach this type of unstructured data in a different way. First of all, organizations must start to see data as flows and not stocks—this entails the need to implement the so-called streaming analytics [ 48 ]. The mentioned features make it necessary to use new IT tools that allow the fullest use of new data [ 58 ]. The Big Data idea, inseparable from the huge increase in data available to various organizations or individuals, creates opportunities for access to valuable analyses, conclusions and enables making more accurate decisions [ 6 , 11 , 59 ].

The Big Data concept is constantly evolving and currently it does not focus on huge amounts of data, but rather on the process of creating value from this data [ 52 ]. Big Data is collected from various sources that have different data properties and are processed by different organizational units, resulting in creation of a Big Data chain [ 36 ]. The aim of the organizations is to manage, process and analyze Big Data. In the healthcare sector, Big Data streams consist of various types of data, namely [ 8 , 51 ]:

clinical data, i.e. data obtained from electronic medical records, data from hospital information systems, image centers, laboratories, pharmacies and other organizations providing health services, patient generated health data, physician’s free-text notes, genomic data, physiological monitoring data [ 4 ],

biometric data provided from various types of devices that monitor weight, pressure, glucose level, etc.,

financial data, constituting a full record of economic operations reflecting the conducted activity,

data from scientific research activities, i.e. results of research, including drug research, design of medical devices and new methods of treatment,

data provided by patients, including description of preferences, level of satisfaction, information from systems for self-monitoring of their activity: exercises, sleep, meals consumed, etc.

data from social media.

These data are provided not only by patients but also by organizations and institutions, as well as by various types of monitoring devices, sensors or instruments [ 16 ]. Data that has been generated so far in the healthcare sector is stored in both paper and digital form. Thus, the essence and the specificity of the process of Big Data analyses means that organizations need to face new technological and organizational challenges [ 67 ]. The healthcare sector has always generated huge amounts of data and this is connected, among others, with the need to store medical records of patients. However, the problem with Big Data in healthcare is not limited to an overwhelming volume but also an unprecedented diversity in terms of types, data formats and speed with which it should be analyzed in order to provide the necessary information on an ongoing basis [ 3 ]. It is also difficult to apply traditional tools and methods for management of unstructured data [ 67 ]. Due to the diversity and quantity of data sources that are growing all the time, advanced analytical tools and technologies, as well as Big Data analysis methods which can meet and exceed the possibilities of managing healthcare data, are needed [ 3 , 68 ].

Therefore, the potential is seen in Big Data analyses, especially in the aspect of improving the quality of medical care, saving lives or reducing costs [ 30 ]. Extracting from this tangle of given association rules, patterns and trends will allow health service providers and other stakeholders in the healthcare sector to offer more accurate and more insightful diagnoses of patients, personalized treatment, monitoring of the patients, preventive medicine, support of medical research and health population, as well as better quality of medical services and patient care while, at the same time, the ability to reduce costs (Fig.  1 ).

figure 1

(Source: Own elaboration)

Healthcare Big Data Analytics applications

The main challenge with Big Data is how to handle such a large amount of information and use it to make data-driven decisions in plenty of areas [ 64 ]. In the context of healthcare data, another major challenge is to adjust big data storage, analysis, presentation of analysis results and inference basing on them in a clinical setting. Data analytics systems implemented in healthcare are designed to describe, integrate and present complex data in an appropriate way so that it can be understood better (Fig.  2 ). This would improve the efficiency of acquiring, storing, analyzing and visualizing big data from healthcare [ 71 ].

figure 2

Process of Big Data Analytics

The result of data processing with the use of Big Data Analytics is appropriate data storytelling which may contribute to making decisions with both lower risk and data support. This, in turn, can benefit healthcare stakeholders. To take advantage of the potential massive amounts of data in healthcare and to ensure that the right intervention to the right patient is properly timed, personalized, and potentially beneficial to all components of the healthcare system such as the payer, patient, and management, analytics of large datasets must connect communities involved in data analytics and healthcare informatics [ 49 ]. Big Data Analytics can provide insight into clinical data and thus facilitate informed decision-making about the diagnosis and treatment of patients, prevention of diseases or others. Big Data Analytics can also improve the efficiency of healthcare organizations by realizing the data potential [ 3 , 62 ].

Big Data Analytics in medicine and healthcare refers to the integration and analysis of a large amount of complex heterogeneous data, such as various omics (genomics, epigenomics, transcriptomics, proteomics, metabolomics, interactomics, pharmacogenetics, deasomics), biomedical data, talemedicine data (sensors, medical equipment data) and electronic health records data [ 46 , 65 ].

When analyzing the phenomenon of Big Data in the healthcare sector, it should be noted that it can be considered from the point of view of three areas: epidemiological, clinical and business.

From a clinical point of view, the Big Data analysis aims to improve the health and condition of patients, enable long-term predictions about their health status and implementation of appropriate therapeutic procedures. Ultimately, the use of data analysis in medicine is to allow the adaptation of therapy to a specific patient, that is personalized medicine (precision, personalized medicine).

From an epidemiological point of view, it is desirable to obtain an accurate prognosis of morbidity in order to implement preventive programs in advance.

In the business context, Big Data analysis may enable offering personalized packages of commercial services or determining the probability of individual disease and infection occurrence. It is worth noting that Big Data means not only the collection and processing of data but, most of all, the inference and visualization of data necessary to obtain specific business benefits.

In order to introduce new management methods and new solutions in terms of effectiveness and transparency, it becomes necessary to make data more accessible, digital, searchable, as well as analyzed and visualized.

Erickson and Rothberg state that the information and data do not reveal their full value until insights are drawn from them. Data becomes useful when it enhances decision making and decision making is enhanced only when analytical techniques are used and an element of human interaction is applied [ 22 ].

Thus, healthcare has experienced much progress in usage and analysis of data. A large-scale digitalization and transparency in this sector is a key statement of almost all countries governments policies. For centuries, the treatment of patients was based on the judgment of doctors who made treatment decisions. In recent years, however, Evidence-Based Medicine has become more and more important as a result of it being related to the systematic analysis of clinical data and decision-making treatment based on the best available information [ 42 ]. In the healthcare sector, Big Data Analytics is expected to improve the quality of life and reduce operational costs [ 72 , 82 ]. Big Data Analytics enables organizations to improve and increase their understanding of the information contained in data. It also helps identify data that provides insightful insights for current as well as future decisions [ 28 ].

Big Data Analytics refers to technologies that are grounded mostly in data mining: text mining, web mining, process mining, audio and video analytics, statistical analysis, network analytics, social media analytics and web analytics [ 16 , 25 , 31 ]. Different data mining techniques can be applied on heterogeneous healthcare data sets, such as: anomaly detection, clustering, classification, association rules as well as summarization and visualization of those Big Data sets [ 65 ]. Modern data analytics techniques explore and leverage unique data characteristics even from high-speed data streams and sensor data [ 15 , 16 , 31 , 55 ]. Big Data can be used, for example, for better diagnosis in the context of comprehensive patient data, disease prevention and telemedicine (in particular when using real-time alerts for immediate care), monitoring patients at home, preventing unnecessary hospital visits, integrating medical imaging for a wider diagnosis, creating predictive analytics, reducing fraud and improving data security, better strategic planning and increasing patients’ involvement in their own health.

Big Data Analytics in healthcare can be divided into [ 33 , 73 , 74 ]:

descriptive analytics in healthcare is used to understand past and current healthcare decisions, converting data into useful information for understanding and analyzing healthcare decisions, outcomes and quality, as well as making informed decisions [ 33 ]. It can be used to create reports (i.e. about patients’ hospitalizations, physicians’ performance, utilization management), visualization, customized reports, drill down tables, or running queries on the basis of historical data.

predictive analytics operates on past performance in an effort to predict the future by examining historical or summarized health data, detecting patterns of relationships in these data, and then extrapolating these relationships to forecast. It can be used to i.e. predict the response of different patient groups to different drugs (dosages) or reactions (clinical trials), anticipate risk and find relationships in health data and detect hidden patterns [ 62 ]. In this way, it is possible to predict the epidemic spread, anticipate service contracts and plan healthcare resources. Predictive analytics is used in proper diagnosis and for appropriate treatments to be given to patients suffering from certain diseases [ 39 ].

prescriptive analytics—occurs when health problems involve too many choices or alternatives. It uses health and medical knowledge in addition to data or information. Prescriptive analytics is used in many areas of healthcare, including drug prescriptions and treatment alternatives. Personalized medicine and evidence-based medicine are both supported by prescriptive analytics.

discovery analytics—utilizes knowledge about knowledge to discover new “inventions” like drugs (drug discovery), previously unknown diseases and medical conditions, alternative treatments, etc.

Although the models and tools used in descriptive, predictive, prescriptive, and discovery analytics are different, many applications involve all four of them [ 62 ]. Big Data Analytics in healthcare can help enable personalized medicine by identifying optimal patient-specific treatments. This can influence the improvement of life standards, reduce waste of healthcare resources and save costs of healthcare [ 56 , 63 , 71 ]. The introduction of large data analysis gives new analytical possibilities in terms of scope, flexibility and visualization. Techniques such as data mining (computational pattern discovery process in large data sets) facilitate inductive reasoning and analysis of exploratory data, enabling scientists to identify data patterns that are independent of specific hypotheses. As a result, predictive analysis and real-time analysis becomes possible, making it easier for medical staff to start early treatments and reduce potential morbidity and mortality. In addition, document analysis, statistical modeling, discovering patterns and topics in document collections and data in the EHR, as well as an inductive approach can help identify and discover relationships between health phenomena.

Advanced analytical techniques can be used for a large amount of existing (but not yet analytical) data on patient health and related medical data to achieve a better understanding of the information and results obtained, as well as to design optimal clinical pathways [ 62 ]. Big Data Analytics in healthcare integrates analysis of several scientific areas such as bioinformatics, medical imaging, sensor informatics, medical informatics and health informatics [ 65 ]. Big Data Analytics in healthcare allows to analyze large datasets from thousands of patients, identifying clusters and correlation between datasets, as well as developing predictive models using data mining techniques [ 65 ]. Discussing all the techniques used for Big Data Analytics goes beyond the scope of a single article [ 25 ].

The success of Big Data analysis and its accuracy depend heavily on the tools and techniques used to analyze the ability to provide reliable, up-to-date and meaningful information to various stakeholders [ 12 ]. It is believed that the implementation of big data analytics by healthcare organizations could bring many benefits in the upcoming years, including lowering health care costs, better diagnosis and prediction of diseases and their spread, improving patient care and developing protocols to prevent re-hospitalization, optimizing staff, optimizing equipment, forecasting the need for hospital beds, operating rooms, treatments, and improving the drug supply chain [ 71 ].

Challenges and potential benefits of using Big Data Analytics in healthcare

Modern analytics gives possibilities not only to have insight in historical data, but also to have information necessary to generate insight into what may happen in the future. Even when it comes to prediction of evidence-based actions. The emphasis on reform has prompted payers and suppliers to pursue data analysis to reduce risk, detect fraud, improve efficiency and save lives. Everyone—payers, providers, even patients—are focusing on doing more with fewer resources. Thus, some areas in which enhanced data and analytics can yield the greatest results include various healthcare stakeholders (Table 1 ).

Healthcare organizations see the opportunity to grow through investments in Big Data Analytics. In recent years, by collecting medical data of patients, converting them into Big Data and applying appropriate algorithms, reliable information has been generated that helps patients, physicians and stakeholders in the health sector to identify values and opportunities [ 31 ]. It is worth noting that there are many changes and challenges in the structure of the healthcare sector. Digitization and effective use of Big Data in healthcare can bring benefits to every stakeholder in this sector. A single doctor would benefit the same as the entire healthcare system. Potential opportunities to achieve benefits and effects from Big Data in healthcare can be divided into four groups [ 8 ]:

Improving the quality of healthcare services:

assessment of diagnoses made by doctors and the manner of treatment of diseases indicated by them based on the decision support system working on Big Data collections,

detection of more effective, from a medical point of view, and more cost-effective ways to diagnose and treat patients,

analysis of large volumes of data to reach practical information useful for identifying needs, introducing new health services, preventing and overcoming crises,

prediction of the incidence of diseases,

detecting trends that lead to an improvement in health and lifestyle of the society,

analysis of the human genome for the introduction of personalized treatment.

Supporting the work of medical personnel

doctors’ comparison of current medical cases to cases from the past for better diagnosis and treatment adjustment,

detection of diseases at earlier stages when they can be more easily and quickly cured,

detecting epidemiological risks and improving control of pathogenic spots and reaction rates,

identification of patients who are predicted to have the highest risk of specific, life-threatening diseases by collating data on the history of the most common diseases, in healing people with reports entering insurance companies,

health management of each patient individually (personalized medicine) and health management of the whole society,

capturing and analyzing large amounts of data from hospitals and homes in real time, life monitoring devices to monitor safety and predict adverse events,

analysis of patient profiles to identify people for whom prevention should be applied, lifestyle change or preventive care approach,

the ability to predict the occurrence of specific diseases or worsening of patients’ results,

predicting disease progression and its determinants, estimating the risk of complications,

detecting drug interactions and their side effects.

Supporting scientific and research activity

supporting work on new drugs and clinical trials thanks to the possibility of analyzing “all data” instead of selecting a test sample,

the ability to identify patients with specific, biological features that will take part in specialized clinical trials,

selecting a group of patients for which the tested drug is likely to have the desired effect and no side effects,

using modeling and predictive analysis to design better drugs and devices.

Business and management

reduction of costs and counteracting abuse and counseling practices,

faster and more effective identification of incorrect or unauthorized financial operations in order to prevent abuse and eliminate errors,

increase in profitability by detecting patients generating high costs or identifying doctors whose work, procedures and treatment methods cost the most and offering them solutions that reduce the amount of money spent,

identification of unnecessary medical activities and procedures, e.g. duplicate tests.

According to research conducted by Wang, Kung and Byrd, Big Data Analytics benefits can be classified into five categories: IT infrastructure benefits (reducing system redundancy, avoiding unnecessary IT costs, transferring data quickly among healthcare IT systems, better use of healthcare systems, processing standardization among various healthcare IT systems, reducing IT maintenance costs regarding data storage), operational benefits (improving the quality and accuracy of clinical decisions, processing a large number of health records in seconds, reducing the time of patient travel, immediate access to clinical data to analyze, shortening the time of diagnostic test, reductions in surgery-related hospitalizations, exploring inconceivable new research avenues), organizational benefits (detecting interoperability problems much more quickly than traditional manual methods, improving cross-functional communication and collaboration among administrative staffs, researchers, clinicians and IT staffs, enabling data sharing with other institutions and adding new services, content sources and research partners), managerial benefits (gaining quick insights about changing healthcare trends in the market, providing members of the board and heads of department with sound decision-support information on the daily clinical setting, optimizing business growth-related decisions) and strategic benefits (providing a big picture view of treatment delivery for meeting future need, creating high competitive healthcare services) [ 73 ].

The above specification does not constitute a full list of potential areas of use of Big Data Analysis in healthcare because the possibilities of using analysis are practically unlimited. In addition, advanced analytical tools allow to analyze data from all possible sources and conduct cross-analyses to provide better data insights [ 26 ]. For example, a cross-analysis can refer to a combination of patient characteristics, as well as costs and care results that can help identify the best, in medical terms, and the most cost-effective treatment or treatments and this may allow a better adjustment of the service provider’s offer [ 62 ].

In turn, the analysis of patient profiles (e.g. segmentation and predictive modeling) allows identification of people who should be subject to prophylaxis, prevention or should change their lifestyle [ 8 ]. Shortened list of benefits for Big Data Analytics in healthcare is presented in paper [ 3 ] and consists of: better performance, day-to-day guides, detection of diseases in early stages, making predictive analytics, cost effectiveness, Evidence Based Medicine and effectiveness in patient treatment.

Summarizing, healthcare big data represents a huge potential for the transformation of healthcare: improvement of patients’ results, prediction of outbreaks of epidemics, valuable insights, avoidance of preventable diseases, reduction of the cost of healthcare delivery and improvement of the quality of life in general [ 1 ]. Big Data also generates many challenges such as difficulties in data capture, data storage, data analysis and data visualization [ 15 ]. The main challenges are connected with the issues of: data structure (Big Data should be user-friendly, transparent, and menu-driven but it is fragmented, dispersed, rarely standardized and difficult to aggregate and analyze), security (data security, privacy and sensitivity of healthcare data, there are significant concerns related to confidentiality), data standardization (data is stored in formats that are not compatible with all applications and technologies), storage and transfers (especially costs associated with securing, storing, and transferring unstructured data), managerial skills, such as data governance, lack of appropriate analytical skills and problems with Real-Time Analytics (health care is to be able to utilize Big Data in real time) [ 4 , 34 , 41 ].

The research is based on a critical analysis of the literature, as well as the presentation of selected results of direct research on the use of Big Data Analytics in medical facilities in Poland.

Presented research results are part of a larger questionnaire form on Big Data Analytics. The direct research was based on an interview questionnaire which contained 100 questions with 5-point Likert scale (1—strongly disagree, 2—I rather disagree, 3—I do not agree, nor disagree, 4—I rather agree, 5—I definitely agree) and 4 metrics questions. The study was conducted in December 2018 on a sample of 217 medical facilities (110 private, 107 public). The research was conducted by a specialized market research agency: Center for Research and Expertise of the University of Economics in Katowice.

When it comes to direct research, the selected entities included entities financed from public sources—the National Health Fund (23.5%), and entities operating commercially (11.5%). In the surveyed group of entities, more than a half (64.9%) are hybrid financed, both from public and commercial sources. The diversity of the research sample also applies to the size of the entities, defined by the number of employees. Taking into account proportions of the surveyed entities, it should be noted that in the sector structure, medium-sized (10–50 employees—34% of the sample) and large (51–250 employees—27%) entities dominate. The research was of all-Poland nature, and the entities included in the research sample come from all of the voivodships. The largest group were entities from Łódzkie (32%), Śląskie (18%) and Mazowieckie (18%) voivodships, as these voivodships have the largest number of medical institutions. Other regions of the country were represented by single units. The selection of the research sample was random—layered. As part of medical facilities database, groups of private and public medical facilities have been identified and the ones to which the questionnaire was targeted were drawn from each of these groups. The analyses were performed using the GNU PSPP 0.10.2 software.

The aim of the study was to determine whether medical facilities in Poland use Big Data Analytics and if so, in which areas. Characteristics of the research sample is presented in Table 2 .

The research is non-exhaustive due to the incomplete and uneven regional distribution of the samples, overrepresented in three voivodeships (Łódzkie, Mazowieckie and Śląskie). The size of the research sample (217 entities) allows the authors of the paper to formulate specific conclusions on the use of Big Data in the process of its management.

For the purpose of this paper, the following research hypotheses were formulated: (1) medical facilities in Poland are working on both structured and unstructured data (2) medical facilities in Poland are moving towards data-based healthcare and its benefits.

The paper poses the following research questions and statements that coincide with the selected questions from the research questionnaire:

From what sources do medical facilities obtain data? What types of data are used by the particular organization, whether structured or unstructured, and to what extent?

From what sources do medical facilities obtain data?

In which area organizations are using data and analytical systems (clinical or business)?

Is data analytics performed based on historical data or are predictive analyses also performed?

Determining whether administrative and medical staff receive complete, accurate and reliable data in a timely manner?

Determining whether real-time analyses are performed to support the particular organization’s activities.

Results and discussion

On the basis of the literature analysis and research study, a set of questions and statements related to the researched area was formulated. The results from the surveys show that medical facilities use a variety of data sources in their operations. These sources are both structured and unstructured data (Table 3 ).

According to the data provided by the respondents, considering the first statement made in the questionnaire, almost half of the medical institutions (47.58%) agreed that they rather collect and use structured data (e.g. databases and data warehouses, reports to external entities) and 10.57% entirely agree with this statement. As much as 23.35% of representatives of medical institutions stated “I agree or disagree”. Other medical facilities do not collect and use structured data (7.93%) and 6.17% strongly disagree with the first statement. Also, the median calculated based on the obtained results (median: 4), proves that medical facilities in Poland collect and use structured data (Table 4 ).

In turn, 28.19% of the medical institutions agreed that they rather collect and use unstructured data and as much as 9.25% entirely agree with this statement. The number of representatives of medical institutions that stated “I agree or disagree” was 27.31%. Other medical facilities do not collect and use structured data (17.18%) and 13.66% strongly disagree with the first statement. In the case of unstructured data the median is 3, which means that the collection and use of this type of data by medical facilities in Poland is lower.

In the further part of the analysis, it was checked whether the size of the medical facility and form of ownership have an impact on whether it analyzes unstructured data (Tables 4 and 5 ). In order to find this out, correlation coefficients were calculated.

Based on the calculations, it can be concluded that there is a small statistically monotonic correlation between the size of the medical facility and its collection and use of structured data (p < 0.001; τ = 0.16). This means that the use of structured data is slightly increasing in larger medical facilities. The size of the medical facility is more important according to use of unstructured data (p < 0.001; τ = 0.23) (Table 4 .).

To determine whether the form of medical facility ownership affects data collection, the Mann–Whitney U test was used. The calculations show that the form of ownership does not affect what data the organization collects and uses (Table 5 ).

Detailed information on the sources of from which medical facilities collect and use data is presented in the Table 6 .

The questionnaire results show that medical facilities are especially using information published in databases, reports to external units and transaction data, but they also use unstructured data from e-mails, medical devices, sensors, phone calls, audio and video data (Table 6 ). Data from social media, RFID and geolocation data are used to a small extent. Similar findings are concluded in the literature studies.

From the analysis of the answers given by the respondents, more than half of the medical facilities have integrated hospital system (HIS) implemented. As much as 43.61% use integrated hospital system and 16.30% use it extensively (Table 7 ). 19.38% of exanimated medical facilities do not use it at all. Moreover, most of the examined medical facilities (34.80% use it, 32.16% use extensively) conduct medical documentation in an electronic form, which gives an opportunity to use data analytics. Only 4.85% of medical facilities don’t use it at all.

Other problems that needed to be investigated were: whether medical facilities in Poland use data analytics? If so, in what form and in what areas? (Table 8 ). The analysis of answers given by the respondents about the potential of data analytics in medical facilities shows that a similar number of medical facilities use data analytics in administration and business (31.72% agreed with the statement no. 5 and 12.33% strongly agreed) as in the clinical area (33.04% agreed with the statement no. 6 and 12.33% strongly agreed). When considering decision-making issues, 35.24% agree with the statement "the organization uses data and analytical systems to support business decisions” and 8.37% of respondents strongly agree. Almost 40.09% agree with the statement that “the organization uses data and analytical systems to support clinical decisions (in the field of diagnostics and therapy)” and 15.42% of respondents strongly agree. Exanimated medical facilities use in their activity analytics based both on historical data (33.48% agree with statement 7 and 12.78% strongly agree) and predictive analytics (33.04% agrees with the statement number 8 and 15.86% strongly agree). Detailed results are presented in Table 8 .

Medical facilities focus on development in the field of data processing, as they confirm that they conduct analytical planning processes systematically and analyze new opportunities for strategic use of analytics in business and clinical activities (38.33% rather agree and 10.57% strongly agree with this statement). The situation is different with real-time data analysis, here, the situation is not so optimistic. Only 28.19% rather agree and 14.10% strongly agree with the statement that real-time analyses are performed to support an organization’s activities.

When considering whether a facility’s performance in the clinical area depends on the form of ownership, it can be concluded that taking the average and the Mann–Whitney U test depends. A higher degree of use of analyses in the clinical area can be observed in public institutions.

Whether a medical facility performs a descriptive or predictive analysis do not depend on the form of ownership (p > 0.05). It can be concluded that when analyzing the mean and median, they are higher in public facilities, than in private ones. What is more, the Mann–Whitney U test shows that these variables are dependent from each other (p < 0.05) (Table 9 ).

When considering whether a facility’s performance in the clinical area depends on its size, it can be concluded that taking the Kendall’s Tau (τ) it depends (p < 0.001; τ = 0.22), and the correlation is weak but statistically important. This means that the use of data and analytical systems to support clinical decisions (in the field of diagnostics and therapy) increases with the increase of size of the medical facility. A similar relationship, but even less powerful, can be found in the use of descriptive and predictive analyses (Table 10 ).

Considering the results of research in the area of analytical maturity of medical facilities, 8.81% of medical facilities stated that they are at the first level of maturity, i.e. an organization has developed analytical skills and does not perform analyses. As much as 13.66% of medical facilities confirmed that they have poor analytical skills, while 38.33% of the medical facility has located itself at level 3, meaning that “there is a lot to do in analytics”. On the other hand, 28.19% believe that analytical capabilities are well developed and 6.61% stated that analytics are at the highest level and the analytical capabilities are very well developed. Detailed data is presented in Table 11 . Average amounts to 3.11 and Median to 3.

The results of the research have enabled the formulation of following conclusions. Medical facilities in Poland are working on both structured and unstructured data. This data comes from databases, transactions, unstructured content of emails and documents, devices and sensors. However, the use of data from social media is smaller. In their activity, they reach for analytics in the administrative and business, as well as in the clinical area. Also, the decisions made are largely data-driven.

In summary, analysis of the literature that the benefits that medical facilities can get using Big Data Analytics in their activities relate primarily to patients, physicians and medical facilities. It can be confirmed that: patients will be better informed, will receive treatments that will work for them, will have prescribed medications that work for them and not be given unnecessary medications [ 78 ]. Physician roles will likely change to more of a consultant than decision maker. They will advise, warn, and help individual patients and have more time to form positive and lasting relationships with their patients in order to help people. Medical facilities will see changes as well, for example in fewer unnecessary hospitalizations, resulting initially in less revenue, but after the market adjusts, also the accomplishment [ 78 ]. The use of Big Data Analytics can literally revolutionize the way healthcare is practiced for better health and disease reduction.

The analysis of the latest data reveals that data analytics increase the accuracy of diagnoses. Physicians can use predictive algorithms to help them make more accurate diagnoses [ 45 ]. Moreover, it could be helpful in preventive medicine and public health because with early intervention, many diseases can be prevented or ameliorated [ 29 ]. Predictive analytics also allows to identify risk factors for a given patient, and with this knowledge patients will be able to change their lives what, in turn, may contribute to the fact that population disease patterns may dramatically change, resulting in savings in medical costs. Moreover, personalized medicine is the best solution for an individual patient seeking treatment. It can help doctors decide the exact treatments for those individuals. Better diagnoses and more targeted treatments will naturally lead to increases in good outcomes and fewer resources used, including doctors’ time.

The quantitative analysis of the research carried out and presented in this article made it possible to determine whether medical facilities in Poland use Big Data Analytics and if so, in which areas. Thanks to the results obtained it was possible to formulate the following conclusions. Medical facilities are working on both structured and unstructured data, which comes from databases, transactions, unstructured content of emails and documents, devices and sensors. According to analytics, they reach for analytics in the administrative and business, as well as in the clinical area. It clearly showed that the decisions made are largely data-driven. The results of the study confirm what has been analyzed in the literature. Medical facilities are moving towards data-based healthcare and its benefits.

In conclusion, Big Data Analytics has the potential for positive impact and global implications in healthcare. Future research on the use of Big Data in medical facilities will concern the definition of strategies adopted by medical facilities to promote and implement such solutions, as well as the benefits they gain from the use of Big Data analysis and how the perspectives in this area are seen.

Practical implications

This work sought to narrow the gap that exists in analyzing the possibility of using Big Data Analytics in healthcare. Showing how medical facilities in Poland are doing in this respect is an element that is part of global research carried out in this area, including [ 29 , 32 , 60 ].

Limitations and future directions

The research described in this article does not fully exhaust the questions related to the use of Big Data Analytics in Polish healthcare facilities. Only some of the dimensions characterizing the use of data by medical facilities in Poland have been examined. In order to get the full picture, it would be necessary to examine the results of using structured and unstructured data analytics in healthcare. Future research may examine the benefits that medical institutions achieve as a result of the analysis of structured and unstructured data in the clinical and management areas and what limitations they encounter in these areas. For this purpose, it is planned to conduct in-depth interviews with chosen medical facilities in Poland. These facilities could give additional data for empirical analyses based more on their suggestions. Further research should also include medical institutions from beyond the borders of Poland, enabling international comparative analyses.

Future research in the healthcare field has virtually endless possibilities. These regard the use of Big Data Analytics to diagnose specific conditions [ 47 , 66 , 69 , 76 ], propose an approach that can be used in other healthcare applications and create mechanisms to identify “patients like me” [ 75 , 80 ]. Big Data Analytics could also be used for studies related to the spread of pandemics, the efficacy of covid treatment [ 18 , 79 ], or psychology and psychiatry studies, e.g. emotion recognition [ 35 ].

Availability of data and materials

The datasets for this study are available on request to the corresponding author.

Abouelmehdi K, Beni-Hessane A, Khaloufi H. Big healthcare data: preserving security and privacy. J Big Data. 2018. https://doi.org/10.1186/s40537-017-0110-7 .

Article   Google Scholar  

Agrawal A, Choudhary A. Health services data: big data analytics for deriving predictive healthcare insights. Health Serv Eval. 2019. https://doi.org/10.1007/978-1-4899-7673-4_2-1 .

Al Mayahi S, Al-Badi A, Tarhini A. Exploring the potential benefits of big data analytics in providing smart healthcare. In: Miraz MH, Excell P, Ware A, Ali M, Soomro S, editors. Emerging technologies in computing—first international conference, iCETiC 2018, proceedings (Lecture Notes of the Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering, LNICST). Cham: Springer; 2018. p. 247–58. https://doi.org/10.1007/978-3-319-95450-9_21 .

Bainbridge M. Big data challenges for clinical and precision medicine. In: Househ M, Kushniruk A, Borycki E, editors. Big data, big challenges: a healthcare perspective: background, issues, solutions and research directions. Cham: Springer; 2019. p. 17–31.

Google Scholar  

Bartuś K, Batko K, Lorek P. Business intelligence systems: barriers during implementation. In: Jabłoński M, editor. Strategic performance management new concept and contemporary trends. New York: Nova Science Publishers; 2017. p. 299–327. ISBN: 978-1-53612-681-5.

Bartuś K, Batko K, Lorek P. Diagnoza wykorzystania big data w organizacjach-wybrane wyniki badań. Informatyka Ekonomiczna. 2017;3(45):9–20.

Bartuś K, Batko K, Lorek P. Wykorzystanie rozwiązań business intelligence, competitive intelligence i big data w przedsiębiorstwach województwa śląskiego. Przegląd Organizacji. 2018;2:33–9.

Batko K. Możliwości wykorzystania Big Data w ochronie zdrowia. Roczniki Kolegium Analiz Ekonomicznych. 2016;42:267–82.

Bi Z, Cochran D. Big data analytics with applications. J Manag Anal. 2014;1(4):249–65. https://doi.org/10.1080/23270012.2014.992985 .

Boerma T, Requejo J, Victora CG, Amouzou A, Asha G, Agyepong I, Borghi J. Countdown to 2030: tracking progress towards universal coverage for reproductive, maternal, newborn, and child health. Lancet. 2018;391(10129):1538–48.

Bollier D, Firestone CM. The promise and peril of big data. Washington, D.C: Aspen Institute, Communications and Society Program; 2010. p. 1–66.

Bose R. Competitive intelligence process and tools for intelligence analysis. Ind Manag Data Syst. 2008;108(4):510–28.

Carter P. Big data analytics: future architectures, skills and roadmaps for the CIO: in white paper, IDC sponsored by SAS. 2011. p. 1–16.

Castro EM, Van Regenmortel T, Vanhaecht K, Sermeus W, Van Hecke A. Patient empowerment, patient participation and patient-centeredness in hospital care: a concept analysis based on a literature review. Patient Educ Couns. 2016;99(12):1923–39.

Chen H, Chiang RH, Storey VC. Business intelligence and analytics: from big data to big impact. MIS Q. 2012;36(4):1165–88.

Chen CP, Zhang CY. Data-intensive applications, challenges, techniques and technologies: a survey on big data. Inf Sci. 2014;275:314–47.

Chomiak-Orsa I, Mrozek B. Główne perspektywy wykorzystania big data w mediach społecznościowych. Informatyka Ekonomiczna. 2017;3(45):44–54.

Corsi A, de Souza FF, Pagani RN, et al. Big data analytics as a tool for fighting pandemics: a systematic review of literature. J Ambient Intell Hum Comput. 2021;12:9163–80. https://doi.org/10.1007/s12652-020-02617-4 .

Davenport TH, Harris JG. Competing on analytics, the new science of winning. Boston: Harvard Business School Publishing Corporation; 2007.

Davenport TH. Big data at work: dispelling the myths, uncovering the opportunities. Boston: Harvard Business School Publishing; 2014.

De Cnudde S, Martens D. Loyal to your city? A data mining analysis of a public service loyalty program. Decis Support Syst. 2015;73:74–84.

Erickson S, Rothberg H. Data, information, and intelligence. In: Rodriguez E, editor. The analytics process. Boca Raton: Auerbach Publications; 2017. p. 111–26.

Fang H, Zhang Z, Wang CJ, Daneshmand M, Wang C, Wang H. A survey of big data research. IEEE Netw. 2015;29(5):6–9.

Fredriksson C. Organizational knowledge creation with big data. A case study of the concept and practical use of big data in a local government context. 2016. https://www.abo.fi/fakultet/media/22103/fredriksson.pdf .

Gandomi A, Haider M. Beyond the hype: big data concepts, methods, and analytics. Int J Inf Manag. 2015;35(2):137–44.

Groves P, Kayyali B, Knott D, Van Kuiken S. The ‘big data’ revolution in healthcare. Accelerating value and innovation. 2015. http://www.pharmatalents.es/assets/files/Big_Data_Revolution.pdf (Reading: 10.04.2019).

Gupta V, Rathmore N. Deriving business intelligence from unstructured data. Int J Inf Comput Technol. 2013;3(9):971–6.

Gupta V, Singh VK, Ghose U, Mukhija P. A quantitative and text-based characterization of big data research. J Intell Fuzzy Syst. 2019;36:4659–75.

Hampel HOBS, O’Bryant SE, Castrillo JI, Ritchie C, Rojkova K, Broich K, Escott-Price V. PRECISION MEDICINE-the golden gate for detection, treatment and prevention of Alzheimer’s disease. J Prev Alzheimer’s Dis. 2016;3(4):243.

Harerimana GB, Jang J, Kim W, Park HK. Health big data analytics: a technology survey. IEEE Access. 2018;6:65661–78. https://doi.org/10.1109/ACCESS.2018.2878254 .

Hu H, Wen Y, Chua TS, Li X. Toward scalable systems for big data analytics: a technology tutorial. IEEE Access. 2014;2:652–87.

Hussain S, Hussain M, Afzal M, Hussain J, Bang J, Seung H, Lee S. Semantic preservation of standardized healthcare documents in big data. Int J Med Inform. 2019;129:133–45. https://doi.org/10.1016/j.ijmedinf.2019.05.024 .

Islam MS, Hasan MM, Wang X, Germack H. A systematic review on healthcare analytics: application and theoretical perspective of data mining. In: Healthcare. Basel: Multidisciplinary Digital Publishing Institute; 2018. p. 54.

Ismail A, Shehab A, El-Henawy IM. Healthcare analysis in smart big data analytics: reviews, challenges and recommendations. In: Security in smart cities: models, applications, and challenges. Cham: Springer; 2019. p. 27–45.

Jain N, Gupta V, Shubham S, et al. Understanding cartoon emotion using integrated deep neural network on large dataset. Neural Comput Appl. 2021. https://doi.org/10.1007/s00521-021-06003-9 .

Janssen M, van der Voort H, Wahyudi A. Factors influencing big data decision-making quality. J Bus Res. 2017;70:338–45.

Jordan SR. Beneficence and the expert bureaucracy. Public Integr. 2014;16(4):375–94. https://doi.org/10.2753/PIN1099-9922160404 .

Knapp MM. Big data. J Electron Resourc Med Libr. 2013;10(4):215–22.

Koti MS, Alamma BH. Predictive analytics techniques using big data for healthcare databases. In: Smart intelligent computing and applications. New York: Springer; 2019. p. 679–86.

Krumholz HM. Big data and new knowledge in medicine: the thinking, training, and tools needed for a learning health system. Health Aff. 2014;33(7):1163–70.

Kruse CS, Goswamy R, Raval YJ, Marawi S. Challenges and opportunities of big data in healthcare: a systematic review. JMIR Med Inform. 2016;4(4):e38.

Kyoungyoung J, Gang HK. Potentiality of big data in the medical sector: focus on how to reshape the healthcare system. Healthc Inform Res. 2013;19(2):79–85.

Laney D. Application delivery strategies 2011. http://blogs.gartner.com/doug-laney/files/2012/01/ad949-3D-Data-Management-Controlling-Data-Volume-Velocity-and-Variety.pdf .

Lee IK, Wang CC, Lin MC, Kung CT, Lan KC, Lee CT. Effective strategies to prevent coronavirus disease-2019 (COVID-19) outbreak in hospital. J Hosp Infect. 2020;105(1):102.

Lerner I, Veil R, Nguyen DP, Luu VP, Jantzen R. Revolution in health care: how will data science impact doctor-patient relationships? Front Public Health. 2018;6:99.

Lytras MD, Papadopoulou P, editors. Applying big data analytics in bioinformatics and medicine. IGI Global: Hershey; 2017.

Ma K, et al. Big data in multiple sclerosis: development of a web-based longitudinal study viewer in an imaging informatics-based eFolder system for complex data analysis and management. In: Proceedings volume 9418, medical imaging 2015: PACS and imaging informatics: next generation and innovations. 2015. p. 941809. https://doi.org/10.1117/12.2082650 .

Mach-Król M. Analiza i strategia big data w organizacjach. In: Studia i Materiały Polskiego Stowarzyszenia Zarządzania Wiedzą. 2015;74:43–55.

Madsen LB. Data-driven healthcare: how analytics and BI are transforming the industry. Hoboken: Wiley; 2014.

Manyika J, Chui M, Brown B, Bughin J, Dobbs R, Roxburgh C, Hung BA. Big data: the next frontier for innovation, competition, and productivity. Washington: McKinsey Global Institute; 2011.

Marconi K, Dobra M, Thompson C. The use of big data in healthcare. In: Liebowitz J, editor. Big data and business analytics. Boca Raton: CRC Press; 2012. p. 229–48.

Mehta N, Pandit A. Concurrence of big data analytics and healthcare: a systematic review. Int J Med Inform. 2018;114:57–65.

Michel M, Lupton D. Toward a manifesto for the ‘public understanding of big data.’ Public Underst Sci. 2016;25(1):104–16. https://doi.org/10.1177/0963662515609005 .

Mikalef P, Krogstie J. Big data analytics as an enabler of process innovation capabilities: a configurational approach. In: International conference on business process management. Cham: Springer; 2018. p. 426–41.

Mohammadi M, Al-Fuqaha A, Sorour S, Guizani M. Deep learning for IoT big data and streaming analytics: a survey. IEEE Commun Surv Tutor. 2018;20(4):2923–60.

Nambiar R, Bhardwaj R, Sethi A, Vargheese R. A look at challenges and opportunities of big data analytics in healthcare. In: 2013 IEEE international conference on big data; 2013. p. 17–22.

Ohlhorst F. Big data analytics: turning big data into big money, vol. 65. Hoboken: Wiley; 2012.

Olszak C, Mach-Król M. A conceptual framework for assessing an organization’s readiness to adopt big data. Sustainability. 2018;10(10):3734.

Olszak CM. Toward better understanding and use of business intelligence in organizations. Inf Syst Manag. 2016;33(2):105–23.

Palanisamy V, Thirunavukarasu R. Implications of big data analytics in developing healthcare frameworks—a review. J King Saud Univ Comput Inf Sci. 2017;31(4):415–25.

Provost F, Fawcett T. Data science and its relationship to big data and data-driven decisionmaking. Big Data. 2013;1(1):51–9.

Raghupathi W, Raghupathi V. An overview of health analytics. J Health Med Inform. 2013;4:132. https://doi.org/10.4172/2157-7420.1000132 .

Raghupathi W, Raghupathi V. Big data analytics in healthcare: promise and potential. Health Inf Sci Syst. 2014;2(1):3.

Ratia M, Myllärniemi J. Beyond IC 4.0: the future potential of BI-tool utilization in the private healthcare, conference: proceedings IFKAD, 2018 at: Delft, The Netherlands.

Ristevski B, Chen M. Big data analytics in medicine and healthcare. J Integr Bioinform. 2018. https://doi.org/10.1515/jib-2017-0030 .

Rumsfeld JS, Joynt KE, Maddox TM. Big data analytics to improve cardiovascular care: promise and challenges. Nat Rev Cardiol. 2016;13(6):350–9. https://doi.org/10.1038/nrcardio.2016.42 .

Schmarzo B. Big data: understanding how data powers big business. Indianapolis: Wiley; 2013.

Senthilkumar SA, Rai BK, Meshram AA, Gunasekaran A, Chandrakumarmangalam S. Big data in healthcare management: a review of literature. Am J Theor Appl Bus. 2018;4:57–69.

Shubham S, Jain N, Gupta V, et al. Identify glomeruli in human kidney tissue images using a deep learning approach. Soft Comput. 2021. https://doi.org/10.1007/s00500-021-06143-z .

Thuemmler C. The case for health 4.0. In: Thuemmler C, Bai C, editors. Health 4.0: how virtualization and big data are revolutionizing healthcare. New York: Springer; 2017.

Tsai CW, Lai CF, Chao HC, et al. Big data analytics: a survey. J Big Data. 2015;2:21. https://doi.org/10.1186/s40537-015-0030-3 .

Wamba SF, Gunasekaran A, Akter S, Ji-fan RS, Dubey R, Childe SJ. Big data analytics and firm performance: effects of dynamic capabilities. J Bus Res. 2017;70:356–65.

Wang Y, Byrd TA. Business analytics-enabled decision-making effectiveness through knowledge absorptive capacity in health care. J Knowl Manag. 2017;21(3):517–39.

Wang Y, Kung L, Wang W, Yu C, Cegielski CG. An integrated big data analytics-enabled transformation model: application to healthcare. Inf Manag. 2018;55(1):64–79.

Wicks P, et al. Scaling PatientsLikeMe via a “generalized platform” for members with chronic illness: web-based survey study of benefits arising. J Med Internet Res. 2018;20(5):e175.

Willems SM, et al. The potential use of big data in oncology. Oral Oncol. 2019;98:8–12. https://doi.org/10.1016/j.oraloncology.2019.09.003 .

Williams N, Ferdinand NP, Croft R. Project management maturity in the age of big data. Int J Manag Proj Bus. 2014;7(2):311–7.

Winters-Miner LA. Seven ways predictive analytics can improve healthcare. Medical predictive analytics have the potential to revolutionize healthcare around the world. 2014. https://www.elsevier.com/connect/seven-ways-predictive-analytics-can-improve-healthcare (Reading: 15.04.2019).

Wu J, et al. Application of big data technology for COVID-19 prevention and control in China: lessons and recommendations. J Med Internet Res. 2020;22(10): e21980.

Yan L, Peng J, Tan Y. Network dynamics: how can we find patients like us? Inf Syst Res. 2015;26(3):496–512.

Yang JJ, Li J, Mulder J, Wang Y, Chen S, Wu H, Pan H. Emerging information technologies for enhanced healthcare. Comput Ind. 2015;69:3–11.

Zhang Q, Yang LT, Chen Z, Li P. A survey on deep learning for big data. Inf Fusion. 2018;42:146–57.

Download references

Acknowledgements

We would like to thank those who have touched our science paths.

This research was fully funded as statutory activity—subsidy of Ministry of Science and Higher Education granted for Technical University of Czestochowa on maintaining research potential in 2018. Research Number: BS/PB–622/3020/2014/P. Publication fee for the paper was financed by the University of Economics in Katowice.

Author information

Authors and affiliations.

Department of Business Informatics, University of Economics in Katowice, Katowice, Poland

Kornelia Batko

Department of Biomedical Processes and Systems, Institute of Health and Nutrition Sciences, Częstochowa University of Technology, Częstochowa, Poland

Andrzej Ślęzak

You can also search for this author in PubMed   Google Scholar

Contributions

KB proposed the concept of research and its design. The manuscript was prepared by KB with the consultation of AŚ. AŚ reviewed the manuscript for getting its fine shape. KB prepared the manuscript in the contexts such as definition of intellectual content, literature search, data acquisition, data analysis, and so on. AŚ obtained research funding. Both authors read and approved the final manuscript.

Corresponding author

Correspondence to Kornelia Batko .

Ethics declarations

Ethics approval and consent to participate.

Not applicable.

Consent for publication

Competing interests.

The author declares no conflict of interest.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

Reprints and permissions

About this article

Cite this article.

Batko, K., Ślęzak, A. The use of Big Data Analytics in healthcare. J Big Data 9 , 3 (2022). https://doi.org/10.1186/s40537-021-00553-4

Download citation

Received : 28 August 2021

Accepted : 19 December 2021

Published : 06 January 2022

DOI : https://doi.org/10.1186/s40537-021-00553-4

Share this article

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

  • Big Data Analytics
  • Data-driven healthcare

big data analytics research papers 2021

loader image

|| Learn Easy for Engineering Students

Big Data Analytics 21CS71

Big Data Analytics 21CS71

Course Code: 21CS71

Credits: 03

CIE Marks: 50

SEE Marks: 50

Total Marks: 100

Exam Hours: 03

Teaching Hours/Weeks: [L:T:P:S] 3:0:0:0

Introduction to Big Data Analytics : Big Data, Scalability and Parallel Processing, Designing Data Architecture, Data Sources, Quality, Pre-Processing and Storing, Data Storage and Analysis, Big Data Analytics Applications and Case Studies.

Introduction to Hadoop (T1) : Introduction, Hadoop and its Ecosystem, Hadoop Distributed File System, MapReduce Framework and Programming Model, Hadoop Yarn, Hadoop Ecosystem Tools. Hadoop Distributed File System Basics (T2) : HDFS Design Features, Components, HDFS User Commands. Essential Hadoop Tools (T2) : Using Apache Pig, Hive, Sqoop, Flume, Oozie, HBase.

NoSQL Big Data Management, MongoDB and Cassandra : Introduction, NoSQL Data Store, NoSQL Data Architecture Patterns, NoSQL to Manage Big Data, Shared-Nothing Architecture for Big Data Tasks, MongoDB, Databases, Cassandra Databases.

MapReduce, Hive and Pig : Introduction, MapReduce Map Tasks, Reduce Tasks and MapReduce Execution, Composing MapReduce for Calculations and Algorithms, Hive, HiveQL, Pig.

Machine Learning Algorithms for Big Data Analytics : Introduction, Estimating the relationships, Outliers, Variances, Probability Distributions, and Correlations, Regression analysis, Finding Similar Items, Similarity of Sets and Collaborative Filtering, Frequent Itemsets and Association Rule Mining. Text, Web Content, Link, and Social Network Analytics : Introduction, Text mining, Web Mining, Web Content and Web Usage Analytics, Page Rank, Structure of Web and analyzing a Web Graph, Social Network as Graphs and Social Network Analytics:

2018 SCHEME QUESTION PAPER

Model set 1 paper, previous year paper, previous year paper solution, leave a reply cancel reply.

Your email address will not be published. Required fields are marked *

Save my name, email, and website in this browser for the next time I comment.

IMAGES

  1. (PDF) Big Data Analytics

    big data analytics research papers 2021

  2. (PDF) Review Paper on Big Data Analytics

    big data analytics research papers 2021

  3. (PDF) Big Data Analytics Research in India -A Perspective

    big data analytics research papers 2021

  4. (PDF) Cloud Based Big Data Analytics: A Survey of Current Research and

    big data analytics research papers 2021

  5. (PDF) Call for Special Issue Papers: Big Data Analytics and Intelligent

    big data analytics research papers 2021

  6. (PDF) An Overview of Big Data Analytics in Banking: Approaches

    big data analytics research papers 2021

VIDEO

  1. Elasticsearch in 5 minutes

  2. Big Data Analytics 2022 Question Paper

  3. #2

  4. Data analytics research proposal

  5. Journey to Data Analyst in 2024: A Step-by-Step Guide #dataanalytics

  6. Applications of Big Data Analytics

COMMENTS

  1. A new theoretical understanding of big data analytics capabilities in

    Of the 70 papers satisfying our selection criteria, publication year and type (journal or conference paper) reveal an increasing trend in big data analytics over the last 6 years (Table 6). Additionally, journals produced more BDA papers than Conference proceedings (Fig. 2 ), which may be affected during 2020-2021 because of COVID, and fewer ...

  2. Big data analytics capabilities: Patchwork or progress? A systematic

    This paper presents a systematic literature review of the research field on big data analytics capabilities (BDACs). With the emergence of big data and digital transformation, a growing number of researchers have highlighted the need for organizations to develop BDACs. ... Secondly, drawing on a literature review paper by Annarelli et al. (2021 ...

  3. Big Data Research

    A Security Management Framework for Big Data in Smart Healthcare. Parsa Sarosh, Shabir A. Parah, G. Mohiuddin Bhat, Khan Muhammad. Article 100225.

  4. Big Data: Big Data Analysis, Issues and Challenges and Technologies

    This content was downloaded from IP address 67.227.110.191 on 19/01/2021 at 17:22 ... the emergence of big data. In this paper, an efficient predictive analytics system for high dimensional big ...

  5. Factor Influencing the Adoption of Big Data Analytics: A Systematic

    Big data analytics offer SEMs with a way to analyze data produced from different sources to determine the answers to questions concerning cost reduction, time reduction, new product development, optimized offerings, and informed decision-making (Al-Dmour et al., 2023; Kushwaha et al., 2021; Ranjan & Foropon, 2021; Shahbaz et al., 2021).

  6. Articles

    In recent years, mobile applications have proliferated across domains such as E-banking, Augmented Reality, E-Transportation, and E-Healthcare. These applications are often built using microservices, an archit... Abdul Rasheed Mahesar, Xiaoping Li and Dileep Kumar Sajnani. Journal of Big Data 2024 11:123. Research Published on: 4 September 2024.

  7. Home page

    Journal of Big Data: Home page

  8. DECAS: a modern data-driven decision theory for big data and analytics

    DECAS: a modern data-driven decision theory for big ...

  9. Big Data and Predictive Analytics for Business Intelligence: A ...

    Big data technology and predictive analytics exhibit advanced potential for business intelligence (BI), especially for decision-making. This study aimed to explore current research studies, historic developing trends, and the future direction. A bibliographic study based on CiteSpace is implemented in this paper, 681 non-duplicate publications are retrieved from databases of Web of Science ...

  10. A comprehensive and systematic literature review on the big data

    The Internet of Things (IoT) is a communication paradigm and a collection of heterogeneous interconnected devices. It produces large-scale distributed, and diverse data called big data. Big Data Management (BDM) in IoT is used for knowledge discovery and intelligent decision-making and is one of the most significant research challenges today. There are several mechanisms and technologies for ...

  11. Big data analytics and machine learning: A retrospective overview and

    Therefore, the paper considers this gap in bibliometric studies and extends the bibliometric survey of big data analytics to capturing its culmination into ML. The study is approached with the following three research questions to address the research gaps: (1) what is the focus of the current research on big data analytics and ML?

  12. A decade of research into the application of big data and analytics in

    The need for data-driven decision-making primarily motivates interest in analysing Big Data in higher education. Although there has been considerable research on the value of Big Data in higher education, its application to address critical issues within the sector is still limited. This systematic review, conducted in December 2021 and encompassing 75 papers, analysed the applications of Big ...

  13. Big data applications on the Internet of Things: A systematic

    This paper systematically studies the latest research methods on big data in IoT approaches published between 2016 and August 2021. A methodical taxonomy is shown for big data in IoT-related fields consistent with the content of existing articles chosen with the SLR process in this research like healthcare, smart city, algorithms, industry, and ...

  14. Big data analysis for decision-making processes: challenges and

    This thinking is confirmed by the fact that most organizations continue to invest in big data analytics. Consequently, research in the last decade has focused on big data in the health-care sector and the literature ... where c stands for the number of obtained citations for each paper, x indicates the year 2021 and y is the year of publication ...

  15. Big data analytics in Cloud computing: an overview

    Big data analytics in Cloud computing: an overview

  16. 15 years of Big Data: a systematic literature review

    Big Data is still gaining attention as a fundamental building block of the Artificial Intelligence and Machine Learning world. Therefore, a lot of effort has been pushed into Big Data research in the last 15 years. The objective of this Systematic Literature Review is to summarize the current state of the art of the previous 15 years of research about Big Data by providing answers to a set of ...

  17. Research themes in big data analytics for policymaking: Insights from a

    The study offers a view of the foundations of big data and data analytics in the public policy literature, enabling scholars to have a more substantial and holistic viewpoint. We focused on two research questions: RQ1. What are the thematic communities of big data and data analytics literature concerning public policy-making? RQ2.

  18. Big Data, Big Data Analytics, and Policymaking During a ...

    To describe the promise and potential of big data analytics in healthcare. The paper describes the nascent field of big data analytics in healthcare, discusses the benefits, outlines an ...

  19. Big Data Applications the Banking Sector: A Bibliometric Analysis

    Furthermore, Al-Dmour et al. (2021) ... Profit is another theme in which big data research papers have been filtered and shown in the content analysis table. Competition and increased profit levels demand unique, accurate, and massive processed data for business changes. ... highlight in their research that big data analytics is widely ...

  20. Big data analytics and hotel guest experience: a critical analysis of

    The purpose of this paper is to critically review and synthesise the literature to shed light on trends and extant patterns in the application of big data in hospitality, particularly in research focusing on hotel guest experience and satisfaction.,This research is based on a Preferred Reporting Items for Systematic Reviews and Meta-analysis ...

  21. Time series big data: a survey on data stream frameworks, analysis and

    Big data has a substantial role nowadays, and its importance has significantly increased over the last decade. Big data's biggest advantages are providing knowledge, supporting the decision-making process, and improving the use of resources, services, and infrastructures. The potential of big data increases when we apply it in real-time by providing real-time analysis, predictions, and ...

  22. PDF Big data in cybersecurity: a survey of applications and ...

    Big data analytics, as identified in [6], is a complex pro-cess in which large and varied datasets are examined to uncover information including unknown correlations, hid-den patterns, market trends and customer preferences that can help organizations make informed decisions. Big data analytics often involves predictive models, sta-

  23. Big Data and Big Data Analytics: Concepts, Types and Technologies

    The term "Big Data" refers to the evolution and use of. technologies that provide the right user at the right time with. the right information from a mass of data that has been. growing ...

  24. Role of Big Data Analytics in Audit Quality using IS Success Model as a

    Impact of Data Mining, Big Data Analytics and Data Visualization on Audit Quality ICCMB '23: Proceedings of the 2023 6th International Conference on Computers in Management and Business The existence of various financial statement scandals has made users of financial statements doubt the quality of the audit and demand auditor to provide an ...

  25. The use of Big Data Analytics in healthcare

    The introduction of Big Data Analytics (BDA) in healthcare will allow to use new technologies both in treatment of patients and health management. The paper aims at analyzing the possibilities of using Big Data Analytics in healthcare. The research is based on a critical analysis of the literature, as well as the presentation of selected results of direct research on the use of Big Data ...

  26. vtucode » Big Data Analytics 21CS71

    Machine Learning Algorithms for Big Data Analytics: Introduction, Estimating the relationships, Outliers, Variances, Probability Distributions, and Correlations, Regression analysis, Finding Similar Items, Similarity of Sets and Collaborative Filtering, Frequent Itemsets and Association Rule Mining. Text, Web Content, Link, and Social Network Analytics: Introduction, Text mining, Web Mining ...