Yearbook of Medical Informatics. 2014 Aug 15;9(1):36–41. doi: 10.15265/IY-2014-0012

Challenges and Potential Solutions for Big Data Implementations in Developing Countries

D R Luna 1, JC Mayan 1, MJ García 1, AA Almerares 1, M Househ 2
PMCID: PMC4287095  PMID: 25123719

Summary

Background

The volume of data, the velocity with which they are generated, and their variety and lack of structure hinder their use. This creates the need to change the way information is captured, stored, processed, and analyzed, leading to the paradigm shift called Big Data.

Objectives

To describe the challenges and possible solutions for developing countries when implementing Big Data projects in the health sector.

Methods

A non-systematic review of the literature was performed in PubMed and Google Scholar. The following keywords were used: “big data”, “developing countries”, “data mining”, “health information systems”, and “computing methodologies”. A thematic review of selected articles was performed.

Results

There are challenges when implementing any Big Data program including exponential growth of data, special infrastructure needs, need for a trained workforce, need to agree on interoperability standards, privacy and security issues, and the need to include people, processes, and policies to ensure their adoption. Developing countries have particular characteristics that hinder further development of these projects.

Conclusions

The advent of Big Data promises great opportunities for the healthcare field. In this article, we attempt to describe the challenges developing countries would face and enumerate the options to be used to achieve successful implementations of Big Data programs.

Keywords: Big data, developing countries, computing methodologies, data mining, health information systems

Introduction

“It’s the data, stupid”. Jim Gray

Over the last decade, there has been a data explosion, caused by the exponential increase of data generation through the progressive digitization of virtually every aspect of everyday life. As a result, more data have been created in the past 2 years than in the entire history of mankind [1].

Some aspects of this new reality hinder data use, e.g. their volume, the speed with which they are generated and their diversity. This situation creates the need to change the way data are captured, stored, processed, and analyzed [2, 3].

The first organizations to use Big Data (BD) techniques were information technology (IT) companies like Google, which created most of the software infrastructure needed to manage data in a scalable manner [4-8]. Although organizations in the healthcare field have been more cautious about using BD in their processes, this attitude is changing [9]. The adoption of electronic health records has grown considerably in recent years, with a subsequent increase in the amount of information generated [10], whether quantitative (e.g. lab results), qualitative (e.g. notes), or transactional (e.g. medication orders) [11]. Furthermore, the emergence of genomics and the shift in focus to personalized medicine will push healthcare organizations towards the adoption of BD techniques [12].

Given the growing importance of BD, and the potential benefits of its use in the healthcare field, the objective of this paper is to describe the challenges that developing countries face when implementing BD projects in healthcare, and ways in which they can solve them.

A non-systematic review of the literature based on the bibliographic references in MEDLINE was performed, broadened by a Google Scholar search. The following keywords and their combinations were used: “big data”, “developing countries”, “health information systems”, “data mining”, and “computing methodologies”. The search was performed without date or language filters.

A thematic review of the selected articles was performed, in order to provide a conceptual approach to BD in general and its fields of application. Afterwards, the factors that constitute challenges for BD projects, regardless of the application domain, were enumerated. Finally, certain categories were established to gather the challenges faced by developing countries, and actions to overcome them were proposed.

Definition and Conceptual Framework

The term BD lacks a consistent definition, but it can be defined as a new paradigm, with a different outlook on data and how to analyze them, and a set of technologies designed to extract value from data, which could not possibly be gained using previously instituted techniques, given their volume, the speed with which they are acquired, and their diversity [13-15].

The term emerged in the 1990s, referring to the growth in volume of data [16]. In 2001, Doug Laney enriched the conceptualization of the term, stating that BD is characterized by three qualities, the 3 V’s: volume, velocity, and variety [17]. Recently, other candidates have been proposed to join this group, the most widespread being veracity [18] (See Figure 1):

Fig. 1. Adapted from [16] Analytics: the real world use of big data

  • ‘Volume’ refers to the exponential increase in the amount of data that are generated and stored. It is estimated that data production will be 44 times greater in 2020 than it was in 2009. BD techniques seek to generate knowledge from these large amounts of data.

  • ‘Velocity’ represents the increase in frequency with which data are delivered. The growth of integrated sensors in all types of devices, and the increasing adoption rates of mobile phones worldwide contribute to the continuous influx of data. BD projects seek to use these data to enable decision making in real time.

  • ‘Variety’ describes the different formats data can adopt, such as images, free text, video, and sound, among others. BD tries to harness existing data even if they are not structured or they have non-standard formats.

  • ‘Veracity’ refers to the confidence level associated with certain types of data. It is impossible to remove the inherent unpredictability of some data, such as weather data, or purchasing decisions of the public, so it is necessary to incorporate this dimension when planning BD projects.
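To make the 'Volume' estimate more concrete, the short calculation below converts the 44-fold increase between 2009 and 2020 into an implied yearly growth rate and doubling time. It is only a back-of-the-envelope sketch: it assumes constant compound growth over the whole period, which the cited estimate does not state, and the variable names are illustrative.

```python
import math

GROWTH_FACTOR = 44        # from the text: 2020 data production relative to 2009
YEARS = 2020 - 2009       # 11 years between the two reference points

# Assumption: growth compounds at the same rate every year.
annual_factor = GROWTH_FACTOR ** (1 / YEARS)
annual_growth_pct = (annual_factor - 1) * 100
doubling_time_years = math.log(2) / math.log(annual_factor)

print(f"implied annual growth: {annual_growth_pct:.0f}%")
print(f"implied doubling time: {doubling_time_years:.1f} years")
```

Under that assumption the figures work out to roughly 41% growth per year, i.e. data production doubling about every two years.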

“Business Intelligence” (BI) is a set of techniques used to analyze consolidated information and business processes in order to achieve improvements in competitiveness [19]. BI differs from BD in that BI requires well consolidated data to generate knowledge, whereas in a BD resource this is usually not possible, because data are more diverse, and new data are added constantly.

The adoption of BD is growing, generating benefits across multiple sectors. With the development of these techniques, the acceleration of scientific discovery and innovation is expected, as well as improvements in the understanding of human processes and social interactions, the acceleration of economic growth and improvements in health and quality of life [20].

There are several ways in which BD can help healthcare organizations to improve quality and efficiency [11, 21-24]:

  • generating new knowledge, for example allowing the creation of an observational evidence base to answer questions that are very difficult to answer using clinical trials;

  • disseminating knowledge, helping physicians through clinical decision support systems (CDSSs), which can provide suggestions and predictions in real time based on the personal data of patients;

  • translating personalized medicine initiatives into clinical practice by providing the opportunity to use analytical capabilities, which can integrate genomics with data extracted from medical records;

  • empowering patients by providing them with personalized information regarding their health and healthcare;

  • improving epidemiological surveillance: tools could be developed and integrated with medical records to predict the occurrence of highly prevalent or deadly diseases in the population.

Implementation Challenges of Big Data Projects

Even though many benefits are expected from the implementation of BD projects in all areas, there are difficulties common to all of them, regardless of whether they are conducted in developing or developed countries. Developing countries, and their healthcare industries in particular, have unique characteristics that merit special analysis of the challenges faced by the application of BD and the ways they can be surmounted.

In this section we use six broad categories to organize the content. For each category we first describe the difficulties that are common to all BD projects, and then the challenges, and the opportunities to overcome them, that are specific to developing countries and the healthcare sector.

Data Capture

As we stated previously, the exponential growth in the amounts of data generated is one of the most important factors in understanding the BD phenomenon. Data sets are becoming larger and more difficult to manage using traditional database tools. As a result, organizations face difficulties capturing, storing, managing, and analyzing data in a timely manner [15].

Consequently, this situation creates new infrastructure needs, and significant economic costs. Fortunately, storage costs are also decreasing. This allows for the capture of useful data, such as location data, which permit the mapping of real-time events for epidemiological surveillance.

Perhaps the greatest potential regarding the capture of data for developing countries originates from the progress mobile networks are making [25], and the growing market penetration of cell phones [26], which hold more sensors at lower prices with every passing year.

The growing adoption of mobile phones, 80% of which are located in developing countries [27], offers the possibility to use the data they provide to improve development programs. For example, SMS for Life uses a combination of mobile phones, SMS messages, the Internet, and electronic mapping technology to track weekly stock levels of malaria drugs at public health facilities. This program improved the distribution of malaria drugs in rural Tanzania, reducing facilities without stock from 78% to 26% [28]. In 2013, this initiative encompassed several countries in sub-Saharan Africa from Ghana to Kenya, with plans to increase the number of countries reached [29].
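As an illustration of how weekly SMS stock reports of this kind could be turned into a stock-out indicator, the sketch below computes the share of facilities reporting zero stock each week. The record layout, the function name `stockout_rate`, and the sample values are all hypothetical; they are not taken from SMS for Life, whose actual data model and stock-out definition are not described here.

```python
from collections import defaultdict

def stockout_rate(reports):
    """Return {week: fraction of reporting facilities with zero stock}.

    `reports` is an iterable of (facility_id, week, units_in_stock) tuples,
    a hypothetical layout for one drug reported once per facility per week.
    """
    by_week = defaultdict(list)
    for _facility_id, week, units in reports:
        by_week[week].append(units)
    return {
        week: sum(1 for u in units if u == 0) / len(units)
        for week, units in sorted(by_week.items())
    }

# Made-up sample: four facilities reporting in two different weeks.
sample_reports = [
    ("A", 1, 0), ("B", 1, 12), ("C", 1, 0), ("D", 1, 0),
    ("A", 2, 30), ("B", 2, 8), ("C", 2, 0), ("D", 2, 15),
]
rates = stockout_rate(sample_reports)
print(rates)
```

A real deployment would also need to handle facilities that fail to report in a given week, which this sketch simply leaves out of the denominator.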

Infrastructure

A robust physical infrastructure is key to the operation and scalability of a BD project. BD infrastructure is based on a distributed model, where data can be physically stored in different places and integrated through networks. The fundamental condition for taking advantage of this capacity is the quality of telecommunications, which offer a gateway to BD projects [30].

To perform the intensive analytical tasks required by BD projects, a special server architecture is necessary, comprising thousands of nodes with multiple processors and disks connected by a high speed network working in a distributed way [31]. Large Internet companies like Google, Microsoft, Yahoo, and Amazon use this architecture with centers distributed throughout the world offering their services.

All these changes in infrastructure involve substantial costs, generating economies of scale that favor large Internet companies [32], which take advantage of these barriers to provide infrastructure as a service (IaaS) to organizations that cannot afford their own [33].

Apart from the hardware infrastructure, an additional component is required: the software used to implement BD. The production, adoption, and adaptation of this software are key ingredients for BD, and require a properly trained workforce [30].

Many developing countries lack the storage and communications infrastructure needed to organize and integrate the amount of information that is generated in a BD project. Not only do these countries lack these resources, but they also lack the computing capacity to analyze the data. Deficiencies in electrical grids and telecommunication networks are common to many of them, and technological improvement projects have to compete with more pressing needs such as nutrition [34-36].

The vast majority of the necessary hardware resides in developed countries, and access to information and resources is skewed by a very unequal distribution of telecommunication capabilities to access them [30].

As an inexpensive alternative, it is possible to use clusters of video game consoles as a substitute for the supercomputers used in the analysis of BD [37, 38]. Another possibility is the use of resources present in developed countries, through private companies that provide the necessary IaaS through the cloud, such as Amazon Web Services and Google Compute Engine [39, 40], usually at a competitive price. These resources are useful because the efficiency these companies have achieved allows them to offer safe and high quality services at affordable prices.

One example of the use of IaaS in developing countries is DHIS 2, a tool for the collection, validation, analysis, and presentation of aggregate statistical data for health information management activities. DHIS 2 has been implemented in more than 30 countries in Africa, Asia, Latin America, and the South Pacific, and countries that have adopted DHIS 2 as their nation-wide health information system software include Kenya, Tanzania, Uganda, Rwanda, Ghana, Liberia, and Bangladesh [41].

Regarding software used for organizing, integrating, and analyzing data, production is limited by the lack of a trained workforce, and the possibility to purchase or license the necessary systems is often not an option for developing countries. However, there are open source options with strong communities that provide the necessary functionalities for free. The most outstanding example is Apache Hadoop [42], a platform for processing large amounts of data distributed on computer clusters used by companies like Yahoo and Facebook.
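The processing style that platforms such as Hadoop popularized, splitting data into partitions, mapping each record to key-value pairs, and then reducing the values collected for each key, can be illustrated in a few lines. The sketch below runs in a single Python process and does not use any Hadoop API; the function names, the record layout, and the sample diagnoses are assumptions made purely for illustration.

```python
from collections import defaultdict

def map_record(record):
    """Map step: emit one (key, value) pair per record."""
    yield record["diagnosis"], 1

def shuffle(pairs):
    """Group all emitted values by key, as a cluster would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_counts(key, values):
    """Reduce step: combine every value seen for one key."""
    return key, sum(values)

def run_job(partitions):
    # On a real cluster each partition would be mapped on a different node;
    # here the partitions are simply processed one after another.
    mapped = [pair for part in partitions for rec in part for pair in map_record(rec)]
    return dict(reduce_counts(k, v) for k, v in shuffle(mapped).items())

# Hypothetical records, split into two partitions as if stored on two nodes.
partitions = [
    [{"diagnosis": "malaria"}, {"diagnosis": "tuberculosis"}],
    [{"diagnosis": "malaria"}],
]
counts = run_job(partitions)
print(counts)
```

Because the map and reduce steps only ever see one record or one key at a time, the same logic can in principle be spread across many inexpensive machines, which is what makes this model attractive where large single servers are unaffordable.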

Organizational Changes - Workforce

According to Villars et al., BD deployments require new IT administration and application developer skill sets. Additionally, the people who possess these skills are a scarce resource given the high market demand [15]. Hal Varian, Google’s chief economist, contends that statisticians will have the job most in demand in the next decade [43].

To take advantage of the opportunity created by BD, trained human resources are needed, with the ability to manage and analyze data, with knowledge in computer science, statistics, and mathematics. These resources are not only scarce in developing countries: it is estimated that the United States will have a shortage of 160,000 professionals with these skills by 2018 [44]. Most of these scientists are recruited by major technology companies in the core countries, with the consequence that countries with fewer resources will suffer this deficit even more.

Some developing countries are better positioned in this regard, including Brazil, Russia, India and China (the BRIC countries). In 2008, 40% of the specialized resources were trained in these countries [30].

Just as the Internet and technological advances allow infrastructure to be outsourced, the human resources needed for a BD project can also be recruited over the web. For example, the Kaggle platform allows any organization to set a prize, and specialists from around the world compete to solve BD problems [45]. Ultimately, this possibility depends on the economic resources that can be offered. One important nonprofit alternative is Datakind, a group of data scientists who work with high impact social organizations to improve their decision making processes [46].

Integration and Interoperability

One of the greatest challenges BD faces is to integrate data from many different sources. The use of standards to achieve interoperability between systems is a core requirement to effectively integrate information [47].

The major difficulty for achieving interoperability among multiple repositories of BD lies in the differences in the metadata used in one repository with respect to other repositories. Without standards for these metadata, the integration of data generated in BD projects will be even more challenging [48].
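The metadata problem can be pictured with two repositories that describe the same facts under different field names. The sketch below maps both onto one shared vocabulary; the source names, field names, and the `FIELD_MAPS` table are invented for illustration and do not correspond to any particular interoperability standard.

```python
# Hypothetical per-source mappings from local field names to a shared vocabulary.
FIELD_MAPS = {
    "clinic_a": {"pt_id": "patient_id", "dob": "birth_date", "dx": "diagnosis"},
    "clinic_b": {"PatientNo": "patient_id", "BirthDate": "birth_date", "Diag": "diagnosis"},
}

def harmonize(record, source):
    """Rename the fields of `record` using the mapping agreed for `source`.

    Fields with no agreed mapping are dropped rather than guessed at.
    """
    mapping = FIELD_MAPS[source]
    return {mapping[name]: value for name, value in record.items() if name in mapping}

a = harmonize({"pt_id": "17", "dob": "1980-02-01", "dx": "malaria"}, "clinic_a")
b = harmonize({"PatientNo": "93", "BirthDate": "1975-11-23", "Diag": "malaria", "Ward": "3"}, "clinic_b")
print(a)
print(b)
```

The code itself is trivial; the hard part, as the text argues, is agreeing on the contents of the mapping table, which is exactly what shared metadata standards provide.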

Health information systems are often fragmented and isolated in information silos, hindering analysis and improvements in healthcare delivery [49]. This problem requires a political rather than a technological solution. In most cases, the required standards for systems to interoperate already exist, and they are the same in developing countries as in developed countries [50]. It is necessary to achieve consensus between government organizations, businesses, and stakeholders in order to advance the development of digital agendas.

Developed countries have made progress in spreading digital agendas in the last decade, and are now better positioned than developing countries, although lately this gap has been narrowing. According to the World Health Organization (WHO), since 2008 more than 20 developing countries have been in the process of implementing strategic plans for eHealth [51].

The WHO and the International Telecommunications Union (ITU) published a document in order to help countries in the process of generating a national eHealth vision and an action plan (National eHealth Strategy Toolkit) [52]. These resources are especially useful for governments in developing countries.

Privacy and Security

Some characteristics of BD, such as the relative lack of structure and the informal nature of some data, can be a problem when the data are sensitive, raising potential privacy, safety, or legal issues. Traditional database management systems support granular security policies that protect data at various levels. The software used in BD projects does not usually have these security measures [15].
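A granular security policy of the kind mentioned above can be approximated at the application level when the underlying BD software does not provide one. The sketch below filters each record down to the fields a given role is allowed to see; the roles, field names, and the `POLICY` table are hypothetical, and a production system would also need authentication, auditing, and encryption, none of which are shown.

```python
# Hypothetical policy: which fields each role may read.
POLICY = {
    "clinician": {"patient_id", "name", "diagnosis", "lab_result"},
    "researcher": {"diagnosis", "lab_result"},
}

def visible_fields(record, role):
    """Return only the fields that `role` may see; unknown roles see nothing."""
    allowed = POLICY.get(role, set())
    return {name: value for name, value in record.items() if name in allowed}

record = {"patient_id": "17", "name": "Jane Doe", "diagnosis": "malaria", "lab_result": "positive"}
print(visible_fields(record, "researcher"))
print(visible_fields(record, "visitor"))
```

Denying access by default for unlisted roles is a deliberate choice here: a missing policy entry then fails closed instead of exposing identifiable data.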

Another important challenge concerns security infrastructure and privacy policies. It is crucial to apply not only legal but also ethical considerations to the security of the data as early as possible. Strategies for reporting how data are collected, how they are protected, and how they will be used should be recognized as a necessity [53].

Likewise, an action plan should be contemplated in case of possible data losses or security breaches. Sharing information in a clear and careful way will help reduce concerns related to security and privacy [54].

It is essential to ensure the privacy and confidentiality of personal data, especially with regard to the use of BD in healthcare. These factors should be considered part of the structure of a BD project from the beginning.

In the U.S., the Health Insurance Portability and Accountability Act’s (HIPAA) privacy and security rules govern how personal information should be protected, and define what safety standards should be applied [55, 56]. A regulatory framework is essential to build trust between all parties involved. This is why developing countries are advised to advance policies and regulatory frameworks in order to ensure the privacy and security of sensitive data.

Whatever the data, when they are related to humans, safety concerns will inevitably arise. If the goal is to share data, those who provide them have to be able to trust those who assume the responsibility of caring for their information [57, 58]. This will only be achieved with an appropriate regulatory framework.

Adoption

Data should be managed as a strategic asset within organizations. Existing barriers to the adoption of BD are usually cultural. Many organizations do not implement BD programs because they cannot appreciate the way in which data analysis can enhance their businesses [15].

Defining objectives and expected outcomes is critical to establishing governance capable of sustaining projects of this magnitude. A BD program should include the people, processes, and policies needed [59].

The difficulties reviewed previously (economic issues, poor infrastructure, and lack of trained personnel) are common to most developing countries, and generate a gap in the adoption of BD as compared to developed countries that is equivalent to the digital divide [30].

Some ways to accelerate the adoption of BD techniques in developing countries are simple, such as sharing experiences and lessons learned [36]. Currently, developing countries have more access to sources of scientific information, due to the increased penetration of the Internet, the emergence of the Open Access movement, which allows free access to scientific articles in prestigious publications, and the advent of new tools for searching scientific literature, like Google Scholar. A recent paper shows that Google Scholar provides greater access to free full-text articles than PubMed [60].

Spearheading a movement of BD for development, the United Nations launched Global Pulse in 2009, an innovation lab that aims to raise awareness of BD opportunities and catalyze the adoption of BD tools to help policymakers understand human wellbeing and emerging vulnerabilities in real time [61]. Other international institutions, like the World Economic Forum and the OECD, are working to use BD for development goals [62, 63].

The implementation of strategic partnerships allowing for regional integrations will help developing countries reduce costs and lower the difficulty of implementing BD projects. If these countries want to adopt BD, it is essential for their governments to have a strategic vision on information technology, and implement regulatory mechanisms as well as incentives for its development.

Conclusions

The advent of BD provides great opportunities for the healthcare field. There are challenges to the implementation of any BD program: the exponential growth of data, the special infrastructure needed to analyze them, the need for a specially trained workforce, the need to agree on interoperability standards to integrate data, the privacy and security risks involved in these projects, and the requirement of a strategic vision that includes the people, processes, and policies needed to ensure adoption.

Developing countries have particular characteristics that hinder further development of these projects. In this article we attempted to describe these challenges focusing on the options these countries could use to achieve successful implementations of BD programs. We hope this article will be useful for those in charge of making policies or leading BD projects in developing countries.

Table 1.

Recommendations for developing countries

Data capture: Take advantage of the high penetration rates of mobile phones to collect usage-associated data and sensor data for innovative BD projects.
Infrastructure: Circumvent infrastructure and economic deficits using IaaS and open source software.
Organizational changes - Workforce: Increase the number of data scientists trained. Make partnerships with nonprofit organizations like Datakind when trained resources are needed.
Integration and interoperability: Advance in the creation and adoption of digital agendas.
Privacy and security: Institute policies and regulatory frameworks to ensure the privacy and security of sensitive data.
Adoption: Implement strategic partnerships with private and public institutions with expertise in BD tools and techniques.

Acknowledgements

The authors would like to thank Mariana Licia Napoli for her contribution to the translation of the original manuscript.

References


Articles from Yearbook of Medical Informatics are provided here courtesy of Thieme Medical Publishers
