Impact of YouTube Advertising on Sales with Regression Analysis and Statistical Modeling: Usefulness of Online Media in Business

Yang Zhou; Zubair Ahmad; Hassan Alsuhabi; M Yusuf; Ibrahim Alkhairy; A M Sharawy

doi:10.1155/2021/9863155

2021 Sep 7;2021:9863155. doi: 10.1155/2021/9863155

PMCID: PMC8443353 PMID: 34539772

Abstract

Computer technology plays a prominent role in almost every aspect of daily life, including education, health care, online shopping, advertising, and even the home. Computers make daily tasks much easier and more convenient. Among social media, YouTube is a well-known video-sharing and social networking service. As more and more people join social media and become everyday users, brands have also increased their online engagement. However, it is still unclear how to effectively measure the value of, and the return on, advertising through social media. As of 2021, more than 31 million YouTube channels have been opened around the globe. In this paper, we consider YouTube advertising and check its effectiveness and the benefits gained. Statistical tools are adopted to measure the extent of the advertising benefits and their correlation in creating effective advertising campaigns on YouTube. Simple linear regression analysis is performed on data representing the YouTube advertising budget of a company and the sales of that company. Furthermore, we develop a new statistical distribution to provide the best description of the YouTube advertising data. The results of this research show that YouTube is an effective medium for advertising and that YouTube advertising has a strong relationship with sales.

1. Introduction

Marketing is the collection of strategies that a company adopts to convey its messages or brands to its target audience. It plays a key role in motivating consumers to buy the company's brand or product [1]. Marketers can promote their brands to businesses (also called B2B marketing) or to consumers (also called B2C marketing). Marketing rests on four principles (the 4Ps): (i) Product (P1), (ii) Price (P2), (iii) Place (P3), and (iv) Promotion (P4). Together, these 4Ps are known as the marketing mix [2].

P1 refers to the services or products that the company offers to its consumers. It covers the warranty, packaging, appearance, quality, and so on. P2 refers to the pricing of the product. It covers not only the selling price but also the payment arrangement, discounts, and credit terms. P3 deals with the location where the company's product or service is made or distributed. P4 includes the activities that influence the customer's decision and make the business known to them [3].

In the literature, numerous strategies (online and print media) have been suggested for marketing. Among them, online advertising (online marketing) is the most effective for reaching the largest audience. A number of venues are available for online marketing, such as Facebook, YouTube, Twitter, Flickr, Pinterest, and Instagram [4]. Among these venues, YouTube is one of the most effective platforms (see Djafarova and Matson [5]; Pleyers and Vermeulen [6]; Semerádová and Weinlich [7]; Acikgoz and Burnaz [8]; and Al-Maroof et al. [9]).

YouTube is the second most popular search engine (SE) around the globe and provides an effective way for advertisers to capture consumers' attention. The first video was shared on YouTube around the middle of 2005, and the platform has grown rapidly since then. By March 2019, YouTube had surpassed 1.5 billion monthly active users. This large user base has led business firms to spend more and more on advertising through YouTube. According to Abdelkader [10], the top 100 advertisers on YouTube have increased their spending by over 60% annually.

In this paper, we use YouTube as an advertising tool and test its impact on the sales of a company. To check its usefulness, a widely used statistical technique, the simple linear regression model (SLRM), is adopted. We test a claim (also called a hypothesis) using two statistical tests, namely the t-test and the F-test. To carry out the statistical analysis, the null hypothesis (NH) H₀ and the alternative hypothesis (AH) H₁ are formulated as H₀: YouTube advertising has no significant relationship with sales vs. H₁: YouTube advertising has a significant relationship with sales.

Besides the regression analysis, a new statistical distribution (SD) is proposed to model the YouTube advertising data. The new SD is called the heavy-tailed beta power transformed Lomax (HTBPT-Lomax) distribution. The HTBPT-Lomax distribution is very flexible and possesses heavy-tailed (HT) characteristics.

2. Methodology

In economic studies, regression analysis (RA) is a prominent technique that helps econometricians understand how the dependent variable changes in relation to changes in the independent variables [11]. In simple words, RA helps to understand how the likelihood of a sale (dependent variable) is affected by the price or the quantity purchased (independent variables) (see Núñez et al. [12]). There are two main types of RA: (i) simple linear RA (SLRA) and (ii) multiple linear RA (MLRA).

In this work, we restrict our study to SLRA. The SLRA measures the relationship between a response variable Y (the output of the regression model) and an explanatory variable X (the input of the regression model). The simple linear regression model (SLRM) is defined by

\begin{matrix} Y = β_{0} + β_{1} X + ε, \end{matrix}

(1)

where

Y represents the outcome of the model, that is, what we are trying to predict.
X represents the input of the model that helps in predicting Y.
β₀ is the intercept of the model, that is, the expected value of Y when X=0.
β₁ is the slope of the model and represents the expected change in Y per unit increase in X.
ε represents the residual error term (RET), which has a mean of 0.
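
As a minimal sketch of how model (1) can be fitted by ordinary least squares, the R code below simulates data from the model and compares the closed-form estimates with those returned by `lm()`. The data, the seed, and the values `b0_true` and `b1_true` are illustrative assumptions and are not the data analyzed in this paper.

```r
# Synthetic illustration only: these are not the paper's data.
set.seed(1)
n <- 130
b0_true <- 5      # assumed intercept used for the simulation
b1_true <- 0.05   # assumed slope used for the simulation
x <- runif(n, min = 0, max = 350)
y <- b0_true + b1_true * x + rnorm(n, mean = 0, sd = 2)

# Closed-form ordinary least squares estimates for model (1)
b1_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0_hat <- mean(y) - b1_hat * mean(x)

# The same estimates obtained with lm()
fit <- lm(y ~ x)
coef(fit)
c(b0_hat, b1_hat)
```

Both routes should give the same intercept and slope up to floating-point error, since `lm()` solves the same least squares problem.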

3. Regression Analysis

The RA is widely used for two different conceptual purposes. First, regression analysis is used for prediction and forecasting, where its uses are closely related to the field of machine learning. Second, regression analysis is used to establish a causal relationship (CR) between X (predictor variables) and Y (response variable).

The RA has many applications in insurance, finance, and business, among others. In business and finance, RA is used to calculate the Beta (return volatility relative to the entire market) of a stock. The RA can also be used to predict business returns or business performance. This section uses RA to predict Y (sales) from the predictor variable (YouTube advertising).

3.1. Simple Linear Regression Model

The SLRM to explain the relationship between YouTube advertising and sales is given by

\begin{matrix} Y = β_{0} + β_{1} YouTube + ε . \end{matrix}

(2)

After fitting the regression model, we observe that the estimate of β₀ is 4.84708, which represents the predicted sales (in thousands of dollars) when nothing is spent on YouTube advertising. Hence, with no YouTube advertising budget, the expected sale (ES) is 4.84708∗1000=$4847. The estimated slope of the model in equation (2) is 0.04802, so each additional unit of YouTube advertising budget increases the ES by 0.04802 thousand dollars (about $48), and 1000 additional units increase it by 1000∗0.04802≈48 thousand dollars. For example, for a YouTube advertising budget of 1000 units, the ES is 4.84708+0.04802∗1000=52.86708, representing a sale of $52867. Corresponding to equation (2), the fitted regression model is given by

\begin{matrix} \hat{Y} = 4.84708 + 0.04802 YouTube . \end{matrix}

(3)

Figure 1 displays the relationship between YouTube advertising and sales. The plot shows a positive relationship: higher spending on YouTube advertising is associated with higher sales.
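
The arithmetic behind the fitted model can be reproduced with a few lines of R. This is only a sketch that plugs in the coefficients reported in the text; the function name `expected_sales` is a hypothetical helper introduced here for illustration.

```r
# Coefficients reported in the text; sales are measured in thousands of dollars
b0 <- 4.84708
b1 <- 0.04802

# Expected sales (in thousands of dollars) for a given YouTube budget
expected_sales <- function(youtube) b0 + b1 * youtube

expected_sales(0)            # intercept: no YouTube spending
expected_sales(1000)         # the worked example from the text
expected_sales(1000) * 1000  # the same figure converted to dollars
```

The second call corresponds to the value 52.86708 quoted in the text.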

3.2. Hypothesis Testing

We adopt a well-known statistical procedure (hypothesis testing) to test the significance of the effect of YouTube advertising on sales. As stated in Section 1, the null hypothesis (H₀) and the alternative hypothesis (H₁) are H₀: YouTube advertising has no significant relationship with sales vs. H₁: YouTube advertising has a significant relationship with sales.

The standard error (SE) is very useful in hypothesis tests on the regression coefficients (RCs). The SE measures the reliability of the coefficient estimates (CEs): it quantifies how much a CE would vary from sample to sample around the true value of the coefficient.

3.3. t-Test

To test H₀, we first have to determine whether the estimate of the regression coefficient β₁ is sufficiently far from 0. If the SE of the estimate of β₁ is small, then even a small value of the estimate provides evidence against H₀. We use the t-test, whose statistic is the estimate divided by its SE, to measure how far the estimate of β₁ is from zero. The results are provided in Table 1.

Table 1.

Numerical results of SLRM based on t-test.

Adv. media	RCs	Estimated values	SE	t-statistic	Pr(>|t|)
YouTube	β₀	4.84707	0.39901	12.14700	<2e-16
YouTube	β₁	0.04801	0.00482	9.95900	<2e-16


The t-statistic measures how far the CE is from zero in units of its SE. A larger absolute value of the t-statistic provides stronger evidence against H₀ and indicates that Y is associated with X. The value Pr(>|t|) is the p value, that is, the probability under H₀ of observing a t-statistic at least as large in absolute value as the one obtained. The smaller the p value, the stronger the evidence against H₀.

From Table 1, the t-statistic for the slope of YouTube advertising is far from zero, and the p value is <0.05, indicating that β₁ is not equal to zero. Based on these results, we conclude that there is sufficient evidence to reject H₀.
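
As a quick check, the t-statistics in Table 1 can be recomputed from the reported estimates and standard errors. The sketch below assumes 128 residual degrees of freedom (n − 2 with n = 130 observations, the sample size that appears in equation (5)); the variable names are illustrative.

```r
# Values copied from Table 1
est <- c(beta0 = 4.84707, beta1 = 0.04801)
se  <- c(beta0 = 0.39901, beta1 = 0.00482)
df  <- 128   # assumed residual degrees of freedom, n - 2

t_stat <- est / se                        # estimate divided by its SE
p_val  <- 2 * pt(-abs(t_stat), df = df)   # two-sided p values

t_stat
p_val
```

The recomputed ratios agree with the tabulated t-statistics up to rounding of the inputs.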

3.4. F-Test

Here, we implement another powerful statistical test, the F-test, to check the impact of YouTube advertising on sales. The F-statistic compares the fitted model with a model containing only the intercept; a value much larger than 1 indicates that YouTube advertising explains a significant part of the variation in sales. As given in Table 2, the value of the F-statistic is 99.18. Hence, including YouTube advertising as a predictor variable for Y gives a better model than the intercept alone.

Table 2.

Numerical results of SLRM based on F-test.

Adv. media	R²	Adjusted R²	F-statistic	p value	Degrees of freedom
YouTube	0.4366	0.4322	99.18	<2.2e-16	1 and 128


The R square (R²) is one of the most important statistical quantities for measuring the quality of the model fit, and its values range from 0 to 1. The R² measures the strength of the linear relationship between the predictor variable and the response variable. For a particular model, a value of R² near 0 indicates a poor fit, whereas a value near 1 indicates a good fit. In this study, the value of R² is 0.4366, indicating that YouTube advertising explains 43.66% of the variation in sales.
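
The quantities in Table 2 are linked by standard identities, which the following sketch uses as a consistency check. It assumes n = 130 observations, as implied by the reported degrees of freedom (1 and 128); the variable names are illustrative.

```r
# Values copied from Table 2
r2 <- 0.4366
n  <- 130    # assumed sample size, implied by the degrees of freedom 1 and 128
p  <- 1      # number of predictors in a simple linear regression

adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
f_stat <- (r2 / p) / ((1 - r2) / (n - p - 1))
p_val  <- pf(f_stat, df1 = p, df2 = n - p - 1, lower.tail = FALSE)

adj_r2   # compare with the reported adjusted R^2
f_stat   # compare with the reported F-statistic (small gap from rounding of R^2)
p_val
```

In a simple linear regression the F-statistic is also the square of the slope t-statistic, which is why the two tests lead to the same conclusion here.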

3.5. Residuals

In statistics and optimization, a residual is the deviation between an observed value and its theoretical (fitted) value. In regression analysis, the residual is the vertical difference between a data point and the regression line. Residuals are sometimes also called errors. An error in this context does not mean that something is wrong with the analysis; it just means that there is an unexplained difference between the observed and theoretical values. In simple words, the residual is the part of the observation that is not explained by the regression line.

The residual, represented by ε, can also be expressed by an equation. The term ε is the difference between observed value y and predicted value $\hat{y}$ . Mathematically, we have

\begin{matrix} ε = y - \hat{y} . \end{matrix}

(4)

The residual SE measures the quality of the fit of the regression model [13]. In the context of this study, different plots of the behavior of the residuals are presented in Figure 2. From Figure 2, we can see the following:

The red line in the residual vs. fitted plot lies close to the residual value of 0. Therefore, the linearity assumption appears reasonable. Linearity means that the predictor variable in the regression model has a straight-line relationship with Y.
Homoscedasticity (constant variance of the residuals) is a fundamental assumption of linear regression models. If this assumption is violated, the problem of heteroscedasticity arises. The scale-location plot indicates that the residuals satisfy the homoscedasticity property.
In RA, an observation whose deletion from the data has a significant effect on the estimates of the model parameters is called an influential observation. The residual vs. leverage plot shows that there are few influential observations.
The quantile-quantile (Q-Q) plot is a visual approach to checking normality. The points of the Q-Q plot lie close to the 45° reference line (see Figure 2), which indicates that the residuals are approximately normally distributed.
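
The residual definition in equation (4) and the residual SE can be illustrated on simulated data. The sketch below is not the paper's analysis: the data, seed, and coefficients are assumptions chosen only to make the example run.

```r
# Synthetic illustration only: these are not the paper's data.
set.seed(2)
n <- 130
x <- runif(n, min = 0, max = 350)
y <- 5 + 0.05 * x + rnorm(n, sd = 2)
fit <- lm(y ~ x)

# Residuals as in equation (4): observed minus fitted values
res <- y - fitted(fit)

# With an intercept in the model, least squares residuals sum to zero
# and are uncorrelated with the predictor
sum(res)
cor(res, x)

# Residual standard error: sqrt(RSS / (n - 2))
rse <- sqrt(sum(res^2) / (n - 2))
c(rse, sigma(fit))

# plot(fit, which = c(1, 2, 3, 5)) would draw the four diagnostic panels:
# residuals vs. fitted, normal Q-Q, scale-location, residuals vs. leverage
```

The commented `plot()` call is how base R produces the kind of four-panel display discussed above for a fitted `lm` object.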

Figure 2. Graphical representations of the residuals.

3.6. Outlier Test

In this subsection, we perform an outlier test to detect whether there are outliers among the residuals. The test shows that the 23rd observation has the largest error. The outlier is also visible in the box plot provided in Figure 3. Furthermore, we check for influential observations using Cook's distance. Any observation whose Cook's distance exceeds a chosen cut-off is regarded as an influential observation. We use the standard cut-off rule of 4/n, where n is the number of observations. Here, the Cook's distance of the 23rd observation lies far above this cut-off, so it is an influential observation.

Figure 3. Box plot and Cook's distance plot of the YouTube advertising data.
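
The 4/n rule can be sketched in R with `cooks.distance()`. The example uses simulated data with a gross error deliberately injected at observation 23, purely to mimic the situation described above; none of the numbers come from the paper's data.

```r
# Synthetic illustration only: these are not the paper's data.
set.seed(3)
n <- 130
x <- seq(10, 350, length.out = n)
y <- 5 + 0.05 * x + rnorm(n, sd = 1)
y[23] <- y[23] + 40          # hypothetical gross error injected at observation 23

fit <- lm(y ~ x)
cd  <- cooks.distance(fit)

cutoff  <- 4 / n             # the 4/n rule used in the text
flagged <- which(cd > cutoff)

unname(flagged)              # indices of observations above the cut-off
unname(which.max(cd))        # observation with the largest Cook's distance
```

An observation is flagged when its Cook's distance exceeds the cut-off, so the injected point appears among the flagged indices.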

3.7. Correlation Test

The correlation test is used to evaluate the association between two or more variables. Here, we have two variables (YouTube advertising and sales); therefore, we use the Pearson correlation analysis approach which measures a linear dependence between two variables. The Pearson correlation coefficient, denoted r, is obtained as

\begin{matrix} r = \frac{\sum_{i = 1}^{130} (YouTube_{i} - M_{YouTube}) (Sales_{i} - M_{Sales})}{\sqrt{\sum_{i = 1}^{130} {(YouTube_{i} - M_{YouTube})}^{2} \sum_{i = 1}^{130} {(Sales_{i} - M_{Sales})}^{2}}}, \end{matrix}

(5)

where M_YouTube and M_Sales are the means of YouTube and sales, respectively. The p value of the correlation can be obtained either by (i) using the correlation coefficient table with n−2 degrees of freedom, where n represents the number of observations of the YouTube and sales data, or (ii) calculating the t value, given by

\begin{matrix} t = \frac{r}{\sqrt{1 - r^{2}}} \sqrt{n - 2} . \end{matrix}

(6)

If the p value is <0.05, then the correlation between YouTube advertising and sales is significant at the 5% level. Using the above procedure, we observe that r=0.66073, which shows that there is a positive relationship between YouTube advertising and sales (see Figure 4). We also found that the p value is less than 2.2e-16. Since the p value is less than 0.05, we reject the hypothesis of no relationship between YouTube advertising and sales.
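
Equation (6) can be evaluated directly from the reported correlation. The following sketch uses only r = 0.66073 and n = 130 from the text; the variable names are illustrative.

```r
# Values reported in the text
r <- 0.66073
n <- 130

t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)     # equation (6)
p_val  <- 2 * pt(-abs(t_stat), df = n - 2)    # two-sided p value

t_stat   # agrees with the slope t-statistic in Table 1 up to rounding
r^2      # agrees with R^2 in Table 2 up to rounding
p_val
# With the raw data, cor.test(x, y, method = "pearson") reports the same quantities.
```

The agreement with Tables 1 and 2 is expected: in a simple linear regression, the test of zero correlation and the t-test of the slope are the same test.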

4. Statistical Modeling

After showing the impact of YouTube advertising in the above sections, we now introduce a new statistical model for analyzing the YouTube advertising data. This section consists of three subsections: (i) the first introduces the statistical model, (ii) the second deals with parameter estimation, and (iii) the third deals with the modeling of the YouTube advertising data.

4.1. A New Statistical Distribution

The introduction of new statistical distributions to model real phenomena is a prominent research topic that is already rich and continues to grow. In applied fields, statistical distributions play a prominent role in modeling financial and actuarial data sets. For example, Zhu and Galbraith [14] introduced a generalized asymmetric Student-t (GAS-t) distribution for analyzing econometric and financial data. Marchant et al. [15] studied the generalized Birnbaum–Saunders (GBS) distribution and analyzed data in management sciences. Nadarajah and Bakar [16] applied new composite models (CMs) to Danish fire insurance data. Theodossiou [17] considered the skewed generalized error (SGE) distribution for financial assets and returns. Bhati and Ravi [18] studied the generalized log-Moyal (GLM) distribution and analyzed the Norwegian fire insurance loss data. Punzo et al. [19] suggested finite mixtures of contaminated gamma (FMCG) for fitting econometric data. Punzo [20] used inverse Gaussian (IGa) distribution for modeling insurance and econometric data. Ahmad et al. [21] proposed a class of claim (CC) distributions and applied it to insurance claim data. Ahmad et al. [22] introduced the Z-Weibull distribution for analyzing the earthquake insurance data. Ahmad et al. [23] introduced new methods for generating heavy-tailed (HT) distributions and analyzed insurance data. Punzo and Bagnato [24] used the Laplace scale mixtures (LSMs) for modeling data related to cryptocurrencies. Tung et al. [25] introduced a new statistical distribution for modeling medical care insurance data. Zhao et al. [26] proposed the Lomax-Claim (LC) model to analyze the financial data. For more details about the usefulness of statistical distributions in applied sciences, we refer to Ahmad et al. [27].

We continue this branch of distribution theory and introduce a new distribution to model the YouTube advertising data. The proposed model is called the heavy-tailed beta power transformed Lomax (HTBPT-Lomax) distribution.

The cumulative distribution function (CDF) U(y; ξ) of the Lomax distribution is given by

\begin{matrix} U (y; ξ) = 1 - {(1 + λ_{2} y)}^{- λ_{1}}, y \geq 0, λ_{1}, λ_{2} > 0, \end{matrix}

(7)

where ξ=(λ₁, λ₂). The respective PDF (probability density function) expressed by u(y; ξ) is

\begin{matrix} u (y; ξ) = λ_{1} λ_{2} {(1 + λ_{2} y)}^{- λ_{1} - 1}, y, λ_{1}, λ_{2} > 0. \end{matrix}

(8)

Recently, Zhao et al. [28] introduced a new family called heavy-tailed beta power transformed (HTBPT) family of distributions. Its CDF P(y; β, ξ) and PDF p(y; β, ξ) are given by

\begin{matrix} P (y; β, ξ) = β^{1 - U (y; ξ)} - β [1 - U (y; ξ)], y \in ℝ, β > 0, \\ p (y; β, ξ) = u (y; ξ) [β - \log (β) β^{1 - U (y; ξ)}], y \in ℝ, β > 0, \end{matrix}

(9)

respectively.

Using equation (7) in equation (9), we get the CDF of the HTBPT-Lomax model given by

\begin{matrix} P (y; β, ξ) = β^{{(1 + λ_{2} y)}^{- λ_{1}}} - β {(1 + λ_{2} y)}^{- λ_{1}}, y \geq 0, β, λ_{1}, λ_{2} > 0. \end{matrix}

(10)

The respective PDF is

\begin{matrix} p (y; β, ξ) = λ_{1} λ_{2} {(1 + λ_{2} y)}^{- λ_{1} - 1} [β - \log (β) β^{{(1 + λ_{2} y)}^{- λ_{1}}}], y > 0. \end{matrix}

(11)

Different plots of the HTBPT-Lomax PDF p(y; β, ξ) are provided in Figure 5. These plots are obtained for λ₁=1.2, λ₂=1, and β=1.9 (red line); λ₁=1.2, λ₂=1, and β=3.1 (green line); λ₁=1.2, λ₂=1, and β=5.5 (black line); and λ₁=1.2, λ₂=1, and β=8.2 (blue line).
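
Equations (10) and (11) translate directly into R. The sketch below is a plain transcription, with the hypothetical function names `phtbpt_lomax` and `dhtbpt_lomax`, evaluated at the parameter values of the red curve in Figure 5. The value β = 1.9 is chosen deliberately: at y = 0 the bracketed factor in (11) equals β(1 − log β), which is nonnegative only when β ≤ e, so the check is restricted to that range.

```r
# CDF (10) and PDF (11) of the HTBPT-Lomax model (function names are hypothetical)
phtbpt_lomax <- function(y, lambda1, lambda2, beta) {
  s <- (1 + lambda2 * y)^(-lambda1)   # Lomax survival function, 1 - U(y)
  beta^s - beta * s
}

dhtbpt_lomax <- function(y, lambda1, lambda2, beta) {
  s <- (1 + lambda2 * y)^(-lambda1)
  lambda1 * lambda2 * (1 + lambda2 * y)^(-lambda1 - 1) *
    (beta - log(beta) * beta^s)
}

# Parameter values of the red curve in Figure 5
l1 <- 1.2; l2 <- 1; b <- 1.9

phtbpt_lomax(0, l1, l2, b)   # CDF at the lower end of the support

# Numerical integral of the PDF over [0, 5] versus the CDF at y = 5
area <- integrate(dhtbpt_lomax, lower = 0, upper = 5,
                  lambda1 = l1, lambda2 = l2, beta = b,
                  rel.tol = 1e-10)$value
c(area, phtbpt_lomax(5, l1, l2, b))
```

The two numbers in the last line should coincide, which confirms that (11) is the derivative of (10) for these parameter values.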

4.2. Estimation

Here, the estimators $(\hat{λ_{1}}, \hat{λ_{2}}, \hat{β})$ of the parameters (λ₁, λ₂, β) are obtained. Consider a random sample, say y₁, y₂,…, y_n obtained from p(y; β, ξ). Then, corresponding to p(y; β, ξ), the log-likelihood function π(λ₁, λ₂, β) is

\begin{matrix} π (λ_{1}, λ_{2}, β) = n \log λ_{1} + n \log λ_{2} - (λ_{1} + 1) \sum_{k = 1}^{n} \log (1 + λ_{2} y_{k}) \\ + \sum_{k = 1}^{n} \log [β - \log (β) β^{{(1 + λ_{2} y_{k})}^{- λ_{1}}}] . \end{matrix}

(12)

Corresponding to π(λ₁, λ₂, β), the partial derivatives are

\begin{matrix} \frac{\partial}{\partial λ_{1}} π (λ_{1}, λ_{2}, β) = \frac{n}{λ_{1}} - \sum_{k = 1}^{n} \log (1 + λ_{2} y_{k}) + \sum_{k = 1}^{n} \frac{{(\log (β))}^{2} β^{{(1 + λ_{2} y_{k})}^{- λ_{1}}} {(1 + λ_{2} y_{k})}^{- λ_{1}} \log (1 + λ_{2} y_{k})}{[β - \log (β) β^{{(1 + λ_{2} y_{k})}^{- λ_{1}}}]}, \\ \frac{\partial}{\partial λ_{2}} π (λ_{1}, λ_{2}, β) = \frac{n}{λ_{2}} - (λ_{1} + 1) \sum_{k = 1}^{n} \frac{y_{k}}{(1 + λ_{2} y_{k})} + λ_{1} \sum_{k = 1}^{n} \frac{y_{k} {(\log (β))}^{2} β^{{(1 + λ_{2} y_{k})}^{- λ_{1}}} {(1 + λ_{2} y_{k})}^{- λ_{1} - 1}}{[β - \log (β) β^{{(1 + λ_{2} y_{k})}^{- λ_{1}}}]}, \\ \frac{\partial}{\partial β} π (λ_{1}, λ_{2}, β) = \sum_{k = 1}^{n} \frac{1 - β^{{(1 + λ_{2} y_{k})}^{- λ_{1}} - 1} (\log (β) {(1 + λ_{2} y_{k})}^{- λ_{1}} + 1)}{[β - \log (β) β^{{(1 + λ_{2} y_{k})}^{- λ_{1}}}]} . \end{matrix}

(13)

Setting the partial derivatives to zero, i.e., (∂/∂λ₁)π(λ₁, λ₂, β)=(∂/∂λ₂)π(λ₁, λ₂, β)=(∂/∂β)π(λ₁, λ₂, β)=0, and solving the resulting equations simultaneously provides the estimators of λ₁, λ₂, and β, respectively. These equations have no closed-form solution, so they are solved numerically.
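
Because the likelihood equations are solved numerically, it is useful to check the score functions against the log-likelihood (12). The sketch below codes the log-likelihood, the three partial derivatives obtained by differentiating (12) directly, and a central finite-difference approximation. The sample `y_demo`, the evaluation point `par0`, and the function names are hypothetical and serve only to make the check runnable.

```r
# Log-likelihood (12); par = c(lambda1, lambda2, beta)
loglik <- function(par, y) {
  l1 <- par[1]; l2 <- par[2]; b <- par[3]
  s <- (1 + l2 * y)^(-l1)
  D <- b - log(b) * b^s
  length(y) * (log(l1) + log(l2)) - (l1 + 1) * sum(log1p(l2 * y)) + sum(log(D))
}

# Partial derivatives of (12) with respect to lambda1, lambda2, and beta
score <- function(par, y) {
  l1 <- par[1]; l2 <- par[2]; b <- par[3]
  n <- length(y)
  s <- (1 + l2 * y)^(-l1)
  D <- b - log(b) * b^s
  c(n / l1 - sum(log1p(l2 * y)) +
      sum(log(b)^2 * b^s * s * log1p(l2 * y) / D),
    n / l2 - (l1 + 1) * sum(y / (1 + l2 * y)) +
      l1 * sum(y * log(b)^2 * b^s * (1 + l2 * y)^(-l1 - 1) / D),
    sum((1 - b^(s - 1) * (1 + s * log(b))) / D))
}

# Central finite differences of the log-likelihood
num_grad <- function(par, y, h = 1e-6) {
  sapply(seq_along(par), function(j) {
    e <- replace(numeric(length(par)), j, h)
    (loglik(par + e, y) - loglik(par - e, y)) / (2 * h)
  })
}

y_demo <- c(0.3, 1.1, 2.4, 0.7, 5.2, 0.9, 1.8, 3.3)  # hypothetical sample
par0   <- c(2, 0.5, 1.5)                              # hypothetical evaluation point

rbind(analytic = score(par0, y_demo), numeric = num_grad(par0, y_demo))
```

In practice the maximization itself would be delegated to an optimizer, as in the Appendix; the check above only guards against algebra slips in the derivatives.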

4.3. An Application to YouTube Advertising Data

This subsection deals with the application of the HTBPT-Lomax model using a data set related to the YouTube advertising data. The data are available at https://www.businessofapps.com/data/youtube-statistics/. The box plot of the YouTube advertising data is provided in Figure 6 whereas the basic measures (BMs) of the data are presented in Table 3.

Figure 6. The box plot of the YouTube advertising data.

Table 3.

The BMs of the YouTube advertising data.

Minimum	1st quartile	Median	Mean	3rd quartile	Maximum
8.79	88.09	156.00	160.26	243.21	352.75


The HTBPT-Lomax model is compared with the Lomax model and a prominent version of the Lomax model called exponentiated Lomax (E-Lomax) model. The CDF of the E-Lomax is

\begin{matrix} U (y; θ, ξ) = {(1 - {(1 + λ_{2} y)}^{- λ_{1}})}^{θ}, y \geq 0, λ_{1}, λ_{2}, θ > 0. \end{matrix}

(14)

For assessing the fitting capability of the HTBPT-Lomax model and its competitors, certain discrimination measures (DMs) and goodness-of-fit tests with their respective p values are considered. The DMs are given by

(i)
The AIC (Akaike information criterion):
$\begin{matrix} AIC = 2 k - 2 Δ . \end{matrix}$ (15)
(ii)
The CAIC (corrected Akaike information criterion):
$\begin{matrix} CAIC = \frac{2 k n}{n - k - 1} - 2 Δ . \end{matrix}$ (16)
(iii)
The BIC (Bayesian information criterion):
$\begin{matrix} BIC = k \log (n) - 2 Δ . \end{matrix}$ (17)
(iv)
The HQIC (Hannan–Quinn information criterion):
$\begin{matrix} HQIC = 2 k \log (\log (n)) - 2 Δ, \end{matrix}$ (18)
where Δ represents the maximized log-likelihood, k the number of model parameters, and n the sample size. The other statistical tests are given by
(v)
The AD (Anderson–Darling) test statistic:
$\begin{matrix} AD = - n - \frac{1}{n} \sum_{l = 1}^{n} (2 l - 1) [\log P (y_{l}) + \log \{1 - P (y_{n - l + 1})\}] . \end{matrix}$ (19)
(vi)
The CM (Cramér–von Mises) test statistic:
$\begin{matrix} CM = \frac{1}{12 n} + \sum_{l = 1}^{n} {[\frac{2 l - 1}{2 n} - P (y_{l})]}^{2} . \end{matrix}$ (20)
(vii)
The KS (Kolmogorov–Smirnov) test statistic:
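$\begin{matrix} KS = \sup_{y} | P_{n} (y) - P (y) | , \end{matrix}$ (21)
where P_n(y) is the empirical CDF and, in (19) and (20), y_1 ≤ y_2 ≤ ⋯ ≤ y_n denote the ordered observations.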
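
The four information criteria (15)-(18) are simple functions of the maximized log-likelihood. In the sketch below, the helper `info_criteria` is hypothetical, and the log-likelihood is an assumption: it is back-computed from the AIC reported for the HTBPT-Lomax model in Table 5 with k = 3 and n = 130, so the AIC is reproduced by construction and only the other three values are a genuine check.

```r
# Information criteria (15)-(18); loglik is the maximized log-likelihood,
# k the number of estimated parameters, n the sample size
info_criteria <- function(loglik, k, n) {
  c(AIC  = 2 * k - 2 * loglik,
    CAIC = 2 * k * n / (n - k - 1) - 2 * loglik,
    BIC  = k * log(n) - 2 * loglik,
    HQIC = 2 * k * log(log(n)) - 2 * loglik)
}

# Assumption: log-likelihood back-computed from the AIC reported in Table 5
# for the HTBPT-Lomax model (k = 3 parameters), with n = 130 observations
loglik_htbpt <- (2 * 3 - 530.80220) / 2
ic <- info_criteria(loglik_htbpt, k = 3, n = 130)
round(ic, 4)
```

The CAIC, BIC, and HQIC obtained this way can be compared with the HTBPT-Lomax row of Table 5.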

For given data, the model with the larger p value and the smaller values of the DMs and test statistics provides the best fit. Table 4 reports the MLEs of the models fitted to the YouTube advertising data. The values of the DMs and of the test statistics are listed in Tables 5 and 6, respectively. From Tables 5 and 6, we observe that the HTBPT-Lomax model is the best among the fitted models, as it has the smallest values of the DMs and test statistics and the largest p value. This shows the usefulness of the HTBPT-Lomax distribution for data related to financial events.
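
The test statistics (19)-(21) can be computed for any fitted CDF with a short helper. The function `gof_stats` below is a hypothetical sketch, with KS taken as the two-sided statistic (the largest absolute gap between the empirical and fitted CDFs); it is checked on a toy uniform example rather than on the paper's data.

```r
# Goodness-of-fit statistics (19)-(21) for a fitted CDF `cdf`;
# the observations are sorted first, as the formulas assume ordered data
gof_stats <- function(y, cdf, ...) {
  y <- sort(y)
  n <- length(y)
  l <- seq_len(n)
  p <- cdf(y, ...)
  ad <- -n - mean((2 * l - 1) * (log(p) + log(1 - rev(p))))
  cm <- 1 / (12 * n) + sum(((2 * l - 1) / (2 * n) - p)^2)
  ks <- max(l / n - p, p - (l - 1) / n)
  c(AD = ad, CM = cm, KS = ks)
}

# Toy check with the standard uniform CDF and points placed at (2l - 1) / (2n):
# the sum in CM vanishes, leaving 1 / (12 n), and KS equals 1 / (2 n)
n <- 10
u <- (2 * seq_len(n) - 1) / (2 * n)
out <- gof_stats(u, punif)
out
```

For a fitted model, `cdf` would be the model CDF evaluated at the estimated parameters, passed through the `...` arguments.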

Table 4.

The estimated values of the parameters corresponding to the YouTube advertising data.

Model	λ₁	λ₂	β	θ
HTBPT-Lomax	40.19293	0.01498	0.02453	—
Lomax	37.76721	0.00834	—	—
E-Lomax	40.94021	0.01188	—	2.11716


Table 5.

The DMs of the fitted models corresponding to the YouTube advertising data.

Model	AIC	CAIC	BIC	HQIC
HTBPT-Lomax	530.80220	530.99260	539.40480	534.29770
Lomax	569.18910	569.28360	574.92420	571.51950
E-Lomax	540.77130	540.96180	549.37390	544.26680


Table 6.

The analytical measures of the fitted models corresponding to the YouTube advertising data.

Model	AD	CM	KS	p-value
HTBPT-Lomax	0.30471	2.12335	0.09510	0.19030
Lomax	0.42388	2.83853	0.19251	0.13070
E-Lomax	0.46870	3.10789	0.10482	0.11490


In addition to the numerical results provided in Tables 5 and 6, a visual comparison of the competing models is provided in Figures 7 and 8, which show the probability-probability (P-P) and Q-Q plots of the fitted distributions: HTBPT-Lomax (red line), Lomax (blue line), and E-Lomax (green line).

Figure 7. The P-P plots of the HTBPT-Lomax, Lomax, and E-Lomax models.

Figure 8. The Q-Q plots of the HTBPT-Lomax, Lomax, and E-Lomax models.

5. Concluding Remarks

This research studied the relationship between social media marketing and sales. In particular, we studied the effect of YouTube advertising on sales and profit. The data were analyzed using a simple linear regression model together with two statistical tests, namely the t-test and the F-test. Based on these tools, a positive relationship between YouTube advertising and sales was observed. A correlation test was also performed, and it likewise found a positive correlation between YouTube advertising and sales. A positive correlation means that higher spending on YouTube advertising is associated with higher sales and profit. Finally, the HTBPT-Lomax distribution was applied to model the YouTube advertising data. Based on certain statistical tools, it is shown that the HTBPT-Lomax model outperformed its competitors.

Appendix

The R code used for the analysis in Section 4.3 is as follows:

library(AdequacyModel)  # assumed source of goodness.fit(); not loaded in the original listing
# Read the first column of a user-selected CSV file and drop missing values
y <- read.csv(file.choose(), header = TRUE)
y <- y[, 1]
y <- y[!is.na(y)]
data <- y
data
#-----------------------------------------------------------------
#-------------- PDF of the HTBPT-Lomax model, equation (11)
#-----------------------------------------------------------------
pdf_pm <- function(par, x)
{
  Lambda1 <- par[1]
  Lambda2 <- par[2]
  beta <- par[3]
  Lambda1 * Lambda2 * ((1 + Lambda2 * x)^(-Lambda1 - 1)) *
    (beta - (log(beta)) * (beta^((1 + Lambda2 * x)^(-Lambda1))))
}
#-----------------------------------------------------------------
#-------------- CDF of the HTBPT-Lomax model, equation (10)
#-----------------------------------------------------------------
cdf_pm <- function(par, x)
{
  Lambda1 <- par[1]
  Lambda2 <- par[2]
  beta <- par[3]
  (beta^((1 + Lambda2 * x)^(-Lambda1))) - beta * ((1 + Lambda2 * x)^(-Lambda1))
}
set.seed(0)
# Fit the model by maximum likelihood (BFGS) and compute the goodness-of-fit measures
goodness.fit(pdf = pdf_pm, cdf = cdf_pm,
             starts = c(0.5, 0.5, 0.5), data = data,
             method = "BFGS", domain = c(0, Inf), mle = NULL)

Data Availability

The data set is available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest to report regarding the present study.

Authors' Contributions

Yang Zhou and Zubair Ahmad conceptualized the study. Zubair Ahmad developed methodology. Yang Zhou, Zubair Ahmad, Hassan Alsuhabi, and M. Yusuf wrote the original draft. Zubair Ahmad, Ibrahim Alkhairy, and A. M. Sharawy were responsible for formal analysis. Yang Zhou supervised the study. Yang Zhou, M. Yusuf, and A. M. Sharawy investigated the study. Zubair Ahmad, Ibrahim Alkhairy, and Hassan Alsuhabi reviewed and edited the manuscript.

References

1. Alves H., Fernandes C., Raposo M. Social media marketing: a literature review and implications. Psychology and Marketing. 2016;33(12):1029–1038. doi: 10.1002/mar.20936.
2. Abishovna B. A. The principle of effective marketing management. Procedia-Social and Behavioral Sciences. 2014;109:1322–1325. doi: 10.1016/j.sbspro.2013.12.632.
3. Seggie S. H., Cavusgil E., Phelan S. E. Measurement of return on marketing investment: a conceptual framework and the future of marketing metrics. Industrial Marketing Management. 2007;36(6):834–841. doi: 10.1016/j.indmarman.2006.11.001.
4. Dwivedi Y. K., Kapoor K. K., Chen H. Social media marketing and advertising. The Marketing Review. 2015;15(3):289–309. doi: 10.1362/146934715x14441363377999.
5. Djafarova E., Matson N. Credibility of digital influencers on YouTube and Instagram. International Journal of Internet Marketing and Advertising. 2021;15(2):131–148. doi: 10.1504/ijima.2021.114338.
6. Pleyers G., Vermeulen N. How does interactivity of online media hamper ad effectiveness. International Journal of Market Research. 2021;63(3):335–352. doi: 10.1177/1470785319867640.
7. Semerádová T., Weinlich P. The (in)effectiveness of in-stream video ads. In: Research Anthology on Strategies for Using Social Media as a Service and Tool in Business. Pennsylvania, PA, USA: IGI Global; 2021.
8. Acikgoz F., Burnaz S. The influence of “influencer marketing” on YouTube influencers. International Journal of Internet Marketing and Advertising. 2021;15(2):201–219. doi: 10.1504/ijima.2021.114331.
9. Al-Maroof R., Ayoubi K., Alhumaid K., et al. The acceptance of social media video for knowledge acquisition, sharing and application: a comparative study among YouTube users and TikTok users’ for medical purposes. International Journal of Data and Network Science. 2021;5(3):197–214. doi: 10.5267/j.ijdns.2021.6.013.
10. Abdelkader O. A. A study of “forced-ad resistance” leading to “Skip Ad” on YouTube. Turkish Journal of Computer and Mathematics Education. 2021;12(10):7263–7271.
11. Scrucca L., Santucci A., Aversa F. Regression modeling of competing risk using R: an in depth guide for clinicians. Bone Marrow Transplantation. 2010;45(9):1388–1395. doi: 10.1038/bmt.2009.359.
12. Núñez E., Steyerberg E. W., Núñez J. Regression modeling strategies. Revista Española de Cardiología (English Edition). 2011;64(6):501–507. doi: 10.1016/j.rec.2011.01.017.
13. Zhang Z. Residuals and regression diagnostics: focusing on logistic regression. Annals of Translational Medicine. 2016;4(10):195–198. doi: 10.21037/atm.2016.03.36.
14. Zhu D., Galbraith J. W. A generalized asymmetric Student-t distribution with application to financial econometrics. Journal of Econometrics. 2010;157(2):297–305. doi: 10.1016/j.jeconom.2010.01.013.
15. Marchant C., Bertin K., Leiva V., Saulo H. Generalized Birnbaum-Saunders kernel density estimators and an analysis of financial data. Computational Statistics & Data Analysis. 2013;63:1–15. doi: 10.1016/j.csda.2013.01.013.
16. Nadarajah S., Bakar S. A. A. New composite models for the Danish fire insurance data. Scandinavian Actuarial Journal. 2014;2014(2):180–187. doi: 10.1080/03461238.2012.695748.
17. Theodossiou P. Skewed generalized error distribution of financial assets and option pricing. Multinational Finance Journal. 2015;19(4):223–266. doi: 10.17578/19-4-1.
18. Bhati D., Ravi S. On generalized log-Moyal distribution: a new heavy tailed size distribution. Insurance: Mathematics and Economics. 2018;79:247–259. doi: 10.1016/j.insmatheco.2018.02.002.
19. Punzo A., Mazza A., Maruotti A. Fitting insurance and economic data with outliers: a flexible approach based on finite mixtures of contaminated gamma distributions. Journal of Applied Statistics. 2018;45(14):2563–2584. doi: 10.1080/02664763.2018.1428288.
20. Punzo A. A new look at the inverse Gaussian distribution with applications to insurance and economic data. Journal of Applied Statistics. 2019;46(7):1260–1287. doi: 10.1080/02664763.2018.1542668.
21. Ahmad Z., Mahmoudi E., Hamedani G. A class of claim distributions: properties, characterizations and applications to insurance claim data. Communications in Statistics-Theory and Methods. 2020;49:1–26. doi: 10.1080/03610926.2020.1772306.
22. Ahmad Z., Mahmoudi E., Kharazmi O. On modeling the earthquake insurance data via a new member of the TX family. Computational Intelligence and Neuroscience. 2020;2020:20. doi: 10.1155/2020/7631495.
23. Ahmad Z., Mahmoudi E., Hamedani G. G., Kharazmi O. New methods to define heavy-tailed distributions with applications to insurance data. Journal of Taibah University for Science. 2020;14(1):359–382. doi: 10.1080/16583655.2020.1741942.
24. Punzo A., Bagnato L. Modeling the cryptocurrency return distribution via Laplace scale mixtures. Physica A: Statistical Mechanics and its Applications. 2021;563. doi: 10.1016/j.physa.2020.125354.
25. Tung Y. L., Ahmad Z., Hamedani G. G. On modeling the medical care insurance data via a new statistical model. CMC-Computers Materials & Continua. 2021;66(1):113–126.
26. Zhao J., Ahmad Z., Mahmoudi E., Hafez E. H., Mohie El-Din M. M. A new class of heavy-tailed distributions: modeling and simulating actuarial measures. Complexity. 2021;2021:18. doi: 10.1155/2021/5580228.
27. Ahmad Z., Hamedani G. G., Butt N. S. Recent developments in distribution theory: a brief survey and some new generalized classes of distributions. Pakistan Journal of Statistics and Operation Research. 2019;15(1):87–110. doi: 10.18187/pjsor.v15i1.2803.
28. Zhao J., Faqiri H., Ahmad Z., Emam W., Yusuf M., Sharawy A. M. The Lomax-claim model: bivariate extension and applications to financial data. Complexity. 2021;2021. doi: 10.1155/2021/9993611.

Data Availability Statement

The data set is available from the corresponding author upon request.
