Data in Brief. 2017 Oct 6;15:459–468. doi: 10.1016/j.dib.2017.09.055

Datasets on the statistical properties of the first 3000 squared positive integers

Hilary I Okagbue a, Muminu O Adamu b, Pelumi E Oguntunde a, Abiodun A Opanuga a, Ayodele A Adebiyi c, Sheila A Bishop a
PMCID: PMC5647521  PMID: 29062870

Abstract

The data in this article arise from a search for alternative routes, beyond the traditional number theory approach, to deepen researchers’ understanding of integers. The article presents the statistical properties of the digits sum of the first 3000 squared positive integers and describes the statistical tools applied to reveal the statistical and random nature of these digits sums. Here, the digits sum of an integer is the sum of all the digits that make up that integer.

Keywords: Positive integer, Digits sum, Harrell-Davis quantiles, Boxplots, Bootstrap, M-estimators, Confidence intervals, Curve estimation, Model fit


Specifications Table

Subject area: Mathematics
More specific subject area: Number Statistics, Computational number theory
Type of data: Tables and Figures
How data was acquired: The raw data is available in mathematical literature
Data format: Analyzed
Experimental factors: Zero and negative integers were not considered
Experimental features: Exploratory data analysis, mathematical computation
Data source location: Covenant University Mathematics Laboratory, Ota, Nigeria
Data accessibility: All the data are in this data article

Value of the data

  • The data provides the exploratory statistics of digits sum of squared positive integers and their subsets.

  • This technique of analysis can be used in data reduction.

  • The data analysis can be applied to other known numbers.

  • The data when completely analyzed can help deepen the understanding of the random nature of integers.

1. Data

The data describe the statistical properties of the digits sum of the first 3000 squared positive integers and of two subsets: the even and the odd positive integers. The two subsets have nearly identical descriptive statistics, which are summarized together with those of the full data set in Fig. 1, Fig. 2, Fig. 3:
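As a minimal sketch of how the data set described above could be regenerated, the following Python code computes the digits sum of n² for n = 1, ..., 3000 and splits the result into the even and odd subsets. The article does not state which software the authors used, so the function names `digit_sum` and `squared_digit_sums` are illustrative assumptions rather than the authors' implementation.

```python
from statistics import mean, median


def digit_sum(n: int) -> int:
    """Sum of the decimal digits of a non-negative integer."""
    return sum(int(ch) for ch in str(n))


def squared_digit_sums(limit: int = 3000) -> dict:
    """Map each n in 1..limit to the digits sum of n squared."""
    return {n: digit_sum(n * n) for n in range(1, limit + 1)}


sums = squared_digit_sums()
total = list(sums.values())
even = [s for n, s in sums.items() if n % 2 == 0]  # squared even integers
odd = [s for n, s in sums.items() if n % 2 == 1]   # squared odd integers

# Print count, mean and median for the full data set and both subsets.
for name, values in (("total", total), ("even", even), ("odd", odd)):
    print(name, len(values), round(mean(values), 3), median(values))
```

The printed means and medians can be compared with the summary statistics reported in Fig. 1, Fig. 2 and Fig. 3.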

Fig. 1. The summary statistics of the digits sum of squared positive integers. Remark: The gaps observed in the histogram arise because the digits sum of a squared positive integer cannot take certain values, such as 2, 3, 5, 6, 8, 11, 12, 14 and 15.
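The gaps noted in the remark to Fig. 1 have a simple arithmetic explanation: an integer and its digits sum leave the same remainder on division by 9, and a perfect square can only leave a remainder of 0, 1, 4 or 7. The short check below is an illustrative sketch (not part of the original analysis) that confirms this for the first 3000 squares.

```python
def digit_sum(n: int) -> int:
    """Sum of the decimal digits of a non-negative integer."""
    return sum(int(ch) for ch in str(n))


# Distinct digits sums that actually occur among the first 3000 squares.
observed = {digit_sum(n * n) for n in range(1, 3001)}

# Remainders modulo 9 of the observed digits sums.
residues = sorted({s % 9 for s in observed})

# Values from 1 to 15 that never occur as a digits sum of a square.
missing = [v for v in range(1, 16) if v not in observed]

print(residues)  # [0, 1, 4, 7]
print(missing)   # [2, 3, 5, 6, 8, 11, 12, 14, 15]
```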

Fig. 2. The summary statistics of the digits sum of squared even positive integers. Remark: It can be seen that the mean and median of the data set are almost the same.

Fig. 3. The summary statistics of the digits sum of squared odd positive integers. Remark: Here, the mean and median of the data set are the same.

2. Experimental design, materials and methods

The digits sum (or digital sum) of integers has been a subject of interest because of its applications in cryptography, primality testing, random number generation and data reduction. Details on the origin, theories and applications of the digits sum of squared positive integers, of integers in general and of other important number sequences can be found in [1]–[28]. Recently, the digits sum and digital root have been applied in the analysis of lotto results [29].

2.1. Exploratory data analysis

The percentiles are estimated with the Harrell-Davis quantile estimator. It is preferred here over conventional sample percentiles, which are based on one or two order statistics, because it uses all of the data through a weighted combination of every order statistic. The Harrell-Davis quantiles of the digits sum of squared positive integers are shown in Fig. 4.

Fig. 4. Harrell-Davis quantiles.
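For readers who wish to reproduce quantiles of this kind, the sketch below implements the Harrell-Davis estimator using only the Python standard library. The estimator weights the i-th order statistic by the probability that a Beta((n+1)p, (n+1)(1−p)) variable falls in the interval [(i−1)/n, i/n]. As a simplifying assumption, that probability is approximated here with Simpson's rule and the weights are renormalized; the helper names are hypothetical, and a library routine for the incomplete beta function would be more accurate.

```python
import math


def _beta_pdf(x: float, a: float, b: float) -> float:
    """Beta(a, b) density, evaluated through logarithms to avoid overflow."""
    if x <= 0.0 or x >= 1.0:
        return 0.0  # assumption: a > 1 and b > 1, so the density vanishes here
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log1p(-x))


def harrell_davis(data, p: float) -> float:
    """Approximate Harrell-Davis estimate of the p-th quantile (0 < p < 1)."""
    xs = sorted(data)
    n = len(xs)
    a, b = (n + 1) * p, (n + 1) * (1 - p)
    weights = []
    for i in range(1, n + 1):
        lo, hi = (i - 1) / n, i / n
        mid = (lo + hi) / 2
        # Simpson's rule for the Beta probability of the interval [lo, hi].
        area = (hi - lo) / 6 * (
            _beta_pdf(lo, a, b) + 4 * _beta_pdf(mid, a, b) + _beta_pdf(hi, a, b)
        )
        weights.append(area)
    total = sum(weights)  # renormalize so the weights sum to one
    return sum(w * x for w, x in zip(weights, xs)) / total


sums = [sum(int(c) for c in str(n * n)) for n in range(1, 3001)]
for p in (0.01, 0.05, 0.25, 0.50, 0.75, 0.95, 0.99):
    print(p, round(harrell_davis(sums, p), 3))
```

Because every order statistic receives a non-zero weight, the estimates vary smoothly with p even though the underlying digits sums are integers.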

Bootstrap methods are useful for constructing accurate and reliable confidence intervals (C.I.s) for unknown or complicated probability distributions. The data were resampled many times, and C.I.s were generated for the mean and the standard deviation. The bootstrap results differ only slightly from the observed mean and standard deviation, and the intervals narrow around the observed values as the confidence level decreases. These are shown in Table 1, Table 2:

Table 1.

The bootstrap confidence interval for the mean of the digits sum of square of positive integers.

Confidence level (%) Lower limit Upper limit
99 27.02 27.76
98 27.03 27.77
97 27.07 27.75
96 27.08 27.72
95 27.10 27.70
94 27.12 27.68
93 27.12 27.70
92 27.12 27.66
91 27.12 27.66
90 27.14 27.64

Table 2.

The bootstrap confidence interval for the standard deviation of the digits sum of square of positive integers.

Confidence level (%) Lower limit Upper limit
99 8.22 8.763
98 8.246 8.735
97 8.262 8.715
96 8.261 8.709
95 8.292 8.700
94 8.308 8.693
93 8.29 8.689
92 8.325 8.681
91 8.316 8.66
90 8.311 8.674
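The intervals in Table 1 and Table 2 can be approximated with a simple percentile bootstrap, sketched below. The article does not say which bootstrap variant, number of resamples or software was used, so the percentile method, the 500 resamples, the fixed seed and the function names are all assumptions made for illustration; the resulting limits will differ slightly from the published ones.

```python
import math
import random
from statistics import fmean


def std_dev(values) -> float:
    """Population standard deviation (divides by n)."""
    m = fmean(values)
    return math.sqrt(fmean([(v - m) ** 2 for v in values]))


def bootstrap_estimates(data, stat, n_resamples: int = 500, seed: int = 0):
    """Sorted values of `stat` over resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(data)
    return sorted(stat(rng.choices(data, k=n)) for _ in range(n_resamples))


def percentile_interval(estimates, level: float):
    """Percentile bootstrap interval at the given confidence level."""
    m = len(estimates)
    alpha = (1 - level) / 2
    lower = estimates[int(alpha * m)]
    upper = estimates[min(m - 1, int((1 - alpha) * m))]
    return lower, upper


data = [sum(int(c) for c in str(n * n)) for n in range(1, 3001)]
means = bootstrap_estimates(data, fmean)
sds = bootstrap_estimates(data, std_dev)
for level in (0.99, 0.95, 0.90):
    print(level, percentile_interval(means, level), percentile_interval(sds, level))
```

Reusing one set of bootstrap estimates for every confidence level guarantees that the intervals are nested, which mirrors the narrowing visible in the tables as the confidence level drops from 99% to 90%.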

The bootstrap estimate of the mean is close to the observed one, while the median remained unchanged. This is evidence of the robustness of the median and its resistance to the undue influence of outliers. It also agrees with the bootstrap confidence limits. The summary is shown in Table 3.

Table 3.

Estimation results of bootstrap of the mean and median of digits sum of squared positive integers.

Statistic  P1      P5     Q1      Q2 (estimate)  Q3      P95     P99     S.D.     I.Q.R.
Mean       27.039  27.14  27.278  27.398         27.487  27.639  27.712  0.15221  0.20933
Median     27      27     27      27             27      27      27      0        0

P1 = first percentile, P5 = fifth percentile, Q1 = first quartile, Q2 = second quartile or the estimate, Q3 = third quartile, P95 = ninety-fifth percentile, P99 = ninety-ninth percentile, S.D. = standard deviation, I.Q.R. = interquartile range.

The M-estimators are checked for convergence to the mean or the median. M-estimators are robust and resistant to the undue effect of outliers. Technically, an M-estimator can be regarded as the fixed point of its estimating function. The results of the M-estimators for the digits sum of the first 3000 squared positive integers are summarized in Table 4.

Table 4.

The M-estimators for the first 3000 squared positive integers.

Variable                                           Huber's M-estimator (a)  Tukey's biweight (b)  Hampel's M-estimator (c)  Andrews' wave (d)
Sum of the digits of the squared positive integer  27.43                    27.44                 27.42                     27.44

(a) The weighting constant is 1.339.
(b) The weighting constant is 4.685.
(c) The weighting constants are 1.700, 3.400 and 8.500.
(d) The weighting constant is 1.340π.

Remark: The four M-estimators are nearly identical and lie closer to the mean than to the median. This is an indication of the irregular behavior of the distribution.
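To illustrate the fixed-point view of an M-estimator, the sketch below computes Huber's M-estimator of location by iteratively reweighted averaging, using the weighting constant 1.339 quoted for Table 4. Several choices are assumptions, since the article does not describe the algorithm: the starting value is the median, the scale is the median absolute deviation divided by 0.6745 and is held fixed, and the function name is illustrative.

```python
from statistics import median


def huber_location(data, k: float = 1.339, tol: float = 1e-8, max_iter: int = 200):
    """Huber M-estimate of location via iteratively reweighted means."""
    mu = median(data)
    # Robust scale: normalized median absolute deviation (held fixed).
    scale = median([abs(x - mu) for x in data]) / 0.6745
    if scale == 0:
        return mu
    for _ in range(max_iter):
        # Observations within k*scale of mu get weight 1, others are downweighted.
        weights = [
            1.0 if x == mu else min(1.0, k * scale / abs(x - mu)) for x in data
        ]
        new_mu = sum(w * x for w, x in zip(weights, data)) / sum(weights)
        if abs(new_mu - mu) < tol:  # fixed point reached
            return new_mu
        mu = new_mu
    return mu


sums = [sum(int(c) for c in str(n * n)) for n in range(1, 3001)]
print(round(huber_location(sums), 2))
```

The iteration stops when the weighted mean reproduces itself, which is the fixed-point property mentioned in the text. Tukey's biweight, Hampel's estimator and Andrews' wave differ only in the weight function applied to each residual.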

The boxplot is an exploratory data analysis tool used to display graphically the quantiles of a given numerical data set. Outliers or extreme values are easily identified in the data and displayed graphically. The boxplots of the digits sums of squared positive integers and their subsets are shown in Fig. 5:

Fig. 5. Boxplot summary of the digits sum of the first 3000 squared positive integers.

The data are slightly skewed to the left in all three cases, with some outliers appearing in the total. As the sample size increases, values below the mean occur less frequently and more outliers can be obtained. On the other hand, more distinct values are expected to appear as the sample size increases.
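The outliers flagged by a boxplot are conventionally the points lying more than 1.5 interquartile ranges beyond the quartiles. The sketch below applies that rule; the quartile definition (the default "exclusive" method of Python's `statistics.quantiles`) and the function name are assumptions, and other software may compute the quartiles slightly differently.

```python
from statistics import quantiles


def tukey_outliers(data, whisker: float = 1.5):
    """Return the points outside the boxplot fences and the fences themselves."""
    q1, _, q3 = quantiles(data, n=4)  # first quartile, median, third quartile
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    outliers = [x for x in data if x < low or x > high]
    return outliers, (low, high)


sums = [sum(int(c) for c in str(n * n)) for n in range(1, 3001)]
outliers, fences = tukey_outliers(sums)
print(fences, sorted(set(outliers)))
```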

Particular patterns can be depicted through the use of individual value plots of observations. Some unique patterns were obtained for the even, odd and total squared positive integers. These are shown in Fig. 6, Fig. 7, Fig. 8:

Fig. 6. Individual value plot of digits sum of the first 3000 squared positive integers. Remark: Some gaps in the plot are consistent with the gaps in the histogram. Some extreme values are also noticed in the plot.

Fig. 7. Individual value plot of digits sum of the squared even positive integers.

Fig. 8. Individual value plot of digits sum of the squared odd positive integers. Remark: The plots for the even and odd are identical.

The mean plot and median plot are shown in Fig. 9a and b.

Fig. 9. (a) The mean plot; (b) the median plot.

The mean plot shows the behavior of the mean, which agrees closely with the bootstrap estimates and the bootstrap confidence intervals. As expected, the median plot indicates the robustness of the median.

Winsorizing and trimming are two ways of achieving robustness. The robustness of the central tendency (mean) of the digits sum of the first 3,000 squared positive integers was considered. These are shown in Fig. 10, Fig. 11.

Fig. 10. The Winsorized mean and robustness.

Fig. 11. The trimmed mean and robustness.

The data is robust because the possibility of obtaining outliers or extreme values decreases as more values are expected to cluster around the mean. As the sample size increases, the extreme values become fewer. In the case of trimming, the same result is obtained since there are few extreme values to exclude from the analysis.
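The two robust means discussed above can be sketched as follows. Trimming discards a fixed proportion of the smallest and largest values, whereas Winsorizing replaces them with the nearest retained value. The function names and the choice of proportions are illustrative assumptions; the article does not state the proportions used for Fig. 10 and Fig. 11.

```python
from statistics import fmean


def trimmed_mean(data, proportion: float) -> float:
    """Mean after discarding `proportion` of the values from each tail."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    xs = sorted(data)
    k = int(proportion * len(xs))  # number of values cut from each tail
    return fmean(xs[k:len(xs) - k])


def winsorized_mean(data, proportion: float) -> float:
    """Mean after replacing each tail with the nearest retained value."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    xs = sorted(data)
    n = len(xs)
    k = int(proportion * n)
    core = xs[k:n - k]
    return fmean([core[0]] * k + core + [core[-1]] * k)


sums = [sum(int(c) for c in str(n * n)) for n in range(1, 3001)]
for prop in (0.0, 0.05, 0.10, 0.25):
    print(prop, round(trimmed_mean(sums, prop), 3), round(winsorized_mean(sums, prop), 3))
```

If the two robust means stay close to the ordinary mean as the proportion grows, few extreme values are influencing the average, which is the behavior the text describes.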

2.2. Curve estimation

Several curve estimation models are available for fitting a given data set. The results of fitting the digits sum of the first 3000 squared positive integers with these models are shown in Table 5.

Table 5.

Model fit for the digits sum of the first 3000 squared positive integers.

Model        R      Adjusted R square  SE of the estimate
Linear       0.466  0.217              7.516
Logarithmic  0.488  0.238              7.416
Inverse      0.179  0.032              8.359
Quadratic    0.472  0.223              7.489
Cubic        0.493  0.243              7.392
Compound     0.442  0.195              0.341
Power        0.516  0.266              0.326
S            0.300  0.090              0.363
Growth       0.442  0.195              0.341
Exponential  0.442  0.195              0.341

Remark: The low values of R and adjusted R square indicate that the models barely fit the data and can give misleading results when used in prediction. Of the models considered, the power model provides the best fit and the inverse model the worst.
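As an illustration of one row of Table 5, the sketch below fits the power model y = a·x^b by ordinary least squares on the log-log scale. The article does not name the predictor, so taking x to be the integer n and y the digits sum of n² is an assumption, as is the function name; the output is not guaranteed to reproduce the published figures.

```python
import math


def fit_power_model(xs, ys):
    """Least-squares fit of y = a * x**b after taking logs of both sides."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    syy = sum((v - my) ** 2 for v in ly)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    b = sxy / sxx                       # slope on the log-log scale
    a = math.exp(my - b * mx)           # back-transformed intercept
    r_squared = sxy * sxy / (sxx * syy)  # R square on the log-log scale
    return a, b, r_squared


ns = list(range(1, 3001))
sums = [sum(int(c) for c in str(n * n)) for n in ns]
a, b, r2 = fit_power_model(ns, sums)
print(round(a, 3), round(b, 3), round(r2, 3))
```

Because the fit is carried out on logarithms, its residuals are measured in log units. This may explain why the standard errors of the compound, power, S, growth and exponential models in Table 5 are on a different scale from those of the other models.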

2.3. Probability distribution fit

The digits sum of the first 3000 squared positive integers is best fitted by the Cauchy distribution, and the details are shown in Table 6. The fit was obtained using EasyFit software.

Table 6.

Summary of the data fit.

Parameter Estimated value Standard deviation
Location 27.31296 0.129009
Scale 4.85180 0.122221

Remark: The data exhibit the characteristics of the Cauchy distribution; the Kolmogorov-Smirnov goodness-of-fit statistic is 0.08616.
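The Kolmogorov-Smirnov statistic is the largest vertical distance between the empirical distribution function of the data and the fitted distribution function. The sketch below evaluates it for a Cauchy distribution with the location and scale reported in Table 6. It is an independent illustration with hypothetical function names, and it may not match the EasyFit value exactly because implementations can treat tied values differently.

```python
import math


def cauchy_cdf(x: float, location: float, scale: float) -> float:
    """Cumulative distribution function of the Cauchy distribution."""
    return 0.5 + math.atan((x - location) / scale) / math.pi


def ks_statistic(data, cdf) -> float:
    """Largest gap between the empirical CDF and a reference CDF."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # Compare the reference CDF with the empirical CDF just after
        # and just before each observation.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


sums = [sum(int(c) for c in str(n * n)) for n in range(1, 3001)]
d = ks_statistic(sums, lambda x: cauchy_cdf(x, 27.31296, 4.85180))
print(round(d, 5))
```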

2.4. Mathematical computational results

The raw data (the digits sums of the first 3000 squared positive integers) can be used to generate another set of numbers by taking the absolute value of the difference of each pair of consecutive values, so that each step yields one fewer value than the previous one. The process was repeated until the mode and the median were both equal to one, because further steps add little or nothing to the analysis and stopping there saves computational time. Normality is reduced by the process, as evidenced by the increase in kurtosis and skewness. This is shown in Table 7.

Table 7.

Summary of the mathematical computation result.

Data Count Sum Average Variance Median Mode Kurtosis Skewness
Raw 3000 82193 27.39767 72.15358 27 27 −0.156 −0.06
1 2999 19446 6.484161 22.55003 6 2 1.228 0.988
2 2998 16061 5.357238 17.64978 4 2 2.196 1.278
3 2997 11968 3.993327 10.51798 4 5 4.045 1.563
4 2996 9012 3.008011 10.12681 2 1 4.740 1.905
5 2995 7218 2.410017 6.214596 2 1 7.225 2.235
6 2994 6173 2.06179 6.199655 1 1 7.410 2.312
7 2993 5261 1.757768 4.47105 1 1 10.190 2.618
8 2992 5140 1.717914 4.635881 1 1 10.377 2.711
9 2991 3949 1.320294 2.744869 1 1 13.765 2.954
10 2990 3831 1.281271 2.979743 1 1 14.716 3.098
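The differencing procedure summarized in Table 7 can be sketched in a few lines. The code below is an illustrative reconstruction under the assumption that each step takes absolute differences of consecutive values of the previous step; the function name is hypothetical, and the printed counts, sums, medians and modes can be compared with the corresponding columns of the table.

```python
from statistics import median, multimode


def abs_differences(values):
    """Absolute differences of consecutive values (one fewer element)."""
    return [abs(b - a) for a, b in zip(values, values[1:])]


data = [sum(int(c) for c in str(n * n)) for n in range(1, 3001)]
print("Raw", len(data), sum(data), median(data), multimode(data))

for step in range(1, 11):
    data = abs_differences(data)
    print(step, len(data), sum(data), median(data), multimode(data))
```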

Acknowledgements

This research is sponsored by the following: Covenant University Centre for Research, Innovation and Discovery and Statistics Sub Cluster of the Software Engineering, Modeling and Intelligent System Research Cluster of Covenant University.

Footnotes

Transparency document

Transparency data associated with this article can be found in the online version at doi:10.1016/j.dib.2017.09.055.


References

  • 1. Grabowski A. On square-free numbers. Formaliz. Math. 2013;21(2):153–162.
  • 2. Oh B.K., Sun Z.W. Mixed sums of squares and triangular numbers III. J. Number Theory. 2009;129(4):964–969.
  • 3. Hickerson D., Kleber M. Reducing a set of subtracting squares. J. Integer Seq. 1999;2(2) (Article 99.1.4).
  • 4. Weisstein E.W. Square number. MathWorld. 2002.
  • 5. Okagbue H.I., Adamu M.O., Iyase S.A., Opanuga A.A. Sequence of Integers generated by Summing the digits of their Squares. Indian J. Sci. Technol. 2015;8(15) (art 69912).
  • 6. Bishop S.A., Okagbue H.I., Adamu M.O., Olajide F.A. Sequences of numbers obtained by digit and iterative digit sums of Sophie Germain primes and its variants. Glob. J. Pure Appl. Math. 2016;12(2):1473–1480.
  • 7. Bishop S.A., Okagbue H.I., Adamu M.O., Opanuga A.A. Patterns obtained from digit and iterative digit sums of Palindromic, Repdigit and Repunit numbers, its variants and subsets. Glob. J. Pure Appl. Math. 2016;12(2):1481–1490.
  • 8. Okagbue H.I., Adamu M.O., Bishop S.A., Opanuga A.A. Digit and iterative digit sum of fibonacci numbers, their identities and powers. Int. J. Appl. Eng. Res. 2016;11(6):4623–4627.
  • 9. Farkas H.M. Sums of squares and triangular numbers. Online J. Anal. Comb. 2006;1(1):1–11.
  • 10. Browkin J., Brzeziński J. On sequences of squares with constant second differences. Can. Math. Bull. 2006;49(4):481–491.
  • 11. Cilleruelo J., Luca F., Rué J., Zumalacarregui A. On the sum of digits of some Sequences of integers. Cent. Eur. J. Math. 2013;11(1):188–195.
  • 12. Conway J.H., Guy R.K. The Books of Numbers. 1st edition. Springer-Verlag; NY: 1996. (ISBN: 0-387-97993-X).
  • 13. Liu J., Liu M.C., Zhan T. Squares of primes and powers of 2, II. J. Number Theory. 2002;92(1):99–116.
  • 14. Morgenbesser J. Gelfond’s sum of Digits Problems. Doctoral thesis of the Vienna University of Technology; 1982.
  • 15. Morgenbesser J. The sum of digits of squares in Z[i]. J. Number Theory. 2010;130(7):1433–1469.
  • 16. Allouche J.P., Shallit J.O. Sums of digits, overlaps and palindromes. Discret. Math. Theor. Comput. Sci. 2000;4(1):1–10.
  • 17. Cohn J.H.E. Square Fibonacci etc. Fibonacci Q. 1964;2(2):109–113.
  • 18. Ribenboim P. My Numbers, My Friends. Springer-Verlag; NY: 2000.
  • 19. Kaskin R., Karaath O. Some new properties of balancing numbers and squared triangular numbers. J. Integer Seq. 2012;15(1):1–13. (Article 12.1.4).
  • 20. Okagbue H.I., Adamu M.O., Oguntunde P.E., Opanuga A.A., Owoloko E.A., Bishop S.A. Datasets on the statistical and algebraic properties of primitive Pythagorean triples. Data Brief. 2017;14:686–694. doi: 10.1016/j.dib.2017.08.021.
  • 21. Alfred U. n and n+1 consecutive integers with equal sum of squares. Math. Mag. 1962;35(3):155–164.
  • 22. Latushkin Y.A., Ushakov V.N. On the representation of Fibonacci and Lucas numbers as the sum of three squares. Math. Notes. 2012;91(5):663–670.
  • 23. Sun Z.W. Sums of squares and triangular numbers. Acta Arith. 2007;127(2):103–113.
  • 24. Okagbue H.I., Adamu M.O., Bishop S.A., Opanuga A.A. Properties of sequences generated by summing the digits of cubed positive integers. Indian J. Nat. Sci. 2015;6(32):10190–10201.
  • 25. Okagbue H.I., Opanuga A.A., Oguntunde P.E., Eze G.A. Positive numbers divisible by their iterative digit sum revisited. Pac. J. Sci. Technol. 2017;18(1):101–106.
  • 26. Okagbue H.I., Opanuga A.A., Oguntunde P.E., Eze G.A. On some notes on the engel expansion of ratios of sequences obtained from the sum of digits of squared positive integers. Pac. J. Sci. Technol. 2017;18(1):97–100.
  • 27. Mishra V.N., Khatri K., Mishra L.N. Statistical approximation by Kantorovich type Discrete q-Beta operators. Adv. Differ. Equ. 2013;2013:345.
  • 28. Guo S., Pan H., Sun Z.W. Mixed sums of squares and triangular numbers. Elect. J. Comb. Number Theory. 2007;A56:1–5.
  • 29. Okagbue H.I., Adamu M.O., Oguntunde P.E., Opanuga A.A., Rastolgi M.K. Exploration of UK Lotto results classified into two periods. Data Brief. 2017;14:213–219. doi: 10.1016/j.dib.2017.07.037.
