`4.1.0`

Statistical methods in readable JavaScript for browsers, servers, and people.

When simple-statistics is included using a `script` tag, the methods listed are available as properties of an object named `ss`, so call `ss.min()` for the `min` method.

- Basic Descriptive Statistics
  - min
  - max
  - sum
  - sumSimple
  - quantile
  - product
- Sorted Basic Descriptive Statistics
  - minSorted
  - maxSorted
  - quantileSorted
- Measures of central tendency
  - mean
  - addToMean
  - mode
  - modeSorted
  - modeFast
  - median
  - medianSorted
  - harmonicMean
  - geometricMean
  - rootMeanSquare
  - sampleSkewness
- Measures of dispersion
  - variance
  - sampleVariance
  - standardDeviation
  - sampleStandardDeviation
  - medianAbsoluteDeviation
  - interquartileRange
  - sumNthPowerDeviations
  - zScore
- Similarity
  - sampleCorrelation
  - sampleCovariance
  - rSquared
- Linear Regression
  - linearRegression
  - linearRegressionLine
- Randomness
  - shuffle
  - shuffleInPlace
  - sampleWithReplacement
  - sample
- Classifiers
  - BayesianClassifier
  - PerceptronModel
- Distributions
  - bernoulliDistribution
  - binomialDistribution
  - poissonDistribution
  - chiSquaredDistributionTable
  - standardNormalTable
  - tTest
  - tTestTwoSample
  - cumulativeStdNormalProbability
- Errors
  - errorFunction
  - inverseErrorFunction
  - probit
- Breaks
  - ckmeans
  - equalIntervalBreaks
- Utilities
  - chunk
  - chiSquaredGoodnessOfFit
  - epsilon
  - factorial
  - uniqueCountSorted
  - bisect
  - combinationsReplacement
  - combinations
  - combineMeans
  - combineVariances
  - permutationsHeap
  - sampleKurtosis
  - subtractFromMean

The min is the lowest number in the array. This runs in `O(n)`, linear time with respect to the length of the array.

Parameters

Returns

`number`

:
minimum value
Throws

- Error: if the length of x is less than one

Example

`min([1, 5, -10, 100, 2]); // => -10`

This computes the maximum number in an array.

This runs in `O(n)`, linear time with respect to the length of the array.

Parameters

Returns

`number`

:
maximum value
Throws

- Error: if the length of x is less than one

Example

```
max([1, 2, 3, 4]);
// => 4
```

Our default sum is the Kahan-Babuska algorithm. This method is an improvement over the classical Kahan summation algorithm. It aims at computing the sum of a list of numbers while correcting for floating-point errors. Traditionally, sums are calculated as many successive additions, each one with its own floating-point roundoff. These losses in precision add up as the number of numbers increases. This alternative algorithm is more accurate than the simple way of calculating sums by simple addition.

This runs in `O(n)`, linear time with respect to the length of the array.
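The compensation idea described above can be sketched as follows. This is a minimal illustration of the Kahan-Babuska (Neumaier) variant, not the library's source; the name `compensatedSum` is hypothetical.

```javascript
// Hypothetical helper illustrating Kahan-Babuska (Neumaier) summation.
function compensatedSum(values) {
    let sum = 0;
    // Running total of the low-order bits lost by each addition.
    let compensation = 0;
    for (const value of values) {
        const next = sum + value;
        if (Math.abs(sum) >= Math.abs(value)) {
            // Low-order digits of `value` were lost in `next`.
            compensation += (sum - next) + value;
        } else {
            // Low-order digits of `sum` were lost in `next`.
            compensation += (value - next) + sum;
        }
        sum = next;
    }
    // Apply the accumulated correction once, at the end.
    return sum + compensation;
}
```

With an input such as `[1, 1e100, 1, -1e100]`, plain left-to-right addition loses both small terms, while the compensated version recovers them.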

Parameters

Returns

`number`

:
sum of all input numbers
Example

`sum([1, 2, 3]); // => 6`

The simple sum of an array is the result of adding all numbers together, starting from zero.

This runs in `O(n)`, linear time with respect to the length of the array.

Parameters

Returns

`number`

:
sum of all input numbers
Example

`sumSimple([1, 2, 3]); // => 6`

This is a population quantile, since this library assumes that the entire dataset is known. It is an implementation of the Quantiles of a Population algorithm from Wikipedia.

`sample` is a one-dimensional array of numbers, and `p` is either a decimal number from 0 to 1 or an array of decimal numbers from 0 to 1. In terms of a k/q quantile, p = k/q: the only difference is whether you work with fractions or with decimal values. When p is an array, the result of the function is also an array containing the appropriate quantiles in input order.

Parameters

p

`(number)`

the desired quantile, as a number between 0 and 1
Returns

`number`

:
quantile
Example

`quantile([3, 6, 7, 8, 8, 9, 10, 13, 15, 16, 20], 0.5); // => 9`
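A rough sketch of how a population quantile can be read off a sorted copy of the data. The index rule used here (take the next value up when `n * p` is fractional, average the two neighbours when it is an integer) is one common convention and is an assumption, not a statement about the library's exact code; `quantileSketch` is a hypothetical name.

```javascript
// Hypothetical helper; assumes a non-empty array of numbers.
function quantileSketch(sample, p) {
    // Sort a copy numerically so the caller's array is left untouched.
    const sorted = sample.slice().sort((a, b) => a - b);
    const single = (q) => {
        if (q === 0) return sorted[0];
        if (q === 1) return sorted[sorted.length - 1];
        const idx = sorted.length * q;
        // Fractional position: take the next value up.
        if (idx % 1 !== 0) return sorted[Math.ceil(idx) - 1];
        // Integer position: average the two neighbouring values.
        return (sorted[idx - 1] + sorted[idx]) / 2;
    };
    // An array of quantiles yields an array of results in input order.
    return Array.isArray(p) ? p.map(single) : single(p);
}
```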

The product of an array is the result of multiplying all numbers together, starting with one, the multiplicative identity.

This runs in `O(n)`, linear time with respect to the length of the array.

Parameters

Returns

`number`

:
product of all input numbers
Example

`product([1, 2, 3, 4]); // => 24`

These are special versions of methods that assume your input is sorted. This assumption lets them run a lot faster, usually in O(1).

The minimum is the lowest number in the array. With a sorted array, the first element in the array is always the smallest, so this calculation can be done in one step, or constant time.

Parameters

Returns

`number`

:
minimum value
Example

`minSorted([-100, -10, 1, 2, 5]); // => -100`

The maximum is the highest number in the array. With a sorted array, the last element in the array is always the largest, so this calculation can be done in one step, or constant time.

Parameters

Returns

`number`

:
maximum value
Example

`maxSorted([-100, -10, 1, 2, 5]); // => 5`

This is the internal implementation of quantiles: when you know that the order is sorted, you don't need to re-sort it, and the computations are faster.

Parameters

p

`(number)`

desired quantile: a number between 0 and 1, inclusive
Returns

`number`

:
quantile value
Throws

Example

`quantileSorted([3, 6, 7, 8, 8, 9, 10, 13, 15, 16, 20], 0.5); // => 9`

These are different ways of identifying the center or location of a distribution.

The mean, *also known as average*, is the sum of all values over the number of values. This is a measure of central tendency: a method of finding a typical or central value of a set of numbers.

This runs in `O(n)`, linear time with respect to the length of the array.

Parameters

Returns

`number`

:
mean
Throws

- Error: if the length of x is less than one

Example

`mean([0, 10]); // => 5`

When adding a new value to a list, one does not necessarily have to recompute the mean of the list in linear time. One can instead use this function to compute the new mean by providing the current mean, the number of elements in the list that produced it, and the new value to add.

Since: 2.5.0

Parameters

Returns

`number`

:
the new mean
Example

`addToMean(14, 5, 53); // => 20.5`
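The constant-time update follows from the definition of the mean: if `n` values have mean `m`, adding one value `v` shifts the mean by `(v - m) / (n + 1)`. A minimal sketch, with the hypothetical name `addToMeanSketch`:

```javascript
// Hypothetical helper: update a mean in O(1) when one value is appended.
function addToMeanSketch(mean, n, newValue) {
    // Equivalent to (mean * n + newValue) / (n + 1), rearranged
    // so that the running total never has to be rebuilt.
    return mean + (newValue - mean) / (n + 1);
}

addToMeanSketch(14, 5, 53); // => 20.5
```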

The mode is the number that appears in a list the highest number of times. There can be multiple modes in a list: in the event of a tie, this algorithm will return the most recently seen mode.

This is a measure of central tendency: a method of finding a typical or central value of a set of numbers.

This runs in `O(n log(n))` because it needs to sort the array internally before running an `O(n)` search to find the mode.

Parameters

Returns

`number`

:
mode
Example

`mode([0, 0, 1]); // => 0`

The mode is the number that appears in a list the highest number of times. There can be multiple modes in a list: in the event of a tie, this algorithm will return the most recently seen mode.

This is a measure of central tendency: a method of finding a typical or central value of a set of numbers.

This runs in `O(n)`

because the input is sorted.

Parameters

Returns

`number`

:
mode
Throws

- Error: if sorted is empty

Example

`modeSorted([0, 0, 1]); // => 0`

The mode is the number that appears in a list the highest number of times. There can be multiple modes in a list: in the event of a tie, this algorithm will return the most recently seen mode.

modeFast uses a Map object to keep track of the mode, instead of the approach used with `mode`, a sorted array. As a result, it is faster than `mode` and supports any data type that can be compared with `==`. It also requires a JavaScript environment with support for Map, and will throw an error if Map is not available.

This is a measure of central tendency: a method of finding a typical or central value of a set of numbers.

modeFast(x: Array<any>): any?

Parameters

x

`(Array<any>)`

a sample of one or more data points
Returns

`any?`

:
mode
Throws

- ReferenceError: if the JavaScript environment doesn't support Map
- Error: if x is empty

Example

`modeFast(['rabbits', 'rabbits', 'squirrels']); // => 'rabbits'`

The median is the middle number of a list. This is often a good indicator of 'the middle' when there are outliers that skew the `mean()` value. This is a measure of central tendency: a method of finding a typical or central value of a set of numbers.

The median isn't necessarily one of the elements in the list: the value can be the average of two elements if the list has an even length and the two central values are different.

Parameters

Returns

`number`

:
median value
Example

`median([10, 2, 5, 100, 2, 1]); // => 3.5`

The median is the middle number of a list; this version assumes that the input is already sorted. This is often a good indicator of 'the middle' when there are outliers that skew the `mean()` value. This is a measure of central tendency: a method of finding a typical or central value of a set of numbers.

The median isn't necessarily one of the elements in the list: the value can be the average of two elements if the list has an even length and the two central values are different.

Parameters

Returns

`number`

:
median value
Example

`medianSorted([1, 2, 2, 5, 10, 100]); // => 3.5`

The Harmonic Mean is a mean function typically used to find the average of rates. This mean is calculated by taking the reciprocal of the arithmetic mean of the reciprocals of the input numbers.

This runs in `O(n)`, linear time with respect to the length of the array.

Parameters

Returns

`number`

:
harmonic mean
Throws

Example

`harmonicMean([2, 3]).toFixed(2) // => '2.40'`
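The "reciprocal of the mean of the reciprocals" definition simplifies to `n / (1/x1 + ... + 1/xn)`. The sketch below uses the hypothetical name `harmonicMeanSketch`; rejecting non-positive inputs is an assumption made for the illustration.

```javascript
// Hypothetical helper illustrating the harmonic mean.
function harmonicMeanSketch(x) {
    if (x.length === 0) {
        throw new Error("requires at least one data point");
    }
    let reciprocalSum = 0;
    for (const value of x) {
        // Assumption: only positive rates are meaningful here.
        if (value <= 0) {
            throw new Error("requires only positive numbers");
        }
        reciprocalSum += 1 / value;
    }
    return x.length / reciprocalSum;
}

// Two legs of equal distance at 40 and 60 km/h average about 48 km/h,
// not the arithmetic mean of 50 km/h.
harmonicMeanSketch([40, 60]);
```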

The Geometric Mean is a mean function that is more useful for numbers in different ranges.

This is the nth root of the product of the n input numbers.

The geometric mean is often useful for **proportional growth**: given growth rates for multiple years, like *80%, 16.66% and 42.85%*, a simple mean will incorrectly estimate an average growth rate, whereas a geometric mean will correctly estimate a growth rate that, over those years, will yield the same end value.

This runs in `O(n)`, linear time with respect to the length of the array.

Parameters

Returns

`number`

:
geometric mean
Throws

Example

```
var growthRates = [1.80, 1.166666, 1.428571];
var averageGrowth = geometricMean(growthRates);
var averageGrowthRates = [averageGrowth, averageGrowth, averageGrowth];
var startingValue = 10;
var startingValueMean = 10;
growthRates.forEach(function(rate) {
    startingValue *= rate;
});
averageGrowthRates.forEach(function(rate) {
    startingValueMean *= rate;
});
startingValueMean === startingValue;
```

The Root Mean Square (RMS) is a mean function used as a measure of the magnitude of a set of numbers, regardless of their sign. This is the square root of the mean of the squares of the input numbers.

This runs in `O(n)`, linear time with respect to the length of the array.

Parameters

Returns

`number`

:
root mean square
Throws

- Error: if x is empty

Example

`rootMeanSquare([-1, 1, -1, 1]); // => 1`

Skewness is a measure of the extent to which a probability distribution of a real-valued random variable "leans" to one side of the mean. The skewness value can be positive or negative, or even undefined.

Implementation is based on the adjusted Fisher-Pearson standardized moment coefficient, which is the version found in Excel and several statistical packages including Minitab, SAS and SPSS.

Since: 4.1.0

Parameters

Returns

`number`

:
sample skewness
Throws

- Error: if x has length less than 3

Example

`sampleSkewness([2, 4, 6, 3, 1]); // => 0.590128656384365`

These are different ways of determining how spread out a distribution is.

The variance is the mean of the squared deviations from the mean.

This is an implementation of variance, not sample variance: see the `sampleVariance` method if you want a sample measure.

Parameters

Returns

`number`

:
variance: a value greater than or equal to zero.
zero indicates that all values are identical.
Throws

- Error: if x's length is 0

Example

`variance([1, 2, 3, 4, 5, 6]); // => 2.9166666666666665`

The sample variance is the sum of squared deviations from the mean, divided by one less than the number of values. The sample variance is distinguished from the variance by the usage of Bessel's Correction: instead of dividing the sum of squared deviations by the length of the input, it is divided by the length minus one. This corrects the bias that arises when estimating the variance of a population from a sample rather than from the full set of values.

Parameters

Returns

`number`

:
sample variance
Throws

- Error: if the length of x is less than 2

Example

`sampleVariance([1, 2, 3, 4, 5]); // => 2.5`
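The only difference between the population and sample measures is the divisor. A minimal sketch contrasting the two; all function names are hypothetical and the code is illustrative rather than the library's implementation.

```javascript
// Hypothetical helpers contrasting the two divisors.
function sumSquaredDeviations(x) {
    const mean = x.reduce((acc, v) => acc + v, 0) / x.length;
    return x.reduce((acc, v) => acc + (v - mean) * (v - mean), 0);
}

// Population variance: divide by n.
function populationVarianceSketch(x) {
    return sumSquaredDeviations(x) / x.length;
}

// Sample variance: divide by n - 1 (Bessel's correction).
function sampleVarianceSketch(x) {
    if (x.length < 2) {
        throw new Error("requires at least two data points");
    }
    return sumSquaredDeviations(x) / (x.length - 1);
}

populationVarianceSketch([1, 2, 3, 4, 5]); // => 2
sampleVarianceSketch([1, 2, 3, 4, 5]); // => 2.5
```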

The standard deviation is the square root of the variance. This is also known as the population standard deviation. It's useful for measuring the amount of variation or dispersion in a set of values.

Standard deviation is only appropriate for full-population knowledge: for samples of a population, sampleStandardDeviation is more appropriate.

Parameters

Returns

`number`

:
standard deviation
Example

```
variance([2, 4, 4, 4, 5, 5, 7, 9]); // => 4
standardDeviation([2, 4, 4, 4, 5, 5, 7, 9]); // => 2
```

The sample standard deviation is the square root of the sample variance.

Parameters

Returns

`number`

:
sample standard deviation
Example

```
sampleStandardDeviation([2, 4, 4, 4, 5, 5, 7, 9]).toFixed(2);
// => '2.14'
```

The Median Absolute Deviation is a robust measure of statistical dispersion. It is more resilient to outliers than the standard deviation.

Parameters

Returns

`number`

:
median absolute deviation
Example

`medianAbsoluteDeviation([1, 1, 2, 2, 4, 6, 9]); // => 1`
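The usual definition, assumed here, is the median of the absolute differences between each value and the median of the data. The helper names below are hypothetical, and the sketch is illustrative only.

```javascript
// Hypothetical helper: median of an unsorted, non-empty array.
function medianSketch(x) {
    const sorted = x.slice().sort((a, b) => a - b);
    const mid = Math.floor(sorted.length / 2);
    return sorted.length % 2 === 0
        ? (sorted[mid - 1] + sorted[mid]) / 2
        : sorted[mid];
}

// Median of the absolute deviations from the median.
function medianAbsoluteDeviationSketch(x) {
    const center = medianSketch(x);
    return medianSketch(x.map((v) => Math.abs(v - center)));
}

medianAbsoluteDeviationSketch([1, 1, 2, 2, 4, 6, 9]); // => 1
// Replacing the largest value with an extreme outlier leaves the result unchanged.
medianAbsoluteDeviationSketch([1, 1, 2, 2, 4, 6, 9000]); // => 1
```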

The Interquartile range is a measure of statistical dispersion, or how scattered, spread, or concentrated a distribution is. It's computed as the difference between the third quartile and first quartile.

Parameters

Returns

`number`

:
interquartile range: the span between the lower and upper quartiles, 0.25 and 0.75
Example

`interquartileRange([0, 1, 2, 3]); // => 2`

The sum of deviations to the Nth power. When n=2 it's the sum of squared deviations. When n=3 it's the sum of cubed deviations.

Parameters

Returns

`number`

:
sum of nth power deviations
Example

```
var input = [1, 2, 3];
// since the variance of a set is the mean of the squared
// deviations, we can calculate that with sumNthPowerDeviations:
var variance = sumNthPowerDeviations(input, 2) / input.length;
```

The Z-Score, or Standard Score.

The standard score is the number of standard deviations an observation or datum is above or below the mean. Thus, a positive standard score represents a datum above the mean, while a negative standard score represents a datum below the mean. It is a dimensionless quantity obtained by subtracting the population mean from an individual raw score and then dividing the difference by the population standard deviation.

The z-score is only defined if one knows the population parameters; if one only has a sample set, then the analogous computation with sample mean and sample standard deviation yields the Student's t-statistic.

Parameters

Returns

`number`

:
z score
Example

`zScore(78, 80, 5); // => -0.4`

The correlation is a measure of how strongly two datasets are linearly related, expressed as a value between -1 and 1.

Parameters

Returns

`number`

:
sample correlation
Example

```
sampleCorrelation([1, 2, 3, 4, 5, 6], [2, 2, 3, 4, 5, 60]).toFixed(2);
// => '0.69'
```

Sample covariance of two datasets: how much do the two datasets move together? x and y are two datasets, represented as arrays of numbers.

Parameters

Returns

`number`

:
sample covariance
Throws

Example

`sampleCovariance([1, 2, 3, 4, 5, 6], [6, 5, 4, 3, 2, 1]); // => -3.5`
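A sketch of the standard sample covariance formula: the sum of the products of paired deviations from each mean, divided by `n - 1`. The name `sampleCovarianceSketch` and the specific error conditions are assumptions for illustration.

```javascript
// Hypothetical helper illustrating sample covariance.
function sampleCovarianceSketch(x, y) {
    if (x.length !== y.length) {
        throw new Error("x and y must have the same length");
    }
    if (x.length < 2) {
        throw new Error("requires at least two data points");
    }
    const mean = (a) => a.reduce((acc, v) => acc + v, 0) / a.length;
    const meanX = mean(x);
    const meanY = mean(y);
    let sum = 0;
    for (let i = 0; i < x.length; i++) {
        // Positive when both values sit on the same side of their means.
        sum += (x[i] - meanX) * (y[i] - meanY);
    }
    // Bessel's correction: divide by n - 1, not n.
    return sum / (x.length - 1);
}

sampleCovarianceSketch([1, 2, 3, 4, 5, 6], [6, 5, 4, 3, 2, 1]); // => -3.5
```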

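The R Squared value of data compared with a function `f` measures how well `f` fits the data. It is computed from the sum of the squared differences between the prediction and the actual value, relative to the sum of the squared differences between the actual values and their mean.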

Parameters

Returns

`number`

:
r-squared value
Example

```
var samples = [[0, 0], [1, 1]];
var regressionLine = linearRegressionLine(linearRegression(samples));
rSquared(samples, regressionLine); // = 1 this line is a perfect fit
```
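One common definition of the coefficient of determination, assumed in this sketch, is one minus the ratio of the residual sum of squares to the total sum of squares. `rSquaredSketch` is a hypothetical name, and the code does not guard against data whose y values are all identical (the total sum of squares would be zero).

```javascript
// Hypothetical helper: data is an array of [x, y] pairs, f maps x to a prediction.
function rSquaredSketch(data, f) {
    const meanY = data.reduce((acc, point) => acc + point[1], 0) / data.length;
    let totalSumOfSquares = 0;
    let residualSumOfSquares = 0;
    for (const [x, y] of data) {
        totalSumOfSquares += (y - meanY) * (y - meanY);
        residualSumOfSquares += (y - f(x)) * (y - f(x));
    }
    return 1 - residualSumOfSquares / totalSumOfSquares;
}

rSquaredSketch([[0, 0], [1, 1]], (x) => x); // => 1
```

A function that always predicts the mean of the y values scores 0 under this definition.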

Simple linear regression is a simple way to find a fitted line between a set of coordinates. This algorithm finds the slope and y-intercept of a regression line by minimizing the sum of squared errors (least squares).

Parameters

Returns

`Object`

:
object containing slope and intercept of regression line
Example

`linearRegression([[0, 0], [1, 1]]); // => { m: 1, b: 0 }`
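The closed-form least-squares solution can be sketched with four running sums. This is an illustration under the assumption of at least two points with distinct x values, not the library's source; `linearRegressionSketch` is a hypothetical name.

```javascript
// Hypothetical helper: points is an array of [x, y] pairs.
function linearRegressionSketch(points) {
    const n = points.length;
    let sumX = 0;
    let sumY = 0;
    let sumXX = 0;
    let sumXY = 0;
    for (const [x, y] of points) {
        sumX += x;
        sumY += y;
        sumXX += x * x;
        sumXY += x * y;
    }
    // Slope: covariance of x and y over variance of x.
    const m = (n * sumXY - sumX * sumY) / (n * sumXX - sumX * sumX);
    // Intercept: the fitted line passes through the point of means.
    const b = (sumY - m * sumX) / n;
    return { m: m, b: b };
}

linearRegressionSketch([[0, 1], [1, 3], [2, 5]]); // => { m: 2, b: 1 }
```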

Given the output of `linearRegression`: an object with `m` and `b` values indicating slope and intercept, respectively, generate a line function that translates x values into y values.

Parameters

Returns

`Function`

:
method that computes y-value at any given
x-value on the line.
Example

```
var l = linearRegressionLine(linearRegression([[0, 0], [1, 1]]));
l(0) // = 0
l(2) // = 2
linearRegressionLine({ b: 0, m: 1 })(1); // => 1
linearRegressionLine({ b: 1, m: 1 })(1); // => 2
```

A Fisher-Yates shuffle is a fast way to create a random permutation of a finite set. This is a wrapper around `shuffleInPlace` that adds the guarantee that it will not modify its input.

Parameters

x

`(Array)`

sample of 0 or more numbers
randomSource

`(Function? = Math.random)`

an optional entropy source that
returns numbers between 0 inclusive and 1 exclusive: the range [0, 1)
Returns

`Array`

:
shuffled version of input
Example

```
var shuffled = shuffle([1, 2, 3, 4]);
shuffled; // = [2, 3, 1, 4] or any other random permutation
```
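The relationship between the in-place shuffle and its non-mutating wrapper can be sketched as below. The function names are hypothetical; the optional `randomSource` argument mirrors the parameter described above and is expected to return numbers in the range [0, 1).

```javascript
// Hypothetical helper: Fisher-Yates shuffle that reorders x itself.
function shuffleInPlaceSketch(x, randomSource = Math.random) {
    // Walk backwards, swapping each position with a randomly chosen
    // position at or before it.
    for (let i = x.length - 1; i > 0; i--) {
        const j = Math.floor(randomSource() * (i + 1));
        const temporary = x[i];
        x[i] = x[j];
        x[j] = temporary;
    }
    return x;
}

// Hypothetical wrapper: shuffle a copy so the input stays untouched.
function shuffleSketch(x, randomSource = Math.random) {
    return shuffleInPlaceSketch(x.slice(), randomSource);
}
```

Passing a deterministic `randomSource` makes the result reproducible, which is handy in tests.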

A Fisher-Yates shuffle in-place, which means that it **will change the order of the original array by reference**.

This is an algorithm that generates a random permutation of a set.

Parameters

x

`(Array)`

sample of one or more numbers
randomSource

`(Function? = Math.random)`

an optional entropy source that
returns numbers between 0 inclusive and 1 exclusive: the range [0, 1)
Returns

`Array`

:
x
Example

```
var x = [1, 2, 3, 4];
shuffleInPlace(x);
// x is shuffled to a value like [2, 1, 4, 3]
```

Sampling with replacement is a type of sampling that allows the same item to be picked out of a population more than once.

Parameters

x

`(Array<any>)`

an array of any kind of value
n

`(number)`

count of how many elements to take
randomSource

`(Function? = Math.random)`

an optional entropy source that
returns numbers between 0 inclusive and 1 exclusive: the range [0, 1)
Returns

`Array`

:
n sampled items from the population
Example

```
var sample = sampleWithReplacement([1, 2, 3, 4], 2);
sample; // = [2, 4] or any other random sample of 2 items
```

Create a simple random sample of `n` elements from a given array.

The sampled values will be in any order, not necessarily the order they appear in the input.

Parameters

x

`(Array<any>)`

input array. can contain any type
n

`(number)`

count of how many elements to take
randomSource

`(Function? = Math.random)`

an optional entropy source that
returns numbers between 0 inclusive and 1 exclusive: the range [0, 1)
Returns

`Array`

:
subset of n elements in original array
Example

```
var values = [1, 2, 4, 5, 6, 7, 8, 9];
sample(values, 3); // returns 3 random values, like [2, 5, 8];
```

This is a naïve Bayesian classifier that takes singly-nested objects.

new BayesianClassifier()

Example

```
var bayes = new BayesianClassifier();
bayes.train({
    species: 'Cat'
}, 'animal');
var result = bayes.score({
    species: 'Cat'
});
// result
// {
//   animal: 1
// }
```

This is a single-layer Perceptron Classifier that takes arrays of numbers and predicts whether they should be classified as either 0 or 1 (negative or positive examples).

new PerceptronModel()

Example

```
// Create the model
var p = new PerceptronModel();
// Train the model with input with a diagonal boundary.
for (var i = 0; i < 5; i++) {
    p.train([1, 1], 1);
    p.train([0, 1], 0);
    p.train([1, 0], 0);
    p.train([0, 0], 0);
}
p.predict([0, 0]); // 0
p.predict([0, 1]); // 0
p.predict([1, 0]); // 0
p.predict([1, 1]); // 1
```

Instance Members

predict(features)

train(features, label)

**Train** the classifier with a new example, which is
a numeric array of features and a 0 or 1 label.

Parameters

Returns

`PerceptronModel`

:
this

The Bernoulli distribution is the discrete probability distribution of a random variable which takes value 1 with success probability `p` and value 0 with failure probability `q` = 1 - `p`. It can be used, for example, to represent the toss of a coin, where "1" is defined to mean "heads" and "0" is defined to mean "tails" (or vice versa). It is a special case of a Binomial Distribution where `n` = 1.

Parameters

p

`(number)`

input value, between 0 and 1 inclusive
Returns

`Array<number>`

:
values of bernoulli distribution at this point
Throws

- Error: if p is outside the range 0 to 1

Example

`bernoulliDistribution(0.3); // => [0.7, 0.3]`

The Binomial Distribution is the discrete probability distribution of the number of successes in a sequence of `trials` independent yes/no experiments, each of which yields success with probability `probability`. Such a success/failure experiment is also called a Bernoulli experiment or Bernoulli trial; when `trials` = 1, the Binomial Distribution is a Bernoulli Distribution.

Parameters

Returns

`Array<number>`

:
output

The Poisson Distribution is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time and/or space if these events occur with a known average rate and independently of the time since the last event.

The Poisson Distribution is characterized by the strictly positive mean arrival or occurrence rate, `λ`.

Parameters

lambda

`(number)`

mean arrival or occurrence rate (`λ`) of the Poisson distribution; must be strictly positive
Returns

`Array<number>`

:
values of poisson distribution at that point

**Percentage Points of the χ2 (Chi-Squared) Distribution**

The χ2 (Chi-Squared) Distribution is used in the common chi-squared tests for goodness of fit of an observed distribution to a theoretical one, the independence of two criteria of classification of qualitative data, and in confidence interval estimation for a population standard deviation of a normal distribution from a sample standard deviation.

Values from Appendix 1, Table III of William W. Hines & Douglas C. Montgomery, "Probability and Statistics in Engineering and Management Science", Wiley (1980).

chiSquaredDistributionTable

A standard normal table, also called the unit normal table or Z table, is a mathematical table for the values of Φ (phi), which are the values of the cumulative distribution function of the normal distribution. It is used to find the probability that a statistic is observed below, above, or between values on the standard normal distribution, and by extension, any normal distribution.

The probabilities are calculated using the Cumulative distribution function. The table used is the cumulative, and not cumulative from 0 to mean (even though the latter has 5 digits precision, instead of 4).

standardNormalTable

This computes a one-sample t-test, comparing the mean of a sample to a known value, `expectedValue`.

In this case, we're trying to determine whether the population mean is equal to the value that we know, which is `expectedValue` here. Usually the results here are used to look up a p-value, which, for a certain level of significance, will let you determine whether the null hypothesis can or cannot be rejected.

Parameters

expectedValue

`(number)`

expected value of the population mean
Returns

`number`

:
value
Example

`tTest([1, 2, 3, 4, 5, 6], 3.385).toFixed(2); // => '0.16'`

This computes a two-sample t-test. It tests whether "mean(X)-mean(Y) = difference" (in the most common case, `difference == 0`, to test whether two samples are likely to be taken from populations with the same mean value), with no prior knowledge of the standard deviations of both samples other than the fact that they have the same standard deviation.

Usually the results here are used to look up a p-value, which, for a certain level of significance, will let you determine whether the null hypothesis can or cannot be rejected.

`difference` can be omitted if it equals 0.

This is used to confirm or deny a null hypothesis that the two populations that have been sampled into `sampleX` and `sampleY` are equal to each other.

Parameters

difference

`(number? = 0)`

Returns

`number`

:
test result
Example

`ss.tTestTwoSample([1, 2, 3, 4], [3, 4, 5, 6], 0); //= -2.1908902300206643`
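Under the equal-standard-deviation assumption described above, the textbook statistic uses a pooled sample variance. The sketch below follows that textbook formula and reproduces the example value; it is an illustration with the hypothetical name `tTestTwoSampleSketch`, not the library's code.

```javascript
// Hypothetical helper: pooled-variance two-sample t statistic.
function tTestTwoSampleSketch(sampleX, sampleY, difference = 0) {
    const mean = (a) => a.reduce((acc, v) => acc + v, 0) / a.length;
    const sampleVariance = (a) => {
        const center = mean(a);
        return a.reduce((acc, v) => acc + (v - center) * (v - center), 0) / (a.length - 1);
    };
    const n = sampleX.length;
    const m = sampleY.length;
    // Weight each sample's variance by its degrees of freedom.
    const pooledVariance =
        ((n - 1) * sampleVariance(sampleX) + (m - 1) * sampleVariance(sampleY)) /
        (n + m - 2);
    return (
        (mean(sampleX) - mean(sampleY) - difference) /
        Math.sqrt(pooledVariance * (1 / n + 1 / m))
    );
}

tTestTwoSampleSketch([1, 2, 3, 4], [3, 4, 5, 6]); // approximately -2.19
```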

**Cumulative Standard Normal Probability**

Since probability tables cannot be printed for every normal distribution, as there are an infinite variety of normal distributions, it is common practice to convert a normal to a standard normal and then use the standard normal table to find probabilities.

You can use `.5 + .5 * errorFunction(z / Math.sqrt(2))` to calculate the probability instead of looking it up in a table.

Parameters

z

`(number)`

Returns

`number`

:
cumulative standard normal probability

The `errorFunction(x/(sd * Math.sqrt(2)))` is the probability that a value in a normal distribution with standard deviation sd is within x of the mean.

This function returns a numerical approximation to the exact value.

Parameters

x

`(number)`

input
Returns

`number`

:
error estimation
Example

`errorFunction(1).toFixed(2); // => '0.84'`

The Inverse Gaussian error function returns a numerical approximation to the value that would have caused `errorFunction()` to return x.

Parameters

x

`(number)`

value of error function
Returns

`number`

:
estimated inverted value

The Probit is the inverse of `cumulativeStdNormalProbability()`, and is also known as the normal quantile function.

It returns the number of standard deviations from the mean where the p'th quantile of values can be found in a normal distribution. So, for example, probit(0.5 + 0.6827/2) ≈ 1 because 68.27% of values are normally found within 1 standard deviation above or below the mean.

Parameters

p

`(number)`

Returns

`number`

:
probit

Breaks methods split datasets into chunks. Often these are used for segmentation or visualization of a dataset. A method of computing breaks that splits data evenly can make for a better choropleth map, for instance, because each color will be represented equally.

Ckmeans clustering is an improvement on heuristic-based clustering approaches like Jenks. The algorithm was developed by Haizhou Wang and Mingzhou Song as a dynamic programming approach to the problem of clustering numeric data into groups with the least within-group sum-of-squared-deviations.

Minimizing the difference within groups (what Wang & Song refer to as `withinss`, or within sum-of-squares) means that groups are optimally homogeneous within and the data is split into representative groups. This is very useful for visualization, where you may want to represent a continuous variable in discrete color or style groups. This function can provide groups that emphasize differences between data.

Being a dynamic programming approach, this algorithm is based on two matrices that store incrementally-computed values for squared deviations and backtracking indexes.

This implementation is based on Ckmeans 3.4.6, which introduced a new divide and conquer approach that improved runtime from O(kn^2) to O(kn log(n)).

Unlike the original implementation, this implementation does not include any code to automatically determine the optimal number of clusters: this information needs to be explicitly provided.

*Ckmeans.1d.dp: Optimal k-means Clustering in One Dimension by Dynamic
Programming* Haizhou Wang and Mingzhou Song ISSN 2073-4859

from The R Journal Vol. 3/2, December 2011

Parameters

nClusters

`(number)`

number of desired classes. This cannot be
greater than the number of values in the data array.
Returns

`Array<Array<number>>`

:
clustered input
Throws

- Error: if the number of requested clusters is higher than the size of the data

Example

```
ckmeans([-1, 2, -1, 2, 4, 5, 6, -1, 2, -1], 3);
// The input, clustered into groups of similar numbers.
//= [[-1, -1, -1, -1], [2, 2, 2], [4, 5, 6]]
```

Given an array of x, this will find the extent of the x and return an array of breaks that can be used to categorize the x into a number of classes. The returned array will always be 1 longer than the number of classes because it includes the minimum value.

Parameters

Returns

`Array<number>`

:
array of class break positions
Example

`equalIntervalBreaks([1, 2, 3, 4, 5, 6], 4); //= [1, 2.25, 3.5, 4.75, 6]`
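A sketch of how equal-interval breaks can be derived from the extent of the data: divide the range into `nClasses` equal steps and list the boundaries, starting at the minimum and ending at the maximum. The name `equalIntervalBreaksSketch` is hypothetical, and the sketch assumes a non-empty array and `nClasses` of at least 1.

```javascript
// Hypothetical helper illustrating equal-interval class breaks.
function equalIntervalBreaksSketch(x, nClasses) {
    const min = Math.min(...x);
    const max = Math.max(...x);
    const breakSize = (max - min) / nClasses;
    // The first break is always the minimum value.
    const breaks = [min];
    for (let i = 1; i < nClasses; i++) {
        breaks.push(min + breakSize * i);
    }
    // Push the maximum directly so rounding cannot move the last break.
    breaks.push(max);
    return breaks;
}

equalIntervalBreaksSketch([1, 2, 3, 4, 5, 6], 4); // => [1, 2.25, 3.5, 4.75, 6]
```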

Split an array into chunks of a specified size. This function has the same behavior as PHP's array_chunk function, and thus will insert smaller-sized chunks at the end if the input size is not divisible by the chunk size.

`x` is expected to be an array, and `chunkSize` a number. The `x` array can contain any kind of data.

Parameters

Returns

`Array<Array>`

:
a chunked array
Throws

- Error: if chunk size is less than 1 or not an integer

Example

```
chunk([1, 2, 3, 4, 5, 6], 2);
// => [[1, 2], [3, 4], [5, 6]]
```
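The chunking behavior, including the shorter trailing chunk, can be sketched with `Array.prototype.slice`. The name `chunkSketch` is hypothetical and the code is illustrative only.

```javascript
// Hypothetical helper illustrating fixed-size chunking.
function chunkSketch(x, chunkSize) {
    if (!Number.isInteger(chunkSize) || chunkSize < 1) {
        throw new Error("chunk size must be a positive integer");
    }
    const output = [];
    // slice() clamps at the end of the array, so the final chunk
    // is simply shorter when the length is not divisible.
    for (let start = 0; start < x.length; start += chunkSize) {
        output.push(x.slice(start, start + chunkSize));
    }
    return output;
}

chunkSketch([1, 2, 3, 4, 5], 2); // => [[1, 2], [3, 4], [5]]
```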

The χ2 (Chi-Squared) Goodness-of-Fit Test uses a measure of goodness of fit which is the sum of differences between observed and expected outcome frequencies (that is, counts of observations), each squared and divided by the number of observations expected given the hypothesized distribution. The resulting χ2 statistic, `chiSquared`, can be compared to the chi-squared distribution to determine the goodness of fit. In order to determine the degrees of freedom of the chi-squared distribution, one takes the total number of observed frequencies and subtracts the number of estimated parameters. The test statistic follows, approximately, a chi-square distribution with (k − c) degrees of freedom where `k` is the number of non-empty cells and `c` is the number of estimated parameters for the distribution.

chiSquaredGoodnessOfFit(data: Array<number>, distributionType: Function, significance: number): number

Parameters

distributionType

`(Function)`

a function that returns a point in a distribution:
for instance, binomial, bernoulli, or poisson
significance

`(number)`

Returns

`number`

:
chi squared goodness of fit
Example

```
// Data from Poisson goodness-of-fit example 10-19 in William W. Hines & Douglas C. Montgomery,
// "Probability and Statistics in Engineering and Management Science", Wiley (1980).
var data1019 = [
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
2, 2, 2, 2, 2, 2, 2, 2, 2,
3, 3, 3, 3
];
ss.chiSquaredGoodnessOfFit(data1019, ss.poissonDistribution, 0.05); //= false
```

We use `ε`, epsilon, as a stopping criterion when we want to iterate until we're "close enough". Epsilon is a very small number: for simple statistics, that number is **0.0001**.

This is used in calculations like the binomialDistribution, in which the process of finding a value is iterative: it progresses until it is close enough.

Below is an example of using epsilon in gradient descent, where we're trying to find a local minimum of a function using its derivative, given by the `fDerivative` method.

epsilon

Example

```
// From calculation, we expect that the local minimum occurs at x=9/4
var x_old = 0;
// The algorithm starts at x=6
var x_new = 6;
var stepSize = 0.01;
function fDerivative(x) {
    return 4 * Math.pow(x, 3) - 9 * Math.pow(x, 2);
}
// The loop runs until the difference between the previous
// value and the current value is smaller than epsilon - a rough
// measure of 'close enough'
while (Math.abs(x_new - x_old) > ss.epsilon) {
    x_old = x_new;
    x_new = x_old - stepSize * fDerivative(x_old);
}
console.log('Local minimum occurs at', x_new);
```

A Factorial, usually written n!, is the product of all positive integers less than or equal to n. Often factorial is implemented recursively, but this iterative approach is significantly faster and simpler.

Parameters

n

`(number)`

input, must be an integer number 1 or greater
Returns

`number`

:
factorial: n!
Throws

- Error: if n is less than 0 or not an integer

Example

`factorial(5); // => 120`
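The iterative approach mentioned above amounts to a single loop over a running product. In this sketch the name `factorialSketch` is hypothetical, and treating `0!` as 1 is the usual mathematical convention, assumed here.

```javascript
// Hypothetical helper illustrating an iterative factorial.
function factorialSketch(n) {
    if (!Number.isInteger(n) || n < 0) {
        throw new Error("n must be a non-negative integer");
    }
    // Start from the multiplicative identity, so 0! and 1! are both 1.
    let accumulator = 1;
    for (let i = 2; i <= n; i++) {
        accumulator *= i;
    }
    return accumulator;
}

factorialSketch(5); // => 120
```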

For a sorted input, counting the number of unique values is possible in linear time and constant memory. This is a simple implementation of the algorithm.

Values are compared with `===`, so objects and non-primitive objects are not handled in any special way.

Parameters

x

`(Array<any>)`

an array of any kind of value
Returns

`number`

:
count of unique values
Example

```
uniqueCountSorted([1, 2, 3]); // => 3
uniqueCountSorted([1, 1, 1]); // => 1
```

The bisection method is a root-finding method that repeatedly halves an interval, keeping the half in which the function changes sign, until the root is located to within the error tolerance.

This function returns a numerical approximation to the exact value.

bisect(func: Function, start: Number, end: Number, maxIterations: Number, errorTolerance: Number): Number

Parameters

func

`(Function)`

input function
start

`(Number)`

start of interval
end

`(Number)`

end of interval
maxIterations

`(Number)`

the maximum number of iterations
errorTolerance

`(Number)`

the error tolerance
Returns

`Number`

:
estimated root value
Throws

- TypeError: Argument func must be a function

Example

`bisect(Math.cos, 0, 4, 100, 0.003); // => 1.572265625`
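To make the interval-halving concrete, here is a rough sketch of the bisection loop. It is not the library's source: the name `bisectSketch` is hypothetical, and throwing when `maxIterations` is exhausted is an assumption about how such a function might behave.

```javascript
// Hypothetical sketch of the bisection method; not the library's source.
function bisectSketch(func, start, end, maxIterations, errorTolerance) {
    if (typeof func !== 'function') {
        throw new TypeError('func must be a function');
    }
    for (var i = 0; i < maxIterations; i++) {
        var mid = (start + end) / 2;
        var value = func(mid);
        // Stop on an exact root or once the half-width is within tolerance.
        if (value === 0 || Math.abs(end - start) / 2 < errorTolerance) {
            return mid;
        }
        // Keep the half-interval across which the sign changes.
        if (Math.sign(value) === Math.sign(func(start))) {
            start = mid;
        } else {
            end = mid;
        }
    }
    // Assumption: signal failure rather than return a poor estimate.
    throw new Error('maximum number of iterations exceeded');
}

bisectSketch(Math.cos, 0, 4, 100, 0.003); // => 1.572265625
```

The result is close to π/2 ≈ 1.5708, the root of cosine inside the interval [0, 4].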

Implementation of combinations with replacement. Combinations are unique subsets of a collection: in this case, k elements taken from the collection x at a time. 'With replacement' means that a given element can be chosen multiple times. Unlike permutations, order doesn't matter for combinations.

Parameters

x

`(Array)`

any type of data
k

`(int)`

the number of objects in each group (with replacement)
Returns

`Array<Array>`

:
array of combinations
Example

`combinationsReplacement([1, 2], 2); // => [[1, 1], [1, 2], [2, 2]]`
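One way to enumerate combinations with replacement is a recursion that never moves backwards in the input but is allowed to stay on the same index. The sketch below illustrates that idea; the name `combinationsReplacementSketch` and the helper `build` are hypothetical, and this is not the library's source.

```javascript
// Hypothetical sketch: k-element combinations of x, repeats allowed.
function combinationsReplacementSketch(x, k) {
    var result = [];
    function build(startIndex, current) {
        if (current.length === k) {
            result.push(current.slice());
            return;
        }
        // Recursing with i (not i + 1) lets the same element repeat.
        for (var i = startIndex; i < x.length; i++) {
            current.push(x[i]);
            build(i, current);
            current.pop();
        }
    }
    build(0, []);
    return result;
}

combinationsReplacementSketch([1, 2], 2); // => [[1, 1], [1, 2], [2, 2]]
```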

Implementation of combinations. Combinations are unique subsets of a collection: in this case, k elements taken from the collection x at a time. See https://en.wikipedia.org/wiki/Combination

Parameters

x

`(Array)`

any type of data
k

`(int)`

the number of objects in each group (without replacement)
Returns

`Array<Array>`

:
array of combinations
Example

`combinations([1, 2, 3], 2); // => [[1,2], [1,3], [2,3]]`
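Without replacement, each element may appear at most once, so a recursive enumeration always advances to the next index after picking an element. A minimal sketch under that assumption (the names `combinationsSketch` and `build` are hypothetical; this is not the library's source):

```javascript
// Hypothetical sketch: k-element combinations of x, no repeats.
function combinationsSketch(x, k) {
    var result = [];
    function build(startIndex, current) {
        if (current.length === k) {
            result.push(current.slice());
            return;
        }
        // Recursing with i + 1 prevents reusing the element at index i.
        for (var i = startIndex; i < x.length; i++) {
            current.push(x[i]);
            build(i + 1, current);
            current.pop();
        }
    }
    build(0, []);
    return result;
}

combinationsSketch([1, 2, 3], 2); // => [[1, 2], [1, 3], [2, 3]]
```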

When combining two lists of values whose means are already known, one does not necessarily have to recompute the mean of the combined list in linear time. One can instead use this function to compute the combined mean from the mean and number of values of the first list and the mean and number of values of the second list.

Since: 3.0.0

Parameters

mean1

`(number)`

mean of the first list
n1

`(number)`

number of items in the first list
mean2

`(number)`

mean of the second list
n2

`(number)`

number of items in the second list
Returns

`number`

:
the combined mean
Example

`combineMeans(5, 3, 4, 3); // => 4.5`
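The arithmetic behind this is a weighted average: multiplying each mean by its count recovers that list's sum, and the two sums divided by the total count give the combined mean. A minimal sketch (the name `combineMeansSketch` is hypothetical and this is not the library's source):

```javascript
// Hypothetical sketch: combined mean as a count-weighted average.
function combineMeansSketch(mean1, n1, mean2, n2) {
    // mean * n recovers each list's sum without revisiting its values.
    return (mean1 * n1 + mean2 * n2) / (n1 + n2);
}

combineMeansSketch(5, 3, 4, 3); // => 4.5, since (15 + 12) / 6 = 4.5
```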

When combining two lists of values whose variances are already known, one does not necessarily have to recompute the variance of the combined list in linear time. One can instead use this function to compute the combined variance from the variance, mean and number of values of the first list and the variance, mean and number of values of the second list.

combineVariances(variance1: number, mean1: number, n1: number, variance2: number, mean2: number, n2: number): number

Since: 3.0.0

Parameters

variance1

`(number)`

variance of the first list
mean1

`(number)`

mean of the first list
n1

`(number)`

number of items in the first list
variance2

`(number)`

variance of the second list
mean2

`(number)`

mean of the second list
n2

`(number)`

number of items in the second list
Returns

`number`

:
the combined variance
Example

`combineVariances(14 / 3, 5, 3, 8 / 3, 4, 3); // => 47 / 12`
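A sketch of how such a combination can be computed, assuming both inputs are population variances (divide by n, not n - 1): each list contributes its own variance plus the squared distance of its mean from the combined mean, weighted by its size. The name `combineVariancesSketch` is hypothetical and this is not the library's source.

```javascript
// Hypothetical sketch: combine two population variances.
function combineVariancesSketch(variance1, mean1, n1, variance2, mean2, n2) {
    var n = n1 + n2;
    // Combined mean as a count-weighted average of the two means.
    var mean = (mean1 * n1 + mean2 * n2) / n;
    // Each list adds its internal spread plus the shift of its mean.
    return (
        n1 * (variance1 + Math.pow(mean1 - mean, 2)) +
        n2 * (variance2 + Math.pow(mean2 - mean, 2))
    ) / n;
}

combineVariancesSketch(14 / 3, 5, 3, 8 / 3, 4, 3); // ≈ 3.9167, i.e. 47 / 12
```

As a check, the lists `[2, 6, 7]` and `[2, 4, 6]` have exactly these means and population variances, and the population variance of their six values together is 23.5 / 6 = 47 / 12.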

Implementation of Heap's Algorithm for generating permutations.
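Heap's algorithm produces each new permutation from the previous one with a single swap. The sketch below uses the well-known non-recursive form with an array of counters; the name `permutationsHeapSketch` is hypothetical, copying the input is a design choice of this sketch, and it is not the library's source.

```javascript
// Hypothetical sketch of non-recursive Heap's algorithm.
function permutationsHeapSketch(elements) {
    var items = elements.slice(); // work on a copy of the input
    var result = [items.slice()]; // the starting order counts as one permutation
    var counters = new Array(items.length).fill(0);
    var i = 0;
    while (i < items.length) {
        if (counters[i] < i) {
            // Even i swaps with position 0, odd i with position counters[i].
            var swapIndex = i % 2 === 0 ? 0 : counters[i];
            var tmp = items[swapIndex];
            items[swapIndex] = items[i];
            items[i] = tmp;
            result.push(items.slice());
            counters[i]++;
            i = 0;
        } else {
            counters[i] = 0;
            i++;
        }
    }
    return result;
}

permutationsHeapSketch([2, 1]); // => [[2, 1], [1, 2]]
```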

Parameters

elements

`(Array)`

any type of data
Returns

`Array<Array>`

:
array of permutations

Kurtosis is a measure of the heaviness of a distribution's tails relative to its variance. The kurtosis value can be positive or negative, or even undefined.

Implementation is based on Fisher's excess kurtosis definition and uses unbiased moment estimators. This is the version found in Excel and available in several statistical packages, including SAS and SciPy.

Parameters

Returns

`number`

:
sample kurtosis
Throws

- Error: if x has length less than 4

Example

`sampleKurtosis([1, 2, 2, 3, 5]); // => 1.4555765595463122`
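For readers who want to see the arithmetic, the sketch below assumes the usual bias-corrected excess kurtosis estimator (the form used by Excel's `KURT`), written in terms of the sums of squared and fourth-power deviations from the mean. The name `sampleKurtosisSketch` is hypothetical and this is not the library's source.

```javascript
// Hypothetical sketch of bias-corrected sample excess kurtosis.
function sampleKurtosisSketch(x) {
    var n = x.length;
    if (n < 4) {
        throw new Error('at least four data points are required');
    }
    var mean = x.reduce(function (a, b) { return a + b; }, 0) / n;
    var sumSquares = 0; // sum of squared deviations
    var sumFourth = 0;  // sum of fourth-power deviations
    for (var i = 0; i < n; i++) {
        var d = x[i] - mean;
        sumSquares += d * d;
        sumFourth += d * d * d * d;
    }
    // Scaled fourth moment minus the correction term for a normal sample.
    var scaled = (n * (n + 1) * (n - 1) * sumFourth) /
        ((n - 2) * (n - 3) * sumSquares * sumSquares);
    var correction = (3 * (n - 1) * (n - 1)) / ((n - 2) * (n - 3));
    return scaled - correction;
}

sampleKurtosisSketch([1, 2, 2, 3, 5]); // ≈ 1.45558
```

The denominator contains `(n - 2) * (n - 3)`, which is why fewer than four data points cannot be handled, and a sample whose values are all equal gives an undefined result because `sumSquares` is zero.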

When removing a value from a list, one does not necessarily have to recompute the mean of the list in linear time. One can instead use this function to compute the new mean from the current mean, the number of elements in the list that produced it, and the value to remove.

Since: 3.0.0

Parameters

mean

`(number)`

current mean
n

`(number)`

number of items in the list
value

`(number)`

the value to remove
Returns

`number`

:
the new mean
Example

`subtractFromMean(20.5, 6, 53); // => 14`
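The update works by recovering the list's sum from the mean and count, removing the value, and dividing by the reduced count. A minimal sketch (the name `subtractFromMeanSketch` is hypothetical and this is not the library's source):

```javascript
// Hypothetical sketch: mean of a list after removing one value.
function subtractFromMeanSketch(mean, n, value) {
    // mean * n is the old sum; one fewer element remains afterwards.
    return (mean * n - value) / (n - 1);
}

subtractFromMeanSketch(20.5, 6, 53); // => 14, since (123 - 53) / 5 = 14
```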