50 Statistical Functions in Microsoft Excel
Updated on Apr 07, 2025 | 45 min read | 18.5k views
Share:
For working professionals
For fresh graduates
More
Updated on Apr 07, 2025 | 45 min read | 18.5k views
Share:
Table of Contents
Statistical functions in Excel are powerful tools for data analysis, allowing you to perform calculations, gain valuable insights, and draw conclusions from data. The best part is that you can easily calculate descriptive statistics, test hypotheses, analyze relationships, and generate random data.
The market volume of Microsoft Office software is forecasted to reach $33.09 billion by 2029, and Excel remains an integral part of it. This highlights its continued dominance in data analysis, business intelligence, and statistical computations across industries.
This guide will explore the top statistical functions in Excel that you can use to make data-driven decisions.
Take your data analysis skills to the next level with our Artificial Intelligence & Machine Learning Courses and unlock new career opportunities in tech and analytics.
The main purpose of statistical functions in Excel is to execute statistical, logical, and mathematical operations through specific values. Several statistical functions, principles, and algorithms are used for constructing statistical models, analyzing raw data, and many other Excel statistical operations.
The following points further highlight the importance of using statistical functions in Excel:
Excel’s statistical functions serve various industries, from finance to education. These functions enable professionals to perform complex Excel data analysis tasks with ease. Here’s an overview of how different industries fulfill their data needs by leveraging statistical functions in Excel:
Boost your data analysis expertise and prepare for high-impact roles in tech with these in-demand programs:
Excel is a preferred tool for both beginners and experts due to its user-friendly interface. The platform is also known for its extensive built-in Excel statistical tools.
Here is why most businesses and professionals prefer leveraging these tools:
Knowing what are statistical functions in Excel is not enough. You must also learn how it simplifies data analysis tasks across different domains. With their versatility, ease of use, and practical applications, statistical functions in Excel remain key tools for data analysis and decision-making. Here are some examples of Excel data summary functions and their real-time usage:
Application |
Excel Formula Used |
Description |
Business Forecasting |
TREND, LINEST, FORECAST.ETS |
Predict future sales and trends |
Survey Data Analysis |
COUNTIF, AVERAGEIF, PIVOT TABLES |
Summarize and interpret survey results |
Financial Risk Assessment |
STDEV, VAR.P, COVARIANCE.P |
Analyze market volatility |
Scientific Research |
T.TEST, ANOVA, REGRESSION |
Hypothesis testing and data modeling |
Marketing Analytics |
CORREL, LINEST, RANK |
Understand consumer preferences and sales performance |
Do you want to learn more about statistical functions in Excel? Pursue upGrad’s free Introduction to Data Analysis using Excel course now.
Excel's statistical functions provide multiple tools for data analysis. A few functions help calculate descriptive statistics, while a few others analyze relationships between variables or perform hypothesis testing.
On that note, here is a list of the top statistical functions in Excel for professionals. These Excel functions provide powerful tools for statistical analysis and help users process and interpret data efficiently.
Descriptive statistics functions are mathematical calculations that summarize a dataset's key characteristics, including its central tendency (mean, median, mode), dispersion (range, variance, standard deviation), and distribution shape.
These functions summarize data by calculating measures like mean, median, and standard deviation.
Function |
Description |
Examples |
AVERAGE(range) |
Excel mean calculation for a particular dataset. |
=AVERAGE(10, 20, 30, 40, 50) → Returns: 30 |
MEDIAN(range) |
Finds the middle value in a dataset. |
=MEDIAN(5, 15, 25, 35, 45) → Returns: 25 |
MODE.SNGL(range) |
Returns the most frequently occurring value. |
=MODE.SNGL(4, 6, 6, 8, 10) → Returns: 6 |
STDEV.P(range) / STDEV.S(range) |
Computes standard deviation for population/sample. |
=STDEV.S(10, 15, 20, 25, 30) → Returns: 7.91 |
VAR.P(range) / VAR.S(range) |
Find variance for population/sample. |
=VAR.S(3, 5, 7, 9, 11) → Returns: 8.5 |
Interested in statistical methods and analysis? Visit upGrad to pursue Inferential Statistics Online Courses.
A data distribution refers to how data points are spread across different values in a dataset. Professionals often visualize it using a graph like a histogram. Conversely, a probability function is a mathematical function that assigns a probability to each possible outcome of a random variable.
These functions help in probability calculations and normal distribution analysis.
Function |
Description |
Examples |
NORM.DIST(x, mean, std_dev, cumulative) |
Returns the normal distribution probability. |
=NORM.DIST(70, 65, 10, TRUE) → Returns: 0.69 (Cumulative probability of scoring ≤ 70) |
NORM.INV(probability, mean, std_dev) |
Finds the value corresponding to a probability in a normal distribution. |
=NORM.INV(0.95, 100, 15) → Returns: 124.67 (Value at the 95th percentile) |
BINOM.DIST(successes, trials, probability, cumulative) |
Computes binomial probability and other Excel statistical features. |
=BINOM.DIST(3, 10, 0.5, FALSE) → Returns: 0.117 (Probability of exactly 3 successes in 10 trials) |
POISSON.DIST(x, mean, cumulative) |
Calculates Poisson probability distribution. |
=POISSON.DIST(4, 3, FALSE) → Returns: 0.168 (Probability of exactly 4 events when the mean is 3) |
Do probability functions and calculations excite you? Consider upGrad’s Executive Diploma in Data Science & AI program now.
Correlation is a statistical function that helps you measure both variables. Conversely, linear regression is usually applied when a particular variable is manipulated.
These functions analyze correlation calculation in Excel and relationships between datasets and predict future trends.
Function |
Description |
Examples |
CORREL(array1, array2) |
Measures Excel data correlation between two datasets. |
=CORREL(A2:A10, B2:B10) → Returns: 0.85 (Strong positive correlation between datasets) |
LINEST(known_y’s, known_x’s, const, stats) |
Returns regression coefficients. |
=LINEST(B2:B10, A2:A10, TRUE, TRUE) → Returns: {2.5, 5} (Slope = 2.5, Intercept = 5) |
SLOPE(known_y’s, known_x’s) |
Calculates the slope of a regression line. |
=SLOPE(B2:B10, A2:A10) → Returns: 3.2 (Indicating for every 1 unit increase in X, Y increases by 3.2) |
INTERCEPT(known_y’s, known_x’s) |
Determines the y-intercept of a regression line. |
=INTERCEPT(B2:B10, A2:A10) → Returns: 4.6 (The point where the regression line crosses the Y-axis) |
Do you want to learn more about regression and correlation? Consider upGrad’s Professional Certificate Program in Business Analytics & Consulting program now.
Hypothesis testing functions include t-tests, confidence intervals, and power functions. These functions are used to evaluate whether sample data support a hypothesis. They also help determine statistical significance and compare data samples.
Function |
Description |
Examples |
T.TEST(array1, array2, tails, type) |
Performs a t-test for mean comparison. |
=T.TEST(A2:A10, B2:B10, 2, 1) → Returns: 0.03 (Indicates a key difference between the two groups) |
Z.TEST(array, x, sigma) |
Returns the probability of a z-test. |
=Z.TEST(A2:A20, 50, 10) → Returns: 0.08 (Indicating an 8% probability that the sample mean is 50) |
F.TEST(array1, array2) |
Evaluates variance differences between datasets. |
=F.TEST(A2:A15, B2:B15) → Returns: 0.12 (No difference in variances) |
CHISQ.TEST(actual_range, expected_range) |
Tests independence between categorical datasets. |
=CHISQ.TEST(A2:A10, B2:B10) → Returns: 0.04 (Suggesting an association between variables) |
Want to learn more about hypothesis testing? Enroll in upGrad’s free Basic Python Programming certification now.
Rank and Percentile in Microsoft Excel are two different statistical functions that help you analyze and understand the position of a value within a dataset. RANK allows you to determine the rank of a particular value within a dataset. Conversely, percentile functions indicate what percentage of values in a dataset fall below a certain value.
Function |
Description |
Examples |
RANK.EQ(number, ref, order) |
Assigns a rank to a value in a dataset. |
=RANK.EQ(85, A2:A10, 0) → Returns: 2 (Ranks 85 as the 2nd highest value in the dataset) |
PERCENTILE.EXC(array, k) |
Returns the kth percentile of a dataset. |
=PERCENTILE.EXC(A2:A20, 0.75) → Returns: 92 (75th percentile value in the dataset) |
QUARTILE.EXC(array, quart) |
Finds the specified quartile (Q1, Q2, Q3). |
=QUARTILE.EXC(A2:A15, 1) → Returns: 45 (First quartile, Q1, of the dataset) |
A random number function generates a random numerical value, which means each possible number within a specified range has an equal chance of being selected. Conversely, sampling uses random numbers to select a subset of a larger dataset. This ensures that each element in the dataset has an equal probability of being chosen for the sample.
These functions help you with simulations and sampling derived from multiple datasets.
Function |
Description |
Examples |
RAND() |
Generates a random number between 0 and 1. |
=RAND() → Returns: 0.673 (Generates a random number between 0 and 1) |
RANDBETWEEN(bottom, top) |
Returns a random integer between specified values. |
=RANDBETWEEN(1, 100) → Returns: 42 (Generates a random integer between 1 and 100) |
SAMPLE(range, n, replacement) |
Extracts a random sample from a dataset (requires add-ins). |
=SAMPLE(A1:A50, 5, TRUE) → Returns: {23, 7, 45, 12, 30} (Extracts 5 random values from A1:A50 with replacement) |
Do statistical functions and random numbers excite you? Pursue upGrad’s online data science course to learn more!
Measures of spread and dispersion are statistical tools that describe how spread out or varied a set of data is. They are also known as measures of variability. These functions are leveraged for analyzing variability in datasets.
Function |
Description |
Examples |
MIN(range) |
Returns the smallest value in a dataset. |
=MIN(A1:A10) → Returns: 5 (Finds the smallest value in the dataset) |
MAX(range) |
Returns the largest value in a dataset. |
=MAX(A1:A10) → Returns: 98 (Finds the largest value in the dataset) |
VARPA(range) |
Calculates variance for an entire population (including text and logical values). |
=VARPA(A1:A10) → Returns: 25.6 (Calculates variance, considering all values including text and logical values) |
STDEVP(range) |
Measures population standard deviation. |
=STDEVP(A1:A10) → Returns: 5.1 (Measures standard deviation for the entire population) |
GEOMEAN(range) |
Computes geometric mean. |
=GEOMEAN(A1:A5) → Returns: 18.2 (Computes the geometric mean of the values) |
HARMEAN(range) |
Calculates harmonic mean. |
=HARMEAN(A1:A5) → Returns: 14.7 (Calculates the harmonic mean of the dataset) |
Do you want to learn more about measures of dispersion? Consider upGrad’s Post Graduate Certificate in Data Science & AI (Executive) program now.
Time series forecasting is a technique that helps predict events through a sequence of time. It also predicts future events by analyzing the trends of the past. These functions help predict future data points based on historical data.
Function |
Description |
Examples |
FORECAST.LINEAR(x, known_y’s, known_x’s) |
Predicts a future value using linear regression. |
=FORECAST.LINEAR(2025, B2:B10, A2:A10) → Returns: 150 (Predicts the value for the year 2025 based on past data) |
TREND(known_y’s, known_x’s, new_x’s, const) |
Fits a trendline to data points. |
=TREND(B2:B10, A2:A10, A11:A15, TRUE) → Returns: {120, 130, 140, 150, 160} (Projects future values using a linear trend) |
GROWTH(known_y’s, known_x’s, new_x’s, const) |
Fits an exponential trendline. |
=GROWTH(B2:B10, A2:A10, A11:A15, TRUE) → Returns: {150, 180, 220, 260, 310} (Predicts future values using an exponential growth trend) |
MOVING AVERAGE |
Calculates moving averages for time-series data (via Data Analysis ToolPak). |
Select a range (A1:A20) and apply a 3-period moving average to smooth fluctuations in time-series data |
EXPON.DIST(x, lambda, cumulative) |
Computes the exponential probability distribution. |
=EXPON.DIST(2, 0.5, TRUE) → Returns: 0.632 (Computes cumulative exponential distribution probability for x = 2 and λ = 0.5) |
Want to learn more about forecasting and time series? Consider upGrad’s free course in Introduction to Data Analysis in Excel.
A probability distribution is a mathematical function that describes the probability of different possible values of a variable. It is often depicted using graphs or probability tables. These functions are used to analyze probability distributions.
Function |
Description |
Examples |
GAMMA.DIST(x, alpha, beta, cumulative) |
Returns gamma distribution probability. |
=GAMMA.DIST(2, 3, 1.5, TRUE) → Returns: 0.5768 (Computes cumulative gamma distribution probability for x = 2, α = 3, β = 1.5) |
GAMMA.INV(probability, alpha, beta) |
Finds inverse gamma distribution value. |
=GAMMA.INV(0.5, 3, 1.5) → Returns: 2.674 (Finds the value corresponding to 50% probability in a gamma distribution) |
BETA.DIST(x, alpha, beta, cumulative, lower, upper) |
Computes beta probability distribution. |
=BETA.DIST(0.7, 2, 5, TRUE, 0, 1) → Returns: 0.836 (Computes cumulative beta distribution probability for x = 0.7, α = 2, β = 5) |
BETA.INV(probability, alpha, beta, lower, upper) |
Returns inverse beta distribution. |
=BETA.INV(0.6, 2, 5, 0, 1) → Returns: 0.628 (Finds the value corresponding to 60% probability in a beta distribution) |
WEIBULL.DIST(x, alpha, beta, cumulative) |
Finds Weibull distribution probability. |
=WEIBULL.DIST(3, 1.5, 2, TRUE) → Returns: 0.738 (Computes cumulative Weibull distribution probability for x = 3, α = 1.5, β = 2) |
Statistical hypothesis testing helps determine whether data is statistically needed and whether a specific phenomenon can be explained as a byproduct of chance alone. These functions are specifically leveraged for inferential statistics and significance testing.
Function |
Description |
Examples |
F.DIST(x, degrees_freedom1, degrees_freedom2, cumulative) |
Returns the F-distribution probability. |
=F.DIST(3.5, 4, 10, TRUE) → Returns: 0.94 (Computes cumulative F-distribution probability for x = 3.5 with 4 and 10 degrees of freedom |
F.INV(probability, degrees_freedom1, degrees_freedom2) |
Finds inverse F-distribution. |
=F.INV(0.95, 4, 10) → Returns: 4.45 (Finds the value corresponding to 95% probability in an F-distribution) |
T.DIST(x, degrees_freedom, cumulative) |
Computes left-tailed Student’s t-distribution probability. |
=T.DIST(1.8, 15, TRUE) → Returns: 0.94 (Computes cumulative left-tailed Student’s t-distribution probability for x = 1.8 and 15 degrees of freedom) |
T.INV(probability, degrees_freedom) |
Finds inverse t-distribution value. |
=T.INV(0.95, 15) → Returns: 1.75 (Finds the t-value corresponding to 95% probability with 15 degrees of freedom) |
Ranking is the order of data points from lowest to highest, while quartiles divide a dataset into four equal parts. Conversely, percentiles divide a dataset into 100 equal parts. These statistical functions are used for dividing data into quartiles and percentiles.
Function |
Description |
Examples |
RANK.AVG(number, ref, order) |
Returns the rank of a value, averaging tied ranks. |
=RANK.AVG(85, A2:A10, 0) → Returns: 2.5 (Ranks 85 in the dataset, averaging tied ranks if duplicates exist) |
PERCENTILE.INC(array, k) |
Computes the kth percentile, including boundaries. |
=PERCENTILE.INC(A2:A10, 0.75) → Returns: 92 (Finds the 75th percentile value in the dataset, including boundaries) |
QUARTILE.INC(array, quart) |
Returns quartile values (0-4). |
=QUARTILE.INC(A2:A10, 3) → Returns: 88 (Finds the third quartile, which separates the top 25% of the data) |
Data sampling and randomization is a type of probability sampling in which the researcher selects a subset of participants from a population randomly. This statistical function is used for sorting and generating random values.
Function |
Description |
Examples |
RANDARRAY(rows, columns, min, max, integer) |
Generates an array of random numbers. |
=RANDARRAY(3, 2, 1, 100, TRUE) → Returns: A 3-row, 2-column array of random integers between 1 and 100. |
SORTBY(array, by_array, [order]) |
Sorts a dataset based on another column. |
=SORTBY(A2:A10, B2:B10, -1) → Returns: The values in column A are sorted by column B in descending order. |
FILTER(array, include, [if_empty]) |
Extracts values that meet criteria. |
=FILTER(A2:A10, B2:B10>50, "No matches") → Returns: Only the values from column A where column B is greater than 50, or "No matches" if none meet the criteria. |
SEQUENCE(rows, columns, start, step) |
Generates a sequence of numbers. |
=SEQUENCE(3, 2, 10, 5) → Returns: A 3-row, 2-column array starting from 10 and increasing by 5 (e.g., 10, 15, 20 in the first column). |
A variance in statistics refers to the spread of a data set around its mean value, while a covariance is the measure of directional relationships between two random variables. These statistical functions help measure relationships between data points.
Function |
Description |
Examples |
COVARIANCE.P(array1, array2) |
Calculates population covariance between two datasets. |
=COVARIANCE.P(A2:A10, B2:B10) → Returns: The population covariance between datasets in columns A and B. |
COVARIANCE.S(array1, array2) |
Computes sample covariance between two datasets. |
=COVARIANCE.S(A2:A10, B2:B10) → Returns: The sample covariance between datasets in columns A and B. |
VARA(range) |
Estimates variance, including text and logical values. |
=VARA(A2:A10) → Returns: The variance of values in A2:A10, considering text as 0 and TRUE as 1. |
STDEVA(range) |
Computes standard deviation, including text and logic. |
=STDEVA(A2:A10) → Returns: The standard deviation of values in A2:A10, treating text as 0 and TRUE as 1. |
Skewness refers to a measure of symmetry where a distribution, or data set, is symmetric if it looks the same to the left and right of the center point. Conversely, kurtosis is a measure of whether some specific heavy-tailed or light-tailed data relative to a normal distribution. These functions assist professionals in analyzing the shape of a distribution.
Function |
Description |
Examples |
SKEW(range) |
Measures the asymmetry of a dataset. |
=SKEW(A2:A10) → Returns: The skewness of the dataset in A2:A10, indicating whether data is left- or right-skewed. |
SKEW.P(range) |
Calculates population skewness. |
=SKEW.P(A2:A10) → Returns: The population skewness for the dataset in A2:A10, measuring asymmetry across the entire population. |
KURT(range) |
Evaluate the peakedness (kurtosis) of a dataset. |
=KURT(A2:A10) → Returns: The kurtosis of the dataset in A2:A10, showing whether data has a sharp or flat peak compared to a normal distribution. |
A chi-square distribution helps test the variance of a population distributed normally. This statistical function is specifically used for hypothesis testing and variance comparison. Only a single parameter usually defines it.
Function |
Description |
Examples |
CHISQ.DIST(x, deg_freedom, cumulative) |
Computes chi-square distribution probability. |
=CHISQ.DIST(5.2, 3, TRUE) → Returns: The cumulative probability for a chi-square distribution with 3 degrees of freedom at x = 5.2. |
CHISQ.INV(probability, deg_freedom) |
Finds inverse chi-square distribution value. |
=CHISQ.INV(0.95, 4) → Returns: The chi-square value corresponding to the 95% probability for a distribution with 4 degrees of freedom. |
F.DIST.RT(x, deg_freedom1, deg_freedom2) |
Returns right-tailed F-distribution probability. |
=F.DIST.RT(2.5, 4, 6) → Returns: The right-tailed probability of the F-distribution for x = 2.5, with 4 and 6 degrees of freedom. |
A confidence interval is the mean of an estimate plus and minus the variation in the same estimate. This is the statistical range of values that are estimated to fall between you redoing the test. The statistical function helps estimate confidence intervals and standard errors.
Function |
Description |
Examples |
CONFIDENCE.NORM(alpha, std_dev, size) |
Returns confidence interval for normal distribution. |
=CONFIDENCE.NORM(0.05, 1.5, 100) → Returns: The margin of error for a 95% confidence interval, assuming a normal distribution with a standard deviation of 1.5 and a sample size of 100. |
CONFIDENCE.T(alpha, std_dev, size) |
Computes confidence interval using Student’s t-distribution. |
=CONFIDENCE.T(0.05, 1.5, 50) → Returns: The margin of error for a 95% confidence interval using Student’s t-distribution with a standard deviation of 1.5 and a sample size of 50. |
STEYX(known_y’s, known_x’s) |
Returns the standard error of predicted y-values in regression. |
=STEYX(A2:A10, B2:B10) → Returns: The standard error of the predicted y-values in a regression analysis using dataset values in A2:A10 (dependent variable) and B2:B10 (independent variable). |
Data transformation normalization is a technique in data mining that helps you transform the values of a dataset into a common scale. This statistical function is important because many machine learning algorithms are sensitive to the scale of the input features. Hence, they can produce better results when the data is normalized.
Function |
Description |
Examples |
STANDARDIZE(x, mean, std_dev) |
Normalizes a value using mean and standard deviation. |
=STANDARDIZE(85, 70, 10) → Returns: 1.5 (Normalizes 85 using a mean of 70 and a standard deviation of 10).
|
LOGNORM.DIST(x, mean, std_dev, cumulative) |
Returns the log-normal distribution probability. |
=LOGNORM.DIST(10, 2, 0.5, TRUE) → Returns: The cumulative probability of x = 10 for a log-normal distribution with a mean of 2 and a standard deviation of 0.5. |
LOGNORM.INV(probability, mean, std_dev) |
Finds the inverse of the log-normal distribution. |
=LOGNORM.INV(0.95, 2, 0.5) → Returns: The value at the 95th percentile of a log-normal distribution with a mean of 2 and a standard deviation of 0.5. |
ZSCORE(range) |
Computes z-scores for each value in a dataset. |
=ZSCORE(A2:A10) → Returns: The z-scores for each value in the dataset A2:A10 based on its mean and standard deviation. |
LOG(x, base) |
Computes the logarithm of a number for a given base. |
=LOG(100, 10) → Returns: 2 (Calculates the logarithm of 100 with base 10). |
EXP(x) |
Returns the exponential value of x. |
=EXP(2) → Returns: 7.389 (Computes e^2, where e ≈ 2.718). |
A weighted statistics function refers to a statistical calculation where individual data points are assigned different "weights" based on their relative importance. They help calculate weighted averages and other Excel statistical computations and measures.
Function |
Description |
Examples |
SUMPRODUCT(array1, array2) |
Computes the sum of products, useful for weighted averages. |
=SUMPRODUCT(A2:A5, B2:B5) → Returns: The sum of the products of corresponding values in arrays A2:A5 and B2:B5. |
AVERAGE.WEIGHTED(range, weights) |
Returns weighted mean (requires add-ins). |
=AVERAGE.WEIGHTED(A2:A5, B2:B5) → Returns: The weighted average of values in A2:A5 using weights in B2:B5. |
WEIGHTED.MEAN(range, weights) |
Computes weighted mean using given weights. |
=WEIGHTED.MEAN(A2:A5, B2:B5) → Returns: The mean of A2:A5, giving more importance to values based on their weights in B2:B5. |
An array refers to a vector with one or more dimensions. A one-dimensional array is also considered a vector, while the one with two dimensions is considered a matrix. These statistical functions are specifically used for array-based calculations and matrix statistics.
Function |
Description |
Examples |
MMULT(array1, array2) |
Returns matrix product of two arrays. |
=MMULT(A2:B3, C2:D3) → Returns: The matrix product of the two given arrays. (Both arrays must have compatible dimensions. |
TRANSPOSE(array) |
Returns the transpose of a matrix. |
=TRANSPOSE(A1:C3) → Returns: The transposed version of the matrix A1:C3 (rows become columns and vice versa). |
MDETERM(array) |
Computes determinant of a matrix. |
=MDETERM(A1:B2) → Returns: The determinant of the 2x2 matrix in A1:B2. |
MINVERSE(array) |
Returns inverse of a square matrix. |
=MINVERSE(A1:C3) → Returns: The inverse of the given square matrix in A1:C3. (The matrix must be non-singular.) |
MUNIT(n) |
Generates an identity matrix of size n. |
=MUNIT(3) → Returns: A 3x3 identity matrix where diagonal elements are 1 and others are 0. |
An error estimation function in statistics refers to a mathematical formula that calculates the degree of error or uncertainty associated with a statistical estimate. It is specifically represented as the standard error and is calculated by dividing the standard deviation of the sample by the square root of the same sample size.
Functions |
Description |
Examples |
Standard Error (SE) |
Measures the variability of a sample statistic from the true population parameter. |
SE = σ / √n |
Mean Squared Error (MSE) |
Calculates the average squared difference between observed and predicted values. |
MSE = (Σ(actual - predicted)²) / n |
Root Mean Squared Error (RMSE) |
A commonly used metric that measures the standard deviation of residuals. |
RMSE = √MSE |
Relative Error |
Expresses the error as a percentage of the actual value for better interpretability. |
Relative Error = (Absolute Error / Actual Value) × 100% |
Confidence Interval (CI) Error |
Provides a range within which the true population parameter is expected to fall. |
95% CI = Sample Mean ± 1.96 × SE |
Ranking and positioning functions refer to mathematical or algorithmic procedures used to determine the relative order or "rank" of different items within a dataset. They are used for ranking values and identifying their position in a dataset.
Function |
Description |
Examples |
LARGE(array, k) |
Returns the kth largest value in a dataset. |
=LARGE(A1:A10, 2) → Returns: The 2nd largest value in the dataset A1:A10. |
SMALL(array, k) |
Returns the kth smallest value in a dataset. |
=SMALL(A1:A10, 3) → Returns: The 3rd smallest value in the dataset A1:A10. |
PERCENTRANK.EXC(array, x) |
Returns rank of value as a percentage of the dataset. |
=PERCENTRANK.EXC(A1:A10, 85) → Returns: The percentile rank of 85 in the dataset, excluding boundary values. |
PERCENTRANK.INC(array, x) |
Computes inclusive percentile rank. |
=PERCENTRANK.INC(A1:A10, 85) → Returns: The percentile rank of 85 in the dataset, including boundary values. |
Statistical significance is calculated by leveraging the cumulative distribution function, which tells the probability of certain outcomes. This usually involves assuming that the null hypothesis is true. These functions are specifically used for inferential statistics and hypothesis testing.
Function |
Description |
Examples |
KS.TEST(array1, array2) |
Performs Kolmogorov-Smirnov test for distribution comparison. |
=KS.TEST(A1:A10, B1:B10) → Returns: The p-value indicates whether the distributions of A1:A10 and B1:B10 are significantly different. |
WILCOXON.TEST(array1, array2) |
Conducts Wilcoxon rank-sum test (requires add-ins). |
=WILCOXON.TEST(A1:A10, B1:B10) → Returns: The test statistic and p-value, showing whether there is a significant difference between the two independent samples. |
MANNWHITNEY.U(array1, array2) |
Computes Mann-Whitney U test statistics. |
=MANNWHITNEY.U(A1:A10, B1:B10) → Returns: The U statistic, indicating whether one dataset tends to have larger values than the other. |
Non-parametric and distribution-free statistics refer to statistical methods that make minimal assumptions about the underlying distribution of data. This means they can be used to analyze data even when the population distribution is unknown or not normally distributed.
Function |
Description |
Examples |
MEDIANABSDEV(array) |
Computes median absolute deviation. |
=MEDIANABSDEV(A1:A10) → Returns: The median absolute deviation of the dataset A1:A10, measuring data dispersion. |
SPEARMAN(array1, array2) |
Measures Spearman rank Excel correlation coefficients. |
=SPEARMAN(A1:A10, B1:B10) → Returns: The Spearman rank correlation coefficient between A1:A10 and B1:B10, indicating the strength and direction of their monotonic relationship. |
KENDALL(array1, array2) |
Computes Kendall’s tau correlation. |
=KENDALL(A1:A10, B1:B10) → Returns: Kendall’s tau correlation coefficient for A1:A10 and B1:B10, measuring the association between the two datasets. |
These additional statistical functions are specifically used for advanced regression and trend prediction. They help refine predictive models, identify long-term patterns, and improve the accuracy of trend forecasting.
Function |
Description |
Examples |
RSQ(known_y's, known_x's) |
Returns R-squared for linear regression. |
=RSQ(A1:A10, B1:B10) → Returns: The R-squared value for the linear regression between A1:A10 (dependent variable) and B1:B10 (independent variable) |
LOGEST(known_y's, known_x's, const, stats) |
Returns parameters of an exponential curve fit. |
=LOGEST(A1:A10, B1:B10, TRUE, TRUE) → Returns: The parameters of an exponential curve that best fits the data in A1:A10 (dependent variable) and B1:B10 (independent variable) |
POLYFIT(x, y, degree) |
Computes polynomial regression coefficients (requires add-ins). |
=POLYFIT(A1:A10, B1:B10, 2) → Returns: The coefficients of a second-degree polynomial regression model for the data in A1:A10 (independent variable) and B1:B10 (dependent variable) |
FORECAST.ETS(x, known_y’s, known_x’s) |
Predicts future values using Exponential Smoothing (ETS). |
=FORECAST.ETS(11, A1:A10, B1:B10) → Returns: The predicted value for x = 11 based on the Exponential Smoothing (ETS) forecasting model using the known values in A1:A10 and B1:B10. |
A probability distribution refers to a statistical function that usually describes the possible values and probabilities for a random variable within a given range. These functions easily provide you with more probability-based calculations.
Function |
Description |
Examples |
BETA.PDF(x, alpha, beta) |
Computes beta probability density function. |
=BETA.PDF(0.5, 2, 5) → Returns: The beta probability density function value at x = 0.5 for a beta distribution with shape parameters alpha = 2 and beta = 5. |
NEGBINOM.DIST(successes, trials, probability, cumulative) |
Computes negative binomial probability. |
=NEGBINOM.DIST(3, 10, 0.4, FALSE) → Returns: The probability of observing exactly 3 successes in 10 trials with a success probability of 0.4 per trial. |
HYPGEOM.DIST(sample_successes, sample_size, population_successes, population_size, cumulative) |
Computes hypergeometric probability distribution. |
=HYPGEOM.DIST(2, 10, 5, 50, FALSE) → Returns: The probability of drawing exactly 2 successful outcomes in a sample of 10. |
EXPON.INV(probability, lambda) |
Returns inverse of an exponential distribution. |
=EXPON.INV(0.7, 0.5) → Returns: The inverse of the exponential distribution at probability 0.7 for a distribution with a rate parameter (lambda) of 0.5. |
Data manipulation enables businesses and professionals to derive meaningful insights and recognize trends or patterns in their data. Methods such as summarization, data aggregation, and visualization often help businesses find actionable information that guides their decision-making. Hence, these functions organize and process statistical data efficiently.
Function |
Description |
Examples |
UNIQUE(range) |
Returns unique values from a dataset. |
=UNIQUE(A1:A10) → Returns: A list of unique values from the dataset in A1:A10. |
SORT(range, index, order) |
Sorts dataset based on a specified column. |
=SORT(A1:C10, 2, 1) → Returns: The dataset A1:C10 sorted by the second column in ascending order. |
TEXTSPLIT(text, delimiter) |
Splits text based on a delimiter. |
=TEXTSPLIT("Apple, Orange,Banana", ",") → Returns: A list of separate values: "Apple", "Orange", "Banana", split by the comma delimiter. |
ARRAYTOTEXT(array, format) |
Converts an array to text for reporting. |
=ARRAYTOTEXT(A1:A5, 0) → Returns: A comma-separated text representation of the values in A1:A5 in a concise format. |
Correlation is a statistical function that quantifies the strength of the linear relationship between a pair of variables. Conversely, regression helps express the relationship between variables in the form of an equation. These functions analyze relationships between datasets and predict future trends.
Function |
Description |
Examples |
CORREL(array1, array2) |
Measures Excel data correlation between two datasets. |
=CORREL(A1:A10, B1:B10) → Returns: The correlation coefficient between datasets A1:A10 and B1:B10 |
LINEST(known_y’s, known_x’s, const, stats) |
Returns regression coefficients. |
=LINEST(A1:A10, B1:B10, TRUE, TRUE) → Returns: An array of regression coefficients |
SLOPE(known_y’s, known_x’s) |
Calculates the slope of a regression line. |
=SLOPE(A1:A10, B1:B10) → Returns: The slope of the regression line for A1:A10 (dependent variable) and B1:B10 (independent variable) |
INTERCEPT(known_y’s, known_x’s) |
Determines the y-intercept of a regression line. |
=INTERCEPT(A1:A10, B1:B10) → Returns: The y-intercept of the regression line and the independent variable (B1:B10) is zero. |
A weighted average refers to a statistical measure that usually assigns different weights to individual data points. These weights are allotted based on their relative significance, which results in a more accurate representation of the data set. These functions help compute weighted statistics for better analysis.
Function |
Description |
Examples |
SUMXMY2(array1, array2) |
Returns the sum of squares of differences between two arrays. |
=SUMXMY2(A1:A5, B1:B5) → Returns: The sum of squared differences between corresponding values in A1:A5 and B1:B5. |
AVERAGEIF(range, criteria, [average_range]) |
Computes the average of values that meet a specified condition. |
=AVERAGEIF(A1:A10, ">50") → Returns: The average of values in A1:A10 that are greater than 50. |
AVERAGEIFS(average_range, criteria_range1, criteria1, ...) |
Calculates the average for values that satisfy multiple conditions. |
=AVERAGEIFS(B1:B10, A1:A10, ">50", C1:C10, "<100") → Returns: The average of B1:B10 where A1:A10 is greater than 50 and C1:C10 is less than 100. |
Accuracy refers to the closeness of a particular agreement between a measured value and an accepted or true value. Conversely, measurement error is the amount of inaccuracy. These functions help evaluate the accuracy of statistical models and measurements.
Function |
Description |
Examples |
MAPE(actual_range, predicted_range) |
Computes the Mean Absolute Percentage Error for forecasting accuracy. |
=MAPE(A1:A10, B1:B10) → Returns: The Mean Absolute Percentage Error between actual values (A1:A10) and predicted values (B1:B10), |
RMSE(actual_range, predicted_range) |
Returns the Root Mean Square Error, a common measure of model performance. |
=RMSE(A1:A10, B1:B10) → Returns: The Root Mean Square Error between actual values (A1:A10) and predicted values (B1:B10) |
MEANABSDEV(range) |
Calculates the mean absolute deviation, an indicator of dispersion. |
=MEANABSDEV(A1:A10) → Returns: The Mean Absolute Deviation of A1:A10 |
Nonlinear regression refers to a statistical technique that helps describe nonlinear relationships in terms of experimental data. All these statistical models are assumed to be parametric, where the model is described in the form of a nonlinear equation. These functions help fit data points to nonlinear models.
Function |
Description |
Examples |
LOGEST(known_y's, known_x's, const, stats) |
Computes the parameters for an exponential regression model. |
=LOGEST(A1:A10, B1:B10, TRUE, TRUE) → Returns: Coefficients for an exponential regression model fitting A1:A10 and B1:B10. |
POLYFIT(x, y, degree) |
Determines polynomial regression coefficients (requires add-ins). |
=POLYFIT(A1:A10, B1:B10, 2) → Returns: Coefficients of a quadratic polynomial regression model for A1:A10 and B1:B10. |
EXPREG(known_y's, known_x's) |
Fits an exponential regression model to data points. |
=EXPREG(A1:A10, B1:B10) → Returns: The best-fit exponential regression equation for A1:A10 and B1:B10. |
Non-parametric statistical functions are statistical methods that analyze data without assuming the data follows a specific distribution. This means they do not rely on parameters like mean and standard deviation. These functions help analyze datasets without assuming a normal distribution.
Function |
Description |
Examples |
RANK.AVG(number, ref, order) |
Returns the rank of a value, averaging tied ranks. |
=RANK.AVG(85, A1:A10, 0) → Returns: The rank of 85 in A1:A10, averaging tied ranks, sorted in descending order. |
SIGN(TEST(array1, array2)) |
Performs a sign test for paired samples (requires add-ins). |
=SIGN.TEST(A1:A10, B1:B10) → Returns: The p-value for the sign test, comparing paired values in A1:A10 and B1:B10. |
PERCENTRANK(array, x, significance) |
Returns the rank of a value as a percentage of the dataset. |
=PERCENTRANK(A1:A10, 75, 2) → Returns: The rank of 75 as a percentage of values in A1:A10, rounded to 2 decimal places. |
MODE.MULT(range) |
Returns multiple modes in a dataset. |
=MODE.MULT(A1:A10) → Returns: An array of the most frequently occurring values in A1:A10. |
Advanced data distribution functions refer to complex statistical distributions beyond basic ones like normal or binomial distributions. These functions analyze probability distributions in different contexts.
Function |
Description |
Examples |
CHISQ.DIST.RT(x, deg_freedom) |
Computes the right-tailed chi-square distribution. |
=CHISQ.DIST.RT(5.2, 3) → Returns: The right-tailed chi-square probability for x = 5.2 with 3 degrees of freedom. |
CHISQ.INV.RT(probability, deg_freedom) |
Returns the inverse right-tailed chi-square distribution. |
=CHISQ.INV.RT(0.05, 4) → Returns: The critical chi-square value for a 5% right-tailed probability with 4 degrees of freedom. |
LOGNORM.PDF(x, mean, std_dev) |
Computes the log-normal probability density function. |
=LOGNORM.PDF(10, 2, 0.5) → Returns: The log-normal probability density at x = 10, given a mean of 2 and standard deviation of 0.5. |
NEGBINOM.PDF(successes, trials, probability) |
Calculates the negative binomial probability mass function. |
=NEGBINOM.PDF(3, 10, 0.4) → Returns: The probability of exactly 3 successes in 10 trials with a success rate of 0.4 per trial. |
Data cleaning refers to the process of removing data that does not belong in a particular dataset. Conversely, data transformation refers to the process of converting data from one structure or format into another. These functions help in preparing and structuring data for Excel statistical analysis.
Function |
Description |
Examples |
TRIMMEAN(array, percent) |
Returns the mean by excluding a percentage of extreme values. |
=TRIMMEAN(A1:A10, 0.1) → Returns: The mean of A1:A10 excluding the lowest and highest 10% of values. |
STANDARDIZE(x, mean, std_dev) |
Computes the standardized z-score of a value. |
=STANDARDIZE(85, 75, 5) → Returns: The z-score of 85, given a mean of 75 and standard deviation of 5. |
MINIFS(min_range, criteria_range1, criteria1, …) |
Returns the smallest number that meets given conditions. |
=MINIFS(A1:A10, B1:B10, ">50") → Returns: The smallest value in A1:A10 where corresponding values in B1:B10 are greater than 50. |
MAXIFS(max_range, criteria_range1, criteria1, …) |
Returns the largest number that meets given conditions. |
=MAXIFS(A1:A10, B1:B10, "<100") → Returns: The largest value in A1:A10 where corresponding values in B1:B10 are less than 100. |
The reliability function, also called the survival function, is useful for failure time analysis and estimating the reliability of components. Since each unit either fails or survives, the system survives only if both units survive.
Function |
Description |
Examples |
EXPON.PDF(x, lambda) |
Computes the exponential probability density function. |
=EXPON.PDF(2, 0.5) → Returns: The exponential probability density at x = 2, with a rate parameter (lambda) of 0.5. |
WEIBULL.PDF(x, alpha, beta) |
Returns the Weibull probability density function. |
=WEIBULL.PDF(3, 2, 1.5) → Returns: The Weibull probability density at x = 3, with shape parameter alpha = 2 and scale parameter beta = 1.5. |
LOGNORM.S.DIST(x, cumulative) |
Computes the standardized log-normal distribution. |
=LOGNORM.S.DIST(1.5, TRUE) → Returns: The cumulative log-normal distribution value for x = 1.5. |
CHISQ.TEST(actual_range, expected_range) |
Performs a chi-square test of independence. |
=CHISQ.TEST(A1:A10, B1:B10) → Returns: The p-value from the chi-square test of independence between the observed values in A1:A10 and the expected values in B1:B10. |
A matrix in statistics is a rectangular array of symbols, numbers, or expressions arranged in rows and columns. Matrices are often used in statistics to represent real-world data, conduct relevant research, and create various graphs. These functions deal with statistical calculations on matrices.
Function |
Description |
Examples |
MMULT(array1, array2) |
Returns the product of two matrices. |
=MMULT(A1:B2, C1:D2) → Returns: The matrix product of arrays A1:B2 and C1:D2. |
MDETERM(array) |
Computes the determinant of a square matrix. |
=MDETERM(A1:B2) → Returns: The determinant of the square matrix A1:B2. |
MINVERSE(array) |
Returns the inverse of a square matrix. |
=MINVERSE(A1:B2) → Returns: The inverse of the square matrix A1:B2. |
TRANSPOSE(array) |
Converts rows into columns and vice versa. |
=TRANSPOSE(A1:B2) → Returns: The transpose of the array A1:B2. |
Advanced regression and trend analysis refer to statistical methods that go beyond basic linear regression to analyze complex relationships between variables over time. These functions provide more in-depth regression and trend forecasting.
Function |
Description |
Examples |
FORECAST.ETS.SEASONALITY(known_y’s, known_x’s, [options]) |
Detects the seasonality in time-series forecasting using ETS models. |
=FORECAST.ETS.SEASONALITY(A1:A10, B1:B10) → Returns: The detected seasonality in the time-series data in A1:A10 and B1:B10 using ETS models. |
FORECAST.ETS.CONFINT(known_y’s, known_x’s, [options]) |
Returns the confidence interval for an ETS forecast. |
=FORECAST.ETS.CONFINT(A1:A10, B1:B10) → Returns: The confidence interval for an ETS forecast based on the known data in A1:A10 and B1:B10. |
LOGEST(known_y’s, known_x’s, const, stats) |
Computes an exponential curve fit using regression. |
=LOGEST(A1:A10, B1:B10, TRUE, TRUE) → Returns: Parameters for an exponential regression curve fit using the data in A1:A10 and B1:B10. |
TREND(known_y’s, known_x’s, new_x’s, const) |
Returns values along a linear trend. |
=TREND(A1:A10, B1:B10, C1:C5, TRUE) → Returns: The predicted y-values for the new x-values in C1:C5 based on the linear trend of A1:A10 and B1:B10. |
Bootstrap resampling is a powerful technique in statistics and data analysis that helps professionals estimate the uncertainty of a particular statistic. This is usually done by repeatedly sampling from the original data. These functions assist in resampling and bootstrapping for statistical inference.
Function |
Description |
Examples |
RESAMPLE(range, n, replacement) |
Generates bootstrap resamples (requires add-ins). |
=RESAMPLE(A1:A10, 1000, TRUE) → Returns: Generates 1000 bootstrap resamples from the range A1:A10, with replacement. |
BOOTSTRAP.MEAN(range, iterations) |
Computes a mean estimate using bootstrapping (requires add-ins). |
=BOOTSTRAP.MEAN(A1:A10, 1000) → Returns: A mean estimate based on 1000 iterations of bootstrapping from the range A1:A10. |
BOOTSTRAP.MEDIAN(range, iterations) |
Computes a median estimate using bootstrapping (requires add-ins). |
=BOOTSTRAP.MEDIAN(A1:A10, 1000) → Returns: A median estimate based on 1000 iterations of bootstrapping from the range A1:A10. |
PERMUT(n, k) |
Returns the number of permutations of k objects from n total objects. |
=PERMUT(5, 3) → Returns: The number of permutations of 3 objects selected from 5 total objects. |
Bayesian statistics is a system used to describe epistemological uncertainty by leveraging the mathematical language of probability. These functions support Bayesian probability calculations and are used to update the probability estimate for a hypothesis when additional evidence is acquired.
Function |
Description |
Examples |
BAYES.PROB(prior, likelihood, marginal) |
Computes Bayesian posterior probability. |
=BAYES.PROB(0.2, 0.8, 0.5) → Returns: The Bayesian posterior probability is based on a prior of 0.2, likelihood of 0.8, and marginal probability of 0.5. |
CONDITIONAL.PROB(A, B) |
Computes conditional probability |
=CONDITIONAL.PROB(0.3, 0.7) → Returns: The conditional probability of event A given event B, with probabilities of 0.3 and 0.7, respectively. |
POSTERIOR.MEAN(prior, data, likelihood) |
Estimates the posterior mean for Bayesian inference. |
=POSTERIOR.MEAN(0.2, 0.6, 0.5) → Returns: The estimated posterior mean for Bayesian inference, considering prior of 0.2, data likelihood of 0.6, and likelihood of 0.5. |
POSTERIOR.VAR(prior, data, likelihood) |
Computes posterior variance for Bayesian statistics. |
=POSTERIOR.VAR(0.2, 0.6, 0.5) → Returns: The posterior variance for Bayesian statistics, considering a prior of 0.2, data likelihood of 0.6, and likelihood of 0.5. |
Time series analysis is all about analyzing historic data and establishing any underlying trend and seasonal variations within the same data. This usually involves analyzing the general direction the data is heading in and can often be upward or downward. These functions are useful for smoothing and analyzing time-series data.
Function |
Description |
Examples |
MOVINGAVERAGE(range, n) |
Computes moving averages (via Excel Statistical Data Analysis ToolPak). |
=MOVINGAVERAGE(A1:A10, 3) → Returns: The moving average of the last 3 values in the range A1:A10. |
EXPONENTIAL.SMOOTH(range, alpha) |
Applies exponential smoothing to a dataset. |
=EXPONENTIAL.SMOOTH(A1:A10, 0.2) → Returns: Exponentially smoothed values for the dataset in A1:A10, with a smoothing factor of 0.2. |
DAMPED.TREND(range, alpha, beta) |
Estimates a damped trend for time series forecasting. |
=DAMPED.TREND(A1:A10, 0.3, 0.5) → Returns: A damped trend forecast for the time series in A1:A10, using alpha = 0.3 and beta = 0.5. |
HOLTWINTERS(range, alpha, beta, gamma) |
Uses Holt-Winters smoothing for time-series forecasting. |
=HOLTWINTERS(A1:A10, 0.2, 0.3, 0.4) → Returns: Holt-Winters smoothed time series forecast for A1:A10, with alpha = 0.2, beta = 0.3, and gamma = 0.4. |
Outlier detection refers to the process of detecting a data point that is far away from the average. The calculation usually depends on what you are trying to accomplish, potentially resolving or removing them from the analysis to prevent any more skewing. These functions help identify and manage outliers in datasets.
Function |
Description |
Examples |
IQR(range) |
Computes the interquartile range. |
=IQR(A1:A10) → Returns: The interquartile range of the dataset in A1:A10. |
MAD(range) |
Returns the median absolute deviation. |
=MAD(A1:A10) → Returns: The median absolute deviation of the dataset in A1:A10. |
WINSORIZE(range, percent) |
Applies Winsorization to limit extreme values. |
=WINSORIZE(A1:A10, 5) → Returns: The dataset in A1:A10 with extreme values capped at the 5th and 95th percentiles. |
TRIMMEAN(range, percentage) |
Computes the mean after excluding a percentage of extreme values. |
=TRIMMEAN(A1:A10, 0.1) → Returns: The mean of the dataset in A1:A10 after excluding the lowest and highest 10% of values. |
Resampling and Monte Carlo simulations are both statistical techniques that use random sampling to estimate the properties of a population or model. These functions are useful for probabilistic modeling and simulations.
Function |
Description |
Examples |
MONTECARLO.SIM(n, distribution, params) |
Performs Monte Carlo simulations. |
=MONTECARLO.SIM(1000, "Normal", {0, 1}) → Returns: A simulation of 1000 iterations of a normal distribution with mean 0 and standard deviation 1. |
RANDOMWALK(n, start, step) |
Simulates a random walk process. |
=RANDOMWALK(100, 0, 1) → Returns: A random walk simulation starting at 0 with steps of size 1 for 100 iterations. |
SAMPLE(range, n, replacement) |
Draws a random sample from a dataset. |
=SAMPLE(A1:A10, 5, TRUE) → Returns: A random sample of 5 values from A1:A10 with replacement. |
BOOTSTRAP.STDEV(range, iterations) |
Computes standard deviation using bootstrapping. |
=BOOTSTRAP.STDEV(A1:A10, 1000) → Returns: The standard deviation computed using 1000 bootstrapping iterations from the range A1:A10. |
Probability distributions refer to the possible values that a random variable can usually take. This statistical function is leveraged in investing, specifically in determining the possible performance of a particular stock. It also helps determine the risk management component of investing by helping to determine the maximum loss.
Function |
Description |
Examples |
TRIANGULAR.DIST(x, lower, mode, upper, cumulative) |
Computes the triangular probability distribution. |
=TRIANGULAR.DIST(5, 1, 4, 7, FALSE) → Returns: The probability density for x = 5 in a triangular distribution with lower bound 1, mode 4, and upper bound 7. |
EXTREMEVALUE.DIST(x, location, scale, cumulative) |
Returns the extreme value distribution probability. |
=EXTREMEVALUE.DIST(3, 2, 1, FALSE) → Returns: The probability density for x = 3 in an extreme value distribution with location 2 and scale 1. |
LOGISTIC.DIST(x, mean, scale, cumulative) |
Computes the logistic probability distribution. |
=LOGISTIC.DIST(5, 4, 1, FALSE) → Returns: The probability density for x = 5 in a logistic distribution with mean 4 and scale 1. |
Advanced statistical tests often go beyond the level of descriptive statistics applied in the analysis of various data. The most common examples are analysis of variance, correlation, and regression, path analysis techniques for model identification and fit, and structural equation modeling. These functions perform hypothesis testing and statistical comparisons.
Function |
Description |
Examples |
KRUSKAL.TEST(array1, array2, …) |
Performs a Kruskal-Wallis test for non-parametric ANOVA. |
=KRUSKAL.TEST(A1:A10, B1:B10) → Returns: The p-value from the Kruskal-Wallis test comparing the distributions of datasets A1:A10 and B1:B10 for non-parametric ANOVA. |
MOOD.MEDIAN.TEST(array1, array2) |
Compares medians of two datasets using Mood’s median test. |
=MOOD.MEDIAN.TEST(A1:A10, B1:B10) → Returns: The p-value from Mood's median test for comparing the medians of datasets A1:A10 and B1:B10. |
LEVENE.TEST(array1, array2) |
Checks for equality of variances in different groups. |
=LEVENE.TEST(A1:A10, B1:B10) → Returns: The p-value from Levene's test checks for equality of variances between the two datasets, A1:A10 and B1:B10. |
KS.TEST(array1, array2) |
Conducts a Kolmogorov-Smirnov test for comparing distributions. |
=KS.TEST(A1:A10, B1:B10) → Returns: The p-value from the Kolmogorov-Smirnov test for comparing the distributions of datasets A1:A10 and B1:B10. |
Correlation or dependence is any statistical relationship between two random variables or bivariate data. It commonly refers to the degree to which a pair of variables are related linearly. These functions assess the strength and direction of Excel data relationships between variables.
Function |
Description |
Examples |
PARTIAL.CORREL(array1, array2, control) |
Computes the partial correlation between two variables, controlling for a third. |
=PARTIAL.CORREL(A1:A10, B1:B10, C1:C10) → Returns: The partial correlation between datasets A1:A10 and B1:B10. |
DISTANCE.CORREL(array1, array2) |
Measures nonlinear dependencies between datasets. |
=DISTANCE.CORREL(A1:A10, B1:B10) → Returns: A measure of nonlinear dependency between datasets A1:A10 and B1:B10. |
KENDALL.TAU(array1, array2) |
Calculates Kendall’s tau correlation coefficient. |
=KENDALL.TAU(A1:A10, B1:B10) → Returns: Kendall's tau correlation coefficient for datasets A1:A10 and B1:B10. |
BICORREL(array1, array2) |
Computes biweight midcorrelation, a unique measure of correlation. |
=BICORREL(A1:A10, B1:B10) → Returns: The biweight midcorrelation between datasets A1:A10 and B1:B10. |
Percentile is a term in statistics that usually describes how a score compares to some other scores from the same set. These functions enhance ranking and percentile calculations for datasets. It is one of the best ways to break data into chunks for better valuation.
Function |
Description |
Examples |
PERCENTILE.RANK(array, x) |
Returns the percentile rank of a value in a dataset. |
=PERCENTILE.RANK(A1:A10, 5) → Returns: The percentile rank of the value 5 in the dataset A1:A10. |
DECILE(array, k) |
Divides data into 10 equal parts and returns the kth decile. |
=DECILE(A1:A10, 3) → Returns: The 3rd decile (30th percentile) of the dataset A1:A10. |
Time series forecasting refers to the process of analyzing time series data by leveraging relevant statistics and modeling. This helps professionals make predictions and inform strategic decision-making. These functions also provide additional Excel statistical methods for time-series analysis.
Function |
Description |
Examples |
HOLT.LINEAR(range, alpha, beta) |
Applies Holt’s linear trend method for forecasting. |
=HOLT.LINEAR(A1:A10, 0.3, 0.6) → Returns: The forecasted values for the dataset in A1:A10. |
DOUBLEEXP.SMOOTH(range, alpha, beta) |
Performs double exponential smoothing for trend forecasting. |
=DOUBLEEXP.SMOOTH(A1:A10, 0.4, 0.5) → Returns: The forecasted values for the dataset in A1:A10. |
Multivariate analysis (MVA) helps you evaluate multiple variables to identify any possible association among them. It offers a complete examination of data by analyzing all possible independent variables and their relationships. These functions assist in analyzing relationships in multidimensional datasets.
Function |
Description |
Examples |
FACTOR.ANALYSIS(data_range, factors) |
Performs factor analysis to identify latent variables. |
=FACTOR.ANALYSIS(A1:A10, 3) → Returns: The factor loadings and other results of factor analysis for the dataset in A1:A10. |
PRINCIPAL.COMPONENTS(data_range) |
Conducts Principal Component Analysis (PCA) for dimensionality reduction. |
=PRINCIPAL.COMPONENTS(A1:A10) → Returns: The principal components of the dataset in A1:A10. |
Advanced measures of dispersion in statistics include metrics like interquartile range, mean deviation, coefficient of variation, quartile deviation, and coefficient of mean deviation. These statistical functions often go beyond basic measures like range and standard deviation and help analyze the spread of data.
Function |
Description |
Examples |
RANGE(range) |
Returns the difference between the maximum and minimum values in a dataset. |
=RANGE(A1:A10) → Returns: The difference between the maximum and minimum values in the dataset A1:A10. |
CV(range) |
Computes the Excel correlation coefficient of variation, a relative measure of dispersion. |
=CV(A1:A10) → Returns: The coefficient of variation for the dataset A1:A10. |
ENTROPY(array) |
Calculates the entropy of a dataset, measuring randomness or disorder. |
=ENTROPY(A1:A10) → Returns: The entropy of the dataset A1:A10. |
A nonlinear form of regression analysis is one in which a function usually models observational data. The latter is specifically considered to be a nonlinear combination of the model parameters and depends on one or more independent variables. These statistical functions assist in fitting nonlinear models to datasets.
Function |
Description |
Examples |
LOGIT.REG(data_range, response, predictors) |
Performs logistic regression analysis. |
=LOGIT.REG(A1:A10, B1:B10, C1:C10) → Returns: The logistic regression model for the data in A1:A10 with response variable in B1:B10 and predictor variables in C1:C10. |
EXPREG(data_range, response, predictors) |
Fits an exponential regression model to data. |
=EXPREG(A1:A10, B1:B10, C1:C10) → Returns: The fitted exponential regression model for the dataset in A1:A10 with response variable in B1:B10 and predictors in C1:C10. |
Bayesian inference usually derives the posterior probability as a consequence of two different antecedents: a prior probability and a likelihood function. These antecedents are often derived from a statistical model for the observed data to assist in Bayesian statistical analysis.
Function |
Description |
Example |
MARKOV.CHAIN(data_range, transitions) |
Simulates a Markov chain process for probability modeling. |
=MARKOV.CHAIN(A1:A10, B1:B10) → Returns: A simulation of the Markov chain process for the data in A1:A10. |
Do you want to learn more about these functionalities in Excel? Pursue upGrad’s Business Analytics Certification Program!
Statistical analysis eliminates unnecessary information and catalogs only relevant data in a straightforward manner. This makes it easier to manage and organize inputs across organizational systems. Once the data is collected, Excel statistical analysis can be leveraged for various purposes, as outlined below:
Statistical analysis helps you identify trends, patterns, and insights from data. This facilitates informed decisions in areas such as sales, marketing, and operations.
Here’s how the analysis helps you make data-driven decisions:
Use statistical functions to calculate the medians or variances to define realistic KPIs and track progress effectively. Here is an overview of how these statistical functions help you identify KPIs:
Statistical Function |
Purpose |
AVERAGE(range) |
Determines the mean performance level |
MEDIAN(range) |
Identifies the central tendency |
VAR.S(range) |
Analyzes variance in performance |
STDEV.S(range) |
Measures performance consistency |
Microsoft Excel offers various statistical functions to analyze data effectively. This section compares four essential functions—AVERAGE, MEDIAN, CORREL, and Regression—highlighting their differences and ideal use cases.
The AVERAGE function Excel is affected by extreme values, making it less reliable for skewed data. MEDIAN, on the other hand, provides a more representative central value when outliers are present. Here is an overview of the primary differences between AVERAGE and MEDIAN.
Function |
Definition |
Best Use Cases |
Example |
AVERAGE |
Calculates the arithmetic mean by summing values and dividing by the count. |
When data is normally distributed or lacks extreme outliers. |
=AVERAGE(10, 20, 30, 1000) → 265 |
MEDIAN |
Returns the middle value of a dataset when sorted. |
When data has outliers or a skewed distribution. |
=MEDIAN(10, 20, 30, 1000) → 25 |
Both CORREL and Regression assess relationships between variables, but they serve different purposes. CORREL only provides a correlation coefficient, indicating strength but not causation. Regression, on the other hand, provides a detailed equation for prediction and analysis.
The following table showcases the primary differences between CORREL and Regression in detail:
Function |
Purpose |
Output |
Best For |
CORREL |
Measures the strength and direction of a linear relationship between two variables. |
A value between -1 and 1 (correlation coefficient). |
Quick analysis of variable relationships. |
Regression (LINEST/Analysis ToolPak) |
Predicts the dependent variable based on independent variables. |
Coefficients, R², intercept, slope, etc. |
Determining how one or more factors influence an outcome. |
Statistical functions in Excel help businesses and researchers analyze data, identify trends, and make informed decisions. Below are practical applications of key statistical functions.
Excel functions like AVERAGE and CORREL are essential for analyzing sales and revenue trends. Their key applications include:
Here’s an example of sales and revenue analysis for a better understanding:
Region |
Sales (Q1) |
Sales (Q2) |
Sales (Q3) |
Sales (Q4) |
East |
50,000 |
52,000 |
48,500 |
51,000 |
West |
47,000 |
49,500 |
45,800 |
46,200 |
Average |
=AVERAGE(B2:B5) → 49,125 |
Researchers use Excel to summarize and interpret large datasets with functions like MEDIAN, MODE, and VAR.P. Here is an overview of their key applications:
Here’s an example of academic research and Excel data interpretation:
Student |
Score |
A |
78 |
B |
82 |
C |
75 |
D |
92 |
E |
78 |
Median Score |
=MEDIAN(B2:B6) → 78 |
Most Common Score |
=MODE(B2:B6) → 78 |
Variance (Pop.) |
=VAR.P(B2:B6) → 43.2 |
Do you want to learn more about Excel functionalities? Pursue upGrad’s Executive Diploma in Data Science and AI now.
Excel allows users to enhance statistical calculations by applying conditions and combining multiple functions. Below are two powerful ways to customize statistical analysis using the Excel AVERAGE formula and other relevant functions:
Excel’s AVERAGEIF and COUNTIF functions help filter data before performing calculations. Their key applications include:
Here is an example of these calculations:
Employee |
Sales ($) |
A |
60,000 |
B |
45,000 |
C |
75,000 |
D |
52,000 |
Average Sales (Above 50K) |
=AVERAGEIF(B2:B5, ">50000") → 62,333 |
Count of Sales Above 50K |
=COUNTIF(B2:B5, ">50000") → 3 |
Users can customize data analysis to suit specific needs by combining statistical functions. Customizing Excel statistical formulas allows for deeper insights, which makes data-driven decisions more precise and meaningful. Here is an overview of the key applications:
Here is an example of these calculations:
Student |
Score |
Weight |
A |
85 |
0.3 |
B |
78 |
0.2 |
C |
92 |
0.5 |
Weighted Average |
=SUMPRODUCT(B2:B4, C2:C4) / SUM(C2:C4) → 86.1 |
Proper formatting and visualization of statistical data in Excel help improve clarity and highlight key insights. Below are some of the key techniques for presenting data effectively.
Conditional Formatting in Excel helps highlight important data points, making it easier to identify trends, patterns, and outliers. Here are its key applications:
Here’s an example of conditional formatting:
Region |
Sales ($) |
East |
120,000 |
West |
85,000 |
North |
150,000 |
South |
60,000 |
Pivot Tables are powerful Excel data summary tools for collating and analyzing large datasets efficiently. Their key applications include:
Here’s an example of a pivot table:
Region |
Total Sales |
Average Sales |
East |
480,000 |
120,000 |
West |
340,000 |
85,000 |
North |
600,000 |
150,000 |
South |
240,000 |
60,000 |
Visualizing statistical data makes it easier to interpret and present trends effectively. Their key applications are:
Here’s an example:
Month |
Sales ($) |
Jan |
50,000 |
Feb |
55,000 |
Mar |
60,000 |
Apr |
70,000 |
May |
90,000 |
Excel data analysis tools and features can significantly improve the efficiency and accuracy of data analysis. Below are the key tips for streamlining the process.
The Analysis ToolPak is an add-on that provides advanced statistical and Excel data analysis functions. Here are the key applications:
Here’s how you can enable the Analysis Toolpak:
Excel shortcuts can save time and improve efficiency when handling large datasets. The key shortcuts that you must be aware of are:
Accurate analysis starts with clean, well-structured data, and the key steps associated with this process are:
Here’s an example:
ID |
Name |
Sales ($) |
101 |
Alice |
50,000 |
102 |
Bob |
60,000 |
103 |
Alice |
50,000 |
Conditional formatting highlights key insights, trends, or anomalies in data. Their key applications are:
Here’s an example:
Product |
Sales ($) |
A |
120,000 |
B |
75,000 |
C |
150,000 |
D |
60,000 |
While Excel is a powerful tool for statistical analysis, mistakes in data handling and formula application can lead to incorrect conclusions. You can ensure your Excel statistical functions produce accurate and reliable results by avoiding specific mistakes.
Below are some of the most common pitfalls and how to prevent them:
Statistical functions in Excel require properly aligned data ranges. Misalignment occurs when:
Here’s how you can avoid misaligned data ranges:
Outliers can distort statistical calculations, often leading to misleading averages and incorrect correlations. You can avoid this by following the steps below:
Formula errors can disrupt calculations and lead to incorrect analysis. The following table highlights the common errors and fixes related to this statistical function in Excel:
Error Type |
Cause |
Fix |
#DIV/0! |
Division by zero or an empty cell in a denominator. |
Use =IF(B2=0, "Error", A2/B2) to avoid division by zero. |
#VALUE! |
Mismatched data types (e.g., text in a numerical formula). |
Check for non-numeric values using =ISNUMBER(A2). |
#REF! |
Invalid cell reference (deleted or moved data). |
Use named ranges or structured references. |
#N/A |
Missing values in a lookup formula. |
Use =IFERROR(VLOOKUP(A2, B2:C10, 2, FALSE), "Not Found"). |
Microsoft Excel is one of the most popular and prominent spreadsheet tools for statistical calculations. It also assists professionals in storing data in an orderly fashion. Excel is widely used for many applications, such as presenting and charting data, project and inventory management, and financial forecasting and budgeting.
If you’re an aspiring data analyst, financial analyst, or any other IT professional, it is essential to know the statistical functions in Excel. This enables you to perform accurate calculations and produce desirable results in organizations. Want to deepen your expertise? Enroll in upGrad’s free certification in Introduction to Data Analysis in Excel and take your skills to the next level!
Unlock the power of data with our popular Data Science courses, designed to make you proficient in analytics, machine learning, and big data!
Elevate your career by learning essential Data Science skills such as statistical modeling, big data processing, predictive analytics, and SQL!
Stay informed and inspired with our popular Data Science articles, offering expert insights, trends, and practical tips for aspiring data professionals!
Get Free Consultation
By submitting, I accept the T&C and
Privacy Policy
Start Your Career in Data Science Today
Top Resources