One-Sample Z-Test Calculator

Based on the standard normal distribution (μ = 0, σ = 1)


Worked Examples

Two-Tailed

Bolt strength: x̄ = 105, μ₀ = 100, σ = 15, n = 36

A factory tests whether its bolts' mean tensile strength differs from the 100 N specification at α = 0.05.

  1. State H₀: μ = 100; H₁: μ ≠ 100 (two-tailed).
  2. Standard error: 15/√36 = 2.5.
  3. z = (105 − 100) / 2.5 = 2.00.
  4. Two-tailed p-value: 2 × (1 − Φ(2.00)) ≈ 0.0455.
  5. Critical values at α = 0.05: ±1.96.
  6. Since |2.00| > 1.96 and p < 0.05, reject H₀.

Borderline-significant result — the unrounded p-value sits just under 0.05. Always pair this with effect size (a 5 N shift on a process with σ = 15 N) and a confidence interval.
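The arithmetic in the steps above can be checked in a few lines of Python, using `statistics.NormalDist` for Φ (the numbers simply restate the bolt example):

```python
from statistics import NormalDist

# Bolt-strength example: x̄ = 105, μ₀ = 100, σ = 15, n = 36
xbar, mu0, sigma, n = 105, 100, 15, 36

se = sigma / n ** 0.5                    # standard error: 15/√36 = 2.5
z = (xbar - mu0) / se                    # z = 5 / 2.5 = 2.00
p = 2 * (1 - NormalDist().cdf(abs(z)))  # two-tailed p-value

print(f"z = {z:.2f}, p = {p:.4f}")      # z = 2.00, p = 0.0455
```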

Right-Tailed

Test scores: x̄ = 78, μ₀ = 75, σ = 10, n = 50

An educator tests whether a new curriculum raises mean scores above the 75-point baseline at α = 0.05.

  1. State H₀: μ = 75; H₁: μ > 75 (right-tailed).
  2. Standard error: 10/√50 ≈ 1.4142.
  3. z = (78 − 75) / 1.4142 ≈ 2.1213.
  4. Right-tailed p-value: 1 − Φ(2.1213) ≈ 0.0169.
  5. Critical value at α = 0.05: 1.6449.
  6. Since 2.1213 > 1.6449 and p < 0.05, reject H₀.

Choose a one-tailed test only when the alternative direction is fixed in advance — selecting it after seeing the data inflates the false-positive rate.
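A sketch of the right-tailed calculation above, including the critical value obtained from the inverse normal CDF (`NormalDist.inv_cdf`):

```python
from statistics import NormalDist

# Test-scores example: x̄ = 78, μ₀ = 75, σ = 10, n = 50
xbar, mu0, sigma, n = 78, 75, 10, 50
alpha = 0.05

z = (xbar - mu0) / (sigma / n ** 0.5)     # ≈ 2.1213
p = 1 - NormalDist().cdf(z)               # right-tailed p ≈ 0.0169
z_crit = NormalDist().inv_cdf(1 - alpha)  # ≈ 1.6449

print(f"z = {z:.4f}, p = {p:.4f}, critical = {z_crit:.4f}")
print("reject H0" if z > z_crit else "fail to reject H0")
```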

Left-Tailed

Battery life: x̄ = 9.2, μ₀ = 10, σ = 1.5, n = 40

A consumer watchdog group tests whether a battery brand's mean life is below the advertised 10-hour benchmark at α = 0.05.

  1. State H₀: μ = 10; H₁: μ < 10 (left-tailed).
  2. Standard error: 1.5/√40 ≈ 0.2372.
  3. z = (9.2 − 10) / 0.2372 ≈ −3.3731.
  4. Left-tailed p-value: Φ(−3.3731) ≈ 0.00037.
  5. Critical value at α = 0.05: −1.6449.
  6. Since −3.3731 < −1.6449 and p < 0.05, reject H₀.

A very small p-value indicates strong evidence that mean battery life falls short of the claim. Even at α = 0.001 we would still reject H₀.
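The claim that the result survives even α = 0.001 can be verified directly (battery example values restated):

```python
from statistics import NormalDist

# Battery-life example: x̄ = 9.2, μ₀ = 10, σ = 1.5, n = 40
z = (9.2 - 10) / (1.5 / 40 ** 0.5)  # ≈ -3.373
p = NormalDist().cdf(z)             # left-tailed p ≈ 0.00037

for alpha in (0.05, 0.01, 0.001):
    print(f"alpha = {alpha}: {'reject' if p < alpha else 'fail to reject'} H0")
```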

One-Sample Z-Test Statistic

The z-statistic measures how many standard errors the sample mean (x̄) sits from the hypothesized population mean (μ₀). σ is the known population standard deviation and n is the sample size. The denominator σ/√n is the standard error of the mean — a larger sample shrinks it, making small differences detectable.

z = (x̄ − μ₀) / (σ / √n)

How It Works

A one-sample z-test asks whether your sample's mean differs from a hypothesized population mean by more than chance would predict. You need four ingredients: the sample mean (x̄), the hypothesized mean under the null hypothesis (μ₀), the known population standard deviation (σ), and the sample size (n). The calculator computes the standard error σ/√n, divides the observed difference x̄ − μ₀ by it to get the z-statistic, and converts that z to a p-value using the standard normal distribution. The p-value is the probability of seeing a difference at least this extreme if the null hypothesis were true. Compare it to your significance level α to decide whether to reject H₀. Use a left-tailed test when the alternative hypothesis says μ < μ₀, a right-tailed test when it says μ > μ₀, and a two-tailed test when it just says μ ≠ μ₀.
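The whole procedure described above fits in one small function. This is a sketch, not the calculator's actual implementation; the function and parameter names are assumptions for illustration:

```python
from statistics import NormalDist

def one_sample_z_test(xbar, mu0, sigma, n, tail="two"):
    """Return (z, p) for a one-sample z-test.

    tail: "left" (H1: mu < mu0), "right" (H1: mu > mu0),
          or "two" (H1: mu != mu0).
    """
    se = sigma / n ** 0.5        # standard error of the mean
    z = (xbar - mu0) / se
    Phi = NormalDist().cdf       # standard normal CDF
    if tail == "left":
        p = Phi(z)
    elif tail == "right":
        p = 1 - Phi(z)
    else:                        # two-tailed
        p = 2 * (1 - Phi(abs(z)))
    return z, p

# Bolt example from the text: z = 2.00, p ≈ 0.0455
z, p = one_sample_z_test(105, 100, 15, 36, tail="two")
print(f"z = {z:.2f}, p = {p:.4f}")
```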

Example Problem

A factory claims its bolts have a mean tensile strength of 100 N with σ = 15 N. You sample 36 bolts and measure x̄ = 105 N. Test whether the true mean differs from 100 at α = 0.05 (two-tailed).

  1. State H₀: μ = 100 and H₁: μ ≠ 100. The two-tailed alternative makes no directional claim.
  2. Compute the standard error: σ/√n = 15/√36 = 15/6 = 2.5.
  3. Compute the z-statistic: z = (105 − 100) / 2.5 = 5 / 2.5 = 2.00.
  4. Find the two-tailed p-value: p = 2 × (1 − Φ(2.00)) = 2 × (1 − 0.97725) ≈ 0.0455.
  5. Find the critical values at α = 0.05: ±z_{0.975} = ±1.96.
  6. Compare: |2.00| > 1.96 and p ≈ 0.0455 < 0.05, so we reject H₀.
  7. Conclusion: there is statistically significant evidence at α = 0.05 that the true mean tensile strength differs from 100 N.

This is a borderline-significant result — the p-value is just under 0.05. A stricter α = 0.01 threshold would not reject H₀, so report effect size and confidence interval alongside the p-value when the verdict hinges on a single threshold.
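Since the verdict is borderline, the confidence interval is worth computing. For the bolt example, the two-sided 95% CI is x̄ ± 1.96 · σ/√n:

```python
from statistics import NormalDist

xbar, sigma, n = 105, 15, 36
se = sigma / n ** 0.5                   # 2.5
z975 = NormalDist().inv_cdf(0.975)      # ≈ 1.96
lo, hi = xbar - z975 * se, xbar + z975 * se
print(f"95% CI: ({lo:.2f}, {hi:.2f})")  # ≈ (100.10, 109.90)
```

The interval only just excludes 100 N, which matches the borderline rejection.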

Key Concepts

Three numbers shape every z-test result. The first is the effect size (x̄ − μ₀), the raw difference between your sample mean and the null value. The second is the standard error σ/√n, which scales the effect by the precision of your estimate — larger samples shrink the standard error and let smaller effects reach significance. The third is the significance threshold α, the false-positive rate you're willing to accept. Crucially, a 'significant' z-test does not say the effect is large or important — only that it is unlikely under H₀. Always pair the p-value with the effect size and a confidence interval. Pre-specify the tail direction; running both directions and reporting the smaller p-value inflates the false-positive rate well above the nominal α.
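The interplay of these three numbers can be made concrete. Holding the 5 N effect and σ = 15 fixed, the z-statistic grows with √n, so the verdict is partly a function of sample size (a hypothetical sweep, not from the text):

```python
from statistics import NormalDist

effect, sigma, alpha = 5, 15, 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # ±1.96 for two-tailed

for n in (9, 36, 144):
    z = effect / (sigma / n ** 0.5)           # same effect, shrinking SE
    verdict = "significant" if abs(z) > z_crit else "not significant"
    print(f"n = {n:3d}: z = {z:.2f} -> {verdict}")
```

The identical 5 N effect is non-significant at n = 9 (z = 1.00) yet highly significant at n = 144 (z = 4.00), which is why significance alone says nothing about practical importance.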

Applications

  • Quality control — checking whether a production batch's mean differs from a specification target
  • Clinical trials with known historical variance — testing whether a new treatment shifts a measured outcome from a known baseline
  • Market research — testing whether a survey average differs from a benchmark or industry norm
  • A/B testing with known variance — measuring whether a metric's mean differs from a control value
  • Educational assessment — testing whether a class's mean score differs from a national average with known σ
  • Manufacturing process monitoring — checking whether a measured dimension's mean has drifted from spec

Common Mistakes

  • Using a z-test when σ is unknown and n is small — that calls for a t-test
  • Choosing the tail direction after seeing the data — pre-specify it from the alternative hypothesis
  • Confusing 'reject H₀' with 'H₁ is true with high probability' — p-values are about data given H₀, not the reverse
  • Reporting only the p-value without effect size, confidence interval, or sample size
  • Comparing p-values across studies with very different sample sizes as if they were on the same scale
  • Treating a non-significant result as evidence the true effect is zero — absence of evidence is not evidence of absence

Frequently Asked Questions

What is a one-sample z-test?

A one-sample z-test compares a sample mean (x̄) to a hypothesized population mean (μ₀) when the population standard deviation (σ) is known. It computes z = (x̄ − μ₀) / (σ/√n) and reports a p-value indicating how surprising the observed difference would be if the null hypothesis were true.

When should I use a z-test instead of a t-test?

Use a z-test when the population standard deviation σ is known, or when the sample size is large enough (commonly n > 30) that the sample standard deviation is a reliable proxy. Use a t-test when σ is unknown and n is small. The t-test uses the t-distribution, which has heavier tails than the normal — it is more conservative for small samples.

What does the p-value tell me?

The p-value is the probability of obtaining a sample mean at least as extreme as yours, assuming the null hypothesis is true. Smaller p-values indicate stronger evidence against H₀. A p-value below your chosen α (commonly 0.05) leads to rejecting H₀, but the p-value does not measure the size or practical importance of the effect.

How do I choose the tail direction?

Choose left-tailed when your alternative hypothesis is μ < μ₀, right-tailed when it is μ > μ₀, and two-tailed when it is μ ≠ μ₀. Decide before you look at the data — picking the tail post hoc inflates the false-positive rate.

What does 'reject the null hypothesis' actually mean?

It means the observed data would be unlikely enough if H₀ were true that H₀ is no longer tenable at your chosen α. It does not prove H₁ is true — it just shifts the burden of evidence. Rejection at α = 0.05 means that if H₀ were really true, you would see results this extreme by chance no more than 5% of the time.

What is the standard error of the mean?

The standard error of the mean is σ/√n — the standard deviation of the sampling distribution of x̄. It tells you how much x̄ would typically vary from the true population mean across repeated samples of size n. Larger samples make x̄ a more precise estimate, so the standard error shrinks as n grows.
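Because of the √n in the denominator, quadrupling the sample size halves the standard error; a quick check with σ = 15:

```python
sigma = 15
for n in (25, 100, 400):
    se = sigma / n ** 0.5
    print(f"n = {n:3d}: SE = {se}")  # 3.0, then 1.5, then 0.75
```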

What if my z-test result is not significant?

A non-significant result means the data is consistent with H₀ — but it does not prove H₀ is true. The effect could be smaller than your study had power to detect, or your sample could be too small. Report the effect size and confidence interval; consider whether a larger study could detect a meaningful effect.

Can I run the test if my data is not exactly normal?

For sample sizes above about 30, the central limit theorem ensures the sampling distribution of x̄ is approximately normal regardless of the underlying data shape — the z-test is robust. For small samples drawn from a strongly skewed distribution, consider a non-parametric alternative such as the Wilcoxon signed-rank test.
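The central limit theorem claim can be seen in a quick simulation: means of n = 30 draws from a heavily right-skewed exponential distribution already cluster symmetrically around the true mean, with spread close to σ/√n (a sketch; the exponential choice and rep counts are illustrative):

```python
import random
from statistics import mean, stdev

random.seed(42)
n, reps = 30, 2000

# Exponential(1): mean 1, sd 1, strongly right-skewed
sample_means = [mean(random.expovariate(1.0) for _ in range(n))
                for _ in range(reps)]

# CLT prediction: sample means ≈ Normal(1, 1/√30 ≈ 0.183)
print(f"mean of sample means: {mean(sample_means):.3f}")
print(f"sd of sample means:   {stdev(sample_means):.3f}")
```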

Reference: The one-sample z-test computes z = (x̄ − μ₀) / (σ/√n) and converts to a p-value using the standard normal cumulative distribution function via the Abramowitz and Stegun rational approximation. Critical values are produced from the inverse normal CDF (Acklam's rational approximation) at the chosen significance level α.
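For reference, the Abramowitz and Stegun rational approximation mentioned here (formula 26.2.17, absolute error below 7.5 × 10⁻⁸) can be written out and checked against the exact CDF from `math.erf`; this sketch assumes that specific formula is the one intended:

```python
import math

def phi_approx(x):
    """Standard normal CDF via Abramowitz & Stegun formula 26.2.17."""
    if x < 0:
        return 1.0 - phi_approx(-x)  # symmetry handles negative x
    t = 1.0 / (1.0 + 0.2316419 * x)
    poly = t * (0.319381530 + t * (-0.356563782 + t * (1.781477937
           + t * (-1.821255978 + t * 1.330274429))))
    pdf = math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
    return 1.0 - pdf * poly

def phi_exact(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

for x in (-1.96, 0.0, 2.0):
    assert abs(phi_approx(x) - phi_exact(x)) < 1e-7
print("approximation agrees with erf to within 1e-7")
```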
