Two-Sample Z-Test Calculator
Based on the standard normal distribution (μ = 0, σ = 1). Tests H₀: μ₁ = μ₂ for two independent samples with known σ₁ and σ₂.
Two-Tailed
A factory tests whether two production lines produce bolts with different mean tensile strength at α = 0.05.
A non-significant result means the data don't rule out equal means, but they don't confirm equality either. Report the mean difference (5 N) and a confidence interval alongside the p-value to give context.
Right-Tailed
A clinical study tests whether a new treatment raises a mean outcome above a control's at α = 0.05.
Borderline result — at a more lenient α = 0.10 the test would reject. With equal n's and σ's the standard error simplifies to σ × √(2/n).
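That simplification is easy to verify numerically. A minimal sketch in Python, with illustrative values (σ = 10 and n = 25 are not taken from this example):

    from math import sqrt

    sigma, n = 10.0, 25  # illustrative values, not from the example above
    se_general = sqrt(sigma**2 / n + sigma**2 / n)  # √(σ²/n + σ²/n)
    se_simplified = sigma * sqrt(2 / n)             # σ·√(2/n)
    print(se_general, se_simplified)                # both ≈ 2.8284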
Left-Tailed
A consumer watchdog tests whether brand A's mean battery life is below brand B's at α = 0.05.
Strong evidence that brand A's mean battery life is below brand B's. The result would still reject at the stricter α = 0.01 level (p ≈ 0.0042 < 0.01).
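The example's raw inputs aren't shown here, but the quoted p-value pins down the test statistic: for a left-tailed test, p = Φ(z), so z can be recovered with the inverse normal CDF. A minimal sketch using Python's standard library:

    from statistics import NormalDist

    p = 0.0042                   # left-tailed p-value quoted above
    z = NormalDist().inv_cdf(p)  # the z such that Φ(z) = p
    print(round(z, 2))           # ≈ -2.64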
The z-statistic measures how many standard errors apart the two sample means are. The numerator is the observed difference in sample means; the denominator is the standard error of that difference, built from each group's known population variance and sample size. Larger samples shrink the standard error, letting smaller mean differences reach significance.
z = (x̄₁ − x̄₂) / √(σ₁²/n₁ + σ₂²/n₂)
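The formula translates directly into code. A minimal sketch in Python (the function name z_two_sample is ours, not the calculator's):

    from math import sqrt

    def z_two_sample(mean1, mean2, sigma1, sigma2, n1, n2):
        """Two-sample z-statistic with known population SDs."""
        # standard error of the difference in sample means
        se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
        return (mean1 - mean2) / se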
A two-sample z-test asks whether two independent groups' population means differ by more than chance would predict. You need six ingredients: each group's sample mean (x̄₁, x̄₂), its known population standard deviation (σ₁, σ₂), and its sample size (n₁, n₂). The calculator computes the standard error of the mean difference √(σ₁²/n₁ + σ₂²/n₂), divides the observed difference x̄₁ − x̄₂ by it to get the z-statistic, and converts that z to a p-value via the standard normal distribution. Compare the p-value to your significance level α to decide whether to reject H₀: μ₁ = μ₂. Use a left-tailed test when H₁ says μ₁ < μ₂, a right-tailed test when H₁ says μ₁ > μ₂, and a two-tailed test when the alternative is just μ₁ ≠ μ₂.
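The tail choice determines how z becomes a p-value. A sketch of that last step using the standard library (function and parameter names are ours, not the calculator's):

    from statistics import NormalDist

    def p_value(z, tail):
        """Convert a z-statistic to a p-value under the standard normal."""
        phi = NormalDist().cdf
        if tail == "left":                # H1: mu1 < mu2
            return phi(z)
        if tail == "right":               # H1: mu1 > mu2
            return 1 - phi(z)
        return 2 * (1 - phi(abs(z)))      # two-tailed, H1: mu1 != mu2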
A factory tests two manufacturing lines. Line A produces 36 bolts with mean tensile strength x̄₁ = 105 N (known σ₁ = 15 N). Line B produces 49 bolts with mean tensile strength x̄₂ = 100 N (known σ₂ = 12 N). Test whether the lines produce bolts with different mean strength at α = 0.05 (two-tailed).
A non-significant result is not proof that the two lines are identical — it just says the difference observed (5 N) is within what sampling noise can produce when σ₁ = 15, σ₂ = 12, and the samples are this size. A larger study or a tighter σ would have more power to detect the same effect.
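Running the factory numbers through the formula confirms that interpretation (the values come from the example above; the code is an illustrative check, not the calculator's source):

    from math import sqrt
    from statistics import NormalDist

    se = sqrt(15**2 / 36 + 12**2 / 49)      # ≈ 3.031
    z = (105 - 100) / se                    # ≈ 1.65
    p = 2 * (1 - NormalDist().cdf(abs(z)))  # two-tailed, ≈ 0.099
    print(se, z, p)                         # p ≈ 0.099 > 0.05, so H0 is not rejected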
The two-sample z-test rests on three quantities. First is the observed mean difference (x̄₁ − x̄₂), the raw effect. Second is the standard error of the difference √(σ₁²/n₁ + σ₂²/n₂) — this is the typical sampling fluctuation in that difference and shrinks as either sample grows or either σ shrinks. Third is the significance level α you choose in advance. Independence matters: the two samples must be drawn from separate populations, not paired or matched. When samples are paired (before/after on the same subjects), use a paired test on the differences instead. The test also assumes both σ's are known. When σ's are estimated from the samples and n is small, switch to Welch's t-test. With large samples, the z-test and Welch's t-test give nearly identical answers because the t-distribution converges to the normal.
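To see the z-versus-Welch agreement concretely, one can feed the same summary statistics to SciPy's Welch test, assuming SciPy is available; here the known σ's are treated as if they were sample SDs, purely for comparison:

    from scipy.stats import ttest_ind_from_stats

    # Factory example, treating sigma1 = 15 and sigma2 = 12 as sample SDs
    t, p = ttest_ind_from_stats(105, 15, 36, 100, 12, 49, equal_var=False)
    print(t, p)  # t ≈ 1.65, p ≈ 0.10, close to the z-test's p ≈ 0.099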
A two-sample z-test compares the means of two independent groups when both population standard deviations are known. It computes z = (x̄₁ − x̄₂) / √(σ₁²/n₁ + σ₂²/n₂) and reports a p-value indicating how surprising the observed difference would be if the two population means were truly equal.
Use a two-sample z-test when both population standard deviations σ₁ and σ₂ are known, or when both sample sizes are large enough (commonly n > 30 each) that the sample standard deviations are reliable proxies. Use Welch's t-test when σ's are unknown and samples are small. With large samples both methods converge.
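A quick look at how fast that convergence happens, via the 97.5th-percentile critical values (assuming SciPy is available):

    from scipy.stats import norm, t

    for df in (10, 30, 100, 1000):
        print(df, round(t.ppf(0.975, df), 4))  # 2.2281, 2.0423, 1.9840, 1.9623
    print(round(norm.ppf(0.975), 4))           # 1.96, the z critical value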
No. The standard error √(σ₁²/n₁ + σ₂²/n₂) lets each group contribute its own variance — it does not assume σ₁ = σ₂. This matches the structure of Welch's t-test and is appropriate even when the variances clearly differ. If σ₁ = σ₂ the calculator still works correctly.
Use a paired test when each observation in group 1 is naturally matched to one in group 2 — before/after measurements on the same subjects, twin pairs, or matched-pair experimental designs. The two-sample z-test assumes the groups are independent, so applying it to paired data wastes power and can be misleading.
The p-value is the probability of observing a mean difference at least as extreme as yours, assuming the two population means are equal. Smaller p-values indicate stronger evidence against H₀: μ₁ = μ₂. A p-value below your chosen α leads to rejecting H₀, but the p-value does not measure the size or practical importance of the difference.
Choose left-tailed when your alternative hypothesis is μ₁ < μ₂, right-tailed when it is μ₁ > μ₂, and two-tailed when it is μ₁ ≠ μ₂. Decide before you look at the data — picking the tail post hoc inflates the false-positive rate.
Yes. The standard error formula √(σ₁²/n₁ + σ₂²/n₂) handles unequal sample sizes naturally, but the smaller group dominates: with equal σ's, a 20-vs-200 design has a standard error only about 5% larger than it would be with the second group infinitely large, so adding more observations to the larger group brings rapidly diminishing returns. A numeric illustration follows below.
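A sketch of that diminishing-returns pattern with equal σ's (σ = 1 is arbitrary):

    from math import sqrt

    sigma = 1.0
    for n1, n2 in [(20, 20), (20, 200), (20, 2000), (40, 40)]:
        se = sqrt(sigma**2 / n1 + sigma**2 / n2)
        print(n1, n2, round(se, 4))
    # 20 vs 20   -> 0.3162
    # 20 vs 200  -> 0.2345
    # 20 vs 2000 -> 0.2247  (the floor is 0.2236 as n2 grows without bound)
    # 40 vs 40   -> 0.2236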
Swapping group 1 and group 2 flips the sign of z and the sign of the mean difference, but the p-value is unchanged for a two-tailed test. For one-tailed tests, swapping the groups also reverses the tail direction (left becomes right). The conclusion about whether the means differ is the same either way.
Reference: The two-sample z-test computes z = (x̄₁ − x̄₂) / √(σ₁²/n₁ + σ₂²/n₂) and converts to a p-value using the standard normal cumulative distribution function via the Abramowitz and Stegun rational approximation. Critical values are produced from the inverse normal CDF (Acklam's rational approximation) at the chosen significance level α. The formula assumes the two samples are independent and both population standard deviations are known.
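For readers curious about that approximation, here is a minimal sketch of a standard-normal CDF built on the Abramowitz and Stegun formula 7.1.26 for erf. The constants are the published ones, but this is an illustration of the technique rather than the calculator's actual source, and Acklam's inverse-CDF constants are omitted for brevity:

    from math import exp, sqrt

    def phi(z):
        """Standard normal CDF via the A&S 7.1.26 erf approximation (abs error < 1.5e-7)."""
        # erf(x) ≈ 1 - (a1·t + a2·t² + ... + a5·t⁵)·exp(-x²), with t = 1/(1 + p·x), x ≥ 0
        p = 0.3275911
        a1, a2, a3, a4, a5 = (0.254829592, -0.284496736, 1.421413741,
                              -1.453152027, 1.061405429)
        x = abs(z) / sqrt(2)
        t = 1 / (1 + p * x)
        erf = 1 - (((((a5 * t + a4) * t + a3) * t + a2) * t + a1) * t) * exp(-x * x)
        return 0.5 * (1 + erf) if z >= 0 else 0.5 * (1 - erf)

    print(phi(1.6495))  # ≈ 0.9505, matching standard normal tables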