Estimating \(\pi\).

As you probably know, \(\pi\) is an irrational number (what does that mean?) and is also transcendental. Estimating \(\pi\) is a problem that has intrigued people for millennia. There are many famous formulas for \(\pi\), but they involve infinite sums, such as \[ \pi =\frac{4}{1}-\frac{4}{3}+\frac{4}{5}-\frac{4}{7}+\frac{4}{9}-\ldots \] Here we will see another estimate, based on the area of a circle and using an essential tool in statistics: random numbers.
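
This series converges very slowly. As a small illustrative aside (separate from the method below), a few lines of R show how the partial sums creep towards \(\pi\):

# partial sums of 4/1 - 4/3 + 4/5 - 4/7 + ...
k = 1:1000
terms = 4 * (-1)^(k + 1) / (2 * k - 1)
cumsum(terms)[c(10, 100, 1000)]   # partial sums after 10, 100 and 1000 terms

Even after 1000 terms the partial sum is still off by roughly 0.001.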

Given a circle of radius \(\frac{1}{2}\), with the unit square circumscribed around it, the square has area \(1\) while the circle has area \(\frac{\pi}{4}\). So if we can estimate the area of the circle as a fraction of the area of the square, we get an estimate of \(\pi\). Suppose we draw random points uniformly in the unit square. A point lies inside the circle if its distance from the center of the circle is less than \(\frac{1}{2}\); otherwise it is inside the square but not the circle. We can then estimate \(\pi\) as \[ \pi \approx 4\frac{N_{in}}{N_{total}} \] where \(N_{in}\) is the number of points inside the circle and \(N_{total}\) is the total number of points drawn. This is easily done in R:

center = c(0.5, 0.5)   # center of the circle, i.e. the middle of the unit square
n = 100                # number of random points to draw
x = runif(n)           # x coordinates, uniform on [0, 1]
y = runif(n)           # y coordinates, uniform on [0, 1]

# distance of each point from the center of the circle
distance = sqrt( (x - center[1])^2 + (y - center[2])^2 )
summary(distance)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.05993 0.26574 0.38262 0.37474 0.48316 0.69908
length(which(distance < 0.5))
## [1] 79
estimate = 4*(length(which(distance < 0.5))/n)
estimate
## [1] 3.16
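
As an aside, the counting step can be written more compactly, since R treats TRUE as 1 when a logical vector is summed or averaged:

sum(distance < 0.5)        # same count as length(which(distance < 0.5))
4 * mean(distance < 0.5)   # proportion inside the circle, times 4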

If we repeat this with new random draws, the answer will likely be different. Also, since \(n\) is relatively small here, there can be quite a lot of variability in the estimates.

center = c(0.5,0.5)
n = 100
x = runif(n)
y = runif(n)

distance = sqrt( (x-0.5)^2+(y-0.5)^2 )
estimate = 4*(length(which(distance < 0.5))/n)
estimate
## [1] 3.24
center = c(0.5,0.5)
n = 100
x = runif(n)
y = runif(n)

distance = sqrt( (x-0.5)^2+(y-0.5)^2 )
estimate = 4*(length(which(distance < 0.5))/n)
estimate
## [1] 2.92
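
Rather than rerunning the code by hand, we can repeat the whole experiment many times and look at the spread of the estimates. A quick sketch (the exact numbers will vary from run to run, since no seed is set):

# repeat the n = 100 experiment 1000 times and summarise the estimates
estimates = replicate(1000, {
  x = runif(100)
  y = runif(100)
  4 * mean(sqrt((x - 0.5)^2 + (y - 0.5)^2) < 0.5)
})
summary(estimates)
sd(estimates)

With only 100 points per experiment, individual estimates routinely miss \(\pi\) by 0.1 or more.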

With larger \(n\) the estimate will get closer to the true value, and there will be less variance.

center = c(0.5,0.5)
n = 1000000
x = runif(n)
y = runif(n)

distance = sqrt( (x-0.5)^2+(y-0.5)^2 )
summary(distance)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## 0.0003302 0.2822225 0.3990435 0.3827369 0.4888618 0.7069171
length(which(distance < 0.5))
## [1] 784901
estimate = 4*(length(which(distance < 0.5))/n)
estimate
## [1] 3.139604
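
How quickly does the estimate improve? The number of points falling inside the circle is a binomial count with success probability \(p = \pi/4\), so the standard error of \(4\frac{N_{in}}{N_{total}}\) is \(4\sqrt{p(1-p)/n}\). As a rough sketch, plugging in the true value of \(\pi\) for the sample sizes used here:

# approximate standard error of the Monte Carlo estimate for different n
p = pi / 4
n_values = c(100, 10000, 1000000)
4 * sqrt(p * (1 - p) / n_values)

For \(n = 10^6\) this is roughly 0.0016, consistent with the estimate above landing within a few thousandths of \(\pi\).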

One way to see how the variability decreases is to look at the running estimate as points are added one at a time.

n = 10000
running_pi = rep(0, n)   # running estimate of pi after each new point
distance = rep(0, n)     # distance of each point from the center

for(i in 1:n){
  # draw one new point, then recompute the estimate from all points so far
  x = runif(1)
  y = runif(1)
  distance[i] = sqrt( (x - 0.5)^2 + (y - 0.5)^2 )
  running_pi[i] = 4 * (length(which(distance[1:i] < 0.5)) / i)
}

plot(running_pi, type = 'l', main = "Estimate over iterations")
abline(h = pi, col = "red")   # horizontal reference line at the true value


library(ggplot2)
library(latex2exp)

# same plot with ggplot2; the data frame is not called "pi", to avoid masking the built-in constant
pi_df = data.frame(iter = 1:n, pi_iter = running_pi)

ggplot(pi_df, aes(iter, pi_iter)) +
  geom_line() +
  ggtitle("Estimate evolving over time") +
  ylab(TeX("Estimate of $\\pi$")) +
  xlab("Iterations") +
  geom_hline(yintercept = pi, colour = "red", alpha = 0.25)
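
To make the shrinking variability explicit, we could also overlay an approximate \(\pm 2\) standard error band, reusing the binomial standard error \(4\sqrt{p(1-p)/i}\) from above. A sketch building on the pi_df data frame:

# add an approximate 2-standard-error band around pi after i points
p = pi / 4
pi_df$se = 4 * sqrt(p * (1 - p) / pi_df$iter)

ggplot(pi_df, aes(iter, pi_iter)) +
  geom_line() +
  geom_hline(yintercept = pi, colour = "red", alpha = 0.25) +
  geom_line(aes(y = pi + 2 * se), colour = "blue", linetype = "dashed") +
  geom_line(aes(y = pi - 2 * se), colour = "blue", linetype = "dashed") +
  coord_cartesian(ylim = c(2.5, 3.8)) +   # zoom in so the band is visible near pi
  ylab(TeX("Estimate of $\\pi$")) +
  xlab("Iterations")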

Questions

Adapt the above code to estimate \(\pi\) using only one quadrant of the circle, rather than the whole circle. Investigate whether this gives better or worse estimates for a given \(n\). Does it seem to be any better? Why or why not?

History

While thankfully we can perform this simulation easily using random numbers generated by the computer, people previously had to carry out such simulations by hand, sometimes thousands of times. This is described for a very similar method known as Buffon's Needle (https://en.wikipedia.org/wiki/Buffon%27s_needle). Can you write some code to implement this method for estimating \(\pi\)?