Turning a list of numbers into a probability density function

By Tom Sittler, Director

Suppose you have a large list of n random samples from a continuous distribution, and you want to approximate the probability density function of the distribution.

You can’t simply count how often each distinct value occurs in the list. Your samples come from a continuous distribution, so each value is likely to appear only once. If you used this procedure, your estimated function would equal 1/n at every sampled value and 0 everywhere else, which says nothing about the shape of the distribution.
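A quick way to see the problem is to try the naive approach on simulated data. The sketch below assumes standard-normal samples generated with Python's `random.gauss`, chosen only for illustration. It counts each distinct value and divides by n:

```python
import random
from collections import Counter

# Draw samples from a continuous distribution (standard normal, assumed
# here purely for illustration)
random.seed(0)
n = 10_000
samples = [random.gauss(0.0, 1.0) for _ in range(n)]

# Naive approach: relative frequency of each distinct value
freq = Counter(samples)
naive_pdf = {value: count / n for value, count in freq.items()}

# With continuous data practically every value is unique, so every entry is 1/n
print(len(freq))                # number of distinct values
print(set(naive_pdf.values()))  # the set of frequencies that occur
```

Almost every sample is distinct, so the "frequencies" are all 1/n and carry no information about where the distribution puts most of its mass.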

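Instead, you need to define buckets (small ranges over the domain of your function) and count how many values fall in each bucket. Dividing a bucket's count by n gives the fraction of samples in that bucket. Dividing that fraction by the bucket's width turns it into a density, so the area under the resulting step function is 1.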
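Written out as formulas, this is a sketch under the assumption of m equal-width buckets spanning the sample range. Here m plays the role of the variable `n` in the code below, and the symbols h, B_k and c_k are notation introduced for this summary, with c_k the number of samples in bucket B_k:

$$
h = \frac{x_{\max} - x_{\min}}{m}, \qquad B_k = [\,x_{\min} + k h,\; x_{\min} + (k+1) h\,), \quad k = 0, \dots, m-1,
$$

$$
\hat f(x) = \frac{c_k}{n h} \quad \text{for } x \in B_k, \qquad \int \hat f(x)\,dx = \sum_{k=0}^{m-1} \frac{c_k}{n h}\, h = \frac{1}{n} \sum_{k=0}^{m-1} c_k = 1.
$$

The last bucket is taken as closed on the right so that x_max is counted. Without the division by h you get the fraction of samples per bucket, which sums to 1 but is not a density.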

I provide some code here to do this in Python, and paste the code below as well:

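import csv

# Read the samples, one number per line, skipping blank lines
with open("./file1") as f:
    mylist = [float(line) for line in f if line.strip()]

num = len(mylist)
maxl = max(mylist)
minl = min(mylist)

# turn list of values into PDF
n = 100  # number of buckets (not the sample count n from the text above)
rangel = maxl - minl
step = rangel / n

# dictionary {bound : estimated density on the bucket starting at bound}
countdict_pdf = {}
for i in range(n):
    bound = minl + i * step          # compute bounds directly to avoid float drift
    upper = minl + (i + 1) * step
    if i == n - 1:
        # close the last bucket on the right so the maximum is counted
        count = sum(1 for x in mylist if bound <= x <= maxl)
    else:
        # half-open bucket [bound, upper): the minimum is now counted too
        count = sum(1 for x in mylist if bound <= x < upper)
    # dividing by num gives the fraction of samples in the bucket;
    # dividing by step as well turns that fraction into a density
    countdict_pdf[bound] = count / (num * step)

# export dictionary to csv
with open('countdict_pdf.csv', 'w', newline='') as csv_file:
    writer = csv.writer(csv_file)
    for key, value in countdict_pdf.items():
        writer.writerow([key, value])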
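The loop above rescans the whole list once per bucket. For large lists, one alternative is to compute each sample's bucket index arithmetically, so a single pass suffices. This is a swapped-in technique, not the author's code. The function name `samples_to_pdf` and its `num_buckets` parameter are hypothetical, and the sketch assumes equal-width buckets as above:

```python
def samples_to_pdf(samples, num_buckets=100):
    """Return (left_bound, density) pairs for equal-width buckets.

    Hypothetical helper: each sample is assigned to a bucket by arithmetic
    instead of by scanning every bucket, so the work is
    O(len(samples) + num_buckets).
    """
    lo, hi = min(samples), max(samples)
    if hi == lo:
        raise ValueError("all samples are equal; bucket width would be zero")
    width = (hi - lo) / num_buckets
    counts = [0] * num_buckets
    for x in samples:
        # floor((x - lo) / width), clamped so that x == hi lands in the last bucket
        k = min(int((x - lo) / width), num_buckets - 1)
        counts[k] += 1
    total = len(samples)
    # divide by total * width so that the densities integrate to 1
    return [(lo + k * width, c / (total * width)) for k, c in enumerate(counts)]


if __name__ == "__main__":
    import random

    random.seed(1)
    data = [random.random() for _ in range(50_000)]  # uniform on [0, 1)
    for left, density in samples_to_pdf(data, num_buckets=10):
        print(f"{left:.3f}  {density:.3f}")  # densities should be close to 1.0
```

For uniform data every bucket's density should hover around 1, which is a quick sanity check that the division by the bucket width is right.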