*By Tom Sittler*

Suppose you have a large list of n random samples from a continuous distribution, and you want to approximate the probability density function of the distribution.

You can’t just calculate the frequency of each value in the list. Your samples are from a continuous distribution, so each value is likely to be present only once in the sample. If you used this procedure, your estimate would assign a frequency of 1/n to every sampled value and 0 everywhere else, which tells you nothing about the shape of the distribution.
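To make the problem concrete, here is a small sketch that counts value frequencies for samples drawn from a standard normal distribution. The distribution, the seed, and the sample size are arbitrary choices for illustration, not part of the original setup.

```python
import random
from collections import Counter

random.seed(0)  # arbitrary seed, only for reproducibility
samples = [random.gauss(0.0, 1.0) for _ in range(1000)]

# Count how often each exact value occurs, then turn counts into frequencies.
counts = Counter(samples)
freqs = {value: c / len(samples) for value, c in counts.items()}

# Collect the distinct frequencies that occur across all sampled values.
distinct_freqs = set(freqs.values())
print(len(counts), distinct_freqs)
```

With floating-point samples, repeated values are extremely unlikely, so every value ends up with the same frequency and the result carries no information about where the samples are concentrated.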

Instead, you need to define buckets (small ranges over the domain of your function) and count how many values fall in each bucket.

I paste some Python code to do this below. In the code, `num` is the number of samples and `n` is the number of buckets:

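```python
import csv

# Read the samples, one value per line.
with open("./file1") as f:
    mylist = f.readlines()
mylist = [float(x.strip("\n")) for x in mylist]

num = len(mylist)
maxl = max(mylist)
minl = min(mylist)

# Turn the list of values into a PDF.
n = 100  # number of buckets
rangel = maxl - minl
step = rangel / n

# Dictionary {bound: fraction of items between bound and bound + step}.
countdict_pdf = {}
for i in range(n):
    # Compute each bound from minl to avoid accumulating rounding error.
    bound = minl + i * step
    upper = maxl if i == n - 1 else bound + step
    # The first bucket also includes the minimum, which the strict
    # lower bound would otherwise drop.
    count = sum(1 for x in mylist if bound < x <= upper or (i == 0 and x == minl))
    countdict_pdf[bound] = count / num

# Export the dictionary to CSV.
with open("countdict_pdf.csv", "w", newline="") as csv_file:
    writer = csv.writer(csv_file)
    for key, value in countdict_pdf.items():
        writer.writerow([key, value])
```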
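The values written by the code above are the fraction of samples in each bucket. To read them as a density, each fraction would also need to be divided by the bucket width, so that the area under the histogram is 1. The sketch below is one way to do that in a single pass over the data. The function name `histogram_density` is hypothetical, the test data is an arbitrary normal sample, and it assumes the samples are not all equal (otherwise the bucket width is zero). Unlike the code above, it uses buckets that are closed on the left and open on the right, with the maximum placed in the last bucket.

```python
import random


def histogram_density(samples, n_buckets):
    """Return (left_edges, densities) for equal-width buckets over [min, max]."""
    lo, hi = min(samples), max(samples)
    step = (hi - lo) / n_buckets  # assumes hi > lo
    counts = [0] * n_buckets
    for x in samples:
        # Index of the bucket containing x; the maximum goes in the last bucket.
        i = min(int((x - lo) / step), n_buckets - 1)
        counts[i] += 1
    edges = [lo + i * step for i in range(n_buckets)]
    # Fraction of samples in the bucket, divided by the bucket width.
    densities = [c / (len(samples) * step) for c in counts]
    return edges, densities


random.seed(0)  # arbitrary seed, only for reproducibility
samples = [random.gauss(0.0, 1.0) for _ in range(10_000)]
edges, densities = histogram_density(samples, 50)

# The area under the histogram should be (numerically) 1.
step = edges[1] - edges[0]
area = sum(d * step for d in densities)
print(area)
```

Computing the bucket index directly from each value makes the cost proportional to the number of samples, rather than to the number of samples times the number of buckets.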