File:Thiel-Sen estimator.svg
Summary
Description | The Theil–Sen estimator (black line) of a set of sample points, compared to the simple linear regression line (blue). The points were generated by adding a small amount of jitter to points on the green dashed line and then replacing some of the points by random outliers.
Date |
Source | Own work
Author | David Eppstein
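The effect the figure illustrates also shows up on a handful of points. The following is a minimal, standard-library-only sketch, not the author's script: the names `theil_sen`, `least_squares` and the toy data are hypothetical. Seven points lie on y = x, except the last one, which has been replaced by an outlier.

```python
from statistics import median

def theil_sen(points):
    """Theil–Sen fit: median of pairwise slopes, then median of residuals."""
    slopes = [
        (yj - yi) / (xj - xi)
        for k, (xi, yi) in enumerate(points)
        for xj, yj in points[k + 1:]
        if xj != xi  # skip pairs with equal x (slope undefined)
    ]
    m = median(slopes)
    # intercept chosen as in the script below: the median residual y - m*x
    b = median(y - m * x for x, y in points)
    return m, b

def least_squares(points):
    """Ordinary least squares, written in the equivalent centered form."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    m = sxy / sxx
    return m, mean_y - m * mean_x

# Seven points on y = x, except the last, which is an outlier.
points = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 0)]

print(theil_sen(points))      # (1.0, 0.0)
print(least_squares(points))  # slope ≈ 0.357 (5/14), intercept ≈ 1.071 (15/14)
```

A single outlier among seven points drags the least-squares slope from 1 down to 5/14, while the Theil–Sen estimate recovers the line exactly: 15 of the 21 pairwise slopes equal 1, so their median does too. In general the Theil–Sen slope tolerates up to about 29.3% arbitrarily corrupted points (its breakdown point, 1 − 1/√2).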
Licensing
I, the copyright holder of this work, hereby publish it under the following license:
This file is made available under the Creative Commons CC0 1.0 Universal Public Domain Dedication.
The person who associated a work with this deed has dedicated the work to the public domain by waiving all of their rights to the work worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law. You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.
Source code
This image was created as a PDF file by the following Python code, which uses the PyX graphics library, and then converted to SVG.
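from pyx import canvas,path,color
from random import random,seed

seed(12345)  # fixed seed so the figure is reproducible
N = 103      # number of sample points (x = 0, 1, ..., N-1)
noise = 10   # width of the uniform jitter added to non-outliers
slope = 1.0  # slope of the true (green) line y = slope*x

def sample(x):
    y = x * slope
    # outliers become more likely as the true y grows
    if random() < (y/N)**3:
        y = random()*N # outlier
    else:
        y += (random()-0.5)*noise # non-outlier, jitter
    return y

samples = [(i*1.0,sample(i)) for i in range(N)]

# draw the sample points as small red dots
c = canvas.canvas()
for x,y in samples:
    c.fill(path.circle(x,y,0.5),[color.rgb.red])

def theilsen(samples):
    N = len(samples)
    # slope of the line through samples i and j
    def slope(i,j):
        xi,yi = samples[i]
        xj,yj = samples[j]
        return (yi-yj)/(xi-xj)
    # median of a list (sorts L in place)
    def median(L):
        L.sort()
        if len(L) & 1:
            return L[len(L)//2]
        else:
            return (L[len(L)//2 - 1] + L[len(L)//2])/2.0
    # slope estimate: median of the slopes over all pairs j < i
    m = median([slope(i,j) for i in range(N) for j in range(i)])
    # intercept estimate: median of the residuals y - m*x
    def error(i):
        x,y = samples[i]
        return y - m*x
    b = median([error(i) for i in range(N)])
    return m,b

# true line (slope 1, intercept 0), drawn in green
m,b = 1,0
c.stroke(path.line(0,b,N,N*m+b),[color.rgb.green])

# Theil–Sen fit, drawn in black
m,b = theilsen(samples)
c.stroke(path.line(0,b,N,N*m+b),[color.rgb.black])

# simple linear regression (ordinary least squares)
def slr(samples):
    N = len(samples)
    sumxy = sum([x*y for x,y in samples])
    sumx = sum([x for x,y in samples])
    sumy = sum([y for x,y in samples])
    sumxx = sum([x*x for x,y in samples])
    m = (sumxy - sumx*sumy/N)/(sumxx - sumx**2/N)
    b = sumy/N - m*sumx/N
    return m,b

# least-squares fit, drawn in blue
m,b = slr(samples)
c.stroke(path.line(0,b,N,N*m+b),[color.rgb.blue])

# write the figure as a PDF (later converted to SVG)
c.writePDFfile("ThielSen")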