Wikimore

Summary

DescriptionThiel-Sen estimator.svg	English: The en:Theil–Sen estimator (black line) of a set of sample points, compared to the simple linear regression line (blue). The points were generated by adding a small amount of jitter to points on the green dashed line and then replacing some of the points by random outliers.
Date	3 July 2011
Source	Own work
Author	David Eppstein

Licensing

I, the copyright holder of this work, hereby publish it under the following license:

	This file is made available under the Creative Commons CC0 1.0 Universal Public Domain Dedication.
	The person who associated a work with this deed has dedicated the work to the public domain by waiving all of their rights to the work worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law. You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. CC0falsefalse

Category:CC-Zero#Thiel-Sen%20estimator.svg

Category:Self-published work

Source code

This image was created as a pdf file by the following Python code, then converted to SVG.

from pyx import canvas,path,color
from random import random,seed
seed(12345)

N = 103
noise = 10
slope = 1.0

def sample(x):
    y = x * slope
    if random() < (y/N)**3:
        y = random()*N      # outlier
    else:
        y += (random()-0.5)*noise # non-outlier, jitter
    return y

samples = [(i*1.0,sample(i)) for i in range(N)]

c = canvas.canvas()

for x,y in samples:
    c.fill(path.circle(x,y,0.5),[color.rgb.red])

def theilsen(samples):
    N = len(samples)
    def slope(i,j):
        xi,yi = samples[i]
        xj,yj = samples[j]
        return (yi-yj)/(xi-xj)
    
    def median(L):
        L.sort()
        if len(L) & 1:
            return L[len(L)//2]
        else:
            return (L[len(L)//2 - 1] + L[len(L)//2])/2.0
    
    m = median([slope(i,j) for i in range(N) for j in range(i)])
    
    def error(i):
        x,y = samples[i]
        return y - m*x
    
    b = median([error(i) for i in range(N)])
    
    return m,b

m,b = 1,0
c.stroke(path.line(0,b,N,N*m+b),[color.rgb.green])

m,b = theilsen(samples)
c.stroke(path.line(0,b,N,N*m+b),[color.rgb.black])

def slr(samples):
    N = len(samples)
    sumxy = sum([x*y for x,y in samples])
    sumx = sum([x for x,y in samples])
    sumy = sum([y for x,y in samples])
    sumxx = sum([x*x for x,y in samples])
    m = (sumxy - sumx*sumy/N)/(sumxx - sumx**2/N)
    b = sumy/N - m*sumx/N
    return m,b

m,b = slr(samples)
c.stroke(path.line(0,b,N,N*m+b),[color.rgb.blue])

c.writePDFfile("ThielSen")

Category:Scatterplots Category:Regression analysis Category:Files by User:David Eppstein from en.wikipedia Category:Images with Python source code Category:Coefficient of determination

Wikimore

File:Thiel-Sen estimator.svg

Summary

Licensing

Source code