Spiral estimator, a robust sampling estimator.
This should be more robust than
This is a brainchild of this library’s lead developer Icelk.
You supply a
fitness_function to all functions which tells the algorithm which lines are
good. The magnitude is irrelevant, only order is considered. The algorithm tries to minimize
the returned value. This allows you to choose the desired properties of resulting
line/polynomial, without checking all possible values.
The polynomial regression implementation only allows for degrees 1 & 2. See details for more info on this.
The sampling technique means this might miss the right point to close in on. Therefore, I highly recommend using the higher quality options.
Since this uses a fitness function, the robustness is determined by that. Using the “default”
manhattan_distance gives good results (think least squares, but without the squared
importance of errors). This is what the implementations for
Since this tests a wide range of possibilities before deciding on one, it’s very likely we don’t get trapped in a local maxima.
The functions are
O(fitness function) where
O(fitness function) is the time
complexity of your
fitness_function. That’s often
O(n) as you’d probably in some way
sum up the points relative to the model.
This puts the algorithm similar to
ols, but with much worse (read: 4x-100x) performance.
This may be justified by the advantages.
It scales much better than
theil_sen and is more robust, but when the count of points is
theil_sen is faster.
The idea is to make a phase space
of the parameters to a line (
y=(slope)*x + (y-intersect)). We then traverse the phase space
with a logarithmic spiral
and sample points (we start at the angle θ e.g.
-12π and go to a max value, e.g.
on an interval. When the range of the spiral has been sampled, we choose the best point and
create a spiral there. Depending on how far out the new point, scale the whole spiral’s size
distance.sqrt().sqrt()). Then repeat.
Parameters are chosen for an optimal spiral. The logarithmic spiral was chosen due to the distribution of unknown numbers (which the coefficients of the line are). There’s generally more numbers in the range 0..100 than in 100..200. Therefore, we search more numbers in 0..100.
We can do this in 3 dimensions by creating a 3d spiral. For this, I used a
where the radius grows with the angle of the spiral, calculated by
k is a
parameter, similar to how a logarithmic spiral is created.
On implementing third degree polynomials, can we get a spiral on a hypersphere? Or do we just need a even distribution?
spiral::Options for more info on the parameters.
- Options for the spiral.
LogisticEstimatorwith a known ceiling for the input values. Uses manhattan distance as the fitness function.
- Samples points on a spherical spiral in the phase space of all second degree polynomials. As θ (the angle) increases, the imaginary sphere’s size is increased. This gives a good distribution of sample points in 3d space.
- Samples points on a logarithmic spiral in the phase space of all possible straight lines.