0.4.0 add useStdDev param of constructor and clear
This commit is contained in:
parent
f7e34b9f16
commit
815c181503
10 changed files with 476 additions and 1 deletions
2
LICENSE
2
LICENSE
|
@ -1,6 +1,6 @@
|
|||
MIT License
|
||||
|
||||
Copyright (c) 2020 Rob Tillaart
|
||||
Copyright (c) 2010-2020 Rob Tillaart
|
||||
|
||||
Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
of this software and associated documentation files (the "Software"), to deal
|
||||
|
|
79
README.md
79
README.md
|
@ -1,2 +1,81 @@
|
|||
# Statistic
|
||||
|
||||
Statistic library for Arduino includes sum, average, variance and std deviation
|
||||
|
||||
# Description
|
||||
|
||||
The statistic library is made to get basic statistical information from a
|
||||
one dimensional set of data, e.g. a stream of values of a sensor.
|
||||
|
||||
The stability of the formulas is improved by the help of Gil Ross (Thanks!)
|
||||
|
||||
The functions implemented are:
|
||||
|
||||
* **clear(useStdDev)**
|
||||
* **add(value)**
|
||||
* **count()** returns zero if count == zero (of course)
|
||||
* **sum()** returns zero if count == zero
|
||||
* **minimum()** returns zero if count == zero
|
||||
* **maximum()** returns zero if count == zero
|
||||
* **average()** returns NAN if count == zero
|
||||
|
||||
These three functions only work id useStdDev == true:
|
||||
|
||||
* **variance()** returns NAN if count == zero
|
||||
* **pop_stdev()** population stdev, returns NAN if count == zero
|
||||
* **unbiased_stdev()** returnsNAN if count == zero
|
||||
|
||||
|
||||
# Operational
|
||||
|
||||
See examples
|
||||
|
||||
# FAQ
|
||||
|
||||
### Q: Are individual samples still available?
|
||||
The values added to the library are not stored in the lib as it would use lots
|
||||
of memory quite fast. Instead a few calculated values are kept to be able to
|
||||
calculate the most important statistics.
|
||||
|
||||
|
||||
### Q: How many samples can the lib hold? (internal variables and overflow)
|
||||
The counter of samples is an **uint32_t**, implying a maximum of about **4 billion** samples.
|
||||
In practice 'strange' things might happen before this number is reached.
|
||||
There are two internal variables, **_sum** which is the sum of the values and **_ssq**
|
||||
which is the sum of the squared values. Both can overflow especially **_ssq**
|
||||
can and probably will grow fast. The library does not protect against it.
|
||||
|
||||
There is a workaround for this (to some extend) if one knows the approx
|
||||
average of the samples before. Before adding values to the lib subtract
|
||||
the expected average. The sum of the samples would move to around zero.
|
||||
This workaround has no influence on the standard deviation.
|
||||
|
||||
!! Do not forget to add the expected average to the calculated average.
|
||||
|
||||
*(Q: should this subtraction trick be build into the lib?)*
|
||||
|
||||
|
||||
### Q: How about the precision of the library?
|
||||
The precision of the internal variables is restricted due to the fact
|
||||
that they are 32 bit float (IEEE754). If the internal variable **_sum** has
|
||||
a large value, adding relative small values to the dataset wouldn't
|
||||
change its value any more. Same is true for **_ssq**. One might argue that
|
||||
statistically speaking these values are less significant, but in fact it is wrong.
|
||||
|
||||
There is a workaround for this (to some extend). If one has the samples in an
|
||||
array or on disk, one can sort the samples in increasing order (abs value)
|
||||
and add them from this sorted list. This will minimize the error,
|
||||
but it works only if the samples are available and the they may be added
|
||||
in the sorted increasing order.
|
||||
|
||||
|
||||
### Q: When will internal var's overflow? esp. squared sum
|
||||
IEEE754 floats have a max value of about **+-3.4028235E+38**
|
||||
|
||||
|
||||
### Q: Why are there two functions for stdev?
|
||||
There are two stdev functions the population stdev and the unbiased stdev.
|
||||
See Wikipedia for an elaborate description of the difference between these two.
|
||||
|
||||
|
||||
|
||||
|
|
134
Statistic.cpp
Normal file
134
Statistic.cpp
Normal file
|
@ -0,0 +1,134 @@
|
|||
//
|
||||
// FILE: Statistic.cpp
|
||||
// AUTHOR: Rob dot Tillaart at gmail dot com
|
||||
// modified at 0.3 by Gil Ross at physics dot org
|
||||
// VERSION: 0.4.0
|
||||
// PURPOSE: Recursive statistical library for Arduino
|
||||
//
|
||||
// NOTE: 2011-01-07 Gill Ross
|
||||
// Rob Tillaart's Statistic library uses one-pass of the data (allowing
|
||||
// each value to be discarded), but expands the Sum of Squares Differences to
|
||||
// difference the Sum of Squares and the Average Squared. This is susceptible
|
||||
// to bit length precision errors with the float type (only 5 or 6 digits
|
||||
// absolute precision) so for long runs and high ratios of
|
||||
// the average value to standard deviation the estimate of the
|
||||
// standard error (deviation) becomes the difference of two large
|
||||
// numbers and will tend to zero.
|
||||
//
|
||||
// For small numbers of iterations and small Average/SE th original code is
|
||||
// likely to work fine.
|
||||
// It should also be recognised that for very large samples, questions
|
||||
// of stability of the sample assume greater importance than the
|
||||
// correctness of the asymptotic estimators.
|
||||
//
|
||||
// This recursive algorithm, which takes slightly more computation per
|
||||
// iteration is numerically stable.
|
||||
// It updates the number, mean, max, min and SumOfSquaresDiff each step to
|
||||
// deliver max min average, population standard error (standard deviation) and
|
||||
// unbiassed SE.
|
||||
// -------------
|
||||
//
|
||||
// HISTORY:
|
||||
// 0.1 2010-10-29 initial version
|
||||
// 0.2 2010-10-29 stripped to minimal functionality
|
||||
// 0.2.01 2010-10-30
|
||||
// added minimim, maximum, unbiased stdev,
|
||||
// changed counter to long -> int overflows @32K samples
|
||||
// 0.3 2011-01-07
|
||||
// branched from 0.2.01 version of Rob Tillaart's code
|
||||
// 0.3.1 2012-11-10 minor edits
|
||||
// 0.3.2 2012-11-10 minor edits
|
||||
// changed count -> unsigned long allows for 2^32 samples
|
||||
// added variance()
|
||||
// 0.3.3 2015-03-07
|
||||
// float -> double to support ARM (compiles)
|
||||
// moved count() sum() min() max() to .h; for optimizing compiler
|
||||
// 0.3.4 2017-07-31
|
||||
// Refactored const in many places
|
||||
// [reverted] double to float on request as float is 99.99% of the cases
|
||||
// good enough and float(32 bit) is supported in HW for some processors.
|
||||
// 0.3.5 2017-09-27
|
||||
// Added #include <Arduino.h> to fix uint32_t bug
|
||||
// 0.4.0 2020-05-13
|
||||
// refactor
|
||||
// Added flag to switch on the use of stdDev runtime. [idea marc.recksiedl]
|
||||
|
||||
|
||||
#include "Statistic.h"
|
||||
|
||||
Statistic::Statistic(bool useStdDev)
|
||||
{
|
||||
clear(useStdDev);
|
||||
}
|
||||
|
||||
void Statistic::clear(bool useStdDev) // useStdDev default true.
|
||||
{
|
||||
_cnt = 0;
|
||||
_sum = 0;
|
||||
_min = 0;
|
||||
_max = 0;
|
||||
_useStdDev = useStdDev;
|
||||
_ssqdif = 0.0;
|
||||
// note not _ssq but sum of square differences
|
||||
// which is SUM(from i = 1 to N) of f(i)-_ave_N)**2
|
||||
}
|
||||
|
||||
// adds a new value to the data-set
|
||||
void Statistic::add(const float value)
|
||||
{
|
||||
if (_cnt == 0)
|
||||
{
|
||||
_min = value;
|
||||
_max = value;
|
||||
} else {
|
||||
if (value < _min) _min = value;
|
||||
else if (value > _max) _max = value;
|
||||
}
|
||||
_sum += value;
|
||||
_cnt++;
|
||||
|
||||
if (_useStdDev && (_cnt > 1))
|
||||
{
|
||||
float _store = (_sum / _cnt - value);
|
||||
_ssqdif = _ssqdif + _cnt * _store * _store / (_cnt - 1);
|
||||
|
||||
// ~10% faster but limits the amount of samples to 65K as _cnt*_cnt overflows
|
||||
// float _store = _sum - _cnt * value;
|
||||
// _ssqdif = _ssqdif + _store * _store / (_cnt*_cnt - _cnt);
|
||||
//
|
||||
// solution: TODO verify
|
||||
// _ssqdif = _ssqdif + (_store * _store / _cnt) / (_cnt - 1);
|
||||
}
|
||||
}
|
||||
|
||||
// returns the average of the data-set added sofar
|
||||
float Statistic::average() const
|
||||
{
|
||||
if (_cnt == 0) return NAN; // prevent DIV0 error
|
||||
return _sum / _cnt;
|
||||
}
|
||||
|
||||
// Population standard deviation = s = sqrt [ S ( Xi - µ )2 / N ]
|
||||
// http://www.suite101.com/content/how-is-standard-deviation-used-a99084
|
||||
float Statistic::variance() const
|
||||
{
|
||||
if (!_useStdDev) return NAN;
|
||||
if (_cnt == 0) return NAN; // prevent DIV0 error
|
||||
return _ssqdif / _cnt;
|
||||
}
|
||||
|
||||
float Statistic::pop_stdev() const
|
||||
{
|
||||
if (!_useStdDev) return NAN;
|
||||
if (_cnt == 0) return NAN; // prevent DIV0 error
|
||||
return sqrt( _ssqdif / _cnt);
|
||||
}
|
||||
|
||||
float Statistic::unbiased_stdev() const
|
||||
{
|
||||
if (!_useStdDev) return NAN;
|
||||
if (_cnt < 2) return NAN; // prevent DIV0 error
|
||||
return sqrt( _ssqdif / (_cnt - 1));
|
||||
}
|
||||
|
||||
// -- END OF FILE --
|
45
Statistic.h
Normal file
45
Statistic.h
Normal file
|
@ -0,0 +1,45 @@
|
|||
#pragma once
|
||||
//
|
||||
// FILE: Statistic.h
|
||||
// AUTHOR: Rob dot Tillaart at gmail dot com
|
||||
// modified at 0.3 by Gil Ross at physics dot org
|
||||
// VERSION: 0.4.0
|
||||
// PURPOSE: Recursive Statistical library for Arduino
|
||||
// HISTORY: See Statistic.cpp
|
||||
//
|
||||
|
||||
#include <Arduino.h>
|
||||
#include <math.h>
|
||||
|
||||
#define STATISTIC_LIB_VERSION "0.4.0"
|
||||
|
||||
class Statistic
|
||||
{
|
||||
public:
|
||||
Statistic(bool useStdDev = true); // "switches on/off" stdev run time
|
||||
void clear(bool useStdDev = true); // "switches on/off" stdev run time
|
||||
void add(const float);
|
||||
|
||||
// returns the number of values added
|
||||
uint32_t count() const { return _cnt; }; // zero if count == zero
|
||||
float sum() const { return _sum; }; // zero if count == zero
|
||||
float minimum() const { return _min; }; // zero if count == zero
|
||||
float maximum() const { return _max; }; // zero if count == zero
|
||||
float average() const; // NAN if count == zero
|
||||
|
||||
// useStdDev must be true to use next three
|
||||
float variance() const; // NAN if count == zero
|
||||
float pop_stdev() const; // population stdev // NAN if count == zero
|
||||
float unbiased_stdev() const; // NAN if count == zero
|
||||
|
||||
protected:
|
||||
uint32_t _cnt;
|
||||
float _sum;
|
||||
float _min;
|
||||
float _max;
|
||||
bool _useStdDev;
|
||||
float _ssqdif; // sum of squares difference
|
||||
|
||||
};
|
||||
|
||||
// -- END OF FILE --
|
53
examples/Average/Average.ino
Normal file
53
examples/Average/Average.ino
Normal file
|
@ -0,0 +1,53 @@
|
|||
//
|
||||
// FILE: Average.ino
|
||||
// AUTHOR: Rob dot Tillaart at gmail dot com
|
||||
// VERSION: 0.4
|
||||
// PURPOSE: Sample sketch for statistic library Arduino
|
||||
//
|
||||
|
||||
#include "Statistic.h"
|
||||
|
||||
Statistic myStats;
|
||||
|
||||
uint32_t start;
|
||||
uint32_t stop;
|
||||
|
||||
void setup(void)
|
||||
{
|
||||
Serial.begin(115200);
|
||||
Serial.println(__FILE__);
|
||||
Serial.print("Demo Statistics lib ");
|
||||
Serial.println(STATISTIC_LIB_VERSION);
|
||||
myStats.clear(); //explicitly start clean
|
||||
start = millis();
|
||||
}
|
||||
|
||||
void loop(void)
|
||||
{
|
||||
long rn = random(0, 9999);
|
||||
myStats.add(rn / 100.0 + 1);
|
||||
if (myStats.count() == 10000)
|
||||
{
|
||||
stop = millis();
|
||||
Serial.print(" Count: ");
|
||||
Serial.println(myStats.count());
|
||||
Serial.print(" Min: ");
|
||||
Serial.println(myStats.minimum(), 4);
|
||||
Serial.print(" Max: ");
|
||||
Serial.println(myStats.maximum(), 4);
|
||||
Serial.print(" Average: ");
|
||||
Serial.println(myStats.average(), 4);
|
||||
Serial.print(" variance: ");
|
||||
Serial.println(myStats.variance(), 4);
|
||||
Serial.print(" pop stdev: ");
|
||||
Serial.println(myStats.pop_stdev(), 4);
|
||||
Serial.print(" unbias stdev: ");
|
||||
Serial.println(myStats.unbiased_stdev(), 4);
|
||||
Serial.print(" time(ms): ");
|
||||
Serial.println(stop - start);
|
||||
Serial.println("=====================================");
|
||||
myStats.clear();
|
||||
delay(1000);
|
||||
start = millis();
|
||||
}
|
||||
}
|
45
examples/StatisticArray/StatisticArray.ino
Normal file
45
examples/StatisticArray/StatisticArray.ino
Normal file
|
@ -0,0 +1,45 @@
|
|||
//
|
||||
// FILE: StatisticArray.ino
|
||||
// AUTHOR: Rob dot Tillaart at gmail dot com
|
||||
// VERSION: 0.1
|
||||
// PURPOSE: Sample sketch for statistic library Arduino
|
||||
//
|
||||
|
||||
#include "Statistic.h"
|
||||
|
||||
Statistic stats[4];
|
||||
|
||||
void setup(void)
|
||||
{
|
||||
Serial.begin(115200);
|
||||
Serial.println(__FILE__);
|
||||
Serial.print("Demo Statistics lib ");
|
||||
Serial.println(STATISTIC_LIB_VERSION);
|
||||
for (int i=0; i<4; i++)
|
||||
{
|
||||
stats[i].clear(); //explicitly start clean
|
||||
}
|
||||
}
|
||||
|
||||
void loop(void)
|
||||
{
|
||||
long rn = random(0, 9999);
|
||||
int idx = random(0, 4);
|
||||
stats[idx].add(rn / 100.0 + 1);
|
||||
|
||||
if (stats[idx].count() == 10000)
|
||||
{
|
||||
Serial.print("IDX: ");
|
||||
Serial.println(idx);
|
||||
Serial.print(" Count: ");
|
||||
Serial.println(stats[idx].count());
|
||||
Serial.print(" Min: ");
|
||||
Serial.println(stats[idx].minimum(), 4);
|
||||
Serial.print(" Max: ");
|
||||
Serial.println(stats[idx].maximum(), 4);
|
||||
Serial.print(" Average: ");
|
||||
Serial.println(stats[idx].average(), 4);
|
||||
Serial.println("=====================================");
|
||||
stats[idx].clear();
|
||||
}
|
||||
}
|
60
examples/TimingTest/TimingTest.ino
Normal file
60
examples/TimingTest/TimingTest.ino
Normal file
|
@ -0,0 +1,60 @@
|
|||
//
|
||||
// FILE: TimingTest.ino
|
||||
// AUTHOR: Rob dot Tillaart at gmail dot com
|
||||
// VERSION: 0.2.0
|
||||
// PURPOSE: measure time difference for runtime stddev toggle.
|
||||
// add is 1024 millis faster for 10K adds ==> ~ 100uSec per add faster.
|
||||
|
||||
#include "Statistic.h"
|
||||
|
||||
Statistic myStats;
|
||||
|
||||
uint32_t start;
|
||||
uint32_t stop;
|
||||
|
||||
bool useStdDev = true;
|
||||
|
||||
void setup(void)
|
||||
{
|
||||
Serial.begin(115200);
|
||||
Serial.println(__FILE__);
|
||||
Serial.print("Demo Statistics lib ");
|
||||
Serial.println(STATISTIC_LIB_VERSION);
|
||||
myStats.clear(useStdDev);
|
||||
start = millis();
|
||||
}
|
||||
|
||||
void loop(void)
|
||||
{
|
||||
long rn = random(0, 9999);
|
||||
myStats.add(rn / 100.0 + 1);
|
||||
if (myStats.count() == 10000)
|
||||
{
|
||||
stop = millis();
|
||||
Serial.print(" Count: ");
|
||||
Serial.println(myStats.count());
|
||||
Serial.print(" Min: ");
|
||||
Serial.println(myStats.minimum(), 4);
|
||||
Serial.print(" Max: ");
|
||||
Serial.println(myStats.maximum(), 4);
|
||||
Serial.print(" Average: ");
|
||||
Serial.println(myStats.average(), 4);
|
||||
if (useStdDev)
|
||||
{
|
||||
Serial.print(" variance: ");
|
||||
Serial.println(myStats.variance(), 4);
|
||||
Serial.print(" pop stdev: ");
|
||||
Serial.println(myStats.pop_stdev(), 4);
|
||||
Serial.print(" unbias stdev: ");
|
||||
Serial.println(myStats.unbiased_stdev(), 4);
|
||||
}
|
||||
Serial.print(" time(ms): ");
|
||||
Serial.println(stop - start);
|
||||
Serial.println("=====================================");
|
||||
useStdDev = !useStdDev;
|
||||
myStats.clear(useStdDev);
|
||||
start = millis();
|
||||
}
|
||||
}
|
||||
|
||||
// -- END OF FILE --
|
21
keywords.txt
Normal file
21
keywords.txt
Normal file
|
@ -0,0 +1,21 @@
|
|||
# Syntax Coloring Map For Statistic
|
||||
|
||||
# Datatypes (KEYWORD1)
|
||||
Statistic KEYWORD1
|
||||
|
||||
# Methods and Functions (KEYWORD2)
|
||||
clear KEYWORD2
|
||||
add KEYWORD2
|
||||
count KEYWORD2
|
||||
sum KEYWORD2
|
||||
minimum KEYWORD2
|
||||
maximum KEYWORD2
|
||||
average KEYWORD2
|
||||
variance KEYWORD2
|
||||
pop_stdev KEYWORD2
|
||||
unbiased_stdev KEYWORD2
|
||||
|
||||
# Instances (KEYWORD2)
|
||||
|
||||
# Constants (LITERAL1)
|
||||
STATISTIC_LIB_VERSION LITERAL1
|
27
library.json
Normal file
27
library.json
Normal file
|
@ -0,0 +1,27 @@
|
|||
{
|
||||
"name": "Statistic",
|
||||
"keywords": "Statistic,sum,min,max,average,variance,standard,deviation,population,unbiased",
|
||||
"description": "Library with basic statistical functions for Arduino.",
|
||||
"authors":
|
||||
[
|
||||
{
|
||||
"name": "Rob Tillaart",
|
||||
"email": "Rob.Tillaart@gmail.com",
|
||||
"maintainer": true
|
||||
},
|
||||
{
|
||||
"name": "Gil Ross"
|
||||
}
|
||||
],
|
||||
"repository":
|
||||
{
|
||||
"type": "git",
|
||||
"url": "https://github.com/RobTillaart/Arduino.git"
|
||||
},
|
||||
"version":"0.4.0",
|
||||
"frameworks": "arduino",
|
||||
"platforms": "*",
|
||||
"export": {
|
||||
"include": "libraries/Statistic"
|
||||
}
|
||||
}
|
11
library.properties
Normal file
11
library.properties
Normal file
|
@ -0,0 +1,11 @@
|
|||
name=Statistic
|
||||
version=0.4.0
|
||||
author=Rob Tillaart <rob.tillaart@gmail.com>
|
||||
maintainer=Rob Tillaart <rob.tillaart@gmail.com>
|
||||
sentence=Library with basic statistical functions for Arduino.
|
||||
paragraph=Supports
|
||||
category=Data Processing
|
||||
url=https://github.com/RobTillaart/Statistic
|
||||
architectures=*
|
||||
includes=Statistic.h
|
||||
depends=
|
Loading…
Reference in a new issue