diff --git a/README.md b/README.md
index 2b3e35e..3a71146 100644
--- a/README.md
+++ b/README.md
@@ -49,12 +49,12 @@
 of memory quite fast. Instead a few calculated values are kept
 to be able to calculate the most important statistics.
 
-#### Q: How many samples can the lib hold? (internal variables and overflow)
+#### Q: How many samples can the lib hold? Part 1: internal variables and overflow
 
 The counter of samples is an **uint32_t**, implying a maximum of about
 **4 billion** samples. In practice 'strange' things might happen before
 this number is reached.
-There are two internal variables, **_sum** which is the sum of the values and **_ssq**
-which is the sum of the squared values. Both can overflow especially **_ssq**
+There are two internal variables, **\_sum** which is the sum of the values and **\_ssq**
+which is the sum of the squared values. Both can overflow, especially **\_ssq**
 can and probably will grow fast. The library does not protect against it.
 There is a workaround for this (to some extend) if one knows the approx
@@ -67,12 +67,112 @@
 This workaround has no influence on the standard deviation.
 
 - Q: should this subtraction trick be build into the lib?
 
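+A minimal sketch of the subtraction workaround above, assuming samples from an
+analogRead() that hovers around a known value (here 512); the instance name and
+the offset are illustrative only:
+
+```cpp
+#include "Statistic.h"
+
+Statistic myStats;
+const float OFFSET = 512;  //  approximate average, known in advance
+
+void setup()
+{
+  Serial.begin(115200);
+  myStats.clear();
+}
+
+void loop()
+{
+  //  keep _sum small by subtracting the approximate average first
+  myStats.add(analogRead(A0) - OFFSET);
+  //  add the offset back when reading the average;
+  //  the standard deviation needs no correction.
+  Serial.println(myStats.average() + OFFSET);
+  delay(100);
+}
+```
+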
+#### Q: How many samples can the lib hold? Part 2: order of magnitude floats
+
+The samples are added to the internal variable **\_sum** and counted in **\_cnt**.
+In time **\_sum** will outgrow the added values by orders of magnitude.
+As **\_sum** is a float with a 23 bit mantissa (~7 digits precision) this problem
+starts to become significant between 1 and 10 million calls to **add()**.
+The assumption here is that what is added is always in the same order of magnitude
+(± 1), e.g. an analogRead. 10 million looks like a lot, but an analogRead takes
+only ~0.1 millisecond on a slow device like a UNO.
+
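+The 23 bit limit can be shown in isolation: every integer up to 2^24 (16777216)
+fits exactly in a float, but 2^24 + 1 does not, so at that point adding 1.0 has
+no effect any more. A minimal demonstration, not specific to this library:
+
+```cpp
+void setup()
+{
+  Serial.begin(115200);
+  float sum = 16777216.0f;    //  2^24
+  sum += 1.0f;                //  rounds back to 2^24, the 1.0 is lost
+  //  prints "unchanged": the add never arrived in sum
+  Serial.println(sum == 16777216.0f ? "unchanged" : "changed");
+}
+
+void loop() {}
+```
+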
+Beyond the point where the added values do not change **\_sum** any more, while
+the count keeps growing, one will see the average go down (very) slowly, but down.
+
+There are two ways to detect this problem:
+- check **count()** and decide after e.g. 100K samples to call **clear()**.
+- (since 0.4.3) check the return value of **add()** to see which value is actually
+added to the internal **\_sum**. If this differs substantially from the sample
+offered, it might be time to call **clear()** too (see the sketch below).
+
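+A sketch of this second detection method; the 1% threshold and the names used
+are arbitrary:
+
+```cpp
+#include "Statistic.h"
+
+Statistic myStats;
+
+void setup()
+{
+  Serial.begin(115200);
+  myStats.clear();
+}
+
+void loop()
+{
+  float value = analogRead(A0);
+  float added = myStats.add(value);  //  0.4.3+: returns the value actually added
+  //  a substantial difference means _sum has lost precision, so restart
+  if (fabs(added - value) > 0.01 * fabs(value))
+  {
+    myStats.clear();
+    myStats.add(value);
+  }
+  delay(100);
+}
+```
+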
+For applications that need the average of large streams of data there also
+exists a **runningAverage** library. This holds the last N (< 256) samples and
+takes the average of them. This will often be the better tool.
+
+Also a consideration is to take fewer samples if possible. When a temperature
+does not change more than 1x per minute, it makes no sense to sample it 2x per
+second.
+
+
 #### Q: How about the precision of the library?
 
 The precision of the internal variables is restricted due to the fact
-that they are 32 bit float (IEEE754). If the internal variable **_sum** has
+that they are 32 bit float (IEEE754). If the internal variable **\_sum** has
 a large value, adding relative small values to the dataset wouldn't
-change its value any more. Same is true for **_ssq**. One might argue that
+change its value any more. Same is true for **\_ssq**. One might argue that
 statistically speaking these values are less significant, but in fact it is
 wrong. There is a workaround for this (to some extend). If one has the samples in an