update readme.md

This commit is contained in: parent a751d44368, commit 6b03bce1d3.
1 changed file with 32 additions and 5 deletions: README.md
of memory quite fast. Instead a few calculated values are kept to be able to 
calculate the most important statistics. 
#### Q: How many samples can the lib hold?  Part 1: internal variables and overflow
The counter of samples is a **uint32_t**, implying a maximum of about **4 billion** samples. 
In practice 'strange' things might happen before this number is reached. 
There are two internal variables, **\_sum** which is the sum of the values and **\_ssq** 
which is the sum of the squared values. Both can overflow; especially **\_ssq** 
can and probably will grow fast. The library does not protect against it.

There is a workaround for this (to some extent) if one knows the approx 

This workaround has no influence on the standard deviation.
- Q: should this subtraction trick be built into the lib?
#### Q: How many samples can the lib hold?  Part 2: order of magnitude floats
The samples are added in the internal variable **\_sum** and counted in **\_cnt**. 
In time **\_sum** will outgrow the added values in order of magnitude.
As **\_sum** is a float with 23 bits = ~7 digits of precision, this problem starts 
to become significant between 1 and 10 million calls to **add()**. 
The assumption here is that what's added is always in the same order of magnitude
(±1), e.g. an analogRead. 10 million looks like a lot, but an analogRead takes only 
~0.1 millisecond on a slow device like an UNO.

Beyond the point where values are no longer added, while the count keeps growing,
one will see the average go down (very) slowly, but down.

There are two ways to detect this problem:
- check **count()** and decide after 100K samples to call **clear()**. 
- (since 0.4.3) check the return value of **add()** to see what value is actually
added to the internal **\_sum**. If this is substantially different, it might be time 
to call **clear()** too. 

For applications that need an average of large streams of data there also
exists a **runningAverage** library. This holds the last N (< 256) samples and takes the 
average of them. This will often be the better tool. 

Also a consideration is to take fewer samples if possible. When temperature does 
not change more than 1x per minute, it makes no sense to sample it 2x per second.

#### Q: How about the precision of the library?
The precision of the internal variables is restricted due to the fact 
that they are 32 bit floats (IEEE754). If the internal variable **\_sum** has 
a large value, adding relatively small values to the dataset wouldn't 
change its value any more. The same is true for **\_ssq**. One might argue that 
statistically speaking these values are less significant, but in fact this is wrong.

There is a workaround for this (to some extent). If one has the samples in an 