Tuesday, August 9, 2016

Why prefer start + (end – start) / 2 over (start + end) / 2 when calculating the middle of an array? – stackoverflow.com #JHedzWorlD




I’ve seen programmers use the formula


mid = start + (end - start) / 2 

instead of using the simpler formula


mid = (start + end) / 2 

for finding the middle element in the array or list.


Why do they use the former one?




There are three reasons.


First of all, start + (end - start) / 2 works even if you are using pointers, as long as end - start doesn’t overflow [1].


int *start = ..., *end = ...;
int *mid = start + (end - start) / 2; // works as expected
int *mid = (start + end) / 2;         // type error, won't compile
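As a minimal sketch of the pointer case, the helper below (mid_ptr is a hypothetical name, not from the original answer) finds the middle of a half-open range [start, end) inside a single array. It assumes both pointers refer to the same array and that start <= end.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: midpoint of the half-open range [start, end).
   Assumes both pointers point into (or one past the end of) the same array. */
const int *mid_ptr(const int *start, const int *end)
{
    assert(start <= end);
    /* Subtracting two pointers yields an element count of type ptrdiff_t. */
    ptrdiff_t len = end - start;
    /* Adding an integer to a pointer is well defined; adding two pointers is not. */
    return start + len / 2;
}
```

For a five-element array a, mid_ptr(a, a + 5) points at a[2].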

Second of all, start + (end - start) / 2 won’t overflow if start and end are large positive numbers. With signed operands, overflow is undefined:


int start = 0x7ffffffe, end = 0x7fffffff;
int mid = start + (end - start) / 2; // works as expected
int mid = (start + end) / 2;         // overflow... undefined

(Note that end - start may overflow, but only if start < 0 or end < 0.)
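A small sketch of the signed case, assuming array indices so that 0 <= start <= end (mid_index is a hypothetical name). Under that assumption end - start cannot overflow, so the function stays within defined behavior even at the top of the int range.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: midpoint of two non-negative indices.
   Assumes 0 <= start <= end, so end - start is in [0, INT_MAX]. */
int mid_index(int start, int end)
{
    assert(0 <= start && start <= end);
    /* The result never exceeds end, so no intermediate value overflows. */
    return start + (end - start) / 2;
}
```

With start = INT_MAX - 1 and end = INT_MAX this returns INT_MAX - 1, whereas (start + end) / 2 would overflow.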


With unsigned arithmetic, overflow is well defined (the result wraps around modulo 2^N, where N is the width of the type), but it gives you the wrong answer. However, for unsigned operands, start + (end - start) / 2 will never overflow as long as end >= start.


unsigned start = 0xfffffffeu, end = 0xffffffffu;
unsigned mid = start + (end - start) / 2; // works as expected
unsigned mid = (start + end) / 2;         // mid = 0x7ffffffe
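The two unsigned variants can be compared side by side. This is only an illustration; the function names mid_wrapping and mid_safe are made up for the example, and UINT_MAX is used instead of a literal so the sketch does not assume a 32-bit unsigned.

```c
#include <assert.h>
#include <limits.h>

/* Naive form: start + end wraps around modulo UINT_MAX + 1 when it is too large. */
unsigned mid_wrapping(unsigned start, unsigned end)
{
    return (start + end) / 2;
}

/* Safe form: assumes start <= end, so end - start cannot wrap
   and the result never exceeds end. */
unsigned mid_safe(unsigned start, unsigned end)
{
    assert(start <= end);
    return start + (end - start) / 2;
}
```

For start = UINT_MAX - 1 and end = UINT_MAX, mid_safe returns UINT_MAX - 1, while mid_wrapping returns UINT_MAX / 2 - 1, which lies outside the range.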

Finally, you often want to round towards the start element.


int start = -3, end = 0;
int mid = start + (end - start) / 2; // -2, closer to start
int mid = (start + end) / 2;         // -1, surprise!
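The rounding difference comes from integer division truncating toward zero (guaranteed since C99). A short sketch, with hypothetical function names and assuming operands small enough that neither form overflows:

```c
#include <assert.h>

/* end - start is non-negative when start <= end, so the division
   truncates downward and the result rounds toward start. */
int mid_toward_start(int start, int end)
{
    assert(start <= end);
    return start + (end - start) / 2;
}

/* start + end may be negative, and division then truncates toward zero,
   which can round toward end instead. */
int mid_toward_zero(int start, int end)
{
    return (start + end) / 2;
}
```

For the range [-3, 0] the first form gives -2 and the second gives -1; for non-negative ranges such as [3, 6] both give 4.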

Footnotes


[1] According to the C standard, if the result of pointer subtraction is not representable as a ptrdiff_t, then the behavior is undefined. However, in practice, this requires allocating a char array using at least half the entire address space.




We can take a simple example to demonstrate this. Suppose that, in a certain large array, we are trying to find the midpoint of the range [1000, INT_MAX]. INT_MAX is the largest value the int data type can store. Adding even 1 to it overflows: the behavior is undefined in C, and on typical two's complement hardware the result wraps around to a negative value.


Here, start = 1000 and end = INT_MAX.


Using the formula: (start + end)/2,


the mid-point will be


(1000 + INT_MAX)/2, where the sum overflows. If it wraps around, the sum becomes INT_MIN + 999, so the result is (INT_MIN + 999)/2, which is negative and may cause a segmentation fault if we try to index using this value.



But using the formula start + (end - start)/2, we get:


(1000 + (INT_MAX - 1000)/2) = (1000 + INT_MAX/2 - 500) = (INT_MAX/2 + 500), which will not overflow.
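The worked example above can be checked without ever performing the overflowing addition. The sketch below uses two hypothetical helpers and assumes 0 <= start <= end: one tests whether start + end would exceed INT_MAX, and the other computes the midpoint safely.

```c
#include <assert.h>
#include <limits.h>

/* Returns 1 if start + end would not fit in an int.
   Assumes 0 <= start <= end, so INT_MAX - start cannot overflow. */
int sum_would_overflow(int start, int end)
{
    assert(0 <= start && start <= end);
    return end > INT_MAX - start;
}

/* Midpoint that stays in range under the same assumption. */
int range_mid(int start, int end)
{
    assert(0 <= start && start <= end);
    return start + (end - start) / 2;
}
```

For start = 1000 and end = INT_MAX, sum_would_overflow reports 1, and range_mid returns INT_MAX / 2 + 500, matching the calculation above.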





To add to what others have already said, the first one conveys its meaning more clearly to those less mathematically minded:


mid = start + (end - start) / 2 

reads as:


mid equals start plus half of the length.



whereas:


mid = (start + end) / 2 

reads as:


mid equals half of start plus end



Which does not seem as clear as the first, at least when expressed like that.


As Kos pointed out, it can also read:


mid equals the average of start and end



Which is clearer but still not, at least in my opinion, as clear as the first.




