First-order noise shaping algorithms work by keeping an error term that is the sum of all the previous errors caused by bit-depth truncation. When each new sample is truncated, the output value is chosen to minimize this running error sum, rather than simply rounding to the nearest value (which only minimizes the error for that one sample). The net effect is that the overall level of the truncation noise is slightly higher than with pure rounding, but its spectrum falls off at 6 dB per octave toward lower frequencies. The result is less audible noise in the most important parts of the audio band, the low and middle frequencies.
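The running-sum idea can be sketched in a few lines. This is a minimal illustration, not any particular decoder's code: it assumes samples are already scaled so that 1.0 equals one LSB of the output word, uses Python's `round()` as the quantizer, measures the noise with a crude DFT over a random test signal, and the function names are hypothetical.

```python
import math
import random

def round_to_nearest(samples):
    """Plain rounding: each sample's error is minimized on its own."""
    return [round(x) for x in samples]

def first_order_shape(samples):
    """First-order noise shaping driven by a running error sum (hypothetical helper).

    Samples are floats in units of the output LSB. err_sum is the sum of all
    previous (output - input) errors; each new output is chosen to bring that
    sum as close to zero as possible instead of rounding the sample alone.
    """
    out = []
    err_sum = 0.0
    for x in samples:
        y = round(x - err_sum)   # integer that minimizes the new running sum
        err_sum += y - x         # always ends up within [-0.5, 0.5]
        out.append(y)
    return out

def mean_power(err, bins):
    """Average periodogram value of `err` over the given DFT bin indices."""
    n = len(err)
    total = 0.0
    for k in bins:
        re = sum(e * math.cos(2 * math.pi * k * i / n) for i, e in enumerate(err))
        im = sum(e * math.sin(2 * math.pi * k * i / n) for i, e in enumerate(err))
        total += (re * re + im * im) / n
    return total / len(bins)

random.seed(1)
N = 2048
# Random stand-in for high-resolution audio, already in output-LSB units.
x = [random.uniform(-1000.0, 1000.0) for _ in range(N)]

err_round = [y - s for y, s in zip(round_to_nearest(x), x)]
err_shape = [y - s for y, s in zip(first_order_shape(x), x)]

low, high = range(1, 33), range(N // 2 - 32, N // 2)  # near DC vs near Nyquist
for name, err in (("rounding", err_round), ("shaped", err_shape)):
    total = sum(e * e for e in err) / N
    print(f"{name:8s} total={total:.4f} low={mean_power(err, low):.5f} "
          f"high={mean_power(err, high):.4f}")
```

Run as-is, the shaped version should show a higher total error power (roughly double for this kind of input) but far less power in the low bins and more near Nyquist. That follows from the algebra: the error it leaves behind on each sample is the difference of two consecutive running sums, each bounded by half an LSB, which is a first-order high-pass filtering of the error.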
In theory, if you decoded two contiguous tracks this way, reset the error term at the track break, and then played them gaplessly, there could be an audible click at the point where the error term was zeroed, because there would be a DC jump. However, with real signals I believe this is highly unlikely. In fact, I think a more likely case is a track that ends very loudly making a small click in the silent beginning of the next track.
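One rough way to size the disturbance from zeroing the error term is to process a loud track followed by a near-silent one twice: once as a single continuous pass, and once with the error term reset at the break. This is a sketch under stated assumptions: the shaper is a simple running-error-sum quantizer with samples scaled so that 1.0 is one output LSB, the "tracks" are random stand-ins rather than real audio, and the function name is hypothetical.

```python
import random

def first_order_shape(samples, err_sum=0.0):
    """Running-error-sum shaper (hypothetical helper); err_sum is the carried-in state."""
    out = []
    for x in samples:
        y = round(x - err_sum)   # minimize the new running error sum
        err_sum += y - x         # stays within [-0.5, 0.5]
        out.append(y)
    return out, err_sum

random.seed(2)
track_a = [random.uniform(-30000.0, 30000.0) for _ in range(1000)]  # loud ending
track_b = [random.uniform(-0.3, 0.3) for _ in range(1000)]          # near-silent start

# Gapless: one continuous pass, error term carried across the track break.
joined, _ = first_order_shape(track_a + track_b)

# Separate decodes: the error term is reset to zero at the start of track B.
out_a, carried = first_order_shape(track_a)
out_b, _ = first_order_shape(track_b)  # err_sum starts at 0.0

diff = [p - q for p, q in zip(joined, out_a + out_b)]
print("carried error at break:", round(carried, 3))
print("largest output difference (LSB):", max(abs(d) for d in diff))
```

With this kind of shaper the carried error never exceeds half an LSB, so the two outputs can only differ within the second track, and by at most one LSB on any sample. A shaper whose error term is not bounded this way (for example, one using a higher-order filter) could behave differently.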
Best to stop losing sleep over it and just leave everything at 24-bit.