It doesn't really matter what base it is, since big-O notation is usually written showing only the asymptotically highest order of n, so constant coefficients will drop away. Since a different logarithm base is equivalent to a constant coefficient, it is superfluous.
Big O notation is not affected by logarithmic base, because all logarithms in different bases are related by a constant factor, O(ln n) is equivalent to O(log n).
Once expressed in big-O() notation, both are correct. However, during the derivation of the O() polynomial, in the case of binary search, only log2 is correct. I assume this distinction was the intuitive inspiration for your question to begin with.
Also, as a matter of my opinion, writing O(log2 N) is better for your example, because it better communicates the derivation of the algorithm's run-time.
In big-O() notation, constant factors are removed. Converting from one logarithm base to another involves multiplying by a constant factor.
So O(log N) is equivalent to O(log2 N) due to a constant factor.
However, if you can easily typeset log2 N in your answer, doing so is more pedagogical. In the case of binary tree searching, you are correct that log2 N is introduced during the derivation of the big-O() runtime.
Before expressing the result as big-O() notation, the difference is very important. When deriving the polynomial to be communicated via big-O notation, it would be incorrect for this example to use a logarithm other than log2 N, prior to applying the O()-notation. As soon as the polynomial is used to communicate a worst-case runtime via big-O() notation, it doesn't matter what logarithm is used.
Yes, when talking about big-O notation, the base does not matter. However, computationally when faced with a real search problem it does matter.
When developing an intuition about tree structures, it's helpful to understand that a binary search tree can be searched in O(n log n) time because that is the height of the tree - that is, in a binary tree with n nodes, the tree depth is O(n log n) (base 2). If each node has three children, the tree can still be searched in O(n log n) time, but with a base 3 logarithm. Computationally, the number of children each node has can have a big impact on performance (see for example: link text)
First you must understand what it means for a function f(n) to be O( g(n) ).
The formal definition is: *A function f(n) is said to be O(g(n)) iff |f(n)| <= C * |g(n)| whenever n > k, where C and k are constants.*
so let f(n) = log base a of n, where a > 1 and g(n) = log base b of n, where b > 1
NOTE: This means the values a and b could be any value greater than 1, for example a=100 and b = 3
Now we get the following: log base a of n is said to be O(log base b of n) iff |log base a of n| <= C * |log base b of n| whenever n > k
Choose k=0, and C= log base a of b.
Now our equation looks like the following: |log base a of n| <= log base a of b * |log base b of n| whenever n > 0
Notice the right hand side, we can manipulate the equation: = log base a of b * |log base b of n| = |log base b of n| * log base a of b = |log base a of b^(log base b of n)| = |log base a of n|
Now our equation looks like the following: |log base a of n| <= |log base a of n| whenever n > 0
The equation is always true no matter what the values n,b, or a are, other than their restrictions a,b>1 and n>0.
So log base a of n is O(log base b of n) and since a,b doesn't matter we can simply omit them.