This is simply merging. Treat each list as if it were a stack, and repeatedly pop the smaller of the two stack heads onto the result list, until one of the stacks is empty. Then append all remaining items to the result list.
res = []
while l1 and l2:
    if l1[0] < l2[0]:
        res.append(l1.pop(0))
    else:
        res.append(l2.pop(0))
res += l1
res += l2
Well, the naive approach (combine the two lists into one large list and sort it) will have O(N log N) complexity. On the other hand, if you implement the merge manually (I do not know of any ready-made code in the Python libraries for this, but I'm no expert) the complexity will be O(N), which is clearly faster.
The idea is described very well in the post by Barry Kelly.
function merge(left, right)
    var list result
    while length(left) > 0 and length(right) > 0
        if first(left) ≤ first(right)
            append first(left) to result
            left = rest(left)
        else
            append first(right) to result
            right = rest(right)
    end while
    while length(left) > 0
        append left to result
    while length(right) > 0
        append right to result
    return result
This is simple merging of two sorted lists. Take a look at the sample code below which merges two sorted lists of integers.
#!/usr/bin/env python
## merge.py -- Merge two sorted lists -*- Python -*-
## Time-stamp: "2009-01-21 14:02:57 ghoseb"

l1 = [1, 3, 4, 7]
l2 = [0, 2, 5, 6, 8, 9]

def merge_sorted_lists(l1, l2):
    """Merge two sorted lists.

    Arguments:
    - `l1`: First sorted list
    - `l2`: Second sorted list
    """
    sorted_list = []
    # Copy both the args to make sure the original lists are not
    # modified
    l1 = l1[:]
    l2 = l2[:]
    while l1 and l2:
        if l1[0] <= l2[0]:    # Compare both heads
            item = l1.pop(0)  # Pop from the head
            sorted_list.append(item)
        else:
            item = l2.pop(0)
            sorted_list.append(item)
    # Add the rest of whichever list is non-empty
    sorted_list.extend(l1 if l1 else l2)
    return sorted_list

if __name__ == '__main__':
    print(merge_sorted_lists(l1, l2))
This should work fine with datetime objects. Hope this helps.
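The one-liner at issue (benchmarked below as sorted(l1+l2)) is presumably nothing more than concatenating the two lists and re-sorting; a minimal sketch:

l1 = [1, 3, 4, 7]
l2 = [0, 2, 5, 6, 8, 9]
merged = sorted(l1 + l2)  # concatenate, then let the built-in sort do the work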
... easy! Plus, it uses only two built-in functions, so assuming the lists are of a reasonable size, it should be quicker than implementing the sorting/merging in a loop. More importantly, the above is much less code and very readable.
If your lists are large (over a few hundred thousand items, I would guess), it may be quicker to use an alternative/custom sorting method, but there are likely other optimisations to be made first (e.g. not storing millions of datetime objects).
Using timeit.Timer().repeat() (which runs each function 1,000,000 times per repeat), I loosely benchmarked it against ghoseb's solution, and sorted(l1+l2) is substantially quicker.
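For reference, a harness along these lines reproduces that kind of comparison (the list sizes, seed, and repeat counts here are my own choices, not the original setup, and absolute numbers will vary by machine):

import timeit

setup = """
import random
random.seed(0)
l1 = sorted(random.random() for _ in range(1000))
l2 = sorted(random.random() for _ in range(1000))
"""
timer = timeit.Timer("sorted(l1 + l2)", setup=setup)
print(min(timer.repeat(repeat=3, number=1000)))  # best of three repeats, 1000 runs each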
There is a slight flaw in ghoseb's solution, making it O(n**2), rather than O(n).
The problem is that this is performing:
item = l1.pop(0)
With linked lists or deques this would be an O(1) operation, so it wouldn't affect complexity, but since Python lists are implemented as vectors, this copies the rest of the elements of l1 one space left, an O(n) operation. Since this is done on each pass through the list, it turns an O(n) algorithm into an O(n**2) one. This can be corrected by using a method that doesn't alter the source lists, but just keeps track of the current position.
I've benchmarked a corrected algorithm against a simple sorted(l1+l2) as suggested by dbr:
def merge(l1, l2):
    if not l1: return list(l2)
    if not l2: return list(l1)
    # Swap so that l2 contains the overall last element.
    if l1[-1] > l2[-1]:
        l1, l2 = l2, l1
    it = iter(l2)
    y = next(it)
    result = []
    for x in l1:
        while y < x:
            result.append(y)
            y = next(it)
        result.append(x)
    result.append(y)
    result.extend(it)
    return result
I've tested these with lists generated with
l1 = sorted([random.random() for i in range(NITEMS)])
l2 = sorted([random.random() for i in range(NITEMS)])
For various list sizes, I get the following timings (repeating 100 times):
So in fact, it looks like dbr is right: just using sorted() is preferable unless you're expecting very large lists, though it does have worse algorithmic complexity. The break-even point is at around a million items in each source list (2 million total).
One advantage of the merge approach though is that it is trivial to rewrite as a generator, which will use substantially less memory (no need for an intermediate list).
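As a sketch of that idea (my own adaptation for plain lists, not the answerer's code), the merge loop translates almost directly into a generator:

def merge_gen(l1, l2):
    # Lazily yield the elements of two sorted lists in order,
    # without building an intermediate result list.
    i = j = 0
    while i < len(l1) and j < len(l2):
        if l1[i] <= l2[j]:
            yield l1[i]
            i += 1
        else:
            yield l2[j]
            j += 1
    # One list is exhausted; yield whatever remains of the other.
    for x in l1[i:]:
        yield x
    for x in l2[j:]:
        yield x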
[Edit]
I've retried this with a situation closer to the question - using a list of objects containing a field "date" which is a datetime object.
The above algorithm was changed to compare against .date instead, and the sort method was changed to:
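The snippet itself isn't reproduced above, but it was presumably a keyed sort roughly like the following (the .date attribute comes from the surrounding description; the rest is my guess):

import operator

# Sort the concatenated lists by each object's .date attribute
merged = sorted(l1 + l2, key=operator.attrgetter('date'))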
This does change things a bit. The comparison being more expensive means that the number of comparisons we perform matters more, relative to the constant-time speed of the implementation. This means the merge makes up lost ground, surpassing the sort() method at 100,000 items instead. Comparing based on an even more complex object (large strings or lists, for instance) would likely shift this balance even more.
This hasn't been mentioned, so I'll go ahead: there is a merge function in the heapq module of the standard library in Python 2.6+. If all you want is to get things done, this might be a better idea. Of course, if you want to implement your own, the merge step of merge sort is the way to go.
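For example (heapq.merge returns an iterator, so wrap it in list() if you actually need a list; the sample values are just for illustration):

import heapq

l1 = [1, 3, 4, 7]
l2 = [0, 2, 5, 6, 8, 9]
merged = list(heapq.merge(l1, l2))  # -> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]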
A recursive implementation is below. Average performance is O(n).
def merge_sorted_lists(A, B, sorted_list=None):
    if sorted_list is None:
        sorted_list = []
    slice_index = 0
    for element in A:
        if element <= B[0]:
            sorted_list.append(element)
            slice_index += 1
        else:
            return merge_sorted_lists(B, A[slice_index:], sorted_list)
    return sorted_list + B
Or a generator with improved space complexity:
def merge_sorted_lists_as_generator(A, B):
    slice_index = 0
    for element in A:
        if element <= B[0]:
            slice_index += 1
            yield element
        else:
            for sorted_element in merge_sorted_lists_as_generator(B, A[slice_index:]):
                yield sorted_element
            return
    for element in B:
        yield element
As people have mentioned, it may call the comparison function more times by some constant factor (but it may well make those extra calls in less overall time in many cases!).
I would never rely on this, however. – Daniel Nadasi
I believe the Python developers are committed to keeping timsort, or at least keeping a sort that's O(n) in this case.
Generalized sorting (i.e. leaving aside radix sorts on limited value domains) cannot be done in less than O(n log n) on a serial machine. – Barry Kelly
Right, sorting in the general case can't be faster than that. But since O() is an upper bound, timsort being O(n log n) on arbitrary input doesn't contradict its being O(n) given sorted(L1) + sorted(L2).
import random

n = int(input("Enter size of table 1"))  # size of list 1
m = int(input("Enter size of table 2"))  # size of list 2
tb1 = [random.randrange(1, 101, 1) for _ in range(n)]  # fill the lists with random
tb2 = [random.randrange(1, 101, 1) for _ in range(m)]  # numbers between 1 and 100
tb1.sort()  # sort list 1
tb2.sort()  # sort list 2
fus = []    # create an empty list
print(tb1)  # print list 1
print('------------------------------------')
print(tb2)  # print list 2
print('------------------------------------')
i = 0
j = 0       # indices used to walk through the lists
while i < n and j < m:
    if tb1[i] < tb2[j]:
        fus.append(tb1[i])
        i += 1
    else:
        fus.append(tb2[j])
        j += 1
if i < n:
    fus += tb1[i:n]
if j < m:
    fus += tb2[j:m]
print(fus)
# This code merges two sorted lists into one sorted list (fus)
# without sorting fus itself.
def merge_sort(a, b):
    pa = 0
    pb = 0
    result = []
    while pa < len(a) and pb < len(b):
        if a[pa] <= b[pb]:
            result.append(a[pa])
            pa += 1
        else:
            result.append(b[pb])
            pb += 1
    remained = a[pa:] + b[pb:]
    result.extend(remained)
    return result
An implementation of the merging step in Merge Sort that iterates through both lists:
def merge_lists(L1, L2):
    """
    L1, L2: sorted lists of numbers, one of them could be empty.
    returns a merged and sorted list of L1 and L2.
    """
    # When one of them is an empty list, returns the other list
    if not L1:
        return L2
    elif not L2:
        return L1
    result = []
    i = 0
    j = 0
    for k in range(len(L1) + len(L2)):
        if L1[i] <= L2[j]:
            result.append(L1[i])
            if i < len(L1) - 1:
                i += 1
            else:
                result += L2[j:]  # When the last element in L1 is reached,
                break             # append the rest of L2 to result.
        else:
            result.append(L2[j])
            if j < len(L2) - 1:
                j += 1
            else:
                result += L1[i:]  # When the last element in L2 is reached,
                break             # append the rest of L1 to result.
    return result
L1 = [1, 3, 5]
L2 = [2, 4, 6, 8]
merge_lists(L1, L2) # Should return [1, 2, 3, 4, 5, 6, 8]
merge_lists([], L1) # Should return [1, 3, 5]
I'm still learning about algorithms, so please let me know if the code could be improved in any aspect. Your feedback is appreciated, thanks!
This is my solution, in linear time, without modifying l1 or l2:
def merge(l1, l2):
    m, m2 = len(l1), len(l2)
    newList = []
    l, r = 0, 0
    while l < m and r < m2:
        if l1[l] < l2[r]:
            newList.append(l1[l])
            l += 1
        else:
            newList.append(l2[r])
            r += 1
    return newList + l1[l:] + l2[r:]
This code has O(n) time complexity and can merge lists of any data type, given a key function as the parameter func. It produces a new merged list and does not modify either of the lists passed as arguments.
def merge_sorted_lists(listA, listB, func):
    merged = list()
    iA = 0
    iB = 0
    while True:
        hasA = iA < len(listA)
        hasB = iB < len(listB)
        if not hasA and not hasB:
            break
        valA = None if not hasA else listA[iA]
        valB = None if not hasB else listB[iB]
        a = None if not hasA else func(valA)
        b = None if not hasB else func(valB)
        if (not hasB or a < b) and hasA:
            merged.append(valA)
            iA += 1
        elif hasB:
            merged.append(valB)
            iB += 1
    return merged
from math import floor
def merge_sort(l):
    if len(l) < 2:
        return l
    left = merge_sort(l[:floor(len(l) / 2)])
    right = merge_sort(l[floor(len(l) / 2):])
    return merge(left, right)

def merge(a, b):
    i, j = 0, 0
    a_len, b_len = len(a), len(b)
    output_length = a_len + b_len
    out = list()
    for _ in range(output_length):
        if i < a_len and j < b_len and a[i] < b[j]:
            out.append(a[i])
            i = i + 1
        elif j < b_len:
            out.append(b[j])
            j = j + 1
    # Append anything the main loop left over (at most one of these runs).
    while i < a_len:
        out.append(a[i])
        i += 1
    while j < b_len:
        out.append(b[j])
        j += 1
    return out

if __name__ == '__main__':
    print(merge_sort([7, 8, 9, 4, 5, 6]))