This is Floyd's algorithm for cycle detection. You are asking about the second phase of the algorithm -- once you've found a node that's part of a cycle, how does one find the start of the cycle?
In the first part of Floyd's algorithm, the hare moves two steps for every step of the tortoise. If the tortoise and hare ever meet, there is a cycle, and the meeting point is part of the cycle, but not necessarily the first node in the cycle.
When the tortoise and hare meet, we have found the smallest i (the number of steps taken by the tortoise) such that Xi = X2i. Let mu represent the number of steps to get from X0 to the start of the cycle, and let lambda represent the length of the cycle. Then i = mu + alambda, and 2i = mu + blambda, where a and b are integers denoting how many times the tortoise and hare went around the cycle. Subtracting
the first equation from the second gives i = (b-a)*lambda, so i is an integer multiple
of lambda. Therefore, Xi + mu = Xmu. Xi represents the meeting point of the tortoise and hare. If you move the tortoise back to the starting node X0, and let the tortoise and hare continue at the same speed, after mu additional steps the tortoise will have reached Xmu, and the hare will have reached Xi + mu = Xmu, so the second meeting point denotes the start of the cycle.
Let me try to clarify the cycle detection algorithm that is provided at Wikipedia - Tortoise_and_hare in my own words.
How it works
Let's have a tortoise and a hare (name of the pointers) pointing to the beginning of the list with a cycle, as in the diagram above.
Let's hypothesize that if we move the tortoise 1 step at a time, and the hare 2 steps at a time, they will eventually meet at a point. Let's show that first of all this hypothesis is true.
The figure illustrates a list with a cycle. The cycle has a length of n and we are initially m steps away from the cycle. Also, let's say that the meeting point is k steps away from the cycle beginning and tortoise and hare meets when tortoise has taken i total steps. (Hare would have taken 2i total steps by then.).
The following 2 conditions must hold:
1) i = m + p * n + k
2) 2i = m + q * n + k
The first one says that the tortoise moves i steps and in these i steps, it first gets to the cycle. Then it goes through the cycle p times for some positive number p. Finally, it goes over k more nodes until it meets hare.
A similar thing is true for hare. It moves 2i steps and in these 2i steps it first gets to the cycle. Then it goes through the cycle q times for some positive number q. Finally, it goes over k more nodes until it meets the tortoise.
As the hare travels with double the speed of the tortoise, and time is constant for both when they reach the meeting point.
So by using simple speed, time and distance relation,
2 ( m + p * n + k ) = m + q * n + k
=> 2m + 2pn + 2k = m + nq + k
=> m + k = ( q - 2p ) n
Among m, n, k, p, q, the first two are properties of the given list. If we can show that there is at least one set of values for k, q, p that makes this equation true we show that the hypothesis is correct.
One such solution set is as follows:
p = 0
q = m
k = m n - m
We can verify that these values work as follows:
m + k = ( q - 2p ) n
=> m + mn - m = ( m - 2*0) n
=> mn = mn
For this set, i is
i = m + p n + k
=> m + 0 * n + mn - m = mn
Of course, you should see that this is not necessarily the smallest i possible. In other words, tortoise and hare might have already met before many times. However, since we show that they meet at some point at least once we can say that the hypothesis is correct. So they would have to meet if we move one of them by 1 step, and the other one by 2 steps at a time.
Now we can go to the second part of the algorithm which is how to find the beginning of the cycle.
Cycle Beginning
Once tortoise and hare meet, let's put tortoise back to the beginning of the list and keep hare where they met (which is k steps away from the cycle beginning).
The hypothesis is that if we let them move at the same speed (1 step for both), the first time they ever meet again will be the cycle beginning.
Let's prove this hypothesis.
Let's first assume some oracle tells us what m is.
Then, if we let them move m + k steps, the tortoise would have to arrive at the point they met originally (k steps away from the cycle beginning - see in the figure).
Previously we showed that m + k = (q - 2p) n.
Since m + k steps is a multiple of cycle length n, hare, in the meantime, would go through the cycle (q-2p) times and would come back to the same point (k steps away from the cycle beginning).
Now, instead of letting them move m + k steps, if we let them move only m steps, the tortoise would arrive at the cycle beginning. Hare would be k steps short of completing (q-2p) rotations. Since it started k steps in front of the cycle beginning, the hare would have to arrive at the cycle beginning.
As a result, this explains that they would have to meet at the cycle beginning after some number of steps for the very first time (very first time because the tortoise just arrived at the cycle after m steps and it could never see hare which was already in the cycle).
Now we know that the number of steps we need to move them until they meet turns out to be the distance from the beginning of the list to the cycle beginning, m. Of course, the algorithm does not need to know what m is. It will just move both tortoise and hare one step at a time until they meet. The meeting point has to be the cycle start and the number of steps must be the distance (m) to the cycle beginning. Assuming we know the length of the list, we can also, compute the length of the cycle of subtracting m from the list length.
I know there is already an accepted answer for this problem but I'll still try to answer in a fluid manner.
Assume :
The length of the Path is 'X+B' where 'B' is the length of the looped path and X of the non looped path.
Speed of tortoise : v
Speed of hare : 2*v
Point where both meet is at a distance 'x + b - k' from the starting point.
Now, let the hare and the tortoise meet after time 't' from beginning.
Observations:
If, Distance traveled by the tortoise = v*t = x + (b-k) (say)
Then, Distance traveled by the hare = 2*v*t = x + (b - k) + b (since the hare has traversed the looped part once already)
Now, there meeting times are same.
=> x + 2*b - k = 2* (x + b - k)
=> x = k
This of course means that the length of the path that is not looped is same as the distance of the starting point of the loop from the point where both meet.
Okay so lets assume the hare and the tortoise meet at a point which is k steps away from the starting of the cycle, the number of steps before the cycle starts is mu and the length of the cycle is L.
So now at the meeting point ->
Distance covered by tortoise = mu + a*L + k - Equation 1
(Steps taken to reach the beginning of the cycle + steps taken to cover 'a' iterations of the cycle + k steps from the start of the cycle)
(where a is some positive constant)
Distance covered by the hare = mu + b*L + k - Equation 2
(Steps taken to reach the beginning of the cycle + steps taken to cover 'b' iterations of the cycle + k steps from the start of the cycle)
(where b is some positive constant and b>=a)
So the extra distance covered by the hare is = Equation 2 - Equation 1 = (b-a)*L
Please note that this distance is also equal to the distance of the tortoise from the starting point since the hare moves 2 times faster than the tortoise. This can be equated to 'mu+k' which is also the distance of the meeting point from the beginning if we do not include multiple traversals of the cycle.
Thus,
mu + k = (b-a)*L
So mu steps from this point would lead back to the beginning of the cycle (since k steps from the start of the cycle have already been taken to reach the meeting point). This could happen in the same cycle or any of the subsequent cycles.
Thus now if we move the tortoise to beginning of the linked list, it will take mu steps to reach the starting point of the cycle and the hare would take mu steps to also reach the beginning of the cycle and thus they would both meet at the starting point of the cycle.
P.S. Honestly, I had the same question as the original poster in my mind and I read the first answer, they did clear out a few things but I could not get to the final result clearly and so I tried to do it my own way and found it easier to understand.
It is actually easy to prove that they both will meet at the starting point, if you consider the maths behind the meeting point.
Firstly let m denote the starting point of cycle in the linked list , and n denote the length of the cycle . Then for the hare and tortoise to meet , we have :
( 2*t - m )%n = (t - m) %n, where t = time (at t = 0 , both are at the start)
Stating this more mathematically :
(2*t - m - (t - m) ) = 0 modulo n , which implies , t = 0 modulo n
so they will meet at time t which should be a multiple of length of cycle . This means that they meet at a location, which is
(t-m) modulo n = (0-m) modulo n = (-m) modulo n .
So now coming back to the question , if you move one pointer from the start of the linked list , and another from the intersection point , after m steps we will have the hare (which is moving inside the cycle) come to a point which is ((-m) + m) modulo n = 0 modulo n which is nothing but the starting point of the cycle.So we can see that after m steps it comes to the start of the cycle and the tortoise will meet it there as it will traverse m steps from the start of the linked list.
As a side note ,we can also calculate the time of their intersection in this way : The condition t = 0 modulo n tells us that they will meet at a time which is a multiple of cycle length , and also t should be greater than m as they would meet in the cycle . So time taken will be equal to the first multiple of n which is greater than m .
I don't think so its true that when they meet that's the starting point. But yes if the other pointer(F) was at the meeting point before , than that pointer will be at the end of the loop instead of the start of the loop and the pointer(S) which started from the start of the list it will end up at the start of the loop. for eg:
If the two pointer meet, it proves that there is a loop. Once they have met, one of the nodes will point to the head and then have both proceed one node at a time. They will meet at the start of the loop.
Rationale:
When two people walk down a circular track, one of them at twice the speed of the other, where do they meet? Exactly where they started.
Now, suppose the fast runner has a head start of k steps in an n step lap. where will they meet? Exactly at n-k steps. When the slow runner has covered (n-k) steps, the fast runner would have covered k+2(n-k) steps.
(ie, __ABC0 steps ie 2n-k steps). i.e.(n-k) steps (The path is circular and we are not concerned about the number of rounds after which they meet; We are just interested in the position where they meet).
Now how did the fast runner get the head start of k steps in the first place? Because it took the slow runner that many steps to reach the start of the loop. So the start of the loop is k steps from the head node.
Note: The node where both pointer met is k steps away from start of loop (inside the loop) and the head node also is k steps away from start of loop. So when we have pointer advancing at equal pace of 1 step from bot these nodes, they will meet at the start of the loop.
I believe it is straightforward. Please let me know if any part is ambiguous.
At the time of the first collision, tortoise moved m+k steps as show above. Hare moves twice as fast as tortoise, meaning hare moved 2(m+k) steps. From these simple facts we can derive the following graph.
At this point, we move tortoise back to the start and declare that both hare and tortoise must move one step at a time. By definition, after m steps, tortoise will be at the start of the cycle. Where will hare be?
Hare will also be at the start of the cycle. This is clear from the second graph: When tortoise was moved back to the start, hare was k steps into its last cycle. After m steps, hare will have completed another cycle and collided with tortoise.
It's very very simple. You can think in terms of relative speed. If the rabbit moves two nodes and tortoise moves one node, relative to tortoise rabbit is moving one node(assume tortoise at rest). So, if we move one node in the circular linked list we are sure going to meet at that point again.
After finding the connected point inside the circular linked list, now the problem is reduced to finding the intersection point of two linked list problem.
With all the above analysis, if you are a learn-by-example person, I tried to write up an short analysis and example that help explains the math everyone else attempted to explain. Here we go!
Analysis:
If we have two pointers, one faster than the other, and move them along together, they will eventually meet again to indicate a cycle or null to indicate no cycle.
To find the starting point of the cycle, let ...
m be the distance from head to the beginning of the cycle;
d be the number of nodes in the cycle;
p1 be the speed of the slower pointer;
p2 be the speed of the faster pointer, eg. 2 means steps through two nodes at a time.
From the above sample data, we can easily discover that whenever the faster and the slower pointers meet, they are m steps away from the start of the cycle. To solve this, put the faster pointer back at the head and set its speed to the speed of the slower pointer. When they meet again, the node is the start of the cycle.
Distance travelled by slowPointer before meeting = x + y
Distance travelled by fastPointer before meeting = (x + y + z) + y
= x + 2y + z
Since fastPointer travels with double the speed of slowPointer, and time is constant for both when the reach the meeting point.
So by using simple speed, time and distance relation
2(x+y)= x+2y+z
=> x+2y+z = 2x+2y
=> x=z
Hence by moving slowPointer to start of linked list, and making both slowPointer and fastPointer to move one node at a time, they both have same distance to cover .
They will reach at the point where the loop starts in the linked list.
Reduce the problem to a loop problem, then go back to the initial problem
I find the following explanation more intuitive.
Take two pointers (1 = tortoise and 2 = hare) that start from the head (O), 1 has a step length of 1, 2 has a step length of 2. Think about the moment when 1 reaches the start node of that cycle (A).
We want to answer to the following question "Where is 2 when 1 is in A?".
So, OA = a is a natural number (a >= 0). But it can be written in the following way: a = k*n + b, where a, k, n, b are natural numbers:
n = the cycle length
k >= 0 = constant
0 <= b <= n-1
It means that b = a % n.
E.g.: if a = 20 and n = 8 => k = 2 and b = 4 because 20 = 2*8 + 4.
The distance covered by 1 is d = OA = a = k*n + b. But in the same time, 2 covers D = 2*d = d + d = OA + d = OA + k*n + b. This means that when 2 is in A it has to cover k*n + b. As you can see, k is the number of laps, but after those laps, 2 will be D = 2*d = d + d = OA + d = OA + k*n + b0 far from A. So, we found where 2 is when 1 is in A. Let's call that point B, where AB = b.
Now, we reduce the problem to a circle. The question is "Where is the meeting point?". Where is that C?
In every step, 2 reduces the distance from 1 with 1 (let's say meter) because 1 is getting further from 2 with 1, but at the same time 2 goes closer to 1 by 2.
So, the intersection will be when the distance between 1 and 2 will be zero. This means that 2 reduces the n - b distance. In order to achieve this, 1 will make n - b steps, while 2 will make 2*(n - b) steps.
So, the intersection point will be n - b far from A (clockwise), because this is the distance covered by 1 until it meets 2. => the distance between C and A is CA = b, because AC = AB + BC = n - b and CA = n - AC. Don't think that AC = CA, because the AC distance is not a trivial mathematical distance, it is the number of steps between A and C (where A is the start point and C is the end point).
Now, let's go back to the initial schema.
We know that a = k*n + b and CA = b.
We can take 2 new pointers 1' and 1'', where 1' starts from the head (O) and 1'' starts from the intersection point (C).
While 1' goes from O to A, 1'' goes from C to A and continues to finish k laps. So, the intersection point is A.
N[0] is the node of start of the loop,
m is the number of steps from beginning to N[0].
we have 2 pointers A and B, A runs at 1x speed, B at 2x speed, both start at the beginning.
when A reaches N[0], B should be already in N[m].
(Note: A uses m steps to reach N[0], and B should be m steps further)
Then, A runs k more steps to collide to B,
i.e. A is at N[k], B is at N[m+2k]
(Note: B should runs for 2k steps starting from N[m])
A collide B at N[k] and N[m+2k] respectively, it means
k=m+2k, thus k = -m
Thus, to cycle back to the N[0] from N[k], we need m more steps.
Simply saying, we just need to run m more steps after we found the collision node. We can have a pointer to run from beginning and a pointer running from collision node, they will meet at N[0] after m steps.
Therefore, the pseudo code are as follow:
1) A increase 1 step per loop
2) B increase 2 steps per loop
3) if A & B are the same node, cycle found, then go to 5
4) repeat from 1
5) A reset to head
6) A increase 1 step per loop
7) B increase 1 step per loop
8) if A & B are the same node, start of the cycle found
9) repeat from 6
Old Monk's simple and under-upvoted answer explains finding the cycle when the fast runner completes only single full cycle. In this answer I explain the case when fast runner has run the loop multiple times before the slow runner enters the loop.
Using the same image:
Let's say the fast runner has run the loop m times before slow and fast meet. This means that:
Distance run by slow: x + y
Distance run by fast: x + m(y + z) + y i.e. extra y where they meet
Since fast runs with twice the speed of slow, and that they have been running for same time, it implies that if we double the distance ran by slow, we get the distance ran by fast. Thus,
2(x + y) = x + m(y + z) + y
Solving for x gives,
x = (m - 1)(y + z) + z
In real scenario it would mean, x = (m - 1) complete loop runs + an extra distance z.
Hence, if we put one pointer at the start of the list and leave the other one at the meeting point, then moving them by the same speed will result in the in loop pointer completing m - 1 runs of the loop and then meeting the other pointer right at the loop beginning.
-there are k steps before the loop. We don't know what k is and don't need to find out. We can work abstractly with just k.
--After k steps
----- T is at cycle beginning
----- H is k steps into cycle (he went 2k total and thus k into loop)
** they are now loopsize - k apart
(note that k == K == mod(loopsize, k) --e.g. if a node is 2 steps into a 5 node cycle it is also 7, 12 or 392 steps in, so how big the cycle is w.r.t k does not factor in.
Since they catch up to each other at the rate of 1 step per unit of time because one is moving twice as fast as the other, they will meet at loopsize - k.
This means it will take k nodes to reach the beginning of the cycle and thus the distance from head to cyclestart and collision to cyclestart are the same.
So now after first collision move T back to head. T and H will meet at cyclestart if you move at rate of 1 each. (in k steps for both)
This means that the algorithm is:
from head move T = t.next and H.next.next until they collide ( T == H) (there is a cycle)
//take care of case when k=0 or T and H met at the head of the loop by
calculating the length of the loop
--count the length of the cycle by moving T or H around it with a counter
--move a pointer T2 to head of list
--move pointer length of cycle steps
--move another pointer H2 to head
--move T2 and H2 in tandem until they meet at start of cycle
A simple explanation using the idea of relative velocity taught in high school - Physics 101 / Kinematics lectures.
Let's assume distance from start of linked list to start of the circle is x hops. Let's call the start of circle as point X (in caps - see figure above). Also let's assume total size of circle is N hops.
Speed of hare = 2 * Speed of tortoise. So that is 1 hops/sec and 2 hops/sec respectively
When tortoise reaches the start of the circle X, the hare must further be x hops away at point Y in the figure. (Because hare has travelled twice the distance as the tortoise).
Thus, length of the remaining arc clockwise from X to Y would be N-x. This also happens to be the relative distance to be covered between the hare and the tortoise for them to be able to meet. Let's say this relative distance will be covered in time t_m i.e. time to meet. Relative speed is (2 hops/sec - 1 hops/sec) i.e. 1 hops/sec. Thus using, relative distance = relative speed X time, we get, t = N-x sec. So it will take N-x to reach the meeting point for both the tortoise and the hare.
Now in N-x sec time and at 1 hops/sec speed, the tortoise who was earlier at point X will cover N-x hops to reach the meeting point M. So, that means the meeting point M is at N-x hops counterclockwise from X = (which further implies) => that there is x distance remaining from point M to X clockwise.
But x is also the distance to reach point X from the start of the linked list.
Now, we don't care what number of hops x corresponds to. If we put one tortoise at the start of the LinkedList and one tortoise at the meeting point M and let them hop/walk, then they will meet at point X, which is the point (or node) that we need.
Working this with a diagram would help. I am trying to explain the problem without equations.
If we let the hare and tortoise run in a circle and hare runs two times tortoise then, at end of one lap for hare tortoise would be at half. At end of two laps from hare tortoise would have done 1 lap and they both meet. This applies to all speed like if hare runs three times, hare 1 lap is equal to 1/3 of tortoise so at end of 3 laps for hare tortoise would have covered 1 lap and they meet.
Now if we start them m steps before loop, then it means the faster hare is starting ahead in loop. So if tortoise reach the start of loop the hare is m steps ahead loop and when they meet it would be m steps before the loop start.
If the pointers met at a point P as shown in the figure, the distance Z+Y is point P and X+Y is also point P which means Z=X. Which is why keeping moving one pointer from P and moving another from start(S) till they meet, which means moving an equal distance(Z or X) to the same point M(distance Z from P and X from S) will be the starting of the loop. Simple!
Call distance the number of links a pointer follows, and time the number of iterations the algorithm takes of moving the slow pointer one link and the fast pointer two links. There are N nodes before a cycle of length C, labeled with cycle offset k=0 through C-1.
To reach the start of the cycle, slow takes N time and distance. This means fast takes N distance in the cycle (N to get there, N to spin). So at time N, slow is at cycle offset k=0, and fast is at cycle offset k=N mod C.
If N mod C is zero, slow and fast now match and the cycle is found at time N and cycle position k=0.
If N mod C is not zero, then fast now has to catch up with slow, which at time N is C-(N mod C) distance behind in the cycle.
Since fast moves 2 for every 1 of slow, reducing the distance by 1 on every iteration, this takes as much additional time as the distance between fast and slow at time N, which is C-(N mod C). Since slow is moving from offset 0, this is also the offset where they meet.
So, if N mod C is zero, the phase 1 stops after N iterations at the beginning of the cycle. Otherwise, phase 1 stops after N+C-(N mod C) iterations at offset C-(N mod C) into the cycle.
// C++ pseudocode, end() is one after last element.
int t = 0;
T *fast = begin();
T *slow = begin();
if (fast == end()) return [N=0,C=0];
for (;;) {
t += 1;
fast = next(fast);
if (fast == end()) return [N=(2*t-1),C=0];
fast = next(fast);
if (fast == end()) return [N=(2*t),C=0];
slow = next(slow);
if (*fast == *slow) break;
}
Ok, so phase 2: slow takes N more steps to get to the cycle, at which point fast (now moving 1 per time step) is at (C-(N mod C)+N) mod C = 0. So they meet at the beginning of the cycle after phase 2.
int N = 0;
slow = begin();
for (;;) {
if (*fast == *slow) break;
fast = next(fast);
slow = next(slow);
N += 1;
}
For completeness, phase 3 computes the cycle length by moving once more through the cycle:
int C = 0;
for (;;) {
fast = next(fast);
C += 1;
if (fast == slow) break;
}
There are already plenty of answers to this, but I once came up with a diagram for this which is more visually intuitive to me. Perhaps it can help other people.
The main aha-moments for me were:
Split T (tortoise) into T1 (pre-loop) and T2 (in-loop).
Subtract T from H, where they visually overlap. What remains (H - T = H') is equal to T.
I see that most of the answers giving mathematical explanation for this "how does moving tortoise to beginning of linked list while keeping the hare at meeting place, followed by moving both one step at a time make them meet at starting point of cycle?"
The following method also does the same like floyd cycle detection behind the scenes but the rationale is simple but at a cost of O(n) memory.
I would like to add an easier approach/rationale to find the beginning of the cycle. Since this method was not mentioned anywhere, I tested this here: https://leetcode.com/problems/linked-list-cycle-ii/ and It passed all the testcases.
Let's consider that we've been given head reference of the LinkedList.
public ListNode detectCycle(ListNode head) {
// Consider a fast pointer which hops two nodes at once.
// Consider a slow pointer which hops one node at once.
// If the linked list contains a cycle,
// these two pointers would meet at some point when they are looping around the list.
// Caution: This point of intersection need not be the beginning of the cycle.
ListNode fast = null;
ListNode slow = null;
if (head != null) {
if (head.next != null) {
fast = head.next.next;
slow = head;
} else {
return null;
}
}
while (fast != null && fast.next != null) {
// Why do we need collection here? Explained below
Set<ListNode> collection = new HashSet<>();
if (fast == slow) {
// Once the cycle is detected,
we are sure that there is beginning to the cycle.
// In order to find this beginning,
// 1. move slow pointer to head and keep fast pointer at
the meeting point.
// 2. now while moving slow and fast pointers through a
single hop, store the slow reference in a collection.
// 3. Every time you hop the fast pointer, check the fast
pointer reference exits in that collection.
// Rationale: After we moved slow pointer to the head,
we know that slow pointer is coming behind the fast
pointer, since collection is storing all nodes from the
start using slow pointer, there is only one case we get
that fast pointer exists in the collection when slow
pointer started storing the nodes which are part of the
cycle. Because slow pointer can never go ahead of fast
pointer since fast pointer already has an head-start, at
the same time, the first occurence will always be of the
starting point of the cycle because slow pointer can't
go ahead of fast pointer to store other nodes in the
cycle. So, the moment we first find fast pointer in that
collection means, that is the starting point of the
cycle.
slow = head;
collection.add(slow);
while (!collection.contains(fast)) {
slow = slow.next;
collection.add(slow);
fast = fast.next;
}
return fast;
}
fast = fast.next.next;
slow = slow.next;
}
return null;
}