What does “then” really mean in CasperJS?

I'm using CasperJS to automate a series of clicks, form completions, data parsing, and so on through a website.

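Casper seems to be organized into a list of preset steps in the form of then statements (see the example here: http://casperjs.org/quickstart.html), but it's not clear what triggers the next statement to actually run.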

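For example, does then wait for all pending requests to complete? Does injectJS count as a pending request? What happens if I have a then statement nested, that is, chained to the end of an open statement?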

casper.thenOpen('http://example.com/list', function(){
    casper.page.injectJs('/libs/jquery.js');
    casper.evaluate(function(){
        var id = jQuery("span:contains('"+itemName+"')").closest("tr").find("input:first").val();
        casper.open("http://example.com/show/"+id); //what if 'then' was added here?
    });
});


casper.then(function(){
    //parse the 'show' page
});

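I'm looking for a technical explanation of how the flow works in CasperJS. My specific problem is that my last then statement (above) runs before my casper.open statement, and I don't know why.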


then() basically adds a new navigation step to a stack. A step is a JavaScript function that is executed only after waiting for one of two things:

  1. the previous step, if any, having been executed
  2. a requested URL and its related page having loaded

Let's take a simple navigation scenario:

var casper = require('casper').create();

casper.start();

casper.then(function step1() {
    this.echo('this is step one');
});

casper.then(function step2() {
    this.echo('this is step two');
});

casper.thenOpen('http://google.com/', function step3() {
    this.echo('this is step 3 (google.com is loaded)');
});

You can print out all the created steps within the stack like this:

require('utils').dump(casper.steps.map(function(step) {
    return step.toString();
}));

That gives:

$ casperjs test-steps.js
[
"function step1() { this.echo('this is step one'); }",
"function step2() { this.echo('this is step two'); }",
"function _step() { this.open(location, settings); }",
"function step3() { this.echo('this is step 3 (google.com is loaded)'); }"
]

Notice the _step() function, which CasperJS added automatically to load the URL for us; when the URL is loaded, the next step available in the stack, which is step3(), is called.
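To make the expansion of thenOpen() concrete, here is a small Node.js sketch. It is a hypothetical mini step stack (createMiniCasper, its log array and the stand-in open() are invented for illustration, not CasperJS APIs) that only assumes what the dump above shows: thenOpen(location, then) pushes a _step() that opens the URL, followed by the user's callback. The real thenOpen() also takes settings and waits for an actual page load, which this sketch skips.

```javascript
// Hypothetical mini step stack, NOT CasperJS itself: it only mimics how
// thenOpen() shows up in the dump, i.e. as an automatically added _step()
// that opens the URL, followed by the user's own step.
function createMiniCasper() {
  return {
    steps: [],
    log: [],
    then(step) {
      if (typeof step !== "function") {
        throw new Error("You can only define a step as a function");
      }
      this.steps.push(step);
      return this;
    },
    // Stand-in for casper.open(): it only records the URL.
    open(location) {
      this.log.push("open " + location);
    },
    thenOpen(location, then) {
      // The URL-loading step that CasperJS adds for you.
      this.then(function _step() {
        this.open(location);
      });
      // The user's callback becomes the following step.
      if (typeof then === "function") {
        this.then(then);
      }
      return this;
    },
  };
}

const casper = createMiniCasper();
casper.then(function step1() { this.log.push("this is step one"); });
casper.then(function step2() { this.log.push("this is step two"); });
casper.thenOpen("http://google.com/", function step3() {
  this.log.push("this is step 3 (google.com is loaded)");
});

// Counterpart of dumping casper.steps: list the step function names.
console.log(casper.steps.map((step) => step.name));

// Run the steps one by one, binding `this` to the mini instance.
casper.steps.forEach((step) => step.call(casper));
console.log(casper.log);
```

The printed step names come out in the same order as the dump: step1, step2, _step, step3.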

When you have defined your navigation steps, run() executes them one by one sequentially:

casper.run();

Footnote: the callback/listener stuff is an implementation of the Promise pattern.

then() merely registers a series of steps.

run(), together with its family of runner functions, callbacks, and listeners, is what actually does the work of executing each step.

Whenever a step is completed, CasperJS checks three flags: pendingWait, loadInProgress, and navigationRequested. If any of those flags is true, it does nothing and goes idle until a later check (setInterval style). If none of them is true, the next step is executed.
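The check described above can be sketched as a deterministic Node.js simulation. This is a hypothetical model, not CasperJS code: StepRunner, its fake open() and the scheduled "browser events" are invented for illustration, and a plain tick() loop replaces the real setInterval timer. It only assumes the rule stated here: the next step runs when none of the three flags is raised.

```javascript
// Hypothetical simulation, not CasperJS source: the next step runs only when
// pendingWait, loadInProgress and navigationRequested are all false.
// Explicit tick() calls replace the real setInterval timer.
class StepRunner {
  constructor(steps) {
    this.steps = steps;
    this.current = 0;       // index of the next step to execute
    this.ticks = 0;
    this.pendingWait = false;
    this.loadInProgress = false;
    this.navigationRequested = false;
    this.scheduled = [];    // fake browser events: { at, fn }
    this.log = [];
  }

  // Fake navigation: raise navigationRequested now, pretend the browser
  // starts loading on the next tick and finishes on the one after.
  open(url) {
    this.log.push("open " + url);
    this.navigationRequested = true;
    this.scheduled.push({ at: this.ticks + 1, fn: () => {
      this.navigationRequested = false;
      this.loadInProgress = true;
    } });
    this.scheduled.push({ at: this.ticks + 2, fn: () => {
      this.loadInProgress = false;
    } });
  }

  checkStep() {
    if (this.current >= this.steps.length) return;
    if (this.pendingWait || this.loadInProgress || this.navigationRequested) {
      this.log.push("tick " + this.ticks + ": idle");
      return; // go idle until the next tick
    }
    this.steps[this.current++].call(this);
  }

  tick() {
    this.ticks++;
    // Fire the fake browser events that are due, keep the others.
    const due = this.scheduled.filter((ev) => ev.at <= this.ticks);
    this.scheduled = this.scheduled.filter((ev) => ev.at > this.ticks);
    due.forEach((ev) => ev.fn());
    this.checkStep();
  }

  run(maxTicks = 100) {
    while (this.current < this.steps.length && this.ticks < maxTicks) {
      this.tick();
    }
  }
}

const runner = new StepRunner([
  function step1() { this.log.push("step1"); },
  function step2() { this.open("http://example.com/"); },
  function step3() { this.log.push("step3 (page loaded)"); },
]);
runner.run();
console.log(runner.log);
```

In the resulting log, step3 starts only after the simulated load clears loadInProgress, with one idle tick in between.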

As of CasperJS 1.0.0-RC4, a flaw exists where, under certain timing conditions, the "try to do next step" method is triggered before CasperJS has had time to raise either the loadInProgress or the navigationRequested flag. The solution is to raise one of those flags yourself before leaving any step in which they are expected to be raised (e.g. raise a flag either before or after calling casper.click()), maybe like so:

(Note: This is only illustrative, more like pseudocode than proper CasperJS form...)

var step_one = function() {
    casper.click(/* something */);
    do_whatever_you_want();
    casper.click(/* something else */); // Click something else, why not?
    more_magic_that_you_like();
    here_be_dragons();
    // Raise a flag before exiting this "step"
    profit();
};

To wrap that solution up in a single line of code, I introduced blockStep() in a GitHub pull request, extending click() and clickLabel() as a means to help guarantee the expected behaviour when using then(). Check out the pull request for more info, usage patterns, and minimal test files.
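The timing flaw and the flag-raising workaround from this answer can be sketched in plain Node.js. This is a hypothetical simulation, not CasperJS or blockStep() code: simulate(), the fake click() and the page names "list" and "show" are invented, and it assumes that the browser reports a click's navigation only on a later tick while the checker fires first on each tick.

```javascript
// Hypothetical Node.js simulation of the timing flaw (not CasperJS source).
// A fake click() only reports its navigation on a later tick, so a checker
// that fires first can start the next step while the old page is current.
function simulate(raiseFlagInStep) {
  const state = { navigationRequested: false, page: "list", ticks: 0, log: [] };
  let scheduled = []; // fake browser callbacks: { at: tick, fn }

  // Fake click: navigation is reported one tick later, and the new page
  // has finished loading one tick after that.
  function click() {
    scheduled.push({ at: state.ticks + 1, fn: () => { state.navigationRequested = true; } });
    scheduled.push({ at: state.ticks + 2, fn: () => {
      state.navigationRequested = false;
      state.page = "show";
    } });
  }

  const steps = [
    function stepOne() {
      click();
      if (raiseFlagInStep) {
        // The workaround: raise the flag before leaving the step.
        state.navigationRequested = true;
      }
    },
    function stepTwo() {
      state.log.push("step two sees page: " + state.page);
    },
  ];

  let current = 0;
  while (current < steps.length && state.ticks < 100) {
    state.ticks++;
    // On each tick the checker runs first...
    if (!state.navigationRequested) {
      steps[current++]();
    }
    // ...and only afterwards are the due browser callbacks delivered.
    const due = scheduled.filter((ev) => ev.at <= state.ticks);
    scheduled = scheduled.filter((ev) => ev.at > state.ticks);
    due.forEach((ev) => ev.fn());
  }
  return state.log;
}

console.log(simulate(false)); // the flaw: step two still sees the old page
console.log(simulate(true));  // flag raised early: step two sees the new page
```

In this sketch a hand-raised flag is cleared only by the fake page load; if no navigation followed, the loop would stop only at the 100-tick cap, which hints at why the workaround belongs only in steps where navigation is actually expected.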

According to the CasperJS Documentation:

then()

Signature: then(Function then)

This method is the standard way to add a new navigation step to the stack, by providing a simple function:

casper.start('http://google.fr/');

casper.then(function() {
    this.echo('I\'m in your google.');
});

casper.then(function() {
    this.echo('Now, let me write something');
});

casper.then(function() {
    this.echo('Oh well.');
});

casper.run();

You can add as many steps as you need. Note that the current Casper instance automatically binds the this keyword for you within step functions.

To run all the steps you defined, call the run() method, and voila.

Note: You must start() the casper instance in order to use the then() method.

Warning: Step functions added to then() are processed in two different cases:

  1. when the previous step function has been executed;
  2. when the previous main HTTP request has been executed and the page has loaded.

Note that there's no single definition of "page loaded": is it when the DOMReady event has been triggered? Is it "all requests being finished"? Is it "all application logic being performed"? Or "all elements being rendered"? The answer always depends on the context. That is why you're encouraged to always use the waitFor() family of methods to keep explicit control over what you actually expect.
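The waitFor() idea, polling for an explicit condition instead of trusting "page loaded", can be illustrated with a tick-based Node.js sketch. It assumes, in line with the three flags described in another answer, that a wait is signalled through something like pendingWait; runWaitFor, the Set standing in for the DOM and the tick counts are hypothetical, and the real waitFor()/waitForSelector() poll the live page with timers instead.

```javascript
// Hypothetical tick-based sketch (not CasperJS source) of the waitFor() idea:
// keep pendingWait raised and re-test a condition on every tick, then run the
// success callback, or a timeout handler once the time budget is used up.
// The "DOM" is a Set of selectors that an imaginary page script fills later.
function runWaitFor({ appearsAtTick, timeoutTicks }) {
  const dom = new Set();
  const log = [];
  let pendingWait = true; // the next step must not start while this is true
  let ticks = 0;

  const condition = () => dom.has("#plop");
  const onSuccess = () => log.push("#plop is available at tick " + ticks);
  const onTimeout = () => log.push("timed out after " + ticks + " ticks");

  while (pendingWait) {
    ticks++;
    if (ticks === appearsAtTick) {
      dom.add("#plop"); // imaginary page script renders the element
    }
    if (condition()) {
      pendingWait = false;
      onSuccess();
    } else if (ticks >= timeoutTicks) {
      pendingWait = false;
      onTimeout();
    }
  }
  log.push("next step may run");
  return log;
}

console.log(runWaitFor({ appearsAtTick: 3, timeoutTicks: 10 }));
console.log(runWaitFor({ appearsAtTick: 50, timeoutTicks: 10 }));
```

Either way the following step is held back until the wait is resolved, which is the explicit control the documentation recommends.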

A common trick is to use waitForSelector():

casper.start('http://my.website.com/');

casper.waitForSelector('#plop', function() {
    this.echo('I\'m sure #plop is available in the DOM');
});

casper.run();

Behind the scenes, the source code for Casper.prototype.then is shown below:

/**
 * Schedules the next step in the navigation process.
 *
 * @param  function  step  A function to be called as a step
 * @return Casper
 */
Casper.prototype.then = function then(step) {
    "use strict";
    this.checkStarted();
    if (!utils.isFunction(step)) {
        throw new CasperError("You can only define a step as a function");
    }
    // check if casper is running
    if (this.checker === null) {
        // append step to the end of the queue
        step.level = 0;
        this.steps.push(step);
    } else {
        // insert substep a level deeper
        try {
            step.level = this.steps[this.step - 1].level + 1;
        } catch (e) {
            step.level = 0;
        }
        var insertIndex = this.step;
        while (this.steps[insertIndex] && step.level === this.steps[insertIndex].level) {
            insertIndex++;
        }
        this.steps.splice(insertIndex, 0, step);
    }
    this.emit('step.added', step);
    return this;
};

Explanation:

In other words, then() schedules the next step in the navigation process.

then() takes a single parameter: the function to be called as a step.

First, it calls checkStarted() to verify that the instance has been started; if it has not, it throws the following error:

CasperError: Casper is not started, can't execute `then()`.

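It may also check whether the page object is null and, if so, create a new page object. This check is not visible in the snippet above, so it presumably happens inside checkStarted() in the CasperJS version being described.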

After that, then() checks whether the step parameter is a function.

If it is not a function, it throws the following error:

CasperError: You can only define a step as a function

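Then, the function checks whether Casper is running, that is, whether this.checker is still null.

If Casper is not running, then() gives the step level 0 and appends it to the end of the queue.

Otherwise, if Casper is running, it inserts the step as a substep one level deeper than the step currently executing. It is spliced in right after that step, and after any substeps already queued at the same level, so it runs before the remaining outer steps.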

Finally, the then() function concludes by emitting a step.added event, and returns the Casper object.
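The substep logic is what decides the ordering asked about in the question. Below is a plain Node.js re-creation of the queue handling from the then() source shown above; it is a simplified sketch, not CasperJS itself: StepQueue is a hypothetical name, a running boolean stands in for this.checker !== null, and checkStarted(), the step.added event and page loading are left out. It assumes, as the source's use of this.steps[this.step - 1] implies, that this.step already points past the step currently executing.

```javascript
// Simplified re-creation of the queue logic in Casper.prototype.then above
// (no browser, no events). `running` stands in for `this.checker !== null`.
class StepQueue {
  constructor() {
    this.steps = [];
    this.step = 0;        // like this.step: number of steps already started
    this.running = false;
    this.log = [];
  }

  then(step) {
    if (typeof step !== "function") {
      throw new Error("You can only define a step as a function");
    }
    if (!this.running) {
      // Not running yet: append at level 0.
      step.level = 0;
      this.steps.push(step);
    } else {
      // Running: one level deeper than the step currently executing.
      const previous = this.steps[this.step - 1];
      step.level = previous ? previous.level + 1 : 0;
      let insertIndex = this.step;
      // Skip substeps already queued at this level so they keep their order.
      while (this.steps[insertIndex] && step.level === this.steps[insertIndex].level) {
        insertIndex++;
      }
      this.steps.splice(insertIndex, 0, step);
    }
    return this;
  }

  run() {
    this.running = true;
    while (this.step < this.steps.length) {
      this.steps[this.step++].call(this); // advance first, then execute
    }
    this.running = false;
  }
}

const queue = new StepQueue();
queue.then(function openList() {
  this.log.push("A: open list page");
  // Added while running, so these become level-1 substeps of openList.
  this.then(function nestedOne() { this.log.push("A.1: nested step"); });
  this.then(function nestedTwo() { this.log.push("A.2: second nested step"); });
});
queue.then(function parseShow() { this.log.push("B: outer next step"); });
queue.run();
console.log(queue.log);
```

The log comes out as A, A.1, A.2, B: both nested then() calls are spliced in right after the running step, ahead of the outer step that was registered before run().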