Nodejs scraping website after javascript has loaded the values

This is probably a newbie question about Node.js and jsdom.

I am trying to scrape a website using Node.js. I am using jsdom and jQuery to fetch the HTML and parse out what I need, but the values I get are not the ones shown on the website. The values are changed dynamically by JavaScript, and those are the values I want. The whole reason I chose Node.js with jsdom for scraping was that the page's JavaScript would be executed and I could read the values afterwards.

Is there some way to tell jsdom to wait until the JavaScript has executed, or have I got this all wrong? I have googled a lot on this matter.

Problem courtesy of: zubinmehta

Solution

You would be better off using something like CasperJS (http://casperjs.org/). It is a testing utility based on PhantomJS, so it behaves essentially like opening the page in a WebKit browser, just without the GUI. I don't think it works with Node, but it should be easy enough to run a Casper script and pipe the output back to Node. You could write something like:

var casper = require('casper').create({
    loadImages: true,
    loadPlugins: true,
    verbose: true,
    //logLevel: 'info',
    clientScripts: [
        'jquery-1.7.1.min.js',
    ],
    viewportSize: {
        width: 1366,
        height: 768,
    },
    pageSettings: {
        javascriptEnabled: true,
        userAgent: 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.9 Safari/536.5',
    },
});

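// Placeholder: replace with the address of the page to scrape.
var url = 'http://example.com/';

casper.start(url);

casper.thenEvaluate(function () {
    // JavaScript code to run in the scope of the page
});

// Nothing executes until run() is called.
casper.run();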
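
For the "pipe the output back to Node" part, here is a minimal sketch using Node's built-in child_process module. It assumes the external script prints exactly one JSON document to stdout and nothing else (so Casper's verbose logging would need to be off). The helper name runAndParse is made up for illustration.

```javascript
const { execFileSync } = require('child_process');

// Hypothetical helper: run an external command, capture everything it
// writes to stdout, and parse that output as JSON.
// Throws if the command exits with a non-zero status or prints invalid JSON.
function runAndParse(command, args) {
  const stdout = execFileSync(command, args, { encoding: 'utf8' });
  return JSON.parse(stdout);
}
```

With CasperJS installed and on the PATH, a call such as runAndParse('casperjs', ['scrape.js']) would return whatever the (hypothetical) scrape.js script printed. The synchronous variant keeps the sketch short; for long-running scrapes the asynchronous execFile or spawn is probably the better fit.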

Solution courtesy of: tapan

Discussion

I don't know if you're up for alternatives, but when I need this kind of sensitive scraping, I just use Firefox with iMacros. It runs all browser JS just fine, because it is a browser.

http://www.iopus.com/imacros/firefox/

Discussion courtesy of: x10

First off, how are you using jsdom? Apparently, jsdom.env does not execute scripts in the DOM, only the scripts that you add in the call to jsdom.env. If you want to execute scripts, I think you should use jsdom.jsdom.

Second, you need to specify an onload handler. This should execute after the document is ready, and hopefully any scripts will have changed the DOM to your liking by then.

Something like this:

var jsdom = require('jsdom').jsdom
  , document = jsdom(html)
  , window = document.createWindow();

document.onload = function() {
  // Do your stuff
}
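
The snippet above targets an old jsdom release. As a rough sketch of the same idea, assuming a recent jsdom (v10 or later, installed with npm install jsdom), the JSDOM constructor can be told to run page scripts and the window's load event can be awaited. The function name loadAndRun is made up for illustration.

```javascript
// Sketch only: assumes jsdom v10+ is installed (npm install jsdom).
function loadAndRun(html) {
  // Required inside the function so this file still loads if jsdom is missing.
  const { JSDOM } = require('jsdom');

  const dom = new JSDOM(html, {
    runScripts: 'dangerously', // execute the page's scripts (trusted content only)
    resources: 'usable',       // also fetch external <script src="..."> files
  });

  // Resolve with the JSDOM instance once the window has fired "load".
  return new Promise(function (resolve) {
    if (dom.window.document.readyState === 'complete') {
      resolve(dom);
      return;
    }
    dom.window.addEventListener('load', function () {
      resolve(dom);
    });
  });
}
```

Note that the load event does not wait for timers or Ajax callbacks started by the page, so values filled in later may still be missing at that point. For a live site, JSDOM.fromURL(url, options) accepts the same options and returns a promise for the JSDOM instance.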

Discussion courtesy of: Linus Gustav Larsson Thiel

This recipe can be found in its original form on Stack Overflow.

Content source: Node.js Recipes
