
Scraping with Scrapy: Part 3

This is the third part of the series Scraping with Scrapy.

In this post I will cover how to use Selenium with Scrapy, and how to change the templates that get loaded when a new Scrapy project is created. You may want to read part 1 and part 2 of this series first.

Let’s start with how to use Selenium with Scrapy.

First, download the Selenium standalone server jar, then cd to the directory where you saved it and start it with:


java -jar selenium-server-jarfilename.jar
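Once the server is running, your Python code talks to it through Selenium’s Remote WebDriver. Here is a minimal sketch, assuming the Selenium 4 Python bindings (pip install selenium), Firefox available on the machine running the server, and the server’s default port 4444 with the older /wd/hub path. server_is_up and open_remote_firefox are hypothetical helper names, not part of Selenium.

```python
import urllib.error
import urllib.request

# Assumption: default port 4444 and the legacy /wd/hub path of the standalone server.
SELENIUM_URL = "http://localhost:4444/wd/hub"


def server_is_up(status_url: str, timeout: float = 2.0) -> bool:
    """Return True if something answers HTTP 200 at the server's status endpoint."""
    try:
        with urllib.request.urlopen(status_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status
        return False


def open_remote_firefox(url: str = SELENIUM_URL):
    """Connect to the running server (hypothetical helper, Selenium 4 API)."""
    from selenium import webdriver  # imported lazily so server_is_up works without Selenium

    return webdriver.Remote(command_executor=url, options=webdriver.FirefoxOptions())


if __name__ == "__main__":
    if server_is_up(SELENIUM_URL + "/status"):
        driver = open_remote_firefox()
        try:
            driver.get("http://example.com")
            print(driver.title)
        finally:
            driver.quit()  # always close the remote browser session
    else:
        print("No Selenium server answering at", SELENIUM_URL)
```

Depending on your Selenium server version, it may also accept http://localhost:4444 without /wd/hub, so adjust SELENIUM_URL if the status check keeps failing.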

What if you can’t fetch the data directly from the page source, because you first need to load the page, click somewhere, scroll down, etc.? Selenium comes to the rescue.
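As an illustration of the “scroll down” case, here is a small sketch of a common pattern for infinite-scroll pages: keep jumping to the bottom until the page height stops growing. It is not taken from my scraper; scroll_until_loaded is a hypothetical helper, and it only relies on the driver having Selenium’s execute_script method.

```python
import time


def scroll_until_loaded(driver, pause: float = 1.0, max_rounds: int = 20) -> int:
    """Scroll to the bottom repeatedly until the page height stops growing.

    `driver` is a Selenium WebDriver (anything with execute_script works).
    Returns how many scroll rounds loaded new content.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    rounds = 0
    for _ in range(max_rounds):
        # Jump to the bottom so the page requests its next batch of content
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # crude fixed wait; WebDriverWait is more robust
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new appeared, assume we reached the end
        last_height = new_height
        rounds += 1
    return rounds
```

Call it right after driver.get(url). Clicking works in the same spirit, for example driver.find_element(By.XPATH, "...").click() with By imported from selenium.webdriver.common.by; the XPath there is whatever element your page needs.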

Here is the complete code of the scraper.

You need to open the URL using Selenium so that you can get at what Scrapy can’t see on its own, such as content that JavaScript adds after the page loads.

In practice this means adding a few lines to your spider so that each page is loaded in Selenium. Have a look at the spider code here.

To see all the functions an object (such as the Selenium driver) provides, you can use:


dir(object_name)
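The raw dir() output is cluttered with __dunder__ names, so here is a small hypothetical helper that keeps only the public callables. It uses inspect.getattr_static so that properties (a WebDriver’s page_source, for example) are not evaluated while you look around.

```python
import inspect


def public_methods(obj) -> list:
    """dir(obj) minus underscore names and non-callable attributes."""
    names = []
    for name in dir(obj):
        if name.startswith("_"):
            continue
        try:
            # Look the attribute up without triggering properties/descriptors
            attr = inspect.getattr_static(obj, name)
        except AttributeError:
            continue
        if callable(attr) or isinstance(attr, (staticmethod, classmethod)):
            names.append(name)
    return names
```

For example, public_methods(driver) lists methods such as get and execute_script. help(driver) gives the longer, documented version.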

 

I will post some tips and tricks related to XPaths in future posts.

Now let me show you how to change the templates that get loaded when you create a new Scrapy project.

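First, let’s install open-as-administrator, a Nautilus extension that adds an “Open as Administrator” entry to the right-click menu, so you can easily edit files that require sudo permissions on Linux. The last command quits Nautilus so the extension is picked up the next time it starts.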


sudo add-apt-repository ppa:noobslab/apps
sudo apt-get update
sudo apt-get install open-as-administrator
nautilus -q

Then find where Scrapy is installed. It will be somewhere like this:

 


/usr/local/lib/python2.7/dist-packages/Scrapy-0.24.2-py2.7.egg/scrapy
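The exact path depends on your Python version, Scrapy version, and how Scrapy was installed (the one above is a Python 2.7 egg), so rather than guessing you can ask Python. A minimal sketch using importlib.util.find_spec; package_dir and project_template_dir are hypothetical helper names.

```python
import importlib.util
import os
from typing import Optional


def package_dir(name: str) -> Optional[str]:
    """Directory a package is installed in, or None if it isn't installed."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    return os.path.dirname(spec.origin)


def project_template_dir(scrapy_dir: str) -> str:
    """Path of the per-project module templates inside Scrapy's package."""
    return os.path.join(scrapy_dir, "templates", "project", "module")


if __name__ == "__main__":
    scrapy_dir = package_dir("scrapy")
    if scrapy_dir is None:
        print("Scrapy is not installed for this interpreter.")
    else:
        module_dir = project_template_dir(scrapy_dir)
        print(module_dir)
        if os.path.isdir(module_dir):
            for name in sorted(os.listdir(module_dir)):
                print("   ", name)  # the .tmpl files, e.g. items.py.tmpl
```

Run it with the same interpreter Scrapy uses (for example python3 if you installed Scrapy with pip3), otherwise it will look in the wrong place.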

Inside Scrapy’s package directory you will find a templates folder. Open it, go to project, then into module. This is where all the template files live.

The template for items.py is here too, named items.py.tmpl. Right-click it, choose Open as Administrator, and edit it so it contains what you want in every new project.
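When editing a .tmpl file, keep the placeholders intact. As far as I can tell from Scrapy’s source (scrapy/utils/template.py), startproject fills the templates in with Python’s string.Template, passing project_name and a CamelCase ProjectName. The sketch below mimics that so you can preview an edited template; ITEMS_TMPL, its fields, camelcase, and render_template are illustrative, and camelcase only approximates Scrapy’s own string_camelcase.

```python
import re
import string

# A trimmed-down items.py.tmpl; ${ProjectName} is filled in by `scrapy startproject`.
ITEMS_TMPL = """import scrapy


class ${ProjectName}Item(scrapy.Item):
    # example fields to have in every new project (hypothetical)
    title = scrapy.Field()
    url = scrapy.Field()
"""


def camelcase(name: str) -> str:
    """Approximate Scrapy's string_camelcase: "my_crawler" -> "MyCrawler"."""
    return re.sub(r"[^a-zA-Z\d]", "", name.title())


def render_template(raw: str, project_name: str) -> str:
    """Fill a .tmpl via string.Template, the way startproject appears to."""
    return string.Template(raw).substitute(
        project_name=project_name, ProjectName=camelcase(project_name)
    )


print(render_template(ITEMS_TMPL, "my_crawler"))
```

Because of string.Template, a literal dollar sign in your template must be written as $$, and a misspelled placeholder will make substitute() raise a KeyError when the project is created.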

Hurray! You’re done.

I’ll soon add a cheatsheet post covering XPaths, Scrapy commands, Selenium commands, and some other tips and tricks.

Let me know if anything gives you trouble along the way.
Happy Coding. 🙂
