It's time we started taking more advantage of the power at our fingertips. Google Sheets makes it easy to set up and manage web scraping campaigns with a few simple clicks.
Google Sheets is a spreadsheet program that lets users create and edit spreadsheets online. It also offers built-in functions that can pull data directly from websites, with no scripting required. This article will show you how to use Google Sheets for web scraping and campaign building.
Editor’s note: As the year draws to a close, we’re counting down the top 12 most popular and useful expert posts on SagaReach Marketing this year.
Our editorial staff selected this collection based on the performance, usefulness, quality, and value provided for you, our readers.
We'll republish one of the finest articles of the year each day until December 24th, counting down from No. 12 to No. 1. Today, our countdown continues with our No. 5 column, which was first published on August 4, 2021.
Andrea Atzori’s how-to article shows readers how to use Google Sheets for site scraping and campaign creation without any prior coding knowledge.
Enjoy!
We’ve all been in a scenario when we needed to extract information from a website.
When working on a fresh account or campaign, you may not have all of the necessary data or information to create advertising, for example.
In a perfect scenario, we'd receive all of the content, landing pages, and pertinent data we need in an easy-to-import format like a CSV, Excel spreadsheet, or Google Sheet. (Or, at the very least, be given tabular data that can be imported into one of those formats.)
However, this isn’t always the case.
Those without web scraping tools – or the technical skills to use Python to help with the task – may have had to resort to the time-consuming job of manually copying and pasting hundreds or thousands of entries.
My team was required to do the following at a recent job:
- Go to the client's website.
- Find more than 150 new products spread across 15 different pages.
- Copy and paste each product's name and landing page URL into a spreadsheet.
You can imagine how long the process would have taken if we had done exactly that and completed it manually.
It’s not only time-consuming, but with someone manually going through that many products and pages and physically copying and pasting the data product by product, the odds of making a mistake are significant.
It would then take even more time to proofread the document and ensure that it was free of errors.
There has to be a better way to do things.
There is, in fact, good news! Let me demonstrate how we achieved it.
What Is IMPORTXML and How Does It Work?
Google Sheets comes to the rescue. Let me introduce you to the IMPORTXML function.
IMPORTXML “imports data from any of several structured data formats, including XML, HTML, CSV, TSV, and RSS and ATOM XML feeds,” according to Google’s help website.
IMPORTXML is a function that enables you to scrape structured data from websites without having to know how to code.
It’s simple to extract data like page names, descriptions, and links, but it’s more difficult to extract more complicated information.
How Can IMPORTXML Assist With Scraping Webpage Elements?
The function is straightforward and just takes two parameters:
- The URL of the website from which we want to scrape or extract data.
- And the XPath of the element that contains the data.
XPath, the XML Path Language, is a query language for selecting and navigating among elements and attributes in an XML or HTML document.
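Put together, a call to the function takes this general shape, where both the URL and the XPath are placeholders for illustration (this particular query would grab the H1 headings on the page):
=IMPORTXML("https://example.com/some-page", "//h1")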
To get the page title from https://en.wikipedia.org/wiki/Moon_landing, for example, we would use:
=IMPORTXML("https://en.wikipedia.org/wiki/Moon_landing", "//title")
This returns the page title: Moon landing - Wikipedia.
Try this if we’re searching for the page description:
=IMPORTXML("https://www.searchenginejournal.com/", "//meta[@name='description']/@content")
Here’s a rundown of some of the most helpful and frequent XPath queries:
- //title – the page title
- //meta[@name='description']/@content – the page meta description
- //h1 – the page H1
- //@href – all the links on the page
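To make those concrete, here is how each query might be dropped into a formula, assuming the target URL is stored in cell B1 (the cell reference is purely for illustration):
=IMPORTXML(B1, "//title")
=IMPORTXML(B1, "//meta[@name='description']/@content")
=IMPORTXML(B1, "//h1")
=IMPORTXML(B1, "//@href")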
Take a look at IMPORTXML in action.
Since finding IMPORTXML in Google Sheets, it’s been one of our secret weapons for automating a variety of everyday chores, including campaign and ad generation, content research, and more.
Furthermore, the function, when coupled with other formulae and add-ons, may be utilized for more complicated jobs that would otherwise need complex solutions and development, such as Python-based tools.
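As a quick sketch of that kind of combination, assuming a URL in cell B1 and using 30 characters as a stand-in for an ad headline limit, scraped headings can be trimmed to length in a single step:
=ARRAYFORMULA(LEFT(IMPORTXML(B1, "//h1"), 30))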
In this case, however, we'll look at IMPORTXML in its most basic form: scraping data from a web page.
Let’s take a look at a real-life scenario.
Assume we’ve been tasked with developing a campaign for SagaReach Marketing.
They want us to promote the latest 30 articles published in the PPC section of the website.
You might think it's a straightforward task.
Unfortunately, the editors can't email us the data, and have kindly asked us to gather the information needed to launch the campaign from the website itself.
One approach to achieve this is to open two browser windows, one with the website and the other with Google Sheets or Excel, as we indicated at the beginning of our post. Then, article by article and link by link, we’d start copying and pasting the material.
However, by utilizing IMPORTXML in Google Sheets, we can accomplish the same result in a fraction of the time, with little to no possibility of making errors.
Here’s how to do it.
Step 1: Create a new Google Sheet.
First, we create a new Google Sheets document that is completely blank:
Step 2: Add the URL(s) you want to scrape.
Add the URL of the page (or pages) from which we wish to scrape data.
We begin with https://www.searchenginejournal.com/category/pay-per-click/ in our case:
Screenshot from Google Sheets, July 2021
Step 3: Look for the XPath
We look for the XPath of the element whose content we wish to import into our data spreadsheet.
Let’s start with the titles of the most recent 30 articles in our sample.
In Google Chrome, hover over the title of one of the articles, then right-click and choose Inspect.
Screenshot from SearchEngineJournal.com, July 2021
The Chrome Dev Tools window will open as follows:
Screenshot from SearchEngineJournal.com, July 2021
Make sure the article title is still selected and highlighted, then right-click and choose Copy > Copy XPath from the menu.
Step 4: Import the information into Google Sheets
Add the IMPORTXML function to your Google Sheets document as follows:
=IMPORTXML(B1, "//*[starts-with(@id, 'title')]")
There are a few things to keep in mind:
First, we substituted the page’s URL with a reference to the cell where the URL is kept in our formula (B1).
Second, the XPath will always be contained in double-quotes when copied from Chrome.
(//*[@id="title 1"])
However, the double quotations sign must be replaced with a single quote sign to ensure that the formula is not broken.
(//*[@id='title 1'])
Because the page ID of each article title varies (title 1, title 2, etc.), we need to adjust the query slightly and use starts-with to capture every element on the page whose ID begins with 'title'.
On the Google Sheets spreadsheet, this appears like this:
Screenshot from Google Sheets, July 2021
And when the query has loaded the data into the spreadsheet, this is what the results look like in just a few moments:
Screenshot from Google Sheets, July 2021
As you can see, the list returns all of the articles that are shown on the scraped page (including my previous piece about automation and how to use Ad Customizers to Improve Google Ads campaign performance).
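A quick aside on the selector: if the IDs on a page you're scraping don't share a neat prefix, XPath's contains() function can be swapped in for starts-with() in the same way (B1 again holds the URL; the selector is only an illustration):
=IMPORTXML(B1, "//*[contains(@id, 'title')]")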
This method may also be used to scrape any additional data needed to build up your ad campaign.
Let’s populate our Sheets document with the landing page URLs, featured snippets from each post, and the author’s name.
For the landing page URLs, we need to change the query to indicate that we're looking for the href attribute associated with the article title.
As a result, our query will be as follows:
=IMPORTXML(B1, "//*[starts-with(@id, 'title')]/@href")
In other words, /@href is appended to the end of the XPath.
Screenshot from Google Sheets, July 2021
Voila! We have the URLs of the landing pages right away:
Screenshot from Google Sheets, July 2021
The highlighted snippets and author names may be treated in the same way:
Screenshot from Google Sheets, July 2021
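The exact XPath for those elements depends on the page's markup, so the selectors below are purely hypothetical placeholders showing the shape such formulas would take, rather than the site's actual classes:
=IMPORTXML(B1, "//p[@class='snippet']")
=IMPORTXML(B1, "//span[@class='author']")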
Troubleshooting
One thing to keep in mind: for the query to expand and fill the spreadsheet with all of the data it returns, the column receiving the data must have enough free cells, with no other data in the way.
This works much like an ARRAYFORMULA: there must be no other data in the same column for the formula to expand.
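When the array can't expand because other data is in the way, Google Sheets shows a #REF! error; clearing the cells below the formula resolves it. It can also help to wrap the call in IFERROR so that a temporarily unreachable page returns a readable message instead of an error (the fallback text is just an example):
=IFERROR(IMPORTXML(B1, "//*[starts-with(@id, 'title')]"), "No data returned")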
Conclusion
And there you have it: a completely automated, error-free method of scraping data from (possibly) any website, whether you require content and product descriptions or ecommerce data like product prices and shipping costs.
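For instance, pulling a product price could look something like the formula below, where both the URL and the class name are hypothetical placeholders that would need to match the shop's actual markup:
=IMPORTXML("https://example-shop.com/product-page", "//span[@class='price']")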
At a time when the ability to scrape web pages and structured content simply and quickly can be priceless, IMPORTXML can also help shorten execution times and limit the likelihood of errors, as we've seen above.
Furthermore, the function is not only a great tool for PPC tasks; it can also be used for a variety of other projects that require web scraping, such as SEO and content work.
Featured Image: Aleutie/Shutterstock
Frequently Asked Questions
Can you web scrape in Google Sheets?
A: Yes, it is possible to scrape data from the web using Google Sheets.
How do I pull data from Google Sheets to a website?
A: One common approach is to publish the sheet via the File menu's Publish to the web option, or to export it as a CSV and load that data into your website.
Can I use Google Sheets as a database for website?
A: Yes, you can use Google Sheets as a lightweight database for a web app.