Stack Diary - Helpful Advice for Web Developers
  • Home
  • About
  • Categories
    • Web Development
    • WordPress
    • Tech
    • Digital Marketing
    • Code
    • Web Design
  • Snippets
    • JavaScript
    • CSS
    • Linux
    • SQL
    • Google
  • Contact
  • Home
  • About
  • Categories
    • Web Development
    • WordPress
    • Tech
    • Digital Marketing
    • Code
    • Web Design
  • Snippets
    • JavaScript
    • CSS
    • Linux
    • SQL
    • Google
  • Contact
Stack Diary - Helpful Advice for Web Developers
Stack Diary - Helpful Advice for Web Developers
LnRiLWNvbnRhaW5lciAudGItY29udGFpbmVyLWlubmVye3dpZHRoOjEwMCU7bWFyZ2luOjAgYXV0b30udGItY29udGFpbmVyIC50Yi1jb250YWluZXItaW5uZXJ7d2lkdGg6MTAwJTttYXJnaW46MCBhdXRvfUBtZWRpYSBvbmx5IHNjcmVlbiBhbmQgKG1heC13aWR0aDogNzgxcHgpIHsgLnRiLWNvbnRhaW5lciAudGItY29udGFpbmVyLWlubmVye3dpZHRoOjEwMCU7bWFyZ2luOjAgYXV0b30udGItY29udGFpbmVyIC50Yi1jb250YWluZXItaW5uZXJ7d2lkdGg6MTAwJTttYXJnaW46MCBhdXRvfSB9IEBtZWRpYSBvbmx5IHNjcmVlbiBhbmQgKG1heC13aWR0aDogNTk5cHgpIHsgLnRiLWNvbnRhaW5lciAudGItY29udGFpbmVyLWlubmVye3dpZHRoOjEwMCU7bWFyZ2luOjAgYXV0b30udGItY29udGFpbmVyIC50Yi1jb250YWluZXItaW5uZXJ7d2lkdGg6MTAwJTttYXJnaW46MCBhdXRvfSB9IA==
Google

How to Scrape Website Data with Google Sheets

Published on July 21, 2022

LnRiLWNvbnRhaW5lciAudGItY29udGFpbmVyLWlubmVye3dpZHRoOjEwMCU7bWFyZ2luOjAgYXV0b31AbWVkaWEgb25seSBzY3JlZW4gYW5kIChtYXgtd2lkdGg6IDc4MXB4KSB7IC50Yi1jb250YWluZXIgLnRiLWNvbnRhaW5lci1pbm5lcnt3aWR0aDoxMDAlO21hcmdpbjowIGF1dG99IH0gQG1lZGlhIG9ubHkgc2NyZWVuIGFuZCAobWF4LXdpZHRoOiA1OTlweCkgeyAudGItY29udGFpbmVyIC50Yi1jb250YWluZXItaW5uZXJ7d2lkdGg6MTAwJTttYXJnaW46MCBhdXRvfSB9IA==

There has been a video making rounds on Twitter and other social platforms showcasing the use of the “import website data” feature inside Microsoft Excel.

But, as it happens this feature is also available inside Google Sheets, so I thought I’d make a write-up on it to explain a little bit more about how it works and how to use it.

Summary
  • How to access ImportXML/HTML in Google Sheets
  • Importing Wikipedia table data into Google Sheets

The feature to scrape (import) website data in Google Sheets is called ImportHTML.

An alternative function you can use is ImportXML which supports importing data into Google Sheets from external URLs in XML, HTML, CSV, TSV, RSS, and ATOM XML format.

In order to use ImportHTML/XML you need to use XPath as part of the query:

IMPORTHTML("URL", "Query", "Index")

IMPORTXML("URL", "XPath Query")

If you want to follow along you can start a new spreadsheet by going to sheets.new.

How to access ImportXML/HTML in Google Sheets

In order to start a new ImportXML/HTML query you need to go to:

Insert -> Function -> Web -> ImportXML/HTML.

How to access ImportXML in Google Sheets

And now you can write your scraping function using an XPath query.

What is XPath?

XPath is an expression language designed to support the query or transformation of XML documents. It was defined by the World Wide Web Consortium and can be used to compute values from the content of an XML document. Wikipedia

So, because the Twitter video was using Wikipedia as an example, let’s do the same for this demo.

Importing Wikipedia table data into Google Sheets

I’m going to use this F1 driver’s records page to import table data about the drivers that have the most wins in Formula 1. I’m going to use the ImportHTML function because it saves me time from having to tinker with formatting.

Step 1: Finding out the table number

Wikipedia does not assign an ID to any of its tables. So, you have two options – count down how many tables there are in total (boring), or use DevTools to quickly inspect the full XPath to a specific site element. In our case, the table for the drivers with the most wins.

Go to the Wikipedia page I linked, and find the Wins table. Next, Right click on the table to inspect the table’s element. From here, go to Copy -> Copy full XPath.

Importing Wikipedia table data into Google Sheets

Once you do this, your clipboard will have the following string inside it:

/html/body/div[3]/div[3]/div[5]/div[1]/table[12]

/* this string is useful when you're doing full XML imports */

The string we care about for this demo is table[12]. This tells us that the table which lists the drivers with the most wins is number 12 on the page. So, we can go back to Google Sheets and write our query string to pull that table data inside our spreadsheet.

Step 2: Writing our query function to fetch the table data

We can now use ImportHTML to fetch the data we want. Here is what the full function looks like:

/* remember to use quotation marks ("") */

=IMPORTHTML("https://en.wikipedia.org/wiki/List_of_Formula_One_driver_records", "table", 12)

And here is the result after executing it:

example of scraping Wikipedia with Google Sheets

The possibilities for using these functions are pretty much endless. For example, let’s say you wanted to import all external URLs on a particular page into your spreadsheet.

In that case, you could use the ImportXML function and this XPath query:

/* this will import all the links on any given page except the links that are 
of the domain itself that you're querying */

IMPORTXML("https://google.com", "//a[not(contains(@href, 'google.com'))]/@href")

Share this:

Share on Twitter Share on Facebook Share on LinkedIn Share on Email

Snippet categories

Linux CSS DevTools JavaScript Git WordPress SQL Google

Latest posts

15 Useful Sites for Free Vector Illustrations
Top 5 Dynamic Content Plugins for WordPress
How to Copy Text That Cannot Be Selected/Copied
Chrome DevTools: 10 Useful Tips & Tricks
10 Best WooCommerce Marketing Plugins in 2022
Privacy Policy.
STACK · DIARY © 2022
  • Home
  • About
  • Write for Us
  • Disclosure
  • Contact
Stack Diary - Helpful Advice for Web Developers
  • Home
  • About
  • Categories
    • Web Development
    • WordPress
    • Tech
    • Digital Marketing
    • Code
    • Web Design
  • Snippets
    • JavaScript
    • CSS
    • Linux
    • SQL
    • Google
  • Contact