Google fills out your search forms

Monday, May 19th, 2008

Occasionally I find sites whose home page makes you select a company or product from a drop-down box before you can enter the site. Until now, Google could not access the pages behind such a form unless it was given direct links to them. Pages like these are often called the “Deep Web” or the “Invisible Web” because search engines cannot reach them. Google has said in the past that it believes 80% or more of web pages are “hidden” from it because finding them requires filling out a form.

Now, if you had hired a good Search Engine Optimizer, this would not be an issue as they would know how to provide links to those pages so all search engines could access them appropriately. However, sometimes the advice of your SEO expert is ignored, or you didn’t include one on your team, and thus search engines can’t access those pages.

In April, Google announced that it could begin searching pages that require a user to fill out a form. This has all types of interesting applications, both good and bad. You need to understand what this means, as well as what it can do. So without further ado, I present The Good, The Bad, and The Truth.

  • The Good:
    • Now more pages will be accessible.
    • Simple “categorical” search forms will no longer cause Google to stumble. For example, I recently built a simple movie web application in which people could search by genre. I had to devise ways to avoid using a drop-down where possible so search engines could find the reviews.
  • The Bad:
    • If you tried to “hide” pages, you need to rethink your method. Consider using the robots.txt file or the robots meta tag to properly ask search engines not to process certain files.
    • Some people fear that this means Google will explore or try to hack restricted access sections of your site. (Remember your robots.txt file in these instances.)
  • Some Truth:
    • Only Google has announced this feature. Other search engines will probably have to follow suit, but at this time they don’t, and they still account for 35-45% of all search traffic.
    • Only simple forms are filled out. Google is not (currently) entering information into text boxes, so many forms cannot be processed.
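
If you rely on robots.txt to keep crawlers away from certain pages, it is worth checking that your rules say what you think they say. Here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the paths, and the example.com URLs are all made up for illustration, and this only shows how a well-behaved crawler would read the rules, not what Google actually does internally.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the paths are illustrative only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /members/
Disallow: /search
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that we already have in memory."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Hypothetical URLs: a public review page, a restricted area,
# and a results page produced by a search form.
urls = [
    "http://example.com/reviews/comedy.html",
    "http://example.com/members/account.php",
    "http://example.com/search?genre=comedy",
]

for url in urls:
    allowed = parser.can_fetch("Googlebot", url)
    print(url, "allowed" if allowed else "blocked")
```

Keep in mind that robots.txt is a request, not a lock. Pages that must stay private still need real access control.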


More Metrics – What Pages are Seen (First)

Friday, January 25th, 2008

Learning how our web site works is an important task. In previous steps we’ve seen some simple examples (visits, page views, and unique visitors), as well as how to determine how long someone visited your site and which page they viewed last. Most of these metrics were simple to use, and that is fine. Now we want to look at the flip side of which pages people leave on: which pages they are viewing, and which ones they view first. That is, which pages people go to first, and how many times a page was shown to someone. Let’s start with the larger picture, which pages were viewed, and then look at the order in which they were seen.

Pages Viewed, which may go by different names such as Top Content or Most Viewed Pages, tells you which pages are viewed most often. Depending upon your analytics package, you may get this by file name, by web page title, or both; which one you look at depends upon your needs. When the report uses the file name, anything after the file name that might change (such as a query string) is displayed as a separate entry, even though the physical file is the same. This is because the part after the file name might change what the file displays. Consider these two examples:

  • Product.php?id=21 – this might display product information for Widgets
  • Product.php?id=32 – might display product information for Gizmos.

Because the content can change, each entry is listed separately. Pages based on internal search forms fall into this category quite often and can cause quite a bit of confusion. Even if the same parameters simply appear in a different order, the report will often treat the two entries as two separate pages. This is intentional, in case someone wants to track the path a visitor took to reach the file, or other similar things.
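
To see how the same set of page views can produce very different counts, here is a small sketch in Python using only the standard library. The log entries, the `Search.php` page, and the helper names `normalize_url` and `path_only` are hypothetical. Real analytics packages have their own grouping rules, so treat this as an illustration of the idea rather than how any particular product works.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlencode, urlsplit


def normalize_url(url: str) -> str:
    """Sort query parameters so reordered parameters collapse into one entry."""
    parts = urlsplit(url)
    params = sorted(parse_qsl(parts.query, keep_blank_values=True))
    query = urlencode(params)
    return f"{parts.path}?{query}" if query else parts.path


def path_only(url: str) -> str:
    """Drop everything after the file name."""
    return urlsplit(url).path


# Hypothetical page-view log entries.
page_views = [
    "Product.php?id=21",
    "Product.php?id=32",
    "Search.php?genre=comedy&year=2007",
    "Search.php?year=2007&genre=comedy",
]

by_raw_url = Counter(page_views)
by_normalized = Counter(normalize_url(u) for u in page_views)
by_file = Counter(path_only(u) for u in page_views)

print(len(by_raw_url), "entries when every URL variation is counted")
print(len(by_normalized), "entries once parameter order is ignored")
print(len(by_file), "entries when grouped by file name alone")
```

Which grouping is "right" depends on the question. Grouping by file name hides the difference between Widgets and Gizmos, while counting every variation can bury you in near-duplicates.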

[Figure: Page Views by URL vs. by Page Title]

Why this is important: from here we can start to see which files are viewed most often. Clients are often surprised to find out that it is not necessarily the home page (more on that in a bit). They are also (sometimes) surprised that some pages are viewed a lot more than others. Consider the graphic above. In the top graph, you will notice over 1,700 URLs (web page addresses); in the bottom graph, you will notice there are only about 30 page titles. Both are for the same site and the same time frame. The difference is that the first graph is by page file name and the second is by page title. The underlying file names are much the same, but the search pages append a lot of extra parameters, so the count of 1,700 is inflated. You have to find the report that works for you.

These reports came from Google Analytics, and they provide some additional information. We’ll look at that in the next article or two.

Why this doesn’t matter: outside of wanting to know which page is being displayed, this isn’t important. That is an important thing to understand. This report can’t show you why visitors went to the page they viewed, whether they found the information they were looking for, or, when viewed by itself, what they did after viewing the page.

A count of individual page views is also unimportant because it doesn’t tell you why the person has come to that page, possibly repeatedly. Are they lost? Do they have to go through a series of “hoops” to get where they want to go? Page views are a lazy man’s metric because they can look impressive without providing any key insight.

A Top Landing Page, sometimes called an Entrance Page, is the page on which the user enters the web site. Many people naturally assume that someone will go to the home page first. They will spend lots of money making a cool splash screen instead of spending those resources on making the other pages better. Yet many times another page is used to enter the site. For example:

  • Search Engines take people to the most relevant page, not the home page.
  • Other people might link from their web page to a favorite article, product, or review.
  • People bookmark the page in a site that helps them, not necessarily the home page. (I commonly bookmark login pages – especially if I have to pay a bill on a web site.)
  • Someone might send or receive an e-mail with a link to a page on your site that isn’t the home page.

All of these and more cause other pages to be the “landing page”.
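
If your analytics package doesn't offer a landing page report, the idea is simple enough to sketch yourself: for each visit, take the earliest page viewed, then count how often each page comes first. The example below is a minimal Python sketch. The `hits` data, the visit IDs, the timestamps, and the `landing_pages` function are all hypothetical and only illustrate the calculation.

```python
from collections import Counter

# Hypothetical hits as (visit_id, timestamp, page); not from a real log.
hits = [
    ("v1", 10, "/reviews/comedy.html"),
    ("v1", 25, "/index.html"),
    ("v2", 5, "/index.html"),
    ("v3", 40, "/login.php"),
    ("v3", 30, "/reviews/comedy.html"),
]


def landing_pages(hits):
    """Count how often each page is the first page seen in a visit."""
    first_hit = {}  # visit_id -> (timestamp, page) of the earliest hit
    for visit_id, timestamp, page in hits:
        if visit_id not in first_hit or timestamp < first_hit[visit_id][0]:
            first_hit[visit_id] = (timestamp, page)
    return Counter(page for _, page in first_hit.values())


for page, entrances in landing_pages(hits).most_common():
    print(page, entrances)
```

In this made-up data the home page is the landing page for only one of the three visits, which is exactly the kind of surprise the report tends to deliver.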

Why this is important: Knowing which pages people enter your site from helps you know which pages to focus on. You should also watch to see whether a change to your site or page increases or decreases the number of people entering through that page. While this wouldn’t necessarily establish cause and effect, it lets you see some of what is happening and start to make an educated guess. It is also important to know if you will be moving a page, as it tells you that others will be affected. Often, if someone reaches your site and gets an error page (404 is the Not Found error), they will leave and find the answer somewhere else. Knowing this gives you the information to take to your technical people so they can set up the proper redirects and make sure people don’t get lost on your site. (Hint: You should always redirect your pages…)
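
The redirect idea can be sketched in a few lines. The Python example below is only an illustration of the logic: the paths, the `MOVED` and `EXISTING` tables, and the `resolve` function are hypothetical, and on a real site this is usually handled in the web server or application configuration rather than in hand-written code.

```python
# Hypothetical table of pages that have moved: old path -> new path.
MOVED = {
    "/old-reviews.html": "/reviews/index.html",
}

# Hypothetical set of pages that currently exist on the site.
EXISTING = {"/index.html", "/reviews/index.html"}


def resolve(path):
    """Return (HTTP status, location) for a requested path."""
    if path in MOVED:
        # 301 tells browsers and search engines the move is permanent.
        return 301, MOVED[path]
    if path in EXISTING:
        return 200, path
    # Nothing here and no redirect: the visitor gets a Not Found error.
    return 404, None


for requested in ("/old-reviews.html", "/index.html", "/missing.html"):
    print(requested, resolve(requested))
```

Comparing a table like `MOVED` against your top landing pages is a quick way to spot popular entrances that would otherwise turn into 404 errors after a move.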

Why this doesn’t matter: When we put together this metric with some others, you will see how this can be a powerful metric. However, by itself, as with most other metrics, you will not be able to gather much information from it.
