August 20th, 2007 - Matt Cutts Interview
Through no fault of Matt’s, it took quite some time for me to connect with him for this interview, conducted in late July 2007. It was definitely worth the wait. Matt is a brilliant guy and for anyone who uses Google (I’m sure there are a few), his work is quite relevant to the quality of information we receive. I used some of this material for an article that appeared in the Sydney Morning Herald and Melbourne’s The Age. You can read Matt’s latest musings on his blog.
Dan Skeen: So, Matt, why don’t you tell me a little bit about your history with dealing with Web spam and how you first got involved in that?
Matt Cutts: Ah, that’s an interesting question. When I joined Google, I knew nothing about Web spam. I was a computer graphics, computer vision sort of guy. But one of the very first assignments that I got was to develop SafeSearch, which is Google’s family filter. In the process of that, I ran across at least a site or two that appeared to be trying to cheat; and back then, this was early 2000, PageRank was thought of as nearly completely unassailable. The whole idea of spamming PageRank was a bit alien to many people around the Web. So, it was a little bit of a wake-up call; and within a year, I had gone and essentially asked to work on Web spam full time, and since about April 2001, I have essentially worked on Web spam nonstop and search quality, in general.
Matt Cutts: Sure. A lot of the early search engines used on-page factors a lot more than links, and it actually took a couple of years before very many search engine optimizers or webmasters realized just how much of a difference things like hyperlinks and anchor text would make. So, a lot of the early spam attempts we’d see would be things like keyword stuffing, completely random gibberish, people doing dictionaries of tons of words on a page. You’d also see things like cloaking, which is showing different content to search engines than you show to users; and so back in those days, it was a little more like the Wild West, and people would try to show a page to search engines about G-rated cartoons. Then, when you actually visited that page, they might try to show you porn.
So, it’s interesting to watch the evolution of the market over time because back in the early days, you could go to a large Search Engine Optimization firm and get counseled to say okay, let’s try this Black Hat technique, this technique that violates search engines’ quality guidelines. It was not difficult to find large companies that would propose those sorts of schemes. These days, thankfully, that’s quite rare. If you go to reputable SEO firms, for the most part, they’re quite up front about what they’re doing and they’ll at least inform you about the things that will possibly involve some risk.
So, one nice thing is you see fewer scams. Of course, over time, people have tried a lot of different techniques, everything from going to a bunch of guestbooks and signing them and saying hey, great site, check out my site, to all sorts of things in between. What we’re seeing these days is more of a trend where people essentially say I might be able to make some money for a short term doing shortcuts or tricks; but if I want traffic that lasts for a long time, it’s actually easier to go ahead and follow White Hat techniques and build links in an organic way or by using some smart gimmicks or neat hooks. That sort of traffic and those sort of rankings tend to last for a much longer time.
So, we’re seeing an increase in the amount of interest that people have in search; but you’re also seeing a lot more people who are willing to use these valid White Hat techniques.
Dan Skeen: Okay. How have those Black Hat techniques evolved? What are some of the latest tricks that you’re encountering and perhaps engaged in dealing with right now?
Matt Cutts: It’s kind of interesting because a lot of the techniques are gathered around trying to get links, and so the vast majority of tricks that we see people trying to do are trying to get a large quantity of links in a very short period of time or trying to find various gimmicks to try and get links. Some of those we would consider unethical. So for example, the example I gave earlier about people signing guestbooks, attempts like that still continue. People still continue to try to, for example, run programs that will go and sign a bunch of blogs. These days, it’s more likely that people will try to hit blogs instead of guestbooks. But, for the most part, Google does a pretty good job of countering those sort of techniques.
One of the newer trends that we’ve seen is for people to not worry about trying to spam so much as trying to make a catchy site and then use something that’s typically called Social Media Optimization. So, for example, somebody might write a really interesting story and try to get it up on Digg or on Slashdot or reddit, for example, which I guess Wired, you know, now owns. That’s kind of interesting because that is not necessarily considered Black Hat, at all. A good example is custom-made industrial blenders: they essentially said let’s take our blenders and just throw a whole bunch of different things into them. Let’s try a rake and an iPod; you might have seen these videos where they just throw in a cheeseburger and make a cheeseburger shake and stuff like that.
It’s really just a renaissance of creativity because people think about interesting hooks that will cause people to enjoy the site and return to the site and e-mail their friends about it and bookmark it. So, this Social Media Optimization as some people refer to it is really kind of a new area; and rather than try to trick people into getting the links, they’re looking for the creative hooks that will cause people to want to link to a site.
Matt Cutts: We try to stay away from thinking of it too much as an arms race and think more along the lines of: there’s a finite list of techniques that people can use, and we try to tackle each of those techniques in turn and find scalable, robust ways to tackle them. I’ll give you a quick example. There are some people who might throw a bunch of keywords onto a page, and they could do that by scraping a search engine; so they might do a query, take all the results, and then use that as fodder for trying to attract search engines with those keywords. They can scrape competitors. You can also try to create gibberish by scraping different sites and stitching them together. But, in all of those cases, what we can try to do is say how natural is this language? Does it look artificial? And then if people are trying to get links, if they’re trying to get links too quickly, we can say okay, well, how exactly are they getting links? We can try to target that as an effective technique.
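As a rough illustration of the "how natural is this language?" question, the sketch below measures what share of a page's words is taken up by its single most repeated word. This is not Google's method, which the interview does not describe. The function name `top_word_share` and the sample strings are hypothetical, and a real system would presumably rely on far richer signals than one ratio.

```python
import re
from collections import Counter


def top_word_share(text):
    """Return the fraction of words taken up by the most frequent word.

    Hypothetical heuristic for illustration only: keyword-stuffed text
    tends to repeat a few terms far more often than natural prose does.
    """
    # Lowercase the text and keep simple alphabetic tokens.
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    # most_common(1) yields a list with one (word, count) pair.
    _, count = Counter(words).most_common(1)[0]
    return count / len(words)


stuffed = "cheap pills cheap pills cheap pills cheap"
natural = "the quick brown fox jumps over the lazy dog"

# The stuffed sample scores higher than the natural sentence.
print(top_word_share(stuffed) > top_word_share(natural))
```

A single ratio like this is easy to fool and would flag some legitimate pages, so it is best read as a picture of the idea and not as a usable filter.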
Matt Cutts: It can depend. Typically, when we talk about cloaking, we’re worried a lot more about deceptive techniques, so the sort of instance where somebody is advertising cartoons to the search engine but the user actually gets porn, something like that. We do have groups that look at issues like that, for example, and try to make sure that all the sources that are in Google News do follow best practices. So, for example, I’m not as familiar with that area; but I’ll try to give you a quick example. There is a policy called First Click Free, so the first click that a user makes to a Web site is free, and they see the content that Googlebot saw; then, on any clicks from that page onward to the next page, for example, you might get a registration page or something that involves payment. So, subscription stuff is typically not quite as serious in that it doesn’t involve active deception; but we do try to make sure that we are consistent, and so we do work with companies even in those instances to try and make sure that they implement best practices.
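The First Click Free idea described above can be sketched as a small decision function on the publisher's side. This is only a minimal sketch under stated assumptions: the host list `SEARCH_REFERRER_HOSTS` and the function `should_serve_full_article` are hypothetical names, and the interview does not say how any publisher actually implements the policy.

```python
from urllib.parse import urlparse

# Assumption: hosts the publisher treats as "arriving from search".
SEARCH_REFERRER_HOSTS = {"www.google.com", "news.google.com"}


def should_serve_full_article(referrer, is_subscriber):
    """Decide whether a visitor sees the full article or a registration page.

    Hypothetical logic: subscribers always see the article; other visitors
    see it only on the first click, that is, when they arrive directly from
    a search result. Clicks from within the site carry the site's own
    referrer and therefore get the registration page.
    """
    if is_subscriber:
        return True
    if not referrer:
        return False
    host = urlparse(referrer).hostname or ""
    return host in SEARCH_REFERRER_HOSTS


# First click from a search result: full article.
print(should_serve_full_article("https://www.google.com/search?q=news", False))
# Second click from inside the publisher's own site: registration page.
print(should_serve_full_article("https://publisher.example/story-1", False))
```

Referrer headers can be missing or forged, so a check like this only approximates the policy. What matters for the distinction drawn in the interview is that the page served on that first click matches what the crawler was shown.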
Matt Cutts: Yeah, that’s a good question. I don’t know if we break it down into percentages directly, but I would certainly say that it’s more reputable now than it was, say, five years ago; and we do provide a really good page on our Web site, which you might want to mention in case readers are interested, where we basically give good guidelines to vet a search engine optimizer. So, it gives advice like if they’re sending cold-call e-mails out of the blue guaranteeing top ranking, then you might want to be a little more suspicious; and it gives good advice like looking for a reference, so talk to a search engine optimizer’s other clients. It even goes down into some pretty good level of detail, like a search engine optimizer should be able to walk you through what they’re doing. If there’s somebody where they’re waving their hands and they’re telling you there’s a secret proprietary technique that they’re not willing to educate you about, that’s where you should start to worry a little bit more. So, I think if you go to google.com/webmasters, we provide a lot of detail there, including the SEO page. We also provide something called Webmaster Central, and that is a self-service console so that you can see things like crawl errors. Whenever Googlebot was crawling your site, if we found 404s or if we found broken links or weren’t able to fetch pages, you can get a lot of good diagnostics and stuff there.
So, we do try to provide some pretty good resources like
Matt Cutts: Well, thank you. We try hard. I mean, we love one-on-one communication, but there are just too many webmasters to be able to communicate with each one individually. Some of those tools are fantastic because they really give the power back to the site owner so that they can solve the problem themselves, in many cases.
Matt Cutts: There is a saying in the SEO industry, PPC, which normally stands for pay per click; but they refer to porn, pills, and casinos. So, it’s certainly one of the areas that’s more competitive, of course, purposely just because people can make money in some of those areas. We do keep an eye on every industry, and we do try to keep an eye on lots of different countries, as well so that we can make sure that we return the best results in all of those industries and all of those countries. The techniques do vary from country to country, which is kind of interesting as well.
Matt Cutts: It’s partially language; sometimes it’s culture, and sometimes it’s economics. For example, for a long time in
Matt Cutts: To some degree, but the Web has always been a pretty international place; so there’s always been SEO going on in a lot of different countries. I think that one thing that’s interesting is within the last year Google has been paying a lot more attention to many different international markets; and so you’re more likely to see Google take action in a country like Germany or China or India or wherever than probably a few years ago. So, to some degree, you do see more programming or other activities by different countries; but on the other hand, there’s always been some amount of activity in a lot of different countries.
Matt Cutts: We don’t normally break out how many people we have working on specific teams or what sort of resources, but I’ll give you an example that gives a little bit of a feel for the communication and how we do use people to help improve Google’s performance on spam. We are actually able to send e-mails to webmasters whenever we find that they have things like hidden text or keywords stuffed in on their site. We can send those e-mails in 10 different languages; and what we do in those e-mails is we actually try to find the best e-mail address for the webmaster, and we e-mail the specific URL. We say oh, you might not know it but on this URL, here is some hidden text; and we would love it if you could correct that because we think you have a high-quality site and would like to have it in Google’s index. That process is driven by at least some level of manual review; we do have computers to assist us in the process, but before you send an e-mail to someone, you do want to be able to have someone check and say yes, this is the appropriate German language, for example. So, there certainly is a role to the extent that we think it can be an important process to communicate with webmasters; but we don’t give the specific numbers or break out exactly how many people work on that aspect of things.
Matt Cutts: Well, I think I’m really glad that Google takes communication seriously and does work very hard to try and communicate online in general with our blog. We have many, many blogs and try to communicate to a bunch of different constituencies, including webmasters. Certainly, I’ve had instances, in fact, there was a Wired reporter who stopped by in 2000 and I remember the very first thing I said was you know, I think Wired Magazine has too many ads. You know this was back at the height of the .com boom and so Wired was 300 pages long. I got a very good talking to about journalism. Well, you know, we actually pay a lot of attention to the ratio of ads, and it’s never more than this percentage, and this is why this works; you know, as an engineer, I sort of realized okay, probably that wasn’t the most diplomatic thing to say.
I kind of…I lucked out in that I got a chance to communicate about Google very early, and so most of my mistakes happened when not as many people were paying attention. But, it has been a lot of fun. We do take communication very seriously, and we do try to give as much information as we can. Anytime that we’re worried that it might compromise Google search quality, we have to be a little more careful with our words; but there are a vast number of misconceptions out there. Sometimes people say things like, you know, if I had a Microsoft Web server versus an Apache Web server, does Google penalize my site? And it’s great to be able to say no, absolutely, categorically, not. Whatever your Web server is, we just care about your content and try to return the best search results. Or if someone asks, what if my Web site ends with .asp versus .php, does that make any difference? We can say no, no. Again, it just goes back to your content. Or, how long, you know, if my Web server’s a little slow, does that factor into the scoring? We can say no, it doesn’t; as long as we’re able to fetch the page, we try to rank it just the same whether it takes one second or a longer amount of time to fetch the page. So, there are many, many different things that we are able to communicate that are not at all confidential but that still help webmasters and site owners quite a bit. So, I’m really glad that we communicate at conferences, we talk online, we’ve got blogs, we’ve got this webmaster forum that I talked about earlier; and between all of those, we do try to communicate as much as we can to just help regular people.
Matt Cutts: Sure, let’s say, for example, that they use hidden text, white text on a white background, or something like that for this example. Next, we detect that; and we can detect it using algorithms. Or, we can detect it through, for example, someone might do a spam report. Maybe a competitor doesn’t think it’s fair that this person has white text on a white background and seems to rank okay.
That can trigger a manual review. Once the violation of our quality guidelines is detected, we’ll take action on that; and we try to take the appropriate action. If it’s a small mom and pop, we might take shorter action trying to essentially warn the site rather than trying to be too vigorous. Many times, we’ll be able to alert the webmaster; so I mentioned earlier in the interview that we can often e-mail or otherwise contact the site and let them know that they have potential problems ranking in Google because of something like hidden text.
Then webmasters become aware of the problem: either they’re not ranking where they think they should be ranking, or they’ve been definitively told by Google; another mechanism is to go to Google Webmaster Central. Sometimes, if we e-mail you, we’ll also tell you yes, you do have a penalty on your site, which lets people know that they should be trying to cure that.
The next stage is to try to correct the problem or address it; and then finally, what they want to do is submit what we used to call a re-inclusion request, though we’re renaming it to reconsideration request because you’re not always completely removed from Google. So, it’s not that you’re always re-included; but you do want to be reconsidered, and the criteria for that are essentially to say here is what we think was going on, this is why you might have penalized the site, if it does have a penalty. Here’s what we’ve done to correct it, and here are the steps that we’ve taken to make sure that this doesn’t happen again because essentially what Google cares about is having a clean index and having a good search experience for our users.
If we think the site has changed its course to take the appropriate action and clean up whatever quality violations or spamming material might have been on their site and we have a reasonable expectation that it won’t happen again, many times we’ll reconsider that site and bring it back into Google’s index or bring it to rank where it was ranking before. So, that’s sort of the circle that typically happens. We’ll detect it, we’ll take action, many times we’ll communicate it, webmasters can make the change, then they’ll fill out a reconsideration request; and many times that will allow a site to rank back where it was before.
Matt Cutts: It depends on the individual incident. Many times, we’ll see white text on a white background throughout the entire site; and then it would be appropriate to remove the entire site. If it’s something like through a page and the amount of text is especially egregious, then, taking all the various reasons into account, we can remove an entire site from our index. What we try to do is find the appropriate balance so that if a site had a small amount of hidden text and corrected that relatively quickly, they could come back into our index in a relatively short period of time because our goal is primarily to have the highest quality index. So, we do want to have the content; it’s just we also want the content to be clean for our users.
Matt Cutts: Absolutely. Another way to think about a trustworthy site is a site that is authoritative in some sense; and by that, it provides a lot of value to users, more than just a regular Web site might, or that it’s an expert in some sense. So for example, some people will make a site by just copying a whole bunch of data from other places and trying to slap it up and trying to monetize that in some way and don’t really put a lot of effort or time into the individual site; and that’s not a lot of value add, so that’s not as much of a trustworthy site. Some steps that a small mom and pop can take involve adding a lot of high-quality content, and that can be something like a newsletter or a blog. A blog is a great way to just force you to write regular amounts of high-quality content, and often you can attract links because you take part in the conversation throughout the blogosphere.
It can also involve putting up a forum; and so if you’re interested in diamond engagement rings, people who are looking to figure out whether things are a good deal or not can leave messages, and you can provide answers. So, user-generated content in the form of forums can actually work out relatively well.
There are a lot of different hooks that people can use; creativity of all manners typically is rewarded with links, and I’ll give you just a simple example that illustrates that. Somebody was asking me about a translation site, the sort of thing that would translate English into Japanese or vice versa; and essentially, a site came to me and said well, we don’t rank as well as the number one site, and we don’t understand why not. So I looked at the site that was complaining; it was only about five or seven pages, and it was almost as if someone had taken a brochure and just put it up on the Web. There was an About Us page, there was a Contact page, and there was one page about what they did, that was about it; and then they complained about this number one site.
I went to check that out, and the number one site was actually talking about the different types of Japanese writing, hiragana, katakana, kanji, and how you could write your name and what different characters would mean. If you take a step back and look at it from the perspective of someone who had nothing to do with either site, one site looked a little like a brochure and one site, essentially, looked like a really neat interactive site where you could figure out what does this Japanese character mean or how could I write my name. It seems intuitive when you take that step back; but many people don’t get enough perspective to see the entire forest instead of just the trees. Those sort of compelling hooks or gimmicks or things to draw in users, which can be as simple as games or as informative as newsletters, are the sort of things that really attract a lot of links, a lot of repeat visitors, a lot of word of mouth and buzz.