The Innovators: How a Group of Inventors, Hackers, Geniuses, and Geeks Created the Digital Revolution - Isaacson Walter - Page 121
There was a problem. The way that Tim Berners-Lee had designed the Web, much to the consternation of hypertext purists such as Ted Nelson, anyone could create a link to another page without getting permission, registering the link in a database, or having the link work in both directions. That permitted the Web to expand willy-nilly. But it also meant that there was no simple way of knowing the number of links pointing to a Web page or where those links might be coming from. You could look at a Web page and see all the links going out, but you couldn’t see the number or the quality of the links pointing into it. “The Web was a poorer version of other collaboration systems I had seen because its hypertext had a flaw: it didn’t have bidirectional links,” said Page.142
So Page set about trying to figure out a way to gather a huge database of the links so that he could follow them in reverse and see which sites were linking to each page. One motivation was to foster collaboration. His scheme would allow folks to annotate another page. If Harry wrote a comment and linked it to Sally’s website, then people looking at her website could go see his comment. “By reversing the links, making it possible to trace them backwards, it would allow people to comment on or annotate a site simply by linking to it,” Page explained.143
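The link reversal described above can be pictured with a small sketch. This is only an illustration under stated assumptions, not BackRub's actual code: the function name `build_backlinks` and the example URLs are hypothetical, and the input is assumed to be a map from each page to the pages it links to.

```python
from collections import defaultdict


def build_backlinks(outlinks):
    """Invert a forward-link map (page -> pages it links to)
    into a backlink map (page -> pages that link to it)."""
    backlinks = defaultdict(set)
    for source, targets in outlinks.items():
        for target in targets:
            backlinks[target].add(source)
    return dict(backlinks)


# Hypothetical data: Harry's comment links to Sally's site.
outlinks = {
    "harry.example/comment": ["sally.example"],
    "news.example": ["sally.example", "harry.example/comment"],
}
backlinks = build_backlinks(outlinks)
```

With the links reversed, a visitor to `sally.example` could look up `backlinks["sally.example"]` and find Harry's comment, which is the annotation idea Page describes.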
Page’s method for reversing links was based on an audacious idea that struck him in the middle of the night when he woke up from a dream. “I was thinking: What if we could download the whole Web, and just keep the links,” he recalled. “I grabbed a pen and started writing. I spent the middle of that night scribbling out the details and convincing myself it would work.”144 His nocturnal burst of activity served as a lesson. “You have to be a little silly about the goals you are going to set,” he later told a group of Israeli students. “There is a phrase I learned in college called, ‘Having a healthy disregard for the impossible.’ That is a really good phrase. You should try to do things that most people would not.”145
Mapping the Web was not a simple task. Even back then, in January 1996, there were 100,000 websites with a total of 10 million documents and close to a billion links between them, and it was growing exponentially each year. Early that summer, Page created a Web crawler that was designed to start on his home page and follow all of the links it encountered. As it darted like a spider through the Web, it would store the text of each hyperlink, the titles of the pages, and a record of where each link came from. He called the project BackRub.
Page told his advisor Winograd that, according to his rough estimate, his Web crawler would be able to accomplish the task in a few weeks. “Terry nodded knowingly, fully aware it would take much longer but wise enough to not tell me,” Page recalled. “The optimism of youth is often underrated!”146 The project was soon taking up almost half of Stanford’s entire Internet bandwidth, and it caused at least one campuswide outage. But university officials were indulgent. “I am almost out of disk space,” Page emailed Winograd on July 15, 1996, after he had collected 24 million URLs and more than 100 million links. “I have only about 15% of the pages but it seems very promising.”147
Both the audacity and the complexity of Page’s project appealed to the mathematical mind of Sergey Brin, who had been searching for a dissertation topic. He was thrilled to join forces with his friend: “This was the most exciting project, both because it tackled the Web, which represents human knowledge, and because I liked Larry.”148
BackRub was still, at that point, intended to be a compilation of backlinks on the Web that would serve as the basis for a possible annotation system and citation analysis. “Amazingly, I had no thought of building a search engine,” Page admitted. “The idea wasn’t even on the radar.” As the project evolved, he and Brin conjured up more sophisticated ways to assess the value of each page, based on the number and quality of links coming into it. That’s when it dawned on the BackRub Boys that their index of pages ranked by importance could become the foundation for a high-quality search engine. Thus was Google born. “When a really great dream shows up,” Page later said, “grab it!”149
At first the revised project was called PageRank, because it ranked each page captured in the BackRub index and, not incidentally, played to Page’s wry humor and touch of vanity. “Yeah, I was referring to myself, unfortunately,” he later sheepishly admitted. “I feel kind of bad about it.”150
That page-ranking goal led to yet another layer of complexity. Instead of just tabulating the number of links that pointed to a page, Page and Brin realized that it would be even better if they could also assign a value to each of those incoming links. For example, an incoming link from the New York Times should count for more than a link from Justin Hall’s dorm room at Swarthmore. That set up a recursive process with multiple feedback loops: each page was ranked by the number and quality of links coming into it, and the quality of these links was determined by the number and quality of links to the pages that originated them, and so on. “It’s all recursive,” Page explained. “It’s all a big circle. But mathematics is great. You can solve this.”151
This was the type of mathematical complexity that Brin could truly appreciate. “We actually developed a lot of math to solve that problem,” he recalled. “We converted the entire web into a big equation with several hundred million variables, which are the page ranks of all the web pages.”152 In a paper they coauthored with their two academic advisors, they spelled out the complex math formulas based on how many incoming links a page had and the relative rank of each of these links. Then they put it in simple words for the layperson: “A page has a high rank if the sum of the ranks of its backlinks is high. This covers both the case when a page has many backlinks and when a page has a few highly ranked backlinks.”153
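The recursive definition quoted above can be approximated by repeated passes over the link graph until the ranks settle. The sketch below is a minimal illustration, not the formula from the paper: the `damping` parameter (set here to the commonly cited 0.85), the fixed iteration count, and the handling of pages with no outgoing links are assumptions that this passage does not describe, and the four-page graph is hypothetical.

```python
def pagerank(outlinks, damping=0.85, iterations=50):
    """Iteratively estimate a rank for every page in a link graph.

    outlinks maps each page to the list of pages it links to.
    damping and iterations are illustrative assumptions.
    """
    pages = set(outlinks)
    for targets in outlinks.values():
        pages.update(targets)
    n = len(pages)

    # Start with every page equally ranked.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = outlinks.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical graph: a, b, and d all link to c; c links back to a.
graph = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(graph)
```

In this toy graph `c` ends up ranked highest because it has the most backlinks, and `a` outranks `b` even though each sends out one link, because `a` has a single backlink from the highly ranked `c`. That mirrors the two cases in the quoted summary: many backlinks, or a few highly ranked ones.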
The billion-dollar question was whether PageRank would actually produce better search results. So they did a comparison test. One example they used was searching university. In AltaVista and other engines, that would turn up a list of random pages that might happen to use that word in their title. “I remember asking them, ‘Why are you giving people garbage?’” Page said. The answer he got was that the poor results were his fault, that he should refine his search query. “I had learned from my human-computer interaction course that blaming the user is not a good strategy, so I knew they fundamentally weren’t doing the right thing. That insight, the user is never wrong, led to this idea that we could produce a search engine that was better.”154 With PageRank, the top results for a search on university were Stanford, Harvard, MIT, and the University of Michigan, which pleased them immensely. “Wow,” Page recalled saying to himself. “It was pretty clear to me and the rest of the group that if you have a way of ranking things based not just on the page itself but based on what the world thought of that page, that would be a really valuable thing for search.”155
Page and Brin proceeded to refine PageRank by adding more factors, such as the frequency, type size, and location of keywords on a Web page. Extra points were added if the keyword was in the URL or was capitalized or was in the title. They would look at each set of results, then tweak and refine the formula. They discovered that it was important to give a lot of weight to the anchor text, the words that were underlined as a hyperlink. For example, the words Bill Clinton were the anchor text for many links leading to whitehouse.gov, so that Web page went to the top when a user searched Bill Clinton, even though the whitehouse.gov site did not have Bill Clinton’s name prominently on its home page. One competitor, by contrast, had “Bill Clinton Joke of the Day” as its number-one result when a user searched Bill Clinton.156
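The anchor-text idea can be sketched as an index keyed by the words of incoming links rather than by the words on the page itself. This is a simplified illustration under stated assumptions: the function names, the exact-phrase matching, the ranking by raw link count, and the `jokes.example` URL are all hypothetical, and the real system combined anchor text with PageRank and the other factors mentioned above.

```python
from collections import Counter, defaultdict


def build_anchor_index(links):
    """Index target pages by the anchor text of links pointing at them.

    links is an iterable of (source_url, anchor_text, target_url) tuples.
    """
    index = defaultdict(Counter)
    for _source, anchor_text, target in links:
        index[anchor_text.lower()][target] += 1
    return index


def search(index, query):
    """Return target pages for an exact anchor phrase, most-linked first."""
    matches = index.get(query.lower(), Counter())
    return [url for url, _count in matches.most_common()]


# Hypothetical crawl data: two pages link to whitehouse.gov with the same anchor text.
links = [
    ("news.example/story", "Bill Clinton", "whitehouse.gov"),
    ("blog.example/post", "Bill Clinton", "whitehouse.gov"),
    ("humor.example/list", "Bill Clinton", "jokes.example"),
]
index = build_anchor_index(links)
results = search(index, "bill clinton")
```

Here `whitehouse.gov` comes first for the query even though the sketch never looks at the content of that page, which is the effect the passage describes.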