Internet development perspectives

This is an excerpt from a short essay I wrote in 2006 for FID.ru.

The emergence of the Internet, like many other important events in the history of our civilization, happened by accident. In response to the Soviet Union launching a satellite into space, the US government funded a project whose goal was to create a reliable transmission network, a network that would allow the Department of Defense to communicate in the event of a nuclear attack by the Soviet Union. No one expected that a purely military order would result in the peaceful technology we know today as the Internet. Another important technology, the World Wide Web (WWW), was also born by accident: it came out of the brilliant head of a physicist at the CERN institute in Switzerland. Interestingly, the WWW is not even included in the official list of CERN inventions. Judging from the past, it would be rational to expect that future discoveries will also be made by accident. Another argument in favor of this thesis is that many (if not most) of the popular Internet technologies, including peer-to-peer networks (Napster) and instant messaging (ICQ), were invented and implemented by individuals or small groups without significant capital or elite education. And this is despite the fact that all large IT corporations today have R&D divisions filled with egghead experts holding prestigious diplomas and boasting outstanding experience. In sum, the Internet evolves chaotically: people of different backgrounds and from different countries keep coming up with new and interesting use cases, and there is no single common vector of development, nor is there centralized management of any kind.

Every year we witness new trends on the Internet that quickly capture users' minds. Spam and Internet advertising, IP telephony and video conferencing, entertainment and erotic sites, blogs and chat rooms, online shopping and online casinos, virtual universities and file-sharing networks: all of these phenomena broke into our reality very quickly. Changes happen so often that we do not have time to assess their impact on society. It is very hard, if not impossible, to recognize a pattern or principle in how a technology becomes a huge trend. There are so many success factors involved that nobody can guarantee the impact of a new project, even a well-funded one; at the same time, dull or technically primitive resources sometimes gain users' attention effortlessly. One could perhaps establish a new branch of science to study the phenomenon of popularity on the Internet, and it would be closely associated with psychology. The Internet also gives us a great opportunity to study human nature, in particular its sociobehavioural characteristics, because no other technology allows us to observe hundreds of millions of people at once. The global network will, without any doubt, continue to evolve rapidly. It is also quite clear that no representative of the human race can at the moment predict exactly how it will develop. We can only make educated guesses based on the history and characteristics of human society.

I believe that in the near future we will see a major modification of the HTTP protocol, because web content distribution today is far from optimal, and evolution usually moves towards economic efficiency. So how does the World Wide Web actually work?

An end user, let's call him the consumer, in most cases starts with a visit to a search engine, where he finds links to pages that contain the information he is interested in. Where does the search engine get all this information? All search engines use a similar method: they regularly crawl the Internet using indexing robots. These robots save the gathered information in a search database. A search engine therefore knows only a historical state of a given website, even though its content may have already changed.
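To make the crawl-and-index cycle concrete, here is a minimal sketch in Python. The library choices (requests, BeautifulSoup) and the in-memory index are my own illustrative assumptions, not a description of how any real search engine is built; the point is only that the index holds a snapshot taken at fetch time.

```python
# Minimal sketch of the crawl-and-index cycle described above.
# The libraries and the in-memory "index" are illustrative assumptions,
# not a description of any real search engine.
import time
import requests
from bs4 import BeautifulSoup

index = {}  # url -> {"text": ..., "fetched_at": ...}

def crawl(urls):
    for url in urls:
        response = requests.get(url, timeout=10)
        text = BeautifulSoup(response.text, "html.parser").get_text()
        # The index stores a snapshot: whatever the page looked like
        # at fetch time, not what it looks like now.
        index[url] = {"text": text, "fetched_at": time.time()}

def search(query):
    # Matches are found in the stored snapshots, so results may point
    # to content that has since changed or disappeared.
    return [url for url, doc in index.items() if query.lower() in doc["text"].lower()]

crawl(["https://example.com"])
print(search("example"))
```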

How does information (content) get to the websites in the first place? Another end user, let's call him the supplier, wants to publish some information on the Internet, for example information about his services. This supplier would usually look for as many sites as possible that are suitable for such a post, and then publish it on each of them, for a fee or free of charge. Note that the consumer has no way to verify the authenticity of this new post; it could have been posted by somebody else. Furthermore, the more sites the supplier wants to publish to, the more tedious and time-consuming the procedure becomes.

As a result, we get a flow like this:

[Figure “inet_future”: supplier → websites → search engines → consumer]

At the first stage, the information is always fresh and always authentic, and there is only one instance of it, so there is no storage redundancy. At the second stage, however, the content may be outdated because of modifications the supplier has made since, as well as any modifications applied by the websites; the information cannot be verified for source authenticity, and there is a clear issue of redundancy: the content is duplicated across several sites. The next stage, the search engines, holds information that is potentially even more outdated; once again there is no check of the author's authenticity, and we have another layer of redundancy, since each search engine stores this content in its own index. Finally, at the last stage, the consumer looks at the content through malfunctioning binoculars, or, one might even say, a time machine: he sees the content as it existed some time in the past, content that may by now have disappeared altogether. Along the way, the content has been replicated on numerous servers around the world, often thousands or tens of thousands of computers.

Looking at the flow described above, it becomes obvious that the current state of the World Wide Web is not optimal, and consequently it would make economic sense to implement a new technology that reduces the costs associated with storage redundancy and also brings some order to the chaotic Internet: content authenticity and efficient content updates. Ideally, I see the future model of the WWW as follows: a content supplier uses his personal password (or any other means of authentication) to log in to a particular site where he publishes information (content), and that content is immediately exported to all subscribed websites in the world. Website content is therefore not stored statically; each website is assembled dynamically from content modules. Imagine a wide brick wall where each brick belongs to a different person. Every time one of these people repaints his brick in another color, the wall is instantly updated. At the same time, new bricks belonging to other content suppliers can be added on either side of the wall. Every brick could be part of multiple walls simultaneously, and bricks could come and go as they wish. Search engines would not need to waste CPU time indexing similar websites; instead they could look at the content directly, inside the bricks. If a consumer looks for a yellow brick, quite often he does not care who (which walls) has used that brick; the consumer cares about the content.
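The brick-wall idea can be sketched in a few lines of Python. Everything below (the Brick and Wall classes and their methods) is a hypothetical illustration of the model under the assumption that walls hold references to bricks rather than copies; it is not an existing protocol or API.

```python
# Hypothetical sketch of the "brick wall" model: bricks are content modules
# owned by suppliers, walls are websites assembled from subscribed bricks.
class Brick:
    def __init__(self, owner, content):
        self.owner = owner
        self.content = content  # the single authoritative copy

    def repaint(self, new_content):
        # Updating the brick updates every wall that includes it,
        # because walls never keep their own copy.
        self.content = new_content

class Wall:
    def __init__(self, name):
        self.name = name
        self.bricks = []  # references to bricks, not copies

    def subscribe(self, brick):
        self.bricks.append(brick)

    def render(self):
        # The page is assembled dynamically from the current bricks.
        return "\n".join(f"[{b.owner}] {b.content}" for b in self.bricks)

# Usage: one brick shared by two walls stays consistent after an update.
brick = Brick("supplier-1", "My services, 2006 price list")
wall_a, wall_b = Wall("site-a"), Wall("site-b")
wall_a.subscribe(brick)
wall_b.subscribe(brick)
brick.repaint("My services, updated price list")
print(wall_a.render())
print(wall_b.render())
```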

At the moment I imagine it as follows: large Internet players such as Google, MSN and Yandex, which commonly offer free email services, introduce content management tools. The notion of content providers appears: specialized servers on the Internet that store the “bricks”, the end-user content. The accounts that users maintain with the flagship Internet companies serve as passports on the global web, and these addresses are used for authentication when content is published. One moment the user checks his mail, and the next he publishes a news post that is displayed on all relevant web pages, the pages that have “subscribed” to this user. Site owners decide which users should generate the content of their websites. Search engines look for information directly at the content providers. Perhaps content providers would even operate their own search interfaces that other users, content providers and search engines could rely on. As a result, we would get a full-featured search of the entire web in which the data is always fresh. Some content was updated just a minute ago? It is already part of the search. And that content is stored in only one place, and anybody can verify its authenticity.
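A minimal sketch of such a content provider, again in Python and again entirely hypothetical: the account “passport”, the password check and the dictionary storage are stand-ins I am assuming for illustration. It shows the three properties argued for above: authenticated publishing, a single stored copy, and search that runs against live data.

```python
# Hypothetical sketch of a content provider: users authenticate with an
# account ("passport"), publish once, and search queries the provider's
# live data directly. Names and storage scheme are assumptions only.
import time

class ContentProvider:
    def __init__(self):
        self.accounts = {}  # address -> password (stand-in for real authentication)
        self.posts = {}     # address -> list of posts (the single authoritative copy)

    def register(self, address, password):
        self.accounts[address] = password
        self.posts[address] = []

    def publish(self, address, password, text):
        # Only the account owner can publish under this address,
        # so the authorship of every post can be verified.
        if self.accounts.get(address) != password:
            raise PermissionError("authentication failed")
        self.posts[address].append({"text": text, "published_at": time.time()})

    def search(self, query):
        # Search runs against the provider's live data: a post published a
        # minute ago is already findable, and there is only one stored copy.
        return [
            (address, post["text"])
            for address, posts in self.posts.items()
            for post in posts
            if query.lower() in post["text"].lower()
        ]

# Usage: publish once, and the post is immediately searchable.
provider = ContentProvider()
provider.register("user@example.com", "secret")
provider.publish("user@example.com", "secret", "New services available")
print(provider.search("services"))
```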
