Monday, 25 July 2011

Article Uses The Internet, Applications And The Introduction

HTTP, TCP/IP, PPP, HTML, URLs ... Are you new to this emerging frontier of the Internet but starting to suffer from acronym overload? While new acronyms and abbreviations seem to be invented every day, there's a common set of underlying mechanisms being used on the Internet, and I'm going to explain these in layperson's terms. This article is aimed at anyone wanting a better understanding of how Internet applications work and communicate with each other. It always helps to know the foundation of a particular technology, no matter how experienced or how new you are.
In this article I do assume a beginning familiarity with using Internet applications such as Web browsers and e-mail programs. I also assume a small amount of computer and technical knowledge, but even if you're new to using computers as well as being new to the Internet, you should be able to follow along with maybe an occasional look in a technical dictionary.

What Does Protocol Mean?

This article is about Internet protocols, so a definition of the term 'protocol' is helpful. In real life, a protocol is a set of procedures and customs that aid communication and relationships between people. The term is often used in governmental foreign relations and similar human discourse. In computer networking, a protocol has a similar but more specific meaning. A network protocol is the set of very detailed rules, sequences, message formats, and procedures that computer systems use and understand when exchanging data with each other. To say it in a slightly different way, a network protocol (including all of the Internet protocols) describes how computer systems communicate with each other at the bit and byte level. Network protocols are layered on top of each other, with each layer providing additional capabilities while using the facilities provided by the layer below it.
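To make 'message formats at the bit and byte level' a little more concrete, here's a minimal sketch in Python of a made-up message format. The format itself (a 1-byte message type, a 2-byte length, then the payload) and the names pack_message and unpack_message are invented for illustration; they are not part of any real Internet protocol.

```python
import struct

# Hypothetical header: 1-byte message type, 2-byte payload length,
# both in network (big-endian) byte order.
HEADER = struct.Struct("!BH")


def pack_message(msg_type, payload):
    """Prefix the payload with the header both sides have agreed on."""
    return HEADER.pack(msg_type, len(payload)) + payload


def unpack_message(data):
    """Reverse pack_message; reject data that breaks the agreed rules."""
    msg_type, length = HEADER.unpack_from(data)
    payload = data[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("message is shorter than its header claims")
    return msg_type, payload


if __name__ == "__main__":
    wire = pack_message(1, b"hello")
    print(wire)                  # the raw bytes that would be sent
    print(unpack_message(wire))  # what the receiving side recovers
```

Both sides have to agree on every detail (field order, field sizes, byte order), which is exactly what a protocol specification pins down.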

Low Level Protocols

Let's start with the lowest-level Internet protocol, and gradually move upwards to the higher levels (the higher level protocol names should sound more familiar).

Starting at the Bottom - IP, IP Addresses

The most fundamental protocol is called IP, for (you guessed it) 'Internet Protocol'. IP is not responsible for much, only for transmitting each chunk of data from one system to another. Of more interest is how IP identifies the network location of the systems that send chunks of data back and forth. The common term for a network location is 'address', and each system on the Internet has an address. This address is called an IP address, and there are two formats for an IP address. Internally, each computer system uses an IP address that is composed of four numbers, usually written for humans with dots between the numbers. An example IP numeric address is '198.137.231.1' (which happens to be the main IP address for NW Nexus in Bellevue, WA). However, since it's easier for humans to remember names than numbers, most IP addresses have corresponding English-like names, also separated with dots. The previous address written as a name is 'halcyon.com'. Scattered throughout the Internet are systems with the responsibility of translating Internet name addresses into the IP address numeric form. These systems are called 'name servers'.
In general, it is better to use an Internet name address rather than the IP numeric address. This is because IP numeric addresses can sometimes change for a given location, and the change will be transparent if you are using the Internet name address rather than the IP numeric address. (The name servers have to be updated, of course.) Occasionally you do need to use the numeric form of an Internet address, and most Internet applications allow you to enter either format.
Another term used in conjunction with Internet name addresses is 'host name', because every Internet address must correspond to a computer system (a 'host') somewhere on the Internet. The name-to-number translation service is called the 'Domain Name System', or DNS, and the name servers that provide it are often called DNS servers.
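Here's a short Python sketch of the two address forms described above. It relies on the fact that each of the four numbers fits in one byte, so a numeric address is really a single 32-bit value. The function names are my own, and resolve() only works for Internet names when the machine has a working network connection and name server, so it is defined here but not called.

```python
import ipaddress
import socket


def dotted_to_int(address):
    """Return the single 32-bit number behind a dotted numeric address."""
    return int(ipaddress.IPv4Address(address))


def int_to_dotted(number):
    """Turn a 32-bit number back into the dotted form humans read."""
    return str(ipaddress.IPv4Address(number))


def resolve(host_name):
    """Ask the system's resolver (and so the name servers) for the
    numeric addresses that belong to a name."""
    infos = socket.getaddrinfo(host_name, None, family=socket.AF_INET)
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    number = dotted_to_int("198.137.231.1")
    print(number)
    print(int_to_dotted(number))
```

Calling resolve() with a host name returns whatever numeric addresses the name servers report at that moment, which is why using the name keeps working even when the numbers change.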

TCP and UDP

So far in this article I've explained how systems communicate at a very low level, using IP addresses in either a numeric or name form to identify each other. The IP layer doesn't provide many capabilities other than sending chunks of data back and forth. Much more is needed than that, which is where TCP and UDP come in. The TCP protocol provides a virtual connection between two systems (which means there may be many actual physical connections that make up the virtual connection), along with certain guarantees on the data chunks (called 'packets') that are passed between the systems. Two guarantees are retransmission of packets that are dropped (because of some network problem), and ensuring that the packets are received in the same order that they are sent (there can be multiple routes that a packet can take while traversing the Internet). A third guarantee is that each packet received by the application has exactly the same content as when it was sent. If a bit has changed or been dropped for some reason, TCP will detect it and cause the packet to be re-transmitted. TCP (which is an abbreviation for 'Transmission Control Protocol') is very common on the Internet, and is almost always mentioned together with IP, making the acronym TCP/IP (TCP running on top of IP).
Some applications use a different protocol running on top of IP called UDP ('User Datagram Protocol'). UDP sends data one chunk at a time (called a 'datagram') to the other system and doesn't provide a virtual connection like TCP does. UDP also doesn't provide the same guarantees that TCP does, which means that datagrams may be lost or arrive out of sequence. Each received datagram is checked for internal integrity (like TCP), but if it has been corrupted it is dropped, rather than re-transmitted (as TCP does).
You might be wondering why UDP is used instead of TCP since UDP is not as reliable. To provide the extra guarantees, TCP has a lot of overhead compared to UDP, which makes TCP slower than UDP. For applications where performance is more important than reliability, UDP makes more sense. Some examples include audio and video streaming over the Internet, and Internet phone applications.
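The difference between the two can be seen in a small Python sketch that sends the same bytes both ways between two sockets on the local machine (the 'loopback' address 127.0.0.1). The function names are mine, and running on one machine hides the packet loss and reordering a real network can cause, so treat this as an illustration of the two programming styles rather than a test of reliability.

```python
import socket

LOOPBACK = ("127.0.0.1", 0)   # port 0 lets the system pick a free port


def udp_round_trip(message):
    """Send one datagram; there is no connection and no retransmission."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(LOOPBACK)
        receiver.settimeout(2)   # a lost datagram is simply gone
        sender.sendto(message, receiver.getsockname())
        data, _ = receiver.recvfrom(4096)
        return data


def tcp_round_trip(message):
    """Open a virtual connection first, then send bytes over it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(LOOPBACK)
        listener.listen(1)
        address = listener.getsockname()
        with socket.create_connection(address, timeout=2) as client:
            connection, _ = listener.accept()
            with connection:
                client.sendall(message)
                client.shutdown(socket.SHUT_WR)   # signal "no more data"
                chunks = []
                while True:
                    chunk = connection.recv(4096)
                    if not chunk:                 # other side has finished
                        break
                    chunks.append(chunk)
                return b"".join(chunks)


if __name__ == "__main__":
    print(udp_round_trip(b"one datagram"))
    print(tcp_round_trip(b"bytes over a connection"))
```

Notice that the UDP version needs a timeout because nothing tells the receiver a datagram went missing, while the TCP version has a definite beginning (the connection) and end (the shutdown).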

Client / Server Concepts

Now that I've explained a little bit of how systems communicate using TCP/IP and UDP/IP, I'm going to explain a higher-level concept that is used throughout the Internet. Sometimes two applications that are communicating with each other are communicating as true peers, where each can send data back and forth at will, with each application initiating as well as responding to messages. More typical, however, is that one application will initiate a request, and the other application will respond. The initiating application is called a client, and the responding application is called a server. Usually a server application handles multiple client connections at the same time, and runs on a more powerful system hooked up to the Internet. Client / server communication is at the heart of most Internet applications.
Without looking at the protocol details yet, here's a brief look at typical client / server relationships on the Internet:
  • Web browser - client application, requests Web pages to browse
  • Web server - server application, knows where Web pages are stored, responds to Web browser requests
  • FTP application - client app, initiates file transfers (either uploads or downloads)
  • FTP server - server app, knows where files are archived, responds to ftp transfer requests
  • E-mail app - client app, sends and receives Internet e-mail
  • E-mail server - server app, takes e-mail, makes sure it gets to the right place, provides it to the destination e-mail app (typically uses the SMTP protocol)
  • IRC client - client app, interactive chat program
  • IRC server - server app, handles communication between all the IRC clients
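The request and response pattern shared by all of the pairs above can be sketched in Python. The 'uppercase' service here is made up purely for illustration (the server answers each line of text with the same line in capitals), as are the names UppercaseHandler, start_server, and ask. The server uses one thread per connection so it can handle several clients at once.

```python
import socket
import socketserver
import threading


class UppercaseHandler(socketserver.StreamRequestHandler):
    """Server side: wait for a request line, send back a response."""

    def handle(self):
        line = self.rfile.readline()
        self.wfile.write(line.upper())


def start_server():
    """Start the server on a free local port in a background thread."""
    server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), UppercaseHandler)
    server.daemon_threads = True
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def ask(address, text):
    """Client side: initiate the connection, send a request, read the reply."""
    with socket.create_connection(address, timeout=5) as sock:
        sock.sendall(text.encode("ascii") + b"\n")
        with sock.makefile("rb") as reply:
            return reply.readline().decode("ascii").strip()


if __name__ == "__main__":
    server = start_server()
    print(ask(server.server_address, "hello from the client"))
    server.shutdown()
    server.server_close()
```

The client always speaks first and the server only ever responds, which is the defining feature of the client / server relationship.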

SLIP and PPP

If you're running Internet client applications on a system connected to a LAN ('Local Area Network'), you are probably running IP on top of an Ethernet or Token Ring network (or something newer) with a dedicated Internet connection. However, many people are now connecting to the Internet (through an Internet Service Provider, commonly abbreviated as ISP) by dialing up through a modem. Since IP wasn't designed to be used over dial-up lines, this requires yet another protocol. SLIP and PPP both allow IP data to be sent over dial-up lines. SLIP is an abbreviation for 'Serial Line IP' and PPP is short for 'Point-to-Point Protocol'. Both take IP data and package it up so that it can be sent over modem dial-up lines. PPP is considered to be newer and better than SLIP, although many Internet providers continue to support SLIP dial-up access.
While connected to an ISP using SLIP or PPP, your system is now another location on the Internet, with its own IP address. Your account with the ISP may assign you a permanent, fixed IP address and name, or it may provide what is called a 'dynamic' IP address. Since at any given time only a subset of dial-up lines are in use for an ISP, the provider may assign an IP number (and also typically an IP name) from a pool of available addresses.
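Here's a rough Python sketch of how a pool of dynamic addresses might behave. The AddressPool class, the caller names, and the addresses (taken from a range reserved for documentation) are all hypothetical; a real ISP's equipment is far more involved.

```python
from collections import deque


class AddressPool:
    """Hands out addresses to callers as they dial in (hypothetical)."""

    def __init__(self, addresses):
        self._free = deque(addresses)
        self._in_use = {}

    def connect(self, caller):
        if not self._free:
            raise RuntimeError("all dial-up addresses are in use")
        address = self._free.popleft()
        self._in_use[caller] = address
        return address

    def disconnect(self, caller):
        # The address goes back into the pool for the next caller.
        self._free.append(self._in_use.pop(caller))


if __name__ == "__main__":
    pool = AddressPool(["192.0.2.10", "192.0.2.11"])
    first = pool.connect("alice")
    pool.connect("bob")
    pool.disconnect("alice")
    pool.connect("carol")      # carol reuses the address alice gave up
    pool.disconnect("bob")
    second = pool.connect("alice")
    print(first, second)       # alice's address differs between sessions
```

This is why a system with a dynamic address can't be reached reliably by its numeric address from one session to the next.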

Winsock

Many users who are connected to the Internet use MS Windows to run their Internet client applications. MS Windows has a standard interface that provides TCP/IP and UDP/IP (and other network protocol) support. This interface is called 'Winsock' and is implemented in a system file named 'winsock.dll' (usually located in the Windows or Windows system directory). Most MS Windows Internet applications use the Winsock interface to provide IP connectivity to the Internet, although there are still older proprietary IP implementations and interfaces being used. Most Winsock implementations provide PPP capabilities (and SLIP) as well as LAN connectivity. When using a Winsock connection (whether SLIP, PPP, Ethernet, or some other access type), the PC is a true Internet system (sometimes called a 'node'), with all the potential of other Internet systems. This is in contrast to dial-up connections and accounts that provide only terminal or character-mode access (sometimes called 'shell accounts'). These types of connections don't use the Winsock interface, and the PC is then not a true Internet node. (Plus you don't get the Windows graphical interface while using 'shell' access.)
Here's a link to an excellent collection of Winsock applications: The Ultimate Collection of Winsock Software

ISDN

ISDN ('Integrated Services Digital Network') isn't one of the network protocols I'm writing about in this article, but I want to mention it because it's starting to be used a lot. ISDN provides a higher-speed way to dial up from your computer to an ISP (assuming your ISP provides ISDN access). Instead of analog modems, which commonly provide up to 28.8 kbps, ISDN requires special modems that take advantage of the digital capabilities of the service. To the IP protocol, ISDN is simply a different transport for the IP messages. Many Winsock implementations provide PPP over ISDN lines, and it will be an integral part of Win95 future releases.
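To get a feel for what the extra speed means, here's a bit of arithmetic in Python. The 28.8 kbps figure comes from the paragraph above; the ISDN rates (64 kbps for one channel, 128 kbps for two channels used together) and the 1 MB file size are my own assumptions for the example, and the calculation ignores protocol overhead, so real transfers will take somewhat longer.

```python
def transfer_seconds(size_bytes, bits_per_second):
    """Ideal transfer time, ignoring protocol overhead and line noise."""
    return size_bytes * 8 / bits_per_second


FILE_SIZE = 1_000_000   # a hypothetical 1 MB download

LINKS = {
    "28.8 kbps analog modem": 28_800,
    "one 64 kbps ISDN channel": 64_000,      # assumption, not from the article
    "two ISDN channels together": 128_000,   # assumption, not from the article
}

if __name__ == "__main__":
    for name, rate in LINKS.items():
        print(f"{name}: {transfer_seconds(FILE_SIZE, rate):.0f} seconds")
```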
Here's a link to an excellent ISDN resource: Dan Kegel's ISDN Page

Internet Application Protocols

At this point we know a little bit about how different systems communicate with each other on the Internet at the lower protocol levels. Your workstation or PC is most likely on a LAN that is connected to the Internet, or using PPP or SLIP to dial up to an ISP. TCP/IP or UDP/IP is used to send packets of information back and forth between your system and a remote system somewhere on the Internet. Now let's discuss the higher-level applications that use a client / server relationship to send information back and forth. These higher-level applications should be more familiar, since they are what we use to do things on the Internet.

FTP and Telnet

FTP ('File Transfer Protocol') is a way to upload and download files on the Internet. Typically a site on the Internet stores a number of files (they could be application executables, graphics, or audio clips, for example), and runs an FTP server application that waits for transfer requests. To download a file to your own system, you run an FTP client application that connects to the FTP server and requests a file from a particular directory or folder. Files can also be uploaded to the FTP server, if appropriate access is granted. FTP differentiates between text files (usually ASCII) and binary files (such as images and application executables), so care must be taken in specifying the appropriate type of transfer. When an Internet site makes files available to the general public, this is called 'anonymous' FTP. A password does not need to be supplied, although the user's e-mail address is typically requested. Some sites have confidential files or directories, and an FTP login and password are needed to download or upload.
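As a rough illustration, here's how an anonymous FTP download might look using Python's standard ftplib module. The list of text-file suffixes, the function names, and the default e-mail address are my own assumptions, and download() needs a reachable FTP server, so it is defined here but not called.

```python
from ftplib import FTP

# Assumption: a rough guess at which suffixes mean "text file".
TEXT_SUFFIXES = (".txt", ".html", ".htm", ".c", ".h")


def is_text_file(filename):
    """Decide whether to ask for a text or a binary transfer."""
    return filename.lower().endswith(TEXT_SUFFIXES)


def download(host, directory, filename, email="guest@example.com"):
    """Fetch one file by anonymous FTP into the current directory."""
    with FTP(host) as ftp:
        # Anonymous FTP: no real password, the e-mail address is a courtesy.
        ftp.login(user="anonymous", passwd=email)
        ftp.cwd(directory)
        if is_text_file(filename):
            with open(filename, "w") as out:
                ftp.retrlines("RETR " + filename,
                              lambda line: out.write(line + "\n"))
        else:
            with open(filename, "wb") as out:
                ftp.retrbinary("RETR " + filename, out.write)


if __name__ == "__main__":
    print(is_text_file("README.TXT"), is_text_file("photo.gif"))
```

Choosing the wrong transfer type matters because a text transfer may convert line endings, which corrupts a binary file.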
Telnet is a way to log in remotely to another system on the Internet. A telnet server must be running on the remote system, and a telnet client application is run on the local system. When you are logged in to a system using telnet, it is as if you were logged in locally and using the operating system command line interface on the telnet server system. Typical operating systems for telnet servers are Unix, Windows NT, and VMS.

HTTP and HTML

HTTP ('HyperText Transfer Protocol') is the primary protocol of the World Wide Web (WWW). When a Web browser (such as Netscape) connects to a Web server, it uses HTTP to request Web pages. A Web browser is an Internet client application, and the Web server is an Internet server application. HTTP has the ability to transfer Web pages, graphics, and any other type of media that is used on the Web. HyperText Markup Language (HTML) is not an Internet protocol - it is the internal format of Web pages. HTML consists of a set of tags and internal commands that are embedded inside Web pages to control the appearance and layout of Web pages, as well as links to other Web pages.
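Under the hood, an HTTP exchange is just lines of text sent over a TCP connection. Here's a minimal Python sketch that builds the kind of request a browser sends and picks apart the kind of response a server returns. The function names are mine, the request is the simplest possible one (real browsers send many more header lines), and the sample response is made up for the example.

```python
def build_get_request(host, path="/"):
    """Build a bare-bones HTTP/1.0 request for one page."""
    lines = [f"GET {path} HTTP/1.0", f"Host: {host}", "", ""]
    return "\r\n".join(lines).encode("ascii")


def parse_response(raw):
    """Split a response into status code, headers, and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    _version, code, _reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), headers, body


if __name__ == "__main__":
    print(build_get_request("www.unitedmedia.com", "/comics/dilbert/"))
    # A made-up response; the body is the HTML the browser would display.
    sample = (b"HTTP/1.0 200 OK\r\n"
              b"Content-Type: text/html\r\n"
              b"\r\n"
              b"<html><body>Hello</body></html>")
    print(parse_response(sample))
```

The split between the two is visible here: HTTP is everything up to the blank line, and HTML is what travels in the body after it.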
Support for FTP, telnet, SMTP, and many other Internet protocols is built into Web browsers. FTP, for example, can be used to download application executables as well as other files (when you are asked for a 'save file' location, either HTTP or FTP may be transferring the file, depending on the protocol named in the link).

Web Addresses

A URL ('Uniform Resource Locator') is the mechanism used for Web addresses. URLs are used in Web browsers to find the location of a particular Web page. A URL consists of three main parts - the protocol, the host name, and the directory location. Here's an example Web address: http://www.unitedmedia.com/comics/dilbert/ The protocol is first, followed by a colon and two slashes. In this case it is using the HTTP protocol, which means a Web page is at that location. The next portion is the Internet host name www.unitedmedia.com. Somewhere on the Internet is a system with that name, with a corresponding IP numeric address provided by the Internet DNS service. The last portion is the directory location, in this case /comics/dilbert. Since a Web server will typically have many different Web pages in multiple directories, the URL provides a way of specifying where to look.
Another example URL: ftp://butler.hpl.hp.com/stl/
This specifies using the FTP protocol to go to a system named butler.hpl.hp.com, then to the stl directory on that system. A listing of the files in that directory will be displayed, and the appropriate files can be downloaded with the FTP protocol.
E-mail is handled with the mailto prefix: mailto:cliffg@codewrangler.net
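The three example addresses above can be taken apart with Python's standard urllib.parse module. The helper name split_url is my own. Note that a mailto address has no host-name portion in the URL sense, so that part comes back empty.

```python
from urllib.parse import urlsplit


def split_url(url):
    """Return the protocol, host name, and location parts of a URL."""
    parts = urlsplit(url)
    return parts.scheme, parts.hostname, parts.path


if __name__ == "__main__":
    for url in ("http://www.unitedmedia.com/comics/dilbert/",
                "ftp://butler.hpl.hp.com/stl/",
                "mailto:cliffg@codewrangler.net"):
        print(split_url(url))
```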

Internet E-mail

Internet e-mail uses a protocol called SMTP ('Simple Mail Transfer Protocol'). An e-mail client is used to compose and receive messages, and it communicates with an SMTP server which figures out where to send the message and takes responsibility for getting it there. An Internet e-mail address is composed of two parts - the user name, and the server location. For example, my Internet e-mail address is: cliffg@codewrangler.net
I have an account and user name cliffg with DomainDirect, and my registered server / domain name is codewrangler.net.
SMTP relies on having servers running at both sites (source and destination). If you're using PPP or SLIP to connect to the Internet, your system is typically not connected all the time, and for many users it doesn't have a fixed name (the name is dynamically assigned from a pool of names and addresses). In this case the e-mail is stored on the ISP's e-mail server, and after you log in, the e-mail client makes a separate connection (typically using the POP3 protocol) to retrieve the waiting messages.
E-mail systems on a LAN sometimes don't use the SMTP protocol (e.g. CC:Mail). In this case a translation is made between the two e-mail protocols so that e-mail can be interchanged. This is commonly called an e-mail gateway.
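Here's a small Python sketch of the two-part address and the hand-off to an SMTP server described in this section, using the standard email and smtplib modules. The function names, the sender address, and the server name mail.example.com are placeholders of mine; send() needs a real SMTP server that will accept your mail, so it is defined here but not called.

```python
from email.message import EmailMessage
import smtplib


def split_address(address):
    """Split an e-mail address into user name and server location."""
    user, _, domain = address.rpartition("@")
    return user, domain


def build_message(sender, recipient, subject, body):
    """Compose a message the way an e-mail client would."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = subject
    message.set_content(body)
    return message


def send(message, smtp_host="mail.example.com"):
    """Hand the message to an SMTP server, which takes responsibility
    for getting it to the destination. The host name is a placeholder."""
    with smtplib.SMTP(smtp_host) as server:
        server.send_message(message)


if __name__ == "__main__":
    print(split_address("cliffg@codewrangler.net"))
    print(build_message("me@example.com", "cliffg@codewrangler.net",
                        "Your article", "Thanks for the explanation."))
```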

Other Internet Application Protocols

Gopher is a menu-oriented method for organizing and finding information on the Internet, and Archie is a search service for locating files on anonymous FTP sites. Gopher or Archie clients connect to the corresponding servers that provide the requested information. Internet Relay Chat (IRC) is a text-based chat mechanism that runs over the Internet. IRC clients provide the user interface for typing, while IRC servers pass the information back and forth, as well as organize the channels that are used for chatting.
There are other Internet application protocols in use, with the same underlying client / server model of communication.

The Big Picture

Each system, or node, on the Internet is using IP addresses to connect and exchange data. Each node may be connected into the global Internet through a local LAN that has an Internet connection, or through dial-up SLIP or PPP access using an Internet provider, or through some other method (such as a shell account). TCP or UDP running on top of IP is used as a lower protocol for client / server protocols such as HTTP, FTP, Telnet, IRC, Gopher, and SMTP. Some client applications use only one high-level protocol, such as an FTP client, while others provide multi-protocol access (for example most Web browsers provide almost all high-level Internet protocols). Each high-level Internet protocol has a server that handles requests from the client application.
The most dramatic example is the global set of servers providing WWW access to Web browsers everywhere. Using the HTTP protocol, formatted text, graphics, and images are delivered from a Web (HTTP) server to Web clients (browsers), and hypertext links allow quick and easy access to other Web servers. The Java and JavaScript languages allow interactive applications to be written that reside on a Web server and run within a Web browser as needed.
Commercial services such as CompuServe and America Online have internal networks that are different from the Internet. Increasingly, however, they are offering network gateways between their internal networks and the Internet, and providing software that allows users to access either one.
This is an exciting time in the world of global communications, and my hope is that this article has helped explain and open up some of the mysteries of Internet network protocols.
