Discourse on web services

Modern web applications are all about web services. The idea of a web service is quite old, and over time many terms have become associated with it: RPC, XML-RPC, JSON-RPC, COM, DCOM, CORBA, RMI, “big” web services, RESTful services and so on. This eventually leads to confusion for those just starting out. So in this article, we will try to understand web services from the ground up.

The term “web service” says very little about what a web service really is. That is fine for a layperson, but from a developer’s perspective it is almost useless. So, to understand web services better, we have to start from the very beginning. Before we do so, it is assumed that you have a basic idea of HTTP, the Web, web applications, XML, HTML and similar terminology.

Definition of a computer program

A computer program is nothing but a set of instructions executed in a particular sequence to achieve the desired functionality. In the early days of computing, programs were very simple and written in assembly instructions. Then came high-level programming languages like FORTRAN, COBOL and C, which introduced many new features. The concept of a function or subroutine became very popular: a single monolithic program is broken into a number of subroutines, which are then called and executed by a master subroutine (in C, the main() subroutine calls the other subroutines). This made program code much more modular and maintainable. Before we proceed, for the sake of simplicity, procedure, function, method and subroutine are treated as the same thing and used interchangeably.
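As an illustration, here is a small Python sketch of a program broken into subroutines, where `main()` plays the master subroutine and calls the others, much as a C program’s `main()` would. The function names are invented for this example.

```python
# Hypothetical example: a small task split into subroutines, with main()
# acting as the master subroutine that calls the others in sequence.

def read_numbers():
    # Stand-in for real input such as a file or the keyboard.
    return [4, 8, 15, 16, 23, 42]

def total(numbers):
    return sum(numbers)

def average(numbers):
    # Reuses another subroutine instead of repeating its logic.
    return total(numbers) / len(numbers)

def main():
    numbers = read_numbers()
    return average(numbers)

if __name__ == "__main__":
    print(main())  # 18.0
```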

Quest for distributed computing

Around the same time, ideas for distributed computing, message passing and networking started to take shape. As systems were not yet powerful enough to put these ideas into practice, the discussion was limited to theory and mathematics at an academic level. The central quest of distributed computing was to allow a subroutine on one computer to execute another subroutine running on a different computer. This quest was a precursor to an even bigger goal: a set of interconnected, interoperable systems. The model of distributed computing was based on the way humans interact with each other. One person speaks to another and passes on an intended message. So if a human is considered as a subroutine that does some definite task, then in computer programs one subroutine would simply communicate with another by means of message passing. This was the basis of distributed computing, and it is still the central theme of any kind of modern application.
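Message passing between two subroutines can be sketched in Python with a thread and two queues. This is only an in-process analogy under our own assumptions: the two `queue.Queue` objects stand in for a network link, and `adder` and `send_and_wait` are hypothetical names. The point is that the caller never calls `adder` directly; it sends a message and waits for a reply message.

```python
import queue
import threading

def adder(inbox, outbox):
    # This "subroutine" shares no variables with its caller: it only receives
    # messages from its inbox and replies through its outbox.
    while True:
        message = inbox.get()
        if message is None:  # sentinel meaning "no more messages"
            break
        a, b = message
        outbox.put(a + b)

def send_and_wait(inbox, outbox, message):
    # The caller passes a message and blocks until the reply message arrives.
    inbox.put(message)
    return outbox.get()

if __name__ == "__main__":
    inbox, outbox = queue.Queue(), queue.Queue()
    worker = threading.Thread(target=adder, args=(inbox, outbox))
    worker.start()
    print(send_and_wait(inbox, outbox, (2, 3)))  # 5
    inbox.put(None)
    worker.join()
```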

With the rise of networking and the Internet, the academic interest gained a practical edge, and distributed computing took some bold steps.

RPC and OOP

RPC (Remote Procedure Call) was one of the first communication technologies to answer the quest of distributed computing. Strictly speaking, RPC is not a particular piece of hardware or software. Rather, it is an abstraction that describes how one program can execute a subroutine of another program running on the same computer or on a different computer connected by a network. Each major programming language implemented RPC in the form of libraries and exposed its functionality to developers as a normal procedure call. A developer making a procedure call could not tell just by looking whether it was a local or a remote call, because all the complex network-level issues were abstracted away. In theory this is simple, but in practice making remote procedure calls was not so easy. Many things could go wrong, and so the quest for better answers continued.
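The transparency described above comes from stubs. The following is a minimal, in-process sketch under stated assumptions: JSON is our arbitrary choice of wire format, `send_over_network` is a hypothetical stand-in for a real network hop, and `make_stub`, `server_dispatch` and `PROCEDURES` are names invented for this example. The caller invokes `remote_add(2, 3)` as if it were local, while the client stub marshals the call into bytes and the server stub unmarshals it and dispatches to the real procedure.

```python
import json

# --- remote program -------------------------------------------------------
def add(a, b):
    return a + b

PROCEDURES = {"add": add}  # procedures the remote program exposes

def server_dispatch(request_bytes):
    # Server stub: unmarshal the request, call the real procedure,
    # and marshal the result back into bytes.
    request = json.loads(request_bytes.decode("utf-8"))
    result = PROCEDURES[request["procedure"]](*request["args"])
    return json.dumps({"result": result}).encode("utf-8")

# --- "network" ------------------------------------------------------------
def send_over_network(payload):
    # Hypothetical stand-in for a real network hop (a socket in practice).
    return server_dispatch(payload)

# --- calling program ------------------------------------------------------
def make_stub(name):
    # Client stub: to the caller it looks like an ordinary local function.
    def stub(*args):
        payload = json.dumps({"procedure": name, "args": list(args)}).encode("utf-8")
        reply = json.loads(send_over_network(payload).decode("utf-8"))
        return reply["result"]
    return stub

remote_add = make_stub("add")

if __name__ == "__main__":
    print(remote_add(2, 3))  # 5
```

In a real RPC system, `send_over_network` is where timeouts, lost messages and partial failures surface, which is exactly the complexity that the stub hides only until something goes wrong.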

Along the same lines, computer systems started to grow in complexity and size, and traditional imperative languages struggled to handle that complexity. This marked the rise of a new programming paradigm, object-oriented programming (OOP). In OOP, everything is an object, in the sense that a complete program is viewed as a collection of interacting objects. The OOP analogy is also quite similar to human interaction. Each object encapsulates its internal data structure and allows its state to be modified only through exposed functions that other objects use to interact with it. This added further modularity and maintainability, and it helped in handling scalability and complexity, traits that are core to modern applications.
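A minimal Python sketch of encapsulation, reusing the account names from the examples later in this article; the `Account` class and `transfer` function are hypothetical. Other code changes an account’s state only through its methods, never by reaching into its data.

```python
class Account:
    """An object that hides its state and exposes behaviour through methods."""

    def __init__(self, owner, balance=0):
        self.owner = owner
        self._balance = balance  # internal state; by convention not touched directly

    def deposit(self, amount):
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self._balance += amount

    def withdraw(self, amount):
        if amount > self._balance:
            raise ValueError("insufficient funds")
        self._balance -= amount

    def balance(self):
        return self._balance

def transfer(source, target, amount):
    # Objects interact only through each other's public methods.
    source.withdraw(amount)
    target.deposit(amount)
```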

So the concept of RPC was extended to OOP, and the result is termed object-oriented remote procedure call (ORPC), remote invocation or remote method invocation (in OOP terminology, functions or subroutines are called methods). In remote invocation, an object on one system calls a method on another object located on a different computer system. Note, however, that this still did not solve the problems associated with RPC.
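Remote method invocation can be sketched the same way, this time with objects. In this hypothetical Python example, `RemoteProxy` uses `__getattr__` to turn any method call into a message naming the target object and method, and `handle` plays the role of the remote system, which looks the object up in a registry and invokes the method. The transport is a plain function call standing in for the network, and all names are made up for illustration.

```python
import json

class Counter:
    """An ordinary object living on the 'remote' system."""
    def __init__(self):
        self.value = 0

    def increment(self, by):
        self.value += by
        return self.value

# Hypothetical registry of objects the remote system lets callers reach.
REGISTRY = {"counter-1": Counter()}

def handle(request_text):
    # Remote side: find the target object, invoke the named method, reply.
    request = json.loads(request_text)
    target = REGISTRY[request["object"]]
    result = getattr(target, request["method"])(*request["args"])
    return json.dumps({"result": result})

class RemoteProxy:
    """Client-side proxy: method calls on it become messages to a remote object."""
    def __init__(self, object_id, transport):
        self._object_id = object_id
        self._transport = transport  # stand-in for the network

    def __getattr__(self, method):
        # Only called for attributes not found normally, i.e. the remote methods.
        def call(*args):
            request = json.dumps(
                {"object": self._object_id, "method": method, "args": list(args)}
            )
            return json.loads(self._transport(request))["result"]
        return call

if __name__ == "__main__":
    counter = RemoteProxy("counter-1", handle)
    print(counter.increment(2))  # looks like a local method call
```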

Problems and enhancement over RPC

The first major problem with RPC was difficulty of use. Ideally, RPC was supposed to abstract away the internal niceties of making a remote call, but that never happened. Networks are inherently unreliable, and this constantly caused new pain for the technology’s evangelists. To tackle the problem, developers added layer upon layer of abstraction to simplify development.

There are three notable ORPC technologies: Microsoft DCOM, CORBA and Java RMI. All of these technologies were capable and feature-rich, but those many features became the problem. They tried to do everything at once, resulting in ever newer and more complex systems; support for different protocols, binary formats, security, transactions and session management are just a few examples. No doubt they did an incredible job of achieving transparency at the RPC level by generating proxies and stubs, providing IDL compilers, and so on. But they only shifted the pain points from one layer to another. They ignored the most fundamental characteristic of networks: unreliability. These technologies were built on the assumption that the network is reliable, which it simply is not. There were many other such assumptions too, like zero latency, a single administrator, homogeneous networks, fixed topology and infinite bandwidth. And because there were so many features, these technologies required very special runtimes to function, adding to the cost.

Furthermore, the problems were not limited to technology; there were business problems too. Remember that the most important goal of distributed computing is interoperability among the connected systems, and these ORPC solutions failed precisely at that. It was not easy for objects in different runtimes to communicate with each other. The list of incompatibilities is too long to explain; even the representation of a simple Boolean differs across platforms. Third-party components were developed to act as bridges, but they brought their own cost, performance and maintenance issues.
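Booleans aside, even a plain 32-bit integer shows the kind of mismatch involved. In this small Python sketch using the standard `struct` module, two platforms that disagree on byte order read the same four bytes as different numbers; the value 1 is an arbitrary example.

```python
import struct

value = 1
little = struct.pack("<i", value)  # 32-bit int, little-endian: b'\x01\x00\x00\x00'
big = struct.pack(">i", value)     # 32-bit int, big-endian:    b'\x00\x00\x00\x01'

# A receiver that assumes big-endian order misreads the little-endian bytes.
misread = struct.unpack(">i", little)[0]
print(misread)  # 16777216, not 1
```

Binary protocols must therefore agree on byte order, sizes and layouts up front, which is exactly the kind of agreement that bridging components had to paper over.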

Eventually these technologies had to fail. However, one lesson can be taken away: simplicity is the ultimate sophistication, and it is simplicity that survives the Darwinian law of extinction.

The rise of the Web and XML

Parallel to this universe, there was another star on the horizon: the World Wide Web (or just the Web, or WWW). The history of the Web is very interesting and probably deserves a full-length article. The Web is one of the services provided over the Internet; others include e-mail, chat rooms, video conferencing and telephony. So the Web is a software system, while the Internet is the underlying network (the Web and the Internet are two different things, which is a source of confusion for many people). The Web is one big system of interlinked hypertext documents that are accessed via the Internet. These hypertext documents are written in HTML and linked to each other using hyperlinks.

There were similar systems before the Web, for example Gopher, WAIS and HyperCard. But the Web surpassed them all and emerged as the ultimate winner. There are many reasons for this; from the standpoint of this article, two are most important. First, it was simple: it could be used by anybody. Second was interoperability. The Web was designed to leverage existing systems and, most importantly, it utilized the Internet to its full extent. All the components of the Web, such as HTTP, HTML, URLs/URIs, servers and browsers, were simple enough to use and evolve. Two completely dissimilar systems could exchange information as long as they were connected to the network.

Next in line was XML (eXtensible Markup Language) as a data interchange format. Traditional communication technologies used different formats for message passing depending on the host platform’s capabilities, and message passing was based on the old style of argument-based function calls. XML, by contrast, allowed data to be sent between totally diverse systems, because all the data is represented as text marked up with tags. The XML format itself was truly interoperable.
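To see why XML travels well, here is a small sketch using Python’s `xml.etree.ElementTree`. The `<deposit>` message layout is hypothetical; what matters is that every value is carried as text inside tags, so the receiver needs only an XML parser, not knowledge of the sender’s language or memory layout.

```python
import xml.etree.ElementTree as ET

def to_xml(account, amount):
    # Every value is written as text inside markup.
    root = ET.Element("deposit")
    ET.SubElement(root, "account").text = account
    ET.SubElement(root, "amount", currency="USD").text = str(amount)
    return ET.tostring(root, encoding="unicode")

def from_xml(text):
    # Any system with an XML parser can recover the values.
    root = ET.fromstring(text)
    return root.findtext("account"), int(root.findtext("amount"))

if __name__ == "__main__":
    message = to_xml("ABC", 50)
    print(message)
    print(from_xml(message))
```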

And so the Web, together with XML, laid the foundation for today’s web services. Along with XML, the Web proved that it is possible to build highly scalable distributed applications that are truly interoperable.

Defining a web service

Now we are in a position to define what a web service really is. A service is any business function, algorithm, remote procedure/subroutine or resource. When such a service is published using the technologies of the Web, namely HTTP for communication and XML for message passing, the service is a web service.

In the simplest terms, web service implementations are built around two different notions. In the first style, web services are exposed over HTTP, but HTTP is used only as a simple transport protocol and nothing beyond that. On top of HTTP, these services layer their own protocols and standards such as SOAP, WSDL and XSD. Services in this style are called “big” web services. Then there are web services that use HTTP as a complete application protocol that defines the service’s behavior. They use the features of HTTP as they are, without any additional layer on top. Such web services are called resource-oriented or, when they follow REST more strictly, “RESTful” web services.

In the early days, big web services were more prominent, as they can be seen as an extension of traditional RPC-based mechanisms. RESTful web services are a more recent trend.

Early web services

XML-RPC, JSON-RPC and similar protocols are early examples of big web services. The idea was the same as that of traditional RPC, with only two changes: an XML envelope was used to encode the procedure call and exchange the data, and HTTP was used as the transport medium. Typically, only the HTTP POST method was used. These protocols did not leverage HTTP completely, and that is probably the essential characteristic of big web services: they use HTTP just for transport and nothing else, ignoring its semantics entirely.
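Python’s standard library happens to include an XML-RPC implementation, so the envelope can be shown with real calls: `xmlrpc.client.dumps` encodes a call as a `<methodCall>` document and `xmlrpc.client.loads` decodes it. The procedure name `account.add` and the endpoint URL are hypothetical, and the HTTP request is only constructed, never sent.

```python
import urllib.request
import xmlrpc.client

# Encode a call to a hypothetical "account.add" procedure with two arguments
# as an XML-RPC <methodCall> document.
body = xmlrpc.client.dumps(("ABC", 50), methodname="account.add")
print(body)

# The server decodes the same document back into arguments and a method name.
params, method = xmlrpc.client.loads(body)
print(method, params)  # account.add ('ABC', 50)

# Over HTTP the document is simply the body of a POST to a single endpoint.
# The URL is hypothetical and the request is built but never sent.
request = urllib.request.Request(
    "http://example.com/RPC2",
    data=body.encode("utf-8"),
    headers={"Content-Type": "text/xml"},
)
print(request.get_method())  # POST, because the request carries a body
```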

Big web services are verb-based, in the sense that systems communicate with each other using actions. A typical service request by a client would be something like:

  1. Add - $50 to account ABC
  2. Delete - user XYZ
  3. Update - profile picture of user XYZ
  4. Transfer - $50 from account ABC to DEF

JSON-RPC is similar to XML-RPC; the main difference is that JSON is used as the data interchange format.
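The verb-based requests listed above map naturally onto JSON-RPC. The sketch below builds request objects in the JSON-RPC 2.0 format (`jsonrpc`, `method`, `params`, `id`); the method names are hypothetical. Because the action lives in the message body, every request would typically be POSTed to the same endpoint.

```python
import json

def jsonrpc_request(method, params, request_id):
    # A JSON-RPC 2.0 request object: the action is named in "method".
    return json.dumps(
        {"jsonrpc": "2.0", "method": method, "params": params, "id": request_id}
    )

# The four verb-based requests above, with hypothetical method names.
calls = [
    jsonrpc_request("account.add", {"account": "ABC", "amount": 50}, 1),
    jsonrpc_request("user.delete", {"user": "XYZ"}, 2),
    jsonrpc_request("user.updateProfilePicture", {"user": "XYZ"}, 3),
    jsonrpc_request("account.transfer", {"from": "ABC", "to": "DEF", "amount": 50}, 4),
]

for body in calls:
    print(body)  # each would be sent to the same endpoint
```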

Modern SOAP based big web services

XML-RPC had some limitations: for example, it allowed only one method of serialization and supported a very limited set of data types. Several big vendors worked together to introduce new functionality, and the standard evolved into SOAP (Simple Object Access Protocol). SOAP is a highly sophisticated protocol for exchanging structured information. In fact, in my personal opinion, there is nothing “Simple” about SOAP, so the name is quite misleading. Just like XML-RPC, it has its own system of communication and uses HTTP only for transport.

Besides SOAP, which is the messaging protocol, the big web services protocol stack has other components. For service discovery there is UDDI (Universal Description, Discovery and Integration), and for service description there is WSDL (Web Services Description Language). HTTP is the most common transport protocol, but others like SMTP, FTP or BEEP (Blocks Extensible Exchange Protocol) can also be used. Over HTTP, SOAP 1.1 allows only the POST method, so server responses are not cached by intermediaries. SOAP 1.2 added support for the GET method.
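To get a feel for the extra layer SOAP adds, here is a sketch that builds a SOAP 1.1 envelope with Python’s `xml.etree.ElementTree`. The envelope namespace is the standard SOAP 1.1 one; the `bank` namespace and the `Transfer` operation are hypothetical, standing in for whatever a service’s WSDL would define. The resulting document would be sent as the body of an HTTP POST.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
BANK = "http://example.com/bank"                         # hypothetical service namespace

ET.register_namespace("soap", SOAP_ENV)
ET.register_namespace("bank", BANK)

def q(namespace, tag):
    # ElementTree writes namespaced names as "{namespace}tag".
    return f"{{{namespace}}}{tag}"

def soap_transfer(source, target, amount):
    envelope = ET.Element(q(SOAP_ENV, "Envelope"))
    ET.SubElement(envelope, q(SOAP_ENV, "Header"))  # slot for security, transactions, etc.
    body = ET.SubElement(envelope, q(SOAP_ENV, "Body"))
    call = ET.SubElement(body, q(BANK, "Transfer"))  # the operation being invoked
    ET.SubElement(call, q(BANK, "From")).text = source
    ET.SubElement(call, q(BANK, "To")).text = target
    ET.SubElement(call, q(BANK, "Amount")).text = str(amount)
    return ET.tostring(envelope, encoding="unicode")

if __name__ == "__main__":
    print(soap_transfer("ABC", "DEF", 50))
```

Compared with the plain XML message shown earlier, the same three values now sit inside an envelope, header and body defined by the SOAP specification, which is the custom layer the next paragraph refers to.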

As you might have guessed, these web services build everything custom on top of HTTP. This makes them relatively heavy, hence the name “big” web services.

Resource based perspective on web services

Recall that the Web is nothing but a huge collection of hypertext documents. These documents are the resources of the Web. By this analogy, everything on the Web is a resource. On any resource, you typically perform CRUD (Create, Read, Update, Delete) operations. HTTP has methods that can be used to perform these CRUD operations, as below:

  1. Create – POST method (or PUT when the client chooses the resource’s URI)
  2. Read – GET method
  3. Update – PUT method (idempotent replacement) or POST method (when the operation is not idempotent)
  4. Delete – DELETE method
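The choice between PUT and POST in the mapping above comes down to idempotency. This toy, in-memory sketch shows why; the `UserStore` class and the `/users/...` URIs are hypothetical, and no real HTTP is involved. Repeating a PUT to a client-chosen URI leaves the same state, while repeating a POST to the collection creates a new resource each time.

```python
import itertools

class UserStore:
    """Toy in-memory collection of user resources (no real HTTP involved)."""

    def __init__(self):
        self.users = {}
        self._next_id = itertools.count(1)

    def post(self, data):
        # POST /users: the server chooses the new URI, so repeating the
        # request creates another resource (not idempotent).
        user_id = str(next(self._next_id))
        self.users[user_id] = data
        return f"/users/{user_id}"

    def put(self, user_id, data):
        # PUT /users/{id}: the client names the resource; repeating the
        # request leaves the same final state (idempotent).
        self.users[user_id] = data
        return f"/users/{user_id}"

    def get(self, user_id):
        # GET /users/{id}: read only, never changes state.
        return self.users.get(user_id)

    def delete(self, user_id):
        # DELETE /users/{id}: deleting twice ends in the same state as once.
        self.users.pop(user_id, None)
```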

This is essentially the idea behind resource-based web services. Some examples of resource-based web service requests would be:

  1. GET – get user details
  2. PUT – upload the file to server
  3. POST – transfer $50 from account ABC to DEF
  4. DELETE – remove this user from the system
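The examples above can be written as plain HTTP requests. This sketch uses `urllib.request.Request` from Python’s standard library to construct (but not send) them; the host `api.example.com` and the URI layout are assumptions made for illustration. The action is carried by the HTTP method and the target by the URI, rather than by a custom envelope.

```python
import json
import urllib.request

BASE = "https://api.example.com"  # hypothetical service root

def build(method, path, body=None, content_type=None):
    # The HTTP method carries the action; the URI names the resource.
    headers = {"Content-Type": content_type} if content_type else {}
    return urllib.request.Request(BASE + path, data=body, headers=headers, method=method)

examples = [
    # get user details
    build("GET", "/users/XYZ"),
    # upload the file to the server
    build("PUT", "/files/report.txt", b"quarterly figures", "text/plain"),
    # transfer $50 from account ABC to DEF
    build("POST", "/transfers",
          json.dumps({"from": "ABC", "to": "DEF", "amount": 50}).encode("utf-8"),
          "application/json"),
    # remove this user from the system
    build("DELETE", "/users/XYZ"),
]

for request in examples:
    print(request.get_method(), request.full_url)  # built only, never sent
```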

In essence, resource-based services embrace HTTP as a whole, with all its semantics. They do not use custom envelopes for message passing; they simply use XML or JSON as the data interchange format. They make use of all the available HTTP methods. This architecture makes it possible to use the caching mechanisms of intermediaries, since the HTTP specification allows responses to GET requests to be cached.
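Why GET matters to intermediaries can be shown with a toy caching proxy. This is a deliberately simplified sketch: `CachingIntermediary` and `origin` are hypothetical, and real HTTP caches also honor headers such as Cache-Control and expiry rules that are ignored here. Responses to GET are stored and reused; any other method is forwarded to the origin and evicts the stored copy for that URL.

```python
class CachingIntermediary:
    """Toy proxy cache that stores GET responses and forwards everything else."""

    def __init__(self, origin):
        self.origin = origin  # callable (method, url) -> response
        self.cache = {}
        self.forwarded = 0    # how many requests actually reached the origin

    def request(self, method, url):
        if method == "GET" and url in self.cache:
            return self.cache[url]  # served by the intermediary, origin untouched
        self.forwarded += 1
        response = self.origin(method, url)
        if method == "GET":
            self.cache[url] = response
        else:
            # A non-GET request may change the resource, so drop any stored copy.
            self.cache.pop(url, None)
        return response

def origin(method, url):
    # Hypothetical origin server that just echoes what it was asked.
    return f"{method} {url} handled by origin"
```

A big web service that sends every call as a POST to one endpoint gives such an intermediary nothing it can safely store.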

Resource-based web services often adhere to the principles of REST (REpresentational State Transfer), but not always; some resource-based services follow only certain REST guidelines. So what really is REST? REST is not a software architecture; it is an architectural style, a set of constraints that an architecture must satisfy in order to be called RESTful. Making use of HTTP methods is just one aspect; there is more to REST than that. REST will be discussed in detail in the next article.

Again, resource-oriented or RESTful services tend to be lightweight in terms of data interchange, as there is no custom stack on top of HTTP, which is commonly the case with big web services.

Conclusion

I hope this article gives you a quick start to understanding web services. Now you also know why you should not compare SOAP with REST: that amounts to comparing a messaging protocol with an architectural style. In upcoming articles, we will explore the concepts of REST, SOA and ROA, along with practical use cases for choosing the appropriate style of web service for your project.