Wednesday, June 13, 2012

面向对象编程有感【2】-- Understand SRP and LSP

In this post, I will mainly talk about SRP(Single Responsibility Principle) and LSP(Liskov Substitution Principle) in Object Oriented Programming. As I want to show you something different from the traditional boring textbooks, a simple project is described. You may first examine the requirements and think about how to implement a system to get it done!
Requirement: We need to design a web spider, which could traverse all the web pages rooted from a seed link. There are a number of different types of pages, which need different operation to be performed. In addition, we may have to add more types in the future.

After you reading the simple description above, few design patterns may pop up in your mind, such as factory pattern, responsibility chain pattern, observer pattern, MVC pattern and singleton pattern. Right, all those patterns are designed to assist you in such a scenario. But, when you implement all the patterns together, there may be few mistakes happened. Let us take a look at my code first.
I defined few classes as follows:
  • BaseTask and its subclasses. The base class has few virtual functions, which generalize some common operations needed by each types of page.
Untitled
  • pObject and its subcalsses. As the data extracted from the pages may have different saving path and preprocessing requirement. pObject opens few virtual functions, such as where to save the data and how to preprocess it. Its derivatives override those functions and properly handle/keep the data. Untitled2
  • The next two classes are Download_Manager and Data_Manager. Download_Manager is the observer to the Data_Manager. Once a new task, which is in form of BaseTask, is added to Data_Manager, Download_Manager will try to pop the task from the task queue and download the page needed by this task. Once the page is downloaded, it will be assigned to the task and the task will return to Data_Manager, where the analyzing is performed. Some new tasks may be initialized and append to Data_Manager during analyzing. Furthermore, Singleton pattern is applied to both managers, so only few interfaces are opened to each other.
image
The reason why I design like this is that, the only object transferred between managers is BaseTask. So if we want to handle other types of pages, we could just implement another derivative class and push the objects into the task queue. In addition, each task has a “pObject”, which could handle the file path, URL and analyze process.
  • Inside each task object, as we need to open a constant interface to other components, I define a recursive function, called decompose(), which is used to traverse all XML blocks and allow other parts to invoke. The code segment looks like the follows:
1:  void BaseTask::Decompose( QDomElement parentElement )  
2:  {  
3:    QDomNode element = parentElement.firstChild();  
4:    //  
5:    while(!element.isNull())  
6:    {  
7:      int type = this->RightBlock( element.toElement() );  
8:      if( type!=0 )  
9:      {  
10:        Extract( element.toElement(), type );  
11:      }  
12:      Decompose( element.toElement() );  
13:      element = element.nextSibling();  
14:    }  
15:    return;  
16:  }


RightBlock() is used to determine which XML block is needed; Extract() is used to extract the information in the XML block. Each derivative class will override those two functions, and then, each object of derivative class will act in different ways. The following flow chart shows how this code works.

image



This is a common trick in C++ based on polymorphism. We could initialize an object with sub-class but assign it to a base class pointer. As the functions in base class are virtual, the function point in virtual function table will updated to the implementation in subclass.

So far so good, right. LSP seems applied properly. But I found a big mistake about SRP when I try to update the code. Which part violates SRP? Let us first understand what I want to update with the original architecture.

As my previous structure tries to do everything in one executable, it becomes very vulnerable, or buggy Sad smile. So I want to add a simple function that could resume the previous downloading. It seems simple but it is not. Because the tasks does not follow SRP. Let us first examine what happened inside a task object from its creation to release.

image


Can you find which part is worry? Right, the file system and URL interface is only a variable of the task objects. When I want to recover the downloading, the program may not need to download the page, but directly start the parsing. But based on the above sequence, the tasks are initialized by other tasks during parsing; parsing begins after download is finished; we want to recover and omit the download step… Now, you see the dead loop right. If we add more if-else, the original code may work, but not elegant.

Another part looking ugly is the URL interface. As the relative paths need to be translated into global, we not only need its parent object, but also modify the format. For example, the hierarchy of a website is as follows:


Each task object need to perform modification respectively. If I put the modification in tasks, it is easier and straight forward, but the code is hard to maintain. Imagine such a situation, all the levels of URL are updated and what shall we do? Update all relevant code in each subclass of BaseTask? What if we have 100 different tasks? It is not possible… The pObject in also lost its meaning of existence. As SRP says, we should put one responsibility in one object! We should put all URLs and paths task in pObject. The following diagram shows what the code should look like.

image


My previous design of the tasks looks like the left diagram. When the URLs input to task, there are more than one object could modify it. So it undermined the scalability of the code. The right hand diagram shows a design that keeps both scalability and the same functionalities. You feel familiar with the right hand side diagram. Because it likes the responsibility chain pattern, from some point of view. If any parts updated, you could just plugin another object to filter the input, and isolate the changes to the other component.

I have written so much. As I am also a green hand of this topic, some mistakes and errors may occur. If you found any, or just want to share your idea, please leave me a comment or directly email me.

Have fun!

1 comment: