Thursday, June 28, 2012

How to create a program to handle HUGE data?[0]

In the past few weeks, I have been working on a project that needs a program to "handle" millions of files. Here, "handle" means downloading millions of HTML files from a remote server, parsing the HTML to find the URLs and text of interest, downloading the images those URLs point to, and organizing all the files in a 3-level hierarchy.
In the following posts, I will discuss what I suffered through during development, what I applied to the system, and what I wanted to do but have not done yet. The topics to be covered fall into two categories: design philosophy and design patterns.
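
To make the "handle" steps a bit more concrete, here is a minimal Python sketch of two of them: pulling image URLs out of a page and mapping each file to a slot in a 3-level directory tree. It is only an illustration, not the project's actual code. The names `LinkExtractor`, `extract_image_urls`, `three_level_path` and the `data` root are hypothetical, and keying the hierarchy on a SHA-1 hash of the URL is an assumption (the post does not say how the three levels are chosen).

```python
import hashlib
from html.parser import HTMLParser
from pathlib import PurePosixPath
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect absolute image URLs from <img src=...> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                # Resolve relative links against the page's own URL.
                self.image_urls.append(urljoin(self.base_url, src))


def extract_image_urls(html, base_url):
    """Return the image URLs found in one HTML document."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.image_urls


def three_level_path(url, root="data"):
    """Map a URL to root/aa/bb/cc/<digest> (assumed layout, hash-based)."""
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()
    # Two hex characters per level gives 256 subdirectories at each level,
    # so no single directory has to hold millions of entries.
    return PurePosixPath(root, digest[0:2], digest[2:4], digest[4:6], digest)
```

With millions of files, spreading them across nested directories like this keeps any one directory small, which is one common reason for a multi-level layout. Downloading, retries and error handling are left out of the sketch on purpose.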

... to be continued...
