Intro |
After Deric built a component that parses the necessary data from AltaVista, we needed to
package the parsed information into one meaningful unit so that it could be imported into the ASP pages. Not all of the information from AltaVista was
necessary, so we had to extract only the fields we needed, such as URL, description, and rank
(please read Deric's report). To transfer the parsed data successfully,
we had to serialize the parsed information from AltaVista, push it to ASP, and then
deserialize it back into its useful parts once the whole string had been transferred. *This idea of serializing and deserializing ties in well with the bigger topic of marshalling and unmarshalling in distributed systems, and I have enclosed a brief introduction to marshalling and unmarshalling on a separate page. Here I focus only on how we transferred the data in our project. |
Possible Solutions |
There were a few possible ways to serialize the parsed data and deserialize the transferred information.
We did not use XML, even though it seemed like a very efficient solution, mainly because we did not have enough development time to learn about XML and Java integration. An Access database was another possibility, but it did not seem efficient in terms of processing time and disk space. Putting the parsed data into a database and then querying it did not seem worthwhile for our purpose, which is a simple search agent. There were other ideas, such as making another ASP component, but in the end a plain tab-delimited string seemed the most practical solution, since it integrates very well with Java, the main development language we used. |
What we actually did. |
We used Java to serialize the parsed information from AltaVista into one tab-delimited string (Nong made this),
and we used Java again to deserialize the tab-delimited string into its useful components. I think the serializing is done in the ToString method, but you will want to read his report.
Using a plain tab-delimited string worked out very well for us
because some of us already knew the StringTokenizer class in Java, which simplified the code very much. The StringTokenizer class
breaks a whole string into tokens using whatever delimiter you give it, in our case the tab character. For more about StringTokenizer, please follow
http://java.sun.com/products/jdk/1.1/docs/api/java.util.StringTokenizer.html
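The serializing side is described in Nong's report and is not reproduced here, so the following is only a minimal sketch of what joining fields with tabs might look like, together with a check of how StringTokenizer counts tokens. The class name, method names, and sample values are hypothetical, and the sketch assumes that no field contains a tab character.

```java
import java.util.StringTokenizer;

// Hypothetical helper for illustration; not the project's actual serializer.
class TabSerializeSketch {

    // Join the fields into one string, separated by tab characters.
    // Assumes that no field itself contains a tab.
    static String serialize(String[] fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                sb.append('\t');
            }
            sb.append(fields[i]);
        }
        return sb.toString();
    }

    // Count tokens using StringTokenizer's default delimiters
    // (space, tab, newline, carriage return, form feed).
    static int countDefault(String line) {
        return new StringTokenizer(line).countTokens();
    }

    // Count tokens using the tab character as the only delimiter.
    static int countTabOnly(String line) {
        return new StringTokenizer(line, "\t").countTokens();
    }

    public static void main(String[] args) {
        // Sample values made up for this sketch.
        String[] record = {
            "http://www.example.com/", "Example Page", "A sample description",
            "01-Nov-99", "English", "4K", "1"
        };
        String line = serialize(record);
        System.out.println(countDefault(line)); // prints 10: spaces also split
        System.out.println(countTabOnly(line)); // prints 7: one token per field
    }
}
```

The difference between the two counts is the reason to pass the tab explicitly as the delimiter whenever a title or description may contain spaces.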
    str[0] = URL
    str[1] = Title
    str[2] = Description
    str[3] = LastModified
    str[4] = Language
    str[5] = PageSize
    str[6] = Rank

    import java.util.StringTokenizer;

    public void Token(String strToBeTokened) {
        // Split on tabs only; the default delimiters would also split on the
        // spaces inside a title or description.
        StringTokenizer m = new StringTokenizer(strToBeTokened, "\t");
        for (int i = 0; i < 7; i++)
            str[i] = m.nextToken();
    }

I made a string array that holds seven strings, only the information we needed, namely URL, Title, Description, LastModified, Language, PageSize, and Rank. This information was parsed and put into one tab-delimited string by Deric and Nong. After that, we only need to call the Java class, which was made using the code above, with the tab-delimited string as its argument. The class parses the string into seven pieces and puts them into the string array. I also made several public get methods (not shown here) that each return one of the strings. The code shown here is a simplified version of the full working code. |
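Since the get methods are not shown, here is a minimal sketch of how such a class might be put together. The class name, the getter names, and the sample record are assumptions made for illustration and are not taken from the project's code. The sketch also checks the field count up front, which the simplified code does not do.

```java
import java.util.StringTokenizer;

// Hypothetical class and method names; the project's real class is not shown in this report.
class SearchResultSketch {
    private static final int FIELD_COUNT = 7;
    private final String[] str = new String[FIELD_COUNT];

    // Split a tab-delimited record into its seven fields.
    SearchResultSketch(String record) {
        StringTokenizer m = new StringTokenizer(record, "\t");
        int found = m.countTokens();
        if (found != FIELD_COUNT) {
            throw new IllegalArgumentException(
                "expected " + FIELD_COUNT + " fields, got " + found);
        }
        for (int i = 0; i < FIELD_COUNT; i++) {
            str[i] = m.nextToken();
        }
    }

    // One getter per field, in the same order as the array.
    String getUrl()          { return str[0]; }
    String getTitle()        { return str[1]; }
    String getDescription()  { return str[2]; }
    String getLastModified() { return str[3]; }
    String getLanguage()     { return str[4]; }
    String getPageSize()     { return str[5]; }
    String getRank()         { return str[6]; }

    public static void main(String[] args) {
        // Sample record made up for this sketch.
        SearchResultSketch r = new SearchResultSketch(
            "http://www.example.com/\tExample Page\tA sample description"
            + "\t01-Nov-99\tEnglish\t4K\t1");
        System.out.println(r.getTitle()); // prints "Example Page"
        System.out.println(r.getRank());  // prints "1"
    }
}
```

One thing to keep in mind with this approach is that StringTokenizer never returns empty tokens, so a record with an empty field (two tabs in a row) yields fewer than seven tokens and is rejected by the check above. If empty fields can occur, `record.split("\t", -1)` keeps them in place and may be a better fit.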
Useful Information |
Deitel & Deitel, "Java How to Program", 3rd Edition, pp. 495-497 |
Elliot's Hand-Out (from DS420, Fall 1999) |
Project Home Page |