RDC: Relevant Data Condenser
Knowledge Representation in the Cloud
Eric Ciminelli, Undergraduate Research Student

VAIL Undergraduate Research Group - ROLLINS COLLEGE
Dr. Jennifer Seitzer, Director
jseitzer@rollins.edu
407-646-2303


One of the newest genres of software systems is the smartphone app. This research will integrate artificial intelligence techniques into a mobile system that is accessible to anyone who has a smartphone. Artificial intelligence (AI) is the area of computer science that attempts to capture human reasoning in computer programs. Our application will use AI techniques to eliminate the need for users to constantly “check the net” for new and relevant information on a topic of their choosing. Instead, it will generate “Alerts”: short, text-message-like phone notifications containing the useful information. The AI techniques employed will include smart web crawling (scanning the Web for valuable patterns) and automatic language generation, together with the natural language understanding it relies on (as Apple’s Siri does). This summer, the app will be demonstrated on the topic of cryptocurrency (digital money) trading, although any topic could be used. The main deliverable of this summer research period (June 23rd through August 15th, 2014) will be a computer system composed of the following components: 1) a web crawler, 2) an AI inference engine that performs web page ranking, parsing, and language generation, 3) a server that pushes relevant information to 4) a smartphone client app that displays the information.

The seminal research paper in the sub-discipline of web mining is Brin and Page’s presentation of the Google search engine [Brin 1998]. The part of this paper most applicable to our crawling system is the PageRank algorithm. PageRank estimates a page’s importance from the link structure of the Web: a page ranks highly when many pages link to it, and especially when those linking pages are themselves highly ranked. Because each page’s rank depends on the ranks of the pages that link to it, the values are computed with a simple iterative algorithm that repeats until they settle. We plan to apply this kind of reasoning to decide which pages to visit next from the set of hyperlinks found on a page. We propose that, for each page searched, there will be a maximum number of its links that may be visited. The system will choose which of those links to follow based on the amount of relevant information in each candidate page and its publication date (older information is of little use to this app). Only the pages with the most information pertaining to the subject will be used. The following passage from their paper explains the main idea behind their search engine:


PageRank can be thought of as a model of user behavior. We assume there is a "random surfer" who is given a web page at random and keeps clicking on links, never hitting "back" but eventually gets bored and starts on another random page. The probability that the random surfer visits a page is its PageRank. And, the damping factor is the probability at each page the "random surfer" will get bored and request another random page.
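The random-surfer model quoted above can be computed by power iteration. The sketch below is illustrative, not the RDC implementation. It follows the formula in Brin and Page’s paper, where the surfer follows a link with probability d and jumps to a random page with probability 1 - d, using d = 0.85, the value the paper reports. It normalizes ranks so they sum to 1, matching the quote’s reading of PageRank as a visit probability. The function name `pagerank`, the dictionary graph format, the tolerance, and the choice to spread the rank of pages with no outgoing links evenly over all pages are assumptions made for this example.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], d: float = 0.85,
             tol: float = 1e-10, max_iter: int = 200) -> Dict[str, float]:
    """Power-iteration PageRank over a graph {page: [pages it links to]}.

    d is the probability that the random surfer follows a link; with
    probability 1 - d the surfer gets bored and jumps to a random page.
    Ranks are normalized so that they sum to 1.
    """
    # Every page that appears as a source or as a target is a node.
    pages = set(links) | {t for targets in links.values() for t in targets}
    if not pages:
        return {}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(max_iter):
        # Assumption: pages with no outgoing links spread their rank evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1 - d) / n + d * dangling / n for p in pages}
        # Each page splits its current rank evenly among the pages it links to.
        for src, targets in links.items():
            if targets:
                share = d * rank[src] / len(targets)
                for t in targets:
                    new[t] += share
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:  # stop once the ranks have settled
            break
    return rank


web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(web)
print({p: round(r, 3) for p, r in sorted(ranks.items())})
```

In this three-page graph, page C, which is linked from both A and B, receives the highest rank, and B, which receives only half of A’s rank, receives the lowest.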

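The link-selection policy proposed above (a cap on how many links per page are followed, chosen by relevance and publication date) could be sketched as follows. Everything here is a hypothetical illustration: the `Candidate` record, the keyword-count relevance measure, the exponential age decay with a 7-day half-life, and the default cap of 3 links are assumptions, not settled design decisions.

```python
import heapq
import re
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Candidate:
    url: str
    text: str        # hypothetical: text of the page the link points to
    published: date  # hypothetical: publication date of that page


def relevance(text: str, keywords: List[str]) -> int:
    """Count how many words in the text are topic keywords."""
    kw = {k.lower() for k in keywords}
    return sum(1 for w in re.findall(r"[a-z0-9]+", text.lower()) if w in kw)


def select_links(candidates: List[Candidate], keywords: List[str], today: date,
                 max_links: int = 3, half_life_days: float = 7.0) -> List[str]:
    """Return at most max_links URLs, best first, by relevance decayed by age."""
    scored = []
    for c in candidates:
        hits = relevance(c.text, keywords)
        if hits == 0:
            continue  # pages with nothing on the topic are never followed
        age = max((today - c.published).days, 0)
        # Assumption: a page loses half its score every half_life_days.
        scored.append((hits * 0.5 ** (age / half_life_days), c.url))
    return [url for _, url in heapq.nlargest(max_links, scored)]
```

With the keywords “Bitcoin” and “Litecoin”, a two-hit page from today outranks a two-hit page from a week ago, which in turn outranks a one-hit page from two weeks ago. Older items still qualify but sink toward the bottom, reflecting the proposal’s view that older information is of little use.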
Parsing and Translating

An important feature of our system is parsing text and translating it into concise messages for the user. This process draws on formal language theory, specifically finite state machines, parsing, and code generation [Hopcroft 1979]. From Russell and Norvig’s Artificial Intelligence: A Modern Approach, the PI has studied techniques of search, heuristics, and intelligent mining for use in the web crawler and the AI inference engine (see Figure 1) [Russell, Norvig 2010]. Computer networking allows the parts of a system to communicate with each other. Kurose and Ross show how to do this through their presentation of the client-server programming paradigm and communication over the TCP/IP stack [Kurose 2013]. This is crucial to our application because the system will operate over the Internet.
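As a small illustration of the finite-state techniques mentioned above, the sketch below uses a deterministic finite automaton to pull dollar amounts out of crawled text and then fills a fixed template to produce a short alert. The state names, the `extract_amounts` and `generate_alert` functions, and the message template are hypothetical. The automaton deliberately ignores thousands separators such as “$1,234”, and template filling is a much simpler stand-in for the language generation the inference engine would eventually perform.

```python
from typing import List, Optional

DIGITS = set("0123456789")

# States of a small DFA for dollar amounts such as "$620" or "$620.50".
START, DOLLAR, INT, DOT, FRAC = range(5)
ACCEPTING = {INT, FRAC}


def step(state: int, ch: str) -> Optional[int]:
    """Transition function; None means the DFA has no move (reject)."""
    if state == START:
        return DOLLAR if ch == "$" else None
    if state == DOLLAR:
        return INT if ch in DIGITS else None
    if state == INT:
        if ch in DIGITS:
            return INT
        return DOT if ch == "." else None
    if state in (DOT, FRAC):
        return FRAC if ch in DIGITS else None
    return None


def extract_amounts(text: str) -> List[float]:
    """Scan left to right, taking the longest accepted match at each position."""
    amounts, i = [], 0
    while i < len(text):
        state, j, last_accept = START, i, None
        while j < len(text):
            state = step(state, text[j])
            if state is None:
                break
            j += 1
            if state in ACCEPTING:
                last_accept = j  # remember the end of the longest match so far
        if last_accept is not None:
            amounts.append(float(text[i + 1:last_accept]))  # skip the "$"
            i = last_accept
        else:
            i += 1
    return amounts


def generate_alert(topic: str, text: str) -> Optional[str]:
    """Template-based generation: turn extracted facts into one short message."""
    amounts = extract_amounts(text)
    if not amounts:
        return None
    listed = ", ".join(f"${a:,.2f}" for a in amounts)
    return f"{topic} alert: prices mentioned {listed}"
```

For example, `generate_alert("Bitcoin", "BTC hit $620.50 today, up from $598.")` yields `"Bitcoin alert: prices mentioned $620.50, $598.00"`. The trailing period after “$598” is left unconsumed because the state reached after a bare “.” is not accepting.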

Multiple Clients and Servers
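The client-server paradigm described in the previous section can be illustrated with a minimal TCP push server. It keeps a connection open to each smartphone client and writes every new alert, as one newline-terminated line of UTF-8 text, to all of them. This is a sketch under stated assumptions: the `AlertServer` class, the newline framing, and the `receive_alert` helper are hypothetical and not part of any existing RDC code. A real deployment would also use a mobile push-notification service rather than raw sockets to reach phones that are not actively connected.

```python
import socket
from typing import List


class AlertServer:
    """Sketch of a push server that keeps many client connections open."""

    def __init__(self, host: str = "127.0.0.1", port: int = 0) -> None:
        # Port 0 asks the OS for any free port; self.address holds the real one.
        self._listener = socket.create_server((host, port))
        self.address = self._listener.getsockname()
        self._clients: List[socket.socket] = []

    def accept_client(self) -> None:
        """Block until one more client connects, then remember it."""
        conn, _ = self._listener.accept()
        self._clients.append(conn)

    def push(self, alert: str) -> int:
        """Send one newline-terminated alert to every client; return deliveries."""
        data = (alert.replace("\n", " ") + "\n").encode("utf-8")
        delivered = 0
        for conn in list(self._clients):
            try:
                conn.sendall(data)
                delivered += 1
            except OSError:  # the client went away: forget it
                self._clients.remove(conn)
                conn.close()
        return delivered

    def close(self) -> None:
        for conn in self._clients:
            conn.close()
        self._clients.clear()
        self._listener.close()


def receive_alert(sock: socket.socket) -> str:
    """Client side: read bytes until the newline that ends an alert."""
    buf = b""
    while not buf.endswith(b"\n"):
        chunk = sock.recv(1024)
        if not chunk:  # the server closed the connection
            break
        buf += chunk
    return buf.decode("utf-8").rstrip("\n")
```

Accepting is done one client at a time here to keep the example deterministic. A server handling many phones would run `accept_client` in a loop on its own thread, or use an event loop such as `asyncio`, so that accepting new clients never blocks pushing alerts.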


Real World Example of a User Subscription Topic

Real World Example: Cryptocurrency (aka Digital Money)

Traders are taking advantage of how volatile the cryptocurrency market currently is. Cryptocurrencies are only beginning to be used in the world; there are even ATMs appearing where cryptocurrency can be withdrawn as cash [Jervis 2014]. But until cryptocurrencies become more widely accepted and recognized, their exchange rates will continue to fluctuate greatly from day to day. Exchange rates for cryptocurrencies typically change after news breaks of an event that can affect the price. For instance, if a popular business begins to accept a cryptocurrency, the price is likely to rise as soon as the news spreads. Those who learn of such news first are the most likely to succeed in trading. Our system will help users anticipate these fluctuations by alerting them to relevant cryptocurrency news as soon as it appears.

An important part of this work is that the topic of interest can be changed. We chose to exemplify our software system using cryptocurrency; however, as in many artificial intelligence applications, the underlying software infrastructure can be extracted and applied to virtually any other subject. For example, intelligent expert systems originally written for medical diagnosis were decomposed and then applied to computer system configuration [Russell, Norvig 2010]. Even though the initial topic will be cryptocurrency trading, the purpose of this summer research is to exemplify system RDC and the framework on which it is built. Other applications beyond trading include a disaster alert system, Rollins-specific messages such as “deadline to drop classes” reminders, and any other setting where new information about a particular subject is needed.
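The claim that the topic can be swapped without changing the infrastructure amounts to making the topic a piece of data rather than code. The sketch below illustrates this under simple assumptions: the `Topic` record, its lowercase keyword set, the `make_alert` function, and the arbitrary 140-character limit are all hypothetical. The same matching and formatting code serves the cryptocurrency topic and a Rollins deadline topic.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Topic:
    name: str
    keywords: FrozenSet[str]  # assumption: keywords are stored in lowercase


def make_alert(topic: Topic, headline: str, max_len: int = 140) -> Optional[str]:
    """Return a short alert if the headline concerns the topic, else None."""
    # Normalize words: strip surrounding punctuation and lowercase.
    words = {w.strip(".,!?:;\"'()").lower() for w in headline.split()}
    if words.isdisjoint(topic.keywords):
        return None  # the headline is not about this topic
    alert = f"[{topic.name}] {headline}"
    # Keep the notification short, like a text message.
    return alert if len(alert) <= max_len else alert[: max_len - 3] + "..."


crypto = Topic("Cryptocurrency", frozenset({"bitcoin", "litecoin", "dogecoin"}))
rollins = Topic("Rollins", frozenset({"drop", "deadline", "registration"}))
print(make_alert(crypto, "Major retailer to accept Bitcoin"))
print(make_alert(rollins, "Deadline to drop classes is Friday"))
```

Only the `Topic` values differ between the two uses. Crawling, ranking, and delivery code would never need to know which subject it is serving.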



Related Publications to Hierarchical Genetic Algorithms