One of the newest genres of software systems is the smartphone app. This research will integrate artificial intelligence techniques into a mobile system that is accessible to anyone who has a smartphone. Artificial intelligence (AI) is the area of computer science that attempts to build aspects of human thinking into computer programs. Our application will use AI techniques to eliminate the need for users to constantly “check the net” for new and relevant information on a topic of their choice. It will generate “Alerts” that deliver useful information as succinct, text-message-like phone notifications. The AI techniques employed will include smart web crawling (scanning the Web for valuable patterns) and automatic language generation, which is related to natural language understanding (as used in systems such as Siri). This summer, the app will be demonstrated on the topic of cryptocurrency (digital money) trading, although any topic could be used. The main deliverable from this summer research period (June 23rd through August 15th, 2014) will be a computer system composed of the following components: 1) a web crawler, 2) an AI inference engine that performs web page ranking, parsing, and language generation, 3) a server that pushes relevant information to 4) a smartphone client app that displays the information.
The seminal research paper in the sub-discipline of web mining is Brin and Page’s presentation of the Google search engine [Brin 1998]. The part of this paper most applicable to our crawling system is the PageRank algorithm. PageRank estimates the importance of a page from the link structure of the Web: a page ranks highly when many pages, or a few highly ranked pages, link to it. These ranks can be computed with a simple iterative algorithm. We plan to adapt this idea to determine which pages to visit next from the set of hyperlinks found on a page. We propose that, for each page searched, there will be a maximum number of links that can be followed from that page. The system will choose which links to follow based on the amount of relevant information in the linked page and its publication date (older information is not particularly useful to this app). Only the pages with the most information pertaining to the subject will be used. The following direct quote explains the main idea behind their ranking technique:
PageRank can be thought of as a model of user behavior. We assume there is a "random surfer" who is given a web page at random and keeps clicking on links, never hitting "back" but eventually gets bored and starts on another random page. The probability that the random surfer visits a page is its PageRank. And, the damping factor is the probability at each page the "random surfer" will get bored and request another random page.
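The random-surfer model can be made concrete with a short iterative computation. The following is a minimal sketch, not the crawler's actual implementation: the function name `pagerank`, the toy link graph, and the default of 50 iterations are assumptions made for illustration. Here `damping` is the probability that the surfer follows a link, so `1 - damping` is the probability of getting bored and jumping to a random page. The ranks are normalized to sum to 1, a common convention that differs slightly from the formula as printed in [Brin 1998].

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate PageRank by repeated redistribution of rank along links.

    links maps each page to the list of pages it links to (hypothetical
    toy input; a real crawler would build this graph from fetched pages).
    """
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start from a uniform guess

    for _ in range(iterations):
        # Every page receives the "bored surfer" share of a random jump.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Toy graph: page "c" is linked to by "a", "b", and "d"; nothing links to "d".
toy_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_links)
```

In this toy graph, "c" receives the highest rank because three pages link to it, while "d" receives only the random-jump share because no page links to it.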
An important feature of our system is parsing text and translating it into concise messages for the user. This process draws on formal language theory; specific techniques include finite state machines, parsing, and code generation [Hopcroft 1979]. From Russell and Norvig’s Artificial Intelligence book, the PI has studied techniques of search, heuristics, and intelligent mining to use in the web crawler and AI inference engine (see Figure 1) [Russell, Norvig 2010]. Computer networking ties the system together by allowing its parts to communicate with each other. Kurose and Ross demonstrate how to do this through their presentation of the client-server programming paradigm and communication over the TCP/IP stack [Kurose 2013]. This is crucial to our application because the system will operate over the Internet.
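The client-server exchange described above can be illustrated with a small socket sketch. This is only an illustration under stated assumptions: the function names `send_alert` and `receive_alert`, the JSON message format, and the sample alert text are all hypothetical, and a locally connected socket pair stands in for the TCP connection between the server and the smartphone client.

```python
import json
import socket


def send_alert(conn, topic, text):
    """Server side: send one alert as a newline-terminated JSON message."""
    payload = json.dumps({"topic": topic, "text": text}) + "\n"
    conn.sendall(payload.encode("utf-8"))


def receive_alert(conn):
    """Client side: read bytes until the newline that ends one alert."""
    buffer = b""
    while not buffer.endswith(b"\n"):
        chunk = conn.recv(1024)
        if not chunk:
            raise ConnectionError("connection closed before a full alert arrived")
        buffer += chunk
    return json.loads(buffer.decode("utf-8"))


def demo():
    # socketpair() returns two already-connected stream sockets; it is used
    # here as a stand-in for a real TCP connection across the Internet.
    server_side, client_side = socket.socketpair()
    try:
        send_alert(server_side, "cryptocurrency", "Example alert text")
        return client_side and receive_alert(client_side)
    finally:
        server_side.close()
        client_side.close()


alert = demo()
```

The newline delimiter is one simple way to mark message boundaries on a byte stream, since a stream socket does not preserve them on its own.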