Wednesday, October 27, 2010

The Heuristics

·         Searching is an important aspect of AI, as many problems can be viewed as search problems.
·         In one line, searching is nothing but reaching a goal state from the initial state using some algorithm.
·         Actually, a goal state is always associated with a function which returns true if the state passed as argument is a goal state. Please note that we don't say "the goal state"; it need not be unique, because there are problems with more than one goal state.
·         A search algorithm is complete if it is guaranteed to find one of the goal states of the given problem whenever one exists.
·         A search algorithm is optimal if it finds the goal state (or the path to a goal state) with the least cost. Cost is some mathematical scale for the price to move from one state to another; it is uniform for some problems and non-uniform for others.
·         The algorithms are generally classified into 2 types – blindfold (uninformed) search algorithms and heuristic (informed) search algorithms. The first category consists of generic algorithms which can be applied to any problem, but they are computationally very poor as they have no idea about the problem to which they are applied; the latter are embedded with knowledge about the problem, which significantly speeds up the search process.
·         The best examples of blindfold search algorithms are DFS (depth-first search), BFS (breadth-first search) and UCS (uniform-cost search).
·         The blindfold search algorithms are provided with nothing except the nodes, edges and costs of the state-space tree of the problem.
·         DFS is complete only on finite state spaces and is not an optimal search algorithm, but BFS is complete under some assumptions (a finite branching factor) and is optimal when all step costs are equal, of course.
·         All the above-mentioned algorithms are blind, i.e., they have no idea about the correct direction to the solution(s) of the given problem. They blindly stumble around from the current state until they reach a goal state.
·         One way of providing this missing information is the concept of a heuristic function.
·         Before proceeding, let's see one simple problem: finding the largest number in an integer array, say 14, 25, 39, 54, 66, 19. Of course the answer is 66, but here we get some simple and important questions! What makes us say 66 is the largest number in this array? What is special about 66 that lets us say it is greater than all the other numbers? In other words, what is the scale that we used to find 66? The answers to these questions are trivial, of course, but we need to understand them correctly. Here we used the face value of each number as the scale.
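The "scale" in this example can be made explicit in code. A minimal sketch (function name is mine, not from the post) that finds the largest number by comparing face values:

```python
# Find the largest number in an array, using each number's face value
# as the measurement scale.
def largest(numbers):
    best = numbers[0]
    for n in numbers[1:]:
        if n > best:          # the comparison is the scale we use
            best = n
    return best

print(largest([14, 25, 39, 54, 66, 19]))  # 66
```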
·         Even though this is simple, we need to see it from the machine's point of view. A machine needs some tool or scale, or at least a measurement, to take a decision. In other words, we need a mathematical model, of course. The mathematical model can be a well-organized model or just a simple function. Informally, a well-organized model means a mathematical model with a heavy knowledge base, highly skilled algorithms, etc. A simple function is a mapping from the current state of the problem to some mathematical quantity which says how far from, or near to, the goal of the problem we are.
·         A heuristic is a mathematical model, usually of the latter kind (though it may also use a well-organized model), that estimates the cost from any state to a goal state.
·         In a more formal (literally) way, a heuristic is a function h: S -> R+, which maps each member of the set S of states to a positive real number (which is nothing but a cost). h is not one-to-one, because the cost for two states can be the same, and it is an onto function, of course. At the beginning of this definition we said "literally formal" because the definition does not specify that the destination should be a goal state. In any case, a heuristic acts as the tool that a machine needs for search problems.
·         After this, the next question that comes to mind is: what should a heuristic function be? In other words, how do we implement a heuristic function? Below are some characteristics of heuristic functions.
·         The function should map to a positive real number or to some measurable quantity.
·         The input should include one of the states from the set S of the problem.
·         The heuristic can be an abstraction of the domain knowledge of the problem being solved.
·         It should be transparent to the external world.
·         In some cases we can only estimate the cost, so a heuristic can also be called an estimator for the machine to reach one of the goal states of the problem.
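As one concrete illustration of such an estimator (my own example, not from the post): in a search over a 2-D grid, a Manhattan-distance heuristic maps every state to a non-negative estimate of the cost remaining to reach the goal cell:

```python
def make_manhattan_heuristic(goal):
    """Return h: state -> estimated cost from state to the goal cell."""
    gx, gy = goal
    def h(state):
        x, y = state
        # Manhattan distance: an estimate of the moves left to reach the goal
        return abs(x - gx) + abs(y - gy)
    return h

h = make_manhattan_heuristic((4, 4))
print(h((1, 2)))  # 5
```

Note that two different states can map to the same value (h is not one-to-one), exactly as the definition above allows.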
Contd...

Saturday, October 16, 2010

Classification in Data Mining

                                                Classifier Tutorial….
·       Classification is the task of assigning objects to one of several predefined categories.
·       A classification technique is a method/model to do the classification.
·       The input for this model is a set of attributes and the output is the label of a class (category).
·       The input data for a classification task is a collection of records. Each record is characterized by a tuple (x, y), where x is the attribute set and y is a special attribute designated as the class label (also known as the category attribute or target attribute).
·       The attributes in x can be both continuous and discrete attributes, but the target attribute must be a discrete attribute. This is a key characteristic that distinguishes classification from regression, a predictive modeling task in which y is a continuous attribute.
·       Definition of Classification: Classification is the task of learning a target function f that maps each attribute set x to one of the predefined class labels y.
·       The target function is also known as the classification model.
·       The classification model can be used in Descriptive Modeling, the task of listing out the attributes of the set x for a specific value of y, and in Predictive Modeling, the task of predicting the value of the target attribute for unknown records (records with a missing target attribute).
·       Classification techniques are best suited for predicting or describing data sets with binary or nominal categories. They are not well suited for ordinal categories, as they do not consider the implicit order among the categories, or other relationships among categories.


Approach to solving Classification Problem…
·       A classification technique, or a classifier, is a systematic approach to building classification models from an input data set.
·       Examples: Decision trees, Rule-based classifiers, Neural Networks, Support vector machines and Naïve Bayes Classifiers.
·       All the above-mentioned techniques employ a learning algorithm to identify a model that best fits the relationship between the attribute set and the class label of the input data. The key objective of a learning algorithm is to build models with good generalization capability, i.e., models that accurately predict the class labels of previously unknown records.
·       The fundamental need of a learning algorithm is a Training Data Set, a data set in which the target attribute is known. This data set is used to build the classification model. After finding a classification model we need to apply it on a Test Data Set, a data set containing unknown records.
·       Evaluation of a classification model is based on the counts of test records correctly and incorrectly predicted by the model. These counts are tabulated in a matrix called the Confusion Matrix. The table below depicts the confusion matrix for binary classification problems.

                          Predicted Class
                        Class=1    Class=0
Actual      Class=1       f11        f10
Class       Class=0       f01        f00

·       fij in this table denotes the number of records from class i predicted to be of class j; for instance, f01 indicates the number of records from class 0 incorrectly predicted as class 1. So the total number of correct predictions is given by ( f00 + f11 ) and the number of wrong predictions is given by ( f10 + f01 ).
·       So accuracy is the ratio of correct predictions to the total number of predictions. In contrast, the error rate is the ratio of incorrect predictions to the total number of predictions.
·       The best classification model is one which has high accuracy and a low error rate.
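The two measures fall straight out of the four counts; a small sketch (the example counts are made up for illustration):

```python
def accuracy_and_error(f11, f10, f01, f00):
    """Accuracy and error rate from a binary confusion matrix.

    f11, f00 are correct predictions; f10, f01 are wrong ones.
    """
    total = f11 + f10 + f01 + f00
    accuracy = (f11 + f00) / total      # correct / all predictions
    error_rate = (f10 + f01) / total    # wrong / all predictions
    return accuracy, error_rate

acc, err = accuracy_and_error(f11=40, f10=10, f01=5, f00=45)
print(acc, err)  # 0.85 0.15
```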

Decision Trees: Hunt’s Algorithm:

·       These trees contain 3 types of nodes, viz. the root node, internal nodes and leaf nodes.
·       A root node has no incoming edges and zero or more outgoing edges.
·       Internal nodes each have exactly one incoming edge and two or more outgoing edges.
·       Leaf or terminal nodes each have exactly one incoming edge and no outgoing edges, and each of these nodes represents one of the class labels.
·       The non-terminal nodes, which include the root and internal nodes, contain attribute test conditions to separate records that have different characteristics.
·       One of the well known methods to build this model is Hunt’s Algorithm.
·       Of course, Hunt’s algorithm is a Learning Algorithm as discussed earlier.
·       In Hunt's algorithm, a decision tree is grown in a recursive fashion by partitioning the training records into successively purer subsets. Let Dt be the set of training records that are associated with node t and y = { y1, y2, y3, ..., yn } be the set of class labels. The following is a recursive definition of Hunt's algorithm.
1.    If all the records in Dt belong to the same class yt, then t is a leaf node labeled as yt.
2.    If Dt contains records that belong to more than one class, an attribute test condition is selected to partition the records into smaller subsets. A child node is created for each outcome of the test condition and the records in Dt are distributed to the children based on the outcomes. The algorithm is then recursively applied to each child node.
·       It is possible for some child nodes created in step 2 to be empty, i.e., no records are associated with these nodes. This can happen if none of the training records has the combination of attribute values associated with such nodes. In this case the node is declared a leaf node with the same class label as the majority class of the training records associated with its parent node.
·       In step 2, if all the records associated with Dt have identical attribute values (except for the class label), then it is impossible to split these records any further. In this case, the node is declared a leaf node with the same label as the majority class of the training records associated with this node.
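The recursive steps (plus both special cases) can be sketched compactly in Python. This is my own illustration, not a textbook implementation; in particular, taking the first remaining attribute as the test condition is a placeholder for a real purity-based choice such as an impurity measure:

```python
from collections import Counter

def hunt(records, attributes):
    """records: list of (x, y) with x a dict of attribute values, y a label."""
    labels = [y for _, y in records]
    majority = Counter(labels).most_common(1)[0][0]
    # Step 1: all records in Dt belong to one class -> leaf with that label
    if len(set(labels)) == 1:
        return {"leaf": labels[0]}
    # Stopping case: no attributes left, or identical attribute values
    if not attributes or len({tuple(x[a] for a in attributes)
                              for x, _ in records}) == 1:
        return {"leaf": majority}
    # Step 2: select a test attribute (placeholder choice), create a child
    # per outcome, distribute the records, and recurse on each child
    attr, rest = attributes[0], attributes[1:]
    partitions = {}
    for x, y in records:
        partitions.setdefault(x[attr], []).append((x, y))
    return {"attr": attr, "default": majority,
            "children": {v: hunt(subset, rest)
                         for v, subset in partitions.items()}}

def classify(tree, x):
    # An unseen outcome (the "empty child" case) falls back to the
    # majority class of the parent node, as described above.
    while "leaf" not in tree:
        tree = tree["children"].get(x[tree["attr"]], {"leaf": tree["default"]})
    return tree["leaf"]
```

For example, three records split on one binary attribute yield a root with two leaf children, and classifying a record with an unseen attribute value returns the parent's majority label.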
Design issues of Decision trees…
·       A learning algorithm must address the following issues…
·       How should the training records be split?
·       How should the splitting procedure stop?
  
Below are some methods for attribute test conditions:
Binary attribute – trivial.
Nominal attribute – multiway or binary.
Ordinal attribute – multiway or binary, provided the partition does not violate the order.
Continuous attribute – binary (by selecting the best partition) or multiway (discretization of continuous values, preserving order).
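Two of these conditions are easy to sketch: a multiway split on a nominal attribute (one child per distinct value) and a binary split on a continuous attribute at a chosen threshold. Record format and names are my own illustration:

```python
def multiway_split(records, attr):
    """One child group per distinct value of a nominal attribute."""
    groups = {}
    for x, y in records:
        groups.setdefault(x[attr], []).append((x, y))
    return groups

def binary_split(records, attr, threshold):
    """Two child groups: attr <= threshold and attr > threshold."""
    left = [(x, y) for x, y in records if x[attr] <= threshold]
    right = [(x, y) for x, y in records if x[attr] > threshold]
    return left, right
```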
            

Thursday, October 14, 2010

The Sudoku


Before we start
* Sudoku is a puzzle first popularized in Japan; the rules and restrictions are explained below. It is a combinatorial puzzle which requires filling up the empty cells such that none of the following constraints is violated:
           1. No repetition in each row.
           2. No repetition in each column.
           3. No repetition in each square.

Our approach towards the puzzle

Every Sudoku can easily be solved by an exhaustive search (ES) algorithm in exponential time, as shown below.

Let's assume we have n empty boxes to be filled in the puzzle. We know that every box has the set of possibilities {1, 2, 3, 4, 5, 6, 7, 8, 9}, i.e., each empty box has 9 possibilities, so the total number of possibilities becomes 9^n. The time complexity of this trivial procedure is therefore of the order O(9^n), i.e., exponential.
           * Suppose tg is the time required to generate a possibility and tt is the time for testing the generated solution; then the total time taken is given by...

                                                          Total Time Taken (T.T.T) = 9^n * ( tg + tt )
                                                          T.T.T = 9^n * Tc         where Tc = ( tg + tt )
                                                    Since Tc is constant, to increase the efficiency of this ES we need to decrease the first factor of the above expression, i.e., 9^n.
So let's modify the ES into a Modified ES, and call it MES throughout this discussion...

Modified Exhaustive Search
       * As we discussed above, we need to reduce the 9^n part of the T.T.T equation. To do this we can decrease ' 9 ' or ' n '. So let's first discuss how to reduce ' n '.
            * Actually, reducing ' n ' can be done by using a well-known AI technique called Constraint Satisfaction.
            * What is Constraint Satisfaction...?
                It is a method of solving problems in which the goal is to discover some problem state that satisfies a given set of constraints.
            * As we know, Sudoku has some constraints, as stated above... But here we are using this technique to solve only some part of the puzzle...
            * As we see in the above board, all boxes are filled except the box at Row 2 and Column 2.
            * The possibilities for this empty box are the set of numbers from 1 to 9 which are not in Row 2, not in Column 2 and not in the first square.
            * So, in general, the set of possibilities for a box at Row m and Column n is given by...
                         P [ m , n ] = ( Su - Sm ) ∩ ( Su - Sn ) ∩ ( Su - Ssq( m, n ) )

                 where Su = { 1, 2, 3, 4, 5, 6, 7, 8, 9 }, Sm is the set of elements in row m, Sn is the set of elements in column n, and Ssq(m,n) is the set of elements in the square with which the box at row m and column n is associated...
           * If the cardinality of P is one then it is clear that the respective box has only one possibility, which can be filled in. So by calculating P for each void place we can fill up some void places (sometimes all), and thus simplify the further search in the problem's solution space.
           * After applying the above technique we can fill some voids; but one thing to be carefully observed here is that, after filling some voids, the new state of the board will affect the possibility sets of the other unfilled boxes. So finding the possibility sets for each unfilled box again might leave some possibility sets with cardinality one, which can be normalized as explained.
           * So from the above discussion we conclude that the mentioned technique is recursively applicable.
So let's make this idea work by constructing an algorithm for it.
We call this algorithm FON ( which stands for First One Normalize ).
It is called so because it is applied first ( First ) to normalize ( Normalize ) the empty boxes with only one ( One ) possibility.
Algorithm: Fon
       // Input:  Sudoku board SB
       // Output: Modified board or the same board
       N <- findNoOfVoids ( SB )                  // getting the number of void places
       Pos[N] <- getPosListOfAllVoids ( SB )      // finding the possibilities for all voids by CS
       If IsFonnable ( Pos ) then                 // checking whether Fon is applicable or not
             SB <- Normalize ( SB, Pos )          // normalizing and updating the board
             Return Fon ( SB )                    // calling recursively for further Fon
       Else
             Return SB                            // returning on non-applicability of Fon
       End If
           * As seen in the algorithm, we assumed some methods; their descriptions are given below...

                          1. findNoOfVoids ( SB ): returns the number of unfilled boxes in the board given as argument.
                          2. getPosListOfAllVoids ( SB ): returns the possibility sets for all void places in the board given as argument. This is the method where we apply the Constraint Satisfaction ( CS ) idea.
                          3. IsFonnable ( Pos ): returns True if Pos is not empty and there exists at least one entry with cardinality one; otherwise it returns False.
                          4. Normalize ( SB, Pos ): fills the boxes with a unique possibility and returns the updated board.
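Under the assumption that the board is a 9x9 list of lists with 0 marking a void box (the representation is mine, not from the post), the whole Fon procedure, including the constraint-satisfaction step behind getPosListOfAllVoids, can be sketched as:

```python
SU = set(range(1, 10))  # Su = {1, ..., 9}

def possibilities(board, m, n):
    """P[m, n] = (Su - Sm) ∩ (Su - Sn) ∩ (Su - Ssq(m, n))."""
    row = set(board[m])
    col = {board[r][n] for r in range(9)}
    tr, tc = 3 * (m // 3), 3 * (n // 3)   # top-left corner of the square
    square = {board[r][c]
              for r in range(tr, tr + 3) for c in range(tc, tc + 3)}
    return (SU - row) & (SU - col) & (SU - square)

def fon(board):
    """Fill every void with exactly one possibility; recurse until none left."""
    pos = {(m, n): possibilities(board, m, n)
           for m in range(9) for n in range(9) if board[m][n] == 0}
    singles = {cell: p for cell, p in pos.items() if len(p) == 1}
    if not singles:                  # Fon not applicable: return the same board
        return board
    for (m, n), p in singles.items():
        board[m][n] = p.pop()        # normalize: fill the unique possibility
    return fon(board)                # new fills change other possibility sets
```

The recursion mirrors the pseudocode above: each pass of filling may shrink other possibility sets down to cardinality one, so the possibilities are recomputed on every call.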

        * As we can conclude from the algorithm's output description, Fon may or may not reduce the number of unfilled boxes. If it fails at the first step, we end up with the same state of the board. In contrast, if it fills all the unfilled boxes (recursively or in just one pass), we end up with the solution itself. So the number of boxes filled by Fon lies between zero and the total number of voids.

                                i.e., if nf is the number of unfilled boxes which were resolved by Fon, then

                                                 0 ≤ nf ≤ n
       * So after Fon, the number of unfilled boxes left is given by

                                         N = n - nf
       * Finally, the value of n is reduced to n - nf, which makes the T.T.T expression
                          T.T.T = 9^(n - nf) * Tc
                          T.T.T = 9^N * Tc
       * After doing Fon, if we still have voids, we need to move on to the concrete analysis step.
The concrete analysis step is a hill-climbing search algorithm with a special backtracking technique.
The nodes with fewer possibilities are considered first during the search. The backtracking used here is different from normal backtracking in the latter sense. This will be explained in one of the coming posts...

This app is now available online.

About OOP...

· Programming paradigms:
1. Monolithic Programming – BASIC and Assembly. (single large block of code with global data )
2. Procedural Programming – FORTRAN and COBOL.(all procedures will access global data)
3. Structured Programming – C, Pascal… (Code controls the data.)
4. Object Oriented Programming – Smalltalk, Java, C++, C#... (Data controls the code.)
· Programming styles and their importance.
1. Procedure-oriented - Algorithms.
2. Object-oriented – Classes and Objects.
3. Logic-oriented – Goals, often expressed in predicate calculus.
4. Rule-oriented – if-then-else statements.
5. Constraint-oriented – Invariant relationships.
· Orthogonality is an important element of a programming language: it means the language has only a few basic features, each of which is separately understandable.
· Class is the logical tool for implementation of an ADT.
· In OOP an object can be viewed from multiple views.
· Inheritance and Composition are the 2 powerful features of OOP.
· Inheritance does early delegation and composition does late delegation of operations.
· With respect to delegation, Composition is better than Inheritance.
· We can mimic Inheritance via Composition by writing delegating methods, for the operations of the respective superclasses, within the container class.
· Composition is also called Containership.
· So we can add the unsupported Inheritance feature to the Object-based paradigm with Composition.
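A minimal sketch of that mimicry (class names are illustrative, not from the post): the container holds an instance of the would-be superclass and writes a delegating method that forwards the call to it.

```python
class Engine:
    """The would-be superclass."""
    def start(self):
        return "engine started"

class Car:
    """Composition: Car *has an* Engine rather than *is an* Engine."""
    def __init__(self):
        self._engine = Engine()

    def start(self):
        # Delegating method: forwards the operation to the contained part
        return self._engine.start()

print(Car().start())  # engine started
```

From the caller's point of view, `Car().start()` behaves just as it would if `Car` had inherited `start` from `Engine`.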
· Binding refers to the tie-up of a procedure call to the address of the code to be executed in response to the call.
· 3 types of binding – Early (Static) binding, Deferred binding, Late (Dynamic) binding.
· Static binding – Known at compile time. Default in C++.
· Dynamic binding – Known at Run time. Default in Java, Done using Virtual functions in C++.
· Deferred binding – Intermediate of above 2. (Supported in GWT tool of Google).
· A class is an empty application (or a template of one) and objects are filled applications.
· In OOP every method call is something like a client-server, message-driven communication.
· The method name defines the message (i.e., the action to be performed) and the parameters specify the information for that message or command.
· In OOP, the correct method to execute an operation is chosen based on the name of the operation and the class of the object being operated on.
· Communication between the objects can take place as long as they’re alive.
· Pure object-oriented languages are Smalltalk-80, Simula, and Eiffel.
· In pure object-oriented programming everything is stored on the heap.
· Java is not purely object-oriented, as int, float, etc. are treated as primitive types.
· Object-based languages are Ada, etc. (abstraction, encapsulation, composition).
· One great divide in programming exists between exploratory programming languages, which aim at dynamism and run-time flexibility, and software engineering languages, which have static typing and features that aid verifiability and/or efficiency. Smalltalk is the best representative of the former group and C++ belongs to the latter.
· Java = C++--++.
· But JavaTM is Awesome.
· Java was initially designed to address the problems of building software for small distributed systems to embed in consumer devices. As such it is designed for heterogeneous networks, multiple-host architectures and secure delivery.
· For OOP we need an environment and library supporting reusability.
· Reusability improves the productivity of the software.
· Reuse involves 3 steps: Retrieve the needed components, understand them and Use them.
· Though Inheritance is a good tool for reusability, it creates more coupling between classes and makes the software structure more complex.
· Booch recommends that inheritance hierarchies be built as balanced lattices and that the maximum number of levels and width be limited to 7 +/- 2 classes.
· Reuse means – Reuse in data, Reuse in Architecture and Reuse in Design.