Sunday, December 13, 2009

New Project (Cooperation)

Hi.
I am working on a project from the viewpoint of AI and ES. It is about colors and how they are used. The program will help users select colors for particular places, environments, etc., for example choosing the colors for the interior of an airplane (seats, floor, walls, ceiling, etc.).
I have decided that it will ask the user some questions and then suggest the colors that are the best choices from the standpoint of psychology.
In addition, one part of this program will be usable from any web site: when the user enters the site, the program will administer Max Lüscher's color test to gauge the user's psychological state, and then automatically open the site with colors suited to that user.
It could be a good program for everyone. I think it will be very useful and many people will visit it. I have decided to build it in C#.

If you are eager to cooperate with me, please contact me!

Wednesday, August 19, 2009

Delta Rule

Also known by the names:
Adaline Rule
Widrow-Hoff Rule
Least Mean Squares (LMS) Rule
Change from Perceptron:
Replace the step function in the perceptron with a continuous (differentiable) activation function, e.g. a linear function.
For classification problems, use the step function only to determine the class and not to update the weights.


Note: this is the same algorithm we saw for regression. All that really differs is how the classes are determined.

Delta Rule:
Training by Gradient Descent, Revisited: construct a cost function E that measures how well the network has learned.


For example, with one output node, we can use the sum of squared errors

E = (1/2) Σ_{i=1..n} (t_i − y_i)²

where
n = number of examples
t_i = desired target value associated with the i-th example
y_i = output of the network when the i-th input pattern is presented to the network
To train the network, we adjust the weights in the network so as to decrease the cost (this is where we require differentiability). This is called gradient descent.

Algorithm
Initialize the weights with small random values.
Until E is within the desired tolerance, update the weights according to

W(new) = W(old) − μ ∇E(W(old))

where the gradient ∇E is evaluated at W(old) and μ is the learning rate. For the squared-error cost above with a linear output, the components of the gradient are

∂E/∂w_j = −Σ_i (t_i − y_i) x_ij

so each update step changes weight w_j by Δw_j = μ Σ_i (t_i − y_i) x_ij.
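For concreteness, here is a minimal batch gradient-descent sketch of this algorithm in Python with NumPy; the training data, learning rate, and tolerance below are hypothetical.

```python
import numpy as np

# Hypothetical training data: each row of X is an input pattern, t holds the targets.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
t = np.array([0.0, 0.0, 0.0, 1.0])

rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=X.shape[1])   # small random initial weights
b = 0.0                                      # bias term
mu = 0.1                                     # learning rate
tolerance = 1e-3

for _ in range(10_000):
    y = X @ w + b                 # linear activation (no step function during training)
    error = t - y
    E = 0.5 * np.sum(error ** 2)  # cost: E = 1/2 * sum_i (t_i - y_i)^2
    if E < tolerance:
        break
    # Gradient descent: dE/dw_j = -sum_i (t_i - y_i) * x_ij, so move opposite the gradient.
    w += mu * X.T @ error
    b += mu * error.sum()

# For classification, apply the step function only to determine the class.
classes = (X @ w + b > 0.5).astype(int)
print(E, classes)
```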





More than Two Classes.
If there are more than 2 classes we could still use the same network, but instead of having a binary target, we can let the target take on discrete values. For example, if there are 5 classes, we could have t = 1, 2, 3, 4, 5 or t = −2, −1, 0, 1, 2. It turns out, however, that the network has a much easier time if we have one output per class. We can think of each output node as trying to solve a binary problem (the example is either in the given class or it isn't).
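As a small illustration of the one-output-per-class idea, the targets can be encoded as one-hot vectors; the labels and class count in this sketch are hypothetical.

```python
import numpy as np

labels = np.array([0, 3, 1, 4, 2])     # hypothetical class labels for 5 examples
num_classes = 5

# One output node per class: target is 1 for the true class and 0 for the rest.
targets = np.zeros((labels.size, num_classes))
targets[np.arange(labels.size), labels] = 1.0

# Each column of `targets` defines a separate binary problem for one output node.
print(targets)
```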




Long Short-Term Memory

In a recurrent network, information is stored in two distinct ways. The activations of the units are a function of the recent history of the model, and so form a short-term memory. The weights too form a memory, as they are modified based on experience, but the timescale of the weight change is much slower than that of the activations; we call this the long-term memory. The Long Short-Term Memory model [1] is an attempt to allow the unit activations to retain important information over a much longer period of time than the 10 to 12 time steps that are the practical limit of RTRL or BPTT models.
The figure below shows a maximally simple LSTM network, with a single input, a single output, and a single memory block in place of the familiar hidden unit. Each block has two associated gate units (details below). Each layer may, of course, have multiple units or blocks. In a typical configuration, the first layer of weights runs from the input to the blocks and gates. There are then recurrent connections from one block to other blocks and gates. Finally, there are weights from the blocks to the outputs. The next figure shows the memory block in more detail.
The hidden units of a conventional recurrent neural network have now been replaced by memory blocks, each of which contains one or more memory cells. At the heart of the cell is a simple linear unit with a single self-recurrent connection with weight set to 1.0. In the absence of any other input, this connection serves to preserve the cell's current state from one moment to the next. In addition to the self-recurrent connection, cells receive input from the input units and from other cells and gates. While the cells are responsible for maintaining information over long periods of time, the responsibility for deciding what information to store, and when to apply that information, lies with the input and output gating units, respectively.
The input to the cell is passed through a non-linear squashing function (g(x), typically the logistic function, scaled to lie within [-2,2]), and the result is then multiplied by the output of the input gating unit. The activation of the gate ranges over [0,1], so if its activation is near zero, nothing can enter the cell. Only if the input gate is sufficiently active is the signal allowed in. Similarly, nothing emerges from the cell unless the output gate is active. As the internal cell state is maintained in a linear unit, its activation range is unbounded, and so the cell output is again squashed when it is released (h(x), typical range [-1,1]). The gates themselves are nothing more than conventional units with sigmoidal activation functions ranging over [0,1], and they each receive input from the network input units and from other cells.
Thus we have:
Cell output:
y_cj(t) = y_outj(t) · h(s_cj(t))
where y_outj(t) is the activation of the output gate, and the state s_cj(t) is given by
s_cj(0) = 0, and
s_cj(t) = s_cj(t−1) + y_inj(t) · g(net_cj(t)) for t > 0.
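To make these equations concrete, here is a minimal sketch of a single memory cell's forward pass in Python. The weights, the input sequence, and the exact squashing functions are hypothetical choices (bias terms and connections from other blocks are omitted); only the gating and state-update structure follows the equations above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def g(z):
    # Input squashing: logistic scaled to the range [-2, 2]
    return 4.0 * sigmoid(z) - 2.0

def h(z):
    # Output squashing: logistic scaled to the range [-1, 1]
    return 2.0 * sigmoid(z) - 1.0

# Hypothetical weights: each unit sees the external input x(t)
# and the cell's previous output y_c(t-1).
w_in  = np.array([2.0, 1.0])    # input gate weights
w_out = np.array([1.5, -1.0])   # output gate weights
w_c   = np.array([1.0, 0.5])    # weights into the cell itself

s = 0.0        # internal cell state, s_c(0) = 0
y_c = 0.0      # cell output
for x in [1.0, 0.0, 0.0, -1.0]:             # hypothetical input sequence
    u = np.array([x, y_c])
    y_in  = sigmoid(w_in @ u)                # input gate activation, range [0, 1]
    y_out = sigmoid(w_out @ u)               # output gate activation, range [0, 1]
    net_c = w_c @ u                          # net input to the cell
    # State update: the self-recurrent connection (weight 1.0) preserves s,
    # and the gated, squashed input is added on top.
    s = s + y_in * g(net_c)
    # Cell output: squash the state, then gate it with the output gate.
    y_c = y_out * h(s)
    print(f"x={x:+.1f}  y_in={y_in:.2f}  y_out={y_out:.2f}  s={s:+.2f}  y_c={y_c:+.2f}")
```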
This division of responsibility---the input gates decide what to store, the cell stores information, and the output gate decides when that information is to be applied---has the effect that salient events can be remembered over arbitrarily long periods of time. Equipped with several such memory blocks, the network can effectively attend to events at multiple time scales.
Network training uses a combination of RTRL and BPTT, and we won't go into the details here. However, consider an error signal being passed back from the output unit. If it is allowed into the cell (as determined by the activation of the output gate), it is now trapped, and it gets passed back through the self-recurrent connection indefinitely. It can only affect the incoming weights, however, if it is allowed to pass by the input gate.
On selected problems, an LSTM network can retain information over arbitrarily long periods of time; over 1000 time steps in some cases. This gives it a significant advantage over RTRL and BPTT networks on many problems. For example, a Simple Recurrent Network can learn the Reber Grammar, but not the Embedded Reber Grammar. An RTRL network can sometimes, but not always, learn the Embedded Reber Grammar after about 100 000 training sequences. LSTM always solves the Embedded problem, usually after about 10 000 sequence presentations.
One of us is currently training LSTM networks to distinguish between different spoken languages based on speech prosody (roughly: the melody and rhythm of speech).
References
Hochreiter, S., and Schmidhuber, J. (1997). "Long Short-Term Memory." Neural Computation, 9(8), 1735–1780.

Monday, August 3, 2009

The Project of Teaching the C Language

Dear friends…

A few months ago, in spring and summer 2009, I designed an expert system. The topic was suggested by my professor (Dr. Montazeri). It is a virtual teacher, and I built it for the expert systems course.

Explanations:
This program is written in C#. It teaches the C language and uses an Access database. In this program, you have an account in which your information is registered.
After you log in, the program retrieves your information from the database and the teaching begins. While learning, you can pause or continue, and you can ask the virtual teacher questions about the current step (lesson).

To understand it better, run it…

This program can be a starting point for designing virtual teachers. I tried to make it useful and application-oriented.
Finally, I thank my dear professor (Dr. Montazeri), who helped me with Artificial Intelligence & Expert Systems.


I have uploaded this program for you. If you would like to extend it or see its source code (in C#), please contact me and I will send it to you…

Download (EXE file)

Monday, July 20, 2009

What is data mining?

Simply stated, data mining refers to extracting or “mining” knowledge from large amounts of data. The term is actually a misnomer. Remember that the mining of gold from rocks or sand is referred to as gold mining rather than rock or sand mining. Thus, “data mining” should have been more appropriately named “knowledge mining from data”, which is unfortunately somewhat long. “Knowledge mining”, a shorter term, may not reflect the emphasis on mining from large amounts of data. Nevertheless, mining is a vivid term characterizing the process that finds a small set of precious nuggets from a great deal of raw material (Figure 1.3). Thus, such a misnomer, which carries both “data” and “mining”, became a popular choice. There are many other terms carrying a similar or slightly different meaning to data mining, such as knowledge mining from databases, knowledge extraction, data/pattern analysis, data archaeology, and data dredging.
Many people treat data mining as a synonym for another popularly used term, “Knowledge Discovery in Databases”, or KDD. Alternatively, others view data mining as simply an essential step in the process of knowledge discovery in databases. Knowledge discovery as a process is depicted in Figure 1.4, and consists of an iterative sequence of the following steps:

data cleaning (to remove noise or irrelevant data),

data integration (where multiple data sources may be combined),

data selection (where data relevant to the analysis task are retrieved from the database),

data transformation (where data are transformed or consolidated into forms appropriate for mining by performing summary or aggregation operations, for instance),

data mining (an essential process where intelligent methods are applied in order to extract data patterns),

pattern evaluation (to identify the truly interesting patterns representing knowledge based on some interestingness measures; Section 1.5), and

knowledge presentation (where visualization and knowledge representation techniques are used to present the mined knowledge to the user).
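As a toy illustration of this sequence of steps, the sketch below walks hypothetical transaction records through each stage; the data, the frequent-pair "mining" step, and the support threshold are all invented for the example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw records from two sources: transactions and a customer table.
transactions = [
    {"customer": 1, "items": ["milk", "bread"]},
    {"customer": 2, "items": ["milk", "bread", "eggs"]},
    {"customer": 2, "items": None},                      # noisy record
    {"customer": 3, "items": ["bread", "eggs"]},
]
customers = {1: {"segment": "retail"}, 2: {"segment": "retail"}, 3: {"segment": "wholesale"}}

# 1. Data cleaning: drop noisy / irrelevant records.
cleaned = [r for r in transactions if r["items"]]

# 2. Data integration: combine the two sources.
integrated = [{**r, **customers[r["customer"]]} for r in cleaned]

# 3. Data selection: keep only the records relevant to the analysis task.
selected = [r for r in integrated if r["segment"] == "retail"]

# 4. Data transformation: consolidate each record into a form suitable for mining.
baskets = [frozenset(r["items"]) for r in selected]

# 5. Data mining: extract patterns, here co-occurring item pairs.
pair_counts = Counter(pair for b in baskets for pair in combinations(sorted(b), 2))

# 6. Pattern evaluation: keep only the "interesting" patterns (support >= 2 here).
interesting = {pair: n for pair, n in pair_counts.items() if n >= 2}

# 7. Knowledge presentation: report the result to the user.
for (a, b), n in interesting.items():
    print(f"{a} and {b} were bought together {n} times")
```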

Ant Colony Optimization


Marco Dorigo, Thomas Stützle, “Ant Colony Optimization (Bradford Books)” The MIT Press


2004 ISBN: 0262042193 319 pages PDF 1,9 MB

Download & Read more

The complex social behaviors of ants have been much studied by science, and computer scientists are now finding that these behavior patterns can provide models for solving difficult combinatorial optimization problems. The attempt to develop algorithms inspired by one aspect of ant behavior, the ability to find what computer scientists would call shortest paths, has become the field of ant colony optimization (ACO), the most successful and widely recognized algorithmic technique based on ant behavior. This book presents an overview of this rapidly growing field, from its theoretical inception to practical applications, including descriptions of many available ACO algorithms and their uses.
The book first describes the translation of observed ant behavior into working optimization algorithms. The ant colony metaheuristic is then introduced and viewed in the general context of combinatorial optimization. This is followed by a detailed description and guide to all major ACO algorithms and a report on current theoretical findings. The book surveys ACO applications now in use, including routing, assignment, scheduling, subset, machine learning, and bioinformatics problems. AntNet, an ACO algorithm designed for the network routing problem, is described in detail. The authors conclude by summarizing the progress in the field and outlining future research directions. Each chapter ends with bibliographic material, bullet points setting out important ideas covered in the chapter, and exercises. Ant Colony Optimization will be of interest to academic and industry researchers, graduate students, and practitioners who wish to learn how to implement ACO algorithms.
uploading.com
depositfiles.com
mirror

An Interface Layer for Artificial Intelligence


Markov Logic: An Interface Layer for Artificial Intelligence
Pedro Domingos, Daniel Lowd, “Markov Logic: An Interface Layer for Artificial Intelligence”

Morgan & Claypool 2009 ISBN: 1598296922 100 pages PDF 1,1 MB
depositfiles.com
uploading.com
mirror

Saturday, July 18, 2009

An Introduction to Artificial Intelligence

Artificial Intelligence, or AI for short, is a combination of computer science, physiology, and philosophy. AI is a broad topic, consisting of different fields, from machine vision to expert systems. The element that the fields of AI have in common is the creation of machines that can “think”.
In order to classify machines as “thinking”, it is necessary to define intelligence. To what degree does intelligence consist of, for example, solving complex problems, or making generalizations and seeing relationships? And what about perception and comprehension? Research into the areas of learning, language, and sensory perception has aided scientists in building intelligent machines. One of the most challenging tasks facing experts is building systems that mimic the behavior of the human brain, which is made up of billions of neurons and is arguably the most complex matter in the universe. Perhaps the best way to gauge the intelligence of a machine is British computer scientist Alan Turing’s test. He stated that a computer would deserve to be called intelligent if it could deceive a human into believing that it was human.
Artificial Intelligence has come a long way from its early roots, driven by dedicated researchers. The beginnings of AI reach back before electronics, to philosophers and mathematicians such as
Boole and others theorizing on principles that were used as the foundation of AI Logic. AI really began to intrigue researchers with the invention of the computer in 1943. The technology was finally available, or so it seemed, to simulate intelligent behavior. Over the next four decades, despite many stumbling blocks, AI has grown from a dozen researchers, to thousands of engineers and specialists; and from programs capable of playing checkers, to systems designed to diagnose disease.
AI has always been on the pioneering end of computer science. Advanced-level computer languages, as well as computer interfaces and word-processors owe their existence to the research into artificial intelligence. The theory and insights brought about by AI research will set the trend in the future of computing. The products available today are only bits and pieces of what are soon to follow, but they are a movement towards the future of artificial intelligence. The advancements in the quest for artificial intelligence have, and will continue to affect our jobs, our education, and our lives.

The Fuzzy Systems Handbook

The Fuzzy Systems Handbook
A Practitioner’s Guide to Building, Using, and Maintaining Fuzzy Systems

Earl Cox Academic Press 1994 ISBN: 0121942708 512 pages PDF 6,7 MB

A comprehensive introduction to fuzzy logic, this book leads the reader through the complete process of designing, constructing, implementing, verifying and maintaining a platform-independent fuzzy system model. It is written in a tutorial style that assumes no background in fuzzy logic on the reader’s part. The enclosed disk contains all of the book’s examples in C++ code.
uploading.com
depositfiles.com
mirror

Monday, July 13, 2009

The First Post In This Site

Hello my friends.
I have decided to gather all my activities on this site.

I have started many activities in this domain.

I am expanding my projects of interest in AI and ES.

GOODBYE..