Capri
Overview
A typical data set used as input to a sequence detection
algorithm is shown below. Here the Customer Number is the
Primary Key, the Date/Time field is the Secondary Key, and
the Item field is the Event field, which in this example
data set records the purchase of a specific item. The
objective of applying a sequence detection algorithm to
this data would be to understand the purchasing behaviour
of customers, where purchasing behaviour is defined by the
order in which customers purchase different items.
| Customer Number | Date/Time           | Item    |
| 1               | 01/01/1999:23:04:02 | Beer    |
| 1               | 01/04/1999:13:04:52 | Nappies |
| 2               | 01/02/1999:14:16:31 | Shirt   |
| 2               | 01/03/1999:11:37:09 | Trouser |
| 2               | 01/03/1999:11:37:09 | Shoes   |
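As a rough illustration of how such a table becomes input for sequence detection, the sketch below groups the rows by the Primary Key and orders each group by the Secondary Key. It is not Capri's own code. The function name `build_sequences` and the day/month/year reading of the date field are assumptions made for this example.

```python
from collections import defaultdict
from datetime import datetime

# Rows from the example table: (customer number, date/time, item).
rows = [
    (1, "01/01/1999:23:04:02", "Beer"),
    (1, "01/04/1999:13:04:52", "Nappies"),
    (2, "01/02/1999:14:16:31", "Shirt"),
    (2, "01/03/1999:11:37:09", "Trouser"),
    (2, "01/03/1999:11:37:09", "Shoes"),
]

# Assumption: the date field is day/month/year:hour:minute:second.
STAMP_FORMAT = "%d/%m/%Y:%H:%M:%S"


def build_sequences(rows):
    """Group events by primary key and order them by secondary key."""
    by_customer = defaultdict(list)
    for customer, stamp, item in rows:
        when = datetime.strptime(stamp, STAMP_FORMAT)
        by_customer[customer].append((when, item))

    sequences = {}
    for customer, events in by_customer.items():
        # Sort on the timestamp only; the sort is stable, so events with
        # identical timestamps keep their original row order.
        events.sort(key=lambda event: event[0])
        sequences[customer] = [item for _, item in events]
    return sequences


sequences = build_sequences(rows)
```

Each customer ends up with one ordered list of items, which is the form a sequence detection algorithm would typically consume.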
Sequence detection algorithms have their origin in basket
analysis and web mining, though they have also been applied
successfully to fraud detection and the analysis of
commodity prices.
An example rule discovered by Capri when applied to
crude oil prices on the New York and London exchanges
is shown below:
If the price in London increases by less than 0.9%,
and the next day there is a drop in the NYMEX WTI price
of between 0.79% and 1.55% and a drop in the London price
of between 0.96% and 1.68%,
Then (on the following day) there will be a further drop
in the NYMEX WTI price of between 1.5% and 2.45%.
Confidence in the rule is 75%.
This pattern appeared in 9% of all time windows analysed.
Discovering such rules is a departure from traditional
sequence detection, which generally dealt with discrete
events occurring in time rather than with a continuous
series such as commodity prices.
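One plausible way to bridge a continuous series and discrete events is to convert prices into day-over-day percentage changes and then assign each change to a labelled range. The sketch below shows that idea only; how Capri actually handles continuous data is not described here. The bin labels, bin boundaries, and prices are all hypothetical.

```python
def percent_changes(prices):
    """Day-over-day percentage change for a list of prices."""
    return [
        (today - previous) / previous * 100.0
        for previous, today in zip(prices, prices[1:])
    ]


def to_event(change, bins):
    """Return the label of the first bin whose range contains the change."""
    for label, low, high in bins:
        if low <= change < high:
            return label
    return "other"


# Hypothetical bins (label, lower bound, upper bound), in percent.
BINS = [
    ("drop_0.79_to_1.55", -1.55, -0.79),
    ("rise_under_0.9", 0.0, 0.9),
]

# Made-up prices for illustration only, not real market data.
prices = [100.0, 100.5, 99.5]
events = [to_event(change, BINS) for change in percent_changes(prices)]
```

Once each day is reduced to a label like this, the series can be treated as a sequence of discrete events.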
Capri belongs to the Apriori family of data mining
algorithms, which has its origin in association rule
discovery. Capri is used within third-party applications
to discover different types of sequences across records
(and therefore over time). Prior to Capri, general sequence
questions could not be answered unless the sequences were
known in advance or the problem was narrowly constrained.
Typical sequential patterns that can be found in data
sets using Capri are:
• Products bought by customers across multiple
transactions.
• Financial transactions made by a business in
a fiscal year.
• Clickstreams or Web site paths for understanding
purchases, exits, traffic, and crime on the Web.
• Frequent sequences of the chemical bases that
make up human DNA.
• Patterns of non-compliance or fraud over time.
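To make the idea of a sequential pattern concrete, the sketch below counts how often one item appears before another within the same sequence. This is a deliberately simple pair-counting illustration, not the Capri algorithm, and the function name `ordered_pair_support` and the purchase histories are hypothetical.

```python
from collections import Counter
from itertools import combinations


def ordered_pair_support(sequences):
    """Fraction of sequences in which item a occurs somewhere before item b."""
    counts = Counter()
    for sequence in sequences:
        seen = set()
        # combinations() yields index pairs with i < j, so the first item
        # of each pair always occurs earlier in the sequence.
        for i, j in combinations(range(len(sequence)), 2):
            if sequence[i] != sequence[j]:
                seen.add((sequence[i], sequence[j]))
        # Count each pattern at most once per sequence.
        counts.update(seen)
    total = len(sequences)
    return {pair: n / total for pair, n in counts.items()}


# Hypothetical purchase histories, one ordered list per customer.
histories = [
    ["Beer", "Nappies"],
    ["Beer", "Crisps", "Nappies"],
    ["Nappies", "Beer"],
    ["Shirt", "Shoes"],
]
support = ordered_pair_support(histories)
```

Unlike a basket analysis, the order matters here: "Beer then Nappies" and "Nappies then Beer" are counted as different patterns.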
Association Algorithms
Association algorithms such as GRI and Apriori (the
algorithm that Capri is based on) generate rules showing
which things (events, attributes, purchases, etc.)
typically occur together. An association algorithm
produces a list of rules that describe the conditions
under which certain conclusions occur. A typical rule
from GRI/Apriori might look like this:
conclusion <= conditionA & conditionB & ...
beer <= snacks & newspaper
This rule is interpreted as follows: customers who buy
snacks and a newspaper are also likely to buy beer.
Note: This rule does not show a causal relationship;
it merely shows the likelihood of certain things
occurring together. Association rules normally include
information on:
• Coverage (or Support). Indicates how often the
conditions and conclusion occur together.
• Accuracy (or Confidence). Indicates how often the
conclusion also occurs when the conditions occur.
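The two measures can be worked through on a small example. The sketch below follows the definitions given above for the rule beer <= snacks & newspaper. The function name `rule_metrics` and the baskets are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def rule_metrics(baskets, conditions, conclusion):
    """Return (support, confidence) for the rule conclusion <= conditions."""
    conditions = set(conditions)
    # Baskets in which every condition is present.
    with_conditions = [b for b in baskets if conditions <= set(b)]
    # Of those, the baskets that also contain the conclusion.
    with_both = [b for b in with_conditions if conclusion in b]

    support = len(with_both) / len(baskets)
    confidence = (
        len(with_both) / len(with_conditions) if with_conditions else 0.0
    )
    return support, confidence


# Hypothetical baskets, one set of items per transaction.
baskets = [
    {"snacks", "newspaper", "beer"},
    {"snacks", "newspaper", "beer"},
    {"snacks", "newspaper"},
    {"beer"},
    {"snacks"},
]
support, confidence = rule_metrics(baskets, {"snacks", "newspaper"}, "beer")
```

Here the conditions and conclusion occur together in 2 of 5 baskets (support 0.4), and the conclusion holds in 2 of the 3 baskets where the conditions occur (confidence of about 0.67).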