Question 20.1
Describe analytics models that could be used to help the company monetize their data: How could the
company use these data sets to generate value, and what analytics models might they need to do it?
There are lots of good answers, and I want you to think about two types – at least one of your answers
should be based on just one data set (the one they’ve collected internally on customer browsing
patterns on the website), and at least one of your other answers should be based on combining more
than one of the data sets.
Using just Data Set #3 (collected by the company using website tracking code) – Classify and profile
customers to identify their spending potential (how much they would be willing to spend for an item,
as well as how much they could spend over time)
Model 1: given the list of products purchased in the past, the price of each product (assuming this
information is also available), the date of purchase, and the ship-to address, use K-means clustering to
group customers into nine categories based on how often they spend and how expensive the products they
buy are (low, medium, or high on each dimension).
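A minimal sketch of the clustering step, assuming each customer (or each ship-to profile) has already been reduced to two numbers: a purchase-frequency measure and an average relative price. It implements plain Lloyd's k-means in standard-library Python rather than any specific library, and the function names and sample values are hypothetical. Features are standardized first because k-means uses Euclidean distance, so a feature with a larger scale would otherwise dominate.

```python
import math
import random
import statistics


def standardize(rows):
    """Z-score each column so both features carry equal weight in the distance."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [tuple((x - m) / s for x, m, s in zip(row, means, sds)) for row in rows]


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: returns (centroids, labels) for the given points."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # initialize from the data
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, labels


# Hypothetical (purchases per month, average relative price) for 12 customers.
features = [
    (0.5, 0.6), (0.6, 1.0), (0.4, 1.8),
    (2.0, 0.5), (2.2, 1.1), (1.9, 1.9),
    (5.0, 0.7), (5.5, 1.0), (4.8, 2.1),
    (0.5, 0.65), (2.1, 1.05), (5.2, 2.0),
]
centroids, labels = kmeans(standardize(features), k=9)
```

After clustering, an analyst would inspect each centroid (for example, "high frequency, high relative price") to attach the low/medium/high labels; k-means does not guarantee that the nine clusters line up with a neat 3 × 3 grid.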
By identifying the type of spender each customer is (and therefore their spending potential), the
company would be able to tailor the price points of the recommended products to each customer. If a
customer is a ‘high spender’, then the company could show/recommend more expensive products to
them and ideally generate more revenue.
The date of purchase can be used to help determine whether the customer shops frequently or not.
One way would be to take the time difference (e.g., in days) between each pair of consecutive purchase
dates and average those differences. If the customer is a frequent shopper, then the company can
update their recommendations more frequently. Because the customer is then exposed to more products,
this can encourage spending if a product catches their attention.
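The averaging step can be sketched as follows, assuming purchase dates are available as `datetime.date` objects; the function name and dates are hypothetical. Because consecutive gaps telescope, the mean gap equals (last date - first date) / (number of purchases - 1), so only the first and last dates actually affect the result.

```python
from datetime import date
from statistics import fmean


def mean_days_between_purchases(purchase_dates):
    """Average gap, in days, between consecutive purchases (None if fewer than two)."""
    ordered = sorted(purchase_dates)
    if len(ordered) < 2:
        return None  # a single purchase says nothing about shopping frequency
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    return fmean(gaps)


# Hypothetical purchase history for one customer, deliberately out of order.
history = [date(2022, 1, 31), date(2022, 1, 1), date(2022, 1, 11)]
print(mean_days_between_purchases(history))  # 15.0
```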
The ship-to address should be taken into consideration because, even though the purchases are on one
account, there may be multiple users, and the spending habits of these users should not be confused
with each other.
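A minimal sketch of keeping each ship-to address as its own profile, assuming each order is a dictionary with hypothetical `account_id` and `ship_to` keys. The spending features would then be computed per profile rather than per account.

```python
from collections import defaultdict


def split_by_ship_to(orders):
    """Group orders by (account, ship-to address) so each likely user gets a profile."""
    profiles = defaultdict(list)
    for order in orders:
        profiles[(order["account_id"], order["ship_to"])].append(order)
    return dict(profiles)


# Hypothetical orders: one account shipping to two different households.
orders = [
    {"account_id": 7, "ship_to": "1 Oak St, Atlanta", "price": 25.0},
    {"account_id": 7, "ship_to": "9 Elm St, Austin", "price": 300.0},
    {"account_id": 7, "ship_to": "1 Oak St, Atlanta", "price": 12.0},
]
profiles = split_by_ship_to(orders)
```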
One way to evaluate how expensive a product is would be to compare its price with the prices of
similar products. The company can then use the resulting product price clusters to determine the
spending potential of each customer: if a customer buys a lot of products that are expensive relative to
similar products, they are probably a high spender. The clustering becomes more complicated the more
mixed a customer's purchases are.
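One way to turn "compared to similar products" into a number is sketched below, assuming products are grouped into categories of similar items and that catalog prices for each category are available. Dividing by the category median is an illustrative choice, not the only option, and the names and prices are hypothetical. A customer index above 1 suggests a tendency to buy pricier-than-typical items; averaging over a mixed basket is exactly where the complication mentioned above shows up.

```python
from statistics import fmean, median


def relative_price(price, similar_prices):
    """Price expressed as a multiple of the median price of comparable products."""
    return price / median(similar_prices)


def customer_price_index(purchases, prices_by_category):
    """Average relative price over everything a customer (or profile) bought."""
    ratios = [
        relative_price(p["price"], prices_by_category[p["category"]])
        for p in purchases
    ]
    return fmean(ratios)


# Hypothetical catalog prices for two categories of similar products.
catalog = {"blender": [40.0, 50.0, 60.0], "towel": [10.0, 20.0, 30.0]}
purchases = [
    {"category": "blender", "price": 100.0},  # 2.0x the typical blender
    {"category": "towel", "price": 10.0},     # 0.5x the typical towel
]
print(customer_price_index(purchases, catalog))  # 1.25
```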
• Model 1b: given the list of products purchased in the past, the price of each product, and the
interquartile range of prices for similar products, use K-means clustering (k = 3) on product prices to
determine whether each product is a less expensive, medium expensive, or expensive
item.
This study source was downloaded by 100000834091502 from CourseHero.com on 05-16-2022 06:58:46 GMT -05:00
https://www.coursehero.com/file/40842907/ISyE-6501-Homework-15pdf/
• The interquartile range of prices for each product type could be used to exclude outliers.
Meanwhile, the price of the product at the time of sale could be used to take other factors,
like sales/discounts (e.g., Black Friday), into consideration. Even if the normal price of the product
is medium expensive, if the discounted price puts the product in the less expensive zone, that
product would be considered ‘less expensive’ for that purchase.
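Model 1b can be sketched as below for a single product type, assuming a list of that type's prices is available. It drops outliers with the common 1.5 × IQR rule (the 1.5 multiplier is a convention, not something the text specifies), runs a one-dimensional k-means with k = 3, and then places the price actually paid, which may be a discounted price, into the nearest tier. Function names and sample prices are hypothetical.

```python
import statistics


def iqr_filter(prices, k=1.5):
    """Drop prices outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(prices, n=4)
    iqr = q3 - q1
    return [p for p in prices if q1 - k * iqr <= p <= q3 + k * iqr]


def kmeans_1d(values, k=3, iters=100):
    """One-dimensional Lloyd's k-means; returns the k centers sorted low to high."""
    vals = sorted(values)
    # Deterministic start: spread the initial centers evenly across the sorted values.
    centers = [vals[round(i * (len(vals) - 1) / (k - 1))] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in vals:
            nearest = min(range(k), key=lambda c: abs(v - centers[c]))
            groups[nearest].append(v)
        new_centers = [statistics.fmean(g) if g else centers[c] for c, g in enumerate(groups)]
        if new_centers == centers:
            break  # centers stopped moving
        centers = new_centers
    return sorted(centers)


TIERS = ("less expensive", "medium expensive", "expensive")


def price_tier(price_paid, centers):
    """Label the price actually paid by its nearest center (centers sorted, k = 3)."""
    nearest = min(range(len(centers)), key=lambda c: abs(price_paid - centers[c]))
    return TIERS[nearest]


# Hypothetical prices for one product type; 500.0 is an outlier.
prices = [9.0, 10.0, 11.0, 19.0, 20.0, 21.0, 39.0, 40.0, 41.0, 500.0]
centers = kmeans_1d(iqr_filter(prices))
print(centers)                    # [10.0, 20.0, 40.0]
print(price_tier(20.0, centers))  # regular price: medium expensive
print(price_tier(12.0, centers))  # Black Friday price: less expensive
```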
Using both Data Set #1 (purchased from an alumni magazine publisher) and Data Set #3 (collected by
the company using website tracking code) – Matching customers across data sets in order to combine
the data sets.
Model 2: given similar customer fields across the data sets – First Name (in both), Last Name (in both),
Current City (Data Set #1), and Ship-to Address (Data Set #3) – and other not-as-similar fields like
interests (Data Set #1) and list of products purchased in the past (Data Set #3), use logistic regression to
determine if a customer in one data set is the same as a customer in another data set.
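A minimal sketch of the matching classifier, assuming every candidate pair of records (one from Data Set #1, one from Data Set #3) has been turned into numeric similarity features and that a small set of pairs has been hand-labeled as match (1) or non-match (0) for training. The feature order, the toy training pairs, and the learning-rate and epoch settings are all hypothetical; the fit is plain batch gradient descent on the log-loss rather than any particular library.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def match_probability(features, w, b):
    """Estimated probability that the two records describe the same customer."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, features)) + b)


# Hypothetical features per pair:
# [first-name similarity, last-name similarity, same city (0/1), interest overlap score]
X = [
    [1.0, 1.0, 1.0, 0.5], [0.9, 1.0, 1.0, 0.3], [1.0, 0.9, 0.0, 0.4],  # matches
    [0.2, 0.1, 0.0, 0.0], [0.4, 0.2, 1.0, 0.0], [0.1, 0.3, 0.0, 0.1],  # non-matches
]
y = [1, 1, 1, 0, 0, 0]
w, b = fit_logistic(X, y)
```

In practice, a probability threshold (for example 0.5, or a stricter cutoff if false merges are costly) would decide which pairs are treated as the same customer.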
The city part of the Ship-to Address can be used as the customer's current city. If the account has
multiple Ship-to Addresses in more than one city, one challenge is determining which Ship-to Address
belongs to the customer (versus another user of the account). For services like Amazon, there is
often a name associated with each address. If that's the case with this company, the address with the
account holder's name can be used as the home address. If not, the address that is used most often can
be considered the home address.
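The home-address rule described above can be sketched as follows, assuming each order's ship-to is available as a hypothetical (recipient name, address) pair and that addresses end in ", City". Real address parsing would need a dedicated library or service.

```python
from collections import Counter


def home_address(ship_tos, account_name=None):
    """Prefer addresses sent to the account holder; otherwise use the most-used one."""
    if account_name is not None:
        named = [addr for name, addr in ship_tos if name == account_name]
        if named:
            return Counter(named).most_common(1)[0][0]
    return Counter(addr for _, addr in ship_tos).most_common(1)[0][0]


def city_of(address):
    """Assumes a 'street, city' format and returns the last comma-separated part."""
    return address.rsplit(",", 1)[-1].strip()


# Hypothetical ship-to history for one account.
ship_tos = [
    ("Ann Lee", "1 Oak St, Atlanta"),
    ("Bo Lee", "9 Elm St, Austin"),
    ("Bo Lee", "9 Elm St, Austin"),
]
print(city_of(home_address(ship_tos, account_name="Ann Lee")))  # Atlanta
print(city_of(home_address(ship_tos)))                          # Austin
```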
The similarity of these text fields can be evaluated using Levenshtein distance, which counts the
minimum number of single-character edits needed to turn one word into the other. Edits can be
insertions, deletions, or substitutions of characters. Logistic regression can then use these similarity
scores to estimate the likelihood that the Data Set #1 customer is the same as the Data Set #3 customer.
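A standard dynamic-programming implementation of Levenshtein distance is sketched below, together with one hypothetical way to normalize it into a 0-to-1 similarity score (1 minus the distance divided by the longer string's length) so it can be used as a logistic-regression feature. Lower-casing and trimming before comparing is an assumption, not part of the text.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free when characters match
            ))
        prev = curr
    return prev[-1]


def name_similarity(a, b):
    """Hypothetical normalization: 1.0 for identical names, 0.0 for totally different."""
    a, b = a.strip().lower(), b.strip().lower()
    longest = max(len(a), len(b)) or 1  # avoid dividing by zero for two empty strings
    return 1.0 - levenshtein(a, b) / longest


print(levenshtein("kitten", "sitting"))  # 3
print(name_similarity("Jon", "John"))    # 0.75
```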
Interests and the list of products, though not the same, can be compared as well. The two aren't
equivalent because someone can purchase lots of toilet paper without being interested in toilet paper.
On the other hand, someone who regularly purchases a lot of groceries may well be interested in cooking.
• Product types can be mapped to the different interests contained in Data Set #1 (or ‘other’ if
they can’t be mapped to anything).
• A purchased product that ties to an interest listed for the customer in Data Set #1 can be used
as evidence that the two records describe the same person, like the cooking example above or pet
products if they're interested in pets and/or animals. However, the absence of a related purchase
despite an interest doesn't indicate that the records describe different people.
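The mapping and the one-directional evidence can be sketched as below. The category-to-interest dictionary is a hypothetical example; in practice it would be built by hand or from product metadata. The function returns only interests backed by at least one purchase, so an interest with no matching purchase adds nothing rather than counting against a match; the size of the returned set could serve as a matching feature.

```python
# Hypothetical mapping from product categories to Data Set #1 interests.
CATEGORY_TO_INTEREST = {
    "groceries": "cooking",
    "cookware": "cooking",
    "pet food": "pets",
    "pet toys": "pets",
}


def supported_interests(purchased_categories, interests):
    """Interests that at least one purchase maps to; unmapped categories count as 'other'."""
    mapped = {CATEGORY_TO_INTEREST.get(c, "other") for c in purchased_categories}
    mapped.discard("other")  # 'other' products carry no interest evidence
    return mapped & set(interests)


purchases = ["groceries", "toilet paper", "groceries"]
print(supported_interests(purchases, ["cooking", "hiking"]))  # {'cooking'}
```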
Using both Data Set #1 (purchased from an alumni magazine publisher) and Data Set #3 (collected by
the company using website tracking code) – Generate better and more diverse product suggestions.
Model 3: given the list of products purchased in the past (Data Set #3), interests (Data Set #1), the results
from Model 1 (customer profiles), and the results from Model 2 (customer matching), use optimization to
determine the best set of product suggestions to present to each customer. The goal is to generate
better and more diverse product suggestions and to proactively predict what other items the customer
may be interested in.
• Model 3a: given the list of products purchased in the past for each customer (Data Set #3), use
machine learning with pairwise association mining to determine product purchasing
relationships.
• The lists of products for every customer can be used to perform pairwise association mining: if a
customer purchases one item, does the customer also purchase another item? A popular
example is cereal and milk: since the two are often bought together, this relationship can
influence where the items are placed (in a brick-and-mortar store) and encourage cross-promotional
pricing (e.g., mark one down and upsell the other).
• These relationships allow the company to predict what other products the customer may be
interested in. This is a more proactive suggestion, as opposed to a reactive suggestion based on
past products/searches. These relationships can also help increase revenue because sometimes
people don't know what they want until it's in front of them.
• Model 3b: given the list of interests for each customer (Data Set #1), use machine learning with
pairwise association mining to determine relationships between interests and purchased products.
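Pairwise association mining can be sketched with support, confidence, and lift for every ordered pair of items, assuming each customer's purchase history is treated as one "basket". The same function could serve Model 3b if a matched customer's basket also contained hypothetical tokens such as `interest:cooking`, so that interest-to-product rules fall out of the same computation. The thresholds and baskets are illustrative.

```python
from collections import Counter
from itertools import combinations


def pairwise_rules(baskets, min_support=0.2, min_confidence=0.5):
    """Return {(a, b): (support, confidence, lift)} for rules 'a implies b'."""
    n = len(baskets)
    item_counts, pair_counts = Counter(), Counter()
    for basket in baskets:
        items = sorted(set(basket))  # count each item once per basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = {}
    for (x, y), together in pair_counts.items():
        support = together / n  # share of baskets containing both items
        if support < min_support:
            continue
        for a, b in ((x, y), (y, x)):
            confidence = together / item_counts[a]  # estimate of P(b | a)
            if confidence >= min_confidence:
                # Lift > 1 means the pair co-occurs more often than chance would suggest.
                lift = confidence / (item_counts[b] / n)
                rules[(a, b)] = (support, confidence, lift)
    return rules


# Hypothetical baskets (one per customer).
baskets = [
    ["cereal", "milk"],
    ["cereal", "milk", "bread"],
    ["milk", "bread"],
    ["cereal", "milk"],
    ["eggs"],
]
rules = pairwise_rules(baskets)
print(rules[("cereal", "milk")])  # (0.6, 1.0, 1.25)
```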
The results of Model 3a and Model 3b can then be used to help determine what other products a customer
would be interested in, based on both what they previously purchased and what they are interested in. A
potential constraint for the model would be that each suggested product must appear in one of the
relationships found by the pairwise association mining.
Model 3b, which is based on interests, is particularly useful if the customer has a sparse purchasing
history and can be used to encourage more spending through more tailored and targeted suggestions.
However, interest-based predictions are only available for customers in Data Set #3 who have been
matched to a record in Data Set #1 (via Model 2).
Then, using the customer profiles from Model 1, the price points of the products can be taken into
consideration when determining which products (less expensive, medium expensive, expensive) to
suggest. This not only encourages more purchases but also maximizes the potential revenue from
each purchase, which would be the objective function of the optimization model.
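A brute-force sketch of the optimization under stated assumptions: candidate products must appear in a mined association rule for this customer (the constraint described above), each candidate carries a hypothetical estimated purchase probability `p_buy` (for example, the rule's confidence), the allowed price tiers come from the customer's Model 1 profile, and "more diverse" is encoded as at most one suggestion per category. The objective is expected revenue, the sum of `p_buy * price`. Enumerating combinations only works for a handful of candidates; a real catalog would call for an integer-programming solver or a greedy heuristic.

```python
from itertools import combinations


def best_suggestions(candidates, related, allowed_tiers, k=3):
    """Pick up to k suggestions maximizing expected revenue sum(p_buy * price).

    Constraints: the product must appear in a mined association rule (`related`),
    its price tier must suit the customer's profile, and at most one per category.
    """
    eligible = [
        c for c in candidates
        if c["name"] in related and c["tier"] in allowed_tiers
    ]
    best, best_value = [], 0.0
    for size in range(1, min(k, len(eligible)) + 1):
        for combo in combinations(eligible, size):
            if len({c["category"] for c in combo}) < size:
                continue  # diversity constraint: two items share a category
            value = sum(c["p_buy"] * c["price"] for c in combo)
            if value > best_value:
                best, best_value = list(combo), value
    return [c["name"] for c in best], best_value


# Hypothetical candidates for a "medium spender" profile from Model 1.
candidates = [
    {"name": "milk", "category": "dairy", "tier": "less expensive", "price": 4.0, "p_buy": 0.75},
    {"name": "yogurt", "category": "dairy", "tier": "medium expensive", "price": 6.0, "p_buy": 0.6},
    {"name": "coffee beans", "category": "coffee", "tier": "medium expensive", "price": 15.0, "p_buy": 0.5},
    {"name": "espresso machine", "category": "appliances", "tier": "expensive", "price": 300.0, "p_buy": 0.1},
    {"name": "tea", "category": "tea", "tier": "less expensive", "price": 5.0, "p_buy": 0.2},
]
related = {"milk", "yogurt", "coffee beans", "espresso machine"}  # tea has no mined rule
names, value = best_suggestions(candidates, related, {"less expensive", "medium expensive"})
print(names, round(value, 2))  # ['yogurt', 'coffee beans'] 11.1
```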