--- Original DataFrame Info ---
Original Data Types:
 Sell_Through_Rate     object
Damage_Rate           object
Snapshot_Date         object
Manufacturing_Date    object
Selling_Date          object
Product_Type          object
Edition_Type          object
Target_Gender         object
Price_Bucket          object
Sneaker_Grade         object
Price                  int64
dtype: object

First few rows of original DataFrame:
   Sell_Through_Rate Damage_Rate Snapshot_Date Manufacturing_Date Selling_Date Product_Type Edition_Type Target_Gender Price_Bucket Sneaker_Grade  Price
0               85%          5%    2023-01-01         2022-10-01   2023-01-10      Running      Limited          Male         High             A    150
1               92%          2%    2023-01-02         2022-10-15   2023-01-20   Basketball     Standard        Female       Medium             B    120
2               78%         10%    2023-01-03         2022-11-01   2023-01-15       Casual      Limited        Unisex          Low             C     80
3               65%          1%    2023-01-04         2022-09-01   2023-01-25      Running     Standard          Male       Medium             A    110
4               99%          0%    2023-01-05         2022-12-01   2023-01-08   Basketball      Limited        Female         High             B    200

--- Starting Data Cleaning Process ---

--- Cleaning Percentage Columns ---
Converted 'Sell_Through_Rate' to decimal (e.g., 0.85)
Converted 'Damage_Rate' to decimal (e.g., 0.05)

--- Parsing Date Columns ---
Parsed 'Snapshot_Date' to datetime format.
Parsed 'Manufacturing_Date' to datetime format.
Parsed 'Selling_Date' to datetime format.

--- Missing Values Analysis ---
No missing values found in the DataFrame.

--- Data Types After Conversions ---
Sell_Through_Rate            float64
Damage_Rate                  float64
Snapshot_Date         datetime64[ns]
Manufacturing_Date    datetime64[ns]
Selling_Date          datetime64[ns]
Product_Type                  object
Edition_Type                  object
Target_Gender                 object
Price_Bucket                  object
Sneaker_Grade                 object
Price                          int64
dtype: object

--- Basic Statistics for Numerical Columns ---
                   count   mean   std   min    25%    50%    75%    max
Sell_Through_Rate   5.00   0.84  0.13  0.65   0.78   0.85   0.92   0.99
Damage_Rate         5.00   0.04  0.04  0.00   0.01   0.02   0.05   0.10
Price               5.00 132.00 45.50 80.00 110.00 120.00 150.00 200.00

--- Unique Values in Categorical Columns ---

'Product_Type':
Running       2
Basketball    2
Casual        1
Name: Product_Type, dtype: int64

'Edition_Type':
Limited     3
Standard    2
Name: Edition_Type, dtype: int64

'Target_Gender':
Male      2
Female    2
Unisex    1
Name: Target_Gender, dtype: int64

'Price_Bucket':
High      2
Medium    2
Low       1
Name: Price_Bucket, dtype: int64

'Sneaker_Grade':
A    2
B    2
C    1
Name: Sneaker_Grade, dtype: int64

--- Data cleaning complete. Ready for feature engineering! ---

First few rows of cleaned DataFrame:
    Sell_Through_Rate  Damage_Rate Snapshot_Date Manufacturing_Date Selling_Date Product_Type Edition_Type Target_Gender Price_Bucket Sneaker_Grade  Price
0               0.85         0.05    2023-01-01         2022-10-01   2023-01-10      Running      Limited          Male         High             A    150
1               0.92         0.02    2023-01-02         2022-10-15   2023-01-20   Basketball     Standard        Female       Medium             B    120
2               0.78         0.10    2023-01-03         2022-11-01   2023-01-15       Casual      Limited        Unisex          Low             C     80
3               0.65         0.01    2023-01-04         2022-09-01   2023-01-25      Running     Standard          Male       Medium             A    110
4               0.99         0.00    2023-01-05         2022-12-01   2023-01-08   Basketball      Limited        Female         High             B    200

--- Initial Data Inspection ---

Sneakers DataFrame (sneakers.csv):
Shape: (500, 22)

First 5 rows:

Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 22 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   Product_Name        500 non-null    object 
 1   Product_Type        500 non-null    object 
 2   Total_Produced      500 non-null    int64  
 3   Total_Sold          500 non-null    int64  
 4   Damaged_Units       500 non-null    int64  
 5   Month               500 non-null    object 
 6   Year                500 non-null    int64  
 7   Edition_Type        500 non-null    object 
 8   Unit_Price          500 non-null    int64  
 9   Target_Gender       500 non-null    object 
 10  Sell_Through_Rate   500 non-null    object 
 11  Damage_Rate         500 non-null    object 
 12  Unsold_Inventory    500 non-null    int64  
 13  Estimated_Revenue   500 non-null    int64  
 14  Quarter             500 non-null    object 
 15  Snapshot_Date       500 non-null    object 
 16  Price_Bucket        500 non-null    object 
 17  Manufacturing_Date  500 non-null    object 
 18  Selling_Date        500 non-null    object 
 19  Performance_Score   500 non-null    float64
 20  Sneaker_Grade       500 non-null    object 
 21  Units_Received      500 non-null    int64  
dtypes: float64(1), int64(8), object(13)
memory usage: 86.1+ KB

================================================================================

Sneakers with Dates DataFrame (snicker_dataset_with_dates.csv):
Shape: (3, 5)

First 5 rows:

Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 5 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   Product_ID          3 non-null      int64 
 1   Snapshot_Date       3 non-null      object
 2   Manufacturing_Date  3 non-null      object
 3   Selling_Date        3 non-null      object
 4   Stock_Quantity      3 non-null      int64 
dtypes: int64(2), object(3)
memory usage: 252.0+ bytes

--- Column Name Mapping ---
Original column names in sneakers_df: ['Product_Name', 'Product_Type', 'Total_Produced', 'Total_Sold', 'Damaged_Units', 'Month', 'Year', 'Edition_Type', 'Unit_Price', 'Target_Gender', 'Sell_Through_Rate', 'Damage_Rate', 'Unsold_Inventory', 'Estimated_Revenue', 'Quarter', 'Snapshot_Date', 'Price_Bucket', 'Manufacturing_Date', 'Selling_Date', 'Performance_Score', 'Sneaker_Grade', 'Units_Received']

Original column names in sneakers_dates_df: ['Product_ID', 'Snapshot_Date', 'Manufacturing_Date', 'Selling_Date', 'Stock_Quantity']

Common columns to use for merging: {'Manufacturing_Date', 'Snapshot_Date', 'Selling_Date'}

--- Data Cleaning Summary ---
Before cleaning:
------------------------------
sneakers_df shape: (500, 22)
sneakers_dates_df shape: (3, 5)

Missing values in sneakers_df:

Missing values in sneakers_dates_df:

Task 2.0 - Initial setup complete. Next steps will depend on the output above.

Shape of cleaned data: (500, 22)

First few rows of cleaned data:

Remaining missing values:

Series([], dtype: int64)

Date range in data:
Earliest Manufacturing Date: 2015-01-01 00:00:00
Latest Selling Date: 2025-02-04 00:00:00

Preview of Days_to_Sell calculation:
Average days to sell: 51.76
Days to sell distribution:
count   500.00
mean     51.76
std      22.03
min      15.00
25%      33.00
50%      50.50
75%      72.00
max      90.00
dtype: float64

Original shape before one-hot encoding: (500, 24)
Shape after one-hot encoding: (500, 33)

Features selected for clustering: ['Total_Produced', 'Total_Sold', 'Unit_Price', 'Sell_Through_Rate', 'Damage_Rate', 'Estimated_Revenue', 'Days_to_Sell', 'Is_Limited_Edition']
Shape of scaled features: (500, 8)

First few rows of scaled features:

['X_scaled_df',
 'cleaned_sneakers_data',
 'sneakers_clean',
 'sneakers_dates_clean',
 'sneakers_dates_df',
 'sneakers_df',
 'sneakers_encoded']

Columns in sneakers_encoded:
['Product_Name', 'Total_Produced', 'Total_Sold', 'Damaged_Units', 'Month', 'Year', 'Edition_Type', 'Unit_Price', 'Sell_Through_Rate', 'Damage_Rate', 'Unsold_Inventory', 'Estimated_Revenue', 'Quarter', 'Snapshot_Date', 'Manufacturing_Date', 'Selling_Date', 'Performance_Score', 'Units_Received', 'Days_to_Sell', 'Is_Limited_Edition', 'Product_Type_Brownie', 'Product_Type_Classic', 'Product_Type_Creamy', 'Product_Type_Crisper', 'Product_Type_Peanut Butter', 'Product_Type_White Chocolate', 'Target_Gender_Women', 'Price_Bucket_Low', 'Price_Bucket_Mid', 'Sneaker_Grade_B', 'Sneaker_Grade_C', 'Sneaker_Grade_D', 'Sneaker_Grade_E']

First few rows:

--- sneakers_encoded ---
Columns: ['Product_Name', 'Total_Produced', 'Total_Sold', 'Damaged_Units', 'Month', 'Year', 'Edition_Type', 'Unit_Price', 'Sell_Through_Rate', 'Damage_Rate', 'Unsold_Inventory', 'Estimated_Revenue', 'Quarter', 'Snapshot_Date', 'Manufacturing_Date', 'Selling_Date', 'Performance_Score', 'Units_Received', 'Days_to_Sell', 'Is_Limited_Edition', 'Product_Type_Brownie', 'Product_Type_Classic', 'Product_Type_Creamy', 'Product_Type_Crisper', 'Product_Type_Peanut Butter', 'Product_Type_White Chocolate', 'Target_Gender_Women', 'Price_Bucket_Low', 'Price_Bucket_Mid', 'Sneaker_Grade_B', 'Sneaker_Grade_C', 'Sneaker_Grade_D', 'Sneaker_Grade_E']
Shape: (500, 33)

--- sneakers_df ---
Columns: ['Product_Name', 'Product_Type', 'Total_Produced', 'Total_Sold', 'Damaged_Units', 'Month', 'Year', 'Edition_Type', 'Unit_Price', 'Target_Gender', 'Sell_Through_Rate', 'Damage_Rate', 'Unsold_Inventory', 'Estimated_Revenue', 'Quarter', 'Snapshot_Date', 'Price_Bucket', 'Manufacturing_Date', 'Selling_Date', 'Performance_Score', 'Sneaker_Grade', 'Units_Received']
Shape: (500, 22)

--- sneakers_clean ---
Columns: ['Product_Name', 'Product_Type', 'Total_Produced', 'Total_Sold', 'Damaged_Units', 'Month', 'Year', 'Edition_Type', 'Unit_Price', 'Target_Gender', 'Sell_Through_Rate', 'Damage_Rate', 'Unsold_Inventory', 'Estimated_Revenue', 'Quarter', 'Snapshot_Date', 'Price_Bucket', 'Manufacturing_Date', 'Selling_Date', 'Performance_Score', 'Sneaker_Grade', 'Units_Received', 'Days_to_Sell', 'Is_Limited_Edition']
Shape: (500, 24)

--- sneakers_dates_clean ---
Columns: ['Product_ID', 'Snapshot_Date', 'Manufacturing_Date', 'Selling_Date', 'Stock_Quantity']
Shape: (3, 5)

Selecting features for clustering...

Selected features:
   Sell_Through_Rate  Unit_Price  Days_to_Sell  Is_Limited_Edition  Damage_Rate  Total_Sold
0               0.24         115            66                   1         0.37        5819
1               0.20         100            29                   0         0.47       16395

Scaling features...

Success! Scaled features (first 2 rows):

Feature scaling complete! Ready for clustering.
Shape of scaled features: (500, 6)

Calculating inertia and silhouette scores for different K values...

K=2: Inertia = 2284.61, Silhouette Score = 0.2124

K=3: Inertia = 1904.79, Silhouette Score = 0.2398

K=4: Inertia = 1673.78, Silhouette Score = 0.2008

K=5: Inertia = 1486.38, Silhouette Score = 0.2106

K=6: Inertia = 1371.55, Silhouette Score = 0.2040

K=7: Inertia = 1257.42, Silhouette Score = 0.2140

K=8: Inertia = 1179.24, Silhouette Score = 0.2087

K=9: Inertia = 1106.97, Silhouette Score = 0.2153

K=10: Inertia = 1046.70, Silhouette Score = 0.2023

Training final K-Means model with K=4...

Cluster Distribution:
0    138
1    129
2    146
3     87
Name: Cluster, dtype: int64

Analyzing cluster profiles...

Cluster Profiles (mean values):

Identifying 'Resell Goldmine' clusters...

Potential 'Resell Goldmine' clusters: []

No clusters met the 'Resell Goldmine' criteria. Consider adjusting the criteria.

No clusters meet both high sell-through and limited edition criteria.
Finding best cluster based on highest sell-through rate...
Best cluster based on sell-through rate: 1
================================================================================
                         RESELL GOLDMINE ANALYSIS
================================================================================

🎯 Best Performing Cluster: 1
📊 Cluster Size: 129 sneakers
🔥 Average Sell-Through Rate: 83.0%
💰 Average Price: $83.29
⏱️  Average Days to Sell: 59.0 days
🏆 Limited Edition Ratio: 2%

========================================
TOP 10 SNEAKERS IN BEST CLUSTER
========================================

1. Lacoste Carnaby Evo
   🔸 Price: $95
   🔸 Sell-Through: 99.8%
   🔸 Days to Sell: 68 days
   🔸 Limited Edition: No

2. Puma Smash V2 Sneakers
   🔸 Price: $65
   🔸 Sell-Through: 99.7%
   🔸 Days to Sell: 33 days
   🔸 Limited Edition: No

3. Air Jordan 1 Mid Shoes
   🔸 Price: $110
   🔸 Sell-Through: 99.7%
   🔸 Days to Sell: 89 days
   🔸 Limited Edition: Yes

4. Columbia Trailstorm Waterproof
   🔸 Price: $110
   🔸 Sell-Through: 99.6%
   🔸 Days to Sell: 81 days
   🔸 Limited Edition: No

5. Saucony Jazz Original
   🔸 Price: $80
   🔸 Sell-Through: 99.5%
   🔸 Days to Sell: 60 days
   🔸 Limited Edition: No

6. Red Tape Casual Sneakers
   🔸 Price: $35
   🔸 Sell-Through: 99.2%
   🔸 Days to Sell: 34 days
   🔸 Limited Edition: No

7. Air Jordan 1 Mid Shoes
   🔸 Price: $110
   🔸 Sell-Through: 99.1%
   🔸 Days to Sell: 59 days
   🔸 Limited Edition: Yes

8. Skechers D'Lites
   🔸 Price: $75
   🔸 Sell-Through: 99.1%
   🔸 Days to Sell: 73 days
   🔸 Limited Edition: No

9. Skechers D'Lites
   🔸 Price: $75
   🔸 Sell-Through: 98.8%
   🔸 Days to Sell: 16 days
   🔸 Limited Edition: No

10. Red Tape Casual Sneakers
   🔸 Price: $35
   🔸 Sell-Through: 98.6%
   🔸 Days to Sell: 56 days
   🔸 Limited Edition: No

========================================
RECOMMENDATIONS
========================================
1. Focus on acquiring sneakers from Cluster 1
2. Consider both limited and standard editions based on performance
3. Monitor the top performers for restocks or resale opportunities
4. Track how these products perform over time to refine your strategy

✅ Analysis complete! Results saved to 'sneakers_with_clusters.csv'

================================================================================
                              ANALYSIS COMPLETE
================================================================================

📊 CLUSTER SUMMARY:
 Cluster  Size  Avg_Sell_Through_Rate  Avg_Unit_Price
       1   129                   0.83           83.29
       0   138                   0.56           88.22
       3    87                   0.55           83.10
       2   146                   0.32           88.01

💾 FILES SAVED:
1. sneaker_cluster_summary_20250618_072913.csv - Cluster summary statistics
2. sneakers_with_clusters_20250618_072913.csv - Full dataset with cluster assignments

🎯 NEXT STEPS:
1. Review the cluster profiles in the saved CSV files
2. Focus on clusters with high sell-through rates
3. Consider price points and days to sell when making purchasing decisions
4. Monitor the performance of top products over time

✅ Analysis complete! You can now explore the saved CSV files for detailed insights.

	Product_Name	Product_Type	Total_Produced	Total_Sold	Damaged_Units	Month	Year	Edition_Type	Unit_Price	Target_Gender	Sell_Through_Rate	Damage_Rate	Unsold_Inventory	Estimated_Revenue	Quarter	Snapshot_Date	Price_Bucket	Manufacturing_Date	Selling_Date	Performance_Score	Sneaker_Grade	Units_Received
0	Nike Air Force 1 '07 Sneakers	Creamy	24592	5819	9012	September	2017	Limited	115	Men	23.66%	36.65%	9761	669185	Q3	9/1/2017	High	9/1/2017	11/6/2017	-12.98	E	24592
1	Adidas Originals Samba OG Shoes	Creamy	81482	16395	38698	September	2019	Standard	100	Men	20.12%	47.49%	26389	1639500	Q3	9/1/2019	Mid	9/1/2019	9/30/2019	-27.37	E	81482
2	Air Jordan 1 Mid Shoes	Peanut Butter	76237	8478	26062	January	2023	Special Release	110	Women	11.12%	34.19%	41697	932580	Q1	1/1/2023	High	1/1/2023	3/28/2023	-23.06	E	76237
3	Red Tape Casual Sneakers	Brownie	46463	5425	10463	October	2022	Anniversary	35	Men	11.68%	22.52%	30575	189875	Q4	10/1/2022	Low	10/1/2022	12/15/2022	-10.84	E	46463
4	Nike Court Vision Low Shoes	Peanut Butter	54118	11698	6078	April	2019	Standard	80	Women	21.62%	11.23%	36342	935840	Q2	4/1/2019	Mid	4/1/2019	5/6/2019	10.38	D	54118

	Total_Produced	Total_Sold	Unit_Price	Sell_Through_Rate	Damage_Rate	Estimated_Revenue	Days_to_Sell	Is_Limited_Edition
0	-1.12	-1.09	1.26	-1.25	0.73	-0.93	0.65	2.13
1	1.06	-0.60	0.61	-1.39	1.29	-0.43	-1.03	-0.47
2	0.86	-0.97	1.04	-1.74	0.61	-0.79	1.56	-0.47
3	-0.28	-1.11	-2.22	-1.72	0.01	-1.17	1.06	-0.47
4	0.01	-0.82	-0.26	-1.33	-0.58	-0.79	-0.76	-0.47

	Sell_Through_Rate	Unit_Price	Days_to_Sell	Is_Limited_Edition	Damage_Rate	Total_Sold
0	-1.25	1.26	0.65	2.13	0.73	-1.09
1	-1.39	0.61	-1.03	-0.47	1.29	-0.60

	Product_ID	Snapshot_Date	Manufacturing_Date	Selling_Date	Stock_Quantity
0	1	2023-01-01	2022-10-01	2023-01-10	100
1	2	2023-01-02	2022-10-15	2023-01-20	50
2	3	2023-01-03	2022-11-01	2023-01-15	75

	Sell_Through_Rate	Unit_Price	Days_to_Sell	Is_Limited_Edition	Total_Sold	Count
Cluster
0	0.56	88.22	32.93	0.00	23856.01	138
1	0.83	83.29	58.97	0.02	52741.26	129
2	0.32	88.01	62.67	0.00	15478.42	146
3	0.55	83.10	52.62	1.00	26306.30	87