What You’ll Build
A complete sentiment analysis system:
- Load and explore a review dataset
- Train a binary classifier (positive/negative)
- Evaluate with real metrics (accuracy, F1, AUC)
- Save the model as a reusable
.zipfile - Serve predictions from an ASP.NET Core API
No Python. No Jupyter notebooks. Everything runs in standard .NET.
Step 1: Project Setup
dotnet new console -n SentimentAnalysis
cd SentimentAnalysis
dotnet add package Microsoft.ML
Step 2: Prepare the Data
Create a file called data/reviews.csv. Here’s a sample structure (in production you’d use the Yelp or IMDB dataset):
Text,Sentiment
"Great product, works exactly as described",1
"Terrible quality, broke after two days",0
"Love this! Fast shipping and excellent build",1
"Worst purchase ever. Complete waste of money",0
"Decent for the price. Does what it needs to do",1
"Arrived damaged, customer service was unhelpful",0
For a meaningful model, you need at least 1,000 rows per class. The Yelp Review Polarity dataset has 560,000 labeled reviews. Download and convert it to CSV for real training runs.
Step 3: Define Data Classes
using Microsoft.ML.Data;
public class ReviewInput
{
[LoadColumn(0)]
public string Text { get; set; } = string.Empty;
[LoadColumn(1), ColumnName("Label")]
public bool Sentiment { get; set; }
}
public class SentimentPrediction
{
[ColumnName("PredictedLabel")]
public bool Prediction { get; set; }
public float Probability { get; set; }
public float Score { get; set; }
}
Label is the column ML.NET uses for supervised learning. The bool type tells ML.NET this is binary classification.
Step 4: Load and Split Data
using Microsoft.ML;
var mlContext = new MLContext(seed: 42); // Seed for reproducibility
// Load from CSV
var dataView = mlContext.Data.LoadFromTextFile<ReviewInput>(
"data/reviews.csv",
hasHeader: true,
separatorChar: ',',
allowQuoting: true);
// Split: 80% training, 20% testing
var split = mlContext.Data.TrainTestSplit(dataView, testFraction: 0.2);
Console.WriteLine($"Training rows: {split.TrainSet.GetRowCount()}");
Console.WriteLine($"Testing rows: {split.TestSet.GetRowCount()}");
Always split before training. Evaluating on training data tells you nothing about how the model performs on unseen text.
Step 5: Build the Training Pipeline
var pipeline = mlContext.Transforms.Text
.FeaturizeText("Features", nameof(ReviewInput.Text))
.Append(mlContext.BinaryClassification.Trainers
.SdcaLogisticRegression(
labelColumnName: "Label",
featureColumnName: "Features"));
What happens in this pipeline:
- FeaturizeText — Tokenizes text, removes stop words, generates n-grams, converts to TF-IDF numeric features
- SdcaLogisticRegression — Stochastic Dual Coordinate Ascent for logistic regression; fast and effective for text classification
Alternative Trainers
// LightGBM — gradient boosting, often slightly more accurate
mlContext.BinaryClassification.Trainers.LightGbm(
labelColumnName: "Label",
featureColumnName: "Features")
// Averaged Perceptron — linear classifier, fastest training
mlContext.BinaryClassification.Trainers.AveragedPerceptron(
labelColumnName: "Label",
featureColumnName: "Features")
Step 6: Train the Model
Console.WriteLine("Training model...");
var stopwatch = System.Diagnostics.Stopwatch.StartNew();
var model = pipeline.Fit(split.TrainSet);
stopwatch.Stop();
Console.WriteLine($"Training completed in {stopwatch.ElapsedMilliseconds}ms");
On a dataset of 10,000 reviews, SDCA training typically completes in under 10 seconds.
Step 7: Evaluate
var predictions = model.Transform(split.TestSet);
var metrics = mlContext.BinaryClassification.Evaluate(predictions, "Label");
Console.WriteLine($"""
Model Evaluation:
─────────────────────────────
Accuracy: {metrics.Accuracy:P2}
AUC: {metrics.AreaUnderRocCurve:F4}
F1 Score: {metrics.F1Score:F4}
Precision: {metrics.PositivePrecision:P2}
Recall: {metrics.PositiveRecall:P2}
""");
Interpreting the numbers:
- Accuracy — Overall percentage of correct predictions
- AUC (Area Under ROC Curve) — Model’s ability to distinguish positive from negative; 0.5 = random, 1.0 = perfect
- F1 — Harmonic mean of precision and recall; use this when classes are imbalanced
- Precision — Of predictions labeled positive, what percentage actually are
- Recall — Of actual positives, what percentage were found
For sentiment analysis, aim for AUC > 0.90 and F1 > 0.85 before deploying.
Step 8: Save the Model
var modelPath = "model/sentiment-model.zip";
Directory.CreateDirectory("model");
mlContext.Model.Save(model, split.TrainSet.Schema, modelPath);
Console.WriteLine($"Model saved to {modelPath}");
The .zip file contains the entire pipeline — feature extraction and trained weights. You can load it anywhere without retraining.
Step 9: Single Predictions
var predictionEngine = mlContext.Model
.CreatePredictionEngine<ReviewInput, SentimentPrediction>(model);
var testReviews = new[]
{
"This product exceeded my expectations, highly recommend!",
"Absolute garbage. Returning immediately.",
"It's okay, nothing special but does the job.",
"The customer support team was incredibly helpful and resolved my issue quickly."
};
foreach (var review in testReviews)
{
var prediction = predictionEngine.Predict(new ReviewInput { Text = review });
var sentiment = prediction.Prediction ? "Positive" : "Negative";
Console.WriteLine($" [{sentiment}] ({prediction.Probability:P1}) {review}");
}
Step 10: Serve from ASP.NET Core
Create a separate web project:
cd ..
dotnet new web -n SentimentApi
cd SentimentApi
dotnet add package Microsoft.ML
dotnet add package Microsoft.Extensions.ML
Copy the saved model to the API project:
mkdir model
cp ../SentimentAnalysis/model/sentiment-model.zip model/
Program.cs
using Microsoft.ML;
using Microsoft.Extensions.ML;
var builder = WebApplication.CreateBuilder(args);
// Register PredictionEnginePool — thread-safe prediction
builder.Services.AddPredictionEnginePool<ReviewInput, SentimentPrediction>()
.FromFile(modelName: "SentimentModel",
filePath: "model/sentiment-model.zip");
var app = builder.Build();
app.MapPost("/predict", (
PredictionEnginePool<ReviewInput, SentimentPrediction> pool,
ReviewInput input) =>
{
var prediction = pool.Predict(modelName: "SentimentModel", input);
return Results.Ok(new
{
text = input.Text,
sentiment = prediction.Prediction ? "positive" : "negative",
confidence = prediction.Probability
});
});
app.MapPost("/predict/batch", (
PredictionEnginePool<ReviewInput, SentimentPrediction> pool,
ReviewInput[] inputs) =>
{
var results = inputs.Select(input =>
{
var prediction = pool.Predict(modelName: "SentimentModel", input);
return new
{
text = input.Text,
sentiment = prediction.Prediction ? "positive" : "negative",
confidence = prediction.Probability
};
});
return Results.Ok(results);
});
app.MapGet("/health", () => Results.Ok(new { status = "healthy" }));
app.Run();
Test the API
dotnet run
curl -X POST http://localhost:5000/predict \
-H "Content-Type: application/json" \
-d '{"text": "This product is amazing, best purchase of the year!"}'
Response:
{
"text": "This product is amazing, best purchase of the year!",
"sentiment": "positive",
"confidence": 0.9234
}
Performance Considerations
| Dataset Size | Training Time | Model Size | Inference Latency |
|---|---|---|---|
| 5,000 rows | ~2 seconds | ~1 MB | <1ms |
| 50,000 rows | ~15 seconds | ~3 MB | <1ms |
| 500,000 rows | ~2 minutes | ~8 MB | <1ms |
Inference is consistently sub-millisecond on CPU regardless of model size. This makes ML.NET ideal for high-throughput prediction endpoints.
Next Steps
- What is ML.NET? Introduction for .NET Developers — ML.NET fundamentals
- ONNX Models in .NET: Run AI Without Azure — Run HuggingFace models locally
- What is Semantic Kernel? — When to use LLMs vs trained models