Model Version 1.0

2024-09-18 22:12:00 +02:00 · 2024-09-18 22:12:00 +02:00 · c933a051e4
parent 5f471624a0
commit c933a051e4
5 changed files with 125 additions and 1 deletion
--- a/README.md
+++ b/README.md
@@ -1,3 +1,72 @@
 # VeraMind
-Open Weights Fake News Detection Model and Inference
+
 VeraMind is an open-source Python application built on the Hugging Face Transformers library and PyTorch. It uses a fine-tuned model (`VeraMind-Mini`) to predict whether a news article is real or fake and reports a confidence score for that prediction.
 This project is licensed under the [Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)](https://creativecommons.org/licenses/by-nc-nd/4.0/) license. You may use and share this model with appropriate credit, but you may not use it for commercial purposes or distribute derivative works.
 **Note:** This is a machine learning model and may make mistakes. It should not replace your own critical thinking when evaluating news authenticity. Always verify information from multiple reliable sources.
 ## Features
 - Predicts whether a given news article is real or fake.
 - Provides a confidence score for each prediction.
 - Built on the Hugging Face Transformers library, so the checkpoint loads through the standard `AutoTokenizer` and `AutoModelForSequenceClassification` classes.
 ## Installation
 1. Clone this repository:
 ```bash
 git clone https://gitea.fabelous.app/Fabel/VeraMind.git
 cd VeraMind
 ```
 2. Install the required dependencies:
 ```bash
 pip install -r requirements.txt
 ```
 ## Usage
 ### Predicting News Authenticity
 Here's how to use the model to check whether a news article is real or fake:
 ```python
 from src.Inference import VeraMindInference
 # Load the model
 model = VeraMindInference("path/to/VeraMind-Mini")
 # Example news article text
 text = "This is an example News Article"
 # Predict if the news is real or fake
 result = model.predict(text)
 print(result)
 ```
 `predict` returns a dictionary with the predicted label (`"REAL"` or `"FAKE"`) under `result` and the model's probability for that label under `confidence`:
 ```python
 {'result': 'FAKE', 'confidence': 0.9990140199661255}
 ```
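 Because `confidence` always refers to the returned label, a REAL result with confidence 0.8 and a FAKE result with confidence 0.8 are not on the same scale. A minimal sketch, assuming the dictionary format shown above, that converts a result back into the probability of FAKE so several articles can be ranked; `fake_probability` and the sample results are hypothetical and are not real model output:

```python
def fake_probability(result: dict) -> float:
    """Convert a VeraMind-style result dict into P(FAKE) (hypothetical helper)."""
    # `confidence` is the probability of whichever label was returned,
    # so for a REAL result the probability of FAKE is its complement.
    if result["result"] == "FAKE":
        return result["confidence"]
    if result["result"] == "REAL":
        return 1.0 - result["confidence"]
    raise ValueError(f"unexpected label: {result['result']!r}")


# Hypothetical results, ranked from most to least likely fake
results = [
    {"result": "REAL", "confidence": 0.8},
    {"result": "FAKE", "confidence": 0.999},
    {"result": "FAKE", "confidence": 0.6},
]
ranked = sorted(results, key=fake_probability, reverse=True)
for r in ranked:
    print(r, round(fake_probability(r), 3))
```

 Ranking on P(FAKE) rather than on raw `confidence` keeps a confident REAL prediction at the bottom of the list instead of near the top.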
 ## Model Architecture
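 The `VeraMind-Mini` model used in this application is a fine-tuned version of [DistilBERT](https://huggingface.co/distilbert-base-uncased) for binary text classification, designed to distinguish between real and fake news articles. Input text is truncated to `max_len` tokens (512 by default, DistilBERT's maximum input length), so only the beginning of a long article is classified.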
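 `predict` in `src/Inference.py` applies a sigmoid to the first output logit (the code assumes a single-logit classification head), labels the text FAKE when that probability is at least 0.5, and reports the probability of the chosen label as `confidence`. The sketch below reproduces only that decision rule in plain Python so it can be inspected without loading the model; `to_result` is a hypothetical helper, not part of the package:

```python
import math


def to_result(logit: float) -> dict:
    """Map one raw logit to a VeraMind-style result (hypothetical helper)."""
    # Numerically stable sigmoid: avoids math.exp overflow for large |logit|
    if logit >= 0:
        p_fake = 1.0 / (1.0 + math.exp(-logit))
    else:
        z = math.exp(logit)
        p_fake = z / (1.0 + z)
    # Same threshold as predict(): a probability of exactly 0.5 counts as FAKE
    is_fake = p_fake >= 0.5
    # Report the probability of whichever label was chosen
    confidence = p_fake if is_fake else 1.0 - p_fake
    return {"result": "FAKE" if is_fake else "REAL", "confidence": confidence}


for logit in (3.0, 0.0, -3.0):
    print(logit, to_result(logit))
```

 A consequence of this rule is that `confidence` never drops below 0.5: a value near 0.5 means the model is close to its decision boundary, not that it leans toward the other label.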
 ## Disclaimer
 This project is provided as-is, without any express or implied warranty. The maintainers are not responsible for any damages arising from the use of this software.
 Always remember that machine learning models can make mistakes, so use this tool responsibly and critically evaluate its predictions.
 ## Citation
 If you use this model in your research, please cite it as follows:
 > Falko Habel. **VeraMind News Authenticity Checker** (2024). Retrieved from https://gitea.fabelous.app/Fabel/VeraMind
--- a/main.py
+++ b/main.py
@@ -0,0 +1,15 @@
 from src.Inference import VeraMindInference
 # Load the model
 model = VeraMindInference("path/to/VeraMind-Mini")
 text = "This is an example News Article"
 # Predict whether the news is real or fake
 result = model.predict(text)
 # Example Output
 # {'result': 'FAKE', 'confidence': 0.9990140199661255}
 print(result)
--- a/requirements.txt
+++ b/requirements.txt
@@ -0,0 +1,2 @@
 torch
 transformers
--- a/src/Inference.py
+++ b/src/Inference.py
@@ -0,0 +1,38 @@
 import torch
 from transformers import AutoTokenizer, AutoModelForSequenceClassification
 class VeraMindInference:
     """Load a fine-tuned VeraMind checkpoint and classify news text as REAL or FAKE."""
     def __init__(self, model_path, max_len=512):
         # Run on the GPU when one is available, otherwise fall back to the CPU
         self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
         self.tokenizer = AutoTokenizer.from_pretrained(model_path)
         self.model = AutoModelForSequenceClassification.from_pretrained(model_path)
         self.model.to(self.device)
         self.model.eval()  # disable dropout for inference
         self.max_len = max_len
     def predict(self, text):
         # Tokenize, truncating and padding to max_len tokens (DistilBERT accepts at most 512)
         encoding = self.tokenizer(
             text,
             add_special_tokens=True,
             max_length=self.max_len,
             return_token_type_ids=False,
             padding="max_length",
             truncation=True,
             return_attention_mask=True,
             return_tensors="pt",
         )
         input_ids = encoding["input_ids"].to(self.device)
         attention_mask = encoding["attention_mask"].to(self.device)
         with torch.no_grad():
             logits = self.model(input_ids, attention_mask=attention_mask).logits
             # Assumes a single-logit head: sigmoid of logit 0 is treated as P(FAKE)
             prediction = torch.sigmoid(logits)[0, 0].item()
         is_fake = prediction >= 0.5
         # Report the probability of whichever label was chosen
         confidence = prediction if is_fake else 1 - prediction
         return {
             "result": "FAKE" if is_fake else "REAL",
             "confidence": float(confidence),
         }
--- a/src/__init__.py
+++ b/src/__init__.py