Can I run this locally?

#8
by Bengt-a - opened

Hi,
I'm a beginner at this. Is it possible to run this model locally, with its current training, on my own computer?
It takes forever to start the space each time I want to use it - and it is really good!
If yes, how do I do it?

National Archives of Finland org

Hello,
Yes, it is possible! However, you will probably want a GPU with at least 12 GB of VRAM (preferably Nvidia). You could get by with 6 GB of VRAM if you use a smaller variant of the TrOCR model. The models can also be run on a CPU, but processing is fairly slow: it can take a minute or more per image, whereas with a GPU it takes roughly ten seconds.

There is a minimal guide on how to run it locally, provided by Hugging Face, in the upper right corner of the demo. When you press the three dots, you will see “Run locally” with two options: running with Docker or cloning with Git. If you are not familiar with Docker, I suggest you try cloning with Git. For that to work you need to have Python and Git installed on your computer.

Best regards,
National Archives of Finland AI team

Thank you for your response!
The tool is truly awesome and extremely helpful for my genealogy research!

Will it be made available on any other platform, or is this the only place online where I can use it (apart from locally, but I don't think my hardware is good enough...)?

Kindest regards,
Bengt
Sweden

National Archives of Finland org

Hi,

Nice to hear that it helps. Unfortunately, we do not have plans to release this on other platforms. :(

Regards,
National Archives of Finland AI team

I understand. It is an amazing tool! Better than any other tool for analyzing handwritten text from the 1600s onward.

After the summer, I'll see if I can buy a better computer. :)

If I install it, will it be able to do everything the demo does? What I mean is, does it include all the handwriting training?

Kind regards,
Bengt

National Archives of Finland org

Yes, if you run the code locally, it will open the same demo as here. It also downloads the same models from here onto your local PC, so the results will be the same.

I really love the model, thank you, great work! I have been running the model on a local machine with an Nvidia GPU that has only 4 GB of VRAM, and it works rather well. I do the segmentation separately, then clear the GPU VRAM, and then run the actual OCR in batches of 4 lines at a time (it could do more, but that batch size seems to give the best overall performance). The performance is not bad, but obviously not great: about 30 seconds per image, or less than 0.5 seconds per line. VRAM usage seems to stay below 2 GB the whole time, so I still have some headroom. My laptop itself has plenty of RAM (32 GB), so I have not encountered any issues on that side. I have implemented the code to run within my own Python code, and thus have not tried to use the whole package and its user interface.

Can it run on a Windows machine? I guess so, since it's Python...

But the minimal guide did not cut it for me. Any advice on this?

My setup is a Windows 11 PC with RTX 2050 (4GB of VRAM).

I first segmented the images using the functions in the SegmentImage class provided by Kansallisarkisto, without any modifications:

segmenter = get_segmenter(device)
segment_predictions = segmenter.get_segmentation(image)

and then saved the segmentation to disk for later OCR:

torch.save(segment_predictions, segment_file_path)

Then I unload the Yolo segmentation model to free up VRAM.
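
A minimal sketch of what the unloading looks like, assuming the segmenter variable from the snippet above; the rest is standard PyTorch:

# Drop the reference to the YOLO segmentation model and release cached GPU memory
del segmenter
torch.cuda.empty_cache()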

For the actual OCR work I used some parts of the original code from Kansallisarkisto: their merge_lines, crop_lines and crop_line functions, with some minor additional safeguards for very small segments to avoid crashes later on in the pipeline.

Roughly like this (I have a pandas DataFrame that stores the image file information; row in the code comes from a loop over all the images of a book, and the loop itself is not shown here):

# Imports used below; model, processor, device, image and row come from the
# surrounding setup and the per-image loop, which are not shown here
import numpy as np
import torch

# Get the segments saved earlier
segment_file_path = row['File path'].replace('.jpg', '_segments.pt')
segment_predictions = torch.load(segment_file_path, weights_only=False)

# Only act if there are segment predictions; otherwise skip to the next image
if not segment_predictions:
    continue

# Image dimensions
height, width = segment_predictions[0]['img_shape']

# Create the text line polygons (merge_lines and crop_lines are from the Kansallisarkisto code)
img_lines, segment_types, x_max, y_max, x_min, y_min = merge_lines(segment_predictions)

# Crop the line segments out of the image itself
np_img = np.array(image)
cropped_lines = crop_lines(img_lines, np_img, height, width)

# Turn the cropped line images into half-precision tensors on the GPU
pixel_values = processor(images=cropped_lines, return_tensors="pt", padding=True).pixel_values.half().to(device)

# Then the hard work begins.
# We do things in batches to conserve VRAM.
batch_size = 4
results = []

# Work through the lines on the page in batches; 4 seems to be the sweet spot for best overall performance
for i in range(0, len(pixel_values), batch_size):
    batch_pixels = pixel_values[i:i+batch_size]
    generated_ids = model.generate(batch_pixels, max_new_tokens=20)
    texts = processor.batch_decode(generated_ids, skip_special_tokens=True)
    results.extend(texts)
The pixel_values.half() reduces the tensors to 16-bit precision to better fit in a small amount of VRAM, and device is my RTX GPU (in my case 0). 20 new tokens seems to be sufficient for the rows. The batch size of 4 was the sweet spot for speed; it was not that I would have run out of memory, it just got slower per image above 4.
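
For reference, the setup that the snippet assumes (model, processor, device) can be created roughly like this. The model id below is only a placeholder: substitute whichever TrOCR checkpoint the demo actually downloads.

import torch
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Placeholder id; replace with the checkpoint used by the Multicentury-HTR demo
MODEL_ID = "microsoft/trocr-base-handwritten"

processor = TrOCRProcessor.from_pretrained(MODEL_ID)
model = VisionEncoderDecoderModel.from_pretrained(MODEL_ID)

# Half precision on the model as well, to match pixel_values.half() above
model = model.half().to(device)
model.eval()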

Best regards

Jukka-Pekka

Thank you all for your assistance. I successfully got this running locally on Windows 11 with WSL2 and my RTX 2070 Super (thanks to Claude.ai). Here's a complete guide for anyone wanting to do the same:

Running Multicentury-HTR-Demo on Windows with WSL2 and GPU

Tested on: Windows 11, WSL2, NVIDIA RTX 2070 Super (8GB VRAM)

Prerequisites

  • Windows 11
  • NVIDIA GPU with recent drivers installed in Windows
  • WSL (Windows Subsystem for Linux) installed
  • Git and Python 3 in WSL
  • Hugging Face account

Step 1: Enable WSL2 with GPU Support

1.1 Upgrade to WSL2

Open PowerShell as Administrator and run:

dism.exe /online /enable-feature /featurename:VirtualMachinePlatform /all /norestart
dism.exe /online /enable-feature /featurename:HypervisorPlatform /all /norestart

Restart your computer.

After restart, convert your WSL distribution to version 2:

wsl --set-version Ubuntu 2

Replace Ubuntu with your distribution name if different. Check with wsl --list.

1.2 Verify GPU Access in WSL

In your WSL terminal:

nvidia-smi

You should see your GPU listed with driver information. If this fails, ensure:

  • NVIDIA drivers are installed in Windows
  • Virtualization is enabled in BIOS (Intel VT-x or AMD-V). I had to enable it on my machine.

Step 2: Clone the Repository

cd ~
git clone https://huggingface.co/spaces/Kansallisarkisto/Multicentury-HTR-Demo
cd Multicentury-HTR-Demo

Step 3: Set Up Python Environment

Create and activate a virtual environment:

python3 -m venv venv
source venv/bin/activate

(Please note that there is a typo in the Hugging Face instructions for running it locally via cloning with Git: they wrote env where it should be venv.)

Install dependencies:

pip install -r requirements.txt
pip install optimum[onnxruntime-gpu]
pip install huggingface_hub

Step 4: Configure Hugging Face Authentication

4.1 Get Access Token

  1. Go to https://huggingface.co/settings/tokens
  2. Create a new token with "Read" permissions
  3. Copy the token

4.2 Login

huggingface-cli login

Paste your token when prompted. This saves authentication for future use.
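
If you prefer to log in from Python rather than the CLI, the huggingface_hub library has an equivalent call; this is standard huggingface_hub usage, not something specific to this demo:

from huggingface_hub import login

# Prompts for the token interactively; alternatively pass it directly: login(token="hf_...")
login()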

Step 5: Run the Application

python app.py

First run will download models (several GB, takes a few minutes). Once you see:

Running on local URL:  http://127.0.0.1:7860

Open the URL in your browser.

Performance Tuning

Adjust batch size based on your GPU:

4GB VRAM (RTX 2050): batch size 4, use .half() for 16-bit precision

8GB VRAM (RTX 2070 Super): batch size 8-16, full 32-bit precision works fine

Larger VRAM: Increase batch size until performance plateaus

Troubleshooting

App hangs at startup: Ensure you've run huggingface-cli login first.

CUDA not available:

  • Verify WSL2: wsl.exe --list --verbose should show version 2
  • Check BIOS: Virtualization (VT-x/AMD-V) must be enabled
  • Verify NVIDIA driver in Windows with nvidia-smi in PowerShell
  • Verify GPU access in WSL with nvidia-smi in WSL terminal
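
If all of the above look fine, you can also confirm from inside the virtual environment that PyTorch itself sees the GPU (a generic PyTorch check, not specific to this demo):

import torch

print(torch.cuda.is_available())      # should print True
print(torch.cuda.get_device_name(0))  # should print your GPU, e.g. the RTX model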

Out of memory errors:

  • Reduce batch size
  • Use .half() for 16-bit precision
  • Process fewer lines at once

When I run app.py, it says

* Running on local URL:  http://127.0.0.1:7860
* To create a public link, set `share=True` in `launch()`.

How exactly do I create a link so I can access the application from other computers on my network?
Where do I put share=True (what file) and where?
If it is where it says demo.launch(show_error=True), what is the syntax since there already is text between the brackets?

You do it in the app.py file.
Change the last line to
demo.launch(show_error=True, share=True, server_name="0.0.0.0")
Then you can reach it from outside your network as well.
I never got it to work with access restricted to just the local network.
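
For what it's worth, local-network-only access (without a public link) should also be possible, since this is standard Gradio behaviour: server_name="0.0.0.0" makes the app listen on all interfaces, while share=True is only needed for the public gradio.live link. A sketch of a LAN-only launch line:

# Listen on all interfaces, but do not create a public gradio.live link
demo.launch(show_error=True, server_name="0.0.0.0", server_port=7860)

Other computers on the same network should then be able to open http://<your-PC-ip>:7860, provided the Windows firewall allows the port.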
