SSL Dataset
This dataset is the result of a collaborative effort between King Fahd University of Petroleum and Minerals (KFUPM) and the National Center for Artificial Intelligence (NCAI). It contains sign language data collected from a diverse group of signers and is designed to support research and development in sign language recognition and related AI applications.
Data Summary
The dataset includes sign language sentences from a variety of signers, capturing different signing styles and contexts. Below is a brief overview of the dataset:
- Number of Signers: 18
- Total Hours of Video: 35 hours
- Data Splits:
  - Training Split:
    - Includes 16 signers: four female and twelve male.
    - Totals about 34 hours of video.
  - Test Split:
    - Totals roughly 7.7 hours and is divided into three subsets:
      - Test 1: Unseen signers and unseen sentences.
      - Test 2: Seen signers and unseen sentences.
      - Test 3: Unseen signers and seen sentences.
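The three test subsets differ only in whether a recording's signer and sentence also appear in the training split. The Python sketch below expresses that rule. The `Recording` type, its field names, the `which_test_split` helper, and the IDs are hypothetical illustrations, not the dataset's actual metadata schema.

```python
from __future__ import annotations

from typing import NamedTuple


class Recording(NamedTuple):
    """One recorded sentence. Field names are hypothetical, not the dataset's schema."""
    signer_id: str
    sentence_id: str


def which_test_split(rec: Recording, train_signers: set[str], train_sentences: set[str]) -> str:
    """Return the test subset a held-out recording belongs to (hypothetical helper)."""
    seen_signer = rec.signer_id in train_signers
    seen_sentence = rec.sentence_id in train_sentences
    if not seen_signer and not seen_sentence:
        return "Test 1"  # unseen signer, unseen sentence
    if seen_signer and not seen_sentence:
        return "Test 2"  # seen signer, unseen sentence
    if not seen_signer and seen_sentence:
        return "Test 3"  # unseen signer, seen sentence
    # A seen signer performing a seen sentence overlaps training on both axes.
    raise ValueError("recording overlaps the training split on both signer and sentence")


if __name__ == "__main__":
    train_signers = {"signer_01", "signer_02"}
    train_sentences = {"sent_0001", "sent_0002"}
    rec = Recording(signer_id="signer_17", sentence_id="sent_0001")
    print(which_test_split(rec, train_signers, train_sentences))
```

Here `signer_17` is absent from the training signers while `sent_0001` is a training sentence, so the recording falls into Test 3.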
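The table below summarizes each split: the number of sentences, the duration, whether its sentences and signers also appear in the training split, the number of samples, the number of signers, and the gender breakdown (F = female, M = male).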
| Split | Sentences | Duration (min) | Seen Sentences | Seen Signers | # Samples | # Signers | Gender |
|---|---|---|---|---|---|---|---|
| Train | 24,111 | 2,017.82 | Yes | Yes | 1,900 | 16 | 4F, 12M |
| Test 1 | 200 | 16.65 | No | No | 100 | 2 | 1F, 1M |
| Test 2 | 1,297 | 107.95 | No | Yes | 100 | 11 | 3F, 10M |
| Test 3 | 3,783 | 337.33 | Yes | No | 1,900 | 2 | 1F, 1M |
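As a quick check on the durations quoted in the summary, the per-split minute values in the table can be converted to hours. This is a minimal Python sketch: the minute values are copied from the table, and the `SPLIT_MINUTES` dictionary and `split_hours` function are illustrative names only.

```python
from __future__ import annotations

# Per-split durations in minutes, copied from the table above.
SPLIT_MINUTES: dict[str, float] = {
    "Train": 2017.82,
    "Test 1": 16.65,
    "Test 2": 107.95,
    "Test 3": 337.33,
}


def split_hours(split_minutes: dict[str, float]) -> dict[str, float]:
    """Convert each split's duration to hours and add a combined test total."""
    hours = {name: minutes / 60.0 for name, minutes in split_minutes.items()}
    # The test data is the union of the three test subsets.
    hours["All tests"] = sum(
        minutes for name, minutes in split_minutes.items() if name.startswith("Test")
    ) / 60.0
    return hours


if __name__ == "__main__":
    for name, value in split_hours(SPLIT_MINUTES).items():
        print(f"{name}: {value:.2f} h")
```

Running it reports the training split at roughly 33.6 hours and the three test subsets at roughly 7.7 hours combined.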
Data Access
- Data Access Requirement: This dataset can be accessed through its Kaggle dataset page.
Citation
If you use this dataset in your research, please cite the dataset paper once it is published.
License
Please follow the terms of the Creative Commons Attribution-NonCommercial (CC BY-NC) license.