Instructions

The TCRex web tool provides a user-friendly interface to predict the recognition of epitopes by human TCR beta sequences.

Please check whether your epitopes of interest are present in our database before making any predictions. If they are, follow the steps in section 'Predict TCR–epitope binding using prediction models provided by TCRex' to predict epitope–TCR binding. If an epitope of interest is not available in our database, you can still make predictions for it by training a new prediction model. More information is available in section 'Predict TCR–epitope binding for new epitopes by training custom prediction models'.

Predict TCR–epitope binding using prediction models provided by TCRex

Step 1: Upload TCR data file

Upload the file containing all TCR beta sequences for which you want to obtain predictions. This file must contain at least the CDR3 amino acid sequence and corresponding V/J genes for every TCR beta sequence. In case V/J gene information is not available, add V/J family information instead.

The following file formats are supported by TCRex:

immunoSEQ ANALYZER format (version 1)

File format obtained by using the 'Export Sample(s)' option when exporting your results from the immunoSEQ Analyzer platform.

immunoSEQ ANALYZER format (version 2)

File format obtained by using the 'Export Sample(s) (v2)' option when exporting your results from the immunoSEQ Analyzer platform.

MiXCR format

The MiXCR text file format for clones, obtained by exporting clones from a .clns file to a .txt file. Make sure that the exported file contains at least the following columns: 'bestJGene', 'bestVGene', and 'aaSeqCDR3'. Suitable MiXCR files can be generated using the following command line parameters when exporting clones: -jGene, -vGene, and -aaFeature CDR3. More information about the MiXCR format is available in the MiXCR documentation.

AIRR format

TCRex can process the standard tab-delimited file format as described by the Adaptive Immune Receptor Repertoire (AIRR) community. An extensive description of this AIRR format can be found in the AIRR format documentation. TCRex extracts all necessary information from the following three columns:

'junction_aa': CDR3 sequence with the conserved starting and ending residues
'v_call': one or more V gene alleles
'j_call': one or more J gene alleles

Although the AIRR format allows different formatting of the V and J alleles, TCRex requires a specific formatting:

allele names must follow the IMGT format as described in the IMGT table for V genes and alleles and the IMGT table for J genes and alleles
no additional information (e.g. species name) is allowed in the v_call and j_call columns
in case multiple alleles are present in one column, these must be separated using a comma. However, it is better to provide only one V and J allele for each TCR sequence as TCR sequences with multiple V/J alleles are split into separate entries each having the same CDR3 beta sequence and one of the V/J alleles, resulting in a potential bias in the enrichment testing results.

Please make sure to check the format of the v_call and j_call columns before uploading the file.

In addition, the AIRR data file may not contain any lines with additional comments, but must directly start with the header containing all column names.
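
The AIRR requirements above can be checked locally before uploading. The following is a minimal sketch, not the validation TCRex itself performs: the function name check_airr and the regular expression for IMGT-style names are assumptions made for illustration, and the pattern only approximates the names listed in the IMGT tables.

```python
import csv
import io
import re

# Rough approximation of IMGT TRB gene/allele names such as TRBV7-9*01 or
# TRBJ2-7. This pattern is an assumption, not the rule TCRex applies.
IMGT_NAME = re.compile(r"^TRB[VJ]\d+(-\d+)?(\*\d+)?$")

REQUIRED_COLUMNS = ("junction_aa", "v_call", "j_call")


def check_airr(handle):
    """Return a list of human-readable problems found in an AIRR TSV file."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        # Also triggered when comment lines precede the header line.
        return [f"missing columns: {', '.join(missing)}"]
    problems = []
    for line_number, row in enumerate(reader, start=2):
        for column in ("v_call", "j_call"):
            alleles = (row[column] or "").split(",")
            if len(alleles) > 1:
                problems.append(f"line {line_number}: multiple alleles in {column}")
            for allele in alleles:
                if not IMGT_NAME.match(allele):
                    problems.append(
                        f"line {line_number}: unexpected {column} value {allele!r}"
                    )
    return problems


# Illustrative, made-up row with a species name in v_call.
example = "junction_aa\tv_call\tj_call\nCASSLGQAYEQYF\tHomo sapiens TRBV7-9*01\tTRBJ2-7*01\n"
for problem in check_airr(io.StringIO(example)):
    print(problem)
```

Rows with several comma-separated alleles are reported rather than rejected, since TCRex accepts them but splits them into separate entries.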

TCRex tab-delimited format

The TCRex format is a simple and general tab-delimited file format. TCRex files should contain the following columns with their corresponding headers:

'CDR3_beta': containing the CDR3 amino acid sequence of the TCR beta chain
'TRBJ_gene': containing the J gene or allele of the TCR beta chain following IMGT notation
'TRBV_gene': containing the V gene or allele of the TCR beta chain following IMGT notation

In case you want to report multiple genes for one entry in the TCRex format, these genes must be separated with a '/' (e.g. TRBV12-3/TRBV12-4). However, it is better to provide only one V and J gene for each TCR sequence as TCR sequences with multiple V/J genes are split into separate entries each having the same CDR3 beta sequence and one of the V/J genes, resulting in a potential bias in the enrichment testing results.
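
As an illustration of the TCRex format described above, the sketch below writes a tab-delimited file with the three required headers. The helper name write_tcrex and the example sequence are hypothetical; the columns are written in the order listed above, and whether TCRex depends on that order is not stated here.

```python
import csv
import io

COLUMNS = ["CDR3_beta", "TRBJ_gene", "TRBV_gene"]


def write_tcrex(records, handle):
    """Write (cdr3, v_genes, j_genes) records as a TCRex tab-delimited file."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    for cdr3, v_genes, j_genes in records:
        # Several genes for one entry are joined with '/', as described above.
        writer.writerow([cdr3, "/".join(j_genes), "/".join(v_genes)])


# Illustrative, made-up record with two V genes.
buffer = io.StringIO()
write_tcrex([("CASSLGQAYEQYF", ["TRBV12-3", "TRBV12-4"], ["TRBJ2-7"])], buffer)
print(buffer.getvalue())
```

Passing one gene per list avoids the entry splitting mentioned above.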

See the following example TCRex input file. This file contains 10 human TCR beta sequences that are known to bind with EAAGIGILTV, downloaded from McPAS-TCR.

Download TCRex input example file

Important considerations for all file formats:

Attention: Make sure your CDR3 beta protein sequences are canonical CDR3 sequences (i.e. TCR beta sequences starting with a Cysteine and ending with a Phenylalanine). Predictions for non-canonical CDR3 sequences are not supported.

Attention: Make sure that only one V and one J gene is provided for every TCR sequence. TCR sequences with multiple V/J genes might be split into separate entries each having the same CDR3 beta sequence and one of the V/J genes. This can lead to a bias in the enrichment testing results.

Attention: TCRex only supports prediction files with at most 50 000 TCR sequences.

Attention: TCRex only supports predictions for human TCR beta sequences. Predictions for other species or TCR alpha sequences are currently not supported.
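
The first three considerations above (canonical CDR3, a single V and J gene, and the size limit) can be checked before uploading. This is a minimal sketch under the assumption that entries are already loaded as (CDR3, V gene, J gene) tuples; the function names are hypothetical and the checks are not those performed by TCRex.

```python
MAX_SEQUENCES = 50_000  # limit stated above for prediction files


def is_canonical_cdr3(cdr3):
    """True if the CDR3 starts with cysteine (C) and ends with phenylalanine (F)."""
    return len(cdr3) >= 2 and cdr3.startswith("C") and cdr3.endswith("F")


def check_entries(entries):
    """Check (cdr3, v_gene, j_gene) tuples against the considerations above."""
    problems = []
    if len(entries) > MAX_SEQUENCES:
        problems.append(f"{len(entries)} sequences exceed the limit of {MAX_SEQUENCES}")
    for index, (cdr3, v_gene, j_gene) in enumerate(entries, start=1):
        if not is_canonical_cdr3(cdr3):
            problems.append(f"entry {index}: non-canonical CDR3 {cdr3!r}")
        # '/' and ',' are the multi-gene separators of the TCRex and AIRR formats.
        if any(sep in gene for gene in (v_gene, j_gene) for sep in "/,"):
            problems.append(f"entry {index}: more than one V or J gene")
    return problems


# Illustrative, made-up entries: the second one ends in W and lists two V genes.
entries = [
    ("CASSLGQAYEQYF", "TRBV12-3", "TRBJ2-7"),
    ("CASSLGQAYEQYW", "TRBV12-3/TRBV12-4", "TRBJ2-7"),
]
for problem in check_entries(entries):
    print(problem)
```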

Step 2: Select epitope(s) of interest

Use the checkboxes for your epitopes of interest that are present in our database. The toggle function can be used to select several epitopes in the same category at once.

Step 3: Check the advanced settings

Custom background file: When using pre-trained TCRex models, it is possible to upload your own background TCRs, i.e. a list of TCRs that are thought not to bind the epitope of interest. The uploaded file must contain between 10 000 and 100 000 unique TCRs. By default, TCRex uses a list of 100 000 unique TCRs collected from healthy TCR repertoires. These are used to calculate a BPR (Baseline Prediction Rate) for every TCR. Every TCRex model is a random forest classifier and thus returns a classification score between 0 and 1 for every TCR–epitope pair. In addition, TCRex calculates a BPR for every TCR–epitope pair as the fraction of TCRs in the background repertoire with a classification score higher than or equal to that of the TCR. These BPR values are used to filter the prediction results and to perform the enrichment tests. When using your own background TCRs, make sure they are a good representation of non-binding TCRs for your use case. For example, a background set of mouse TCRs is unsuitable, since TCRex only makes predictions for human TCRs. Always think carefully about the creation of your background data set. You can still use the default data set or contact the TCRex developers for additional guidance. (Please make sure all uploaded files have different names, otherwise your first file will be overwritten by the second.)
IMGT parsing: By default the TCR sequences in the input file are corrected if they contain non-IMGT genes (i.e. genes that are not listed in the IMGT database), or removed if they contain non-IMGT families (i.e. families that are not listed in the IMGT database) or orphon genes.
Enrichment threshold: TCRex performs enrichment analyses to identify the epitopes for which significantly more specific TCRs are found in the uploaded dataset than expected in a background repertoire (i.e. a representation of a normal healthy TCR repertoire). For this, an enrichment threshold (in the range of 0.01%–1%) must be chosen. This threshold represents the percentage of identified epitope-specific TCRs in the background repertoire at a certain BPR threshold. This BPR threshold is explained in more detail in step 5. An enrichment analysis is performed for each epitope separately, i.e. for each epitope TCRex tests whether the abundance of specific TCRs is significantly higher than expected in a background repertoire. TCRex provides enrichment analyses for all epitopes for which at least 2 different TCRs are identified. By default, an enrichment threshold of 0.01% is chosen.
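
The BPR definition given under 'Custom background file' (the fraction of background TCRs whose classification score is higher than or equal to the score of the TCR in question) can be sketched as follows. The function name and the background scores are made up for illustration; this is not the TCRex implementation.

```python
from bisect import bisect_left


def bpr(score, background_scores):
    """Fraction of background scores that are >= the given score."""
    ordered = sorted(background_scores)
    # bisect_left returns the number of background scores strictly below `score`.
    below = bisect_left(ordered, score)
    return (len(ordered) - below) / len(ordered)


# Made-up classification scores for a tiny background set.
background = [0.05, 0.10, 0.20, 0.40, 0.90]
print(bpr(0.40, background))  # 2 of 5 background scores are >= 0.40
```

A high classification score therefore corresponds to a low BPR, which is why results are kept when their BPR is below or equal to the chosen threshold.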

Step 4: Submit the task

Please read the terms and conditions carefully before clicking the 'Submit' button. You will be automatically redirected to a new page with a unique URL showing your task ID and the status of your submission. In case of long-running tasks you can always return to this page using the unique URL containing the task ID. This page will refresh itself automatically every 10 seconds while the predictions have not been completed yet. As soon as your task is completed your predictions will be visible in a table at the bottom of the page. Your results will be kept available for at least 7 days.

Step 5: Get the results

The output page gives an overview of all submission details. This includes your task ID, the epitope(s) you selected, the file you uploaded, the time of submission, a log overview and the TCR repertoire size (i.e. the number of unique TCRs in your uploaded file after parsing).

Underneath, two tables are given. The first table gives an overview of the p values for each enrichment analysis. If your epitope of interest is not present in this table, the threshold of two different specific TCR sequences was not fulfilled and therefore no enrichment result is available for this epitope. The p values given by TCRex are corrected for multiple testing using the Benjamini-Hochberg strategy. Be cautious when interpreting these enrichment results: they are valid for the background dataset used, which might not provide the best background for your dataset.
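
For readers unfamiliar with the Benjamini-Hochberg strategy mentioned above, the standard procedure can be sketched as below. The p values in the example are made up, and the code illustrates the general method rather than the TCRex source.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Made-up raw p values, one per tested epitope.
raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```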

The second table contains the prediction score and the BPR (Baseline Prediction Rate) for binding TCR–epitope pairs. The prediction score reflects the confidence with which the prediction model predicts a TCR to bind the epitope of interest. The BPR is used to filter the prediction results afterwards. It gives an estimate of the fraction of background TCR sequences that are predicted to bind the epitope of interest. For example, when filtering all TCR–epitope pairs with a BPR threshold of 1%, you can expect that 1% of your background TCR sequences are classified as epitope-binding. Since we expect the number of true positives in a background repertoire to be very low, this BPR value approaches the false positive rate and can therefore be used to control the number of false positives. By default, the BPR threshold is set to 0.01%. This can easily be changed to any user-defined value on the result page. TCR sequences with a BPR value below or equal to the chosen threshold are considered to bind the corresponding epitope. These TCR sequences are shown in the table (which is limited to the 5000 best-scoring TCR–epitope pairs) and can be downloaded as a tab-separated file by clicking on the 'Download results' button at the bottom of the page. We recommend downloading these filtered results, as the table can give a slightly different view of the results due to rounding of the values. The downloaded file will contain all TCR–epitope pairs with a BPR equal to or lower than the selected BPR threshold.
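
A downloaded results file can be filtered again locally with a stricter BPR threshold. The sketch below assumes a column named 'bpr' that holds the BPR as a fraction (so 0.01% is 0.0001); the column names and the example rows are hypothetical, so check the header of your own file first.

```python
import csv
import io


def filter_by_bpr(handle, threshold, bpr_column="bpr"):
    """Yield rows of a tab-separated results file with BPR <= threshold."""
    for row in csv.DictReader(handle, delimiter="\t"):
        if float(row[bpr_column]) <= threshold:
            yield row


# Made-up rows; real TCRex output may use different column names.
example = (
    "CDR3_beta\tepitope\tscore\tbpr\n"
    "CASSLGQAYEQYF\tEAAGIGILTV\t0.98\t0.00005\n"
    "CASSPGTGGYEQYF\tEAAGIGILTV\t0.71\t0.002\n"
)
kept = list(filter_by_bpr(io.StringIO(example), threshold=0.0001))
print(len(kept), "pair(s) kept")
```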

See the following example TCRex output file. This file was obtained by using the example input file, the default BPR threshold and IMGT parsing, and by selecting the following cancer epitopes: AMFWSVPTV, EAAGIGILTV, ELAGIGILTV, FLYNLLTRV, LLLGIGILV.

Download TCRex output example file

Predict TCR–epitope binding for new epitopes by training custom prediction models

Step 1: Upload the training data set

To train a new prediction model you need a data set containing TCR beta sequences that are known to bind with your epitope of interest. Please make sure that this data set contains epitope-specific TCR beta sequences for only one epitope. If this is not the case the predictions made by the prediction model will be unreliable.

The same file formats are supported as in step 1 of 'Predict TCR–epitope binding using prediction models provided by TCRex'.

Attention: Make sure your CDR3 beta protein sequences are canonical CDR3 sequences (i.e. TCR beta sequences starting with a Cysteine and ending with a Phenylalanine). Non-canonical TCR beta sequences will be removed and will not be used for training.

Attention: TCRex only supports training files with at most 500 TCR sequences.

Step 2: Upload the test data set

Besides the training data set used to train the TCR–epitope prediction model you can also provide the target data set. This file should contain all TCR beta sequences for which you want to obtain predictions using your newly trained prediction model. Again the same file formats are supported as in step 1 of 'Predict TCR–epitope binding using prediction models provided by TCRex'.

Attention: TCRex only supports prediction files with at most 50 000 TCR sequences.

Attention: Make sure all uploaded files have different names, otherwise your first file will be overwritten by the second.

Step 3: Check the advanced settings

IMGT parsing: By default the TCR sequences in the input file are corrected if they contain non-IMGT genes (i.e. genes that are not listed in the IMGT database), or removed if they contain non-IMGT families (i.e. families that are not listed in the IMGT database) or orphon genes.
Enrichment threshold: TCRex performs enrichment analyses to identify the epitopes for which significantly more specific TCRs are found in the uploaded dataset than expected in a background repertoire (i.e. a representation of a normal healthy TCR repertoire). For this, an enrichment threshold must be chosen (in the range of 0.01%–1%). This threshold represents the percentage of identified epitope-specific TCRs in the background repertoire at a certain BPR threshold. This BPR threshold is explained in more detail in step 5. An enrichment analysis is performed for each epitope separately, i.e. for each epitope TCRex tests whether the abundance of specific TCRs is significantly higher than expected in a background repertoire. TCRex provides enrichment analyses for all epitopes for which at least 2 different TCRs are identified. By default, an enrichment threshold of 0.01% is chosen.
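
The idea behind the enrichment analysis can be illustrated with a one-sided test. The text above does not say which statistical test TCRex applies; the binomial test below is an assumption chosen purely for illustration, with the enrichment threshold used as the expected fraction of epitope-specific TCRs under the background. The function name and the example numbers are made up.

```python
from math import exp, lgamma, log


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p) with 0 < p < 1, summed in log space."""
    if k <= 0:
        return 1.0
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (
            lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
            + i * log(p) + (n - i) * log(1 - p)
        )
        total += exp(log_pmf)
    return min(total, 1.0)


# Made-up example: 9 epitope-specific TCRs found in a repertoire of 20 000 TCRs,
# tested against an enrichment threshold of 0.01% (0.0001 as a fraction).
p_value = binomial_upper_tail(9, 20_000, 0.0001)
print(p_value)
```

A small p value indicates more epitope-specific TCRs than the background rate alone would explain.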

Step 4: Submit the task

Step 5: Get the results

Finally, the results page shows a summary of the classifier statistics, which can be used to evaluate the performance of your new prediction model. This includes the accuracy, the area under the receiver operating characteristic curve (AUROC), and the average precision, along with the ROC curve, the precision–recall curve, and an overview of the most important features.
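
To make two of these statistics concrete, the sketch below computes the accuracy and the AUROC from binary labels and classification scores. The function names, the 0.5 cutoff and the example data are assumptions for illustration; how TCRex derives the reported values (for example, which evaluation split it uses) is not described here.

```python
def accuracy(labels, scores, cutoff=0.5):
    """Fraction of TCRs whose thresholded score matches the true label."""
    predictions = [score >= cutoff for score in scores]
    correct = sum(pred == bool(label) for pred, label in zip(predictions, labels))
    return correct / len(labels)


def auroc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as one half. Requires at least one positive and one negative.
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Made-up labels (1 = binds the epitope) and classification scores.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
print(accuracy(labels, scores), auroc(labels, scores))
```

An AUROC of 0.5 corresponds to random ranking, while 1.0 means every binding TCR scores higher than every non-binding TCR.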