1. Sign up for HuggingFace (we will be using PepMLM: https://huggingface.co/ChatterjeeLab/PepMLM-650M)

    1. Once you log in, go to the tokens page (https://huggingface.co/settings/tokens) and click +Create new token.
    2. Make sure you type the full name ChatterjeeLab/PepMLM-650M when searching for repos. Click save token and you will see the newly created token (copy it).
    3. Go to the model page (https://huggingface.co/ChatterjeeLab/PepMLM-650M) and find their Colab Notebook (link).
    4. Make a copy to your Google Drive, choose the T4 GPU runtime and run each block.
    5. When you reach the block Input HF token, a prompt will show Enter your token (input will not be visible):. Paste your token. When asked Add token as git credential? (Y/n), choose n.


  2. Find the amino acid sequence for SOD1 in UniProt (ID: P00441), a protein that, when mutated, can cause amyotrophic lateral sclerosis (ALS). In fact, the A4V mutation (changing position 4 from alanine to valine) causes the most aggressive form of ALS, so make that change in the sequence. Note that A4V is numbered without the initiator methionine, so the substituted alanine is residue 5 of the UniProt sequence below.

    UniProt sequence (wild type)

    >sp|P00441|SODC_HUMAN Superoxide dismutase [Cu-Zn] OS=Homo sapiens OX=9606 GN=SOD1 PE=1 SV=2
    MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS
    AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV
    HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ
    
    

    Mutated (A4V)

    >sp|P00441|SODC_HUMAN Superoxide dismutase [Cu-Zn] OS=Homo sapiens OX=9606 GN=SOD1 PE=1 SV=2
    MATK**V**VCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS
    AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV
    HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ
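
The substitution can also be made programmatically, which guards against editing the wrong residue. The following is a minimal sketch, not part of the original assignment: the function name `apply_mutation` and its `skip_initiator_met` flag are hypothetical, and it assumes the mutation label counts residues after the leading methionine (as A4V does).

```python
# Wild-type SOD1 (UniProt P00441), including the initiator methionine.
WILD_TYPE = (
    "MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS"
    "AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV"
    "HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ"
)


def apply_mutation(sequence, mutation, skip_initiator_met=True):
    """Apply a point mutation such as "A4V" to a protein sequence.

    With skip_initiator_met=True, position 1 is the residue after the
    leading M (hypothetical flag, matching the numbering A4V uses).
    """
    wild, position, mutant = mutation[0], int(mutation[1:-1]), mutation[-1]
    # Convert the 1-based position into a 0-based string index.
    index = position if skip_initiator_met else position - 1
    if sequence[index] != wild:
        raise ValueError(
            f"expected {wild} at position {position}, found {sequence[index]}"
        )
    return sequence[:index] + mutant + sequence[index + 1:]


mutated = apply_mutation(WILD_TYPE, "A4V")
print(mutated[:10])  # MATKVVCVLK
```

The check on the wild-type residue fails loudly if the numbering convention is wrong; for example, `apply_mutation(WILD_TYPE, "A4V", skip_initiator_met=False)` raises a `ValueError` because residue 4 of the full UniProt sequence is K.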
    


  3. Enter your mutated SOD1 sequence into the PepMLM inference API and generate 4 peptides of length 12 amino acids (Step 5 takes a while, so you can also just pick 1 or 2 peptides).

    1 HHYYVVGAEHKX 14.719739
    2 WHYPAAGLRLWX 15.267161
    3 WLYYAAAVEHKE 20.437929
    4 WLSPAAVAALGX 6.430779
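
If you only want to carry one or two peptides forward, you need a rule for choosing them. The sketch below assumes the number next to each peptide is PepMLM's pseudo-perplexity and that lower values indicate a more confident generation; both are assumptions, since the output above has no column header. The helper `pick_top` is a hypothetical name.

```python
# Peptides and scores copied from the PepMLM output above.
# Assumption: the score is a pseudo-perplexity, where lower is better.
generated = [
    ("HHYYVVGAEHKX", 14.719739),
    ("WHYPAAGLRLWX", 15.267161),
    ("WLYYAAAVEHKE", 20.437929),
    ("WLSPAAVAALGX", 6.430779),
]


def pick_top(candidates, n):
    """Return the n (peptide, score) pairs with the lowest score."""
    return sorted(candidates, key=lambda item: item[1])[:n]


for peptide, score in pick_top(generated, 2):
    print(f"{peptide}\t{score:.2f}")
```
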
  4. Add this known SOD1-binding peptide to your list: FLYRWLPSRRGG (from https://genesdev.cshlp.org/content/22/11/1451)

    1 HHYYVVGAEHKX 14.719739
    2 WHYPAAGLRLWX 15.267161
    3 WLYYAAAVEHKE 20.437929
    4 WLSPAAVAALGX 6.430779
    5 FLYRWLPSRRGG
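
Before moving on to structure prediction, it can help to sanity-check the list. The sketch below (the function name `check_peptide` is hypothetical) verifies that each peptide has the requested length of 12 and flags any character outside the 20 standard one-letter amino acid codes. Several generated peptides end in X, which denotes an unknown residue, so you may want to be aware of that when interpreting results.

```python
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")

peptides = [
    "HHYYVVGAEHKX",
    "WHYPAAGLRLWX",
    "WLYYAAAVEHKE",
    "WLSPAAVAALGX",
    "FLYRWLPSRRGG",
]


def check_peptide(peptide, expected_length=12):
    """Return a list of problems found in one peptide (empty if none)."""
    problems = []
    if len(peptide) != expected_length:
        problems.append(f"length {len(peptide)} != {expected_length}")
    unknown = sorted(set(peptide) - STANDARD_AMINO_ACIDS)
    if unknown:
        problems.append("non-standard residues: " + "".join(unknown))
    return problems


for peptide in peptides:
    print(peptide, check_peptide(peptide) or "ok")
```
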
  5. Go to AlphaFold-Multimer (https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb). This is similar to what you did for homework last week, but for a protein-peptide complex instead.

    1. Set model_type: alphafold2_multimer_v3 (this model has been shown to recapitulate peptide-protein binding accurately: https://www.frontiersin.org/articles/10.3389/fbinf.2022.959160/full).
    2. Add your query sequence. It's the SOD1Sequence:PeptideSequence.

    MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS
    AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV
    HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ:HHYYVVGAEHKX:WHYPAAGLRLWX:WLYYAAAVEHKE:WLSPAAVAALGX:FLYRWLPSRRGG
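
The query above places SOD1 and all five peptides in a single complex. If you would rather predict each SOD1:peptide pair separately, the query strings can be assembled as in the sketch below. This is only one possible way to set up the runs, not something the assignment prescribes; `build_query` and `clean` are hypothetical helper names, and the cleaning step exists because whitespace or emphasis markers are easy to pick up when copying sequences.

```python
SOD1_A4V = (
    "MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS"
    "AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV"
    "HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ"
)

peptides = [
    "HHYYVVGAEHKX",
    "WHYPAAGLRLWX",
    "WLYYAAAVEHKE",
    "WLSPAAVAALGX",
    "FLYRWLPSRRGG",
]


def clean(sequence):
    """Strip whitespace, line breaks and '*' emphasis markers."""
    return "".join(sequence.split()).replace("*", "")


def build_query(target, peptide):
    """Join two chains with ':' in the SOD1Sequence:PeptideSequence form."""
    return f"{clean(target)}:{clean(peptide)}"


queries = [build_query(SOD1_A4V, peptide) for peptide in peptides]
for query in queries:
    # Show only the end of each query; the SOD1 part is identical.
    print("..." + query[-20:])
```
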
    
    

    Figures: predicted models for HHYYVVGAEHKX, WLSPAAVAALGX, WHYPAAGLRLWX, FLYRWLPSRRGG and WLYYAAAVEHKE; heatmaps (rank 1-5); sequence coverage; predicted lDDT.
    
  6. After running AlphaFold-Multimer with your 5 peptides alongside your mutated SOD1 sequence, plot the ipTM scores, which measure the relative confidence of the binding region.

    import matplotlib.pyplot as plt
    import pandas as pd
    
    # Data
    peptides = ["HHYYVVGAEHKX", "WHYPAAGLRLWX", "WLYYAAAVEHKE", "WLSPAAVAALGX", "FLYRWLPSRRGG"]
    iptm_scores = [0.0999, 0.14, 0.14, 0.145, 0.152]  # Replace with your actual results
    
    # Create DataFrame
    df = pd.DataFrame({"Peptide": peptides, "ipTM Score": iptm_scores})
    
    # Normalize values for the color scale
    norm = plt.Normalize(min(iptm_scores), max(iptm_scores))
    colors = plt.cm.RdYlGn(norm(iptm_scores))
    
    # Create figure and axis
    fig, ax = plt.subplots(figsize=(8, 5))
    
    # Plot bars with colors
    bars = ax.bar(df["Peptide"], df["ipTM Score"], color=colors)
    
    # Add values above each bar
    for bar, score in zip(bars, iptm_scores):
        height = bar.get_height()
        ax.text(bar.get_x() + bar.get_width() / 2, height + 0.02, f"{score:.3f}", 
                ha="center", fontsize=10, fontweight="bold", color="black")
    
    # Configure the chart
    ax.set_xlabel("Peptides")
    ax.set_ylabel("ipTM Score")
    ax.set_title("Protein-Peptide Binding Confidence (ipTM)")
    ax.set_ylim(0, 1)  # Scale from 0 to 1
    ax.set_xticks(range(len(peptides)))
    ax.set_xticklabels(peptides, rotation=45, ha="right")  # Rotate labels for better visibility
    
    # Add color bar
    sm = plt.cm.ScalarMappable(cmap="RdYlGn", norm=norm)
    sm.set_array([])
    cbar = plt.colorbar(sm, ax=ax)
    cbar.set_label("Binding Confidence Level (ipTM)")
    
    # Show plot
    plt.show()
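
The ipTM values in the script above are typed in by hand. As an alternative, they could be read from the score files in the downloaded results. The sketch below rests on assumptions you should verify against your own output: that each prediction comes with a JSON file whose name contains `_scores_rank_001_`, and that this file has an `"iptm"` key. The function names and the example path are hypothetical.

```python
import json
from pathlib import Path


def read_iptm(score_file):
    """Return the 'iptm' value from one score JSON file, or None if absent."""
    with open(score_file) as handle:
        scores = json.load(handle)
    return scores.get("iptm")


def collect_iptm(result_dir, pattern="*_scores_rank_001_*.json"):
    """Map each top-ranked score file name to its ipTM value.

    The file name pattern is an assumption about the output layout.
    """
    return {
        path.name: read_iptm(path)
        for path in sorted(Path(result_dir).glob(pattern))
    }


# Example call (hypothetical path to the unzipped results folder):
# iptm_by_file = collect_iptm("results/sod1_a4v_peptides")
```

A file without an `"iptm"` key yields `None` rather than an error, so a missing value is easy to spot before plotting.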
    


  7. Provide a 1 paragraph write-up of your results

    Using AlphaFold-Multimer, we assessed the binding confidence between the mutated SOD1 (A4V) and five peptides: four generated by PepMLM and one known SOD1-binding peptide (FLYRWLPSRRGG). The ipTM scores, which indicate the confidence of the predicted binding interface, ranged from 0.0999 to 0.152, so every predicted interaction is low-confidence. The highest score was observed for FLYRWLPSRRGG (0.152), consistent with its known SOD1-binding ability, although its margin over the best generated peptide, WLSPAAVAALGX (0.145), is only 0.007. WHYPAAGLRLWX and WLYYAAAVEHKE both scored 0.14, and HHYYVVGAEHKX scored lowest (0.0999). In the color-coded bar chart, the color scale is normalized to the observed range, so red marks the lowest and green the highest of these five scores rather than weak and strong binding in absolute terms. Because ipTM measures model confidence and not binding affinity, these results suggest only that none of the tested peptides yields a confident predicted complex, with FLYRWLPSRRGG remaining the best candidate for interaction with SOD1 (A4V). Further validation, such as molecular dynamics simulations or experimental assays, would be necessary to confirm binding.
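
The figures quoted in the write-up can be rechecked with a few lines. The sketch below uses the ipTM values from the plotting script in step 6 and reports the top two peptides and the gap between them; the variable names are illustrative.

```python
# ipTM values as entered in the plotting script in step 6.
iptm = {
    "HHYYVVGAEHKX": 0.0999,
    "WHYPAAGLRLWX": 0.14,
    "WLYYAAAVEHKE": 0.14,
    "WLSPAAVAALGX": 0.145,
    "FLYRWLPSRRGG": 0.152,
}

# Sort from highest to lowest ipTM.
ranked = sorted(iptm.items(), key=lambda item: item[1], reverse=True)
best, runner_up = ranked[0], ranked[1]
margin = best[1] - runner_up[1]

print(f"range: {min(iptm.values())} to {max(iptm.values())}")
print(f"best: {best[0]} ({best[1]:.3f})")
print(f"runner-up: {runner_up[0]} ({runner_up[1]:.3f}), margin {margin:.3f}")
```
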