Tags: reactjs, buffer, dna-sequence

React: create a DNA viewer for very long sequences


I would like to create my own DNA sequence viewer in a React TypeScript app (to learn and practice coding). I get the sequence from my Flask server as one long string, which works fine. But when the sequence is very large (more than 4 million letters), my app crashes. That is because I don't simply display the text: I also draw a ruler under the letters and give each letter (A, C, G, or T) a different color. The ruler is mandatory; the colors are not, if they make the app too slow.

My render text code:

const renderColoredText = (text: string) => {
    return text.split('').map((char, index) => {
        let color = 'black';

        switch (char.toLowerCase()) {
            case 'a':
                color = 'primary';
                break;
            case 'b':
                color = 'secondary';
                break;
            case 'c':
                color = 'warning';
                break;
            default:
                break;
        }

        // Every 10th letter gets a thicker underline and a position label.
        const borderBottom = index % 10 === 9 ? '3px solid lightblue' : '1px solid lightblue';
        const padding = '5px';

        return (
            <Box
                key={index}
                sx={{
                    display: 'inline-block',
                    borderBottom,
                    // color: getNucleotideColor(nucleotide),
                    paddingTop: '2px',
                    position: 'relative',
                    marginBottom: '20px',
                }}
            >
                {char}
                {index % 10 === 9 && (
                    <Box
                        sx={{
                            position: 'absolute',
                            top: '110%', // Below the nucleotide
                            left: '50%',
                            transform: 'translateX(-50%)',
                            fontSize: '12px',
                            color: 'black',
                        }}
                    >
                        {index + 1}
                    </Box>
                )}
            </Box>
        );
    });
};

I added an interval, so that every second it appends a chunk of 10,000 or 20,000 letters to the sequence (with more than that, it crashes). With a long sequence of 4-5 million letters, the update takes a very long time and, at some point, the app crashes anyway.

My interval code:

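const chunkSize = 10000;
const updateInterval = 1000;

useEffect(() => {
    const currentUser = getUserToken();
    if (!currentUser || !currentUser._id) return;

    // Kept in the effect scope so the cleanup below can reach it:
    // a function returned from inside .then() is never called by React.
    let intervalId: ReturnType<typeof setInterval> | undefined;
    let cancelled = false;

    getWorkspaceInput(currentUser._id).then(data => {
        if (cancelled) return; // component unmounted before the data arrived
        let currentIndex = 0;

        intervalId = setInterval(() => {
            const nextChunk = data.input.slice(currentIndex, currentIndex + chunkSize);
            setSequence(prevSequence => prevSequence + nextChunk);
            currentIndex += chunkSize;
            if (currentIndex >= data.input.length) {
                clearInterval(intervalId);
            }
        }, updateInterval);
    });

    return () => {
        cancelled = true;
        clearInterval(intervalId);
    };
}, []);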

The render:

 const [sequence, setSequence] = useState('')

 return (
       <MainContainer title={'Sequence Viewer'} >
           <Box sx={{ maxWidth: '100%' }}>
               <Typography 
                   sx={{ 
                       wordWrap: 'break-word',
                       letterSpacing: '2px',
                       paddingTop:'1.5rem'
                       }}
                   >{renderColoredText(sequence)}
               </Typography>
           </Box>         
       </MainContainer>
      
   )

Do you have any idea how to do this?

Thank you


Solution

  • I tried building this viewer myself, and it turns out that no amount of optimization in pure React helps. Even wrapping each letter in a plain span is too much when the sequence is 5 000 000 letters long.

    The only thing I know of that can help is virtualization. The idea boils down to: render only what is visible at the moment and unmount the rest.

    Unfortunately, it has limitations, mostly that you must be able to organize the content into a list of rows and know the dimensions of each row.

    Search for "React virtualization" to learn how to do it. Rewriting an online guide into this SO answer would be too much, but here are some code samples that I think will be helpful.

    I wouldn't recommend splitting the entire sequence into a list up front. Instead, pass the entire sequence around together with beginIndex and endIndex and use substring. That way you can create a component which extracts just the part it needs:

    const CHUNK_SIZE = 10;
    
    const ROW_HEIGHT = 50;
    
    const ROW_WIDTH = 1100;
    
    function DnaSequenceRow(props) {
      const row = React.useMemo(() => {
        return props.sequence.substring(props.beginIndex, props.endIndex);
      }, [props.beginIndex, props.endIndex, props.sequence]);
    
      const chunksOfTen = React.useMemo(() => divideIntoSubsequences(row, CHUNK_SIZE),
        [row],
      );
    
      return (
        <div style={{
          // the size probably has to be fixed up front for the sake of virtualization
          height: ROW_HEIGHT,
          width:  ROW_WIDTH,
        }}
        >
          {chunksOfTen.map((ten, indexWithinRow) => (
            <SequenceOfTen
              key={indexWithinRow}
              sequenceOfTen={ten}
              index={props.beginIndex + (indexWithinRow * CHUNK_SIZE)}/>
          ))}
        </div>
      );
    }
    

    This is what you will probably need for virtualization: the ability to render a single row, so that the virtualization library can render a chosen subset of rows (as opposed to rendering them all).
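
    To make "a chosen subset of rows" concrete, here is a minimal sketch of the index math a virtualization layer does for fixed-height rows. The helper names (rowCount, visibleRowRange, rowBounds), LETTERS_PER_ROW = 100 and the overscan default are my own assumptions for illustration, not part of any library; a real virtualization library does this for you.

```typescript
// Hypothetical helpers: they only do the index math needed for
// fixed-height rows, with no rendering involved.

const ROW_HEIGHT = 50;       // px, same value as in the snippet above
const LETTERS_PER_ROW = 100; // assumption: ten chunks of ten letters per row

interface RowRange {
  first: number; // first row to render (inclusive)
  last: number;  // last row to render (inclusive)
}

/** Number of rows needed to show the whole sequence. */
function rowCount(sequenceLength: number, lettersPerRow: number): number {
  return Math.ceil(sequenceLength / lettersPerRow);
}

/** Rows intersecting the viewport, padded by `overscan` rows on each side. */
function visibleRowRange(
  scrollTop: number,
  viewportHeight: number,
  rowHeight: number,
  totalRows: number,
  overscan = 2,
): RowRange {
  if (totalRows === 0) return { first: 0, last: -1 }; // nothing to render
  const firstVisible = Math.floor(scrollTop / rowHeight);
  const lastVisible = Math.ceil((scrollTop + viewportHeight) / rowHeight) - 1;
  return {
    first: Math.max(0, firstVisible - overscan),
    last: Math.min(totalRows - 1, lastVisible + overscan),
  };
}

/** beginIndex/endIndex to pass to a row component for a given row. */
function rowBounds(
  rowIndex: number,
  lettersPerRow: number,
  sequenceLength: number,
): { beginIndex: number; endIndex: number } {
  const beginIndex = rowIndex * lettersPerRow;
  return { beginIndex, endIndex: Math.min(beginIndex + lettersPerRow, sequenceLength) };
}

// Usage: 5 000 000 letters, an 800px viewport scrolled down by 1000px.
const totalRows = rowCount(5000000, LETTERS_PER_ROW);            // 50000
const range = visibleRowRange(1000, 800, ROW_HEIGHT, totalRows); // { first: 18, last: 37 }
const bounds = rowBounds(range.first, LETTERS_PER_ROW, 5000000); // { beginIndex: 1800, endIndex: 1900 }
console.log(totalRows, range, bounds);
```

    With these numbers only 20 of the 50 000 rows are mounted at any time, which is why the approach scales where rendering every letter does not.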

    I also extracted the chunk of ten nucleotides into a separate component for the sake of readability:

    function SequenceOfTen(props) {
      return (
        <>
          <Box
            sx={{
              display:      'inline-block',
              borderBottom: '3px solid lightblue',
              // color: getNucleotideColor(nucleotide),
              paddingTop:   '2px',
              position:     'relative',
              marginBottom: '20px',
            }}
          >
            {props.sequenceOfTen[0]}
            <Box
              sx={{
                position:  'absolute',
                top:       '110%', // Below the nucleotide
                left:      '50%',
                transform: 'translateX(-50%)',
                fontSize:  '12px',
                color:     'black',
              }}
            >
              {props.index}
            </Box>
          </Box>
          <Box
            sx={{
              display:      'inline-block',
              borderBottom: '1px solid lightblue',
              // color: getNucleotideColor(nucleotide),
              paddingTop:   '2px',
              position:     'relative',
              marginBottom: '20px',
            }}
          >
            {props.sequenceOfTen.substring(1)}
          </Box>
        </>
      );
    }
    

    I used useMemo to avoid recalculating rows and chunks on every render, and I wrapped the components in React.memo (not shown in the snippets above) to avoid unnecessary re-renders. The division into chunks is based on an algorithm I found here: https://stackoverflow.com/a/29202760/12003949

    export function divideIntoSubsequences(sequence, subsequenceLength) {
      const numberOfSubsequences = Math.ceil(sequence.length / subsequenceLength);
      const subsequences         = new Array(numberOfSubsequences);
      for (let i = 0, o = 0; i < numberOfSubsequences; ++i, o += subsequenceLength) {
        subsequences[i] = sequence.substring(o, o + subsequenceLength);
      }
      return subsequences;
    }
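
    To see how the chunking and the per-chunk index labels fit together outside React, here is a small standalone sketch. labelledChunks and LabelledChunk are hypothetical names I made up for this example; the function mirrors what the row component computes from sequence, beginIndex and endIndex, using the same substring approach.

```typescript
// Hypothetical helper for illustration: splits one row of the sequence into
// chunks and records where each chunk starts in the full sequence.

interface LabelledChunk {
  index: number; // 0-based position of the chunk's first letter
  text: string;  // up to `chunkSize` letters
}

function labelledChunks(
  sequence: string,
  beginIndex: number,
  endIndex: number,
  chunkSize: number,
): LabelledChunk[] {
  const end = Math.min(endIndex, sequence.length);
  const chunks: LabelledChunk[] = [];
  for (let offset = beginIndex; offset < end; offset += chunkSize) {
    chunks.push({
      index: offset,
      text: sequence.substring(offset, Math.min(offset + chunkSize, end)),
    });
  }
  return chunks;
}

// Usage: a 23-letter sequence split into chunks of ten.
const demo = labelledChunks('ACGTACGTACGTACGTACGTACG', 0, 23, 10);
console.log(demo);
// [ { index: 0, text: 'ACGTACGTAC' },
//   { index: 10, text: 'GTACGTACGT' },
//   { index: 20, text: 'ACG' } ]
```

    Note that the labels here are 0-based start positions, as in the row snippet above; the code in the question instead labels every tenth letter with index + 1, so adjust the label if you want that ruler.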
    

    Wrap some virtualization library around this and it should work like a charm. Best of luck!