Context length

#4 by Julz1918 - opened

What is the current method to ensure 16k context length? Checking the dataset with the Llama-3 tokenizer yields an average instruction length of 7-8k tokens, with peaks in the 15-16k range near the end of the instruction set, while the outputs are only around 200-300 tokens long.
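
For reference, here is a minimal sketch of how such a length check could be done. The dataset repo id and the `instruction`/`output` column names are placeholders, not the actual ones for this dataset:

```python
# Sketch: measure instruction/output token lengths with the Llama-3 tokenizer.
# The dataset repo id and column names below are hypothetical; substitute the
# real ones. The Llama-3 checkpoint is gated, so it requires approved access.
from statistics import mean

from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")
ds = load_dataset("some-user/some-dataset", split="train")  # hypothetical repo id

inst_lens = [len(tokenizer.encode(ex["instruction"])) for ex in ds]
out_lens = [len(tokenizer.encode(ex["output"])) for ex in ds]

print(f"instructions: mean={mean(inst_lens):.0f}, max={max(inst_lens)}")
print(f"outputs:      mean={mean(out_lens):.0f}, max={max(out_lens)}")
```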

I could be wrong, but I believe the instructions need to be lengthened so that the dataset averages at least 16k+ tokens overall, per the Llama-3 tokenizer. And if longer, more descriptive outputs are wanted from the model, then the outputs also need to grow from the current 200-300-token average to around 2-3k tokens (GPT-4's maximum output length is ~4,096 tokens, with a 128k-token input context).
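
Continuing the sketch above, one quick way to see how far the dataset is from such a target is to count the samples whose combined length already reaches it:

```python
# Continuing from the sketch above: count samples whose combined
# instruction + output length reaches a (hypothetical) 16k-token target.
TARGET = 16_000

combined = [i + o for i, o in zip(inst_lens, out_lens)]
reached = sum(1 for n in combined if n >= TARGET)
print(f"{reached}/{len(combined)} samples at or above {TARGET} tokens")
```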
