First, a warm-up: try a quick toy run. GPT-2 comes later.
git clone https://github.com/karpathy/nanoGPT
cd nanoGPT
python data/shakespeare_char/prepare.py
# creates the two files train.bin and val.bin
# if you have a GPU, just start training!
python train.py config/train_shakespeare_char.py
# the model is saved to --out_dir, which defaults to out-shakespeare-char
# once training is done, generate some samples to see how it looks
python sample.py --out_dir=out-shakespeare-char
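Before looking at the output, it helps to know what `prepare.py` did. Roughly, it builds a character-level vocabulary from the input text, encodes every character to an integer id, and dumps a 90/10 train/val split as raw `uint16` arrays. The following is a minimal sketch of that idea (my own simplification, not the actual script — the real one also pickles the vocab to `meta.pkl`):

```python
import numpy as np

text = "hello shakespeare"  # stand-in for the full input.txt

# build the character-level vocabulary
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}  # char -> id
itos = {i: ch for i, ch in enumerate(chars)}  # id -> char

def encode(s):
    return [stoi[c] for c in s]

def decode(ids):
    return "".join(itos[i] for i in ids)

# 90/10 train/val split, saved as raw uint16 token ids
n = len(text)
train_ids = np.array(encode(text[: int(n * 0.9)]), dtype=np.uint16)
val_ids = np.array(encode(text[int(n * 0.9):]), dtype=np.uint16)
train_ids.tofile("train.bin")
val_ids.tofile("val.bin")

# encoding must round-trip losslessly
assert decode(encode(text)) == text
```

Because the tokens are characters, `vocab_size` ends up tiny (65 for the Shakespeare data, as the training log below confirms), which is what makes this model so small and fast to train.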
Overriding: out_dir = out-shakespeare-char
number of parameters: 10.65M
Loading meta from data/shakespeare_char/meta.pkl...
Clown:
So, who? and is the servant?
ABbot:
It is not so: alas, he's a third with his back: hast
he sworn a hungry apollo much on the officious tomb
he is noted in easy height. Yet it is true, it is not
that a most plant of his husband; it is a mine own word, let me know
noted to see this own of all of instruments, but the most
riverse higher to our weight.
ANGELO:
Fruit for truth, to she will not be wife.
ANGELO:
How are you? as it dischardined, I am not hot best a
crave of your change! You
---------------
Men part, and such a sentence of sorrow. Where is my heart?
ANGELO:
I will be so near to one with contrary.
ISABELLA:
I would the lamb of those I do be solved:
A distraction sir, indeed.
ANGELO:
A bring Richard: there is gone! I know not.
I do not lose you, but keep your virtues: there is your lamb, and you
shall be a most diseased that known your proper extermity.
ISABELLA:
The shop of your part, you shall not good, but we shall like you so.
ANGELO:
Why with you a worthin with the king of
---------------
MARIANA:
I beseech you, my lord; I'll go to your daughter
Say your grace.
ISABELLA:
I, my lord,
You know what you did, when you know where,
Is a blood and bring of him.
ANGELO:
I am a very dead
In this lamb, to liber wish your manhood.
DUKE VINCENTIO:
Sir, thou wilt have have lall'd of grace
And consul, but servant from the father's blood,
Be in so disposed years on the next death of my mind;
And then I have plain'd for his mouth in the house of me
Be your large. First, I have fare your hones
---------------
First Citizen:
Ay, but it is a guard.
COMINIUS:
What, are you then?
A pitch, sir?
BRUTUS:
No, no more present than are patricians.
BRUTUS:
You are song always where, good Muciiia,
To prison and o' the powers.
CORIOLANUS:
I do think you, sir, that he cracks your tongue,
That blacks here two performed in a gentle request
That's one: like a sound foolish dovernman.
SICINIUS:
Why, would you be gone!
LEONTES:
Have you no more on the Roman in this form,
Show for a distimation of the world,
Marc
---------------
BUCKINGHAM:
Then, here is no child still grow.
Gracious lord, lord man, be prouded for the country;
For who knows I take him I love to thee,
And whereof good companion in his death,
Which is the other worse of my love:
Her mother, never set not upon my name with his house,
I cannot watch a soldier, and your proper love a proud;
Our lady's bloody more can mast dance for the greater
Of sobediers and hollow'd upon the season of change,--
The oracling young Romeo will leave for them strange,
And for
---------------
MENENIUS:
Have been so to a side that beggars that makes
A widow of private, a charge of bring, a virtuous
Approperous bones; they are not a subjected
When as she was with obpastime, were it for close
To proof them, or it is so.
First Senator:
So, condemns, therefore lies and be grace
And have him to his ancient citizens,
With like his soldiers, and he doth empty him and
Displainy contraction will stay upon him;
And then will you speak him to the red Richmond.
First Servingman:
POMPEY:
Any t
---------------
She was seen me left the tewll.
HENRY BOLINGBROKE:
If I think my brother love I would than I come:
I am not talk of these dogs for their powers;
The time I till tell my offences,
No man or I never say, and fight their semary.
Yet not of your great careeting your rapiers,
For your woes without at place yourself.
ANGELO:
If it be sent for false, that thou'lt have done with
the kind of that against the common of men.
ANGELO:
You are post possible: go you to the first, I will send
To live a soldi
---------------
ladies of dreams of the execution,
Give me thy doom man in all the king,
And I say not the tyrannous of my cousin
By the power of the injury, that thy piece that have done
To see thee and dead so other affection
The thought a purpose unborn of her troth:
Thou canst conceal the wings of it cract in the heat;
And thou and cruel for the hopeful woman,
To rather with the proof instrument days,
My hand death the disposing straight shadow up win him.
You have sworn of my face must be with;
So by the w
---------------
LEONTES:
The realm and his parchance. But, and my grace
I'll content them from the court-i' the scriptre?
HERMIONE:
Half our flower?
POLIXENES:
Fore then, my lord,
Be many of honour! this is not the flower
That's out officers; who rash thee as here
A best law where's thy worship no short in
With the liken and common will perform follow grief
In the presence of the figure of our highness.
DUKE VINCENTIO:
Though I would there sit in such a war.
Love-heart. How fellow thou ne'er a herrier in th
---------------
How changed that rest,
The west day of for that I am husband?
POMPEY:
I say this young shall give you to the sweet soul,
And look on this footmast.
MISTRESS OVERDONE:
There's the room of wrather.
ANGELO:
Why 'sir, indeed, there's no more like a present
Than his body o' the north.
POMPEY:
Here comes her most complot with her manhood: 'tis here
If he told him were a subtless for him.
POMPEY:
My lord, he should sing him in him to his faces.
ESCALUS:
How now! what says he can hold him that hav
---------------
(base) root@openai:~/nanoGPT/nanoGPT-master$ python train.py config/train_shakespeare_char.py
Overriding config with config/train_shakespeare_char.py:
# train a miniature character-level shakespeare model
# good for debugging and playing on macbooks and such
out_dir = 'out-shakespeare-char'
eval_interval = 250 # keep frequent because we'll overfit
eval_iters = 200
log_interval = 10 # don't print too too often
# we expect to overfit on this small dataset, so only save when val improves
always_save_checkpoint = False
wandb_log = False # override via command line if you like
wandb_project = 'shakespeare-char'
wandb_run_name = 'mini-gpt'
dataset = 'shakespeare_char'
gradient_accumulation_steps = 1
batch_size = 64
block_size = 256 # context of up to 256 previous characters
# baby GPT model :)
n_layer = 6
n_head = 6
n_embd = 384
dropout = 0.2
learning_rate = 1e-3 # with baby networks can afford to go a bit higher
max_iters = 5000
lr_decay_iters = 5000 # make equal to max_iters usually
min_lr = 1e-4 # learning_rate / 10 usually
beta2 = 0.99 # make a bit bigger because number of tokens per iter is small
warmup_iters = 100 # not super necessary potentially
# on macbook also add
# device = 'cpu' # run on cpu only
# compile = False # do not torch compile the model
tokens per iteration will be: 16,384
found vocab_size = 65 (inside data/shakespeare_char/meta.pkl)
Initializing a new model from scratch
number of parameters: 10.65M
num decayed parameter tensors: 26, with 10,740,096 parameters
num non-decayed parameter tensors: 13, with 4,992 parameters
using fused AdamW: True
compiling the model... (takes a ~minute)
step 0: train loss 4.2874, val loss 4.2823
iter 0: loss 4.2653, time 63096.80ms, mfu -100.00%
iter 10: loss 3.2457, time 26.20ms, mfu 14.22%
iter 20: loss 2.7915, time 25.57ms, mfu 14.26%
iter 30: loss 2.6355, time 30.23ms, mfu 14.06%
iter 40: loss 2.5778, time 24.31ms, mfu 14.19%
iter 50: loss 2.5276, time 23.87ms, mfu 14.33%
iter 60: loss 2.5197, time 23.92ms, mfu 14.46%
iter 70: loss 2.4955, time 24.45ms, mfu 14.53%
iter 80: loss 2.4979, time 24.21ms, mfu 14.62%
iter 90: loss 2.4680, time 24.54ms, mfu 14.68%
iter 100: loss 2.4593, time 23.53ms, mfu 14.79%
iter 110: loss 2.4555, time 23.73ms, mfu 14.88%
iter 120: loss 2.4263, time 23.98ms, mfu 14.95%
iter 130: loss 2.4232, time 23.74ms, mfu 15.02%
iter 140: loss 2.4242, time 24.14ms, mfu 15.06%
iter 150: loss 2.4215, time 24.01ms, mfu 15.11%
iter 160: loss 2.3771, time 23.73ms, mfu 15.17%
iter 170: loss 2.3707, time 23.82ms, mfu 15.22%
iter 180: loss 2.3205, time 24.16ms, mfu 15.24%
iter 190: loss 2.2627, time 24.40ms, mfu 15.24%
iter 200: loss 2.2094, time 24.51ms, mfu 15.24%
iter 210: loss 2.1420, time 24.70ms, mfu 15.22%
iter 220: loss 2.1468, time 25.08ms, mfu 15.19%
iter 230: loss 2.0693, time 24.71ms, mfu 15.18%
iter 240: loss 2.0824, time 24.63ms, mfu 15.17%
step 250: train loss 1.9701, val loss 2.0653
saving checkpoint to out-shakespeare-char
iter 250: loss 2.0363, time 5951.21ms, mfu 13.66%
iter 260: loss 1.9751, time 23.77ms, mfu 13.86%
iter 270: loss 1.9774, time 24.01ms, mfu 14.03%
iter 280: loss 1.9780, time 25.38ms, mfu 14.09%
iter 290: loss 1.9214, time 24.58ms, mfu 14.20%
iter 300: loss 1.9032, time 25.00ms, mfu 14.27%
iter 310: loss 1.8714, time 25.16ms, mfu 14.32%
iter 320: loss 1.8571, time 24.34ms, mfu 14.42%
iter 330: loss 1.8218, time 22.95ms, mfu 14.60%
iter 340: loss 1.7927, time 24.78ms, mfu 14.65%
iter 350: loss 1.8255, time 23.39ms, mfu 14.78%
iter 360: loss 1.7724, time 24.01ms, mfu 14.85%
iter 370: loss 1.7417, time 24.17ms, mfu 14.91%
iter 380: loss 1.7294, time 23.97ms, mfu 14.97%
iter 390: loss 1.7373, time 24.95ms, mfu 14.97%
iter 400: loss 1.7644, time 23.66ms, mfu 15.05%
iter 410: loss 1.6980, time 25.78ms, mfu 14.99%
iter 420: loss 1.7170, time 24.44ms, mfu 15.01%
iter 430: loss 1.6867, time 23.99ms, mfu 15.06%
iter 440: loss 1.6518, time 23.85ms, mfu 15.12%
iter 450: loss 1.6521, time 23.94ms, mfu 15.16%
iter 460: loss 1.6002, time 24.35ms, mfu 15.18%
iter 470: loss 1.6526, time 23.80ms, mfu 15.23%
iter 480: loss 1.6245, time 24.12ms, mfu 15.25%
iter 490: loss 1.6022, time 23.33ms, mfu 15.32%
step 500: train loss 1.5241, val loss 1.7211
saving checkpoint to out-shakespeare-char
iter 500: loss 1.6024, time 6146.97ms, mfu 13.79%
iter 510: loss 1.6112, time 24.03ms, mfu 13.97%
iter 520: loss 1.5880, time 23.14ms, mfu 14.18%
iter 530: loss 1.5644, time 24.12ms, mfu 14.31%
iter 540: loss 1.6184, time 24.52ms, mfu 14.39%
iter 550: loss 1.5651, time 26.54ms, mfu 14.36%
iter 560: loss 1.5734, time 24.24ms, mfu 14.46%
iter 570: loss 1.5722, time 28.19ms, mfu 14.34%
iter 580: loss 1.5349, time 24.14ms, mfu 14.45%
iter 590: loss 1.5056, time 23.54ms, mfu 14.58%
iter 600: loss 1.5145, time 22.48ms, mfu 14.78%
iter 610: loss 1.5486, time 22.58ms, mfu 14.96%
iter 620: loss 1.5350, time 22.33ms, mfu 15.13%
iter 630: loss 1.5134, time 22.33ms, mfu 15.28%
iter 640: loss 1.4694, time 22.96ms, mfu 15.38%
iter 650: loss 1.5040, time 22.96ms, mfu 15.46%
iter 660: loss 1.5103, time 22.20ms, mfu 15.60%
iter 670: loss 1.4481, time 22.58ms, mfu 15.69%
iter 680: loss 1.5124, time 22.53ms, mfu 15.77%
iter 690: loss 1.4731, time 22.09ms, mfu 15.88%
iter 700: loss 1.4843, time 23.55ms, mfu 15.88%
iter 710: loss 1.4687, time 22.53ms, mfu 15.94%
iter 720: loss 1.4458, time 21.94ms, mfu 16.05%
iter 730: loss 1.4242, time 22.18ms, mfu 16.12%
iter 740: loss 1.4332, time 22.55ms, mfu 16.16%
step 750: train loss 1.3644, val loss 1.5844
saving checkpoint to out-shakespeare-char
iter 750: loss 1.4309, time 6122.12ms, mfu 14.55%
iter 760: loss 1.4523, time 22.64ms, mfu 14.74%
iter 770: loss 1.4296, time 22.45ms, mfu 14.93%
iter 780: loss 1.4288, time 23.14ms, mfu 15.05%
iter 790: loss 1.4279, time 22.94ms, mfu 15.16%
iter 800: loss 1.4376, time 22.68ms, mfu 15.29%
iter 810: loss 1.4163, time 22.70ms, mfu 15.40%
iter 820: loss 1.4049, time 22.61ms, mfu 15.51%
iter 830: loss 1.3941, time 22.39ms, mfu 15.62%
iter 840: loss 1.4048, time 22.33ms, mfu 15.73%
iter 850: loss 1.3930, time 23.05ms, mfu 15.77%
iter 860: loss 1.4028, time 22.81ms, mfu 15.83%
iter 870: loss 1.4048, time 22.81ms, mfu 15.88%
iter 880: loss 1.3753, time 23.11ms, mfu 15.91%
iter 890: loss 1.3907, time 23.40ms, mfu 15.91%
iter 900: loss 1.3693, time 22.69ms, mfu 15.96%
iter 910: loss 1.3181, time 22.63ms, mfu 16.01%
iter 920: loss 1.3670, time 22.47ms, mfu 16.07%
iter 930: loss 1.3591, time 23.97ms, mfu 16.01%
iter 940: loss 1.3468, time 22.56ms, mfu 16.07%
iter 950: loss 1.3578, time 22.40ms, mfu 16.12%
iter 960: loss 1.3717, time 22.94ms, mfu 16.13%
iter 970: loss 1.3619, time 22.84ms, mfu 16.15%
iter 980: loss 1.3568, time 22.77ms, mfu 16.17%
iter 990: loss 1.3458, time 26.87ms, mfu 15.94%
step 1000: train loss 1.2754, val loss 1.5196
saving checkpoint to out-shakespeare-char
iter 1000: loss 1.3429, time 6093.23ms, mfu 14.35%
iter 1010: loss 1.3424, time 23.07ms, mfu 14.53%
iter 1020: loss 1.3205, time 23.39ms, mfu 14.67%
iter 1030: loss 1.3381, time 22.79ms, mfu 14.84%
iter 1040: loss 1.3590, time 27.43ms, mfu 14.72%
iter 1050: loss 1.2976, time 22.29ms, mfu 14.92%
iter 1060: loss 1.3369, time 25.23ms, mfu 14.90%
iter 1070: loss 1.3370, time 23.11ms, mfu 15.02%
iter 1080: loss 1.3389, time 23.80ms, mfu 15.09%
iter 1090: loss 1.3603, time 23.45ms, mfu 15.17%
iter 1100: loss 1.3247, time 22.31ms, mfu 15.32%
iter 1110: loss 1.3067, time 22.97ms, mfu 15.41%
iter 1120: loss 1.3021, time 22.39ms, mfu 15.53%
iter 1130: loss 1.2984, time 22.36ms, mfu 15.65%
iter 1140: loss 1.2977, time 22.54ms, mfu 15.74%
iter 1150: loss 1.3114, time 21.97ms, mfu 15.86%
iter 1160: loss 1.3262, time 24.98ms, mfu 15.76%
iter 1170: loss 1.3038, time 24.40ms, mfu 15.72%
iter 1180: loss 1.3313, time 23.84ms, mfu 15.71%
iter 1190: loss 1.2707, time 23.31ms, mfu 15.73%
iter 1200: loss 1.2932, time 26.93ms, mfu 15.54%
iter 1210: loss 1.2706, time 26.19ms, mfu 15.41%
iter 1220: loss 1.3089, time 22.09ms, mfu 15.56%
iter 1230: loss 1.2999, time 23.59ms, mfu 15.58%
iter 1240: loss 1.3035, time 25.59ms, mfu 15.48%
step 1250: train loss 1.2054, val loss 1.4892
saving checkpoint to out-shakespeare-char
iter 1250: loss 1.2701, time 6024.89ms, mfu 13.94%
iter 1260: loss 1.2864, time 24.41ms, mfu 14.07%
iter 1270: loss 1.2665, time 22.47ms, mfu 14.32%
iter 1280: loss 1.2582, time 23.30ms, mfu 14.49%
iter 1290: loss 1.2866, time 23.57ms, mfu 14.62%
iter 1300: loss 1.3070, time 25.59ms, mfu 14.61%
iter 1310: loss 1.2432, time 22.79ms, mfu 14.79%
iter 1320: loss 1.3118, time 22.55ms, mfu 14.96%
iter 1330: loss 1.2712, time 24.04ms, mfu 15.02%
iter 1340: loss 1.3096, time 24.78ms, mfu 15.02%
iter 1350: loss 1.2573, time 24.60ms, mfu 15.03%
iter 1360: loss 1.2825, time 22.38ms, mfu 15.19%
iter 1370: loss 1.2571, time 22.16ms, mfu 15.35%
iter 1380: loss 1.2656, time 22.37ms, mfu 15.49%
iter 1390: loss 1.2496, time 22.84ms, mfu 15.57%
iter 1400: loss 1.2581, time 22.49ms, mfu 15.67%
iter 1410: loss 1.2516, time 22.21ms, mfu 15.78%
iter 1420: loss 1.2718, time 22.78ms, mfu 15.84%
iter 1430: loss 1.2451, time 22.40ms, mfu 15.92%
iter 1440: loss 1.2566, time 22.65ms, mfu 15.97%
iter 1450: loss 1.2314, time 22.00ms, mfu 16.07%
iter 1460: loss 1.2399, time 22.26ms, mfu 16.13%
iter 1470: loss 1.2285, time 22.45ms, mfu 16.18%
iter 1480: loss 1.2167, time 22.10ms, mfu 16.25%
iter 1490: loss 1.2362, time 22.84ms, mfu 16.26%
step 1500: train loss 1.1535, val loss 1.4732
saving checkpoint to out-shakespeare-char
iter 1500: loss 1.1789, time 6073.34ms, mfu 14.64%
iter 1510: loss 1.2339, time 21.85ms, mfu 14.88%
iter 1520: loss 1.2293, time 21.91ms, mfu 15.09%
iter 1530: loss 1.2586, time 22.49ms, mfu 15.24%
iter 1540: loss 1.1933, time 21.72ms, mfu 15.43%
iter 1550: loss 1.2319, time 22.21ms, mfu 15.57%
iter 1560: loss 1.2098, time 21.94ms, mfu 15.71%
iter 1570: loss 1.2349, time 22.43ms, mfu 15.80%
iter 1580: loss 1.2005, time 22.50ms, mfu 15.87%
iter 1590: loss 1.1868, time 23.49ms, mfu 15.87%
iter 1600: loss 1.2023, time 22.93ms, mfu 15.91%
iter 1610: loss 1.2405, time 22.88ms, mfu 15.95%
iter 1620: loss 1.1821, time 22.59ms, mfu 16.00%
iter 1630: loss 1.2035, time 22.88ms, mfu 16.03%
iter 1640: loss 1.1998, time 22.63ms, mfu 16.07%
iter 1650: loss 1.1844, time 22.26ms, mfu 16.14%
iter 1660: loss 1.2186, time 22.79ms, mfu 16.16%
iter 1670: loss 1.1972, time 21.60ms, mfu 16.27%
iter 1680: loss 1.1925, time 21.44ms, mfu 16.38%
iter 1690: loss 1.2024, time 20.76ms, mfu 16.54%
iter 1700: loss 1.1876, time 21.16ms, mfu 16.65%
iter 1710: loss 1.1826, time 21.03ms, mfu 16.75%
iter 1720: loss 1.1820, time 21.74ms, mfu 16.79%
iter 1730: loss 1.2018, time 22.18ms, mfu 16.79%
iter 1740: loss 1.1714, time 24.66ms, mfu 16.62%
step 1750: train loss 1.1078, val loss 1.4666
saving checkpoint to out-shakespeare-char
iter 1750: loss 1.1865, time 6086.96ms, mfu 14.97%
iter 1760: loss 1.1894, time 22.01ms, mfu 15.16%
iter 1770: loss 1.1977, time 22.01ms, mfu 15.34%
iter 1780: loss 1.1938, time 22.01ms, mfu 15.50%
iter 1790: loss 1.1951, time 22.30ms, mfu 15.62%
iter 1800: loss 1.1797, time 22.08ms, mfu 15.75%
iter 1810: loss 1.1620, time 22.42ms, mfu 15.83%
iter 1820: loss 1.1651, time 22.11ms, mfu 15.94%
iter 1830: loss 1.1741, time 21.99ms, mfu 16.04%
iter 1840: loss 1.1600, time 21.93ms, mfu 16.13%
iter 1850: loss 1.1584, time 22.56ms, mfu 16.17%
iter 1860: loss 1.1758, time 22.43ms, mfu 16.21%
iter 1870: loss 1.1463, time 22.25ms, mfu 16.27%
iter 1880: loss 1.1801, time 21.88ms, mfu 16.34%
iter 1890: loss 1.1793, time 22.29ms, mfu 16.38%
iter 1900: loss 1.1295, time 22.03ms, mfu 16.43%
iter 1910: loss 1.1678, time 22.10ms, mfu 16.48%
iter 1920: loss 1.1734, time 22.27ms, mfu 16.50%
iter 1930: loss 1.1435, time 22.90ms, mfu 16.48%
iter 1940: loss 1.1255, time 22.72ms, mfu 16.47%
iter 1950: loss 1.1372, time 22.51ms, mfu 16.48%
iter 1960: loss 1.1511, time 22.12ms, mfu 16.52%
iter 1970: loss 1.1553, time 23.29ms, mfu 16.46%
iter 1980: loss 1.1539, time 22.32ms, mfu 16.49%
iter 1990: loss 1.1550, time 22.54ms, mfu 16.49%
step 2000: train loss 1.0588, val loss 1.4768
iter 2000: loss 1.1318, time 5555.84ms, mfu 14.85%
iter 2010: loss 1.1303, time 21.70ms, mfu 15.08%
iter 2020: loss 1.1198, time 22.04ms, mfu 15.26%
iter 2030: loss 1.1602, time 21.98ms, mfu 15.43%
iter 2040: loss 1.1514, time 22.06ms, mfu 15.58%
iter 2050: loss 1.1180, time 22.83ms, mfu 15.65%
iter 2060: loss 1.1007, time 22.67ms, mfu 15.73%
iter 2070: loss 1.1195, time 22.61ms, mfu 15.81%
iter 2080: loss 1.1151, time 21.88ms, mfu 15.93%
iter 2090: loss 1.1349, time 21.56ms, mfu 16.06%
iter 2100: loss 1.1289, time 24.12ms, mfu 16.00%
iter 2110: loss 1.1317, time 21.97ms, mfu 16.10%
iter 2120: loss 1.1291, time 26.68ms, mfu 15.89%
iter 2130: loss 1.1441, time 25.68ms, mfu 15.75%
iter 2140: loss 1.1326, time 24.77ms, mfu 15.68%
iter 2150: loss 1.1247, time 25.94ms, mfu 15.55%
iter 2160: loss 1.1412, time 22.96ms, mfu 15.61%
iter 2170: loss 1.1311, time 22.18ms, mfu 15.73%
iter 2180: loss 1.1122, time 24.89ms, mfu 15.66%
iter 2190: loss 1.1037, time 22.46ms, mfu 15.75%
iter 2200: loss 1.1206, time 21.88ms, mfu 15.88%
iter 2210: loss 1.1166, time 27.61ms, mfu 15.64%
iter 2220: loss 1.1287, time 25.00ms, mfu 15.57%
iter 2230: loss 1.1233, time 24.88ms, mfu 15.51%
iter 2240: loss 1.1245, time 23.68ms, mfu 15.53%
step 2250: train loss 1.0093, val loss 1.4766
iter 2250: loss 1.1141, time 5505.39ms, mfu 13.98%
iter 2260: loss 1.1085, time 22.03ms, mfu 14.28%
iter 2270: loss 1.1323, time 24.82ms, mfu 14.35%
iter 2280: loss 1.0976, time 21.92ms, mfu 14.62%
iter 2290: loss 1.1398, time 25.94ms, mfu 14.59%
iter 2300: loss 1.1237, time 22.61ms, mfu 14.78%
iter 2310: loss 1.0968, time 22.40ms, mfu 14.96%
iter 2320: loss 1.0932, time 21.61ms, mfu 15.19%
iter 2330: loss 1.0976, time 22.15ms, mfu 15.36%
iter 2340: loss 1.1205, time 23.01ms, mfu 15.44%
iter 2350: loss 1.1055, time 21.63ms, mfu 15.62%
iter 2360: loss 1.1034, time 22.32ms, mfu 15.73%
iter 2370: loss 1.0907, time 22.41ms, mfu 15.82%
iter 2380: loss 1.0793, time 22.47ms, mfu 15.89%
iter 2390: loss 1.0797, time 22.24ms, mfu 15.98%
iter 2400: loss 1.0767, time 22.32ms, mfu 16.05%
iter 2410: loss 1.0693, time 22.38ms, mfu 16.11%
iter 2420: loss 1.0810, time 22.37ms, mfu 16.17%
iter 2430: loss 1.0520, time 21.99ms, mfu 16.24%
iter 2440: loss 1.0627, time 25.68ms, mfu 16.07%
iter 2450: loss 1.0707, time 24.98ms, mfu 15.95%
iter 2460: loss 1.0842, time 22.54ms, mfu 16.01%
iter 2470: loss 1.0911, time 24.08ms, mfu 15.96%
iter 2480: loss 1.0847, time 28.18ms, mfu 15.69%
iter 2490: loss 1.0555, time 22.49ms, mfu 15.77%
step 2500: train loss 0.9596, val loss 1.4911
iter 2500: loss 1.0836, time 5470.28ms, mfu 14.20%
iter 2510: loss 1.0655, time 22.15ms, mfu 14.46%
iter 2520: loss 1.0464, time 23.09ms, mfu 14.63%
iter 2530: loss 1.0528, time 22.17ms, mfu 14.85%
iter 2540: loss 1.0530, time 22.08ms, mfu 15.05%
iter 2550: loss 1.0656, time 22.55ms, mfu 15.20%
iter 2560: loss 1.0612, time 22.14ms, mfu 15.36%
iter 2570: loss 1.0652, time 22.21ms, mfu 15.50%
iter 2580: loss 1.0723, time 21.92ms, mfu 15.65%
iter 2590: loss 1.0594, time 21.82ms, mfu 15.80%
iter 2600: loss 1.0683, time 23.04ms, mfu 15.83%
iter 2610: loss 1.0562, time 22.46ms, mfu 15.91%
iter 2620: loss 1.0449, time 22.45ms, mfu 15.98%
iter 2630: loss 1.0174, time 21.94ms, mfu 16.08%
iter 2640: loss 1.0451, time 22.79ms, mfu 16.11%
iter 2650: loss 1.0688, time 21.71ms, mfu 16.21%
iter 2660: loss 1.0489, time 21.77ms, mfu 16.30%
iter 2670: loss 1.0203, time 21.89ms, mfu 16.37%
iter 2680: loss 1.0466, time 21.70ms, mfu 16.45%
iter 2690: loss 1.0463, time 22.39ms, mfu 16.47%
iter 2700: loss 1.0279, time 22.81ms, mfu 16.46%
iter 2710: loss 1.0498, time 22.06ms, mfu 16.50%
iter 2720: loss 1.0480, time 22.56ms, mfu 16.50%
iter 2730: loss 1.0585, time 22.02ms, mfu 16.54%
iter 2740: loss 1.0237, time 22.43ms, mfu 16.55%
step 2750: train loss 0.9139, val loss 1.5087
iter 2750: loss 1.0349, time 5524.75ms, mfu 14.90%
iter 2760: loss 1.0316, time 22.22ms, mfu 15.09%
iter 2770: loss 1.0238, time 22.58ms, mfu 15.23%
iter 2780: loss 1.0179, time 23.34ms, mfu 15.30%
iter 2790: loss 1.0385, time 25.08ms, mfu 15.26%
iter 2800: loss 1.0111, time 25.38ms, mfu 15.20%
iter 2810: loss 1.0375, time 25.71ms, mfu 15.13%
iter 2820: loss 1.0296, time 25.37ms, mfu 15.09%
iter 2830: loss 1.0331, time 26.00ms, mfu 15.01%
iter 2840: loss 0.9957, time 25.35ms, mfu 14.98%
iter 2850: loss 1.0298, time 25.64ms, mfu 14.93%
iter 2860: loss 1.0208, time 22.46ms, mfu 15.10%
iter 2870: loss 1.0065, time 22.34ms, mfu 15.26%
iter 2880: loss 1.0392, time 22.09ms, mfu 15.42%
iter 2890: loss 1.0158, time 22.32ms, mfu 15.55%
iter 2900: loss 0.9874, time 22.60ms, mfu 15.64%
iter 2910: loss 1.0350, time 22.34ms, mfu 15.75%
iter 2920: loss 1.0096, time 22.43ms, mfu 15.83%
iter 2930: loss 0.9988, time 22.50ms, mfu 15.90%
iter 2940: loss 0.9874, time 22.11ms, mfu 16.00%
iter 2950: loss 1.0257, time 22.08ms, mfu 16.09%
iter 2960: loss 0.9979, time 21.49ms, mfu 16.21%
iter 2970: loss 0.9899, time 21.68ms, mfu 16.31%
iter 2980: loss 0.9994, time 22.05ms, mfu 16.37%
iter 2990: loss 0.9872, time 23.57ms, mfu 16.31%
step 3000: train loss 0.8691, val loss 1.5141
iter 3000: loss 0.9838, time 5583.25ms, mfu 14.69%
iter 3010: loss 0.9946, time 22.61ms, mfu 14.87%
iter 3020: loss 0.9977, time 21.81ms, mfu 15.09%
iter 3030: loss 0.9984, time 22.07ms, mfu 15.27%
iter 3040: loss 1.0224, time 21.87ms, mfu 15.45%
iter 3050: loss 0.9825, time 22.16ms, mfu 15.58%
iter 3060: loss 0.9946, time 22.90ms, mfu 15.65%
iter 3070: loss 1.0188, time 22.08ms, mfu 15.77%
iter 3080: loss 0.9968, time 22.05ms, mfu 15.89%
iter 3090: loss 0.9772, time 22.71ms, mfu 15.94%
iter 3100: loss 0.9952, time 22.49ms, mfu 16.00%
iter 3110: loss 0.9726, time 21.90ms, mfu 16.10%
iter 3120: loss 0.9864, time 22.27ms, mfu 16.17%
iter 3130: loss 0.9733, time 21.90ms, mfu 16.25%
iter 3140: loss 0.9791, time 22.39ms, mfu 16.29%
iter 3150: loss 0.9916, time 22.61ms, mfu 16.31%
iter 3160: loss 1.0107, time 22.07ms, mfu 16.37%
iter 3170: loss 0.9597, time 21.79ms, mfu 16.44%
iter 3180: loss 0.9751, time 22.00ms, mfu 16.49%
iter 3190: loss 0.9947, time 22.42ms, mfu 16.50%
iter 3200: loss 0.9622, time 22.54ms, mfu 16.51%
iter 3210: loss 0.9683, time 22.50ms, mfu 16.51%
iter 3220: loss 0.9589, time 22.30ms, mfu 16.53%
iter 3230: loss 0.9586, time 21.94ms, mfu 16.58%
iter 3240: loss 0.9501, time 22.43ms, mfu 16.58%
...
iter 4740: loss 0.8315, time 28.29ms, mfu 13.83%
step 4750: train loss 0.6374, val loss 1.6787
iter 4750: loss 0.8026, time 5595.11ms, mfu 12.46%
iter 4760: loss 0.8225, time 30.65ms, mfu 12.43%
iter 4770: loss 0.8048, time 28.84ms, mfu 12.48%
iter 4780: loss 0.8005, time 29.05ms, mfu 12.51%
iter 4790: loss 0.8362, time 23.24ms, mfu 12.86%
iter 4800: loss 0.8199, time 23.07ms, mfu 13.19%
iter 4810: loss 0.8462, time 23.47ms, mfu 13.46%
iter 4820: loss 0.8157, time 23.35ms, mfu 13.71%
iter 4830: loss 0.8207, time 23.56ms, mfu 13.92%
iter 4840: loss 0.8352, time 23.79ms, mfu 14.09%
iter 4850: loss 0.8269, time 24.02ms, mfu 14.24%
iter 4860: loss 0.8281, time 23.15ms, mfu 14.42%
iter 4870: loss 0.8026, time 23.25ms, mfu 14.58%
iter 4880: loss 0.8258, time 23.29ms, mfu 14.72%
iter 4890: loss 0.8093, time 28.46ms, mfu 14.56%
iter 4900: loss 0.8016, time 23.01ms, mfu 14.72%
iter 4910: loss 0.8239, time 23.59ms, mfu 14.83%
iter 4920: loss 0.8265, time 23.23ms, mfu 14.95%
iter 4930: loss 0.8089, time 23.18ms, mfu 15.06%
iter 4940: loss 0.7959, time 23.32ms, mfu 15.16%
iter 4950: loss 0.8234, time 23.23ms, mfu 15.24%
iter 4960: loss 0.8313, time 23.53ms, mfu 15.30%
iter 4970: loss 0.7882, time 23.52ms, mfu 15.36%
iter 4980: loss 0.7989, time 23.63ms, mfu 15.40%
iter 4990: loss 0.8269, time 23.17ms, mfu 15.47%
step 5000: train loss 0.6222, val loss 1.6942
iter 5000: loss 0.8200, time 5692.96ms, mfu 13.93%
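The numbers printed at the start of this log can all be reproduced from the config echoed above. A quick sanity check (my own arithmetic, not part of nanoGPT):

```python
# reproduce the numbers printed by train.py from the config above
n_layer, n_embd = 6, 384
block_size, vocab_size = 256, 65
batch_size, grad_accum = 64, 1

# tokens per iteration = grad_accum * batch_size * block_size
tokens_per_iter = grad_accum * batch_size * block_size
print(tokens_per_iter)  # 16384

# weight matrices (the parameters that get weight decay):
per_layer = (n_embd * 3 * n_embd     # attention qkv projection
             + n_embd * n_embd       # attention output projection
             + n_embd * 4 * n_embd   # MLP up-projection
             + 4 * n_embd * n_embd)  # MLP down-projection
wte = vocab_size * n_embd  # token embedding (weight-tied with the lm head)
wpe = block_size * n_embd  # position embedding
decayed = n_layer * per_layer + wte + wpe
print(decayed)  # 10740096

# LayerNorm gains (bias=False here): 2 per block + 1 final = 13 tensors
non_decayed = (2 * n_layer + 1) * n_embd
print(non_decayed)  # 4992

# the reported "number of parameters" excludes the position embedding
print(decayed + non_decayed - wpe)  # 10646784 ~= 10.65M
```

These match the log exactly: 16,384 tokens per iteration, 10,740,096 decayed and 4,992 non-decayed parameters, and 10.65M parameters overall.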
Sun Jul 7 00:19:25 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01 Driver Version: 535.183.01 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 3090 Off | 00000000:01:00.0 Off | N/A |
| 30% 29C P8 21W / 350W | 10294MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 1 NVIDIA GeForce RTX 3090 Off | 00000000:02:00.0 Off | N/A |
| 30% 28C P8 30W / 350W | 353MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 2 NVIDIA GeForce RTX 3090 Off | 00000000:03:00.0 Off | N/A |
| 30% 28C P8 15W / 350W | 353MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 3 NVIDIA GeForce RTX 3090 Off | 00000000:04:00.0 Off | N/A |
| 30% 28C P8 23W / 350W | 349MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 57721 G /usr/lib/xorg/Xorg 4MiB |
| 0 N/A N/A 58786 C python 340MiB |
| 0 N/A N/A 555464 C /root/miniconda3/bin/python 9934MiB |
Now on to GPT-2:
python data/openwebtext/prepare.py
torchrun --standalone --nproc_per_node=8 train.py config/train_gpt2.py
# this run takes about 4 days on 8x A100 40GB
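To see why it takes days, compare the scale against the Shakespeare run. Using the values I recall from `config/train_gpt2.py` (check the file — quoted from memory, treat as assumptions): `batch_size = 12`, `block_size = 1024`, `gradient_accumulation_steps = 5 * 8`, `max_iters = 600000`.

```python
# rough scale of the GPT-2 (124M) reproduction
# (values assumed from config/train_gpt2.py, not verified here)
batch_size, block_size = 12, 1024
grad_accum = 5 * 8  # sized so 8 GPUs together see the full batch
max_iters = 600_000

tokens_per_iter = batch_size * block_size * grad_accum
print(tokens_per_iter)  # 491520, roughly 0.5M tokens per step

total_billions = tokens_per_iter * max_iters / 1e9
print(total_billions)  # ~295B tokens over the whole run
```

That is roughly 30,000x more tokens per iteration than the 16,384 of the character-level run, which is why this one needs `torchrun` across 8 GPUs instead of a single card.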