Replacing matching entries in one column of a file by another column from a different file

I have two tab-separated files which look as follows:

file1:

NC_008146.1 WP_011558474.1 1155234 1156286 44173
NC_008146.1 WP_011558475.1 1156298 1156807 12
NC_008146.1 WP_011558476.1 1156804 1157820 -3
NC_008705.1 WP_011558474.1 1159543 1160595 42748
NC_008705.1 WP_011558475.1 1160607 1161116 12
NC_008705.1 WP_011558476.1 1161113 1162129 -3
NC_009077.1 WP_011559727.1 2481079 2481633 8
NC_009077.1 WP_011854835.1 1163068 1164120 42559
NC_009077.1 WP_011854836.1 1164127 1164636 7

file2:

NC_008146.1 GCF_000014165.1_ASM1416v1_protein.faa
NC_008705.1 GCF_000015405.1_ASM1540v1_protein.faa
NC_009077.1 GCF_000016005.1_ASM1600v1_protein.faa

I want to match column 1 of file1 against column 1 of file2 and replace it with the corresponding column 2 entry from file2.
The output would look like this:

GCF_000014165.1_ASM1416v1_protein.faa WP_011558474.1 1155234 1156286 44173
GCF_000014165.1_ASM1416v1_protein.faa WP_011558475.1 1156298 1156807 12
GCF_000014165.1_ASM1416v1_protein.faa WP_011558476.1 1156804 1157820 -3
GCF_000015405.1_ASM1540v1_protein.faa WP_011558474.1 1159543 1160595 42748
GCF_000015405.1_ASM1540v1_protein.faa WP_011558475.1 1160607 1161116 12
GCF_000015405.1_ASM1540v1_protein.faa WP_011558476.1 1161113 1162129 -3
GCF_000016005.1_ASM1600v1_protein.faa WP_011559727.1 2481079 2481633 8
GCF_000016005.1_ASM1600v1_protein.faa WP_011854835.1 1163068 1164120 42559
GCF_000016005.1_ASM1600v1_protein.faa WP_011854836.1 1164127 1164636 7

edited Apr 5 at 12:56

Rui F Ribeiro

asked Apr 5 at 12:32

BhushanDhamale

It looks like you might also be interested in our sister site: Bioinformatics.

– terdon♦
Apr 5 at 12:54

Thank you for the link @terdon!

– BhushanDhamale
Apr 5 at 12:57

awk


3 Answers
You can do this very easily with awk:

$ awk 'NR==FNR{a[$1]=$2; next}{$1=a[$1]; print}' file2 file1
GCF_000014165.1_ASM1416v1_protein.faa WP_011558474.1 1155234 1156286 44173
GCF_000014165.1_ASM1416v1_protein.faa WP_011558475.1 1156298 1156807 12
GCF_000014165.1_ASM1416v1_protein.faa WP_011558476.1 1156804 1157820 -3
GCF_000015405.1_ASM1540v1_protein.faa WP_011558474.1 1159543 1160595 42748
GCF_000015405.1_ASM1540v1_protein.faa WP_011558475.1 1160607 1161116 12
GCF_000015405.1_ASM1540v1_protein.faa WP_011558476.1 1161113 1162129 -3
GCF_000016005.1_ASM1600v1_protein.faa WP_011559727.1 2481079 2481633 8
GCF_000016005.1_ASM1600v1_protein.faa WP_011854835.1 1163068 1164120 42559
GCF_000016005.1_ASM1600v1_protein.faa WP_011854836.1 1164127 1164636 7

Or, since that looks like a tab-separated file:

$ awk -vOFS="\t" 'NR==FNR{a[$1]=$2; next}{$1=a[$1]; print}' file2 file1
GCF_000014165.1_ASM1416v1_protein.faa WP_011558474.1 1155234 1156286 44173
GCF_000014165.1_ASM1416v1_protein.faa WP_011558475.1 1156298 1156807 12
GCF_000014165.1_ASM1416v1_protein.faa WP_011558476.1 1156804 1157820 -3
GCF_000015405.1_ASM1540v1_protein.faa WP_011558474.1 1159543 1160595 42748
GCF_000015405.1_ASM1540v1_protein.faa WP_011558475.1 1160607 1161116 12
GCF_000015405.1_ASM1540v1_protein.faa WP_011558476.1 1161113 1162129 -3
GCF_000016005.1_ASM1600v1_protein.faa WP_011559727.1 2481079 2481633 8
GCF_000016005.1_ASM1600v1_protein.faa WP_011854835.1 1163068 1164120 42559
GCF_000016005.1_ASM1600v1_protein.faa WP_011854836.1 1164127 1164636 7

This assumes that every RefSeq (NC_*) ID in file1 has a corresponding entry in file2; for an ID with no entry, a[$1] is empty and the first field is replaced by an empty string.

Explanation

NR==FNR : NR is the number of lines read so far across all input files, while FNR is the line number within the current file. The two are identical only while the 1st file (here, file2) is being read.

{a[$1]=$2; next} : if this is the first file (see above), save the 2nd field in an array whose key is the 1st field. Then, move on to the next line. This ensures the next block isn't executed for the 1st file.

{$1=a[$1]; print} : now, in the second file, set the 1st field to whatever value was saved in the array a for the 1st field (so, the associated value from file2) and print the resulting line.
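
As a hedged variation on the same two-file idiom, the sketch below only replaces the first field when file2 actually has an entry for it, so an unmapped ID is passed through unchanged rather than blanked. The file names (map.tsv, data.tsv), the scratch directory and the unmapped ID NC_000000.0 are invented for illustration and are not part of the question.

```shell
#!/bin/sh
# Hypothetical sample data in a scratch directory (names are illustrative only).
tmp=$(mktemp -d)
printf 'NC_008146.1\tGCF_000014165.1_ASM1416v1_protein.faa\n' > "$tmp/map.tsv"
printf 'NC_008146.1\tWP_011558474.1\t1155234\t1156286\t44173\n' > "$tmp/data.tsv"
# Made-up row whose ID has no entry in the lookup file.
printf 'NC_000000.0\tWP_000000000.1\t1\t2\t3\n' >> "$tmp/data.tsv"

# FS and OFS are both set to a tab so fields are read and written tab-separated.
out=$(awk 'BEGIN { FS = OFS = "\t" }
     NR == FNR { a[$1] = $2; next }   # first file: build the lookup table
     $1 in a   { $1 = a[$1] }         # second file: substitute only if known
     { print }' "$tmp/map.tsv" "$tmp/data.tsv")

printf '%s\n' "$out"
rm -rf "$tmp"
```

Whether passing unmapped IDs through is preferable to dropping or blanking them depends on the downstream use; the `$1 in a` test simply makes that choice explicit.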

edited Apr 5 at 12:50

answered Apr 5 at 12:38

terdon♦

NR == FNR doesn't work correctly when the first file is empty. See this and the associated answer for a workaround

– iruvar
Apr 5 at 12:44

@iruvar nothing will work well if the first file is empty, so I don't really see why that's relevant. The entire point here is to combine the data from the two files. If either file is empty, the whole exercise is pointless.

– terdon♦
Apr 5 at 12:45

sorry I should have said in this particular case file2 and not file1 is empty. Sane behaviour when file2 is empty is to report the contents of file1. The problem with NR == FNR is that code associated with it executes on the contents of file1 when file2 is empty

– iruvar
Apr 5 at 12:51

@iruvar there is no sane behavior here if either file is empty. That's what I'm saying :) So trying to make it deal with that case gracefully is pointless. And, in any case, when either file is empty here, nothing is printed. Which actually seems like the sanest approach, I'd rather get no data than wrong data.

– terdon♦
Apr 5 at 12:54
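
The empty-lookup-file case discussed in these comments can be sketched as follows. One commonly used alternative to NR==FNR, which is not necessarily the workaround the comment links to, is to compare FILENAME with ARGV[1]; combined with an `in` test, an empty lookup file then leaves the data file unchanged. The file names are hypothetical.

```shell
#!/bin/sh
tmp=$(mktemp -d)
: > "$tmp/empty_map.tsv"     # empty lookup file (plays the role of file2)
printf 'NC_008146.1\tWP_011558474.1\t1155234\t1156286\t44173\n' > "$tmp/data.tsv"

# NR==FNR version: with an empty first file, the first block runs on the
# data file instead, every line hits "next", and nothing is printed.
nr_fnr=$(awk 'NR == FNR { a[$1] = $2; next } { $1 = a[$1]; print }' \
     "$tmp/empty_map.tsv" "$tmp/data.tsv")

# FILENAME version: the first block only runs for lines of the first named
# file, so an empty lookup file simply yields an empty table.
by_name=$(awk 'BEGIN { FS = OFS = "\t" }
     FILENAME == ARGV[1] { a[$1] = $2; next }
     $1 in a { $1 = a[$1] }
     { print }' "$tmp/empty_map.tsv" "$tmp/data.tsv")

printf 'NR==FNR version printed:  [%s]\n' "$nr_fnr"
printf 'FILENAME version printed: [%s]\n' "$by_name"
rm -rf "$tmp"
```

Note that the FILENAME test assumes the two arguments are different paths; which behaviour is "sane" for an empty lookup file is exactly what the comments above disagree on.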

No need for awk: assuming both files are sorted on the first column, you can use coreutils join:

join -o '2.2 1.2 1.3 1.4 1.5' file1 file2

Output:

GCF_000014165.1_ASM1416v1_protein.faa WP_011558474.1 1155234 1156286 44173
GCF_000014165.1_ASM1416v1_protein.faa WP_011558475.1 1156298 1156807 12
GCF_000014165.1_ASM1416v1_protein.faa WP_011558476.1 1156804 1157820 -3
GCF_000015405.1_ASM1540v1_protein.faa WP_011558474.1 1159543 1160595 42748
GCF_000015405.1_ASM1540v1_protein.faa WP_011558475.1 1160607 1161116 12
GCF_000015405.1_ASM1540v1_protein.faa WP_011558476.1 1161113 1162129 -3
GCF_000016005.1_ASM1600v1_protein.faa WP_011559727.1 2481079 2481633 8
GCF_000016005.1_ASM1600v1_protein.faa WP_011854835.1 1163068 1164120 42559
GCF_000016005.1_ASM1600v1_protein.faa WP_011854836.1 1164127 1164636 7

If your files aren't sorted, you can either sort them first (sort file1 > file1.sorted; sort file2 > file2.sorted) and then run the command above on the sorted copies, or, if your shell supports the <() construct (bash does), you can do:

join -o '2.2 1.2 1.3 1.4 1.5' <(sort file1) <(sort file2)
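
Because the question's files are tab-separated, a tab-aware variant of the join approach might look like the sketch below. It uses -t to set the field separator for both sort and join and sorts on the join field only with -k1,1. The scratch directory, the two-row sample files and the use of sorted temporary copies (instead of <(), to stay within plain sh) are assumptions made for illustration.

```shell
#!/bin/sh
# Use one fixed collation so sort and join agree on the ordering.
LC_ALL=C
export LC_ALL

tmp=$(mktemp -d)
tab=$(printf '\t')

# Deliberately unsorted sample of file1, plus a matching file2.
printf 'NC_008705.1\tWP_011558474.1\t1159543\t1160595\t42748\n' > "$tmp/file1"
printf 'NC_008146.1\tWP_011558474.1\t1155234\t1156286\t44173\n' >> "$tmp/file1"
printf 'NC_008146.1\tGCF_000014165.1_ASM1416v1_protein.faa\n' > "$tmp/file2"
printf 'NC_008705.1\tGCF_000015405.1_ASM1540v1_protein.faa\n' >> "$tmp/file2"

# Sort both inputs on the first tab-separated field.
sort -t "$tab" -k1,1 "$tmp/file1" > "$tmp/file1.sorted"
sort -t "$tab" -k1,1 "$tmp/file2" > "$tmp/file2.sorted"

# Join on field 1 and emit file2's column 2 followed by file1's columns 2-5.
out=$(join -t "$tab" -o '2.2 1.2 1.3 1.4 1.5' "$tmp/file1.sorted" "$tmp/file2.sorted")

printf '%s\n' "$out"
rm -rf "$tmp"
```

With -t set, join writes its output with the same separator, so the result stays tab-separated. The rows come out in sorted key order, which may differ from the original order of file1.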

edited Apr 5 at 13:00

terdon♦

answered Apr 5 at 12:39

Thor

This can also be done with a shell loop around awk and sed (here the input files are named f1 and f2, and the result is written to f3):

cp f1 f3; for i in `awk '{print $1}' f2`; do k=`awk -v i="$i" '$1==i{print $2}' f2`; sed "/$i/s/$i/$k/g" f3 >f3.tmp && mv f3.tmp f3; done

Output (contents of f3):

GCF_000014165.1_ASM1416v1_protein.faa WP_011558474.1 1155234 1156286 44173
GCF_000014165.1_ASM1416v1_protein.faa WP_011558475.1 1156298 1156807 12
GCF_000014165.1_ASM1416v1_protein.faa WP_011558476.1 1156804 1157820 -3
GCF_000015405.1_ASM1540v1_protein.faa WP_011558474.1 1159543 1160595 42748
GCF_000015405.1_ASM1540v1_protein.faa WP_011558475.1 1160607 1161116 12
GCF_000015405.1_ASM1540v1_protein.faa WP_011558476.1 1161113 1162129 -3
GCF_000016005.1_ASM1600v1_protein.faa WP_011559727.1 2481079 2481633 8
GCF_000016005.1_ASM1600v1_protein.faa WP_011854835.1 1163068 1164120 42559
GCF_000016005.1_ASM1600v1_protein.faa WP_011854836.1 1164127 1164636 7

answered Apr 7 at 13:47

Praveen Kumar BS

1,7641311

add a comment |

Your Answer

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "106"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

BhushanDhamale is a new contributor. Be nice, and check out our Code of Conduct.

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f510709%2freplacing-matching-entries-in-one-column-of-a-file-by-another-column-from-a-diff%23new-answer', 'question_page');

);

Post as a guest

Name

Required, but never shown

3 Answers
3

active

oldest

votes

3 Answers
3

active

oldest

votes

You can do this very easily with awk:

$ awk 'NR==FNRa[$1]=$2; next$1=a[$1]; print' file2 file1
GCF_000014165.1_ASM1416v1_protein.faa WP_011558474.1 1155234 1156286 44173
GCF_000014165.1_ASM1416v1_protein.faa WP_011558475.1 1156298 1156807 12
GCF_000014165.1_ASM1416v1_protein.faa WP_011558476.1 1156804 1157820 -3
GCF_000015405.1_ASM1540v1_protein.faa WP_011558474.1 1159543 1160595 42748
GCF_000015405.1_ASM1540v1_protein.faa WP_011558475.1 1160607 1161116 12
GCF_000015405.1_ASM1540v1_protein.faa WP_011558476.1 1161113 1162129 -3
GCF_000016005.1_ASM1600v1_protein.faa WP_011559727.1 2481079 2481633 8
GCF_000016005.1_ASM1600v1_protein.faa WP_011854835.1 1163068 1164120 42559
GCF_000016005.1_ASM1600v1_protein.faa WP_011854836.1 1164127 1164636 7

Or, since that looks like a tab-separated file:

$ awk -vOFS="t" 'NR==FNRa[$1]=$2; next$1=a[$1]; print' file2 file1
GCF_000014165.1_ASM1416v1_protein.faa WP_011558474.1 1155234 1156286 44173
GCF_000014165.1_ASM1416v1_protein.faa WP_011558475.1 1156298 1156807 12
GCF_000014165.1_ASM1416v1_protein.faa WP_011558476.1 1156804 1157820 -3
GCF_000015405.1_ASM1540v1_protein.faa WP_011558474.1 1159543 1160595 42748
GCF_000015405.1_ASM1540v1_protein.faa WP_011558475.1 1160607 1161116 12
GCF_000015405.1_ASM1540v1_protein.faa WP_011558476.1 1161113 1162129 -3
GCF_000016005.1_ASM1600v1_protein.faa WP_011559727.1 2481079 2481633 8
GCF_000016005.1_ASM1600v1_protein.faa WP_011854835.1 1163068 1164120 42559
GCF_000016005.1_ASM1600v1_protein.faa WP_011854836.1 1164127 1164636 7

This assumes that every RefSeq (NC_*) id in file1 has a corresponding entry in file2.

Explanation

NR==FNR : NR is the current line number, FNR is the line number of the current file. The two will be identical only while the 1st file (here, file2) is being read.

a[$1]=$2; next: if this is the first file (see above), save the 2nd field in an array whose key is the 1st field. Then, move on to the next line. This ensures the next block isn't executed for the 1st file.

$1=a[$1]; print : now, in the second file, set the 1st field to whatever value was saved in the array a for the 1st field (so, the associated value from file2) and print the resulting line.

edited Apr 5 at 12:50

answered Apr 5 at 12:38

terdon♦

134k33269449

1

NR == FNR doesn't work correctly when the first file is empty. See this and the associated answer for a workaround

– iruvar
Apr 5 at 12:44

1

@iruvar nothing will work well if the first file is empty, so I don't really see why that's relevant. The entire point here is to combine the data from the two files. If either file is empty, the whole exercise is pointless.

– terdon♦
Apr 5 at 12:45

sorry I should have said in this particular case file2 and not file1 is empty. Sane behaviour when file2 is empty is to report the contents of file1. The problem with NR == FNR is that code associated with it executes on the contents of file1 when file2 is empty

– iruvar
Apr 5 at 12:51

1

@iruvar there is no sane behavior here if either file is empty. That's what I'm saying :) So trying to make it deal with that case gracefully is pointless. And, in any case, when either file is empty here, nothing is printed. Which actually seems like the sanest approach, I'd rather get no data than wrong data.

– terdon♦
Apr 5 at 12:54

add a comment |

You can do this very easily with awk:

$ awk 'NR==FNRa[$1]=$2; next$1=a[$1]; print' file2 file1
GCF_000014165.1_ASM1416v1_protein.faa WP_011558474.1 1155234 1156286 44173
GCF_000014165.1_ASM1416v1_protein.faa WP_011558475.1 1156298 1156807 12
GCF_000014165.1_ASM1416v1_protein.faa WP_011558476.1 1156804 1157820 -3
GCF_000015405.1_ASM1540v1_protein.faa WP_011558474.1 1159543 1160595 42748
GCF_000015405.1_ASM1540v1_protein.faa WP_011558475.1 1160607 1161116 12
GCF_000015405.1_ASM1540v1_protein.faa WP_011558476.1 1161113 1162129 -3
GCF_000016005.1_ASM1600v1_protein.faa WP_011559727.1 2481079 2481633 8
GCF_000016005.1_ASM1600v1_protein.faa WP_011854835.1 1163068 1164120 42559
GCF_000016005.1_ASM1600v1_protein.faa WP_011854836.1 1164127 1164636 7

Or, since that looks like a tab-separated file:

$ awk -vOFS="t" 'NR==FNRa[$1]=$2; next$1=a[$1]; print' file2 file1
GCF_000014165.1_ASM1416v1_protein.faa WP_011558474.1 1155234 1156286 44173
GCF_000014165.1_ASM1416v1_protein.faa WP_011558475.1 1156298 1156807 12
GCF_000014165.1_ASM1416v1_protein.faa WP_011558476.1 1156804 1157820 -3
GCF_000015405.1_ASM1540v1_protein.faa WP_011558474.1 1159543 1160595 42748
GCF_000015405.1_ASM1540v1_protein.faa WP_011558475.1 1160607 1161116 12
GCF_000015405.1_ASM1540v1_protein.faa WP_011558476.1 1161113 1162129 -3
GCF_000016005.1_ASM1600v1_protein.faa WP_011559727.1 2481079 2481633 8
GCF_000016005.1_ASM1600v1_protein.faa WP_011854835.1 1163068 1164120 42559
GCF_000016005.1_ASM1600v1_protein.faa WP_011854836.1 1164127 1164636 7

This assumes that every RefSeq (NC_*) id in file1 has a corresponding entry in file2.

Explanation

NR==FNR : NR is the current line number, FNR is the line number of the current file. The two will be identical only while the 1st file (here, file2) is being read.

a[$1]=$2; next: if this is the first file (see above), save the 2nd field in an array whose key is the 1st field. Then, move on to the next line. This ensures the next block isn't executed for the 1st file.

$1=a[$1]; print : now, in the second file, set the 1st field to whatever value was saved in the array a for the 1st field (so, the associated value from file2) and print the resulting line.

edited Apr 5 at 12:50

answered Apr 5 at 12:38

terdon♦

134k33269449

1

NR == FNR doesn't work correctly when the first file is empty. See this and the associated answer for a workaround

– iruvar
Apr 5 at 12:44

1

@iruvar nothing will work well if the first file is empty, so I don't really see why that's relevant. The entire point here is to combine the data from the two files. If either file is empty, the whole exercise is pointless.

– terdon♦
Apr 5 at 12:45

sorry I should have said in this particular case file2 and not file1 is empty. Sane behaviour when file2 is empty is to report the contents of file1. The problem with NR == FNR is that code associated with it executes on the contents of file1 when file2 is empty

– iruvar
Apr 5 at 12:51

1

@iruvar there is no sane behavior here if either file is empty. That's what I'm saying :) So trying to make it deal with that case gracefully is pointless. And, in any case, when either file is empty here, nothing is printed. Which actually seems like the sanest approach, I'd rather get no data than wrong data.

– terdon♦
Apr 5 at 12:54

add a comment |

You can do this very easily with awk:

$ awk 'NR==FNRa[$1]=$2; next$1=a[$1]; print' file2 file1
GCF_000014165.1_ASM1416v1_protein.faa WP_011558474.1 1155234 1156286 44173
GCF_000014165.1_ASM1416v1_protein.faa WP_011558475.1 1156298 1156807 12
GCF_000014165.1_ASM1416v1_protein.faa WP_011558476.1 1156804 1157820 -3
GCF_000015405.1_ASM1540v1_protein.faa WP_011558474.1 1159543 1160595 42748
GCF_000015405.1_ASM1540v1_protein.faa WP_011558475.1 1160607 1161116 12
GCF_000015405.1_ASM1540v1_protein.faa WP_011558476.1 1161113 1162129 -3
GCF_000016005.1_ASM1600v1_protein.faa WP_011559727.1 2481079 2481633 8
GCF_000016005.1_ASM1600v1_protein.faa WP_011854835.1 1163068 1164120 42559
GCF_000016005.1_ASM1600v1_protein.faa WP_011854836.1 1164127 1164636 7

Or, since that looks like a tab-separated file:

$ awk -vOFS="t" 'NR==FNRa[$1]=$2; next$1=a[$1]; print' file2 file1
GCF_000014165.1_ASM1416v1_protein.faa WP_011558474.1 1155234 1156286 44173
GCF_000014165.1_ASM1416v1_protein.faa WP_011558475.1 1156298 1156807 12
GCF_000014165.1_ASM1416v1_protein.faa WP_011558476.1 1156804 1157820 -3
GCF_000015405.1_ASM1540v1_protein.faa WP_011558474.1 1159543 1160595 42748
GCF_000015405.1_ASM1540v1_protein.faa WP_011558475.1 1160607 1161116 12
GCF_000015405.1_ASM1540v1_protein.faa WP_011558476.1 1161113 1162129 -3
GCF_000016005.1_ASM1600v1_protein.faa WP_011559727.1 2481079 2481633 8
GCF_000016005.1_ASM1600v1_protein.faa WP_011854835.1 1163068 1164120 42559
GCF_000016005.1_ASM1600v1_protein.faa WP_011854836.1 1164127 1164636 7

This assumes that every RefSeq (NC_*) id in file1 has a corresponding entry in file2.

Explanation

NR==FNR : NR is the current line number, FNR is the line number of the current file. The two will be identical only while the 1st file (here, file2) is being read.

a[$1]=$2; next: if this is the first file (see above), save the 2nd field in an array whose key is the 1st field. Then, move on to the next line. This ensures the next block isn't executed for the 1st file.

$1=a[$1]; print : now, in the second file, set the 1st field to whatever value was saved in the array a for the 1st field (so, the associated value from file2) and print the resulting line.

edited Apr 5 at 12:50

answered Apr 5 at 12:38

terdon♦

134k33269449

You can do this very easily with awk:

$ awk 'NR==FNRa[$1]=$2; next$1=a[$1]; print' file2 file1
GCF_000014165.1_ASM1416v1_protein.faa WP_011558474.1 1155234 1156286 44173
GCF_000014165.1_ASM1416v1_protein.faa WP_011558475.1 1156298 1156807 12
GCF_000014165.1_ASM1416v1_protein.faa WP_011558476.1 1156804 1157820 -3
GCF_000015405.1_ASM1540v1_protein.faa WP_011558474.1 1159543 1160595 42748
GCF_000015405.1_ASM1540v1_protein.faa WP_011558475.1 1160607 1161116 12
GCF_000015405.1_ASM1540v1_protein.faa WP_011558476.1 1161113 1162129 -3
GCF_000016005.1_ASM1600v1_protein.faa WP_011559727.1 2481079 2481633 8
GCF_000016005.1_ASM1600v1_protein.faa WP_011854835.1 1163068 1164120 42559
GCF_000016005.1_ASM1600v1_protein.faa WP_011854836.1 1164127 1164636 7

Or, since that looks like a tab-separated file:

$ awk -vOFS="t" 'NR==FNRa[$1]=$2; next$1=a[$1]; print' file2 file1
GCF_000014165.1_ASM1416v1_protein.faa WP_011558474.1 1155234 1156286 44173
GCF_000014165.1_ASM1416v1_protein.faa WP_011558475.1 1156298 1156807 12
GCF_000014165.1_ASM1416v1_protein.faa WP_011558476.1 1156804 1157820 -3
GCF_000015405.1_ASM1540v1_protein.faa WP_011558474.1 1159543 1160595 42748
GCF_000015405.1_ASM1540v1_protein.faa WP_011558475.1 1160607 1161116 12
GCF_000015405.1_ASM1540v1_protein.faa WP_011558476.1 1161113 1162129 -3
GCF_000016005.1_ASM1600v1_protein.faa WP_011559727.1 2481079 2481633 8
GCF_000016005.1_ASM1600v1_protein.faa WP_011854835.1 1163068 1164120 42559
GCF_000016005.1_ASM1600v1_protein.faa WP_011854836.1 1164127 1164636 7

This assumes that every RefSeq (NC_*) id in file1 has a corresponding entry in file2.

Explanation

NR==FNR : NR is the current line number, FNR is the line number of the current file. The two will be identical only while the 1st file (here, file2) is being read.

a[$1]=$2; next: if this is the first file (see above), save the 2nd field in an array whose key is the 1st field. Then, move on to the next line. This ensures the next block isn't executed for the 1st file.

$1=a[$1]; print : now, in the second file, set the 1st field to whatever value was saved in the array a for the 1st field (so, the associated value from file2) and print the resulting line.

edited Apr 5 at 12:50

answered Apr 5 at 12:38

terdon♦

134k33269449

edited Apr 5 at 12:50

answered Apr 5 at 12:38

terdon♦

134k33269449

answered Apr 5 at 12:38

terdon♦

134k33269449

answered Apr 5 at 12:38

terdon♦

134k33269449

1

NR == FNR doesn't work correctly when the first file is empty. See this and the associated answer for a workaround

– iruvar
Apr 5 at 12:44

1

@iruvar nothing will work well if the first file is empty, so I don't really see why that's relevant. The entire point here is to combine the data from the two files. If either file is empty, the whole exercise is pointless.

– terdon♦
Apr 5 at 12:45

sorry I should have said in this particular case file2 and not file1 is empty. Sane behaviour when file2 is empty is to report the contents of file1. The problem with NR == FNR is that code associated with it executes on the contents of file1 when file2 is empty

– iruvar
Apr 5 at 12:51

1

@iruvar there is no sane behavior here if either file is empty. That's what I'm saying :) So trying to make it deal with that case gracefully is pointless. And, in any case, when either file is empty here, nothing is printed. Which actually seems like the sanest approach, I'd rather get no data than wrong data.

– terdon♦
Apr 5 at 12:54

add a comment |

1

NR == FNR doesn't work correctly when the first file is empty. See this and the associated answer for a workaround

– iruvar
Apr 5 at 12:44

1

@iruvar nothing will work well if the first file is empty, so I don't really see why that's relevant. The entire point here is to combine the data from the two files. If either file is empty, the whole exercise is pointless.

– terdon♦
Apr 5 at 12:45

sorry I should have said in this particular case file2 and not file1 is empty. Sane behaviour when file2 is empty is to report the contents of file1. The problem with NR == FNR is that code associated with it executes on the contents of file1 when file2 is empty

– iruvar
Apr 5 at 12:51

1

@iruvar there is no sane behavior here if either file is empty. That's what I'm saying :) So trying to make it deal with that case gracefully is pointless. And, in any case, when either file is empty here, nothing is printed. Which actually seems like the sanest approach, I'd rather get no data than wrong data.

– terdon♦
Apr 5 at 12:54

No need for awk. Assuming both files are sorted on the join field (the first column by default), you can use coreutils join:

join -o '2.2 1.2 1.3 1.4 1.5' file1 file2

Output:

GCF_000014165.1_ASM1416v1_protein.faa WP_011558474.1 1155234 1156286 44173
GCF_000014165.1_ASM1416v1_protein.faa WP_011558475.1 1156298 1156807 12
GCF_000014165.1_ASM1416v1_protein.faa WP_011558476.1 1156804 1157820 -3
GCF_000015405.1_ASM1540v1_protein.faa WP_011558474.1 1159543 1160595 42748
GCF_000015405.1_ASM1540v1_protein.faa WP_011558475.1 1160607 1161116 12
GCF_000015405.1_ASM1540v1_protein.faa WP_011558476.1 1161113 1162129 -3
GCF_000016005.1_ASM1600v1_protein.faa WP_011559727.1 2481079 2481633 8
GCF_000016005.1_ASM1600v1_protein.faa WP_011854835.1 1163068 1164120 42559
GCF_000016005.1_ASM1600v1_protein.faa WP_011854836.1 1164127 1164636 7

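If the files are not sorted, sort them on the fly with process substitution (supported by bash, among other shells):

join -o '2.2 1.2 1.3 1.4 1.5' <(sort file1) <(sort file2)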
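
To make the -o field list concrete, here is a small sketch on made-up data. The contents of file1 and file2 below are assumptions for illustration only, not the files from the question.

```shell
tmp=$(mktemp -d)
# Hypothetical inputs, already sorted on column 1 (the default join field).
printf 'id1 WP_1 100 200 5\nid2 WP_2 300 400 7\n' > "$tmp/file1"
printf 'id1 nameA\nid2 nameB\n' > "$tmp/file2"

# -o selects output fields as FILE.FIELD: 2.2 is the second column of file2,
# 1.2 to 1.5 are columns 2 through 5 of file1.
joined=$(join -o '2.2 1.2 1.3 1.4 1.5' "$tmp/file1" "$tmp/file2")
printf '%s\n' "$joined"
rm -rf "$tmp"
```

Each output line starts with the name looked up in file2, followed by the remaining columns of the matching file1 line. Lines whose key has no partner in the other file are dropped by default.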

edited Apr 5 at 13:00

terdon♦

answered Apr 5 at 12:39

Thor

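Tested with the commands below and they worked fine. f3 starts as a copy of f1 and is rewritten on each pass of the loop, so the replacement for every key in f2 accumulates. Note that each key is used as a sed regular expression, so characters such as "." in a key match any character:

cp f1 f3
for i in $(awk '{print $1}' f2); do k=$(awk -v i="$i" '$1==i {print $2}' f2); sed "s/$i/$k/g" f3 > f3.tmp && mv f3.tmp f3; done

Output (contents of f3):
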
GCF_000014165.1_ASM1416v1_protein.faa WP_011558474.1 1155234 1156286 44173
GCF_000014165.1_ASM1416v1_protein.faa WP_011558475.1 1156298 1156807 12
GCF_000014165.1_ASM1416v1_protein.faa WP_011558476.1 1156804 1157820 -3
GCF_000015405.1_ASM1540v1_protein.faa WP_011558474.1 1159543 1160595 42748
GCF_000015405.1_ASM1540v1_protein.faa WP_011558475.1 1160607 1161116 12
GCF_000015405.1_ASM1540v1_protein.faa WP_011558476.1 1161113 1162129 -3
GCF_000016005.1_ASM1600v1_protein.faa WP_011559727.1 2481079 2481633 8
GCF_000016005.1_ASM1600v1_protein.faa WP_011854835.1 1163068 1164120 42559
GCF_000016005.1_ASM1600v1_protein.faa WP_011854836.1 1164127 1164636 7
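
The loop above runs awk and sed once per key. As a rough sketch of a variant, the same substitutions can be collected into a single sed script and applied in one pass. The sample contents of f1 and f2 and the helper file map.sed are hypothetical, and the sketch assumes the keys and replacements contain no "/", "&" or regex metacharacters.

```shell
tmp=$(mktemp -d)
# Hypothetical inputs standing in for f1 (data) and f2 (key -> replacement map).
printf 'id1 WP_1 100\nid2 WP_2 300\n' > "$tmp/f1"
printf 'id1 nameA\nid2 nameB\n' > "$tmp/f2"

# Turn each "key replacement" line of f2 into one sed substitution command.
awk '{print "s/" $1 "/" $2 "/g"}' "$tmp/f2" > "$tmp/map.sed"

# Apply all substitutions in a single pass over f1.
result=$(sed -f "$tmp/map.sed" "$tmp/f1")
printf '%s\n' "$result"
rm -rf "$tmp"
```

As with the loop, a key matches anywhere on the line, so a key like id1 would also hit id10. Anchoring the pattern to the first field would be needed if that can happen in the real data.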

answered Apr 7 at 13:47

Praveen Kumar BS
