awk => Row Manipulation

Extract specific lines from a text file

Suppose we have the following file (the line numbers are added by cat -n and are not part of the file):

cat -n lorem_ipsum.txt
 1    Lorem Ipsum is simply dummy text of the printing and typesetting industry.
 2    Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book.
 3    It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged.
 4    It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum

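We want to extract lines 2 and 3 from this file. The range pattern NR==2,NR==3 matches every record from the first one where NR==2 holds through the first one where NR==3 holds: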

awk 'NR==2,NR==3' lorem_ipsum.txt

This prints lines 2 and 3 (without line numbers, since awk reads the file directly):

Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book.
It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged.
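For large files it can help to stop reading once the last wanted line has been printed. The following is a minimal sketch of that idea, not part of the original example: the temporary file and its four-line contents are hypothetical stand-ins for lorem_ipsum.txt.

```shell
# Hypothetical sample file, created only to keep the sketch self-contained.
tmp=$(mktemp)
printf 'first\nsecond\nthird\nfourth\n' > "$tmp"

# The same selection written as an explicit comparison instead of a range.
by_compare=$(awk 'NR >= 2 && NR <= 3' "$tmp")

# Print lines 2 and 3, then exit so the rest of the file is never read.
early_exit=$(awk 'NR >= 2 { print } NR == 3 { exit }' "$tmp")

printf '%s\n' "$by_compare"
printf '%s\n' "$early_exit"
rm -f "$tmp"
```

Both commands print the lines "second" and "third"; the second form simply avoids scanning the remainder of the input.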

Extract a specific column/field from a specific line

Suppose you have the following data file, with fields separated by spaces:

cat data.csv
1 2 3 4 5 6 7 8 9 10
11 12 13 14 15 16 17 18 19 20
21 22 23 24 25 26 27 28 29 30
31 32 33 34 35 36 37 38 39 40
41 42 43 44 45 46 47 48 49 50

You may need to read the fourth column of the third row, which is "24":

awk 'NR==3 { print $4 }' data.csv

This gives:

24
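The data file above is separated by spaces, which awk splits on by default. If the fields were separated by commas instead, the separator would have to be set explicitly. The sketch below assumes a hypothetical comma-separated file and passes the row and column numbers as variables (row and col are illustrative names, not part of the original example).

```shell
# Hypothetical comma-separated sample, created only for this sketch.
tmp=$(mktemp)
printf '1,2,3,4\n11,12,13,14\n21,22,23,24\n' > "$tmp"

# -F',' sets the field separator; -v passes the wanted row and column.
# exit stops awk as soon as the requested row has been handled.
value=$(awk -F',' -v row=3 -v col=4 'NR == row { print $col; exit }' "$tmp")

printf '%s\n' "$value"
rm -f "$tmp"
```

Here $col is evaluated as "the field whose number is stored in col", so the command prints 24.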

Modify rows on the fly (e.g. to fix Windows line endings)

If a file may contain Windows-style or Unix-style line endings (or even a mix of both), text processing may not behave as intended, because the carriage return of a Windows line ending stays attached to the end of the record.

Example:

$ echo -e 'Entry 1\nEntry 2.1\tEntry 2.2\r\nEntry 3\r\n\r\n' \
> | awk -F'\t' '$1 != "" { print $1 }' \
> | hexdump -c
0000000   E   n   t   r   y       1  \n   E   n   t   r   y       2   .
0000010   1  \n   E   n   t   r   y       3  \r  \n  \r  \n            
000001d

This is easily fixed with an additional rule placed at the start of the awk script:

/\r$/ { $0 = substr($0, 1, length($0) - 1) }

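Because the action does not end with next, the subsequent rules are still applied as before. Assigning to $0 also makes awk split the record into fields again, so $1 no longer carries the carriage return.

Example (with the line endings corrected):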

$ echo -e 'Entry 1\nEntry 2.1\tEntry 2.2\r\nEntry 3\r\n\r\n' \
> | awk -F'\t' '/\r$/ { $0 = substr($0, 1, length($0) - 1) } $1 != "" { print $1 }' \
> | hexdump -c
0000000   E   n   t   r   y       1  \n   E   n   t   r   y       2   .
0000010   1  \n   E   n   t   r   y       3  \n                        
000001a
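The same cleanup can probably be written more compactly with the standard sub() function, which edits $0 in place when no target is given. This is an alternative sketch, not the method used above; printf is used instead of echo -e for portability, so the input lacks the extra trailing newline that echo appends.

```shell
# Strip one trailing carriage return from every record, then apply the
# original rule. sub() without a third argument operates on $0, and awk
# re-splits the fields after the change.
out=$(printf 'Entry 1\nEntry 2.1\tEntry 2.2\r\nEntry 3\r\n\r\n' \
  | awk -F'\t' '{ sub(/\r$/, "") } $1 != "" { print $1 }')

printf '%s\n' "$out"
```

The record that contained only a carriage return becomes empty after sub() and is therefore skipped by the $1 != "" rule.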

Modified text is an extract of the original Stack Overflow Documentation

Licensed under CC BY-SA 3.0

Not affiliated with Stack Overflow
