tensorflow => Reading the data

Count examples in a CSV file

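import tensorflow as tf
# Queue of file names; num_epochs=1 produces each file name exactly once
filename_queue = tf.train.string_input_producer(["file.csv"], num_epochs=1)
reader = tf.TextLineReader()
# Each read returns one line of the file in `value`
key, value = reader.read(filename_queue)
# Parse two integer columns; record_defaults fixes both the types and the defaults
col1, col2 = tf.decode_csv(value, record_defaults=[[0], [0]])

with tf.Session() as sess:
  # Initializes the epoch counter created by num_epochs=1
  sess.run(tf.initialize_local_variables())
  tf.train.start_queue_runners()
  num_examples = 0
  try:
    while True:
      c1, c2 = sess.run([col1, col2])
      num_examples += 1
  except tf.errors.OutOfRangeError:
    # Raised once the queue is closed and empty
    print("There are %d examples" % num_examples)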

num_epochs=1 makes the string_input_producer queue close after each file in the list has been processed once. Reading from the closed, empty queue raises OutOfRangeError, which is caught by the try: block. By default, string_input_producer produces the file names indefinitely.
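The closing behaviour can be mimicked in plain Python 3 to see why the counting loop terminates. This is only an analogy, not how TensorFlow implements its queues: limited_producer, read_next, count_items and the OutOfRange exception are hypothetical names introduced for this sketch.

```python
class OutOfRange(Exception):
    """Hypothetical stand-in for tf.errors.OutOfRangeError."""


def limited_producer(filenames, num_epochs=None):
    """Yield file names epoch by epoch; loop forever when num_epochs is None."""
    epoch = 0
    while num_epochs is None or epoch < num_epochs:
        for name in filenames:
            yield name
        epoch += 1


def read_next(producer):
    """Return the next item, or raise OutOfRange once the producer is exhausted."""
    try:
        return next(producer)
    except StopIteration:
        raise OutOfRange("producer closed")


def count_items(filenames, num_epochs):
    """Count items with the same try/while/except shape as the listing above."""
    producer = limited_producer(filenames, num_epochs)
    count = 0
    try:
        while True:
            read_next(producer)
            count += 1
    except OutOfRange:
        return count
```

With num_epochs=None the generator never ends, which corresponds to the default of producing file names indefinitely.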

tf.initialize_local_variables() is a TensorFlow op which, when executed, initializes the local variable that tracks num_epochs inside string_input_producer. In later TensorFlow 1.x releases this op is named tf.local_variables_initializer().

tf.train.start_queue_runners() starts additional threads that add data to the queues asynchronously.
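As a rough analogy for what a queue runner does, the following plain Python 3 sketch starts a background thread that fills a bounded queue while the main thread consumes from it. It uses only the standard library; start_runner, consume_all and the DONE sentinel are hypothetical names, and TensorFlow's own runners are more elaborate.

```python
import queue
import threading


def start_runner(source, q, sentinel):
    """Start a daemon thread that feeds items from source into q."""
    def feed():
        for item in source:
            q.put(item)      # blocks while the queue is full
        q.put(sentinel)      # signal that no more data will arrive
    thread = threading.Thread(target=feed, daemon=True)
    thread.start()
    return thread


def consume_all(q, sentinel):
    """Read items until the sentinel shows up and return them in order."""
    items = []
    while True:
        item = q.get()
        if item is sentinel:
            return items
        items.append(item)


DONE = object()
q = queue.Queue(maxsize=2)   # small capacity forces the producer to wait
runner = start_runner(range(5), q, DONE)
result = consume_all(q, DONE)
runner.join()
```

The consumer never has to know how the data is produced; it only blocks on the queue, which is the same division of labour as between sess.run and the queue runner threads.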

Read and parse a TFRecord file

TFRecord files are the native TensorFlow binary format for storing data (tensors). To read the file you can use code similar to the CSV example:

import tensorflow as tf
filename_queue = tf.train.string_input_producer(["file.tfrecord"], num_epochs=1)
reader = tf.TFRecordReader()
key, serialized_example = reader.read(filename_queue)

Then you need to parse the examples from the serialized_example queue. You can do this either with tf.parse_example, which requires the examples to be batched first but is faster, or with tf.parse_single_example:

batch = tf.train.batch([serialized_example], batch_size=100)
parsed_batch = tf.parse_example(batch, features={
  "feature_name_1": tf.FixedLenFeature(shape=[1], dtype=tf.int64),
  "feature_name_2": tf.FixedLenFeature(shape=[1], dtype=tf.float32)
})

tf.train.batch joins consecutive values of the given tensors of shape [x, y, z] into tensors of shape [batch_size, x, y, z]. The features dict maps feature names to TensorFlow feature definitions. You use parse_single_example in a similar way:

parsed_example = tf.parse_single_example(serialized_example, {
  "feature_name_1": tf.FixedLenFeature(shape=[1], dtype=tf.int64),
  "feature_name_2": tf.FixedLenFeature(shape=[1], dtype=tf.float32)
})

tf.parse_example and tf.parse_single_example return a dictionary that maps feature names to the tensors holding the values.

To batch examples coming from parse_single_example, extract the tensors from the dict and use tf.train.batch as before:

parsed_batch = dict(zip(parsed_example.keys(),
    tf.train.batch(list(parsed_example.values()), batch_size=100)))
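To make the shape change performed by batching concrete, here is a small plain-Python sketch that groups consecutive examples stored as nested lists. It only illustrates the [x, y, z] to [batch_size, x, y, z] idea; shape_of and batch_consecutive are hypothetical helpers, and dropping an incomplete final batch is an assumption of this sketch.

```python
def shape_of(nested):
    """Return the shape of a regular, non-empty nested list."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0]
    return shape


def batch_consecutive(examples, batch_size):
    """Group consecutive examples into full batches; drop an incomplete tail."""
    batches = []
    for start in range(0, len(examples) - batch_size + 1, batch_size):
        batches.append(examples[start:start + batch_size])
    return batches


# Seven examples, each of shape [2, 3]
examples = [[[i, i, i], [i, i, i]] for i in range(7)]
batches = batch_consecutive(examples, batch_size=3)
```

Each batch gains one leading dimension of size batch_size, while the shape of the individual examples is unchanged.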

Read the data as before, passing the list of all tensors to evaluate to sess.run:

with tf.Session() as sess:
  sess.run(tf.initialize_local_variables())
  tf.train.start_queue_runners()
  try:
    while True:
      data_batch = sess.run(list(parsed_batch.values()))
      # process data
  except tf.errors.OutOfRangeError:
    pass

Randomly shuffling the examples

To randomly shuffle the examples, you can use the tf.train.shuffle_batch function instead of tf.train.batch, as follows:

parsed_batch = tf.train.shuffle_batch([serialized_example],
    batch_size=100, capacity=1000,
    min_after_dequeue=200)

tf.train.shuffle_batch (as well as tf.train.batch) creates a tf.Queue and keeps adding serialized_example values to it.

capacity is the number of elements that can be stored in the queue at one time. A larger capacity leads to higher memory usage, but to lower latency caused by threads waiting to fill the queue.

min_after_dequeue is the minimum number of elements left in the queue after elements have been taken from it. The shuffle_batch queue does not shuffle the elements perfectly uniformly: it is designed with huge datasets that do not fit in memory in mind. Instead, it reads between min_after_dequeue and capacity elements, keeps them in memory and randomly picks a batch of them. After that it enqueues some more elements to keep their number between min_after_dequeue and capacity. Thus, the larger the value of min_after_dequeue, the more random the elements are: each batch of batch_size elements is guaranteed to be drawn from at least min_after_dequeue consecutive elements. The trade-off is that capacity has to be larger and the queue takes longer to fill initially.
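The buffer-based shuffling described above can be imitated in plain Python 3. This is a simplified single-threaded model, not TensorFlow's implementation: shuffle_batches is a hypothetical function, the requirement capacity >= min_after_dequeue + batch_size is an assumption of this sketch, and leftover elements that do not fill a batch are dropped.

```python
import random


def shuffle_batches(examples, batch_size, capacity, min_after_dequeue, seed=None):
    """Yield shuffled batches drawn from a bounded in-memory buffer."""
    if capacity < min_after_dequeue + batch_size:
        raise ValueError("capacity must be at least min_after_dequeue + batch_size")
    rng = random.Random(seed)
    source = iter(examples)
    buffer = []
    exhausted = False
    while True:
        # Refill the buffer up to capacity while input remains.
        while not exhausted and len(buffer) < capacity:
            try:
                buffer.append(next(source))
            except StopIteration:
                exhausted = True
        if len(buffer) < batch_size:
            return  # an incomplete final batch is dropped
        batch = []
        for _ in range(batch_size):
            # Pick a random buffered element and remove it.
            index = rng.randrange(len(buffer))
            buffer[index], buffer[-1] = buffer[-1], buffer[index]
            batch.append(buffer.pop())
        yield batch


batches = list(shuffle_batches(range(20), batch_size=4, capacity=10,
                               min_after_dequeue=5, seed=0))
```

Because only the first capacity elements are in memory when the first batch is drawn, that batch can never contain an element from later in the input. This is the sense in which the shuffle is local rather than uniform.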

Reading data for n epochs with batching

Assume your data examples have already been read into a Python variable and you would like to read them n times, in batches of a given size:

import numpy as np
import tensorflow as tf
data = np.array([1, 2, 3, 4, 5])
n = 4

To merge the data into batches, possibly with random shuffling, you can use tf.train.batch or tf.train.shuffle_batch, but you need to pass it a tensor that produces the whole data n times:

limited_tensor = tf.train.limit_epochs(data, n)
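# min_after_dequeue is a required argument of shuffle_batch; 1 is an illustrative value
batch = tf.train.shuffle_batch([limited_tensor], batch_size=3, enqueue_many=True,
                               capacity=4, min_after_dequeue=1)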

limit_epochs converts the numpy array to a tensor under the hood and returns a tensor that produces it n times and then raises an OutOfRangeError. The enqueue_many=True argument passed to shuffle_batch denotes that each tensor in the tensor list [limited_tensor] should be interpreted as containing a number of examples. Note that the capacity of the batching queue can be smaller than the number of examples in the tensor.

You can then process the data as usual:
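A plain-Python model of this pipeline, without the shuffling so that the result is deterministic, shows how many batches to expect. repeat_epochs and batch_enqueue_many are hypothetical helpers; dropping the leftover examples mirrors what I understand to be the default behaviour when smaller final batches are not allowed, and should be treated as an assumption.

```python
def repeat_epochs(data, n):
    """Yield the whole data list n times, one pass per epoch."""
    for _ in range(n):
        yield list(data)


def batch_enqueue_many(chunks, batch_size):
    """Treat every chunk as several examples and regroup them into batches."""
    pending = []
    for chunk in chunks:
        pending.extend(chunk)  # enqueue_many: each element is its own example
        while len(pending) >= batch_size:
            yield pending[:batch_size]
            pending = pending[batch_size:]
    # Examples left in `pending` do not fill a batch and are dropped.


data = [1, 2, 3, 4, 5]
batches = list(batch_enqueue_many(repeat_epochs(data, 4), batch_size=3))
```

Five examples repeated four times give 20 examples, so six full batches of three are produced and the last two examples are discarded. Batches may also span an epoch boundary, as the second batch does here.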

with tf.Session() as sess:
  sess.run(tf.initialize_local_variables())
  tf.train.start_queue_runners()
  try:
    while True:
      data_batch = sess.run(batch)
      # process data
  except tf.errors.OutOfRangeError:
    pass

How to load images and labels from a TXT file

The TensorFlow documentation does not explain how to load images and labels directly from a TXT file. The code below illustrates how I achieved it. However, this does not mean that it is the best way to do it, or that it will be of help in later steps.

For instance, I'm loading the labels as a single integer value {0,1}, while the documentation uses a one-hot vector [0,1].
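If a later step does expect one-hot labels, the integer labels can be converted. The following is a minimal plain-Python sketch; one_hot and from_one_hot are hypothetical helper names and are not part of the listing below.

```python
def one_hot(label, num_classes):
    """Return a list with a 1 at position `label` and 0 everywhere else."""
    if not 0 <= label < num_classes:
        raise ValueError("label %r is outside [0, %d)" % (label, num_classes))
    return [1 if i == label else 0 for i in range(num_classes)]


def from_one_hot(vector):
    """Return the index of the single 1 in a one-hot list."""
    return vector.index(1)


encoded = [one_hot(label, 2) for label in [0, 1, 1]]
```

For two classes, the integer label 1 corresponds to the vector [0, 1] and the label 0 to [1, 0].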

# Learning how to import images and labels from a TXT file
#
# TXT file format
#
# path/to/imagefile_1 label_1
# path/to/imagefile_2 label_2
# ...                 ...
#
# where label_X is either {0,1}

#Importing Libraries
import os
import tensorflow as tf
import matplotlib.pyplot as plt
from tensorflow.python.framework import ops
from tensorflow.python.framework import dtypes

#File containing the path to images and the labels [path/to/images label]
filename = '/path/to/List.txt'

#Lists where to store the paths and labels
filenames = []
labels = []

#Reading file and extracting paths and labels
with open(filename, 'r') as File:
    infoFile = File.readlines() #Reading all the lines from File
    for line in infoFile: #Reading line-by-line
        words = line.split() #Splitting lines in words using space character as separator
        filenames.append(words[0])
        labels.append(int(words[1]))

NumFiles = len(filenames)

#Converting filenames and labels into tensors
tfilenames = ops.convert_to_tensor(filenames, dtype=dtypes.string)
tlabels = ops.convert_to_tensor(labels, dtype=dtypes.int32)

#Creating a queue which contains the list of files to read and the value of the labels
filename_queue = tf.train.slice_input_producer([tfilenames, tlabels], num_epochs=10, shuffle=True, capacity=NumFiles)

#Reading the image files and decoding them
rawIm= tf.read_file(filename_queue[0])
decodedIm = tf.image.decode_png(rawIm) # png or jpg decoder

#Extracting the labels queue
label_queue = filename_queue[1]

#Initializing Global and Local Variables so we avoid warnings and errors
init_op = tf.group(tf.local_variables_initializer() ,tf.global_variables_initializer())

#Creating an InteractiveSession so we can run in iPython
sess = tf.InteractiveSession()

with sess.as_default():
    sess.run(init_op)
    
    # Start populating the filename queue.
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)

    for i in range(NumFiles): #length of your filenames list
        nm, image, lb = sess.run([filename_queue[0], decodedIm, label_queue])
        
        print(image.shape)
        print(nm)
        print(lb)
        
        #Showing the current image
        plt.imshow(image)
        plt.show()

    coord.request_stop()
    coord.join(threads)
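The parsing loop in the listing above assumes that every line holds exactly two whitespace-separated fields. A slightly more defensive variant, sketched here in plain Python, skips blank lines and splits on the last run of whitespace so that paths containing spaces survive. parse_list_lines is a hypothetical helper, and whether such paths occur in your list file is an assumption.

```python
def parse_list_lines(lines):
    """Split '<path> <label>' lines into a list of paths and a list of int labels."""
    filenames, labels = [], []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = line.rsplit(None, 1)  # split once, on the last run of whitespace
        if len(parts) != 2:
            raise ValueError("line %d: expected '<path> <label>'" % number)
        path, label = parts
        filenames.append(path)
        labels.append(int(label))
    return filenames, labels


sample = ["a.png 0\n", "\n", "dir with space/b.png 1\n"]
paths, labels = parse_list_lines(sample)
```

The two returned lists can be passed to ops.convert_to_tensor in the same way as filenames and labels in the listing.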

Modified text is an extract of the original Stack Overflow Documentation

Licensed under CC BY-SA 3.0

Not affiliated with Stack Overflow
