tensorflow => Reading the data

Count examples in a CSV file

import tensorflow as tf

# Queue of input file names; num_epochs=1 means each file is read once
filename_queue = tf.train.string_input_producer(["file.csv"], num_epochs=1)
reader = tf.TextLineReader()
# Each read returns one line of the file
key, value = reader.read(filename_queue)
# Two integer columns, defaulting to 0 when a value is missing
col1, col2 = tf.decode_csv(value, record_defaults=[[0], [0]])

with tf.Session() as sess:
  sess.run(tf.initialize_local_variables())
  tf.train.start_queue_runners()
  num_examples = 0
  try:
    while True:
      c1, c2 = sess.run([col1, col2])
      num_examples += 1
  except tf.errors.OutOfRangeError:
    print("There are %d examples" % num_examples)

num_epochs=1 makes the string_input_producer queue close after processing each file in the list once. This causes an OutOfRangeError to be raised, which is caught by the try: block. By default, string_input_producer produces the filenames indefinitely.
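The effect of num_epochs can be pictured with a plain-Python analogy. This is only a sketch of the idea, not how TensorFlow implements its queues: the generator filename_producer is a hypothetical stand-in for string_input_producer, and StopIteration plays the role of OutOfRangeError.

```python
def filename_producer(filenames, num_epochs=None):
    """Hypothetical stand-in for string_input_producer (not the TensorFlow API).

    Yields every filename once per epoch. With num_epochs=None it never
    stops, which mirrors the default behaviour described above.
    """
    epoch = 0
    while num_epochs is None or epoch < num_epochs:
        for name in filenames:
            yield name
        epoch += 1


def read_all(filenames, num_epochs):
    """Consume the producer until it is exhausted, like the try/except loop."""
    producer = filename_producer(filenames, num_epochs)
    seen = []
    try:
        while True:
            seen.append(next(producer))
    except StopIteration:  # plays the role of tf.errors.OutOfRangeError here
        pass
    return seen


print(read_all(["a.csv", "b.csv"], num_epochs=1))
```

Without a finite num_epochs the loop in read_all would never end, which is why the counting example sets num_epochs=1.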

tf.initialize_local_variables() is a TensorFlow op which, when executed, initializes the num_epochs local variable inside string_input_producer. Later TensorFlow 1.x releases rename this op to tf.local_variables_initializer().

tf.train.start_queue_runners() starts extra threads that add data to the queues asynchronously.
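When testing a pipeline like this, it can help to cross-check the count without TensorFlow. The sketch below uses only the Python standard library; count_csv_examples is a hypothetical helper, the temporary file stands in for file.csv, and skipping blank lines is an assumption of this sketch rather than a statement about how tf.decode_csv treats them.

```python
import csv
import os
import tempfile


def count_csv_examples(path):
    """Count the non-empty rows of a CSV file."""
    with open(path, newline="") as f:
        # Blank lines come back as empty lists and are skipped here
        return sum(1 for row in csv.reader(f) if row)


# Write a small two-column file to count (stand-in for file.csv)
fd, demo_path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "w", newline="") as f:
    csv.writer(f).writerows([[1, 2], [3, 4], [5, 6]])

num_examples = count_csv_examples(demo_path)
os.remove(demo_path)
print("There are %d examples" % num_examples)
```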

Read and parse a TFRecord file

TFRecord files are the native TensorFlow binary format for storing data (tensors). To read the file you can use code similar to the CSV example:

import tensorflow as tf
filename_queue = tf.train.string_input_producer(["file.tfrecord"], num_epochs=1)
reader = tf.TFRecordReader()
key, serialized_example = reader.read(filename_queue)

Then you need to parse the examples from the serialized_example queue. You can do it either with tf.parse_example, which requires batching first but is faster, or with tf.parse_single_example:

batch = tf.train.batch([serialized_example], batch_size=100)
parsed_batch = tf.parse_example(batch, features={
  "feature_name_1": tf.FixedLenFeature(shape=[1], dtype=tf.int64),
  "feature_name_2": tf.FixedLenFeature(shape=[1], dtype=tf.float32)
})

tf.train.batch joins consecutive values of the given tensors of shape [x, y, z] into tensors of shape [batch_size, x, y, z]. The features dict maps the names of the features to TensorFlow's definitions of those features. You use parse_single_example in a similar way:

parsed_example = tf.parse_single_example(serialized_example, {
  "feature_name_1": tf.FixedLenFeature(shape=[1], dtype=tf.int64),
  "feature_name_2": tf.FixedLenFeature(shape=[1], dtype=tf.float32)
})

tf.parse_example and tf.parse_single_example return a dictionary that maps feature names to the tensors holding their values.

To batch the examples coming from parse_single_example, you should extract the tensors from the dict and use tf.train.batch as before:

parsed_batch = dict(zip(parsed_example.keys(),
    tf.train.batch(list(parsed_example.values()), batch_size=100)))
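The two ideas used here, stacking consecutive examples into a batch and rebuilding the dict with zip, can be tried out on ordinary Python data. The following is a rough sketch with nested lists in place of tensors; batch_examples is a hypothetical helper, not part of TensorFlow, and it drops an incomplete final batch for simplicity.

```python
def batch_examples(examples, batch_size):
    """Group consecutive example dicts into dicts of stacked values.

    Each example maps a feature name to a value of shape [1]; each batch
    maps the same names to a list of shape [batch_size, 1].
    """
    batches = []
    # Only full batches are produced; a shorter final chunk is dropped
    for start in range(0, len(examples) - batch_size + 1, batch_size):
        chunk = examples[start:start + batch_size]
        keys = list(chunk[0].keys())
        # One stacked list per feature, in the same order as keys
        values = [[example[key] for example in chunk] for key in keys]
        batches.append(dict(zip(keys, values)))
    return batches


examples = [{"feature_name_1": [i], "feature_name_2": [i * 0.5]}
            for i in range(5)]
batches = batch_examples(examples, batch_size=2)
print(batches[0])
```

The zip call relies on keys and values being listed in the same order, which is the same assumption the dict(zip(...)) line above makes about parsed_example.keys() and parsed_example.values().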

You read the data as before, passing the list of all the tensors to evaluate to sess.run:

with tf.Session() as sess:
  sess.run(tf.initialize_local_variables())
  tf.train.start_queue_runners()
  try:
    while True:
      data_batch = sess.run(list(parsed_batch.values()))
      # process data
  except tf.errors.OutOfRangeError:
    pass

Randomly shuffling the examples

To randomly shuffle the examples, you can use the tf.train.shuffle_batch function instead of tf.train.batch, as follows:

parsed_batch = tf.train.shuffle_batch([serialized_example],
    batch_size=100, capacity=1000,
    min_after_dequeue=200)

tf.train.shuffle_batch (as well as tf.train.batch) creates a tf.Queue and keeps adding serialized_examples to it.

capacity is the number of elements that can be stored in the queue at one time. A bigger capacity leads to bigger memory usage, but lower latency caused by threads waiting for the queue to fill up.

min_after_dequeue is the minimum number of elements left in the queue after elements are dequeued from it. The shuffle_batch queue does not shuffle elements perfectly uniformly: it is designed with huge data sets in mind, ones that do not fit in memory. Instead, it reads between min_after_dequeue and capacity elements, stores them in memory and randomly chooses a batch from them. After that it enqueues more elements to keep their number between min_after_dequeue and capacity. Thus, the larger the value of min_after_dequeue, the more random the elements are, because each choice of batch_size elements is guaranteed to be drawn from at least min_after_dequeue consecutive elements. The cost is that capacity has to be larger too, and the queue takes longer to fill initially.
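The buffer behaviour described above can be imitated in a few lines of plain Python. This is a simplified, single-threaded sketch and not TensorFlow's implementation: shuffle_stream is a hypothetical name, the refill policy (top up to capacity whenever the buffer falls to min_after_dequeue) is an assumption made to keep the model small, and the remaining elements are simply drained once the source runs out.

```python
import random


def shuffle_stream(source, capacity, min_after_dequeue, seed=None):
    """Yield the elements of source in a locally shuffled order."""
    if capacity <= min_after_dequeue:
        raise ValueError("capacity must be larger than min_after_dequeue")
    rng = random.Random(seed)
    iterator = iter(source)
    buffer = []
    exhausted = False
    while True:
        # Refill up to capacity once the buffer has dropped to the minimum
        if not exhausted and len(buffer) <= min_after_dequeue:
            while len(buffer) < capacity:
                try:
                    buffer.append(next(iterator))
                except StopIteration:
                    exhausted = True
                    break
        if not buffer:
            return
        # Pick a random buffered element and remove it in O(1)
        index = rng.randrange(len(buffer))
        buffer[index], buffer[-1] = buffer[-1], buffer[index]
        yield buffer.pop()


shuffled = list(shuffle_stream(range(100), capacity=20,
                               min_after_dequeue=10, seed=0))
print(shuffled[:10])
```

Because at most capacity elements are buffered, the element emitted at position i can only come from the first i + capacity inputs, which illustrates why the shuffle is local rather than uniform over the whole data set.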

Reading data for n epochs with batching

Assume your data examples are already read into a Python variable and you would like to read them n times, in batches of a given size:

import numpy as np
import tensorflow as tf
data = np.array([1, 2, 3, 4, 5])
n = 4

To merge the data into batches, possibly with random shuffling, you can use tf.train.batch or tf.train.shuffle_batch, but you need to pass it a tensor that produces the whole data n times:

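limited_tensor = tf.train.limit_epochs(data, n)
# min_after_dequeue is a required argument of shuffle_batch; 1 is an illustrative value
batch = tf.train.shuffle_batch([limited_tensor], batch_size=3, enqueue_many=True, capacity=4, min_after_dequeue=1)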

limit_epochs converts the numpy array to a tensor under the hood and returns a tensor that produces it n times and throws an OutOfRangeError afterwards. The enqueue_many=True argument passed to shuffle_batch denotes that each tensor in the tensor list [limited_tensor] should be interpreted as containing a number of examples. Note that the capacity of the batching queue can be smaller than the number of examples in the tensor.
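To see how many batches to expect, the same flow can be mimicked without TensorFlow. In this sketch limit_epochs and batch_many are hypothetical plain-Python stand-ins, no shuffling is done, and an incomplete final batch is dropped, which is assumed to match the default allow_smaller_final_batch=False.

```python
def limit_epochs(data, num_epochs):
    """Yield the whole data list num_epochs times, then stop."""
    for _ in range(num_epochs):
        yield list(data)


def batch_many(chunks, batch_size):
    """Treat every chunk as many examples and group them into full batches."""
    pending = []
    for chunk in chunks:
        for example in chunk:  # the enqueue_many=True idea: one chunk, many examples
            pending.append(example)
            if len(pending) == batch_size:
                yield pending
                pending = []
    # Whatever is left in pending is an incomplete batch and is dropped


data = [1, 2, 3, 4, 5]
n = 4
batches = list(batch_many(limit_epochs(data, n), batch_size=3))
print(len(batches), batches)
```

With 5 examples read 4 times there are 20 examples in total, so a batch size of 3 gives 6 full batches and leaves 2 examples unused in this sketch.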

You can then process the data as usual:

with tf.Session() as sess:
  sess.run(tf.initialize_local_variables())
  tf.train.start_queue_runners()
  try:
    while True:
      data_batch = sess.run(batch)
      # process data
  except tf.errors.OutOfRangeError:
    pass

How to load images and labels from a TXT file

The TensorFlow documentation does not explain how to load images and labels directly from a TXT file. The code below illustrates how I achieved it. However, this does not mean that it is the best way to do it, or that this way will help in further steps.

For instance, I'm loading the labels as a single integer value {0,1}, while the documentation uses a one-hot vector [0,1].

# Learning how to import images and labels from a TXT file
#
# TXT file format
#
# path/to/imagefile_1 label_1
# path/to/imagefile_2 label_2
# ...                 ...
#
# where label_X is either {0,1}

#Importing Libraries
import os
import tensorflow as tf
import matplotlib.pyplot as plt
from tensorflow.python.framework import ops
from tensorflow.python.framework import dtypes

#File containing the path to images and the labels [path/to/images label]
filename = '/path/to/List.txt'

#Lists where to store the paths and labels
filenames = []
labels = []

#Reading file and extracting paths and labels
with open(filename, 'r') as File:
    infoFile = File.readlines() #Reading all the lines from File
    for line in infoFile: #Reading line-by-line
        words = line.split() #Splitting lines in words using space character as separator
        filenames.append(words[0])
        labels.append(int(words[1]))

NumFiles = len(filenames)

#Converting filenames and labels into tensors
tfilenames = ops.convert_to_tensor(filenames, dtype=dtypes.string)
tlabels = ops.convert_to_tensor(labels, dtype=dtypes.int32)

#Creating a queue which contains the list of files to read and the value of the labels
filename_queue = tf.train.slice_input_producer([tfilenames, tlabels], num_epochs=10, shuffle=True, capacity=NumFiles)

#Reading the image files and decoding them
rawIm= tf.read_file(filename_queue[0])
decodedIm = tf.image.decode_png(rawIm) # png or jpg decoder

#Extracting the labels queue
label_queue = filename_queue[1]

#Initializing Global and Local Variables so we avoid warnings and errors
init_op = tf.group(tf.local_variables_initializer() ,tf.global_variables_initializer())

#Creating an InteractiveSession so we can run in iPython
sess = tf.InteractiveSession()

with sess.as_default():
    sess.run(init_op)
    
    # Start populating the filename queue.
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)

    for i in range(NumFiles): #length of your filenames list
        nm, image, lb = sess.run([filename_queue[0], decodedIm, label_queue])
        
        print(image.shape)
        print(nm)
        print(lb)
        
        #Showing the current image
        plt.imshow(image)
        plt.show()

    coord.request_stop()
    coord.join(threads)
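The labels above are loaded as single integers. If a later step expects one-hot vectors instead, the conversion is small enough to sketch in plain Python. to_one_hot is a hypothetical helper, and the two-class default is an assumption taken from the {0,1} labels used here.

```python
def to_one_hot(label, num_classes=2):
    """Convert an integer class label into a one-hot list."""
    if not 0 <= label < num_classes:
        raise ValueError("label %r is outside 0..%d" % (label, num_classes - 1))
    vector = [0] * num_classes
    vector[label] = 1
    return vector


labels = [0, 1, 1]
one_hot_labels = [to_one_hot(lb) for lb in labels]
print(one_hot_labels)
```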

Modified text is an extract of the original Stack Overflow Documentation

Licensed under CC BY-SA 3.0

Not affiliated with Stack Overflow
