R Language
I/O for foreign tables (Excel, SAS, SPSS, Stata)


Importing data with rio

A very simple way to import data from many common file formats is rio. This package provides a function import() that wraps many commonly used data import functions, giving them a standard interface. It works simply by passing a file name or URL to import():

library(rio)

import("example.csv")       # comma-separated values
import("example.tsv")       # tab-separated values
import("example.dta")       # Stata
import("example.sav")       # SPSS
import("example.sas7bdat")  # SAS
import("example.xlsx")      # Excel

import() can also read from compressed archives, URLs (HTTP or HTTPS) and the clipboard. A comprehensive list of all supported file formats is available in the rio package's GitHub repository.

You can also specify further parameters related to a particular file format by passing them directly to import(). For example, format tells import() which separator a CSV file uses:

import("example.csv", format = ",")  # for a CSV file that uses a comma as separator
import("example.csv", format = ";")  # for a CSV file that uses a semicolon as separator
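This is a minimal round-trip sketch. It assumes rio and its default readers are installed. It writes the built-in mtcars data to temporary files with rio's companion function export(), then reads them back with import(). The second part shows format = ";" overriding the format that would otherwise be guessed from a ".txt" extension. The temporary file names are illustrative only.

```r
library(rio)

# CSV: the ".csv" extension tells export() and import() which format to use
csv_file <- tempfile(fileext = ".csv")
export(mtcars, csv_file)
cars_csv <- import(csv_file)

# A semicolon-separated file (decimal commas) with a non-standard extension:
# format = ";" overrides the format that would be guessed from ".txt"
semi_file <- tempfile(fileext = ".txt")
write.csv2(head(mtcars), semi_file, row.names = FALSE)
cars_semi <- import(semi_file, format = ";")

str(cars_semi)
```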

Importing Excel files

There are several R packages for reading Excel files. Each relies on a different language or resource, as summarized in the following table:

R package	Uses
xlsx	Java
XLConnect	Java
openxlsx	C++
readxl	C++
RODBC	ODBC
gdata	Perl

If you use a package that depends on Java or ODBC, you need to know some details about your system, because you may run into compatibility problems depending on your R version and OS. For example, if you are using 64-bit R, you also need 64-bit Java to use xlsx or XLConnect.
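To check which build of R you are running, you can use this base-R sketch. .Machine$sizeof.pointer is 8 on 64-bit builds of R and 4 on 32-bit builds. This only covers R itself. Whether the installed Java is 64-bit has to be checked separately, for example by running `java -version` in a terminal.

```r
# Pointer size in bytes: 8 on 64-bit R, 4 on 32-bit R
r_bits <- 8 * .Machine$sizeof.pointer
cat("R is running as a", r_bits, "bit build\n")

# The architecture string reported by R, e.g. "x86_64" or "aarch64"
print(R.version$arch)
```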

Below are some examples of reading Excel files with each package. Note that several packages have identical or similar function names, so it is useful to name the package explicitly, as in package::function. The openxlsx package must be installed before it can be used.

Reading Excel files with the xlsx package

library(xlsx)

You must supply either the index or the name of the sheet to import:

xlsx::read.xlsx("Book1.xlsx", sheetIndex=1)

xlsx::read.xlsx("Book1.xlsx", sheetName="Sheet1")

Reading Excel files with the XLConnect package

library(XLConnect)
wb <- XLConnect::loadWorkbook("Book1.xlsx")

# Either, if Book1.xlsx has a sheet called "Sheet1":
sheet1 <- XLConnect::readWorksheet(wb, "Sheet1")
# Or, more generally, just get the first sheet in Book1.xlsx:
sheet1 <- XLConnect::readWorksheet(wb, getSheets(wb)[1])

XLConnect automatically imports the predefined Excel cell styles embedded in Book1.xlsx. This is useful if you want to format your workbook object and export a fully formatted Excel document. First, create the desired cell formats in Book1.xlsx and save them, for example as myHeader, myBody and myPcts. Then, after loading the workbook into R (see above):

Headerstyle <- XLConnect::getCellStyle(wb, "myHeader")
Bodystyle <- XLConnect::getCellStyle(wb, "myBody")
Pctsstyle <- XLConnect::getCellStyle(wb, "myPcts")

The cell styles are now stored in your R environment. To assign them to particular ranges of your data, first define the ranges and then assign the styles:

Headerrange <- expand.grid(row = 1, col = 1:8)
Bodyrange <- expand.grid(row = 2:6, col = c(1:5, 8))
Pctrange <- expand.grid(row = 2:6, col = c(6, 7))

XLConnect::setCellStyle(wb, sheet = "Sheet1", row = Headerrange$row,
             col = Headerrange$col, cellstyle = Headerstyle)
XLConnect::setCellStyle(wb, sheet = "Sheet1", row = Bodyrange$row,
             col = Bodyrange$col, cellstyle = Bodystyle)
XLConnect::setCellStyle(wb, sheet = "Sheet1", row = Pctrange$row,
             col = Pctrange$col, cellstyle = Pctsstyle)

# Write the styled workbook to disk (the output file name is illustrative)
XLConnect::saveWorkbook(wb, "Book1_formatted.xlsx")

Note that XLConnect is easy to use but can be extremely slow at formatting. openxlsx offers a much faster, but more cumbersome, formatting option.
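This sketch is a rough openxlsx counterpart to the XLConnect styling above. It builds a workbook in memory and applies styles to cell ranges with createStyle() and addStyle(). The data frame, colours and number formats are made up for illustration. Unlike the XLConnect approach, the styles are defined in R rather than saved in a template workbook.

```r
library(openxlsx)

# Hypothetical data: column 3 holds proportions to be shown as percentages
scores <- data.frame(
  team  = c("A", "B", "C"),
  wins  = c(10, 7, 4),
  share = c(0.476, 0.333, 0.190)
)

wb <- createWorkbook()
addWorksheet(wb, "Sheet1")
writeData(wb, "Sheet1", scores)  # header in row 1, data in rows 2-4

header_style <- createStyle(textDecoration = "bold", fgFill = "#DDEBF7")
body_style   <- createStyle(halign = "center")
pct_style    <- createStyle(numFmt = "0.0%")

# gridExpand = TRUE styles every combination of the given rows and cols
addStyle(wb, "Sheet1", header_style, rows = 1, cols = 1:3, gridExpand = TRUE)
addStyle(wb, "Sheet1", body_style, rows = 2:4, cols = 1:2, gridExpand = TRUE)
addStyle(wb, "Sheet1", pct_style, rows = 2:4, cols = 3, gridExpand = TRUE)

out_file <- tempfile(fileext = ".xlsx")
saveWorkbook(wb, out_file, overwrite = TRUE)
```

Styles only change how the cells are displayed. Reading the file back returns the original values.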

Reading Excel files with the openxlsx package

Excel files can be imported with the openxlsx package:

library(openxlsx)

openxlsx::read.xlsx("spreadsheet1.xlsx", colNames=TRUE, rowNames=TRUE)

# colNames: if TRUE, the first row of data is used as column names.
# rowNames: if TRUE, the first column of data is used as row names.

You can select the sheet to read into R by giving its position in the sheet argument:

openxlsx::read.xlsx("spreadsheet1.xlsx", sheet = 1)

or by giving its name:

openxlsx::read.xlsx("spreadsheet1.xlsx", sheet = "Sheet1")

In addition, openxlsx can detect date columns in the sheet it reads. To enable automatic detection of dates, set the argument detectDates to TRUE:

openxlsx::read.xlsx("spreadsheet1.xlsx", sheet = "Sheet1", detectDates= TRUE)
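This sketch shows what detectDates changes. It assumes openxlsx is installed and uses a hypothetical data frame written to a temporary file. Without detectDates, date cells come back as Excel serial numbers. openxlsx's convertToDate() can still turn those into Date values afterwards.

```r
library(openxlsx)

# Hypothetical data with a Date column
events <- data.frame(
  id   = 1:3,
  when = as.Date(c("2020-01-15", "2020-06-30", "2021-03-01"))
)

xlsx_file <- tempfile(fileext = ".xlsx")
write.xlsx(events, xlsx_file)

# Default: date cells come back as Excel serial numbers
raw <- read.xlsx(xlsx_file)

# detectDates = TRUE: date cells come back as Date objects
dated <- read.xlsx(xlsx_file, detectDates = TRUE)

# Serial numbers that were already read can be converted afterwards
fixed <- convertToDate(raw$when)
```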

Reading Excel files with the readxl package

Excel files can be imported into R as data frames (tibbles) using the readxl package.

library(readxl)

It can read both .xls and .xlsx files:

readxl::read_excel("spreadsheet1.xls")
readxl::read_excel("spreadsheet2.xlsx")

The sheet to import can be specified by number or by name:

readxl::read_excel("spreadsheet.xls", sheet = 1)
readxl::read_excel("spreadsheet.xls", sheet = "summary")

The argument col_names = TRUE uses the first row as column names:

readxl::read_excel("spreadsheet.xls", sheet = 1, col_names = TRUE)

The argument col_types specifies the column types of the data as a vector:

readxl::read_excel("spreadsheet.xls", sheet = 1, col_names = TRUE,
                   col_types = c("text", "date", "numeric", "numeric"))
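readxl ships a few example workbooks, and readxl_example() returns their paths. This sketch uses one of them to list the sheet names with excel_sheets(), import a single sheet by name, and then read every sheet into a named list. The list-building idiom is just one possible approach.

```r
library(readxl)

# Path to an example workbook that is installed with readxl
path <- readxl_example("datasets.xlsx")

# List the sheet names before deciding what to import
sheets <- excel_sheets(path)
print(sheets)

# Import one sheet by name
cars <- read_excel(path, sheet = "mtcars")

# Import every sheet into a named list of tibbles
all_sheets <- lapply(setNames(sheets, sheets),
                     function(s) read_excel(path, sheet = s))
```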

Reading Excel files with the RODBC package

Excel files can be read with the ODBC Excel driver, which interfaces with the Windows Access Database Engine (ACE), formerly JET. With the RODBC package, R can connect to this driver and query workbooks directly. Each worksheet is assumed to have column headers in its first row, with the data in organized columns of consistent types. NOTE: this approach is limited to Windows/PC machines, since JET/ACE are installed .dll files that are not available on other operating systems.

library(RODBC)

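# The connection string must be one string; paste0() keeps it readable
xlconn <- odbcDriverConnect(paste0(
  "Driver={Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)};",
  "DBQ=C:\\Path\\To\\Workbook.xlsx"))

df <- sqlQuery(xlconn, "SELECT * FROM [SheetName$]")
# Call close(xlconn) once all queries on this channel are finished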

Because this approach connects to an SQL engine, Excel worksheets can be queried much like database tables, including JOIN and UNION operations. The syntax follows the JET/ACE SQL dialect. NOTE: only data-access DML statements, specifically SELECT, can be run on workbooks, because workbooks are not treated as updateable queries.

joindf <-  sqlQuery(xlconn, "SELECT t1.*, t2.* FROM [Sheet1$] t1
                             INNER JOIN [Sheet2$] t2
                             ON t1.[ID] = t2.[ID]")

uniondf <-  sqlQuery(xlconn, "SELECT * FROM [Sheet1$]
                              UNION  
                              SELECT * FROM [Sheet2$]")

You can even query other workbooks through the same ODBC channel that points to the current workbook:

otherwkbkdf <- sqlQuery(xlconn, "SELECT * FROM 
                                 [Excel 12.0 Xml;HDR=Yes;
                                 Database=C:\\Path\\To\\Other\\Workbook.xlsx].[Sheet1$];")

Reading Excel files with the gdata package

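gdata reads Excel files with read.xls(), which relies on Perl (see the table above), so a working Perl installation is required.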

Read and write Stata, SPSS and SAS files

The packages foreign and haven can import and export files from various statistical packages, such as Stata, SPSS and SAS, and related software. There is a read function for each supported data type.

# loading the packages
library(foreign)
library(haven)
library(readstata13)
library(Hmisc)

Some examples for the most common data types:

# reading Stata files with `foreign`
# (use forward slashes: "\t" in an R string is a tab, and "\y" is an error)
read.dta("path/to/your/data.dta")
# reading Stata files with `haven`
read_dta("path/to/your/data.dta")

The foreign package can read Stata (.dta) files for Stata versions 7-12. According to its development page, read.dta is more or less frozen and will not be updated to read version 13 and later. For more recent versions of Stata, you can use either the readstata13 package or haven. With readstata13, files are read with read.dta13():

# reading recent Stata (13+) files with `readstata13`
read.dta13("path/to/your/data.dta")

To read SPSS and SAS files:

# reading SPSS files with `foreign`
read.spss("path/to/your/data.sav", to.data.frame = TRUE)
# reading SPSS files with `haven`
read_spss("path/to/your/data.sav")
read_sav("path/to/your/data.sav")
read_por("path/to/your/data.por")

# reading SAS files with `foreign` (requires a SAS installation;
# "dataset_name" is a placeholder for the member to read)
read.ssd("path/to/your/sas/library", "dataset_name")
# reading SAS files with `haven`
read_sas("path/to/your/data.sas7bdat")
# reading native SAS files with `Hmisc`
sas.get("path/to/your/sas/library", "dataset_name")  # requires access to SAS
# reading SAS XPORT format (*.XPT) files
sasxport.get("path/to/your/data.xpt")  # does not require access to the SAS executable
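The haven readers above return SPSS value labels as labelled vectors rather than factors. This small round-trip sketch assumes haven is installed. The survey data frame is hypothetical, and write_sav() is haven's writer for .sav files.

```r
library(haven)

# Hypothetical survey data with a factor column
survey <- data.frame(id = 1:3, answer = factor(c("yes", "no", "yes")))

sav_file <- tempfile(fileext = ".sav")
write_sav(survey, sav_file)  # the factor is stored as integer codes with value labels

back <- read_sav(sav_file)
print(class(back$answer))    # a labelled vector, not a factor

# as_factor() turns the value labels back into factor levels
back$answer <- as_factor(back$answer)
```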

The SAScii package provides functions that accept SAS SET import code and construct a text file that can be processed with read.fwf. It has proved very robust for importing large public-release datasets. Support is available at https://github.com/ajdamico/SAScii

To export data frames to other statistical packages, you can use the write function write.foreign(). It writes two files: one containing the data, and one containing the instructions the other package needs to read the data.

# writing to Stata, SPSS or SAS files with `foreign`
# general form:
#   write.foreign(df, datafile, codefile, package = c("SPSS", "Stata", "SAS"), ...)
write.foreign(dataframe, "path/to/data/file", "path/to/instruction/file", package = "Stata")

# writing to Stata files with `foreign`
# convert.factors can be "labels" (the default), "string", "numeric" or "codes"
write.dta(dataframe, "path/to/your/data.dta", version = 7L,
          convert.dates = TRUE, tz = "GMT",
          convert.factors = "labels")

# writing to Stata files with `haven`
write_dta(dataframe, "path/to/your/data.dta")

# writing to Stata (13+) files with `readstata13`
save.dta13(dataframe, "path/to/your/data.dta", data.label = NULL, time.stamp = TRUE,
  convert.factors = TRUE, convert.dates = TRUE, tz = "GMT",
  add.rownames = FALSE, compress = FALSE, version = 117,
  convert.underscore = FALSE)

# writing to SPSS files with `haven`
write_sav(dataframe, "path/to/your/data.sav")
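This sketch uses only foreign, which is distributed with R as a recommended package. It writes a hypothetical data frame to a temporary Stata file and reads it back, showing that factors survive as value labels. It then calls write.foreign() for SPSS, which produces the two files described above. All file names are temporary and illustrative.

```r
library(foreign)

# Hypothetical data with a factor and a numeric column
survey <- data.frame(
  id     = 1:4,
  answer = factor(c("yes", "no", "yes", "maybe")),
  score  = c(2.5, 3, 4.25, 1)
)

# Stata: factors are written as integer codes with value labels
dta_file <- tempfile(fileext = ".dta")
write.dta(survey, dta_file)
back <- read.dta(dta_file)  # convert.factors = TRUE (default) rebuilds the factor

# SPSS via write.foreign(): one file with the data, one with SPSS syntax to read it
spss_data <- tempfile(fileext = ".txt")
spss_code <- tempfile(fileext = ".sps")
write.foreign(survey, spss_data, spss_code, package = "SPSS")
```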

A file saved by SPSS can also be read with read.spss, like this:

foreign::read.spss('data.sav', to.data.frame=TRUE, use.value.labels=FALSE,
                   use.missings=TRUE, reencode='UTF-8')
# to.data.frame: if TRUE, return a data frame.
# use.value.labels: if TRUE, convert variables with value labels into R factors with those levels.
# use.missings: if TRUE, information on user-defined missing values is used to set the corresponding values to NA.
# reencode: if TRUE, character strings are re-encoded to the current locale; a string such as 'UTF-8'
#   names the encoding to assume for the file. The default, NA, re-encodes only in a UTF-8 locale.

Importing or exporting Feather files

Feather is an implementation of Apache Arrow designed to store data frames in a language-agnostic way while preserving metadata (such as date classes), which improves interoperability between Python and R. Reading a Feather file returns a tibble, not a standard data.frame.

library(feather)

path <- "filename.feather"
df <- mtcars

write_feather(df, path)

df2 <- read_feather(path)

head(df2)
##  A tibble: 6 x 11
##     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
##   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1  21.0     6   160   110  3.90 2.620 16.46     0     1     4     4
## 2  21.0     6   160   110  3.90 2.875 17.02     0     1     4     4
## 3  22.8     4   108    93  3.85 2.320 18.61     1     1     4     1
## 4  21.4     6   258   110  3.08 3.215 19.44     1     0     3     1
## 5  18.7     8   360   175  3.15 3.440 17.02     0     0     3     2
## 6  18.1     6   225   105  2.76 3.460 20.22     1     0     3     1

head(df)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
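The output above shows that the row names of mtcars (the car models) are lost, because a tibble has no row names. One workaround is to store the row names as an ordinary column before writing, then restore them after reading. This sketch assumes the feather package is installed. The helper column name model is arbitrary.

```r
library(feather)

cars <- mtcars
cars$model <- rownames(cars)               # keep the row names as a regular column

path <- tempfile(fileext = ".feather")
write_feather(cars, path)

back <- as.data.frame(read_feather(path))  # tibble -> plain data.frame
rownames(back) <- back$model               # restore the row names
back$model <- NULL                         # drop the helper column
```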

The current documentation contains this warning:

Note to users: Feather should be treated as alpha software. In particular, the file format is likely to evolve over the coming year. Do not use Feather for long-term data storage.

Modified text is an extract of the original Stack Overflow Documentation

Licensed under CC BY-SA 3.0

Not affiliated with Stack Overflow