pandas => IO for Google BigQuery

Reading data from BigQuery with user account credentials

In [1]: import pandas as pd

To run a query in BigQuery you need your own BigQuery project. We can request some public sample data:

In [2]: data = pd.read_gbq('''SELECT title, id, num_characters
   ...:                       FROM [publicdata:samples.wikipedia]
   ...:                       LIMIT 5'''
   ...:                    , project_id='<your-project-id>')

This will print:

Your browser has been opened to visit:

    https://accounts.google.com/o/oauth2/v2/auth...[long URL truncated]

If your browser is on a different machine then exit and re-run this
application with the command-line parameter

  --noauth_local_webserver

If you are working on a local machine, the browser will pop up. After you grant the privileges, pandas will continue with the output:

Authentication successful.
Requesting query... ok.
Query running...
Query done.
Processed: 13.8 Gb

Retrieving results...
Got 5 rows.

Total time taken 1.5 s.
Finished at 2016-08-23 11:26:03.

Result:

In [3]: data
Out[3]: 
               title       id  num_characters
0       Fusidic acid   935328            1112
1     Clark Air Base   426241            8257
2  Watergate scandal    52382           25790
3               2005    35984           75813
4               .BLP  2664340            1659
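
The queries in this topic use BigQuery's legacy SQL table syntax, [project:dataset.table]. Standard SQL writes the same reference as `project.dataset.table`. As a minimal sketch, the hypothetical helper below (legacy_to_standard is not part of pandas) converts one form to the other. It only handles references that include a project name, and it assumes simple project, dataset, and table identifiers.

```python
import re

# Hypothetical helper, not part of pandas: rewrites legacy SQL table
# references such as [publicdata:samples.wikipedia] into the standard SQL
# form `publicdata.samples.wikipedia`.
_LEGACY_TABLE = re.compile(r"\[([\w-]+):(\w+)\.([\w$]+)\]")


def legacy_to_standard(query):
    """Replace every [project:dataset.table] with `project.dataset.table`."""
    return _LEGACY_TABLE.sub(r"`\1.\2.\3`", query)


legacy_query = "SELECT title FROM [publicdata:samples.wikipedia] LIMIT 5"
standard_query = legacy_to_standard(legacy_query)
print(standard_query)
```

Which dialect read_gbq assumes by default depends on the installed version, so when sending a converted query it is probably safer to pass the dialect argument explicitly (for example dialect='standard').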

As a side effect, pandas will create the JSON file bigquery_credentials.dat, which lets you run further queries without granting the privileges again:

In [9]: pd.read_gbq('SELECT count(1) cnt FROM [publicdata:samples.wikipedia]'
                   , project_id='<your-project-id>')
Requesting query... ok.
[rest of output truncated]

Out[9]: 
         cnt
0  313797035
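
Because the cached bigquery_credentials.dat file decides whether the browser prompt appears, it can be handy to check for it before running a query in a script. The sketch below assumes the file is written to the current working directory; that location is an assumption and may differ between versions. The name has_cached_credentials is a hypothetical helper, not a pandas function.

```python
from pathlib import Path

# Assumption: the credentials cache is written to the current working
# directory under the file name mentioned in the text above.
CREDENTIALS_FILE = "bigquery_credentials.dat"


def has_cached_credentials(directory="."):
    """Return True if a cached credentials file exists in `directory`."""
    return (Path(directory) / CREDENTIALS_FILE).is_file()


if has_cached_credentials():
    print("Cached credentials found; the browser prompt should be skipped.")
else:
    print("No cached credentials; expect the browser authorization flow.")
```

To force a fresh authorization even when a cache exists, read_gbq also accepts reauth=True.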

Reading data from BigQuery with service account credentials

If you have created a service account and have a private key JSON file for it, you can use this file to authenticate with pandas:

In [5]: pd.read_gbq('''SELECT corpus, sum(word_count) words
                       FROM [bigquery-public-data:samples.shakespeare]       
                       GROUP BY corpus                                
                       ORDER BY words desc
                       LIMIT 5'''
                   , project_id='<your-project-id>'
                   , private_key='<private key json contents or file path>')
Requesting query... ok.
[rest of output truncated]

Out[5]: 
           corpus  words
0          hamlet  32446
1  kingrichardiii  31868
2      coriolanus  29535
3       cymbeline  29231
4    2kinghenryiv  28241
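
Since private_key takes either the JSON contents or a file path, a quick sanity check of the key before passing it on can make authentication errors easier to diagnose. The following is a minimal sketch: check_service_account_key is a hypothetical helper, and the list of required fields is an assumption based on what a downloaded service account key file normally contains, not a complete schema.

```python
import json

# Fields normally present in a service account key file downloaded from the
# Google Cloud console; treat this list as an assumption, not a full schema.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(key_json):
    """Parse key JSON text and return it as a dict, or raise ValueError."""
    key = json.loads(key_json)  # invalid JSON raises a ValueError subclass
    if not isinstance(key, dict):
        raise ValueError("key file must contain a JSON object")
    missing = [field for field in REQUIRED_FIELDS if field not in key]
    if missing:
        raise ValueError("key is missing fields: " + ", ".join(missing))
    if key["type"] != "service_account":
        raise ValueError("not a service account key: type=%r" % key["type"])
    return key


# Placeholder contents for illustration only; a real key file has more fields.
example = json.dumps({
    "type": "service_account",
    "project_id": "<your-project-id>",
    "private_key": "<private key>",
    "client_email": "<service-account-email>",
})
key = check_service_account_key(example)
print(key["client_email"])
```

In later pandas releases the private_key argument was deprecated in favor of credentials, so check the documentation for the version you have installed.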

Modified text is an extract of the original Stack Overflow Documentation

Licensed under CC BY-SA 3.0

Not affiliated with Stack Overflow
