hive
User Defined Aggregate Functions (UDAF)


UDAF mean example

Create a Java class that extends org.apache.hadoop.hive.ql.exec.UDAF, and inside it create an inner class that implements UDAFEvaluator.
Implement five methods in the evaluator:
- init() - Initializes the evaluator and resets its internal state. The code below uses new Column() to indicate that no values have been aggregated yet.
- iterate() - Called every time there is a new value to be aggregated. The evaluator should update its internal state with the result of the aggregation (here a running sum and count, see below). It returns true to indicate that the input was valid.
- terminatePartial() - Called when Hive wants a result for the partial aggregation. The method must return an object that encapsulates the state of the aggregation.
- merge() - Called when Hive decides to combine one partial aggregation with another.
- terminate() - Called when the final result of the aggregation is needed.
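
The order in which these five methods are typically invoked can be easier to follow in a small simulation. The following is a minimal sketch in plain Java with no Hive dependency: the names MeanLifecycleSketch, Partial, Evaluator, mapSide and reduceSide are hypothetical and exist only for illustration, and the split into two "map-side" tasks and one "reduce-side" step is an assumption about how a query might be executed, not a description of Hive internals.

```java
import java.util.Arrays;
import java.util.List;

public class MeanLifecycleSketch {

    // Hypothetical stand-in for the partial state (running sum and count).
    static class Partial {
        double sum = 0;
        int count = 0;
    }

    // Hypothetical stand-in for an evaluator. It mirrors the five methods
    // described above but does not use any Hive classes.
    static class Evaluator {
        private Partial state;

        Evaluator() { init(); }

        void init() { state = new Partial(); }

        boolean iterate(double value) {
            state.sum += value;
            state.count += 1;
            return true;
        }

        Partial terminatePartial() { return state; }

        boolean merge(Partial other) {
            if (other == null) return true;
            state.sum += other.sum;
            state.count += other.count;
            return true;
        }

        double terminate() { return state.sum / state.count; }
    }

    // Simulates one map-side task: init (via constructor), iterate, terminatePartial.
    static Partial mapSide(List<Double> rows) {
        Evaluator e = new Evaluator();
        for (double v : rows) e.iterate(v);
        return e.terminatePartial();
    }

    // Simulates the reduce side: merge every partial result, then terminate.
    static double reduceSide(List<Partial> partials) {
        Evaluator e = new Evaluator();
        for (Partial p : partials) e.merge(p);
        return e.terminate();
    }

    public static void main(String[] args) {
        Partial a = mapSide(Arrays.asList(1.0, 2.0, 3.0));
        Partial b = mapSide(Arrays.asList(10.0));
        double mean = reduceSide(Arrays.asList(a, b));
        System.out.println(mean); // (1 + 2 + 3 + 10) / 4
    }
}
```

Carrying the sum and the count in the partial state, rather than a partial mean, is what keeps merge() correct: averaging the two partial means in this run (2.0 and 10.0) would give 6.0, while the mean over all four rows is 4.0.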

    import org.apache.commons.logging.Log;
    import org.apache.commons.logging.LogFactory;
    import org.apache.hadoop.hive.ql.exec.UDAF;
    import org.apache.hadoop.hive.ql.exec.UDAFEvaluator;
    import org.apache.hadoop.hive.ql.metadata.HiveException;

    public class MeanUDAF extends UDAF {
        // Define logging
        static final Log LOG = LogFactory.getLog(MeanUDAF.class.getName());

        public static class MeanUDAFEvaluator implements UDAFEvaluator {
            /**
             * Column holds the intermediate state (running sum and count)
             * so that partial aggregations can be serialized and merged.
             */
            public static class Column {
                double sum = 0;
                int count = 0;
            }

            private Column col = null;

            public MeanUDAFEvaluator() {
                super();
                init();
            }

            // A - Initialize the evaluator, indicating that no values have
            // been aggregated yet.
            public void init() {
                LOG.debug("Initialize evaluator");
                col = new Column();
            }

            // B - Iterate every time there is a new value to be aggregated.
            public boolean iterate(double value) throws HiveException {
                LOG.debug("Iterating over each value for aggregation");
                if (col == null) {
                    throw new HiveException("Item is not initialized");
                }
                col.sum = col.sum + value;
                col.count = col.count + 1;
                return true;
            }

            // C - Called when Hive wants partially aggregated results.
            public Column terminatePartial() {
                LOG.debug("Return partially aggregated results");
                return col;
            }

            // D - Called when Hive decides to combine one partial aggregation
            // with another.
            public boolean merge(Column other) {
                LOG.debug("Merging by combining partial aggregation");
                if (other == null) {
                    return true;
                }
                col.sum += other.sum;
                col.count += other.count;
                return true;
            }

            // E - Called when the final result of the aggregation is needed.
            // Note: the division yields NaN if no values were aggregated.
            public double terminate() {
                LOG.debug("At the end of last record of the group - returning final result");
                return col.sum / col.count;
            }
        }
    }
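
One detail of terminate() worth checking is what happens when no value was ever aggregated: in Java, dividing a double 0 by an int 0 produces NaN rather than throwing. The sketch below isolates that arithmetic in plain Java. The class EmptyGroupSketch and the guardedMean variant are hypothetical, and whether returning null from terminate() is the right convention for a given Hive version is an assumption to verify against that version's documentation.

```java
public class EmptyGroupSketch {

    // Same arithmetic as terminate() in the example: double divided by int.
    static double unguardedMean(double sum, int count) {
        return sum / count;
    }

    // Hypothetical variant: returns null when nothing was aggregated,
    // instead of letting 0.0 / 0 evaluate to NaN.
    static Double guardedMean(double sum, int count) {
        if (count == 0) {
            return null;
        }
        return sum / count;
    }

    public static void main(String[] args) {
        System.out.println(Double.isNaN(unguardedMean(0, 0))); // true
        System.out.println(guardedMean(0, 0));                 // null
        System.out.println(guardedMean(9, 3));                 // 3.0
    }
}
```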


    hive> ADD JAR <JAR PATH>.jar;
    hive> CREATE TEMPORARY FUNCTION <FUNCTION NAME> AS '<FULLY QUALIFIED CLASS NAME>';
    hive> select id, mean_udf(amount) from table group by id;

Modified text is an extract of the original Stack Overflow Documentation, licensed under CC BY-SA 3.0. Not affiliated with Stack Overflow.