MXNet Basics Guide

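Activate the MXNet environment: source activate gluon # on Windows, omit source
Leave the environment: source deactivate
With the GPU build, once inside the environment you can pin the process to a specific card with CUDA_VISIBLE_DEVICES=2 python, so that data can only be allocated on that one GPU
Consider loading the whole dataset into memory. If the dataset is irregular (ragged) and NumPy cannot handle it, Python's built-in lists work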

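CUDA_VISIBLE_DEVICES=2 jupyter notebook: import mxnet after Jupyter has started; GPU training then also runs on that one card only
Upgrade with pip install mxnet --upgrade; add the --pre flag to install the nightly build. pip search mxnet lists many other MXNet packages, such as mxnet-cu75, mxnet-cu80 and mxnet-cu90
To keep MXNet from taking too much GPU memory, set the percentage to reserve: export MXNET_GPU_MEM_POOL_RESERVE=5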
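
The two environment variables above can also be set from inside Python, provided that happens before mxnet is imported. A minimal sketch, assuming the same card index (2) and reserve percentage (5) as in the commands above; the commented-out lines show where the import would go:

```python
import os

# Set these before `import mxnet`: the CUDA runtime reads CUDA_VISIBLE_DEVICES
# when it is first initialized, so later changes do not affect this process.
os.environ["CUDA_VISIBLE_DEVICES"] = "2"        # expose only physical GPU 2
os.environ["MXNET_GPU_MEM_POOL_RESERVE"] = "5"  # percentage of GPU memory to keep free

# import mxnet as mx   # import only after the variables are set
# ctx = mx.gpu(0)      # the single visible card is renumbered as device 0

visible = os.environ["CUDA_VISIBLE_DEVICES"].split(",")
print("visible GPUs:", visible)
```

Inside a notebook this would have to be the first cell, before any cell that imports mxnet.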

How MXNet handles training mode and test mode

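Gluon: if the network is called inside a with autograd.record() block, Gluon runs in training mode; otherwise it runs in test mode. A thread on the forum gives some evidence for this
MXNet's Module runs in training mode by default after loading a model; test mode has to be requested explicitly with mod.forward(Batch([x]),is_train=False). The newer loading function (v1.2.1 and later), mx.gluon.nn.SymbolBlock.imports, runs in test mode by default
C++: test mode by default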

Importing and exporting models

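Before version 1.3.0, I exported Hybrid models with export, and used save_params for models such as LSTM that could not be hybridized.
But save_params has two drawbacks: the result cannot be loaded from C++, and every Python import requires redefining the network structure, which is tedious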

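Only today did I learn that v1.2.1 introduced a handy import function, mx.gluon.nn.SymbolBlock.imports. Once a model has been exported with export (from 1.3.0 on, rnn, lstm and the like export cleanly too), there are two files, params and json, holding the weights and the network structure respectively, and these are all that is needed for prediction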

In the example below both ways give the same result, but the new way is simpler and more convenient.
DNN input: (batch, 535), output: (batch, 43)
LSTM input: (duration, batch, 535), output: (duration, batch, 43)

import mxnet as mx 
from mxnet import gluon
from collections import namedtuple

data = mx.random.uniform(shape=(3,535))
##### New way #####
# import LSTM model; input layout is (duration, batch, 535)
net = gluon.nn.SymbolBlock.imports('LSTM-symbol.json', ['data'], param_file='LSTM-0000.params', ctx=mx.cpu())
print(net(mx.nd.zeros(shape=(3,1,535))))  # the duration (3 here) can be another number
# import DNN model; input layout is (batch, 535)
net = gluon.nn.SymbolBlock.imports('DNN-symbol.json', ['data'], param_file='DNN-0000.params', ctx=mx.cpu())
print(net(data))  # the batch size (3 here) can be another number
##### Old way #####
# import DNN model
sym = mx.symbol.load('DNN-symbol.json')
mod = mx.mod.Module(symbol=sym)
mod.bind(data_shapes=[('data',(3,535))])  # match the batch size of data
mod.load_params('DNN-0000.params')
Batch = namedtuple('Batch',['data'])
mod.forward(Batch([data]),is_train=False)  # Module needs test mode requested explicitly
print(mod.get_outputs()[0])

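Data operations

expand_dims and flatten

Suppose data has shape (3,4). To turn it into (3,1,4) you could reshape, but .expand_dims(axis=1) is the better way. Going back only takes .flatten(), because flatten turns an input with dimensions (d1,d2,d3,…) into (d1, d2*d3*…)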
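
The shape arithmetic described above can be checked without MXNet. The sketch below only computes the resulting shapes; expand_dims_shape and flatten_shape are hypothetical helpers written for illustration (non-negative axis only), not MXNet functions:

```python
from functools import reduce
from operator import mul

def expand_dims_shape(shape, axis):
    """Shape after inserting a length-1 axis at position `axis` (axis >= 0)."""
    shape = list(shape)
    return tuple(shape[:axis] + [1] + shape[axis:])

def flatten_shape(shape):
    """Shape after collapsing every axis except the first into a single axis."""
    return (shape[0], reduce(mul, shape[1:], 1))

expanded = expand_dims_shape((3, 4), axis=1)   # (3, 4) -> (3, 1, 4)
restored = flatten_shape(expanded)             # (3, 1, 4) -> (3, 1*4)
print(expanded, restored)
```

Note that flatten only undoes expand_dims here because the input was two-dimensional to begin with; a (2, 3, 4, 5) input would collapse to (2, 60).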

nd.concatenate (deprecated; use nd.concat instead)

print(img_list[0].shape)  # (1L, 3L, 64L, 64L); every element has this shape
print(len(img_list))  # 13233
nd.concatenate(img_list).shape  # (13233L, 3L, 64L, 64L)

train_data = mx.io.NDArrayIter(data=nd.concatenate(img_list),
                               batch_size=64)
train_data.reset()
for batch in train_data:
    print(batch)
    break
# Output: DataBatch: data shapes: [(64L, 3L, 64L, 64L)] label shapes: []
# i.e. each batch holds 64 images

nd.concatenate([history,temp],axis=1), or nd.concat(history,temp,dim=1), corresponds to F.concat(history, temp, dim=1)

Computing L2Loss

import mxnet as mx
from mxnet import gluon
import numpy as np
loss1=gluon.loss.L2Loss(batch_axis=1)
a=mx.nd.random.uniform(0, 10,shape=(3,2,4))
b=mx.nd.random.uniform(0, 10,shape=(3,2,4))
print(loss1(a,b))
# Manual check: one value per index along axis 1 (the batch axis)
print(np.mean(np.square((a[:,0,:]-b[:,0,:]).asnumpy()))/2)
print(np.mean(np.square((a[:,1,:]-b[:,1,:]).asnumpy()))/2)
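
As far as I understand gluon.loss.L2Loss, it takes half the squared error and averages it over every axis except batch_axis, which is what the two NumPy lines above verify by hand. The pure-Python sketch below reproduces that arithmetic on small nested lists; l2_loss is a hypothetical helper for illustration, not an MXNet function, and it assumes three-dimensional input:

```python
def l2_loss(pred, label, batch_axis=0):
    """Half the mean squared error per sample, for nested lists of shape (d0, d1, d2)."""
    d0, d1, d2 = len(pred), len(pred[0]), len(pred[0][0])
    n_batch = (d0, d1, d2)[batch_axis]
    sums = [0.0] * n_batch
    for i in range(d0):
        for j in range(d1):
            for k in range(d2):
                sample = (i, j, k)[batch_axis]   # which sample this element belongs to
                sums[sample] += (pred[i][j][k] - label[i][j][k]) ** 2
    per_sample = d0 * d1 * d2 // n_batch         # number of elements averaged per sample
    return [s / (2.0 * per_sample) for s in sums]

a = [[[1.0, 2.0], [3.0, 4.0]],
     [[5.0, 6.0], [7.0, 8.0]]]                   # shape (2, 2, 2)
b = [[[0.0, 0.0], [0.0, 0.0]],
     [[0.0, 0.0], [0.0, 0.0]]]
losses = l2_loss(a, b, batch_axis=1)
print(losses)  # [8.25, 17.25]
```

With batch_axis=1 the first value averages a[:,0,:] and the second a[:,1,:], mirroring the slices used in the check above.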

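Networks

RNN

layer = mx.gluon.rnn.RNN(100, 3)
# All that is fixed so far: each time step outputs 100 dimensions and there are three hidden layers; the number of time steps is not known yet
layer.initialize()
input = mx.nd.random_uniform(shape=(6, 8, 10))
# The default layout is TNC, which makes it easy to take data across the batch
# Here there are 6 time steps, each time step's input has dimension 10, and batch_size is 8
# 6*10->6*100
# by default zeros are used as begin state
output = layer(input)
print(output.shape)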

Embedding

net.weight.data().asnumpy()  # the layer's weight matrix as a NumPy array

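mask-RNN

The important SequenceMask function

The second argument gives, for each sample in the mini-batch, how many time steps are real; here the first sample has two real time steps and the second has one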

x = mx.nd.array([[[  1.,   2.,   3.],
                  [  4.,   5.,   6.]],
                 [[  7.,   8.,   9.],
                  [ 10.,  11.,  12.]],
                 [[ 13.,  14.,   15.],
                  [ 16.,  17.,   18.]]])
# x.shape=(3L,2L,3L), i.e. (time steps, batch, features)
res=mx.nd.SequenceMask(x,mx.nd.array([2,1]), use_sequence_length=True)
print(res)
# The first sample in the batch keeps two time steps, the second keeps one
# Result:
#  [[[ 1.  2.  3.]
#    [ 4.  5.  6.]]
#   [[ 7.  8.  9.]
#    [ 0.  0.  0.]]
#   [[ 0.  0.  0.]
#    [ 0.  0.  0.]]]
# So res[:,0,:] is the masked result for the first sample
#   [[ 1.  2.  3.]
#    [ 7.  8.  9.]
#    [ 0.  0.  0.]]
# and res[:,1,:] is the masked result for the second sample
#   [[ 4.  5.  6.]
#    [ 0.  0.  0.]
#    [ 0.  0.  0.]]
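
To make the masking rule explicit, here is a pure-Python sketch that reproduces the result above on nested lists. sequence_mask is a hypothetical stand-in written for illustration (it is not the MXNet operator) and assumes the same (time, batch, feature) layout:

```python
def sequence_mask(x, lengths, value=0.0):
    """Keep x[t][b] only when t < lengths[b]; otherwise fill the row with `value`."""
    return [[list(row) if t < lengths[b] else [value] * len(row)
             for b, row in enumerate(step)]
            for t, step in enumerate(x)]

x = [[[1., 2., 3.], [4., 5., 6.]],
     [[7., 8., 9.], [10., 11., 12.]],
     [[13., 14., 15.], [16., 17., 18.]]]
res = sequence_mask(x, [2, 1])

first = [step[0] for step in res]    # same slice as res[:,0,:]
second = [step[1] for step in res]   # same slice as res[:,1,:]
print(first)
print(second)
```

The two printed slices match the commented output above: the first sample keeps two time steps and the second keeps one.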

First, a loss that respects the mask

# -*- coding: utf-8 -*-
import mxnet as mx

a=mx.random.normal(0,1,shape=(50,128,43))
b=mx.random.normal(0,1,shape=(50,128,43))
mask = mx.nd.array([10]*128) # with [50]*128 here, the two losses below give the same result

loss = mx.gluon.loss.L2Loss(batch_axis=1)

def L2LossMask(a,b,mask):
    # Like gluon.loss.L2Loss(batch_axis=1), but only the unmasked time steps of each sample are counted
    maskloss=[]
    maska = mx.nd.SequenceMask(a, mask, use_sequence_length=True)
    maskb = mx.nd.SequenceMask(b, mask, use_sequence_length=True)
    for i in range(a.shape[1]):
        index = int(mask[i].asscalar())
        maskloss.append(mx.nd.sum((maska[:index,i,:]-maskb[:index,i,:])**2)/(2*index*a.shape[2]))
    return mx.nd.concat(*maskloss, dim=0)

print(L2LossMask(a,b,mask)) # right
print(loss(a,b))            # wrong: the masked-out time steps are averaged in too
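
The same per-sample computation can be written without MXNet to see exactly which elements enter the average. A minimal sketch, assuming (time, batch, feature) nested lists and at least one valid time step per sample; masked_l2_loss is a hypothetical helper, not part of Gluon:

```python
def masked_l2_loss(pred, label, lengths):
    """Per-sample L2 loss counting only the first lengths[b] time steps of sample b."""
    n_feat = len(pred[0][0])
    losses = []
    for b, valid in enumerate(lengths):
        squared = sum((pred[t][b][k] - label[t][b][k]) ** 2
                      for t in range(valid) for k in range(n_feat))
        # Divide by the number of valid elements, then halve as L2Loss does.
        losses.append(squared / (2.0 * valid * n_feat))
    return losses

pred = [[[1.0, 1.0], [2.0, 2.0]],
        [[3.0, 3.0], [9.0, 9.0]]]      # shape (time=2, batch=2, feature=2)
label = [[[0.0, 0.0], [0.0, 0.0]],
         [[0.0, 0.0], [0.0, 0.0]]]
full = masked_l2_loss(pred, label, [2, 2])     # every step valid
masked = masked_l2_loss(pred, label, [2, 1])   # second sample has one valid step
print(full, masked)
```

When every length equals the full sequence length the result coincides with the unmasked loss, which is the [50]*128 case noted in the code above.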

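Network visualization

sym.list_outputs()
Lists the names of the model's output ports

sym.list_arguments()
Lists the names of the model's input ports together with the names of its weights and biases

sym.tojson()
Prints the network structure

mod.get_outputs()
Returns the outputs of the forward pass

Displaying the network structure: viz.plot_network
mx.viz.plot_network(symbol=sym) displays the network structure directly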
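
The JSON text can also be inspected with the standard library alone, which is handy for looking up node names (for example the output keys needed on the C++ side). The sketch below parses a hand-written, heavily abbreviated stand-in for a real -symbol.json file; the node names in it are made up, and real files carry more fields:

```python
import json

# Hand-written stand-in for the text returned by sym.tojson(); names are illustrative.
symbol_json = """
{
  "nodes": [
    {"op": "null", "name": "data", "inputs": []},
    {"op": "null", "name": "dense0_weight", "inputs": []},
    {"op": "null", "name": "dense0_bias", "inputs": []},
    {"op": "FullyConnected", "name": "dense0_fwd",
     "inputs": [[0, 0, 0], [1, 0, 0], [2, 0, 0]]}
  ],
  "arg_nodes": [0, 1, 2],
  "heads": [[3, 0, 0]]
}
"""

graph = json.loads(symbol_json)
nodes = graph["nodes"]
# Nodes with op "null" are variables: the inputs and the parameters.
arguments = [nodes[i]["name"] for i in graph["arg_nodes"]]
operators = [(n["name"], n["op"]) for n in nodes if n["op"] != "null"]
# Each head entry starts with the index of a node that produces an output.
heads = [nodes[h[0]]["name"] for h in graph["heads"]]
print(arguments, operators, heads)
```

For a real model, replace the embedded string with the contents of the exported json file.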

graph = Import["ExampleData/mxnet_example2.json", {"MXNet", "NodeGraphPlot"}]

graph = Import["ExampleData/mxnet_example2.json", {"MXNet", "NodeGraph"}]

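Advanced MXNet usage

Using MXNet with C++

Steps

Train the MXNet model in Python
Load the model in Python and run prediction
Load the model in C++ (verify on a small example that the two interfaces give the same result)
Use the model in the C++ project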
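
For the consistency check in the third step, the numbers printed by the C++ program can be compared against the Python outputs with a small tolerance. A minimal sketch, assuming the C++ side prints one float per line; the helper names and the sample values are made up for illustration:

```python
import math

def parse_cpp_output(text):
    """Parse floats printed one per line, e.g. by printf with a %.8f format."""
    return [float(token) for token in text.split()]

def outputs_match(python_values, cpp_values, abs_tol=1e-6):
    """True when both lists have the same length and agree element-wise."""
    return (len(python_values) == len(cpp_values) and
            all(math.isclose(p, c, abs_tol=abs_tol)
                for p, c in zip(python_values, cpp_values)))

# Made-up numbers standing in for real model outputs.
python_out = [0.01234567, -0.98765432]
cpp_stdout = "0.01234567\n-0.98765432\n\n"
ok = outputs_match(python_out, parse_cpp_output(cpp_stdout))
print("consistent:", ok)
```

The tolerance is a judgment call: the C++ side prints only eight decimals, so an exact comparison would be too strict.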

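Configuring the C++ platform

Under C/C++ > General, add the working directory to "Additional Include Directories" so that c_predict_api.h can be found. If #include already succeeds, this can be skipped
Under Linker > Input, add libmxnet.lib to "Additional Dependencies"
Change the "Active solution platform" to x64
1. Copy libmxnet.dll, libmxnet.lib and c_predict_api.h into the working directory
2. Add #include <c_predict_api.h> to the cpp file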

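C++ usage guide

Single input, single output works; prediction mode is the default
Multiple inputs, multiple outputs works; prediction mode is the default
Multiple inputs, multiple outputs also works when only one of the output ports is fetched
To make prediction accept a mini-batch, only the batch_size in input_shape_data needs to change, and the input data of one mini-batch is flattened before being fed to the network. The vectors for the input and output ports must each be sized to batch_size times the length of a single sample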
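
The bookkeeping for the mini-batch case can be sketched in Python before porting it to C++. The helpers below are hypothetical; they build the input_shape_indptr / input_shape_data pair in the layout the C++ programs in these notes pass to MXPredCreate, and flatten a batch in row-major order:

```python
def build_input_shapes(shapes):
    """Return (indptr, data) where shape i is data[indptr[i]:indptr[i + 1]]."""
    indptr, data = [0], []
    for shape in shapes:
        data.extend(shape)
        indptr.append(len(data))
    return indptr, data

def flatten_batch(batch):
    """Row-major flatten of a list of equal-length samples."""
    return [value for sample in batch for value in sample]

batch_size = 4
indptr, data = build_input_shapes([(batch_size, 3), (batch_size, 5)])
flat = flatten_batch([[0.1, 0.2, 0.3]] * batch_size)
print(indptr, data, len(flat))
```

With batch_size set to 1 this gives {0,2,4} and {1,3,1,5}, the values hard-coded in the two-input C++ example; the flattened input has batch_size times the single-sample length, as described above.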

Training with an iterator over an HDF5 file

import mxnet as mx
from mxnet import nd,gluon,autograd
from mxnet.gluon import nn
import h5py

net = nn.Sequential()
with net.name_scope():
    net.add(nn.Dense(32,in_units=2,activation="tanh"))
    net.add(nn.Dense(1))
net.initialize()

# load data from file
with h5py.File('test_data_SE.h5', 'r') as h5file:
    X_h5 = h5file["Input"]
    y_h5 = h5file["Output"]
    num_examples=X_h5.shape[0]

    batch_size = 512
    epochs=10
    dataiter = mx.io.NDArrayIter(X_h5, y_h5, batch_size=batch_size)
    square_loss = gluon.loss.L2Loss()
    trainer = gluon.Trainer(net.collect_params(), 'adam', {'learning_rate': 0.3})
    
    for epoch in range(epochs):
        total_loss = 0
        dataiter.reset()
        for iBatch, batch in enumerate(dataiter):
            with autograd.record():
                output = net(batch.data[0])
                loss = square_loss(output, batch.label[0])
            loss.backward()
            trainer.step(batch_size)
            total_loss += nd.sum(loss).asscalar()
        print("Epoch %d, average loss: %f" % (epoch, total_loss/num_examples))
        
print(net(nd.array([[-1,-0.9]]))[0].asnumpy())

Single-input single-output Seq2Seq model

Python code (HybridBlock version)

import mxnet as mx
from mxnet.gluon import nn
print("mxnet version: "+mx.__version__)

mx.random.seed(1234)  #Getting the same result everytime
def get_net():
    # construct a MLP
    net = nn.HybridSequential()
    with net.name_scope():
        net.add(nn.Dense(5, activation="relu"))
        net.add(nn.Dense(2))
    # initialize the parameters
    net.collect_params().initialize()
    return net

# forward
x = mx.nd.array([[0.1,0.2,0.3]])
net = get_net()
net.hybridize()
print('=== net(x) ==={}'.format(net(x)))

net.export('model')

##############   Re-importing the net  ##############
from collections import namedtuple
sym = mx.symbol.load('model-symbol.json') 
mod=mx.mod.Module(symbol=sym)
mod.bind(data_shapes=[('data',(1,3))])
mod.load_params('model-0000.params')
Batch=namedtuple('Batch',['data'])
data=mx.nd.array([[0.1,0.2,0.3]])
mod.forward(Batch([data]),is_train=False)
print(mod.get_outputs())

C++ code that imports the model and runs prediction

#include <stdio.h>

// Path for c_predict_api
#include <mxnet/c_predict_api.h>

#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <assert.h>

// Read file to buffer
class BufferFile {
public:
    std::string file_path_;
    int length_;
    char* buffer_;

    explicit BufferFile(std::string file_path)
        :file_path_(file_path) {

        std::ifstream ifs(file_path.c_str(), std::ios::in | std::ios::binary);
        if (!ifs) {
            std::cerr << "Can't open the file. Please check " << file_path << ". \n";
            length_ = 0;
            buffer_ = NULL;
            return;
        }

        ifs.seekg(0, std::ios::end);
        length_ = ifs.tellg();
        ifs.seekg(0, std::ios::beg);
        std::cout << file_path.c_str() << " ... " << length_ << " bytes\n";

        buffer_ = new char[length_ + 1];
        ifs.read(buffer_, length_);
        buffer_[length_] = '\0';  // the JSON is passed to MXPredCreate as a C string
        ifs.close();
    }

    int GetLength() {
        return length_;
    }
    char* GetBuffer() {
        return buffer_;
    }

    ~BufferFile() {
        if (buffer_) {
            delete[] buffer_;
            buffer_ = NULL;
        }
    }
};

void PrintOutputResult(const std::vector<float>& data) {
    for (int i = 0; i < static_cast<int>(data.size()); i++) {
        printf("%.8f\n", data[i]);
    }
    printf("\n");
}

int main(int argc, char* argv[]) {

    // Models path for your model, you have to modify it
    std::string json_file = "./simple prediction model/model-symbol.json";
    std::string param_file = "./simple prediction model/model-0000.params";

    BufferFile json_data(json_file);
    BufferFile param_data(param_file);

    // Parameters
    int dev_type = 1;  // 1: cpu, 2: gpu
    int dev_id = 1;  // arbitrary.
    mx_uint num_input_nodes = 1;  // 1 for feedforward
    const char* input_key[1] = { "data" };
    const char** input_keys = input_key;

    // input-dims
    int data_len = 3;

    const mx_uint input_shape_indptr[2] = { 0, 2 };
    const mx_uint input_shape_data[2] = { 1,static_cast<mx_uint>(data_len) };
    PredictorHandle pred_hnd = 0;

    if (json_data.GetLength() == 0 || param_data.GetLength() == 0)
        return -1;

    // Create Predictor (called outside assert so it still runs when NDEBUG is defined)
    if (MXPredCreate((const char*)json_data.GetBuffer(),
        (const char*)param_data.GetBuffer(),
        static_cast<size_t>(param_data.GetLength()),
        dev_type,
        dev_id,
        num_input_nodes,
        input_keys,
        input_shape_indptr,
        input_shape_data,
        &pred_hnd) != 0) {
        std::cerr << MXGetLastError() << "\n";
        return -1;
    }
    assert(pred_hnd);

    std::vector<mx_float> vector_data = std::vector<mx_float>(data_len);
    mx_float* p = vector_data.data();
    p[0] = .1;
    p[1] = .2;
    p[2] = .3;

    MXPredSetInput(pred_hnd, "data", vector_data.data(), data_len);

    // Do Predict Forward
    MXPredForward(pred_hnd);

    mx_uint output_index = 0;

    mx_uint *shape = 0;
    // shape will point to the output dimensions, here (1, 2)
    mx_uint shape_len;

    // Get Output Result
    MXPredGetOutputShape(pred_hnd, output_index, &shape, &shape_len);

    size_t size = 1;
    for (mx_uint i = 0; i < shape_len; ++i) size *= shape[i];

    std::vector<float> data(size);

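    // Called outside assert so the output is still fetched when NDEBUG is defined
    if (MXPredGetOutput(pred_hnd, output_index, &(data[0]), size) != 0) {
        std::cerr << MXGetLastError() << "\n";
        MXPredFree(pred_hnd);
        return -1;
    }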
        
    // Release Predictor
    MXPredFree(pred_hnd);

    // Print Output Data
    PrintOutputResult(data);
    return 0;
}

A simple multi-input multi-output network

Python code (plain version)

from mxnet import nd
from mxnet.gluon import nn

class HybridNet(nn.Block):
    def __init__(self, **kwargs):
        super(HybridNet, self).__init__(**kwargs)
        with self.name_scope():
            self.dense0 = nn.Dense(3)
            self.dense1 = nn.Dense(3)
            self.dense2 = nn.Dense(6)

    def forward(self,x,y):
        result1 = nd.relu(self.dense0(x))+nd.relu(self.dense1(y))
        result2 = nd.relu(self.dense2(result1))
        return [result1,result2]

net = HybridNet()
net.initialize()
x = nd.random.normal(shape=(4,3))
y = nd.random.normal(shape=(4,5))
res=net(x,y)
print("output1:", res[0])
print("output2:", res[1])

Python code (HybridBlock version)

import mxnet as mx  # needed below for mx.symbol.load and mx.mod.Module
from mxnet import nd
from mxnet.gluon import nn

class HybridNet(nn.HybridBlock):
    def __init__(self, **kwargs):
        super(HybridNet, self).__init__(**kwargs)
        with self.name_scope():
            self.dense0 = nn.Dense(3)
            self.dense1 = nn.Dense(3)
            self.dense2 = nn.Dense(6)

    def hybrid_forward(self, F,x,y):
        result1 = F.relu(self.dense0(x))+F.relu(self.dense1(y))
        result2 = F.relu(self.dense2(result1))
        return [result1,result2]

net = HybridNet()
net.initialize()
net.hybridize()
x = nd.random.normal(shape=(4,3))
y = nd.random.normal(shape=(4,5))
res=net(x,y)
print("output1:", res[0])
print("output2:", res[1])
net.export('model')

print("##############   Re-importing the net  ##############")
from collections import namedtuple
sym = mx.symbol.load('model-symbol.json')
mod=mx.mod.Module(symbol=sym,data_names=['data0','data1'])
mod.bind(data_shapes=[('data0',(4,3)),('data1',(4,5))])  # match the shapes of x and y
mod.load_params('model-0000.params')
Batch=namedtuple('Batch',['data'])
mod.forward(Batch(data=[x,y]),is_train=False)  # Module needs test mode requested explicitly
print(mod.get_outputs())

C++ code that imports the model and runs prediction

#include <cstdio>  // printf
#include <mxnet/c_predict_api.h>
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <assert.h>

// Read file to buffer
class BufferFile {
public:
    std::string file_path_;
    int length_;
    char* buffer_;

    explicit BufferFile(std::string file_path)
        :file_path_(file_path) {

        std::ifstream ifs(file_path.c_str(), std::ios::in | std::ios::binary);
        if (!ifs) {
            std::cerr << "Can't open the file. Please check " << file_path << ". \n";
            length_ = 0;
            buffer_ = NULL;
            return;
        }

        ifs.seekg(0, std::ios::end);
        length_ = ifs.tellg();
        ifs.seekg(0, std::ios::beg);
        std::cout << file_path.c_str() << " ... " << length_ << " bytes\n";

        buffer_ = new char[length_ + 1];
        ifs.read(buffer_, length_);
        buffer_[length_] = '\0';  // the JSON is passed to the predictor as a C string
        ifs.close();
    }

    int GetLength() {
        return length_;
    }
    char* GetBuffer() {
        return buffer_;
    }

    ~BufferFile() {
        if (buffer_) {
            delete[] buffer_;
            buffer_ = NULL;
        }
    }
};

void PrintOutputResult(const std::vector<float>& data) {
    for (int i = 0; i < static_cast<int>(data.size()); i++) {
        printf("%.8f\n", data[i]);
    }
    printf("\n");
}

int main(int argc, char* argv[]) {

    // Models path for your model, you have to modify it
    std::string json_file  = "./model-symbol.json";
    std::string param_file = "./model-0000.params";

    BufferFile json_data(json_file);
    BufferFile param_data(param_file);

    // Parameters
    int dev_type = 1;  // 1: cpu, 2: gpu
    int dev_id = 1;  // arbitrary.
    mx_uint num_input_nodes = 2;
    mx_uint num_output_nodes = 2;
    const char* input_key[2] = { "data0" , "data1" };
    const char** input_keys = input_key;
    //output_key name maybe should modify
    const char* output_key[2] = { "hybridnet0__plus0" , "hybridnet0_relu2" };
    const char** output_keys = output_key;

    // input-dims
    int data0_len = 3;
    int data1_len = 5;
    const mx_uint input_shape_indptr[3] = { 0,2,4 };
    const mx_uint input_shape_data[4] = {1,static_cast<mx_uint>(data0_len),1,static_cast<mx_uint>(data1_len) };
    PredictorHandle pred_hnd = 0;

    if (json_data.GetLength() == 0 || param_data.GetLength() == 0)
        return -1;

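    // Create Predictor (called outside assert so it still runs when NDEBUG is defined)
    if (MXPredCreatePartialOut(
        (const char*)json_data.GetBuffer(),
        (const char*)param_data.GetBuffer(),
        static_cast<size_t>(param_data.GetLength()),
        dev_type,
        dev_id,
        num_input_nodes,
        input_keys,
        input_shape_indptr,
        input_shape_data,
        num_output_nodes,
        output_keys,
        &pred_hnd) != 0) {
        std::cerr << MXGetLastError() << "\n";
        return -1;
    }
    assert(pred_hnd);    // ERROR HERE in the original notes; the output_key names may need adjusting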

    std::vector<mx_float> vector_data0 = std::vector<mx_float>(data0_len);
    mx_float* p0 = vector_data0.data();
    p0[0] = 1;p0[1] = 2;p0[2] = 5;
    MXPredSetInput(pred_hnd, "data0", vector_data0.data(), data0_len);

    std::vector<mx_float> vector_data1 = std::vector<mx_float>(data1_len);
    mx_float* p1 = vector_data1.data();
    p1[0] = 5; p1[1] = 3; p1[2] = 1; p1[3] = 4; p1[4] = 5;
    MXPredSetInput(pred_hnd, "data1", vector_data1.data(), data1_len);

    // Do Predict Forward
    MXPredForward(pred_hnd);

    mx_uint output0_index = 0;
    mx_uint *shape0 = 0;
    // shape0 will point to the dimensions of output 0, here (1, 3)
    mx_uint shape0_len;
    // Get Output Result
    MXPredGetOutputShape(pred_hnd, output0_index, &shape0, &shape0_len);
    size_t size0 = 1;
    for (mx_uint i = 0; i < shape0_len; ++i) size0 *= shape0[i];

    mx_uint output1_index = 1;
    mx_uint *shape1 = 0;
    // shape1 will point to the dimensions of output 1, here (1, 6) since dense2 has 6 units
    mx_uint shape1_len;
    // Get Output Result
    MXPredGetOutputShape(pred_hnd, output1_index, &shape1, &shape1_len);
    size_t size1 = 1;
    for (mx_uint i = 0; i < shape1_len; ++i) size1 *= shape1[i];

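    // Both calls sit outside assert so they still run when NDEBUG is defined
    std::vector<float> data0(size0);
    std::vector<float> data1(size1);
    if (MXPredGetOutput(pred_hnd, output0_index, &(data0[0]), size0) != 0 ||
        MXPredGetOutput(pred_hnd, output1_index, &(data1[0]), size1) != 0) {
        std::cerr << MXGetLastError() << "\n";
        MXPredFree(pred_hnd);
        return -1;
    }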

    // Print Output Data
    printf("output0:\n");
    PrintOutputResult(data0);
    printf("output1:\n");
    PrintOutputResult(data1);

    // Release Predictor
    MXPredFree(pred_hnd);

    return 0;
}

Training templates

Single input, single output

import mxnet as mx
from mxnet.gluon import nn
from mxnet import nd,gluon,autograd,gpu
import h5py
import os

ctx = gpu()
net = nn.HybridSequential()
with net.name_scope():
    net.add(nn.Dense(128, activation="relu"))
    net.add(nn.Dropout(0.1))
    net.add(nn.Dense(128, activation="relu"))
    net.add(nn.Dropout(0.1))
    net.add(nn.Dense(128, activation="relu"))
    net.add(nn.Dropout(0.1))
    net.add(nn.Dense(32))
net.initialize(ctx=ctx)
net.hybridize()

val_file = h5py.File('../data/TargetModel/validation_normalization_Target.h5', 'r')
X_val_h5 = nd.array(val_file["Input"][:]).as_in_context(ctx)
y_val_h5 = nd.array(val_file["Output"][:]).as_in_context(ctx)
val_file.close()

# load data from file
with h5py.File('../data/TargetModel/training_normalization_Target.h5', 'r') as h5file:
    X_h5 = h5file["Input"]
    y_h5 = h5file["Output"]
    
    num_examples=X_h5.shape[0]
    min_val_loss=float("inf") 
    
    epochs=100
    batch_size = 128
    dataiter = mx.io.NDArrayIter(X_h5, y_h5, batch_size=batch_size)
    square_loss = gluon.loss.L2Loss()
    trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})

    for epoch in range(epochs):
        total_loss = 0
        dataiter.reset()
        for iBatch, batch in enumerate(dataiter):
            with autograd.record():
                output = net(batch.data[0].as_in_context(ctx))
                loss = square_loss(output, batch.label[0].as_in_context(ctx))
            loss.backward()
            trainer.step(batch_size)
            total_loss += nd.sum(loss).asscalar()
            if iBatch%100==0:
                print("Epoch %d, Batch: %d/%d, average loss: %f"%(epoch,iBatch,num_examples/batch_size,nd.mean(loss).asscalar()))
        print("Epoch %d finished, average loss of training set: %f" % (epoch, total_loss/num_examples))
        val_loss = nd.mean(square_loss(net(X_val_h5), y_val_h5)).asscalar()
        print("\n-----loss of validation set: %f-----\n" % val_loss)
        if(val_loss < min_val_loss):
            min_val_loss=val_loss
            net.export('TargetModel')
            print("---validation set got a smaller loss---\n---------------Save net----------------\n")

Import test

import mxnet as mx
from mxnet.gluon import nn
from collections import namedtuple
sym = mx.symbol.load('TargetModel-symbol.json') 
mod=mx.mod.Module(symbol=sym)
mod.bind(data_shapes=[('data',(1,1+523))])
mod.load_params('TargetModel-0000.params')
Batch=namedtuple('Batch',['data'])
data=mx.nd.array([range(1+523)])
mod.forward(Batch([data]),is_train=False)
print(mod.get_outputs())

Multiple inputs, multiple outputs

import mxnet as mx
from mxnet.gluon import nn
from mxnet import nd,gluon,autograd,gpu
import h5py
import os

ctx=gpu()

class JoinModel(nn.HybridBlock):
    def __init__(self, **kwargs):
        super(JoinModel, self).__init__(**kwargs)
        self.encodeNet=nn.HybridSequential()
        self.decodeNet=nn.HybridSequential()
        self.fc=nn.HybridSequential()
        with self.name_scope():
            self.encodeNet.add(nn.Dense(128,activation="relu"))
            self.encodeNet.add(nn.Dense(128))
            self.decodeNet.add(nn.Dense(128,activation="relu"))
            self.decodeNet.add(nn.Dense(32))
            self.fc.add(nn.Dense(256,activation="relu"))
            self.fc.add(nn.Dropout(0.1))
            self.fc.add(nn.Dense(256,activation="relu"))
            self.fc.add(nn.Dropout(0.1))
            self.fc.add(nn.Dense(256,activation="relu"))
            self.fc.add(nn.Dropout(0.1))
            self.fc.add(nn.Dense(32))
            

    def hybrid_forward(self,F,text,history):
        temp = self.encodeNet(text)
        result1 = self.decodeNet(temp)
        result2 = self.fc(F.concat(history, temp, dim=1))
        return [result1,result2]

net = JoinModel()
net.initialize(ctx=ctx)
net.hybridize()

val_file = h5py.File('../data/JoinModel/validation_normalization_Join.h5', 'r')
text_val_h5 = nd.array(val_file["Input1"][:]).as_in_context(ctx)
history_val_h5 = nd.array(val_file["Input2"][:]).as_in_context(ctx)
UnitVec_val_h5 = nd.array(val_file["Output1"][:]).as_in_context(ctx)
val_file.close()

# load data from file
with h5py.File('../data/JoinModel/training_normalization_Join.h5', 'r') as h5file:
    text_h5 = h5file["Input1"]
    history_h5 = h5file["Input2"]
    UnitVec_h5 = h5file["Output1"]
    
    num_examples=text_h5.shape[0]
    min_val_loss=float("inf") 
    
    epochs=100
    batch_size = 128
    dataiter = mx.io.NDArrayIter([text_h5,history_h5], UnitVec_h5, batch_size=batch_size)
    square_loss = gluon.loss.L2Loss()
    trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})

    for epoch in range(epochs):
        total_loss = 0
        dataiter.reset()
        for iBatch, batch in enumerate(dataiter):
            with autograd.record():
                output = net(batch.data[0].as_in_context(ctx),batch.data[1].as_in_context(ctx))
                loss1 = square_loss(output[0], batch.label[0].as_in_context(ctx))
                loss2 = square_loss(output[1], batch.label[0].as_in_context(ctx))
                loss = loss1+loss2
            loss.backward()
            trainer.step(batch_size)
            total_loss += nd.sum(loss).asscalar()
            if iBatch % 100 == 0:
                print("Epoch %d, Batch: %d/%d, average loss: %f"%(epoch,iBatch,num_examples/batch_size,nd.mean(loss).asscalar()))
        print("Epoch %d finished, average loss of training set: %f" % (epoch, total_loss/num_examples))
        res = net(text_val_h5,history_val_h5)
        val_loss = nd.mean(square_loss(res[0],UnitVec_val_h5)+square_loss(res[1],UnitVec_val_h5)).asscalar()
        print("\n-----loss of validation set: %f-----\n" % val_loss)
        if(val_loss < min_val_loss):
            min_val_loss=val_loss
            net.export('JoinModel')
            print("---validation set got a smaller loss---\n---------------Save net----------------\n")

Import test

import mxnet as mx
from mxnet import nd
import numpy as np
##############   Re-importing the net  ##############
print("##############   Re-importing the net  ##############")
from collections import namedtuple
sym = mx.symbol.load('JoinModel-symbol.json') 
mod=mx.mod.Module(symbol=sym,data_names=['data0','data1'])
mod.bind(data_shapes=[('data0',(1,524)),('data1',(1,128))])
mod.load_params('JoinModel-0000.params')
Batch=namedtuple('Batch',['data'])
x = nd.random.normal(shape=(1,524))
y = nd.random.normal(shape=(1,128))
mod.forward(Batch(data=[x,y]),is_train=False)
print(mod.get_outputs())
print(sym.list_outputs())

Reading the MXNet source

io.py

Located in E:\Anaconda\envs\gluon\Lib\site-packages\mxnet
Read it to learn how to write a custom iterator
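
Before reading io.py it may help to see the bare protocol a batch iterator has to offer: reset() at the start of each epoch and a next step that raises StopIteration when the data runs out. The class below is a hypothetical, MXNet-free sketch of that protocol; a real custom iterator would subclass mx.io.DataIter and return DataBatch objects, which this sketch does not do:

```python
class SimpleBatchIter(object):
    """Hypothetical batch iterator over any sliceable data, e.g. lists."""

    def __init__(self, data, label, batch_size):
        assert len(data) == len(label)
        self.data, self.label = data, label
        self.batch_size = batch_size
        self.cursor = 0

    def reset(self):
        """Rewind to the first batch; call at the start of each epoch."""
        self.cursor = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.cursor >= len(self.data):
            raise StopIteration
        start, self.cursor = self.cursor, self.cursor + self.batch_size
        return self.data[start:self.cursor], self.label[start:self.cursor]

    next = __next__  # Python 2 spelling

it = SimpleBatchIter(list(range(10)), list(range(10, 20)), batch_size=4)
batches = list(it)        # the last batch is shorter; no padding is applied
it.reset()
first_again = next(it)
print(len(batches), first_again)
```

Because it only relies on len() and slicing, the same idea should carry over to other sliceable containers, though that is untested here.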
