
One of our junior engineers recently brought in a paper called "BS-tree," and the team is now keen to use it for our core search system. In short, what's so impressive about it? Please explain it simply.

Great question! Don't worry, let's sort it out together. The bottom line: the BS-tree rebuilds the classic B+-tree idea for main memory and uses the CPU's data-parallel processing (SIMD) to speed up both searches and updates. There are three key points: a node design that leaves "gaps" inside each node, branchless search using SIMD, and data compression that improves memory usage.

By gaps, you basically mean leaving holes in the nodes? Thinking about day-to-day operations, what I care about is whether it's hard to break.

Good point! It's similar, but not quite. Here, a gap means the node deliberately keeps spare unused slots, so that an update can copy or move a few keys locally instead of triggering a large shift. This also reduces branching (if statements) during updates, so they run faster.

I see. So what's the benefit of fewer branches? Our service sees search traffic spike at peak hours, so that matters a lot to us.

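Fewer branches means the CPU pipeline stalls less often. SIMD instructions also compare several keys at once, so the number of searches handled per unit of time (throughput) rises substantially. In three points: 1) searches get faster, 2) updates stay fast too, and 3) memory efficiency improves. A busy executive can make the call based on these three.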

So, in short, it tunes the nodes to what the CPU is good at (SIMD), and that makes it faster and lighter on memory in real operation? I'm wondering whether it justifies the adoption cost.

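That's a sharp question. In ROI terms, switching from an existing B+-tree costs implementation, validation, and staff training. However, in the paper's evaluation it outperforms existing implementations in construction time, memory footprint, and throughput. With data volumes and access concentration as high as yours, there is a good chance it will pay off. Three things to check when considering adoption: 1) whether your current bottleneck lies in memory and CPU branching, 2) whether you have an environment that supports parallel processing, and 3) how often data is updated in production and what your consistency requirements are.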

What should we watch out for on the implementation side? For example, I've heard it's weak with string keys, or that it gets even better on a GPU.

Exactly. The paper itself leaves string keys as an open problem and suggests encoding them as binary or Base64. A hybrid design that places the upper levels on a GPU to exploit its massive parallelism is given as a future direction. The key takeaway is that the implementation strategy has to adapt to the nature of your data.

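Got it. Let me put the conclusion in my own words. The BS-tree leaves free space in its nodes to lower update cost, uses SIMD for fast branchless search, and can even use a GPU in some cases. So for a service with highly concentrated search traffic, it's attractive on both performance and cost. Is that right?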

That's exactly right, and a great summary. As the next step toward a decision, let's run a prototype evaluation on real data and put together a cost estimate. Don't worry, if we do it together, we'll definitely get there.
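1. Overview and positioning
To state the conclusion first: the BS-tree departs from earlier methods in two ways. It redesigns the existing B+-tree (B-plus tree) structure around main memory, and it tolerates unused "gaps" inside nodes. Together these allow searches and updates with few branches using the CPU's SIMD (Single Instruction, Multiple Data) instructions. This is more than an implementation optimization. It shifts the design philosophy so that database index design is tied to hardware characteristics, and it offers a practical way to improve both search throughput and update efficiency.
At its base, the BS-tree keeps the B+-tree's structure as a multiway balanced tree. It fixes the node size to match memory blocks and arranges keys and processing flow so that keys can be compared in bulk with SIMD. A gap is an unused slot within a node. Updates use gaps to duplicate keys or move them locally, which avoids large shifts. The idea aims at the robustness of the data structure and the efficiency of parallel processing at the same time.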
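The gap mechanism can be sketched as a toy leaf node in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. `GappedNode`, `from_sorted`, `rank`, `contains`, and `insert` are hypothetical names. We assume that each gap slot holds a copy of the next real key (or infinity), so that the array stays sorted and can be searched without skipping gaps. We also assume that keys are spread evenly across the node when it is built. Splitting a full node is omitted.

```python
INF = float("inf")  # sentinel for trailing gaps with no real key to their right


class GappedNode:
    """Toy fixed-capacity leaf with gaps (hypothetical sketch, not the paper's code).

    Assumption: each gap slot stores a copy of the next real key to its right
    (INF if none), so `keys` stays sorted; `used[i]` marks real keys.
    """

    def __init__(self, capacity):
        self.keys = [INF] * capacity
        self.used = [False] * capacity

    @classmethod
    def from_sorted(cls, sorted_keys, capacity):
        # Assumption: spread keys evenly so that gaps are scattered through the node.
        if len(sorted_keys) > capacity:
            raise ValueError("too many keys for one node")
        node = cls(capacity)
        for i, k in enumerate(sorted_keys):
            slot = i * capacity // len(sorted_keys)
            node.keys[slot] = k
            node.used[slot] = True
        nxt = INF
        for i in range(capacity - 1, -1, -1):  # fill each gap with the next real key
            if node.used[i]:
                nxt = node.keys[i]
            else:
                node.keys[i] = nxt
        return node

    def rank(self, k):
        # Count slots with key < k over the whole node, with no early exit
        # (the shape of a SIMD compare followed by a popcount).
        return sum(x < k for x in self.keys)

    def contains(self, k):
        pos = self.rank(k)
        return pos < len(self.keys) and self.keys[pos] == k

    def insert(self, k):
        """Insert k and return how many real keys had to move."""
        if self.contains(k):
            return 0
        n, pos = len(self.keys), self.rank(k)
        if pos < n and not self.used[pos]:
            # The target slot is a gap: write in place, nothing moves.
            self.keys[pos], self.used[pos] = k, True
            return 0
        right = next((j for j in range(pos, n) if not self.used[j]), None)
        if right is not None:
            # Shift only keys[pos:right] one slot right, into the nearest gap.
            self.keys[pos + 1:right + 1] = self.keys[pos:right]
            self.used[right] = True
            self.keys[pos] = k
            return right - pos
        left = next((j for j in range(pos - 1, -1, -1) if not self.used[j]), None)
        if left is None:
            raise OverflowError("node is full; a real B+-tree would split it")
        # No gap to the right: shift keys[left+1:pos] one slot left instead.
        self.keys[left:pos - 1] = self.keys[left + 1:pos]
        self.used[left] = True
        self.keys[pos - 1] = k
        return pos - 1 - left
```

The even spacing at build time is what lets early inserts land directly in a gap and move nothing. Once the nearby gaps are used up, an insert shifts only the keys between its position and the nearest gap, not the whole tail of the node.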
In practice, the design directly benefits services that need high-throughput search over large in-memory datasets. Examples are online services that receive heavy bursts of queries and systems that rely heavily on in-memory caches. A traditional disk-oriented B+-tree design, carried over unchanged, cannot exploit CPU parallelism. The BS-tree approach therefore suits system designs built for modern hardware.
This section has three takeaways. First, the BS-tree is a software design that presupposes specific hardware (CPU SIMD). Second, it is a trade-off design that considers both updates and searches. Third, it differs from existing methods because it improves memory efficiency and throughput at the same time.
2. How it differs from prior work
Much prior work has proposed indexes that adapt disk-based B+-trees to main memory. These methods gain performance from how they pack nodes and from cache-conscious layouts. However, key shifts and branches during updates tend to become bottlenecks. The BS-tree targets exactly this weakness. It uses gaps and duplicate keys inside nodes to avoid large-scale data movement during updates and to reduce branching.
In addition, the paper introduces FOR (frame of reference) compression, which lets each node hold a different number of keys. The authors stress that this keeps memory efficiency high even when the data distribution is skewed. Because the number of effective keys per node varies, the scheme saves memory in real-world use. The authors report better construction time and memory footprint than prior academic implementations and open-source indexes.
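Here is a minimal frame-of-reference sketch in Python that makes the variable-capacity idea concrete. The node size `NODE_BYTES`, the lane widths in `LANE_WIDTHS`, and all function names are hypothetical. Rounding deltas up to 8/16/32/64-bit lanes is our own assumption about what a SIMD-friendly layout might look like, not a detail taken from the paper.

```python
NODE_BYTES = 64                # hypothetical node size (one 64-byte cache line)
LANE_WIDTHS = (8, 16, 32, 64)  # assumption: deltas stored in SIMD-friendly lane widths


def for_encode(sorted_keys):
    """Frame-of-reference encode: one base value plus non-negative deltas.

    Returns (base, lane_width_bits, capacity, deltas), where capacity is how many
    deltas of that width would fit in one node.
    """
    base = sorted_keys[0]
    deltas = [k - base for k in sorted_keys]
    needed_bits = max(deltas).bit_length()
    width = next(w for w in LANE_WIDTHS if needed_bits <= w)
    capacity = NODE_BYTES * 8 // width
    return base, width, capacity, deltas


def for_decode(base, deltas):
    return [base + d for d in deltas]


def for_rank(base, deltas, k):
    # Search without decoding: compare k - base against the stored deltas.
    return sum(d < k - base for d in deltas)
```

Consider a node whose keys span a narrow range, such as 1000 to 1010. It needs only 8-bit deltas, so 64 keys fit. A node spanning 0 to 70000 needs 32-bit lanes and holds only 16. This is the sense in which capacity varies per node.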
More importantly, the BS-tree is evaluated against learned indices and other acceleration techniques. Conventional index optimizations have mostly stopped at algorithmic improvements. This method sets itself apart by designing around the hardware instruction set. In other words, its core contribution is the practical gain that comes from aligning software design with hardware characteristics.
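3. Core technical elements
The core techniques come down to three. The first is the in-node gap design. By allowing unused slots and duplicate keys, it avoids large shifts during updates and reduces branching, so processing becomes straight-line code. The second is branchless search with SIMD, which speeds up lookups by comparing several keys simultaneously. The third is FOR (frame of reference) compression, which stores key differences so that each node has a variable capacity and memory is used more efficiently.
At first glance, gaps seem to waste space. In fact, gaps and FOR compression complement each other: FOR preserves effective capacity while the gaps lower update cost. SIMD processing calls for algorithms that eliminate branches as far as possible, because branch mispredictions degrade performance. The paper therefore sizes nodes to fit SIMD registers and vectorizes in-node processing.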
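The following Python sketch pictures the branchless, vectorized in-node search. Python does not emit SIMD instructions, so the sketch only models the shape of the computation. `LANES` is a hypothetical vector width. Each chunk stands for one vector compare that yields a lane bitmask, and a popcount of that mask replaces a data-dependent `if`. The function name `simd_rank` is hypothetical.

```python
LANES = 4  # hypothetical vector width, e.g. four 32-bit keys in a 128-bit register


def simd_rank(keys, k):
    """Return how many keys are < k (for sorted keys, the insertion position).

    Every chunk of LANES keys is compared against k in full, producing a bitmask;
    counting its set bits replaces any early-exit branch, so the amount of work
    does not depend on where k falls.
    """
    if len(keys) % LANES != 0:
        raise ValueError("pad the node to a multiple of LANES")
    count = 0
    for i in range(0, len(keys), LANES):
        mask = 0
        for lane, x in enumerate(keys[i:i + LANES]):
            mask |= int(x < k) << lane   # stands in for one vector compare
        count += bin(mask).count("1")    # stands in for a popcount instruction
    return count
```

For example, `simd_rank([1, 3, 5, 7, 9, 11, 13, 15], 8)` returns 4. In native code on x86, the same shape is usually written as a vector compare, a movemask, and a popcount.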
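Two implementation caveats stand out. Handling of string keys remains unresolved, and a hybrid implementation that also uses GPUs is presented as a future direction. One proposal converts strings to binary or Base64 before indexing. The open question is how to reconcile the cost of encoding and decoding with search efficiency. In the expected GPU hybrid design, the GPU handles the upper levels with high parallelism while the CPU handles frequent updates at the lower levels.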
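One way to illustrate the encoding trade-off is to map a string to a fixed-width integer prefix that an integer-keyed node can compare directly. This is a hypothetical binary scheme, not the Base64 approach and not anything the paper specifies. The sketch assumes keys contain no NUL characters. `PREFIX_BYTES`, `prefix_key`, and `compare_keys` are hypothetical names.

```python
PREFIX_BYTES = 8  # hypothetical: the prefix must fit one 64-bit integer key


def prefix_key(s: str) -> int:
    """Pack the first 8 UTF-8 bytes of s (zero-padded) into a big-endian integer.

    Big-endian packing makes integer order follow byte-wise lexicographic order
    of the prefixes, assuming s contains no NUL characters.
    """
    b = s.encode("utf-8")[:PREFIX_BYTES].ljust(PREFIX_BYTES, b"\x00")
    return int.from_bytes(b, "big")


def compare_keys(a: str, b: str) -> int:
    """Return -1, 0, or 1, comparing integer prefixes first and full bytes on a tie."""
    pa, pb = prefix_key(a), prefix_key(b)
    if pa != pb:
        return -1 if pa < pb else 1
    ba, bb = a.encode("utf-8"), b.encode("utf-8")  # slow path: prefixes collide
    return (ba > bb) - (ba < bb)
```

Strings that share their first eight bytes collide on the integer key. A real index would still need the fallback comparison, and somewhere to keep the full string. That is exactly the tension between encoding cost and search efficiency described above.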
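4. Validation methods and results
The paper compares the BS-tree with state-of-the-art open-source indexes and measures throughput, construction time, memory footprint, and other metrics. The evaluation covers both single-threaded and multi-threaded settings. It shows an advantage on benchmarks that emulate realistic workloads mixing frequent queries and updates. In particular, the authors emphasize that search performance degrades little even while updates are taking place.
According to the authors, in comparable environments the BS-tree achieves better construction time and memory efficiency than existing non-learned and learned indexes. Importantly, these results can be reproduced with public code. The authors have released their implementation, which eases validation before deployment and gives companies a reference when building prototypes.
The evaluation has limits, however. Behavior with string keys, extreme data distributions, and heterogeneous hardware configurations has not been evaluated sufficiently, so moving a production system over directly would require additional validation. The realistic stance is not to take the paper's numbers at face value and to require re-evaluation on your own data.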
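5. Open debates and challenges
Discussion in the research community centers on how general the implementation is and how much it costs to operate. Node gaps and FOR compression make the implementation more complex, which raises concerns about maintainability and the burden on staff. Production systems will also face unresolved challenges that cannot be ignored, such as string key handling, applicability to distributed environments, and failure recovery strategies.
Heavy hardware dependence is both an opportunity and a vulnerability. Optimizing for SIMD yields large benefits on current CPUs. It also carries the risk that a future architecture change, or the introduction of heterogeneous processors, will force a redesign. A long-term maintenance plan and technology roadmap are therefore important.
Finally, there is a gap between academic results and industrial application. The paper shows an advantage on idealized benchmarks. How to ensure stability, manageability, and compatibility with existing systems in production depends on each company's engineering capability. As a management decision, the sensible choice is a prototype investment that weighs the expected performance gain against implementation and operating costs.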
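6. Directions for further investigation and learning
Further work should focus on three areas: prototype evaluation with real data, concrete techniques for supporting string keys, and hybrid implementations that involve GPUs and other accelerators. As a first step, we recommend running a BS-tree implementation against your representative query set and update patterns, then quantifying the bottlenecks and operating costs. This is the shortest path to a decision.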
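A first measurement pass of this kind could start from a small replay harness like the one below. Everything here is hypothetical. `SortedListIndex` is a dense sorted-array stand-in for the current index. `make_workload` generates synthetic operations, which should be replaced with logged production queries and updates. A candidate index only needs `insert` and `contains` methods, so an adapter around the authors' published implementation, for example, could be plugged in.

```python
import bisect
import random
import time


class SortedListIndex:
    """Baseline stand-in: a dense sorted array where every insert shifts the tail."""

    def __init__(self):
        self.keys = []

    def insert(self, k):
        i = bisect.bisect_left(self.keys, k)
        if i == len(self.keys) or self.keys[i] != k:
            self.keys.insert(i, k)

    def contains(self, k):
        i = bisect.bisect_left(self.keys, k)
        return i < len(self.keys) and self.keys[i] == k


def make_workload(n_ops, update_ratio, key_space, seed=0):
    """Synthetic ('insert' | 'lookup', key) pairs; replace with real query logs."""
    rng = random.Random(seed)
    return [
        ("insert" if rng.random() < update_ratio else "lookup", rng.randrange(key_space))
        for _ in range(n_ops)
    ]


def run(index, workload):
    """Replay the workload against index; return (operations per second, lookup hits)."""
    hits = 0
    start = time.perf_counter()
    for op, k in workload:
        if op == "insert":
            index.insert(k)
        elif index.contains(k):
            hits += 1
    elapsed = time.perf_counter() - start
    return len(workload) / elapsed if elapsed > 0 else float("inf"), hits


if __name__ == "__main__":
    workload = make_workload(10_000, update_ratio=0.2, key_space=1_000_000)
    ops_per_sec, hits = run(SortedListIndex(), workload)
    print(f"baseline: {ops_per_sec:,.0f} ops/s, {hits} lookup hits")
```

Replaying the same seeded workload against both the baseline and the candidate keeps the comparison fair. Varying `update_ratio` shows how each index behaves as the update frequency grows.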
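On the education side, engineers need to understand the fundamentals of SIMD and data-parallel processing, the idea behind FOR compression, and the trade-offs of node design. Running a short PoC (Proof of Concept) jointly with outside experts can quickly build in-house understanding and implementation skills. Management should require a plan that presents the expected performance gains alongside the adoption cost.
Phrases for meetings
"The essence of this BS-tree proposal is that it optimizes node design for CPU SIMD, which improves the balance between search and update. If a prototype evaluation confirms gains in memory and throughput, I think we can consider replacing our current index."
"Before moving to production, let's run benchmarks on our own data without exception and quantify the impact of string keys and update frequency."
English keywords for searching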
B+ tree, SIMD, data-parallel, in-memory index, frame of reference compression, gapped nodes, learned indices
